Stable Diffusion Reddit · 7h ago | Research & Papers · Products & Services

Training a Keep/Trash Classifier on CLIP & DINOv2 Embeddings for Stable Diffusion Coloring Pages

The author trains a simple logistic regression classifier on CLIP and DINOv2 image embeddings to automatically classify Stable Diffusion-generated coloring page images as 'keep' or 'trash', reducing the manual curation workload.

💡 Why it matters

This approach demonstrates how pretrained computer vision models can be leveraged to build efficient content curation pipelines, reducing the manual effort of reviewing large batches of Stable Diffusion outputs.

Key Points

  • Tested CLIP and DINOv2 embeddings for classifying coloring page quality
  • CLIP semantic embeddings outperformed DINOv2 structural embeddings
  • Trained a linear logistic regression model instead of a complex neural network

Details

The author generates coloring-page line art with Stable Diffusion, but manually rating thousands of images had become a bottleneck. To skip the obvious failures automatically, they trained a simple logistic regression classifier on CLIP and DINOv2 image embeddings to label each image 'keep' or 'trash'. CLIP embeddings, which capture the semantic content of an image, outperformed DINOv2 embeddings, which focus on visual structure. Across six classifiers tested over the two embedding models and two feature sets, a straightforward linear model was sufficient to achieve good results. In its first real-world deployment, the classifier safely auto-trashed 55% of the images at a conservative threshold, substantially reducing the manual curation workload.



AI Curator - Daily AI News Curation
