Fine-tuning CLIP on a Niche Domain: How I Got +26pp Accuracy on Architectural Styles and What You Can Apply to Your Own Domain

DEV Community·Shiva Shrestha·20 days ago

#ai #machinelearning #python #computervision #model #training

Most fine-tuning write-ups end at "we got X% accuracy." This one walks through the four decisions, made before and after the training loop, that actually moved the number. The training loop itself was the easy part. If you're fine-tuning a vision-language model on a niche domain, you'll face these decisions too.

The project: I fine-tuned OpenCLIP ViT-B/32 on 24 architectural style classes and shipped the embedder as the retrieval backbone for visquery.com, an architectural precedent search tool. Base CLIP zero-shot on my val set: 61.4%. Fine-tuned: 87.4%. That's +26 percentage points, and almost none of it came from tuning the training loop.

Each section below is a decision point with the reasoning behind it: not just what I did, but why, and what the generalizable principle is for any domain.

1. Pick a domain where you can read the errors, not just count them

Generalizable principle: domain knowledge isn't just context; it's a forcing function for better decisions at every stage.…
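
Reading errors rather than only counting them can start with a tally of which (true, predicted) style pairs a classifier mixes up. The sketch below is a minimal, hypothetical helper in plain Python, not the author's tooling. The function names are illustrative, and the style labels in the demo are toy data, not the article's 24 classes or its results.

```python
from collections import Counter


def top_confusions(y_true, y_pred, k=5):
    """Most frequent (true, predicted) pairs among misclassified samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    # Ties keep the order in which pairs were first encountered.
    return errors.most_common(k)


def symmetric_confusions(y_true, y_pred, k=5):
    """Like top_confusions, but counts A->B and B->A mistakes as one unordered pair."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = Counter(tuple(sorted((t, p))) for t, p in zip(y_true, y_pred) if t != p)
    return pairs.most_common(k)


def per_class_accuracy(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    # Counter returns 0 for classes that were never predicted correctly.
    return {label: correct[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    # Toy labels for illustration only; not the article's data.
    y_true = ["Baroque", "Baroque", "Rococo", "Gothic", "Romanesque", "Romanesque", "Rococo"]
    y_pred = ["Rococo", "Baroque", "Baroque", "Gothic", "Gothic", "Gothic", "Rococo"]
    print(top_confusions(y_true, y_pred, k=2))
    # [(('Romanesque', 'Gothic'), 2), (('Baroque', 'Rococo'), 1)]
    print(symmetric_confusions(y_true, y_pred))
    # [(('Baroque', 'Rococo'), 2), (('Gothic', 'Romanesque'), 2)]
```

The output is a short list of class boundaries to inspect by eye. Someone who knows the domain can judge those boundaries in a way a single accuracy number cannot.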

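The 61.4% baseline is a zero-shot number: base CLIP assigns styles without any training on these labels. A common way to do that is to encode one text prompt per class, encode each image, and predict the class whose text embedding has the highest cosine similarity. A template such as "a photo of a {style} building" is only an assumption here, because the article does not say which prompts were used. The sketch below shows just that scoring step on precomputed embeddings in plain Python. In a real OpenCLIP pipeline the vectors would come from the model's `encode_image` and `encode_text`, and the 2-D vectors in the demo are toy stand-ins.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def zero_shot_predict(image_embs, class_text_embs):
    """Pick, for each image, the class whose text embedding is most cosine-similar.

    CLIP also multiplies similarities by a positive learned logit scale and
    applies a softmax, but neither changes which class scores highest.
    """
    labels = list(class_text_embs)
    text_units = [l2_normalize(class_text_embs[label]) for label in labels]
    preds = []
    for img in image_embs:
        img_unit = l2_normalize(img)
        scores = [sum(a * b for a, b in zip(img_unit, t)) for t in text_units]
        preds.append(labels[scores.index(max(scores))])
    return preds


def accuracy(y_true, y_pred):
    """Share of predictions that match the ground-truth labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need equally sized, non-empty label lists")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real image/text features.
    class_text = {"Gothic": [1.0, 0.0], "Baroque": [0.0, 1.0]}
    images = [[0.9, 0.1], [0.2, 0.8], [3.0, 2.0]]
    preds = zero_shot_predict(images, class_text)
    print(preds)  # ['Gothic', 'Baroque', 'Gothic']
    print(accuracy(["Gothic", "Baroque", "Baroque"], preds))  # 0.6666666666666666
```

On the article's numbers, +26pp is an absolute difference (87.4 − 61.4 = 26.0 percentage points). Relative to the 61.4% baseline that is roughly a 42% gain, and the error rate falls from 38.6% to 12.6%, about a two-thirds reduction.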