Where small models beat frontier LLMs (and where they don't): a 125M PHI detector

DEV Community·Raihan·21 days ago

Last month I published a 184M-parameter intent classifier that matches frontier LLMs at 22× lower latency. The story was clean: small specialized model, narrow task, comparable accuracy, much faster, almost free per inference. People liked it.

The second model in the ClarioScope SLM Suite tells a more complicated story. It's a PHI detector: a token classifier that tags spans of protected health information in inbound patient text across all 18 HIPAA Safe Harbor identifier categories.

On the macro-F1 headline number, it loses to Claude Sonnet 4.6: 0.63 vs 0.89. Against Claude Haiku 4.5: 0.63 vs 0.85. Against GPT-4o: 0.63 vs 0.81. So the click-through headline isn't "matches frontier." It's: on aggregate, frontier wins.

But the macro number hides what's actually happening, and the per-entity breakdown reveals something more interesting than either "small model wins" or "small model loses."

Model on Hugging Face: raihan-js/clarioscope-phi-deberta-v1.…
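To make the "macro number hides what's happening" point concrete, here is a minimal sketch of how macro-F1 is computed: it is the unweighted mean of per-entity F1 scores, so one weak category pulls the headline number down no matter how rare it is. The entity labels and counts below are hypothetical, chosen only for illustration; they are not the post's results.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) per entity type
Counts = Tuple[int, int, int]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Counts]) -> float:
    """Unweighted mean of per-entity F1: every entity type counts equally."""
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)


# Hypothetical toy counts, NOT taken from the article.
toy_counts: Dict[str, Counts] = {
    "NAME": (90, 10, 10),    # strong on a common entity
    "DATE": (80, 20, 20),    # decent on another common entity
    "RARE_ID": (1, 4, 4),    # weak on a rare entity
}

per_entity = {label: f1_from_counts(*c) for label, c in toy_counts.items()}
macro = macro_f1(toy_counts)

for label, score in per_entity.items():
    print(f"{label:8s} F1 = {score:.2f}")
print(f"macro-F1 = {macro:.2f}")
```

In this toy setup two of the three entity types score well, yet the macro average lands far below them because the rare category is weighted the same as the common ones. That is why a per-entity table can tell a different story than the single aggregate figure.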
