Where small models beat frontier LLMs (and where they don't): a 125M PHI detector

DEV Community · Raihan · 21 days ago

#machinelearning #healthcare #ai #model #frontier #entity

Last month I published a 184M-parameter intent classifier that matches frontier LLMs at 22× lower latency. The story was clean: a small specialized model, a narrow task, comparable accuracy, much faster, and almost free per inference. People liked it.

The second model in the ClarioScope SLM Suite tells a more complicated story. It's a PHI detector: a token classifier that tags spans of protected health information in inbound patient text across all 18 HIPAA Safe Harbor identifier categories.

On the headline macro-F1 number, it loses to Claude Sonnet 4.6: 0.63 vs 0.89. Against Claude Haiku 4.5 it's 0.63 vs 0.85, and against GPT-4o it's 0.63 vs 0.81. So the click-through headline isn't "matches frontier." It's this: on aggregate, frontier wins.

But the macro number hides what's actually happening, and the per-entity breakdown reveals something more interesting than either "small model wins" or "small model loses."

Model on Hugging Face: raihan-js/clarioscope-phi-deberta-v1. …
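Because the argument turns on what a macro average can hide, here is a minimal sketch of how per-entity F1 and macro-F1 relate for a span tagger. It assumes BIO-tagged token sequences and exact-match span scoring; this excerpt doesn't state the article's actual scoring protocol, and the label names (`NAME`, `DATE`, `MRN`) and toy data are hypothetical, not the model's real label set or results. Macro-F1 is the unweighted mean of per-label F1, so a category with a single example weighs as much as one with hundreds.

```python
from collections import Counter


def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (label, start, end) spans (end exclusive)."""
    spans = set()
    label, start = None, None
    # A trailing "O" sentinel flushes any span still open at the end of the sequence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current span keeps growing
        if label is not None:
            spans.add((label, start, i))  # close the open span
        if tag == "O":
            label, start = None, None
        else:
            # "B-X", or an "I-X" that doesn't continue the open span, starts a new span
            label, start = tag[2:], i
    return spans


def span_counts(gold_seqs, pred_seqs):
    """Per-label counts; a predicted span is a TP only if label and boundaries match exactly."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = bio_to_spans(gold), bio_to_spans(pred)
        for label, _, _ in p & g:
            tp[label] += 1
        for label, _, _ in p - g:
            fp[label] += 1
        for label, _, _ in g - p:
            fn[label] += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: NAME and DATE are common and detected perfectly,
# while the single MRN span is missed.
gold = [
    ["B-NAME", "I-NAME", "O", "B-DATE", "O"],
    ["O", "B-NAME", "O", "B-DATE", "I-DATE"],
    ["B-MRN", "O", "B-NAME", "O", "O"],
]
pred = [
    ["B-NAME", "I-NAME", "O", "B-DATE", "O"],
    ["O", "B-NAME", "O", "B-DATE", "I-DATE"],
    ["O", "O", "B-NAME", "O", "O"],
]

tp, fp, fn = span_counts(gold, pred)
labels = sorted(set(tp) | set(fp) | set(fn))
per_entity = {label: f1(tp[label], fp[label], fn[label]) for label in labels}

# Macro-F1: unweighted mean over labels, so every category counts equally.
macro = sum(per_entity.values()) / len(per_entity)
# Micro-F1: pool the counts first, so frequent categories dominate.
micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))

for label, score in per_entity.items():
    print(f"{label:<5} F1={score:.2f}")
print(f"macro-F1={macro:.2f}  micro-F1={micro:.2f}")  # macro-F1=0.67  micro-F1=0.91
```

On this toy data, two perfect categories and one missed category give a macro-F1 of 0.67 while the pooled micro-F1 is 0.91. That is only the arithmetic of averaging; it says nothing about which of the 18 categories actually drive the 0.63 figure above.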