Matching frontier LLMs at 22× lower latency: a 184M-parameter intent classifier for healthcare text

DEV Community·Raihan·24 days ago
#ai #machinelearning #python #model #cost #patient

Healthcare practices drown in inbound patient text. Email, contact forms, live chat, SMS, voicemail transcripts: every channel sends messages that need to be routed to scheduling, billing, clinical, or the front desk. It's a high-volume, deterministic, latency-sensitive task. The obvious answer in 2026 is to throw a frontier LLM at it. Claude Haiku 4.5 will give you 95% accuracy on this kind of classification. GPT-4o will too. But every call costs real money, adds about a second of network round-trip latency, and sends patient text to a third party that doesn't have a Business Associate Agreement (BAA) with you.

I built a small alternative, a 184M-parameter DeBERTa-v3-base fine-tune, and benchmarked it against Claude Haiku 4.5, Claude Sonnet 4.6, and GPT-4o on a 1,154-example test set. The fine-tuned model lands within 4 percentage points of the best frontier model's accuracy, runs 22× faster on a CPU, and costs effectively $0 per inference after training. Total cost to build it: under $3.…
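To make the comparison concrete, here is a minimal sketch of how accuracy and per-message latency could be measured for any classifier over a labeled test set. This is not the benchmark harness from the post, which is not shown in this excerpt. The `keyword_classifier` function is a hypothetical stand-in for a real model (it is not the DeBERTa fine-tune), the four example messages are made up for illustration, and the label names are assumed from the routing destinations listed above.

```python
import statistics
import time
from typing import Callable, Dict, Sequence, Tuple

# Routing destinations named in the post; the exact label strings are an assumption.
LABELS = ("scheduling", "billing", "clinical", "front_desk")


def keyword_classifier(text: str) -> str:
    """Hypothetical stand-in for a real model, used only to exercise the harness."""
    lowered = text.lower()
    if any(word in lowered for word in ("appointment", "reschedule", "book")):
        return "scheduling"
    if any(word in lowered for word in ("invoice", "bill", "payment", "charge")):
        return "billing"
    if any(word in lowered for word in ("pain", "symptom", "medication", "prescription")):
        return "clinical"
    return "front_desk"


def benchmark(
    classify: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
) -> Dict[str, float]:
    """Return accuracy and median per-message latency (ms) for one classifier."""
    latencies_ms = []
    correct = 0
    for text, expected in examples:
        start = time.perf_counter()
        predicted = classify(text)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(predicted == expected)
    return {
        "accuracy": correct / len(examples),
        "median_latency_ms": statistics.median(latencies_ms),
    }


# Made-up messages for illustration; not drawn from the post's test set.
EXAMPLES = [
    ("Can I reschedule my appointment to Friday?", "scheduling"),
    ("I was charged twice on my last invoice.", "billing"),
    ("My new medication is making me dizzy.", "clinical"),
    ("What are your opening hours?", "front_desk"),
]

if __name__ == "__main__":
    print(benchmark(keyword_classifier, EXAMPLES))
```

Running `benchmark` once per candidate (a local model, or a wrapper around a hosted API) over the same examples yields numbers that can be compared directly. A speedup figure such as the 22× quoted above would be the ratio of two median latencies measured this way.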
