Healthcare practices drown in inbound patient text. Email, contact forms, live chat, SMS, voicemail transcripts — every channel sends messages that need to be routed: to scheduling, to billing, to clinical, to the front desk. It's a high-volume, deterministic, latency-sensitive task. The obvious answer in 2026 is to throw a frontier LLM at it. Claude Haiku 4.5 will give you 95% accuracy on this kind of classification. GPT-4o will too. But every call costs real money, adds about a second of network round-trip, and sends patient text to a third party that doesn't have a BAA with you. I built a small alternative — a 184M-parameter DeBERTa-v3-base fine-tune — and benchmarked it against Claude Haiku 4.5, Claude Sonnet 4.6, and GPT-4o on a 1,154-example test set. The fine-tuned model lands within 4 percentage points of accuracy of the best frontier model, runs 22× faster on a CPU, and costs effectively $0 per inference after training. Total cost to build it: under $3.…