Menu

📰
0

voice as primary interface on desktop, the real bottleneck isn't accuracy

Reddit r/swift·u/Deep_Ad1959·about 1 month ago
#5T7m3rV0
Reading 0:00
15s threshold

I've been optimizing transcription accuracy (local vs cloud, model size vs latency). Turns out that's not the constraint. The real bottleneck is that voice without friction spirals. Keyboard forces you to pause and think. Voice at your Mac doesn't.

Shipped a hold-to-talk interface with breath detection and timeout release. Took more engineering time than the entire voice pipeline. Users immediately preferred it. The forced pause before every message prevents rambling and mistakes.

Local transcription sits at 90-94% accuracy out of the box. Cloud hits 98%. That 4-6% miss rate was expected. What killed engagement was the absence of a natural stopping point. Every agent mistake now feels intentional instead of accidental.

Been testing this in production for six months. Pattern holds across hundreds of queries.

Read More