Building a Barge‑In Detector That Doesn’t Cut the Conversation Short

DEV Community·isabelle dubuis·23 days ago

#ai #python #tutorial #software #user #false

During a live demo at Alexa RE:Invent 2023, our prototype cut off a user’s question mid‑sentence, causing a 12‑second silence that dropped the demo’s engagement score from 94% to 71%. That moment crystallized a problem every voice team knows but rarely measures: we waste 3‑5 seconds per session by naively chopping audio, and those wasted seconds cost more NPS than the occasional false positive.

Why Traditional VAD Fails in Conversational Flows

Energy‑threshold pitfalls

Most voice stacks start with a simple energy‑threshold VAD. It works for “wake‑word only” use cases, but conversation is a moving target. Energy spikes from background TV, a door slam, or even the assistant’s own synthesized speech can cross the threshold and register as user speech, while quiet stretches of the user’s real speech can dip below it, causing the system to think the user stopped talking. In our logs, 38% of false barge‑in detections occur within the first 200 ms of user speech.…
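
To make the failure mode concrete, here is a minimal sketch of a naive energy‑threshold VAD with a one‑line endpointer. It assumes mono float samples in [-1, 1], 10 ms frames (160 samples at 16 kHz), and an illustrative -40 dBFS threshold; the function names and all numbers are hypothetical and not taken from the article, and the frames are synthetic constants rather than real audio.

```python
import math
from typing import List, Sequence

# Hypothetical default: -40 dBFS is an illustrative threshold, not a value from the article.
DEFAULT_THRESHOLD_DB = -40.0


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame of float samples in [-1, 1], in dBFS."""
    if not frame:
        return -math.inf
    mean_sq = sum(s * s for s in frame) / len(frame)
    # Floor avoids log10(0) on digital silence (maps it to -120 dBFS).
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def energy_vad(frames: Sequence[Sequence[float]],
               threshold_db: float = DEFAULT_THRESHOLD_DB) -> List[bool]:
    """Naive per-frame VAD: a frame is 'speech' iff its energy exceeds the threshold."""
    return [frame_energy_db(f) > threshold_db for f in frames]


def first_end_of_speech(decisions: Sequence[bool]) -> int:
    """Index of the first speech -> non-speech transition, or -1 if there is none.

    A naive endpointer treats this frame as 'the user stopped talking'.
    """
    for i in range(1, len(decisions)):
        if decisions[i - 1] and not decisions[i]:
            return i
    return -1


if __name__ == "__main__":
    # Synthetic 10 ms frames (160 samples at an assumed 16 kHz), not real audio.
    loud = [0.1] * 160      # about -20 dBFS: clearly above threshold
    soft = [0.003] * 160    # about -50 dBFS: a quiet stretch inside the utterance
    silence = [0.0] * 160

    utterance = [loud, loud, soft, loud, loud, silence]
    decisions = energy_vad(utterance)
    print(decisions)                        # [True, True, False, True, True, False]
    print(first_end_of_speech(decisions))   # 2: the turn ends mid-utterance

    # A loud non-speech event (e.g. a door slam) while the user is silent reads as speech onset.
    print(energy_vad([silence, [0.5] * 160, silence]))  # [False, True, False]
```

Because each frame is judged in isolation, a single quiet frame is enough to end the user’s turn, and a single loud non‑speech frame is enough to start one, which is exactly the pair of errors an energy threshold makes in conversation.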