Voice and Camera Input in React: Speech Recognition, Media Devices, and Permissions

Voice and camera are the two senses that turn a static web app into something that feels alive. A search bar that you can talk to. A note-taking app that transcribes you in real time. A meeting tool that lets you pick which webcam to use. A walkie-talkie that talks when you hold a key.

None of these are exotic anymore -- the browser has had the APIs for years -- but every one of them lives behind a gauntlet of permission prompts, vendor prefixes, and lifecycle quirks that make them painful to integrate into a React component.

This post walks through four browser capabilities for voice and camera input: live speech recognition with interim results, enumerating the user's cameras and microphones, querying permissions in a way that survives revocation, and using the Shift key as a push-to-talk modifier.…
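Before touching any of these APIs, it helps to know which ones the current browser exposes at all. The snippet below is a minimal sketch of that check, not a definitive implementation: `BrowserLike`, `InputSupport`, and `detectInputSupport` are hypothetical names introduced here for illustration. It assumes speech recognition may appear either as `SpeechRecognition` or under the `webkitSpeechRecognition` prefix, and that device and permission access live on `navigator.mediaDevices` and `navigator.permissions`.

```typescript
// Hypothetical, minimal view of the globals we probe. In a browser the
// argument would be `window`; taking it as a parameter keeps the check
// usable in tests and during server-side rendering.
interface BrowserLike {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
  navigator?: {
    mediaDevices?: { enumerateDevices?: unknown; getUserMedia?: unknown };
    permissions?: { query?: unknown };
  };
}

// Hypothetical result shape: one flag per capability.
interface InputSupport {
  speechRecognition: boolean;         // available under either name
  speechRecognitionPrefixed: boolean; // only the webkit-prefixed name exists
  mediaDevices: boolean;              // enumerateDevices and getUserMedia both exist
  permissionsQuery: boolean;          // navigator.permissions.query exists
}

function detectInputSupport(env: BrowserLike): InputSupport {
  const standard = typeof env.SpeechRecognition === "function";
  const prefixed = typeof env.webkitSpeechRecognition === "function";
  const media = env.navigator?.mediaDevices;

  return {
    speechRecognition: standard || prefixed,
    speechRecognitionPrefixed: !standard && prefixed,
    mediaDevices:
      typeof media?.enumerateDevices === "function" &&
      typeof media?.getUserMedia === "function",
    permissionsQuery: typeof env.navigator?.permissions?.query === "function",
  };
}
```

In a component, the check would typically run once on the client (for example inside an effect, passing `window`, possibly with a cast), so that controls for unsupported features can be hidden rather than failing when the user clicks them.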