I do a lot of learning from online videos, and many of them are not in English. With AI becoming part of my workflow, I stopped watching full videos and started extracting transcripts, feeding them into a model, and letting it pull out what I actually need.

The problem was that the workflow was fragmented and annoying: upload to a cloud service, wait for processing, get a transcript full of gibberish from background noise, then move that into another tool for translation. Slow, not private, and expensive at scale. So I built ytx, a local-first command-line tool that runs entirely on your machine.

For the tech, I chose mlx-whisper because Apple Silicon's GPU is well suited to local inference. Instead of fighting TensorFlow or converting models, I could lean into Apple's native MLX framework and let the Mac GPU handle the full whisper-large-v3 model. No cloud account. No per-minute fees. Just your hardware.…
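
To make the mlx-whisper part concrete, here is a minimal sketch of what local transcription with whisper-large-v3 can look like. It is not ytx's actual code. The model repo id, the helper names (`format_timestamp`, `format_segments`, `transcribe_local`), and the timestamped output format are all my assumptions for illustration; the only library call used is `mlx_whisper.transcribe`, and it needs an Apple Silicon Mac with `mlx-whisper` installed.

```python
# Assumption: this Hugging Face repo id is one community MLX conversion of
# whisper-large-v3. The post does not say which weights ytx actually loads.
MODEL_REPO = "mlx-community/whisper-large-v3-mlx"


def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments) -> str:
    """Turn Whisper-style segments into one timestamped line per segment.

    Each segment is expected to be a mapping with "start" (seconds) and
    "text" keys. Segments whose text is empty after stripping are skipped.
    """
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if text:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines)


def transcribe_local(audio_path: str) -> str:
    """Hypothetical helper: transcribe a local audio file on the Mac GPU."""
    # Imported lazily so the formatting helpers work without MLX installed.
    import mlx_whisper

    # The weights are downloaded on first use and cached locally after that.
    result = mlx_whisper.transcribe(audio_path, path_or_hf_repo=MODEL_REPO)
    return format_segments(result["segments"])
```

Calling `transcribe_local("talk.mp3")` (a placeholder filename) would return plain text that can be pasted straight into a model prompt. Keeping the formatting separate from the transcription call means the output shape can be changed or tested without touching the GPU.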