Local-first video transcription on Apple Silicon with mlx-whisper


DEV Community·Dipesh Sukhani·26 days ago

#python #cli #opensource #machinelearning #whisper #workflow

I do a lot of learning from online videos, and many of them are not in English. As AI became part of my workflow, I stopped watching full videos and started extracting transcripts, feeding them into my models, and letting the model pull out what I actually need.

The problem was that the workflow was fragmented and annoying: upload to a cloud service, wait for processing, get a transcript full of gibberish from background noise, then move that into another tool for translation. Slow, not private, and expensive at scale.

So I built ytx, a local-first command-line tool that runs entirely on your machine.

The tech

I chose mlx-whisper because Apple Silicon's GPU architecture is a perfect match for local inference. Instead of fighting TensorFlow or converting models, I could lean into Apple's native MLX framework and let the Mac GPU handle the full whisper-large-v3 model. No cloud account. No per-minute fees. Just your hardware.…
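
To make the local setup concrete, here is a minimal sketch of transcribing a file with mlx-whisper. It is not ytx's actual code. The model repo name, the helper names (`format_timestamp`, `segments_to_text`, `transcribe_file`) and the timestamped output format are assumptions chosen for illustration. It also assumes `mlx-whisper` and `ffmpeg` are installed on an Apple Silicon Mac.

```python
from typing import Any, Dict, Iterable

# Assumption: a Hugging Face repo holding an MLX conversion of whisper-large-v3.
MODEL_REPO = "mlx-community/whisper-large-v3-mlx"


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments: Iterable[Dict[str, Any]]) -> str:
    """Turn Whisper-style segments into one timestamped line per segment.

    Segments whose text is empty after stripping whitespace are skipped.
    """
    lines = []
    for seg in segments:
        text = seg["text"].strip()
        if text:
            lines.append(f"[{format_timestamp(seg['start'])}] {text}")
    return "\n".join(lines)


def transcribe_file(path: str, translate_to_english: bool = False) -> str:
    """Transcribe (or translate to English) a local media file on-device."""
    # Imported lazily so the pure helpers above work without MLX installed.
    import mlx_whisper

    # Whisper's built-in "translate" task produces English output.
    task = "translate" if translate_to_english else "transcribe"
    result = mlx_whisper.transcribe(path, path_or_hf_repo=MODEL_REPO, task=task)
    return segments_to_text(result["segments"])
```

Calling `transcribe_file("talk.mp4", translate_to_english=True)` (with a hypothetical local file) would download the model weights on first use and then run without any cloud service. Keeping the formatting helpers separate from the model call makes them easy to test without a GPU.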
