Local-first video transcription on Apple Silicon with mlx-whisper

DEV Community · Dipesh Sukhani · 26 days ago

I do a lot of learning from online videos, and many of them are not in English. With AI becoming part of my workflow, I stopped watching full videos and started extracting transcripts, feeding them into my models, and letting the model pull out what I actually need.

The problem was that the workflow was fragmented and annoying: upload to a cloud service, wait for processing, get a transcript full of gibberish from background noise, then move that into another tool for translation. Slow, not private, and expensive at scale.

So I built ytx, a local-first command-line tool that runs entirely on your machine.

The tech

I chose mlx-whisper because Apple Silicon's GPU architecture is a perfect match for local inference. Instead of fighting TensorFlow or converting models, I could lean into Apple's native MLX framework and let the Mac GPU handle the full whisper-large-v3 model. No cloud account. No per-minute fees. Just your hardware.…
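The post does not show ytx's code, so here is a minimal sketch of what a local transcription step with mlx-whisper could look like. It assumes the `mlx-whisper` Python package (`mlx_whisper.transcribe`) and the `mlx-community/whisper-large-v3-mlx` model conversion on Hugging Face. The helper names `transcribe_local`, `drop_noisy_segments` and `to_plain_text`, as well as the filter thresholds, are hypothetical. The filter is one possible way to tackle the background-noise gibberish described above: it uses the per-segment fields Whisper already returns (`no_speech_prob`, `avg_logprob`, `compression_ratio`).

```python
"""Hypothetical sketch of a local transcription step with mlx-whisper.

Assumptions, not taken from the ytx post: the model repo id, the helper
names, and the noise-filter thresholds are illustrative only.
"""
from typing import Any

# Assumed model id: a community MLX conversion of whisper-large-v3 on Hugging Face.
DEFAULT_MODEL = "mlx-community/whisper-large-v3-mlx"

Segment = dict[str, Any]


def transcribe_local(audio_path: str, model: str = DEFAULT_MODEL,
                     translate: bool = False) -> dict[str, Any]:
    """Run Whisper on the Mac GPU via MLX (needs Apple Silicon and ffmpeg)."""
    # Imported inside the function so the pure helpers below work without MLX.
    import mlx_whisper

    return mlx_whisper.transcribe(
        audio_path,
        path_or_hf_repo=model,
        # Whisper's "translate" task only translates *into English*.
        task="translate" if translate else "transcribe",
    )


def drop_noisy_segments(segments: list[Segment],
                        max_no_speech_prob: float = 0.5,
                        min_avg_logprob: float = -0.8,
                        max_compression_ratio: float = 2.4) -> list[Segment]:
    """Drop segments that look like noise or hallucinated text (assumed thresholds)."""
    kept = []
    for seg in segments:
        # Likely silence/background noise that the decoder turned into words anyway.
        if (seg["no_speech_prob"] > max_no_speech_prob
                and seg["avg_logprob"] < min_avg_logprob):
            continue
        # Highly compressible text is usually a repetition loop, not real speech.
        if seg["compression_ratio"] > max_compression_ratio:
            continue
        kept.append(seg)
    return kept


def to_plain_text(segments: list[Segment]) -> str:
    """Join segment texts into one transcript string for a downstream model."""
    return " ".join(seg["text"].strip() for seg in segments)
```

On an Apple Silicon Mac with `mlx-whisper` and ffmpeg installed, `to_plain_text(drop_noisy_segments(transcribe_local("lecture.mp3")["segments"]))` would return a cleaned transcript ready to hand to a model. Keeping the filter separate from the MLX call means the thresholds can be tuned and tested on any machine. Whisper's built-in `translate` task covers the non-English case, but only into English, and the post does not say whether ytx relies on it.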
