Cohere just open-sourced a 5.42 WER speech model - here's what testing it on real audio showed

DEV Community·Jim L·about 1 month ago

#ai #machinelearning #nlp #whisper #cohere #large

Cohere released their new ASR model on March 26 with a 5.42% Word Error Rate (WER) on the LibriSpeech test-clean benchmark. That's a noticeable improvement over Whisper-large-v3 (~5.7%), and since it's open-source under a permissive license, I spent the last two weeks running it through real-world audio to see whether the benchmark numbers translate. The short answer: yes for clean studio audio, partially for noisy real-world recordings, and not yet for code-switched conversations.

What's actually new

Cohere's transcribe model departs from Whisper's design: both are encoder-decoder transformers, but Cohere's uses a lighter decoder. Key claims from the release notes:

- 5.42% WER on LibriSpeech test-clean
- Roughly 30% faster inference than Whisper-large-v3 at similar batch sizes
- Released with weights + inference code (not API-only)
- Supports streaming via chunked inference

The "30% faster" caveat: this assumes you're running on the same hardware Cohere benchmarked.…
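For anyone who hasn't computed WER themselves: it is the word-level edit distance (substitutions + deletions + insertions) between the model's transcript and the reference, divided by the number of reference words. Below is a minimal Python sketch. It only lowercases and splits on whitespace, whereas published benchmark numbers apply heavier text normalization (punctuation, numerals, spelling variants), so this helper won't reproduce leaderboard figures exactly. `word_error_rate` and `relative_reduction` are illustrative names of my own, not part of any library. The second helper puts the headline gap in relative terms: going from ~5.7% to 5.42% is roughly a 4.9% relative reduction.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Simplified normalization (lowercase + whitespace split) for illustration only.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein over words: prev[j] is the edit distance between
    # the reference prefix handled so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction of `new` versus `baseline`, as a fraction."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    ref = "the quick brown fox jumps over the lazy dog"
    hyp = "the quick brown fox jumped over a lazy dog"
    # Two substitutions ("jumps"->"jumped", "the"->"a") over 9 reference words.
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    print(f"Relative reduction vs. Whisper: {relative_reduction(5.7, 5.42):.1%}")
```

A 0.28-point absolute gap sounds small, but at these error levels it means about one in twenty of Whisper's word errors disappears, which is why it's worth checking on your own audio rather than on LibriSpeech alone.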
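The release notes only say streaming works "via chunked inference" without describing the mechanism. A common way to stream with an offline ASR model is to slide a fixed-length window over the audio with some overlap and decode each window as it becomes available. The Python sketch below assumes 16 kHz mono samples, 10-second windows with 1 second of overlap, and a hypothetical `transcribe_chunk` callable standing in for the model. It is not Cohere's inference code, and it deliberately skips the hard part: de-duplicating words that appear in two overlapping windows.

```python
from typing import Callable, Iterator, Sequence, Tuple


def iter_chunks(
    samples: Sequence[float],
    sample_rate: int = 16_000,  # assumption: 16 kHz mono input
    chunk_s: float = 10.0,      # assumption: 10 s windows
    overlap_s: float = 1.0,     # assumption: 1 s overlap between windows
) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (start_index, chunk) windows of chunk_s seconds overlapping by overlap_s."""
    chunk_len = int(chunk_s * sample_rate)
    step = chunk_len - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the chunk")
    start = 0
    while start < len(samples):
        yield start, samples[start:start + chunk_len]
        if start + chunk_len >= len(samples):
            break  # this window already reaches the end of the audio
        start += step


def stream_transcribe(
    samples: Sequence[float],
    transcribe_chunk: Callable[[Sequence[float]], str],  # hypothetical model call
    sample_rate: int = 16_000,
) -> Iterator[Tuple[float, str]]:
    """Emit (start_time_seconds, partial_text) as each window is decoded."""
    for start, chunk in iter_chunks(samples, sample_rate):
        yield start / sample_rate, transcribe_chunk(chunk)


if __name__ == "__main__":
    audio = [0.0] * (25 * 16_000)  # 25 s of silence as placeholder input
    fake_model = lambda chunk: f"<{len(chunk)} samples>"  # stand-in for real ASR
    for t, text in stream_transcribe(audio, fake_model):
        print(f"{t:5.1f}s  {text}")
```

With 25 seconds of input this produces windows starting at 0 s, 9 s and 18 s. The overlap is what lets a real implementation avoid cutting words in half at window boundaries, at the cost of decoding about 10% of the audio twice.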
