
Mozilla Common Voice Meets Wikidata (How the Dagbanli Dictionary Got Audio Usage Examples)


Wikidata provides pronunciation audio for words; Mozilla Common Voice provides spoken example sentences. Mozilla Common Voice has thousands of Dagbanli sentences with native‑speaker audio. We built a pipeline to match these sentences to dictionary words, creating audio‑rich usage examples.

Introduction

Wikidata gave us the bones for the Dagbanli dictionary: Lexemes, Senses, Forms, and even the pronunciation audio of words. But a word without context is just… a word. Beyond listening to how a word sounds, users need to see how it is used in real life. Hearing it spoken by a real person in an everyday setting is even better.

Wikidata itself already supports usage examples through the property P5831. Those are valuable, but they are text only. Mozilla Common Voice provides something different: thousands of spoken sentences, each with an audio file recorded by a native speaker. We built a pipeline to connect these audio‑rich sentences to the right dictionary entries.…
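To make the matching idea concrete, here is a minimal sketch of how sentences could be linked to dictionary entries by looking up each sentence token among the known word forms. It is not the pipeline described in this post; the function names, the lexeme IDs, and the sample clips are hypothetical placeholders (the tokens are not real Dagbanli), and the `path` and `sentence` keys are assumed to mirror the columns of a Common Voice TSV export.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Apply Unicode NFC and case folding so forms and tokens compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


def tokenize(sentence):
    """Split a sentence into normalized word tokens, dropping punctuation."""
    return re.findall(r"\w+", normalize(sentence))


def match_sentences(forms_by_lexeme, clips):
    """Return {lexeme_id: [clip, ...]} for clips whose sentence contains a form.

    forms_by_lexeme: hypothetical mapping of lexeme ID -> list of word forms.
    clips: list of dicts with assumed keys "path" (audio file) and "sentence".
    """
    # Reverse index: normalized form -> lexeme IDs that own that form.
    form_index = defaultdict(set)
    for lexeme_id, forms in forms_by_lexeme.items():
        for form in forms:
            form_index[normalize(form)].add(lexeme_id)

    examples = defaultdict(list)
    for clip in clips:
        matched = set()  # attach each clip to a given lexeme at most once
        for token in tokenize(clip["sentence"]):
            for lexeme_id in form_index.get(token, ()):
                if lexeme_id not in matched:
                    matched.add(lexeme_id)
                    examples[lexeme_id].append(clip)
    return dict(examples)


# Placeholder data for illustration only.
sample_forms = {"L1": ["alpha", "alphas"], "L2": ["beta"]}
sample_clips = [
    {"path": "clip_1.mp3", "sentence": "Alpha beta."},
    {"path": "clip_2.mp3", "sentence": "Gamma alpha!"},
]

usage_examples = match_sentences(sample_forms, sample_clips)
for lexeme_id, matched_clips in sorted(usage_examples.items()):
    print(lexeme_id, [clip["path"] for clip in matched_clips])
```

Exact token lookup is the simplest possible choice. It cannot tell apart different words that share a spelling, and it misses any form that is not listed for a lexeme, so a real pipeline would likely need additional filtering or review.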
