How human feedback actually steers TTS fine-tuning

DEV Community·Turbo Electric·24 days ago

#machinelearning #tts #ai #training #epoch #listening

Notes on the iteration loop we ran while fine-tuning F5-TTS and StyleTTS2 on a small Northern English corpus. The headline finding is that the listening test isn't optional polish at the end: it's the only measurement that catches the failure modes that matter, and each round of listening produces specific phonetic observations that map to specific engineering decisions. This is a write-up of the methodology, with the concrete examples that forced each decision.

The loop

┌────────────────────────┐
│ render passage         │
│ (baseline + ft)        │
└──────────┬─────────────┘
           ▼
┌────────────────────────┐        a feature is "right" if a native
│ human listens against  │        speaker recognises it. Record both
│ marker list (BATH,     │ ◀───── what's working AND what's broken;
│ FOOT-STRUT, …)         │        both are signal.
└──────────┬─────────────┘
           ▼
┌────────────────────────┐        translate audible features
│ diagnose: why is the   │        to training-side cause:
│ output the way it is?… │
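The listening step is essentially structured bookkeeping: every marker gets a verdict for both the baseline and the fine-tuned render, and what works is logged alongside what breaks. Below is a minimal sketch of such a log, assuming one boolean per marker ("did a native listener recognise the feature?") plus an optional free-text phonetic note. The `ListeningRound` class, its method names and the four buckets are hypothetical illustrations, not the authors' tooling and not part of F5-TTS or StyleTTS2.

```python
from dataclasses import dataclass, field


@dataclass
class ListeningRound:
    """One pass of the loop: the same passage rendered by baseline and fine-tuned models.

    Hypothetical structure for illustration. Each marker (e.g. "BATH", "FOOT-STRUT")
    maps to True if a native listener recognised the feature as right, else False.
    """
    passage_id: str
    baseline: dict[str, bool] = field(default_factory=dict)
    finetuned: dict[str, bool] = field(default_factory=dict)
    notes: dict[str, str] = field(default_factory=dict)  # free-text phonetic observations

    def record(self, marker: str, baseline_ok: bool, finetuned_ok: bool, note: str = "") -> None:
        """Log one marker's verdict for both renders, plus an optional note."""
        self.baseline[marker] = baseline_ok
        self.finetuned[marker] = finetuned_ok
        if note:
            self.notes[marker] = note

    def summary(self) -> dict[str, list[str]]:
        """Bucket markers by how they changed; working and broken features are both kept."""
        buckets: dict[str, list[str]] = {
            "fixed": [], "regressed": [], "still_right": [], "still_broken": [],
        }
        for marker in sorted(self.finetuned):
            before, after = self.baseline[marker], self.finetuned[marker]
            if after and not before:
                buckets["fixed"].append(marker)
            elif before and not after:
                buckets["regressed"].append(marker)
            elif after:
                buckets["still_right"].append(marker)
            else:
                buckets["still_broken"].append(marker)
        return buckets


# Illustrative values only, not results from the article.
round_1 = ListeningRound("passage-01")
round_1.record("BATH", baseline_ok=False, finetuned_ok=True,
               note="short /a/ heard in 'bath'")
round_1.record("FOOT-STRUT", baseline_ok=False, finetuned_ok=False)
print(round_1.summary())
# → {'fixed': ['BATH'], 'regressed': [], 'still_right': [], 'still_broken': ['FOOT-STRUT']}
```

Keeping `still_right` and `regressed` as separate buckets is one way to act on "both are signal": a marker that stays right across rounds marks behaviour the next training change should preserve, while a regression points straight at whatever changed since the previous render.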
