How human feedback actually steers TTS fine-tuning

DEV Community·Turbo Electric·24 days ago

Notes on the iteration loop we ran while fine-tuning F5-TTS and StyleTTS2 on a small Northern English corpus.

The headline finding is that the listening test isn't optional polish at the end: it's the only measurement that catches the failure modes that matter, and each round of listening produces specific phonetic observations that map to specific engineering decisions. This is a write-up of the methodology, with the concrete examples that forced each decision.

The loop

┌────────────────────────┐
│ render passage         │
│ (baseline + ft)        │
└──────────┬─────────────┘
           ▼
┌────────────────────────┐        a feature is "right" if a native
│ human listens against  │        speaker recognises it. Record both
│ marker list (BATH,     │ ◀───── what's working AND what's broken;
│ FOOT-STRUT, …)         │        both are signal.
└──────────┬─────────────┘
           ▼
┌────────────────────────┐        translate audible features
│ diagnose: why is the   │        to training-side cause:
│ output the way it is?…
