htdemucs vs BS-RoFormer vs Spleeter: A 2026 Audio Source Separation Benchmark

DEV Community · codesugar lin · about 1 month ago
#ai #deeplearning #time #htdemucs #stem #roformer

If you've spent any time looking at AI music separation in the last twelve months, you've probably run into the same three names: Spleeter, htdemucs (Hybrid Transformer Demucs), and BS-RoFormer. They show up in every comparison post, every research paper, and every "how to extract vocals" tutorial, but the way they're compared is usually wrong. Most posts cite a single SDR (signal-to-distortion ratio) number from a 2019 paper and call it a day. That's not useful if you're trying to ship a product, build a pipeline, or pick a model for real audio.

This post compares the three on the dimensions that actually matter when you're deploying audio separation:

- Quality: SDR scores from peer-reviewed sources, not vibes
- Inference speed: what you'll actually wait for in production
- Cost per song: running on commodity GPUs at 2026 prices
- Output flexibility: 2 stems vs 4 stems vs 6 stems
- When each one is the right choice, and when it isn't

Everything below is based on published benchmarks plus our own production deployment of htdemucs at…
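SDR (signal-to-distortion ratio) is the headline quality number in these comparisons, so it helps to see what it measures. The sketch below assumes the simplified per-track definition, 10·log10(‖s‖² / ‖s − ŝ‖²), similar to the one used in recent music-demixing challenges. It is not the full BSS Eval v4 computation (as implemented in museval), which uses framewise medians and additional distortion terms, so its output will not match published leaderboard numbers exactly. The function name `sdr` and the `eps` guard are our own illustrative choices.

```python
import numpy as np


def sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Simplified signal-to-distortion ratio in dB.

    Computes 10 * log10(||s||^2 / ||s - s_hat||^2) over the whole track.
    `eps` keeps the ratio finite for silent references or perfect estimates.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")

    signal_energy = np.sum(reference ** 2)                # ||s||^2
    error_energy = np.sum((reference - estimate) ** 2)    # ||s - s_hat||^2
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


# Example: an estimate at 90% amplitude leaves an error of 10% of the signal,
# so the energy ratio is 1 / 0.01 = 100, i.e. about 20 dB.
t = np.linspace(0.0, 1.0, 44100, endpoint=False)
vocals = np.sin(2 * np.pi * 440.0 * t)
print(round(sdr(vocals, 0.9 * vocals), 2))
```

Higher is better, and every 10 dB corresponds to a tenfold drop in residual error energy relative to the reference stem.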

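Inference speed and cost per song both reduce to simple arithmetic once you have a measured processing time for a track. The sketch below assumes a GPU billed per hour and counts only the time spent separating the song (no model loading, idle time, or batching effects). The helper names `real_time_factor` and `cost_per_song` are hypothetical, and the numbers in the usage example are placeholders, not results from this benchmark.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def cost_per_song(processing_seconds: float, gpu_usd_per_hour: float) -> float:
    """GPU cost of separating one song, billing only the seconds spent processing it."""
    return gpu_usd_per_hour * processing_seconds / 3600.0


# Hypothetical inputs: a 240 s track separated in 12 s on a GPU billed at $0.60/hour.
rtf = real_time_factor(12.0, 240.0)
cost = cost_per_song(12.0, 0.60)
print(f"RTF = {rtf:.3f}, cost = ${cost:.4f} per song")
```

Keeping the two numbers separate makes the trade-off explicit: a slower model can still be cheap per song on an inexpensive GPU, while the real-time factor tells you how long a user will wait.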