htdemucs vs BS-RoFormer vs Spleeter: A 2026 Audio Source Separation Benchmark


DEV Community·codesugar lin·about 1 month ago


#ai #deeplearning #time #htdemucs #stem #roformer


If you've spent any time looking at AI music separation in the last twelve months, you've probably run into the same three names: Spleeter, htdemucs (Hybrid Transformer Demucs), and BS-RoFormer. They show up in every comparison post, every research paper, and every "how to extract vocals" tutorial, but the way they're compared is usually wrong. Most posts cite a single SDR number from a 2019 paper and call it a day. That's not useful if you're trying to ship a product, build a pipeline, or pick a model for real audio.

This post compares the three on the dimensions that actually matter when you're deploying audio separation:

- Quality: SDR scores from peer-reviewed sources, not vibes
- Inference speed: what you'll actually wait for in production
- Cost per song: running on commodity GPUs at 2026 prices
- Output flexibility: 2 stems vs 4 stems vs 6 stems
- When each one is the right choice, and when it isn't

Everything below is based on published benchmarks plus our own production deployment of htdemucs at…
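For readers who haven't met the metric before, here is a minimal sketch of the basic signal-to-distortion ratio (SDR) that the quality comparison refers to, written in plain Python. The function name `sdr_db` and the toy signals are illustrative assumptions, not taken from the article, and published benchmarks typically rely on dedicated evaluation toolkits with their own windowing and aggregation rules, which this sketch does not reproduce.

```python
import math


def sdr_db(reference, estimate):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    `reference` is the ground-truth stem, `estimate` is the separated stem.
    Both are equal-length sequences of float samples (hypothetical inputs).
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")

    # Energy of the true signal and of the residual error.
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))

    if error_energy == 0:
        return math.inf   # perfect reconstruction
    if signal_energy == 0:
        return -math.inf  # silent reference, nothing to recover
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example: the estimate is the reference scaled down by 10%.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [0.9, -0.9, 0.9, -0.9]
print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
```

Higher is better: each additional 10 dB means the residual error energy is ten times smaller relative to the true stem.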
