Goroutines To OS Threads: The 73% Latency Drop We Measured By Promoting Work

When Go’s scheduler becomes the bottleneck: detecting and fixing the hidden costs of M:N threading

Promoting critical work to dedicated OS threads bypasses scheduler contention: direct kernel scheduling eliminates goroutine multiplexing overhead for latency-sensitive operations.

So there’s this thing about goroutines that’s been bothering me for months now. Actually, wait, let me back up. You know how everyone says “use goroutines, they’re lightweight, they’re amazing”? Yeah, well, turns out that’s not always true. I mean, it IS true, but… okay, let me just start from the beginning.

Our real-time trading system had this puzzling problem. P99 latency was sitting at 47ms while our profiler kept screaming that we were only doing 12ms of actual work. Where the hell were those other 35 milliseconds going?…
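
Quick aside, so the phrase "promoting work to a dedicated OS thread" isn't just hand-waving: here is a minimal sketch of what that can look like in Go, assuming the mechanism is the standard `runtime.LockOSThread`. The `pinnedWorker` type and its `Submit` and `Close` methods are hypothetical names made up for this illustration, not code from our trading system, and the handoff number it prints will vary from machine to machine.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// pinnedWorker is a hypothetical helper for illustration only: it owns one
// goroutine that is locked to its own OS thread and runs submitted tasks there.
type pinnedWorker struct {
	tasks chan func()
	done  chan struct{}
}

// newPinnedWorker starts the worker goroutine and wires it to an OS thread.
func newPinnedWorker(buffer int) *pinnedWorker {
	w := &pinnedWorker{
		tasks: make(chan func(), buffer),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(w.done)
		// After this call, this goroutine always runs on the same OS thread,
		// and no other goroutine runs on that thread until we unlock.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for task := range w.tasks {
			task()
		}
	}()
	return w
}

// Submit queues a task for the pinned goroutine.
func (w *pinnedWorker) Submit(task func()) { w.tasks <- task }

// Close stops accepting tasks and waits for the worker to drain and exit.
func (w *pinnedWorker) Close() {
	close(w.tasks)
	<-w.done
}

func main() {
	w := newPinnedWorker(16)
	defer w.Close()

	// Measure how long it takes for a task to start running on the pinned thread.
	start := time.Now()
	elapsed := make(chan time.Duration, 1)
	w.Submit(func() { elapsed <- time.Since(start) })
	fmt.Println("handoff latency:", <-elapsed)
}
```

The design choice to notice is that the latency-sensitive work gets a thread it never has to share with other goroutines. Whether that actually helps depends on the workload, so treat this as a starting point for measurement rather than a fix.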