I'm tuning a custom proxy on a single ARM64 core and hitting a pretty bad throughput wall. My framing benchmarks look fine: encode is around 12.8 GB/s with 1 alloc, and the round trip sits at about 4.3 GB/s with 2 allocs. But once traffic hits the muxer, throughput drops hard to ~530 MB/s with 5 allocs.

Why does this happen? Is it more likely channel contention, lock blocking, or the garbage collector getting stressed by those 5 allocs under load?

submitted by /u/No-Condition-2137