The Day the JVM Died Under 200k QPS and Lived to Tell the Story


DEV Community: rust·pretty ncube·2 days ago


#dev #rust #side #scheduler #first #time


The Problem We Were Actually Solving

At Veltrix our treasure hunt engine was running on the JVM stack: OpenJDK 17, GraalVM Native Image, and a Kotlin coroutine pipeline. By month six the pipeline was falling apart at 180k QPS. Not the CPU cliff, the GC pause cliff. Every 250 ms the ZGC cycle would spike 18 ms and half the player actions timed out. The heap was 16 GB, but NewSpace had a 300 ms evacuation window at that load. We'd tuned everything: -XX:MaxGCPauseMillis=20, transparent huge pages, isolated cores. Still the coroutine scheduler would block during the safepoint, and clients would see 503s on /hunt/next.

What We Tried First (And Why It Failed)

First we threw money at the JVM. We moved to Azul Zulu Prime with 4 ms pauseless guarantees. That cut safepoint time to 4 ms, but the allocation rate was 14 GB/s and the nursery still evacuated before it could finish. We tried Shenandoah on Amazon Corretto; same pause cliff, just shifted to final marking.…
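To put the figures quoted above in perspective, here is a rough back-of-envelope sketch in Rust. It is not code from the Veltrix system. The function names are made up for illustration, and the model is deliberately simple: it assumes requests arrive uniformly and that every request arriving during a pause has to wait. With the post's numbers, an 18 ms pause at 180k QPS catches about 3,240 requests, a pause every 250 ms means roughly 7.2% of wall-clock time is spent stopped, and a 14 GB/s allocation rate would fill a 16 GB heap in a little over a second if nothing were reclaimed.

```rust
/// Requests that arrive while the application is paused.
/// Assumption: arrivals are spread uniformly over time.
fn requests_during_pause(qps: f64, pause_ms: f64) -> f64 {
    qps * pause_ms / 1000.0
}

/// Fraction of wall-clock time spent paused when a pause of
/// `pause_ms` recurs once every `period_ms`.
fn pause_fraction(pause_ms: f64, period_ms: f64) -> f64 {
    pause_ms / period_ms
}

/// Seconds until the heap is full at a steady allocation rate,
/// assuming (pessimistically) that nothing is reclaimed meanwhile.
fn seconds_to_fill_heap(heap_gb: f64, alloc_gb_per_s: f64) -> f64 {
    heap_gb / alloc_gb_per_s
}

fn main() {
    // Figures taken from the post: 180k QPS, 18 ms spike every 250 ms,
    // 16 GB heap, 14 GB/s allocation rate.
    let stalled = requests_during_pause(180_000.0, 18.0);
    let paused = pause_fraction(18.0, 250.0);
    let fill = seconds_to_fill_heap(16.0, 14.0);

    println!("requests arriving during one pause: {:.0}", stalled);
    println!("share of time spent paused: {:.1}%", paused * 100.0);
    println!("time to fill heap with no reclamation: {:.2} s", fill);
}
```

The point of the sketch is only that a pause measured in milliseconds still translates into thousands of queued requests at this load, which is consistent with the timeouts described above.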
