The Day Veltrix Blew Up at 100k Concurrent Users Because We Didn't Understand Its Garbage Collector

DEV Community: rust·pretty ncube·2 days ago
#dev #latency #live #heap #pause #load

It was 3:17 AM when the pager screamed. Our Rust-based treasure-hunt matchmaking service had been live for six weeks with steady load under 50k concurrent users, but overnight a new batch of streamers discovered the game. By 03:15 we were at 98k and climbing, and at 03:17 the heap spiked from 1.2 GB to 11 GB in 120 seconds. The Prometheus graphs painted a vertical cliff: allocation rate 780 MB/s, pause times above 500 ms, match latency P99 jumping from 22 ms to 1.4 s. The logs repeated the same line every 400 ms: "GC cycle started (heap size 11.3 GB, live data 384 MB)". By 03:22 two regions had hit GC mark-termination timeouts, the runtime emitted "promise failed to resolve in time", and we dropped 28k concurrent users in the span of two minutes. Not a crash, just a silent, creeping death by garbage collection. We had started with Veltrix's official YAML configuration for the Tokio runtime: worker_threads: 8, max_blocking_threads: 512, keep_alive: 60s, capacity: 10000. That was the only tuning guide the docs provided.…
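
To put the numbers above side by side, here is a small back-of-the-envelope sketch. It uses only figures quoted in the paragraph; the function names are made up for illustration, and it assumes 1 GB = 1000 MB and that the 780 MB/s allocation rate held roughly constant over the 120-second spike, neither of which the post states.

```rust
// Figures quoted in the incident description.
const HEAP_BEFORE_GB: f64 = 1.2;
const HEAP_AFTER_GB: f64 = 11.0;
const SPIKE_SECONDS: f64 = 120.0;
const ALLOC_RATE_MB_S: f64 = 780.0;
const LOGGED_HEAP_GB: f64 = 11.3;
const LOGGED_LIVE_MB: f64 = 384.0;

// Assumption (not from the post): decimal units, 1 GB = 1000 MB.
const MB_PER_GB: f64 = 1000.0;

/// Net heap growth during the spike, in MB/s.
fn net_heap_growth_mb_per_s() -> f64 {
    (HEAP_AFTER_GB - HEAP_BEFORE_GB) * MB_PER_GB / SPIKE_SECONDS
}

/// Fraction of the heap that the logged GC line reports as live data.
fn live_fraction() -> f64 {
    LOGGED_LIVE_MB / (LOGGED_HEAP_GB * MB_PER_GB)
}

/// Share of allocated bytes that stayed on the heap during the spike,
/// assuming the quoted allocation rate was constant over the window.
fn retained_share_of_allocations() -> f64 {
    net_heap_growth_mb_per_s() / ALLOC_RATE_MB_S
}

fn main() {
    println!("net heap growth: {:.1} MB/s", net_heap_growth_mb_per_s());
    println!("live fraction of heap: {:.1}%", live_fraction() * 100.0);
    println!(
        "allocations retained: {:.1}%",
        retained_share_of_allocations() * 100.0
    );
}
```

Under those assumptions the heap grew by roughly 82 MB/s, about a tenth of what was being allocated, while the logged live data was only about 3.4% of the heap. In other words, the collector was running constantly over a heap that was almost entirely garbage and still not keeping up.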
