TL;DR: The resident set size on my finite element solver hit 8GB during a full run on a 10-million-node mesh. The working set (the actual data I was pushing through the solver) should have been closer to 3GB.

What's in this article

- The Problem: My Simulation Was Leaking 4GB Per Run and I Had No Idea Why
- Quick Background: What Box Actually Does at the Allocator Level
- Technique 1: Swapping the Global Allocator (jemalloc vs mimalloc)
- Technique 2: Box<[T]> and Flattening Vec<Vec<T>> Into Contiguous Memory
- Technique 3: Arena Allocation to Eliminate Per-Box Overhead
  - The Problem with 10,000 Individual Allocations
- Technique 4: Pinning and Box> for Self-Referential Structures
  - The Real Reason You Encounter Pin in HPC Code

The Problem: My Simulation Was Leaking 4GB Per Run and I Had No Idea Why

The resident set size on my finite element solver hit 8GB during a full run on a 10-million-node mesh.…