Making Rate Limiting Correct Under Concurrency

Most rate limiting tutorials stop at the single-instance case. That’s fine for learning, but it breaks quickly in production.

Once you have multiple instances and real traffic patterns, the problem changes. It’s no longer just about picking an algorithm; it’s about correctness under concurrency.

This article walks through what actually goes wrong and how to fix it.

The In-Memory Trap

The first implementation most people write looks like this:

- keep a counter in memory
- increment on each request
- reject when the limit is reached

This works perfectly in a single instance.

Now deploy two instances. Each instance has its own counter. A client can exceed your intended limit just by hitting different instances.

At that point, you don’t have a rate limiter anymore. You have a suggestion.

Redis Fixes Distribution, Not Concurrency

The next step is moving state to Redis. Now all instances share the same counters. Good.…
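To make the difference between per-instance and shared counters concrete, here is a minimal Python sketch. It is illustrative only: `SharedStore` is a hypothetical in-process stand-in for a shared store such as Redis, not a Redis client, and the class names, the limit of 5, and the round-robin routing are all assumptions made for the example. Window resets are left out to keep it short, and the code is single-threaded, so it shows only the distribution problem, not any concurrency issue.

```python
class InMemoryLimiter:
    """One service instance with its own private counter per client."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = {}  # client_id -> requests seen by THIS instance only

    def allow(self, client_id):
        count = self.counts.get(client_id, 0)
        if count >= self.limit:
            return False
        self.counts[client_id] = count + 1
        return True


class SharedStore:
    """Hypothetical stand-in for a shared counter store (e.g. Redis)."""

    def __init__(self):
        self.counts = {}

    def incr(self, key):
        # Increment the counter and return its new value.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


class SharedLimiter:
    """One service instance that keeps its counters in a shared store."""

    def __init__(self, limit, store):
        self.limit = limit
        self.store = store

    def allow(self, client_id):
        return self.store.incr(client_id) <= self.limit


def count_allowed(instances, client_id, requests):
    """Send requests round-robin across instances; return how many pass."""
    allowed = 0
    for i in range(requests):
        if instances[i % len(instances)].allow(client_id):
            allowed += 1
    return allowed


if __name__ == "__main__":
    # Two instances, private counters: each one allows 5 on its own.
    private = [InMemoryLimiter(5), InMemoryLimiter(5)]
    print(count_allowed(private, "client-a", 20))  # 10

    # Two instances, one shared store: the limit holds across both.
    store = SharedStore()
    shared = [SharedLimiter(5, store), SharedLimiter(5, store)]
    print(count_allowed(shared, "client-a", 20))  # 5
```

With private counters, the effective limit scales with the number of instances the client can reach. With a shared store, it stays at the configured value, at least as long as requests arrive one at a time, as they do in this sketch.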