The Auth Gateway sits in front of every authenticated request in the platform. Its latency isn't just its own latency; it's the floor for every service behind it. If auth takes 50ms, every request to every upstream service starts 50ms in the hole.

Our internal target is sub-millisecond on cache-hot paths. The way we hit it isn't clever algorithms. It's a stack of small caches, each one handling a different kind of state, each invalidated through a different channel. This post walks through all of them.

The principle that shapes everything

Before the individual layers: a rule we hold as policy.

Redis is allowed to influence the hot path. Redis is not allowed to block it.

Every cache in the system is in-process. Redis feeds them asynchronously: pushing revocation events, triggering trie reloads, syncing SA versions. But a pod whose Redis connection is dead can still answer requests correctly, for the duration of its staleness window.…
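
To make the rule concrete, here is a minimal sketch of one such cache in Python. It is an illustration, not the gateway's actual code: the names LocalRevocationCache, apply_events, within_staleness_window, and ManualClock are hypothetical, as is the 30-second window, and the Redis subscriber that would call apply_events from a background thread is left out. The point it shows is the split between a read path that only touches process memory and a feed path that runs off the request path.

```python
import time
from typing import Callable, FrozenSet, Iterable


class LocalRevocationCache:
    """In-process set of revoked token IDs (hypothetical sketch).

    Reads never do I/O. A background feeder (for example, a thread
    consuming a Redis subscription, not shown here) pushes updates in.
    """

    def __init__(self, staleness_window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._revoked: FrozenSet[str] = frozenset()
        self._staleness_window_s = staleness_window_s
        self._clock = clock
        self._last_sync = clock()

    def is_revoked(self, token_id: str) -> bool:
        # Hot path: a lookup in an immutable in-memory snapshot.
        # Nothing here can wait on Redis.
        return token_id in self._revoked

    def apply_events(self, revoked_ids: Iterable[str]) -> None:
        # Feed path: build a new snapshot and swap the reference, so
        # readers always see either the old set or the new one.
        self._revoked = self._revoked | frozenset(revoked_ids)
        self._last_sync = self._clock()

    def within_staleness_window(self) -> bool:
        # True while the last successful feed is recent enough that
        # local answers are still considered trustworthy.
        return self._clock() - self._last_sync <= self._staleness_window_s


class ManualClock:
    """Demo-only clock so the example does not need to sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = ManualClock()
    cache = LocalRevocationCache(staleness_window_s=30.0, clock=clock)
    cache.apply_events(["tok-1"])      # feeder delivers one revocation

    clock.now = 10.0                   # Redis goes quiet for 10 seconds
    print(cache.is_revoked("tok-1"), cache.within_staleness_window())

    clock.now = 31.0                   # quiet for longer than the window
    print(cache.is_revoked("tok-1"), cache.within_staleness_window())
```

What a pod should do once within_staleness_window() turns false is a policy decision this sketch deliberately leaves to the caller.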