If you manage a realtime application, you know that Redis is often the beating heart of your infrastructure. Recently, our production application, which relies heavily on Redis for both backend caching and realtime collaboration (via Hocuspocus/Yjs), experienced a bizarre and catastrophic outage. Every few months, with no apparent trigger, Redis would bring our whole system down. The logs were flooded with a single, confusing error:

READONLY You can't write against a read only replica

The symptoms were severe: writes failed entirely, reads stopped working, and the entire realtime system came to a grinding halt. Restarting the Docker container fixed the issue immediately, but without a root cause, it was only a matter of time before it happened again.

Here is a step-by-step breakdown of how I investigated, debugged, and ultimately solved this elusive Redis bug.

Step 1: Evaluating the Infrastructure

Before diving into logs, I needed to confirm exactly what our architecture looked like.…