Zero-Downtime Postgres Migrations: The Mistakes That Locked My Production Database

DEV Community·Alex Cloudstar·18 days ago

The first production database migration I ran that broke things took down an internal tool for forty-two minutes. The migration looked harmless. It added a NOT NULL column to a table with thirty-eight million rows. I ran it on a Wednesday afternoon, watched it sit at "pending" for a few seconds, then watched our entire app stop responding. Postgres was rewriting the table. Every read and write was queued behind an ACCESS EXCLUSIVE lock. I had no idea this would happen because in development the same migration ran in two hundred milliseconds.

That was the day I learned the difference between a migration that works on a small table and a migration that works on a real production database. They are not the same operation. They have different cost models, different failure modes, and a different blast radius. The Postgres docs describe the locking behaviour of every command, but you have to know to look. Most ORM migration tutorials do not even mention locks.…
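To make the locking problem concrete, here is a minimal sketch of one commonly used way to add a NOT NULL column in small, short-lock steps. It is not taken from the article, whose own recommendations are cut off above. The helper `safe_not_null_column_steps`, the table `orders`, the column `region`, the batch size, and the presence of an `id` primary key are all hypothetical. The helper only builds the SQL strings and does not connect to a database. It assumes PostgreSQL 12 or later, where `SET NOT NULL` can skip the full table scan if a validated CHECK constraint already proves the column has no NULLs.

```python
from typing import List


def safe_not_null_column_steps(
    table: str,
    column: str,
    col_type: str,
    default_sql: str,
    lock_timeout: str = "5s",
    batch_size: int = 10000,
) -> List[str]:
    """Return ordered SQL statements for adding a NOT NULL column in short-lock steps.

    Hypothetical helper: identifiers are interpolated directly, so it is only
    suitable for trusted, hard-coded names. Assumes the table has an `id` key.
    """
    check = f"{table}_{column}_not_null"
    return [
        # Give up quickly instead of queuing every other query behind this session.
        f"SET lock_timeout = '{lock_timeout}';",
        # Adding a nullable column without a default needs only a brief lock.
        f"ALTER TABLE {table} ADD COLUMN {column} {col_type};",
        # Backfill in batches; rerun this statement until it updates zero rows.
        f"UPDATE {table} SET {column} = {default_sql} WHERE id IN "
        f"(SELECT id FROM {table} WHERE {column} IS NULL LIMIT {batch_size});",
        # NOT VALID skips the scan of existing rows when the constraint is added.
        f"ALTER TABLE {table} ADD CONSTRAINT {check} "
        f"CHECK ({column} IS NOT NULL) NOT VALID;",
        # Validation scans the table but does not block normal reads and writes.
        f"ALTER TABLE {table} VALIDATE CONSTRAINT {check};",
        # On PostgreSQL 12+ the validated constraint lets this skip the full scan.
        f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL;",
        # The CHECK constraint is now redundant.
        f"ALTER TABLE {table} DROP CONSTRAINT {check};",
    ]


if __name__ == "__main__":
    for statement in safe_not_null_column_steps("orders", "region", "text", "'unknown'"):
        print(statement)
```

Each statement would normally run in its own transaction, so that no single step holds an exclusive lock for longer than the timeout allows.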
