Tracing a 2s Latency Spike to a Single SQL Query

DEV Community·wheresthelag·27 days ago

#apm #postgres #webdev #monitoring #logs #query

We received an alert at around 4 AM indicating that our checkout service's p99 latency had jumped from its usual 50 ms to over 2 seconds. There were no errors, CPU usage was normal, and the database appeared healthy. Although nothing looked wrong in the logs, users were clearly experiencing delays.

Initial Checks

We started with the usual suspects:

- Application metrics: CPU and memory utilization were stable.
- Database health: PostgreSQL showed no signs of stress.
- Slow query logs: No entries, even with the threshold set to 1 second.
- Redis/cache layer: Operating as expected.

Why Logs Weren’t Enough

Our logs provided detailed information about individual events such as HTTP requests, SQL executions, and service interactions. However, they lacked the context needed to understand how time was spent across the entire request lifecycle. Logs answered what happened, but not where the time went.…
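
A trace captures exactly the missing piece: a tree of nested, timed spans for a single request, so the slow stage stands out by its duration rather than by any error. The Python sketch below is a toy, in-process illustration of that idea, not the author's setup or any APM vendor's API. `Tracer`, `Span`, `breakdown`, `slowest_leaf`, the step names, and the `time.sleep` delays standing in for real work are all hypothetical.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One timed unit of work inside a request (hypothetical, for illustration)."""
    name: str
    start: float
    end: float = 0.0
    children: List["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000.0


class Tracer:
    """Toy in-process tracer: builds a tree of nested spans for one request."""

    def __init__(self) -> None:
        self.root: Optional[Span] = None
        self._stack: List[Span] = []

    @contextmanager
    def span(self, name: str) -> Iterator[Span]:
        s = Span(name, time.perf_counter())
        # Attach to the currently open span, or make this the root span.
        if self._stack:
            self._stack[-1].children.append(s)
        else:
            self.root = s
        self._stack.append(s)
        try:
            yield s
        finally:
            # Close the span even if the wrapped code raises.
            s.end = time.perf_counter()
            self._stack.pop()


def breakdown(span: Span, depth: int = 0) -> List[str]:
    """Render the span tree as indented 'name: N ms' lines."""
    lines = [f"{'  ' * depth}{span.name}: {span.duration_ms:.1f} ms"]
    for child in span.children:
        lines.extend(breakdown(child, depth + 1))
    return lines


def slowest_leaf(span: Span) -> Span:
    """Return the leaf span (one with no children) that took the longest."""
    if not span.children:
        return span
    return max((slowest_leaf(c) for c in span.children), key=lambda s: s.duration_ms)


# Usage: a hypothetical checkout request with simulated step latencies.
tracer = Tracer()
with tracer.span("POST /checkout"):
    with tracer.span("cache.get_cart"):
        time.sleep(0.001)
    with tracer.span("sql.select_inventory"):
        time.sleep(0.002)
    with tracer.span("sql.insert_order"):
        time.sleep(0.1)  # simulated step that dominates the request

assert tracer.root is not None
print("\n".join(breakdown(tracer.root)))
print("slowest:", slowest_leaf(tracer.root).name)
```

In production this job is usually done by an APM agent or a tracing library such as OpenTelemetry, which also propagates trace context across service boundaries. The toy version only shows the core idea: a per-request span tree reveals where the time went, which individual log lines do not.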
