What the first 24 hours of production CloudWatch data told us

DEV Community·Glenn Gray·28 days ago
#rightsizing #autoscaling #ecs #task #tasks #scale

Originally published on graycloudarch.com.

The morning after go-live, the first thing I looked at was CPU. One of the two delivery services was sitting at 99.8% average utilization across 9 tasks. P50 latency: 1,010ms.

We'd launched deliberately without autoscaling. The plan was to observe real traffic patterns before configuring a scaling policy: you can't tune a policy for a workload whose demand you haven't seen yet. What we didn't know was that the workload would reveal something about the task itself before we'd had a chance to watch it for a week.

Thirty-six hours after go-live, we'd shipped right-sizing changes, a working autoscaling configuration, and a new observability source for ALB-layer signals. All of it came directly from what the first day of production data said. Here's how we read it.

What 99.8% CPU means at 0.5 vCPU

The service was allocated 512 ECS CPU units per task, or half a vCPU. CloudWatch was telling us the tasks were spending essentially all of their scheduled CPU time working.…
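To put those numbers side by side, here is a minimal arithmetic sketch. It assumes the 99.8% figure is measured against the CPU reserved by the tasks and that all 9 tasks share the same 512-unit allocation. The function names are hypothetical and exist only for illustration; the one fixed fact is that ECS counts 1,024 CPU units per vCPU.

```python
# ECS defines 1 vCPU as 1,024 CPU units.
ECS_CPU_UNITS_PER_VCPU = 1024


def vcpus_from_units(cpu_units: int) -> float:
    """Convert an ECS CPU-unit allocation to vCPUs."""
    return cpu_units / ECS_CPU_UNITS_PER_VCPU


def busy_vcpus(cpu_units_per_task: int, task_count: int, avg_utilization_pct: float) -> float:
    """Estimate vCPUs in use, assuming utilization is a percentage of reserved CPU."""
    reserved = vcpus_from_units(cpu_units_per_task) * task_count
    return reserved * avg_utilization_pct / 100.0


# Figures quoted in the post: 512 units per task, 9 tasks, 99.8% average CPU.
per_task_vcpu = vcpus_from_units(512)
reserved_vcpu = per_task_vcpu * 9
in_use_vcpu = busy_vcpus(512, 9, 99.8)
headroom_vcpu = reserved_vcpu - in_use_vcpu

print(f"per task: {per_task_vcpu} vCPU, reserved: {reserved_vcpu} vCPU")
print(f"in use: {in_use_vcpu:.3f} vCPU, headroom: {headroom_vcpu:.3f} vCPU")
```

Under those assumptions the service was using roughly 4.49 of its 4.5 reserved vCPUs, which leaves about a hundredth of a vCPU of headroom across the whole fleet.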
