What the first 24 hours of production CloudWatch data told us

DEV Community·Glenn Gray·28 days ago

#rightsizing #autoscaling #ecs #task #tasks #scale

Originally published on graycloudarch.com.

The morning after go-live, the first thing I looked at was CPU. One of the two delivery services was sitting at 99.8% average utilization across 9 tasks. P50 latency: 1,010ms.

We'd launched deliberately without autoscaling. The plan was to observe real traffic patterns before configuring a scaling policy: you can't tune a policy for a workload whose demand you haven't seen yet. What we didn't know was that the workload would reveal something about the task itself before we'd had a chance to watch it for a week.

Thirty-six hours after go-live, we'd shipped right-sizing changes, a working autoscaling configuration, and a new observability source for ALB-layer signals. All of it came directly from what the first day of production data said. Here's how we read it.

What 99.8% CPU means at 0.5 vCPU

The service was allocated 512 ECS CPU units per task, or half a vCPU. CloudWatch was telling us the tasks were spending essentially all of their scheduled CPU time working.…
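To make the numbers above concrete, here is a minimal sketch of the arithmetic, assuming the usual ECS convention that 1,024 CPU units equal one vCPU and that the service-level utilization figure is a percentage of the reserved CPU units. The function names are hypothetical and used only for illustration; the inputs (512 units, 9 tasks, 99.8%) are the figures quoted above.

```python
# Sketch only: helper names are illustrative, not from the original article.
ECS_CPU_UNITS_PER_VCPU = 1024  # ECS convention: 1 vCPU = 1,024 CPU units


def vcpu_from_units(cpu_units: int) -> float:
    """Convert an ECS CPU reservation in CPU units to vCPUs."""
    return cpu_units / ECS_CPU_UNITS_PER_VCPU


def busy_vcpu(cpu_units: int, task_count: int, avg_utilization_pct: float) -> float:
    """Estimate how many vCPUs the service keeps busy on average.

    Assumes the utilization percentage is measured against the reservation.
    """
    reserved = vcpu_from_units(cpu_units) * task_count
    return reserved * avg_utilization_pct / 100.0


per_task = vcpu_from_units(512)            # 0.5 vCPU per task
reserved_total = per_task * 9              # 4.5 vCPU reserved across 9 tasks
fleet_busy = busy_vcpu(512, 9, 99.8)       # about 4.491 vCPU in use
headroom = reserved_total - fleet_busy     # about 0.009 vCPU left over

print(f"per task: {per_task} vCPU")
print(f"reserved: {reserved_total} vCPU, busy: {fleet_busy:.3f} vCPU")
print(f"headroom: {headroom:.3f} vCPU")
```

Under these assumptions the nine tasks reserve 4.5 vCPU in total and leave roughly 0.009 vCPU of headroom, which is one way to read "essentially all of their scheduled CPU time".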
