How to model Lambda cold-start behaviour under spike traffic before you deploy

DEV Community: lambda · Abhishek Gupta · about 1 month ago

There is a class of AWS incident I have started calling the "everything looked fine in testing" failure. The pattern is consistent. You design a serverless API: a Lambda function with sensible defaults, wired through API Gateway, pointing at DynamoDB. You test it in dev throughout the week. Latency is acceptable. Costs track to plan.

Then a campaign drops, or a new enterprise customer brings their three thousand users on day one, and your traffic goes from 300 RPS to 3,000 RPS in under a minute. Lambda, which has never had to spin up more than a dozen concurrent environments at once, is now being asked to handle a hundred. Cold starts accumulate. p99 latency goes from 80ms to 2,400ms. API Gateway timeout windows close on in-flight requests. Customers see errors. The Slack channel fires.

You spend a Saturday explaining to your CTO why the architecture that "passed all our tests" just fell over under a load it should have anticipated. I have been in this situation. Not once.…
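The jump from 300 RPS to 3,000 RPS can be put into rough numbers with Little's law (in-flight requests = arrival rate × mean time per request). The sketch below is a back-of-the-envelope estimate, not the article's own model. The 40 ms mean invocation duration is an assumption chosen for illustration, and the function names are hypothetical. It also assumes the only warm environments available are the ones serving baseline traffic, and it ignores Lambda's scaling rate limits and the extra time a cold start itself adds.

```python
def required_concurrency(rps: int, mean_duration_ms: int) -> int:
    """Estimate concurrent execution environments via Little's law.

    concurrency = arrival rate (req/s) * mean duration (s), rounded up.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (rps * mean_duration_ms + 999) // 1000  # ceiling division by 1000


def new_environments_needed(baseline_rps: int, spike_rps: int,
                            mean_duration_ms: int) -> int:
    """Environments that must be created (cold-started) when traffic spikes.

    Assumes the warm pool equals the baseline steady-state concurrency.
    """
    warm = required_concurrency(baseline_rps, mean_duration_ms)
    needed = required_concurrency(spike_rps, mean_duration_ms)
    return max(0, needed - warm)


if __name__ == "__main__":
    MEAN_DURATION_MS = 40  # assumed for illustration, not from the article
    print("baseline concurrency:", required_concurrency(300, MEAN_DURATION_MS))
    print("spike concurrency:   ", required_concurrency(3000, MEAN_DURATION_MS))
    print("cold starts at spike:", new_environments_needed(300, 3000, MEAN_DURATION_MS))
```

Under that assumed duration, the baseline needs about a dozen environments and the spike needs roughly ten times as many, so most requests arriving in the first moments of the spike would land on environments that do not exist yet.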
