How to model Lambda cold-start behaviour under spike traffic before you deploy


DEV Community: lambda·Abhishek Gupta·about 1 month ago

#dev #lambda #strong #concurrency #cold #simulation


There is a class of AWS incident I have started calling the "everything looked fine in testing" failure. The pattern is consistent. You design a serverless API: a Lambda function with sensible defaults, wired through API Gateway, pointing at DynamoDB. You test it in dev throughout the week. Latency is acceptable. Costs track to plan.

Then a campaign drops, or a new enterprise customer brings their three thousand users on day one, and your traffic goes from 300 RPS to 3,000 RPS in under a minute. Lambda, which has never had to spin up more than a dozen concurrent environments at once, is now being asked to handle a hundred. Cold starts accumulate. p99 latency goes from 80ms to 2,400ms. API Gateway timeouts cut off in-flight requests. Customers see errors. The Slack channel fires.

You spend a Saturday explaining to your CTO why the architecture that "passed all our tests" just fell over under a load it should have anticipated. I have been in this situation. Not once.…
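The jump from "a dozen" to "a hundred" concurrent environments can be sanity-checked with Little's law (in-flight requests = arrival rate × time per request). The sketch below is a back-of-envelope model only, not the author's simulation. The 40 ms average duration is an assumption chosen so the numbers line up with the figures quoted above, and the function names are hypothetical.

```python
import math


def required_concurrency(rps: float, avg_duration_ms: float) -> int:
    """Little's law: concurrent environments needed to serve a steady request rate."""
    return math.ceil(rps * avg_duration_ms / 1000)


def cold_starts_on_spike(baseline_rps: float, spike_rps: float, avg_duration_ms: float) -> int:
    """Rough lower bound on new environments a step spike forces Lambda to initialise.

    Assumes every environment serving the baseline is still warm. It ignores the
    extra concurrency consumed while new environments are initialising, so the
    real number of cold starts will usually be higher.
    """
    warm = required_concurrency(baseline_rps, avg_duration_ms)
    needed = required_concurrency(spike_rps, avg_duration_ms)
    return max(0, needed - warm)


# ASSUMPTION: 40 ms average warm duration (not stated in the article).
AVG_DURATION_MS = 40

print(required_concurrency(300, AVG_DURATION_MS))        # 12
print(required_concurrency(3000, AVG_DURATION_MS))       # 120
print(cold_starts_on_spike(300, 3000, AVG_DURATION_MS))  # 108
```

Under these assumptions, a tenfold traffic step means roughly nine out of ten environments serving the spike start cold. That is why a dev environment that only ever sees baseline traffic gives little warning of the p99 jump.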
