Can AWS Lambda scale infinitely?
We have all heard the buzzwords: Lambda scales infinitely, serverless is magical. But is it true? During AWS re:Invent 2022, AWS released Lambda SnapStart for Java, and with all the cold start discussion it triggered, the question came up again: can Lambda scale infinitely? I decided to perform a series of tests and found some interesting results.
🔬To read my complete test, check this doc https://lnkd.in/gd8Q285t
📖 To view the complete course https://lnkd.in/gjeGAPd2
➡️ You can contact me via https://lnkd.in/dePjvNDw
🔑 Key term to understand
✅ Concurrency: Concurrency defines how many tasks the system performs in parallel; the system can work on more than one task simultaneously. It is not the same as the number of requests your system can process per second, because each request may take more or less than a second to process.
🍁 AWS Lambda execution environment lifecycle
This refers to what happens inside a Lambda function. There are two paths: a cold start and a warm start.
⓵ When a request comes in to invoke your Lambda function, Lambda creates an isolated micro-VM called an execution environment.
⓶ Lambda then runs your function initialization (init) code, which is the code outside the main handler function.
⓷ Lambda then invokes the function handler, passing it the event payload.
⓸ Finally, your handler executes your business logic and returns a response.
💡 The execution environment described in Step 1 can process only one request at a time. While it is processing a request, it cannot accept any other request.
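The lifecycle above maps directly onto the structure of a handler file. Here is a minimal Python sketch; the function and field names are my own illustration, not the code from my actual test:

```python
import json
import time

# Init code: runs once per execution environment, during the cold start
# (steps 1-2). Expensive setup like SDK clients or config loading goes here.
START_TIME = time.time()

def lambda_handler(event, context):
    # Handler code: runs once per invocation (steps 3-4) with the event payload.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": f"Hello, {name}!",
            "env_age_seconds": round(time.time() - START_TIME, 3),
        }),
    }
```

Everything above the handler is paid for once per environment; everything inside it is paid for on every request.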
🥶 Cold Start vs. Warm Start ☀️
After Lambda finishes processing a request, it can reuse the environment to process additional requests for the same function. Since the initialization (init) code has already run, Lambda only needs to execute the handler code for the new request; this is called a warm start. If Lambda needs to create the execution environment again, it's called a cold start.
💡 Lambda will reuse an existing execution environment or create a new one if necessary.
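You can see this reuse in miniature: module-level init code runs once per environment, while the handler runs on every invocation. A small sketch (the counters are illustrative markers, not part of my actual test function):

```python
# Init code: in Lambda, this module-level code runs once per execution
# environment, i.e. only on a cold start.
init_runs = 0
init_runs += 1

invocation_count = 0

def lambda_handler(event, context):
    # Handler code: runs on every invocation, cold or warm.
    global invocation_count
    invocation_count += 1
    # A warm start is any invocation after the first in this environment.
    return {"warm_start": invocation_count > 1, "init_runs": init_runs}
```

Across many warm invocations, `init_runs` stays at 1 while `invocation_count` keeps climbing.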
🙋♂️ Now the question is: how many execution environments does Lambda need to create?
This is where concurrency comes into play. The number of execution environments determines the concurrency: the sum of all concurrent requests for the currently running function at a particular time. If there is a single execution environment, the concurrency is 1. If the number of requests for your function decreases, Lambda stops the unused environments and frees up scaling capacity for other functions. Let's understand this with the help of a mathematical formula:
concurrent requests = RequestsPerSecond * AverageDurationInSeconds
🧐 If your Lambda function receives 100 requests per second and takes 500ms to execute, then the number of concurrent requests is 50.
concurrent requests = RequestsPerSecond * AverageDurationInSeconds
concurrent requests = 100 requests/sec * 0.5 sec = 50
🧐 Now, if your function duration decreases to 250ms and the number of requests increases to 200 requests/sec, your concurrent requests will remain the same at 50.
concurrent requests = 200 requests/sec * 0.25 sec = 50
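The arithmetic above can be wrapped in a one-line helper (a sketch; the function name is my own):

```python
def concurrent_requests(requests_per_second: float, avg_duration_seconds: float) -> float:
    """Estimate concurrent requests, and thus execution environments needed."""
    return requests_per_second * avg_duration_seconds

# 100 req/s at 500 ms each and 200 req/s at 250 ms each
# both work out to 50 concurrent requests.
print(concurrent_requests(100, 0.5))   # 50.0
print(concurrent_requests(200, 0.25))  # 50.0
```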
🔬 After all these calculations, it's showtime 🎤
👨💻 My setup was pretty simple: a Python Lambda function 🐍 fronted by API Gateway, with Apache Bench (ab) for load testing.
1️⃣ I had been involved in some deployments where Lambda scaled up to 100 concurrent executions, so I decided to start with a higher number, 500, and Lambda scaled to 500 functions easily.
2️⃣ The first test worked without a hitch, so I decided to bump the number and go with 1000 this time. Yay, it worked well.
3️⃣ With the confidence gained from the previous two results, I decided to scale Lambda to 3000 functions this time. 👊 This time, it didn't work 😢. I read further and found there is an account concurrency quota of 1000, which is the maximum concurrency in a particular region and is shared across all the functions in the account. As you can see in the diagram below, it's set to 1000, but it's adjustable: I created an AWS support case and asked them to bump the limit to 5000. I reran the test after the limit increase, and it worked this time. Use the Service Quotas console to increase the limit: https://console.aws.amazon.com/servicequotas/home
4️⃣ With the new limit in place, I decided to scale the Lambda function further, this time to 4000. Now I saw some strange behavior: once Lambda scaled to 3000 functions, it didn't scale further immediately; it added 500 functions over the next minute and another 500 the minute after. My first hunch was that I was hitting another Lambda limit 🙇♂️. My guess was correct: it's called burst concurrency, a quota that provides an initial burst of between 500 and 3000 (varying per region), after which concurrency grows by 500 instances per minute. On top of that, this limit is not adjustable, which makes sense, as AWS needs to safeguard its resources: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-concurrency-limit-increase/#:~:text=The%20default%20concurrency%20limit%20per,concurrency%20limit%20for%20Lambda%20functions.
5️⃣ I performed one more round of tests, this time with 5000, and observed the same behavior as above: AWS increased the Lambda functions in steps of 500 every minute.
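The scaling pattern I observed can be modeled with a small helper. This is a sketch of the burst behavior described above, assuming a region with a 3000 initial burst and my increased account limit of 5000:

```python
def available_concurrency(minutes_elapsed: int,
                          initial_burst: int = 3000,
                          step_per_minute: int = 500,
                          account_limit: int = 5000) -> int:
    """Model Lambda burst scaling: an instant initial burst, then
    +step_per_minute each minute, capped at the account concurrency limit."""
    return min(initial_burst + step_per_minute * minutes_elapsed, account_limit)

for minute in range(5):
    print(minute, available_concurrency(minute))
# 0 -> 3000, 1 -> 3500, 2 -> 4000, 3 -> 4500, 4 -> 5000
```

This matches what the tests showed: reaching 5000 from a cold account takes roughly four minutes, not an instant spike.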
6️⃣ The last test I performed scaled Lambda to 6000. But since, in step #3, I had asked AWS support to increase the limit only to 5000, AWS started throttling any invocations going over the 5000 limit.
💡 One more thing I learned: for synchronous invocations, Lambda returns a throttling error (429), and the caller must retry the request. For asynchronous invocations, Lambda automatically retries the request.
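For synchronous callers, that 429 means you need your own retry-with-backoff loop. A minimal sketch; the `invoke` callable and `ThrottledError` stand in for your actual Lambda invocation and its throttling exception, and all names here are illustrative:

```python
import time

class ThrottledError(Exception):
    """Stand-in for the 429 throttling error a synchronous invocation returns."""

def invoke_with_retry(invoke, max_attempts=5, base_delay=0.1):
    """Call invoke(), retrying with exponential backoff on throttling."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...

# Simulate a function that is throttled twice before succeeding.
calls = {"n": 0}
def flaky_invoke():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError()
    return "ok"

print(invoke_with_retry(flaky_invoke))  # "ok" after 3 attempts
```

Asynchronous invocations don't need this loop, since Lambda queues the event and retries on your behalf.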
Fig: Lambda function scaling from 0 to 5000 functions. Lambda spikes instantly to 3000 functions, then adds 500 every minute.
So if you're expecting your function to receive a large number of requests, you can configure these two types of concurrency controls:
1️⃣ Reserved Concurrency: This carves out a portion of your account's concurrency for the function, so no other function can use it. It guarantees the maximum number of concurrent instances for your function and is free of cost.
2️⃣ Provisioned Concurrency: This pre-initializes a number of execution environments so they are immediately available to process requests. However, this comes with a price 💰.
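Both controls can be set directly on a function definition. Here is a hedged AWS SAM sketch; the resource name, runtime, and values are illustrative, not from my test stack:

```yaml
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.lambda_handler
      Runtime: python3.9
      # Reserve 100 of the account's concurrency for this function (free).
      ReservedConcurrentExecutions: 100
      # Keep 10 environments pre-initialized (billed while provisioned).
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 10
```

Note that provisioned concurrency applies to a version or alias, which is why `AutoPublishAlias` appears alongside it.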
🏁 AWS already has a doc explaining how this works, so I knew what results to expect, and I'd ask anyone who needs Lambda to scale under high load to go through this doc first 🙏. So the question remains: is Lambda infinitely scalable? Not quite. Still, as you can see, it immediately scales to 3000 functions, which is up to 30TB of memory and 18,000 vCPUs (considering the upper bound of 10GB memory and 6 vCPUs per function), which is immense. https://docs.aws.amazon.com/lambda/latest/dg/lambda-concurrency.html