Why is Google Cloud Run's maximum number of concurrent requests so low (1000) when app server frameworks can handle several thousand requests per second?


From the Google Cloud Run docs page:

Concurrency is configurable. By default each Cloud Run container instance can receive up to 80 requests at the same time; you can increase this to a maximum of 1000.

If my app is written in Node.js with Express or Fastify, it could easily handle well beyond 1000. See the benchmark:

Fastify: 56457 req/sec (about 56x Cloud Run's max)
Express: 11883 req/sec (about 12x Cloud Run's max)

I understand that real-world results may be lower than these benchmark figures, but I would still expect a single instance to handle well beyond 1000.

If the server frameworks support higher concurrency, why does Google Cloud Run throttle it to a maximum of 1000?

(The same applies to Firebase Functions v2, which runs on Google Cloud Run, hence the firebase tag on this question.)


You made a mistake: you are comparing two different metrics.

  • Take the Cloud Run limit: an instance can handle up to 1000 requests concurrently.
  • Take your test results: 56457 requests per second.

Now, the mistake. Suppose each request is processed in 20 ms, that is, 1/50 of a second. If an instance handles 1000 concurrent requests every 1/50 of a second, it can handle 50000 requests per second.
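The arithmetic above is an instance of Little's law (throughput = concurrency / latency). Here is a minimal sketch of it in Node.js. The function names are hypothetical, and the 20 ms latency is only the illustrative figure used in this answer, not a value measured in the benchmark.

```javascript
// Little's law: throughput = concurrency / latency.
// Function names are hypothetical; 20 ms is an assumed, illustrative latency.

// Upper bound on requests per second for a given in-flight cap and latency.
function maxRequestsPerSecond(concurrency, latencyMs) {
  if (concurrency <= 0 || latencyMs <= 0) {
    throw new RangeError("concurrency and latencyMs must be positive");
  }
  return (concurrency * 1000) / latencyMs;
}

// Average number of in-flight requests needed to sustain a given throughput.
function requiredConcurrency(requestsPerSecond, latencyMs) {
  if (requestsPerSecond < 0 || latencyMs <= 0) {
    throw new RangeError("requestsPerSecond must be >= 0 and latencyMs > 0");
  }
  return (requestsPerSecond * latencyMs) / 1000;
}

// 1000 concurrent requests at 20 ms each:
console.log(maxRequestsPerSecond(1000, 20)); // 50000

// In-flight requests needed for the Fastify benchmark figure at 20 ms each:
console.log(requiredConcurrency(56457, 20)); // 1129.14
```

If the benchmarked handler actually responds in about 1 ms, the same formula gives roughly 56 requests in flight for 56457 req/sec, far below the cap of 1000.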

Cloud Run does not limit the number of requests per second. It limits the number of requests being processed at the same time on the same instance. (The cap of 1000 is probably due to Google's load balancing and traffic routing infrastructure.)

Answered By – guillaume blaquiere

Answer Checked By – Katrina (AngularFixing Volunteer)
