How many concurrent requests does a single Flask process receive?
Currently there is a far simpler solution than the ones already provided. When running your application you just have to pass along the threaded=True
parameter to the app.run()
call, like:
app.run(host="your.host", port=4321, threaded=True)
Another option as per what we can see in the werkzeug docs, is to use the processes
parameter, which receives a number > 1 indicating the maximum number of concurrent processes to handle:
- threaded – should the process handle each request in a separate thread?
- processes – if greater than 1 then handle each request in a new process up to this maximum number of concurrent processes.
Something like:
app.run(host="your.host", port=4321, processes=3) #up to 3 processes
More info on the run()
method here, and the blog post that led me to find the solution and api references.
Note: on the Flask docs on the run()
methods it's indicated that using it in a Production Environment is discouraged because (quote): "While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."
However, they do point to their Deployment Options page for the recommended ways to do this when going for production.
When running the development server - which is what you get by running app.run()
, you get a single synchronous process, which means at most 1 request is being processed at a time.
By sticking Gunicorn in front of it in its default configuration and simply increasing the number of --workers
, what you get is essentially a number of processes (managed by Gunicorn) that each behave like the app.run()
development server. 4 workers == 4 concurrent requests. This is because Gunicorn uses its included sync
worker type by default.
It is important to note that Gunicorn also includes asynchronous workers, namely eventlet
and gevent
(and also tornado
, but that's best used with the Tornado framework, it seems). By specifying one of these async workers with the --worker-class
flag, what you get is Gunicorn managing a number of async processes, each of which managing its own concurrency. These processes don't use threads, but instead coroutines. Basically, within each process, still only 1 thing can be happening at a time (1 thread), but objects can be 'paused' when they are waiting on external processes to finish (think database queries or waiting on network I/O).
This means, if you're using one of Gunicorn's async workers, each worker can handle many more than a single request at a time. Just how many workers is best depends on the nature of your app, its environment, the hardware it runs on, etc. More details can be found on Gunicorn's design page and notes on how gevent works on its intro page.