Is connection pool in sqlalchemy thread-safe?
All in all, there seems to be some mixing of threads and processes here. The question begins by asking whether an SQLAlchemy connection pool is thread-safe, but ends with a code example that uses multiprocessing. The short answer to the general question is: no, you should not share an engine and its associated connection pool across process boundaries if forking is used. There are exceptions, though.
The pool implementations are themselves thread-safe, and by extension an Engine is thread-safe as well, because an engine holds no state beyond its reference to the pool. On the other hand, the connections checked out from a pool are not thread-safe, and neither is a Session.
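A minimal sketch of that division of labour, with a throwaway SQLite database standing in for a real one: the engine (and its pool) is shared by all threads, while each thread opens and closes its own Session.

```python
import threading

from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

# One engine per process; its pool is safe to share between threads.
engine = create_engine("sqlite:///example.db")  # stand-in URL

def worker() -> None:
    # Each thread uses its own Session, and therefore its own pooled
    # connection; neither object is shared with other threads.
    with Session(engine) as session:
        session.execute(text("SELECT 1"))
        session.commit()

threads = [threading.Thread(target=worker) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```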
> Documentation says that connection pool also is not designed for multithreading:
There's a bit of a misreading there: the quoted documentation is about sharing connection pools across process boundaries when forking is used. That will likely lead to trouble, because beneath the SQLAlchemy and DB-API layers there is usually a TCP/IP socket or a file handle, and those should not be operated on concurrently.
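If an engine does get inherited across a fork, one documented mitigation is to discard the inherited pool in the child so that it builds fresh connections of its own. A rough sketch, assuming SQLAlchemy 1.4.33 or later (where Engine.dispose() accepts close=False) and a POSIX system with os.fork():

```python
import os

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///example.db")  # stand-in URL

pid = os.fork()
if pid == 0:
    # Child: drop the pooled connections inherited from the parent.
    # close=False leaves the parent's sockets untouched; the child
    # simply starts with an empty pool and opens fresh connections.
    engine.dispose(close=False)
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))
    os._exit(0)
else:
    os.waitpid(pid, 0)
```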
In this particular case using a NullPool would be safe, while other pool implementations would not be, since it does not pool at all and so connections won't be shared between processes, unless one goes out of their way to do so.
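For reference, opting out of pooling is just a matter of passing the pool class to create_engine(); the URL below is a stand-in:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

# Every checkout opens a brand new DBAPI connection and every check-in
# closes it, so nothing long-lived can leak across a fork.
engine = create_engine("sqlite:///example.db", poolclass=NullPool)
```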
> Does it mean that only 3 concurrent thread will do some work while others will wait until one or more thread will call session.close()?
Assuming a QueuePool is in use, the configured size is not a hard limit; there is some room for overflow. The size determines the number of connections to keep persistently in the pool. If the overflow limit has also been reached, a checkout will wait for the configured timeout in seconds before giving up and raising a TimeoutError, if no connection has become available.
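In terms of create_engine() arguments, those knobs are pool_size, max_overflow and pool_timeout. A sketch matching the question's scenario of three persistent connections; the URL is a stand-in:

```python
from sqlalchemy import create_engine

# Keep at most 3 connections in the pool, allow up to 2 temporary
# "overflow" connections under load (5 concurrent checkouts in total),
# and let a 6th concurrent checkout wait up to 30 seconds before
# sqlalchemy.exc.TimeoutError is raised.
engine = create_engine(
    "postgresql+psycopg2://user:pass@localhost/dbname",  # stand-in URL
    pool_size=3,
    max_overflow=2,
    pool_timeout=30,
)
```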
> Or there is a chance that >2 threads will use the same connection simultaneously?
Two or more threads will not be able to accidentally check out the same connection from a pool, with the exception of a StaticPool, which holds exactly one connection, but one could explicitly share a checked-out connection between threads afterwards (don't).
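A quick way to convince oneself, as a sketch that assumes SQLAlchemy 1.4.24 or newer for the dbapi_connection attribute and uses a throwaway SQLite database: hold three checkouts open at the same time and compare the underlying DBAPI connections.

```python
import threading

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///example.db")  # stand-in URL

seen = []
lock = threading.Lock()
barrier = threading.Barrier(3)  # keep all three checkouts open at once

def worker() -> None:
    with engine.connect() as conn:
        barrier.wait()
        with lock:
            # Record the raw DBAPI connection behind this checkout.
            seen.append(id(conn.connection.dbapi_connection))
        conn.execute(text("SELECT 1"))

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Three simultaneous checkouts map to three distinct DBAPI connections.
assert len(set(seen)) == 3
```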
In the end, "Working with Engines and Connections - Basic Usage" covers the main parts of the question:
> A single Engine manages many individual DBAPI connections on behalf of the process and is intended to be called upon in a concurrent fashion [emphasis added]. ...
>
> For a multiple-process application that uses the os.fork system call, or for example the Python multiprocessing module, it’s usually required that a separate Engine be used for each child process. This is because the Engine maintains a reference to a connection pool that ultimately references DBAPI connections - these tend to not be portable across process boundaries. An Engine that is configured not to use pooling (which is achieved via the usage of NullPool) does not have this requirement.
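Translated into code, the "separate Engine for each child process" advice from the quote might look roughly like the following sketch, where the URL is a stand-in and each worker process builds its own engine in the pool initializer rather than inheriting one from the parent:

```python
from multiprocessing import Pool

from sqlalchemy import create_engine, text

worker_engine = None  # populated separately inside every worker process

def init_worker() -> None:
    global worker_engine
    worker_engine = create_engine("sqlite:///example.db")  # stand-in URL

def task(n: int) -> int:
    # Uses the engine that belongs to this worker process only.
    with worker_engine.connect() as conn:
        return conn.execute(text("SELECT :n"), {"n": n}).scalar()

if __name__ == "__main__":
    with Pool(processes=2, initializer=init_worker) as pool:
        print(pool.map(task, range(5)))
```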