Error with OMP_NUM_THREADS when using dask distributed

Short answer

export OMP_NUM_THREADS=1

or 

dask-worker --nthreads 1
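If you launch your workers from a Python script instead of a shell, you can set the same variable from inside Python. The sketch below makes one assumption: it runs before numpy (and therefore BLAS) is imported, because the BLAS thread pool size is typically read when the library loads. The helper name `limit_blas_threads` is hypothetical. The only real mechanism here is the `OMP_NUM_THREADS` environment variable.

```python
import os


def limit_blas_threads(n: int = 1) -> None:
    """Hypothetical helper: ask OpenMP-based BLAS to use n threads.

    Call this before importing numpy; setting the variable after BLAS
    has loaded generally has no effect on its already-created pool.
    """
    os.environ["OMP_NUM_THREADS"] = str(n)


limit_blas_threads(1)
# import numpy as np  # import numpy only after the variable is set
print(os.environ["OMP_NUM_THREADS"])
```

Child processes started from this script inherit its environment, so worker processes spawned afterwards see the same setting.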

Explanation

The OMP_NUM_THREADS environment variable controls how many threads many libraries use for computations such as matrix multiplication. This includes the BLAS library that powers numpy.dot.

The conflict here is that you have two parallel libraries, BLAS and dask.distributed, and one is calling into the other. Each library is designed to use as many threads as there are logical cores on the system.
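Here is a simplified model of the "one thread per logical core" default. This is an assumption for illustration and not the actual selection logic of any particular OpenMP runtime or of dask. If OMP_NUM_THREADS is set, use its first entry, because the variable may hold a comma-separated list for nested levels. Otherwise fall back to the core count.

```python
import os
from typing import Mapping, Optional


def default_thread_count(env: Mapping[str, str], cpu_count: Optional[int]) -> int:
    """Simplified model of how a threaded library picks its pool size."""
    value = env.get("OMP_NUM_THREADS")
    if value:
        # OMP_NUM_THREADS may be a list like "4,2" (one entry per nesting
        # level); the outermost level is the first entry.
        return int(value.split(",")[0])
    # No explicit setting: one thread per logical core (at least one).
    return cpu_count or 1


print(default_thread_count(os.environ, os.cpu_count()))
```

Because both libraries independently arrive at the core count, their thread counts multiply when one calls into the other.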

For example, if you had eight cores, dask.distributed might run your function f eight times at once on different threads. The numpy.dot call within each f would then use eight threads, resulting in 64 threads running at once.
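The 8 × 8 = 64 arithmetic can be reproduced with a stdlib-only simulation of nested thread pools. This is a sketch under stated assumptions. An outer `ThreadPoolExecutor` stands in for dask's worker threads, and each task opens its own inner pool to stand in for BLAS's threads inside numpy.dot. Neither dask nor numpy is used. A `threading.Barrier` forces every inner thread to be alive at the same moment, which makes the measured peak deterministic.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def peak_threads(outer: int, inner: int) -> int:
    """Run `outer` tasks at once, each fanning out to `inner` threads,
    and return the peak number of inner threads active simultaneously."""
    lock = threading.Lock()
    active = 0
    peak = 0
    # All outer*inner "BLAS" threads must reach this point before any proceeds.
    barrier = threading.Barrier(outer * inner, timeout=10)

    def blas_chunk() -> None:
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        barrier.wait()
        with lock:
            active -= 1

    def f() -> None:
        # Stand-in for a task calling numpy.dot, which has its own thread pool.
        with ThreadPoolExecutor(max_workers=inner) as blas_pool:
            futures = [blas_pool.submit(blas_chunk) for _ in range(inner)]
            for fut in futures:
                fut.result()  # surface any error from the inner threads

    # Stand-in for dask.distributed running f on `outer` threads at once.
    with ThreadPoolExecutor(max_workers=outer) as dask_pool:
        futures = [dask_pool.submit(f) for _ in range(outer)]
        for fut in futures:
            fut.result()
    return peak


print(peak_threads(8, 8))  # 64
```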

This is actually fine: everything will still run correctly, but you'll take a performance hit. It will be slower than running just eight threads at a time, which you can achieve by limiting either dask.distributed or BLAS.
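The two fixes are two ways of splitting the same core budget. The helper below is hypothetical and not part of dask or any BLAS. It picks a dask thread count for a given BLAS thread count so that the product stays within the number of cores.

```python
from typing import Tuple


def split_core_budget(cores: int, blas_threads: int) -> Tuple[int, int]:
    """Hypothetical helper: return (dask_threads, blas_threads) whose
    product does not exceed `cores`."""
    if cores < 1 or blas_threads < 1:
        raise ValueError("cores and blas_threads must be positive")
    blas_threads = min(blas_threads, cores)     # BLAS can't usefully exceed the budget
    dask_threads = max(1, cores // blas_threads)  # fill the rest with dask threads
    return dask_threads, blas_threads


for blas in (1, 2, 8):
    d, b = split_core_budget(8, blas)
    print(f"dask={d} x BLAS={b} -> {d * b} threads")
```

On eight cores, `(8, 1)` corresponds to `export OMP_NUM_THREADS=1` and `(1, 8)` corresponds to `dask-worker --nthreads 1`. Intermediate splits such as `(4, 2)` are also possible.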

Your system probably has OMP_THREAD_LIMIT set to some reasonable number, like 16, to warn you when this happens.
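To see whether OMP_THREAD_LIMIT is set in the environment your code runs under, read it directly. This is a quick sketch and only inspects the current process's environment. An unset value here does not rule out a limit configured some other way, for example in a worker's launch script.

```python
import os
from typing import Mapping, Optional


def omp_thread_limit(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Return OMP_THREAD_LIMIT from `env` as an int, or None if unset."""
    value = env.get("OMP_THREAD_LIMIT")
    return int(value) if value else None


print(omp_thread_limit())
```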


If you're using MKL BLAS, you might also get some improvement from the TBB threading layer. I haven't actually had occasion to try it, so YMMV.

http://conference.scipy.org/proceedings/scipy2018/anton_malakhov.html
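With MKL, the threading layer is usually selected through the `MKL_THREADING_LAYER` environment variable, and `TBB` is one of its accepted values. The sketch below assumes three things: your numpy is linked against MKL (OpenBLAS builds ignore this variable), the TBB runtime is installed, and the variable is set before numpy is imported. The helper name is hypothetical.

```python
import os


def use_tbb_threading_layer() -> None:
    """Hypothetical helper: ask MKL to use its TBB threading layer.

    Only meaningful for MKL-linked numpy with TBB available, and only if
    called before numpy (and therefore MKL) is loaded.
    """
    os.environ["MKL_THREADING_LAYER"] = "TBB"


use_tbb_threading_layer()
# import numpy as np  # import after the variable is set
print(os.environ["MKL_THREADING_LAYER"])
```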