Compiling numpy with OpenBLAS integration
Here's a simpler approach than @ali_m's answer and it works on macOS.
1. Install a gfortran compiler if you don't have one, e.g. using Homebrew on macOS:
$ brew install gcc
2. Compile OpenBLAS from source [or use a package manager], either cloning the source repo or downloading a release:
$ git clone https://github.com/xianyi/OpenBLAS
$ cd OpenBLAS && make FC=gfortran
$ sudo make PREFIX=/opt/OpenBLAS install
If you don't/can't sudo, set PREFIX= to another directory and modify the path in the next step. OpenBLAS does not need to be on the compiler include path or the linker library path.
3. Create a ~/.numpy-site.cfg file containing the PREFIX path you used in step 2:
[openblas]
libraries = openblas
library_dirs = /opt/OpenBLAS/lib
runtime_library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
include_dirs is for the compiler, library_dirs is for the linker, and runtime_library_dirs is for the loader (that last one might not be needed).
4. pip-install numpy and scipy from source (preferably into a virtualenv) without manually downloading them [you can also specify the release versions]:
$ pip install numpy scipy --no-binary numpy,scipy
In my experience, this OPENBLAS_NUM_THREADS setting at runtime makes OpenBLAS faster, not slower, especially when multiple CPU processes are using it at the same time:
$ export OPENBLAS_NUM_THREADS=1
(Alternatively, you can compile OpenBLAS with make FC=gfortran USE_THREAD=0. A per-process way to limit the thread count from Python is sketched below.)
See the other answers for ways to test it.
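If you'd rather limit the thread count per process from inside Python instead of via an environment variable, the third-party threadpoolctl package can do it at runtime (it is a separate pip install threadpoolctl and not assumed by the steps above). A minimal sketch:
# Sketch: cap OpenBLAS at one thread for a specific block of code.
# Requires the third-party "threadpoolctl" package.
import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.randn(2000, 2000)
b = np.random.randn(2000, 2000)

with threadpool_limits(limits=1, user_api="blas"):
    c = a @ b  # this matrix product runs on a single BLAS thread
Outside the with block, OpenBLAS goes back to its default thread count.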
If you are using Ubuntu or Mint, you can easily get an OpenBLAS-linked numpy by installing both numpy and OpenBLAS via apt-get:
$ sudo apt-get install python3-numpy libopenblas-dev
On a fresh Ubuntu Docker image, I tested the following script, copied from the blog post "Installing Numpy and OpenBLAS":
import numpy as np
import numpy.random as npr
import time
# --- Test 1
N = 1
n = 1000
A = npr.randn(n,n)
B = npr.randn(n,n)
t = time.time()
for i in range(N):
    C = np.dot(A, B)
td = time.time() - t
print("dotted two (%d,%d) matrices in %0.1f ms" % (n, n, 1e3*td/N))
# --- Test 2
N = 100
n = 4000
A = npr.randn(n)
B = npr.randn(n)
t = time.time()
for i in range(N):
    C = np.dot(A, B)
td = time.time() - t
print("dotted two (%d) vectors in %0.2f us" % (n, 1e6*td/N))
# --- Test 3
m,n = (2000,1000)
A = npr.randn(m,n)
t = time.time()
[U,s,V] = np.linalg.svd(A, full_matrices=False)
td = time.time() - t
print("SVD of (%d,%d) matrix in %0.3f s" % (m, n, td))
# --- Test 4
n = 1500
A = npr.randn(n,n)
t = time.time()
w, v = np.linalg.eig(A)
td = time.time() - t
print("Eigendecomp of (%d,%d) matrix in %0.3f s" % (n, n, td))
Without OpenBLAS, the results are:
dotted two (1000,1000) matrices in 563.8 ms
dotted two (4000) vectors in 5.16 us
SVD of (2000,1000) matrix in 6.084 s
Eigendecomp of (1500,1500) matrix in 14.605 s
After I installed OpenBLAS with apt install libopenblas-dev, I checked the numpy linkage with
import numpy as np
np.__config__.show()
and got the following information:
atlas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
atlas_3_10_threads_info:
  NOT AVAILABLE
blas_info:
    library_dirs = ['/usr/lib']
    libraries = ['blas', 'blas']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
mkl_info:
  NOT AVAILABLE
atlas_3_10_blas_threads_info:
  NOT AVAILABLE
atlas_3_10_blas_info:
  NOT AVAILABLE
openblas_lapack_info:
  NOT AVAILABLE
lapack_opt_info:
    library_dirs = ['/usr/lib']
    libraries = ['lapack', 'lapack', 'blas', 'blas']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
blas_opt_info:
    library_dirs = ['/usr/lib']
    libraries = ['blas', 'blas']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
atlas_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
atlas_3_10_info:
  NOT AVAILABLE
lapack_info:
    library_dirs = ['/usr/lib']
    libraries = ['lapack', 'lapack']
    language = f77
atlas_blas_threads_info:
  NOT AVAILABLE
The output doesn't show any linkage to OpenBLAS. However, the new results of the script show that numpy must be using it:
dotted two (1000,1000) matrices in 15.2 ms
dotted two (4000) vectors in 2.64 us
SVD of (2000,1000) matrix in 0.469 s
Eigendecomp of (1500,1500) matrix in 2.794 s
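If you want to confirm which BLAS a given numpy is actually loading, independently of what np.__config__.show() reports, one option on Linux is to inspect the shared libraries mapped into the process after importing numpy. This is an illustrative check, not part of the original post:
# Sketch (Linux only): list BLAS/LAPACK shared libraries mapped into the
# current process after importing numpy.
import numpy as np

with open("/proc/self/maps") as maps:
    libs = {line.split()[-1] for line in maps
            if "blas" in line.lower() or "lapack" in line.lower()}

for path in sorted(libs):
    print(path)
If a libopenblas.so.* entry shows up, numpy is going through OpenBLAS even though the build-time configuration output above doesn't mention it.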
I just compiled numpy inside a virtualenv with OpenBLAS integration, and it seems to be working OK. This was my process:
1. Compile OpenBLAS:
$ git clone https://github.com/xianyi/OpenBLAS
$ cd OpenBLAS && make FC=gfortran
$ sudo make PREFIX=/opt/OpenBLAS install
If you don't have admin rights, you could set PREFIX= to a directory where you have write privileges (just modify the corresponding steps below accordingly).
2. Make sure that the directory containing libopenblas.so is in your shared library search path. To do this locally, you could edit your ~/.bashrc file to contain the line
export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH
The LD_LIBRARY_PATH environment variable will be updated when you start a new terminal session (use $ source ~/.bashrc to force an update within the same session).
Another option that will work for multiple users is to create a .conf file in /etc/ld.so.conf.d/ containing the line /opt/OpenBLAS/lib, e.g.:
$ sudo sh -c "echo '/opt/OpenBLAS/lib' > /etc/ld.so.conf.d/openblas.conf"
Once you are done with either option, run
$ sudo ldconfig
(A quick Python check that the loader can now find OpenBLAS is sketched after these steps.)
3. Grab the numpy source code:
$ git clone https://github.com/numpy/numpy
$ cd numpy
4. Copy site.cfg.example to site.cfg and edit the copy:
$ cp site.cfg.example site.cfg
$ nano site.cfg
Uncomment these lines:
....
[openblas]
libraries = openblas
library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
....
5. Check the configuration, then build and install (optionally inside a virtualenv):
$ python setup.py config
The output should look something like this:
...
openblas_info:
  FOUND:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
  FOUND:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
...
Installing with pip is preferable to using python setup.py install, since pip will keep track of the package metadata and allow you to easily uninstall or upgrade numpy in the future:
$ pip install .
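As a sanity check that the dynamic loader can actually find the OpenBLAS installed in step 2, you can try loading it directly from Python (this check is my own addition, not part of the original steps):
# Sketch: ask the dynamic loader for OpenBLAS. An OSError here usually
# means LD_LIBRARY_PATH / ldconfig is not set up yet.
import ctypes
import ctypes.util

print(ctypes.util.find_library("openblas"))  # soname/path if the loader knows it, else None
lib = ctypes.CDLL("libopenblas.so")          # may need "libopenblas.so.0" depending on the install
print("loaded:", lib)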
Optional: you can use this script to test performance for different thread counts.
$ OMP_NUM_THREADS=1 python build/test_numpy.py
version: 1.10.0.dev0+8e026a2
maxint: 9223372036854775807
BLAS info:
 * libraries ['openblas', 'openblas']
 * library_dirs ['/opt/OpenBLAS/lib']
 * define_macros [('HAVE_CBLAS', None)]
 * language c
dot: 0.099796795845 sec

$ OMP_NUM_THREADS=8 python build/test_numpy.py
version: 1.10.0.dev0+8e026a2
maxint: 9223372036854775807
BLAS info:
 * libraries ['openblas', 'openblas']
 * library_dirs ['/opt/OpenBLAS/lib']
 * define_macros [('HAVE_CBLAS', None)]
 * language c
dot: 0.0439578056335 sec
There seems to be a noticeable improvement in performance for higher thread counts. However, I haven't tested this very systematically, and it's likely that for smaller matrices the additional overhead would outweigh the performance benefit from a higher thread count.
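The test script referenced above isn't reproduced here; a minimal sketch in the same spirit (the matrix size and file name are my own choices, not the original test_numpy.py) that you can run with different OMP_NUM_THREADS / OPENBLAS_NUM_THREADS values looks like this:
# test_blas.py -- minimal timing sketch (not the original test_numpy.py).
import time
import numpy as np

np.__config__.show()  # show which BLAS numpy was built against

n = 3000
a = np.random.randn(n, n)
b = np.random.randn(n, n)

t0 = time.time()
np.dot(a, b)
print("dot of two (%d,%d) matrices: %.3f s" % (n, n, time.time() - t0))
Run it as, e.g., $ OMP_NUM_THREADS=1 python test_blas.py and then $ OMP_NUM_THREADS=8 python test_blas.py and compare the timings.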