Why does it take ages to install Pandas on Alpine Linux
ANSWER: AS OF 3/9/2020, FOR PYTHON 3, IT STILL DOESN'T!
Here is a complete working Dockerfile:
FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing
The build is very sensitive to the exact python and alpine version numbers - getting these wrong seems to provoke Max Levy's error so:libpython3.7m.so.1.0 (missing)
- but the above does now work for me.
My updated Dockerfile is available at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b
[Earlier Update:]
ANSWER: IT DOESN'T!
In any Alpine Dockerfile you can simply do*
RUN apk add py2-numpy@community py2-scipy@community py-pandas@edge
This is because numpy
, scipy
and now pandas
are all available prebuilt on alpine
:
https://pkgs.alpinelinux.org/packages?name=*numpy
https://pkgs.alpinelinux.org/packages?name=*scipy&branch=edge
https://pkgs.alpinelinux.org/packages?name=*pandas&branch=edge
One way to avoid rebuilding every time, or using a Docker layer, is to use a prebuilt, native Alpine Linux/.apk
package, e.g.
https://github.com/sgerrand/alpine-pkg-py-pandas
https://github.com/nbgallery/apks
You can build these .apk
s once and use them wherever in your Dockerfile you like :)
This also saves you having to bake everything else into the Docker image before the fact - i.e. the flexibility to pre-build any Docker image you like.
PS I have put a Dockerfile stub at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b that shows roughly how to build the image. These include the important steps (*):
RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories
RUN apk update
RUN apk add --update --no-cache libgfortran
Real honest advice here, switch to Debian based image and then all your problems will be gone.
Alpine for python applications doesn't work well.
Here is an example of my dockerfile
:
FROM python:3.7.6-buster
RUN pip install pandas==1.0.0
RUN pip install sklearn
RUN pip install Django==3.0.2
RUN pip install cx_Oracle==7.3.0
RUN pip install excel
RUN pip install djangorestframework==3.11.0
The python:3.7.6-buster
is more appropriate in this case, in addition, you don't need any extra dependency in the OS.
Follow a usefull and recent article: https://pythonspeed.com/articles/alpine-docker-python/:
Don’t use Alpine Linux for Python images Unless you want massively slower build times, larger images, more work, and the potential for obscure bugs, you’ll want to avoid Alpine Linux as a base image. For some recommendations on what you should use, see my article on choosing a good base image.
Debian based images use only python pip
to install packages with .whl
format:
Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
WHL format was developed as a quicker and more reliable method of installing Python software than re-building from source code every time. WHL files only have to be moved to the correct location on the target system to be installed, whereas a source distribution requires a build step before installation.
Wheel packages pandas
and numpy
are not supported in images based on Alpine platform. That's why when we install them using python pip
during the building process, we always compile them from the source files in alpine:
Downloading pandas-0.22.0.tar.gz (11.3MB)
Downloading numpy-1.14.1.zip (4.9MB)
and we can see the following inside container during the image building:
/ # ps aux
PID USER TIME COMMAND
1 root 0:00 /bin/sh -c pip install pandas
7 root 0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas
21 root 0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n
496 root 0:00 sh
660 root 0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri
661 root 0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump
662 root 0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ
663 root 0:00 ps aux
If we modify Dockerfile
a little:
FROM python:3.6.4-alpine3.7
RUN apk add --no-cache g++ wget
RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
we get the following error:
Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
---> Running in 0faea63e2bda
pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
Unfortunately, the only way to install pandas
on an Alpine image is to wait until build finishes.
Of course if you want to use the Alpine image with pandas
in CI for example, the best way to do so is to compile it once, push it to any registry and use it as a base image for your needs.
EDIT:
If you want to use the Alpine image with pandas
you can pull my nickgryg/alpine-pandas docker image. It is a python image with pre-compiled pandas
on the Alpine platform. It should save your time.