How can I create a Docker image to run both Python and R?
I built the following Dockerfile to run Python and R together with their dependencies:
FROM ubuntu:latest
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential r-base r-cran-randomforest python3.6 python3-pip python3-setuptools python3-dev
WORKDIR /app
COPY requirements.txt /app/requirements.txt
RUN pip3 install -r requirements.txt
RUN Rscript -e "install.packages('data.table')"
COPY . /app
The commands to build the image, run the container (naming it SnakeR here), and execute the code are:
docker build -t my_image .
docker run -it --name SnakeR my_image
docker exec SnakeR /bin/sh -c "python3 test_call_r.py"
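The contents of test_call_r.py are not shown here. As a rough sketch only, assuming the script calls R through the Rscript executable that r-base provides, it could look something like this (the function names build_r_command and run_r are made up for illustration, and the subprocess arguments are chosen so that it also runs on Python 3.6):

```python
# test_call_r.py (hypothetical contents; the original script is not shown)
import shutil
import subprocess


def build_r_command(expression):
    """Return the argument list that asks Rscript to evaluate one R expression."""
    return ["Rscript", "--vanilla", "-e", expression]


def run_r(expression):
    """Run an R expression and return whatever R wrote to standard output."""
    result = subprocess.run(
        build_r_command(expression),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError if R exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("Rscript") is None:
        print("Rscript not found on PATH; run this inside the container")
    else:
        # data.table is the R package installed by the Dockerfile above
        print(run_r("library(data.table); print(data.table(x = 1:3))"))
```

Passing the command as a list avoids shell quoting problems with the R expression, and check=True means a failing R call surfaces as a Python exception instead of passing silently.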
I treated it like an Ubuntu OS and built the image as follows:
- suppress the interactive prompts (e.g. for choosing your location) during the R install;
- refresh the apt-get package index;
- install the packages with these options and names:
  - -y = answer yes to prompts asking whether to proceed (e.g. confirming the disk space to be used);
  - --no-install-recommends = install only the required dependencies, skipping the recommended and suggested ones;
  - build-essential for the compilers and build tools needed on Ubuntu;
  - r-base for the R software;
  - r-cran-randomforest to force the package to be available (a separate install.packages call, as used for data.table, didn't work for randomForest for some reason);
  - python3.6 for the Python version;
  - python3-pip to allow pip to be used to install the requirements;
  - python3-setuptools, which pip relies on when it has to build a package from source;
  - python3-dev for the Python 3 headers needed to build the JayDeBeApi installation in the requirements (without it, the install gets confused and behaves as if it were for Python 2, not 3);
- specify the active "working directory" to be the /app location;
- copy the requirements file that holds the Python dependencies (built from the virtual environment of the Python codebase, e.g. with pip freeze);
- install the Python packages from the requirements file (pip3 for Python 3);
- install the R packages (just data.table here);
- copy the directory contents to the specified working directory /app.
This is replicated from my blog post at https://datascienceunicorn.tumblr.com/post/182297983466/building-a-docker-to-run-python-r
Being specific about both the Python and R versions will save you future headaches. This approach, for instance, always installs R 4.0.3 (pinned by the base image tag) and Python 3.8 (requested by the apt package name):
FROM r-base:4.0.3
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y --no-install-recommends build-essential libpq-dev python3.8 python3-pip python3-setuptools python3-dev
RUN pip3 install --upgrade pip
ENV PYTHONPATH="${PYTHONPATH}:/app"
WORKDIR /app
COPY requirements.txt .
COPY requirements.r .
# installing python libraries
RUN pip3 install -r requirements.txt
# installing r libraries
RUN Rscript requirements.r
And your requirements.r file should look like this:
install.packages('data.table')
install.packages('jsonlite')
...
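One thing to watch for: as far as I know, install.packages() only emits a warning when a package fails to install, so the Rscript step can finish with exit status 0 and the image still builds without the package. Below is a minimal sketch of a check you could run inside the container. The file name check_r_packages.py, the function names, and the R_PACKAGES list are assumptions for illustration, not part of the original setup.

```python
# check_r_packages.py (hypothetical helper, not part of the original answer)
import shutil
import subprocess
import sys

# Packages named in requirements.r above; extend the list to match your own file.
R_PACKAGES = ["data.table", "jsonlite"]


def build_check_command(package):
    """Return an Rscript command that exits with status 1 if `package` is not installed."""
    expression = (
        "if (!requireNamespace('{0}', quietly = TRUE)) quit(status = 1)".format(package)
    )
    return ["Rscript", "-e", expression]


def missing_packages(packages):
    """Return the packages from `packages` that R reports as unavailable."""
    return [p for p in packages if subprocess.call(build_check_command(p)) != 0]


if __name__ == "__main__":
    if shutil.which("Rscript") is None:
        # Nothing to check outside the container, so only report it.
        print("Rscript not found on PATH; run this inside the container")
    else:
        missing = missing_packages(R_PACKAGES)
        if missing:
            sys.exit("R packages not installed: " + ", ".join(missing))
        print("All R packages are available")
```

Running it with python3 check_r_packages.py inside the container (or from a RUN step placed after the R install) makes a missing package show up as a non-zero exit status.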