Understanding the "VOLUME" instruction in a Dockerfile
In short: No, your VOLUME instruction is not correct.
A Dockerfile's VOLUME instruction specifies one or more volumes by their container-side paths. It does not allow the image author to specify a host path. On the host side, the volumes are created with a very long ID-like name inside the Docker root directory. On my machine this is /var/lib/docker/volumes.
Note: Because the autogenerated name is extremely long and makes no sense from a human's perspective, these volumes are often referred to as "unnamed" or "anonymous".
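In practice these autogenerated names are 64 lowercase hexadecimal characters, for example 320752e0e70d1590e905b02d484c22689e69adcbd764a69e39b17bc330b984e4. The sketch below picks such names out of a list of volumes. It assumes that naming pattern and is only a heuristic, not a Docker API, because a named volume could in principle be given a name like that too. The function name filter_anonymous is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read volume names on stdin and print only those that
# look anonymous, i.e. exactly 64 lowercase hexadecimal characters.
# This is a naming heuristic, not something Docker itself guarantees.
filter_anonymous() {
  grep -E '^[0-9a-f]{64}$'
}

# Example (requires a running Docker daemon):
#   docker volume ls -q | filter_anonymous
```

`docker volume ls -q` prints only the volume names. If the real question is which volumes no container uses anymore, Docker's own filter `docker volume ls -f dangling=true` answers it directly.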
Your example that uses a '.' character will not even run on my machine, no matter whether I make the dot the first or second argument. I get this error message:
docker: Error response from daemon: oci runtime error: container_linux.go:265: starting container process caused "process_linux.go:368: container init caused "open /dev/ptmx: no such file or directory"".
I know that what has been said so far is probably not very valuable to someone trying to understand VOLUME and -v, and it certainly does not provide a solution for what you are trying to accomplish. So, hopefully, the following examples will shed some more light on these issues.
Minitutorial: Specifying volumes
Given this Dockerfile:
FROM openjdk:8u131-jdk-alpine
VOLUME vol1 vol2
(For the outcome of this minitutorial, it makes no difference whether we specify vol1 vol2 or /vol1 /vol2, because the default working directory within a Dockerfile is /.)
Build it:
docker build -t my-openjdk .
Run:
docker run --rm -it my-openjdk
Inside the container, run ls on the command line and you'll notice that two directories exist: /vol1 and /vol2.
Running the container also creates two directories, or "volumes", on the host side.
While the container is running, execute docker volume ls on the host machine and you'll see something like this (I have replaced the middle part of each name with three dots for brevity):
DRIVER VOLUME NAME
local c984...e4fc
local f670...49f0
Back in the container, execute touch /vol1/weird-ass-file (this creates an empty file at that location).
This file is now available on the host machine, in one of the unnamed volumes. It took me two tries: I first looked in the first listed volume, but I eventually found my file in the second one, using this command on the host machine:
sudo ls /var/lib/docker/volumes/f670...49f0/_data
Similarly, you can try to delete this file on the host and it will be deleted in the container as well.
Note: The _data folder is also referred to as a "mount point".
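On the host, a volume's data therefore lives at the Docker root directory, then volumes/, then the volume name, then _data. The sketch below builds that path. It assumes the default root /var/lib/docker, and the function name volume_data_path is hypothetical. The root can be changed with the daemon's data-root setting. On a setup like Vagrant or Docker Desktop, the path is inside the VM rather than on your own machine.

```shell
#!/bin/sh
# Hypothetical helper: print the host directory that backs a local volume.
# $1 = volume name, $2 = Docker root directory (defaults to /var/lib/docker).
volume_data_path() {
  name=$1
  root=${2:-/var/lib/docker}
  # Strip a trailing slash from the root so the result has no "//".
  printf '%s/volumes/%s/_data\n' "${root%/}" "$name"
}

# Example (requires root and a real volume name):
#   sudo ls "$(volume_data_path <volume-name>)"
```

You can also ask Docker instead of assuming the layout. `docker volume inspect --format '{{ .Mountpoint }}' <volume-name>` prints the same directory, and `docker info --format '{{ .DockerRootDir }}'` prints the root.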
Exit the container and list the volumes on the host. They are gone. We used the --rm flag when running the container, and this option removes not just the container on exit but also the anonymous volumes associated with it.
Run a new container, but specify a volume using -v:
docker run --rm -it -v /vol3 my-openjdk
This adds a third volume, and the whole system ends up with three unnamed volumes. The command would have failed had we specified only -v vol3, because the argument must be an absolute path inside the container. On the host side, the new third volume is anonymous and resides together with the other two volumes in /var/lib/docker/volumes/.
As stated earlier, a Dockerfile cannot map a volume to a host path. That is a problem when we want to bring files from the host into the container at runtime. A different -v syntax solves this problem.
Imagine I have a subfolder ./src in my project directory that I wish to sync to /src inside the container. This command does the trick:
docker run -it -v "$(pwd)/src:/src" my-openjdk
In this form, both sides of the : character are absolute paths. The left side is a path on the host machine, and the right side is a path inside the container. If the left side is a plain name rather than a path, Docker treats it as a named volume instead of a host directory. pwd is a command that prints the current working directory. Wrapping it in $() runs the command in a subshell and substitutes its output, which yields the absolute path to our project directory.
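The -v forms used so far can be summarized with a small classifier. This is a simplified, hypothetical model of how the argument is read, not Docker's parser. It ignores Windows paths and the separate --mount flag, and it simply skips a trailing option field such as :ro. The function name classify_mount_spec is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: a simplified model of how `docker run -v SPEC` is read.
# Not Docker's parser: Windows paths and most edge cases are ignored,
# and a trailing option field such as ":ro" is simply skipped.
classify_mount_spec() {
  spec=$1
  case $spec in
    *:*)
      src=${spec%%:*}     # text before the first ':'
      rest=${spec#*:}     # text after the first ':'
      dst=${rest%%:*}     # container path, dropping any ':ro'-style options
      ;;
    *)
      src=
      dst=$spec
      ;;
  esac

  # The container-side path must always be absolute.
  case $dst in
    /*) ;;
    *) echo "invalid: container path '$dst' must be absolute"; return 1 ;;
  esac

  if [ -z "$src" ]; then
    echo "anonymous volume at $dst"
  else
    case $src in
      /*) echo "bind mount $src -> $dst" ;;
      *)  echo "named volume $src -> $dst" ;;
    esac
  fi
}
```

For example, `classify_mount_spec "$(pwd)/src:/src"` reports a bind mount of the project's src folder. `classify_mount_spec vol3` reports the same problem that makes `docker run -v vol3` fail.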
Putting it all together, assume we have ./src/Hello.java in our project folder on the host machine with the following contents:
public class Hello {
    public static void main(String... ignored) {
        System.out.println("Hello, World!");
    }
}
We build this Dockerfile (tagged my-openjdk again, with the same docker build command as before):
FROM openjdk:8u131-jdk-alpine
WORKDIR /src
ENTRYPOINT javac Hello.java && java Hello
We run this command:
docker run -v "$(pwd)/src:/src" my-openjdk
This prints "Hello, World!".
The best part is that we're free to edit the .java file to print a different message on the next run, without having to rebuild the image =)
Final remarks
I am quite new to Docker, and the aforementioned "tutorial" reflects information I gathered from a 3-day command line hackathon. I am almost ashamed I haven't been able to provide links to clear English-like documentation backing up my statements, but I honestly think this is due to a lack of documentation and not personal effort. I do know the examples work as advertised using my current setup which is "Windows 10 -> Vagrant 2.0.0 -> Docker 17.09.0-ce".
The tutorial does not solve the problem "how do we specify the container's path in the Dockerfile and let the run command only specify the host path". There might be a way; I just haven't found it.
Finally, I have a gut feeling that specifying VOLUME in the Dockerfile is not just uncommon, but that it's probably best practice to never use VOLUME at all, for two reasons. The first we have already identified: we cannot specify the host path. That is actually a good thing, because Dockerfiles should be agnostic to the specifics of a host machine. The second is that people might forget to use the --rm option when running the container. One might remember to remove the container but forget to remove the volume. Plus, even with the best of human memory, it can be a daunting task to figure out which of all the anonymous volumes are safe to remove.
To better understand the VOLUME instruction in a Dockerfile, let's look at a typical usage: the official MySQL image's Dockerfile.
VOLUME /var/lib/mysql
Reference: https://github.com/docker-library/mysql/blob/3362baccb4352bcf0022014f67c1ec7e6808b8c5/8.0/Dockerfile
/var/lib/mysql is the default location where MySQL stores its data files.
When you run a container for testing purposes only, you may omit the mount point, e.g.
docker run mysql:8
then the MySQL container will use the default mount path specified by the VOLUME instruction in the Dockerfile. The volume is created with a very long ID-like name inside the Docker root directory; this is called an "unnamed" or "anonymous" volume. On the underlying host system it lives under /var/lib/docker/volumes, for example:
/var/lib/docker/volumes/320752e0e70d1590e905b02d484c22689e69adcbd764a69e39b17bc330b984e4
This is convenient for quick tests because you don't need to specify a mount point. You still get the better performance of storing data in a volume rather than in the container's writable layer.
For real use, you should specify the mount source yourself, using either a named volume or a bind mount, e.g.
docker run -v /my/own/datadir:/var/lib/mysql mysql:8
The command mounts the /my/own/datadir directory from the underlying host system as /var/lib/mysql inside the container. The data directory /my/own/datadir won't be deleted automatically, even if the container is deleted.
For usage of the official MySQL image, see the "Where to Store Data" section:
Reference: https://hub.docker.com/_/mysql/
The official Docker tutorial says:
A data volume is a specially-designated directory within one or more containers that bypasses the Union File System. Data volumes provide several useful features for persistent or shared data:
- Volumes are initialized when a container is created. If the container’s base image contains data at the specified mount point, that existing data is copied into the new volume upon volume initialization. (Note that this does not apply when mounting a host directory.)
- Data volumes can be shared and reused among containers.
- Changes to a data volume are made directly.
- Changes to a data volume will not be included when you update an image.
- Data volumes persist even if the container itself is deleted.
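The first point can be modeled with two ordinary directories. A new volume is seeded with the image's content at the mount point, but a bind-mounted host directory is not. This is a hypothetical sketch of the rule as quoted, not Docker's implementation. init_volume and its arguments are invented for illustration. The sketch assumes, as Docker does, that the copy only happens while the volume is still empty.

```shell
#!/bin/sh
# Hypothetical model of volume initialization (not Docker's code).
# $1 = directory holding the image's content at the mount point
# $2 = directory that will be mounted there
# $3 = "volume" for a Docker-managed volume, "bind" for a host directory
init_volume() {
  image_dir=$1
  target_dir=$2
  kind=$3
  # Only an empty Docker-managed volume is seeded from the image;
  # a bind-mounted host directory is used exactly as it is.
  if [ "$kind" = volume ] && [ -z "$(ls -A "$target_dir")" ]; then
    cp -R "$image_dir"/. "$target_dir"/
  fi
}
```

Because the seeding happens only once, later changes to the image (the fourth point) never reach a volume that already holds data.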
In a Dockerfile you can specify only the destination of a volume inside a container, e.g. /usr/src/app.
When you run a container, e.g. docker run --volume=/opt:/usr/src/app my_image, you may, but do not have to, specify its mount point (/opt) on the host machine. If you do not specify the --volume argument, the mount point is chosen automatically, usually under /var/lib/docker/volumes/.
Specifying a VOLUME line in a Dockerfile configures a bit of metadata on your image, but how that metadata is used is important.
First, what did these two lines do:
WORKDIR /usr/src/app
VOLUME . /usr/src/app
The WORKDIR line creates the directory if it doesn't exist. It also updates image metadata so that all relative paths, along with the current directory for commands like RUN, will be in that location. The VOLUME line specifies two volumes: one is the relative path ., and the other is /usr/src/app; both just happen to be the same directory. Most often the VOLUME line contains only a single directory, but it can contain multiple directories, as you've done, or it can be a JSON-formatted array.
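For reference, here are the two spellings side by side. This sketch only writes them into a file with the hypothetical name Dockerfile.forms. It uses absolute paths, so there is no question of how relative paths are resolved. In the JSON-array form, the paths must be in double quotes.

```shell
#!/bin/sh
# Write a small Dockerfile (hypothetical file name) showing both VOLUME syntaxes.
cat > Dockerfile.forms <<'EOF'
FROM busybox
# Space-separated form: declares /vol1 and /vol2
VOLUME /vol1 /vol2
# JSON array form: declares /vol3 and /vol4 (double quotes required)
VOLUME ["/vol3", "/vol4"]
EOF
```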
You cannot specify a volume source in the Dockerfile: a common source of confusion when specifying volumes in a Dockerfile is trying to match the runtime syntax of a source and destination at image build time. This will not work. The Dockerfile can only specify the destination of the volume. It would be a trivial security exploit if someone could define the source of a volume. An attacker could update a common image on Docker Hub to mount the root directory into the container. They could then launch a background process inside the container as part of an entrypoint that adds logins to /etc/passwd, configures systemd to launch a bitcoin miner on next reboot, or searches the filesystem for credit cards, SSNs, and private keys to send off to a remote site.
What does the VOLUME line do? As mentioned, it sets some image metadata to say a directory inside the image is a volume. How is this metadata used? Every time you create a container from this image, Docker will force that directory to be a volume. If you do not provide a volume in your run command or compose file, the only option for Docker is to create an anonymous volume. This is a local named volume with a long unique ID for the name and no other indication of why it was created or what data it contains (anonymous volumes are where data goes to get lost). If you override the volume, pointing it to a named or host volume, your data will go there instead.
VOLUME breaks things: you cannot disable a volume once it is defined in a Dockerfile. More importantly, with the classic builder, the RUN command is implemented with temporary containers. Those temporary containers get a temporary anonymous volume, which is initialized with the contents of your image. Any writes inside the container from your RUN command are made to that volume. When the RUN command finishes, changes to the image are saved and changes to the anonymous volume are discarded. Because of this, I strongly recommend against defining a VOLUME inside the Dockerfile. It results in unexpected behavior for downstream users of your image who wish to extend it with initial data in the volume location.
How should you specify a volume? To specify where you want to include volumes with your image, provide a docker-compose.yml. Users can modify that file to adjust the volume location to their local environment. It also captures other runtime settings like publishing ports and networking.
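Here is a minimal sketch of such a file for the MySQL example discussed in this thread. It assumes the current Compose file format, which has no top-level version key. The service name db, the volume name mysql-data and the password value are placeholders. The script only writes the file; you would then start it with docker compose up.

```shell
#!/bin/sh
# Write a docker-compose.yml that puts MySQL's data in a named volume.
# db, mysql-data and the password are placeholders; change them for real use.
cat > docker-compose.yml <<'EOF'
services:
  db:
    image: mysql:8
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      # Named volume: kept by `docker compose down` unless -v is given
      - mysql-data:/var/lib/mysql
      # Alternative: bind mount a host directory instead
      # - /my/own/datadir:/var/lib/mysql
volumes:
  mysql-data:
EOF
```

A user who prefers a host directory only has to swap the commented line in. The image itself stays unchanged.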
Someone should document this! They have. Docker includes warnings on the VOLUME usage in their documentation on the Dockerfile along with advice to specify the source at runtime:
- Changing the volume from within the Dockerfile: If any build steps change the data within the volume after it has been declared, those changes will be discarded.
...
- The host directory is declared at container run-time: The host directory (the mountpoint) is, by its nature, host-dependent. This is to preserve image portability, since a given host directory can’t be guaranteed to be available on all hosts. For this reason, you can’t mount a host directory from within the Dockerfile. The VOLUME instruction does not support specifying a host-dir parameter. You must specify the mountpoint when you create or run the container.
The behavior of defining a VOLUME followed by RUN steps in a Dockerfile has changed with the introduction of buildkit. Here are two examples. First the Dockerfile:
$ cat df.vol-run
FROM busybox
WORKDIR /test
VOLUME /test
RUN echo "hello" >/test/hello.txt \
&& chown -R nobody:nobody /test
Next, building without buildkit. Note how the changes from the RUN step are lost:
$ DOCKER_BUILDKIT=0 docker build -t test-vol-run -f df.vol-run .
Sending build context to Docker daemon 23.04kB
Step 1/4 : FROM busybox
---> beae173ccac6
Step 2/4 : WORKDIR /test
---> Running in aaf2c2920ebd
Removing intermediate container aaf2c2920ebd
---> 7960bec5b546
Step 3/4 : VOLUME /test
---> Running in 9e2fbe3e594b
Removing intermediate container 9e2fbe3e594b
---> 5895ddaede1f
Step 4/4 : RUN echo "hello" >/test/hello.txt && chown -R nobody:nobody /test
---> Running in 2c6adff98c70
Removing intermediate container 2c6adff98c70
---> ef2c30f207b6
Successfully built ef2c30f207b6
Successfully tagged test-vol-run:latest
$ docker run -it test-vol-run /bin/sh
/test # ls -al
total 8
drwxr-xr-x 2 root root 4096 Mar 6 14:35 .
drwxr-xr-x 1 root root 4096 Mar 6 14:35 ..
/test # exit
And then building with buildkit. Note how the changes from the RUN step are preserved:
$ docker build -t test-vol-run -f df.vol-run .
[+] Building 0.5s (7/7) FINISHED
=> [internal] load build definition from df.vol-run 0.0s
=> => transferring dockerfile: 154B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 34B 0.0s
=> [internal] load metadata for docker.io/library/busybox:latest 0.0s
=> CACHED [1/3] FROM docker.io/library/busybox 0.0s
=> [2/3] WORKDIR /test 0.0s
=> [3/3] RUN echo "hello" >/test/hello.txt && chown -R nobody:nobody /test 0.4s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:8cb3220e3593b033778f47e7a3cb7581235e4c6fa921c5d8ce1ab329ebd446b6 0.0s
=> => naming to docker.io/library/test-vol-run 0.0s
$ docker run -it test-vol-run /bin/sh
/test # ls -al
total 12
drwxr-xr-x 2 nobody nobody 4096 Mar 6 14:34 .
drwxr-xr-x 1 root root 4096 Mar 6 14:34 ..
-rw-r--r-- 1 nobody nobody 6 Mar 6 14:34 hello.txt
/test # exit
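Sometimes you do need to ship initial data in a directory that is also declared as a VOLUME. One way to get the same result under both builders is to write the data before the VOLUME line. The data then sits in an ordinary image layer, and it should be copied into each new volume when a container starts. Below is a minimal sketch based on df.vol-run, using the hypothetical file name Dockerfile.portable.

```shell
#!/bin/sh
# Same steps as df.vol-run, but the RUN step comes before VOLUME,
# so the classic builder has no volume in place yet to discard the changes.
cat > Dockerfile.portable <<'EOF'
FROM busybox
WORKDIR /test
RUN echo "hello" >/test/hello.txt \
 && chown -R nobody:nobody /test
VOLUME /test
EOF

# Example (requires Docker):
#   DOCKER_BUILDKIT=0 docker build -t test-vol-run -f Dockerfile.portable .
#   docker run --rm test-vol-run ls -al /test
```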