Properly Versioning Docker Images
Docker gives no semantic meaning at all to tag values. A tag can be any string value at all, and tags can be reused. The only special tag value is that if you just say imagename in a docker pull or docker run command, it is automatically interpreted as imagename:latest.
Mechanically, you can give the same image multiple tags, but you need to docker push all of them. The expensive part of the push is the layer content and so this will mostly just push the fact of the alternate tag on an existing image. Similarly, pulling an image tag, if it's a duplicate of an image you already have, is all but free, but there's no easy way to find out all of the tags for a given image.
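As a rough sketch of the mechanics described above, the helper below applies several tags to one already-built image and pushes each of them. The function name tag_and_push and the DOCKER override variable are made up for illustration; docker tag and docker push are the only real commands involved.

```shell
# Hypothetical helper: tag one built image several times and push every tag.
# DOCKER defaults to the real CLI; set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# Usage: tag_and_push SOURCE_REF REPOSITORY TAG...
tag_and_push() {
  source_ref=$1
  repository=$2
  shift 2
  for tag in "$@"; do
    # Each tag is only a new name for the same image, so after the first
    # push the layers are already on the registry.
    "$DOCKER" tag "$source_ref" "$repository:$tag" || return 1
    "$DOCKER" push "$repository:$tag" || return 1
  done
}

# Example (commented out so the sketch has no side effects):
#   tag_and_push imagename:g1234567 imagename 1.2.3 master latest
```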
I would recommend:
- Give every build a unique identifier, something like a source control commit ID or a timestamp.
- If and when you do official releases, also tag builds of that release with the release number. (More generally, if the current source control commit is tagged, tag the Docker image with the source control tag.)
- If it's useful for your development workflow, also tag builds that are the tips of branches with their branch name.
- Given its prominence it's probably useful to tag something as latest (maybe the most recent release).
- Avoid using latest and other tags that you expect to change when referring to built images (in docker run commands, Dockerfile FROM lines, Kubernetes pod specs, ...).
This combination of things could mean the same image is tagged imagename:g1234567, :1.2.3, :master, and :latest, and your CI system would need to do four docker pushes. You would probably expect the first two tags to be fairly constant, but the latter two to change routinely. You could then run something like imagename:1.2.3 with some confidence.
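One way a CI job might derive that set of tags is sketched below. The function image_tags and its argument order are hypothetical, and the "g" prefix is only there to mirror the imagename:g1234567 example; how you decide that a build deserves latest is left to the caller.

```shell
# Hypothetical helper: print the tags one build should receive, one per line.
# Empty arguments are skipped.
# Usage: image_tags COMMIT_ID [RELEASE_TAG] [BRANCH] [IS_LATEST]
image_tags() {
  commit_id=$1
  release_tag=${2:-}
  branch=${3:-}
  is_latest=${4:-}

  # Unique per-build identifier.
  echo "g$commit_id"

  # Release number, only when the commit carries a source control tag.
  if [ -n "$release_tag" ]; then
    echo "$release_tag"
  fi

  # Branch tip; "/" is not allowed in a Docker tag, so replace it.
  if [ -n "$branch" ]; then
    echo "$branch" | tr '/' '-'
  fi

  if [ "$is_latest" = "yes" ]; then
    echo "latest"
  fi
}

# Example inputs in a Git checkout (commented out so the sketch has no side effects):
#   commit_id=$(git rev-parse --short HEAD)
#   release_tag=$(git describe --tags --exact-match 2>/dev/null || true)
#   branch=$(git rev-parse --abbrev-ref HEAD)
#   image_tags "$commit_id" "$release_tag" "$branch" yes
```

Each printed tag would then need its own docker tag and docker push.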
(The one special case that comes to mind is a software package that changes rarely and so might need to be rebuilt if there are upstream fixes or security updates. It seems typical to reuse the same tag for this: for instance, ubuntu:18.04 gets updated every week or two.)
For me it's all about being able to tell what version of (my) software went into the Docker image. My recommendation is to use something like Git's short commit ID. I don't use latest as it carries no helpful context.
Build the Docker image with the Git version as the tag. The stable-package-name below is just the name of your application, like "hello-world" or anything you like (image names must be lowercase):
REV_TAG=$(git log -1 --pretty=format:%h)
docker build -t <stable-package-name>:$REV_TAG .
Later I push what I tagged to the remote repository:
# nominate the tagged image for deployment
docker tag <stable-package-name>:$REV_TAG <repository-name>:$REV_TAG
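# push the tagged image to the remote repository
# (name the tag explicitly; without one, current Docker versions push only :latest)
docker push <repository-name>:$REV_TAG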
I tag with the git commit hash and the build timestamp (concatenated).
This is simply because sometimes things change on the build server, which means the same code may be compiled differently, e.g. when the build server switches from compiling with Java 11 to Java 13.
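A minimal sketch of such a tag follows. The answer does not say which part comes first or what separates them, so the hash-then-timestamp order, the "-" separator, and the UTC timestamp format are all assumptions, as is the function name build_tag.

```shell
# Hypothetical helper: combine a short commit hash with a build timestamp.
# Usage: build_tag SHORT_HASH [TIMESTAMP]
build_tag() {
  short_hash=$1
  # Default to the current UTC time, e.g. 20240101120000.
  timestamp=${2:-$(date -u +%Y%m%d%H%M%S)}
  echo "${short_hash}-${timestamp}"
}

# Example (commented out so the sketch has no side effects):
#   docker build -t "imagename:$(build_tag "$(git rev-parse --short HEAD)")" .
```

Two builds of the same commit then get different tags, which is the point when the build environment itself may have changed.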
Images in Docker are referred to by a reference, the most common being an image repository and tag. That tag is a relatively free-form string that points to a specific image. Tags are best thought of as mutable pointers: a tag can be changed, multiple tags can point to the same image, and a tag can be deleted while the underlying image remains intact.
Since Docker does not enforce much structure on tags (other than verifying that a tag contains valid characters and does not exceed a length limit), imposing a scheme is left up to each repository maintainer, and many different solutions have resulted.
For repository maintainers, here are a few common implementations:
Option A: Ideally, repository maintainers follow some form of semver. This version number should map to the version of the packaged software, often with an additional patch number for the image revision. Importantly, images tagged this way should include tags not just for version 1.2.3-1, but also 1.2.3, 1.2, and 1, each of which is updated to the latest release within its respective hierarchy. This allows downstream users to depend on 1.2 and automatically get the updates for 1.2.4, 1.2.5, etc., as bug fixes and security updates come out.
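To illustrate the hierarchy in Option A, here is a small sketch that expands one full version into the tags that should move with it. The function name semver_tags is made up, and it assumes a three-part MAJOR.MINOR.PATCH version with an optional "-REVISION" suffix; it does not try to handle pre-release strings.

```shell
# Hypothetical helper: expand 1.2.3-1 into 1.2.3-1, 1.2.3, 1.2 and 1.
# Usage: semver_tags VERSION
semver_tags() {
  full=$1
  version=${full%%-*}   # drop the image revision suffix, if any
  minor=${version%.*}   # MAJOR.MINOR
  major=${version%%.*}  # MAJOR

  # Only print the full form when it actually has a revision suffix.
  if [ "$full" != "$version" ]; then
    echo "$full"
  fi
  echo "$version"
  echo "$minor"
  echo "$major"
}

# Example (commented out so the sketch has no side effects):
#   for tag in $(semver_tags 1.2.3-1); do
#     docker tag imagename:build "imagename:$tag"
#     docker push "imagename:$tag"
#   done
```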
Option B: Similar to the semver option above, many projects include other important metadata with their tags, e.g. which architecture, or base image, was used for that build. This is commonly seen with alpine vs debian/slim images, or arm vs amd compiled code. These will often be combined with semver, so you may see tags like alpine-1.5, in addition to alpine-1 and alpine tags.
Option C: Some projects follow more of a rolling release model that offers no backward compatibility promises. This is often done with build numbers or a date string, and indeed Docker itself uses this, though with a process to deprecate features and avoid breaking changes. I've seen quite a few internal projects within companies use this strategy to version their images, relying on the build number from a CI server.
Option D: I'm less of a fan of putting Git revision hashes as image tags since these convey no details without referring back to the Git repository. Not every user has the access or the skill to understand this reference. And by looking at two different hashes, I have no idea which is newer or compatible with my application without an external check. They also assume the sole important version number is from Git, and ignore that the same Git revision may be used to create multiple images, from different parent images, different architectures, or just multiple Dockerfiles/multistage targets within the same Git repo. Instead, I like using label schema, and eventually the image spec annotations once we get tooling around image annotations, to track details like Git revisions. This places the Git revision into metadata that you can query to verify an image, while still leaving the tag itself informative to users.
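A hedged sketch of the label approach from Option D: record the Git revision as an image label at build time and read it back later. The label key org.opencontainers.image.revision comes from the OCI image spec annotations and is my choice here, not something the answer prescribes; the function names and the DOCKER override variable are hypothetical.

```shell
# DOCKER defaults to the real CLI; set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: build an image and store the Git revision as a label.
# Usage: build_with_revision IMAGE_REF GIT_REVISION [CONTEXT_DIR]
build_with_revision() {
  image_ref=$1
  git_revision=$2
  context_dir=${3:-.}
  "$DOCKER" build \
    --label "org.opencontainers.image.revision=$git_revision" \
    -t "$image_ref" "$context_dir"
}

# Hypothetical helper: print the revision label of an existing image.
# Usage: show_revision IMAGE_REF
show_revision() {
  "$DOCKER" inspect \
    -f '{{ index .Config.Labels "org.opencontainers.image.revision" }}' "$1"
}

# Example (commented out so the sketch has no side effects):
#   build_with_revision imagename:1.2.3 "$(git rev-parse HEAD)"
#   show_revision imagename:1.2.3
```

The tag stays human-readable (1.2.3), while the exact commit can still be recovered from the image metadata.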
For image users, if you have a requirement to avoid unexpected changes from upstream, there are two options I know of.
The first is to run your own registry server, and pull your external dependencies to a local server. Docker includes an image for a standalone registry that you can install, and the API is open, which has allowed many artifact repository vendors to support the Docker registry protocol. Do take care to regularly update the images in this registry, and include a way to go back to previous versions if an update breaks your environment.
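A rough sketch of the first option, assuming the standalone registry image registry:2 listening on localhost:5000. The container name, port, and function names are illustrative choices, and a real deployment would also need storage, TLS, and authentication that are not shown here.

```shell
# DOCKER defaults to the real CLI; set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: start a standalone registry on port 5000.
start_registry() {
  "$DOCKER" run -d -p 5000:5000 --restart=always --name registry registry:2
}

# Hypothetical helper: copy an upstream image into the local registry.
# Usage: mirror_image UPSTREAM_REF [LOCAL_REGISTRY]
mirror_image() {
  upstream_ref=$1
  local_registry=${2:-localhost:5000}
  "$DOCKER" pull "$upstream_ref" || return 1
  "$DOCKER" tag "$upstream_ref" "$local_registry/$upstream_ref" || return 1
  "$DOCKER" push "$local_registry/$upstream_ref" || return 1
}

# Example (commented out so the sketch has no side effects):
#   start_registry
#   mirror_image debian:latest
#   docker run -it --rm localhost:5000/debian:latest /bin/bash
```

Re-running mirror_image on a schedule is one way to pick up upstream fixes on your own terms; mirroring under a dated tag as well would leave a previous version to roll back to.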
The second option is to stop depending on mutable tags. Instead, you can use image pinning, which refers to the image by the registry's sha256 digest of the manifest, a unique reference that cannot be changed. You can find this value in the RepoDigests when you inspect an image pulled from a registry server:
$ docker inspect -f '{{json .RepoDigests}}' debian:latest
["debian@sha256:de3eac83cd481c04c5d6c7344cd7327625a1d8b2540e82a8231b5675cef0ae5f"]
$ docker run -it --rm debian@sha256:de3eac83cd481c04c5d6c7344cd7327625a1d8b2540e82a8231b5675cef0ae5f /bin/bash
root@ac9db398dc03:/#
The biggest risk from binding to a specific image like this is missing security updates and important bug fixes. If you take this option, make sure to have a procedure to regularly update these images.
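One possible building block for such an update procedure is sketched below: pull a tag and print the digest reference it currently resolves to, so a scheduled job can compare it with the pinned value. The function name pinned_reference and the DOCKER override variable are hypothetical; the inspect template reads the first RepoDigests entry.

```shell
# DOCKER defaults to the real CLI; set DOCKER=echo to preview the commands.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: resolve a tag to its current digest reference.
# Usage: pinned_reference TAG_REF
pinned_reference() {
  tag_ref=$1
  # Refresh the local copy first; pull progress is discarded so that only
  # the digest reference is printed.
  "$DOCKER" pull "$tag_ref" >/dev/null || return 1
  "$DOCKER" inspect -f '{{index .RepoDigests 0}}' "$tag_ref"
}

# Example (commented out so the sketch has no side effects):
#   new_pin=$(pinned_reference debian:latest)
#   echo "FROM $new_pin"
```

If the printed reference differs from the one in your Dockerfile or deployment spec, that is the cue to test and roll out the newer image.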
Regardless of which solution you follow for pulling images, using latest is only useful for a quick developer test, not for any production use case. The behavior of latest depends entirely on the repository maintainer: some always update it to the last release, some make it the last stable release, and some forget to update it at all. If you depend on latest, you'll likely experience an outage when the upstream image moves from a version like 1.5 to 2.0 with backwards-incompatible changes. Your next deploy will inadvertently include these changes unless you explicitly depend on a tag that offers the promise of bug fixes and security patches without breaking changes.