Restarting an unhealthy docker container based on healthcheck
For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.
You can try put in your Dockerfile something like this:
HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1
Don't forget --restart always
option.
kill 1
will kill process with pid 1 in container and force container exit. Usually the process started by CMD or ENTRYPOINT has pid 1.
Unfortunally, this method likely don't change container's state to unhealthy, so be careful with it.
Restarting of unhealty container feature was in the original PR (https://github.com/moby/moby/pull/22719), but was removed after a discussion and considered to be done later as enhancement of RestartPolicy.
At this moment you can use this workaround to automatically restarting unhealty containers: https://hub.docker.com/r/willfarrell/autoheal/
Here is a sample compose file:
version: '2'
services:
autoheal:
restart: always
image: willfarrell/autoheal
environment:
- AUTOHEAL_CONTAINER_LABEL=all
volumes:
- /var/run/docker.sock:/var/run/docker.sock
Simply execute docker-compose up -d
on this
You can restart automatically an unhealthy container by setting a smart HEALTHCHECK and a proper restart policy.
The Docker restart policy should be one of always
or unless-stopped
.
The HEALTHCHECK instead should implement a logic that kills the container when it's unhealthy.
In the following example I used curl
with its internal retry mechanism and piped it (in case of failure/service unhealthy) to the kill
command.
HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'
The important step to understand here is that the retry logic is self-contained in the curl
command, the Docker retry here actually is mandatory but useless. Then if the curl
HTTP request fails 3 times, then kill
is executed. First it sends a SIGTERM to all the processes in the container, to allow them to gracefully stop, then after 10 seconds it sends a SIGKILL to completely kill all the processes in the container. It must be noted that when the PID1 of a container dies, then the container itself dies and the restart policy is invoked.
kill
docs: https://linux.die.net/man/1/killcurl
docs: https://curl.haxx.se/docs/manpage.htmldocker
restart docs: https://docs.docker.com/compose/compose-file/compose-file-v2/#restart
Gotchas: kill
behaves differently in bash than in sh. In bash you can use -1
to signal all the processes with PID greater than 1 to die.