Restarting an unhealthy docker container based on healthcheck

For standalone containers, Docker does not have native integration to restart the container on health check failure though we can achieve the same using Docker events and a script. Health check is better integrated with Swarm. With health check integrated to Swarm, when a container in a service is unhealthy, Swarm automatically shuts down the unhealthy container and starts a new container to maintain the container count as specified in the replica count of a service.

You can try put in your Dockerfile something like this:

HEALTHCHECK --interval=5s --timeout=2s CMD curl --fail http://localhost || kill 1

Don't forget --restart always option.

kill 1 will kill process with pid 1 in container and force container exit. Usually the process started by CMD or ENTRYPOINT has pid 1.

Unfortunally, this method likely don't change container's state to unhealthy, so be careful with it.

Restarting of unhealty container feature was in the original PR (https://github.com/moby/moby/pull/22719), but was removed after a discussion and considered to be done later as enhancement of RestartPolicy.

At this moment you can use this workaround to automatically restarting unhealty containers: https://hub.docker.com/r/willfarrell/autoheal/

Here is a sample compose file:

version: '2'
services:
  autoheal:
    restart: always
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

Simply execute docker-compose up -d on this

You can restart automatically an unhealthy container by setting a smart HEALTHCHECK and a proper restart policy.

The Docker restart policy should be one of always or unless-stopped.

The HEALTHCHECK instead should implement a logic that kills the container when it's unhealthy.

In the following example I used curl with its internal retry mechanism and piped it (in case of failure/service unhealthy) to the kill command.

HEALTHCHECK --interval=5m --timeout=2m --start-period=45s \
   CMD curl -f --retry 6 --max-time 5 --retry-delay 10 --retry-max-time 60 "http://localhost:8080/health" || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'

The important step to understand here is that the retry logic is self-contained in the curl command, the Docker retry here actually is mandatory but useless. Then if the curl HTTP request fails 3 times, then kill is executed. First it sends a SIGTERM to all the processes in the container, to allow them to gracefully stop, then after 10 seconds it sends a SIGKILL to completely kill all the processes in the container. It must be noted that when the PID1 of a container dies, then the container itself dies and the restart policy is invoked.

kill docs: https://linux.die.net/man/1/kill
curl docs: https://curl.haxx.se/docs/manpage.html
docker restart docs: https://docs.docker.com/compose/compose-file/compose-file-v2/#restart

Gotchas: kill behaves differently in bash than in sh. In bash you can use -1 to signal all the processes with PID greater than 1 to die.

Restarting an unhealthy docker container based on healthcheck

Tags:

Docker

Health Monitoring

Related

Recent Posts