Spring Boot app + Kubernetes liveness/readiness checks
ReadinessProbe - is the app ready to handle requests?
Use a health check to determine whether the app is ready to handle new requests. This can be implemented with /actuator/health. Also see StartupProbe below.
Under high load?
If your app is under high load, it may not be able to respond to the health check in time, causing the ReadinessProbe to fail. Consider using a Horizontal Pod Autoscaler to get more replicas to handle the load.
LivenessProbe - is the app deadlocked?
If your app is in an unrecoverable state, it is best if it can terminate itself, e.g. using java.lang.System.exit(1). If the app can become deadlocked and unable to proceed, consider implementing an endpoint for the LivenessProbe; this may be the same endpoint as for the ReadinessProbe.
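One possible signal for such a liveness endpoint, sketched with plain JDK classes, is the JVM's own deadlock detection. The class name DeadlockLivenessCheck and its methods are hypothetical; only ThreadMXBean.findDeadlockedThreads() is a real JDK call. This only catches threads stuck in a lock cycle (monitors or ownable synchronizers), not threads blocked on I/O or a slow dependency, so treat it as one input rather than a complete check.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: reports "not alive" when the JVM sees deadlocked threads. */
final class DeadlockLivenessCheck {

    private DeadlockLivenessCheck() {
    }

    /** Returns how many threads the JVM currently reports as deadlocked. */
    static int deadlockedThreadCount() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // findDeadlockedThreads() returns null when no deadlock is detected.
        long[] ids = threads.findDeadlockedThreads();
        return ids == null ? 0 : ids.length;
    }

    /** A liveness endpoint could answer 200 while this is true and 5xx otherwise. */
    static boolean isAlive() {
        return deadlockedThreadCount() == 0;
    }
}
```

If isAlive() turns false, the app could either fail its liveness endpoint and let the kubelet restart the container, or terminate itself as described above.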
Not responding to readiness for a long time
If your app hasn't responded to the ReadinessProbe for a long time, e.g. many minutes, something is probably wrong (unless you expect this to happen for your app). In that case you should probably also use /actuator/health as your LivenessProbe, but with a higher failureThreshold and a high initialDelaySeconds (e.g. a few minutes).
StartupProbe - better alternative on Kubernetes 1.16+
The ReadinessProbe is most useful during app startup, since the app may need to load e.g. data before it is ready to receive requests, but the ReadinessProbe is also executed periodically during the whole pod lifecycle. StartupProbe is now a better alternative for slow-starting apps, in combination with a LivenessProbe that only becomes active after the StartupProbe has succeeded. You may still need a ReadinessProbe to signal that the pod is ready to handle requests.
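When sizing a StartupProbe, it helps to know how long the app is given before the container is restarted. As a rough rule, that budget is failureThreshold multiplied by periodSeconds (ignoring initialDelaySeconds and per-request timeouts). The helper below is a hypothetical sketch for doing that arithmetic; ProbeBudget is not a Kubernetes or Spring API.

```java
/** Hypothetical helper for reasoning about probe timing; not a Kubernetes API. */
final class ProbeBudget {

    private ProbeBudget() {
    }

    /** Approximate seconds of probing before the probe is considered failed. */
    static int maxSeconds(int periodSeconds, int failureThreshold) {
        if (periodSeconds <= 0 || failureThreshold <= 0) {
            throw new IllegalArgumentException("period and threshold must be positive");
        }
        return Math.multiplyExact(periodSeconds, failureThreshold);
    }

    /** Smallest failureThreshold whose budget covers the worst-case startup time. */
    static int thresholdFor(int worstCaseStartupSeconds, int periodSeconds) {
        if (worstCaseStartupSeconds < 0 || periodSeconds <= 0) {
            throw new IllegalArgumentException("invalid startup time or period");
        }
        // Ceiling division, with a floor of 1 because the threshold must be positive.
        int threshold = (worstCaseStartupSeconds + periodSeconds - 1) / periodSeconds;
        return Math.max(1, threshold);
    }
}
```

For example, an app that may need up to five minutes to start, probed every 10 seconds, would need a failureThreshold of about ProbeBudget.thresholdFor(300, 10).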
Depending on other services
If your app depends on other services that are not healthy, it is better if your app can recover from those situations once the backing service is up again, e.g. by reconnecting. Otherwise you get a domino effect: a whole chain of services stops responding to ReadinessProbe or LivenessProbe because the last app in the chain has a problem. Consider providing degraded service and signaling that you are not in full service; some of your endpoints may still work correctly.
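A minimal sketch of the degraded-service idea follows, assuming a hypothetical DependencyAwareStatus class and a three-valued status that are not part of Spring Boot or Kubernetes. The point it illustrates is that an unreachable backing service changes what the app reports, but does not by itself make the probe fail.

```java
import java.util.Map;

/** Hypothetical status aggregation: a failed dependency degrades, it does not kill. */
final class DependencyAwareStatus {

    enum Status { UP, DEGRADED, DOWN }

    private DependencyAwareStatus() {
    }

    /**
     * @param appHealthy   whether this app's own internal state is fine
     * @param dependencies dependency name mapped to "currently reachable?"
     */
    static Status overall(boolean appHealthy, Map<String, Boolean> dependencies) {
        if (!appHealthy) {
            return Status.DOWN;
        }
        boolean allReachable = dependencies.values().stream().allMatch(Boolean::booleanValue);
        return allReachable ? Status.UP : Status.DEGRADED;
    }

    /** Only DOWN fails the probe; DEGRADED still answers with HTTP 200. */
    static int httpStatus(Status status) {
        return status == Status.DOWN ? 503 : 200;
    }
}
```

With this split, a broken database would show up as DEGRADED in monitoring while the pod keeps serving the endpoints that still work.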
Use Management Server Port
It is the kubelet on the same node that sends the probe requests. Consider using a Management Server Port for probes. You don't need to expose this port to the Service; it is better to use one port for http and another for management.
Cloud provider Load Balancer Service health check
If you are using a Cloud Provider Load Balancer, it may do health checks on your services, and you may need to configure the path it sends health checks to, e.g. Google Cloud Platform defaults to /. This is a health check for the Service, not for the individual Pod.
As of Spring Boot 2.3, the Availability state of the application (including Liveness and Readiness) is supported in the core and can be exposed as Kubernetes Probes with Actuator.
Your question is spot on and this was discussed at length in the Spring Boot issue for the Liveness/Readiness feature.
The /health endpoint was never really designed to expose the application state and drive how the cloud platform treats the app instance and routes traffic to it. It's been used that way quite a lot since Spring Boot didn't have anything better to offer here.
The Liveness should only fail when the internal state of the application is broken and we cannot recover from it. As you've underlined in your question, failing here as soon as an external system is unavailable can be dangerous: the platform might recycle all application instances depending on that external system (maybe all of them?) and cause cascading failures, since other systems might be depending on that application as well.
By default, the liveness probe will reply with "Success" unless the application itself changed that internal state.
The Readiness probe is really about the ability for the application to serve traffic. As you've mentioned, some health checks might show the state of essential parts of the application, some others not. Spring Boot will synchronize the Readiness state with the lifecycle of the application (the web app has started, the graceful shutdown has been requested and we shouldn't route traffic anymore, etc). There is a way to configure a "readiness" health group to contain a custom set of health checks for your particular use case.
I disagree with a few statements in the answer that received the bounty, especially because a lot changed in Spring Boot since:
- You should not use /actuator/health for Liveness or Readiness probes as of Spring Boot 2.3.0.
- With the new Spring Boot lifecycle, you should move all the long-running startup tasks into ApplicationRunner beans. They will be executed after Liveness is Success, but before Readiness is Success. If the application startup is still too slow for the configured probes, you should then use the StartupProbe with a longer timeout and point it to the Liveness endpoint.
- Using the management port can be dangerous, since it's using a separate web infrastructure. For example, the probes exposed on the management port might be OK but the main connector (serving the actual traffic to clients) might be overwhelmed and unable to serve more traffic. Reusing the same server and web infrastructure for the probes can be safer in some cases.
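The ordering described in the second point can be pictured with a small framework-free model. This is not Spring Boot code: the class StartupOrderModel and its methods are hypothetical, and only the enum constant names are chosen to mirror Spring Boot's LivenessState and ReadinessState values. It shows that startup tasks run while the app already counts as live but is still refusing traffic.

```java
import java.util.List;

/** Framework-free model of the startup ordering; an illustration, not Spring Boot itself. */
final class StartupOrderModel {

    enum Liveness { BROKEN, CORRECT }

    enum Readiness { REFUSING_TRAFFIC, ACCEPTING_TRAFFIC }

    private Liveness liveness = Liveness.BROKEN;
    private Readiness readiness = Readiness.REFUSING_TRAFFIC;

    Liveness liveness() {
        return liveness;
    }

    Readiness readiness() {
        return readiness;
    }

    /** Runs the long-running startup tasks between the two state changes. */
    void start(List<Runnable> startupTasks) {
        // Step 1: the app is considered live as soon as its core has started.
        liveness = Liveness.CORRECT;
        // Step 2: startup tasks (the role ApplicationRunner beans play) run now,
        // while traffic is still being refused.
        for (Runnable task : startupTasks) {
            task.run();
        }
        // Step 3: only after all tasks finish does the app accept traffic.
        readiness = Readiness.ACCEPTING_TRAFFIC;
    }
}
```

In this model, a slow task delays only the readiness change, which is why a liveness-based StartupProbe is not held up by long-running startup work.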
For more information about this new feature, you can read the dedicated Kubernetes Liveness and Readiness Probes with Spring Boot blog post.