Prevent Spring Boot application closing until all current requests are finished
You could increase the terminationGracePeriodSeconds
, the default is 30 seconds. But unfortunately, there's nothing to prevent a cluster admin from force deleting your pod, and there's all sorts of reasons the whole node could go away.
We did a combination of the above to resolve our problem.
- increased the terminationGracePeriodSeconds to the absolute maximum we expect to see in production
- added livenessProbe to prevent Traefik routing to our pod too soon
- introduced a pre-stop hook injecting a pause and invoking a monitoring script:
- Monitored netstat for ESTABLISHED connections to our process (pid 1) with a Foreign Address of our cluster Traefik service
- sent TERM to pid 1
Note that because we send TERM to pid 1 from the monitoring script the pod will terminate at this point and the terminationGracePeriodSeconds never gets hit (it's there as a precaution)
Here's the script:
#!/bin/sh
while [ "$(/bin/netstat -ap 2>/dev/null | /bin/grep http-alt.*ESTABLISHED.*1/java | grep -c traefik-ingress-service)" -gt 0 ]
do
sleep 1
done
kill -TERM 1
Here's the new pod spec:
containers:
- env:
- name: spring_profiles_active
value: dev
image: container.registry.host/project/app:@@version@@
imagePullPolicy: Always
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- sleep 5 && /monitoring.sh
livenessProbe:
httpGet:
path: /actuator/health
port: 8080
initialDelaySeconds: 60
periodSeconds: 20
timeoutSeconds: 3
name: app
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /actuator/health
port: 8080
initialDelaySeconds: 60
resources:
limits:
cpu: 2
memory: 2Gi
requests:
cpu: 2
memory: 2Gi
imagePullSecrets:
- name: app-secret
serviceAccountName: vault-auth
terminationGracePeriodSeconds: 86400