How to fix "process apparently never started in ..." error in Jenkins pipeline?
I had the same problem, and in my case it was related to the -u <user> argument passed to the agent. In the end, changing my pipeline to use -u root fixed the problem.
In the original post, I notice that -u ubuntu was used to run the container:
docker run -t -d -u 1002:1006 -u ubuntu ... -e ******** quay.io/arubadevops/acp-build:ut-build cat
I was also using a custom user, one I added when building the Docker image:
agent {
    docker {
        image "app:latest"
        args "-u someuser"
        alwaysPull false
        reuseNode true
    }
}
steps {
    sh '''
        # DO STUFF
    '''
}
Starting the container locally using the same Jenkins commands works OK:
docker run -t -d -u 1000:1000 -u someuser app:image cat
docker top <hash> -eo pid,comm
docker exec -it <hash> ls # DO STUFF
But in Jenkins, the sh step fails with the same "process apparently never started in ..." error:
$ docker run -t -d -u 1000:1000 -u someuser app:image cat
$ docker top <hash> -eo pid,comm
[Pipeline] {
[Pipeline] unstash
[Pipeline] sh
process apparently never started in /home/jenkins/agent/workspace/branch@tmp/durable-f5dfbb1c
For some reason, changing it to -u root worked:
agent {
    docker {
        image "app:latest"
        args "-u root"  // <=-----------
        alwaysPull false
        reuseNode true
    }
}
The issue is caused by some breaking changes introduced in the Jenkins durable-task plugin v1.31.
Source:
https://issues.jenkins-ci.org/browse/JENKINS-59907 and https://github.com/jenkinsci/durable-task-plugin/blob/master/CHANGELOG.md
Solution: Upgrading the Jenkins durable-task plugin to v1.33 resolved the issue for us.
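To check which version of the plugin is installed before and after upgrading, something like the following can be run in the Jenkins Script Console. It is a minimal sketch that assumes the plugin's short name is durable-task and that you have administrator access to the Script Console.

```groovy
// Run in Manage Jenkins > Script Console (requires admin rights).
import jenkins.model.Jenkins

// Assumption: 'durable-task' is the plugin's short name (its artifact ID).
def plugin = Jenkins.instance.pluginManager.getPlugin('durable-task')

// getPlugin returns null when the plugin is not installed.
println(plugin ? "durable-task ${plugin.version}" : 'durable-task is not installed')
```

If the reported version is in the affected range, upgrading through the plugin manager and restarting Jenkins is the step described above.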
This error means the Jenkins process is stuck on some command.
Some suggestions:
- Upgrade all of your plugins and re-try.
- Make sure you have the right number of executors and that jobs aren't stuck in the queue.
- If you're pulling the image (not using a local one), try adding alwaysPull true (on the line after image).
- When using agent inside stage, remove the outer agent. See: JENKINS-63449.
- Execute org.jenkinsci.plugins.durabletask.BourneShellScript.LAUNCH_DIAGNOSTICS=true in Jenkins's Script Console to debug.
- When the process is stuck, SSH to the Jenkins VM and run docker ps to see which command is running.
- Run docker ps -a to see the latest failed runs. In my case it tried to run cat next to the custom CMD command set by the container (e.g. ansible-playbook cat), which was an invalid command. The cat command is used by design. To change the entrypoint, please read JENKINS-51307.
- If your container is still running, you can log in to your Docker container with docker exec -it -u0 $(docker ps -ql) bash and run ps wuax to see what it's doing.
- Try removing some global variables (could be a bug), see: parallel jobs not starting with docker workflow.
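A few of the suggestions above (no outer agent, alwaysPull true, and working around a custom entrypoint) could be combined in a declarative Jenkinsfile roughly like this. Treat it as a sketch under assumptions: the image name and stage name are hypothetical, and clearing the entrypoint with --entrypoint= is one common workaround rather than something the linked tickets prescribe.

```groovy
// Sketch only: image name, stage name, and the entrypoint override are
// illustrative assumptions, not values from the original post.
pipeline {
    agent none  // no outer agent; each stage declares its own
    stages {
        stage('Build') {
            agent {
                docker {
                    image 'registry.example.com/app:latest'  // hypothetical remote image
                    alwaysPull true                          // force a docker pull before running
                    args '--entrypoint='                     // clear a custom ENTRYPOINT so "cat" is not passed to it
                }
            }
            steps {
                sh 'id && pwd'  // trivial command to confirm that sh steps start at all
            }
        }
    }
}
```

If the trivial sh step starts with this setup but not with your original agent block, re-adding your original options one at a time should show which one triggers the error.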