Are .pid files reliable for determining whether a process is running?
Solution 1:
In short, no: a process (e.g. a daemon) can crash before it has a chance to remove its .pid file.
A more reliable way to determine the state of a program is to use an explicit communication channel, such as a socket: have the program write its socket's port number to a file, and have the supervisor process look it up and connect to it.
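A minimal bash sketch of the port-file idea follows. It assumes the program writes the TCP port it listens on to a file, and that it accepts connections on 127.0.0.1. The path `/var/run/mydaemon.port` and the function name `port_file_alive` are hypothetical. It relies on bash's `/dev/tcp` redirection, which plain sh does not have. Unlike a stale .pid file, a stale port file does no harm: if the program crashed, the connection is simply refused. A successful connection only proves that *something* is listening on that port, so a real supervisor would also exchange a short handshake to confirm it is talking to the right program.

```bash
#!/usr/bin/env bash
# Requires bash: /dev/tcp/HOST/PORT is a bash redirection feature, not a real file.

port_file_alive() {
    local port_file=$1 port
    # No readable port file: the program never started or has cleaned up after itself.
    [ -r "$port_file" ] || return 1
    port=$(cat "$port_file")
    # Accept only a plain decimal number before building the /dev/tcp path.
    case $port in
        '' | *[!0-9]*) return 1 ;;
    esac
    # Open a TCP connection in a subshell; the subshell exits with status 0
    # only if the connection was accepted. The socket closes when it exits.
    (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null
}

# Hypothetical port file location.
if port_file_alive /var/run/mydaemon.port; then
    echo "daemon is accepting connections"
else
    echo "daemon is not reachable"
fi
```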
On Linux you can also use the services of DBus: have the program register a specific bus name, and have your supervisor process (whatever you call it) check whether that name is currently owned.
There are numerous techniques.
One thing to remember: it is not the operating system's responsibility to manage PID files.
Solution 2:
Jldupont is correct in stating that .pid files are not reliable for determining whether a process is running as the file may not be removed in the event of a crash.
Race conditions aside, I often use pgrep when I need to know if a process is running. I could then cross-reference the output against the .pid file(s) if I felt it necessary.
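The cross-referencing step could look like the following sh sketch. It assumes a procps-style `pgrep` and uses `local`, which is not strictly POSIX but is supported by dash, bash and busybox sh; the function name `pid_matches_pgrep` is made up for illustration. The process counts as running only if the PID in the file belongs to a process with the expected name, which also disambiguates several instances of the same program. Like any check of this kind, it is racy: the process can exit right after the check succeeds.

```sh
#!/bin/sh
# Cross-check a pid file against pgrep. The pid file path and process name
# used at the bottom are examples; pass your own.

pid_matches_pgrep() {
    local pidfile="$1" name="$2" pid
    # No readable pid file: nothing to cross-check.
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # pgrep -x prints the PID of every process whose name is exactly $name,
    # one per line; grep -Fqx succeeds only if the recorded PID is among them.
    pgrep -x "$name" | grep -Fqx "$pid"
}

if pid_matches_pgrep /var/run/sshd.pid sshd; then
    echo "sshd is running with the PID recorded in /var/run/sshd.pid"
else
    echo "no running sshd matches /var/run/sshd.pid"
fi
```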
Solution 3:
A file containing a process ID is not a reliable way to determine whether a process is running. It is only a reliable record of the last process ID the process was given.
Once you have the process ID, you still have to check whether the process is really running.
Here is an example:
#!/usr/bin/env sh
file="/var/run/sshd.pid"
if [ ! -f "${file}" ]; then
    echo "File does not exist: ${file}"
    exit 1
fi
if [ ! -r "${file}" ]; then
    echo "Insufficient file permissions: ${file}"
    exit 1
fi
# Read the PID only after the file is known to exist and be readable.
processid=$(cat "${file}")
# ps -p selects just that PID; -o comm= prints only its command name, without a header.
# ps exits with a non-zero status when no process has that PID.
if psoutput=$(ps -p "${processid}" -o comm=); then
    if [ "${psoutput}" = "sshd" ]; then
        echo "sshd process is really running with process id ${processid}"
        exit 0
    else
        echo "given process id ${processid} is not sshd: ${psoutput}"
        exit 1
    fi
else
    echo "there is no process running with process id ${processid}"
    exit 1
fi
pgrep is a nice command, but it gets you into trouble when multiple instances are running. For example, if a regular sshd is listening on TCP port 22 and another sshd is listening on TCP port 2222, pgrep returns two process IDs when you search for sshd. If the normal sshd keeps its PID in /var/run/sshd.pid and the other keeps its PID in /var/run/sshd-other.pid, you can clearly tell the processes apart.
I do not recommend piping plain ps through one or more grep and grep -v stages to filter out everything that does not interest you; it is a bit like using
find . | grep myfile
to figure out whether a file exists.
Solution 4:
It is not reliable to simply check whether a process exists with the PID contained in the file, because the kernel may have reused that PID for an unrelated process.
However, many pid file implementations also lock the pid file, so that if the process dies, the lock goes away. Provided the locking mechanism is reliable, checking whether the file is still locked is a relatively reliable way to determine whether the original process is still running.
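One way to do this from the shell is sketched below. It uses the util-linux `flock` command and assumes the daemon takes its lock with flock(2). On Linux, locks taken with `fcntl()` or `lockf()` are a separate kind of lock and will not be seen by `flock(1)`, so this only works if every party uses the same mechanism. The function names `acquire_pidfile` and `pidfile_locked` are hypothetical. The kernel releases a flock lock when the last file descriptor referring to it is closed, which includes the process crashing.

```sh
#!/bin/sh
# Sketch of a locked pid file using util-linux flock(1). Only works if every
# party uses flock(2)-style locks; fcntl()/lockf() locks are separate on Linux.

# Daemon side: call once at startup and keep fd 9 open for the process lifetime.
acquire_pidfile() {
    local pidfile="$1"
    # Open in append mode so a running instance's PID is not wiped before we
    # know whether we own the lock.
    exec 9>>"$pidfile" || return 1
    # -n: fail at once instead of waiting if another process holds the lock.
    if ! flock -n 9; then
        echo "already running: $pidfile is locked" >&2
        return 1
    fi
    # We own the lock, so it is safe to replace the contents with our PID.
    : > "$pidfile"
    echo "$$" >&9
}

# Supervisor side: exit status 0 = locked (owner alive), 1 = not locked
# (owner gone or never started), 2 = could not check.
pidfile_locked() {
    local pidfile="$1"
    # flock(1) would create a missing file, so test for it first.
    [ -e "$pidfile" ] || return 1
    # Try to take the lock ourselves; it is released again as soon as 'true' exits.
    flock -n "$pidfile" true
    case $? in
        0) return 1 ;;  # we got the lock, so nobody else holds it
        1) return 0 ;;  # conflict: another process holds the lock
        *) return 2 ;;  # flock itself failed, e.g. unreadable file
    esac
}
```

A daemon script would run `acquire_pidfile /var/run/mydaemon.pid || exit 1` near the top (the path is hypothetical), and a supervisor would run `pidfile_locked /var/run/mydaemon.pid`. Two caveats: child processes inherit fd 9, so a child that outlives the daemon keeps the lock held unless it closes that descriptor (`9>&-`); and the check briefly takes the lock itself, so a daemon starting at that exact moment would find the file locked and refuse to start.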