Patterns for PHP multi processes?

From your php script, you could launch another script (using exec) to do the processing. Save status updates in a text file, which could then be read periodically by the parent thread.

Note: to avoid php waiting for the exec'd script to complete, pipe the output to a file:

exec('/path/to/file.php | output.log');

Alternatively, you can fork a script using the PCNTL functions. This uses one php script, which when forked can detect whether it is the parent or the child and operate accordingly. There are functions to send/receive signals for the purpose of communicating between parent/child, or you have the child log to a file and the parent read from that file.

From the pcntl_fork manual page:

$pid = pcntl_fork();
if ($pid == -1) {
     die('could not fork');
} else if ($pid) {
     // we are the parent
     pcntl_wait($status); //Protect against Zombie children
} else {
     // we are the child
}

If your goal is minimal time - the solution is simple to describe, but not that simple to implement.

You need to find a pattern to divide the work (You don't provide much information in the question in this regard).

Then use one master process that forks children to do the work. As a rule the total number of processes you use should be between n and 2n, where n is the number of cores the machine has.

Assuming this data will be stored in files you might consider using non-blocking IO to maximize the throughput. Not doing so will make most of your process spend time waiting for the disk. PHP has stream_select() that might help you. Note that using it is not trivial.

If you decide not to use select - increasing the number of processes might help.


In regards to pcntl functions: I've written a deamon with them (a proper one with forking, changing session id, the running user, etc...) and it's one of the most reliable piece of software I've written. Because it spawns workers for every task, even if there is a bug in one of the tasks, it does not affect the others.


This might be a good time to consider using a message queue, even if you run it all on one machine.