Python, running command line tools in parallel

Use the Pool object from the multiprocessing module. You can then use e.g. Pool.map() to do parallel processing. An example would be my markphotos script (see below), where a function is called multiple times in parallel to each process a picture.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Adds my copyright notice to photos.
#
# Author: R.F. Smith <[email protected]>
# $Date: 2012-10-28 17:00:24 +0100 $
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to markphotos.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

import sys
import subprocess
from multiprocessing import Pool, Lock
from os import utime, devnull
import os.path
from time import mktime

globallock = Lock() 

def processfile(name):
    """Adds copyright notice to the file.

    Arguments:
    name -- file to modify
    """
    args = ['exiftool', '-CreateDate', name]
    createdate = subprocess.check_output(args)
    fields = createdate.split(":") #pylint: disable=E1103
    year = int(fields[1])
    cr = "R.F. Smith <[email protected]> http://rsmith.home.xs4all.nl/"
    cmt = "Copyright © {} {}".format(year, cr)
    args = ['exiftool', '-Copyright="Copyright (C) {} {}"'.format(year, cr),
            '-Comment="{}"'.format(cmt), '-overwrite_original', '-q', name]
    rv = subprocess.call(args)
    modtime = int(mktime((year, int(fields[2]), int(fields[3][:2]),
                          int(fields[3][3:]), int(fields[4]), int(fields[5]),
                          0,0,-1)))
    utime(name, (modtime, modtime))
    globallock.acquire()
    if rv == 0:
        print "File '{}' processed.".format(name)
    else:
        print "Error when processing file '{}'".format(name)
    globallock.release()

def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.

    Arguments:
    args -- list of commands to pass to subprocess.call.
    """
    if isinstance(args, str):
        args = args.split()
    try:
        with open(devnull, 'w') as f:
            subprocess.call(args, stderr=subprocess.STDOUT, stdout=f)
    except:
        print "Required program '{}' not found! exiting.".format(args[0])
        sys.exit(1)

def main(argv):
    """Main program.

    Arguments:
    argv -- command line arguments
    """
    if len(argv) == 1:
        binary = os.path.basename(argv[0])
        print "Usage: {} [file ...]".format(binary)
        sys.exit(0)
    checkfor(['exiftool',  '-ver'])
    p = Pool()
    p.map(processfile, argv[1:])
    p.close()

if __name__ == '__main__':
    main(sys.argv)

If you want to run commandline tools as separate processes, just use os.system (or better: The subprocess module) to start them asynchronously. On Unix/linux/macos:

subprocess.call("command -flags arguments &", shell=True)

On Windows:

subprocess.call("start command -flags arguments", shell=True)

As for knowing when a command has finished: Under unix you could get set up with wait etc., but if you're writing the commandline scripts, I'd just have them write a message into a file, and monitor the file from the calling python script.

@James Youngman proposed a solution to your second question: Synchronization. If you want to control your processes from python, you could start them asynchronously with Popen.

p1 = subprocess.Popen("command1 -flags arguments")
p2 = subprocess.Popen("command2 -flags arguments")

Beware that if you use Popen and your processes write a lot of data to stdout, your program will deadlock. Be sure to redirect all output to a log file.

p1 and p2 are objects that you can use to keep tabs on your processes. p1.poll() will not block, but will return None if the process is still running. It will return the exit status when it is done, so you can check if it is zero.

while True:
    time.sleep(60)
    for proc in [p1, p2]:
        status = proc.poll()
        if status == None:
            continue
        elif status == 0:
            # harvest the answers
        else:
            print "command1 failed with status", status

The above is just a model: As written, it will never exit, and it will keep "harvesting" the results of completed processes. But I trust you get the idea.

Python, running command line tools in parallel

Tags:

Python

Shell

Command Line

Parallel Processing

Related

Recent Posts