Converting PDF to images automatically
If the PDFs are truly scanned images, then you shouldn't convert the PDF to an image; you should extract the image from the PDF. Most likely, nearly all of the data in the PDF is the scanned image itself (typically one per page), wrapped in PDF verbosity so that Acrobat can display it.
Try the simple expedient of finding the image in the PDF and copying its bytes out: Extracting JPGs from PDFs. The code there is dead simple, and there are probably dozens of reasons it won't work on your PDF files. But if it does, you'll have a quick and painless way to get the image data out of them.
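A minimal sketch of that byte-copying idea in Python (this is not the linked code itself, just the same approach). It assumes each scanned image is stored as a plain DCTDecode (JPEG) stream with no extra filter, so everything from the JPEG start marker (FF D8) to the end marker (FF D9) is a complete JPEG file. The name extract_jpegs and the jpgN.jpg output names are illustrative. Images stored with other filters (Flate, CCITT, JBIG2) won't start with FF D8 and are simply skipped.

```python
import sys
from pathlib import Path

JPEG_START = b"\xff\xd8"  # JPEG start-of-image (SOI) marker
JPEG_END = b"\xff\xd9"    # JPEG end-of-image (EOI) marker


def extract_jpegs(pdf_bytes: bytes) -> list[bytes]:
    """Return the raw bytes of every JPEG found directly inside a PDF stream.

    Heuristic only: assumes each image is an unfiltered DCTDecode stream,
    i.e. the stream body is already a complete JPEG file.
    """
    jpegs = []
    pos = 0
    while True:
        # Find the next "stream" keyword that opens a stream body.
        stream_at = pdf_bytes.find(b"stream", pos)
        if stream_at < 0:
            break
        # A JPEG body starts with the SOI marker right after the keyword.
        start = pdf_bytes.find(JPEG_START, stream_at, stream_at + 20)
        if start < 0:
            pos = stream_at + len(b"stream")
            continue
        end_stream = pdf_bytes.find(b"endstream", start)
        if end_stream < 0:
            break  # truncated file: no end of stream
        # The last EOI marker before "endstream" ends the JPEG.
        end = pdf_bytes.rfind(JPEG_END, start, end_stream)
        if end < 0:
            pos = end_stream + len(b"endstream")
            continue
        jpegs.append(pdf_bytes[start:end + len(JPEG_END)])
        pos = end_stream + len(b"endstream")
    return jpegs


def main(pdf_path: str) -> None:
    data = Path(pdf_path).read_bytes()
    for n, jpg in enumerate(extract_jpegs(data)):
        out = Path(f"jpg{n}.jpg")
        out.write_bytes(jpg)
        print(f"wrote {out} ({len(jpg)} bytes)")


if __name__ == "__main__":
    main(sys.argv[1])
```

Run it as `python extract_jpegs.py scan.pdf`. If the output files won't open, the images are probably stored with a different filter, and a real PDF library is the better route.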
You could call e.g. pdftoppm from the command line (or via Python's subprocess module) and then convert the resulting PPM files to the desired format using e.g. ImageMagick (again via subprocess, or through bindings if any exist).
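A sketch of that pipeline driven from Python's subprocess module, assuming both pdftoppm (from Poppler) and ImageMagick's convert are on the PATH. The helper names, the 150 DPI default, and the output layout are illustrative assumptions. pdftoppm writes one `<prefix>-<page>.ppm` file per page; with ImageMagick 7 the preferred command name is magick rather than convert.

```python
import subprocess
import sys
from pathlib import Path


def pdftoppm_command(pdf_path: Path, out_prefix: Path, dpi: int = 150) -> list[str]:
    """Build a pdftoppm command that renders every page to <out_prefix>-<page>.ppm."""
    # -r sets the rendering resolution in DPI.
    return ["pdftoppm", "-r", str(dpi), str(pdf_path), str(out_prefix)]


def convert_command(ppm_path: Path, fmt: str = "png") -> list[str]:
    """Build an ImageMagick command converting one PPM file to the given format."""
    return ["convert", str(ppm_path), str(ppm_path.with_suffix("." + fmt))]


def pdf_to_images(pdf_path: str, out_dir: str, fmt: str = "png", dpi: int = 150) -> list[Path]:
    pdf = Path(pdf_path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    prefix = out / pdf.stem
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(pdftoppm_command(pdf, prefix, dpi), check=True)
    results = []
    # Page-number padding is consistent within one run, so sorting keeps page order.
    for ppm in sorted(out.glob(f"{pdf.stem}-*.ppm")):
        subprocess.run(convert_command(ppm, fmt), check=True)
        ppm.unlink()  # remove the intermediate PPM file
        results.append(ppm.with_suffix("." + fmt))
    return results


if __name__ == "__main__":
    for image in pdf_to_images(sys.argv[1], sys.argv[2]):
        print(image)
```

For example, `python pdf_to_images.py scan.pdf pages` should leave one PNG per page in `pages/`, named after the PDF. Recent Poppler versions can also write PNG directly with `pdftoppm -png`, which would make the ImageMagick step unnecessary.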