Extract a page from a pdf as a jpeg
The pdf2image library can be used.
You can install it simply using,
pip install pdf2image
Once installed you can use following code to get images.
from pdf2image import convert_from_path
pages = convert_from_path('pdf_file', 500)
Saving pages in jpeg format
for page in pages:
page.save('out.jpg', 'JPEG')
Edit: the Github repo pdf2image also mentions that it uses pdftoppm
and that it requires other installations:
pdftoppm is the piece of software that does the actual magic. It is distributed as part of a greater package called poppler. Windows users will have to install poppler for Windows. Mac users will have to install poppler for Mac. Linux users will have pdftoppm pre-installed with the distro (Tested on Ubuntu and Archlinux) if it's not, run
sudo apt install poppler-utils
.
You can install the latest version under Windows using anaconda by doing:
conda install -c conda-forge poppler
note: Windows versions upto 0.67 are available at http://blog.alivate.com.au/poppler-windows/ but note that 0.68 was released in Aug 2018 so you'll not be getting the latest features or bug fixes.
I found this simple solution, PyMuPDF, output to png file. Note the library is imported as "fitz", a historical name for the rendering engine it uses.
import fitz
pdffile = "infile.pdf"
doc = fitz.open(pdffile)
page = doc.loadPage(0) # number of page
pix = page.get_pixmap()
output = "outfile.png"
pix.save(output)