python pdf to text code example
Example 1: python pdf to image
# The pdf2image library converts PDF pages to images.
# Install it with:
# pip install pdf2image
# (pdf2image also needs the poppler utilities installed on the system.)
from pdf2image import convert_from_path
# Render every page at 500 DPI
pages = convert_from_path('pdf_file', 500)
# Save each page as its own JPEG so later pages do not overwrite earlier ones
for i, page in enumerate(pages, start=1):
    page.save(f'out_{i}.jpg', 'JPEG')
Example 2: extract pdf text with python
# pip install tika
from tika import parser
raw = parser.from_file('yourfile.pdf')
print(raw['content'])
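Extracted text often comes back with runs of blank lines and trailing spaces, and the content value may be None when nothing could be extracted. A minimal clean-up sketch follows; the helper name clean_extracted_text is hypothetical and not part of tika.

```python
import re

def clean_extracted_text(content):
    """Tidy text extracted from a PDF (hypothetical helper, not a tika API)."""
    # Treat a missing result as empty text
    if content is None:
        return ""
    # Drop trailing whitespace on every line
    lines = [line.rstrip() for line in content.splitlines()]
    text = "\n".join(lines)
    # Collapse runs of three or more newlines into a single blank line
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Remove leading and trailing blank space
    return text.strip()
```

It could be applied to the result above as clean_extracted_text(raw['content']).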
Example 3: pdf to string python
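# pip install PyPDF2
import PyPDF2
# PdfFileReader and numPages were removed in PyPDF2 3.0;
# use PdfReader and len(reader.pages) instead.
with open(r"F:\fileName.pdf", 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfReader(pdfFileObject)
    print("No. of pages:", len(pdfReader.pages))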
Example 4: pdf to text python
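# pip install tabula-py  (tabula-py needs Java installed)
import tabula
# Read the tables on pages 1 and 2; read_pdf returns a list of DataFrames
df = tabula.read_pdf("sample.pdf", pages=[1, 2])
# Show the second table that was found (index 1)
print(df[1])
# To write the tables straight to a CSV file instead:
# tabula.convert_into("sample.pdf", "sample.csv", output_format="csv")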
Example 5: pdf to string python
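import PyPDF2
# Uses the PyPDF2 3.x API (PdfReader, pages, extract_text);
# the with block closes the file automatically.
with open(r"F:\pdf.pdf", 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfReader(pdfFileObject)
    print("No. of pages:", len(pdfReader.pages))
    # Extract the text of the first page
    pageObject = pdfReader.pages[0]
    print(pageObject.extract_text())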
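The example above prints only the first page. To get the whole document as one string, the pages can be joined. This is a sketch assuming PyPDF2 3.x, where a reader exposes pages and each page has extract_text(); the names pages_to_string and pdf_to_string are hypothetical helpers.

```python
def pages_to_string(pages, separator="\n"):
    """Join the text of every page-like object into one string.

    `pages` is any iterable whose items have an extract_text() method,
    for example PdfReader(...).pages in PyPDF2 3.x. Pages that yield
    no text (such as scanned images) contribute an empty string.
    """
    return separator.join((page.extract_text() or "") for page in pages)


def pdf_to_string(path):
    # Imported here so pages_to_string stays usable without PyPDF2 installed
    from PyPDF2 import PdfReader
    with open(path, "rb") as f:
        return pages_to_string(PdfReader(f).pages)
```

Usage would look like text = pdf_to_string(r"F:\pdf.pdf"). Scanned PDFs contain no text layer, so the result may be empty without OCR.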
Example 6: pdf to text python 3
pip install pdftotext