pdf to text python code example

Example 1: extract pdf text with python

# pip install tika
from tika import parser

raw = parser.from_file('yourfile.pdf')
print(raw['content'])
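
For scanned or image-only PDFs, `raw['content']` can come back as `None`, and extracted text often carries long runs of blank lines. A minimal cleanup sketch, assuming a hypothetical helper named `clean_content` (it is not part of tika):

```python
def clean_content(content):
    """Return extracted text with blank lines removed, or '' if there is none."""
    if content is None:  # nothing extractable, e.g. an image-only PDF
        return ""
    # Strip surrounding whitespace from each line and drop the empty ones.
    lines = (line.strip() for line in content.splitlines())
    return "\n".join(line for line in lines if line)
```

It would be called on the parser result from Example 1, for instance `print(clean_content(raw['content']))`.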

Example 2: pdf to string python

# pip install PyPDF2
import PyPDF2

# PyPDF2 3.x API; older releases used PdfFileReader and numPages
with open(r"F:\fileName.pdf", 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfReader(pdfFileObject)  # create the reader object
    print("No. of pages:", len(pdfReader.pages))  # number of pages

Example 3: pdf to text python

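# pip install tabula-py (requires Java)
import tabula

# read_pdf returns a list of DataFrames, one per table found on the given pages
dfs = tabula.read_pdf("sample.pdf", pages=[1, 2])
print(dfs[1])  # second table; assumes at least two tables were found

# To write the tables straight to CSV instead:
# tabula.convert_into("sample.pdf", "sample.csv", output_format="csv")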

Example 4: pdf to string python

import PyPDF2

# PyPDF2 3.x API; older releases used PdfFileReader, numPages, getPage and extractText
with open(r"F:\pdf.pdf", 'rb') as pdfFileObject:
    pdfReader = PyPDF2.PdfReader(pdfFileObject)

    print("No. of pages:", len(pdfReader.pages))

    pageObject = pdfReader.pages[0]  # first page

    print(pageObject.extract_text())
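
Example 4 reads only the first page. A minimal sketch for collecting every page into one string, assuming the PyPDF2 3.x names `PdfReader`, `pages` and `extract_text`; the helper names `join_pages` and `pdf_to_string` are hypothetical:

```python
def join_pages(pages):
    """Concatenate the text of every page object, one page per line group."""
    # A page without a text layer may yield an empty result; treat it as "".
    return "\n".join(page.extract_text() or "" for page in pages)


def pdf_to_string(path):
    """Open a PDF and return the text of all its pages as one string."""
    import PyPDF2  # imported here so join_pages works without PyPDF2 installed

    with open(path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        # Extract while the file is still open.
        return join_pages(reader.pages)
```

With the file from Example 4 this would be called as `print(pdf_to_string(r"F:\pdf.pdf"))`.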

Example 5: pdf to text python 3

pip install pdftotext
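
The install command is all this example gives. A minimal usage sketch, assuming the `pdftotext.PDF` class from that package, which also needs the poppler libraries installed on the system; the helper names `pdf_pages` and `pages_to_text` are hypothetical:

```python
def pdf_pages(path):
    """Return a list with the text of each page, using the pdftotext package."""
    import pdftotext  # needs the poppler libraries installed on the system

    with open(path, "rb") as pdf_file:
        pdf = pdftotext.PDF(pdf_file)
        return list(pdf)  # iterating a PDF object yields one string per page


def pages_to_text(pages):
    """Join per-page strings into one string, with a blank line between pages."""
    return "\n\n".join(pages)
```

For a file named `sample.pdf` (a placeholder), the whole text would be printed with `print(pages_to_text(pdf_pages("sample.pdf")))`.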
