How can I read a PDF file from inline raw bytes (not from a file)?
You can use io.BytesIO to wrap the raw bytes in an in-memory, file-like object:
import requests, PyPDF2, io

url = 'http://www.asx.com.au/asxpdf/20171108/pdf/43p1l61zf2yct8.pdf'
response = requests.get(url)

# Wrap the downloaded bytes in a file-like object and hand it to the reader.
with io.BytesIO(response.content) as open_pdf_file:
    read_pdf = PyPDF2.PdfFileReader(open_pdf_file)
    num_pages = read_pdf.getNumPages()
    print(num_pages)
2
PS: To open files, always use a context manager (a with statement).
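To illustrate what io.BytesIO provides, here is a minimal standard-library sketch. The helper name looks_like_pdf is hypothetical (it is not part of any library), and the header check is only a heuristic: most PDF files begin with the bytes %PDF-, so it can catch the case where the server returned an HTML error page instead of a PDF.

```python
import io


def looks_like_pdf(raw_bytes):
    """Return True if the in-memory bytes start with the usual PDF header."""
    # BytesIO gives plain bytes the same read() interface as an open file.
    with io.BytesIO(raw_bytes) as stream:
        header = stream.read(5)  # most PDF files begin with b"%PDF-"
    return header == b"%PDF-"


print(looks_like_pdf(b"%PDF-1.4 rest of the file"))  # True
print(looks_like_pdf(b"<html>not a pdf</html>"))     # False
```

You could run such a check on response.content before passing it to the PDF reader.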
Try this (using the io module, plus an additional decryption step):
import requests, PyPDF2, io

url = 'http://www.asx.com.au/asxpdf/20171103/pdf/43nyyw9r820c6r.pdf'
response = requests.get(url).content

# Keep the PDF in memory instead of writing it to disk.
reserve_pdf_on_memory = io.BytesIO(response)
load_pdf = PyPDF2.PdfFileReader(reserve_pdf_on_memory)

# A PDF encrypted with an empty password must be decrypted before reading.
if load_pdf.isEncrypted:
    load_pdf.decrypt("")
print(load_pdf.getPage(0).extractText())
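Note that PdfFileReader, getNumPages, isEncrypted, getPage and extractText are the names used by older PyPDF2 releases; PyPDF2 3.0 removed them, and the project is now maintained as pypdf. A minimal sketch of the same steps with the newer names, assuming the pypdf package is installed (first_page_text is a hypothetical helper, not a library function):

```python
import io


def first_page_text(raw_bytes, password=""):
    """Return the extracted text of the first page of an in-memory PDF."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # here so that the module still loads when the package is missing.
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(raw_bytes))
    if reader.is_encrypted:
        # An empty password covers PDFs encrypted without a user password.
        reader.decrypt(password)
    return reader.pages[0].extract_text()
```

With the download from above, this would be called as first_page_text(requests.get(url).content), and the page count is available as len(reader.pages).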
Good Luck ... :)