Opening a PDF and reading in tables with Python pandas

There is a new version of tabula called tabula-py:

pip install tabula-py

The read_pdf function works just like it did in the old version; the documentation is here: https://pypi.org/project/tabula-py/
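A PDF often holds several tables spread over several pages. The following is a minimal sketch of reading them all with read_pdf and stacking them into one DataFrame. It assumes tabula-py is installed along with the Java runtime it needs (it wraps tabula-java), and that the tables share the same columns. The helper names combine_tables and read_all_tables are hypothetical, made up for this example.

```python
import pandas as pd


def combine_tables(tables):
    """Stack a list of DataFrames (e.g. one per table found by read_pdf)
    into a single DataFrame with a fresh 0..n-1 index."""
    if not tables:
        return pd.DataFrame()
    return pd.concat(tables, ignore_index=True)


def read_all_tables(path):
    # Imported here so combine_tables can be used without tabula-py installed.
    from tabula import read_pdf

    # pages='all' scans every page (by default only page 1 is read).
    # multiple_tables=True makes read_pdf return a list of DataFrames.
    tables = read_pdf(path, pages='all', multiple_tables=True)
    return combine_tables(tables)
```

Usage would be something like `df = read_all_tables('data.pdf')`. If the tables have different columns, pd.concat still works but fills the missing cells with NaN, so you may prefer to keep the list and handle each table separately.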


You can use tabula: https://blog.chezo.uno/tabula-py-extract-table-from-pdf-into-python-dataframe-6c7acfa5f302

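from tabula import read_pdf
# multiple_tables=True returns a list with one DataFrame per table found
dfs = read_pdf('data.pdf', pages='all', multiple_tables=True)
df = dfs[0]  # the first table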

You can find more in the linked post.


In case it is a one-off, you can copy the data from your PDF table into a text file, format it (using search-and-replace, Notepad++ macros, a script), save it as a CSV file and load it into Pandas.
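For the "a script" option, here is a minimal sketch that parses table text pasted from a PDF straight into pandas. It assumes the columns in the pasted text are separated by runs of two or more spaces, which is common but not guaranteed; adjust the pattern to match your text. The function name load_pasted_table and the sample data are hypothetical.

```python
import io

import pandas as pd


def load_pasted_table(text):
    """Parse table text copied out of a PDF, assuming columns are
    separated by two or more spaces (single spaces stay inside a cell)."""
    # A regex separator requires the python parsing engine.
    return pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')


# Hypothetical text as it might look after pasting from a PDF.
sample = """Name    Qty   Price
Apple   3     0.50
Pear    10    0.25
"""

df = load_pasted_table(sample)
print(df)
```

If you instead save the cleaned-up text as a CSV file, loading it is simply `pd.read_csv('table.csv')` (the filename here is just an example).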

If you need to do this in a scalable way, you might try this tool: http://tabula.technology/. I have not used it yet, so I don't know how well it works, but it may be worth exploring if you need it.

Tags:

Python

Pandas

Pdf