How to save a pandas DataFrame table as a png
The best solution to your problem is probably to first export your dataframe to HTML and then convert it using an HTML-to-image tool. The final appearance could be tweaked via CSS.
Popular options for HTML-to-image rendering include:
WeasyPrint
wkhtmltopdf
/wkhtmltoimage
Let us assume we have a dataframe named df
.
We can generate one with the following code:
import string
import numpy as np
import pandas as pd
np.random.seed(0) # just to get reproducible results from `np.random`
rows, cols = 5, 10
labels = list(string.ascii_uppercase[:cols])
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 10)), columns=labels)
print(df)
# A B C D E F G H I J
# 0 44 47 64 67 67 9 83 21 36 87
# 1 70 88 88 12 58 65 39 87 46 88
# 2 81 37 25 77 72 9 20 80 69 79
# 3 47 64 82 99 88 49 29 19 19 14
# 4 39 32 65 9 57 32 31 74 23 35
Using WeasyPrint
This approach uses a pip
-installable package, which will allow you to do everything using the Python ecosystem.
One shortcoming of weasyprint
is that it does not seem to provide a way of adapting the image size to its content.
Anyway, removing some background from an image is relatively easy in Python / PIL, and it is implemented in the trim()
function below (adapted from here).
One also would need to make sure that the image will be large enough, and this can be done with CSS's @page size
property.
The code follows:
import weasyprint as wsp
import PIL as pil
def trim(source_filepath, target_filepath=None, background=None):
if not target_filepath:
target_filepath = source_filepath
img = pil.Image.open(source_filepath)
if background is None:
background = img.getpixel((0, 0))
border = pil.Image.new(img.mode, img.size, background)
diff = pil.ImageChops.difference(img, border)
bbox = diff.getbbox()
img = img.crop(bbox) if bbox else img
img.save(target_filepath)
img_filepath = 'table1.png'
css = wsp.CSS(string='''
@page { size: 2048px 2048px; padding: 0px; margin: 0px; }
table, td, tr, th { border: 1px solid black; }
td, th { padding: 4px 8px; }
''')
html = wsp.HTML(string=df.to_html())
html.write_png(img_filepath, stylesheets=[css])
trim(img_filepath)
Using wkhtmltopdf
/wkhtmltoimage
This approach uses an external open source tool and this needs to be installed prior to the generation of the image.
There is also a Python package, pdfkit
, that serves as a front-end to it (it does not waive you from installing the core software yourself), but I will not use it.
wkhtmltoimage
can be simply called using subprocess
(or any other similar means of running an external program in Python).
One would also need to output to disk the HTML file.
The code follows:
import subprocess
df.to_html('table2.html')
subprocess.call(
'wkhtmltoimage -f png --width 0 table2.html table2.png', shell=True)
and its aspect could be further tweaked with CSS similarly to the other approach.
There is actually a python library called dataframe_image Just do a
pip install dataframe_image
Do the imports
import pandas as pd
import numpy as np
import dataframe_image as dfi
df = pd.DataFrame(np.random.randn(6, 6), columns=list('ABCDEF'))
and style your table if you want by:
df_styled = df.style.background_gradient() #adding a gradient based on values in cell
and finally:
dfi.export(df_styled,"mytable.png")
Pandas allows you to plot tables using matplotlib (details here). Usually this plots the table directly onto a plot (with axes and everything) which is not what you want. However, these can be removed first:
import matplotlib.pyplot as plt
import pandas as pd
from pandas.table.plotting import table # EDIT: see deprecation warnings below
ax = plt.subplot(111, frame_on=False) # no visible frame
ax.xaxis.set_visible(False) # hide the x axis
ax.yaxis.set_visible(False) # hide the y axis
table(ax, df) # where df is your data frame
plt.savefig('mytable.png')
The output might not be the prettiest but you can find additional arguments for the table() function here. Also thanks to this post for info on how to remove axes in matplotlib.
EDIT:
Here is a (admittedly quite hacky) way of simulating multi-indexes when plotting using the method above. If you have a multi-index data frame called df that looks like:
first second
bar one 1.991802
two 0.403415
baz one -1.024986
two -0.522366
foo one 0.350297
two -0.444106
qux one -0.472536
two 0.999393
dtype: float64
First reset the indexes so they become normal columns
df = df.reset_index()
df
first second 0
0 bar one 1.991802
1 bar two 0.403415
2 baz one -1.024986
3 baz two -0.522366
4 foo one 0.350297
5 foo two -0.444106
6 qux one -0.472536
7 qux two 0.999393
Remove all duplicates from the higher order multi-index columns by setting them to an empty string (in my example I only have duplicate indexes in "first"):
df.ix[df.duplicated('first') , 'first'] = '' # see deprecation warnings below
df
first second 0
0 bar one 1.991802
1 two 0.403415
2 baz one -1.024986
3 two -0.522366
4 foo one 0.350297
5 two -0.444106
6 qux one -0.472536
7 two 0.999393
Change the column names over your "indexes" to the empty string
new_cols = df.columns.values
new_cols[:2] = '','' # since my index columns are the two left-most on the table
df.columns = new_cols
Now call the table function but set all the row labels in the table to the empty string (this makes sure the actual indexes of your plot are not displayed):
table(ax, df, rowLabels=['']*df.shape[0], loc='center')
et voila:
Your not-so-pretty but totally functional multi-indexed table.
EDIT: DEPRECATION WARNINGS
As pointed out in the comments, the import statement for table
:
from pandas.tools.plotting import table
is now deprecated in newer versions of pandas in favour of:
from pandas.plotting import table
EDIT: DEPRECATION WARNINGS 2
The ix
indexer has now been fully deprecated so we should use the loc
indexer instead. Replace:
df.ix[df.duplicated('first') , 'first'] = ''
with
df.loc[df.duplicated('first') , 'first'] = ''