Adding meta-information/metadata to pandas DataFrame

Just ran into this issue myself. As of pandas 0.13, DataFrames have a _metadata attribute on them that does persist through functions that return new DataFrames. Also seems to survive serialization just fine (I've only tried json, but I imagine hdf is covered as well).


Sure, like most Python objects, you can attach new attributes to a pandas.DataFrame:

import pandas as pd
df = pd.DataFrame([])
df.instrument_name = 'Binky'

Note, however, that while you can attach attributes to a DataFrame, operations performed on the DataFrame (such as groupby, pivot, join or loc to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.

Preserving the metadata in a file is possible. You can find an example of how to store metadata in an HDF5 file here.


As of pandas 1.0, possibly earlier, there is now a Dataframe.attrs property. It is experimental, but this is probably what you'll want in the future. For example:

import pandas as pd
df = pd.DataFrame([])
df.attrs['instrument_name'] = 'Binky'

Find it in the docs here.

Trying this out with to_parquet and then from_parquet, it doesn't seem to persist, so be sure you check that out with your use case.

Tags:

Python

Pandas