Adding meta-information/metadata to pandas DataFrame
I just ran into this issue myself. As of pandas 0.13, DataFrames have a _metadata attribute that persists through functions that return new DataFrames. It also seems to survive serialization just fine (I've only tried JSON, but I imagine HDF is covered as well).
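As far as I can tell, _metadata is a class-level list of attribute names that pandas copies onto results, and it is mainly intended for subclasses. Here is a minimal sketch of that pattern. The InstrumentFrame class and the instrument_name attribute are hypothetical names of my own, not something pandas provides:

```python
import pandas as pd


class InstrumentFrame(pd.DataFrame):
    # Hypothetical subclass, for illustration only.
    # Attribute names listed in _metadata are copied onto results by pandas.
    _metadata = ["instrument_name"]

    @property
    def _constructor(self):
        # Make pandas operations return InstrumentFrame instead of DataFrame.
        return InstrumentFrame


df = InstrumentFrame({"a": [1, 2, 3]})
df.instrument_name = "Binky"

# Boolean indexing returns a new object; the listed attribute is carried over.
subset = df[df["a"] > 1]
print(type(subset).__name__, subset.instrument_name)
```

This relies on the subclassing hooks described in the pandas "Extending pandas" docs, so test it against the pandas version you actually run.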
Sure, like most Python objects, you can attach new attributes to a pandas.DataFrame:
import pandas as pd
df = pd.DataFrame([])
df.instrument_name = 'Binky'
Note, however, that while you can attach attributes to a DataFrame, operations performed on it (such as groupby, pivot, join, or loc, to name just a few) may return a new DataFrame without the metadata attached. Pandas does not yet have a robust method of propagating metadata attached to DataFrames.
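To make the loss of metadata concrete, here is a small sketch. The column name a and its values are made up for the demonstration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.instrument_name = "Binky"  # plain Python attribute on this one object

# Row selection with loc builds a new DataFrame object.
subset = df.loc[df["a"] > 1]

print(hasattr(df, "instrument_name"))      # True
print(hasattr(subset, "instrument_name"))  # False
```

The attribute lives only on the original object, so anything that builds a new DataFrame starts without it.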
Preserving the metadata in a file is possible. You can find an example of how to store metadata in an HDF5 file here.
As of pandas 1.0, possibly earlier, there is a DataFrame.attrs property. It is experimental, but this is probably what you'll want in the future.
For example:
import pandas as pd
df = pd.DataFrame([])
df.attrs['instrument_name'] = 'Binky'
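In the versions I've used, attrs is carried over by simple operations such as filtering or copying. A quick sketch to check this yourself (the column name a is made up, and since attrs is experimental I would not assume every operation behaves the same way):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.attrs["instrument_name"] = "Binky"

# Both of these return new DataFrame objects.
filtered = df[df["a"] > 1]
copied = df.copy()

# Inspect whether the metadata came along.
print(filtered.attrs)
print(copied.attrs)
```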
Find it in the docs here.
I tried this with to_parquet followed by read_parquet, and the attrs didn't seem to persist, so be sure to check that with your use case.