Python Pandas to_pickle cannot pickle large dataframes

Try to use compression. It worked for me.

data_df.to_pickle('data_df.pickle.gzde', compression='gzip')

Probably not the answer you were hoping for but this is what I did......

Split the dataframe into smaller chunks using np.array_split (although numpy functions are not guaranteed to work, it does now, although there used to be a bug for it).

Then pickle the smaller dataframes.

When you unpickle them use pandas.append or pandas.concat to glue everything back together.

I agree it is a fudge and suboptimal. If anyone can suggest a "proper" answer I'd be interested in seeing it, but I think it as simple as dataframes are not supposed to get above a certain size.

Split a large pandas dataframe

Until there is a fix somewhere on pickle/pandas side of things, I'd say a better option is to use alternative IO backend. HDF is suitable for large datasets (GBs). So you don't need to add additional split/combine logic.

df.to_hdf('my_filename.hdf','mydata',mode='w')

df = pd.read_hdf('my_filename.hdf','mydata')

Python Pandas to_pickle cannot pickle large dataframes

Tags:

Python

Pandas

Pickle

Related

Recent Posts