Efficiently joining two dataframes based on multiple levels of a multiindex
This is not implemented internally ATM, but your soln is the recommended one, see here as well the issue
You can simply wrap this in a function if you want to make it look nicer. reset_index/set_index
do copy (though you can pass an inplace=True
argument if you want); it IS truly inplace as these are just changing the index attribute.
You could patch in a nice function like:
def merge_multi(self, df, on):
return self.reset_index().join(df,on=on).set_index(self.index.names)
DataFrame.merge_multi = merge_multi
df1.merge_multi(df2,on=['Body','Season'])
However, merging by definition creates new data, so not sure how much this will actually save you.
A better method is to build up smaller frames, then do a larger merge. You also might want to do something like this
join
now allows merging of MultiIndex DataFrames with partially matching indices.
Following your example:
df1 = df1.join(df2, on=['Body','Season'])
make sure the on
columns are specified in exactly the order that match the Index of the other DataFrame as the on
argument matches the order you specify the labels of the calling DataFrame with the Index as it is in the other
DataFrame you join to.
or just join
without using on
and by default it will use the common index levels between the two DataFrames:
df1 = df1.join(df2)
Resulting df1
:
A B Mood
Body Season Item
sun summer one -0.483779 0.981052 Good
winter one -0.309939 0.803862 Bad
two -0.413732 0.025331 Bad
moon summer one -0.926068 -1.316808 Ugly
two 0.221627 -0.226154 Ugly
three 1.064856 0.402827 Ugly
winter one 0.526461 -0.932231 Confused
two -0.296415 -0.812374 Confused