pandas multiindex - how to select second level when using columns?
Using @JohnZwinck's data sample:
In [132]: df
Out[132]:
0
stock1 price 1
volume 2
stock2 price 3
volume 4
stock3 price 5
volume 6
Option 1:
In [133]: df.loc[(slice(None), slice('price')), :]
Out[133]:
0
stock1 price 1
stock2 price 3
stock3 price 5
Option 2:
In [134]: df.loc[pd.IndexSlice[:, 'price'], :]
Out[134]:
0
stock1 price 1
stock2 price 3
stock3 price 5
UPDATE:
But what if for the 2nd Index, I want to select everything but price and there are multiple values so that enumeration is not an option. Is there something like slice(~'price')
first let's name the index levels:
df = df.rename_axis(["lvl0", "lvl1"])
now we can use the df.query()
method:
In [18]: df.query("lvl1 != 'price'")
Out[18]:
0
lvl0 lvl1
stock1 volume 2
stock2 volume 4
stock3 volume 6
df.unstack()
will "tear off" the last level of your MultiIndex
and make your DataFrame a lot more conventional, with one column per type of data. For example:
index = pd.MultiIndex.from_product([['stock1','stock2','stock3'],['price','volume']])
df = pd.DataFrame([1,2,3,4,5,6], index)
print(df.unstack())
Gives you:
0
price volume
stock1 1 2
stock2 3 4
stock3 5 6
I have found the most intuitive solution for accessing a second-level column in a DataFrame with MultiIndex columns is using .loc
together with slice()
.
In case of your DataFrame with
df
stock1 stock2 stock3
price volume price volume price volume
0 1 2 3 4 5 6
1 2 3 4 5 6 7
using df.loc[:, (slice(None), "price")]
would deliver all columns with the sub-column of "price"
stock1 stock2 stock3
price price price
0 1 3 5
1 2 4 6
Within df.loc[:, (slice(None), "price")]
the first argument of loc :
delivers the result for all rows, the second argument (slice(None), "price")
is a tuple responsible for selecting all first level columns (slice(None)
) and all second-level columns with the name of "price"
.
Also using John's data sample:
Using xs()
is another way to slice a MultiIndex
:
df
0
stock1 price 1
volume 2
stock2 price 3
volume 4
stock3 price 5
volume 6
df.xs('price', level=1, drop_level=False)
0
stock1 price 1
stock2 price 3
stock3 price 5
Alternatively if you have a MultiIndex
in place of columns:
df
stock1 stock2 stock3
price volume price volume price volume
0 1 2 3 4 5 6
df.xs('price', axis=1, level=1, drop_level=False)
stock1 stock2 stock3
price price price
0 1 3 5