Selecting rows from a Pandas dataframe with a compound (hierarchical) index
Alternatively you can use query
:
1. group1 == 'a'
In [11]: df.query('group1 == "a"')
Out[11]:
value1 value2
group1 group2
a c 1.1 7.1
c 2.0 8.0
d 3.0 9.0
2. group1 == 'a' & group2 == 'c'
In [12]: df.query('group1 == "a" & group2 == "c"')
Out[12]:
value1 value2
group1 group2
a c 1.1 7.1
c 2.0 8.0
3. group2 == 'c'
In [13]: df.query('group2 == "c"')
Out[13]:
value1 value2
group1 group2
a c 1.1 7.1
c 2.0 8.0
4. group1 in ['a','b','c']
In [14]: df.query('group1 in ["a", "b", "c"]')
Out[14]:
value1 value2
group1 group2
a c 1.1 7.1
c 2.0 8.0
d 3.0 9.0
b d 4.0 10.0
d 5.0 11.0
e 6.0 12.0
Try using xs
to be very precise:
In [5]: df.xs('a', level=0)
Out[5]:
value1 value2
group2
c 1.1 7.1
c 2.0 8.0
d 3.0 9.0
In [6]: df.xs('c', level='group2')
Out[6]:
value1 value2
group1
a 1.1 7.1
a 2.0 8.0
In Python 0.19.0 there is a new suggested approach, which is explained here1. I believe the clearest example they give is the following, in which they slice from a four-level indexing. This is how the dataframe is made:
In [46]: def mklbl(prefix,n):
....: return ["%s%s" % (prefix,i) for i in range(n)]
....:
In [47]: miindex = pd.MultiIndex.from_product([mklbl('A',4),
....: mklbl('B',2),
....: mklbl('C',4),
....: mklbl('D',2)])
....:
In [48]: micolumns = pd.MultiIndex.from_tuples([('a','foo'),('a','bar'),
....: ('b','foo'),('b','bah')],
....: names=['lvl0', 'lvl1'])
....:
In [49]: dfmi = pd.DataFrame(np.arange(len(miindex)*len(micolumns)).reshape((len(miindex),len(micolumns))),
....: index=miindex,
....: columns=micolumns).sort_index().sort_index(axis=1)
....:
In [50]: dfmi
Out[50]:
lvl0 a b
lvl1 bar foo bah foo
A0 B0 C0 D0 1 0 3 2
D1 5 4 7 6
C1 D0 9 8 11 10
D1 13 12 15 14
C2 D0 17 16 19 18
D1 21 20 23 22
C3 D0 25 24 27 26
... ... ... ... ...
A3 B1 C0 D1 229 228 231 230
C1 D0 233 232 235 234
D1 237 236 239 238
C2 D0 241 240 243 242
D1 245 244 247 246
C3 D0 249 248 251 250
D1 253 252 255 254
And this is how they select the different rows:
In [51]: dfmi.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]
Out[51]:
lvl0 a b
lvl1 bar foo bah foo
A1 B0 C1 D0 73 72 75 74
D1 77 76 79 78
C3 D0 89 88 91 90
D1 93 92 95 94
B1 C1 D0 105 104 107 106
D1 109 108 111 110
C3 D0 121 120 123 122
... ... ... ... ...
A3 B0 C1 D1 205 204 207 206
C3 D0 217 216 219 218
D1 221 220 223 222
B1 C1 D0 233 232 235 234
D1 237 236 239 238
C3 D0 249 248 251 250
D1 253 252 255 254
So quite simply, in df.loc[(indices),:]
, you specify the indices that you want to select per level, from highest level to lowest. If you do not want to make a selection of the lowest level(s) of indices, you can omit specifying them. If you don't want to make a slice between other specified levels, you add slice(None)
. Both cases are shown in the example, where level D is omitted and level B is specified between A and C.
Syntax like the following will work:
df.ix['a']
df.ix['a'].ix['c']
since group1
and group2
are indices. Please forgive my previous attempt!
To get at the second index only, I think you have to swap indices:
df.swaplevel(0,1).ix['c']
But I'm sure Wes will correct me if I'm wrong.