Get unique values from index column in MultiIndex
Andy Hayden's answer (index.levels[blah]) is great for some scenarios, but can lead to odd behavior in others. My understanding is that pandas goes to great lengths to reuse indices where possible, so that lots of similarly indexed DataFrames don't each hold their own copy in memory. As a result, a MultiIndex obtained by filtering a larger one keeps the larger one's levels, even those that no longer occur in it, and I've run into the following annoying behavior:
import pandas as pd
import numpy as np

np.random.seed(0)
idx = pd.MultiIndex.from_product([['John', 'Josh', 'Alex'], list('abcde')],
                                 names=['Person', 'Letter'])
large = pd.DataFrame(data=np.random.randn(15, 2),
                     index=idx,
                     columns=['one', 'two'])
# Keep only the rows whose Person starts with 'Jo' (drops every 'Alex' row).
small = large.loc[[d.startswith('Jo') for d in large.index.get_level_values('Person')]]
print(small.index.levels[0])
print(large.index.levels[0])
Which outputs
Index([u'Alex', u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')
rather than the expected
Index([u'John', u'Josh'], dtype='object')
Index([u'Alex', u'John', u'Josh'], dtype='object')
As one person pointed out on the other thread, one idiom that seems very natural and works properly is to ask for the values actually present in the index rather than for its stored levels:
small.index.get_level_values('Person').unique()
large.index.get_level_values('Person').unique()
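To make the difference concrete, here is a minimal sketch comparing the two approaches side by side. The frame is a smaller, hypothetical stand-in for the one above (two letters instead of five), and the variable names are mine. It also shows MultiIndex.remove_unused_levels(), which, as far as I know, is available in reasonably recent pandas versions and trims the stale entries if you really do want to go through .levels.

```python
import pandas as pd

# Hypothetical, smaller stand-in for the `large`/`small` frames above.
idx = pd.MultiIndex.from_product(
    [["John", "Josh", "Alex"], list("ab")], names=["Person", "Letter"]
)
large = pd.DataFrame({"one": range(len(idx))}, index=idx)

# Keep only the people whose name starts with "Jo".
mask = large.index.get_level_values("Person").str.startswith("Jo")
small = large.loc[mask]

# .levels still carries the unused "Alex" entry after filtering.
stale_levels = list(small.index.levels[0])

# Values actually present in the filtered index, in order of appearance.
present = list(small.index.get_level_values("Person").unique())

# remove_unused_levels() returns a new MultiIndex whose levels are trimmed.
trimmed = list(small.index.remove_unused_levels().levels[0])

print(stale_levels)
print(present)
print(trimmed)
```

Note that remove_unused_levels() returns a new index rather than modifying small in place, so assign it back (small.index = small.index.remove_unused_levels()) if you want the change to stick.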
I hope this helps someone else dodge the super-unexpected behavior that I ran into.
One way is to use index.levels:
In [11]: df
Out[11]:
       C
A B
0 one  3
1 one  2
2 two  1
In [12]: df.index.levels[1]
Out[12]: Index([one, two], dtype=object)
Another way is to use the unique() function of the index:
df.index.unique('B')
Unlike levels, this function is documented.
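For completeness, a small runnable sketch of the unique() approach. The frame is rebuilt from the values shown in the In [11] output; the construction via set_index is my assumption about how it was made, and I believe the level argument needs a reasonably recent pandas version.

```python
import pandas as pd

# Rebuild a frame shaped like the one in the example (values from Out[11]).
df = pd.DataFrame(
    {"A": [0, 1, 2], "B": ["one", "one", "two"], "C": [3, 2, 1]}
).set_index(["A", "B"])

# Unique values of level "B", in order of first appearance.
unique_b = df.index.unique("B")          # select the level by name
unique_b_pos = df.index.unique(level=1)  # or by position

print(list(unique_b))
```

Because unique() works from the values present in the index, it should not report entries that were filtered out earlier, which is the pitfall with .levels.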