Slice Pandas dataframe by index values that are (not) in a list
Thanks to ASGM; I found that I needed to turn the set into a list to make it work with a MultiIndex:
mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa":[1,2,3,4]}, index=mi1)
setValid = set(df1.index) - set([("a", 2)])
df1.loc[list(setValid)] # works
df1.loc[setValid] # fails
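For what it's worth, a boolean mask sidesteps the set-versus-list question and keeps the original row order. A minimal sketch reusing the same frame; the name excluded is just an illustrative placeholder for whichever index tuples you want to drop:

```python
import pandas as pd

mi1 = pd.MultiIndex.from_tuples([("a", 1), ("a", 2), ("b", 1), ("b", 2)])
df1 = pd.DataFrame(data={"aaa": [1, 2, 3, 4]}, index=mi1)

excluded = [("a", 2)]  # hypothetical list of index tuples to leave out

# True for every row whose full index tuple is NOT in `excluded`
mask = ~df1.index.isin(excluded)
kept = df1.loc[mask]
print(kept)
```

Because the mask is aligned with the rows, the result comes back in the frame's original order rather than in set iteration order.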
Use isin on the index and invert the resulting boolean mask with ~ to select the rows whose labels are not in the list:
In [239]:
df = pd.DataFrame({'a':np.random.randn(5)})
df
Out[239]:
a
0 -0.548275
1 -0.411741
2 -1.187369
3 1.028967
4 -2.755030
In [240]:
t = [2,4]
df.loc[~df.index.isin(t)]
Out[240]:
a
0 -0.548275
1 -0.411741
3 1.028967
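Since the question asks about labels that are (not) in a list, the same mask can serve both directions. A small self-contained sketch with made-up values instead of random ones, so the result is reproducible:

```python
import pandas as pd

# Made-up values; only the index labels matter here
df = pd.DataFrame({"a": [10.0, 11.0, 12.0, 13.0, 14.0]})
t = [2, 4]

in_list = df.index.isin(t)      # boolean array, one entry per row
selected = df.loc[in_list]      # rows whose label is in t
remaining = df.loc[~in_list]    # rows whose label is not in t

print(selected)
print(remaining)
```

Computing the mask once and reusing it avoids running isin twice when you need both halves.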
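You could use set() to create the difference between your original indices and those that you want to remove:
df.loc[list(set(df.index) - set(blacklist))]
It has the advantage of being concise, as well as being easier to read than a list comprehension. Wrap the set in list() before passing it to .loc, since recent pandas versions raise a TypeError for a set indexer, and keep in mind that a set does not preserve the original row order.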
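A rough sketch of the set-difference idea, next to Index.difference, which does the same job without leaving pandas. The frame and the blacklist contents below are invented for illustration:

```python
import pandas as pd

# Invented example data
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]}, index=["v", "w", "x", "y", "z"])
blacklist = ["w", "y"]

# Plain Python sets: convert to a list before handing it to .loc
labels = list(set(df.index) - set(blacklist))
by_set = df.loc[labels]  # row order follows set iteration order

# Index.difference returns the remaining labels as an Index (sorted by default)
by_index = df.loc[df.index.difference(blacklist)]

print(by_set)
print(by_index)
```

Both variants select the same rows; only the ordering guarantee differs, so sort afterwards if the order matters.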
If you are looking for a way to select all rows that fall outside a condition, you can use np.invert(), provided that the condition evaluates to an array of booleans:
df.loc[np.invert((condition 1) & (condition 2))]
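To make the placeholders concrete, here is a minimal sketch with two invented conditions on invented columns a and b:

```python
import numpy as np
import pandas as pd

# Invented data and conditions, for illustration only
df = pd.DataFrame({"a": [1, 5, 8, 12], "b": [0, 3, 9, 4]})

inside = (df["a"] > 2) & (df["b"] < 5)   # boolean Series, one value per row
outside = df.loc[np.invert(inside)]      # rows where the combined condition is False

print(outside)
```

On a boolean Series, np.invert(inside) gives the same mask as ~inside, so either spelling works.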