Pandas How to filter a Series
A fast way of doing this is to reconstruct using numpy
to slice the underlying arrays. See timings below.
mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])
0
383 3.000000
737 9.000000
833 8.166667
dtype: float64
naive timing
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[0]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
From pandas version 0.18+ filtering a series can also be done as below
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x : x!=1).dropna()
Checkout: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.
Rather than using .where
, you can pass your function to either the .loc
indexer or the Series indexer []
and avoid the call to .dropna
:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.loc[lambda x : x!=1]
test[lambda x: x!=1]
Similar behavior is supported on the DataFrame and NDFrame classes.