Filter out nan rows in a specific column
Use dropna
:
df = df.dropna(subset=['Col2'])
print (df)
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
Another solution - boolean indexing
with notnull
:
df = df[df['Col2'].notnull()]
print (df)
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
What is same as:
df = df[~df['Col2'].isnull()]
print (df)
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
you can use DataFrame.dropna()
method:
In [202]: df.dropna(subset=['Col2'])
Out[202]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
or (in this case) less idiomatic Series.notnull():
In [204]: df.loc[df.Col2.notnull()]
Out[204]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
or using DataFrame.query() method:
In [205]: df.query("Col2 == Col2")
Out[205]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
numexpr
solution:
In [241]: import numexpr as ne
In [242]: col = df.Col2
In [243]: df[ne.evaluate("col == col")]
Out[243]:
Col1 Col2 Col3
1 2 5.0 4.0
2 3 3.0 NaN
Using numpy
's isnan
to mask and construct a new dataframe
m = ~np.isnan(df.Col2.values)
pd.DataFrame(df.values[m], df.index[m], df.columns)
Col1 Col2 Col3
1 2.0 5.0 4.0
2 3.0 3.0 NaN
Timing
Bigger Data
np.random.seed([3,1415])
df = pd.DataFrame(np.random.choice([np.nan, 1], size=(10000, 10))).add_prefix('Col')
%%timeit
m = ~np.isnan(df.Col2.values)
pd.DataFrame(df.values[m], df.index[m], df.columns)
1000 loops, best of 3: 326 µs per loop
%timeit df.query("Col2 == Col2")
1000 loops, best of 3: 1.48 ms per loop
%timeit df.loc[df.Col2.notnull()]
1000 loops, best of 3: 417 µs per loop
%timeit df[~df['Col2'].isnull()]
1000 loops, best of 3: 385 µs per loop
%timeit df.dropna(subset=['Col2'])
1000 loops, best of 3: 913 µs per loop