Why does testing `NaN == NaN` not work for dropping rows from a pandas DataFrame?
You should use `isnull` and `notnull` to test for NaN (these handle pandas dtypes more robustly than numpy's equivalents); see "values considered missing" in the docs.
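As a minimal sketch of the difference (the DataFrame below is made up to mirror the example in this answer), comparing a column against NaN with `==` matches no rows, while `isnull` and `notnull` produce the masks you would expect:

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the example in this answer
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# Equality against NaN is False for every row, so this mask selects nothing
eq_mask = df["comments"] == np.nan

# isnull/notnull return boolean masks that identify the missing values
null_mask = df["comments"].isnull()
kept = df[df["comments"].notnull()]

print(eq_mask.sum())    # rows matched by == NaN
print(null_mask.sum())  # rows flagged as missing
print(kept)
```

Filtering with `notnull` like this gives the same rows as dropping the NaNs from that column.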
Calling the Series method `dropna` on a column won't modify the original DataFrame, but it returns what you want:
In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object
The DataFrame method `dropna` has a `subset` argument (to drop rows which have NaNs in specific columns):
In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST
In [14]: df = df.dropna(subset=['comments'])
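To illustrate what `subset` changes, here is a small sketch using a frame with a second column. The `score` column and the name `df2` are hypothetical and not part of the original example:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame; `score` is an assumption for illustration
df2 = pd.DataFrame({
    "comments": ["VP", np.nan, "TEST", np.nan],
    "score": [1.0, 2.0, np.nan, np.nan],
})

# Without subset: drop rows that have NaN in any column
any_col = df2.dropna()

# With subset: drop rows only where `comments` is NaN;
# a NaN in `score` alone does not cause the row to be dropped
only_comments = df2.dropna(subset=["comments"])

print(any_col.index.tolist())
print(only_comments.index.tolist())
```

As with the Series method, `dropna` returns a new object here, which is why the result is assigned back to `df` in the session above.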
You need to test for NaN with the `math.isnan()` function (or `numpy.isnan`). NaNs cannot be checked with the equality operator.
>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False
Output of `help(isnan)`:
isnan(...)
isnan(x) -> bool
Check if float x is not a number (NaN).
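The reason equality fails is that IEEE 754 floating point defines NaN as unequal to every value, including itself. A short sketch of that behaviour and of the dedicated checks (the variable names are only for illustration):

```python
import math
import numpy as np

a = float("nan")

# NaN compares unequal to everything, including itself
print(a == a)         # False
print(a != a)         # True

# Dedicated checks for a single float
print(math.isnan(a))  # True
print(np.isnan(a))    # True

# np.isnan also works elementwise on float arrays
arr = np.array([1.0, np.nan, 3.0])
mask = np.isnan(arr)
print(arr[~mask])     # values that are not NaN
```

Note that `math.isnan` and `numpy.isnan` expect numeric input and raise a `TypeError` on strings or object arrays, so for a column of mixed strings and NaN, pandas' `isnull` is the safer choice.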