Select data when specific columns have null value in pandas

try this:

In [7]: df[df.filter(like='Date').isnull().any(1)]
Out[7]:
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16

Use boolean indexing:

mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
           ID     Date1     Date2
0  58844880.0  04/11/16       NaN
2  59743311.0  04/13/16       NaN
4  59598413.0       NaN       NaN
8  59561198.0       NaN  04/17/16

Timings:

#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)

In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop

In [13]: %timeit (df[df.filter(like='Date').isnull().any(1)])
10 loops, best of 3: 146 ms per loop

Quickly see if either column has any null values

df.isnull().any()

Count rows that have any null values

df.isnull().sum()

Get rows with null values

(1) Create truth table of null values (i.e. create dataframe with True/False in each column/cell, according to whether it has null value)

truth_table = df.isnull()

(2) Create truth table that shows conclusively which rows have any null values

conclusive_truth_table = truth_table.any(axis='columns')

(3) isolate/show rows that have any null values

df[conclusive_truth_table]

(1)-(3) put it all together

df[df.isnull().any(axis='columns')]

Alternatively

Isolate rows that have null values in any specified column

df.loc[:,['Date1','Date2']].isnull().any(axis='columns')

Isolate rows that have null values in BOTH specified columns

df[ df.loc[ :,['Date1','Date2'] ].isnull().sum(axis=1) == 2]

Tags:

Python

Pandas