Select data when specific columns have null value in pandas
try this:
In [7]: df[df.filter(like='Date').isnull().any(1)]
Out[7]:
ID Date1 Date2
0 58844880.0 04/11/16 NaN
2 59743311.0 04/13/16 NaN
4 59598413.0 NaN NaN
8 59561198.0 NaN 04/17/16
Use boolean indexing
:
mask = df['Date1'].isnull() | df['Date2'].isnull()
print (df[mask])
ID Date1 Date2
0 58844880.0 04/11/16 NaN
2 59743311.0 04/13/16 NaN
4 59598413.0 NaN NaN
8 59561198.0 NaN 04/17/16
Timings:
#[900000 rows x 3 columns]
df = pd.concat([df]*100000).reset_index(drop=True)
In [12]: %timeit (df[df['Date1'].isnull() | df['Date2'].isnull()])
10 loops, best of 3: 89.3 ms per loop
In [13]: %timeit (df[df.filter(like='Date').isnull().any(1)])
10 loops, best of 3: 146 ms per loop
Quickly see if either column has any null values
df.isnull().any()
Count rows that have any null values
df.isnull().sum()
Get rows with null values
(1) Create truth table of null values (i.e. create dataframe with True/False in each column/cell, according to whether it has null value)
truth_table = df.isnull()
(2) Create truth table that shows conclusively which rows have any null values
conclusive_truth_table = truth_table.any(axis='columns')
(3) isolate/show rows that have any null values
df[conclusive_truth_table]
(1)-(3) put it all together
df[df.isnull().any(axis='columns')]
Alternatively
Isolate rows that have null values in any specified column
df.loc[:,['Date1','Date2']].isnull().any(axis='columns')
Isolate rows that have null values in BOTH specified columns
df[ df.loc[ :,['Date1','Date2'] ].isnull().sum(axis=1) == 2]