Python - Drop row if two columns are NaN

Specify dropna() method:

df.dropna(subset=[1,2], how='all')

The accepted answer didn't work for me. I tried the following code and nothing changed in my dataframe.

df.dropna(subset=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], how='all', inplace=True)

EDIT: I think it didn't work because 'all' refers to all columns in the df, and not all columns in the subset. The thresh is also a bit confusing for me, because it's a condition to keep rows, even though this a dropping function. In the OP, if I understand correctly, it's required to have at least one non nan value to keep a row, and if a row has more than 2 nan values, it should be dropped. In my case, I want to drop a row if all values in the columns of the subset have nan values, so I want to keep a row if all 6 values are not nan, and I don't know if this corresponds to thresh=6.

These two commands also didn't work for me:

df.dropna(subset=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], thres=6, inplace=True)
df.dropna(subset=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'], how='all', thres=6, inplace=True)

and when I added axis=1 or axis='columns' in the parameters, I got a KeyError that the list of columns I am passing are not in the dataframe, and I don't know the reason why.

I ended up iterating over the df like this, also because I realized that some supposedly NaN values, might not be caught, because they are Null or empty:

indices_to_drop = list()
for index, row in df.iterrows():
        if (pd.isnull(row["col1"]) or pd.isna(row["col1"]) or row["col1"] == "") and \
                (pd.isnull(row["col2"]) or pd.isna(row["col2"]) or row["col2"] == "") and \
                (pd.isnull(row["col3"]) or pd.isna(row["col3"]) or row["col3"]
                 == "") and \
                (pd.isnull(row["col4"]) or pd.isna(row["col4"]) or row["col4"] ==
                 "") \
                and \
                (pd.isnull(row["col5"]) or pd.isna(row["col5"]) or row[
                    "col6"] == "") and \
                (pd.isnull(row["col6"]) or pd.isna(row["col6"]) or row["col6"] == ""):
            indices_to_drop.append(index)
df.drop(labels=indices_to_drop, inplace=True)

Any one of the following two:

df.dropna(subset=[1, 2], how='all')

df.dropna(subset=[1, 2], thresh=1)

I faced a similar issue where I'd 45 features(columns) and wanted to drop rows for only selected features having NaN values eg columns 7 to 45.

Step 1: I created a list (col_lst) from columns which I wanted to be operated for NaN

Step 2: df.dropna(axis = 0, subset = col_lst, how = 'all', inplace = True)

The above step removed only those rows fromthe dataframe which had all (not any) the columns from 7 to 45 with NaN values.

Python - Drop row if two columns are NaN

Tags:

Python

Pandas

Dataframe

Related

Recent Posts