Fast method for removing duplicate columns in pandas.DataFrame

Perhaps you would be better off avoiding the problem altogether, by using pd.merge instead of pd.concat:

df_ab = pd.merge(df_a, df_b, how='inner')

Because no on= argument is given, this joins df_a and df_b on every column name they share, so each shared column appears only once in the result.
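As a rough illustration of the difference, the sketch below builds two small frames and compares the two calls. The frames and their column names ("id", "A", "B", "C") are made up for this example and are not from the original question.

```python
import pandas as pd

# Hypothetical frames that share the columns "id" and "A".
df_a = pd.DataFrame({"id": [1, 2, 3], "A": [5, 6, 7], "B": [10, 19, 4]})
df_b = pd.DataFrame({"id": [1, 2, 3], "A": [5, 6, 7], "C": [0.1, 0.2, 0.3]})

# With no on= argument, merge joins on every column name the frames share,
# so "id" and "A" each appear once in the result.
df_ab = pd.merge(df_a, df_b, how="inner")

# Concatenating along the column axis keeps both copies of the shared columns.
df_cat = pd.concat([df_a, df_b], axis=1)

print(list(df_ab.columns))
print(list(df_cat.columns))
```

Keep in mind that an inner merge also drops rows whose shared-column values do not match in both frames, so it is only a drop-in replacement for pd.concat when the shared columns really hold the same data.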

The easiest way is:

df = df.loc[:,~df.columns.duplicated()]

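df.columns.duplicated() flags every repeat of a column name after its first occurrence, so negating it with ~ keeps only the first column of each name. Note that it compares column names, not column contents.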
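To make the one-liner above concrete, here is a minimal sketch that prints the boolean mask before applying it. The sample frame is an assumption chosen for illustration, with two pairs of identically named columns.

```python
import pandas as pd

# Illustrative frame with repeated column names.
df = pd.DataFrame([[5, 5, 10, 10], [6, 6, 19, 19]],
                  columns=["A", "A", "B", "B"])

# True marks a column whose name has already appeared further left.
mask = df.columns.duplicated()
print(mask)

# Invert the mask to keep only the first column of each name.
deduped = df.loc[:, ~mask]
print(deduped)
```

If the last occurrence should win instead, Index.duplicated accepts keep="last".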

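You can also use np.unique to get the position of the first occurrence of each column name, and then select those columns with .iloc. Note that np.unique returns the names in sorted order, so the columns come back sorted by name rather than in their original order: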

>>> df
   A  A   B   B
0  5  5  10  10
1  6  6  19  19
>>> _, i = np.unique(df.columns, return_index=True)
>>> df.iloc[:, i]
   A   B
0  5  10
1  6  19
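In the example above the columns already happen to be in alphabetical order. The sketch below, using a made-up frame whose columns start with "B", shows how np.unique reorders them and one possible way to keep the original order by sorting the returned positions with np.sort.

```python
import numpy as np
import pandas as pd

# Illustrative frame whose column names are not in alphabetical order.
df = pd.DataFrame([[10, 5, 10, 5], [19, 6, 19, 6]],
                  columns=["B", "A", "B", "A"])

# i holds the first position of each unique name, ordered by sorted name.
_, i = np.unique(df.columns, return_index=True)

# Selecting with i directly yields the columns sorted by name.
sorted_order = df.iloc[:, i]

# Sorting the positions first keeps the columns in their original order.
original_order = df.iloc[:, np.sort(i)]

print(list(sorted_order.columns))
print(list(original_order.columns))
```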

