Combine duplicated columns within a DataFrame

I believe this does what you are after:

```python
df.groupby(lambda x: x, axis=1).sum()
```

Alternatively, passing `df.columns` directly is between 3% and 15% faster, depending on the length of the DataFrame:

```python
df.groupby(df.columns, axis=1).sum()
```
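In pandas 2.1 and later, passing `axis=1` to `groupby` is deprecated, and the suggested replacement is to transpose, group the rows, and transpose back. A minimal sketch of both lines above in that style, on a small made-up frame (the data is illustrative only):

```python
import pandas as pd

# Small illustrative frame with a duplicated column label "A"
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "A", "B"])

# Transpose so the duplicated labels become the row index, group the rows
# by those labels, sum each group, then transpose back.
by_lambda = df.T.groupby(lambda x: x).sum().T   # key function applied to each label
by_columns = df.T.groupby(df.columns).sum().T   # labels passed directly as the key

print(by_columns)
```

Both forms group on the same labels, so they produce the same frame; the second skips the per-label function call.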

EDIT: To extend this beyond sums, use `.agg()` (short for `.aggregate()`):

```python
import numpy

df.groupby(df.columns, axis=1).agg(numpy.max)
```
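`.agg()` also accepts a reducer name as a string or a custom function, which is applied to each group of duplicated columns row by row. A sketch on a toy frame, assuming a hypothetical "spread" reducer (max minus min) purely for illustration, and transposing rather than passing `axis=1`, which pandas 2.1 deprecated:

```python
import pandas as pd

df = pd.DataFrame([[1, 7, 3], [4, 5, 6]], columns=["A", "A", "B"])

# Rows of df.T are the original columns, so level 0 holds the labels
grouped = df.T.groupby(level=0)

# Built-in reducer passed by name (same result as numpy.max here)
col_max = grouped.agg("max").T

# Hypothetical custom reducer: range across the duplicated columns in each row
spread = grouped.agg(lambda s: s.max() - s.min()).T

print(col_max)
print(spread)
```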

pandas >= 0.20: `df.groupby(level=0, axis=1)`

You don't need a lambda here, nor do you have to pass `df.columns` explicitly: `groupby` accepts a `level` argument that you can combine with the `axis` argument. This is cleaner, IMO.

```python
import numpy as np
import pandas as pd

# Setup
np.random.seed(0)
df = pd.DataFrame(np.random.choice(50, (5, 5)), columns=list('AABBB'))
df
```

```
    A   A   B   B   B
0  44  47   0   3   3
1  39   9  19  21  36
2  23   6  24  24  12
3   1  38  39  23  46
4  24  17  37  25  13
```

```python
df.groupby(level=0, axis=1).sum()
```

```
    A    B
0  91    6
1  48   76
2  29   60
3  39  108
4  41   75
```
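To make explicit what the grouped sum computes, here is a sketch of an equivalent loop over the unique column labels, using the setup values typed in by hand. It is only an illustration of the mechanics; the `groupby` call above is the idiomatic form.

```python
import pandas as pd

# The same values as the setup frame above, entered literally
df = pd.DataFrame(
    [[44, 47, 0, 3, 3],
     [39, 9, 19, 21, 36],
     [23, 6, 24, 24, 12],
     [1, 38, 39, 23, 46],
     [24, 17, 37, 25, 13]],
    columns=list("AABBB"),
)

# For each unique label, select every column carrying it (a boolean mask
# always yields a DataFrame, even for a label that occurs only once) and
# sum across those columns row by row.
combined = pd.DataFrame(
    {label: df.loc[:, df.columns == label].sum(axis=1) for label in df.columns.unique()}
)
print(combined)
```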

Handling `MultiIndex` columns

Another case to consider is a DataFrame with `MultiIndex` columns. Starting from the `df` above:

```python
df.columns = pd.MultiIndex.from_arrays([['one']*3 + ['two']*2, df.columns])
df
```

```
  one         two
    A   A   B   B   B
0  44  47   0   3   3
1  39   9  19  21  36
2  23   6  24  24  12
3   1  38  39  23  46
4  24  17  37  25  13
```

To aggregate across the upper level, combining every column that shares a lower-level label regardless of its upper-level label, group by `level=1`:

```python
df.groupby(level=1, axis=1).sum()
```

```
    A    B
0  91    6
1  48   76
2  29   60
3  39  108
4  41   75
```

Or, to combine duplicates only within each upper-level group, group by both levels:

```python
df.groupby(level=[0, 1], axis=1).sum()
```

```
  one     two
    A   B   B
0  91   0   6
1  48  19  57
2  29  24  36
3  39  39  69
4  41  37  38
```
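On pandas 2.1 and later, where `axis=1` in `groupby` is deprecated, the same two `MultiIndex` aggregations can be sketched by transposing first. The values are typed in from the example frame above:

```python
import pandas as pd

df = pd.DataFrame(
    [[44, 47, 0, 3, 3],
     [39, 9, 19, 21, 36],
     [23, 6, 24, 24, 12],
     [1, 38, 39, 23, 46],
     [24, 17, 37, 25, 13]],
    columns=pd.MultiIndex.from_arrays([["one"] * 3 + ["two"] * 2, list("AABBB")]),
)

# Combine columns that share a lower-level label, ignoring the upper level
by_lower = df.T.groupby(level=1).sum().T

# Combine duplicates only within each (upper, lower) pair
by_both = df.T.groupby(level=[0, 1]).sum().T

print(by_lower)
print(by_both)
```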

Alternate Interpretation: Dropping Duplicate Columns

If you came here to find out how to simply drop duplicate columns (without any aggregation), use `Index.duplicated`. The examples below use the single-level `df` from the setup, before its columns were turned into a `MultiIndex`:

```python
df.loc[:, ~df.columns.duplicated()]
```

```
    A   B
0  44   0
1  39  19
2  23  24
3   1  39
4  24  37
```

Or, to keep the last occurrence of each label instead, specify `keep='last'` (the default is `'first'`):

```python
df.loc[:, ~df.columns.duplicated(keep='last')]
```

```
    A   B
0  47   3
1   9  36
2   6  12
3  38  46
4  17  13
```
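`Index.duplicated` compares whole labels, so on a frame with `MultiIndex` columns it compares full (upper, lower) tuples: `('one', 'B')` and `('two', 'B')` count as distinct. A sketch of deduplicating on the lower level only, via `get_level_values`, using the first two rows of the example data:

```python
import pandas as pd

df = pd.DataFrame(
    [[44, 47, 0, 3, 3],
     [39, 9, 19, 21, 36]],
    columns=pd.MultiIndex.from_arrays([["one"] * 3 + ["two"] * 2, list("AABBB")]),
)

# Duplicates judged on full (upper, lower) tuples: three columns survive
full_key = df.loc[:, ~df.columns.duplicated()]

# Duplicates judged on the lower level alone: one "A" and one "B" survive
lower_only = df.loc[:, ~df.columns.get_level_values(1).duplicated()]

print(full_key)
print(lower_only)
```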

The `groupby` alternatives for the two solutions above are `df.groupby(level=0, axis=1).first()` and `df.groupby(level=0, axis=1).last()`, respectively.
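One difference worth knowing: `GroupBy.first()` and `GroupBy.last()` return the first or last *non-missing* value in each group, whereas the `duplicated` mask keeps whole columns, NaNs included. The two approaches therefore agree only when there are no missing values. A small sketch with made-up data, transposing instead of passing `axis=1` (deprecated in pandas 2.1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 5.0], [np.nan, 4.0, 6.0]], columns=["A", "A", "B"])

# Keeps the first "A" column as-is, including its NaN in row 1
by_mask = df.loc[:, ~df.columns.duplicated()]

# Takes the first non-null value per row among the "A" columns
by_groupby = df.T.groupby(level=0).first().T

print(by_mask)
print(by_groupby)
```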

Tags:

Python

Pandas

Group By

Dataframe

Pandas Groupby