Group by two columns and count the occurrences of each combination in Pandas

Maybe this is what you want?

>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id  product_id
a1       p1            2
         p2            1
a2       p1            3
a3       p2            2
         p3            1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
  user_id product_id  size
0      a1         p1     2
1      a1         p2     1
2      a2         p1     3
3      a3         p2     2
4      a3         p3     1
>>> new_df['size']
0    2
1    1
2    3
3    2
4    1
Name: size, dtype: int64

In Pandas 1.1.0 you can use the method value_counts with DataFrames:

df.value_counts()

Output:

product_id  user_id
p1          a2         3
p2          a3         2
p1          a1         2
p3          a3         1
p2          a1         1

If you need a DataFrame:

df.value_counts().to_frame('counts').reset_index()

Output:

  product_id user_id  counts
0         p1      a2       3
1         p2      a3       2
2         p1      a1       2
3         p3      a3       1
4         p2      a1       1

Group by two columns and count the occurrences of each combination in Pandas

Tags:

Python

Pandas

Dataframe

Data Analysis

Related

Recent Posts