Group by two columns and count the occurrences of each combination in Pandas
Maybe this is what you want?
>>> data = pd.DataFrame({'user_id' : ['a1', 'a1', 'a1', 'a2','a2','a2','a3','a3','a3'], 'product_id' : ['p1','p1','p2','p1','p1','p1','p2','p2','p3']})
>>> count_series = data.groupby(['user_id', 'product_id']).size()
>>> count_series
user_id product_id
a1 p1 2
p2 1
a2 p1 3
a3 p2 2
p3 1
dtype: int64
>>> new_df = count_series.to_frame(name = 'size').reset_index()
>>> new_df
user_id product_id size
0 a1 p1 2
1 a1 p2 1
2 a2 p1 3
3 a3 p2 2
4 a3 p3 1
>>> new_df['size']
0 2
1 1
2 3
3 2
4 1
Name: size, dtype: int64
In Pandas 1.1.0 you can use the method value_counts
with DataFrames:
df.value_counts()
Output:
product_id user_id
p1 a2 3
p2 a3 2
p1 a1 2
p3 a3 1
p2 a1 1
If you need a DataFrame:
df.value_counts().to_frame('counts').reset_index()
Output:
product_id user_id counts
0 p1 a2 3
1 p2 a3 2
2 p1 a1 2
3 p3 a3 1
4 p2 a1 1