Counting most common combination of values in dataframe column

We can merge within ID and filter out duplicate merges (I assume you have a default RangeIndex). Then we sort so that the grouping is regardless of order:

import pandas as pd
import numpy as np

df1 = df.reset_index()
df1 = df1.merge(df1, on='ID').query('index_x > index_y')

df1 = pd.DataFrame(np.sort(df1[['Product_x', 'Product_y']].to_numpy(), axis=1))
df1.groupby([*df1]).size()

0  1
A  B    2
   C    1
   D    1
C  D    1
dtype: int64

Use itertools.combinations, explode and value_counts

import itertools

(df.groupby('ID').Product.agg(lambda x: list(itertools.combinations(x,2)))
                 .explode().str.join('-').value_counts())

Out[611]:
A-B    2
C-D    1
A-D    1
A-C    1
Name: Product, dtype: int64

Or:

import itertools

(df.groupby('ID').Product.agg(lambda x: list(map('-'.join, itertools.combinations(x,2))))
                 .explode().value_counts())

Out[597]:
A-B    2
C-D    1
A-D    1
A-C    1
Name: Product, dtype: int64

Counting most common combination of values in dataframe column

Tags:

Python

Pandas

Related

Recent Posts