Keep rows in data frame that, for all combinations of the values of certain columns, contain the same elements in another column
Here is one way. Get unique lists per group and then check common elements across all the returned arrays using reduce
and np.intersect1d
. Then filter the dataframe using series.isin
and boolean indexing
from functools import reduce
out = df[df['c'].isin(reduce(np.intersect1d,df.groupby(['a','b'])['c'].unique()))]
Breakdown:
s = df.groupby(['a','b'])['c'].unique()
common_elements = reduce(np.intersect1d,s)
#Returns :-> array(['c1', 'c3'], dtype=object)
out = df[df['c'].isin(common_elements )]#.copy()
a b c d
0 x z c1 1
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
9 y w c3 10
Lets try groupby
with nunique
to count of unique elements per column c
group:
s = df['a'] + ',' + df['b'] # combination of a, b
m = s.groupby(df['c']).transform('nunique').eq(s.nunique())
df[m]
a b c d
0 x z c1 1
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
9 y w c3 10
Try something diff crosstab
s = pd.crosstab([df['a'],df['b']],df.c).all()
out = df.loc[df.c.isin(s.index[s])]
Out[34]:
a b c d
0 x z c1 1
2 x z c3 3
3 x w c1 4
4 x w c3 5
5 y z c1 6
6 y z c3 7
7 y w c1 8
9 y w c3 10