GroupBy operation using an entire dataframe to group values
You can stack both DataFrames into Series, then group one stacked Series by the other:

a.stack().groupby(b.stack()).mean()
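A minimal runnable sketch of this, assuming `a` holds values and `b` holds group labels of the same shape (the original frames are not shown, so the data here is made up):

```python
import pandas as pd

# Hypothetical sample data: `a` holds values, `b` holds group labels.
a = pd.DataFrame([[0.0, 1.0], [2.0, 4.0]], columns=['x', 'y'])
b = pd.DataFrame([[1, 2], [2, 1]], columns=['x', 'y'])

# stack() flattens each frame into a Series with a (row, column)
# MultiIndex; because both frames share the same shape and labels,
# the stacked Series align element-wise, so grouping one by the
# other computes the mean of `a`'s values for each label in `b`.
result = a.stack().groupby(b.stack()).mean()
print(result)
```

Label 1 covers the values 0.0 and 4.0 (mean 2.0), label 2 covers 1.0 and 2.0 (mean 1.5).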
If you want a fast NumPy solution, use np.unique and np.bincount:
c, d = (a_.to_numpy().ravel() for a_ in [a, b])  # flatten values and labels
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
np.bincount(i, c) / cnt  # per-group weighted sums divided by group sizes
# array([-0.0887145 , -0.34004319, -0.04559595,  0.58213553])
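To make the mechanics concrete, here is the same recipe on small made-up frames (the original data is not shown, so these values are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, mirroring the setup above.
a = pd.DataFrame([[0.0, 1.0], [2.0, 4.0]])
b = pd.DataFrame([[1, 2], [2, 1]])

# Flatten both frames into 1-D arrays of values (c) and labels (d).
c, d = (a_.to_numpy().ravel() for a_ in [a, b])

# `u` holds the unique labels, `i` maps each element to its label's
# position in `u`, and `cnt` counts how many elements carry each label.
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)

# bincount with weights sums `c` within each group; dividing by the
# group sizes turns those sums into per-group means.
means = np.bincount(i, c) / cnt
print(dict(zip(u, means)))
```

Here label 1 groups 0.0 and 4.0 (sum 4.0, count 2, mean 2.0) and label 2 groups 1.0 and 2.0 (sum 3.0, count 2, mean 1.5), matching the stack-based result.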
To construct a Series with the group labels as the index, use:

pd.Series(np.bincount(i, c) / cnt, index=u)
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
For comparison, the stack-based approach returns:
a.stack().groupby(b.stack()).mean()
1 -0.088715
2 -0.340043
3 -0.045596
4 0.582136
dtype: float64
%timeit a.stack().groupby(b.stack()).mean()
5.16 ms ± 305 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
c, d = (a_.to_numpy().ravel() for a_ in [a, b])
u, i, cnt = np.unique(d, return_inverse=True, return_counts=True)
np.bincount(i, c) / cnt
113 µs ± 1.92 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)