Groupby based on a multiple logical conditions applied to a different columns DataFrame
Create mask by your conditions - here for greater by Series.gt
with not equal by Series.ne
chained by &
for bitwise AND
and then use GroupBy.transform
for count True
s by sum
:
mask = df['value'].gt(3) & df['type'].ne('medium')
df['count'] = mask.groupby(df['entity']).transform('sum')
Solution with helper column new
:
mask = df['value'].gt(3) & df['type'].ne('medium')
df['count'] = df.assign(new = mask).groupby('entity')['new'].transform('sum')
print (df)
value type entity count
0 1 big R 3
1 2 small R 3
2 3 medium R 3
3 4 big P 4
4 2 big R 3
5 42 big P 4
6 12 big P 4
7 21 medium P 4
8 21 small R 3
9 424 small R 3
10 34 small P 4
11 12 medium R 3
12 42 small R 3
The solution in Pandas is superb. This is an alternative in a different package. The reason I am throwing this in here is because the original code was in data.table
in R, and it might be useful for others, who probably want a similar solution within Python.
This is a solution in pydatatable, a library that aims to replicate data.table
in python. Note that it is not as feature rich as Pandas; hopefully, with time, more features will be added.
Create the frame with datatable
:
from datatable import dt, f, by, update
df = dt.Frame({'value':[1,2,3,4,2,42,12,21,21,424,34,12,42],
'type':['big','small','medium','big','big','big','big','medium','small','small','small','medium','small'],
'entity':['R','R','R','P','R','P','P','P','R','R','P','R','R']})
Create the condition - In datatable, the f
symbol is a shortcut to refer to the dataframe:
condition = (f.type!="medium") & (f.value>3)
The syntax below should be familiar to users of data.table
,
DT[i, j, by]
where i
refers to anything that can occur in the rows, j
refers to column operations, and by
is for grouping operations. The update function is similar in function to the :=
function in data.table
; it allows for creation of new columns or update of existing columns in place.
df[:, update(count=dt.sum(condition)), by('entity')]
df
value type entity count
0 1 big R 3
1 2 small R 3
2 3 medium R 3
3 4 big P 4
4 2 big R 3
5 42 big P 4
6 12 big P 4
7 21 medium P 4
8 21 small R 3
9 424 small R 3
10 34 small P 4
11 12 medium R 3
12 42 small R 3