pandas dataframe: how to aggregate a subset of rows based on value of a column
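You can use a lambda and pd.concat to achieve this in a 'one-liner':
import pandas as pd
thresh = 6
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the kept rows and the summed 'X' row
(pd.concat([df[lambda x: x['value'] >= thresh],
            df[lambda x: x['value'] < thresh].sum().to_frame('X').T])
   .rename_axis(df.index.name))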
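Or, if you prefer a boolean mask:
import pandas as pd
mask = df['value'].ge(thresh)
# rename_axis restores the index name ('lab') that the unnamed summed row drops in concat
pd.concat([df[mask], df[~mask].sum().to_frame('X').T]).rename_axis(df.index.name)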
[out]
     value
lab
A       50
B       35
C        8
X        7
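A minimal, self-contained sketch of the mask-and-concat approach wrapped in a reusable function. The function name `aggregate_below` and the rows D, E and F (values 5, 1 and 1, chosen so the small rows sum to 7) are hypothetical sample data; only A, B, C and the total of 7 come from the output above.

```python
import pandas as pd

# Hypothetical sample data: rows D, E, F are made up so that they sum to 7
df = pd.DataFrame({'lab': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [50, 35, 8, 5, 1, 1]}).set_index('lab')


def aggregate_below(df, column, threshold, label):
    """Hypothetical helper: keep rows where df[column] >= threshold and
    collapse all other rows into a single summed row named `label`."""
    mask = df[column].ge(threshold)
    # Column-wise sum of the small rows, turned into a one-row frame indexed by `label`
    other = df[~mask].sum().to_frame(label).T
    # pd.concat replaces the removed DataFrame.append; rename_axis restores the index name
    return pd.concat([df[mask], other]).rename_axis(df.index.name)


out = aggregate_below(df, 'value', 6, 'X')
print(out)
```

Because every column is summed, the helper also works unchanged for frames with several numeric columns.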
Use setting with enlargement on the filtered DataFrame:
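threshold = 6
m = df['value'] < threshold
# copy() avoids a SettingWithCopyWarning when the filtered frame is modified below
df1 = df[~m].copy()
# assigning to a label that does not exist yet appends a new row 'Z' (setting with enlargement)
df1.loc['Z'] = df.loc[m, 'value'].sum()
print(df1)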
     value
lab
A       50
B       35
C        8
Z        7
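Assigning a scalar with setting with enlargement writes that one value into every column. If the frame had more than one column, a sketch like the following assigns the column-wise sums instead; the `count` column and the rows D, E and F are made-up sample data for illustration.

```python
import pandas as pd

# Hypothetical sample data with an extra 'count' column
df = pd.DataFrame({'lab': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [50, 35, 8, 5, 1, 1],
                   'count': [2, 3, 1, 4, 1, 1]}).set_index('lab')

threshold = 6
m = df['value'] < threshold
df1 = df[~m].copy()
# Assigning a Series aligns on the column labels, so each column gets its own sum;
# a scalar such as df.loc[m, 'value'].sum() would be written into every column
df1.loc['Z'] = df[m].sum()
print(df1)
```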
Another solution:
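import pandas as pd
m = df['value'] < threshold
# DataFrame.append was removed in pandas 2.0; build 'Z' as a one-row frame and concat it
df1 = pd.concat([df[~m], df.loc[m, ['value']].sum().to_frame('Z').T]).rename_axis(df.index.name)
print(df1)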
     value
lab
A       50
B       35
C        8
Z        7
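A different technique, not used in either answer above: relabel the small rows and let groupby do the summing. This is a sketch under the same assumed sample data (the rows D, E and F are made up so they sum to 7).

```python
import pandas as pd

# Hypothetical sample data: rows D, E, F are made up so that they sum to 7
df = pd.DataFrame({'lab': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [50, 35, 8, 5, 1, 1]}).set_index('lab')

threshold = 6
# Keep the original label where value >= threshold, otherwise relabel the row as 'Z'
labels = df.index.where(df['value'] >= threshold, 'Z')
# Group by the new labels; sort=False keeps groups in order of first appearance
df1 = df.groupby(labels, sort=False).sum()
print(df1)
```

Unlike the concat approach, the 'Z' row lands where the first small row appears rather than always at the end, and a real row already labelled 'Z' would be merged into it.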