How to assign unique values to groups of rows in a pandas dataframe based on a condition?
You can use cumsum and map to letters with chr:
m = df['A'].eq(0)
df['B'] = m.cumsum().add(65).map(chr).mask(m, '-')
df
A B
0 3 A
1 5 A
2 0 -
3 2 B
4 6 B
5 9 B
6 0 -
7 3 C
8 4 C
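For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is an assumption, reconstructed from the A column shown in the output; the intermediate names `group_id` and `labels` are only there for illustration.

```python
import pandas as pd

# Assumed sample data, taken from the A column in the printed output
df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

m = df['A'].eq(0)                   # True on the separator rows (A == 0)
group_id = m.cumsum()               # 0, 0, 1, 1, 1, 1, 2, 2, 2
labels = group_id.add(65).map(chr)  # 65 -> 'A', 66 -> 'B', ...
df['B'] = labels.mask(m, '-')       # replace the separator rows with '-'

print(df)
```

Keep in mind that `chr(65 + n)` only gives letters for the first 26 groups; after 'Z' it continues with other ASCII characters.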
A NumPy solution can be written from this using views, and should be quite fast:
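import numpy as np
m = np.cumsum(df['A'].values == 0)
# thanks to @user3483203 for the neat trick!
# view('U2') reinterprets each 8-byte integer as a 2-character string (assumes int64, little-endian)
df['B'] = (m + 65).view('U2')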
df
A B
0 3 A
1 5 A
2 0 B
3 2 B
4 6 B
5 9 B
6 0 C
7 3 C
8 4 C
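As far as I understand it, the view trick works because a 'U2' element occupies 8 bytes (two 4-byte code points), the same as an int64, so the integer 65 is reread as 'A' followed by a null character. Below is a rough sketch that makes that assumption explicit; the `'<i8'` and `'<U2'` dtypes and the `np.where` step for the '-' rows are my additions, not part of the snippet above.

```python
import numpy as np

# Assumed sample data, same values as column A
a = np.array([3, 5, 0, 2, 6, 9, 0, 3, 4])

group_id = np.cumsum(a == 0)

# Force 8-byte little-endian integers so the byte layout is predictable
codes = (group_id + 65).astype('<i8')

# Reinterpret the same bytes as 2-character little-endian unicode strings;
# the second character is a null and is dropped when the value is read
labels = codes.view('<U2')

# Optionally blank out the separator rows
masked = np.where(a == 0, '-', labels)

print(labels.tolist())
print(masked.tolist())
```

Without the explicit `astype('<i8')`, the result depends on the platform's default integer size, which is why the plain `.view('U2')` version can fail where integers default to 32 bits.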
From v0.22, you can also do this through pandas Series.view (note that Series.view has been deprecated since pandas 2.2):
m = df['A'].eq(0)
df['B'] = (m.cumsum()+65).view('U2').mask(m, '-')
df
A B
0 3 A
1 5 A
2 0 -
3 2 B
4 6 B
5 9 B
6 0 -
7 3 C
8 4 C
Here's one way using np.where. I'm using numerical labeling here, which might be more appropriate when there are many groups:
import numpy as np
m = df.eq(0)
df['A'] = np.where(m, '-', m.cumsum())
A
0 0
1 0
2 -
3 1
4 1
5 1
6 -
7 2
8 2
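The snippet above builds the mask from the whole frame and overwrites A. If you would rather keep A and write the labels to a separate column, a variant along these lines should work. This is only a sketch: the sample frame and the column name B are assumptions.

```python
import numpy as np
import pandas as pd

# Assumed sample data, same values as column A
df = pd.DataFrame({'A': [3, 5, 0, 2, 6, 9, 0, 3, 4]})

m = df['A'].eq(0)  # boolean Series instead of a boolean DataFrame

# Cast the counter to str so both branches of np.where hold strings
df['B'] = np.where(m, '-', m.cumsum().astype(str))

print(df)
```

Because the labels are plain numbers, this does not run out of symbols the way a letter-based mapping does after 26 groups.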
If I understand correctly, you can map the group numbers to letters with a dict:
import string
s = df.A.eq(0).cumsum()
d = dict(zip(s.unique(), string.ascii_uppercase[:s.max() + 1]))
s.loc[df.A != 0].map(d).reindex(df.index, fill_value='-')
Out[360]:
0 A
1 A
2 -
3 B
4 B
5 B
6 -
7 C
8 C
Name: A, dtype: object