Pandas: Insert missing row data and iterate with conditions within groups
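One way to do this is with shift: within each name group, compare To_num of each row with From_num of the previous row.
Take the rows that do not match, build the missing rows from them, and add those to the original df.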
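import pandas as pd

# From_num of the previous row within each name group
s = df.groupby('name', sort=False).From_num.shift()
# Keep rows whose To_num differs from the previous From_num (first row of a group excluded)
addingdata = pd.concat([s, df.drop(columns='From_num')], axis=1)[df.To_num.ne(s) & s.notnull()]
addingdata.index -= 1
addingdata.columns = ['To_num', 'name', 'From_num']
# DataFrame.append was removed in pandas 2.0; a stable sort keeps original rows first
df = pd.concat([df, addingdata]).sort_index(kind='mergesort')
df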
    name From_num  To_num
0    Jim       80      99
1    Jim       68      80
2    Jim      751      68
3    Jim  Started     751
4   Mike       32     105
5   Mike       68      32
6   Mike      126      68
7   Mike       49     126
8   Mike  Started      49
9   Polo      105     324
10  Polo       68     105
10  Polo      114      68
11  Polo       76     114
12  Polo  Started      76
13   Tom      251      96
13   Tom      115     251
14   Tom       49     115
15   Tom       23      49
16   Tom  Started      23
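For anyone who wants to run the shift approach end to end, here is a minimal self-contained sketch. The input frame is an assumption: it is reconstructed from the output above by dropping the two inserted rows, and all values are kept as strings because From_num also holds the marker "Started". The names prev_from, gap and missing are only illustrative.

```python
import pandas as pd

# Input reconstructed from the output above (an assumption, not the asker's data)
df = pd.DataFrame({
    "name": ["Jim"] * 4 + ["Mike"] * 5 + ["Polo"] * 4 + ["Tom"] * 4,
    "From_num": ["80", "68", "751", "Started",
                 "32", "68", "126", "49", "Started",
                 "105", "68", "76", "Started",
                 "251", "49", "23", "Started"],
    "To_num": ["99", "80", "68", "751",
               "105", "32", "68", "126", "49",
               "324", "105", "114", "76",
               "96", "115", "49", "23"],
})

# From_num of the previous row within each name group
prev_from = df.groupby("name", sort=False)["From_num"].shift()

# Rows whose To_num does not continue from the previous row's From_num
gap = df["To_num"].ne(prev_from) & prev_from.notna()

# Bridging rows: To_num <- previous From_num, From_num <- current To_num
missing = pd.DataFrame({
    "name": df.loc[gap, "name"],
    "From_num": df.loc[gap, "To_num"],
    "To_num": prev_from[gap],
})
missing.index -= 1  # reuse the index of the row each new row should follow

# mergesort is stable, so the original row stays ahead of the inserted one
result = pd.concat([df, missing]).sort_index(kind="mergesort")
print(result)
```

With this input, two rows are inserted: one for Polo (From_num 114, To_num 68) and one for Tom (From_num 115, To_num 251).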
We can do the following:
- Check whether To_num of the next row is equal to From_num of the current row.
- Do this check per group of name.
- For the rows where the check fails, make a copy and replace To_num by From_num.
- Finally, fill To_num of the next row into From_num of the copy.
This solution should be fast, since it is almost entirely vectorized. The one exception is the boolean check, which has to run for each group with GroupBy.apply, but that is an acceptable scenario to use apply.
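To make the per-group check concrete, here is a small sketch of the two masks on a single group. The frame is a hypothetical stand-in for the Polo rows before any row is inserted, with values as strings.

```python
import pandas as pd

# Hypothetical single group: the Polo rows before the missing row is added
polo = pd.DataFrame({
    "name": ["Polo"] * 4,
    "From_num": ["105", "68", "76", "Started"],
    "To_num": ["324", "105", "114", "76"],
})

next_to = polo["To_num"].shift(-1)   # To_num of the following row
m1 = polo["From_num"].ne(next_to)    # True where the chain breaks after this row
m2 = next_to.notna()                 # False for the last row, which has no next row
mask = m1 & m2

print(pd.DataFrame({"next_to": next_to, "m1": m1, "m2": m2, "mask": mask}))
```

Only the second row (From_num 68, followed by To_num 114) ends up flagged. The last row also differs from its missing neighbour, which is why the notna mask is needed.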
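def create_masks(d):
    # Compare each From_num with the To_num of the next row in the group
    shift = d['To_num'].shift(-1)
    m1 = d['From_num'].ne(shift)
    m2 = shift.notna()  # the last row of a group has no next row
    return m1 & m2

def create_rows(d):
    bools = d.groupby('name').apply(create_masks).reset_index(drop=True)
    vals = d[bools].copy()
    vals['To_num'] = vals['From_num']
    vals.loc[:, 'From_num'] = d.loc[bools.shift(fill_value=False), 'To_num'].to_numpy()
    # DataFrame.append was removed in pandas 2.0, so use pd.concat with a stable sort
    d = pd.concat([d, vals]).sort_index(kind='mergesort').reset_index(drop=True)
    return d

df = create_rows(df)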
Output
    name From_num  To_num
0    Jim       80      99
1    Jim       68      80
2    Jim      751      68
3    Jim  Started     751
4   Mike       32     105
5   Mike       68      32
6   Mike      126      68
7   Mike       49     126
8   Mike  Started      49
9   Polo      105     324
10  Polo       68     105
11  Polo      114      68
12  Polo       76     114
13  Polo  Started      76
14   Tom      251      96
15   Tom      115     251
16   Tom       49     115
17   Tom       23      49
18   Tom  Started      23
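If the GroupBy.apply step is a concern, the same mask can probably be built with a grouped shift instead. The sketch below is a variant, not the code from the answer above: insert_missing_rows and chain_is_complete are hypothetical helper names, the demo frame is made up from the Jim and Tom rows, and values are assumed to be strings. Like the original, it only bridges a gap of one missing row.

```python
import pandas as pd

def insert_missing_rows(d: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: bridge gaps between consecutive rows of a name."""
    # To_num of the following row within the same name group
    next_to = d.groupby("name", sort=False)["To_num"].shift(-1)
    broken = d["From_num"].ne(next_to) & next_to.notna()

    # Bridging row: To_num is the current From_num, From_num is the next To_num
    vals = d[broken].copy()
    vals["To_num"] = d.loc[broken, "From_num"]
    vals["From_num"] = next_to[broken]

    # Stable sort so each new row lands directly after the row it copies
    out = pd.concat([d, vals]).sort_index(kind="mergesort")
    return out.reset_index(drop=True)

def chain_is_complete(d: pd.DataFrame) -> bool:
    """True when every From_num equals the next To_num within its group."""
    next_to = d.groupby("name", sort=False)["To_num"].shift(-1)
    return bool((d["From_num"].eq(next_to) | next_to.isna()).all())

# Made-up demo frame: Jim is complete, Tom is missing the step 251 -> 115
demo = pd.DataFrame({
    "name": ["Jim"] * 4 + ["Tom"] * 4,
    "From_num": ["80", "68", "751", "Started", "251", "49", "23", "Started"],
    "To_num": ["99", "80", "68", "751", "96", "115", "49", "23"],
})

fixed = insert_missing_rows(demo)
print(fixed)
print(chain_is_complete(demo), chain_is_complete(fixed))
```

The second helper is a quick sanity check: it reports False for the demo input and True once the missing Tom row has been inserted.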