Find first and last non-zero column in each row of a pandas dataframe
Using cumsum on the underlying array
m = df.drop(['Name', 'count'], axis=1)
u = m.to_numpy().cumsum(1)
start = (u!=0).argmax(1)
end = u.argmax(1)
df.assign(start=m.columns[start], end=m.columns[end])
    Name  Jan17  Jun18  Dec18  Apr19  count  start    end
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19
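To see why the cumsum trick works, here is a self-contained sketch. The frame is rebuilt from the output shown above, and the approach assumes the values are non-negative (with negative entries the running total could return to zero or peak early).

```python
import pandas as pd

# Sample frame reconstructed from the printed output above.
df = pd.DataFrame({
    "Name": ["Nick", "Jack", "Fox", "Rex", "Snack", "Yosee", "Petty"],
    "Jan17": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.5],
    "Jun18": [1.7, 0.0, 1.7, 0.0, 0.0, 0.0, 1.3],
    "Dec18": [3.7, 2.8, 0.0, 3.0, 2.8, 0.0, 2.8],
    "Apr19": [0.0, 3.5, 0.0, 4.2, 4.4, 4.3, 3.5],
    "count": [2, 2, 1, 3, 2, 1, 4],
})

m = df.drop(["Name", "count"], axis=1)
u = m.to_numpy().cumsum(axis=1)  # running total along each row

# The running total first leaves zero at the first non-zero column ...
start = (u != 0).argmax(axis=1)
# ... and first reaches its maximum at the last non-zero column,
# because argmax returns the first occurrence of the maximum.
end = u.argmax(axis=1)

result = df.assign(start=m.columns[start], end=m.columns[end])
```

Adding a trailing 0.0 leaves the running total unchanged, so the first position of the row maximum is exactly the last non-zero column.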
first_valid_index and last_valid_index
d = df.mask(df == 0).drop(['Name', 'count'], axis=1)
df.assign(
    Start=d.apply(pd.Series.first_valid_index, axis=1),
    Finish=d.apply(pd.Series.last_valid_index, axis=1)
)
    Name  Jan17  Jun18  Dec18  Apr19  count  Start Finish
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19
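As a quick illustration of what happens to a single row, the sketch below takes Nick's values as a Series, masks the zeros to NaN, and asks for the first and last valid labels. The variable names are only for illustration.

```python
import pandas as pd

# One row of the sample data, as a Series indexed by column label.
row = pd.Series([0.0, 1.7, 3.7, 0.0],
                index=["Jan17", "Jun18", "Dec18", "Apr19"])

# mask() replaces the entries where the condition holds with NaN.
masked = row.mask(row == 0)

# Labels of the first and last non-NaN entries.
first = masked.first_valid_index()
last = masked.last_valid_index()
```

Both methods return None when every entry is NaN, so a row made up entirely of zeros would end up with missing Start and Finish values.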
stack then groupby
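d = df.mask(df == 0).drop(['Name', 'count'], axis=1)
# first and last remaining column label within each row's group
def fl(s): return s.xs(s.name).index[[0, -1]]
# dropna() keeps this working where stack() no longer discards NaN
r = d.stack().dropna().groupby(level=0).apply(fl)
df.assign(Start=r.str[0], Finish=r.str[-1])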
    Name  Jan17  Jun18  Dec18  Apr19  count  Start Finish
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19
idxmax
mask = df.drop(['Name', 'count'], axis=1) > 0
df.assign(start=mask.idxmax(axis=1), end=mask.iloc[:,::-1].idxmax(axis=1))
    Name  Jan17  Jun18  Dec18  Apr19  count  start    end
0   Nick    0.0    1.7    3.7    0.0      2  Jun18  Dec18
1   Jack    0.0    0.0    2.8    3.5      2  Dec18  Apr19
2    Fox    0.0    1.7    0.0    0.0      1  Jun18  Jun18
3    Rex    1.0    0.0    3.0    4.2      3  Jan17  Apr19
4  Snack    0.0    0.0    2.8    4.4      2  Dec18  Apr19
5  Yosee    0.0    0.0    0.0    4.3      1  Apr19  Apr19
6  Petty    0.5    1.3    2.8    3.5      4  Jan17  Apr19
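Drop the irrelevant columns and build a boolean mask of the positive entries. Then call idxmax on the mask, first on the columns in their original order and then on the reversed columns, to get the labels of the first and last positive entries respectively.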
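One thing to watch for: on a row with no positive entry the mask is all False, and idxmax simply returns the first column it sees. A minimal sketch of one way to guard against that, using a hypothetical two-row frame (the "Zed" row is made up for illustration):

```python
import pandas as pd

# Hypothetical frame; the second row contains only zeros.
df = pd.DataFrame({
    "Name": ["Nick", "Zed"],
    "Jan17": [0.0, 0.0],
    "Jun18": [1.7, 0.0],
    "Dec18": [3.7, 0.0],
    "Apr19": [0.0, 0.0],
})

mask = df.drop("Name", axis=1) > 0
has_any = mask.any(axis=1)  # True where the row has a positive entry

# where() keeps the label only for rows that have a hit
# and leaves a missing value for the others.
start = mask.idxmax(axis=1).where(has_any)
end = mask.iloc[:, ::-1].idxmax(axis=1).where(has_any)
```

Whether a missing value is the right placeholder for such rows depends on what you do with the result afterwards.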
In your case you could also try something different, using dot to join the labels of the non-zero columns in each row:
s=df.loc[:,'Jan17':'Apr19'].ne(0)
s=s.dot(s.columns+',').str[:-1].str.split(',')
s.str[0],s.str[-1]
Out[899]:
(0    Jun18
 1    Dec18
 2    Jun18
 3    Jan17
 4    Dec18
 5    Apr19
 6    Jan17
 dtype: object,
 0    Dec18
 1    Apr19
 2    Jun18
 3    Apr19
 4    Apr19
 5    Apr19
 6    Apr19
 dtype: object)
#df['Start'],df['End']=s.str[0],s.str[-1]