Unstacking shift data (start and end times) into hourly data

One idea is to work with minutes: first use a list comprehension with flattening to build a Series of one timestamp per worked minute, then group by date and hour, count the minutes per group with GroupBy.size, and finally divide by 60 to get hours:
import pandas as pd

# Build one timestamp per worked minute; subtract 60 seconds so the
# (exclusive) end time is not counted as an extra minute.
s = pd.Series([z for x, y in zip(df['Pay Time Start'],
                                 df['Pay Time End'] - pd.Timedelta(60, unit='s'))
                 for z in pd.date_range(x, y, freq='Min')])

# Count minutes per (date, hour) and convert the counts to hours.
df = (s.groupby([s.dt.date.rename('Business Date'), s.dt.hour.rename('Time')])
       .size()
       .div(60)
       .reset_index(name='Hour'))
print(df)
Business Date Time Hour
0 2019-05-24 11 1.00
1 2019-05-24 12 0.75
2 2019-05-24 13 0.50
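The snippet above assumes a DataFrame `df` already exists. A minimal self-contained sketch, with hypothetical shift data (two shifts, 11:00–12:15 and 12:30–13:30) chosen so the result matches the output above:

```python
import pandas as pd

# Hypothetical input: two shifts on the same day (assumed sample data).
df = pd.DataFrame({
    'Pay Time Start': pd.to_datetime(['2019-05-24 11:00', '2019-05-24 12:30']),
    'Pay Time End':   pd.to_datetime(['2019-05-24 12:15', '2019-05-24 13:30']),
})

# Expand each shift into per-minute timestamps; the 60-second subtraction
# keeps the exclusive end time from adding an extra minute.
s = pd.Series([z for x, y in zip(df['Pay Time Start'],
                                 df['Pay Time End'] - pd.Timedelta(60, unit='s'))
                 for z in pd.date_range(x, y, freq='Min')])

# Minutes per (date, hour), divided by 60 to get fractional hours.
out = (s.groupby([s.dt.date.rename('Business Date'), s.dt.hour.rename('Time')])
        .size()
        .div(60)
        .reset_index(name='Hour'))
```

Here hour 11 gets 60 minutes, hour 12 gets 45 (15 from the first shift plus 30 from the second), and hour 13 gets 30.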
If you need to group by a location or ID as well:
df1 = pd.DataFrame([(z, w) for x, y, w in zip(df['Pay Time Start'],
                                              df['Pay Time End'] - pd.Timedelta(60, unit='s'),
                                              df['Location'])
                           for z in pd.date_range(x, y, freq='Min')],
                   columns=['Date', 'Location'])

df = (df1.groupby([df1['Date'].dt.date.rename('Business Date'),
                   df1['Date'].dt.hour.rename('Time'),
                   df1['Location']])
         .size()
         .div(60)
         .reset_index(name='Hour'))
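A runnable sketch of the per-location variant, again with assumed sample data (one shift at location A, one overlapping shift at location B):

```python
import pandas as pd

# Hypothetical shifts at two locations (assumed sample data).
df = pd.DataFrame({
    'Pay Time Start': pd.to_datetime(['2019-05-24 11:00', '2019-05-24 11:30']),
    'Pay Time End':   pd.to_datetime(['2019-05-24 12:00', '2019-05-24 12:30']),
    'Location':       ['A', 'B'],
})

# Carry the location alongside each per-minute timestamp.
df1 = pd.DataFrame([(z, w) for x, y, w in zip(df['Pay Time Start'],
                                              df['Pay Time End'] - pd.Timedelta(60, unit='s'),
                                              df['Location'])
                           for z in pd.date_range(x, y, freq='Min')],
                   columns=['Date', 'Location'])

out = (df1.groupby([df1['Date'].dt.date.rename('Business Date'),
                    df1['Date'].dt.hour.rename('Time'),
                    df1['Location']])
          .size()
          .div(60)
          .reset_index(name='Hour'))
```

Location A contributes a full hour to hour 11, while B's 11:30–12:30 shift splits into half an hour each for hours 11 and 12.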
Another idea, similar to @jezrael's, but working with seconds for higher precision:
def get_series(a):
    # Spread the shift's total hours evenly over 6-second intervals.
    s, e, h = a
    idx = pd.date_range(s, e, freq='6s')
    return pd.Series(h / len(idx), index=idx)

(pd.concat(map(get_series, zip(df.Pay_Time_Start,
                               df.Pay_Time_End,
                               df.Hours)))
   .resample('H').sum()
)
Output:
2019-05-24 11:00:00 0.998668
2019-05-24 12:00:00 0.750500
2019-05-24 13:00:00 0.500832
Freq: H, dtype: float64
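A self-contained sketch of this seconds-based approach, with a single hypothetical one-hour shift (assumed sample data). Because pd.date_range includes both endpoints, one 6-second slot lands in the following hour, which is why the hourly totals above are slightly off from round numbers while the grand total is preserved:

```python
import pandas as pd

def get_series(a):
    # Spread the shift's total hours evenly over 6-second intervals.
    s, e, h = a
    idx = pd.date_range(s, e, freq='6s')
    return pd.Series(h / len(idx), index=idx)

# Hypothetical single one-hour shift (assumed sample data).
df = pd.DataFrame({
    'Pay_Time_Start': pd.to_datetime(['2019-05-24 11:00:00']),
    'Pay_Time_End':   pd.to_datetime(['2019-05-24 12:00:00']),
    'Hours':          [1.0],
})

# Concatenate the weighted per-shift series and sum per clock hour.
res = (pd.concat(map(get_series, zip(df.Pay_Time_Start,
                                     df.Pay_Time_End,
                                     df.Hours)))
         .resample('H').sum())
```

The 11:00–12:00 range yields 601 six-second points, so hour 11 receives 600/601 of the shift and hour 12 the remaining 1/601; the values sum back to exactly the recorded Hours.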