Unstacking shift data (start and end times) into hourly data

One idea is to work with minutes: first use a list comprehension with flattening to build a Series of one timestamp per worked minute, then group by date and hour, count the minutes per group with GroupBy.size, and finally divide by 60 to get hours:
import pandas as pd

# Build one timestamp per worked minute; subtract 60 seconds so the
# (exclusive) end time is not counted as an extra minute.
s = pd.Series([z for x, y in zip(df['Pay Time Start'],
                                 df['Pay Time End'] - pd.Timedelta(60, unit='s'))
                 for z in pd.date_range(x, y, freq='Min')])

# Count minutes per (date, hour) and convert the counts to hours.
df = (s.groupby([s.dt.date.rename('Business Date'), s.dt.hour.rename('Time')])
       .size()
       .div(60)
       .reset_index(name='Hour'))
print(df)
Business Date Time Hour
0 2019-05-24 11 1.00
1 2019-05-24 12 0.75
2 2019-05-24 13 0.50
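The snippet above assumes a DataFrame `df` already exists. A minimal self-contained sketch, with hypothetical shift data (two shifts, 11:00–12:15 and 12:30–13:30) chosen so the result matches the output above:

```python
import pandas as pd

# Hypothetical input: two shifts on the same day (assumed sample data).
df = pd.DataFrame({
    'Pay Time Start': pd.to_datetime(['2019-05-24 11:00', '2019-05-24 12:30']),
    'Pay Time End':   pd.to_datetime(['2019-05-24 12:15', '2019-05-24 13:30']),
})

# Expand each shift into per-minute timestamps; the 60-second subtraction
# keeps the exclusive end time from adding an extra minute.
s = pd.Series([z for x, y in zip(df['Pay Time Start'],
                                 df['Pay Time End'] - pd.Timedelta(60, unit='s'))
                 for z in pd.date_range(x, y, freq='Min')])

# Minutes per (date, hour), divided by 60 to get fractional hours.
out = (s.groupby([s.dt.date.rename('Business Date'), s.dt.hour.rename('Time')])
        .size()
        .div(60)
        .reset_index(name='Hour'))
```

Here hour 11 gets 60 minutes, hour 12 gets 45 (15 from the first shift plus 30 from the second), and hour 13 gets 30.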
If you need to group by a location or ID as well:
df1 = pd.DataFrame([(z, w) for x, y, w in zip(df['Pay Time Start'],
                                              df['Pay Time End'] - pd.Timedelta(60, unit='s'),
                                              df['Location'])
                           for z in pd.date_range(x, y, freq='Min')],
                   columns=['Date', 'Location'])

df = (df1.groupby([df1['Date'].dt.date.rename('Business Date'),
                   df1['Date'].dt.hour.rename('Time'),
                   df1['Location']])
         .size()
         .div(60)
         .reset_index(name='Hour'))
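A runnable sketch of the per-location variant, again with assumed sample data (one shift at location A, one overlapping shift at location B):

```python
import pandas as pd

# Hypothetical shifts at two locations (assumed sample data).
df = pd.DataFrame({
    'Pay Time Start': pd.to_datetime(['2019-05-24 11:00', '2019-05-24 11:30']),
    'Pay Time End':   pd.to_datetime(['2019-05-24 12:00', '2019-05-24 12:30']),
    'Location':       ['A', 'B'],
})

# Carry the location alongside each per-minute timestamp.
df1 = pd.DataFrame([(z, w) for x, y, w in zip(df['Pay Time Start'],
                                              df['Pay Time End'] - pd.Timedelta(60, unit='s'),
                                              df['Location'])
                           for z in pd.date_range(x, y, freq='Min')],
                   columns=['Date', 'Location'])

out = (df1.groupby([df1['Date'].dt.date.rename('Business Date'),
                    df1['Date'].dt.hour.rename('Time'),
                    df1['Location']])
          .size()
          .div(60)
          .reset_index(name='Hour'))
```

Location A contributes a full hour to hour 11, while B's 11:30–12:30 shift splits into half an hour each for hours 11 and 12.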
Another idea, similar to @jezrael's, but working with seconds for higher precision:
def get_series(a):
    # Spread the shift's total hours evenly over 6-second intervals.
    s, e, h = a
    idx = pd.date_range(s, e, freq='6s')
    return pd.Series(h / len(idx), index=idx)

(pd.concat(map(get_series, zip(df.Pay_Time_Start,
                               df.Pay_Time_End,
                               df.Hours)))
   .resample('H').sum()
)
Output:
2019-05-24 11:00:00 0.998668
2019-05-24 12:00:00 0.750500
2019-05-24 13:00:00 0.500832
Freq: H, dtype: float64
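A self-contained sketch of this seconds-based approach, with a single hypothetical one-hour shift (assumed sample data). Because pd.date_range includes both endpoints, one 6-second slot lands in the following hour, which is why the hourly totals above are slightly off from round numbers while the grand total is preserved:

```python
import pandas as pd

def get_series(a):
    # Spread the shift's total hours evenly over 6-second intervals.
    s, e, h = a
    idx = pd.date_range(s, e, freq='6s')
    return pd.Series(h / len(idx), index=idx)

# Hypothetical single one-hour shift (assumed sample data).
df = pd.DataFrame({
    'Pay_Time_Start': pd.to_datetime(['2019-05-24 11:00:00']),
    'Pay_Time_End':   pd.to_datetime(['2019-05-24 12:00:00']),
    'Hours':          [1.0],
})

# Concatenate the weighted per-shift series and sum per clock hour.
res = (pd.concat(map(get_series, zip(df.Pay_Time_Start,
                                     df.Pay_Time_End,
                                     df.Hours)))
         .resample('H').sum())
```

The 11:00–12:00 range yields 601 six-second points, so hour 11 receives 600/601 of the shift and hour 12 the remaining 1/601; the values sum back to exactly the recorded Hours.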