How do you model something-over-time in Python?
My approach would be to build the time series, but include the availability object with a value set to the availability in that period.
availability:
[
    {
        "start": "09:00",
        "end": "12:00",
        "value": 4
    },
    {
        "start": "12:00",
        "end": "13:00",
        "value": 3
    }
]
data:
[
    {
        "start": "10:00",
        "end": "10:30"
    }
]
Build the time series indexed on start/end times, with the availability as the value. An availability start time contributes +value and its end time -value, while an event contributes -1 at its start and +1 at its end, as you said.
"09:00" 4
"10:00" -1
"10:30" 1
"12:00" -4
"12:00" 3
"13:00" -3
Then group by index, sum, and take the cumulative sum, getting:
"09:00" 4
"10:00" 3
"10:30" 4
"12:00" 3
"13:00" 0
Example code in pandas:
import numpy as np
import pandas as pd
data = [
{
"start": "10:00",
"end": "10:30",
}
]
breakpoints = [
{
"start": "00:00",
"end": "09:00",
"value": 0
},
{
"start": "09:00",
"end": "12:00",
"value": 4
},
{
"start": "12:00",
"end": "12:30",
"value": 4
},
{
"start": "12:30",
"end": "13:00",
"value": 3
},
{
"start": "13:00",
"end": "00:00",
"value": 0
}
]
df = pd.DataFrame(data, columns=['start', 'end'])
print(df.head(5))
starts = pd.DataFrame(data, columns=['start'])
starts["value"] = -1
starts = starts.set_index("start")
ends = pd.DataFrame(data, columns=['end'])
ends["value"] = 1
ends = ends.set_index("end")
breakpointsStarts = pd.DataFrame(breakpoints, columns=['start', 'value']).set_index("start")
breakpointsEnds = pd.DataFrame(breakpoints, columns=['end', 'value'])
breakpointsEnds["value"] = breakpointsEnds["value"].transform(lambda x: -x)
breakpointsEnds = breakpointsEnds.set_index("end")
countsDf = pd.concat([starts, ends, breakpointsEnds, breakpointsStarts]).sort_index()
countsDf = countsDf.groupby(countsDf.index).sum().cumsum()
print(countsDf)
# Periods that are available
df = countsDf
df["available"] = df["value"] > 0
# Indexes where the value of available changes
# Alternatively swap out available for the value.
changed = df["available"].ne(df["available"].shift())
time_changes = df.index[changed].values
newDf = pd.DataFrame(time_changes, columns=["start"])
# Setting the end column to the value of the next start (wrapping around)
newDf["end"] = np.roll(newDf["start"].to_numpy(), -1)
print(newDf)
# Join this back in to get the actual value of available
mergedDf = newDf.merge(df, left_on="start", right_index=True)
print(mergedDf)
returning at the end:
start end value available
0 00:00 09:00 0 False
1 09:00 13:00 4 True
2 13:00 00:00 0 False
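If you only want the windows that are actually free, you can filter that result on the available column:
print(mergedDf[mergedDf["available"]])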
I'd approach it the same way you did with the appointments: model the free time as appointments of its own. For each ending appointment, check whether there's another one still ongoing; if so, skip it. If not, find the next starting appointment (one with a start date greater than this one's end date).
After you have iterated over all of your appointments, you should have an inverted mask of them.
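A minimal sketch of that inversion, assuming the appointments are already sorted (start, end) pairs and the working day is a fixed window (the function name and example times are only illustrative):
def invert_appointments(appointments, day_start, day_end):
    # Walk the sorted appointments, emitting the gaps between them as free time.
    free = []
    cursor = day_start
    for start, end in appointments:
        # A gap exists only if the next appointment starts after everything
        # seen so far has ended; overlapping appointments just push the cursor.
        if start > cursor:
            free.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < day_end:
        free.append((cursor, day_end))
    return free

print(invert_appointments([("10:00", "10:30"), ("10:15", "11:00")], "09:00", "18:00"))
# [('09:00', '10:00'), ('11:00', '18:00')]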
To me, this problem would be well represented by a list of boolean values. For ease of explanation, let's assume the length of every potential job is a multiple of 15 minutes. So, from 9 to 6, we have 36 "time slots" that we want to track availability for. We represent a queue's availability in a time slot with a boolean: False if the queue is processing a job, True if the queue is available.
First, we create a list of time slots for every queue as well as for the output. So, every queue and the output has time slots tk, 1 <= k <= 36.
Then, given five job queues qj, 1 <= j <= 5, we say that slot tk is "open" if there exists at least one qj whose time slot list at index k is True.
We can implement this in standalone Python as follows:
SLOTS = 36  # 9:00 to 6:00 in 15-minute slots

# Each queue needs its own list; [slots] * 5 would create five references
# to the same list, so marking one queue busy would mark them all.
queues = [[True] * SLOTS for _ in range(5)]
output = [False] * SLOTS

def available(k):
    for q in queues:
        if q[k]:
            return True
    return False
We can then assume there exists some function dispatch(length) that assigns a job to an available queue, setting the appropriate slots in that queue to False.
Finally, to update the output, we simply call:
def update():
    for k in range(SLOTS):
        output[k] = available(k)
Or, for increased efficiency:
def update(i, j):
    for k in range(i, j):
        output[k] = available(k)
Then, you could simply call update(i, j) whenever dispatch() updates time slots i through j for a new job. In this way, dispatching and updating is an O(n) operation, where n is the number of time slots being changed, regardless of how many time slots there are in total.
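dispatch() itself is only assumed above; purely as an illustration (the first-fit strategy is my own assumption, not part of the original), it could look like this, reusing the two-argument update(i, j) from above:
def dispatch(length):
    # Place a job needing `length` consecutive free slots into the first
    # queue that has such a run; mark those slots busy and refresh output.
    for j, q in enumerate(queues):
        run = 0
        for k, free in enumerate(q):
            run = run + 1 if free else 0
            if run == length:
                start = k - length + 1
                for s in range(start, k + 1):
                    q[s] = False
                update(start, k + 1)
                return j, start
    return None  # no queue has enough consecutive free slots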
You could then write a simple function that maps human-readable times onto time slot indices, which also makes it easy to make the time slots larger or smaller as you wish.
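For example, such a mapping could be as small as the following (assuming the day starts at 09:00 and 15-minute slots; the helper name is made up for illustration):
SLOT_MINUTES = 15
DAY_START = 9 * 60  # 09:00, in minutes from midnight

def time_to_slot(hhmm):
    # Map an "HH:MM" string onto a slot index, e.g. "10:30" -> 6.
    hours, minutes = map(int, hhmm.split(":"))
    return (hours * 60 + minutes - DAY_START) // SLOT_MINUTES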
You could also easily extend this idea to use a pandas DataFrame where each column is one queue, allowing you to use .any(axis=1) across every row at once to quickly update the output column.
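A rough sketch of that pandas variant (the queue column names are made up):
import pandas as pd

# One column per queue, one row per time slot; True means the queue is free.
frame = pd.DataFrame({
    "q1": [True, False, True],
    "q2": [False, False, True],
})
# A slot is open overall if at least one queue is free in that row.
frame["output"] = frame[["q1", "q2"]].any(axis=1)
print(frame)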
Would love to hear suggestions regarding this approach! Perhaps there's a complexity of the problem I've missed, but I think this is a nice solution.