TimeSeries interpolation with time gap awareness
I do not see an elegant solution using TimeSeriesResample or the like via ResamplingMethod or MissingDataMethod. Maybe there is something undocumented? But programming something that does what you want looks straightforward, imo:
(* define a function to determine the gap size for a given time *)
gapsize = Function[{ts, time},
   With[
    {
     differences = ts["Times"] // Differences,
     intervals = ts["Times"] // Partition[#, 2, 1] & // Map[Interval]
     },
    Module[{pos},
     (* find the interval containing the given time and return its length *)
     pos = FirstPosition[intervals, intval_ /; IntervalMemberQ[intval, time]];
     Extract[differences, pos]
     ]
    ]
   ];
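For a quick check, assuming ts is the example series used in this thread (the same one defined explicitly in the last answer below), the gap around t = 5 is the length of the interval {4, 10}:

ts = TimeSeries[{{1, 1}, {2, 2}, {4, 4}, {10, 4}, {11, 3}, {12, 2}, {13, 1}}];
gapsize[ts, 5]
(* 6 *)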
Options[gapPathFunc] = {
   "CriticalGapSize" -> 3
   };

gapPathFunc[ts_, time_, opts : OptionsPattern[gapPathFunc]] := With[
   {
    pathf = ts["PathFunction"],
    critGapSize = OptionValue["CriticalGapSize"]
    },
   If[gapsize[ts, time] < critGapSize,
    (* then *) pathf[time],
    (* else *) Missing[]
    ]
   ]
gapPathFunc[ ts, # ]& /@ Range[ 1, 13 ]
{1, 2, 3, 4, Missing[], Missing[], Missing[], Missing[], Missing[], Missing[], 3, 2, 1}
This may of course be made more robust and elegant (e.g. by returning a modified path function), but it is a start.
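For illustration, here is a minimal sketch of the "modified path function" idea; the name makeGapPathFunction and the default critical gap size are mine, not part of the answer above. It builds a gap-aware function once and then maps it over the query times:

(* hypothetical helper: returns a gap-aware replacement for ts["PathFunction"] *)
makeGapPathFunction[ts_, critGapSize_ : 3] :=
 With[{pathf = ts["PathFunction"]},
  Function[time,
   If[gapsize[ts, time] < critGapSize, pathf[time], Missing[]]
   ]
  ]

gpf = makeGapPathFunction[ts, 3];
gpf /@ Range[1, 13]
(* {1, 2, 3, 4, Missing[], Missing[], Missing[], Missing[], Missing[], Missing[], 3, 2, 1} *)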
After a lot of trial and error, I created a function that I think is fast enough for general data science (I mean, applicable to millions of records).
At first, I tried to create two interpolation functions: one for the gap itself (a zero-order interpolation function) and one for the values (the "PathFunction" itself). But Mathematica's Interpolation needs some revamping, both in speed and in options. For zero order, there is only the "Ceiling" behaviour. That is OK, since we can "slide" the data and cover the "Floor" cases with a MemberQ test. I haven't tried all possible variations of this technique, and I might eventually have gotten better results by relying on memoization (pre-building the InterpolatingFunction), but all versions felt very cumbersome.
But then I remembered Leonid's post. This led me to considerable speedups, even without any special memoization. This is the adapted version (not compiled):
timeSeriesInterpolationGap[ts_, moment_, gap_] :=
 Module[{n0, n1, m, pos, times = ts["Times"], values = ts["Values"],
   timeMoment, gapTime},
  (* accept either a DateObject or a numeric time *)
  timeMoment = If[DateObjectQ[moment], AbsoluteTime[moment], moment];
  (* accept the gap either as a time Quantity or as plain seconds *)
  gapTime = If[QuantityQ[gap], QuantityMagnitude@UnitConvert[gap, "Seconds"], gap];
  Which[
   times[[1]] <= timeMoment < times[[-1]],
   (* binary search for the last timestamp at or before timeMoment *)
   n0 = 1; n1 = Length[times];
   While[n0 <= n1,
    m = Floor[(n0 + n1)/2];
    If[times[[m]] == timeMoment, Break[]];
    If[times[[m]] < timeMoment, n0 = m + 1, n1 = m - 1]];
   pos = Max[If[times[[m]] < timeMoment, m, m - 1], 1]; (* clamp for timeMoment == times[[1]] *)
   (* interpolate linearly only if the surrounding gap is small enough *)
   If[times[[pos + 1]] - times[[pos]] <= gapTime,
    values[[pos]] + (values[[pos + 1]] - values[[pos]])/
       (times[[pos + 1]] - times[[pos]])*(timeMoment - times[[pos]]),
    Missing[]],
   timeMoment == times[[-1]],
   values[[-1]],
   True,
   Missing[]
   ]]
The function can be called with times or with dates. Also, the gap can be specified with time units (internally converted to seconds) or without units (interpreted as seconds).
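As a minimal check of the calling conventions (assuming the same example series ts as above; for a date-based series one would pass a DateObject as the moment):

timeSeriesInterpolationGap[ts, 3.5, 3]
(* 3.5 *)

timeSeriesInterpolationGap[ts, 3.5, Quantity[3, "Seconds"]]
(* 3.5 *)

timeSeriesInterpolationGap[ts, 6, 3] (* inside the large 4 -> 10 gap *)
(* Missing[] *)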
WARNING: it is not an exact replica of the TimeSeries interpolation function, since repeated timestamps are not treated in the same way. To be an exact replica, it would have to pass the following tests:
TimeSeries[{{0, 1}, {0, 2}, {0, 3}, {0, 4}}][0]
(*5/2*)
TimeSeries[{{0, 1}, {0, 2}, {0, 3}, {1, 4}}][0.5]
(*3.*)
TimeSeries[{{0, 1}, {1, 2}, {1, 3}, {1, 4}}][0.5]
(*2.*)
Also, there is no treatment of Missing values, but that is also the case for the TimeSeries functionality.
Any help improving it to match the built-in behaviour, while staying fast, is welcome.
On 300 000 records, the function is approximately three orders of magnitude faster than the solution presented by @gwr. One thing that is not helping gwr's answer is the fact that FirstPosition is two orders of magnitude slower than Leonid's function, which doesn't make much sense (I mean, it doesn't make sense that Mathematica doesn't have a more optimized function). This led me to also test the TimeSeriesWindow function. Its speed is comparable to that of Leonid's function, which makes even less sense (I haven't checked whether the code can be inspected).
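For context, the kind of compiled binary search being referred to looks roughly like the sketch below (my own paraphrase, not Leonid's exact code; the name positionOfLastBelow is mine). It returns the position of the last element less than or equal to the query, which is exactly the pos used in the functions above:

positionOfLastBelow = Compile[{{list, _Real, 1}, {x, _Real}},
   Module[{n0 = 1, n1 = Length[list], m = 1},
    While[n0 <= n1,
     m = Quotient[n0 + n1, 2];
     Which[
      list[[m]] == x, n0 = n1 + 1, (* exact hit: stop searching *)
      list[[m]] < x, n0 = m + 1,
      True, n1 = m - 1]];
    If[list[[m]] > x, m - 1, m]],
   RuntimeOptions -> "Speed"];

positionOfLastBelow[N @ ts["Times"], 5.]
(* 3 *)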
I also created a version that just does the interpolation, without any gap checking. I haven't merged the two into the same code...:
timeSeriesInterpolation[ts_, moment_] :=
 Module[{n0, n1, m, pos, times = ts["Times"], values = ts["Values"],
   timeMoment},
  (* accept either a DateObject or a numeric time *)
  timeMoment = If[DateObjectQ[moment], AbsoluteTime[moment], moment];
  Which[
   times[[1]] <= timeMoment < times[[-1]],
   (* binary search for the last timestamp at or before timeMoment *)
   n0 = 1; n1 = Length[times];
   While[n0 <= n1,
    m = Floor[(n0 + n1)/2];
    If[times[[m]] == timeMoment, Break[]];
    If[times[[m]] < timeMoment, n0 = m + 1, n1 = m - 1]
    ];
   pos = Max[If[times[[m]] < timeMoment, m, m - 1], 1];
   (* plain linear interpolation between the neighbouring samples *)
   values[[pos]] + (values[[pos + 1]] - values[[pos]])/
      (times[[pos + 1]] - times[[pos]])*(timeMoment - times[[pos]]),
   timeMoment == times[[-1]],
   values[[-1]],
   True,
   Missing[]
   ]]
(Again, it suffers from the same differences with the TimeSeries functionality that were listed above.)
Testing this version against the TimeSeries functionality, for the same 300 000 records:
[10^-6 s] The fastest is the use of "PathFunction", having prebuilt it first [0.5 s] and calling the prebuilt one ourselves: myFun[time_] = ts["PathFunction"][time] (notice the =, instead of :=).
[10^-3 s] Using ts[time] is 20% faster than ts[date]. Both incur the pre-build time [0.5 s]. It is interesting that calling our own pre-built Set memoization of the "PathFunction" makes such a difference in speed (three orders of magnitude...).
[10^-5 s] Using ts["PathFunction"][time]. Obviously, it also incurs the pre-build time. But again, it is still strange that calling our own pre-built Set memoization of the "PathFunction" makes a difference in speed (one order of magnitude...).
[10^-4 s] Using timeSeriesInterpolation[ts, time] (and two times slower if using a date instead of a time). BUT NO PRE-BUILD time. So it is almost as fast as TimeSeries can be, and doesn't need pre-building. Obviously, the main advantage is that it is the only option we have that has time gap awareness.
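A rough sketch of the kind of benchmark behind the list above (bigts stands for the 300 000-record series, built as in the answer below; myFun is the prebuilt Set memoization mentioned in the first item; the sample time 123456. is arbitrary):

myFun[time_] = bigts["PathFunction"][time]; (* note = , not := : prebuilt once *)

RepeatedTiming[myFun[123456.];]                          (* ~10^-6 s *)
RepeatedTiming[bigts["PathFunction"][123456.];]          (* ~10^-5 s *)
RepeatedTiming[timeSeriesInterpolation[bigts, 123456.];] (* ~10^-4 s *)
RepeatedTiming[bigts[123456.];]                          (* ~10^-3 s *)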
Why so much trouble? TimeSeries objects are not exactly like the examples the documentation presents... They typically have millions of records, and hence 10^-6 s is fundamental. If only I could get 10^-6 s with the gap analysis...
The addition of an option that allows a certain tolerance on the gap would also be welcome. That is, if the searched timestamp lies within a too-large gap but only a small delta away from a recorded timestamp, the function assumes the value of that recorded timestamp (this is important for records saved with different time precisions, which is common in field instrumentation, PLCs, etc.).
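As a sketch of what such a tolerance option could look like (the name timeSeriesInterpolationGapTol and the extra delta argument are hypothetical, not existing code), the gap test above would be preceded by a proximity check against the two neighbouring timestamps:

timeSeriesInterpolationGapTol[ts_, moment_, gap_, delta_] :=
 Module[{n0, n1, m, pos, times = ts["Times"], values = ts["Values"],
   timeMoment, gapTime},
  timeMoment = If[DateObjectQ[moment], AbsoluteTime[moment], moment];
  gapTime = If[QuantityQ[gap], QuantityMagnitude@UnitConvert[gap, "Seconds"], gap];
  Which[
   times[[1]] <= timeMoment < times[[-1]],
   n0 = 1; n1 = Length[times];
   While[n0 <= n1,
    m = Floor[(n0 + n1)/2];
    If[times[[m]] == timeMoment, Break[]];
    If[times[[m]] < timeMoment, n0 = m + 1, n1 = m - 1]];
   pos = Max[If[times[[m]] < timeMoment, m, m - 1], 1];
   Which[
    (* within delta of a recorded timestamp: return that sample even in a large gap *)
    Abs[timeMoment - times[[pos]]] <= delta, values[[pos]],
    Abs[timeMoment - times[[pos + 1]]] <= delta, values[[pos + 1]],
    times[[pos + 1]] - times[[pos]] <= gapTime,
    values[[pos]] + (values[[pos + 1]] - values[[pos]])/
       (times[[pos + 1]] - times[[pos]])*(timeMoment - times[[pos]]),
    True, Missing[]],
   timeMoment == times[[-1]], values[[-1]],
   True, Missing[]]]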
It is also interesting to notice that if a TimeSeries is supplied with units, the timeSeriesInterpolation function is one order of magnitude slower. This seems to be due to a huge overhead coming from units arithmetic... But at least it works, while everything based on "PathFunction" breaks in the presence of Quantity. The results of the following tests are quite telling:
RepeatedTiming[Quantity[1.0, "Seconds"]/10]
(*{0.000273, Quantity[0.1, "Seconds"]}*)
RepeatedTiming[Quantity[QuantityMagnitude[Quantity[1.0, "Seconds"]]/10,
QuantityUnit[Quantity[1.0, "Seconds"]]]]
(*{0.0000667, Quantity[0.1, "Seconds"]}*)
RepeatedTiming[Quantity[Quantity[1.0, "Seconds"][[1]]/10,
Quantity[1.0, "Seconds"][[2]]]]
(*{0.0000278, Quantity[0.1, "Seconds"]}*)
RepeatedTiming[1/10]
(*{8.0*10^-7, 1/10}*)
There seems to be some space for optimizations...
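One possible workaround, sketched under the assumption that all values of the series carry the same unit (quantityTs below is a hypothetical unit-valued series): strip the unit once, interpolate numerically, and reattach the unit at the end.

stripUnits[qts_] :=
 Module[{unit = QuantityUnit @ First @ qts["Values"]},
  {TimeSeries[Transpose[{qts["Times"], QuantityMagnitude @ qts["Values"]}]], unit}]

{plainTs, unit} = stripUnits[quantityTs]; (* quantityTs: hypothetical Quantity-valued series *)
Quantity[timeSeriesInterpolation[plainTs, 3.5], unit]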
I believe this can be done more efficiently and simply than either of the answers posted, by using two InterpolatingFunctions, as you described yourself. Here is a proof of concept for your evaluation.
makePF[ts_, gap_?NumericQ] :=
 Module[{t, d, gf, pf},
  t = ts["Times"];
  (* 1 where the step to the next sample is within the gap, 0 where it is too large *)
  d = UnitStep[gap - Differences @ t];
  (* zero-order interpolation of that indicator over the sample times *)
  gf = ListInterpolation[d, Rest@t, InterpolationOrder -> 0];
  (* pull out the interpolating object used by the path function *)
  pf = ts["PathFunction"][[2, 0]];
  (* replace the head of the value list by Missing[]; Extract part 0 (the head)
     where the indicator is 0, and part i (the value) where it is 1 *)
  Quiet @ Extract[
     Missing[] @@ pf[#],
     Partition[Unitize @ gf[#] * Range @ Length @ #, 1]
   ] &
 ]
Basic example:
ts = TimeSeries[{{1, 1}, {2, 2}, {4, 4}, {10, 4}, {11, 3}, {12, 2}, {13, 1}}];
makePF[ts, 3] @ Range[13]
{1, 2, 3, 4, Missing[], Missing[], Missing[], Missing[], Missing[], Missing[], 3, 2, 1}
Note that the function generated by makePF is listable, and that is how it should be applied.
Performance on a 300,000 element TimeSeries:
bigts = TimeSeries[
{Sort @ RandomReal[1*^6, 300000],
RandomReal[{-9, 9}, 300000]}\[Transpose]
];
AbsoluteTiming[bigfn = makePF[bigts, 5];] (* build my interpolating object *)
{0.715165, Null}
samples = Range[1, 1*^6, 100];
(timeSeriesInterpolationGap[bigts, #, 5] & /@ samples); // RepeatedTiming
(bigfn @ samples); // RepeatedTiming
{2.181, Null}
{0.0268, Null}
The output is not an exact match for yours, but if this method holds promise I imagine it could be adjusted.
Obviously, if only a few samples are needed your own method is superior, but I am assuming that a lot of time-intensive sampling would be performed.