TimeSeries interpolation with time gap awareness
I do not see an elegant solution using TimeSeriesResample or the like via ResamplingMethod or MissingDataMethod. Maybe there is something undocumented? But programming something that does what you want looks straightforward, imo:
(* define a function to determine the gap size for a given time *)
gapsize = Function[{ts, time},
   With[
    {
     differences = ts["Times"] // Differences,
     intervals = ts["Times"] // Partition[#, 2, 1] & // Map[Interval]
     },
    Module[{pos},
     (* find the interval containing the given time and return its length *)
     pos = FirstPosition[intervals, intval_ /; IntervalMemberQ[intval, time]];
     Extract[differences, pos]
     ]
    ]
   ];
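For a quick check, assuming ts is the example series used in this thread (the same one defined explicitly in the last answer below), the gap around t = 5 is the length of the interval {4, 10}:

ts = TimeSeries[{{1, 1}, {2, 2}, {4, 4}, {10, 4}, {11, 3}, {12, 2}, {13, 1}}];
gapsize[ts, 5]
(* 6 *)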
Options[gapPathFunc] = {
   "CriticalGapSize" -> 3
   };

gapPathFunc[ts_, time_, opts : OptionsPattern[gapPathFunc]] := With[
   {
    pathf = ts["PathFunction"],
    critGapSize = OptionValue["CriticalGapSize"]
    },
   If[gapsize[ts, time] < critGapSize,
    (* then *) pathf[time],
    (* else *) Missing[]
    ]
   ]
gapPathFunc[ ts, # ]& /@ Range[ 1, 13 ]
{1, 2, 3, 4, Missing[], Missing[], Missing[], Missing[], Missing[], Missing[], 3, 2, 1}
This may of course be made more robust and elegant (e.g. by returning a modified path function), but it is a start.
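For illustration, here is a minimal sketch of the "modified path function" idea; the name makeGapPathFunction and the default critical gap size are mine, not part of the answer above. It builds a gap-aware function once and then maps it over the query times:

(* hypothetical helper: returns a gap-aware replacement for ts["PathFunction"] *)
makeGapPathFunction[ts_, critGapSize_ : 3] :=
 With[{pathf = ts["PathFunction"]},
  Function[time,
   If[gapsize[ts, time] < critGapSize, pathf[time], Missing[]]
   ]
  ]

gpf = makeGapPathFunction[ts, 3];
gpf /@ Range[1, 13]
(* {1, 2, 3, 4, Missing[], Missing[], Missing[], Missing[], Missing[], Missing[], 3, 2, 1} *)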
After a lot of trial and error, I created a function that I think is fast enough for general data science (I mean, applicable to millions of records).
At first, I tried to create two interpolation functions: one for the gap itself (a zero-order interpolation function) and one for the values (the "PathFunction" itself). But Mathematica's Interpolation needs some revamping, both in speed and in options. For zero order, there is only the "Ceiling" behaviour. That is OK, since we can "slide" the data and cover the "Floor" cases with a MemberQ test. I haven't tried all possible variations of this technique, and I might eventually have gotten better results by relying on memoization (pre-building the InterpolatingFunction), but all versions felt very cumbersome.
But then I remembered Leonid's post. This led me to considerable speedups, even without any special memoization. This is the adapted version (not compiled):
timeSeriesInterpolationGap[ts_, moment_, gap_] :=
 Module[{n0, n1, m, pos, times = ts["Times"], values = ts["Values"],
   timeMoment, gapTime},
  (* accept either a DateObject or a numeric time *)
  timeMoment = If[DateObjectQ[moment], AbsoluteTime[moment], moment];
  (* accept the gap either as a time Quantity or as plain seconds *)
  gapTime = If[QuantityQ[gap], QuantityMagnitude@UnitConvert[gap, "Seconds"], gap];
  Which[
   times[[1]] <= timeMoment < times[[-1]],
   (* binary search for the last timestamp at or before timeMoment *)
   n0 = 1; n1 = Length[times];
   While[n0 <= n1,
    m = Floor[(n0 + n1)/2];
    If[times[[m]] == timeMoment, Break[]];
    If[times[[m]] < timeMoment, n0 = m + 1, n1 = m - 1]];
   pos = Max[If[times[[m]] < timeMoment, m, m - 1], 1]; (* clamp for timeMoment == times[[1]] *)
   (* interpolate linearly only if the surrounding gap is small enough *)
   If[times[[pos + 1]] - times[[pos]] <= gapTime,
    values[[pos]] + (values[[pos + 1]] - values[[pos]])/
       (times[[pos + 1]] - times[[pos]])*(timeMoment - times[[pos]]),
    Missing[]],
   timeMoment == times[[-1]],
   values[[-1]],
   True,
   Missing[]
   ]]
The function can be called with times or with dates. Also, the gap can be specified with time units (internally converted to seconds) or without units (interpreted as seconds).
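As a minimal check of the calling conventions (assuming the same example series ts as above; for a date-based series one would pass a DateObject as the moment):

timeSeriesInterpolationGap[ts, 3.5, 3]
(* 3.5 *)

timeSeriesInterpolationGap[ts, 3.5, Quantity[3, "Seconds"]]
(* 3.5 *)

timeSeriesInterpolationGap[ts, 6, 3] (* inside the large 4 -> 10 gap *)
(* Missing[] *)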
WARNING: it is not an exact replica of the TimeSeries interpolation function, since repeated timestamps are not treated in the same way. To be an exact replica, it would have to pass the following tests:
TimeSeries[{{0, 1}, {0, 2}, {0, 3}, {0, 4}}][0]
(*5/2*)
TimeSeries[{{0, 1}, {0, 2}, {0, 3}, {1, 4}}][0.5]
(*3.*)
TimeSeries[{{0, 1}, {1, 2}, {1, 3}, {1, 4}}][0.5]
(*2.*)
Also, there is no treatment of Missing values, but that is also the case for the TimeSeries functionality.
Any help improving it to match the built-in behaviour, while staying fast, is welcome.
On 300 000 records, the function is approximately three orders of magnitude faster than the solution presented by @gwr. One thing that is not helping gwr's answer is the fact that FirstPosition is two orders of magnitude slower than Leonid's function, which doesn't make much sense (I mean, it doesn't make sense that Mathematica doesn't have a more optimized function). This led me to also test the TimeSeriesWindow function. Its speed is comparable to that of Leonid's function, which makes even less sense (I haven't checked whether the code can be inspected).
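For context, the kind of compiled binary search being referred to looks roughly like the sketch below (my own paraphrase, not Leonid's exact code; the name positionOfLastBelow is mine). It returns the position of the last element less than or equal to the query, which is exactly the pos used in the functions above:

positionOfLastBelow = Compile[{{list, _Real, 1}, {x, _Real}},
   Module[{n0 = 1, n1 = Length[list], m = 1},
    While[n0 <= n1,
     m = Quotient[n0 + n1, 2];
     Which[
      list[[m]] == x, n0 = n1 + 1, (* exact hit: stop searching *)
      list[[m]] < x, n0 = m + 1,
      True, n1 = m - 1]];
    If[list[[m]] > x, m - 1, m]],
   RuntimeOptions -> "Speed"];

positionOfLastBelow[N @ ts["Times"], 5.]
(* 3 *)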
I also created a version that just does the interpolation, without any gap checking. I haven't merged the two into the same code...:
timeSeriesInterpolation[ts_, moment_] :=
 Module[{n0, n1, m, pos, times = ts["Times"], values = ts["Values"],
   timeMoment},
  (* accept either a DateObject or a numeric time *)
  timeMoment = If[DateObjectQ[moment], AbsoluteTime[moment], moment];
  Which[
   times[[1]] <= timeMoment < times[[-1]],
   (* binary search for the last timestamp at or before timeMoment *)
   n0 = 1; n1 = Length[times];
   While[n0 <= n1,
    m = Floor[(n0 + n1)/2];
    If[times[[m]] == timeMoment, Break[]];
    If[times[[m]] < timeMoment, n0 = m + 1, n1 = m - 1]
    ];
   pos = Max[If[times[[m]] < timeMoment, m, m - 1], 1];
   (* plain linear interpolation between the neighbouring samples *)
   values[[pos]] + (values[[pos + 1]] - values[[pos]])/
      (times[[pos + 1]] - times[[pos]])*(timeMoment - times[[pos]]),
   timeMoment == times[[-1]],
   values[[-1]],
   True,
   Missing[]
   ]]
(Again, it suffers from the same differences with the TimeSeries functionality that were listed above.)
Testing this version against the TimeSeries functionality, for the same 300 000 records:
[10^-6 s] The fastest is the use of "PathFunction", having prebuilt it first [0.5 s] and calling the prebuilt one ourselves: myFun[time_] = ts["PathFunction"][time] (notice the =, instead of :=).
[10^-3 s] Using ts[time] is 20% faster than ts[date]. Both incur the pre-build time [0.5 s]. It is interesting that calling our own pre-built Set memoization of the "PathFunction" makes such a difference in speed (three orders of magnitude...).
[10^-5 s] Using ts["PathFunction"][time]. Obviously, it also incurs the pre-build time. But again, it is still strange that calling our own pre-built Set memoization of the "PathFunction" makes a difference in speed (one order of magnitude...).
[10^-4 s] Using timeSeriesInterpolation[ts, time] (and two times slower if using a date instead of a time). BUT NO PRE-BUILD time. So it is almost as fast as TimeSeries can be, and doesn't need pre-building. Obviously, the main advantage is that it is the only option we have that has time gap awareness.
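A rough sketch of the kind of benchmark behind the list above (bigts stands for the 300 000-record series, built as in the answer below; myFun is the prebuilt Set memoization mentioned in the first item; the sample time 123456. is arbitrary):

myFun[time_] = bigts["PathFunction"][time]; (* note = , not := : prebuilt once *)

RepeatedTiming[myFun[123456.];]                          (* ~10^-6 s *)
RepeatedTiming[bigts["PathFunction"][123456.];]          (* ~10^-5 s *)
RepeatedTiming[timeSeriesInterpolation[bigts, 123456.];] (* ~10^-4 s *)
RepeatedTiming[bigts[123456.];]                          (* ~10^-3 s *)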
Why so much trouble? TimeSeries objects are not exactly like the examples the documentation presents... They typically have millions of records, and hence 10^-6 s is fundamental. If only I could get 10^-6 s with the gap analysis...
The addition of an option that allows a certain tolerance on the gap would also be welcome. That is, if the searched timestamp lies within a too-large gap but only a small delta away from a recorded timestamp, the function assumes the value of that recorded timestamp (this is important for records saved with different time precisions, which is common in field instrumentation, PLCs, etc.).
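As a sketch of what such a tolerance option could look like (the name timeSeriesInterpolationGapTol and the extra delta argument are hypothetical, not existing code), the gap test above would be preceded by a proximity check against the two neighbouring timestamps:

timeSeriesInterpolationGapTol[ts_, moment_, gap_, delta_] :=
 Module[{n0, n1, m, pos, times = ts["Times"], values = ts["Values"],
   timeMoment, gapTime},
  timeMoment = If[DateObjectQ[moment], AbsoluteTime[moment], moment];
  gapTime = If[QuantityQ[gap], QuantityMagnitude@UnitConvert[gap, "Seconds"], gap];
  Which[
   times[[1]] <= timeMoment < times[[-1]],
   n0 = 1; n1 = Length[times];
   While[n0 <= n1,
    m = Floor[(n0 + n1)/2];
    If[times[[m]] == timeMoment, Break[]];
    If[times[[m]] < timeMoment, n0 = m + 1, n1 = m - 1]];
   pos = Max[If[times[[m]] < timeMoment, m, m - 1], 1];
   Which[
    (* within delta of a recorded timestamp: return that sample even in a large gap *)
    Abs[timeMoment - times[[pos]]] <= delta, values[[pos]],
    Abs[timeMoment - times[[pos + 1]]] <= delta, values[[pos + 1]],
    times[[pos + 1]] - times[[pos]] <= gapTime,
    values[[pos]] + (values[[pos + 1]] - values[[pos]])/
       (times[[pos + 1]] - times[[pos]])*(timeMoment - times[[pos]]),
    True, Missing[]],
   timeMoment == times[[-1]], values[[-1]],
   True, Missing[]]]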
It is also interesting to notice that if a TimeSeries is supplied with units, the timeSeriesInterpolation function is one order of magnitude slower. This seems to be due to a huge overhead coming from units arithmetic... But at least it works, while everything based on "PathFunction" breaks in the presence of Quantity. The results of the following tests are quite telling:
RepeatedTiming[Quantity[1.0, "Seconds"]/10]
(*{0.000273, Quantity[0.1, "Seconds"]}*)
RepeatedTiming[Quantity[QuantityMagnitude[Quantity[1.0, "Seconds"]]/10,
QuantityUnit[Quantity[1.0, "Seconds"]]]]
(*{0.0000667, Quantity[0.1, "Seconds"]}*)
RepeatedTiming[Quantity[Quantity[1.0, "Seconds"][[1]]/10,
Quantity[1.0, "Seconds"][[2]]]]
(*{0.0000278, Quantity[0.1, "Seconds"]}*)
RepeatedTiming[1/10]
(*{8.0*10^-7, 1/10}*)
There seems to be some space for optimizations...
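One possible workaround, sketched under the assumption that all values of the series carry the same unit (quantityTs below is a hypothetical unit-valued series): strip the unit once, interpolate numerically, and reattach the unit at the end.

stripUnits[qts_] :=
 Module[{unit = QuantityUnit @ First @ qts["Values"]},
  {TimeSeries[Transpose[{qts["Times"], QuantityMagnitude @ qts["Values"]}]], unit}]

{plainTs, unit} = stripUnits[quantityTs]; (* quantityTs: hypothetical Quantity-valued series *)
Quantity[timeSeriesInterpolation[plainTs, 3.5], unit]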
I believe this can be done more efficiently and simply than either of the answers posted, by using two InterpolatingFunctions, as you described yourself. Here is a proof of concept for your evaluation.
makePF[ts_, gap_?NumericQ] :=
 Module[{t, d, gf, pf},
  t = ts["Times"];
  (* 1 where the step to the next sample is within the gap, 0 where it is too large *)
  d = UnitStep[gap - Differences @ t];
  (* zero-order interpolation of that indicator over the sample times *)
  gf = ListInterpolation[d, Rest@t, InterpolationOrder -> 0];
  (* pull out the interpolating object used by the path function *)
  pf = ts["PathFunction"][[2, 0]];
  (* replace the head of the value list by Missing[]; Extract part 0 (the head)
     where the indicator is 0, and part i (the value) where it is 1 *)
  Quiet @ Extract[
     Missing[] @@ pf[#],
     Partition[Unitize @ gf[#] * Range @ Length @ #, 1]
   ] &
 ]
Basic example:
ts = TimeSeries[{{1, 1}, {2, 2}, {4, 4}, {10, 4}, {11, 3}, {12, 2}, {13, 1}}];
makePF[ts, 3] @ Range[13]
{1, 2, 3, 4, Missing[], Missing[], Missing[], Missing[], Missing[], Missing[], 3, 2, 1}
Note that the function generated by makePF is listable, and that is how it should be applied.
Performance on a 300,000 element TimeSeries:
bigts = TimeSeries[
{Sort @ RandomReal[1*^6, 300000],
RandomReal[{-9, 9}, 300000]}\[Transpose]
];
AbsoluteTiming[bigfn = makePF[bigts, 5];] (* build my interpolating object *)
{0.715165, Null}
samples = Range[1, 1*^6, 100];
(timeSeriesInterpolationGap[bigts, #, 5] & /@ samples); // RepeatedTiming
(bigfn @ samples); // RepeatedTiming
{2.181, Null}
{0.0268, Null}
The output is not an exact match for yours, but if this method holds promise I imagine it could be adjusted.
Obviously, if only a few samples are needed your own method is superior, but I am assuming that a lot of time-intensive sampling would be performed.