Finding beginning of end of sequences in a list
TemporalData/TimeSeries + MissingDataMethod
You can replace non-numeric values with Missing[]
in testy
and use TemporalData
(or TimeSeries
) with the option MissingDataMethod
:
testy2 = Replace[testy, {x_?NumericQ :> x, _ :> Missing[]}, {1}];
td = TemporalData[testy2, {1},
MissingDataMethod -> {"Interpolation", InterpolationOrder -> 1}];
td["Values"]
{1.1, 2.4, 3.5, 2.5, 3., 3.5, 4., 4.5, 8.5, 7.5, 6.5, 5.5, 4.5, 6.5, 8.5}
MapAt[Highlighted, %, Position[testy, "xx"]]
Row[{ListLinePlot[testy, PlotLabel -> Style["testy", 16],
PlotMarkers -> {Graphics[{FaceForm[Blue], EdgeForm[], Disk[]}, ImageSize -> 12]},
ImageSize -> 300] ,
ListLinePlot[td, PlotLabel -> Style["td", 16],
PlotMarkers -> {Graphics[{FaceForm[Red], EdgeForm[], Disk[]}, ImageSize -> 12]},
ImageSize -> 300]}, Spacer[10]]
Further examples:
methods = {{"Interpolation", InterpolationOrder -> 1},
{"Interpolation", InterpolationOrder -> 2},
{"Interpolation", InterpolationOrder -> 0},
{"Interpolation", "HoldFrom" -> Left},
{"Interpolation", "HoldFrom" -> Right},
{"Constant", 7}};
Multicolumn[Labeled[ListLinePlot[{testy,
TemporalData[testy2, {1}, MissingDataMethod -> #]},
PlotStyle -> {Directive[AbsoluteThickness[5], Blue],
Directive[AbsoluteThickness[2], Red]},
PlotMarkers -> {Graphics[{FaceForm[Blue], EdgeForm[], Disk[]}, ImageSize -> 14],
Graphics[{FaceForm[{Opacity[1], Red}], EdgeForm[], Disk[]}, ImageSize -> 10]},
ImageSize -> 300],
Style[Column[{"MissingDataMethod -> ", #}, Alignment -> Center], 16],
Top] & /@ methods,
2, Appearance -> "Horizontal"]
TimeSeriesResample + ResamplingMethod
tss = TimeSeriesResample@
TemporalData[Cases[_Real]@testy, {Flatten@Position[testy, _Real]},
ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}];
tss["Values"] // MapAt[Highlighted, #, Position[testy, "xx"]] &
ReplaceRepeated
rule = {l___, a_Real, x : "xx" .., b_Real, r___} :>
Flatten[{l, Subdivide[a, b, 1 + Length@{x}], r}];
testy //. rule // MapAt[Highlighted, #, Position[testy, "xx"]] &
Interpolation
For large inputs, Interpolation
is faster than alternative methods. The following is a slightly faster version of Jean-Pierre's approach:
intF = Interpolation[Transpose[{Flatten@Position[#, _Real], Cases[_Real]@#}],
InterpolationOrder -> 1] /@ Range[Length @#] &;
intF @ testy // MapAt[Highlighted, #, Position[testy, "xx"]] &
Update: Timings
SeedRandom[1]
n = 1000;
tstx = RandomReal[100, n];
tstx = MapAt["xx" &, tstx, List /@ RandomSample[Range[2, n - 1], n/2]];
Timings for the six methods posted so far:
resa = Interpolation[Transpose[{Flatten@Position[tstx, _Real],
Cases[_Real]@tstx}], InterpolationOrder -> 1] /@ Range[Length[tstx]]; //
RepeatedTiming // First
0.0031
(p = Partition[Flatten@MapIndexed[{#2, #1} &, tstx], 2];
cp = Cases[p, {_Integer, _Real}];
ff = Interpolation[cp, InterpolationOrder -> 1];
resb = Map[ff[#[[1]]] &, p];) //
RepeatedTiming // First (* method from Jean-Pierre's answer *)
0.0046
resc = TemporalData[Replace[tstx, {x_?NumericQ :> x, _ :> Missing[]}, {1}], {1},
MissingDataMethod -> {"Interpolation",
InterpolationOrder -> 1}]["Values"]; // RepeatedTiming // First
0.0094
resd = TimeSeriesResample[TemporalData[Cases[_Real]@tstx,
{Flatten@Position[tstx, _Real]}, ResamplingMethod -> {"Interpolation",
InterpolationOrder -> 1}]]["Values"]; //
RepeatedTiming // First
0.0092
lerp[x_, y_, n_] := Interpolation[{{0, x}, {n + 1, y}},
InterpolationOrder -> 1][Range[1, n]]
rese = FixedPoint[SequenceReplace[{x_Real, xs : "xx" .., y_Real} :>
With[{n = Length@{xs}},
Sequence @@ Flatten[{x, lerp[x, y, n], y}]]], tstx]; //
RepeatedTiming // First (* method from Pillsy's answer *)
8.2
rule = {l___, a_Real, x : "xx" .., b_Real, r___} :>
Flatten[{l, Subdivide[a, b, 1 + Length@{x}], r}];
resf = tstx //. rule; // RepeatedTiming // First
330.
resa == resb == resc == resd == rese == resf
True
We almost want SequenceReplace
, but it doesn't quite work because of the way it handles overlapping patters. However, at the potential cost of some inefficiency, we can combine it with FixedPoint
to get the result we want.
First, a utility function to do the interpolation:
lerp[x_, y_, n_] :=
Interpolation[{{0, x}, {n + 1, y}}, InterpolationOrder -> 1][
Range[1, n]]
With that, it gives the result you want:
FixedPoint[
SequenceReplace[{x_Real, xs : "xx" .., y_Real} :>
With[{n = Length@{xs}},
Sequence @@ Flatten[{x, lerp[x, y, n], y}]]],
testy]
(* {1.1, 2.4, 3.5, 2.5, 3., 3.5, 4., 4.5, 8.5, 7.5, 6.5, 5.5, 4.5, 6.5, \
8.5} *)
EDIT
Unfortunately, with large data sets this is slow as all get-out. If you want speed, just use Interpolation
directly, as suggested by Jean-Pierre and klgr. However, you can speed this up even more by using the third argument of Pick
, instead of Cases
.
With[{n = 1000},
SeedRandom[1];
testx =
MapAt["xx" &, RandomReal[100, n],
List /@ RandomSample[Range[2, n - 1], n/2]]];
Interpolation[Transpose[{Flatten@Position[#, _Real], Cases[_Real]@#}],
InterpolationOrder -> 1] /@ Range[Length@#] &[
testx]; // RepeatedTiming
(* {0.00331371, Null} *)
Interpolation[
Pick[
Transpose@{Range@Length@testz, testx},
testz,
_Real],
InterpolationOrder -> 1]; // RepeatedTiming
(* {0.000958979, Null} *)
Given the size of the data set in question, a factor of 3 speed up is nothing to sneeze at.
testy = {1.1, 2.4, 3.5, 2.5, "xx", "xx", "xx", 4.5, 8.5, "xx", "xx",
"xx", 4.5, "xx", 8.5};
p = Partition[Flatten@MapIndexed[{#2, #1} &, testy], 2];
cp = Cases[p, {_Integer, _Real}];
ff = Interpolation[cp, InterpolationOrder -> 1];
Map[ff[#[[1]]] &, p]