Finding beginning of end of sequences in a list

TemporalData/TimeSeries + MissingDataMethod

You can replace non-numeric values with Missing[] in testy and use TemporalData (or TimeSeries) with the option MissingDataMethod:

testy2 = Replace[testy, {x_?NumericQ :> x, _ :> Missing[]}, {1}];

td = TemporalData[testy2, {1}, 
  MissingDataMethod -> {"Interpolation", InterpolationOrder -> 1}];

td["Values"]

{1.1, 2.4, 3.5, 2.5, 3., 3.5, 4., 4.5, 8.5, 7.5, 6.5, 5.5, 4.5, 6.5, 8.5}

MapAt[Highlighted, %, Position[testy, "xx"]]

enter image description here

Row[{ListLinePlot[testy, PlotLabel -> Style["testy", 16], 
   PlotMarkers -> {Graphics[{FaceForm[Blue], EdgeForm[], Disk[]}, ImageSize -> 12]}, 
   ImageSize -> 300] ,
  ListLinePlot[td, PlotLabel -> Style["td", 16], 
   PlotMarkers -> {Graphics[{FaceForm[Red], EdgeForm[], Disk[]}, ImageSize -> 12]}, 
   ImageSize -> 300]}, Spacer[10]]

enter image description here

Further examples:

methods = {{"Interpolation", InterpolationOrder -> 1}, 
   {"Interpolation", InterpolationOrder -> 2}, 
   {"Interpolation", InterpolationOrder -> 0}, 
   {"Interpolation", "HoldFrom" -> Left}, 
   {"Interpolation", "HoldFrom" -> Right},
   {"Constant", 7}};

Multicolumn[Labeled[ListLinePlot[{testy, 
       TemporalData[testy2, {1}, MissingDataMethod -> #]}, 
     PlotStyle -> {Directive[AbsoluteThickness[5], Blue], 
       Directive[AbsoluteThickness[2], Red]}, 
     PlotMarkers -> {Graphics[{FaceForm[Blue], EdgeForm[], Disk[]}, ImageSize -> 14], 
       Graphics[{FaceForm[{Opacity[1], Red}], EdgeForm[], Disk[]},  ImageSize -> 10]}, 
         ImageSize -> 300], 
    Style[Column[{"MissingDataMethod -> ", #}, Alignment -> Center], 16], 
   Top] & /@ methods,
  2,  Appearance -> "Horizontal"]

enter image description here

TimeSeriesResample + ResamplingMethod

tss = TimeSeriesResample@
   TemporalData[Cases[_Real]@testy, {Flatten@Position[testy, _Real]}, 
    ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}];

tss["Values"] // MapAt[Highlighted, #, Position[testy, "xx"]] &

enter image description here

ReplaceRepeated

rule = {l___, a_Real, x : "xx" .., b_Real, r___} :> 
   Flatten[{l, Subdivide[a, b, 1 + Length@{x}], r}];

testy //. rule // MapAt[Highlighted, #, Position[testy, "xx"]] &

enter image description here

Interpolation

For large inputs, Interpolation is faster than alternative methods. The following is a slightly faster version of Jean-Pierre's approach:

intF = Interpolation[Transpose[{Flatten@Position[#, _Real], Cases[_Real]@#}], 
     InterpolationOrder -> 1] /@ Range[Length @#] &;

intF @ testy // MapAt[Highlighted, #, Position[testy, "xx"]] &

enter image description here

Update: Timings

SeedRandom[1]
n = 1000;
tstx = RandomReal[100, n];
tstx = MapAt["xx" &, tstx, List /@ RandomSample[Range[2, n - 1], n/2]];

Timings for the six methods posted so far:

resa = Interpolation[Transpose[{Flatten@Position[tstx, _Real], 
    Cases[_Real]@tstx}], InterpolationOrder -> 1] /@ Range[Length[tstx]]; // 
  RepeatedTiming // First

 0.0031

(p = Partition[Flatten@MapIndexed[{#2, #1} &, tstx], 2];
   cp = Cases[p, {_Integer, _Real}];
   ff = Interpolation[cp, InterpolationOrder -> 1];
   resb = Map[ff[#[[1]]] &, p];) // 
     RepeatedTiming // First (* method from Jean-Pierre's answer *)

 0.0046

resc = TemporalData[Replace[tstx, {x_?NumericQ :> x, _ :> Missing[]}, {1}], {1}, 
      MissingDataMethod -> {"Interpolation", 
        InterpolationOrder -> 1}]["Values"]; // RepeatedTiming // First

 0.0094

resd = TimeSeriesResample[TemporalData[Cases[_Real]@tstx, 
    {Flatten@Position[tstx, _Real]}, ResamplingMethod -> {"Interpolation", 
     InterpolationOrder -> 1}]]["Values"]; // 
     RepeatedTiming // First

 0.0092

lerp[x_, y_, n_] := Interpolation[{{0, x}, {n + 1, y}}, 
   InterpolationOrder -> 1][Range[1, n]]

rese = FixedPoint[SequenceReplace[{x_Real, xs : "xx" .., y_Real} :> 
  With[{n = Length@{xs}}, 
    Sequence @@ Flatten[{x, lerp[x, y, n], y}]]], tstx]; // 
  RepeatedTiming // First (* method from Pillsy's answer *)

8.2

rule = {l___, a_Real, x : "xx" .., b_Real, r___} :> 
   Flatten[{l, Subdivide[a, b, 1 + Length@{x}], r}];

resf = tstx //. rule; // RepeatedTiming // First

 330.

resa == resb == resc == resd == rese == resf

 True

We almost want SequenceReplace, but it doesn't quite work because of the way it handles overlapping patters. However, at the potential cost of some inefficiency, we can combine it with FixedPoint to get the result we want.

First, a utility function to do the interpolation:

lerp[x_, y_, n_] :=
 Interpolation[{{0, x}, {n + 1, y}}, InterpolationOrder -> 1][
  Range[1, n]]

With that, it gives the result you want:

FixedPoint[
 SequenceReplace[{x_Real, xs : "xx" .., y_Real} :>
   With[{n = Length@{xs}}, 
    Sequence @@ Flatten[{x, lerp[x, y, n], y}]]],
 testy]
(* {1.1, 2.4, 3.5, 2.5, 3., 3.5, 4., 4.5, 8.5, 7.5, 6.5, 5.5, 4.5, 6.5, \
8.5} *)

EDIT

Unfortunately, with large data sets this is slow as all get-out. If you want speed, just use Interpolation directly, as suggested by Jean-Pierre and klgr. However, you can speed this up even more by using the third argument of Pick, instead of Cases.

With[{n = 1000},
  SeedRandom[1];
  testx = 
   MapAt["xx" &, RandomReal[100, n], 
    List /@ RandomSample[Range[2, n - 1], n/2]]];

Interpolation[Transpose[{Flatten@Position[#, _Real], Cases[_Real]@#}],
       InterpolationOrder -> 1] /@ Range[Length@#] &[
   testx]; // RepeatedTiming
(* {0.00331371, Null} *)

Interpolation[
   Pick[
    Transpose@{Range@Length@testz, testx},
    testz,
    _Real],
   InterpolationOrder -> 1]; // RepeatedTiming
(* {0.000958979, Null} *)

Given the size of the data set in question, a factor of 3 speed up is nothing to sneeze at.

testy = {1.1, 2.4, 3.5, 2.5, "xx", "xx", "xx", 4.5, 8.5, "xx", "xx", 
   "xx", 4.5, "xx", 8.5};
p = Partition[Flatten@MapIndexed[{#2, #1} &, testy], 2];
cp = Cases[p, {_Integer, _Real}];
ff = Interpolation[cp, InterpolationOrder -> 1];
Map[ff[#[[1]]] &, p]

Finding beginning of end of sequences in a list

TemporalData/TimeSeries + MissingDataMethod

TimeSeriesResample + ResamplingMethod

ReplaceRepeated

Interpolation

Update: Timings

EDIT

Tags:

Pattern Matching

List Manipulation

Replacement

Related

Recent Posts