The longest run of at most k different elements in a list
Edit: My first implementation, using ListCorrelate
, had unexpectedly poor computational complexity. Here is a second implementation of my original idea using a more basic procedural method to keep track of the counts. It performs significantly better and likely compilation would make it significantly faster still. It shall have to stand for now until I have more time.
g2[list_, n_, k_] :=
Module[{x, a, c},
x = ArrayComponents[list, 1, 0 -> 1];
a = ConstantArray[0, Max@x];
Do[a[[ x[[p]] ]]++, {p, n}];
c = {Tr @ UnitStep[a - 1]} ~Join~
Table[
a[[ x[[p]] ]]--; a[[ x[[p + n]] ]]++; Tr @ UnitStep[a - 1],
{p, Length@x - n - 1}
];
list[[# ;; # + n - 1]] & /@ Join @@ Position[UnitStep[k - c], 1]
]
My first idea is to try to avoid checking each and every partition of length n
, especially if n
may be large. To that end I would make sure that list elements are non-numeric and then use a sliding sum to tally values. Here is an implementation of g
following this idea.
g[list_, n_, k_] :=
Module[{x = ToString /@ list},
x = ListCorrelate[ConstantArray[1, n], x];
x = Replace[x, {p_Plus :> Length[p], _ -> 1}, {1}];
x = Join @@ Position[UnitStep[k - x], 1];
list[[# ;; # + n - 1]] & /@ x
]
list = {a, c, b, a, a, a, b, c, e, f, b, a, e, e, e, a};
g[list, 4, 2]
{{b, a, a, a}, {a, a, a, b}, {a, e, e, e}, {e, e, e, a}}
g[list, 5, 2]
{{b, a, a, a, b}, {a, e, e, e, a}}
g[list, 3, 1]
{{a, a, a}, {e, e, e}}
Note: Position
is used in the code above. In version 10 this is adequately fast. In earlier versions, e.g. 7, for maximum performance it may be preferable to replace that line with:
x = Join @@ SparseArray[UnitStep[k - x]]["NonzeroPositions"];
brute force..
getmax[list_, n_] := Module[{i,p={},lp},
i = n;
While[lp = p; Length[p = Select[ Partition[list, i, 1] ,
Length[Union[#]] <= n & ]] > 0 , ++i]; lp ];
list = {a, c, b, a, a, a, b, c, e, f, b, a, e, e, e, a}
getmax[list, 2]
{{b, a, a, a, b}, {a, e, e, e, a}}
list = RandomChoice[Range[10000], 500000];
getmax[list, 3]
{{229, 5652, 4858, 5652}, {5652, 4858, 5652, 6371}, {9287, 6906, 1022, 9287}, ...
If your list is not random and actually has long runs this is really going to bog down though.
Note is speeds it up a good bit to select only the first match:
Select[ Partition[list, i, 1] , Length[Union[#]] <= n & , 1 ]
edit faster version
an example of the requested size with a long run burried in the middle:
list = RandomChoice[Range[10000], 250000] ~Join~
RandomChoice[Range[10], 500]~Join~
RandomChoice[Range[10000], 250000];
getmax[list_, n_] := Module[{ipos= Range[Length[list]], i=n, last},
While[ Length@ipos > 0 ,
last = {i, ipos};
ipos = Select[ ipos , (# + i <
Length[list] && (Length[Union[ list[[ # ;; # + i ]] ]] <=
n) ) & ]; ++i ];
list[[last[[-1, 1]] ;; last[[-1, 1]] + last[[1]] - 1]] ]
getmax[list, 10] // Timing
{6.505242, {8, 4, 9, 2, 4, 9, 2, 5, 1, 8, 7, 7, 3, 1, 2, 9, 4, 2, 4, 7, 5, 3, 8, 10, 4, 5, 7, 4, 5, 3, 6, 1, 4, 5, 7, 6, 5, 5, 8, 8, 9, 4, 10, 10, 10, 2, 3, 10, 9, 6, 6, 7, 3, 5, 5, 2, 7, 9, 1, 1, 10, 7, 7, 8, 10, 5, 7, 4, 1, 1, 7, 9, 9, 9, 1 ... (* 500 elements *)
Pattern-based (also brute force)
f[k_] := MaximalBy[ReplaceList[list, {___, x__ /; CountDistinct@{x} <= k, ___} :> {x}],
Length]
f[1]
(* {{a, a, a}, {e, e, e}} *)
f[2]
(* {{b, a, a, a, b}, {a, e, e, e, a}} *)
f[3]
(* {{a, c, b, a, a, a, b, c}} *)
A more intelligent approach (take while the number of distinct elements is not greater then k
starting from each element)
f[k_] := MaximalBy[Table[Module[{x = {}},
TakeWhile[list[[i ;;]], (x = x ⋃ {#}; Length@x <= k) &]], {i, Length@list}], Length]
f[1]
(* {{a, a, a}, {e, e, e}} *)
f[2]
(* {{b, a, a, a, b}, {a, e, e, e, a}} *)
f[3]
(* {{a, c, b, a, a, a, b, c}} *)