Identify groups of varying continuous numbers in a list
The itertools
pairwise recipe is one way to solve the problem. Applied with itertools.groupby
, groups of pairs whose mathematical difference are equivalent can be created. The first and last items of each group are then selected for multi-item groups or the last item is selected for singleton groups:
from itertools import groupby, tee, izip
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = tee(iterable)
next(b, None)
return izip(a, b)
def grouper(lst):
result = []
for k, g in groupby(pairwise(lst), key=lambda x: x[1] - x[0]):
g = list(g)
if len(g) > 1:
try:
if g[0][0] == result[-1]:
del result[-1]
elif g[0][0] == result[-1][1]:
g = g[1:] # patch for duplicate start and/or end
except (IndexError, TypeError):
pass
result.append((g[0][0], g[-1][-1], k))
else:
result.append(g[0][-1]) if result else result.append(g[0])
return result
Trial: input -> grouper(lst) -> output
Input: [2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 20]
Output: [(2, 5, 1), (12, 17, 1), 20]
Input: [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
Output: [(2, 8, 2), (12, 17, 1), 20]
Input: [2, 4, 6, 8, 12, 12.4, 12.9, 13, 14, 15, 16, 17, 20]
Output: [(2, 8, 2), 12, 12.4, 12.9, (13, 17, 1), 20] # 12 does not appear in the second group
Update: (patch for duplicate start and/or end values)
s1 = [i + 10 for i in xrange(0, 11, 2)]; s2 = [30]; s3 = [i + 40 for i in xrange(45)]
Input: s1+s2+s3
Output: [(10, 20, 2), (30, 40, 10), (41, 84, 1)]
# to make 30 appear as an entry instead of a group change main if condition to len(g) > 2
Input: s1+s2+s3
Output: [(10, 20, 2), 30, (41, 84, 1)]
Input: [2, 4, 6, 8, 10, 12, 13, 14, 15, 16, 17, 20]
Output: [(2, 12, 2), (13, 17, 1), 20]
Here is a quickly written (and extremely ugly) answer:
def test(inArr):
arr=inArr[:] #copy, unnecessary if we use index in a smart way
result = []
while len(arr)>1: #as long as there can be an arithmetic progression
x=[arr[0],arr[1]] #take first two
arr=arr[2:] #remove from array
step=x[1]-x[0]
while len(arr)>0 and x[1]+step==arr[0]: #check if the next value in array is part of progression too
x[1]+=step #add it
arr=arr[1:]
result.append((x[0],x[1],step)) #append progression to result
if len(arr)==1:
result.append(arr[0])
return result
print test([2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20])
This returns [(2, 8, 2), (12, 17, 1), 20]
Slow, as it copies a list and removes elements from it
It only finds complete progressions, and only in sorted arrays.
In short, it is shitty, but should work ;)
There are other (cooler, more pythonic) ways to do this, for example you could convert your list to a set, keep removing two elements, calculate their arithmetic progression and intersect with the set.
You could also reuse the answer you provided to check for certain step sizes. e.g.:
ranges = []
step_size=2
for key, group in groupby(enumerate(data), lambda (index, item): step_size*index - item):
group = map(itemgetter(1), group)
if len(group) > 1:
ranges.append(xrange(group[0], group[-1]))
else:
ranges.append(group[0])
Which finds every group with step size of 2, but only those.
You can create an iterator to help grouping and try to pull the next element from the following group which will be the end of the previous group:
def ranges(lst):
it = iter(lst)
next(it) # move to second element for comparison
grps = groupby(lst, key=lambda x: (x - next(it, -float("inf"))))
for k, v in grps:
i = next(v)
try:
step = next(v) - i # catches single element v or gives us a step
nxt = list(next(grps)[1])
yield xrange(i, nxt.pop(0), step)
# outliers or another group
if nxt:
yield nxt[0] if len(nxt) == 1 else xrange(nxt[0], next(next(grps)[1]), nxt[1] - nxt[0])
except StopIteration:
yield i # no seq
which give you:
In [2]: l1 = [2, 3, 4, 5, 8, 10, 12, 14, 13, 14, 15, 16, 17, 20, 21]
In [3]: l2 = [2, 4, 6, 8, 12, 13, 14, 15, 16, 17, 20]
In [4]: l3 = [13, 14, 15, 16, 17, 18]
In [5]: s1 = [i + 10 for i in xrange(0, 11, 2)]
In [6]: s2 = [30]
In [7]: s3 = [i + 40 for i in xrange(45)]
In [8]: l4 = s1 + s2 + s3
In [9]: l5 = [1, 2, 5, 6, 9, 10]
In [10]: l6 = {1, 2, 3, 5, 6, 9, 10, 13, 19, 21, 22, 23, 24}
In [11]:
In [11]: for l in (l1, l2, l3, l4, l5, l6):
....: print(list(ranges(l)))
....:
[xrange(2, 5), xrange(8, 14, 2), xrange(13, 17), 20, 21]
[xrange(2, 8, 2), xrange(12, 17), 20]
[xrange(13, 18)]
[xrange(10, 20, 2), 30, xrange(40, 84)]
[1, 2, 5, 6, 9, 10]
[xrange(1, 3), 5, 6, 9, 10, 13, 19, xrange(21, 24)]
When the step is 1
it is not included in the xrange output.