How to calculate the size of blocks of values in a list?

Try diff and cumsum to label each block of consecutive equal values, then transform('count') to give every element the size of its block:

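import pandas as pd
s = pd.Series(list_1)
# a new block starts wherever the value changes; cumsum turns those starts into block ids
s.groupby(s.diff().ne(0).cumsum()).transform('count')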
Out[91]: 
0     1
1     2
2     2
3     3
4     3
5     3
6     4
7     4
8     4
9     4
10    1
11    1
dtype: int64
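The one-liner above can be easier to follow when split into its intermediate steps. The following is a minimal sketch, assuming the sample list shown in the timings section; the names starts, block_id and sizes are chosen here for illustration only.

```python
import pandas as pd

list_1 = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]  # sample input from the question
s = pd.Series(list_1)

# True wherever the value differs from the previous one. The first row is
# NaN after diff(), and NaN != 0 is True, so it also counts as a block start.
starts = s.diff().ne(0)

# A running total of block starts gives each row a block id: 1, 2, 2, 3, 3, 3, ...
block_id = starts.cumsum()

# Replace every row with the number of rows that share its block id.
sizes = s.groupby(block_id).transform('count')

print(block_id.tolist())  # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 6]
print(sizes.tolist())     # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 1, 1]
```

Note that diff() relies on subtraction, so this form assumes numeric data. For non-numeric values, comparing against the shifted series with s.ne(s.shift()) is a common substitute for s.diff().ne(0).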

The NumPy way:

In [15]: a = np.array(list_1)

In [16]: c = np.diff(np.flatnonzero(np.r_[True,a[:-1] != a[1:],True]))

In [17]: np.repeat(c,c)
Out[17]: array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 1, 1])
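To see why this works, the expression can be unpacked one step at a time. This is a sketch under the assumption that list_1 is the sample list from the question; boundaries and edges are illustrative names, not part of the original answer.

```python
import numpy as np

list_1 = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]  # sample input from the question
a = np.array(list_1)

# a[:-1] != a[1:] is True at each position where the value changes.
# Padding with True on both ends marks the start of the first block and
# the position just past the end of the last block.
boundaries = np.r_[True, a[:-1] != a[1:], True]

# Indices of those boundaries.
edges = np.flatnonzero(boundaries)
print(edges.tolist())  # [0, 1, 3, 6, 10, 11, 12]

# The distance between consecutive boundaries is the block length.
c = np.diff(edges)
print(c.tolist())  # [1, 2, 3, 4, 1, 1]

# Repeat each block length once per element of its block.
result = np.repeat(c, c)
print(result.tolist())  # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 1, 1]
```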

Timings on a 10,000x tiled version of the given sample:

In [45]: list_1
Out[45]: [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]

In [46]: list_1 = np.tile(list_1,10000).tolist()

# Itertools groupby way :
In [47]: %%timeit
    ...: result = []
    ...: for k, v in groupby(list_1):
    ...:     length = len(list(v))
    ...:     result.extend([length] * length)
28.7 ms ± 435 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Pandas way :
In [48]: %%timeit
    ...: s = pd.Series(list_1)
    ...: s.groupby(s.diff().ne(0).cumsum()).transform('count')
28.3 ms ± 324 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# NumPy way :
In [49]: %%timeit
    ...: a = np.array(list_1)
    ...: c = np.diff(np.flatnonzero(np.r_[True,a[:-1] != a[1:],True]))
    ...: np.repeat(c,c)
8.16 ms ± 76.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
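The timings are only comparable if all three approaches return the same result. Below is a small sketch that checks this and times each approach with the standard timeit module outside IPython. The function names sizes_itertools, sizes_pandas and sizes_numpy are hypothetical wrappers around the snippets above, and the tile count is reduced to keep the run short, so the absolute numbers will differ from those reported.

```python
import timeit
from itertools import groupby

import numpy as np
import pandas as pd


def sizes_itertools(values):
    result = []
    for _, group in groupby(values):
        length = len(list(group))
        result.extend([length] * length)
    return result


def sizes_pandas(values):
    s = pd.Series(values)
    return s.groupby(s.diff().ne(0).cumsum()).transform('count').tolist()


def sizes_numpy(values):
    a = np.array(values)
    c = np.diff(np.flatnonzero(np.r_[True, a[:-1] != a[1:], True]))
    return np.repeat(c, c).tolist()


sample = [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
tiled = np.tile(sample, 100).tolist()  # smaller than the 10,000x used above

# All three approaches should agree before their speed is compared.
expected = sizes_itertools(tiled)
assert sizes_pandas(tiled) == expected
assert sizes_numpy(tiled) == expected

for func in (sizes_itertools, sizes_pandas, sizes_numpy):
    seconds = timeit.timeit(lambda: func(tiled), number=5)
    print(f"{func.__name__}: {seconds / 5 * 1e3:.3f} ms per call")
```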

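You can use groupby from itertools (from itertools import groupby). Calling groupby(list_1) yields one (key, group) pair per run of consecutive equal values, which looks like this once each group is materialized: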

>>> [(k, list(v)) for k, v in groupby(list_1)]
[(0, [0]), (1, [1, 1]), (0, [0, 0, 0]), (1, [1, 1, 1, 1]), (0, [0]), (1, [1])]

Then iterate over the groups and, for each one, append its length to the result as many times as the group has elements:

from itertools import groupby

result = []
for k, v in groupby(list_1):
    length = len(list(v))  # size of the current block
    result.extend([length] * length)  # repeat the size once per element of the block

print(result)  # [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 1, 1]
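The same loop can be wrapped in a reusable function. This is a minimal sketch; the name block_sizes is hypothetical, and it counts each group with a generator expression instead of building a temporary list. Because groupby only compares neighbouring elements for equality, the function is not limited to numbers.

```python
from itertools import groupby


def block_sizes(values):
    """Return, for each element, the length of the run of equal values it belongs to."""
    result = []
    for _, group in groupby(values):
        length = sum(1 for _ in group)  # count the run without building a list
        result.extend([length] * length)
    return result


print(block_sizes([0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]))
# [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 1, 1]
print(block_sizes(list("aabccc")))  # [2, 2, 1, 3, 3, 3]
print(block_sizes([]))  # []
```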
