Fast way to find length and start index of repeated elements in array
Here is a pedestrian attempt that solves the problem by programming it directly. We prepend and append a zero to A, getting a vector ZA, then detect the islands of 1s and the islands of 0s, which alternate in ZA, by comparing the shifted versions ZA[1:] and ZA[:-1]. (In the constructed arrays we take the even places, which correspond to the ones in A.)
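import numpy as np

def structure(A):
    # Pad with zeros so that every island of ones has a start and an end.
    ZA = np.concatenate(([0], A, [0]))
    # Positions (in A) where the value changes, i.e. where a new island begins.
    indices = np.flatnonzero(ZA[1:] != ZA[:-1])
    counts = indices[1:] - indices[:-1]
    # Even entries belong to the islands of ones.
    return indices[::2], counts[::2]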
Some sample runs:
In [71]: structure(np.array( [0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0] ))
Out[71]: (array([ 2, 6, 10]), array([3, 2, 1]))
In [72]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1] ))
Out[72]: (array([ 0, 5, 9, 13, 15]), array([3, 3, 2, 1, 1]))
In [73]: structure(np.array( [1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0] ))
Out[73]: (array([0, 5, 9]), array([3, 3, 2]))
In [74]: structure(np.array( [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1] ))
Out[74]: (array([ 0, 2, 5, 7, 11, 14]), array([1, 2, 1, 3, 2, 3]))
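As a sanity check, the function can be compared against a slow plain-Python reference on random inputs. This is only a sketch: runs_reference is a hypothetical helper introduced here (it is not part of the answer above), and it assumes A contains only 0s and 1s.

```python
from itertools import groupby

import numpy as np


def structure(A):
    # Same function as above, repeated so this block runs on its own.
    ZA = np.concatenate(([0], A, [0]))
    indices = np.flatnonzero(ZA[1:] != ZA[:-1])
    counts = indices[1:] - indices[:-1]
    return indices[::2], counts[::2]


def runs_reference(A):
    """Hypothetical reference: walk the runs one by one with groupby."""
    starts, lengths = [], []
    pos = 0
    for value, group in groupby(A.tolist()):
        n = len(list(group))
        if value == 1:
            starts.append(pos)
            lengths.append(n)
        pos += n
    return starts, lengths


# Compare both implementations on random 0/1 arrays, including empty ones.
rng = np.random.default_rng(0)
for _ in range(200):
    size = int(rng.integers(0, 30))
    A = rng.integers(0, 2, size=size)
    starts, counts = structure(A)
    ref_starts, ref_counts = runs_reference(A)
    assert starts.tolist() == ref_starts
    assert counts.tolist() == ref_counts
```

The reference loops in Python, so it is only useful for testing, not as a fast alternative.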
Let's try np.unique:
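# Label each run of ones by the number of zeros before it; the +1 keeps a leading run of ones off label 0.
vals, idx, counts = np.unique((np.cumsum(1-A)+1)*A, return_index=True, return_counts=True)
# your expected output (label 0 collects the zeros, so drop it):
idx[vals > 0], counts[vals > 0]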
Output:
(array([ 2, 6, 10]), array([3, 2, 1]))
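To see why this works, it may help to look at the intermediate label array. The sketch below assumes A is the 0/1 sample array from the question. The +1 offset and the vals > 0 filter are additions made here so that the zeros (label 0) never share a label with a run of ones.

```python
import numpy as np

A = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0])

# Number of zeros seen so far; this is constant inside each run of ones.
zeros_seen = np.cumsum(1 - A)

# Multiplying by A zeroes out the positions that are not ones.
# The +1 offset (an assumption added here) keeps a leading run of ones
# from getting label 0, which is reserved for the zeros.
labels = (zeros_seen + 1) * A
# labels is [0, 0, 3, 3, 3, 0, 4, 4, 0, 0, 6, 0]

# np.unique returns, per distinct label, its first position and its count.
vals, idx, counts = np.unique(labels, return_index=True, return_counts=True)

# Keep only the labels that belong to runs of ones.
keep = vals > 0
starts, lengths = idx[keep], counts[keep]
print(starts, lengths)
```

Each run of ones gets its own label, so the first index of a label is the start of the run and its count is the run length.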
You can use the fact that the indices of the 1s provide all the information you need. It is enough to find the starts and ends of the runs of 1s.
A = np.concatenate(([0], A, [0])) # get rid of some edge cases
diff = np.argwhere((A[:-1] + A[1:]) == 1).ravel()
starts = diff[::2]
ends = diff[1::2]
print(starts, ends - starts)