Restart cumsum and get index if cumsum more than value

A fun method

sumlm = np.frompyfunc(lambda a,b:a+b if a < 10 else b,2,1)
newx=sumlm.accumulate(x, dtype=np.object)
newx
array([1, 3, 4, 7, 10, 2, 3, 8, 9, 10], dtype=object)
np.nonzero(newx==10)

(array([4, 9]),)

Here's one with numba and array-initialization -

from numba import njit

@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i,x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0
    return iterID

def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x),dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]

Timings

Including @piRSquared's solutions and using the benchmarking setup from the same post -

In [58]: np.random.seed([3, 1415])
    ...: x = np.random.randint(100, size=1000000).tolist()

# @piRSquared soln1
In [59]: %timeit list(cumsum_breach(x, 10))
10 loops, best of 3: 73.2 ms per loop

# @piRSquared soln2
In [60]: %timeit cumsum_breach_numba(np.asarray(x), 10)
10 loops, best of 3: 69.2 ms per loop

# From this post
In [61]: %timeit cumsum_breach_array_init(x, 10)
10 loops, best of 3: 39.1 ms per loop

Numba : Appending vs. array-initialization

For a closer look at how the array-initialization helps, which seems be the big difference between the two numba implementations, let's time these on the array data, as the array data creation was in itself heavy on runtime and they both depend on it -

In [62]: x = np.array(x)

In [63]: %timeit cumsum_breach_numba(x, 10)# with appending
10 loops, best of 3: 31.5 ms per loop

In [64]: %timeit cumsum_breach_array_init(x, 10)
1000 loops, best of 3: 1.8 ms per loop

To force the output to have it own memory space, we can make a copy. Won't change the things in a big way though -

In [65]: %timeit cumsum_breach_array_init(x, 10).copy()
100 loops, best of 3: 2.67 ms per loop

Restart cumsum and get index if cumsum more than value

Tags:

Python

Pandas

Numpy

Related

Recent Posts