Most efficient way of adding elements given the index list in numpy

I doubt you can get much faster than np.bincount - and notice how the official documentation provides this exact usecase

Click to copy

# Your example
A = [0.5, 0.6]
D = [[0.1, 0.1, 0.2], [0.2, 0.4, 0.1]]
I = [[0, 1, 0], [0, 1, 1]]

# Solution
import numpy as np    
D, I = np.array(D).flatten(), np.array(I).flatten()
print(np.bincount(I, D)) #[0.5 0.6]

The shape of I and D doesn't matter: you can clearly ravel the arrays without changing the outcome:

Click to copy

index = np.ravel(I)
data = np.ravel(D)

Now you can sort both arrays according to I:

Click to copy

sorter = np.argsort(index)
index = index[sorter]
data = data[sorter]

This is helpful because now index looks like this:

Click to copy

0, 0, 0, 1, 1, 1

And data is this:

Click to copy

0.1, 0.2, 0.2, 0.1, 0.4, 0.1

Adding together runs of consecutive numbers should be easier than processing random locations. Let's start by finding the indices where the runs start:

Click to copy

runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]

Now you can use the fact that ufuncs like np.add have a partial reduce operation called reduceat. This allows you to sum regions of an array:

Click to copy

a = np.add.reduceat(data, runs)

If I is guaranteed to contain all indices in [0, A.size) at least once, you're done: just assign to A instead of a. If not, you can make the mapping using the fact that the start of each run in index is the target index:

Click to copy

A = np.zeros(n)
A[index[runs]] = a

Algorithmic complexity analysis:

ravel is O(1) in time and space if the data is in an array. If it's a list, this is O(MN) in time and space
argsort is O(MN log MN) in time and O(MN) in space
Indexing by sorter is O(MN) in time and space
Computing runs is O(MN) in time and O(MN + M) = O(MN) in space
reduceat is a single pass: O(MN) in time, O(M) in space
Reassigning A is O(M) in time and space

Total: O(MN log MN) time, O(MN) space

TL;DR

Click to copy

def make_A(D, I, M):
    index = np.ravel(I)
    data = np.ravel(D)
    sorter = np.argsort(index)
    index = index[sorter]

    if index[0] < 0 or index[-1] >= M:
        raise ValueError('Bad indices')

    data = data[sorter]
    runs = np.r_[0, np.flatnonzero(np.diff(index)) + 1]
    a = np.add.reduceat(data, runs)
    if a.size == M:
        return a
    A = np.zeros(M)
    A[index[runs]] = a
    return A

Most efficient way of adding elements given the index list in numpy

Tags:

Python

Numpy

Add

Related

Recent Posts