What is the fastest way to map group names of numpy array to indices?
You could use Cython:
%%cython -c-O3 -c-march=native -a
#cython: language_level=3, boundscheck=False, wraparound=False, initializedcheck=False, cdivision=True, infer_types=True
import math
import cython as cy
cimport numpy as cnp
cpdef groupby_index_dict_cy(cnp.int32_t[:, :] arr):
    cdef cy.size_t size = len(arr)
    result = {}
    for i in range(size):
        key = arr[i, 0], arr[i, 1], arr[i, 2]
        if key in result:
            result[key].append(i)
        else:
            result[key] = [i]
    return result
However, it will not be faster than what Pandas does, although it is the fastest after that (and perhaps after the numpy_index based solution), and it does not come with the associated memory penalty.
A collection of what has been proposed so far is here.
On OP's machine, that should get the execution time down to roughly 12 seconds.
You might just iterate and add the index of each element to the corresponding list.
from collections import defaultdict
res = defaultdict(list)
for idx, elem in enumerate(cubes):
    # res[tuple(elem)].append(idx)
    res[elem.tobytes()].append(idx)
Runtime can be further improved by using tobytes() instead of converting the key to a tuple.
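As a small illustration of the loop above, the following sketch groups a tiny made-up array by row bytes and then decodes the byte keys back into tuples. The sample data and the helper name `group_rows_by_bytes` are assumptions for illustration only, not part of the original answer.

```python
from collections import defaultdict

import numpy as np


def group_rows_by_bytes(arr):
    """Map each distinct row (as raw bytes) to the list of its row indices."""
    res = defaultdict(list)
    for idx, row in enumerate(arr):
        res[row.tobytes()].append(idx)
    return res


# Hypothetical sample data: rows 0 and 2 are equal, rows 1 and 3 are equal.
cubes = np.array([[1, 2, 3], [4, 5, 6], [1, 2, 3], [4, 5, 6]], dtype=np.int32)
groups = group_rows_by_bytes(cubes)

# Byte keys are compact but opaque; decode them with the array's dtype if needed.
decoded = {
    tuple(np.frombuffer(key, dtype=cubes.dtype).tolist()): idxs
    for key, idxs in groups.items()
}
print(decoded)
```

Decoding is only needed when the keys have to be human-readable; lookups can use `row.tobytes()` directly.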
Constant number of indices per group
Approach #1
We can perform dimensionality-reduction to reduce cubes to a 1D array. This is based on a mapping of the given cubes data onto an n-dim grid to compute the linear-index equivalents, discussed in detail here. Then, based on the uniqueness of those linear indices, we can segregate unique groups and their corresponding indices. Hence, following those strategies, we would have one solution, like so -
import numpy as np

N = 4  # number of indices per group
c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
sidx = c1D.argsort()
indices = sidx.reshape(-1,N)
unq_groups = cubes[indices[:,0]]
# If you need in a zipped dictionary format
out = dict(zip(map(tuple,unq_groups), indices))
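To make the steps above concrete, here is a minimal worked sketch on made-up data in which every distinct row occurs exactly N = 2 times. The sample array is hypothetical, and the stable sort is a choice made here so that the indices within each group come out in ascending order; the snippet above uses the default `argsort`.

```python
import numpy as np

# Hypothetical sample data: 3 distinct rows, each occurring exactly N = 2 times.
cubes = np.array([[0, 1, 2], [3, 0, 1], [0, 1, 2], [1, 1, 1], [3, 0, 1], [1, 1, 1]])
N = 2

# Collapse each row to one linear index on a grid just large enough for the data.
c1D = np.ravel_multi_index(cubes.T, cubes.max(0) + 1)

# Sorting the linear indices places equal rows next to each other.
sidx = c1D.argsort(kind="stable")
indices = sidx.reshape(-1, N)       # one row of N original indices per group
unq_groups = cubes[indices[:, 0]]   # one representative row per group

out = {tuple(g.tolist()): idx.tolist() for g, idx in zip(unq_groups, indices)}
print(out)
```

The reshape only works because every group has the same size; otherwise the number of rows is not a multiple of N or the groups straddle row boundaries.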
Alternative #1 : If the integer values in cubes are too large, we might want to do the dimensionality-reduction such that the dimensions with shorter extent are chosen as the primary axes. Hence, for those cases, we can modify the reduction step to get c1D, like so -
s1,s2 = cubes[:,:2].max(0)+1
s = np.r_[s2,1,s1*s2]
c1D = cubes.dot(s)
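A quick sanity check of this weighting, on made-up data with a large range in the last column, might look like the sketch below. The sample array is an assumption for illustration; the check simply confirms that equal rows share a linear index and distinct rows do not.

```python
import numpy as np

# Hypothetical sample data with a large range in the last column.
cubes = np.array([[0, 1, 900], [2, 0, 5], [0, 1, 900], [2, 1, 5], [1, 0, 0]])

s1, s2 = cubes[:, :2].max(0) + 1   # extents of columns 0 and 1
s = np.r_[s2, 1, s1 * s2]          # weights: column 1 varies fastest, column 2 slowest
c1D = cubes.dot(s)

# The number of distinct rows should match the number of distinct linear indices.
n_unique_rows = len(np.unique(cubes, axis=0))
n_unique_ids = len(np.unique(c1D))
print(c1D, n_unique_rows, n_unique_ids)
```

Note that the extent of the third column never enters the weights, so only the two smaller extents are multiplied together.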
Approach #2
Next up, we can use a Cython-powered kd-tree for quick nearest-neighbor lookup to get the nearest neighbouring indices and hence solve our case like so -
from scipy.spatial import cKDTree
idx = cKDTree(cubes).query(cubes, k=N)[1] # N = 4 as discussed earlier
I = idx[:,0].argsort().reshape(-1,N)[:,0]
unq_groups,indices = cubes[I],idx[I]
Generic case : Variable number of indices per group
We will extend the argsort based method with some splitting to get our desired output, like so -
c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
sidx = c1D.argsort()
c1Ds = c1D[sidx]
split_idx = np.flatnonzero(np.r_[True,c1Ds[:-1]!=c1Ds[1:],True])
grps = cubes[sidx[split_idx[:-1]]]
indices = [sidx[i:j] for (i,j) in zip(split_idx[:-1],split_idx[1:])]
# If needed as dict o/p
out = dict(zip(map(tuple,grps), indices))
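The splitting step can be traced on a small made-up array with groups of different sizes, as in the sketch below. The sample data is hypothetical, and the stable sort is an assumption added here only to make the order of indices within each group reproducible.

```python
import numpy as np

# Hypothetical sample data with group sizes 3, 2 and 1.
cubes = np.array([[1, 0, 0], [0, 2, 1], [1, 0, 0], [2, 2, 2], [1, 0, 0], [0, 2, 1]])

c1D = np.ravel_multi_index(cubes.T, cubes.max(0) + 1)
sidx = c1D.argsort(kind="stable")
c1Ds = c1D[sidx]

# Positions where the sorted linear index changes, plus both ends of the array.
split_idx = np.flatnonzero(np.r_[True, c1Ds[:-1] != c1Ds[1:], True])
grps = cubes[sidx[split_idx[:-1]]]
indices = [sidx[i:j] for i, j in zip(split_idx[:-1], split_idx[1:])]

out = {tuple(g.tolist()): idx.tolist() for g, idx in zip(grps, indices)}
print(out)
```

Each consecutive pair in `split_idx` delimits one run of equal values in the sorted array, which is why the list needs the leading and trailing `True`.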
Using 1D versions of groups of cubes as keys
We will extend the earlier listed method by using the 1D (linear-index) versions of the groups of cubes as keys, which simplifies dictionary creation and also makes it more efficient, like so -
def numpy1(cubes):
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
    sidx = c1D.argsort()
    c1Ds = c1D[sidx]
    mask = np.r_[True,c1Ds[:-1]!=c1Ds[1:],True]
    split_idx = np.flatnonzero(mask)
    indices = [sidx[i:j] for (i,j) in zip(split_idx[:-1],split_idx[1:])]
    out = dict(zip(c1Ds[mask[:-1]],indices))
    return out
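With 1D keys, the original coordinates are no longer visible in the dictionary. If they are needed later, one possible way to recover them is `np.unravel_index` with the same grid shape that was used for the reduction. The sketch below uses made-up data and is an illustration only, not part of the original answer.

```python
import numpy as np

cubes = np.array([[1, 0, 0], [0, 2, 1], [1, 0, 0], [2, 2, 2]])  # hypothetical data
dims = tuple((cubes.max(0) + 1).tolist())  # grid shape used for the reduction

c1D = np.ravel_multi_index(cubes.T, dims)
keys = np.unique(c1D)  # the 1D keys that would end up in the dictionary

# Invert the mapping: unravel_index returns one coordinate array per column.
coords = np.unravel_index(keys, dims)
decoded = [tuple(int(c) for c in row) for row in zip(*coords)]
print(dict(zip(keys.tolist(), decoded)))
```

This only works if `dims` is kept around, since the linear index by itself does not carry the grid shape.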
Next up, we will make use of the numba package to iterate and get to the final hashable dictionary output. Going with it, there would be two solutions: one that gets the keys and values separately using numba, with the main calling function zipping them and converting to a dict, and another that creates a numba-supported dict type, so that no extra work is required by the main calling function.
Thus, we would have the first numba solution :
from numba import njit
@njit
def _numba1(sidx, c1D):
    out = []
    n = len(sidx)
    start = 0
    grpID = []
    for i in range(1,n):
        if c1D[sidx[i]]!=c1D[sidx[i-1]]:
            out.append(sidx[start:i])
            grpID.append(c1D[sidx[start]])
            start = i
    out.append(sidx[start:])
    grpID.append(c1D[sidx[start]])
    return grpID,out

def numba1(cubes):
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
    sidx = c1D.argsort()
    out = dict(zip(*_numba1(sidx, c1D)))
    return out
And the second numba solution :
from numba import types
from numba.typed import Dict
int_array = types.int64[:]
@njit
def _numba2(sidx, c1D):
    n = len(sidx)
    start = 0
    outt = Dict.empty(
        key_type=types.int64,
        value_type=int_array,
    )
    for i in range(1,n):
        if c1D[sidx[i]]!=c1D[sidx[i-1]]:
            outt[c1D[sidx[start]]] = sidx[start:i]
            start = i
    outt[c1D[sidx[start]]] = sidx[start:]
    return outt

def numba2(cubes):
    c1D = np.ravel_multi_index(cubes.T, cubes.max(0)+1)
    sidx = c1D.argsort()
    out = _numba2(sidx, c1D)
    return out
Timings with cubes.npz data -
In [4]: cubes = np.load('cubes.npz')['array']
In [5]: %timeit numpy1(cubes)
...: %timeit numba1(cubes)
...: %timeit numba2(cubes)
2.38 s ± 14.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.13 s ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.8 s ± 5.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Alternative #1 : We can achieve further speedup with numexpr for large arrays to compute c1D, like so -
import numexpr as ne
s0,s1 = cubes[:,0].max()+1,cubes[:,1].max()+1
d = {'s0':s0,'s1':s1,'c0':cubes[:,0],'c1':cubes[:,1],'c2':cubes[:,2]}
c1D = ne.evaluate('c0+c1*s0+c2*s0*s1',d)
This would be applicable at all places that require c1D.
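The expression passed to numexpr is a column-major linearization, so its values differ from the row-major ones that `np.ravel_multi_index` produces by default, although both assign one distinct index per distinct row. The sketch below, on made-up data and with plain NumPy standing in for numexpr, checks the formula against `np.ravel_multi_index(..., order="F")`; it is an illustration only.

```python
import numpy as np

cubes = np.array([[1, 0, 0], [0, 2, 1], [1, 0, 0], [2, 2, 2]])  # hypothetical data
c0, c1, c2 = cubes.T
s0, s1, s2 = (cubes.max(0) + 1).tolist()

# Same arithmetic as the numexpr expression, evaluated with plain NumPy.
c1D_formula = c0 + c1 * s0 + c2 * s0 * s1

# Column-major linearization via NumPy's own helper.
c1D_fortran = np.ravel_multi_index(cubes.T, (s0, s1, s2), order="F")

print(c1D_formula, c1D_fortran)
```

Since the grouping only depends on which rows share an index, either ordering can feed the argsort-based methods above.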