Efficiently replace elements in array based on dictionary - NumPy / Python
Given that you're using NumPy arrays, I'd suggest doing the mapping with NumPy too. Here's a vectorized approach using np.select:
mapping = {1:2, 5:3, 8:6}
keys, choices = list(zip(*mapping.items()))
# [(1, 5, 8), (2, 3, 6)]
# we can use broadcasting to obtain a 3x100x100
# array to use as condlist
conds = np.array(keys)[:,None,None] == input_array
# use conds as arrays of conditions and the values
# as choices
np.select(conds, choices)
array([[2, 2, 2, ..., 0, 0, 0],
[2, 2, 2, ..., 0, 0, 0],
[2, 2, 2, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
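To make the behaviour for unmapped values concrete, here is a minimal sketch on a small array. The 2x3 input and the names zero_filled and kept are assumptions for illustration, not part of the original question. It shows that elements without a key fall back to np.select's default of 0, and that passing the input itself as default keeps them unchanged instead.

```python
import numpy as np

mapping = {1: 2, 5: 3, 8: 6}
# Hypothetical 2x3 input; 7 has no entry in the mapping.
input_array = np.array([[1, 5, 8],
                        [7, 1, 5]])

keys, choices = zip(*mapping.items())
# Shape (3, 2, 3): one boolean mask per key.
conds = np.array(keys)[:, None, None] == input_array

# Unmatched elements fall back to np.select's default of 0 ...
zero_filled = np.select(conds, choices)
# ... or keep their original value when the input is passed as default.
kept = np.select(conds, choices, default=input_array)
```

Note that conds holds one full-size mask per key, so memory use grows with the number of keys.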
Approach #1 : Loopy one with array data
One approach is to extract the keys and values into arrays and then use a similar loop:
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))
out = np.zeros_like(input_array)
for key, val in zip(k, v):
    out[input_array == key] = val
The benefit of this version over the original is spatial locality: the keys and values sit in contiguous arrays, so fetching them during the iterations is efficient.
Also, since you mentioned "thousand large np.arrays": if the mapping dictionary stays the same, the step that builds the array versions, k and v, would be a one-time setup cost.
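The one-time setup could look roughly like this. The helper name remap_loop and the small random arrays are hypothetical stand-ins for the real data; the loop itself is the one from approach #1.

```python
import numpy as np

mapping = {1: 2, 5: 3, 8: 6}

# One-time setup: convert the dictionary to arrays once.
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))


def remap_loop(arr, k, v):
    """Hypothetical helper: apply the approach #1 loop, zero elsewhere."""
    out = np.zeros_like(arr)
    for key, val in zip(k, v):
        out[arr == key] = val
    return out


# Stand-in for the many large arrays mentioned in the question.
rng = np.random.default_rng(0)
arrays = [rng.integers(0, 10, size=(4, 4)) for _ in range(3)]

# k and v are reused for every array; only the loop runs per array.
results = [remap_loop(a, k, v) for a in arrays]
```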
Approach #2 : Vectorized one with searchsorted
A vectorized alternative uses np.searchsorted:
sidx = k.argsort() #k,v from approach #1
k = k[sidx]
v = v[sidx]
idx = np.searchsorted(k,input_array.ravel()).reshape(input_array.shape)
idx[idx==len(k)] = 0
mask = k[idx] == input_array
out = np.where(mask, v[idx], 0)
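To see why the clamp and the mask are both needed, here is the same code traced on a small input. The 2x3 array is an assumption chosen so that some values are missing from the keys, including one larger than every key.

```python
import numpy as np

k = np.array([1, 5, 8])   # keys, already sorted
v = np.array([2, 3, 6])   # values aligned with k
input_array = np.array([[1, 4, 5],
                        [8, 9, 0]])  # 4, 9 and 0 are not keys

# Insertion positions of each element within the sorted keys.
idx = np.searchsorted(k, input_array.ravel()).reshape(input_array.shape)
# 9 is larger than every key, so its position is len(k), which is out of
# bounds for k; clamp it to 0 so that k[idx] is safe to evaluate.
idx[idx == len(k)] = 0
# A position is only a real match if the key found there equals the element.
mask = k[idx] == input_array
out = np.where(mask, v[idx], 0)
```

Without the mask, 4 would be treated as 5 and 0 as 1, because searchsorted returns an insertion position rather than an exact match.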
Approach #3 : Vectorized one with mapping-array for integer keys
A vectorized alternative for integer keys builds a mapping array that, when indexed by the input array, leads directly to the final output:
mapping_ar = np.zeros(k.max()+1,dtype=v.dtype) #k,v from approach #1
mapping_ar[k] = v
out = mapping_ar[input_array]
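As a small worked example of the lookup table, assuming non-negative integer keys and an input whose values do not exceed the largest key (the 2x3 array is made up for illustration):

```python
import numpy as np

mapping = {1: 2, 5: 3, 8: 6}
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))
# Hypothetical input; every value is <= k.max(), so indexing stays in bounds.
input_array = np.array([[1, 5, 8],
                        [0, 3, 8]])

# Lookup table: position i holds the replacement for value i, 0 if unmapped.
mapping_ar = np.zeros(k.max() + 1, dtype=v.dtype)
mapping_ar[k] = v

# Fancy indexing replaces every element in one step.
out = mapping_ar[input_array]
```

The table has k.max() + 1 entries, so this suits keys in a modest range; very large key values would make the table correspondingly large.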
The numpy_indexed library (disclaimer: I am its author) provides functionality to implement this operation in an efficient vectorized manner:
import numpy_indexed as npi
output_array = npi.remap(input_array.flatten(), list(mapping.keys()), list(mapping.values())).reshape(input_array.shape)
Note: I didn't test it, but it should work along these lines. Efficiency should be good for large inputs and many items in the mapping; I imagine it is similar to Divakar's method 2, though not as fast as his method 3. But this solution is aimed more at generality: it will also work for inputs that are not positive integers, or even nd-arrays (for instance, replacing colors in an image with other colors).
I think Divakar's approach #3 assumes that the mapping dict covers all values (or at least the maximum value) in the target array. Otherwise, to avoid index-out-of-range errors, you have to replace the line
mapping_ar = np.zeros(k.max()+1,dtype=v.dtype)
with
mapping_ar = np.zeros(array.max()+1,dtype=v.dtype)
That adds considerable overhead.
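One possible variant, sketched here under the assumption of non-negative integers, sizes the table from the larger of the two maxima. That way the assignment mapping_ar[k] = v also stays in bounds when the largest key exceeds the largest input value. The example array and the name size are illustrative only.

```python
import numpy as np

mapping = {1: 2, 5: 3, 8: 6}
k = np.array(list(mapping.keys()))
v = np.array(list(mapping.values()))
# Hypothetical input containing 12, which is larger than k.max().
input_array = np.array([[1, 5, 12],
                        [8, 0, 3]])

# Cover both the keys and the input values. The extra pass over
# input_array for max() is the overhead mentioned above.
size = max(k.max(), input_array.max()) + 1
mapping_ar = np.zeros(size, dtype=v.dtype)
mapping_ar[k] = v
out = mapping_ar[input_array]
```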