How to use numpy.argsort() as indices in more than 2 dimensions?
Here is a general method:
import numpy as np
array = np.array([[[ 0.81774634, 0.62078744],
[ 0.43912609, 0.29718462]],
[[ 0.1266578 , 0.82282054],
[ 0.98180375, 0.79134389]]])
a = 1 # or 0 or 2
order = array.argsort(axis=a)
idx = np.ogrid[tuple(map(slice, array.shape))]
# if you don't need full ND generality: in 3D this can be written
# much more readable as
# m, n, k = array.shape
# idx = np.ogrid[:m, :n, :k]
idx[a] = order
print(np.all(array[idx] == np.sort(array, axis=a)))
Output:
True
Explanation: We must specify for each element of the output array the complete index of the corresponding element of the input array. Thus each index into the input array has the same shape as the output array or must be broadcastable to that shape.
The indices for the axes along which we do not sort/argsort stay in place. We therefore need to pass a broadcastable range(array.shape[i]) for each of those. The easiest way is to use ogrid to create such a range for all dimensions (If we used this directly, the array would come back unchanged.) and then replace the index correspondingg to the sort axis with the output of argsort
.
UPDATE March 2019:
Numpy is becoming more strict in enforcing multi-axis indices being passed as tuples. Currently, array[idx]
will trigger a deprecation warning. To be future proof use array[tuple(idx)]
instead. (Thanks @Nathan)
Or use numpy's new (version 1.15.0) convenience function take_along_axis
:
np.take_along_axis(array, order, a)
@Hameer's answer works, though it might use some simplification and explanation.
sort
and argsort
are working on the last axis. argsort
returns a 3d array, same shape as the original. The values are the indices on that last axis.
In [17]: np.argsort(arr, axis=2)
Out[17]:
array([[[1, 0],
[1, 0]],
[[0, 1],
[1, 0]]], dtype=int32)
In [18]: _.shape
Out[18]: (2, 2, 2)
In [19]: idx=np.argsort(arr, axis=2)
To use this we need to construct indices for the other dimensions that broadcast to the same (2,2,2) shape. ix_
is a handy tool for this.
Just using idx
as one of the ix_
inputs doesn't work:
In [20]: np.ix_(range(2),range(2),idx)
....
ValueError: Cross index must be 1 dimensional
Instead I use the last range, and then ignore it. @Hameer instead constructs the 2d ix_
, and then expands them.
In [21]: I,J,K=np.ix_(range(2),range(2),range(2))
In [22]: arr[I,J,idx]
Out[22]:
array([[[ 0.62078744, 0.81774634],
[ 0.29718462, 0.43912609]],
[[ 0.1266578 , 0.82282054],
[ 0.79134389, 0.98180375]]])
So the indices for the other dimensions work with the (2,2,2)
idx array:
In [24]: I.shape
Out[24]: (2, 1, 1)
In [25]: J.shape
Out[25]: (1, 2, 1)
That's the basics for constructing the other indices when you are given multidimensional index for one dimension.
@Paul constructs the same indices with ogrid
:
In [26]: np.ogrid[slice(2),slice(2),slice(2)] # np.ogrid[:2,:2,:2]
Out[26]:
[array([[[0]],
[[1]]]), array([[[0],
[1]]]), array([[[0, 1]]])]
In [27]: _[0].shape
Out[27]: (2, 1, 1)
ogrid
as a class
works with slices, while ix_
requires a list/array/range.
argsort for a multidimensional ndarray (from 2015) works with a 2d array, but the same logic applies (find a range index(s) that broadcasts with the argsort
).
Here's a vectorized implementation. It should be N-dimensional and quite a bit faster than what you're doing.
import numpy as np
def sort1(array, args):
array_sort = np.zeros_like(array)
for i in range(array.shape[0]):
for j in range(array.shape[1]):
array_sort[i, j] = array[i, j, args[i, j]]
return array_sort
def sort2(array, args):
shape = array.shape
idx = np.ix_(*tuple(np.arange(l) for l in shape[:-1]))
idx = tuple(ar[..., None] for ar in idx)
array_sorted = array[idx + (args,)]
return array_sorted
if __name__ == '__main__':
array = np.random.rand(5, 6, 7)
idx = np.argsort(array)
result1 = sort1(array, idx)
result2 = sort2(array, idx)
print(np.array_equal(result1, result2))