Efficient way to take the minimum/maximum n values and indices from a matrix using NumPy
Since the time of the other answer, NumPy has added the numpy.partition and numpy.argpartition functions for partial sorting, allowing you to do this in O(arr.size) time, or O(arr.size+n*log(n)) if you need the elements in sorted order.
numpy.partition(arr, n) returns an array the size of arr where the element at index n is what it would be if the array were sorted. All elements smaller than it come before it and all elements equal to or greater than it come after it, in no particular order on either side.
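A small sketch of that guarantee, using an arbitrary example array (`data` and `k` are illustrative names, not from the answer): after partitioning, index `k` holds the value a full sort would put there, and the two sides are bounded by it.

```python
import numpy

data = numpy.array([7, 2, 9, 4, 1, 8, 3])
k = 2

part = numpy.partition(data, k)

# Index k matches what a full sort puts at index k
assert part[k] == numpy.sort(data)[k]
# Everything before index k is <= part[k], everything after is >= part[k]
assert (part[:k] <= part[k]).all()
assert (part[k + 1:] >= part[k]).all()
print(part[k])  # prints 3
```

Only the element at index `k` is pinned down; the order of `part[:k]` and `part[k + 1:]` may vary.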
numpy.argpartition is to numpy.partition as numpy.argsort is to numpy.sort.
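To illustrate the analogy with made-up data: indexing an array with the output of numpy.argpartition gives an array with the same partition guarantee, just as indexing with numpy.argsort gives the sorted array. This is a sketch; it does not claim the result is element-for-element identical to numpy.partition's output.

```python
import numpy

data = numpy.array([7, 2, 9, 4, 1, 8, 3])
k = 2

idx = numpy.argpartition(data, k)
reordered = data[idx]

# Same guarantee as numpy.partition: index k holds the value a full sort puts there
assert reordered[k] == numpy.sort(data)[k]
# The full-sort counterpart: argsort indices reproduce numpy.sort
assert (data[numpy.argsort(data)] == numpy.sort(data)).all()
```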
Here's how you would use these functions to find the indices of the minimum n elements of a two-dimensional arr:
flat_indices = numpy.argpartition(arr.ravel(), n-1)[:n]
row_indices, col_indices = numpy.unravel_index(flat_indices, arr.shape)
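Wrapped into a function and run on a small test array, the two lines above behave as follows. `min_n_indices_2d` is a hypothetical name, and `n` is assumed to satisfy 1 <= n <= arr.size, since argpartition needs `n - 1` to be a valid index.

```python
import numpy

def min_n_indices_2d(arr, n):
    """Return (row_indices, col_indices) of the n smallest elements, in no particular order."""
    # After partitioning at n - 1, the first n positions hold the n smallest values
    flat_indices = numpy.argpartition(arr.ravel(), n - 1)[:n]
    # Convert flat positions back to (row, column) coordinates
    return numpy.unravel_index(flat_indices, arr.shape)

arr = numpy.array([[5, 1, 9],
                   [3, 8, 2],
                   [7, 0, 6]])
rows, cols = min_n_indices_2d(arr, 3)
print(sorted(arr[rows, cols].tolist()))  # [0, 1, 2]
```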
And if you need the indices in order, so that row_indices[0] is the row of the minimum element instead of just one of the n minimum elements:
min_elements = arr[row_indices, col_indices]
min_elements_order = numpy.argsort(min_elements)
row_indices, col_indices = row_indices[min_elements_order], col_indices[min_elements_order]
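Putting the partition and the ordering step together gives the O(arr.size+n*log(n)) variant. This is a sketch with a hypothetical function name, again assuming 1 <= n <= arr.size; only the n selected values are sorted.

```python
import numpy

def min_n_indices_2d_sorted(arr, n):
    """Like the unordered version, but the first returned (row, col) pair locates the minimum."""
    flat_indices = numpy.argpartition(arr.ravel(), n - 1)[:n]
    row_indices, col_indices = numpy.unravel_index(flat_indices, arr.shape)
    # Sort only the n candidates: O(n log n) on top of the O(arr.size) partition
    min_elements = arr[row_indices, col_indices]
    min_elements_order = numpy.argsort(min_elements)
    return row_indices[min_elements_order], col_indices[min_elements_order]

arr = numpy.array([[5, 1, 9],
                   [3, 8, 2],
                   [7, 0, 6]])
rows, cols = min_n_indices_2d_sorted(arr, 3)
print(list(zip(rows.tolist(), cols.tolist())))  # [(2, 1), (0, 1), (1, 2)]
```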
The 1D case is a lot simpler:
# Unordered:
indices = numpy.argpartition(arr, n-1)[:n]
# Extra code if you need the indices in order:
min_elements = arr[indices]
min_elements_order = numpy.argsort(min_elements)
ordered_indices = indices[min_elements_order]
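The two 1D variants fit in one small helper. `min_n_indices_1d` and its `ordered` flag are hypothetical names chosen for this sketch, and 1 <= n <= len(arr) is assumed.

```python
import numpy

def min_n_indices_1d(arr, n, ordered=False):
    """Indices of the n smallest elements of a 1D array."""
    indices = numpy.argpartition(arr, n - 1)[:n]
    if ordered:
        # Sort just the n candidates by their values
        indices = indices[numpy.argsort(arr[indices])]
    return indices

vals = numpy.array([40, 10, 30, 20, 50])
print(min_n_indices_1d(vals, 2, ordered=True).tolist())  # [1, 3]
```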
Since there is no heap implementation in NumPy, your best bet is probably to sort the whole array and take the last n elements:
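import numpy

def n_max(arr, n):
    # Flat indices of the n largest values, in ascending order of value
    indices = arr.ravel().argsort()[-n:]
    # Convert each flat index into a tuple of per-axis indices
    indices = (numpy.unravel_index(i, arr.shape) for i in indices)
    return [(arr[i], i) for i in indices]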
(The list comes out in ascending order, so the largest value is last; this is probably the reverse of your implementation's order, though I did not check.)
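If you want the largest value first, reversing the selected slice is enough. A sketch with a hypothetical name, `n_max_desc`, assuming n >= 1 (with n == 0, the slice `[-0:]` would select the whole array):

```python
import numpy

def n_max_desc(arr, n):
    # argsort is ascending, so the last n positions are the n largest; [::-1] puts the largest first
    indices = arr.ravel().argsort()[-n:][::-1]
    coords = (numpy.unravel_index(i, arr.shape) for i in indices)
    return [(arr[idx], idx) for idx in coords]

arr = numpy.array([[5, 1, 9],
                   [3, 8, 2],
                   [7, 0, 6]])
result = n_max_desc(arr, 2)
# Convert NumPy scalars to plain ints for readable output
print([(int(v), tuple(int(j) for j in idx)) for v, idx in result])  # [(9, (0, 2)), (8, (1, 1))]
```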
A more efficient solution that works with newer versions of NumPy is given in this answer.
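For comparison, here is a sketch of n_max rebuilt around numpy.argpartition (available since NumPy 1.8), in the spirit of that answer. The name `n_max_partitioned` is hypothetical, and 1 <= n <= arr.size is assumed. It keeps n_max's ascending output order.

```python
import numpy

def n_max_partitioned(arr, n):
    flat = arr.ravel()
    cut = flat.size - n
    # Partitioning at cut moves the n largest values into the last n positions in O(arr.size)
    indices = numpy.argpartition(flat, cut)[cut:]
    # Sort only those n candidates so the result is ascending, like n_max
    indices = indices[numpy.argsort(flat[indices])]
    coords = (numpy.unravel_index(i, arr.shape) for i in indices)
    return [(arr[idx], idx) for idx in coords]

arr = numpy.array([[5, 1, 9],
                   [3, 8, 2],
                   [7, 0, 6]])
result = n_max_partitioned(arr, 2)
print([(int(v), tuple(int(j) for j in idx)) for v, idx in result])  # [(8, (1, 1)), (9, (0, 2))]
```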