Numpy sort ndarray on multiple columns

Sort a NumPy ndarray by its 1st, 2nd or 3rd column:

>>> a = np.array([[1,30,200], [2,20,300], [3,10,100]])

>>> a
array([[  1,  30, 200],         
       [  2,  20, 300],          
       [  3,  10, 100]])

>>> a[a[:,2].argsort()]           #sort by the 3rd column ascending
array([[  3,  10, 100],
       [  1,  30, 200],
       [  2,  20, 300]])

>>> a[a[:,2].argsort()][::-1]     #sort by the 3rd column descending
array([[  2,  20, 300],
       [  1,  30, 200],
       [  3,  10, 100]])

>>> a[a[:,1].argsort()]        #sort by the 2nd column ascending
array([[  3,  10, 100],
       [  2,  20, 300],
       [  1,  30, 200]])

To explain what is going on here: argsort() returns an array of integer indices that would sort its input, and indexing the array with those indices reorders its rows: https://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

>>> x = np.array([15, 30, 4, 80, 6])
>>> np.argsort(x)
array([2, 4, 0, 1, 3])
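
To make the link between the index array and the sorted result concrete, here is a minimal sketch that reuses x from the example above. The names order, ascending and descending are only illustrative.

```python
import numpy as np

x = np.array([15, 30, 4, 80, 6])

# argsort returns the positions that would put x in ascending order
order = np.argsort(x)

# indexing x with those positions yields the sorted values
ascending = x[order]

# reversing the index array gives descending order
descending = x[order[::-1]]

print(order)       # [2 4 0 1 3]
print(ascending)   # [ 4  6 15 30 80]
print(descending)  # [80 30 15  6  4]
```

The same idea carries over to the 2-D case: a[:,2].argsort() gives row positions, and a[...] picks the rows in that order.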

Sort by column 1, then by column 2, then by column 3 (np.lexsort uses the last key in the tuple as the primary key):

>>> a = np.array([[2,30,200], [1,30,200], [1,10,200]])

>>> a
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])

>>> a[np.lexsort((a[:,2], a[:,1],a[:,0]))]
array([[  1,  10, 200],
       [  1,  30, 200],
       [  2,  30, 200]])

>>> a[np.lexsort((a[:,2], a[:,1],a[:,0]))][::-1]        #reverse
array([[  2,  30, 200],
       [  1,  30, 200],
       [  1,  10, 200]])
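
Since np.lexsort treats the last key in the tuple as the primary key, it can help to see the key order spelled out. The sketch below also shows one common way to mix sort directions, by negating a numeric key. The array b and its values are made up for illustration.

```python
import numpy as np

# hypothetical data: column 0 is a group id, column 1 is a score
b = np.array([[1, 50],
              [2, 70],
              [1, 90],
              [2, 10]])

# primary key: column 0 ascending (last in the tuple)
# secondary key: column 1 ascending (first in the tuple)
asc = b[np.lexsort((b[:, 1], b[:, 0]))]

# same primary key, but column 1 descending via negation
# (negation is only an option for signed numeric keys)
mixed = b[np.lexsort((-b[:, 1], b[:, 0]))]

print(asc)
print(mixed)
```

Unlike reversing the whole result with [::-1], negating a single key changes the direction of that key only.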

Import the data, letting NumPy guess the column types, and sort in place:

import numpy as np

# let numpy guess the type with dtype=None
my_data = np.genfromtxt(infile, dtype=None, names=["a", "b", "c", "d"])

# access columns by name
print(my_data["b"]) # column 1

# sort in place by column 1 ("b"), then by column 0 ("a")
my_data.sort(order=["b", "a"])

# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", my_data, fmt="%d\t%d\t%.6f\t%.6f")
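
The snippet above depends on an input file that is not shown. As a rough, self-contained sketch of the same steps, the version below feeds np.genfromtxt from an in-memory io.StringIO buffer. The buffer contents are invented sample rows, not the original data.

```python
import io
import numpy as np

# hypothetical stand-in for infile: whitespace-separated text held in memory
infile = io.StringIO(
    "3 2 4.0 0.0\n"
    "2 1 2.0 0.0\n"
    "2 2 100.0 0.0\n"
    "3 1 2.0 0.0\n"
)

# dtype=None lets NumPy infer a type per column; names label the fields
my_data = np.genfromtxt(infile, dtype=None, names=["a", "b", "c", "d"])

# sort in place by field "b", breaking ties with field "a"
my_data.sort(order=["b", "a"])

print(my_data["a"])
print(my_data["b"])
```

After the sort, field "b" is non-decreasing and field "a" is ordered within each group of equal "b" values.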

Alternatively, specifying the input format and sorting to a new array:

import numpy as np

# tell numpy the first 2 columns are int and the last 2 are floats
my_data = np.genfromtxt(infile, dtype=[('a', '<i8'), ('b', '<i8'), ('x', '<f8'), ('d', '<f8')])

# access columns by name
print(my_data["b"]) # column 1

# get the indices to sort the array using lexsort
# the last element of the tuple (column 1) is used as the primary key
ind = np.lexsort((my_data["a"], my_data["b"]))

# create a new, sorted array
sorted_data = my_data[ind]

# save specifying required format (tab separated values)
np.savetxt("sorted.tsv", sorted_data, fmt="%d\t%d\t%.6f\t%.6f")

Output:

2   1   2.000000    0.000000
3   1   2.000000    0.000000
4   1   2.000000    0.000000
2   2   100.000000  0.000000
3   2   4.000000    0.000000
4   2   4.000000    0.000000
2   3   100.000000  0.000000
3   3   6.000000    0.000000
4   3   6.000000    0.000000
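
As a quick cross-check, sorting a structured array with np.lexsort on its fields should give the same row order as np.argsort with the order argument, provided the key pairs are unique. The sketch below assumes a small hand-built array with the same dtype as above; the rows are illustrative only.

```python
import numpy as np

dt = [("a", "<i8"), ("b", "<i8"), ("x", "<f8"), ("d", "<f8")]

# hypothetical sample rows with the same field layout as my_data
my_data = np.array(
    [(2, 1, 2.0, 0.0), (2, 2, 100.0, 0.0), (3, 1, 2.0, 0.0), (3, 2, 4.0, 0.0)],
    dtype=dt,
)

# lexsort: last key ("b") is primary, "a" breaks ties
ind_lexsort = np.lexsort((my_data["a"], my_data["b"]))

# argsort with order: first listed field ("b") is primary
ind_order = np.argsort(my_data, order=["b", "a"])

print(ind_lexsort)
print(ind_order)
```

Note the opposite conventions: order lists the most significant field first, while lexsort expects it last.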

The following method works for any 2-D NumPy array: build a record array from the key columns and use its argsort() to reorder the rows:

import numpy as np

my_data = [[   2.,    1.,    2.,    0.],
           [   2.,    2.,  100.,    0.],
           [   2.,    3.,  100.,    0.],
           [   3.,    1.,    2.,    0.],
           [   3.,    2.,    4.,    0.],
           [   3.,    3.,    6.,    0.],
           [   4.,    1.,    2.,    0.],
           [   4.,    2.,    4.,    0.],
           [   4.,    3.,    6.,    0.]]
my_data = np.array(my_data)
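# build a record array from the key columns; records compare field by field,
# so column 1 is the primary key and column 0 breaks ties
r = np.rec.fromarrays([my_data[:,1], my_data[:,0]], names='a,b')
my_data = my_data[r.argsort()]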
print(my_data)

Result:

[[  2.   1.   2.   0.]
 [  3.   1.   2.   0.]
 [  4.   1.   2.   0.]
 [  2.   2. 100.   0.]
 [  3.   2.   4.   0.]
 [  4.   2.   4.   0.]
 [  2.   3. 100.   0.]
 [  3.   3.   6.   0.]
 [  4.   3.   6.   0.]]
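
For plain 2-D arrays, a similar result can be had without a record array by wrapping np.lexsort in a small helper. The function sort_rows below is a hypothetical convenience wrapper, not part of NumPy, and the sample data is a shortened version of the array above.

```python
import numpy as np

def sort_rows(arr, columns):
    """Return the rows of a 2-D array sorted by the given column indices,
    most significant column first (hypothetical helper)."""
    # np.lexsort expects keys from least to most significant, so reverse
    keys = tuple(arr[:, c] for c in reversed(columns))
    return arr[np.lexsort(keys)]

data = np.array([[2., 1., 2., 0.],
                 [2., 2., 100., 0.],
                 [3., 1., 2., 0.],
                 [3., 2., 4., 0.]])

# sort by column 1 first, then by column 0
result = sort_rows(data, [1, 0])
print(result)
```

Listing the columns most significant first keeps the call site readable, while the helper hides lexsort's reversed key order.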
