shuffle vs permute numpy
Adding on to what @ecatmur said, np.random.permutation is useful when you need to shuffle ordered pairs, especially for classification:
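# `np` is only an alias, so import from the real module name
from numpy.random import permutation
from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data
y = iris.target
# Data is currently unshuffled (rows are sorted by class); we should shuffle
# each X[i] together with its corresponding y[i]
perm = permutation(len(X))  # one random ordering of the row indices
X = X[perm]                 # apply the same ordering to features...
y = y[perm]                 # ...and to labels, so pairs stay aligned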
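To check that indexing both arrays with one permutation keeps each pair together, here is a self-contained sketch that swaps the iris data for tiny hypothetical arrays (no sklearn needed) and uses a seeded np.random.RandomState so the run is reproducible. Each label is chosen to equal the index of its row, which makes misalignment easy to detect.

```python
import numpy as np

# Hypothetical stand-ins for iris.data / iris.target:
# label i belongs to row i, so alignment is easy to verify.
X = np.arange(12).reshape(6, 2)   # 6 samples, 2 features each
y = np.arange(6)                  # y[i] is the label of X[i]

rs = np.random.RandomState(0)     # seeded for reproducibility
perm = rs.permutation(len(X))     # a shuffled ordering of 0..5

X_shuffled = X[perm]              # fancy indexing makes new arrays
y_shuffled = y[perm]              # original X and y are left untouched

# Every shuffled row is still paired with its own label
for row, label in zip(X_shuffled, y_shuffled):
    assert (row == X[label]).all()

print(perm)
```

Because fancy indexing returns copies, the original X and y are unchanged; reassigning them (as in the iris example) simply discards the unshuffled versions.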
np.random.permutation has two differences from np.random.shuffle:
- if passed an array, it returns a shuffled copy of the array, whereas np.random.shuffle shuffles the array in place (and returns None)
- if passed an integer n, it returns a shuffled range, i.e. the equivalent of calling np.random.shuffle on np.arange(n)
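A quick sketch of both differences using the module-level np.random functions (unseeded, so only properties of the results are checked, not exact orderings):

```python
import numpy as np

a = np.arange(10)

# permutation: returns a new shuffled array and leaves `a` untouched
b = np.random.permutation(a)
assert (a == np.arange(10)).all()

# shuffle: rearranges `a` itself and returns None
result = np.random.shuffle(a)
assert result is None

# integer argument: permutation(10) shuffles np.arange(10)
c = np.random.permutation(10)

print(b, a, c)
```

In practice, permutation is the natural choice when the original order must be kept; shuffle avoids allocating a second array.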
From the documentation of the x parameter: "If x is an integer, randomly permute np.arange(x). If x is an array, make a copy and shuffle the elements randomly."
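The documented behavior can be checked with two identically seeded np.random.RandomState objects, which produce the same random stream. This assumes permutation is built on shuffle, as in the source excerpt below; if so, permuting an integer should match shuffling np.arange(n), and permuting an array should match shuffling a copy of it. The seeds 42 and 7 are arbitrary.

```python
import numpy as np

n = 8

# Integer argument: permutation(n) vs. shuffling np.arange(n) in place.
# Same seed means both RandomState objects draw the same random numbers.
p = np.random.RandomState(42).permutation(n)
q = np.arange(n)
np.random.RandomState(42).shuffle(q)

# Array argument: permutation(arr) vs. shuffling a manual copy of arr.
arr = np.array([10, 20, 30, 40, 50])
r1 = np.random.RandomState(7).permutation(arr)
r2 = arr.copy()
np.random.RandomState(7).shuffle(r2)

print(p, q)
print(r1, r2, arr)  # arr itself keeps its original order
```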
The source code might help to understand this:
def permutation(self, object x):
    ...
    if isinstance(x, (int, np.integer)):
        arr = np.arange(x)
    else:
        arr = np.array(x)
    self.shuffle(arr)
    return arr
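The excerpt translates almost line for line into plain Python. In this sketch, permutation_sketch is a hypothetical name (not part of NumPy) and takes the RandomState explicitly instead of using self. Recent NumPy releases restructure permutation and may handle some inputs (such as multi-dimensional arrays) differently, so treat this as mirroring the excerpt rather than your installed version.

```python
import numpy as np

def permutation_sketch(rs, x):
    """Hypothetical re-implementation of the excerpt (not NumPy's API).

    rs: a np.random.RandomState that supplies the randomness.
    """
    if isinstance(x, (int, np.integer)):
        arr = np.arange(x)   # integer: build the range 0..x-1
    else:
        arr = np.array(x)    # array-like: np.array copies by default
    rs.shuffle(arr)          # shuffle the private array in place
    return arr

data = [3, 1, 4, 1, 5]
out = permutation_sketch(np.random.RandomState(1), data)
ref = np.random.RandomState(1).permutation(data)
print(out, ref, data)  # data is not modified
```

The copy made by np.array(x) is the whole reason permutation leaves its input alone: shuffle only ever touches that private array.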