Python random sample of two arrays, but matching indices

After testing the numpy.random.choice solution, I found it was very slow for larger arrays.

numpy.random.randint should be much faster. Note that it samples with replacement, so the same index can be drawn more than once.

Example:

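import numpy as np

x = np.arange(10**8)
y = np.arange(10**8)
# draw 10000 random positions (repeats are possible) and index both arrays with them
idx = np.random.randint(0, x.shape[0], 10000)
x_sample, y_sample = x[idx], y[idx]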
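
The same idea can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the function name `sample_with_replacement` is hypothetical, and it assumes a NumPy version that provides `np.random.default_rng`, using `Generator.integers` in place of `np.random.randint`. Like `randint`, it draws indices with replacement.

```python
import numpy as np

def sample_with_replacement(x, y, size, seed=None):
    """Draw `size` random positions and return the matching entries of x and y."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rng = np.random.default_rng(seed)
    # Indices are drawn independently, so duplicates are possible.
    idx = rng.integers(0, len(x), size=size)
    # Indexing both arrays with the same idx keeps the pairs aligned.
    return x[idx], y[idx]

x = np.arange(1_000_000)
y = x * 2  # y[i] corresponds to x[i]
x_sample, y_sample = sample_with_replacement(x, y, 10_000, seed=0)
```

Because both arrays are indexed with the same `idx`, `y_sample[i]` always belongs to `x_sample[i]`.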

Just zip the two together and use that as the population:

import random

# on Python 3, zip returns an iterator, and random.sample needs a sequence
random.sample(list(zip(xs, ys)), 1000)

The result will be 1000 pairs (2-tuples) of corresponding entries from xs and ys.
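
If you want two separate sequences again rather than a list of pairs, the sample can be transposed back with `zip(*pairs)`. A minimal sketch, assuming `xs` and `ys` are plain lists of equal length (the data here is made up for illustration):

```python
import random

xs = list(range(10_000))
ys = [v * v for v in xs]  # ys[i] corresponds to xs[i]

# Sample 1000 (x, y) pairs without replacement.
pairs = random.sample(list(zip(xs, ys)), 1000)

# zip(*pairs) transposes the list of pairs into two tuples.
xs_sample, ys_sample = zip(*pairs)
```

Note that `list(zip(xs, ys))` builds every pair up front, so for very large inputs sampling indices instead is likely cheaper.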


You can use np.random.choice on an index array and apply it to both arrays:

idx = np.random.choice(np.arange(len(x)), 1000, replace=False)
x_sample = x[idx]
y_sample = y[idx]
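
For large arrays, NumPy's newer `Generator` interface may be worth trying for sampling without replacement. The following is a sketch under assumptions: it requires a NumPy version that has `np.random.default_rng`, the data and seed are made up for illustration, and whether it is actually faster on your data is something to measure rather than take for granted.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

x = np.arange(1_000_000)
y = x + 0.5  # y[i] corresponds to x[i]

# Passing an int samples from range(len(x)) without building an index array.
idx = rng.choice(len(x), size=1000, replace=False)
x_sample = x[idx]
y_sample = y[idx]
```

With `replace=False` every index appears at most once, so no pair is sampled twice.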