Select N random elements from a List efficiently (without toArray and change the list)

You are probably looking for something like Resorvoir Sampling.

Start with an initial array with first k elements, and modify it with new elements with decreasing probabilities:

java like pseudo code:

E[] r = new E[k]; //not really, cannot create an array of generic type, but just pseudo code
int i = 0;
for (E e : list) {
   //assign first k elements:
   if (i < k) { r[i++] = e; continue; }
   //add current element with decreasing probability:
   j = random(i++) + 1; //a number from 1 to i inclusive
   if (j <= k) r[j] = e;
}
return r;

This requires a single pass on the data, with very cheap ops every iteration, and the space consumption is linear with the required output size.


If n is very small compared to the length of the list, take an empty set of ints and keep adding a random index until the set has the right size.

If n is comparable to the length of the list, do the same, but then return items in the list that don't have indexes in the set.

In the middle ground, you can iterate through the list, and randomly select items based on how many items you've seen, and how many items you've already returned. In pseudo-code, if you want k items from N:

for i = 0 to N-1
    if random(N-i) < k
        add item[i] to the result
        k -= 1
    end
end

Here random(x) returns a random number between 0 (inclusive) and x (exclusive).

This produces a uniformly random sample of k elements. You could also consider making an iterator to avoid building the results list to save memory, assuming the list is unchanged as you're iterating over it.

By profiling, you can determine the transition point where it makes sense to switch from the naive set-building method to the iteration method.


Let's assume that you can generate n random indices out of m that are pairwise disjoint and then look them up efficiently in the collection. If you don't need the order of the elements to be random, then you can use an algorithm due to Robert Floyd.

Random r = new Random();
Set<Integer> s = new HashSet<Integer>();
for (int j = m - n; j < m; j++) {
    int t = r.nextInt(j);
    s.add(s.contains(t) ? j : t);
}

If you do need the order to be random, then you can run Fisher--Yates where, instead of using an array, you use a HashMap that stores only those mappings where the key and the value are distinct. Assuming that hashing is constant time, both of these algorithms are asymptotically optimal (though clearly, if you want to randomly sample most of the array, then there are data structures with better constants).