How to generate list of unique random floats in Python
If you need to guarantee uniqueness, it may be more efficient to
- Try and generate
n
random floats in[lo, hi]
at once. - If the length of the unique floats is not
n
, try and generate however many floats are still needed
and continue accordingly until you have enough, as opposed to generating them 1-by-1 in a Python level loop checking against a set.
If you can afford NumPy doing so with np.random.uniform
can be a huge speed-up.
import numpy as np
def gen_uniq_floats(lo, hi, n):
out = np.empty(n)
needed = n
while needed != 0:
arr = np.random.uniform(lo, hi, needed)
uniqs = np.setdiff1d(np.unique(arr), out[:n-needed])
out[n-needed: n-needed+uniqs.size] = uniqs
needed -= uniqs.size
np.random.shuffle(out)
return out.tolist()
If you cannot use NumPy, it still may be more efficient depending on your data needs to apply the same concept of checking for dupes afterwards, maintaining a set.
def no_depend_gen_uniq_floats(lo, hi, n):
seen = set()
needed = n
while needed != 0:
uniqs = {random.uniform(lo, hi) for _ in range(needed)}
seen.update(uniqs)
needed -= len(uniqs)
return list(seen)
Rough benchmark
Extreme degenerate case
# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 2**-50, 1000)
153 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 2**-50, 1000)
495 µs ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 2**-50, 1000)
618 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
More "normal" case (with larger sample)
# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 1, 10**5)
15.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 1, 10**5)
65.7 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 1, 10**5)
78.8 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
You could just use random.uniform(start, stop)
. With double precision floats, you can be relatively sure that they are unique if your set is small. If you want to generate a big number of random floats and need to avoid that you have a number twice, check before adding them to the list.
However, if you are looking for a selection of specific numbers, this is not the solution.
Answer
One easy way is to keep a set of all random values seen so far and reselect if there is a repeat:
import random
def sample_floats(low, high, k=1):
""" Return a k-length list of unique random floats
in the range of low <= x <= high
"""
result = []
seen = set()
for i in range(k):
x = random.uniform(low, high)
while x in seen:
x = random.uniform(low, high)
seen.add(x)
result.append(x)
return result
Notes
This technique is how Python's own random.sample() is implemented.
The function uses a set to track previous selections because searching a set is O(1) while searching a list is O(n).
Computing the probability of a duplicate selection is equivalent to the famous Birthday Problem.
Given 2**53 distinct possible values from random(), duplicates are infrequent. On average, you can expect a duplicate float at about 120,000,000 samples.
Variant: Limited float range
If the population is limited to just a range of evenly spaced floats, then it is possible to use random.sample() directly. The only requirement is that the population be a Sequence:
from __future__ import division
from collections import Sequence
class FRange(Sequence):
""" Lazily evaluated floating point range of evenly spaced floats
(inclusive at both ends)
>>> list(FRange(low=10, high=20, num_points=5))
[10.0, 12.5, 15.0, 17.5, 20.0]
"""
def __init__(self, low, high, num_points):
self.low = low
self.high = high
self.num_points = num_points
def __len__(self):
return self.num_points
def __getitem__(self, index):
if index < 0:
index += len(self)
if index < 0 or index >= len(self):
raise IndexError('Out of range')
p = index / (self.num_points - 1)
return self.low * (1.0 - p) + self.high * p
Here is a example of choosing ten random samples without replacement from a range of 41 evenly spaced floats from 10.0 to 20.0.
>>> import random
>>> random.sample(FRange(low=10.0, high=20.0, num_points=41), k=10)
[13.25, 12.0, 15.25, 18.5, 19.75, 12.25, 15.75, 18.75, 13.0, 17.75]
You can easily use your list of integers to generate floats:
int_list = random.sample(range(1, 100), 10)
float_list = [x/10 for x in int_list]
Check out this Stack Overflow question about generating random floats.
If you want it to work with python2, add this import:
from __future__ import division