Weighted random sample without replacement in python
Built-in solution
As suggested by Miriam Farber, you can just use the numpy's builtin solution:
np.random.choice(vec,size,replace=False, p=P)
Pure python equivalent
What follows is close to what numpy does internally. It, of course, uses numpy arrays and numpy.random.choices():
from random import choices
def weighted_sample_without_replacement(population, weights, k=1):
weights = list(weights)
positions = range(len(population))
indices = []
while True:
needed = k - len(indices)
if not needed:
break
for i in choices(positions, weights, k=needed):
if weights[i]:
weights[i] = 0.0
indices.append(i)
return [population[i] for i in indices]
Related problem: Selection when elements can be repeated
This is sometimes called an urn problem. For example, given an urn with 10 red balls, 4 white balls, and 18 green balls, choose nine balls without replacement.
To do it with numpy, generate the unique selections from the total population count with sample(). Then, bisect the cumulative weights to get the population indices.
import numpy as np
from random import sample
population = np.array(['red', 'blue', 'green'])
counts = np.array([10, 4, 18])
k = 9
cum_counts = np.add.accumulate(counts)
total = cum_counts[-1]
selections = sample(range(total), k=k)
indices = np.searchsorted(cum_counts, selections, side='right')
result = population[indices]
To do this without *numpy', the same approach can be implemented with bisect() and accumulate() from the standard library:
from random import sample
from bisect import bisect
from itertools import accumulate
population = ['red', 'blue', 'green']
weights = [10, 4, 18]
k = 9
cum_weights = list(accumulate(weights))
total = cum_weights.pop()
selections = sample(range(total), k=k)
indices = [bisect(cum_weights, s) for s in selections]
result = [population[i] for i in indices]
You can use np.random.choice
with replace=False
as follows:
np.random.choice(vec,size,replace=False, p=P)
where vec
is your population and P
is the weight vector.
For example:
import numpy as np
vec=[1,2,3]
P=[0.5,0.2,0.3]
np.random.choice(vec,size=2,replace=False, p=P)