Is there a Python equivalent to R's sample() function?
I think numpy.random.choice(a, size=None, replace=True, p=None) may well be what you are looking for. The p argument corresponds to the prob argument in R's sample() function.
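As a rough sketch of how the two line up (the values, probabilities, and seed below are made up purely for illustration): R's sample() defaults to replace = FALSE, while numpy.random.choice defaults to replace=True, so you have to pass replace=False explicitly to get R's default behaviour. Also, p must sum to 1 in NumPy, whereas R's prob does not need to.

```python
import numpy as np

# Arbitrary seed, only so the sketch is repeatable.
np.random.seed(0)

values = [1, 2, 3, 4, 5]            # illustrative data, not from the question
probs = [0.1, 0.1, 0.2, 0.3, 0.3]   # must sum to 1 for NumPy

# Roughly sample(values, 3) in R: no element is drawn twice.
without_replacement = np.random.choice(values, size=3, replace=False)

# Roughly sample(values, 4, replace = TRUE, prob = probs) in R.
weighted = np.random.choice(values, size=4, replace=True, p=probs)

print(without_replacement)
print(weighted)
```

Both calls return NumPy arrays; omitting size returns a single value instead.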
In pandas (Python's closest analogue to R's data frames) there are the DataFrame.sample and Series.sample methods, which were both introduced in version 0.16.1.
For example:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> df
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0
Sampling 3 rows without replacement:
>>> df.sample(3)
   a  b
4  5  0
1  2  7
3  4  9
Sample 4 rows from column 'a' with replacement, using column 'b' as the corresponding weights for the choices:
>>> df['a'].sample(4, replace=True, weights=df['b'])
3    4
0    1
0    1
2    3
These methods are almost identical to the R function, allowing you to sample a particular number of values (n), or a fraction of values (frac), from your DataFrame/Series, with or without replacement. Note that the prob argument in R's sample() corresponds to weights in the pandas methods.
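To illustrate the fraction and weights options, here is a small sketch using the same df as above. The frac value and the random_state seed are my own choices for the example, not anything the question asked for. Note that pandas normalizes weights that do not sum to 1, and a row with weight 0 is never drawn.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})

# frac=0.4 draws 40% of the rows (2 of 5), without replacement by default.
# random_state is an arbitrary seed that makes the draw repeatable.
subset = df.sample(frac=0.4, random_state=0)

# Column 'b' supplies the weights; pandas rescales them to sum to 1.
# The row where b == 0 (a == 5) has zero weight and cannot be selected.
weighted = df['a'].sample(n=4, replace=True, weights=df['b'], random_state=0)

print(subset)
print(weighted)
```

Passing both n and frac at once raises an error, so pick one or the other.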