Pandas: create new column in df with random integers from range
To add a column of random integers, use randint(low, high, size). There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large.
df1['randNumCol'] = np.random.randint(0,5, size=len(df1))
Notes:
- When we're just adding a single column, size is just an integer. In general, if we want to generate an array/dataframe of random integers, size can be a tuple, as in "Pandas: How to create a data frame of random integers?".
- In Python 3.x, range(low, high) no longer allocates a list (potentially using lots of memory); it produces a lazy range object.
- Use np.random.seed(...) for determinism and reproducibility. Python's built-in random.seed(...) does not affect NumPy's generator.
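The notes above can be sketched roughly as follows. The frame df1 and the column names a, b, c are hypothetical, chosen only for illustration. Keep in mind that high is exclusive in randint, so randint(0, 5, ...) yields values 0 through 4.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # seed NumPy's global generator so reruns give the same values

# Hypothetical frame, used only for illustration
df1 = pd.DataFrame({"x": range(10)})

# Single column: size is an integer; values fall in [0, 5), i.e. 0..4
df1["randNumCol"] = np.random.randint(0, 5, size=len(df1))

# Whole frame: size is a (rows, columns) tuple
df_rand = pd.DataFrame(np.random.randint(0, 5, size=(10, 3)), columns=list("abc"))
```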
One solution is to use numpy.random.randint:
import numpy as np
df1['randNumCol'] = np.random.randint(1, 6, df1.shape[0])
Or, if the numbers are non-consecutive, you can use this (albeit slower):
df1['randNumCol'] = np.random.choice([1, 9, 20], df1.shape[0])
In order to make the results reproducible you can set the seed with numpy.random.seed (e.g. np.random.seed(42)).
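As an alternative sketch (not the approach used in the answer above), NumPy 1.17 and later provide a Generator object that keeps the seed local instead of setting global state. The frame df1 and the column name randChoiceCol here are hypothetical stand-ins.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"x": range(8)})  # hypothetical frame for illustration

rng = np.random.default_rng(42)  # seeded generator, independent of np.random.seed
df1["randNumCol"] = rng.integers(1, 6, size=len(df1))  # integers in 1..5 (high is exclusive)

# Non-consecutive values drawn with replacement from an explicit list
df1["randChoiceCol"] = rng.choice([1, 9, 20], size=len(df1))

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(42)
same = rng_again.integers(1, 6, size=len(df1))
```

Passing the generator around explicitly can make it easier to see which parts of a script depend on which seed.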
An option that doesn't require an additional import for numpy:
df1['randNumCol'] = pd.Series(range(1, 6)).sample(len(df1), replace=True).array
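For reproducibility with the pandas-only approach, Series.sample accepts a random_state argument. A minimal sketch, assuming a hypothetical df1 and sizing the sample to the frame's length:

```python
import pandas as pd

df1 = pd.DataFrame({"x": range(12)})  # hypothetical frame for illustration

values = pd.Series(range(1, 6))  # candidate values 1..5
# Draw len(df1) values with replacement; random_state fixes the draw
sampled = values.sample(len(df1), replace=True, random_state=42)

# to_numpy() drops the sampled index so assignment is positional, not index-aligned
df1["randNumCol"] = sampled.to_numpy()
```

Dropping the index matters here: assigning the sampled Series directly would align on its shuffled index labels rather than on row position.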