Randomly insert NA values in a pandas DataFrame
Here's a way to clear exactly 10% of cells (or rather, as close to 10% as can be achieved with the existing data frame's size).
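import random
import numpy as np
# Every (row, col) position in the data frame
ix = [(row, col) for row in range(df.shape[0]) for col in range(df.shape[1])]
# Pick 10% of the positions without replacement and blank each one in place
for row, col in random.sample(ix, int(round(.1*len(ix)))):
    df.iat[row, col] = np.nan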
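If the frame is large, the Python-level loop can be slow. A vectorized variant of the same idea might look like the sketch below. The function name clear_exact_fraction and its frac and seed parameters are made up for illustration, and it returns a new frame rather than modifying df in place.

```python
import numpy as np
import pandas as pd

def clear_exact_fraction(df, frac=0.1, seed=None):
    """Return a copy of df with round(frac * df.size) cells set to NaN.

    Hypothetical helper for illustration, not part of pandas.
    """
    rng = np.random.default_rng(seed)
    n_cells = df.size
    n_clear = int(round(frac * n_cells))
    # Choose distinct flat cell positions, then build a boolean mask from them
    flat = rng.choice(n_cells, size=n_clear, replace=False)
    mask = np.zeros(n_cells, dtype=bool)
    mask[flat] = True
    # DataFrame.mask replaces cells where the condition is True with NaN
    return df.mask(mask.reshape(df.shape))

# Example: a 20 x 10 frame has 200 cells, so 20 of them are cleared
example = pd.DataFrame(np.arange(200, dtype=float).reshape(20, 10))
cleared = clear_exact_fraction(example, frac=0.1, seed=0)
n_missing = int(cleared.isna().sum().sum())
```

Passing a seed makes the choice of cells repeatable, which helps when you need the same missing-data pattern across runs.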
Here's a way to clear cells independently with a per-cell probability of 10%.
df = df.mask(np.random.random(df.shape) < .1)
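With this approach the number of cleared cells is itself random, so the observed fraction will only be near 10%. A small check, assuming a made-up 1000 x 5 frame and a seeded generator so the run is repeatable:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seed chosen arbitrarily for the example
demo = pd.DataFrame(rng.normal(size=(1000, 5)), columns=list("abcde"))

# Each cell is cleared independently with probability 0.1
masked = demo.mask(rng.random(demo.shape) < 0.1)

# Fraction of cells that ended up missing; close to 0.1 but rarely exactly 0.1
frac_missing = float(masked.isna().to_numpy().mean())
print(frac_missing)
```

Note that mask returns a new frame, so the original is untouched unless you assign the result back.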
You can also iterate over the data frame's columns and, for each column,
assign NaN to the rows selected by the pandas.DataFrame.sample() method.
The code is as follows.
import numpy as np
# Draw a fresh 10% of the rows for each column and set those cells to NaN
for col in df.columns:
    df.loc[df.sample(frac=0.1).index, col] = np.nan
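One thing worth knowing about the per-column loop is that every column ends up with the same number of missing values, since sample(frac=0.1) always returns the same number of rows. It also selects by index label, so it assumes the index labels are unique. A small sketch on a made-up 50 x 4 frame:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.ones((50, 4)), columns=list("abcd"))

for col in frame.columns:
    # 10% of 50 rows is 5 rows, drawn independently for each column
    rows = frame.sample(frac=0.1).index
    frame.loc[rows, col] = np.nan

# Missing values per column
per_column = frame.isna().sum()
print(per_column)
```

If the index has duplicate labels, .loc would match every row carrying a sampled label, so more than 10% of a column could be cleared.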