How do I add random `NA`s into a data frame
Return x within your function:
> df <- apply(df, 2, function(x) {x[sample(seq_len(n), floor(n/10))] <- NA; x})
> tail(df)
id age sex
[45,] "45" "41" NA
[46,] "46" NA "f"
[47,] "47" "38" "f"
[48,] "48" "32" "f"
[49,] "49" "53" NA
[50,] "50" "74" "f"
apply() returns an array (here a character matrix, as the quoted values above show), so all columns are coerced to the same type. To keep the original column types, you could use this instead:
df[,-1] <- do.call(cbind.data.frame,
  lapply(df[,-1], function(x) {
    x[sample(seq_len(n), floor(n/10))] <- NA
    x
  })
)
Or use a for loop:
for (i in seq_along(df[,-1])+1) {
  is.na(df[sample(seq_len(n), floor(n/10)),i]) <- TRUE
}
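To see the coercion for yourself, here is a minimal sketch. The data frame is a made-up stand-in for the one in the question (the columns id, age and sex and n = 50 are my assumptions based on the output shown above), so adapt it to your own data.

```r
# Hypothetical data frame modelled on the output shown above (assumed, not the asker's data).
set.seed(1)
n <- 50
df <- data.frame(id  = seq_len(n),
                 age = sample(18:80, n, replace = TRUE),
                 sex = sample(c("m", "f"), n, replace = TRUE),
                 stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix first,
# so a mix of numeric and character columns ends up as all character.
m <- apply(df, 2, function(x) { x[sample(seq_len(n), floor(n / 10))] <- NA; x })
is.matrix(m)
typeof(m)

# lapply() works column by column, so each column keeps its own type.
df[-1] <- lapply(df[-1], function(x) { x[sample(seq_len(n), floor(n / 10))] <- NA; x })
sapply(df, class)      # column classes after inserting the NAs
colSums(is.na(df))     # number of NAs per column
```

Here I assign the list from lapply() straight back with df[-1] <- ..., which is just a shorter variant of the do.call(cbind.data.frame, ...) form above.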
I think you need to return the x value from the function:
apply(subset(df,select=-id), 2, function(x)
  {x[sample(seq_len(n),floor(n/10))] <- NA; x})
but you also need to assign this back to the relevant subset of the data frame (and subset(...) <- ... doesn't work):
idCol <- names(df)=="id"
df[,!idCol] <- apply(df[,!idCol], 2, function(x)
  {x[sample(seq_len(n),floor(n/10))] <- NA; x})
(If you have only a single non-ID column, you'll need df[,!idCol,drop=FALSE].)
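A rough illustration of why drop=FALSE matters in the single-column case. The data frame df1 is hypothetical (one id column plus one age column, n = 50), purely for demonstration.

```r
# Hypothetical data frame with a single non-ID column (assumed for illustration).
set.seed(1)
n <- 50
df1 <- data.frame(id = seq_len(n), age = sample(18:80, n, replace = TRUE))
idCol <- names(df1) == "id"

# Without drop = FALSE the selection collapses to a plain vector with no dimensions,
# and apply() stops with an error when given an object without dimensions.
is.null(dim(df1[, !idCol]))

# With drop = FALSE the selection stays a one-column data frame, so apply() works.
df1[, !idCol] <- apply(df1[, !idCol, drop = FALSE], 2, function(x) {
  x[sample(seq_len(n), floor(n / 10))] <- NA
  x
})
sum(is.na(df1$age))    # count of NAs now in the age column
```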