How do I add random `NA`s into a data frame?

Return x within your function:

> df <- apply(df, 2, function(x) {x[sample(1:n, floor(n/10))] <- NA; x})
> tail(df)
      id   age  sex
[45,] "45" "41" NA 
[46,] "46" NA   "f"
[47,] "47" "38" "f"
[48,] "48" "32" "f"
[49,] "49" "53" NA 
[50,] "50" "74" "f"

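`apply` returns an array: it first coerces the data frame to a matrix, so every column ends up with the same type (character here, which is why the values above are quoted). To keep the original column types, you could use this instead: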

df[,-1] <- do.call(cbind.data.frame, 
                   lapply(df[,-1], function(x) {
                     x[sample(c(1:n),floor(n/10))]<-NA
                     x
                   })
                   )
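To make the difference concrete, here is a minimal, self-contained sketch. The example data is an assumption reconstructed from the output above (50 rows with `id`, `age` and `sex`), and assigning the `lapply` result straight back is a slightly simpler variant of the `do.call(cbind.data.frame, ...)` call, not necessarily what the answer intended.

```r
# Example data: an assumption, made to resemble the output shown above
set.seed(1)
n <- 50
df <- data.frame(id  = seq_len(n),
                 age = sample(20:90, n, replace = TRUE),
                 sex = sample(c("m", "f"), n, replace = TRUE),
                 stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix first, so with one
# character column everything becomes character
m <- apply(df, 2, function(x) x)
is.matrix(m)
typeof(m)

# lapply() works column by column, so each column keeps its own class
df[, -1] <- lapply(df[, -1], function(x) {
  x[sample(seq_len(n), floor(n / 10))] <- NA
  x
})
sapply(df, class)    # classes per column after the replacement
colSums(is.na(df))   # number of NAs per column
```

Because `sample()` draws without replacement by default, each non-ID column should end up with exactly `floor(n/10)` missing values.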

Or use a for loop:

for (i in seq_along(df[,-1])+1) {
  is.na(df[sample(seq_len(n), floor(n/10)),i]) <- TRUE
}
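If you need this more than once, the loop could be wrapped in a small function. This is only a sketch: `add_random_na`, `cols` and `frac` are hypothetical names, not anything from base R, and it uses plain `<- NA` assignment rather than `is.na<-`.

```r
# Hypothetical helper: set a fraction of each selected column to NA,
# leaving all other columns untouched
add_random_na <- function(data, cols, frac = 0.1) {
  n <- nrow(data)
  k <- floor(n * frac)               # number of NAs per column
  for (col in cols) {
    rows <- sample(seq_len(n), k)    # fresh set of rows for each column
    data[rows, col] <- NA
  }
  data
}

# Example data (an assumption), with sex stored as a factor
set.seed(42)
df <- data.frame(id  = 1:50,
                 age = sample(20:90, 50, replace = TRUE),
                 sex = factor(sample(c("m", "f"), 50, replace = TRUE)))

out <- add_random_na(df, cols = c("age", "sex"))
colSums(is.na(out))   # NAs per column
sapply(out, class)    # column classes are preserved
```

Since the loop assigns into the data frame one column at a time, nothing is ever coerced to a common type.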

I think you need to return the x value from the function:

apply(subset(df, select = -id), 2, function(x) {
  x[sample(1:n, floor(n/10))] <- NA
  x
})

But you also need to assign the result back to the relevant subset of the data frame (and `subset(...) <- ...` doesn't work):

idCol <- names(df)=="id"
df[,!idCol] <- apply(df[,!idCol], 2, function(x) 
     {x[sample(1:n,floor(n/10))] <- NA; x})

(If you have only a single non-ID column, you'll need `df[,!idCol,drop=FALSE]`.)
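A short sketch of the single-column case, assuming a data frame with just `id` and `age` (the data is made up for illustration):

```r
# Example data: an assumption, with only one non-ID column
set.seed(7)
n <- 50
df <- data.frame(id = seq_len(n), age = sample(20:90, n, replace = TRUE))
idCol <- names(df) == "id"

# With a single remaining column, df[, !idCol] drops to a plain vector,
# which has no dim and so cannot be passed to apply()
is.null(dim(df[, !idCol]))

# drop = FALSE keeps it as a one-column data frame
dim(df[, !idCol, drop = FALSE])

df[, !idCol] <- apply(df[, !idCol, drop = FALSE], 2, function(x) {
  x[sample(seq_len(n), floor(n / 10))] <- NA
  x
})
colSums(is.na(df))   # NAs per column
```

Note that when the non-ID columns have mixed types (such as `age` and `sex`), this route still goes through a character matrix, as the first answer points out, so numeric columns would come back as character.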

Tags: R, Apply, Dataframe