Assigning categorical values to NAs randomly or proportionally
We can use ifelse and is.na to determine whether there are NAs, and then use sample to randomly select female or male.
# draw one value per row; sample(..., 1) would give every NA the same value
df$gender <- ifelse(is.na(df$gender), sample(c("female", "male"), nrow(df), replace = TRUE), df$gender)
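One detail to keep in mind with ifelse(): it evaluates its yes argument once and recycles the result, so sample(c("female", "male"), 1) produces a single draw that every NA receives. A small sketch with hypothetical data that compares that with drawing one value per row:

```r
set.seed(1)  # reproducible draws
# hypothetical example data with several missing values
df <- data.frame(gender = c("female", NA, NA, NA, "male", NA, NA, NA),
                 stringsAsFactors = FALSE)

# a single draw, recycled: every NA receives the same value
single <- ifelse(is.na(df$gender), sample(c("female", "male"), 1), df$gender)

# one draw per row: each NA gets its own independent value
per_row <- ifelse(is.na(df$gender),
                  sample(c("female", "male"), nrow(df), replace = TRUE),
                  df$gender)

single[is.na(df$gender)]
per_row[is.na(df$gender)]
```

The non-missing rows are untouched in both cases; only the way the replacements are drawn differs.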
How about this:
> df <- structure(list(gender = c("female", "male", NA, NA, "male", "male",
+ "male"),
+ Division = c("South Atlantic", "East North Central",
+ "Pacific", "East North Central", "South Atlantic", "South Atlantic",
+ "Pacific"),
+ Median = c(57036.6262, 39917, 94060.208, 89822.1538,
+ 107683.9118, 56149.3217, 46237.265),
+ first_name = c("Marilyn", "Jeffery", "Yashvir", "Deyou", "John", "Jose", "Daniel")),
+ row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
>
> Gender <- rbinom(length(df$gender), 1, 0.52)
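> # levels = c(0, 1) keeps both labels even if rbinom() returns only 0s or only 1s
> Gender <- factor(Gender, levels = c(0, 1), labels = c("female", "male"))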
>
> df$gender[is.na(df$gender)] <- as.character(Gender[is.na(df$gender)])
>
> df$gender
[1] "female" "male" "female" "female" "male" "male" "male"
>
That is random with a given probability: each missing value becomes "male" with probability 0.52 and "female" otherwise. You could also consider imputing values using nearest neighbors, hot-deck imputation, or similar methods.
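Hot-deck imputation fills each gap with a value copied from an observed "donor" record, usually one that is similar on other variables. Below is a minimal random hot-deck sketch. It assumes Division is a reasonable grouping variable; that choice, the hypothetical helper name hot_deck(), and the fallback to all observed values when a group has no donors are illustrative assumptions, not a standard implementation.

```r
# Hypothetical helper for random hot-deck imputation: each NA in `x` is
# replaced by a value copied from a randomly chosen observed donor in the
# same group; if that group has no observed values, any observed value is used.
hot_deck <- function(x, group) {
  observed <- !is.na(x)  # computed once, so imputed values never act as donors
  if (!any(observed)) stop("x has no observed values to donate")
  for (i in which(!observed)) {
    # which() drops NA comparisons, e.g. when group[i] is itself NA
    donors <- x[which(observed & group == group[i])]
    if (length(donors) == 0) donors <- x[observed]
    # pick one donor by position
    x[i] <- donors[sample.int(length(donors), 1)]
  }
  x
}

df <- data.frame(
  gender   = c("female", "male", NA, NA, "male", "male", "male"),
  Division = c("South Atlantic", "East North Central", "Pacific",
               "East North Central", "South Atlantic", "South Atlantic",
               "Pacific"),
  stringsAsFactors = FALSE
)
set.seed(1)
df$gender <- hot_deck(df$gender, df$Division)
df$gender
```

In this small example each missing row has exactly one observed donor in its Division (row 7 for Pacific, row 2 for East North Central), so both gaps are filled with "male"; with larger groups the donor is chosen at random.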
Hope it helps.
Just assign:
df$gender[is.na(df$gender)] <- sample(c("female", "male"), nrow(df), replace = TRUE)[is.na(df$gender)]
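The indexed assignment can be extended to impute proportionally, as the title asks. A minimal sketch, assuming a character column: it draws only as many values as there are NAs and weights each category by its observed frequency, computed with table() (which ignores NA by default). The helper name impute_proportional() is hypothetical.

```r
# Hypothetical helper: fill NAs in a character vector by sampling categories
# in proportion to how often they occur among the non-missing values.
impute_proportional <- function(x) {
  missing <- is.na(x)
  if (!any(missing)) return(x)
  freq <- table(x)  # counts of observed categories; NA is excluded
  if (length(freq) == 0) stop("x has no observed values to sample from")
  # draw category indices, weighted by the observed counts
  draws <- sample.int(length(freq), sum(missing), replace = TRUE,
                      prob = as.vector(freq))
  x[missing] <- names(freq)[draws]
  x
}

set.seed(2024)
gender <- c("female", "male", NA, NA, "male", "male", "male")
table(gender)  # observed counts: female 1, male 4
imputed <- impute_proportional(gender)
imputed
```

With these data each NA becomes "male" with probability 4/5, matching the observed split, rather than with a fixed probability chosen by hand.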