How to replace NA with mean by group / subset?
Before answering, I should say that I am a beginner in R, so please let me know if you think my answer is wrong.
Code:
DF[is.na(DF$length), "length"] <- mean(DF$length, na.rm = TRUE)
Apply the same to width.
DF stands for the name of the data.frame. Note that this fills every NA with the overall column mean, not the mean of its own group.
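Since the question asks for the mean by group, here is a minimal base R sketch of a grouped version using ave(). The toy data and the helper name fill_group_mean are assumptions made up for illustration; they are not part of the original answer.

```r
# Toy data, assumed for illustration only
DF <- data.frame(
  taxa   = c("collembola", "mite", "mite", "collembola", "collembola", "mite"),
  length = c(2.1, 0.9, 1.1, NA, 1.5, NA),
  width  = c(0.9, 0.7, 0.8, NA, 0.5, NA)
)

# Hypothetical helper: ave() applies FUN within each level of g and
# returns the result in the original row order
fill_group_mean <- function(x, g) {
  ave(x, g, FUN = function(v) replace(v, is.na(v), mean(v, na.rm = TRUE)))
}

DF$length <- fill_group_mean(DF$length, DF$taxa)
DF$width  <- fill_group_mean(DF$width, DF$taxa)
DF
```

Because ave() keeps the row order, no reordering step is needed afterwards.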
Thanks, Parthi
Several other options:
1) with data.table's nafill function (available since data.table 1.12.4)
library(data.table)
setDT(dat)
cols <- c("length", "width")
dat[, (cols) := lapply(.SD, function(x) nafill(x, type = "const", fill = mean(x, na.rm = TRUE))),
    by = taxa,
    .SDcols = cols][]
2) with zoo's na.aggregate function
library(zoo)
library(data.table)
setDT(dat)
cols <- c("length", "width")
dat[, (cols) := lapply(.SD, na.aggregate),
    by = taxa,
    .SDcols = cols][]
The default function for na.aggregate is mean; if you want to use another function, specify it with the FUN parameter (example: FUN = median). See also the help file with ?na.aggregate.
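To make the FUN parameter concrete, here is a small sketch on a plain vector. The vector x is made up for illustration, and the example assumes the zoo package is installed.

```r
library(zoo)

# Made-up vector with one missing value and one large observation
x <- c(1, 2, 10, NA)

filled_mean   <- na.aggregate(x)                # NA replaced by the mean of 1, 2, 10
filled_median <- na.aggregate(x, FUN = median)  # NA replaced by the median of 1, 2, 10

filled_mean
filled_median
```

With skewed data like this the two fills differ noticeably, which is the usual reason to switch from the mean to the median.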
Of course you can also use this in the tidyverse:
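library(dplyr)
library(zoo)
cols <- c("length", "width")
# across() supersedes mutate_at() in dplyr 1.0.0 and later
dat %>%
  group_by(taxa) %>%
  mutate(across(all_of(cols), na.aggregate))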
Not my own technique; I saw it on the boards a while back:
dat <- read.table(text = "id taxa length width
101 collembola 2.1 0.9
102 mite 0.9 0.7
103 mite 1.1 0.8
104 collembola NA NA
105 collembola 1.5 0.5
106 mite NA NA", header=TRUE)
library(plyr)
impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
dat2 <- ddply(dat, ~ taxa, transform, length = impute.mean(length),
              width = impute.mean(width))
dat2[order(dat2$id), ] # plyr orders by group, so we have to reorder
Edit: a non-plyr approach with a for loop:
for (i in which(sapply(dat, is.numeric))) {
  for (j in which(is.na(dat[, i]))) {
    dat[j, i] <- mean(dat[dat[, "taxa"] == dat[j, "taxa"], i], na.rm = TRUE)
  }
}
Edit, many moons later: here is a data.table and a dplyr approach, both reusing impute.mean from above:
data.table
library(data.table)
setDT(dat)
dat[, length := impute.mean(length), by = taxa][,
    width := impute.mean(width), by = taxa]
dplyr
library(dplyr)
dat %>%
  group_by(taxa) %>%
  mutate(
    length = impute.mean(length),
    width = impute.mean(width)
  )
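Whichever approach you pick, a quick sanity check is that no NA remains and that each group's mean is unchanged by the fill. A rough base R sketch of that check follows; it uses split() and unsplit() as a stand-in for any of the grouped approaches, and the variable names before, filled and after are just illustrative.

```r
dat <- read.table(text = "id taxa length width
101 collembola 2.1 0.9
102 mite 0.9 0.7
103 mite 1.1 0.8
104 collembola NA NA
105 collembola 1.5 0.5
106 mite NA NA", header = TRUE)

impute.mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Group means of the observed values before filling
before <- tapply(dat$length, dat$taxa, mean, na.rm = TRUE)

# Fill within each group, then put the pieces back in the original row order
filled <- unsplit(lapply(split(dat$length, dat$taxa), impute.mean), dat$taxa)

# Group means after filling
after <- tapply(filled, dat$taxa, mean)

anyNA(filled)                     # FALSE: nothing left to fill
isTRUE(all.equal(before, after))  # TRUE: group means are preserved
```

Filling with the group mean keeps the group means intact, but it does shrink the within-group variance, which is worth keeping in mind before running any tests on the filled data.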