Replace NA with previous or next value, by group, using dplyr
Using @agenis's method with na.locf(), combined with purrr, you could do:
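library(purrr)  # in newer purrr releases, slice_rows() and by_slice() live in the purrrlyr package
library(zoo)
ps1 %>%
  slice_rows("userID") %>%
  by_slice(function(x) {
    # fill down, then up; na.rm = FALSE keeps rows that still hold NAs after a pass
    na.locf(na.locf(x, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE)
  }, .collate = "rows")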
Alternatively, using fill() from tidyr:
library(dplyr)  # group_by() and %>%
library(tidyr)  # fill() is part of tidyr
ps1 %>%
  group_by(userID) %>%
  # fill(color, age, gender) %>%  # default direction is "down"
  fill(color, age, gender, .direction = "downup")
Which gives you:
Source: local data frame [9 x 4]
Groups: userID [3]

  userID  color    age gender
   <dbl> <fctr> <fctr> <fctr>
1     21   blue   3yrs      F
2     21   blue   2yrs      F
3     21    red   2yrs      M
4     22   blue   3yrs      F
5     22   blue   3yrs      F
6     22   blue   3yrs      F
7     23    red   4yrs      F
8     23    red   4yrs      F
9     23   gold   4yrs      F
I wrote this function, which fills each NA with the previous non-NA value (leading NAs take the first non-NA value). It is definitely faster than fill and probably faster than na.locf:
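fill_NA <- function(x) {
  # positions of the non-NA values, plus a sentinel one past the end
  which.na <- c(which(!is.na(x)), length(x) + 1)
  values <- na.omit(x)
  # if x starts with NA, reuse the first non-NA value for the leading run
  if (which.na[1] != 1) {
    which.na <- c(1, which.na)
    values <- c(values[1], values)
  }
  # each value is repeated until the next non-NA position
  diffs <- diff(which.na)
  return(rep(values, times = diffs))
}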
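As a rough sketch of how fill_NA could be applied per group without extra packages, the example below splits one column by userID, fills each piece, and puts the pieces back in the original row order. The data frame `demo` and the column `color_filled` are made up for illustration, and the function body is repeated only so the snippet runs on its own.

```r
# fill_NA as defined above, repeated so this snippet is self-contained
fill_NA <- function(x) {
  which.na <- c(which(!is.na(x)), length(x) + 1)
  values <- na.omit(x)
  if (which.na[1] != 1) {
    which.na <- c(1, which.na)
    values <- c(values[1], values)
  }
  diffs <- diff(which.na)
  return(rep(values, times = diffs))
}

# Hypothetical data: NAs at the start, middle and end of a group
demo <- data.frame(
  userID = c(21, 21, 21, 22, 22, 22),
  color  = c(NA, "blue", NA, "red", NA, NA),
  stringsAsFactors = FALSE
)

# split the column by group, fill each piece, then restore the original order
filled <- lapply(split(demo$color, demo$userID), fill_NA)
demo$color_filled <- unsplit(filled, demo$userID)

print(demo)
```

Note that a group consisting only of NAs stays all NA, since there is no value to carry over.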
Using zoo::na.locf directly on the whole data.frame would fill the NAs regardless of the userID groups. Unfortunately, dplyr's grouping has no effect on the na.locf function, which is why I went with a split:
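library(dplyr); library(zoo)
ps1 %>% split(ps1$userID) %>%
  # fill down, then up; na.rm = FALSE keeps rows that still hold NAs after a pass
  lapply(function(x) {na.locf(na.locf(x, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE)}) %>%
  do.call(rbind, .)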
####      userID color  age gender
#### 21.1     21  blue 3yrs      F
#### 21.2     21  blue 2yrs      F
#### 21.3     21   red 2yrs      M
#### 22.4     22  blue 3yrs      F
#### 22.5     22  blue 3yrs      F
#### 22.6     22  blue 3yrs      F
#### 23.7     23   red 4yrs      F
#### 23.8     23   red 4yrs      F
#### 23.9     23  gold 4yrs      F
This first splits the data into 3 data.frames, one per userID. The anonymous function in lapply then applies a first pass of imputation (downwards) followed by a second pass (upwards), and finally rbind brings the data.frames back together, giving the expected output.
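To make the individual steps easier to follow, here is a small sketch that runs them one at a time on made-up data. The data frame `ps_demo` and the intermediate names (`pieces`, `down`, `both`) are hypothetical and only serve the illustration; na.rm = FALSE is passed so that no rows are dropped while a group still has unfilled NAs.

```r
library(zoo)

# Hypothetical data: two users, with NAs at the start and in the middle of a group
ps_demo <- data.frame(
  userID = c(21, 21, 21, 22, 22, 22),
  color  = c(NA, "blue", NA, "red", NA, NA),
  age    = c("3yrs", NA, NA, NA, "4yrs", NA),
  stringsAsFactors = FALSE
)

# Step 1: one data.frame per userID
pieces <- split(ps_demo, ps_demo$userID)

# Step 2: first pass, carry the last observation forward (downwards)
down <- lapply(pieces, na.locf, na.rm = FALSE)

# Step 3: second pass, carry the next observation backward (upwards)
both <- lapply(down, na.locf, na.rm = FALSE, fromLast = TRUE)

# Step 4: stack the per-user data.frames back together
result <- do.call(rbind, both)

print(result)
```

Because each pass only ever sees one user's rows, a value can never leak from one userID into the next.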