Replace NA with previous or next value, by group, using dplyr

Using @agenis method with na.locf() combined with purrr, you could do:

library(purrr)
library(zoo)

ps1 %>% 
  slice_rows("userID") %>% 
  by_slice(function(x) { 
    na.locf(na.locf(x), fromLast=T) }, 
    .collate = "rows")

library(tidyr) #fill is part of tidyr

ps1 %>% 
  group_by(userID) %>% 
  #fill(color, age, gender) %>% #default direction down
  fill(color, age, gender, .direction = "downup")

Which gives you:

Source: local data frame [9 x 4]
Groups: userID [3]

  userID  color    age gender
   <dbl> <fctr> <fctr> <fctr>
1     21   blue   3yrs      F
2     21   blue   2yrs      F
3     21    red   2yrs      M
4     22   blue   3yrs      F
5     22   blue   3yrs      F
6     22   blue   3yrs      F
7     23    red   4yrs      F
8     23    red   4yrs      F
9     23   gold   4yrs      F

I wrote this function and it is definitely faster than fill and probably faster than na.locf:

fill_NA <- function(x) {
  which.na <- c(which(!is.na(x)), length(x) + 1)
  values <- na.omit(x)

  if (which.na[1] != 1) {
    which.na <- c(1, which.na)
    values <- c(values[1], values)
  }

  diffs <- diff(which.na)
  return(rep(values, times = diffs))
}

Using zoo::na.locf directly on the whole data.frame would fill the NA regardless of the userID groups. Package dplyr's grouping has unfortunately no effect on na.locf function, that's why I went with a split:

library(dplyr); library(zoo)
ps1 %>% split(ps1$userID) %>% 
  lapply(function(x) {na.locf(na.locf(x), fromLast=T)}) %>% 
  do.call(rbind, .)
####      userID color  age gender
#### 21.1     21  blue 3yrs      F
#### 21.2     21  blue 2yrs      F
#### 21.3     21   red 2yrs      M
#### 22.4     22  blue 3yrs      F
#### 22.5     22  blue 3yrs      F
#### 22.6     22  blue 3yrs      F
#### 23.7     23   red 4yrs      F
#### 23.8     23   red 4yrs      F
#### 23.9     23  gold 4yrs      F

What it does is that it first splits the data into 3 data.frames, then I apply a first pass of imputation (downwards), then upwards with the anonymous function in lapply, and eventually use rbind to bring the data.frames back together. You have the expected output.

Replace NA with previous or next value, by group, using dplyr

Tags:

R

Missing Data

Zoo

Dplyr

Related

Recent Posts