Transform NA values based on first registration and nearest values

Here is a way using na.approx from the zoo package and apply with MARGIN = 1 (so this is probably not very efficient, but it gets the job done).

library(zoo)
df1 <- as.data.frame(t(apply(dat, 1, na.approx, method = "constant", f = .5, na.rm = FALSE)))

This results in

df1
#   V1  V2  V3   V4  V5
#A  NA 0.1 0.2 0.25 0.3
#B 0.1 0.2 0.2 0.30 0.2
#C  NA  NA  NA   NA 0.3
#E  NA  NA 0.1 0.20 0.1

Replace the remaining NAs with 0 and restore the column names.

df1[is.na(df1)] <- 0
names(df1) <- names(dat)
df1
#  Date_1 Date_2 Date_3 Date_4 Date_5
#A    0.0    0.1    0.2   0.25    0.3
#B    0.1    0.2    0.2   0.30    0.2
#C    0.0    0.0    0.0   0.00    0.3
#E    0.0    0.0    0.1   0.20    0.1

explanation

Given a vector

x <- c(0.1, NA, NA, 0.3, 0.2)
na.approx(x)

returns x with linearly interpolated values

#[1] 0.1000000 0.1666667 0.2333333 0.3000000 0.2000000

But the OP asked for constant values, so we need the argument method = "constant", which na.approx passes on to the approx function.

na.approx(x, method = "constant") 
# [1] 0.1 0.1 0.1 0.3 0.2

But this is still not what the OP asked for, because it carries the last observation forward, whereas the OP wants the mean of the closest non-NA values on either side. Therefore we need the argument f (also from approx).

na.approx(x, method = "constant", f = .5)
# [1] 0.1 0.2 0.2 0.3 0.2 # looks good

From ?approx

f : for method = "constant" a number between 0 and 1 inclusive, indicating a compromise between left- and right-continuous step functions. If y0 and y1 are the values to the left and right of the point then the value is y0 if f == 0, y1 if f == 1, and y0*(1-f)+y1*f for intermediate values. In this way the result is right-continuous for f == 0 and left-continuous for f == 1, even for non-finite y values.
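To make the role of f concrete, here is a small sketch that calls base R's approx directly on the two known points surrounding the gap in x (positions 1 and 4). The helper name step_fill and the variable names are made up for illustration.

```r
# Known points on either side of the gap in x <- c(0.1, NA, NA, 0.3, 0.2)
known_pos <- c(1, 4)
known_val <- c(0.1, 0.3)

# Hypothetical helper: step interpolation at positions 1..4 for a given f
step_fill <- function(f) {
  approx(known_pos, known_val, xout = 1:4, method = "constant", f = f)$y
}

left  <- step_fill(0)    # gap takes the left value y0
right <- step_fill(1)    # gap takes the right value y1
mid   <- step_fill(0.5)  # gap takes y0 * (1 - f) + y1 * f, i.e. the mean

print(rbind(left, right, mid))
```

Only the two gap positions change with f; the known end points are returned as they are.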

Lastly, by default na.approx drops NAs that remain at the beginning and end of a vector after interpolation. To keep them, so that every row retains its full length, we need na.rm = FALSE.

From ?na.approx

na.rm : logical. If the result of the (spline) interpolation still results in NAs, should these be removed?
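The effect of na.rm is easiest to see on a vector with NAs at both ends. A minimal sketch, where the vector y is made up for illustration and is not part of dat:

```r
library(zoo)

y <- c(NA, 0.1, NA, 0.3, NA)  # illustrative vector with outer NAs

trimmed <- na.approx(y)                 # default na.rm = TRUE drops the outer NAs
kept    <- na.approx(y, na.rm = FALSE)  # outer NAs stay, length is preserved

print(length(trimmed))  # 3
print(length(kept))     # 5
print(kept)
```

Since apply can only assemble a matrix when every row returns a vector of the same length, keeping the length fixed with na.rm = FALSE is what allows the one-liner at the top to return a rectangular result.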

data

dat <- structure(list(Date_1 = c(NA, 0.1, NA, NA), Date_2 = c(0.1, NA, 
NA, NA), Date_3 = c(0.2, NA, NA, 0.1), Date_4 = c(NA, 0.3, NA, 
0.2), Date_5 = c(0.3, 0.2, 0.3, 0.1)), .Names = c("Date_1", "Date_2", 
"Date_3", "Date_4", "Date_5"), class = "data.frame", row.names = c("A", 
"B", "C", "E"))

EDIT

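If there are NAs in the last column (called Date_6 here; the dat shown above has only five columns), we can replace them with the last non-NA value of each row before we apply na.approx as shown above.

# assumes dat has a last column Date_6 that contains NAs
# max.col(..., ties.method = "last") gives the position of the last non-NA value in each row
dat$Date_6[is.na(dat$Date_6)] <- dat[cbind(seq_len(nrow(dat)),
                                           max.col(!is.na(dat), ties.method = "last"))][is.na(dat$Date_6)]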
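The indexing in that statement is dense, so here is a sketch that spells out each step on a made-up three-column data frame. The object toy, its values, and the intermediate variable names are assumptions for illustration only.

```r
# Made-up data: the last column (Date_3) has NAs in rows A and B
toy <- data.frame(Date_1 = c(0.1, NA, 0.4),
                  Date_2 = c(NA, 0.2, 0.5),
                  Date_3 = c(NA, NA, 0.6),
                  row.names = c("A", "B", "C"))

# Column position of the last non-NA value in each row
last_col <- max.col(!is.na(toy), ties.method = "last")

# Matrix indexing with (row, column) pairs picks one value per row
last_val <- toy[cbind(seq_len(nrow(toy)), last_col)]

# Overwrite only the missing entries of the last column
missing_last <- is.na(toy$Date_3)
toy$Date_3[missing_last] <- last_val[missing_last]

print(toy)
```

A row that is entirely NA would still end up with NA in the last column, because there is no earlier value to copy.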

Here is another possible answer, using na.locf from the zoo package. Edit: apply is actually not required. This solution also fills in the last observed value when the value in the final column is missing.

# create the dataframe
Date1 <- c(NA,.1,NA,NA)
Date2 <- c(.1, NA,NA,NA)
Date3 <- c(.2,NA,NA,.1)
Date4 <- c(NA,.3,NA,.2)
Date5 <- c(.3,.2,.3,.1)
Date6 <- c(.1,NA,NA,NA)
df <- as.data.frame(cbind(Date1,Date2,Date3,Date4,Date5,Date6))
rownames(df) <- c('A','B','C','D')

> df
  Date1 Date2 Date3 Date4 Date5 Date6
A    NA   0.1   0.2    NA   0.3   0.1
B   0.1    NA    NA   0.3   0.2    NA
C    NA    NA    NA    NA   0.3    NA
D    NA    NA   0.1   0.2   0.1    NA



# Load library
library(zoo)
df2 <- t(na.locf(t(df), na.rm = FALSE)) # last observation carried forward
df3 <- t(na.locf(t(df), na.rm = FALSE, fromLast = TRUE)) # next observation carried backward

df4 <- (df2 + df3) / 2 # mean of both matrices; NA wherever either fill is NA

df4 <- t(na.locf(t(df4), na.rm = FALSE)) # trailing NAs take the last available value
df4[is.na(df4)] <- 0 # remaining (leading) NAs become 0

  Date1 Date2 Date3 Date4 Date5 Date6
A   0.0   0.1   0.2  0.25   0.3   0.1
B   0.1   0.2   0.2  0.30   0.2   0.2
C   0.0   0.0   0.0  0.00   0.3   0.3
D   0.0   0.0   0.1  0.20   0.1   0.1
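To see why averaging the two fills gives the desired result, it may help to trace a single row. The sketch below walks row B of df through the same steps as a plain vector; the variable names are illustrative.

```r
library(zoo)

row_b <- c(0.1, NA, NA, 0.3, 0.2, NA)  # row B of df

fwd <- na.locf(row_b, na.rm = FALSE)                   # 0.1 0.1 0.1 0.3 0.2 0.2
bwd <- na.locf(row_b, na.rm = FALSE, fromLast = TRUE)  # 0.1 0.3 0.3 0.3 0.2 NA
avg <- (fwd + bwd) / 2                                 # NA wherever either fill is NA
res <- na.locf(avg, na.rm = FALSE)                     # trailing NA takes the last value
res[is.na(res)] <- 0                                   # no leading NAs in this row

print(res)
```

Inside a gap the forward fill supplies the left neighbour and the backward fill supplies the right neighbour, so their mean is the value the OP asked for. Known values are unchanged because both fills agree there.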
