Merge nearest date, and related variables from a another dataframe by group
We can also do this by one-liner with dplyr
.
library(dplyr)
left_join(df1, df2, by = "ID") %>%
mutate(dateDiff = abs(dateTarget.x - dateTarget.y)) %>%
group_by(ID, dateTarget.x) %>%
filter(dateDiff == min(dateDiff))
Here is the solution based on the base package:
z <- lapply(intersect(df1$ID,df2$ID),function(id) {
d1 <- subset(df1,ID==id)
d2 <- subset(df2,ID==id)
d1$indices <- sapply(d1$dateTarget,function(d) which.min(abs(d2$dateTarget - d)))
d2$indices <- 1:nrow(d2)
merge(d1,d2,by=c('ID','indices'))
})
z2 <- do.call(rbind,z)
z2$indices <- NULL
print(z2)
# ID dateTarget.x Value dateTarget.y ValueMatch
# 1 3 2015-11-14 47 2015-07-06 48
# 2 3 2015-12-08 98 2015-07-06 48
# 3 3 2015-02-22 52 2015-03-09 94
# 4 3 2014-11-17 68 2014-12-15 95
# 5 3 2013-05-30 91 2013-04-01 85
# 6 1 2013-11-04 70 2014-02-21 35
# 7 1 2014-12-29 18 2014-12-06 88
# 8 2 2013-01-14 52 2013-04-08 77
# 9 2 2015-07-29 97 2015-08-01 68
# 10 2 2015-06-15 98 2015-08-01 68
Using data.table
, simple and elegant solution:
library(data.table)
setDT(df1)
setDT(df2)
setkey(df2, ID, dateTarget)[, dateMatch:=dateTarget]
df2[df1, roll='nearest']
ID dateTarget ValueMatch dateMatch Value
1: 3 2015-11-14 48 2015-07-06 47
2: 3 2015-02-22 94 2015-03-09 52
3: 1 2014-12-29 88 2014-12-06 18
4: 3 2015-12-08 48 2015-07-06 98
5: 2 2013-01-14 77 2013-04-08 52
6: 2 2015-07-29 68 2015-08-01 97
7: 3 2013-05-30 85 2013-04-01 91
8: 1 2013-11-04 35 2014-02-21 70
9: 2 2015-06-15 68 2015-08-01 98
10: 3 2014-11-17 95 2014-12-15 68
Here's my take using dplyr
, based on the accepted answer. I wanted to have a bit more freedom on the grouping column.
match_by_group_date <- function(df1, df2, grp, datecol) {
grp1 <- df1 %>% pull({{grp}}) %>% unique()
grp2 <- df2 %>% pull({{grp}}) %>% unique()
li <-
lapply(intersect(grp1, grp2), function(tt) {
d1 <- filter(df1, {{grp}}== tt)
d2 <- filter(df2, {{grp}}==tt) %>% mutate(indices = 1:n())
d2_date <- d2 %>% pull({{datecol}}) %>% as.POSIXct()
print(d2_date)
d1 <- mutate(d1, indices = map_dbl({{datecol}}, function(d) which.min(abs(d2_date - as.POSIXct(d)))))
left_join(d1,d2, by=c(quo_name(enquo(grp)), "indices"))
})
# bind rows
return(bind_rows(li))
}
Update
As of 2022, there is a join_by()
in the works. See dplyr
dev docs here
https://dplyr.tidyverse.org/dev/reference/join_by.html
For now I will continue using this method, or data.table
. But join_by()
will probably get stable enough that is fast and preferred.