More effective merging of matched column with duplicates in data.table
Well, I would use a merge
like the following, but I am not sure that it is simpler than what you have already done.
merge(y, unique(x[, c('let','num')]), all.x=TRUE, by='let')
The only improvements that I could think of is that
You could skip the
setkey(x, let)
partYou could also update
y
by reference (rather than creating a copy using<-
and then assigning back toy
)
If you are using the current stable version version of data.table
(v <= 1.9.4) you will have to use allow.cartesian = TRUE
setkey(y,let)
y[x[!duplicated(let)], num := i.num, allow.cartesian = TRUE][]
You could alternatively use unique
instead of duplicated
(they both have data.table
methods)
y[unique(x, by = "let"), num := i.num, allow.cartesian = TRUE]
Here's another possibility using the new .EACHI
method, although there's no need for the use of by=.EACHI
here. I've shown you just to expose this feature for you. Have a look at this post for a detailed explanation of what this does and when it's useful.
y[x, num := unique(i.num), by = .EACHI, allow.cartesian = TRUE]
Edit: (Thanks to @Arun for pointing this out)
We shouldn't need allow.cartesian
argument here, as there are no duplicates in i
. In fact, it's a bug, #742 that has been fixed in the current development version (1.9.5). So you just need to do:
y[x[!duplicated(let)], num := i.num, on = "let"]
# or
y[unique(x, by = "let"), num := i.num, on = "let"]
# or (though not recommended in this specific case)
y[x, num := unique(i.num), by = .EACHI, on = "let"]