More effective merging of matched column with duplicates in data.table

Well, I would use a merge like the following, but I am not sure that it is simpler than what you have already done.

merge(y, unique(x[, c('let','num')]), all.x=TRUE, by='let')

The only improvements that I could think of is that

You could skip the setkey(x, let) part
You could also update y by reference (rather than creating a copy using <- and then assigning back to y)

If you are using the current stable version version of data.table (v <= 1.9.4) you will have to use allow.cartesian = TRUE

setkey(y,let)
y[x[!duplicated(let)], num := i.num, allow.cartesian = TRUE][]

You could alternatively use unique instead of duplicated (they both have data.table methods)

y[unique(x, by = "let"), num := i.num, allow.cartesian = TRUE]

Here's another possibility using the new .EACHI method, although there's no need for the use of by=.EACHI here. I've shown you just to expose this feature for you. Have a look at this post for a detailed explanation of what this does and when it's useful.

y[x, num := unique(i.num), by = .EACHI, allow.cartesian = TRUE]

Edit: (Thanks to @Arun for pointing this out)

We shouldn't need allow.cartesian argument here, as there are no duplicates in i. In fact, it's a bug, #742 that has been fixed in the current development version (1.9.5). So you just need to do:

y[x[!duplicated(let)], num := i.num, on = "let"]
# or
y[unique(x, by = "let"), num := i.num, on = "let"]
# or (though not recommended in this specific case)
y[x, num := unique(i.num), by = .EACHI, on = "let"]

More effective merging of matched column with duplicates in data.table

Tags:

Merge

R

Data.Table

Related

Recent Posts