How to merge two data.table by different column names?

You can also merge using multiple columns having different names. see example below

# create data frame authors
authors <- data.frame(
FirstName=c("Lorne", "Loren", "Robin",
              "Robin", "Billy"),
LastName=c("Green", "Jaye", "Green",
             "Howe", "Jaye"),
Age=c(82, 40, 45, 2, 40),
Income=c(1200000, 40000, 25000, 0, 27500),
Home=c("California", "Washington", "Washington",
    "Alberta", "Washington"))

# create data frame books Note First name in authors is same as AuthorFirstname same thing with lastname.
books <- data.frame(
        AuthorFirstName=c("Lorne", "Loren", "Loren",
            "Loren", "Robin", "Rich"),
        AuthorLastName=c("Green", "Jaye", "Jaye", "Jaye",
            "Green", "Calaway"),
        Book=c("Bonanza", "Midwifery", "Gardening",
        "Perennials", "Who_dun_it?", "Support"))

merge(authors, books, by.x=c("FirstName", "LastName"),
      by.y=c("AuthorFirstName", "AuthorLastName"),
      all.x=TRUE)

As of data.table version 1.9.6 (on CRAN on sep 2015) you can specify the by.x and by.y arguments in data.table::merge

merge(x=X, y=Y, by.x="id", by.y="ID")[]
#     id area value price sales
#1: c001   US   100   500    20
#2: c002   UK   200   200    30
#3: c003   EU   300   400    15

However, in data.table 1.9.6 you can also specfy the on argument in the X[Y] notation

X[Y] syntax can now join without having to set keys by using the new on argument. For example: DT1[DT2, on=c(x = "y")] would join column "y" of DT2 with "x" of DT1. DT1[DT2, on="y"] would join column "y" of both data.tables.

X[Y, on=c(id = "ID")]
#   area   id value price sales
#1:   US c001   100   500    20
#2:   UK c002   200   200    30
#3:   EU c003   300   400    15

this answer by the data.table author has more details

Merge fails when you use by.x and by.y with data.table. Taking your data:

> merge(X,Y, by.x='id', by.y='ID')
Error in merge.data.table(X, Y, by.x = "id", by.y = "ID")

You can use data.table with merge , but you need to use by argument for joining (so rename the columns to have the same colnames)

Y = setNames(Y,c('id','price','sales'))

This will still not work:

merge(X,Y, by.x='id', by.y='id')
Error in merge.data.table(X, Y, by.x = "id", by.y = "id") :

But this will work:

> merge(X,Y, by='id')
#     id area value price sales
#1: c001   US   100   500    20
#2: c002   UK   200   200    30
#3: c003   EU   300   400    15

Alternatively, you would need to convert data.table to data.frame in order to use merge with by.x and by.y arguments:

merge(data.frame(X), data.frame(Y), by.x='id', by.y='ID')

OUTDATED

Use this operation:

X[Y]
#    area   id value price sales
# 1:   US c001   100   500    20
# 2:   UK c002   200   200    30
# 3:   EU c003   300   400    15

or this operation:

Y[X]
#      ID price sales area value
# 1: c001   500    20   US   100
# 2: c002   200    30   UK   200
# 3: c003   400    15   EU   300

Edit after you edited your question, I read Section 1.12 of the FAQ: "What is the didifference between X[Y] and merge(X,Y)?", which led me to checkout ?merge and I discovered there are two different merge functions depending upon which package you are using. The default is merge.data.frame but data.table uses merge.data.table. Compare

merge(X, Y, by.x = "id", by.y = "ID") # which is merge.data.table
# Error in merge.data.table(X, Y, by.x = "id", by.y = "ID") : 
# A non-empty vector of column names for `by` is required.

with

merge.data.frame(X, Y, by.x = "id", by.y = "ID")
#     id area value price sales
# 1 c001   US   100   500    20
# 2 c002   UK   200   200    30
# 3 c003   EU   300   400    15

Edit for completeness based upon a comment by @Michael Bernsteiner, it looks like the data.table team is planning on implementing by.x and by.y into the merge.data.table function, but hasn't done so yet.

How to merge two data.table by different column names?

OUTDATED

Tags:

Merge

R

Data.Table

Related

Recent Posts