Multiple join conditions using the $lookup operator
Starting in Mongo 4.4, we can achieve this type of "join" with the new $unionWith aggregation stage coupled with a classic $group stage:
// > db.collection1.find()
// { "user1" : 1, "user2" : 2, "percent" : 0.56 }
// { "user1" : 4, "user2" : 3, "percent" : 0.14 }
// > db.collection2.find()
// { "user1" : 1, "user2" : 2, "percent" : 0.3 }
// { "user1" : 2, "user2" : 3, "percent" : 0.25 }
db.collection1.aggregate([
  { $set: { percent1: "$percent" } },
  { $unionWith: {
      coll: "collection2",
      pipeline: [{ $set: { percent2: "$percent" } }]
  }},
  { $group: {
      _id: { user1: "$user1", user2: "$user2" },
      percents: { $mergeObjects: { percent1: "$percent1", percent2: "$percent2" } }
  }}
])
// { _id: { user1: 1, user2: 2 }, percents: { percent1: 0.56, percent2: 0.3 } }
// { _id: { user1: 2, user2: 3 }, percents: { percent2: 0.25 } }
// { _id: { user1: 4, user2: 3 }, percents: { percent1: 0.14 } }
This pipeline:

- Starts with a union of both collections via the new $unionWith stage:
  - We first rename percent from collection1 to percent1 (using a $set stage).
  - Within the $unionWith stage, we specify a pipeline on collection2 in order to also rename percent, this time to percent2.
  - This way, we can differentiate each percentage field's origin.
- Continues with a $group stage that:
  - Groups records based on user1 and user2.
  - Accumulates percentages via a $mergeObjects operation. Using $first: "$percent1" and $first: "$percent2" wouldn't work, since either could pick up null first (for elements coming from the other collection), whereas $mergeObjects discards null values.
If you need a different output format, you can add a downstream $project
stage.
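For instance, a downstream $project along these lines (a sketch; the field names follow the output shown above) would flatten the grouped result back into top-level fields:

```js
// appended after the $group stage above
{ $project: {
    _id: 0,
    user1: "$_id.user1",
    user2: "$_id.user2",
    percent1: "$percents.percent1",
    percent2: "$percents.percent2"
}}
```

Fields missing from percents (when a pair exists in only one collection) simply stay absent from the projected document.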
We can do multiple join conditions with the $lookup aggregation pipeline operator in version 3.6 and newer.

We need to assign the fields' values to variables using the let optional field; we then access those variables in the pipeline field stages, where we specify the pipeline to run on the joined collection.

Note that in the $match stage, we use the $expr evaluation query operator to compare the fields' values.

The last stage in the pipeline is the $replaceRoot aggregation pipeline stage, where we simply merge the $lookup result with part of the $$ROOT document using the $mergeObjects operator.
db.collection2.aggregate([
  { "$lookup": {
    "from": "collection1",
    "let": { "firstUser": "$user1", "secondUser": "$user2" },
    "pipeline": [
      { "$match": {
        "$expr": {
          "$and": [
            { "$eq": [ "$user1", "$$firstUser" ] },
            { "$eq": [ "$user2", "$$secondUser" ] }
          ]
        }
      }}
    ],
    "as": "result"
  }},
  { "$replaceRoot": {
    "newRoot": {
      "$mergeObjects": [
        { "$arrayElemAt": [ "$result", 0 ] },
        { "percent1": "$$ROOT.percent" }
      ]
    }
  }}
])
This pipeline yields something that looks like this:
{
    "_id" : ObjectId("59e1ad7d36f42d8960c06022"),
    "user1" : 1,
    "user2" : 2,
    "percent" : 0.56,
    "percent1" : 0.3
}
If you are not on version 3.6+, you can first join using one of the fields, let's say "user1", then from there unwind the array of matching documents using the $unwind aggregation pipeline operator. The next stage in the pipeline is the $redact stage, where you filter out those documents where the value of "user2" in the "joined" collection and in the input document are not equal, using the $$KEEP and $$PRUNE system variables. You can then reshape your document in a $project stage.
db.collection1.aggregate([
{ "$lookup": {
"from": "collection2",
"localField": "user1",
"foreignField": "user1",
"as": "collection2_doc"
}},
{ "$unwind": "$collection2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$user2", "$collection2_doc.user2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{ "$project": {
"user1": 1,
"user2": 1,
"percent1": "$percent",
"percent2": "$collection2_doc.percent"
}}
])
which produces:
{
"_id" : ObjectId("572daa87cc52a841bb292beb"),
"user1" : 1,
"user2" : 2,
"percent1" : 0.56,
"percent2" : 0.3
}
If the documents in your collections have the same structure and you find yourself performing this operation often, then you should consider merging the two collections into one, or inserting the documents from those collections into a new collection.
db.collection3.insertMany(
db.collection1.find({}, {"_id": 0})
.toArray()
.concat(db.collection2.find({}, {"_id": 0}).toArray())
)
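On MongoDB 4.4+, a sketch of the same copy done server-side (avoiding the round trip through the client) could use $unionWith together with $merge; the target name collection3 matches the example above:

```js
db.collection1.aggregate([
  { $unionWith: "collection2" },   // append all documents from collection2
  { $project: { _id: 0 } },        // drop _id so fresh ids are generated on insert
  { $merge: "collection3" }        // write the combined stream into collection3
])
```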
Then $group your documents by "user1" and "user2":
db.collection3.aggregate([
{ "$group": {
"_id": { "user1": "$user1", "user2": "$user2" },
"percent": { "$push": "$percent" }
}}
])
which yields:
{ "_id" : { "user1" : 1, "user2" : 2 }, "percent" : [ 0.56, 0.3 ] }
If you're trying to model your data and came here to check whether MongoDB can perform joins on multiple fields before deciding to do so, please read on.
While MongoDB can perform joins, you also have the freedom to model data according to your application access pattern. If the data is as simple as presented in the question, we can simply maintain a single collection that looks like this:
{
user1: 1,
user2: 2,
percent1: 0.56,
percent2: 0.3
}
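For example, fetching the percentages for a pair of users then becomes a plain single-collection query (the collection name here is illustrative):

```js
// single-collection equivalent of the join
db.user_percents.find({ user1: 1, user2: 2 })
```

A compound index on { user1: 1, user2: 1 } keeps such lookups fast.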
Now you can perform all the operations on this collection you would have performed by joining. Why are we trying to avoid joins? Because they are not supported by sharded collections (docs), which will stop you from scaling out when needed. Normalizing data (having separate tables/collections) works very well in SQL, but when it comes to Mongo, avoiding joins can offer advantages without consequences in most cases. Use normalization in MongoDB only when you have no other choice. From the docs:
In general, use normalized data models:
- when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
- to represent more complex many-to-many relationships.
- to model large hierarchical data sets.
Check here to read more about embedding and why you would choose it over normalization.