$lookup on ObjectIds in an array
2017 update
$lookup can now directly use an array as the local field. $unwind is no longer needed.
Old answer
The $lookup aggregation pipeline stage will not work directly with an array. The main intent of the design is a "left join": a "one to many" type of join (or really a "lookup") on the possibly related data. But the value is intended to be singular, not an array.
Therefore you must "de-normalise" the content before performing the $lookup operation in order for this to work. That means using $unwind:
db.orders.aggregate([
// Unwind the source
{ "$unwind": "$products" },
// Do the lookup matching
{ "$lookup": {
"from": "products",
"localField": "products",
"foreignField": "_id",
"as": "productObjects"
}},
// Unwind the result arrays ( likely one or none )
{ "$unwind": "$productObjects" },
// Group back to arrays
{ "$group": {
"_id": "$_id",
"products": { "$push": "$products" },
"productObjects": { "$push": "$productObjects" }
}}
])
After $lookup matches each array member, the result is itself an array, so you $unwind again and $group to $push new arrays for the final result.
Note that any "left join" that finds no match creates an empty "productObjects" array for that product, so the second $unwind drops that "products" element from the result (and an order with no matching products at all disappears entirely).
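To make the data flow of the $unwind / $lookup / $unwind / $group pipeline concrete, here is a rough in-memory sketch in plain Python. No MongoDB is involved: the helper name old_style_lookup and the sample documents are hypothetical, and plain strings stand in for ObjectIds. It only mimics the stage semantics described above, including the way an unmatched product is dropped by the second $unwind.

```python
def old_style_lookup(orders, products):
    """Mimic $unwind -> $lookup -> $unwind -> $group on plain dicts (sketch only)."""
    # "foreignField": "_id" -> index the foreign collection by _id.
    by_id = {p["_id"]: p for p in products}

    # Stage 1: $unwind "$products" -> one row per array element
    # (orders with a missing or empty array produce no rows).
    unwound = [
        {**order, "products": pid}
        for order in orders
        for pid in order.get("products") or []
    ]

    # Stage 2: $lookup -> "productObjects" holds zero or one matching product.
    looked_up = [
        {**row, "productObjects": [by_id[row["products"]]] if row["products"] in by_id else []}
        for row in unwound
    ]

    # Stage 3: $unwind "$productObjects" -> rows with an empty array vanish here.
    matched = [
        {**row, "productObjects": obj}
        for row in looked_up
        for obj in row["productObjects"]
    ]

    # Stage 4: $group by order _id and $push both fields back into arrays.
    # (Real $group output order is not guaranteed; this dict keeps first-seen order.)
    grouped = {}
    for row in matched:
        group = grouped.setdefault(
            row["_id"], {"_id": row["_id"], "products": [], "productObjects": []}
        )
        group["products"].append(row["products"])
        group["productObjects"].append(row["productObjects"])
    return list(grouped.values())


# Hypothetical sample data; plain strings stand in for ObjectIds.
orders = [
    {"_id": 1, "products": ["a", "b", "missing"]},
    {"_id": 2, "products": ["missing"]},
]
products = [{"_id": "a", "name": "Apple"}, {"_id": "b", "name": "Banana"}]

result = old_style_lookup(orders, products)
print(result)
```

Order 1 comes back with only "a" and "b", and order 2 is gone entirely because its only product had no match.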
Applying it directly to an array would be nicer, but this is simply how $lookup currently works: it matches a singular value to a possible many.
As $lookup is still very new, it currently works much like a "poor man's version" of the .populate() method that mongoose users will be familiar with. The difference is that $lookup processes the "join" on the server rather than on the client, and that it currently lacks some of the "maturity" that .populate() offers (such as applying the lookup directly to an array).
This is actually an assigned improvement issue, SERVER-22881, so with some luck it will land in the next release or one soon after.
As a design principle, your current structure is neither good nor bad; it is just subject to overhead whenever you create any "join". As such, MongoDB's basic founding principle applies: if you "can" live with the data "pre-joined" in the one collection, then it is best to do so.
The one other thing that can be said of $lookup as a general principle is that the "join" is intended to work the other way around from what is shown here. So rather than keeping the "related ids" of the other documents within the "parent" document, the design that works best is one where the "related documents" contain a reference to the "parent".
So $lookup can be said to "work best" with a relation design that is the reverse of how something like mongoose .populate() performs its client-side joins. By identifying the "one" within each of the "many" instead, you just pull in the related items without needing to $unwind the array first.
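As a rough illustration of that reverse design, assume a hypothetical "orderItems" collection whose documents each store their parent order's _id in a hypothetical "orderId" field. The $lookup stage is written as a pymongo-style dict, and a small in-memory function (also hypothetical, not part of any library) mimics what it returns, so it runs without a server.

```python
# Hypothetical reverse design: each "orderItems" child stores its parent's _id in "orderId".
pipeline = [
    {"$lookup": {
        "from": "orderItems",
        "localField": "_id",        # the parent's own _id
        "foreignField": "orderId",  # the reference held by each child
        "as": "items",
    }}
]


def simulate_reverse_lookup(parents, children, local_field, foreign_field, as_field):
    """In-memory sketch of that $lookup: no array on the parent, so no $unwind."""
    results = []
    for parent in parents:
        # Every child whose reference equals the parent's key is pulled in.
        matches = [c for c in children if c.get(foreign_field) == parent.get(local_field)]
        # Parents without children still appear, with an empty array (left join).
        results.append({**parent, as_field: matches})
    return results


spec = pipeline[0]["$lookup"]
orders = [{"_id": 1}, {"_id": 2}]
order_items = [
    {"_id": "x", "orderId": 1, "sku": "apple"},
    {"_id": "y", "orderId": 1, "sku": "pear"},
]
joined = simulate_reverse_lookup(
    orders, order_items, spec["localField"], spec["foreignField"], spec["as"]
)
print(joined)
```

Order 2 has no items and keeps an empty "items" array, and nothing had to be unwound or regrouped.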
Starting with MongoDB v3.4 (released in 2016), the $lookup aggregation pipeline stage can also work directly with an array. There is no need for $unwind any more. This was tracked in SERVER-22881.
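A minimal sketch of the 3.4+ form: the array field is passed straight in as localField. The pipeline is a pymongo-style dict; the simulate_array_lookup helper is a hypothetical in-memory stand-in (no server needed) and ignores null/missing edge cases. It returns matches in foreign-collection order rather than array order, so if the array's order matters, check what your server returns instead of relying on it.

```python
# $lookup with an array as localField (MongoDB 3.4+); no $unwind / $group needed.
pipeline = [
    {"$lookup": {
        "from": "products",
        "localField": "products",   # an array of ObjectIds, used directly
        "foreignField": "_id",
        "as": "productObjects",
    }}
]


def simulate_array_lookup(docs, foreign, local_field, foreign_field, as_field):
    """Hypothetical in-memory sketch: join every foreign doc whose key is in the array."""
    out = []
    for doc in docs:
        wanted = doc.get(local_field) or []
        if not isinstance(wanted, list):
            wanted = [wanted]  # a scalar behaves like a one-element array
        wanted_set = set(wanted)
        # Each matching foreign doc appears once, in foreign-collection order.
        matches = [f for f in foreign if f.get(foreign_field) in wanted_set]
        out.append({**doc, as_field: matches})
    return out


spec = pipeline[0]["$lookup"]
orders = [{"_id": 1, "products": ["b", "a", "missing"]}]
products = [{"_id": "a", "name": "Apple"}, {"_id": "b", "name": "Banana"}]
result = simulate_array_lookup(
    orders, products, spec["localField"], spec["foreignField"], spec["as"]
)
print(result)
```

Unlike the old $unwind approach, the original "products" array survives untouched and an unmatched id simply contributes nothing.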
With MongoDB 3.6 or later, you can also use the pipeline option of $lookup to perform checks on a sub-document array. Here's the example using Python (sorry, I'm snake people).
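from pymongo import MongoClient

db = MongoClient()['test']  # assumes a local server and a database named 'test'

pipeline = [
    { '$lookup': {
        'from': 'products',
        'let': { 'pid': '$products' },
        'pipeline': [
            { '$match': { '$expr': { '$in': ['$_id', '$$pid'] } } }
            # Add additional stages here
        ],
        'as': 'productObjects'
    }}
]

for order in db.orders.aggregate(pipeline):
    print(order)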
The catch here is to match all the foreign documents whose _id is in the local ObjectId array (the local field/prop products).
You can also clean up or project the foreign records with additional stages, as indicated by the comment above.
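A minimal sketch of what those additional stages could look like, built as plain pymongo-style dicts so it runs without a server. The helper build_product_lookup and the projected "name" field are hypothetical. The $ifNull guard is an assumption worth adding: $in needs an array as its second argument, so a document missing the products field would otherwise make the stage fail.

```python
def build_product_lookup(extra_stages=None):
    """Hypothetical helper: a pipeline-form $lookup (MongoDB 3.6+) matching every
    product whose _id is in the local 'products' array, plus optional clean-up."""
    sub_pipeline = [
        {"$match": {"$expr": {"$in": ["$_id", "$$pid"]}}},
    ]
    # Append any clean-up stages, e.g. a $project that keeps only some fields.
    sub_pipeline.extend(extra_stages or [])
    return {
        "$lookup": {
            "from": "products",
            # $ifNull substitutes [] when 'products' is missing or null,
            # because $in requires an array as its second argument.
            "let": {"pid": {"$ifNull": ["$products", []]}},
            "pipeline": sub_pipeline,
            "as": "productObjects",
        }
    }


# Hypothetical clean-up: keep only the (assumed) 'name' field of each product.
stage = build_product_lookup([{"$project": {"name": 1}}])
pipeline = [stage]
# e.g. db.orders.aggregate(pipeline) if the arrays live in an orders collection (assumed).
print(pipeline)
```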