How to remove duplicate entries from an array?
As of MongoDB 2.2 you can use the aggregation framework with $unwind, $group and $project stages to achieve this:
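db.users.aggregate([
  // one document per book
  {$unwind: '$favorites.books'},
  // regroup per user, keeping each book only once
  {$group: {_id: '$_id',
            books: {$addToSet: '$favorites.books'},
            name: {$first: '$name'}}},
  // move the flat "books" field back under favorites
  {$project: {'favorites.books': '$books', name: '$name'}}
])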
Note the need for the $project stage to move books back under favorites, since $group cannot produce nested (dotted) output field names.
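To make the behaviour of this pipeline concrete, here is a minimal plain-JavaScript simulation (Node.js, no database involved) of what the three stages do. `simulatePipeline` and the sample data, including the extra `age` field, are hypothetical. The sketch shows two side effects: $unwind drops documents whose books array is empty, and $project keeps only _id, favorites.books and name.

```javascript
// Plain-JS simulation of the $unwind -> $group -> $project pipeline,
// for illustration only. Assumes documents shaped like
// { _id, name, favorites: { books: [...] } } with books stored as an array.
function simulatePipeline(docs) {
  // $unwind: one output document per array element; a missing or empty
  // array yields no output document at all.
  const unwound = [];
  for (const doc of docs) {
    const books = (doc.favorites && doc.favorites.books) || [];
    for (const book of books) {
      unwound.push({ ...doc, favorites: { ...doc.favorites, books: book } });
    }
  }

  // $group on _id: $addToSet keeps distinct books, $first keeps the name.
  // (This sketch keeps first-seen order; the real $addToSet does not
  // guarantee any order.)
  const groups = new Map();
  for (const d of unwound) {
    if (!groups.has(d._id)) {
      groups.set(d._id, { _id: d._id, books: [], name: d.name });
    }
    const g = groups.get(d._id);
    if (!g.books.includes(d.favorites.books)) g.books.push(d.favorites.books);
  }

  // $project: rename the flat "books" back to favorites.books; only _id,
  // favorites.books and name survive, every other field is dropped.
  return Array.from(groups.values(), (g) => ({
    _id: g._id,
    favorites: { books: g.books },
    name: g.name,
  }));
}

// Hypothetical sample: "age" stands in for any other field on the document.
const sample = [
  { _id: 1, name: 'robert', age: 30, favorites: { books: ['Graph Theory', 'Algorithms in C++', 'Graph Theory'] } },
  { _id: 2, name: 'alice', favorites: { books: [] } },
];
console.log(JSON.stringify(simulatePipeline(sample)));
// [{"_id":1,"favorites":{"books":["Graph Theory","Algorithms in C++"]},"name":"robert"}]
```

Document 2 disappears and document 1 loses its `age` field, which is what the $unwind option `preserveNullAndEmptyArrays` and the $$ROOT/$mergeObjects approach in a later answer address.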
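You can also deduplicate the arrays in the mongo shell with a small helper and write each result back:
// Returns the distinct elements of arr, keeping first-occurrence order.
// Elements are used as object keys, so they are compared as strings.
function unique(arr) {
  var hash = {}, result = [];
  for (var i = 0, l = arr.length; i < l; ++i) {
    // call hasOwnProperty via the prototype so an element named
    // "hasOwnProperty" cannot shadow the method
    if (!Object.prototype.hasOwnProperty.call(hash, arr[i])) {
      hash[arr[i]] = true;
      result.push(arr[i]);
    }
  }
  return result;
}

db.collection.find({}).forEach(function (doc) {
  // skip documents that have no favorites.books array
  if (!doc.favorites || !Array.isArray(doc.favorites.books)) return;
  db.collection.update(
    { _id: doc._id },
    { $set: { "favorites.books": unique(doc.favorites.books) } }
  );
});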
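For comparison, a Set-based variant of unique() is sketched below, assuming a runtime with ES2015 Set support (Node.js or mongosh); the name `uniqueWithSet` is hypothetical. Set compares elements with SameValueZero, so 1 and '1' stay distinct, while the object-key approach converts every element to a string key (and all objects collapse into the same "[object Object]" key). Both keep first-occurrence order.

```javascript
// Set-based variant of unique() for ES2015+ runtimes (hypothetical name).
// Set uses SameValueZero equality: 1 and '1' are different elements.
// Caveat: objects are compared by reference, so two equal-looking
// subdocuments are NOT treated as duplicates.
function uniqueWithSet(arr) {
  // Array.from keeps the Set's insertion (first-occurrence) order
  return Array.from(new Set(arr));
}

console.log(uniqueWithSet(['Graph Theory', 'Algorithms in C++', 'Graph Theory']));
// [ 'Graph Theory', 'Algorithms in C++' ]
console.log(uniqueWithSet([1, '1', 1]));
// [ 1, '1' ]
```

For an array of book titles (plain strings) both versions return the same result.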
What you have to do is use MapReduce to detect and count duplicate entries, then use $set to replace the entire books array of each affected document, matching it by _id (e.g. { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }).
This has been discussed several times here; please see:
Removing duplicate records using MapReduce
Fast way to find duplicates on indexed column in mongodb
http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce
http://www.mongodb.org/display/DOCS/MapReduce
How to remove duplicate record in MongoDB by MapReduce?
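A minimal sketch of map and reduce functions for the MapReduce approach, assuming documents shaped like the question's users. In MongoDB they would be passed to db.users.mapReduce(map, reduce, { out: ... }); note that map-reduce is deprecated as of MongoDB 5.0 in favour of aggregation pipelines. Because `emit` is only provided by the server, the sketch adds a hypothetical in-memory driver, `simulateMapReduce`, so the functions can be tried in Node.js.

```javascript
// map: runs with `this` bound to one document and emits (user, book) -> 1.
// Written with var/function so it also suits the server-side JS engine.
function map() {
  var books = (this.favorites && this.favorites.books) || [];
  books.forEach(function (book) {
    emit({ user: this._id, book: book }, 1);
  }, this);
}

// reduce: sums the counts emitted for one (user, book) key.
function reduce(key, values) {
  return values.reduce(function (sum, n) { return sum + n; }, 0);
}

// --- Hypothetical in-memory driver (not part of MongoDB) ---
let emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

function simulateMapReduce(docs, mapFn, reduceFn) {
  emitted = [];
  docs.forEach((doc) => mapFn.call(doc));
  // group emitted values by key; keys are built the same way, so
  // JSON.stringify gives a stable grouping id
  const groups = new Map();
  for (const { key, value } of emitted) {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key: key, values: [] });
    groups.get(id).values.push(value);
  }
  // like the server, reduce is not called for keys with a single value
  return Array.from(groups.values(), (g) => ({
    _id: g.key,
    value: g.values.length === 1 ? g.values[0] : reduceFn(g.key, g.values),
  }));
}

const docs = [
  { _id: 'robert', favorites: { books: ['Graph Theory', 'Algorithms in C++', 'Graph Theory'] } },
];
const duplicates = simulateMapReduce(docs, map, reduce).filter((r) => r.value > 1);
console.log(JSON.stringify(duplicates));
// [{"_id":{"user":"robert","book":"Graph Theory"},"value":2}]
```

Every result with value > 1 identifies a user whose books array needs the $set rewrite.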
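The easiest solution is to use $setUnion (available since MongoDB 2.6; the $addFields stage used here requires 3.4+). Note that $setUnion does not guarantee the order of the elements in the resulting array: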
db.users.aggregate([
{'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
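The aggregation above only returns deduplicated documents; it does not change what is stored. Assuming MongoDB 4.2+, which accepts an aggregation pipeline as the update argument, the same expression can be written back with updateMany. `dedupeFavoriteBooks` is a hypothetical helper that takes a collection handle (for example db.users in mongosh, or a Collection from the Node.js driver).

```javascript
// Hypothetical helper: writes the $setUnion-deduplicated array back.
// Assumes MongoDB 4.2+ (update with an aggregation pipeline).
function dedupeFavoriteBooks(collection) {
  return collection.updateMany(
    // only touch documents where favorites.books is actually an array
    { 'favorites.books': { $type: 'array' } },
    [
      // $set is the 4.2+ alias of $addFields; $setUnion with an empty
      // array removes duplicates (element order is not guaranteed)
      { $set: { 'favorites.books': { $setUnion: ['$favorites.books', []] } } },
    ]
  );
}
```

In mongosh this would be called as dedupeFavoriteBooks(db.users); with the Node.js driver the call returns a promise to await.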
Another (more lengthy) version, based on the idea from @kynan's answer, that preserves all the other fields without specifying them explicitly (MongoDB 3.6+, since $mergeObjects was introduced in 3.6):
db.users.aggregate([
  {'$unwind': {
    'path': '$favorites.books',
    // output the document even if its list of books is empty
    'preserveNullAndEmptyArrays': true
  }},
  {'$group': {
    '_id': '$_id',
    'books': {'$addToSet': '$favorites.books'},
    // arbitrary name that doesn't exist on any document
    '_other_fields': {'$first': '$$ROOT'},
  }},
  {
    // per the $mergeObjects docs, when a field appears in several of the
    // merged documents, the value from the last one wins, so $$ROOT's
    // deduplicated 'books' overrides any same-named original field
    '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
  },
  // this stage wouldn't be necessary if the field wasn't nested
  {'$addFields': {'favorites.books': '$books'}},
  {'$project': {'_other_fields': 0, 'books': 0}}
])
Output:
{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programmning", "Graph Theory", "Algorithms in C++" ] } }
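The comments in the pipeline above hinge on how $mergeObjects combines documents. The plain-JavaScript sketch below mimics it with Object.assign (a shallow, last-wins merge that, like $mergeObjects, skips null arguments) on one hypothetical document as it leaves the $group stage. It shows why the $addFields stage is still needed: the merge is top-level only, so favorites.books keeps the single book captured by $$ROOT after $unwind until $addFields overwrites it.

```javascript
// Shallow, last-wins merge; Object.assign skips null/undefined sources,
// which is assumed here to match $mergeObjects for plain documents.
function mergeObjects(...docs) {
  return Object.assign({}, ...docs);
}

// Hypothetical document as it leaves $group. $$ROOT was captured after
// $unwind, so _other_fields.favorites.books holds a single book string.
const grouped = {
  _id: 1,
  books: ['Graph Theory', 'Algorithms in C++'],
  _other_fields: { _id: 1, name: 'robert', favorites: { books: 'Graph Theory' } },
};

// $replaceRoot: {newRoot: {$mergeObjects: ['$_other_fields', '$$ROOT']}}
const merged = mergeObjects(grouped._other_fields, grouped);
console.log(merged.favorites.books); // Graph Theory
console.log(merged.books); // [ 'Graph Theory', 'Algorithms in C++' ]

// $addFields {'favorites.books': '$books'}, then $project {_other_fields: 0, books: 0}
const { _other_fields, books, ...rest } = merged;
const finalDoc = { ...rest, favorites: { ...rest.favorites, books: books } };
console.log(JSON.stringify(finalDoc));
// {"_id":1,"name":"robert","favorites":{"books":["Graph Theory","Algorithms in C++"]}}
```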