MongoDB Aggregation Framework: Does $group use an index?
Per Mongo's 4.2 $group documentation, there is a special optimization for $first:
Optimization to Return the First Document of Each Group
If a pipeline sorts and groups by the same field and the $group stage only uses the $first accumulator operator, consider adding an index on the grouped field which matches the sort order. In some cases, the $group stage can use the index to quickly find the first document of each group.
It makes sense, since only the first entry in an ordered index should be needed for each bin in the $group stage. Unfortunately, in my 3.6 testing, I haven't been able to get nearly the performance I would expect if the index were really being used. I've posted about that problem in detail in another question.
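For reference, the pipeline shape the quoted documentation describes can be sketched as follows. This is only an illustration: the field names `sensor` and `ts` and the helper `first_per_group` are hypothetical, and the pipeline is built as plain Python data so the sketch runs without a server. Whether the planner actually applies the optimization depends on the server version and is best checked by running the pipeline with explain.

```python
def first_per_group(group_field, order_field):
    """Return (index_keys, pipeline) for a $sort + $group/$first aggregation.

    Both field names are supplied by the caller; nothing here talks to MongoDB.
    """
    # Compound index whose key order matches the sort: grouped field first.
    index_keys = [(group_field, 1), (order_field, 1)]
    pipeline = [
        # The sort matches the index key order, so an index scan can satisfy it.
        {"$sort": {group_field: 1, order_field: 1}},
        # Only $first is used, which is the precondition the docs mention.
        {
            "$group": {
                "_id": "$" + group_field,
                "first": {"$first": "$" + order_field},
            }
        },
    ]
    return index_keys, pipeline


# Hypothetical field names, chosen for illustration only.
index_keys, pipeline = first_per_group("sensor", "ts")

# With pymongo (not imported here), this would be used roughly as:
#   coll.create_index(index_keys)
#   results = list(coll.aggregate(pipeline))
```

The sort and the index share the same key order on purpose, since the quoted passage asks for an index on the grouped field that matches the sort order.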
EDIT 2020-04-23
I confirmed with Atlas's MongoDB Support that this $first optimization was added in Mongo 4.2, hence my trouble getting it to work with 3.6. There is also a bug preventing it from working with a composite $group _id at the moment. Further details are available in the post that I linked above.
$group does not use index data.
From the mongoDB docs:
The $match and $sort pipeline operators can take advantage of an index when they occur at the beginning of the pipeline.
The $geoNear pipeline operator takes advantage of a geospatial index. When using $geoNear, the $geoNear pipeline operation must appear as the first stage in an aggregation pipeline.
@ArthurTacca, as of Mongo 4.0 $sort preceding $group will speed up things significantly. See https://stackoverflow.com/a/56427875/92049.
As 4J41's answer says, $group does not (directly) use an index, although $sort does if it is the first stage in the pipeline. However, it seems possible that $group could, in principle, have an optimised implementation if it immediately follows a $sort, in which case you could make it effectively use an index by putting a $sort beforehand.
There does not seem to be a straight answer either way in the docs about whether $group has this optimisation (although I bet there would be if it did, so this suggests it doesn't). The answer is in MongoDB bug 4507: currently $group does NOT have this implementation, so the top line of 4J41's answer is right after all. If you really need efficiency, depending on the application it may be quickest to use a regular query and do the grouping in your client code.
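A minimal sketch of the client-side grouping idea, assuming the documents come from a regular query (for example a pymongo `find` with a projection). A plain list stands in for the cursor here, and the field names `category` and `qty` as well as the helper `group_client_side` are made up for illustration.

```python
from collections import defaultdict


def group_client_side(docs, key, value):
    """Sum `value` per distinct `key` over any iterable of documents."""
    totals = defaultdict(int)
    for doc in docs:
        # One pass over the results; only the running totals are kept.
        totals[doc[key]] += doc[value]
    return dict(totals)


# Stand-in for the results of a regular query (hypothetical data).
docs = [
    {"category": "a", "qty": 2},
    {"category": "b", "qty": 5},
    {"category": "a", "qty": 3},
]
totals = group_client_side(docs, "category", "qty")
```

Whether this beats the aggregation pipeline depends on how much data has to cross the network, so it is worth measuring for the application at hand.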
Edit: As sebastian's answer says, it seems that in practice using $sort (that can take advantage of an index) before a $group can make a very large speed improvement. The bug above is still open, so it seems that it is not taking the best possible advantage of the index (that is, starting to group items as they are loaded, rather than loading them all into memory first). But it is still certainly worth doing.
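To make the "group as items are loaded" idea concrete, here is a small sketch of streaming grouping over input that is already sorted by the group key. It illustrates the general technique only, not how MongoDB implements $group; the helper `stream_groups` and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def stream_groups(sorted_docs, key, value):
    """Yield (key, total) as soon as each run of equal keys ends.

    Requires `sorted_docs` to be ordered by `key`; only one run is
    held at a time, so the full input never has to sit in memory.
    """
    for group_key, run in groupby(sorted_docs, key=itemgetter(key)):
        yield group_key, sum(doc[value] for doc in run)


# An iterator stands in for documents arriving in index order.
sorted_docs = iter([
    {"category": "a", "qty": 2},
    {"category": "a", "qty": 3},
    {"category": "b", "qty": 5},
])
stream = stream_groups(sorted_docs, "category", "qty")
first_group = next(stream)   # available before the rest has been grouped
remaining = list(stream)
```

The first group is produced as soon as the key changes, which is the behaviour the open bug asks for on the server side.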