MongoDB Explain for the Aggregation Framework
Starting with version 2.6, MongoDB lets you run explain on an aggregation pipeline.
All you need to do is add the explain: true option:
db.records.aggregate(
[ ...your pipeline...],
{ explain: true }
)
Thanks to Rafa, I know that this was possible even in 2.4, but only through runCommand(). Now you can use aggregate() as well.
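For reference, the runCommand() form mentioned above could look roughly like the sketch below. The collection name records comes from the earlier snippet; the pipeline contents are placeholders invented for illustration, and the guard on db is only there so the snippet also loads outside the mongo shell.

```javascript
// Hypothetical pipeline, used only for illustration.
const samplePipeline = [
  { $match: { status: "active" } },
  { $group: { _id: "$type", count: { $sum: 1 } } }
];

// Command document for the aggregate database command with explain enabled.
const explainCommand = {
  aggregate: "records",      // collection name
  pipeline: samplePipeline,
  explain: true
};

// `db` and `printjson` only exist inside the mongo shell, so guard the call.
if (typeof db !== "undefined") {
  printjson(db.runCommand(explainCommand));
}
```

The command document carries the same explain flag that the aggregate() helper accepts as an option.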
Starting with MongoDB version 3.0, simply changing the order from
collection.aggregate(...).explain()
to
collection.explain().aggregate(...)
will give you the desired results (see the db.collection.explain() documentation).
For older versions (>= 2.6), you will need to pass the explain: true option to the aggregation pipeline operation:
db.collection.aggregate(
  [
    { $project: { "Tags._id": 1 } },
    { $unwind: "$Tags" },
    { $match: { $or: [{ "Tags._id": "tag1" }, { "Tags._id": "tag2" }] } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $sort: { "count": -1 } }
  ],
  { explain: true }
)
An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match, $sort, or $geoNear at the beginning of a pipeline) as well as by subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group), further manipulation will be in-memory (possibly using temporary files if the allowDiskUse option is set).
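As a rough illustration of this point, the hypothetical helper below checks whether a pipeline opens with a stage that is able to use an index. It is only a heuristic, assuming $match, $sort, and $geoNear are the index-eligible leading stages as described above. It does not account for any automatic reordering done by the server, so the explain output remains the authoritative answer.

```javascript
// Stages that can use an index when they open a pipeline (taken from the text above).
const INDEX_ELIGIBLE_FIRST_STAGES = ["$match", "$sort", "$geoNear"];

// Hypothetical helper: true if the first stage of the pipeline is index-eligible.
function firstStageCanUseIndex(pipeline) {
  if (!Array.isArray(pipeline) || pipeline.length === 0) return false;
  const stageName = Object.keys(pipeline[0])[0];
  return INDEX_ELIGIBLE_FIRST_STAGES.includes(stageName);
}

// The example pipeline from earlier opens with $project.
const examplePipeline = [
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  { $match: { $or: [{ "Tags._id": "tag1" }, { "Tags._id": "tag2" }] } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

console.log(firstStageCanUseIndex(examplePipeline));                       // false
console.log(firstStageCanUseIndex([{ $match: { "Tags._id": "tag1" } }]));  // true
```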
Optimizing pipelines
In general, you can optimize aggregation pipelines by:
- Starting a pipeline with a $match stage to restrict processing to relevant documents.
- Ensuring the initial $match/$sort stages are supported by an efficient index.
- Filtering data early using $match, $limit, and $skip.
- Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
- Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions including support for working with arrays, strings, and facets.
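Applied to the tag-counting pipeline shown earlier, the first two suggestions might look like the sketch below. It assumes an index on "Tags._id" is appropriate for your data (the index itself is not part of the original example) and should return the same results as the original pipeline, since documents without a matching tag produce no output either way.

```javascript
const wantedTags = ["tag1", "tag2"];

const optimizedPipeline = [
  // A leading $match can use an index on "Tags._id" (if one exists) and
  // discards documents that contain none of the requested tags.
  { $match: { "Tags._id": { $in: wantedTags } } },
  { $project: { "Tags._id": 1 } },
  { $unwind: "$Tags" },
  // After $unwind, keep only the array elements for the requested tags.
  { $match: { "Tags._id": { $in: wantedTags } } },
  { $group: { _id: "$_id", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
];

// `db` only exists inside the mongo shell, so guard the calls.
if (typeof db !== "undefined") {
  // Assumed index; createIndex() is available from MongoDB 3.0 (ensureIndex() before that).
  db.collection.createIndex({ "Tags._id": 1 });
  printjson(db.collection.aggregate(optimizedPipeline, { explain: true }));
}
```

The $match is repeated after $unwind because the first one selects whole documents, while the second one filters individual unwound tags.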
There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.
Limitations
As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution, you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.
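One way to get at the equivalent find() query is to reuse the filter of the pipeline's leading $match stage, as in this sketch. The helper name and the sample pipeline are hypothetical, and the approach only makes sense when the pipeline actually starts with $match.

```javascript
// Hypothetical helper: returns the filter of a leading $match stage, or null.
function leadingMatchFilter(pipeline) {
  const first = Array.isArray(pipeline) ? pipeline[0] : undefined;
  return first && first.$match ? first.$match : null;
}

// Sample pipeline, used only for illustration.
const statsPipeline = [
  { $match: { "Tags._id": { $in: ["tag1", "tag2"] } } },
  { $unwind: "$Tags" },
  { $group: { _id: "$_id", count: { $sum: 1 } } }
];

const initialFilter = leadingMatchFilter(statsPipeline);

// `db` only exists inside the mongo shell, so guard the call.
if (typeof db !== "undefined" && initialFilter !== null) {
  // Explain verbosity modes are available from MongoDB 3.0.
  printjson(db.collection.find(initialFilter).explain("executionStats"));
}
```

Swapping "executionStats" for "allPlansExecution" also reports on the candidate plans that the query planner rejected.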
There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines:
- SERVER-19758: Add "executionStats" and "allPlansExecution" explain modes to aggregation explain
- SERVER-21784: Track execution stats for each aggregation pipeline stage and expose via explain
- SERVER-22622: Improve $lookup explain to indicate query plan on the "from" collection