How does MongoDB projection affect performance?
By default, queries return all fields in matching documents. If you need all the fields, returning full documents is going to be more efficient than having the server manipulate the result set with projection criteria.
However, using projection to limit the fields returned in query results can improve performance by:
- removing unneeded fields from query results (saving on network bandwidth)
- limiting result fields to achieve a covered query (returning indexed query results without fetching full documents)
When using projection to remove unused fields, the MongoDB server will have to fetch each full document into memory (if it isn't already there) and filter the results to return. This use of projection doesn't reduce the memory usage or working set on the MongoDB server, but can save significant network bandwidth for query results depending on your data model and the fields projected.
A covered query is a special case where all requested fields in a query result are included in the index used, so the server does not have to fetch the full document. Covered queries can improve performance (by avoiding fetching documents) and memory usage (if other queries don't require fetching the same document).
Examples
For demonstration purposes via the mongo shell, imagine you have a document that looks like this:
db.data.insert({
a: 'webscale',
b: new Array(10*1024*1024).join('z')
})
The field b might represent a selection of values (or in this case a very long string).
Next, create an index on {a:1}, since a is a field commonly queried by your use case:
db.data.createIndex({a:1})
A simple findOne() with no projection criteria returns a query result which is about 10MB:
> bsonsize(db.data.findOne({}))
10485805
Adding the projection {a:1} will limit the output to the field a and the document _id (which is included by default). The MongoDB server is still manipulating a 10MB document to select two fields, but the query result is now only 33 bytes:
> bsonsize(db.data.findOne({}, {a:1}))
33
This query isn't covered because the full document has to be fetched to discover the _id value. The _id field is included in query results by default since it is the unique identifier for a document, but _id won't be included in a secondary index unless explicitly added.
The totalDocsExamined and totalKeysExamined metrics in explain() results will show how many documents and index keys were examined:
> db.data.find(
{a:'webscale'},
{a:1}
).explain('executionStats').executionStats.totalDocsExamined
1
This query can be improved using projection to exclude the _id field and achieve a covered query using only the {a:1} index. The covered query no longer needs to fetch a ~10MB document into memory, so will be efficient in both network and memory usage:
> db.data.find(
{a:'webscale'},
{a:1, _id:0}
).explain('executionStats').executionStats.totalDocsExamined
0
> bsonsize(db.data.findOne( {a:'webscale'},{a:1, _id:0}))
21
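If you want to repeat this check across several queries, the comparison of totalDocsExamined can be wrapped in a small helper. This is only a sketch: looksCovered is a hypothetical name, not a MongoDB API, and the two sample objects are hand-written stand-ins that mimic the shape of explain('executionStats') output (the nReturned and totalKeysExamined values are assumed for illustration).

```javascript
// Hypothetical helper (not part of MongoDB): heuristic check for a covered query.
// A query that returned results without examining any documents was answered
// from index keys alone.
function looksCovered(explainResult) {
  const stats = explainResult.executionStats;
  if (!stats) {
    throw new Error("no executionStats: run explain('executionStats') first");
  }
  return stats.nReturned > 0 && stats.totalDocsExamined === 0;
}

// Hand-written stand-ins for explain output (values assumed for illustration).
const withId = {
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 1 }
};
const withoutId = {
  executionStats: { nReturned: 1, totalKeysExamined: 1, totalDocsExamined: 0 }
};

console.log(looksCovered(withId));    // false
console.log(looksCovered(withoutId)); // true
```

In the shell you would pass the real result instead, for example looksCovered(db.data.find({a:'webscale'}, {a:1, _id:0}).explain('executionStats')). Note that the heuristic reports false for a query that matches nothing, so it is best used on queries known to return results.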
I have slow MongoDB queries. Does returning a subset of fields affect my slow query (I have a compound index on the field)?
This isn't answerable without the context of a specific query, example document, and the full explain output. However, you could run some benchmarks in your own environment for the same query with and without projection to compare the outcome. If your projection is adding significant overhead to the overall query execution time (processing and transferring results), this may be a strong hint that your data model could be improved.
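As a rough starting point for such a benchmark, a minimal timing sketch could look like the following. averageMillis is a hypothetical helper name, and the approach (wall-clock time averaged over a fixed number of runs after one warm-up call) is an assumption about what is good enough for a first comparison, not a rigorous methodology.

```javascript
// Hypothetical helper: average wall-clock milliseconds per run of a callback.
// One untimed warm-up call is made first so that initial cache misses are
// less likely to dominate the measurement.
function averageMillis(runQuery, iterations = 20) {
  runQuery(); // warm-up, not timed
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    runQuery();
  }
  return (Date.now() - start) / iterations;
}

// Usage in the shell, assuming the db.data collection from the examples above.
// itcount() iterates the whole cursor, so result transfer is included:
//   averageMillis(() => db.data.find({a: 'webscale'}).itcount());
//   averageMillis(() => db.data.find({a: 'webscale'}, {a: 1, _id: 0}).itcount());
```

Date.now() has millisecond resolution, so very fast queries may need a larger iterations value before the two averages differ meaningfully.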
If it's not clear why a query is slow, it would be best to post a new question with specific details to investigate.