MongoDB ranged pagination

I recently encountered the same problem when trying to paginate a request on a field that wasn't unique, for example "FirstName". The idea of this query is to implement pagination on a non-unique field without using skip().

The main problem is querying on a field that is not unique, such as "FirstName", because the following will happen (both queries are sketched in shell syntax after this list):

  1. {"FirstName": {$gt: "Carlos"}} -> this will skip all the remaining records where the first name is "Carlos"
  2. {"FirstName": {$gte: "Carlos"}} -> will always return the same set of data
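
For illustration, a minimal sketch of what those two naive "next page" queries would look like, assuming the last FirstName seen on the previous page was "Carlos":

db.customers.find({FirstName: {$gt: "Carlos"}}).sort({FirstName: 1}).limit(10)   // 1. skips every remaining "Carlos"
db.customers.find({FirstName: {$gte: "Carlos"}}).sort({FirstName: 1}).limit(10)  // 2. keeps returning the same page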

Therefore the solution I came up with was to make the $match portion of the query unique by combining the targeted search field with a secondary, unique field (here _id).

Ascending order:

db.customers.aggregate([
    {$match: {$or: [
        // same FirstName as the last document on the previous page: break the tie on _id
        {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]},
        // any FirstName strictly after it
        {'FirstName': {$gt: 'Carlos'}}
    ]}},
    {$sort: {'FirstName': 1, '_id': 1}},
    {$limit: 10}
])

Descending order:

db.customers.aggregate([
    {$match: {$or: [
        // same FirstName as the last document on the previous page: break the tie on _id
        {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]},
        // any FirstName strictly before it
        {'FirstName': {$lt: 'Carlos'}}
    ]}},
    {$sort: {'FirstName': -1, '_id': 1}},
    {$limit: 10}
])

The $match part of this query is basically behaving as an if statement: if FirstName is "Carlos", the document's _id also needs to be greater than the last _id seen; if FirstName is not "Carlos", it needs to be greater than "Carlos" (or less than "Carlos" in the descending version).
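
To make that reusable, here is a minimal sketch of a helper that builds the ascending-order $match stage from the last document of the previous page (the nextPageMatch name is just illustrative):

// hypothetical helper: builds the $match for the page that follows
// a document with the given FirstName and _id (ascending order)
function nextPageMatch(lastFirstName, lastId) {
    return {$or: [
        {$and: [{FirstName: lastFirstName}, {_id: {$gt: lastId}}]},
        {FirstName: {$gt: lastFirstName}}
    ]};
}

db.customers.aggregate([
    {$match: nextPageMatch('Carlos', ObjectId("some-object-id"))},
    {$sort: {FirstName: 1, _id: 1}},
    {$limit: 10}
])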

The only problem is that you cannot navigate to a specific page number (it can probably be done with some code manipulation), but other than that it solved my problem with pagination on non-unique fields without having to use skip, which eats a lot of memory and processing power as you get toward the end of whatever dataset you are querying.


Good question!

"How many is too many?" - that, of course, depends on your data size and performance requirements. I, personally, feel uncomfortable when I skip more than 500-1000 records.

The actual answer depends on your requirements. Here's what modern sites do (or, at least, some of them).

First, the navbar looks like this:

1 2 3 ... 457

They get the final page number from the total record count and the page size. Let's jump to page 3. That will involve some skipping from the first record. When the results arrive, you know the _id of the first record on page 3.

1 2 3 4 5 ... 457
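
A rough sketch of that first jump, assuming a page_size of 10 (count() is the legacy shell helper; countDocuments() works in newer shells):

var page_size = 10;
var total = db.collection.count();                // total record count
var last_page = Math.ceil(total / page_size);     // e.g. 457

// jumping to page 3 from the start still needs one skip
var page3 = db.collection.find()
                         .sort({_id: 1})
                         .skip(2 * page_size)
                         .limit(page_size)
                         .toArray();
var current_id = page3[0]._id;                    // first record on page 3, kept for ranged queries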

Let's skip some more and go to page 5.

1 ... 3 4 5 6 7 ... 457

You get the idea. At each point you see the first, last, and current pages, plus two pages forward and backward from the current page.
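
A small sketch of how that window of visible page numbers could be computed (navbar_pages is an illustrative name, not part of any library):

function navbar_pages(current, last) {
    var pages = [1];
    for (var p = Math.max(2, current - 2); p <= Math.min(last - 1, current + 2); p++) {
        pages.push(p);
    }
    if (last > 1) pages.push(last);
    return pages;   // render "..." wherever consecutive numbers differ by more than 1
}

navbar_pages(5, 457);   // [1, 3, 4, 5, 6, 7, 457] -> rendered as 1 ... 3 4 5 6 7 ... 457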

Queries

var current_id; // id of first record on current page.

// go to page current+N
db.collection.find({_id: {$gte: current_id}}).
              skip(N * page_size).
              limit(page_size).
              sort({_id: 1});

// go to page current-N
// note that due to the nature of skipping back,
// this query will get you records in reverse order 
// (last records on the page being first in the resultset)
// You should reverse them in the app.
db.collection.find({_id: {$lt: current_id}}).
              skip((N-1)*page_size).
              limit(page_size).
              sort({_id: -1});
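
For the previous-page query, a minimal sketch of that in-app reversal step (pulling the cursor into an array first):

// previous-page results arrive newest-first; reverse them before display
var prev_page = db.collection.find({_id: {$lt: current_id}})
                             .sort({_id: -1})
                             .skip((N - 1) * page_size)
                             .limit(page_size)
                             .toArray()
                             .reverse();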

It's hard to give a general answer because it depends a lot on what query (or queries) you are using to construct the set of results being displayed. If the results can be found using only the index and are presented in index order, then db.dataset.find().limit().skip() can perform well even with a large number of skips. This is likely the easiest approach to code up. But even in that case, if you can cache page numbers and tie them to index values, you can make it faster for the second and third person who wants to view page 71, for example.
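
A rough sketch of that caching idea, assuming some application-side cache keyed by page number (page_cache and its get/set are illustrative, not a real API):

var cached = page_cache.get(71);                       // e.g. {first_id: ObjectId(...)} or null
var cursor;
if (cached) {
    // boundary _id is known: ranged query, no skip needed
    cursor = db.dataset.find({_id: {$gte: cached.first_id}})
                       .sort({_id: 1})
                       .limit(page_size);
} else {
    // first visitor to page 71 pays the skip cost once
    cursor = db.dataset.find()
                       .sort({_id: 1})
                       .skip(70 * page_size)
                       .limit(page_size);
}
var docs = cursor.toArray();
page_cache.set(71, {first_id: docs[0]._id});           // cheaper for the next person who wants page 71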

In a very dynamic dataset, where documents are added and removed while someone else is paging through the data, such a cache will become out of date quickly, and the limit-and-skip method may be the only one reliable enough to give good results.