API pagination best practices

I'm not completely sure how your data is handled, so this may or may not work, but have you considered paginating with a timestamp field?

When you query /foos you get 100 results. Your API should then return something like this (assuming JSON, but if it needs XML the same principles can be followed):

{
    "data" : [
        {  data item 1 with all relevant fields    },
        {  data item 2   },
        ...
        {  data item 100 }
    ],
    "paging":  {
        "previous":  "http://api.example.com/foo?since=TIMESTAMP1" 
        "next":  "http://api.example.com/foo?since=TIMESTAMP2"
    }

}

Just a note, only using one timestamp relies on an implicit 'limit' in your results. You may want to add an explicit limit or also use an until property.

The timestamp can be dynamically determined using the last data item in the list. This seems to be more or less how Facebook paginates in its Graph API (scroll down to the bottom to see the pagination links in the format I gave above).

One problem may be if you add a data item, but based on your description it sounds like they would be added to the end (if not, let me know and I'll see if I can improve on this).


If you've got pagination you also sort the data by some key. Why not let API clients include the key of the last element of the previously returned collection in the URL and add a WHERE clause to your SQL query (or something equivalent, if you're not using SQL) so that it returns only those elements for which the key is greater than this value?


You have several problems.

First, you have the example that you cited.

You also have a similar problem if rows are inserted, but in this case the user get duplicate data (arguably easier to manage than missing data, but still an issue).

If you are not snapshotting the original data set, then this is just a fact of life.

You can have the user make an explicit snapshot:

POST /createquery
filter.firstName=Bob&filter.lastName=Eubanks

Which results:

HTTP/1.1 301 Here's your query
Location: http://www.example.org/query/12345

Then you can page that all day long, since it's now static. This can be reasonably light weight, since you can just capture the actual document keys rather than the entire rows.

If the use case is simply that your users want (and need) all of the data, then you can simply give it to them:

GET /query/12345?all=true

and just send the whole kit.