Approaches to generate auto-incrementing numeric ids in CouchDB
Keeping in mind the issues around replication and conflicts, you can use an update function to generate incrementing IDs that are guaranteed unique in a single master setup.
function(doc, req) {
if (!doc) {
doc = {
_id: req.id,
type: 'idGenerator',
count: 0
};
}
doc.count++;
return [doc, toJSON(doc.count)];
}
Include this function in a design document like so:
{
"_id": "_design/application",
"language": "javascript",
"updates": {
"generateId": "function (doc, req) {\n\t\t\tif (!doc) {\n\t\t\t\tdoc = {\n\t\t\t\t\t_id: req.id,\n\t\t\t\t\ttype: 'idGenerator',\n\t\t\t\t\tcount: 0\n\t\t\t\t};\n\t\t\t}\n\n\t\t\tdoc.count++;\n\t\t\t\n\t\t\treturn [doc, toJSON(doc.count)];\n\t\t}"
}
}
Then call it like so:
curl -XPOST http://localhost:5984/mydb/_design/application/_update/generateId/entityId
Replace entityId
with whatever you like to create several independent ID sequences.
As Dominic Barnes says, auto-increment integers are not scalable, not distributed-friendly or cloud-friendly. It seems every app nowadays needs a mobile version with offline support, and that is not directly compatible with auto-increment integers. We all know this, but it's true: auto-increment integers are necessary for legacy code and arguably other stuff.
In both scenarios, you are responsible for producing the auto-incrementing integer. A view is running emit(the_numeric_id, null)
. (You could also have a "type" namespace, e.g. by emit([doc.type, the_numeric_id], null)
. Query for the final row (e.g. with a startkey=MAXINT&descending=true&limit=1
, increment the value returned, and that is your next id. The attempt to save is in a loop which can retry if there was a collision.
You can also play tricks if you don't need 100% density of the list of IDs. For example, you can add timestamps to the emit()
rows, and estimate the document creation velocity, and increment by that velocity times your computation and transmit time. You could also simply increment by a random integer between 1 and N, so most of the time the first insert works, at a cost of non-homogeneous ID numbers.
About where to store the integer, I think there is the id strategy and the try and check strategy.
The id strategy is simpler and quicker in the short term. Document IDs are an integer (perhaps prefixed with a type to add a namespace). Since Couch guarantees uniqueness on the _id
field, you just worry about the auto-incrementing. Do this in a loop: 409 Conflict
triggers a retry, 201 Accepted
means you're done.
I think the major pain with this trick is, that if and when you get conflicts, you have two completely unrelated documents, and one of them must be copied into a fresh document. If there were relationships with other documents, they must all be corrected. (The CouchDB 0.11 emit(key, {_id: some_foreign_doc_id})
trick comes to mind.)
The try and check strategy uses the default UUID as the doc._id
, so every insert will succeed. Ideally, all or most of your inter-document relations are based on the immutable UUID _id
, not the integer. That is just used for users and UI. The auto-incrementing integer is simply a field in the document, {"int_id":20}
. The view of course does emit(doc.int_id, null)
. (You can look up a document by integer id with a ?key=23?include_docs=true
parameter of the view.
Of course, after a replication, you might have id conflicts (not official CouchDB conflicts, but just documents using the same numeric id). The view which emits by ID would also have a reduce phase: simply _count
should be enough. Next you must patrol the DB, querying this view with ?group=true
and looking for any row (corresponding to an integer id) which has a count > 1. On the plus side, correcting the numeric id of a document is a minor change because it does not require new document creation.
Those are my ideas. Now that I wrote them down, I feel like you must do relation-shepherding regardless of where the id is stored; so perhaps using _id
is better after all. The only other downside I see is that you are permanently married to a fundamentally broken naming model—for some definition of "permanently."
I have had pretty good luck just using an iso formatted date as my key:
http://wiki.apache.org/couchdb/IsoFormattedDateAsDocId
It's pretty simple to do, human-readable and it basically builds in a few querying options by just existing. :-)
Is there any particular reason you want to use numeric IDs over the UUIDs that CouchDB can generate for you? UUIDs are perfect for the distributed paradigm that CouchDB uses, stick with what is built in.
If you find yourself with any more than 1 CouchDB node in your architecture, you're going to get conflicting document IDs if you rely on something like "auto increment" when it comes time for replication. Even if you're only using 1 node now, that's probably not always going to be the case, especially since CouchDB works so well in a distributed and "offline" architecture.