What can go wrong using the same sequence across multiple tables in postgres?
Three possible issues that spring to mind are:
With any shared resource you are creating a potential bottleneck. My gut says that for your peak load this should not be an issue, but I strongly suggest benchmarking any such solution in a production-like, production-sized environment to be sure.
You are essentially assigning meaning to surrogate keys, which defeats part of their purpose in RDB theory. A surrogate key by its nature should not have meaning beyond being a key for identifying tuples in that relation. If the entities might have meaning together and so need collision-free keys, is it correct that they are being modelled separately, or has something been missed in the requirements and/or data model design?
You are introducing a potential point of failure. What if a deployment doesn't get its initial sequence starting point set? You then either have a deployment blocking error or deployments start from the same place "breaking" your feature. Also, what will you do if somewhere down the line someone thinks it is a good idea to branch a deployment (in production perhaps a tenant company divests part of itself and needs to separate out the data). What if the seed somehow gets reset by a bad upgrade deployment or other migration?[0]
If none of those issues concerns you then go ahead, the idea isn't going to break anything IMO. Of course there may be better ways even if this one isn't wrong in itself.
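For reference, and assuming I have the postgres syntax right (I'm more at home in SQL Server), the pattern being discussed would look roughly like this - names are purely illustrative:

CREATE SEQUENCE shared_id_seq;

CREATE TABLE customer ( customer_id bigint PRIMARY KEY DEFAULT nextval('shared_id_seq') );
CREATE TABLE invoice  ( invoice_id  bigint PRIMARY KEY DEFAULT nextval('shared_id_seq') );
-- every table draws from the same sequence, so ids never collide across tables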
When you say "UUID-lite" you imply that you have already considered and discounted UUIDs. Is that the case, and if so are there particular reasons for deciding they are not suitable for this project?
One possible reason for not using UUIDs is index fragmentation, though the significance of that is often greatly overstated[1]. SQL Server's answer to this is the "sequential GUID", which is pretty much equivalent to what you are suggesting if we discount assigning meaning to key values - perhaps postgres has an equivalent to that? Of course, always-increasing indexes can have their own performance issues (last-page contention, index stats growing stale) in some very specific high-volume workloads[2].
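If it doesn't have one built in, the rough idea can be sketched by hand: put a millisecond timestamp in the leading bytes and random bits in the rest, so values sort approximately by creation time. This is only an illustration (it assumes the pgcrypto extension for gen_random_bytes() and does not set the standard UUID version/variant bits), not something I have tested in anger:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- first 6 bytes: millisecond-precision timestamp; remaining 10 bytes: random
CREATE FUNCTION seq_uuid() RETURNS uuid AS $$
  SELECT encode(
           overlay(gen_random_bytes(16)
                   PLACING substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3 FOR 6)
                   FROM 1 FOR 6),
           'hex')::uuid;
$$ LANGUAGE sql;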
Another common argument against UUIDs is key length: why use 16 bytes per value when 4 or 8 will suffice? If the uniqueness is truly a useful property then this will usually trump key-size concerns significantly. If key size is a concern but you are happy to use a 64-bit INT rather than needing to keep inside 32 bits, you could use your technique without adding a potential shared-resource contention issue by doing your seeded-integer-key idea per table[3] using a normal INT IDENTITY(<start>, 1)[4] column definition, though again this is adding deployment complexity (a small amount, but certainly not zero).
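In postgres the per-table equivalent would, I believe, be an identity column with a per-table START value. Purely as a hypothetical worked example of the layout in [3], for "database 3, table 17":

-- start = (3 << 48) | (17 << 32) = 844424930131968 + 73014444032 = 844497944576000
CREATE TABLE invoice (
    invoice_id bigint GENERATED ALWAYS AS IDENTITY (START WITH 844497944576000) PRIMARY KEY
);
-- leaves the low 4 bytes (~4.29 billion values) as this table's own incrementing range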
Human readability is sometimes cited as a problem, but that goes back to assigning meaning to surrogate keys.
Compressibility is a less common concern but one you might come across. To just about any compression algorithm UUIDs are likely to look like random (therefore incompressible) data unless you are using something like SQL Server's sequential UUIDs. This might be a concern for a very large set of links (or other block of data) that contains many entity IDs served to an application over a slow network, or if needing to use something like SQL Server's index compression features, though both these matters are essentially just restating the key-size concern in a slightly different way and sequential UUIDs may help here too.
[0] this could happen for normal identity columns too of course, but as you are using a less common feature you are increasing the chance that a less experienced DBA who comes after you will miss the problem if it happens once you are off doing something new and exciting elsewhere!
[1] I'm a SQL Server guy, I suspect the potential issue is the same in postgres but for all I know it may have a different index layout that can mitigate the effect.
[2] Though again these may be SQL Server specific, especially the latter of the two examples I listed
[3] Top two bytes: vary by database, next two: vary by table, remaining four: the incrementing bits
[4] That is MS SQL Server syntax, postgres syntax may vary but you should see what I mean and be able to translate
tl;dr: if you find yourself reinventing the wheel, make sure that all of the existing designs really aren't suitable before starting to consider why a new one might or might not be.
We are considering using a shared sequence to assign ids to primary keys for all of the tables in our database. There are about 100 of them. Only a couple are inserted into frequently and regularly. We want to rule out it being "a terrible idea for an obvious reason" before we move to the phase of actually trying it and testing it at load.
That's a horrible idea: rule it out. Just use a GUID/UUID. Why did you rule out that idea? In PostgreSQL we use uuid-ossp's uuid_generate_v4().
This function generates a version 4 UUID, which is derived entirely from random numbers.
Like this,
CREATE EXTENSION "uuid-ossp";
CREATE TABLE f ( f_id uuid DEFAULT uuid_generate_v4() );
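On newer PostgreSQL (13 and up, if I remember correctly) you can skip the extension entirely and use the built-in gen_random_uuid() the same way:

CREATE TABLE f ( f_id uuid DEFAULT gen_random_uuid() );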
You make a lot of assumptions in your answer in order for it to be valid,
- speed "shouldn't be an issue"
- gaps "shouldn't be an issue"
- id exhaustion won't happen
You don't have to assume any of that. What if you get a DoS on the ID creation, creating a massive gap and pushing rollover on one shard? Why not just use the industry solution for this problem? It's not clear that there is a single drawback. It's likely all win. Except for a few bytes of storage.