What data type is recommended for ID columns?

A big disadvantage of using GUID keys is that it is difficult to perform "ad-hoc" queries by hand. Sometimes it is very useful to be able to type something like this:

SELECT * FROM User WHERE UserID = 452245;

With GUID keys this can become very annoying.

I would recommend 64-bit integers.

Any integer type of sufficient size to store the anticipated data range. Generally, 32-bit ints are viewed as too small (rightly or wrongly) for tables with a lot of rows or changes. A 64-bit int is plenty. Many databases don't have or don't use that integer type and instead use a NUMBER type with a specified scale and precision; 10-15 digits is a fairly common size.
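To put rough numbers on "too small" versus "plenty", here is a minimal sketch. The insert rate of 1,000 rows per second is an arbitrary assumption chosen for illustration, and `years_until_exhausted` is a hypothetical helper, not something from any database API.

```python
# Largest values a signed 32-bit and a signed 64-bit key can hold.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_exhausted(max_value: int, inserts_per_second: int) -> float:
    """Years until a counter starting near zero reaches max_value
    (hypothetical helper; assumes a constant insert rate)."""
    return max_value / inserts_per_second / SECONDS_PER_YEAR


print(INT32_MAX)            # 2147483647 (10 decimal digits)
print(INT64_MAX)            # 9223372036854775807 (19 decimal digits)
print(years_until_exhausted(INT32_MAX, 1000))  # well under one year
print(years_until_exhausted(INT64_MAX, 1000))  # hundreds of millions of years
```

At that assumed rate a 32-bit key runs out within weeks, while a 64-bit key is effectively inexhaustible.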

The reason for choosing integer types is twofold:

Size; and
Speed.

The size of an integer is:

32 bit: 4 bytes;
64 bit: 8 bytes;
Binary coded decimal: two digits per byte plus as much as a byte for sign, scale and/or precision.

Compare that to a GUID, which is 128 bits (16 bytes), or a normal string, which is at least one byte per character (more in certain character encodings) plus an overhead that might be as little as one byte (a terminating null) or could be much more in some cases.
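The sizes above can be checked with a short sketch using Python's standard `struct` and `uuid` modules. The `bcd_digit_bytes` helper is a hypothetical illustration of "two digits per byte" and leaves out the sign/scale overhead, which varies by database.

```python
import struct
import uuid

# Fixed-width integer sizes ("<" selects standard sizes, independent of platform).
int32_size = struct.calcsize("<i")
int64_size = struct.calcsize("<q")

# A GUID stored in binary form versus as its usual hyphenated text form.
guid = uuid.uuid4()
guid_binary_size = len(guid.bytes)
guid_text_size = len(str(guid).encode("utf-8"))


def bcd_digit_bytes(digits: int) -> int:
    """Bytes needed for the digits alone at two digits per byte
    (hypothetical helper; excludes any sign/scale overhead)."""
    return (digits + 1) // 2


print(int32_size)           # 4
print(int64_size)           # 8
print(guid_binary_size)     # 16
print(guid_text_size)       # 36
print(bcd_digit_bytes(15))  # 8
```

A GUID stored as text is therefore more than four times the size of a 64-bit integer, before any per-string overhead is counted.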

Sorting integers is trivial and, assuming they are unique and the range is sufficiently small, can actually be done in O(n) time, compared to O(n log n) at best for a general comparison sort.
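One way to get the linear-time behaviour for unique integers in a small range is a presence-array sort, sketched below. This is an illustration only, not how any particular database sorts keys; `sort_unique_small_range` is a hypothetical name, and the running time is O(n + k), where k is the size of the range, so it is O(n) only when k is proportional to n.

```python
def sort_unique_small_range(values, max_value):
    """Sort unique non-negative integers no larger than max_value.

    Marks each value in a flag array, then reads the flags back in
    index order. No comparisons between elements are needed.
    """
    present = bytearray(max_value + 1)
    for v in values:
        present[v] = 1
    return [i for i, flag in enumerate(present) if flag]


print(sort_unique_small_range([5, 3, 9, 0, 7], 9))  # [0, 3, 5, 7, 9]
```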

Also, just as importantly, most databases can generate unique IDs by means of auto-increment columns and/or sequences. Guaranteeing uniqueness in an application is otherwise actually quite hard and tends to result in bloated keys.
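As a small illustration of database-generated keys, here is a sketch using SQLite through Python's `sqlite3` module. The `users` table and its columns are made up for the example, and other databases use different syntax (identity columns, sequences, and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "  user_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    "  name TEXT NOT NULL"
    ")"
)

# The application never supplies user_id; the database assigns it.
ids = []
for name in ["alice", "bob", "carol"]:
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    ids.append(cur.lastrowid)
conn.commit()

print(ids)  # [1, 2, 3]

# Integer keys also keep ad-hoc lookups easy to type by hand.
row = conn.execute("SELECT name FROM users WHERE user_id = 2").fetchone()
print(row)  # ('bob',)
```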

Plus auto-generated integer keys are typically either loosely or absolutely ordered (depending on database and configuration), which is a useful quality. Randomly generated GUIDs are basically unordered, which is far less useful.
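The ordering difference is easy to see in a sketch. Here `itertools.count` stands in for a database sequence (an assumption made to keep the example self-contained), and `uuid.uuid4` generates random GUIDs.

```python
import itertools
import uuid

# Stand-in for a database sequence: values come out in increasing order.
sequence = itertools.count(start=1)
sequential_ids = [next(sequence) for _ in range(1000)]

# Random (version 4) GUIDs carry no relation to generation order.
random_guids = [uuid.uuid4() for _ in range(1000)]

# Sequential keys are already sorted in the order they were created.
print(sequential_ids == sorted(sequential_ids))  # True

# Random GUIDs are, for all practical purposes, never in sorted order.
print(random_guids == sorted(random_guids))
```

With sequential keys, sorting by ID approximates sorting by creation order and new rows land at the end of an index. With random GUIDs, neither property holds.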

Popular databases have allowed larger autoincrement fields for years now, so running out of IDs is much less of an issue.

As for what to use, it's always a choice. Neither is clearly better than the other; they have different characteristics, and each is good in different scenarios. I have used both over time, and I'll consider both for the next schema I work with.

Pros for GUID:

Should be unique across computers.
Random, unmemorable goo means people are likely to use this only for its intended purpose of an opaque identifier.

Pros for autoincrement:

Human understandable.
Sequential assignment means you can use a clustered index and improve performance.
Suitable for data partitioning.


