Pros and cons of using MD5 Hash as the primary key vs. use a int identity as the primary key in SQL Server
An int key is simpler to implement and easier to use and understand. It's also smaller (4 bytes vs 16 bytes), so indexes will fit about double the number of entries per IO page, meaning better performance. The table rows too will be smaller (OK, not much smaller), so again you'll fit more rows per page = less IO.
Hash can always produce collisions. Although exceedingly rare, nevertheless, as the birthday problem shows, collisions become more and more likely as record count increases. The number of items needed for a 50% chance of a collision with various bit-length hashes is as follows:
Hash length (bits) Item count for 50% chance of collision
32 77000
64 5.1 billion
128 22 billion billion
256 400 billion billion billion billion
There's also the issue of having to pass around non-ascii bytes - harder to debug, send over wire, etc.
Use int
sequential primary keys for your tables. Everybody else does.
Use ints for primary keys, not hashes. Everyone warns about hash collisions, but in practice they are not a big problem; it's easy to check for collisions and re-hash. Sequential IDs can collide as well if you merge databases.
The big problem with hashes as keys is that you cannot change your data. If you try, your hash will change and all foreign keys become invalid. You have to create a “no, this is the real hash” column in your database and your old hash just becomes a big nonsequential integer.
I bet your business analyst will say “we implement WORM so our records will never change”. They will be proven wrong.