Generating human-readable/usable, short but unique IDs
Base 62 is used by tinyurl and bit.ly for the abbreviated URLs. It's a well-understood method for creating "unique", human-readable IDs. Of course you will have to store the created IDs and check for duplicates on creation to ensure uniqueness. (See code at bottom of answer)
Base 62 uniqueness metrics
5 chars in base 62 will give you 62^5 unique IDs = 916,132,832 (~1 billion) At 10k IDs per day you will be ok for 91k+ days
6 chars in base 62 will give you 62^6 unique IDs = 56,800,235,584 (56+ billion) At 10k IDs per day you will be ok for 5+ million days
Base 36 uniqueness metrics
6 chars will give you 36^6 unique IDs = 2,176,782,336 (2+ billion)
7 chars will give you 36^7 unique IDs = 78,364,164,096 (78+ billion)
Code:
public void TestRandomIdGenerator()
{
// create five IDs of six, base 62 characters
for (int i=0; i<5; i++) Console.WriteLine(RandomIdGenerator.GetBase62(6));
// create five IDs of eight base 36 characters
for (int i=0; i<5; i++) Console.WriteLine(RandomIdGenerator.GetBase36(8));
}
public static class RandomIdGenerator
{
private static char[] _base62chars =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
.ToCharArray();
private static Random _random = new Random();
public static string GetBase62(int length)
{
var sb = new StringBuilder(length);
for (int i=0; i<length; i++)
sb.Append(_base62chars[_random.Next(62)]);
return sb.ToString();
}
public static string GetBase36(int length)
{
var sb = new StringBuilder(length);
for (int i=0; i<length; i++)
sb.Append(_base62chars[_random.Next(36)]);
return sb.ToString();
}
}
Output:
z5KyMg wd4SUp uSzQtH UPrGAT UIf2IS QCF9GNM5 0UV3TFSS 3MG91VKP 7NTRF10T AJK3AJU7
I recommend http://hashids.org/ which converts any number (e.g. DB ID) into a string (using salt).
It allows decoding this string back to the number. So you don't need to store it in the database.
Has libs for JavaScript, Ruby, Python, Java, Scala, PHP, Perl, Swift, Clojure, Objective-C, C, C++11, Go, Erlang, Lua, Elixir, ColdFusion, Groovy, Kotlin, Nim, VBA, CoffeeScript and for Node.js & .NET.
I had similar requirements as the OP. I looked into available libraries but most of them are based on randomness and I didn't want that. I could not really find anything that was not based on random and still very short... So I ended up rolling my own based on the technique Flickr uses, but modified to require less coordination and allow for longer periods offline.
In short:
- A central server issues ID blocks consisting of 32 IDs each
- The local ID generator maintains a pool of ID blocks to generate an ID every time one is requested. When the pool runs low it fetches more ID blocks from the server to fill it up again.
Disadvantages:
- Requires central coordination
- IDs are more or less predictable (less so than regular DB ids but they aren't random)
Advantages
- Stays within 53 bits (Javascript / PHP max size for integer numbers)
- very short IDs
- Base 36 encoded so very easy for humans to read, write and pronounce
- IDs can be generated locally for a very long time before needing contact with the server again (depending on pool settings)
- Theoretically no chance of collissions
I have published both a Javascript library for the client side, as well as a Java EE server implementation. Implementing servers in other languages should be easy as well.
Here are the projects:
suid - Distributed Service-Unique IDs that are short and sweet
suid-server-java - Suid-server implementation for the Java EE technology stack.
Both libraries are available under a liberal Creative Commons open source license. Hoping this may help someone else looking for short unique IDs.
I used base 36 when I solved this problem for an application I was developing a couple of years back. I needed to generate a human readable reasonably unique number (within the current calendar year anyway). I chose to use the time in milliseconds from midnight on Jan 1st of the current year (so each year, the timestamps could duplicate) and convert it to a base 36 number. If the system being developed ran into a fatal issue it generated the base 36 number (7 chars) that was displayed to an end user via the web interface who could then relay the issue encountered (and the number) to a tech support person (who could then use it to find the point in the logs where the stacktrace started). A number like 56af42g7 is infinitely easier for a user to read and relay than a timestamp like 2016-01-21T15:34:29.933-08:00 or a random UUID like 5f0d3e0c-da96-11e5-b5d2-0a1d41d68578.