Encoding name strings into a unique number

To get numbers of the same width, can't you just zero-pad on the left?

Some options:

  1. Sort them. Count them. The 10th name is number 10.
  2. Treat each character as a digit in a base 26 (case insensitive, no digits) or 52 (case sensitive, no digits) or 36 (case insensitive with digits) or 62 (case sensitive with digits) number. Compute the value in an integer type wide enough for your longest name; a 32-bit int only holds six base-26 characters safely. E.g., for a name of "abc", you'd have 0 * 26^2 + 1 * 26^1 + 2 * 26^0 = 28. Sometimes Chinese names may use digits to indicate tonality.
  3. Use a "perfect hashing" scheme: http://en.wikipedia.org/wiki/Perfect_hash_function
  4. This one's mostly suggested in fun: use Gödel numbering :). So "abc" would be 2^0 * 3^1 * 5^2 = 75, a product of powers of primes. Factoring the number gives you back the characters. The numbers could get quite large though.
  5. Convert to ASCII, if you aren't already using it. Then treat each ordinal of a character as a digit in a base-256 numbering system. So "abc", whose ASCII ordinals are 97, 98 and 99, is 97*256^2 + 98*256^1 + 99*256^0 = 6382179.
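
Option 2 can be sketched as follows for the base-26 case. This is only an illustration: the class and method names are made up, and it uses BigInteger so long names don't overflow. Note that with a = 0 a leading "a" is a leading zero, so "abc" and "bc" map to the same number; the sketch therefore asks for the name length when decoding.

```java
import java.math.BigInteger;
import java.util.Locale;

// Hypothetical helper: base-26 encoding of names made of the letters a-z.
final class Base26Names {
    private static final BigInteger BASE = BigInteger.valueOf(26);

    // Treats each letter as a base-26 digit (a = 0, ..., z = 25), most significant first.
    static BigInteger encode(String name) {
        BigInteger value = BigInteger.ZERO;
        for (char c : name.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c < 'a' || c > 'z') {
                throw new IllegalArgumentException("unsupported character: " + c);
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(c - 'a'));
        }
        return value;
    }

    // Rebuilds a name of the given length; the length restores any leading 'a' digits.
    static String decode(BigInteger value, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(BASE);
            out[i] = (char) ('a' + quotientAndRemainder[1].intValue());
            value = quotientAndRemainder[0];
        }
        return new String(out);
    }
}
```

If the collision between "abc" and "bc" matters, one way around it is to number the letters from 1 instead of 0, at the cost of slightly larger values.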

If you need to be able to update your list of names and numbers from time to time, #2, #4 and #5 should work. #1 and #3 would have problems, because adding a name can change the numbers already assigned. #5 is probably the most future-proof, though you may find you need Unicode at some point.

I believe you could handle Unicode as a variant of #5, using each code point as a digit and powers of 2^32 instead of 2^8 = 256.
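
A rough sketch of that Unicode variant, assuming each code point becomes one 32-bit "digit". The class name is invented for the example, and a string starting with U+0000 would lose that leading character, just like a leading zero.

```java
import java.math.BigInteger;

// Hypothetical helper: one Unicode code point per base-2^32 digit.
final class CodePointNumber {
    private static final BigInteger DIGIT_MASK = BigInteger.valueOf(0xFFFFFFFFL);

    // Shifts the running value left by 32 bits and adds the next code point.
    static BigInteger encode(String s) {
        BigInteger value = BigInteger.ZERO;
        for (int codePoint : s.codePoints().toArray()) {
            value = value.shiftLeft(32).or(BigInteger.valueOf(codePoint));
        }
        return value;
    }

    // Peels off the lowest 32 bits repeatedly, prepending each code point.
    static String decode(BigInteger value) {
        StringBuilder sb = new StringBuilder();
        while (value.signum() > 0) {
            int codePoint = value.and(DIGIT_MASK).intValue();
            sb.insert(0, Character.toChars(codePoint));
            value = value.shiftRight(32);
        }
        return sb.toString();
    }
}
```

Since code points never exceed 0x10FFFF, 32 bits per digit wastes some space; it is used here only because it matches the suggestion above.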


What you are trying to do there is actually hashing (at least if you have a fixed number of digits). There are some good hashing algorithms with few collisions. Try SHA-1, for example: it is well tested and available for most modern languages (see http://en.wikipedia.org/wiki/Sha1). It seems to be good enough for git, so it might work for you.

There is of course a small possibility of identical hash values for two different names, but that's always the case with hashing and can be taken care of. With SHA-1 and the like you won't have any obvious connection between names and IDs, which can be a good or a bad thing, depending on your problem.
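
As a sketch of the hashing approach in Java, assuming you want a fixed number of decimal digits: hash the name with SHA-1 and reduce the result modulo 10^digits. The class and method names are hypothetical. Cutting a 160-bit hash down to a few digits makes collisions much more likely than with the full hash, so you would still need to check for them.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: derives a fixed-width decimal ID from a name via SHA-1.
final class NameHasher {
    static String fixedWidthId(String name, int digits) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(name.getBytes(StandardCharsets.UTF_8));
            // Interpret the 20 digest bytes as a non-negative integer.
            BigInteger full = new BigInteger(1, digest);
            // Keep only the lowest 'digits' decimal digits, then zero-pad on the left.
            BigInteger reduced = full.mod(BigInteger.TEN.pow(digits));
            return String.format(Locale.ROOT, "%0" + digits + "d", reduced);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

The same name always yields the same ID, but the name cannot be recovered from the ID.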

If you really want IDs that are guaranteed unique, you will need to do something like NealB suggested: create the IDs yourself and connect names and IDs in a database (you could create them randomly and check for collisions, or increment them, starting at 0000000000001 or so).
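
A minimal sketch of the incrementing variant, with an in-memory map standing in for the database table. Everything here (class name, method names, the 13-digit width) is an assumption for illustration; a real setup would more likely use an auto-increment column with a unique constraint on the name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a database table mapping names to sequential IDs.
final class NameRegistry {
    private final Map<String, Long> idsByName = new HashMap<>();
    private final Map<Long, String> namesById = new HashMap<>();
    private long nextId = 1;

    // Returns the existing ID for a known name, or assigns the next free one.
    long idFor(String name) {
        Long existing = idsByName.get(name);
        if (existing != null) {
            return existing;
        }
        long id = nextId++;
        idsByName.put(name, id);
        namesById.put(id, name);
        return id;
    }

    // Looks up the name for an ID; returns null if the ID was never assigned.
    String nameFor(long id) {
        return namesById.get(id);
    }

    // Zero-pads an ID on the left to 13 digits.
    static String format(long id) {
        return String.format(Locale.ROOT, "%013d", id);
    }
}
```

Because IDs are handed out in order of arrival rather than derived from the name, adding names later never changes the numbers already assigned.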

(improved answer after giving it some thought and reading the first comments)


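You can use Java's BigInteger to encode arbitrary non-empty strings like this (naming the charset explicitly so the result doesn't depend on the platform default):

BigInteger bi = new BigInteger("some string".getBytes(StandardCharsets.UTF_8));

And to get the string back, use:

String str = new String(bi.toByteArray(), StandardCharsets.UTF_8);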
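
If you would rather never get a negative number, a variant using the signum-and-magnitude constructor might look like this. It is a sketch with invented names, and it assumes the string does not start with a NUL character, which would be lost as a leading zero.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: non-negative BigInteger round trip for strings.
final class StringNumberCodec {
    // Signum 1 makes BigInteger read the bytes as an unsigned magnitude.
    static BigInteger encode(String s) {
        return new BigInteger(1, s.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(BigInteger n) {
        if (n.signum() == 0) {
            return ""; // the empty string encodes to zero
        }
        byte[] bytes = n.toByteArray();
        // toByteArray() is two's complement, so it may prepend a 0x00 sign byte.
        if (bytes.length > 1 && bytes[0] == 0) {
            bytes = Arrays.copyOfRange(bytes, 1, bytes.length);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For plain ASCII input this gives the same value as the base-256 scheme, e.g. 6382179 for "abc".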