Mapping of strings to integers

This is impossible to achieve without any restrictions, simply because there are more possible Strings than there are integers, so eventually you will run out of numbers.

A solution is only possible when you limit the number of usable Strings. Then you can use a simple counter. Here is a simple implementation where all (2^32 = 4294967296 different strings) can be used. Never mind that it uses lots of memory.

import java.util.HashMap;
import java.util.Map;

public class StringToInt {

    private Map<String, Integer> map;

    private int counter = Integer.MIN_VALUE;

    public StringToInt() {
        map = new HashMap<String, Integer>();
    }

    public int toInt(String s) {
        Integer i = map.get(s);
        if (i == null) {
            map.put(s, counter);
            i = counter;
            ++counter;
        }
        return i;
    }
}

Have a look at perfect hashing.


In most hashcode() type implementations, collisions are accepted as inevitable and tested for.

If you absolutely must have no collisions, guaranteed, the solution you outline will work.

Aside from this, there are cryptographic hash functions such as MD5 and SHA, where collisions are extremely unlikely (though with a lot of effort can be forced). The Java Cryptography Architecture has implementations of these. Those methods may perhaps be faster than a good implementation of your solution for very large sets. They will also execute in constant time and give the same code for the same string, no matter which order the strings are added in. Also, it doesn't require storing each string. Crypto hash results could be considered as integers but they won't fit in a java int - you could use a BigInteger to hold them as suggested in another answer.

Incidentally, if you're put off by the idea of a collision being 'extremely unlikely', it's probably similar likelihood that a bit would randomly flip in your computer memory or hard disk and cause any program to behave differently than you expect :-)

Note, there are also some theoretical weaknesses in some hash functions (e.g. MD5) but for your purposes that probably doesn't matter and you could just use the most efficient such function - those weaknesses are only relevant if someone is maliciously trying to come up with strings that have the same code as another string.

edit: I just noticed in the title of your question, it seems you want bidirectional mapping, though you don't actually state this in the question. It is (by design) not possible to go from a Crypto hash to the original string. If you really need that, you'd have to store a map keying hashes back to strings.


There's not going to be an easy or complete solution. We use hashes because there are way more possible Strings than there are ints. Collisions are just a limitation of using a finite number of bits to represent integers.

Tags:

Java