What's a good hash function for English words?

To simply sum the letters is not a good strategy because a permutation gives the same result.

This one (djb2) is quite popular and works nicely with ASCII strings.

unsigned long hashstring(unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while (c = *str++)
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}

More info here.

If you need more alternatives and some perfomance measures, read here.

Added: These are general hashing functions, where the input domain is not known in advance (except perhaps some very general assumptions: eg the above works slightly better with ascii input), which is the most usual scenario. If you have a known restricted domain (set of inputs fixed) you can do better, see Fionn's answer.


Maybe something like this would help you: http://www.gnu.org/s/gperf/

It generates a optimized hashing function for the input domain.


If you don't need it be cryptographically secure, I would suggest the Murmur Hash. It's extremely fast and has high diffusion. Easy to use.

http://en.wikipedia.org/wiki/MurmurHash

http://code.google.com/p/smhasher/wiki/MurmurHash3

If you do need a cryptographically secure hash, then I suggest SHA1 via OpenSSL.

http://www.openssl.org/docs/crypto/sha.html

Tags:

C++

C

Hash