Two-way "Hashing" of string

Reducing an arbitrary length string to a fixed size int is mathematically impossible to reverse. See Pidgeonhole principle. There is a near infinite amount of strings, but only 2^32 32 bit integers.

32 bit hashes(assuming your int is 32 bit) can have collisions very easily. So it's not a good unique ID either.

There are hashfunctions which allow you to create a message with a predefined hash, but it most likely won't be the original message. This is called a pre-image.

For your problem it looks like the best idea is creating a dictionary that maps integer-ids to strings and back.


To get the likelyhood of a collision when you hash n strings check out the birthday paradox. The most important property in that context is that collisions become likely once the number of hashed messages approaches the squareroot of the number of available hash values. So with a 32 bit integer collisions become likely if you hash around 65000 strings. But if you're unlucky it can happen much earlier.


I have exactly what you need. It is called a "pointer". In this system, the "pointer" is always unique, and can always be used to recover the string. It can "point" to any string of any length. As a bonus, it also has the same size as your int. You can obtain a "pointer" to a string by using the & operand, as shown in my example code:

#include <string>
int main() {
    std::string s = "Hai!";
    std::string* ptr = &s; // this is a pointer
    std::string copy = *ptr; // this retrieves the original string
    std::cout << copy; // prints "Hai!"
}

What you need is encryption. Hashing is by definition one way. You might try simple XOR Encryption with some addition/subtraction of values.

  • Reversible hash function?
  • How come MD5 hash values are not reversible?
  • checksum/hash function with reversible property
  • http://groups.google.com/group/sci.crypt.research/browse_thread/thread/ffca2f5ac3093255

... and many more via google search...

Tags:

C++

Hash