What's the best way to store and yet still index encrypted customer data?

I'm currently looking for a solution to this same problem.

One of the best ideas I've found, is this article from Raul García, https://docs.microsoft.com/en-us/archive/blogs/raulga/indexing-encrypted-data.

He suggests using a MAC, to create an indexable column. The solution is for MS SQL Server, but it could be applied to another system.


Update: You'll want to check out CipherSweet instead of rolling your own design. It takes care of a lot of subtle security details and has a straightforward security argument.


Hash functions aren't the solution here. As the accepted answer suggests, indexing encrypted data requires a "blind index", facilitated by a MAC.

Let's say you're encrypting social security numbers. When you insert them into the database, you might do something like this:

$ssn_encrypted = \Defuse\Crypto\Crypto::encrypt($ssn, $our_encryption_key);
$ssn_blind_idx = \hash_hmac('sha512', $ssn, $our_search_key);

And then store both values in the database. When you need to quickly grab a value based on an SSN input, you can recalculate the HMAC and search based on that.

The database never sees the SSN, and your encryption keys should never be checked into source control (SVN, git, etc.).


You need to use a new class of encryption algorithms called Format Preserving Encryption (search Wiki).

I would be judicious in using such algorithms off-hand simply for the reason that they are relatively new to the literature and it is a thumb rule that you wait for an algorithm to be crypt-analyzed for (say) a decade before you can use it for serious purposes. I am also not sure if there are any standards for such encryption formats. There is only a draft for standard that was submitted in 2010. http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/proposedmodes/ffx/ffx-spec.pdf

So, consider using it judiciously. Do not rely on format-preserving encryption for information that needs a secrecy span of more than (say) 5 years.