Timing Safe String Comparison - Avoiding Length Leak
Being able to process strings of arbitrary length without leaking information on their length seems to be very hard (i.e. I don't see how to do it) because of caches. A very long string, by definition, will take a lot of room, and thus reading the string will incur interaction with the caches. Accessing the string from RAM will trigger cache misses, and also evict other data elements from the caches, impacting the future behaviour of the application code. A cache miss costs dozens or even hundreds of clock cycles: it is at least ten times more visible, from the outside, than a branch misprediction. If you worry about branches, then you should worry a lot more about caches.
However, we can cheat with padding. Assume that you could arrange for the two strings you want to compare to be written at the start of two big, equal-sized buffers full of zeros; also, we suppose that a byte of value 0 cannot appear in a normal string (e.g. these are C strings). Then all you need is to make a leak-free comparison of the two buffers, which have the same length. The buffer length will leak, but it is a fixed, constant and publicly known parameter, so that's not a problem.
This does not solve the problem; it moves it. Now you have to make sure that whatever produced the strings could write them into the buffers without leaking size information. Generally speaking, you no longer have strings. You have binary values of a given fixed length that you copy with a big memcpy(); these values just happen to have a string interpretation in which the bytes are considered to be characters, up to the first byte of value zero.
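To make the padding idea concrete, here is a minimal sketch in C. The names BUF_LEN, pad_into() and buffers_equal() are made up for illustration, and 256 is an arbitrary choice for the public buffer size. Note that pad_into() still calls strlen() and does a length-dependent copy, which is exactly the part that only moves the problem; only buffers_equal() is meant to be independent of the string contents, and even that holds only if the compiler does not rewrite the loop into something with an early exit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed, publicly known buffer size. Every string must fit
   with at least one trailing zero byte. */
#define BUF_LEN 256

/* Copy a C string to the start of a zero-filled BUF_LEN-byte buffer.
   Returns 0 on success, -1 if the string does not fit.
   WARNING: strlen() and the copy of n bytes depend on the string length,
   so this step is NOT leak-free; it only illustrates the buffer layout. */
int pad_into(unsigned char out[BUF_LEN], const char *s)
{
    size_t n = strlen(s);
    if (n >= BUF_LEN) {
        return -1;
    }
    memset(out, 0, BUF_LEN);
    memcpy(out, s, n);
    return 0;
}

/* Compare two BUF_LEN-byte buffers. Every byte of both buffers is read,
   and there is no branch on the data. Returns 1 if equal, 0 otherwise. */
int buffers_equal(const unsigned char a[BUF_LEN],
                  const unsigned char b[BUF_LEN])
{
    unsigned int diff = 0;
    size_t i;

    for (i = 0; i < BUF_LEN; i++) {
        /* Accumulate any differing bits; diff stays in the range 0..255. */
        diff |= (unsigned int)(a[i] ^ b[i]);
    }
    /* If diff == 0, diff - 1 wraps around and bit 8 is set;
       for diff in 1..255, diff - 1 is below 256 and bit 8 is clear. */
    return (int)(((diff - 1u) >> 8) & 1u);
}
```

In a real system the values would ideally be produced directly into fixed-size buffers and moved around with fixed-size copies, so that nothing like pad_into() is ever needed on secret data.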
From a higher point of view, having a "safe string comparison function" is like bringing a bucket aboard the Titanic. If your code is handling secret data, then everything you do with the data is potentially subject to timing attacks. In general, your application can be of two sorts:
If the only secret part is a single cryptographic element and everything else is public, then using a few leak-free primitives makes sense, and will improve overall security. A classic example is a Certification Authority, where the only secret part is the CA private key; as long as the signature algorithm does not leak secrets, the whole system is robust against timing attacks. Similarly, a Web site which does password-based authentication but otherwise contains only public data will be fine.
If the secrecy is spread throughout the system, such as a Web site which does password-based authentication to give access to some confidential data, then concentrating on the string comparison misses the point. The whole server code must be made leak-free, and that is a considerably more difficult endeavour (and we don't really know how to do it).
In any case, trying to protect any given piece of code against side-channel attacks becomes harder when the language is more "high-level". A language such as PHP, with its automatic memory management (the garbage collector) and string management (strings are values just like integers) will not help at all. That's the reason why low-level primitives implemented in C (such as a leak-free string comparison function) must be provided, but the issue is much larger and encompasses a lot of PHP code as well.
If you assume an adversary that can observe memory access patterns through cache leaks then it's silly to try and protect against the adversary learning the length of the secret. He'll always know. The only way to protect against this is to guarantee that you can access past the end of the string without segfaulting - which you almost surely can't without over-allocating every string in the programming language.
Have you researched the needs of the PHP programmers who want this function?
In the practical applications I can think of (verifying passwords, session tokens, etc.) the known string would be relatively small, say < 64 bytes, which fits within one Intel cache line. So your trivial implementation would not actually cause different cache access patterns.
If you really need to compare long strings without leaking length, you should consider comparing hashes instead.