How can I prevent side-channel attacks against authentication?

From the code example provided, it is possible to determine the correct password by timing the code when given various inputs.

First, you shouldn't actually examine the password directly! At the very least, you should hash the password with a password hash like Argon2id first, and compare the password hash of the input with the password hash you stored during user registration (or when the user last changed their password).
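
For illustration, here is a minimal sketch of that flow with libsodium's password-hashing API (the specific functions and limits below are my choice of example, not something prescribed above; password and candidate stand in for whatever your registration and login handlers receive):

#include <sodium.h>
#include <string.h>

/* At registration: store only the hash string (libsodium's default algorithm
   is Argon2id in current versions), never the password itself. */
char stored[crypto_pwhash_STRBYTES];
if (crypto_pwhash_str(stored, password, strlen(password),
                      crypto_pwhash_OPSLIMIT_INTERACTIVE,
                      crypto_pwhash_MEMLIMIT_INTERACTIVE) != 0) {
  /* ran out of memory */
}

/* At login: verify the candidate password against the stored hash string. */
if (crypto_pwhash_str_verify(stored, candidate, strlen(candidate)) == 0) {
  /* password is correct */
}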

Even better, you should use a password-authenticated key agreement protocol like OPAQUE, but these may be impractical to deploy at the moment until they see more widespread adoption and implementation.

What can I do to ensure that my code is not vulnerable to such timing attacks?

The best way to start is to use a library routine or primitive that someone else has already written and has a reason to maintain. For example, in NaCl/libsodium, you can use crypto_verify_32 to compare two 32-byte strings, like two Argon2id hashes, or two HMAC-SHA256 message authentication codes. Then the effort to answer this question can be focused on a single place that will get a lot of attention and scrutiny and will keep up with advances.
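
As a rough sketch of what that looks like in practice (the message, message_len, key, and received variables are placeholders of my own, with key being a 32-byte HMAC key):

#include <sodium.h>

unsigned char expected[crypto_auth_hmacsha256_BYTES];  /* 32 bytes */

/* expected = HMAC-SHA256_key(message), computed on our side */
crypto_auth_hmacsha256(expected, message, message_len, key);

/* crypto_verify_32 compares two 32-byte strings in constant time; 0 means equal */
if (crypto_verify_32(expected, received) == 0) {
  /* the received tag is authentic */
} else {
  /* reject */
}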

But let's say you don't have crypto_verify_32, or you want to implement it yourself. What can you do?

To start, you need to understand what operations have side channels. It's tempting to say—as other answers did—that the side channel arises only because of an early abort. But that's not the whole story. In general, there are many operations (here written in C for illustration) that may take an amount of time which depends on the values of the inputs—we call these operations variable-time operations, in contrast to constant-time*:

  • for (i = 0; i < n; i++) if (x[i] != y[i]) return EFAIL; aborts at the first mismatch and so takes fewer loop iterations the earlier that mismatch occurs; it is practically guaranteed to run in variable time depending on the secret values of x[i] and y[i].

  • A mere secret-dependent conditional for (i = 0; i < n; i++) if (x[i]) bad++;, if x[i] is secret, may run in variable time too even if the loop doesn't abort early. Why?

    • Here's a coarse approximation. The machine instructions that the CPU may execute look something like this:

      0:      tmp := x[i]
              branch to 1 if tmp is zero
              bad := bad + 1
      1:      i := i + 1
              branch to 0 if i < n
      

      The number of instructions executed depends on what the value of x[i] is at each iteration: we skip bad := bad + 1 on some iterations. This is a good model for how early timing attacks on, e.g., RSA worked as in Kocher's seminal paper on timing attacks: the main modular exponentiation loop computes a (say) 2048-bit modular squaring unconditionally, but computes a 2048-bit modular multiplication conditionally depending on the value of the secret exponent. Skipping the multiplication substantially changes the time taken by the whole operation.

    • There's another reason, though, and it has to do with branch prediction, a key design element in what makes modern CPUs run so fast on many workloads. Even if you write the same amount of code in each branch of a conditional (say, the same number of machine instructions, and somehow you guarantee that they take the same number of cycles to compute), the time it takes to execute may depend on which way the condition went.

    • In general, CPUs are bad at keeping which instructions got executed secret, so don't make the choice of instructions depend on secrets.

  • Table/array lookups might take a different amount of time depending on what memory has been cached in the CPU cache. Consequently, if the location in the array that you're reading from depends on a secret, the time it takes might depend on the secret, which has been exploited to recover AES keys by cache timing.

    (This makes AES a rather questionable design in retrospect, with its intentional use of key-dependent table lookups! NIST's published rationale (§3.6.2, Attacks on Implementations: The Role of Operations) curiously claims table lookups are ‘not vulnerable to timing attacks’ despite the numerous such attacks that have been reported since then.)

  • Variable-distance shifting like x = y << z may take more time on some CPUs if z is larger and less time if it is smaller.

    (This makes RC5 and the AES finalist RC6 rather questionable designs in retrospect, with their intentional use of key-dependent rotation distances!)

  • On some CPUs, multiplication may run faster or slower depending on whether the upper half of the inputs is zero or not.

  • 64-bit integer addition on 32-bit CPUs in principle might take more time depending on whether there is a carry. This is because, when x, y, and z are 64-bit integers, the logic x = y + z might look something like:

    /* x, y, and z are 64-bit values represented as pairs of 32-bit halves:
       index 0 is the low half, index 1 is the high half */
    uint32_t carry = 0;
    x[0] = y[0] + z[0];
    if (x[0] < y[0])   /* the low-half addition overflowed */
      carry = 1;
    x[1] = y[1] + z[1] + carry;
    

    Consequently, the time it takes may depend on whether there is a carry from the sum of the low 32-bit halves to the sum of the high 32-bit halves. (In practice, this is usually only a concern on exotic CPUs or for other types of side channels like power analysis that are relevant more to smart cards than to laptops and phones.)

This might sound a little overwhelming. What can we do?

There are some operations that generally do run in constant time on most CPUs. They are:

  • Bitwise operations: x & y, x | y, x ^ y, ~x, and other ones that don't appear in C like AND-with-complement.
  • Constant-distance shifts and rotations like the shift x << 3 or the rotation x <<< 3 (not standard C but common in cryptography; it means (x << 3) | (x >> (32 - 3)), if x is 32-bit).
  • Often integer addition and subtraction: x + y, x - y, when x and y are (say) unsigned 32-bit integers on a 32-bit CPU, and often even 64-bit integers on a 32-bit CPU with the help of ADD-with-carry instructions.
  • Sometimes integer multiplication, but the story about multiplication is complicated, which is unfortunate for cryptography because multiplication mixes bits around quite nicely and has useful algebraic properties.

To be clear: I do not mean that a C compiler guarantees these operations run in constant-time if you use them in a C program; I am merely using C notation for the operations that CPUs generally execute in constant time. (More on this caveat in a moment.)

‘But wait,’ you protest, ‘how can I possibly write a useful program out of these operations? No conditionals? No loops? No arrays?’

First, you don't have to eschew conditionals, loops, or arrays altogether. They just can't depend on secrets. For example, for (i = 0; i < 32; i++) ... x[i] ... is fine. But for (i = 0; i < m[0]; i++) ... is not fine if m[0] is supposed to be secret, and for (i = 0; i < m[0]; i++) ... tab[x[i]] ... is not fine if x[i] is supposed to be secret.

Second, you can still build these things! It's just a little trickier. For example, suppose b is a uint32_t that is either 0 or 1. Then b - 1 is either -1 = 0xffffffff or 0, respectively, so

x = ((b - 1) & z) | (~(b - 1) & y);

causes x = y if b is 1, or x = z if b is 0—much like x = (b ? y : z), but without a branch. Obviously, this requires computing both y and z, so there is some performance impact! Similarly, you can look up an element of a table by looking up all elements of the table and selecting the one you want with bitwise operations like this. Not as fast as x[i], but not as leaky either.
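
For instance, here is a sketch (mine, not a vetted library routine) of such a branch-free table lookup using the same mask trick; it reads every element, so neither the branches taken nor the memory addresses touched depend on secret_index:

#include <stdint.h>
#include <stddef.h>

/* Select table[secret_index] from a small table by reading every element and
   keeping only the one whose mask is all-ones. */
uint32_t ct_table_lookup(const uint32_t *table, size_t n, size_t secret_index)
{
  uint32_t result = 0;
  for (size_t i = 0; i < n; i++) {
    /* b is 1 when i == secret_index and 0 otherwise, computed without a branch:
       i ^ secret_index is zero only on a match, and subtracting 1 from zero
       sets the top bit of the 64-bit difference. */
    uint64_t diff = (uint64_t)(i ^ secret_index);
    uint32_t b = (uint32_t)((diff - 1) >> 63);
    uint32_t mask = 0U - b;     /* all-ones on a match, all-zeros otherwise */
    result |= mask & table[i];
  }
  return result;
}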

In general, you can convert any program with conditionals into a logic circuit with no conditionals, even if you might not want to for performance reasons. There are various other similar tricks you can do. You might draft a constant-time memory equality routine such as crypto_verify_32 like this, assuming x and y are 32-byte uint8_t arrays:

uint32_t result = 0;
for (i = 0; i < 32; i++)
  result |= x[i] ^ y[i];
return ((result - 1) >> 8) & 1;

Exercise: Does this return 0 for equal and 1 for unequal, or 0 for unequal and 1 for equal?

Writing programs like this—and adopting cryptosystems such as X25519 that encourage implementations that look like this, instead of cryptosystems such as RSA or AES that encourage implementations involving secret-dependent branches or secret-dependent table lookups—is a good start for plugging timing side channels.

But, there's a catch! Remember when I said that the C compiler doesn't guarantee constant time? A smart C compiler like Clang/LLVM might recognize that the clever crypto_verify_32 loop above can be executed more efficiently by making it abort early, and might undo the hard work you did to rewrite it as a logic circuit that runs in constant time. (In other circumstances, it might help you out, for example by converting x = (b ? y : z); into a conditional move instruction, CMOV, without branches, but usually you can't rely on the C compiler's good will.)

There are some tricks you can do to thwart this, like an inline assembly fragment that causes the compiler to drop roughly all assumptions for optimization:

uint32_t result = 0;
for (i = 0; i < 32; i++)
  result |= x[i] ^ y[i];
asm volatile ("" ::: "memory");
return ((result - 1) >> 8) & 1;

This may or may not work with your compiler. To be confident, you really have to examine the compiler's generated machine code—and even then, a compiler might perform just-in-time optimizations that rewrite the machine code according to profiling analysis, especially in higher-level languages like Java. So you might really want to write the logic in assembly (or in a programming language like qhasm that can generate the fine-tuned assembly more reliably than a C compiler), and just call it from C.

Maybe some day C compilers will adopt a secret qualifier, like const or volatile, which forces the compiler to generate only machine instructions that are known—in some model of the CPU!—to run in constant time when operating on the object, and prevents the compiler from taking shortcuts like secret-dependent early aborts from a loop. But that day is not here yet.

There's also a matter of which machine instructions actually run in constant time on a CPU, which is sometimes documented and sometimes reliable. So in addition to doing engineering to build your programs out of logic circuits, you also need to do science to figure out which operations are actually safe to use on the CPU.

This brings us back to the original point: You really want to focus the effort of maintaining this into a library routine, so that each programmer doesn't have to keep track of vagaries of compilers (and CPU designs!) in generated code and timing on their own, and can instead leave it to our friendly neighborhood bear.


Are there other countermeasures than constant-time logic? Sometimes, yes.

  • You could inject random noise into your logic, in the hope that it will confuse the attacker's measurements. But there's already noise in their measurements, like scheduling in the operating system, so they just have to take more samples—and it turns out noise is not a very effective side channel countermeasure.

    Specifically, artificial noise raises the attacker's costs by at most about the square of the ratio of artificial noise to true noise, which is far below what is usually considered an acceptable gap for security in cryptography. So it mostly costs you a lot of time doing nothing.

  • You could use algebraic properties of the cryptosystem to randomize it, sometimes called ‘blinding’. For example, instead of computing y^d mod n where d is a secret exponent in RSA, you could pick r at random, compute s := r^e mod n where e*d ≡ 1 (mod λ(n)), multiply y by s to get (y * r^e) mod n, compute (y * r^e)^d mod n = (r * y^d) mod n, and then divide off r.

    Many implementations, such as OpenSSL, use this approach because it's an easy way to retrofit an existing implementation of a cryptosystem like RSA that has the necessary algebraic structure. It's not a bad idea like random noise is, but it does have costs: you have to do the extra work for randomization, you have to have modular division or inversion logic—and side channels might still leak information about r and d. For example, even blinded modular exponentiation will leak the Hamming weight of d unless you take additional countermeasures like adding a random multiple of λ(n) to d first—which may expose additional side channels, etc. (A sketch of the basic blinding flow appears after this list.)

  • For the specific case of comparing two byte strings for equality (for example, two message authentication codes), one reasonable option is to hash them with a pseudorandom function family like HMAC-SHA256 under a one-time secret key k, and check whether HMAC-SHA256_k(x) == HMAC-SHA256_k(y).

    The probability of a false positive is 1/2^256, which is a smaller probability than you ever have to worry about. You can safely use variable-time equality for the HMAC because if x is not equal to y, then the amount of time even in the naivest byte string equality routine (assuming it doesn't bail out at the first zero byte or something stupid like that!) will be independent of the values of x and y: there's a 255/256 probability it'll stop after one iteration, a 65535/65536 probability after two iterations, etc.

    Of course, this really helps only if you can implement HMAC-SHA256 in constant time! Fortunately SHA-256 is designed to be easily implemented as a constant-time logic circuit, so C implementations tend to be reasonably resistant to side channels—but, say, Python will get you into trouble because of the small integer cache if nothing else.
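
To make the blinding bullet above concrete, here is a sketch of that flow using OpenSSL's BIGNUM API; the function name and error handling are my own, and a real implementation would also mark d with BN_FLG_CONSTTIME and deal with the Hamming-weight caveat mentioned above:

#include <openssl/bn.h>

/* Blinded RSA private-key operation: out = y^d mod n, computed as
   ((y * r^e)^d * r^{-1}) mod n so the exponentiation never sees y directly. */
int rsa_blinded_modexp(BIGNUM *out, const BIGNUM *y,
                       const BIGNUM *d, const BIGNUM *e, const BIGNUM *n,
                       BN_CTX *ctx)
{
  int ok = 0;
  BIGNUM *r = BN_new(), *s = BN_new(), *t = BN_new(), *rinv = BN_new();
  if (r == NULL || s == NULL || t == NULL || rinv == NULL)
    goto done;
  /* random r in [1, n-1]; a real implementation would retry rather than fail
     on the negligible r == 0 case */
  if (!BN_rand_range(r, n) || BN_is_zero(r))
    goto done;
  if (!BN_mod_exp(s, r, e, n, ctx))            /* s = r^e mod n */
    goto done;
  if (!BN_mod_mul(t, y, s, n, ctx))            /* blind: t = y * r^e mod n */
    goto done;
  if (!BN_mod_exp(t, t, d, n, ctx))            /* t = (y * r^e)^d = r * y^d mod n */
    goto done;
  if (BN_mod_inverse(rinv, r, n, ctx) == NULL) /* rinv = r^{-1} mod n */
    goto done;
  if (!BN_mod_mul(out, t, rinv, n, ctx))       /* unblind: out = y^d mod n */
    goto done;
  ok = 1;
done:
  BN_clear_free(r);
  BN_free(s);
  BN_free(t);
  BN_free(rinv);
  return ok;
}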


* The terminology is unfortunately a little confusing. Here constant-time means that the amount of time does not vary depending on inputs, and is not the same as the asymptotic notion of ‘constant-time’ in computer science, often written O(1), which just means the amount of time may vary depending on inputs but is bounded by a constant. I'm sorry. I didn't invent the terminology. I might have chosen ‘fixed-time’ vs. ‘variable-time’ but it's too late now—‘constant-time’ is entrenched in the literature.


Side-channel attacks are notoriously difficult to defend against, because there are many side channels that an attacker could look for. This includes, but is not limited to:

  • Timing Attacks
  • Cache Attacks
  • Power-Monitoring Attacks
  • Acoustic Cryptanalysis

Wikipedia has an excellent list, from which this is just an excerpt. Since there are so many different side-channels, each one of them needs to be addressed independently.

What about timing attacks?

Your code is vulnerable to timing attacks, but you already knew that. The question is, how can you fix it? The solution would be to make a constant-time comparison. One example would be code like this:

unsigned char difference = 0;
for (i = 0; i < n; i++) {
  difference |= (password[i] ^ input[i]);
}

return difference == 0 ? E_OK : E_FAIL;

This code assumes password and input are the same length, e.g. because they are the output of a hash function. The code accumulates the bitwise difference between each pair of elements, then returns a result based on whether the accumulated difference is zero. Also beware that your friendly optimizing C compiler is at liberty to spot what this is doing and generate the assembly it would have generated for your original (broken) code. You need to check the actual generated assembly (or use a library function designed for this).
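
If you go the library-function route, libsodium's sodium_memcmp (or OpenSSL's CRYPTO_memcmp) does this job; here is a sketch, with password_hash and input_hash standing in for your two fixed-length byte strings:

#include <sodium.h>

/* sodium_memcmp runs in time that depends only on the length, and returns 0 on a match */
if (sodium_memcmp(password_hash, input_hash, 32) == 0) {
  return E_OK;
}
return E_FAIL;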

Of course, this would only protect against one kind of side-channel attack, and not others.

What about the other side-channels?

That depends entirely on the side-channel you are focusing on. Some, such as power consumption, require physical access (or other ways to measure consumption), so they may not be a problem if the attacker is far away.

In general, in order to defend against side-channel attacks you need to:

  • Be aware that the side-channel exists
  • Check if this side-channel could be a potential problem in your threat model
  • Check what information is leaked via this side channel
  • Check how to prevent leaking this information

I assume that the code from the question is just an intentionally trivialized example for illustration, because in a real-world system you would never store passwords in plaintext. But if you wanted to replace this fictional code with an implementation that isn't vulnerable to timing attacks, then you would make sure that the algorithm doesn't terminate on the first wrong character but always does the same number of comparisons:

bool isCorrect = true;
for (i = 0; i < PASSWORD_MAX_LENGTH; i++) {
    if (password[i] != input[i]) {
        isCorrect = false;
    }
}
return isCorrect;

However, this is not completely proof against timing attacks either, because depending on how the CPU processes this code, it might still take slightly more or less time depending on whether and where the comparison fails. One possible source of timing differences is branch prediction.

Grossly oversimplified: when the CPU notices that an if-condition inside a for-loop turns out false most of the time, it will optimize itself on the assumption that the condition is always false. This allows it to process the loop a lot faster. But if that if-statement does turn out true all of a sudden, it causes quite a bit of chaos inside the CPU pipeline, which takes a couple of clock cycles to clean up. The timing differences caused by branch-prediction failures can therefore be another timing side channel. This is hard to avoid, because it's a feature of the CPU that is largely opaque to the developer and can even depend on the exact CPU model. For more information, do some research on the Spectre vulnerability.

But there is also a different approach to avoiding timing attacks which is crude and simple: add a random delay after each password comparison. If the length of the delay comes from a cryptographically secure pseudorandom number generator, it adds noise to the time measurements the attacker relies on. Keep in mind, though, that as discussed earlier a patient attacker can average such noise away with more samples, so treat this as a supplement to constant-time code rather than a replacement for it.
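
A sketch of that idea, keeping in mind the caveat above (the 10 ms upper bound is an arbitrary choice of mine, and randombytes_uniform is libsodium's CSPRNG-backed helper):

#include <sodium.h>
#include <time.h>

/* sleep for a random duration between 0 and 10 ms after the password comparison */
struct timespec delay;
delay.tv_sec = 0;
delay.tv_nsec = randombytes_uniform(10 * 1000 * 1000);  /* nanoseconds */
nanosleep(&delay, NULL);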