Is autocorrecting typos in passwords secure?
Full disclosure: I am one of the authors of the paper.
An exact password checking system stores a standard salted, slow-to-compute hash of the password. When a password, such as password123, is registered with the authentication service, a random salt "sa" is selected (this should be 16 or more random bytes) and a slow-to-compute hash function H is applied: H(sa,password123). The result, call it h, is stored in a database along with the salt sa. As mentioned, one should pick H to be slow (10s or 100s of milliseconds to compute). Good choices are argon2, scrypt, or PBKDF2, properly configured.
When a user later attempts login, if they type in their password the hash is recomputed and checked against the stored value. In the example above, if the user sends password123 then one recomputes (again slowly) H(sa,password123) and checks the result --- it matches the previously computed h and login can be allowed. Any deviation in the submitted password from the previously registered one results in a totally different hash value, and login fails.
Our idea is simple: if the first check fails, the system can additionally apply a small number of "corrector" functions to the submitted password, and then apply the hashing algorithm to the result. For example we might fix a caps lock corrector function F_caps that takes as input a password and outputs the password with the capitalization of all letters changed: F_caps(PASSWORD123) = password123 and F_caps(pAsSwOrD123) = PaSsWoRd123. We might also fix a first-letter capitalization corrector: F_first(Password123) = password123 and F_first(pASSWORD123) = PASSWORD123. Note that these are simple to implement.
Then to do typo-tolerant checking one would apply the following logic for a previously registered salt, hash pair (sa,h) and submitted password pw
If H(sa,pw) = h or H(sa,F_caps(pw)) = h or H(sa,F_first(pw)) = h then allow login
As an example, if one submitted PASSWORD123, the checks would be on PASSWORD123, password123, and pASSWORD123, with the second check succeeding.
A few points:
1) The efficacy of offline brute force attacks are exactly the same as before. Why? Because we only store sa,h. The attacker, given sa,h, will only learn the password via a brute-force attack that tries the correct password, in our example password123. There is no loss in security here as we haven't changed how H is computed.
2) There is negligible change in the security against remote guessing attacks. We show this via extensive analyses in the paper, but it boils down to the fact that in the real world the best strategy is to submit the most likely passwords up to some threshold (e.g., many sites lock an account after 10 failed attempts). The extra checks performed due to typo correction may help the attacker get a little bit more lucky, but we show that it is essentially negligible. If one is worried about it we give techniques that reduce it even further.
3) Typo-tolerance will increase CPU utilization for the login server (and possibly memory load, when H is a memory-hard hash such as scrypt or argon2). This is because of the extra slow-to-compute invocations of H. In practice this doesn't seem to be as much overhead as you might expect (3x in our example above) because anyway users would have ended up resubmitting after rejection and you pay the price for each attempt.
4) We do not suggest allowing arbitrary typos. This isn't even possible currently, as it would require recomputing H a prohibitive number of times (remember it is slow!), and anyway this would be insecure. One needs to carefully choose which typos to allow based on a principled analysis. We believe the two correctors mentioned above, caps lock and first letter capitalization, are a no-brainer for secure deployment, beyond that it starts getting more nuanced.
We added an FAQ that provides additional information: https://www.cs.cornell.edu/~rahul/projects/pwtypos.html