How does Excel's worksheet password protection work

It is a small world, as the code indicates I posted that code at another forum around 10 years ago having seen it somewhere else - I think on John Walkenbach's old web site

It is important to note that this code protection applies to worksheet protection only - not to Excel's file open or VBA passwords.

  • One example of this write-up in full is here (screenshot below)
  • googling excel sheet protection “test” and “zzyw” gives other references such as this from Tom Urtis

enter image description here


Edit (2020): From Excel 2013 on, apparently the protection scheme has changed. So the original answer only has historical significance anymore.

The new protection makes it near-impossible to retrieve the password by using state-of-the-art SHA-512 hashing. But why break it if you can simply pluck it out within seconds.

  • unzip the .xlsx or .xlsm file
  • edit xl/worksheets/sheet<num>.xml
  • search and remove <sheetProtection... /> tag
  • save, zip again, enjoy

Original answer (up to Excel 2010)

Fascinating - I knew the code snippet before, but not the explanation that brettdj posted. As the others explained, it is a brute-force search for hash collisions. Actually it seems to have been made by trial and error, since it does much more work than necessary (194560 combinations are generated, but there are only 32768 hashvalues possible.)

Excel's hash algorithm in short (as explained in http://chicago.sourceforge.net/devel/docs/excel/encrypt.html):

  1. Take the ascii code of each character of the passwort.
  2. Treat it as a 16-bit signed number. Shift its bits to the left, based on the position of the character (1 bit for first character, 2 for 2nd and so on)
  3. XOR all the characters together, giving a 16-bit signed int >=0.
  4. XOR that result with the length of the password and a magic number.

Knowing this, one can devise a brute-force search as follows:

  • The highest bit is always zero, so there are 15 bits to test.
  • Split them up into three counters each covering 5 bits. That way each counter can represent a printable ascii char.
  • Pack the ascii representation of those counters in a password string, in a way so that they do not affect each other.

The simplest way is to use a 11-character password and put the counters at position 1, 6 and 11. The bit-shifting in step 2 aligns the counter bits the right way: the first counter ("x") is shifted 1 bit, the second one ("y") 6 bits, the third one ("z") 11 bits. In a bitwise representation of the hash, the counters affect the following bits:

bit: 76543210 76543210
cnt: -zzzzyyy yyxxxxxz

The XOR operations can be ignored since the XOR argument is constant all the time. For the same reason, a constant offset (e.g. 64) can be added. Also it does not matter what character is used on the other password bytes (2-5, 7-10).

By iterating over all possible combinations of x, y, z you eventually find a password that gives the same hash value as the original one.

Public Sub demo()
    ' http://stackoverflow.com/questions/12852095/how-does-excels-worksheet-password-protection-work
    Dim x As Integer, y as Integer, z as Integer
    Dim part1 As String, part12 As String
    Dim sh As Worksheet
    Set sh = ThisWorkbook.Worksheets(1)

    sh.Protect "$ome_Insanely_Long_and_c0mplex_password! [(which i$ imp*ssible t0 re-member)]"

    For x = 64 To 95
        ' pad with dots, so that x, y and z affect nonoverlapping bits of the hash.
        part1 = Chr(x) + "...."
        For y = 64 To 95
            part12 = part1 + Chr(y) + "...."
            For z = 64 To 95
                On Error Resume Next
                    sh.Unprotect part12 + Chr(z)
                    If Err.Number = 0 Then
                        Debug.Print "Password: '" & part12 + Chr(z) & "'"
                        Exit Sub
                    End If
                On Error GoTo 0
            Next
        Next
    Next
End Sub

Just a guess, but it looks to me like Excel tests if the password is valid by running it through some sort of hash function which produces a pretty small range of results and compares it with the hashed value stored.

Presumedly what this function is doing is testing all those values until it finds one that works. Judging by the values used, the hash function produces 2^11*(126-31) different values, all of which can be produced by the values generated in this code.

My analysis assumes this routine works. I haven't tested it.