Firefox cache hash key generation algorithm bug

From what I understand from just reading the Bugzilla entry, the bug manifests when two distinct problems occur:

  1. Their hash algorithm generates collisions for URLs that are "similar enough". From the bug, "similar enough" seems to mean the URLs are the same every 4 characters (or perhaps 8), and
  2. Their logic for dealing with hash collisions fails because they haven't yet flushed the previous URL with the same hash value to disk.

So basically, if you have a page with two very similar URLs, this might happen on some versions of Firefox. I wouldn't expect it to happen across different pages, since Firefox will then have time to flush the entries to disk, avoiding the timing issue.

So if you have multiple resources (scripts, images, etc.) that are all loaded from the same page, make sure they have a run of 9 characters that are completely different. One way you might ensure this is by appending a query string (that you ignore) with a random bit of data (a rough sketch for generating such tokens follows the example), something like:

  • http://foo.com/resource.js?r=dn#@JdsK#
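
To make that concrete, here is a rough sketch (in C++ only so it can sit alongside the hash tool further down; in practice you'd do the equivalent wherever your pages or JavaScript are generated). It appends a fixed 9-character random token to each resource name passed on the command line; the idea is to run it once and paste the resulting URLs into your pages, so they stay stable (and cacheable) but no two of them share long identical runs of characters.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char ** argv)
{
    /* Characters used for the ignored query-string token. */
    const char chars[] = "abcdefghijklmnopqrstuvwxyz0123456789";

    srand((unsigned) time(NULL));

    for (int i = 1; i < argc; ++i)
    {
        /* Build a 9-character random token for this resource. */
        char token[10];
        for (int j = 0; j < 9; ++j)
            token[j] = chars[rand() % (sizeof(chars) - 1)];
        token[9] = '\0';

        printf("%s?r=%s\n", argv[i], token);
    }
    return 0;
}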

Here is how the algorithm works:

initialize hash to 0
for each byte of the key
    shift hash 4 bits to the left (with rotate)
    hash = hash XOR byte

visually (16-bit version):

00110000             = '0'
    00110001         = '1'
        00110010     = '2'
            00110011 = '3'
0100            0011 = '4'
00110101             = '5'
====================
01000110001000010000  (and then this will be 'rotated'
                       so that it lines up with the end)
giving:
        00100001000001000110

What this means is that if you have strings of the same length that are mostly the same, then in at least one case the XOR of the lower 4 bits of a character with the upper 4 bits of the next character must be unique. However, the method of sticking the 32-bit number into a table might be even weaker, meaning that it requires the lower-4-XOR-upper-4 at one particular location in the string (mod 8 characters) to be unique.
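
To make the rotate-and-XOR steps concrete, here is a small trace program (my own sketch, using the real 32-bit hash width rather than the 16 bits in the picture above) that prints the hash after each character of "012345":

#include <stdio.h>
#include <stdint.h>

int main()
{
    const unsigned char input[] = "012345";
    uint32_t h = 0;

    for (const unsigned char * s = input; *s != '\0'; ++s)
    {
        /* rotate the 32-bit hash left by 4 bits, then XOR in the next byte */
        h = ((h << 4) | (h >> 28)) ^ *s;
        printf("'%c' -> 0x%08x\n", *s, (unsigned) h);
    }
    return 0;
}

For a string this short nothing wraps around, so each character simply lands 4 bits further to the left, with its low nibble overlapping the next character's high nibble.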


This bug was a major issue for my site: http://worldofsolitaire.com

I worked around it a long time ago by using a conditional rule in an .htaccess file that disabled ALL caching of images on the site for Firefox users. This was a horrible thing to have to do, but at the time I couldn't track down the bug within Firefox, and having the site be slightly slower is better than showing duplicate/corrupted images.

When I read in the linked bug that it was fixed in the latest Firefox releases, I changed the conditional on April 19th 2009 (yesterday) to only disable caching for Firefox 2 users.

A few hours later, I had received over 10 e-mails from Firefox 3 users (confirmed) reporting that they were seeing duplicate images. So this issue is STILL a problem in Firefox 3.

I decided to create a simple Linux test program that lets me check URLs to see if they generate the same cache hash keys.

To compile it on any Linux system: g++ -o ffgenhash ffgenhash.cpp

Here is the code (save it to a file named ffgenhash.cpp):

#include <stdio.h>
#include <stdint.h>

/* 32-bit left rotate, equivalent to Mozilla's PR_ROTATE_LEFT32 macro */
#define PR_ROTATE_LEFT32(a, bits) (((a) << (bits)) | ((a) >> (32 - (bits))))

/* Rotate the hash left by 4 bits, then XOR in each byte of the key.
   A zero hash is mapped to all-ones, as in the original code.
   uint32_t is used so the result is the same on 32- and 64-bit systems. */
uint32_t ffgenhash(const char * key)
{
    uint32_t h = 0;

    for (const unsigned char * s = (const unsigned char *) key; *s != '\0'; ++s)
    {
        h = PR_ROTATE_LEFT32(h, 4) ^ *s;
    }

    return (h == 0 ? 0xFFFFFFFFu : h);
}

int main(int argc, char ** argv)
{
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s <url>\n", argv[0]);
        return 1;
    }

    printf("%u\n", ffgenhash(argv[1]));
    return 0;
}

Here are two real-life URLs that generate the same cache hash key:

./ffgenhash "http://worldofsolitaire.com/decks/paris/5/12c.png"
1087949033
./ffgenhash "http://worldofsolitaire.com/decks/paris/5/13s.png"
1087949033
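
For what it's worth, here is my reading of why these two collide: the URLs differ only at two adjacent characters ('2' vs '3', then 'c' vs 's'). The first difference, '2' XOR '3' = 0x01, is rotated left by 4 into 0x10 before the next byte is mixed in, and 'c' XOR 's' is also 0x10, so the two differences cancel and everything after that point is identical. A tiny check of that arithmetic (my own sketch, separate from the ffgenhash tool):

#include <stdio.h>

int main()
{
    /* Difference from the first differing pair ('2' vs '3'), after the
       4-bit rotate that happens before the next byte is mixed in. */
    unsigned d1 = ((unsigned) ('2' ^ '3')) << 4;

    /* Difference from the second differing pair ('c' vs 's'). */
    unsigned d2 = (unsigned) ('c' ^ 's');

    printf("0x%02x vs 0x%02x -> %s\n", d1, d2,
           (d1 ^ d2) == 0 ? "differences cancel (collision)"
                          : "differences survive");
    return 0;
}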

Since I pre-load these images in a JavaScript loop, trying to use some sort of empty <script> tag workaround is not possible here.

Indeed, I think my only real solution is to modify the URLs for Firefox users in some way so that they generate unique cache hash keys. So that's the approach I'll use.

By the way, I'm half tempted to create a Firebug add-on that would check all resources loaded by a site and raise a big error if two of them share a cache hash key, so the developer is aware of the problem. It would be great to run sites like Google Maps through it, as I've seen weird things with their images over the past few years :)
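
In the meantime, here is a rough command-line sketch of that idea (my own code, not a Firebug add-on). It reads URLs, one per line, from standard input and warns whenever two of them hash to the same key, using the same rotate-and-XOR hash as ffgenhash above. Compile it the same way, e.g. g++ -o ffcheck ffcheck.cpp, and feed it a list of the resources a page loads.

#include <stdio.h>
#include <stdint.h>
#include <iostream>
#include <map>
#include <string>

/* Same rotate-left-4 and XOR hash as ffgenhash above. */
static uint32_t cachehash(const std::string & key)
{
    uint32_t h = 0;
    for (size_t i = 0; i < key.size(); ++i)
        h = ((h << 4) | (h >> 28)) ^ (unsigned char) key[i];
    return (h == 0 ? 0xFFFFFFFFu : h);
}

int main()
{
    std::map<uint32_t, std::string> seen;  /* hash key -> first URL seen */
    std::string url;

    while (std::getline(std::cin, url))
    {
        if (url.empty())
            continue;

        uint32_t h = cachehash(url);
        std::map<uint32_t, std::string>::iterator it = seen.find(h);

        if (it != seen.end())
            printf("COLLISION (%u):\n  %s\n  %s\n",
                   (unsigned) h, it->second.c_str(), url.c_str());
        else
            seen[h] = url;
    }
    return 0;
}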