HTTP: Generating ETag Header

An etag is an opaque string that the server sends to the client along with a file. The client sends it back (in an If-None-Match header) the next time it requests that file.

The etag should be computable on the server from the file itself. It works like a checksum, but you might not want to checksum every file each time you send it out.

 server                client
 
        <------------- request file foo
 
 file foo etag: "xyz"  -------->
 
        <------------- request file foo
                       If-None-Match: "xyz" (the etag the server just sent)
 
 (the etag is the same, so the server can send a 304 Not Modified with no body)
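
The comparison the server makes in the last step could look roughly like the following. This is a minimal sketch: the function names etag_matches and status_for_get are made up for illustration, and it only handles a single tag or "*", not comma-separated lists or weak (W/) tags.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the validator sent by the client
 * (the value of its If-None-Match header, or NULL if the header is absent)
 * matches the etag the server would send right now.
 * Simplification: only a single tag or "*" is handled. */
int etag_matches(const char *if_none_match, const char *current_etag)
{
    if (if_none_match == NULL)
        return 0;
    if (strcmp(if_none_match, "*") == 0)
        return 1;
    return strcmp(if_none_match, current_etag) == 0;
}

/* Hypothetical helper: pick the HTTP status for a GET of the file. */
int status_for_get(const char *if_none_match, const char *current_etag)
{
    return etag_matches(if_none_match, current_etag) ? 304 : 200;
}
```

With this, a first request (no If-None-Match) gets a 200 and the full file, and a repeat request carrying the unchanged etag gets a 304.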

I built up a string in the format "modification time-file size-inode number". So, if a file is changed on the server after it has been served out to the client, the newly regenerated etag won't match when the client re-requests it, and the server sends the full file again.

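#include <stdio.h>
#include <sys/stat.h>

/* Writes "mtime-size-inode" into s; s must hold at least 64 bytes. */
char *mketag(char *s, struct stat *sb)
{
    sprintf(s, "%lld-%lld-%llu", (long long)sb->st_mtime,
            (long long)sb->st_size, (unsigned long long)sb->st_ino);
    return s;
}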

As long as the etag changes whenever the resource representation changes, how you produce it is completely up to you.

You should try to produce it in a way that additionally:

  1. doesn't require you to re-compute it on each conditional GET, and
  2. doesn't change if the resource content hasn't changed

Using hashes of content can cause you to fail at #1 if you don't store the computed hashes along with the files.

Using inode numbers can cause you to fail at #2 if you rearrange your filesystem or you serve content from multiple servers.
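
One way to sidestep the inode problem is to leave the inode out and build the etag from modification time and size only. The following is a sketch under assumptions: mketag_noinode is a hypothetical name, and it only helps across multiple servers if every server has the same mtime and size for the file.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical variant of an mtime/size/inode etag that omits the inode.
 * Formats "mtime-size", wrapped in the double quotes that an ETag header
 * value carries. Returns the snprintf result; a value >= len means the
 * output was truncated. */
int mketag_noinode(char *buf, size_t len, const struct stat *sb)
{
    return snprintf(buf, len, "\"%lld-%lld\"",
                    (long long)sb->st_mtime, (long long)sb->st_size);
}
```

Two copies of a file with the same mtime and size but different inode numbers then produce the same etag.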

One mechanism that can work is to use something entirely content dependent such as a SHA-1 hash or a version string, computed and stored once whenever your resource content changes.
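
A content-dependent etag could be sketched like this. Note the substitution: to keep the example self-contained it uses the 64-bit FNV-1a hash as a stand-in for SHA-1, which would normally come from a crypto library. The names fnv1a64 and content_etag are made up for illustration, and the idea is to call content_etag once when the content changes and store the result next to the file rather than recompute it per request.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a over a byte buffer (non-cryptographic; a stand-in for
 * SHA-1 in this sketch). */
uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: formats the hash of the content as a quoted,
 * 16-digit hex etag. Returns the snprintf result; a value >= buflen
 * means the output was truncated. */
int content_etag(char *buf, size_t buflen,
                 const unsigned char *data, size_t len)
{
    return snprintf(buf, buflen, "\"%016llx\"",
                    (unsigned long long)fnv1a64(data, len));
}
```

Because the value depends only on the bytes of the content, it is the same on every server and survives moving files around on disk.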


From http://developer.yahoo.com/performance/rules.html#etags:

By default, both Apache and IIS embed data in the ETag that dramatically reduces the odds of the validity test succeeding on web sites with multiple servers.

...

If you're not taking advantage of the flexible validation model that ETags provide, it's better to just remove the ETag altogether.