How reliable is the adler32 checksum?

Adler-32 has an entirely different purpose than MD5. Adler-32 is a checksum; MD5 is a secure message digest. Adler-32 is meant for quick hashing: it has a small bit space and a simple algorithm. Its collision rate is low, but not low enough to be secure. MD5, SHA, and other cryptographic/secure hashes (or message digests) have much larger bit spaces and more complex algorithms, and thus far fewer collisions. Compare SHA-256, for example: 256 bits versus Adler-32's measly 32 bits.
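Purely as an illustration of that size difference, all three are available in Python's standard library (the input bytes here are arbitrary):

```python
import hashlib
import zlib

data = b"example payload"  # arbitrary sample input
print(f"Adler-32: {zlib.adler32(data):#010x}")           # 32-bit checksum
print(f"MD5:      {hashlib.md5(data).hexdigest()}")      # 128-bit digest
print(f"SHA-256:  {hashlib.sha256(data).hexdigest()}")   # 256-bit digest
```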

Adler does have its uses - in hash tables, for instance, or for rapid data-integrity checks - but it is not designed for the same purpose as MD5 or other secure digests.

BTW, if a simple but somewhat reliable checksum is what you need, it seems Fletcher out-performs Adler. I'd speculate they both out-perform CRC in speed, though perhaps not a simple addition-based checksum (which, however, is very prone to collisions). If you want BOTH performance AND security, then use BOTH algorithms: run the checksum as a quick calculation and lookup, then use the larger digest for a more thorough confirmation when a match is found, as sketched below.
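A minimal sketch of that two-step idea, assuming the data fits in memory and using SHA-256 as the "larger digest" (the names `index`, `remember`, and `is_known` are purely illustrative):

```python
import hashlib
import zlib

# Index of known content: Adler-32 checksum -> set of SHA-256 digests.
# The cheap 32-bit checksum only locates a candidate bucket quickly;
# the secure digest does the real confirmation.
index = {}

def remember(data):
    index.setdefault(zlib.adler32(data), set()).add(hashlib.sha256(data).hexdigest())

def is_known(data):
    bucket = index.get(zlib.adler32(data))   # fast checksum lookup
    if not bucket:
        return False
    return hashlib.sha256(data).hexdigest() in bucket  # thorough confirmation
```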

To answer your question on ensuring the validity of archives: it would probably suffice just fine. Best choice? Questionable. Possibility of error? Very low.


It is less reliable than, say, MD5 or CRC (actually about the same as CRC). Its advantage is speed; its disadvantage shows most for short data (a few hundred bytes), where the distribution of checksum values does not cover the available 32-bit output space well. For big files it is a good choice.
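To make that concrete, here is a small illustration using Python's zlib.adler32 (the sample strings are arbitrary): for short inputs, both 16-bit halves of the checksum stay far below their modulus of 65521, so only a tiny fraction of the 32-bit space can ever be produced.

```python
import zlib

# Adler-32 is two 16-bit sums: s1 (sum of bytes) and s2 (sum of the running s1).
# For short inputs both sums stay small, so the checksum clusters in a small
# corner of the 32-bit range instead of spreading over all of it.
for data in (b"a", b"abc", b"hello", b"hello world"):
    c = zlib.adler32(data)
    s2, s1 = c >> 16, c & 0xFFFF
    print(f"{data!r:15} adler32={c:#010x}  s1={s1:5d}  s2={s2:5d}")
```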


For details on the error-checking capabilities of the Adler-32 checksum, see for example "Revisiting Fletcher and Adler Checksums" (Maxino, 2006).

The paper contains an analysis of the Hamming distance provided by these two checksums and gives an indication of the residual error rate for data words up to about 2^11 bits - which is obviously much less than your requirement of 2^38 bits...


This is an ancient algorithm; one which, as the Wikipedia page says, "trades accuracy for speed". In short, no, you shouldn't rely on it.

The point is that with multiple corruptions, this checksum might still pass as "okay". Due to the avalanche effect, this is significantly less likely to occur in modern algorithms (even the old MD5).
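A toy illustration: the two three-byte buffers below differ in two places, yet their Adler-32 checksums are identical, while even MD5 tells them apart (the specific byte values were simply chosen so that the two differences cancel out in both of Adler-32's running sums):

```python
import hashlib
import zlib

a = bytes([0, 2, 0])   # two "corruptions" relative to b...
b = bytes([1, 0, 1])   # ...that cancel out in both Adler-32 sums

print(zlib.adler32(a) == zlib.adler32(b))                        # True  - collision
print(hashlib.md5(a).hexdigest() == hashlib.md5(b).hexdigest())  # False - MD5 differs
```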

On today's machines speed is not much of a concern, so I'd suggest using a modern algorithm (whichever is current), even for files in the TB range. The insignificant time savings you'd get from an old checksum are IMHO not enough to offset the significantly increased risk of undetected data corruption - and honestly, 20 GB of files is not so much data these days that you'd need to resort to weak (and I daresay broken) algorithms.
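If it helps, here is a minimal sketch of doing that with Python's standard hashlib, reading in chunks so even very large archives never have to fit in memory (SHA-256 and the 1 MiB chunk size are just example choices):

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, hashing it chunk by chunk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```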