Why is the most common prefix of hashed (SHA1) passwords "00000"?
Someone would need to check my guess against the SHA1 algorithm (and Troy may have already debunked it, since per his blog answer he "took a peek at the [plain text] passwords"). But since passwords are just letters, digits and a limited set of symbols as encoded in ASCII, the input to the hash will ALWAYS have a first bit of ZERO in every byte (ASCII covers 0-127, and the letters, digits and symbols used sit in the 32-126 range, I believe, so the first bit of every 8 bits is always zero). While it is the job of a hash to gloss over this, I suspect predictable bit positioning isn't as easy to obfuscate as one expects. While it ties with 4, 0 is 00000000 in bit form and 4 is 00000100, so both have their first FIVE bits as 0.
Also note that the two least frequent hash headers both start with E, which is 1110 in binary (mostly 1's), so they are almost exact opposites in construction (1's vs 0's) AND frequency (low vs high). That implies the presence of zero bits may be a side effect of either the algorithm outright (doubtful) or, more probably, the algorithm operating on a limited subset skewed by convention. In other words, letters and digits occupy only 1/3rd to 1/4th of the full range of an 8-bit byte.
Of course we could go "tin foil hat" with this convo, but I'd bet coincidence and ASCII are more to blame than that man on the grassy knoll.
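The guess above can be checked empirically. What follows is only a rough sketch, not anything taken from Troy's data: it hashes randomly generated ASCII-only strings (the alphabet, the length range and the sample size are all made-up assumptions) and counts how often each hex digit leads the SHA1 digest. If restricting the input to ASCII biased the output, the ratios printed in the last column would drift away from 1.0.

```python
import hashlib
import random
import string
from collections import Counter

HEX_DIGITS = "0123456789ABCDEF"

def first_hex_digit_counts(n_passwords=100_000, seed=0):
    """Hash random ASCII-only strings and count the leading hex digit."""
    rng = random.Random(seed)
    # Assumption: passwords use letters, digits and punctuation only.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    counts = Counter()
    for _ in range(n_passwords):
        length = rng.randint(6, 12)  # hypothetical length range
        password = "".join(rng.choice(alphabet) for _ in range(length))
        digest = hashlib.sha1(password.encode("ascii")).hexdigest().upper()
        counts[digest[0]] += 1
    return counts

counts = first_hex_digit_counts()
expected = sum(counts.values()) / 16  # count per digit if perfectly uniform

for digit in HEX_DIGITS:
    # observed count, then ratio to the uniform expectation
    print(digit, counts[digit], f"{counts[digit] / expected:.3f}")
```

Random strings are not real passwords, so this only tests the "every byte starts with a zero bit" part of the guess, not anything about how people actually choose passwords.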
It's either a coincidence, or (less likely) an artifact/error in acquiring or assembling the results for publication.
Not that it looks like a significant outlier. The spread that's described (381 min, 478 average, 584 max) seems like an even spread for the sample size. A graph of the entire corpus would probably look pretty random.
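To put rough numbers on "even spread", here is a back-of-the-envelope sketch. It assumes the three figures are counts of hashes per 5-character hex prefix (so 16^5 buckets), that each hash lands in a bucket independently and uniformly, and that a normal approximation with standard deviation sqrt(average) is good enough. None of that comes from the published data, so treat it as a plausibility check only.

```python
import math

# Figures quoted above; assumed to be hashes per 5-hex-character prefix.
minimum, average, maximum = 381, 478, 584
n_prefixes = 16 ** 5  # number of distinct 5-character hex prefixes

# For Poisson-like bucket counts the standard deviation is about sqrt(mean).
sigma = math.sqrt(average)
z_min = (minimum - average) / sigma
z_max = (maximum - average) / sigma

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Rough number of buckets expected at or beyond each observed extreme.
expected_above_max = n_prefixes * upper_tail(z_max)
expected_below_min = n_prefixes * upper_tail(-z_min)

print(f"sigma = {sigma:.1f}, z_min = {z_min:.2f}, z_max = {z_max:.2f}")
print(f"buckets expected above max: {expected_above_max:.2f}")
print(f"buckets expected below min: {expected_below_min:.2f}")
```

Under these assumptions both extremes sit between four and five standard deviations from the average, which is about where the extremes of roughly a million buckets would be expected to fall.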
As with any reasonably constructed hashing algorithm, the characters in SHA1 output should be uniformly distributed. (If SHA1 had some kind of bias, this would be major news in the math and cryptography/cryptology community!)
Well, since the passwords originally come from data breaches, my best guess is that the password table in one of the breached systems was sorted or clustered by the (unsalted -- those are the kind of folks who get their passwords stolen) SHA1 hash of the password. When the system was breached, the attackers started with the "00000" hashes and just didn't make it all the way through...
Or maybe the list that Troy used includes the first part of an SHA1 rainbow table (https://en.wikipedia.org/wiki/Rainbow_table)...
Or something like that. The basic idea is that the SHA1 hash of the passwords was part of the password selection process.
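A toy illustration of that idea, with everything in it invented for the purpose (the table size, the fake passwords, the "attacker only got a quarter" cutoff): if a table is stored sorted by SHA1 hash and only the front of it leaks, the leaked hashes are heavily skewed toward low leading digits.

```python
import hashlib
import random
from collections import Counter

rng = random.Random(1)

# Hypothetical breached table: 50,000 made-up passwords, sorted by SHA1 hash.
table = sorted(
    hashlib.sha1(f"user-password-{rng.random()}".encode()).hexdigest().upper()
    for _ in range(50_000)
)

# Assumption: the attacker only got through the first quarter of the table.
partial_dump = table[: len(table) // 4]

# Count the leading hex digit of every hash that made it into the dump.
counts = Counter(h[0] for h in partial_dump)

for digit in "0123456789ABCDEF":
    print(digit, counts[digit])
```

Mixing a partial dump like this into a larger corpus would nudge the low prefixes up without touching the rest, which is the kind of effect being guessed at here.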