Why are static password requirements used so frequently?
After the famous XKCD strip, a few projects started up to deal with exactly this kind of entropy checking. One of these was the zxcvbn password checker, made by a Dropbox employee.
It is possibly the most thorough password checker of its kind. It checks for patterns, words, and more, adding to (or subtracting from) an entropy score accordingly. It is explained in detail on their blog.
This is a great idea; in fact, it is the only proper way of measuring password strength.
But how would you measure password entropy?
Entropy is an aspect of the generation process, not of the output.
For example, what would such a measurement output for Tr0ub4dor&3? By any reasonable measure of possible entropy based on the password alone, that would be rather decent: over 70 bits of entropy. Or maybe, taking into account a supposed password generation process, I might be smart enough to realize it is actually capped at only 28 bits, since the characters are not each selected randomly; a whole word is selected first. But in reality I should junk this whole idea altogether, since I obviously just copied it directly from that comic.
The same issue would apply if the password were correct horse battery staple (one of the most popular passwords amongst a certain population).
So yeah, password requirements should be based on the password entropy, but you cannot apply this requirement to a given password after the fact.
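To make the arithmetic behind those two figures concrete, here is a rough sketch in Python. The function names are mine, and the per-step bit counts follow the comic's breakdown of how such a password is built; they are assumptions about the generator, not something you can read off the string itself.

```python
import math

def naive_entropy_bits(password: str, alphabet_size: int = 95) -> float:
    """Upper bound: pretends every character was drawn uniformly and
    independently from `alphabet_size` symbols (95 = printable ASCII)."""
    return len(password) * math.log2(alphabet_size)

def process_entropy_bits(step_bits: dict) -> int:
    """Entropy of a generation process: the sum of the bits contributed
    by each independent random choice made while generating."""
    return sum(step_bits.values())

# Assumed generation steps for a "Tr0ub4dor&3"-style password,
# following the comic's breakdown (illustrative, not measured).
TROUBADOR_STEPS = {
    "uncommon base word": 16,
    "capitalization": 1,
    "common substitutions": 3,
    "punctuation": 4,
    "numeral": 3,
    "order of the last two": 1,
}

print(round(naive_entropy_bits("Tr0ub4dor&3"), 1))  # 72.3
print(process_entropy_bits(TROUBADOR_STEPS))        # 28
```

Both functions look at the same eleven characters and disagree by more than 40 bits, and neither accounts for the case where the string was simply copied from a well-known source.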
(Btw, as I mentioned in another answer on this topic (from a different direction), it could be a good idea to implement a system where passwords / passphrases are auto-generated for a given level of entropy, and provided to the user, instead of asking the users to come up with one that meets our requirements. Of course, this is what a good password manager would do on the client, anyway...)
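A minimal sketch of that idea, assuming a word list is available: pick words uniformly at random with Python's `secrets` module until the target entropy is reached. The `generate_passphrase` helper and the 16-word `DEMO_WORDS` list are hypothetical and only for illustration; a real generator would use a list of thousands of words so that each word contributes more bits.

```python
import math
import secrets

def generate_passphrase(wordlist, target_bits):
    """Return (passphrase, entropy_bits) with entropy_bits >= target_bits.

    The entropy is known exactly because it comes from the generation
    process: each word is an independent uniform choice from the list.
    """
    words = sorted(set(wordlist))  # duplicates would skew the distribution
    if len(words) < 2:
        raise ValueError("need at least two distinct words")
    bits_per_word = math.log2(len(words))
    count = math.ceil(target_bits / bits_per_word)
    chosen = [secrets.choice(words) for _ in range(count)]
    return " ".join(chosen), count * bits_per_word

# Tiny demo list: 16 words, so exactly 4 bits per word.
DEMO_WORDS = [
    "apple", "river", "stone", "cloud", "tiger", "lemon", "piano", "olive",
    "maple", "robin", "ember", "coral", "frost", "cedar", "amber", "delta",
]

phrase, bits = generate_passphrase(DEMO_WORDS, 44)
print(f"{phrase}  ({bits:.0f} bits)")
```

With a 2048-word list each word is worth 11 bits, so the same 44-bit target needs only four words.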
Static password policies are chosen for two major reasons: usability and the body of research demonstrating acceptable effectiveness. Most of my answer comes from the excellent research paper on an advanced password-strength meter, Telepathwords.
First, to summarize some of the research used to back up current password policies:
Password-composition rules date back at least to 1979, when Morris and Thompson reported on the predictability of the passwords used by users on their Unix systems; they proposed that passwords longer than four characters, or purely alphabetic passwords longer than five characters, will be “very safe indeed” [19]. [However,] Bonneau analyzed nearly 70 million passwords in 2012, 33 years later, to measure the impact of a six-character minimum requirement compared with no requirement [2]. He found that it made almost no difference in security...
This includes the work of Komanduri et al. [13] and Kelley et al. [12], who used similar study designs to perform comparative analyses of password composition rules. These prior studies found that increasing length requirements in passwords generally led to more usable passwords that were also less likely to be identified as weak by their guessing algorithm [13, 12]. Most recently, Shay et al. studied password-composition policies requiring longer passwords, finding the best performance came from mixing a 12-character minimum with a requirement of three character sets [25].
Usability is a huge reason why more complex criteria like password entropy aren't used more frequently:
In a study of the distribution of password policies, Florencio and Herley found that usability imperatives appeared to play at least as large a role as security among the 75 websites examined [8]. ...
Ur et al. also studied the effect of password strength meters on password-creation. They found that when users became frustrated and lost confidence in the meter, more weak passwords appeared. [28] ...
While [Dropbox's] zxcvbn provides a much-needed improvement in the credibility of its strength estimates when compared to approaches relying solely on composition rules, this credibility is unlikely to be observed by users. In fact, its perceived credibility may suffer if users, who have been told that adding characters increases password strength, see scores decrease when certain characters are added. For example, when typing iatemylunch, the strength estimate decreases from the second-best score (3) to the worst score (1) when the final character is added. Even if users find zxcvbn’s strength estimates credible, they are unlikely to understand the underlying entropy-estimation mechanism and thus be unsure how to improve their scores. [30]
Finally, for the sake of completeness, we have to realize that defining entropy in this example is very difficult (but far from impossible). There are lots of different assumptions we can make about the sophistication of a password cracker's guessing algorithm or dictionary, and these all lead to differing answers on the entropy of passwords like "Tr0ub4dor&3" or "correct horse battery staple". The most sophisticated password entropy measures are based on dictionaries of millions of passwords and advanced study of password patterns, and this level of sophistication is difficult to achieve for many administrators (and hackers).