Should I reject obviously poor passwords?
Is it worth it to even include a password strength estimator?
A strength meter? No. It is too hard to explain to ordinary users that the "strength" shown is an absolute upper bound, and that their password may be trivially crackable by a skilled opponent no matter what the meter says, while still asking them to pay attention to the meter.
A known-weak-password warning? Absolutely! If you can detect that a password is weak, then it's weak. What you cannot detect is whether a password is strong, since your software won't have real-time access to the latest cracking dictionaries, rulesets, and other advances (Markov chains, etc.).
Is it worth trying to bolster the existing library with stricter rules?
Yes! Go take a look at Hashcat for the types of rules it supports, and note that when you have the plaintext user password, it's easy to apply many rules.
- You can handle all the uppercase/lowercase rules with a simple UPPER() equivalent and an all-uppercase dictionary - if you find it, it's weak. (JacQueLinE)
- Appending/prepending numbers purely to meet length minimums is a simple pattern match - if the last/first N characters are numbers, and the remaining length isn't enough, it's weak. (Riddick123)
- Remove N numbers from the beginning/end, uppercase it, and check the dictionary for the remainder (JacQueLine12)
- The above, but N-1 numbers and/or symbols (#1JacQueLine)
- The above, but date formats. (JacQueLine02121995)
- If the last/first N-1 characters are numbers and the last/first is a symbol, and the remaining length isn't enough, it's weak. (!JacQueLine1)
- Take out one character at a time, see if it matches the dictionary. (jacqu$eline)
- Combine some of these.
- Reverse all of these.
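A few of the rules above can be sketched in Python. This is only an illustration, not a complete ruleset: the three-word `DICTIONARY`, the `MIN_CORE_LENGTH` threshold, and all function names are assumptions made for the example, and a real check would load a server-side wordlist instead.

```python
import re

# Hypothetical tiny dictionary, stored uppercase. A real deployment would load
# a server-side wordlist (phpbb, rockyou, ...) instead.
DICTIONARY = {"JACQUELINE", "RIDDICK", "PASSWORD"}

# Assumed minimum length once padding digits/symbols have been stripped.
MIN_CORE_LENGTH = 8


def in_dictionary(candidate):
    """Case-insensitive lookup: uppercase the candidate, compare to the set."""
    return candidate.upper() in DICTIONARY


def strip_affixes(password):
    """Remove runs of non-letters (digits, symbols) from both ends."""
    return re.sub(r"^[^A-Za-z]+|[^A-Za-z]+$", "", password)


def one_char_deletions(password):
    """Yield every string obtained by deleting exactly one character."""
    for i in range(len(password)):
        yield password[:i] + password[i + 1:]


def _match_rules(password):
    if in_dictionary(password):
        return "dictionary word (case-insensitive)"
    core = strip_affixes(password)
    if core != password:
        if in_dictionary(core):
            return "dictionary word plus leading/trailing digits or symbols"
        if len(core) < MIN_CORE_LENGTH:
            return "too short once padding digits/symbols are removed"
    if any(in_dictionary(v) for v in one_char_deletions(password)):
        return "dictionary word with one inserted character"
    return None


def weakness_reason(password):
    """Return a short reason if any rule matches, else None.

    None only means "not caught by these rules", never "strong".
    """
    reason = _match_rules(password)
    if reason is None:
        reversed_reason = _match_rules(password[::-1])
        if reversed_reason is not None:
            reason = "reversed " + reversed_reason
    return reason
```

Returning a reason string rather than a bare boolean makes it easier to tell the user which cracking rule their password fell to.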
Also pattern-match for dictionary words as substrings, e.g.
correcthorsebatterystaple
- correct: 1813th most common English word, row 16828 on phpbb, row 9871 on Ubuntu american english small.
- horse: 1291st most common English word, row 14820 on phpbb (horses is at row 1723!), row 21607 on Ubuntu american english small.
- battery: 3226th most common English word, row 7775 on phpbb, row 3644 on Ubuntu american english small.
- staple: 6 characters, all lower case, not in the top 5000 most common words. row 40524 on phpbb (staples is at row 3852!), row 42634 on Ubuntu american english small.
Note that all of these are words of length 7 or less; there are fewer than 21,000 such words in Ubuntu's american english small dictionary, and 21000^4 ~= 1.9E17, which is a little over 2^57, for a very simple "combinator attack: 4 words, no separators, length 1-7, from this one small dictionary".
Certainly correcthorsebattery would be a much, much weaker password against a combinator attack - 3226^3 ~= 3.3E10 ~= 2^35, using the top 3226 most common English words.
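The arithmetic above, and the substring check that motivates it, can be sketched as follows. The function names and the four-word example dictionary are assumptions for illustration; the wordlist sizes are the ones quoted in the text.

```python
import math


def combinator_keyspace_bits(wordlist_size, word_count):
    """log2 of wordlist_size ** word_count, i.e. the combinator keyspace in bits."""
    return word_count * math.log2(wordlist_size)


def split_into_words(password, dictionary):
    """Return one way to split the password into dictionary words, or None.

    Simple dynamic programming: covering[i] is a list of words that exactly
    covers password[:i], or None if no such cover was found.
    """
    password = password.lower()
    covering = [None] * (len(password) + 1)
    covering[0] = []
    for end in range(1, len(password) + 1):
        for start in range(end):
            piece = password[start:end]
            if covering[start] is not None and piece in dictionary:
                covering[end] = covering[start] + [piece]
                break
    return covering[len(password)]


# Hypothetical four-word dictionary, just enough for the example.
words = {"correct", "horse", "battery", "staple"}
print(split_into_words("correcthorsebatterystaple", words))

# Four words from ~21,000 candidates: a little over 57 bits.
print(combinator_keyspace_bits(21000, 4))
# Three words from the top 3,226: roughly 35 bits.
print(combinator_keyspace_bits(3226, 3))
```

If a password splits cleanly into words from a small list, the keyspace figure for that list is an upper bound on its strength against this one attack, nothing more.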
Get some better dictionaries. Don't try to send them to the client; host them, along with the more complex rules, server-side. Sure, send the client a tiny one for a first pass, but you need more. Phpbb is the best common small wordlist I know of, then add in rockyou. Many crackers start with brute force for tiny passwords, then small wordlists and large rulesets, then large wordlists - the largest I'm aware of is over 30GB, and includes almost every password found to have been cracked by anyone on a given popular forum, plus many, many other large wordlists.
Find yourself a happy medium - fast enough to be performant, large enough and with enough "rules" to cut out the first few fast passes of cracking software - if you really are using enough bcrypt iterations, then only small dictionaries + large rulesets and large dictionaries + small rulesets will be practical attacks for a few years.
Am I just going to frustrate my users when I ban a password they think is a good idea? (I'm worried this will potentially cause them to either give up or copy their password to their monitor, etc).
Yes. When you say "password", "Password", "P@$$w0rd", "P@$$w0rd1", "P@$$w0rd123", and even "P@$$w0rd123!" are bad passwords, you're going to annoy them. When you say "Jennifer2007" is a bad password, they're going to be frustrated (and perhaps Jennifer will be upset, too!). Manage their frustration as best you can, and simply accept some. Personally, I would recommend actually being explicit - tell them their password is a word in known cracking dictionaries plus two numbers, which is a normal cracking rule!
Your purpose is twofold. First, you don't want weak passwords in your system. Second, you want to educate users on what a weak password is, so they have some understanding to mitigate their frustration.
As part of educating, perhaps show them some alternatives you generate that pass your own tests, if you flunk their password.
1) Fully random passwords
2) Fully random passwords translated into bubblebabble or another pronounceable subset
3) correcthorsebatterystaple type passwords, but with longer and uncommon words. For instance, take the Ubuntu american english insane dictionary, subtract out all the words in the american english small dictionary, and select N words of at least 7 characters in length. This leaves you without any really short words, and without the most common words.
4) a mix of 1, 2, and/or 3.
Then your users can, if they choose, simply pick something you showed them (over HTTPS with the best cipher suites you can get away with, of course).
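Option 3 might be sketched like this. It is a minimal illustration, not a vetted generator: `candidate_words` and `suggest_passphrase` are hypothetical names, the word lists are passed in as plain iterables (loading the actual dictionary files is left out), and dropping entries with apostrophes or other non-letters is an extra assumption of this sketch. Python's `secrets` module is used because it draws from the operating system's cryptographic random source.

```python
import secrets


def candidate_words(large_words, small_words, min_length=7):
    """Words from the large list that are long enough and not in the small list."""
    common = {w.lower() for w in small_words}
    pool = {
        w.lower()
        for w in large_words
        # isalpha() drops entries such as "horse's" (an assumption of this sketch)
        if len(w) >= min_length and w.isalpha() and w.lower() not in common
    }
    return sorted(pool)


def suggest_passphrase(pool, word_count=4, separator=""):
    """Pick word_count words uniformly at random, with replacement."""
    return separator.join(secrets.choice(pool) for _ in range(word_count))


# Tiny stand-in lists; real use would read the two dictionary files instead.
large = ["correct", "horse", "zymurgies", "quixotically", "Battery's", "xylophone"]
small = ["correct", "horse"]
pool = candidate_words(large, small)
print(suggest_passphrase(pool, word_count=4, separator="-"))
```

With a pool of P words and N words chosen, the suggestion has at most N * log2(P) bits against an attacker who knows the pool, so the pool needs to be large for this to be worth offering.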
Personally, I would also strongly suggest raising your minimum length; about 14 is what I would recommend, but for most userbases that's just too long. Try a minimum of 12 or even 10, enough so a fully random password might have a slight amount of value at the minimum length and character set.
A "password strength meter" does not measure the absolute strength of a password. What it measures is how fast the designer of the "meter" would have cracked the password, generically and without context. The "meter" would try a number of passwords in some order, and the strength measure is just how far the actual password is in this list of potential passwords.
In that sense, a "password meter" can only overestimate the strength. If the meter says "31 bits of entropy", it really means "I could have broken that in 2^30 operations, so it cannot be stronger than 31 bits". However, the password could be much weaker. The really important expression in the paragraph above is "without context". This is not a very realistic assumption. The attacker attacks the password because he is interested in attacking the password, meaning that he has some notion of what is protected by that password. He may also have some information on the password owner (e.g. his name, email address...).
For instance, just imagine how many Facebook account passwords really are "F4ceb00k7823": if your password meter does not know that it is evaluating passwords meant for Facebook, then it may miss how the first 8 characters of the password contribute almost nothing to the total entropy.
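One way to give a checker some of that context is to strip known context words (site name, user name, email local part) before estimating anything. The sketch below is an assumption-laden illustration: the `LEET` substitution table and the `strip_context` name are made up for this example, and it assumes ASCII passwords so that lowercasing never changes string length.

```python
# Hypothetical one-to-one "leet" substitutions; real cracking rulesets are
# much larger. One-to-one mapping keeps character positions aligned.
LEET = str.maketrans({
    "4": "a", "@": "a", "3": "e", "1": "l",
    "0": "o", "$": "s", "5": "s", "7": "t",
})


def strip_context(password, context_words):
    """Remove the first context word found in the password, leet-insensitively.

    Returns what is left, i.e. the only part an informed attacker has to guess.
    Assumes ASCII input, so positions in the normalized string match the original.
    """
    normalized = password.lower().translate(LEET)
    for word in context_words:
        word = word.lower()
        index = normalized.find(word)
        if index != -1:
            return password[:index] + password[index + len(word):]
    return password


print(strip_context("F4ceb00k7823", ["facebook"]))  # prints "7823"
```

Whatever strength estimate is used should then be applied to the remainder, here four digits, rather than to the full twelve characters.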
Get a password meter and see if you can fool it with a "witty" (but weak) password. It is easy. It is fun! E.g. the demo will give a whopping 39.218 bits of entropy (cracking time of 13 months!) to nothing less than "BarackObama". Now that is a strong password! (Strangely enough, "BillClinton" and "VladimirPutin" rate much less.) More to the point, average users will also play such games, and find it fun. That's the problem. A password meter is doomed to have some failure cases like this one. Some of the users will use the password meter as an excuse for weak passwords. Others will believe that they have a strong password; a false feeling of safety is about as big an issue as a voluntarily weak password.
Password meters can be useful as a behind-the-curtains tool, to gather statistics on passwords as chosen "naturally" by users, so that you may know what you should say during security awareness training. However, making the tool available to end users invites trouble: some will play with it, others will feel protected by it, and the general security level will decrease.
A policy of "no less than 8 characters", on the other hand, is acceptable, because:
- No password of less than 8 characters can be deemed strong, because it will fall to the stupidest of brute force attacks.
- Users understand why short passwords are weak, and will cooperate.
- There is absolutely no fun whatsoever in finding a password which is not rejected by the "no less than 8 characters" rule.
You mention creating a password meter, yet you really seem to be interested in banning "unusable" passwords. Rather than a meter, you may be better served by a simple check mark or red X to indicate whether the password is acceptable or not (simpler for users). If their password doesn't meet your complexity requirements, you may consider including suggestions on how to make it conform.
I notice that zxcvbn calculates entropy of a password using lists of patterns, which is probably why 'teenagemutantninjaturtles' passes (it's likely not included in the list). You may consider tweaking the entropy calculation. The actual entropy in 'teenagemutantninjaturtles' is quite low. The example has a very limited range of characters, which would fail traditional strength checkers (no capitals, numbers, special chars). Simply modifying it to 'TeenageMutantNinjaTurtles1!' would bypass most naive checkers though. You may consider placing a weight on chars near the beginning or end, so that special characters/numbers/caps in those positions only count for a portion of the required number.
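The weighting idea could look something like the sketch below. The half weight, the two-character edge width, and the function names are all arbitrary assumptions chosen for illustration; the point is only that a digit or symbol tacked onto the end counts for less than one placed in the middle.

```python
EDGE_WEIGHT = 0.5  # assumed discount for characters at either end
EDGE_WIDTH = 2     # assumed: the first 2 and last 2 characters are "edge"


def char_class(ch):
    """Classify a character as lower, upper, digit, or symbol."""
    if ch.islower():
        return "lower"
    if ch.isupper():
        return "upper"
    if ch.isdigit():
        return "digit"
    return "symbol"


def weighted_class_score(password):
    """Sum, over character classes, of the best weight any occurrence earns.

    A class seen only at the edges contributes EDGE_WEIGHT; a class seen
    anywhere in the interior contributes 1.0.
    """
    best = {}
    for i, ch in enumerate(password):
        at_edge = i < EDGE_WIDTH or i >= len(password) - EDGE_WIDTH
        weight = EDGE_WEIGHT if at_edge else 1.0
        cls = char_class(ch)
        best[cls] = max(best.get(cls, 0.0), weight)
    return sum(best.values())


print(weighted_class_score("TeenageMutantNinjaTurtles1!"))  # 3.0
print(weighted_class_score("Teenage1Mutant!NinjaTurtles"))  # 4.0
```

A policy requiring a score of 4.0 would reject the first form and accept the second, though both are still built from the same well-known phrase, so this only complements a dictionary or phrase check.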
Another solution could be to run the password through Wikipedia. A Wikipedia search could capture references that a simple syntactic search wouldn't.