Regexp Java for password validation

simple example using regex

public class passwordvalidation {
    public static void main(String[] args) {
      String passwd = "aaZZa44@"; 
      String pattern = "(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{8,}";
      System.out.println(passwd.matches(pattern));
   }
}

Explanations:

  • (?=.*[0-9]) a digit must occur at least once
  • (?=.*[a-z]) a lower case letter must occur at least once
  • (?=.*[A-Z]) an upper case letter must occur at least once
  • (?=.*[@#$%^&+=]) a special character must occur at least once
  • (?=\\S+$) no whitespace allowed in the entire string
  • .{8,} at least 8 characters

You should not use overly complex Regex (if you can avoid them) because they are

  • hard to read (at least for everyone but yourself)
  • hard to extend
  • hard to debug

Although there might be a small performance overhead in using many small regular expressions, the points above outweight it easily.

I would implement like this:

bool matchesPolicy(pwd) {
    if (pwd.length < 8) return false;
    if (not pwd =~ /[0-9]/) return false;
    if (not pwd =~ /[a-z]/) return false;
    if (not pwd =~ /[A-Z]/) return false;
    if (not pwd =~ /[%@$^]/) return false;
    if (pwd =~ /\s/) return false;
    return true;
}

All the previously given answers use the same (correct) technique to use a separate lookahead for each requirement. But they contain a couple of inefficiencies and a potentially massive bug, depending on the back end that will actually use the password.

I'll start with the regex from the accepted answer:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}$

First of all, since Java supports \A and \z I prefer to use those to make sure the entire string is validated, independently of Pattern.MULTILINE. This doesn't affect performance, but avoids mistakes when regexes are recycled.

\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}\z

Checking that the password does not contain whitespace and checking its minimum length can be done in a single pass by using the all at once by putting variable quantifier {8,} on the shorthand \S that limits the allowed characters:

\A(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])\S{8,}\z

If the provided password does contain a space, all the checks will be done, only to have the final check fail on the space. This can be avoided by replacing all the dots with \S:

\A(?=\S*[0-9])(?=\S*[a-z])(?=\S*[A-Z])(?=\S*[@#$%^&+=])\S{8,}\z

The dot should only be used if you really want to allow any character. Otherwise, use a (negated) character class to limit your regex to only those characters that are really permitted. Though it makes little difference in this case, not using the dot when something else is more appropriate is a very good habit. I see far too many cases of catastrophic backtracking because the developer was too lazy to use something more appropriate than the dot.

Since there's a good chance the initial tests will find an appropriate character in the first half of the password, a lazy quantifier can be more efficient:

\A(?=\S*?[0-9])(?=\S*?[a-z])(?=\S*?[A-Z])(?=\S*?[@#$%^&+=])\S{8,}\z

But now for the really important issue: none of the answers mentions the fact that the original question seems to be written by somebody who thinks in ASCII. But in Java strings are Unicode. Are non-ASCII characters allowed in passwords? If they are, are only ASCII spaces disallowed, or should all Unicode whitespace be excluded.

By default \s matches only ASCII whitespace, so its inverse \S matches all Unicode characters (whitespace or not) and all non-whitespace ASCII characters. If Unicode characters are allowed but Unicode spaces are not, the UNICODE_CHARACTER_CLASS flag can be specified to make \S exclude Unicode whitespace. If Unicode characters are not allowed, then [\x21-\x7E] can be used instead of \S to match all ASCII characters that are not a space or a control character.

Which brings us to the next potential issue: do we want to allow control characters? The first step in writing a proper regex is to exactly specify what you want to match and what you don't. The only 100% technically correct answer is that the password specification in the question is ambiguous because it does not state whether certain ranges of characters like control characters or non-ASCII characters are permitted or not.


Try this:

^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,}$

Explanation:

^                 # start-of-string
(?=.*[0-9])       # a digit must occur at least once
(?=.*[a-z])       # a lower case letter must occur at least once
(?=.*[A-Z])       # an upper case letter must occur at least once
(?=.*[@#$%^&+=])  # a special character must occur at least once
(?=\S+$)          # no whitespace allowed in the entire string
.{8,}             # anything, at least eight places though
$                 # end-of-string

It's easy to add, modify or remove individual rules, since every rule is an independent "module".

The (?=.*[xyz]) construct eats the entire string (.*) and backtracks to the first occurrence where [xyz] can match. It succeeds if [xyz] is found, it fails otherwise.

The alternative would be using a reluctant qualifier: (?=.*?[xyz]). For a password check, this will hardly make any difference, for much longer strings it could be the more efficient variant.

The most efficient variant (but hardest to read and maintain, therefore the most error-prone) would be (?=[^xyz]*[xyz]), of course. For a regex of this length and for this purpose, I would dis-recommend doing it that way, as it has no real benefits.

Tags:

Java

Regex