Regular Expression for UpperCase Letters In A String

This should do what you're after:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Test
public void testCountTheNumberOfUpperCaseCharacters() {
  String testStr = "abcdefghijkTYYtyyQ";
  String regEx = "[A-Z]+";
  Pattern pattern = Pattern.compile(regEx);
  Matcher matcher = pattern.matcher(testStr);
  int count = 0;
  // Each match is a run of capitals; add its length to the running total.
  while (matcher.find()) {
    count += matcher.group(0).length();
  }
  System.out.printf("Found %d capital letters in %s%n", count, testStr);
}

It doesn't work because you have two problems:

  1. The regex is incorrect: it should be "[A-Z]" for ASCII letters or "\\p{Lu}" for Unicode uppercase letters.
  2. You're not calling matcher.find() in a while loop before matcher.groupCount().

Correct code:

public void testCountTheNumberOfUpperCaseCharacters() {
    String testStr = "abcdefghijkTYYtyyQ";
    String regEx = "(\\p{Lu})";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(testStr);
    int count = 0;
    // Each successful find() is one uppercase letter. groupCount() is the
    // number of capturing groups in the pattern (1 here), not the match count.
    while (matcher.find())
        count++;
    System.out.printf("Found %d capital letters in %s%n", count, testStr);
}

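UPDATE: This much simpler one-liner counts the Unicode uppercase letters in a string, provided the string does not start with one. On Java 8 and later, a zero-width match at index 0 produces no leading empty element, so the result is one too low in that case:

int countuc = testStr.split("(?=\\p{Lu})").length - 1;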
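As a quick check of the split-based one-liner, the sketch below runs it next to an alternative that deletes every character that is not an uppercase letter and measures what remains. The class and method names are hypothetical, and the `replaceAll` variant is a swapped-in technique, not part of the original answer. It assumes Java 8 or later, where `split` drops the empty leading element for a zero-width match at index 0.

```java
class UpperCaseOneLiners {
    // The split-based count: one array element boundary per uppercase letter.
    static int countBySplit(String s) {
        return s.split("(?=\\p{Lu})").length - 1;
    }

    // Alternative: strip everything that is NOT in \p{Lu}, then take the length.
    static int countByRemoval(String s) {
        return s.replaceAll("\\P{Lu}", "").length();
    }

    public static void main(String[] args) {
        System.out.println(countBySplit("abcdefghijkTYYtyyQ"));   // 4
        System.out.println(countByRemoval("abcdefghijkTYYtyyQ")); // 4
        // A string that starts with a capital exposes the difference on Java 8+.
        System.out.println(countBySplit("Abc"));   // 0
        System.out.println(countByRemoval("Abc")); // 1
    }
}
```

Both variants agree on the test string; they diverge only when the input begins with an uppercase letter.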

  1. You didn't call matches or find on the matcher. It hasn't done any work.

  2. groupCount is the wrong method to call. Your regex has no capture groups, and even if it did, it wouldn't give you the character count.

You should be using find, but with a different regex, one without anchors. I would also advise using the proper Unicode character class: "\\p{Lu}+". Use this in a while (m.find()) loop, and accumulate the total number of characters obtained from m.group(0).length() at each step.
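The loop described above can be sketched as follows. This is a minimal illustration, not the asker's code: the class and method names are hypothetical, and the second input uses Unicode escapes for "ÉCOLE été" to show that \p{Lu} also counts non-ASCII capitals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpperCaseRunCounter {
    // Unanchored pattern: one or more consecutive Unicode uppercase letters.
    private static final Pattern UPPER_RUN = Pattern.compile("\\p{Lu}+");

    static int countUpperCase(String input) {
        Matcher m = UPPER_RUN.matcher(input);
        int total = 0;
        // Each find() locates the next run; its length is added to the total.
        while (m.find()) {
            total += m.group(0).length();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ"));       // 4
        System.out.println(countUpperCase("\u00C9COLE \u00E9t\u00E9")); // 5
    }
}
```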


"It should find upper case letters in the given string and give me the count."

No, it shouldn't: the ^ and $ anchors prevent it from doing so, because they force the match to span the whole input, so it succeeds only on a non-empty string composed entirely of uppercase characters.

Moreover, groupCount() returns the number of capturing groups defined in the pattern, not the number of matches; for an expression that defines no groups it is always zero.
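Both points can be checked with a small sketch. The question's actual regex is not shown here, so "^[A-Z]+$" is an assumed stand-in for an anchored pattern, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class AnchorAndGroupCountDemo {
    // Assumed anchored pattern standing in for the one in the question.
    static boolean anchoredFinds(String s) {
        return Pattern.compile("^[A-Z]+$").matcher(s).find();
    }

    // groupCount() depends only on the pattern, so no match attempt is needed.
    static int groupsIn(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(anchoredFinds("abcTYY")); // false: input is not all capitals
        System.out.println(anchoredFinds("TYY"));    // true
        System.out.println(groupsIn("[A-Z]+"));      // 0: no capturing groups
        System.out.println(groupsIn("([A-Z])"));     // 1: one capturing group
    }
}
```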

If you insist on using a regex, use a simple [A-Z] expression with no anchors, and call matcher.find() in a loop. A better approach, however, would be calling Character.isUpperCase on the characters of your string, and counting the hits:

int count = 0;
for (char c : str.toCharArray()) {
    if (Character.isUpperCase(c)) {
        count++;
    }
}
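For strings that may contain characters outside the Basic Multilingual Plane, a variant of the same idea can iterate over code points instead of chars, since a surrogate half on its own is never uppercase. This is a sketch under that assumption, requiring Java 8 or later for String.codePoints(); the class and method names are hypothetical.

```java
class UpperCaseCodePoints {
    // Counts uppercase code points using the int overload of Character.isUpperCase.
    static long countUpperCase(String str) {
        return str.codePoints().filter(Character::isUpperCase).count();
    }

    public static void main(String[] args) {
        System.out.println(countUpperCase("abcdefghijkTYYtyyQ")); // 4
    }
}
```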

Tags:

Java

Regex