How to find the exact word using a regex in Java?

For a good explanation, see: http://www.regular-expressions.info/java.html

myString.matches("regex") returns true or false depending on whether the string can be matched entirely by the regular expression. It is important to remember that String.matches() only returns true if the entire string can be matched. In other words, "regex" is applied as if you had written "^regex$", with start-of-string and end-of-string anchors. This differs from most other regex libraries, where the "quick match test" method returns true if the regex can be matched anywhere in the string. If myString is "abc", then myString.matches("bc") returns false: bc matches part of abc, but ^bc$ (which is effectively what is being applied here) does not match it.
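To make the difference concrete, here is a minimal sketch comparing whole-input matching with a search anywhere in the input. The class and method names (MatchesVersusFind, matchesWhole, matchesAnywhere) are made up for illustration; only String.matches(), Pattern.compile() and Matcher.find() are real API calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchesVersusFind {
    // True only if the regex matches the whole input (String.matches semantics).
    static boolean matchesWhole(String input, String regex) {
        return input.matches(regex);
    }

    // True if the regex matches some substring of the input (Matcher.find semantics).
    static boolean matchesAnywhere(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("abc", "bc"));     // false
        System.out.println(matchesWhole("abc", "abc"));    // true
        System.out.println(matchesAnywhere("abc", "bc"));  // true
    }
}
```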

This writes "true":

String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));

The matches() method tries to match the entire input. In your example, the input "Print this" doesn't match because nothing in the pattern accounts for the leading "Print ".

So you need to add something to the regex to match the initial part of the string, e.g.

.*\\bthis\\b

And if you want to allow extra text at the end of the line too:

.*\\bthis\\b.*
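A quick way to check how these two patterns behave is to run them against a few inputs. This is only a sketch; the class name, constant names and sample strings are made up for illustration.

```java
// Hypothetical demo class; the sample inputs are illustrative only.
class WholeInputPatterns {
    // Allows extra text before the word only.
    static final String PREFIX_ONLY = ".*\\bthis\\b";
    // Allows extra text on both sides of the word.
    static final String BOTH_SIDES = ".*\\bthis\\b.*";

    public static void main(String[] args) {
        System.out.println("Print this".matches(PREFIX_ONLY));      // true
        System.out.println("Print this now".matches(PREFIX_ONLY));  // false
        System.out.println("Print this now".matches(BOTH_SIDES));   // true
        System.out.println("Print thistle".matches(BOTH_SIDES));    // false
    }
}
```

The last case is false because there is no word boundary between "this" and "tle" in "thistle".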

Alternatively, create a Matcher object and call Matcher.find(), which looks for a match anywhere within the input string:

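    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    Pattern p = Pattern.compile("\\bthis\\b");
    Matcher m = p.matcher("Print this");
    if (m.find()) {  // group() throws IllegalStateException unless a match was found
        System.out.println(m.group());
    }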

Output:

this

If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
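Calling find() in a loop might look like the sketch below. The class name, the findAll helper and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; findAll is an illustrative helper, not a JDK method.
class FindAllWords {
    // Collects every non-overlapping match of the regex, in order of appearance.
    static List<String> findAll(String input, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {            // each call resumes after the previous match
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // "thistle" is skipped because "this" is not followed by a word boundary there.
        System.out.println(findAll("this and this, but not thistle", "\\bthis\\b"));
        // prints [this, this]
    }
}
```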


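Full example method using String.matches():

import java.util.regex.Pattern;

public static final String REGEX_FIND_WORD = "(?i).*?\\b%s\\b.*?";

public static boolean containsWord(String text, String word) {
    // Quote the word so that any regex metacharacters in it are matched literally.
    String regex = String.format(REGEX_FIND_WORD, Pattern.quote(word));
    return text.matches(regex);
}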

Explanation:

  1. (?i) - case-insensitive matching
  2. .*? - allow any characters (possibly none) before the word
  3. \b - word boundary
  4. %s - placeholder filled in by String.format (the word is quoted with Pattern.quote so that regex metacharacters in it are taken literally)
  5. \b - word boundary
  6. .*? - allow any characters (possibly none) after the word
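As an alternative sketch, the same check can be written with Matcher.find(), which makes the surrounding .*? parts unnecessary. It also behaves differently on multi-line text: by default . does not match line terminators, so a matches()-based pattern built from .*? fails when the text contains a line break, while find() does not have that limitation. The class name WordSearch is made up for illustration, and this is one possible variant rather than the only way to do it.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; a find()-based variant of a containsWord check.
class WordSearch {
    static boolean containsWord(String text, String word) {
        // Quote the word so metacharacters are literal; the flag replaces the inline (?i).
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Print THIS line", "this"));         // true
        System.out.println(containsWord("A thistle", "this"));               // false
        System.out.println(containsWord("first line\nprint this", "this"));  // true
    }
}
```

Note that \b only behaves as a word boundary here when the word itself starts and ends with a word character (letter, digit or underscore).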

You can also use capturing groups to extract the exact word. In the regex API, groups are delimited by parentheses. For example:

A(B(C))D

This expression defines three groups, indexed from 0 and numbered by the position of their opening parenthesis. Group 0 always refers to the entire match:

  • 0th group - ABCD
  • 1st group - BC
  • 2nd group - C
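The group numbering above can be checked with a short sketch. The class name and the groups helper are made up for illustration; Matcher.groupCount() and Matcher.group(int) are the real API calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; groups is an illustrative helper, not a JDK method.
class GroupNumbering {
    // Returns group 0 (the whole match) followed by each capturing group,
    // or an empty list if the regex does not match the whole input.
    static List<String> groups(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.matches()) {
            // groupCount() counts capturing groups only; group 0 is not included.
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(groups("A(B(C))D", "ABCD"));  // prints [ABCD, BC, C]
    }
}
```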

So if you need to find a specific word, you can use two methods of the Matcher class: find() to locate the text described by the regex, and then group(int) to get the String captured by a given group number:

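String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
// Only read the group after find() succeeds; otherwise group() throws IllegalStateException.
if (m.find()) {
    System.out.println(m.group(1));
}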

The code above prints "beautiful".
