Constructing regex pattern to match sentence

String regex = "^\\s+[A-Za-z,;'\"\\s]+[.?!]$";

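^ means "begins with"
\\s means a whitespace character (the backslash is doubled because this is a Java string literal)
+ means 1 or more
[A-Za-z,;'"\\s] means any letter, comma, semicolon, single quote, double quote, or whitespace character
[.?!] means exactly one terminator: ., ? or !
$ means "ends with"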
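
A minimal sketch of how the pattern above behaves with java.util.regex. The class name SentenceCheck, the method isSentence, and the sample strings are hypothetical and chosen only for illustration:

```java
import java.util.regex.Pattern;

class SentenceCheck {
    // Same pattern as above: leading whitespace, then letters,
    // listed punctuation or whitespace, then a single terminator.
    static final Pattern SENTENCE =
            Pattern.compile("^\\s+[A-Za-z,;'\"\\s]+[.?!]$");

    // Returns true only if the whole input matches the pattern.
    static boolean isSentence(String text) {
        return SENTENCE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSentence(" Hello, world."));       // true
        System.out.println(isSentence("Hello, world."));        // false: no leading whitespace
        System.out.println(isSentence(" It costs 5 dollars.")); // false: digits are not in the character class
    }
}
```

Note that the pattern requires at least one leading whitespace character and rejects anything outside the character class, such as digits or hyphens, so it is stricter than it may first appear.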


A sentence starts with a word boundary (hence \b) and ends with one or more terminators. Thus:

\b[^.!?]+[.!?]+

https://regex101.com/r/7DdyM1/1

This gives fairly accurate results. However, it does not handle decimal numbers, because the decimal point is treated as a sentence terminator. For example, the following sentence is interpreted as two sentences:

The value of PI is 3.141...
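
The behaviour described here can be reproduced with a short Java sketch. The class name SentenceSplitter and the method split are hypothetical, as is the first sample string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitter {
    // Word boundary, then one or more non-terminators, then one or more terminators.
    static final Pattern SENTENCE = Pattern.compile("\\b[^.!?]+[.!?]+");

    // Collects every successive match of the pattern in the input.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [Hello there., How are you?]
        System.out.println(split("Hello there. How are you?"));
        // prints [The value of PI is 3., 141...]
        System.out.println(split("The value of PI is 3.141..."));
    }
}
```

The second call shows the limitation: the decimal point ends the first match, and the digits after it start a new one.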

An example regex to match sentences by the definition "A sentence is a series of characters, starting with at least one whitespace character, that ends in one of ., ! or ?" is as follows:

\s+[^.!?]*[.!?]

Note that newline characters will also be included in this match.
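
As a rough illustration of the newline behaviour, the following Java sketch applies the pattern to a string that contains a line break. The class name WhitespaceSentenceFinder, the method find, and the sample text are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceSentenceFinder {
    // One or more whitespace characters, any run of non-terminators, one terminator.
    static final Pattern SENTENCE = Pattern.compile("\\s+[^.!?]*[.!?]");

    // Collects every successive match of the pattern in the input.
    static List<String> find(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = " First line\ncontinues here. Second one!";
        for (String s : find(text)) {
            // Show the newline as a visible escape so each match prints on one line.
            System.out.println("[" + s.replace("\n", "\\n") + "]");
        }
    }
}
```

Because [^.!?] also matches \n, the first match spans both lines. Each match also keeps its leading whitespace, and a sentence at the very start of the input with no whitespace before it is not matched at all.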
