Constructing regex pattern to match sentence
String regex = "^\\s+[A-Za-z,;'\"\\s]+[.?!]$";
^ means "begins with"
\\s means whitespace
+ means 1 or more
[A-Za-z,;'"\\s] means any letter, a comma (,), semicolon (;), single quote ('), double quote ("), or whitespace character
$ means "ends with"
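As a rough illustration of how this pattern behaves, here is a minimal sketch that compiles it with java.util.regex and tests a few strings. The class name SentenceCheck and the method isSentence are hypothetical names chosen for this example, not part of the original answer.

```java
import java.util.regex.Pattern;

class SentenceCheck {
    // The pattern from above: leading whitespace, then letters,
    // listed punctuation or whitespace, then exactly one terminator.
    private static final Pattern SENTENCE =
            Pattern.compile("^\\s+[A-Za-z,;'\"\\s]+[.?!]$");

    // Returns true only if the whole string fits the pattern.
    static boolean isSentence(String text) {
        return SENTENCE.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSentence(" Hello, world."));       // true
        System.out.println(isSentence("Hello, world."));        // false: no leading whitespace
        System.out.println(isSentence(" It costs 5 dollars.")); // false: digits are not in the character class
    }
}
```

Note that, as written, the pattern rejects any sentence containing digits or punctuation outside the listed set, so the character class may need widening for real text.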
A sentence starts with a word boundary (hence \b) and ends with one or more terminators. Thus:
\b[^.!?]+[.!?]+
https://regex101.com/r/7DdyM1/1
This gives fairly accurate results. However, it does not handle decimal numbers, because the period inside the number is treated as a terminator. For example, this sentence will be interpreted as two sentences:
The value of PI is 3.141...
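The behaviour described above can be reproduced with a short sketch that collects every match of \b[^.!?]+[.!?]+ from a string. The class name SentenceSplitter and the method split are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SentenceSplitter {
    // Word boundary, then non-terminators, then one or more terminators.
    private static final Pattern SENTENCE = Pattern.compile("\\b[^.!?]+[.!?]+");

    // Collects each successive match as one "sentence".
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            sentences.add(m.group());
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello there. How are you?"));
        // [Hello there., How are you?]
        System.out.println(split("The value of PI is 3.141..."));
        // [The value of PI is 3., 141...]
    }
}
```

The second call shows the limitation: the match stops at the decimal point, and the digits after it begin a new match at the next word boundary.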
An example regex to match sentences by the definition "A sentence is a series of characters, starting with at least one whitespace character, that ends in one of ., !, or ?" is as follows:
\s+[^.!?]*[.!?]
Note that newline characters will also be included in this match.
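To make the newline remark concrete, the following sketch runs \s+[^.!?]*[.!?] over a two-line string and returns the raw matches. The class name WhitespaceSentences and the method find are hypothetical, and the sample text is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceSentences {
    // One or more whitespace characters, any run of non-terminators, one terminator.
    private static final Pattern SENTENCE = Pattern.compile("\\s+[^.!?]*[.!?]");

    // Returns the raw matches, including their leading whitespace.
    static List<String> find(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        for (String s : find("Intro. First one.\nSecond one!")) {
            // Brackets make the leading space or newline visible.
            System.out.println("[" + s + "]");
        }
    }
}
```

With this input, "Intro." is skipped because nothing precedes it, and the second match begins with the newline character itself. Calling trim() on each match is one way to drop the leading whitespace if it is not wanted.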