What's a good regex to include accented characters in a simple way?
Accented Characters: DIY Character Range Subtraction
If your regex engine allows it (and many will), this will work:
(?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$
Explanation
- (?i) sets case-insensitive mode
- The ^ anchor asserts that we are at the beginning of the string
- (?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ]) matches one character:
  - The lookahead (?![×Þß÷þø]) asserts that the char is not one of those in the brackets
  - [-'0-9a-zÀ-ÿ] allows dash, apostrophe, digits, letters, and chars in a wide accented range (À-ÿ), from which the lookahead subtracts the excluded characters
- The + matches that one or more times
- The $ anchor asserts that we are at the end of the string
Reference
Extended ASCII Table
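Here is a minimal sketch for trying the first pattern with Python's standard `re` module. The language, the `is_valid_name` helper, and the sample strings are illustrative assumptions, not part of the original answer. The accented characters are written as `\uXXXX` escapes (which `re` parses inside patterns) so the source file's encoding can't alter them.

```python
import re

# First pattern from above: (?i)^(?:(?![×Þß÷þø])[-'0-9a-zÀ-ÿ])+$
# \u00d7 = ×, \u00de = Þ, \u00df = ß, \u00f7 = ÷, \u00fe = þ, \u00f8 = ø,
# \u00c0-\u00ff = À-ÿ
NAME_LOOKAHEAD = re.compile(
    r"(?i)^(?:(?![\u00d7\u00de\u00df\u00f7\u00fe\u00f8])[-'0-9a-z\u00c0-\u00ff])+$"
)


def is_valid_name(text: str) -> bool:
    # fullmatch also rejects a trailing "\n", which a bare $ would let through
    return NAME_LOOKAHEAD.fullmatch(text) is not None


samples = ["O'Brien-Smith", "Jos\u00e9", "abc123", "stra\u00dfe", "a\u00d7b", "\u0141\u00f3d\u017a"]
for sample in samples:
    print(sample, is_valid_name(sample))
```

`fullmatch` is used because in Python a bare `$` also matches just before a trailing newline. Python's `(?i)` is Unicode-aware, so excluding ø also rules out Ø; an engine with ASCII-only case folding (such as Java without `UNICODE_CASE`) would still accept Ø. Characters outside Latin-1, like the Ł in Łódź, are rejected.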
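A version without the lookahead, which leaves out × and ÷ by splitting the ranges instead (unlike the first version, it accepts Þ, ß, þ and ø):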
^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$
Explanation
- The ^ anchor asserts that we are at the beginning of the string
- [-'a-zA-ZÀ-ÖØ-öø-ÿ] allows dash, apostrophe, ASCII letters, and chars in the accented range À-ÿ; the ranges skip × (U+00D7) and ÷ (U+00F7). Unlike the first version, it does not allow digits
- The + matches that one or more times
- The $ anchor asserts that we are at the end of the string
Reference
Extended ASCII Table
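The same kind of check for the second pattern, again as a minimal Python sketch with a hypothetical `is_valid_name_ranges` helper and made-up samples. It shows the two behavioral differences from the first version: Ø and ß pass, digits fail.

```python
import re

# Second pattern from above: ^[-'a-zA-ZÀ-ÖØ-öø-ÿ]+$
# \u00c0-\u00d6 = À-Ö, \u00d8-\u00f6 = Ø-ö, \u00f8-\u00ff = ø-ÿ;
# the gaps are \u00d7 (×) and \u00f7 (÷)
NAME_RANGES = re.compile(r"^[-'a-zA-Z\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u00ff]+$")


def is_valid_name_ranges(text: str) -> bool:
    # fullmatch so that a trailing newline is not accepted by $
    return NAME_RANGES.fullmatch(text) is not None


for sample in ["\u00d8resund", "stra\u00dfe", "abc123", "6\u00f73"]:
    print(sample, is_valid_name_ranges(sample))
```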
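Alternatively, if your engine supports Unicode property escapes, put these two properties in a character class:
[\p{L}\p{M}]
For a whole string, anchor and repeat it: ^[\p{L}\p{M}]+$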
In Unicode-aware engines, this matches:
- any letter character (L) from any language
- any mark (M), i.e. a character meant to be combined with another (an accent, etc.)
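Python's built-in `re` module does not support `\p{...}` escapes (the third-party `regex` module does, e.g. `regex.fullmatch(r"[\p{L}\p{M}]+", text)`). As a stand-in, here is a minimal sketch that checks each character's Unicode general category with the standard `unicodedata` module; the `only_letters_and_marks` helper and its `extra` parameter are hypothetical. It also shows why \p{M} matters: a decomposed "José" (plain e followed by the combining accent U+0301) passes here, while the Latin-1 ranges above reject the combining accent.

```python
import unicodedata


def only_letters_and_marks(text: str, extra: str = "") -> bool:
    # Emulates ^[\p{L}\p{M}]+$ : every character must have a Unicode general
    # category starting with "L" (letter) or "M" (mark). Characters listed in
    # `extra` (e.g. "-'") are accepted as well.
    if not text:
        return False  # the + quantifier needs at least one character
    return all(
        ch in extra or unicodedata.category(ch)[0] in ("L", "M") for ch in text
    )


print(only_letters_and_marks("Jose\u0301"))            # True (e + combining acute)
print(only_letters_and_marks("\u0141\u00f3d\u017a"))   # True (Łódź)
print(only_letters_and_marks("O'Brien"))               # False (apostrophe is punctuation)
print(only_letters_and_marks("O'Brien", extra="-'"))   # True
```

Unlike the first two patterns, [\p{L}\p{M}] by itself accepts no dashes, apostrophes, or digits; in a real regex you would add them to the class, e.g. ^[-'\p{L}\p{M}]+$.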