Is There a Way to Match Any Unicode Alphabetic Character?

Check out Unicode character properties: http://www.regular-expressions.info/unicode.html#prop. I think what you are looking for is probably

\p{L}

which will match any letters or ideographs. You may also want to include letters with marks on them, so you could do

\p{L}\p{M}*

In any case, all the different types of character properties are detailed in the first link.

Edit: You may also want to look at this Stack Overflow answer discussing whether \w matches unicode characters. They suggest that you could also use \p{Word} or \p{Alnum}: Does \w match all alphanumeric characters defined in the Unicode standard?

Depending on which language you're using, the regular expression engine may or may not be Unicode aware. If it is, it may or may not know the \p{} property tokens. If it does, your answer is in Unicode Characters and Properties in Jan Goyvaerts' regex tutorial.

You can use \p{Latin}, if supported, to detect everything that is (or isn't, of course) from a language that uses any of the Unicode Latin blocks.

Is There a Way to Match Any Unicode Alphabetic Character?

Tags:

Unicode

Regex

Perl

Character Properties

Related

Recent Posts