Unicode characters in Regex

Try incorporating \p{L}, which matches any Unicode "letter". Both a and á match \p{L}.

Just for reference, you don't need to escape the ' , and . characters inside a character class [], and you can avoid having to escape the dash - by placing it at the beginning or end of the class.
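To illustrate the point about escaping, here is a small sketch comparing an escaped character class with an unescaped one. The class name CharClassEscapes and the two test patterns are made up for this example; only the Regex API itself comes from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassEscapes
{
    // Escaped form: every punctuation character carries a backslash.
    public static readonly Regex Escaped = new Regex(@"^[\'\-\.\,]+$");

    // Same set with no escapes: the dash sits last, so it is a literal
    // dash rather than a range operator.
    public static readonly Regex Unescaped = new Regex(@"^[',.-]+$");

    public static void Main()
    {
        foreach (string s in new[] { "'", ",", ".", "-", "',.-", "a-b" })
        {
            Console.WriteLine("{0,-5} escaped={1} unescaped={2}",
                s, Escaped.IsMatch(s), Unescaped.IsMatch(s));
        }
    }
}
```

Both patterns should agree on every input: they accept strings made only of ' , . and - and reject anything else, such as "a-b".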

You can use \p{L}, which matches any kind of letter from any language. See the example below:

string[] names = { "Brendán", "Jóhn", "Jason" };
Regex rgx      = new Regex(@"^\p{L}+$");
foreach (string name in names)
    Console.WriteLine("{0} {1} a valid name.", name, rgx.IsMatch(name) ? "is" : "is not");

// Brendán is a valid name.
// Jóhn is a valid name.
// Jason is a valid name.

Or simply add the characters you want to allow directly to your character class [].
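As a rough sketch of that second option, the accented vowels can be typed straight into the class. The class name LiteralAccents and the sample names are hypothetical, and this assumes the source file is saved in an encoding the compiler reads correctly (UTF-8, for instance) and that the input uses precomposed characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralAccents
{
    // ASCII letters plus the ten fada vowels, written literally.
    public static readonly Regex Name = new Regex(@"^[A-Za-zÁÉÍÓÚáéíóú]+$");

    public static void Main()
    {
        // "Zoë" is included to show that a letter outside the class is rejected.
        foreach (string name in new[] { "Brendán", "Jóhn", "Jason", "Zoë" })
        {
            Console.WriteLine("{0} {1} a valid name.",
                name, Name.IsMatch(name) ? "is" : "is not");
        }
    }
}
```

The trade-off compared with \p{L} is that only the listed characters are accepted, so every additional letter has to be added by hand.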


To expand your regular expression to include vowels with an acute accent (fada), you can use Unicode code points. You need to know about these Unicode blocks:

  • C0 controls and Basic Latin
  • C1 controls and Latin-1 Supplement
  • and possibly Latin Extended-A

More Unicode code charts are available at http://www.unicode.org/charts/index.html#scripts, covering Latin Extended-B, -C and -D and Latin Extended Additional (which together ought to cover pretty much every European language in its entirety).

So, we see that the Irish fada vowels are

  • Á is \u00C1; á is \u00E1
  • É is \u00C9; é is \u00E9
  • Í is \u00CD; í is \u00ED
  • Ó is \u00D3; ó is \u00F3
  • Ú is \u00DA; ú is \u00FA

And thus your regular expression needs to be extended:

Regex rx = new Regex( @"^[A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA][A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA0-9@#%&\'\-\s\.\,*]*$");
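For a quick check of how that pattern behaves, here is a minimal sketch that builds an equivalent expression from a shared fragment and runs it over a few inputs. The names FadaPattern, Letters and Rx, and the sample strings, are invented for this example. The punctuation is written unescaped with the dash last, which should match the same characters as the escaped form above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FadaPattern
{
    // ASCII letters plus the ten fada vowels, given as \uXXXX escapes
    // that the regex engine itself interprets.
    private const string Letters =
        @"A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA";

    // First character must be a letter; the rest may also be digits,
    // whitespace, or one of @ # % & ' . , * -
    public static readonly Regex Rx =
        new Regex("^[" + Letters + "][" + Letters + @"0-9@#%&'\s.,*-]*$");

    public static void Main()
    {
        string[] samples = { "Seán O'Brien", "Máire-Áine, Jr.", "9lives", "Zoë" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: {1}", s, Rx.IsMatch(s) ? "match" : "no match");
        }
    }
}
```

The first two samples should match. "9lives" fails because the first character must be a letter, and "Zoë" fails because ë is not one of the listed code points.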



