Unicode characters in Regex
Try incorporating \p{L}, which will match a Unicode "letter". So a and á should both match against \p{L}.
Just for reference, you don't need to escape the above ',. in your character class [], and you can avoid having to escape the dash - by placing it at the beginning or end of your character class.
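As a rough illustration of the escaping point, here is a minimal sketch. The class name CharClassDemo, the IsValid helper and the sample inputs are made up for this example; the pattern is an assumption, not the asker's original expression.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    // Hypothetical pattern: apostrophe, comma and period are literal inside [],
    // and the dash is literal because it is the last character in the class.
    static readonly Regex Rgx = new Regex(@"^[\p{L}',.\s-]+$");

    public static bool IsValid(string input) => Rgx.IsMatch(input);

    static void Main()
    {
        Console.WriteLine(IsValid("O'Brien-Smith, Jr.")); // True
        Console.WriteLine(IsValid("Seán_Óg"));            // False: '_' is not in the class
    }
}
```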
You can use \p{L}, which matches any kind of letter from any language. See the example below:
string[] names = { "Brendán", "Jóhn", "Jason" };
Regex rgx = new Regex(@"^\p{L}+$");
foreach (string name in names)
    Console.WriteLine("{0} {1} a valid name.", name, rgx.IsMatch(name) ? "is" : "is not");
// Brendán is a valid name.
// Jóhn is a valid name.
// Jason is a valid name.
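One caveat worth checking against your own data: \p{L} matches letters only, not combining marks. If a name arrives in decomposed form (a plain a followed by U+0301, the combining acute accent), ^\p{L}+$ will reject it. The sketch below assumes such input is possible and allows marks (\p{M}) after each letter; the class and field names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CombiningMarkDemo
{
    // Letters only, as in the example above.
    public static readonly Regex LettersOnly = new Regex(@"^\p{L}+$");

    // Each letter may be followed by any number of combining marks.
    public static readonly Regex LettersWithMarks = new Regex(@"^(\p{L}\p{M}*)+$");

    static void Main()
    {
        // "Brendán" spelled with 'a' + U+0301 instead of the precomposed U+00E1.
        string decomposed = "Brenda\u0301n";

        Console.WriteLine(LettersOnly.IsMatch(decomposed));      // False
        Console.WriteLine(LettersWithMarks.IsMatch(decomposed)); // True
    }
}
```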
Or simply add the characters you want to include to your character class []:
@"^[a-zA-Z0-9áéíóú@#%&',.\s-]+$"
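A quick sketch of how that explicit class behaves. The class name and sample inputs are invented for illustration. Note that the pattern as written lists only the lowercase accented vowels, so a name starting with Á is rejected unless the uppercase forms are added too.

```csharp
using System;
using System.Text.RegularExpressions;

static class ExplicitClassDemo
{
    // The pattern from above, unchanged.
    static readonly Regex Rgx = new Regex(@"^[a-zA-Z0-9áéíóú@#%&',.\s-]+$");

    public static bool IsValid(string input) => Rgx.IsMatch(input);

    static void Main()
    {
        Console.WriteLine(IsValid("Seán O'Connor")); // True
        Console.WriteLine(IsValid("Áine"));          // False: uppercase Á is not listed
    }
}
```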
To expand your regular expression to include vowels with an acute accent (fada), you can use Unicode code points. You need to know about these Unicode blocks:
- C0 controls and Basic Latin
- C1 controls and Latin-1 Supplement
- and possibly Latin Extended-A
More Unicode code charts are at http://www.unicode.org/charts/index.html#scripts, covering Latin Extended-B, -C and -D and Latin Extended Additional (which ought to cover pretty much every European language in its entirety).
So, we see that the Irish fada vowels are:
- Á is \u00C1; á is \u00E1
- É is \u00C9; é is \u00E9
- Í is \u00CD; í is \u00ED
- Ó is \u00D3; ó is \u00F3
- Ú is \u00DA; ú is \u00FA
And thus your regular expression needs to be extended:
Regex rx = new Regex( @"^[A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA][A-Za-z\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA0-9@#%&\'\-\s\.\,*]*$");
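For a sense of what this extended expression accepts, here is a small sketch. It builds an equivalent pattern from a shared constant so the ten code points are written only once; the names FadaNameDemo, FadaVowels and IsValid, and the sample inputs, are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class FadaNameDemo
{
    // Á É Í Ó Ú followed by á é í ó ú, as \uXXXX escapes understood by the regex engine.
    const string FadaVowels = @"\u00C1\u00C9\u00CD\u00D3\u00DA\u00E1\u00E9\u00ED\u00F3\u00FA";

    // First character: a letter. Remaining characters: letters, digits and the listed punctuation.
    static readonly Regex Rx = new Regex(
        "^[A-Za-z" + FadaVowels + "][A-Za-z" + FadaVowels + @"0-9@#%&'\s.,*-]*$");

    public static bool IsValid(string input) => Rx.IsMatch(input);

    static void Main()
    {
        Console.WriteLine(IsValid("Áine Ní Bhriain")); // True
        Console.WriteLine(IsValid("1Seán"));           // False: must start with a letter
        Console.WriteLine(IsValid("Zoë"));             // False: ë (U+00EB) is not in the class
    }
}
```

Listing code points keeps the accepted set tight, at the cost of rejecting any accented letter you did not list, as the last case shows.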