How can I validate a culture code with a regular expression?
This is what I found in the Dublin Core / W3C XSDs (http://www.w3.org/2001/XMLSchema):
<xs:simpleType name="language" id="language">
  <xs:annotation>
    <xs:documentation
      source="http://www.w3.org/TR/xmlschema-2/#language"/>
  </xs:annotation>
  <xs:restriction base="xs:token">
    <xs:pattern
      value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"
      id="language.pattern">
      <xs:annotation>
        <xs:documentation
          source="http://www.ietf.org/rfc/rfc3066.txt">
          pattern specifies the content of section 2.12 of XML 1.0e2
          and RFC 3066 (Revised version of RFC 1766).
        </xs:documentation>
      </xs:annotation>
    </xs:pattern>
  </xs:restriction>
</xs:simpleType>
So the pattern is:
[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*
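XML Schema patterns are implicitly anchored to the whole value, so if you reuse this pattern outside an XSD you have to anchor it yourself. Here is a minimal Python sketch that does this with re.fullmatch. The function name is_rfc3066_like is hypothetical. Keep in mind that the pattern only checks the shape of the code, not whether the subtags are actually registered.

```python
import re

# Pattern copied from the xs:language definition above. re.fullmatch
# anchors it at both ends, mirroring the implicit anchoring of XSD patterns.
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

def is_rfc3066_like(code: str) -> bool:
    """Return True if code matches the xs:language lexical pattern."""
    return XSD_LANGUAGE.fullmatch(code) is not None

for code in ["en", "en-US", "zh-Hant-TW", "x-klingon", "en_US", "toolonglang"]:
    print(code, is_rfc3066_like(code))
```

In this sample, the underscore in en_US and the 11-letter first subtag in toolonglang are rejected. The other codes are accepted.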
For culture codes such as en-US, you can validate with this stricter expression:
/^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$/
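To see how much stricter this is than the XSD pattern, here is a minimal Python comparison. Python's re module has no /.../ literal, so the slashes are dropped and the expression is passed as a raw string. The sample codes are purely illustrative.

```python
import re

LOOSE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")              # XSD pattern
STRICT = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-[a-zA-Z]{4})?)?$")  # stricter one

samples = ["en", "en-US", "sr-SP-Cyrl", "EN-us", "sr-Cyrl-RS", "en-US-x"]
for code in samples:
    # fullmatch requires the whole string to match
    loose = LOOSE.fullmatch(code) is not None
    strict = STRICT.fullmatch(code) is not None
    print(f"{code:12} loose={loose} strict={strict}")
```

All six samples pass the loose pattern. The strict one rejects EN-us because of letter case, and en-US-x because the last part is not four letters. It also rejects sr-Cyrl-RS because it expects the region before the four-letter script, so codes that put the script first (the BCP 47 order) will not pass.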
Here is how it works:
^         <- Start of the string
[a-z]     <- One letter from a to z (lower-case)
{2,3}     <- Repeated at least 2 times, at most 3
(?:       <- Non-capturing group
-         <- The "-" character
[A-Z]     <- One letter from A to Z (upper-case)
{2,3}     <- Repeated at least 2 times, at most 3
(?:       <- Non-capturing group
-         <- The "-" character
[a-zA-Z]  <- One letter from a to z, in either case
{4}       <- Repeated exactly 4 times
)         <- End of the inner group
?         <- Optional
)         <- End of the outer group
?         <- Optional
$         <- End of the string
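The breakdown above can be written directly into the expression using Python's re.VERBOSE flag, which ignores whitespace and # comments. The labels language, region and script in the comments are my reading of each part, not something the pattern enforces. The helper name is_culture_code is hypothetical.

```python
import re

# Same expression as above, one commented piece per line.
CULTURE = re.compile(
    r"""
    ^                      # start of the string
    [a-z]{2,3}             # language: 2-3 lower-case letters
    (?:                    # optional non-capturing group
        -[A-Z]{2,3}        # hyphen + region: 2-3 upper-case letters
        (?:                # optional inner non-capturing group
            -[a-zA-Z]{4}   # hyphen + script: exactly 4 letters, either case
        )?
    )?
    $                      # end of the string
    """,
    re.VERBOSE,
)

def is_culture_code(code: str) -> bool:
    # In Python, $ also matches just before a trailing newline, so
    # CULTURE.match("fr-CA\n") succeeds; fullmatch rejects that input.
    return CULTURE.fullmatch(code) is not None

print(is_culture_code("fr-CA"), is_culture_code("fr-CA\n"))
```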
You can also replace the last non-capturing group with (?:-(?:Cyrl|Latn))?
if the only script subtags you need to accept are Cyrl and Latn.
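Here is a minimal Python sketch of that variant, with the substitution applied to the expression from above. The sample codes are illustrative only.

```python
import re

# The inner group -[a-zA-Z]{4} is replaced by an alternation of Cyrl and Latn.
CULTURE_CYRL_LATN = re.compile(r"^[a-z]{2,3}(?:-[A-Z]{2,3}(?:-(?:Cyrl|Latn))?)?$")

for code in ["sr-SP-Cyrl", "sr-SP-Latn", "zh-TW-Hant", "sr-SP-cyrl"]:
    print(code, CULTURE_CYRL_LATN.fullmatch(code) is not None)
```

The alternation is case-sensitive, so sr-SP-cyrl is now rejected, even though the original [a-zA-Z]{4} accepted it.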