RegEx for Ukrainian letters. How to separate Cyrillic words by capital letter?
[А-Я]
is not the Cyrillic alphabet, it's just Russian (and even then without Ё, which sits outside that range)!
Cyrillic is a writing system. It is used in the alphabets of many languages. (Like Latin: the script for West European languages, East European ones, etc.)
To cover both Russian and Ukrainian, use [А-ЯҐЄІЇ].
To add Belarusian: [А-ЯҐЄІЇЎ]
And for all Cyrillic characters (including those of Balkan languages and Old Cyrillic), you can use a Unicode class such as \p{IsCyrillic} (the exact name depends on the regex engine).
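As a rough illustration in JavaScript (an assumption on my part, since \p{IsCyrillic} itself is not JavaScript syntax): ES2018 engines spell the script class \p{Script=Cyrillic} and require the u flag. The sample strings below are made up for the demo.

```javascript
// \p{Script=Cyrillic} needs the `u` flag (ES2018+).
const anyCyrillic = /^\p{Script=Cyrillic}+$/u;
// The plain range only spans U+0410..U+042F.
const russianRange = /^[А-Я]+$/;

// Hypothetical samples: Russian-range letters, Ukrainian Ї,
// Belarusian Ў, Serbian Ђ, and Russian Ё.
const samples = ["ЖИТО", "ЇЖАК", "Ў", "Ђ", "Ё"];

for (const s of samples) {
  console.log(s, anyCyrillic.test(s), russianRange.test(s));
}
// Only "ЖИТО" passes both tests; the rest pass only the script class.
```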
To deal with Ukrainian separately:
[А-ЩЬЮЯҐЄІЇ]
or [А-ЩЬЮЯҐЄІЇа-щьюяґєії]
should cover the full Ukrainian alphabet: 33 letters in each case (upper and lower).
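If you want to check the count yourself, here is a small sketch that scans the basic Cyrillic block (U+0400 to U+04FF) and collects whatever each class matches. The helper name lettersMatching is just something I made up for the example.

```javascript
const upper = /^[А-ЩЬЮЯҐЄІЇ]$/;
const lower = /^[а-щьюяґєії]$/;

// Hypothetical helper: every character in U+0400..U+04FF that `re` accepts.
function lettersMatching(re) {
  const found = [];
  for (let cp = 0x0400; cp <= 0x04ff; cp++) {
    const ch = String.fromCodePoint(cp);
    if (re.test(ch)) found.push(ch);
  }
  return found;
}

const upperLetters = lettersMatching(upper);
const lowerLetters = lettersMatching(lower);

console.log(upperLetters.length, lowerLetters.length); // 33 33
console.log(upper.test("Ы"), upper.test("Э"), upper.test("Ъ")); // false false false
```

The Russian-only letters Ъ, Ы and Э fall in the gap between Щ and Ь/Ю/Я, which is why the class lists Ь, Ю and Я separately instead of running the range to Я.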
The apostrophe is not a letter, but it is occasionally included in the alphabet because it affects how the following vowel is read. The apostrophe is part of the word, not a divider. It may be typed in a few ways:
U+0027 "'" APOSTROPHE
U+0060 "`" GRAVE ACCENT
U+2019 "’" RIGHT SINGLE QUOTATION MARK
U+02BC "ʼ" MODIFIER LETTER APOSTROPHE
and maybe some more.
Yes, the apostrophe makes things a bit complicated: there is no common standard for it.
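One way to handle this, sketched in JavaScript: allow any of the four apostrophe variants listed above, but only between letters, so that quotation marks around a word are not swallowed. The names LETTER, APOSTROPHE, wordRe and normalizeApostrophes are mine, and normalizing to U+02BC is an arbitrary choice for the demo, not a standard.

```javascript
const LETTER = "[А-ЩЬЮЯҐЄІЇа-щьюяґєії]";
// U+0027, U+0060, U+2019, U+02BC
const APOSTROPHE = "[\u0027\u0060\u2019\u02BC]";

// Letters, optionally followed by (apostrophe + letters) groups.
const wordRe = new RegExp(LETTER + "+(?:" + APOSTROPHE + LETTER + "+)*", "g");

// Rewrite apostrophes inside matched words to one code point (here U+02BC).
function normalizeApostrophes(text) {
  const apoRe = new RegExp(APOSTROPHE, "g");
  return text.replace(wordRe, (w) => w.replace(apoRe, "\u02BC"));
}

const text = "м\u0027ята, п\u2019ять і об\u02BCєкт";
const words = text.match(wordRe);
console.log(words); // four words: м'ята, п’ять, і, обʼєкт
console.log(normalizeApostrophes(text));
```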
Use \p{Lu}
for uppercase match, \p{Ll}
for lowercase, or \p{L}
to match any letter
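Update: this works in Java, but in JavaScript only when the pattern carries the u flag (ES2018 and later); without it, \p is not recognized. If you fall back to explicit ranges, don't forget to include the apostrophe and "ї" in your regexp.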
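For what it's worth, a minimal sketch of the split-by-capital idea in JavaScript, assuming an ES2018+ engine. The sample string is invented, and the apostrophe set is the one discussed in this thread.

```javascript
// One uppercase letter, then lowercase letters, with apostrophes
// (U+0027, U+0060, U+2019, U+02BC) allowed only between letters.
const capitalizedWord =
  /\p{Lu}\p{Ll}*(?:[\u0027\u0060\u2019\u02BC]\p{Ll}+)*/gu;

const sample = "ЇжакҐанокП\u0027ятьЄнот"; // hypothetical input
const parts = sample.match(capitalizedWord);
console.log(parts); // Їжак, Ґанок, П'ять, Єнот

// Without the `u` flag, \p{Lu} is not a property escape at all:
console.log(/\p{Lu}/.test("Ї")); // false
```

Because \p{Lu} and \p{Ll} are not tied to one alphabet, the same pattern also splits Latin or mixed-script input.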
The way to solve this is to look at the Unicode table to determine the character ranges you need. If, for example, I use the pattern:
str.match(/[А-Я][а-яєі]+/g)
it works with your example string. (Sorry, I don't know the Ukrainian letters, so the character class may be incomplete.)
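To show where that pattern falls short, here is a small comparison against a class that spells out the Ukrainian letters explicitly. The test string is made up, and the second pattern is only one possible variant (it ignores apostrophes, and like the pattern above it needs at least one lowercase letter after the capital).

```javascript
const original = /[А-Я][а-яєі]+/g;
// Ukrainian-only variant: adds Ґ, Є, І, Ї and drops Ъ, Ы, Э.
const ukrainian = /[А-ЩЬЮЯҐЄІЇ][а-щьюяґєії]+/g;

const sample = "КиївЇжакҐанок"; // hypothetical input

const before = sample.match(original);
const after = sample.match(ukrainian);
console.log(before); // only "Ки": ї, Ї and Ґ are outside the ranges
console.log(after);  // Київ, Їжак, Ґанок
```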