RegEx for Ukrainian letters. How to separate Cyrillic words by capital letter?

[А-Я] is not the Cyrillic alphabet, it's just Russian (and even then without Ё, which lies outside that range)!

Cyrillic is a writing system. It is used in the alphabets of many languages. (Like Latin: the script for West European languages, East European ones, etc.)

To cover both Russian and Ukrainian you'd use [А-ЯҐЄІЇ].

To add Belarusian: [А-ЯҐЄІЇЎ]

And for all Cyrillic characters (including Balkan languages and Old Cyrillic), you can use a Unicode class such as \p{IsCyrillic}; the exact spelling depends on the regex engine.
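
As a rough illustration of the difference between the Russian range and a Unicode script class, here is a minimal JavaScript sketch. It assumes an ES2018+ engine, where the class is spelled \p{Script=Cyrillic} and needs the u flag; the function names are made up for this example.

```javascript
// Hypothetical helper: true if every character is in the Cyrillic script.
// Assumes ES2018+ (Unicode property escapes require the "u" flag).
function isCyrillicWord(word) {
  return /^\p{Script=Cyrillic}+$/u.test(word);
}

// Hypothetical helper: the plain Russian range, for comparison.
function isInRussianRange(word) {
  return /^[А-Яа-я]+$/.test(word);
}

console.log(isInRussianRange("Київ")); // false: ї is outside а-я
console.log(isCyrillicWord("Київ"));   // true
console.log(isCyrillicWord("Kyiv"));   // false: Latin letters
```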


To deal with Ukrainian separately:

[А-ЩЬЮЯҐЄІЇ] (uppercase only) or [А-ЩЬЮЯҐЄІЇа-щьюяґєії] (both cases) covers the full Ukrainian alphabet: 33 letters in each case.
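
One way to sanity-check the letter count is to walk the basic Cyrillic block (U+0400 to U+04FF) and count the code points each class accepts. This is only a sketch; countMatches is a hypothetical helper, not part of any library.

```javascript
const upper = /[А-ЩЬЮЯҐЄІЇ]/;
const both = /[А-ЩЬЮЯҐЄІЇа-щьюяґєії]/;

// Hypothetical helper: count code points in U+0400..U+04FF matched by regex.
function countMatches(regex) {
  let count = 0;
  for (let cp = 0x0400; cp <= 0x04ff; cp++) {
    if (regex.test(String.fromCodePoint(cp))) count++;
  }
  return count;
}

console.log(countMatches(upper)); // 33
console.log(countMatches(both));  // 66

// Letters used in Russian but not in Ukrainian are rejected.
console.log([..."ЁЪЫЭ"].some((ch) => both.test(ch))); // false
```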

The apostrophe is not a letter, but it is occasionally included in the alphabet because it affects the following vowel. The apostrophe is part of a word, not a divider. It may be written in several ways:

U+0027 "'" APOSTROPHE
U+0060 "`" GRAVE ACCENT
U+2019 "’" RIGHT SINGLE QUOTATION MARK
U+02BC "ʼ" MODIFIER LETTER APOSTROPHE

and maybe some more.

Yes, the apostrophe complicates things a bit. There is no common standard for it.
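
A possible way to treat the apostrophe as part of a word is to allow any of the four variants listed above, but only between letters. The sketch below assumes that list is sufficient; the constant and function names are invented for illustration.

```javascript
// Ukrainian letters (both cases) and the four apostrophe variants listed above.
const LETTERS = "А-ЩЬЮЯҐЄІЇа-щьюяґєії";
const APOSTROPHES = "\u0027\u0060\u2019\u02BC";

// A word is a run of letters, optionally continued by apostrophe + letters.
const ukrainianWord = new RegExp(
  "[" + LETTERS + "]+(?:[" + APOSTROPHES + "][" + LETTERS + "]+)*",
  "g"
);

// Hypothetical helper: return all Ukrainian words found in the text.
function findWords(text) {
  return text.match(ukrainianWord) || [];
}

console.log(findWords("м'ята, п\u2019ять і об\u02BCєкт"));
// [ "м'ята", "п’ять", "і", "обʼєкт" ]
```

Because the apostrophe must sit between letters, a quote mark around a word (as in 'слово') is left out of the match.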


Use \p{Lu} to match an uppercase letter, \p{Ll} for a lowercase one, or \p{L} for any letter.

Update: that works in Java, but in JavaScript only since ES2018 and only with the u flag; older engines do not support it. Don't forget to include the apostrophe ("apostrof") and ї ("ji") in your regexp.
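
For engines that do support property escapes, splitting at capital letters might look like the sketch below. It assumes ES2018+ and the u flag; splitByCapital is a hypothetical name, and the apostrophe variants are an assumption about which characters to accept inside a word.

```javascript
// An uppercase letter followed by lowercase letters, each optionally
// preceded by one apostrophe variant (U+0027, U+0060, U+2019, U+02BC).
const capitalizedWord = /\p{Lu}(?:[\u0027\u0060\u2019\u02BC]?\p{Ll})*/gu;

// Hypothetical helper: split a run-together string at capital letters.
function splitByCapital(text) {
  return text.match(capitalizedWord) || [];
}

console.log(splitByCapital("ДоброгоРанкуУкраїно"));
// [ "Доброго", "Ранку", "Україно" ]
console.log(splitByCapital("Прип'ятьКиїв"));
// [ "Прип'ять", "Київ" ]
```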


The way to solve this is to look at the Unicode table to determine the character ranges you need. If, for example, I use the pattern:

str.match(/[А-Я][а-яєі]+/g)

it works with your example string. (Sorry, I don't know the Ukrainian letters.)
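
For engines without Unicode property escapes, the same idea can be written with explicit ranges. The sketch below compares the pattern above with a variant extended to the full Ukrainian alphabet; the sample strings are made up for illustration and are not the asker's data.

```javascript
// The pattern from above, and a variant using the full Ukrainian alphabet.
// As above, a word needs at least one lowercase letter after the capital.
const original = /[А-Я][а-яєі]+/g;
const extended = /[А-ЩЬЮЯҐЄІЇ][а-щьюяґєії]+/g;

console.log("ПривітСвіт".match(original)); // [ "Привіт", "Світ" ]

// Ї and Є lie outside А-Я, so the original pattern finds nothing here.
console.log("ЇжакЄнот".match(original));   // null
console.log("ЇжакЄнот".match(extended));   // [ "Їжак", "Єнот" ]
```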