Serbian Cyrillic using LuaTeX and XeTeX

Here is a method for XeLaTeX.

Prepare a file ascii-to-serbian.map with the following content:

; TECkit mapping for TeX input conventions <-> Unicode characters

LHSName "ASCII-to-Serbian"
RHSName "UNICODE"

pass(Unicode)

; ligatures from Knuth's original CMR fonts
U+002D U+002D           <>  U+2013  ; -- -> en dash
U+002D U+002D U+002D    <>  U+2014  ; --- -> em dash

U+0027          <>  U+2019  ; ' -> right single quote
U+0027 U+0027   <>  U+201D  ; '' -> right double quote
U+0022           >  U+201D  ; " -> right double quote

U+0060          <>  U+2018  ; ` -> left single quote
U+0060 U+0060   <>  U+201C  ; `` -> left double quote

U+0021 U+0060   <>  U+00A1  ; !` -> inverted exclam
U+003F U+0060   <>  U+00BF  ; ?` -> inverted question

; additions supported in T1 encoding
U+002C U+002C   <>  U+201E  ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C   <>  U+00AB  ; << -> LEFT POINTING GUILLEMET
U+003E U+003E   <>  U+00BB  ; >> -> RIGHT POINTING GUILLEMET

U+0041 <> U+0410 ; A
U+0042 <> U+0411 ; B
U+0043 <> U+0426 ; C
U+0043 U+0048 <> U+0427 ; CH
U+0043 U+0068 <> U+0427 ; Ch
U+0043 U+0031 <> U+040B ; C1
U+0027 U+0043 <> U+040B ; 'C
U+0044 <> U+0414 ; D
U+0044 U+004A <> U+0402 ; DJ
U+0044 U+006A <> U+0402 ; Dj
U+0044 U+005A U+0048 <> U+040F ; DZH
U+0044 U+007A U+0068 <> U+040F ; Dzh
U+0044 U+0031 <> U+040F ; D1
U+0045 <> U+0415 ; E
U+0046 <> U+0424 ; F
U+0047 <> U+0413 ; G
U+0048 <> U+0425 ; H
U+0049 <> U+0418 ; I
U+004A <> U+0408 ; J
U+004B <> U+041A ; K
U+004B U+0048 <> U+0425 ; KH
U+004B U+0068 <> U+0425 ; Kh
U+004C <> U+041B ; L
U+004C U+004A <> U+0409 ; LJ
U+004C U+006A <> U+0409 ; Lj
U+004D <> U+041C ; M
U+004E <> U+041D ; N
U+004E U+004A <> U+040A ; NJ
U+004E U+006A <> U+040A ; Nj
U+004F <> U+041E ; O
U+0050 <> U+041F ; P
;U+0051 <> ; Q
U+0052 <> U+0420 ; R
U+0053 <> U+0421 ; S
U+0053 U+0048 <> U+0428 ; SH
U+0053 U+0068 <> U+0428 ; Sh
U+0054 <> U+0422 ; T
U+0055 <> U+0423 ; U
U+0056 <> U+0412 ; V
;U+0057 <> ; W
U+0058 <> U+0425 ; X
;U+0059 ; Y
U+005A <> U+0417 ; Z
U+005A U+0048 <> U+0416 ; ZH
U+005A U+0068 <> U+0416 ; Zh

U+0061 <> U+0430 ; a
U+0062 <> U+0431 ; b
U+0063 <> U+0446 ; c
U+0063 U+0068 <> U+0447 ; ch
U+0063 U+0031 <> U+045B ; c1
U+0027 U+0063 <> U+045B ; 'c
U+0064 <> U+0434 ; d
U+0064 U+006A <> U+0452 ; dj
U+0064 U+007A U+0068 <> U+045F ; dzh
U+0064 U+0031 <> U+045F ; d1
U+0065 <> U+0435 ; e
U+0066 <> U+0444 ; f
U+0067 <> U+0433 ; g
U+0068 <> U+0445 ; h
U+0069 <> U+0438 ; i
U+006A <> U+0458 ; j
U+006B <> U+043A ; k
U+006B U+0068 <> U+0445 ; kh
U+006C <> U+043B ; l
U+006C U+006A <> U+0459 ; lj
U+006D <> U+043C ; m
U+006E <> U+043D ; n
U+006E U+006A <> U+045A ; nj
U+006F <> U+043E ; o
U+0070 <> U+043F ; p
;U+0071 <> ; q
U+0072 <> U+0440 ; r
U+0073 <> U+0441 ; s
U+0073 U+0068 <> U+0448 ; sh
U+0074 <> U+0442 ; t
U+0075 <> U+0443 ; u
U+0076 <> U+0432 ; v
;U+0077 <> ; w
U+0078 <> U+0445 ; x
;U+0079 ; y
U+007A <> U+0437 ; z
U+007A U+0068 <> U+0436 ; zh

; Additional (for official translitteration)
U+0110 <> U+0402 ; Đ
U+0111 <> U+0452 ; đ
U+017D <> U+0416 ; Ž
U+017E <> U+0436 ; ž
U+0106 <> U+040B ; Ć
U+0107 <> U+045B ; ć
U+010C <> U+0427 ; Č
U+010D <> U+0447 ; č
U+0044 U+017D <> U+040F ; DŽ
U+0044 U+017E <> U+040F ; Dž
U+0064 U+017E <> U+045F ; dž
U+0160 <> U+0428 ; Š
U+0161 <> U+0448 ; š

Then process it with

teckit_compile ascii-to-serbian.map

This will produce a file ascii-to-serbian.tec that you can put anywhere XeTeX will find it (in the working directory, for instance). Then make the following test file:

\documentclass{article}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{Linux Libertine O}
\newfontfamily{\serbianfont}[Mapping=ascii-to-serbian]{Linux Libertine O}
\usepackage{polyglossia}
\setmainlanguage{english}
\setotherlanguage[Script=Cyrillic]{serbian}

\begin{document}
Serbian alphabet again

\begin{serbian}
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R S T C1 U F Kh C Ch D1 Sh

a b v g d dj e zh z i j k l m n nj o p r s t c1 u f kh c ch d1 sh
\end{serbian} 
\end{document}

Sample output after xelatex test.tex

enter image description here

Note 1: the characters Џ and џ can be input also as DZH (or Dzh) and dzh. If this is incorrect (it might bring to incorrect ligatures) then remove the corresponding lines from ascii-to-serbian.map.

Note 2: if you find it inconvenient to type C1 and c1 to get Ћ and ћ, you can add the lines

U+0027 U+0043 <> U+040B ; 'C

and

U+0027 U+0063 <> U+040B ; 'c

after the C1 and c1 entries. This will allow you to input the characters as 'C and 'c.

If you want to input them as \'C and \'c, then insert this code after having loaded the Serbian language with Polyglossia

\let\standardcommandquote\'
\DeclareRobustCommand{\serbiancommandquote}[1]{%
  \ifnum\strcmp{#1}{c}=0 c1\else
    \ifnum\strcmp{#1}{C}=0 C1\else
      \standardcommandquote{#1}\fi\fi}
\makeatletter
\appto\blockextras@serbian{\let\'\serbiancommandquote}
\appto\inlineextras@serbian{\let\'\serbiancommandquote}
\appto\noextras@serbian{\let\'\standardcommandquote}
\makeatother

Note 3 (added Feb. 17): If one has available Unicode input, then also

Đ đ Ž ž Ć ć Č č DŽ Dž dž Š š

are mapped to

Ђ ђ Ж ж Ћ ћ Ч ч Џ џ Ш ш

respectively.


Your example will with some changes work with xelatex (and lualatex) too as long as you use only ascii for the serbian. (The file should be encoded in utf8 if you use non-ascii chars outside the serbian):

\documentclass{article}
\usepackage[OT2]{fontenc}
\input{cyracc.def}
\usepackage{fontspec}
\setmainfont{Arial} % to see the difference
\newcommand\textcyr[1]{{\fontencoding{OT2}\fontfamily{wncyr}\selectfont #1}}
\begin{document}
Serbian alphabet again \dots \textcyr{\cyracc
A B V G D DJ E Zh Z I J K L LJ M N NJ O P R  S T \'C U F Kh C Ch \Dzh\ Sh
} roman text again
\end{document}

It is possible the definitions in cyracc.def have unwanted side-effects in longer documents. It is also possible that you get problems if you want hyphenation .

But with this input you are not making use of the strength of xetex/luatex. You are not using real unicode input and you are not using system fonts for the cyrillic - the fonts you can use are limited to the OT2-encoded fonts.

So if you really want to stick to the ASCII input you should better use the mapping mentioned by egreg (which will work only with xelatex). But imho in the long run people who want to use two or more scripts should better learn how to switch the keyboard layout so that they can input the characters directly. For short pieces of text you can find virtual keyboards on the net.


With LuaLaTeX, it is possible to emulate xetex's mapping feature using opentype feature file. In contrast with xetex mapping features, it is not based on substituting of one unicode character with another, but on replacing unicode characters with glyph names used in the fonts. There is example using gentium:

\documentclass{article}
\usepackage{fontspec}
\usepackage[serbian]{babel}
\def\Dzh{Dzh}
\def\Sh{Sh}
\setmainfont{Gentium Plus}
\newfontfamily\serbianfont[RawFeature=+gsub,FeatureFile=serb.fea,Script=Cyrillic]
{Gentium Plus}
\begin{document}
\noindent српска ћирилица\\
Hello world\\
\serbianfont
Hello world\\
A B V G D DJ E Zh Z I J K Kh L LJ M N NJ O P R S T Th U F Kh C Ch \Dzh\Sh\\
a b v g d dj e zh z i j k kh l lj m n nj o p r s t th u f kh c ch dzh sh
\end{document}

and feature file serb.fea

languagesystem cyrl SRB;
languagesystem cyrl DFLT;

feature liga {
  sub C H  by Checyrillic;
  sub C h by Checyrillic;
  sub D J by Djecyrillic;
  sub D j by Djecyrillic;
  sub D z h by Dzhecyrillic;
  sub D Z H by Dzhecyrillic;
  sub K H by Khacyrillic;
  sub K h by Khacyrillic;
  sub L J by Ljecyrillic;
  sub L j by Ljecyrillic;
  sub N j by Njecyrillic;
  sub N J by Njecyrillic;
  sub S H by Shacyrillic;
  sub S h by Shacyrillic;
  sub Z H by Zhecyrillic;
  sub T h by Tshecyrillic;
  sub T H by Tshecyrillic;
  sub Z h by Zhecyrillic;
  sub c h by checyrillic;
  sub t h by tshecyrillic;
  sub d j by djecyrillic;
  sub d z h by dzhecyrillic;
  sub k h by khacyrillic;
  sub l j by ljecyrillic;
  sub n j by njecyrillic;
  sub s h by shacyrillic;
  sub z h by zhecyrillic;
} liga;

feature gsub {
 sub A by Acyrillic;  
 sub B by Becyrillic;
 sub C by Vecyrillic;
 sub D by Decyrillic;
 sub E by Iecyrillic;
 sub F by Efcyrillic;
 sub G by Gecyrillic;
 sub H by Khacyrillic;
 sub I by Iicyrillic;
 sub J by Jecyrillic;
 sub K by Kacyrillic;
 sub L by Elcyrillic;
 sub M by Emcyrillic;
 sub N by Encyrillic;
 sub O by Ocyrillic;
 sub P by Pecyrillic;
 sub R by Ercyrillic;
 sub S by Escyrillic;
 sub T by Tecyrillic;
 sub U by Ucyrillic;
 sub V by Vecyrillic;
 sub X by Khacyrillic;
 sub Z by Zecyrillic;
 sub a by acyrillic;
 sub b by becyrillic;
 sub c by tsecyrillic;
 sub d by decyrillic;
 sub e by iecyrillic;
 sub f by efcyrillic;
 sub g by gecyrillic;
 sub h by khacyrillic;
 sub i by iicyrillic;
 sub j by jecyrillic;
 sub k by kacyrillic;
 sub l by elcyrillic;
 sub m by emcyrillic;
 sub n by encyrillic;
 sub o by ocyrillic;
 sub p by pecyrillic;
 sub r by ercyrillic;
 sub s by escyrillic;
 sub t by tecyrillic;
 sub u by ucyrillic;
 sub v by vecyrillic;
 sub x by khacyrillic;
 sub z by zecyrillic;
} gsub;

example output