How to modify single (ranges of) characters within a custom environment?
You can do regular expression search and replace. In particular, s
and S
are changed into the character followed by U+030C (COMBINING CARON), which XeTeX will normalize to š
and Š
respectively; similarly for h
and H
, with U+032E (COMBINING BREVE BELOW) that are normalized to ḫ
and Ḫ
.
I took the occasion for suggesting |...|
instead of \textsuperscript{...}
, adding a substitution for this.
\documentclass{article}
\usepackage{fontspec}
\usepackage{xparse}
\setmainfont{Libertinus Serif}
\ExplSyntaxOn
\NewDocumentCommand{\hitt}{m}
{
\tl_set:Nn \l_xander_hitt_tl { #1 }
% change every run of lowercase letters into italic
\regex_replace_all:nnN
{ [a-z]+ }
{ \c{textit}\cB\{\0\cE\} }
\l_xander_hitt_tl
% change every h/H into h/H U+032E
\regex_replace_all:nnN
{ [hH] }
{ \0 \x{032e} }
\l_xander_hitt_tl
% change every s/S into s/S U+032E
\regex_replace_all:nnN
{ [sS] }
{ \0 \x{030c} }
\l_xander_hitt_tl
% change |...| into \textsuperscript{...}
\regex_replace_all:nnN
{ \|([^|]+)\| }
{ \c{textsuperscript}\cB\{\1\cE\} }
\l_xander_hitt_tl
% print the result
\tl_use:N \l_xander_hitt_tl
}
\ExplSyntaxOff
\begin{document}
\hitt{|DUG|ha-ne-es-n[a-as]}
\end{document}
Notes
[a-z]+
stands for a run of one or more lowercase letters\c{textit}
means “the control sequence\textit
”\cB\{
and\cE\}
stand for the group-making TeX braces\x{<hex digits>}
stands for the character with the corresponding code\0
in replacement strings denotes the match;\1
is the first capturing group\|([^|]*)\|
searches for|
(it must be escaped in search strings) followed by any run of tokens different from|
up to finding the matching|
Possibilities open up.
Not an answer, but the potential of an extension: the rubification of Hittite text (and also Old Persian, Ugaritic and so on).
Using a mapping file with a "plain" text input methodology, so that egreg's environment takes care of 's' and 'h' and, with an addition, doubled vowels become single-letter acute, and then compiling with xelatex:
MWE:
\documentclass[12pt]{article}
\usepackage{fontspec}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{stackengine}
\setmainfont{Linux Libertine}
\newfontface\hitt{Noto Sans Cuneiform}
\newfontface\fmhitt[Mapping=latin-to-hittite2,Colour=blue]{Noto Sans Cuneiform}
\DeclareTextFontCommand{\texthitt}{\Large\fmhitt}
\newenvironment{thitt}
{\fmhitt\Large\ignorespaces}
{\ignorespacesafterend}
\ExplSyntaxOn
\NewDocumentCommand{\ehitt}{m}
{
\tl_set:Nn \l_xander_hitt_tl { #1 }
% change every run of lowercase letters into italic
\regex_replace_all:nnN
{ [a-z]+ }
{ \c{textit}\cB\{\0\cE\} }
\l_xander_hitt_tl
% change every h/H into h/H U+032E
\regex_replace_all:nnN
{ [hH] }
{ \0 \x{032e} }
\l_xander_hitt_tl
% change every s/S into s/S U+032E
\regex_replace_all:nnN
{ [sS] }
{ \0 \x{030c} }
\l_xander_hitt_tl
% change every double vowel aa/ into a/ U+0301
\regex_replace_all:nnN
{ ([a,e,i,u]){2,2} }
{ \1 \x{0301} }
\l_xander_hitt_tl
% change |...| into \textsuperscript{...}
\regex_replace_all:nnN
{ \|([^|]+)\| }
{ \c{textsuperscript}\cB\{\c{textsc}\cB\{\1\cE\}\cE\}
}
\l_xander_hitt_tl
% print the result
\tl_use:N \l_xander_hitt_tl
}
\ExplSyntaxOff
\newcommand\hstackon[2]{\stackon{\begin{thitt}#1\end{thitt}}{\ehitt{#2}}}
\begin{document}
\section{Hittite}
\hstackon{|dug|}{|dug|}
\hstackon{ha}{ha}
\hstackon{ne}{ne}
\hstackon{es}{es}
\hstackon{na}{na}
\hstackon{as}{as}
Random text from the UChicago \textit{Hittite Dictionary}, volume \textbf{P}, p 144, sv \ehitt{parh-} (chase): "and may the oath deities pursue you continuously"
\begin{center}
\hstackon{niis}{|niis|}
\hstackon{dingir}{|dingir|}
\hstackon{mes}{|mes|}
\hstackon{paar}{paar}
\hstackon{he}{he}
\hstackon{es}{es}
\hstackon{kaan}{kaan}
\hstackon{du}{du}
\end{center}
\begin{center}
\hstackon{|niis.dingir.mes|paar-he-es-kaan-du}{|niis.dingir.mes|paar-he-es-kaan-du}
\end{center}
Mapping file (based on Wikipedia article) is incomplete, having the syllables (V, VC, VC, CVC) and a few determiners. Glyph for \textsc{\ehitt{niis}} needs to be tracked down; keyed in as \texttt{niis}, it is obviously incorrect as it is being parsed as the two syllables \texttt{ni} + \texttt{is}: \hstackon{ni}{ni}\hstackon{is}{is} by the font map.
Otherwise, some general formatting required (e.g., struts or similar to line up the ruby text).
\end{document}
Map file, to compile with teckit_compile.exe:
; TECkit mapping for TeX input conventions <-> Unicode characters
LHSName "latin-to-hittite"
RHSName "UNICODE"
pass(Unicode)
; ligatures from Knuth's original CMR fonts
U+002D U+002D <> U+2013 ; -- -> en dash
U+002D U+002D U+002D <> U+2014 ; --- -> em dash
U+0027 <> U+2019 ; ' -> right single quote
U+0027 U+0027 <> U+201D ; '' -> right double quote
U+0022 > U+201D ; " -> right double quote
U+0060 <> U+2018 ; ` -> left single quote
U+0060 U+0060 <> U+201C ; `` -> left double quote
U+0021 U+0060 <> U+00A1 ; !` -> inverted exclam
U+003F U+0060 <> U+00BF ; ?` -> inverted question
; additions supported in T1 encoding
U+002C U+002C <> U+201E ; ,, -> DOUBLE LOW-9 QUOTATION MARK
U+003C U+003C <> U+00AB ; << -> LEFT POINTING GUILLEMET
U+003E U+003E <> U+00BB ; >> -> RIGHT POINTING GUILLEMET
;U+0020 > U+0020 ; space maps to space
U+002D > U+200D ; hyphen as Zero Width Joiner
U+002E > U+200D ; dot as Zero Width Joiner
U+007C > U+200C ; pipe as Zero Width Non-Joiner
U+0061 <> U+12000 ; a
U+0065 <> U+1208A ; e
U+0069 <> U+1213F ; i
U+0075 <> U+1230B ; u
U+0075 U+0075 <> U+12311 ; uu
U+0062 U+0061 <> U+12040 ; ba
U+0070 U+0061 <> U+1227A ; pa
U+0064 U+0061 <> U+12055 ; da
U+0074 U+0061 <> U+122EB ; ta
U+0067 U+0061 <> U+120B5 ; ga
U+006B U+0061 <> U+12157 ; ka
U+0068 U+0061 <> U+12129 ; ha
U+006C U+0061 <> U+121B7 ; la
U+006D U+0061 <> U+12220 ; ma
U+006E U+0061 <> U+1223E ; na
U+0072 U+0061 <> U+1228F ; ra
U+0073 U+0061 <> U+122AD ; sa
U+0077 U+0061 <> U+1227F ; wa
U+0079 U+0061 <> U+12140 ; ya
U+007A U+0061 <> U+1235D ; za
U+0062 U+0065 <> U+12041 ; be
U+0070 U+0065 U+0065 <> U+12049 ; pee
U+0070 U+0069 U+0069 <> U+12049 ; pii
U+0062 U+0069 <> U+12049 ; bi
U+0064 U+0065 <> U+12072 ; de
U+0064 U+0069 <> U+12072 ; di
U+0074 U+0065 <> U+122FC ; te
U+0074 U+0069 <> U+122FE ; ti
U+0067 U+0065 <> U+12100 ; ge
U+0067 U+0069 <> U+12100 ; gi
U+006B U+0065 <> U+121A0 ; ke
U+006B U+0069 <> U+121A0 ; ki
U+0068 U+0065 <> U+1212D ; he
U+0068 U+0065 U+0065 <> U+120F6 ; hee
U+0068 U+0069 <> U+1212D ; hi
U+006C U+0065 <> U+121F7 ; le
U+006C U+0069 <> U+121F7 ; li
U+006D U+0065 <> U+12228 ; me
U+006D U+0065 U+0065 <> U+1222A ; mee
U+006D U+0069 <> U+1222A ; mi
U+006E U+0065 <> U+12248 ; ne
U+006E U+0065 U+0065 <> U+1224C ; nee
U+006E U+0069 <> U+1224C ; ni
U+0072 U+0065 <> U+12291 ; re
U+0072 U+0069 <> U+12291 ; ri
U+0073 U+0065 <> U+122BA ; se
U+0073 U+0069 <> U+12146 ; si
U+0077 U+0069 <> U+120FE ; wi
U+007A U+0065 <> U+12363 ; ze
U+007A U+0065 U+0065 <> U+12362 ; zee
U+007A U+0069 <> U+12363 ; zi
U+0062 U+0075 <> U+1204D ; bu
U+0070 U+0075 <> U+1204D ; pu
U+0064 U+0075 <> U+1207A ; du
U+0074 U+0075 <> U+12305 ; tu
U+0067 U+0075 <> U+12116 ; gu
U+006B U+0075 <> U+121AA ; ku
U+0068 U+0075 <> U+12137 ; hu
U+006C U+0075 <> U+121FB ; lu
U+006D U+0075 <> U+1222C ; mu
U+006E U+0075 <> U+12261 ; nu
U+0072 U+0075 <> U+12292 ; ru
U+0073 U+0075 <> U+122D7 ; su
U+0073 U+0075 U+0075 <> U+122D9 ; suu
U+007A U+0075 <> U+1236A ; zu
U+0061 U+0062 <> U+1200A ; ab
U+0061 U+0070 <> U+1200A ; ap
U+0061 U+0064 <> U+1201C ; ad
U+0061 U+0074 <> U+1201C ; at
U+0061 U+0067 <> U+1201D ; ag
U+0061 U+006B <> U+1201D ; ak
U+0061 U+0068 <> U+12134 ; ah
U+0065 U+0068 <> U+12134 ; eh
U+0069 U+0068 <> U+12134 ; ih
U+0075 U+0068 <> U+12134 ; uh
U+0061 U+006C <> U+12020 ; al
U+0061 U+006D <> U+12120 ; am
U+0061 U+006E <> U+1202D ; an
U+0061 U+0072 <> U+12148 ; ar
U+0061 U+0073 <> U+12038 ; as
U+0061 U+007A <> U+1228D ; az
U+0065 U+0062 <> U+12141 ; eb
U+0065 U+0070 <> U+12141 ; ep
U+0069 U+0062 <> U+12141 ; ib
U+0069 U+0070 <> U+12141 ; ip
U+0065 U+0064 <> U+12009 ; ed
U+0065 U+0074 <> U+12009 ; et
U+0069 U+0064 <> U+12009 ; id
U+0069 U+0074 <> U+12009 ; it
U+0065 U+0067 <> U+12145 ; eg
U+0065 U+006B <> U+12145 ; ek
U+0069 U+0067 <> U+12145 ; ig
U+0069 U+006B <> U+12145 ; ik
U+0065 U+006C <> U+12096 ; el
U+0069 U+006C <> U+1214B ; il
U+0065 U+006D <> U+1214E ; em
U+0069 U+006D <> U+1214E ; im
U+0065 U+006E <> U+12097 ; en
U+0069 U+006E <> U+12154 ; in
U+0065 U+0072 <> U+12155 ; er
U+0069 U+0072 <> U+12155 ; ir
U+0065 U+0073 <> U+1230D ; es
U+0065 U+0073 U+0073 <> U+12401 ; ess
U+0069 U+0073 <> U+12156 ; is
U+0065 U+007A <> U+12111 ; ez
U+0069 U+007A <> U+12111 ; iz
U+0075 U+0062 <> U+12312 ; ub
U+0075 U+0070 <> U+12312 ; up
U+0075 U+0064 <> U+12313 ; ud
U+0075 U+0074 <> U+12313 ; ut
U+0075 U+0067 <> U+1228C ; ug
U+0075 U+006B <> U+1228C ; uk
U+0075 U+006C <> U+1230C ; ul
U+0075 U+006D <> U+1231D ; um
U+0075 U+006E <> U+12326 ; un
U+0075 U+0072 <> U+12328 ; ur
U+0075 U+0075 U+0072 <> U+1232B ; uur
U+0075 U+0073 <> U+12351 ; us
U+0075 U+007A <> U+122BB ; uz
U+0068 U+0061 U+006C <> U+1212C ; hal
U+0068 U+0061 U+0062 <> U+121B8 ; hab
U+0068 U+0061 U+0070 <> U+121B8 ; hap
U+0068 U+0061 U+0073 <> U+122FB ; has
U+0068 U+0061 U+0064 <> U+1227A ; had
U+0068 U+0061 U+0074 <> U+1227A ; hat
U+0068 U+0075 U+006C <> U+003F ; hul ?
U+0068 U+0075 U+0062 <> U+1213D ; hub
U+0068 U+0075 U+0070 <> U+1213D ; hup
U+0068 U+0061 U+0072 <> U+1212F ; har
U+0068 U+0075 U+0072 <> U+1212F ; hur
U+0067 U+0061 U+006C <> U+120F2 ; gal
U+006B U+0061 U+006C <> U+12197 ; kal
U+0067 U+0061 U+006C U+0067 <> U+12197 ; galg
U+006B U+0061 U+006D <> U+12130 ; kam
U+0067 U+0061 U+0061 U+006D <> U+12130 ; gaam
U+006B U+0061 U+0061 U+006E <> U+120F7 ; kaan
U+0067 U+0061 U+0061 <> U+120F7 ; gaan
U+006B U+0061 U+0062 <> U+1218F ; kab
U+006B U+0061 U+0070 <> U+1218F ; kap
U+0067 U+0061 U+0061 U+0062 <> U+1218F ; gaab
U+0067 U+0061 U+0061 U+0070 <> U+1218F ; gaap
U+006B U+0061 U+0072 <> U+003F ; kar ?
U+006B U+0061 U+0061 U+0072 <> U+120FC ; kaar
U+0067 U+0061 U+0061 U+0072 <> U+120FC ; gaar
U+006B U+0061 U+0064 <> U+120F0 ; kad
U+006B U+0061 U+0074 <> U+120F0 ; kat
U+0067 U+0061 U+0064 <> U+120F0 ; gad
U+0067 U+0061 U+0074 <> U+120F0 ; gat
U+0067 U+0061 U+007A <> U+12124 ; gaz
U+006B U+0069 U+0062 <> U+003F ; kib ?
U+006B U+0069 U+0070 <> U+003F ; kip ?
U+006B U+0069 U+0072 <> U+1212B ; kir
U+0067 U+0069 U+0072 <> U+1212B ; gir
U+006B U+0069 U+0073 <> U+121A7 ; kis
U+006B U+0069 U+0064 <> U+120F0 ; kid
U+006B U+0069 U+0074 <> U+120F0 ; kit
U+006B U+0061 U+006C U+006C <> U+12197 ; kall
U+006B U+0075 U+006C <> U+121B0 ; kul
U+006B U+0075 U+0075 U+006C <> U+12122 ; kuul
U+0067 U+0075 U+006C <> U+12122 ; gul
U+006B U+0075 U+006D <> U+12123 ; kum
U+0067 U+0075 U+006D <> U+12123 ; gum
U+006B U+0075 U+0072 <> U+121B3 ; kur
U+006B U+0075 U+0075 U+0072 <> U+12125 ; kuur
U+0067 U+0075 U+0072 <> U+12125 ; gur
U+006C U+0061 U+006C <> U+121F2 ; lal
U+006C U+0061 U+006D <> U+121F4 ; lam
U+006C U+0069 U+0067 <> U+12328 ; lig
U+006C U+0069 U+006B <> U+12328 ; lik
U+006C U+0069 U+0073 <> U+121FA ; lis
U+006C U+0075 U+0068 <> U+1221B ; luh
U+006C U+0075 U+006D <> U+1221D ; lum
U+006D U+0061 U+0068 <> U+12224 ; mah
U+006D U+0061 U+006E <> U+003F ; man ?
U+006D U+0061 U+0072 <> U+12225 ; mar
U+006D U+0061 U+0073 <> U+12226 ; mas
;U+006D U+0065 U+0073 <> U+003F ; mes ?
U+006D U+0069 U+006C <> U+12156 ; mil
U+006D U+0065 U+006C <> U+12156 ; mel
U+006D U+0069 U+0073 <> U+12229 ; mis
U+006D U+0075 U+0072 <> U+1212F ; mur
U+006D U+0075 U+0074 <> U+003F ; mut ?
U+006E U+0061 U+006D <> U+12246 ; nam
U+006E U+0061 U+0062 <> U+1202E ; nab
U+006E U+0061 U+0070 <> U+1202E ; nap
U+006E U+0069 U+0072 <> U+1226A ; nir
U+006E U+0069 U+0073 <> U+003F ; nis ?
U+0062 U+0061 U+006C <> U+12044 ; bal
U+0070 U+0061 U+006C <> U+12044 ; pal
U+0062 U+0061 U+0072 <> U+12226 ; bar
U+0070 U+0061 U+0061 U+0072 <> U+12226 ; paar
U+0070 U+0061 U+0073 <> U+003F ; pas ?
U+0070 U+0061 U+0061 U+0064 <> U+12041 ; paad
U+0070 U+0061 U+0061 U+0074 <> U+12041 ; paat
U+0070 U+0069 U+0069 U+0064 <> U+12041 ; piid
U+0070 U+0069 U+0069 U+0074 <> U+12041 ; piit
U+0062 U+0069 U+0069 U+006C <> U+1224B ; biil
U+0070 U+0069 U+0069 U+006C <> U+1224B ; piil
U+0070 U+0069 U+0072 <> U+003F ; pir ?
U+0062 U+0069 U+0073 <> U+1212B ; bis
U+0070 U+0069 U+0073 <> U+1212B ; pis
U+0070 U+0075 U+0075 U+0073 <> U+1212B ; puus
U+0062 U+0075 U+0072 <> U+003F ; bur ?
U+0070 U+0075 U+0072 <> U+003F ; pur ?
U+0072 U+0061 U+0064 <> U+122E5 ; rad
U+0072 U+0061 U+0074 <> U+122E5 ; rat
U+0072 U+0069 U+0073 <> U+12295 ; ris
U+0073 U+0061 U+0068 <> U+122DA ; sah
U+0073 U+0061 U+0067 <> U+12295 ; sag
U+0073 U+0061 U+006B <> U+12295 ; sak
U+0073 U+0061 U+006C <> U+122A9 ; sal
U+0073 U+0061 U+006D <> U+12311 ; sam
U+0073 U+0061 U+0062 <> U+003F ; sab ?
U+0073 U+0061 U+0070 <> U+003F ; sap ?
U+0073 U+0061 U+0072 <> U+122AC ; sar
U+0073 U+0069 U+0069 U+0070 <> U+003F ; siip ?
U+0073 U+0069 U+0072 <> U+122D3 ; sir
U+0073 U+0075 U+006D <> U+122F3 ; sum
U+0073 U+0075 U+0072 <> U+122E9 ; sur
U+0064 U+0061 U+0068 <> U+1222D ; dah
U+0074 U+0061 U+0068 <> U+1222D ; tah
U+0074 U+0075 U+0075 U+0068 <> U+1222D ; tuuh
U+0074 U+0061 U+0061 U+0067 <> U+12056 ; taag
U+0074 U+0061 U+0061 U+006B <> U+12056 ; taak
U+0064 U+0061 U+0067 <> U+12056 ; dag
U+0064 U+0061 U+006B <> U+12056 ; dak
U+0064 U+0061 U+006C <> U+12291 ; dal
U+0074 U+0061 U+006C <> U+12291 ; tal
U+0064 U+0061 U+006D <> U+1206E ; dam
U+0074 U+0061 U+0061 U+006D <> U+1206E ; taam
U+0064 U+0061 U+006E <> U+12197 ; dan
U+0074 U+0061 U+006E <> U+12197 ; tan
U+0074 U+0061 U+0062 <> U+122F0 ; tab
U+0074 U+0061 U+0070 <> U+122F0 ; tap
U+0064 U+0061 U+0061 U+0062 <> U+122F0 ; daab
U+0064 U+0061 U+0061 U+0070 <> U+122F0 ; daap
U+0074 U+0061 U+0072 <> U+122FB ; tar
U+0064 U+0061 U+0061 U+0073 <> U+12079 ; daas
U+0074 U+0061 U+0061 U+0073 <> U+12079 ; taas
U+0064 U+0069 U+0073 <> U+12079 ; dis
U+0074 U+0069 U+0073 <> U+12079 ; tis
U+0074 U+0061 U+0061 <> U+1203E ; taas
U+0074 U+0069 U+006E <> U+12077 ; tin
U+0074 U+0065 U+0065 U+006E <> U+12077 ; teen
U+0064 U+0069 U+006D <> U+12074 ; dim
U+0074 U+0069 U+006D <> U+12074 ; tim
U+0064 U+0069 U+0072 <> U+003F ; dir ?
U+0074 U+0065 U+0072 <> U+12301 ; ter
U+0074 U+0069 U+0072 <> U+12301 ; tir
U+0074 U+0069 U+0069 U+0073 <> U+003F ; tiis ?
U+0074 U+0075 U+0075 U+006C <> U+121E5 ; tuul
U+0064 U+0075 U+006D <> U+12308 ; dum
U+0074 U+0075 U+006D <> U+12308 ; tum
U+0064 U+0075 U+0062 <> U+1207E ; dub
U+0064 U+0075 U+0070 <> U+1207E ; dup
U+0074 U+0075 U+0062 <> U+1207E ; tub
U+0074 U+0075 U+0070 <> U+1207E ; tup
U+0064 U+0075 U+0072 <> U+12119 ; dur
U+0074 U+0075 U+0075 U+0072 <> U+12119 ; tuur
U+007A U+0075 U+006C <> U+12084 ; zul
U+007A U+0075 U+006D <> U+1236E ; zum
U+0064 U+0069 U+0073 <> U+12079 ; dis
U+0064 U+0069 U+0064 U+006C U+0069 <> U+12038 ; didli
U+0064 U+0069 U+0064 U+006C U+0069 U+0068 U+0069 U+0061 <> U+12038 U+1212D U+12000 ; didlihia
U+0064 U+0069 U+006E U+0067 U+0069 U+0072 <> U+1202D ; dingir
U+0064 U+0075 U+0067 <> U+12081 ; dug
U+0065 U+0065 <> U+1208D ; ee
U+0067 U+0061 U+0064 <> U+120F0 ; gad
U+0067 U+0069 <> U+12100 ; gi
U+0067 U+0069 U+0073 <> U+12111 ; gis
U+0067 U+0075 U+0064 <> U+1211E ; gud
U+0068 U+0069 U+0061 <> U+1212D U+12000 ; hia
U+0068 U+0075 U+0072 U+0073 U+0061 U+0067 <> U+1212F U+12295 ; hursag
U+0069 U+0069 U+0064 <> U+003F ; iid ?
U+0069 U+006D <> U+1214E ; im
U+0069 U+0074 U+0075 <> U+1231A ; itu
U+006B U+0061 U+006D <> U+12130 ; kam
U+006B U+0069 <> U+121A0 ; ki
U+006B U+0075 <> U+12129 ; ku
U+006B U+0075 U+0072 <> U+121B3 ; kur
U+006B U+0075 U+0073 <> U+12200 ; kus
U+006C U+0075 U+0075 <> U+121FD ; luu
U+006D U+0065 U+0073 <> U+12228 U+1230D ; mes
U+006D U+0065 U+0073 U+0068 U+0069 U+0061 <> U+12228 U+1230D U+1212D U+12000 ; meshia
U+006D U+0075 U+006C <> U+1202F ; mul
U+006D U+0075 U+006E U+0075 U+0073 <> U+122A9 ; munus
U+006D U+0075 U+0073 <> U+12232 ; mus
U+006D U+0075 U+0073 U+0065 U+006E <> U+12137 ; musen
U+006E U+0061 <> U+003F ; na ?
U+006E U+0069 U+006E U+0064 U+0061 <> U+120FB ; ninda
U+0070 U+0075 U+0075 <> U+003F ; puu ?
U+0073 U+0061 U+0072 <> U+122AC ; sar
U+0073 U+0069 <> U+122DB ; si
U+0073 U+0069 U+0069 <> U+0122 ; siig
U+0074 U+0075 <> U+12130 ; tu
U+0074 U+0075 U+0075 <> U+12306 ; tuug
U+0075 U+0075 <> U+12311 ; uu
U+0075 U+0072 U+0075 <> U+12337 ; uru
U+0075 U+0072 U+0075 U+0064 U+0075 <> U+12350 ; urudu
U+0075 U+007A U+0075 <> U+1235C ; uzu