placement of dot-below accent using LuaLaTeX: significant variations depending on text font

What happens, at the moment, is that no \d composites are declared in the TU encoding, so \d{a} and so on are realized as combinations with the combining dot below (U+0323).

However, with XeLaTeX, the HarfBuzz library will seize the initiative and if the precomposed character exists in the font, it will use it. This doesn't happen with LuaTeX.

This is a grey area, I'd say. If the composite are declared, you get nothing if the font doesn't support the characters, if not, you have to rely on good positioning.

Indeed, if I try \showoutput with LuaLaTeX, I get for \d{i}

....\TU/lmr/m/n/10 i
....\TU/lmr/m/n/10 ̣

whereas XeLaTeX shows

....\TU/lmr/m/n/10 ị

Unfortunately, Latin Modern is quite weak in positioning certain combining characters, other fonts are better in this respect.

You can fix the issue, provisionally, by defining yourself the necessary composites:

\documentclass{article}
\usepackage{fontspec,ifluatex}

\makeatletter
% fix the hidden feature in tuenc.def
\def\add@unicode@accent#1#2{%
  \if\relax\detokenize{#2}\relax^^a0\else#2\fi
  \char#1\relax
}
\makeatother

\ifluatex
\usepackage{luatex85}
\DeclareTextComposite{\d}\UnicodeEncodingName{A}{"1EA0}
\DeclareTextComposite{\d}\UnicodeEncodingName{a}{"1EA1}
\DeclareTextComposite{\d}\UnicodeEncodingName{E}{"1EB8}
\DeclareTextComposite{\d}\UnicodeEncodingName{e}{"1EB9}
\DeclareTextComposite{\d}\UnicodeEncodingName{I}{"1ECA}
\DeclareTextComposite{\d}\UnicodeEncodingName{i}{"1ECB}
\DeclareTextComposite{\d}\UnicodeEncodingName{O}{"1ECC}
\DeclareTextComposite{\d}\UnicodeEncodingName{o}{"1ECD}
\DeclareTextComposite{\d}\UnicodeEncodingName{U}{"1EE4}
\DeclareTextComposite{\d}\UnicodeEncodingName{u}{"1EE5}
\fi    

\begin{document}
\d{a} \d{e} \d{i} \d{o} \d{u}
\end{document}

enter image description here

For the “hidden feature”, see How to place a dot below for example: e?


As egreg notes this is tricky, if we leave it as the default \d accent then you at least get some kind of dot under as long as the font has U+0303 however if we declare the composites, you get a better positioned dot if the font has the composite but a missing glyph in luatex if it doesn't, which is not really an improvement over a mis-aligned dot. xetex (harfbuzz) will find the right glyph in any case if the font has the pre-composed character.

I need to do an update anyway for the error reported

How to place a dot below for example: e?

but I'm wondering whether to add some \d declarations. I suspect not at least for the update which needs to be today or tomorrow.

Unicode has 42 pre-composed characters with a base followed by U+0303, but not all fonts have them, so it isn't clearly an advantage to declare the composites in the tuenc default.

\documentclass{article}
\showoutput
\makeatletter
\input{tuenc.def}
\begin{document}

1 \d{i}

2 i^^^^0323

3 ^^^^1ecb

\end{document}

in luatex and xetex:

enter image description here


To see what happens if all the composites with latin letters are declared

\documentclass{article}
\showoutput
\makeatletter
\input{tuenc.def}
\begin{document}



x: \d{A} \char"1EA0

x: \d{B} \char"1E04

x: \d{D} \char"1E0C

x: \d{E} \char"1EB8

x: \d{H} \char"1E24

x: \d{I} \char"1ECA

x: \d{K} \char"1E32

x: \d{L} \char"1E36

x: \d{M} \char"1E42

x: \d{N} \char"1E46

x: \d{O} \char"1ECC

x: \d{R} \char"1E5A

x: \d{S} \char"1E62

x: \d{T} \char"1E6C

x: \d{U} \char"1EE4

x: \d{V} \char"1E7E

x: \d{W} \char"1E88

x: \d{Y} \char"1EF4

x: \d{Z} \char"1E92



x: \d{a} \char"1EA1

x: \d{b} \char"1E05

x: \d{d} \char"1E0D

x: \d{e} \char"1EB9

x: \d{h} \char"1E25

x: \d{i} \char"1ECB

x: \d{k} \char"1E33

x: \d{l} \char"1E37

x: \d{m} \char"1E43

x: \d{n} \char"1E47

x: \d{o} \char"1ECD

x: \d{r} \char"1E5B

x: \d{s} \char"1E63

x: \d{t} \char"1E6D

x: \d{u} \char"1EE5

x: \d{v} \char"1E7F

x: \d{w} \char"1E89

x: \d{y} \char"1EF5

x: \d{z} \char"1E93


\end{document}

If the composites are declared then use of the \d command with the specified base would be equivalent to \char with the composite character, the following shows the result with luatex and xetex.

enter image description here

Note that luatex drops some characters, reporting

Missing character: There is no Ḅ (U+1E04) in font [lmroman10-regular]:+tlig;!
Missing character: There is no Ḳ (U+1E32) in font [lmroman10-regular]:+tlig;!
Missing character: There is no Ṿ (U+1E7E) in font [lmroman10-regular]:+tlig;!
Missing character: There is no Ẉ (U+1E88) in font [lmroman10-regular]:+tlig;!
Missing character: There is no ḅ (U+1E05) in font [lmroman10-regular]:+tlig;!
Missing character: There is no ḳ (U+1E33) in font [lmroman10-regular]:+tlig;!
Missing character: There is no ṿ (U+1E7F) in font [lmroman10-regular]:+tlig;!
Missing character: There is no ẉ (U+1E89) in font [lmroman10-regular]:+tlig;!

So those combinations work with \d in Latin Modern only if the composite isn't declared.

Note it would be possible to declare \d (or all accents) to do a check to see if the composite character is present and if not use the combining combination, but currently that would mean doing the test on every use, as latex stores these accent commands per-encoding not per-font. fontspec provides a mechanism to declare variants of TU encoding which could be used to address this, but it is not really convenient to declare a custom encoding for every font.

Another possibility of course would be to use harfbuzz from luatex, which is not an impossible aim.