Syntax and semantics of XDV commands (XeTeX)
This is not much better than the source you already found, but XeTeX's output is read by xdvipdfmx, and its source in the TeX Live svn repository includes dvicodes.h,
which has:
/* XeTeX ".xdv" codes */
#define XDV_NATIVE_FONT_DEF 252 /* fontdef for native platform font */
#define XDV_GLYPHS 253 /* string of glyph IDs with X and Y positions */
#define XDV_TEXT_AND_GLYPHS 254 /* like XDV_GLYPHS plus original Unicode text */
#define PTEXDIR 255 /* Ascii pTeX DIR command */
These are handled by dvi.c in the same directory, but I guess your C is better than mine :-)
The C code has some comments on the expected byte layout, e.g.:
case XDV_GLYPHS:
  need_XeTeX(opcode);
  get_and_buffer_bytes(fp, 4); /* width */
  len = get_and_buffer_unsigned_pair(fp); /* glyph count */
  get_and_buffer_bytes(fp, len * 10); /* 2 bytes ID + 8 bytes x,y-location per glyph */
  break;
case XDV_TEXT_AND_GLYPHS:
  need_XeTeX(opcode);
  len = get_and_buffer_unsigned_pair(fp); /* utf16 code unit count */
  get_and_buffer_bytes(fp, len * 2); /* 2 bytes per code unit */
  get_and_buffer_bytes(fp, 4); /* width */
  len = get_and_buffer_unsigned_pair(fp); /* glyph count */
  get_and_buffer_bytes(fp, len * 10); /* 2 bytes ID + 8 bytes x,y-location per glyph */
  break;
case XDV_NATIVE_FONT_DEF:
  need_XeTeX(opcode);
  do_native_font_def(get_signed_quad(dvi_file));
  break;
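If C is not your thing, the two glyph cases above can be mirrored in a few lines of Python. This is only a sketch of the same skipping logic, not a port of dvi.c: the function name glyph_run_length is made up for illustration, and it assumes the usual big-endian DVI byte order.

```python
import struct

XDV_GLYPHS = 253
XDV_TEXT_AND_GLYPHS = 254


def glyph_run_length(opcode: int, buf: bytes, pos: int) -> int:
    """Return how many parameter bytes follow `opcode`, with parameters starting at buf[pos].

    Hypothetical helper that mirrors the byte counts in the C comments above.
    """
    start = pos
    if opcode == XDV_TEXT_AND_GLYPHS:
        (text_len,) = struct.unpack_from(">H", buf, pos)  # UTF-16 code unit count
        pos += 2 + 2 * text_len                           # 2 bytes per code unit
    elif opcode != XDV_GLYPHS:
        raise ValueError(f"not a glyph-run opcode: {opcode}")
    pos += 4                                              # width
    (glyph_count,) = struct.unpack_from(">H", buf, pos)   # glyph count
    pos += 2 + 10 * glyph_count                           # 8 bytes x,y + 2 bytes ID per glyph
    return pos - start
```

For a 7-glyph run with opcode 253 this gives 4 + 2 + 70 = 76 parameter bytes, which is what a scanner would have to skip to reach the next opcode.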
dviasm (despite its name) can show xdv files. I had to change its font loading so that it could find the font, but on your test file it reports:
[preamble]
id: 7
numerator: 25400000
denominator: 473628672
magnification: 1000
comment: ' XeTeX output 2019.06.16:2127'
[postamble]
maxv: 633.947250pt
maxh: 407pt
maxs: 3
pages: 1
[font definitions]
fntdef: "/usr/local/texlive/2018/texmf-dist/fonts/truetype/public/amiri/amiri-regular.ttf" at 10pt
[page 1 0 0 0 0 0 0 0 0 0]
xxx: 'pdf:pagesize default'
down: 633pt
push:
down: -605pt
down: 575pt
push:
down: -540pt
push:
right: 300.325195pt
xxx: 'pdf:docinfo<</BIDI.Fullbanner(This is the bidi package, Version 35.8, Released May 1, 2019. )>>'
w: 2.929688pt
fnt: "/usr/local/texlive/2018/texmf-dist/fonts/truetype/public/amiri/amiri-regular.ttf" at 10pt
setglyphs: 31.865234pt gid37(0pt) gid82(5.820312pt) gid81(10.595703pt) gid77(15.791016pt) gid82(18.125000pt) gid88(23.095703pt) gid85(28.125000pt)
w0:
setglyphs: 19.155273pt gid2388(0pt) gid4497(10.791016pt) gid4466(13.232422pt) gid2021(14.956055pt) gid2083(17.250977pt)
w0:
xxx: 'ligne x'
setglyphs: 31.865234pt gid37(0pt) gid82(5.820312pt) gid81(10.595703pt) gid77(15.791016pt) gid82(18.125000pt) gid88(23.095703pt) gid85(28.125000pt)
pop:
pop:
down: 30pt
push:
right: 231.570312pt
setglyphs: 5.859375pt gid447(0pt)
pop:
pop:
That is for the TeX file you gave, unchanged apart from the font line, which I changed to
\setmainfont[Script=Arabic]{Amiri}
Based on the dvisvgm sources and dvipdfm-x sources: the truly XDV-specific opcodes are (as of the current version 7) only three:
252 (fc): This is to define a font (the code refers to it as XDV_NATIVE_FONT_DEF or XFontDef), and is the most complicated of the three. Parameters are: fontnum[4] ptsize[4] flags[2] psname_len[1] fontname[psname_len] fontIndex[4], followed by up to (2 + 4 * 65535) more bytes depending on the flags.
253 (fd): This is a “string of glyph IDs with X and Y positions”, referred to in code by XDV_GLYPHS or XGlyphArray. Parameters are: w[4] n[2] xy[(4+4)n] g[2n], where w is the total width of the glyph array, n is the number of glyphs, xy is a sequence of (dx, dy) pairs (the relative horizontal and vertical positions of each glyph), and g contains the “FreeType indices of the glyphs to typeset”.
254 (fe): This is similar except it includes “a leading array of UTF-16 characters that specify the "actual text" represented by the glyphs to be printed. It usually contains the text with special characters (like ligatures) expanded so that it can be used for text search, plain text copy & paste etc. This XDV command was introduced with XeTeX 0.99995 and can be triggered by \XeTeXgenerateactualtext1”. So its parameters are: l[2] t[2l] w[4] n[2] xy[8n] g[2n]
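To make the 253/254 layouts concrete, here is a rough Python sketch that decodes one such record into its fields. The names GlyphRun and parse_glyph_run are invented for this example. It assumes big-endian byte order, treats w and the offsets as signed 32-bit values, and follows the field order quoted above (all xy pairs first, then all glyph IDs).

```python
import struct
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GlyphRun:
    text: Optional[str]              # "actual text", only present for opcode 254
    width: int                       # w, in DVI units
    offsets: List[Tuple[int, int]]   # (dx, dy) per glyph, in DVI units
    glyph_ids: List[int]             # glyph indices in the font


def parse_glyph_run(buf: bytes, pos: int, with_text: bool) -> Tuple[GlyphRun, int]:
    """Decode the parameters of opcode 253 (with_text=False) or 254 (with_text=True).

    `pos` points at the first parameter byte, just after the opcode.
    Returns the decoded run and the position of the next opcode.
    """
    text = None
    if with_text:
        (l,) = struct.unpack_from(">H", buf, pos)           # l[2]
        pos += 2
        text = buf[pos:pos + 2 * l].decode("utf-16-be")     # t[2l]
        pos += 2 * l
    w, n = struct.unpack_from(">iH", buf, pos)              # w[4] n[2]
    pos += 6
    coords = struct.unpack_from(f">{2 * n}i", buf, pos)     # xy[8n]
    pos += 8 * n
    gids = struct.unpack_from(f">{n}H", buf, pos)           # g[2n]
    pos += 2 * n
    offsets = list(zip(coords[0::2], coords[1::2]))
    return GlyphRun(text, w, offsets, list(gids)), pos
```

Calling parse_glyph_run(data, i + 1, data[i] == 254) at an offset i where data[i] is 253 or 254 should return the run plus the offset of the following command.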
I don't think the TeX-XeT commands 250–251 or the pTeX command 255 are used by XeTeX, which is consistent with your not seeing them in the file.
The hexdump in the question starts with f7 = 247, the DVI “pre” command, and the next byte is the DVI version, which here is 07. So we're looking at (XDV) version 7, as expected.
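For what it's worth, the whole preamble can be read the same way. The sketch below assumes the standard DVI preamble layout (pre, i[1], num[4], den[4], mag[4], k[1], x[k]); the function names are made up, and the sample bytes are rebuilt from the values in the dviasm listing rather than copied from your hexdump.

```python
import struct
from fractions import Fraction


def parse_preamble(buf: bytes):
    """Decode a DVI/XDV preamble at the start of `buf` (hypothetical helper)."""
    opcode, version, num, den, mag, k = struct.unpack_from(">BBIIIB", buf, 0)
    if opcode != 247:
        raise ValueError("file does not start with the DVI pre command (247)")
    comment = buf[15:15 + k].decode("ascii", errors="replace")
    return version, num, den, mag, comment


def dvi_units_per_point(num: int, den: int) -> Fraction:
    """num/den is the size of one DVI unit in units of 1e-7 metres."""
    unit_in_metres = Fraction(num, den) / 10**7
    point_in_metres = Fraction(254, 10000) / Fraction(7227, 100)  # 1 in = 72.27 pt
    return point_in_metres / unit_in_metres


# Sample preamble rebuilt from the dviasm values (not taken from the hexdump).
comment = b" XeTeX output 2019.06.16:2127"
sample = (bytes([247, 7])
          + struct.pack(">III", 25400000, 473628672, 1000)
          + bytes([len(comment)]) + comment)

version, num, den, mag, text = parse_preamble(sample)
units = dvi_units_per_point(num, den)  # 65536, i.e. one DVI unit is one scaled point
```

With the numerator and denominator reported by dviasm, one TeX point comes out as exactly 65536 DVI units, so the 4-byte quantities in the file are ordinary scaled points.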
So in your file when you see (at either byte offset 278 or 423) bytes like fd 00 1f dd 80 00 07 00 00 00 and so on, it's actually not just two bytes that are the parameters, but rather 00 1f dd 80 are w, then 00 07 are n (the number of glyphs), then the next 56 bytes are xy or (dx, dy) (the offsets for each of these 7 glyphs), then the next 14 bytes are g or glyphs (the 7 glyphs). Needless to say, these 7 in your example are Bonjour:
00 25 00 52 00 51 00 4d 00 52 00 58 00 55
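As a quick sanity check, the bytes quoted above can be decoded by hand in Python. This uses only the bytes shown here (the 56 bytes of xy are left out because they are not quoted) and assumes that one point is 65536 DVI units.

```python
# Parameter bytes quoted above: w[4] n[2], then (after the 56 xy bytes) g[2n].
header = bytes.fromhex("00 1f dd 80 00 07")
glyph_bytes = bytes.fromhex("00 25 00 52 00 51 00 4d 00 52 00 58 00 55")

w = int.from_bytes(header[:4], "big", signed=True)  # total width in DVI units
n = int.from_bytes(header[4:6], "big")              # number of glyphs
glyph_ids = [int.from_bytes(glyph_bytes[i:i + 2], "big")
             for i in range(0, len(glyph_bytes), 2)]

width_pt = w / 65536                                # assuming 65536 units per pt
xy_length = 8 * n                                   # bytes of (dx, dy) pairs
```

This gives a width of 31.865234375pt, 7 glyphs, 56 bytes of offsets, and glyph IDs 37, 82, 81, 77, 82, 88, 85, which agrees with the first setglyphs line in the dviasm listing.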
As you observe, these are not ASCII codes, so where does this mapping of 00 25 to B, etc. come from? Well, it's the same as with the regular DVI format: these are the positions of the glyphs in the font, and the font can choose to put any glyph at any position. This is confirmed by opening the font and counting positions. Maybe FontForge can show it, but I couldn't find it in the UI; I could find it with fonttools:
$ ttx amiri-regular.ttf
Dumping "amiri-regular.ttf" to "amiri-regular.ttx"...
and the file contains:
<GlyphID id="37" name="B"/>
where 37 is 0x25, etc.
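If you want to look up more than one glyph, the GlyphOrder table in the .ttx dump can be turned into a dictionary. The sketch below uses only the standard library; the function name is made up, and the sample string contains just the one entry quoted above, wrapped in the ttFont/GlyphOrder elements that ttx writes. For the real dump you would parse the amiri-regular.ttx file instead of a string.

```python
import xml.etree.ElementTree as ET


def glyph_names_from_ttx(ttx_xml: str) -> dict:
    """Map glyph ID -> glyph name from the text of a .ttx dump (hypothetical helper)."""
    root = ET.fromstring(ttx_xml)
    return {int(g.get("id")): g.get("name") for g in root.iter("GlyphID")}


# Minimal stand-in for the real dump: only the entry quoted above.
SAMPLE = """<ttFont>
  <GlyphOrder>
    <GlyphID id="37" name="B"/>
  </GlyphOrder>
</ttFont>"""

names = glyph_names_from_ttx(SAMPLE)
first_glyph = names[0x25]  # the glyph behind the bytes 00 25
```

fonttools can also answer this directly from the .ttf via TTFont(...).getGlyphName(37), if you prefer not to go through the XML dump.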
The definitive source is the XeTeX source tree, and specifically the xetex.web file. Quoting from it:
\yskip\noindent Commands 250--255 are undefined in normal \.{DVI} files, but the following commands are used in \.{XDV} files.
\yskip\hang\vbox{\halign{#&#\hfil\cr
|define_native_font| 252 & |k[4]| |s[4]| |flags[2]| |l[1]| |n[l]| |i[4]|\cr
& |if (flags and COLORED) then| |rgba[4]|\cr
& |if (flags and EXTEND) then| |extend[4]|\cr
& |if (flags and SLANT) then| |slant[4]|\cr
& |if (flags and EMBOLDEN) then| |embolden[4]|\cr
}}
\yskip\hang|set_glyphs| 253 |w[4]| |k[2]| |xy[8k]| |g[2k]|.
\yskip\hang|set_text_and_glyphs| 254 |l[2]| |t[2l]| |w[4]| |k[2]| |xy[8k]| |g[2k]|.
\yskip\noindent Commands 250 and 255 are undefined in normal \.{XDV} files.
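The define_native_font layout quoted above could be decoded along these lines. Treat it as a sketch under assumptions: the quote names the flags but not their bit values, so the masks below are what I remember from the dvipdfm-x source and should be checked there. The function name, the UTF-8 decoding of the font name, and the signedness of each field are likewise guesses.

```python
import struct

# ASSUMED bit masks (not given in the quote above); verify against
# xetex.web or the dvipdfm-x source before relying on them.
FLAG_COLORED = 0x0200
FLAG_EXTEND = 0x1000
FLAG_SLANT = 0x2000
FLAG_EMBOLDEN = 0x4000


def parse_native_font_def(buf: bytes, pos: int):
    """Decode the parameters of opcode 252; `pos` is just after the opcode."""
    k, s, flags, l = struct.unpack_from(">iiHB", buf, pos)    # k[4] s[4] flags[2] l[1]
    pos += 11
    name = buf[pos:pos + l].decode("utf-8", errors="replace")  # n[l]
    pos += l
    (index,) = struct.unpack_from(">I", buf, pos)              # i[4]
    pos += 4
    font = {"fontnum": k, "size": s, "flags": flags, "name": name, "index": index}
    # Optional 4-byte fields, in the order given by xetex.web.
    optional = (("rgba", FLAG_COLORED, ">I"), ("extend", FLAG_EXTEND, ">i"),
                ("slant", FLAG_SLANT, ">i"), ("embolden", FLAG_EMBOLDEN, ">i"))
    for key, mask, fmt in optional:
        if flags & mask:
            (font[key],) = struct.unpack_from(fmt, buf, pos)
            pos += 4
    return font, pos
```

The second return value is the offset of the next command, so the function can be dropped into a loop that walks the file opcode by opcode.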