Giving arbitrary Unicode characters, passed as arguments, a math-active definition?

EDIT: The bug has since been fixed; I haven't checked in which version.

You have found a bug in XeTeX's implementation of \scantokens (the underlying primitive used for LaTeX3's \tl_rescan:nn) for characters beyond the BMP.

Running the test file below through (plain) LuaTeX yields (./test.tex ****120162,32**** ), the correct character code of 𝕢 (U+1D562) followed by that of a space (the one that follows #2 in the definition of \test).

Running it through (plain) XeTeX yields (./test.tex ****55349,56674**** ), which are the two 16-bit code units (the surrogate pair 0xD835, 0xDD62) of the UTF-16 representation of 𝕢. Basically, rescanning transforms 𝕢 into a pair of characters. Somehow, though, 𝕢 can safely go through being written to a file and input back: the problem really seems specific to \scantokens.

\def\test#1#2.{\message{****\number`#1,\number`#2 ****}}
\scantokens{\test 𝕢.}
\bye
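
As a sanity check on the numbers reported by XeTeX, the UTF-16 surrogates of code point 120162 can be recomputed with plain TeX count registers. This is only a sketch; the register names \codepoint, \highsurr and \lowsurr are hypothetical, chosen for illustration.

```tex
% Recompute the UTF-16 surrogate pair of U+1D562 (decimal 120162).
% \codepoint, \highsurr and \lowsurr are illustrative names only.
\newcount\codepoint \newcount\highsurr \newcount\lowsurr
\codepoint=120162
\advance\codepoint by -65536      % subtract "10000, leaving 54626
\highsurr=\codepoint
\divide\highsurr by 1024          % top 10 bits (\divide truncates): 53
\lowsurr=\highsurr
\multiply\lowsurr by -1024
\advance\lowsurr by \codepoint    % bottom 10 bits: 54626 - 53*1024 = 354
\advance\highsurr by 55296        % add "D800
\advance\lowsurr by 56320         % add "DC00
\message{****\number\highsurr,\number\lowsurr****}
\bye
```

The message shows 55349,56674, the same two values that \scantokens produced under XeTeX.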

Please report.


There is already a function for globally assigning a meaning to an active character, without resorting to \tl_rescan:nn.

\documentclass{article}
\usepackage{xparse}
\usepackage{fontspec}
\setmainfont[Ligatures=TeX]{STIXGeneral}

\ExplSyntaxOn
\cs_new_protected:Npn \my_set_math_active:Nn #1 #2
 {
  \AtBeginDocument{
    \char_set_mathcode:nn {`#1} { "8000 }
  }
  \group_begin:
  \char_gset_active:Npn #1 { #2 }
  \group_end:
 }
\my_set_math_active:Nn q {(testa)}
\my_set_math_active:Nn 𝕢 {(testb)}
\ExplSyntaxOff

\begin{document}

`q' is used in $q$.

`𝕢' is used in $𝕢$.


\end{document}



You can't directly use a math active character in its own definition, because an infinite loop will result. This has nothing to do with the character being active in the catcode sense; with the classic

{\catcode`?=\active \xdef?{(\string?)}}
\mathcode`?="8000

the input $?$ would explode even if ? is not active, because it's math active.

There are workarounds. Here is one: to use a character in its own replacement text once it has been made math active, wrap it in \normal, which is defined below to produce the character with its original math code:

\documentclass{article}
\usepackage{xparse}
\usepackage{unicode-math}
\setmainfont[Ligatures=TeX]{STIXGeneral}
\setmathfont{XITS Math}

\ExplSyntaxOn
\cs_new_protected:Npn \helvens_set_math_active:Nn #1 #2
 {
  \group_begin:
  \char_gset_active:Npn #1 { #2 }
  \group_end:
  \cs_set:cpx { helvens_old_#1 }
   { \Umathcharnum \int_eval:n { \Umathcodenum`#1 } ~ } % a space for terminating the number
  \char_set_mathcode:nn {`#1} { "8000 }
 }
\NewDocumentCommand{\setmathactive}{mm}
 {
  \helvens_set_math_active:Nn #1 { #2 }
 }
\NewDocumentCommand{\normal}{m}
 {
  \use:c { helvens_old_#1 }
 }
\ExplSyntaxOff

\setmathactive{q}{(\normal{q})}
\setmathactive{𝕢}{(\normal{𝕢})}

\begin{document}

`q' is used in $q$.

`𝕢' is used in $𝕢$.

And $\normal{𝕢}$ works in math.

\end{document}
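
For comparison, the same idea can be sketched in plain TeX for the ? example above: save the original math code first, then refer to the saved version inside the active definition. This assumes a classic 15-bit math code (not a Unicode math code), and \normalquestion is a hypothetical name used only for illustration.

```tex
% Save the original math code of ? before making it math active.
% \normalquestion is a hypothetical name, not part of any package.
\mathchardef\normalquestion=\mathcode`?
% Define the active ? globally; the catcode change stays local to the group.
{\catcode`?=\active \gdef?{(\normalquestion)}}
\mathcode`?="8000
$?$ % the replacement text uses the saved math char, so there is no loop
\bye
```

Because the replacement text contains \normalquestion rather than ? itself, expanding the math active character never reaches another math active ?.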