How to safely check by means of expansion-methods whether a list of tokens contains a token which is defined in terms of \outer?
What did I get myself into...
There are three possible ways:
- \def\outer{} and go on with your life. Hands down the best choice (in all aspects).
- I bet it's easier to implement \suppressoutererror in TeX than to write TeX code to do that.
- It's 2020, the year Knuth will address reported bugs in TeX. Submit \outer for consideration.
- I said three!
- I suppose if you are still reading you expect me to show some code. Read on, then :-)
The problem with \outer macros is that they are meant to be used only. You are not supposed to do things with them, and TeX will yell at you for anything fancy you try. This rules out one of TeX's most powerful features: macros. You simply can't pass an \outer macro as an argument to another macro.
And then you want expandability, so you just ruled out most of TeX's primitives as well (including the most useful contender, \let, which can not only look at an \outer macro, but can also remove it). Not even ε-TeX is helpful here, as \detokenize and friends cannot have an \outer control sequence in them (so yes, the code below works in Knuth TeX).
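To make the remark about \let concrete, here is a minimal plain TeX sketch (\peeked is a name made up for this illustration). \let may read an \outer token because TeX's scanner is in its normal state at that point, but the assignment is not expandable, which is why it is of no use for an expansion-only test.

```tex
\outer\def\umbrella{No raindrops on my head, please! }
% \let reads \umbrella while the scanner is in its normal state,
% so no "Forbidden control sequence" error is raised here.
\let\peeked=\umbrella
% \peeked is now an \outer macro too; \meaning can look at it safely.
\message{[\meaning\peeked]}
\bye
```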
This leaves you with minimal resources. Only one primitive might help you now: \meaning. The code below misuses \meaning to try to find out if there is an \outer control sequence in the argument...
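As a quick illustration of why \meaning is usable at all, the following plain TeX sketch prints the meaning of an ordinary macro and of an \outer macro (the two test macros are the ones from the tests further down). \meaning is allowed to read an \outer token even inside the argument of \message, and the only difference in the result is the \outer prefix that the detection relies on.

```tex
\def\rain{My bald head is still wet.}
\outer\def\umbrella{No raindrops on my head, please! }
\message{[\meaning\rain]}      % [macro:->My bald head is still wet.]
\message{[\meaning\umbrella]}  % [\outer macro:->No raindrops on my head, please! ]
\bye
```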
What it can do:
There are two macros, \ifoutertl and \ifouterarg. The first one expands the argument token list once, to expose its contents, and the second requires braces (catcode 1 and 2) around the argument. In your example you'd use them like:
\ifoutertl\unfold{with}{without} outer
\ifouterarg{\umbrella My bald head always dries so slowly.}{with}{without} outer
\ifouterarg{My bald head always dries so slowly.}{with}{without} outer
and it would print: with outer with outer without outer.
How It works:
Abandon all hope, ye who enter here.
Basics
When you do:
\ifouterarg{This is \outer: \umbrella}{T}{F}
the code will start by removing the leading { and will hit the first token with \meaning:
\some@macro the letter This is \outer: \umbrella}{T}{F}
and then the \some@macro will do some checks for the \escapechar (essentially ignored in the process), then will look at the immediate next t and will try to find out what to do with it. A leading t means the token was either a letter (the letter <something>) or another character (the character <something>). The code then processes that and moves on, doing the same with the rest. Then the code arrives at \outer, and hits it with \meaning:
\some@macro \outer: \umbrella}{T}{F}
(\outer is a primitive, so \meaning\outer is \₁₂o₁₂u₁₂t₁₂e₁₂r₁₂, that is, six character tokens of catcode 12). In this case, the code will see the \ and will do its thing with \outer [note 1]. Later, it arrives at \umbrella and hits it with \meaning:
\some@macro \outer macro:->No raindrops on my head, please! }{T}{F}
and this time it will see the sequence of character tokens \outer macro:->, in which case it will understand that an \outer control sequence was in the argument, and then a bunch of expansion steps later it will leave T as a result. If no \outer control sequence was found, F would be left as the result. This “bunch of expansion steps” means hitting all the tokens in No raindrops on my head, please! with \meaning [note 2] and removing them as described earlier.
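The \escapechar check mentioned above matters because the first character of the \meaning output depends on it. A small plain TeX sketch (the macro body is shortened for this illustration) shows the three situations the scanner has to cope with:

```tex
\outer\def\umbrella{rain}
% Default \escapechar: the prefix starts with a backslash.
\message{[\meaning\umbrella]}                     % [\outer macro:->rain]
% A different escape character changes the first character of the prefix.
{\escapechar=`\! \message{[\meaning\umbrella]}}   % [!outer macro:->rain]
% A negative \escapechar prints no escape character at all.
{\escapechar=-1 \message{[\meaning\umbrella]}}    % [outer macro:->rain]
\bye
```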
Analysing each token (after \meaning-ing it)
After a token is hit with \meaning (remember, we cannot look at a token before hitting it with \meaning) the code proceeds to analyse it to find out what's to be done. We're interested in the prefix of a control sequence, so the very first thing the code looks for is \protected, \long, and \outer. Lucky us, \outer is always the last prefix to appear in the list, so the code just skips the other two and keeps looking for more. If it finds \outer, we look for \outer macro:->, which is guaranteed to be there [note 3].
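The claim about the prefix order can be checked directly. In this plain TeX sketch (\foo and \baz are names made up for the illustration) the prefixes are given in both orders at definition time, yet \meaning reports them in the same fixed order with \outer last:

```tex
\long\outer\def\foo{bar}
\outer\long\def\baz{bar}
% The order used in the definition does not matter for \meaning.
\message{[\meaning\foo]}  % [\long\outer macro:->bar]
\message{[\meaning\baz]}  % [\long\outer macro:->bar]
\bye
```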
In case the token is not a macro (thus no macro:-> text), then we check whether it's one of TeX's 10 character tokens (with catcode 1, 2, 3, 4, 6, 7, 8, 10, 11, or 12). If the token is one of these, then the code simply removes it and proceeds scanning.
If it's not (for instance, a primitive, whose meaning is itself, a \count register, whose meaning is \count<number>, and a couple others), then the code gives up and starts again [note 4] (it's not an \outer control sequence anyway, so we don't care). This time, it will start by doing \meaning again (for example, in \count123) and then the first token will yield the character \, and it will find its way out.
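The strings the scanner has to recognise can be inspected with a few \message calls. This plain TeX sketch (\abc is defined here with \countdef purely so that the register number is known) shows one example for each of the cases described above:

```tex
\countdef\abc=123
\message{[\meaning a]}       % [the letter a]
\message{[\meaning 1]}       % [the character 1]
\message{[\meaning\bgroup]}  % [begin-group character {]
\message{[\meaning\egroup]}  % [end-group character }]
\message{[\meaning\relax]}   % [\relax]     (a primitive: its meaning is itself)
\message{[\meaning\abc]}     % [\count123]  (a \count register)
\bye
```

The last two results start with the escape character rather than with a known keyword, which is the situation in which the code gives up and rescans the output character by character.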
Nesting
I'm not sure you noticed, but at no point in the process did we tell the macro when to end.
It is just doing \meaning on things and looking at the result. So here come the two exceptions in the process above: if the token was a begin-group character, then the code starts a (sort of) new level of expansion, which will end at the next end-group character. Any \outer control sequence found in there will be reported to the upper expansion level, so you can nest braces to your heart's content :-)
If braces are balanced and an end-group character is found, the end of the token list is reached, and it stops, selecting the proper conditional branch.
To keep track of nesting levels and whether an \outer control sequence was found, the macro that commands the scanning process does something like (pseudocode):
\romannumeral
\expandafter \after_egroup_action
\expandafter \outer_found_boolean
\romannumeral \scan
When \scan finds another begin-group character, a new layer (\action\boolean\scan) is placed, and the code keeps scanning. When an end-group character is found, a \ud@exp@end (the counterpart of expl3's \exp_end:) is inserted, stopping \romannumeral, and the control passes to the \after_egroup_action macro, which will either keep scanning the token list with the \outer_found_boolean, or end the process and use \outer_found_boolean to pick the proper conditional branch.
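The mechanism that stops \romannumeral can be seen in isolation. In this plain TeX sketch the names \demoend, \firstoftwo, \work and \result are made up for the illustration, with \demoend playing the role of \ud@exp@end. \romannumeral keeps expanding until it finds a number; a \chardef token with value 0 is such a number, and the roman numeral of 0 is empty, so everything after the terminator is left in the input untouched.

```tex
\chardef\demoend=0
\long\def\firstoftwo#1#2{#1}
% \work needs two expansion steps before the terminator surfaces.
\def\work{\firstoftwo{\demoend done}{\demoend never}}
% A single expansion of \romannumeral drives \work all the way to \demoend.
\expandafter\def\expandafter\result\expandafter{\romannumeral\work}
\message{[\meaning\result]}  % [macro:->done]
\bye
```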
Notes:
- 1: Since the code cannot look at the actual control sequence, the process is not completely robust. Suppose here:
%                  V------V
\ifouterarg{\outer macro:->}{T}{F}
if you managed to change the catcode of the marked tokens to 12, then yes, you managed to fool the code into thinking that you had an \outer control sequence. I doubt it is possible to overcome this problem: we can't examine \outer (and find out it's the primitive \outer) beforehand without the risk of grabbing an \outer control sequence. And once we hit it with \meaning, it is indistinguishable from an \outer macro. So yes, it's not fool proof, sorry.
- 2: Yes, the code is slow. Awfully slow. It has to hit each token with \meaning, so if you have \def\a{\b\b\b\b ... lots ... \b\b\b\b\b} and \def\b{<something awfully long>}, then yes, it will do \meaning in every \b, and then \meaning in every token in \b, which might escalate quickly. Again, I doubt this can be optimised. Don't take it wrong, though: the code will not expand forever. Any macro in the argument is expanded with \meaning, however everything else are characters, which will end up being scanned and removed.
- 3: That is, if the \outer macro:-> comes from the \meaning of an \outer control sequence. If you take the example in note 1 and make it read \ifouterarg{\outer macro:>}{T}{F} instead, then the code will expand to F.
- 4: Yes, the code could be optimised to know more primitive tokens other than the character tokens, so that (taking the example of \count123) instead of consuming each of \, c, o, ... one by one (with \meaning and a large loop), it would see that it's \count<something> and take a shortcut. Implementing this is left as an exercise for the reader ;-)
The code is probably not what can be called robust, but it gets the job done. At this point I'm not really sure there is a better way to do this. Though much of it was written while I was changing my mind about how it would work, so there are probably redundancies and it could be improved a bit for speed. Not much, though, I think. Proceed with caution!
And if you managed to read all the way to here, congrats! Here's the code:
\catcode`\@=11
% Utilities
\def\@empty{}
\long\def\@gobble#1{}
\long\def\@firstoftwo#1#2{#1}
\long\def\@secondoftwo#1#2{#2}
\long\def\ud@usetwo#1#2{#1#2}
\def\ud@zap@space#1{\ud@@zap@space#1 \@empty}
\def\ud@@zap@space#1 #2{#1%
\ifx#2\@empty\else\expandafter\ud@@zap@space\fi#2}
\ud@usetwo{\let\ud@sptoken= }{ }
\chardef\ud@exp@end=0
\chardef\ud@false=0
\chardef\ud@true=1
% User-level macros
\def\ifoutertl#1{%
\romannumeral
\iffalse{\fi\ud@ifouter\ud@false\ud@ifouter@TF#1}}
\def\ifouterarg{%
\romannumeral
\expandafter\expandafter\expandafter\ud@ifouterarg@aux
\expandafter\@gobble\string}
% Internal macros
\def\ud@ifouter#1#2#3{%
\expandafter\ud@if@outer@scan\expandafter#1\expandafter#2#3}
\def\ud@ifouterarg@aux{%
\ud@ifouter\ud@false\ud@ifouter@TF\@empty}
\def\ud@ifouter@TF#1{%
\ifodd#1%
\expandafter\expandafter\expandafter
\ud@exp@end\expandafter\@firstoftwo
\else
\expandafter\expandafter\expandafter
\ud@exp@end\expandafter\@secondoftwo
\fi}
\def\ud@if@outer@scan#1#2{%
\expandafter\ud@if@outer@decypher
\expandafter#1\expandafter#2\meaning}
\def\ud@if@outer@decypher#1#2{%
\expandafter\ud@rearrange
\expandafter#1\expandafter#2%
\romannumeral\ud@if@outer@escapechar}
\def\ud@if@outer@escapechar#1{%
\ifnum\ifnum\escapechar<0 0\else 1\fi
\ifnum\escapechar>255 0\else 1\fi=0
\ud@decypher@noescape
\else
\ifnum`#1=\escapechar
\ud@decypher@escape
\else
\ud@decypher@noescape
\fi
\fi#1}
\def\ud@rearrange#1#2#3{#3#1#2}
\def\ud@decypher@noescape#1\fi\fi{\fi\fi\ud@decypher@cs@prefix}
\def\ud@decypher@escape\else#1\fi\fi#2{\fi\fi
\ud@decypher@cs@prefix}
\def\ud@decypher@cs@prefix#1{%
\ifcase0\if #1p1\fi \if #1l2\fi
\if #1o3\fi \if #1m4\fi \ud@sptoken
\expandafter\ud@scan@token@keyword%
\or \expandafter\ud@scan@string@p % \protected
\or \expandafter\ud@scan@string@l % \long
\or \expandafter\ud@scan@string@o % \outer
\or \expandafter\ud@scan@string@m % macro :->
\fi#1}
\def\ud@return@same@scanner{\ud@exp@end\ud@if@outer@scan}
\def\ud@return@true@scanner{\ud@exp@end\ud@return@true@outer}
\def\ud@return@true@outer#1{\ud@if@outer@scan\ud@true}
\def\ud@newstring#1#2{%
\ifx\relax#2%
\expandafter\@gobble
\else
\edef\ud@tmp@tl{\ud@tmp@tl#1}%
\edef\ud@test@tokn{\string#1}%
\expandafter\edef\csname ud@scan@string@\ud@tmp@tl\endcsname##1{%
\noexpand\ifx ##1\ud@test@tokn
\noexpand\expandafter\expandafter\noexpand
\csname ud@scan@string@\ud@tmp@tl#2\endcsname
\noexpand\else
\noexpand\expandafter\noexpand\ud@return@same@scanner
\noexpand\fi}%
\expandafter\ud@newstring
\fi{#2}}
\def\ud@new@scan@string#1{%
\def\ud@tmp@tl{}%
\ud@newstring #1{end}\relax
\expandafter\def\csname ud@scan@string@\ud@zap@space{#1}end\endcsname}
\ud@new@scan@string{protected}{\ud@if@outer@escapechar}
\ud@new@scan@string{long}{\ud@if@outer@escapechar}
\ud@new@scan@string{macro:->}{\ud@return@same@scanner}
\ud@new@scan@string{outer macro:->}{\ud@return@true@scanner}
\def\ud@scan@bgroup{\ud@exp@end\ud@scan@bgroup@aux}
\def\ud@scan@bgroup@aux#1#2{%
\expandafter\ud@after@group@continue
\expandafter#1\expandafter#2%
\romannumeral\ud@if@outer@scan#1\ud@after@egroup}
\def\ud@after@egroup{\ud@exp@end}
\def\ud@after@group@continue#1#2#3{%
\ud@if@outer@scan#3#2}
\def\ud@scan@egroup{\ud@exp@end\ud@scan@egroup@aux}
\def\ud@scan@egroup@aux#1#2{#2#1}
\def\ud@gobble@char@return#1{\ud@return@same@scanner}
\def\ud@gobble@char@do#1#2{#1}
\ud@usetwo{\def\ud@gobble@two@spaces}{ } {}
\ud@new@scan@string{begin-group character}{\ud@gobble@char@do\ud@scan@bgroup}
\ud@new@scan@string{end-group character}{\ud@gobble@char@do\ud@scan@egroup}
\ud@new@scan@string{math shift character}{\ud@gobble@char@return}
\ud@new@scan@string{alignment tab character}{\ud@gobble@char@return}
\ud@new@scan@string{macro parameter character}{\ud@gobble@char@return}
\ud@new@scan@string{superscript character}{\ud@gobble@char@return}
\ud@new@scan@string{subscript character}{\ud@gobble@char@return}
\ud@new@scan@string{blank space}{\expandafter\ud@return@same@scanner\ud@gobble@two@spaces}
\ud@new@scan@string{the letter}{\ud@gobble@char@return}
\ud@new@scan@string{the character}{\ud@gobble@char@return}
\def\ud@scan@token@keyword#1{%
\expandafter\ifx\csname ud@scan@string@#1\endcsname\relax
\expandafter\ud@return@same@scanner
\else
\csname ud@scan@string@#1\expandafter\endcsname
\fi#1}
% keyword forks
\def\ud@strip@prefix#1>{}
\def\ud@detokenize#1#2{\def#1{#2}\edef#1{\expandafter\ud@strip@prefix\meaning#1}}
\def\ud@tl@head{\expandafter\@firstoftwo}
\def\ud@set@fork@string#1#2#3{%
\begingroup \escapechar-1
\def\x{ud@scan@string@#1}%
\expandafter\ud@set@fork@string@aux
\csname\x#2\expandafter\endcsname
\csname\x#2_\expandafter\endcsname
\csname\x#3\expandafter\expandafter\expandafter\endcsname
\expandafter\string\csname#2\expandafter\expandafter\expandafter\endcsname
\expandafter\string\csname#3\endcsname}
\def\ud@set@fork@string@aux#1#2#3#4#5{%
\endgroup
\let#2#1%
\def#1##1{%
\ifx ##1#4\expandafter#2%
\else \ifx ##1#5\expandafter\expandafter\expandafter#3%
\else \expandafter\expandafter\expandafter\ud@return@same@scanner
\fi
\fi##1}}
\ud@set@fork@string{ma}{c}{t} % macro / math
\ud@set@fork@string{macro}{p}{:} % macro parameter / macro:->
\ud@set@fork@string{the}{c}{l} % the character / the letter
\ud@set@fork@string{b}{l}{e} % blank / begin
\catcode`\@=12
% -----
% Tests
% -----
\def\rain{My bald head is still wet.}
\def\unfold{\umbrella My bald {head always} dries so slowly.}
\outer\def\umbrella{No raindrops on my head, please! }
\newcount\abc
\tt
0\ifouterarg{\newcount}{T}{F} (T)\par
1\ifouterarg{\abc}{T}{F} (F)\par
2\ifouterarg{\zzz}{T}{F} (F)\par
3\ifoutertl\unfold{T}{F} (T)\par
4\ifoutertl\rain{T}{F} (F)\par
5\ifouterarg{No raindrops on my head, please! }{T}{F} (F)\par
6\ifouterarg{\umbrella My bald {head always} dries so slowly.}{T}{F} (T)\par
7\edef\tmpa{\ifoutertl\unfold{T}{F}}\meaning\tmpa (T)\par
8\edef\tmpa{\ifouterarg{\umbrella corp.}{T}{F}}\meaning\tmpa (T)\par
9\edef\tmpa{\ifouterarg{\zombies!}{T}{F}}\meaning\tmpa (F)\par
\bye
Let's try a Lua solution. We could use \suppressoutererror to make scanning easier but that wouldn't be much fun, so we scan individual tokens instead and manually try to keep track of nested braces.
For every scanned token, Lua can access the "command id". This is kind of a generalization of catcodes. In particular, every catcode 1 ({) token has id 1, every catcode 2 (}) token has id 2 and every token which would invoke an \outer macro has the id returned by token.command_id'outer_call' or token.command_id'long_outer_call'. So for every token we only have to check if it has any of these command ids. For id 1 we increase the nesting level, for 2 we decrease it and if one of the other two ids is found we remember to return true at the end:
\documentclass{article}
\begin{document}
\directlua{
local i = luatexbase.new_luafunction'hasouter'
% The following creates a table outer_cmds, such that
% outer_cmds[i] is true iff i is an id corresponding to
% a call to an \outer macro
local outer_cmds = {
[token.command_id'outer_call'] = true,
[token.command_id'long_outer_call'] = true,
}
lua.get_functions_table()[i] = function() % This function will be executed if we use `\hasouter`
local tok = token.scan_token() % scan_token applies full expansion until the first non-expandable token is found. This allows e.g. \hasouter\expandafter{...}
local cmd = tok.command % Look at the command code
if cmd \csstring\~= 1 then % \csstring\~ makes sure that TeX does not expand ~.
token.put_next(tok) % If we read a wrong character, putting it back ensures that TeX gets less confused if the user decides to continue after the error.
error[[Argument must start with \csstring\{]]
end
local nesting = 0
local result = false % This will become true if we find an \outer call
while true do % An endless loop. This will still terminate because we return early if nesting becomes 0 again
if cmd == 1 then % tok is equivalent to `{`. Increase the nesting level.
nesting = nesting + 1
elseif cmd == 2 then % tok is equivalent to `}`. Decrease the nesting level.
nesting = nesting - 1
if nesting == 0 then
% We want to expand to the first or second parameter depending on result, so we insert @first/secondoftwo
token.put_next(token.create(result and '@firstoftwo' or '@secondoftwo'))
return
end
else
result = result or outer_cmds[cmd] % If result is already true, don't change anything. Otherwise make it true if cmd corresponds to an outer call
end
tok = token.get_next() % Continue with the next token. get_next applies no expansion.
cmd = tok.command
end
end
token.set_lua('hasouter', i) % Define \hasouter to execute the function above
}
\def\unfold{\umbrella My bald head always dries so slowly.}
\def\unfoldX{My bald head always dries so slowly.}
\outer\def\umbrella{No raindrops on my head, please! }
\hasouter\expandafter{\unfold}{with}{without} outer
\hasouter\expandafter{\unfoldX}{with}{without} outer
\hasouter{\umbrella My bald head always dries so slowly.}{with}{without} outer
\hasouter{My bald head always dries so slowly.}{with}{without} outer
\end{document}
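To see the command-id test on its own, without any scanning, here is a small LuaLaTeX sketch. It assumes that token.create looks the name up without sending the token through TeX's scanner, so it can be applied to an \outer macro; \rain is an extra non-outer macro added only for comparison. The result for each name is written to the terminal and log, and it is expected to report umbrella as outer and rain as not outer.

```tex
\documentclass{article}
\begin{document}
\outer\def\umbrella{No raindrops on my head, please! }
\def\rain{My bald head is still wet.}
\directlua{
  local outer_call = token.command_id'outer_call'
  % Compare each macro's command id with the id used for \outer macros.
  for _, name in ipairs{'umbrella', 'rain'} do
    local tok = token.create(name)
    texio.write_nl(name .. ': ' .. (tok.command == outer_call and 'outer' or 'not outer'))
  end
}
\end{document}
```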
As Phelype Oleinik mentioned in a comment, this does not actually work inside \edef because there get_next enforces the restriction that no outer macros should be contained in a macro.
In modern LuaTeX versions (e.g. starting with TeXLive 2020 or when compiled with lualatex-dev in TeXLive 2019), this can be worked around by using tex.runtoks: Executing \lowercase{} or something similar doesn't do anything, but it returns TeX's scanner to a normal state where \outer macros are accepted. Of course \lowercase isn't expandable, but runtoks allows using non-expandable things in an expandable context.
\documentclass{article}
\begin{document}
\directlua{
local i = luatexbase.new_luafunction'realhasouter'
local j = luatexbase.new_luafunction'hasouter'
local outer_cmds = {
[token.command_id'outer_call'] = true,
[token.command_id'long_outer_call'] = true,
}
lua.get_functions_table()[i] = function()
local delayed_tok = token.get_next()
local tok = token.scan_token()
local cmd = tok.command
if cmd \csstring\~= 1 then
token.put_next(tok)
error[[Argument must start with \csstring\{]]
end
local nesting = 0
local result = false
while true do
if cmd == 1 then
nesting = nesting + 1
elseif cmd == 2 then
nesting = nesting - 1
if nesting == 0 then
token.put_next(token.create(result and '@firstoftwo' or '@secondoftwo'))
token.put_next(delayed_tok)
return
end
else
result = result or outer_cmds[cmd]
end
tok = token.get_next()
cmd = tok.command
end
end
local call_realhasouter_toks = {
token.new(0, token.command_id'case_shift'), token.new(0, 1), token.new(0, 2), % This is similar to \lowercase{}. It doesn't do anything, but it changes the status of TeX's scanner to allow outer tokens
token.new(i, token.command_id'lua_expandable_call')
}
lua.get_functions_table()[j] = function()
tex.runtoks(function()
token.put_next(call_realhasouter_toks)
end)
end
token.set_lua('hasouter', j)
}
\def\unfold{\umbrella My bald head always dries so slowly.}
\def\unfoldX{My bald head always dries so slowly.}
\outer\def\umbrella{No raindrops on my head, please! }
\hasouter\expandafter{\unfold}{with}{without} outer
\hasouter\expandafter{\unfoldX}{with}{without} outer
\hasouter{\umbrella My bald head always dries so slowly.}{with}{without} outer
\hasouter{My bald head always dries so slowly.}{with}{without} outer
\edef\xxx{\hasouter{My bald head always dries so slowly.}{with}{without} outer}
\show\xxx
\edef\xxx{\hasouter{\umbrella My bald head always dries so slowly.}{with}{without} outer}
\show\xxx
\end{document}
This introduces \countouters{<macro>}, which will count the occurrences of \outer macros contained in the argument. The result is stored in the counter outercnt. I had to make an adjustment to the logic of the tokcycle token-digesting package (inserting an extra trapping level to look for pre-digested \outer macros) to accomplish this.
The macro to be tested can itself contain macros and groups. However, \outer macros are not allowed within groups inside the defined macro. So, for example in the MWE below, the macro \unfold could not be defined as \def\unfold{\textit{\umbrella}}, if \umbrella were later defined as an outer macro.
\documentclass{article}
\usepackage{listofitems,tokcycle}
\newcounter{outercnt}
\makeatletter
\let\detect@CANTabsorbA\detect@CANTabsorb
\long\def\detect@CANTabsorb{%
\expandafter\def\expandafter\mytmp\expandafter{\meaning\tc@next}%
\expandafter\setsepchar\expandafter{\detokenize{\outer macro:->}}%
\readlist\mylist{\mytmp}%
\tctestifnum{\listlen\mylist[]>1}%
{\stepcounter{outercnt}\expandafter\@tokcycle\string}%
{\detect@CANTabsorbA}%
}
\makeatother
\tokcycleenvironment\countouterenv{}{\processtoks{##1}}{}{}
\newcommand\countouters[1]{\setcounter{outercnt}{0}%
\expandafter\countouterenv#1\endcountouterenv}
\begin{document}
\def\unfold{1. \umbrella My bald head \textit{always} dries so slowly.
My \umbrella}
\outer\def\umbrella{No raindrops on my head, please! }
\countouters{\unfold}
Outer occurrences = \theoutercnt
\end{document}