How to safely check by means of expansion-methods whether a list of tokens contains a token which is defined in terms of \outer?

What did I get myself into. . .

There are three possible ways:

  1. \def\outer{} and go on with your life. Hands down the best choice (in all aspects).
  2. I bet it's easier to implement \suppressoutererror in TeX than to write TeX code to do that.
  3. It's 2020, the year Knuth will address reported bugs in TeX. Submit \outer for consideration.
  4. I said three!
  5. I suppose if you are still reading you expect me to show some code. Read on, then :-)
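
For the record, option 1 can be sketched in a few lines of plain TeX. This is only an illustration, and it assumes the redefinition happens before the macros you care about are defined: anything that was already made \outer earlier (plain TeX's \newcount, for example) stays \outer.

```tex
% Sketch of option 1: turn \outer into a no-op prefix.
\def\outer{}
% From here on, "\outer\def" behaves exactly like "\def".
\outer\def\umbrella{No raindrops on my head, please! }
% \umbrella may now appear inside another definition without an error.
\def\unfold{\umbrella My bald head always dries so slowly.}
\message{\meaning\umbrella}% writes: macro:->No raindrops on my head, please! 
\bye
```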

The problem with \outer macros is that they are meant to be used only. You are not supposed to do things with them. Anything fancy you try to do, TeX will yell at you. This rules out one of TeX's most powerful features: macros. You simply can't use them on an \outer macro.

And then you want expandability, so you just ruled out most of TeX's primitives as well (including the most useful contender, \let, which can not only look at an \outer macro, but can also remove it). Not even ε-TeX is helpful here, as \detokenize and friends cannot have an \outer control sequence in them (so yes, the code below works in Knuth TeX).
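
To make the remark about \let concrete, here is a minimal plain TeX sketch. \parasol is a hypothetical name used only for illustration, and the lines are assumed to sit at the top level of the file (not inside a macro argument or a definition), which is exactly why this route is not expandable.

```tex
\outer\def\umbrella{No raindrops on my head, please! }
% \let reads its tokens without the \outer check,
% so it can look at an \outer macro: \parasol is a copy and is still \outer.
\let\parasol=\umbrella
% It can also remove it: \umbrella is now \relax and no longer \outer.
\let\umbrella=\relax
% This definition is accepted now.
\def\unfold{\umbrella My bald head always dries so slowly.}
\bye
```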

This leaves you with minimal resources. Only one primitive might help you now: \meaning. The code below misuses \meaning to try to find out if there is an \outer control sequence in the argument. . .
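
A small plain TeX sketch of why \meaning is usable here: it is expandable and it accepts an \outer control sequence as its operand, even inside \edef or \message. \seen is a hypothetical scratch macro.

```tex
\outer\def\umbrella{No raindrops on my head, please! }
% \meaning may look at an \outer token, also in expansion-only contexts.
\message{\meaning\umbrella}
% writes: \outer macro:->No raindrops on my head, please! 
\edef\seen{\meaning\umbrella}% no "Forbidden control sequence" error
\bye
```

Everything \meaning delivers is a string of character tokens (catcode 12, spaces with catcode 10), which is what the scanning macros then have to pick apart.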

What It can do:

There are two macros, \ifoutertl and \ifouterarg. The first one expands the argument token list once, to expose its contents, and the second requires braces (catcode 1 and 2) around the argument. In your example you'd use them like:

\ifoutertl\unfold{with}{without} outer
\ifouterarg{\umbrella My bald head always dries so slowly.}{with}{without} outer
\ifouterarg{My bald head always dries so slowly.}{with}{without} outer

and it would print: with outer with outer without outer.

How It works:

Abandon all hope, ye who enter here.


When you do:

\ifouterarg{This is \outer: \umbrella}{T}{F}

the code will start by removing the leading { and will hit the first token with \meaning:

\some@macro the letter This is \outer: \umbrella}{T}{F}

and then the \some@macro will do some checks for the \escapechar (essentially ignored in the process), then will look at the immediate next t and will try to find out what to do with it. A t will be either a letter (the letter <something>) or a character (the character <something>). The code then processes that and moves on, doing the same with the rest. Then the code arrives at \outer, and hits it with \meaning:

\some@macro \outer: \umbrella}{T}{F}

(\outer is a primitive, so \meaning\outer is \₁₂o₁₂u₁₂t₁₂e₁₂r₁₂, that is, six character tokens of catcode 12). In this case, the code will see the \ and will do its thing with \outer[note 1]. Later, it arrives at \umbrella and hits it with \meaning:

\some@macro \outer macro:->No raindrops on my head, please! }{T}{F}

and this time it will see the sequence of character tokens \outer macro:->, in which case it will understand that an \outer control sequence was in the argument, and then a bunch of expansion steps later it will leave T as a result. If no \outer control sequence was found F would be left as result. This “bunch of expansion steps” means hitting all the tokens in No raindrops on my head, please! with \meaning[note 2] and removing them as described earlier.

Analysing each token (after \meaninging it)

After a token is hit with \meaning (remember, we cannot look at a token before hitting it with \meaning) the code proceeds to analyse it to find out what's to be done. We're interested in the prefix of a control sequence, so the very first thing the code looks for is the prefixes \protected, \long and \outer. Lucky us, \outer is always the last prefix to appear in the list, so the code just skips the other two and keeps looking for more. If it finds \outer, we look for \outer macro:->, which is guaranteed to be there[note 3].
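
The prefix order can be checked with a short sketch. It assumes an engine with the ε-TeX extensions (for instance pdftex), because \protected does not exist in Knuth TeX; \parasol is a hypothetical name.

```tex
% The prefixes can be typed in any order ...
\outer\long\protected\def\parasol{No raindrops on my head, please! }
% ... but \meaning always lists them as \protected, \long, \outer.
\message{\meaning\parasol}
% writes: \protected\long\outer macro:->No raindrops on my head, please! 
\bye
```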

In case the token is not a macro (thus no macro:-> text), then we keep looking if it's one of TeX's 10 character tokens (with catcode 1, 2, 3, 4, 6, 7, 8, 10, 11, or 12). If the token is one of these, then the code simply removes it and proceeds scanning.

If it's not (for instance, a primitive, whose meaning is itself, a \count register, whose meaning is \count<number>, and a couple others), then the code gives up and starts again[note 4] (it's not an \outer control sequence anyway, so we don't care). This time, it will start by doing \meaning again (for example, in \count123) and then the first token will be the character \, and it will find its way out.
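
For reference, this is roughly what \meaning produces in the cases just described. A minimal plain TeX sketch; \mycount is a hypothetical register name introduced only for the example.

```tex
\countdef\mycount=123
% One example each: a letter, another character, a primitive, a \count register.
\message{[\meaning a] [\meaning 1] [\meaning\relax] [\meaning\mycount]}
% writes: [the letter a] [the character 1] [\relax] [\count123]
\bye
```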


I'm not sure you noticed, but at no point in the process did we tell the macro when to end.
It is doing \meaning on things and looking at the result. So here come the two exceptions to the process above: If the token was a begin-group character, then the code starts a (sort of) new level of expansion, which will end at the next end-group character. Any \outer control sequence found in there will be reported to the upper expansion level, so you can nest braces to your heart's content :-)
If braces are balanced and an end-group character is found, the end of the token list is reached, and it stops, selecting the proper conditional branch.

To keep track of nesting levels and whether an \outer control sequence was found, the macro that commands the scanning process does something like (pseudocode):

  \expandafter \after_egroup_action
  \expandafter \outer_found_boolean
    \romannumeral \scan

When \scan finds another begin-group character, a new layer (\action\boolean\scan) is placed, and the code keeps scanning. When an end-group character is found, a \ud@exp@end is inserted, stopping \romannumeral, and control passes to the \after_egroup_action macro, which will either keep scanning the token list with the \outer_found_boolean, or end the process and use \outer_found_boolean to pick the proper conditional branch.
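
The \romannumeral mechanism itself can be shown in isolation. The following is a toy sketch in plain TeX, not the code of this answer: \stop, \scan and \result are hypothetical names, and the "scanner" merely discards tokens up to the first full stop. The point is that \romannumeral keeps expanding until it has read a complete number, so inserting "0 " ends the whole loop and leaves nothing behind.

```tex
% "0 " is a complete number, so it ends \romannumeral, which then yields nothing.
\def\stop{0 }
% Discard one token at a time until a "." is found.
\def\scan#1{\ifx#1.\expandafter\stop\else\expandafter\scan\fi}
% One expansion of \romannumeral runs the whole loop before \def sees anything.
\expandafter\def\expandafter\result\expandafter{\romannumeral\scan abc.done}
\message{\meaning\result}% writes: macro:->done
\bye
```

Because everything happens inside a single \romannumeral expansion, one \expandafter in front of each layer is enough to run an arbitrarily long scan, which is what makes the layered \action\boolean\scan arrangement workable.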


  • 1: Since the code cannot look at the actual control sequence, the process is not completely robust. Suppose here:
    %                  V------V
    \ifouterarg{\outer macro:->}{T}{F}
    if you managed to change the catcode of the marked tokens to 12, then yes, you managed to fool the code into thinking that you had an \outer control sequence. I doubt it is possible to overcome this problem: we can't examine \outer (and find out it's the primitive \outer) beforehand without the risk of grabbing an \outer control sequence. And once we hit it with \meaning, it is indistinguishable from an \outer macro. So yes, it's not foolproof, sorry.
  • 2: Yes, the code is slow. Awfully slow. It has to hit each token with \meaning, so if you have \def\a{\b\b\b\b. . . lots . . .\b\b\b\b\b} and \def\b{<something awfully long>}, then yes, it will do \meaning on every \b, and then \meaning on every token in \b, which might escalate quickly. Again, I doubt this can be optimised. Don't take it wrong, though: the code will not expand forever. Any macro in the argument is expanded with \meaning, but everything else is a character, which will end up being scanned and removed.

  • 3: That is, if the \outer macro:-> comes from the \meaning of an \outer control sequence. If you take the example in note 1 and make it read \ifouterarg{\outer macro:>}{T}{F} instead, then the code will expand to F.

  • 4: Yes, the code could be optimised to know more primitive tokens other than the character tokens, so that (taking the example of \count123) instead of consuming each of \, c, o, . . . one by one (with \meaning and a large loop), it would see that it's \count<something> and take a shortcut. Implementing this is left as an exercise for the reader ;-)

The code is probably not what can be called robust, but it gets the job done. At this point I'm not really sure there is a better way to do this. Much of it was written while I was changing my mind about how it would work, though, so there are probably redundancies and it could be improved a bit for speed. Not much, though, I think. Proceed with caution!

And if you managed to read all the way to here, congrats! Here's the code:

% Utilities
\def\ud@zap@space#1{\ud@@zap@space#1 \@empty}
\def\ud@@zap@space#1 #2{#1%
\ud@usetwo{\let\ud@sptoken= }{ }
% User-level macros
% Internal macros
  \ifnum\ifnum\escapechar<0   0\else 1\fi
        \ifnum\escapechar>255 0\else 1\fi=0
  \ifcase0\if #1p1\fi \if #1l2\fi
          \if #1o3\fi \if #1m4\fi \ud@sptoken
  \or \expandafter\ud@scan@string@p % \protected
  \or \expandafter\ud@scan@string@l % \long
  \or \expandafter\ud@scan@string@o % \outer
  \or \expandafter\ud@scan@string@m % macro :->
    \expandafter\edef\csname ud@scan@string@\ud@tmp@tl\endcsname##1{%
      \noexpand\ifx ##1\ud@test@tokn
          \csname ud@scan@string@\ud@tmp@tl#2\endcsname
  \ud@newstring #1{end}\relax
  \expandafter\def\csname ud@scan@string@\ud@zap@space{#1}end\endcsname}
\ud@new@scan@string{outer macro:->}{\ud@return@true@scanner}
\ud@usetwo{\def\ud@gobble@two@spaces}{ } {}
\ud@new@scan@string{begin-group character}{\ud@gobble@char@do\ud@scan@bgroup}
\ud@new@scan@string{end-group character}{\ud@gobble@char@do\ud@scan@egroup}
\ud@new@scan@string{math shift character}{\ud@gobble@char@return}
\ud@new@scan@string{alignment tab character}{\ud@gobble@char@return}
\ud@new@scan@string{macro parameter character}{\ud@gobble@char@return}
\ud@new@scan@string{superscript character}{\ud@gobble@char@return}
\ud@new@scan@string{subscript character}{\ud@gobble@char@return}
\ud@new@scan@string{blank space}{\expandafter\ud@return@same@scanner\ud@gobble@two@spaces}
\ud@new@scan@string{the letter}{\ud@gobble@char@return}
\ud@new@scan@string{the character}{\ud@gobble@char@return}
  \expandafter\ifx\csname ud@scan@string@#1\endcsname\relax
    \csname ud@scan@string@#1\expandafter\endcsname
% keyword forks
  \begingroup \escapechar-1
    \ifx ##1#4\expandafter#2%
    \else \ifx ##1#5\expandafter\expandafter\expandafter#3%
          \else \expandafter\expandafter\expandafter\ud@return@same@scanner
\ud@set@fork@string{ma}{c}{t} % macro / math
\ud@set@fork@string{macro}{p}{:} % macro parameter / macro:->
\ud@set@fork@string{the}{c}{l} % the character / the letter
\ud@set@fork@string{b}{l}{e} % blank / begin

% -----
% Tests
% -----

\def\rain{My bald head is still wet.}
\def\unfold{\umbrella My bald {head always} dries so slowly.}
\outer\def\umbrella{No raindrops on my head, please! }
0\ifouterarg{\newcount}{T}{F} (T)\par
1\ifouterarg{\abc}{T}{F} (F)\par
2\ifouterarg{\zzz}{T}{F} (F)\par
3\ifoutertl\unfold{T}{F} (T)\par
4\ifoutertl\rain{T}{F} (F)\par
5\ifouterarg{No raindrops on my head, please! }{T}{F} (F)\par
6\ifouterarg{\umbrella My bald {head always} dries so slowly.}{T}{F} (T)\par
7\edef\tmpa{\ifoutertl\unfold{T}{F}}\meaning\tmpa (T)\par
8\edef\tmpa{\ifouterarg{\umbrella corp.}{T}{F}}\meaning\tmpa (T)\par
9\edef\tmpa{\ifouterarg{\zombies!}{T}{F}}\meaning\tmpa (F)\par

Let's try a Lua solution. We could use \suppressoutererror to make scanning easier but that wouldn't be much fun, so we scan individual tokens instead and manually try to keep track of nested braces.

For every scanned token, Lua can access the "command id". This is kind of a generalization of catcodes. In particular, every catcode 1 ({) token has id 1, every catcode 2 (}) token has id 2, and every token which would invoke an \outer macro has the id returned by token.command_id'outer_call' or token.command_id'long_outer_call'. So for every token we only have to check whether it has any of these command ids. For id 1 we increase the nesting level, for id 2 we decrease it, and if one of the other two ids is found we remember to return true at the end:

  local i = luatexbase.new_luafunction'hasouter'
  % The following creates a table outer_cmds, such that
  % outer_cmds[i] is true iff i is an id corresponding to
  % a call to an \outer macro
  local outer_cmds = {
    [token.command_id'outer_call'] = true,
    [token.command_id'long_outer_call'] = true,
  }
  lua.get_functions_table()[i] = function() % This function will be executed if we use `\hasouter`
    local tok = token.scan_token() % scan_token applies full expansion until the first non-expandable token is found. This allows e.g. \hasouter\expandafter{...}
    local cmd = tok.command % Look at the command code
    if cmd \csstring\~= 1 then % \csstring\~ makes sure that TeX does not expand ~.
      token.put_next(tok) % If we read a wrong character, putting it back ensures that TeX gets less confused if the user decides to continue after the error.
      error[[Argument must start with \csstring\{]]
    end
    local nesting = 0
    local result = false % This will become true if we find an \outer call
    while true do % An endless loop. This will still terminate because we return early if nesting becomes 0 again
      if cmd == 1 then % tok is equivalent to `{`. Increase the nesting level.
        nesting = nesting + 1
      elseif cmd == 2 then % tok is equivalent to `}`. Decrease the nesting level.
        nesting = nesting - 1
        if nesting == 0 then
          % We want to expand to the first or second parameter depending on result, so we insert @first/secondoftwo
          token.put_next(token.create(result and '@firstoftwo' or '@secondoftwo'))
          return
        end
      else
        result = result or outer_cmds[cmd] % If result is already true, don't change anything. Otherwise make it true if cmd corresponds to an outer call
      end
      tok = token.get_next() % Continue with the next token. get_next applies no expansion.
      cmd = tok.command
    end
  end
  token.set_lua('hasouter', i) % Define \hasouter to execute the function above

\def\unfold{\umbrella My bald head always dries so slowly.}
\def\unfoldX{My bald head always dries so slowly.}
\outer\def\umbrella{No raindrops on my head, please! }
\hasouter\expandafter{\unfold}{with}{without} outer
\hasouter\expandafter{\unfoldX}{with}{without} outer
\hasouter{\umbrella My bald head always dries so slowly.}{with}{without} outer
\hasouter{My bald head always dries so slowly.}{with}{without} outer

As Phelype Oleinik mentioned in a comment, this does not actually work inside \edef, because there get_next enforces the restriction that no \outer macro may appear inside a macro definition.

In modern LuaTeX versions (e.g. starting with TeX Live 2020 or when compiled with lualatex-dev in TeX Live 2019), this can be worked around by using tex.runtoks: executing \lowercase{} or something similar doesn't do anything, but it returns TeX's scanner to a normal state where \outer macros are accepted. Of course \lowercase isn't expandable, but runtoks allows non-expandable things to be used in an expandable context.

  local i = luatexbase.new_luafunction'realhasouter'
  local j = luatexbase.new_luafunction'hasouter'
  local outer_cmds = {
    [token.command_id'outer_call'] = true,
    [token.command_id'long_outer_call'] = true,
  lua.get_functions_table()[i] = function()
    local delayed_tok = token.get_next()
    local tok = token.scan_token()
    local cmd = tok.command
    if cmd \csstring\~= 1 then
      error[[Argument must start with \csstring\{]]
    local nesting = 0
    local result = false
    while true do
      if cmd == 1 then
        nesting = nesting + 1
      elseif cmd == 2 then
        nesting = nesting - 1
        if nesting == 0 then
          token.put_next(token.create(result and '@firstoftwo' or '@secondoftwo'))
        result = result or outer_cmds[cmd]
      tok = token.get_next()
      cmd = tok.command
  local call_realhasouter_toks = {, token.command_id'case_shift'),, 1),, 2), % This is similar to \lowercase{}. It doesn't do anything, but it changes the status of TeX's scanner to allow outer tokens, token.command_id'lua_expandable_call')
  lua.get_functions_table()[j] = function()
  token.set_lua('hasouter', j)
\def\unfold{\umbrella My bald head always dries so slowly.}
\def\unfoldX{My bald head always dries so slowly.}
\outer\def\umbrella{No raindrops on my head, please! }
\hasouter\expandafter{\unfold}{with}{without} outer
\hasouter\expandafter{\unfoldX}{with}{without} outer
\hasouter{\umbrella My bald head always dries so slowly.}{with}{without} outer
\hasouter{My bald head always dries so slowly.}{with}{without} outer
\edef\xxx{\hasouter{My bald head always dries so slowly.}{with}{without} outer}
\edef\xxx{\hasouter{\umbrella My bald head always dries so slowly.}{with}{without} outer}

This introduces \countouters{<macro>}, which counts the occurrences of \outer macros contained in the argument. The result is stored in the counter outercnt. I had to make an adjustment to the logic of the tokcycle token-digesting package (inserting an extra trapping level to look for pre-digested \outer macros) to accomplish this.

The macro to be tested can itself contain macros and groups...however, \outer macros are not allowed within groups inside the defined macro. So, for example in the MWE below, the macro \unfold could not be defined as \def\unfold{\textit{\umbrella}}, if \umbrella were later defined as an outer macro.


  \expandafter\setsepchar\expandafter{\detokenize{\outer macro:->}}%



\def\unfold{1. \umbrella My bald head \textit{always} dries so slowly.
  My \umbrella}
\outer\def\umbrella{No raindrops on my head, please! }


Outer occurrences = \theoutercnt 
