How are parameter tokens (#1,#2,...,#9) processed?

TeX macros have two kinds of argument, delimited and undelimited. An undelimited argument is either a single token or, if that token is an explicit open brace (a character of catcode 1), all the balanced text up to the matching } (a character of catcode 2); in the latter case the braces are not passed as part of the argument. So if you have

\def\xxx#1{...#1...}

then after \xxx Z, #1 will be the single token Z, but after \xxx {ab{c}} it will be the 5 tokens ab{c}.
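A minimal plain TeX sketch of the two cases. The body given to \xxx here is only an illustration: it writes its argument, in brackets, to the terminal and log.

\def\xxx#1{\message{[#1]}}
\xxx Z        % #1 is the single token Z: writes [Z]
\xxx {ab{c}}  % #1 is the 5 tokens ab{c}: writes [ab{c}]
\bye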

Delimited arguments are similar but match all tokens up to a specified sequence of tokens (] in your example above). After

\def\yyy#1@?@{...#1...}

then after \yyy abc @?@, #1 is the 4 tokens a, b, c and the space before @?@. The same tokens are passed if the input is \yyy {abc }@?@, because if a delimited argument consists of just a brace group, the outer level of braces is stripped.
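The same kind of check works for the delimited case. Again this is only a sketch, with a hypothetical body for \yyy that writes the argument in brackets; the third call is my addition, showing that braces survive when the argument is more than a single brace group.

\def\yyy#1@?@{\message{[#1]}}
\yyy abc @?@    % writes [abc ]
\yyy {abc }@?@  % outer braces stripped: writes [abc ]
\yyy {abc} @?@  % argument is a group plus a space, so braces stay: writes [{abc} ]
\bye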

\show only ever shows a single token, so \show #1 when #1 is one is the same as \show one, which will show o and typeset ne.
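If you want to inspect a whole argument, one common workaround is to store it in a macro first and \show that macro. The names \test, \testb and \tmp below are just placeholders for this sketch. TeX stops after each \show when run interactively; press return to continue.

\def\test#1{\show#1}
\test{one}   % shows only "the letter o"; ne is then typeset
\def\testb#1{\def\tmp{#1}\show\tmp}
\testb{one}  % shows \tmp=macro: ->one
\bye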


The question

Is the recursive line \expandafter\setarrayItem\fi a kind of implicit loop for parameter token munching?

isn't particularly related to parameters, other than that the \fi closes the \ifx\end#1 test. If #1 was not \end, the macro recursively calls itself in this branch; the branch taken when #1 is \end is empty, so the iteration stops.
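The role of \expandafter in that line can be checked in isolation. In this sketch \grab is a hypothetical one-argument macro standing in for \setarrayItem; without \expandafter it takes the \fi itself as its argument, so the conditional is not closed at that point.

\def\grab#1{\message{[\string#1]}}
\iftrue \grab\fi \fi            % \grab takes \fi as #1: writes [\fi]; the second \fi closes the test
\iftrue \expandafter\grab\fi Z  % \fi is expanded first, so #1 is Z: writes [Z]
\bye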


Parameter tokens #1 to #9 are only relevant at macro definition time, so you're being misled when thinking about them.

The macro \setarray has two undelimited arguments (because the parameter tokens are not separated from each other by anything). This means that TeX will look for two arguments when expanding \setarray.

When looking for an undelimited argument (again, the “delimited” or “undelimited” only refers to how the macro has been defined), TeX skips space tokens until finding a nonspace one. There are two cases:

  1. the nonspace token is not a <left brace>
  2. the nonspace token is a <left brace>

(meaning an explicit { or any other token with category code 1, but let's not complicate things).

In the first case, the nonspace token is substituted for the corresponding parameter in the replacement text. In the second case, TeX continues scanning the input looking for the matching <right brace> (so keeping track of brace nesting). When it has found it, it strips off those outer braces and substitutes the whole set of absorbed tokens in place of the corresponding parameter.

Thus, with \def\foo#1#2{-#1-#2-}, the calls

\foo\bar\x
\foo\bar{abc}
\foo{abc}\bar
\foo{abc}{def}

will result in delivering, respectively,

-\bar-\x-
-\bar-abc-
-abc-\bar-
-abc-def-

to the main token list for further processing.
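You can look at such a result without executing it by expanding \foo exactly once inside a definition and then using \show. This is a sketch; \result is a name I made up for the purpose. The last call also shows the space-skipping rule at work.

\def\foo#1#2{-#1-#2-}
% the \expandafter chain expands \foo once before \def is performed
\expandafter\def\expandafter\result\expandafter{\foo\bar{abc}}
\show\result  % \result=macro: ->-\bar -abc-  (\show prints a space after \bar)
\expandafter\def\expandafter\result\expandafter{\foo {abc} {def}}
\show\result  % ->-abc-def-  (the space before {def} was skipped)
\bye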

Let's see what \setarray\groups{{one}{two}{three}} does; by the rules above, #1 is replaced with \groups and #2 by {one}{two}{three}, so the new token list will be

\itemidx=0 \edef\tmp{\string\groups}\setarrayItem{one}{two}{three}\end

The two assignments are performed and we're left with

\setarrayItem{one}{two}{three}\end

According to its definition, \setarrayItem has one argument; the rules above say it's {one} (but the braces will be stripped off), so we get

\advance\itemidx by1
   \ifx\end one\else
      \expandafter\def\csname data:\tmp:\the\itemidx\endcsname{one}%
      \expandafter\setarrayItem\fi
{two}{three}\end

(line breaks and % don't really make sense in token lists; I use them just for clarity). The assignment is performed and disappears (\itemidx will contain the value 1). Then the \ifx test is performed, comparing \end with o; since the two tokens are different, the tokens up to and including \else are swallowed, so we're left with

\expandafter\def\csname data:\tmp:\the\itemidx\endcsname{one}%
\expandafter\setarrayItem\fi
{two}{three}\end

OK, \expandafter acts on \csname which will build a symbolic token; we'll be left with

\def\data:\groups:1{one}\expandafter\setarrayItem\fi{two}{three}\end

where, remember, \data:\groups:1 is a single token. The definition is performed and we're left with

\expandafter\setarrayItem\fi{two}{three}\end

Here \expandafter expands \fi (that leaves nothing), so we obtain

\setarrayItem{two}{three}\end

and the same as before will be repeated causing the definition of \data:\groups:2 and \data:\groups:3. At the next iteration, we'll be left with

\setarrayItem\end

and now we'll have

\advance\itemidx by1
   \ifx\end\end\else
      \expandafter\def\csname data:\tmp:\the\itemidx\endcsname{\end}%
      \expandafter\setarrayItem\fi

The counter is advanced, then \end is compared to \end: oh, the test returns true! So nothing is removed except the test tokens, and we're left with

\else
\expandafter\def\csname data:\tmp:\the\itemidx\endcsname{\end}%
\expandafter\setarrayItem\fi

What's the expansion of \else? It consists of swallowing everything up to the matching \fi and making everything found disappear. End of the recursion. To recapitulate, we have defined three macros (with complicated names).

In the case of

\setarray\nogroups{one two three}

the routine will do the definitions

\def\data:\nogroups:1{o}
\def\data:\nogroups:2{n}
\def\data:\nogroups:3{e}
\def\data:\nogroups:4{t}
\def\data:\nogroups:5{w}
\def\data:\nogroups:6{o}
\def\data:\nogroups:7{t}
\def\data:\nogroups:8{h}
\def\data:\nogroups:9{r}
\def\data:\nogroups:10{e}
\def\data:\nogroups:11{e}

The spaces between e and t and between o and t are ignored, because of the rule by which undelimited arguments are looked for.

The macro \getarray[1]\groups will build the control sequence named

\data:\groups:1

and its expansion will deliver one; with \getarray[1]\nogroups, the control sequence

\data:\nogroups:1

is built and its expansion delivers o.
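To replay all of this yourself, here is a plain TeX sketch of the macros. Note that I have reconstructed the definitions from the expansion steps traced above, so they may differ in detail from the code in the question.

\newcount\itemidx
\def\setarray#1#2{\itemidx=0 \edef\tmp{\string#1}\setarrayItem#2\end}
\def\setarrayItem#1{\advance\itemidx by1
  \ifx\end#1\else
    \expandafter\def\csname data:\tmp:\the\itemidx\endcsname{#1}%
    \expandafter\setarrayItem\fi}
\def\getarray[#1]#2{\csname data:\string#2:#1\endcsname}
\setarray\groups{{one}{two}{three}}
\setarray\nogroups{one two three}
% writes [one][three][t] to the terminal and log
\message{[\getarray[1]\groups][\getarray[3]\groups][\getarray[4]\nogroups]}
\bye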


Now you should compare all of the above with this rough implementation in expl3, where the code is almost self-explanatory (but less fun):

\usepackage{expl3}
\ExplSyntaxOn
\cs_new_protected:Npn \setarray #1 #2
 {
  \tl_clear_new:N #1
  \tl_set:Nn #1 { #2 }
 }
\cs_new:Npn \getarray [#1] #2
 {
  % brace the index so that multi-token values such as -1 or 10 are passed whole
  \tl_item:Nn #2 { #1 }
 }
\ExplSyntaxOff

Not that I recommend the syntax \getarray[1]\groups, as the square brackets seem extraneous to the context. This even allows calling \getarray[-1]\groups, out of the box, to access the last item in the array.
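For completeness, a small test document along these lines. It is only a sketch: it repeats the two definitions with the index passed to \tl_item:Nn in braces, which is what makes a multi-token index such as -1 reach the function as a whole.

\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\cs_new_protected:Npn \setarray #1 #2
 {
  \tl_clear_new:N #1
  \tl_set:Nn #1 { #2 }
 }
\cs_new:Npn \getarray [#1] #2
 {
  \tl_item:Nn #2 { #1 }
 }
\ExplSyntaxOff
\begin{document}
\setarray\groups{{one}{two}{three}}
\getarray[1]\groups, \getarray[-1]\groups % typesets: one, three
\end{document}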

Oh, and

\setarray\foo{{\textbf{a}}{\textit{a}}{\textsf{a}}}

\edef\baz{\getarray[1]\foo}

would work and store \textbf{a} in \baz. Try it with the (admittedly clever) code by wipet.