What is "expansion"?
Expansion is when: You take a thing that is expandable and ... well ... you expand it.
For example, if you define \newcommand\Aa{\Bb\relax}, then:
- \Aa is (once) expandable, and its expansion is \Bb\relax.
- \Aa is not fully expandable, because its first expansion contains \relax, which is unexpandable.
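To see the difference between one expansion step and full expansion, here is a small sketch. The helper names \onestep and \full, and the definition of \Bb as B, are made up for illustration; they are not part of the example above.

```latex
\documentclass{article}
\newcommand\Bb{B}          % assumed definition, so that \Bb itself is expandable
\newcommand\Aa{\Bb\relax}
\begin{document}
% Expand \Aa exactly once before \def grabs the replacement text.
\expandafter\def\expandafter\onestep\expandafter{\Aa}
% \edef keeps expanding until only unexpandable tokens remain.
\edef\full{\Aa}
\typeout{one step: \meaning\onestep}% macro:->\Bb \relax
\typeout{full: \meaning\full}%        macro:->B\relax
\end{document}
```

The two \typeout lines write the stored replacement texts to the terminal and the .log file, so you can compare them without typesetting anything.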
In short, expansion is the process of substituting a macro's contents in place of the macro call. What the TeX core (and hence LaTeX) does is that it:
- Expands whatever expandable it sees;
- Processes whatever un-expandable it sees.
So, for instance, the TeX definition primitive \def is unexpandable, so TeX processes it in the following way (ignoring any special cases): take the first thing after \def; this is the name of the macro being defined. Take whatever follows up to the first {; this is the parameter text. Then take the following block {...}; this is what the new macro expands to. In this process TeX doesn't expand the things it reads unless it has to, and for \def it does not have to.
On the other hand, when the TeX core sees \newcommand\greeting[1]{Hello #1!} in LaTeX, it starts expanding \newcommand (which is itself a macro, so it is expandable), and after many crazy things (and with me not being quite precise), the sequence \newcommand\greeting[1]{Hello #1!} ends up as \def\greeting#1{Hello #1!}. We already know that \def is unexpandable, and when it is processed it defines a macro called \greeting which takes one parameter and which expands to Hello #1! wherever it is used.
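If you want to check what \newcommand actually produced, \meaning shows the stored definition. This is only a minimal sketch; it also shows one of the imprecisions mentioned above, since the macro turns out to be \long (its argument may contain paragraph breaks).

```latex
\documentclass{article}
\newcommand\greeting[1]{Hello #1!}
\begin{document}
\typeout{\meaning\greeting}% \long macro:#1->Hello #1!
\greeting{world}
\end{document}
```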
So now that we have our \greeting macro, let's use it! We write \greeting{glassbjs}. The TeX core sees it and, knowing that \greeting is expandable and takes one argument, expands the sequence \greeting{glassbjs} into Hello glassbjs!. Since printable characters (letters, spaces, punctuation, ...) are unexpandable by default, the TeX core processes them, which means that it converts them into glyphs from the current font (again, I'm not being precise) and adds them to something called "the current horizontal list". At that point the TeX core is happy and moves on.
If you are looking for a similarity in other languages, expansion is closest to pre-processor substitutions like C's #define f(x) something-here. So the above would be pretty similar to (in C++):
#define greeting(x) cout << "Hello " << x << "!"
However, expansion has some strange rules that do not apply to pre-processing in C. The two main differences are:
Expansion is not pre-processing; it is interleaved with processing. This means that:
- You can redefine things, and the definition used is the one that is valid in the current scope; in this sense, macros are much more similar to variables than to functions: they are sensitive to changes, to grouping, etc.
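The sensitivity to grouping can be sketched as follows. The macro name \who is hypothetical and only serves the illustration.

```latex
\documentclass{article}
\begin{document}
\def\who{world}
Hello \who!           % typesets: Hello world!

{\def\who{TeX}%       local redefinition inside a group
 Hello \who!}         % typesets: Hello TeX!

Hello \who!           % the group has ended, so this is Hello world! again
\end{document}
```

Each use of \who is expanded with whatever meaning it has at that moment, which is why the same input produces different output inside and outside the group.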
Expansion is simply half of how TeX works, so this is not going to be an easy answer.
First of all, TeX operates on tokens (see What is a token?). A token can be either expandable or unexpandable; if it's unexpandable, it obviously never expands, otherwise it can expand. What an expandable token expands to depends on how it has been made part of the language.
Some primitive tokens (that is, those whose meaning is predeclared when TeX starts from scratch) are expandable, some aren't. Among the expandable primitives one finds \expandafter, \if and all other conditionals; this was predictable, as these primitives are used to control the expansion process itself. The expansion of an expandable primitive can be empty, but it can trigger some other action. All macros defined with \def (or its variants \gdef, \edef and \xdef) are expandable and their expansion is precisely the replacement text.
So, assume we have
\def\test{some text}
or, in LaTeX, \newcommand\test{some text}, which is pretty similar. Under normal operations, if TeX finds
Here is \test
it will absorb Here<space>is<space> (eight unexpandable tokens) and stop at \test, examine it and decide that it is expandable (TeX uses the current meaning of \test), so it replaces it by some text and goes on starting from “s”. Since this token is unexpandable, it's handed over to another phase of processing and TeX goes on with “o” and so on.
Now, suppose that we have
\def\test{\foo{o}me text}
\def\foo#1{s#1}
an input such as
Here is \test
would make TeX hand over tokens up to finding \test, which it would replace with
\foo{o}me text
and \foo would now be expanded, after having absorbed its argument o, to so.
I said “normal operations”, because in some cases expansion is inhibited:
- when TeX is looking for macro arguments;
- when TeX is doing a definition with \def or \gdef, with regard to the macro to be defined, the parameter text and the replacement text;
- when TeX is doing a definition with \edef or \xdef, but this only concerns the macro to be defined and the parameter text, not the replacement text;
- when TeX is examining the token to be defined after \let, \chardef, \mathchardef and all other \...def commands;
- when TeX is examining the “right hand side” of a \let;
- when TeX is absorbing the tokens for \write, \uppercase, \lowercase, \detokenize, \unexpanded or an assignment to a token register.
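One of these cases is easy to observe with \uppercase. The sketch below uses a made-up macro \word; it relies on the fact that TeX does expand tokens while it is still looking for the opening brace of \uppercase, which is why the \expandafter trick works.

```latex
\documentclass{article}
\begin{document}
\def\word{abc}
\uppercase{\word}              % typesets abc: \word is not expanded while being absorbed
\uppercase\expandafter{\word}  % typesets ABC: \word is expanded before \uppercase sees it
\end{document}
```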
Sometimes we talk about "full expansion". What's that? When we do
\edef\baz{<tokens>}
the <tokens> are subject to expansion, the resulting token list is subject to expansion and so on, until only unexpandable tokens remain, always starting from the next token as explained before. After this, the definition is performed, using the token list thus obtained as replacement text. Thus, with the definitions above,
\edef\baz{\test}
would expand \test to \foo{o}me text and this to some text, resulting in the same as if we said \def\baz{some text}. Why would we want to do it this way? Because TeX uses the current meaning of the tokens; if we said
\def\baz{\test}
and later changed the definition of \foo (or of \test, of course), the macro \baz would expand to something else; with \edef we're sure it expands to some text.
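The difference shows up as soon as \foo changes after the definitions. In this sketch \lazy and \eager are hypothetical names standing in for the \def and \edef versions of \baz.

```latex
\documentclass{article}
\begin{document}
\def\foo#1{s#1}
\def\test{\foo{o}me text}
\def\lazy{\test}    % stores the single token \test
\edef\eager{\test}  % stores the characters of "some text"
\def\foo#1{X#1}     % a later change to \foo
\lazy               % typesets Xome text
\eager              % typesets some text
\end{document}
```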
However, \edef doesn't perform assignments. So if we do
\count255=1
\edef\baz{\advance\count255 by 1 \number\count255 }
using \baz would print 1, not 2. This is because \advance and \count are not expandable; character tokens aren't expandable either; \number is expandable and its expansion is the decimal representation of the <number> following it. So this \edef is just equivalent to
\def\baz{\advance\count255 by 1 1}
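If the intention was to store the advanced value, one option (a sketch, not the only way) is to perform the assignment outside the \edef and only let \number be expanded inside it:

```latex
\documentclass{article}
\begin{document}
\count255=1
\edef\baz{\advance\count255 by 1 \number\count255 }
\typeout{\meaning\baz}% macro:->\advance \count 255 by 1 1
% Do the assignment first, then freeze the current value:
\advance\count255 by 1
\edef\baz{\number\count255 }
\typeout{\meaning\baz}% macro:->2
\end{document}
```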
The same happens when, ultimately, TeX expands the token list in a \write: the expansion is performed "all the way", just like in \edef. Expansion of tokens can be inhibited by preceding the one we don't want to expand with \noexpand or by enclosing them as the argument to \unexpanded. Note that this is just a “one shot” activity: if one does
\def\a{A}\def\b{B}\def\c{C}
\edef\foo{\a\b\noexpand\c}
the replacement text for \foo will be AB\c; upon usage of \foo, \c will be expanded according to its current meaning.
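Both ways of inhibiting expansion can be checked with \meaning. In this sketch the group keeps the redefinitions of \a, \b and \c local (LaTeX already uses these names), \quux is a hypothetical macro name, and \unexpanded assumes an e-TeX based engine, which all current LaTeX formats use.

```latex
\documentclass{article}
\begin{document}
\begingroup  % keep the redefinitions of \a, \b, \c local
\def\a{A}\def\b{B}\def\c{C}
\edef\foo{\a\b\noexpand\c}       % only \c is protected
\edef\quux{\a\unexpanded{\b\c}}  % both \b and \c are protected
\typeout{\meaning\foo}%  macro:->AB\c
\typeout{\meaning\quux}% macro:->A\b \c
\endgroup
\end{document}
```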
Also, tokens appearing after \csname and before the matching \endcsname are subject to full expansion; in this case only character tokens must remain, so a \relax is illegal here, while it wouldn't be in \edef or \write.
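A common use of this is building a macro name from pieces. The names \suffix and \greeting below are assumptions chosen for the illustration.

```latex
\documentclass{article}
\begin{document}
\def\suffix{ing}
% \suffix is fully expanded between \csname and \endcsname,
% so this line defines the control sequence \greeting.
\expandafter\def\csname greet\suffix\endcsname{Hello!}
\greeting  % typesets Hello!
\end{document}
```

The \expandafter is needed so that \csname ... \endcsname is turned into the single token \greeting before \def looks for the name of the macro to define.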
Complicated? Yes.
Expansion by example, from The TeXbook (p. 200, answer on p. 328):
EXERCISE 20.2: What is the expansion of \puzzle, given the following definitions?
\def\a{\b}
\def\b{A\def\a{B\def\a{C\def\a{\b}}}}
\def\puzzle{\a\a\a\a\a}
Answer: ABCAB. (The first \a expands into A\def\a{B...}; this redefines \a, so the second \a expands into B..., etc.) At least, that's what happens if \puzzle is encountered when TeX is building a list. But if \puzzle is expanded in an \edef or \message or something like that, we will see later that the interior \def commands are not performed while the expansion is taking place, and the control sequences following \def are expanded; so the result is an infinite string
A\def A\def A\def A\def A\def A\def A\def A\def A...
which causes TeX to abort because the program's input stack is finite. This example points out that a control sequence (e.g., \b) need not be defined when it appears in the replacement text of a definition. The example also shows that TeX doesn't expand a macro until it needs to.
We can see the expansion by setting \tracingmacros=1 (see LaTeX \tracing commands list?) and examining the .log of the following MWE:
\documentclass{article}
\def\a{\b}
\def\b{A\def\a{B\def\a{C\def\a{\b}}}}
\def\puzzle{\a\a\a\a\a}
\begin{document}
{\tracingmacros=1
\puzzle}
\end{document}
With some comments:
\puzzle ->\a \a \a \a \a % Expansion of \puzzle contains five \a's
\a ->\b % Expansion of first \a
\b ->A\def \a {B\def \a {C\def \a {\b }}} % Expansion of \b (adds A to output)
\a ->B\def \a {C\def \a {\b }} % Expansion of second \a (adds B to output)
\a ->C\def \a {\b } % Expansion of third \a (adds C to output)
\a ->\b % Expansion of fourth \a
\b ->A\def \a {B\def \a {C\def \a {\b }}} % Expansion of \b (adds A to output)
\a ->B\def \a {C\def \a {\b }} % Expansion of fifth \a (adds B to output)
The final redefinition of \a (to C\def\a{\b}) is never "used".