When is \catcode executed?
TeX absorbs one record (typically a line) at a time, but doesn’t tokenize it immediately. Instead, it first normalizes the line: it removes the end-of-line character(s) used by the operating system, if any, and appends the character corresponding to the current value of \endlinechar.
Then it reads the line, tokenizing the input only as needed to determine what comes next.
For instance, if it finds \foo {xyz} and \foo is a single-argument macro, it will skip the space and tokenize the opening brace and everything up to and including the matching closing brace. Going on with the example, if the expansion of \foo includes something like \catcode\endlinechar=12, the next end-of-line character not yet tokenized will be interpreted as having category code 12. So catcode changes are performed in the stomach, but they can and will influence input that has not yet entered the mouth, that is, input not yet tokenized.
However, keep in mind that TeX executes no command and expands no macro while it’s absorbing the replacement text of a macro defined with \def. This is mainly what Knuth means when he refers to characters already inside token lists.
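As a small sketch of the points above (plain TeX; this particular definition of \foo and the choice of ~ are assumptions made only for illustration), the following file shows a catcode change that misses an already tokenized argument but affects the rest of the line:

```tex
% \foo stands for the one-argument macro from the text; this definition is hypothetical.
% While the replacement text is absorbed, \catcode is only stored, not executed.
\def\foo#1{\catcode`\~=12 #1}
% The argument {a~b} is tokenized before \foo is expanded, so its ~ is still
% the active tie. The ~ in "c~d" is tokenized afterwards, with category code 12.
\foo {a~b} c~d
\bye
```

If this reading is right, the first ~ still acts as a tie, while the second one is typeset as character 126 of the current font (a tilde accent in cmr10).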
A curious example:
x\obeyspaces x\bye
You may know that \obeyspaces is simply \catcode`\ =\active and that Plain TeX sets the active space character to expand to a catcode 10 space. You can check that the output is “x x”, with a space between the two letters.
Wait! Doesn't TeX ignore spaces after control words? Yes and no. When tokenizing the input, TeX determines that the character after the final s of \obeyspaces is not a letter (that is, the internal table of catcodes doesn't assign it the code 11), so it stops scanning the control sequence name and the input scanner goes into state “skipping blanks”, but the space has not yet been tokenized. The token \obeyspaces is a parameterless macro, so it's expanded and the category code change is performed. Now TeX needs more tokens, so it tokenizes the next character, which happens to be a space, and assigns it category code 13 as instructed by the (just changed) catcode table. Since this character doesn't have catcode 10, the state changes from “skipping blanks” to “middle of line”. Then TeX expands the active character and a space appears in the output.
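To see that the timing matters, here is a hedged variation in plain TeX (the macro name \demo is made up for this sketch). In the second case the space is dropped while the replacement text is absorbed, long before \obeyspaces is executed:

```tex
% Case 1: the space is still untokenized when \obeyspaces is executed,
% so it is read as an active character and prints a space.
{x\obeyspaces x}\par
% Case 2: the space after \obeyspaces is skipped while the definition is
% absorbed (it still has catcode 10 then), so \demo contains no space at all.
\def\demo{x\obeyspaces x}
{\demo}\par
\bye
```

The braces only keep the catcode change local. The first paragraph should read “x x” and the second “xx”.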
According to the TeXbook, characters in a file are first converted to tokens with catcodes ("mouth") and then any nonexpandable commands are executed ("stomach").
You have to be careful here about what is meant by “and then”.
It is true that characters are converted into tokens by the “mouth”, and that these tokens are passed to the “stomach”. But if you interpreted it as saying that all characters in a file are first tokenized, and only then (after everything has been tokenized) the “stomach” comes into play — then that's not true. Instead, the two systems interact: the “mouth” may pass a command to the “stomach”, which takes some actions and then asks the “mouth” for more tokens, and so on. The actions taken in the “stomach” can influence the future workings of the “mouth”.
It may help to consider the other names of the “mouth” and “stomach”: they are called the “input processor” (plus the “expansion processor”) and the “execution processor” in TeX by Topic, and the “syntactic routines” and “semantic routines” by Knuth in the TeX program.
To a first approximation, you can think of the main control loop of TeX as a hungry stomach, simply executing commands one after another, and repeatedly asking the mouth for tokens either after completing a previous command, or while executing a particular command. For example, suppose you had the following input file:
hi\hskip 10 pt\end
Then the stomach gets
- the token h₁₁ (which it “executes” essentially as a command to typeset that character, that is, it puts that character in the appropriate list).
- the token i₁₁ (which it “executes”, same as above).
- the token \hskip: at this point, the stomach executes the hskip command, as part of which the syntactic routines (mouth) are invoked and asked for tokens, to scan the glue specification 10 pt.
- the token \end (which it executes as a command).
So when The TeXbook gives (on page 38) the example you mentioned, of {\hskip 36 pt} being converted into the sequence of tokens {₁, hskip, 3₁₂, 6₁₂, ␣₁₀, p₁₁, t₁₁, }₂, it is a bit misleading: although the characters do indeed get converted to those tokens at some point, this tokenization (of p and t, for example) is not complete before the “stomach” sees the \hskip command; much of it happens after.
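One way to watch the stomach pull tokens through the mouth while it executes \hskip is to hide the glue specification in a macro. This is only a sketch in plain TeX, and \amount is a hypothetical helper that does not appear in the original example:

```tex
% \amount is a made-up name used only for this illustration.
\def\amount{10 pt}
% The stomach receives \hskip and then asks for a glue specification.
% Only at that point is \amount read and expanded, yielding 1, 0, space, p, t.
hi\hskip\amount there
\bye
```

The word “there” is typeset 10pt after “hi”, even though nothing resembling a dimension had been tokenized when \hskip reached the stomach.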
\catcode is a nonexpandable command, so it should be executed after character tokens have been assigned a catcode. […] So if a \catcode command is encountered in the "mouth", does TeX automatically execute it so that it will affect any tokens after it? Or is it still executed in the stomach, after tokens following it might have been assigned the wrong catcode already?
There is a lot of (understandable) confusion here, but the answer is: \catcode is executed in the stomach, after previous characters have been assigned catcodes and turned into tokens.
- If \catcode is encountered in the mouth when the stomach is looking for a command to execute, then it is passed to the stomach, executed there, and affects future tokens.
- If \catcode is encountered in the mouth when the stomach is simply collecting tokens (such as in the definition of a macro or a token-list assignment), then it is simply collected as yet another token (not executed), and future tokens (in the list being collected) will be scanned according to the catcodes in force when the collection started.
To illustrate, consider \catcode`S=3, which changes the category code of the letter S to 3 (namely math shift, like $).
An example of the first case:
hello \catcode`S=3 SxS
\bye
Result: “hello” followed by an x set in math mode, because both S characters are read as math shifts.
An example of the second case:
\def\change{hello \catcode`S=3 SxS}
\change
now SyS
\bye
Result: “hello SxS” is typeset with literal letters S, whereas the y after “now” is set in math mode.
(Here, first the definition of \change was collected as a token list in which there was an explicit “letter” token S, so when we used \change it expanded to a list containing that letter-S token, which is what got typeset. But that expansion of \change also contained a \catcode command, which got executed this time and affected future tokens.)
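Conversely, if the S characters in the replacement text are meant to act as math shifts, the catcode has to change before the definition is absorbed. A minimal sketch, assuming plain TeX and reusing the name \change from the second case:

```tex
\begingroup
\catcode`S=3 % S is a math shift while the definition below is read
\gdef\change{hello SxS}
\endgroup
% Back outside the group, S is an ordinary letter again.
\change\ now SyS
\bye
```

Here the x should come out in math mode and the later SyS as plain letters, the opposite of the second case above. The group and \gdef are just one way to keep the catcode change from leaking into the rest of the file.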
At which "organ"/stage of TeX is \catcode executed?
Simple answer: In the stomach.