Where do I find \futurelet's nasty behaviour documented?
IMHO the behavior shown in the first example is understandable, i.e. "correct". The TeXbook says about the \futurelet primitive (p. 207):
TeX also allows the construction \futurelet\cs<token1><token2>, which has the effect of \let\cs = <token2><token1><token2>.
So we are talking about tokens, which already have their catcodes assigned. In your example, A is the <token2> and must be read by TeX in order to assign it to the control sequence \testtoken. The macro \activateA has not been executed yet, so the catcode of A is still letter. When the token is placed back into the input stream (which is the main point here), the catcode change in \activateA doesn't affect it any longer.
So actually \futurelet\cs<token1><token2> does not have exactly the same effect as \let\cs = <token2><token1><token2>, because in the second case <token1> can still modify the catcodes in effect when the second <token2> is read.
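The difference can be observed directly in plain TeX. The following is a minimal sketch, not the example from the question: the names \peeked, \showpeeked and \makeAother are made up for illustration, and the catcode is switched to 12 (other) rather than to active so that the result is easy to read off from \show.

```tex
% Plain TeX sketch; \peeked, \showpeeked and \makeAother are hypothetical names.
\def\showpeeked{\show\peeked}
% Change the catcode of A, then peek at whatever token comes next.
\def\makeAother{\catcode`\A=12 \futurelet\peeked\showpeeked}

% Case 1: \futurelet tokenizes the A before \makeAother is executed,
% so the inner peek finds an A that still has catcode 11 (letter).
\begingroup
  \futurelet\peeked\makeAother A
\endgroup

% Case 2: \let consumes the first A; the second A is read from the file
% only after \makeAother has run, so it gets catcode 12 (other).
\begingroup
  \let\peeked= A\makeAother A
\endgroup
\bye
```

In case 1 the \show should report a letter, in case 2 a character of category "other", even though the source text after \makeAother is the same A in both cases.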
Your second example is much trickier. IMHO you are falling victim to code that is automatically inserted after (and before) the cell content. Therefore, when \futurelet is executed, there really is an \outer endtemplate behind it. This is a behaviour of \halign and shouldn't be blamed on \futurelet. As you know, my knowledge of \halign is still limited, so I can't fully explain the resulting error.
I would say that if this behaviour is documented anywhere, it is in The TeXbook. I don't think e-TeX changed the behavior of \futurelet in any way.
There are actually two parts to your question that are interacting a bit strangely.
For the first example, the behavior is completely sensible. As others have observed, the A is tokenized and given category code 11. Once a character token is tokenized, it never loses its category code (except through e-TeX's \detokenize extension).
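A token stored in a macro shows the same persistence. This is only a small plain TeX sketch; \stored and \peeked are hypothetical names, not taken from the question.

```tex
% Plain TeX sketch; \stored and \peeked are hypothetical names.
\def\stored{A}        % this A is tokenized now, with catcode 11 (letter)
\begingroup
  \catcode`\A=12      % from here on, a freshly read A is an "other" character
  % Expand \stored first, so that \let sees the A kept inside the macro.
  \expandafter\let\expandafter\peeked\expandafter=\stored
  \show\peeked        % the A taken from \stored keeps catcode 11
  \let\peeked=A
  \show\peeked        % an A read from the file now gets catcode 12
\endgroup
\bye
```

The first \show should report a letter and the second a character of category "other": the later \catcode assignment only affects characters that have not yet been turned into tokens.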
The second example is much trickier. My initial reaction was that it should work as you want. Unfortunately, TeX treats \halign in a very strange way. Let me quote from The TeXbook, page 248.
You have to be careful with the use of & and \span and \cr because these tokens are intercepted by TeX's scanner even when it is not expanding macros. For example, if you say ‘\let\x=\span’ in the midst of an alignment entry, TeX will think that the ‘\span’ ends the entry so \x will become equal to the first token following the ‘#’ in the template. You can hide this \span by putting it in braces; e.g., ‘{\global\let\x=\span}’. (And Appendix D explains how to avoid \global here.)
This is very surprising, but it explains the behavior you see. Namely, when TeX looks for the token to \let to \testtoken, it sees a & and so inserts the rest of the template (empty in this case) followed by \endtemplate, the command that causes the contents of the alignment entry to be typeset in an "unset" box (meaning the glue hasn't been set yet).
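The case described in the quotation can be tried out in plain TeX. The sketch below is an illustration only: the two-column preamble and the "!" after # are arbitrary choices, made so that the intercepted \span has a next column to span into and so that the template contributes a visible token.

```tex
% Plain TeX sketch; the two-column preamble and the "!" are arbitrary choices.
% Unprotected: \let never sees \span. TeX intercepts it, ends the entry and
% feeds in the rest of the template, so \x becomes the "!" that follows "#".
\halign{#!&#\cr
  \let\x=\span \show\x \cr}

% Protected by braces: \span is not intercepted, so \x really becomes \span.
\halign{#!&#\cr
  {\global\let\x=\span}\show\x \cr}
\bye
```

The first \show should report the character ! and the second should report \span, which is the same mechanism that hands \futurelet the template material or \endtemplate instead of the & in the source.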
Disclaimer: I have no references to support my claims below.
Let me start with a simple example:
\def\showtoken{\show\testtoken}
\halign{#b\cr
a\futurelet\testtoken\showtoken\cr}
When it reads the first cell, TeX sees the unexpandable token a, so it knows that there is no \omit. It then inserts the part of the template before #, namely, nothing, and it prepares to insert the part of the template after #, namely, b, when the cell finishes.
It then reads the \futurelet and executes it. Thus TeX looks for three tokens. The first and second are \testtoken and \showtoken. The third is not \cr: when it sees \cr in this context, TeX inserts what it desperately wants to insert: the end of the template, b. So we have \futurelet \testtoken \showtoken b. Then \testtoken is \let to b, and we are shown b by the macro \showtoken.
An interesting combination of \futurelet and \afterassignment allows you to peek two tokens ahead. With it we can check that \futurelet indeed reads the tokens \testtoken, b and c and assigns \testtoken=c.
\def\showtoken{\show\testtoken}
\halign{#bcd\cr
a\afterassignment\showtoken\futurelet\testtoken\cr}
Now delete cd in the template. TeX should print on your terminal:
> \testtoken=\outer endtemplate:
.
\showtoken ->\show \testtoken
<to be read again>
b
<to be read again>
\endtemplate
<template> b\endtemplate
l.3 ...assignment\showtoken\futurelet\testtoken\cr
}
As you can see, the tokens inserted by TeX at the end of the cell are terminated by \endtemplate, which is an internal token. Removing even the b, we see that \futurelet reads \testtoken, the internal \endtemplate, and the following token, }. Then TeX crashes.
A variant on this is to grab the \endtemplate using \let and reinsert it to close the template. Playing around with the code below, I notice that if we put anything other than \testtoken (which holds the \endtemplate) in the last position of \showdotoken, TeX crashes. I have no idea why.
\def\showdotoken{\show\testtoken \testtoken}
\halign{#\cr
a\afterassignment\showdotoken\let\testtoken\cr
}
It seems that TeX crashes whenever we try to reach beyond the end of a cell in an alignment: both \halign constructions below crash.
\halign{#\cr
a\futurelet\testtoken\cr}
\halign{#&#\cr
a\futurelet\testtoken & text\cr}
Back to Hendrik's example.
\def\begintestalign{\show\testtoken
$\vcenter\bgroup\halign\bgroup##&##\cr}
\def\endtestalign{\egroup\egroup$}
\halign{#&#\cr
a & \futurelet\testtoken
\begintestalign
& c \cr
test & de \cr
\endtestalign \cr
}
The \futurelet takes three tokens: \testtoken, \begintestalign, and the material that TeX is waiting to insert as soon as it sees either & or \cr (or \crcr), namely, nothing, followed by the internal \endtemplate. So that & is already converted to the \endtemplate (and perhaps some \begin-next-template) of the first \halign. Then \begintestalign is read and creates a new inner \halign, and that \halign awaits a fresh & ... but it sees an old &, already belonging to the enclosing \halign.
All this might be related to an error mentioned in the TeXbook, p. 299:
Interwoven alignment preambles are not allowed.
If you have been so devious as to get this message, you will understand it, and you will deserve no sympathy.