Expansion of first token in a tabular cell

This expansion is done by TeX itself, and it is hard to prevent. From the TeXbook, double dangerous bend on page 240:

once the \cr at the end of the preamble has been sensed, TeX must look ahead to see if the next token is \noalign or \omit, and macros are expanded until the next non-space token is found. If the token doesn’t turn out to be \noalign or \omit, it is put back to be read again, and TeX begins to read the template (still expanding macros).

The problem seems to be that the stuff inserted by >{...} is put into the preamble of an \halign, and the expansion happens before TeX reads the template from the preamble.
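Here is a minimal plain TeX sketch of what code placed in the preamble actually receives. The helper macros \peek, \report and \word are hypothetical names made up for this illustration. \peek sits in the template just before #, roughly where material from >{...} ends up, and uses \futurelet to log the meaning of the next token without consuming it.

```tex
% Plain TeX sketch; \peek, \report and \word are hypothetical helpers.
\def\word{Hello}
\def\report{\message{[\meaning\next]}}  % log what \peek found
\def\peek{\futurelet\next\report}       % look at the next token without consuming it
\halign{\peek#\cr
  \word\cr        % logs [the letter H]: \word was expanded before \peek ran
  \relax\word\cr  % logs [\relax]: the unexpandable \relax stopped the look-ahead
}
\bye
```

In the first row the preamble code never sees \word itself, only the first token of its expansion.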


TeX's tables are strange beasts with respect to expansion. Here is a plain TeX table with a right-aligned and a left-aligned column:

\halign{\hfil #&#\hfil \cr
  a & bc \cr
  de & f \cr}

The first line is the "preamble", which tells TeX what to do with the material in each cell, represented by #. So the first row of the body, a & bc \cr, becomes \hfil a&bc\hfil \cr. Each \hfil stands for blank space that can stretch to fill an arbitrary distance, which is what aligns the cell contents.

Taking this approach as given, let us try to invent how rules could be added to the language, with Knuth's hat on. [Admittedly, I'm likely to err in some of the reasons that led Knuth and later engine developers to make various decisions, but I think my picture is consistent, albeit full of anachronisms.]

Rules should be added between two rows, in other words, just after \cr. We postulate a hypothetical \hline, added after the \cr to produce a horizontal rule:

% Fake TeX code
\halign{\hfil #&#\hfil \cr
  a & bc\cr
  \hline
  de & f\cr
  \hline}

In the approach described so far, TeX would take \hline de as the cell's contents, and the second row would become \hfil \hline de&f\hfil\cr. In that particular case, we could still manage by making \hline insert material before the current row. But the trailing \hline is problematic: TeX would realize too late that there is no cell here. It would insert the \hfil from the first column's template, then see \hline, insert the rule before the current row, see }, and somehow have to undo the \hfil.

A saner approach would be for TeX, when it sees \cr, to look ahead and check whether \hline follows. Should \hline be a primitive? That would only allow for the types of rules or inter-row material hardcoded in the engine itself. No. A much better solution, which Knuth chose, is to allow arbitrary material with \noalign. Then there is no need to also provide \hline, which is simply a horizontal rule (\hrule) that is not aligned. So the real plain TeX way of getting a rule (well, two) is

\halign{#&#\cr
  a & b \cr
  \noalign{\hrule}
  c & d \cr
  \noalign{\hrule}}

Most will complain that the rules are too close to the text, but we are interested for now in expansion issues, not typography. Naturally, one may want to provide a shorthand for \noalign{\hrule}, say, \hline:

\def\hline{\noalign{\hrule}}
\halign{\hfil #&#\hfil\cr
  a & b \cr \hline
  c & d \cr \hline}

Again, we end up with the question of how TeX can know that this \hline macro hides a \noalign, and how TeX knows that it shouldn't just insert \hfil right away. The answer is that, to find the \noalign, TeX expands macros after \cr before inserting the material from the preamble. The same happens at the start of each cell to look for \omit, but I won't delve into that.
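A small plain TeX check of this look-ahead, as a sketch: the hypothetical macro \rowsep hides \noalign behind two levels of macros and is still found, whereas a macro that starts with an unexpandable token is not.

```tex
% Plain TeX sketch; \rowsep and \badline are hypothetical names.
\def\hline{\noalign{\hrule}}
\def\rowsep{\hline}   % \noalign hidden two macro levels deep
\halign{#\cr
  a\cr \rowsep        % found: the look-ahead expands \rowsep, then \hline
  b\cr}
% By contrast, \def\badline{\relax\noalign{\hrule}} used in the same place
% stops the look-ahead at \relax, so \noalign ends up inside a cell and
% TeX reports "Misplaced \noalign".
\bye
```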

This leads to problems: for instance, a macro that behaves differently in math mode and in text mode should not start with \ifmmode but with \relax\ifmmode.

\def\foo{\relax\ifmmode x^2\else the square of $x$\fi}
\halign{\hfil $#$\hfil &\hfil $#$\hfil \cr
  \foo & y^2 \cr
  z-2 & t\cr}

TeX expands the first token after the \cr, which is \foo, and then sees \relax. Since \relax is unexpandable and is neither \noalign nor \omit, the expansion stops, the preamble material is inserted, and TeX enters math mode. Only then is the \ifmmode test performed. Without the \relax, \ifmmode would have been evaluated during the look-ahead, before the preamble's $ was inserted, and would have been false.

To help prevent this expansion, the e-TeX developers decided (in version 2, according to the e-TeX manual) that \protected macros would stop the expansion in this situation. This is somewhat inconsistent with how e-TeX expands in other full-expansion-from-the-left settings such as \romannumeral-`q, where protected macros are expanded. In Martin's case (see his answer to the current question), this ends up being very useful since he can stop the expansion using a protected macro. In other cases (see some of Peter Grill's questions, for instance one about how to provide a wrapper for \cmidrule), we would like TeX to try harder to find the hidden \noalign.
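The difference can be observed in the log. This is a sketch for an e-TeX-based engine (e.g. etex or pdftex with the plain format), and \plainwhere and \protwhere are hypothetical names. Both macros report the mode in which their \ifmmode test is evaluated; the only difference between them is \protected.

```tex
% Requires e-TeX for \protected; \plainwhere and \protwhere are hypothetical.
\def\plainwhere{\ifmmode\message{[math]}\else\message{[not math]}\fi}
\protected\def\protwhere{\ifmmode\message{[math]}\else\message{[not math]}\fi}
\halign{$#$\cr
  \plainwhere x\cr        % [not math]: tested during the look-ahead
  \relax\plainwhere x\cr  % [math]: \relax stopped the look-ahead
  \protwhere x\cr         % [math]: e-TeX does not expand protected macros here
}
\bye
```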


Thanks to Hendrik's answer, the chat with him and Joseph Wright, and the comment from Bruno Le Floch, I have now solved all the issues related to expansion in tabulars, i.e. in the underlying TeX primitive \halign. I'd like to list them here in case they are useful to others.

To summarize: the issue here was that \halign expands the tokens after \cr (which is more or less the plain TeX version of \\) to see whether a \noalign or \omit follows, and the tokens after & to see whether an \omit follows. This expansion stops when an unexpandable token is found. The collcell package wants to collect these tokens unexpanded. There is also the issue of \\ itself getting expanded if the cell is empty or contains only macros that expand to nothing or to spaces, like \empty or \space. This is a problem because the code looks for \\ as an end marker and does not cope well with the dirty tricks contained in its expansion.
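The point about \empty and \space can be checked with the same kind of probe; this is a plain TeX sketch with hypothetical helpers \peek and \report, where \peek sits in the template of the second column.

```tex
% Plain TeX sketch; \peek and \report are hypothetical helpers.
\def\report{\message{[\meaning\next]}}
\def\peek{\futurelet\next\report}
\halign{#&\peek#\cr
  a&\empty\space b\cr  % logs [the letter b]
}
\bye
```

During the look-ahead after &, \empty expands to nothing and the space produced by \space is skipped, so \peek first sees b. In a LaTeX tabular whose cell holds only such macros, the token reached this way is the \\ that ends the row.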

The solutions I came up with now are:

  • The \\ macro is \robustify-ed with the help of the etoolbox package. This way it isn't expanded by \halign but still works as normal otherwise.
  • A special start marker will be provided, which is also \protected so that it is not expanded. The collcell code will explicitly ignore (i.e. gobble) it when collecting the cell content. Users can place it in cells that should be collected without any expansion, where a \relax isn't suitable. This is necessary because special macros like \texttiming (tikz-timing package) must not be fed unexpandable macros.
    This isn't optimal, because it still requires some action by the user, but I can't see any other solution. Redefining & is not really an option IMHO.
  • Once collcell finds the \\, the collection of tokens stops, and the \cr contained in it is temporarily redefined to restart the collection after the original \cr has been expanded (using \expandafter), which makes TeX insert the tokens defined by <{ }. This way these tokens are also collected, which is important to support cascaded >{}/<{}.
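The start-marker idea from the second point can be sketched in e-TeX as follows. This is not collcell's actual code: the marker \startmark and the helpers \peek, \checkstart, \dropmark and \report are hypothetical names. The template code checks whether the cell begins with the protected marker, gobbles it if so, and then looks at the token that follows.

```tex
% e-TeX sketch of a protected start marker; all names are hypothetical.
\protected\def\startmark{}               % not expanded by the look-ahead
\def\word{Hello}
\def\report{\message{[\meaning\next]}}
\def\dropmark#1{\futurelet\next\report}  % #1 is the marker; inspect what follows it
\def\checkstart{%
  \ifx\next\startmark
    \expandafter\dropmark
  \else
    \expandafter\report
  \fi}
\def\peek{\futurelet\next\checkstart}
\halign{\peek#\cr
  \word\cr            % [the letter H]: \word was already expanded
  \startmark\word\cr  % [macro:->Hello]: \word reaches the template unexpanded
}
\bye
```

The second row shows why the marker helps: the template code removes it and still sees the user's first token in its original, unexpanded form.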

I'm happy to hear any feedback about this.