\string command behavior - Plain TeX
The ASCII code of \
is 92, and this is also TeX's internal code for the backslash. Now have a look at Appendix F of the TeXbook (F like Font Tables). There you see that the typewriter font cmtt10
indeed has a \
sitting in position 92 (which is '134 in TeX's octal notation), whereas the standard text font cmr10
has a quotation mark “
in that position. The reason for the latter is that in a standard text you'll need quotation marks quite often, but hardly ever a backslash.
If you want to produce the font tables yourself, use the following code (thanks to Stefan Kottwitz for the suggestion!):
\documentclass{article}
\addtolength{\textheight}{1cm}
\usepackage{fonttable}
\begin{document}
\fonttable{cmr10}
\fonttable{cmtt10}
\end{document}
Hendrik's answer is great. Nevertheless, please allow me to elaborate on \string
and \escapechar
a little more:
As you said, the \string
primitive converts a control sequence into a list of character tokens (see note 1). It also works on other tokens and turns them into their string representation. Two things are notable here. First, all characters except spaces are returned with category code (catcode) 12 "other", even if they had catcode 11 "letter" beforehand. If the control sequence name contains spaces (possible with \csname .. \endcsname
), these still have catcode 10 "space". This catcode change has no influence on typesetting or \write
ing these characters.
Secondly, TeX does not store the backslash as part of the control sequence name. The backslash is only needed to mark the start of a control sequence in the input; once the input has been tokenized, it is no longer required. The backslash is also not hard-wired into TeX: any character with catcode 0 "escape character" will do. By default, however, the backslash is the only character with this catcode.
Because TeX doesn't store the escape character used for each control sequence, it needs to be told how to represent it when it has to turn the control sequence back into a string. This is done using the \escapechar
register (see note 2), which holds the character code of the character to use. By default it is set to 92, the code of the backslash. This register can be changed at will. If it contains a negative number, no escape character will be produced by \string
at all. This fact is often used by (La)TeX code that wants to get the macro name only. As Hendrik already wrote in his great answer, you need to make sure that the escape character is available in the currently used font. Usually you are safe when using a tt font.
So, for example \escapechar=`\A\catcode`\|=0 |string|foo
will output Afoo
, not \foo
or |foo
. After the catcode change TeX doesn't care here if you use |
or \
and you can mix them as you want (Note that \\
can be written as |\
now but not as ||
because the second backslash is the name of the control symbol, not an escape character). You are also not forced to set \escapechar
to a character of catcode 0.
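The two points above can be checked with a small plain TeX sketch. This is only an illustration: \x, \test, \name and \foo are scratch names chosen here, and the results are written to the terminal and log file rather than typeset.

```tex
% \string on a name containing a space (built with \csname)
\edef\x{\expandafter\string\csname foo bar\endcsname}
\message{\meaning\x}% macro:->\foo bar

% the second token of \x is an "f" of catcode 12, not a letter
\def\test#1#2#3\relax{\ifcat#2a\message{letter}\else\message{other}\fi}
\expandafter\test\x\relax % writes: other

% negative \escapechar, kept local to a group, yields the bare name
\begingroup
  \escapechar=-1
  \xdef\name{\string\foo}% \foo is a placeholder and need not be defined
\endgroup
\message{[\name]}% writes: [foo]
\bye
```

Keeping the change to \escapechar inside a group means that later uses of \string, \show or \meaning still print the usual backslash.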
1) Knuth, The TeXbook, page 40, paragraph 1, Chapter 7: How TeX Reads What You Type.
2) Knuth, The TeXbook, page 40, paragraph 6, Chapter 7: How TeX Reads What You Type.
The theory
How does \string
work? There are various cases, but first of all one must be aware that it's a primitive command of TeX which just looks at the next token without expanding it. This lookup happens after the tokenization process, which can transform the input in various ways.
So we are interested in the expansion of \string<token>
If <token> is a character of any category code except 10 (including active characters), the expansion is the same character with category code 12. Note, however, that <token> can't have category code 0, 9, 14 or 15, because such characters never reach the "mouth" where expansion takes place.
If <token> is a character of category code 10, the expansion is the pair (32,10), that is, a normal space token of category code 10.
If <token> is a character with code 32 and any category code, the expansion is the pair (32,10), as in the preceding case.
If <token> is a control sequence, then the expansion of \string<token> is the control sequence's name preceded by the character whose character code equals the current value of \escapechar. All characters in this expansion will have category code 12, except spaces, which are normalized to normal space tokens with character code 32 and category code 10. The control sequence needn't be defined and it won't be stored in the hash table.
However, if the value of \escapechar is less than 0 or greater than 255, no character will be added at the start of the returned string. The upper limit is 2097151, that is 0x1FFFFF, for XeTeX and LuaTeX.
Examples
The input
\edef\x{\string a}\ifcat\x a\message{11}\else\message{12}\fi
will return 12
The input
\catcode`?=10 \def\temp{?}\edef\x{\expandafter\string\temp}\show\x
will return
> \x=macro: -> .
The input
\catcode`\ =12 \edef\x{\string }%
\ifcat\x\space\message{10}\else\message{12}\fi
will return 10
The input
\edef\x{\string\foo}\show\x
will return
> \x=macro: ->\foo.
(the missing space between o and . shows that \foo isn't a control sequence).
Instead, the input
\escapechar=`A \edef\x{\string\foo}\show\x
will return
> Ax=macro: ->Afoo.
The input
\escapechar=-1 \edef\x{\string\foo}\show\x
will return
> x=macro: ->foo.
(without any character before
foo
)
Printing
The question about the printed output of \string\foo
is a different story; if one says \char`\\
, TeX will print whatever character is in slot 92 of the current font, which might be a backslash or not. It isn't in the standard Plain TeX font \tenrm
, that is cmr10
, but it is in cmtt10
. The same will happen when printing the string resulting from \string\foo
.
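To see the difference on the page, here is a minimal plain TeX sketch that typesets the same string in both fonts. It assumes the default \escapechar of 92 and the plain TeX font selectors \tenrm and \tentt; \foo is a placeholder and need not be defined.

```tex
\noindent\tenrm \string\foo  % slot 92 of cmr10: opening double quote
\par
\noindent\tentt \string\foo  % slot 92 of cmtt10: backslash
\bye
```

The token list produced by \string is identical in both lines; only the glyph that the current font has in slot 92 differs.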