\string command behavior - Plain TeX

The ASCII code of \ is 92, and this is also TeX's internal code for the backslash. Now have a look at Appendix F of the TeXbook (F as in Font Tables). There you see that the typewriter font cmtt10 indeed has a \ in position 92 (which is '134 in octal notation), whereas the standard text font cmr10 has an opening double quotation mark in that position. The reason for the latter is that ordinary text needs quotation marks quite often, but hardly ever a backslash.

If you want to produce the font tables yourself, use the following code (thanks to Stefan Kottwitz for the suggestion!):

\documentclass{article}
\addtolength{\textheight}{1cm}
\usepackage{fonttable}
\begin{document}
\fonttable{cmr10}
\fonttable{cmtt10}
\end{document}



Hendrik's answer is great. Nevertheless, please allow me to elaborate on \string and \escapechar a little more:

As you said, the \string primitive converts a control sequence into a list of character tokens (1). It also works on other tokens and turns them into their string representation. Two things are notable here. First, all characters except spaces are returned with category code (catcode) 12 "other", even if they had catcode 11 "letter" beforehand. If the control sequence name contains spaces (possible with \csname ... \endcsname), they keep catcode 10 "space". This catcode change has no influence on typesetting these characters or on writing them out with \write.
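
A minimal plain TeX sketch of the space case, assuming the default \escapechar and a control sequence name "foo bar" chosen only for illustration. \expandafter lets \csname build the token before \string turns it into characters:

    \edef\x{\expandafter\string\csname foo bar\endcsname}
    \show\x % the terminal shows: ->\foo bar.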

Secondly, TeX does not store the backslash as part of every control sequence name. The backslash is only needed to mark a control sequence as such in the input; once the control sequence has been tokenized, it is no longer required. Nor is the backslash hard-wired into TeX: every character with catcode 0 "escape character" will work, although by default the backslash is the only character with this catcode. Because TeX doesn't store which escape character was used for a particular control sequence, it has to be told how to represent it when it turns the control sequence back into a string. This is done with the \escapechar register (2), which holds the character code of the character to use. By default it is set to 92, the code of the backslash. This register can be changed at will. If it contains a negative number, \string produces no escape character at all. This fact is often used by (La)TeX code that wants only the macro name. As Hendrik already wrote in his great answer, you need to make sure that the escape character is available in the font currently in use. A tt font is usually a safe choice.
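
The idiom of obtaining only the macro name could look roughly like this in plain TeX. The names \foo and \name are placeholders for illustration; the group keeps the change to \escapechar local, while \xdef makes the result available outside the group:

    \begingroup
      \escapechar=-1 % negative value: \string emits no escape character
      \xdef\name{\string\foo}% \name now holds the three characters foo
    \endgroup
    \message{[\name]}% writes [foo] to the terminal and the log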

So, for example, \escapechar=`\A\catcode`\|=0 |string|foo will output Afoo, not \foo or |foo. After the catcode change TeX doesn't care whether you use | or \, and you can mix them as you like. (Note that \\ can now be written as |\ but not as ||, because the second character is the name of the control symbol, not an escape character.) You are also not forced to set \escapechar to a character of catcode 0.


1) Knuth, The TeXbook, page 40, paragraph 1, Chapter 7: How TeX Reads What You Type.
2) Knuth, The TeXbook, page 40, paragraph 6, Chapter 7: How TeX Reads What You Type.


The theory

How does \string work? There are various cases, but first of all one must be aware that it's a primitive command of TeX which just looks at the next token without expanding it. This lookup happens after the tokenization process, which can transform the input in various ways.

So we are interested in the expansion of \string<token>:

  1. If <token> is a character of any category code except 10 (including active characters), the expansion is the same character with category code 12. Note, however, that <token> can't have category code 0, 9, 14 or 15, because such characters never reach the "mouth" where expansion takes place.

  2. If <token> is a character of category code 10, the expansion is the pair (32,10), that is, a normal space token of category code 10.

  3. If <token> is a character with code 32 and any category code, the expansion is the pair (32,10), as in the preceding case.

  4. If <token> is a control sequence, then the expansion of \string<token> is the control sequence's name preceded by the character having character code equal to the current value of \escapechar. All characters in this expansion will have category code 12, except spaces that are normalized to normal space tokens with character code 32 and category code 10. The control sequence needn't be defined and it won't be stored in the hash table.

  5. However, if the value of \escapechar is less than 0 or greater than 255, no character will be added at the start of the returned string. The upper limit is 2097151, that is 0x1FFFFF, for XeTeX and LuaTeX.

Examples

  1. The input

    \edef\x{\string a}\ifcat\x a\message{11}\else\message{12}\fi
    

    will return 12

  2. The input

    \catcode`?=10 \def\temp{?}\edef\x{\expandafter\string\temp}\show\x
    

    will return

    > \x=macro:
    -> .
    
  3. The input

    \catcode`\ =12 \edef\x{\string }%
    \ifcat\x\space\message{10}\else\message{12}\fi
    

    will return 10

  4. The input

    \edef\x{\string\foo}\show\x
    

    will return

    > \x=macro:
    ->\foo.
    

    (the missing space between o and . shows that what is stored in \x is a string of characters, not the control sequence \foo). By contrast, \escapechar=`A \edef\x{\string\foo}\show\x will return

    > Ax=macro:
    ->Afoo.
    
  5. The input

    \escapechar=-1 \edef\x{\string\foo}\show\x
    

    will return

    > x=macro:
    ->foo.
    

    (without any character before foo; \show itself also omits the escape character before x)

Printing

The question about the printed output of \string\foo is a different story; if one says \char`\\, TeX will print whatever character is in slot 92 of the current font, which might be a backslash or not. It isn't in the standard Plain TeX font \tenrm, that is cmr10, but it is in cmtt10. The same will happen when printing the string resulting from \string\foo.
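
A short plain TeX file makes the difference visible. This is only a sketch; \foo is an arbitrary, undefined name, which is fine because \string never looks at its meaning:

    \string\foo      % set in cmr10 (\tenrm): slot 92 is an opening double quote, printed before foo
    {\tt\string\foo} % set in cmtt10 (\tentt): slot 92 is a backslash, so this prints \foo
    \bye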