Which characters are technically legal in macro names with T1?
Basically none. Some can work, some may not.
In the case of \v c, the ultimate expansion is the character with code 163; in the case of \ss, the ultimate expansion is \char"FF, which is illegal inside \csname...\endcsname.
With \def\äöü you are not defining such a command, but rather the control symbol whose name is the character number 0xC3, and which is required to be followed by the characters with codes 0xA4, 0xC3, 0xB6, 0xC3 and 0xBC (you should be able to recognize the UTF-8 representations of ä, ö and ü).
Indeed, when you do \string\äöü, you get an error, because the character 0xA4 appears isolated (the first byte in the UTF-8 representation of ä has been absorbed by \string) and so it raises an error about a malformed UTF-8 sequence.
The end result is pretty much arbitrarily wrong.
Can you give a precise rule for which characters are admissible in macro names?
Absolutely all bytes 0 to 255 are admissible in macro names. But how convenient they are to type, and how they correspond to characters in the human-visible sense, can depend, among other things, on the catcodes and on the definitions of active characters, which in turn can depend on the packages currently loaded (the input encoding and the font encoding).
The precise rule is that a macro is either:
A single active character: a token with 13 as the category code, and any number 0–255 as the character code.
A control word: an escape character (\) followed by a sequence of letters (tokens with 11 as the category code, and any number 0–255 as the character code).
A control symbol: an escape character (\) followed by a single non-letter (a token with anything other than 11 as the category code, and any number 0–255 as the character code).
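As a minimal sketch of the three kinds of macro listed above (the names \greet and \1 and the choice of | as the active character are illustrative assumptions, not taken from the question), something like this should typeset "word, symbol, active":

```latex
\documentclass{article}
\begin{document}
% Control word: escape character followed by letters (catcode 11).
\def\greet{word}
% Control symbol: escape character followed by one non-letter.
\def\1{symbol}
% Active character: a single token with catcode 13.
\begingroup
  \catcode`\|=13 % make | active, only inside this group
  \def|{active}
  \greet, \1, |
\endgroup
\end{document}
```

The group keeps the catcode change local, so | goes back to its usual meaning after \endgroup.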
Before answering the rest of your questions, some explanation.
Like most software systems, TeX (specifically, non-Unicode TeX, i.e. Knuth TeX or pdfTeX, as opposed to XeTeX or LuaTeX) understands only bytes (0 to 255); it doesn't understand “characters” as such. (And like most pre-Unicode systems, its terminology uses “bytes” and “characters” sometimes misleadingly.) To give the illusion of “understanding” bytes as characters, there are two “translations” that happen:
Font encoding: this says where the shapes (glyphs) for certain (what we think of as) characters are “supposed” to be in a font: e.g. under the default (OT1) encoding (and also under the T1 encoding), position 65 (octal '101, hexadecimal "41) is supposed to contain something that looks like an “A”. And position 231 (hexadecimal "E7) is supposed to contain a glyph for the “ç” in the T1 encoding, and not supposed to contain anything in the default (OT1) encoding. Correspondingly, the fontenc package redefines the meanings of \c etc. as appropriate.
Input encoding: with \usepackage[utf8]{inputenc}, this sets up certain characters (bytes) as active, so that UTF-8 sequences of bytes can be interpreted as the corresponding Unicode character.
Also: TeX has a way of directly inputting a specific byte in the input file, by ^^ followed by two hex digits (0123456789abcdef), e.g. anywhere you can type 'A' (in text, in a macro name, whatever), you can also type ^^41, etc. Let's use that for clarity.
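A small sketch of the ^^ notation in action (the macro name \Abc is a made-up example): because ^^41 is converted to the byte for "A" while the line is being read, \^^41bc should be the very same control word as \Abc.

```latex
\documentclass{article}
\begin{document}
\def\Abc{same macro}
% ^^41 is read as byte 0x41 ("A"), so this is the control word \Abc.
\^^41bc

% The notation also works in ordinary text; this line is just "ABC".
^^41^^42^^43
\end{document}
```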
With that understanding, the two examples in the question are:
\csname \c c\v c\'e\endcsname: here, with \usepackage[T1]{fontenc}, the definitions of \c, \v and \' are such that \c c expands to a token with category code 11 and character code 231 (hex e7), \v c expands to a token with category code 11 and character code 163 (hex a3), and \'e expands to a token with category code 11 and character code 233 (hex e9).
So the following are equivalent:
\expandafter\def\csname \c c\v c\'e\endcsname{I am weird}
and
{\catcode"E7=11 \catcode"A3=11 \catcode"E9=11 \expandafter\def\csname ^^e7^^a3^^e9\endcsname{I am weird}}
and simply
{\catcode"E7=11 \catcode"A3=11 \catcode"E9=11 \def\^^e7^^a3^^e9{I am weird}}
This is a macro of the “control word” type: a backslash followed by a sequence of three letters.
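The same mechanism can be tried with a printable character. In this sketch (the name \my_macro is purely illustrative), giving _ category code 11 makes it count as a letter, so it becomes part of a control word, just as the bytes E7, A3 and E9 do above:

```latex
\documentclass{article}
\begin{document}
\begingroup
  \catcode`\_=11 % treat _ as a letter inside this group
  \def\my_macro{one control word}
  \my_macro
\endgroup
% Outside the group, \my_macro would be read as \my followed by _ and "macro".
\end{document}
```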
Here, äöü in the input file is (assuming you've saved the file in the UTF-8 encoding) the sequence of bytes C3 A4 C3 B6 C3 BC. Further, \usepackage[utf8]{inputenc} changes the catcodes of all these bytes to active. So the following two are equivalent:
% Assuming UTF-8 inputenc
\def\äöü{Me too}
and
{\catcode"C3=13 \catcode"A4=13 \catcode"B6=13 \catcode"BC=13 % Same as those set by \usepackage[utf8]{inputenc}
\def\^^c3^^a4^^c3^^b6^^c3^^bc{Me too}}
This is a macro of the “control symbol” type: what it has actually defined is \^^c3 (a single non-letter), with the requirement that when used it's supposed to be followed by the tokens ^^a4^^c3^^b6^^c3^^bc, all of catcode 13. (Else you'll get something like “Use of \^^c3 doesn't match its definition”.)
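An ASCII analogue may make this easier to see. In the sketch below (the control symbol \1 and the delimiter text abc are arbitrary choices for illustration), the characters after \1 in the \def are not part of the name; they become text that must follow every use:

```latex
\documentclass{article}
\begin{document}
% Defines the control symbol \1; "abc" is required delimiter text.
\def\1abc{matched}
\1abc   % fine: \1 is followed by exactly the text it requires
% \1xyz % would stop with: Use of \1 doesn't match its definition.
\end{document}
```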
Now to answer the rest of your questions:
Why is \v c legal but \v o not?
\v c expands to the token with category code 11 (letter) and character code 163 (hex "A3). This you can see is the character č in T1. \v o does not expand to a single character token (there is a č but no ǒ in the T1 encoding), but to instructions to add an appropriate accent to the o character. Inside \csname ... \endcsname, everything should expand to just character tokens.
Why is there a difference between writing \ss and writing ß?
There's not much of a difference really; just that you (I guess) tried the former inside \csname … \endcsname, and the latter directly after \def.
Unlike the earlier case where (for example) \c c expands to a single token with category code 11 and character code 231, \ss expands to \char"FF, that is, the TeX primitive command \char, followed by (if \char is being processed) the number "FF. (This is different from the token ^^ff, though why fontenc doesn't define \ss to expand to a single character token I don't know.) This too is not allowed inside \csname … \endcsname.
ß too expands to something similar (you can't use it inside \csname … \endcsname either), but if you're using it after \def directly, then without expansion it's a sequence of two active characters ^^c3^^9f, and \def doesn't expand the tokens.
Why does \def\äöü work but \string\äöü not?
See above for why \def\äöü works: it's \def\^^c3^^a4^^c3^^b6^^c3^^bc.
And \string\äöü is \string\^^c3^^a4^^c3^^b6^^c3^^bc, which is \string\^^c3 (which works: try it) followed by ^^a4^^c3^^b6^^c3^^bc (and the first byte there, the second byte of the UTF-8 representation of ä, has been defined as an active character that throws an error, because it should never appear on its own in valid UTF-8).
Why does this only work when using \usepackage[T1]{fontenc}?
The definition of the control symbol, as in \def\äöü{Me too}, will work with or without \usepackage[T1]{fontenc}, and so will its usage. But if you want to use these “special” characters inside \csname ... \endcsname, then you need their definitions to be things that expand to just character tokens (which \usepackage[T1]{fontenc} does, because it can: those characters exist in the font), rather than expand to instructions for placing accents above/below other characters (which is what happens without \usepackage[T1]{fontenc}, as there's no alternative).