How does TeX look for delimited arguments?

In \def\foo#1#2\baz{..}, the #1#2 part means that #1 is not delimited, so the command grabs either a single token or a braced group (the only way this argument can be empty is with {}). After that, the #2\baz part means that #2 is delimited, so TeX grabs tokens until \baz (removing one level of outer braces if present; this argument can be empty if \baz comes next). In case 4 it grabs the first argument and then continues until \baz for the second one; in case 5 it grabs the first argument (which is \baz) and then looks for \baz, but there is no \baz left (it has been grabbed as the first argument), so it cannot complete the second argument, hence the error message.

It may help to think in terms of nested commands, each grabbing one argument. \def\foo#1#2\baz{(1:#1) and (2:#2)} could be written as

\def\foo#1{(1:#1) and \footwo}
\def\footwo#1\baz{(2:#1)}
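
A minimal plain TeX sketch for comparing the two formulations side by side. The names \fooA and \fooB are hypothetical and only serve to keep both variants in one file; both calls should typeset the same line.

```tex
% One macro with a non-delimited and a delimited parameter
\def\fooA#1#2\baz{(1:#1) and (2:#2)}

% The same behaviour split across two macros, each grabbing one argument
\def\fooB#1{(1:#1) and \footwo}
\def\footwo#1\baz{(2:#1)}

\fooA hello world\baz

\fooB hello world\baz
\bye
```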

Short introduction to the topic

At the time of defining a macro, some ⟨parameter text⟩ can be provided.

Providing a ⟨parameter text⟩ when defining a macro implies that whenever the macro in question is expanded, TeX has to "fetch" tokens from the token stream in order to process them as macro arguments and argument delimiters.

Can `\outer` tokens occur in ⟨parameter text⟩ ?
Can `\outer` tokens be components of macro arguments?

Tokens that are \outer cannot be part of a ⟨parameter text⟩.

\outer tokens also cannot be components of macro arguments.

\outer tokens can make it into the ⟨definition text⟩ of macro definitions by means of the \edef..\noexpand-trick :

\outer\def\foo{This is foo and foo is outer.}
\edef\bar{%
  This is bar and it calls 
  {\noexpand\tt\noexpand\string\noexpand\foo:}
  \noexpand\foo
}

\foo

\bar

%This would yield an error:
% ! Forbidden control sequence found while scanning definition of ... 
%\def\bar{This is bar and it calls {\tt\string\foo:} \foo}

\bye


About `\long` and short

In case a macro is defined in terms of \long, e.g.,
\long\def\macro#1#2{Argument 1: #1; Argument 2: #2}
, all arguments of that macro can contain the token \par.
A macro defined in terms of \long colloquially is called a "long macro".

In case a macro is defined without \long, e.g.,
\def\macro#1#2{Argument 1: #1; Argument 2: #2}
, an argument containing the token \par triggers a "Runaway argument?" error message on the console/terminal.

E.g., something like the following console-output:

*\def\macro#1#2{Argument 1: #1; Argument 2: #2}

*\macro{\par argument 1}{argument 2}
Runaway argument?
{
! Paragraph ended before \macro was complete.
<to be read again> 
                   \par 
<*> \macro{\par
                argument 1}{argument 2}
?

Although there is no \short-prefix as a counterpart to the \long-prefix, a macro not defined in terms of \long is sometimes colloquially called a "short macro".

The \long mechanism detects the token \par no matter if \par has its usual meaning (as something that "tells" TeX that the end of the paragraph is reached and that TeX shall typeset that paragraph) or if \par is redefined.
The \long-mechanism does not detect tokens other than \par whose meaning equals that of the \par-primitive.
E.g., after

\let\pur=\par 
\def\par{Hello!}

, the \long-mechanism will still take \par as a trigger for the runaway-argument-message while \pur is not a trigger for the runaway-argument-message.
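
The behaviour described above can be tried with a small plain TeX sketch. The names \shortmacro and \longmacro are hypothetical. \par is deliberately not redefined here, so that blank lines and \bye keep working as usual.

```tex
\def\shortmacro#1{(#1)}      % not \long: the token \par is forbidden in #1
\long\def\longmacro#1{(#1)}  % \long: the token \par is allowed in #1
\let\pur=\par                % \pur has the meaning of \par but another name

\longmacro{first\par second}

% Accepted, because the check looks for the token \par itself:
\shortmacro{first\pur second}

% Uncommenting the next line should raise the "Runaway argument?" error:
% \shortmacro{first\par second}
\bye
```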

About the order in time in which TeX does gather macro arguments

When expanding a macro, TeX uses the ⟨parameter text⟩ of the corresponding ⟨definition⟩ as the directive for gathering the macro arguments one by one from the token stream.

There are two sorts of macro arguments:

1. Non-delimited macro arguments

A non-delimited macro argument consists

either of a single token which neither is an explicit character token of category code 1(begin grouping) nor is an explicit character token of category code 2(end grouping)—i.e.: a single token which neither is an opening curly brace {₁, nor is a closing curly brace }₂,
or of a set of tokens nested into a leading explicit character token of category code 1(begin grouping) and a trailing explicit character token of category code 2(end grouping), and containing explicit character tokens of category code 1(begin grouping) and explicit character tokens of category code 2(end grouping) in a way where each explicit character token of category code 1(begin grouping) has exactly one matching explicit character token of category code 2(end grouping) and vice versa—i.e.: a set of tokens nested into an opening curly brace and a closing curly brace, and containing opening curly braces and closing curly braces in a way where each opening curly brace has exactly one matching closing curly brace and where each closing curly brace has exactly one matching opening curly brace.

In case the non-delimited macro argument is a set of tokens—by the way: that set can be empty—that is nested into a leading explicit character token of category code 1(begin grouping) and a trailing explicit character token of category code 2(end grouping), these two tokens will silently be removed when processing the macro argument. I.e.: In case the non-delimited macro argument is a set of tokens that is nested in curly braces, these curly braces will silently be stripped off. Only the outermost level of bracing will be removed.
If a macro passes an argument to another macro, e.g., something like

\def\foo#1{\bar#1!}
\def\bar#1{\string\bar's argument is in parentheses: (#1)}

—the exclamation-mark indicates the place of the last token of the argument of \foo and parentheses surround what is processed as argument of \bar—, then each macro might remove a level of surrounding curly braces:

E.g., \foo{{Argument}} yields \bar{Argument}!, which in turn yields \string\bar's argument is in parentheses: (Argument)!.

E.g., \foo{{A}rgument} yields \bar{A}rgument!, which in turn yields \string\bar's argument is in parentheses: (A)rgument!.

E.g., \foo{Argument} yields \bar Argument!, which in turn yields \string\bar's argument is in parentheses: (A)rgument!.

When TeX starts gathering a non-delimited macro argument from the token stream, it will silently discard explicit space tokens (i.e., explicit character tokens of category code 10(space) and character code 32) until finding a token that is not an explicit space token.
Either that token in turn will be considered the single token that forms a non-delimited macro argument which is not nested within a pair of explicit character tokens of category code 1(begin grouping) respective category code 2(end grouping). I.e., that token will be considered the single token that forms a non-delimited macro argument which is not nested in curly braces.
Or that token in turn will be considered the leading explicit character token of category code 1(begin grouping) of the pair of explicit character tokens of category code 1(begin grouping) respective category code 2(end grouping) wherein the set of tokens forming the non-delimited macro argument is nested—i.e., that token will be considered the opening curly brace of the curly braces wherein the set of tokens forming the non-delimited macro argument is nested.

This discarding of explicit space tokens is the reason why having spaces between non-delimited macro arguments and not having spaces between non-delimited macro arguments makes no difference:

\def\ProcessTwoNonDelimitedArgs#1#2{Arg1:(#1), Arg2:(#2)}

% A <space token> between {Argument1} and {Argument2}:
\ProcessTwoNonDelimitedArgs{Argument1}  {Argument2}

% No <space token> between {Argument1} and {Argument2}:
\ProcessTwoNonDelimitedArgs{Argument1}{Argument2}
\bye


2. Delimited macro arguments

A delimited macro argument is delimited by a set of tokens neither containing explicit character tokens of category code 1(begin grouping), nor containing explicit character tokens of category code 2(end grouping), nor containing explicit character tokens of category code 6(parameter)— i.e., a delimited macro argument is delimited by a set of tokens neither containing opening curly braces, nor containing closing curly braces, nor containing explicit hashes.
(#₆, hash, is an explicit character token of category code 6(parameter). An implicit character token of category code 6(parameter) would, e.g., be the implicit hash-character-tokens of category code 6(parameter) \hash and U with \let\hash=# or \catcode`\U=13 \letU=#. By the way: The so-called #-doubling at times of unexpanded writing etc only occurs with explicit character tokens of category code 6(parameter).)

With delimited macro arguments the delimiting set of tokens is always behind the macro argument.

Argument delimiters can contain implicit character tokens of category code 6(parameter) if the tokens constituting the implicit character tokens of category code 6(parameter) were not yet implicit character tokens of category code 6(parameter) at the time when defining took place but were turned into implicit character tokens of category code 6(parameter) by redefining them afterwards.

\let\ImplicitHash=# %
\def\Macro#1\ImplicitHash\ImplicitHash#2\ImplicitHash{Arg1:(#1), Arg2:(#2)}
\Macro{A1}\ImplicitHash\ImplicitHash{A2}\ImplicitHash
\bye

does not work, while

\def\Macro#1\ImplicitHash\ImplicitHash#2\ImplicitHash{Arg1:(#1), Arg2:(#2)}
\let\ImplicitHash=# %
\Macro{A1}\ImplicitHash\ImplicitHash{A2}\ImplicitHash
\bye


does work.

When TeX gathers a delimited macro argument for a macro, it will gather tokens from the token stream until

either it finds tokens that form the matching argument delimiter
or it is obvious that an error message needs to be raised.

A sequence of tokens within the token stream which looks like those tokens that within the ⟨parameter text⟩ of the corresponding macro definition form the argument delimiter but is not within the same group formed by explicit character tokens of category code 1 and matching explicit character tokens of category code 2 as the macro-token whose macro arguments are gathered, will not be considered the matching argument delimiter. I.e.: A sequence of tokens in the token stream that looks like those tokens that within the ⟨parameter text⟩ of the corresponding macro definition form the argument delimiter but is not within the same curly-brace-group as the macro-token whose macro arguments are gathered, will not be considered the matching argument delimiter.

Considering such a sequence a matching argument delimiter would imply the possibility of delimited macro arguments where explicit character tokens of category code 1 or 2 don't have matching counterparts. I.e., considering such a sequence a matching argument delimiter would imply the possibility of delimited macro arguments where curly braces don't have matching counterparts.
Therefore proper curly-brace-nesting/proper balancing of explicit character tokens of category code 1 with matching explicit character tokens of category code 2 is taken into account when searching an argument delimiter.

E.g., after defining

\def\macro#1\deli\miter{This is the argument: (#1)}

, calling \macro as in the sequence

\macro PartOfArgument{\deli\miter}OtherPartOfArgument\deli\miter

yields that #1 from the ⟨parameter text⟩ of the definition with this call will be considered a placeholder for the sequence PartOfArgument{\deli\miter}OtherPartOfArgument.
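
To inspect what TeX actually gathered, the argument can be written to the terminal instead of being typeset. This is only a sketch under the assumption that \deli and \miter are otherwise undefined: it stores the argument in the scratch register \toks0 so that nothing in it gets expanded or executed.

```tex
\def\macro#1\deli\miter{%
  \toks0={#1}% store the argument without expanding it
  \message{Argument: [\the\toks0]}% show it on the terminal and in the log
}
% The \deli\miter inside the braces is not taken as the delimiter:
\macro PartOfArgument{\deli\miter}OtherPartOfArgument\deli\miter
\bye
```

The message should show the inner {\deli\miter} as part of the argument, with only the final \deli\miter consumed as the delimiter.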

In cases of finding the matching argument delimiter, the tokens forming the argument delimiter will be discarded. After finding and discarding the matching argument delimiter, TeX will check whether the entire set of tokens gathered for the macro argument so far (that set can be empty in case the delimiter is found immediately) is nested into a leading explicit character token of category code 1(begin grouping), and a trailing explicit character token of category code 2(end grouping). I.e., after finding and discarding the matching argument delimiter, TeX will check whether the entire set of tokens gathered for the macro argument so far is nested in curly-brace tokens. If so, these two tokens will silently be removed. (This can result in emptiness in case TeX only gathered an explicit character token of category code 1(begin grouping) followed by an explicit character token of category code 2(end grouping) when finding the delimiter. I.e., this can result in emptiness in case TeX only gathered a pair of matching curly braces when finding the delimiter.)

There are situations where you wish to avoid this removal of the outermost level of curly braces from delimited macro arguments. This can easily be achieved by putting a token in front of the actual argument which cannot accidentally/erroneously match the argument delimiter and at the end of the expansion-chain having this token removed.

E.g.:
(\detokenize in use, thus eTeX-extensions required this time ;-) )

\def\DoNothing{}%
\def\RemoveDot.{}%
\def\macro#1#2\delimited{\detokenize\expandafter{#1#2}}%

\tt

% Stripping of curly braces with the delimited argument will take place.
% #2 will be: 'Delimited Argument'
\macro{\DoNothing}{Delimited Argument}\delimited

% Due to the leading dot stripping of curly braces with the delimited argument  will not take place.
% #2 will be: '.{Delimited Argument}'    
\macro{\RemoveDot}.{Delimited Argument}\delimited
\bye


When TeX starts gathering a delimited macro argument, it will not discard any token. In particular, it will not discard space tokens: anything that might be taken for a preceding space token will be part of the set of tokens that forms the macro argument.

Let's exhibit these subtleties:

Assume \macro is defined as follows while LaTeX2e with eTeX-extensions is in use:

\def\macro#1#2\delimited{%
  Non-delimited argument: \expandafter\verb\scantokens{*|#1|};
  Delimited argument: \expandafter\verb\scantokens{*|#2|}%
}%

When calling \macro{Non-delimited}Delimited\delimited, at the delimited argument there is no ⟨space token⟩ and there are no surrounding curly braces, thus you get:

Non-delimited argument: Non-delimited; Delimited argument: Delimited

When calling \macro{Non-delimited} Delimited\delimited, at the delimited argument there is a ⟨space token⟩ and there are no surrounding curly braces, thus you get:

Non-delimited argument: Non-delimited; Delimited argument: ␣Delimited

When calling \macro{Non-delimited}{Delimited}\delimited, at the delimited argument there is no ⟨space token⟩ and there are surrounding curly braces—the surrounding curly braces will be removed, thus you get:

Non-delimited argument: Non-delimited; Delimited argument: Delimited

When calling \macro{Non-delimited} {Delimited}\delimited, at the delimited argument there is a ⟨space token⟩ and there are curly braces. This time they do not surround the entire delimited argument because the ⟨space token⟩ is there too. Thus this time they will not be removed:

Non-delimited argument: Non-delimited; Delimited argument: ␣{Delimited}

There is a subtle thing in TeX: `#{`-notation:

The ⟨definition text⟩ of a macro must be something that is nested between an explicit character token of category code 1(begin grouping) and an explicit character token of category code 2(end grouping). I.e., usually the ⟨definition text⟩ of a macro must be something that is nested between curly braces.

Therefore you can, e.g., write:

\def\macro{definition-text}
{\tt \meaning\macro}
\bye


In case the last token of the ⟨parameter text⟩ is a single character token of category code 6(parameter), be it explicit, e.g., a hash (#₆), or implicit, the explicit character token of category code 1(begin grouping) wherein the ⟨definition text⟩ is nested will also be considered the last token of the delimiter of the last macro argument. While—at the time of gathering a delimited macro argument—usually the entire argument delimiter of a delimited argument gets discarded, in this special case that last token of the argument delimiter of the last macro argument will not just be discarded but will be discarded and re-inserted, which is the same as if it would be left in place, and after expanding the macro in question that token will occur right behind the ⟨replacement text⟩ delivered by expanding the macro in question.

Example:

\tolerance 9999
\emergencystretch 3em
\hfuzz 0pt \vfuzz \hfuzz
\parindent=0ex
%-------------------------------------------------------------------------------------------------
\let\ImplicitHash=#
\def\macroA#1#2\relax#{Argument1:(#1), Argument2:(#2) The meaning of the next token: \meaning}
\def\macroB#1#2\relax\ImplicitHash{Argument1:(#1), Argument2:(#2) The meaning of the next token: \meaning}
\def\Weird{Whatsoever}

\tt\frenchspacing

\meaning\macroA

\noindent\hrulefill\null

\macroA{Arg 1}{Arg 2}\relax{ The meaning of the next token: \meaning}

\noindent\hrulefill\null

\meaning\macroB

\noindent\hrulefill\null

\macroB{Arg 1}{Arg 2}\relax{ The meaning of the next token: \meaning}

\bye


Until TeX, version 3.141592653, January 2021, there was an edge case

(It was brought to my attention by others in the usenet newsgroup comp.text.tex in 2009. See e.g., the usenet-discussion "Is there any difference between { } and bgroup and egroup ?")

In case the last token of the ⟨parameter text⟩ is a single character token of category code 6(parameter), be it explicit, e.g., a hash (#₆), or implicit, with TeX prior to version 3.141592653 you are not bound to the first token of the ⟨definition text⟩ being an explicit character token of category code 1(begin grouping). In that case it is sufficient that the first token of the ⟨definition text⟩ at definition time is any character token of category code 1(begin grouping), be it explicit or implicit, and that explicit/implicit character token of category code 1(begin grouping) will not only be considered the first token of the ⟨definition text⟩ but will also be considered the last token of the delimiter of the last macro argument.

By the way: In this special case where at definition time this explicit or implicit character token of category code 1(begin grouping) is considered both the first thing of the ⟨definition text⟩ and the last (and not to be discarded) component of the delimiter of the last macro argument, you can, when using an implicit token of category code 1(begin grouping), when defining is done, have that token redefined to be something other than a token of category code 1(begin grouping), and it will be left in place anyway when at expansion-time such a delimited macro argument is gathered.

Example:

\let\Weird={
\let\ImplicitHash=#
\def\macroA#1#2\relax#\Weird Argument1:(#1), Argument2:(#2)\string}
\def\macroB#1#2\relax\ImplicitHash\Weird Argument1:(#1), Argument2:(#2)\string}
\def\Weird{Whatsoever}

\tt
\meaning\macroA

\macroA{Arg 1}{Arg 2}\relax\Weird

\meaning\macroB

\macroB{Arg 1}{Arg 2}\relax\Weird

\bye


In the TeXbook it is explained that you can use this #{-feature for having the last macro argument of some macro delimited also by an opening curly brace that will be left in place so that it can, e.g., serve as opening curly brace of a following non-delimited macro argument which in turn is to be processed by another macro whose call is also delivered by the already mentioned "some" macro.
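
The TeXbook illustrates this with \def\a#1#{\hbox to #1}, where \a3pt{x} expands to \hbox to 3pt{x}. The sketch below uses the same definition; the width 3em and the \hfil are changes made here only to avoid box warnings when running the file.

```tex
% #1 is delimited by an opening curly brace, which is left in place
\def\a#1#{\hbox to #1}

% Expands to \hbox to 3em{x\hfil}: the brace that delimited #1
% now opens the box contents
\a 3em{x\hfil}
\bye
```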

The most important things to keep in mind about TeX's gathering of macro arguments are:

TeX will gather macro arguments from the token stream one by one.
In case of gathering a delimited macro argument, TeX will gather tokens from the token stream until finding the delimiter. This implies that delimited macro arguments can be empty in case TeX has not yet gathered any token when finding the delimiter.
When gathering non-delimited macro arguments, TeX will discard preceding space tokens.
When gathering delimited macro arguments, TeX will not discard space tokens. With delimited macro arguments anything that might be taken for a preceding space token will be part of the set of tokens that forms the macro argument.
If present, one level of curly braces surrounding an entire macro argument, be it a delimited macro argument or a non-delimited macro argument, will be removed silently.

With your definition

\def\foo#1#2\baz{\#1: -#1- \quad \#2: -#2-}

TeX will in any case first gather a non-delimited macro argument and then gather a delimited macro argument where \baz is the delimiter:

\foo hello world\baz

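TeX will gather the tokens of the non-delimited macro argument, discarding preceding space tokens if present, and then discarding curly braces surrounding the entire macro argument if present:
h

Then TeX will obtain the delimited macro argument by gathering tokens until finding the delimiter \baz:
ello world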

\foo supercalifragilistiexpialidocious\baz

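TeX will gather the tokens of the non-delimited macro argument, discarding preceding space tokens if present, and then discarding curly braces surrounding the entire macro argument if present:
s

Then TeX will obtain the delimited macro argument by gathering tokens until finding the delimiter \baz:
upercalifragilistiexpialidocious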

\foo a \baz

TeX will gather the tokens of the non-delimited macro argument, discarding preceding space tokens if present, and then discarding curly braces surrounding the entire macro argument if present:
a

Then TeX will obtain the delimited macro argument by gathering tokens until finding the delimiter \baz and then—if present—discarding curly braces surrounding the entire set of tokens gathered so far: As space tokens do get removed only when they are preceding non-delimited macro arguments while this is a delimited macro argument, the ⟨space token⟩ between the a and the \baz-delimiter will be gathered as the macro argument.

\foo a\baz

TeX will gather the tokens of the non-delimited macro argument, discarding preceding space tokens if present, and then discarding curly braces surrounding the entire macro argument if present:
a

Then TeX will obtain the delimited macro argument by gathering tokens until finding the delimiter \baz and then—if present—discarding curly braces surrounding the entire set of tokens gathered so far: In this case the delimiter will be found immediately, while not yet having gathered any tokens. Thus in this case the macro argument will be empty.
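
The cases above can be checked on the terminal with a variant of \foo that reports its arguments via \message instead of typesetting them. This is a sketch, not the definition from the question: only the replacement text differs, the ⟨parameter text⟩ is the same.

```tex
\def\foo#1#2\baz{\message{[1:#1][2:#2]}}

\foo hello world\baz   % [1:h][2:ello world]

\foo a \baz            % [1:a][2: ]  the space token is kept in #2

\foo a\baz             % [1:a][2:]   #2 is empty

\foo {a}{b}\baz        % [1:a][2:b]  one level of braces removed from each
\bye
```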
