Basics of parsing
The accepted TeX solution has several problems. One is that the whole argument is read at each step, not only one token. Another is that the recursive loop in the accepted code generates nested \if...\fi constructions, and TeX's capacity for such nesting is limited.
So I show here a general scanner built from TeX primitives that avoids the problems described above. Spaces are scanned as tokens, but braces are not allowed (for simplicity).
\def\scan#1{\scanA#1\end}
\def\scanA{\futurelet\next\scanB}
\def\scanB{\expandafter\ifx\space\next \expandafter\scanC \else \expandafter\scanE \fi}
\def\scanC{\afterassignment\scanD \let\next= }
\def\scanD{\scanE{ }}
\def\scanE#1{\ifx\end#1\else
(#1)% <- The processing over one token is here
\expandafter \scanA \fi
}
\scan{abcdef ghijkl mno}
\bye
Edit: If you leave the space behavior unchanged (i.e. spaces are ignored), then the code is much simpler:
\def\scan#1{\scanA#1\end}
\def\scanA#1{\ifx\end#1\else
(#1)% <- The processing over one token is here
\expandafter \scanA \fi
}
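As an illustration of the hook marked "the processing over one token", here is a minimal plain TeX sketch that counts the tokens instead of printing them. The names \toknum, \counttokens and \countA are hypothetical and not part of the answer above; the loop itself follows the same pattern, with \expandafter closing the conditional before the next step.

```tex
% Plain TeX sketch; \toknum, \counttokens and \countA are hypothetical names.
\newcount\toknum
\def\counttokens#1{\toknum=0 \countA#1\end}
\def\countA#1{\ifx\end#1\else
  \advance\toknum by 1 % <- the per-token processing
  \expandafter \countA \fi
}
\counttokens{abc def}
The argument has \the\toknum\ tokens.
\bye
```

Because the undelimited argument skips the space, this should report 6 tokens.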
Scanning one token at a time requires at least distinguishing whether the scanned token is a space or a left brace. This is because you can't remove the scanned token with a one-parameter macro in those cases: an undelimited argument skips spaces and takes a whole braced group at once.
First of all, let's see what \futurelet does; your \futurelet\token\scanB tells TeX to look at the token that follows \scanB without removing it, then make a \let\token=<scanned token> assignment, and finally “see” \scanB, which should make decisions based on the value of \token.
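A minimal plain TeX sketch of this peeking behavior may help. The macro names \peek and \report are hypothetical; the space test reuses the \expandafter\ifx\space trick, which puts an explicit space token after \ifx.

```tex
% Plain TeX sketch; \peek and \report are hypothetical names.
\def\peek{\futurelet\token\report}
\def\report{%
  \ifx\token\bgroup [group]%
  \else \expandafter\ifx\space\token [space]%
  \else [other]%
  \fi\fi
}
\peek a                   % [other]a
\peek{b}                  % [group]b
\expandafter\peek\space c % [space] c
\bye
```

In each case the peeked token is still in the input after \report has run, which is why it shows up in the output after the bracketed label.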
For terminating the scanning, you have to place some special token at the end; this token is frequently a “quark”, say
\def\quark{\quark}
so \scanB can do \ifx\token\quark and, in this case, stop the recursion. Let's put together what we have so far:
\makeatletter
\def\scan@quark{\scan@quark}% if we find it in bad places, we'll know!
\newcommand\scan[1]{\futurelet\@let@token\scan@aux@i#1\scan@quark}
\def\scan@aux@i{%
\ifx\@let@token\scan@quark
\expandafter\@gobbletwo
\else
\expandafter\@firstofone
\fi
{\scan@aux@ii}%
}
The macro \scan@aux@ii should now go on with other tests. I used \@gobbletwo in the “true” case so as to gobble \scan@aux@ii and \scan@quark.
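One possible continuation, as a sketch only: \scan@aux@ii tests for a space by comparing with the kernel's \@sptoken, removes a space by assignment, and removes any other token with a one-parameter macro. The names \scan@space, \scan@space@i, \scan@other and \scan@next are hypothetical. Braces are deliberately not handled here: a braced group would be grabbed as a single argument.

```latex
\documentclass{article}
\makeatletter
\def\scan@quark{\scan@quark}
\newcommand\scan[1]{\futurelet\@let@token\scan@aux@i#1\scan@quark}
\def\scan@aux@i{%
  \ifx\@let@token\scan@quark
    \expandafter\@gobbletwo
  \else
    \expandafter\@firstofone
  \fi
  {\scan@aux@ii}%
}
% Hypothetical continuation: branch on space versus anything else.
\def\scan@aux@ii{%
  \ifx\@let@token\@sptoken
    \expandafter\scan@space
  \else
    \expandafter\scan@other
  \fi
}
% A space cannot be grabbed as an argument, so remove it by assignment.
\def\scan@space{\afterassignment\scan@space@i\let\@let@token= }
\def\scan@space@i{( )\scan@next}
% Any other token is removed as an undelimited argument.
\def\scan@other#1{(#1)\scan@next}
% Peek at the next token and start over.
\def\scan@next{\futurelet\@let@token\scan@aux@i}
\makeatother
\begin{document}
\scan{ab cd}
\end{document}
```

Under these assumptions the example should typeset (a)(b)( )(c)(d).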
If instead you just want to split the input at a certain token, a better approach is to use delimited arguments: you can find several examples on the site. With expl3 it's quite easy, because there are built-in functions that do the job.
So, say you have an input such as \word{abc^def^ghi} that you want to print with alternating colors. Here's an implementation:
\documentclass{article}
\usepackage{xparse,xcolor}
\ExplSyntaxOn
\NewDocumentCommand{\word}{m}
{
\kormylo_word:n { #1 }
}
\seq_new:N \l_kormylo_word_fragment_seq
\bool_new:N \l_kormylo_second_color_bool
\cs_new_protected:Npn \kormylo_word:n #1
{
\kormylo_change_color:
\seq_set_split:Nnn \l_kormylo_word_fragment_seq { ^ } { #1 }
\seq_use:Nn \l_kormylo_word_fragment_seq { \kormylo_change_color: }
}
\cs_new_protected:Npn \kormylo_change_color:
{
\bool_if:NTF \l_kormylo_second_color_bool
{ \color{second} \bool_set_false:N \l_kormylo_second_color_bool }
{ \color{first} \bool_set_true:N \l_kormylo_second_color_bool }
}
\ExplSyntaxOff
\colorlet{first}{black}
\colorlet{second}{red}
\begin{document}
\word{su^per^cal^i^frag^i^lis^tic^ex^pi^al^i^do^cious}
\end{document}
Note that you can put spaces around the separator token for more readable input; such spaces will be disregarded.
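The space trimming can be checked in isolation with a small sketch. The names \showsplit and \l_demo_parts_seq are hypothetical; the only functions used are \seq_set_split:Nnn and \seq_use:Nn, as in the code above.

```latex
\documentclass{article}
\usepackage{xparse}
\ExplSyntaxOn
\seq_new:N \l_demo_parts_seq
\NewDocumentCommand{\showsplit}{m}
  {
    % split at ^; spaces around each item are trimmed
    \seq_set_split:Nnn \l_demo_parts_seq { ^ } { #1 }
    % print the items joined by a hyphen
    \seq_use:Nn \l_demo_parts_seq { - }
  }
\ExplSyntaxOff
\begin{document}
\showsplit{su^per^cal}

\showsplit{su ^ per ^ cal}
\end{document}
```

Both calls should print the same text, su-per-cal.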
The macros could be extended to allow spaces in the input: just split at spaces and do a mapping.
\documentclass{article}
\usepackage{xparse,xcolor}
\ExplSyntaxOn
\NewDocumentCommand{\words}{m}
{
\kormylo_words:n { #1 }
}
\seq_new:N \l_kormylo_word_seq
\seq_new:N \l_kormylo_word_fragment_seq
\bool_new:N \l_kormylo_second_color_bool
\cs_new_protected:Npn \kormylo_words:n #1
{
\seq_set_split:Nnn \l_kormylo_word_seq { ~ } { #1 }
\seq_map_inline:Nn \l_kormylo_word_seq
{
\kormylo_word:n { ##1 }
\c_space_tl
}
}
\cs_new_protected:Npn \kormylo_word:n #1
{
\kormylo_change_color:
\seq_set_split:Nnn \l_kormylo_word_fragment_seq { ^ } { #1 }
\seq_use:Nn \l_kormylo_word_fragment_seq { \kormylo_change_color: }
}
\cs_new_protected:Npn \kormylo_change_color:
{
\bool_if:NTF \l_kormylo_second_color_bool
{ \color{second} \bool_set_false:N \l_kormylo_second_color_bool }
{ \color{first} \bool_set_true:N \l_kormylo_second_color_bool }
}
\ExplSyntaxOff
\colorlet{first}{black}
\colorlet{second}{red}
\begin{document}
\words{su^per^cal^i^frag^i^lis^tic^ex^pi^al^i^do^cious syl^la^ble con^cate^na^tion}
\end{document}
For the first answers, I assume that you want to scan pure text, without any groups, commands, etc., and without any spaces.
This is the (mainly) TeX solution.
\documentclass{minimal}
\begin{document}
\def\END{}
\def\ENDEND{}
\newcommand*\scan[1]{\scani #1\END\ENDEND}
\def\scani#1#2\ENDEND{%
\ifx\END#1%
\else%
(#1)%
\scani#2\ENDEND%
\fi
}
\scan{test}
\end{document}
And here is the LaTeX3 version.
\documentclass{minimal}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\newcommand*\scan[1]
{
\tl_map_inline:nn {#1} { (##1) }
}
\ExplSyntaxOff
\scan{test}
\end{document}
Both will output
(t)(e)(s)(t)
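To see how this kind of mapping treats spaces and braced groups, here is a small sketch that reuses the LaTeX3 definition of \scan from above on two other inputs (the inputs are chosen only for illustration).

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
\newcommand*\scan[1]{ \tl_map_inline:nn {#1} { (##1) } }
\ExplSyntaxOff
\begin{document}
\scan{te st}   % (t)(e)(s)(t): the space is skipped

\scan{a{bc}d}  % (a)(bc)(d): the group counts as one item
\end{document}
```

So the mapping works item by item rather than strictly token by token.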
Dealing with spaces is somewhat tricky. I have a TeX solution here (from Usenet times), but I do not understand it myself.
For LaTeX3, here are solutions that can cope with spaces: LaTeX3: tl_map with spaces
Or you can use my version, which is
\documentclass{minimal}
\usepackage{expl3}
\begin{document}
\ExplSyntaxOn
\newcommand*\scan[1]
{
\__scanloop: #1 \q_recursion_stop
}
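% The peek functions are not expandable, so the loop is declared protected.
\cs_new_protected:Nn \__scanloop:
{
\peek_meaning_remove:NTF \q_recursion_stop
{ }
{
\peek_charcode_remove:NTF \c_space_token
{
(~)
\__scanloop:
}
% else: any other token is grabbed as an argument
{
\__scanloop_aux:n
}
}
}
\cs_new_protected:Npn \__scanloop_aux:n #1
{
( #1 )
\__scanloop:
}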
\ExplSyntaxOff
\scan{test with spaces}
\end{document}
which will output
(t)(e)(s)(t)( )(w)(i)(t)(h)( )(s)(p)(a)(c)(e)(s)