a macro to print the catcodes of all tokens
Here is a solution that uses only TeX primitives; you do not need expl3, LaTeX, etc. The output of \showcatcodes is the same as in the other solution presented here.
\def\showcatcodes#1{\showcA#1\showcA}
\def\showcA{\let\next=\showcC \futurelet\nextc\showcB}
\def\showcB{%
\ifx\nextc\showcA \def\next##1{}\fi
\expandafter\ifx\space\nextc \def\next{\showcD\ {10}}\fi
\ifx\nextc{\def\next{\showcD\{{1}}\fi
\ifx\nextc}\def\next{\showcD\}{2}}\fi
\next
}
\def\showcC#1{{\tt\string#1}\expandafter
\ifcat\noexpand#1\relax \showcE{16}\else \showcE{\the\catcode`#1}\fi
\showcA
}
\def\showcD#1#2{{\tt\char`#1}\showcE{#2}\afterassignment\showcA \let\nextc= }
\def\showcE#1{${}_{#1}$\thinspace}
\showcatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx}
\end
Edit: In response to the comment (below) by David Carlisle, I have added a second version of my code:
\def\showcatcodes#1{\showcA#1\showcA}
\def\showcA{\let\next=\showcC \futurelet\nextc\showcB}
\def\showcB{%
\ifx\nextc\showcA \def\next##1{}\fi
\ifcat\space\noexpand\nextc \def\next{\showcD\ {10}}\fi
\ifcat\noexpand\nextc{\def\next{\showcD\{{1}}\fi
\ifcat\noexpand\nextc}\def\next{\showcD\}{2}}\fi
\next
}
\def\showcC#1{{\tt\string#1}\showcE{%
\ifcat\noexpand#1$3\fi \ifcat\noexpand#1&4\fi \ifcat\noexpand#1##6\fi
\ifcat\noexpand#1^7\fi \ifcat\noexpand#1_8\fi \ifcat\noexpand#1x11\fi
\ifcat\noexpand#1:12\fi \ifcat\noexpand#1\noexpand~13\fi
\ifcat\noexpand#1\hbox16\fi
}\showcA
}
\def\showcD#1#2{{\tt\char`#1}\showcE{#2}\afterassignment\showcA \let\nextc= }
\def\showcE#1{${}_{#1}$\thinspace}
\showcatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx}
\end
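Both versions above look at the next token with \futurelet before deciding how to consume it, because an undelimited macro argument would skip a space and strip a pair of braces. The following is only a minimal plain TeX sketch of that peeking step, not part of the answer's code; the names \peek, \peekA and \peektok are made up for the illustration.

```latex
% plain TeX sketch; \peek, \peekA and \peektok are illustration-only names
\def\peek{\futurelet\peektok\peekA}
\def\peekA{%
  % \peektok is now \let-equal to the next token, which is still in the input
  \ifcat\noexpand\peektok\bgroup \message{[begin-group next]}\fi
  \ifcat\space\noexpand\peektok  \message{[space next]}\fi
  \ifcat\noexpand\peektok a\message{[letter next]}\fi
}
\peek{x}                  % next token is { (catcode 1)
\expandafter\peek\space x % \space is expanded first, so \peek sees a space token
\peek x                   % next token is the letter x
\bye
```

The peeked token is not removed, which is why the answer's \showcD uses \let\nextc= followed by a space to discard a space or brace after reporting it.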
You can use a variation on https://tex.stackexchange.com/a/358697/4427
\documentclass{article}
\usepackage{expl3,xparse}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\ExplSyntaxOn
\NewDocumentCommand\showcatcodes { m }
{
\group_begin:
\ttfamily
\tl_set:Nn \l_tmpa_tl { #1 }
\jakun_remove_braces:
\regex_extract_all:nVN { . } \l_tmpa_tl \l_tmpa_seq
\seq_map_inline:Nn \l_tmpa_seq
{ \jakun_value_catcode:n { ##1 } }
\group_end:
}
\cs_new_protected:Nn \jakun_remove_braces:
{
\regex_match:nVT { \cB. } \l_tmpa_tl
{
\regex_replace_all:nnN { \cB. (.*?) \cE\} } { \cO\{ \1 \cO\} } \l_tmpa_tl
\jakun_remove_braces:
}
}
\cs_generate_variant:Nn \regex_extract_all:nnN { nV }
\prg_generate_conditional_variant:Nnn \regex_match:nn { nV } { T }
\cs_new_protected:Nn \jakun_value_catcode:n
{
\bool_lazy_and:nnTF { \tl_if_single_p:n { #1 } } { \token_if_cs_p:N #1 }
{
\token_to_str:N #1 \textsubscript{16}
}
{
\str_if_eq:nnTF { #1 } { ~ }
{ \textvisiblespace \textsubscript{10} }
{ \token_to_str:N #1 \textsubscript{\char_value_catcode:n { `#1 }} }
}
}
\ExplSyntaxOff
\begin{document}
\showcatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx}
\end{document}
Here is a version that uses the current category codes; I think that much more work would be needed to cope with implicit characters. You can play with it.
I think that \tl_analysis_show:n is much better for the purpose of debugging.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{xparse}
\ExplSyntaxOn
\NewDocumentCommand{\showcatcodes}{sm}
{
\group_begin:
\ttfamily
\IfBooleanTF{#1}
{
\exp_last_unbraced:NV \jakun_showcatcodes: #2 \q_stop
}
{
\jakun_showcatcodes: #2 \q_stop
}
\group_end:
}
\cs_new_protected:Nn \jakun_showcatcodes:
{
\peek_meaning_remove:NTF \q_stop
{
%\unskip
}
{
\peek_catcode_remove:NTF \c_space_token
{
\jakun_print_catcode:nn { \textvisiblespace } { 10 }
\jakun_showcatcodes:
}
{
\peek_catcode_remove:NTF \c_group_begin_token
{
\jakun_print_catcode:nn { \{ } { 1 }
\jakun_showcatcodes:
}
{
\peek_catcode_remove:NTF \c_group_end_token
{
\jakun_print_catcode:nn { \} } { 2 }
\jakun_showcatcodes:
}
{
\jakun_other_catcode:N
}
}
}
}
}
\cs_new_protected:Nn \jakun_other_catcode:N
{
\token_if_cs:NTF #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 16 }
}
{
\token_if_eq_catcode:NNTF \c_math_toggle_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 3 }
}
{
\token_if_eq_catcode:NNTF \c_alignment_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 4 }
}
{
\token_if_eq_catcode:NNTF \c_parameter_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 6 }
}
{
\token_if_eq_catcode:NNTF \c_math_superscript_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 7 }
}
{
\token_if_eq_catcode:NNTF \c_math_subscript_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 8 }
}
{
\token_if_eq_catcode:NNTF \c_catcode_letter_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 11 }
}
{
\token_if_eq_catcode:NNTF \c_catcode_other_token #1
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 12 }
}
{
\jakun_print_catcode:nn { \token_to_str:N #1 } { 13 }
}
}
}
}
}
}
}
}
\jakun_showcatcodes:
}
\cs_new_protected:Nn \jakun_print_catcode:nn
{
#1\textsubscript{#2}~
}
\ExplSyntaxOff
\begin{document}
\showcatcodes{abc x{y{z}}~&#_\xyz}
{
\catcode`z=\active
\showcatcodes{abc x{y{z}}~&#_\xyz}
\gdef\test{abc x{y{z}}~&##_\xyz}
}
\showcatcodes*{\test}
\end{document}
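For the debugging route mentioned in this answer, \tl_analysis_show:n can be wrapped in a document-level command so that the argument is read with the normal document catcodes rather than those of \ExplSyntaxOn. This is a minimal sketch; the command name \analysecatcodes is hypothetical.

```latex
\documentclass{article}
\usepackage{xparse}% not needed with a recent LaTeX kernel
\ExplSyntaxOn
% \analysecatcodes is an illustration-only name
\NewDocumentCommand \analysecatcodes { m }
  { \tl_analysis_show:n { #1 } }
\ExplSyntaxOff
\begin{document}
% the argument is tokenized here, with the usual document catcodes
\analysecatcodes{a~b{cd{e1}}2 3!$_ ^y\xxx}
\end{document}
```

The result is written to the terminal and log, one token per line with its meaning, and the run pauses in the same way as \show.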
This answer uses the tokcycle package to decode the catcodes. It can handle implicit, active, and long tokens; however, there are some limitations.
The package is currently set up to remember the name of only one implicit cat-6 token at a time. If there is more than one implicit cat-6 token in the input stream, it will detect all of them as cat-6, but will only remember the name of the most recent implicit cat-6 declaration. A multiplicity of explicit cat-6 tokens poses no problem.
The package can handle changes in cat 1,2 tokens. However, it cannot detect the charcodes associated with such tokens; it must be told them in advance. I'll show an example later in the answer.
It will never interpret a % in the input stream as cat-14. Rather, the % is digested by TeX as a comment character before it ever reaches the tokcycle environment.
Likewise, cat-5 end-of-lines are intercepted by TeX before they reach tokcycle, and so they will be interpreted as explicit space tokens.
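The last two limitations come from TeX's tokenizer rather than from tokcycle, so they can be observed without the package. A minimal plain TeX sketch, in which \toks0 is just a convenient scratch register:

```latex
% plain TeX sketch: what survives tokenization of a comment and an end-of-line
\toks0{a% this comment, and the end of its line, never become tokens
b
c}
% the end-of-line after b was converted to a space token
\message{[\the\toks0]}% writes [ab c] to the terminal
\bye
```

Any macro or environment that receives these tokens, tokcycle included, sees only the letter tokens and one space; nothing of catcode 14 or 5 remains to be reported.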
First, an MWE that leaves cat 1,2 tokens as {} and has only one implicit cat-6 token, \C.
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{tokcycle,lmodern}
\tokcycleenvironment\showcats
{\ifcatSIX\addcytoks{\thistok{##1}{6}}\else\addcytoks{\catcomp{##1}}\fi}
{\addcytoks{\thistok{\{}{1}}\processtoks{##1}\addcytoks{\thistok{\}}{2}}}
{\addcytoks{\catcomp{##1}}\tctestifx{\par##1}{\addcytoks{\par}}{}}
{\ifimplicittok\addcytoks{\catcomp{##1}}\else
\addcytoks{\thistok{\textvisiblespace}{\number\catcode`##1}}\fi}
\newcommand\thistok[2]{#1$_{#2}$\,\allowbreak}
\makeatletter
\newcommand\catcomp[1]{%
\tctestifx{\implicitsixtok#1}{\expandafter\string#1$_{6}}{%
\string#1$_{%
\tctestifcatnx #1\relax{0}{%
\tctestifcatnx #1${3}{%
\tctestifcatnx #1&{4}{%
\tctestifcatnx #1^{7}{%
\tctestifcatnx #1_{8}{%
\tctestifcatnx #1\@sptoken{10}{%
\tctestifcatnx #1a{11}{%
\tctestifcatnx #11{12}{%
\tctestifcatnx #1~{13}{%
*%
}}}}}}}}}}%
}$\,\allowbreak%
}
\let\deftok\tc@deftok
\makeatother
\begin{document}
\ttfamily
\let\A$% 3
\let\B&% 4
\let\C#% 6
\let\D^% 7
\let\E_% 8
\deftok\F{ }% 10
\let\G a% 11
\let\H 1% 12
\let\I~% 0, because \I is not active,
% but a macro that takes the same meaning as ~
\let\J\relax% 0
\def\K{xyz}% 0
\catcode`q=\active% 13
\def q{x}
\catcode`Q=\active
\let Q #% This implicit assignment makes
% the catcode of Q=6, rather than 13
\deftok\sptoken{ }% 10
\showcats
\A\B\C\D\E\F\G\H\I\J\K
A9 $x_2^{y+1}$ \today &#~
\space\sptoken qQ<>
\endshowcats
\end{document}
One should note that we are testing here the catcodes of the actual tokens, not the current value of \catcode associated with the given charcode. So, for example, if, after \E is let to _, we reassign _ the catcode of 7, the token \E will still test as cat-8, not 7.
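That point can be checked directly with primitives. The following plain TeX sketch is independent of tokcycle and only illustrates the \E example just described:

```latex
% plain TeX sketch: a token keeps the catcode it had when it was created
\let\E=_                 % \E is now an implicit subscript character (cat 8)
\catcode`\_=7            % a freshly read _ is from now on tokenized with cat 7
\message{\meaning\E}%    % still reports: subscript character _
\ifcat\noexpand\E_%      % \E (cat 8) against a new _ token (cat 7)
  \message{[same catcode]}%
\else
  \message{[different catcode]}%
\fi
\bye
```

The test takes the \else branch, because \ifcat looks at the token \E itself and not at the current \catcode table.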
Now for the case of cat 1,2 changes, for which I will use <>. First, one has to save the catcode-12 versions of these cat 1,2 tokens. I do that with
\def\<{<}
\def\>{>}
before any catcode changes are made. Then I reassign the catcodes with
\catcode`<=1
\catcode`>=2
\let\bgroup<
\let\egroup>
\settcGrouping<<#1>>
\catcode`{=12
\catcode`}=12
The only unusual thing here is the \settcGrouping<<#1>> macro, which tells tokcycle what tokens to place in the output stream for grouping (it uses {_1}_2 as the default, now reset to <_1>_2). This invocation is not really essential for this particular problem, because I don't actually detokenize every token in the output stream. But if I did, it would ensure that the output stream grouping would be populated with the updated <> tokens.
To make the change take effect in this particular approach, I explicitly tell the \Showcats pseudo-environment to display the previously defined \<_1 and \>_2 whenever a grouping situation arises.
The MWE follows. For fun, I do it in plain pdfTeX, since tokcycle can operate in that mode:
\input tokcycle
\def\thistok#1#2{#1$_{#2}$\,\allowbreak}
\catcode`@=11
\def\textvisiblespace{\char"20}
\def\,{\kern2pt}
\long\def\catcomp#1{%
\tctestifx{\implicitsixtok#1}{\expandafter\string#1$_{6}}{%
\string#1$_{%
\tctestifcatnx #1\relax{0}{%
\tctestifcatnx #1${3}{%
\tctestifcatnx #1&{4}{%
\tctestifcatnx #1^{7}{%
\tctestifcatnx #1_{8}{%
\tctestifcatnx #1\@sptoken{10}{%
\tctestifcatnx #1a{11}{%
\tctestifcatnx #11{12}{%
\tctestifcatnx #1~{13}{%
*%
}}}}}}}}}}%
}\,$\allowbreak%
}
\let\deftok\tc@deftok
\deftok\@sptoken{ }% 10
\catcode`@=12
\tt
\let\A$% 3
\let\B&% 4
\let\C#% 6
\let\D^% 7
\let\E_% 8
\deftok\F{ }% 10
\let\G a% 11
\let\H 1% 12
\let\I~% 0, because \I is not active,
% but a macro that takes the same meaning as ~
\let\J\relax% 0
\def\K{xyz}% 0
\catcode`q=\active% 13
\def q{x}
\catcode`Q=\active
\let Q #% This implicit assignment makes
% the catcode of Q=6, rather than 13
\deftok\sptoken{ }% 10
\def\<{<}
\def\>{>}
\catcode`<=1
\catcode`>=2
\let\bgroup<
\let\egroup>
\settcGrouping<<#1>>
\catcode`{=12
\catcode`}=12
\tokcycleenvironment\Showcats
<\ifcatSIX\addcytoks<\thistok<##1><6>>\else\addcytoks<\catcomp<##1>>\fi>
<\addcytoks<\thistok<\<><1>>\processtoks<##1>\addcytoks<\thistok<\>><2>>>
<\addcytoks<\catcomp<##1>>\tctestifx<\par##1><\addcytoks<\par>><>>
<\ifimplicittok\addcytoks<\catcomp<##1>>\else
\addcytoks<\thistok<\textvisiblespace><\number\catcode`##1>>\fi>
\Showcats
\A\B\C\D\E\F\G\H\I\J\K
A9 $x_2^<y+1>$ \today &#~
\space\sptoken qQ{}
\endShowcats
\bye
In the course of formulating this answer, I found a bug in the package. It did not process active-implicit spaces properly. For example,
\makeatletter
\catcode`Q=\active
\tc@deftok Q{ }
\tokcycle{}{}{}{\detokenize{[#1]}}{x y zQw}
does not produce a sensible result when encountering the Q in the input stream.
I have now implemented that feature in the package and uploaded v1.2 (2020-10-01) to CTAN for redistribution.