"Namethatchar" macro
If the Unicode value is known, then there is a mapping from glyph name to Unicode in glyphlist.txt that can quite easily be parsed (with # as the comment character) to get the hex string of the Unicode value. However, this method is quite limited, because many Unicode characters have no glyph name in this list.
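As a rough sketch of how simple the format is: each data line of glyphlist.txt has the form name;hex, so a delimited macro can split it at the semicolon. The names \parseglyphline, \glyphname and \glyphhex are made up for this illustration, and the sketch ignores comment lines and entries that map to more than one code point.

```latex
\documentclass{article}
% Hypothetical helper: split one entry "name;hex" at the semicolon.
\def\parseglyphline#1;#2\relax{%
  \def\glyphname{#1}% glyph name, e.g. Agrave
  \def\glyphhex{#2}%  hex string of the Unicode value, e.g. 00C0
}
\begin{document}
\parseglyphline Agrave;00C0\relax
\glyphname\ has the Unicode value U+\glyphhex
\end{document}
```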
If the name does not matter, then the Unicode hex value can be used directly.
Because you also want to support LICRs (\`A), this is a bit tricky. Package hyperref has to do a similar job when it converts arbitrary TeX strings to PDF strings. Thus the main work is done by \pdfstringdef in the following example:
\documentclass{article}
\usepackage{ifluatex}
\usepackage{ifxetex}
\ifluatex
\else
\ifxetex
\else
\usepackage[utf8]{inputenc}
\fi
\fi
\usepackage[pdfencoding=auto]{hyperref}
\usepackage{pdfescape}
\makeatletter % \@empty below needs @ to be a letter
\newcommand*{\namethatchar}[2][]{%
\begingroup
\hypersetup{unicode}%
\pdfstringdef\gcharname{#2}% global definition, survives the group
\endgroup
\EdefUnescapeString\charname{\gcharname}% PDF string escapes -> raw bytes
\EdefEscapeHex\charname{\charname}% raw bytes -> hex digits
\edef\charname{%
\expandafter\stripBOM\charname\@empty\@empty\@empty\@empty
}%
\ifx\relax#1\relax
\expandafter\charname
\else
\let#1\charname
\fi
}
\newcommand*{\stripBOM}[4]{% drops a leading FEFF
\ifnum"0#1#2#3#4="FEFF %
\else
#1#2#3#4%
\fi
}
\makeatother
\begin{document}
\namethatchar{A}
\namethatchar{À}
\namethatchar{a}
\namethatchar{\`A}
\namethatchar{\texteuro}
\namethatchar{\textpertenthousand}
\end{document}
If the optional argument is specified, the result is stored in that macro instead of being printed:
\namethatchar[\result]{\`A}
\stripBOM strips the byte order mark, which is not needed here. The result is actually UTF-16BE, which means that surrogates are used for higher Unicode values that do not fit in the first plane (Basic Multilingual Plane).
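To make the surrogate remark concrete, here is a worked example for U+1F6B2, a code point picked only for illustration. It follows the standard UTF-16 rule: subtract 10000 (hex), then add the upper and lower ten bits to D800 and DC00. For such a character the macro would be expected to return the eight hex digits D83DDEB2 rather than the code point itself. The block assumes amsmath for align* and \text.

```latex
% Worked example: UTF-16 surrogate pair for U+1F6B2 (outside the BMP)
\begin{align*}
  c' &= \mathtt{1F6B2}_{16} - \mathtt{10000}_{16} = \mathtt{F6B2}_{16} \\
  \text{high} &= \mathtt{D800}_{16} + \lfloor c' / 2^{10} \rfloor
               = \mathtt{D800}_{16} + \mathtt{3D}_{16} = \mathtt{D83D}_{16} \\
  \text{low}  &= \mathtt{DC00}_{16} + (c' \bmod 2^{10})
               = \mathtt{DC00}_{16} + \mathtt{2B2}_{16} = \mathtt{DEB2}_{16}
\end{align*}
```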
Because hyperref's \pdfstringdef supports the big chars of XeTeX and LuaTeX, the solution also works with XeTeX and LuaTeX. In that case the line `\usepackage[utf8]{inputenc}' must not be used, which is why the example loads it only for other engines.
Extended solution with glyph names
This solution additionally looks for glyph names in glyphlist.txt and glyphtounicode.tex. Care is needed to make the mapping unique. Therefore I have left lots of \typeout and \@latex@warning lines to show the assignments and dropped mappings. It makes sense for an application to provide its own unique mapping file to get faster loading and better control of the mappings.
\documentclass{article}
\usepackage{ifluatex}
\usepackage{ifxetex}
\ifluatex
\else
\ifxetex
\else
\usepackage[utf8]{inputenc}
\fi
\fi
\usepackage[T1]{fontenc}
\usepackage[pdfencoding=auto]{hyperref}
\usepackage{pdfescape}
\usepackage{ltxcmds}
\makeatletter
\def\GlyphlistLine#1;#2;#3\@nil{%
\ifx\\#2\\%
\else
\ltx@ifundefined{glyphlist@#2}{%
\ltx@ifundefined{listglyph@#1}{%
\typeout{Defining #2 -> #1}%
\expandafter\gdef\csname glyphlist@#2\endcsname{#1}%
\expandafter\gdef\csname listglyph@#1\endcsname{#2}%
}{%
\@latex@warning{%
#2 -> #1 ignored, because of\MessageBreak
\@nameuse{listglyph@#1} -> #1%
}%
}%
}{%
\edef\ua{\@nameuse{glyphlist@#2}}%
\edef\ub{#1}%
\ifx\ua\ub
\else
\@latex@warning{%
#2 -> #1 ignored, because\MessageBreak
#2 -> \ua\space exists
}%
\fi
}%
\fi
}%
\IfFileExists{glyphlist.txt}\@firstoftwo\@secondoftwo{%
\typeout{* Glyph mapping: glyphlist.txt}%
\begingroup
\catcode`\#=14 % comment
\catcode`\ =9 % ignore spaces
\endlinechar=-1 % ignore line ends
% misusing \@inputcheck to save a read register
\openin\@inputcheck=glyphlist.txt\relax
\loop
\read\@inputcheck to\mapline
\ifeof\@inputcheck
\else
\expandafter\GlyphlistLine\mapline;;\@nil
\repeat
\immediate\closein\@inputcheck
\endgroup
}{%
\@latex@warning@no@line{Missing `glyphlist.txt'}%
}
\IfFileExists{glyphtounicode}\@firstoftwo\@secondoftwo{%
\typeout{* Glyph mapping: glyphtounicode.tex}%
\begingroup
\def\pdfglyphtounicode#1#2{%
\GlyphlistLine#1;#2;\@nil
}%
\endlinechar=-1 %
\catcode`\ =9 %
\input{glyphtounicode.tex}%
\endgroup
}{%
\@latex@warning@no@line{Missing `glyphtounicode.tex'}%
}
\newcommand*{\namethatchar}[2][]{%
\begingroup
\hypersetup{unicode}%
\pdfstringdef\gcharname{#2}%
\endgroup
\EdefUnescapeString\charname{\gcharname}%
\EdefEscapeHex\charname{\charname}%
\edef\charname{%
\expandafter\stripBOM\charname\@empty\@empty\@empty\@empty
}%
\ltx@IfUndefined{glyphlist@\charname}{%
}{%
\expandafter\let\expandafter\charname
\csname glyphlist@\charname\endcsname
}%
\ifx\relax#1\relax
\expandafter\charname
\else
\let#1\charname
\fi
}
\newcommand*{\stripBOM}[4]{%
\ifnum"0#1#2#3#4="FEFF %
\else
#1#2#3#4%
\fi
}
\makeatother
\begin{document}
\newcommand*{\test}[2][]{%
\texttt{\ifx\\#1\\\detokenize{#2}\else#1\fi}
& \texttt{\namethatchar{#2}}\\
}
\begin{tabular}{ll}
\test{A}
\test[À]{À}
\test{a}
\test{\`A}
\test{\texteuro}
\test{\textpertenthousand}
\test{\textsucceqq}% needs hyperref 2012/08/18 or later
\test{\textBicycle}% needs hyperref 2012/08/18 or later
\test{\textcopyleft}% needs hyperref 2012/08/18 or later
\end{tabular}
\end{document}
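The remark above about an application providing its own mapping file could look roughly like the following sketch. The file name mycharnames.tex and the macro \DeclareCharName are hypothetical; only the glyphlist@<hex> naming scheme is taken from the example, and the three entries are ordinary glyphlist.txt names.

```latex
% mycharnames.tex (hypothetical file name)
% One entry per character, keyed by the hex string that
% \namethatchar computes. Inside \csname the catcode of @ does
% not matter, so no \makeatletter is needed here.
\newcommand*{\DeclareCharName}[2]{% #1 = hex value, #2 = glyph name
  \expandafter\gdef\csname glyphlist@#1\endcsname{#2}%
}
\DeclareCharName{0041}{A}
\DeclareCharName{00C0}{Agrave}
\DeclareCharName{20AC}{Euro}
```

Such a file would be read with \input{mycharnames.tex} in place of the two \IfFileExists blocks. The lookup in \namethatchar stays unchanged, and no uniqueness checks are needed at load time because each hex value is declared exactly once.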
Not sure whether you want it typeset or in the log; this version uses \typeout and produces
A is A
À is A grave
a is a
\`A is A grave
\documentclass{article}
\usepackage[utf8]{inputenc}
\makeatletter
\def\namethatchar#1{{%
\let\IeC\@firstofone
\def\'##1{##1 acute}%
\def\`##1{##1 grave}%
\protected@edef\zzz{#1}%
\typeout{\unexpanded{#1} is \zzz}}}
\makeatother
\namethatchar{A}
\namethatchar{À}
\namethatchar{a}
\namethatchar{\`A}
\stop
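The same pattern extends to further accents by adding one local definition per accent command. The sketch below adds \^ and \" as an illustration. The words "circumflex" and "dieresis" are my choice (they follow the usual glyph naming), and only LICR input is used in the test lines.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\makeatletter
\def\namethatchar#1{{%
  \let\IeC\@firstofone
  \def\'##1{##1 acute}%
  \def\`##1{##1 grave}%
  \def\^##1{##1 circumflex}% added for illustration
  \def\"##1{##1 dieresis}%   added for illustration
  \protected@edef\zzz{#1}%
  \typeout{\unexpanded{#1} is \zzz}}}
\makeatother
\namethatchar{\^o}
\namethatchar{\"u}
\stop
```

Every accent needs its own line, so this approach only scales to a small, known set of characters. The hyperref-based solution covers everything that \pdfstringdef understands.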