Unicode char {U+200B}
U+200B is zero width space which you have here:
elim}
from the end of this line
\DeclareMultiCiteCommand{\parencites}[\mkbibparens]{\parencite}{\multicitedelim}
is (using this unicode converter)
U+0065 LATIN SMALL LETTER E e
U+006c LATIN SMALL LETTER L l
U+0069 LATIN SMALL LETTER I i
U+006d LATIN SMALL LETTER M m
U+200c ZERO WIDTH NON-JOINER ‌
U+200b ZERO WIDTH SPACE ​
U+007d RIGHT CURLY BRACKET } } \rbrace
delete that line and re-type without the invisible control characters between the m
and }
This is one of the top hits for U+200B and LaTeX, so I’ll post solutions here.
Take the following example:
\tracinglostchars=2
\documentclass{article}
\pagestyle{empty}
\begin{document}fl fl fl\end{document}
In LuaLaTeX, it compiles to:
The first fl has no ligature because I inserted U+200B, a zero-width space. The second has no ligature because I inserted U+200C, a zero-width non-joiner. These might have been in the original source you copied from intentionally: a zero-width space could mean a potential line break, such as after a slash, and a zero-width non-joiner disables a ligature. For example, the fi in Elfin or the fl in Halfling are (according to pedants like me) not supposed to be ligated, since they belong to different pieces of a compound word. Almost no one bothers to actually do that, but it is much more common in some other languages.
If you try to compile it in PDFLaTeX, you will get the error message that brought you here:
! Package inputenc Error: Unicode character (U+200B)
(inputenc) not set up for use with LaTeX.
There are several ways to fix it.
Clean Your Source by Hand
This is what most people on this site recommend. Your editor might have a way to display special characters so you can delete them all. But really, isn’t this a job for a computer?
Clean your Source in your Editor
This is harder with an invisible, zero-width character, but you might be able to copy zero-width space from a character map, open up your search-and-replace dialogue box and paste the character into the search field. You can then replace it with something like ZWS
or {\hskip 0pt}
.
Clean Your Source with Perl
The following one-line Perl script will create a new source file with all zero-width spaces removed:
perl -CSD -pe "s/\N{U+200B}//gu" < U200B.tex > noU200B.tex
If it’s easier to remember, you could also write this as
perl -CSD -pe "s/\N{ZERO WIDTH SPACE}//gu" < U200B.tex > noU200B.tex
The -CSD
option selects UTF-8 unconditionally, even if you don’t have UTF-8 as your default locale. The -pe
option runs the given Perl script on the input file and prints to the output file. The s
command does substitution, the \N{...}
is a regular expression matching zero-width space, the empty field between //
means replace with nothing, and gu
means replace all instances globally in the unicode string. Then, the <
and >
operators select the input and output files.
Either of these produce a file that compiles to:
It’s also possible to automatically remove all characters outside a given subset. The script
perl -CSD -pe "s/[^\p{Word}\p{Punct}\p{Symbol}\p{Mark}\p{PerlSpace}]//gu"
allows only the following: Unicode “word” characters, punctuation, symbols, accents and a few kinds of spaces. It erases most invisible characters. An even more restrictive version would be
perl -CSD -pe "s/[^\p{ASCII}]//gu"
This cleans out all characters but for the ASCII originally allowed in TeX (including “
instead of double backtick).
And yes, we could replace zero-width space by something instead of nothing. The script
perl -CSD -pe "s/\N{ZERO WIDTH SPACE}/{\\\\hskip 0pt}/gu; s/\N{ZERO WIDTH NON-JOINER}/{}/gu"
given the above MWE as input, produces the following output:
\tracinglostchars=2
\documentclass{article}
\pagestyle{empty}
\begin{document}f{\hskip 0pt}l f{}l fl\end{document}
Teach LaTeX to Understand Zero-Width Space
If the problem is that U+200B is “not set up for use with LaTeX,” but it’s equivalent to a TeX command—\hskip 0pt
or \hspace{0pt}
are zero-width spaces that prevent ligatures and enable a potential line break—we can set the character up to use that command.
\tracinglostchars=2
\documentclass{article}
\usepackage{iftex}
\pagestyle{empty}
\ifTUTeX
\usepackage{fontspec}
\else
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % The default since 2018
\DeclareUnicodeCharacter{200B}{{\hskip 0pt}}
\fi
\begin{document}fl fl fl\end{document}
Although the \DeclareUnicodeCharacter
command is in inputenc
, the LaTeX kernel has loaded it by default since 2018. So, we could have skipped declaring it.