Unicode -(U+301) error in biblatex, but not in main text: {\'{\i}}

With biblatex and Biber the best solution™ is of course to use the correct Unicode characters (and ideally the precomposed characters: Åström, not a combination of the combining characters: Åström) in the source.

author    = {Qinsi Zheng and Steffen Jockusch and Gabriel G. Rodríguez-Calero
             and Zhou Zhou and Hong Zhao and Roger B. Altman and Héctor D. Abruña
             and Scott C. Blanchard},

The benefit of this solution is that it is easier to read, just works, and avoids the additional braces that BibTeX needs. Biber still accepts those braces for simplicity and backwards compatibility, but they can destroy kerning and are otherwise unnecessary for Biber; see How to write “ä” and other umlauts and accented letters in bibliography? for why BibTeX needs them.

If that is not possible and you can't replace {\'{\i}} with {\'i} in the source, you can try a sourcemap as shown in PLK's answer to Input encoding error after upgrading from Biber 1.9 to Biber 2.1.

The logistical drawback of that approach is that you need to add a substitution rule for every possible problematic combination.

To offer some additional benefit over PLK's answer, the code below uses the new loop functionality to replace \`{\i}, \'{\i}, \^{\i} and \"{\i} (all Latin-1 dotless-i combinations) for (hopefully) all fields where it makes sense.

\documentclass{article}
\usepackage[english]{babel}
\usepackage[utf8]{inputenc}
\usepackage{csquotes}
\usepackage[style = authoryear, backend = biber, maxbibnames=999]{biblatex}
\addbibresource{\jobname.bib}

\DeclareDatafieldSet{setall}{
  \member[datatype=literal]
  \member[datatype=name]
  \member[field=journal]% journal is special since it is
                        % actually journaltitle
}

\DeclareSourcemap{
  \maps[datatype=bibtex]{
    \map[overwrite, foreach={setall}]{
      % \`{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0300}},
            replace=\regexp{\x{00EC}}]
      % \'{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0301}},
            replace=\regexp{\x{00ED}}]
      % \^{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0302}},
            replace=\regexp{\x{00EE}}]
      % \"{\i}
      \step[fieldsource=\regexp{$MAPLOOP},
            match=\regexp{\x{0131}\x{0308}},
            replace=\regexp{\x{00EF}}]
    }
  }
}

\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@article{itest,
  author  = {Lo{\"{\i}}c Rodr{\'{\i}}guez-Calero},
  title   = {Lor{\"{\i}}m {\'{\i}}psum and {\`{\i}}v{\^{\i}}n},
  journal = {Dol{\"{\i}}r s{\'{\i}}t},
  note    = {Am{\"{\i}}t cons{\'{\i}}ctur},
  date    = {2018},
}
@article{Zheng2016,
  author    = {Qinsi Zheng and Steffen Jockusch
               and Gabriel G. Rodr{\'{\i}}guez-Calero
               and Zhou Zhou and Hong Zhao and Roger B. Altman
               and H{\'e}ctor D. Abru{\~n}a and Scott C. Blanchard},
  title     = {Intra-molecular triplet energy transfer is a general
               approach to improve organic fluorophore photostability},
  journal   = {Photochemical {\&} Photobiological Sciences},
  year      = {2016},
  volume    = {15},
  number    = {2},
  pages     = {196--203},
  doi       = {10.1039/c5pp00400d},
}
\end{filecontents}

\begin{document}
\parencite{Zheng2016}
\cite{itest}

\printbibliography
\end{document}

With the sourcemap in place, the itest entry is typeset in the bibliography as:

Rodríguez-Calero, Loïc (2018). “Lorïm ípsum and ìvîn”. In: Dolïr sít. Amït consíctur.

Why is this Unicode business such an issue?

Unicode combines characters by adding the combining marks after the base glyph. LaTeX works exactly the other way round: The combining accents are added before the glyph (as a macro that gets the base glyph as argument).

Biber 'parses' the LaTeX character macros and converts them to Unicode characters for sorting and the like. It does so using simple translations of macros into Unicode code points, followed by the more complex Unicode composition rules.

Combining characters involving i are particularly complicated since LaTeX usually bases its accented characters on the 'dotless i' (\i, ı, U+0131) to avoid clashes of accent and tittle, whereas Unicode seems to prefer basing its combinations on the 'small i' (i, U+0069); see http://unicode.org/faq/char_combmark.html#22. That means that \'i gets converted to í (U+00ED), but \'\i to ı́ (U+0131 + U+0301, a combination of the dotless i and the combining accent).

LaTeX's inputenc can only deal with a sensible subset of Unicode and fails to account for ı́ (U+0131 + U+0301) while it handles í (U+00ED) just fine.
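
The difference can be reproduced without biblatex. The following is only a minimal sketch for pdfLaTeX with inputenc: the macro forms and the precomposed character compile, while the decomposed sequence (left commented out here) is expected to raise the "not set up for use with LaTeX" error from the question title once it is uncommented.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc}
\begin{document}
% Macro input: LaTeX places the accent itself, so both forms work.
\'i \quad \'{\i}

% Precomposed U+00ED: known to inputenc, works.
í

% Decomposed U+0131 + U+0301 (what Biber produces from {\'{\i}}):
% uncomment the next line to trigger the U+0301 error.
% ı́
\end{document}
```

This is why the same {\'{\i}} is harmless in the main text but breaks once Biber has written it to the .bbl file as two Unicode code points.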

See also PLK's explanation in the linked answer as well as comments in https://github.com/plk/biber/issues/65 and https://github.com/plk/biblatex/issues/819.

Another solution that needs no such tricks, but might not be compatible with your workflow, is to use a proper Unicode engine like LuaLaTeX or XeLaTeX and a font that has properly kerned accents (Linux Libertine, for example).
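
A minimal sketch of that setup might look as follows. It assumes that the libertine package is installed and selects Linux Libertine through fontspec on Unicode engines; the bibliography entry is a made-up test entry, and no sourcemap is used. Compile with lualatex (or xelatex), then biber, then lualatex again.

```latex
\documentclass{article}
\usepackage[english]{babel}
% Assumption: libertine sets up Linux Libertine on LuaLaTeX/XeLaTeX.
\usepackage{libertine}
\usepackage{csquotes}
\usepackage[style = authoryear, backend = biber]{biblatex}
\addbibresource{\jobname.bib}

% Hypothetical test entry, written with the problematic {\'{\i}} input.
\usepackage{filecontents}
\begin{filecontents}{\jobname.bib}
@article{itest,
  author  = {Gabriel G. Rodr{\'{\i}}guez-Calero},
  title   = {A Title},
  journal = {A Journal},
  date    = {2018},
}
\end{filecontents}

\begin{document}
\cite{itest}

\printbibliography
\end{document}
```

Here the engine reads the decomposed ı́ from the .bbl file natively, so no inputenc error should occur; how well the accent is placed then depends on the font.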
