Having problems with listings and UTF-8. Can it be fixed?
One way around this limitation of listings is to set the option extendedchars=true and then use the literate option for each accented character you are going to use (it's a bit tedious, but once you've covered all the accents of your language, you never have to worry about them again). The syntax is
literate={á}{{\'a}}1 {ã}{{\~a}}1 {é}{{\'e}}1
For each accented character, put the real character inside braces (e.g. {á}), then what you want it to be typeset as inside double braces (e.g. {{\'a}}), and finally the number 1, which is the width of the replacement in output characters; you can put a space between two entries for clarity.
Here's your example modified to use this:
\documentclass[12pt,a4paper]{scrbook}
\KOMAoptions{twoside=false,open=any,chapterprefix=on,parskip=full,fontsize=14pt}
\usepackage[portuguese]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listings}
\usepackage{xcolor}
\usepackage{inconsolata}
\lstset{
language=bash, %% Change to PHP, C, Java, etc.; bash is the default
basicstyle=\ttfamily\small,
numberstyle=\footnotesize,
numbers=left,
backgroundcolor=\color{gray!10},
frame=single,
tabsize=2,
rulecolor=\color{black!30},
title=\lstname,
escapeinside={\%*}{*)},
breaklines=true,
breakatwhitespace=true,
framextopmargin=2pt,
framexbottommargin=2pt,
inputencoding=utf8,
extendedchars=true,
literate={á}{{\'a}}1 {ã}{{\~a}}1 {é}{{\'e}}1,
}
\begin{document}
\begin{lstlisting}
<?php
echo 'Olá mundo!';
print 'áãé';
\end{lstlisting}
\end{document}
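The example above only maps á, ã and é. As a rough sketch of what a fuller table might look like, here is a literate list covering the lowercase accented letters commonly used in Portuguese; the selection of characters is my own assumption, so add uppercase variants or anything else your listings need in the same way:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{listings}
\lstset{
  basicstyle=\ttfamily\small,
  extendedchars=true,
  % One entry per character: {input character}{{replacement}}width
  literate=
    {á}{{\'a}}1 {à}{{\`a}}1 {â}{{\^a}}1 {ã}{{\~a}}1
    {é}{{\'e}}1 {ê}{{\^e}}1
    {í}{{\'i}}1
    {ó}{{\'o}}1 {ô}{{\^o}}1 {õ}{{\~o}}1
    {ú}{{\'u}}1
    {ç}{{\c c}}1,
}
\begin{document}
\begin{lstlisting}
echo 'Olá, ação, você, coração, número';
\end{lstlisting}
\end{document}
```

Putting the table in the preamble once means every listing in the document picks it up, and the source code itself stays untouched, so cut-and-paste keeps working.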
Escape those characters to LaTeX, as the documentation (listings manual, page 14) suggests:
Similarly, if you are using UTF-8 extended characters in a listing, they must be placed within an escape to LaTeX.
\documentclass[12pt,a4paper]{scrbook}
\KOMAoptions{twoside=false,open=any,chapterprefix=on,parskip=full,fontsize=14pt}
\usepackage[portuguese]{babel}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{listingsutf8}
\usepackage{xcolor}
\usepackage{inconsolata}
\lstset{
language=bash, %% Change to PHP, C, Java, etc.; bash is the default
basicstyle=\ttfamily\small,
numberstyle=\footnotesize,
numbers=left,
backgroundcolor=\color{gray!10},
frame=single,
tabsize=2,
rulecolor=\color{black!30},
title=\lstname,
escapeinside={\%*}{*)},
breaklines=true,
breakatwhitespace=true,
framextopmargin=2pt,
framexbottommargin=2pt,
extendedchars=false,
inputencoding=utf8
}
\begin{document}
\begin{lstlisting}
<?php
echo '%*Olá mundo*)!';
print '%*Olá mundo*)!';
\end{lstlisting}
\end{document}
The way the inputenc package handles non-ASCII UTF-8-encoded characters (making the first byte active and then reading the following bytes as its arguments) is fundamentally incompatible with the way the listings package works: it reads each byte individually and expects it to be a complete character.
The listingsutf8 package tries to work around this for the case where your characters can be converted to some 8-bit encoding (and you are using pdfLaTeX), but this works only with \lstinputlisting (as Marc's answer pointed out), not with inline listings (code typed directly in the document). For inline listings the literate option (as pointed out by Phillipe) sounds good. An alternative would be escaping to LaTeX (as pointed out by Gonzalo), but this breaks simple cut-and-paste.
The last time I had to typeset code that included non-ASCII Unicode characters (such as ℤ in Java identifiers, which is not in any 8-bit encoding, AFAIK), I switched to XeLaTeX, which supports UTF-8 input out of the box, without needing the inputenc package. With this, it worked nicely. I suppose LuaLaTeX would work the same way (but it was not that mature then).
(But I later wanted the comments to be formatted, too, thus I started/revived my ltxdoclet project to include source code and formatted comments.)