Can't copy-paste from my PDF. Any idea why?
It depends on the fonts you are using. You can list them with pdffonts (from Poppler/Xpdf):
pdffonts test.pdf
Type 3 fonts
Try to avoid them. In a TeX workflow they are usually bitmap fonts, which do not scale well. Their characters also lack meaningful glyph names, which text extraction tools rely on.
If you are using \usepackage[T1]{fontenc} with the standard fonts, you get the EC fonts. Install cm-super to get their Type 1 versions. Alternatively, use their successor, Latin Modern (\usepackage{lmodern}).
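A minimal sketch of the Latin Modern route, assuming lmodern is installed (it ships with the usual TeX Live and MiKTeX installations); the class and the body text are only placeholders. After compiling with pdflatex, pdffonts should list Type 1 fonts rather than Type 3.

```latex
\documentclass{article}
% T1 font encoding (with the standard fonts this would select the EC fonts)
\usepackage[T1]{fontenc}
% Latin Modern: Type 1 successor fonts, so no Type 3 bitmaps are needed
\usepackage{lmodern}
\begin{document}
Text to copy from the PDF: Ärger, naïve, façade.
\end{document}
```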
Package cmap
It works from LaTeX's font encodings and adds mappings from glyph slot positions to Unicode code points.
Load the package at the very beginning, before \documentclass:
\RequirePackage{cmap}
\documentclass{...}
The package depends neither on correct glyph names in the font nor on the font type. On the other hand, encodings it does not know (for symbol fonts, …) are poorly supported.
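For reference, a complete document using cmap might look like the following sketch, compiled with pdflatex; the class, the font packages and the text are placeholders, and only the position of \RequirePackage{cmap} comes from the answer.

```latex
% cmap is loaded first, before \documentclass
\RequirePackage{cmap}
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\begin{document}
Text to copy from the PDF.
\end{document}
```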
\pdfgentounicode
This pdfTeX primitive adds Unicode mappings based on the glyph names in the fonts. It does not work for Type 3 fonts. The results are only as good as the glyph names: the fonts should contain correct glyph names, and a mapping from those names to Unicode must be provided.
Usage:
\pdfgentounicode=1 %
\input glyphtounicode.tex %
The file glyphtounicode.tex contains predefined mappings from many glyph names to Unicode.
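The two lines above normally go into the preamble. A sketch of a full document, assuming pdflatex; the \ifdefined guard is an addition of this example, not part of the original usage, and simply skips the setup on engines that lack the \pdfgentounicode primitive. It uses Latin Modern because the method does not work with Type 3 fonts, and it deliberately does not load cmap.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
% \pdfgentounicode is a pdfTeX primitive; skip the setup elsewhere
\ifdefined\pdfgentounicode
  % predefined mappings from glyph names to Unicode
  \input glyphtounicode.tex
  % generate the Unicode mappings from the glyph names
  \pdfgentounicode=1
\fi
\begin{document}
Text to copy from the PDF.
\end{document}
```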
Package accsupp
The package uses the PDF feature /ActualText, which specifies the text that should be used in place of the displayed glyphs. This allows finer control, for example over how symbols are extracted.
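A small sketch of accsupp in use, assuming the method=hex and unicode options as documented in the accsupp manual (the value is then given as hexadecimal UTF-16 code units). The macro name \copyrightsign is hypothetical; it typesets the copyright symbol and asks the viewer to extract it as U+00A9.

```latex
\documentclass{article}
\usepackage{accsupp}
% hypothetical helper: copyright sign with /ActualText set to U+00A9
\newcommand*{\copyrightsign}{%
  \BeginAccSupp{method=hex,unicode,ActualText=00A9}%
  \textcopyright
  \EndAccSupp{}%
}
\begin{document}
\copyrightsign{} The Authors
\end{document}
```

Whether the /ActualText is honored when copying still depends on the viewer.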
PDF viewer/text extractor
The result also depends on the PDF viewer or text extraction tool and on the features it uses to identify the characters. Some work only with glyph names and slot positions, others support the Unicode mappings (they should), and the more advanced ones also support the /ActualText feature.
Addition:
Package cmap and \pdfgentounicode should not be used together, because both add the same data structure (the /ToUnicode entry) to the fonts in the PDF file. If the two mappings were exactly identical, this would not be much of a problem, but they can differ, and the duplicate entries violate the PDF specification. The behavior of PDF readers then becomes unpredictable, because they are free to choose which of the values to use for the same key.