Is it possible to produce a PDF with un-copyable text?
Besides converting all text to images, one method I know of is to destroy the CMaps of the fonts. We can use the cmap
package together with a special CMap file for this purpose. This file is written out by the VerbatimOut environment below.
(Warning: producing an un-copyable PDF does not make much sense, since OCR is very easy today.)
% pdflatex is required
\documentclass{article}
\usepackage[resetfonts]{cmap}
\usepackage{fancyvrb}
\begin{VerbatimOut}{ot1.cmap}
%!PS-Adobe-3.0 Resource-CMap
%%DocumentNeededResources: ProcSet (CIDInit)
%%IncludeResource: ProcSet (CIDInit)
%%BeginResource: CMap (TeX-OT1-0)
%%Title: (TeX-OT1-0 TeX OT1 0)
%%Version: 1.000
%%EndComments
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (TeX)
/Ordering (OT1)
/Supplement 0
>> def
/CMapName /TeX-OT1-0 def
/CMapType 2 def
1 begincodespacerange
<00> <7F>
endcodespacerange
8 beginbfrange
<00> <01> <0000>
<09> <0A> <0000>
<23> <26> <0000>
<28> <3B> <0000>
<3F> <5B> <0000>
<5D> <5E> <0000>
<61> <7A> <0000>
<7B> <7C> <0000>
endbfrange
40 beginbfchar
<02> <0000>
<03> <0000>
<04> <0000>
<05> <0000>
<06> <0000>
<07> <0000>
<08> <0000>
<0B> <0000>
<0C> <0000>
<0D> <0000>
<0E> <0000>
<0F> <0000>
<10> <0000>
<11> <0000>
<12> <0000>
<13> <0000>
<14> <0000>
<15> <0000>
<16> <0000>
<17> <0000>
<18> <0000>
<19> <0000>
<1A> <0000>
<1B> <0000>
<1C> <0000>
<1D> <0000>
<1E> <0000>
<1F> <0000>
<21> <0000>
<22> <0000>
<27> <0000>
<3C> <0000>
<3D> <0000>
<3E> <0000>
<5C> <0000>
<5F> <0000>
<60> <0000>
<7D> <0000>
<7E> <0000>
<7F> <0000>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
%%EndResource
%%EOF
\end{VerbatimOut}
\usepackage{lipsum}
\begin{document}
\lipsum
\end{document}
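The hand-written table in ot1.cmap can also be generated. What follows is only a minimal sketch and not the method used above: a hypothetical helper, gen_null_bfchar, prints bfchar entries that send every code from <00> to <7F> to <0000>. Its output could stand in for the beginbfrange and beginbfchar blocks. As far as I know the CMap format allows at most 100 entries per block, which is why the sketch splits the 128 codes into two blocks.

```sh
#!/bin/sh
# Sketch: emit bfchar blocks that map every 7-bit code to U+0000.
# gen_null_bfchar is a hypothetical helper, not part of the cmap package.
gen_null_bfchar() {
    start=0
    while [ "$start" -lt 128 ]; do
        # Assumption: a block may hold at most 100 entries,
        # so the 128 codes go into two blocks of 64.
        echo "64 beginbfchar"
        i=$start
        while [ "$i" -lt $((start + 64)) ]; do
            printf '<%02X> <0000>\n' "$i"
            i=$((i + 1))
        done
        echo "endbfchar"
        start=$((start + 64))
    done
}

gen_null_bfchar
```

Redirecting the output to a file and pasting it between begincmap and endcmap should give a shorter source than listing each entry by hand.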
LuaTeX allows manipulating fonts in the define_font callback.
Luaotfload facilitates this even more with an extra hook it installs
right after the font loader has finished its job: the
luaotfload.patch_font callback.
Normally it is used for serious and constructive tasks like setting a
couple of font dimensions or ensuring backward compatibility in the data
structures.
Of course, it can also be abused for dirty hacks like disabling copy
and paste.

At the point where the patch_font callback is applied, the font is
already defined and ready to use.
All necessary tables are created and put in a place where LuaTeX
expects them.
Among these is the characters table that holds preprocessed
information about the glyphs.
In the code below we modify the tounicode field of each glyph so
that it maps to some random location within the printable ASCII range.
Note that this does not affect the shape and metrics of the glyph since
those are unrelated to the actual codepoint.
As a consequence, the PDF will contain legible text that cannot be
copied.
Package file obfuscate.lua:
packagedata = packagedata or { }

local mathrandom   = math.random
local stringformat = string.format

--- this is the callback by means of which we will obfuscate
--- the tounicode values so they map to random characters of
--- the printable ascii range (between 0x21 / 33 and 0x7e / 126)
local obfuscate = function (tfmdata, _specification)
  if not tfmdata or type (tfmdata) ~= "table" then
    return
  end
  local characters = tfmdata.characters
  if characters then
    for codepoint, char in next, characters do
      char.tounicode = stringformat ([[%0.4X]], mathrandom (0x21, 0x7e))
    end
  end
end

--- we also need some functions to toggle the callback activation so
--- we can obfuscate fonts selectively
local active = false

packagedata.obfuscate_begin = function ()
  if not active then
    luatexbase.add_to_callback ("luaotfload.patch_font", obfuscate,
                                "user.obfuscate_font", 1)
    active = true
  end
end

packagedata.obfuscate_end = function ()
  if active then
    luatexbase.remove_from_callback ("luaotfload.patch_font",
                                     "user.obfuscate_font")
    active = false
  end
end
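To get a feeling for what a text extractor will see, the remapping can be simulated outside TeX. The following is a rough sketch under stated assumptions, not part of the package: obfuscate_preview is a made-up helper that gives every distinct character one random replacement between 0x21 and 0x7e, the way the callback assigns one tounicode value per glyph. Spaces pass through untouched because TeX sets them as glue rather than as glyphs.

```sh
#!/bin/sh
# obfuscate_preview SEED  (text on stdin)
# Hypothetical simulation of the tounicode scrambling: each distinct
# character gets one random printable ASCII replacement (33..126).
obfuscate_preview() {
    awk -v seed="$1" '
    BEGIN { srand(seed) }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == " ") { out = out c; continue }   # glue, not a glyph
            if (!(c in map))
                map[c] = sprintf("%c", 33 + int(rand() * 94))
            out = out map[c]
        }
        print out
    }'
}

printf 'obfuscated text before\n' | obfuscate_preview 1
```

Because the same glyph always receives the same replacement within one font, the result is effectively a simple substitution: repeated letters stay recognisable as repeats, and two different glyphs may happen to land on the same character.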
Usage demonstration:
%% we will need these packages
\input luatexbase.sty
\input luaotfload.sty
%% for inspecting the pdf with an ordinary editor
\pdfcompresslevel0
\pdfobjcompresslevel0
%% load obfuscation code
\RequireLuaModule {obfuscate}
%% convenience macro
\def \packagecmd #1{\directlua {packagedata.#1}}
%% the obfuscate environment, mapping to Lua functions that enable and
%% disable tounicode obfuscation
\def \beginobfuscate {\packagecmd {obfuscate_begin ()}}
\def \endobfuscate {\packagecmd {obfuscate_end ()}}
%%···································································%%
%% Demo
%%···································································%%
%% firstly, load some fonts. within the “obfuscate” environment all
%% fonts will get their cmaps scrambled ...
\beginobfuscate
\font \mainfont = "file:Iwona-Regular.otf:mode=base"
\font \italicfont = "file:Iwona-Italic.otf:mode=base"
\endobfuscate
%% ... while fonts defined outside will have the mapping intact
\font \boldfont = "file:Iwona-Bold.otf:mode=base"
\font \bolditalicfont = "file:Iwona-BoldItalic.otf:mode=base"
%% now we can use them in our document like any ordinary font
\mainfont
obfuscated text before {\italicfont obfuscated too} and after \par
obfuscated text before {\boldfont not obfuscated} and after \par
obfuscated text before {\bolditalicfont not obfuscated} and after \par
\bye
In a PDF viewer the result is ordinary, legible text.
Contrast this with the output of pdftotext:
\rf2yC'I_J I_dI r_f\{_ 9;H`bp<<L& <99 '5J 'fI_{
\rf2yC'I_J I_dI r_f\{_ not obfuscated '5J 'fI_{
\rf2yC'I_J I_dI r_f\{_ not obfuscated '5J 'fI_{
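The same comparison can be scripted. This is a small sketch, assuming pdftotext is installed; is_copyable is a hypothetical helper name and demo.pdf a placeholder file name. The function succeeds only if the given phrase appears literally in the extracted text.

```sh
#!/bin/sh
# is_copyable PDF PHRASE
# Succeeds if PHRASE occurs literally in the text pdftotext extracts.
# (Hypothetical helper; "pdftotext FILE -" writes the text to stdout.)
is_copyable() {
    pdftotext "$1" - | grep -qF -- "$2"
}

# Example call (demo.pdf is a placeholder name):
#   is_copyable demo.pdf "not obfuscated" && echo "copyable"
```

For the demo document above one would expect the check to succeed for "not obfuscated" and to fail for "obfuscated too".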
But please forget about all this immediately and never obfuscate a production text -- don’t be mean to your readers!
EDIT: Because the generous karma donor specifically asked for a ConTeXt solution, I’ll throw that one in as a bonus. It is a good deal more elegant since it relies on the font goodies mechanism, which allows applying postprocessors to specific fonts; these can afterwards be used just like common font features.
\startluacode
local mathrandom   = math.random
local stringformat = string.format

--- create a postprocessor
local obfuscate = function (tfmdata)
  fonts.goodies.registerpostprocessor (tfmdata, function (tfmdata)
    if not tfmdata or type (tfmdata) ~= "table" then
      return
    end
    local characters = tfmdata.characters
    if characters then
      for codepoint, char in next, characters do
        char.tounicode = stringformat ([[%0.4X]], mathrandom (0x21, 0x7e))
      end
    end
  end)
end

--- now register as a font feature
fonts.handlers.otf.features.register {
  name         = "obfuscate",
  description  = "treat the reader like a piece of garbage",
  default      = false,
  initializers = {
    base = obfuscate,
    node = obfuscate,
  }
}
\stopluacode
%%···································································%%
%% demonstration
%%···································································%%
%% we can now treat the obfuscation postprocessor like any other
%% font feature
\definefontfeature [obfuscate] [obfuscate=yes]
\definefont [mainfont] [file:Iwona-Regular.otf*obfuscate]
\definefont [italicfont] [file:Iwona-Italic.otf*obfuscate]
\definefont [boldfont] [file:Iwona-Bold.otf]
\definefont [bolditalicfont] [file:Iwona-BoldItalic.otf]
\starttext
\mainfont
obfuscated text before {\italicfont obfuscated too} and after \par
obfuscated text before {\boldfont not obfuscated} and after \par
obfuscated text before {\bolditalicfont not obfuscated} and after \par
\stoptext
Remarks
I use a little script that converts all fonts to paths. It takes the input .pdf file as its first parameter
and writes the output to a file with the same base name and the suffix -rst.pdf.
You need Ghostscript for the script to run.
Implementation
Runs in bash (the script only uses POSIX sh features)
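#!/bin/sh
# Usage: pass the input PDF as the first argument; writes <name>-rst.pdf
GS=/usr/bin/gs
# ${1%.*} strips only the last extension, so ./a.b.pdf becomes ./a.b-rst.pdf
OUT="${1%.*}-rst.pdf"
"$GS" -sDEVICE=pswrite -dNOCACHE -sOutputFile=- -q -dBATCH -dNOPAUSE "$1" -c quit | ps2pdf - > "$OUT"
if [ $? -eq 0 ]; then
    echo "Output written to $OUT"
else
    echo "There were errors. See the output."
fi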
Use ps2write (instead of pswrite) these days.
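The output name in the script above comes from a suffix-stripping parameter expansion. As a small illustration (the helper names are made up for this sketch), the two POSIX forms behave differently as soon as the path contains more than one dot:

```sh
#!/bin/sh
# Hypothetical helpers comparing the two POSIX suffix-stripping forms.
strip_longest()  { printf '%s\n' "${1%%.*}"; }   # cut at the FIRST dot
strip_shortest() { printf '%s\n' "${1%.*}"; }    # cut at the LAST dot

for f in report.pdf ./report.pdf thesis.v2.pdf; do
    printf '%s: longest=[%s] shortest=[%s]\n' \
        "$f" "$(strip_longest "$f")" "$(strip_shortest "$f")"
done
```

With a path such as ./report.pdf the longest match removes everything, because the string already starts with a dot, so the shortest match is the safer choice when deriving an output file name.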