Is it possible to produce a PDF with un-copyable text?

Besides converting all text to images, one method I know of is to destroy the CMaps of the fonts. The cmap package can do this when it is given a specially crafted CMap file. In the example below, that file is written out by the VerbatimOut environment of fancyvrb.

(Warning: producing an un-copyable PDF does not make much sense. OCR is very easy today.)

% pdflatex is required
\documentclass{article}
\usepackage[resetfonts]{cmap}
\usepackage{fancyvrb}
\begin{VerbatimOut}{ot1.cmap}
%!PS-Adobe-3.0 Resource-CMap
%%DocumentNeededResources: ProcSet (CIDInit)
%%IncludeResource: ProcSet (CIDInit)
%%BeginResource: CMap (TeX-OT1-0)
%%Title: (TeX-OT1-0 TeX OT1 0)
%%Version: 1.000
%%EndComments
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (TeX)
/Ordering (OT1)
/Supplement 0
>> def
/CMapName /TeX-OT1-0 def
/CMapType 2 def
1 begincodespacerange
<00> <7F>
endcodespacerange
8 beginbfrange
<00> <01> <0000>
<09> <0A> <0000>
<23> <26> <0000>
<28> <3B> <0000>
<3F> <5B> <0000>
<5D> <5E> <0000>
<61> <7A> <0000>
<7B> <7C> <0000>
endbfrange
40 beginbfchar
<02> <0000>
<03> <0000>
<04> <0000>
<05> <0000>
<06> <0000>
<07> <0000>
<08> <0000>
<0B> <0000>
<0C> <0000>
<0D> <0000>
<0E> <0000>
<0F> <0000>
<10> <0000>
<11> <0000>
<12> <0000>
<13> <0000>
<14> <0000>
<15> <0000>
<16> <0000>
<17> <0000>
<18> <0000>
<19> <0000>
<1A> <0000>
<1B> <0000>
<1C> <0000>
<1D> <0000>
<1E> <0000>
<1F> <0000>
<21> <0000>
<22> <0000>
<27> <0000>
<3C> <0000>
<3D> <0000>
<3E> <0000>
<5C> <0000>
<5F> <0000>
<60> <0000>
<7D> <0000>
<7E> <0000>
<7F> <0000>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
%%EndResource
%%EOF
\end{VerbatimOut}

\usepackage{lipsum}

\begin{document}

\lipsum

\end{document}
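
The long list of identical bfchar entries above does not have to be typed by hand. As a minimal sketch (the helper name bfchar_zero is hypothetical and not part of the cmap package), a small shell loop can print the lines for any range of character codes. The number in front of beginbfchar still has to match the number of lines that end up in the block.

```shell
#!/bin/sh
# bfchar_zero is a hypothetical helper: it prints one "<XX> <0000>" line
# for every character code from $1 to $2 (decimal, inclusive).
bfchar_zero() {
    code=$1
    while [ "$code" -le "$2" ]; do
        printf '<%02X> <0000>\n' "$code"
        code=$((code + 1))
    done
}

# Codes 0x02..0x08, i.e. the first seven bfchar entries of the file above.
bfchar_zero 2 8
```

The output can be pasted between beginbfchar and endbfchar inside the VerbatimOut environment.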

LuaTeX allows manipulating fonts in the define_font callback. Luaotfload makes this even easier with an extra hook that it installs right after the font loader has finished its job: the luaotfload.patch_font callback. Normally it is used for serious and constructive tasks like setting a couple of font dimensions or ensuring backward compatibility in the data structures. Of course, it can also be abused for dirty hacks like disabling copy and paste.

At the point where the patch_font callback is applied, the font is already defined and ready to use. All necessary tables have been created and put where LuaTeX expects them. Among these is the characters table, which holds preprocessed information about the glyphs. In the code below we modify the tounicode field of each glyph so that it maps to a random location within the printable ASCII range. Note that this does not affect the shape or metrics of the glyph, since those are unrelated to the actual codepoint. As a consequence, the PDF will contain legible text that cannot be copied.
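
To make the value format concrete: each scrambled tounicode entry is a four-digit uppercase hex string naming a code point between 0x21 and 0x7E. The following shell sketch only illustrates that format. The helper random_tounicode is hypothetical, uses awk for the random number, and is not part of the package.

```shell
#!/bin/sh
# random_tounicode is a hypothetical helper, not part of obfuscate.lua.
# It prints one code point from the printable ASCII range 0x21..0x7E
# as a four-digit uppercase hex string, e.g. 0041 for "A".
random_tounicode() {
    # optional first argument: seed for awk's random number generator
    awk -v seed="${1:-}" 'BEGIN {
        if (seed != "") srand(seed); else srand()
        # 0x7E - 0x21 + 1 = 94 possible values, starting at 33 (0x21)
        printf "%04X\n", 33 + int(rand() * 94)
    }'
}

random_tounicode
```

Because every glyph gets its own independent random value, two different glyphs may end up with the same target, which makes the mapping impossible to invert from the extracted text alone.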

Package file obfuscate.lua:

packagedata = packagedata or { }

local mathrandom    = math.random
local stringformat  = string.format

--- this is the callback by means of which we will obfuscate
--- the tounicode values so they map to random characters of
--- the printable ascii range (between 0x21 / 33 and 0x7e / 126)

local obfuscate = function (tfmdata, _specification)
  if not tfmdata or type (tfmdata) ~= "table" then
    return
  end

  local characters = tfmdata.characters
  if characters then
    for codepoint, char in next, characters do
      char.tounicode = stringformat ([[%0.4X]], mathrandom (0x21, 0x7e))
    end
  end
end

--- we also need some functions to toggle the callback activation so
--- we can obfuscate fonts selectively

local active = false

packagedata.obfuscate_begin = function ()
  if not active then
    luatexbase.add_to_callback ("luaotfload.patch_font", obfuscate,
                                "user.obfuscate_font", 1)
    active = true
  end
end

packagedata.obfuscate_end = function ()
  if active then
    luatexbase.remove_from_callback ("luaotfload.patch_font",
                                     "user.obfuscate_font")
    active = false
  end
end

Usage demonstration:

%% we will need these packages
\input luatexbase.sty
\input luaotfload.sty

%% for inspecting the pdf with an ordinary editor
\pdfcompresslevel0
\pdfobjcompresslevel0

%% load obfuscation code
\RequireLuaModule {obfuscate}

%% convenience macro
\def \packagecmd #1{\directlua {packagedata.#1}}

%% the obfuscate environment, mapping to Lua functions that enable and
%% disable tounicode obfuscation
\def \beginobfuscate {\packagecmd {obfuscate_begin ()}}
\def \endobfuscate   {\packagecmd {obfuscate_end   ()}}

%%···································································%%
%% Demo
%%···································································%%

%% firstly, load some fonts. within the “obfuscate” environment all
%% fonts will get their cmaps scrambled ...

\beginobfuscate

  \font \mainfont   = "file:Iwona-Regular.otf:mode=base"
  \font \italicfont = "file:Iwona-Italic.otf:mode=base"

\endobfuscate

%% ... while fonts defined outside will have the mapping intact

\font \boldfont       = "file:Iwona-Bold.otf:mode=base"
\font \bolditalicfont = "file:Iwona-BoldItalic.otf:mode=base"

%% now we can use them in our document like any ordinary font

\mainfont
obfuscated text before {\italicfont     obfuscated too} and after \par
obfuscated text before {\boldfont       not obfuscated} and after \par
obfuscated text before {\bolditalicfont not obfuscated} and after \par

\bye

Result in PDF viewer:

[Screenshot: the rendered page as displayed in the PDF viewer]

Contrast this with the output of pdftotext:

\rf2yC'I_J I_dI r_f\{_ 9;H`bp<<L& <99 '5J 'fI_{
\rf2yC'I_J I_dI r_f\{_ not obfuscated '5J 'fI_{
\rf2yC'I_J I_dI r_f\{_ not obfuscated '5J 'fI_{
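
This check can also be scripted. A minimal sketch, assuming pdftotext (from poppler or xpdf) is on the PATH; the helper name text_is_copyable and the file name demo.pdf are placeholders:

```shell
#!/bin/sh
# text_is_copyable is a hypothetical helper. It assumes that pdftotext
# is installed and on the PATH.
# Usage: text_is_copyable FILE.pdf "phrase"
# Exit status 0 if the phrase can be extracted from the PDF, 1 otherwise.
text_is_copyable() {
    # "-" makes pdftotext write the extracted text to stdout
    pdftotext "$1" - | grep -qF -- "$2"
}

# Example call; demo.pdf is a placeholder file name.
# text_is_copyable demo.pdf "not obfuscated" && echo "copyable"
```

For the demo document, the phrase "not obfuscated" should be found while "obfuscated too" should not, provided the phrase does not get split across extracted lines.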

But please forget about all this immediately and never obfuscate a production text -- don’t be mean to your readers!

EDIT: Because the generous karma donor specifically asked for a ConTeXt solution, I’ll throw that one in as a bonus. It is a good deal more elegant, since it relies on the font goodies mechanism, which allows applying postprocessors to specific fonts. These can afterwards be used just like common font features.

\startluacode

local mathrandom    = math.random
local stringformat  = string.format

--- create a postprocessor

local obfuscate = function (tfmdata)
  fonts.goodies.registerpostprocessor (tfmdata, function (tfmdata)
    if not tfmdata or type (tfmdata) ~= "table" then
      return
    end

    local characters = tfmdata.characters
    if characters then
      for codepoint, char in next, characters do
        char.tounicode = stringformat ([[%0.4X]], mathrandom (0x21, 0x7e))
      end
    end
  end)
end

--- now register as a font feature

fonts.handlers.otf.features.register {
  name         = "obfuscate",
  description  = "treat the reader like a piece of garbage",
  default      = false,
  initializers = {
    base     = obfuscate,
    node     = obfuscate,
  }
}

\stopluacode

%%···································································%%
%% demonstration
%%···································································%%

%% we can now treat the obfuscation postprocessor like any other
%% font feature

\definefontfeature [obfuscate] [obfuscate=yes]

\definefont [mainfont]   [file:Iwona-Regular.otf*obfuscate]
\definefont [italicfont] [file:Iwona-Italic.otf*obfuscate]

\definefont [boldfont]       [file:Iwona-Bold.otf]
\definefont [bolditalicfont] [file:Iwona-BoldItalic.otf]


\starttext

  \mainfont
  obfuscated text before {\italicfont     obfuscated too} and after \par
  obfuscated text before {\boldfont       not obfuscated} and after \par
  obfuscated text before {\bolditalicfont not obfuscated} and after \par

\stoptext

Remarks

I use a little script that converts all fonts to paths. It takes the input PDF file as its first parameter and writes the output to a file with the same base name and the suffix -rst.pdf.

You need Ghostscript for my script to run.

Implementation

Runs in bash (the script only uses POSIX sh features).

#!/bin/sh

GS=/usr/bin/gs

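# strip only the final extension, so "a.b.pdf" becomes "a.b-rst.pdf"
OUT="${1%.*}-rst.pdf"

# render the PDF to PostScript on stdout, then convert it back to PDF
"$GS" -sDEVICE=pswrite -dNOCACHE -sOutputFile=- -q -dBATCH -dNOPAUSE "$1" -c quit | ps2pdf - > "$OUT"
if [ $? -eq 0 ]; then
    echo "Output written to $OUT"
else
    echo "There were errors. See the output."
fi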

These days, use ps2write instead of pswrite.
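
The output name is derived with shell parameter expansion, and the two suffix-stripping forms behave differently. As a small sketch (both helper names are hypothetical): `%.*` removes only the last extension, while `%%.*` removes everything from the first dot onward, which matters for names with several dots or a leading `./`.

```shell
#!/bin/sh
# Both helpers are hypothetical and only illustrate the expansions.

# ${1%.*} removes the shortest suffix matching ".*" (the last extension).
strip_last() { printf '%s\n' "${1%.*}-rst.pdf"; }

# ${1%%.*} removes the longest suffix matching ".*" (from the first dot).
strip_all() { printf '%s\n' "${1%%.*}-rst.pdf"; }

strip_last thesis.v2.pdf   # prints thesis.v2-rst.pdf
strip_all  thesis.v2.pdf   # prints thesis-rst.pdf
strip_all  ./report.pdf    # prints -rst.pdf
```

For a plain name such as report.pdf, both forms give report-rst.pdf.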
