Add text to Existing PDF using Python

pdfrw will let you read in pages from an existing PDF and draw them to a reportlab canvas (similar to drawing an image). There are examples for this in the pdfrw examples/rl1 subdirectory on github. Disclaimer: I am the pdfrw author.


I know this is an older post, but I spent a long time trying to find a solution. I came across a decent one using only ReportLab and PyPDF so I thought I'd share:

  1. read your PDF using PdfFileReader(), we'll call this input
  2. create a new pdf containing your text to add using ReportLab, save this as a string object
  3. read the string object using PdfFileReader(), we'll call this text
  4. create a new PDF object using PdfFileWriter(), we'll call this output
  5. iterate through input and apply .mergePage(*text*.getPage(0)) for each page you want the text added to, then use output.addPage() to add the modified pages to a new document

This works well for simple text additions. See PyPDF's sample for watermarking a document.

Here is some code to answer the question below:

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
<do something with canvas>
can.save()
packet.seek(0)
input = PdfFileReader(packet)

From here you can merge the pages of the input file with another document.


Example for [Python 2.7]:

from pyPdf import PdfFileWriter, PdfFileReader
import StringIO
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = StringIO.StringIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)

# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(file("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = file("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

Example for Python 3.x:


from PyPDF2 import PdfFileWriter, PdfFileReader
import io
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

packet = io.BytesIO()
can = canvas.Canvas(packet, pagesize=letter)
can.drawString(10, 100, "Hello world")
can.save()

#move to the beginning of the StringIO buffer
packet.seek(0)

# create a new PDF with Reportlab
new_pdf = PdfFileReader(packet)
# read your existing PDF
existing_pdf = PdfFileReader(open("original.pdf", "rb"))
output = PdfFileWriter()
# add the "watermark" (which is the new pdf) on the existing page
page = existing_pdf.getPage(0)
page.mergePage(new_pdf.getPage(0))
output.addPage(page)
# finally, write "output" to a real file
outputStream = open("destination.pdf", "wb")
output.write(outputStream)
outputStream.close()

cpdf will do the job from the command-line. It isn't python, though (afaik):

cpdf -add-text "Line of text" input.pdf -o output .pdf

Tags:

Python

Pdf