How do I make pdfcrop output all pages of the same size?

I found that the --verbose flag makes pdfcrop print the bounding box it uses for each page. Since my document was a "growing" animation, the last page was the largest.

So to get them all the same size, I ran pdfcrop with --verbose and extracted this output:

%%HiResBoundingBox: 48.000022 299.872046 624.124950 420.127932

and then fed that to a second run of pdfcrop, specifying the bounding box:

pdfcrop --bbox "48.000022 299.872046 624.124950 420.127932" ~/animation.pdf
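Picking the box out of the verbose output by hand gets tedious, so here is a minimal Python sketch of that extraction step. It assumes the verbose output has already been captured into a string and contains `%%HiResBoundingBox:` lines like the one above. The function name `last_hires_bbox` is made up for illustration, and the first page's numbers in the sample are invented.

```python
import re

# Matches lines such as:
#   %%HiResBoundingBox: 48.000022 299.872046 624.124950 420.127932
HIRES_RE = re.compile(r'%%HiResBoundingBox:\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)')

def last_hires_bbox(verbose_output):
    """Return the last HiResBoundingBox as a string for --bbox, or None."""
    matches = HIRES_RE.findall(verbose_output)
    if not matches:
        return None
    return ' '.join(matches[-1])

# Sample text: the first page's values are invented for illustration,
# the last HiResBoundingBox line is the one quoted above.
sample = """\
%%BoundingBox: 48 359 121 361
%%HiResBoundingBox: 48.000022 359.872046 120.124950 360.127932
%%BoundingBox: 48 299 625 421
%%HiResBoundingBox: 48.000022 299.872046 624.124950 420.127932
"""
print(last_hires_bbox(sample))
```

The returned string can be pasted between the quotes of the --bbox option. This only helps when the last page really is the largest one.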

If the last page is not the largest, we need to compute the maximum width and height among all the pages, and then use these values to determine the right bounding box for each page. Note that the four coordinates in a bounding box are measured from the bottom-left corner of the page:

  • x-coordinate (distance from left edge of page) of lower-left corner,
  • y-coordinate (distance from bottom edge of page) of lower-left corner,
  • x-coordinate (distance from left edge of page) of upper-right corner,
  • y-coordinate (distance from bottom edge of page) of upper-right corner.
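To make the arithmetic concrete, here is a small sketch that computes the width and height of each box and then the maxima over all pages. The helper `bbox_size` and the list `boxes` are hypothetical; only the second box comes from the output above, the other two are invented so that no single page is largest in both directions.

```python
def bbox_size(bbox):
    """Return (width, height) of a box given as (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = bbox
    return xmax - xmin, ymax - ymin

# Hypothetical per-page boxes; only the second is taken from the output above.
boxes = [
    (48.0, 350.0, 200.0, 370.0),
    (48.000022, 299.872046, 624.124950, 420.127932),
    (100.0, 200.0, 300.0, 500.0),
]

sizes = [bbox_size(b) for b in boxes]
# The widest page and the tallest page need not be the same page.
max_width = max(w for w, h in sizes)
max_height = max(h for w, h in sizes)
print(max_width, max_height)
```

Here the second page is the widest and the third is the tallest, which is why the maxima have to be taken separately rather than by picking one "largest" page.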

Computing the right bounding box for each page and using it could be done with an appropriate patch to the pdfcrop script (it's written in Perl), but as I'm not very comfortable with Perl, I did it in Python instead. Here is the script, in case it's useful to someone. It grows each page's box equally on both sides, horizontally and vertically, until it has the maximum width and height.

import re, sys

# Matches the \page lines in pdfcrop's temporary TeX file, which look like
#   \page <number> [<xmin> <ymin> <xmax> <ymax>]<rest of line>
PAGE_RE = re.compile(r'\\page (\d*) \[([0-9.]*) ([0-9.]*) ([0-9.]*) ([0-9.]*)\](.*)', re.DOTALL)

lines = sys.stdin.readlines()
width = height = 0
# First pass: compute |width| and |height|, the maximum over all pages.
for line in lines:
  m = PAGE_RE.match(line)
  if m:
    page, xmin, ymin, xmax, ymax, rest = m.groups()
    width = max(width, float(xmax) - float(xmin))
    height = max(height, float(ymax) - float(ymin))
# Second pass: change bounding boxes to have width |width| and height |height|.
for line in lines:
  m = PAGE_RE.match(line)
  if m:
    page, xmin, ymin, xmax, ymax, rest = m.groups()
    xmin = float(xmin)
    ymin = float(ymin)
    xmax = float(xmax)
    ymax = float(ymax)
    # Grow the box equally on the left and right so its width becomes |width|.
    addx = (width - (xmax - xmin)) / 2.0
    xmin -= addx
    xmax += addx
    # Grow the box equally at the bottom and top so its height becomes |height|.
    addy = (height - (ymax - ymin)) / 2.0
    ymin -= addy
    ymax += addy
    sys.stdout.write(r'\page %s [%s %s %s %s]%s' % (page, xmin, ymin, xmax, ymax, rest))
  else:
    # Every other line is copied through unchanged.
    sys.stdout.write(line)

Usage:

  1. Run the regular pdfcrop command, with --debug, e.g.:

    pdfcrop --debug foo.pdf

    Because of --debug, pdfcrop will not delete the tmp-pdfcrop-*.tex file it created. If you passed any special options to pdfcrop, also note down the pdftex (or other engine) command it ran at the end, since that command may then be nontrivial.

  2. Pass the tmp-pdfcrop-* file through the script above, e.g.:

    python find-common.py < tmp-pdfcrop-34423.tex > tmp-pdfcrop-common.tex

    This will write out tmp-pdfcrop-common.tex, in which every page's bounding box has been adjusted to the same width and height.

  3. Run the pdftex (or whatever) command that pdfcrop called, with this file:

    pdftex -no-shell-escape -interaction=nonstopmode tmp-pdfcrop-common.tex

  4. Check the resulting PDF file, and rename it to whatever you like:

    mv tmp-pdfcrop-common.pdf foo-crop.pdf
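If you do this often, steps 2 to 4 can be wrapped in a few lines of Python. This is only a rough sketch under some assumptions: the script above is saved as find-common.py in the current directory, pdfcrop used the plain pdftex command shown in step 3, and all files live in the current directory. The function name `finish_crop` and the `dry_run` switch are made up for illustration.

```python
import os
import subprocess
import sys

# Assumed to be the command pdfcrop ran; replace it with whatever you noted in step 1.
PDFTEX_CMD = ['pdftex', '-no-shell-escape', '-interaction=nonstopmode']

def finish_crop(tmp_tex, output_pdf, script='find-common.py',
                common_tex='tmp-pdfcrop-common.tex', dry_run=False):
    """Run steps 2 to 4 on tmp_tex and return the two commands used."""
    rewrite = [sys.executable, script]
    typeset = PDFTEX_CMD + [common_tex]
    if not dry_run:
        # Step 2: feed tmp_tex to the script on stdin, collect the rewritten TeX.
        with open(tmp_tex) as src, open(common_tex, 'w') as dst:
            subprocess.run(rewrite, stdin=src, stdout=dst, check=True)
        # Step 3: typeset the rewritten file.
        subprocess.run(typeset, check=True)
        # Step 4: the PDF is named after the TeX file; rename it.
        os.rename(os.path.splitext(common_tex)[0] + '.pdf', output_pdf)
    return rewrite, typeset

# Dry run: only shows what would be executed, nothing is run.
rewrite, typeset = finish_crop('tmp-pdfcrop-34423.tex', 'foo-crop.pdf', dry_run=True)
print(' '.join(typeset))
```

Call it without dry_run=True to actually run the steps. It is still worth opening the resulting PDF to check it before deleting the temporary files.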