Reproducible LaTeX builds - compile to a file which always hashes to the same value

Since TeX Live 2016, there are a couple of options to achieve reproducible builds:

pdfTeX

For pdfTeX (version ≥1.40.17), there are three new primitives:

  • \pdfinfoomitdate, which removes the /CreationDate and /ModDate entries in the document info dictionary. By default, these entries are set to the date and time at which the document was compiled. They could already be modified in older versions of pdfTeX using \pdfinfo{/CreationDate (...) /ModDate (...)}, but with \pdfinfoomitdate=1, they are removed completely from the resulting PDF file.
  • \pdftrailerid, which sets the file identifier of the PDF document, i.e. the /ID entry in the file trailer as described in the PDF specification, section 10.3. By default, it is computed by hashing the current date and time (even if \pdfinfoomitdate=1 is used) and the full path of the output file. If you include \pdftrailerid{string} with a fixed string in your document, the hash of this string is used as the identifier instead. Leaving it blank, as in \pdftrailerid{}, removes the /ID entry completely.
  • \pdfsuppressptexinfo controls some additional metadata written to the document: firstly, pdfTeX usually creates an entry PTEX.Fullbanner containing the full version string as seen in the output of pdftex --version. Furthermore, for every PDF image included in your document, some additional metadata is written. Suppressing these entries is not strictly necessary for reproducible builds, but might help if you want to compile the same document on different systems. It can be done by issuing \pdfsuppressptexinfo=-1.

TL;DR: The easiest way to get reproducible PDF output with pdfLaTeX is to use

\documentclass{article}
\pdfinfoomitdate=1
\pdftrailerid{}
\begin{document}
Hello, World!
\end{document}

 

LuaTeX

Since TeX Live 2017, LuaTeX (version ≥1.0.4) also supports these features, albeit with slightly different syntax:

  • \pdfvariable suppressoptionalinfo prevents certain metadata from being included in the resulting PDF file, similar to \pdfsuppressptexinfo in pdfTeX, but with more options:

    \pdfvariable suppressoptionalinfo \numexpr
            0
        +   1   % PTEX.FullBanner
        +   2   % PTEX.FileName
        +   4   % PTEX.PageNumber
        +   8   % PTEX.InfoDict
        +  16   % Creator
        +  32   % CreationDate
        +  64   % ModDate
        + 128   % Producer
        + 256   % Trapped
        + 512   % ID
    \relax
    
  • \pdfvariable trailerid lets you specify your own file identifier like \pdftrailerid does, but instead of hashing a string for you, it expects the raw PDF value (an array of two PDF strings), so you have to get the syntax right yourself. I recommend simply suppressing the ID using the above command instead.
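The flag values listed above are powers of two, so a combination is just their sum. As a small worked example, here is a Python sketch that adds them up; the `FLAGS` table copies the names and values from the list above, while the helper name `suppress_value` is purely illustrative and not part of LuaTeX.

```python
# Flag values copied from the suppressoptionalinfo list above.
FLAGS = {
    "PTEX.FullBanner": 1,
    "PTEX.FileName": 2,
    "PTEX.PageNumber": 4,
    "PTEX.InfoDict": 8,
    "Creator": 16,
    "CreationDate": 32,
    "ModDate": 64,
    "Producer": 128,
    "Trapped": 256,
    "ID": 512,
}


def suppress_value(*names):
    """Return the integer to pass to \\pdfvariable suppressoptionalinfo.

    Hypothetical helper: sums the flag values of the given entry names,
    counting each name at most once.
    """
    return sum(FLAGS[name] for name in set(names))


# The three entries that change between runs: dates and file identifier.
print(suppress_value("CreationDate", "ModDate", "ID"))  # 608
```

The result, 608, is the same number that \numexpr32+64+512\relax evaluates to.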

TL;DR: For reproducible builds in LuaLaTeX, use

\documentclass{article}
\pdfvariable suppressoptionalinfo \numexpr32+64+512\relax
\begin{document}
Hello, World!
\end{document}

 

XeTeX

Since TeX Live 2019, XeTeX supports specifying the file identifier:

  • pdf:trailerid is a \special command recognized by dvipdfmx, which is used by XeTeX to produce PDF files. The value format is the same as for \pdfvariable trailerid in LuaTeX: a raw PDF array of two PDF strings. Both strings must be 16 bytes long. The dvipdfmx documentation gives an example with literal strings (written between parentheses). Another option is a 16-byte hex string (written between angle brackets, <…>), which could be an MD5 hash identifying the document:

    \special{pdf:trailerid [
        <00112233445566778899aabbccddeeff>
        <00112233445566778899aabbccddeeff>
    ]}
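If you would rather derive the identifier than type 32 hex digits by hand, something like the following Python sketch could generate the line for you. It assumes you pick some fixed seed string for the document; the function name `trailerid_special` and the seed are illustrative, and only the pdf:trailerid syntax itself comes from the example above.

```python
import hashlib


def trailerid_special(seed):
    """Build a pdf:trailerid \\special from a fixed seed string.

    An MD5 digest is exactly 16 bytes, i.e. 32 hex digits, which matches
    the length dvipdfmx expects for each of the two strings.
    """
    digest = hashlib.md5(seed.encode("utf-8")).hexdigest()
    return "\\special{pdf:trailerid [<%s> <%s>]}" % (digest, digest)


# Hypothetical seed; any string that stays constant across builds will do.
print(trailerid_special("my-document"))
```

The same seed always yields the same line, so you can paste the output into the preamble once and leave it there.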
    

 

All major engines (pdfTeX, LuaTeX, XeTeX)

As an alternative, pdfTeX, LuaTeX and XeTeX support SOURCE_DATE_EPOCH:

If you set the SOURCE_DATE_EPOCH environment variable to a certain date (in the form of a Unix timestamp, as produced e.g. by the output of date +%s), this date is used instead of the current date. Setting it to a fixed date therefore lets you create reproducible PDF files without any changes to the LaTeX source code. Keep in mind however that the output file name (for pdfTeX and LuaTeX including the full path, for XeTeX only the name itself) is still used to compute the file identifier described above: so if you move or rename your LaTeX document and compile, the resulting PDF document will change.
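As a rough sketch of how this could be scripted, the following Python snippet runs the engine with SOURCE_DATE_EPOCH fixed and returns the SHA-256 hash of the result, so two builds can be compared. The function names, the default engine and the assumption that the PDF ends up in the current directory under the source file's base name are mine, not something the engines prescribe.

```python
import hashlib
import os
import subprocess


def reproducible_env(epoch, base=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH set."""
    env = dict(os.environ if base is None else base)
    env["SOURCE_DATE_EPOCH"] = str(epoch)  # Unix timestamp, as from `date +%s`
    return env


def build_and_hash(tex_file, epoch, engine="pdflatex"):
    """Compile tex_file with a fixed timestamp and hash the resulting PDF.

    Assumes the engine is on PATH and writes <basename>.pdf into the
    current working directory.
    """
    subprocess.run(
        [engine, "-interaction=nonstopmode", tex_file],
        env=reproducible_env(epoch),
        check=True,
    )
    pdf = os.path.splitext(os.path.basename(tex_file))[0] + ".pdf"
    with open(pdf, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```

Calling build_and_hash("hello.tex", 1420070400) twice from the same directory should then give the same hash; as noted above, moving or renaming the file can still change the identifier and therefore the hash.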


Update: there is now (TeX Live 2016) SOURCE_DATE_EPOCH support in pdfTeX to address this (see the other answer).

If you modify the source to

\pdfcompresslevel=0
\pdfobjcompresslevel=0
\documentclass{article}
\begin{document}
Hello, World!
\end{document}

and run it twice, you will find that exactly three lines change:

/CreationDate (D:20150222233514Z)
/ModDate (D:20150222233514Z)

/ID [<84943B8BBB033F5EF8FAE4B3E350E35C> <84943B8BBB033F5EF8FAE4B3E350E35C>] >>

So one possibility would be to use a wrapper script that runs pdflatex and then blanks out these fields, keeping the byte count the same so that the byte offsets in the PDF cross-reference table stay valid.
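A minimal sketch of such a wrapper in Python might look like this. It assumes the three fields appear as plain text exactly in the form shown above (which is why compression is switched off), and it overwrites digits with zeros rather than deleting anything; the names `blank_volatile_fields` and `build` are made up for this example.

```python
import re
import subprocess

# The three lines that differ between runs, in the layout shown above.
DATE_RE = re.compile(rb"(/(?:CreationDate|ModDate) \(D:)([^)]*)(\))")
ID_RE = re.compile(rb"(/ID \[<)([0-9A-Fa-f]+)(> <)([0-9A-Fa-f]+)(>\])")


def blank_volatile_fields(pdf):
    """Zero out dates and file identifier without changing the length."""

    def fix_date(m):
        # Replace every digit of the date with 0; keep "Z" or any offset.
        return m.group(1) + re.sub(rb"\d", b"0", m.group(2)) + m.group(3)

    def fix_id(m):
        # Replace both hex strings with zeros of the same length.
        return (m.group(1) + b"0" * len(m.group(2)) + m.group(3)
                + b"0" * len(m.group(4)) + m.group(5))

    return ID_RE.sub(fix_id, DATE_RE.sub(fix_date, pdf))


def build(tex_file, pdf_file):
    """Run pdflatex, then rewrite the PDF in place with the fields blanked."""
    subprocess.run(["pdflatex", "-interaction=nonstopmode", tex_file], check=True)
    with open(pdf_file, "rb") as fh:
        data = fh.read()
    with open(pdf_file, "wb") as fh:
        fh.write(blank_volatile_fields(data))
```

For example, build("hello.tex", "hello.pdf") would compile and then patch the file. Note that an all-zero date is not a meaningful timestamp, so this is only a way to make the bytes stable, not to produce tidy metadata.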