Reproducible LaTeX builds - compile to a file which always hashes to the same value
Since TeX Live 2016, there are a couple of options to achieve reproducible builds:
pdfTeX
For pdfTeX (version ≥1.40.17), there are three new primitives:
\pdfinfoomitdate, which removes the /CreationDate and /ModDate entries in the document info dictionary. By default, these entries are set to the date when the document was compiled. They could already be modified in older versions of pdfTeX using \pdfinfo{/CreationDate (...) /ModDate (...)}, but with \pdfinfoomitdate=1, they can be removed completely from the resulting PDF file.
\pdftrailerid, which sets the file identifier of the PDF document in the /ID entry of the file trailer, as described in the PDF specification, section 10.3. By default, it is computed by hashing the current date and time (even if \pdfinfoomitdate=1 is used) and the full path of the output file. By including \pdftrailerid{string} with a fixed string in your document, the hash of this string is used as the identifier instead. Leaving it blank, as in \pdftrailerid{}, completely removes the /ID entry.
\pdfsuppressptexinfo, which controls some additional metadata written to the document. Firstly, pdfTeX usually creates an entry PTEX.Fullbanner containing the full version string as seen in the output of pdftex --version. Furthermore, for every PDF image included in your document, some additional metadata is written. Suppressing these entries is not strictly necessary for reproducible builds, but might help if you want to compile the same document on different systems. It can be done by issuing \pdfsuppressptexinfo=-1.
TL;DR: The easiest way to get reproducible PDF output with pdfTeX is to use
\documentclass{article}
\pdfinfoomitdate=1
\pdftrailerid{}
\begin{document}
Hello, World!
\end{document}
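To check that a document really does hash to the same value every time, one option is to compile it twice and compare the digests. The following is only a minimal sketch in Python: it assumes the engine (pdflatex by default) is on your PATH, and the helper names sha256_of, build_hash and is_reproducible are made up for this illustration.

```python
import hashlib
import subprocess
import time
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_hash(tex_file, engine="pdflatex"):
    """Compile tex_file once and return the hash of the resulting PDF."""
    tex_path = Path(tex_file).resolve()
    subprocess.run(
        [engine, "-interaction=nonstopmode", tex_path.name],
        cwd=tex_path.parent,  # run next to the source so the PDF lands there
        check=True,
        stdout=subprocess.DEVNULL,
    )
    return sha256_of(tex_path.with_suffix(".pdf"))


def is_reproducible(tex_file, engine="pdflatex", delay=2.0):
    """Compile twice, a few seconds apart, and compare the two hashes."""
    first = build_hash(tex_file, engine)
    # Wait so that a build that still embeds the current time would differ.
    time.sleep(delay)
    second = build_hash(tex_file, engine)
    return first == second
```

Calling is_reproducible("hello.tex") on the example above should return True if the two primitives are in place, and is expected to return False once they are removed.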
LuaTeX
Since TeX Live 2017, LuaTeX (version ≥1.0.4) also supports these features, albeit with slightly different syntax:
\pdfvariable suppressoptionalinfo prevents certain metadata from being included in the resulting PDF file, similar to \pdfsuppressptexinfo in pdfTeX, but with more options:
\pdfvariable suppressoptionalinfo \numexpr
    0
    +   1 % PTEX.FullBanner
    +   2 % PTEX.FileName
    +   4 % PTEX.PageNumber
    +   8 % PTEX.InfoDict
    +  16 % Creator
    +  32 % CreationDate
    +  64 % ModDate
    + 128 % Producer
    + 256 % Trapped
    + 512 % ID
\relax
\pdfvariable trailerid lets you specify your own file identifier like \pdftrailerid does, but you have to get the raw PDF syntax right yourself, so I recommend simply suppressing the ID using the above command instead.
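The value passed to suppressoptionalinfo is just the sum of the flags you want to suppress. As a small worked example, here is a Python sketch that does the arithmetic; the flag values are the ones listed above, while the class name SuppressOptionalInfo and the helper suppress_line are hypothetical names chosen for this illustration.

```python
from enum import IntFlag


class SuppressOptionalInfo(IntFlag):
    """Bit values for LuaTeX's suppressoptionalinfo, as listed above."""
    PTEX_FULLBANNER = 1
    PTEX_FILENAME = 2
    PTEX_PAGENUMBER = 4
    PTEX_INFODICT = 8
    CREATOR = 16
    CREATIONDATE = 32
    MODDATE = 64
    PRODUCER = 128
    TRAPPED = 256
    ID = 512


def suppress_line(flags):
    """Return the TeX line that suppresses the given combination of flags."""
    return rf"\pdfvariable suppressoptionalinfo {int(flags)}\relax"


# The three entries that change between runs: both dates and the ID.
minimal = (
    SuppressOptionalInfo.CREATIONDATE
    | SuppressOptionalInfo.MODDATE
    | SuppressOptionalInfo.ID
)
print(int(minimal))            # 32 + 64 + 512 = 608
print(suppress_line(minimal))
```

Writing the sum out with \numexpr in the document itself, as done above, has the advantage that the individual flags stay readable in the source.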
TL;DR: For reproducible builds in LuaLaTeX, use
\documentclass{article}
\pdfvariable suppressoptionalinfo \numexpr32+64+512\relax
\begin{document}
Hello, World!
\end{document}
XeTeX
Since TeX Live 2019, XeTeX supports specifying the file identifier:
pdf:trailerid is a \special command recognized by dvipdfmx, which is used by XeTeX to produce PDF files. The value format is the same as for \pdfvariable trailerid in LuaTeX: a raw PDF array of two PDF strings. Both strings must be 16 bytes long. The dvipdfmx documentation gives an example with literal strings (written between parentheses). Another option is a 16-byte hex string (written between angle brackets, <…>), which could be an MD5 hash identifying the document:
\special{pdf:trailerid [ <00112233445566778899aabbccddeeff> <00112233445566778899aabbccddeeff> ]}
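If you want the identifier to be an MD5 hash as suggested, the \special line can be generated rather than typed by hand. This is only a sketch in Python: the function name trailerid_special and the idea of hashing a seed string of your choosing (a document title, for instance) are assumptions made for the example, not something dvipdfmx prescribes.

```python
import hashlib


def trailerid_special(seed):
    """Build a pdf:trailerid \\special from the MD5 hash of a seed string.

    An MD5 digest is 16 bytes, i.e. 32 hex digits, which matches the
    required length of each of the two identifier strings.
    """
    digest = hashlib.md5(seed.encode("utf-8")).hexdigest()
    return rf"\special{{pdf:trailerid [ <{digest}> <{digest}> ]}}"


print(trailerid_special("my-document-v1"))
```

The printed line can then be pasted into the document, so the identifier stays fixed until you decide to change the seed.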
All major engines (pdfTeX, LuaTeX, XeTeX)
As an alternative, pdfTeX, LuaTeX and XeTeX support SOURCE_DATE_EPOCH:
If you set the SOURCE_DATE_EPOCH environment variable to a certain date (in the form of a Unix timestamp, as produced e.g. by date +%s), this date is used instead of the current date. Setting it to a fixed date therefore lets you create reproducible PDF files without any changes to the LaTeX source code. Keep in mind, however, that the output file name (for pdfTeX and LuaTeX including the full path, for XeTeX only the name itself) is still used to compute the file identifier described above. So if you move or rename your LaTeX document and compile it again, the resulting PDF document will change.
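In a build script, this amounts to running the engine with one extra environment variable. Below is a minimal sketch in Python, assuming the engine is on your PATH; the helper names reproducible_env and compile_with_fixed_date are hypothetical, and the timestamp in the usage line is an arbitrary example value.

```python
import os
import subprocess


def reproducible_env(epoch, base_env=None):
    """Return a copy of the environment with SOURCE_DATE_EPOCH set."""
    env = dict(os.environ if base_env is None else base_env)
    # The variable must be a Unix timestamp written as a decimal string.
    env["SOURCE_DATE_EPOCH"] = str(int(epoch))
    return env


def compile_with_fixed_date(tex_file, epoch, engine="pdflatex"):
    """Run the TeX engine on tex_file with a fixed build date."""
    return subprocess.run(
        [engine, "-interaction=nonstopmode", tex_file],
        env=reproducible_env(epoch),
        check=True,
    )


# Example call (arbitrary timestamp), run from the document's directory:
# compile_with_fixed_date("hello.tex", 1456304492)
```

Because the output path still feeds into the file identifier, such a script should always be run from the same directory and on the same file name.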
Update: since TeX Live 2016, pdfTeX supports SOURCE_DATE_EPOCH to address this (see the other answer).
If you modify the source to
\pdfcompresslevel=0
\pdfobjcompresslevel=0
\documentclass{article}
\begin{document}
Hello, World!
\end{document}
and run it twice, you find that exactly three lines change:
/CreationDate (D:20150222233514Z)
/ModDate (D:20150222233514Z)
/ID [<84943B8BBB033F5EF8FAE4B3E350E35C> <84943B8BBB033F5EF8FAE4B3E350E35C>] >>
So one possibility would be to use a wrapper script that runs pdflatex and then blanks out these fields, keeping the byte count the same.
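The blanking step of such a wrapper could look roughly like the following Python sketch. It assumes the PDF was built with both compression levels set to 0, as above, so that the three fields appear as plain text in exactly the form shown. The function names blank_metadata and blank_file are made up for this example, and overwriting the values with zeros is just one way of blanking them.

```python
import re
from pathlib import Path

# "/CreationDate (D:...)" and "/ModDate (D:...)" as written by pdfTeX.
_DATE = re.compile(rb"(/(?:CreationDate|ModDate) \(D:)([^)]*)(\))")
# "/ID [<hex> <hex>]" in the trailer.
_ID = re.compile(rb"(/ID \[<)([0-9A-Fa-f]+)(> <)([0-9A-Fa-f]+)(>\])")


def _zeros(chunk):
    """Return as many ASCII zeros as chunk has bytes."""
    return b"0" * len(chunk)


def blank_metadata(pdf_bytes):
    """Overwrite the date and ID values with zeros, keeping every length."""
    out = _DATE.sub(
        lambda m: m.group(1) + _zeros(m.group(2)) + m.group(3), pdf_bytes
    )
    out = _ID.sub(
        lambda m: m.group(1) + _zeros(m.group(2)) + m.group(3)
        + _zeros(m.group(4)) + m.group(5),
        out,
    )
    return out


def blank_file(pdf_path):
    """Blank the three fields of a PDF file in place."""
    path = Path(pdf_path)
    path.write_bytes(blank_metadata(path.read_bytes()))
```

Keeping the byte count unchanged matters because the cross-reference table stores byte offsets into the file. The wrapper would call pdflatex first and then blank_file on the resulting PDF.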