How to convert HTML to PDF using pandoc?

From https://pandoc.org/MANUAL.html:

Alternatively, pandoc can use any of the following HTML/CSS-to-PDF-engines, to create a PDF:

  • wkhtmltopdf
  • weasyprint
  • prince

To do this, specify an output file with a .pdf extension, as before, but add the --pdf-engine option or -t context, -t html, or -t ms to the command line (-t html defaults to --pdf-engine=wkhtmltopdf).
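
As a rough sketch of the HTML-engine route quoted above (no LaTeX needed), the snippet below picks whichever of the three listed engines is on PATH and passes it to pandoc explicitly. The helper names pick_pdf_engine and html_to_pdf are made up for illustration. Naming the engine explicitly is my own choice, so that the result does not depend on whatever default -t html has in the installed pandoc version.

```shell
#!/bin/sh
# Print the first HTML/CSS-to-PDF engine found on PATH
# (candidates are the three engines named in the manual excerpt).
pick_pdf_engine() {
  for engine in wkhtmltopdf weasyprint prince; do
    if command -v "$engine" >/dev/null 2>&1; then
      printf '%s\n' "$engine"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: convert HTML file $1 to PDF file $2 via an HTML engine.
html_to_pdf() {
  engine=$(pick_pdf_engine) || {
    echo "no HTML-to-PDF engine found on PATH" >&2
    return 1
  }
  pandoc "$1" -t html --pdf-engine="$engine" -o "$2"
}
```

Usage would look like html_to_pdf page.html page.pdf, where page.html stands in for your own file.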


The following works for me:

pandoc --pdf-engine=xelatex https://www.python.org/dev/peps/pep-0008/ -o pep8.pdf

You need a LaTeX distribution such as TeX Live installed, since xelatex is part of it.

If you want colored links, set the colorlinks variable (-V colorlinks). If the webpage contains CJK characters, you also need to set the CJKmainfont variable. An example is shown below:

pandoc --pdf-engine=xelatex -V colorlinks -V CJKmainfont="KaiTi" https://jdhao.github.io/2019/01/07/windows_tools_for_programmers/ -o programmer_tools.pdf

The font KaiTi supports Chinese characters. If the webpage is in another language, use the mainfont variable to specify a font that supports that language.
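
To make the font variables easier to reuse, here is a small wrapper sketch. The function name url_to_pdf, the DRY_RUN switch, and the font "DejaVu Serif" in the usage line are my own assumptions, not anything pandoc provides. Only --pdf-engine, -V colorlinks, -V mainfont and -V CJKmainfont come from pandoc itself.

```shell
#!/bin/sh
# Hypothetical wrapper: url_to_pdf URL OUTPUT [MAINFONT] [CJKFONT]
# Builds the xelatex command from the answer and adds font variables
# only when they are given.
url_to_pdf() {
  url=$1
  out=$2
  mainfont=${3:-}
  cjkfont=${4:-}

  # Reuse the positional parameters as the pandoc option list.
  set -- --pdf-engine=xelatex -V colorlinks
  if [ -n "$mainfont" ]; then
    set -- "$@" -V "mainfont=$mainfont"
  fi
  if [ -n "$cjkfont" ]; then
    set -- "$@" -V "CJKmainfont=$cjkfont"
  fi

  # With DRY_RUN set, print the command instead of running it
  # (the printed form drops the quotes around multi-word font names).
  ${DRY_RUN:+echo} pandoc "$@" "$url" -o "$out"
}
```

For a page in a Latin-script language this could be called as url_to_pdf https://example.org/ page.pdf "DejaVu Serif", assuming that font is installed. For a Chinese page, pass an empty third argument and the CJK font as the fourth.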

If the webpage contains SVG images, you also need to install rsvg-convert, which pandoc uses to convert SVG images when producing PDF output.
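
Since this approach depends on several external programs, a quick check before converting can save a confusing error message. This is only a sketch: missing_tools is a made-up helper, and the tool list (pandoc, xelatex, rsvg-convert) is simply what this answer relies on.

```shell
#!/bin/sh
# Print each named tool that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Warn on stderr if anything this answer relies on is absent.
missing=$(missing_tools pandoc xelatex rsvg-convert)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing >&2
fi
```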


pdf is not a valid output format for the -t option. latex and beamer (for LaTeX slideshows) are.

To create a PDF, use -t latex together with -o myoutput.pdf. You can omit the -t argument, since an output file ending in .pdf makes pandoc default to LaTeX. So you can use either:

pandoc reports/7/report.html -t latex -o reports/7/report.pdf

or:

pandoc reports/7/report.html -o reports/7/report.pdf
