How to convert HTML to PDF using pandoc?
From https://pandoc.org/MANUAL.html:
Alternatively, pandoc can use any of the following HTML/CSS-to-PDF-engines, to create a PDF:
- wkhtmltopdf
- weasyprint
- prince
To do this, specify an output file with a .pdf extension, as before, but add the --pdf-engine option or -t context, -t html, or -t ms to the command line (-t html defaults to --pdf-engine=wkhtmltopdf).
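As a rough sketch of how the quoted passage could be used in practice, the helper below checks which of the three listed engines is on your PATH and prints the first one it finds. The function name pick_html_pdf_engine and the file names input.html and output.pdf are placeholders made up for this example; they do not come from pandoc.

```shell
# Print the first HTML/CSS-to-PDF engine found on PATH.
# The engine names are the three listed in the manual excerpt above.
# Returns 1 (and prints nothing) if none of them is installed.
pick_html_pdf_engine() {
    for engine in wkhtmltopdf weasyprint prince; do
        if command -v "$engine" >/dev/null 2>&1; then
            printf '%s\n' "$engine"
            return 0
        fi
    done
    return 1
}

# Example (input.html and output.pdf are placeholder names):
#   engine=$(pick_html_pdf_engine) && pandoc input.html --pdf-engine="$engine" -o output.pdf
```

With one of these engines no LaTeX installation should be needed, since the PDF is rendered from HTML/CSS.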
The following works for me:
pandoc --pdf-engine=xelatex https://www.python.org/dev/peps/pep-0008/ -o pep8.pdf
You need to have a LaTeX distribution installed such as TeX Live.
If you want to color the links, add the colorlinks option. If the webpage contains CJK characters, you need to specify the CJKmainfont option. An example is shown below:
pandoc --pdf-engine=xelatex -V colorlinks -V CJKmainfont="KaiTi" https://jdhao.github.io/2019/01/07/windows_tools_for_programmers/ -o programmer_tools.pdf
The font KaiTi supports Chinese characters. For other languages, you may use the mainfont option to specify a font which supports the language the webpage uses.
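For a non-CJK page, the mainfont variant might look like the small wrapper below. This is only a sketch: the function name html_to_pdf_xelatex is made up for illustration, and any font name you pass (for example "DejaVu Serif") is an assumption that has to match a font actually installed on your system.

```shell
# Usage: html_to_pdf_xelatex URL_OR_FILE OUTPUT_PDF [MAINFONT]
# Hypothetical wrapper around the xelatex command shown above.
# MAINFONT is optional; when given it is passed as -V mainfont=...
html_to_pdf_xelatex() {
    src=$1
    out=$2
    font=${3:-}
    if [ -n "$font" ]; then
        pandoc --pdf-engine=xelatex -V colorlinks -V mainfont="$font" "$src" -o "$out"
    else
        pandoc --pdf-engine=xelatex -V colorlinks "$src" -o "$out"
    fi
}

# Example (the font name is an assumption, pick one you have installed):
#   html_to_pdf_xelatex https://www.python.org/dev/peps/pep-0008/ pep8.pdf "DejaVu Serif"
```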
If the webpage contains SVG images, you also need to install rsvg-convert to successfully convert the webpage to a PDF file.
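Since this route depends on several external programs (pandoc itself, xelatex from the LaTeX distribution, and rsvg-convert for SVG images), a quick check before converting can save some confusion. The helper below is a minimal sketch; missing_tools is a made-up name, not part of pandoc.

```shell
# Print every tool from the argument list that is not on PATH.
# Returns 0 if all tools were found, 1 if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example:
#   missing_tools pandoc xelatex rsvg-convert || echo "install the tools listed above first"
```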
pdf is not a valid output format. latex and beamer (for LaTeX slideshows) are.
To create a PDF, use -t latex and -o myoutput.pdf. You can omit the -t argument, since a .pdf extension in -o defaults to latex. So you can use either:
pandoc reports/7/report.html -t latex -o reports/7/report.pdf
or:
pandoc reports/7/report.html -o reports/7/report.pdf