Convert Word doc, docx and Excel xls, xlsx to PDF with PHP
I found a solution to my issue and, after a request, am posting it here to help others. Apologies if I have missed any details; it's been a while since I worked on this solution.
The first requirement is to install OpenOffice.org on the server. I asked my hosting provider to install the OpenOffice RPM on my VPS. This can be done through WHM directly.
Now that the server can handle MS Office files, you can convert them by executing command line instructions via PHP. To handle this, I found PyODConverter: https://github.com/mirkonasato/pyodconverter
I created a directory on the server and placed the PyODConverter Python script (DocumentConverter.py) in it. I also created a plain text file above the web root (I named it "adocpdf"), with the following command line instructions in it:
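#!/bin/sh
# $1 = directory, $2 = file name, $3 = extension; they are concatenated as-is below
directory=$1
filename=$2
extension=$3
SERVICE='soffice'
# start a headless OpenOffice.org listener if one is not already running
if [ "$(ps ax | grep -v grep | grep -c "$SERVICE")" -lt 1 ]; then
    unset DISPLAY
    /usr/bin/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
    sleep 5s
fi
# convert the document; quoting keeps file names with spaces intact
python /home/website/python/DocumentConverter.py "/home/website/$directory$filename$extension" "/home/website/$directory$filename.pdf"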
This checks that the OpenOffice.org service (soffice) is running, starts it if it is not, and then calls the PyODConverter script to process the file and output it as a PDF. The three variables at the top of the script (directory, filename, extension) are supplied when the script is executed from within a PHP file. The delay ("sleep 5s") ensures that OpenOffice.org has enough time to start if required. I have used this for months now and the 5s gap seems to give enough breathing room.
The script will create a PDF version of the document in the same directory as the original.
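If the fixed five-second pause ever proves too short or too long, one alternative is to poll until the listener answers. The following is only a sketch of that idea, not part of the original setup: retry_until and office_listening are hypothetical helper names, and the probe assumes a netcat build that supports the -z option.

```shell
# retry_until TRIES DELAY CMD...
# Runs CMD until it succeeds or TRIES attempts have been used.
# Returns 0 on success, 1 if every attempt failed.
retry_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical probe: succeeds once something accepts connections on
# 127.0.0.1:8100, the port used in the -accept string of the script.
# Assumption: nc is installed and supports -z.
office_listening() {
    nc -z 127.0.0.1 8100 >/dev/null 2>&1
}
```

Inside the if block, "sleep 5s" could then be replaced with something like: retry_until 10 1 office_listening || exit 1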
Finally, the conversion of a Word / Excel file is initiated from within PHP (I have it within a function that checks whether the file we are dealing with is a Word / Excel document):
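// use openoffice.org
$output = array();
$return_var = 0;
// escape each argument so unusual file names cannot break or inject into the command
$cmd = sprintf('/opt/adocpdf %s %s %s', escapeshellarg($directory), escapeshellarg($filename), escapeshellarg($extension));
exec($cmd, $output, $return_var);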
This PHP function is called once the Word / Excel file has been uploaded to the server. The three variables in the exec() call correspond directly to the three at the start of the plain text script above. Note that the $directory variable requires no leading forward slash if the file for conversion is within the web root.
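To make the mapping between the three arguments and the resulting paths concrete, here is a small sketch that repeats the concatenation used in the adocpdf script. The function name adocpdf_paths and the sample values (uploads/, report, .docx) are made up for illustration; the concatenation implies that the directory carries its trailing slash and the extension its leading dot.

```shell
# Prints the source path and the PDF path that the adocpdf script
# would build from its three arguments (same concatenation, nothing else).
adocpdf_paths() {
    directory=$1
    filename=$2
    extension=$3
    printf '%s\n' "/home/website/$directory$filename$extension"
    printf '%s\n' "/home/website/$directory$filename.pdf"
}

# Sample values only: no leading slash on the directory.
adocpdf_paths "uploads/" "report" ".docx"
# prints:
#   /home/website/uploads/report.docx
#   /home/website/uploads/report.pdf
```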
OK, that's it! Hopefully this will be useful to someone and save them the difficulties and learning curve I faced.
I successfully put a portable version of LibreOffice on my host's webserver, which I call from PHP to do a command-line conversion from .docx, etc. to PDF on the fly. I do not have admin rights on my host's webserver. Here is my blog post describing what I did:
http://geekswithblogs.net/robertphyatt/archive/2011/11/19/converting-.docx-to-pdf-or-.doc-to-pdf-or-.doc.aspx
Yay! Convert directly from .docx or .odt to .pdf using PHP with LibreOffice (OpenOffice's successor)!
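The blog post has the details of that setup. As a rough sketch of what such a command-line conversion can look like, LibreOffice's soffice binary accepts --headless, --convert-to and --outdir. The helper names and the SOFFICE variable below are hypothetical, and the path to a portable install will differ from host to host.

```shell
# Assumption: soffice is on PATH; point SOFFICE at the portable
# install's binary otherwise.
SOFFICE="${SOFFICE:-soffice}"

# Prints the path LibreOffice writes to: the input's base name with
# its extension replaced by .pdf, inside the output directory.
pdf_path_for() {
    input=$1
    outdir=$2
    base=$(basename "$input")
    printf '%s/%s.pdf\n' "$outdir" "${base%.*}"
}

# Converts one document to PDF in the given output directory and
# prints the resulting path; returns non-zero if no PDF was produced.
convert_to_pdf() {
    input=$1
    outdir=$2
    "$SOFFICE" --headless --convert-to pdf --outdir "$outdir" "$input" >/dev/null 2>&1 || return 1
    target=$(pdf_path_for "$input" "$outdir")
    [ -f "$target" ] && printf '%s\n' "$target"
}
```

A PHP script could run a command like this through exec() and check the return status to decide whether the PDF is ready.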
My 2 cents on Word 2007 docx, Word 97-2004 doc, pdf and all the other MS Office formats that are supposed to be "converted from y to z" but in reality don't want to be. In my experience so far, conversion with LibreOffice or OpenOffice can't be relied on, although .doc documents tend to be better supported than Word 2007's .docx. In general it's very hard to convert .docx to .doc without breaking anything.
.docx also tends to be extremely useful for templating, whereas .doc is not, because it is a binary format.
Conversion from .doc to PDF was quite reliable most of the time. If you can still influence the design or content of the Word document, this might be good enough, but in my situation documents were supplied by foreign companies, and in some scenarios the .docx generated from a template still had to be modified slightly with supplementary text before it was converted to PDF.
WINDOWS BASED!
All these hiccups led me to conclude that the only truly reliable conversion method I found was to use the COM class in PHP and let the MS Word or Excel application do all the work for you. I'll just give an example of converting .docx to .doc and/or PDF. If you do not have MS Office installed, you can download a 60-day trial version, which should give you enough room for testing purposes.
The COM/.NET extension is commented out by default in php.ini; just search for the line containing php_com_dotnet.dll and uncomment it, like so:
extension=php_com_dotnet.dll
Restart the web server (IIS is not a prerequisite; Apache will work just as well).
The code below demonstrates how easy it is.
$word = new COM("Word.Application") or die ("Could not initialise Object.");
// set it to 1 to see the MS Word window (the actual opening of the document)
$word->Visible = 0;
// recommend to set to 0, disables alerts like "Do you want MS Word to be the default .. etc"
$word->DisplayAlerts = 0;
// open the word 2007-2013 document
$word->Documents->Open('yourdocument.docx');
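// save it as Word 2003; 0 is wdFormatDocument (the binary .doc format), passed explicitly
// so the format does not depend on the file name alone
$word->ActiveDocument->SaveAs('newdocument.doc', 0);
// convert Word 2007-2013 to PDF; 17 is wdExportFormatPDF
$word->ActiveDocument->ExportAsFixedFormat('yourdocument.pdf', 17, false, 0, 0, 0, 0, 7, true, true, 2, true, true, false);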
// quit the Word process
$word->Quit(false);
// clean up
unset($word);
This is just a small demonstration. When it comes to conversion, this was the only really reliable option I could use, and the one I would recommend.