How to convert .tex into .odt?

There is a tool in the repositories that converts LaTeX to OpenOffice.org's XML format: tex4ht.

TeX4ht is a highly configurable TeX-based authoring system for producing hypertext. It interacts with TeX-based applications through style files and postprocessors, leaving the processing of the source files to the native TeX compiler. Consequently, TeX4ht can handle the features of TeX-based systems in general, and of LaTeX in particular.

TeX4ht can be used either for authoring HTML from TeX/LaTeX input files, or for converting existing TeX input files (in any format) into HTML, with (usually) only minor modifications. Other varieties of hypertext can also be produced, including XML, XHTML, MathML and the OpenOffice.org XML format.

From the command line:

  1. latex filename.tex
  2. bibtex filename.aux
  3. mk4ht oolatex filename.tex

You should end up with an OpenOffice.org/LibreOffice-compatible file.
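As a rough sketch, the three steps above could be wrapped in one shell function. The function name `tex_to_odt`, the `run` helper and the `DRY_RUN` switch are my own inventions for illustration, not part of tex4ht, and the check for `\bibdata` in the .aux file is an assumption about when bibtex is worth running. It also assumes you call it from the directory that contains the .tex file.

```shell
#!/bin/sh
# Sketch of the three tex4ht steps as one function.
# DRY_RUN is a hypothetical switch: when set to 1, commands are printed, not run.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

tex_to_odt() {
    texfile=$1
    base=${texfile%.tex}                 # filename.tex -> filename
    run latex "$texfile" || return 1     # writes filename.aux
    # Assumption: bibtex is only worth running when the .aux file
    # records a bibliography (\bibdata); a dry run always shows the step.
    if [ "${DRY_RUN:-0}" = "1" ] || grep -q '\\bibdata' "$base.aux" 2>/dev/null; then
        run bibtex "$base.aux" || return 1
    fi
    run mk4ht oolatex "$texfile"
}
```

After sourcing the file, `tex_to_odt filename.tex` runs the steps in order and stops at the first failure; setting `DRY_RUN=1` first only prints what would be run.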


Believe it or not, with complex documents and lots of packages included, I got much better results with LaTeX2HTML than with LaTeX2RTF, Pandoc or TeX4ht.

latex2html texfile.tex -split 0 -no_navigation -info "" -address "" -html_version 4.0,unicode

This will generate a folder named after the .tex file (here, texfile/), so you can then convert the generated HTML to ODT:

libreoffice --headless --convert-to odt:"OpenDocument Text Flat XML" texfile/index.html

This will produce an index.odt file. Take a look at this answer to check how to use LibreOffice's convert filters.
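If you do this often, the two commands can be chained. This is only a sketch: `tex_to_odt_via_html`, `run` and `DRY_RUN` are hypothetical names of mine, and the final rename assumes that LibreOffice writes index.odt into the current directory and that you start from the directory holding the .tex file.

```shell
#!/bin/sh
# Sketch: LaTeX -> HTML (latex2html) -> ODT (LibreOffice) in one go.
# DRY_RUN is a hypothetical switch: when set to 1, commands are printed, not run.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

tex_to_odt_via_html() {
    texfile=$1
    base=$(basename "$texfile" .tex)     # texfile.tex -> texfile
    # Same options as the latex2html command above.
    run latex2html "$texfile" -split 0 -no_navigation -info "" -address "" \
        -html_version 4.0,unicode || return 1
    # latex2html put its output in a folder named after the .tex file.
    run libreoffice --headless --convert-to 'odt:OpenDocument Text Flat XML' \
        "$base/index.html" || return 1
    # Assumption: index.odt was written to the current directory.
    run mv index.odt "$base.odt"
}
```

Calling `tex_to_odt_via_html texfile.tex` should leave a texfile.odt next to the source instead of a generically named index.odt.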

Edit from comment discussion:

Although the method above works, it is very disappointing that the only way I found to generate a truly reliable document was to use the PDF output from LaTeX in Adobe Acrobat Pro.


Another solution is provided by the pandoc package.

As an example, you can do:

pandoc -f latex -t odt -o output.odt input.tex

If the input file is Latin-1 encoded, like my .tex files, convert it to UTF-8 first, since pandoc expects UTF-8 input:

iconv -f ISO-8859-1 -t UTF-8 input.tex | pandoc -f latex -t odt -o output.odt
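If you have a mix of files and don't remember which ones are already UTF-8, something like the following might help. It is a sketch under a big assumption: any file that is not valid UTF-8 is treated as Latin-1, which will be wrong for other legacy encodings. The helper names `is_utf8` and `to_utf8` are made up for this example.

```shell
#!/bin/sh
# Sketch: only re-encode when the file is not already valid UTF-8.

is_utf8() {
    # iconv exits non-zero if the input is not valid UTF-8.
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

to_utf8() {
    # Print the file as UTF-8 on standard output.
    if is_utf8 "$1"; then
        cat "$1"
    else
        # Assumption: anything that is not UTF-8 is Latin-1.
        iconv -f ISO-8859-1 -t UTF-8 "$1"
    fi
}
```

With these defined, `to_utf8 input.tex | pandoc -f latex -t odt -o output.odt` should work for both kinds of file.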

Here is part of the package description:

Pandoc is a Haskell library for converting from one markup format to another, and a command-line tool that uses this library. It can read

  • markdown and
  • subsets of
    • reStructuredText,
    • HTML, and
    • LaTeX

and it can write

  • plain text,
  • markdown,
  • reStructuredText,
  • HTML,
  • LaTeX,
  • ConTeXt,
  • RTF,
  • DocBook XML,
  • OpenDocument XML,
  • ODT,
  • GNU Texinfo,
  • MediaWiki markup,
  • EPUB,
  • Textile,
  • groff man pages,
  • Emacs Org-mode, and
  • Slidy or S5 HTML slide shows.