extract text from tex, remove latex tags
This may not be exactly what the OP requested, but it might be of some help.
There is pdftotext in poppler-utils. Once you have compiled your .tex file to PDF, it can convert that PDF file to a TXT file via
pdftotext yourPDF.pdf
Of course this incurs the overhead of installing this package, but I think it's negligible: if I remember correctly, Poppler is the standard library for rendering PDF on Linux, so if you have a PDF viewer installed (think Evince or Okular), the library will be there already. On some distributions the command-line tools (poppler-utils) are a separate package that you may still need to install.
Find here some more instructions.
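If you do this often, the pdftotext call can be wrapped in a small shell function. This is only a sketch: pdf_to_text is a made-up name, not something that ships with poppler-utils, and the -layout option (which asks pdftotext to keep the physical layout) is my choice rather than part of the command above.

```shell
# Hypothetical helper (not part of poppler-utils): convert one PDF to text.
# Usage: pdf_to_text paper.pdf [paper.txt]
pdf_to_text() {
    pdf=$1
    # Default output name: same as the PDF, with .pdf replaced by .txt
    txt=${2:-${pdf%.pdf}.txt}
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found (it ships with poppler-utils)" >&2
        return 127
    fi
    # -layout tries to preserve the physical layout (columns, tables);
    # drop it if you prefer the text in reading order
    pdftotext -layout "$pdf" "$txt"
}
```

Calling pdf_to_text yourPDF.pdf should then leave yourPDF.txt next to the PDF.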
OpenDetex is available for both Windows and Linux.
Download OpenDetex from here:
http://opendetex.googlecode.com/files/opendetex-2.8.1.tar.bz2
http://code.google.com/p/opendetex/downloads/list
Usage: http://code.google.com/p/opendetex/wiki/Usage
Extract it to any directory of your choice. Say you extract it to the Downloads directory.
Make another directory of any name inside the extracted opendetex directory (optional, but a good idea). Say the directory name is “my_paper”. Put your paper in the “my_paper” directory. Say your paper is named project.tex.
Change into the opendetex directory:
cd ~/Downloads/opendetex
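Run the command (write ./detex instead of detex if the binary sits in the current directory and is not on your PATH):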
detex -n my_paper/project.tex > out.txt
The general form is:
detex -n full_path_to_tex_file.tex > output_text_file.txt
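If the paper is split over several .tex files, the same command can be run in a loop. A rough sketch, assuming detex is on your PATH; detex_dir is a name I made up for illustration, and the -n option is the same one used above (it tells detex not to follow \input and \include):

```shell
# Hypothetical helper: strip LaTeX markup from every .tex file in a directory,
# writing NAME.txt next to each NAME.tex.
# Usage: detex_dir my_paper
detex_dir() {
    dir=${1:-.}
    for tex in "$dir"/*.tex; do
        # Skip the unexpanded pattern when the directory has no .tex files
        [ -e "$tex" ] || continue
        # -n: do not follow \input and \include, as in the command above
        detex -n "$tex" > "${tex%.tex}.txt"
    done
}
```

For example, detex_dir my_paper should produce my_paper/project.txt from my_paper/project.tex.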
detex(1):
Please see the OpenDetex GitHub page for the latest version of OpenDetex. It is a more modern, derivative version of my original DeTeX.
My legacy DeTeX home page is available here.
If you just want the legacy detex-2.8.tar source, you can get it here.