Convert a PDF file to text in C#

I had the same need, and this article got me started: http://www.codeproject.com/KB/string/pdf2text.aspx


Ghostscript could do what you need. Below is a command that extracts the text from a PDF file into a .txt file (run it from a command prompt first to check that it works for you):

gswin32c.exe -q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE -c save -f ps2ascii.ps "test.pdf" -c quit >"test.txt"
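If you just want to launch that same command from C#, something along these lines might work. It is only a rough sketch using System.Diagnostics.Process, not the Ghostscript API approach from the article mentioned in this answer. The class and method names (PdfTextExtractor, BuildArguments, Extract) are made up for illustration, and it assumes gswin32c.exe and ps2ascii.ps can be found by your Ghostscript install. Since there is no shell to handle the ">" redirection, the program reads standard output itself and writes it to the text file.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper; the names are illustrative, not part of Ghostscript.
static class PdfTextExtractor
{
    // Builds the same switches as the command line above,
    // minus the shell redirection (> "test.txt").
    public static string BuildArguments(string pdfPath)
    {
        return "-q -dNODISPLAY -dSAFER -dDELAYBIND -dWRITESYSTEMDICT -dSIMPLE " +
               "-c save -f ps2ascii.ps \"" + pdfPath + "\" -c quit";
    }

    // Runs Ghostscript and saves whatever it prints to stdout into txtPath.
    // Assumption: ghostscriptExe is a full path or is resolvable via PATH.
    public static void Extract(string ghostscriptExe, string pdfPath, string txtPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ghostscriptExe,
            Arguments = BuildArguments(pdfPath),
            UseShellExecute = false,       // required for stream redirection
            RedirectStandardOutput = true, // replaces the shell's "> file"
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read everything before waiting, so a full output buffer
            // cannot block the child process.
            string text = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException(
                    "Ghostscript exited with code " + process.ExitCode);
            }

            File.WriteAllText(txtPath, text);
        }
    }
}
```

You would call it with something like PdfTextExtractor.Extract("gswin32c.exe", "test.pdf", "test.txt"). Shelling out like this is simpler than wrapping the Ghostscript DLL, but it does start a new process for every file.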

See the CodeProject article "Convert PDF to Image Using Ghostscript API" for details on how to use Ghostscript with C#.

Tags: C#, Pdf, Text Files