Extracting text from a PDF file

Try this:

http://www.codeproject.com/KB/cs/PDFToText.aspx

Bye

pdftotext seems to do the trick quite nicely.

pdftotext file.pdf [textfile.txt]

Edit: I'm not sure how you would like to retain information about the tables. The best looking output (to my human eye, at least) is produced by

pdftotext -layout file.pdf [textfile.txt]

This preserves the original layout of the document as closely as possible. In particular, tables still look pretty good in the text output. Without -layout, the default is to interpret the columns of a table as columns of text, which looks terrible. Another option that doesn't look as good to me, but might still be useful, is the -raw option.
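
Since the question is tagged Java, here is a minimal sketch of calling pdftotext from Java code. It assumes pdftotext is installed and on the PATH. The class name PdfToTextRunner and its two methods are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and methods are illustrative only.
class PdfToTextRunner {

    // Builds the pdftotext argument list. When keepLayout is true, the
    // -layout flag is added so tables keep their shape in the output.
    static List<String> buildCommand(String pdf, String txt, boolean keepLayout) {
        List<String> cmd = new ArrayList<>();
        cmd.add("pdftotext");
        if (keepLayout) {
            cmd.add("-layout");
        }
        cmd.add(pdf);
        cmd.add(txt);
        return cmd;
    }

    // Runs pdftotext (assumed to be on the PATH) and waits for it to finish.
    // Throws IOException if the tool cannot be started or exits non-zero.
    static void extract(String pdf, String txt, boolean keepLayout)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(pdf, txt, keepLayout))
                .inheritIO() // forward pdftotext's own error messages to the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("pdftotext exited with status " + exitCode);
        }
    }
}
```

Calling PdfToTextRunner.extract("file.pdf", "textfile.txt", true) should then be equivalent to running the -layout command above by hand. Passing the arguments as a list, rather than as one concatenated string, avoids quoting problems with file names that contain spaces.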

Tags: C#, Java, Pdf
