Can I read PDF or Word Docs with Node.js?

textract is a great library that supports PDF, DOC, DOCX, and a number of other formats.


It looks like there are a few libraries for PDF, but I didn't find any for Word.

CPU-bound processing like that isn't really Node's strong point anyway (i.e. you get no additional benefit from doing it in Node rather than in any other language). A pragmatic approach would be to find a good tool and call it from Node.

I have heard good things around the office about docsplit: http://documentcloud.github.com/docsplit/

While it's not Node, you could easily invoke it from Node with child_process.exec: http://nodejs.org/docs/latest/api/all.html#child_process.exec
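
For what it's worth, here is a rough sketch of what shelling out to docsplit might look like. It assumes the docsplit command is installed and on your PATH, and that `docsplit text` writes a file named after the input (e.g. report.pdf becomes report.txt) into the output directory; check that against your docsplit version. The helper names docsplitTextArgs and extractText are made up for this example. I've used execFile rather than exec (same child_process module) so the file path is passed as an argument and never goes through a shell.

```javascript
const { execFile } = require('child_process');
const fs = require('fs');
const path = require('path');

// Hypothetical helper: builds the argument list for `docsplit text`.
// Assumes the `--output` option selects the directory for the extracted text.
function docsplitTextArgs(inputPath, outputDir) {
  return ['text', '--output', outputDir, inputPath];
}

// Hypothetical helper: runs docsplit, then reads back the text file it produced.
// callback(err, text) follows the usual Node convention.
function extractText(inputPath, outputDir, callback) {
  execFile('docsplit', docsplitTextArgs(inputPath, outputDir), (err) => {
    if (err) return callback(err);
    // Assumption: docsplit names the output <input basename>.txt
    const base = path.basename(inputPath, path.extname(inputPath));
    fs.readFile(path.join(outputDir, base + '.txt'), 'utf8', callback);
  });
}

module.exports = { docsplitTextArgs, extractText };
```

You would call it as extractText('report.pdf', 'out', (err, text) => { ... }). The extraction runs in a separate process, so Node's event loop stays free while docsplit does the CPU-heavy work.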