How to know if a PDF file is infected?
Analyzing a malicious PDF can be very tricky, and attackers are becoming more and more creative in how they infect people.
But let's keep this simple. Here are some signs that indicate a PDF may be malicious.
JavaScript based exploits
The PDF specification supports JavaScript programming and makes a number of JavaScript functions available to programmers in the form of APIs.
Due to its flexibility and ease of use, JavaScript is widely used in malicious PDFs, both to exploit a vulnerable JavaScript API and to set up the PDF reader program's memory with malicious code (a technique known as heap spraying).
Non-JavaScript based exploits
Although the majority of malicious PDFs observed in the wild use JavaScript, either for the exploit or to set up the memory for further exploitation, we have observed other techniques used as well. One alternative to using JavaScript is to embed Flash objects in the PDF instead.
From PDF document: The Rise of PDF Malware
Here is also a nice cheat sheet for analyzing malicious documents.
Also take a look at 'How can I tell if a PDF file I was sent contains malware?'
After a little looking, it appears that the tool you are using to investigate this PDF document is a standalone Python(?) tool written by a "security researcher". I put that title in quotes simply because I know nothing about him, other than the fact that he claims to be a security researcher and likes putting his name on his website.
Perhaps someone who is more of a PDF expert can come by and give some better information, but from what I have seen so far his tool doesn't seem very helpful for deciding whether a particular PDF file contains malicious javascript. Considering that both javascript and actions are part of the Adobe standard for PDF files, it seems crazy to assume that a PDF file might be malicious just because it contains javascript/actions. He doesn't state that himself, but he does offer the very useless qualifier that "every malicious PDF file I have seen contains javascript/actions". Here is an equally true statement: "Every malicious website I have seen contains javascript". Do I therefore disable javascript in my browser or avoid pages with javascript? Obviously not. From my perspective, the biggest problem is a researcher who perhaps doesn't understand the difference between correlation and causation.
That being said, it is possible this document contains malicious javascript. The best way to find out would be to try to extract the javascript in question and see what it actually does without running it. Since the tool in question is already parsing a PDF file, it may be possible to get that information out of said tool. Then again you might have to find another tool or attempt it yourself.
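As a rough first step before extracting anything, you can count how often the PDF names associated with active content show up in the raw bytes, without opening the file in a reader. The sketch below is only an illustration: the list of names and the function names are my own assumptions, it does not look inside compressed streams, and (as argued above) a non-zero count only tells you there is something to inspect, not that the file is malicious.

```python
import re

# Names commonly associated with active content. This list is an assumption
# chosen for illustration, not an exhaustive or authoritative set.
SUSPICIOUS_NAMES = [
    b"/JS", b"/JavaScript", b"/OpenAction", b"/AA", b"/Launch", b"/RichMedia",
]

_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name_escapes(data: bytes) -> bytes:
    """Undo #xx hex escapes, which can disguise a name (e.g. /J#53 for /JS).

    This is applied to the whole buffer for simplicity, so it also rewrites
    #xx sequences that are not part of a name. Fine for counting, nothing more.
    """
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


def count_suspicious_names(data: bytes) -> dict:
    """Count occurrences of each suspicious name in the raw PDF bytes."""
    decoded = decode_name_escapes(data)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Refuse a trailing letter or digit so /JS does not match /JSomething.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, decoded))
    return counts
```

To use it, read the file in binary mode and pass the bytes in, for example `count_suspicious_names(open("input.pdf", "rb").read())`. Nothing in the file is executed; it is only read as bytes.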
If none of those options appeal to you I would try to look at this as a risk/benefit analysis:
- Do you have any reason to distrust this PDF file?
- Did it come from a reputable source?
If it came from a reputable source and you have no reason to distrust it, I would probably just open it. If you are worried you can always try to open it in a virtual machine, or find a PDF reader that doesn't process javascript. You can also try to find a way to remove any javascript from the PDF before viewing. I imagine that this is what pdfid -d is supposed to do, but considering that I know nothing about the tool, that question would be best directed to the author.
If you are on Linux, something as simple as:
pdf2ps input.pdf - | ps2pdf - output.pdf
may work. This will convert the file from PDF to PostScript and back to PDF. Basically, it prints the file, which (I believe) will remove all meta information. I imagine that pdf2ps doesn't have a built-in javascript library, so I think it is safe to assume that any malicious javascript will be removed in this process.
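If you want to run that from a script rather than by hand, a minimal Python wrapper might look like the sketch below. The function names are made up for illustration, it assumes pdf2ps and ps2pdf (both part of Ghostscript) are installed and on your PATH, and it carries the same "I believe" caveat as the command itself.

```python
import shlex
import subprocess


def build_pipeline(src: str, dst: str) -> str:
    """Return the pdf2ps | ps2pdf pipeline with both paths shell-quoted."""
    return f"pdf2ps {shlex.quote(src)} - | ps2pdf - {shlex.quote(dst)}"


def rewrite_pdf(src: str, dst: str) -> None:
    """Round-trip src through PostScript and write the result to dst.

    shell=True is used because of the pipe; the paths are quoted in
    build_pipeline. check=True only sees the exit status of the last
    command in the pipeline (ps2pdf), so a pdf2ps failure may go unnoticed.
    """
    subprocess.run(build_pipeline(src, dst), shell=True, check=True)
```

Calling `rewrite_pdf("input.pdf", "output.pdf")` runs exactly the command shown above. Quoting the paths matters mostly when the file name came from somewhere you don't trust, which is likely if you are suspicious of the PDF in the first place.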
Then again, all of this is an "off the top of my head" answer, so your best bet is to ask another question about how to safely remove javascript from a PDF file. I'm sure that is a much more concrete (and easily answered) question than "How to know if a PDF file is infected?".