Convert a PDF file to image
You can try to use NonSequentialParser to avoid errors with some PDF files (with incremental updates):
PDDocument doc = PDDocument.loadNonSeq(new File("/document.pdf"));
You can easily convert 04-Request-Headers.pdf file pages into image format.
Convert all pdf pages into image format in Java using PDF Box.
Solution for Apache PDFBox 1.8.* version:
Jar required pdfbox-1.8.3.jar
or the maven dependency
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>1.8.3</version>
</dependency>
Here is the solution:
package com.pdf.pdfbox.examples;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
@SuppressWarnings("unchecked")
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:/Documents/04-Request-Headers.pdf"; // Pdf files are read from this folder
String destinationDir = "C:/Documents/Converted_PdfFiles_to_Image/"; // converted images from pdf document are saved here
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder: "+ destinationFile.getName());
PDDocument document = PDDocument.load(sourceDir);
List<PDPage> list = document.getDocumentCatalog().getAllPages();
System.out.println("Total files to be converted -> "+ list.size());
String fileName = sourceFile.getName().replace(".pdf", "");
int pageNumber = 1;
for (PDPage page : list) {
BufferedImage image = page.convertToImage();
File outputfile = new File(destinationDir + fileName +"_"+ pageNumber +".png");
System.out.println("Image Created -> "+ outputfile.getName());
ImageIO.write(image, "png", outputfile);
pageNumber++;
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Possible conversions of image into jpg, jpeg, png, bmp, gif
format.
Note: I mentioned the mainly used image formats.
ImageIO.write(image , "jpg", new File( destinationDir +fileName+"_"+pageNumber+".jpg" ));
ImageIO.write(image , "jpeg", new File( destinationDir +fileName+"_"+pageNumber+".jpeg" ));
ImageIO.write(image , "png", new File( destinationDir +fileName+"_"+pageNumber+".png" ));
ImageIO.write(image , "bmp", new File( destinationDir +fileName+"_"+pageNumber+".bmp" ));
ImageIO.write(image , "gif", new File( destinationDir +fileName+"_"+pageNumber+".gif" ));
Console Output:
Images copied to Folder: Converted_PdfFiles_to_Image
Total files to be converted -> 13
Aug 06, 2014 1:35:49 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_1.png
Aug 06, 2014 1:35:50 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_2.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_3.png
Aug 06, 2014 1:35:51 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_4.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_5.png
Aug 06, 2014 1:35:52 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_6.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_7.png
Aug 06, 2014 1:35:53 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_8.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_9.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_10.png
Aug 06, 2014 1:35:54 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_11.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_12.png
Aug 06, 2014 1:35:55 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: i
Image Created -> 04-Request-Headers_13.png
Converted Images are saved at -> C:\Documents\Converted_PdfFiles_to_Image
Solution for Apache PDFBox 2.0.* version:
Required Jars pdfbox-2.0.16.jar, fontbox-2.0.16.jar, commons-logging-1.2.jar
or from the pom.xml dependencies
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/fontbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>fontbox</artifactId>
<version>2.0.16</version>
</dependency>
<!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
Solution for 2.0.16 version:
package com.pdf.pdfbox.examples;
import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
/**
*
* @author venkataudaykiranp
*
* @version 2.0.16(Apache PDFBox version support)
*
*/
public class ConvertPDFPagesToImages {
public static void main(String[] args) {
try {
String sourceDir = "C:\\Users\\venkataudaykiranp\\Downloads\\04-Request-Headers.pdf"; // Pdf files are read from this folder
String destinationDir = "C:\\Users\\venkataudaykiranp\\Downloads\\Converted_PdfFiles_to_Image/"; // converted images from pdf document are saved here
File sourceFile = new File(sourceDir);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdir();
System.out.println("Folder Created -> "+ destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
System.out.println("Images copied to Folder Location: "+ destinationFile.getAbsolutePath());
PDDocument document = PDDocument.load(sourceFile);
PDFRenderer pdfRenderer = new PDFRenderer(document);
int numberOfPages = document.getNumberOfPages();
System.out.println("Total files to be converting -> "+ numberOfPages);
String fileName = sourceFile.getName().replace(".pdf", "");
String fileExtension= "png";
/*
* 600 dpi give good image clarity but size of each image is 2x times of 300 dpi.
* Ex: 1. For 300dpi 04-Request-Headers_2.png expected size is 797 KB
* 2. For 600dpi 04-Request-Headers_2.png expected size is 2.42 MB
*/
int dpi = 300;// use less dpi for to save more space in harddisk. For professional usage you can use more than 300dpi
for (int i = 0; i < numberOfPages; ++i) {
File outPutFile = new File(destinationDir + fileName +"_"+ (i+1) +"."+ fileExtension);
BufferedImage bImage = pdfRenderer.renderImageWithDPI(i, dpi, ImageType.RGB);
ImageIO.write(bImage, fileExtension, outPutFile);
}
document.close();
System.out.println("Converted Images are saved at -> "+ destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() +" File not exists");
}
} catch (Exception e) {
e.printStackTrace();
}
}
}