ChromeDriver --print-to-pdf after page load

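With the Selenium 4 Java bindings, this can be done through ChromiumDriver.executeCdpCommand (ChromeDriver extends ChromiumDriver). It sends the Chrome DevTools Protocol command Page.printToPDF to the page that is already loaded in the driver: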

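import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// driver is a ChromeDriver whose page has finished loading
String command = "Page.printToPDF";
Map<String, Object> params = new HashMap<>();
params.put("landscape", false);
Map<String, Object> output = driver.executeCdpCommand(command, params);
// The PDF is returned base64-encoded in the "data" field of the result
byte[] byteArray = Base64.getDecoder().decode((String) output.get("data"));
// try-with-resources closes the stream even if the write fails
try (FileOutputStream fileOutputStream = new FileOutputStream("export.pdf")) {
    fileOutputStream.write(byteArray);
} catch (IOException e) {
    e.printStackTrace();
}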

Source: Selenium_CDP

This is indeed possible through the .NET Selenium ChromeDriver, by means of the ExecuteChromeCommandWithResult method. When the command Page.printToPDF is executed, a base64-encoded PDF document is returned in the "data" item of the result dictionary.
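Since the PDF only arrives as a base64 string in the "data" item, it can help to keep the decoding step in one place and sanity-check its output. The following Java sketch (the class and method names are hypothetical, not part of Selenium) takes a result map, such as the one returned by executeCdpCommand in the Java example above, decodes "data" and checks for the "%PDF-" header that PDF files start with:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper, not a Selenium API
final class PrintToPdfResult {
    private PrintToPdfResult() {}

    /** Decodes the base64 "data" field of a Page.printToPDF result into raw PDF bytes. */
    static byte[] decodePdf(Map<String, Object> result) {
        Object data = result.get("data");
        if (!(data instanceof String)) {
            throw new IllegalArgumentException("result has no string \"data\" field");
        }
        byte[] pdf = Base64.getDecoder().decode((String) data);
        // Every PDF file starts with "%PDF-"; a cheap check that the bytes are really a PDF
        if (pdf.length < 5 || !new String(pdf, 0, 5, StandardCharsets.US_ASCII).equals("%PDF-")) {
            throw new IllegalArgumentException("decoded data does not look like a PDF");
        }
        return pdf;
    }

    /** Decodes the result and writes the PDF bytes to the target file. */
    static Path writePdf(Map<String, Object> result, Path target) throws IOException {
        return Files.write(target, decodePdf(result));
    }
}
```

Usage would then be along the lines of `PrintToPdfResult.writePdf(driver.executeCdpCommand("Page.printToPDF", params), Path.of("export.pdf"));`.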

A C# example, which should be easy to translate into Java, is available in this answer:

https://stackoverflow.com/a/58698226/2416627

Here is another C# example, which illustrates some useful options:

using System;
using System.Collections.Generic;
using System.IO;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class Program
{
    public static void Main(string[] args)
    {
        var driverOptions = new ChromeOptions();
        // In headless mode, PDF writing is enabled by default (tested with driver major version 85)
        driverOptions.AddArgument("headless");
        using (var driver = new ChromeDriver(driverOptions))
        {
            driver.Navigate().GoToUrl("https://stackoverflow.com/questions");
            new WebDriverWait(driver, TimeSpan.FromSeconds(10)).Until(d => d.FindElements(By.CssSelector("#questions")).Count == 1);
            // Output a PDF of the first page in A4 size at 90% scale (paper sizes are in inches)
            var printOptions = new Dictionary<string, object>
            {
                { "paperWidth", 210 / 25.4 },
                { "paperHeight", 297 / 25.4 },
                { "scale", 0.9 },
                { "pageRanges", "1" }
            };
            var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
            var pdf = Convert.FromBase64String(printOutput["data"] as string);
            File.WriteAllBytes("stackoverflow-page-1.pdf", pdf);
        }
    }
}

The options available for the Page.printToPDF call are documented here:

https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
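The protocol documentation gives paperWidth and paperHeight in inches, which is why the C# example divides millimetres by 25.4. A small Java helper (hypothetical names; a sketch, not a Selenium API) can do that conversion once and produce a map that executeCdpCommand accepts:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for building Page.printToPDF parameters
final class PrintOptions {
    private PrintOptions() {}

    private static final double MM_PER_INCH = 25.4;

    /** Paper size in millimetres, converted to the inches the protocol expects. */
    static Map<String, Object> paper(double widthMm, double heightMm, double scale, String pageRanges) {
        Map<String, Object> params = new HashMap<>();
        params.put("paperWidth", widthMm / MM_PER_INCH);
        params.put("paperHeight", heightMm / MM_PER_INCH);
        params.put("scale", scale);
        // Omit pageRanges entirely to print every page
        if (pageRanges != null && !pageRanges.isEmpty()) {
            params.put("pageRanges", pageRanges); // e.g. "1" or "1-5, 8"
        }
        return params;
    }

    /** A4 (210 x 297 mm), as in the C# example. */
    static Map<String, Object> a4(double scale, String pageRanges) {
        return paper(210, 297, scale, pageRanges);
    }
}
```

In Java, the C# settings would then correspond to `driver.executeCdpCommand("Page.printToPDF", PrintOptions.a4(0.9, "1"))`.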

Doing this from the command line (with wget and wkhtmltopdf instead of Chrome) takes a little tinkering with the page HTML using sed:

LOGIN='myuserid'
PASSW='mypasswd'
# Double quotes, so that $LOGIN and $PASSW are expanded
AUTH="pin=$LOGIN&accessCode=$PASSW&Submit=Submit"
TIMESTAMP=$(TZ=HST date -d "today" +"%m/%d/%y %I:%M %p HST")
wget -q --save-cookies cookies.txt --keep-session-cookies \
    --post-data "$AUTH" \
    https://csea.ehawaii.gov/iwa/index.html
# Point the stylesheet and image paths at the local bin directory
sed -i 's#href="/iwa/css#href="./bin#g' index.html
sed -i 's#src="/iwa/images#src="./bin#g' index.html
# Note: $d (the header text) is not set anywhere in this snippet; define it first
wkhtmltopdf -q --print-media-type \
            --header-left "$d" --header-font-size 10 \
            --header-line --header-spacing 10 \
            --footer-left "Page [page] of [toPage]" --footer-font-size 10 \
            --footer-line --footer-spacing 10 \
            --footer-right "$TIMESTAMP" \
            --margin-bottom 20 --margin-left 15 \
            --margin-top 20 --margin-right 15 \
            index.html index.pdf
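If the download is driven from Java rather than a shell script, the two sed substitutions above are plain literal replacements (their patterns contain no regex metacharacters). This sketch (hypothetical class name) applies the same rewrite to the HTML text:

```java
// Hypothetical Java equivalent of the two sed commands
final class LocalAssets {
    private LocalAssets() {}

    /** Points /iwa/css and /iwa/images paths at the local ./bin directory. */
    static String rewrite(String html) {
        // String.replace replaces every occurrence, like the g flag in sed
        return html
            .replace("href=\"/iwa/css", "href=\"./bin")
            .replace("src=\"/iwa/images", "src=\"./bin");
    }
}
```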

Assuming valid cookies, further pages available after login could be accessed like this:

wget -q --load-cookies cookies.txt https://csea.ehawaii.gov/otherpage.html
wkhtmltopdf <all the options> otherpage.html otherpage.pdf

Beforehand, I had also downloaded all the CSS and images into a local bin directory, roughly like this:

wget -r -A.jpg -A.gif -A.css -nd -Pbin \
    https://csea.ehawaii.gov/iwa/index.html

As there are no answers, I will explain my workaround. Instead of looking for a way to ask Chrome to print the page that is currently loaded, I took another route.

For this example, we will download the Google results page for the query 'example':

1. Navigate with driver.get("https://www.google.com"), type the query 'example' and click 'Google Search'.
2. Wait for the results page to load.
3. Retrieve the page source with driver.getPageSource().
4. Parse the source, e.g. with Jsoup, and remap all relative links to an endpoint defined for this purpose (explained below), for example on localhost:8080. The link './style.css' would become 'http://localhost:8080/style.css'.
5. Save the HTML to a file, e.g. named 'query-example'.
6. Run chrome --headless --print-to-pdf=query-example.pdf "http://localhost:8080/search/html?id=query-example"
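Step 4 is the fiddly part: each link has to be resolved against the URL the page originally came from before it can be pointed at the proxy. A minimal sketch using only java.net.URI, assuming the proxy runs at http://localhost:8080 and that only links to the page's own host should be remapped (the class and method names are hypothetical):

```java
import java.net.URI;

// Hypothetical helper for step 4; not part of Jsoup or Selenium
final class LinkRemapper {
    private LinkRemapper() {}

    /**
     * Resolves a link against the page's original URL; if it points at the same host,
     * rewrites it to go through the local proxy. Links to other hosts are returned unchanged.
     * Throws IllegalArgumentException for links that are not valid URIs.
     */
    static String remap(String pageUrl, String link, String proxyBase) {
        URI page = URI.create(pageUrl);
        URI resolved = page.resolve(link.trim()); // also normalizes "./" and "../"
        if (resolved.getHost() == null || !resolved.getHost().equalsIgnoreCase(page.getHost())) {
            return link;
        }
        String path = resolved.getRawPath() == null || resolved.getRawPath().isEmpty()
                ? "/" : resolved.getRawPath();
        String query = resolved.getRawQuery() == null ? "" : "?" + resolved.getRawQuery();
        return proxyBase + path + query;
    }
}
```

With a parser such as Jsoup, this function would be applied to the href and src attribute of each element before the HTML is saved in step 5.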

Chrome then requests the HTML from our controller. Because the relative links were remapped, every resource referenced in that HTML is also requested from our controller, which redirects each request to the resource's real location on google.com. Below is an example Spring controller; note that it is incomplete and meant only as guidance.

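import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.servlet.http.HttpServletRequest; // jakarta.servlet.http on Spring Boot 3+
import javax.servlet.http.HttpServletResponse;
import org.apache.commons.io.IOUtils;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping
public class InternationalOffloadRestController {
  // Assumed encoding of the saved HTML files
  private static final Charset HTML_ENCODING = StandardCharsets.UTF_8;

  // Serves the saved page as HTML, e.g. GET /search/html?id=query-example
  @RequestMapping(method = RequestMethod.GET, value = "/search/html", produces = "text/html")
  public String getHtml(@RequestParam("id") String id) throws IOException {
    File file = new File("location of the HTML file", id);
    try (FileInputStream input = new FileInputStream(file)) {
      return IOUtils.toString(input, HTML_ENCODING);
    }
  }

  @RequestMapping("/**") // forward all remapped links to google.com
  public void forward(HttpServletRequest request, HttpServletResponse httpServletResponse) {
    // The request URI and query string are already URL-encoded, so they are appended as-is
    String query = request.getQueryString();
    String target = "https://google.com" + request.getRequestURI() + (query == null ? "" : "?" + query);
    httpServletResponse.setHeader("Location", target);
    httpServletResponse.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
  }
}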
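Step 6 can also be launched from Java once the controller is running. The sketch below (hypothetical class name; the Chrome binary name, e.g. google-chrome on Linux, depends on the installation) builds the headless command line and runs it with ProcessBuilder:

```java
import java.io.IOException;
import java.util.List;

// Hypothetical launcher for step 6
final class HeadlessPrint {
    private HeadlessPrint() {}

    /** Command line that prints the given URL to a PDF file with headless Chrome. */
    static List<String> command(String chromeBinary, String outputPdf, String url) {
        return List.of(chromeBinary, "--headless", "--print-to-pdf=" + outputPdf, url);
    }

    /** Runs Chrome, forwarding its output to this process, and returns the exit code. */
    static int print(String chromeBinary, String outputPdf, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(chromeBinary, outputPdf, url))
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

For example, `HeadlessPrint.print("google-chrome", "query-example.pdf", "http://localhost:8080/search/html?id=query-example")`. ProcessBuilder passes each argument directly to the process without a shell, so the `?` and `=` in the URL need no quoting.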
