ChromeDriver --print-to-pdf after page load
With Selenium 4.x for Java, this can be done through ChromiumDriver (the base class of ChromeDriver), which exposes executeCdpCommand.
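import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// driver is a ChromeDriver (or other ChromiumDriver) with the page already loaded
String command = "Page.printToPDF";
Map<String, Object> params = new HashMap<>();
params.put("landscape", false);
Map<String, Object> output = driver.executeCdpCommand(command, params);
// The PDF is returned base64-encoded under the "data" key
byte[] byteArray = Base64.getDecoder().decode((String) output.get("data"));
// try-with-resources closes the stream even if the write fails
try (FileOutputStream fileOutputStream = new FileOutputStream("export.pdf")) {
    fileOutputStream.write(byteArray);
} catch (IOException e) {
    e.printStackTrace();
}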
Source: Selenium_CDP
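The decoding step in the snippet above can be tried without a browser. Below is a minimal sketch using only the JDK; the class name PdfResultWriter and its two methods are made up for illustration, and the map passed in main is a stand-in for what executeCdpCommand would return.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.Map;

// Hypothetical helper, not part of Selenium.
final class PdfResultWriter {

    // Decodes the base64 "data" entry of a Page.printToPDF result into raw bytes.
    static byte[] decodePdf(Map<String, Object> cdpResult) {
        Object data = cdpResult.get("data");
        if (!(data instanceof String)) {
            throw new IllegalArgumentException("result has no string \"data\" entry");
        }
        return Base64.getDecoder().decode((String) data);
    }

    // Writes the decoded bytes to the target path, replacing any existing file.
    static Path writePdf(Map<String, Object> cdpResult, Path target) throws IOException {
        return Files.write(target, decodePdf(cdpResult));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for driver.executeCdpCommand("Page.printToPDF", params)
        String fakeData = Base64.getEncoder()
                .encodeToString("%PDF-1.4 demo".getBytes(StandardCharsets.US_ASCII));
        Path out = writePdf(Map.of("data", fakeData), Path.of("export.pdf"));
        System.out.println("wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Decoding before opening the file means a malformed result fails early, without leaving an empty export.pdf behind.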
This is indeed possible to do through Selenium ChromeDriver, by means of the ExecuteChromeCommandWithResult method. When executing the command Page.printToPDF, a base-64-encoded PDF document is returned in the "data" item of the result dictionary.
A C# example, which should be easy to translate into Java, is available in this answer:
https://stackoverflow.com/a/58698226/2416627
Here is another C# example, which illustrates some useful options:
using System;
using System.Collections.Generic;
using System.IO;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static void Main(string[] args)
{
    var driverOptions = new ChromeOptions();
    // In headless mode, PDF writing is enabled by default (tested with driver major version 85)
    driverOptions.AddArgument("headless");
    using (var driver = new ChromeDriver(driverOptions))
    {
        driver.Navigate().GoToUrl("https://stackoverflow.com/questions");
        // Wait until the question list has been rendered
        new WebDriverWait(driver, TimeSpan.FromSeconds(10)).Until(d => d.FindElements(By.CssSelector("#questions")).Count == 1);
        // Output a PDF of the first page in A4 size at 90% scale
        // (paper dimensions are given in inches, hence the division by 25.4)
        var printOptions = new Dictionary<string, object>
        {
            { "paperWidth", 210 / 25.4 },
            { "paperHeight", 297 / 25.4 },
            { "scale", 0.9 },
            { "pageRanges", "1" }
        };
        var printOutput = driver.ExecuteChromeCommandWithResult("Page.printToPDF", printOptions) as Dictionary<string, object>;
        var pdf = Convert.FromBase64String(printOutput["data"] as string);
        File.WriteAllBytes("stackoverflow-page-1.pdf", pdf);
    }
}
The options available for the Page.printToPDF call are documented here:
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-printToPDF
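For Java users, the same options can be built as a plain map and passed to executeCdpCommand. This is a rough sketch using only the JDK; the class name PrintOptions and its methods are hypothetical, while the parameter names (paperWidth, paperHeight, scale, pageRanges) are the ones used in the C# example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that builds a parameter map for Page.printToPDF.
final class PrintOptions {
    private static final double MM_PER_INCH = 25.4;

    // Page.printToPDF expects paper dimensions in inches, so convert from millimetres.
    static double mmToInches(double mm) {
        return mm / MM_PER_INCH;
    }

    // First page only, A4 paper (210 mm x 297 mm), 90% scale.
    static Map<String, Object> a4FirstPage() {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("paperWidth", mmToInches(210));
        params.put("paperHeight", mmToInches(297));
        params.put("scale", 0.9);
        params.put("pageRanges", "1");
        return params;
    }

    public static void main(String[] args) {
        // With a live driver this map would be passed as:
        // driver.executeCdpCommand("Page.printToPDF", PrintOptions.a4FirstPage());
        System.out.println(a4FirstPage());
    }
}
```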
Here is an example of doing this from the command line with wget and wkhtmltopdf. It takes a little tinkering with the page HTML using sed:
LOGIN='myuserid'
PASSW='mypasswd'
# Double quotes, so that $LOGIN and $PASSW are actually expanded
AUTH="pin=$LOGIN&accessCode=$PASSW&Submit=Submit"
TIMESTAMP=$(TZ=HST date -d "today" +"%m/%d/%y %I:%M %p HST")
# Log in and keep the session cookies for later requests
wget -q --save-cookies cookies.txt --keep-session-cookies \
  --post-data "$AUTH" \
  https://csea.ehawaii.gov/iwa/index.html
# Point stylesheet and image links at the local ./bin directory
sed -i 's#href="/iwa/css#href="./bin#g' index.html
sed -i 's#src="/iwa/images#src="./bin#g' index.html
# Note: $d is not set anywhere in this snippet
wkhtmltopdf -q --print-media-type \
  --header-left "$d" --header-font-size 10 \
  --header-line --header-spacing 10 \
  --footer-left "Page [page] of [toPage]" --footer-font-size 10 \
  --footer-line --footer-spacing 10 \
  --footer-right "$TIMESTAMP" \
  --margin-bottom 20 --margin-left 15 \
  --margin-top 20 --margin-right 15 \
  index.html index.pdf
Assuming valid cookies, further pages available after login could be accessed like this:
wget -q --load-cookies cookies.txt https://csea.ehawaii.gov/otherpage.html
wkhtmltopdf <all the options> otherpage.html otherpage.pdf
Also, I had previously dumped all the CSS and images in a local bin directory, something like this:
wget -r -A.jpg -A.gif -A.css -nd -Pbin \
https://csea.ehawaii.gov/iwa/index.html
As there are no answers, I will explain my workaround. Instead of trying to find out how to ask Chrome to print the current page, I took a different route.
For this example we will try to download the results page from Google on the query 'example':
- Navigate with driver.get("https://google.com"), input the query 'example', click 'Google Search'
- Wait for the results page to load
- Retrieve the page source with driver.getPageSource()
- Parse the source with e.g. Jsoup in order to remap all relative links to point to an endpoint defined for this purpose (explained below), for example localhost:8080. The link './style.css' would become 'localhost:8080/style.css'
- Save the HTML to a file, e.g. named 'query-example'
- Run chrome --headless --print-to-pdf localhost:8080/search/html?id=query-example
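The link-remapping step in the list above could be sketched as follows. This is a simplified stand-in rather than the Jsoup-based approach mentioned in the list: it uses a regular expression and java.net.URI from the JDK, only handles double-quoted href and src attributes, and assumes the endpoint is http://localhost:8080/. The class name LinkRemapper is hypothetical.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a real HTML parser such as Jsoup is more robust.
final class LinkRemapper {
    // Matches href="..." and src="..." with double-quoted values only.
    private static final Pattern LINK_ATTR = Pattern.compile("(href|src)=\"([^\"]*)\"");

    // Resolves every relative href/src against the endpoint; absolute URLs stay unchanged.
    static String remap(String html, String endpoint) {
        URI base = URI.create(endpoint);
        Matcher m = LINK_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(2);
            String resolved;
            try {
                resolved = base.resolve(target).toString();
            } catch (IllegalArgumentException e) {
                resolved = target; // leave values that are not valid URIs untouched
            }
            String attr = m.group(1) + "=\"" + resolved + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(attr));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<link href=\"./style.css\"><img src=\"https://example.com/a.png\">";
        System.out.println(remap(html, "http://localhost:8080/"));
    }
}
```

Links that are already absolute are returned as-is by URI.resolve, so only relative ones end up pointing at the local endpoint.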
Chrome will request the HTML from our controller. For every resource referenced in that HTML it will also go to our controller, since we remapped the relative links, and the controller will in turn redirect the request to the real location of the resource, google.com. Below is an example Spring controller; note that it is incomplete and is here only as guidance.
@RestController
@RequestMapping
public class InternationalOffloadRestController {

    // Serves the saved HTML file identified by 'id'
    @RequestMapping(method = RequestMethod.GET, value = "/search/html")
    public String getHtml(@RequestParam("id") String id) throws IOException {
        File file = new File("location of the HTML file", id);
        try (FileInputStream input = new FileInputStream(file)) {
            return IOUtils.toString(input, HTML_ENCODING);
        }
    }

    // Redirects all remapped links to google.com
    @RequestMapping("/**")
    public void forward(HttpServletResponse httpServletResponse, ...) throws URISyntaxException {
        // The elided parameters must include the HttpServletRequest used as 'request' below
        URI uri = new URI("https", null, "google.com", -1,
                request.getRequestURI(), request.getQueryString(), null);
        httpServletResponse.setHeader("Location", uri.toString());
        httpServletResponse.setStatus(HttpServletResponse.SC_MOVED_PERMANENTLY);
    }
}
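To see what the redirect target looks like, the URI construction from the controller can be run on its own. A small sketch with only the JDK; the class name RedirectTarget and the sample path and query are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper isolating the URI construction used in the controller.
final class RedirectTarget {

    // Rebuilds the request's path and query on the real host; port -1 means "no explicit port".
    static URI toRealLocation(String path, String query) throws URISyntaxException {
        return new URI("https", null, "google.com", -1, path, query, null);
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(toRealLocation("/style.css", "v=1")); // https://google.com/style.css?v=1
        System.out.println(toRealLocation("/style.css", null));  // https://google.com/style.css
    }
}
```

One caveat: this multi-argument URI constructor always quotes the percent character, so a query string that is already percent-encoded (as returned by getQueryString()) would end up double-encoded. That is worth checking if the remapped links carry encoded parameters.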