Jsoup: how to get an image's absolute url?
Once you have the image element, e.g.:
Element image = document.select("img").first();
String url = image.absUrl("src");
// url = http://www.example.com/images/chicken.jpg
Alternatively:
String url = image.attr("abs:src");
jsoup has a built-in absUrl() method on all nodes that resolves an attribute to an absolute URL, using the base URL of the node (which could be different from the URL the document was retrieved from).
See also "Working with URLs" in the jsoup documentation.
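The role of the base URL can be illustrated with a small sketch. It assumes jsoup is on the classpath and parses an HTML string rather than fetching a page; the class name AbsUrlSketch, the helper firstImageUrl and the cdn.example.com host are made up for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class AbsUrlSketch {
    // Hypothetical helper: parses the HTML with the given base URI and
    // returns the absolute URL of the first <img>, or "" if there is none.
    static String firstImageUrl(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element image = doc.select("img").first();
        return image == null ? "" : image.absUrl("src");
    }

    public static void main(String[] args) {
        String html = "<img src=\"images/chicken.jpg\">";

        // With a base URI, the relative src is resolved against it.
        System.out.println(firstImageUrl(html, "http://www.example.com/index.html"));
        // prints http://www.example.com/images/chicken.jpg

        // Without a base URI there is nothing to resolve against,
        // so absUrl returns an empty string.
        System.out.println(firstImageUrl(html, "").isEmpty());
        // prints true

        // A <base href> in the document replaces the base URI passed to parse().
        String withBase = "<html><head><base href=\"http://cdn.example.com/\"></head>"
                + "<body><img src=\"images/chicken.jpg\"></body></html>";
        System.out.println(firstImageUrl(withBase, "http://www.example.com/index.html"));
        // prints http://cdn.example.com/images/chicken.jpg
    }
}
```

If absUrl gives you an empty string, the usual cause is a document parsed from a string or file without a base URI.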
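// jsoup needs the protocol in the URL; "www.abc.com" alone is rejected as malformed.
Document doc = Jsoup.connect("http://www.abc.com").get();
Elements images = doc.getElementsByTag("img");
for (Element el : images) {
    // absUrl resolves the src attribute against the document's base URI.
    String src = el.absUrl("src");
    System.out.println("Image found!");
    System.out.println("src attribute is: " + src);
    getImages(src); // the poster's own helper, not shown here
}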
Let's assume you are parsing http://www.example.com/index.html.
Use jsoup to extract the img src which gives you: images/chicken.jpg
You can then use the URI class to resolve this to an absolute URL:
URL url = new URL("http://www.example.com/index.html");
URI uri = url.toURI();
System.out.println(uri.resolve("images/chicken.jpg").toString());
prints
http://www.example.com/images/chicken.jpg
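The same URI.resolve call also copes with other forms of src. The following is only a sketch using the standard library; the class name ResolveImageUrl, the helper toAbsolute and the /blog/ page path are invented for the example.

```java
import java.net.URI;

final class ResolveImageUrl {
    // Hypothetical helper: resolves a possibly relative src value
    // against the URL of the page it was found on.
    static String toAbsolute(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/blog/index.html";

        // Relative to the page's directory.
        System.out.println(toAbsolute(page, "images/chicken.jpg"));
        // prints http://www.example.com/blog/images/chicken.jpg

        // Root-relative: the path replaces the page's path.
        System.out.println(toAbsolute(page, "/images/chicken.jpg"));
        // prints http://www.example.com/images/chicken.jpg

        // Parent-directory reference is normalized away.
        System.out.println(toAbsolute(page, "../images/chicken.jpg"));
        // prints http://www.example.com/images/chicken.jpg

        // An already absolute src is returned unchanged.
        System.out.println(toAbsolute(page, "http://cdn.example.com/chicken.jpg"));
        // prints http://cdn.example.com/chicken.jpg
    }
}
```

Note that URI.create throws IllegalArgumentException for values that are not valid URIs, such as a src containing an unescaped space, so real-world pages may need extra handling.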