Remove empty tag pairs from an HTML fragment

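I'm not really familiar with jsoup, but for simple cases you could do this with a regex replace (it only matches attribute-free tags with nothing, not even whitespace, between the opening and closing tag):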

String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
html = html.replaceAll("<([^>]*)></\\1>", "");
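A single replaceAll pass like the one above leaves behind pairs that only become empty once their children are gone (for example <div><p></p></div>), and it skips tags that carry attributes. A minimal sketch of a more tolerant variant follows; the class name EmptyTagStripper, the method stripEmptyPairs and the widened pattern are illustrative assumptions, not part of the original answer:

```java
import java.util.regex.Pattern;

class EmptyTagStripper {
    // Assumed pattern: an opening tag (optionally with attributes) followed,
    // apart from whitespace, by the closing tag of the same name.
    private static final Pattern EMPTY_PAIR =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)(\\s[^<>]*)?>\\s*</\\1\\s*>");

    // Repeats the replacement until the string stops changing, so parents
    // that become empty after their children are removed are dropped too.
    static String stripEmptyPairs(String html) {
        String previous;
        do {
            previous = html;
            html = EMPTY_PAIR.matcher(html).replaceAll("");
        } while (!html.equals(previous));
        return html;
    }

    public static void main(String[] args) {
        String html = "<p></p><div><p class=\"x\"> </p></div><p>Hello<br/>world</p><p></p>";
        System.out.println(stripEmptyPairs(html)); // prints <p>Hello<br/>world</p>
    }
}
```

Note that this would also strip pairs that are empty on purpose, such as <script src="..."></script> or <textarea></textarea>, so it is only reasonable for fragments where you know those cannot occur.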

Although with a full parser you could probably just drop empty content during processing, depending on what you're eventually going to do with it.


Here is an example that does just that (using JSoup):

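import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
Document doc = Jsoup.parse(html);

// select("*") returns a snapshot list, so removing elements while iterating is safe
for (Element element : doc.select("*")) {
    // drop block-level elements that contain no text at any depth
    if (!element.hasText() && element.isBlock()) {
        element.remove();
    }
}

System.out.println(doc.body().html());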

The output of the code above is what you are looking for (newer jsoup versions may print <br> instead of <br />):

<p>Hello<br />world</p>

Jsoup will produce well-formed XML from user-supplied HTML. You can then use an XML parser to find and remove all the empty tags, which I think is a better idea than a regexp. See the question "Java Remove empty XML tags".

You can also use jsoup itself to find the empty tags for you: see the selector syntax at http://jsoup.org/cookbook/extracting-data/selector-syntax and call Node.remove() on each match.
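
The XML-parser route could look roughly like the sketch below. It uses the JDK's built-in DOM parser rather than jsoup and assumes the fragment is already well-formed XML (for instance after jsoup has cleaned it up). The class name EmptyElementPruner, the <root> wrapper element and the VOID_TAGS list are hypothetical choices made for illustration; it needs Java 9+ for Set.of.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EmptyElementPruner {
    // Illustrative subset of elements that are empty by design and must be kept.
    private static final Set<String> VOID_TAGS =
            Set.of("br", "hr", "img", "input", "meta", "link");

    // Wraps the fragment in a hypothetical <root> element so it parses as one document.
    static Document parseFragment(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<root>" + xhtml + "</root>")));
        } catch (Exception e) {
            throw new IllegalArgumentException("fragment is not well-formed XML", e);
        }
    }

    // Post-order walk: children are pruned first, so a parent that held only
    // empty children is itself seen as empty and removed.
    static void prune(Element parent) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            if (child instanceof Element) {
                Element e = (Element) child;
                prune(e);
                boolean noText = e.getTextContent().trim().isEmpty();
                boolean noElements = e.getElementsByTagName("*").getLength() == 0;
                if (!VOID_TAGS.contains(e.getTagName()) && noText && noElements) {
                    parent.removeChild(e);
                }
            }
            child = next;
        }
    }

    public static void main(String[] args) {
        Document doc = parseFragment("<p></p><div><p></p></div><p>Hello<br/>world</p><p></p>");
        prune(doc.getDocumentElement());
        System.out.println(doc.getElementsByTagName("p").getLength()); // prints 1
    }
}
```

Unlike a hasText()-only check, this keeps an element such as <p><br/></p>, because it still contains a child element after pruning. Whether that is what you want depends on what counts as "empty" for your input.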