Jsoup.clean without adding HTML entities

There are already feature requests for this on the Jsoup website. You can extend the source code yourself by adding a new empty Map and a new escaping type that uses it. If you don't want to do that, you can unescape the cleaned output with StringEscapeUtils from Apache Commons Lang:

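import org.apache.commons.lang.StringEscapeUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.safety.Whitelist;

// Assumes Commons Lang 2.x; in Commons Lang 3 the method is unescapeHtml4.
public static String getTextOnlyFromHtmlText(String htmlText) {
    Document doc = Jsoup.parse(htmlText);
    doc.outputSettings().charset("UTF-8");
    // Keep only simple text formatting tags (b, em, i, strong, u).
    String cleaned = Jsoup.clean(doc.body().html(), Whitelist.simpleText());
    // Jsoup.clean escapes text as entities; turn them back into characters.
    return StringEscapeUtils.unescapeHtml(cleaned);
}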
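
If the goal is plain text with no tags at all, a shorter route may be to let Jsoup extract the text itself. The sketch below assumes that dropping every tag (including the b, i, and u tags that Whitelist.simpleText() would keep) is acceptable; the class and method names are made up for illustration.

```java
import org.jsoup.Jsoup;

public class PlainTextExtractor {
    // Hypothetical helper: returns the visible text of an HTML string.
    // Element.text() yields unescaped text, so entities such as &eacute;
    // come back as the characters they stand for.
    public static String toPlainText(String html) {
        return Jsoup.parse(html).text();
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<p>caf&eacute; &amp; <b>cr&egrave;me</b></p>"));
    }
}
```

Keep in mind that text() also normalizes whitespace, so line breaks in the input are not preserved.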

You can configure Jsoup's escaping mode: with EscapeMode.xhtml, the output contains no named entities apart from the few that XML itself requires (such as &lt; and &amp;).

Here's a complete snippet that takes str as input and cleans it using Whitelist.simpleText():

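import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.safety.Cleaner;
import org.jsoup.safety.Whitelist;

// Parse str into a Document
Document doc = Jsoup.parse(str);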

// Clean the document.
doc = new Cleaner(Whitelist.simpleText()).clean(doc);

// Adjust escape mode
doc.outputSettings().escapeMode(EscapeMode.xhtml);

// Get back the string of the body.
str = doc.body().html();
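
The same steps can probably be collapsed into a single call, since Jsoup.clean has an overload that accepts a Document.OutputSettings. This is a minimal sketch, assuming jsoup 1.14.2 or later, where Safelist replaced Whitelist (on older versions, substitute Whitelist); the class and method names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities.EscapeMode;
import org.jsoup.safety.Safelist;

public class EntityFreeCleaner {
    // Hypothetical helper: cleans an HTML fragment down to simple text
    // formatting and serializes it with the xhtml escape mode.
    public static String cleanSimpleText(String html) {
        Document.OutputSettings settings = new Document.OutputSettings()
                .escapeMode(EscapeMode.xhtml) // escape only what XML requires
                .charset("UTF-8")             // so non-ASCII characters can be written as-is
                .prettyPrint(false);          // do not add indentation or line breaks
        // The empty string is the base URI; it only matters for resolving relative links.
        return Jsoup.clean(html, "", Safelist.simpleText(), settings);
    }

    public static void main(String[] args) {
        System.out.println(cleanSimpleText("<p>caf&eacute; <b>cr&egrave;me</b></p>"));
    }
}
```

Passing the settings to Jsoup.clean keeps the parse, clean, and serialize steps in one place, at the cost of losing access to the intermediate Document.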