Android - How to get plain HTML using evaluateJavascript from Webview? JSOUP not able to parse the result HTML
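evaluateJavascript passes the result to the callback as a JSON-encoded value, so the HTML arrives as a quoted, escaped string. You should use JsonReader to decode it: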
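// JsonReader and JsonToken are assumed here to be the android.util classes;
// StringReader and IOException come from java.io.
webView.evaluateJavascript("(function() {return document.getElementsByTagName('html')[0].outerHTML;})();", new ValueCallback<String>() {
    @Override
    public void onReceiveValue(final String value) {
        // value is a JSON string literal: quoted, with \uXXXX and \" escapes
        JsonReader reader = new JsonReader(new StringReader(value));
        reader.setLenient(true); // accept a bare top-level value
        try {
            if (reader.peek() == JsonToken.STRING) {
                String domStr = reader.nextString(); // unescaped HTML
                if (domStr != null) {
                    handleResponseSuccessByBody(domStr);
                }
            }
        } catch (IOException e) {
            // handle exception
        } finally {
            // IoUtil is a helper from the author's project (not shown);
            // presumably it closes the reader quietly
            IoUtil.close(reader);
        }
    }
});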
To remove the Unicode escape sequences (\uXXXX), use this function:
public static StringBuffer removeUTFCharacters(String data) {
    Pattern p = Pattern.compile("\\\\u(\\p{XDigit}{4})");
    Matcher m = p.matcher(data);
    StringBuffer buf = new StringBuffer(data.length());
    while (m.find()) {
        String ch = String.valueOf((char) Integer.parseInt(m.group(1), 16));
        m.appendReplacement(buf, Matcher.quoteReplacement(ch));
    }
    m.appendTail(buf);
    return buf;
}
Then call it inside onReceiveValue(String html) like this:
@Override
public void onReceiveValue(String html) {
    String result = removeUTFCharacters(html).toString();
}
You will obtain a string with clean HTML.
Bye, Alex
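Try this:
v=StringEscapeUtils.unescapeJavaScript(v.substring(1,v.length()-1));
The substring call strips the surrounding double quotes. unescapeJavaScript is from Apache commons-lang (in commons-lang3 the equivalent method is named unescapeEcmaScript).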
So much string processing for an Android WebView, why...
The removeUTFCharacters method provided in the previous answer is not clean enough. Escaped characters such as \" still remain.
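If you want neither JsonReader nor commons-lang, the remaining escapes can be handled by hand. The following is only a minimal sketch, not a full JSON parser: the class name JsResultDecoder and the method decode are made up for this example, and it assumes the callback value is either the literal null, a non-string value, or a single double-quoted JSON string.

```java
final class JsResultDecoder {

    private JsResultDecoder() {}

    /**
     * Decodes the value passed to onReceiveValue.
     * Returns null for the literal "null" and returns any value that is
     * not a quoted string (number, boolean, object, array) unchanged.
     */
    static String decode(String value) {
        if (value == null || value.equals("null")) {
            return null;
        }
        int end = value.length() - 1; // index of the closing quote
        if (end < 1 || value.charAt(0) != '"' || value.charAt(end) != '"') {
            return value;
        }
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 1; i < end; i++) {
            char c = value.charAt(i);
            if (c != '\\' || i + 1 >= end) {
                out.append(c); // ordinary character (or a dangling backslash)
                continue;
            }
            char next = value.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 < end) {
                        // four hex digits follow the 'u'
                        String hex = value.substring(i + 1, i + 5);
                        out.append((char) Integer.parseInt(hex, 16));
                        i += 4;
                    } else {
                        out.append("\\u"); // truncated escape, keep as-is
                    }
                    break;
                default:
                    out.append(next); // covers \" \\ and \/
            }
        }
        return out.toString();
    }
}
```

Inside onReceiveValue you would call JsResultDecoder.decode(value) and pass the result to Jsoup.parse. Unlike removeUTFCharacters, this also strips the surrounding quotes and turns \" and \n back into the real characters.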