Is there a JDK class to do HTML encoding (but not URL encoding)?
A simple way seems to be this one:
/**
 * HTML-encodes a string intended for UTF-8 output, i.e. characters with a code above 127 are not encoded.
 * Use Apache Commons Text StringEscapeUtils if possible.
 *
 * <pre>
 * escapeHtml("\tIt's time to hack & fun\r<script>alert(\"PWNED\")</script>")
 *     .equals("&#9;It&#39;s time to hack &amp; fun&#13;&lt;script&gt;alert(&quot;PWNED&quot;)&lt;/script&gt;")
 * </pre>
 */
public static String escapeHtml(String rawHtml) {
    int rawHtmlLength = rawHtml.length();
    // add 30% for additional encodings
    int capacity = (int) (rawHtmlLength * 1.3);
    StringBuilder sb = new StringBuilder(capacity);
    for (int i = 0; i < rawHtmlLength; i++) {
        char ch = rawHtml.charAt(i);
        if (ch == '<') {
            sb.append("&lt;");
        } else if (ch == '>') {
            sb.append("&gt;");
        } else if (ch == '"') {
            sb.append("&quot;");
        } else if (ch == '&') {
            sb.append("&amp;");
        } else if (ch < ' ' || ch == '\'') {
            // non-printable ASCII control characters are escaped as numeric entities
            // HTML 4 has no &apos; entity, so the single quote ' becomes &#39;
            sb.append("&#").append((int) ch).append(';');
        } else {
            // any non-ASCII char (code above 127) is copied unchanged, so the output must be sent as UTF-8
            sb.append(ch);
        }
    }
    return sb.toString();
}
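To see the escaping in context, here is a minimal, self-contained sketch that restates the same rules with a switch (behaviour unchanged) and uses them to build an HTML attribute from an untrusted value. The class name HtmlEscapeDemo and the sample title value are hypothetical, chosen only for illustration.

```java
public class HtmlEscapeDemo {

    // Same escaping rules as escapeHtml above, written as a switch.
    static String escapeHtml(String raw) {
        StringBuilder sb = new StringBuilder((int) (raw.length() * 1.3));
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            switch (ch) {
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '&': sb.append("&amp;"); break;
                default:
                    if (ch < ' ' || ch == '\'') {
                        // control characters and ' become numeric entities
                        sb.append("&#").append((int) ch).append(';');
                    } else {
                        sb.append(ch); // non-ASCII is left as-is
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // hypothetical untrusted value that tries to break out of the attribute
        String title = "x\" onmouseover=\"alert(1)";
        String html = "<a title=\"" + escapeHtml(title) + "\">link</a>";
        System.out.println(html);
        // prints: <a title="x&quot; onmouseover=&quot;alert(1)">link</a>
    }
}
```

Because the double quote is escaped, the injected text stays inside the title attribute instead of starting a new onmouseover attribute.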
If you do need to escape all non-ASCII characters, e.g. because the encoded text will be transmitted over a 7-bit encoding, replace the last else with:
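} else if (ch > 127) {
    // encode non-ASCII characters as numeric entities; codePointAt combines
    // a surrogate pair so characters outside the BMP become a single entity
    int cp = rawHtml.codePointAt(i);
    sb.append("&#").append(cp).append(';');
    i += Character.charCount(cp) - 1; // skip the low surrogate if one was consumed
} else {
    sb.append(ch);
}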
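Put together, the 7-bit variant could look like the sketch below. It assumes iteration by code point (String.codePointAt and Character.charCount), so a character such as an emoji, which is a surrogate pair in a Java String, becomes one numeric entity rather than two invalid ones. The class and method names are hypothetical.

```java
public class AsciiHtmlEscape {

    // Hypothetical helper: like escapeHtml, but also turns every non-ASCII
    // code point into a decimal numeric entity so the result is pure 7-bit ASCII.
    static String escapeHtmlAscii(String raw) {
        StringBuilder sb = new StringBuilder((int) (raw.length() * 1.3));
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);   // combines a surrogate pair into one code point
            i += Character.charCount(cp);  // advance by 1 or 2 chars
            if (cp == '<') {
                sb.append("&lt;");
            } else if (cp == '>') {
                sb.append("&gt;");
            } else if (cp == '"') {
                sb.append("&quot;");
            } else if (cp == '&') {
                sb.append("&amp;");
            } else if (cp < ' ' || cp == '\'' || cp > 127) {
                // control chars, the single quote and all non-ASCII become &#N;
                sb.append("&#").append(cp).append(';');
            } else {
                sb.append((char) cp);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // e-acute is U+00E9 (233); U+1F600 (128512) is a surrogate pair in UTF-16
        System.out.println(escapeHtmlAscii("caf\u00e9 \uD83D\uDE00"));
        // prints: caf&#233; &#128512;
    }
}
```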
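There isn't a built-in JDK class to do this, but it is part of the Apache Commons Lang library (formerly Jakarta Commons Lang). Since Commons Lang 3.6 the class is deprecated in favour of the equivalent StringEscapeUtils in Apache Commons Text.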
import org.apache.commons.lang3.StringEscapeUtils;
String escaped3 = StringEscapeUtils.escapeHtml3(stringToEscape); // HTML 3.x entity set
String escaped4 = StringEscapeUtils.escapeHtml4(stringToEscape); // HTML 4.0 entity set
Check out the Javadoc for details.
Adding the dependency is usually as simple as putting the jar on the classpath (or declaring it in your build tool), and commons-lang has so many useful utilities that it is often worth having on board.