How to remove control characters from a Java string?
One option is to use a combination of CharMatchers:
CharMatcher charsToPreserve = CharMatcher.anyOf("\r\n\t");
CharMatcher allButPreserved = charsToPreserve.negate();
CharMatcher controlCharactersToRemove = CharMatcher.JAVA_ISO_CONTROL.and(allButPreserved);
Then use removeFrom as before. I don't know how efficient it is, but it's at least simple.
As noted in edits, JAVA_ISO_CONTROL is now deprecated in Guava; the javaIsoControl() method is preferred.
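If Guava is not on the classpath, roughly the same filter can be sketched with the JDK alone. This is only an illustration, on the assumption that Character.isISOControl is an acceptable stand-in for the Guava matcher; the class and method names (ControlCharStripper, stripControlChars) are made up for this example.

```java
final class ControlCharStripper {
    // Hypothetical helper: drops ISO control characters but keeps CR, LF and tab.
    static String stripControlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean preserved = c == '\r' || c == '\n' || c == '\t';
            // Character.isISOControl covers U+0000..U+001F and U+007F..U+009F.
            if (preserved || !Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = stripControlChars("a\u0000b\tc\u008fd\r\n");
        System.out.println(cleaned.equals("ab\tcd\r\n")); // prints true
    }
}
```

A single pass with a StringBuilder avoids regex overhead, which may matter if efficiency is a concern.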
You can do something like this if you want to delete all characters in the Unicode "Other, Control" (Cc) category:
System.out.println(
"a\u0000b\u0007c\u008fd".replaceAll("\\p{Cc}", "")
); // abcd
Note: This removes the actual Unicode character '\u008f' (among others) from the string, not the escaped text "%8F".
Courtesy: polygenelubricants (Replace Unicode Control Characters)
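One thing to keep in mind: tab, carriage return and line feed are themselves in the Cc category, so this pattern removes them too. A small sketch to show that, with a hypothetical wrapper (CcCategoryDemo.removeCc) around the same replaceAll call:

```java
final class CcCategoryDemo {
    // Hypothetical helper wrapping the replaceAll call with the Cc category.
    static String removeCc(String input) {
        return input.replaceAll("\\p{Cc}", "");
    }

    public static void main(String[] args) {
        System.out.println(removeCc("a\u0000b\u0007c\u008fd")); // abcd
        // Tab, CR and LF are control characters as well, so they are dropped.
        System.out.println(removeCc("a\tb\r\nc")); // abc
    }
}
```

If the whitespace controls need to survive, they have to be excluded from the character class explicitly.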
This seems to be an option:
String s = "\u0001\t\r\n".replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
for (char c : s.toCharArray()) {
System.out.print((int) c + " ");
}
This prints 9 13 10, just as you asked: "except carriage returns, line feeds, and tabs".
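If the replacement runs many times, the pattern can be compiled once and reused. Also worth knowing: by default \p{Cntrl} is the POSIX class and only covers ASCII controls, so C1 controls such as U+008F pass through, while \p{Cc} catches those too. A rough sketch comparing the two; the class, constant and method names here are invented for illustration:

```java
import java.util.regex.Pattern;

final class ControlCharPatterns {
    // POSIX class: ASCII controls only (U+0000..U+001F and U+007F), minus CR, LF, tab.
    static final Pattern ASCII_CONTROLS = Pattern.compile("[\\p{Cntrl}&&[^\r\n\t]]");
    // Unicode general category Cc: also covers U+0080..U+009F, minus CR, LF, tab.
    static final Pattern ALL_CONTROLS = Pattern.compile("[\\p{Cc}&&[^\r\n\t]]");

    // Hypothetical helper: removes every match of the given pattern.
    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "a\u0001b\u008fc\t\r\n";
        // The ASCII-only class leaves U+008F in place, so 7 chars remain.
        System.out.println(strip(ASCII_CONTROLS, input).length()); // 7
        // The Cc variant removes it as well, leaving 6 chars.
        System.out.println(strip(ALL_CONTROLS, input).length()); // 6
    }
}
```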