How to remove control characters from a Java string?

One option is to use a combination of CharMatchers:

CharMatcher charsToPreserve = CharMatcher.anyOf("\r\n\t");
CharMatcher allButPreserved = charsToPreserve.negate();
CharMatcher controlCharactersToRemove = CharMatcher.JAVA_ISO_CONTROL.and(allButPreserved);

Then call removeFrom on controlCharactersToRemove as before. I don't know how efficient it is, but it's at least simple.


Note that JAVA_ISO_CONTROL is now deprecated in Guava; the CharMatcher.javaIsoControl() method is preferred.
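
For comparison, here is a rough sketch of the same filtering without Guava. It assumes the matcher follows Character.isISOControl, which is how the Guava documentation describes it. The class and method names (IsoControlFilter, stripControlChars) are made up for illustration.

```java
final class IsoControlFilter {

    /** Removes ISO control characters, but keeps CR, LF, and tab. */
    static String stripControlChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Characters we want to keep even though they are control characters.
            boolean preserved = c == '\r' || c == '\n' || c == '\t';
            if (preserved || !Character.isISOControl(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String cleaned = stripControlChars("a\u0000b\tc\u008fd\r\n");
        // \u0000 and \u008f are dropped; the tab, CR, and LF remain.
        System.out.println(cleaned.equals("ab\tcd\r\n")); // true
    }
}
```

This is a single pass over the string with no regex involved, so it may be a reasonable fallback if the Guava dependency is not wanted.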


You can do something like this if you want to delete all characters in the Unicode "Other, Control" (Cc) category:

System.out.println(
    "a\u0000b\u0007c\u008fd".replaceAll("\\p{Cc}", "")
); // abcd

Note: this removes (among others) the actual '\u008f' Unicode character from the string, not the literal escaped text "%8F".

Courtesy: polygenelubricants (Replace Unicode Control Characters)
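
One thing worth keeping in mind with \p{Cc} is that tabs, carriage returns, and line feeds belong to that category too, so they are removed along with everything else. A small sketch of that behaviour, using a precompiled Pattern (the class and method names here are illustrative only):

```java
import java.util.regex.Pattern;

final class CcCategoryDemo {

    // \p{Cc} is the Unicode "Control" general category:
    // U+0000 to U+001F and U+007F to U+009F.
    private static final Pattern CONTROL = Pattern.compile("\\p{Cc}");

    /** Removes every character in the Cc category, including tab, CR, and LF. */
    static String stripAllControl(String input) {
        return CONTROL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String cleaned = stripAllControl("a\tb\r\nc\u008fd");
        System.out.println(cleaned); // abcd
    }
}
```

If line breaks or tabs need to survive, this pattern on its own is too aggressive.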


This seems to be an option:

    String s = "\u0001\t\r\n".replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
    for (char c : s.toCharArray()) {
        System.out.print((int) c + " ");
    }

This prints 9 13 10, so only the \u0001 is removed, just like you said: "except carriage returns, line feeds, and tabs".
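
As far as I can tell from the java.util.regex.Pattern documentation, \p{Cntrl} is a POSIX class that by default covers only the ASCII control characters (\x00-\x1F and \x7F), whereas \p{Cc} also covers the C1 range (\u0080-\u009F). The sketch below applies the same intersection trick to both classes to show the difference; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class PreserveWhitespaceDemo {

    // ASCII control characters only, minus CR, LF, and tab.
    private static final Pattern ASCII_CONTROL =
            Pattern.compile("[\\p{Cntrl}&&[^\r\n\t]]");

    // All Unicode Cc characters (including U+0080 to U+009F), minus CR, LF, and tab.
    private static final Pattern UNICODE_CONTROL =
            Pattern.compile("[\\p{Cc}&&[^\r\n\t]]");

    static String stripAscii(String input) {
        return ASCII_CONTROL.matcher(input).replaceAll("");
    }

    static String stripUnicode(String input) {
        return UNICODE_CONTROL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "x\u0001\t\u008fy\r\n";
        // \p{Cntrl} leaves \u008f in place; \p{Cc} removes it.
        System.out.println(stripAscii(input).equals("x\t\u008fy\r\n"));  // true
        System.out.println(stripUnicode(input).equals("x\ty\r\n"));      // true
    }
}
```

Which of the two is appropriate depends on whether the input can contain C1 control characters such as '\u008f'.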