Java - removing strange characters from a String
Justin Thomas's answer was close, but this is probably closer to what you're looking for:
String nonStrange = strangeString.replaceAll("\\p{Cntrl}", "");
The character class \p{Cntrl} matches "A control character: [\x00-\x1F\x7F]." Note that this includes tab, line feed and carriage return.
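If you want to keep ordinary whitespace, a character-class intersection is one way to do it. This is only a sketch: the class and method names are made up for illustration, and which characters to keep is an assumption you should adjust.

```java
public class ControlChars {
    // Removes control characters but keeps tab, line feed and carriage return.
    // The "&&[^...]" part subtracts the whitespace we want to preserve.
    static String stripControlsKeepWhitespace(String s) {
        return s.replaceAll("[\\p{Cntrl}&&[^\\t\\n\\r]]", "");
    }

    public static void main(String[] args) {
        // \u0007 (BEL) is removed; the tab and the newline survive.
        String cleaned = stripControlsKeepWhitespace("a\tb\u0007c\n");
        System.out.println(cleaned.equals("a\tbc\n")); // prints true
    }
}
```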
To delete non-ASCII characters from a string (everything outside the basic Latin range \x00-\x7F), I use the following code:
String s = "小米体验版 latin string 01234567890";
s = s.replaceAll("[^\\x00-\\x7F]", "");
The output string will be: " latin string 01234567890"
You can use String.replaceAll("[my-list-of-strange-and-unwanted-chars]", ""). There is no Character.isStrangeAndUnWanted(), so you have to define what you want.
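Often it is easier to list what you want to keep than what you want to drop. Here is a rough sketch of that idea; the allowed set (letters, digits, whitespace and a little punctuation) and the names UnwantedChars and keepAllowed are just assumptions for the example, not a recommendation.

```java
public class UnwantedChars {
    // Hypothetical rule: keep Unicode letters (\p{L}), digits (\p{N}),
    // whitespace and a few punctuation marks; remove everything else.
    static String keepAllowed(String input) {
        return input.replaceAll("[^\\p{L}\\p{N}\\s.,!?-]", "");
    }

    public static void main(String[] args) {
        // The NUL character and the heart symbol (U+2665) are dropped.
        System.out.println(keepAllowed("h\u0000i\u2665 there!")); // prints hi there!
    }
}
```

Because the rule uses \p{L} rather than a-z, non-Latin letters such as 小米 are kept; swap in an ASCII-only range if that is not what you want.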
If you want to remove control characters you can do
String str = "\u0000\u001f hi \n";
str = str.replaceAll("[\u0000-\u001f]", "");
This leaves " hi " (the spaces are kept). Note that this range does not cover DEL (\u007f), which \p{Cntrl} does.
EDIT: If you want to know the Unicode value of any 16-bit character (a Java char), you can do
int num = string.charAt(n);
System.out.println(num);
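charAt only gives you one 16-bit unit, so characters outside the Basic Multilingual Plane show up as two surrogate values. If that matters, something like the following sketch lists whole code points instead (the class and method names are made up for the example):

```java
public class CodePoints {
    // Formats every code point of the string as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // "\uD83D\uDE00" is a surrogate pair encoding the single code point U+1F600.
        System.out.println(describe("a\uD83D\uDE00")); // prints U+0061 U+1F600
    }
}
```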
A black diamond with a question mark is not the character that was originally in your data. It is a placeholder: the Unicode replacement character, U+FFFD: �. It usually appears when bytes could not be decoded in the charset being used, so the original character has already been lost by the time you see it. (A glyph that is simply missing from the font is more often drawn as an empty box.) Its appearance varies depending on the font you're using.
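To see where U+FFFD typically comes from, and how to strip it once it is in a String, here is a small sketch. The class and method names are hypothetical, and the ISO-8859-1 versus UTF-8 mix-up is just one assumed cause. Removing the placeholder does not bring the lost character back, so fixing the decoding charset is usually the better cure.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementChar {
    // Encodes text as ISO-8859-1 and then (wrongly) decodes it as UTF-8,
    // which is a common way to end up with U+FFFD in a String.
    static String misdecode(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Removes every replacement character from the string.
    static String stripReplacement(String s) {
        return s.replace("\uFFFD", "");
    }

    public static void main(String[] args) {
        String broken = misdecode("caf\u00e9");       // the byte 0xE9 is not valid UTF-8 here
        System.out.println(broken.contains("\uFFFD")); // prints true
        System.out.println(stripReplacement(broken));  // prints caf
    }
}
```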
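You can use java.text.Normalizer to decompose accented characters into a base letter plus combining marks, and then remove everything that is not in the "normal" ASCII character set. Normalizer on its own does not remove anything; it only makes the accents separable.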
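A minimal sketch of that approach, assuming NFD decomposition followed by the same non-ASCII filter used earlier in this thread (AsciiFold and toAscii are names I made up):

```java
import java.text.Normalizer;

public class AsciiFold {
    // Decomposes accented letters (e.g. é becomes e + U+0301) and then drops
    // every non-ASCII character, which removes the combining marks.
    static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // Escapes are used so the example does not depend on the source file encoding.
        System.out.println(toAscii("Cr\u00e8me Br\u00fbl\u00e9e")); // prints Creme Brulee
    }
}
```

Characters that have no ASCII base letter (for example 小米) are still removed entirely, so this only helps with accented Latin text.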