How to check if a String contains only ASCII?
You can do it with java.nio.charset.Charset.
import java.nio.charset.Charset;
public class StringUtils {
public static boolean isPureAscii(String v) {
return Charset.forName("US-ASCII").newEncoder().canEncode(v);
// or "ISO-8859-1" for ISO Latin 1
// or StandardCharsets.US_ASCII with JDK1.7+
}
public static void main (String args[])
throws Exception {
String test = "Réal";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
test = "Real";
System.out.println(test + " isPureAscii() : " + StringUtils.isPureAscii(test));
/*
* output :
* Réal isPureAscii() : false
* Real isPureAscii() : true
*/
}
}
Detect non-ASCII character in a String
From Guava 19.0 onward, you may use:
boolean isAscii = CharMatcher.ascii().matchesAllOf(someString);
This uses the matchesAllOf(someString)
method which relies on the factory method ascii()
rather than the now deprecated ASCII
singleton.
Here ASCII includes all ASCII characters including the non-printable characters lower than 0x20
(space) such as tabs, line-feed / return but also BEL
with code 0x07
and DEL
with code 0x7F
.
This code incorrectly uses characters rather than code points, even if code points are indicated in the comments of earlier versions. Fortunately, the characters required to create code point with a value of U+010000
or over uses two surrogate characters with a value outside of the ASCII range. So the method still succeeds in testing for ASCII, even for strings containing emoji's.
For earlier Guava versions without the ascii()
method you may write:
boolean isAscii = CharMatcher.ASCII.matchesAllOf(someString);
Here is another way not depending on a library but using a regex.
You can use this single line:
text.matches("\\A\\p{ASCII}*\\z")
Whole example program:
public class Main {
public static void main(String[] args) {
char nonAscii = 0x00FF;
String asciiText = "Hello";
String nonAsciiText = "Buy: " + nonAscii;
System.out.println(asciiText.matches("\\A\\p{ASCII}*\\z"));
System.out.println(nonAsciiText.matches("\\A\\p{ASCII}*\\z"));
}
}
Understanding the regex :
- li
\\A
: Beginning of input \\p{ASCII}
: Any ASCII character*
: all repetitions\\z
: End of input