Get char value in Java

Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.

So an 'é' is actually U+00E9 - in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
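To see the difference between the code point and its UTF-8 encoding, here is a minimal sketch. The class name Utf8Bytes and the helper utf8Hex are made up for illustration; only String.getBytes and StandardCharsets come from the JDK.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Hypothetical helper: returns the UTF-8 bytes of s as space-separated hex.
    static String utf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xff so negative bytes print as two hex digits.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println((int) '\u00e9');    // the code point: 233 (0xE9)
        System.out.println(utf8Hex("\u00e9")); // the UTF-8 bytes: c3 a9
    }
}
```

The \u00e9 escape is used instead of a literal 'é' so the result does not depend on the source file's encoding.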

Now to get the Unicode value - or to be more precise the UTF-16 value, as that's what Java uses internally - you just need to convert the value to an integer:

char c = '\u00e9'; // c is now e-acute
int i = c; // i is now 233

This produces the expected result:

int a = 'a';
System.out.println(a); // outputs 97

Likewise:

System.out.println((int)'é');

prints out 233.

Note that both forms work for every char value, not only ASCII: assigning a char to an int is an implicit widening conversion, and the cast merely spells it out. Multiplying the char by 1 gives the same result, because the char is promoted to int before the multiplication:

System.out.println(1 * 'é'); // prints 233

char is actually a 16-bit unsigned numeric type holding a UTF-16 code unit, so characters outside the BMP need two chars (a surrogate pair). In arithmetic a char is promoted to int, so you can use it almost anywhere an int is expected.
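The point about characters outside the BMP can be checked with a short sketch. U+1F600 is just an arbitrary supplementary character picked for illustration, and the class and constant names are invented here.

```java
public class SupplementaryDemo {
    // U+1F600 lies outside the BMP, so Java stores it as a surrogate pair.
    static final String GRIN = new String(Character.toChars(0x1F600));

    public static void main(String[] args) {
        System.out.println(GRIN.length());                            // 2 (chars)
        System.out.println(GRIN.codePointCount(0, GRIN.length()));    // 1 (code point)
        System.out.println(Integer.toHexString(GRIN.charAt(0)));      // d83d (high surrogate)
        System.out.println(Integer.toHexString(GRIN.charAt(1)));      // de00 (low surrogate)
        System.out.println(Integer.toHexString(GRIN.codePointAt(0))); // 1f600
    }
}
```

Casting a single char to int therefore yields only one half of the pair for such characters; codePointAt returns the full value.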

Character.getNumericValue() does something different: it tries to interpret the character as a digit, so for '7' it returns 7 rather than the character code 55.
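A small comparison may make the distinction clearer. The helper names codeOf and digitOf are hypothetical; they only wrap the conversion and the JDK call.

```java
public class NumericValueDemo {
    // Hypothetical helper: the UTF-16 code unit of c as an int.
    static int codeOf(char c) {
        return c; // implicit widening from char to int
    }

    // Hypothetical helper: the numeric value the character represents, or -1 if none.
    static int digitOf(char c) {
        return Character.getNumericValue(c);
    }

    public static void main(String[] args) {
        System.out.println(codeOf('7'));       // 55
        System.out.println(digitOf('7'));      // 7
        System.out.println(codeOf('a'));       // 97
        System.out.println(digitOf('a'));      // 10, letters a-z map to 10-35
        System.out.println(digitOf('\u00e9')); // -1, no numeric value
    }
}
```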

You can use the codePointAt(int index) method of java.lang.String to get the Unicode code point of a character. Here's an example:

"a".codePointAt(0) // 97
"é".codePointAt(0) // 233

If you want to avoid creating strings unnecessarily, the following works as well and can be used for char arrays:

Character.codePointAt(new char[] {'a'}, 0) // 97
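If you need the values for a whole string rather than a single index, one option (assuming Java 8 or later) is String.codePoints(). The class and helper names below are illustrative only.

```java
import java.util.Arrays;

public class CodePointsDemo {
    // Hypothetical helper: all code points of s, with surrogate pairs combined.
    static int[] codePointsOf(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(codePointsOf("a\u00e9"))); // [97, 233]
    }
}
```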

