get char value in java
Those "UTF-8" codes are no such thing. They're actually just Unicode values, as per the Unicode code charts.
So an 'é' is actually U+00E9 - in UTF-8 it would be represented by two bytes { 0xc3, 0xa9 }.
Now to get the Unicode value - or to be more precise the UTF-16 value, as that's what Java uses internally - you just need to convert the value to an integer:
char c = '\u00e9'; // c is now e-acute
int i = c; // i is now 233
This produces good result:
int a = 'a';
System.out.println(a); // outputs 97
Likewise:
System.out.println((int)'é');
prints out 233
.
Note that the first example only works for characters included in the standard and extended ASCII character sets. The second works with all Unicode characters. You can achieve the same result by multiplying the char by 1. System.out.println( 1 * 'é');
char
is actually a numeric type containing the unicode value (UTF-16, to be exact - you need two char
s to represent characters outside the BMP) of the character. You can do everything with it that you can do with an int
.
Character.getNumericValue()
tries to interpret the character as a digit.
You can use the codePointAt(int index) method of java.lang.String for that. Here's an example:
"a".codePointAt(0) --> 97
"é".codePointAt(0) --> 233
If you want to avoid creating strings unnecessarily, the following works as well and can be used for char arrays:
Character.codePointAt(new char[] {'a'},0)