Why does Apache Commons consider '१२३' numeric?
Because that CharSequence "contains only Unicode digits" (quoting the documentation you linked).
Every one of its characters returns true for Character.isDigit:
Some Unicode character ranges that contain digits:
- '\u0030' through '\u0039', ISO-LATIN-1 digits ('0' through '9')
- '\u0660' through '\u0669', Arabic-Indic digits
- '\u06F0' through '\u06F9', Extended Arabic-Indic digits
- '\u0966' through '\u096F', Devanagari digits
- '\uFF10' through '\uFF19', Fullwidth digits
Many other character ranges contain digits as well.
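To see this concretely, here is a minimal sketch of a per-character check built on Character.isDigit. The class and method names (UnicodeDigitCheck, containsOnlyUnicodeDigits) are hypothetical and this is not the library's source; returning false for null or empty input is an assumption of the sketch. The Devanagari string is written with \u escapes so the file compiles regardless of source encoding.

```java
public class UnicodeDigitCheck {

    // Hypothetical helper: true only when every char is a Unicode digit
    // according to Character.isDigit. Treating null/empty input as
    // non-numeric is an assumption made for this sketch.
    static boolean containsOnlyUnicodeDigits(CharSequence cs) {
        if (cs == null || cs.length() == 0) {
            return false;
        }
        for (int i = 0; i < cs.length(); i++) {
            if (!Character.isDigit(cs.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String devanagari = "\u0967\u0968\u0969"; // Devanagari digits one, two, three
        System.out.println(containsOnlyUnicodeDigits(devanagari)); // true
        System.out.println(containsOnlyUnicodeDigits("123"));      // true
        System.out.println(containsOnlyUnicodeDigits("12a"));      // false
    }
}
```

The check never compares against '0' through '9' directly, which is why digits from any script pass.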
१२३ are Devanagari digits:
- १ is DEVANAGARI DIGIT ONE, \u0967
- २ is DEVANAGARI DIGIT TWO, \u0968
- ३ is DEVANAGARI DIGIT THREE, \u0969
The string १२३ means the same as 123 in Nepali or in any other language written in the Devanagari script, such as Hindi or Marathi, and is therefore a number as far as Apache Commons is concerned.
You can use Character#getType to check the character's general category:
System.out.println(Character.DECIMAL_DIGIT_NUMBER == Character.getType('१'));
This prints true, which is evidence that '१' is a decimal digit.
Now let's examine the Unicode value of the '१' character:
System.out.println(Integer.toHexString('१'));
// 967
This value is in the range of Devanagari digits, \u0966 through \u096F.
Also try:
Character.UnicodeBlock block = Character.UnicodeBlock.of('१');
System.out.println(block.toString());
// DEVANAGARI
Devanagari is an abugida (alphasyllabary) script of India and Nepal.
In other words, "१२३" is "123" written with Devanagari digits instead of Basic Latin ones.
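The equivalence can be checked in code. The following is a sketch only: the class and method names (DevanagariToLatin, toBasicLatinDigits) are hypothetical, and it handles only characters in the Basic Multilingual Plane. It relies on Character.digit, which returns the decimal value of a digit from any script, and on Integer.parseInt, which uses the same mapping.

```java
public class DevanagariToLatin {

    // Hypothetical helper: replace each Unicode decimal digit with its
    // Basic Latin ('0'-'9') equivalent and leave other chars unchanged.
    static String toBasicLatinDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            int value = Character.digit(c, 10); // -1 when c is not a decimal digit
            sb.append(value >= 0 ? (char) ('0' + value) : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String devanagari = "\u0967\u0968\u0969"; // Devanagari digits one, two, three
        System.out.println(toBasicLatinDigits(devanagari)); // 123
        System.out.println(Integer.parseInt(devanagari));   // 123
    }
}
```

If you need to accept only '0' through '9', you would have to test for that range explicitly rather than rely on Character.isDigit.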
Reading:
- More details on the '१' character
- StringUtils#isNumeric implementation