Python isnumeric function works only on unicode
According to the Python 2 documentation, isnumeric() is only present on unicode objects:
The following methods are present only on unicode objects:
unicode.isnumeric()
Return True if there are only numeric characters in S, False otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH.
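Plain str objects offer the similarly named isdigit() method instead, although it is narrower and rejects fractions such as U+2155:
'1'.isdigit()  # True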
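As a rough illustration of the difference between the two methods, here is a minimal sketch. It assumes Python 3, where every str is a Unicode string, so both methods are available on the same object; the variable name one_fifth is only illustrative.

```python
import unicodedata

# Assumption: Python 3, where str is always Unicode.
one_fifth = "\u2155"  # VULGAR FRACTION ONE FIFTH

print(one_fifth.isnumeric())           # True: it has a Unicode numeric value
print(one_fifth.isdigit())             # False: a fraction is not a digit
print(unicodedata.numeric(one_fifth))  # 0.2
print("1".isnumeric(), "1".isdigit())  # True True
```

In Python 2 the same isnumeric() call would need a unicode literal such as u"\u2155".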
Often you will want to check if a string in Python is a number. This happens all the time, for example with user input, when fetching data from a database (which may return a string), or when reading a file containing numbers. Depending on what type of number you are expecting, you can use several methods, such as parsing the string, using a regex, or simply attempting to cast (convert) it to a number and seeing what happens. You will also often encounter non-ASCII characters, encoded in Unicode, which may or may not be numbers. For example, ๒ is 2 in Thai, whereas © is simply the copyright symbol and is obviously not a number.
link : http://pythoncentral.io/how-to-check-if-a-string-is-a-number-in-python-including-unicode/
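The approaches mentioned in the quote could look roughly like the sketch below. It assumes Python 3; the helper names parses_as_float and matches_decimal and the regex pattern are hypothetical choices for illustration, not taken from the linked article.

```python
import re


def parses_as_float(s):
    """Return True if float() accepts the string (the 'cast and see' approach)."""
    try:
        float(s)
        return True
    except ValueError:
        return False


# Hypothetical pattern: optional sign, digits, optional fractional part.
# In a str pattern, \d also matches non-ASCII decimal digits unless re.ASCII is set.
DECIMAL_RE = re.compile(r"[+-]?\d+(\.\d+)?")


def matches_decimal(s):
    """Return True if the whole string matches the decimal pattern."""
    return DECIMAL_RE.fullmatch(s) is not None


thai_two = "\u0e52"        # THAI DIGIT TWO
copyright_sign = "\u00a9"  # COPYRIGHT SIGN

for s in ["42", "3.14", "45a6", thai_two, copyright_sign]:
    print(repr(s), parses_as_float(s), matches_decimal(s), s.isnumeric())
```

Note that the three checks do not always agree: "3.14" parses as a float and matches the pattern, but isnumeric() returns False for it because the dot is not a numeric character.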
isnumeric() has extended support for different numeral systems in Unicode strings.
In the Americas and Europe, the Hindu-Arabic numeral system is used, which consists of the digits 0123456789.
The Unicode standard also calls the Hindu-Arabic numerals European digits.
There are other numeral systems available, such as:
- Roman numerals
- Ancient Greek numerals
- Tamil numerals
- Japanese numerals
- Chinese numerals
- Korean numerals
More information about numeral systems can be found here: wikiwand.com/en/Numerals_in_Unicode#/Numerals_by_script
Unicode subscripts, superscripts and fractions are also considered valid numerals by the isnumeric() function.
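To see how a few of these characters are classified, a small sketch like the following may help. It assumes Python 3 and uses the standard unicodedata module; the particular sample characters are my own picks for illustration.

```python
import unicodedata

# Assumption: Python 3. The samples are chosen for illustration only.
samples = [
    "5",       # DIGIT FIVE
    "\u0be7",  # TAMIL DIGIT ONE
    "\u00b2",  # SUPERSCRIPT TWO
    "\u2082",  # SUBSCRIPT TWO
    "\u00bd",  # VULGAR FRACTION ONE HALF
    "\u2167",  # ROMAN NUMERAL EIGHT
    "\u56db",  # CJK ideograph for the number four
]

for ch in samples:
    # Columns: name, isdecimal, isdigit, isnumeric, numeric value
    print(
        unicodedata.name(ch),
        ch.isdecimal(),
        ch.isdigit(),
        ch.isnumeric(),
        unicodedata.numeric(ch),
    )
```

All of the samples pass isnumeric(), but only the first two pass isdecimal(), and the fraction, the Roman numeral and the CJK ideograph fail isdigit().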
You can use the custom isnumeric() function below (Python 2) to check whether a string is a number written in plain, non-Unicode characters.
l = ['abc' + chr(255), 'abc', '123', '45a6', '78b', u"\u2155", '123.4', u'\u2161', u'\u2168']

def isnumeric(s):
    '''Returns True for all non-unicode numbers'''
    try:
        # Fails for invalid UTF-8 bytes and for unicode objects
        # that contain non-ASCII characters
        s = s.decode('utf-8')
    except UnicodeError:
        return False
    try:
        float(s)
        return True
    except ValueError:
        return False

for i in l:
    print i, 'isnumeric:', isnumeric(i)

print '--------------------'
# The built-in unicode.isnumeric() accepts Roman numerals
print u'\u2169', 'isnumeric', u'\u2169'.isnumeric()
print u'\u2165', 'isnumeric', u'\u2165'.isnumeric()
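For readers on Python 3, where str has no decode() method, a comparable check might look like the sketch below. The name is_ascii_number is hypothetical, and str.isascii() requires Python 3.7 or later; treat this as one possible counterpart of the helper above, not an exact port.

```python
def is_ascii_number(s):
    """Hypothetical Python 3 counterpart of the isnumeric() helper above."""
    # str.isascii() requires Python 3.7+; it rejects any non-ASCII character.
    if not s.isascii():
        return False
    try:
        float(s)
        return True
    except ValueError:
        return False


candidates = ["abc\xff", "abc", "123", "45a6", "78b", "\u2155", "123.4", "\u2161", "\u2168"]
for c in candidates:
    print(repr(c), "is_ascii_number:", is_ascii_number(c))
```

Because the check relies on float(), it also accepts strings such as "1e5", "inf" and "nan"; add a stricter test if those should be rejected.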