Python: Detect number separator symbols and parse into a float without locale
One approach:
import re
with open('numbers') as fhandle:
for line in fhandle:
line = line.strip()
separators = re.sub('[0-9]', '', line)
for sep in separators[:-1]:
line = line.replace(sep, '')
if separators:
line = line.replace(separators[-1], '.')
print(line)
On your sample input (comments removed), the output is:
1.0000
1.0000
10000.0000
10000.0000
1000000.0000
1000000.0000
1000000.0000
Update: Handling Unicode
As NeoZenith points out in the comments, with modern unicode fonts, the venerable regular expression [0-9]
is not reliable. Use the following instead:
import re
with open('numbers') as fhandle:
for line in fhandle:
line = line.strip()
separators = re.sub(r'\d', '', line, flags=re.U)
for sep in separators[:-1]:
line = line.replace(sep, '')
if separators:
line = line.replace(separators[-1], '.')
print(line)
Without the re.U
flag, \d
is equivalent to [0-9]
. With that flag, \d
matches whatever is classified as a decimal digit in the Unicode character properties database. Alternatively, for handling unusual digit characters, one may want to consider using unicode.translate
.