How can I convert Persian numerals in UTF-8 to European numerals in ASCII?
Since it's a fixed set of digits, you can map them by hand:
$ echo ۲۱ | LC_ALL=en_US.UTF-8 sed -e 'y/۰۱۲۳۴۵۶۷۸۹/0123456789/'
21
(or using tr, but not GNU tr yet, since it does not handle multibyte characters)
Setting your locale to en_US.UTF-8 (or, better, to the locale your character set belongs to) is required for sed to treat each digit as one character rather than as a sequence of bytes.
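The same fixed-table idea can be sketched in Python, where strings are sequences of code points, so no locale setting is involved. This is only an illustration: the name to_ascii_digits is made up here, while str.maketrans and str.translate are standard library. It assumes the input uses the Persian digits U+06F0 to U+06F9.

```python
# Hypothetical helper mirroring sed's y command: a fixed one-to-one table.
# The Persian digits occupy the ten consecutive code points U+06F0..U+06F9.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))
_TABLE = str.maketrans(PERSIAN_DIGITS, "0123456789")


def to_ascii_digits(text):
    """Replace Persian digits in text; every other character is left alone."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(to_ascii_digits("۲۱"))
```

As with the sed version, characters outside the table pass through unchanged.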
With perl:
$ echo "۲۱" |
perl -CS -MUnicode::UCD=num -MUnicode::Normalize -lne 'print num(NFKD($_))'
21
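A rough Python analogue of this database-driven approach, as a sketch only: instead of a hand-written table it asks the Unicode character database for each character's decimal value through unicodedata.decimal. The function name decimal_digits_to_ascii is hypothetical. Unlike the one-liner above, which treats the whole line as a single number, this works character by character and leaves non-digits in place.

```python
import unicodedata


def decimal_digits_to_ascii(text):
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        # decimal() returns the given default (None) for non-digit characters.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


if __name__ == "__main__":
    print(decimal_digits_to_ascii("۲۱"))
```

Because the lookup is not tied to one script, Arabic-Indic digits (U+0660 to U+0669) are converted as well. If the string is nothing but digits, Python 3's built-in int() also accepts them directly.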
For Python there is the unidecode library, which handles such conversions in general: https://pypi.python.org/pypi/Unidecode.
In Python 2:
>>> from unidecode import unidecode
>>> unidecode(u"۰۱۲۳۴۵۶۷۸۹")
'0123456789'
In Python 3:
>>> from unidecode import unidecode
>>> unidecode("۰۱۲۳۴۵۶۷۸۹")
'0123456789'
The SO thread at https://stackoverflow.com/q/8087381/2261442 might be related.
/edit:
As Wander Nauta pointed out in the comments, and as mentioned on the Unidecode page, there is also a shell version of unidecode (under /usr/local/bin/ if installed via pip):
$ echo '۰۱۲۳۴۵۶۷۸۹' | unidecode
0123456789
A pure bash version:
#!/bin/bash
number="$1"
number=${number//۱/1}
number=${number//۲/2}
number=${number//۳/3}
number=${number//۴/4}
number=${number//۵/5}
number=${number//۶/6}
number=${number//۷/7}
number=${number//۸/8}
number=${number//۹/9}
number=${number//۰/0}
echo "Result is $number"
I tested it on my Gentoo machine and it works:
./convert ۱۳۲
Result is 132
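The same thing done as a loop, given the list of characters to convert (in order from 0 to 9, so that each character's position is its value):
#!/bin/bash
# conv DIGITS STRING: replace the i-th character of DIGITS with the digit i.
conv() (
    LC_ALL=en_US.UTF-8   # so ${#1} and ${1:i:1} count characters, not bytes
    local n="$2"
    for ((i=0; i<${#1}; i++)); do
        n=${n//"${1:i:1}"/"$i"}
    done
    printf '%s\n' "$n"
)
conv "۰۱۲۳۴۵۶۷۸۹" "$1"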
And used as:
$ convert ۱۳۲
132
Another (rather overkill) way using grep:
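#!/bin/bash
# Split the argument into one character per line (needs a UTF-8 locale).
nums=$(echo "$1" | grep -o .)
# A plain string is enough here; the original declared an array by mistake.
result=""
# $nums is left unquoted on purpose so that it splits into single characters.
for i in $nums
do
    case $i in
        ۱) result+=1 ;;
        ۲) result+=2 ;;
        ۳) result+=3 ;;
        ۴) result+=4 ;;
        ۵) result+=5 ;;
        ۶) result+=6 ;;
        ۷) result+=7 ;;
        ۸) result+=8 ;;
        ۹) result+=9 ;;
        ۰) result+=0 ;;
    esac
done
echo "Result is $result"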