How can I convert Persian numerals in UTF-8 to European numerals in ASCII?

Since it's a fixed set of ten digits, you can map them by hand:

$ echo ۲۱ | LC_ALL=en_US.UTF-8 sed -e 'y/۰۱۲۳۴۵۶۷۸۹/0123456789/'
21

(tr would work the same way, but not GNU tr, which does not handle multibyte characters yet)

Setting the locale to en_US.UTF-8 (or, better, to the locale your character set belongs to) is required for sed to treat the multibyte sequences as single characters.

With perl:

$ echo "۲۱" |
  perl -CS -MUnicode::UCD=num -MUnicode::Normalize -lne 'print num(NFKD($_))'
21
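The same idea, looking up each character in the Unicode Character Database instead of listing the digits by hand, can be sketched in Python with the standard unicodedata module. The function name digits_to_ascii is only an illustration, not part of any library. This sketch replaces character by character, so leading zeros and surrounding text are kept.

```python
import unicodedata


def digits_to_ascii(text):
    """Replace every Unicode decimal digit in text with its ASCII digit.

    Hypothetical helper for illustration; characters that are not
    decimal digits are passed through unchanged.
    """
    out = []
    for ch in text:
        # unicodedata.decimal returns the given default (None) for non-digits.
        value = unicodedata.decimal(ch, None)
        out.append(ch if value is None else str(value))
    return "".join(out)


print(digits_to_ascii("۲۱"))  # prints 21
```

Because the lookup relies on the decimal property rather than a fixed table, it should also cover other scripts' digits, such as Arabic-Indic ones.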

For Python there is the unidecode library which handles such conversions in general: https://pypi.python.org/pypi/Unidecode.

In Python 2:

>>> from unidecode import unidecode
>>> unidecode(u"۰۱۲۳۴۵۶۷۸۹")
'0123456789'

In Python 3:

>>> from unidecode import unidecode
>>> unidecode("۰۱۲۳۴۵۶۷۸۹")
'0123456789'

The SO thread at https://stackoverflow.com/q/8087381/2261442 might be related.

Edit: As Wander Nauta pointed out in the comments, and as mentioned on the Unidecode page, there is also a command-line version of unidecode (under /usr/local/bin/ if installed via pip):

$ echo '۰۱۲۳۴۵۶۷۸۹' | unidecode
0123456789
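If installing Unidecode is not an option, the fixed mapping from the sed answer can probably be reproduced with the Python 3 standard library alone. This is a minimal sketch; PERSIAN_DIGITS, TO_ASCII and persian_to_ascii are made-up names, not part of Unidecode.

```python
# Hypothetical names for illustration; only str.maketrans and
# str.translate come from the standard library.
PERSIAN_DIGITS = "۰۱۲۳۴۵۶۷۸۹"

# Two equal-length strings: the i-th source character maps to the
# i-th target character, much like sed's y/// command.
TO_ASCII = str.maketrans(PERSIAN_DIGITS, "0123456789")


def persian_to_ascii(text):
    """Return text with Persian digits replaced by ASCII digits."""
    return text.translate(TO_ASCII)


print(persian_to_ascii("۲۱"))  # prints 21
```

Unlike Unidecode, this leaves every other non-ASCII character untouched, which may or may not be what you want.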

A pure bash version:

#!/bin/bash

number="$1"

number=${number//۱/1}
number=${number//۲/2}
number=${number//۳/3}
number=${number//۴/4}
number=${number//۵/5}
number=${number//۶/6}
number=${number//۷/7}
number=${number//۸/8}
number=${number//۹/9}
number=${number//۰/0}

echo "Result is $number"

Tested on my Gentoo machine, and it works:

$ ./convert ۱۳۲
Result is 132

The same thing done as a loop, given the list of digit characters to convert, ordered from 0 to 9:

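#!/bin/bash
# conv DIGITS STRING: replace the i-th character of DIGITS with the digit i.
# The body runs in a subshell, so the locale change does not leak out.
conv() (
    # UTF-8 locale so that ${#1} and ${1:i:1} count characters, not bytes.
    LC_ALL=en_US.UTF-8
    local n="$2"
    for ((i = 0; i < ${#1}; i++)); do
        n=${n//"${1:i:1}"/"$i"}
    done
    printf '%s\n' "$n"
)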

conv "۰۱۲۳۴۵۶۷۸۹" "$1"

And used as:

$ convert ۱۳۲
132
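For comparison, the index-based loop above might look like this in Python 3. It is only a sketch: conv mirrors the shell function's name and argument order, and the length check is an added assumption rather than something the shell version does.

```python
def conv(digits, text):
    """Replace each character of digits with its position (0-9) in text."""
    # Assumption for this sketch: exactly ten digit characters, 0 first.
    if len(digits) != 10:
        raise ValueError("expected exactly ten digit characters")
    # Map each character's code point to the ASCII digit for its index.
    table = {ord(ch): str(i) for i, ch in enumerate(digits)}
    return text.translate(table)


print(conv("۰۱۲۳۴۵۶۷۸۹", "۱۳۲"))  # prints 132
```

Passing a different ten-character string as the first argument should convert another script's digits without changing the function.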

Another (rather overkill) way using grep:

#!/bin/bash

nums=$(echo "$1" | grep -o .)
result=""   # plain string; each += below appends one digit to it

for i in $nums
do
    case $i in
        ۱)
            result+=1
            ;;
        ۲)
            result+=2
            ;;
        ۳)
            result+=3
            ;;
        ۴)
            result+=4
            ;;
        ۵)
            result+=5
            ;;
        ۶)
            result+=6
            ;;
        ۷)
            result+=7
            ;;
        ۸)
            result+=8
            ;;
        ۹)
            result+=9
            ;;
        ۰)
            result+=0
            ;;
    esac
done
echo "Result is $result"

Tags: Unicode, Bash, Conversion