How to convert unicode accented characters to pure ascii without accents?

I needed something like this but to remove only accented characters, ignoring special ones and I did this small function:

# ~*~ coding: utf-8 ~*~
import re

def remove_accents(string):
    if type(string) is not unicode:
        string = unicode(string, encoding='utf-8')

    string = re.sub(u"[àáâãäå]", 'a', string)
    string = re.sub(u"[èéêë]", 'e', string)
    string = re.sub(u"[ìíîï]", 'i', string)
    string = re.sub(u"[òóôõö]", 'o', string)
    string = re.sub(u"[ùúûü]", 'u', string)
    string = re.sub(u"[ýÿ]", 'y', string)

    return string

I like that function because you can customize it in case you need to ignore other characters


how do i convert all those escape characters into their respective characters like if there is an unicode à, how do i convert that into a standard a?

Assume you have loaded your unicode into a variable called my_unicode... normalizing à into a is this simple...

import unicodedata
output = unicodedata.normalize('NFD', my_unicode).encode('ascii', 'ignore')

Explicit example...

>>> myfoo = u'àà'
>>> myfoo
u'\xe0\xe0'
>>> unicodedata.normalize('NFD', myfoo).encode('ascii', 'ignore')
'aa'
>>>

How it works
unicodedata.normalize('NFD', "insert-unicode-text-here") performs a Canonical Decomposition (NFD) of the unicode text; then we use str.encode('ascii', 'ignore') to transform the NFD mapped characters into ascii (ignoring errors).


@Mike Pennington's solution works great thanks to him. but when I tried that solution I notice that it fails some special characters (i.e. ı character from Turkish alphabet) which has not defined at NFD.

I discovered another solution which you can use unidecode library to this conversion.

>>>import unidecode
>>>example = "ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZabcçdefgğhıijklmnoöprsştuüvyz"


#convert it to utf-8
>>>utf8text = unicode(example, "utf-8")

>>> print utf8text
ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZabcçdefgğhıijklmnoöprsştuüvyz

#convert utf-8 to ascii text
asciitext = unidecode.unidecode(utf8text)

>>>print asciitext

ABCCDEFGGHIIJKLMNOOPRSSTUUVYZabccdefgghiijklmnooprsstuuvyz