use string.translate in Python to transliterate Cyrillic?
translate behaves differently when used with unicode strings. Instead of a maketrans
table, you have to provide a dictionary ord(search)->ord(replace)
:
symbols = (u"абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ",
u"abvgdeejzijklmnoprstufhzcss_y_euaABVGDEEJZIJKLMNOPRSTUFHZCSS_Y_EUA")
tr = {ord(a):ord(b) for a, b in zip(*symbols)}
# for Python 2.*:
# tr = dict( [ (ord(a), ord(b)) for (a, b) in zip(*symbols) ] )
text = u'Добрый Ден'
print text.translate(tr) # looks good
That said, I'd second the suggestion not to reinvent the wheel and to use an established library: http://pypi.python.org/pypi/Unidecode
You can use transliterate package (https://pypi.python.org/pypi/transliterate)
Example #1:
from transliterate import translit
print translit("Lorem ipsum dolor sit amet", "ru")
# Лорем ипсум долор сит амет
Example #2:
print translit(u"Лорем ипсум долор сит амет", "ru", reversed=True)
# Lorem ipsum dolor sit amet
Check out the CyrTranslit package, it's specifically made to transliterate from and to Cyrillic script text. It currently supports Serbian, Montenegrin, Macedonian, and Russian.
Example usage:
>>> import cyrtranslit
>>> cyrtranslit.supported()
['me', 'sr', 'mk', 'ru']
>>> cyrtranslit.to_latin('Моё судно на воздушной подушке полно угрей', 'ru')
'Moyo sudno na vozdushnoj podushke polno ugrej'
>>> cyrtranslit.to_cyrillic('Moyo sudno na vozdushnoj podushke polno ugrej')
'Моё судно на воздушной подушке полно угрей'