Python - replace unicode emojis with ASCII characters
With the tip about unicodedata.name
and some further research I managed to put this thing together:
import unicodedata
from unidecode import unidecode
def deEmojify(inputString):
returnString = ""
for character in inputString:
try:
character.encode("ascii")
returnString += character
except UnicodeEncodeError:
replaced = unidecode(str(character))
if replaced != '':
returnString += replaced
else:
try:
returnString += "[" + unicodedata.name(character) + "]"
except ValueError:
returnString += "[x]"
return returnString
Basically it first tries to find the most appropriate ascii representation, if that fails it tries using the unicode name, and if even that fails it simply replaces it with some simple marker.
For example Taking this string:
abcdšeđfčgžhÅiØjÆk 可爱!!!!!!!!
And running the function:
string = u'abcdšeđfčgžhÅiØjÆk \u53ef\u7231!!!!!!!!\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f60d\U0001f61d'
print(deEmojify(string))
Will produce the following result:
abcdsedfcgzhAiOjAEk[x] Ke Ai !!!!!!!![SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][SMILING FACE WITH HEART-SHAPED EYES][FACE WITH STUCK-OUT TONGUE AND TIGHTLY-CLOSED EYES]
Try this
import unicodedata
print( unicodedata.name(u'\U0001f60d'))
result is
SMILING FACE WITH HEART-SHAPED EYES