Python: Getting rid of \u200b from a string using regular expressions

I tested this with Python 2.7. replace works as expected:

>>> u'used\u200b'.replace(u'\u200b', '*')
u'used*'

and so does strip, which removes the character only from the ends of the string:

>>> u'used\u200b'.strip(u'\u200b')
u'used'

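Just remember that in Python 2 the arguments to those methods have to be Unicode literals: u'\u200b', not '\u200b'. Notice the u prefix. Without it, Python 2 treats '\u200b' as a plain byte string in which \u is not an escape sequence, so it never matches the zero-width space. In Python 3 every string literal is already Unicode, so the prefix is optional.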
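
Since the question asks about regular expressions, here is a minimal sketch of the same removal with re.sub. It assumes the pattern is written as a Unicode literal, for the same reason as above; the names ZWSP_PATTERN and remove_zwsp are made up for illustration and are not from the original answer.

```python
import re

# Zero-width space (U+200B). The u prefix keeps this working on Python 2 as well.
ZWSP_PATTERN = re.compile(u'\u200b')


def remove_zwsp(text):
    """Return text with every zero-width space removed (hypothetical helper)."""
    return ZWSP_PATTERN.sub(u'', text)


# Unlike strip, this also removes occurrences in the middle of the string.
cleaned = remove_zwsp(u'us\u200bed\u200b')
print(repr(cleaned))
```

For a single fixed character, replace is usually simpler; a regex mainly pays off when several invisible characters should be removed in one pass, for example with a character class.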

And actually, writing that character to a file works just fine when the file is opened with an explicit encoding:

>>> import codecs
>>> f = codecs.open('a.txt', encoding='utf-8', mode='w')
>>> f.write(u'used\u200bZero')
>>> f.close()
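
To check that the character really survives the round trip, one option is to read the file back and compare. The sketch below is an assumption-laden variant of the snippet above: it uses io.open instead of codecs.open (both accept an encoding argument) and writes to a temporary directory rather than a.txt, so it does not touch the current directory.

```python
import io
import os
import tempfile

# Write to a throwaway location instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'a.txt')

# The with block closes the file, which flushes the data to disk.
with io.open(path, mode='w', encoding='utf-8') as f:
    f.write(u'used\u200bZero')

# Read it back with the same encoding and check the character is still there.
with io.open(path, mode='r', encoding='utf-8') as f:
    round_tripped = f.read()

print(u'\u200b' in round_tripped)  # prints True
```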

See resources:

The Python 2 Unicode HOWTO
The Python 3 Unicode HOWTO
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
