How to export a DataFrame to HTML with UTF-8 encoding?
Your problem is elsewhere in your code. Your sample has a Unicode string that was mis-decoded as latin1, Windows-1252, or similar: it contains UTF-8 byte sequences stored as individual characters. Here I undo the bad decoding and decode again as UTF-8, but you'll want to find where the wrong decode is being performed:
>>> s = u'Rue du Gu\xc3\xa9, 78120 Sonchamp'
>>> s.encode('latin1').decode('utf8')
u'Rue du Gu\xe9, 78120 Sonchamp'
>>> print(s.encode('latin1').decode('utf8'))
Rue du Gué, 78120 Sonchamp
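For anyone on Python 3, the same repair could be sketched roughly like this. `fix_mojibake` is a made-up helper name, not a library function, and it assumes the text really was UTF-8 that got decoded with a single-byte codec such as latin1. If the round trip fails, the string is returned unchanged.

```python
def fix_mojibake(text, wrong_encoding="latin1"):
    """Undo a wrong decode: re-encode with the wrong codec, then decode as UTF-8."""
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The round trip failed, so this is not mojibake of this kind; leave it alone.
        return text


# Same sample string as above: the UTF-8 bytes C3 A9 stored as two characters.
broken = "Rue du Gu\xc3\xa9, 78120 Sonchamp"
fixed = fix_mojibake(broken)
print(fixed)
```

This only papers over the symptom. The real fix is still to find the place where the data is decoded with the wrong codec.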
This is what worked for me:
html = df.to_html()
with open("dataframe.html", "w", encoding="utf-8") as file:
    # Declare the charset first so browsers don't have to guess it
    file.write('<meta charset="UTF-8">\n')
    file.write(html)
The issue is actually in using df.to_html("mypage.html") to save the HTML to a file directly. If you write the file yourself instead, you choose the encoding and avoid this encoding bug with pandas.
html = df.to_html()
with open("mypage.html", "w", encoding="utf-8") as file:
    file.write(html)
You may also need to declare the character set in the head of the HTML for it to show up properly in certain browsers (HTML5 expects documents to be UTF-8, but a browser may still guess a legacy encoding when no charset is declared):
<meta charset="UTF-8">
This was the only method that worked for me out of the several I've seen.
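If you want to check that the file really ends up as UTF-8, something like the following might help. It is only a sketch: `write_html_utf8` is a hypothetical helper name, the `html` string stands in for whatever `df.to_html()` returns, and the file goes to the system temp directory just to keep the example self-contained.

```python
import tempfile
from pathlib import Path


def write_html_utf8(html, path):
    """Write an HTML string to `path` as UTF-8, preceded by a charset declaration."""
    with open(path, "w", encoding="utf-8") as file:
        file.write('<meta charset="UTF-8">\n')
        file.write(html)


# Stand-in for the string returned by df.to_html()
html = "<table><tr><td>Rue du Gué, 78120 Sonchamp</td></tr></table>"

path = Path(tempfile.gettempdir()) / "mypage.html"
write_html_utf8(html, path)

# Read the raw bytes back: é should be stored as the two UTF-8 bytes C3 A9
raw = path.read_bytes()
print(b"Gu\xc3\xa9" in raw)
```

With a real DataFrame the call would be `write_html_utf8(df.to_html(), "mypage.html")`.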