Is it possible to force Excel recognize UTF-8 CSV files automatically?

It is incredible that there are so many answers but none answers the question:

"When I was asking this question, I asked for a way of opening a UTF-8 CSV file in Excel without any problems for a user,..."

The answer marked as the accepted answer with 200+ up-votes is useless for me because I don't want to give my users a manual how to configure Excel. Apart from that: this manual will apply to one Excel version but other Excel versions have different menus and configuration dialogs. You would need a manual for each Excel version.

So the question is how to make Excel show UTF8 data with a simple double click?

Well at least in Excel 2007 this is not possible if you use CSV files because the UTF8 BOM is ignored and you will see only garbage. This is already part of the question of Lyubomyr Shaydariv:

"I also tried specifying UTF-8 BOM EF BB BF, but Excel ignores that."

I make the same experience: Writing russian or greek data into a UTF8 CSV file with BOM results in garbage in Excel:

Content of UTF8 CSV file:

Colum1;Column2
Val1;Val2
Авиабилет;Tλληνικ

Result in Excel 2007:

CSV UTF8 Excel

A solution is to not use CSV at all. This format is implemented so stupidly by Microsoft that it depends on the region settings in control panel if comma or semicolon is used as separator. So the same CSV file may open correctly on one computer but on anther computer not. "CSV" means "Comma Separated Values" but for example on a german Windows by default semicolon must be used as separator while comma does not work. (Here it should be named SSV = Semicolon Separated Values) CSV files cannot be interchanged between different language versions of Windows. This is an additional problem to the UTF-8 problem.

Excel exists since decades. It is a shame that Microsoft was not able to implement such a basic thing as CSV import in all these years.


However, if you put the same values into a HTML file and save that file as UTF8 file with BOM with the file extension XLS you will get the correct result.

Content of UTF8 XLS file:

<table>
<tr><td>Colum1</td><td>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
</table>

Result in Excel 2007:

UTF8 HTML Excel

You can even use colors in HTML which Excel will show correctly.

<style>
.Head { background-color:gray; color:white; }
.Red  { color:red; }
</style>
<table border=1>
<tr><td class=Head>Colum1</td><td class=Head>Column2</td></tr>
<tr><td>Val1</td><td>Val2</td></tr>
<tr><td class=Red>Авиабилет</td><td class=Red>Tλληνικ</td></tr>
</table>

Result in Excel 2007:

UTF8 HTML Excel

In this case only the table itself has a black border and lines. If you want ALL cells to display gridlines this is also possible in HTML:

<html xmlns:x="urn:schemas-microsoft-com:office:excel">
    <head>
        <meta http-equiv="content-type" content="text/plain; charset=UTF-8"/>
        <xml>
            <x:ExcelWorkbook>
                <x:ExcelWorksheets>
                    <x:ExcelWorksheet>
                        <x:Name>MySuperSheet</x:Name>
                        <x:WorksheetOptions>
                            <x:DisplayGridlines/>
                        </x:WorksheetOptions>
                    </x:ExcelWorksheet>
                </x:ExcelWorksheets>
            </x:ExcelWorkbook>
        </xml>
    </head>
    <body>
        <table>
            <tr><td>Colum1</td><td>Column2</td></tr>
            <tr><td>Val1</td><td>Val2</td></tr>
            <tr><td>Авиабилет</td><td>Tλληνικ</td></tr>
        </table>
    </body>
</html>

This code even allows to specify the name of the worksheet (here "MySuperSheet")

Result in Excel 2007:

enter image description here


Alex is correct, but as you have to export to csv, you can give the users this advice when opening the csv files:

  1. Save the exported file as a csv
  2. Open Excel
  3. Import the data using Data-->Import External Data --> Import Data
  4. Select the file type of "csv" and browse to your file
  5. In the import wizard change the File_Origin to "65001 UTF" (or choose correct language character identifier)
  6. Change the Delimiter to comma
  7. Select where to import to and Finish

This way the special characters should show correctly.


The UTF-8 Byte-order marker will clue Excel 2007+ in to the fact that you're using UTF-8. (See this SO post).

In case anybody is having the same issues I was, .NET's UTF8 encoding class does not output a byte-order marker in a GetBytes() call. You need to use streams (or use a workaround) to get the BOM to output.


The bug with ignored BOM seems to be fixed for Excel 2013. I had same problem with Cyrillic letters, but adding BOM character \uFEFF did help.

Tags:

Csv

Excel

Utf 8