Reading CSV files in C#

Here, written by yours truly to use generic collections and iterator blocks. It supports double-quote enclosed text fields (including ones that span mulitple lines) using the double-escaped convention (so "" inside a quoted field reads as single quote character). It does not support:

  • Single-quote enclosed text
  • \ -escaped quoted text
  • alternate delimiters (won't yet work on pipe or tab delimited fields)
  • Unquoted text fields that begin with a quote

But all of those would be easy enough to add if you need them. I haven't benchmarked it anywhere (I'd love to see some results), but performance should be very good - better than anything that's .Split() based anyway.

Now on GitHub

Update: felt like adding single-quote enclosed text support. It's a simple change, but I typed it right into the reply window so it's untested. Use the revision link at the bottom if you'd prefer the old (tested) code.

public static class CSV
{
    public static IEnumerable<IList<string>> FromFile(string fileName, bool ignoreFirstLine = false)
    {
        using (StreamReader rdr = new StreamReader(fileName))
        {
            foreach(IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
        }
    }

    public static IEnumerable<IList<string>> FromStream(Stream csv, bool ignoreFirstLine=false)
    {
        using (var rdr = new StreamReader(csv))
        {
            foreach (IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
        }
    }

    public static IEnumerable<IList<string>> FromReader(TextReader csv, bool ignoreFirstLine=false)
    {
        if (ignoreFirstLine) csv.ReadLine();

        IList<string> result = new List<string>();

        StringBuilder curValue = new StringBuilder();
        char c;
        c = (char)csv.Read();
        while (csv.Peek() != -1)
        {
            switch (c)
            {
                case ',': //empty field
                    result.Add("");
                    c = (char)csv.Read();
                    break;
                case '"': //qualified text
                case '\'':
                    char q = c;
                    c = (char)csv.Read();
                    bool inQuotes = true;
                    while (inQuotes && csv.Peek() != -1)
                    {
                        if (c == q)
                        {
                            c = (char)csv.Read();
                            if (c != q)
                                inQuotes = false;
                        }

                        if (inQuotes)
                        {
                            curValue.Append(c);
                            c = (char)csv.Read();
                        } 
                    }
                    result.Add(curValue.ToString());
                    curValue = new StringBuilder();
                    if (c == ',') c = (char)csv.Read(); // either ',', newline, or endofstream
                    break;
                case '\n': //end of the record
                case '\r':
                    //potential bug here depending on what your line breaks look like
                    if (result.Count > 0) // don't return empty records
                    {
                        yield return result;
                        result = new List<string>();
                    }
                    c = (char)csv.Read();
                    break;
                default: //normal unqualified text
                    while (c != ',' && c != '\r' && c != '\n' && csv.Peek() != -1)
                    {
                        curValue.Append(c);
                        c = (char)csv.Read();
                    }
                    result.Add(curValue.ToString());
                    curValue = new StringBuilder();
                    if (c == ',') c = (char)csv.Read(); //either ',', newline, or endofstream
                    break;
            }
            
        }
        if (curValue.Length > 0) //potential bug: I don't want to skip on a empty column in the last record if a caller really expects it to be there
            result.Add(curValue.ToString());
        if (result.Count > 0) 
            yield return result;

    }
}

Take a look at A Fast CSV Reader on CodeProject.


The last time this question was asked, here's the answer I gave:

If you're just trying to read a CSV file with C#, the easiest thing is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class. It's actually built into the .NET Framework, instead of being a third-party extension.

Yes, it is in Microsoft.VisualBasic.dll, but that doesn't mean you can't use it from C# (or any other CLR language).

Here's an example of usage, taken from the MSDN documentation:

Using MyReader As New _
Microsoft.VisualBasic.FileIO.TextFieldParser("C:\testfile.txt")
   MyReader.TextFieldType = FileIO.FieldType.Delimited
   MyReader.SetDelimiters(",")
   Dim currentRow As String()
   While Not MyReader.EndOfData
      Try
         currentRow = MyReader.ReadFields()
         Dim currentField As String
         For Each currentField In currentRow
            MsgBox(currentField)
         Next
      Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
      MsgBox("Line " & ex.Message & _
      "is not valid and will be skipped.")
      End Try
   End While
End Using

Again, this example is in VB.NET, but it would be trivial to translate it to C#.