pandas read_csv index_col=None not working with delimiters at the end of each line

Quick Answer

Use index_col=False instead of index_col=None when you have delimiters at the end of each line to turn off index column inference and discard the last column.

More Detail

After looking at the data, there is a comma at the end of each line. And this quote (the documentation has been edited since the time this post was created):

index_col: column number, column name, or list of column numbers/names, to use as the index (row labels) of the resulting DataFrame. By default, it will number the rows without using any column, unless there is one more data column than there are headers, in which case the first column is taken as the index.

from the documentation shows that pandas believes you have n headers and n+1 data columns and is treating the first column as the index.

EDIT 10/20/2014 - More information

I found another valuable entry that is specifically about trailing limiters and how to simply ignore them:

If a file has one more column of data than the number of column names, the first column will be used as the DataFrame’s row names: ...

Ordinarily, you can achieve this behavior using the index_col option.

There are some exception cases when a file has been prepared with delimiters at the end of each data line, confusing the parser. To explicitly disable the index column inference and discard the last column, pass index_col=False: ...

Re: craigts's response, for anyone having trouble with using either False or None parameters for index_col, such as in cases where you're trying to get rid of a range index, you can instead use an integer to specify the column you want to use as the index. For example:

df = pd.read_csv('file.csv', index_col=0)

The above will set the first column as the index (and not add a range index in my "common case").

Update

Given the popularity of this answer, I thought i'd add some context/ a demo:

# Setting up the dummy data
In [1]: df = pd.DataFrame({"A":[1, 2, 3], "B":[4, 5, 6]})

In [2]: df
Out[2]:
   A  B
0  1  4
1  2  5
2  3  6

In [3]: df.to_csv('file.csv', index=None)
File[3]:
A  B
1  4
2  5
3  6

Reading without index_col or with None/False will all result in a range index:

In [4]: pd.read_csv('file.csv')
Out[4]:
   A  B
0  1  4
1  2  5
2  3  6

# Note that this is the default behavior, so the same as In [4]
In [5]: pd.read_csv('file.csv', index_col=None)
Out[5]:
   A  B
0  1  4
1  2  5
2  3  6

In [6]: pd.read_csv('file.csv', index_col=False)
Out[6]:
   A  B
0  1  4
1  2  5
2  3  6

However, if we specify that "A" (the 0th column) is actually the index, we can avoid the range index:

In [7]: pd.read_csv('file.csv', index_col=0)
Out[7]:
   B
A
1  4
2  5
3  6

pandas read_csv index_col=None not working with delimiters at the end of each line

Quick Answer

More Detail

Update

Tags:

Python

Pandas

Related

Recent Posts