How to get rid of "Unnamed: 0" column in a pandas DataFrame?

It's the index column, pass pd.to_csv(..., index=False) to not write out an unnamed index column in the first place, see the to_csv() docs.

Example:

In [37]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
pd.read_csv(io.StringIO(df.to_csv()))

Out[37]:
   Unnamed: 0         a         b         c
0           0  0.109066 -1.112704 -0.545209
1           1  0.447114  1.525341  0.317252
2           2  0.507495  0.137863  0.886283
3           3  1.452867  1.888363  1.168101
4           4  0.901371 -0.704805  0.088335

compare with:

In [38]:
pd.read_csv(io.StringIO(df.to_csv(index=False)))

Out[38]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

You could also optionally tell read_csv that the first column is the index column by passing index_col=0:

In [40]:
pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

Out[40]:
          a         b         c
0  0.109066 -1.112704 -0.545209
1  0.447114  1.525341  0.317252
2  0.507495  0.137863  0.886283
3  1.452867  1.888363  1.168101
4  0.901371 -0.704805  0.088335

This is usually caused by your CSV having been saved along with an (unnamed) index (RangeIndex).

(The fix would actually need to be done when saving the DataFrame, but this isn't always an option.)

Workaround: `read_csv` with `index_col=[0]` argument

IMO, the simplest solution would be to read the unnamed column as the index. Specify an index_col=[0] argument to pd.read_csv, this reads in the first column as the index. (Note the square brackets).

df = pd.DataFrame('x', index=range(5), columns=list('abc'))
df

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

# Save DataFrame to CSV.
df.to_csv('file.csv')

<!- ->

pd.read_csv('file.csv')

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

# Now try this again, with the extra argument.
pd.read_csv('file.csv', index_col=[0])

   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

Note
You could have avoided this in the first place by using index=False if the output CSV was created in pandas, if your DataFrame does not have an index to begin with:
df.to_csv('file.csv', index=False)
But as mentioned above, this isn't always an option.

Stopgap Solution: Filtering with `str.match`

If you cannot modify the code to read/write the CSV file, you can just remove the column by filtering with str.match:

df 

   Unnamed: 0  a  b  c
0           0  x  x  x
1           1  x  x  x
2           2  x  x  x
3           3  x  x  x
4           4  x  x  x

df.columns
# Index(['Unnamed: 0', 'a', 'b', 'c'], dtype='object')

df.columns.str.match('Unnamed')
# array([ True, False, False, False])

df.loc[:, ~df.columns.str.match('Unnamed')]
 
   a  b  c
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

To get ride of all Unnamed columns, you can also use regex such as df.drop(df.filter(regex="Unname"),axis=1, inplace=True)

How to get rid of "Unnamed: 0" column in a pandas DataFrame?

Workaround: `read_csv` with `index_col=[0]` argument

Stopgap Solution: Filtering with `str.match`

Tags:

Python

Pandas

Csv

Dataframe

Related

Recent Posts

How to get rid of "Unnamed: 0" column in a pandas DataFrame?

Workaround: read_csv with index_col=[0] argument

Stopgap Solution: Filtering with str.match

Tags:

Python

Pandas

Csv

Dataframe

Related

Workaround: `read_csv` with `index_col=[0]` argument

Stopgap Solution: Filtering with `str.match`