How to specify column names while reading an Excel file using Pandas?

This thread is 5 years old and outdated now, but it still shows up at the top of generic search results, so I am adding this note. Pandas now (v0.22) has a keyword for specifying column names when parsing Excel files. Use:

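import pandas as pd
xl = pd.ExcelFile("Path + filename")  # replace with the path to your workbook
# header=None: don't treat the first row as a header; names: the labels to use instead
df = xl.parse("Sheet 1", header=None, names=['A', 'B', 'C'])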

If header=None is not set, pandas treats the first row as the header and drops it from the data during parsing. If there is indeed a header but you don't want to use it, you have two choices: (1) use the "names" kwarg only; or (2) use "names" with header=None and skiprows=1. I personally prefer the second option, since it makes clear that the input file is not in the format I want and that I am doing something to work around it.
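One way to check the two options side by side is to build a tiny workbook in memory and parse it three ways. This is a sketch that assumes a recent pandas with the openpyxl engine installed (needed both to write and to read .xlsx); the sheet contents and the old header labels are made up for illustration.

```python
import io

import pandas as pd

# Build a small in-memory workbook whose first row is a header we do NOT want.
# Writing and reading .xlsx here assumes the openpyxl engine is installed.
buf = io.BytesIO()
pd.DataFrame({"old1": [1, 4], "old2": [2, 5], "old3": [3, 6]}).to_excel(
    buf, sheet_name="Sheet 1", index=False
)
data = buf.getvalue()

names = ["A", "B", "C"]
with pd.ExcelFile(io.BytesIO(data)) as xl:
    # Option (1): "names" only. header defaults to 0, so the first row is
    # consumed as the header and replaced by `names`.
    opt1 = xl.parse("Sheet 1", names=names)

    # Option (2): header=None plus skiprows=1 makes the skip explicit.
    opt2 = xl.parse("Sheet 1", header=None, names=names, skiprows=1)

    # Pitfall: header=None on its own keeps the unwanted header row as data.
    pitfall = xl.parse("Sheet 1", header=None, names=names)

print(opt1.equals(opt2))         # both hold only the two data rows
print(pitfall.iloc[0].tolist())  # the old header labels, now a data row
```

Options (1) and (2) produce the same frame; the third call shows why header=None alone is not enough when the file really does have a header row.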

As Ram said, this post comes up at the top and may be useful to some. In pandas 0.24.2 (possibly earlier as well), read_excel itself can ignore the source headers and apply your own column names, along with a few other useful controls:

DID = pd.read_excel(file1, sheet_name=0, header=None, usecols=[0, 1, 6], names=['A', 'ID', 'B'], dtype={2:str}, skiprows=10)

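# for example:
# header=None => don't treat any row as the header
# usecols => read only specific column indexes
# names => your own column names, one per column kept by usecols
# dtype => specify the data types of columns
# skiprows => number of rows to skip at the top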
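To see those options working together, here is a sketch against a made-up sheet with a title row, the file's own header row, and seven columns of data, of which only columns 0, 1 and 6 are kept. It assumes openpyxl is installed, and it keys `dtype` by the new column name ("ID") rather than by position, which is a choice made here for readability.

```python
import io

import pandas as pd

# Hypothetical sheet layout: a title row, the file's own header row, then
# data spread over seven columns. We keep only columns 0, 1 and 6.
rows = [
    ["Quarterly export", None, None, None, None, None, None],
    ["name", "id", "x", "y", "z", "w", "score"],
    ["alpha", 101, 0, 0, 0, 0, 3.5],
    ["beta", 102, 0, 0, 0, 0, 4.0],
]
buf = io.BytesIO()
pd.DataFrame(rows).to_excel(buf, index=False, header=False)  # needs openpyxl

df = pd.read_excel(
    io.BytesIO(buf.getvalue()),
    sheet_name=0,
    header=None,             # don't treat any row as the header
    skiprows=2,              # drop the title row and the original header row
    usecols=[0, 1, 6],       # positional columns to keep
    names=["A", "ID", "B"],  # one name per kept column
    dtype={"ID": str},       # keep IDs as text instead of numbers
)
print(df)
```

The result has two rows and the columns A, ID and B, with the ID values read as strings.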

You can also set them after reading; for example, if you have four columns in your DataFrame:

df.columns = ['W','X','Y','Z']
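Assigning to `df.columns` replaces every label at once, so the new list must have exactly as many entries as there are columns. A minimal sketch, with a plain DataFrame standing in for a sheet read with header=None (so no Excel file is needed):

```python
import pandas as pd

# Stand-in for a sheet read with header=None: the columns are labelled 0..3.
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]])

# Replace all four labels in one assignment.
df.columns = ["W", "X", "Y", "Z"]

# A list of the wrong length is rejected and the existing labels are kept.
mismatch_raised = False
try:
    df.columns = ["W", "X", "Y"]  # one name short
except ValueError:
    mismatch_raised = True

print(list(df.columns), mismatch_raised)
```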

If you know in advance what the headers in the Excel file are, it's probably better to rename them; this renames W to A, and so on:

df = df.rename(columns={'W': 'A', 'X': 'B'})  # add further 'old': 'new' pairs as needed
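`rename` works from an old-to-new mapping, so it only touches the labels you list, and it returns a new DataFrame rather than modifying the original. A small sketch with made-up labels; `errors="raise"` is shown as the opt-in check for a mapping key that does not exist:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]], columns=["W", "X", "Y", "Z"])

# rename() returns a new DataFrame; the original keeps its labels.
renamed = df.rename(columns={"W": "A", "X": "B"})

# By default (errors="ignore") a key that isn't a column is silently skipped.
tolerant = df.rename(columns={"Q": "nope"})

# errors="raise" turns a missing key into a KeyError instead.
strict_raised = False
try:
    df.rename(columns={"Q": "nope"}, errors="raise")
except KeyError:
    strict_raised = True

print(list(renamed.columns), list(df.columns), strict_raised)
```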
