Using Pandas to pd.read_excel() for multiple worksheets of the same workbook
You could also specify the sheet name as a parameter:
data_file = pd.read_excel('path_to_file.xls', sheet_name="sheet_name")
will upload only the sheet "sheet_name"
.
Try pd.ExcelFile
:
xls = pd.ExcelFile('path_to_file.xls')
df1 = pd.read_excel(xls, 'Sheet1')
df2 = pd.read_excel(xls, 'Sheet2')
As noted by @HaPsantran, the entire Excel file is read in during the ExcelFile()
call (there doesn't appear to be a way around this). This merely saves you from having to read the same file in each time you want to access a new sheet.
Note that the sheet_name
argument to pd.read_excel()
can be the name of the sheet (as above), an integer specifying the sheet number (eg 0, 1, etc), a list of sheet names or indices, or None
. If a list is provided, it returns a dictionary where the keys are the sheet names/indices and the values are the data frames. The default is to simply return the first sheet (ie, sheet_name=0
).
If None
is specified, all sheets are returned, as a {sheet_name:dataframe}
dictionary.
There are a few options:
Read all sheets directly into an ordered dictionary.
import pandas as pd
# for pandas version >= 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheet_name=None)
# for pandas version < 0.21.0
sheet_to_df_map = pd.read_excel(file_name, sheetname=None)
Read the first sheet directly into dataframe
df = pd.read_excel('excel_file_path.xls')
# this will read the first sheet into df
Read the excel file and get a list of sheets. Then chose and load the sheets.
xls = pd.ExcelFile('excel_file_path.xls')
# Now you can list all sheets in the file
xls.sheet_names
# ['house', 'house_extra', ...]
# to read just one sheet to dataframe:
df = pd.read_excel(file_name, sheet_name="house")
Read all sheets and store it in a dictionary. Same as first but more explicit.
# to read all sheets to a map
sheet_to_df_map = {}
for sheet_name in xls.sheet_names:
sheet_to_df_map[sheet_name] = xls.parse(sheet_name)
# you can also use sheet_index [0,1,2..] instead of sheet name.
Thanks @ihightower for pointing it out way to read all sheets and @toto_tico,@red-headphone for pointing out the version issue.
sheetname : string, int, mixed list of strings/ints, or None, default 0 Deprecated since version 0.21.0: Use sheet_name instead Source Link
You can also use the index for the sheet:
xls = pd.ExcelFile('path_to_file.xls')
sheet1 = xls.parse(0)
will give the first worksheet. for the second worksheet:
sheet2 = xls.parse(1)