In Pandas, whats the equivalent of 'nrows' from read_csv() to be used in read_excel()?
If you know the number of rows in your Excel sheet, you can use the skip_footer parameter to read the first n - skip_footer rows of your file, where n is the total number of rows.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html
Usage:
data = pd.read_excel(filepath, header=0, parse_cols = "A:D", skip_footer=80)
Assuming your excel sheet has 100 rows, this line would parse the first 20 rows.
As noted in the documentation, as of pandas version 0.23, this is now a built-in option, and functions almost exactly as the OP stated.
The code
data = pd.read_excel(filepath, header=0, skiprows=4, nrows= 20, use_cols = "A:D")
will now read the excel file, take data from the first sheet (default), skip 4 rows of data, then take the first line (i.e., the fifth line of the sheet) as the header, read the next 20 rows of data into the dataframe (lines 6-25), and only use the columns A:D. Note that use_cols is now the final option, as parse_cols is deprecated.
I'd like to make (extend) @Erol's answer bit more flexible.
Assuming that we DON'T know the total number of rows in the excel sheet:
xl = pd.ExcelFile(filepath)
# parsing first (index: 0) sheet
total_rows = xl.book.sheet_by_index(0).nrows
skiprows = 4
nrows = 20
# calc number of footer rows
# (-1) - for the header row
skipfooter = total_rows - nrows - skiprows - 1
df = xl.parse(0, skiprows=skiprows, skipfooter=skipfooter, parse_cols="A:D") \
.dropna(axis=1, how='all')
.dropna(axis=1, how='all')
will drop all columns containing only NaN
's