Python Pandas read_csv skip rows but keep header

Great answers already. Consider this generalized scenario:

Say your xls/csv has junk in the top 2 rows (rows #0 and #1). Row #2 (the 3rd row) is the real header, and you want to load 10 rows starting from row #50 (i.e., the 51st row).

Here's the snippet:

pd.read_csv('test.csv', header=2, skiprows=range(3, 50), nrows=10)
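To check which rows this call keeps, here is a minimal sketch that uses an in-memory buffer in place of `test.csv`. The file contents and the column names `id` and `value` are made up for illustration; each data row stores its own line number so the result is easy to read.

```python
import io
import pandas as pd

# Hypothetical stand-in for test.csv: 2 junk lines, the header on line 2,
# then data rows whose "id" equals their line number in the file.
lines = ["junk,0", "junk,1", "id,value"]
lines += [f"{n},{n * 10}" for n in range(3, 100)]
buf = io.StringIO("\n".join(lines))

# header=2 picks line 2 as the header (lines 0 and 1 are dropped),
# skiprows removes lines 3..49, and nrows=10 stops after 10 data rows.
df = pd.read_csv(buf, header=2, skiprows=range(3, 50), nrows=10)

print(list(df.columns))
print(df["id"].tolist())  # line numbers of the rows that were loaded
```

The `id` column should list lines 50 through 59, which is the 10-row window described above.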

You can pass a list of row numbers to skiprows instead of an integer.

By giving the function the integer 10, you're just skipping the first 10 lines.

To keep row 0 as the header and skip rows 1 through 9 (so that the data starts at row 10), you can write:

pd.read_csv('test.csv', sep='|', skiprows=range(1, 10))
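The difference between the integer and the range form can be seen side by side in a small sketch. The pipe-delimited sample data and the names `as_int` and `as_range` are assumptions made for this illustration.

```python
import io
import pandas as pd

# Hypothetical pipe-delimited data: a header line plus rows numbered 1..14.
text = "id|value\n" + "\n".join(f"{n}|{n * 10}" for n in range(1, 15))

# Integer: drops the first 10 lines, including the header line,
# so file line 10 ("10|100") ends up being used as the header.
as_int = pd.read_csv(io.StringIO(text), sep="|", skiprows=10)

# Range: keeps line 0 as the header and drops lines 1 through 9.
as_range = pd.read_csv(io.StringIO(text), sep="|", skiprows=range(1, 10))

print(list(as_int.columns))
print(list(as_range.columns), as_range["id"].tolist())
```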

Other ways to skip rows using `read_csv`

The two main ways to control which rows read_csv uses are the header and skiprows parameters.

Suppose we have the following CSV file with one column:

a
b
c
d
e
f

In each of the examples below, this file is f = io.StringIO("\n".join("abcdef")).
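One practical note when trying these yourself: a `StringIO` buffer is consumed as it is read, so a second `read_csv` call on the same object typically finds nothing left to parse. A minimal sketch of two ways around that, where `make_f` is a hypothetical helper and not part of pandas:

```python
import io
import pandas as pd

def make_f():
    # Hypothetical helper: returns a fresh copy of the six-line sample file.
    return io.StringIO("\n".join("abcdef"))

f = make_f()
first = pd.read_csv(f, header=None)

# Option 1: rewind the same buffer before reading it again.
f.seek(0)
second = pd.read_csv(f, header=3)

# Option 2: build a new buffer for each call.
third = pd.read_csv(make_f(), skiprows=[2, 4])

print(first.shape, list(second.columns), third["a"].tolist())
```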

Read all lines as values (no header; column names default to integers):

>>> pd.read_csv(f, header=None)
   0
0  a
1  b
2  c
3  d
4  e
5  f

Use a particular row as the header (skip all lines before that):
```
>>> pd.read_csv(f, header=3)
   d
0  e
1  f
```

Use multiple rows as the header, creating a MultiIndex (lines before the last specified header line that are not listed are skipped):

>>> pd.read_csv(f, header=[2, 4])
   c
   e
0  f

Skip N rows from the start of the file (the first row that's not skipped is the header):

>>> pd.read_csv(f, skiprows=3)
   d
0  e
1  f

Skip one or more rows by giving the row indices (the first row that's not skipped is the header):

>>> pd.read_csv(f, skiprows=[2, 4])
   a
0  b
1  d
2  f
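Besides an integer or a list of indices, `skiprows` also accepts a callable that is evaluated against each row index and returns True for rows to skip. A small sketch on the same sample file (the lambda is only an illustration):

```python
import io
import pandas as pd

f = io.StringIO("\n".join("abcdef"))

# Skip every odd-numbered line (1, 3, 5); line 0 is kept and becomes the header.
df = pd.read_csv(f, skiprows=lambda i: i % 2 == 1)

print(df)
```

This can be handy when the rows to skip follow a pattern and writing out the full list would be awkward.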

To expand on @AlexRiley's answer, the skiprows argument takes a list of row numbers that determines which rows to skip. So:

pd.read_csv('test.csv', sep='|', skiprows=range(1, 10))

is the same as:

pd.read_csv('test.csv', sep='|', skiprows=[1,2,3,4,5,6,7,8,9])

The best way to go about ignoring specific rows would be to create your ignore list (either manually or with a function like range, which produces a sequence of integers) and pass it to skiprows.
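As a rough sketch of building the ignore list with code rather than by hand, the example below scans the raw lines first and collects the line numbers to drop. The sample data and the "TOTAL" marker are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical file contents: a header, data rows, and unwanted "TOTAL" lines.
text = "name,score\nTOTAL,99\nann,1\nbob,2\nTOTAL,3\ncid,4"

# Build the ignore list: every line after the header that starts with "TOTAL".
skip = [
    i
    for i, line in enumerate(text.splitlines())
    if i > 0 and line.startswith("TOTAL")
]

df = pd.read_csv(io.StringIO(text), skiprows=skip)

print(skip)
print(df["name"].tolist())
```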
