How to load only specific columns from a CSV file into a DataFrame

Ian, I implemented a usecols option which does exactly what you describe. It will be in the upcoming pandas 0.10 release; a development version will be available soon.


Since 0.10, you can use usecols like this:

df = pd.read_csv(...., usecols=['name', 'age',..., 'income'])
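For anyone who wants to try this without a file on disk, here is a minimal runnable sketch of the usecols call. The sample data, the column names and the load_columns helper are made up for illustration; only pd.read_csv and its usecols parameter come from pandas.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = (
    "name,age,city,income\n"
    "Ann,34,Oslo,52000\n"
    "Bob,45,Lima,61000\n"
)


def load_columns(source, columns):
    """Read only the named columns from a CSV path or file-like object."""
    return pd.read_csv(source, usecols=columns)


# The 'city' column is skipped while parsing and never reaches the DataFrame.
df = load_columns(io.StringIO(csv_text), ["name", "age", "income"])
print(df)
```

Note that the resulting columns follow the order in the file, not the order of the list passed to usecols.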

There's no default way to do this right now. I would suggest reading the file in chunks, iterating over them, and discarding the columns you don't want. So something like pd.concat([x.loc[:, cols_to_keep] for x in pd.read_csv(..., chunksize=200)])
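A rough, self-contained sketch of the chunking idea follows. It assumes a recent pandas, so it selects columns with .loc, because the older .ix indexer was removed in pandas 1.0. The sample data, the tiny chunk size and the load_columns_chunked helper are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file.
csv_text = (
    "name,age,city,income\n"
    "Ann,34,Oslo,52000\n"
    "Bob,45,Lima,61000\n"
    "Cy,29,Rome,48000\n"
)

cols_to_keep = ["name", "income"]


def load_columns_chunked(source, columns, chunksize=2):
    """Read a CSV in chunks and keep only the wanted columns of each chunk."""
    # With chunksize set, read_csv returns an iterator of DataFrames.
    chunks = pd.read_csv(source, chunksize=chunksize)
    # Drop unwanted columns chunk by chunk, then stitch the pieces together.
    return pd.concat([chunk.loc[:, columns] for chunk in chunks])


df_chunked = load_columns_chunked(io.StringIO(csv_text), cols_to_keep)
print(df_chunked)
```

Each chunk is still parsed in full before the extra columns are dropped, so this mostly limits peak memory; where usecols is available it is the simpler choice.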

Tags: Python, Pandas, CSV