How to load only specific columns from a CSV file into a DataFrame
Ian, I implemented a usecols option which does exactly what you describe. It will be in the upcoming pandas 0.10; a development version will be available soon.
Since 0.10, you can use usecols like this:
df = pd.read_csv(...., usecols=['name', 'age',..., 'income'])
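For a version that can be run as-is, here is a minimal sketch of the usecols call. The sample data and the column names (name, age, city, income) are made up for illustration, and an in-memory string stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
csv_text = (
    "name,age,city,income\n"
    "Ann,34,Oslo,52000\n"
    "Bob,45,Rome,61000\n"
)

# Only the listed columns are parsed; "city" is skipped while reading.
df = pd.read_csv(io.StringIO(csv_text), usecols=["name", "age", "income"])

print(df.columns.tolist())
```

With a real file you would pass its path instead of the StringIO object. The unwanted columns are never loaded into the DataFrame, so nothing has to be dropped afterwards.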
There's no default way to do this right now. I would suggest chunking the file and iterating over it and discarding the columns you don't want.
So something like pd.concat([x.loc[:, cols_to_keep] for x in pd.read_csv(..., chunksize=200)])
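The chunking idea can be sketched end to end as follows. This is only an illustration under assumptions: the sample data, the contents of cols_to_keep, and the tiny chunksize are invented so that the example runs on its own, and it selects columns with .loc because .ix has been removed from recent pandas versions.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file.
csv_text = (
    "name,age,city,income\n"
    "Ann,34,Oslo,52000\n"
    "Bob,45,Rome,61000\n"
    "Cy,29,Lima,48000\n"
)

# Assumed list of wanted columns, for illustration only.
cols_to_keep = ["name", "income"]

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame. A chunksize of 2 is used here only
# so that the small sample is split into more than one chunk.
chunks = pd.read_csv(io.StringIO(csv_text), chunksize=2)

# Keep the wanted columns from each chunk, then stitch the pieces together.
trimmed = pd.concat(
    [chunk.loc[:, cols_to_keep] for chunk in chunks],
    ignore_index=True,
)

print(trimmed)
```

Each chunk is still parsed in full before the extra columns are discarded, so this mainly limits peak memory rather than parsing time.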