How to drop a specific column of csv file while reading it using pandas?
If you know the column names prior, you can do it by setting usecols
parameter
When you know which columns to use
Suppose you have csv file with columns ['id','name','last_name']
and you want just ['name','last_name']
. You can do it as below:
import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])
when you want first N columns
If you don't know the column names but you want first N columns from dataframe. You can do it by
import pandas as pd
df = pd.read_csv("sample.csv", usecols = [i for i in range(n)])
Edit
When you know name of the column to be dropped
# Read column names from file
cols = list(pd.read_csv("sample_data.csv", nrows =1))
print(cols)
# Use list comprehension to remove the unwanted column in **usecol**
df= pd.read_csv("sample_data.csv", usecols =[i for i in cols if i != 'name'])
Get the column headers from your CSV using pd.read_csv
with nrows=1
, then do a subsequent read with usecols
to pull everything but the column(s) you want to omit.
headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name']))
Alternatively, you can do the same thing (read only the headers) very efficiently using the CSV module,
import csv
with open("sample.csv", 'r') as f:
header = next(csv.reader(f))
# For python 2, use
# header = csv.reader(f).next()
df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))
The only parameter to read_csv()
that you can use to select the columns you use is usecols
. According to the documentation, usecols
accepts list-like or callable. Because you only know the columns you want to drop, you can't use a list of the columns you want to keep. So use a callable:
pd.read_csv("sample.csv",
usecols=lambda x: x != 'name'
)
And you could of course say x not in ['unwanted', 'column', 'names']
if you had a list of column names you didn't want to use.