How to drop a specific column of csv file while reading it using pandas?

If you know the column names prior, you can do it by setting usecols parameter

When you know which columns to use

Suppose you have csv file with columns ['id','name','last_name'] and you want just ['name','last_name']. You can do it as below:

import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])

when you want first N columns

If you don't know the column names but you want first N columns from dataframe. You can do it by

import pandas as pd
df = pd.read_csv("sample.csv", usecols = [i for i in range(n)])

Edit

When you know name of the column to be dropped

# Read column names from file
cols = list(pd.read_csv("sample_data.csv", nrows =1))
print(cols)

# Use list comprehension to remove the unwanted column in **usecol**
df= pd.read_csv("sample_data.csv", usecols =[i for i in cols if i != 'name'])

Get the column headers from your CSV using pd.read_csv with nrows=1, then do a subsequent read with usecols to pull everything but the column(s) you want to omit.

headers = [*pd.read_csv('sample.csv', nrows=1)]
df = pd.read_csv('sample.csv', usecols=[c for c in headers if c != 'name']))

Alternatively, you can do the same thing (read only the headers) very efficiently using the CSV module,

import csv

with open("sample.csv", 'r') as f:
    header = next(csv.reader(f))
    # For python 2, use
    # header = csv.reader(f).next()

df = pd.read_csv('sample.csv', usecols=list(set(header) - {'name'}))

The only parameter to read_csv() that you can use to select the columns you use is usecols. According to the documentation, usecols accepts list-like or callable. Because you only know the columns you want to drop, you can't use a list of the columns you want to keep. So use a callable:

pd.read_csv("sample.csv", 
            usecols=lambda x: x != 'name'
            )

And you could of course say x not in ['unwanted', 'column', 'names'] if you had a list of column names you didn't want to use.

How to drop a specific column of csv file while reading it using pandas?

Tags:

Python

Pandas

Csv

Dataframe

Related

Recent Posts