How can I read only the header column of a CSV file using Python?
I've used iglob
as an example to search for the .csv
files, but one way is to use a set, then adjust as necessary, eg:
import csv
from glob import iglob
unique_headers = set()
for filename in iglob('*.csv'):
with open(filename, 'rb') as fin:
csvin = csv.reader(fin)
unique_headers.update(next(csvin, []))
I might be a little late to the party but here's one way to do it using just the Python standard library. When dealing with text data, I prefer to use Python 3 because unicode. So this is very close to your original suggestion except I'm only reading in one row rather than the whole file.
import csv
with open(fpath, 'r') as infile:
reader = csv.DictReader(infile)
fieldnames = reader.fieldnames
Hopefully that helps!
Expanding on the answer given by Jeff It is now possbile to use pandas
without actually reading any rows.
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: pd.DataFrame(np.random.randn(10, 4), columns=list('abcd')).to_csv('test.csv', mode='w')
In [4]: pd.read_csv('test.csv', index_col=0, nrows=0).columns.tolist()
Out[4]: ['a', 'b', 'c', 'd']
pandas
can have the advantage that it deals more gracefully with CSV encodings.