Read specific columns from a csv file with csv module?
You can use numpy.loadtxt(filename). For example, if this is your database .csv:
ID | Name | Address | City | State | Zip | Phone | OPEID | IPEDS |
10 | Adam | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Carl | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Adolf | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
10 | Den | 130 W.. | Mo.. | AL... | 3.. | 334.. | 01023 | 10063 |
And you want the Name column:
import numpy as np
b=np.loadtxt(r'filepath\name.csv',dtype=str,delimiter='|',skiprows=1,usecols=(1,))
>>> b
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
More easily, you can use genfromtxt:
b = np.genfromtxt(r'filepath\name.csv', delimiter='|', names=True,dtype=None)
>>> b['Name']
array([' Adam ', ' Carl ', ' Adolf ', ' Den '],
dtype='|S7')
Use pandas:
import pandas as pd
my_csv = pd.read_csv(filename)
column = my_csv.column_name
# you can also use my_csv['column_name']
Discard unneeded columns at parse time:
my_filtered_csv = pd.read_csv(filename, usecols=['col1', 'col3', 'col7'])
P.S. I'm just aggregating what others have said in a simple manner. Actual answers are taken from here and here.
import csv
from collections import defaultdict
columns = defaultdict(list)  # each value in each column is appended to a list
with open('file.txt') as f:
    reader = csv.DictReader(f)  # read rows into a dictionary format
    for row in reader:  # read a row as {column1: value1, column2: value2,...}
        for (k, v) in row.items():  # go over each column name and value
            columns[k].append(v)  # append the value into the appropriate list
                                  # based on column name k
print(columns['name'])
print(columns['phone'])
print(columns['street'])
With a file like
name,phone,street
Bob,0893,32 Silly
James,000,400 McHilly
Smithers,4442,23 Looped St.
Will output
>>>
['Bob', 'James', 'Smithers']
['0893', '000', '4442']
['32 Silly', '400 McHilly', '23 Looped St.']
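If you only need some of the columns, you can skip collecting the rest. The following is a minimal sketch of that idea, not part of the original answer: the `wanted` list and the `selected` dictionary are hypothetical names, and `io.StringIO` stands in for `open('file.txt')` so the example runs on its own with the same sample rows.

```python
import csv
import io

# Stand-in for open('file.txt'); same sample rows as the file shown above.
sample = io.StringIO(
    "name,phone,street\n"
    "Bob,0893,32 Silly\n"
    "James,000,400 McHilly\n"
    "Smithers,4442,23 Looped St.\n"
)

wanted = ["name", "street"]  # hypothetical: the column names to keep
selected = {name: [] for name in wanted}

for row in csv.DictReader(sample):
    # Each row is a dict keyed by the header names, so look up only
    # the columns we care about and ignore the others.
    for name in wanted:
        selected[name].append(row[name])

print(selected["name"])
print(selected["street"])
```

Asking for a name that is not in the header raises a KeyError, which is usually what you want when the file layout is not what you expected.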
Or alternatively if you want numerical indexing for the columns:
with open('file.txt') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        for (i, v) in enumerate(row):
            columns[i].append(v)
print(columns[0])
>>>
['Bob', 'James', 'Smithers']
To change the delimiter, add delimiter=" " to the appropriate instantiation, i.e. reader = csv.reader(f, delimiter=" ")
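As an illustration of the delimiter argument, here is a small sketch that reads a pipe-delimited file with the csv module alone. The sample data is a hypothetical, cut-down version of a pipe-separated table (only ID, Name and OPEID), and `io.StringIO` is an assumption standing in for a real file. Because the fields are padded with spaces around each `|`, the values are stripped after parsing.

```python
import csv
import io

# Hypothetical pipe-delimited sample; a real file would be opened with open().
sample = io.StringIO(
    "ID | Name | OPEID\n"
    "10 | Adam | 01023\n"
    "10 | Carl | 01023\n"
)

reader = csv.reader(sample, delimiter="|")

# The header cells carry the same space padding as the data, so strip them
# before looking up the position of the column we want.
header = [cell.strip() for cell in next(reader)]
name_idx = header.index("Name")

names = [row[name_idx].strip() for row in reader]
print(names)
```

The delimiter must be a single character, so the padding cannot be folded into it; stripping each value afterwards is the simple way around that.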
The only way you would be getting only the last column from this code is if your print statement is not inside your for loop.
This is most likely the end of your code:
for row in reader:
    content = list(row[i] for i in included_cols)
print(content)
You want it to be this:
for row in reader:
    content = list(row[i] for i in included_cols)
    print(content)
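To see the difference end to end, here is a self-contained sketch of the loop with the print inside it. The sample data, the choice `included_cols = [0, 2]`, and the `selected` list are assumptions made for illustration, since the original file and column choice are not shown; `io.StringIO` stands in for the opened file.

```python
import csv
import io

# Hypothetical sample data standing in for the original csv file.
sample = io.StringIO(
    "name,phone,street\n"
    "Bob,0893,32 Silly\n"
    "James,000,400 McHilly\n"
)

included_cols = [0, 2]  # assumed: keep the first and third columns
selected = []           # hypothetical: collects every row's selection

reader = csv.reader(sample)
next(reader)  # skip the header row
for row in reader:
    content = [row[i] for i in included_cols]
    selected.append(content)
    print(content)  # inside the loop, so it runs once per row
```

With the print moved outside the loop, only the values from the final row would be shown, because `content` is overwritten on every iteration.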
Now that we have covered your mistake, I would like to take this time to introduce you to the pandas module.
Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
So if you wanted to save all of the info in your column Names into a variable, this is all you need to do:
names = df.Names
It's a great module and I suggest you look into it. If your print statement was already inside the for loop and it was still only printing the last column, which shouldn't happen, let me know and I'll revisit my assumption. Your posted code has a lot of indentation errors, so it was hard to know what was supposed to be where. Hope this was helpful!