Most Pythonic way to read CSV values into dict of lists

If your data is mostly numeric and you're OK with using NumPy, numpy.genfromtxt is a good way to do this:

import numpy as np
data = np.genfromtxt('data.csv', delimiter=',', names=True)

This creates a NumPy structured array, which provides a nice interface for querying the data by header name (pass names=True if the file has a header row).

For example, given data.csv containing:

a,b,c
1,2,3
4,5,6
7,8,9

You can then access elements with:

>>> data['a']        # Column with header 'a'
array([ 1.,  4.,  7.])
>>> data[0]          # First row
(1.0, 2.0, 3.0)
>>> data['c'][2]     # Specific element
9.0
>>> data[['a', 'c']] # Two columns
array([(1.0, 3.0), (4.0, 6.0), (7.0, 9.0)],
      dtype=[('a', '<f8'), ('c', '<f8')])
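
To get from the structured array to the dict of lists the question asks for, one option is a dict comprehension over data.dtype.names. This is a minimal sketch that assumes the data.csv contents shown above, read from an in-memory string so it is self-contained; csv_text and dict_of_lists are illustrative names.

```python
import io

import numpy as np

# Same contents as the data.csv example, held in memory for the sketch.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n"
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

# One list per header name; tolist() turns NumPy floats into plain Python floats.
dict_of_lists = {name: data[name].tolist() for name in data.dtype.names}
print(dict_of_lists)
```

The values come out as floats because genfromtxt uses dtype=float by default.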

genfromtxt also provides a way, as you requested, to "format the data being ingested by column up front": the converters parameter. From the documentation:

converters : variable, optional

The set of functions that convert the data of a column to a value. The converters can also be used to provide a default value for missing data: converters = {3: lambda s: float(s or 0)}.
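
As a rough illustration of converters, the sketch below fills a missing value in the third column with 0. The sample data and the choice of column index 2 are assumptions made for the example, not part of the original question.

```python
import io

import numpy as np

# The 'c' value in the second data row is deliberately left empty.
csv_text = "a,b,c\n1,2,3\n4,5,\n7,8,9\n"

data = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    names=True,
    # Column index 2 is 'c'; an empty field falls back to 0.
    converters={2: lambda s: float(s or 0)},
)
print(data['c'])
```

Columns without a converter keep the default float conversion.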

If you're willing to use a third-party library, the merge_with function from Toolz (from toolz import merge_with) makes this whole operation a one-liner, where f is the path to the CSV file:

dict_of_lists = merge_with(list, *csv.DictReader(open(f)))

Using only the stdlib, a defaultdict makes the code less repetitive:

from collections import defaultdict
import csv

f = 'test.csv'

dict_of_lists = defaultdict(list)
for record in csv.DictReader(open(f)):    # one dict per row, keyed by header
    for key, val in record.items():    # or iteritems in Python 2
        dict_of_lists[key].append(val)

If you need to do this often, factor it out into a function, e.g. transpose_csv.
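
One possible shape for such a function, as a sketch rather than a definitive implementation: the name transpose_csv comes from the suggestion above, while the path parameter and the pass-through reader_kwargs are assumptions. It targets Python 3 and closes the file when done.

```python
import csv
from collections import defaultdict


def transpose_csv(path, **reader_kwargs):
    """Read a CSV file with a header row into a dict of column lists."""
    columns = defaultdict(list)
    # newline='' is what the csv module documentation recommends for open().
    with open(path, newline='') as fh:
        for record in csv.DictReader(fh, **reader_kwargs):
            for key, val in record.items():
                columns[key].append(val)
    # Return a plain dict so a mistyped key raises KeyError instead of
    # silently creating an empty list.
    return dict(columns)
```

For the data.csv shown earlier, transpose_csv('data.csv') gives {'a': ['1', '4', '7'], 'b': ['2', '5', '8'], 'c': ['3', '6', '9']}. The values are strings, since the csv module does no type conversion.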
