CSV reader behavior with None and empty string

You could at least partially side-step what the csv module does by creating your own version of a singleton None-like class/value:

from __future__ import print_function
import csv


class NONE(object):
    ''' None-like class. '''
    def __repr__(self):  # csv.writer calls str() on the value, which falls
        return 'NONE'    # back to __repr__; a unique string representing None.
    def __len__(self):  # Method called to determine length and truthiness.
        return 0

NONE = NONE()  # Singleton instance of the class.


if __name__ == '__main__':

    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:  # ModuleNotFoundError does not exist before Python 3.6.
        from io import StringIO  # Python 3.

    data = [['None value', None], ['NONE value', NONE], ['empty string', '']]
    f = StringIO()
    csv.writer(f).writerows(data)

    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csv.reader(f)])

Results:

 input: [['None value', None], ['NONE value', NONE],   ['empty string', '']]
output: [['None value', ''],   ['NONE value', 'NONE'], ['empty string', '']]

Using NONE instead of None preserves enough information to differentiate it from genuine empty-string values. Note, however, that on reading you get back the plain string 'NONE', not the NONE object itself.
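Because the reader returns the marker as text, one way to restore None is a small post-processing step. This is a minimal sketch that assumes the CSV text below is what the NONE-based writer above produces, and that the string 'NONE' never occurs as a genuine data value (the name NONE_MARKER is just for illustration):

```python
import csv
import io

# CSV text as written by the NONE-based example above (assumed for illustration).
text = 'None value,\r\nNONE value,NONE\r\nempty string,\r\n'

NONE_MARKER = 'NONE'  # Must match what NONE.__repr__() returns.

# Map the marker back to None; every other field is left untouched.
rows = [[None if field == NONE_MARKER else field for field in row]
        for row in csv.reader(io.StringIO(text, newline=''))]
print(rows)
```

Only the value written as NONE turns back into None; the field that was originally None still reads back as an empty string.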

Even better alternative…

You could use the same approach to implement a pair of relatively lightweight csv.reader and csv.writer “proxy” classes. Proxies are necessary because you can't subclass the built-in csv reader and writer, which are implemented in C. They add little overhead, since most of the processing is still performed by the underlying built-ins, and because the None handling is encapsulated within the proxies, it is completely transparent to the calling code.

from __future__ import print_function
import csv


class csvProxyBase(object):
    _NONE = '<None>'  # Unique value representing None; must never appear as real data.


class csvWriter(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.writer = csv.writer(csvfile, *args, **kwargs)
    def writerow(self, row):
        # Replace None with the marker before handing the row to csv.writer.
        self.writer.writerow([self._NONE if val is None else val for val in row])
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


class csvReader(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.reader = csv.reader(csvfile, *args, **kwargs)
    def __iter__(self):
        return self
    def __next__(self):
        # Map the marker back to None; all other fields pass through unchanged.
        return [None if val == self._NONE else val for val in next(self.reader)]
    next = __next__  # Python2.x compatibility.


if __name__ == '__main__':

    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:  # ModuleNotFoundError does not exist before Python 3.6.
        from io import StringIO  # Python 3.

    data = [['None value', None], ['empty string', '']]
    f = StringIO()
    csvWriter(f).writerows(data)

    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csvReader(f)])

Results:

 input: [['None value', None], ['empty string', '']]
output: [['None value', None], ['empty string', '']]

The documentation suggests that what you want is not possible:

To make it as easy as possible to interface with modules which implement the DB API, the value None is written as the empty string.

This is stated in the documentation for csv.writer() rather than for any particular dialect, which suggests it applies to all dialects and is an intrinsic limitation of the csv module.
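A quick way to see the documented behavior for yourself, using only the standard library (a sketch for Python 3): None and the empty string are written identically, so the reader cannot tell them apart.

```python
import csv
import io

buf = io.StringIO(newline='')
# None and '' in the same row: csv.writer emits both as an empty field.
csv.writer(buf).writerow(['a', None, ''])
written = buf.getvalue()
print(repr(written))  # 'a,,\r\n'

# Reading it back, both fields come back as '' and are indistinguishable.
rows = list(csv.reader(io.StringIO(written, newline='')))
print(rows)  # [['a', '', '']]
```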

I for one would support changing this (along with various other limitations of the csv module), but people may prefer to offload this sort of work to a different library and keep the csv module simple (or at least as simple as it is).

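If you need more powerful file-reading capabilities, you might want to look at the CSV-reading functions in numpy, scipy, and pandas (for example numpy.genfromtxt and pandas.read_csv), which as I recall have more options. In pandas.read_csv, for instance, the na_values and keep_default_na parameters control which strings are treated as missing values.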
