What is a convenient way to store and retrieve boolean values in a CSV file?

Ways to store boolean values in CSV files

  • Strings: Common choices are true and false or True and False, but I've also seen yes and no.
  • Integers: 0 or 1
  • Floats: 0.0 or 1.0

Let's compare the respective advantages / disadvantages:

  • Strings:
    • + A human can read it
    • - CSV readers that load it as a string give you a value for which bool(...) is True in both cases, because any non-empty string (including "false") is truthy
  • Integers:
    • + CSV readers might detect that this column is integer, and bool(0) evaluates to False.
    • + A bit more space efficient
    • - Not totally clear that it is boolean
  • Floats:
    • + CSV readers might detect that this column is float, and bool(0.0) evaluates to False.
    • - Not totally clear that it is boolean
    • + Possible to have null (as NaN)

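The Pandas CSV reader shows the described behaviour for integers and floats. Note that by default it parses the spellings True / False and true / false as booleans, while other strings such as yes and no stay strings.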
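
The pitfalls from the comparison can also be reproduced without Pandas. The following is a minimal sketch using only Python's standard csv module, which returns every field as a string. The column names and the sample data are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: the same flag stored as string, integer and float.
data = "as_str,as_int,as_float\nFalse,0,0.0\nTrue,1,1.0\n"

rows = list(csv.DictReader(io.StringIO(data)))
first = rows[0]  # the row in which every column encodes "false"

# The csv module returns strings, and every non-empty string is truthy.
str_result = bool(first["as_str"])

# Once converted to a number, the truth value matches the stored flag.
int_result = bool(int(first["as_int"]))
float_result = bool(float(first["as_float"]))

# A float column can represent a missing value as NaN, but NaN itself is truthy.
nan_result = bool(float("nan"))

print(str_result, int_result, float_result, nan_result)
# True False False True
```

The last value is worth keeping in mind: if NaN is used for missing entries, check for it explicitly before applying bool.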

Convert Bool strings to Bool values

Have a look at mpu.string.str2bool:

>>> from mpu.string import str2bool
>>> str2bool('True')
True
>>> str2bool('1')
True
>>> str2bool('0')
False

which has the following implementation:

def str2bool(string_, default='raise'):
    """
    Convert a string to a bool.

    Parameters
    ----------
    string_ : str
    default : {'raise', False}
        Default behaviour if none of the "true" strings is detected.

    Returns
    -------
    boolean : bool

    Examples
    --------
    >>> str2bool('True')
    True
    >>> str2bool('1')
    True
    >>> str2bool('0')
    False
    """
    true = ['true', 't', '1', 'y', 'yes', 'enabled', 'enable', 'on']
    false = ['false', 'f', '0', 'n', 'no', 'disabled', 'disable', 'off']
    if string_.lower() in true:
        return True
    elif string_.lower() in false or (not default):
        return False
    else:
        raise ValueError('The value \'{}\' cannot be mapped to boolean.'
                         .format(string_))
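
To tie storing and retrieving together, here is a rough round-trip sketch. It assumes the standard csv module and an in-memory buffer instead of a real file. The helper to_bool is a hypothetical, simplified stand-in for str2bool that accepts fewer spellings, and the column names are invented for the example.

```python
import csv
import io


def to_bool(text):
    """Simplified stand-in for str2bool (hypothetical helper, fewer spellings)."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise ValueError("The value {!r} cannot be mapped to boolean.".format(text))


# Store: csv.writer calls str() on each value, so True is written as "True".
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "active"])
writer.writerow(["alice", True])
writer.writerow(["bob", False])

# Retrieve: every field comes back as a string, so convert the column explicitly.
buffer.seek(0)
restored = {row["name"]: to_bool(row["active"]) for row in csv.DictReader(buffer)}

print(restored)
# {'alice': True, 'bob': False}
```

Raising on unknown values, as str2bool does by default, makes typos in the file visible instead of silently turning them into False.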
