Text based data format which supports multiline strings

If you are OK with the markup overhead, XML works, parsed with ElementTree (standard library) or lxml:

Data

<?xml version="1.0"?>
<data>
  <string>Lorem
Ipsum
Dolor
  </string>
</data>

Script

import xml.etree.ElementTree as ET

# Parse the file and walk the direct children of <data>
root = ET.parse('data.xml').getroot()
for child in root:
    print(child.tag, child.attrib, child.text)

Output

string {} Lorem
Ipsum
Dolor
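Note that the element text keeps the whitespace around the closing tag, which is why the output above ends with an extra line. A minimal sketch of trimming it, assuming the same document is held in a string (the name XML_SOURCE is only illustrative):

```python
import xml.etree.ElementTree as ET

# Same document as data.xml above, inlined so the sketch is self-contained
XML_SOURCE = """<?xml version="1.0"?>
<data>
  <string>Lorem
Ipsum
Dolor
  </string>
</data>
"""

root = ET.fromstring(XML_SOURCE)
raw = root.find("string").text   # includes the newline and indent before </string>
clean = raw.strip()              # drop leading/trailing whitespace only

print(repr(raw))
print(repr(clean))
```

The inner newlines survive; only the padding at the ends is removed.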

Apropos of your comment:

I want to use it for configuration. A lot of applications invent their own configuration language. I want to avoid this. But json and ConfigParser don't satisfy me. Json does not allow strings with newlines (only \n) and ConfigParser does not allow nested data structures. Next thing that I am missing: Validation (But this is a different topic).

You have three main options: ConfigParser, ConfigObj, or YAML (PyYAML), each with its own pros and cons. All three are better than JSON for your use case, i.e. a configuration file.

Now further, which one is better depends upon what exactly you want to store in your conf file.
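As a rough illustration of the ConfigParser option: the standard-library configparser module accepts multiline values written as indented continuation lines, but every value comes back as a flat string, so there is no nesting below the section level. The section and key names here are made up for the example.

```python
import configparser

# Hypothetical config text; continuation lines must be indented deeper than the key
INI_SOURCE = """\
[message]
body = Lorem
    Ipsum
    Dolor
subject = greeting
"""

parser = configparser.ConfigParser()
parser.read_string(INI_SOURCE)

body = parser["message"]["body"]        # continuation lines are joined with newlines
subject = parser["message"]["subject"]  # plain single-line value

print(repr(body))
print(repr(subject))
```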

ConfigObj - For configuration and validation (your use-case):

ConfigObj is simpler to use than YAML (and than ConfigParser). It supports default values and types, and also includes validation (a huge plus over ConfigParser).

An Introduction to ConfigObj

When you perform validation, each of the members in your specification are checked and they undergo a process that converts the values into the specified type. Missing values that have defaults will be filled in, and validation returns either True to indicate success or a dictionary with members that failed validation. The individual checks and conversions are performed by functions, and adding your own check function is very easy.

P.S. Yes, it allows multiline values.

Helpful links:

A Brief ConfigObj Tutorial

ConfigObj 5 Introduction and Reference

There are solid Stack Overflow answers comparing YAML, ConfigParser, and ConfigObj:

What's better, ConfigObj or ConfigParser?

ConfigObj/ConfigParser vs. using YAML for Python settings file

I think you should consider the YAML format. It supports block notation, which preserves newlines like this:

data: |
   There once was a short man from Ealing
   Who got on a bus to Darjeeling
       It said on the door
       "Please don't spit on the floor"
   So he carefully spat on the ceiling

There are also plenty of parsers for all kinds of programming languages, including Python (e.g. PyYAML).

Another big advantage is that valid JSON is, with rare exceptions, also valid YAML (YAML 1.2 was designed as a superset of JSON).
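A small sketch of reading the block scalar above, assuming the third-party PyYAML package is installed (pip install pyyaml). The JSON snippet at the end is invented purely to show the same loader accepting JSON input.

```python
import json

import yaml  # third-party: PyYAML (assumed to be installed)

YAML_SOURCE = """\
data: |
   There once was a short man from Ealing
   Who got on a bus to Darjeeling
       It said on the door
       "Please don't spit on the floor"
   So he carefully spat on the ceiling
"""

config = yaml.safe_load(YAML_SOURCE)
limerick = config["data"]  # newlines and the extra indentation are preserved
print(limerick)

# Hypothetical JSON document, loaded with the YAML parser and with json
JSON_SOURCE = '{"name": "demo", "ports": [80, 443], "debug": false}'
from_yaml = yaml.safe_load(JSON_SOURCE)
from_json = json.loads(JSON_SOURCE)
print(from_yaml == from_json)
```

The block's indentation is taken from its first line, so the two deeper lines keep their four extra leading spaces.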

If the files are only used by Python (leaving interchange with other tools aside), you could simply put your data in a Python script file and import it as a module:

Data

datum_1 = """ lorem
ipsum
dolor
"""
datum_list = [1, """two
liner"""]
datum_dict = {"key": None, "another": [None, 42.13]}
datum_tuple = ("anything", "goes")

Script

from data import *

# Collect every public name the wildcard import brought in
d = [e for e in locals() if not e.startswith("__")]
print(d)
for k in d:
    print(k, locals()[k])

Output

['datum_list', 'datum_1', 'datum_dict', 'datum_tuple']
datum_list [1, 'two\nliner']
datum_1  lorem
ipsum
dolor

datum_dict {'another': [None, 42.13], 'key': None}
datum_tuple ('anything', 'goes')

Update:

Code with dictionary comprehension

from data import *
d = {e: globals()[e] for e in globals() if not e.startswith("__")}
for k in d:
    print(k, d[k])
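A variation that avoids the wildcard import, so the data never mixes with the script's own names: load the file as a module object and read its attributes. This is only a sketch; the helper load_settings and the temporary data.py it writes are assumptions made to keep the example self-contained.

```python
import importlib.util
import tempfile
from pathlib import Path

# Stand-in for the data file shown above (shortened)
DATA_SOURCE = '''datum_1 = """ lorem
ipsum
dolor
"""
datum_list = [1, """two
liner"""]
'''


def load_settings(path):
    """Execute a Python file as a module and return its public names as a dict."""
    spec = importlib.util.spec_from_file_location("data", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return {name: value for name, value in vars(module).items()
            if not name.startswith("_")}


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.py"
    data_file.write_text(DATA_SOURCE)
    settings = load_settings(data_file)

for name, value in settings.items():
    print(name, repr(value))
```

Either way the file is executed as code, so this only suits configuration you trust.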


