Python: use PyYAML and keep format

If you use ruamel.yaml¹, you can achieve this relatively easily by combining this and this answer here on StackOverflow.

By default, ruamel.yaml normalizes to an indent of 2 and drops superfluous quotes. As you don't seem to want that, you have to either set the indent explicitly or have ruamel.yaml analyse the input, and in both cases tell it to preserve quotes:

import sys
import ruamel.yaml
import ruamel.yaml.util

yaml_str = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""

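# load, guessing the mapping indent and the block sequence indent from the input
result, indent, block_seq_indent = ruamel.yaml.util.load_yaml_guess_indent(
    yaml_str, preserve_quotes=True)
result['nas']['mount_dirs'][0] = "haha"
# dump with the guessed indents so the layout matches the input
ruamel.yaml.round_trip_dump(result, sys.stdout, indent=indent,
                            block_seq_indent=block_seq_indent)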

Instead of the load_yaml_guess_indent() invocation, you can do:

result = ruamel.yaml.round_trip_load(yaml_str, preserve_quotes=True)
indent = 4
block_seq_indent = None

If you want haha to be (single) quoted in the output, make it a SingleQuotedScalarString:

import ruamel.yaml.scalarstring

result['nas']['mount_dirs'][0] = \
    ruamel.yaml.scalarstring.SingleQuotedScalarString("haha")

With that, the output will be:

nas:
    mount_dir: '/nvr'
    mount_dirs: ['haha', '/mount/data1', '/mount/data2']

(Given that your short example input has no block-style sequences, block_seq_indent cannot be determined and will be None.)


When using the newer API, you can control the indent of mappings and sequences separately:

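import sys
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=6, offset=3)  # not that that looks nice
data = yaml.load(some_stream)  # some_stream: an open file or a YAML string
yaml.dump(data, sys.stdout)    # or any other writable stream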

This will format your YAML consistently if it wasn't so to begin with, and will make no further changes after the first round-trip.


¹ Disclaimer: I am the author of that package.


ruamel.yaml unfortunately does not completely preserve the original formatting. Quoting its docs:

Although individual indentation of lines is not preserved, you can specify separate indentation levels for mappings and sequences (counting for sequences does not include the dash for a sequence element) and specific offset of block sequence dashes within that indentation.

I do not know of any Python library that does.

When I need to change a YAML file without touching its format, I reluctantly use regexps (reluctantly, as it's almost as bad as parsing XHTML with them).
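A minimal sketch of that regexp approach for the simplest case. `replace_scalar_value` is a hypothetical helper, not a library function; it assumes the key is unique in the file and its value fits on one line, and it drops any trailing comment on the edited line. Everything else is left byte-for-byte as it was.

```python
import re


def replace_scalar_value(text: str, key: str, new_value: str) -> str:
    """Replace the value on the first line of the form `<indent>key: value`.

    Naive sketch: assumes `key` is unique and its value is on one line;
    a trailing comment on that line is replaced along with the value.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r":[ \t]*)(.*)$", re.MULTILINE
    )
    # use a function as the replacement so backslashes in new_value
    # are not interpreted as group references
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value, text, count=1)
    if count == 0:
        raise KeyError(key)
    return new_text


src = """\
nas:
    mount_dir: '/nvr'
    mount_dirs: ['/mount/data0', '/mount/data1', '/mount/data2']
"""
print(replace_scalar_value(src, "mount_dir", "'/data'"), end="")
```

Because the pattern requires the colon directly after the key, `mount_dir` does not accidentally match the `mount_dirs:` line.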

Please feel free to suggest a better solution if you know of one; I would gladly learn about it!

Tags: Python, Yaml