Python YAML to JSON to YAML
Your file is losing its formatting because the original dump routine (in PyYAML versions before 5.1) by default writes all leaf collections, i.e. mappings and sequences that contain only scalars, in YAML flow style, whereas your input is block style all the way.
You are also losing the order of the keys: first because the JSON parser loads objects into a plain dict (which, before Python 3.7, did not preserve insertion order), and second because dump sorts the keys on output.
If you look at your intermediate JSON, you can already see that the key order is gone at that point. To preserve it, use the new ruamel.yaml API to load your YAML, and replace dump with a special JSON encoder that can handle the Mapping subclasses into which the YAML is loaded, similar to this example from the standard Python documentation.
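To see why the default() hook matters, here is a minimal stdlib-only sketch. The json module only consults default() for objects it cannot serialize natively, such as a Mapping that does not derive from dict. FrozenRecord below is a hypothetical class invented for this demonstration; it is not something ruamel.yaml provides.

```python
import json
from collections import OrderedDict
from collections.abc import Mapping, Sequence


class FrozenRecord(Mapping):
    """Hypothetical read-only Mapping that is NOT a dict subclass."""

    def __init__(self, pairs):
        self._pairs = list(pairs)  # keep insertion order

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._pairs)

    def __len__(self):
        return len(self._pairs)


class OrderlyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        # called only for objects json cannot serialize on its own
        if isinstance(o, Mapping):
            return OrderedDict(o)  # keeps the mapping's iteration order
        elif isinstance(o, Sequence):
            return list(o)
        return super().default(o)


record = FrozenRecord([('type', 'integer'), ('minimum', 2), ('default', 2)])

# the stock encoder gives up on a non-dict Mapping
try:
    json.dumps(record)
except TypeError as exc:
    print('plain encoder:', exc)

print(OrderlyJSONEncoder().encode(record))
```

The encoder converts such a mapping into an OrderedDict, which json then writes out key by key in the original order.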
Assuming your YAML is stored in input.yaml:
import sys
import json
from collections.abc import Mapping, Sequence
from collections import OrderedDict
import ruamel.yaml
# if you instantiate a YAML instance as yaml, you have to explicitly import the error
from ruamel.yaml.error import YAMLError

yaml = ruamel.yaml.YAML()  # this uses the new API
# if you have standard indentation, no need to use the following
yaml.indent(sequence=4, offset=2)

input_file = 'input.yaml'
intermediate_file = 'intermediate.json'
output_file = 'output.yaml'


class OrderlyJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Mapping):
            return OrderedDict(o)
        elif isinstance(o, Sequence):
            return list(o)
        return json.JSONEncoder.default(self, o)


def yaml_2_json(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = yaml.load(stream)
            with open(out_file, 'w') as output:
                output.write(OrderlyJSONEncoder(indent=2).encode(datamap))
        except YAMLError as exc:
            print(exc)
            return False
    return True


yaml_2_json(input_file, intermediate_file)
with open(intermediate_file) as fp:
    sys.stdout.write(fp.read())
which gives:
{
  "inputs": {
    "webTierCpu": {
      "type": "integer",
      "minimum": 2,
      "default": 2,
      "maximum": 5,
      "title": "Web Server CPU Count",
      "description": "The number of CPUs for the Web nodes"
    }
  }
}
You can see that your JSON has the appropriate key order, which we also need to preserve on loading. You can do that without subclassing anything: pass object_pairs_hook to json.load, so that JSON objects are loaded into the same Mapping subclass that the YAML parser uses internally (CommentedMap).
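A minimal sketch of what object_pairs_hook does, using collections.OrderedDict as a stand-in for CommentedMap so that it runs without ruamel.yaml installed. The hook is called for every JSON object, innermost first, with a list of (key, value) pairs in document order, and its return value is used in place of the dict json would otherwise build.

```python
import json
from collections import OrderedDict

text = '''{"inputs": {"webTierCpu": {"type": "integer", "minimum": 2,
          "default": 2, "maximum": 5}}}'''

# every JSON object, at every nesting level, is built by the hook;
# OrderedDict here stands in for ruamel.yaml's CommentedMap
data = json.loads(text, object_pairs_hook=OrderedDict)

cpu = data['inputs']['webTierCpu']
print(type(cpu).__name__)
print(list(cpu))
```

Swapping OrderedDict for CommentedMap gives the loaded data the exact type ruamel.yaml expects when dumping.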
from ruamel.yaml.comments import CommentedMap


def json_2_yaml(in_file, out_file):
    with open(in_file, 'r') as stream:
        try:
            datamap = json.load(stream, object_pairs_hook=CommentedMap)
            # if you need to "restore" literal style scalars, etc.
            # walk_tree(datamap)
            with open(out_file, 'w') as output:
                yaml.dump(datamap, output)
        # yaml is a YAML() instance, which has no YAMLError attribute,
        # so use the imported class; JSON errors are caught as well
        except (json.JSONDecodeError, YAMLError) as exc:
            print(exc)
            return False
    return True


json_2_yaml(intermediate_file, output_file)
with open(output_file) as fp:
    sys.stdout.write(fp.read())
Which outputs:
inputs:
  webTierCpu:
    type: integer
    minimum: 2
    default: 2
    maximum: 5
    title: Web Server CPU Count
    description: The number of CPUs for the Web nodes
I hope that is similar enough to your original input to be acceptable.
Notes:
When using the new API, I tend to use yaml as the name of the instance of ruamel.yaml.YAML(), instead of from ruamel import yaml. That however masks yaml.YAMLError, because the error class is not an attribute of YAML(), which is why the code above imports YAMLError from ruamel.yaml.error explicitly.
If you are developing this kind of stuff, I recommend separating at least the user input from the actual functionality. It should be trivial to write your parseyaml and parsejson to call yaml_2_json and json_2_yaml respectively.
Any comments in your original YAML file will be lost, although ruamel.yaml can load them. JSON originally did allow comments, but they are not in the specification, and no JSON parser that I know of can output comments.
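A quick stdlib check of the comments point: Python's json module rejects a // comment outright, so there is nothing it could carry through the JSON step. accepts_comments is a hypothetical helper written only for this illustration.

```python
import json


def accepts_comments(text):
    # hypothetical helper: True if the stdlib json parser accepts the text
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        print('rejected:', exc.msg, 'at line', exc.lineno)
        return False
    return True


doc = '''{
  // Web tier settings
  "webTierCpu": 2
}'''

print(accepts_comments(doc))
```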
Since your real file has literal block scalars, you have to use some magic to get those back.
Include the following functions, which walk the tree, recursing into mapping values and sequence elements, and modify it in place (hence no return value): any string with an embedded newline is converted to a type that ruamel.yaml dumps as a literal block style scalar, and any string containing ${ or : is forced into single quotes.
from collections.abc import MutableMapping, MutableSequence
from ruamel.yaml.scalarstring import LiteralScalarString, SingleQuotedScalarString


def preserve_literal(s):
    # normalize line endings, then mark the string for literal block style
    return LiteralScalarString(s.replace('\r\n', '\n').replace('\r', '\n'))


def walk_tree(base):
    if isinstance(base, MutableMapping):
        for k in base:
            v = base[k]
            if isinstance(v, str):
                if '\n' in v:
                    base[k] = preserve_literal(v)
                elif '${' in v or ':' in v:
                    # force single quotes on strings containing ${ or :
                    base[k] = SingleQuotedScalarString(v)
            else:
                walk_tree(v)
    elif isinstance(base, MutableSequence):
        for idx, elem in enumerate(base):
            if isinstance(elem, str):
                if '\n' in elem:
                    base[idx] = preserve_literal(elem)
                elif '${' in elem or ':' in elem:
                    base[idx] = SingleQuotedScalarString(elem)
            else:
                walk_tree(elem)
And then call walk_tree(datamap) after you load the data from JSON.
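To check the walking logic without ruamel.yaml installed, here is a stdlib-only sketch of the same in-place walk. Literal and SingleQuoted are hypothetical marker classes standing in for ruamel.yaml's scalar string types, and convert() is a hypothetical helper that applies the same two rules as walk_tree.

```python
from collections.abc import MutableMapping, MutableSequence


class Literal(str):
    """Hypothetical stand-in for ruamel.yaml's literal scalar string."""


class SingleQuoted(str):
    """Hypothetical stand-in for ruamel.yaml's single-quoted scalar string."""


def convert(s):
    # same rules as walk_tree: newline -> literal block, ${ or : -> quoted
    if '\n' in s:
        return Literal(s.replace('\r\n', '\n').replace('\r', '\n'))
    if '${' in s or ':' in s:
        return SingleQuoted(s)
    return s


def walk_tree(base):
    # rewrite string leaves in place and recurse into nested containers
    if isinstance(base, MutableMapping):
        for k in base:
            v = base[k]
            if isinstance(v, str):
                base[k] = convert(v)
            else:
                walk_tree(v)
    elif isinstance(base, MutableSequence):
        for idx, elem in enumerate(base):
            if isinstance(elem, str):
                base[idx] = convert(elem)
            else:
                walk_tree(elem)


data = {
    'script': '#!/bin/sh\r\necho hi\r\n',
    'url': 'http://example.com',
    'items': ['plain', 'a: b', {'tpl': '${HOME}'}],
}
walk_tree(data)
for key, value in data.items():
    print(key, type(value).__name__)
```

Because only existing keys are reassigned, modifying the dict while iterating over it is safe here; no keys are added or removed.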
With all of the above, you should have only one line that differs in your Wordpress.yaml file.
function yaml_validate {
    python -c 'import sys, yaml, json; yaml.safe_load(sys.stdin.read())'
}

function yaml2json {
    python -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read())))'
}

function yaml2json_pretty {
    python -c 'import sys, yaml, json; print(json.dumps(yaml.safe_load(sys.stdin.read()), indent=2, sort_keys=False))'
}

function json_validate {
    python -c 'import sys, yaml, json; json.loads(sys.stdin.read())'
}

function json2yaml {
    python -c 'import sys, yaml, json; print(yaml.dump(json.loads(sys.stdin.read())))'
}
More useful Bash tricks at http://github.com/frgomes/bash-scripts