How to parse/read a YAML file into a Python object?

If your YAML file looks like this:

# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1

And you've installed PyYAML like this:

pip install PyYAML

And the Python code looks like this:

import yaml
with open('tree.yaml') as f:
    # use safe_load instead of load: it will not construct arbitrary Python objects
    dataMap = yaml.safe_load(f)

The variable dataMap now contains a dictionary with the tree data. If you print dataMap using the pprint module, you will get something like:

{
    'treeroot': {
        'branch1': {
            'branch1-1': {
                'name': 'Node 1-1'
            },
            'name': 'Node 1'
        },
        'branch2': {
            'branch2-1': {
                'name': 'Node 2-1'
            },
            'name': 'Node 2'
        }
    }
}
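The key order in that output differs from the file because pprint sorts dictionary keys by default. Here is a small sketch of keeping the original order and reading a nested value. The dict literal stands in for part of the parsed result, the name data_map is only for illustration, and sort_dicts needs Python 3.8 or later:

```python
from pprint import pformat

# Stand-in for part of what yaml.safe_load returns for the sample file
data_map = {
    "treeroot": {
        "branch1": {
            "name": "Node 1",
            "branch1-1": {"name": "Node 1-1"},
        }
    }
}

# Default behaviour: keys are sorted, so 'branch1-1' comes before 'name'
sorted_text = pformat(data_map)

# sort_dicts=False keeps insertion order, i.e. the order in the file
ordered_text = pformat(data_map, sort_dicts=False)

print(sorted_text)
print(ordered_text)

# Nested values are reached by chaining keys
print(data_map["treeroot"]["branch1"]["branch1-1"]["name"])
```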

So, now we have seen how to get data into our Python program. Saving data is just as easy:

with open('newtree.yaml', "w") as f:
    yaml.dump(dataMap, f)
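If you want the saved file to stay in block style with keys in their original order, something like the following may help. It is a minimal sketch that dumps to a string rather than a file, uses an illustrative data_map, and assumes PyYAML 5.1 or later for the sort_keys argument:

```python
import yaml

# Illustrative data; in practice this would be the dictionary you loaded
data_map = {
    "treeroot": {
        "branch1": {"name": "Node 1"},
        "branch2": {"name": "Node 2"},
    }
}

# safe_dump only emits standard YAML tags; with no stream it returns a string
text = yaml.safe_dump(data_map, default_flow_style=False, sort_keys=False)
print(text)

# Loading the text back should give an equal dictionary
restored = yaml.safe_load(text)
print(restored == data_map)
```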

You have a dictionary, and now you have to convert it to a Python object:

class Struct:
    def __init__(self, **entries): 
        self.__dict__.update(entries)

Then you can use:

>>> args = dataMap  # the dictionary loaded from YAML
>>> s = Struct(**args)
>>> s
<__main__.Struct object at 0x01D6A738>
>>> s...

and follow "Convert Python dict to object".
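Note that Struct only converts the top level, so nested values are still plain dictionaries. One possible way to convert the whole tree is sketched below using types.SimpleNamespace from the standard library. The helper name to_namespace is hypothetical, and the sketch assumes every dictionary key is a string:

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively turn dicts (including dicts inside lists) into namespaces."""
    if isinstance(value, dict):
        # Assumes all keys are strings; YAML also allows e.g. integer keys
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


# Stand-in for part of the dictionary loaded from the sample file
data_map = {
    "treeroot": {
        "branch1": {
            "name": "Node 1",
            "branch1-1": {"name": "Node 1-1"},
        }
    }
}

tree = to_namespace(data_map)
print(tree.treeroot.branch1.name)

# Keys that are not valid identifiers need getattr instead of dot access
print(getattr(tree.treeroot.branch1, "branch1-1").name)
```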

For more information, see pyyaml.org.


I wrote an implementation using named tuples, which I find neat because the result is easy to read. It also handles nested dictionaries and lists. The parser code is as follows:

from collections import namedtuple


class Dict2ObjParser:
    def __init__(self, nested_dict):
        self.nested_dict = nested_dict

    def parse(self):
        nested_dict = self.nested_dict
        if (obj_type := type(nested_dict)) is not dict:
            raise TypeError(f"Expected 'dict' but found '{obj_type}'")
        return self._transform_to_named_tuples("root", nested_dict)

    def _transform_to_named_tuples(self, tuple_name, possibly_nested_obj):
        if type(possibly_nested_obj) is dict:
            named_tuple_def = namedtuple(tuple_name, possibly_nested_obj.keys())
            transformed_value = named_tuple_def(
                *[
                    self._transform_to_named_tuples(key, value)
                    for key, value in possibly_nested_obj.items()
                ]
            )
        elif type(possibly_nested_obj) is list:
            transformed_value = [
                self._transform_to_named_tuples(f"{tuple_name}_{i}", item)
                for i, item in enumerate(possibly_nested_obj)
            ]
        else:
            transformed_value = possibly_nested_obj

        return transformed_value

I tested basic cases with the following code:

x = Dict2ObjParser({
    "a": {
        "b": 123,
        "c": "Hello, World!"
    },
    "d": [
        1,
        2,
        3
    ],
    "e": [
        {
            "f": "",
            "g": None
        },
        {
            "f": "Foo",
            "g": "Bar"
        },
        {
            "h": "Hi!",
            "i": None
        }
    ],
    "j": 456,
    "k": None
}).parse()

print(x)

It gives the following output:

root(a=a(b=123, c='Hello, World!'), d=[1, 2, 3], e=[e_0(f='', g=None), e_1(f='Foo', g='Bar'), e_2(h='Hi!', i=None)], j=456, k=None)

Formatted for readability, it looks like this:

root(
    a=a(
        b=123,
        c='Hello, World!'
    ),
    d=[1, 2, 3],
    e=[
        e_0(
            f='',
            g=None
        ),
        e_1(
            f='Foo',
            g='Bar'
        ),
        e_2(
            h='Hi!',
            i=None
        )
    ],
    j=456,
    k=None
)

And I can access the nested fields like any other object:

print(x.a.b)  # Prints: 123

In your case, the code would look as follows:

import yaml


with open(file_path, "r") as stream:  # file_path: path to your YAML file
    nested_dict = yaml.safe_load(stream)

nested_obj = Dict2ObjParser(nested_dict).parse()
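One caveat with the named tuple approach: field names must be valid Python identifiers, so a key such as branch1-1 from the sample file at the top of this thread is rejected. The sketch below shows the error and one possible workaround, namedtuple's rename=True option, which replaces invalid names with positional ones. Whether that is acceptable depends on your data, and renaming the keys yourself is another option:

```python
from collections import namedtuple

keys = ["name", "branch1-1"]

# A hyphenated key is not a valid identifier, so namedtuple refuses it
try:
    namedtuple("branch1", keys)
    rejected = False
except ValueError as exc:
    rejected = True
    print(f"rejected: {exc}")

# rename=True swaps each invalid name for an underscore plus its position
Branch = namedtuple("branch1", keys, rename=True)
node = Branch("Node 1", {"name": "Node 1-1"})

print(Branch._fields)
print(node.name)
```

The renamed field is then reached by its positional name (here node._1) or by index, so the original key name is lost.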

I hope this helps!


From http://pyyaml.org/wiki/PyYAMLDocumentation:

add_path_resolver(tag, path, kind) adds a path-based implicit tag resolver. A path is a list of keys that form a path to a node in the representation graph. Paths elements can be string values, integers, or None. The kind of a node can be str, list, dict, or None.

#!/usr/bin/env python
import yaml

class Person(yaml.YAMLObject):
  yaml_tag = '!person'

  def __init__(self, name):
    self.name = name

yaml.add_path_resolver('!person', ['Person'], dict)

# yaml.Loader can construct arbitrary objects; use it only on trusted input
data = yaml.load("""
Person:
  name: XYZ
""", Loader=yaml.Loader)

print(data)
# {'Person': <__main__.Person object at 0x7f2b251ceb10>}

print(data['Person'].name)
# XYZ