xml.parsers.expat.ExpatError: not well-formed (invalid token)
Python 3
One Liner
data: dict = xmltodict.parse(ElementTree.tostring(ElementTree.parse(path).getroot()))
Helper for .json
and .xml
I wrote a small helper function to load .json
and .xml
files from a given path
.
I thought it might come in handy for some people here:
import json
import xml.etree.ElementTree
def load_json(path: str) -> dict:
if path.endswith(".json"):
print(f"> Loading JSON from '{path}'")
with open(path, mode="r") as open_file:
content = open_file.read()
return json.loads(content)
elif path.endswith(".xml"):
print(f"> Loading XML as JSON from '{path}'")
xml = ElementTree.tostring(ElementTree.parse(path).getroot())
return xmltodict.parse(xml, attr_prefix="@", cdata_key="#text", dict_constructor=dict)
print(f"> Loading failed for '{path}'")
return {}
Notes
if you want to get rid of the
@
and#text
markers in the json output, use the parametersattr_prefix=""
andcdata_key=""
normally
xmltodict.parse()
returns anOrderedDict
but you can change that with the parameterdict_constructor=dict
Usage
path = "my_data.xml"
data = load_json(path)
print(json.dumps(data, indent=2))
# OUTPUT
#
# > Loading XML as JSON from 'my_data.xml'
# {
# "mydocument": {
# "@has": "an attribute",
# "and": {
# "many": [
# "elements",
# "more elements"
# ]
# },
# "plus": {
# "@a": "complex",
# "#text": "element as well"
# }
# }
# }
Sources
ElementTree.tostring()
ElementTree.parse()
xmltodict
json.dumps()
I think you forgot to define the encoding type. I suggest that you try to initialize that xml file to a string variable:
import xml.etree.ElementTree as ET
import xmltodict
import json
tree = ET.parse('your_data.xml')
xml_data = tree.getroot()
#here you can change the encoding type to be able to set it to the one you need
xmlstr = ET.tostring(xml_data, encoding='utf-8', method='xml')
data_dict = dict(xmltodict.parse(xmlstr))
In my case the file was being saved with a Byte Order Mark as is the default with notepad++
I resaved the file without the BOM
to plain utf8
.