How to output CDATA using ElementTree
lxml supports CDATA and has an API similar to ElementTree's.
After a bit of work, I found the answer myself. Looking at the ElementTree.py source code, I found special handling for XML comments and processing instructions. Each of these gets a factory function that creates an element whose tag is a special (non-string) value, which distinguishes it from regular elements.
def Comment(text=None):
    element = Element(Comment)
    element.text = text
    return element
Then, in the _write function of ElementTree that actually outputs the XML, there is a special case for comments:
if tag is Comment:
    file.write("<!-- %s -->" % _escape_cdata(node.text, encoding))
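The same sentinel-tag idea can still be observed in the xml.etree.ElementTree module that ships with Python 3. This is only a small sketch, assuming Python 3 and the standard library module; the variable names are mine and purely illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.Element("data")
comment = ET.Comment(" generated file ")
root.append(comment)

# The comment node is an ordinary Element whose tag is the factory
# function itself rather than a string.
is_sentinel = comment.tag is ET.Comment

# The serializer recognises that sentinel and emits comment syntax
# instead of a regular start/end tag pair.
serialized = ET.tostring(root, encoding="unicode")

print(is_sentinel)
print(serialized)
```

Because the tag is compared by identity, no string tag an application might use can collide with it.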
To support CDATA sections, I created a factory function called CDATA, extended the ElementTree class, and overrode the _write function to handle the CDATA elements.
This still doesn't help if you want to parse an XML document with CDATA sections and then output it again with the CDATA sections intact, but it at least lets you create XML documents with CDATA sections programmatically, which is what I needed to do.
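As an aside, if the round trip matters more than staying with ElementTree, the standard library's xml.dom.minidom is one alternative, since it keeps CDATA sections as distinct nodes. This is a different technique from the one in this answer, shown as a sketch assuming Python 3; the sample document is made up.

```python
from xml.dom import minidom

source = "<data><![CDATA[<b>raw & unescaped</b>]]></data>"
doc = minidom.parseString(source)

# The parser keeps the section as a CDATASection node, not plain text.
node = doc.documentElement.firstChild
is_cdata = node.nodeType == node.CDATA_SECTION_NODE

# New sections can also be created through the public DOM API.
doc.documentElement.appendChild(doc.createCDATASection("1 < 2"))

# Serializing the element writes both sections back out as CDATA.
round_tripped = doc.documentElement.toxml()
print(round_tripped)
```

The trade-off is that the DOM API is more verbose than ElementTree's.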
The implementation below seems to work with both ElementTree and cElementTree.
# Relies on the private ElementTree._write hook of the old elementtree package.
import elementtree.ElementTree as etree
#~ import cElementTree as etree

def CDATA(text=None):
    # Factory for a pseudo-element whose tag is the CDATA function itself.
    element = etree.Element(CDATA)
    element.text = text
    return element

class ElementTreeCDATA(etree.ElementTree):
    def _write(self, file, node, encoding, namespaces):
        if node.tag is CDATA:
            # Write the text unescaped, wrapped in a CDATA section.
            text = node.text.encode(encoding)
            file.write("\n<![CDATA[%s]]>\n" % text)
        else:
            # Everything else goes through the normal serializer.
            etree.ElementTree._write(self, file, node, encoding, namespaces)

if __name__ == "__main__":
    import sys

    text = """
<?xml version='1.0' encoding='utf-8'?>
<text>
This is just some sample text.
</text>
"""
    e = etree.Element("data")
    cdata = CDATA(text)
    e.append(cdata)
    et = ElementTreeCDATA(e)
    et.write(sys.stdout, "utf-8")
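The code above depends on _write, which as far as I can tell does not exist in the ElementTree bundled with Python 3. Below is a rough sketch of the same idea for Python 3, under several assumptions: it wraps the private functions ET._serialize_xml and ET._serialize, which are not documented and may change between versions; the marker tag CDATA_TAG and the wrapper name are my own; and a string marker is used instead of a function tag because the serializer appears to reject non-string tags other than comments and processing instructions. It also splits any "]]>" in the text across two sections, since that sequence cannot appear inside a single CDATA section.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical marker tag; a string so the namespace pre-pass accepts it.
CDATA_TAG = "![CDATA["

def CDATA(text=None):
    element = ET.Element(CDATA_TAG)
    element.text = text
    return element

# Private serializer hook (assumption: present with this signature).
_original_serialize_xml = ET._serialize_xml

def _serialize_xml_with_cdata(write, elem, qnames, namespaces,
                              short_empty_elements, **kwargs):
    if elem.tag == CDATA_TAG:
        # "]]>" would end the section early, so split it across two.
        body = (elem.text or "").replace("]]>", "]]]]><![CDATA[>")
        write("<![CDATA[%s]]>" % body)
        if elem.tail:
            write(escape(elem.tail))
        return
    _original_serialize_xml(write, elem, qnames, namespaces,
                            short_empty_elements, **kwargs)

# Patch the module-level name (used when recursing into children)
# and the dispatch table (used for the root element).
ET._serialize_xml = ET._serialize["xml"] = _serialize_xml_with_cdata

root = ET.Element("data")
root.append(CDATA("<text>sample ]]> text</text>"))
output = ET.tostring(root, encoding="unicode")
print(output)
```

Note that this patches the module globally, so every serialization in the process goes through the wrapper. Parsing the output back yields the original text, but as before the parser does not preserve the CDATA sections themselves.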