Remove namespace and prefix from xml in python using lxml
You could also use XSLT to strip the namespaces...
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="*" priority="1">
<xsl:element name="{local-name()}" namespace="">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}" namespace="">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
Python
from lxml import etree
tree = etree.parse("metadata.xml")
xslt = etree.parse("test.xsl")
new_tree = tree.xslt(xslt)
print(etree.tostring(new_tree, pretty_print=True, xml_declaration=True,
encoding="UTF-8").decode("UTF-8"))
Output
<?xml version='1.0' encoding='UTF-8'?>
<package>
<provider>some data</provider>
<language>en-GB</language>
</package>
Replace tag as Uku Loskit suggests. In addition to that, use lxml.objectify.deannotate.
from lxml import etree, objectify
metadata = '/Users/user1/Desktop/Python/metadata.xml'
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()
####
for elem in root.getiterator():
if not hasattr(elem.tag, 'find'): continue # guard for Comment tags
i = elem.tag.find('}')
if i >= 0:
elem.tag = elem.tag[i+1:]
objectify.deannotate(root, cleanup_namespaces=True)
####
tree.write('/Users/user1/Desktop/Python/done.xml',
pretty_print=True, xml_declaration=True, encoding='UTF-8')
Note: Some tags like Comment
return a function when accessing tag
attribute. added a guard for that.
We can get the desired output document in two steps:
- Remove namespace URIs from element names
- Remove unused namespace declarations from the XML tree
Example code
from lxml import etree
input_xml = """
<package xmlns="http://apple.com/itunes/importer">
<provider>some data</provider>
<language>en-GB</language>
<!-- some comment -->
<?xml-some-processing-instruction ?>
</package>
"""
root = etree.fromstring(input_xml)
# Iterate through all XML elements
for elem in root.getiterator():
# Skip comments and processing instructions,
# because they do not have names
if not (
isinstance(elem, etree._Comment)
or isinstance(elem, etree._ProcessingInstruction)
):
# Remove a namespace URI in the element's name
elem.tag = etree.QName(elem).localname
# Remove unused namespace declarations
etree.cleanup_namespaces(root)
print(etree.tostring(root).decode())
Output XML
<package>
<provider>some data</provider>
<language>en-GB</language>
<!-- some comment -->
<?xml-some-processing-instruction ?>
</package>
Details explaining the code
As described in the documentation, we use lxml.etree.QName.localname
to get local names of elements, that is names without namespace URIs. Then we replace the fully qualified names of the elements by their local names.
Some XML elements, such as comments and processing instructions do not have names. So, we have to skip these elements while replacing element names, otherwise a ValueError
will be raised.
Finally, we use lxml.etree.cleanup_namespaces()
to remove unused namespace declarations from the XML tree.
import xml.etree.ElementTree as ET
def remove_namespace(doc, namespace):
"""Remove namespace in the passed document in place."""
ns = u'{%s}' % namespace
nsl = len(ns)
for elem in doc.getiterator():
if elem.tag.startswith(ns):
elem.tag = elem.tag[nsl:]
metadata = '/Users/user1/Desktop/Python/metadata.xml'
tree = ET.parse(metadata)
root = tree.getroot()
remove_namespace(root, u'http://apple.com/itunes/importer')
tree.write('/Users/user1/Desktop/Python/done.xml',
pretty_print=True, xml_declaration=True, encoding='UTF-8')
Used a snippet of code from here This method could be easily extended to delete any namespace attributes by searching for tags that begin with "xmlns"