What does the Java Node normalize() method do?
There are many possible DOM trees that correspond to the same XML structure, and each XML structure has at least one corresponding DOM tree. So the conversion from DOM to XML is surjective but not injective. It may therefore happen that:
dom_tree_1 != dom_tree_2
# but:
dom_tree_1.save_DOM_as_XML() == dom_tree_2.save_DOM_as_XML()
And there is no way to ensure:
dom_tree == dom_tree.save_DOM_as_XML().load_DOM_from_XML()
But we would like the conversion to be bijective, meaning each XML structure corresponds to exactly one DOM tree.
To get that, you can define a subset of all possible DOM trees (the normalized ones) that is in bijection with the set of all possible XML structures.
# still:
dom_tree.save_DOM_as_XML() == dom_tree.normalize().save_DOM_as_XML()
# but with:
dom_tree_n = dom_tree.normalize()
# we now even have:
dom_tree_n == dom_tree_n.save_DOM_as_XML().load_DOM_from_XML().normalize()
So normalized DOM trees can be perfectly reconstructed from their XML representation. There is no information loss.
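The pseudocode above can be tried out with the JDK's built-in DOM and JAXP classes. The following is only a sketch: the class `RoundTripDemo` and its helpers `build`, `toXml` and `fromXml` are hypothetical names made up for this illustration, and it assumes the default parser and transformer shipped with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class RoundTripDemo {

    /** Builds a document whose root <foo> has one Text child per given part. */
    static Document build(String... parts) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            for (String part : parts) {
                foo.appendChild(doc.createTextNode(part));
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Plays the role of save_DOM_as_XML() from the pseudocode. */
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Plays the role of load_DOM_from_XML() from the pseudocode. */
    static Document fromXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document raw = build("Hello ", "wor", "", "ld");
        Document normalized = build("Hello ", "wor", "", "ld");
        normalized.getDocumentElement().normalize();

        // Different trees ...
        System.out.println(raw.isEqualNode(normalized));
        // ... but the same XML text.
        System.out.println(toXml(raw).equals(toXml(normalized)));

        // Parsing the XML back yields the single-Text-node shape.
        Element reparsed = fromXml(toXml(raw)).getDocumentElement();
        System.out.println(reparsed.getChildNodes().getLength());
    }
}
```

The first check uses `Node.isEqualNode`, which compares children one by one, so the four-Text-node tree and the one-Text-node tree are reported as different even though they serialize identically.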
It cleans the tree of adjacent Text nodes and empty Text nodes.
You can programmatically build a DOM tree that has extraneous structure not corresponding to actual XML structures, specifically things like multiple text nodes next to each other, or empty text nodes. The normalize() method removes these, i.e. it combines adjacent text nodes and removes empty ones.
This can be useful when you have other code that expects DOM trees to always look like something built from an actual XML document.
This basically means that the following XML element
<foo>Hello world</foo>
could be represented like this in a denormalized node:
Element foo
    Text node: ""
    Text node: "Hello "
    Text node: "wor"
    Text node: "ld"
When normalized, the node will look like this:
Element foo
    Text node: "Hello world"
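For reference, here is a minimal sketch that rebuilds the denormalized element by hand and then calls normalize() on it. The class name `NormalizeDemo` and the helper `buildDenormalizedFoo` are made up for this example; everything else is the standard `org.w3c.dom` API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of any library.
class NormalizeDemo {

    /** Builds <foo> with four Text children: "", "Hello ", "wor", "ld". */
    static Element buildDenormalizedFoo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element foo = doc.createElement("foo");
            doc.appendChild(foo);
            for (String part : new String[] {"", "Hello ", "wor", "ld"}) {
                foo.appendChild(doc.createTextNode(part));
            }
            return foo;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element foo = buildDenormalizedFoo();
        System.out.println(foo.getChildNodes().getLength());    // 4

        // Merges adjacent Text nodes and drops the empty one.
        foo.normalize();
        System.out.println(foo.getChildNodes().getLength());    // 1
        System.out.println(foo.getFirstChild().getNodeValue()); // Hello world
    }
}
```

Note that normalize() works in place on the node and its whole subtree and returns nothing, so there is no new tree to assign.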