How to query XML using namespaces in Java with XPath?
All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, the Java SE API does not provide a public, general-purpose implementation of NamespaceContext.
Fortunately, it's very easy to write your own:
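import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

/** Maps a fixed set of prefixes to namespace URIs for use in XPath expressions. */
public class SimpleNamespaceContext implements NamespaceContext {

    private final Map<String, String> prefixMap = new HashMap<String, String>();

    public SimpleNamespaceContext(final Map<String, String> prefixMap) {
        this.prefixMap.putAll(prefixMap);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixMap.get(prefix);
        // The NamespaceContext contract asks for NULL_NS_URI ("") for unbound prefixes
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    // Not needed for XPath evaluation
    @Override
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // Not needed for XPath evaluation
    @Override
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
}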
Use it like this:
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// Map prefixes of your choosing to the namespace URIs used in the document
Map<String, String> prefMap = new HashMap<String, String>();
prefMap.put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");
prefMap.put("r", "http://schemas.openxmlformats.org/officeDocument/2006/relationships");
SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath.compile("/main:workbook/main:sheets/main:sheet[1]");
// doc is a DOM Document parsed with a namespace-aware DocumentBuilderFactory
NodeList result = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
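To see the pieces working together, here is a self-contained sketch that parses a small workbook fragment from a string and runs the same expression. The class name FirstSheetDemo and the sample XML are hypothetical, and the contextOf helper simply mirrors SimpleNamespaceContext so the example compiles on its own.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FirstSheetDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hypothetical, cut-down workbook.xml; real files contain many more elements
    static final String XML =
        "<workbook xmlns='" + MAIN_NS + "'>"
      + "<sheets><sheet name='First'/><sheet name='Second'/></sheets>"
      + "</workbook>";

    // Same idea as SimpleNamespaceContext: a fixed prefix-to-URI map
    static NamespaceContext contextOf(final Map<String, String> prefixes) {
        return new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                String uri = prefixes.get(prefix);
                return uri != null ? uri : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
            @Override
            public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
        };
    }

    static String firstSheetName() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-qualified XPath steps
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        Map<String, String> prefixes = new HashMap<String, String>();
        prefixes.put("main", MAIN_NS);
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(contextOf(prefixes));

        NodeList sheets = (NodeList) xpath
            .compile("/main:workbook/main:sheets/main:sheet[1]")
            .evaluate(doc, XPathConstants.NODESET);
        return ((Element) sheets.item(0)).getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstSheetName()); // prints "First"
    }
}
```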
Note that even though the first namespace does not specify a prefix in the source document (i.e., it is the default namespace), you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you've chosen, like this:
/main:workbook/main:sheets/main:sheet[1]
The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.
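As a quick illustration that only the URI matters, the following sketch (hypothetical class name and sample XML) writes the same namespace once as a default namespace and once with an ss: prefix, then queries both with a prefix x that appears in neither document.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ArbitraryPrefixDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // The same namespace written two ways: as the default namespace, and with an "ss" prefix
    static final String DEFAULT_NS_XML =
        "<workbook xmlns='" + MAIN_NS + "'><sheets><sheet/></sheets></workbook>";
    static final String PREFIXED_XML =
        "<ss:workbook xmlns:ss='" + MAIN_NS + "'><ss:sheets><ss:sheet/></ss:sheets></ss:workbook>";

    static int countSheets(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // "x" appears in neither document; only the URI it is bound to matters
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                return "x".equals(prefix) ? MAIN_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
            @Override
            public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
        });

        Double n = (Double) xpath.evaluate(
            "count(/x:workbook/x:sheets/x:sheet)", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countSheets(DEFAULT_NS_XML)); // 1
        System.out.println(countSheets(PREFIXED_XML));   // 1
    }
}
```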
In the second example XML file, the elements are bound to a namespace. Your XPath is attempting to address elements that are in no namespace (which is what unprefixed names mean in XPath 1.0), so they don't match.
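The mismatch is easy to reproduce. This sketch assumes a hypothetical document whose elements are placed in the SpreadsheetML namespace by a default xmlns declaration; the unprefixed path should count zero matches, while the prefixed path, backed by a NamespaceContext, finds the element.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoNamespaceMismatchDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Every element below is in MAIN_NS because of the default namespace declaration
    static final String XML =
        "<workbook xmlns='" + MAIN_NS + "'><sheets><sheet/></sheets></workbook>";

    static int count(String path) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                return "main".equals(prefix) ? MAIN_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
            @Override
            public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
        });

        Double n = (Double) xpath.evaluate("count(" + path + ")", doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed name tests select elements in no namespace, so nothing matches
        System.out.println(count("/workbook/sheets/sheet"));
        // Prefixed name tests resolve "main" through the NamespaceContext
        System.out.println(count("/main:workbook/main:sheets/main:sheet"));
    }
}
```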
The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.
However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.
You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and namespace-uri(). For example:
/*[local-name()='workbook'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheets'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheet'
and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]
As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
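For comparison, the predicate-based form needs no NamespaceContext at all. The sketch below (hypothetical class name and sample XML) assembles exactly the expression above with a small step helper, which keeps the Java source readable even though the resulting XPath string is just as long.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class LocalNameNamespaceUriDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    static final String XML =
        "<workbook xmlns='" + MAIN_NS + "'>"
      + "<sheets><sheet name='First'/><sheet name='Second'/></sheets>"
      + "</workbook>";

    // Produces one "/*[local-name()='...' and namespace-uri()='...']" location step
    static String step(String localName) {
        return "/*[local-name()='" + localName + "' and namespace-uri()='" + MAIN_NS + "']";
    }

    static String firstSheetName() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // so that namespace-uri() sees each element's namespace
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        String expr = step("workbook") + step("sheets") + step("sheet") + "[1]";
        // No setNamespaceContext call: the expression contains no prefixes
        Element sheet = (Element) XPathFactory.newInstance().newXPath()
            .evaluate(expr, doc, XPathConstants.NODE);
        return sheet.getAttribute("name");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstSheetName()); // First
    }
}
```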
You could also just match on the local-name() of the element and ignore the namespace. For example:
/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]
However, you run the risk of matching the wrong elements. If your XML mixes vocabularies (which may not be an issue in this instance) that use the same local-name(), your XPath could match the wrong elements and select the wrong content.
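Here is a small sketch of that risk. The document and the second namespace URI urn:example:other are hypothetical: a foreign sheet element appears before the spreadsheet one, so the local-name()-only expression matches both, and its [1] picks the foreign element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MixedVocabularyDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static final String OTHER_NS = "urn:example:other"; // hypothetical second vocabulary

    // Two different "sheet" elements share a local name but not a namespace
    static final String XML =
        "<workbook xmlns='" + MAIN_NS + "' xmlns:o='" + OTHER_NS + "'>"
      + "<sheets><o:sheet/><sheet name='First'/></sheets>"
      + "</workbook>";

    static final String LOCAL_ONLY =
        "/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet']";

    static Document parse() throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
    }

    static int countLocalNameOnly() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Double n = (Double) xpath.evaluate("count(" + LOCAL_ONLY + ")", parse(), XPathConstants.NUMBER);
        return n.intValue();
    }

    // Namespace URI of the element that LOCAL_ONLY + "[1]" actually selects
    static String firstMatchNamespace() throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node first = (Node) xpath.evaluate(LOCAL_ONLY + "[1]", parse(), XPathConstants.NODE);
        return first.getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countLocalNameOnly());  // 2: both sheet elements match
        System.out.println(firstMatchNamespace()); // urn:example:other, not the spreadsheet one
    }
}
```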
Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html
One of the conclusions they draw is:
So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping
Note that this doesn't mean that you have to change your source document in any way (though you're free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your Java code and use that prefix in your XPath expression. Here, we'll create a mapping from spreadsheet to your default namespace.
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
// there's no default implementation for NamespaceContext...seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
public String getNamespaceURI(String prefix) {
if (prefix == null) throw new IllegalArgumentException("Null prefix");
else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
return XMLConstants.NULL_NS_URI;
}
// This method isn't necessary for XPath processing.
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
// This method isn't necessary for XPath processing either.
public Iterator<String> getPrefixes(String uri) {
throw new UnsupportedOperationException();
}
});
// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");
// assuming you've got your XML document in a variable named doc...
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);
And voilà... now you've got your element saved in the result variable.
Caveat: if you're parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory. Otherwise, this code won't work!
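One way to see the caveat in action is to run the same prefixed query against both kinds of DOM. In this sketch (hypothetical class name and sample XML, reusing the spreadsheet prefix mapping from the answer above), only the namespace-aware parse should find the sheet, because a non-namespace-aware parse leaves the elements without a namespace URI.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceAwareDemo {

    static final String MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    static final String XML =
        "<workbook xmlns='" + MAIN_NS + "'><sheets><sheet/></sheets></workbook>";

    static int countSheets(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware); // the factory default is false
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) throw new IllegalArgumentException("null prefix");
                return "spreadsheet".equals(prefix) ? MAIN_NS : XMLConstants.NULL_NS_URI;
            }
            @Override
            public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
            @Override
            public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
        });

        Double n = (Double) xpath.evaluate(
            "count(/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet)",
            doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countSheets(true));  // 1
        // Without namespace awareness the elements carry no namespace URI, so prefixed steps find nothing
        System.out.println(countSheets(false)); // 0
    }
}
```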