How to read well-formed XML in Java, but skip the schema?

The simplest answer is this one-liner, called on the DocumentBuilderFactory after creating it and before calling newDocumentBuilder():

dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

Shamelessly cribbed from "Make DocumentBuilder.parse ignore DTD references".
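
For context, here is a minimal sketch of where that line sits in a complete parse. The class name, the helper method and the example.invalid URL are made up for illustration, and the feature string is Xerces-specific, so this assumes the Xerces-derived parser bundled with the JDK (or Xerces itself) is the one in use.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SkipExternalDtd {
    // Xerces feature name; other parser implementations may reject it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Hypothetical helper: parses a string without fetching the external DTD.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        // Must be set on the factory before the builder is created.
        dbf.setFeature(LOAD_EXTERNAL_DTD, false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The DTD URL deliberately points nowhere; it should never be requested.
        String xml = "<?xml version=\"1.0\"?>\n"
                + "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">\n"
                + "<note>hello</note>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Keep in mind that with the DTD skipped, any entity declared only in that DTD is unknown to the parser, so documents that rely on such entities will fail to parse.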


The reference in your document is to a DTD, not to a schema.

DTD files can contain more than just structural rules. They can also contain entity declarations. Parsers therefore load and parse a referenced DTD by default, even when they are not validating, because those declarations can affect how the document is parsed and what content it yields (an entity can stand for a single character or for a whole phrase of text).
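
To see why the DTD matters even without validation, here is a small sketch. It declares the entity in the internal DTD subset so the example stays self-contained, but the same mechanism applies to declarations in an external DTD. The class, method and entity names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityFromDtd {
    // Hypothetical helper: returns the text content of the root element.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The DTD declares &who; and the parser substitutes it into the content.
        String xml = "<!DOCTYPE greeting [<!ENTITY who \"world\">]>"
                + "<greeting>Hello, &who;!</greeting>";
        System.out.println(rootText(xml)); // prints "Hello, world!"
    }
}
```

Without the declaration, the reference &who; would make the document fail to parse, which is why the parser wants to read the DTD first.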

If you want to avoid loading and parsing the referenced DTD, you can provide your own EntityResolver, test for the referenced DTD, and decide whether to supply a local copy of the DTD file, supply an empty one, or return null so that the parser falls back to its default resolution.

Code sample from the referenced answer on custom EntityResolvers:

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                // Hand the parser an empty DTD so it never fetches the real one.
                return new InputSource(new StringReader(""));
            } else {
                // null tells the parser to fall back to its default resolution.
                return null;
            }
        }
    });

The issue here isn't one of validation. Regardless of validation settings, the parser will still attempt to resolve any references in your document, such as entities, DTDs and (sometimes) schemas. Only later does it decide whether to validate against them. You need to plug in an entity resolver to "intercept" these attempts at dereferencing.
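
A quick way to convince yourself of this is to record what the parser asks for while validation is switched off. This is only a sketch: the class and method names and the example.invalid URL are made up, and it assumes the default JDK parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class ResolutionWithoutValidation {
    // Hypothetical helper: returns every system ID the parser tried to resolve.
    static List<String> requestedSystemIds(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false); // validation is off, resolution still happens
        DocumentBuilder builder = dbf.newDocumentBuilder();

        List<String> seen = new ArrayList<>();
        builder.setEntityResolver((publicId, systemId) -> {
            seen.add(systemId);
            // Answer with an empty entity so nothing is fetched over the network.
            return new InputSource(new StringReader(""));
        });

        builder.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note SYSTEM \"http://example.invalid/note.dtd\">"
                + "<note/>";
        // The DTD's system ID shows up even though the parser is non-validating.
        System.out.println(requestedSystemIds(xml));
    }
}
```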

Check out Apache XML Resolver for an easy(ish) way to do this.
