Saturday, June 13, 2009

XML Schema validation with Xerces-J

Some time ago, Hiranya Jayathilaka started an interesting discussion on the xerces-dev list about how to validate an XML Schema 1.0 document using Xerces-J. Hiranya was looking for a solution based on a Java API. The goal was to verify the correctness of the schema document itself, not to validate an XML instance document against it.

I'm providing a summary of the discussion we had on the list and the conclusions we reached.

There are basically three ways of doing this:

1. Using a JAXP SchemaFactory
Using this technique, we do something like the following:

SchemaFactory sf = SchemaFactory.newInstance ..
sf.setErrorHandler ..
Schema s = sf.newSchema(new StreamSource(schemapath));

If the XML Schema has a grammar error, the 'SchemaFactory.newSchema' call reports it to the error handler; when no error handler is set, the call fails with a SAXException.
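The fragment above leaves the factory arguments and the error handler open. A minimal sketch of how it could be completed with only the standard JAXP classes follows. The class name SchemaChecker and the methods check, checkText and describe are made up for illustration and were not part of the list discussion.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper (name is an assumption): compiles a schema and
// returns every problem reported while doing so.
class SchemaChecker {

    static List<String> check(Source schemaSource) {
        final List<String> problems = new ArrayList<>();
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        sf.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                problems.add(describe("warning", e));
            }

            @Override
            public void error(SAXParseException e) {
                // Record the error and keep going so later errors are seen too.
                problems.add(describe("error", e));
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                problems.add(describe("fatal", e));
                throw e; // nothing sensible can follow a fatal error
            }
        });
        try {
            sf.newSchema(schemaSource);
        } catch (SAXException e) {
            // Usually already recorded by the handler; keep the message otherwise.
            if (problems.isEmpty()) {
                problems.add("fatal: " + e.getMessage());
            }
        }
        return problems;
    }

    // Convenience overload for a schema held in a string.
    static List<String> checkText(String schemaText) {
        return check(new StreamSource(new StringReader(schemaText)));
    }

    static String describe(String level, SAXParseException e) {
        return level + ": line " + e.getLineNumber() + ", column "
                + e.getColumnNumber() + ": " + e.getMessage();
    }

    public static void main(String[] args) {
        // Usage: java SchemaChecker path/to/schema.xsd
        for (String problem : check(new StreamSource(args[0]))) {
            System.out.println(problem);
        }
    }
}
```

An empty list means the schema compiled without any reported problem. Collecting messages instead of stopping at the first error is a design choice of this sketch, not something the discussion prescribed.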

2. Using XSLoader
Using this technique, we do something like the following:

XSLoaderImpl xsLoader = new XSLoaderImpl();
XSModel xsModel = xsLoader.loadURI(xsdUri);

Michael Glavassevich suggested how we could add an error handler to this mechanism:

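DOMErrorHandler myErrorHandler = ...;

// Make the Xerces XS-Loader implementation visible to the DOM registry
System.setProperty(DOMImplementationRegistry.PROPERTY, "org.apache.xerces.dom.DOMXSImplementationSourceImpl");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

XSImplementation xsImpl = (XSImplementation) registry.getDOMImplementation("XS-Loader");
XSLoader xsLoader = xsImpl.createXSLoader(null);

DOMConfiguration config = xsLoader.getConfig();
config.setParameter("error-handler", myErrorHandler); // <-- set the error handler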
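The myErrorHandler placeholder above can be any implementation of the standard org.w3c.dom.DOMErrorHandler interface. As a rough sketch, a handler that simply collects what it is told might look like this. The class name CollectingDomErrorHandler and its accessor methods are hypothetical and do not come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMError;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.DOMLocator;

// Hypothetical handler (name is an assumption): remembers every reported
// problem so the caller can inspect them after loading the schema.
class CollectingDomErrorHandler implements DOMErrorHandler {

    private final List<String> messages = new ArrayList<>();
    private boolean failed = false;

    @Override
    public boolean handleError(DOMError error) {
        short severity = error.getSeverity();
        String label;
        if (severity == DOMError.SEVERITY_WARNING) {
            label = "warning";
        } else if (severity == DOMError.SEVERITY_ERROR) {
            label = "error";
        } else {
            label = "fatal";
        }

        // The location may be missing, so guard against null.
        DOMLocator location = error.getLocation();
        String where = (location == null)
                ? ""
                : " (line " + location.getLineNumber()
                        + ", column " + location.getColumnNumber() + ")";

        messages.add(label + ": " + error.getMessage() + where);
        if (severity != DOMError.SEVERITY_WARNING) {
            failed = true;
        }
        // Returning true asks the processor to continue where it can.
        return true;
    }

    // True once at least one error or fatal error has been reported.
    boolean hasErrors() {
        return failed;
    }

    List<String> getMessages() {
        return messages;
    }
}
```

After the schema has been loaded, hasErrors() tells whether the schema document should be treated as invalid, and getMessages() gives the details to log.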

3. Using XMLGrammarPreparser
Using this technique, we do something like the following (thanks to Hiranya Jayathilaka for sharing this code):

XMLGrammarPreparser preparser = new XMLGrammarPreparser();
preparser.registerPreparser(XMLGrammarDescription.XML_SCHEMA, null);
preparser.setFeature("http://xml.org/sax/features/namespaces", true);
preparser.setFeature("http://xml.org/sax/features/validation", true);
preparser.setFeature("http://apache.org/xml/features/validation/schema", true);
preparser.setErrorHandler(new MyErrorHandler());
Grammar g = preparser.preparseGrammar(XMLGrammarDescription.XML_SCHEMA, new XMLInputSource(null, xsdUrl, null));

Michael Glavassevich provided a nice comparison of these three approaches:
SchemaFactory - it is an entry point into the JAXP Validation API for loading schemas for validation. If it was a user asking, I'd recommend SchemaFactory of the three choices, since it's in Java 5+ and would work in environments where Xerces isn't available.
XSLoader - it is an entry point into the XML Schema API for obtaining an XSModel for analysis/processing of the component model.
XMLGrammarPreparser - it provides an API for preparsing schemas and DTDs for use in grammar caching (i.e. a lower-level alternative to SchemaFactory).

1 comment:

the g said...

Interesting post. Thanks. I was looking for something like this.

I am looking to validate a Schema as a part of our build process. I was wondering what would be the best choice....

We write our schemas (mostly they are normalised) and an Ant script then de-normalises the XSDs and creates the final XSD.

We do a manual check on the final xsd by loading it in XMLSpy and Eclipse.

We want to do it automatically when we do the build to avoid the above mentioned manual task.

Any help will be appreciated!

Regards
Gireesh