Saturday, August 22, 2009

XML document validation while parsing with the Java DOM API

I spent a few hours discovering this while working with the DOM XML parsing API and Xerces-J in a Java program.

I wanted to parse an XML document in Java using a plain DOM parser and validate it at the same time, using either W3C XML Schema or a DTD.

The following sequence of statements is needed for this:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.setValidating(true);
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema ...
dbf.setSchema(schema);
DocumentBuilder docBuilder = dbf.newDocumentBuilder();
docBuilder.parse(..


These statements are all that is necessary to accomplish this task. But there are a few catches, which I wish to share.

1) If dbf.setValidating(true) is specified, then a DTD is mandatory. Even if a W3C XML Schema is provided with dbf.setSchema .., parsing fails when the DTD is absent, because dbf.setValidating(true) requests DTD validation.

2) If we only want to validate against a W3C XML Schema, then we shouldn't call dbf.setValidating(true); it is required only for DTD validation.
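To make the two catches concrete, here is a minimal sketch that runs the same parse with and without dbf.setValidating(true). The class name SchemaOnlyValidation, the validate helper, the inline schema, and the sample documents are all made up for illustration; a real program would more likely load its schema from a file. Validation problems are collected through an ErrorHandler so they can be inspected instead of being printed by the parser's default handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class SchemaOnlyValidation {

    // Hypothetical inline schema: a single <note> element holding a string.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='note' type='xs:string'/>"
      + "</xs:schema>";

    // Parses xml against XSD and returns the validation messages reported.
    // dtdValidating is passed straight to dbf.setValidating(..).
    public static List<String> validate(String xml, boolean dtdValidating) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setValidating(dtdValidating); // true requests DTD validation

        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
        dbf.setSchema(schema);

        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        final List<String> problems = new ArrayList<>();
        docBuilder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            public void error(SAXParseException e) { problems.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        docBuilder.parse(new InputSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note>hello</note>"; // valid per XSD, has no DOCTYPE
        System.out.println("schema only:         " + validate(xml, false));
        System.out.println("setValidating(true): " + validate(xml, true));
    }
}
```

With the schema-only setting, the valid document should produce an empty list. With setValidating(true), I would expect the parser to report DTD-related errors for the same document, since it has no DOCTYPE; the exact messages depend on the parser implementation in use.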

I spent a few hours discovering this, and thought that somebody might benefit from this post.

1 comment:

Unknown said...

This is an excellent find. It saved me from unwanted frustration on a Saturday afternoon! Thank you!!