Tuesday, April 21, 2009

Xerces-J: XML Schema 1.1 assertions support

This post follows up on my earlier blog post, http://mukulgandhi.blogspot.com/2008/07/assertions.html, about implementing XML Schema 1.1 assertions in Xerces-J.

Today I reached an important milestone: development of assertions support in Xerces-J is finished, and I have submitted it as an Apache JIRA issue for review.

Here is a small example of what XML Schema 1.1 assertions look like:
<xs:complexType name="book">
  <xs:sequence>
    <xs:element name="name" type="xs:string" />
    <xs:element name="author" type="xs:string" />
    <xs:element name="price" type="xs:string" />
    <xs:element name="publisher" type="xs:string" />
    <xs:element name="pub-date" type="xs:date" />
  </xs:sequence>
  <!-- assertions come after the content model; each test is an XPath 2.0 expression -->
  <xs:assert test="ends-with(price, 'USD')" />
  <xs:assert test="pub-date > xs:date('2007-12-31')" />
</xs:complexType>

With this XML Schema 1.1 fragment, the schema author expresses two validation constraints: the price string must end with the literal 'USD', and pub-date must be later than the date 2007-12-31. This is a very simple example, but it shows how useful the assertions syntax is. A schema type may contain any number (0 to n) of assertions. This applies to both simple types and complex types, although in simple types the assertion facet is named xs:assertion rather than xs:assert. The value of the 'test' attribute of an assertion is an XPath 2.0 expression. Every assertion must evaluate to the boolean value true for an element to be locally valid.
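To make the "every assertion must be true" rule concrete, here is a minimal Java sketch that checks the two assertions from the book fragment against a book element. It is not Xerces code. The JDK's javax.xml.xpath API only implements XPath 1.0, which has no fn:ends-with and no xs:date, so the sketch swaps in rough XPath 1.0 equivalents: a comparison of the last three characters for the price, and a numeric YYYYMMDD comparison for the date (this assumes plain dates without a timezone). The class name BookAssertions and the method isLocallyValid are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class BookAssertions {

    // XPath 1.0 approximations of the two XPath 2.0 assertions:
    //   ends-with(price, 'USD')          -> last three characters equal 'USD'
    //   pub-date > xs:date('2007-12-31') -> YYYYMMDD compared as a number
    //                                       (assumes no timezone suffix)
    static final String[] ASSERTIONS = {
        "substring(price, string-length(price) - 2) = 'USD'",
        "number(translate(pub-date, '-', '')) > 20071231"
    };

    // Returns true only if every assertion evaluates to true with the
    // <book> element as the context node; one false assertion is enough
    // to make the element invalid.
    public static boolean isLocallyValid(String bookXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element book = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(bookXml)))
                .getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String test : ASSERTIONS) {
                Boolean holds = (Boolean) xpath.evaluate(test, book, XPathConstants.BOOLEAN);
                if (!holds) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String book = "<book><name>XML</name><author>A</author>"
            + "<price>25USD</price><publisher>P</publisher>"
            + "<pub-date>2008-03-15</pub-date></book>";
        System.out.println(isLocallyValid(book));                                  // prints true
        System.out.println(isLocallyValid(book.replace("2008-03-15", "2007-06-01"))); // prints false
    }
}
```

A real XML Schema 1.1 processor evaluates the original XPath 2.0 expressions against typed values; the sketch only mirrors the overall rule that all assertions must hold.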

There are many other scenarios for writing assertions in XML Schema 1.1, some of them quite complex (for example, assertions spread across a schema type hierarchy). It's difficult to cover all of them here, so I'd encourage readers to read article [2] below to learn about many other XML Schema 1.1 assertion scenarios.

With assertions in the XML Schema 1.1 language, we can express much more involved XML validation constraints, which were almost impossible to specify in XML Schema 1.0. Using assertions, we can specify relationships between elements (for example, their names and contents), between elements and attributes, between attributes, and much more.
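As an illustration of such cross-node relationships, the following Java sketch checks two co-occurrence constraints on a hypothetical order element: its count attribute must equal the number of item children, and its total attribute must equal the sum of the items' price attributes. The element and attribute names and the OrderAssertions class are made up for this example. The expressions deliberately stay within XPath 1.0 so that the JDK's built-in javax.xml.xpath can evaluate them; in a schema they would be the test attributes of xs:assert elements on the order type.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class OrderAssertions {

    // Hypothetical assertions for an <order> element:
    //   @count must match the number of <item> child elements
    //   @total must match the sum of the <item> elements' @price attributes
    static final List<String> ASSERTIONS = Arrays.asList(
        "@count = count(item)",
        "@total = sum(item/@price)");

    // Returns the test expressions that evaluate to false for the given order.
    public static List<String> failedAssertions(String orderXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element order = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(orderXml)))
                .getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            List<String> failed = new ArrayList<>();
            for (String test : ASSERTIONS) {
                Boolean holds = (Boolean) xpath.evaluate(test, order, XPathConstants.BOOLEAN);
                if (!holds) {
                    failed.add(test);
                }
            }
            return failed;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String order = "<order count=\"2\" total=\"30\">"
            + "<item price=\"10\"/><item price=\"20\"/></order>";
        System.out.println(failedAssertions(order));                                    // prints []
        System.out.println(failedAssertions(order.replace("total=\"30\"", "total=\"35\""))); // prints [@total = sum(item/@price)]
    }
}
```

Returning the list of failed expressions, rather than a single boolean, makes it easy to report which constraint an instance broke.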

Assertions processing in XML Schema 1.1 works as follows.

When an XML Schema 1.1 processor encounters an element in the XML instance document, it validates the element (if the user has requested validation) against its associated type in the schema, which could be a simple type or a complex type. The element's type definition could be anonymous, or it could be a named type (one that has a "name" attribute and is defined globally in the schema).

The XML Schema processor builds an XPath data model (XDM) tree rooted at this element. In Xerces, an XDM tree is built only if one or more assertions are associated with the element's type. If the schema type of an XML attribute has assertion facets, those facets operate on the attribute's value, and no XDM tree is constructed in that case. The XDM tree consists of the root element, its attributes and all its descendants.

In Xerces, assertions are evaluated as part of element validation. Each assertion is evaluated on the XDM tree rooted at the given context element, so any attempt by the assertion's XPath expression to access a node outside this element's tree will not succeed.
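The "tree rooted at the context element" idea can be approximated with the standard DOM and XPath APIs in the JDK. The Java sketch below is not how Xerces builds its XDM tree: it simply deep-copies the context element (with its attributes and descendants) into a fresh DOM Document and evaluates each assertion there, so ancestors and siblings of the original element are out of reach. The class AssertionTreeDemo and its methods are hypothetical, the test expressions are XPath 1.0 because that is what javax.xml.xpath supports, and the early return for an empty assertion list mirrors the Xerces behaviour described above of building a tree only when the type has assertions.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AssertionTreeDemo {

    private static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    // Parses an instance document and returns its first <book> element.
    public static Element firstBook(String xml) {
        try {
            Document doc = newBuilder().parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("book").item(0);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Deep-copies the context element (its attributes and all descendants)
    // into a new Document whose only child is that copy.
    static Element isolate(Element context) throws Exception {
        Document tree = newBuilder().newDocument();
        Element root = (Element) tree.importNode(context, true);
        tree.appendChild(root);
        return root;
    }

    // Returns true if every assertion holds on the isolated tree.
    // No tree is built at all when there are no assertions to evaluate.
    public static boolean allHold(Element context, List<String> tests) {
        if (tests.isEmpty()) {
            return true;
        }
        try {
            Element root = isolate(context);
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String test : tests) {
                Boolean holds = (Boolean) xpath.evaluate(test, root, XPathConstants.BOOLEAN);
                if (!holds) {
                    return false;
                }
            }
            return true;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Element book = firstBook("<catalog><book><price>25USD</price></book></catalog>");
        XPath xpath = XPathFactory.newInstance().newXPath();
        // In the full instance document, <book> has one element ancestor...
        System.out.println(xpath.evaluate("count(ancestor::*)", book)); // prints 1
        // ...but in the isolated tree it has none, and <catalog> is unreachable.
        System.out.println(allHold(book,
            Arrays.asList("count(ancestor::*) = 0", "count(//catalog) = 0"))); // prints true
    }
}
```

Copying the subtree is the simplest way to enforce the "nothing outside this element" rule with plain DOM; a processor can instead build a dedicated tree directly while it validates.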

We also have a wiki page for the Xerces assertions implementation, http://wiki.apache.org/xerces/XML_Schema_1.1_Assertions, which describes some of the implementation details.

I'm happy to share that we expect Xerces-J to fully support assertions in an upcoming release. Xerces is also expected to support many other XML Schema 1.1 features.

Following are a few nice articles related to XML Schema 1.1 that are worth reading:
1. Overview of XML Schema 1.1 language
2. XML Schema 1.1 co-occurrence constraints using XPath 2.0
