Sunday, February 14, 2010

Xerces-J, XSD 1.1 assertions: complexType -> simpleContent -> restriction

XSD 1.1 complex types are specified by the grammar given here, in the XSD 1.1 spec:

XSD complex type definitions are essentially composed of three mutually exclusive definitions, as follows:
  <complexType ...>
    (simpleContent |
     complexContent |
     (openContent?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*))
  </complexType>

The specification of assertions in complexType -> simpleContent -> restriction differs slightly from all other assertion cases on complex types, because it can contain assertion facets as well as (or instead of) assertions on the complex type.

This is specified by the following XSD 1.1 grammar:
  <simpleContent
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (restriction | extension))
  </simpleContent>

  <restriction
    base = QName
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | maxScale | minScale | length | minLength | maxLength | enumeration | whiteSpace | pattern | assertion | {any with namespace: ##other})*)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </restriction>

The XSD definition of xs:restriction above specifies assertions in roughly the following form:
assertion*, ..., assert*

Here, xs:assertion (cardinality 0-n) is a facet on the simple type value (specified by complexType -> simpleContent), whereas xs:assert (cardinality 0-n) is an assertion on the complex type, which has access to the element tree: the XML element itself and its attributes, if there are any. xs:assertion definitions in complexType -> simpleContent -> restriction do not have access to the element tree to which the complex type applies. They can only access the simple type value of the context element, through the implicit assertion variable $value, whose XSD type is the one specified by the definition <xs:restriction base = QName ...

Here's a small fictitious example illustrating these concepts:

XML document [1]:
  <A a="15">Example A</A>

XSD 1.1, Schema [2]:
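  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="A">
      <xs:complexType>
        <xs:simpleContent>
          <xs:restriction base="myBase">
            <xs:assertion test="contains($value, 'Example')" />
            <xs:assert test="@a mod 5 = 0" />
          </xs:restriction>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>

    <xs:complexType name="myBase">
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="a" type="xs:int" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:schema>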

In the Schema above [2], there are two assertions specified on the XSD type (the xs:assertion and xs:assert elements). One of the assertions is a facet for the simple content, and the other is an assertion on the complex type.

I believe the above Schema is simple and self-explanatory enough to illustrate the points I've tried to explain in this post.
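To see each assertion at work on its own, here is a small sketch with hypothetical variations of document [1]. These instances are not from the original discussion; each one is meant as a separate XML document, and the expected outcomes are what the two test expressions in Schema [2] suggest, not the output of an actual validation run.

```xml
<!-- Hypothetical variations of document [1]; each <A> is a separate document. -->

<!-- Expected to be valid: the text contains 'Example', and 15 mod 5 = 0. -->
<A a="15">Example A</A>

<!-- Expected to fail the xs:assertion facet: $value does not contain 'Example'. -->
<A a="15">Sample A</A>

<!-- Expected to fail the xs:assert on the complex type: 12 mod 5 = 2. -->
<A a="12">Example A</A>
```

The second instance only involves the simple type value, so it is caught by the facet, while the third only involves the attribute, which is visible to xs:assert but not to xs:assertion.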

What actually prompted me to write this post was a minor bug in complexType -> simpleContent -> restriction facet processing in the Xerces-J XSD 1.1 SVN code. We fixed it today, and the fix is now available in the Xerces-J SVN repository.

Interestingly, this fix was already present in an earlier Xerces-J SVN revision. As assertions development moved forward, the bug was reintroduced, and it has now been fixed again.


yoann said...

I've got a question about xerces2 java... I would need to validate some XML against XSD1.1, but to do so, do I have to use the SVN branch of xerces2j? And if so, when do you think the project will release a "stable" version of this?


PS: I ask you because you seem to be quite informed on all of this.

Mukul Gandhi said...

I think the Apache Xerces-J 2.10.0 release (which would have XSD 1.1 support) could come out quite soon. The following Xerces-J project updates indicate that the Xerces-J 2.10.0 release is around the corner:

As of today, building the Xerces-J JARs from the SVN branch is the only way to use XSD 1.1 functionality in Xerces.

yoann said...

Thanks a lot, I will try to confirm a release date from the developers.