Sunday, February 14, 2010

Xerces-J, XSD 1.1 assertions: complexType -> simpleContent -> restriction

XSD 1.1 complex types are specified by the grammar given here, in the XSD 1.1 spec:
http://www.w3.org/TR/xmlschema11-1/#declare-type

An XSD complex type definition takes one of three mutually exclusive content forms, as follows:
  <complexType ...>
    simpleContent |
    complexContent |
    (openContent?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </complexType>

Assertions in complexType -> simpleContent -> restriction work a bit differently from all other assertion cases on complex types, because this construct can carry assertion facets (xs:assertion) as well as, or instead of, assertions on the complex type (xs:assert).

This is specified by the following XSD 1.1 grammar:
  <simpleContent
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (restriction | extension))
  </simpleContent>

  <restriction
    base = QName
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | maxScale | minScale | length | minLength | maxLength | enumeration | whiteSpace | pattern | assertion | {any with namespace: ##other})*)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </restriction>

The xs:restriction grammar above places the two kinds of assertions roughly as follows:
assertion*, ..., assert*

Here, xs:assertion (cardinality 0-n) is a facet on the simple type value (specified by complexType -> simpleContent), whereas xs:assert (cardinality 0-n) is an assertion on the complex type, which has access to the element tree: the XML element itself and its attributes, if there are any. xs:assertion definitions in complexType -> simpleContent -> restriction do not have access to the element tree to which the complex type applies. They can only access the simple type value of the element in context, through the implicit assertion variable $value, whose XSD type is given by the definition <xs:restriction base = QName ...>.
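To make the split between the two kinds of assertions concrete, here is a minimal Python sketch that reads an xs:restriction fragment and sorts its children into facet tests (xs:assertion) and complex type tests (xs:assert). The fragment, the function name classify_assertions and the result keys are made up for illustration. This only inspects the schema markup and is not how Xerces-J processes it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical restriction fragment, shaped like the grammar above.
RESTRICTION = """
<xs:restriction xmlns:xs="http://www.w3.org/2001/XMLSchema" base="myBase">
  <xs:assertion test="string-length($value) gt 0"/>
  <xs:attribute name="a" type="xs:int"/>
  <xs:assert test="@a gt 0"/>
</xs:restriction>
"""

def classify_assertions(restriction_xml):
    """Split the direct children of xs:restriction into the two assertion kinds."""
    root = ET.fromstring(restriction_xml)
    # xs:assertion is a facet: its test can only use $value.
    facets = [c.get("test") for c in root.findall(f"{{{XS}}}assertion")]
    # xs:assert belongs to the complex type: its test sees the element tree.
    asserts = [c.get("test") for c in root.findall(f"{{{XS}}}assert")]
    return {"facet": facets, "complex_type": asserts}

result = classify_assertions(RESTRICTION)
print(result["facet"])
print(result["complex_type"])
```

The facet test mentions only $value, while the complex type test refers to the attribute a, which matches the scoping rules described above.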

Here's a small fictitious example illustrating these concepts:

XML document [1]:
  <A a="15">Example A</A>

XSD 1.1, Schema [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="A">
      <xs:complexType>
        <xs:simpleContent>
          <xs:restriction base="myBase">    
            <xs:assertion test="contains($value, 'Example')" />
            <xs:assert test="@a mod 5 = 0" />    
          </xs:restriction>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
  
    <xs:complexType name="myBase">
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="a" type="xs:int" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>

  </xs:schema>

In the Schema above [2], there are two assertions specified on the XSD type: xs:assertion and xs:assert. The first is a facet on the simple content, and the second is an assertion on the complex type.
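The two checks in Schema [2] can be mimicked by hand to see what each one is allowed to look at. The Python sketch below is only a stand-in for the XPath tests, not an XSD 1.1 validator. The function names and the second, failing instance are hypothetical.

```python
import xml.etree.ElementTree as ET

def facet_assertion(value):
    # Stand-in for test="contains($value, 'Example')".
    # Only the simple content value is passed in, nothing else.
    return "Example" in value

def type_assertion(elem):
    # Stand-in for test="@a mod 5 = 0" (for non-negative values of a).
    # The whole element is passed in, so attributes are reachable.
    return int(elem.get("a")) % 5 == 0

def check(xml_text):
    """Return (facet result, complex type assertion result) for one instance."""
    elem = ET.fromstring(xml_text)
    return facet_assertion(elem.text or ""), type_assertion(elem)

print(check('<A a="15">Example A</A>'))  # XML document [1]: (True, True)
print(check('<A a="12">Sample A</A>'))   # hypothetical instance: (False, False)
```

Note that facet_assertion could not test the attribute a even if we wanted it to, since it never receives the element. That is the same restriction xs:assertion has.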

I believe the Schema above is simple and self-explanatory enough to illustrate the points I've tried to explain in this post.

What actually prompted me to write this post was a minor bug in complexType -> simpleContent -> restriction facet processing in the Xerces-J XSD 1.1 SVN code. We fixed it today, and the fix is now available in the Xerces-J SVN repository.

Interestingly, this case worked correctly in an earlier Xerces-J SVN revision. The bug was introduced later, as assertions development went on, and has now been fixed again.

3 comments:

yoann said...

I've got a question about xerces2 java... I would need to validate some XML against XSD1.1, but to do so, do I have to use the SVN branch of xerces2j? And if so, when do you think the project will release a "stable" version of this?

Thanks,
Yoann

PS: I ask you because you seem to be quite informed on all of this.

Mukul Gandhi said...

I think the Apache Xerces-J 2.10.0 release (which would have XSD 1.1 support) could come out quite soon. The following Xerces-J project updates indicate that the 2.10.0 release is around the corner:
http://wiki.apache.org/xerces/February2010
http://wiki.apache.org/xerces/November2009

As of today, building the Xerces-J JARs from the SVN branch is the only way to use XSD 1.1 functionality in Xerces.

yoann said...

Thanks a lot, I will try to confirm a release date from the developers.
Best,
Yoann