Saturday, November 28, 2009

Xerces-J: XSD 1.1 assertions on simple types

This post gives a few examples of assertions on XSD simple types, and on complex types with simple content, tested with the Xerces-J XSD 1.1 implementation. The previous couple of posts on this blog described assertions on XSD complex types having complex content (i.e., elements having "element only" or mixed content, and/or attributes).

1) Here's an example taken from Roger L. Costello's collection of XSD 1.1 examples, which he has published on his web site:

XML document [1]:
  <Example>
    <even-integer>100</even-integer>        
  </Example>

XSD 1.1 document [2]:
  <schema xmlns="http://www.w3.org/2001/XMLSchema"
          elementFormDefault="qualified">

    <element name="Example">
       <complexType>
          <sequence>
             <element name="even-integer">
                <simpleType>
                  <restriction base="integer">
                     <assertion test="$value mod 2 = 0" />
                  </restriction>
                </simpleType>
             </element>
          </sequence>
       </complexType>
    </element>

  </schema>

The above XSD 1.1 schema [2] constrains XSD integer values to even ones only (this works fine with Xerces!). XSD 1.1 defines a new facet named assertion, which can be applied to XSD simple types, as the above example shows.

Please note that the "assertion" facet (applicable both to XSD simple types and to complex types with simple content) is conceptually different from the "assert" constraint on complex types (this is explained further below).

The XSD 1.1 spec says that the XPath 2.0 "dynamic context" of an assertion gets augmented with a variable, $value. The XSD type of $value is that of the base simple type (in this example, the type of $value is xs:integer). The detailed rules for using $value in XSD 1.1 schemas are described in the spec.
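To illustrate how the type of $value follows the base type, here is a minimal sketch of a string-based restriction. The element name "even-length-string" is hypothetical and not part of the original examples, and I have not run this schema through Xerces. It assumes that with base="string" the variable is typed as xs:string, so XPath 2.0 string functions can be applied to it.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: a string whose length must be even -->
  <element name="even-length-string">
    <simpleType>
      <restriction base="string">
        <!-- Assumption: $value is typed as xs:string here, per the base type -->
        <assertion test="string-length($value) mod 2 = 0" />
      </restriction>
    </simpleType>
  </element>

</schema>
```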

It looks to me that the ability to have an assertion facet on simple types significantly enhances the XSD author's capability to constrain simple type values in ways that were not practical in XSD 1.0 (e.g., XSD 1.0 had no direct way to constrain integer values to be even, short of a lexical pattern facet).

For the above example, we could also specify assertions like the following:
<assertion test="$value mod 2 = 0" />
<assertion test="$value lt 500" />
(i.e., a set of two assertion facet instances)

Or, if the user wishes, a single assertion facet instance, <assertion test="($value mod 2 = 0) and ($value lt 500)" />, achieves the same objective.

This enforces that the simple type value must be even and also less than 500. There is no limit on the number of assertion facet instances that can be specified. In my opinion, the ability to specify any number of assertion facets (and likewise assert constraints on complex types) makes assertions a tremendously useful XSD validation construct.

Note: Interestingly, the following facet definition achieves the same result as the second assertion facet instance described above:
<maxExclusive value="500" />
(this was available in XSD 1.0 as well)
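As a sketch of how the two styles could be mixed, the schema [2] might be rewritten so that the upper bound uses the XSD 1.0 facet and only the evenness check uses the new assertion facet. This is my own variation on the example and was not run through Xerces; it assumes the facets inside the restriction may appear in this order.

```xml
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        elementFormDefault="qualified">

  <element name="Example">
    <complexType>
      <sequence>
        <element name="even-integer">
          <simpleType>
            <restriction base="integer">
              <!-- Upper bound, expressible in XSD 1.0 as well -->
              <maxExclusive value="500" />
              <!-- Evenness check, needs the XSD 1.1 assertion facet -->
              <assertion test="$value mod 2 = 0" />
            </restriction>
          </simpleType>
        </element>
      </sequence>
    </complexType>
  </element>

</schema>
```

With this schema, the instance [1] (value 100) should still be accepted, since 100 is even and below 500.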

2) Complex types with simple content, using assertions:
XML document [3]:
  <root>
    <x label="a">2</x>
    <x label="b">4</x>
  </root>

Here, the element "x" should have an attribute "label" of type xs:string, but the content of element "x" is simple (of type xs:int in this example).
Additionally, we want the simple content value of "x" to be an even number.

The XSD document for these validation constraints is as follows [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="root">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   
   <xs:complexType name="X_Type">
     <xs:simpleContent>
        <xs:extension base="xs:int">    
          <xs:attribute name="label" type="xs:string" />
          <xs:assert test="$value mod 2 = 0" />
        </xs:extension>
     </xs:simpleContent>
   </xs:complexType>
  
  </xs:schema>

This example highlights the use of the xs:assert instruction.

It's interesting to see what happens if we change the value of one of the "x" elements as follows:
<x label="a">21</x>
(I changed the first "x")

Xerces fails the validation of the XML instance and returns the following error message to the user:
test.xml:2:22:cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'x' with type 'X_Type' did not succeed.

Here, the XML validation did not succeed because the value 21 is not an even number.

3) The last example of this post is the following:
This again describes the scenario of complex types with simple content. But here, the simple content is obtained by "restriction of a complex type", whereas the previous example used derivation by extension.

The XML file remains the same [3], while the new XSD document is the following [5]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="root">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   
   <xs:complexType name="X_Type">
     <xs:simpleContent>
        <xs:restriction base="x_base">      
           <xs:assertion test="$value mod 2 = 0" />
           <xs:assert test="@label = ('a','b')" />
        </xs:restriction>
     </xs:simpleContent>
   </xs:complexType>
   
   <xs:complexType name="x_base">
     <xs:simpleContent>
        <xs:extension base="xs:int">    
          <xs:attribute name="label" type="xs:string" />
        </xs:extension>
     </xs:simpleContent>
   </xs:complexType>
  
 </xs:schema>

Please notice how assertions are specified on the complex type "X_Type". Here we have two assertion instructions (xs:assertion and xs:assert). In this example, xs:assertion is a facet on the atomic value of the complex type (the value of the complex type is simple in this case!), while xs:assert is the assertion instruction on the complex type itself (which has access to the element tree).
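To see the two instructions act independently, consider a hypothetical variation of the instance [3] (my own, not taken from the original test files). Both values are even, so the xs:assertion facet should be satisfied, but the second "x" carries a label outside ('a','b'), so the xs:assert on "X_Type" would be expected to fail. I have not captured the exact Xerces error message for this case.

```xml
<root>
  <!-- Even value, label in ('a','b'): expected to satisfy both checks -->
  <x label="a">2</x>
  <!-- Even value, but label 'c' is expected to violate @label = ('a','b') -->
  <x label="c">4</x>
</root>
```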

The complexType -> simpleContent -> restriction type definition can specify assertions with the following grammar:
... assertion*, ..., assert* (i.e., 0-n xs:assertion components can be followed by 0-n xs:assert components; this ordering is significant, otherwise the XSD 1.1 processor will flag an error).
There can be other constructs as well before xs:assertion here, and some after it, but anything after xs:assertion* needs to come before the trailing xs:assert elements. This is described in the relevant XSD 1.1 grammar at http://www.w3.org/TR/2009/CR-xmlschema11-1-20090430/#dcl.ctd.ctsc

Note: The XML Schema WG decided to have two different names for the assertion instructions (xs:assertion and xs:assert) for this particular scenario, so that XSD schema authors can state whether they are writing an assertion as a facet for simple values or an assertion for the complex type (which has access to the element tree). If this naming distinction had not been made in XSD 1.1, the specification of asserts in this case would have been ambiguous (i.e., the XSD 1.1 processor could not tell which assertion is a facet and which is an assertion for the complex type).

Acknowledgements:
I must mention that the XSD 1.1 examples shared by Roger L. Costello helped us fix quite a few bugs in the Xerces assertions implementation. Our sincere thanks are due to Roger.

References:
1. Readers may also find this article useful: http://www.ibm.com/developerworks/library/x-xml11pt2/ covers XSD 1.1 co-occurrence constraints and describes the XSD 1.1 assertions facility in detail.

I hope that this post was useful.
