Wednesday, November 18, 2009

XSD 1.1: some XSD 1.1 samples running with Xerces-J

Lately I have been thinking of functionally stress testing the upcoming Xerces-J XSD 1.1 preview release (using the SVN code we have now, and later the public binaries that the Xerces project will provide). I'm curious to find out whether there are any non-compliant parts in the Xerces-J XSD 1.1 implementation; anything I find could serve as input for improving the Xerces-J XSD 1.1 code base. To start with, I'll write a few XSD 1.1 schemas using XSD 1.1 assertions and "Conditional Type Assignment (CTA)/type alternative" instructions.

Assertions examples

Example 1
Sample XML [1]
  <x a="xyz">
    <foo>5</foo>
    <bar>10</bar>
  </x>

XSD 1.1 Schema [2]
(Use Case: "the value of the foo element must be less than or equal to the value of the bar element")
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
    <xs:element name="x">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="foo" type="xs:int" />
           <xs:element name="bar" type="xs:int" />
         </xs:sequence>
         <xs:attribute name="a" type="xs:string" use="required" />
         <xs:assert test="foo le bar" />
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

Using the Xerces-J XSD 1.1 validator, the XML document [1] above validates successfully against the XSD document [2].
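
For readers who want to try this themselves, here is a minimal sketch of running such a validation through the standard JAXP validation API. The class and method names (AssertionValidationSketch, validate, format) are my own, hypothetical ones. The XSD 1.1 schema language URI shown is, as far as I know, the one recognised by Xerces-J builds with XSD 1.1 support, and it is an assumption that such a build is on the classpath; a stock JDK only knows the XSD 1.0 URI and will reject xs:assert.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class AssertionValidationSketch {

    // Assumption: requires a Xerces-J build with XSD 1.1 support on the classpath.
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";
    // Standard XSD 1.0 schema language URI, supported by a stock JDK.
    static final String XSD_10 = "http://www.w3.org/2001/XMLSchema";

    // Validates the XML text against the schema text and returns one
    // "line:column:message" entry per validation error (empty list = valid).
    static List<String> validate(String schemaLanguage, String xsd, String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(schemaLanguage);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* warnings are ignored in this sketch */ }
                public void error(SAXParseException e) { problems.add(format(e)); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Broken schema, malformed XML or I/O trouble: surfaced as an unchecked exception.
            throw new IllegalStateException(e);
        }
        return problems;
    }

    static String format(SAXParseException e) {
        return e.getLineNumber() + ":" + e.getColumnNumber() + ":" + e.getMessage();
    }
}
```

With the XSD 1.1 jars available, one would call validate(AssertionValidationSketch.XSD_11, schemaText, xmlText) with schema [2] and document [1]; an empty list means the document is valid. Collecting errors in a list, instead of stopping at the first one, makes it easy to inspect assertion failures together with ordinary validation errors.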

If the assertion is instead written as follows (a deliberately false assertion, used only to check how failed assertions and their error messages are reported):
<xs:assert test="(foo + 10) le bar" />

then the XML instance document ([1] above) becomes invalid, and Xerces returns the following error message:
test.xml:4:5:cvc-assertion.3.13.4.1: Assertion evaluation ('(foo + 10) le bar') for element 'x' with type '#anonymous' did not succeed.

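Use Case: "if the value of the attribute "a" is xyz, then the foo and bar elements are required, but otherwise they are optional".

This requires the following assertion definition (the use case presumes that foo and bar are declared as optional, i.e. with minOccurs="0", unlike in schema [2] above):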
<xs:assert test="if (@a eq 'xyz') then (foo and bar) else true()" />

This works fine with Xerces-J.
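
To make the logic of this conditional assertion concrete, here is a small sketch that checks an equivalent rule with the XPath 1.0 engine that ships with the JDK. This is not how Xerces evaluates assertions (Xerces uses an XPath 2.0 engine); the rewrite "not(@a = 'xyz') or (foo and bar)" is my own XPath 1.0 equivalent of the if/then/else expression, and the class and method names are hypothetical. The sketch assumes that foo and bar are optional in the schema.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ConditionalAssertionSketch {

    // XPath 1.0 rewrite (an assumption of this sketch) of:
    //   if (@a eq 'xyz') then (foo and bar) else true()
    // "if P then Q else true" is the implication "not P or Q".
    static final String RULE = "not(@a = 'xyz') or (foo and bar)";

    // Returns true when the rule holds for the document element of the given XML text.
    static boolean holds(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate evaluates RULE with the document element as the context node.
            String expr = "boolean(/*[" + RULE + "])";
            return (Boolean) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, holds("<x a='xyz'><foo>5</foo></x>") returns false because bar is missing, while holds("<x a='abc'/>") returns true because the condition on "a" does not apply.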

Acknowledgements: Thanks to Douglass A Glidden for contributing these use cases on the xml-dev list.

Example 2
Sample XML [3]
  <Example>
    <x>hi</x>
    <y>there</y>
    <ASomeNameSuffix/>
  </Example>

XSD 1.1 Schema [4]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Example" type="myType" />
 
    <xs:complexType name="myType">
      <xs:sequence>
        <xs:element name="x" type="xs:string" />
        <xs:element name="y" type="xs:string" />
        <xs:any processContents="lax" />
      </xs:sequence>
      <xs:assert test="starts-with(local-name(*[3]), 'A')" />
    </xs:complexType>

  </xs:schema>

In this example (Example 2), the element immediately following "y" is allowed by the XSD wildcard instruction, <xs:any/>. The assertion in XSD schema [4] enforces that the name of the element appearing after "y" must start with the letter "A". I think this (i.e. constraining the name of an element matched by an xs:any wildcard) could not have been accomplished with XSD 1.0.
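
The XPath expression in this assertion happens to be valid XPath 1.0 as well, so its behaviour can be tried out with the JDK's built-in XPath engine, independently of Xerces. The following is only a sketch under that assumption; the class and method names are hypothetical, and evaluating the expression against the document element stands in for what the validator does on the "Example" element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class WildcardNameSketch {

    // The assertion test from schema [4], unchanged.
    static final String TEST = "starts-with(local-name(*[3]), 'A')";

    // Returns true when the third child element of the document element
    // has a local name starting with "A".
    static boolean thirdChildStartsWithA(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate makes the document element the context node for TEST.
            String expr = "boolean(/*[" + TEST + "])";
            return (Boolean) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Sample XML [3] passes this check, whereas a document whose third child is, say, <BSomeName/> does not.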

Example 3
Sample XML [5]
  <record>
    <wins>20</wins>
    <losses>15</losses>
    <ties>8</ties>
    <!--
      Zero or more well-formed elements are allowed here
      by the XSD wildcard instruction, <xs:any />
    -->
  </record>

XSD 1.1 Schema [6]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:complexType name="Record">
      <xs:sequence>
        <xs:element name="wins" type="xs:nonNegativeInteger"/>
        <xs:element name="losses" type="xs:nonNegativeInteger"/>
        <xs:element name="ties" type="xs:nonNegativeInteger" minOccurs="0"/>
        <xs:any minOccurs="0" maxOccurs="unbounded" namespace="##any" processContents="lax"/>   
      </xs:sequence>
      <xs:assert test="every $x in ties/following-sibling::* satisfies
                     not(empty(index-of(('x','y','z'), local-name($x))))" />
    </xs:complexType>

    <xs:element name="record" type="Record"/>

  </xs:schema>

The XML document [5] is valid against the XSD schema [6]. The <xs:any ../> instruction in this schema ([6]) allows zero or more well-formed XML elements after the element "ties". Wildcards were available in XSD 1.0 as well. However, for interested readers: XSD 1.1 has weakened wildcard support, which is what makes the above XSD schema [6] valid. In XSD 1.0 this schema was invalid, because the optional "ties" element followed by the wildcard violates the UPA (unique particle attribution) constraint. An example of this is given in the article at http://www.ibm.com/developerworks/xml/library/x-xml11pt3/index.html#N10122.

The assertion in this schema ([6]) enforces that any element after "ties" that is allowed by the xs:any wildcard must have a name (i.e. a name without namespace prefix, an XML local name) from the list ('x', 'y', 'z'). Something like this was not possible with XSD 1.0, and in my opinion this is nice :)
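
The "every ... satisfies" expression is XPath 2.0, but the rule it states can be sketched with the JDK's XPath 1.0 engine by turning "every element is in the list" into "no element is outside the list". The rewrite below is my own and only approximates what Xerces evaluates; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class AllowedNamesSketch {

    // XPath 1.0 rewrite (an assumption of this sketch) of the assertion in schema [6]:
    // there must be no following sibling of "ties" whose local name is outside x, y, z.
    static final String RULE =
            "not(ties/following-sibling::*["
            + "not(local-name() = 'x' or local-name() = 'y' or local-name() = 'z')])";

    // Returns true when every element after "ties" is named x, y or z
    // (trivially true when there are no such elements).
    static boolean holds(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The predicate makes the document element the context node for RULE.
            String expr = "boolean(/*[" + RULE + "])";
            return (Boolean) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A record with <x/> and <z/> after "ties" satisfies the rule, while one containing <w/> after "ties" does not.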

PS: more examples to follow, in the next few posts :)

References:
XSD 1.1 Part 1: Structures
XSD 1.1 Part 2: Datatypes

I must acknowledge (a long acknowledgement, but I must do it anyway :)) that Xerces assertions are powered by the PsychoPath XPath 2 engine. The credit for bringing the PsychoPath engine to almost 100% compliance with the W3C XPath 2.0 test suite (as of now, PsychoPath is 99%+ compliant with it) should largely go to Dave Carver and Jesper Steen Møller. I was fortunate to contribute somewhat to the PsychoPath XPath implementation (the freedom given to me as an Eclipse Source Editing project committer, thanks to Dave Carver, helped me drive Xerces assertions development quickly). I should also mention the original PsychoPath code contribution to the Eclipse Foundation by Andrea Bittau and his team. The numerous reviews and improvements suggested by Khaled Noaman, and the general design advice from Michael Glavassevich (both are Xerces committers), helped tremendously while developing Xerces assertions. Finally, I must mention Ken Cai, who wrote the original Xerces-PsychoPath interface and an initial implementation of it.
