Friday, November 20, 2009

XSD 1.1: some CTA samples with Xerces-J

I've been writing a few XSD 1.1 Conditional Type Assignment (CTA) samples, and trying to run them with the current Xerces-J schema development SVN code.

To start with, here's the first example (a very simple one indeed), which runs fine with Xerces-J:

XML document [1]:
  <root>
    <x>hello</x>
    <x kind="int">10</x>
  </root>

XSD 1.1 document [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="root">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="x" type="xs:anyType" maxOccurs="unbounded">
             <xs:alternative test="@kind='int'" type="xInt_Type" />
             <xs:alternative type="xString_Type" />
           </xs:element>
         </xs:sequence>
       </xs:complexType>
     </xs:element>

     <xs:complexType name="xInt_Type">
       <xs:simpleContent>
         <xs:extension base="xs:int">
           <xs:attribute name="kind" type="xs:string" />
         </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

     <xs:complexType name="xString_Type">
       <xs:simpleContent>
         <xs:extension base="xs:string">
           <xs:attribute name="kind" type="xs:string" />
         </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

  </xs:schema>

Please note the presence of the XSD 1.1 instruction xs:alternative (which is newly introduced in XSD 1.1, and makes this schema a type alternative scenario) within the declaration of element "x" in the schema [2] above. If the value of the "kind" attribute on element "x" is 'int', then the schema type xInt_Type will be assigned to element "x". If the attribute "kind" is not present on element "x", or if it is present and its value is not 'int', the schema type xString_Type gets assigned to element "x".
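
To make the three cases concrete, here is a small variant of the instance document, annotated with the type that the rules above would select for each "x". It is an untested sketch, and the value 'text' for "kind" is only an illustrative choice:
  <root>
    <!-- no "kind" attribute: falls through to xString_Type -->
    <x>hello</x>
    <!-- "kind" present but not 'int': falls through to xString_Type -->
    <x kind="text">10</x>
    <!-- "kind" equals 'int': xInt_Type is selected -->
    <x kind="int">10</x>
  </root>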

Xerces-J successfully validates the above XML document [1] with the given XSD 1.1 Schema [2].

If we introduce the following change to the XML document:
<x kind="int">not an int</x>

Xerces-J would display the following error message:
cvc-datatype-valid.1.2.1: 'not an int' is not a valid value for 'integer'.

The above error message is correct, because the value 'not an int' in the XML document is not a valid xs:int (the message names 'integer', the built-in type from which xs:int is ultimately derived).

Notes:
The schema types specified on xs:alternative instructions need to validly derive (also referred to as "type substitutable" in the XSD 1.1 spec) from the default type specified on the element (which is xs:anyType in this example), or the type on xs:alternative can be xs:error. xs:error is a new schema type defined in the XSD 1.1 spec, and is particularly useful with XSD type alternatives: it has an empty lexical space and an empty value space, so any XML element or attribute that has this type will always be invalid.

So for example, if we write an element declaration like the following (where the type on xs:alternative, xInt_Type, extends xs:int and therefore does not derive from the declared type xs:string):
  <xs:element name="x" type="xs:string" maxOccurs="unbounded">
    <xs:alternative test="@kind='int'" type="xInt_Type" />
  ...

Xerces-J would return the following error message:
e-props-correct.7: Type alternative 'xInt_Type' is not xs:error or is not validly derived from the type definition, 'string', of element 'x'.
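
For contrast, here is a sketch of a declaration that should satisfy the derivation rule, because the alternative type restricts the declared type. The type name xShortString_Type and the value 'short' are hypothetical, introduced only for this illustration, and the fragment has not been run against Xerces-J:
  <!-- hypothetical type: restricts xString_Type, so it validly derives from it -->
  <xs:complexType name="xShortString_Type">
    <xs:simpleContent>
      <xs:restriction base="xString_Type">
        <xs:maxLength value="5" />
      </xs:restriction>
    </xs:simpleContent>
  </xs:complexType>

  <!-- declared type is xString_Type; the alternative is derived from it -->
  <xs:element name="x" type="xString_Type" maxOccurs="unbounded">
    <xs:alternative test="@kind='short'" type="xShortString_Type" />
  </xs:element>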

Making use of type xs:error, in CTAs:
Let's assume that the XML document remains the same as document [1], and that the declaration of element "x" is now written like the following:
  <xs:element name="x" type="xs:anyType" maxOccurs="unbounded">
    <xs:alternative test="@kind='int'" type="xInt_Type" />
    <xs:alternative type="xs:error" />
  </xs:element>

Now Xerces returns an error message like the following:
cvc-datatype-valid.1.2.1: 'hello' is not a valid value for 'error'.

For this particular example, this error would occur if the attribute "kind" is not present, or if the attribute "kind" is present and its value is not 'int'.
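
Conversely, an instance along the following lines should validate against this declaration, since every "x" carries kind="int" with an integer value and the xs:error alternative is never selected. This is an untested sketch; the values are illustrative:
  <root>
    <!-- both elements match @kind='int', so xInt_Type is assigned to each -->
    <x kind="int">10</x>
    <x kind="int">20</x>
  </root>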

Xerces-J CTA implementation, using PsychoPath XPath 2 engine:
The XSD 1.1 spec defines a small XPath 2 language subset to be used by XSD 1.1 CTA instructions. Xerces-J has a native implementation of this XPath 2 subset (implemented by Hiranya Jayathilaka, a fellow Xerces-J committer), which Xerces selects as the default XPath 2 processor if the CTA XPath 2 expressions conform to this subset. This was designed into Xerces to make XPath 2 evaluation efficient for the CTA subset, since evaluating every XPath 2 expression with the PsychoPath engine could have been computationally expensive.

But if the XSD CTA XPath 2 expressions cannot be compiled by the native Xerces-J CTA XPath 2 subset implementation, Xerces will attempt to use the PsychoPath XPath engine to evaluate them, as a fallback option (and also to enable users to use the full XPath 2 language with the Xerces CTA implementation, if they want to).

To test that the PsychoPath engine does work with the Xerces CTA implementation, I modified the type alternative instruction in the XSD example [2] above to the following:
  <xs:alternative test="@kind='int' and (tokenize('xxx xx', '\s+')[1] eq 'xxx')" type="xInt_Type" />
I added a dummy XPath "and" clause, which Xerces can only evaluate if the PsychoPath engine handles this XPath expression. This additional "and" clause doesn't make any difference to the validity of the XML document [1], as in this example it always evaluates to a boolean "true". If we change the clause to, say, the following:
  tokenize('xxx xx', '\s+')[1] eq 'xx'
(please note the change from eq 'xxx' to eq 'xx'), the clause evaluates to a boolean "false", so xInt_Type is no longer selected for <x kind="int"> and the element falls through to the next alternative. With xs:error as that fallback alternative, Xerces would report an XML validity error, which is really expected of the Xerces CTA implementation.
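
As another illustration of a test that would presumably need the PsychoPath fallback, here is a sketch using the XPath 2 function matches(). It assumes that matches(), like tokenize(), falls outside the CTA XPath subset; the regular expression and the accepted values 'int' and 'integer' are illustrative only, and the fragment has not been run against Xerces-J:
  <xs:element name="x" type="xs:anyType" maxOccurs="unbounded">
    <!-- selects xInt_Type when "kind" is 'int' or 'integer' -->
    <xs:alternative test="matches(@kind, '^int(eger)?$')" type="xInt_Type" />
    <!-- default alternative for every other case -->
    <xs:alternative type="xString_Type" />
  </xs:element>
Under XPath 2 semantics, matches() treats an absent "kind" attribute as a zero-length string, so the test is false and the default alternative applies.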

I hope that this post was useful.
