Saturday, November 28, 2009

Xerces-J: XSD 1.1 assertions on simple types

This post gives a few examples of assertions on XSD "simple types" and on "complex types with simple content", and tests them with the Xerces-J XSD 1.1 implementation. The previous couple of posts on this blog described assertions on XSD complex types with complex content.

1) Here's an example taken from Roger L. Costello's collection of XSD 1.1 examples, which he has published on his web site:

XML document [1]:
  <Example>
    <even-integer>100</even-integer>        
  </Example>

XSD 1.1 document [2]:
  <schema xmlns="http://www.w3.org/2001/XMLSchema"
          elementFormDefault="qualified">

    <element name="Example">
       <complexType>
          <sequence>
             <element name="even-integer">
                <simpleType>
                  <restriction base="integer">
                     <assertion test="$value mod 2 = 0" />
                  </restriction>
                </simpleType>
             </element>
          </sequence>
       </complexType>
    </element>

  </schema>

The above XSD 1.1 schema [2] constrains xs:integer values to even ones only (this works fine with Xerces!). The example uses a new facet, named assertion, which XSD 1.1 defines for simple types.

Please note that the "assertion" facet (applicable both to simple types and to complex types with simple content) is conceptually different from the "assert" constraint on complex types (some of the explanation for this is given below).

The XSD 1.1 spec says that the XPath 2 "dynamic context" of an assertion gets augmented with a variable, $value. The XSD type of $value is that of the base simple type (in this example, xs:integer). The detailed rules for using $value in XSD 1.1 schemas are given in the XSD 1.1 spec.

It looks to me that the assertion facet on simple types significantly enhances the XSD author's ability to constrain simple type values in ways that were not possible in XSD 1.0 (for example, XSD 1.0 could not constrain integer values to be even).

For the above example, we could also specify assertions like this:
<assertion test="$value mod 2 = 0" />
<assertion test="$value lt 500" />
(i.e., a set of two assertion facet instances)

Or, if the user wishes, a single assertion facet instance that realizes the same objective: <assertion test="($value mod 2 = 0) and ($value lt 500)" />

This enforces that the simple type value is even and also less than 500. There is no limit on the number of assertion facet instances that can be specified. In my opinion, the ability to specify any number of assertion facets (and also assert constraints on complex types) makes assertions an extremely powerful XSD validation construct.

Note: Interestingly, the following facet definition achieves the same result as the 2nd assertion facet instance described above:
<maxExclusive value="500" />
(this was available in XSD 1.0 as well)
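
As an untested sketch of how the two kinds of facet could sit side by side, the range check can stay in the XSD 1.0 facet while the assertion carries only the part XSD 1.0 could not express. The type name small-even-integer is made up for this illustration. The xs: prefix is used here, rather than the default namespace of schema [2], so that the unprefixed type reference resolves to the schema's own type:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- hypothetical type name, used only for this illustration -->
  <xs:simpleType name="small-even-integer">
    <xs:restriction base="xs:integer">
      <!-- XSD 1.0 facet: the value must be below 500 -->
      <xs:maxExclusive value="500" />
      <!-- XSD 1.1 facet: the value must be even -->
      <xs:assertion test="$value mod 2 = 0" />
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="even-integer" type="small-even-integer" />

</xs:schema>
```

With such a schema, <even-integer>100</even-integer> would be expected to pass, while 101 (odd) and 500 (not below 500) would be expected to fail.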

2) Complex types with simple contents, using assertions:
XML document [3]:
  <root>
    <x label="a">2</x>
    <x label="b">4</x>
  </root>

Here, the element "x" should have an attribute "label" of type xs:string, but the content of element "x" is simple (of type xs:int in this example).
Additionally, we want the simple content value of "x" to be an even number.

The XSD document for these validation constraints is as follows [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="root">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   
   <xs:complexType name="X_Type">
     <xs:simpleContent>
        <xs:extension base="xs:int">    
          <xs:attribute name="label" type="xs:string" />
          <xs:assert test="$value mod 2 = 0" />
        </xs:extension>
     </xs:simpleContent>
   </xs:complexType>
  
  </xs:schema>

This example relies on the xs:assert instruction.

It's interesting to see what happens if we change the value of one of the "x" elements as follows:
<x label="a">21</x>
(I changed the first "x")

Xerces fails the validation of the XML instance, and returns the following error message to the user:
test.xml:2:22:cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'x' with type 'X_Type' did not succeed.

Here the XML validation did not succeed, because the value 21 is not an even number.
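
An alternative way to get the same effect, shown here only as an untested sketch, is to put the even-number check on a named simple type as an xs:assertion facet and then extend that type to add the attribute. The type name EvenInt is made up for this illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- hypothetical simple type: the facet constrains the atomic value -->
  <xs:simpleType name="EvenInt">
    <xs:restriction base="xs:int">
      <xs:assertion test="$value mod 2 = 0" />
    </xs:restriction>
  </xs:simpleType>

  <!-- the complex type only adds the attribute to the constrained value -->
  <xs:complexType name="X_Type">
    <xs:simpleContent>
      <xs:extension base="EvenInt">
        <xs:attribute name="label" type="xs:string" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```

The design difference is that the check here belongs to the value's type and can be reused by other elements and attributes, whereas the xs:assert in schema [4] belongs to the complex type "X_Type" alone.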

3) The last example of this post is the following:
It again describes complex types with simple content, but here the simple content gets its value by "restriction of a complex type". The previous example described complex types with simple content using extension.

The XML file remains the same [3], and the new XSD document is the following [5]:
 <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="root">
     <xs:complexType>
       <xs:sequence>
         <xs:element name="x" maxOccurs="unbounded" type="X_Type" />
       </xs:sequence>
     </xs:complexType>
   </xs:element>
   
   <xs:complexType name="X_Type">
     <xs:simpleContent>
        <xs:restriction base="x_base">      
           <xs:assertion test="$value mod 2 = 0" />
           <xs:assert test="@label = ('a','b')" />
        </xs:restriction>
     </xs:simpleContent>
   </xs:complexType>
   
   <xs:complexType name="x_base">
     <xs:simpleContent>
        <xs:extension base="xs:int">    
          <xs:attribute name="label" type="xs:string" />
        </xs:extension>
     </xs:simpleContent>
   </xs:complexType>
  
 </xs:schema>

Please notice how assertions are specified on the complex type "X_Type". Here we have two assertion instructions (xs:assertion and xs:assert). In this example, xs:assertion is a facet on the atomic value of the complex type (the value of the complex type is simple in this case!), while xs:assert is the assertion instruction on the complex type itself (which has access to the element tree).

The complexType -> simpleContent -> restriction type definition can specify assertions with the following grammar:
... assertion*, ..., assert* (i.e., 0-n xs:assertion components can be followed by 0-n xs:assert components; this ordering matters, otherwise the XSD 1.1 processor will flag an error).
There could be other constructs before xs:assertion here, and some after it, but anything after xs:assertion* needs to come before the trailing xs:assert's. This is described in the relevant XSD 1.1 grammar at http://www.w3.org/TR/2009/CR-xmlschema11-1-20090430/#dcl.ctd.ctsc.
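
To make the division of labour between the two instructions concrete, here is a hypothetical variation of the XML document [3], with comments on which instruction of schema [5] each "x" would be expected to trip. This is a sketch reasoned from the test expressions, not Xerces output:

```xml
<root>
  <!-- passes both checks: 2 is even and the label is 'a' -->
  <x label="a">2</x>
  <!-- expected to fail the xs:assertion facet: 3 is odd -->
  <x label="b">3</x>
  <!-- expected to fail xs:assert: 'c' is not one of ('a','b') -->
  <x label="c">4</x>
</root>
```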

Notes: The XML Schema WG decided to have two different names for assertion instructions (xs:assertion and xs:assert) for this particular scenario, so that XSD schema authors can decide whether they are writing assertions as a facet for simple values, or assertions for complex types (which have access to the element tree). If this naming distinction had not been made in XSD 1.1, specifying assertions in this case would have been ambiguous (i.e., the XSD 1.1 processor could not tell which assertion is a facet and which is an assert for the complex type).

Acknowledgements:
I must mention that the XSD 1.1 examples shared by Roger L. Costello helped us fix quite a few bugs in the Xerces assertions implementation. Our sincere thanks are due to Roger.

References:
1. Readers could also find this article useful, http://www.ibm.com/developerworks/library/x-xml11pt2/ about XSD 1.1 co-occurrence constraints, which describes the XSD 1.1 assertions facility in detail.

I hope that this post was useful.

Friday, November 27, 2009

XSD 1.1: another assertions example with Xerces-J !

Here's another XSD 1.1 assertions example, which I came up with today :)

The XML document looks something like this:
  <person_db>
    <person id="1">
      <fname>john</fname>
      <lname>backus</lname>
      <dob>1995-12-10</dob>
    </person>
    <person id="2">
      <fname>rick</fname>
      <lname>palmer</lname>
      <dob>2001-11-09</dob>
    </person>
    <person id="3">
      <fname>neil</fname>
      <lname>cooks</lname>
      <dob>1998-11-10</dob>
    </person>
  </person_db>

Other than constraining the XML document to a structure like the above, the XSD schema should specify the following additional validation constraints:
1) Each person's dob field should specify a date later than or equal to 1900-01-01.
2) The "person" elements should be sorted numerically by their "id" attribute, in ascending order.

I wanted to achieve these validation objectives entirely with XSD 1.1 assertions. Here's the XSD 1.1 document, which I find works fine with Xerces-J:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
     <xs:element name="person_db">
       <xs:complexType>
          <xs:sequence>
            <xs:element name="person" maxOccurs="unbounded" type="Person" />
          </xs:sequence>
          <xs:assert test="every $p in person[position() lt last()] satisfies
                            ($p/@id lt $p/following-sibling::person[1]/@id)" />
       </xs:complexType>
     </xs:element>
   
     <xs:complexType name="Person">
        <xs:sequence>
          <xs:element name="fname" type="xs:string" />
          <xs:element name="lname" type="xs:string" />
          <xs:element name="dob" type="xs:date" />
        </xs:sequence>
        <xs:attribute name="id" type="xs:int" use="required" />
        <xs:assert test="dob ge xs:date('1900-01-01')" />
     </xs:complexType>
  
   </xs:schema>

Note: It seems that the above XSD validation requirements could also be met with the following changes:
1. Remove the assertion from the complex type "Person".
2. Add an assertion to the element "person_db", which will now look something like the following:
<xs:assert test="every $p in person[position() lt last()] satisfies
($p/@id lt $p/following-sibling::person[1]/@id)" />
<xs:assert test="every $p in person satisfies ($p/dob ge xs:date('1900-01-01'))" />

i.e., we'll now have two assertions on the element "person_db" (which are actually specified on the element's schema type).

Still, I prefer the first solution, as it seems more elegant to me, and each assertion sits where it logically belongs.

I am happy that this particular example worked with Xerces as I expected.
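
For a negative test, a hypothetical instance like the following should be rejected by the schema above. The data is made up and the comments are reasoned from the assertion expressions, not taken from a Xerces run:

```xml
<person_db>
  <person id="2">
    <fname>john</fname>
    <lname>backus</lname>
    <dob>1995-12-10</dob>
  </person>
  <!-- id 1 follows id 2, so the ordering assertion on person_db should fail -->
  <person id="1">
    <fname>rick</fname>
    <lname>palmer</lname>
    <!-- 1899-12-31 is earlier than 1900-01-01, so the assertion on Person should fail -->
    <dob>1899-12-31</dob>
  </person>
</person_db>
```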

I hope that this post was useful.

Friday, November 20, 2009

XSD 1.1: some CTA samples with Xerces-J

I have been writing a few XSD 1.1 Conditional Type Assignment (CTA) samples, and trying to run them with the current Xerces-J schema development SVN code.

To start with, here's the first example (a very simple one indeed), which I find runs fine with Xerces-J.

XML document [1]
  <root>
    <x>hello</x>
    <x kind="int">10</x>
  </root>

XSD 1.1 document [2]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="root">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="x" type="xs:string" maxOccurs="unbounded">
             <xs:alternative test="@kind='int'" type="xInt_Type" />
           </xs:element>
         </xs:sequence>
       </xs:complexType>
     </xs:element>

     <xs:complexType name="xInt_Type">
       <xs:simpleContent>
         <xs:extension base="xs:int">
           <xs:attribute name="kind" type="xs:string" />
         </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

  </xs:schema>

Please note the XSD 1.1 instruction xs:alternative within the declaration of element "x" (this is what makes the schema a type alternative scenario). If the value of the "kind" attribute on element "x" is 'int', then the schema type "xInt_Type" is assigned to element "x". Otherwise the schema type of element "x" is xs:string.

Xerces-J successfully validates the above XML document [1] with the given XSD 1.1 Schema [2].

If we introduce the following change to the XML document:
<x kind="int">not an int</x>

Xerces-J would display the following error messages:
test.xml:3:31:cvc-datatype-valid.1.2.1: 'not an int' is not a valid value for 'integer'.
test.xml:3:31:cvc-complex-type.2.2: Element 'x' must have no element [children], and the value must be valid.


The above error messages are correct, as the value 'not an int' in the XML document is not of type xs:int.

Interestingly, if the XML document is changed to the following:
  <root>
    <x>hello</x>
    <x kind="not-known">10</x>
  </root>

Xerces-J displays the following error messages:
test.xml:3:23:cvc-type.3.1.1: Element 'x' is a simple type, so it cannot have attributes, excepting those whose namespace name is identical to 'http://www.w3.org/2001/XMLSchema-instance' and whose [local name] is one of 'type', 'nil', 'schemaLocation' or 'noNamespaceSchemaLocation'. However, the attribute, 'kind' was found.

I find this error message to be correct. The 1st <x> element validates successfully, because it matches the element declaration's default type assignment, <xs:element name="x" type="xs:string" ... (a), while the 2nd <x> gives a validation error. For the 2nd <x>, the value of the "kind" attribute is the string 'not-known', so no type alternative matches and Xerces assigns this <x> the XSD type xs:string through the default type assignment (a). The value "10" is a valid xs:string, so the content of <x> is fine. But since the type of the 2nd <x> is xs:string, which is an XSD simple type, this <x> cannot have an attribute, and hence Xerces returns an error message.
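
If the intent were to let every <x> carry a "kind" attribute, one possible variation (an untested sketch, not something run against Xerces-J here) is to add a default type alternative, i.e. an xs:alternative without a test attribute, that points to a complex type allowing the attribute. The type name xString_Type is made up for this illustration, and the declared type is changed to xs:anyType on the assumption that both alternatives should derive from the declared type:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="x" type="xs:anyType" maxOccurs="unbounded">
          <xs:alternative test="@kind='int'" type="xInt_Type" />
          <!-- no test attribute: used when no earlier alternative matches -->
          <xs:alternative type="xString_Type" />
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="xInt_Type">
    <xs:simpleContent>
      <xs:extension base="xs:int">
        <xs:attribute name="kind" type="xs:string" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- hypothetical type: string content that may carry a "kind" attribute -->
  <xs:complexType name="xString_Type">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute name="kind" type="xs:string" />
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

</xs:schema>
```

Under this sketch, <x kind="not-known">10</x> would be typed as xString_Type and would be expected to validate, while <x kind="int">not an int</x> would still be rejected.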

Xerces-J CTA implementation, using PsychoPath XPath 2 engine:
The XSD 1.1 spec defines a small XPath 2 language subset to be used by XSD 1.1 CTA instructions. Xerces-J has a native implementation of this XPath 2 subset (implemented by Hiranya Jayathilaka, a fellow Xerces-J committer), which Xerces selects as the default XPath 2 processor if the CTA XPath 2 expressions conform to this subset. This was designed into Xerces to make XPath 2 evaluation efficient for the CTA subset, since evaluating every XPath 2 expression with the PsychoPath engine could have been expensive.

But if the XSD CTA XPath 2 expressions cannot be compiled by the native Xerces-J CTA XPath 2 subset, Xerces will attempt to use the PsychoPath engine to evaluate them as a fallback option (which also enables users to use the full XPath 2 language with the Xerces CTA implementation, if they want to).

To test that the PsychoPath engine does work with the Xerces CTA implementation, I modified the type alternative instruction of the XSD example [2] above to the following:
<xs:alternative test="@kind='int' and (tokenize('xxx xx', '\s+')[1] eq 'xxx')" type="xInt_Type" />
I added a dummy XPath "and" clause, which can only succeed with Xerces if the PsychoPath engine evaluates this XPath expression. This additional "and" clause doesn't make any difference to the validity of the XML document [1], as in this example it always evaluates to a boolean "true". If we introduce an error into the above XPath expression, say by changing it to the following:
tokenize('xxx xx', '\s+')[1] eq 'xx' (please note the change from eq 'xxx' to eq 'xx', which causes this XPath expression to evaluate to a boolean "false"), Xerces reports an XML validity error, which is what is expected of the Xerces CTA implementation.

I hope that this post was useful.

Wednesday, November 18, 2009

XSD 1.1: some XSD 1.1 samples running with Xerces-J

I have been thinking lately of functionally stress testing the upcoming Xerces-J XSD 1.1 preview release (using the SVN code we have now, and later using the public binaries which will be provided by the Xerces project). I am curious to know whether I can find any non-compliant parts in the Xerces-J XSD 1.1 implementation, which could serve as input for improving the Xerces-J XSD 1.1 code base. To start with, I'll try to write a few XSD 1.1 schemas using the XSD 1.1 assertions and conditional type assignment (CTA) instructions.

Assertions examples

Example 1
Sample XML [1]
  <x a="xyz">
    <foo>5</foo>
    <bar>10</bar>
  </x>

XSD 1.1 Schema [2]
(Use Case: "the value of the foo element must be less than or equal to the value of the bar element")
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
    <xs:element name="x">
      <xs:complexType>
         <xs:sequence>
           <xs:element name="foo" type="xs:int" />
           <xs:element name="bar" type="xs:int" />
         </xs:sequence>
         <xs:attribute name="a" type="xs:string" use="required" />
         <xs:assert test="foo le bar" />
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

Using Xerces-J XSD 1.1 validator, the XML document [1] above validates fine with the given XSD document ([2]).

If the assertion is written as follows (this assertion is false for the sample XML; it is here just to check how false assertions and their error messages are handled):
<xs:assert test="(foo + 10) le bar" />

Then the XML instance document ([1] above) becomes invalid, and the following error message is printed by Xerces:
test.xml:4:5:cvc-assertion.3.13.4.1: Assertion evaluation ('(foo + 10) le bar') for element 'x' with type '#anonymous' did not succeed.

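Use Case: "if the value of the attribute "a" is xyz, then the bar and baz elements are required, but otherwise they are optional". (The sample above has foo and bar rather than bar and baz, so the assertion is written against those.)

This would require the following assertion definition:
<xs:assert test="if (@a eq 'xyz') then (foo and bar) else true()" />
(The else branch needs to be the function call true(), so that the assertion passes when "a" has any other value. A bare word such as true would be read by XPath as a child element name test.)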

This works fine with Xerces-J.
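
As a sketch of how this use case could be wired up end to end: in schema [2] both foo and bar are mandatory in the sequence, so the assertion never has anything to decide. The version below assumes they are made optional with minOccurs="0" and writes the else branch as true(), so that the assertion is the only thing requiring them when a is 'xyz'. It is an untested illustration:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="x">
    <xs:complexType>
      <xs:sequence>
        <!-- optional as far as the content model is concerned -->
        <xs:element name="foo" type="xs:int" minOccurs="0" />
        <xs:element name="bar" type="xs:int" minOccurs="0" />
      </xs:sequence>
      <xs:attribute name="a" type="xs:string" use="required" />
      <!-- both children are required only when a is 'xyz' -->
      <xs:assert test="if (@a eq 'xyz') then (foo and bar) else true()" />
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With this schema, <x a="abc"/> would be expected to validate, while <x a="xyz"><foo>5</foo></x> would be expected to fail the assertion because bar is missing.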

Acknowledgements: Thanks to Douglass A Glidden for contributing these use cases, on xml-dev list.

PS: more examples to follow, in the next few posts :)

References:
XSD 1.1 Part 1: Structures
XSD 1.1 Part 2: Datatypes

I must acknowledge (a long acknowledgement, but I must do it anyway :)) that Xerces assertions are really powered by the PsychoPath XPath 2 engine, and the credit for bringing the PsychoPath engine to almost 100% compliance with the W3C XPath 2.0 test suite (as of now, PsychoPath is 99%+ compliant) should largely go to Dave Carver and Jesper Steen Møller. I was fortunate enough to contribute somewhat to the PsychoPath XPath implementation (the freedom given to me as an Eclipse Source Editing project committer, thanks to Dave Carver, helped me drive Xerces assertions development quickly). Needless to mention, the original PsychoPath code was contributed to the Eclipse Foundation by Andrea Bittau and his team. I must also mention that the numerous reviews and improvements suggested by Khaled Noaman, and the general design advice from Michael Glavassevich (both are Xerces committers), helped tremendously while developing Xerces assertions. Finally, I must mention Ken Cai's contribution: he wrote the original Xerces-PsychoPath interface, and also an initial implementation of that interface.

Saturday, November 14, 2009

Xerces-J XSD 1.1 update: bug fixes and enhancements

The Xerces-J team made a few enhancements to the XSD 1.1 implementation, which solve some important XSD namespace URI issues that affected the Xerces assertions and Conditional Type Assignment (CTA) implementations. These changes went into the Xerces-J SVN repository today.

Here is a summary of these improvements:
1) The Xerces-J XSD 1.1 implementation can now pass the XSD language namespace prefix (the one declared on the XSD <schema> element), along with the XSD language URI, as a prefix-URI binding pair to the PsychoPath XPath 2.0 engine. This enhancement allows the XSD language prefix declared on the schema document's <schema> element to be used in assertions and CTA XPath 2.0 expressions, for example as follows:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>
  ...
  <xs:assert test="xs:string(test) eq 'xxx'" />
  ...
</xs:schema>

OR say,

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" ...>
  ...
  <xsd:assert test="xsd:string(test) eq 'xxx'" />
  ...
</xsd:schema>

The previous code in Xerces SVN (before today's commit) hardcoded the XML Schema prefix to "xs" when communicating with the PsychoPath XPath 2 engine interface. As a result, XPath 2 expressions in assertions and CTA that used any other XSD prefix, say "xsd", did not evaluate correctly, even if that prefix was bound to the XSD namespace on the XSD root element, <schema> (before this fix, Xerces always returned false for such assertions).

This was a significant Xerces assertions and CTA bug, which got solved today, and the fix is now available in the Xerces-J XSD 1.1 development SVN repository.

2) Another enhancement which went into the Xerces-J SVN repository today is the ability to specify the XPath 2.0 F&O namespace declaration on the XSD document root element, <schema>.

This enhancement makes something like the following XSD 1.1 schema valid:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions" ...>
  ...
   <xs:assert test="xs:string(test) eq fn:string('xxx')" />
  ...
</xs:schema>

Here the XML Schema author can qualify the XPath 2 function calls in assertion XPath expressions with the XPath 2 F&O namespace prefix, like fn:string('xxx') above. The F&O namespace prefix must be bound to the F&O namespace URI, "http://www.w3.org/2005/xpath-functions", for such an XSD schema to be valid.

The following XSD 1.1 schema is also valid (this happened to work correctly even before this Xerces SVN commit):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>
  ...
   <xs:assert test="xs:string(test) eq string('xxx')" />
  ...
</xs:schema>

Here the XML Schema author can use XPath 2 functions in Xerces assertions without specifying any prefix, like string('xxx') in the above example. Unprefixed function calls work correctly for all the XPath 2.0 built-in functions in Xerces assertion XPath 2 expressions.
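
Putting the fragments above together, a complete schema might look like the sketch below. The element name "doc" and the second assertion are made up for this illustration, and the schema has not been run against Xerces-J here. It mixes an xs: constructor call, an fn:-prefixed function call and an unprefixed function call:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:fn="http://www.w3.org/2005/xpath-functions">

  <!-- "doc" is a hypothetical wrapper element for the "test" child -->
  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="test" type="xs:string" />
      </xs:sequence>
      <!-- xs: constructor on the left, fn:-prefixed function on the right -->
      <xs:assert test="xs:string(test) eq fn:string('xxx')" />
      <!-- unprefixed call to a built-in XPath 2.0 function -->
      <xs:assert test="string-length(test) eq 3" />
    </xs:complexType>
  </xs:element>

</xs:schema>
```

An instance such as <doc></doc> would be expected to satisfy both assertions.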

World community grid

There is a nice initiative called "world community grid". I think IBM sponsors this community computing grid. I have been participating in this grid for quite a few days now, and it really works! I believe it does make a difference for the good of the community.

The grid is composed of computers that can be ordinary personal computers at home or at the office, or any kind of computer that can connect to the web. When a grid client is connected to the web, after user authentication, the client computer participates in numerous public computing projects. Joining the grid lets us donate our computer's processing power to the computations needed by these public projects, which mostly require massive computing simulations in a short time.

Joining the grid doesn't disrupt normal user activity on client computers. The grid client uses memory and the CPU intelligently (it needs very little memory while it works, normally as little as 5-10 MB), without disrupting the user's personal activities. It is also possible to configure how the grid uses one's CPU. Somebody may want to work in the default mode, or give more CPU usage to the grid project tasks. The default mode works well for me.

All these details, and much more, are available on the "world community grid" web page.

Friday, November 13, 2009

XML spec and XSD

A few days ago, I started off a pretty lengthy discussion on the xml-dev mailing list about the following topic:

"Should the W3C XML specification specify XML Schema (a.k.a. XSD) also as an XML validation language, as it specifies DTD (Document Type Definition)?"

The XML spec seems to convey that an XML document is valid *only* if it's valid according to a DTD. I contested this point, and started off a debate on the xml-dev list about it. I argued that since there are now newer XML validation languages like XSD, RelaxNG, Schematron etc., the XML spec could modify its definition of validity to refer to other XML schema languages as well, rather than saying that an XML document is valid *only* if a DTD is associated with it.

Unfortunately, many people who spoke on xml-dev, and who have been working with XML for a long time, did not agree with this idea. But alas, I still feel I had/have a valid point about this :(

I am referring to this threaded discussion here for the record of this blog. Anybody who wants to read the whole discussion can find it in the xml-dev list archives.