Sunday, December 13, 2009

XSD 1.1: a few more assertions and CTA use cases

In my quest to test Xerces-J's XSD 1.1 implementation, I've come up with another example using XSD 1.1 assertions and CTA (type alternatives), which I'd like to share here.

Here's a fictitious use case. A discussion and analysis of the XSD options for solving it follows later in this post.

XML document [1]:
  <shapes>
    <polygon kind="square">
      <a>10</a>  
      <b>10</b>
      <c>10</c>
      <d>10</d>
    </polygon>
    <polygon kind="rectangle">
      <a>10</a>  
      <b>8</b>
      <c>10</c>
      <d>8</d>
    </polygon>
    <polygon kind="triangle">
      <a>5</a>  
      <b>10</b>
      <c>15</c>
    </polygon>
  </shapes>

XML document [2]:
  <shapes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <polygon kind="square" xsi:type="Quadrilateral">
      <a>10</a>  
      <b>10</b>
      <c>10</c>
      <d>10</d>
    </polygon>
    <polygon kind="rectangle" xsi:type="Quadrilateral">
      <a>10</a>  
      <b>8</b>
      <c>10</c>
      <d>8</d>
    </polygon>
    <polygon kind="triangle">
      <a>5</a>  
      <b>10</b>
      <c>15</c>
    </polygon>
  </shapes>

XSD 1.1 Schema [3]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Triangular" maxOccurs="unbounded">
             <xs:alternative test="@kind = ('square', 'rectangle')" type="Quadrilateral" />
           </xs:element>
         </xs:sequence>   
       </xs:complexType>    
    </xs:element>

    <xs:complexType name="Triangular">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
       </xs:sequence>
       <xs:attribute name="kind" type="xs:string" use="required" />    
    </xs:complexType>

    <xs:complexType name="Quadrilateral">
       <xs:complexContent>
         <xs:extension base="Triangular">
           <xs:sequence>
             <xs:element name="d" type="xs:positiveInteger" />
           </xs:sequence>
           <xs:assert test="if (@kind = 'square') then (a = b and b = c and c = d) else true()" />
           <xs:assert test="if (@kind = 'rectangle') then (a = c and b = d) else true()" />
         </xs:extension>
       </xs:complexContent>
    </xs:complexType>

  </xs:schema>

XSD 1.1 Schema [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Polygon" maxOccurs="unbounded" />
         </xs:sequence>   
       </xs:complexType>    
    </xs:element>

    <xs:complexType name="Polygon">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
         <xs:element name="d" type="xs:positiveInteger" minOccurs="0" />
       </xs:sequence>
       <xs:attribute name="kind" type="xs:string" use="required" />
       <xs:assert test="if (@kind = 'triangle') then not(d) else true()" />
       <xs:assert test="if (@kind = 'square') then (a = b and b = c and c = d) else true()" />
       <xs:assert test="if (@kind = 'rectangle') then (a = c and b = d) else true()" />    
    </xs:complexType>

  </xs:schema>

XSD 1.1 Schema [5]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Polygon" maxOccurs="unbounded" />
         </xs:sequence>   
       </xs:complexType>
    </xs:element>

    <xs:complexType name="Polygon">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
         <xs:element name="d" type="xs:positiveInteger" minOccurs="0" />
       </xs:sequence>
       <xs:attribute name="kind" use="required">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="square" />
              <xs:enumeration value="rectangle" />
              <xs:enumeration value="triangle" />
            </xs:restriction>
          </xs:simpleType>
       </xs:attribute>
    </xs:complexType>

  </xs:schema>

XSD 1.1 Schema [6]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="shapes">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="polygon" type="Polygon" maxOccurs="unbounded" />
         </xs:sequence>   
       </xs:complexType>    
    </xs:element>

    <xs:complexType name="Polygon">
       <xs:sequence>
         <xs:element name="a" type="xs:positiveInteger" />
         <xs:element name="b" type="xs:positiveInteger" />
         <xs:element name="c" type="xs:positiveInteger" />
       </xs:sequence>
       <xs:attribute name="kind" use="required">
         <xs:simpleType>
           <xs:restriction base="xs:string">
             <xs:enumeration value="square" />
             <xs:enumeration value="rectangle" />
             <xs:enumeration value="triangle" />
           </xs:restriction>
         </xs:simpleType>
       </xs:attribute>
    </xs:complexType>
 
    <xs:complexType name="Quadrilateral">
       <xs:complexContent>
         <xs:extension base="Polygon">
           <xs:sequence>
             <xs:element name="d" type="xs:positiveInteger" />
           </xs:sequence>
         </xs:extension>
       </xs:complexContent>
    </xs:complexType>

  </xs:schema>

The goal of this use case is the following:
To define an XSD content model for the XML document [1].

Solution of the use case, and analysis:
The XSD 1.1 way of solving this would be Schema [3] or Schema [4]. (These are the solutions that come to my mind. There could be others too.)

Schema [3] uses both CTA and assertions, whereas Schema [4] uses only assertions. To solve this particular use case, I would probably prefer Schema [4], because its content model is simpler and smaller: it defines fewer schema types (only the 'Polygon' type) and achieves the remaining validation objectives through assertions within that type.

That said, Schema [3] is also a useful solution to this problem. In my view it shows better XSD type modularity, and it offers better possibilities for reusing the types defined here in other contexts or use cases.
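
The type selection that Schema [3] relies on can be pictured with a small Java sketch. This is only an analogy: TypeAlternativeSketch and selectType are hypothetical names made up for illustration, not Xerces APIs. The method mimics the single xs:alternative on "polygon", falling back to the declared type "Triangular" when the test does not match.

```java
// Hypothetical illustration only; not part of Xerces or any XSD API.
final class TypeAlternativeSketch {

    // Mirrors: <xs:alternative test="@kind = ('square', 'rectangle')" type="Quadrilateral" />
    // When the test is false, the element keeps its declared type, Triangular.
    static String selectType(String kind) {
        if ("square".equals(kind) || "rectangle".equals(kind)) {
            return "Quadrilateral";
        }
        return "Triangular";
    }

    public static void main(String[] args) {
        System.out.println(selectType("square"));    // prints "Quadrilateral"
        System.out.println(selectType("rectangle")); // prints "Quadrilateral"
        System.out.println(selectType("triangle"));  // prints "Triangular"
    }
}
```

As far as I can tell from the schema, one consequence is that any other "kind" value is also governed by "Triangular", because "kind" is a plain xs:string in Schema [3].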

But my gut feeling is to go for Schema [4] for this use case :)

Next, I tried to think about how to solve this use case the XSD 1.0 way. Here is what comes to mind (with some of my analysis):
1. Write an XSD schema like number [5] above. This is close to the desired solution for the use case described in this post, but it doesn't solve the problem completely, because it doesn't strictly enforce the properties of a triangle (has 3 sides), a square (has 4 sides, all equal) or a rectangle (has 4 sides, with opposite sides equal). This is where XSD assertions are really needed if we want to specify XML validation entirely in the XSD layer. (I think specifying much of the XML validation in the XSD layer is good from an application design point of view: constraints specified with assertions are entirely declarative, and can be easily specified or modified by the people responsible for maintaining business rules, without having to write procedural code for these kinds of validations in imperative/OO languages like Java.)
2. Modify the XML instance to something like [2], i.e. make use of the XSD 1.0 construct xsi:type (which needs to be specified in the XML instance document), and validate it with a schema like [6]. This solution again doesn't enforce the properties of the different kinds of polygons described in point 1 (and I think with XSD 1.0 we cannot do so). It also makes the XML document XSD-specific (because it contains the xsi:type attribute from the XML Schema instance namespace), which makes it inconvenient to use such an XML document in environments where XSD is not available, or where XSD processing is not needed.
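
For contrast with the declarative assertions in Schema [4], here is a minimal Java sketch of the kind of procedural check they stand in for. The names PolygonRules and isValid are hypothetical, the side lengths are assumed to have been parsed from the XML already, and this is not how Xerces evaluates assertions.

```java
// Hypothetical helper (not part of Xerces or any XSD API): a hand-written
// equivalent of the content model and the three xs:assert rules in Schema [4].
final class PolygonRules {

    // sides holds the values of a, b, c and, when present, d (in that order).
    static boolean isValid(String kind, int... sides) {
        // Content model: a, b and c are required; d is optional.
        if (sides.length < 3 || sides.length > 4) {
            return false;
        }
        // Every side is declared as xs:positiveInteger.
        for (int side : sides) {
            if (side <= 0) {
                return false;
            }
        }
        boolean hasD = sides.length == 4;
        int a = sides[0], b = sides[1], c = sides[2];

        switch (kind) {
            case "triangle":
                // if (@kind = 'triangle') then not(d) else true()
                return !hasD;
            case "square":
                // if (@kind = 'square') then (a = b and b = c and c = d) else true()
                return hasD && a == b && b == c && c == sides[3];
            case "rectangle":
                // if (@kind = 'rectangle') then (a = c and b = d) else true()
                return hasD && a == c && b == sides[3];
            default:
                // Schema [4] places no assertion on other kind values.
                return true;
        }
    }

    public static void main(String[] args) {
        // The three polygons from XML document [1].
        System.out.println(isValid("square", 10, 10, 10, 10));  // prints "true"
        System.out.println(isValid("rectangle", 10, 8, 10, 8)); // prints "true"
        System.out.println(isValid("triangle", 5, 10, 15));     // prints "true"
    }
}
```

Every rule change here means editing and recompiling code, which is the maintenance cost the assertion-based schema avoids.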

The solutions presented in this post are some of the possible ways in which the given problem might be solved. I can imagine that there could be a few other possibilities too (from an XSD syntax point of view) for solving such a use case.

That's all I wanted to write for the moment :)

I hope that this post was useful!

Thursday, December 10, 2009

Xerces-J: XSD 1.1 assertions implementation updates

There have been some improvements lately to the XSD 1.1 assertions support in Xerces-J.

Here is a summary of the recent assertion implementation changes in Xerces-J:

1) XPath 2 expressions in assertion facets should not access the XPath 2 context, because the XPath context is "undefined" during assertion facet evaluation

This implies that the right way to write an assertion facet is as follows:
  <xs:simpleType>
    <xs:restriction base="xs:int">
      <xs:assertion test="$value mod 2 = 0" />
    </xs:restriction>
  </xs:simpleType>

(i.e. we need to use the XPath "dynamic context" variable $value to access the XSD simple type value.)

If an attempt is made to access the XPath context in the above XPath expression, say as follows (using the expression "." here):
<xs:assertion test=". mod 2 = 0" />

Xerces returns an error message like the following:
test.xml:4:21:cvc-assertion.4.3.15.3: Assertion evaluation ('. mod 2 = 0') for element 'x (attribute => a)' with type '#anonymous' did not succeed (undefined context).

An XPath expression like the following:
./@a mod 2 = 0

would result in a similar error.

A special error message (reporting "undefined context" to the user) was added to Xerces for this case.

2) Ability to evaluate assertions on XML attributes

If attributes in an XML document use user-defined XSD simple types, then assertions also apply to those attributes, as they do for XML elements.

The following is a small example of this use case.

XML document:
  <Example>
    <x a="210">101</x>
  </Example>

Corresponding XSD 1.1 schema:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Example">
       <xs:complexType>
         <xs:sequence>
           <xs:element name="x" type="X_Type" maxOccurs="unbounded" />
         </xs:sequence>
       </xs:complexType>
    </xs:element>
 
    <xs:complexType name="X_Type">
       <xs:simpleContent>
          <xs:extension base="xs:int">
             <xs:attribute name="a">
                <xs:simpleType>
                   <xs:restriction base="xs:int">
                     <xs:assertion test="$value mod 2 = 0" />
                   </xs:restriction>
                </xs:simpleType>
             </xs:attribute>
          </xs:extension>
       </xs:simpleContent>
     </xs:complexType>

  </xs:schema>

Please note how we specify a user-defined XSD simple type for attribute "a" above, with an assertion facet on the simple type (there could be 0-n assertion facets here, as we have seen earlier).

The assertion facet's XPath expression, $value mod 2 = 0, operates on the variable $value (which holds the attribute's value). Such an assertion facet doesn't have access to the XPath context (Xerces flags an "undefined context" error if an attempt is made to access it).
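
One way to picture this restriction is as a plain Java method that receives only the value and has no handle on the element or attribute node it came from. The sketch below is an analogy under that assumption, not Xerces code; AssertionFacetSketch and valueIsEven are hypothetical names.

```java
// Hypothetical illustration only; not part of Xerces or any XSD API.
final class AssertionFacetSketch {

    // Analogue of <xs:assertion test="$value mod 2 = 0" />: the only input is
    // the simple type value itself, with no access to the surrounding node.
    static boolean valueIsEven(int value) {
        return value % 2 == 0;
    }

    public static void main(String[] args) {
        // Attribute a="210" from the example document satisfies the facet.
        System.out.println(valueIsEven(210)); // prints "true"
        // 101 (the content of <x>) would not satisfy it, but in the schema
        // above the facet is declared only on the type of attribute "a".
        System.out.println(valueIsEven(101)); // prints "false"
    }
}
```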

I hope that this post was useful.

Saturday, December 5, 2009

XPath 2.0: PsychoPath XPath processor update

I've just run all of the W3C test-suite tests for the PsychoPath XPath 2 processor (an Eclipse Web Tools Source Editing sub-project), and here are the results:

Tests: 8143
Errors: 0
Failures: 0

So it seems the PsychoPath XPath engine passes 100% of the W3C XPath 2.0 test suite, and some of its own tests.

This should be a moment of cheer, and wow!

It also looks like the upcoming Xerces-J release, 2.10.0 (ref: http://wiki.apache.org/xerces/November2009), will be getting an almost fully compliant XPath 2.0 engine for XSD 1.1 assertions and CTA.

Ref: An earlier post about PsychoPath status: http://mukulgandhi.blogspot.com/2009/09/psychopath-xpath-20-processor-update.html.