Wednesday, February 24, 2010

XSD 1.1: some more assertions fun

Here are some more XSD 1.1 assertion examples (interesting ones, I guess) that I tried running with the Xerces-J XSD 1.1 implementation (they run fine with Xerces!):

Example 1 [1]:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="test" type="X" />
   
    <xs:complexType name="X">
      <xs:group ref="List1" />
      <xs:assert test="a and b and d" />
    </xs:complexType>
   
    <xs:complexType name="Y">
      <xs:group ref="List1" />
      <xs:assert test="a and b and c and d" />
    </xs:complexType>
   
    <xs:group name="List1">
       <xs:sequence>
         <xs:element name="a" type="xs:string" minOccurs="0"/>
         <xs:element name="b" type="xs:string" minOccurs="0"/>
         <xs:element name="c" type="xs:string" minOccurs="0"/>
         <xs:element name="d" type="xs:string" minOccurs="0"/>
       </xs:sequence>
    </xs:group>
           
  </xs:schema>

The corresponding XML instance document is:
<test>
    <a>hello</a>
    <b>world</b>
    <!--<c>hello..</c>-->
    <d>world..</d>
  </test>

Here's the rationale/goal that motivated me to write this XSD sample:
I wanted to define a pair of XSD complex types (something like X and Y above), such that one type could reuse the element particles of the other. Had this been solvable with XSD type derivation (which I attempted initially), I wanted only one element in the derived type to become optional (element "c" in this example, i.e. with minOccurs = 0 and maxOccurs = 1), while the other elements inherited from the base type kept the same occurrence indicator (i.e. mandatory, which is minOccurs = maxOccurs = 1).

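Interestingly, this problem is unsolvable with XSD type derivation (by either complex type extension or restriction): extension can only append particles to the base type's content model, and restriction cannot relax a mandatory element into an optional one.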
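To make the failed derivation attempt concrete, here is a rough sketch of what it might look like with restriction. The type names "Base" and "Derived" are hypothetical and not part of the samples in this post. A conforming schema processor is expected to reject "Derived", because a restriction may only accept a subset of what its base type accepts, and here the derived type would also accept instances without "c".

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical base type: all four elements are mandatory -->
  <xs:complexType name="Base">
    <xs:sequence>
      <xs:element name="a" type="xs:string"/>
      <xs:element name="b" type="xs:string"/>
      <xs:element name="c" type="xs:string"/>
      <xs:element name="d" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Intentionally invalid: making "c" optional widens the content
       model, so this is not a legal restriction of "Base" -->
  <xs:complexType name="Derived">
    <xs:complexContent>
      <xs:restriction base="Base">
        <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string"/>
          <xs:element name="c" type="xs:string" minOccurs="0"/>
          <xs:element name="d" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

Going the other way (a base type with optional "c" and a derived type that makes it mandatory) is a legal restriction, but it requires repeating the whole content model in the derived type, which defeats the reuse goal.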

For this schema use case, I came up with the XSD sample above [1], which meets my goal of reusing the element particles across the XSD types. Schema [1] defines a global group which contains a sequence of XML element declarations. All of the elements in the group are marked as optional. Within the complex types (X and Y), the cardinality of the elements (0-1 or 1-1) is enforced with XSD assertions. Declaring all elements in the group as optional allows us to reuse this list easily in different XSD types, as we can constrain the elements (say, controlling their cardinality, or even the contents of elements/attributes) differently in each context/type using assertions.

Using schema [1], if one wants an XSD type where element "c" is optional, one would use type "X"; if one wants an XSD type where all elements are mandatory, one would use type "Y".
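As a small illustration of the "Y" case: schema [1] only declares an element of type "X", so the sketch below assumes one extra, hypothetical declaration is added to it, namely an element "test2" of type "Y". With that assumption, an instance has to carry all four children, because the assertion on "Y" requires "a", "b", "c" and "d" to be present.

```xml
<!-- Assumes schema [1] additionally declares a hypothetical element:
     xs:element name="test2" type="Y" -->
<test2>
    <a>hello</a>
    <b>world</b>
    <c>hello..</c>
    <d>world..</d>
</test2>
```

Dropping the "c" element here should make the assertion on "Y" fail, whereas the same content under an element of type "X" would still be valid.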

Having solved the use case I had in mind (explained above), I wrote another schema using some more assertions, just for fun.

Here's the 2nd XSD schema:

Example 2 [2]:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="test" type="X" />
   
     <xs:complexType name="X">
       <xs:group ref="List1" />
       <xs:assert test="a and b and d" />
     </xs:complexType>
   
     <xs:complexType name="Y">
       <xs:group ref="List1" />
       <xs:assert test="a and b and c and d" />
     </xs:complexType>
   
     <xs:group name="List1">
        <xs:sequence>
           <xs:element name="a" minOccurs="0">
             <xs:complexType>
               <xs:sequence>
                 <xs:element name="a1" type="xs:string" maxOccurs="unbounded" />
               </xs:sequence>
               <xs:attribute name="aCount" type="xs:nonNegativeInteger" />
               <xs:assert test="count(a1) eq @aCount" />
             </xs:complexType>
           </xs:element>
           <xs:element name="b" type="xs:string" minOccurs="0"/>
           <xs:element name="c" type="xs:string" minOccurs="0"/>
           <xs:element name="d" type="xs:string" minOccurs="0"/>
        </xs:sequence>
     </xs:group>
           
  </xs:schema>

Schema [2] is conceptually similar to schema [1]. The only difference between the two is that in schema [2] element "a" has complex content, while in schema [1] element "a" is defined to have simple content (which is xs:string). In schema [2], the complex type of element "a" defines another assertion (which enforces the constraint that the value of attribute "aCount" equals the number of "a1" children of element "a"). This assertion is written only to visually increase the complexity of element a's definition (of course, it also increases the functional complexity of element "a", and consequently the complexity of the contents of the global group definition in the 2nd schema).
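For completeness, here is an instance document that I would expect to be valid against schema [2]. The text values are arbitrary. The "test" element has type "X", so "a", "b" and "d" must be present, and the value of "aCount" must match the number of "a1" children.

```xml
<test>
    <!-- aCount is 2 and there are two a1 children -->
    <a aCount="2">
        <a1>hello</a1>
        <a1>world</a1>
    </a>
    <b>world</b>
    <d>world..</d>
</test>
```

Note that "aCount" is declared as an optional attribute, but if it is omitted the comparison in the assertion yields an empty sequence, which evaluates to false. In effect, the assertion makes the attribute mandatory.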

The 2nd schema illustrates that a more functionally complex list of particles (a, b, c and d here) benefits more from the schema component reuse technique (accomplished with an XSD group and assertions) illustrated in this post.

I hope that this post is useful.

Sunday, February 14, 2010

Xerces-J, XSD 1.1 assertions: complexType -> simpleContent -> restriction

XSD 1.1 complex types are specified by the grammar given here, in the XSD 1.1 spec:
http://www.w3.org/TR/xmlschema11-1/#declare-type

XSD complex type definitions are essentially composed of three mutually exclusive alternatives, as follows:
  <complexType ...>
    simpleContent |
    complexContent |
    (openContent?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </complexType>
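To illustrate the three alternatives, here is a minimal sketch with one complex type per form. All type, element and attribute names ("Base", "Price", "Extended", "id", "note", "currency") are made up for this illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Form 3: the shorthand form, a content model followed by an assert -->
  <xs:complexType name="Base">
    <xs:sequence>
      <xs:element name="id" type="xs:int"/>
    </xs:sequence>
    <xs:assert test="id gt 0"/>
  </xs:complexType>

  <!-- Form 1: simpleContent, a simple value plus attributes -->
  <xs:complexType name="Price">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <!-- Form 2: complexContent, derivation from another complex type -->
  <xs:complexType name="Extended">
    <xs:complexContent>
      <xs:extension base="Base">
        <xs:sequence>
          <xs:element name="note" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```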

The specification of assertions in complexType -> simpleContent -> restriction is a bit different than in all other assertion cases on complex types (as it can consist of assertion facets as well as, or instead of, assertions on the complex type).

This is specified by the following XSD 1.1 grammar:
  <simpleContent
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (restriction | extension))
  </simpleContent>

  <restriction
    base = QName
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | maxScale | minScale | length | minLength | maxLength | enumeration | whiteSpace | pattern | assertion | {any with namespace: ##other})*)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </restriction>

The XSD definition for xs:restriction above specifies assertions roughly as follows:
assertion*, ..., assert*

Here, xs:assertion (with cardinality 0-n) is a facet for the simple type value (specified by complexType -> simpleContent), whereas xs:assert (with cardinality 0-n) is an assertion definition on the complex type (which has access to the element tree, i.e. the XML element itself and its attributes, if there are any). xs:assertion definitions on complexType -> simpleContent -> restriction do not have access to the element tree (to which the complex type applies), and can only access the simple type value of the element in context (using the implicit assertion variable $value, whose XSD type is specified by the definition <xs:restriction base = QName ...).

Here's a small fictitious example illustrating these concepts:

XML document [1]:
  <A a="15">Example A</A>

XSD 1.1, Schema [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="A">
      <xs:complexType>
        <xs:simpleContent>
          <xs:restriction base="myBase">    
            <xs:assertion test="contains($value, 'Example')" />
            <xs:assert test="@a mod 5 = 0" />    
          </xs:restriction>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
  
    <xs:complexType name="myBase">
      <xs:simpleContent>
         <xs:extension base="xs:string"> 
           <xs:attribute name="a" type="xs:int" />  
         </xs:extension>
      </xs:simpleContent> 
    </xs:complexType>

  </xs:schema>

In schema [2] above, two assertions are specified on the XSD type: xs:assertion, which is a facet for the simple content, and xs:assert, which is an assertion on the complex type.

I believe the above schema is simple and self-explanatory enough to illustrate the points I've tried to explain in this post.
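To see each of the two assertions at work separately, here are two made-up instances (two separate documents, shown together) that I would expect schema [2] to reject, each for a different reason.

```xml
<!-- Document 1: fails the xs:assert on the complex type,
     since 12 mod 5 is 2, not 0 -->
<A a="12">Example A</A>

<!-- Document 2: fails the xs:assertion facet, since the
     simple content does not contain the string 'Example' -->
<A a="15">Sample A</A>
```

The first failure depends on an attribute, which only xs:assert can see. The second depends purely on the simple type value, which is what the xs:assertion facet sees through $value.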

Actually, what prompted me to write this post was a minor bug in complexType -> simpleContent -> restriction facet processing in the Xerces-J XSD 1.1 SVN code, which we fixed today; the fix is now available in the Xerces-J SVN repository.

Interestingly, this fix was present in an earlier Xerces-J SVN revision. But as assertions development went forward, the bug was reintroduced, and it has now been fixed again.

Saturday, February 6, 2010

PsychoPath XPath2 processor update: fn:name() function fix

While writing the following blog post, http://mukulgandhi.blogspot.com/2010/01/xsd-11-wild-cards-in-compositor-and.html (dated Jan 31, 2010) [1], I unearthed a bug in the PsychoPath XPath 2 processor, whereby the XPath2 fn:name() function didn't evaluate properly when called with zero arguments (it raised a "context undefined" exception, even if a context item existed).

This bug led me to use the fn:local-name() function (whose implementation was correct) instead, for the above mentioned blog post [1].

The good news is that this bug with the fn:name() function is now fixed (ref, https://bugs.eclipse.org/bugs/show_bug.cgi?id=301539).

For the example given in the blog post [1], the given XSD 1.1 assertion could now also be written as follows:
  <xs:assert test="(*[1]/name() = ('fname', 'lname')) and 
                 (*[2]/name() = ('fname', 'lname'))" />

(instead of the "local-name" function, as used in the mentioned blog post [1])
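To show the assertion in context, here is a minimal sketch of a schema it could live in. This is not the schema from the blog post [1]. The element name "person" and the use of an xs:any wildcard with exactly two occurrences are assumptions made only for this illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: exactly two children of any name are allowed
       by the wildcard, and the assertion then restricts their names -->
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:any processContents="lax" minOccurs="2" maxOccurs="2"/>
      </xs:sequence>
      <xs:assert test="(*[1]/name() = ('fname', 'lname')) and
                       (*[2]/name() = ('fname', 'lname'))" />
    </xs:complexType>
  </xs:element>

</xs:schema>
```

One thing to keep in mind with this variant: name() returns the lexical QName, prefix included, so for unprefixed elements like these it gives the same result as local-name(), but for prefixed elements the two differ.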

Friday, February 5, 2010

C. M. Sperberg-McQueen: slides about XSD 1.1

I just came across this brief slide presentation (brief, but enough to give a good overview) about XML Schema (XSD) 1.1, by C. M. Sperberg-McQueen:

http://www.blackmesatech.com/2009/07/xsd11/

Nice ones indeed, and highly recommended.