Sunday, January 31, 2010

XSD 1.1: wild-cards in xs:all compositor, and assertions

I was reading through the latest XSD 1.1 language draft. One of the things that has changed between XSD 1.0 and 1.1 is the definition of the xs:all compositor.

XSD 1.1 defines the <xs:all ..> compositor as follows:
  <all
    id = ID
    maxOccurs = 1 : 1
    minOccurs = (0 | 1) : 1
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (element | any)*)
  </all>

XSD 1.0, by contrast, defined the xs:all instruction as follows:
  <all
    id = ID
    maxOccurs = 1 : 1
    minOccurs = (0 | 1) : 1
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, element*)
  </all>

XSD 1.1 allows the xs:any wild-card to be part of xs:all (XSD 1.0 didn't allow this), which makes xs:all more useful, because with xs:any we can make the schema type more open. We can also add schema constraints as assertions (as illustrated in the example below), restricting the openness introduced by the xs:any wild-card to the extent we want.

Here's an example I came up with. It uses an xs:any wild-card within an xs:all compositor, together with an assertion that constrains the ordering of elements in the instance document (that is, the assertion restricts the openness introduced by xs:any):

XML document [1]:
  <Person>
    <fname>Mukul</fname>
    <lname>Gandhi</lname>
    <sex>M</sex>
    <address>
      <street1>xyz</street1>
      <street2>street</street2>
      <street2>gurgaon</street2>
    </address>
  </Person>

XSD 1.1 Schema [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="Person">
      <xs:complexType>
        <xs:all>
          <xs:element name="fname" type="xs:string" />
          <xs:element name="lname" type="xs:string" />
          <xs:element name="sex" type="xs:string" />
          <xs:any processContents="lax" />
        </xs:all>
        <xs:assert test="(*[1]/local-name() = ('fname', 'lname')) and 
                         (*[2]/local-name() = ('fname', 'lname'))" />
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

In the above schema [2], if the assertion were not present, the xs:all compositor would allow its contents to appear in any order (including the element matched by the xs:any wild-card, which XSD 1.1 newly permits within xs:all).

The assertion in the schema above [2] constrains the first two child elements of the "Person" element to be "fname" or "lname" (in either order).
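
As a quick illustration of what the assertion rules out, here is a hypothetical variant of document [1] (not part of the test above) in which the wild-card-matched "address" element comes first. The xs:all compositor on its own should accept this ordering, but the assertion should reject it, since the first child is neither "fname" nor "lname":

```xml
<!-- Hypothetical invalid instance: *[1]/local-name() is 'address',
     so the first half of the assertion in schema [2] is false -->
<Person>
  <address>
    <street1>xyz</street1>
  </address>
  <fname>Mukul</fname>
  <lname>Gandhi</lname>
  <sex>M</sex>
</Person>
```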

Xerces-J's XSD 1.1 processor seems to handle the xs:all wild-card and assertion syntax fine.

I hope that this post is useful.

Wednesday, January 20, 2010

XQuery 1.0: Full axis feature

I was just reading through the XQuery 1.0 language (I'm reading Priscilla Walmsley's excellent book on XQuery), and found an interesting point in the XQuery 1.0 spec.

The XQuery 1.0 spec says (Ref):
<quote>
[Definition: The following axes are designated as optional axes: ancestor, ancestor-or-self, following, following-sibling, preceding, and preceding-sibling.]

[Definition: A conforming XQuery implementation that supports the Full Axis Feature MUST support all the optional axes.]
</quote>

Giving this a little thought, I felt that making certain XPath axes optional in the XQuery language (and many of the useful ones, as listed above) is most likely not right, and would certainly trouble users who want to use these XPath axes out-of-the-box with an XQuery engine.

Interestingly, the latest working draft of the XQuery language (version 1.1) has fixed this design mistake (I think this was a design mistake!), and doesn't specify such a constraint.

Sunday, January 17, 2010

Trang: DTD to XSD conversion

This morning, I was playing with James Clark's utility for converting between different XML schema languages.

The utility is Trang, and it's described at:
http://www.thaiopensource.com/relaxng/trang.html

Here's a little example I tried with Trang (a DTD to XSD conversion).

DTD input:
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>

Here's the XSD output I got from Trang:
  <?xml version="1.0" encoding="UTF-8"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:element name="note">
      <xs:complexType>
        <xs:sequence>
          <xs:element ref="to"/>
          <xs:element ref="from"/>
          <xs:element ref="heading"/>
          <xs:element ref="body"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
    <xs:element name="to" type="xs:string"/>
    <xs:element name="from" type="xs:string"/>
    <xs:element name="heading" type="xs:string"/>
    <xs:element name="body" type="xs:string"/>
  </xs:schema>

As per the Trang documentation, the following conversions are possible:
from: RELAX NG (both XML and compact syntax) and DTD
to: RELAX NG (both XML and compact syntax), DTD and XSD

I found Trang good, and can recommend it.

Saturday, January 16, 2010

XSD 1.1: Open contents, with Xerces-J

Here's a further update on the XSD 1.1 feature implementation in Xerces-J.

Another significant feature that's been added to the XSD language (in version 1.1) is open content. It can be defined on XSD complex types, and also at the schema level. A nice explanation of XSD 1.1 open content is available in an article at http://www.ibm.com/developerworks/xml/library/x-xml11pt3/index.html#N1034D.

The xs:any wild-card instruction in XSD 1.0 was a way to implement open content. XSD 1.1 introduces two new instructions for implementing open content, namely <xs:openContent> and <xs:defaultOpenContent>. These new instructions give schema authors more options for writing open schemas.

I've been able to test a couple of open content examples with the current Xerces-J SVN code base, and Xerces-J handles them well.
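
For readers who haven't seen the two instructions, here is a minimal sketch of how they are written. The target namespace and the names "Order" and "NoteType" are made up for illustration, and this is not one of the examples tested above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/open"
           elementFormDefault="qualified">

  <!-- Schema-level default: used by complex types in this schema document
       that do not specify an xs:openContent of their own (NoteType below) -->
  <xs:defaultOpenContent mode="suffix">
    <xs:any namespace="##other" processContents="lax"/>
  </xs:defaultOpenContent>

  <xs:element name="Order">
    <xs:complexType>
      <!-- Type-level open content, overriding the schema-level default -->
      <xs:openContent mode="interleave">
        <xs:any processContents="lax"/>
      </xs:openContent>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="date" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:complexType name="NoteType">
    <xs:sequence>
      <xs:element name="text" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

With mode="interleave" the extra elements may appear anywhere among the declared children; with mode="suffix" they may appear only after them.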

XPath 2.0: PsychoPath processor update

I just noticed the release of the Eclipse WTP (Web Tools Platform) 3.2M4 milestone (released on December 11th, 2009). The WTP Source Editing release in the 3.2M4 milestone includes an enhanced PsychoPath XPath 2.0 engine.

Xerces-J users using the XSD 1.1 support (specifically XSD 1.1 assertions and CTA/type alternatives) can use the latest PsychoPath library from http://www.eclipse.org/webtools/. The latest development PsychoPath library can also be downloaded from https://build.eclipse.org/hudson/view/WTP/job/cbi-wtp-wst.xsl.psychopath/.

Dave Carver has been doing a great job in putting this all together.

Saturday, January 9, 2010

Xerces-J: more XSD 1.1 tests; negative wild-cards

I'm pretty satisfied with the XSD 1.1 assertions and CTA (type alternatives) implementation in the current Xerces-J SVN code base (I've written a few posts about them earlier on this blog). User feedback would be great for the Xerces project; instructions for reporting bugs in Xerces-J can be found at http://xerces.apache.org/xerces2-j/jira.html.

I'm now beginning to test some of the other XSD 1.1 features. To start this new set of posts, following are a few use cases for "negative wildcards" (ref. http://www.ibm.com/developerworks/xml/library/x-xml11pt3/index.html#N101C9), which I've found to be working fine with Xerces-J.

XSD 1.0 had the following XML representation of the xs:any wild-card schema component:
  <any
    id = ID
    maxOccurs = (nonNegativeInteger | unbounded)  : 1
    minOccurs = nonNegativeInteger : 1
    namespace = ((##any | ##other) | List of (anyURI | (##targetNamespace | ##local)) )  : ##any
    processContents = (lax | skip | strict) : strict
    {any attributes with non-schema namespace . . .}>
      Content: (annotation?)
  </any>

("anyAttribute" is another wild-card Schema component)

XSD 1.1 enhances the xs:any wild-card definition to the following:
  <any
    id = ID
    maxOccurs = (nonNegativeInteger | unbounded)  : 1
    minOccurs = nonNegativeInteger : 1
    namespace = ((##any | ##other) | List of (anyURI | (##targetNamespace | ##local)) ) 
    notNamespace = List of (anyURI | (##targetNamespace | ##local)) 
    notQName = List of (QName | (##defined | ##definedSibling)) 
    processContents = (lax | skip | strict) : strict
    {any attributes with non-schema namespace . . .}>
      Content: (annotation?)
  </any>

As we can see, xs:any in XSD 1.1 allows two additional specifiers in its definition that were not available in XSD 1.0, namely "notNamespace" and "notQName".
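
The "notQName" specifier also accepts the two keywords shown in the definition above: ##definedSibling excludes elements that are already declared as siblings in the same content model, while ##defined excludes any element that has a global declaration in the schema. A minimal sketch of the former (the type name "ExtensibleType" is hypothetical, and this fragment is not one of the tests described in this post):

```xml
<xs:complexType name="ExtensibleType">
  <xs:sequence>
    <xs:element name="a" type="xs:string"/>
    <xs:element name="b" type="xs:string" minOccurs="0"/>
    <!-- Accept any further elements except ones with the same names
         as the siblings declared above (a, b) -->
    <xs:any notQName="##definedSibling" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```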

Here's a fictitious example of the use of the "notNamespace" specifier:

XML document, [1]:
  <Example xmlns="http://www.example.com/mySample">
    <a>hi there</a>
    <b>hi there ..</b>
    <c>hi there ...</c>
    <d xmlns="http://www.notallowed.com/sorry">hi there ....</d>
  </Example>

XSD 1.1 Schema, [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://www.example.com/mySample"
             elementFormDefault="qualified">

    <xs:element name="Example">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="a" type="xs:string" />
          <xs:element name="b" type="xs:string" />
          <xs:element name="c" type="xs:string" />
          <xs:any notNamespace="http://www.notallowed.com/sorry"
                  processContents="lax"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

The XSD schema [2] defines an xs:any wild-card which doesn't allow an element instance (matched by the wild-card) to be in the namespace "http://www.notallowed.com/sorry".

Therefore, when the XML instance [1] is validated against the XSD document [2], we get the following error from the Xerces-J XSD 1.1 schema engine:
test.xml:5:46:cvc-complex-type.2.4.a: Invalid content was found starting with element 'd'. One of '{WC[##other:"http://www.notallowed.com/sorry"]}' is expected.

If we change element "d" in the instance document to the following:
<d>hi there ....</d>
or say,
<d xmlns="http://www.allowed.com">hi there ....</d>

the XSD validation succeeds (as element "d" is now not in the namespace "http://www.notallowed.com/sorry").

Here's an example of the use of the "notQName" specifier:

XML document, [3]:
  <Example xmlns="http://www.example.com/mySample">
    <a>hi there</a>
    <b>hi there ..</b>
    <c>hi there ...</c>
    <XX>hi there ....</XX>
  </Example>

XSD 1.1 Schema, [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:tns="http://www.example.com/mySample"
             targetNamespace="http://www.example.com/mySample"
             elementFormDefault="qualified">

    <xs:element name="Example">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="a" type="xs:string" />
          <xs:element name="b" type="xs:string" />
          <xs:element name="c" type="xs:string" />
          <xs:any notQName="tns:XX"
                  processContents="lax"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

The XSD schema [4] doesn't allow an instance document to have an element "XX" (in the namespace "http://www.example.com/mySample") where the xs:any wild-card allows element content.

Therefore, when the XML document [3] is validated against the XSD schema [4], we get the following error message from Xerces-J:
test.xml:5:7:cvc-complex-type.2.4.a: Invalid content was found starting with element 'XX'. One of '{WC[##any, notQName(tns:XX)]}' is expected.

So if we replace the offending element with:
<abc>hi there ....</abc>
or say,
<XX xmlns="http://www.allowed.com">hi there ....</XX>
(here the local name in the XML instance document is the same as that specified in "notQName", but the namespace of the element instance is different, which makes this element instance valid)

the XML validation passes.

All these XSD language enhancements in version 1.1 look cool (and useful :)) to me, and they give XSD schema authors some more XML validation capabilities.

Out of curiosity, I was wondering whether some of the new XSD 1.1 wild-card capabilities could be written with assertions.

For example, some of the features of the "notNamespace" attribute can be written with an assertion like the following:
  <xs:assert test="not(namespace-uri(*[last()]) = (
                        'http://www.notallowed.com/sorry1',
                        'http://www.notallowed.com/sorry2',
                        'http://www.notallowed.com/sorry3')
                       )" />

But using assertions for this need might have the following disadvantages or limitations [5]:
1. The XSD 1.1 engine has to build an XPath tree to evaluate an assertion, which is a memory overhead.
2. It looks like assertions cannot express the following facility of the "notNamespace" attribute: specifying namespaces with the keywords ##targetNamespace and ##local.
3. Using the "notNamespace" attribute on the xs:any wildcard gives us the optimization benefits of the xs:any implementation (for example, it doesn't have to build the XPath tree which the assertion approach requires). Moreover, it's better to use a native facility of a construct (like xs:any), which keeps the schema's design more natural and easy to understand.
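
For comparison with point 2, the keywords are written directly in the attribute value on the wild-card. A minimal fragment, assuming it sits inside a content model like the one in schema [2]:

```xml
<!-- Rejects elements in the schema's target namespace (##targetNamespace)
     and elements in no namespace (##local); anything else is allowed -->
<xs:any notNamespace="##targetNamespace ##local"
        processContents="lax"/>
```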

Also, some of the features of the "notQName" attribute can be written with an assertion like the following:
  <xs:assert test="not(local-name(*[last()]) eq 'XX'
                       and
                      namespace-uri(*[last()]) eq 'http://www.example.com/mySample')" />

This approach would have similar issues to those listed above [5].

I hope that this post was useful.

Friday, January 1, 2010

XSD 1.1: few more assertions use cases; assertion inheritance

I'm continuing with writing use cases for XSD 1.1 assertions, and testing them with Xerces-J. Here are the next ones in this series:

The use cases below demonstrate the use of XSD assertions in an XSD schema type hierarchy.

Example 1

XML document [1]:
  <Example x_count="3">
    <x a="val1">2</x>
    <x a="val2">4</x>
    <x a="val3">6</x>
  </Example>

XSD 1.1 document [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Example">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="x" type="x_Type" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:attribute name="x_count" type="xs:nonNegativeInteger" use="required" />
        <xs:assert test="@x_count eq count(./x)" />
        <xs:assert test="every $x in x[position() lt last()] satisfies
                         number($x/@a/substring-after(.,'val')) lt
                         number($x/following-sibling::x[1]/@a/substring-after(.,'val'))" />
      </xs:complexType>
    </xs:element>
  
    <xs:complexType name="x_Type">
      <xs:simpleContent>
        <xs:extension base="myInteger">
          <xs:attribute name="a" type="attrType" use="required" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  
    <xs:simpleType name="myInteger">
      <xs:restriction base="xs:positiveInteger">
        <xs:assertion test="$value mod 2 = 0" />
      </xs:restriction>
    </xs:simpleType>
  
    <xs:simpleType name="attrType">
      <xs:restriction base="xs:string">
         <xs:pattern value="val[1-9][0-9]*" />
         <xs:maxLength value="20" />
      </xs:restriction>
    </xs:simpleType>
  
  </xs:schema>

The purpose of the XML document [1] and the corresponding XSD schema [2] should be quite self-explanatory, I believe.

I'll try to explain below what the assertions in the above XSD schema [2] are intended to accomplish:
a) The assertions on the complex type (an anonymous type) of element "Example" are usual assertions on an XSD complex type, as I've explained in a few earlier posts in this series. The first checks that the "x_count" attribute equals the number of "x" child elements. The second ensures that the integer suffixes following the string "val" in attribute "a" of the "x" elements are in numerically ascending order.
b) The schema type of element "x" is "x_Type". x_Type is a complex type (because it specifies an attribute) and has simple content. The simple content of element "x" is defined by the simple type "myInteger". The type myInteger specifies an assertion facet (which tests that the integer content of element "x" is even). This demonstrates that a schema type (x_Type here) inherits assertions from its base types (assertions are inherited all the way up the type hierarchy, wherever any ancestor XSD types specify them).
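
To make point b) concrete, here is a hypothetical variant of document [1] (not one of the documents tested above). The value 5 in the second "x" element is a positive integer, but it should fail the inherited assertion facet of "myInteger" because it is odd:

```xml
<!-- Hypothetical invalid instance for schema [2]:
     5 mod 2 = 0 is false, so the assertion inherited by x_Type fails -->
<Example x_count="3">
  <x a="val1">2</x>
  <x a="val2">5</x>
  <x a="val3">6</x>
</Example>
```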

Example 2

XML document [3]:
  <Example>
    <x a="val1">2</x>
    <x a="val2">4</x>
    <x a="val3">6</x>
  </Example>

XSD 1.1 document [4]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Example">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="x" type="x_Type" maxOccurs="unbounded" />
        </xs:sequence>
        <xs:assert test="every $x in x[position() lt last()] satisfies
                         number($x/@a/substring-after(.,'val')) lt
                         number($x/following-sibling::x[1]/@a/substring-after(.,'val'))" />
      </xs:complexType>
    </xs:element>
  
    <xs:complexType name="x_Type">
      <xs:simpleContent>
        <xs:restriction base="x_base">
          <xs:assert test="$value lt 100" /> 
        </xs:restriction>
      </xs:simpleContent>
    </xs:complexType>
  
    <xs:complexType name="x_base">
      <xs:simpleContent>
        <xs:extension base="myInteger">
          <xs:attribute name="a" type="attrType" use="required" />
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  
    <xs:simpleType name="myInteger">
      <xs:restriction base="xs:positiveInteger">
        <xs:assertion test="$value mod 2 = 0" />
      </xs:restriction>
    </xs:simpleType>
  
    <xs:simpleType name="attrType">
      <xs:restriction base="xs:string">
        <xs:pattern value="val[1-9][0-9]*" />
        <xs:maxLength value="20" />
      </xs:restriction>
    </xs:simpleType>
  
  </xs:schema>

Here's how the assertions in XSD schema [4] work on XML document [3]:
There's nothing too complicated about the assertion rules here. The assertions on the complex type "x_Type" consist of the assertion within this type (the element value must be less than 100), and the assertions inherited from the base types (the even-value assertion of "myInteger", reached through "x_base").
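
As a hypothetical illustration (again, not a document from the test above), the instance below satisfies the inherited even-value assertion but should be rejected by the assertion declared on "x_Type" itself, because 102 is not less than 100:

```xml
<!-- Hypothetical invalid instance for schema [4]:
     102 is even, but $value lt 100 is false for the third x element -->
<Example>
  <x a="val1">2</x>
  <x a="val2">4</x>
  <x a="val3">102</x>
</Example>
```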

The XSD examples [2] and [4] look quite similar. The difference between them is that the type "x_Type" in example [2] derives from an XSD simple type, while the type "x_Type" in example [4] derives from an XSD complex type. The element "Example" in XML document [3] doesn't have the attribute "x_count" (this is a cosmetic difference between the two examples).

Both of the above XSD examples demonstrate assertion inheritance from base XSD types (one demonstrates inheriting assertions from a simple type, the other from a complex type).

I hope that this post was useful.

Happy New Year, 2010

As the new year dawns, here's a new year wish to the readers of this blog: