Tuesday, December 28, 2010

Schema based XML compare

David A. Lee (producer of XMLSH, a command line shell for XML) started an interesting discussion a while ago on the XML-DEV mailing list about how to do XML Schema aware comparison of XML documents. The whole discussion thread can be read here. Michael Kay suggested using the XPath 2.0 function deep-equal for this kind of use case. The input document trees must first be validated against a schema, so that the comparison becomes type aware. Following Michael's idea, I experimented with this concept using IBM's XPath 2.0 engine, which is XML Schema aware and is a component of the WebSphere Application Server Feature Pack for XML. Here's a minimal Java program illustrating this design.
package com.ibm.xpath2;

import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;

import com.ibm.xml.xapi.XDynamicContext;
import com.ibm.xml.xapi.XFactory;
import com.ibm.xml.xapi.XPathExecutable;
import com.ibm.xml.xapi.XSequenceCursor;
import com.ibm.xml.xapi.XSequenceType;
import com.ibm.xml.xapi.XStaticContext;

public class XMLCompare {

    public static void main(String[] args) throws Exception {
        // Directory holding test.xsd, test1.xml and test2.xml (pass -DdataDir.path=...)
        String dataDir = System.getProperty("dataDir.path");

        // Enable full schema validation, so the input documents get schema type annotations
        XFactory factory = XFactory.newInstance();
        factory.setValidating(XFactory.FULL_VALIDATION);
        factory.registerSchema(new StreamSource(dataDir + "/test.xsd"));

        // Declare $doc1 and $doc2, each holding exactly one document node
        XStaticContext staticContext = factory.newStaticContext();
        staticContext.declareVariable(new QName("doc1"),
                factory.getSequenceTypeFactory().documentNode(XSequenceType.OccurrenceIndicator.ONE));
        staticContext.declareVariable(new QName("doc2"),
                factory.getSequenceTypeFactory().documentNode(XSequenceType.OccurrenceIndicator.ONE));

        // Bind the two input documents (validated on load) to the declared variables
        XDynamicContext dynamicContext = factory.newDynamicContext();
        dynamicContext.bind(new QName("doc1"), new StreamSource(dataDir + "/test1.xml"));
        dynamicContext.bind(new QName("doc2"), new StreamSource(dataDir + "/test2.xml"));

        // deep-equal() compares the typed values of the schema-annotated element nodes
        XPathExecutable executable = factory.prepareXPath("deep-equal($doc1, $doc2)", staticContext);
        XSequenceCursor result = executable.execute(dynamicContext);

        // The expression yields a single xs:boolean item
        if (result.exportAsList().get(0).getBooleanValue()) {
           System.out.println("deep-equal == true");
        }
        else {
           System.out.println("deep-equal == false");
        }
    }
} 

Following are the XML and XML Schema documents used for the above example.

test1.xml
<test>10.00</test>
test2.xml
<test>10</test>

test.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="test" type="double" />
</schema>

In the above example, if the schema type of element "test" is xs:double, both XML documents are reported as deep-equal. The values 10 and 10.00 are the same xs:double value, the element nodes are annotated with schema type xs:double, and the deep-equal function compares their typed values. If instead the schema type of element "test" is xs:string, the same documents are reported as not deep-equal, because the strings "10.00" and "10" differ.
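The result flips with the schema type, and this is easier to see if you compare the two text values without any XPath engine. The following sketch uses only the JDK, not the IBM API. The class and method names are hypothetical. It hand-codes two comparisons:

- Plain string equality, which roughly matches an xs:string annotation.
- Numeric equality via Double.parseDouble, which approximates an xs:double annotation. Java's double parsing does not exactly match the xs:double lexical space; for example, it rejects "INF".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TypedCompareSketch {

    // Parse a small XML string and return the text content of its root element
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Untyped comparison: roughly what an xs:string annotation gives
    static boolean equalAsString(String a, String b) {
        return a.equals(b);
    }

    // Typed comparison: approximates an xs:double annotation
    static boolean equalAsDouble(String a, String b) {
        return Double.parseDouble(a.trim()) == Double.parseDouble(b.trim());
    }

    public static void main(String[] args) throws Exception {
        String v1 = rootText("<test>10.00</test>");
        String v2 = rootText("<test>10</test>");
        System.out.println("as xs:string: " + equalAsString(v1, v2)); // as xs:string: false
        System.out.println("as xs:double: " + equalAsDouble(v1, v2)); // as xs:double: true
    }
}
```").equals("10")) throw new AssertionError("rootText");
} catch (Exception e) {
    throw new AssertionError(e);
}
</test>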

I hope that this post is useful.

Saturday, December 18, 2010

Apache Xerces-J 2.11.0 released

I am happy to pass on the announcement made by the Apache Xerces team a few days ago: a new version of Xerces-J, 2.11.0, has been released (ref http://markmail.org/message/oom75s3wpebfywh5). This release specifically improves compliance with the XML Schema 1.1 language. The detailed release notes for Xerces-J 2.11.0 are available at http://xerces.apache.org/xerces2-j/releases.html.

On behalf of the Xerces team, I hope the XML and XML Schema community finds this Xerces-J release useful.

References to the XML Schema language:
1. http://www.w3.org/XML/Schema (XML Schema WG Home Page)
2. http://www.w3.org/TR/xmlschema11-1/ (XML Schema 1.1 Structures specification)
3. http://www.w3.org/TR/xmlschema11-2/ (XML Schema 1.1 Datatypes specification)

Saturday, November 27, 2010

XML Schema 1.1: complexType restriction rules

I'm excited to write about the new rules in the XML Schema 1.1 spec for type derivations between complexType definitions. I also want to cover how well Xerces-J (its XML Schema 1.1 engine) currently complies with this area of the language. This post covers complexType restriction derivations; I'll try to write about complexType extensions sometime later.

Anyone interested in this topic is invited to comment on this post. I'm interested in both restriction and extension derivations between XML Schema complex types, so comments will help me learn more. They can also give the Xerces team useful feedback for improving Xerces in desired and compliant ways.

Below are my findings from the XML Schema 1.1 spec on this topic, along with Xerces's compliance status. I acknowledge that my understanding of these areas of the XML Schema language may not yet be complete :).

In the XML Schema 1.0 language, complex type restriction derivation rules are defined by the schema particle restriction rules specified here, http://www.w3.org/TR/xmlschema-1/#coss-particle. That section has a 5x5 table describing which restrictions of XML Schema particles are valid and which are forbidden.

In XML Schema 1.1, all of these complexType derivation rules are replaced by two sections:

- 3.4.6.3 Derivation Valid (Restriction, Complex)
- 3.4.6.4 Content Type Restricts (Complex Content)

The 5x5 particle restriction table is removed. Instead, the spec defines a generic algorithm based on a subsumption relationship (a kind of containment relationship) between default bindings. A default binding is an abstract notion covering element and attribute declarations, along with wildcards and their processContents values "strict", "lax" and "skip".

The XML Schema 1.1 complexType subsumption rules are simpler and easier to remember than the corresponding type derivation rules in XML Schema 1.0. My understanding so far is that these rules keep XML Schema 1.1 complexType restriction derivations largely compatible with the XML Schema 1.0 rules, but state them in clearer wording.

Below are the XML Schema complexType restriction cases I've studied so far, along with their characteristics. Xerces has implementations for these cases, and the upcoming Xerces-J 2.11.0 release will include them. I'm still trying to discover more of the rules in these areas of the XML Schema language.

xs:sequence, xs:choice and xs:all are the possible compositors in schema complexTypes. A compositor determines how schema particles are composed within a complexType definition.

A) SEQUENCE TO SEQUENCE RESTRICTIONS
a.1 An xs:element is derived from an xs:any wildcard, and both particles are part of an XML Schema sequence compositor. In this scenario, the cardinality of the particles takes precedence over the presence of a concrete element in the derived type when determining valid particle derivations.

For example, <xs:element name="x" type="xs:string" minOccurs="0"/> is not a valid restriction of <xs:any processContents="lax" />. The minOccurs="0" makes particle "x" optional, so its effective cardinality is wider than that of the wildcard particle, which is mandatory.
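The cardinality rule in a.1 amounts to an occurrence-range containment test, similar to the XML Schema 1.0 constraint "Occurrence Range OK". The derived particle's minOccurs must be at least the base's minOccurs, and its maxOccurs at most the base's maxOccurs.

Below is a minimal Java sketch of just that check. It is not Xerces code. The class name and the use of -1 for "unbounded" are assumptions of this sketch.

```java
public class OccurrenceRangeSketch {

    // Stand-in for maxOccurs="unbounded" (an assumption of this sketch)
    static final int UNBOUNDED = -1;

    // True if the derived range [dMin, dMax] lies inside the base range [bMin, bMax]
    static boolean occurrenceRangeOk(int dMin, int dMax, int bMin, int bMax) {
        if (dMin < bMin) {
            return false;                       // derived particle is "more optional" than the base
        }
        if (bMax == UNBOUNDED) {
            return true;                        // any upper bound fits under "unbounded"
        }
        return dMax != UNBOUNDED && dMax <= bMax;
    }

    public static void main(String[] args) {
        // <xs:element name="x" minOccurs="0"/> (0..1) against <xs:any/> (1..1)
        System.out.println(occurrenceRangeOk(0, 1, 1, 1)); // false
        // <xs:element name="x"/> (1..1) against <xs:any/> (1..1)
        System.out.println(occurrenceRangeOk(1, 1, 1, 1)); // true
    }
}
```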

a.2 Particles must map one-to-one from the schema 'base' type to the 'derived' type. That is, a derived type cannot have fewer particles than the base type. Also, each particle in the derived type must validly derive from the corresponding base particle, i.e., be validly subsumed by it as per the rules in the XML Schema 1.1 spec.

B) ALL TO SEQUENCE RESTRICTIONS
b.1 This is a valid restriction of a schema compositor and of the particles in it, i.e., an ordered restriction of an unordered group.

For example, sequence(b, a) and sequence(a, b) are both valid restrictions of all(a, b). The order of particles in the derived type doesn't matter.

b.2 The XML Schema validator identifies particles by their QNames. Particles with the same QName in the base and derived types must obey the rules of restriction by cardinality. For instance, an optional particle is not a valid restriction of a mandatory particle with the same QName.
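Rules b.1 and b.2 can be sketched as a name-based mapping that ignores order. This is a simplified model, not the Xerces implementation. It assumes each particle is a named element with a finite occurrence range.

It roughly follows the XML Schema 1.0 "RecurseUnordered" idea: each derived particle maps to a distinct base particle with the same name, and any unmapped base particle must be optional. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AllToSequenceSketch {

    // A named element particle with a finite occurrence range (hypothetical model)
    static final class Particle {
        final String name;
        final int min;
        final int max;

        Particle(String name, int min, int max) {
            this.name = name;
            this.min = min;
            this.max = max;
        }
    }

    // Derived sequence restricts base all if each derived particle matches a distinct base
    // particle by name with a narrower-or-equal range, and unmatched base particles are optional
    static boolean restricts(List<Particle> baseAll, List<Particle> derivedSeq) {
        Map<String, Particle> byName = new HashMap<>();
        for (Particle p : baseAll) {
            byName.put(p.name, p);
        }
        Set<String> used = new HashSet<>();
        for (Particle d : derivedSeq) {
            Particle b = byName.get(d.name);
            if (b == null || !used.add(d.name)) {
                return false;                   // unknown name, or base particle already mapped
            }
            if (d.min < b.min || d.max > b.max) {
                return false;                   // cardinality wider than the base allows (b.2)
            }
        }
        for (Particle b : baseAll) {
            if (!used.contains(b.name) && b.min > 0) {
                return false;                   // a mandatory base particle was dropped
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Particle> all = Arrays.asList(new Particle("a", 1, 1), new Particle("b", 1, 1));
        // sequence(b, a) against all(a, b)
        System.out.println(restricts(all, Arrays.asList(new Particle("b", 1, 1), new Particle("a", 1, 1)))); // true
        // sequence(a?, b) against all(a, b): optional "a" cannot restrict mandatory "a"
        System.out.println(restricts(all, Arrays.asList(new Particle("a", 0, 1), new Particle("b", 1, 1)))); // false
    }
}
```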

C) ALL TO ALL RESTRICTIONS
c.1 This is an unordered-to-unordered restriction. A concrete element particle is a valid derivation of a wildcard particle.

c.2 The cardinality of identical particles (having the same QName) in the derived type must be the same as, or narrower than, that in the base type. This makes the derived particle validly derive from the corresponding base particle. When determining valid particle subsumptions, particle cardinalities take precedence over the generic/concrete relationship between particles.

c.3 The number of leaf particles (essentially xs:element particles and xs:any wildcards) must be equal in the derived and base types.

D) SEQUENCE TO ALL RESTRICTIONS
This is not a valid schema compositor restriction (i.e., from ordered to unordered).

E) CHOICE TO SEQUENCE RESTRICTIONS
e.1 Here are a few examples explaining some of the rules for this category.
  <xs:sequence>
     <xs:element name="c" type="xs:string" />
  </xs:sequence>

is a valid restriction of
  <xs:choice>  
     <xs:any processContents="lax" />
     <xs:element name="b" type="xs:string" />
  </xs:choice>

(the element particle "c" is subsumable by the wild-card)

e.2
  <xs:sequence>
     <xs:any processContents="lax" />
  </xs:sequence>

is not a valid restriction of
  <xs:choice>         
     <xs:element name="a" type="xs:string" />
     <xs:element name="b" type="xs:string" />
  </xs:choice>

This is because an element particle cannot subsume a wildcard. Deriving a generic particle from a concrete element is not a valid restriction; in fact, it looks more like the "type extension" concept.
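The asymmetry behind e.1 and e.2 can be modelled in a few lines: a wildcard can subsume an element, but an element cannot subsume a wildcard. The Leaf class and the method names are hypothetical.

This sketch handles only the one-particle-sequence case shown above. It deliberately ignores namespace constraints, processContents, element types and cardinalities.

```java
import java.util.Arrays;
import java.util.List;

public class ChoiceToSequenceSketch {

    // A leaf particle: a named element, or a wildcard when elementName is null (hypothetical model)
    static final class Leaf {
        final String elementName;

        private Leaf(String elementName) {
            this.elementName = elementName;
        }

        static Leaf element(String name) { return new Leaf(name); }
        static Leaf wildcard() { return new Leaf(null); }
        boolean isWildcard() { return elementName == null; }
    }

    // A wildcard subsumes anything in this model; an element subsumes only a same-named element
    static boolean subsumes(Leaf base, Leaf derived) {
        if (base.isWildcard()) {
            return true;
        }
        return !derived.isWildcard() && base.elementName.equals(derived.elementName);
    }

    // A one-particle sequence restricts a choice if some branch of the choice subsumes the particle
    static boolean sequenceOfOneRestrictsChoice(Leaf derived, List<Leaf> baseChoice) {
        for (Leaf branch : baseChoice) {
            if (subsumes(branch, derived)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // e.1: sequence(c) against choice(any, b)
        System.out.println(sequenceOfOneRestrictsChoice(Leaf.element("c"),
                Arrays.asList(Leaf.wildcard(), Leaf.element("b")))); // true
        // e.2: sequence(any) against choice(a, b)
        System.out.println(sequenceOfOneRestrictsChoice(Leaf.wildcard(),
                Arrays.asList(Leaf.element("a"), Leaf.element("b")))); // false
    }
}
```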

F) SEQUENCE TO CHOICE RESTRICTIONS
Here's an example I can think of that corresponds to this kind of use case.
   
   <xs:restriction base="TYPE_BASE">
      <xs:choice>
         <xs:group ref="myGroup" />
      </xs:choice>
   </xs:restriction>
   
   is a valid restriction of
   
   <xs:complexType name="TYPE_BASE">
      <xs:group ref="myGroup" />
   </xs:complexType>
   
   <xs:group name="myGroup">
      <xs:sequence>
         <xs:element name="a" type="xs:string" />
         <xs:element name="b" type="xs:string" />
      </xs:sequence>
   </xs:group>

But this is not a useful schema type restriction. The xs:choice in the derived type offers only one option, which is the same as the content of the sequence in the base type.

Other than the above example, I cannot envision any useful practical scenario for a "sequence to choice" restriction. I think schema authors need not bother much about "sequence to choice" restrictions, since this doesn't look like a good or useful schema design. (But I don't deny that people may find valid uses for it as well :).

G) CHOICE TO CHOICE RESTRICTIONS
Here are a few examples I can think of that satisfy this use case (I've found these to work fine with Xerces as well):

g.1 choice(a, c) is not a valid restriction of choice(a, b), because element "c" in the derived type doesn't have a corresponding element particle in the base type.

g.2
- choice(a, b) is a valid restriction of choice(a, wild-card processContents="lax"). However, if the wildcard can resolve to an element declaration that doesn't match element declaration "b", then this is NOT-A-VALID restriction.
- choice(a, b) is a valid restriction of choice(a, wild-card processContents="strict") only if the wildcard can resolve to an element declaration for "b"; OTHERWISE it is not.

g.3 choice(group name="myGroup", a) is a valid restriction of choice(group name="myGroup", xs:any processContents="lax"). Here model group instance is considered as a particle. But if the wild-card resolves to an element declaration that doesn't match element declaration "a", then this is NOT-A-VALID restriction.

g.4 choice(group name="myGroup", a) is not a valid restriction of choice(group name="myGroup", <xs:any/>). However, it becomes a valid restriction if the wildcard <xs:any> (whose processContents defaults to "strict") can find a declaration of element "a" that validly subsumes element "a" in the derived type.
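The core of g.1, and the simple part of g.2, is that every alternative of the derived choice must be subsumed by some alternative of the base choice. The sketch below encodes alternatives as strings, with "*" standing for a wildcard; this encoding is hypothetical. A model group reference, as in g.3, could be treated as one more named alternative.

The sketch does not model the cases where a wildcard resolves to a conflicting element declaration. It is only a first approximation of the rules listed above.

```java
import java.util.Arrays;
import java.util.List;

public class ChoiceToChoiceSketch {

    // "*" stands for an xs:any wildcard; any other string is an element name (hypothetical encoding)
    static final String ANY = "*";

    // A wildcard branch subsumes any derived branch; otherwise names must match
    static boolean subsumes(String baseBranch, String derivedBranch) {
        if (ANY.equals(baseBranch)) {
            return true;
        }
        return baseBranch.equals(derivedBranch);
    }

    // Every alternative in the derived choice must be subsumed by some alternative in the base choice
    static boolean choiceRestrictsChoice(List<String> derived, List<String> base) {
        for (String d : derived) {
            boolean matched = false;
            for (String b : base) {
                if (subsumes(b, d)) {
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // g.1: choice(a, c) against choice(a, b)
        System.out.println(choiceRestrictsChoice(Arrays.asList("a", "c"), Arrays.asList("a", "b"))); // false
        // g.2 (simplified): choice(a, b) against choice(a, any)
        System.out.println(choiceRestrictsChoice(Arrays.asList("a", "b"), Arrays.asList("a", ANY))); // true
    }
}
```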

These are all the cases I can think of at the moment (enumerated A to G) that might occur for restrictions between XML Schema 1.1 complexTypes. I believe there are a few more complexType restriction cases, which I'll try to post on this blog as I discover them.

I hope that this post was useful.

Saturday, October 23, 2010

XSD: schema type definition for empty XML content models

I'm inclined to write a little post, suggesting a correction (perhaps a better schema design) to an XML schema document I wrote in the blog post, http://mukulgandhi.blogspot.com/2010/07/xsd-11-xml-schema-design-approaches.html [1].

In that post [1], I suggested the following XML Schema type definition for empty content models (assuming there are no attributes on the element):
  <xs:complexType name="EMPTY"> 
     <xs:complexContent> 
        <xs:restriction base="xs:anyType" /> 
     </xs:complexContent> 
  </xs:complexType>

Instead of the above schema type definition, I now think the following element definition [2] is better, and I believe it is simpler:
  <xs:element name="X">
    <xs:complexType/>
  </xs:element>

The element definition [2] above is intended to validate an XML fragment like the following:
<X/>

In the above example, I intend that element "X" must not have any child nodes, nor any XML attributes.

Interestingly (though nothing new for people who know the XML Schema language :), the XML Schema language only constrains XML element and attribute nodes, optionally in a namespace-aware way. It doesn't concern itself with other XML infoset components such as comments and processing instructions, which are present, for example, in the XPath data model [A]. This means that nodes other than elements and attributes are ignored by the XML Schema language and by a compliant XML Schema validator.

As I've learnt, this nature [A] of the XML Schema language is OK. There have been some nice discussions about all of this on the XML-DEV list recently.

2010-10-26: Here's another variant for definition of empty XML content models.
  <xs:simpleType name="EMPTY">
     <xs:restriction base="xs:string">
        <xs:maxLength value="0"/>
     </xs:restriction>
  </xs:simpleType>

This defines an XML Schema simpleType. It enforces content emptiness with the 'maxLength' facet on type xs:string, instead of using a complex type as in the previous example.

I'm more inclined to define element emptiness with a simpleType like this. A simple type can never declare attributes, whereas declaring attributes is part of the intent (and semantics) of a complexType.
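Both empty-content variants are plain XML Schema 1.0, so you can check them with the JDK's built-in validator through the JAXP javax.xml.validation API.

The sketch below embeds both schemas as strings and reports which small instances each schema accepts. The element declaration that uses type EMPTY is added for the simpleType variant and is an assumption of this sketch. The class name is hypothetical. The comment-only instance exercises point [A], since comments are not element or character children.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class EmptyContentSketch {

    // Variant [2]: an element with an empty anonymous complexType
    static final String COMPLEX_EMPTY =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='X'><xs:complexType/></xs:element>"
          + "</xs:schema>";

    // simpleType variant: xs:string restricted to maxLength 0 (element declaration assumed)
    static final String SIMPLE_EMPTY =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='X' type='EMPTY'/>"
          + "<xs:simpleType name='EMPTY'><xs:restriction base='xs:string'>"
          + "<xs:maxLength value='0'/></xs:restriction></xs:simpleType>"
          + "</xs:schema>";

    // Validates the instance against the schema with the JDK's XSD 1.0 validator
    static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String[] instances = { "<X/>", "<X><!-- note --></X>", "<X>text</X>", "<X a='1'/>" };
        for (String xml : instances) {
            System.out.println(xml + "  complexType: " + isValid(COMPLEX_EMPTY, xml)
                    + "  simpleType: " + isValid(SIMPLE_EMPTY, xml));
        }
    }
}
```

Under these assumptions, both schemas should accept <X/> and the comment-only instance. Both should reject character content and the attribute a.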

I hope the corrections I've shared in this post are appreciated by folks who read my earlier post cited above [1].

Sunday, October 10, 2010

XSD 1.1: XML schema design approaches cotd... PART 4

This post describes a few more XML Schema use cases; I find the subject matter interesting enough for a new post! I'm trying to cook up XSD 1.1 examples :), largely using XSD 1.1 assertions. These use cases are now solvable with XML Schema 1.1, for example constraining the cardinality of xs:list items as described below. In my view, they couldn't be solved with XML Schema 1.0.

I hope the XML Schema community finds some of the things here interesting.

This post can be considered PART 4 of the XML Schema 1.1 design series that I started a couple of weeks ago. The previous parts of this series are available here:

1) PART 1
2) PART 2
3) PART 3

I'm using the latest XML Schema 1.1 code base from the Xerces-J SVN repository.

Use-case: (A)
The examples in this post illustrate how to constrain the cardinality of XML Schema 1.1 xs:list instance members. Optionally, they also constrain a few aspects of the list members, for example that list items must be even integers. I did this just to verify for myself how XSD 1.1 assertions behave in various combinations.

Here's an XML instance document (a simple list of integers wrapped in an XML element "X"), which I'll use for the illustrations in this post:

[XML 1] (named temp.xml)
  <X>2 4 6 5 10 3</X>

Below are a few XML Schema 1.1 examples, followed by explanations from my point of view:

[XML Schema 1]
  <?xml version='1.0'?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="X">
       <xs:complexType>
         <xs:simpleContent>
            <xs:restriction base="INT_LIST">
              <xs:assertion test="count($value) le 5" />
            </xs:restriction>
         </xs:simpleContent>
       </xs:complexType>
    </xs:element>
   
    <xs:complexType name="INT_LIST">
       <xs:simpleContent>
         <xs:restriction base="xs:anyType">
            <xs:simpleType>
               <xs:list itemType="xs:integer" />          
            </xs:simpleType>
            <xs:assert test="every $x in $value satisfies ($x mod 2 = 0)" />
         </xs:restriction>
       </xs:simpleContent> 
    </xs:complexType>

  </xs:schema>

[XML Schema 2]
  <?xml version='1.0'?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:element name="X">
        <xs:simpleType>
          <xs:restriction base="INT_LIST">
             <xs:assertion test="$value mod 2 = 0" />
          </xs:restriction>
        </xs:simpleType>
     </xs:element>
   
     <xs:simpleType name="INT_LIST">
       <xs:list itemType="xs:integer" />
     </xs:simpleType>

  </xs:schema>

[XML Schema 3]
  <?xml version='1.0'?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="X">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="INT_LIST">
             <xs:assert test="count($value) le 5" />
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
   
    <xs:simpleType name="INT_LIST">
       <xs:list itemType="xs:integer" />
    </xs:simpleType>

  </xs:schema>

Here are the results of validating the XML instance [XML 1] against the specified schemas:

1. When the XML document [XML 1] is validated against the schema [XML Schema 1], Xerces reports the following validation outcomes:
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('every $x in $value satisfies ($x mod 2 = 0)') for element 'X' with type 'INT_LIST' did not succeed.
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('count($value) le 5') for element 'X' with type '#anonymous' did not succeed.


2. When the XML document [XML 1] is validated against the schema [XML Schema 2], Xerces reports the following validation outcomes:
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'X' with type '#anonymous' did not succeed. Assertion failed for an xs:list member value '5'.
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'X' with type '#anonymous' did not succeed. Assertion failed for an xs:list member value '3'.


3. When the XML document [XML 1] is validated against the schema [XML Schema 3], Xerces reports the following validation outcome:
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('count($value) le 5') for element 'X' with type '#anonymous' did not succeed.

Here's some quick analysis from my point of view, with regards to what I wanted to achieve with these use-cases (A):

In XML Schema 1.1 assertions, the XPath 2.0 context variable "$value" has the type xs:anyAtomicType*.

1. The first validation result (1. above) illustrates two constraints. Every item of the xs:list must be an even integer, and the number of list items is limited to a maximum of "5" (a sample "max" limit).

2. I intended validation results 2. and 3. to be combined with a boolean "AND", to achieve the same validation objective as case 1. A boolean "AND" of two schema validations can be done with, for example, the Java JAXP validation API.

I wrote the schema document [XML Schema 2] so that the validator reports each individual list item that fails the evenness test. Schema document [XML Schema 1] did not fully achieve this: it detected an evenness failure for the list as a whole, but didn't report every individual failing item.
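If you don't have an XSD 1.1 processor at hand, you can mimic the effect of these assertions on [XML 1] in plain Java. Treat the list as whitespace-separated integers, apply the count limit from count($value) le 5, and collect every odd member. The result resembles the per-item messages Xerces produces for [XML Schema 2].

This only emulates the assertion logic under those assumptions; it is not how Xerces evaluates XPath 2.0. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ListAssertionSketch {

    // Parse xs:list content of xs:integer items: whitespace-separated tokens
    static List<Long> parseIntList(String text) {
        List<Long> items = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                items.add(Long.parseLong(token));
            }
        }
        return items;
    }

    // Mirrors count($value) le max
    static boolean countOk(List<Long> value, int max) {
        return value.size() <= max;
    }

    // Mirrors a per-item "$value mod 2 = 0" check, collecting every failing member
    static List<Long> oddMembers(List<Long> value) {
        List<Long> failures = new ArrayList<>();
        for (long v : value) {
            if (v % 2 != 0) {
                failures.add(v);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Long> value = parseIntList("2 4 6 5 10 3");            // content of <X> in [XML 1]
        System.out.println("count ok: " + countOk(value, 5));      // count ok: false
        System.out.println("odd members: " + oddMembers(value));   // odd members: [5, 3]
    }
}
```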

I hope the intent of the use case described here, and the solutions offered, are explained clearly enough for the XML Schema audience.

Thanks for reading, and as usual I hope that this blog post was interesting!

Sunday, September 5, 2010

XSD 1.1: Xerces-J implementation updates

Over the past month or two, there have been a few interesting changes to the Xerces-J XML Schema 1.1 implementation. I'd like to share these enhancements with the XML Schema community and with folks at Eclipse WTP.

At Eclipse WTP, we enhanced a few "schema aware" components of the PsychoPath XPath 2.0 engine to support these Xerces changes. I think we improved the design of typed values of XML element and attribute XDM nodes in PsychoPath, for the case where the node's type annotation is an XML Schema simpleType with variety list or union.

Here's a summary of the XML Schema 1.1 implementation changes recently completed in Xerces (available in the Xerces SVN repository as of now). They are planned to be part of the Xerces-J 2.11.0 release, expected around November 2010.

1. Xerces-J now has a complete implementation of XML Schema 1.1 conditional inclusion. The Xerces-J 2.10.0 release implemented the vc:minVersion and vc:maxVersion attributes. Xerces-J now supports all the "conditional inclusion" attributes specified by the XML Schema 1.1 spec. The newly supported ones are vc:typeAvailable, vc:typeUnavailable, vc:facetAvailable and vc:facetUnavailable. All XML Schema 1.1 built-in types and facets are recognized when these attributes are evaluated.

2. There are also a few interesting changes to the Xerces-J XML Schema 1.1 assertions implementation, planned for the Xerces-J 2.11.0 release. Xerces now has improved assertion evaluation on XML Schema 1.1 simple types with varieties 'list' and 'union'.

2.1 Enhancements to assertions evaluation on simpleType -> list:

Here's an example of XML Schema 1.1 assertions on an xs:list schema component:
[XML Schema 1]
   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <xs:element name="Example" type="EXAMPLE_LIST" />
   
      <xs:simpleType name="EXAMPLE_LIST">
         <xs:list>
            <xs:simpleType>
               <xs:restriction base="xs:integer">
                  <xs:assertion test="$value mod 2 = 0" />
               </xs:restriction>
            </xs:simpleType>
         </xs:list>
      </xs:simpleType>
   
   </xs:schema> 

Suppose an XML instance document has a structure like the following:
[XML 1]
<Example>1 2 3</Example>

If this XML instance document ([XML 1]) is validated against the above XML schema ([XML Schema 1]), Xerces-J reports error messages like the following (assuming the XML document is named test.xml):
[Error] test.xml:1:25: cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'Example' with type '#anonymous' did not succeed. Assertion failed for an xs:list member value '1'.
[Error] test.xml:1:25: cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'Example' with type '#anonymous' did not succeed. Assertion failed for an xs:list member value '3'.


An assertion must be evaluated on every 'simpleType -> list' item in an XML instance document, i.e., on each item validated by the itemType of the xs:list. Xerces now does this, and displays the needed error messages when assertions fail.

2.2 Enhancements to assertions evaluation on simpleType -> union:

Here's an example of XML Schema 1.1 assertions on an xs:union schema component:
[XML Schema 2]
   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
      <xs:element name="Example">
         <xs:simpleType>
            <xs:union memberTypes="MYDATE xs:integer" />
         </xs:simpleType>
      </xs:element>
   
      <xs:simpleType name="MYDATE">
         <xs:restriction base="xs:date">
            <xs:assertion test="$value lt current-date()" />
         </xs:restriction>
      </xs:simpleType>

   </xs:schema>

Suppose an XML instance document has a structure like the following:
[XML 2]
<Example>2010-12-05</Example>

If this instance document is validated against the schema document [XML Schema 2], Xerces displays the following error message:
[Error] temp.xml:1:30: cvc-assertion.union.3.13.4.1: Element 'Example' with value '2010-12-05' is not locally valid. One or more of the assertion facets on an element's schema type, with variety union, have failed.

Xerces tried to validate the atomic value '2010-12-05' against both member types, MYDATE and xs:integer, and neither accepted it:

- It is not a lexically valid xs:integer.
- As an xs:date, it fails the MYDATE assertion ($value lt current-date()), because 2010-12-05 was a future date when this post was written.

Since an assertion failed during these validation checks, Xerces reported the relevant assertion failure.

If the XML schema [XML Schema 2] is used to validate the XML instance document:
<Example>10</Example>

no validation failures are reported. The atomic value '10' conforms to the schema type xs:integer, so the value is valid against the union schema type overall.
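You can mimic the union behaviour described above in plain Java: a value is valid if at least one member type accepts it. The class and method names are hypothetical.

- MYDATE is approximated by java.time.LocalDate parsing plus a comparison with a caller-supplied "today". The "today" parameter stands in for current-date() and keeps results reproducible.
- xs:integer is approximated by a regular expression.
- Timezoned xs:date values such as 2010-12-05Z are not handled.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class UnionSketch {

    // MYDATE: an xs:date earlier than "today" (stand-in for $value lt current-date())
    static boolean validMyDate(String lexical, LocalDate today) {
        try {
            return LocalDate.parse(lexical).isBefore(today);
        } catch (DateTimeParseException e) {
            return false;                       // not a (timezone-free) date at all
        }
    }

    // xs:integer: optional sign followed by digits
    static boolean validInteger(String lexical) {
        return lexical.matches("[+-]?[0-9]+");
    }

    // A union value is valid if at least one member type accepts it
    static boolean validUnion(String lexical, LocalDate today) {
        String v = lexical.trim();
        return validMyDate(v, today) || validInteger(v);
    }

    public static void main(String[] args) {
        LocalDate postDate = LocalDate.of(2010, 9, 5);          // date of this post
        System.out.println(validUnion("2010-12-05", postDate)); // false
        System.out.println(validUnion("10", postDate));         // true
    }
}
```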

I'm ending this blog post now. Stay tuned for more news here :)

And I hope, that this post was useful.

Saturday, July 17, 2010

XSD 1.1: XML schema design approaches cotd... PART 3

I'm continuing with the XML Schema design thoughts series, with the third part here. The first two parts are available here:
1) PART 1
2) PART 2

All the examples here have been tested with Xerces-J 2.10.0.

(A disclaimer up front: the examples in this post are somewhat fictitious and may not serve a real-life use case. They are cooked up only to illustrate XML Schema 1.1 constructs and some of the design thinking behind them.

I also use the phrase "element particles" in a lot of places. This simply means XML elements, but "particle" is a formal term defined by the XML Schema spec. It designates schema components that have minOccurs and maxOccurs attributes; if these attributes are absent, they take default values for the relevant schema components.)

I'm first presenting a sample XML Schema 1.1 schema with a corresponding XML document. Then I'll reflect on the design behind these examples from my point of view:

[XML1]
  <Book>
     <name>XML in a Nutshell</name>
     <ISBN>AB-1001</ISBN>
     <author>Jimmy</author>
     <NoPages>100</NoPages>
  </Book>

[XML Schema 1]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
     <xs:element name="Book">
        <xs:complexType>
          <xs:complexContent>
            <xs:extension base="BOOK_FRAGMENT">
               <xs:openContent>
                 <xs:any processContents="lax" />
               </xs:openContent>
               <xs:assert test="not(* except (name, author, ISBN, NoPages)) and 
                                 (if (ISBN)
                                    then not(ISBN/*) 
                                    else true()) and 
                                 (if (NoPages) 
                                     then (not(NoPages/*) and (NoPages/text() castable as xs:positiveInteger))
                                     else true())" />    
            </xs:extension>
          </xs:complexContent>          
        </xs:complexType>
     </xs:element>
   
     <xs:complexType name="BOOK_FRAGMENT">
        <xs:sequence>
          <xs:element name="name" type="xs:string" />
          <xs:element name="author" type="xs:string" />
        </xs:sequence>
     </xs:complexType>

  </xs:schema>

The following use-case requirements motivated me to write this sample. I also reflect on the schema design choices I've made, and I invite comments on them from readers (if you have the patience to read this post and respond!):
1. XML Schema 1.0 has a limitation: when a complex type with sequence or choice particles is derived by extension, the derived type can only add element particles at the end of the base type's element list. Suppose we want to reuse such a complex type by extension, and need to add element particles anywhere among the base type's elements. The above XML schema (XML Schema 1) is intended to do this, and it does successfully validate the corresponding XML document presented above (XML1).

2. A key design decision in the above schema (XML Schema 1) is to use the "openContent" instruction, newly introduced in XML Schema 1.1. The use of XSD 1.1 assertions here is optional but very practical, which I'll try to explain!

An XML Schema "openContent" instruction is essentially a wrapper around an xs:any wildcard. It produces the same effect as the wildcard, but with an interleave or suffix-appending behavior. To learn more about XSD 1.1 open contents, please read the XML Schema 1.1 spec. For a lighter (but brilliant) explanation, you may read Roger L. Costello's XML Schema 1.1 write-up available here.
The XML Schema 1.1 spec defines an "openContent" instruction as follows:
  <openContent
     id = ID
     mode = (none | interleave | suffix) : interleave
     {any attributes with non-schema namespace . . .}>
     Content: (annotation?, any?)
  </openContent>
The "interleave" mode (the default openContent mode) is what lets additional element particles be interspersed between the base type's element particles.

3. In the above example, the XML elements "ISBN" and "NoPages" are added to the base type's element particles. They are not appended at the end of the base type's elements, but can be placed anywhere within the resulting XML content model. In this example, the placement of the elements from the derived complex type is arbitrary; it only illustrates how the "openContent" instruction works in "interleave" mode.

4. It's interesting to see the benefit of XSD 1.1 assertions here. The assertions impose constraints on the resulting content model, which would otherwise be wide open with no restrictions. The assertions in the above schema document (XML Schema 1) mean:
  a) The resulting content model can only have XML elements -> "name", "author", "ISBN" and "NoPages".
  b) The element "ISBN" must have no child elements, i.e., it holds an atomic string value. The element "NoPages" must have no child elements, and its text must be an xs:positiveInteger value.

I'm presenting below another variant of the schema (compared with XML Schema 1 above), which solves the same problem in a slightly different way (its advantages and disadvantages are described after the example):

[XML Schema 2]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
      <xs:element name="Book">
         <xs:complexType>
            <xs:complexContent>
               <xs:extension base="BOOK_FRAGMENT">
                  <xs:openContent>
                     <xs:any processContents="strict"/>
                  </xs:openContent>
                  <xs:assert test="count(distinct-values(for $elem in (* except (name, author)) return $elem/name())) = count(for $elem in (* except (name, author)) return $elem/name())"/>       
               </xs:extension>
            </xs:complexContent>          
         </xs:complexType>
      </xs:element>
   
      <xs:complexType name="BOOK_FRAGMENT">
         <xs:sequence>
           <xs:element name="name" type="xs:string"/>
           <xs:element name="author" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   
      <xs:element name="ISBN" type="xs:string" />
   
      <xs:element name="NoPages" type="xs:positiveInteger" />

   </xs:schema>

The example XML document for this schema (XML Schema 2) remains the same (XML1). Here are the advantages of XML Schema 2 (and, unfortunately, a small disadvantage as well, with a suggested workaround for it):
1. Here we are using the xs:any wild-card in processContents="strict" mode (the earlier example used the wild-card in "lax" mode), and providing the corresponding element declarations in the schema (the last two element declarations). The advantage of this approach is that the content models of the elements "ISBN" and "NoPages" are enforced natively by the XML schema engine, and the schema author doesn't have to implement these content model constraints herself/himself via assertions (for example, that an element has no child elements and holds an atomic value). This approach is more robust than trying to achieve a similar effect with assertions.

2. The assertion in the schema document [XML Schema 2] ensures that each element other than "name" and "author" occurs at most once. This is accomplished by this simple algorithm:
count(distinct-values(names...)) = count(names...)
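The same duplicate test is easy to mirror outside XPath. Here's a minimal plain-Java sketch (not how an XSD processor evaluates the assertion), assuming the child element names have already been collected into a list; the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;

// Plain-Java analogue of count(distinct-values(names)) = count(names):
// a list of names has no repeats exactly when a Set built from it
// has the same size as the list.
public class UniqueNames {

    static boolean allDistinct(List<String> names) {
        return new HashSet<>(names).size() == names.size();
    }

    public static void main(String[] args) {
        // Hypothetical child-element names of "Book", excluding name/author
        System.out.println(allDistinct(List.of("ISBN", "NoPages")));         // true
        System.out.println(allDistinct(List.of("ISBN", "NoPages", "ISBN"))); // false
    }
}
```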

3. The only drawback I foresee with XML Schema 2 is that the elements "ISBN" and "NoPages" are now global elements (which is necessary for the xs:any wild-card to work in processContents="strict" mode). This implies that the following XML documents would also be reported valid by the schema document XML Schema 2:
  <ISBN>AB-1001</ISBN>
AND
  <NoPages>100</NoPages>

This is a side-effect of the schema document XML Schema 2, which I personally don't like :(

To solve this limitation, I can imagine a workaround as follows:
We could perform two validations in sequence: one with the schema document [XML Schema 2] (let's call this validation V1), and a second one with the following schema document (let's call this validation V2):

[XML Schema 3]
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      
      <xs:element name="ISBN" type="xs:string" />
   
      <xs:element name="NoPages" type="xs:positiveInteger" />

  </xs:schema>

This is a kind of little validation pipeline. The complete, end-to-end validation (the one that carries the domain meaning) succeeds if validation V1 succeeds but V2 fails. I imagine that such a pipeline could be orchestrated by a host language, for example Java using the JAXP XML Schema validation APIs.
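Here's a minimal sketch of such a pipeline using the JAXP validation API (javax.xml.validation). It assumes that the XSD 1.1 schema for V1 is compiled separately: the JDK's built-in SchemaFactory only understands XSD 1.0, which is enough for [XML Schema 3] but not for [XML Schema 2]. The class and method names are illustrative, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ValidationPipeline {

    // [XML Schema 3]: just the two global declarations (plain XSD 1.0)
    static final String SCHEMA_3 =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='ISBN' type='xs:string'/>"
        + "<xs:element name='NoPages' type='xs:positiveInteger'/>"
        + "</xs:schema>";

    static Schema compile(SchemaFactory factory, String xsd) {
        try {
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile", e);
        }
    }

    // True if the document is valid. With no ErrorHandler set, the
    // Validator throws a SAXException on the first validation error.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    // End-to-end rule from the text: accept iff V1 succeeds and V2 fails.
    static boolean accept(Schema v1, Schema v2, String xml) {
        return isValid(v1, xml) && !isValid(v2, xml);
    }

    public static void main(String[] args) {
        // The JDK's built-in XSD 1.0 factory is enough for V2.
        Schema v2 = compile(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI), SCHEMA_3);
        // V1 needs an XSD 1.1 processor for [XML Schema 2]. Assumption: with
        // Xerces-J's XSD 1.1 build on the classpath, its factory is obtained via
        // SchemaFactory.newInstance("http://www.w3.org/XML/XMLSchema/v1.1").
        System.out.println(isValid(v2, "<ISBN>AB-1001</ISBN>")); // true, so the pipeline rejects it
        System.out.println(isValid(v2, "<Book><name>n</name><author>a</author></Book>")); // false
    }
}
```

With this rule, a bare ISBN document passes V2 and is therefore rejected, while a Book document fails V2 (Book is not declared in XML Schema 3) and is accepted as long as V1 succeeds.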

Thanks for reading!

I hope that this post is useful.

Sunday, July 11, 2010

XSD 1.1: XML schema design approaches contd... PART 2

I'm continuing the XML Schema design approaches series that I started in the previous blog post. Here's the second post in this series.

Here's a description of the use-case I'll be illustrating in this post, with both XML Schema 1.0 and 1.1 examples:

We need to write an XML Schema for the following XML content model:
  colors
    -> (violet | indigo | blue | green | yellow | orange | red)+

Here the words "colors", "violet" etc. represent XML elements; these elements have no attributes and are empty. The above content model means that the children of the element "colors" can repeat and are unordered, and at least one of them is required.

Therefore, the following XML document is a valid instance according to this content model:

[XML1]
  <colors>
     <violet/>
     <indigo/>
     <blue/>
     <green/>
     <yellow/>
     <orange/>
     <red/>
  </colors>

AND, for example, the following XML document is valid as well, as per the content model described above (here the element "colors" has fewer children than in the previous example, and some of the children of "colors" occur more than once):

[XML2]
  <colors>
     <violet/>
     <indigo/>
     <blue/>
     <green/>
     <green/>
  </colors>

Here are two XML schema examples that express the above XML content model constraints:

[XML Schema 1] (written in XML Schema 1.0)
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
     <xs:element name="colors">
        <xs:complexType>
           <xs:choice maxOccurs="unbounded">
              <xs:element name="violet" type="EMPTY" />
              <xs:element name="indigo" type="EMPTY" />
              <xs:element name="blue" type="EMPTY" />
              <xs:element name="green" type="EMPTY" />
              <xs:element name="yellow" type="EMPTY" />
              <xs:element name="orange" type="EMPTY" />
              <xs:element name="red" type="EMPTY" />     
           </xs:choice>
        </xs:complexType>
     </xs:element>
   
     <xs:complexType name="EMPTY"> 
        <xs:complexContent> 
          <xs:restriction base="xs:anyType" /> 
        </xs:complexContent> 
     </xs:complexType>

  </xs:schema>
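Since "XML Schema 1" uses only XSD 1.0 constructs, it can be exercised with the XSD 1.0 validator that ships with the JDK. A minimal sketch, with the schema inlined as a string (the class name is hypothetical):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class ColorsXsd10 {

    // "XML Schema 1" from above, inlined
    static final String SCHEMA =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='colors'><xs:complexType>"
        + "<xs:choice maxOccurs='unbounded'>"
        + "<xs:element name='violet' type='EMPTY'/><xs:element name='indigo' type='EMPTY'/>"
        + "<xs:element name='blue' type='EMPTY'/><xs:element name='green' type='EMPTY'/>"
        + "<xs:element name='yellow' type='EMPTY'/><xs:element name='orange' type='EMPTY'/>"
        + "<xs:element name='red' type='EMPTY'/>"
        + "</xs:choice></xs:complexType></xs:element>"
        + "<xs:complexType name='EMPTY'><xs:complexContent>"
        + "<xs:restriction base='xs:anyType'/>"
        + "</xs:complexContent></xs:complexType>"
        + "</xs:schema>";

    static final Schema COMPILED = compile();

    static Schema compile() {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema does not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            COMPILED.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false; // default error handling throws on the first error
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<colors><violet/><indigo/><blue/><green/><green/></colors>")); // XML2: true
        System.out.println(isValid("<colors><red>x</red></colors>")); // false: red must be empty
    }
}
```

XML1 and XML2 both validate; an empty "colors" element, an unknown color, or a color element with content is rejected.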

[XML Schema 2] (written in XML Schema 1.1 -- the 1.1-specific constructs here are the xs:assert elements)
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
     <xs:element name="colors">
        <xs:complexType>
           <xs:sequence>
             <xs:any maxOccurs="unbounded" processContents="lax" />
           </xs:sequence>
           <xs:assert test="every $x in */name() satisfies ($x = 
                              ('violet','indigo','blue','green','yellow','orange','red'))" />
           <xs:assert test="every $x in * satisfies not($x/node())" />
        </xs:complexType>
     </xs:element>

  </xs:schema>
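To make the meaning of the two assertions concrete, here's a rough plain-Java/DOM analogue of what they check. It only illustrates the XPath logic, assuming a non-namespace-aware parse (so getNodeName() matches name()); it isn't how an XSD 1.1 processor evaluates assertions, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ColorsAssertions {

    static final Set<String> COLORS =
        Set.of("violet", "indigo", "blue", "green", "yellow", "orange", "red");

    static boolean check(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed", e);
        }
        NodeList kids = doc.getDocumentElement().getChildNodes();
        int elements = 0;
        for (int i = 0; i < kids.getLength(); i++) {
            Node child = kids.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue; // "*" selects elements only
            elements++;
            // every $x in */name() satisfies ($x = (...the seven colors...))
            if (!COLORS.contains(child.getNodeName())) return false;
            // every $x in * satisfies not($x/node()): no child nodes at all
            if (child.hasChildNodes()) return false;
        }
        // Not an assertion: xs:any's default minOccurs="1" requires a child.
        return elements >= 1;
    }

    public static void main(String[] args) {
        System.out.println(check("<colors><violet/><green/><green/></colors>")); // true
        System.out.println(check("<colors><red>x</red></colors>"));              // false
    }
}
```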

Here's my quick analysis of the differences between the above schema approaches, and of whether one of them is better than the other:
1) "XML Schema 1" is written in the familiar 1.0 style, so people who want to stick with 1.0 can still adopt this technique. We can observe that the first schema is a little more verbose than the second one, which I see as an advantage of the second one.

2) If you are comfortable writing XPath 2.0 expressions, there are virtually unlimited possibilities for expressing schema validation constraints with XSD 1.1 assertions, which puts a lot of power in the hands of the schema author!

3) Personally speaking, I find the second way of writing the XML schema ("XML Schema 2") a really cool NEW way to express these validation constraints. I'm not suggesting that the first way is not good! That technique has great value in its own right and has stood the test of time. I simply find the second technique a more natural way for the schema author to express the logic of the use-case in question.

4) One of the possibilities I now foresee with XML Schema 1.1 is that the schema author can impose quite a few constraints on the xs:any wild-card via assertions (which is particularly useful with the processContents="lax" mode of the wild-card). It's worth observing that with the processContents="strict" mode of xs:any, assertions are not really useful, because the schema validator strictly validates the XML element against an element declaration, which the schema author must provide to satisfy the "strict" mode of the wild-card (and assertions here would actually interfere with the available element declarations, which in my opinion is not a good design). With the processContents="skip" mode of xs:any, assertions would always fail (and the XML instance would become invalid), because the concerned XML elements would be discarded by the XML schema validator, and consequently these elements would not be part of the XPath data-model tree on which assertions operate.

Needless to say, Xerces-J handles all the above examples fine!

I hope that this post is useful.

Saturday, July 3, 2010

XSD 1.1: XML schema design approaches in XSD 1.1 world... PART 1

I'm planning to write a series of posts only on XML schema design using the XML Schema 1.1 constructs (writing too many ideas in one post could be boring to read, and would make that post quite voluminous). I'll try to make sure that the posts in this series, starting with this one, cross-reference each other on related issues, AND I'll announce in a future blog post when I'm ending the series. I'll try to show why XML Schema 1.1 is essential for certain use-cases, and where XML Schema 1.0 falls short.

There may be blog posts unrelated to this series between these posts. When this series completes, I'll try to summarize the ideas at the end, to make the whole series available as a unit.

I'll state at the outset that the advice offered here may not necessarily be the best. Improvements are generally always possible! Any feedback would be great (about the correctness of anything described here, alternative ideas OR anything else).

To start this first post of the series, here's a little background about the use-case I'm describing in the subsequent paragraphs:
I've recently been reading the book "DB2 pureXML Cookbook: Master the Power of the IBM Hybrid Data Server" [1] by Matthias Nicola (a member of the DB2 pureXML team). This book describes an example (in chapter 2) as follows.
A physical object could be described by two possible XML content models as follows:

a) Metadata as values, aka Name/Value Pairs (often bad):
  <object type="car">
    <field name="brand" value="Honda" />
    <field name="price" value="5000" />
    <field name="currency" value="USD" />
    <field name="year" value="2002" />
  </object>

b) Metadata as element names (good):
  <car>
    <brand>Honda</brand>
    <price currency="USD">5000</price>
    <year>1996</year>
  </car>

I won't describe here why one of the above XML design approaches is good or bad. This is described well in the book cited above [1], which I would encourage folks to read (the book has some nice explanations about DB2 pureXML as well).

Let's say we want to build an XSD schema for the XML document (a) above. To start with, one of the design decisions I took is to define a set of XML Schema types with the following hierarchy:
  OBJECT
     -> OBJECT_ON_SALE
            -> CAR
            -> BOOK

A few other design decisions I've taken are as follows:
1) We'll use XSD 1.1 type-alternatives to select between different schema types, depending on the value of the attribute object/@type.
2) We'll use a hierarchy of XSD 1.1 assertion definitions to enforce certain validation constraints.

The meaning of these design decisions should become clear by looking at the XML instance and schema documents I'm describing below:

The sample XML instance I propose is as follows:
[XML1] (this is the same as XML instance "a" above, and is repeated here for convenience)
  <object type="car">
     <field name="brand" value="Honda" />
     <field name="price" value="5000" />
     <field name="currency" value="USD" />
     <field name="year" value="2002" />
  </object>
OR

[XML2]
  <object type="book">
     <field name="title" value="XML in a Nutshell" />
     <field name="author" value="Jimmy" />
     <field name="author" value="Nick" />
     <field name="publisher" value="Prentice Hall" />
     <field name="price" value="15" />
     <field name="currency" value="USD" />
     <field name="year" value="2008" />
  </object>

I propose the following XSD 1.1 schema (the 1.1-specific constructs are the xs:alternative and xs:assert elements), designed to validate both of the above XML instance documents (after which I'll try to analyze a few design elements of this schema):

[SCHEMA1]
  <?xml version="1.0" encoding="UTF-8"?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:xerces="http://xerces.apache.org">
 
     <xs:element name="object" type="OBJECT">
       <xs:alternative test="@type='book'" type="BOOK" />
       <xs:alternative test="@type='car'" type="CAR" />
     </xs:element>
 
     <xs:complexType name="BOOK">
        <xs:complexContent>
           <xs:extension base="OBJECT_ON_SALE">
              <xs:assert test="field/@name = 'title' and
                               field/@name = 'author' and
                               field/@name = 'publisher' and
                               field/@name = 'year'"
                         xerces:message="For a book the fields title/author/publisher/year are mandatory" />
              <xs:assert test="xs:int(field[@name = 'year']/@value) gt 1900 and 
                               xs:int(field[@name = 'year']/@value) lt 2011"
                         xerces:message="A book's publication year must be between 1900 and 2011" />
              <xs:assert test="count(field[not(@name = 'author')]) = 
                               count(distinct-values(field[not(@name = 'author')]/string(@name)))" 
                         xerces:message="A book can have multiple authors, but none of other fields of a book can occur twice" />
           </xs:extension>
        </xs:complexContent>
     </xs:complexType>
 
     <xs:complexType name="CAR">
        <xs:complexContent>
           <xs:extension base="OBJECT_ON_SALE">
              <xs:assert test="field/@name = 'brand' and
                               field/@name = 'year'" 
                         xerces:message="For a car the fields brand/year are required" />
              <xs:assert test="xs:int(field[@name = 'year']/@value) gt 2000 and 
                               xs:int(field[@name = 'year']/@value) lt 2011" 
                         xerces:message="A car's manufacture year must be between 2000 and 2011" />
              <xs:assert test="count(field) = count(distinct-values(field/string(@name)))" 
                         xerces:message="None of the fields of an object 'car' can occur twice" />
           </xs:extension>
        </xs:complexContent>
     </xs:complexType> 
 
     <xs:complexType name="OBJECT_ON_SALE">
        <xs:complexContent>
          <xs:extension base="OBJECT">
             <xs:assert test="field/@name = ('price','currency')" 
                        xerces:message="An object that can be sold, must have the fields price/currency" />
             <xs:assert test="if (field/@name = 'price') then 
                              (field[@name = 'price']/xs:decimal(@value) gt 0 and
                              field/@name = 'currency')
                              else true()" 
                         xerces:message="If a field price is present, the currency field should exist. The value of price must be greater than 0." />
          </xs:extension>
        </xs:complexContent>
     </xs:complexType>
 
     <xs:complexType name="OBJECT">
       <xs:sequence>
         <xs:element name="field" minOccurs="0" maxOccurs="unbounded">
            <xs:complexType>
               <xs:attribute name="name" type="xs:string" />
               <xs:attribute name="value" type="xs:string" />
            </xs:complexType>
         </xs:element>
       </xs:sequence>
       <xs:attribute name="type" type="xs:string" />    
     </xs:complexType>

  </xs:schema>

The key design elements in the above schema document are the inheritance hierarchy and the assertions/type-alternatives. The element "object" is validated by the schema type "CAR" or "BOOK". The set of assertions applicable to the type CAR/BOOK consists of the assertions on that type, as well as the assertions inherited from its base types. The schema type applied to the element "object" is controlled by the type-alternative switch (which works on the value of the attribute "type").
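The type-alternative switch and the assertion inheritance can be modelled in a few lines. The sketch below hard-codes the type hierarchy and the per-type assertion counts of [SCHEMA1]; it's a hypothetical illustration of the selection rule (the first xs:alternative whose test is true wins, otherwise the declared type OBJECT applies), not Xerces code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TypeSelection {

    // Base type of each named type in [SCHEMA1]; OBJECT has no named base.
    static final Map<String, String> BASE = Map.of(
        "BOOK", "OBJECT_ON_SALE",
        "CAR", "OBJECT_ON_SALE",
        "OBJECT_ON_SALE", "OBJECT");

    // Number of xs:assert elements declared directly on each type in [SCHEMA1].
    static final Map<String, Integer> OWN_ASSERTS = Map.of(
        "OBJECT", 0, "OBJECT_ON_SALE", 2, "CAR", 3, "BOOK", 3);

    // xs:alternative tests in document order: @type='book', then @type='car'.
    static String selectType(String typeAttr) {
        String[][] alternatives = { {"book", "BOOK"}, {"car", "CAR"} };
        for (String[] alt : alternatives) {
            if (alt[0].equals(typeAttr)) return alt[1];
        }
        return "OBJECT"; // no test was true: the declared type applies
    }

    // A type is checked against its own assertions plus those of its ancestors.
    static List<String> assertionSources(String type) {
        List<String> chain = new ArrayList<>();
        for (String t = type; t != null; t = BASE.get(t)) chain.add(t);
        return chain;
    }

    static int assertionCount(String type) {
        int total = 0;
        for (String t : assertionSources(type)) total += OWN_ASSERTS.get(t);
        return total;
    }

    public static void main(String[] args) {
        String type = selectType("car");
        // prints: CAR [CAR, OBJECT_ON_SALE, OBJECT] 5
        System.out.println(type + " " + assertionSources(type) + " " + assertionCount(type));
    }
}
```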

When the XML document [XML1] is validated by the schema document [SCHEMA1] (with Xerces-J 2.10.0 -- actually with the latest code-base on Xerces SVN as of today, because there was a minor bug [which affects this particular example] that got fixed a few days after Xerces-J 2.10.0 was released), the validation succeeds with no errors.

Let's try to introduce some data errors in the XML instance document, and see what happens upon validation with the same schema document.

Here's a modified XML instance document:
[XML3]
  <object type="car">
    <field name="price" value="-100" />
    <field name="currency" value="USD" />
    <field name="year" value="1999" />
  </object>

If the above XML document ([XML3] -- named "test.xml") is validated by the schema document [SCHEMA1], we get the following validation errors with Xerces-J:
[Error] test.xml:5:10: cvc-assertion.failure: Assertion failure. If a field price is present, the currency field should exist. The value of price must be greater than 0.
[Error] test.xml:5:10: cvc-assertion.failure: Assertion failure. For a car the fields brand/year are required.
[Error] test.xml:5:10: cvc-assertion.failure: Assertion failure. A car's manufacture year must be between 2000 and 2011.
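To see why exactly these three assertions fail for [XML3], here's a plain-Java restatement of the relevant assertions of CAR and OBJECT_ON_SALE (the first OBJECT_ON_SALE assertion is left out, since it passes for both [XML1] and [XML3]). The name=value input format and the short failure labels are hypothetical conveniences of this sketch, and it assumes numeric year values and at most one price field.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CarAssertions {

    // Hypothetical input format for this sketch: "name=value;name=value;..."
    static List<String[]> parse(String spec) {
        List<String[]> fields = new ArrayList<>();
        for (String pair : spec.split(";")) fields.add(pair.split("=", 2));
        return fields;
    }

    // Value of the first field with this name, or null (like an empty sequence).
    static String value(List<String[]> fields, String name) {
        for (String[] f : fields) if (f[0].equals(name)) return f[1];
        return null;
    }

    // Labels of the failing assertions, in the order Xerces reported them.
    static List<String> failures(String spec) {
        List<String[]> fields = parse(spec);
        List<String> failed = new ArrayList<>();
        String price = value(fields, "price");
        String year = value(fields, "year");

        // OBJECT_ON_SALE: if price is present, it must be > 0 and currency must exist
        if (price != null
                && !(new BigDecimal(price).signum() > 0 && value(fields, "currency") != null)) {
            failed.add("price");
        }
        // CAR: brand and year are required
        if (value(fields, "brand") == null || year == null) failed.add("brand/year");
        // CAR: 2000 < year < 2011; a missing year also makes the XPath test false
        if (year == null || Integer.parseInt(year) <= 2000 || Integer.parseInt(year) >= 2011) {
            failed.add("year range");
        }
        // CAR: no field name may occur twice
        Set<String> seen = new HashSet<>();
        for (String[] f : fields) {
            if (!seen.add(f[0])) { failed.add("duplicate"); break; }
        }
        return failed;
    }

    public static void main(String[] args) {
        // [XML3] prints: [price, brand/year, year range]
        System.out.println(failures("price=-100;currency=USD;year=1999"));
        // [XML1] prints: []
        System.out.println(failures("brand=Honda;price=5000;currency=USD;year=2002"));
    }
}
```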


If we remove all of the xerces:message attributes from the assertions above, Xerces prints the following error messages for the above scenario:
[Error] test.xml:5:10: cvc-assertion.3.13.4.1: Assertion evaluation ('if (field/@name = 'price') then (field[@name = 'price']/xs:decimal(@value) gt 0 and field/@name = 'currency') else true()') for element 'object' with type 'OBJECT_ON_SALE' did not succeed.
[Error] test.xml:5:10: cvc-assertion.3.13.4.1: Assertion evaluation ('field/@name = 'brand' and field/@name = 'year'') for element 'object' with type 'CAR' did not succeed.
[Error] test.xml:5:10: cvc-assertion.3.13.4.1: Assertion evaluation ('xs:int(field[@name = 'year']/@value) gt 2000 and xs:int(field[@name = 'year']/@value) lt 2011') for element 'object' with type 'CAR' did not succeed


It's up to users to decide which error format is appropriate for them (the format without xerces:message prints more error-context information, while the format with xerces:message can be useful for printing user-friendly error messages upon assertion failures).

I won't describe in much detail now the domain meaning of the error messages above, or the problem scenario itself. I believe this fictitious problem domain is simple enough for these examples to be understood.

I'll end this post now. I'll take up the case of the schema type "BOOK" in a future blog post (I imagine the concepts I'm trying to illustrate here for the domain object CAR would be similar for the object BOOK).

I'll try to write a few more XSD 1.1 examples in subsequent posts!

I hope that this post is useful.

Sunday, June 27, 2010

XSD 1.1: assertions failure error messages

Here's some info for XML Schema users!

As mentioned in my earlier blog post (ref, http://mukulgandhi.blogspot.com/2010/04/xsd-11-xsprecisiondecimal-assertions.html), custom error messages for XSD 1.1 assertions were not a standard feature of the XML Schema language spec, and the XML Schema WG was deliberating on this issue at that time.

The XML Schema WG has now taken a decision on this, as follows (this is a response from David Ezell, the XML Schema WG chair):
<quote>
The WG believes that this topic should be covered in a separate note describing best practices for how to handle this issue. Liam suggests following the i18n practice of publishing "articles" to recommend best practice.

See http://www.w3.org/International/ for examples.
</quote>

Therefore, in view of this decision by the XML Schema WG, the Xerces-J implementation of assertion error messages will not change, unless Xerces users have specific comments about assertion error reporting.

Saturday, June 19, 2010

Xerces-J 2.10.0 released!

I'm pleased to pass along the announcement of the Xerces-J 2.10.0 release (released today; ref, http://markmail.org/message/m73xwkrmyacppu3l).

Xerces-J 2.10.0 is now available on the Xerces site, http://xerces.apache.org/xerces2-j/, for the community to use. This release comes nearly two and a half years after the previous Xerces-J release, 2.9.1. It is a significant milestone for Xerces, with lots of new features, bug fixes and enhancements.

Xerces-J 2.10.0 provides two versions of the Xerces distributables:
1. Xerces2 Java 2.10.0 - this is a maintenance release for the 2.9.1 release (with bug fixes and enhancements), with essentially the same parsing and API support as 2.9.1 (except, I think, for added support for the StAX 1.0 event API and the Element Traversal API -- more details are available in the Xerces release notes).
2. Xerces2 Java 2.10.0 (XML Schema 1.1) (Beta) - this release supports a partial experimental implementation of the XML Schema 1.1 Structures and Datatypes Working Drafts (December 2009), along with all the changes available in "Xerces2 Java 2.10.0" release (point 1).

PS: I would also like to extend special thanks to the Eclipse/PsychoPath XPath 2.0 team members, Dave Carver & Jesper S Møller, for helping produce an excellent XPath 2.0 engine, which underlies the Xerces XML Schema 1.1 assertions and type-alternatives implementation. Special thanks also to Andrea Bittau, who originally authored the PsychoPath engine.

I hope the community likes the Xerces 2.10.0 release!

Please feel free to discuss Xerces-J at:

Thursday, May 27, 2010

XSD 1.1: schema versioning, and Xerces-J support

XML Schema 1.1 provides a nice schema composition feature, called "Conditional inclusion" which allows us to include/exclude Schema components, during schema processing, based on values of certain special control attributes (minVersion & maxVersion), specified on the schema components.

Here are two simplistic examples, illustrating this feature:

Example 1

XML document [1]:
<test>3</test>

XSD 1.1 document [2]:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                  xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">

      <xs:element name="test">
          <xs:simpleType>
              <xs:restriction base="xs:positiveInteger">
                 <xs:assertion test="$value mod 2 = 0" vc:minVersion="1.1" />
              </xs:restriction>
          </xs:simpleType>
      </xs:element>

</xs:schema>

In the above schema document [2], the attribute vc:minVersion on the xs:assertion instruction specifies that the assertion is processed only by XSD processors running at language version 1.1 or higher. If this schema document [2] is run by an XSD 1.1 processor (and possibly a processor for a higher language version in the future) in XSD 1.0 mode, the assertion instruction is ignored by the XSD engine. The schema versioning features allow us to have an XSD engine ignore certain schema components entirely, along with their descendant instructions.
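The inclusion rule itself is simple to state in code. Here's a minimal sketch, assuming the XSD 1.1 rule that an element is kept when vc:minVersion <= processor version < vc:maxVersion (maxVersion being exclusive), with versions compared as decimals; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class ConditionalInclusion {

    // Keep an element if minVersion <= v < maxVersion; an absent bound
    // (null here) places no restriction.
    static boolean isIncluded(String processorVersion, String minVersion, String maxVersion) {
        BigDecimal v = new BigDecimal(processorVersion);
        if (minVersion != null && v.compareTo(new BigDecimal(minVersion)) < 0) return false;
        if (maxVersion != null && v.compareTo(new BigDecimal(maxVersion)) >= 0) return false;
        return true;
    }

    public static void main(String[] args) {
        // The xs:assertion in [2] carries vc:minVersion="1.1"
        System.out.println(isIncluded("1.0", "1.1", null)); // false: ignored in 1.0 mode
        System.out.println(isIncluded("1.1", "1.1", null)); // true: assertion applies
    }
}
```

Applied to [2]: in 1.1 mode the assertion is kept, so  fails the test $value mod 2 = 0; in 1.0 mode the assertion is dropped, and 3 is simply a valid xs:positiveInteger.
<test>
if (ConditionalInclusion.isIncluded("1.0", "1.1", null)) throw new AssertionError("1.0 must skip minVersion 1.1");
if (!ConditionalInclusion.isIncluded("1.1", "1.1", null)) throw new AssertionError("1.1 must keep minVersion 1.1");
if (ConditionalInclusion.isIncluded("1.1", null, "1.1")) throw new AssertionError("maxVersion is exclusive");
if (!ConditionalInclusion.isIncluded("1.0", null, "1.1")) throw new AssertionError("1.0 below maxVersion 1.1");
if (!ConditionalInclusion.isIncluded("1.10", "1.1", null)) throw new AssertionError("1.10 equals 1.1");
</test>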

Example 2

XML document [3]:
<address ver="V2">
    <street1></street1>
    <street2>XX</street2>
    <city>XX</city>
    <state>XX</state>
    <country>XX</country>
</address>


XSD 1.1 document [4]:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                  xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">

     <xs:element name="address" type="Address">
         <xs:alternative test="@ver = 'V2'" type="AddressV2" vc:minVersion="1.1" />
     </xs:element>

     <xs:complexType name="Address">
        <xs:sequence>
            <xs:element name="street1" type="xs:string" />
            <xs:element name="street2" type="xs:string" />
            <xs:element name="city" type="xs:string" />
            <xs:element name="state" type="xs:string" />
            <xs:element name="country" type="xs:string" />
        </xs:sequence>
     </xs:complexType>

     <xs:complexType name="AddressV2">
         <xs:complexContent>
             <xs:extension base="Address">
                <xs:attribute name="ver" type="xs:string" />
                <xs:assert test="not(normalize-space(street1) = '')" />
             </xs:extension>
         </xs:complexContent>
     </xs:complexType>

</xs:schema>

Similarly, the above schema document ([4]) ignores the type-alternative instruction if the XSD 1.1 processor is run in XSD 1.0 mode. I believe the intent of the above schema and XML document should be clear enough: we are using an "address" element in the XML document, which needs to be validated by a corresponding XML Schema type. The complex type "AddressV2" extends the type "Address", and has an assertion that constrains the contents of the element "street1" (in this particular example, the assertion on the type "AddressV2" requires "street1" to contain at least one non-whitespace character).
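The AddressV2 assertion relies on the XPath normalize-space() function. A small Java analogue (the helper names are hypothetical) shows why the empty street1 in [3] fails it: normalize-space() strips leading and trailing whitespace and collapses internal runs, so an empty or whitespace-only value becomes the empty string.

```java
public class Street1Check {

    // XPath normalize-space(): collapse runs of #x20/#x9/#xD/#xA into one
    // space, then drop a leading and a trailing space.
    static String normalizeSpace(String s) {
        return s.replaceAll("[ \\t\\r\\n]+", " ").replaceAll("^ | $", "");
    }

    // Mirrors the AddressV2 assertion: not(normalize-space(street1) = '')
    static boolean street1Ok(String street1) {
        return !normalizeSpace(street1).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(street1Ok(""));     // false: <street1></street1> in [3]
        System.out.println(street1Ok("  \n")); // false: whitespace only
        System.out.println(street1Ok(" XX ")); // true
    }
}
```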

Xerces-J runs these examples fine.

Summarizing this post: the XSD 1.1 schema versioning features allow us to write an XSD schema that mixes XSD 1.0 and 1.1 instructions (and, in the future, instructions beyond the XSD 1.1 level), and to have the XSD 1.1 engine ignore certain instructions at run time, depending on the XSD language level at which the engine was invoked.

I hope that this post is useful.

Sunday, April 25, 2010

XSD 1.1: negative "pattern" facets and assertions

While exploring XSD 1.1 assertions further, I've become quite convinced that many of the limitations of the XSD "pattern" facet can be overcome with assertions (and of course one of the real benefits of XSD 1.1 assertions is the ability to specify co-occurrence constraints in XML Schema documents -- here's a nice article explaining XML Schema 1.1 co-occurrence constraints).

I think one of the things that can get quite difficult to express in XML Schema 1.0 is a negative word list.

For example, if we have this simple XML document:
  <fruit>apple</fruit>

And we want the XML element "fruit" not to contain, say, the words "cherry" or "guava". Although this looks like a pretty straightforward regex use-case, it can unfortunately get quite cumbersome to express this pattern with the available XSD 1.0 regular-expression syntax.

My quick attempt to express this with XSD 1.0 was something like the following:
<xs:pattern value="^(cherry|guava)" />

But unfortunately, the above pattern facet and quite a few similar regexes can't accomplish this seemingly common use-case easily (in XSD regular expressions, ^ is an ordinary character rather than an anchor, and there is no negation or lookahead operator). I think this is doable with XSD 1.0 regexes, but it would certainly be quite tedious to arrive at the right pattern -- of course regex experts/gurus could do this easily, but not me at this moment!

Now let me try to express these validation constraints with XSD 1.1 assertions. Here's a sample XSD 1.1 schema [1] that uses assertions to solve this and a few similar use-cases:

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="Example" type="Fruits1" />
  
     <xs:complexType name="Fruits1">
       <xs:sequence>
         <xs:element name="fruit" type="xs:string" />
         <xs:element name="exclude" type="xs:string" />
       </xs:sequence>
       <xs:assert test="not(fruit = tokenize(exclude,','))" />
     </xs:complexType>
  
     <xs:complexType name="Fruits2">
       <xs:sequence>
         <xs:element name="fruit" type="xs:string" />
         <xs:element name="exclude" type="xs:string" />
       </xs:sequence>
       <xs:assert test="not(fruit = (for $x in tokenize(exclude,',') return 
                                              normalize-space($x)))" />
     </xs:complexType>
  
     <xs:complexType name="Fruits3">
       <xs:sequence>
         <xs:element name="fruit" type="xs:string" />
         <xs:element name="exclude" type="xs:string" />
       </xs:sequence>
       <xs:assert test="not(fruit = (for $x in tokenize(exclude,',') return 
                                (string-join(tokenize($x,' '),''))))" />
     </xs:complexType>
  
   </xs:schema>

A sample XML instance document [2] that we'll validate with the above schema is the following:
  <Example>
    <fruit>apple</fruit>
    <exclude>cherry,guava</exclude>
  </Example>

As stated in the original requirements above, we want the word in the element "fruit" not to be any of the words in the comma-separated list in the "exclude" element.

In the above XSD schema [1], the complex type "Fruits1" can successfully validate the above XML instance document [2].

The complex type "Fruits2" can validate an exclude list where there is whitespace before and after the comma separator. For example, the list "cherry, guava" (please note the extra space after the comma) would be treated as a proper exclusion list for this example. The schema type "Fruits1" cannot handle this list variant, because the token " guava" (with its leading space) does not equal "guava".

And the complex type "Fruits3" can validate an exclude list like "cherry, g u a v a" (i.e., there can be space characters within a word) -- this is a figment of my imagination :). But such lexical constraints could certainly occur in instance documents.
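The three variants differ only in how each token of the exclude list is cleaned up before the comparison. Here's a plain-Java sketch of that logic (hypothetical class and method names, not XPath itself); it treats fruit = (...) as an XPath general comparison, i.e. true if fruit equals any token.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Stream;

public class FruitExclusion {

    // tokenize(exclude, ','): an empty input yields an empty sequence in XPath
    static Stream<String> tokens(String exclude) {
        return exclude.isEmpty() ? Stream.empty() : Arrays.stream(exclude.split(",", -1));
    }

    // not(fruit = (cleaned tokens)): allowed unless some cleaned token equals fruit
    static boolean allowed(String fruit, String exclude, UnaryOperator<String> clean) {
        return tokens(exclude).map(clean).noneMatch(fruit::equals);
    }

    // Fruits1: tokens compared as-is
    static boolean fruits1(String fruit, String exclude) {
        return allowed(fruit, exclude, x -> x);
    }

    // Fruits2: normalize-space($x), i.e. trim and collapse XML whitespace
    static boolean fruits2(String fruit, String exclude) {
        return allowed(fruit, exclude,
            x -> x.replaceAll("[ \\t\\r\\n]+", " ").replaceAll("^ | $", ""));
    }

    // Fruits3: string-join(tokenize($x, ' '), '') drops every space character
    static boolean fruits3(String fruit, String exclude) {
        return allowed(fruit, exclude, x -> x.replace(" ", ""));
    }

    public static void main(String[] args) {
        System.out.println(fruits1("apple", "cherry,guava"));      // true
        System.out.println(fruits1("guava", "cherry, guava"));     // true: " guava" != "guava"
        System.out.println(fruits2("guava", "cherry, guava"));     // false
        System.out.println(fruits3("guava", "cherry, g u a v a")); // false
    }
}
```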

PS: All the examples in this post were tested with Xerces-J.

I hope that this post is useful.

Saturday, April 17, 2010

XSD 1.1: xs:precisionDecimal, assertions and Xerces-J updates

Section 1
Recently, I studied the XSD primitive data type xs:precisionDecimal (newly introduced in XSD 1.1) in some detail, and tried to use XSD 1.1 assertions to simulate it as a user-defined XSD simple type derived by restriction from xs:decimal (just to satisfy my curiosity and explore XSD assertions further). I believe a native implementation of xs:precisionDecimal should nevertheless exist in an XSD 1.1 implementation, or in language systems that use the XSD type system (for example, a stand-alone XPath 2.x implementation that uses an XSD type system).

Here's an XSD 1.1 schema example, illustrating these concepts:
[1]
   <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

      <xs:element name="example" type="myPrecisionDecimal" />
  
      <xs:simpleType name="myPrecisionDecimal">
        <xs:restriction base="xs:decimal" xmlns:xerces="http://xerces.apache.org">
           <xs:totalDigits value="6" />
           <xs:fractionDigits value="4" />
           <xs:assertion test="string-length(substring-after(string($value), '.')) ge 2" 
                  xerces:message="minScale of this decimal number should be 2" />
        </xs:restriction>
      </xs:simpleType>
  
   </xs:schema>

The XSD type "myPrecisionDecimal" defined above has the following correspondences with the type xs:precisionDecimal:
a) The xs:totalDigits facet in "myPrecisionDecimal" is equivalent to the totalDigits facet of xs:precisionDecimal.
b) The xs:fractionDigits facet in "myPrecisionDecimal" is equivalent to the "maxScale" facet of xs:precisionDecimal.
c) The assertion facet in "myPrecisionDecimal" is equivalent (a user-defined attempt at equivalence!) to the "minScale" facet of xs:precisionDecimal.

When the above schema document [1] is used to validate the following XML instance:
<example>44.4</example>
The following error message is produced:
[Error] test.xml:1:24: cvc-assertion.failure: Assertion failure. minScale of this decimal number should be 2.

It's also worth noting that the above user-defined type "myPrecisionDecimal" cannot be considered a true equivalent of the XSD type xs:precisionDecimal as defined in the XSD 1.1 spec, because xs:precisionDecimal also includes values for positive and negative infinity and for "not a number", and it differentiates between "positive zero" and "negative zero" (these aspects are not defined for xs:decimal). The above example of "myPrecisionDecimal" only demonstrates how to simulate the "minScale" facet of xs:precisionDecimal (a facet which is not available for the type xs:decimal).
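To make the three constraints of "myPrecisionDecimal" concrete, here's a minimal Java sketch using java.math.BigDecimal (the class name MyPrecisionDecimalCheck is hypothetical, and this is not how Xerces implements facets). It follows the value-based reading of totalDigits and fractionDigits, and for the assertion it assumes that string($value) yields the canonical form of the decimal, i.e. without trailing zeros, so that 44.40 is treated like 44.4. As a simplification, it doesn't reject exponent notation such as 1E2, which BigDecimal accepts but xs:decimal does not.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

/** Hypothetical model of totalDigits=6, fractionDigits=4 and the minScale=2 assertion. */
public class MyPrecisionDecimalCheck {

    static final int TOTAL_DIGITS = 6;
    static final int FRACTION_DIGITS = 4;
    static final int MIN_SCALE = 2;

    // Express the value as i / 10^n, with no trailing zeros in i and n >= 0.
    static BigDecimal reduce(BigDecimal v) {
        BigDecimal r = v.stripTrailingZeros();
        return r.scale() < 0 ? r.setScale(0) : r;
    }

    // totalDigits: |i| < 10^6 and n <= 6.
    static boolean totalDigitsOk(BigDecimal r) {
        return r.unscaledValue().abs().compareTo(BigInteger.TEN.pow(TOTAL_DIGITS)) < 0
                && r.scale() <= TOTAL_DIGITS;
    }

    // fractionDigits (the maxScale counterpart): n <= 4.
    static boolean fractionDigitsOk(BigDecimal r) {
        return r.scale() <= FRACTION_DIGITS;
    }

    // The assertion (the minScale counterpart): at least 2 characters after '.'
    // in the canonical string form, mirroring substring-after(string($value), '.').
    static boolean minScaleOk(BigDecimal r) {
        String s = r.toPlainString();
        int dot = s.indexOf('.');
        String fraction = dot < 0 ? "" : s.substring(dot + 1);
        return fraction.length() >= MIN_SCALE;
    }

    public static boolean isValid(String lexical) {
        BigDecimal r = reduce(new BigDecimal(lexical.trim()));
        return totalDigitsOk(r) && fractionDigitsOk(r) && minScaleOk(r);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"44.4", "44.45", "44.40", "1.23456", "1234.567"}) {
            System.out.println(s + " -> " + (isValid(s) ? "valid" : "invalid"));
        }
    }
}
```

With this sketch, "44.4" is rejected by the minScale check, matching the assertion failure reported above.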

Section 2
(Xerces-J, assertions implementation update)

Xerces-J recently implemented an extension attribute "message" (in the namespace http://xerces.apache.org, for the Xerces-J XSD 1.1 implementation) on XSD 1.1 assertion instructions. The value of this attribute is an error message that an XSD 1.1 engine reports when the assertion fails.

An example of this is illustrated in the schema document above [1].

In the absence of the "message" attribute on an assertion (or if it's present but contains only whitespace), Xerces produces the following default error message:
[Error] test.xml:1:24: cvc-assertion.3.13.4.1: Assertion evaluation ('string-length(substring-after(string($value), '.')) ge 2') for element 'example' with type 'myPrecisionDecimal' did not succeed.
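The fallback rule described above (use the "message" attribute when it has non-whitespace content, otherwise build the default message) can be sketched in Java as follows. This is only an illustration that reproduces the two error texts shown in this post; the class and method names are hypothetical, and this is not the actual Xerces-J source code.

```java
/** Hypothetical sketch of choosing between a custom and a default assertion message. */
public class AssertionMessages {

    static String failureMessage(String customMessage, String test,
                                 String element, String type) {
        // A message made only of whitespace counts as absent.
        if (customMessage != null && !customMessage.trim().isEmpty()) {
            return "cvc-assertion.failure: Assertion failure. " + customMessage + ".";
        }
        // Default text: names the XPath expression, the element and the type.
        return "cvc-assertion.3.13.4.1: Assertion evaluation ('" + test
                + "') for element '" + element + "' with type '" + type
                + "' did not succeed.";
    }

    public static void main(String[] args) {
        String test = "string-length(substring-after(string($value), '.')) ge 2";
        System.out.println(failureMessage("minScale of this decimal number should be 2",
                test, "example", "myPrecisionDecimal"));
        System.out.println(failureMessage("   ", test, "example", "myPrecisionDecimal"));
    }
}
```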


In my opinion, the "message" attribute on assertions has the following benefits:
a) For complex (and particularly lengthy) XPath expressions in assertions, the default error messages produced by Xerces can be quite verbose, and users may not find them convenient to read and debug. The user experience with default assertion error messages may be even more troublesome when there are numerous assertion evaluations for an XML document; imagine, for example, a maxOccurs="unbounded" specification on XML elements to which assertions apply, or a schema with more than 10 different assertions.
b) We can specify domain-specific error messages with the assertion "message" attribute.

On the other hand, the default assertion error messages produced by Xerces have the advantage that they report the name of the XSD type and the element/attribute involved in a particular assertion evaluation.

PS: A recent issue raised with the XSD WG proposes adding a "message" attribute on assertions to the XSD 1.1 language itself. The Xerces implementation of the assertion "message" attribute may change in the future, depending on the XSD WG's recommendation on this issue.

I hope that this post is useful.

Sunday, March 21, 2010

playing again with XSD 1.1 assertions

Some time ago, XSLT folks (including me!) were discussing on XSL-List the design of an XML schema describing a product catalog. This post has nothing to do with XSLT, except that the earlier XSL-List discussion gave me the idea for yet another XSD schema use case with which to try out the Xerces-J XSD 1.1 assertions implementation. I wrote the following XSD 1.1 schema to find out whether the Xerces-J XSD 1.1 assertions implementation would succeed on this example (and, to cause no surprise to readers, I'm pleased to say that Xerces passes it!).

So here goes this example.

XML document:
  <?xml version="1.0" encoding="UTF-8" ?>
  <product ID="100">
    <shortName>Sun Press, Java Book</shortName>
    <description>Java Language: Design and Programming</description>
    <author>James Gosling</author>
    <price>
      <value effective="2000-10-10" format="hard cover">25</value>
      <value effective="2005-10-10" format="hard cover">20</value>
      <value effective="2009-10-10" format="pdf" freeware="true">0</value>
    </price>
  </product>
An XSD 1.1 schema validating the above XML document:
  <?xml version="1.0" encoding="UTF-8" ?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:complexType name="Product">
     <xs:sequence>
       <xs:element name="shortName">
         <xs:simpleType>
           <xs:restriction base="xs:string">
             <xs:maxLength value="20"/>
           </xs:restriction>
         </xs:simpleType>
       </xs:element>
       <xs:element name="description" type="xs:string"/>
       <xs:element name="author" type="xs:string"/>
       <xs:element name="price">
         <xs:complexType>
           <xs:sequence>
             <xs:element name="value" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:double">
                      <xs:attribute name="effective" type="xs:date" use="required"/>
                      <xs:attribute name="freeware" type="xs:boolean"/>
                      <xs:attribute name="format" use="required">
                        <xs:simpleType>
                          <xs:restriction base="xs:string">
                            <xs:enumeration value="hard cover"/>
                            <xs:enumeration value="pdf"/>
                          </xs:restriction>
                        </xs:simpleType>
                      </xs:attribute>
                      <xs:assert test="@effective lt current-date()" />
                      <xs:assert test="if (@freeware eq true()) then (@format eq 'pdf' and . eq 0)
                                       else true()" />                    
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
             </xs:element>
           </xs:sequence>
           <xs:assert test="every $vl in value[position() lt last()] satisfies
                            ($vl gt $vl/following-sibling::value[1]) and 
                            ($vl/@effective lt $vl/following-sibling::value[1]/@effective)" />
         </xs:complexType>
       </xs:element>
     </xs:sequence>
     <xs:attribute name="ID" type="xs:positiveInteger" use="required"/>
   </xs:complexType>
   
   <xs:element name="product" type="Product"/>

  </xs:schema>

I won't explain in detail the problem domain behind the above XML and XSD documents (I believe readers familiar with XML and the XSD language can easily understand the intent of the example). In short, this example "illustrates a simple product catalog, describing a single product".
Here's a short explanation of what the assertions in the above schema document (from top to bottom) are intended to do:
1. The first assertion checks that the value of attribute "effective" (with schema type xs:date) is earlier than today's date.
2. The second assertion checks that if the value of attribute "freeware" is boolean true, then the value of attribute "format" must be 'pdf' and the numeric price value must be 0.
3. The third assertion checks that the price always decreases over time, and that the effective date of each price is earlier than that of the next price revision.
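The three assertions above could be modelled outside the schema roughly as follows. This is a plain-Java sketch (the PriceAssertions and PriceValue names are hypothetical) that applies the same checks to already-parsed value entries; the current date is passed in as a parameter instead of being read via current-date(), so that the check is repeatable.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;

/** Hypothetical plain-Java model of the three price assertions. */
public class PriceAssertions {

    static final class PriceValue {
        final double amount;
        final LocalDate effective;
        final String format;
        final Boolean freeware;   // null when the attribute is absent

        PriceValue(double amount, String effective, String format, Boolean freeware) {
            this.amount = amount;
            this.effective = LocalDate.parse(effective);
            this.format = format;
            this.freeware = freeware;
        }
    }

    // Assertion 1: @effective lt current-date()
    static boolean effectiveInPast(PriceValue v, LocalDate today) {
        return v.effective.isBefore(today);
    }

    // Assertion 2: if (@freeware eq true()) then (@format eq 'pdf' and . eq 0) else true()
    static boolean freewareRule(PriceValue v) {
        if (Boolean.TRUE.equals(v.freeware)) {
            return "pdf".equals(v.format) && v.amount == 0;
        }
        return true;
    }

    // Assertion 3: each value is greater than, and effective earlier than, the next one.
    static boolean decreasingOverTime(List<PriceValue> values) {
        for (int i = 0; i < values.size() - 1; i++) {
            PriceValue cur = values.get(i);
            PriceValue next = values.get(i + 1);
            if (!(cur.amount > next.amount && cur.effective.isBefore(next.effective))) {
                return false;
            }
        }
        return true;
    }

    static boolean isValid(List<PriceValue> values, LocalDate today) {
        for (PriceValue v : values) {
            if (!effectiveInPast(v, today) || !freewareRule(v)) {
                return false;
            }
        }
        return decreasingOverTime(values);
    }

    public static void main(String[] args) {
        List<PriceValue> prices = Arrays.asList(
                new PriceValue(25, "2000-10-10", "hard cover", null),
                new PriceValue(20, "2005-10-10", "hard cover", null),
                new PriceValue(0, "2009-10-10", "pdf", true));
        System.out.println(isValid(prices, LocalDate.now()));
    }
}
```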

I enjoyed writing this example, and I'm glad that it worked with Xerces. The Eclipse/PsychoPath XPath 2.0 implementation (the underlying XPath 2.0 implementation used by the Xerces-J XSD 1.1 assertions implementation) also looks quite compliant with the XPath 2.0 language.

I hope that this post is useful.

Sunday, March 7, 2010

Xerces-J: XSModel serialization

A new API sample has been contributed to the Xerces-J code base (in the XSD 1.1 schema-dev branch), which allows us to serialize a Xerces-J XSModel. It should be available in the upcoming Xerces-J release, 2.10.0.

It can be invoked using the Java class xs.XSSerializer.

Here's one use case for this, as asked by colleagues in the community:
http://mail-archives.apache.org/mod_mbox/xerces-j-users/200611.mbox/%3c4557B05A.6000808@gael.fr%3e

Wednesday, February 24, 2010

XSD 1.1: some more assertions fun

Here are some more XSD 1.1 assertion examples (interesting ones, I guess) that I tried running with the Xerces-J XSD 1.1 implementation (they run fine with Xerces!):

Example 1 [1]:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="test" type="X" />
   
    <xs:complexType name="X">
      <xs:group ref="List1" />
      <xs:assert test="a and b and d" />
    </xs:complexType>
   
    <xs:complexType name="Y">
      <xs:group ref="List1" />
      <xs:assert test="a and b and c and d" />
    </xs:complexType>
   
    <xs:group name="List1">
       <xs:sequence>
         <xs:element name="a" type="xs:string" minOccurs="0"/>
         <xs:element name="b" type="xs:string" minOccurs="0"/>
         <xs:element name="c" type="xs:string" minOccurs="0"/>
         <xs:element name="d" type="xs:string" minOccurs="0"/>
       </xs:sequence>
    </xs:group>
           
  </xs:schema>

The corresponding XML instance document is:
<test>
    <a>hello</a>
    <b>world</b>
    <!--<c>hello..</c>-->
    <d>world..</d>
  </test>

Here's the rationale that motivated me to write this XSD sample:
I wanted to define a pair of XSD complex types (something like X and Y above) such that one type could reuse the element particles from the other. Using XSD type derivation (which I attempted initially), I wanted only one of the elements in the derived type to become optional (element "c" in this example, i.e. with minOccurs = 0 and maxOccurs = 1), while the other elements from the base type kept the same occurrence indicator (i.e. mandatory, which is minOccurs = maxOccurs = 1).

Interestingly, this problem is unsolvable with XSD type derivation (either complex type extension, or restriction mechanism).

For this schema use case, I came up with the XSD sample above [1], which meets my goal of reusing the element particles in the XSD types. The schema above [1] defines a global group that contains a sequence of XML element definitions, all of which are marked as optional. Within the complex types (X and Y), the cardinality of the elements (0-1 or 1-1) is enforced with XSD assertions. Defining all elements in the group as optional lets us reuse this list easily in different XSD types, because we can then constrain the elements (for example, controlling their cardinality, or even the contents of elements/attributes) in different contexts/types using assertions.

So, with the above schema [1], if one wants an XSD type where element "c" is optional, one would use type "X"; if one wants an XSD type where all elements are mandatory, one would use type "Y".
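The reuse technique can be modelled in a few lines of Java, as a rough sketch: the shared group only fixes the order of a, b, c and d (each at most once), and each type's assertion then decides which children are mandatory. The names GroupWithAssertions, matchesGroup and assertionHolds are hypothetical and are not part of any XSD API.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of group "List1" (all optional) plus per-type assertions. */
public class GroupWithAssertions {

    // Particles of group List1, in sequence order; all have minOccurs="0".
    static final List<String> LIST1 = Arrays.asList("a", "b", "c", "d");

    // Group check: each child is known, appears at most once and in sequence order.
    static boolean matchesGroup(List<String> children) {
        int pos = 0;
        for (String child : children) {
            int idx = LIST1.indexOf(child);
            if (idx < pos) {
                return false;   // unknown (-1), repeated or out of order
            }
            pos = idx + 1;
        }
        return true;
    }

    // Assertion check, e.g. "a and b and d": every listed child is present.
    static boolean assertionHolds(List<String> children, String... required) {
        return children.containsAll(Arrays.asList(required));
    }

    static boolean validX(List<String> children) {
        return matchesGroup(children) && assertionHolds(children, "a", "b", "d");
    }

    static boolean validY(List<String> children) {
        return matchesGroup(children) && assertionHolds(children, "a", "b", "c", "d");
    }

    public static void main(String[] args) {
        List<String> instance = Arrays.asList("a", "b", "d");  // <c> is commented out
        System.out.println("X: " + validX(instance) + ", Y: " + validY(instance));
    }
}
```

For the instance document above, this prints "X: true, Y: false".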

Having solved the use case I had in mind (explained above), I wrote another schema using some more assertions, just for fun.

Here's the 2nd XSD schema:

Example 2 [2]:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="test" type="X" />
   
     <xs:complexType name="X">
       <xs:group ref="List1" />
       <xs:assert test="a and b and d" />
     </xs:complexType>
   
     <xs:complexType name="Y">
       <xs:group ref="List1" />
       <xs:assert test="a and b and c and d" />
     </xs:complexType>
   
     <xs:group name="List1">
        <xs:sequence>
           <xs:element name="a" minOccurs="0">
             <xs:complexType>
               <xs:sequence>
                 <xs:element name="a1" type="xs:string" maxOccurs="unbounded" />
               </xs:sequence>
               <xs:attribute name="aCount" type="xs:nonNegativeInteger" />
               <xs:assert test="count(a1) eq @aCount" />
             </xs:complexType>
           </xs:element>
           <xs:element name="b" type="xs:string" minOccurs="0"/>
           <xs:element name="c" type="xs:string" minOccurs="0"/>
           <xs:element name="d" type="xs:string" minOccurs="0"/>
        </xs:sequence>
     </xs:group>
           
  </xs:schema>

Schema [2] is conceptually similar to schema [1]. The only difference between the two schemas is that in schema [2], element "a" has complex content, while in schema [1], element "a" has simple content (xs:string). In schema [2], the complex type of element "a" defines another assertion, which enforces the constraint that the value of attribute "aCount" equals the number of "a1" children of element "a". This assertion in the 2nd schema is written only to increase the visible complexity of element a's definition (of course, it also increases the functional complexity of element "a", and consequently the complexity of the contents of the global group definition in the 2nd schema).

The 2nd schema illustrates that a more functionally complex list of particles (a, b, c and d here) benefits even more from the schema component reuse technique illustrated in this post (accomplished with an XSD group and assertions).
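The constraint on element "a" can be tried out with the JDK's built-in javax.xml.xpath API, as a sketch. Note that the JDK implements XPath 1.0 and is not schema-aware, so the assertion is rewritten here as count(a1) = @aCount (XPath 1.0 has no eq operator, and '=' compares the count numerically with the attribute's string value). As with the original assertion, an absent aCount attribute makes the check fail. The class name ACountCheck is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical check of the element "a" assertion using JDK XPath 1.0. */
public class ACountCheck {

    // XPath 1.0 rewrite of the XSD 1.1 assertion: count(a1) eq @aCount
    static final String ASSERTION = "count(a1) = @aCount";

    // Parses an <a> element and evaluates the assertion with <a> as the context node.
    static boolean aCountMatches(String xml) {
        try {
            Element a = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(ASSERTION, a, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate assertion on: " + xml, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(aCountMatches("<a aCount=\"2\"><a1>x</a1><a1>y</a1></a>"));
        System.out.println(aCountMatches("<a aCount=\"3\"><a1>x</a1></a>"));
    }
}
```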

I hope that this post is useful.

Sunday, February 14, 2010

Xerces-J, XSD 1.1 assertions: complexType -> simpleContent -> restriction

XSD 1.1 complex types are specified by the grammar given here, in the XSD 1.1 spec:
http://www.w3.org/TR/xmlschema11-1/#declare-type

The content of an XSD complex type definition is essentially one of three mutually exclusive alternatives:
  <complexType ...>
    simpleContent |
    complexContent |
    (openContent?, (group | all | choice | sequence)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </complexType>

Assertions in complexType -> simpleContent -> restriction are specified a bit differently from all other assertion cases on complex types, because this construct can contain assertion facets on the simple content as well as assertions on the complex type.

This is specified by the following XSD 1.1 grammar:
  <simpleContent
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (restriction | extension))
  </simpleContent>

  <restriction
    base = QName
    id = ID
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (simpleType?, (minExclusive | minInclusive | maxExclusive | maxInclusive | totalDigits | fractionDigits | maxScale | minScale | length | minLength | maxLength | enumeration | whiteSpace | pattern | assertion | {any with namespace: ##other})*)?, ((attribute | attributeGroup)*, anyAttribute?), assert*)
  </restriction>

In the xs:restriction grammar above, the assertions appear roughly as follows:
assertion*, ..., assert*

Here, xs:assertion (with cardinality 0-n) is a facet on the simple type value (specified by complexType -> simpleContent), whereas xs:assert (with cardinality 0-n) is an assertion on the complex type, which has access to the element tree (the XML element itself, and its attributes if there are any). xs:assertion facets on complexType -> simpleContent -> restriction do not have access to the element tree to which the complex type applies; they can only access the simple value of the element in context, via the implicit assertion variable $value (whose XSD type is determined by the <xs:restriction base = QName ...> definition).

Here's a small fictitious example illustrating these concepts:

XML document [1]:
  <A a="15">Example A</A>

XSD 1.1, Schema [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="A">
      <xs:complexType>
        <xs:simpleContent>
          <xs:restriction base="myBase">    
            <xs:assertion test="contains($value, 'Example')" />
            <xs:assert test="@a mod 5 = 0" />    
          </xs:restriction>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
  
    <xs:complexType name="myBase">
      <xs:simpleContent>
         <xs:extension base="xs:string"> 
           <xs:attribute name="a" type="xs:int" />  
         </xs:extension>
      </xs:simpleContent> 
    </xs:complexType>

  </xs:schema>

In the Schema above [2], there are two assertions specified on the XSD type: one is an assertion facet (xs:assertion) on the simple content, and the other is an assertion (xs:assert) on the complex type.

I believe the above Schema is simple enough and self-explanatory to illustrate the points I've tried to explain in this post.
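To illustrate the difference in scope, here's a hypothetical Java sketch (not Xerces code) that mirrors the two assertions in Schema [2] with the JDK's DOM API: the xs:assertion facet is modelled as a Predicate&lt;String&gt; that only ever receives the element's text value, while the xs:assert is a Predicate&lt;Element&gt; that can also read attribute "a". An absent attribute makes the second check fail, much as @a mod 5 = 0 is false for an empty sequence.

```java
import java.io.StringReader;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical illustration of facet scope ($value) versus assert scope (element). */
public class SimpleContentAssertions {

    // xs:assertion facet: contains($value, 'Example'); sees only the text value.
    static final Predicate<String> ASSERTION_FACET = value -> value.contains("Example");

    // xs:assert: @a mod 5 = 0; sees the element, including its attributes.
    // A non-numeric value makes parseInt throw (xs:int validation would reject it anyway).
    static final Predicate<Element> ASSERT_ON_TYPE = el ->
            el.hasAttribute("a") && Integer.parseInt(el.getAttribute("a").trim()) % 5 == 0;

    static boolean isValid(String xml) {
        try {
            Element el = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return ASSERTION_FACET.test(el.getTextContent()) && ASSERT_ON_TYPE.test(el);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot check: " + xml, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<A a=\"15\">Example A</A>"));
    }
}
```

The type signatures make the point of this post explicit: the facet predicate simply cannot reach the attribute.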

Actually, what prompted me to write this post was a minor bug in the complexType -> simpleContent -> restriction facet processing in the Xerces-J XSD 1.1 SVN code, which we were able to fix today; the fix is now available in the Xerces-J SVN repository.

Interestingly, this code used to work correctly in an earlier Xerces-J SVN revision, but the bug got introduced later during assertions development, and it has now been fixed again.

Saturday, February 6, 2010

PsychoPath XPath2 processor update: fn:name() function fix

While writing the following blog post, http://mukulgandhi.blogspot.com/2010/01/xsd-11-wild-cards-in-compositor-and.html (dated Jan 31, 2010) [1], I unearthed a bug in the PsychoPath XPath 2 processor, whereby the XPath 2 fn:name() function didn't evaluate properly with zero arity (it raised a "context undefined" exception, even when a context item existed).

This bug led me to use the fn:local-name() function (whose implementation was correct) instead, in the above-mentioned blog post [1].

The good news is that this bug in the fn:name() function is now fixed (ref, https://bugs.eclipse.org/bugs/show_bug.cgi?id=301539).

For the example given in blog post [1], the XSD 1.1 assertion can now also be written as follows:
  <xs:assert test="(*[1]/name() = ('fname', 'lname')) and 
                 (*[2]/name() = ('fname', 'lname'))" />

(instead of the fn:local-name() function, as used in the mentioned blog post [1])

Friday, February 5, 2010

C. M. Sperberg-McQueen: slides about XSD 1.1

I just came across this brief (but sufficient to give a good overview) slide presentation about XML Schema (XSD) 1.1, by C. M. Sperberg-McQueen:

http://www.blackmesatech.com/2009/07/xsd11/

Nice ones indeed, and highly recommended.

Sunday, January 31, 2010

XSD 1.1: wild-cards in xs:all compositor, and assertions

I was reading through the latest XSD 1.1 language draft, and noticed that some details of the xs:all compositor instruction have changed between XSD 1.0 and 1.1.

XSD 1.1 defines <xs:all ..> compositor as follows:
  <all
    id = ID
    maxOccurs = 1 : 1
    minOccurs = (0 | 1) : 1
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, (element | any)*)
  </all>

Whereas XSD 1.0 defined the xs:all instruction as follows:
  <all
    id = ID
    maxOccurs = 1 : 1
    minOccurs = (0 | 1) : 1
    {any attributes with non-schema namespace . . .}>
    Content: (annotation?, element*)
  </all>

XSD 1.1 allows the xs:any wildcard to be part of xs:all (XSD 1.0 didn't allow this), which makes the xs:all instruction more useful, because with xs:any we can make the Schema type more open. We can also specify certain Schema constraints as assertions (as illustrated in the example below), restricting the degree of Schema openness (achieved by the xs:any wildcard) to the extent we want.

Here's an example I came up with, illustrating the use of an xs:any wildcard within the xs:all compositor, along with an assertion imposing some constraints on the ordering of elements in the instance document (which means that we are using assertions to restrict the degree of openness achieved by xs:any):

XML document [1]:
  <Person>
    <fname>Mukul</fname>
    <lname>Gandhi</lname>
    <sex>M</sex>
    <address>
      <street1>xyz</street1>
      <street2>street</street2>
      <street2>gurgaon</street2>
    </address>
  </Person>

XSD 1.1 Schema [2]:
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="Person">
      <xs:complexType>
        <xs:all>
          <xs:element name="fname" type="xs:string" />
          <xs:element name="lname" type="xs:string" />
          <xs:element name="sex" type="xs:string" />
          <xs:any processContents="lax" />
        </xs:all>
        <xs:assert test="(*[1]/local-name() = ('fname', 'lname')) and 
                         (*[2]/local-name() = ('fname', 'lname'))" />
      </xs:complexType>
    </xs:element>
  
  </xs:schema>

In the above Schema [2], if the assertion were not present, the xs:all compositor would mean that its contents can appear in any order (including the xs:any wildcard, which is newly allowed within xs:all in XSD 1.1).

The assertion in the Schema above [2] constrains the first two child elements of the "Person" element to be "fname" or "lname".
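The JDK's XPath 1.0 engine (javax.xml.xpath) can be used to try out an equivalent of this assertion, as a rough sketch. XPath 1.0 has neither sequence literals such as ('fname', 'lname') nor path steps that end in a function call, so the assertion is rewritten below with name(*[1]) and "or"; since the Person elements are in no namespace, name() and local-name() give the same result here. The class name PersonOrderCheck is hypothetical, and the sketch checks only the assertion, not the xs:all content model.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hypothetical check of the Person ordering assertion using JDK XPath 1.0. */
public class PersonOrderCheck {

    // XPath 1.0 rewrite of:
    // (*[1]/local-name() = ('fname', 'lname')) and (*[2]/local-name() = ('fname', 'lname'))
    static final String FIRST_TWO_ARE_NAMES =
            "(name(*[1]) = 'fname' or name(*[1]) = 'lname') and "
            + "(name(*[2]) = 'fname' or name(*[2]) = 'lname')";

    static boolean firstTwoAreNames(String xml) {
        try {
            Element person = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(FIRST_TWO_ARE_NAMES, person, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate assertion on: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String doc = "<Person><fname>Mukul</fname><lname>Gandhi</lname><sex>M</sex>"
                + "<address><street1>xyz</street1></address></Person>";
        System.out.println(firstTwoAreNames(doc));
    }
}
```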

Xerces-J's XSD 1.1 processor seems to implement this syntax fine.

I hope that this post is useful.