Saturday, November 27, 2010

XML Schema 1.1: complexType restriction rules

I've been keen to write about the new rules in the XML Schema 1.1 spec for type derivation between XML Schema complexType definitions, and about the current compliance of Xerces-J (its XML Schema 1.1 engine) in this area of the XML Schema language. In this blog post I cover complexType restriction derivations; I'll try to write about complexType extensions sometime later. I hope this post finds an audience interested in the topic. Anyone is invited to comment on this post: that will help me learn more about type derivation between XML schema complex types (I'm interested in both restriction and extension derivations), and it can also give the Xerces team useful feedback for improving Xerces in desired and compliant ways. Below are my findings from the XML Schema 1.1 spec on this topic, along with Xerces's compliance status (I acknowledge that my understanding of these areas of the XML Schema language may not yet be complete :).

In the XML Schema 1.0 language, complex type restriction rules are defined by the schema particle restriction rules specified here: http://www.w3.org/TR/xmlschema-1/#coss-particle. That section contains a 5x5 table which describes what constitutes a valid restriction of XML schema particles (and which restrictions are forbidden).

In XML Schema 1.1 all of these complexType derivation rules are replaced by sections 3.4.6.3 Derivation Valid (Restriction, Complex) and 3.4.6.4 Content Type Restricts (Complex Content). The mapping table for particle restrictions (the 5x5 table) is removed, and the spec now defines a generic algorithm based on a subsumption relationship (a kind of containment relationship) between default bindings. A default binding is an abstract notion covering element and attribute declarations, along with wild-cards whose processContents value is "strict", "lax" or "skip". The XML Schema 1.1 complexType subsumption rules are simpler and easier to remember than the corresponding type derivation rules in the XML Schema 1.0 spec. My understanding so far is that the improved subsumption rules make XML Schema 1.1 complexType restriction derivations largely compatible with the corresponding rules in XML Schema 1.0, but the rules are now specified with better wording.

Below are the various complexType restriction cases I've studied so far, with a description of their characteristics. These have corresponding implementations in Xerces (the upcoming Xerces-J 2.11.0 release will have these features), and I'm still trying to discover more of the rules in this area of the XML Schema language.

xs:sequence, xs:choice and xs:all are the possible compositors in schema complexTypes (a compositor determines how schema particles are composed in a complexType definition).

A) SEQUENCE TO SEQUENCE RESTRICTIONS
a.1 An xs:element is derived from an xs:any wild-card (both particles are part of an XML Schema sequence compositor). In this scenario, when determining valid particle derivations, the cardinality of the particles takes precedence over the presence of a concrete element in the derived type.

For example, <xs:element name="x" type="xs:string" minOccurs="0"/> is not a valid restriction of <xs:any processContents="lax" />, since the occurrence range of element "x" (minOccurs="0" makes particle "x" optional) is wider than that of the wild-card particle (which is mandatory, as minOccurs defaults to 1).
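To make a.1 concrete, here is a minimal sketch of a complete schema for this case. The type names BASE_A1 and DERIVED_A1 are made up for illustration, and the expected outcome in the comments follows from the rule described above rather than from a recorded validator run.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Base type: exactly one element of any name
        (minOccurs and maxOccurs of xs:any default to 1) -->
   <xs:complexType name="BASE_A1">
      <xs:sequence>
         <xs:any processContents="lax" />
      </xs:sequence>
   </xs:complexType>

   <!-- Expected to be rejected: "x" is optional here, so empty content
        would be allowed, which the base type does not allow.
        Removing minOccurs="0" should make this a valid restriction. -->
   <xs:complexType name="DERIVED_A1">
      <xs:complexContent>
         <xs:restriction base="BASE_A1">
            <xs:sequence>
               <xs:element name="x" type="xs:string" minOccurs="0" />
            </xs:sequence>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```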

a.2 There must be a one-to-one (i.e. X-to-X, where X is a positive number) mapping of particles from the schema 'base' type to the 'derived' type. That is, a derived type cannot have fewer particles than the base type (optional base particles, i.e. those with minOccurs="0", aside), and each particle in the derived type must validly derive from the corresponding particle in the base type (i.e. it is subsumed as per the rules specified in the XML Schema 1.1 spec).
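The mapping described in a.2 could look like the following sketch, in which each particle of the derived type narrows the occurrence range of its counterpart in the base type. The names BASE_A2 and DERIVED_A2 and the chosen occurrence values are my own assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:complexType name="BASE_A2">
      <xs:sequence>
         <xs:element name="a" type="xs:string" maxOccurs="unbounded" />
         <xs:element name="b" type="xs:string" minOccurs="0" />
      </xs:sequence>
   </xs:complexType>

   <!-- Expected to be accepted: "a" maps to "a" (1..3 lies within 1..unbounded)
        and "b" maps to "b" (same occurrence range and type) -->
   <xs:complexType name="DERIVED_A2">
      <xs:complexContent>
         <xs:restriction base="BASE_A2">
            <xs:sequence>
               <xs:element name="a" type="xs:string" maxOccurs="3" />
               <xs:element name="b" type="xs:string" minOccurs="0" />
            </xs:sequence>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```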

B) ALL TO SEQUENCE RESTRICTIONS
b.1 This is a valid restriction of the schema compositor (and of the particles in it), i.e. an ordered restriction of an unordered base.

For example, sequence(b, a) and sequence(a, b) are both valid restrictions of all(a, b) (the order of particles in the derived type doesn't matter).

b.2 The XML schema validator identifies corresponding particles by their QNames, and such particles must obey the rules of restriction by cardinality (i.e. where the QNames of corresponding particles in the base and derived types are the same, an optional particle is not a valid restriction of a mandatory particle).
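Here is a small sketch of the "all to sequence" case from b.1 and b.2. The type names BASE_B and DERIVED_B are hypothetical, and the comments state what the rules above lead me to expect.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Base type: "a" and "b" must both appear, in any order -->
   <xs:complexType name="BASE_B">
      <xs:all>
         <xs:element name="a" type="xs:string" />
         <xs:element name="b" type="xs:string" />
      </xs:all>
   </xs:complexType>

   <!-- Expected to be accepted (b.1): the derived type fixes the order to b, a.
        Adding minOccurs="0" to either element here should be rejected (b.2),
        because both particles are mandatory in the base type. -->
   <xs:complexType name="DERIVED_B">
      <xs:complexContent>
         <xs:restriction base="BASE_B">
            <xs:sequence>
               <xs:element name="b" type="xs:string" />
               <xs:element name="a" type="xs:string" />
            </xs:sequence>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```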

C) ALL TO ALL RESTRICTIONS
c.1 This is an unordered to unordered restriction. A concrete element particle is a valid derivation of a wild-card particle.

c.2 The cardinality of identical particles (having the same QName) in the derived type must be the same as or narrower than that in the base type (this is what makes the derived particle validly derive from the corresponding base particle). When determining valid particle subsumptions, particle cardinalities take precedence over the generic/concrete relationship between particles.

c.3 The number of leaf particles (which are essentially xs:element and xs:any wild-cards) in the derived and base types must be equal.
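The following sketch puts c.1 to c.3 together. It needs an XML Schema 1.1 processor, since a wild-card inside xs:all is not permitted in XML Schema 1.0. The type names BASE_C and DERIVED_C are made up, and the expected result is my reading of the rules above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Base type: "a" plus one element of any name, in any order -->
   <xs:complexType name="BASE_C">
      <xs:all>
         <xs:element name="a" type="xs:string" />
         <xs:any processContents="lax" />
      </xs:all>
   </xs:complexType>

   <!-- Expected to be accepted: "a" maps to "a", the concrete element "b"
        maps to the wild-card, and both types have two leaf particles with
        the same cardinalities -->
   <xs:complexType name="DERIVED_C">
      <xs:complexContent>
         <xs:restriction base="BASE_C">
            <xs:all>
               <xs:element name="a" type="xs:string" />
               <xs:element name="b" type="xs:string" />
            </xs:all>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```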

D) SEQUENCE TO ALL RESTRICTIONS
This is not a valid schema compositor restriction (i.e from ordered to unordered).
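As an illustration, the sketch below shows a derivation of this kind. The type names BASE_D and DERIVED_D are hypothetical; the reasoning in the comment is that the derived type would accept an ordering that the base type forbids.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Base type: "a" followed by "b", in this order only -->
   <xs:complexType name="BASE_D">
      <xs:sequence>
         <xs:element name="a" type="xs:string" />
         <xs:element name="b" type="xs:string" />
      </xs:sequence>
   </xs:complexType>

   <!-- Expected to be rejected: xs:all would also allow "b" before "a",
        which is not valid content for the base type -->
   <xs:complexType name="DERIVED_D">
      <xs:complexContent>
         <xs:restriction base="BASE_D">
            <xs:all>
               <xs:element name="a" type="xs:string" />
               <xs:element name="b" type="xs:string" />
            </xs:all>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```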

E) CHOICE TO SEQUENCE RESTRICTIONS
e.1 Here are a few examples explaining some of the rules for this category.
  <xs:sequence>
     <xs:element name="c" type="xs:string" />
  </xs:sequence>

is a valid restriction of
  <xs:choice>  
     <xs:any processContents="lax" />
     <xs:element name="b" type="xs:string" />
  </xs:choice>

(the element particle "c" is subsumable by the wild-card)

e.2
  <xs:sequence>
     <xs:any processContents="lax" />
  </xs:sequence>

is not a valid restriction of
  <xs:choice>         
     <xs:element name="a" type="xs:string" />
     <xs:element name="b" type="xs:string" />
  </xs:choice>

This is so because a wild-card is not validly subsumed by an element particle (i.e. deriving a generic particle from a concrete element is not a valid restriction; in fact it looks more like a "type extension" concept).

F) SEQUENCE TO CHOICE RESTRICTIONS
Here's an example I can think of that corresponds to a use case of this kind.
   
   <xs:restriction base="TYPE_BASE">
      <xs:choice>
         <xs:group ref="myGroup" />
      </xs:choice>
   </xs:restriction>
   
is a valid restriction of
   
   <xs:complexType name="TYPE_BASE">
      <xs:group ref="myGroup" />
   </xs:complexType>
   
   <xs:group name="myGroup">
      <xs:sequence>
         <xs:element name="a" type="xs:string" />
         <xs:element name="b" type="xs:string" />
      </xs:sequence>
   </xs:group>

But this is not a useful schema type restriction, since the choice in the derived type (i.e. the schema particle produced from xs:choice) offers only one option, which is the same as the contents of the sequence in the base type.

Other than the above example, I cannot envision a useful practical scenario for a "sequence to choice" restriction. I would imagine that schema authors need not bother much about "sequence to choice" restrictions, as this doesn't look like a good and useful schema design (but I don't deny that people may find valid uses for it as well :).

G) CHOICE TO CHOICE RESTRICTIONS
Here are a few examples I can think of that fit this use case (I've found these to work fine with Xerces as well):

g.1 choice(a, c) is not a valid restriction of choice(a, b), because element "c" in the derived type doesn't have a corresponding element particle in the base type.
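Written out as a full schema, g.1 could look like the sketch below. The type names BASE_G1 and DERIVED_G1 are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:complexType name="BASE_G1">
      <xs:choice>
         <xs:element name="a" type="xs:string" />
         <xs:element name="b" type="xs:string" />
      </xs:choice>
   </xs:complexType>

   <!-- Expected to be rejected: "c" has no corresponding particle in the
        base type, so content consisting of "c" is not valid for the base -->
   <xs:complexType name="DERIVED_G1">
      <xs:complexContent>
         <xs:restriction base="BASE_G1">
            <xs:choice>
               <xs:element name="a" type="xs:string" />
               <xs:element name="c" type="xs:string" />
            </xs:choice>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```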

g.2
- choice(a, b) is a valid restriction of choice(a, wild-card processContents="lax"). But if the wild-card can resolve to an element declaration for "b" that doesn't match the element declaration "b" in the derived type, then this is NOT a valid restriction.
- choice(a, b) is a valid restriction of choice(a, wild-card processContents="strict") if the wild-card can resolve to an element declaration for "b"; otherwise it is not.
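The "lax" bullet of g.2 can be sketched as below. The global declaration of "b" as xs:integer and the type names BASE_G2 and DERIVED_G2 are my own assumptions, chosen so that the wild-card resolves to a declaration that conflicts with the local "b". The outcome in the comments is what the rule above predicts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Global declaration that the lax wild-card can resolve to for "b" -->
   <xs:element name="b" type="xs:integer" />

   <xs:complexType name="BASE_G2">
      <xs:choice>
         <xs:element name="a" type="xs:string" />
         <xs:any processContents="lax" />
      </xs:choice>
   </xs:complexType>

   <!-- Expected to be rejected: the local "b" is an xs:string, which does not
        match the global "b" (xs:integer) that the wild-card resolves to.
        Removing the global declaration, or giving the local "b" the type
        xs:integer, should make this a valid restriction. -->
   <xs:complexType name="DERIVED_G2">
      <xs:complexContent>
         <xs:restriction base="BASE_G2">
            <xs:choice>
               <xs:element name="a" type="xs:string" />
               <xs:element name="b" type="xs:string" />
            </xs:choice>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```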

g.3 choice(group name="myGroup", a) is a valid restriction of choice(group name="myGroup", xs:any processContents="lax"). Here the model group instance is treated as a particle. But if the wild-card resolves to an element declaration for "a" that doesn't match the element declaration "a" in the derived type, then this is NOT a valid restriction.

g.4 choice(group name="myGroup", a) is not a valid restriction of choice(group name="myGroup", <xs:any/>), where processContents defaults to "strict". It becomes a valid restriction if the wild-card <xs:any/> can find a declaration of element "a" that validly subsumes the element "a" in the derived type.

These are all the cases I can think of at the moment (enumerated A to G) that might occur for restrictions between XML Schema 1.1 complexTypes. I believe there are a few more complexType restriction cases, which I'll try to post on this blog as I discover them.

I hope that this post was useful.