Sunday, July 22, 2012

XSD 1.1 assertions with complexType extensions

I thought it would be good to write this post and share it with XML Schema folks.

There was an interesting debate on the xmlschema-dev list recently, where we discussed the benefit of specifying an XSD 1.1 assertion within an XSD complexType that is derived from another complexType by extension. It was initially thought that an assertion within such a derived complexType would always produce a content model restriction effect (which is opposed to the actual intent of complexType extension). If this were the only effect of assertions in this case, then using them here would be counterintuitive. So, is there any benefit to specifying assertions within a complexType derived by extension (a facility that the XSD 1.1 language currently provides)?

After some thought, we found a benefit of using assertions in this scenario. The following example illustrates one of the benefits of assertions for this case:

XSD Schema document (XS1):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <xs:element name="c" type="xs:string"/>
                </xs:sequence>
                <xs:assert test="a = c">
                   <xs:annotation>
                      <xs:documentation>
                         The value of element "a" must be equal to value of element "c".
                      </xs:documentation>
                   </xs:annotation>
                </xs:assert>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>
    
    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>

XML instance document (XML1):
<X>
  <a>same</a>
  <b/>
  <c>same</c>
</X>

We want to validate the XML instance document XML1 above against the schema XS1 above. The content of element "X" is declared via an XSD complexType that is derived by extension from another complexType. The xs:assert element in schema XS1 has the following semantic intent: "to specify a relational constraint between two sibling elements" (elements "a" and "c" in this case). XML1 satisfies this constraint, since both elements have the value "same".
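To see the assertion at work, here is a hypothetical counterexample instance (it is not part of the original discussion, and the values are made up for illustration). It satisfies the content model of XS1, since "a", "b" and "c" appear in the declared order, but the values of "a" and "c" differ, so an XSD 1.1 validator should report the assertion "a = c" as failed:

```xml
<!-- Hypothetical instance: content model is satisfied, assertion is not -->
<X>
  <a>same</a>
  <b/>
  <c>different</c>
</X>
```

The content model is not narrowed here: every element that the extension allows is still allowed. Only the relationship between the values of two of them is constrained.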

Summarizing the design thoughts for the schema XS1 above:
1) An assertion within an XSD complexType extension doesn't always produce a restriction effect. As illustrated in the example above, the assertion specifies a co-occurrence constraint that is orthogonal to the traditional xs:extension constraint. This is intuitive and useful.
2) We should be aware, though, that an xs:assert element within a complexType extension can easily inject a content model restriction effect. If this is not wanted, an assertion shouldn't be used for such derived XSD complex types. Following is an XML Schema example illustrating this scenario:

XSD Schema document (XS2):
(applied to the XML document XML1 above, which it rejects because XML1 contains element "b")
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <xs:element name="c" type="xs:string"/>
                </xs:sequence>
                <xs:assert test="not(b)">
                   <xs:annotation>
                      <xs:documentation>
                         The element "b" is prohibited.
                      </xs:documentation>
                   </xs:annotation>
                </xs:assert>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>
    
    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string" minOccurs="0"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>

The schema XS2 above illustrates the following design intents:
1) The xs:assert element within the complexType of element "X" prohibits element "b" from occurring within the XML instance element "X". An assertion like this restricts the content model of the base type. If we don't want a content model restriction effect like this, then we shouldn't use an xs:assert with complexType extension.
2) The schema document XS2 can still be thought of as a useful design. The complexType definition of element "X" in schema XS2 is a mixture of extension and restriction derivation. It is an extension derivation, because some of the element particles of the base type are made available within the derived type via an xs:extension element (element "a" in this example). It is also a restriction derivation, because element "b" of the base type is prohibited from occurring in the derived type via an xs:assert element. The complexType definition of element "X" in this case is unlike any facility of the XSD 1.0 language, which allows a pure extension derivation or a pure restriction derivation but not both. Assertions can be useful in a schema design like this, when we want some of the effects of both complexType extension and restriction.
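As a small illustration of this mixed derivation, the following hypothetical instance (not part of the original discussion) should be accepted by XS2: it uses element "a" inherited from the base type, adds element "c" from the extension, and omits the optional element "b", so the assertion "not(b)" holds:

```xml
<!-- Hypothetical instance intended to be valid against XS2 -->
<X>
  <a>same</a>
  <c>same</c>
</X>
```

By contrast, XML1 contains an empty "b" element. The expression "not(b)" then evaluates to false, so XML1 is rejected by XS2 even though its content model alone would allow "b".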

Therefore, here's my final take on these design issues:
1) An assertion is very intuitive (and useful) for specifying co-occurrence constraints between sibling XML elements, and this holds equally within the XSD xs:extension element (this is unlike any XSD 1.0 facility). Other co-occurrence scenarios are also useful in this case, such as specifying co-constraints between an element and an attribute. XSD assertions are certainly recommended for this case.
2) An assertion is also very intuitive for specifying a mixture of complexType extension and restriction derivation (as illustrated in schema example XS2 above). XSD assertions are certainly also recommended for this case.
3) If an XSD schema author wishes to use the xs:extension element strictly for expressing pure content model extension, then using an assertion within xs:extension is counterintuitive (since it may inject a content model restriction effect) and is not recommended.
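The element and attribute co-constraint mentioned in point 1 could be sketched as follows. This is only an illustrative variant of XS1: the attribute "count", the repeating "c" element and the assertion itself are assumptions made for this example, and are not taken from the xmlschema-dev discussion.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <!-- "c" may repeat in this hypothetical variant -->
                   <xs:element name="c" type="xs:string" maxOccurs="unbounded"/>
                </xs:sequence>
                <!-- hypothetical attribute added by the extension -->
                <xs:attribute name="count" type="xs:integer" use="required"/>
                <!-- the attribute value must equal the number of "c" children -->
                <xs:assert test="@count = count(c)"/>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>

    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>
```

With this sketch, an instance such as `<X count="2"><a>x</a><b>y</b><c>1</c><c>2</c></X>` should satisfy the assertion, while the same instance with count="3" should not. As in XS1, nothing the extension allows is taken away; the assertion only relates the attribute to its sibling content.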

Therefore, if we have to do some new kinds of XML Schema modeling with XSD 1.1 assertions (e.g., with xs:extension derivations), assertions are certainly a nice XML Schema construct.

I hope that this post was useful.
