Sunday, July 11, 2010

XSD 1.1: XML schema design approaches contd... PART 2

I'm continuing the XML Schema design approaches series that I started in the previous blog post. Here's the second post in this series.

Here's a description of the use-case I'll be illustrating in this post, with both XML Schema 1.0 and 1.1 examples:

We need to write an XML Schema for the following XML content model:
  colors
    -> (violet | indigo | blue | green | yellow | orange | red)+

Here the words "colors", "violet", etc. represent XML elements; they have no attributes and are empty. The above content model means that the children of element "colors" can repeat and are unordered, and that at least one of them is required.

Therefore, the following XML document is a valid instance according to this content model:

[XML1]
  <colors>
     <violet/>
     <indigo/>
     <blue/>
     <green/>
     <yellow/>
     <orange/>
     <red/>
  </colors>

And, for example, the following XML document is valid as well, as per the content model described above (here the element "colors" has fewer children than in the previous example, and one of its children, "green", occurs more than once):

[XML2]
  <colors>
     <violet/>
     <indigo/>
     <blue/>
     <green/>
     <green/>
  </colors>
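
Since the content model above is essentially a regular expression over child element names, here is a minimal Python sketch that checks it that way. This is only an illustration of the content model, not XML Schema validation; the names COLOR_NAMES, CONTENT_MODEL and child_names_match are my own and hypothetical, and the sketch checks only the names of the children (not that they are empty or attribute-free).

```python
import re
import xml.etree.ElementTree as ET

# Element names allowed as children of <colors> (taken from the content model).
COLOR_NAMES = ("violet", "indigo", "blue", "green", "yellow", "orange", "red")

# (violet | indigo | ... | red)+ written as a regex over space-terminated names.
CONTENT_MODEL = re.compile("(?:(?:" + "|".join(COLOR_NAMES) + ") )+")


def child_names_match(xml_text):
    """Return True if the children of <colors> match the content model."""
    root = ET.fromstring(xml_text)
    if root.tag != "colors":
        return False
    # Build a token string such as "violet indigo blue " from the child names.
    names = "".join(child.tag + " " for child in root)
    return CONTENT_MODEL.fullmatch(names) is not None


# Usage: the second sample instance (repeated <green/>) matches,
# while an empty <colors/> does not, because of the "+" in the model.
print(child_names_match(
    "<colors><violet/><indigo/><blue/><green/><green/></colors>"))  # True
print(child_names_match("<colors/>"))  # False
```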

Here are two XML schema examples that express the above XML content model constraints:

[XML Schema 1] (written in XML Schema 1.0)
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
     <xs:element name="colors">
        <xs:complexType>
           <xs:choice maxOccurs="unbounded">
              <xs:element name="violet" type="EMPTY" />
              <xs:element name="indigo" type="EMPTY" />
              <xs:element name="blue" type="EMPTY" />
              <xs:element name="green" type="EMPTY" />
              <xs:element name="yellow" type="EMPTY" />
              <xs:element name="orange" type="EMPTY" />
              <xs:element name="red" type="EMPTY" />     
           </xs:choice>
        </xs:complexType>
     </xs:element>
   
     <xs:complexType name="EMPTY"> 
        <xs:complexContent> 
          <xs:restriction base="xs:anyType" /> 
        </xs:complexContent> 
     </xs:complexType>

  </xs:schema>

[XML Schema 2] (written in XML Schema 1.1 -- the 1.1-specific constructs are the two xs:assert elements)
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
     <xs:element name="colors">
        <xs:complexType>
           <xs:sequence>
             <xs:any maxOccurs="unbounded" processContents="lax" />
           </xs:sequence>
           <xs:assert test="every $x in */name() satisfies ($x = 
                              ('violet','indigo','blue','green','yellow','orange','red'))" />
           <xs:assert test="every $x in * satisfies not($x/node())" />
        </xs:complexType>
     </xs:element>

  </xs:schema>
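
To make the intent of the two assertions in "XML Schema 2" easier to follow, here is a rough Python sketch that mirrors them on a parsed document. It is an approximation under stated assumptions, not what an XSD 1.1 validator does internally: the names ALLOWED and assertions_hold are hypothetical, ElementTree drops comments and processing instructions by default (so the check for child nodes only sees elements and text), and text directly inside "colors" is not checked.

```python
import xml.etree.ElementTree as ET

# The name list used in the first xs:assert.
ALLOWED = {"violet", "indigo", "blue", "green", "yellow", "orange", "red"}


def assertions_hold(xml_text):
    """Approximate the wildcard plus the two assertions on <colors>."""
    colors = ET.fromstring(xml_text)
    children = list(colors)
    # xs:any has the default minOccurs="1": at least one child element.
    has_child = len(children) >= 1
    # First assertion: every child name is one of the allowed names.
    names_ok = all(child.tag in ALLOWED for child in children)
    # Second assertion: no child has child elements or text content.
    empty_ok = all(len(child) == 0 and child.text is None
                   for child in children)
    return has_child and names_ok and empty_ok


# Usage: a non-empty child violates the second assertion.
print(assertions_hold("<colors><violet/><green/></colors>"))   # True
print(assertions_hold("<colors><violet>x</violet></colors>"))  # False
```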

Here's a quick analysis, from my point of view, of the differences between the above schema approaches, and of whether either approach is better than the other:
1) "XML Schema 1" is written in the familiar 1.0 style, so people who want to stick with 1.0 can still adopt this technique. The first schema is a little more verbose than the second one, which I see as one of the advantages of the second one.

2) If you are comfortable writing XPath 2.0 expressions, there are virtually unlimited possibilities for expressing schema validation constraints with XSD 1.1 assertions, which puts a lot of power in the hands of the schema author!

3) Personally speaking, I find the second way of writing the XML schema ("XML Schema 2") a really cool NEW way to express these validation constraints. I'm not suggesting that the first way is not good! That technique has great value in its own right and has stood the test of time. But I find the second technique a more natural way for the schema author to express the logic of the use-case in question.

4) One of the possibilities I now foresee with XML Schema 1.1 is that the schema author could impose quite a few constraints on the xs:any wild-card via assertions (which is particularly useful with the processContents="lax" mode of the wild-card). A point worth observing is that with the processContents="strict" mode of the xs:any wild-card, assertions are not really useful, because the schema validator would strictly validate the XML element against an element declaration, which the schema author must provide to satisfy the "strict" mode (and assertions here would actually interfere with the available element declarations, which in my opinion is not a good design). With the processContents="skip" mode of the xs:any wild-card, in my understanding, assertions would always fail (and the XML instance would become invalid), because the concerned XML elements would be discarded by the XML schema validator, and consequently these elements would not be part of the XPath data-model tree on which assertions operate.

And needless to say, Xerces-J handles all of the above examples fine!

I hope that this post is useful.
