Sunday, May 19, 2013

Thanks to OxygenXML folks

On behalf of the Xerces-J XML Schema team, I would like to thank the folks on the Oxygen XML team for highlighting many important bugs in the Xerces-J XSD 1.1 validator. We've been able to fix many of the reported bugs, and I feel this has made the Xerces-J XSD 1.1 implementation considerably better.

Here's the list of issues reported by the Oxygen folks over roughly the past one to two years, which are now either resolved or closed:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20XERCESJ%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20reporter%20in%20%28radu_coravu%2C%20%22octavian.nadolu%22%29

In the above report, you can ignore bugs dating as far back as 2006; those should already have been resolved in an earlier Xerces-J release.

Besides the bugs reported by the Oxygen XML folks, we also received bug reports from other members of the XML community. Thanks to them as well.

I'm not sure when we're going to release the next version of Xerces-J, which should contain many implementation improvements. Taking a pessimistic view, I expect a new Xerces-J release sometime later this year, though it might slip into next year.

Thursday, November 15, 2012

new thoughts about XSD 1.1 assertions

I've been thinking about these XSD topics for a while, and thought of summarizing my findings here.

Let me start this post with the following XML instance document (which will be the focus of all the analysis in this post):

XML-1
<list attr="1 2 3 4 5 6">
    <item>a1</item>
    <item>a2</item>
    <item>a3</item>
    <item>a4</item>
    <item>a5</item>
    <item>a6</item>
</list>

We need to specify an XSD schema for the XML document above (XML-1), providing the following essential validation constraints:
1) The value of attribute "attr" is a sequence of single-digit numbers. A number here can be modeled with the XSD type xs:integer, or as a restriction of xs:string (as we'll see below).
2) Each string value of an element "item" has the form a[0-9], i.e., the character "a" followed by a single numeric character. For now we'll simply specify this with the XSD type xs:string. We want each numeric character after "a" to be pair-wise equal to the value at the corresponding index within the attribute value "attr". The sample XML instance document (XML-1) is valid as per this requirement. Therefore, if we change any single numeric value in the sample (either a number within the attribute "attr", or the numeric suffix after "a" in an "item" element) without making the matching change on the other side, the XML instance document must be reported as 'invalid'.

Now, let me come to the XSD solutions for these XML validation requirements.

First of all, we need XSD 1.1 assertions to specify these validation constraints (since this is clearly a co-occurrence constraint). Following is the first schema design that came to my mind:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="list">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
           <xs:attribute name="attr">
              <xs:simpleType>
                 <xs:list itemType="xs:integer"/>
              </xs:simpleType>
           </xs:attribute>
           <xs:assert test="deep-equal(item/substring-after(., 'a'), data(@attr))"/>
        </xs:complexType>
    </xs:element>
   
</xs:schema>

The above schema is almost correct, except for a small problem with the way the assertion is specified. As per the XPath 2.0 spec, when the "deep-equal" function compares two sequences, the atomic values at the same index in the two sequences must be equal as per the equality rules of their XSD atomic types; values of types that cannot be compared with each other (such as xs:string and xs:integer) are treated as not equal. Within the assertion in the above schema, the first argument of "deep-equal" has the type annotation xs:string* and the second argument has the type annotation xs:integer* (note that the XPath 2.0 "data" function returns the typed value of a node). Therefore "deep-equal", as used here, returns 'false'.

Assuming that we don't change the schema specification of the "item" elements and the attribute "attr", the following assertion would correctly realize the above requirement:

<xs:assert test="deep-equal(item/substring-after(., 'a'), for $att in data(@attr) return string($att))"/>

(in this case, we've converted the second argument of the "deep-equal" function to have the type annotation xs:string*, and haven't modified the type annotation of the first argument)

An alternative correct modification to the assertion would be:

<xs:assert test="deep-equal(item/number(substring-after(., 'a')), data(@attr))"/>

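(in this case, we convert the first argument of the "deep-equal" function to numbers and don't modify the type annotation of the second argument. Strictly speaking, the "number" function returns xs:double rather than xs:integer, so the first argument has the type annotation xs:double*. This still works, because "deep-equal" compares numeric values of different types after numeric type promotion. Writing xs:integer(substring-after(., 'a')) instead of number(...) would give the first argument the type annotation xs:integer*.)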
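As a rough analogy in Java (not XPath itself), Java's List.equals behaves like "deep-equal" here. It compares items pairwise, and a String never equals an Integer. The sketch below mimics item/substring-after(., 'a') and data(@attr) for XML-1, and then applies the first fix (converting the typed values back to strings). The class and method names are hypothetical.

The analogy has one limit. XPath promotes numeric types, so an xs:double equals an xs:integer of the same value, but Java's equals does not do this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Java analogy for the deep-equal type mismatch; not an XPath implementation.
class DeepEqualAnalogy {

    // Mimics item/substring-after(., 'a'): text after the first "a", or "" if there is none
    static List<String> itemSuffixes(List<String> items) {
        List<String> suffixes = new ArrayList<>();
        for (String item : items) {
            int idx = item.indexOf('a');
            suffixes.add(idx < 0 ? "" : item.substring(idx + 1));
        }
        return suffixes;
    }

    // Mimics data(@attr) for an xs:list of xs:integer: whitespace-separated, typed tokens
    static List<Integer> typedAttr(String attr) {
        List<Integer> values = new ArrayList<>();
        for (String token : attr.trim().split("\\s+")) {
            values.add(Integer.valueOf(token));
        }
        return values;
    }

    // Mimics "for $att in data(@attr) return string($att)"
    static List<String> asStrings(List<Integer> values) {
        List<String> strings = new ArrayList<>();
        for (Integer v : values) {
            strings.add(v.toString());
        }
        return strings;
    }

    public static void main(String[] args) {
        List<String> suffixes = itemSuffixes(Arrays.asList("a1", "a2", "a3", "a4", "a5", "a6"));
        List<Integer> typed = typedAttr("1 2 3 4 5 6");
        System.out.println(suffixes.equals(typed));            // String items vs Integer items
        System.out.println(suffixes.equals(asStrings(typed))); // both sides are strings now
    }
}
```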

I now propose a slightly different way to specify the schema for above requirements. Following is the modified schema document:

XS-2
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
    <xs:element name="list">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
           <xs:attribute name="attr">
              <xs:simpleType>
                 <xs:list itemType="NumericChar"/>
              </xs:simpleType>
           </xs:attribute>
           <xs:assert test="deep-equal(item/substring-after(., 'a'), data(@attr))"/>
        </xs:complexType>
    </xs:element>
  
    <xs:simpleType name="NumericChar">
       <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]"/>
       </xs:restriction>
    </xs:simpleType>
  
</xs:schema>

This schema document is correct in all respects, and successfully validates the XML document specified above (i.e., XML-1). In this schema we've made the following design decisions:
1) We've specified the itemType of the list (the value of attribute "attr" is an instance of this list) as "NumericChar" (a user-defined simpleType that uses the xs:pattern facet to constrain list items).
2) Both arguments of the "deep-equal" function, as now written in schema XS-2, are string sequences. The first is xs:string*, and the second is NumericChar*, where NumericChar is derived from xs:string. The values are therefore comparable, and the assertion works fine.

I'll now summarize the pros and cons of schema XS-2 with respect to the other correct solutions specified earlier:
1) If the simpleType definition of attribute "attr" is not used in any other schema context (i.e., there is no need to reuse this type), then the solution in schema XS-2 is acceptable.
2) Suppose a schema author thinks the list items of attribute "attr" need to be numeric. This could be due to the semantic intent of the problem, or because the list's simpleType definition is reused elsewhere and that other place needs a list of integers. In that case, schema solutions like the ones shown earlier would be needed.

Here's one more caution with respect to the schema solutions proposed above:
The above schemas would allow a value like "pqra5" within an "item" element. The "substring-after" function used in the assertions still yields "5" for such a value, so the assertion succeeds. Therefore, the "item" element may be more correctly specified like this:

<xs:element name="item" maxOccurs="unbounded">
    <xs:simpleType>
         <xs:restriction base="xs:string">
              <xs:pattern value="a[0-9]"/>
         </xs:restriction>
    </xs:simpleType>
</xs:element>
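The gap between the assertion's "substring-after" check and the pattern facet can be shown with a small Java sketch. It assumes that Java's Pattern.matches() behaves like XSD patterns for this simple expression: XSD patterns are implicitly anchored to the whole value, and matches() also requires a whole-string match. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Contrasts substring-after(., 'a') with the xs:pattern facet value "a[0-9]".
class ItemPatternCheck {

    // Whole-value match, like an XSD pattern facet
    private static final Pattern ITEM = Pattern.compile("a[0-9]");

    // Mimics substring-after(., 'a'): text after the first "a", or "" if there is none
    static String suffixAfterA(String value) {
        int idx = value.indexOf('a');
        return idx < 0 ? "" : value.substring(idx + 1);
    }

    static boolean matchesItemPattern(String value) {
        return ITEM.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"a5", "pqra5"}) {
            System.out.println(value + ": suffix=" + suffixAfterA(value)
                    + ", pattern ok=" + matchesItemPattern(value));
        }
    }
}
```

Both values yield the suffix "5", so the assertion alone cannot tell them apart. Only the pattern rejects "pqra5".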

It is also evident that the XPath 2.0 "data" function lets us do some useful things with simpleType lists. We can get the list's typed value and specify checks on individual list items (possibly different checks on different items), or access list items by an index (or a range of indices). For example, data(@attr)[2] or data(@attr)[position() gt 3]. This was not possible with XSD 1.0.
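For readers more at home in Java, the two list accesses above roughly correspond to the following sketch. It treats the list value as whitespace-separated tokens and converts XPath's 1-based positions to Java's 0-based indexes. The helper names are hypothetical, and null stands in for XPath's empty sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Mimics data(@attr)[2] and data(@attr)[position() gt 3] on an xs:list value.
class ListItemAccess {

    // Splits a list value on whitespace, as xs:list does after whitespace collapsing
    static List<String> listItems(String value) {
        List<String> items = new ArrayList<>();
        String trimmed = value.trim();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                items.add(token);
            }
        }
        return items;
    }

    // data(@attr)[position]: null (the empty sequence) when the position is out of range
    static String itemAt(List<String> items, int position) {
        return (position >= 1 && position <= items.size()) ? items.get(position - 1) : null;
    }

    // data(@attr)[position() gt n]
    static List<String> itemsAfter(List<String> items, int n) {
        int from = Math.min(Math.max(n, 0), items.size());
        return new ArrayList<>(items.subList(from, items.size()));
    }
}
```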

I hope this post was useful, and I hope to come back with another post soon.

Sunday, July 22, 2012

XSD 1.1 assertions with complexType extensions

I thought it would be good to write this post and share it with XML Schema folks.

There was an interesting debate on the xmlschema-dev list recently about the benefit of specifying an XSD 1.1 assertion within an XSD complexType that is derived from another complexType by extension. It was initially thought that an assertion within such a derived complexType would always produce a content model restriction effect, which is the opposite of the actual intent of complexType extension. If this were the only effect of assertions here, then using them would be counter-intuitive. So, is there any benefit in specifying assertions within an XSD complexType derived by extension (which the XSD 1.1 language currently allows)?

After some thought, we found a benefit of using assertions in this scenario. The following example illustrates one of them:

XSD Schema document (XS1):
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <xs:element name="c" type="xs:string"/>
                </xs:sequence>
                <xs:assert test="a = c">
                   <xs:annotation>
                      <xs:documentation>
                         The value of element "a" must be equal to value of element "c".
                      </xs:documentation>
                   </xs:annotation>
                </xs:assert>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>
    
    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>

XML instance document (XML1):
<X>
  <a>same</a>
  <b/>
  <c>same</c>
</X>

We want to validate the XML instance document XML1 above with the schema XS1 shown above. The XML content within element "X" is declared via an XSD complexType that is derived by extension from another complexType. The xs:assert element in schema XS1 has the following semantic intent: "to specify a relational constraint between two sibling elements" (elements "a" and "c" in this case).
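The JDK's built-in JAXP validator supports XSD 1.0 only, so it cannot run XS1 as written. A rough Java sketch can still separate the two concerns, under stated assumptions. The first part checks the extended content model with XS1 minus its xs:assert. The second part evaluates the assertion test separately with javax.xml.xpath (XPath 1.0), using the element "X" as the context node. For a simple string comparison like "a = c", XPath 1.0 and 2.0 agree. This is an approximation, not how an XSD 1.1 processor such as Xerces or Saxon works internally. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Approximates XS1: XSD 1.0 validation of the content model, plus a separate XPath check.
class ExtensionAssertSketch {

    // XS1 without its xs:assert (the JDK validator would reject xs:assert as XSD 1.0)
    static final String XS1_WITHOUT_ASSERT =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='X'><xs:complexType><xs:complexContent>"
      + "<xs:extension base='T1'><xs:sequence>"
      + "<xs:element name='c' type='xs:string'/>"
      + "</xs:sequence></xs:extension>"
      + "</xs:complexContent></xs:complexType></xs:element>"
      + "<xs:complexType name='T1'><xs:sequence>"
      + "<xs:element name='a' type='xs:string'/>"
      + "<xs:element name='b' type='xs:string'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:schema>";

    // Content model: a, b (inherited from T1) followed by c (added by the extension)
    static boolean contentModelValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.newSchema(new StreamSource(new StringReader(XS1_WITHOUT_ASSERT)))
                   .newValidator()
                   .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {  // SAXException for invalid input, or IOException
            return false;
        }
    }

    // Evaluates an assertion test with the root element as context node, as xs:assert does
    static boolean assertionHolds(String xml, String test) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(test, doc.getDocumentElement(), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml1 = "<X><a>same</a><b/><c>same</c></X>";
        String other = "<X><a>same</a><b/><c>different</c></X>";
        System.out.println(contentModelValid(xml1) + " " + assertionHolds(xml1, "a = c"));
        System.out.println(contentModelValid(other) + " " + assertionHolds(other, "a = c"));
    }
}
```

Both documents satisfy the extended content model. Only the assertion distinguishes them, which is the orthogonal co-occurrence constraint described above.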

Summarizing the design thoughts for the schema XS1 specified above:
1) An assertion within an XSD complexType extension derivation doesn't always produce a restriction effect. As illustrated in the example above, the assertion specifies a co-occurrence constraint that is orthogonal to the traditional xs:extension constraint. This is intuitive and useful.
2) We should still be aware that an xs:assert element within a complexType extension can easily inject a content model restriction effect. If this is not wanted, an assertion shouldn't be used in such derived XSD complex types. Following is an XML Schema example illustrating this scenario:

XSD Schema document (XS2):
(to be used for validating the XML document XML1 above, which it reports as invalid since XML1 contains an element "b")
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:complexContent>
             <xs:extension base="T1">
                <xs:sequence>
                   <xs:element name="c" type="xs:string"/>
                </xs:sequence>
                <xs:assert test="not(b)">
                   <xs:annotation>
                      <xs:documentation>
                         The element "b" is prohibited.
                      </xs:documentation>
                   </xs:annotation>
                </xs:assert>
             </xs:extension>
          </xs:complexContent>
       </xs:complexType>
    </xs:element>
    
    <xs:complexType name="T1">
       <xs:sequence>
          <xs:element name="a" type="xs:string"/>
          <xs:element name="b" type="xs:string" minOccurs="0"/>
       </xs:sequence>
    </xs:complexType>

</xs:schema>

The schema XS2 above illustrates the following design intents:
1) The xs:assert element within the complexType of element "X" prohibits element "b" from occurring within the XML instance element "X". An assertion like this restricts the content model of the base type. If we don't want a content model restriction effect like this, then we shouldn't use an xs:assert with complexType extension.
2) The schema document XS2 can still be a useful design. The complexType definition of element "X" in schema XS2 is like a mixture of extension and restriction derivation. It is an extension derivation, because the derived type inherits the element particles of the base type (element "a" here remains available) and adds element "c" via an xs:extension element. It is also a restriction derivation, because the element "b" of the base type is prohibited from occurring in the derived type via an xs:assert element. XSD 1.0 offers no facility like this: it allows either a pure extension derivation or a pure restriction derivation, but not both. Assertions can be useful in a schema design like this, when we want both complexType extension and restriction effects.
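The same JDK-only approximation applies to XS2. The sketch assumes XS2 minus its xs:assert for the XSD 1.0 content-model check, with "not(b)" evaluated separately via javax.xml.xpath. It shows the mixed effect. The base type makes "b" optional, so both documents fit the extended content model. The assertion then rejects the one that contains "b", including XML1. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Approximates XS2: XSD 1.0 content model check plus a separate "not(b)" check.
class ExtensionRestrictionSketch {

    // XS2 without its xs:assert; "b" is optional in the base type T1
    static final String XS2_WITHOUT_ASSERT =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='X'><xs:complexType><xs:complexContent>"
      + "<xs:extension base='T1'><xs:sequence>"
      + "<xs:element name='c' type='xs:string'/>"
      + "</xs:sequence></xs:extension>"
      + "</xs:complexContent></xs:complexType></xs:element>"
      + "<xs:complexType name='T1'><xs:sequence>"
      + "<xs:element name='a' type='xs:string'/>"
      + "<xs:element name='b' type='xs:string' minOccurs='0'/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:schema>";

    static boolean contentModelValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            factory.newSchema(new StreamSource(new StringReader(XS2_WITHOUT_ASSERT)))
                   .newValidator()
                   .validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {  // SAXException for invalid input, or IOException
            return false;
        }
    }

    // Evaluates an assertion test with the root element "X" as the context node
    static boolean assertionHolds(String xml, String test) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate(test, doc.getDocumentElement(), XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String withB = "<X><a>same</a><b/><c>same</c></X>";      // XML1
        String withoutB = "<X><a>same</a><c>same</c></X>";
        for (String xml : new String[] {withB, withoutB}) {
            System.out.println(xml + ": content model ok=" + contentModelValid(xml)
                    + ", not(b)=" + assertionHolds(xml, "not(b)"));
        }
    }
}
```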

Therefore, here's my final take on these design issues:
1) An assertion is very intuitive (and useful) for specifying co-occurrence constraints between sibling XML elements, and this holds with the XSD xs:extension element as well (this is unlike any XSD 1.0 facility). Other content model co-occurrence scenarios are also useful here, like specifying co-constraints between an element and an attribute. XSD assertions are certainly recommended for this case.
2) An assertion is also intuitive for specifying a mixture of complexType extension and restriction derivation (as illustrated in the schema example XS2 above). XSD assertions are certainly recommended for this case as well.
3) If an XSD schema author wants to use the xs:extension element strictly for expressing pure content model extension, then using an assertion within xs:extension is counter-intuitive (since it may inject a content model restriction effect) and is not recommended.

Therefore, when we need some new kinds of XML Schema modeling (for example, with xs:extension derivations), XSD 1.1 assertions are certainly a nice XML Schema construct.

I hope that this post was useful.

Saturday, April 14, 2012

XSD 1.1 is now a W3C standard

I've been looking forward (as I hope many others have) to the XSD 1.1 language becoming a W3C recommendation (REC). XSD 1.1 became a REC on 5th April 2012. This was a long wait for the XSD community, but it has finally happened. The previous XSD standard (XSD 1.0 2nd edition) dates back to 2004.

XSD 1.1 implementations: There currently seem to be two XSD 1.1 implementations, Xerces and Saxon. Xerces is a project of the Apache Software Foundation's XML activity, and Saxon is a product set from Saxonica (by Michael Kay). Both implementations pass nearly 100% of the W3C XSD 1.1 test suite, so these tools are reliable implementations of the XSD 1.1 standard.

For readers who aren't aware of them, following are the change lists for the XSD 1.1 language. They are non-normative but fairly complete; for the complete list of changes in XSD 1.1 with respect to XSD 1.0, you'll have to read the whole XSD 1.1 specification:

XSD 1.1 Structures specification: http://www.w3.org/TR/xmlschema11-1/#changes
XSD 1.1 Datatypes specification: http://www.w3.org/TR/xmlschema11-2/#changes

Wishing the XSD folks (and the wider XML community) happy reading of the new XSD 1.1 specification, and happy experimenting with the available implementations :)

Saturday, February 25, 2012

modular XML instances and modular XSD schemas

I've lately been playing with some new ideas about design options for constructing modular XML instance documents and modular XSD schema documents, and thought to write my findings as a blog post here.

I believe the following are the primary concepts related to constructing modular XML documents (and XSD schemas) when XSD validation is involved:
1. Modularize XML documents using the XInclude construct.
2. Modularize an XSD document via <xs:include> and <xs:import>. The <xs:include> construct maps closely to modularity concepts in XSD schemas. The <xs:import> construct is necessary (necessary in XSD 1.0, and optional in XSD 1.1) to compose, and also to modularize, XSD schemas coming from two or more distinct XML namespaces.

I don't intend to delve much into the XSD constructs <xs:include> and <xs:import> in this post, since these are well known within the XSD and XML communities. Instead, I'll primarily focus on XML document modularization via the XInclude construct. I'll also present a few thoughts about various design options for validating such XML instance documents via XSD. I don't claim to have covered every design option for these use cases, but I feel I've covered a few of the important ones.

What is XInclude?
XInclude is a W3C specification that defines how to modularize XML documents. Its primary construct is the <xi:include> XML element. Following is a small example of an XInclude-aware XML document:

z.xml

<z xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="x.xml"/>
    <xi:include href="y.xml"/>
</z>

x.xml

<x>
    <a>1</a>
    <b>2</b>
</x>

y.xml

<y>
    <p>5</p>
    <q>6</q>
</y>

We'll provide the XML document z.xml above, which is composed from other XML documents via XInclude meta-data, to an XSD validator for validation.

Here I essentially discuss the XSD schema design options for validating an XML instance document like z.xml above. Following are the XSD design options (each producing a successful XML instance validation) that currently come to my mind, along with some explanation of the corresponding design rationale:

XS1:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="z">
          <xs:complexType>
               <xs:sequence>
                    <xs:any processContents="skip" minOccurs="2" maxOccurs="2"/>
               </xs:sequence>
          </xs:complexType>
    </xs:element>
   
</xs:schema>

This schema is written with a view that the XML document (i.e., z.xml) would be validated with its XInclude meta-data unexpanded. The xs:any wildcard in this schema only weakly validates each of the two <xi:include> elements that stand in for the included document roots (i.e., XML elements "x" and "y"). The validation is weak because the wildcard merely requires some XML element to be present, and specifies no other constraint on its corresponding XML instance elements.

XS2:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="z">
        <xs:complexType>
            <xs:complexContent>
                <xs:restriction base="T1">
                    <xs:sequence>
                        <xs:element name="include" minOccurs="2" maxOccurs="2" targetNamespace="http://www.w3.org/2001/XInclude"/>
                    </xs:sequence>
                </xs:restriction>
            </xs:complexContent>
        </xs:complexType>
    </xs:element>

    <xs:complexType name="T1" abstract="true">
        <xs:sequence>
            <xs:any processContents="skip" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
   
</xs:schema>

This schema is also written with a view that the XML document (i.e., z.xml) would be validated with its XInclude meta-data unexpanded. But this schema specifies slightly stronger XSD validation constraints than the previous example (stronger in the sense that it declares an XML element and specifies its name and namespace). This schema needs an XSD 1.1 processor, because the local element declaration specifies a "targetNamespace" attribute, which XSD 1.1 allows within a complexType restriction. An XSD 1.0 version of this design approach is possible, which would involve using an XSD <xs:import> element to import XSD components from the XInclude namespace.

XS3:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="z">
        <xs:complexType>
            <xs:sequence>
                <xs:any processContents="skip" minOccurs="2" maxOccurs="2" namespace="http://www.w3.org/2001/XInclude"/>
            </xs:sequence>
            <xs:assert test="count(*[local-name() = 'include']) = 2"/>
            <xs:assert test="deep-equal((*[1] | *[2])/@*/name(), ('href','href'))"/>
        </xs:complexType>
    </xs:element>
   
</xs:schema>

This schema is also written with a view that the XML document (i.e., z.xml) would be validated with its XInclude meta-data unexpanded. But this schema enforces XSD validation even more strongly than XS2 above: it also requires the XInclude attribute "href" to be present on the XInclude meta-data, which the previous schema doesn't enforce. This schema uses XSD 1.1 <assert> elements to check the names of the XML instance elements intended to be XInclude meta-data, and the names of their attributes. This may not be the best XSD validation approach, but such an XSD design idiom is now possible with the XSD 1.1 language.

XS4:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="z">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="x">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="a" type="xs:integer"/>
                            <xs:element name="b" type="xs:integer"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="y">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element name="p" type="xs:integer"/>
                            <xs:element name="q" type="xs:integer"/>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
   
</xs:schema>

This schema is written with a view that the XML document (i.e., z.xml) would be validated with its XInclude meta-data expanded. This schema specifies the strongest XSD validation constraints of the four approaches (strongest in the sense that the internal structure of the XML element instances "x" and "y" is now completely specified by the XSD document).

But to make this XSD validation approach work, the XInclude meta-data needs to be expanded, and the expanded XML infoset needs to be supplied to the XSD validator. This requires an XInclude processor (like Apache Xerces) that plugs into the XML parsing stage to expand the <xi:include> tags.
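To see what "expanded" means concretely, the following Java sketch parses z.xml with XInclude processing switched on through the standard JAXP DocumentBuilderFactory.setXIncludeAware(true). It then lists the root's child elements, which is what an XSD validator such as XS4 would be given. It assumes the JDK's bundled parser, which supports XInclude, and writes the three sample files of this post into a temporary directory. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Shows the child elements of "z" after <xi:include> expansion.
class XIncludeExpansionSketch {

    static List<String> expandedChildNames(File mainDoc) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // XInclude processing needs namespace awareness
            factory.setXIncludeAware(true);   // expand <xi:include> while parsing
            Document doc = factory.newDocumentBuilder().parse(mainDoc);
            List<String> names = new ArrayList<>();
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(n.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes x.xml, y.xml and z.xml from this post into a fresh temporary directory
    static File writeSamples() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            write(dir.resolve("x.xml"), "<x><a>1</a><b>2</b></x>");
            write(dir.resolve("y.xml"), "<y><p>5</p><q>6</q></y>");
            Path z = dir.resolve("z.xml");
            write(z, "<z xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                    + "<xi:include href=\"x.xml\"/><xi:include href=\"y.xml\"/></z>");
            return z.toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void write(Path path, String content) throws IOException {
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(expandedChildNames(writeSamples()));
    }
}
```

Without setXIncludeAware(true), the same loop would report the two xi:include elements instead of "x" and "y".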

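For the interest of readers, following is a Java program, with its imports, that enables XInclude processing and supplies the resulting XML infoset (i.e., after the XInclude meta-data expansion) to the XSD validator. It takes the schema and instance URIs as command-line arguments:

import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class XIncludeValidation {

    public static void main(String[] args) {
        String xsdUri = args[0];   // e.g. the schema XS4
        String xmlUri = args[1];   // e.g. z.xml
        try {
            SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(getSaxSource(xsdUri, false));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ValidationErrHandler());
            // the instance is parsed with XInclude on, so the validator sees the expanded infoset
            validator.validate(getSaxSource(xmlUri, true));
        }
        catch (SAXException | IOException | ParserConfigurationException e) {
            e.printStackTrace();
        }
    }

    private static SAXSource getSaxSource(String docUri, boolean isInstanceDoc)
            throws SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        if (isInstanceDoc) {
            // expand <xi:include> directives while parsing the instance document
            factory.setXIncludeAware(true);
        }
        XMLReader reader = factory.newSAXParser().getXMLReader();
        if (isInstanceDoc) {
            // don't add xml:base attributes to the included elements (the schema doesn't declare them)
            reader.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
        }
        return new SAXSource(reader, new InputSource(docUri));
    }

    private static String getFormattedMesg(String systemId, int lineNo, int colNo, String mesg) {
        return systemId + ", line " + lineNo + ", col " + colNo + " : " + mesg;
    }

    static class ValidationErrHandler implements ErrorHandler {

        public void error(SAXParseException spe) throws SAXException {
            System.err.println(getFormattedMesg(spe.getSystemId(), spe.getLineNumber(),
                    spe.getColumnNumber(), spe.getMessage()));
        }

        public void fatalError(SAXParseException spe) throws SAXException {
            System.err.println(getFormattedMesg(spe.getSystemId(), spe.getLineNumber(),
                    spe.getColumnNumber(), spe.getMessage()));
            throw spe;   // processing cannot continue after a fatal error
        }

        public void warning(SAXParseException spe) throws SAXException {
            // NO-OP
        }
    }
}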

Summary: Is devising the various XSD design approaches above beneficial for an XSD schema design that involves validating XML instance documents containing <xi:include> directives? My thought process with regard to the XSD validation options presented above had the following concerns:
1) Providing various degrees of XSD validation strength for <xi:include> directives (essentially the unexpanded and expanded modes).
2) Exploring some of the new XML validation idioms offered by the XSD 1.1 language for the use cases presented above (essentially using the "targetNamespace" attribute on xs:element elements, and using <assert> elements).
3) Exploring the Java SAX and JAXP APIs to enable XInclude meta-data expansion, and providing a SAXSource object containing an XInclude-expanded XML infoset, which is then supplied to the XSD validation pipeline.

I hope that this post was useful.

Sunday, February 5, 2012

"castable as" vs "instance of" XPath 2.0 expressions for XSD 1.1 assertions

I'm continuing with thoughts related to my previous blog post (ref, http://mukulgandhi.blogspot.in/2012/01/using-xsd-11-assertions-on-complextype.html). The earlier post used the XPath 2.0 "castable as" expression to do some checks on the 'untyped' data of a complexType's mixed content (essentially finding whether a string/untyped value in an XML instance document is a lexical representation of an xs:integer value).

This post talks about the use of the XPath 2.0 "instance of" vs "castable as" expressions in the context of XSD 1.1 assertions, essentially providing guidance about when each of these expressions may be necessary.

The "castable as" use case with XSD 1.1 was discussed in my earlier blog post. Here I essentially talk about the "instance of" expression when used with XSD 1.1 assertions.

Let's assume that there is an XML instance document like following (XML1):

<X>
   <elem>
     <a>20</a>
     <b>30</b>
   </elem>
   <elem>
     <a>10</a>
     <b>2005-10-07</b>
   </elem>
</X>

The XSD schema should express the following constraints with respect to the above XML instance document (XML1):
1. The elements "a" and "b" can be typed as an xs:integer or an xs:date (therefore we'll express this with an XSD simpleType of variety 'union').
2. If both elements "a" and "b" are of type xs:integer (this is allowed by the simpleType definition described in point 1 above), then the numeric value of element "a" should be less than the numeric value of element "b".
3. If one of the elements "a" or "b" is an xs:integer and the other one is an xs:date, then we would like to express the following constraints:
   - the numeric XML instance value of the xs:integer-typed element should be less than 100
   - the xs:date XML instance value should be less than the current date

The following XSD (1.1) schema document describes all of the above validation constraints for a sample XML instance document (XML1) provided above:

[XS1]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:element name="X">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="elem" maxOccurs="unbounded">
                 <xs:complexType>
                    <xs:sequence>
                       <xs:element name="a" type="union_of_date_and_integer"/>
                       <xs:element name="b" type="union_of_date_and_integer"/>
                    </xs:sequence>
                    <xs:assert test="if ((data(a) instance of xs:integer) and (data(b) instance of xs:integer))
                                              then (data(a) lt data(b))
                                           else if (not(deep-equal(data(a), data(b))))
                                              then (*[data(.) instance of xs:integer]/data(.) lt 100
                                                         and
                                                      *[data(.) instance of xs:date]/data(.) lt current-date())
                                              else true()"/>
                 </xs:complexType>
              </xs:element>
           </xs:sequence>
        </xs:complexType>
     </xs:element>
   
     <xs:simpleType name="union_of_date_and_integer">
        <xs:union memberTypes="xs:date xs:integer"/>
     </xs:simpleType>
   
</xs:schema>

It may be interesting for readers to know why I wrote the assertion like the one above. Following are a few of the thoughts:
1. The XML elements "a" and "b" are typed with a 'union' simpleType. To access the atomic values validated by such a simpleType, an assertion needs to apply the XPath 2.0 "data" function to the relevant XDM node (elements "a" and "b" in this case). Further, to determine whether such a typed value is an xs:integer, we need the "instance of" expression. "castable as" is not needed in this case, since the instance document's data is already typed.
2. The rest of the assertion implements the requirements stated above.

For more visual and/or design elegance, feel free to break the assertion above into two or more assertions.
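To make the branching of the XS1 assertion concrete, here is a Java sketch that mirrors it for one "elem". Java's instanceof plays the role of "instance of" on the typed values. Several simplifications are assumed. Union validation is modeled by trying xs:date and then xs:integer, in memberTypes order. LocalDate.parse accepts only yyyy-MM-dd, so the timezones that xs:date allows are ignored. A "today" parameter stands in for current-date() so that results are reproducible. The class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Mirrors the XS1 assertion for one "elem"; a sketch, not an XSD processor.
class UnionAssertionSketch {

    // Simplified union validation for memberTypes="xs:date xs:integer"
    static Object typedValue(String lexical) {
        String s = lexical.trim();
        try {
            return LocalDate.parse(s);
        } catch (DateTimeParseException notADate) {
            return new BigInteger(s);  // NumberFormatException: invalid for the union
        }
    }

    static boolean assertionHolds(String aLexical, String bLexical, LocalDate today) {
        Object a = typedValue(aLexical);
        Object b = typedValue(bLexical);
        // both "instance of xs:integer": require a < b
        if (a instanceof BigInteger && b instanceof BigInteger) {
            return ((BigInteger) a).compareTo((BigInteger) b) < 0;
        }
        // not(deep-equal(data(a), data(b)))
        if (!a.equals(b)) {
            if (!(a instanceof BigInteger) && !(b instanceof BigInteger)) {
                // two different dates: there is no integer to compare, so the XPath
                // "and" cannot be true and the element is rejected
                return false;
            }
            BigInteger number = (BigInteger) (a instanceof BigInteger ? a : b);
            LocalDate date = (LocalDate) (a instanceof LocalDate ? a : b);
            return number.compareTo(BigInteger.valueOf(100)) < 0 && date.isBefore(today);
        }
        return true;  // else true()
    }
}
```

With XML1's two "elem" values ("20"/"30" and "10"/"2005-10-07"), both checks hold for any date after 2005-10-07.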

I would also like to show another XSD 1.1 assertions example, which uses neither an XPath 2.0 "castable as" nor an "instance of" expression. It demonstrates that if the XDM nodes an assertion inspects are already typed, the "castable as" expression is usually unnecessary, since "castable as" is essentially useful for programmatically enforcing typing on string/untyped values. An "instance of" expression is needed only in some cases, like the union-typed values above.

Following is a slightly modified variant of the XML instance document specified above (XML1):

[XML2]

<X>
   <elem>
     <a>2</a>
     <b>2012-02-04</b>
   </elem>
   <elem>
     <a>10</a>
     <b>2005-10-07</b>
   </elem>
</X>

The XSD schema should express the following constraints with respect to the above XML instance document (XML2):
1. The element "a" is typed as xs:nonNegativeInteger, and the element "b" is typed as xs:date.
2. Adding the number of days given by element "a" to the xs:date value of element "b" should result in an xs:date value that is less than the current date.

The following XSD 1.1 schema document expresses all of the above validation constraints for the sample XML instance document (XML2):

[XS2]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:element name="X">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="elem" maxOccurs="unbounded">
                 <xs:complexType>
                    <xs:sequence>
                       <xs:element name="a" type="xs:nonNegativeInteger"/>
                       <xs:element name="b" type="xs:date"/>
                    </xs:sequence>
                    <xs:assert test="(b + xs:dayTimeDuration(concat('P', a, 'D'))) lt current-date()"/>
                 </xs:complexType>
              </xs:element>
           </xs:sequence>
        </xs:complexType>
     </xs:element>
   
</xs:schema>
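For contrast, here is a hypothetical variant of the schema above (not part of the stated requirements, and not run through Xerces) in which "a" and "b" are declared only as xs:string. Because the values are then plain strings, the assertion has to check them with "castable as" and cast them explicitly before doing date arithmetic. The if/then/else guards the casts: in XPath 2.0 the operands of "and" may be evaluated in any order, so a failing cast could otherwise raise an error instead of simply making the assertion false.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="X">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="elem" maxOccurs="unbounded">
                 <xs:complexType>
                    <xs:sequence>
                       <!-- hypothetical variant: both values arrive as plain strings -->
                       <xs:element name="a" type="xs:string"/>
                       <xs:element name="b" type="xs:string"/>
                    </xs:sequence>
                    <!-- check the lexical forms first; only the chosen branch of if/then/else is evaluated -->
                    <xs:assert test="if ((a castable as xs:nonNegativeInteger) and (b castable as xs:date))
                                     then (xs:date(b) + xs:dayTimeDuration(concat('P', xs:nonNegativeInteger(a), 'D'))) lt current-date()
                                     else false()"/>
                 </xs:complexType>
              </xs:element>
           </xs:sequence>
        </xs:complexType>
     </xs:element>

</xs:schema>
```

With typed declarations, as in XS2 above, none of this checking is needed, because schema validation has already guaranteed the types before the assertion runs.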

That's all I had to say today.

I hope this post was useful.

Thursday, January 26, 2012

Using XSD 1.1 assertions on complexType mixed contents

Some interesting ;) thoughts have been coming to my mind lately, and, not surprisingly, they're again related to XSD. I was playing with XSD 1.1 assertions once more, trying to constrain an XSD complexType {mixed} content model, and I'm sharing some of my findings here. (I don't think I have written about this particular topic before, on this blog or on any other forum. If you find anything in this post that duplicates something I may have written elsewhere, kindly ignore the earlier version.) Now to the topic.

What is XSD mixed content? (You may skip this part if you already know about it.)
I believe this isn't really an XSD-only topic. Mixed content exists in plain XML: a good old well-formed XML document can have "mixed" content without ever being validated (i.e., in a schema-free XML environment). What XSD adds is the ability to report such an XML instance document as 'valid' (more importantly, XSD reports an XML instance with "mixed" content as 'invalid' if it is validated against an "element only" content model specified by an XSD complexType definition), and to constrain XML mixed content in certain ways (particularly in some new ways with XSD 1.1, which I'll talk about further below).

Example of "element only" (content of element "X" here) XML content model [X1]:

<X>
  <Y/>
  <Z/>
</X>
Example of "mixed content" (content of element "X" here) XML content model [X2]: 

<X>
  abc
  <Y/>
  123
  <Z/>
  654
</X> 

In other words, "mixed content" allows text nodes containing non-whitespace characters to appear as siblings of element nodes.

XSD 1.0 schema definition that allows "mixed" content [XS1]:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">    
        <xs:complexType mixed="true">
             <xs:sequence>
                 <xs:element name="Y"/>
                 <xs:element name="Z"/>
             </xs:sequence>
        </xs:complexType>
    </xs:element>
    
</xs:schema>

This schema (XS1) would report the XML document "X2" above as 'valid', since that instance document has "mixed" content and this schema allows "mixed" content via the attribute mixed="true" on the complexType definition.

But if we remove the attribute mixed="true" from the schema document "XS1" above, or set its value to 'false' (which is also the default), the modified schema would report the XML instance document "X2" above as 'invalid', while the XML document "X1" above would still be reported as 'valid', since it doesn't have "mixed" content.

New capabilities provided by XSD 1.1 to constrain XML "mixed" content further:

Here is a list of the new XSD 1.1 capabilities for XML "mixed" content that currently come to my mind:

a)

XSD 1.1 schema "XS2":
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">    
       <xs:complexType mixed="true">
          <xs:sequence>
             <xs:element name="Y"/>
             <xs:element name="Z"/>
          </xs:sequence>          
          <xs:assert test="deep-equal(text()[matches(.,'\w')]/normalize-space(.), ('abc','123','654'))"/>
       </xs:complexType>
    </xs:element>
    
</xs:schema>
The <assert> element in this schema (XS2) constrains the mixed content of the XML instance document to be exactly the list 'abc', '123', '654' (after whitespace normalization), with the order of the list items being significant. The assertion is written only to illustrate the technical capabilities of an assertion, not with any particular application in mind.
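
If the order of the items should not matter, a looser variant of this assertion might look like the sketch below (my own illustration, not tested here). It replaces the assertion in XS2 and only requires every non-whitespace text item to be one of the listed values; because of the general comparison "=", repetitions are allowed, and a list with no text items at all also satisfies it.

```xml
<!-- each non-whitespace text item must be one of the listed values; order and repetition are not constrained -->
<xs:assert test="every $t in text()[matches(.,'\w')]
                   satisfies normalize-space($t) = ('abc','123','654')"/>
```
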
Here are a few other things that XSD 1.1 assertions can achieve in the context of an XML "mixed" content model:

b)
<xs:assert test="((text()[matches(.,'\w')]/normalize-space(.))[2] castable as xs:integer)
                    and
                 ((text()[matches(.,'\w')]/normalize-space(.))[3] castable as xs:integer)"/>

This assertion constrains specific items of the XML "mixed" content list to be of a specified XSD schema type: here the 2nd and 3rd items of the list must be castable as xs:integer, whereas the first item is left "untyped" (unconstrained).

c)
<xs:assert test="count((text()[matches(.,'\w')]/normalize-space(.))[. castable as xs:integer])
                    =
                 count(text()[matches(.,'\w')]/normalize-space(.))"/>

This assertion constrains all items of the XML "mixed" content list to be of the same type (castable as xs:integer in this case). It uses a common pattern: "the count of items castable as xs:integer equals the count of all items".
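
The same constraint can also be written with an XPath 2.0 quantified expression instead of comparing two counts. The sketch below should be equivalent to assertion (c); like (c), it is trivially true when there are no non-whitespace text items.

```xml
<!-- every non-whitespace text item, after normalization, must be castable as xs:integer -->
<xs:assert test="every $t in text()[matches(.,'\w')]
                   satisfies (normalize-space($t) castable as xs:integer)"/>
```
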

d)
<xs:assert test="every $x in text()[matches(.,'\w')][position() gt 1]
                   satisfies 
                (number(normalize-space($x)) gt number($x/preceding-sibling::text()[matches(.,'\w')][1]))"/>

This assertion constrains the items of the XML "mixed" content list to be in strictly ascending numeric order. It assumes that all items in the list are numeric; it should also be possible to write an assertion for a heterogeneously typed list that enforces numeric order only among its numeric items.
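
A sketch of that last idea follows. It is my own variant of (d), not tested with a validator, and the choice of xs:decimal is an assumption: it makes non-numeric text items (such as 'abc') and whitespace-only text nodes simply be skipped, and it avoids special values like NaN or INF that xs:double would accept.

```xml
<!-- consider only text items whose normalized value is castable as xs:decimal;
     each such item must be greater than the nearest preceding numeric text item -->
<xs:assert test="every $x in text()[normalize-space(.) castable as xs:decimal][position() gt 1]
                   satisfies
                (xs:decimal(normalize-space($x))
                   gt xs:decimal(normalize-space($x/preceding-sibling::text()[normalize-space(.) castable as xs:decimal][1])))"/>
```

Against the document "X2" above, this variant should report 'valid': 'abc' is skipped, and 654 is greater than 123. Assertion (d) as written would report "X2" as 'invalid', because number('abc') is NaN and any comparison with NaN is false.
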

Summary: XSD 1.0 allowed "untyped" XML mixed content, available uniformly anywhere within the scope of an XML element validated by an XSD complexType, but no further constraints on "mixed" content were possible in an XSD 1.0 environment. XSD 1.1 allows some new ways to further constrain XML "mixed" content (some of these capabilities were illustrated in the examples above). In my opinion, the likely benefit of constraining XML "mixed" content in the ways illustrated above is that XML document authors can model certain semantic content within "mixed" content and make this knowledge available to XML applications. All examples above were tested with Apache Xerces (I hope these examples also work with other XSD validators, notably Saxon, which currently also supports XSD 1.1).

I hope that this information was useful.