Showing posts with label xerces. Show all posts

Wednesday, June 9, 2021

XML Schema xsi:type and xs:alternative

After studying XML Schema's xsi:type attribute and the xs:alternative element (introduced in XML Schema 1.1) in some depth, I've come to the conclusion that xsi:type and xs:alternative have a lot of functional similarities and, of course, some differences as well. To illustrate these points, I've come up with the following XML Schema and XML instance document examples, which I'll also attempt to explain in this blog post.


XML Schema document 1 (conforming to XSD 1.1)

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="note" type="NoteType"/>

    <xs:complexType name="NoteType">

       <xs:sequence>

          <xs:element name="to" type="xs:string"/>

          <xs:element name="from" type="xs:string"/>

          <xs:element name="heading" type="xs:string"/>

          <xs:element name="body" type="xs:string"/>

       </xs:sequence>

    </xs:complexType>

    <xs:complexType name="NoteType2">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="NoteType3">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:assert test="to castable as emailAddress"/>

             <xs:assert test="from castable as emailAddress"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:simpleType name="emailAddress"> 

       <xs:restriction base="xs:string"> 

         <xs:pattern value="[^@]+@[^@\.]+(\.[^@\.]+)+"/>

       </xs:restriction> 

    </xs:simpleType>

</xs:schema>


The following three XML instance documents are valid against the XML Schema document above:

XML document instance 1

<note>

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

XML document instance 2

<note isConfidential="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="NoteType2">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

XML document instance 3

<note isConfidential="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="NoteType3">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

The "XML document instance 1" is valid according to the XSD element declaration "note" and its declared type definition "NoteType".

The "XML document instance 2" uses xsi:type to assert that the type of the instance element "note" must be "NoteType2".

The "XML document instance 3" uses xsi:type to assert that the type of the instance element "note" must be "NoteType3".

Note that, as per the XML Schema language, the XSD type named as the value of the xsi:type attribute must be validly substitutable for the declared type of the XML element (i.e, the type associated with the element in the XML schema). According to the XML Schema language, a type S is validly substitutable for a type T if S is T itself or is a type derived from T (and the schema does not block that derivation).
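The substitutability rule can be pictured as a walk up the derivation chain. Below is a minimal Python sketch of that idea, assuming a hand-written table of the base types from "XML Schema document 1". The names BASE_TYPE and is_validly_substitutable are made up for illustration, and the sketch ignores details such as blocked derivations.

```python
# Hypothetical derivation table for "XML Schema document 1":
# each type maps to the type it extends (None = no user-defined base).
BASE_TYPE = {
    "NoteType": None,
    "NoteType2": "NoteType",
    "NoteType3": "NoteType",
}


def is_validly_substitutable(candidate, declared):
    """Return True if `candidate` is `declared` itself or is derived from it."""
    current = candidate
    while current is not None:
        if current == declared:
            return True
        current = BASE_TYPE.get(current)  # step to the base type, if any
    return False


# A derived type may stand in for the declared type ...
print(is_validly_substitutable("NoteType3", "NoteType"))
# ... but the base type may not stand in for a derived one.
print(is_validly_substitutable("NoteType", "NoteType2"))
```

This is why xsi:type="NoteType2" and xsi:type="NoteType3" are both acceptable on a "note" element whose declared type is "NoteType".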


Now consider another XML Schema document:

XML Schema document 2 (conforming to XSD 1.1)

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="note" type="NoteType">

       <xs:alternative test="@noteType2 = true()" type="NoteType2"/>

       <xs:alternative test="@noteType3 = true()" type="NoteType3"/>

    </xs:element>

    <xs:complexType name="NoteType">

       <xs:sequence>

          <xs:element name="to" type="xs:string"/>

          <xs:element name="from" type="xs:string"/>

          <xs:element name="heading" type="xs:string"/>

          <xs:element name="body" type="xs:string"/>

       </xs:sequence>

    </xs:complexType>

    <xs:complexType name="NoteType2">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType2" type="xs:boolean" use="required"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="NoteType3">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType3" type="xs:boolean" use="required"/>

             <xs:assert test="to castable as emailAddress"/>

             <xs:assert test="from castable as emailAddress"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:simpleType name="emailAddress"> 

       <xs:restriction base="xs:string"> 

         <xs:pattern value="[^@]+@[^@\.]+(\.[^@\.]+)+"/>

       </xs:restriction> 

    </xs:simpleType>

</xs:schema>


The following two XML instance documents are valid against the XML Schema document above:

XML document instance 4

<note isConfidential="true" noteType2="true">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

XML document instance 5

<note isConfidential="true" noteType3="true">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>


I think that XML Schema documents 1 and 2, as illustrated in the examples above, solve the same XML document validation problem in two different ways. With the xs:alternative element, we need to introduce new physical XML attributes such as "noteType2" and "noteType3", whereas the other solution achieves the same effect with the xsi:type attribute alone.


Following is another XML Schema 1.1 document, which is a small variation of "XML Schema document 2" shown above:

XML Schema document 3

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="note" type="NoteType">

       <xs:alternative test="@noteType = 2" type="NoteType2"/>

       <xs:alternative test="@noteType = 3" type="NoteType3"/>

    </xs:element>

    <xs:complexType name="NoteType">

       <xs:sequence>

          <xs:element name="to" type="xs:string"/>

          <xs:element name="from" type="xs:string"/>

          <xs:element name="heading" type="xs:string"/>

          <xs:element name="body" type="xs:string"/>

       </xs:sequence>

    </xs:complexType>

    <xs:complexType name="NoteType2">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType" type="NoteTypeVal" use="required"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="NoteType3">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType" type="NoteTypeVal" use="required"/>

             <xs:assert test="to castable as emailAddress"/>

             <xs:assert test="from castable as emailAddress"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:simpleType name="emailAddress"> 

       <xs:restriction base="xs:string"> 

         <xs:pattern value="[^@]+@[^@\.]+(\.[^@\.]+)+"/>

       </xs:restriction> 

    </xs:simpleType>

    <xs:simpleType name="NoteTypeVal"> 

       <xs:restriction base="xs:positiveInteger"> 

          <xs:minInclusive value="2"/>

          <xs:maxInclusive value="3"/>

       </xs:restriction> 

    </xs:simpleType>

</xs:schema>


Two XML instance documents that are valid against the above XML Schema document are the following:

<note isConfidential="true" noteType="2">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

<note isConfidential="true" noteType="3">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>


In "XML Schema document 3" above, we've defined an attribute "noteType" on both the types "NoteType2" and "NoteType3". The value of the "noteType" attribute in the XML instance document determines which XSD type the "note" element is validated against.
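The selection step for "XML Schema document 3" can be sketched in Python as follows. This is only an illustration of the idea (the first alternative whose test succeeds wins, otherwise the declared type applies), not how an XSD processor is implemented. The names ALTERNATIVES, DECLARED_TYPE and select_type are hypothetical, and treating a non-numeric attribute value as "no match" is an assumption of this sketch.

```python
# Alternatives of "XML Schema document 3", in document order:
# (value that @noteType must equal, type selected when it does).
ALTERNATIVES = [(2, "NoteType2"), (3, "NoteType3")]
DECLARED_TYPE = "NoteType"


def select_type(attributes):
    """Pick the type used to validate a <note> element from its attributes."""
    raw = attributes.get("noteType")
    if raw is not None:
        try:
            value = float(raw)  # the untyped attribute value compared as a number
        except ValueError:
            value = None        # assumption: a non-numeric value matches nothing
        for expected, type_name in ALTERNATIVES:
            if value == expected:
                return type_name
    return DECLARED_TYPE        # no test succeeded: use the declared type


print(select_type({"isConfidential": "true", "noteType": "2"}))
print(select_type({"isConfidential": "true", "noteType": "3"}))
print(select_type({}))
```

Once the type is selected, the element is validated against it as usual, so a "note" element without a "noteType" attribute is checked against plain "NoteType".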

Also note that the XML Schema 1.1 specification places the following constraint on type alternatives (i.e, on xs:alternative elements within XSD documents):

Each type T named by one of the sibling xs:alternative elements must either be validly substitutable for the element's declared type definition (a constraint similar to the one for xsi:type), or be the type xs:error.

Sunday, May 3, 2020

Online XML Schema validation service

In some of my spare time, I've developed and deployed an 'online XML Schema validation service' that uses Apache Xerces-J as the XML Schema (XSD) processor at the back-end. The service is located at http://www.softwarebytes.org/xmlvalidation/. The HTTPS version is available here: https://www.softwarebytes.org/xmlvalidation/.

The service also provides REST APIs that can be invoked from any program that can issue HTTP POST requests, and it offers downloadable examples written in Python and C# that use these REST APIs. The REST API responses can be in the following formats: XML, JSON, or plain text (the response format can be set when issuing an HTTP request).

Interestingly, I've discovered that these REST APIs can be invoked directly with a tool like curl, using its platform binary. With modern computer OSs (for e.g, Windows 10), curl comes pre-installed. Following are the responses on the command line for a few curl requests that I issued to these REST APIs:

curl --form xmlFile=@two_inp_files/x1_valid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=xml https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<validationReport>
   <xsdVer>1.1</xsdVer>
   <success>
      <message>XML document is assessed as valid with the XSD document(s) that were provided.</message>
   </success>
</validationReport>

curl --form xmlFile=@two_inp_files/x1_invalid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=xml https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<validationReport>
   <xsdVer>1.1</xsdVer>
   <failure>
      <message>XML document is assessed as invalid with the XSD document(s) that were provided.</message>
      <details>
         <detail_1>[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation ('if (@isB = true()) then b else not(b)') for element 'X' on schema type '#AnonType_X' did not succeed.</detail_1>
      </details>
   </failure>
</validationReport>

curl --form xmlFile=@two_inp_files/x1_valid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=json https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

{
    "xsdVer": "1.1",
    "success": {"message": "XML document is assessed as valid with the XSD document(s) that were provided."}
}

curl --form xmlFile=@two_inp_files/x1_invalid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=json https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

{
    "xsdVer": "1.1",
    "failure": {
        "details": ["[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation ('if (@isB = true()) then b else not(b)') for element 'X' on schema type '#AnonType_X' did not succeed."],
        "message": "XML document is assessed as invalid with the XSD document(s) that were provided."
    }
}

curl --form xmlFile=@input_small.xml --form xsdFile1=@assert_2.xsd --form ver=1.1 --form xsd11CtaFullXPath=no https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

You selected XSD 1.1 validation.
XML document is assessed as valid with the XSD document(s) you have provided.

(Please note that, since the last curl request above doesn't specify the 'responseType' form field, the server API returns a response formatted as plain text, i.e, plain text is the default response format of this API.)

The 'online XML Schema validation service' supports both the 1.0 and 1.1 versions of the XML Schema language.
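For readers who prefer a script over curl, here is a rough Python sketch of the same kind of request using only the standard library. It is not the downloadable example offered by the service. The form field names are the ones from the curl commands above, while the helper names (build_multipart, validate, is_valid) are made up for illustration, and it is an assumption that the service accepts this hand-built multipart/form-data encoding in the same way as the one curl produces.

```python
import json
import urllib.request
import uuid

API_URL = "https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler"


def build_multipart(fields, files):
    """Encode text fields and files as a multipart/form-data body.

    fields: dict of field name -> text value
    files:  dict of field name -> (file name, bytes content)
    Returns (body bytes, value for the Content-Type header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, content) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def validate(xml_bytes, xsd_bytes):
    """POST both documents and return the JSON report text (needs network access)."""
    fields = {"ver": "1.1", "xsd11CtaFullXPath": "no", "responseType": "json"}
    files = {"xmlFile": ("input.xml", xml_bytes), "xsdFile1": ("schema.xsd", xsd_bytes)}
    body, content_type = build_multipart(fields, files)
    request = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def is_valid(report_json):
    """Interpret a JSON report of the shape shown in the curl examples."""
    return "success" in json.loads(report_json)
```

A call such as is_valid(validate(xml_bytes, xsd_bytes)) would then yield a boolean, provided the service responds with a JSON report like the ones shown above.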

Saturday, March 21, 2020

Using XML Schema 1.1 <alternative> with Xerces-J

I wish to share a little information here about Apache Xerces-J's implementation of XML Schema (XSD) 1.1 'type alternatives'.

The XSD 1.1 specification defines a particular subset of the XPath 2.0 language that can be used as the value of the 'test' attribute of the XSD 1.1 <alternative> element. This XPath 2.0 subset is much smaller than the whole XPath 2.0 language. The specification of this smaller CTA (conditional type alternatives) XPath subset can be read at https://www.w3.org/TR/xmlschema11-1/#coss-ta (specifically, the section mentioning '2.1 It conforms to the following extended BNF', which has the grammar specification for the CTA XPath subset).

In fact, the XSD 1.1 specification allows XSD validators that implement the <alternative> element to support a bigger set of XPath 2.0 features (commonly the full XPath 2.0 language) than what is defined by the CTA XPath subset.

For XSD 1.1 CTAs, Xerces-J allows selecting, via a user option, either:

1) The smaller XPath subset (the default for Xerces-J), or

2) Full XPath 2.0.

How to select between the XPath subset and the full XPath 2.0 language for Xerces-J's CTA implementation is described here: https://xerces.apache.org/xerces2-j/faq-xs.html#faq-3.

I've analyzed the nature of the XSD 1.1 CTA XPath subset language a bit. Following are essentially the main CTA XPath subset patterns that may be used within XSD 1.1 schemas with the XSD <alternative> element:

1) Using comparators (like >, <, =, !=, <=, >=):

Example CTA XPath expressions:
@x = @y,
@x = 3,
@x != 3,
@x > @y

2) Using comparators with logical operators:

Example CTA XPath expressions:
(@x = @y) or (@p = @q),
((1 = 2) or (5 = 6)) and (5 = 7),
(1 and 2) or (5 and 7)

3) Using XPath 2.0 'not' function:

An example XPath expression:
(@x = @y) and not(@p)

Interestingly, the XSD 1.1 CTA XPath subset language allows using only the XPath 2.0 fn:not function, and no other XPath 2.0 built-in functions. Constructor functions for all built-in XSD types, for e.g xs:integer(..), xs:boolean(..) etc, may also be used in CTA XPath subset expressions.

As per the XSD 1.1 specification, during CTA evaluation the XML element and attribute nodes are untyped (i.e, the XML nodes do not carry any type annotation coming from an XML schema). Therefore, in many cases, CTA XPath subset expressions used with Xerces-J need explicit casts (for e.g, <xs:alternative test="(xs:integer(@x) = xs:integer(@y)) and fn:not(xs:boolean(@p))"> with the namespace prefix 'fn' bound to the URI 'http://www.w3.org/2005/xpath-functions'). For both the CTA XPath subset language and the full XPath 2.0 language for CTAs, the "fn" prefix on XPath built-in functions is optional. Typically, XML schema authors would not use the "fn" prefix for XPath built-in functions.
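Why the explicit casts matter can be shown with a small analogy in Python. When two untyped attribute values are compared with each other, they are compared as text, so numeric-looking values can order "wrongly". Python's string ordering only stands in for XPath's string comparison here, and the variable names are made up for illustration.

```python
# Attribute values as they appear in the instance document: plain text,
# with no schema type attached (the situation during CTA evaluation).
x, y = "10", "9"

# Compared as text, "10" sorts before "9", so "greater than" is false ...
as_text = x > y

# ... whereas with explicit casts the comparison is numeric, which mirrors
# writing xs:integer(@x) > xs:integer(@y) in the test expression.
as_numbers = int(x) > int(y)

print(as_text, as_numbers)  # prints: False True
```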

Saturday, January 25, 2020

Apache Xerces-J 2.12.1 now available

On behalf of the Apache Xerces XML project team, I'm pleased to share that version 2.12.1 of Apache Xerces-J is now available. For more information about this new Xerces-J release and to download Xerces-J, please visit the Xerces-J site.

Sunday, December 31, 2017

Xerces bug XERCESJ-1687

I wish to share the anguish that the following Xerces bug has caused me:

https://issues.apache.org/jira/browse/XERCESJ-1687

The bug reporter is quite right in his arguments. But I have to say that the Xerces team cannot fix this bug right now. I've also been thinking of "resolving" this bug with the fix reason "later" (but my conscience doesn't allow that either).

I hope the situation will improve.

Thursday, June 1, 2017

XPath 2.0 atomization with XML Schema 1.1 validation

XPath 2.0 atomization, as it applies to XML Schema 1.1 validation, is a concept worth looking at. I'll attempt to write something about this topic in this blog post.

Let's look at the following XML Schema 1.1 validation example, which we'll use to discuss this topic.

XSD 1.1 document:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 
   <xs:element name="X">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="a" type="xs:integer"/>
            <xs:element name="b" type="xs:integer"/>
         </xs:sequence>
         <xs:assert test="a gt b"/>
      </xs:complexType>
   </xs:element>
 
</xs:schema>

XML instance document that is validated with the above XSD document:

<?xml version="1.0"?>
<X>
  <a>4</a>
  <b>7</b>
</X>

Upon XML Schema validation, the above XML instance document would be reported as invalid, because the numeric value of "a" is less than that of "b". Now, what is the XPath 2.0 atomization in this example that I wish to talk about?

Since the XML document is validated with the XSD document above, the nodes of the XPath data model tree that is built to evaluate <assert> are annotated with the XSD types declared in the XSD document. Therefore, when the <assert> XPath 2.0 expression "a gt b" is evaluated, the XSD types of the tree nodes for elements "a" and "b" are available at runtime. In XPath 2.0 terms, the expression "a gt b" is evaluated on the values that result from atomizing the nodes for the XML elements "a" and "b". We can't test a greater/less than relation on XML nodes, but we can do that on numbers, for example, and this conversion of XML nodes to atomic values such as numbers is what XPath 2.0 atomization achieves.
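The idea can be mimicked with a toy model in Python. This is only a sketch of the concept, not of the real XPath data model or of Xerces internals. ElementNode and atomize are hypothetical names, and only xs:integer is handled.

```python
from dataclasses import dataclass


@dataclass
class ElementNode:
    """Toy stand-in for a validated element node (not a real XDM node)."""
    name: str
    text: str      # lexical content from the instance document
    xsd_type: str  # type annotation assigned by schema validation


def atomize(node):
    """Return the node's typed value, based on its type annotation."""
    if node.xsd_type == "xs:integer":
        return int(node.text)
    return node.text  # fall back to the string value


a = ElementNode("a", "4", "xs:integer")
b = ElementNode("b", "7", "xs:integer")

# The assertion "a gt b" compares the atomized values, not the nodes themselves.
assertion_holds = atomize(a) > atomize(b)
print(assertion_holds)  # prints: False
```

Because 4 is not greater than 7, the assertion fails, which matches the 'invalid' outcome described above.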

I've used Apache Xerces as an XSD 1.1 validator, for testing examples for this blog post.

Sunday, May 19, 2013

thanks to OxygenXML folks

On behalf of the Xerces-J XML Schema team, I would like to thank the folks from the Oxygen XML team for highlighting many important bugs in the Xerces-J XSD 1.1 validator. We've been able to solve many of those reported bugs, and I feel this has made the implementation of the Xerces-J XSD 1.1 validator much better.

Here's the list of issues reported by the Oxygen folks during (I guess) the past 1-2 years, which are either resolved or closed:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20XERCESJ%20AND%20issuetype%20%3D%20Bug%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20reporter%20in%20%28radu_coravu%2C%20%22octavian.nadolu%22%29

In the above report, you may ignore bugs dated as old as 2006, which must have been resolved within an existing or an earlier Xerces-J version.

Other than the bugs reported by the Oxygen XML folks, we also received bug reports from other members of the XML community. Thanks to those people as well.

I'm not sure when we're going to release the next version of Xerces-J, which should have many implementation improvements. Taking a very pessimistic view of this, I expect a new version of Xerces-J sometime later this year, or it might slip to next year.

Thursday, November 15, 2012

new thoughts about XSD 1.1 assertions

I've been thinking on these XSD topics for a while, and thought of summarizing my findings here.

Let me start this post by writing the following XML instance document (which will be the focus of all analysis in this post):

XML-1
<list attr="1 2 3 4 5 6">
    <item>a1</item>
    <item>a2</item>
    <item>a3</item>
    <item>a4</item>
    <item>a5</item>
    <item>a6</item>
</list>

We need to specify an XSD schema for the XML document above (XML-1), providing the following essential validation constraints:
1) The value of the attribute "attr" is a sequence of single-digit numbers. A number here can be modeled as the XSD type xs:integer, or as a restriction of xs:string (as we'll see below).
2) Each string value within an element "item" is of the form a[0-9], i.e, the character "a" followed by a single numeric character. We'll simply specify this with the XSD type xs:string for now. We also want each numeric character after "a" to be pair-wise the same as the value at the corresponding index within the attribute value "attr". The sample XML instance document above (XML-1) is valid as per this requirement. It follows that if we change a numeric value on only one side (either within the attribute value "attr", or the numeric suffix of "a" within an "item" element), the XML instance document must be reported as 'invalid'.

Now, let me come to the XSD solutions for these XML validation requirements.

First of all, we need XSD 1.1 assertions to specify these validation constraints (since this is clearly a co-occurrence constraint). Following is the first schema design that quickly came to my mind:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="list">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
           <xs:attribute name="attr">
              <xs:simpleType>
                 <xs:list itemType="xs:integer"/>
              </xs:simpleType>
           </xs:attribute>
           <xs:assert test="deep-equal(item/substring-after(., 'a'), data(@attr))"/>
        </xs:complexType>
    </xs:element>
   
</xs:schema>

The above schema is almost correct, except for a little problem with the way the assertion is specified. As per the XPath 2.0 spec, the "deep-equal" function, when comparing two sequences for deep equality, requires that the atomic values at the same indices in the two sequences be equal as per the equality rules of their XSD atomic types. In the assertion of the above schema, the first argument of "deep-equal" has the type xs:string* and the second argument has the type xs:integer* (note that the XPath 2.0 "data" function returns the typed value of a node), and therefore the "deep-equal" function as used here returns 'false'.

Assuming that we do not change the schema specification of the "item" elements and the attribute "attr", the following assertion would be correct for the above requirement:

<xs:assert test="deep-equal(item/substring-after(., 'a'), for $att in data(@attr) return string($att))"/>

(in this case, we've converted the second argument of the "deep-equal" function to the type xs:string*, and did not modify the type of the first argument)

An alternative correct modification to the assertion would be:

<xs:assert test="deep-equal(item/number(substring-after(., 'a')), data(@attr))"/>

(in this case, we convert the first argument of the "deep-equal" function to a numeric sequence and do not modify the second argument; strictly speaking, the "number" function returns xs:double values, which compare equal to the corresponding xs:integer values)
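The type mismatch and the two fixes have a close analogue in Python, where a list of strings is never equal to a list of integers. The sketch below parses XML-1 with the standard library and redoes the three comparisons. It is only an analogy for "deep-equal", and the variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XML_1 = """<list attr="1 2 3 4 5 6">
    <item>a1</item><item>a2</item><item>a3</item>
    <item>a4</item><item>a5</item><item>a6</item>
</list>"""

root = ET.fromstring(XML_1)

# First argument: the text after 'a' for every item (a sequence of strings).
suffixes = [item.text.partition("a")[2] for item in root.findall("item")]

# Second argument: the typed value of @attr when its item type is xs:integer.
attr_as_integers = [int(token) for token in root.get("attr").split()]

mismatched = suffixes == attr_as_integers                         # strings vs integers
strings_both = suffixes == [str(n) for n in attr_as_integers]     # first fix
numbers_both = [float(s) for s in suffixes] == attr_as_integers   # second fix

print(mismatched, strings_both, numbers_both)  # prints: False True True
```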

I now propose a slightly different way to specify the schema for the above requirements. Following is the modified schema document:

XS-2
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
    <xs:element name="list">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
           </xs:sequence>
           <xs:attribute name="attr">
              <xs:simpleType>
                 <xs:list itemType="NumericChar"/>
              </xs:simpleType>
           </xs:attribute>
           <xs:assert test="deep-equal(item/substring-after(., 'a'), data(@attr))"/>
        </xs:complexType>
    </xs:element>
  
    <xs:simpleType name="NumericChar">
       <xs:restriction base="xs:string">
          <xs:pattern value="[0-9]"/>
       </xs:restriction>
    </xs:simpleType>
  
</xs:schema>

This schema document is right in all respects, and successfully validates the XML document specified above (i.e, XML-1). In this schema we've made the following design decisions:
1) We've specified the itemType of the list (the value of the attribute "attr" is an instance of this list) as "NumericChar" (a user-defined simpleType that uses the xs:pattern facet to constrain list items).
2) The "deep-equal" function, as now written in the schema XS-2, has string-typed sequences for both of its arguments (xs:string* and NumericChar*, where NumericChar is derived from xs:string), and therefore it works fine.

I'll now try to summarize the pros and cons of schema XS-2 with respect to the other correct solutions specified earlier:
1) If the simpleType definition of the attribute "attr" is not used in another schema context (i.e, there is no need to reuse this type), then the solution with schema XS-2 is acceptable.
2) If a schema author thinks that the list items of the attribute "attr" need to be numeric (due to the semantic intent of the problem, or because the list's simpleType definition needs to be reused in another place that needs a list of integers), then schema solutions like the ones shown earlier would be needed.

Here's another caution I can point out regarding the schema solutions proposed above:
The above schemas would allow values like "pqra5" within "item" elements to produce a valid outcome, because the "substring-after" function used in the assertions only looks at the text after the first "a". Therefore, the "item" element may be more correctly specified as:

<xs:element name="item" maxOccurs="unbounded">
    <xs:simpleType>
         <xs:restriction base="xs:string">
              <xs:pattern value="a[0-9]"/>
         </xs:restriction>
    </xs:simpleType>
</xs:element>
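The difference between the two checks can be tried out quickly in Python. This is a rough sketch: str.partition plays the role of "substring-after", and re.fullmatch approximates the pattern facet, which must match the whole value. Both function names are made up for illustration.

```python
import re


def suffix_after_a(value):
    """Rough equivalent of substring-after(value, 'a')."""
    return value.partition("a")[2]


def matches_item_pattern(value):
    """Rough equivalent of the pattern facet a[0-9] (the whole value must match)."""
    return re.fullmatch(r"a[0-9]", value) is not None


for value in ("a5", "pqra5"):
    print(value, suffix_after_a(value), matches_item_pattern(value))
```

Both values yield the suffix "5", so the assertion alone cannot tell them apart, while the pattern accepts only "a5".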

It is also evident that the XPath 2.0 "data" function allows us to do some useful things with simpleType lists, like getting the list's typed value and specifying certain checks on individual list items (possibly different checks on different list items), or accessing list items by an index (or a range of indices), for e.g, data(@attr)[2] or data(@attr)[position() gt 3]. This was not possible with XSD 1.0.
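As a small analogy for these two index expressions, here is what they select when the value attr="1 2 3 4 5 6" is treated as a Python list. Keep in mind that XPath positions start at 1 while Python indices start at 0. The variable names are made up for illustration.

```python
# Items of attr="1 2 3 4 5 6", in list order.
items = "1 2 3 4 5 6".split()

second = items[1]        # data(@attr)[2]
after_third = items[3:]  # data(@attr)[position() gt 3]

print(second, after_third)  # prints: 2 ['4', '5', '6']
```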

I hope that this post was useful, and I hope to come back with another post sometime soon.

Saturday, February 25, 2012

modular XML instances and modular XSD schemas

Lately I was playing with some new ideas for constructing modular XML instance documents and/or modular XSD schema documents, and thought I'd write up my findings as a blog post here.

I believe the following are the primary concepts for constructing modular XML documents (and XSD schemas) when XSD validation is involved:
1. Modularize XML documents using the XInclude construct.
2. Modularize an XSD document via <xs:include> and <xs:import>. The <xs:include> construct maps significantly to modularity concepts in XSD schemas, and <xs:import> is necessary (necessary in XSD 1.0, and optional in XSD 1.1) to compose (and also to modularize) XSD schemas coming from two or more distinct XML namespaces.

I don't intend to delve much into the XSD constructs <xs:include> and <xs:import> in this post, since these are well known within the XSD and XML communities. Instead, I'll primarily focus on XML document modularization via the XInclude construct, and present a few thoughts about various design options for validating such XML instance documents via XSD validation (I don't claim to have covered every design option for these use cases, but I feel that I cover a few of the important ones).

What is XInclude?
XInclude is an XML standards specification that defines how to modularize XML document information. The primary construct of XInclude is the <xi:include> XML element. Following is a small example of an XInclude-aware XML document:

z.xml

<z xmlns:xi="http://www.w3.org/2001/XInclude">
    <xi:include href="x.xml"/>
    <xi:include href="y.xml"/>
</z>

x.xml

<x>
    <a>1</a>
    <b>2</b>
</x>

y.xml

<y>
    <p>5</p>
    <q>6</q>
</y>

We'll provide the XML document z.xml above, which is composed from other XML documents via XInclude meta-data, to an XSD validator for validation.
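To see the difference between the unexpanded and the expanded form of z.xml, here is a minimal Python sketch that uses the standard library's xml.etree.ElementInclude. The three documents are held in memory rather than read from files, and DOCUMENTS and load_from_memory are names made up for this illustration.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# In-memory stand-ins for the three files shown above.
DOCUMENTS = {
    "z.xml": '<z xmlns:xi="http://www.w3.org/2001/XInclude">'
             '<xi:include href="x.xml"/><xi:include href="y.xml"/></z>',
    "x.xml": "<x><a>1</a><b>2</b></x>",
    "y.xml": "<y><p>5</p><q>6</q></y>",
}


def load_from_memory(href, parse, encoding=None):
    """Loader callback: return the parsed document (or raw text) for an href."""
    if parse == "xml":
        return ET.fromstring(DOCUMENTS[href])
    return DOCUMENTS[href]


z = ET.fromstring(DOCUMENTS["z.xml"])
unexpanded = [child.tag for child in z]  # two xi:include elements

ElementInclude.include(z, loader=load_from_memory)  # expands z in place
expanded = [child.tag for child in z]    # the included root elements

print(unexpanded)
print(expanded)
```

Before expansion the children of "z" are the two xi:include elements; after expansion they are the root elements "x" and "y" of the included documents.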

I essentially discuss here the XSD schema design options for validating an XML instance document like z.xml above. Following are the XSD design options (that lead to successful XML instance validation) that currently come to my mind for this need, along with some explanation of the corresponding design rationale:

XS1:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="z">
          <xs:complexType>
               <xs:sequence>
                    <xs:any processContents="skip" minOccurs="2" maxOccurs="2"/>
               </xs:sequence>
          </xs:complexType>
    </xs:element>
   
</xs:schema>

This schema is written with a view that the XML document (i.e z.xml) would be validated with the XInclude meta-data unexpanded. The xs:any wild-card in this schema would weakly validate each of the two child elements of "z" (the <xi:include> elements that stand for the included root elements "x" and "y"). The validation is weak because the wild-card declaration only requires *some* XML element to be present in the instance document, and doesn't specify any other constraint on the corresponding XML instance elements.

XS2:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

        <xs:element name="z">
                <xs:complexType>
                     <xs:complexContent>
                         <xs:restriction base="T1">
                              <xs:sequence>
                                   <xs:element name="include"  minOccurs="2" maxOccurs="2" targetNamespace="http://www.w3.org/2001/XInclude"/>
                             </xs:sequence>
                         </xs:restriction>
                    </xs:complexContent>
                </xs:complexType>
        </xs:element>
   
    <xs:complexType name="T1" abstract="true">
          <xs:sequence>
               <xs:any processContents="skip" maxOccurs="unbounded"/>
          </xs:sequence>
    </xs:complexType>
   
</xs:schema>

This schema is also written with a view that the XML document (i.e z.xml) would be validated with the XInclude meta-data unexpanded. But this schema specifies slightly stronger XSD validation constraints than the previous example (stronger in the sense that this schema declares an XML element and specifies its name and namespace). This schema needs an XSD 1.1 processor, since the element declaration specifies a "targetNamespace" attribute. An XSD 1.0 version of this design approach is possible, and would involve using an XSD <xs:import> element to import XSD components from the XInclude namespace.

XS3:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

       <xs:element name="z">
              <xs:complexType>
                  <xs:sequence>
                       <xs:any processContents="skip" minOccurs="2" maxOccurs="2" namespace="http://www.w3.org/2001/XInclude"/>
                 </xs:sequence>
                 <xs:assert test="count(*[local-name() = 'include']) = 2"/>
                 <xs:assert test="deep-equal((*[1] | *[2])/@*/name(), ('href','href'))"/>
             </xs:complexType>
      </xs:element>
   
</xs:schema>

This schema is also written with the view that the XML document (i.e. z.xml) would be validated with the XInclude meta-data unexpanded. But this schema enforces XSD validation even more strongly than the example "XS2" above, since it also requires the XInclude attribute "href" to be present on the XInclude meta-data, which the previous schema doesn't enforce. This schema validates the names of the XML instance elements that are intended to be XInclude meta-data via XSD 1.1 <assert> elements (this may not be the best XSD validation approach, but such an XSD design idiom is now possible with the XSD 1.1 language).

XS4:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="z">
         <xs:complexType>
               <xs:sequence>
                    <xs:element name="x">
                         <xs:complexType>
                             <xs:sequence>
                                  <xs:element name="a" type="xs:integer"/>
                                 <xs:element name="b" type="xs:integer"/>
                            </xs:sequence>
                        </xs:complexType>
                    </xs:element>
                    <xs:element name="y">
                         <xs:complexType>
                             <xs:sequence>
                                  <xs:element name="p" type="xs:integer"/>
                                  <xs:element name="q" type="xs:integer"/>
                             </xs:sequence>
                        </xs:complexType>
                   </xs:element>
              </xs:sequence>
         </xs:complexType>
     </xs:element>
   
</xs:schema>

This schema is written with the view that the XML document (i.e. z.xml) would be validated with the XInclude meta-data expanded. It specifies the strongest XSD validation constraints of the four approaches (strongest in the sense that the internal structure of the XML element instances "x" and "y" is now completely specified by the XSD document).

But to make this XSD validation approach work, the XInclude meta-data needs to be expanded, and the expanded XML infoset needs to be supplied to the XSD validator for validation. This requires an XInclude processor (like Apache Xerces) that plugs into the XML parsing stage to expand the <xi:include> tags.

For interested readers, following are a few Java code snippets (the skeletal class structure and imports are omitted to keep the text shorter) that enable XInclude processing and supply the resulting XML infoset (i.e. after the XInclude meta-data expansion) to the Xerces XSD validator,

try {           
     Schema schema = schemaFactory.newSchema(getSaxSource(xsdUri, false));
     Validator validator = schema.newValidator();
     validator.setErrorHandler(new ValidationErrHandler());
     validator.validate(getSaxSource(xmlUri, true));
}
catch(SAXException se) {
     se.printStackTrace();
}
catch (IOException ioe) {
     ioe.printStackTrace();
}

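private SAXSource getSaxSource(String docUri, boolean isInstanceDoc) throws SAXException {

     // Let a failure to create or configure the parser propagate to the caller (which already
     // catches SAXException). Swallowing it here would hand back a SAXSource without an
     // XMLReader, and the instance would then be parsed without XInclude expansion.
     XMLReader reader = XMLReaderFactory.createXMLReader();

     if (isInstanceDoc) {
         // expand the <xi:include> elements while parsing the instance document
         reader.setFeature("http://apache.org/xml/features/xinclude", true);
         // don't add xml:base attributes to the included roots (the schema doesn't declare them)
         reader.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
     }

     return new SAXSource(reader, new InputSource(docUri));

}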
     
class ValidationErrHandler implements ErrorHandler {

      public void error(SAXParseException spe) throws SAXException {
           String formattedMesg = getFormattedMesg(spe.getSystemId(), spe.getLineNumber(), spe.getColumnNumber(), spe.getMessage());
           System.err.println(formattedMesg);
      }

      public void fatalError(SAXParseException spe) throws SAXException {
             String formattedMesg = getFormattedMesg(spe.getSystemId(), spe.getLineNumber(), spe.getColumnNumber(), spe.getMessage());
             System.err.println(formattedMesg);
      }

      public void warning(SAXParseException spe) throws SAXException {
           // NO-OP           
      }
       
}

private String getFormattedMesg(String systemId, int lineNo, int colNo, String mesg) {
      return systemId + ", line "+lineNo + ", col " + colNo + " : " + mesg;   
}
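
For readers who would like to try the expanded mode without the surrounding class structure, here is a self-contained sketch of the same idea using only JAXP calls (`DocumentBuilderFactory.setXIncludeAware` plus the Xerces "fixup-base-uris" feature also used above). The contents of x.xml, y.xml and z.xml, the temporary directory and the class name are my own assumptions for illustration; the schema is a compact copy of XS4, which needs only XSD 1.0.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class XIncludeExpandThenValidate {

    // Compact copy of schema XS4 from this post.
    static final String XS4 =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='z'><xs:complexType><xs:sequence>"
      + "<xs:element name='x'><xs:complexType><xs:sequence>"
      + "<xs:element name='a' type='xs:integer'/><xs:element name='b' type='xs:integer'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "<xs:element name='y'><xs:complexType><xs:sequence>"
      + "<xs:element name='p' type='xs:integer'/><xs:element name='q' type='xs:integer'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static Path write(Path file, String text) throws Exception {
        return Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Parses the instance with XInclude expansion switched on and returns the expanded DOM.
    static Document parseExpanded(Path xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);  // XInclude processing needs namespace awareness
        dbf.setXIncludeAware(true);   // standard JAXP switch for XInclude
        // Xerces feature: don't add xml:base to the included roots, since XS4 doesn't declare it.
        dbf.setFeature("http://apache.org/xml/features/xinclude/fixup-base-uris", false);
        return dbf.newDocumentBuilder().parse(xml.toFile());
    }

    // Writes hypothetical x.xml, y.xml and z.xml, expands z.xml and validates the result against XS4.
    static boolean demo(String xContent) {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            write(dir.resolve("x.xml"), xContent);
            write(dir.resolve("y.xml"), "<y><p>5</p><q>6</q></y>");
            Path z = write(dir.resolve("z.xml"),
                "<z xmlns:xi='http://www.w3.org/2001/XInclude'>"
              + "<xi:include href='x.xml'/><xi:include href='y.xml'/></z>");
            Path xsd = write(dir.resolve("xs4.xsd"), XS4);

            Document expanded = parseExpanded(z);
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(xsd.toFile()).newValidator();
            try {
                validator.validate(new DOMSource(expanded));
                return true;
            } catch (SAXException invalid) {
                return false;
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("integer content valid? " + demo("<x><a>1</a><b>2</b></x>"));
        System.out.println("text content valid?    " + demo("<x><a>one</a><b>2</b></x>"));
    }
}
```

Here the validator sees the elements "x" and "y" in place of the <xi:include> directives, which is what lets XS4 constrain their internal structure.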

Summary: Is devising the various XSD design approaches above beneficial for a schema design that involves validating XML instance documents containing <xi:include> meta-data directives? My thought process regarding the XSD validation options presented above had the following concerns:
1) Providing various degrees of XSD validation strength for <xi:include> directives (essentially the unexpanded and expanded modes).
2) Exploring some of the new XML validation idioms offered by the XSD 1.1 language for the use cases presented above (essentially using the "targetNamespace" attribute on xs:element elements, and using <assert> elements).
3) Exploring the Java SAX and JAXP APIs to enable XInclude meta-data expansion, and providing a SAXSource object containing an XInclude-expanded XML infoset, which is subsequently supplied to the XSD validation pipeline.

I hope that this post was useful.

Sunday, February 5, 2012

"castable as" vs "instance of" XPath 2.0 expressions for XSD 1.1 assertions

I'm continuing with my thoughts related to my previous blog post (ref, http://mukulgandhi.blogspot.in/2012/01/using-xsd-11-assertions-on-complextype.html). The earlier post used the XPath 2.0 "castable as" expression to do some checks on the 'untyped' data of complexType's mixed content (essentially finding if the string/untyped value in an XML instance document is a lexical representation of an xs:integer value).

This post talks about the use of XPath 2.0 "instance of" vs "castable as" expressions in context of XSD 1.1 assertions -- essentially providing guidance about when it may be necessary to use one of these expressions.

The XSD 1.1 "castable as" use case was discussed in my earlier blog post. Here I essentially talk about "instance of" expression when used with XSD 1.1 assertions.

Let's assume that there is an XML instance document like following (XML1):

<X>
   <elem>
     <a>20</a>
     <b>30</b>
   </elem>
   <elem>
     <a>10</a>
     <b>2005-10-07</b>
   </elem>
</X>

The XSD schema should express the following constraints with respect to the above XML instance document (XML1):
1. The elements "a" and "b" can be typed as an xs:integer or an xs:date (therefore we'll express this with an XSD simpleType with variety 'union').
2. If both the elements "a" and "b" are of type xs:integer (this is allowable as per the simpleType definition described in point 1 above), then the numeric value of element "a" should be less than the numeric value of element "b".
3. If one of the elements "a" or "b" is an xs:integer and the other one is an xs:date, then we would like to express the following constraints,
   - the numeric XML instance value of the xs:integer typed element should be less than 100
   - the xs:date XML instance value should be less than the current date

The following XSD (1.1) schema document describes all of the above validation constraints for a sample XML instance document (XML1) provided above:

[XS1]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:element name="X">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="elem" maxOccurs="unbounded">
                 <xs:complexType>
                    <xs:sequence>
                       <xs:element name="a" type="union_of_date_and_integer"/>
                       <xs:element name="b" type="union_of_date_and_integer"/>
                    </xs:sequence>
                    <xs:assert test="if ((data(a) instance of xs:integer) and (data(b) instance of xs:integer))
                                              then (data(a) lt data(b))
                                           else if (not(deep-equal(data(a), data(b))))
                                              then (*[data(.) instance of xs:integer]/data(.) lt 100
                                                         and
                                                      *[data(.) instance of xs:date]/data(.) lt current-date())
                                              else true()"/>
                 </xs:complexType>
              </xs:element>
           </xs:sequence>
        </xs:complexType>
     </xs:element>
   
     <xs:simpleType name="union_of_date_and_integer">
        <xs:union memberTypes="xs:date xs:integer"/>
     </xs:simpleType>
   
</xs:schema>

I think it may be interesting for readers to know why I wrote an assertion like the one above. Following are a few of my thoughts,
1. Since the XML elements "a" and "b" are typed with a simpleType 'union', an assertion that needs the XML instance atomic values validated by such a simpleType has to use the XPath 2.0 "data" function on the relevant XDM node (the elements "a" and "b" in this case). To further determine that an atomic instance value is typed as xs:integer, we need the "instance of" expression. "castable as" is not needed in this case, since the instance document's data is already typed.
2. The rest of the assertion implements what is mentioned in the requirements above.

If you want more visual and/or design elegance than the single assertion above offers, feel free to break the rules into two or more assertions.
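
To make the three stated constraints a little more concrete, here is a plain Java analogue of them (not of the XPath expression itself, and not something an XSD processor would run). The class and method names are hypothetical. The sketch assumes xs:date values without a timezone, tries the union's member types in declaration order (xs:date, then xs:integer), and simply accepts the case of two xs:date values, which the requirements above don't cover; that last choice is my assumption and not necessarily what the assertion does.

```java
import java.math.BigInteger;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

class UnionConstraintSketch {

    // Mimics validation against the union: try xs:date first, then xs:integer.
    static Object typed(String lexical) {
        String s = lexical.trim();
        try {
            return LocalDate.parse(s);   // ISO yyyy-MM-dd, no timezone (assumption)
        } catch (DateTimeParseException notADate) {
            return new BigInteger(s);    // throws NumberFormatException if neither member type fits
        }
    }

    static boolean check(String a, String b, LocalDate today) {
        Object x = typed(a);
        Object y = typed(b);
        if (x instanceof BigInteger && y instanceof BigInteger) {
            // constraint 2: both integers, so "a" must be less than "b"
            return ((BigInteger) x).compareTo((BigInteger) y) < 0;
        }
        if (x instanceof LocalDate && y instanceof LocalDate) {
            // not covered by the stated requirements; accepted here (assumption)
            return true;
        }
        // constraint 3: one integer and one date
        BigInteger n = (BigInteger) (x instanceof BigInteger ? x : y);
        LocalDate d = (LocalDate) (x instanceof LocalDate ? x : y);
        return n.compareTo(BigInteger.valueOf(100)) < 0 && d.isBefore(today);
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(check("20", "30", today));          // first "elem" of XML1
        System.out.println(check("10", "2005-10-07", today));  // second "elem" of XML1
    }
}
```

The `instanceof` tests here correspond to the "instance of" checks in the assertion: the values are already typed, so no casting from strings is involved once the union has been resolved.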

I would also like to show another XSD 1.1 assertion example, one that uses neither the XPath 2.0 "castable as" nor the "instance of" expression. It demonstrates that if the XDM nodes an assertion works on are already typed, "castable as" is usually unnecessary (since "castable as" is essentially useful for programmatically enforcing typing on string/untyped values), and "instance of" is needed only in some cases.

Following is a slightly modified variant of the XML instance document specified above (XML1):

[XML2]

<X>
   <elem>
     <a>2</a>
     <b>2012-02-04</b>
   </elem>
   <elem>
     <a>10</a>
     <b>2005-10-07</b>
   </elem>
</X>

The XSD schema should express the following constraints with respect to the above XML instance document (XML2):
1. The element "a" is typed as an xs:nonNegativeInteger value, and element "b" is typed as xs:date.
2. The number of days equal to the numeric value specified in an element "a" if added to the xs:date value specified in an element "b", should result in an xs:date value which must be less than the current date.

The following XSD (1.1) schema document describes all of the above validation constraints for a sample XML instance document (XML2) provided above:

[XS2]

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:element name="X">
        <xs:complexType>
           <xs:sequence>
              <xs:element name="elem" maxOccurs="unbounded">
                 <xs:complexType>
                    <xs:sequence>
                       <xs:element name="a" type="xs:nonNegativeInteger"/>
                       <xs:element name="b" type="xs:date"/>
                    </xs:sequence>
                    <xs:assert test="(b + xs:dayTimeDuration(concat('P', a, 'D'))) lt current-date()"/>
                 </xs:complexType>
              </xs:element>
           </xs:sequence>
        </xs:complexType>
     </xs:element>
   
</xs:schema>
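
The arithmetic in this assertion can also be checked by hand with a few lines of plain Java. This is only an analogue of the XPath expression, with a hypothetical class and method name, and it assumes xs:date values without a timezone. The "current date" is passed in as a parameter so that the result is reproducible.

```java
import java.math.BigInteger;
import java.time.LocalDate;

class DatePlusDaysCheck {

    // Mirrors: (b + xs:dayTimeDuration(concat('P', a, 'D'))) lt current-date()
    static boolean holds(String a, String b, LocalDate today) {
        BigInteger days = new BigInteger(a.trim());
        if (days.signum() < 0) {
            throw new IllegalArgumentException("element 'a' must be an xs:nonNegativeInteger");
        }
        LocalDate shifted = LocalDate.parse(b.trim()).plusDays(days.longValueExact());
        return shifted.isBefore(today);  // strictly less than, like "lt"
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.now();
        System.out.println(holds("2", "2012-02-04", today));   // first "elem" of XML2
        System.out.println(holds("10", "2005-10-07", today));  // second "elem" of XML2
    }
}
```

Note that the result depends on the day of validation: for the first "elem" of XML2, the shifted date is 2012-02-06, so the assertion only starts to hold from 2012-02-07 onwards.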

That's all I had to say today.

I hope this post was useful.

Thursday, January 26, 2012

Using XSD 1.1 assertions on complexType mixed contents

Some interesting ;) thoughts have been coming to my mind lately, and not surprisingly they're again related to XSD. I was playing with XSD 1.1 assertions once again, trying to constrain an XSD complexType{mixed} content model, and I'm sharing some of my findings here. (I don't think I've written about this particular topic before, on this blog or on any other forum. If you find that anything in this post duplicates something I've written elsewhere, kindly ignore the earlier writing.) Now to the topic.

What is XSD mixed content (you may skip this if you already know about it)?
I believe this isn't really an XSD-only topic. Mixed content is present in plain XML (a good old well-formed XML document may have "mixed" content and needn't be validated at all, i.e. in a schema-free XML environment). But XSD allows such an XML instance document to be reported as 'valid' (more importantly, XSD reports a "mixed" content XML instance as 'invalid' if it is validated against an "element only" content model specified by an XSD complexType definition), and it also allows XML mixed content to be constrained in certain ways (with XSD 1.1 in some new ways, which I'll talk about further below).

Example of "element only" (content of element "X" here) XML content model [X1]:

<X>
  <Y/>
  <Z/>
</X>

Example of "mixed content" (content of element "X" here) XML content model [X2]:

<X>
  abc
  <Y/>
  123
  <Z/>
  654
</X> 

Therefore, "mixed content" allows non-whitespace text nodes as siblings of element nodes.

XSD 1.0 schema definition that allows "mixed" content [XS1]:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">    
        <xs:complexType mixed="true">
             <xs:sequence>
                 <xs:element name="Y"/>
                 <xs:element name="Z"/>
             </xs:sequence>
        </xs:complexType>
    </xs:element>
    
</xs:schema>

This schema (XS1) would report the XML document "X2" above as 'valid' (since that instance document has "mixed" content, and this schema allows "mixed" content via a property "mixed = 'true'" on a complexType definition).

But if we remove the property specifier "mixed = 'true'" from the schema document "XS1" above, or set the value of the attribute "mixed" to 'false' (which is also the default value of this attribute), then the modified schema would report the XML instance document "X2" above as 'invalid' (but the XML document "X1" above would be reported as 'valid', since it doesn't have "mixed" content).
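
The behaviour described above can be reproduced with the XSD 1.0 validator that ships with the JDK. The following is a minimal sketch; the class and method names are my own, and the instance documents are compact versions of "X1" and "X2".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class MixedContentValidation {

    static final String X1 = "<X><Y/><Z/></X>";            // element-only content
    static final String X2 = "<X>abc<Y/>123<Z/>654</X>";   // mixed content

    // Schema XS1 from this post, with the "mixed" attribute made configurable.
    static String schema(boolean mixed) {
        return "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
             + "<xs:element name='X'><xs:complexType mixed='" + mixed + "'><xs:sequence>"
             + "<xs:element name='Y'/><xs:element name='Z'/>"
             + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    }

    static boolean isValid(String xsd, String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException invalid) {
                return false;   // the default error handling throws on the first validation error
            }
        } catch (SAXException | IOException e) {
            throw new IllegalStateException(e);   // schema problems or I/O failures are not "invalid"
        }
    }

    public static void main(String[] args) {
        System.out.println("X2 with mixed='true':  " + isValid(schema(true), X2));
        System.out.println("X2 with mixed='false': " + isValid(schema(false), X2));
        System.out.println("X1 with mixed='false': " + isValid(schema(false), X1));
    }
}
```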

New capabilities provided by XSD 1.1 to constrain XML "mixed" content further:

Following is a list of new features supported by XSD 1.1 for XML "mixed" contents, that currently come to my mind,

a)

XSD 1.1 schema "XS2":
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">    
       <xs:complexType mixed="true">
          <xs:sequence>
             <xs:element name="Y"/>
             <xs:element name="Z"/>
          </xs:sequence>          
          <xs:assert test="deep-equal(text()[matches(.,'\w')]/normalize-space(.), ('abc','123','654'))"/>
       </xs:complexType>
    </xs:element>
    
</xs:schema>
The <assert> element in this schema (XS2) constrains the mixed content in the XML instance document to be a list (with the order of the list items being significant) of only a few specified values. The assertion is written only to illustrate the technical capabilities of assertions, not with any particular application in mind.

Following are a few other things that XSD 1.1 assertions can achieve in the context of an XML "mixed" content model:

b)
<xs:assert test="((text()[matches(.,'\w')]/normalize-space(.))[2] castable as xs:integer)
                    and
                 ((text()[matches(.,'\w')]/normalize-space(.))[3] castable as xs:integer)"/>

This assertion constrains specific items of an XML "mixed" content model list to be of a specified XSD schema type -- here the 2nd and 3rd items of the list need to be typed as xs:integer, whereas the first item is "untyped".

c)
<xs:assert test="count((text()[matches(.,'\w')]/normalize-space(.))[. castable as xs:integer])
                    =
                 count(text()[matches(.,'\w')]/normalize-space(.))"/>

This assertion constrains all items of the XML "mixed" content model list to be of the same type (xs:integer in this case) -- this uses a well defined pattern "count of xs:integer items is equal to the count of all the items".

d)
<xs:assert test="every $x in text()[matches(.,'\w')][position() gt 1]
                   satisfies 
                (number(normalize-space($x)) gt number($x/preceding-sibling::text()[matches(.,'\w')][1]))"/>

This assertion constrains the list of an XML "mixed" content model to be in ascending numeric order. It assumes that all items in the list are numeric, though it should also be possible to work with a heterogeneously typed list and to require the numeric order only among its numeric items.
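
To show what assertions (b), (c) and (d) compute, here is a plain Java analogue that walks the DOM instead of evaluating XPath. It is only a sketch for experimenting outside an XSD 1.1 processor, with hypothetical class and method names. Note that Java's `\w` is narrower than the `\w` of XSD/XPath regular expressions; for the ASCII samples in this post the two agree.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class MixedContentChecks {

    private static final Pattern WORD = Pattern.compile("\\w");
    private static final Pattern INTEGER = Pattern.compile("[+-]?[0-9]+");

    // Analogue of: text()[matches(.,'\w')]/normalize-space(.)
    static List<String> items(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setCoalescing(true);  // merge CDATA sections into adjacent text nodes
            Element root = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            List<String> result = new ArrayList<>();
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.TEXT_NODE && WORD.matcher(n.getNodeValue()).find()) {
                    result.add(n.getNodeValue().trim().replaceAll("\\s+", " "));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isInteger(String s) {
        return INTEGER.matcher(s).matches();
    }

    // (b) the 2nd and 3rd items are castable as xs:integer
    static boolean secondAndThirdAreIntegers(List<String> items) {
        return items.size() >= 3 && isInteger(items.get(1)) && isInteger(items.get(2));
    }

    // (c) every item is castable as xs:integer
    static boolean allIntegers(List<String> items) {
        return items.stream().allMatch(MixedContentChecks::isInteger);
    }

    // Rough stand-in for the XPath number() function: NaN when the string is not numeric.
    static double toNumber(String s) {
        try {
            return Double.parseDouble(s);
        } catch (NumberFormatException e) {
            return Double.NaN;
        }
    }

    // (d) each item is numerically greater than the item before it
    static boolean ascending(List<String> items) {
        for (int i = 1; i < items.size(); i++) {
            if (!(toNumber(items.get(i)) > toNumber(items.get(i - 1)))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> x2 = items("<X>\n  abc\n  <Y/>\n  123\n  <Z/>\n  654\n</X>");
        System.out.println(x2 + " b=" + secondAndThirdAreIntegers(x2)
                + " c=" + allIntegers(x2) + " d=" + ascending(x2));
    }
}
```

As with the XPath version, a comparison against a non-numeric item (NaN) is false, so (d) fails for the document "X2" because of its first item "abc".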

Summary: XSD 1.0 allowed only "untyped" XML mixed content, which was uniformly permitted anywhere within the scope of an XML element validated by an XSD complexType. No further constraints on "mixed" content were possible in an XSD 1.0 environment. XSD 1.1 allows some new ways to constrain XML "mixed" content further (some of these capabilities are illustrated in the examples above). In my opinion, the likely benefit of constraining XML "mixed" content in these ways is that it allows XML document authors to model certain semantic content within the "mixed" content scope, and to make this knowledge available to XML applications. All the examples above were tested with Apache Xerces (I hope that these examples are also compliant with other XSD validators, notably Saxon, which currently also supports XSD 1.1).

I hope that this information was useful.



Tuesday, July 26, 2011

[revisiting] Xerces-J XSModel serializer

I started playing a bit with the Xerces-J XSSerializer utility and thought of writing something about its features. (It's actually a sample within Xerces-J, introduced in Xerces-J 2.10.0. The version in SVN is slightly better and will be released with a future Xerces release. The utility serializes an in-memory Xerces XSModel instance into lexical XSD syntax.)

The XSModel serializer has the following two important serialization features/options (currently the only ones):
1. Selecting the XSD language version that the XSModel serializer should work with. By default this is XSD 1.0, but it can be set to XSD 1.1 via the following command line parameter, {-version 1.1}. There are very few XSD 1.1 features that the XSModel serializer currently supports, and we'll try to add more of them in future. But the XSD 1.0 support in Xerces's XSModel serializer is fairly complete.
2. The XSD language prefix in the serialization output can be configured with the option {-prefix <prefix-value>}, e.g. "-prefix xsd". If this option is not specified, the prefix "xs" is generated by default during XSModel instance serialization.

I've made a few interesting observations while using the Xerces XSSerializer (illustrated with small examples below),

1. I supplied the following XSD document (only the element declaration is shown, since this is the focus of this point) to the XSModel serializer,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
           <xs:restriction base="xs:string">
              <xs:minLength value="5"/>
           </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>

and the XSModel serializer echoed this element instance (the XSModel serializer converted the lexical schema into XSModel instance, and then serialized the XSModel again to lexical XSD syntax) to following,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:string">
               <xs:whiteSpace value="preserve"/>
               <xs:minLength value="5"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>

The interesting thing I notice in this example is, the generation of the built in facet "whiteSpace" for the XSD type xs:string.

2. Serializing the following XSD element,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:integer">
               <xs:minInclusive value="5"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>
produces the following round-trip output with the XSModel serializer,
<xs:element name="E1">
   <xs:simpleType>
      <xs:list>
         <xs:simpleType>
            <xs:restriction base="xs:integer">
               <xs:whiteSpace value="collapse"/>
               <xs:fractionDigits value="0"/>
               <xs:minInclusive value="5"/>
               <xs:pattern value="[\-+]?[0-9]+"/>
            </xs:restriction>
         </xs:simpleType>
      </xs:list>
   </xs:simpleType>
</xs:element>
This shows the built-in facets of the XSD type xs:integer ("whiteSpace", "fractionDigits" and others).

I personally like this feature of the XSModel serializer: it is able to generate certain hidden properties of XML Schema components, which schema authors normally don't specify while writing schema documents for applications.

3. I provided the following XSD Schema fragment to XSModel serializer (a complexType referring to a model group),
<xs:element name="E1">
  <xs:complexType>
     <xs:group ref="gp1"/>
  </xs:complexType>
</xs:element>
   
<xs:group name="gp1">
   <xs:sequence>
      <xs:element name="x" type="xs:string"/>
      <xs:element name="y" type="xs:string"/>
   </xs:sequence>
</xs:group>

and the XSModel serializer generated the following round-trip serialization result,
<xs:element name="E1">
   <xs:complexType>
      <xs:sequence>
         <xs:element name="x" type="xs:string"/>
         <xs:element name="y" type="xs:string"/>
      </xs:sequence>
   </xs:complexType>
</xs:element>

<xs:group name="gp1">
   <xs:sequence>
      <xs:element name="x" type="xs:string"/>
      <xs:element name="y" type="xs:string"/>
   </xs:sequence>
</xs:group>
The global "model group" is serialized as expected. But the complexType within the element declaration was serialized with its element declarations expanded. The lexical group reference is not present in the serialized output.

At first the absence of the model group reference in the serialized output may look odd. But the fact is that a Xerces XSModel instance, in its complete compiled form, doesn't know whether a group particle (xs:sequence in this case) comes from a group reference, and I had to live with this XSModel serialization characteristic. The serialized schema output in this example is nevertheless equivalent, from a validation perspective, to the original schema document supplied to the XSModel serializer (though the global group definition in the output is redundant from a validation perspective; it's just a characteristic of the XSModel serializer currently).

That's all I have to say now. Thanks for reading this post.

Saturday, December 18, 2010

Apache Xerces-J 2.11.0 released

I am happy to pass along the announcement made by the Apache Xerces team a few days ago about the release of a new version of Xerces-J, 2.11.0 (ref http://markmail.org/message/oom75s3wpebfywh5). This Xerces release specifically improves compliance with the XML Schema 1.1 language (the detailed release notes for Xerces-J 2.11.0 are available at http://xerces.apache.org/xerces2-j/releases.html).

On behalf of the Xerces team, I hope that the XML and XML Schema community will find this Xerces-J release useful.

References to the XML Schema language:
1. http://www.w3.org/XML/Schema (XML Schema WG Home Page)
2. http://www.w3.org/TR/xmlschema11-1/ (XML Schema 1.1 Structures specification)
3. http://www.w3.org/TR/xmlschema11-2/ (XML Schema 1.1 Datatypes specification)

Saturday, November 27, 2010

XML Schema 1.1: complexType restriction rules

I've been excited enough to write about the new rules that the XML Schema 1.1 spec specifies for type derivations between XML Schema complexType definitions, and about the current compliance of Xerces-J (its XML Schema 1.1 engine) in this area of the XML Schema language. In this blog post I'm covering complexType restriction derivations; I'll try to write about complexType extensions sometime later. I thought that this post might find an audience interested in this topic. Anyone is invited to write a comment on this post, which will help me learn more about type derivations between XML Schema complex types (I'm interested in both restriction and extension derivations), and can also give the Xerces team useful feedback to improve Xerces in desired and compliant ways. Below are my findings from the XML Schema 1.1 spec about this topic, and also Xerces's compliance status in this regard (I acknowledge that my understanding of these areas of the XML Schema language may not yet be complete :).

In the XML Schema 1.0 language, complex type restriction derivation rules are defined by the schema particle restriction rules specified here, http://www.w3.org/TR/xmlschema-1/#coss-particle. There's a 5x5 table in this section which describes what constitutes a valid restriction of XML Schema particles (and which restrictions are forbidden).

In XML Schema 1.1 all of these complexType derivation rules are replaced by sections 3.4.6.3 Derivation Valid (Restriction, Complex) and 3.4.6.4 Content Type Restricts (Complex Content). The mapping table (the 5x5 table) for particle restrictions is removed in XML Schema 1.1, and a generic algorithm is now specified in terms of a subsumption relationship (a kind of containment or association relationship) of default bindings (an abstract notion covering element and attribute declarations, along with the wild-card processing modes "strict", "lax" and "skip"). The XML Schema 1.1 complexType subsumption rules are simpler and easier to remember than the corresponding type derivation rules in the XML Schema 1.0 spec. My personal understanding so far is that the improved default binding particle subsumption rules make XML Schema 1.1 complexType restriction derivations largely compatible with the corresponding type derivation rules in XML Schema 1.0, but the rules are now specified with better wording.

Below are the various complexType restriction cases I've studied so far, along with their characteristics (these have corresponding implementations in Xerces; the upcoming Xerces-J 2.11.0 release would have these features). I'm still trying to discover more of the rules in these areas of the XML Schema language.

xs:sequence, xs:choice and xs:all are the possible compositors in schema complexType definitions (compositors signify how schema particles can be composed within a complexType definition).

A) SEQUENCE TO SEQUENCE RESTRICTIONS
a.1 An xs:element is derived from an xs:any wild-card (both of these particles are part of an XML Schema sequence compositor). In this scenario, the cardinality of the particles takes precedence over the presence of a concrete element in the derived type when determining valid particle derivations.

For e.g. <xs:element name="x" type="xs:string" minOccurs="0"/> is not a valid restriction of <xs:any processContents="lax" />, since the occurrence range of element "x" (minOccurs="0" makes the particle optional, i.e. 0..1) is wider than that of the wild-card particle (which is mandatory, i.e. 1..1).

a.2 There must be a similar (i.e. X-to-X, where X is a positive numerical value) mapping of particles from the 'base' to the 'derived' schema type. That is, a derived type cannot have fewer particles than the base type, and each particle in the derived type must validly derive (i.e. be validly subsumed as per the rules specified in the XML Schema 1.1 spec) from the corresponding particle in the base type.

B) ALL TO SEQUENCE RESTRICTIONS
b.1 This is a valid restriction of a schema compositor and of the particles in it (i.e. an ordered restriction of an unordered base).

For e.g. sequence(b, a) and sequence(a, b) are both valid restrictions of all(a, b) {the order of particles in the derived type doesn't matter}.

b.2 The XML Schema validator recognizes the identity of particles (by the QName of the particles), and corresponding particles must obey the rules of restriction by cardinality (i.e. an optional particle is not a valid restriction of a mandatory particle, where the QNames of the corresponding particles in the base and derived types are the same).

C) ALL TO ALL RESTRICTIONS
c.1 This is an unordered-to-unordered kind of restriction. A concrete element particle is a valid derivation of a wild-card particle.

c.2 The cardinality of identical particles (having the same QName) in the derived type must be the same as or less than that in the base type (which makes the derived particle validly derive from the corresponding particle of the base type). Particle cardinalities take precedence over the generic/concrete relationship between particles when determining valid particle subsumptions.

c.3 The number of leaf particles (which are essentially xs:element and xs:any wild-cards) in the derived and base types must be equal.

D) SEQUENCE TO ALL RESTRICTIONS
This is not a valid schema compositor restriction (i.e from ordered to unordered).

E) CHOICE TO SEQUENCE RESTRICTIONS
e.1 Here are a few examples explaining some of the rules for this category.
  <xs:sequence>
     <xs:element name="c" type="xs:string" />
  </xs:sequence>

is a valid restriction of
  <xs:choice>  
     <xs:any processContents="lax" />
     <xs:element name="b" type="xs:string" />
  </xs:choice>

(the element particle "c" is subsumable by the wild-card)

e.2
  <xs:sequence>
     <xs:any processContents="lax" />
  </xs:sequence>

is not a valid restriction of
  <xs:choice>         
     <xs:element name="a" type="xs:string" />
     <xs:element name="b" type="xs:string" />
  </xs:choice>

This is so because a wild-card is not validly subsumed by an element particle (i.e. deriving something generic from concrete elements is not a valid restriction; it in fact looks more like a "type extension" concept).

F) SEQUENCE TO CHOICE RESTRICTIONS
Here's an example I can think of that corresponds to this kind of use case.
   
   <xs:restriction base="TYPE_BASE">
      <xs:choice>
         <xs:group ref="myGroup" />
      </xs:choice>
   </xs:restriction>
   
   is a valid restriction of
   
   <xs:complexType name="TYPE_BASE">
      <xs:group ref="myGroup" />
   </xs:complexType>
   
   <xs:group name="myGroup">
      <xs:sequence>
         <xs:element name="a" type="xs:string" />
         <xs:element name="b" type="xs:string" />
      </xs:sequence>
   </xs:group>

But this is not a useful schema type restriction, since the xs:choice in the derived type offers only one option, which is the same as the contents of the sequence of the base type.

Other than the above example, I cannot envision any other practically useful example of a "sequence to choice" restriction. I would imagine that schema authors need not bother much about "sequence to choice" restriction scenarios, as this doesn't look like a good and useful schema design scenario (but I don't deny that people may find valid uses of this as well :).

G) CHOICE TO CHOICE RESTRICTIONS
Here are few of the examples I can think of that satisfy this use-case (these I've found to be working fine with Xerces as well):

g.1 choice(a, c) is not a valid restriction of choice(a, b). Because element "c" in derived type doesn't have a corresponding element particle in the base type.

g.2
- choice(a, b) is a valid restriction of choice(a, wild-card processContents="lax"). However, if the wild-card can resolve to an element declaration that doesn't match the element declaration "b", then this is NOT a valid restriction.
- choice(a, b) is a valid restriction of choice(a, wild-card processContents="strict") if the wild-card can resolve to an element declaration for "b", otherwise not.
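
The "lax" variant of g.2 might look as follows. This is a sketch under assumptions: the type names BASE and DERIVED are hypothetical, no global declaration for "b" is in scope, and an XSD 1.1 processor is assumed (so that the element particle "a" and the wild-card may sit side by side in one choice).

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- base type (hypothetical name): choice(a, wild-card) -->
   <xs:complexType name="BASE">
      <xs:choice>
         <xs:element name="a" type="xs:string" />
         <xs:any processContents="lax" />
      </xs:choice>
   </xs:complexType>

   <!-- derived type (hypothetical name): choice(a, b).
        Element "b" is matched against the wild-card of BASE. -->
   <xs:complexType name="DERIVED">
      <xs:complexContent>
         <xs:restriction base="BASE">
            <xs:choice>
               <xs:element name="a" type="xs:string" />
               <xs:element name="b" type="xs:string" />
            </xs:choice>
         </xs:restriction>
      </xs:complexContent>
   </xs:complexType>

</xs:schema>
```

Adding a global element declaration for "b" with a different type is one way to probe the NOT-a-valid-restriction outcome described in the first bullet.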

g.3 choice(group name="myGroup", a) is a valid restriction of choice(group name="myGroup", xs:any processContents="lax"). Here the model group instance is considered as a particle. But if the wild-card resolves to an element declaration that doesn't match the element declaration "a", then this is NOT a valid restriction.

g.4 choice(group name="myGroup", a) is not a valid restriction of choice(group name="myGroup", <xs:any/>). But it is a valid restriction if the wild-card <xs:any> can find a definition of element "a" from which the element "a" in the derived type can be derived (i.e. which is a valid subsumption of it).

These are all the cases I can think of at the moment (enumerated A to G) which might occur for restriction between XML Schema 1.1 complexTypes. I believe there are a few more complexType restriction cases, which I'll try to post on this blog as I discover them.

I hope that this post was useful.

Sunday, October 10, 2010

XSD 1.1: XML schema design approaches contd... PART 4

In this blog post I'm trying to describe a few more XML Schema use-cases (I find the subject matter interesting enough to merit a new blog post, and I'm trying to cook up XSD 1.1 examples :). They largely use XSD 1.1 assertions and are now solvable with XML Schema 1.1 (for example, constraining the cardinality of XML Schema xs:list items, as described below); in my view, they couldn't be solved with XML Schema 1.0.

I hope the XML Schema community finds a few of the things here interesting.

This post can be considered PART 4 of the XML Schema 1.1 design series that I started a couple of weeks ago. The previous parts of this series are available here:

1) PART 1
2) PART 2
3) PART 3

I'm using the latest XML Schema 1.1 code-base from the Xerces-J SVN repository.

Use-case: (A)
The examples in this post illustrate how we can constrain the cardinality of XML Schema 1.1 xs:list instance members, and optionally constrain a few aspects of the list members (for example, that list items need to be even integers). The latter is mainly to verify for myself how XSD 1.1 assertions behave in various combinations.

Here's an XML instance document (this describes a simple enough list of integers encapsulated in an XML element "X"), which I'll use for illustrations in this post:

[XML 1] (named temp.xml)
  <X>2 4 6 5 10 3</X>

Below are a few XML Schema 1.1 examples, followed by explanations from my point of view:

[XML Schema 1]
  <?xml version='1.0'?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="X">
       <xs:complexType>
         <xs:simpleContent>
            <xs:restriction base="INT_LIST">
              <xs:assertion test="count($value) le 5" />
            </xs:restriction>
         </xs:simpleContent>
       </xs:complexType>
    </xs:element>
   
    <xs:complexType name="INT_LIST">
       <xs:simpleContent>
         <xs:restriction base="xs:anyType">
            <xs:simpleType>
               <xs:list itemType="xs:integer" />          
            </xs:simpleType>
            <xs:assert test="every $x in $value satisfies ($x mod 2 = 0)" />
         </xs:restriction>
       </xs:simpleContent> 
    </xs:complexType>

  </xs:schema>

[XML Schema 2]
  <?xml version='1.0'?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
     <xs:element name="X">
        <xs:simpleType>
          <xs:restriction base="INT_LIST">
             <xs:assertion test="$value mod 2 = 0" />
          </xs:restriction>
        </xs:simpleType>
     </xs:element>
   
     <xs:simpleType name="INT_LIST">
       <xs:list itemType="xs:integer" />
     </xs:simpleType>

  </xs:schema>

[XML Schema 3]
  <?xml version='1.0'?>
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   
    <xs:element name="X">
      <xs:complexType>
        <xs:simpleContent>
          <xs:extension base="INT_LIST">
             <xs:assert test="count($value) le 5" />
          </xs:extension>
        </xs:simpleContent>
      </xs:complexType>
    </xs:element>
   
    <xs:simpleType name="INT_LIST">
       <xs:list itemType="xs:integer" />
    </xs:simpleType>

  </xs:schema>

Here are the results of validating the XML instance (document [XML 1]) with the specified schemas:

1. When the XML document ([XML 1]) is validated by the schema [XML Schema 1], we get the following validation outcomes with Xerces:
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('every $x in $value satisfies ($x mod 2 = 0)') for element 'X' with type 'INT_LIST' did not succeed.
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('count($value) le 5') for element 'X' with type '#anonymous' did not succeed.


2. When the XML document ([XML 1]) is validated by the schema [XML Schema 2], we get the following validation outcomes (with Xerces):
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'X' with type '#anonymous' did not succeed. Assertion failed for an xs:list member value '5'.
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('$value mod 2 = 0') for element 'X' with type '#anonymous' did not succeed. Assertion failed for an xs:list member value '3'.


3. When the XML document ([XML 1]) is validated by the schema [XML Schema 3], we get the following validation outcome (with Xerces):
[Error] temp.xml:1:20: cvc-assertion.3.13.4.1: Assertion evaluation ('count($value) le 5') for element 'X' with type '#anonymous' did not succeed.

Here's some quick analysis from my point of view, with regards to what I wanted to achieve with these use-cases (A):

In XML Schema 1.1 assertions, the XPath 2.0 context variable "$value" has the type annotation xs:anyAtomicType*.

1. The first validation result (1. above) illustrates that every item of the xs:list needs to be an even integer, and that the number of list items is constrained to a maximum of "5" (this is a sample "max" limit on the number of list items).

2. I intended to use validation results 2. and 3. in combination, doing a boolean "AND" of them, to achieve the same XML instance validation objective as case 1. The boolean "AND" of two schema validations can be achieved with, for example, the Java JAXP validation API. I wrote the XML Schema document [XML Schema 2] so that the XML Schema validator reports each individual list item that does not pass the test of mathematical evenness. This was not entirely achieved with the schema document [XML Schema 1], where the validator detected an evenness failure for the whole list instance but didn't report every individual list item that failed the evenness test.
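
As a possible single-schema alternative to combining two validations, the evenness test could be moved onto the list's item type, with the cardinality test kept on the list itself. This is a sketch, not something from the runs above: the type names EVEN_INT and EVEN_INT_LIST are hypothetical, and how a given validator words or groups the per-item failures is left open.

```xml
<?xml version='1.0'?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:simpleType>
         <xs:restriction base="EVEN_INT_LIST">
            <!-- $value is the whole list here, so count() gives
                 the number of list items -->
            <xs:assertion test="count($value) le 5" />
         </xs:restriction>
      </xs:simpleType>
   </xs:element>

   <!-- hypothetical name: a list whose items are even integers -->
   <xs:simpleType name="EVEN_INT_LIST">
      <xs:list itemType="EVEN_INT" />
   </xs:simpleType>

   <!-- hypothetical name: the assertion is evaluated once per list
        item, with $value bound to that single integer -->
   <xs:simpleType name="EVEN_INT">
      <xs:restriction base="xs:integer">
         <xs:assertion test="$value mod 2 = 0" />
      </xs:restriction>
   </xs:simpleType>

</xs:schema>
```

With the instance [XML 1], both the items "5" and "3" and the item count of 6 would be expected to fail under this schema.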

I hope the intent of the use-case described here, and the solutions offered, are explained clearly enough for an XML Schema audience.

Thanks for reading, and as usual I hope that this blog post was interesting!