Sunday, November 13, 2016

XML Schema : <assert> helps us process wild-cards and attributes

Please let me illustrate my point with the following XML Schema (1.1) example:

XML Schema document:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
     <xs:complexType>
        <xs:sequence>
           <xs:any processContents="skip" minOccurs="3" maxOccurs="3"/>
        </xs:sequence>
        <xs:attribute name="y1" type="xs:string"/>
        <xs:attribute name="y2" type="xs:string"/>
        <xs:attribute name="y3" type="xs:string"/>
        <xs:assert test="deep-equal(for $el in * return name($el), for $at in @* return name($at))"/>
     </xs:complexType>
   </xs:element>
 
</xs:schema>

This schema document says the following:
1) The wild-card requires exactly 3 element nodes.
2) Three attributes (y1, y2 and y3) are declared, matching the number of elements.
3) The <assert> says that the names of the elements validated by the wild-card must be the same as the names of the attributes (deep-equal compares the two sequences of names item by item).

Here's a valid XML document for the above schema document:
<?xml version="1.0" encoding="UTF-8"?>
<X y1="A" y2="B" y3="C">
  <y1>A</y1>
  <y2>B</y2>
  <y3>C</y3>
</X>
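
One caveat worth sketching: in the XPath 2.0 data model the relative order of attribute nodes is implementation-dependent, so a deep-equal() over @* may depend on the processor. The following is a minimal, order-independent variant of the <assert>, assuming that comparing the names as sets is acceptable (this also means the child elements may appear in any order, which the original assert does not allow). It is only an illustration, not the schema used above.

```xml
<!-- Hypothetical replacement for the <xs:assert> in the complex type of "X". -->
<!-- Equal counts, plus every attribute name having a same-named child element, -->
<!-- imply that the two sets of names are identical (attribute names are unique). -->
<xs:assert test="count(*) = count(@*)
                 and (every $at in @* satisfies exists(*[name() = name($at)]))"/>
```

With this variant, an instance such as <X y1="A" y2="B" y3="C"> with children y3, y1, y2 (in that order) would also be accepted.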

Reference to XPath 2.0 language (for <assert> path expressions) : https://www.w3.org/TR/xpath20/.

I hope this example is useful.

Friday, October 28, 2016

XML Schema 1.1 : assertion refines enumeration

In this post, I'll try to explain how the XML Schema 1.1 <assertion> facet refines the XML Schema <enumeration> facet in useful ways, and also helps us build a useful domain vocabulary written in the XML Schema language.

Consider the following XML Schema 1.1 document:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="day" type="Day"/>
  
   <xs:element name="workday" type="WorkDay"/>
  
   <xs:element name="holiday" type="Holiday"/>
  
   <xs:simpleType name="WorkDay">
      <xs:restriction base="Day">
         <xs:assertion test="$value != 'saturday' and $value != 'sunday'"/>
      </xs:restriction>
   </xs:simpleType>
  
   <xs:simpleType name="Holiday">
     <xs:restriction base="Day">
        <xs:assertion test="$value = 'saturday' or $value = 'sunday'"/>
     </xs:restriction>
   </xs:simpleType> 
  
   <xs:simpleType name="Day">
      <xs:restriction base="xs:string">
         <xs:enumeration value="monday"/>
         <xs:enumeration value="tuesday"/>
         <xs:enumeration value="wednesday"/>
         <xs:enumeration value="thursday"/>
         <xs:enumeration value="friday"/>
         <xs:enumeration value="saturday"/>
         <xs:enumeration value="sunday"/>
      </xs:restriction>
   </xs:simpleType>
 
</xs:schema>

This is a very simple XSD document, and the XML documents validated by this XSD document will also be very simple.

The following document is invalid for the given schema, because "wednesday" is a valid "Day" value but fails the "Holiday" assertion:
<holiday>wednesday</holiday>
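
For comparison, here is a short sketch of a few more instance documents and the outcome I would expect for each, derived only from the enumeration and assertions shown in the schema above:

```xml
<!-- Each element below is a separate instance document, shown together for brevity. -->

<!-- valid: "wednesday" is a Day value and passes the WorkDay assertion -->
<workday>wednesday</workday>

<!-- valid: "sunday" is a Day value and passes the Holiday assertion -->
<holiday>sunday</holiday>

<!-- valid: the Day type accepts every enumerated value -->
<day>saturday</day>

<!-- invalid: "saturday" is a Day value but fails the WorkDay assertion -->
<workday>saturday</workday>
```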

In my opinion, this example illustrates how the <assertion> facet refines the <enumeration> facet during simple type derivation, and also helps us build a clear domain vocabulary. The types "WorkDay" and "Holiday" are explicitly linked to the type "Day" in the XML Schema document, since both are derived from it by restriction.

Monday, October 3, 2016

XML Schema 1.1 : overlap in concept of CTA and "assert"

Here's another example, where I thought Conditional Type Alternative (CTA) would have worked. It cannot in this case, because we have an attribute on element "X" and the data type is simple. Therefore we have to use an <assert> to solve this (because there's a lot of conditional logic here, a use case for the XPath 2.0 "if" expression, and a straightforward case of co-occurrence constraints).

Two valid XMLs:

(for attribute value 1, the simple type content of element "X" must be even)
<?xml version="1.0" encoding="UTF-8"?>
<X xa="1">
   4
</X>

(for attribute value 2, the simple type content of element "X" must be odd)
<?xml version="1.0" encoding="UTF-8"?>
<X xa="2">
   5
</X>

Invalid XML (attribute value other than 1 or 2 is not allowed):
<?xml version="1.0" encoding="UTF-8"?>
<X xa="3">
   4
</X>

The XML Schema 1.1 solution is below:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:complexType>
         <xs:simpleContent>
            <xs:extension base="xs:int">
               <xs:attribute name="xa" type="xs:int"/>
               <xs:assert test="if (@xa = 1) then $value mod 2 = 0
                            else if (@xa = 2) then $value mod 2 = 1
                                else false()"/>
            </xs:extension>
         </xs:simpleContent>
      </xs:complexType>
   </xs:element>
 
</xs:schema>
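
One detail to keep in mind: the XPath 2.0 "mod" operator returns a result with the sign of the dividend, so -5 mod 2 is -1 and a negative odd value would fail the test "$value mod 2 = 1". The following is a hedged sketch of a variant that treats negative odd numbers as odd as well, assuming that is the intended meaning of "odd" here; it is not part of the original solution.

```xml
<!-- Hypothetical variant of the <xs:assert> above: "odd" is tested as "not even". -->
<xs:assert test="if (@xa = 1) then $value mod 2 = 0
                 else if (@xa = 2) then $value mod 2 != 0
                 else false()"/>

<!-- Separate instance documents, with the outcome expected under this variant: -->

<!-- invalid: xa is 1, but the content is odd -->
<X xa="1">5</X>

<!-- valid under the variant (the original test "= 1" would reject it) -->
<X xa="2">-5</X>
```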

I hope this example is helpful.

Monday, September 26, 2016

XML Schema 1.1 : Conditional Type Assignment revisited

One of the other features of XML Schema 1.1 that I like very much is "conditional type assignment", or CTA. The only requirement is that there must be an attribute on an XML element to use this feature, because the CTA test expressions can refer only to the element's attributes.

Here is a very simple example.

I'm directly writing an XML Schema 1.1 document below using CTA:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:alternative type="Type1" test="@xa = 1"/>
        <xs:alternative type="Type2" test="@xa = 2"/>
        <xs:alternative type="xs:error"/>
   </xs:element>
  
   <xs:complexType name="Type1">
        <xs:sequence>
            <xs:element name="a" type="xs:int"/>
            <xs:element name="b" type="xs:int"/>
        </xs:sequence>
        <xs:attribute name="xa" type="xs:int"/>
   </xs:complexType>
  
   <xs:complexType name="Type2">
        <xs:sequence>
            <xs:element name="p" type="xs:int"/>
            <xs:element name="q" type="xs:int"/>
        </xs:sequence>
        <xs:attribute name="xa" type="xs:int"/>
   </xs:complexType>
 
</xs:schema>

The requirement of XML Schema 1.1 validation in this case is: If the attribute "xa" on element "X" has value 1, then element "X" has a certain type. If the value of attribute "xa" is 2, then element "X" has another type.

The two valid XML documents for the given XML Schema document are the following:

<?xml version="1.0" encoding="UTF-8"?>
<X xa="1">
    <a>1</a>
    <b>2</b> 
</X>

and,

<?xml version="1.0" encoding="UTF-8"?>
<X xa="2">
  <p>1</p>
  <q>2</q> 
</X>

For any other value of attribute "xa", or in fact for any other kind of content, the element "X" will be assigned the type xs:error (which makes the element "X" invalid).
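
As a small sketch of that fallback, here are two instance documents that I would expect to be reported as invalid with the schema above, since neither of the first two <xs:alternative> tests succeeds:

```xml
<!-- Each element below is a separate instance document. -->

<!-- invalid: xa is neither 1 nor 2, so "X" is assigned xs:error -->
<X xa="3">
    <a>1</a>
    <b>2</b>
</X>

<!-- invalid: attribute xa is absent, so both tests are false and xs:error applies -->
<X>
    <p>1</p>
    <q>2</q>
</X>
```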

Sunday, September 25, 2016

XML Schema 1.1 : accessing an XML tree structure during validation

In this post, I'll discuss an XML Schema validity definition that spans sibling elements in an XML document. Implementing this has become possible with XML Schema 1.1, through its new co-occurrence constraint facility.

Here's the XML document that needs to be validated by an XML Schema document:

<?xml version="1.0" encoding="UTF-8"?>
<X>
    <a>1</a>
    <b>2</b>
    <c>3</c>
    <d>4</d>
    <e>5</e>
</X>

The validation requirement is: write an XML Schema document that meets the following condition:
Element "X" is valid if the sum of the values of its child elements is greater than 7 (this is a hypothetical number for this problem).

The following XML Schema document solves this problem:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
         <xs:complexType>
             <xs:sequence>
                 <xs:element name="a" type="xs:int"/>
                 <xs:element name="b" type="xs:int"/>
                 <xs:element name="c" type="xs:int"/>
                 <xs:element name="d" type="xs:int"/>
                 <xs:element name="e" type="xs:int"/>
            </xs:sequence>         
            <xs:assert test="sum(*) gt 7"/>
            <!-- this assert also does the same thing : <xs:assert test="sum(a | b | c | d | e) gt 7"/> -->
         </xs:complexType>
    </xs:element>
   
</xs:schema>
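
The document shown earlier sums to 15 and therefore satisfies the assert. As a counter-example sketch, the following instance has the right structure but sums to 5, so I would expect the <assert> to fail and "X" to be reported as invalid:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- structure is fine, but 1 + 1 + 1 + 1 + 1 = 5 is not greater than 7 -->
<X>
    <a>1</a>
    <b>1</b>
    <c>1</c>
    <d>1</d>
    <e>1</e>
</X>
```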

This and the few earlier posts illustrate the usefulness of the XML Schema 1.1 <assert> (and also <assertion>) construct. The simplicity behind this is that XML Schema 1.1 <assert> / <assertion> can use the whole 'schema type aware' XPath 2.0 language. For <assert>, the expressions work on the context tree rooted at the element being validated. Remember that <assertion> is a facet (just like <minInclusive>, for example) that has access only to the atomic value being validated.

Please don't be misled by the title of this post, "accessing an XML tree structure during validation", into thinking that it applies only to an <assert>. The phrase applies similarly, for example, to the complex type definition of an XML element (in which we define the XML structure as a tree below a specific XML element). This post uses this terminology for XML Schema 1.1 <assert> trees, and not for the other kinds of trees mentioned.

Saturday, September 24, 2016

XML Schema 1.1 assertion facet on a simple type list and union

Here's some more information on using an XML Schema 1.1 <assertion> facet when a simple type has variety list or union. Note that an XSD simple type can be of the following 3 kinds:

1) <xs:simpleType>
       <xs:restriction base="some-type"/>
   </xs:simpleType>

2) <xs:simpleType>
       <xs:list itemType="some-type"/>
   </xs:simpleType>

3) <xs:simpleType>
       <xs:union memberTypes="type-1 type-2 ..."/>
   </xs:simpleType>


Example of an XML Schema simple type with variety list:
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<X>1 2 3 4 5</X>

Write an XML Schema 1.1 document that will report an XML document as valid when the following condition is met:
The element "X" has a simple type with variety list, such that the item type of the list is a simple type that validates even numbers.

The following XML Schema 1.1 document is the solution for this requirement:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:simpleType>
            <xs:list itemType="EvenNum"/>
        </xs:simpleType>
   </xs:element>
  
   <xs:simpleType name="EvenNum">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
   </xs:simpleType>
 
</xs:schema>

In this example, the XML document has the following invalid values in the list: 1, 3 & 5. When it is validated with Apache Xerces, the following XML Schema validation outcome is reported:

[Error] list.xml:2:17: cvc-assertions-valid: Value '1' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] list.xml:2:17: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'X' on schema type 'EvenNum' did not succeed. Assertion failed for an xs:list member value '1'.
[Error] list.xml:2:17: cvc-assertions-valid: Value '3' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] list.xml:2:17: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'X' on schema type 'EvenNum' did not succeed. Assertion failed for an xs:list member value '3'.
[Error] list.xml:2:17: cvc-assertions-valid: Value '5' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] list.xml:2:17: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'X' on schema type 'EvenNum' did not succeed. Assertion failed for an xs:list member value '5'.

A valid XML document will be, for example: <X>2 4</X>.
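
The example above places the <assertion> on the item type. As a hedged sketch, an <assertion> can also be placed on a restriction of the list type itself, assuming that $value is then bound to the whole sequence of list items. The type names "EvenNumList" and "SmallEvenNumList" and the limit of 10 are hypothetical, chosen only for illustration.

```xml
<!-- A named list type whose items are the EvenNum values defined above. -->
<xs:simpleType name="EvenNumList">
     <xs:list itemType="EvenNum"/>
</xs:simpleType>

<!-- Restriction of the list type: the assertion looks at all the items together. -->
<xs:simpleType name="SmallEvenNumList">
     <xs:restriction base="EvenNumList">
         <xs:assertion test="sum($value) le 10"/>
     </xs:restriction>
</xs:simpleType>
```

If element "X" were declared with type "SmallEvenNumList", I would expect <X>2 4</X> (sum 6) to be valid and <X>4 8</X> (sum 12) to be invalid, even though every item in the second list is even.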

Example of an XML Schema simple type with variety union (it's called union because the value space of the simple type is a union of the value spaces of 2 or more simple types):
XML document:
<?xml version="1.0" encoding="UTF-8"?>
<X>
    <val>3</val>
    <val>2017-12-05</val>
</X>

Write an XML Schema 1.1 document that will report an XML document as valid when the following conditions are met:
The element "X" has an XSD complex type with the following description:
It's a sequence of "val" elements (let's say its maxOccurs is 5, or it could be unbounded if you wish). The value of element "val" is defined by the following simple type:
It's a union of 2 simple types, T1 and T2, with the following definitions:

<!-- a simple type validating even numbers -->
<xs:simpleType name="T1">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
</xs:simpleType>  

<!-- a simple type that validates specific date values; values that are less than a specific date -->
<xs:simpleType name="T2">
      <xs:restriction base="xs:date">
          <xs:assertion test="$value lt xs:date('2016-01-01')"/>
      </xs:restriction>
</xs:simpleType>

The following XML Schema 1.1 document is a complete schema document, that is a solution for this requirement:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
               <xs:element name="val" maxOccurs="5">
                  <xs:simpleType>
                      <xs:union memberTypes="T1 T2"/>
                  </xs:simpleType>
               </xs:element>
            </xs:sequence>
        </xs:complexType>
   </xs:element>
  
   <xs:simpleType name="T1">
        <xs:restriction base="xs:int">
            <xs:assertion test="$value mod 2 = 0"/>
        </xs:restriction>
   </xs:simpleType>
  
   <xs:simpleType name="T2">
        <xs:restriction base="xs:date">
            <xs:assertion test="$value lt xs:date('2016-01-01')"/>
        </xs:restriction>
   </xs:simpleType>
 
</xs:schema>

For the XML document given (an invalid one), the following validation outcomes are reported by Apache Xerces's XML Schema 1.1 validator:

[Error] union.xml:3:15: cvc-assertions-valid-union-elem: Value '3' is not facet-valid with respect to the specified assertions, on type '#AnonType_valX' on element 'val'.
[Error] union.xml:3:15: cvc-datatype-valid.1.2.3: '3' is not a valid value of union type '#AnonType_valX'.
[Error] union.xml:3:15: cvc-type.3.1.3: The value '3' of element 'val' is not valid.
[Error] union.xml:4:24: cvc-assertions-valid-union-elem: Value '2017-12-05' is not facet-valid with respect to the specified assertions, on type '#AnonType_valX' on element 'val'.
[Error] union.xml:4:24: cvc-datatype-valid.1.2.3: '2017-12-05' is not a valid value of union type '#AnonType_valX'.
[Error] union.xml:4:24: cvc-type.3.1.3: The value '2017-12-05' of element 'val' is not valid.

It is fairly easy to write a valid XML document for this requirement: every "val" must hold either an even integer or a date earlier than 2016-01-01.
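
One valid instance, as a sketch derived from the definitions of T1 and T2 above (the particular values are just illustrative):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<X>
    <!-- an even integer, accepted by member type T1 -->
    <val>4</val>
    <!-- a date before 2016-01-01, accepted by member type T2 -->
    <val>2015-12-31</val>
</X>
```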

XML Schema 1.1 assertion facet revisited

Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<X>
    <a>1</a>
    <a>2</a>
    <a>3</a>
    <a>4</a>
    <a>5</a> 
</X>

We have the following requirement for XML Schema validation: the element "X" will be considered valid when the value of each element "a" within it is an even number. In the example above, the following three values of elements "a" make the element "X" invalid: 1, 3 & 5. The following XML Schema 1.1 document, using the <assertion> facet (it's a facet just like the XML Schema 1.0 facets "minInclusive" etc.), implements this requirement:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="a" maxOccurs="10">
                    <xs:simpleType>
                        <xs:restriction base="xs:int">
                            <xs:assertion test="$value mod 2 = 0"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
   </xs:element>
 
</xs:schema>

Implementing this requirement calls for the <assertion> facet, since we have to use the XPath 2.0 "mod" operator to test for even values.

When using Apache Xerces as an XML Schema 1.1 validation engine, we get the following outputs for the validation attempt:
[Error] x1.xml:3:11: cvc-assertions-valid: Value '1' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] x1.xml:3:11: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'a' on schema type '#AnonType_aX' did not succeed.
[Error] x1.xml:5:11: cvc-assertions-valid: Value '3' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] x1.xml:5:11: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'a' on schema type '#AnonType_aX' did not succeed.
[Error] x1.xml:7:11: cvc-assertions-valid: Value '5' is not facet-valid with respect to assertion '$value mod 2 = 0'.
[Error] x1.xml:7:11: cvc-assertion: Assertion evaluation ('$value mod 2 = 0') for element 'a' on schema type '#AnonType_aX' did not succeed.

This is a really nice capability of XML Schema 1.1, I think. Also note that within the error messages we see the type name '#AnonType_aX'. This is expected: it is the long-standing Apache Xerces way of reporting an anonymous XML Schema type, here the unnamed simple type of element "a". Had we given that simple type a specific name, like "TypeX", then that name would have appeared in the error messages reported during the XML document validation.
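
For completeness, here is a sketch of an instance that I would expect to validate without errors against the schema in this post, since every "a" value is even and there are no more than 10 of them:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<X>
    <a>2</a>
    <a>4</a>
    <a>6</a>
</X>
```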