Wednesday, September 21, 2022

XPath/XSLT 1.0 data model and beyond

Which data model offers better functional capabilities: the XPath/XSLT 1.0 data model, or the data models of the later versions (2.0, 3.0) of these language specifications?

The XPath/XSLT 1.0 data model focuses on a well-formed XML document tree. The 2.0 and 3.0 versions of these language specifications instead focus on a flat sequence of data model items (such as atomic values or XML nodes). Even so, many XPath/XSLT 2.0 and 3.0 use cases still aim to produce well-formed XML document trees as the output of an XSLT transform.

Although the data model of the 1.0 versions of these language specifications is fundamentally different from that of the 2.0/3.0 versions (the former is a coherent XML tree, whereas the latter is a sequence of data model items), XSLT 1.0 and 2.0/3.0 transforms try to achieve the same end result (i.e., a well-formed XML serialization of the data model instance).

I think XSLT 2.0/3.0 introduced the sequence of data model items as a fundamentally new definition of the data model because XPath 2.0/3.0 data model components need to be strongly typed at a granular level (aligning with the XML Schema specification).

If we need stronger typing in the process of producing the result of an XSLT transform, we should select the 2.0/3.0 versions of these language specifications. Otherwise, we can opt for the 1.0 versions.

The 2.0/3.0 versions of these language specifications have also brought in newer XSLT language features and a vastly expanded function library. That's an advantage of the XSLT 2.0/3.0 languages over the 1.0 version.

At times, I don't want too much strong typing (in the XML Schema sense) within an XSLT transformation process, because it involves greater design effort upfront; if my XML transformation requirements are also simple, I tend to opt for an XSLT 1.0 transform. I certainly go for the XSLT 2.0/3.0 options if I'm not constrained by these factors.

Monday, May 30, 2022

XML Schema : identity constraints essentials and best practices

In this blog post, I'll attempt to describe best practices for the use of XML Schema (XSD) identity constraints. I'm going to compare the XSD identity constraint instructions xs:unique and xs:key, and describe when to use each.

XSD xs:key serves the same purpose within XML as a primary key does in an RDBMS, whereas XSD xs:unique is a generic way to enforce unique values within a set of XML data values. xs:key also enforces unique values within a set of XML data values. Unlike xs:key, xs:unique permits values within the XML data set to be absent (logically speaking, null values).

Please consider the following XML Schema validation example.

XML Schema document:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="catalog" type="CatalogType">
      <xs:unique name="prodNumKey">
         <xs:selector xpath="*/product"/>
         <xs:field xpath="number"/>
      </xs:unique>
   </xs:element>
   <xs:complexType name="CatalogType">
      <xs:sequence>
         <xs:element name="department" maxOccurs="unbounded">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="product" maxOccurs="unbounded">
                     <xs:complexType>
                        <xs:sequence>
                           <xs:element name="number" type="xs:positiveInteger" minOccurs="0"/>
                           <xs:element name="name" type="xs:string"/>
                           <xs:element name="price">
                              <xs:complexType>
                                 <xs:simpleContent>
                                    <xs:extension base="xs:decimal">
                                       <xs:attribute name="currency" type="xs:string"/>
                                    </xs:extension>
                                 </xs:simpleContent>
                              </xs:complexType>
                           </xs:element>
                        </xs:sequence>
                     </xs:complexType>
                  </xs:element>
               </xs:sequence>
               <xs:attribute name="number" type="xs:positiveInteger"/>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:schema>

One valid XML instance document for the above XSD schema is the following:

<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>

Please note the following about the XML Schema validation example cited above:

1) Within the XML Schema document, the "number" child of "product" is declared as follows,

<xs:element name="number" type="xs:positiveInteger" minOccurs="0"/>

(i.e, with minOccurs="0", meaning that this element is optional within the corresponding XML instance document)

2) The "catalog" element has the following XSD xs:unique definition bound to it,

<xs:unique name="prodNumKey">
     <xs:selector xpath="*/product"/>
     <xs:field xpath="number"/>
</xs:unique>

Together, these facts mean that the "number" element is not intended to function as the primary key of the "product" data set (a primary key value has to be present in every record of the data set). The "number" element may be absent from some "product" elements, as the XML Schema document above allows, but the values of the "number" elements that are present within the XML instance document have to be unique.

The paragraphs above discussed the role of the XSD xs:unique instruction.


Now, how do we enforce the primary-key kind of behavior, mentioned earlier in this blog post, within an XML Schema document?

In the context of the above example, this can be done simply by changing the "number" element declaration to the following (i.e., we must not write minOccurs="0" within the element declaration),

<xs:element name="number" type="xs:positiveInteger"/>

And write the "catalog" element declaration as follows (i.e., we now use xs:key instead of xs:unique),

<xs:element name="catalog" type="CatalogType">
      <xs:key name="prodNumKey">
         <xs:selector xpath="*/product"/>
         <xs:field xpath="number"/>
      </xs:key>
</xs:element>

The above changes to the XML Schema document mean that all "product" elements must have a "number" child, and all the "number" values within the XML instance document have to be unique. These characteristics make the "number" element a primary key for the "product" data set.
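
The difference between the two constraints can also be mimicked outside a schema processor. The following is a minimal Python sketch, not a schema validator: it assumes Python's standard xml.etree.ElementTree module, reuses the selector path */product and the field path number from the example, and uses the hypothetical function names product_numbers, satisfies_unique and satisfies_key purely for illustration. The instance document is abbreviated (the price elements are left out).

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the instance document (price elements omitted).
CATALOG = """
<catalog>
  <department number="021">
    <product><number>557</number><name>Short-Sleeved Linen Blouse</name></product>
    <product><name>Ten-Gallon Hat</name></product>
    <product><number>443</number><name>Deluxe Golf Umbrella</name></product>
  </department>
</catalog>
"""

def product_numbers(catalog_xml):
    """Return one entry per selected product: its number as int, or None if absent."""
    root = ET.fromstring(catalog_xml)
    numbers = []
    for product in root.findall("*/product"):   # same path as the xs:selector
        field = product.find("number")          # same path as the xs:field
        numbers.append(None if field is None else int(field.text))
    return numbers

def satisfies_unique(catalog_xml):
    """xs:unique-like check: the values that are present must be distinct."""
    present = [n for n in product_numbers(catalog_xml) if n is not None]
    return len(present) == len(set(present))

def satisfies_key(catalog_xml):
    """xs:key-like check: every product has a value, and all values are distinct."""
    numbers = product_numbers(catalog_xml)
    return None not in numbers and len(numbers) == len(set(numbers))

print(satisfies_unique(CATALOG))  # the product without a number is simply skipped
print(satisfies_key(CATALOG))     # the product without a number breaks the key
```

For the instance document above, the xs:unique-like check passes while the xs:key-like check fails, because the "Ten-Gallon Hat" product has no "number" child. The values are compared as integers here, which is an assumption made to approximate the value-based comparison of xs:positiveInteger.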


The XML Schema features related to the xs:unique and xs:key constructs, described within this blog post, are supported by both the 1.0 and 1.1 versions of the XML Schema language.


Acknowledgements: The XML Schema validation example used within this blog post is borrowed from Priscilla Walmsley's excellent book "Definitive XML Schema, 2nd edition".



Thursday, February 24, 2022

XML Schema 1.1 : <assertion> facet with attribute "fixed"

I've come up with an XML Schema 1.1 example involving the XSD <assertion> facet and the XSD attribute "fixed" that I thought would be interesting to write about.

Please consider the following two XML instance documents,

XML document 1:
<?xml version="1.0"?>
<Test>
    <A>a</A>
    <country>USA</country>
    <C>c</C>
</Test>

XML document 2:
<?xml version="1.0"?>
<Test>
    <A>a</A>
    <country>U S A</country>
    <C>c</C>
</Test>

In "XML document 1" above, the element "country" needs to have the fixed value "USA". In "XML document 2" above, the element "country" needs to have the value "USA" with any amount of whitespace allowed anywhere within the string value.

The XSD 1.1 schema for "XML document 1" is the following (this schema is a valid XSD 1.0 schema as well),

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Test" type="TestType" />
   
    <xs:complexType name="TestType">
        <xs:sequence>
            <xs:element name="A" type="xs:string"/>
            <xs:element name="country" type="xs:string" fixed="USA"/>            
            <xs:element name="C" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

Whereas the XSD 1.1 schema for "XML document 2" is the following,

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Test" type="TestType" />
   
    <xs:complexType name="TestType">
        <xs:sequence>
            <xs:element name="A" type="xs:string"/>
            <xs:element name="country">
               <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:assertion test="replace($value, '\s', '') = 'USA'"/>
                  </xs:restriction>
               </xs:simpleType>
            </xs:element>
            <xs:element name="C" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

As the latter schema shows, the XSD 1.1 <assertion> facet lets us express a special notion of a fixed value: here, the value "USA" once all whitespace is ignored.
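
The test expression replace($value, '\s', '') = 'USA' can be tried out in an ordinary programming language. Below is a minimal Python sketch of the same idea, assuming Python's standard re module; the function name country_is_usa is hypothetical. Note that Python's \s covers more Unicode whitespace characters than the \s of XPath regular expressions, so this is an approximation of the assertion and not a replacement for schema validation.

```python
import re

def country_is_usa(value):
    """Mirror of the assertion: strip every whitespace character, then compare."""
    return re.sub(r"\s", "", value) == "USA"

print(country_is_usa("USA"))     # value of XML document 1
print(country_is_usa("U S A"))   # value of XML document 2
print(country_is_usa("U K"))     # a value the assertion rejects
```

The first two calls accept the values from the two XML documents above; the third shows a value that is rejected.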

Saturday, January 29, 2022

XML Schema 1.1 : conditional inclusion

I've been wanting to write something about the XML Schema (XSD) 1.1 conditional inclusion feature. This feature is described here: https://www.w3.org/TR/xmlschema11-1/#cip. I'm copying some relevant description of this feature from the XML Schema 1.1 specification below,

<quote>
Whenever a conforming XSD processor reads a ·schema document· in order to include the components defined in it in a schema, it first performs on the schema document the pre-processing described in this section.

Every element in the ·schema document· is examined to see whether any of the attributes vc:minVersion, vc:maxVersion, vc:typeAvailable, vc:typeUnavailable, vc:facetAvailable, or vc:facetUnavailable appear among its [attributes].

Where they appear, the attributes vc:minVersion and vc:maxVersion are treated as if declared with type xs:decimal, and their ·actual values· are compared to a decimal value representing the version of XSD supported by the processor (here represented as a variable V). For processors conforming to this version of this specification, the value of V is 1.1.

If V is less than the value of vc:minVersion, or if V is greater than or equal to the value of vc:maxVersion, then the element on which the attribute appears is to be ignored, along with all its attributes and descendants. The effect is that portions of the schema document marked with vc:minVersion and/or vc:maxVersion are retained if vc:minVersion ≤ V < vc:maxVersion.
</quote>

I'll present below a small XML Schema validation example of XSD 1.1 conditional inclusion (as tested with the Apache Xerces XML Schema 1.1 processor).

Following is an XML instance document that'll be validated by an XML Schema document,

<val>5</val>

One of the validations we want to perform is that the integer value of element "val" must be an even number.

Following is an XML Schema document that'll validate the above XML instance document,

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">

  <xs:element name="val" type="Integer"/>
  
  <xs:simpleType name="Integer" vc:minVersion="1" vc:maxVersion="1.05">
      <xs:restriction base="xs:integer"/>
  </xs:simpleType>
  
  <xs:simpleType name="Integer" vc:minVersion="1.1">
      <xs:restriction base="xs:integer">
         <xs:assertion test="$value mod 2 = 0"/>
      </xs:restriction>
  </xs:simpleType>

</xs:schema>

Within the above schema document, there's an element declaration for XML element "val", which is of XML Schema type "Integer". Two variants of the schema type "Integer" are defined in this schema. One "Integer" type simply says that the value should be an xs:integer (the type with attributes vc:minVersion="1" vc:maxVersion="1.05"). The other "Integer" type says that the value should be an even integer (the type with attribute vc:minVersion="1.1").

When we perform the above XML Schema validation using an XSD 1.1 processor in XML Schema 1.0 mode, a valid outcome is reported (because the simpleType with attributes vc:minVersion="1" vc:maxVersion="1.05" is selected, and the other simpleType definition is filtered out during conditional inclusion pre-processing).

Whereas when we perform the above XML Schema validation using an XSD 1.1 processor in XML Schema 1.1 mode, an invalid outcome is reported (because the simpleType with attribute vc:minVersion="1.1" is selected, the other simpleType definition is filtered out during conditional inclusion pre-processing, and the value 5 is not even).
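
The filtering rule quoted from the specification earlier (an element is retained if vc:minVersion ≤ V < vc:maxVersion) can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not how any particular XSD processor implements it: the function name is_retained and its parameters are hypothetical, and the version values are compared as decimals, as the quoted text describes.

```python
from decimal import Decimal

def is_retained(v, min_version=None, max_version=None):
    """True if an element carrying these vc: attributes survives pre-processing
    for a processor whose supported XSD version is v."""
    v = Decimal(v)
    if min_version is not None and v < Decimal(min_version):
        return False   # V is less than vc:minVersion
    if max_version is not None and v >= Decimal(max_version):
        return False   # V is greater than or equal to vc:maxVersion
    return True

# The two "Integer" type definitions from the schema above.
plain_integer = {"min_version": "1", "max_version": "1.05"}
even_integer = {"min_version": "1.1"}

for mode in ("1.0", "1.1"):
    print(mode, is_retained(mode, **plain_integer), is_retained(mode, **even_integer))
```

In 1.0 mode only the plain "Integer" definition is retained, and in 1.1 mode only the even-integer definition is retained, which matches the two validation outcomes described above.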

Please note that when the above XML Schema validation is done with a pure XML Schema 1.0 processor (one is bundled with Apache XercesJ as well) that was written for the XML Schema 1.0 specification https://www.w3.org/TR/xmlschema-1/, the above XSD document won't compile successfully (because a pure XSD 1.0 processor doesn't allow two global type definitions with the same name within a schema document; "Integer" in the above schema document).

Tuesday, January 18, 2022

XML Schema 1.1 : using regex

I've been thinking about this for a while, and thought of writing a blog post about it.

Consider the following XML document instance,

<?xml version="1.0"?>
<temp>ABCABD</temp>

And the following XML Schema (XSD) 1.1 document (that'll validate the above XML document instance),

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:element name="temp">
      <xs:simpleType>
         <xs:restriction base="xs:string">
            <xs:pattern value="(ABC)+"/>
            <xs:assertion test="matches($value, '(ABC)+')"/>
         </xs:restriction>
      </xs:simpleType>
  </xs:element>
  
</xs:schema>

At first thought, it might seem that both the <xs:pattern> and the <xs:assertion> shown within the above XSD 1.1 document would fail validation for the XML document instance value "ABCABD" (the regex in the XSD document requires the string "ABC" repeated one or more times).

But in reality, and according to the XSD 1.1 specification, the XML document instance value "ABCABD" is invalid for the <xs:pattern>, but valid for the <xs:assertion>. That's because the XPath 2.0 "matches(..)" function returns true when any substring matches the regex, unless the regex is anchored with the ^ and $ characters.

Therefore, for the above XSD 1.1 example, the following two are equivalent XSD validation checks,
<xs:pattern value="(ABC)+"/>
<xs:assertion test="matches($value, '^(ABC)+$')"/>

And for <xs:pattern>, no explicit regex anchoring with ^ and $ is available (it's always implied), i.e., with <xs:pattern>, it's always the entire input string that is checked against the pattern regex.
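
The same substring versus whole-string distinction exists in many regex libraries, which makes it easy to experiment with. The following is a minimal Python sketch, assuming Python's standard re module as an analogy only: Python's regex dialect is not the XSD or XPath one, and the mapping of re.search to "matches(..)" and of re.fullmatch to <xs:pattern> is an illustration of the behavior described above, not a statement about how an XSD processor is implemented.

```python
import re

value = "ABCABD"

# Like matches($value, '(ABC)+'): succeeds if any substring matches.
substring_match = re.search(r"(ABC)+", value) is not None

# Like matches($value, '^(ABC)+$'): the anchors force a whole-string match.
anchored_match = re.search(r"^(ABC)+$", value) is not None

# Like <xs:pattern value="(ABC)+"/>: the whole string is always checked.
whole_match = re.fullmatch(r"(ABC)+", value) is not None

print(substring_match, anchored_match, whole_match)
```

For "ABCABD", only the unanchored substring search succeeds; for a value such as "ABCABC", all three checks would succeed.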