Saturday, January 29, 2022

XML Schema 1.1 : conditional inclusion

I've been wanting to write something about the XML Schema (XSD) 1.1 conditional inclusion feature. This feature is described here: https://www.w3.org/TR/xmlschema11-1/#cip. Below, I'm copying the relevant description of this feature from the XML Schema 1.1 specification:

<quote>
Whenever a conforming XSD processor reads a ·schema document· in order to include the components defined in it in a schema, it first performs on the schema document the pre-processing described in this section.

Every element in the ·schema document· is examined to see whether any of the attributes vc:minVersion, vc:maxVersion, vc:typeAvailable, vc:typeUnavailable, vc:facetAvailable, or vc:facetUnavailable appear among its [attributes].

Where they appear, the attributes vc:minVersion and vc:maxVersion are treated as if declared with type xs:decimal, and their ·actual values· are compared to a decimal value representing the version of XSD supported by the processor (here represented as a variable V). For processors conforming to this version of this specification, the value of V is 1.1.

If V is less than the value of vc:minVersion, or if V is greater than or equal to the value of vc:maxVersion, then the element on which the attribute appears is to be ignored, along with all its attributes and descendants. The effect is that portions of the schema document marked with vc:minVersion and/or vc:maxVersion are retained if vc:minVersion ≤ V < vc:maxVersion.
</quote>
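The retention rule quoted above (vc:minVersion ≤ V < vc:maxVersion) can be sketched in a few lines of Python. This is only an illustration of the comparison, not code from any XSD processor; the function name is_retained and its parameters are my own, and a missing attribute is assumed to impose no bound on that side.

```python
from decimal import Decimal

def is_retained(v, min_version=None, max_version=None):
    """Return True if an element with the given vc:minVersion / vc:maxVersion
    is kept by a processor whose supported version is v.

    All values are compared as decimals, as the specification describes.
    A missing attribute (None) is assumed to impose no bound.
    """
    v = Decimal(v)
    if min_version is not None and v < Decimal(min_version):
        return False  # V is less than vc:minVersion: element is ignored
    if max_version is not None and v >= Decimal(max_version):
        return False  # V is greater than or equal to vc:maxVersion: element is ignored
    return True

# An XSD 1.1 processor (V = 1.1) ignores an element marked 1 <= V < 1.05
print(is_retained("1.1", "1", "1.05"))  # False
# ... and keeps an element marked with vc:minVersion="1.1" only
print(is_retained("1.1", "1.1"))        # True
```

Note that the upper bound is exclusive, so an element with vc:maxVersion="1.1" is ignored by an XSD 1.1 processor.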

Below, I present a small XML Schema validation example (tested with the Apache Xerces XML Schema 1.1 processor) that illustrates XSD 1.1 conditional inclusion.

Following is an XML instance document that will be validated by an XML Schema document:

<val>5</val>

One of the validations we want to perform is that the integer value of element "val" must be an even number.

Following is an XML Schema document that will validate the XML instance document above:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">

  <xs:element name="val" type="Integer"/>
  
  <xs:simpleType name="Integer" vc:minVersion="1" vc:maxVersion="1.05">
      <xs:restriction base="xs:integer"/>
  </xs:simpleType>
  
  <xs:simpleType name="Integer" vc:minVersion="1.1">
      <xs:restriction base="xs:integer">
         <xs:assertion test="$value mod 2 = 0"/>
      </xs:restriction>
  </xs:simpleType>

</xs:schema>

Within the above schema document, there's an element declaration for XML element "val", which is of schema type "Integer". There are two variants of the schema type "Integer" defined in this schema. One "Integer" type simply says that the value should be an xs:integer (the type with attributes vc:minVersion="1" vc:maxVersion="1.05"). The other "Integer" type says that the value should be an even integer (the type with attribute vc:minVersion="1.1").

When we perform the above XML schema validation using an XSD 1.1 processor in XML Schema 1.0 mode, a valid outcome is reported (because the simpleType with attributes vc:minVersion="1" vc:maxVersion="1.05" is selected, and the other simpleType definition is filtered out during XML schema conditional inclusion pre-processing).

Whereas, when we perform the same XML schema validation using an XSD 1.1 processor in XML Schema 1.1 mode, an invalid outcome is reported (because the simpleType with attribute vc:minVersion="1.1" is selected, and the other simpleType definition is filtered out during XML schema conditional inclusion pre-processing).
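To make the filtering step concrete, here is a rough Python sketch that applies the conditional inclusion rule to the schema shown above and reports which "Integer" definition survives. It is not how Xerces implements the pre-processing; the helper names (keep, prune, surviving_integer_types, has_assertion) are hypothetical, V = 1.0 is assumed for XML Schema 1.0 mode, and for brevity only vc:minVersion and vc:maxVersion are handled, on descendants of the root element.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

VC = "{http://www.w3.org/2007/XMLSchema-versioning}"
XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">
  <xs:element name="val" type="Integer"/>
  <xs:simpleType name="Integer" vc:minVersion="1" vc:maxVersion="1.05">
    <xs:restriction base="xs:integer"/>
  </xs:simpleType>
  <xs:simpleType name="Integer" vc:minVersion="1.1">
    <xs:restriction base="xs:integer">
      <xs:assertion test="$value mod 2 = 0"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

def keep(elem, v):
    """Apply the rule vc:minVersion <= V < vc:maxVersion to one element."""
    lo = elem.get(VC + "minVersion")
    hi = elem.get(VC + "maxVersion")
    if lo is not None and v < Decimal(lo):
        return False
    if hi is not None and v >= Decimal(hi):
        return False
    return True

def prune(parent, v):
    """Remove every descendant of parent that fails the rule, with its subtree."""
    for child in list(parent):
        if keep(child, v):
            prune(child, v)
        else:
            parent.remove(child)

def surviving_integer_types(version):
    """Return the global simpleType elements named "Integer" left after pruning."""
    root = ET.fromstring(SCHEMA)
    prune(root, Decimal(version))
    return [t for t in root.findall(XS + "simpleType") if t.get("name") == "Integer"]

def has_assertion(simple_type):
    """True if the simpleType contains an xs:assertion facet."""
    return simple_type.find(".//" + XS + "assertion") is not None

for version in ("1.0", "1.1"):
    types = surviving_integer_types(version)
    print(version, len(types), has_assertion(types[0]))
# prints:
# 1.0 1 False
# 1.1 1 True
```

With V = 1.0 only the plain xs:integer restriction remains, so the value 5 is valid; with V = 1.1 only the definition carrying the even-number assertion remains, so 5 is invalid.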

Please note that when the above XML schema validation is done with a pure XML Schema 1.0 processor (one is bundled with Apache XercesJ as well) that was written for the XML Schema 1.0 specification https://www.w3.org/TR/xmlschema-1/, the above XSD document won't compile successfully (because, with a pure XSD 1.0 processor, we cannot have two global type definitions with the same name within a schema document; here, both are named "Integer").

Tuesday, January 18, 2022

XML Schema 1.1 : using regex

I've been thinking about this for a while, and thought of writing a blog post about it.

Consider the following XML instance document:

<?xml version="1.0"?>
<temp>ABCABD</temp>

And the following XML Schema (XSD) 1.1 document (that will validate the XML instance document above):

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:element name="temp">
      <xs:simpleType>
         <xs:restriction base="xs:string">
            <xs:pattern value="(ABC)+"/>
            <xs:assertion test="matches($value, '(ABC)+')"/>
         </xs:restriction>
      </xs:simpleType>
  </xs:element>
  
</xs:schema>

At first thought, it might seem that both the <xs:pattern> and the <xs:assertion> in the above XSD 1.1 document would fail validation for the XML instance value "ABCABD" (both use the regex (ABC)+, which describes the string "ABC" repeated one or more times).

But in reality, and according to the XSD 1.1 specification, for the example shown above, the XML instance value "ABCABD" would be invalid for the <xs:pattern>, but valid for the <xs:assertion>. That's because the XPath 2.0 "matches(..)" function returns true when any substring matches the regex, unless the "matches(..)" regex is anchored with the ^ and $ characters.

Therefore, for the above XSD 1.1 example, the following two XSD validation checks are equivalent:
<xs:pattern value="(ABC)+"/>
<xs:assertion test="matches($value, '^(ABC)+$')"/>

And for <xs:pattern>, there's no explicit regex anchoring with ^ and $ available (it's always implied), i.e., with <xs:pattern>, it's always the entire input string that is checked against the pattern regex.
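The difference between substring matching and whole-string matching can be tried out with Python's re module. This is only an analogy: Python's regex dialect is not the XSD or XPath 2.0 regex dialect, and the function names below are my own. Here re.search stands in for the substring behavior of "matches(..)", and re.fullmatch stands in for the implicitly anchored <xs:pattern>.

```python
import re

REGEX = "(ABC)+"

def assertion_like(value):
    """Substring semantics, analogous to matches($value, '(ABC)+')."""
    return re.search(REGEX, value) is not None

def pattern_like(value):
    """Whole-string semantics, analogous to <xs:pattern value="(ABC)+"/>."""
    return re.fullmatch(REGEX, value) is not None

def anchored_assertion_like(value):
    """Explicit anchors, analogous to matches($value, '^(ABC)+$')."""
    return re.search("^(ABC)+$", value) is not None

for value in ("ABCABD", "ABCABC"):
    print(value, assertion_like(value), pattern_like(value), anchored_assertion_like(value))
# prints:
# ABCABD True False False
# ABCABC True True True
```

For "ABCABD", the unanchored search succeeds because the leading "ABC" is a matching substring, while both whole-string checks fail.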