Mukul Gandhi: March 2020

Saturday, March 21, 2020

Using XML Schema 1.1 <alternative> with Xerces-J

I would like to share some information here about Apache Xerces-J's implementation of XML Schema (XSD) 1.1 'type alternatives'.

The XSD 1.1 specification defines a particular subset of the XPath 2.0 language that can be used as the value of the 'test' attribute of the XSD 1.1 <alternative> element. This subset, used for conditional type alternatives (CTA), is much smaller than the full XPath 2.0 language. Its specification can be read at https://www.w3.org/TR/xmlschema11-1/#coss-ta (specifically, the clause beginning '2.1 It conforms to the following extended BNF', which gives the grammar for the CTA XPath subset).

In fact, the XSD 1.1 specification allows XSD validators that implement the <alternative> element to support a larger set of XPath 2.0 features (commonly the full XPath 2.0 language) than the XSD 1.1 CTA (conditional type alternatives) XPath subset defines.

For XSD 1.1 CTAs, Xerces-J lets the user select, through an option, either:

1) The smaller XPath subset (the default for Xerces-J), or

2) Full XPath 2.0.

How to select between the XPath subset and the full XPath 2.0 language for Xerces-J's CTA implementation is described at https://xerces.apache.org/xerces2-j/faq-xs.html#faq-3.

I've analyzed the nature of the XSD 1.1 CTA XPath subset language a bit. The following are essentially the main patterns of this subset that may be used within XSD 1.1 schemas with the XSD <alternative> element:

1) Using comparators (like >, <, =, !=, <=, >=):

Example CTA XPath expressions:
@x = @y
@x = 3
@x != 3
@x > @y

2) Using comparators with logical operators:

Example CTA XPath expressions:
(@x = @y) or (@p = @q)
((1 = 2) or (5 = 6)) and (5 = 7)
(1 and 2) or (5 and 7)

3) Using the XPath 2.0 'not' function:

An example XPath expression:
(@x = @y) and not(@p)

Interestingly, the XSD 1.1 CTA XPath subset language allows only the XPath 2.0 fn:not function and no other XPath 2.0 built-in functions. Constructor functions for all built-in XSD types (e.g. xs:integer(..), xs:boolean(..)) may also be used in XSD 1.1 CTA XPath subset expressions.

As per the XSD 1.1 specification, during XSD 1.1 CTA evaluation the XML element and attribute nodes are untyped (i.e. the XML nodes do not carry any type annotation coming from an XML schema). Therefore, in many cases, XSD 1.1 CTA XPath subset expressions used with Xerces-J need explicit casts, e.g. <xs:alternative test="(xs:integer(@x) = xs:integer(@y)) and fn:not(xs:boolean(@p))">, with the namespace prefix 'fn' bound to the URI 'http://www.w3.org/2005/xpath-functions'. For both the CTA XPath subset and the full XPath 2.0 language for CTAs, the "fn" prefix on XPath built-in functions is optional. Typically, XML schema authors do not use the "fn" prefix for XPath built-in functions.
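To make the need for explicit casts concrete, here is a minimal sketch of a schema that uses a cast inside a CTA test. The element name 'range', the type name 'RangeType', and the attributes 'min' and 'max' are hypothetical and chosen only for illustration. Without the casts, a general comparison between two untyped attribute values compares them as strings, so min="10" and max="9" would not be seen as min > max.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <!-- Hypothetical names, for illustration only -->
   <xs:element name="range" type="RangeType">
      <!-- @min and @max are untyped during CTA evaluation, so both are cast
           to xs:integer before the comparison. The operator is written as
           an entity reference here; a less-than sign must always be escaped
           inside an attribute value. -->
      <xs:alternative test="xs:integer(@min) &gt; xs:integer(@max)" type="xs:error"/>
   </xs:element>

   <xs:complexType name="RangeType">
      <xs:attribute name="min" type="xs:integer" use="required"/>
      <xs:attribute name="max" type="xs:integer" use="required"/>
   </xs:complexType>

</xs:schema>
```

With this sketch, an instance such as <range min="10" max="9"/> should select the xs:error type and therefore be reported as invalid, while <range min="1" max="9"/> should fall back to the declared type 'RangeType'. The test uses only attribute references, a comparator, and constructor functions, so it is intended to stay within the CTA XPath subset.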

Tuesday, March 10, 2020

XML Schema 1.1 <assert> continued ...

This blog post is related to the XML Schema (XSD) use case that I discussed in my previous two blog posts. Consider the following XML Schema 1.1 document, which has an XSD <assert> element:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="isSeqTwo" type="xs:boolean"/>
            <xs:choice>
               <xs:sequence>
                  <xs:element name="a" type="xs:string"/>
                  <xs:element name="b" type="xs:string"/>
               </xs:sequence>
               <xs:sequence>
                  <xs:element name="p" type="xs:string"/>
                  <xs:element name="q" type="xs:string"/>
               </xs:sequence>
               <xs:sequence>
                  <xs:element name="x" type="xs:string"/>
                  <xs:element name="y" type="xs:string"/>
               </xs:sequence>
            </xs:choice>
         </xs:sequence>
         <xs:assert test="if (isSeqTwo = true()) then p else not(p)"/>
      </xs:complexType>
   </xs:element>

</xs:schema>

The above schema document differs from the schema documents presented in my previous two blog posts in the following way: the XML child content model of element "X" is a sequence of an element followed by a choice.

In the earlier two blog posts, the XML child content model of element "X" depended on the value of an attribute of element "X", which could be enforced using either an XSD 1.1 <assert> or an <alternative>.

A few XML instance documents that are valid or invalid according to the above XSD schema document follow.

Valid:

<X>
   <isSeqTwo>0</isSeqTwo>
   <x>string1</x>
   <y>string2</y>
</X>

Valid:

<X>
   <isSeqTwo>1</isSeqTwo>
   <p>string1</p>
   <q>string2</q>
</X>

Invalid:

<X>
   <isSeqTwo>1</isSeqTwo>
   <x>string1</x>
   <y>string2</y>
</X>
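The assertion should also reject the reverse mismatch. The instance below is a sketch derived from the schema above, not one of the original examples: isSeqTwo is false but the p/q sequence is present, so the 'else not(p)' branch of the assertion evaluates to false.

```xml
<!-- Expected to be invalid: isSeqTwo is false, yet <p> is present,
     so not(p) is false and the assertion fails -->
<X>
   <isSeqTwo>0</isSeqTwo>
   <p>string1</p>
   <q>string2</q>
</X>
```

The content model alone accepts this instance, since p followed by q is one of the three choice branches. Only the assertion ties the choice to the value of isSeqTwo.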

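The XSD use case illustrated above is useful, and within XSD 1.1 it can only be accomplished using an <assert> element: the condition depends on a child element (isSeqTwo), whereas the test of an <alternative> can refer only to the attributes of the element, not to its children.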

As a side note, I would like to cite the following rule from the XML Schema 1.1 Structures specification, 3.4.4.2 Element Locally Valid (Complex Type), which says:
For an element information item E to be locally ·valid· with respect to a complex type definition T all of the following must be true:
1
2
3
...
6 E is ·valid· with respect to each of the assertions in T.{assertions} as per Assertion Satisfied (§3.13.4.1).

From the above rule of the XSD 1.1 spec, we can infer that an XML instance element is valid against an XSD complex type definition only if, in addition to satisfying the other complex type validation rules, it is valid with respect to each of the assertions present on that complex type.

Sunday, March 1, 2020

XML Schema 1.1 <alternative> use cases with <choice> and <attribute>

When using XML Schema (XSD) 1.1, a use case that can be solved with an XSD 1.1 <assert> can often be solved with an XSD 1.1 <alternative> as well (and vice versa). This is usually the case when the XML child content model of an element depends on the values of attributes of that same element. This is evident in the first example of my previous blog post. Given the same XML input examples as in that first example, the following XML Schema 1.1 document using <alternative> is also a possible solution:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:alternative test="xs:boolean(@isB) eq true()">
         <xs:complexType>
            <xs:sequence>
               <xs:element name="b" type="xs:string"/>
            </xs:sequence>
            <xs:attribute name="isB" type="xs:boolean" use="required"/>
         </xs:complexType>
      </xs:alternative>
      <xs:alternative>
         <xs:complexType>
            <xs:choice>
               <xs:element name="a" type="xs:string"/>
               <xs:element name="c" type="xs:string"/>
            </xs:choice>
            <xs:attribute name="isB" type="xs:boolean" use="required"/>
         </xs:complexType>
      </xs:alternative>
   </xs:element>

</xs:schema>
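To see how the type selection plays out, consider the following instance. It is an illustrative sketch written against the schema above, not one of the input examples from the previous post. The test 'xs:boolean(@isB) eq true()' is true, so the first alternative's type is selected, and that type allows only a child element "b".

```xml
<!-- Expected to be invalid: isB is true, so the selected type
     permits only <b>, but <a> is present -->
<X isB="true">
   <a>string1</a>
</X>
```

Replacing <a> with <b> should make this instance valid. With isB="false", the test is false, the second (test-less) alternative applies, and either <a> or <c> is accepted while <b> is rejected.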

The question then arises: for these use cases, should we use an XSD 1.1 <assert> or an <alternative>? Below are the pros and cons, in my view:
1) An XSD 1.1 solution using <assert> has fewer lines of code than the one using <alternative>, which many would consider a benefit.
2) I personally prefer the XPath expression '@isB = true()' (within 'if (@isB = true()) then b else not(b)') of an <assert> over 'xs:boolean(@isB) eq true()' in an <alternative>. In the <alternative> example, the attribute node 'isB' has a type annotation of xs:untypedAtomic, which requires an explicit cast with xs:boolean(..). I tend to prefer XPath expressions that don't use explicit casts, since such expressions look more schema-aware.
3) One benefit I see in the solution using an XSD 1.1 <alternative> over <assert> is better error diagnostics in case of XML validation errors.
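For a side-by-side comparison, here is a sketch of what an <assert>-based counterpart could look like. The schema from the previous post is not reproduced here, so this is a reconstruction built around the assertion 'if (@isB = true()) then b else not(b)' quoted above, and it may differ from the original in detail.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="X">
      <xs:complexType>
         <!-- The content model allows any one of a, b or c -->
         <xs:choice>
            <xs:element name="a" type="xs:string"/>
            <xs:element name="b" type="xs:string"/>
            <xs:element name="c" type="xs:string"/>
         </xs:choice>
         <xs:attribute name="isB" type="xs:boolean" use="required"/>
         <!-- @isB is typed as xs:boolean during assertion evaluation,
              so no explicit cast is needed -->
         <xs:assert test="if (@isB = true()) then b else not(b)"/>
      </xs:complexType>
   </xs:element>

</xs:schema>
```

This version declares the content model and the attribute once, which is where the saving in lines comes from. The trade-off is that a failure is reported as a failed assertion rather than as a content model violation against a specific selected type.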

Mukul Gandhi