Saturday, March 21, 2020

Using XML Schema 1.1 <alternative> with Xerces-J

I'd like to share some information about Apache Xerces-J's implementation of XML Schema (XSD) 1.1 'type alternatives'.

The XSD 1.1 specification defines a particular subset of the XPath 2.0 language that can be used as the value of the 'test' attribute of the XSD 1.1 <alternative> element. This subset is much smaller than the full XPath 2.0 language. Its specification can be read at https://www.w3.org/TR/xmlschema11-1/#coss-ta (specifically, the clause beginning '2.1 It conforms to the following extended BNF', which gives the grammar of the CTA XPath subset).

In fact, the XSD 1.1 specification allows XSD validators that implement the <alternative> element to support a larger set of XPath 2.0 features (commonly the full XPath 2.0 language) than the one defined by the XSD 1.1 CTA (conditional type assignment) XPath subset.

For XSD 1.1 CTAs, Xerces-J lets the user select, via an option, either:

1) The smaller XPath subset (the default for Xerces-J), or

2) Full XPath 2.0.

How to select between the XPath subset and the full XPath 2.0 language for Xerces-J's CTA implementation is described at https://xerces.apache.org/xerces2-j/faq-xs.html#faq-3.

I've analyzed the nature of the XSD 1.1 CTA XPath subset language a bit. The following are essentially the main patterns of the subset that may be used within XSD 1.1 schemas with the XSD <alternative> element:

1) Using comparators (like >, <, =, !=, <=, >=):

Example CTA XPath expressions are:
@x = @y,
@x = 3,
@x != 3,
@x > @y

2) Using comparators with logical operators:

Example CTA XPath expressions are:
(@x = @y) or (@p = @q),
((1 = 2) or (5 = 6)) and (5 = 7),
(1 and 2) or (5 and 7)

3) Using XPath 2.0 'not' function:

An example XPath expression is:
(@x = @y) and not(@p)

Interestingly, the XSD 1.1 CTA XPath subset language allows only the XPath 2.0 fn:not function and no other XPath 2.0 built-in functions. Constructor functions for all built-in XSD types (e.g. xs:integer(..), xs:boolean(..)) may, however, be used in XSD 1.1 CTA XPath subset expressions.
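The three patterns above can be combined in a single element declaration. The following is only a minimal sketch of how that might look. The element name 'shape', the attribute 'kind' and all type names are hypothetical, chosen purely for illustration, and plain string comparisons are used to keep the tests simple.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: the type of <shape> is chosen from its 'kind' attribute. -->
  <xs:element name="shape" type="ShapeBase">
    <!-- Pattern 1: a plain comparison -->
    <xs:alternative test="@kind = 'circle'" type="Circle"/>
    <!-- Pattern 2: comparisons joined by a logical operator -->
    <xs:alternative test="(@kind = 'square') or (@kind = 'rect')" type="Rect"/>
    <!-- Pattern 3: the 'not' function, true when 'kind' is absent -->
    <xs:alternative test="not(@kind)" type="Untagged"/>
  </xs:element>

  <!-- Declared type; every alternative type below is derived from it. -->
  <xs:complexType name="ShapeBase">
    <xs:attribute name="kind" type="xs:string"/>
  </xs:complexType>

  <xs:complexType name="Circle">
    <xs:complexContent>
      <xs:extension base="ShapeBase">
        <xs:attribute name="radius" type="xs:decimal" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="Rect">
    <xs:complexContent>
      <xs:extension base="ShapeBase">
        <xs:attribute name="width" type="xs:decimal" use="required"/>
        <xs:attribute name="height" type="xs:decimal" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="Untagged">
    <xs:complexContent>
      <xs:extension base="ShapeBase">
        <xs:attribute name="label" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

With this sketch, an instance such as <shape kind="circle" radius="2.5"/> would be expected to be assigned the hypothetical type 'Circle', while a 'kind' value that matches none of the tests would fall back to the declared type 'ShapeBase'.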

As per the XSD 1.1 specification, XML element and attribute nodes are untyped during XSD 1.1 CTA evaluation (i.e. the XML nodes do not carry any type annotation coming from an XML schema). Therefore, in many cases, XSD 1.1 CTA XPath subset expressions used with Xerces-J need explicit casts (e.g. <xs:alternative test="(xs:integer(@x) = xs:integer(@y)) and fn:not(xs:boolean(@p))">, with the namespace prefix 'fn' bound to the URI 'http://www.w3.org/2005/xpath-functions'). For both the CTA XPath subset language and the full XPath 2.0 language for CTAs, the "fn" prefix on XPath built-in functions is optional. Typically, XML schema authors would not use the "fn" prefix for XPath built-in functions.
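To show the explicit casts in context, here is a small schema sketch. The element name 'range', its attributes 'min', 'max' and 'open', and the type names are all assumptions made up for this illustration. The 'fn' prefix is left out, as most schema authors would do.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: attributes are untyped during CTA evaluation,
       so each one is cast explicitly before it is compared. -->
  <xs:element name="range" type="RangeBase">
    <!-- Same bounds and not open: treat the range as a single point -->
    <xs:alternative
        test="(xs:integer(@min) = xs:integer(@max)) and not(xs:boolean(@open))"
        type="PointRange"/>
    <!-- Lower bound above upper bound -->
    <xs:alternative
        test="xs:integer(@min) > xs:integer(@max)"
        type="InvertedRange"/>
  </xs:element>

  <!-- Declared type, used when no alternative's test is true. -->
  <xs:complexType name="RangeBase">
    <xs:attribute name="min" type="xs:integer" use="required"/>
    <xs:attribute name="max" type="xs:integer" use="required"/>
    <xs:attribute name="open" type="xs:boolean"/>
  </xs:complexType>

  <xs:complexType name="PointRange">
    <xs:complexContent>
      <xs:extension base="RangeBase">
        <xs:attribute name="label" type="xs:string"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <xs:complexType name="InvertedRange">
    <xs:complexContent>
      <xs:extension base="RangeBase">
        <xs:attribute name="reason" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

</xs:schema>
```

Note that a '<' comparator inside the 'test' attribute would have to be written as '&lt;' to keep the schema document well-formed, whereas '>' can be written literally.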
