Showing posts with label xml.

Sunday, December 31, 2023

XSLT 3.0 grouping use case

This evening I've been working on improving the implementation of the XSLT 3.0 xsl:for-each-group instruction in the XalanJ prototype processor. The following is an xsl:for-each-group use case that I've been trying to solve.

XML input document,

<?xml version="1.0" encoding="utf-8"?>

<root>

  <a>

    <itm1>hi</itm1>

    <itm2>hello</itm2>

    <itm3>there</itm3>

  </a>

  <b>

    <itm1>this</itm1>

    <itm2>is</itm2>

    <itm3>nice</itm3>

  </b>

  <c>

    <itm1>hello</itm1>

    <itm2>friends</itm2>

  </c>

  <d>

    <itm1>this is ok</itm1>

  </d>

</root>

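XSLT 3.0 stylesheet, which uses the xsl:for-each-group instruction to group the child elements of "root" from the XML document above. The grouping key is a boolean value: true for an element that has exactly one or exactly three child elements, and false otherwise,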

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">
    <result>
      <xsl:for-each-group select="*" group-by="(count(*) eq 1) or (count(*) eq 3)">
        <group groupingCriteria="{if (current-grouping-key() eq true()) then '1,3' else 'not(1,3)'}">
          <xsl:copy-of select="current-group()"/>
        </group>
      </xsl:for-each-group>
    </result>
  </xsl:template>

</xsl:stylesheet>

The result of the above XSLT transformation, as produced by XalanJ, is the following,

<?xml version="1.0" encoding="UTF-8"?><result>

  <group groupingCriteria="1,3">

    <a>

    <itm1>hi</itm1>

    <itm2>hello</itm2>

    <itm3>there</itm3>

  </a>

    <b>

    <itm1>this</itm1>

    <itm2>is</itm2>

    <itm3>nice</itm3>

  </b>

    <d>

    <itm1>this is ok</itm1>

  </d>

  </group>

  <group groupingCriteria="not(1,3)">

    <c>

    <itm1>hello</itm1>

    <itm2>friends</itm2>

  </c>

  </group>

</result>

Achieving this kind of XML data grouping was very hard with the XSLT 1.0 language, which has no grouping instruction. Thank god, we have the XSLT 3.0 language available now.
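For comparison, following is a minimal sketch of how the same grouping might be approached with XSLT 1.0, using the well-known key-based ("Muenchian") grouping technique. The key name "byCount" and the variable name "k" are my own choices for illustration, and this sketch hasn't been run against XalanJ; it only aims to show how much more machinery XSLT 1.0 needs for the same task.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Index the children of "root" by the boolean grouping condition
       (the key value is the string 'true' or 'false'). -->
  <xsl:key name="byCount" match="/root/*" use="(count(*) = 1) or (count(*) = 3)"/>

  <xsl:template match="/root">
    <result>
      <!-- Visit only the first element of each group -->
      <xsl:for-each select="*[generate-id() = generate-id(key('byCount', (count(*) = 1) or (count(*) = 3))[1])]">
        <xsl:variable name="k" select="(count(*) = 1) or (count(*) = 3)"/>
        <group>
          <xsl:attribute name="groupingCriteria">
            <xsl:choose>
              <xsl:when test="$k">1,3</xsl:when>
              <xsl:otherwise>not(1,3)</xsl:otherwise>
            </xsl:choose>
          </xsl:attribute>
          <!-- Copy all the members of the current group -->
          <xsl:copy-of select="key('byCount', $k)"/>
        </group>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

With xsl:for-each-group, the key declaration, the generate-id() comparison and the xsl:choose for the attribute value all collapse into a single instruction and an attribute value template.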


Thursday, December 28, 2023

Managing complexity of XPath 3.1 'if' expressions, in the context of XSLT 3.0

I've been playing around with the following XSLT transformation example, and thought of sharing it as a blog post here.

Let's consider the following XSLT 3.0 stylesheet, which we'll use to transform the XML document shown after it,

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:xs="http://www.w3.org/2001/XMLSchema"

                         xmlns:fn0="http://fn0"

                         exclude-result-prefixes="xs fn0"

                         version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="date1" select="xs:date('2005-10-12')" as="xs:date"/>

  <xsl:template match="/root">

      <root>

          <xsl:copy-of select="if (fn0:func1($date1)) then a else b"/>

     </root>

  </xsl:template>

  <!-- A stylesheet function that computes a boolean value. Its result serves as the condition of the XPath 'if' expression within the xsl:copy-of instruction above, selecting which branch is evaluated. -->

  <xsl:function name="fn0:func1" as="xs:boolean">

     <xsl:param name="date1" as="xs:date"/>

     <xsl:sequence select="if (current-date() lt $date1)

                           then true()

                           else false()"/>

  </xsl:function>

</xsl:stylesheet>

The corresponding XML instance document is the following,

<?xml version="1.0" encoding="utf-8"?>

<root>

    <a/>

    <b/>

</root>

The two possible results of the above XSLT transformation, depending on the outcome of the XPath comparison current-date() lt $date1, are the following (the first when the comparison is false, the second when it is true):

<?xml version="1.0" encoding="UTF-8"?><root>

  <b/>

</root>

and,

<?xml version="1.0" encoding="UTF-8"?><root>

  <a/>

</root>

Within the above XSLT transformation example, we may observe how XPath 3.1 'if' expressions have been written to achieve the desired transformation results. The branching condition of an 'if' expression can be delegated to a stylesheet function, whose body may be significantly complex as long as it produces a boolean result.
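As a hedged sketch of a slightly more complex condition, the following stylesheet moves a date range check into a stylesheet function whose body uses xsl:choose. The function name fn0:inRange, its parameter names, and the date range are hypothetical and chosen only for illustration; this variant hasn't been run against the XalanJ prototype processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:fn0="http://fn0"
                exclude-result-prefixes="xs fn0"
                version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical date range, chosen only for this illustration -->
  <xsl:variable name="from" select="xs:date('2005-10-12')" as="xs:date"/>
  <xsl:variable name="to" select="xs:date('2030-01-01')" as="xs:date"/>

  <xsl:template match="/root">
    <root>
      <xsl:copy-of select="if (fn0:inRange(current-date(), $from, $to)) then a else b"/>
    </root>
  </xsl:template>

  <!-- Returns true when $d lies within [$start, $end]; an inverted range is treated as empty -->
  <xsl:function name="fn0:inRange" as="xs:boolean">
    <xsl:param name="d" as="xs:date"/>
    <xsl:param name="start" as="xs:date"/>
    <xsl:param name="end" as="xs:date"/>
    <xsl:choose>
      <xsl:when test="$end lt $start">
        <xsl:sequence select="false()"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="($d ge $start) and ($d le $end)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

</xsl:stylesheet>
```

The 'if' expression within xsl:copy-of stays as short as before, while the decision logic can grow inside the function.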

I hope that the above XSLT transformation example is useful.


Wednesday, December 27, 2023

XML data grouping with XSLT 3.0, illustrations

This morning I've been writing an XSLT 3.0 stylesheet that groups XML input data as follows, and I wish to share it with the XML and XSLT community.

XML input document,

<root>

  <a>

    <m/>

  </a>

  <b>

    <n/>

  </b>

  <a>

    <o/>

  </a>

  <a>

    <p/>

  </a>

  <a>

    <q/>

  </a>

  <b>

    <r/>

  </b>

  <b>

    <s/>

  </b>

</root>


XSLT 3.0 stylesheet, which groups the data of the XML document above (i.e., it groups the child elements of element "root"),

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"                

                         version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">

     <xsl:for-each-group select="*" group-by="name()">

        <xsl:element name="{current-grouping-key()}">

           <xsl:copy-of select="current-group()/*"/>

        </xsl:element>

     </xsl:for-each-group>

  </xsl:template>

</xsl:stylesheet>


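The output of this XSLT transformation is the following (it has two top-level elements, "a" and "b", because the stylesheet doesn't wrap the groups within a single result element),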

<?xml version="1.0" encoding="UTF-8"?><a>

  <m/>

  <o/>

  <p/>

  <q/>

</a><b>

  <n/>

  <r/>

  <s/>

</b>


The XML data grouping algorithm implemented by the above XSLT stylesheet is the following,

The child elements of element "root" are formed into groups on the basis of their element names, regardless of their position among the siblings. For this input document there are two groups, "a" and "b". For each group, the stylesheet emits one element named after the grouping key, containing copies of the children of all the group's members.
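To contrast with the group-by behaviour described above, following is a small sketch of the same stylesheet with group-adjacent instead of group-by. With group-adjacent, only consecutive siblings with the same name fall into one group, so for the same input document I'd expect four groups: "a" (m), "b" (n), "a" (o, p, q) and "b" (r, s). The wrapping "result" element is my own addition, and this variant hasn't been run against the XalanJ prototype processor.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/root">
    <!-- A single wrapper element keeps the output a well-formed XML document -->
    <result>
      <!-- group-adjacent starts a new group whenever the element name changes -->
      <xsl:for-each-group select="*" group-adjacent="name()">
        <xsl:element name="{current-grouping-key()}">
          <xsl:copy-of select="current-group()/*"/>
        </xsl:element>
      </xsl:for-each-group>
    </result>
  </xsl:template>

</xsl:stylesheet>
```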

I hope that this XSLT stylesheet example has been useful to study.

This XSLT stylesheet example has been tested with Apache XalanJ's XSLT 3.0 prototype processor.

Tuesday, September 12, 2023

XSLT 3.0, XPath 3.1 and XalanJ

It's been a while since I last wrote a blog post here. I have a few updates about the work that the XalanJ team has been doing over the past few months, which I wish to share with the XML community.

The XalanJ project provides XSLT and XPath processors written in the Java language. An XSLT processor transforms an XML input document (or even plain text files) into other formats like XML, HTML and text.

The XalanJ project released a new version (2.7.3) of XalanJ on 2023-04-01. This release is essentially a bug fix release over the previous one. The XalanJ 2.7.3 release was extensively tested by the XalanJ team, and it has very good compliance with the XSLT 1.0 and XPath 1.0 specs.

Since April 2023, the XalanJ team has been working on implementations of the XSLT 3.0 and XPath 3.1 language specifications. These codebase changes have not yet been released by the XalanJ team, but they are available on a XalanJ dev repos branch.

I further wish to write about the XSLT 3.0 user-defined callable component enhancements within XalanJ, which should be available in a future XalanJ release. The callable components of a programming language are essentially its functions and procedures. The XSLT 1.0 language has only one kind of user-defined callable component, which is written with the XML element xsl:template.

XSLT 3.0 provides another kind of user-defined callable component, defined with the XML element xsl:function. The xsl:function declaration was first made available in the XSLT 2.0 language. A user-defined function present within an XSLT stylesheet may be called from within an XPath expression.

Following is an example of an XSLT 3.0 stylesheet that makes use of an xsl:function element,

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         xmlns:ns0="http://ns0"
                         exclude-result-prefixes="ns0"
                         version="3.0">
    
    <xsl:output method="xml" indent="yes"/>
    
    <xsl:template match="/">       
         <result>
             <one>
                 <xsl:value-of select="ns0:func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="ns0:func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>
    
    <xsl:function name="ns0:func1">
         <xsl:param name="val1"/>
         <xsl:param name="val2"/>
         <xsl:param name="a"/>
         <xsl:param name="b"/>
       
         <xsl:value-of select="if ($val1 gt $val2) then ($a and $b) else ($a or $b)"/>
    </xsl:function>
    
</xsl:stylesheet>

The above XSLT stylesheet defines a user-defined function named "func1", bound to the specified non-null XML namespace. A call to this function requires four arguments, and the function produces a boolean result: ($a and $b) when $val1 is greater than $val2, and ($a or $b) otherwise.

The above XSLT stylesheet produces the following output with XalanJ,

<?xml version="1.0" encoding="UTF-8"?><result>
  <one>false</one>
  <two>true</two>
</result>

XPath 3.1 provides a new kind of callable component that wasn't available with XPath 1.0: the inline function expression. When evaluated by an XPath processor, it produces an XPath data model (XDM) function item.

An XPath 3.1 function item may be called via an XPath dynamic function call expression.

Following is an XSLT 3.0 stylesheet that specifies an XPath inline function expression, and is an alternative solution to the XSLT stylesheet cited above,

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         version="3.0">
    
    <xsl:output method="xml" indent="yes"/>
    
    <xsl:variable name="func1" select="function($val1, $val2, $a, $b) { if ($val1 gt $val2) then ($a and $b) else ($a or $b) }"/>
    
    <xsl:template match="/">       
         <result>
             <one>
                 <xsl:value-of select="$func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="$func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>
    
</xsl:stylesheet>

The above XSLT stylesheet specifies an XPath inline function expression, whose value is assigned to the XSLT variable "func1". This makes XPath expressions like $func1(..) function calls (which the XPath 3.1 language terms dynamic function calls).

The above XSLT stylesheet produces the same output with XalanJ as the earlier stylesheet.

It's perhaps also interesting to discuss which of the above XSLT callable component approaches an XSLT stylesheet author should choose.

An XPath 3.1 inline function expression is an *XPath expression*, therefore its function body is limited to XPath syntax only.

Whereas, xsl:function is an XSLT declaration (whose function may be invoked as a function call from within XPath expressions). The body of an xsl:function may have significantly more complex logic (with any permissible XSLT instructions and XPath expressions) than an XPath inline function expression.
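To make this difference concrete, following is a minimal sketch of an xsl:function whose body constructs new XML elements with XSLT instructions, something an XPath inline function expression cannot do, since XPath has no syntax for constructing elements. The function name ns0:makeItems, the "item" element and the "pos" attribute are hypothetical names chosen for illustration, and this sketch hasn't been run against the XalanJ prototype processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ns0="http://ns0"
                exclude-result-prefixes="xs ns0"
                version="3.0">

    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">
        <result>
            <xsl:copy-of select="ns0:makeItems(('x', 'y', 'z'))"/>
        </result>
    </xsl:template>

    <!-- Hypothetical function: wraps each string within a new "item" element -->
    <xsl:function name="ns0:makeItems" as="element(item)*">
        <xsl:param name="values" as="xs:string*"/>
        <xsl:for-each select="$values">
            <item pos="{position()}">
                <xsl:value-of select="."/>
            </item>
        </xsl:for-each>
    </xsl:function>

</xsl:stylesheet>
```

When the logic is a single XPath expression, as in the "func1" examples above, an inline function expression is the more compact choice.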

To conclude, I believe that when using XSLT 3.0 and XPath 3.1, we have the following three main kinds of user-defined callable components available to XSLT stylesheet authors,

1) xsl:template   (which is very important within an XSLT stylesheet, and is the core of an XSLT stylesheet)

2) xsl:function

3) XPath inline function expression

That's all I wished to say within this blog post.



Monday, April 10, 2023

XPath 2.0 quantified expressions. Implementation with XSLT 1.0

The XPath 2.0 language introduced new syntax and semantics as compared to the XPath 1.0 language, for example the XPath 2.0 quantified expressions.

Following is the XPath 2.0 grammar for quantified expressions (quoted from the XPath 2.0 language specification),

QuantifiedExpr    ::=    ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle

An XPath 2.0 quantified expression, when evaluated over a sequence of XPath data model items, returns a boolean value: either 'true' or 'false'.

I can suggest an XSLT 1.0 code pattern (tested with Apache XalanJ) that implements the logic of XPath 2.0 like quantified expressions. Following is an example illustrating these concepts,

XML input document:

<?xml version="1.0" encoding="UTF-8"?>

<elem>

  <a>5</a>

  <a>5</a>

  <a>4</a>

  <a>7</a>

  <a>5</a>

  <a>5</a>

  <a>7</a>

  <a>5</a>

</elem> 

XSLT 1.0 stylesheet, implementing the XPath 2.0 "every" like quantified expression (i.e, universal quantification):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">           

            <xsl:if test="number(.) &gt; 3">

              <yes/>

            </xsl:if>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="count(exslt:node-set($temp)/yes) = count(a)"/>

   </xsl:template>

</xsl:stylesheet>

The above XSLT stylesheet produces a boolean 'true' result if all XML "a" input elements have a value greater than 3; otherwise a boolean 'false' result is produced. For the XML input document above, the result is 'true'.

XSLT 1.0 stylesheet, implementing the XPath 2.0 "some" like quantified expression (i.e, existential quantification):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">           

            <xsl:if test="number(.) = 4">

              <yes/>

            </xsl:if>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="count(exslt:node-set($temp)/yes) &gt;= 1"/>

   </xsl:template>

</xsl:stylesheet>

The above XSLT stylesheet produces a boolean 'true' result if at least one XML "a" input element has a value equal to 4; otherwise a boolean 'false' result is produced. For the XML input document above, the result is 'true'.

Within the above XSLT 1.0 stylesheets, we've used the "node-set" extension function (which converts an XSLT 1.0 "result tree fragment" into a node set).
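As a side note, for simple conditions like these the two quantifiers can probably also be expressed with plain XPath 1.0 predicates, without an intermediate result tree fragment or the "node-set" extension function. Following is a minimal sketch under that assumption, written for the same XML input document:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- "every": true when no "a" element fails the condition -->
      <xsl:value-of select="not(a[not(number(.) &gt; 3)])"/>
      <xsl:text>&#10;</xsl:text>
      <!-- "some": true when at least one "a" element meets the condition -->
      <xsl:value-of select="boolean(a[number(.) = 4])"/>
   </xsl:template>

</xsl:stylesheet>
```

For the XML input document above, both lines of the output should be 'true'. The result tree fragment pattern remains useful when the per-item test needs XSLT instructions rather than a single XPath predicate.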

We can therefore conclude that within an XSLT 1.0 environment, we can largely simulate the logic of many XPath 2.0 language constructs.
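For comparison, with an XSLT 2.0 (or later) processor, the two checks can be written directly as XPath quantified expressions. Following is a minimal sketch for the same XML input document; it needs a processor that supports XPath 2.0 or later, so it isn't meant for XalanJ 2.7.3.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- Universal quantification -->
      <xsl:value-of select="every $x in a satisfies number($x) gt 3"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Existential quantification -->
      <xsl:value-of select="some $x in a satisfies number($x) eq 4"/>
   </xsl:template>

</xsl:stylesheet>
```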

Thursday, April 6, 2023

XSLT 1.0 transformation : find distinct values

In continuation of my previous blog post on this site, this blog post describes how to use the XSLT 1.0 language (tested with Apache XalanJ 2.7.3, along with its JavaScript extension function bindings) to find distinct values (i.e., to de-duplicate a data set) from a data set originating from an XML instance document.

Following is an XSLT transformation example, illustrating these features.

XML instance document:

<?xml version="1.0" encoding="UTF-8"?>

<elem>

  <a>2</a>

  <a>3</a>

  <a>3</a>

  <a>5</a>

  <a>3</a>

  <a>1</a>

  <a>2</a>

  <a>5</a>

</elem>

Corresponding XSLT 1.0 transformation:

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                xmlns:xalan="http://xml.apache.org/xalan"

                xmlns:js="http://js_functions"

                extension-element-prefixes="js"

                version="1.0">

   <xsl:output method="text"/>

   <xalan:component prefix="js" functions="reformString">

      <xalan:script lang="javascript">

        function reformString(str)

        {

           return str.substr(0, str.length - 1);

        }

      </xalan:script>

   </xalan:component>

   <xsl:template match="/elem">

      <xsl:if test="count(a) &gt; 0">

         <xsl:variable name="result">

            <xsl:call-template name="distinctValues">

               <xsl:with-param name="curr_node" select="a[1]"/>

               <xsl:with-param name="csv_result" select="concat(string(a[1]), ',')"/>

            </xsl:call-template>

         </xsl:variable>

         <xsl:value-of select="js:reformString(string($result))"/>

      </xsl:if>

   </xsl:template>

   <xsl:template name="distinctValues">

      <xsl:param name="curr_node"/>

      <xsl:param name="csv_result"/>

      <xsl:choose>

        <!-- Recurse while a node remains, so that the last "a" element is examined too. -->

        <xsl:when test="$curr_node">

           <xsl:variable name="temp1">

              <xsl:choose>

                 <!-- Compare whole comma-delimited tokens, so that e.g. "3" is not mistaken for a part of "13". -->

                 <xsl:when test="not(contains(concat(',', $csv_result), concat(',', string($curr_node), ',')))">

                    <xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>

                 </xsl:when>

                 <xsl:otherwise>

                    <xsl:value-of select="$csv_result"/>

                 </xsl:otherwise>

              </xsl:choose>

           </xsl:variable>

           <xsl:call-template name="distinctValues">

              <xsl:with-param name="curr_node" select="$curr_node/following-sibling::*[1]"/>

              <xsl:with-param name="csv_result" select="normalize-space($temp1)"/>

           </xsl:call-template>

        </xsl:when>

        <xsl:otherwise>

           <xsl:value-of select="$csv_result"/>

        </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

</xsl:stylesheet>

The above XSLT transformation produces the following desired result,

2,3,5,1

XalanJ users can find the JavaScript related jars (which need to be available on the JVM classpath at run-time during the XSLT transformation) within the XalanJ src distribution. The relevant jar files are: bsf.jar, commons-logging-1.2.jar and rhino-1.7.14.jar (Rhino is Mozilla's JavaScript engine implementation, bundled with the XalanJ 2.7.3 src distribution).
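If the JavaScript extension isn't available, the same distinct values can probably also be found with plain XSLT 1.0, by selecting only those "a" elements whose value doesn't occur in an earlier sibling. Following is a minimal sketch of this alternative (it isn't the approach used above, and it compares each element against all its preceding siblings, so it may be slow for large inputs):

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- Keep an "a" element only when no preceding sibling "a" has the same value -->
      <xsl:for-each select="a[not(. = preceding-sibling::a)]">
         <xsl:value-of select="."/>
         <!-- Emit a separator after every value except the last one -->
         <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
```

For the XML instance document above, this should also produce 2,3,5,1.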


Wednesday, April 5, 2023

XSLT 1.0 transformation : finding maximum from a list of numbers, from an XML input document

The Apache Xalan project released XalanJ 2.7.3 a few days ago, and I thought of writing a couple of blog posts here to report on the basic sanity of XalanJ 2.7.3's functional quality.

Following is a simple XML transformation requirement.

XML input document :

<?xml version="1.0" encoding="UTF-8"?>

<elem>

    <a>2</a>

    <a>3</a>

    <a>5</a>

    <a>1</a>

    <a>7</a>

    <a>4</a>

</elem>

We need to write an XSLT 1.0 stylesheet that outputs the maximum value from the list of XML "a" elements within the XML document above.

Following are the three XSLT 1.0 stylesheets that I've come up with, which do this correctly,

1)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">

           <xsl:sort select="." data-type="number" order="descending"/>

           <e1><xsl:value-of select="."/></e1>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="concat('Maximum : ', exslt:node-set($temp)/e1[1])"/>

   </xsl:template>

</xsl:stylesheet>

2)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      Maximum : <xsl:call-template name="findMax"/>

   </xsl:template>

   <xsl:template name="findMax">

      <xsl:variable name="temp">

         <xsl:for-each select="a">

            <xsl:sort select="." data-type="number" order="descending"/>

            <e1><xsl:value-of select="."/></e1>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="exslt:node-set($temp)/e1[1]"/>

   </xsl:template>

</xsl:stylesheet>

3)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:choose>

         <xsl:when test="count(a) = 0"/>

         <xsl:when test="count(a) = 1">

            Maximum : <xsl:value-of select="a[1]"/>

         </xsl:when>

         <xsl:otherwise>

            <xsl:variable name="result">

               <xsl:call-template name="findMax">

                  <xsl:with-param name="curr_max" select="a[1]"/>

                  <xsl:with-param name="next_node" select="a[2]"/>

               </xsl:call-template>

            </xsl:variable>

            Maximum :  <xsl:value-of select="$result"/> 

         </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

   <xsl:template name="findMax">

      <xsl:param name="curr_max"/>

      <xsl:param name="next_node"/>

      <xsl:choose>

         <xsl:when test="$next_node/following-sibling::*">

            <xsl:choose>

               <xsl:when test="number($next_node) &gt; number($curr_max)">

                  <xsl:call-template name="findMax">

                     <xsl:with-param name="curr_max" select="$next_node"/>

                     <xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>

                  </xsl:call-template>

               </xsl:when>

               <xsl:otherwise>

                  <xsl:call-template name="findMax">

                     <xsl:with-param name="curr_max" select="$curr_max"/>

                     <xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>

                  </xsl:call-template>

               </xsl:otherwise>

            </xsl:choose>

         </xsl:when>

         <xsl:otherwise>

            <xsl:choose>

               <xsl:when test="number($next_node) &gt; number($curr_max)">

                  <xsl:value-of select="$next_node"/>

               </xsl:when>

               <xsl:otherwise>

                  <xsl:value-of select="$curr_max"/>

               </xsl:otherwise>

            </xsl:choose>

         </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

</xsl:stylesheet>

I personally like XSLT solution 3) illustrated above for these requirements. This solution traverses the sequence of XML "a" elements till the end of the list, carrying the maximum value seen so far, and outputs the maximum at the end of the traversal. This solution seems to have an algorithmic time complexity of O(n), though with some additional overhead from the recursive XSLT template calls (roughly one call per "a" element) that the other two XSLT solutions don't have.

The XSLT solutions 1) and 2) illustrated above seem to have higher algorithmic time complexity than solution 3), due to their use of the xsl:sort instruction (which probably has an algorithmic time complexity of O(n * log(n)) or O(n * n)). Solutions 1) and 2) also seem to have higher algorithmic "space complexity" (which measures the memory used by the algorithm), due to the storage of the intermediate sorted result.
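For completeness, following is a sketch of a fourth possible approach, which uses a single XPath 1.0 predicate and needs neither xsl:sort, the "node-set" extension function, nor recursion. It selects the "a" elements for which no sibling "a" has a greater value. It is compact, but since each "a" element is compared with all the "a" elements, its time complexity is likely O(n * n) unless the processor optimizes the comparison.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- Select the "a" elements for which no sibling "a" has a greater value, and take the first of them -->
      <xsl:value-of select="concat('Maximum : ', a[not(. &lt; ../a)][1])"/>
   </xsl:template>

</xsl:stylesheet>
```

For the XML input document above, this should output "Maximum : 7".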

The XalanJ command line to run the above XSLT transformations is the following,

java org.apache.xalan.xslt.Process -in file.xml -xsl file.xsl


Wednesday, March 29, 2023

A simple XSLT stylesheet, XML document validator

I've been thinking that this would be interesting to share.

Please consider the following XSLT 1.0 document transformation definition.

XML input document:

<?xml version="1.0" encoding="UTF-8"?>

<root>

  <a>2</a>

  <a>4</a>

  <a>6</a>

  <a>8</a>

  <a>10</a>

</root>

We should be able to say that this XML document is valid if all the XML /root/a elements within it contain even numbers.

The following XSLT 1.0 stylesheet does just this XML document validation check,

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

    <!-- An XSLT stylesheet, that checks whether values of all XML 

         input /root/a elements have even numbers (in which case, the XML input 

         document is reported as valid). -->                            

    <xsl:output method="text"/>                

    <xsl:template match="/root">

       <xsl:variable name="result">

          <xsl:for-each select="a">

             <e1><xsl:value-of select=". mod 2"/></e1>

          </xsl:for-each>

       </xsl:variable>

       <xsl:choose>

          <xsl:when test="count(exslt:node-set($result)/*[. = 0]) = count(exslt:node-set($result)/*)">

             <xsl:text>XML document is valid</xsl:text>

          </xsl:when>

          <xsl:otherwise>

             <xsl:text>XML document is invalid</xsl:text>

          </xsl:otherwise>

       </xsl:choose>

    </xsl:template>

</xsl:stylesheet>

Please note that within the above XSLT 1.0 stylesheet, we've used the XSLT 1.0 extension function "node-set", which is supported by most XSLT 1.0 engines (for example, by XalanJ as described here https://xalan.apache.org/xalan-j/apidocs/org/apache/xalan/lib/ExsltCommon.html).
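Where the "node-set" extension function isn't available, the same check can probably be written with a single XPath 1.0 predicate that looks for any "a" element that is not even. Following is a minimal sketch of this alternative; a non-numeric "a" value evaluates to NaN within the predicate and is therefore reported as invalid too.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

    <xsl:output method="text"/>

    <xsl:template match="/root">
       <xsl:choose>
          <!-- Valid when no "a" element has a non-zero remainder modulo 2 -->
          <xsl:when test="not(a[. mod 2 != 0])">
             <xsl:text>XML document is valid</xsl:text>
          </xsl:when>
          <xsl:otherwise>
             <xsl:text>XML document is invalid</xsl:text>
          </xsl:otherwise>
       </xsl:choose>
    </xsl:template>

</xsl:stylesheet>
```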

For the interest of readers, following is an equivalent XML Schema 1.1 validation that solves the same problem,

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="root">

      <xs:complexType>

         <xs:sequence>

            <xs:element name="a" type="xs:integer" maxOccurs="unbounded"/>

         </xs:sequence>

         <xs:assert test="count(a) = count(a[. mod 2 = 0])"/>

      </xs:complexType>

   </xs:element>

</xs:schema>

Personally speaking, I'd prefer XML Schema 1.1 validation for this requirement, since the XML Schema language is designed for XML document validation, whereas the XSLT language is designed for XML document transformation (but as illustrated within this blog post, an XSLT stylesheet can do the job of an XML document validator as well).


Monday, May 30, 2022

XML Schema : identity constraints essentials and best practices

In this blog post, I'll attempt to describe best practices for the use of XML Schema (XSD) identity constraints. I'm going to compare the XSD identity constraint elements xs:unique and xs:key, and describe when to use which one.

The XSD xs:key serves the same purpose within XML as primary keys do in an RDBMS, whereas XSD xs:unique is a more general construct for enforcing unique values within a set of XML data values. xs:key also enforces unique values within a set of XML data values. Unlike xs:key, xs:unique permits values within an XML data set to be absent (i.e., logically speaking, to be null values).

Please consider following XML Schema validation example.

XML Schema document:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="catalog" type="CatalogType">

      <xs:unique name="prodNumKey">

         <xs:selector xpath="*/product"/>

         <xs:field xpath="number"/>

      </xs:unique>

   </xs:element>

   <xs:complexType name="CatalogType">

      <xs:sequence>

         <xs:element name="department" maxOccurs="unbounded">

            <xs:complexType>

               <xs:sequence>

                  <xs:element name="product" maxOccurs="unbounded">

                     <xs:complexType>

                        <xs:sequence>

                           <xs:element name="number" type="xs:positiveInteger" minOccurs="0"/>

                           <xs:element name="name" type="xs:string"/>

                           <xs:element name="price">

                              <xs:complexType>

                                 <xs:simpleContent>

                                    <xs:extension base="xs:decimal">

                                       <xs:attribute name="currency" type="xs:string"/>

                                    </xs:extension>

                                 </xs:simpleContent>

                              </xs:complexType>

                           </xs:element>

                        </xs:sequence>

                     </xs:complexType>   

                  </xs:element>

               </xs:sequence>

               <xs:attribute name="number" type="xs:positiveInteger"/>

            </xs:complexType>

         </xs:element>

      </xs:sequence>

   </xs:complexType>

</xs:schema>

One valid XML instance document for the above XSD schema is the following:

<catalog>

  <department number="021">

    <product>

      <number>557</number>

      <name>Short-Sleeved Linen Blouse</name>

      <price currency="USD">29.99</price>

    </product>

    <product>

      <name>Ten-Gallon Hat</name>

      <price currency="USD">69.99</price>

    </product>

    <product>

      <number>443</number>

      <name>Deluxe Golf Umbrella</name>

      <price currency="USD">49.99</price>

    </product>

  </department>

</catalog>

Please note the following within the above XML Schema validation example,

1) Within the XML Schema document, the "number" child of "product" is specified as following,

<xs:element name="number" type="xs:positiveInteger" minOccurs="0"/>

(i.e, with minOccurs="0", meaning that this element is optional within the corresponding XML instance document)

2) The "catalog" element has following XSD xs:unique definition bound to it,

<xs:unique name="prodNumKey">

     <xs:selector xpath="*/product"/>

     <xs:field xpath="number"/>

</xs:unique>

The above facts mean that the "number" element is not intended to function as the primary key of the "product" data set (because a primary key value has to be present within all the records of the data set). However, for those "number" elements that are present within the XML instance document (the "number" element can be absent within certain "product" elements, as per the above XML Schema document), the values have to be unique.

We've discussed the role of the XSD xs:unique element within the above paragraphs.


Now, as stated earlier within this blog post, how do we enforce primary key kind of behavior within an XML Schema document?

Within the context of the above example, this can be done simply by changing the "number" element declaration to the following (i.e., we must not write minOccurs="0" within the XML element declaration),

<xs:element name="number" type="xs:positiveInteger"/>

And write the "catalog" element declaration as follows (i.e., we now use xs:key instead of xs:unique),

<xs:element name="catalog" type="CatalogType">

      <xs:key name="prodNumKey">

         <xs:selector xpath="*/product"/>

         <xs:field xpath="number"/>

      </xs:key>

</xs:element>

The above changes to the XML Schema document mean that,

All "product" elements must have a "number" child, and all the "number" values within the XML instance document have to be unique (these characteristics make the "number" element a primary key for the "product" data set).
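To illustrate the difference between the two schema versions, following is a variation of the XML instance document from earlier, which I've modified by hand so that it violates the constraints. The comments describe how I'd expect each version of the schema to treat each "product"; the document hasn't been run through a validator.

```xml
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <!-- No "number" child: accepted by the xs:unique version of the schema
           (minOccurs="0"), rejected by the xs:key version. -->
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <!-- Repeats the value 557: rejected by both versions of the schema. -->
      <number>557</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>
```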


The XML Schema features related to the xs:unique and xs:key constructs, described within this blog post, are supported by both the 1.0 and 1.1 versions of the XML Schema language.


Acknowledgements : The XML Schema validation example, mentioned within this blog post is borrowed from Priscilla Walmsley's excellent book "Definitive XML Schema, 2nd edition".



Thursday, February 24, 2022

XML Schema 1.1 : <assertion> facet with attribute "fixed"

I've come up with an XML Schema 1.1 example, involving XSD <assertion> facet and XSD attribute "fixed", that I thought should be interesting to write about.

Please consider, following two XML instance documents,

XML document 1:
<?xml version="1.0"?>
<Test>
    <A>a</A>
    <country>USA</country>
    <C>c</C>
</Test>

XML document 2:
<?xml version="1.0"?>
<Test>
    <A>a</A>
    <country>U S A</country>
    <C>c</C>
</Test>

In "XML document 1" above, the element "country" needs to have the fixed value "USA". In "XML document 2", the element "country" also needs to have the fixed value "USA", but with any amount of whitespace allowed anywhere within the string value.

The XSD 1.1 schema, for "XML document 1" is following (the schema specified below, is a valid XSD 1.0 schema as well),

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Test" type="TestType" />
   
    <xs:complexType name="TestType">
        <xs:sequence>
            <xs:element name="A" type="xs:string"/>
            <xs:element name="country" type="xs:string" fixed="USA"/>            
            <xs:element name="C" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

Whereas, XSD 1.1 schema, for "XML document 2" is following,

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    
    <xs:element name="Test" type="TestType" />
   
    <xs:complexType name="TestType">
        <xs:sequence>
            <xs:element name="A" type="xs:string"/>
            <xs:element name="country">
               <xs:simpleType>
                  <xs:restriction base="xs:string">
                    <xs:assertion test="replace($value, '\s', '') = 'USA'"/>
                  </xs:restriction>
               </xs:simpleType>
            </xs:element>
            <xs:element name="C" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

</xs:schema>

With the latter schema above, the XSD 1.1 <assertion> facet lets us achieve a special notion of a fixed value: here, the value "USA" with whitespace ignored.
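The whitespace-insensitive check performed by the assertion can be mimicked in Python as a quick sanity test. This is only a sketch: the function name is made up for this example, and Python's \s matches more whitespace characters than the XSD/XPath \s does, so it is not an exact equivalent.

```python
import re

def is_usa_ignoring_whitespace(value):
    # Same idea as the XPath test replace($value, '\s', '') = 'USA'.
    return re.sub(r"\s", "", value) == "USA"

for candidate in ["USA", "U S A", " U\tS\nA ", "US"]:
    print(repr(candidate), is_usa_ignoring_whitespace(candidate))
```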

Saturday, January 29, 2022

XML Schema 1.1 : conditional inclusion

I've been wanting to write something about the XML Schema (XSD) 1.1 conditional inclusion feature. This feature is described here: https://www.w3.org/TR/xmlschema11-1/#cip. I'm copying below some relevant description of this feature from the XML Schema 1.1 specification,

<quote>
Whenever a conforming XSD processor reads a ·schema document· in order to include the components defined in it in a schema, it first performs on the schema document the pre-processing described in this section.

Every element in the ·schema document· is examined to see whether any of the attributes vc:minVersion, vc:maxVersion, vc:typeAvailable, vc:typeUnavailable, vc:facetAvailable, or vc:facetUnavailable appear among its [attributes].

Where they appear, the attributes vc:minVersion and vc:maxVersion are treated as if declared with type xs:decimal, and their ·actual values· are compared to a decimal value representing the version of XSD supported by the processor (here represented as a variable V). For processors conforming to this version of this specification, the value of V is 1.1.

If V is less than the value of vc:minVersion, or if V is greater than or equal to the value of vc:maxVersion, then the element on which the attribute appears is to be ignored, along with all its attributes and descendants. The effect is that portions of the schema document marked with vc:minVersion and/or vc:maxVersion are retained if vc:minVersion ≤ V < vc:maxVersion.
</quote>

I'll present below a small XML Schema validation example (as tested with Apache Xerces XML Schema 1.1 processor), about XSD 1.1 conditional inclusion.

Following is an XML instance document, that'll be validated by an XML Schema document,

<val>5</val>

One of the validations, that we want to do is that, an integer value of element "val" must be an even number.

Following is an XML Schema document, that'll validate the above cited XML instance document,

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
                    xmlns:vc="http://www.w3.org/2007/XMLSchema-versioning">

  <xs:element name="val" type="Integer"/>
  
  <xs:simpleType name="Integer" vc:minVersion="1" vc:maxVersion="1.05">
      <xs:restriction base="xs:integer"/>
  </xs:simpleType>
  
  <xs:simpleType name="Integer" vc:minVersion="1.1">
      <xs:restriction base="xs:integer">
         <xs:assertion test="$value mod 2 = 0"/>
      </xs:restriction>
  </xs:simpleType>

</xs:schema>

Within the schema document above, there's an element declaration for the XML element "val", which is of schema type "Integer". There are two variants of the schema type "Integer" defined in this schema. One "Integer" type simply says that the value should be an xs:integer (the type with attributes vc:minVersion="1" vc:maxVersion="1.05"). The other "Integer" type says that the value should be an even integer (the type with attribute vc:minVersion="1.1").

When we perform the above XML schema validation using an XSD 1.1 processor in XML Schema 1.0 mode, a valid outcome is reported (because the simpleType with attributes vc:minVersion="1" vc:maxVersion="1.05" is selected, and the other simpleType definition is filtered out during conditional inclusion pre-processing).

Whereas, when we perform the same validation using an XSD 1.1 processor in XML Schema 1.1 mode, an invalid outcome is reported (because the simpleType with attribute vc:minVersion="1.1" is selected, and the other simpleType definition is filtered out during conditional inclusion pre-processing).

Please note that, when the above mentioned XML schema validation is done with a pure XML Schema 1.0 processor (that's bundled with Apache XercesJ as well) that was written for the XML Schema 1.0 specification https://www.w3.org/TR/xmlschema-1/, the above cited XSD document won't compile successfully (because, with a pure XSD 1.0 processor, we cannot have within a schema document two global type definitions with same name; "Integer" for the above cited schema document).
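The retention rule quoted from the specification (vc:minVersion ≤ V < vc:maxVersion) can be written down as a small Python function, to see which "Integer" definition survives for a given V. This is a sketch under one assumption: that a processor running in XML Schema 1.0 mode uses V = 1.0. The function and variable names are made up for this example.

```python
from decimal import Decimal

def is_retained(v, min_version=None, max_version=None):
    """Return True if an element carrying these vc: attributes is kept for version v."""
    v = Decimal(v)
    if min_version is not None and v < Decimal(min_version):
        return False
    if max_version is not None and v >= Decimal(max_version):
        return False
    return True

# The two "Integer" definitions from the schema above.
plain_integer = {"min_version": "1", "max_version": "1.05"}
even_integer = {"min_version": "1.1"}

# V = 1.0 for XML Schema 1.0 mode is an assumption of this sketch.
for v in ["1.0", "1.1"]:
    print(v, is_retained(v, **plain_integer), is_retained(v, **even_integer))
```

For each V exactly one of the two definitions is retained, which is why the schema has a single "Integer" type after pre-processing.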

Tuesday, January 18, 2022

XML Schema 1.1 : using regex

I've been thinking about this for a while, and thought of writing a blog post here, about this.

Consider the following, XML document instance,

<?xml version="1.0"?>
<temp>ABCABD</temp>

And the following, XML Schema (XSD) 1.1 document (that'll validate the above mentioned, XML document instance),

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  
  <xs:element name="temp">
      <xs:simpleType>
         <xs:restriction base="xs:string">
            <xs:pattern value="(ABC)+"/>
            <xs:assertion test="matches($value, '(ABC)+')"/>
         </xs:restriction>
      </xs:simpleType>
  </xs:element>
  
</xs:schema>

At first thought, looking at the XSD 1.1 document above, it might seem that both the <xs:pattern> and the <xs:assertion> would fail validation for the XML document instance value "ABCABD" (both regexes require the string "ABC" repeated one or more times).

But in reality, and according to the XSD 1.1 specification, the XML document instance value "ABCABD" is invalid for the <xs:pattern> but valid for the <xs:assertion>. That's because the XPath 2.0 "matches(..)" function returns true when any substring matches the regex, unless the regex is anchored with the ^ and $ characters.

Therefore, for the above cited XSD 1.1 example, the following are exactly equivalent XSD validation checks,
<xs:pattern value="(ABC)+"/>
<xs:assertion test="matches($value, '^(ABC)+$')"/>

And for <xs:pattern>, there's no explicit regex anchoring with ^ and $ available (it's always implied), i.e., with <xs:pattern>, it's always the entire input string that is checked against the pattern regex.
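The same substring versus whole-string distinction exists in other regex libraries, so it can be tried out quickly outside an XSD processor. The following Python lines are only an analogy (Python's regex dialect is not the XSD/XPath one): re.search behaves like an unanchored "matches(..)", and re.fullmatch behaves like <xs:pattern>.

```python
import re

value = "ABCABD"

# Like matches($value, '(ABC)+'): succeeds if any substring matches.
print(bool(re.search(r"(ABC)+", value)))      # True

# Like <xs:pattern value="(ABC)+"/>: the whole string must match.
print(bool(re.fullmatch(r"(ABC)+", value)))   # False

# Like matches($value, '^(ABC)+$'): explicit anchors give the whole-string check.
print(bool(re.search(r"^(ABC)+$", value)))    # False
```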

Wednesday, June 9, 2021

XML Schema xsi:type and xs:alternative

After studying XML Schema's xsi:type attribute and the xs:alternative element (introduced in XML Schema 1.1) a little more deeply, I've come to the conclusion that there are a lot of functional similarities between xsi:type and xs:alternative, and of course differences as well. To illustrate these points, I've come up with the following XML Schema and XML document instance examples (that I shall also attempt to explain within this blog post).


XML Schema document 1 (conforming to XSD 1.1)

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="note" type="NoteType"/>

    <xs:complexType name="NoteType">

       <xs:sequence>

          <xs:element name="to" type="xs:string"/>

          <xs:element name="from" type="xs:string"/>

          <xs:element name="heading" type="xs:string"/>

          <xs:element name="body" type="xs:string"/>

       </xs:sequence>

    </xs:complexType>

    <xs:complexType name="NoteType2">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="NoteType3">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:assert test="to castable as emailAddress"/>

             <xs:assert test="from castable as emailAddress"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:simpleType name="emailAddress"> 

       <xs:restriction base="xs:string"> 

         <xs:pattern value="[^@]+@[^@\.]+(\.[^@\.]+)+"/>

       </xs:restriction> 

    </xs:simpleType>

</xs:schema>


Following are three XML document instances, that are valid with above specified XML Schema document:

XML document instance 1

<note>

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

XML document instance 2

<note isConfidential="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="NoteType2">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

XML document instance 3

<note isConfidential="true" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="NoteType3">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

The "XML document instance 1", is an XML document that is valid according to an XSD element declaration and an XSD type definition "NoteType".

The "XML document instance 2" asserts that the type of an XML instance element "note" must be "NoteType2".

The "XML document instance 3" asserts that the type of an XML instance element "note" must be "NoteType3".

Note that, as per XML Schema language, the XSD type named as a value of xsi:type attribute, must be validly substitutable for the declared type (i.e, which is associated within an XML schema) of an XML element. According to the XML Schema language, a type S is validly substitutable for type T, if type S is a type derived from type T.


Now consider another XML Schema document, as following,

XML Schema document 2 (conforming to XSD 1.1)

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="note" type="NoteType">

       <xs:alternative test="@noteType2 = true()" type="NoteType2"/>

       <xs:alternative test="@noteType3 = true()" type="NoteType3"/>

    </xs:element>

    <xs:complexType name="NoteType">

       <xs:sequence>

          <xs:element name="to" type="xs:string"/>

          <xs:element name="from" type="xs:string"/>

          <xs:element name="heading" type="xs:string"/>

          <xs:element name="body" type="xs:string"/>

       </xs:sequence>

    </xs:complexType>

    <xs:complexType name="NoteType2">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType2" type="xs:boolean" use="required"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="NoteType3">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType3" type="xs:boolean" use="required"/>

             <xs:assert test="to castable as emailAddress"/>

             <xs:assert test="from castable as emailAddress"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:simpleType name="emailAddress"> 

       <xs:restriction base="xs:string"> 

         <xs:pattern value="[^@]+@[^@\.]+(\.[^@\.]+)+"/>

       </xs:restriction> 

    </xs:simpleType>

</xs:schema>


Following are two XML document instances, that are valid with above specified XML Schema document:

XML document instance 4

<note isConfidential="true" noteType2="true">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

XML document instance 5

<note isConfidential="true" noteType3="true">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>


I think that XML Schema documents 1 and 2, as illustrated in the examples above, solve the same XML document validation problem in two different ways. With the XSD element xs:alternative, we need to introduce a new physical XML attribute like "noteType2" or "noteType3", whereas with the other solution we achieve the same effect using the xsi:type attribute.


Following is another XML Schema 1.1 document, which is a slight variation of "XML Schema document 2" specified earlier,

XML Schema document 3

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="note" type="NoteType">

       <xs:alternative test="@noteType = 2" type="NoteType2"/>

       <xs:alternative test="@noteType = 3" type="NoteType3"/>

    </xs:element>

    <xs:complexType name="NoteType">

       <xs:sequence>

          <xs:element name="to" type="xs:string"/>

          <xs:element name="from" type="xs:string"/>

          <xs:element name="heading" type="xs:string"/>

          <xs:element name="body" type="xs:string"/>

       </xs:sequence>

    </xs:complexType>

    <xs:complexType name="NoteType2">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType" type="NoteTypeVal" use="required"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:complexType name="NoteType3">

       <xs:complexContent>

          <xs:extension base="NoteType">

             <xs:attribute name="isConfidential" type="xs:boolean" use="required"/>

             <xs:attribute name="noteType" type="NoteTypeVal" use="required"/>

             <xs:assert test="to castable as emailAddress"/>

             <xs:assert test="from castable as emailAddress"/>

          </xs:extension>

       </xs:complexContent>

    </xs:complexType>

    <xs:simpleType name="emailAddress"> 

       <xs:restriction base="xs:string"> 

         <xs:pattern value="[^@]+@[^@\.]+(\.[^@\.]+)+"/>

       </xs:restriction> 

    </xs:simpleType>

    <xs:simpleType name="NoteTypeVal"> 

       <xs:restriction base="xs:positiveInteger"> 

          <xs:minInclusive value="2"/>

          <xs:maxInclusive value="3"/>

       </xs:restriction> 

    </xs:simpleType>

</xs:schema>


Two valid XML instance documents, with the above mentioned XML Schema document are following,

<note isConfidential="true" noteType="2">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>

<note isConfidential="true" noteType="3">

   <to>abc.pqr@gmail.com</to>

   <from>no-reply@gmail.com</from>

   <heading>hi</heading>

   <body>this is test</body>

</note>


With the XML Schema document "XML Schema document 3" specified above, we've defined an attribute "noteType" for both the types "NoteType2" and "NoteType3". We distinguish within the XML instance document, with which XSD type the "note" element would be validated, by the value of attribute "noteType" within the XML instance document.
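The way the type is picked for "XML Schema document 3" can be sketched in Python as follows. This is a simplified model, not a validator: the alternatives are tried in document order, the first one whose test is true supplies the type, and otherwise the declared type "NoteType" applies. The helper names are made up for this example, and the attribute value is compared numerically, roughly as XPath does for an untyped attribute compared with a number.

```python
def select_type(attributes, alternatives, declared_type):
    # The first alternative whose test is true wins; otherwise the declared type applies.
    for test, type_name in alternatives:
        if test(attributes):
            return type_name
    return declared_type

def note_type_is(n):
    # Rough stand-in for test="@noteType = n"; a missing attribute makes the test false.
    return lambda attrs: "noteType" in attrs and float(attrs["noteType"]) == n

ALTERNATIVES = [(note_type_is(2), "NoteType2"), (note_type_is(3), "NoteType3")]

print(select_type({"isConfidential": "true", "noteType": "2"}, ALTERNATIVES, "NoteType"))
print(select_type({"isConfidential": "true", "noteType": "3"}, ALTERNATIVES, "NoteType"))
print(select_type({}, ALTERNATIVES, "NoteType"))
```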

Also note that, as per XML Schema 1.1 specification for type alternatives (i.e when having xs:alternative elements within XSD documents), the following must be applicable,

For each type T of sibling xs:alternative elements within an XSD document, type T must be validly derived from an element's default type definition (this is a constraint similar to those for xsi:type), or T can be type xs:error.  

Sunday, May 3, 2020

Online XML Schema validation service

During some of my spare time, I've developed and deployed an 'online XML Schema validation service' using Apache Xerces-J as XML Schema (XSD) processor at back-end. This 'online XML Schema validation service' is located at, http://www.softwarebytes.org/xmlvalidation/. The HTTPS version is available here: https://www.softwarebytes.org/xmlvalidation/.

The mentioned 'online XML Schema validation service', also provides REST APIs to be invoked from any program that can issue HTTP POST requests. The 'online XML Schema validation service' referred above, provides downloadable examples written in Python and C# that use the provided REST APIs. The responses from mentioned REST APIs can be in following formats: XML, JSON, plain text (the REST API response format, can be set while issuing HTTP requests).

Interestingly, I've discovered that the above mentioned REST APIs can be invoked directly with a tool like curl, using its platform binary. On modern operating systems (e.g., Windows 10), curl comes pre-installed. Following are the responses on the command line for a few curl requests that I issued to the mentioned REST APIs,

curl --form xmlFile=@two_inp_files/x1_valid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=xml https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<validationReport>
   <xsdVer>1.1</xsdVer>
   <success>
      <message>XML document is assessed as valid with the XSD document(s) that were provided.</message>
   </success>
</validationReport>

curl --form xmlFile=@two_inp_files/x1_invalid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=xml https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<validationReport>
   <xsdVer>1.1</xsdVer>
   <failure>
      <message>XML document is assessed as invalid with the XSD document(s) that were provided.</message>
      <details>
         <detail_1>[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation ('if (@isB = true()) then b else not(b)') for element 'X' on schema type '#AnonType_X' did not succeed.</detail_1>
      </details>
   </failure>
</validationReport>

curl --form xmlFile=@two_inp_files/x1_valid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=json https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

{
    "xsdVer": "1.1",
    "success": {"message": "XML document is assessed as valid with the XSD document(s) that were provided."}
}

curl --form xmlFile=@two_inp_files/x1_invalid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=json https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

{
    "xsdVer": "1.1",
    "failure": {
        "details": ["[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation ('if (@isB = true()) then b else not(b)') for element 'X' on schema type '#AnonType_X' did not succeed."],
        "message": "XML document is assessed as invalid with the XSD document(s) that were provided."
    }
}

curl --form xmlFile=@input_small.xml --form xsdFile1=@assert_2.xsd --form ver=1.1 --form xsd11CtaFullXPath=no https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

You selected XSD 1.1 validation.
XML document is assessed as valid with the XSD document(s) you have provided.

(please note that, since the last curl request above doesn't specify a command line argument 'responseType', a response formatted as plain text is received from the server API. i.e, a plain text response from this API, is the default response format)
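A client program would typically read the JSON form of the response rather than print it. The following Python fragment is a minimal sketch of that step, assuming only the response shape shown in the curl examples above ("success" or "failure" objects with "message" and optional "details"). It does not call the service, and the function name is made up for this example.

```python
import json

def summarize(response_text):
    """Return (is_valid, message, details) for a JSON report shaped like the ones above."""
    report = json.loads(response_text)
    if "success" in report:
        return True, report["success"]["message"], []
    failure = report["failure"]
    return False, failure["message"], failure.get("details", [])

# Shortened copy of the failure response shown above.
FAILURE_RESPONSE = """
{
    "xsdVer": "1.1",
    "failure": {
        "details": ["[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation did not succeed."],
        "message": "XML document is assessed as invalid with the XSD document(s) that were provided."
    }
}
"""

is_valid, message, details = summarize(FAILURE_RESPONSE)
print(is_valid)
print(message)
for detail in details:
    print(" -", detail)
```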

The mentioned 'online XML Schema validation service', supports both 1.0 and 1.1 versions of XML Schema language.

Saturday, March 21, 2020

Using XML Schema 1.1 <alternative> with Xerces-J

I wish to share a little information here about Apache Xerces-J's implementation of XML Schema (XSD) 1.1 'type alternatives'.

The XSD 1.1 specification, defines a particular subset of XPath 2.0 language that can be used as value of 'test' attribute of XSD 1.1 <alternative> element. The XSD 1.1 language's XPath 2.0 subset is much smaller than the whole XPath 2.0 language. The specification of this smaller CTA XPath subset, can be read at https://www.w3.org/TR/xmlschema11-1/#coss-ta (specifically, the section mentioning '2.1 It conforms to the following extended BNF' which has grammar specification for the CTA XPath subset).

In fact, the XSD 1.1 specification allows XSD validators implementing the XSD 1.1 <alternative> element to support a bigger set of XPath 2.0 features (commonly the full XPath 2.0 language) than what is defined by the XSD 1.1 CTA (conditional type assignment) XPath subset.

For XSD 1.1 CTAs, Xerces-J lets the user select, via an option, either:

1) The smaller XPath subset (the default for Xerces-J), or

2) Full XPath 2.0. How selecting between XPath subset or the full XPath 2.0 language, can be done for Xerces-J's CTA implementation is described here, https://xerces.apache.org/xerces2-j/faq-xs.html#faq-3.

I've analyzed a bit, the nature of XSD 1.1 CTA XPath subset language. Following are essentially the main XSD 1.1 CTA XPath subset patterns, that may be used within XSD 1.1 schemas when using XSD <alternative> element,

1) Using comparators (like >, <, =, !=, <=, >=):

The example CTA XPath expressions are following,
@x = @y,
@x = 3,
@x != 3,
@x > @y

2) Using comparators with logical operators:

The example CTA XPath expressions are following,
(@x = @y) or (@p = @q),
((1 = 2) or (5 = 6)) and (5 = 7),
(1 and 2) or (5 and 7)

3) Using XPath 2.0 'not' function:

An example XPath expression is following,
(@x = @y) and not(@p)

Interestingly, the XSD 1.1 CTA XPath subset language allows using only the XPath 2.0 fn:not function and no other XPath 2.0 built-in functions. Constructor functions for all built-in XSD types (e.g., xs:integer(..), xs:boolean(..)) may be used in XSD 1.1 CTA XPath subset expressions.

As per the XSD 1.1 specification, during XSD 1.1 CTA evaluations the XML element and attribute nodes are untyped (i.e., the XML nodes do not carry any type annotation coming from an XML schema). Therefore, in many cases, XSD 1.1 CTA XPath subset expressions used with Xerces-J need explicit casts (e.g., <xs:alternative test="(xs:integer(@x) = xs:integer(@y)) and fn:not(xs:boolean(@p))"> with namespace prefix 'fn' bound to the URI 'http://www.w3.org/2005/xpath-functions'). For both the CTA XPath subset language and the full XPath 2.0 language for CTAs, the "fn" prefix on XPath built-in functions is optional. Typically, XML schema authors would not use the "fn" prefix for XPath built-in functions.
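Why the explicit casts matter can be seen with a small analogy in Python. When both operands are untyped, an XPath comparison like @x = @y compares them as strings, whereas xs:integer(@x) = xs:integer(@y) compares integer values. The attribute values "01" and "1" below are assumed for illustration.

```python
x, y = "01", "1"   # attribute values as they appear in the instance document

# Roughly what @x = @y does on untyped attributes: a string comparison.
print(x == y)            # False

# Roughly what xs:integer(@x) = xs:integer(@y) does: compare the integer values.
print(int(x) == int(y))  # True
```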

Tuesday, March 10, 2020

XML Schema 1.1 <assert> continued ...

This blog post is related to the XML Schema (XSD) use case that I've discussed within my previous two blog posts. Consider the following XML Schema 1.1 document, having an XSD <assert> element,

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:sequence>
              <xs:element name="isSeqTwo" type="xs:boolean"/>
              <xs:choice>
                 <xs:sequence>
                    <xs:element name="a" type="xs:string"/>
                    <xs:element name="b" type="xs:string"/>
                 </xs:sequence>
                 <xs:sequence>
                    <xs:element name="p" type="xs:string"/>
                    <xs:element name="q" type="xs:string"/>
                 </xs:sequence>
                 <xs:sequence>
                    <xs:element name="x" type="xs:string"/>
                    <xs:element name="y" type="xs:string"/>
                 </xs:sequence>
               </xs:choice>
           </xs:sequence>       
           <xs:assert test="if (isSeqTwo = true()) then p else not(p)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

The above schema document, is different than my earlier schema documents that I've presented within my previous two blog posts, in following way:
The XML child content model of an element "X", is a sequence of an element followed by a choice.

Within the earlier two blog posts that I've presented, the XML child content model of element "X" is dependent on the value of an attribute on an element "X", which could be enforced using either an XSD 1.1 <assert> or an <alternative>.

A few XML instance documents that are valid or invalid according to the above XSD schema document are following:

Valid,

<X>
    <isSeqTwo>0</isSeqTwo>
    <x>string1</x>
    <y>string2</y>
</X>

Valid,

<X>
    <isSeqTwo>1</isSeqTwo>
    <p>string1</p>
    <q>string2</q>
</X>

Invalid,

<X>
    <isSeqTwo>1</isSeqTwo>
    <x>string1</x>
    <y>string2</y>
</X>

The XSD use case illustrated above, is useful and could only be accomplished using an XSD 1.1 <assert> element.
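To double check the three instance documents above by hand, the assertion alone can be emulated in a few lines of Python. This sketch only evaluates the rule "if (isSeqTwo = true()) then p else not(p)"; it does not check the sequence/choice content model, and the helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

def xs_boolean(text):
    # xs:boolean accepts "true", "1", "false" and "0".
    return text.strip() in ("true", "1")

def assertion_holds(x):
    # Emulates: if (isSeqTwo = true()) then p else not(p)
    has_p = x.find("p") is not None
    return has_p if xs_boolean(x.findtext("isSeqTwo")) else not has_p

DOCS = [
    "<X><isSeqTwo>0</isSeqTwo><x>string1</x><y>string2</y></X>",
    "<X><isSeqTwo>1</isSeqTwo><p>string1</p><q>string2</q></X>",
    "<X><isSeqTwo>1</isSeqTwo><x>string1</x><y>string2</y></X>",
]

results = [assertion_holds(ET.fromstring(doc)) for doc in DOCS]
print(results)
```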

As a side discussion, to re-affirm I would like to cite from the XML Schema 1.1 structures specification the following rules: 3.4.4.2 Element Locally Valid (Complex Type) that say,
For an element information item E to be locally ·valid· with respect to a complex type definition T all of the following must be true:
1
2
3
...
6 E is ·valid· with respect to each of the assertions in T.{assertions} as per Assertion Satisfied (§3.13.4.1).

We can infer from the above XSD 1.1 rules that an XML instance element is valid according to an XSD complex type definition only if, in addition to satisfying the other complex type validation rules, it is valid with respect to each of the assertions present on that complex type.

Sunday, March 1, 2020

XML Schema 1.1 <alternative> use cases with <choice> and <attribute>

With XML Schema (XSD) 1.1, a use case that is solved with <assert> can often be solved with <alternative> as well (and vice versa). This is usually the case when the XML child content model of an element depends on the values of attributes of that same element. This is evident for the first example of my previous blog post. Given the same XML input examples as in the first example of my previous blog post, the following XML Schema 1.1 example using <alternative> is also a possible solution,

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:alternative test="xs:boolean(@isB) eq true()">
          <xs:complexType>
             <xs:sequence>
                <xs:element name="b" type="xs:string"/>
             </xs:sequence>
             <xs:attribute name="isB" type="xs:boolean" use="required"/>
          </xs:complexType>
       </xs:alternative>
       <xs:alternative>
          <xs:complexType>
             <xs:choice>
                <xs:element name="a" type="xs:string"/>
                <xs:element name="c" type="xs:string"/>
             </xs:choice>
             <xs:attribute name="isB" type="xs:boolean" use="required"/>
          </xs:complexType>
       </xs:alternative>
    </xs:element>

</xs:schema>

Then the question arises, for these same use cases should we use XSD 1.1 <assert> or an <alternative>? Below are the pros and cons for this, according to me:
1) An XSD 1.1 solution using <assert> has fewer lines of code than the one using <alternative>, which many would consider a benefit.
2) I personally, prefer an XPath expression '@isB = true()' (within 'if (@isB = true()) then b else not(b)') of an <assert> over 'xs:boolean(@isB) eq true()' in an <alternative>. With these examples, for the example involving <alternative> an attribute node 'isB' has a type annotation of xs:untypedAtomic that requires an explicit cast with xs:boolean(..). I tend to prefer, the XPath expressions that don't use explicit casts (since, such XPath expressions look more schema aware).
3) One of the benefits, I see with the solution using an XSD 1.1 <alternative> over <assert>, is better error diagnostics in case of XML validation errors.

Saturday, February 15, 2020

XML Schema 1.1 <assert> use cases with <choice> and <attribute>

I've been wondering what the useful use cases of the XML Schema (XSD) 1.1 <assert> construct could be.

According to the XSD 1.1 structures specification, "assertion components constrain the existence and values of related XML elements and attributes".

One of the useful use cases for XSD 1.1 <assert> is to constrain the standard behavior of the XSD 1.0 / 1.1 <choice> construct. I'll attempt to write something about this in this blog post.

Below is an XSD schema example using the <choice> construct, that is correct for both 1.0 and 1.1 versions of XSD language:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:element name="a" type="xs:string"/>
             <xs:element name="b" type="xs:string"/>
             <xs:element name="c" type="xs:string"/>
          </xs:choice>
       </xs:complexType>
    </xs:element>

</xs:schema>

The above schema document, ensures that following XML instance documents would be valid:

<X>
    <a>some string</a>
</X>

,

<X>
    <b>some string</b>
</X>

,

<X>
    <c>some string</c>
</X>

(essentially showing that, element 'X' can have only one of the elements 'a', 'b' or 'c' as a child element)

Let's see how the above XSD example can be modified using the XSD elements <attribute> and <assert>. Below is such a modified XSD document,

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:element name="a" type="xs:string"/>
             <xs:element name="b" type="xs:string"/>
             <xs:element name="c" type="xs:string"/>
          </xs:choice>
          <xs:attribute name="isB" type="xs:boolean" use="required"/>
          <xs:assert test="if (@isB = true()) then b else not(b)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

The complete meaning of the above XSD document is as follows:
1) The <choice> with three <element> declarations below it expresses the same constraints as the earlier XSD document.
2) This schema additionally specifies a mandatory boolean-typed attribute named 'isB'.
3) The <assert> specifies that, if the value of attribute 'isB' is true, then element 'b' must be present as a child of element 'X'. If the value of attribute 'isB' is false, then element 'X' cannot have element 'b' as its child, and one of the elements 'a' or 'c' must be the child of element 'X' instead.

The following XML instance documents would be valid according to the above XSD document (for xs:boolean, the lexical values '1' and '0' mean true and false respectively):

<X isB="1">
  <b>some string</b>
</X>

,

<X isB="0">
  <a>some string</a>
</X>

,

<X isB="0">
  <c>some string</c>
</X>

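And the following XML instance documents would be invalid according to the same XSD document (the first two violate the <assert>, while the third violates the <choice>, since element 'd' is not declared):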

<X isB="0">
  <b>some string</b>
</X>

,

<X isB="1">
  <a>some string</a>
</X>

,

<X isB="0">
  <d>some string</d>
</X>
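
As a side note, the 'isB' assertion above can probably be written more compactly. The following is only a sketch, which I have not run against a particular validator. It assumes that the XSD 1.1 processor makes the XPath 2.0 function exists() available within assertions, and it relies on attribute 'isB' being typed as xs:boolean, so that its value can be compared directly with the boolean result of exists(b).

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:element name="a" type="xs:string"/>
             <xs:element name="b" type="xs:string"/>
             <xs:element name="c" type="xs:string"/>
          </xs:choice>
          <xs:attribute name="isB" type="xs:boolean" use="required"/>
          <!-- Assumed equivalent of: if (@isB = true()) then b else not(b) -->
          <xs:assert test="@isB = exists(b)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>
```

With this variant, the same valid and invalid XML instance documents listed above should be accepted and rejected respectively, since the assertion is true exactly when the value of 'isB' agrees with the presence of element 'b'.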

Now let's consider another XSD example, where the schema document specifies a choice between three sequences. Below is such a schema document:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:sequence>
                <xs:element name="a" type="xs:string"/>
                <xs:element name="b" type="xs:string"/>
             </xs:sequence>
             <xs:sequence>
                <xs:element name="p" type="xs:string"/>
                <xs:element name="q" type="xs:string"/>
             </xs:sequence>
             <xs:sequence>
                <xs:element name="x" type="xs:string"/>
                <xs:element name="y" type="xs:string"/>
             </xs:sequence>
          </xs:choice>
          <xs:attribute name="isSeqTwo" type="xs:boolean" use="required"/>
          <xs:assert test="if (@isSeqTwo = true()) then p else not(p)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

The complete meaning of the above XSD document is as follows:
1) A <choice> is specified between three <sequence> elements. Therefore, element 'X' must have exactly one of the following sequences as its children: {a, b}, {p, q} or {x, y}.
2) This schema additionally specifies a mandatory boolean-typed attribute named 'isSeqTwo'.
3) The <assert> specifies that, if the value of attribute 'isSeqTwo' is true, then the sequence {p, q} must be present as the children of element 'X'. If the value of attribute 'isSeqTwo' is false, then element 'X' cannot have the sequence {p, q} as its children, and one of the sequences {a, b} or {x, y} must be present instead. The assertion only needs to test for element 'p', because the <choice> already guarantees that 'p' is always followed by 'q'.

The following XML instance documents would be valid according to the above XSD document:

<X isSeqTwo="1">
  <p>string1</p>
  <q>string2</q>
</X>

,

<X isSeqTwo="0">
  <a>string1</a>
  <b>string2</b>
</X>

,

<X isSeqTwo="0">
  <x>string1</x>
  <y>string2</y>
</X>

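And the following XML instance documents would be invalid according to the same XSD document (the first two violate the <assert>, while the third violates the <choice>, since elements 'i' and 'j' are not declared):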

<X isSeqTwo="0">
  <p>string1</p>
  <q>string2</q>
</X>

,

<X isSeqTwo="1">
  <a>string1</a>
  <b>string2</b>
</X>

,

<X isSeqTwo="0">
  <i>string1</i>
  <j>string2</j>
</X>
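
The 'isSeqTwo' assertion above tests only for element 'p'. If one prefers the assertion to name both members of the second sequence explicitly, it could perhaps be written as in the sketch below. This is an untested variant of mine. It is logically redundant, given what the <choice> already enforces, and is shown only to make the intent of the assertion more visible to a reader of the schema.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:sequence>
                <xs:element name="a" type="xs:string"/>
                <xs:element name="b" type="xs:string"/>
             </xs:sequence>
             <xs:sequence>
                <xs:element name="p" type="xs:string"/>
                <xs:element name="q" type="xs:string"/>
             </xs:sequence>
             <xs:sequence>
                <xs:element name="x" type="xs:string"/>
                <xs:element name="y" type="xs:string"/>
             </xs:sequence>
          </xs:choice>
          <xs:attribute name="isSeqTwo" type="xs:boolean" use="required"/>
          <!-- Explicit form: both 'p' and 'q' when true, neither when false -->
          <xs:assert test="if (@isSeqTwo = true()) then (p and q) else not(p or q)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>
```

The valid and invalid XML instance documents listed above should give the same results with this variant of the assertion.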


The first example above can be used with any standards-compliant XSD 1.0 or 1.1 validator. The examples that use <assert> need a validator that supports XSD 1.1.

That's about all I wanted to say about this topic.