Showing posts with label xalan. Show all posts

Tuesday, September 12, 2023

XSLT 3.0, XPath 3.1 and XalanJ

It's been a while since I've written a blog post here. I have a few updates about the work the XalanJ team has been doing over the past few months, which I wish to share with the XML community.

The XalanJ project provides XSLT and XPath processors written in the Java language. An XSLT processor transforms an XML input document (or even plain text files) into other formats such as XML, HTML and text.

The XalanJ project released a new version (2.7.3) of XalanJ on 2023-04-01. This release is essentially a bug-fix release over the previous one. It was extensively tested by the XalanJ team, and it has very good compliance with the XSLT 1.0 and XPath 1.0 specs.

Since April 2023, the XalanJ team has been working on implementations of the XSLT 3.0 and XPath 3.1 language specifications. These codebase changes have not yet been released by the XalanJ team, but they are available on the XalanJ dev repos branch.

I further wish to write about XalanJ's implementation of XSLT 3.0 user-defined callable components, which should be available in a future XalanJ release. The callable components of a programming language are essentially its functions and procedures. The XSLT 1.0 language has only one kind of user-defined callable component, written with the XML element xsl:template.

XSLT 3.0 provides another kind of user-defined callable component, defined with the XML element xsl:function. The xsl:function declaration was first made available in the XSLT 2.0 language. A user-defined function present within an XSLT stylesheet may be called from within an XPath expression.

Following is an example of an XSLT 3.0 stylesheet that makes use of an xsl:function element:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         xmlns:ns0="http://ns0"
                         exclude-result-prefixes="ns0"
                         version="3.0">
    
    <xsl:output method="xml" indent="yes"/>
    
    <xsl:template match="/">       
         <result>
             <one>
                 <xsl:value-of select="ns0:func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="ns0:func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>
    
    <xsl:function name="ns0:func1">
         <xsl:param name="val1"/>
         <xsl:param name="val2"/>
         <xsl:param name="a"/>
         <xsl:param name="b"/>
       
         <xsl:value-of select="if ($val1 gt $val2) then ($a and $b) else ($a or $b)"/>
    </xsl:function>
    
</xsl:stylesheet>

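The above stylesheet defines a user-defined function named "func1", bound to the specified non-null XML namespace. The function requires four arguments in a function call, and produces the value of a boolean expression based on a few logical conditions. In the first call 6 is greater than 5, so the result is "true() and false()", i.e. false; in the second call 2 is not greater than 5, so the result is "true() or false()", i.e. true.

The above stylesheet produces the following output with XalanJ: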

<?xml version="1.0" encoding="UTF-8"?><result>
  <one>false</one>
  <two>true</two>
</result>

XPath 3.1 provides a new kind of callable component (not available in XPath 1.0): the inline function expression, which, when compiled by an XPath processor, produces an XPath data model (XDM) function item.

An XPath 3.1 function item may be called via an XPath dynamic function call expression.

Following is an XSLT 3.0 stylesheet that specifies an XPath inline function expression, and is an alternative solution to the stylesheet cited above:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         version="3.0">
    
    <xsl:output method="xml" indent="yes"/>
    
    <xsl:variable name="func1" select="function($val1, $val2, $a, $b) { if ($val1 gt $val2) then ($a and $b) else ($a or $b) }"/>
    
    <xsl:template match="/">       
         <result>
             <one>
                 <xsl:value-of select="$func1(6, 5, true(), false())"/>
             </one>
             <two>
                 <xsl:value-of select="$func1(2, 5, true(), false())"/>
             </two>
         </result>
    </xsl:template>
    
</xsl:stylesheet>

The above stylesheet specifies an XPath inline function expression assigned to the XSLT variable "func1". This makes XPath expressions like $func1(..) function calls (termed dynamic function calls by the XPath 3.1 language).

The above stylesheet produces the same output with XalanJ as the earlier stylesheet.

It's perhaps also interesting to discuss which of the above-mentioned callable component approaches an XSLT stylesheet author should choose.

An XPath 3.1 inline function expression is an *XPath expression*, therefore its function body is limited to XPath syntax only.

In contrast, xsl:function is an XSLT declaration (whose function may be called from within XPath expressions). The body of an xsl:function may contain significantly more complex logic (any permissible XSLT instructions and XPath expressions) than an XPath inline function expression.
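For readers who think in Java, the difference between the two approaches can be loosely compared to a named method versus a lambda held in a variable. The following is only an analogy sketched in plain Java, not XalanJ code; the class name CallableComponents and the interface Func4 are hypothetical names introduced for illustration.

```java
final class CallableComponents {

    // Hypothetical four-argument function type, standing in for an XDM function item.
    @FunctionalInterface
    interface Func4 {
        boolean apply(int val1, int val2, boolean a, boolean b);
    }

    // Loose analogue of the named xsl:function ns0:func1.
    static boolean func1(int val1, int val2, boolean a, boolean b) {
        return (val1 > val2) ? (a && b) : (a || b);
    }

    // Loose analogue of the inline function expression bound to the variable $func1.
    static final Func4 FUNC1 =
            (val1, val2, a, b) -> (val1 > val2) ? (a && b) : (a || b);

    public static void main(String[] args) {
        System.out.println(func1(6, 5, true, false));        // false
        System.out.println(FUNC1.apply(2, 5, true, false));  // true
    }
}
```

As with the two stylesheets, both forms compute the same results; the named form can hold arbitrarily complex logic, while the lambda is limited to what fits in an expression-style body.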

To conclude, I believe that when using XSLT 3.0 and XPath 3.1, we have the following three main kinds of user-defined callable components available to XSLT stylesheet authors:

1) xsl:template   (which is very important within an XSLT stylesheet, being its core)

2) xsl:function

3) XPath inline function expression

That's all I wished to say within this blog post.



Monday, April 10, 2023

XPath 2.0 quantified expressions. Implementation with XSLT 1.0

The XPath 2.0 language introduced new syntax and semantics compared to the XPath 1.0 language, for example quantified expressions.

Following is the XPath 2.0 grammar for quantified expressions (quoted from the XPath 2.0 language specification):

QuantifiedExpr    ::=    ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle

An XPath 2.0 quantified expression, when evaluated over a sequence of XPath data model items, returns a boolean value: either 'true' or 'false'.

I can suggest an XSLT 1.0 code pattern (tested with Apache XalanJ) that implements the logic of XPath 2.0-like quantified expressions. Following is an example illustrating these concepts.

XML input document:

<?xml version="1.0" encoding="UTF-8"?>

<elem>

  <a>5</a>

  <a>5</a>

  <a>4</a>

  <a>7</a>

  <a>5</a>

  <a>5</a>

  <a>7</a>

  <a>5</a>

</elem> 

XSLT 1.0 stylesheet implementing an XPath 2.0 "every"-like quantified expression (i.e., universal quantification):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">           

            <xsl:if test="number(.) &gt; 3">

              <yes/>

            </xsl:if>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="count(exslt:node-set($temp)/yes) = count(a)"/>

   </xsl:template>

</xsl:stylesheet>

The above XSLT stylesheet produces a boolean 'true' result if all XML "a" input elements have a value greater than 3; otherwise a boolean 'false' result is produced.

XSLT 1.0 stylesheet implementing an XPath 2.0 "some"-like quantified expression (i.e., existential quantification):

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">           

            <xsl:if test="number(.) = 4">

              <yes/>

            </xsl:if>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="count(exslt:node-set($temp)/yes) &gt;= 1"/>

   </xsl:template>

</xsl:stylesheet>

The above XSLT stylesheet produces a boolean 'true' result if at least one XML "a" input element has a value equal to 4; otherwise a boolean 'false' result is produced.

Within the above XSLT 1.0 stylesheets, we've used the EXSLT "node-set" extension function (which converts an XSLT 1.0 "result tree fragment" into a node-set).
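The counting pattern used by both stylesheets can also be sketched outside XSLT. The following Java sketch mirrors the same logic over the sample values of the XML input document; the class name QuantifiedCounting and its method names are hypothetical, chosen only for this illustration.

```java
import java.util.List;
import java.util.function.Predicate;

final class QuantifiedCounting {

    // "every": the number of items that pass the test equals the total number of items.
    static boolean every(List<Integer> items, Predicate<Integer> test) {
        long yes = items.stream().filter(test).count();
        return yes == items.size();
    }

    // "some": at least one item passes the test.
    static boolean some(List<Integer> items, Predicate<Integer> test) {
        long yes = items.stream().filter(test).count();
        return yes >= 1;
    }

    public static void main(String[] args) {
        // Values of the "a" elements from the XML input document.
        List<Integer> a = List.of(5, 5, 4, 7, 5, 5, 7, 5);
        System.out.println(every(a, v -> v > 3));   // true
        System.out.println(some(a, v -> v == 4));   // true
    }
}
```

Note that for an empty list the "every" check compares 0 with 0 and yields true, while "some" yields false, which agrees with how XPath 2.0 defines quantified expressions over an empty sequence.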

We can therefore conclude that within an XSLT 1.0 environment, we can largely simulate the logic of many XPath 2.0 language constructs.

Thursday, April 6, 2023

XSLT 1.0 transformation : find distinct values

In continuation of my previous blog post on this site, this post describes how to use the XSLT 1.0 language (tested with Apache XalanJ 2.7.3 along with its JavaScript extension function bindings) to find distinct values (i.e., de-duplicating a data set) in data originating from an XML instance document.

Following is an XSLT transformation example illustrating these features.

XML instance document:

<?xml version="1.0" encoding="UTF-8"?>

<elem>

  <a>2</a>

  <a>3</a>

  <a>3</a>

  <a>5</a>

  <a>3</a>

  <a>1</a>

  <a>2</a>

  <a>5</a>

</elem>

Corresponding XSLT 1.0 transformation:

<?xml version="1.0"?>

<xsl:stylesheet  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                          xmlns:xalan="http://xml.apache.org/xalan"

                          xmlns:js="http://js_functions"

                          extension-element-prefixes="js"

                          version="1.0">

   <xsl:output method="text"/>

   <xalan:component prefix="js" functions="reformString">

      <xalan:script lang="javascript">

        function reformString(str)

        {

           return str.substr(0, str.length - 1);

        }

      </xalan:script>

   </xalan:component>

   <xsl:template match="/elem">

      <xsl:if test="count(a) &gt; 0">

         <xsl:variable name="result">

            <xsl:call-template name="distinctValues">

               <xsl:with-param name="curr_node" select="a[1]"/>

               <xsl:with-param name="csv_result" select="concat(string(a[1]), ',')"/>

            </xsl:call-template>

         </xsl:variable>

         <xsl:value-of select="js:reformString(string($result))"/>

      </xsl:if>

   </xsl:template>

   <xsl:template name="distinctValues">

      <xsl:param name="curr_node"/>

      <xsl:param name="csv_result"/>

      <xsl:choose>

        <xsl:when test="$curr_node/following-sibling::*">

           <xsl:variable name="temp1">

              <xsl:choose>

                 <!-- commas on both sides, so that e.g. "3" is not mistaken for the tail of "13" -->

                 <xsl:when test="not(contains(concat(',', $csv_result), concat(',', string($curr_node), ',')))">

                    <xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>

                 </xsl:when>

                 <xsl:otherwise>

                    <xsl:value-of select="$csv_result"/>

                 </xsl:otherwise>

              </xsl:choose>

           </xsl:variable>

           <xsl:call-template name="distinctValues">

              <xsl:with-param name="curr_node" select="$curr_node/following-sibling::*[1]"/>

              <xsl:with-param name="csv_result" select="normalize-space($temp1)"/>

           </xsl:call-template>

        </xsl:when>

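        <xsl:otherwise>

           <!-- last node: it also needs the duplicate check before the result is emitted -->

           <xsl:choose>

              <xsl:when test="not(contains(concat(',', $csv_result), concat(',', string($curr_node), ',')))">

                 <xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>

              </xsl:when>

              <xsl:otherwise>

                 <xsl:value-of select="$csv_result"/>

              </xsl:otherwise>

           </xsl:choose>

        </xsl:otherwise>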

      </xsl:choose>      

   </xsl:template>

</xsl:stylesheet>

The above XSLT transformation produces the following desired result:

2,3,5,1
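For comparison, the same order-preserving de-duplication can be sketched in Java with a LinkedHashSet, which keeps only the first occurrence of each value in encounter order. This is a minimal sketch and not part of the XalanJ setup described here; the class name DistinctValues and the method distinctCsv are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DistinctValues {

    // Keeps the first occurrence of each value, in document order, and joins them with commas.
    static String distinctCsv(List<String> values) {
        Set<String> seen = new LinkedHashSet<>(values);
        return String.join(",", seen);
    }

    public static void main(String[] args) {
        // Values of the "a" elements from the XML instance document.
        List<String> a = List.of("2", "3", "3", "5", "3", "1", "2", "5");
        System.out.println(distinctCsv(a)); // 2,3,5,1
    }
}
```

Because the set compares whole values, "3" and "13" are treated as different items, which a plain substring test on a comma-separated string does not guarantee.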

XalanJ users can find the JavaScript-related jars (which need to be available on the JVM classpath at run time during the XSLT transformation) within the XalanJ src distribution. The relevant jar files are: bsf.jar, commons-logging-1.2.jar and rhino-1.7.14.jar (Rhino is Mozilla's JavaScript engine implementation, bundled with the XalanJ 2.7.3 src distribution).


Wednesday, April 5, 2023

XSLT 1.0 transformation : finding maximum from a list of numbers, from an XML input document

The Apache Xalan project released XalanJ 2.7.3 a few days ago, and I thought I'd write a couple of blog posts here to report on the basic sanity of XalanJ 2.7.3's functional quality.

Following is a simple XML transformation requirement.

XML input document :

<?xml version="1.0" encoding="UTF-8"?>

<elem>

    <a>2</a>

    <a>3</a>

    <a>5</a>

    <a>1</a>

    <a>7</a>

    <a>4</a>

</elem>

We need to write an XSLT 1.0 stylesheet that outputs the maximum value from the list of XML "a" elements in the above XML document.

Following are the three XSLT 1.0 stylesheets I've come up with that do this correctly:

1)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:variable name="temp">

         <xsl:for-each select="a">

           <xsl:sort select="." data-type="number" order="descending"/>

           <e1><xsl:value-of select="."/></e1>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="concat('Maximum : ', exslt:node-set($temp)/e1[1])"/>

   </xsl:template>

</xsl:stylesheet>

2)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      Maximum : <xsl:call-template name="findMax"/>

   </xsl:template>

   <xsl:template name="findMax">

      <xsl:variable name="temp">

         <xsl:for-each select="a">

            <xsl:sort select="." data-type="number" order="descending"/>

            <e1><xsl:value-of select="."/></e1>

         </xsl:for-each>

      </xsl:variable>

      <xsl:value-of select="exslt:node-set($temp)/e1[1]"/>

   </xsl:template>

</xsl:stylesheet>

3)

<?xml version="1.0"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">

      <xsl:choose>

         <xsl:when test="count(a) = 0"/>

         <xsl:when test="count(a) = 1">

            Maximum : <xsl:value-of select="a[1]"/>

         </xsl:when>

         <xsl:otherwise>

            <xsl:variable name="result">

               <xsl:call-template name="findMax">

                  <xsl:with-param name="curr_max" select="a[1]"/>

                  <xsl:with-param name="next_node" select="a[2]"/>

               </xsl:call-template>

            </xsl:variable>

            Maximum :  <xsl:value-of select="$result"/> 

         </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

   <xsl:template name="findMax">

      <xsl:param name="curr_max"/>

      <xsl:param name="next_node"/>

      <xsl:choose>

         <xsl:when test="$next_node/following-sibling::*">

            <xsl:choose>

               <xsl:when test="number($next_node) &gt; number($curr_max)">

                  <xsl:call-template name="findMax">

                     <xsl:with-param name="curr_max" select="$next_node"/>

                     <xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>

                  </xsl:call-template>

               </xsl:when>

               <xsl:otherwise>

                  <xsl:call-template name="findMax">

                     <xsl:with-param name="curr_max" select="$curr_max"/>

                     <xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>

                  </xsl:call-template>

               </xsl:otherwise>

            </xsl:choose>

         </xsl:when>

         <xsl:otherwise>

            <xsl:choose>

               <xsl:when test="number($next_node) &gt; number($curr_max)">

                  <xsl:value-of select="$next_node"/>

               </xsl:when>

               <xsl:otherwise>

                  <xsl:value-of select="$curr_max"/>

               </xsl:otherwise>

            </xsl:choose>

         </xsl:otherwise>

      </xsl:choose>

   </xsl:template>

</xsl:stylesheet>

I personally like XSLT solution 3) illustrated above for this requirement. It traverses the sequence of XML "a" elements to the end of the list, and outputs the maximum value at the end of the traversal. This solution seems to have an algorithmic time complexity of O(n), though with some possible overhead from recursive template calls that the other two solutions do not incur.

XSLT solutions 1) and 2) seem to have a higher algorithmic time complexity than solution 3), due to the use of the xsl:sort instruction (which probably has a time complexity of O(n * log(n)) or O(n * n)). Solutions 1) and 2) also seem to have a higher algorithmic "space complexity" (a measure of the memory used by the algorithm), due to the storage of the intermediate sorted result.
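The trade-off between the two approaches can be sketched in Java. This is an illustration of the general algorithms only, under the assumption that the sort is a comparison sort; it says nothing about how XalanJ implements xsl:sort internally. The class name FindMax and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FindMax {

    // Analogue of solution 3): one pass over the list, O(n) time, constant extra space.
    static int maxSinglePass(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty list has no maximum");
        }
        int currMax = values.get(0);
        for (int next : values.subList(1, values.size())) {
            if (next > currMax) {
                currMax = next;
            }
        }
        return currMax;
    }

    // Analogue of solutions 1) and 2): sort a copy in descending order, take the first item.
    // Needs O(n) extra space for the intermediate sorted copy.
    static int maxBySorting(List<Integer> values) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("empty list has no maximum");
        }
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.reverseOrder());
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Values of the "a" elements from the XML input document.
        List<Integer> a = List.of(2, 3, 5, 1, 7, 4);
        System.out.println("Maximum : " + maxSinglePass(a)); // Maximum : 7
        System.out.println("Maximum : " + maxBySorting(a));  // Maximum : 7
    }
}
```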

The XalanJ command line to run the above XSLT transformations is as follows:

java org.apache.xalan.xslt.Process -in file.xml -xsl file.xsl
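The same kind of transformation can also be invoked programmatically through the standard JAXP API (javax.xml.transform). The following is a minimal sketch, assuming that whichever TransformerFactory is found on the classpath is acceptable (XalanJ if its jars are present, otherwise the JDK's built-in XSLT 1.0 processor). The class name RunTransform and the small sample stylesheet, which just counts the "a" elements, are hypothetical and used only to keep the example self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RunTransform {

    // Hypothetical sample input, shortened from the XML document above.
    static final String SAMPLE_XML = "<elem><a>2</a><a>7</a><a>4</a></elem>";

    // Hypothetical sample stylesheet: outputs the number of "a" elements as text.
    static final String SAMPLE_XSL =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/elem\"><xsl:value-of select=\"count(a)\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the serialized result.
    static String transform(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                    factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSL)); // 3
    }
}
```

To transform files instead of strings, the StreamSource and StreamResult objects can be constructed from java.io.File instances.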


Wednesday, March 29, 2023

A simple XSLT stylesheet, XML document validator

I've been thinking that this would be interesting to share.

Please consider the following XSLT 1.0 document transformation.

XML input document:

<?xml version="1.0" encoding="UTF-8"?>

<root>

  <a>2</a>

  <a>4</a>

  <a>6</a>

  <a>8</a>

  <a>10</a>

</root>

We should be able to say that this XML document is valid if all /root/a elements within it contain even numbers.

The following XSLT 1.0 stylesheet performs just this validation check:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

                         xmlns:exslt="http://exslt.org/common"

                         exclude-result-prefixes="exslt"

                         version="1.0">

    <!-- An XSLT stylesheet, that checks whether values of all XML 

         input /root/a elements have even numbers (in which case, the XML input 

         document is reported as valid). -->                            

    <xsl:output method="text"/>                

    <xsl:template match="/root">

       <xsl:variable name="result">

          <xsl:for-each select="a">

             <e1><xsl:value-of select=". mod 2"/></e1>

          </xsl:for-each>

       </xsl:variable>

       <xsl:choose>

          <xsl:when test="count(exslt:node-set($result)/*[. = 0]) = count(exslt:node-set($result)/*)">

             <xsl:text>XML document is valid</xsl:text>

          </xsl:when>

          <xsl:otherwise>

             <xsl:text>XML document is in-valid</xsl:text>

          </xsl:otherwise>

       </xsl:choose>

    </xsl:template>

</xsl:stylesheet>

Please note that the above XSLT 1.0 stylesheet uses the "node-set" extension function, which is supported by most XSLT 1.0 engines (for example, XalanJ, as described here: https://xalan.apache.org/xalan-j/apidocs/org/apache/xalan/lib/ExsltCommon.html).

For interested readers, following is an equivalent XML Schema 1.1 validation that solves the same problem:

<?xml version="1.0"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="root">

      <xs:complexType>

         <xs:sequence>

            <xs:element name="a" type="xs:integer" maxOccurs="unbounded"/>

         </xs:sequence>

         <xs:assert test="count(a) = count(a[. mod 2 = 0])"/>

      </xs:complexType>

   </xs:element>

</xs:schema>

Personally speaking, I would prefer XML Schema 1.1 validation for this requirement, since the XML Schema language is designed for XML document validation, whereas the XSLT language is designed for XML document transformation (but as illustrated in this blog post, an XSLT stylesheet can do the job of an XML document validator as well).


Sunday, November 1, 2009

XSLT 1.0: Regular expression string tokenization, and Xalan-J

Some time ago, XSLT folks were debating on xsl-list (ref, http://www.biglist.com/lists/lists.mulberrytech.com/xsl-list/archives/200910/msg00365.html) how to implement string tokenizer functionality in XSLT. XPath 2.0 (and therefore XSLT 2.0) has a built-in function for this need, fn:tokenize, which takes a string and a tokenizing regular expression pattern as arguments. This cannot be done natively in XSLT 1.0; instead, we need to write a recursive tokenizing named template. But an XSLT 1.0 named template for string tokenization has the limitation that it cannot natively accept an arbitrary regular expression as the tokenizing delimiter.

I got motivated enough to write a Java extension for regular-expression-based string tokenization in XSLT 1.0 stylesheets, using the Xalan-J XSLT 1.0 engine.

Here's the Java code and a sample XSLT stylesheet for this functionality:

String tokenizer Xalan-J Java extension:
package org.apache.xalan.xslt.ext;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.apache.xpath.NodeSet;
import org.w3c.dom.Document;

/**
 * Xalan-J extension class providing regular expression based string
 * tokenization to XSLT 1.0 stylesheets.
 */
public class XalanUtil {

    /**
     * Splits str around matches of regExp and returns each token as a text node.
     */
    public static NodeSet tokenize(String str, String regExp) throws ParserConfigurationException {
        String[] tokens = str.split(regExp);
        NodeSet nodeSet = new NodeSet();

        // A DOM document is needed only as a factory for the text nodes.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        Document document = docBuilder.newDocument();

        // Wrap each token in a text node, so that the stylesheet can iterate over them.
        for (int nodeCount = 0; nodeCount < tokens.length; nodeCount++) {
            nodeSet.addElement(document.createTextNode(tokens[nodeCount]));
        }

        return nodeSet;
    }
}
Sample XSLT stylesheet using the above Java extension (named test.xsl):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"                                                    
                xmlns:java="http://xml.apache.org/xalan/java"
                exclude-result-prefixes="java">
                 
   <xsl:output method="xml" indent="yes" />
   
   <xsl:param name="str" />
   
   <xsl:template match="/">
     <words>
       <xsl:for-each select="java:org.apache.xalan.xslt.ext.XalanUtil.tokenize($str, '\s+')">
         <word>
           <xsl:value-of select="." />
         </word>
       </xsl:for-each>
     </words>
   </xsl:template>
   
 </xsl:stylesheet>
For example, when the above stylesheet is run with Xalan as follows: java -classpath <path to the extension java class> org.apache.xalan.xslt.Process -in test.xsl -xsl test.xsl -PARAM str "hello world" (the stylesheet itself serves as the XML input here, since the template does not use the input content), the following output is produced:
<?xml version="1.0" encoding="UTF-8"?>
<words>
 <word>hello</word>
 <word>world</word>
</words>

This illustrates that regular-expression-based string tokenization works as designed above in an XSLT 1.0 environment.

The above Java extension should run fine with a minimum JRE level of 1.4, as it relies on the JDK method java.lang.String.split(String regex), which has been available since JDK 1.4.
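Since the extension delegates to String.split, its tokenization behaviour is exactly that of this JDK method. The following small sketch shows a few cases worth knowing; the class name SplitTokens is hypothetical, and the sketch deliberately leaves out the Xalan NodeSet wrapping so that it runs with the JDK alone.

```java
import java.util.Arrays;
import java.util.List;

final class SplitTokens {

    // Same splitting step as in the extension, without the NodeSet wrapping.
    static List<String> tokenize(String str, String regExp) {
        return Arrays.asList(str.split(regExp));
    }

    public static void main(String[] args) {
        // Whitespace delimiter, as in the sample stylesheet.
        System.out.println(tokenize("hello world", "\\s+"));   // [hello, world]

        // An arbitrary regular expression may be used as the delimiter.
        System.out.println(tokenize("a,b;c", "[,;]"));         // [a, b, c]

        // A leading delimiter yields an empty first token; trailing empty tokens are dropped.
        System.out.println(tokenize(" hello world ", "\\s+")); // [, hello, world]
    }
}
```

If empty tokens are not wanted, one option is to trim the string before it is passed to the extension, for example with normalize-space() in the stylesheet when the delimiter is whitespace.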

PS: For readability and to reduce verbosity, the package name in the above Java extension class may be omitted, in which case the corresponding XSLT instruction would be written as follows:
xsl:for-each select="java:XalanUtil.tokenize(... I would personally prefer this coding style for production Java XSLT extensions. Though this should not matter much, and in my opinion the decision can be left to individual XSLT developers.

I hope that this was useful.

Thursday, April 10, 2008

Xalan-J serializer

I thought there was some problem with the Xalan-J serializer, so I posted the following question on the xalan-dev mailing list:

I think, there is scope of improvement to the Xalan-J 2.7.1 serializer.

I tried this sample XSLT stylesheet with Xalan-J 2.7.1.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<xsl:output method="xml" indent="yes" />

<xsl:template match="/">
  <x>
    <y/>
  </x>
</xsl:template>

</xsl:stylesheet>

The output produced by Xalan is:
<?xml version="1.0" encoding="UTF-8"?><x>
<y/>
</x>

Please note that the topmost element tag, <x>, is not indented properly.

I would expect the output in this case to be:

<?xml version="1.0" encoding="UTF-8"?>
<x>
  <y/>
</x>

This problem seems to happen with any XML output.

Henry Zongaro provided a good explanation of why this is so:

The problem here is that the serializer considers that the result document might be used as an external general parsed entity. So, suppose the result is named result.xml. If it's referenced inside a document such as the following, inserting whitespace before the x element in result.xml would affect the text content of its parent element, doc.

<!DOCTYPE doc [
<!ENTITY ref SYSTEM "result.xml">
]>
<doc>Some non-whitespace text &ref; Some more non-whitespace text</doc>