Tuesday, September 12, 2023
XSLT 3.0, XPath 3.1 and XalanJ
Monday, April 10, 2023
XPath 2.0 quantified expressions. Implementation with XSLT 1.0
XPath 2.0 introduced new syntax and semantics compared to XPath 1.0, for example quantified expressions.
The following is the XPath 2.0 grammar production for quantified expressions (quoted from the XPath 2.0 language specification):
QuantifiedExpr ::= ("some" | "every") "$" VarName "in" ExprSingle ("," "$" VarName "in" ExprSingle)* "satisfies" ExprSingle
An XPath 2.0 quantified expression, when evaluated over a sequence of XPath data model items, returns a boolean value: either 'true' or 'false'.
I'd like to suggest an XSLT 1.0 code pattern (tested with Apache XalanJ) that implements the logic of XPath 2.0 style quantified expressions. The following example illustrates it.
XML input document:
<?xml version="1.0" encoding="UTF-8"?>
<elem>
<a>5</a>
<a>5</a>
<a>4</a>
<a>7</a>
<a>5</a>
<a>5</a>
<a>7</a>
<a>5</a>
</elem>
XSLT 1.0 stylesheet implementing an XPath 2.0 "every"-like quantified expression (i.e., universal quantification):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exslt="http://exslt.org/common"
exclude-result-prefixes="exslt"
version="1.0">
<xsl:output method="text"/>
<xsl:template match="/elem">
<xsl:variable name="temp">
<xsl:for-each select="a">
<xsl:if test="number(.) > 3">
<yes/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="count(exslt:node-set($temp)/yes) = count(a)"/>
</xsl:template>
</xsl:stylesheet>
The above XSLT stylesheet produces the boolean result 'true' if all XML "a" input elements have a value greater than 3; otherwise it produces 'false'.
XSLT 1.0 stylesheet implementing an XPath 2.0 "some"-like quantified expression (i.e., existential quantification):
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exslt="http://exslt.org/common"
exclude-result-prefixes="exslt"
version="1.0">
<xsl:output method="text"/>
<xsl:template match="/elem">
<xsl:variable name="temp">
<xsl:for-each select="a">
<xsl:if test="number(.) = 4">
<yes/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="count(exslt:node-set($temp)/yes) >= 1"/>
</xsl:template>
</xsl:stylesheet>
The above XSLT stylesheet produces the boolean result 'true' if at least one XML "a" input element has a value equal to 4; otherwise it produces 'false'.
The XSLT 1.0 stylesheets above use the EXSLT "node-set" extension function, which converts an XSLT 1.0 "result tree fragment" into a node set.
We can therefore conclude that, within an XSLT 1.0 environment, the logic of many XPath 2.0 language constructs can largely be simulated.
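For these two particular tests, the same quantification can also be written as a single XPath 1.0 expression, without a result tree fragment or the node-set extension: "every" becomes "no item fails the condition", and "some" becomes "at least one item passes". The following is a minimal sketch that checks this with the XPath 1.0 engine bundled with the JDK (not with XalanJ itself). The class name QuantifiedXPath1 and its members are hypothetical names, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of XalanJ.
final class QuantifiedXPath1 {

    // "every $x in /elem/a satisfies $x > 3": true when no "a" fails the condition.
    static final String EVERY_GT_3 = "not(/elem/a[not(number(.) > 3)])";

    // "some $x in /elem/a satisfies $x = 4": true when at least one "a" passes.
    static final String SOME_EQ_4 = "boolean(/elem/a[number(.) = 4])";

    // Evaluates an XPath 1.0 expression against an XML string, as a boolean.
    static boolean holds(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return (Boolean) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<elem><a>5</a><a>5</a><a>4</a><a>7</a>"
                   + "<a>5</a><a>5</a><a>7</a><a>5</a></elem>";
        System.out.println(holds(xml, EVERY_GT_3)); // every value exceeds 3
        System.out.println(holds(xml, SOME_EQ_4));  // one value equals 4
    }
}
```

Within a stylesheet whose context node is /elem, the corresponding relative expressions would be not(a[not(number(.) > 3)]) and boolean(a[number(.) = 4]). The result tree fragment pattern remains the more general one, since it also covers conditions that cannot be expressed within a single XPath 1.0 predicate.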
Thursday, April 6, 2023
XSLT 1.0 transformation : find distinct values
Continuing from my previous blog post on this site, this post describes how to use XSLT 1.0 (tested with Apache XalanJ 2.7.3, along with its JavaScript extension function bindings) to find the distinct values within a data set originating from an XML instance document, i.e., to de-duplicate the data set.
The following XSLT transformation example illustrates this.
XML instance document:
<?xml version="1.0" encoding="UTF-8"?>
<elem>
<a>2</a>
<a>3</a>
<a>3</a>
<a>5</a>
<a>3</a>
<a>1</a>
<a>2</a>
<a>5</a>
</elem>
Corresponding XSLT 1.0 transformation:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xalan="http://xml.apache.org/xalan"
xmlns:js="http://js_functions"
extension-element-prefixes="js"
version="1.0">
<xsl:output method="text"/>
<xalan:component prefix="js" functions="reformString">
<xalan:script lang="javascript">
function reformString(str)
{
return str.substr(0, str.length - 1);
}
</xalan:script>
</xalan:component>
<xsl:template match="/elem">
<xsl:if test="count(a) > 0">
<xsl:variable name="result">
<xsl:call-template name="distinctValues">
<xsl:with-param name="curr_node" select="a[1]"/>
<xsl:with-param name="csv_result" select="''"/>
</xsl:call-template>
</xsl:variable>
<xsl:value-of select="js:reformString(string($result))"/>
</xsl:if>
</xsl:template>
<!-- Appends the value of $curr_node to $csv_result, unless that value is already present, and then recurses to the next sibling. -->
<xsl:template name="distinctValues">
<xsl:param name="curr_node"/>
<xsl:param name="csv_result"/>
<xsl:variable name="temp1">
<xsl:choose>
<!-- The value is delimited on both sides, so that "3" is not mistaken for a part of "13". -->
<xsl:when test="not(contains(concat(',', $csv_result), concat(',', string($curr_node), ',')))">
<xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$csv_result"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:choose>
<xsl:when test="$curr_node/following-sibling::*">
<xsl:call-template name="distinctValues">
<xsl:with-param name="curr_node" select="$curr_node/following-sibling::*[1]"/>
<xsl:with-param name="csv_result" select="normalize-space($temp1)"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<!-- The last element has already been taken into account within $temp1. -->
<xsl:value-of select="$temp1"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
The above XSLT transformation produces the following desired result:
2,3,5,1
XalanJ users can find the JavaScript related jars (which need to be available on the JVM classpath at run-time during the XSLT transformation) within the XalanJ src distribution. The relevant jar files are: bsf.jar, commons-logging-1.2.jar and rhino-1.7.14.jar (Rhino is Mozilla's JavaScript engine implementation, bundled with the XalanJ 2.7.3 src distribution).
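For comparison, XPath 1.0 also has a well known idiom for selecting distinct values that needs neither recursion nor an extension function: keep an element only if none of its preceding siblings has the same string value. The following is a minimal sketch of that idiom, evaluated with the XPath 1.0 engine bundled with the JDK (rather than with XalanJ). The class name DistinctValuesXPath1 and the method distinctCsv are hypothetical names, used only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of XalanJ.
final class DistinctValuesXPath1 {

    // Keeps an "a" element only if no preceding sibling "a" has the same string value.
    static final String DISTINCT_A = "/elem/a[not(. = preceding-sibling::a)]";

    // Returns the distinct "a" values, in order of first occurrence, joined by commas.
    static String distinctCsv(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                DISTINCT_A, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            StringBuilder csv = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) {
                    csv.append(',');
                }
                csv.append(nodes.item(i).getTextContent());
            }
            return csv.toString();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<elem><a>2</a><a>3</a><a>3</a><a>5</a>"
                   + "<a>3</a><a>1</a><a>2</a><a>5</a></elem>";
        System.out.println(distinctCsv(xml)); // prints 2,3,5,1
    }
}
```

The idiom compares string values, so "3" and "3.0" count as two distinct values. Each element is compared with all of its preceding siblings, so for large inputs a straightforward engine performs a quadratic number of comparisons.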
Wednesday, April 5, 2023
XSLT 1.0 transformation : finding maximum from a list of numbers, from an XML input document
The Apache Xalan project released XalanJ 2.7.3 a few days ago, and I thought I'd write a couple of blog posts here to report on the basic sanity of XalanJ 2.7.3's functional quality.
The following is a simple XML transformation requirement.
XML input document :
<?xml version="1.0" encoding="UTF-8"?>
<elem>
<a>2</a>
<a>3</a>
<a>5</a>
<a>1</a>
<a>7</a>
<a>4</a>
</elem>
We need to write an XSLT 1.0 stylesheet that outputs the maximum value from the list of XML "a" elements within the XML document cited above.
Following are three XSLT 1.0 stylesheets that I've come up with, each of which does this correctly:
1)
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exslt="http://exslt.org/common"
version="1.0">
<xsl:output method="text"/>
<xsl:template match="/elem">
<xsl:variable name="temp">
<xsl:for-each select="a">
<xsl:sort select="." data-type="number" order="descending"/>
<e1><xsl:value-of select="."/></e1>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="concat('Maximum : ', exslt:node-set($temp)/e1[1])"/>
</xsl:template>
</xsl:stylesheet>
2)
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exslt="http://exslt.org/common"
version="1.0">
<xsl:output method="text"/>
<xsl:template match="/elem">
Maximum : <xsl:call-template name="findMax"/>
</xsl:template>
<xsl:template name="findMax">
<xsl:variable name="temp">
<xsl:for-each select="a">
<xsl:sort select="." data-type="number" order="descending"/>
<e1><xsl:value-of select="."/></e1>
</xsl:for-each>
</xsl:variable>
<xsl:value-of select="exslt:node-set($temp)/e1[1]"/>
</xsl:template>
</xsl:stylesheet>
3)
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="text"/>
<xsl:template match="/elem">
<xsl:choose>
<xsl:when test="count(a) = 0"/>
<xsl:when test="count(a) = 1">
Maximum : <xsl:value-of select="a[1]"/>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="result">
<xsl:call-template name="findMax">
<xsl:with-param name="curr_max" select="a[1]"/>
<xsl:with-param name="next_node" select="a[2]"/>
</xsl:call-template>
</xsl:variable>
Maximum : <xsl:value-of select="$result"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="findMax">
<xsl:param name="curr_max"/>
<xsl:param name="next_node"/>
<xsl:choose>
<xsl:when test="$next_node/following-sibling::*">
<xsl:choose>
<xsl:when test="number($next_node) > number($curr_max)">
<xsl:call-template name="findMax">
<xsl:with-param name="curr_max" select="$next_node"/>
<xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="findMax">
<xsl:with-param name="curr_max" select="$curr_max"/>
<xsl:with-param name="next_node" select="$next_node/following-sibling::*[1]"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
<xsl:otherwise>
<xsl:choose>
<xsl:when test="number($next_node) > number($curr_max)">
<xsl:value-of select="$next_node"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$curr_max"/>
</xsl:otherwise>
</xsl:choose>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
I personally like XSLT solution 3) illustrated above for these requirements. This solution traverses the sequence of XML "a" elements once, carrying the largest value seen so far, and outputs the maximum once the end of the list is reached. It seems to have an algorithmic time complexity of O(n), though its recursive template calls (one per "a" element) may add a little overhead that the other two XSLT solutions do not have.
XSLT solutions 1) and 2) illustrated above seem to have higher algorithmic time complexity than solution 3), due to the use of the xsl:sort instruction (which probably has an algorithmic time complexity of O(n * log(n)) or O(n * n), depending on the sorting algorithm used by the XSLT engine). Solutions 1) and 2) also seem to have higher algorithmic "space complexity" (a measure of the memory used by the algorithm), due to the storage of the intermediate sorted result.
The XalanJ command line to run the XSLT transformations cited above is as follows:
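A further XPath 1.0 idiom is also worth comparing: select the "a" element for which no sibling "a" has a greater value. It needs neither sorting nor recursion, but a straightforward engine compares every element with every other element, i.e. O(n * n) comparisons, so it trades speed for brevity. The following is a minimal sketch, evaluated with the XPath 1.0 engine bundled with the JDK (rather than with XalanJ). The class name MaxXPath1 and the method maxOf are hypothetical names, used only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of XalanJ.
final class MaxXPath1 {

    // Selects the first "a" for which no sibling "a" has a greater numeric value.
    static final String MAX_A = "/elem/a[not(../a > .)][1]";

    // Returns the string value of the maximum "a", or "" when there is no "a".
    static String maxOf(String xml) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(MAX_A, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<elem><a>2</a><a>3</a><a>5</a><a>1</a><a>7</a><a>4</a></elem>";
        System.out.println("Maximum : " + maxOf(xml)); // prints Maximum : 7
    }
}
```

Within a stylesheet whose context node is /elem, the equivalent relative expression would be a[not(../a > .)][1].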
java org.apache.xalan.xslt.Process -in file.xml -xsl file.xsl
Wednesday, March 29, 2023
A simple XSLT stylesheet, XML document validator
I've been thinking that this would be interesting to share.
Please consider the following XSLT 1.0 document transformation.
XML input document:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<a>2</a>
<a>4</a>
<a>6</a>
<a>8</a>
<a>10</a>
</root>
We'd like to report that this XML document is valid if all the XML /root/a elements within it contain even numbers.
The following XSLT 1.0 stylesheet does just this XML document validation check:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exslt="http://exslt.org/common"
exclude-result-prefixes="exslt"
version="1.0">
<!-- An XSLT stylesheet, that checks whether values of all XML
input /root/a elements have even numbers (in which case, the XML input
document is reported as valid). -->
<xsl:output method="text"/>
<xsl:template match="/root">
<xsl:variable name="result">
<xsl:for-each select="a">
<e1><xsl:value-of select=". mod 2"/></e1>
</xsl:for-each>
</xsl:variable>
<xsl:choose>
<xsl:when test="count(exslt:node-set($result)/*[. = 0]) = count(exslt:node-set($result)/*)">
<xsl:text>XML document is valid</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>XML document is invalid</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Please note that the above XSLT 1.0 stylesheet uses the XSLT 1.0 extension function "node-set", which is supported by most XSLT 1.0 engines (for example, XalanJ, as described here: https://xalan.apache.org/xalan-j/apidocs/org/apache/xalan/lib/ExsltCommon.html).
For interested readers, the following is an equivalent XML Schema 1.1 validation that solves the same problem:
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="root">
<xs:complexType>
<xs:sequence>
<xs:element name="a" type="xs:integer" maxOccurs="unbounded"/>
</xs:sequence>
<xs:assert test="count(a) = count(a[. mod 2 = 0])"/>
</xs:complexType>
</xs:element>
</xs:schema>
Personally speaking, I'd prefer XML Schema 1.1 validation for this requirement, since the XML Schema language is designed for XML document validation, whereas the XSLT language is designed for XML document transformation (but, as illustrated within this blog post, an XSLT stylesheet can do the job of an XML document validator as well).
Sunday, November 1, 2009
XSLT 1.0: Regular expression string tokenization, and Xalan-J
I got motivated enough to write a Java extension providing a regular expression based string tokenization facility for XSLT 1.0 stylesheets, using the Xalan-J XSLT 1.0 engine.
Here's the Java code and a sample XSLT stylesheet for this functionality:
String tokenizer Xalan-J Java extension:
package org.apache.xalan.xslt.ext;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.xpath.NodeSet;
import org.w3c.dom.Document;

public class XalanUtil {

    public static NodeSet tokenize(String str, String regExp) throws ParserConfigurationException {
        String[] tokens = str.split(regExp);
        NodeSet nodeSet = new NodeSet();

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        Document document = docBuilder.newDocument();

        for (int nodeCount = 0; nodeCount < tokens.length; nodeCount++) {
            nodeSet.addElement(document.createTextNode(tokens[nodeCount]));
        }

        return nodeSet;
    }
}

Sample XSLT stylesheet, using the above Java extension (named test.xsl):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:java="http://xml.apache.org/xalan/java" exclude-result-prefixes="java">
  <xsl:output method="xml" indent="yes" />
  <xsl:param name="str" />
  <xsl:template match="/">
    <words>
      <xsl:for-each select="java:org.apache.xalan.xslt.ext.XalanUtil.tokenize($str, '\s+')">
        <word>
          <xsl:value-of select="." />
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>
</xsl:stylesheet>

For example, when the above stylesheet is run with Xalan as follows:
java -classpath <path to the extension java class> org.apache.xalan.xslt.Process -in test.xsl -xsl test.xsl -PARAM str "hello world"
the following output is produced:
<?xml version="1.0" encoding="UTF-8"?>
<words>
  <word>hello</word>
  <word>world</word>
</words>
This illustrates that regular expression based string tokenization was applied, as designed above, within an XSLT 1.0 environment.
The above Java extension should run fine with a minimum JRE level of 1.4, as it relies on the JDK method java.lang.String.split(String regex), which is available since JDK 1.4.
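Since the extension passes the result of String.split straight through, its edge cases carry over to the stylesheet: leading whitespace within the input yields an empty first token, and an empty input yields a single empty token. The following is a minimal sketch of that behaviour in plain Java, together with one possible way of dropping the empty tokens. The class name SplitBehaviour and its methods are hypothetical names, and are not part of the extension above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration; not part of the extension above.
final class SplitBehaviour {

    // The tokens exactly as String.split produces them.
    static List<String> rawTokens(String str, String regExp) {
        return Arrays.asList(str.split(regExp));
    }

    // The same tokens, with the empty strings filtered out.
    static List<String> nonEmptyTokens(String str, String regExp) {
        return Arrays.stream(str.split(regExp))
            .filter(token -> !token.isEmpty())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(rawTokens("hello world", "\\s+"));       // [hello, world]
        System.out.println(rawTokens(" hello world", "\\s+"));      // [, hello, world]
        System.out.println(nonEmptyTokens(" hello world", "\\s+")); // [hello, world]
    }
}
```

Note that this sketch uses streams and lambdas, so unlike the extension itself, it needs Java 8 or later.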
PS: For easier reading and less verbosity, the package name in the above Java extension class may be omitted, in which case the corresponding XSLT instruction is written as follows:
xsl:for-each select="java:XalanUtil.tokenize(...
I would personally prefer this coding style, for production Java XSLT extensions. Though, this should not matter and, in my opinion, the decision can be left to individual XSLT developers.
I hope that this was useful.
Thursday, April 10, 2008
Xalan-J serializer
I think there is scope for improvement in the Xalan-J 2.7.1 serializer.
I tried the following sample XSLT stylesheet with Xalan-J 2.7.1.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" indent="yes" />
<xsl:template match="/">
<x>
<y/>
</x>
</xsl:template>
</xsl:stylesheet>
The output produced by Xalan is:
<?xml version="1.0" encoding="UTF-8"?><x>
<y/>
</x>
Please note that the start tag of the top-most element, <x>, does not begin on a new line after the XML declaration.
I'd wish the output in this case to be:
<?xml version="1.0" encoding="UTF-8"?>
<x>
<y/>
</x>
This problem seems to happen with any XML output.
Henry Zongaro provided a good argument for why this is so:
The problem here is that the serializer considers that the result document might be used as an external general parsed entity. So, suppose the result is named result.xml. If it's referenced inside a document such as the following, inserting whitespace before the x element in result.xml would affect the text content of its parent element, doc.
<!DOCTYPE doc [
<!ENTITY ref SYSTEM "result.xml">
]>
<doc>Some non-whitespace test &ref; Some more non-whitespace text</doc>