I was motivated enough to write a Java extension that provides regular-expression-based string tokenization for XSLT 1.0 stylesheets, using the Xalan-J XSLT 1.0 engine.
Here is the Java code and a sample XSLT stylesheet for this functionality:
String tokenizer Xalan-J Java extension:
package org.apache.xalan.xslt.ext;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.apache.xpath.NodeSet;
import org.w3c.dom.Document;

public class XalanUtil {

    // Splits str around matches of regExp and returns the tokens as a node-set of text nodes.
    public static NodeSet tokenize(String str, String regExp) throws ParserConfigurationException {
        String[] tokens = str.split(regExp);
        NodeSet nodeSet = new NodeSet();

        // An empty DOM document serves only as a factory for the text nodes.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = dbf.newDocumentBuilder();
        Document document = docBuilder.newDocument();

        for (int nodeCount = 0; nodeCount < tokens.length; nodeCount++) {
            nodeSet.addElement(document.createTextNode(tokens[nodeCount]));
        }

        return nodeSet;
    }
}

Sample XSLT stylesheet using the above Java extension (named test.xsl):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"
                xmlns:java="http://xml.apache.org/xalan/java"
                exclude-result-prefixes="java">

  <xsl:output method="xml" indent="yes" />

  <xsl:param name="str" />

  <xsl:template match="/">
    <words>
      <xsl:for-each select="java:org.apache.xalan.xslt.ext.XalanUtil.tokenize($str, '\s+')">
        <word>
          <xsl:value-of select="." />
        </word>
      </xsl:for-each>
    </words>
  </xsl:template>

</xsl:stylesheet>

For example, when the above stylesheet is run with Xalan as follows:

java -classpath <path to the extension java class> org.apache.xalan.xslt.Process -in test.xsl -xsl test.xsl -PARAM str "hello world"

the following output is produced:
<?xml version="1.0" encoding="UTF-8"?>
<words>
<word>hello</word>
<word>world</word>
</words>
This illustrates that regular-expression-based string tokenization works as designed above in an XSLT 1.0 environment.
The above Java extension should run fine with a minimum JRE level of 1.4, as it relies on the method java.lang.String.split(String regex), which has been available since JDK 1.4.
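Since the extension hands its arguments straight to String.split, the splitting rules of java.lang.String carry over to the stylesheet. The small sketch below shows two of them that are easy to overlook: leading whitespace produces an empty first token (which would show up as an empty word element), while trailing whitespace does not produce an empty last token. The class name SplitBehaviourDemo is only an illustration and is not part of the extension.

```java
import java.util.Arrays;

// Hypothetical demo class; it mirrors the split call used inside XalanUtil.tokenize.
class SplitBehaviourDemo {

    // Same call the extension makes before wrapping each token in a text node.
    static String[] tokens(String str, String regExp) {
        return str.split(regExp);
    }

    public static void main(String[] args) {
        // Plain case: two tokens.
        System.out.println(Arrays.toString(tokens("hello world", "\\s+")));   // [hello, world]

        // Leading separator: an empty string comes first.
        System.out.println(Arrays.toString(tokens(" hello world", "\\s+")));  // [, hello, world]

        // Trailing separator: trailing empty strings are dropped.
        System.out.println(Arrays.toString(tokens("hello world  ", "\\s+"))); // [hello, world]
    }
}
```

If empty tokens are unwanted, one option is to normalize the parameter in the stylesheet first, for example by passing normalize-space($str) instead of $str.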
PS: For easier reading and less verbosity, the package name in the above Java extension class may be omitted, in which case the corresponding XSLT instruction is written as follows:

xsl:for-each select="java:XalanUtil.tokenize(...

I would personally prefer this coding style for production Java XSLT extensions. That said, this should not matter, and in my opinion the decision can be left to individual XSLT developers.
I hope that this was useful.
2 comments:
This is a fascinating problem. I was hoping this would work, but I got:
ERROR: 'Error checking type of the expression 'com.sun.org.apache.xalan.internal.xsltc.compiler.ForEach@1bc74f37'.'
FATAL ERROR: 'Could not compile stylesheet'
(Location of error unknown)
XSLT Error (javax.xml.transform.TransformerConfigurationException): Could not compile stylesheet
It seems you are using the Xalan that is bundled with the Sun JDK. To get this to work, I recommend using the latest version of Apache Xalan-J (see http://xml.apache.org/xalan-j/).
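A quick way to check which engine is being picked up is sketched below. The class name WhichXsltEngine is hypothetical. It prints the TransformerFactory implementation that JAXP selects: a name starting with com.sun.org.apache.xalan.internal (as in the error message above) points to the JDK-bundled XSLTC compiler, whereas Apache Xalan-J's interpretive processor is org.apache.xalan.processor.TransformerFactoryImpl.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which XSLT implementation JAXP resolves on the current classpath.
class WhichXsltEngine {

    // Returns the fully qualified class name of the TransformerFactory in use.
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(factoryClassName());
    }
}
```

If the bundled implementation is reported even with the Xalan-J jars on the classpath, setting the system property javax.xml.transform.TransformerFactory to org.apache.xalan.processor.TransformerFactoryImpl should make JAXP select Apache Xalan-J.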