Thursday, April 6, 2023

XSLT 1.0 transformation: find distinct values

Continuing from my previous blog post on this site, this post describes how to use XSLT 1.0 (tested with Apache XalanJ 2.7.3 and its JavaScript extension function bindings) to find the distinct values in a data set (i.e., to de-duplicate it) that originates from an XML instance document.

The following XSLT transformation example illustrates this.

XML instance document:

<?xml version="1.0" encoding="UTF-8"?>
<elem>
  <a>2</a>
  <a>3</a>
  <a>3</a>
  <a>5</a>
  <a>3</a>
  <a>1</a>
  <a>2</a>
  <a>5</a>
</elem>

Corresponding XSLT 1.0 transformation:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xalan="http://xml.apache.org/xalan"
                xmlns:js="http://js_functions"
                extension-element-prefixes="js"
                version="1.0">

   <xsl:output method="text"/>

   <!-- JavaScript extension function: drops the trailing comma from the result string -->
   <xalan:component prefix="js" functions="reformString">
      <xalan:script lang="javascript">
        function reformString(str)
        {
           return str.substr(0, str.length - 1);
        }
      </xalan:script>
   </xalan:component>

   <xsl:template match="/elem">
      <xsl:if test="count(a) &gt; 0">
         <xsl:variable name="result">
            <xsl:call-template name="distinctValues">
               <xsl:with-param name="curr_node" select="a[1]"/>
               <xsl:with-param name="csv_result" select="concat(string(a[1]), ',')"/>
            </xsl:call-template>
         </xsl:variable>
         <xsl:value-of select="js:reformString(string($result))"/>
      </xsl:if>
   </xsl:template>

   <!-- Walks the sibling elements recursively, appending each value not yet present in csv_result -->
   <xsl:template name="distinctValues">
      <xsl:param name="curr_node"/>
      <xsl:param name="csv_result"/>
      <xsl:choose>
        <!-- recurse while a node remains, so that the last sibling is examined too -->
        <xsl:when test="$curr_node">
           <xsl:variable name="temp1">
              <xsl:choose>
                 <!-- compare comma-delimited on both sides, so that e.g. "3" does not match inside "13," -->
                 <xsl:when test="not(contains(concat(',', $csv_result), concat(',', string($curr_node), ',')))">
                    <xsl:value-of select="concat($csv_result, string($curr_node), ',')"/>
                 </xsl:when>
                 <xsl:otherwise>
                    <xsl:value-of select="$csv_result"/>
                 </xsl:otherwise>
              </xsl:choose>
           </xsl:variable>
           <xsl:call-template name="distinctValues">
              <xsl:with-param name="curr_node" select="$curr_node/following-sibling::*[1]"/>
              <xsl:with-param name="csv_result" select="normalize-space($temp1)"/>
           </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
           <!-- no node left: emit the accumulated, comma-terminated list -->
           <xsl:value-of select="$csv_result"/>
        </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

The XSLT transformation above produces the following desired result:

2,3,5,1
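
In the stylesheet above, the js:reformString extension function only removes the trailing comma from the accumulated string. As a hedged alternative, the same trimming could probably be done with the core XPath 1.0 functions substring and string-length, which need no extension bindings. The following is a minimal stand-alone sketch; the literal value assigned to $result is an assumption standing in for the string that the distinctValues template builds.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/">
      <!-- assumption: stand-in for the comma-terminated string built by distinctValues -->
      <xsl:variable name="result" select="'2,3,5,1,'"/>
      <!-- keep every character except the last one (the trailing comma) -->
      <xsl:value-of select="substring($result, 1, string-length($result) - 1)"/>
   </xsl:template>

</xsl:stylesheet>
```

Applied to any XML input, this sketch writes 2,3,5,1 as text. In the original stylesheet, only the select expression of the final xsl:value-of would need to change.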

XalanJ users can find the JavaScript-related jars (which need to be available on the JVM classpath at run time during the XSLT transformation) within the XalanJ src distribution. The relevant jar files are: bsf.jar, commons-logging-1.2.jar, and rhino-1.7.14.jar (Rhino is Mozilla's JavaScript engine implementation, bundled with the XalanJ 2.7.3 src distribution).
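
Where these extension jars are not available on the classpath, distinct values can also be found with XSLT 1.0 alone. The following is a minimal sketch of a different technique from the recursive template used in this post: it assumes the same /elem/a input structure and keeps an a element only when no preceding sibling a has the same string value.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

   <xsl:output method="text"/>

   <xsl:template match="/elem">
      <!-- select each 'a' whose string value does not occur in an earlier sibling 'a' -->
      <xsl:for-each select="a[not(. = preceding-sibling::a)]">
         <xsl:value-of select="."/>
         <!-- emit the separator between items only, so nothing needs trimming afterwards -->
         <xsl:if test="position() != last()">,</xsl:if>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
```

For the XML instance document shown earlier, this sketch should also produce 2,3,5,1. The comparison is on string values, so 2 and 2.0 would count as different, and each element is compared against all of its preceding siblings, which is fine for small inputs but may be slow for large ones.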

