Friday, January 16, 2009

Normalizing unnecessary whitespace text nodes during XSLT transformation

Let's say that my input XML is the following:

<test>
<a/>
<b/>
<c/>
<d/>
<e/>
<f>some data ..</f>
</test>

I need to write an XSLT transformation that just removes the elements 'c' and 'd' and keeps the rest of the structure the same.

The result of the transformation should be the following [1]:

<test>
<a/>
<b/>
<e/>
<f>some data ..</f>
</test>

The obvious solution to this problem is to write a modified identity transformation.

i.e.,

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes" />

  <!-- Identity template: copy every node and attribute as it is -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop the elements 'c' and 'd' -->
  <xsl:template match="c | d" />

</xsl:stylesheet>

But there is a subtle flaw in this logic. The actual output produced by the above stylesheet is:

<?xml version="1.0" encoding="UTF-8"?>
<test>
<a/>
<b/>


<e/>
<f>some data ..</f>
</test>

There are two whitespace holes (blank lines) in the output, left behind by the elements that were removed. This makes the output not 100% the same as the desired output [1].

These whitespace holes are easy to explain. They are the newline whitespace text nodes next to the elements 'c' and 'd' in the original document. The identity template copies these text nodes to the output even though the elements themselves are dropped.
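
To make these text nodes visible, here is a rough diagnostic sketch (not part of the solution itself). It assumes the input document shown at the top of this post, and that the XML parser feeding the XSLT processor preserves whitespace-only text nodes, which some processors do not do by default. It prints one line for every child node of 'test'.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="text" />

  <!-- List every child node of 'test', one per line -->
  <xsl:template match="/test">
    <xsl:for-each select="node()">
      <xsl:choose>
        <!-- An element child: print its name -->
        <xsl:when test="self::*">
          <xsl:text>element </xsl:text>
          <xsl:value-of select="name()" />
        </xsl:when>
        <!-- A text child that contains nothing but whitespace -->
        <xsl:when test="self::text() and normalize-space() = ''">
          <xsl:text>whitespace-only text node</xsl:text>
        </xsl:when>
        <!-- Anything else (comments, other text, ...) -->
        <xsl:otherwise>
          <xsl:text>other node</xsl:text>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With whitespace preserved, the listing should alternate between whitespace-only text nodes and elements. The text nodes that sit next to 'c' and 'd' are the ones that survive in the output once those two elements are dropped.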

Adding a little bit of extra logic to the stylesheet fixes this problem.

The right solution is the following:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" indent="yes" />

  <!-- Identity template: copy every node and attribute as it is -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop the elements 'c' and 'd' -->
  <xsl:template match="c | d" />

  <!-- Empty template: drop a whitespace-only text node whose immediately
       preceding sibling is 'c' or 'd' -->
  <xsl:template match="text()[(normalize-space() = '') and (preceding-sibling::node()[1]/self::c or preceding-sibling::node()[1]/self::d)]" />

</xsl:stylesheet>


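Please note the last template in this stylesheet, which fixed the whitespace problem for me. It matches any whitespace-only text node whose immediately preceding sibling is a 'c' or 'd' element, and outputs nothing for it, so the newline that followed each removed element is removed as well.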
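
A different way to get a similar result, sketched here as an alternative and not as the approach used above, is to strip all whitespace-only text nodes from the input with xsl:strip-space and let indent="yes" produce fresh line breaks. This assumes that no whitespace-only text node in the input is significant (for example, there is no mixed content where it matters). The exact indentation of the result depends on the XSLT processor.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Drop every whitespace-only text node from the input tree -->
  <xsl:strip-space elements="*" />

  <!-- Let the serializer add its own line breaks and indentation -->
  <xsl:output method="xml" indent="yes" />

  <!-- Identity template: copy every node and attribute as it is -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: drop the elements 'c' and 'd' -->
  <xsl:template match="c | d" />

</xsl:stylesheet>
```

The trade-off is that the original layout is discarded for the whole document, while the targeted template above removes only the text nodes that follow 'c' and 'd' and leaves all other whitespace exactly as it was in the input.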

2 comments:

crontab said...

please check the template, it doesn't fully show in the browser

Mukul Gandhi said...

sorry for a late reply. the posts I make, I usually edit in google chrome. With google chrome, the post appears fine. unfortunately, I don't do enough portability checks to my posts.