Saturday, November 22, 2008

Are multiple XPath predicates the same as the boolean "and" operator?

I had a doubt about this concept, and asked the following question on xsl-list:

Suppose I write the following XPath expressions:

1) X[c1][c2] or, generically, X[c1][c2]...[cn]

2) X[c1 and c2] or, generically, X[c1 and c2 and ... and cn]

where c1, c2, etc. are boolean expressions.

Are the two forms (1 & 2) above exactly equivalent (i.e., will they return the same node-set/sequence)? I think yes, but I just wanted to confirm with the list.

If 1 & 2 are exactly equivalent, then what is a good rule of thumb for choosing one form over the other in a given scenario?

There was a good discussion on the list about this, and list members shared some useful thoughts.

Below is a summary of the points we discussed on the list.

1. David Carlisle

> are the two forms (1 & 2) above exactly equivalent

No.

Compare

X[position()=2][position()=2]

and

X[position()=2 and position()=2]

The first one is

()

(the empty sequence), and the second is

X[2]

David further wrote:

The context position (position()) and size (last()) do change. So, basically, repeated filters are equivalent to "and" unless any of them depends on position() or last(), including the special case of [integer] being equivalent to [position()=integer].
This last case is what makes it tricky to do a static rewrite.

If you have

X[... foo ...][... bar ...]

you can only rewrite that to

X[(... foo ...) and (... bar ...)]

if you know that neither expression will evaluate to a number at run time.
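David's two cases can be checked side by side with a small XSLT 2.0 stylesheet. This is a sketch under stated assumptions: the root/X sample tree and the $k variable are made up for illustration, the data lives in a variable so any input document will do, and an XSLT 2.0 processor (such as Saxon) is assumed.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Hypothetical sample data: four X elements numbered 1..4 -->
  <xsl:variable name="doc">
    <root>
      <X n="1"/>
      <X n="2"/>
      <X n="3"/>
      <X n="4"/>
    </root>
  </xsl:variable>

  <!-- Hypothetical value standing in for "a number at run time" -->
  <xsl:variable name="k" select="1"/>

  <xsl:template match="/">
    <result>
      <!-- The second predicate sees a one-node sequence, so position() is 1: empty -->
      <chained count="{count($doc/root/X[position()=2][position()=2])}"/>
      <!-- Both tests see the same context position: this is X[2] -->
      <combined n="{$doc/root/X[position()=2 and position()=2]/@n}"/>
      <!-- [$k] is a numeric predicate, i.e. [position()=$k]: one node -->
      <numeric-chained count="{count($doc/root/X[@n > 1][$k])}"/>
      <!-- Inside "and", $k only contributes its effective boolean value (true) -->
      <numeric-combined count="{count($doc/root/X[(@n > 1) and $k])}"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Run against any input, this should report a count of 0 for the chained position()=2 form, n="2" for the combined form, a count of 1 for X[@n > 1][$k] and a count of 3 for X[(@n > 1) and $k]. The last pair is exactly the static-rewrite trap: the two forms agree only if $k is not a number.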


2. Vasu Chakkera

If that were true, then the expression

myelement[@myattribute][1] should be the same as
myelement[1][@myattribute], which is not true.

The predicate order is important,

whereas in a typical "and",

[a and b] = [b and a]
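Vasu's point is easy to see on a concrete tree. A minimal XSLT 2.0 sketch, using a hypothetical three-element sample in which only the second and third myelement carry @myattribute (the name attribute exists only to identify the results):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Hypothetical sample: the first myelement has no @myattribute -->
  <xsl:variable name="doc">
    <root>
      <myelement name="a"/>
      <myelement name="b" myattribute="yes"/>
      <myelement name="c" myattribute="yes"/>
    </root>
  </xsl:variable>

  <xsl:template match="/">
    <result>
      <!-- Filter on the attribute first, then take the first survivor: "b" -->
      <attr-then-first names="{$doc/root/myelement[@myattribute][1]/@name}"/>
      <!-- Take the first element, then test it: "a" fails, so nothing -->
      <first-then-attr names="{$doc/root/myelement[1][@myattribute]/@name}"/>
      <!-- One predicate with "and": operand order makes no difference -->
      <and-ab names="{$doc/root/myelement[@myattribute and position()=1]/@name}"/>
      <and-ba names="{$doc/root/myelement[position()=1 and @myattribute]/@name}"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

This should print "b" for myelement[@myattribute][1] and an empty value for the other three. The last two show that myelement[1][@myattribute] behaves like myelement[position()=1 and @myattribute], and that swapping the operands of "and" changes nothing.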


3. Andrew Welch

Only / will change the context node, so I would've thought one predicate after the other is pretty much equivalent, apart from cases that rely on the size of the selection (which is the only thing that changes after each predicate).
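Andrew's remark about the size of the selection applies directly to last(). A sketch, again with made-up sample data and assuming an XSLT 2.0 processor, contrasting a chained [last()] with a single predicate that uses position() = last():

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Hypothetical sample: only the first two X elements carry @keep -->
  <xsl:variable name="doc">
    <root>
      <X n="1" keep="yes"/>
      <X n="2" keep="yes"/>
      <X n="3"/>
    </root>
  </xsl:variable>

  <xsl:template match="/">
    <result>
      <!-- last() is the size after the @keep filter (2), so this picks n="2" -->
      <chained n="{$doc/root/X[@keep][last()]/@n}"/>
      <!-- last() is the size of all X (3); the third X has no @keep: nothing -->
      <combined n="{$doc/root/X[@keep and position() = last()]/@n}"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The chained form should yield n="2", while the combined form should yield an empty value, because the two last() calls are evaluated against differently sized sequences.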

Mukul: I asked a related follow-up question:

For real-world XSLT/XPath programs, up to how many predicates do we typically see?

I haven't seen programs using 3, 4 or more predicates.

X[..][..][..][..]

I have used only one or two predicates up to now.

Are excessively large numbers of predicates really useful (though the syntax allows them)?

I think that, perhaps, multiple predicates are useful for complex 'and' conditions.

David shared an interesting observation about this:

He has been using some stylesheets with up to 11 predicates.

He wrote:

> are excessively large numbers of predicates really useful? (though, the syntax allows that)

Isn't that like asking if complicated expressions are useful? They are useful if you need them; otherwise they are not.

3 or 4 predicates is totally routine, but the most common reason for having larger numbers is to filter on attributes:

[not(@purpose='iemode')]
[not(@purpose='artifact')]
[not(@purpose='w-dimension')]

is equivalent to

[not(@purpose='iemode') and
 not(@purpose='artifact') and
 not(@purpose='w-dimension')]

but I'd almost always use the first form in XSLT 1.0, because it's easier to indent and easier to refactor. If starting from scratch in XSLT 2.0, though, I'd write it as

[not(@purpose=('iemode','artifact','w-dimension'))]
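The three forms David mentions can be compared directly. A sketch assuming an XSLT 2.0 processor; the img elements and their names are hypothetical, and only the purpose values come from his example:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <!-- Hypothetical sample data; "e" has no @purpose at all -->
  <xsl:variable name="doc">
    <root>
      <img name="a" purpose="iemode"/>
      <img name="b" purpose="artifact"/>
      <img name="c" purpose="w-dimension"/>
      <img name="d" purpose="figure"/>
      <img name="e"/>
    </root>
  </xsl:variable>

  <xsl:template match="/">
    <result>
      <!-- One predicate per excluded value (the XSLT 1.0 style) -->
      <chained>
        <xsl:value-of select="$doc/root/img
                                [not(@purpose='iemode')]
                                [not(@purpose='artifact')]
                                [not(@purpose='w-dimension')]/@name"/>
      </chained>
      <!-- The same test in a single predicate joined with "and" -->
      <combined>
        <xsl:value-of select="$doc/root/img
                                [not(@purpose='iemode') and
                                 not(@purpose='artifact') and
                                 not(@purpose='w-dimension')]/@name"/>
      </combined>
      <!-- XSLT 2.0: general comparison against a sequence of strings -->
      <general-comparison>
        <xsl:value-of select="$doc/root/img
                                [not(@purpose=('iemode','artifact','w-dimension'))]/@name"/>
      </general-comparison>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

All three should print "d e". None of the predicates depends on position() or last(), which is why the rewrite is safe here. Note that "e" is kept by every form: comparing its missing @purpose (an empty sequence) with anything is false, so each not(...) is true.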

Mukul: This was a nice discussion, I believe, and I have learnt a few useful concepts.

Tuesday, November 4, 2008

fn:contains -> multiple strings to compare with

An XSLT user asked the following question on xsl-list:

I want to have something that does this: contains('$d/ris:organ/text()', 'Hamburg' or 'Koblenz' or 'xxx'...) ===> Compare 1 String with multiple strings.

instead of: contains('$d/ris:organ/text()','Hamburg') or contains('$d/ris:organ/text()','Koblenz')...

Andrew Welch suggested the following answer:

some $x in ('Hamburg', 'Koblenz', 'xxx') satisfies
contains($d/ris:organ/text(), $x)


This uses the XPath 2.0 quantified expression, "some".

This is cool.

I was prompted to share Andrew's answer here because I had come up with a lengthy and perhaps inefficient solution of my own (I feel a bit stupid, actually :) ):
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://my-functions"
                version="2.0">

  <xsl:output indent="yes" omit-xml-declaration="yes" />

  <xsl:template match="/">
    <xsl:variable name="str" select="'hello xxx dd'" />
    <xsl:variable name="list" select="('Hamburg','Koblenz','xxx')" />

    <xsl:if test="my:contains($str, $list)">
      matches
    </xsl:if>

  </xsl:template>

  <!-- a custom 'contains' implementation: true if $str contains
       at least one of the strings in $list -->
  <xsl:function name="my:contains" as="xs:boolean">
    <xsl:param name="str" as="xs:string" />
    <xsl:param name="list" as="xs:string+" />

    <!-- collect one true() for every string in $list that occurs in $str -->
    <xsl:variable name="temp" as="xs:boolean*">
      <xsl:for-each select="$list">
        <xsl:if test="contains($str, .)">
          <xsl:sequence select="true()" />
        </xsl:if>
      </xsl:for-each>
    </xsl:variable>

    <!-- at least one match means $temp is non-empty -->
    <xsl:sequence select="exists($temp)" />

  </xsl:function>

</xsl:stylesheet>
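Andrew's quantified expression also shrinks the custom function above to a single xsl:sequence. A sketch that keeps the same hypothetical my: namespace, parameters and test strings, assuming an XSLT 2.0 processor:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:my="http://my-functions"
                version="2.0">

  <xsl:output indent="yes" omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:variable name="str" select="'hello xxx dd'"/>
    <xsl:variable name="list" select="('Hamburg','Koblenz','xxx')"/>

    <xsl:if test="my:contains($str, $list)">
      matches
    </xsl:if>
  </xsl:template>

  <!-- True if $str contains at least one of the strings in $list;
       "some ... satisfies" stops looking once a match is found -->
  <xsl:function name="my:contains" as="xs:boolean">
    <xsl:param name="str" as="xs:string"/>
    <xsl:param name="list" as="xs:string+"/>
    <xsl:sequence select="some $x in $list satisfies contains($str, $x)"/>
  </xsl:function>

</xsl:stylesheet>
```

With this test data it should output "matches", because 'hello xxx dd' contains 'xxx'. At that point the function is barely worth having, and the expression can simply be written inline in the xsl:if test.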