Saturday, May 3, 2008

Output validation with XSLT 2.0

An interesting example of schema-aware XSLT stylesheet design occurred to me. The code for it is below.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

  <xsl:output method="xml" indent="yes" />

  <!-- First inline schema: element x contains exactly one y -->
  <xsl:import-schema>
    <xs:schema>
      <xs:element name="x">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="y" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:import-schema>

  <!-- Second inline schema: element p contains exactly one q -->
  <xsl:import-schema>
    <xs:schema>
      <xs:element name="p">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="q" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="/">
    <!-- Each literal result element is validated as it is constructed -->
    <xsl:variable name="temp1">
      <x xsl:validation="strict">
        <y/>
      </x>
    </xsl:variable>
    <xsl:variable name="temp2">
      <p xsl:validation="strict">
        <q/>
      </p>
    </xsl:variable>
    <result>
      <xsl:copy-of select="$temp1" />
      <xsl:copy-of select="$temp2" />
    </result>
  </xsl:template>

</xsl:stylesheet>

This stylesheet imports two inline XSD schemas. In the template that matches the document root, the literal result elements inside the two variables (temp1 and temp2) request strict validation through the xsl:validation attribute.

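If we run this example with a schema-aware XSLT 2.0 processor, we find that the stylesheet cannot generate invalid content: if x or p did not conform to its schema, the processor would report a validation error instead of producing the output.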
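To see the failure case, here is a minimal sketch that reuses the first schema from the stylesheet above but deliberately constructs an invalid x. The child element z is my own illustration and is not part of the original example. A schema-aware processor should stop with a validation error on this stylesheet rather than emit the element.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

  <!-- Same inline schema as above: x must contain exactly one y -->
  <xsl:import-schema>
    <xs:schema>
      <xs:element name="x">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="y" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="/">
    <!-- z (illustrative only) is not allowed inside x,
         so strict validation of x is expected to fail here -->
    <x xsl:validation="strict">
      <z/>
    </x>
  </xsl:template>

</xsl:stylesheet>
```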

An alternate writing style for the above example could be:

<xsl:template match="/">
  <xsl:variable name="temp1">
    <x>
      <y/>
    </x>
  </xsl:variable>
  <xsl:variable name="temp2">
    <p>
      <q/>
    </p>
  </xsl:variable>
  <result>
    <xsl:copy-of select="$temp1" validation="strict" />
    <xsl:copy-of select="$temp2" validation="strict" />
  </result>
</xsl:template>

Here we specify validation="strict" on the xsl:copy-of instruction instead.

The intended meaning is the same in both cases.

To me, this is quite a useful XSLT facility. XSLT 2.0 is very flexible about where in the output tree we want validation to occur.

12 comments:

nishant said...

Hi Mukul,

You seem to know a lot about XML.
I just wanted to know whether you know how I can validate the output of an XSL style sheet.
I will try to be more specific:
- I have lots of XML documents
- I have a style sheet (XSL) which is used to view them in the browser.
- I want a tool to validate that the data I see in the browser is the data in the XML, and that it is complete.
- If no such tool exists, can we at least validate certain critical sections of the XML?

Thanks in advance

Nishant Gupta
nishant@cs.jhu.edu

Mukul Gandhi said...

Hi Nishant,
From your post, it seems that you view XML data in the browser and want XML validation to occur before you make use of the XML data. I think this is a good use of Schema technology.

I think you can use XML Schemas in three ways for this requirement.

1) Validate XML on server side. Here you can leverage any available XML Schema validators, like Xerces. You only send validated XML to the browser.

2) Validate XML on client side. In this case you need to use browser support for XML validation. If you want to use this approach, you should study the browser documentation on how to do this.

3) You can generate validated XML output from the XSLT 2.0 stylesheet, which is what this post talks about. This is a very new technology, and I think it is very useful.

I would personally prefer the server side approach, or a Schema-aware stylesheet design.

Regards,
Mukul

nishant said...

Hi Mukul,

Thanks for such a quick reply but I think I did not frame my question correctly.

- XML validation according to schema is not the problem.

Let me explain the problem again:

- I have some XML Documents
- I use a tool by Altova which checks whether they are valid according to the schema or not.

Now the problem,

- I use XSL style sheets to display data in the browser.
- What I want now is to generate a report alongside, confirming that all the data in the XML is displayed in the browser (as XSL can eat some data if a tag is missed or some if statement is not working).
- If the data is critical (marked in the XML) and is not shown in the browser, then the validation report should fail.
- So one approach is to use the XSL to make HTML output, and a tool to parse the HTML and the source XML to see whether the critical sections were shown or not.
- So what I want to ask is: is there any better solution, one implemented inside the XSLT itself so that it works in real time?
- The thing is, I can see visually that everything is working, but it's medical data, so the FDA needs to be shown a report that what the scientists see is correct.
- Also, the visual approach is lame and the amount of data is enormous.

I know I am taking up your time but I am kind of stuck. I have searched Google and could not find anything. My supervisor thinks that if something to validate XSL output comes along, it will be in a research paper. So I was wondering if you have seen something like this or can think of some idea.

Thanks

-Nishant

Mukul Gandhi said...

Hi Nishant,
I think the approach you cited is quite feasible.

"So one approach is to use the XSL to make HTML output, and a tool to parse the HTML and the source XML to see whether the critical sections were shown or not."

Instead of HTML output, you should produce XHTML output, which you can parse with a regular XML parser, or feed to an XSLT stylesheet for comparison with the original XML.
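As a rough sketch of such a comparison stylesheet: it is run against the original XML and reads the generated XHTML page with doc(). The critical="true" attribute, the rendered-uri parameter and the report/item element names are all assumptions made for illustration, since I don't know how the critical data is marked in the real documents. The check itself is a crude substring test on normalized text, so it can only serve as a starting point.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:h="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="h"
                version="2.0">

  <xsl:output method="xml" indent="yes" />

  <!-- Hypothetical parameter: URI of the XHTML page produced
       by the display stylesheet -->
  <xsl:param name="rendered-uri" select="'rendered.xhtml'" />

  <xsl:template match="/">
    <!-- All text shown in the page body, whitespace-normalized.
         Assumes the page is in the XHTML namespace. -->
    <xsl:variable name="shown"
                  select="normalize-space(doc($rendered-uri)/h:html/h:body)" />
    <report>
      <!-- Assumption: critical source elements carry critical="true" -->
      <xsl:for-each select="//*[@critical = 'true']">
        <xsl:variable name="value" select="normalize-space(.)" />
        <!-- shown is "true" when the element's text occurs in the page -->
        <item name="{name()}" shown="{contains($shown, $value)}" />
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

Any item with shown="false" in the resulting report would point to critical data that the display stylesheet dropped. Note that an element with empty text always reports "true" with this test.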

Regards,
Mukul

nishant said...

Hey Mukul

Thanks for all the replies.
This was helpful as I am very new to the XML world. I am still trying to find my way, as I want something real-time which generates the report as and when it is displaying data in the browser. I somehow feel XSLT 2.0 might provide some help.

Thanks for all the help

Nishant

Durga Rajan said...

Hi Mukul,
I have a problem. Can you please give me a good suggestion? The explanation of the problem is as follows:
I have XSLT code which transforms an XML document to an XSD. When I execute the XSLT, I want it to throw an error as the output if the schema it generates is not valid. This should also stop the schema generation, so the output should only be an error message, with no schemas generated.

Mukul Gandhi said...

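With Schema aware XSLT 2.0, this could be quite easy. Just put xsl:validation="strict" on the xs:schema element that your stylesheet generates, with a schema for the XML Schema namespace imported into the stylesheet.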
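A minimal sketch of this idea follows. It assumes that a local copy of the Schema for Schemas is available as XMLSchema.xsd, that the processor accepts an import for the XML Schema namespace (worth checking in its documentation), and that the source document has the hypothetical shape <fields><field name="..."/></fields>. Validation of this kind checks the generated document against the Schema for Schemas only, so it may not catch every rule that makes a schema invalid.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                version="2.0">

  <xsl:output method="xml" indent="yes" />

  <!-- Assumption: XMLSchema.xsd is a local copy of the Schema for Schemas -->
  <xsl:import-schema namespace="http://www.w3.org/2001/XMLSchema"
                     schema-location="XMLSchema.xsd" />

  <!-- Assumption: hypothetical source of the form
       <fields><field name="..."/>...</fields> -->
  <xsl:template match="/fields">
    <!-- The generated xs:schema element is strictly validated;
         e.g. a field name that is not a valid NCName should be rejected -->
    <xs:schema xsl:validation="strict">
      <xs:element name="record">
        <xs:complexType>
          <xs:sequence>
            <xsl:for-each select="field">
              <xs:element name="{@name}" type="xs:string" />
            </xsl:for-each>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:template>

</xsl:stylesheet>
```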

For XSLT 1.0, I can think of two approaches:

1. Generate the Schema from your stylesheet and then, as a separate second step, validate the Schema. You could pipeline the two stages using a Java program, which will make the whole thing appear as a single process.

2. Generate the Schema in a variable in the stylesheet. Then you could use an extension mechanism to validate the Schema (say in Java). This would be advantageous as Schema validation would occur in the same process as Schema generation.

To validate the Schema, I think you could use:

1. The Schema for Schemas, or

2. An API from a product like Xerces

Durga Rajan said...

Hi Mukul,
Thanks for the reply.
I use Altova Authentic to perform the transformation.
With regard to the 1st approach:
Is it possible to pipeline the two stages using a Java program in Authentic? And how do I validate the schema using an external process?

thanks
Durga

Mukul Gandhi said...

I am sorry, I haven't worked with Altova Authentic.

To pipeline the two stages (Schema generation, and validation) in Java, Altova Authentic must support invocation of the transform from a Java program. But I am just guessing the possibilities …

You should ask Altova or Altova's forums (or search the software help) whether it provides this feature.

Durga Rajan said...

Thanks Mukul

Durga Rajan said...

I am using XSLT 2.0, but the version of the output XML is 1.0.
I have tried using the attribute validation="strict" on the xsl:result-document element, but it's not working out...

Mukul Gandhi said...

validation="strict" works with a Schema aware XSLT 2.0 processor. It won't work with a basic XSLT 2.0 processor.
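One way to find out which kind of processor you are running is to ask it. The small stylesheet below is only a sketch, but the system properties it reads are defined by XSLT 2.0 itself; xsl:is-schema-aware returns "yes" or "no". It can be run against any XML input.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- Report the processor name, its version, and whether it is schema-aware -->
    <xsl:value-of select="system-property('xsl:product-name')" />
    <xsl:text> </xsl:text>
    <xsl:value-of select="system-property('xsl:product-version')" />
    <xsl:text>: schema-aware = </xsl:text>
    <xsl:value-of select="system-property('xsl:is-schema-aware')" />
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```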

As of today, only Saxon and Altova support Schema aware XSLT 2.0 processing. You can try either of these processors to run Schema aware stylesheets.

validation="strict" won't work just anywhere in a 2.0 stylesheet. It works only at specific places, as defined in the language specification.

I wrote an article for IBM developerWorks about Schema aware XSLT 2.0 processing, which is available at http://www.ibm.com/developerworks/xml/library/x-schemaxslt.html. It might help you to learn the basics of Schema awareness in XSLT stylesheets. If you need a more rigorous definition of the Schema aware facilities in XSLT 2.0, please read the language specification (you can also find the links at the bottom of my article).
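For reference, the specification allows the validation attribute on xsl:element, xsl:attribute, xsl:copy, xsl:copy-of, xsl:document and xsl:result-document, and the xsl:validation attribute on literal result elements. So xsl:result-document is one of the valid places. A minimal sketch with it, reusing the x/y schema from the post (the file name out.xml is just an illustration):

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs"
                version="2.0">

  <!-- The schema must be imported, otherwise there is nothing
       to validate the result document against -->
  <xsl:import-schema>
    <xs:schema>
      <xs:element name="x">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="y" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </xsl:import-schema>

  <xsl:template match="/">
    <!-- The whole result document is strictly validated before it is written;
         out.xml is a hypothetical output location -->
    <xsl:result-document href="out.xml" validation="strict">
      <x>
        <y/>
      </x>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
```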