Sunday, April 25, 2010

XSD 1.1: negative "pattern" facets and assertions

While exploring XSD 1.1 assertions further, I've become convinced that many of the limitations of the XSD "pattern" facet can be overcome with assertions. (Of course, one of the real benefits of XSD 1.1 assertions is the ability to specify co-occurrence constraints in XML Schema documents; here's a nice article explaining XML Schema 1.1 co-occurrence constraints.)

I think one of the things that can be quite difficult to express in XML Schema 1.0 is a negative word list.

For example, if we have this simple XML document:
  <fruit>apple</fruit>

And we want the XML element "fruit" not to contain, say, the words "cherry" or "guava". Although this looks like a pretty straightforward regex use case, it unfortunately gets quite cumbersome to express with the available XSD 1.0 regular-expression syntax.

My quick attempt to express this with XSD 1.0 was something like the following:
<xs:pattern value="^(cherry|guava)" />

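But unfortunately the above pattern facet, and quite a few similar regexes, can't accomplish this seemingly common use case. In XSD regular expressions "^" negates only inside a character class (as in [^abc]); outside one it is an ordinary character, and there is no lookahead or general negation operator, so a pattern facet can only describe what is allowed. (I think this is doable with XSD 1.0 regexes, but it would certainly be quite tedious to arrive at the right pattern. Of course regex experts/gurus could do this easily, but not me at this moment!)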

Now let me express these validation constraints with XSD 1.1 assertions. Here's a sample XSD 1.1 schema [1] that uses assertions to solve this and a few similar use cases:

  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

     <xs:element name="Example" type="Fruits1" />
  
     <xs:complexType name="Fruits1">
       <xs:sequence>
         <xs:element name="fruit" type="xs:string" />
         <xs:element name="exclude" type="xs:string" />
       </xs:sequence>
       <xs:assert test="not(fruit = tokenize(exclude,','))" />
     </xs:complexType>
  
     <xs:complexType name="Fruits2">
       <xs:sequence>
         <xs:element name="fruit" type="xs:string" />
         <xs:element name="exclude" type="xs:string" />
       </xs:sequence>
       <xs:assert test="not(fruit = (for $x in tokenize(exclude,',') return 
                                              normalize-space($x)))" />
     </xs:complexType>
  
     <xs:complexType name="Fruits3">
       <xs:sequence>
         <xs:element name="fruit" type="xs:string" />
         <xs:element name="exclude" type="xs:string" />
       </xs:sequence>
       <xs:assert test="not(fruit = (for $x in tokenize(exclude,',') return 
                                (string-join(tokenize($x,' '),''))))" />
     </xs:complexType>
  
  </xs:schema>

A sample XML instance document [2], which we'll validate with the above schema, follows:
  <Example>
    <fruit>apple</fruit>
    <exclude>cherry,guava</exclude>
  </Example>

As stated in the original requirements above, we want the word in the "fruit" element not to be any of the words from the comma-separated list in the "exclude" element. (Note that the assertions here compare the whole "fruit" value against each word in the list.)

In the XSD schema [1] above, the complex type "Fruits1" successfully validates the XML instance document [2] above.

The complex type "Fruits2" can handle an exclude list in which there may be white-space before and after the comma separator. For example, the list "cherry, guava" (please note the extra white-space after the comma) would be considered an appropriate exclusion list for this example. The type "Fruits1" does not handle this list variant correctly, because it would compare the "fruit" value against the token " guava", leading space included.
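The difference between the two types shows up with an instance like the following. This is only a sketch, and it assumes the "Example" element is declared first with type "Fruits1" (as in the schema [1]) and then with type "Fruits2":

```xml
<!-- The fruit is one of the excluded words, and the list has a space after the comma -->
<Example>
  <fruit>guava</fruit>
  <exclude>cherry, guava</exclude>
</Example>
```

With "Fruits1", tokenize(exclude,',') yields "cherry" and " guava", neither of which equals "guava", so the assertion should hold and the document be accepted even though the fruit is on the list. With "Fruits2", normalize-space trims each token, the comparison finds "guava", and the assertion should fail as intended.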

And the complex type "Fruits3" can handle an exclude list of the kind "cherry, g u a v a" (i.e., there could be white-space characters within a word), since it removes the spaces from each token before comparing. This is a figment of my imagination :). But such lexical constraints could certainly occur in instance documents.
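For the single-element document at the start of this post, where the excluded words are fixed rather than carried in the instance, the same idea can be applied to a simple type. The following is a minimal, untested sketch: it assumes the XSD 1.1 "assertion" facet, in which the variable $value holds the value being validated, and the type name "FruitName" is made up for illustration.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

   <xs:element name="fruit" type="FruitName" />

   <!-- "FruitName" is a hypothetical name, used only for this sketch -->
   <xs:simpleType name="FruitName">
     <xs:restriction base="xs:string">
       <!-- $value is the string being validated; the facet fails when it
            equals any word in the literal sequence -->
       <xs:assertion test="not($value = ('cherry', 'guava'))" />
     </xs:restriction>
   </xs:simpleType>

</xs:schema>
```

With this type, <fruit>apple</fruit> is expected to be valid and <fruit>guava</fruit> invalid. As with the types above, this compares the whole value, so a value such as "guava juice" would still be accepted.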

PS: All the examples in this post were tested with Xerces-J.

I hope this post is useful.

4 comments:

anaran said...

Hello Mukul,
I find your blog very informative.
After spending quite some time I finally got XSLT 2.0 support working in my eclipse galileo installation, using saxonb9-1-0-8j.
Could you please give eclipse users like me some hints how they could run your XSD1.1 example in eclipse?
The xerces-j link in your blog seems to point to the dated 2.9.1 release.
I suppose that could not validate your XSD1.1 examples?
Thanks for any pointers!
Adrian

Mukul Gandhi said...

Thanks for your interest in Xerces-J & XSD 1.1.

Xerces-J is expected to have a 2.10.0 release in the near future, at which point you could work with XSD 1.1 support in an official Xerces release from Apache.

In the meantime, building the Xerces-J JARs from SVN is the only way to work with the Xerces XSD 1.1 implementation (the SVN link is https://svn.apache.org/repos/asf/xerces/java/branches/xml-schema-1.1-dev/ -- also reachable from http://xerces.apache.org/xerces2-j/source-repository.html).

I'm not sure, though, what you actually want to do with the Xerces XSD 1.1 implementation in Eclipse. After you build the Xerces JARs from SVN, you could use them as we usually use JARs in Eclipse projects (say, having them on the build classpath and so on).

anaran said...

Thanks, Mukul!
I was able to build from
xml-schema-1.1-dev
sources and schema- as well as instance validation works fine for me on xsd1.1 content.
How could I force the equivalent of -xsd11 validation inside eclipse using WTP XML and XSD editors?
I already have the newly built jars on my CLASSPATH.

To answer your question:
I am getting close to full validation of co-constrained customer data.
This will make the formerly hand-written rules executable. I might implement trivial fixes via stylesheet transformations soon.

Regards,
Adrian

Mukul Gandhi said...

I think the Eclipse XSD editor currently supports only the XSD 1.0 level. So we can't get XSD 1.1 editor intellisense support in Eclipse at the moment (if that's what you are looking for).

Perhaps you could raise this query on an Eclipse forum.