I think one of the things that can get quite difficult to express in XML Schema 1.0 is a negative word list, i.e. a set of words that an element must not contain.
For example, suppose we have this simple XML document:
<fruit>apple</fruit>
And we want the XML element "fruit" not to contain, say, the words "cherry" or "guava". This looks like a pretty straightforward regex use case, but unfortunately it gets quite cumbersome to express with the regular-expression syntax available in XSD 1.0.
My quick attempt to express this with XSD 1.0 was something like the following:
<xs:pattern value="^(cherry|guava)" />
But unfortunately, the above pattern facet, and quite a few similar regexes, can't accomplish this seemingly common use case easily. XSD patterns always match the whole value and have no lookahead, and "^" negates only inside a character class such as [^abc], so outside one it does not mean "not". (I think this is doable with XSD 1.0 regexes, but it would certainly be tedious to arrive at the right pattern. Regex experts could of course do this easily, but not me at this moment!)
So next I tried to express these validation constraints with XSD 1.1 assertions. Here's a sample XSD 1.1 schema [1] that uses assertions to solve this and a few similar use cases:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Example" type="Fruits1" />

  <xs:complexType name="Fruits1">
    <xs:sequence>
      <xs:element name="fruit" type="xs:string" />
      <xs:element name="exclude" type="xs:string" />
    </xs:sequence>
    <xs:assert test="not(fruit = tokenize(exclude,','))" />
  </xs:complexType>

  <xs:complexType name="Fruits2">
    <xs:sequence>
      <xs:element name="fruit" type="xs:string" />
      <xs:element name="exclude" type="xs:string" />
    </xs:sequence>
    <xs:assert test="not(fruit = (for $x in tokenize(exclude,',') return normalize-space($x)))" />
  </xs:complexType>

  <xs:complexType name="Fruits3">
    <xs:sequence>
      <xs:element name="fruit" type="xs:string" />
      <xs:element name="exclude" type="xs:string" />
    </xs:sequence>
    <xs:assert test="not(fruit = (for $x in tokenize(exclude,',') return (string-join(tokenize($x,' '),''))))" />
  </xs:complexType>

</xs:schema>
A sample XML instance document [2], which we'll validate with the above schema, is the following:
<Example>
  <fruit>apple</fruit>
  <exclude>cherry,guava</exclude>
</Example>
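As stated in the original requirement above, the word in the "fruit" element must not be any of the words in the comma-separated list in the "exclude" element. The assertions rely on the XPath 2.0 general comparison "=", which is true as soon as "fruit" equals any one item of the tokenized sequence; wrapping it in not() therefore rejects every listed word.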
In the above XSD schema [1], the complex type "Fruits1" successfully validates the above XML instance document [2], since "apple" is not on the exclude list.
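To see the assertion reject something, here is a hypothetical variant of instance [2] in which the fruit is one of the excluded words. This is only an untested illustration, but I'd expect "Fruits1" to report an assertion failure for it, since "guava" equals the second token of the exclude list.

```xml
<!-- Hypothetical failing variant of instance [2]: "guava" is on the exclude list -->
<Example>
  <fruit>guava</fruit>
  <exclude>cherry,guava</exclude>
</Example>
```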
The complex type "Fruits2" can handle an exclude list that has white-space before or after the comma separator. For example, the list "cherry, guava" (please note the extra white-space after the comma) is treated as a proper exclusion list by this type. The type "Fruits1" cannot handle this variant: its second token would be " guava", with the leading space kept, so a "fruit" value of "guava" would not match it and would slip through.
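A minimal sketch of how this could be tried out, assuming the element declaration in schema [1] is changed to point at "Fruits2" (the schema as given only declares "Example" with type "Fruits1"). The instance below is hypothetical and untested; it should be rejected, because the token " guava" is normalized to "guava" before the comparison.

```xml
<!-- Assumed change in schema [1]: <xs:element name="Example" type="Fruits2" /> -->
<!-- Hypothetical instance with a space after the comma; expected to fail the Fruits2 assertion -->
<Example>
  <fruit>guava</fruit>
  <exclude>cherry, guava</exclude>
</Example>
```

With the element left at type "Fruits1", the same instance would be accepted, since " guava" (with its leading space) is not equal to "guava".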
And the complex type "Fruits3" can handle an exclude list of the kind "cherry, g u a v a" (i.e. there could be space characters within a word): each token is split on spaces and joined back together, so " g u a v a" is compared as "guava". This is a figment of my imagination :), but such lexical constraints could certainly turn up in instance documents.
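Coming back to the fixed word list from the beginning of this post: when the forbidden words are known at schema-design time, an XSD 1.1 assertion facet on a simple type may be enough, without a separate "exclude" element. The following is only a sketch; the type name "AllowedFruit" is made up for illustration, and this variant is untested.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="fruit" type="AllowedFruit" />

  <!-- "AllowedFruit" is a hypothetical name; $value is the value being validated -->
  <xs:simpleType name="AllowedFruit">
    <xs:restriction base="xs:string">
      <xs:assertion test="not($value = ('cherry', 'guava'))" />
    </xs:restriction>
  </xs:simpleType>

</xs:schema>
```

This rejects a "fruit" whose whole value is "cherry" or "guava". If "contain" is meant as a substring test instead, the test could be written as not(contains($value, 'cherry') or contains($value, 'guava')).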
PS: All the examples in this post were tested with Xerces-J.
I hope this post is useful.