Monday, May 30, 2022

XML Schema: identity constraints essentials and best practices

In this blog post, I'll attempt to describe best practices for using XML Schema (XSD) identity constraints. I'm going to compare the XSD identity constraint instructions xs:unique and xs:key, and describe when to use each of them.

The XSD xs:key construct serves the same purpose in XML as a primary key does in an RDBMS, whereas xs:unique is a more general construct for enforcing unique values within a set of XML data values. xs:key also enforces unique values within such a set, but unlike xs:key, xs:unique permits some of the values in an XML data set to be absent (i.e., logically speaking, to be null values).

Please consider the following XML Schema validation example.

XML Schema document:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xs:element name="catalog" type="CatalogType">
      <xs:unique name="prodNumKey">
         <xs:selector xpath="*/product"/>
         <xs:field xpath="number"/>
      </xs:unique>
   </xs:element>
   <xs:complexType name="CatalogType">
      <xs:sequence>
         <xs:element name="department" maxOccurs="unbounded">
            <xs:complexType>
               <xs:sequence>
                  <xs:element name="product" maxOccurs="unbounded">
                     <xs:complexType>
                        <xs:sequence>
                           <xs:element name="number" type="xs:positiveInteger" minOccurs="0"/>
                           <xs:element name="name" type="xs:string"/>
                           <xs:element name="price">
                              <xs:complexType>
                                 <xs:simpleContent>
                                    <xs:extension base="xs:decimal">
                                       <xs:attribute name="currency" type="xs:string"/>
                                    </xs:extension>
                                 </xs:simpleContent>
                              </xs:complexType>
                           </xs:element>
                        </xs:sequence>
                     </xs:complexType>
                  </xs:element>
               </xs:sequence>
               <xs:attribute name="number" type="xs:positiveInteger"/>
            </xs:complexType>
         </xs:element>
      </xs:sequence>
   </xs:complexType>
</xs:schema>

One valid XML instance document for the above XSD schema is the following:

<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>

Please note the following about the above XML Schema validation example:

1) In the XML Schema document, the "number" child of "product" is declared as follows:

<xs:element name="number" type="xs:positiveInteger" minOccurs="0"/>

(i.e., with minOccurs="0", meaning that this element is optional in a corresponding XML instance document)

2) The "catalog" element has the following xs:unique definition bound to it:

<xs:unique name="prodNumKey">
   <xs:selector xpath="*/product"/>
   <xs:field xpath="number"/>
</xs:unique>

Taken together, these facts mean that "number" is not intended to function as the primary key of the "product" data set, because a primary key value has to be present in every record of the data set, and the XML Schema document allows "number" to be absent from some "product" elements. However, for those "number" elements that are present in an XML instance document, the values have to be unique. This is why the instance document above is valid even though its "Ten-Gallon Hat" product has no "number".
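To illustrate the rule, here is a hypothetical instance document (not taken from the book; the second department, "022", is made up for illustration) that the schema above should reject. The constraint is declared on "catalog", and its selector "*/product" reaches the products of every department, so uniqueness applies across the whole catalog rather than per department: the two products below share the number 557 even though they belong to different departments. By contrast, any number of "product" elements without a "number" child would be accepted, since xs:unique only compares the values that are actually present.

```xml
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
  </department>
  <!-- hypothetical second department, added for illustration -->
  <department number="022">
    <product>
      <!-- duplicates 557 above: violates the xs:unique "prodNumKey" -->
      <number>557</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>
```

The document is structurally valid against CatalogType; it is only the identity constraint that fails.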

We've discussed the role of the XSD xs:unique instruction in the paragraphs above.


Now, as stated earlier in this post, xs:key is the XSD counterpart of an RDBMS primary key. So how do we enforce primary-key behavior within an XML Schema document?

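In the context of the above example, this can be done simply by changing the "number" element declaration to the following (i.e., without minOccurs="0", so that "number" becomes a required child of "product"; strictly speaking, xs:key on its own already rejects a "product" that lacks a "number", but removing minOccurs="0" makes the content model agree with the key):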

<xs:element name="number" type="xs:positiveInteger"/>

and by writing the "catalog" element declaration as follows (i.e., we now use xs:key instead of xs:unique):

<xs:element name="catalog" type="CatalogType">
   <xs:key name="prodNumKey">
      <xs:selector xpath="*/product"/>
      <xs:field xpath="number"/>
   </xs:key>
</xs:element>

Together, these changes mean that every "product" element must have a "number" child, and all "number" values within an XML instance document have to be unique. These two properties make the "number" element the primary key of the "product" data set.
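With the xs:key version of the schema, the instance document shown earlier no longer validates, because its "Ten-Gallon Hat" product has no "number". A minimal sketch of an instance that does satisfy the revised schema simply gives every product a unique number; the value 563 below is hypothetical, chosen only for illustration.

```xml
<catalog>
  <department number="021">
    <product>
      <number>557</number>
      <name>Short-Sleeved Linen Blouse</name>
      <price currency="USD">29.99</price>
    </product>
    <product>
      <!-- hypothetical number: required now that "number" is the key -->
      <number>563</number>
      <name>Ten-Gallon Hat</name>
      <price currency="USD">69.99</price>
    </product>
    <product>
      <number>443</number>
      <name>Deluxe Golf Umbrella</name>
      <price currency="USD">49.99</price>
    </product>
  </department>
</catalog>
```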


The xs:unique and xs:key features described in this blog post are supported by both XML Schema 1.0 and XML Schema 1.1.


Acknowledgements: The XML Schema validation example in this blog post is borrowed from Priscilla Walmsley's excellent book "Definitive XML Schema, 2nd edition".