Tuesday, December 28, 2010

Schema based XML compare

David A. Lee (creator of XMLSH, a command-line shell for XML) started an interesting discussion a while ago on the XML-DEV mailing list about how to do XML Schema-aware comparison of XML documents. The whole discussion thread can be read here. Michael Kay suggested using the XPath 2.0 function deep-equal for this kind of use case; the input document trees must first be validated against a schema, so that the function can do a type-aware comparison. Following Michael's idea, I experimented with this approach using IBM's XPath 2.0 engine (which is XML Schema aware and is a component of the WebSphere Application Server Feature Pack for XML). For interested readers, here is a minimal Java program illustrating the approach.
package com.ibm.xpath2;

import javax.xml.namespace.QName;
import javax.xml.transform.stream.StreamSource;

import com.ibm.xml.xapi.XDynamicContext;
import com.ibm.xml.xapi.XFactory;
import com.ibm.xml.xapi.XPathExecutable;
import com.ibm.xml.xapi.XSequenceCursor;
import com.ibm.xml.xapi.XSequenceType;
import com.ibm.xml.xapi.XStaticContext;

public class XMLCompare {

    public static void main(String[] args) throws Exception {
        String dataDir = System.getProperty("dataDir.path");
  
        XFactory factory = XFactory.newInstance();
        factory.setValidating(XFactory.FULL_VALIDATION);
        factory.registerSchema(new StreamSource(dataDir + "/test.xsd"));
        
        XStaticContext staticContext = factory.newStaticContext();
        staticContext.declareVariable(new QName("doc1"),
                factory.getSequenceTypeFactory().documentNode(XSequenceType.OccurrenceIndicator.ONE));
        staticContext.declareVariable(new QName("doc2"),
                factory.getSequenceTypeFactory().documentNode(XSequenceType.OccurrenceIndicator.ONE));
        XDynamicContext dynamicContext = factory.newDynamicContext();
        dynamicContext.bind(new QName("doc1"), new StreamSource(dataDir + "/test1.xml"));
        dynamicContext.bind(new QName("doc2"), new StreamSource(dataDir + "/test2.xml"));
                
        XPathExecutable executable = factory.prepareXPath("deep-equal($doc1, $doc2)", staticContext);
        XSequenceCursor result = executable.execute(dynamicContext);
        if (result.exportAsList().get(0).getBooleanValue()) {
           System.out.println("deep-equal == true");
        }
        else {
           System.out.println("deep-equal == false");
        }
    }
} 

Following are the XML and XML Schema documents used for the above example.

test1.xml
<test>10.00</test>

test2.xml
<test>10</test>

test.xsd
<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <element name="test" type="double" />
</schema>
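
Since the type-aware comparison depends on the documents being valid against this schema, it can help to check validity on its own first. The following is a minimal sketch using the JDK's standard javax.xml.validation API rather than the IBM API used in the program above; the class name SchemaCheckSketch and the helper isValid are made up for illustration, and the schema and documents are inlined as strings so the example is self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original program.
class SchemaCheckSketch {

    // Same schema as test.xsd, inlined as a string.
    static final String XSD =
          "<schema xmlns=\"http://www.w3.org/2001/XMLSchema\">"
        + "<element name=\"test\" type=\"double\"/>"
        + "</schema>";

    // Returns true if the XML text is valid against the schema text.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(XSD, "<test>10.00</test>")); // content of test1.xml
        System.out.println(isValid(XSD, "<test>10</test>"));    // content of test2.xml
    }
}
```

Both sample documents are valid here, because 10.00 and 10 are both legal lexical forms of xs:double.")) throw new AssertionError("10.00 should be valid");
if (!SchemaCheckSketch.isValid(SchemaCheckSketch.XSD, "<test>10</test>")) throw new AssertionError("10 should be valid");
if (SchemaCheckSketch.isValid(SchemaCheckSketch.XSD, "<test>abc</test>")) throw new AssertionError("abc should be invalid");
</test>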

In the example above, if the schema type of the element node "test" is xs:double, both XML documents are reported as deep-equal: the element nodes are annotated with the schema type xs:double, the deep-equal function performs a type-aware comparison, and 10 and 10.00 are the same xs:double value. If the schema type of "test" were instead xs:string, the two documents would be reported as not deep-equal.
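
The difference between the two cases can be illustrated in plain Java, without any XPath engine. This is only a rough analogy, not what deep-equal does internally: the class TypedCompareSketch and its two methods are made up for illustration, and Java's Double.parseDouble only approximates the lexical rules of xs:double (the two differ, for example, in how infinity is written).

```java
// Hypothetical illustration class, not part of the original program.
class TypedCompareSketch {

    // Compare two lexical values as doubles, roughly like the xs:double case.
    static boolean equalAsDouble(String a, String b) {
        return Double.parseDouble(a.trim()) == Double.parseDouble(b.trim());
    }

    // Compare the same lexical values as plain strings, like the xs:string case.
    static boolean equalAsString(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String v1 = "10.00"; // content of <test> in test1.xml
        String v2 = "10";    // content of <test> in test2.xml

        System.out.println("as double: " + equalAsDouble(v1, v2)); // prints "as double: true"
        System.out.println("as string: " + equalAsString(v1, v2)); // prints "as string: false"
    }
}
```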

I hope that this post is useful.
