Sunday, May 3, 2020

Online XML Schema validation service

In some of my spare time, I've developed and deployed an 'online XML Schema validation service' that uses Apache Xerces-J as the XML Schema (XSD) processor at the back end. The service is located at http://www.softwarebytes.org/xmlvalidation/. An HTTPS version is available at https://www.softwarebytes.org/xmlvalidation/.

This service also provides REST APIs that can be invoked from any program able to issue HTTP POST requests. The service's site provides downloadable examples, written in Python and C#, that use these REST APIs. The API responses can be in one of the following formats: XML, JSON or plain text (the response format can be set when issuing the HTTP request).

Interestingly, I've found that these REST APIs can be invoked directly with a tool like curl, using its platform binary. On modern operating systems (e.g. Windows 10), curl comes pre-installed. Following are the responses received on the command line for a few curl requests that I issued to these REST APIs:

curl --form xmlFile=@two_inp_files/x1_valid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=xml https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<validationReport>
   <xsdVer>1.1</xsdVer>
   <success>
      <message>XML document is assessed as valid with the XSD document(s) that were provided.</message>
   </success>
</validationReport>

curl --form xmlFile=@two_inp_files/x1_invalid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=xml https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<validationReport>
   <xsdVer>1.1</xsdVer>
   <failure>
      <message>XML document is assessed as invalid with the XSD document(s) that were provided.</message>
      <details>
         <detail_1>[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation ('if (@isB = true()) then b else not(b)') for element 'X' on schema type '#AnonType_X' did not succeed.</detail_1>
      </details>
   </failure>
</validationReport>

curl --form xmlFile=@two_inp_files/x1_valid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=json https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

{
    "xsdVer": "1.1",
    "success": {"message": "XML document is assessed as valid with the XSD document(s) that were provided."}
}

curl --form xmlFile=@two_inp_files/x1_invalid_1.xml --form xsdFile1=@two_inp_files/x1.xsd --form ver=1.1 --form xsd11CtaFullXPath=no --form responseType=json https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

{
    "xsdVer": "1.1",
    "failure": {
        "details": ["[Error] x1_invalid_1.xml:3:5:cvc-assertion: Assertion evaluation ('if (@isB = true()) then b else not(b)') for element 'X' on schema type '#AnonType_X' did not succeed."],
        "message": "XML document is assessed as invalid with the XSD document(s) that were provided."
    }
}

curl --form xmlFile=@input_small.xml --form xsdFile1=@assert_2.xsd --form ver=1.1 --form xsd11CtaFullXPath=no https://www.softwarebytes.org/xmlvalidation/api/xsValidationHandler

You selected XSD 1.1 validation.
XML document is assessed as valid with the XSD document(s) you have provided.

(Please note that since the last curl request above doesn't specify the 'responseType' form field, the server API returns a response formatted as plain text, i.e. plain text is the default response format of this API.)
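
A program that calls the API with responseType=json needs to tell the 'success' report apart from the 'failure' report. Below is a minimal Python sketch of that step only, assuming the JSON shapes shown above. The function name summarize_report and the sample body are my own illustrations, and the error detail in the sample is shortened. The sketch does not issue any HTTP request.

```python
import json


def summarize_report(body):
    """Return (is_valid, details) for a JSON validation report.

    Assumes the two response shapes shown above: a top-level "success"
    object, or a top-level "failure" object with a "details" list.
    """
    report = json.loads(body)
    if "success" in report:
        return True, []
    failure = report.get("failure", {})
    return False, list(failure.get("details", []))


# Illustrative response body; the detail text is shortened for this sketch.
sample_body = json.dumps({
    "xsdVer": "1.1",
    "failure": {
        "details": ["[Error] x1_invalid_1.xml:3:5:cvc-assertion: (shortened)"],
        "message": "XML document is assessed as invalid with the XSD "
                   "document(s) that were provided.",
    },
})

is_valid, details = summarize_report(sample_body)
print(is_valid, len(details))
```

The body string would normally come from whichever HTTP client library the calling program uses.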

This 'online XML Schema validation service' supports both the 1.0 and 1.1 versions of the XML Schema language.

Wednesday, April 15, 2020

RDBMS, Message queue, Message broker, ESB, HTTP (technology comparison)

The software technologies mentioned in the title of this blog post (i.e. RDBMS, message queue, message broker, ESB [enterprise service bus] and HTTP) are the primary 'vendor agnostic' application integration technologies (backbones). In this blog post I wish to describe the characteristics of these application integration technologies, the differences between them, and the use cases for which each of them should be used. Below are the details:

1) RDBMS (relational database management system): This is the oldest, and perhaps the most robust, of the five application integration technologies mentioned. It is based on the notion that "data" generated and consumed by a software application persists for a long time. With an RDBMS, the fundamental storage primitives at the application layer are tables. Most RDBMS products store their tables in binary files on the underlying platform's file system. The application code accesses the RDBMS tables (and possibly other things like views, functions, stored procedures and triggers) via a SQL interface (e.g. JDBC in Java, and ADO.NET in .NET).

2) Message queue: A message queue is an application storage type where messages (i.e. data) are pushed at the end of the queue and retrieved from the front of the queue. The message sending and receiving applications are usually different, but they may as well be the same. A message remains in the queue until it is pulled by an application, which happens 'at most once'. This differs essentially from the RDBMS style, where data remains in the tables no matter how many times it is read by applications. This fundamental difference between an RDBMS and a message queue points to the kinds of use cases for which each of these technologies should be used.
For example, a search engine's data would be appropriate to store in RDBMS databases, whereas 'send and forget' or 'read and forget' kinds of messages would be appropriate to store in a message queue.
Message queues can be further classified into linear queues and publish-subscribe storage. Both are managed by an underlying queue manager. With a linear queue, after any one receiver application retrieves a message from the queue, no other application will find that message in the queue (since it's deleted after the first retrieval). With publish-subscribe storage, one or more subscribers can register for a particular class of message (typically published by one application). Upon publication of a message by a sender application, all subscribers for that message type receive the message, after which the message is deleted from the underlying queue(s).
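
The difference between a linear queue and publish-subscribe storage can be sketched in a few lines of Python. This is only an in-memory illustration, not how any particular queue manager is implemented. The class names LinearQueue and Topic are hypothetical.

```python
from collections import deque


class LinearQueue:
    """A message is delivered to at most one receiver, then it is gone."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)      # push at the end of the queue

    def receive(self):
        # Retrieve from the front; returns None when the queue is empty.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Every registered subscriber gets its own copy of each message."""

    def __init__(self):
        self._inboxes = {}

    def subscribe(self, name):
        self._inboxes[name] = deque()

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)

    def receive(self, name):
        inbox = self._inboxes[name]
        return inbox.popleft() if inbox else None


queue = LinearQueue()
queue.send("order-1")
first = queue.receive()      # the first receiver gets the message
second = queue.receive()     # a second receiver finds nothing

topic = Topic()
topic.subscribe("billing")
topic.subscribe("audit")
topic.publish("order-1")
copies = [topic.receive("billing"), topic.receive("audit")]
```

Here first holds the message and second is None, whereas both subscribers of the topic receive their own copy.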

3) Message broker: With this storage type, the underlying storage data structure is the same as for a message queue. Unlike a message queue, a message broker is thought of as central (multiple sender and receiver applications can connect to one central message broker). A message broker is a more sophisticated kind of storage than a message queue.
To establish a simple application integration design, I'd perhaps use a message queue (or multiple message queues for a more complex scenario) instead of a message broker.

4) ESB (enterprise service bus): The ESB storage and distribution type is somewhat like a message broker. But unlike a message broker, the message distribution backbone of an ESB is linear (a bus) rather than central. Just like a message queue and a message broker, the responsibility of an ESB is to allow multiple message senders to connect with multiple message receivers.

Unlike a message queue, both a message broker and an ESB allow selective routing and transformation of messages between applications (the routing and message transformation decisions take place at the message broker or on an ESB node).
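
To make 'selective routing and transformation' a little more concrete, here is a small Python sketch of a broker that decides, per message, which receivers get it and in what form. It is an in-memory toy under my own assumptions. The Broker class, its add_route method and the message fields are hypothetical and do not correspond to any real product's API.

```python
class Broker:
    """Toy broker: each route has a predicate, a transform and an inbox."""

    def __init__(self):
        self._routes = []

    def add_route(self, predicate, transform, inbox):
        self._routes.append((predicate, transform, inbox))

    def publish(self, message):
        # The routing and transformation decisions are made here, at the
        # broker, not in the sending or the receiving applications.
        for predicate, transform, inbox in self._routes:
            if predicate(message):
                inbox.append(transform(message))


billing_inbox, audit_inbox = [], []
broker = Broker()

# Billing only wants orders, reduced to the amount.
broker.add_route(lambda m: m["type"] == "order",
                 lambda m: {"amount": m["amount"]},
                 billing_inbox)

# Audit wants every message, but only its type in upper case.
broker.add_route(lambda m: True,
                 lambda m: m["type"].upper(),
                 audit_inbox)

broker.publish({"type": "order", "amount": 10})
broker.publish({"type": "ping"})
```

After the two publish calls, billing_inbox holds one transformed order and audit_inbox holds one entry per message.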

The following site provides nice documentation about how to build messaging applications: https://www.ibm.com/support/knowledgecenter/SSFKSJ_9.0.0/com.ibm.mq.dev.doc/q022830_.htm.

5) HTTP: As good as it was when it was introduced, the HTTP protocol, together with its implementations within web browsers and various programming libraries, is a very good means of establishing software application integration. An HTTP server can respond with information and can perform actions, as asked by HTTP clients. These days, HTTP servers can respond with information stored within the HTTP server itself, or they can interface with other back-end systems like databases, fetching information from them or modifying that information as asked by HTTP clients.

It is useful to know that the HTTP style of software application integration is inherently synchronous (i.e. application clients keep waiting at the point of issuing a request until a response is received), while all the other styles of application integration mentioned in this blog post are inherently asynchronous (i.e. application clients can start doing something else after issuing a request and before getting a response).

To build a messaging application, I'd perhaps start with a message queue for a simple design, and gradually move to a message broker or an ESB for more complex requirements.

Any comments about the subject matter of this blog post are welcome.

Saturday, March 21, 2020

Using XML Schema 1.1 <alternative> with Xerces-J

I wish to share a little information here about Apache Xerces-J's implementation of XML Schema (XSD) 1.1 'type alternatives'.

The XSD 1.1 specification defines a particular subset of the XPath 2.0 language that can be used as the value of the 'test' attribute of the XSD 1.1 <alternative> element. This XPath 2.0 subset is much smaller than the whole XPath 2.0 language. The specification of this smaller CTA XPath subset can be read at https://www.w3.org/TR/xmlschema11-1/#coss-ta (specifically, the section mentioning '2.1 It conforms to the following extended BNF', which has the grammar specification for the CTA XPath subset).

In fact, the XSD 1.1 specification allows XSD validators that implement the <alternative> element to support a bigger set of XPath 2.0 features (commonly the full XPath 2.0 language) than what is defined by the XSD 1.1 CTA (conditional type alternatives) XPath subset.

For XSD 1.1 CTAs, Xerces-J allows selecting, via a user option, either:

1) The smaller XPath subset (the default for Xerces-J), or

2) Full XPath 2.0.

How to select between the XPath subset and the full XPath 2.0 language for Xerces-J's CTA implementation is described here: https://xerces.apache.org/xerces2-j/faq-xs.html#faq-3.

I've analyzed a bit the nature of the XSD 1.1 CTA XPath subset language. Following are essentially the main CTA XPath subset patterns that may be used within XSD 1.1 schemas with the <alternative> element:

1) Using comparators (like >, <, =, !=, <=, >=):

Example CTA XPath expressions are the following:
@x = @y,
@x = 3,
@x != 3,
@x > @y

2) Using comparators with logical operators:

Example CTA XPath expressions are the following:
(@x = @y) or (@p = @q),
((1 = 2) or (5 = 6)) and (5 = 7),
(1 and 2) or (5 and 7)

3) Using XPath 2.0 'not' function:

An example XPath expression is the following:
(@x = @y) and not(@p)

Interestingly, the XSD 1.1 CTA XPath subset language allows using only the XPath 2.0 fn:not function, and no other XPath 2.0 built-in function. Constructor functions for all built-in XSD types (e.g. xs:integer(..), xs:boolean(..)) may also be used in CTA XPath subset expressions.

As per the XSD 1.1 specification, during XSD 1.1 CTA evaluation the XML element and attribute nodes are untyped (i.e. the XML nodes do not carry any type annotation coming from an XML schema). Therefore, in many cases, CTA XPath subset expressions used with Xerces-J need explicit casts (e.g. <xs:alternative test="(xs:integer(@x) = xs:integer(@y)) and fn:not(xs:boolean(@p))">, with the namespace prefix 'fn' bound to the URI 'http://www.w3.org/2005/xpath-functions'). For both the CTA XPath subset language and the full XPath 2.0 language for CTAs, the "fn" prefix on XPath built-in functions is optional. Typically, XML schema authors would not use the "fn" prefix for XPath built-in functions.
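
Why the explicit casts matter can be illustrated with a rough Python analogy. As I understand XPath 2.0, two untyped attribute values compared with '=' are compared as strings, so "5" and "05" are not equal until both are cast to xs:integer. The helper functions xs_integer and xs_boolean below are my own approximations of the XSD constructor functions, and the attribute values are hypothetical.

```python
import re


def xs_integer(lexical):
    """Approximate xs:integer(..): optional sign followed by digits."""
    value = lexical.strip()
    if not re.fullmatch(r"[+-]?[0-9]+", value):
        raise ValueError(f"not a valid xs:integer lexical form: {lexical!r}")
    return int(value)


def xs_boolean(lexical):
    """Approximate xs:boolean(..): accepts true, false, 1 and 0."""
    value = lexical.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean lexical form: {lexical!r}")


# Hypothetical attribute values of an element, held as untyped strings.
attrs = {"x": "5", "y": "05", "p": "0"}

# Analogue of '@x = @y' on untyped values: a string comparison.
untyped_equal = attrs["x"] == attrs["y"]

# Analogue of 'xs:integer(@x) = xs:integer(@y)': a numeric comparison.
typed_equal = xs_integer(attrs["x"]) == xs_integer(attrs["y"])

# Analogue of '(xs:integer(@x) = xs:integer(@y)) and not(xs:boolean(@p))'.
test_result = typed_equal and not xs_boolean(attrs["p"])
```

With these values the string comparison is false, while the comparison after casting is true.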

Tuesday, March 10, 2020

XML Schema 1.1 <assert> continued ...

This blog post is related to the XML Schema (XSD) use case that I discussed in my previous two blog posts. Consider the following XML Schema 1.1 document, which has an XSD <assert> element:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:sequence>
              <xs:element name="isSeqTwo" type="xs:boolean"/>
              <xs:choice>
                 <xs:sequence>
                    <xs:element name="a" type="xs:string"/>
                    <xs:element name="b" type="xs:string"/>
                 </xs:sequence>
                 <xs:sequence>
                    <xs:element name="p" type="xs:string"/>
                    <xs:element name="q" type="xs:string"/>
                 </xs:sequence>
                 <xs:sequence>
                    <xs:element name="x" type="xs:string"/>
                    <xs:element name="y" type="xs:string"/>
                 </xs:sequence>
              </xs:choice>
          </xs:sequence>
          <xs:assert test="if (isSeqTwo = true()) then p else not(p)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

The above schema document differs from the schema documents presented in my previous two blog posts in the following way:
The XML child content model of element "X" is a sequence of an element followed by a choice.

In the earlier two blog posts, the XML child content model of element "X" depends on the value of an attribute on element "X", which could be enforced using either an XSD 1.1 <assert> or an <alternative>.

A few XML instance documents that are valid or invalid according to the above XSD schema document are the following:

Valid,

<X>
    <isSeqTwo>0</isSeqTwo>
    <x>string1</x>
    <y>string2</y>
</X>

Valid,

<X>
    <isSeqTwo>1</isSeqTwo>
    <p>string1</p>
    <q>string2</q>
</X>

Invalid,

<X>
    <isSeqTwo>1</isSeqTwo>
    <x>string1</x>
    <y>string2</y>
</X>

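The XSD use case illustrated above is useful, and could only be accomplished using an XSD 1.1 <assert> element (the test of an <alternative> can refer only to the attributes of the element, not to its child elements such as 'isSeqTwo').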
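
The logic of the assertion 'if (isSeqTwo = true()) then p else not(p)' can be mimicked in Python, to check the three instance documents above by hand. This sketch evaluates only the assertion, not the <sequence>/<choice> content model, and it is no substitute for an XSD 1.1 validator. The function names are my own.

```python
import xml.etree.ElementTree as ET


def xs_boolean(lexical):
    """Approximate the xs:boolean lexical forms: true, false, 1 and 0."""
    value = lexical.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a valid xs:boolean lexical form: {lexical!r}")


def assertion_holds(xml_text):
    """Mimic: if (isSeqTwo = true()) then p else not(p)."""
    x = ET.fromstring(xml_text)
    is_seq_two = xs_boolean(x.findtext("isSeqTwo", default=""))
    has_p = x.find("p") is not None
    return has_p if is_seq_two else not has_p


doc_valid_1 = "<X><isSeqTwo>0</isSeqTwo><x>string1</x><y>string2</y></X>"
doc_valid_2 = "<X><isSeqTwo>1</isSeqTwo><p>string1</p><q>string2</q></X>"
doc_invalid = "<X><isSeqTwo>1</isSeqTwo><x>string1</x><y>string2</y></X>"

results = [assertion_holds(d) for d in (doc_valid_1, doc_valid_2, doc_invalid)]
```

The results list agrees with the Valid, Valid, Invalid labels given above.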

As a side discussion, to reaffirm this, I would like to cite the following rules from the XML Schema 1.1 structures specification, section 3.4.4.2 'Element Locally Valid (Complex Type)':
For an element information item E to be locally ·valid· with respect to a complex type definition T all of the following must be true:
1
2
3
...
6 E is ·valid· with respect to each of the assertions in T.{assertions} as per Assertion Satisfied (§3.13.4.1).

We can infer from the above rules of the XSD 1.1 spec that an XML instance element is valid according to an XSD complex type definition only if, in addition to satisfying the other complex type validation rules, it is valid with respect to each of the assertions present on that complex type.

Sunday, March 1, 2020

XML Schema 1.1 <alternative> use cases with <choice> and <attribute>

While using XML Schema (XSD) 1.1, many times when a use case can be solved with an XSD 1.1 <assert>, a solution using an XSD 1.1 <alternative> can be found as well (and vice versa). This is usually the case when the XML child content model of an element depends on the values of attributes of that same element. This is evident for the first example of my previous blog post. Given the same XML input examples as in that first example, the following XML Schema 1.1 document using <alternative> is also a possible solution:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:alternative test="xs:boolean(@isB) eq true()">
          <xs:complexType>
             <xs:sequence>
                <xs:element name="b" type="xs:string"/>
             </xs:sequence>
             <xs:attribute name="isB" type="xs:boolean" use="required"/>
          </xs:complexType>
       </xs:alternative>
       <xs:alternative>
          <xs:complexType>
             <xs:choice>
                <xs:element name="a" type="xs:string"/>
                <xs:element name="c" type="xs:string"/>
             </xs:choice>
             <xs:attribute name="isB" type="xs:boolean" use="required"/>
          </xs:complexType>
       </xs:alternative>
    </xs:element>

</xs:schema>

Then the question arises: for these same use cases, should we use an XSD 1.1 <assert> or an <alternative>? Below are the pros and cons, according to me:
1) An XSD 1.1 solution using <assert> has fewer lines of code than the one using <alternative>, which many would consider a benefit.
2) I personally prefer the XPath expression '@isB = true()' (within 'if (@isB = true()) then b else not(b)') of an <assert> over 'xs:boolean(@isB) eq true()' in an <alternative>. In the example involving <alternative>, the attribute node 'isB' has a type annotation of xs:untypedAtomic, which requires an explicit cast with xs:boolean(..). I tend to prefer XPath expressions that don't use explicit casts (since such XPath expressions look more schema aware).
3) One of the benefits I see of the solution using an XSD 1.1 <alternative> over <assert> is better error diagnostics in case of XML validation errors.
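
The way an <alternative> works (first select a type from the attribute value, then validate the children against the selected type) can be mimicked with a small Python sketch. It is only an approximation of the schema above for hand-checking instance documents, under my own simplifications. It is not how Xerces-J implements CTAs, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def select_allowed_children(is_b_value):
    """Mimic the type selection done by the two <alternative> elements."""
    if is_b_value in ("true", "1"):
        return {"b"}            # first alternative: a sequence with 'b'
    return {"a", "c"}           # default alternative: a choice of 'a' or 'c'


def is_valid(xml_text):
    x = ET.fromstring(xml_text)
    is_b_value = x.get("isB")
    # 'isB' is required and must be a valid xs:boolean lexical form.
    if is_b_value is None or is_b_value.strip() not in ("true", "false", "1", "0"):
        return False
    allowed = select_allowed_children(is_b_value.strip())
    children = [child.tag for child in x]
    return len(children) == 1 and children[0] in allowed


checks = [
    is_valid('<X isB="1"><b>some string</b></X>'),
    is_valid('<X isB="0"><a>some string</a></X>'),
    is_valid('<X isB="0"><b>some string</b></X>'),
    is_valid('<X isB="1"><a>some string</a></X>'),
]
```

Because the allowed children are known once the type is selected, a validator working this way can name the element it expected, which is one way to read the point about error diagnostics above.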

Saturday, February 15, 2020

XML Schema 1.1 <assert> use cases with <choice> and <attribute>

I've been thinking about what could be useful use cases for the XML Schema (XSD) 1.1 <assert> construct.

According to the XSD 1.1 structures specification, "assertion components constrain the existence and values of related XML elements and attributes".

One useful use case for XSD 1.1 <assert> is to constrain the standard behavior of the XSD 1.0 / 1.1 <choice> construct. I'll attempt to write something about this in this blog post.

Below is an XSD schema example using the <choice> construct, which is correct for both the 1.0 and 1.1 versions of the XSD language:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:element name="a" type="xs:string"/>
             <xs:element name="b" type="xs:string"/>
             <xs:element name="c" type="xs:string"/>
          </xs:choice>
       </xs:complexType>
    </xs:element>

</xs:schema>

The above schema document ensures that the following XML instance documents are valid:

<X>
    <a>some string</a>
</X>

,

<X>
    <b>some string</b>
</X>

,

<X>
    <c>some string</c>
</X>

(essentially showing that element 'X' must have exactly one of the elements 'a', 'b' or 'c' as a child element)

Let's see how the above XSD example can be made a little different using the XSD elements <attribute> and <assert>. Below is such a modified XSD document:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:element name="a" type="xs:string"/>
             <xs:element name="b" type="xs:string"/>
             <xs:element name="c" type="xs:string"/>
          </xs:choice>
          <xs:attribute name="isB" type="xs:boolean" use="required"/>
          <xs:assert test="if (@isB = true()) then b else not(b)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

The complete meaning of the above XSD document is the following:
1) The <choice> with the three <element> declarations within it specifies the same constraints as the earlier XSD document.
2) The schema additionally specifies a mandatory boolean-typed attribute named 'isB'.
3) The <assert> specifies that if the value of attribute 'isB' is true, then element 'b' must be present as a child of element 'X'. If the value of attribute 'isB' is false, then element 'X' cannot have element 'b' as its child, but either of the elements 'a' or 'c' would be a valid child of element 'X'.

The following XML instance documents are valid according to the above XSD document:

<X isB="1">
  <b>some string</b>
</X>

,

<X isB="0">
  <a>some string</a>
</X>

,

<X isB="0">
  <c>some string</c>
</X>

And, the following XML instance documents would be invalid according to the same XSD document:

<X isB="0">
  <b>some string</b>
</X>

,

<X isB="1">
  <a>some string</a>
</X>

,

<X isB="0">
  <d>some string</d>
</X>
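
The valid and invalid instance documents above can be cross-checked with a small Python sketch that evaluates the <choice> rule and the <assert> rule separately. It only mimics this one schema and is not an XSD validator. The function name check is my own, and the sketch assumes the 'isB' attribute is present with a valid boolean value.

```python
import xml.etree.ElementTree as ET


def check(xml_text):
    """Return (choice_ok, assert_ok) for an <X> instance document."""
    x = ET.fromstring(xml_text)
    children = [child.tag for child in x]

    # <choice>: exactly one child, which must be 'a', 'b' or 'c'.
    choice_ok = len(children) == 1 and children[0] in ("a", "b", "c")

    # <assert>: if (@isB = true()) then b else not(b)
    is_b = x.get("isB") in ("true", "1")
    has_b = x.find("b") is not None
    assert_ok = has_b if is_b else not has_b

    return choice_ok, assert_ok


outcomes = [
    check('<X isB="1"><b>some string</b></X>'),   # valid
    check('<X isB="0"><b>some string</b></X>'),   # assertion fails
    check('<X isB="1"><a>some string</a></X>'),   # assertion fails
    check('<X isB="0"><d>some string</d></X>'),   # choice fails
]
```

An instance is valid only when both values are true. The last document shows that the assertion alone is not enough, since 'd' satisfies 'not(b)' but violates the <choice>.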

Now let's consider another XSD example, where the schema document specifies a choice between three sequences. Below is such a schema document:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

    <xs:element name="X">
       <xs:complexType>
          <xs:choice>
             <xs:sequence>
                <xs:element name="a" type="xs:string"/>
                <xs:element name="b" type="xs:string"/>
             </xs:sequence>
             <xs:sequence>
                <xs:element name="p" type="xs:string"/>
                <xs:element name="q" type="xs:string"/>
             </xs:sequence>
             <xs:sequence>
                <xs:element name="x" type="xs:string"/>
                <xs:element name="y" type="xs:string"/>
             </xs:sequence>
          </xs:choice>
          <xs:attribute name="isSeqTwo" type="xs:boolean" use="required"/>
          <xs:assert test="if (@isSeqTwo = true()) then p else not(p)"/>
       </xs:complexType>
    </xs:element>

</xs:schema>

The complete meaning of the above XSD document is the following:
1) A <choice> is specified between three <sequence> elements. Therefore, element 'X' can have one of the following sequences as its children: {a, b}, {p, q} or {x, y}.
2) The schema additionally specifies a mandatory boolean-typed attribute named 'isSeqTwo'.
3) The <assert> specifies that if the value of attribute 'isSeqTwo' is true, then the sequence {p, q} must be present as the children of element 'X'. If the value of attribute 'isSeqTwo' is false, then element 'X' cannot have the sequence {p, q} as its children, but either of the sequences {a, b} or {x, y} would be valid children of element 'X'.

The following XML instance documents would be valid according to above mentioned XSD document:

<X isSeqTwo="1">
  <p>string1</p>
  <q>string2</q>
</X>

,

<X isSeqTwo="0">
  <a>string1</a>
  <b>string2</b>
</X>

,

<X isSeqTwo="0">
  <x>string1</x>
  <y>string2</y>
</X>

And, the following XML instance documents would be invalid according to the same XSD document:

<X isSeqTwo="0">
  <p>string1</p>
  <q>string2</q>
</X>

,

<X isSeqTwo="1">
  <a>string1</a>
  <b>string2</b>
</X>

,

<X isSeqTwo="0">
  <i>string1</i>
  <j>string2</j>
</X>


All the above examples, and any other XSD 1.0/1.1 constructs, may be used with any standards-compliant XSD validator (the examples using <assert> require a validator that supports XSD 1.1).

That's about all I wanted to say, about this topic.

Saturday, January 25, 2020

Apache Xerces-J 2.12.1 now available

On behalf of the Apache Xerces XML project team, I'm pleased to share that version 2.12.1 of Apache Xerces-J is now available. For more information about this new Xerces-J release, and to download Xerces-J, please visit the Xerces-J site.