Friday, September 12, 2008

Should the XML declaration (<?xml version="1.0" ...) be included in the XPath data model, or the DOM?

As we all know, most XML documents begin with the following declaration.

[1]
<?xml version="1.0" encoding="UTF-8"?>

But the details of this declaration are not included in a DOM tree, nor are they part of the XPath (2.0) data model. In my opinion, the XML parser uses the declaration to initialize certain behaviors (or to enable certain properties), such as selecting the character encoding used to decode the document.

I just had a weird thought: should we not include this information in the XPath data model, perhaps as properties of the document node, and, in the case of DOM, in the DOM tree? If the declaration is not present in the XML document, the relevant properties could be empty sequences.

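The information from the XML declaration is available in the XML Infoset [2] (as the version, character encoding scheme, and standalone properties of the document information item), but it is not included in the DOM or the XPath data model.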

Perhaps the XML declaration information is not useful to end-user applications and is useful only to the XML parser.

[2] http://www.w3.org/TR/xml-infoset/

Michael Glavassevich, on the xml-dev list, corrected me about the DOM:

Information from the XML declaration is already stored in the DOM [3][4][5] (since DOM Level 3). Within the API the values have an effect on serialization, in-memory well-formedness checking and in-memory validation.

[3] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-version
[4] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-encoding
[5] http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/core.html#Document3-standalone
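
To make the correction concrete, here is a minimal sketch of reading these DOM Level 3 attributes through the standard Java binding (org.w3c.dom.Document, parsed with JAXP). The class name XmlDeclarationDemo and the helper describe are hypothetical names chosen for illustration; the values noted for a document without a declaration follow the defaults given in the DOM Level 3 Core specification, and a particular parser could in principle behave differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlDeclarationDemo {

    // Hypothetical helper: parses the given XML text and reports the three
    // XML declaration values that DOM Level 3 exposes on the Document node.
    static String describe(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return "version=" + doc.getXmlVersion()
                    + ", encoding=" + doc.getXmlEncoding()
                    + ", standalone=" + doc.getXmlStandalone();
        } catch (Exception e) {
            // Malformed input or a parser configuration problem.
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Document with an explicit XML declaration.
        System.out.println(describe(
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><root/>"));

        // Document without a declaration: per DOM Level 3, the encoding is
        // null, the version defaults to "1.0", and standalone is false.
        System.out.println(describe("<root/>"));
    }
}
```

Note that a missing declaration surfaces here as null or a default value, which is the DOM's counterpart to the "empty sequence" idea suggested above for the XPath data model.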
