SQL is a Navigationless Language and Should Not Have to Use Navigation to Support XML
January 6th, 2010XML Background
Even though XQuery was supposed to be designed from the top down to be an XML processor, a number of politically driven decisions were made early in the design that were bad for XML and its SQL support of XML.
SQL vendor designers wishing to move the industry past SQL to the new XQuery product needed a migration path from SQL to XQuery to speed up and ease this process. The solution used was to add the relational processing capability to XQuery. This can be seen today with the relational standard Inner Join operation being placed in XQuery as its default join operation.
The decision to support the Inner Join in XQuery killed any chance of XQuery supporting XML’s natural hierarchical processing potential in a dedicated and intelligent manner. With XQuery supporting relational and hierarchical processing at the same time, it is impossible to support hierarchical processing intelligently and accurately because it can not be assumed the hierarchical processing is being performed. In addition, the default Inner Join in XQuery is a hierarchical structure killer, one use and any hierarchical structure is destroyed.
The SQL/XML specification design and capabilities were influenced and designed following XQuery capabilities using very XML centric functions and Xpath procedural navigation. Even though XQuery advocates claim that XQuery is very similar to SQL, XQuery XML support and procedural navigation is totally foreign to SQL users especially since SQL is known as a navigationless language and SQL users are probably not that familiar with XML.
XML was originally designed to support semistructured data with its variable format suitable for containing markuped unstructured data. Use of XML containing database structured data such as SQL’s relational data was an afterthought. Unfortunately, SQL/XML support in SQL carries with it the responsibility to support semistructured data which requires navigation because of its variable complexity. SQL’s standard structured data is fixed and is unambiguous from a querying point of view which can allow the nonprocedural and navigationless processing of structured XML data used when semistructured data is not present or being processed. Semistructured data processing is fuzzy and inexact while structured data requires exact and precise processing. The requirement of also supporting semistructured data and its required navigation is not necessary in SQL. XQuery can handle semistructured data, and in this way SQL can remain dedicated to processing structured data which can include the transparent and navigationless structured data in XML.
XML Processing
XML processing today is basically limited to single linear path processing. This is due to navigational complexity of navigating multiple paths at the same time and to the little known fact that multipath processing requires a complex processing that requires Lowest Common Ancestor (LCA) logic. This logic is not automatically supported in current XQuery and SQL/XML processing.
Another requirement to support LCA processing is to operate hierarchical structure-aware, with knowledge of the structure being utilized in the processing. This is not possible today because XML processing is not limited to supporting only hierarchical structures. This is an important capability because hierarchical structures can produce many times their specific data value with multipath queries as they do with single path linear queries. LCA support in XQuery is being academically researched today using LCA functions on top of XQuery. Unfortunately this is not being performed transparently or automatically, it requires users’ knowledge of LCA logic to use, and is limited in scope.
SQL Hierarchical Processing
A recent discovery that has the potential to breathe new life into SQL is that standard ANSI SQL can support full multipath hierarchical processing using standard SQL-92 with its addition of the LEFT Outer Join. This is an inherent capability in ANSI SQL that has also solved the relational/XML data integration problem with the capability to naturally raise SQL processing to a hierarchical processing level.
Currently there is no standard for hierarchical processing. Every vendor or user is using their own hierarchical processing procedures. SQL’s inherent hierarchical processing is a valid subset of relational processing that is operating hierarchically correct. SQL’s LEFT Outer join syntax can model full multipath hierarchical structures and its associated LEFT Outer join semantics defines its hierarchical operations. The ANSI SQL engine can process this hierarchical syntax and semantics input directly to perform hierarchical processing and to automatically produce correct rowset hierarchical results. A further SQL discovery identified LCA processing occurring naturally in the relational Cartesian product used in the joining of tables or structures that model the hierarchical structures.
Extending SQL Hierarchical Processing to XML
Limiting SQL processing to hierarchical processing means that the hierarchical structures defined and dynamically modified during processing by joining and transforming them can be tracked. This means that hierarchical structure-aware processing can be performed. This allows the new capability of dynamically formatted XML output to be possible. This opens the door to many new capabilities such as dynamic XML publishing of reports involving structured data.
Unlike XQuery, SQL’s Dynamic SELECT List limits the processing to only the parts of the structure necessary based dynamically on the output specified in the SELECT List. This means that hierarchical SQL views can be global with no additional overhead for using views larger than necessary for the active query. This keeps the length of query invocations terse enabling simple but complex queries and allows a global view to handle many different and dynamic query uses, reusing data and limiting the number of different views needed. This also allows SQL global queries to be easily specifies using the “SELECT *” capability where entire structures can be easily modified or filtered. This is a capability not previously feasible with XML structures. XQuery and SQL/XML functions on the other hand have their output data processing hard coded in their procedural code making them fairly static and less flexible.
The navigationless XML processing allows nontechnical users to finally have access to XML. Users also do not have to have knowledge of the hierarchical structures being queried. This means that anyone can specify a powerful multipath query without even knowing it is a multipath query using SQL’s powerful yet simple SELECT List.
Is this SQL Transparent XML Hierarchical Processing Possible?
This very advanced next generation of SQL supported structured XML multipath hierarchical processing is possible and has already been accomplished today. My small company has been researching this naturally available technology and has built an online interactive ANSI SQL Transparent XML Hierarchical Processor prototype available for online interactive testing at www.adatinc.com/demo.html. The demo contains complete documentation for this technology and its demo use.
Our latest publications on this technology can be found at http://www.adatinc.com/ourpublications.html.