ANSI SQL Transparent XML Hierarchical Processing Available in Near Future
Occasionally I produce a document or correspondence that I feel is also useful as a blog post. This is such a case. It demonstrates what will be possible in ANSI SQL structured XML data processing in the near future. A major problem I have noticed with XML processors today is that they try to process two types of XML together without differentiating between them. The two types are markup XML (semistructured, document) and database XML (structured, data). When the two are separated, each can be processed better in isolation; when they are processed the same way, both suffer. SQL processes structured data, so it should assume it is processing only structured XML data. This assumption enables new XML capabilities such as navigationless XML and everything that follows from it, as discussed below.
My work in this area started when I discovered that ANSI SQL can inherently perform full hierarchical processing straight out of the box. My research showed how to make this happen by modeling full multipath hierarchical structures in SQL using the Left Outer Join syntax. The associated Left Outer Join semantics, as defined in the ANSI SQL specification, precisely define how the ANSI SQL engine carries out the Left Outer Join's basic data preservation processing. This data preservation exactly equates to basic full multipath hierarchical processing. The Outer Join’s ON clause specifies the structure link points at each specific join point; the older single WHERE clause could not do this unambiguously. The WHERE clause is now used for global hierarchical data filtering.
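As a minimal sketch of the modeling described above, the following Python script uses the standard sqlite3 module to join a two-pathway structure with Left Outer Joins. The dept, emp and proj tables and their data are hypothetical names invented for this illustration; they do not come from the prototype. SQLite is used only because it runs anywhere, and it shows ordinary SQL outer join behavior, not the middleware's processing.

```python
import sqlite3

# Hypothetical structure: dept is the root, with two child pathways (emp and proj).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp  (emp_id  INTEGER PRIMARY KEY, dept_id INTEGER, emp_name  TEXT);
CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, proj_name TEXT);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO emp  VALUES (10, 1, 'Ann'), (11, 1, 'Bob');
INSERT INTO proj VALUES (100, 2, 'Apollo');
""")

# Each ON clause names the link point between a parent and one child pathway.
# The Left Outer Join preserves a parent row even when a child pathway is empty.
rows = conn.execute("""
SELECT d.dept_name, e.emp_name, p.proj_name
FROM dept d
LEFT OUTER JOIN emp  e ON e.dept_id = d.dept_id
LEFT OUTER JOIN proj p ON p.dept_id = d.dept_id
ORDER BY d.dept_id, e.emp_id, p.proj_id
""").fetchall()

for row in rows:
    print(row)
```

Sales has employees but no projects, and Research has a project but no employees, yet both departments appear in the result with None in the empty pathway. That is the data preservation behavior the paragraph above refers to.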
Most XML professionals do not realize that XML processors today are basically limited to single linear path queries. There are two reasons for this. First, all XML processors today require user navigation, and multipath nonlinear hierarchical data processing is too difficult to code using user navigation. But the real stumbling block is the second reason: multipath hierarchical processing requires Lowest Common Ancestor (LCA) logic to coordinate the processing between hierarchical pathways so that the results are meaningful. Most XML professionals today are not familiar with LCA processing. This is because there are no W3C standards for how to carry out hierarchical processing, so each vendor can apply its own processing rules. As a result, invalid processing can go unnoticed today, and results can’t be fully trusted. ANSI SQL’s inherent hierarchical processing, in contrast, automatically follows hierarchical principles in the hierarchically modeled Left Outer Join, so the processing is correct.
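For readers unfamiliar with the term, here is a small sketch of what a Lowest Common Ancestor is. The hierarchy and the lowest_common_ancestor function are hypothetical and written only for illustration; the code shows the concept itself, not how an SQL engine or any XML processor applies it internally.

```python
# Hypothetical hierarchy, stored as child -> parent. The root has parent None.
parent = {
    "dept": None,
    "emp": "dept",
    "dependent": "emp",
    "proj": "dept",
    "task": "proj",
}

def lowest_common_ancestor(parent, a, b):
    """Return the deepest node that lies on the root path of both a and b."""
    ancestors_of_a = set()
    node = a
    while node is not None:          # walk from a up to the root
        ancestors_of_a.add(node)
        node = parent.get(node)
    node = b
    while node is not None:          # walk up from b until the paths meet
        if node in ancestors_of_a:
            return node
        node = parent.get(node)
    return None                      # no shared ancestor (different trees)

print(lowest_common_ancestor(parent, "dependent", "task"))  # dept
print(lowest_common_ancestor(parent, "dependent", "emp"))   # emp
```

A query that touches both dependent and task has to be coordinated at dept, because that is the lowest point where the two pathways meet.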
Even XQuery does not support LCA query processing today. The importance of this capability is borne out by a number of academic projects attempting to add LCA processing logic to XQuery. Quite amazingly, ANSI SQL automatically supports this LCA processing fully when a hierarchical structure has been modeled in SQL and is processed directly by an ANSI SQL processor. See my recent article on this amazing unplanned ANSI SQL capability at: http://www.tdan.com/view-articles/11069.
As you may know, the SQL processors from IBM, Oracle and Microsoft have not solved the relational and XML data integration problem. Because of this, the SQL vendors have all gone off in different proprietary directions and become incompatible with each other, even though the SQL/XML functions have been standardized. Additionally, these SQL/XML functions are XML-centric and require XML knowledge to code procedurally, often with looping similar to XQuery looping structures. All of this becomes unnecessary when ANSI SQL’s inherent hierarchical processing is utilized. By naturally elevating ANSI SQL processing to a full hierarchical level, relational and XML data are integrated at a lossless hierarchical level, solving the relational/XML integration problem.
My partner and I have seamlessly extended ANSI SQL’s full and correct hierarchical multipath processing to support transparent native XML hierarchical processing by using a middleware XML enabler prototype that operates on top of the customer’s SQL processor. This enables the customer’s SQL processor to transparently support full hierarchical and XML processing. ANSI SQL structured processing of hierarchical XML structures requires no user navigation or knowledge of the data structure, which enables use by nontechnical users. This is possible because structured hierarchical data is unambiguous and well behaved in its processing. The processing automatically supports full multipath queries and performs them correctly. This matters because a nonprofessional user without knowledge of the structure may specify a simple query that accesses more than one pathway without any way of knowing it. Multipath support allows any query to be specified across the full range of the processed structure without imposing any limitations.
This hierarchical processing in the middleware is automatically structure-aware and allows many hierarchical capabilities such as automatic XML structured output. This capability is necessary because ANSI SQL can dynamically join hierarchical structures, creating a new combined structure that is then output as formatted XML matching the dynamic result.
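To give a feel for structured XML output, the sketch below turns flat joined rows back into nested XML with Python's standard xml.etree.ElementTree module. Everything here is an assumption made for illustration: the rows, the rows_to_xml helper, and the element names result, dept, emp and proj are hypothetical, and the middleware's actual output format and method may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, shaped like the output of a dept/emp/proj outer join.
rows = [
    ("Sales", "Ann", "Apollo"),
    ("Sales", "Bob", "Apollo"),
    ("Research", None, "Zeus"),
]

def rows_to_xml(rows):
    """Rebuild the dept -> (emp, proj) hierarchy from flat joined rows."""
    grouped = {}  # dept name -> {"emp": [...], "proj": [...]}
    for dept_name, emp_name, proj_name in rows:
        children = grouped.setdefault(dept_name, {"emp": [], "proj": []})
        for tag, value in (("emp", emp_name), ("proj", proj_name)):
            # Skip empty pathways (None) and values already seen in this dept.
            # Assumption: names are unique within a department.
            if value is not None and value not in children[tag]:
                children[tag].append(value)

    root = ET.Element("result")
    for dept_name, children in grouped.items():
        dept = ET.SubElement(root, "dept", name=dept_name)
        for tag in ("emp", "proj"):
            for value in children[tag]:
                ET.SubElement(dept, tag).text = value
    return root

root = rows_to_xml(rows)
print(ET.tostring(root, encoding="unicode"))
```

The nesting of the output follows the structure of the joined result, so a different join would produce a differently shaped document from the same helper pattern.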
As you can see, multipath processing is extremely powerful and easy to use. It also significantly increases the number of different structured queries that are possible, which increases the value of the data. But the real power of multipath processing is that it increases the power of the query many times over by automatically utilizing the naturally occurring structure semantics that exist between the accessed pathways. An example is SELECTing data from one hierarchical pathway based on data values in another pathway. This is a more sophisticated query requiring additional structure semantics, but it is still specified with the same simple SELECT, FROM, WHERE syntax. This opens the door to extremely powerful capabilities.
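The cross-pathway query described above can be sketched in plain SQL, run here through Python's standard sqlite3 module. The dept, emp and proj tables and their contents are hypothetical. The DISTINCT keyword is my addition to collapse repeated rows in ordinary relational output; the prototype may handle this differently.

```python
import sqlite3

# Hypothetical structure: dept is the parent of two pathways, emp and proj.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE emp  (emp_id  INTEGER PRIMARY KEY, dept_id INTEGER, emp_name  TEXT);
CREATE TABLE proj (proj_id INTEGER PRIMARY KEY, dept_id INTEGER, proj_name TEXT);
INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
INSERT INTO emp  VALUES (10, 1, 'Ann'), (11, 1, 'Bob'), (12, 2, 'Cy');
INSERT INTO proj VALUES (100, 1, 'Apollo'), (101, 2, 'Zeus');
""")

# Select from the emp pathway, filtered by a value in the proj pathway.
# The two pathways are related only through their shared parent, dept.
names = [row[0] for row in conn.execute("""
SELECT DISTINCT e.emp_name
FROM dept d
LEFT OUTER JOIN emp  e ON e.dept_id = d.dept_id
LEFT OUTER JOIN proj p ON p.dept_id = d.dept_id
WHERE p.proj_name = 'Apollo'
ORDER BY e.emp_name
""")]

print(names)  # ['Ann', 'Bob']
```

No navigation from emp to proj is written anywhere in the query. The employees qualify because they sit under the same department as the matching project, which is the lowest common ancestor of the two pathways.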
ANSI SQL’s dynamic SELECT list also allows very flexible control of the internal logic of the hierarchical query, because output data types can be added or removed dynamically. This flexibility in dynamic execution is missing in XQuery. XQuery’s ability to process hierarchical structures was also severely limited by its designers’ decision to support relational processing as well, which was needed as a bridge from relational to XML data processing. Having to support two different types of database processing (relational and hierarchical) at the same time severely limits the hierarchical processing capabilities. Our ANSI SQL prototype supports only hierarchical processing, so it knows the hierarchical structure it is processing at all times, and advanced hierarchical capabilities can be performed naturally.
The new capabilities mentioned above are many, and they are significant because together they provide a single ANSI SQL solution to all of the existing SQL/XML database integration problems. To operate our online interactive ANSI SQL Transparent XML Hierarchical Processor prototype and to see our documentation, visit: www.adatinc.com/demo.html. You will also notice many additional hierarchical processing capabilities in the prototype that can launch XML into the next generation of multipath processing and accuracy for database processing.
You may be wondering why we have not taken this technology directly to the SQL vendors. We tried. The technology is too disruptive for them: adopting it would make changing their existing technology too costly. In addition, vendors like to be able to differentiate their products and technologies from each other, and a single solution would prevent this. The only way to make this better, more advanced and more accurate technology available is to bring our product to market, which we are working towards. Any help would be appreciated. If our technology is better than that of the Big Three SQL vendors, they will have to come around and offer it too.
The multipath hierarchical processing that has been described is not new; it was utilized three decades ago when hierarchical databases were popular. It has since been forgotten (lost knowledge, no Internet back then to preserve it), and now that XML hierarchical structures are back in use, it is not being fully utilized. So this multipath hierarchical processing is not untested; its use has already been fully proven. Now that it is automatically available through ANSI SQL, it makes very good business sense to leverage this powerful natural capability. For more information on ANSI SQL Hierarchical Processing of XML see: http://www.devx.com/xml/Article/39183/1954