ViewX A SELECT A.a, B.b, C.c, D.e, E.e ↓ ↑ FROM ViewX B C=5 ß WHERE C.x=5 ↓ ↓ D E As a further example of the subtleties and complexities of hierarchical processing, the automatic hierarchical query internal semantic processing can become more powerful and complex with OR decision processing in the WHERE clause. This enhances and complicates the natural semantics of hierarchical data filtering processing, making it dynamically variable. Using the hierarchical structure from ViewX in Figure 4.4.3 below, what would the semantics of the query: SELECT D, E From ViewX WHERE D=1 OR E=0 ? If D=1 is true and E=3 is not, then every occurrence of E under the qualified Lowest Common Ancestor C node occurrence is qualified (thanks to D=1 being true) and only the D node data occurrence with the qualified D=1 qualifies since its only qualification is more specific. ViewX A B C ___|___ | | D=1 E=1 D=2 E=2 Figure 4.4.3: Variable semantics for WHERE D is true OR E is true As you would expect from the data qualification results directly above, the opposite results occurs if only E is true and D is not. In this case shown in Figure 4.4.3 above, every occurrence of D under the qualified common ancestor C node occurrence is qualified and only the E node data occurrence that tested true qualifies. If both conditions are true, then all occurrences of the D and E nodes qualify because of cross qualification. These results also demonstrate why it is easier to examine data filtering as data qualification, because both sides of the OR operation have an additive affect. While these output qualification semantics are complex, the results they produce are logical and intuitive to the user. The WHERE clause OR processing semantics described above means that both sides of the OR condition must always be tested because the qualification semantics can dynamically change depending on which side of the OR operation is true. SQL handles this advanced hierarchical processing automatically with its Cartesian product processing that generates all combinations of hierarchical relations which automatically checks both sides of the OR operation. The correctness of this operation can be logically verified by replacing the OR operation by using separate filtering operations on two queries and unioning the results. As just shown, this sophisticated and intuitive hierarchical data filtering is performed naturally in relational SQL Cartesian product processing. This natural hierarchical processing is quite remarkable and meaningful for mathematically correct relational hierarchical processing. It reinforces and validates that the hierarchical processing known and used before relational processing became popular is correct because it is the same as the hierarchical processing produced logically from relational processing. 4.4.4) Path Data Filtering (Data Model Rules) WHERE clause hierarchical data filtering is very powerful operating intuitively on the entire multi-leg structure, but you may still need a more restrictive (path only) type filtering similar to Xpath. This is where ON clause data filtering can be used. While the ON clause is used to specify the join condition, it can also include a data filtering condition. Being on the ON clause it only affects the join operation of the node it is specified on and the related lower level data nodes which can not exist without its data occurrence existence (causing a cascading delete). All other nodes on other legs are not affected in any way. As you probably realize, all ON clauses are processed before the WHERE clause is processed since the WHERE clause affects the entire structure which requires all the ON clauses to have been processed (at least logically). The above differences in the operation of the WHERE and ON Clauses are extremely noteworthy and useful. Using the structure Department over Employee over Dependent for example, a qualification on a dependent under the age of twenty-one on a WHERE clause can cause structure occurrences to be removed from the result when there are no data node occurrences that match the qualification. Placed on the ON clause of a Dependent node, this filtering can only remove the Dependent node occurrences it is used on, and any of its related descendent nodes. This is very useful as a business rule that can be included in the hierarchical data model defined in the SQL view since it is associated with the data model and not the specific query. Without this capability, employees with no dependents under twenty-one years old would be removed and if this causes departments with no employees, the departments would be removed and so on up the structure to the root. 5) Other Hierarchical Based Capabilities 5.1) XML Joinless Access 5.2) Natural Distributed Hierarchical Processing The performing of distributed hierarchical processing automatically when distributed processing is performed is extremely powerful and simple. When the hierarchical data modeling outer joins are broken up and sent to remote sites for processing, the hierarchical substructures they represent will automatically be performed hierarchically following ANSI SQL hierarchical processing specified by the Left Outer Join operations. The returned results have been naturally processed hierarchical at the remote sites and still remain correctly hierarchically mapped at the local site. The final result remains fully hierarchically processed. This happens automatically because the hierarchical data modeling Left Outer Join specification is self contained in the SQL and each ANSI SQL site’s ANSI SQL processor naturally performs the hierarchical processing defined by the data modeling SQL. 5.3) Full Ad Hoc Nonprocedural Processing Supported It is important to point out that all other SQL/XML integration solutions require proprietary solutions and XML centric syntax that is nonprocedural. A side effect of this is that ad hoc processing is really not possible if allowed at all. The ANSI SQL solution presented in this document is nonprocedural and transparent. It also supports full dynamic ad hoc processing support of XML and other forms of hierarchical data format such as legacy data. XML hierarchical structures are not quite as fixed as conventional hierarchical structures. This is because XML is composed of semistructured data where their structure is defined in the data allowing it to change dynamically or even define logical network data structures. These present problems for relational processing. Relational solutions to these cases based on SQL hierarchical processing are described below. While it is nice to support all XML capabilities derived from its structure definition flexibilities, no XML processor is capability of supporting all possible capabilities. Each has their own strengths and limitations. For this reason, SQL should not be expected to support all XML capabilities. 6.1) Variable Structures Figure 6.2: Hierarchical mapping solutions for network structures While the underlying storage of the Addr node is different in both usages (shared and separate) above in Figure 6.2, the semantics of both structures are basically the same and can be mapped into the same remodeled hierarchical structure which is unambiguous as shown above. 6.3) XML Hierarchical Structures and Capabilities XML structures sometimes do not conform to standard conventional hierarchical structures. We have already seen how they can resemble network structures and can be of variable structure which can be handled seamlessly. But structures in XML can also be formed where element types can appear in any order and element text can have elements inserted in the middle of the text. These are capabilities that are not useful for defining hierarchical data structures because they do not represent hierarchical data structures. These capabilities used together support text markup and are supported because XML was originally designed as a text markup language (the “M” in XML). XML as a hierarchical database storage format was actually an afterthought. The problem is that the two uses need to be treated differently. It is not useful to parse markup into a database hierarchical data structure. Markup needs to be processed in its contiguous form and stored and acted upon as a single data value. 8.1) Right Sided Nesting In section 4.2.2, Linking Below the Root, closer examination of the expanded SQL may seem to present a problem. A simplified example used here would contain View1:A LEFT JOIN B ON A.a=B.b and View2: X LEFT JOIN Y ON X.x=Y.y. Now join them with View1 over View2 with a link to View2 at a node below the root: View1 LEFT JOIN View2 ON B.Key=Y.key. The expanded view follows in Figure 8.1. SELECT * FROM A A A LEFT JOIN B ON A.a=B.b çView1 | | LEFT JOIN B X è B X LEFT JOIN Y ON X.x=Y.y çView2 \ | | ON B.Key=Y.key Y X | Figure 8.1: Right sided nesting Y The apparent problem with linking below the root of the lower level joined structure (Vew2) is that it may appear that node Y node is referenced before it is joined to root X. This would be logically invalid since node X determines node Y. On a closer look at the expansion in Figure 8.1, you will notice that the LEFT JOIN between View1 and View2 is delayed because its matching ON clause has been pushed to the far side of View2 which contains its own ON clauses. This causes expanded View2 to be nested (right sided nesting) which causes it to be fully performed in isolation before being joined to View1. This means that all nodes of View2 are available to be referenced when joined to View1. Any depth of nested views can be automatically and transparently handled this way. The SQL coder does not even need to be aware of this powerful nesting taking place. Each nested view is allocated to a new and separate working set while being processed. This also protects all other working sets from any destructive operation it may perform. This means for example that destructive Inner joins can be performed in views without causing data to be discarded in other working sets. This is a very powerful capability that can be exploited in many ways and a safety valve that keeps everything running smoothly with the SQL data modeling, adding powerful SQL recombinant capabilities. 8.2) Automatic Lowest Common Ancestor Logic In Section 4.4, the concept of Lowest Common Ancestor (LCA) was introduced. It deserves closer examination in its intrinsic use in ANSI SQL hierarchical processing. LCA logic is used to help interpret and utilize the semantics between nodes in different legs in a hierarchical structure. Finding the Lowest Common Ancestor of two nodes is not a trivial process and many papers have put forth fast algorithms. This presumes that the application that uses the LCA is aware of its importance and its uses. The ANSI SQL hierarchical processing presented in this document is an inherent operation that was not deliberately designed into ANSI SQL. So if the LCA logic is necessary to hierarchical processing how is it being performed if it was not designed or coded into ANSI SQL? The hierarchically restricted Cartesian product produced from a multi-table Left Outer Join that models the data structure automatically, naturally performs the logic of the LCA. It builds restricted Cartesian sub products under each Lowest Common Ancestor at each hierarchical join point. This means the logic of determining the LCA for node pairs is built into the Cartesian product processor of the relational processor. Using Figure 8.2 below, the predicate WHERE E=C would test all combinations of E=C generated under Lowest Common Ancestor node A, while WHERE D=E would test all combinations of D=C under Lowest Common Ancestor node B. WHERE E=C AND D=E would still test the correct combinations of each LCA thanks to the hierarchically restricted Cartesian product and its underlying inherent LCA logic. This enables WHERE clause hierarchical qualification to qualify a single row at a time which is required for relational processing. A / \ B C / \ / \ D E F G Figure 8.2: Testing Lowest Common Ancestors Lowest Common Ancestor (LCA) is also known as Nearest Common Ancestor (NCA) and Closest Common Ancestor. Lowest for lowest level common ancestor node in the structure, Nearest and Closest for the nearest/closest common ancestor node. They all derive the same Common Ancestor node. 8.3) Logical and Physical Structure Processing Consistency There is a hierarchical consistency that exists between logical and physical hierarchical structures, including between all the different types of physical hierarchical structures. The commonality is that the hierarchical structure maintains its basic principles and how it is logically operated upon irregardless of its makeup. The Left Outer Join syntax can model any conventional hierarchical structure and the associated syntax defines how it is operated on and the subsequent semantics of its result. This means all hierarchical structures can be defined in a global logical structure and accessed consistently by SQL. The access of each different type hierarchical structure will require its own access routine use and the result would be converted to a relational rowset that defines the hierarchical structure modeled by its Left Outer Join operation. The Left Outer Join hierarchical operation performed on tables does this by preserving the hierarchical structure by inserting nulls to keep the variable length leg segments aligned. These same data modeling Left Outer Joins can model physical structures and their access routines will duplicate the Left Outer Join format in their returned rowset. In this way the entire unified heterogeneous hierarchical structure is seamlessly defined by the fully expanded SQL. SQL operates consistently across all the joined rowsets because their make up is physically the same while their physical source may not have been. These rowsets add relational flexibility that may not have been previously recognized. Section 4.3.2 demonstrated structure transformation. This may seem simple to expect of relational data, but transforming physical data may seem a lot more difficult, having to pull it apart, filter, and reassembling it. But since the data from physical structures is now also in rowsets, it can also be easily accessed and reassembled by SQL as shown in Figure 4.3.2. Section 5.1, XML Joinless Access, explained why physical structures do not need to be accessed by performing joins. The physical structure processing operation defined by the Left Outer Join view also requires the natural processing of the Right Sided Nesting describe in Section 8.1. This is because a full hierarchical physical structure must be fully materialized before it can be joined below the root. The Right Sided Nesting enables this to happen. While directly accessing the root node of a physical structure does not present a problem, locating all other nodes directly can be a problem because of the required sequential navigation required. This is why physical structures need to be processes (materialized) all at once before being processed.. 8.4) Data Structure Extraction (DSE) Technology Almost all of the hierarchical processing capabilities are occurring automatically in ANSI SQL because of the Left Outer Join data modeling support. The ANSI SQL relational engine is not even aware of this hierarchical processing, so it can not automatically expand this capability to other areas such as integrating to external hierarchical data sources naturally at a hierarchical level. It would also need to be cognitive of the possible dynamical modeled hierarchical structure being accessed. The Data Structure Extraction (DSE) technology from ADAT’s patented procedure can automatically derive the dynamically created structure of the expanded global hierarchical structure being accessed. With this information seamless capabilities could be added to the SQL engine to extend its natural hierarchical processing naturally and transparently to XML and legacy data as discussed in this document. Conclusion More Hierarchical Information |