Our Hierarchical 4GL Research

SQL Hierarchical Test Cases

   

Structures & 4GLs

For the most current version, select the PDF version.

 

1)       Introduction

2)       Review of Data Structures

2.1)         Hierarchical Vs Network Structures

2.2)         Entity Relationships Review

2.3)         Logical and Physical Structures

2.4)         Hierarchical Data Structure History

2.5)         Internal Vs External Hierarchical Processing

2.5)         Recursive Structures

3)       Hierarchical 4GL Capabilities

3.1)         Hierarchical Semantics

3.3)         Hierarchical SQL Opportunity

4)       SQL as a Hierarchical 4GL Query Processor

4.1)         Hierarchical Query Specification

4.2)         FROM Clause Controls Hierarchical Data Modeling

4.4.1)           Joining Hierarchical Structures

4.2.2)           Linking Below the Lower Level Structure’s Root

4.3)         SELECT Clause Controls Node Promotion

4.3.1)           Fragment Isolation

4.3.2)           Structure Transformation

4.4)         WHERE Clause Controls Hierarchical Data Filtering

4.4.1)           Hierarchical Data Qualification

4.4.2)           Multi-leg Filtering Semantics

4.4.3)           Multi-leg Variable Semantics

4.4.4)           Path Data Filtering (Data Model Rules)

5)       Other Hierarchical Based Capabilities

5.1)         XML Joinless Access

5.2)         Natural Distributed Hierarchical Processing

5.3)         Full Ad Hoc Nonprocedural Processing Support

5.4)         Hierarchical Structure Construction Order

6)       Processing Unconventional Hierarchical Structures

6.1)         Variable Structures

6.2)         Mapping Network Structures to Hierarchical Structures

6.3)         XML Hierarchical Structures and Capabilities

7)       Hierarchical SQL View Use and Importance

7.1)         Hierarchical View Synergies

7.2)         Hierarchical View Optimizations

8)       More on the Internal Processing of this Technology

8.1)         Right Sided View Nesting

8.2)         Automatic Lowest Common Ancestor Logic

8.3)         Logical and Physical Structure Processing Consistency

8.4)         Data Structure Extraction (DSE) Technology

          Conclusion

1) Introduction

 

A significant part of XML’s usefulness comes from its concise hierarchical structure which allows any unique path or combination of paths to be processed and without the need for replicated data. Unfortunately, the standard level of knowledge about hierarchical data structures, their multi-leg semantics, and advanced database application is very low which may explain their lack of use today. In this same regard, information on this subject is very limited. Hierarchical structures inherently contain extremely powerful structural semantics. Properly utilized, this semantics can automatically supply fourth generation languages (4GLs) with the information needed to perform nonprocedural database navigation and semantically complex hierarchical queries logically and intuitively.

Today, the automatic use of hierarchical semantics by query processing has been limited to single legs ignoring the vast amount of semantics that naturally exists between the different legs. This inter-leg semantics can be used not only to automatically process queries involving any single leg query, but also any multi-leg query. It can, for example, process queries that access data from one leg of the structure based on data from another leg. This multi-leg processing is a common requirement that is very useful and meaningful, allowing the most semantically complex queries to be easily and nonprocedurally specified. This capability significantly increasing the value of the data.

Any time you look at an organizational chart you are analyzing the relationships across different legs of the structure based solely on the intuitive semantics of the hierarchical structure. This demonstrates the advanced level of semantic hierarchical processing automatically possible nonprocedurally and how it increases the value of the data. The current XML query languages today are not utilizing the vast quantity of semantics that exists naturally between all legs of the hierarchical structure to be able to nonprocedurally query any combination of legs in a full hierarchical structure. These full hierarchical structures can be defined by a single global view with no processing overhead.

A significant advantage of using large global hierarchical views by the 4GL user is that they do not need to be aware of the hierarchical relationships of the structure they are querying. The result will automatically reflect the valid relationships and meaningful result. These powerful 4GL nonprocedural hierarchical querying capabilities were very popular before relational databases came into conventional use. As it turns out, standard SQL can perform these powerful full multi-leg hierarchical queries inherently as will be shown.

 

2) Review of Data Structures

 

Some background into data structures is necessary to understand hierarchically structured data, its semantics, uses, and why it is so powerful. There are three common and basic types of data structures: flat, hierarchical and network. Flat data structures have no apparent structure. They are stored at a single level with no hierarchy of data, hence the term flat. They are simple contiguous structures that are typically relational tables and flat files.

 

Hierarchical data structures are multilevel structures and can contain multiple legs. They are composed of data nodes in a hierarchical multi-path tree structure. As a tree structure, they have a single start node known as the root node, and all other nodes are only located on a single path (one entrance). They offer a single unambiguous view of the data such as describing a particular application data view. For this reason, hierarchical structures can be navigated nonprocedurally because there are no choices of multiple paths to take to get to a particular node. The different node relationships between each other are: child, parent, ancestor, descendent, sibling, or cousin. Siblings are related by the same parent, and cousins are related by a common ancestor. Sibling node definition order under the parent node is application dependent.

 

The term “node” may be used generally as in the above paragraph which may apply equally to the node type (its definition in the structure) or data node (a node data occurrence). Sometimes this distinction is important when describing or understanding hierarchical semantics and in these cases the distinction will be made. With this in mind, siblings are the different children node types of a parent node type, the legs of these nodes are known as sibling legs. Twins are the different node data occurrences of a parent node data occurrence. They are very tightly related and for this reason they are also usually directly linked together under their parent node data occurrence.

 

2.1) Hierarchical Vs Network Structures

 

Network data structures, as opposed to hierarchical structures,  are for storing and processing more complex interlinked nodes which do not represent a single way of representing the structure or its access. They usually require procedural navigation because their nodes can be reached from multiple paths with each path representing a different semantics. They are useful for storing multiple combinations of overlapping structures, such as defining all the combinations of nodes that represent the valid different application views for a given database.

 

Network structures because of their intersecting paths store nodes separately connecting these nodes with pointers while hierarchical structures have the choice of having their nodes stored separately connected hierarchically with pointers or stored contiguously in a hierarchical nested fashion. Regardless of how stored, hierarchical structures with their single view of relationships between nodes actually possess a high level of unambiguous hierarchical semantics between all nodes. This can be extremely useful for nonprocedural processing of multiple applications using the same global view. This is not the case for network structures with their nodes connected in a network fashion because they do not represent any single unambiguous view. Because of this they require procedural navigation to handle different applications.

Another significant difference between network and hierarchical structures is hierarchical data preservation and hierarchical data inheritance. In a network structure, a given node can be in multiple paths, so as long as it exists in a path, it can not be deleted. Nodes in a hierarchical structure have a fixed parent-child relationship. In hierarchical structures, child data nodes can not exist without their single parent data occurrence existing. If a parent data node occurrence is deleted or filtered from the query, all related descendent data node occurrences are also deleted. This will cause cascading deletes down the leg producing variable length leg occurrences. It also allows for hierarchical data inheritance down the leg of a hierarchical structure. This means the data on a path above any data node occurrence is also available. These are important and useful hierarchical characteristics.

2.2) Entity Relationships Review

 

Hierarchical structures also have entity relationships between each adjoining node on a leg that can be independent of its hierarchy, but have an effect on the data structure semantics and data modeling. These relationships are one-to-many (1 to M), many-to-one (M to 1), and many-to-many (M to M or M to N). One-to-many is your typical Department to Employee relationship where a single Department can have zero to many Employees. The opposite relationship, many-to-one can be represented by an Employee to Department relationship where many Employees can belong to the same Department. While physical hierarchical structures like XML primarily use one-to-many relationships (one parent can have many children), logical hierarchical structures, like relational, can also support many-to-one relationships through data duplication as shown below in Figure 2.2. This duplication is perfectly valid and necessary to represent the M to 1 semantics correctly and produce a valid hierarchical result that is different than the 1 to M relationship applied to the same data as shown. M to 1 can also be hierarchical modeled as shown.

  

            1 to M                  M to 1     

            DeptX              Emp1   Emp2

                |           à        |           |

            Emp1             DeptX   DeptX

            Emp2

 

       Figure 2.2: M to 1 and 1 to M representing the same data differently

 

A many-to-many relationship is your typical Parts/Supplier relationship where a given Part can have many Suppliers while a given Supplier can carry many Parts. Physical databases have a difficult time supporting these bi-directional complex relationships while logical relational databases support them easily. This is performed by using an association table that contains all of the M to M bi-directional relationships so that the same association table can be used to represent the PartàSuppliers and SupplieràParts relations. Using the association table, both of these relationships become a hierarchical 1 to M. This works by PartàPart/SupplieràSuppliers where Part/Supplier is the association table which can be turned around to support SupplieràSupplier/PartàParts. 

 

These association tables can also contain intersecting data at the intersecting point such as the price of a specific part from a specific supplier (àPart/Price/Supplierà). This intersecting data can be naturally attached to the lower level node (Part or Supplier) which has the proper number of data occurrences to hold all the combinations because the association table has transformed the resulting relationship into a 1 to M. In this way the association table can be made transparent.

 

2.3) Logical and Physical Structures

 

Physical structures are fixed and must be modeled and accessed based on their fixed homogeneous structure. They do not need to be contiguously stored like XML which is nested, they can also be in separate pieces and related by physical pointers like IBM’s IMS database. Logical structures like relational structures are composed of multiple tables (rows are nodes) that are linked by logical data relationships. This means they can represent any number of logical structures which can be determined dynamically, thereby supplying data independence, Figure 2.2 is an example of this.

 

Logical and physical hierarchical structures can both be represented in the same hierarchical structure because they have the same attributes and principles of operation. Having the same basic hierarchical structure, logical and physical hierarchical structures can be seamlessly combined into a larger heterogeneous logical structure. This means, for example, that two XML documents can be easily and seamlessly joined using a relational association table in the middle to support M to M relationships. The new structure is a logical structure

 

2.4) Hierarchical Data Structure History

 

Relational databases became popular because of their flexible data independence that allows tables to be combined in any way. This put earlier hierarchical databases in a bad light because they prevented data independence because of their fixed hierarchical data structure. This mistakenly put all hierarchical data structures in a bad light. This unfortunately ignored the usefulness of logical hierarchical data structures which unlike physical hierarchical structures does not prevent data independence, it increases it.

 

Because of this mistaken popular belief, most commercial database software that used hierarchical data structures went into disfavor and eventually disappeared. This is the main reason there is so little information on hierarchical data structures and their advanced processing capabilities today. This also explains why there are so few Fourth Generation Languages that operate automatically based on the inherent semantics of the hierarchical structure being processed.

 

2.5) Internal Vs External Hierarchical Processing

 

Hierarchical processing in SQL today is performed by external hierarchical processing systems. There are many ways to build an external hierarchical processing system on top of SQL. External hierarchical processing systems require programming and use procedural functions and external hierarchical navigation information stored in the database. Because of this additional external processing complexity, the structures are usually limited to a single node type and recursive structures are used to form a homogenous hierarchical structure.

 

The above characteristics mean that external  hierarchical systems are not general purpose. For all of these reasons, external hierarchical databases are usually limited to simple structures. SQL-99 does support nested tables which can represent hierarchical structures, but they also require procedural methods to be written and our not general purpose. These hierarchical processing systems are not integrated with SQL, they use external processing while the retrieval and filtering of the data is still performed by SQL.

 

Nested relational internal hierarchical SQL database systems were popular for a while, but since they required new non standard syntax they were not seamless and did not catch on. In addition, they required a static schema definition limiting hierarchical flexibility and ad hoc processing. Fortunately, as will be explained in more detail shortly, it has now been determined that ANSI SQL with its generable purpose processing based on ANSI SQL hierarchical join relationships automatically performs hierarchical processing.

 

The hierarchical join relationships enabling inherent hierarchical processing define multiple heterogeneous node types. This allows very complex hierarchical structures such as XML to be naturally modeled and processed internally hierarchically. This means that no SQL hierarchical processing functions need to be written, the navigation and processing is also performed automatically and nonprocedurally by SQL. It is a powerful general purpose solution, and being performed internally in SQL making it efficient and seamless. This is the type of hierarchical structure processing that this document concentrates on. By being naturally integrated into SQL, the data is treated hierarchically from the start as it is hierarchically filtered during retrieval through output.

 

2.6) Recursive Structures

 

Recursive structures are a special case of hierarchical structures where a piece of the structure can double back on itself representing a new lower level substructure occurrence. This can happen indefinitely as needed as in a parts explosion where parts consist of other parts. XML supports complex recursive substructures composed of multiple node types and legs easily in its hierarchical type structure. SQL-99 supports flat recursive structures in its data controlled hierarchies. This is performed by a recursive operation that brings in data recursively through a loop which accesses recursively related rows and UNIONs them back into the result where they will also eventually go through the same process. Recursive information indicators can be inserted into rows to help with the processing of the result. The problem is that SQL recursive hierarchical structures consist of a single node type limited by its data controlled hierarchy. The SQL hierarchical type hierarchies documented in this web site can change this by opening up more possibilities leading to a combination of both recursive methods.

 

3) Hierarchical 4GL Capabilities

 

Fourth generation languages are nonprocedural languages. They are also known as declarative languages because the user only needs to specify what they want and not how to get it. In order for this to work, the 4GL needs to know the hierarchical data structure being processed. Using the inherent meta information derived directly from hierarchical structures definitions, the 4GL can automatically determine the semantics in the data structure. This can be naturally applied to the processing of the query without having to be supplied with the hierarchical processing instructions or logic.

 

With the advent and popularity of XML today, hierarchical data structures are coming back in a big way. Unfortunately, maybe because of the lack of knowledge about hierarchical structures and their capabilities, XML query languages have not taken full advantage of hierarchical structures and the powerful structural semantics they contain. This results in XML query languages having to be procedurally supplied with hierarchical semantic logic causing significant difficulty in specifying hierarchical queries that operate across multiple legs of the hierarchical structure. The programming complexity level increases as the number of legs involved in processing the query increases. This can make semantically complex queries not worth the time or effort to specify procedurally. It also prevents the ability to specify ad hoc queries, and increases the chances of inaccurate results from faulty hierarchical processing logic.

 

Multi-leg queries greatly increase the hierarchical semantics involved and complexity level necessary for processing them. While more complex to process, multi-leg queries processed by nonprocedural 4GLs are easy and intuitive for the user to specify, are useful in decision support, and can avoid using procedurally complex queries. More importantly, being able to freely query the entire structure nonprocedurally allows any query to be processed regardless of the number of legs involved. This frees the ad hoc user or query developer of any concerns or knowledge of the structure. The 4GL user does not need to know the location of the data or hierarchical relationships of the structure being queried. These complex hierarchical operations which are determined by the interaction of the query semantics and the hierarchical data structure semantics are not well understood or documented today, but can still be performed automatically by a 4GL.

 

3.1) Hierarchical Semantics

 

As stated previously, hierarchical data structures have an unambiguous semantics because they have only a single path to every node. This allows a nonprocedural query language to navigate them automatically and process the request utilizing the unambiguous semantics in the structure. The different legs of a hierarchical structure are independent of one another, but are related indirectly by the Lowest Common Ancestor (LCA) nodes that connect them. This makes the nodes on different legs indirectly related which is referred to here as cousins. This means that a single node data occurrence on one leg of the structure is also related to all other node data occurrences for sibling legs under their common node data occurrence. Each common ancestor data node occurrence naturally coordinates the hierarchical control logic necessary for hierarchical processing across legs. The number of common ancestors increases with the number of legs referenced in a query.

 

As mentioned above, hierarchical data structure semantics today are currently being utilized only on a single leg at a time. A primary advantage of a full hierarchical structure is that it can define any combination of the legs necessary to process any multi-leg query. This semantics across the multiple legs means that the semantics between every node in the structure are meaningful and can be utilized for hierarchical processing. Typically a 4GL user querying data nonprocedurally from a full hierarchical structure is naturally going to reference multiple legs. This is because the user does not need to know the structure or be restricted by it. This multi-leg processing can be handled automatically by a 4GL utilizing the inherent semantics and principles that occur inherently between all nodes in the entire structure. Hierarchical semantics remain consistent independent of how the structure is stored and represented physically or logically.

 

3.2) Hierarchical SQL Opportunity 

 

In a way, SQL’s underutilized and overlooked inherent hierarchical processing capabilities offers proof of what has been stated earlier about current underutilized hierarchical capabilities in XML. By modeling hierarchical structures directly in SQL using the ANSI SQL Left Outer Join operation where tables and XML elements are nodes in the hierarchical structure, the relational engine automatically operates at a full hierarchical level by inherently utilizing the semantics. This means hierarchical processing is a valid subset of relational processing. This makes XML integration in SQL seamless and capable of full hierarchical coverage carried out completely nonprocedurally and automatically. One of Advanced Data Access Technologies (ADAT) Inc.’s main goals is to utilize its hierarchical processing technology and expertise to fully utilize XML’s hierarchical processing capabilities in its innovative SQL products.

 

Besides ADAT’s dynamic Data Structure Extraction (DSE) technology which can dynamically determine the unified hierarchical structure represented in the Left Outer Joins, it has also researched and developed many additional capabilities. These new capabilities further utilize or seamlessly extend SQL’s inherent hierarchical processing capabilities. They include the nonprocedural support of XML’s advanced semistructure capabilities and the use of powerful nonprocedural, and flexible structure transformation operations specified and controlled directly in standard SQL.

 

SQL’s nonprocedural processing utilizes the semantics naturally present in the hierarchical data structures. This drives the hierarchical processing which can automatically correlate information throughout the entire structure and can also be performed in ad hoc mode to support powerful decision support. This significantly increases the value of the data by automatically being able to dynamically process very semantically complex queries. This takes on staggering importance and value with the goldmine of hierarchical semantics naturally existing in XML data structures and also available for legacy hierarchical data sources which can also be processed seamlessly by SQL hierarchical processing.

 

4) SQL as a Hierarchical 4GL Query Processor

 

With SQL being a  4GL, data modeling SQL syntax and semantics automatically instruct the SQL relational engine how to perform the hierarchical query processing. This is because all nodes are inherently related semantically and can be automatically processed as described earlier. This enables SQL to perform hierarchical processing naturally by following the semantics in the data structure being processed. The data structure is defined hierarchically by the Left Outer Join operation which unlike the standard and default Inner Join operates hierarchically. This enables complex hierarchical multi-leg queries to be processed nonprocedurally and automatically. For example, this allows an SQL query that selects data from one area of a structure based on data in another area of the structure to be performed automatically. The logic involved uses the Lowest Common Ancestor nodes between the referenced legs to coordinate and correlate the multi-leg processing.

 

A simple SQL query involving more than two legs would involve multiple common ancestors and their interrelationships making the internal logic increasingly complex. Fortunately, nonprocedural relational query languages such as SQL can do this accurately and automatically regardless of the semantic complexity. This is because the relational hierarchically restricted Cartesian product processing automatically generates all valid hierarchical combinations between legs starting at the Lowest Common Ancestor node which automatically performs this logic inherently and accurately. Current XML query languages such as XQuery are controlled procedurally by user supplied multi-leg hierarchical semantic logic. This can make them impracticable for semantically complex multi-leg hierarchical operations.

 

With hierarchical structure data preservation, each separate leg of a hierarchical structure can vary in number of data types from leg occurrence to leg occurrence. Surprisingly, this does not present a problem for fixed column relational data. Variable column legs in SQL relational databases are logically represented correctly hierarchically in form and length in standard fixed column relational rowsets. This is because the variable missing columns of every leg occurrence are automatically padded with Nulls as a normal operation of the SQL Outer Join data preservation operation. This allows variable column hierarchical leg occurrences to be stored accurately in fixed length rowsets dynamically preserving the fixed alignment, and enabling seamless and transparent operation.

 

SQL’s automatic processing of full multi-leg structures has all the same operational characteristics and capabilities as any physical or logical hierarchical structure being processed by older proven nonprocedural hierarchical processors. This also validates that this is the correct mathematically proven way of hierarchical processing. Hierarchical processing and its semantics operate the same for physical (XML) or logical (relational) structures. This enables a consistent and seamless operation across heterogeneous structure formats and hierarchical optimizations when SQL is automatically performing hierarchically. There will be further coverage on physical and logical structures and optimizations later.

 

4.1) Hierarchical Query Specification

 

A typical SQL request specification uses the familiar SELECT, FROM, and WHERE clause syntax.  The following description explains how this syntax naturally applies to hierarchical processing when SQL is processing hierarchically modeled structures. Since the SELECT clause relies on the data specified in the FROM clause, the FROM clause will be discussed first. The FROM clause specifies the  location of data types that form a pool of nodes consisting of relational tables and XML elements which will be placed in the input working set as needed. All required data types for processing the query must be included there. The Left Outer Join operation is specified in the FROM clause to specify how the nodes are related hierarchically into a single structure. This can be abstracted in separate SQL views for each structure.

 

The SELECT clause specifies which data items in the modeled structured defined in the FROM clause are to be returned. If any data item in a node is returned then that node is represented in the returned structure, otherwise that node is excluded in the returned structure.  The WHERE clause is used to return only the qualified data occurrences. If no WHERE clause is specified than no data filtering is performed.

 

4.2) FROM Clause Controls Hierarchical Data Modeling

 

The data associated with the left argument of the Left Outer Join is hierarchically preserved over the right data argument because it is preserved even when the right argument’s data is not present. Using this capability, any hierarchical structure can be modeled and processed hierarchically by the relational engine performing the hierarchical semantics associated with the SQL hierarchical modeling syntax. This is shown below in the definition of the ViewX structure in Figure 4.2 below. You can notice how legs are created by modeling them going down the structure and how multiple legs are formed when ON clauses link back to nodes already with a formed or partial formed leg. This SQL hierarchical Left Outer Join view can be generated automatically from existing meta data sources such as XML schemas.

 

       ViewX              CREATE VIEW ViewX AS

                                    A                  SELECT * FROM A

                                   /  \                  LEFT JOIN B ON A.a=B.a            

                                 B   C                LEFT JOIN C ON A.a=C.a

                                      /  \               LEFT JOIN D ON C.c=D.d

                                     D  E             LEFT JOIN E ON C.c=E.c

 

                               Figure 4.2: SQL hierarchical data modeling

 

The SQL hierarchical data modeling syntax in Figure 4.2 is also directly executable by SQL to perform hierarchical processing. This is possible because the associated SQL semantics of the data modeling SQL is specifying the basic principles of hierarchical structures and their processing. This makes for an extremely tight and seamless bond between SQL hierarchical data modeling and its associated hierarchical processing assuring data modeling accuracy, efficient processing, and an open and available data modeling language and its accompanying hierarchical processor. Also realize that this SQL data modeling syntax is a self defining data structure definition that accompanies the SQL were ever it may go. These powerful characteristics have very useful implications that will be examined further.

 

4.2.1) Joining Hierarchical Structures

 

In the same way that the hierarchical data modeling was performed a node at time using Left Outer Joins in Section 4.2, hierarchical substructures defined in SQL views can be joined (linked) by Left Outer Joins deriving a hierarchical superstructure. The left structure is joined over the right structure joined by the ON clause join criteria as demonstrated in Figure 4.2.1 below. The ON clause takes on added importance because it also specified the join points in each structure, in this case it is the C node linked to the X node.

 

                                      Relational                        XML

                                         ViewX                          ViewY

                                             A                                  X

                                            /   \                                /   \      

                                          B    C                            Y    Z

                                                /  \

                                              D   E                            

                                                                               

            SELECT * FROM ViewX LEFT JOIN ViewY

                                    ON C.Key=X.Key

 

                                                    Joined  Structure                    

                                                                A     

                                                               /   \                 

                                                             B    C  

                                                                  /  |  \

                                                                D  E  X

                                                                         /  \

                                                                       Y   Z

 

          Figure 4.2.1 Joining hierarchical data structures

 

Since the SQL hierarchical modeling views are SQL and the invoking statement is SQL, they combine naturally into a unified SQL executable statement. This means the SQL joining of the Left Outer Join views (ViewX and ViewY) in Figure 4.2.1 above automatically expand into a unified virtual SQL view that correctly maps the combined heterogeneous hierarchical structure being processed in the SQL query. This enables seamless access and processing across the heterogeneous global virtual structure. Also notice that the level of data modeling is dynamic and at a very high conceptual hierarchical modeling level manipulating structures as a whole entity as can be seen visually above and below in examples to follow.

 

4.2.2) Linking Below the Lower Level Structure’s Root

 

The joining of the two hierarchical structures in Figure 4.2.1 above is standard for hierarchical structures since the lower level structure was linked to its root node. The semantics of the combined hierarchical structure are intuitive as shown. What would the semantics of the combined structure if the lower level structure was linked to its Y node instead of its root node X  by replacing the ON clause with:  ON C.Key=Y.Key.  Interesting this capability is supported and the combined new structure generated remains the same as if the root was linked to as shown above. This is easily understood and makes sense since the root still exhibits the same inheritance and subordination effect on its structure as shown below in Section 4.3. As you might expect, data filtering is applied to the lower level structure at the actual link point, the Y node in this case. For more info on linking below the root, see Section 8.1 Right Sided Nesting.

 

4.3) Select Clause Controls Node Promotion

 

Data selection, known as projection in relational terms, is specified by the SQL SELECT clause. It controls which data types from the input working set defined by the FROM clause are moved to the output structure (a fixed vertical/column kind of filtering). Nodes with no data selected are not moved to the output structure which is standard relational processing. This slicing out of unselected output nodes is also standard hierarchical processing and is known as node promotion because the removed node’s selected descendent nodes are preserved by being naturally promoted up and around the removed nodes following the hierarchical form of the data structure. This is shown below in Figure 4.3 where node C was not selected for output.

 

              View X                                                               Node Promotion

                  A                  SELECT A.a, B.b, D.d, E.e                    A

                 /   \      à       FROM ViewX                        à         /  |  \

                B   C                                                                          B  D  E

                     /  \

                   D  E

 

Figure 4.3: Hierarchical node promotion and collection

 

The selected nodes in Figure 4.3 above also maintain their hierarchical semantics because the remaining nodes still exert the same hierarchical effect and semantics on each other (subordination is maintained). This is because the structure is naturally condensed in a hierarchically controlled manner. Also notice that nodes D and E with separate paths directly under node A is an example of node collection. All of these operations are basic hierarchical processing.

 

4.3.1) Fragment Isolation

 

Fragments are pieces of a hierarchical structure similar to substructures but more dynamic. This  definition is extended here to include node promotion caused by node exclusion. By extending upon the same data selection principles used in node promotion demonstrated  above in Figure 4.3, it is possible to isolate a hierarchically structured fragment in a view. This is demonstrated below in Figure 4.3.1 where a fragment is isolated by only selecting on data  in the C, D, and E nodes.

 

              View X                                                    Fragment Isolation

                  A              SELECT C.c, D.d, E.e                     C

                 /   \    à     FROM ViewX               à              /  \

                B   C                                                                 D  E      

                     /  \                                                                             

                   D  E                                                                          

                    |    |                                                                      

       F  G

 

             Figure 4.3.1: Hierarchical node promotion and Collection                         

           

4.3.2) Structure Transformation

 

By utilizing fragment processing shown above in Section 4.3.1 and hierarchical structure joining in Section 4.2.1, it is possible to isolate multiple structure fragments in a single structure and then manipulate them with join operations. This is fairly simple and intuitive, but involves another level of complexity. This is demonstrated in Figure 4.3.2 below where two fragments are formed and separately reconstructed from the same SQL view. These are fragments A-B and C-D, which can be used to independently manipulate and perform structure transformation by rejoining the fragments differently which is also shown below. These hierarchical operations can be abstracted in an SQL view for reuse.

 

          ViewX                                                                                     Transformed

              A                                                                                            Structure 

            /    \                 SELECT V2.C.c, V2.D.d, V1.A.a, V1.B.b            C

          B     C     à      FROM V2.ViewX LEFT JOIN V1.ViewX   à    /  \

                 /  \             ON V2.C.c=V1.A.a                                             D  A

                D  E                                                                                              |

                                                                                                                    B  

            Figure 4.3.2: Structure transforming using fragment processing

 

As shown in Figure 4.3.2 above, transforming data structures in standard SQL is possible by defining different fragments in the same structure by using the SQL alias/rename capability to keep them separate and distinct. Both logical and physical structures once retrieved to the working set in memory are stored as contiguously fixed structures. This allows logical fragments in the working set to be naturally joined and moved separately to the result set using standard hierarchical processing. This is possible because all accessed data types, such as logical relational, contiguous XML, and even linked IMS data forms are now stored in a homogenous rowset in the relational working set. This means the structured data in the working set can be treated as a fixed contiguous rowset and can be logically joined with other rowsets irregardless of the source data format.

 

4.4) WHERE Clause Controls Hierarchical Data Filtering

 

Hierarchical data filtering is controlled by the WHERE clause. It specifies which data row occurrences are filtered out based on their data content. This is a dynamic horizontal filtering (rows or path data occurrences). Data filtering will affect which data is in the result set on a row by row, path by path data occurrence basis. 

 

One of a hierarchical query’s most powerful capabilities is its hierarchical WHERE clause data filtering which is applied to the full hierarchical structure. To fully understand hierarchical query data filtering which is more involved than flat structure filtering, it is more easily understood if examined as data qualification. This is because it is simpler to demonstrate a positive rather than negative operation when demonstrating the natural hierarchical filtering operation.  This means that when hierarchical data filtering is used, it is examined as qualifying data rather than specifically filtering out data. When no WHERE clause filtering is specified, all the queried data is qualified.

 

4.4.1) Hierarchical Data Qualification

 

WHERE clause data qualification operates on the entire hierarchical structure in a hierarchical fashion based on how all the nodes relate to each other This same hierarchical process is occurring in standard SQL and relational table processing, but takes on hierarchical meaning (semantics) when the entire structure is recognized and the result is examined against its hierarchical structure. This semantics is more easily traced through the hierarchical structure following the related data qualified by the WHERE clause. This data qualification semantics is described below.

 

ViewX below in Figure 4.4.1 consists of tables A, B, C, D, E joined into the hierarchical structure as shown. The tables become nodes in the structure when modeled in SQL. If  “A”, “B”, “C”, “D” and “E”  are selected from this structure qualified on some value for “C” as in: SELECT A,B,C,D, E FROM ViewX WHERE C=5, what are the data selection semantics of this query? The qualification process starts at the WHERE clause condition that directly qualifies the C data node occurrence with WHERE C=5. Then all path data occurrences under the qualified C data node occurrence (D&E node occurrences) qualify and the path data occurrence above each qualified C data node occurrence (A node related data occurrence) qualify.

 

                    ViewX

                       A              SELECT A.a, B.b, C.c, D.e, E.e

                                    FROM ViewX   

                     B  C=5  ß WHERE C.x=5

                        

                       D  E

 

            Figure 4.4.1: WHERE clause data qualification flow

 

The above described WHERE clause in data selection logic in Figure 4.4.1 covers directly related qualification logic familiar today, but multi-leg qualification includes any node path data occurrences indirectly related across the legs. These are connected to a qualified node data occurrence such as the related B node cousin data occurrences related by the Lowest Common Ancestor (LCA) node A qualified node data occurrence in ViewX above in Figure 4.4.1. This same hierarchical result exactly reflects the result found in the relational result, with the meaning obscured by the flat representation. The hierarchical semantics and associated data are still available for hierarchical use. It can enable structured XML output to be nonprocedurally produced.

 

4.4.2) Multi-leg Filtering Semantics

 

When multiple legs take part in data filtering as in WHERE D=1 AND E=2 or arithmetic operations as WHERE D=E, using the ViewX below in Figure 4.4.3, all combinations across the qualified data value occurrences under the Lowest Common Ancestor data occurrences of node C are processed for a matching combination that qualifies. If lower level nodes are also selected, than all combinations that qualify must be processed since the lower level nodes can be qualified separately by different qualified combinations. This is also the same logic performed in hierarchical databases which is naturally reproduced by the relational Cartesian product processing.

 

Many queries will have data filtering criteria applied to more than two legs which can produce more than one Lowest Common Ancestor node filtering processes as described above. Using the hierarchical structure ViewX below in Figure 4.4.3, data filtering as in WHERE D=1 AND E=2 AND B=3 will cause a more complex processing of data filtering which in this example requires nested processing of Lowest Common Ancestor logic in this compound WHERE clause. Common ancestor C node for D&E is located under common ancestor A node for B & (D&E) requiring nested common ancestor processing that naturally follows correct hierarchical processing logic. Regardless of the number of common ancestors involved, standard SQL operating hierarchically will perform this hierarchical logic perfectly thanks to the hierarchically restricted Cartesian product processing controlled by the WHERE clause. It generates the correct combination of row values to automatically emulate this type of hierarchical nested Lowest Common Ancestor processing.

   

4.4.3) Multi-leg Variable Semantics

 

As a further example of the subtleties and complexities of hierarchical processing, the automatic hierarchical query internal semantic processing can become more powerful and complex with OR decision processing in the WHERE clause. This enhances and complicates the natural semantics of hierarchical data filtering processing, making it dynamically variable. Using the hierarchical structure from ViewX in Figure 4.4.3 below, what would the semantics of the query: SELECT D, E From ViewX WHERE D=1 OR E=0 ?  If D=1 is true and E=3 is not, then every occurrence of  E under the qualified Lowest Common Ancestor C node occurrence is qualified (thanks to D=1 being true) and only the D node data occurrence with the qualified D=1 qualifies since its only qualification is more specific.

 

                                ViewX

                                   A

                                   / \

                                 B  C

                                ___|___

                    |           |

                             D=1     E=1

                             D=2     E=2

 

            Figure 4.4.3: Variable semantics for WHERE D is true OR E is true

 

As you would expect from the data qualification results directly above, the opposite results occurs if only E is true and D is not. In this case shown in Figure 4.4.3 above, every occurrence of D under the qualified common ancestor C node occurrence is qualified and only the E node data occurrence that tested true qualifies. If both conditions are true, then all occurrences of  the D and E nodes qualify because of cross qualification. These results also demonstrate why it is easier to examine data filtering as data qualification, because both sides of the OR operation have an additive affect. While these output qualification semantics are complex, the results they produce are logical and intuitive to the user.

 

The WHERE clause OR processing semantics described above means that both sides of the OR condition must always be tested because the qualification semantics can dynamically change depending on which side of the OR operation is true. SQL handles this advanced hierarchical processing automatically with its Cartesian product processing that generates all combinations of hierarchical relations which automatically checks both sides of the OR operation. The correctness of this operation can be logically verified by replacing the OR operation by using separate filtering operations on two queries and unioning the results.

 

As just shown, this sophisticated and intuitive hierarchical data filtering is performed naturally in relational SQL Cartesian product processing. This natural hierarchical processing is quite remarkable and meaningful for mathematically correct relational hierarchical processing. It reinforces and validates that the hierarchical processing known and used before relational processing became popular is correct because it is the same as the hierarchical processing produced logically from relational processing.

 

4.4.4) Path Data Filtering (Data Model Rules)

 

WHERE clause hierarchical data filtering is very powerful operating intuitively on the entire multi-leg structure, but you may still need a more restrictive (path only) type filtering similar to Xpath. This is where ON clause data filtering can be used. While the ON clause is used to specify the join condition, it can also include a data filtering condition. Being on the ON clause it only affects the join operation of the node it is specified on and the related lower level data nodes which can not exist without its data occurrence existence (causing a cascading delete). All other nodes on other legs are not affected in any way. As you probably realize, all ON clauses are processed before the WHERE clause is processed since the WHERE clause affects the entire structure which requires all the ON clauses to have been processed (at least logically).

 

The above differences in the operation of the WHERE and ON Clauses are  extremely noteworthy and useful. Using the structure Department over Employee over Dependent for example, a qualification on a dependent under the age of twenty-one on a WHERE clause can cause structure occurrences to be removed from the result when there are no data node occurrences that match the qualification. Placed on the ON clause of a Dependent node, this filtering can only remove the Dependent node occurrences it is used on, and any of its related descendent nodes. This is very useful as a business rule that can be included in the hierarchical data model defined in the SQL view since it is associated with the data model and not the specific query. Without this capability, employees with no dependents under twenty-one years old would be removed and if this causes departments with no employees, the departments would be removed and so on up the structure to the root.

 

5) Other Hierarchical Based Capabilities

 

5.1) XML Joinless Access

 

While logical structures composed of relational tables use joining operations when retrieved, physical structures like XML do not need to simulate costly relational joins in the retrieval process when they are modeled by Outer Joins views. This is because the Left Outer Join hierarchical data modeling syntax for physical structures only represents the data structure metadata semantics. This means physical structures can be accessed and processed directly without joins because the hierarchical semantics of the Left Outer Join are reflected naturally in the physical hierarchical structures themselves and do not require expensive and needless joins to physically model their hierarchical structure. Query languages that simulate physical join views to integrate with relational data relationally will have join processing overhead for XML to support SQL/XML integration. But because the natural solution described in this document utilizes ANSI SQL’s hierarchical processing to perform SQL/XML integration at a hierarchical level, it can efficiently process physical hierarchical structures hierarchically. This allows constructing the rowset that matches the Left Outer Join data modeling without performing expensive relational joins as described above.

 

5.2) Natural Distributed Hierarchical Processing

 

The performing of distributed hierarchical processing automatically when distributed processing is performed is extremely powerful and simple. When the hierarchical data modeling outer joins are broken up and sent to remote sites for processing, the hierarchical substructures they represent will automatically be performed hierarchically following ANSI SQL hierarchical processing specified by the Left Outer Join operations. The returned results have been naturally processed hierarchical at the remote sites and still remain correctly hierarchically mapped at the local site. The final result remains fully hierarchically processed. This happens automatically because the hierarchical data modeling Left Outer Join specification is self contained in the SQL and each ANSI SQL site’s ANSI SQL processor naturally performs the hierarchical processing defined by the data modeling SQL.

 

5.3) Full Ad Hoc Nonprocedural Processing Supported

 

It is important to point out that all other SQL/XML integration solutions require proprietary solutions and XML centric syntax that is nonprocedural. A side effect of this is that ad hoc processing is really not possible if allowed at all. The ANSI SQL solution presented in this document is nonprocedural and transparent. It also supports full dynamic ad hoc processing support of XML and other forms of hierarchical data format such as legacy data.

  

5.4) Hierarchical Structure Construction Order

 

Hierarchal structures can generally be built or processed in any order without changing their semantics or result. This processing can be top-down, left to right, bottom-up, or in any combination. This is also true of the hierarchical structure’s data modeling definition. The same hierarchical structure can be built in a top-down, left to right, or bottom-up order. The completed hierarchical structure has the same semantics or derives the same results in what ever order it is built or processed. As an indication of SQL’s hierarchical processing capability and flexibility,  all of these same capabilities and operational characteristics remain the same for SQL processing. Generally speaking, hierarchical structures are most easily defined and built top-down as all the previous examples have intuitively demonstrated. Some automatic SQL data modeling definition or view expansion processes may change the standard data modeling order while remaining semantically the same. This can cause some inefficiencies that cause throwaway data from dangling tuples (no matching row) that are avoided with top-down processing. This is avoided by rewriting the outer join for top-down processing at runtime.

 

6) Processing Unconventional Hierarchical Structures

 

XML hierarchical structures are not quite as fixed as conventional hierarchical structures. This is because XML is composed of semistructured data where their structure is defined in the data allowing it to change dynamically or even define logical network data structures. These present problems for relational processing. Relational solutions to these cases based on SQL hierarchical processing are described below. While it is nice to support all XML capabilities derived from its structure definition flexibilities, no XML processor is capability of supporting all possible capabilities. Each has their own strengths and limitations. For this reason, SQL should not be expected to support all XML capabilities.

 

6.1) Variable Structures

 

Variable structures are hierarchical structures whose structure can change dynamically  between different structure occurrences and even within a given structure occurrence. This is allowed in XML. SQL can support variable structures by using variable values in the ON clause that hierarchically model the structures differently depending on the value of the variable. Since separate ON clauses are used at each join point they can dynamically control whether the join at any point in the structure building process is performed or not. By having a test condition based on a higher level data value located higher on the current path occurrence (hierarchical data inheritance supported in SQL), the variable generation of the data structure can be dynamically controlled. This is shown below in Figure 6.1 where either a D node or an E node data occurrence is generated, but not both, controlled by the value in the C.x data field. This is similar to COBOL’s Depending ON clause in its File Definition section. This is demonstrated below with an overly simplified example, but should give you an idea of how this capability is supported and can be utilized.

 

        ViewX-1                                                                      ViewX-2

             A           SELECT * FROM A                                       A

            /   \          LEFT JOIN B ON A.a=B.a                           /   \                   

          B    C        LEFT JOIN C ON A.a=C.a                         B    C

                  |   ç  LEFT JOIN D ON C.c=D.c AND C.x=1            |

                 D        LEFT JOIN E ON C.c=E.c  AND C.x=2  è     E

 

                          Figure 6.1: Variable structures

 

6.2) Mapping Network Structures to Hierarchical Structures

 

Some of XML’s advanced features require or create a logical network structure. Network structures unlike hierarchical structures allow a node type to be accessed from more than one path. This is demonstrated directly below in the XML IDRef Structure in Figure 6.2. This makes XML hierarchical structures ambiguous for nonprocedural, navigationless languages like SQL because a node can have multiple path entries each with its own semantics. A similar problem occurs with the use of duplicate element type nodes which is also permitted in XML and shown below. It is possible to model these network type structures as unambiguous hierarchical structures in SQL. SQL has an alias/rename ability that can enable a network to hierarchical structure mapping capability as shown below in Figure 6.2.

  

     IDRef  Structure               Hierarchical Structure      Duplicate Element      

              Dept                                   Dept                                 Dept

             /      \                                    /     \                                 /     \

       Cust    Emp            à          Cust    Emp            ß       Cust    Emp

            \       /                                 |          |                             |           |

             Addr                           AddrC   AddrE                    Addr   Addr

 

      Figure 6.2: Hierarchical mapping solutions for network structures

 

While the underlying storage of the Addr node is different in both usages (shared and separate) above in Figure 6.2, the semantics of both structures are basically the same and can be mapped into the same remodeled hierarchical structure which is unambiguous as shown above.

 

6.3) XML Hierarchical Structures and Capabilities

 

XML structures sometimes do not conform to standard conventional hierarchical structures. We have already seen how they can resemble network  structures and can be of variable structure which can be handled seamlessly. But structures in XML can also be formed where element types can appear in any order and element text can have elements inserted in the middle of the text. These are capabilities that are not useful for defining hierarchical data structures because they do not represent hierarchical data structures. These capabilities used together support text markup and are supported because XML was originally designed as a text markup language (the “M” in XML). XML as a hierarchical database storage format was actually an afterthought. The problem is that the two uses need to be treated differently. It is not useful to parse markup into a database hierarchical data structure. Markup needs to be processed in its contiguous form and stored and acted upon as a single data value.

 

7) Hierarchical SQL View Use and Importance

 

Hierarchical outer join definitions of logical relational table structures and physical XML documents can be placed in standard SQL views. These ANSI SQL hierarchical views have unique capabilities that come together synergistically producing even more powerful capabilities making SQL hierarchical processing more powerful and very user friendly are described below.

 

7.1) Hierarchical View Synergies

 

Hierarchical SQL views naturally map logical relational and physical XML structures using the Left Outer Join operations. This makes hierarchical processing flexible, reusable, and intuitive because of these hierarchical views’ powerful hierarchical view abstraction. These hierarchical views can be embedded or hierarchically joined with other hierarchical SQL views. This is performed the same way tables are by using the Left Outer Join directly on views to naturally form larger hierarchical views as in FROM ViewX LEFT JOIN ViewY ON X.x=Y.Y as discussed in Section 4.2.1. This joining of hierarchical views preserves and combines the hierarchical semantics and can even be preformed dynamically. When all the specified hierarchical logical and physical SQL views expand at runtime they automatically form into a seamless and consistent unified heterogeneous virtual hierarchical view. Defined naturally in SQL, this takes on significant synergistic proportions making hierarchical processing very user friendly.

 

7.2) Hierarchical View Optimization

 

The Left Outer Join hierarchical data preservation operation described earlier allows separate views or the entire unified heterogeneous view to be optimized hierarchically at runtime. This is performed at runtime by accessing only the nodes referenced or on a path to a referenced node, thereby saving on  unnecessary data access. This also significantly cuts down on data explosions caused by semantically incorrect and confusing data replication, and the inefficiency they cause in memory and CPU usage. This is demonstrated below in Figure 7.2 where nodes B and D from ViewX  are not accessed.

 

         ViewX                                                Nodes             Final

             A                                                   Accessed       Structure

            /  \               SELECT A.a, E.e             A                   A

          B   C    à     FROM ViewX        à      |                     |         

               /  \                                                      C                   E

              D  E                                                     |

  E

        Figure 7.2: Hierarchical optimization

 

You can also see in Figure 7.2 that node C while not referenced is still required for navigating from node A to node E. Node C will be removed from the final result since it was not selected for output. The lack of access  of  the optimized out nodes has no negative influence on the result and improves upon the semantic accuracy of the result by reducing unnecessary data replications.

 

This optimization’s power is significantly increased by how easily the SQL SELECT clause can be changed and the view can automatically adapt by eliminating unnecessary nodes from the defined structure. This is a form of semantic optimization driven by the data structure’s meta data instead of physical views that use procedural programming instructions.

 

Global views of entire structures become very useful with this hierarchical optimization. Global views result in fewer specialized views, further increasing the reuse and hierarchical data abstraction for the user so they do not need to be concerned with details of the structure being processed. This also allows global views to be used without incurring any overhead. In a nonprocedural 4GL like SQL, global views automatically increase the database domain of the queries using these large views. This makes them very user friendly by eliminating the use of many small views and having to know when and how to use which view.

 

This powerful hierarchical optimization is not a replacement for the standard Inner or Outer join optimization. Nor does it have to be integrated into the current Inner or Outer join optimization. This powerful hierarchical optimization is simply applied before the standard optimization is performed. It can be applied externally and dynamically by rebuilding the query to eliminate the unneeded nodes.

 

This hierarchical optimization can also be applied to accessing physical hierarchical structures described Section 5.1. It can be used to indicate paths that can avoid database access. This use made of this will vary for the type of physical structures and their access mechanisms. Structures comprised of physical pointers like IBM’s IMS database VS nested contiguous structures like XML and Structured VSAM will use different  access optimizations that can benefit from the hierarchical optimization described here.

 

8) More on the Internal Processing of this Technology

 

There are a number of automatic powerful internal ANSI SQL operations that play an essential role in ANSI SQL hierarchical processing that have not been fully covered yet. This was done to keep the previous descriptions more easily understood. At this point these powerful features can now be covered.

 

8.1) Right Sided Nesting

 

In section 4.2.2, Linking Below the Root, closer examination of the expanded SQL may seem to present a problem. A simplified example used here would contain View1:A LEFT JOIN B ON A.a=B.b and View2: X LEFT JOIN Y ON X.x=Y.y. Now join them with View1 over View2 with a link to View2 at a node below the root: View1 LEFT JOIN View2 ON B.Key=Y.key. The expanded view follows in Figure 8.1.

 

            SELECT * FROM                                                       A              A

      A LEFT JOIN B ON A.a=B.b   çView1            |                |

LEFT JOIN                                                                  B  X  è   B

     X LEFT JOIN Y ON X.x=Y.y     çView2             \  |           |

ON B.Key=Y.key                                                            Y         X 

                                                                                                     |     

Figure 8.1: Right sided nesting                                                 Y

 

The apparent problem with linking below the root of the lower level joined structure (Vew2) is that it may appear that node Y node is referenced before it is joined to root X. This would be logically invalid since node X determines node Y. On a closer look at the expansion in Figure 8.1, you will notice that the LEFT JOIN between View1 and View2 is delayed because its matching ON clause has been pushed to the far side of View2 which contains its own ON clauses. This causes expanded View2 to be nested (right sided nesting) which causes it to be fully performed in isolation before being joined to View1. This means that all nodes of View2 are available to be referenced when joined to View1. Any depth of nested views can be automatically and transparently handled this way. The SQL coder does not even need to be aware of this powerful nesting taking place.

 

Each nested view is allocated to a new and separate working set while being processed. This also protects all other working sets from any destructive operation it may perform. This means for example that destructive Inner joins can be performed in views without causing data to be discarded in other working sets. This is a very powerful capability that can be exploited in many ways and a safety valve that keeps everything running smoothly with the SQL data modeling, adding powerful SQL recombinant capabilities.

 

8.2) Automatic Lowest Common Ancestor Logic

 

In Section 4.4, the concept of Lowest Common Ancestor (LCA) was introduced.  It deserves closer examination in its intrinsic use in ANSI SQL hierarchical processing. LCA logic is used to help interpret and utilize the semantics between nodes in different legs  in a hierarchical structure. Finding the Lowest Common Ancestor of two nodes is not a trivial process and many papers have put forth fast algorithms. This presumes that the application that uses the LCA is aware of its importance and its uses.

 

The ANSI SQL hierarchical processing presented in this document is an inherent operation that was not deliberately designed into ANSI SQL. So if the LCA logic is necessary to hierarchical processing how is it being performed if it was not designed  or coded into ANSI SQL?

 

The hierarchically restricted Cartesian product produced from a multi-table Left Outer Join that models the data structure automatically, naturally performs the logic of the LCA. It builds restricted Cartesian sub products under each Lowest Common Ancestor at each hierarchical join point. This means the logic of determining the LCA for node pairs is built into the Cartesian product processor of the relational processor. Using Figure 8.2 below, the predicate WHERE E=C would test all combinations of E=C generated under Lowest Common Ancestor node A, while WHERE D=E would test all combinations of D=C under Lowest Common Ancestor node B. WHERE E=C AND D=E  would still test the correct combinations of each LCA  thanks to the hierarchically restricted Cartesian product and its underlying inherent LCA logic. This  enables  WHERE clause hierarchical qualification to qualify a single row at a time which is required for relational processing.

 

                       A

                     /    \

                  B       C

                 /   \     /   \

                D  E   F   G

 

          Figure 8.2: Testing Lowest Common Ancestors

 

Lowest Common Ancestor (LCA) is also known as Nearest Common Ancestor (NCA) and Closest Common Ancestor. Lowest for lowest level common ancestor node in the structure, Nearest and Closest for the nearest/closest common ancestor node. They all derive the same Common  Ancestor node.

 

8.3) Logical and Physical Structure Processing Consistency

 

There is a hierarchical consistency that exists between logical and physical hierarchical structures, including between all the different types of physical hierarchical structures. The commonality is that the hierarchical structure maintains its basic principles and how it is logically operated upon irregardless of its makeup. The Left Outer Join syntax can model any conventional hierarchical structure and the associated syntax defines how it is operated on and the subsequent semantics of its result. This means all hierarchical structures can be defined in a global logical structure and accessed consistently by SQL.

 

The access of each different type hierarchical structure will require its own access routine use and the result would be converted to a relational rowset that defines the hierarchical structure modeled by its Left Outer Join operation. The Left Outer Join hierarchical operation performed on tables does this by preserving the hierarchical structure by inserting  nulls to keep the variable length leg segments aligned.  These same data modeling Left Outer Joins can model physical structures and their access routines will duplicate the Left Outer Join format in their returned rowset. In this way the entire unified heterogeneous hierarchical structure is seamlessly defined by the fully expanded SQL.  SQL operates consistently across all the joined rowsets because their make up is physically the same while their physical source may not have been.

 

These rowsets add relational flexibility that may not have been previously recognized. Section 4.3.2 demonstrated  structure transformation. This may seem simple to expect of relational data, but transforming physical data may seem a lot more difficult, having to pull it apart, filter, and reassembling it. But since the data from physical structures is now also in rowsets, it can also be easily accessed and reassembled by SQL as shown in Figure 4.3.2.

 

Section 5.1, XML Joinless Access, explained why physical structures do not need to be accessed by performing joins. The physical structure processing operation defined by the Left Outer Join view  also requires the natural processing of the Right Sided Nesting describe in Section  8.1. This is because a full hierarchical physical structure must be fully materialized before it can be joined below the root.  The Right Sided Nesting enables this to happen. While directly  accessing the root node of a physical structure does not present a problem, locating all other nodes directly can be a problem because of the required sequential navigation required. This is why physical structures need to be processes (materialized) all at once before being processed..

 

8.4) Data Structure Extraction (DSE) Technology

 

Almost all of the hierarchical processing capabilities are occurring automatically in ANSI SQL because of the Left Outer Join data modeling support. The ANSI SQL relational engine is not even aware of this hierarchical processing, so it can not automatically expand this capability to other areas such as integrating to external hierarchical data sources naturally at a hierarchical level. It would also need to be cognitive of the possible dynamical modeled hierarchical structure being accessed. The Data Structure Extraction (DSE) technology from ADAT’s patented procedure can automatically derive the dynamically created structure of the expanded global hierarchical structure being accessed. With this information seamless capabilities could be added to the SQL engine to extend its natural hierarchical processing naturally and transparently to XML and legacy data as discussed in this document.

 

Conclusion

 

ANSI SQL makes a very powerful, complete, and flexible nonprocedural 4GL hierarchical querying language. It is unique because it automatically utilizes the semantics in the data structure to automatically process the most semantically complex multi-leg hierarchical queries. This means the user does not have to know the data structure or supply the hierarchical semantic processing logic, increasing the value of the data. The results are simultaneously hierarchical and relationally accurate making the result mathematically correct. This allows it to integrate logical or physical, and heterogeneous hierarchical structures transparently at a full interactive conceptual hierarchical processing level. SQL can perform hierarchical processing so well that it can be used as a model for determining correct hierarchical processing. This has already been done to determine and validate the semantics of linking below the root of the lower level structure when joining hierarchical data structures.

 

More Hierarchical Information

 

Click the links below to access our documentation that supplies more detailed and hard to find hierarchical structure processing information. It includes the utilization of this hierarchical semantic information to support SQL-based native XML integration and more advanced SQL hierarchical examples. These include more info on: Hierarchical structure joining, node promotion, fragment processing, structure transformation and variable structures.

Click here to see our Hierarchical Structure Research

Click here to see Advanced SQL Hierarchical Examples

Click here to see Our Prototype

Click here for Additional Information

 Ó Copyright 2003/4/5 Advanced Data Access Technologies, Inc. All rights reserved.