Hierarchical data filtering using the WHERE clause affects the entire range of the hierarchical structure. This means a filtering condition applied to a given node type affects all other nodes types in the hierarchical structure. I am talking in the general case of node types or node definitions in the structure. Later there will be further detail how data filtering is applied specifically at the node data occurrence level.
General Data Qualification
The diagram of a hierarchical structure below demonstrates how and why the WHERE clause does affect all node types that define a hierarchical structure. The WHERE clause in the diagram below is applied directly to the E node type. This qualifies the E node type directly. From the E node type, qualification spreads up and down in a linear fashion. Additionally, every node type qualified in the upward process also qualifies the node types below it. This is shown below where upon reaching the A node type on the up direction also qualifies all node types under it and so on. This is the same nonlinear process that is occurring in the SQL relational rowset because the WHERE clause affects entire rows at a time. This hierarchical qualification becomes a little more involved when we cover WHERE data qualification at the data occurrence level in the paragraph below.
Hierarchical Structure Node Qualification Flow
A ↓↑ BEßWHERE E=1 ↓ ↓ ↓ ↓ CDFG
Using the hierarchical data structure directly above, data values have been assigned to the structure as shown below. The root node in this example only has one data occurrence, but could also have other data occurrences but lets assume these will not be qualified. All other node types have multiple node data occurrences. For example, B1 and B2 data occurrences under the root node A1 data occurrence and C1 and C2 data occurrences under B1 data occurrences. Notice that the B1 data occurrence also has D1 and D2 data occurrences under it. The B2 data node occurrence also has a similar set of data occurrences under it. Some terminology and basic hierarchical principles can be made using this diagram. The C1 and C2 data occurrences are known as twins because they are related under the same parent data occurrence, The C3 and C4 data occurrences are also a separate set of twins. The C and D node types are siblings related by their parent B node type. The C and F node types are cousins related by the ancestor node type A. Ancestor nodes are located above parent nodes on a path.
Hierarchical processing involving only single path processing is very simple and intuitive. For example: SELECT A, B, C FROM AboveView WHERE B=’C1’ returns A1, B1, C1, C2 as described above. What about a query that has a WHERE condition that references sibling legs: SELECT B, A FROM AboveView Where C=’C2’ AND D=’D3’. No result is returned for this query. When dealing with multiple path processing, there is a logic known as Lowest Common Ancestor Logic (LCA) that is used to insure only meaningful answers are returned. In this example, the WHERE clause determines its LCA node type and then operates under the range of the LCA node data occurrences. In this WHERE clause, nodes types C and D are referenced making the B node type the LCA node type to control the range of filtering tests. Separate condition tests are performed for B1 and B2 data occurrences. The ordering of twins has no specific meaning, this means all the combination of comparisons between the siblings are performed. Since C=’C2’ and D=’D3’ have different parent LCA occurrences, there is no match. SELECT B, A FROM AboveView Where C=’C4’ AND D=’D3’ returns B2 and A. WHERE clause LCA processing can get more complex. Lets look at: SELECT C, B, A FROM AboveView Where C=’C2’ AND G=’G4’ ? In this Case, the LCA is node type A and not a direct parent node type. The processing performed under node data occurrence A1 would test all combination of C=’C2” AND G=’G4’ under the A1 data occurrence. A match is found for C1 under B1 and G4 under E2. The result for SELECTing A, B would be C1, C2, B1, and A1. B2 data occurrence did not qualify because it did not have a C2 data value under it needed to qualify it. SELECT Operation and LCA Processing The SELECT and WHERE operation used together to references multiple paths can cause a different use of LCA processing. This is demonstrated in: SELECT C FROM AboveView WHERE G=’G4’. In this case we are SELECTing data from one path of the structure, based on data in another path of the structure. The semantics of this are resolved using the LCA again, in this case the LCA of SELECT A and WHERE G=4 where Node types A and G are referenced produces node type A as the LCA node type. This is how this SELECT node qualification works: WHERE G=4 qualifies up the path to the LCA node data occurrence of A1 and then is reflected back down qualifying all active path occurrences under it (the flow of data qualification), This causes C1, C2, C3 and C4 to be SELECTed for output. Conclusion I have left out an interesting and important effect of the WHERE clause data qualification processing when the WHERE clause has a compound condition using an “OR” decision operator. The OR processing in this case requires that both sides of the OR operation always be tested for proper processing of the SELECT/WHERE operation. (Even if the first side tested is true, the other side of the OR operation requires testing too). Depending on each side of the OR condition, each determine the qualification data flow taken together. In one case, the data qualification flow can be determined from the other path of the structure, in the other case, the qualification flow can be determined from only the same path of the structure. This can dynamically change the flow of data qualification used for processing. Flow of data qualification was described in the above paragraph. This operation and all of the others discussed in this blog entry also operate the same in a relational rowset because of the data replications produced from the Cartesian product.