Despite the advent of hierarchically based XML, most XML products are still relationally based. This presents a problem that has so far been ignored. Even XQuery is relationally based; in fact, its default join operation is the Inner Join. The Inner Join does not model or preserve hierarchical structures, so the Left Outer Join should be used to model them instead. Why support the Inner Join at all in hierarchical processors?

XML-aware products and applications can input and output XML data. I would expect this to mean that they output only validly processed hierarchical XML data. The hierarchical format of XML can be checked automatically for validity today. Yet no one seems to care whether the hierarchical data itself is valid, as long as it is correctly formatted as XML. Current XML product implementers and vendors do not seem concerned. I suspect this is because there is no standard for XML hierarchical processing. I recently suggested, on a prominent XML implementers' group mailing list, that there should be a best-practices document for XML hierarchical processing, and no one bothered to reply. But when I described XPath navigation as procedural in a recent article, it caused quite a stir on the list. This did not leave me with a good feeling about hierarchical XML data processing.

The following is an example of how the Inner Join interacts with hierarchical structures.

An Example
            A
           / \
          B   X
          |  / \
          C Y   Z

The above hierarchical structure is defined in SQL as:

    SELECT *
    FROM A
      LEFT JOIN B ON A.a = B.b
      LEFT JOIN C ON B.b = C.c
      LEFT JOIN X ON A.a = X.x
      LEFT JOIN Y ON X.x = Y.y
      LEFT JOIN Z ON X.x = Z.z

The Left Join preserves its left argument over its right. This means A can exist even if there are no matching B items, but B cannot exist without a matching A item (where A and B represent relational tables or XML elements). The SQL is processed from left to right, building up the hierarchical structure as it goes. This ordering is important because Left Joins, unlike Inner Joins, are not commutative, so the order of the joins is significant. The ON clause controls the linking data points; this is why C is linked to B and not to A. Each ON clause operates only locally, at its specific point of use, because the left argument is always preserved with Left Outer Joins. At this point we have the hierarchical structure modeled above.

But suppose we append an INNER JOIN linked to any node of the above structure, such as node X (as in INNER JOIN W ON X.x=W.w), and no matching relationships are found. Then the unmatched left-side rows, which represent the entire structure (A, B, C, X, Y, Z), are removed from the result. This happens because the Inner Join does not preserve its left argument when no match is found. When working with hierarchical structures this makes no sense and destroys the hierarchical integrity of the result.
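The behavior described above can be reproduced with a small relational sketch. The following script uses Python's standard sqlite3 module and an in-memory database. The single-column tables A, B, C, X, Y, Z and W and their sample values are hypothetical. They were chosen so that one A item (a=1) has a full set of descendants and a second A item (a=2) has none. Table W is deliberately left empty, standing in for an appended join whose linking condition finds no matches. The function names build_db and run are illustrative only.

```python
import sqlite3

# Hypothetical single-column tables; names and sample values are illustrative only.
TABLES = [("A", "a"), ("B", "b"), ("C", "c"), ("X", "x"),
          ("Y", "y"), ("Z", "z"), ("W", "w")]

# The hierarchy from the article: B and X under A, C under B, Y and Z under X.
SELECT_COLS = "SELECT A.a, B.b, C.c, X.x, Y.y, Z.z"
LEFT_JOINS = """
FROM A
  LEFT JOIN B ON A.a = B.b
  LEFT JOIN C ON B.b = C.c
  LEFT JOIN X ON A.a = X.x
  LEFT JOIN Y ON X.x = Y.y
  LEFT JOIN Z ON X.x = Z.z
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    for table, col in TABLES:
        conn.execute(f"CREATE TABLE {table} ({col} INTEGER)")
    # a=1 has descendants under both B and X; a=2 has no descendants at all.
    conn.executemany("INSERT INTO A VALUES (?)", [(1,), (2,)])
    for table in ("B", "C", "X", "Y", "Z"):
        conn.execute(f"INSERT INTO {table} VALUES (1)")
    # W is intentionally left empty, so no X item can match it.
    conn.commit()
    return conn


def run(conn: sqlite3.Connection, extra_join: str = "") -> list:
    """Run the Left Join hierarchy, optionally with one more join appended."""
    sql = f"{SELECT_COLS} {LEFT_JOINS} {extra_join} ORDER BY A.a"
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    # Left Joins only: both A items survive, a=2 with NULL descendants.
    print(run(conn))  # [(1, 1, 1, 1, 1, 1), (2, None, None, None, None, None)]
    # Appending an INNER JOIN with no matches empties the whole result.
    print(run(conn, "INNER JOIN W ON X.x = W.w"))  # []
    # Appending a LEFT JOIN instead keeps the hierarchy intact.
    print(run(conn, "LEFT JOIN W ON X.x = W.w"))  # same two rows as the first query
```

Replacing the appended INNER JOIN with a LEFT JOIN keeps both A items, which matches the recommendation to use Left Outer Joins for hierarchical structures. The Inner Join would also drop the childless a=2 row even if W did contain matching rows. In that row X.x is NULL, and NULL never satisfies the ON condition.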