Schema-Free Query Vs. Structured Transformation
When they are available, schema-free queries can be a much better alternative to structure transformations whose only purpose is to reshape a structure so its data can be accessed. Transformation can mean two different things. The first is taking the data structure apart and reassembling it using different relationships in the data, which gives the structure a completely new semantic meaning. I call this Restructuring, and it is not what I am concerned with in this blog. The second is what I call Reshaping: the data structure is changed into a different shape while its relationships are preserved. Reshaping is usually performed to satisfy a requirement for a specific structure shape or organization.
Schema-free queries involve two aspects of query execution. The first is that the query is independent of the structure of the data being processed. This also means that query navigation is automatic, so the user does not need to know the structure. The second is that schema-free processing works with multipath hierarchical structures and can process multipath queries, for example selecting data from one path of the structure based on a condition on a different path. The hierarchical diagrams below make these references to transformations and schema-free queries concrete.
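To illustrate automatic navigation, here is a minimal Python sketch. It assumes a hypothetical schema represented as nested dictionaries (`SCHEMA`), and the helpers `find_path` and `common_ancestor` are illustrative names, not part of any product. Given only node names, it locates each node's path and the ancestor where two paths meet, which is the information a schema-free processor needs to link a condition on one path to data on another.

```python
# Hypothetical schema for the Actual Structure: each node maps to its children.
SCHEMA = {"A": {"B": {}, "C": {}}}

def find_path(schema, target, prefix=()):
    """Return the root-to-target tuple of node names, or None if absent."""
    for name, children in schema.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = find_path(children, target, path)
        if found is not None:
            return found
    return None

def common_ancestor(schema, x, y):
    """Return the lowest node shared by the paths to x and y, i.e. the point
    where a multipath query has to link the two pathways."""
    px, py = find_path(schema, x), find_path(schema, y)
    if px is None or py is None:
        return None
    shared = None
    for a, b in zip(px, py):
        if a != b:
            break
        shared = a
    return shared

print(find_path(SCHEMA, "C"))             # ('A', 'C')
print(common_ancestor(SCHEMA, "B", "C"))  # A
```

The user only names B and C; the paths and the linking node A are derived from the schema rather than spelled out in the query.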
Actual Structure          Transformed Structure

        A                          B
       / \                         |
      B   C                        A
                                   |
                                   C
The Actual Structure above was probably built over time: first for a user who needed access to path A/B, and later partially reused to support access to path A/C. Both relationships are most likely one-to-many. Now suppose the user needs the structure represented by the Transformed Structure above, based on the semantics of the Actual Structure. The basic query the user needs is SELECT C WHERE B>4. Without schema-free processing, the Actual Structure cannot be queried this way, because the query spans two paths (A/B and A/C) and requires complex multipath semantic logic. So the Actual Structure is reshaped into the Transformed Structure, where B, A, and C lie on a single path, and the same basic query is used. Being able to reuse the same query unchanged against a different structure is another characteristic of schema-free queries.
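The reshaping step can be sketched as follows. This is a minimal illustration, assuming hypothetical instance data in which each occurrence of A holds a list of B values and a list of C values; `actual` and `reshape` are made-up names for this example.

```python
# Hypothetical instance data for the Actual Structure A -> (B, C):
# each A occurrence owns a list of B values and a list of C values.
actual = [
    {"A": "a1", "B": [3, 5, 7], "C": ["c1", "c2"]},
    {"A": "a2", "B": [1, 2], "C": ["c3"]},
]

def reshape(actual_rows):
    """Invert path A/B into B/A: every B occurrence becomes a root carrying
    its own copy of the parent A together with A's C children (B -> A -> C)."""
    transformed = []
    for row in actual_rows:
        for b in row["B"]:
            transformed.append({"B": b, "A": {"A": row["A"], "C": list(row["C"])}})
    return transformed

transformed = reshape(actual)
print(len(transformed))  # 5
print(transformed[0])    # {'B': 3, 'A': {'A': 'a1', 'C': ['c1', 'c2']}}
```

With this data, a1 becomes three separate B-rooted structures, each holding its own copy of a1 and its C values.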
If the user wanted exactly the same semantics as in the Actual Structure, the transform does not deliver them. Transforming path A/B into path B/A changes a one-to-many relationship into a many-to-one relationship, and this can alter the semantics of the query. In the Actual Structure, where path A/B is one-to-many, A is qualified only once even when the condition B>4 is true for several occurrences of B. In the Transformed Structure, where path B/A is many-to-one, A is qualified once for each occurrence of B that satisfies B>4. The result therefore contains multiple structures rooted at B, and the data under A is replicated. This follows from the difference in semantics between one-to-many and many-to-one relationships, which is intuitive.
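The difference can be checked with a small sketch. It uses the same kind of hypothetical data, and `actual`, `reshape`, `select_c_actual`, and `select_c_transformed` are illustrative names rather than an existing API. Both functions evaluate SELECT C WHERE B>4, each under the semantics of its own structure.

```python
# Hypothetical data: each A occurrence owns lists of B and C values.
actual = [
    {"A": "a1", "B": [3, 5, 7], "C": ["c1", "c2"]},
    {"A": "a2", "B": [1, 2], "C": ["c3"]},
]

def reshape(rows):
    """Build the B -> A -> C structure: one root per B occurrence."""
    return [{"B": b, "A": {"A": r["A"], "C": list(r["C"])}}
            for r in rows for b in r["B"]]

def select_c_actual(rows, threshold):
    """One-to-many A/B: an A is qualified once if ANY of its B values match."""
    result = []
    for row in rows:
        if any(b > threshold for b in row["B"]):
            result.extend(row["C"])
    return result

def select_c_transformed(roots, threshold):
    """Many-to-one B/A: every matching B root qualifies its copy of A,
    so A's C values are emitted once per match."""
    result = []
    for root in roots:
        if root["B"] > threshold:
            result.extend(root["A"]["C"])
    return result

print(select_c_actual(actual, 4))                # ['c1', 'c2']
print(select_c_transformed(reshape(actual), 4))  # ['c1', 'c2', 'c1', 'c2']
```

Because two of a1's B values exceed 4, the reshaped version returns a1's C values twice, while the Actual Structure returns them once.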
If the Transformed Structure semantics is what is desired, then the transform is necessary. Usually it is not desired, but it is tolerated when it is the only solution available. If it is not desired and schema-free processing of multipath structures is available, then there is no need to perform the transform at all. In fact, any multipath query applied to the Actual Structure will produce the correct and desired result with the correct semantics. This significantly increases the number of queries possible against a single hierarchical structure. These multipath queries also increase the value of the data already in the structure, because processing across pathways uses the semantics that naturally exist between those pathways. This increases both the processing power and the correctness of these queries.
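A simplified sketch of why one untransformed structure can serve many queries: assuming the two-level A -> (B, C) shape used in this post, a single hypothetical `select` function qualifies each occurrence of A (the common ancestor of both paths) at most once and then returns values from whichever path is requested. The data and function names are illustrative, and a general processor would also need to handle deeper and wider structures.

```python
# Hypothetical data for the Actual Structure A -> (B, C).
actual = [
    {"A": "a1", "B": [3, 5, 7], "C": ["c1", "c2"]},
    {"A": "a2", "B": [1, 2], "C": ["c3"]},
]

def values(row, node):
    """Return a node's values under one A occurrence as a list."""
    v = row[node]
    return v if isinstance(v, list) else [v]

def select(rows, target, cond_node, predicate):
    """Multipath SELECT target WHERE predicate(cond_node), evaluated per A
    occurrence so that A keeps its one-to-many semantics: it is qualified
    at most once, however many condition values match."""
    result = []
    for row in rows:
        if any(predicate(v) for v in values(row, cond_node)):
            result.extend(values(row, target))
    return result

print(select(actual, "C", "B", lambda b: b > 4))     # ['c1', 'c2']
print(select(actual, "B", "C", lambda c: c == "c3")) # [1, 2]
print(select(actual, "A", "B", lambda b: b < 3))     # ['a2']
```

The same structure answers queries in either direction across the two paths, with no reshaping and no replicated data.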