June 23rd, 2011
The Importance of Structured Data
While unstructured data is now used far more than structured data, it is structured data, not unstructured data, that businesses need to keep running day to day. Getting correct results requires consistent, predictable, highly principled structured data processing. This means structured data cannot be replaced by unstructured or semistructured data.
Structured Data is Stuck in Static Mode
Structured data processing has been limited to fixed static structure processing because dynamically generated structured data cannot be automatically handled today. Sharing structured data today is performed with shared metadata. The metadata remains the same, so the structure must remain static. But with the prospect of newer dynamic structured processing, the data structure could be modified dynamically when needed. This opens the door to many new and useful uses of dynamic structured data. Our SQLfX ANSI SQL hierarchical structured data processor performs this dynamic data structure creation and processing.
SQLfX is Built to Process Dynamic Structured Data
SQL supports dynamic processing, so SQL operations can produce dynamically structured data. Dynamic joining of views can generate new structures, and dynamic SQL SELECT lists can create variable structures. These dynamically structured results can be output, but doing so requires knowledge of the structure for internal processing, which SQLfX extracts automatically by examining the input SQL outer join specification. This enables dynamically generated output text to be used visually without further work, but other dynamically created output structures cannot be used automatically unless updated metadata is available. This has limited the dynamic use of structured data today.
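As a rough illustration of how a hierarchical structure can be read out of an outer join specification, here is a minimal Python sketch. It is not SQLfX's implementation. The function name `extract_structure`, the regular expression, and the sample query text are all assumptions made for this example, and the sketch only handles simple `LEFT JOIN child ON parent.col = child.col` conditions.

```python
import re

def extract_structure(sql):
    """Return {node: parent} inferred from the LEFT JOIN ... ON conditions.

    Hypothetical helper for illustration only; it assumes each ON clause
    compares exactly one column of the joined table with one column of
    an earlier table.
    """
    # The root of the structure is the first table named after FROM.
    root = re.search(r"\bFROM\s+(\w+)", sql, re.IGNORECASE).group(1)
    parents = {root: None}
    join_re = re.compile(
        r"LEFT\s+(?:OUTER\s+)?JOIN\s+(\w+)\s+ON\s+(\w+)\.\w+\s*=\s*(\w+)\.\w+",
        re.IGNORECASE,
    )
    for child, left, right in join_re.findall(sql):
        # The table in the ON condition that is not the joined table is the parent.
        parents[child] = right if left == child else left
    return parents

# Illustrative input: a left outer join string that models a five-node structure.
sql = """
SELECT * FROM A
LEFT JOIN B ON A.a=B.b
LEFT JOIN C ON B.b=C.c
LEFT JOIN D ON A.a=D.d
LEFT JOIN E ON D.d=E.e
"""
structure = extract_structure(sql)
print(structure)
```

The resulting parent map is the kind of structure metadata that would have to accompany dynamically generated output for a receiver to use it automatically.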
The Solution is Automatic Metadata Maintenance
Supporting variable, unpredictable structured data output requires automatic metadata maintenance, which the industry has not previously supported for structured data processing. This new capability keeps the dynamic metadata always ready for data sharing. It still requires the user to go through some metadata installation process, but SQLfX can automate this completely, turning it into a peer-to-peer collaboration process in which SQLfX at both ends maintains and uses the metadata automatically. This enables peers to initiate or modify data and pass it on, where it can be processed automatically and immediately by the receiving peer with no metadata handling by the user. This powerful and flexible hierarchical structured processing peer-to-peer collaboration system is described further in the article at: http://soa.sys-con.com/node/1875139.
April 14th, 2011
Current SQL support of relational, XML and hierarchical legacy data such as IMS is driven by flattening the hierarchical data so that it integrates naturally with relational (flat) data and can be processed relationally. Unfortunately, this strips out the natural semantics in hierarchical data, which have the capability to dynamically increase the value of the data being processed and to perform powerful hierarchical operations. This flattening of the data is known as the Lowest Common Denominator (LCD) approach. When relational data is integrated with XML data or with IMS data, the LCD approach flattens the data. Integrating XML and IMS does present a problem, but it would allow better integration between the Internet and legacy data such as IMS.
The SQL-92 standard introduced the LEFT Outer Join which offers a powerful alternative to standard relational integration that can be used to perform full hierarchical processing integration naturally and inherently. This enables SQL to seamlessly and transparently integrate relational data at a full hierarchical structured data processing level with XML, IMS and other forms of legacy hierarchical data. This means there is no data loss for the integration of hierarchical and relational data while increasing the level of powerful automatic hierarchical operations.
It is important to note that current SQL access to XML via the SQL/XML standard is through XML semistructured data processing. This is a very relaxed, loosely principled, fuzzy process for processing markup, and it requires user-specified navigation. By contrast, the natural navigationless hierarchical processing performed by SQL always follows precise hierarchical principles and restrictions. This keeps the hierarchical structures unambiguous for automatic processing.
Also significant to SQL full hierarchical processing is its automatic support of multipath processing. This was a powerful form of hierarchical processing used for querying IMS databases four decades ago, before the advent of relational processing. It uses the naturally occurring semantics in multipath queries to transparently perform powerful nonlinear hierarchical processing. This enables nontechnical users to easily specify more powerful, internally complex dynamic queries than are available today. This capability and its advantages break away from the current relational flat mindset and its limitations.
The SQL hierarchical processing integration described here has been tested on a hierarchical processing middleware prototype which uses a standard ANSI SQL processor as its hierarchical processing engine. This is described further in my article at: http://xml.ulitzer.com/node/1764314
December 14th, 2010
The following is a list of ANSI SQL’s most useful and powerful hierarchical capabilities. These capabilities are tightly integrated and build upon each other. For this reason they are specified in this building block order. Our current article describes these capabilities: http://www.databasejournal.com/features/article.php/3915331/article.htm.
1: SQL Transparent Hierarchical Processing
2: Inherent LCA Multipath Processing
3: Structure-Aware Processing Enables:
3A: Hierarchical Optimization
3B: Navigationless Access
3C: Global Views
3D: Dynamically Formatted Output
4: Dynamic Structure Generation Supports:
4A: Basic Data Modeling
4B: Joining Hierarchical Data Structures
4C: Dynamically Structured Output
4D: Data Driven Variable Structures
5: Extended Hierarchical Modeling for Data Mashups
6: Data Structure Transformation
7: Enhanced Multipath Processing Capabilities
8: Automatic Data Renormalization
9: Hierarchical Distributed Processing
10: Future Growth Capabilities:
10A: Replace With Hierarchical Engine Usage
10B: Parallel Processing and Multicore Support
November 18th, 2010
The terms replicated data and duplicate data are used interchangeably today. But there is a big difference between the two concepts, and the terms should reflect it, because the two kinds of data need to be processed differently to get correct results. The standard dictionary definition of duplicate is consisting of multiple identical items, or being just like another. Replicated is defined as reproduced. This suggests a subtle distinction. Applied to data, duplicate data is data with the same value that occurs separately, where each occurrence is significant, such as two people with the same name. Replicated data, by contrast, is information that has been copied for some reason during processing, for example as a data placeholder produced by relational Cartesian processing when joining table structures such as Dept over Emp. This replication is necessary for processing in relational systems, but it needs to be removed, or renormalized, before the data is output or stored back in a normalized structure such as a hierarchical structure.
Having differentiated replicated and duplicate data, it can be said that replicated data requires additional processing to remove its inadvertent effects on query results. This is a complicated process. Replicated data can also occur when inverting a one-to-many relationship into a many-to-one relationship, in this case Dept over Emp into Emp over Dept. This does not have a physical effect on the relational rowset, but it does have a data replication effect on the operational structure. The replicated data in the inverted many-to-one relationship is required for processing, but if the relationship is inverted back to one-to-many, the replicated data needs to be removed by renormalization, as shown directly below.
There is another problem to consider for renormalization: ordering outside of the hierarchical order. If allowed, this operation can inadvertently change the structure of the data. For example, ordering the data by Emp causes the higher level Dept data to be replicated inadvertently. Data ordering should not affect the data structure.
XQuery can support this renormalization process through XQuery functions that the user must specify correctly when necessary. SQLfX®, my company’s SQL transparent XML processor, supports this renormalization process automatically. This is possible because SQLfX® is data structure-aware, which in turn is possible because it only supports hierarchical structures. This restriction also makes its hierarchical processing more powerful and correct. The renormalization process is described further in our current article Extending SQL’s Inherent Hierarchical Processing Operation at: http://www.databasejournal.com/features/mssql/article.phpr/3912771/article.htm
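The difference between replicated and duplicate data can be sketched with a small runnable example. This is only an illustration using Python's built-in sqlite3 module, not SQLfX's renormalization logic. The table layouts, column names and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Emp  (empno INTEGER PRIMARY KEY, ename TEXT, deptno INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Research');
    -- Two different employees share the name Smith: this is duplicate data.
    INSERT INTO Emp VALUES (1, 'Smith', 10), (2, 'Smith', 10), (3, 'Jones', 10);
""")

# Joining Dept over Emp copies the Dept values onto every Emp row:
# this is replicated data produced by the relational join.
rows = conn.execute("""
    SELECT Dept.deptno, Dept.dname, Emp.empno, Emp.ename
    FROM Dept LEFT JOIN Emp ON Dept.deptno = Emp.deptno
    ORDER BY Dept.deptno, Emp.empno
""").fetchall()

# Renormalize: keep each Dept once, keep every Emp occurrence.
depts = {}
for deptno, dname, empno, ename in rows:
    dept = depts.setdefault(deptno, {"dname": dname, "emps": []})
    if empno is not None:  # a NULL empno only marks a Dept with no Emp
        dept["emps"].append(ename)

print(rows)
print(depts)
```

In the flat rowset the Sales department appears three times. After renormalization it appears once, while both Smith occurrences survive because each one is a significant, separately occurring value.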
October 11th, 2010
Four decades ago, hierarchical databases like IBM’s IMS and VSAM were new, and their full hierarchical processing capabilities were used much more thoroughly than they are today. Applications used nonlinear multipath processing freely, slowed and limited only by the procedural database navigation needed to access and process the data. This also limited the processing complexity that applications could take on.
With the advent of 4GLs, nonprocedural query products opened the door to automatic database navigation. This allowed for more complex, unrestricted multipath queries and the formulation of another level of hierarchical processing rules and principles to govern multipath processing. This involved the use of Lowest Common Ancestor (LCA) node logic to coordinate processing across multiple pathways. With unlimited query access to the entire hierarchical structure and all of its pathways, the power of these queries increased many times over, which also dynamically increased the value of the data many times over.
With the development of relational data processing, there was a move to a new paradigm that limited data to flat structures in order to gain data independence. This introduced the problem that hierarchical structures and their processing are not compatible with relational processing. Relational processing quickly supplanted hierarchical processing and moved it to the background. With SQL relational processing now in control, the integration of relational and hierarchical processing naturally used the lowest common denominator approach and flattened the hierarchical data to integrate it seamlessly with relational data. Unfortunately, this sacrificed the hierarchical semantics naturally present in the data, along with its hierarchical capabilities.
This SQL relational driven integration has remained in place even with the popular reappearance of hierarchical structures with the advent of XML. Our research has shown that with the introduction of the SQL-92 LEFT Outer Join operation, hierarchical processing is actually a subset of relational processing. Using this finding, our SQLfX prototype has proven that ANSI SQL processors can perform full hierarchical processing. This overlap of operational principles allows a natural SQL driven relational/hierarchical integration at a full hierarchical level. This SQL driven hierarchical processing not only preserves the hierarchical semantics, it uses these semantics in its hierarchical processing, as described and demonstrated for ANSI SQL transparent XML hierarchical processing on this web site.
This SQL hierarchical processing technology can also be applied to the natural and transparent full multipath hierarchical processing of IBM’s IMS databases and all other mainframe legacy hierarchical databases like VSAM, as well as flat tabular databases. This allows almost unlimited, internally complex processing to be easily specified and run hierarchically against IMS databases and any hierarchical combination of flat, XML and legacy hierarchical data sources. The SQL addition of hierarchical processing produces an additional level of processing, correctness, efficiency and ease of use not available today.
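To make the Lowest Common Ancestor idea concrete, here is a minimal Python sketch that finds the LCA of two nodes from a parent map. The five-node structure (A over B and D, B over C, D over E) and the function names are assumptions chosen for illustration. This shows only the basic LCA lookup, not how any particular query product coordinates processing around it.

```python
# Hypothetical structure: A is the root, B and D are its children,
# C is under B and E is under D.
PARENT = {"A": None, "B": "A", "C": "B", "D": "A", "E": "D"}

def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lowest_common_ancestor(x, y, parent):
    """Return the deepest node that lies on both root paths."""
    ancestors_of_x = set(path_to_root(x, parent))
    # Walking upward from y, the first node that is also above x is the LCA.
    for node in path_to_root(y, parent):
        if node in ancestors_of_x:
            return node
    return None

print(lowest_common_ancestor("C", "E", PARENT))  # nodes on different pathways
print(lowest_common_ancestor("B", "C", PARENT))  # nodes on the same pathway
```

For two nodes on different pathways, such as C and E, the LCA is the node where the pathways meet, and it is that node's occurrences that determine which data on one pathway is related to which data on the other.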
I said earlier that hierarchical processing is a subset of relational processing. At this common point, the advantages of relational and hierarchical processing are both available. Relational data structure independence and hierarchical structure processing with its powerful semantics are both available at the same time. Originally with the advent of relational processing and its data independence, fixed hierarchical processing had to be sacrificed because they could not both exist simultaneously. This is not true today with hierarchical processing being a natural subset of relational processing. The power of relational data independence can be used with logical hierarchical data structures and their processing to support advanced new synergistic capabilities using SQL.
This SQL hierarchical processing capability allows both IMS applications to integrate with XML, and XML Internet applications to integrate with IMS.
October 1st, 2010
Multipath hierarchical processing is incredibly powerful and allows any SQL query to be processed without restrictions. Multipath queries are not fully understood in the industry today, but they are silently allowed even though they often produce unmeaningful results. A multipath query processes multiple hierarchical pathways in the same query, which requires special advanced coordination processing to produce meaningful results. A simple example of a multipath hierarchical query is selecting data from one pathway based on a data value in another pathway. We understand these powerful multipath hierarchical queries and ensure that they always produce correct, meaningful results. See: ANSI SQL Hierarchical Data Processing Basics for more background info on multipath hierarchical processing.
Our research has discovered that SQL’s inherent hierarchical processing also includes the Lowest Common Ancestor (LCA) processing required for correct multipath processing. It occurs automatically in ANSI SQL processing using our technology. XQuery does not support LCA processing automatically or procedurally. Procedural LCA processing is too complicated to be practical. Current academic projects are attempting to add LCA processing on top of XQuery by using LCA functions, but this has many problems. Ultimately, LCA processing needs to be performed internally and automatically, as occurs naturally in ANSI SQL. See: The Power Behind SQL’s Inherent Multipath LCA Hierarchical Processing for more information on LCA processing. SQL hierarchical XML processing naturally solves the XML Keyword Search problem by using its internal and automatic LCA processing to remove unmeaningful results introduced by comparisons made across hierarchical pathways.
Our hierarchical processor uses powerful dynamic semantic optimization to access only referenced nodes or nodes on the path to referenced nodes. This also dynamically limits the internal processing to only the portions of the structure requiring access. This optimization varies from query to query based on the output selected in the SQL. The same processed structure or view can be optimized in a totally different way for a different set of selected output items. See SQL’s Optimized Hierarchical Data Processing Driven by its Data Structure for more information on this.
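The simple multipath query described above, selecting data from one pathway based on a value in another pathway, can be tried with Python's built-in sqlite3 module. This is a sketch under assumptions: the Dept, Emp and Proj tables, their columns and the sample rows are invented for the example, and it shows ordinary SQL join behavior rather than our processor's internals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dept is the root; Emp and Proj are two separate pathways under it.
    CREATE TABLE Dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE Emp  (ename TEXT, deptno INTEGER);
    CREATE TABLE Proj (pname TEXT, deptno INTEGER);
    INSERT INTO Dept VALUES (10, 'Sales'), (20, 'Research');
    INSERT INTO Emp  VALUES ('Smith', 10), ('Jones', 10), ('Adams', 20);
    INSERT INTO Proj VALUES ('Alpha', 10), ('Beta', 20);
""")

# Select from the Emp pathway based on a value in the Proj pathway.
# Emp and Proj rows are only ever paired through the Dept they share,
# so Dept acts as their lowest common ancestor.
rows = conn.execute("""
    SELECT DISTINCT Dept.dname, Emp.ename
    FROM Dept
    LEFT JOIN Emp  ON Dept.deptno = Emp.deptno
    LEFT JOIN Proj ON Dept.deptno = Proj.deptno
    WHERE Proj.pname = 'Alpha'
    ORDER BY Emp.ename
""").fetchall()
print(rows)
```

Adams is not returned because the department that Adams belongs to has no Alpha project. The DISTINCT guards against the replicated rows that would appear if a department had several matching projects.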
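The rule of accessing only referenced nodes and the nodes on the path to them can be sketched in a few lines of Python. The five-node structure and the function name `nodes_to_access` are assumptions made for illustration. The real optimization is performed inside the processor and is not shown here.

```python
# Hypothetical structure: A is the root, B and D are its children,
# C is under B and E is under D.
PARENT = {"A": None, "B": "A", "C": "B", "D": "A", "E": "D"}

def nodes_to_access(referenced, parent):
    """Return the referenced nodes plus every node on the path up to the root."""
    needed = set()
    for node in referenced:
        # Climb toward the root, stopping early once a path is already covered.
        while node is not None and node not in needed:
            needed.add(node)
            node = parent[node]
    return needed

# The same structure needs different access for different SELECT lists.
print(sorted(nodes_to_access({"C"}, PARENT)))
print(sorted(nodes_to_access({"B", "E"}, PARENT)))
```

A query that only references C never touches D or E, while a query that references B and E skips C, even though both queries run against the same view.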
Our hierarchical processor’s natural ANSI SQL hierarchical processing includes five ways that dynamic variable structures can be built. These are by: basic hierarchical data modeling performed in SQL to model the structures; the dynamic hierarchical joining of hierarchical structures; using SQL’s dynamic variable SELECT list limiting the result structure and compressing it using node promotion; programmatically driven by data value; and advanced any-to-any hierarchical structure transformation. These methods can be combined in any way. I was also able to extend hierarchical structure joining operations to support more advanced hierarchical structure data mashups that are hierarchically and semantically correct. See: Extending Hierarchical Data Modeling Demonstrated in SQL for more information on this.
Our ANSI SQL Transparent XML Hierarchical Processor automatically produces hierarchically formatted XML output based on the dynamic resulting output structure. This is possible because the ANSI SQL hierarchical XML processor operates fully structure-aware. This advanced automatic output formatting can support new capabilities such as automatic dynamic XML publishing. Because the structure is hierarchical, when it is dynamically built it can break off into multiple pathways, where each pathway continues generating independently and can itself continue breaking into multiple pathways. This opens the door to many new types of dynamic applications. See Dynamic Data Driven Variable Hierarchical Structures in SQL for more information on this capability.
Our research has also shown that there are two basic types of data structure transformation. One is to use new or unused data relationships in the data to Restructure the data into a different structure. The other is to use the natural structure semantics to Reshape the structure in any way. We use the two terms Restructuring and Reshaping to distinguish these types of transformation. The two terms are used interchangeably today, which we hope will change to the usage defined here. Our research used SQL to demonstrate these different structure transformations by using SQL’s natural hierarchical data structure processing capability along with data fragment manipulation. This can be seen in Hierarchical Data Structure Transformation in SQL.
August 22nd, 2010
This blog on Navigationless XML has shown how ANSI SQL inherently supports full hierarchical processing transparently and seamlessly, performing standard hierarchical processing capabilities naturally and accurately. It turns out that ANSI SQL can also go under the covers of XML to isolate and manipulate hierarchical structures at a data structure fragment level. This is done by using SQL’s alias capability to reference multiple copies of the same data structure using assigned prefixes, known as correlation names, specified in the FROM clause as shown in the SQL below. This allows separate node fragment groupings to be specified under their common prefix, as in the SELECT list shown below.
      A
     / \
    B   E
   / \  |
  C   D F
SELECT X.A.a, X.F.f, Y.B.b, Y.C.c, Y.D.d
FROM ViewABC AS X LEFT JOIN ViewABC AS Y
ON X.A.a = Y.A.a
The structure diagram directly below shows the X and Y prefix values referenced in the above SQL query. The E node is not referenced in the SQL SELECT list and will be squeezed out of the resulting structure naturally by hierarchical node promotion. The A and F nodes make up the X fragment, and the B, C and D nodes make up the Y fragment, in their isolated hierarchical structure forms.
        A(X)
       /   \
   B(Y)     E
   /  \     |
(Y)C  D(Y)  F(X)
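The node promotion described above can be sketched in Python. The parent map below is an assumption read from the surrounding text (A over B and E, B over C and D, E over F), and the function name `promote` is made up for this example. It shows only how unselected nodes are squeezed out, not the fragment joining.

```python
# Assumed structure: A over B and E, B over C and D, E over F.
PARENT = {"A": None, "B": "A", "C": "B", "D": "B", "E": "A", "F": "E"}

def promote(parent, selected):
    """Drop unselected nodes and attach each kept node to its nearest kept ancestor."""
    result = {}
    for node in parent:
        if node not in selected:
            continue
        ancestor = parent[node]
        # Skip over any ancestors that were not selected.
        while ancestor is not None and ancestor not in selected:
            ancestor = parent[ancestor]
        result[node] = ancestor
    return result

# E is not in the SELECT list, so F is promoted to sit directly under A.
promoted = promote(PARENT, {"A", "B", "C", "D", "F"})
print(promoted)
```

With E removed, F becomes a direct child of A, which is why A and F can form a single fragment.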
The FROM Clause in the above SQL contains a LEFT Outer JOIN placing the X fragment hierarchically before the Y fragment as shown in the resulting output diagram directly below. This had the effect of reversing the original placement of the hierarchical fragments by placing the X fragment on the left and the Y fragment on the right.
The above examples have demonstrated how a hierarchical structure can be broken into structured fragments and manipulated separately. This capability can be extended to support advanced structure transformation as described in my latest article, Hierarchical Data Structure Transformation in SQL at:
July 12th, 2010
I have seen a lot written in the press about “NoSQL” as an alternative to SQL: lighter, faster, cheaper, easier to extend with new data types, and with larger memory capacity, usually in the cloud. Most of these lightweight database processors do what they claim, but at the cost of throwing data processing principles out the window. That’s the trade-off. This trade-off may not be necessary. The solution is to take advantage of ANSI SQL’s powerful inherent full hierarchical processing capability.
SQL’s inherent hierarchical processing takes advantage of all the powerful capabilities and principles that come with the overlap of hierarchical and relational processing principles. This overlap exists because hierarchical processing is a subset of relational processing. It also means that this powerful and principled subset of relational processing can be replaced with a simpler, memory-friendly hierarchical engine that can still fulfill the NoSQL capabilities mentioned above. Putting an ANSI SQL hierarchical interface over an existing hierarchical processing engine, or even building your own hierarchical engine, would not be that difficult.
Hierarchical engines do not replicate data, which avoids relational data explosions. They always reflect the actual data exactly. Powerful hierarchical optimizations dynamically tailor each query so that only the data in the views that the active query needs is accessed. This makes any large view a global view with no overhead, so fewer views, or even a single view, can be used, and users do not have to know which view to use or be familiar with the structure. In addition, external joins of hierarchical structures are performed internally with a simple link operation instead of expensive relational joins. These powerful hierarchical capabilities are performed transparently by using SQL’s SELECT, FROM, WHERE and LEFT JOIN operations. This means that nontechnical users can easily specify powerful hierarchical queries using a simple standard ANSI SQL hierarchical processing subset.
Most applications can be designed hierarchically and can benefit from principled hierarchical operations. As a hierarchical structure naturally grows over time through the addition of nodes and pathways, its inherent data value increases nonlinearly through natural hierarchical data reuse, and the number of possible new queries increases significantly, further increasing the value of the data. In addition, the naturally occurring data semantics between the referenced pathways are used to process powerful multipath queries, dynamically creating more meaning and value from the data to answer these internally complex queries automatically. To see and test this type of powerful, easy to use YesSQL hierarchical database engine in action, try out our online interactive SQL Transparent XML Hierarchical Processor at: www.adatinc.com/demo.html which contains operation and technical documentation.
May 21st, 2010
Schema-free querying basically implies that the user specifying the query does not need to know the structure of the data or where the data is located. This means the query product must support this by having knowledge of the structure or by walking through the structure. Either of these two methods means the navigation is automatic. An additional capability that may not be immediately obvious with this automatic processing is that this processing capability is polymorphic. This means that many different data structures can be processed by the same query as long as the data values and data names remain the same.
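The polymorphic behavior of schema-free querying can be illustrated with a small Python sketch that walks a structure looking for a data name. The function `find_values` and the two sample structures are assumptions made for this example. The sketch stands in for the "walking through the structure" method and is not a query product's implementation.

```python
def find_values(data, name):
    """Yield every value stored under `name`, wherever it occurs in the structure."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == name:
                yield value
            # Keep walking: the name may also occur deeper in the structure.
            yield from find_values(value, name)
    elif isinstance(data, list):
        for item in data:
            yield from find_values(item, name)

# Two differently shaped structures that use the same data names.
flat = {"dept": [{"dname": "Sales", "ename": "Smith"}]}
nested = {"company": {"dept": [{"dname": "Sales",
                                "emp": [{"ename": "Smith"}, {"ename": "Jones"}]}]}}

# The same request works against both shapes without knowing either one.
print(list(find_values(flat, "ename")))
print(list(find_values(nested, "ename")))
```

The caller names only the data item. Where that item sits in each structure is found by the walk, which is what lets one query serve many structures.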
Another capability that must be in place for schema-free hierarchical processing is Lowest Common Ancestor (LCA) processing which is necessary for multipath processing. Querying hierarchical structures from multiple pathways requires this advanced semantic processing to selectively remove meaningless data results introduced by the hierarchical structure. LCA query processing is not supported today which is why hierarchical processing today is basically limited to linear single path queries.
This presents a problem for schema-free processing, because it requires unlimited navigationless access, and because a user who does not need to know the structure of the data cannot avoid performing multipath queries. This means that schema-free processing must support LCA processing and multipath processing. It also means that schema-free processing is a much more powerful form of processing. This level of multipath processing and its complexity is not practical with user navigation; it becomes possible and consistently accurate with the automatic navigation and processing required by schema-free processing.
For more information on multipath and LCA processing, see my latest article: The Power Behind SQL’s Inherent Multipath LCA Hierarchical Processing at Database Journal.
April 23rd, 2010
This blog entry will demonstrate why SQL’s natural operation is performed at a higher level of processing, giving it advantages over XQuery for automatic processing and greatly increasing its level of nonprocedural processing. The hierarchical structure in Figure 1 is the structure that is to be processed.
The ANSI SQL Left Outer Join’s hierarchical syntax can model hierarchical structures and their hierarchical semantics define their hierarchical processing. This means they can be processed directly by the relational processor to support hierarchical processing. The ANSI SQL Left Outer Join string below in Figure 2 models the structure in Figure 1. It will process the structure left to right going down the left path and then down the right path.
SELECT * FROM A
LEFT JOIN B ON A.a=B.b
LEFT JOIN C ON B.b=C.c
LEFT JOIN D ON A.a=D.d
LEFT JOIN E ON D.d=E.e
The ANSI SQL Left Outer Join string below in Figure 3 also models the structure in Figure 1. It processes the structure width first. Both methods model the same structure. These SQL data structure definitions can be built automatically from metadata sources such as XML Schemas. The SQL hierarchical structure definition is easily extractible to support advanced nonprocedural hierarchical operations.
SELECT * FROM A
LEFT JOIN B ON A.a=B.b
LEFT JOIN D ON A.a=D.d
LEFT JOIN C ON B.b=C.c
LEFT JOIN E ON D.d=E.e
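As a quick check that the two join strings above model the same structure, the following Python sketch runs both against a small in-memory SQLite database. The sample rows are assumptions made for illustration, and the SELECT list names the columns explicitly (rather than using SELECT *) so that both results come back in the same column order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a INTEGER); CREATE TABLE B (b INTEGER);
    CREATE TABLE C (c INTEGER); CREATE TABLE D (d INTEGER);
    CREATE TABLE E (e INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1);
    INSERT INTO C VALUES (1);
    INSERT INTO D VALUES (1), (2);
    INSERT INTO E VALUES (2);
""")

COLUMNS = "A.a, B.b, C.c, D.d, E.e"

# Down the left path, then down the right path.
PATH_BY_PATH = """FROM A
    LEFT JOIN B ON A.a=B.b
    LEFT JOIN C ON B.b=C.c
    LEFT JOIN D ON A.a=D.d
    LEFT JOIN E ON D.d=E.e"""

# Width first: both children of A, then their children.
WIDTH_FIRST = """FROM A
    LEFT JOIN B ON A.a=B.b
    LEFT JOIN D ON A.a=D.d
    LEFT JOIN C ON B.b=C.c
    LEFT JOIN E ON D.d=E.e"""

def run(from_clause):
    """Run one join string with a fixed column list and a fixed row order."""
    sql = f"SELECT {COLUMNS} {from_clause} ORDER BY {COLUMNS}"
    return conn.execute(sql).fetchall()

path_rows = run(PATH_BY_PATH)
width_rows = run(WIDTH_FIRST)
print(path_rows)
print(path_rows == width_rows)
```

Both join orders return the same rows, with NULLs marking the parts of the structure that have no data for a given A occurrence.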
XQuery does not have this data modeling capability. It requires lower level nested looping specified by the user, shown in a simplified XQuery form in Figure 4, to process the structure shown in Figure 1. The looping goes down the left path first and then the right path. This is the same order as the SQL data modeling shown in Figure 2, except that the XQuery hierarchical structure definition cannot easily be determined automatically to support advanced automatic nonprocedural processing.
FOR $A in //A
FOR $B in //B[$A.a=B.b]
FOR $C in //C[$B.b=C.c]
FOR $D in //D[$A.a=D.d]
FOR $E in //E[$D.d=E.e]
Figure 5 shows how XQuery would process the structure in Figure 1 width first, in the same way as the SQL data model processing shown in Figure 3. It differs from the top down processing in Figure 4. This demonstrates that XQuery operates at a lower level, closer to the control flow, while SQL operates at a higher level driven automatically by the hierarchical structure being processed, which makes SQL more nonprocedural.
FOR $A in //A
FOR $B in //B[$A.a=B.b]
FOR $D in //D[$A.a=D.d]
FOR $C in //C[$B.b=C.c]
FOR $E in //E[$D.d=E.e]
More information on comparing SQL’s higher level hierarchical processing to XQuery’s can be found in my recent Database Journal article: SQL’s Optimized Hierarchical Data Processing Driven by its Data Structure located at: