It is important to point out XML's powerful capabilities and features that are not found in conventional, well-known relational or legacy data. By today's standards these capabilities are unconventional and unfamiliar. XML is what is known as a semistructured language: it embeds metadata in the data itself, which enables advanced structures and capabilities.
Semistructured languages have been researched in the academic community for many years. XML is the most recent of them, and it has finally broken out into the commercial data processing community, where it appears well suited to the task. Because it comes from the academic community, it introduces many new ways of processing data and represents a paradigm shift in how data can be processed.
XML introduces many new and unconventional capabilities that current commercial SQL-based products do not address, because these capabilities pose significant problems for SQL and relational data. We believe that many of these unsupported XML capabilities (duplicate element use, element sharing, variable structure formats, node promotion, and fragment processing) are useful and that products should handle them naturally. SQLfX® addresses these capabilities in an SQL-centric fashion. On the other hand, semistructured data used for true markup rather than for database purposes does not pertain directly to database processing; it is best handled by functions that understand markup operations, such as full-text searching.
Until now, the metadata that hierarchically structured data needs for automatic processing was defined externally, which kept hierarchical structures fixed. The one exception was variable occurring or variable length data, which carried metadata in the data in the form of field lengths or occurrence counts. With semistructured data, however, metadata is embedded throughout the data and can specify field types and field names at will. This embedded metadata includes the nesting of the data, which defines the hierarchical structure. As a result, it is possible to define a previously unknown record whose data fields, data types and structure vary dynamically. In practice this does not happen, because no XML database application, utility or query language could process it. It does, however, demonstrate why there are practically no limits to what can be defined in XML, and why no single utility or query processor can handle all of XML's capabilities, which are used primarily for data markup rather than for direct database processing.
Variable Structure Formats
Variable structure formats are an unconventional XML capability that presents a significant problem for conventional processors. A variable structure format occurs when the data structure for a given document type changes from one document occurrence to the next, or even within a single document occurrence. These variable (irregular) structures are difficult for relational databases and other conventional data processors to handle, because those processors were designed primarily for fixed format data.
The XML metadata that enables variable structure formats is new to conventional data processing, but it uses concepts found in other languages (e.g., object databases, and COBOL data definitions that use the DEPENDING ON clause), and XML is now standardizing them. These concepts produce powerful and flexible capabilities that are common in XML and should not be ignored in the long run. This is why our products handle these capabilities and unconventional structures in a seamless, nonprocedural manner, using a new ANSI SQL hierarchical structure view technology that we have developed.
XML processors can support variable structure formats because XML structures are self-defining: the metadata that defines the data, including its structure, is stored along with the data (this is what makes it semistructured). The data structure format can therefore change dynamically, because the metadata that defines it changes to conform to the actual data. For this reason, earlier XML query products often let their query language access metadata directly and act on it procedurally as if it were ordinary data.
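As a minimal sketch of what self-defining structure means in practice, the Python code below reads the structure of two documents directly from their embedded tags and attributes, with no DTD or schema. The `Order` document type and the `paths` helper are hypothetical illustrations, not part of SQLfX®.

```python
import xml.etree.ElementTree as ET

def paths(elem, prefix=""):
    """Collect every element and attribute path found in the data itself."""
    here = f"{prefix}/{elem.tag}"
    found = {here}
    for attr in elem.attrib:  # attribute names are embedded metadata too
        found.add(f"{here}/@{attr}")
    for child in elem:  # nesting defines the hierarchical structure
        found |= paths(child, here)
    return found

# Two occurrences of the same (hypothetical) document type with different structures.
doc1 = ET.fromstring("<Order><Item>Pen</Item></Order>")
doc2 = ET.fromstring('<Order type="gift"><Item>Pen</Item><GiftNote>Hi</GiftNote></Order>')

print(sorted(paths(doc2) - paths(doc1)))  # ['/Order/@type', '/Order/GiftNote']
```

The second occurrence carries a structure the first does not have, and nothing outside the data was needed to discover it.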
SQL can support variable structures in much the same way as COBOL's DEPENDING ON clause, which controls the structure definition based on values in higher levels of the data structure. The SQL ON clause at each join point can test these higher-level values to control how each occurrence of the data structure is generated. We assume that most variable XML data structures will also carry data in higher level fields that indicates which structure is in use. After all, even DOM applications use this technique to navigate variable XML structures.
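The sketch below illustrates the ON clause technique using SQLite (through Python's standard `sqlite3` module) as a stand-in SQL engine. The `Account`, `Person` and `Business` tables and the `kind` indicator field are hypothetical; the point is only that each ON clause tests a higher-level value to decide which lower-level structure is joined in.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Account  (acct_id INTEGER, kind TEXT);  -- kind: 'P' or 'B'
    CREATE TABLE Person   (acct_id INTEGER, name TEXT);
    CREATE TABLE Business (acct_id INTEGER, company TEXT);
    INSERT INTO Account  VALUES (1, 'P'), (2, 'B');
    INSERT INTO Person   VALUES (1, 'Ann'), (2, 'Stale row');
    INSERT INTO Business VALUES (2, 'Acme');
""")

# Each ON clause tests the higher-level kind field, so only the child
# structure that matches the parent's indicator is joined in.
rows = con.execute("""
    SELECT a.acct_id, a.kind, p.name, b.company
    FROM Account a
    LEFT JOIN Person   p ON p.acct_id = a.acct_id AND a.kind = 'P'
    LEFT JOIN Business b ON b.acct_id = a.acct_id AND a.kind = 'B'
    ORDER BY a.acct_id
""").fetchall()
print(rows)  # [(1, 'P', 'Ann', None), (2, 'B', None, 'Acme')]
```

The stale `Person` row for account 2 is excluded because the ON clause sees that account 2 is a business account.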
Duplicate Element Use
This structured view technology is also useful for duplicate element use, which occurs when the same node type (name) appears in multiple locations in the data structure. This is similar to multiple inheritance in object-oriented programming. It means that the parent element sometimes must be identified, by examining its embedded metadata, to know which use of the duplicate element has been located. For example, distinguishing an Address node occurrence for an employee from one for a customer requires identifying the parent type. Our hierarchical structured view technology performs this special processing automatically. In SQL, duplicate element use is also handled by the alias (renaming) capability, which avoids context and ambiguous semantics problems.
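A minimal sketch of parent-based aliasing, assuming a hypothetical document in which `Address` is used under both `Emp` and `Cust`. Each occurrence is renamed by prefixing its parent's type, which is roughly what an SQL alias achieves; this is an illustration, not the SQLfX® implementation.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Root><Emp><Address>12 Elm St</Address></Emp>"
    "<Cust><Address>9 Oak Ave</Address></Cust></Root>"
)

def aliased(root, tag):
    """Rename each occurrence of a duplicate element by prefixing its parent's type."""
    out = []
    for parent in root.iter():  # visits elements in document order
        for child in parent:
            if child.tag == tag:
                out.append((parent.tag + child.tag, child.text))
    return out

print(aliased(doc, "Address"))
# [('EmpAddress', '12 Elm St'), ('CustAddress', '9 Oak Ave')]
```

ElementTree elements carry no parent pointer, so the sketch finds each parent by walking downward from every element instead.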
Node Promotion and Collection
Node promotion is a standard XML operation in which node types (node definitions) can be excluded from the hierarchical data structure simply by not selecting them for output. Descendant node types whose ancestor node types were not selected are then promoted (rolled up) under their closest selected ancestor node type. Our products support this capability naturally: the user controls it simply by not selecting data from the node types to be sliced out of the structure. Node promotion can also cause node collection, in which data from multiple lower level nodes collects directly under a higher level ancestor node.
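The recursive sketch below shows node promotion and the resulting node collection on a small hypothetical document. Node types not in the `keep` set are sliced out, and their selected descendants roll up to the closest selected ancestor. It assumes the root is selected and ignores text belonging to unselected nodes.

```python
import xml.etree.ElementTree as ET

def promote(node, keep):
    """Return the selected nodes at this level, with unselected ancestors sliced out."""
    promoted = []
    for child in node:
        promoted.extend(promote(child, keep))
    if node.tag in keep:
        copy = ET.Element(node.tag, node.attrib)
        copy.text = node.text
        copy.extend(promoted)  # selected descendants hang under this node
        return [copy]
    return promoted  # not selected: descendants roll up to the caller

doc = ET.fromstring(
    "<Dept><Name>Sales</Name>"
    "<Emp><EmpName>Ann</EmpName><Dependent>Tim</Dependent></Emp>"
    "<Emp><EmpName>Bob</EmpName><Dependent>Sue</Dependent></Emp></Dept>"
)

# Emp and EmpName are not selected, so each Dependent is promoted under Dept.
result = promote(doc, {"Dept", "Name", "Dependent"})[0]
print(ET.tostring(result, encoding="unicode"))
# <Dept><Name>Sales</Name><Dependent>Tim</Dependent><Dependent>Sue</Dependent></Dept>
```

The two `Dependent` nodes, which lived under separate `Emp` nodes, are now collected directly under `Dept`.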
XML fragments are selected, isolated portions of an XML document, which may be rooted below the document root. They are formed and grouped together naturally by node promotion, as described above. These fragments can be returned, or moved to other locations in the structure being built for output. Returned fragments can be gathered under a new collection node type specified by the user, or linked (via joining) to a new location in the hierarchical structure being constructed. These capabilities are supported naturally and nonprocedurally in standard SQL, and fragment processing can be used to perform structure transformations.
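As a sketch of fragment collection, the code below copies every `Emp` fragment (rooted below the document root) under a new, user-named collection node and links that collection into a new output structure. The `collect_fragments` helper and the element names are hypothetical, and the sketch works procedurally rather than through SQL.

```python
import copy
import xml.etree.ElementTree as ET

def collect_fragments(root, path, collection_tag):
    """Copy every fragment matching path under a new collection node."""
    collection = ET.Element(collection_tag)
    for fragment in root.findall(path):
        collection.append(copy.deepcopy(fragment))
    return collection

doc = ET.fromstring(
    "<Company><Dept><Emp><EmpName>Ann</EmpName></Emp></Dept>"
    "<Dept><Emp><EmpName>Bob</EmpName></Emp></Dept></Company>"
)

# Link the collected fragments into a new output structure.
report = ET.Element("Report")
report.append(collect_fragments(doc, ".//Emp", "Staff"))
print(ET.tostring(report, encoding="unicode"))
# <Report><Staff><Emp><EmpName>Ann</EmpName></Emp><Emp><EmpName>Bob</EmpName></Emp></Staff></Report>
```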
Shared Element Data
XML physically defines hierarchical structures by nesting elements within elements. XML IDREF specifications identify additional logical pathways, which model logical network structures. In these structures, data nodes (elements) can be reached from more than one path, so their data is shared. This is not a problem for XML products, because they let the user specify navigation manually. SQL's navigation is automatic, however, so a data node that can be reached from multiple paths presents a problem: each path has its own semantic meaning, and different paths can select different sets of data node occurrences. We have developed seamless nonprocedural solutions for IDREFs and the network structures they define using our hierarchical structure view technology. In SQL, shared elements can be handled by the alias (renaming) capability.
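The sketch below shows how two paths to the same shared element select different occurrence sets. ElementTree does not read DTD attribute types, so it assumes that the `id` attribute holds IDs and `projref` holds IDREFs; both names and the document are hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<Company>'
    '<Project id="P1"><Title>Alpha</Title></Project>'
    '<Project id="P2"><Title>Beta</Title></Project>'
    '<Emp name="Ann" projref="P1"/><Emp name="Bob" projref="P1"/>'
    '</Company>'
)

# Assumption: "id" holds IDs and "projref" holds IDREFs.
ids = {e.get("id"): e for e in doc.iter() if e.get("id") is not None}

# Physical path: Company -> Project
physical = [p.findtext("Title") for p in doc.findall("Project")]

# Logical path: Company -> Emp -> (IDREF) -> Project
logical = [(emp.get("name"), ids[emp.get("projref")].findtext("Title"))
           for emp in doc.findall("Emp")]

print(physical)  # ['Alpha', 'Beta']
print(logical)   # [('Ann', 'Alpha'), ('Bob', 'Alpha')]
```

Through the IDREF path the shared Alpha project is reached twice and Beta is never reached, so the two paths have different meanings.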
A possible problem with IDREFs is that they are untyped: an IDREF attribute is not restricted to referencing one element type. This allows the logical pathway to change unpredictably, which cannot be supported in SQL. In practice, however, an IDREF attribute generally references a single element type.
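One way to check whether an IDREF attribute is used predictably is to collect the element types it actually points to; a single type means a fixed logical pathway. This is a hypothetical helper, assuming IDs are stored in an attribute named `id`.

```python
import xml.etree.ElementTree as ET

def idref_target_types(root, ref_attr, id_attr="id"):
    """Return the set of element types that a given IDREF attribute points to."""
    tag_by_id = {e.get(id_attr): e.tag for e in root.iter() if e.get(id_attr) is not None}
    return {tag_by_id[e.get(ref_attr)] for e in root.iter() if e.get(ref_attr) is not None}

typed = ET.fromstring('<Co><Project id="P1"/><Emp ref="P1"/><Emp ref="P1"/></Co>')
untyped = ET.fromstring('<Co><Project id="P1"/><Dept id="D1"/><Emp ref="P1"/><Emp ref="D1"/></Co>')

print(idref_target_types(typed, "ref"))    # {'Project'}
print(idref_target_types(untyped, "ref"))  # two types: the pathway is unpredictable
```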
XPath provides new search, collection, and containment capabilities that support duplicate element use, where the same element type (definition) occurs in multiple places in the hierarchical structure. These search capabilities are designed so that duplicate elements can be searched and collected together even though they exist in different locations in the structure. Our technology supports XPath-style hierarchical linear navigation and filtering on our SQL-modeled data structure, making for a well-integrated fit. Our hierarchical structure view technology allows XML's unconventional features, such as duplicate and shared element data, to operate transparently in standard SQL fashion while also enabling XPath-style navigation and data filtering.
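Python's ElementTree supports only a limited subset of XPath, but that subset is enough to contrast collecting a duplicate element everywhere with selecting one particular use of it. The document below is a hypothetical example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Root>"
    "<Emp><Address>12 Elm St</Address></Emp>"
    "<Cust><Address>9 Oak Ave</Address></Cust>"
    "</Root>"
)

# Descendant search collects every Address, wherever it occurs.
all_addresses = [a.text for a in doc.findall(".//Address")]

# A full path restricts the search to one use of the duplicate element.
emp_addresses = [a.text for a in doc.findall("./Emp/Address")]

print(all_addresses)  # ['12 Elm St', '9 Oak Ave']
print(emp_addresses)  # ['12 Elm St']
```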
Multiple Content Types
There are two basic styles for specifying data in XML, referred to as Element mode and Attribute mode. In Element mode, XML attributes are not used and data is specified only in element content; in Attribute mode, data is specified only in attributes. As you would expect, Mixed content type processing uses both. Mapping from XML to SQL must take the Mixed content type into account very carefully to determine the most efficient representative structure. Mapping from SQL to XML is more straightforward and is driven by the choice of mode. SQLfX® also performs XML to SQL mapping automatically, supporting the Element, Attribute and Mixed content format models.
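A minimal sketch of one possible XML to SQL row mapping that accepts all three modes. The `to_row` helper is hypothetical and only handles leaf children; on a name clash between an attribute and a child element, the element value wins, which is an arbitrary choice made for the sketch.

```python
import xml.etree.ElementTree as ET

def to_row(elem):
    """Flatten one record into a column -> value mapping (hypothetical mapping)."""
    row = dict(elem.attrib)  # Attribute-mode values
    for child in elem:       # Element-mode values (leaf children only)
        if len(child) == 0:
            row[child.tag] = child.text  # on a name clash the element value wins
    return row

element_mode = ET.fromstring("<Emp><Name>Ann</Name><Age>30</Age></Emp>")
attribute_mode = ET.fromstring('<Emp Name="Ann" Age="30"/>')
mixed_mode = ET.fromstring('<Emp Name="Ann"><Age>30</Age></Emp>')

for record in (element_mode, attribute_mode, mixed_mode):
    print(to_row(record))  # the same column -> value mapping for all three
```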
Mixed Content Types and Markup
Another problem area for XML database processing is XML markup used to format and label unstructured text for searching, displaying or printing. The same XML element tag mechanism that names database data fields is also used directly for markup identification tags within a text field. This overloaded use of XML element tags makes it impossible to tell a data tag from a markup identification tag, yet the two need to be distinguishable so they can be processed differently. We believe we have a solution to this problem. The situation arose because XML was used first as a markup language (as its name implies) and only later as a data definition language.
The capability to embed new element text within existing element text is needed to support markup, but using it outside of markup to represent data items and hierarchical data structures quickly becomes confusing. This is called mixed content. The problem arises when an element is embedded inside text instead of after the end of the text. Placing it after the end intuitively indicates that the next level is starting; otherwise the intended meaning is very difficult to interpret and may be application specific. SQLfX® handles this situation by concatenating the pieces of the text into a single text string, as if it had been entered that way.
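The sketch below shows one reading of the concatenation rule, assuming that the "pieces" are the parent's own text before and after the embedded element, while the embedded element itself is treated as the next level. This interpretation and the `outer_text` helper are our assumptions, not documented SQLfX® behavior.

```python
import xml.etree.ElementTree as ET

def outer_text(elem):
    """Join an element's own text pieces, skipping the embedded child elements."""
    pieces = [elem.text or ""]
    pieces.extend(child.tail or "" for child in elem)  # text after each child
    return "".join(pieces)

name = ET.fromstring("<Name>John <Middle>Q</Middle>Smith</Name>")
print(outer_text(name))          # John Smith
print(name.find("Middle").text)  # Q
```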
Text processing is a major aspect of XML, which makes sense since XML grew out of SGML, the same markup tradition that produced HTML. For this reason, text transformation is an important capability to offer with SQL. It involves pattern matching and structure transformation, which relational and legacy data processing lack; XML query products include these capabilities natively. Using SQL's fragment and alias capabilities, our SQL native XML integration technology can nonprocedurally perform fairly sophisticated structure transformations by isolating structure fragments and rejoining them into a different form. We can transform structures using existing relationships in the data (Restructuring), or perform any-to-any structure transformation (Reshaping) without requiring any preexisting relationships in the data.
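As a sketch of Restructuring, the code below inverts a hypothetical Dept/Emp/Proj hierarchy into Proj/Emp using the existing Emp-Proj relationship. It is written procedurally in Python for illustration; the text above describes doing this nonprocedurally in SQL through fragments and aliases.

```python
import xml.etree.ElementTree as ET

def reshape(root):
    """Invert Dept/Emp/Proj into Proj/Emp using the existing Emp-Proj relationship."""
    out = ET.Element("Projects")
    by_proj = {}
    for emp in root.iter("Emp"):
        for proj in emp.findall("Proj"):
            name = proj.get("name")
            if name not in by_proj:  # first occurrence creates the new parent
                by_proj[name] = ET.SubElement(out, "Proj", name=name)
            ET.SubElement(by_proj[name], "Emp", name=emp.get("name"))
    return out

doc = ET.fromstring(
    '<Depts><Dept name="D1">'
    '<Emp name="Ann"><Proj name="P1"/></Emp>'
    '<Emp name="Bob"><Proj name="P1"/><Proj name="P2"/></Emp>'
    '</Dept></Depts>'
)

result = reshape(doc)
print([(p.get("name"), [e.get("name") for e in p]) for p in result])
# [('P1', ['Ann', 'Bob']), ('P2', ['Bob'])]
```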
Even when SQL systems support native XML as a built-in XML column type, directly joining an XML column with another XML or SQL column is not supported; because the XML column has a complex hierarchical structure, the join must be performed through an XML-aware function. This is not a problem for our SQL-based XML integration technology, which supports hierarchical data modeling at the SQL join level. Relational data can be linked directly into an XML document at any specific location, and two XML documents of different types can be linked directly by specifying the precise link points and which structure is to be the higher level structure. The technology even supports linking to the lower level structure below its root node for added flexibility. This unrestricted joining of nonlinear hierarchical structures into a combined hierarchical structure by SQLfX® is a data mashup, and it can be performed dynamically.
Dynamic Document Processing
Another aspect of XML that will remain extremely important and will continue to raise the level of data processing is its dynamic nature. XML documents can easily be created and processed on the fly, with or without a DTD or schema, and we have already seen how the data structure itself can be dynamic. This opens the door to a new level of advanced capabilities based on dynamic processing. Our technology operates dynamically and supports facilities such as dynamic document processing, which will allow our products to adapt easily and quickly to these new data processing capabilities.
When SQL views are combined, the same naming conflicts occur as when XML documents are combined. Just as XML resolves naming conflicts with namespace prefixes, SQL's high level name prefix (the table or view name) can be used when referring to data. SQL alias names can also be assigned to abstract away the use of the high level prefix.
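The sketch below uses SQLite (through Python's `sqlite3` module) to show both techniques on two hypothetical views that each expose a `Name` column: the view-name prefix resolves the conflict inside the query, and the aliases hide the prefix from the consumer of the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Dept (dept_id INTEGER, Name TEXT);
    CREATE TABLE Emp  (emp_id INTEGER, dept_id INTEGER, Name TEXT);
    INSERT INTO Dept VALUES (10, 'Sales');
    INSERT INTO Emp  VALUES (1, 10, 'Ann');
    CREATE VIEW DeptView AS SELECT * FROM Dept;
    CREATE VIEW EmpView  AS SELECT * FROM Emp;
""")

# Both views have a Name column: the view-name prefix removes the ambiguity,
# and the aliases hide the prefix from whoever consumes the result.
cur = con.execute("""
    SELECT EmpView.Name AS EmpName, DeptView.Name AS DeptName
    FROM EmpView JOIN DeptView ON EmpView.dept_id = DeptView.dept_id
""")
columns = [col[0] for col in cur.description]
rows = cur.fetchall()
print(columns)  # ['EmpName', 'DeptName']
print(rows)     # [('Ann', 'Sales')]
```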
Recursive structures are a special case of hierarchical structures in which a piece of the structure doubles back on itself. This can repeat indefinitely as needed, as in a parts explosion where parts contain other parts. XML supports complex recursive substructures easily in its hierarchical structure. SQL supports recursive structures in its flat, data-controlled hierarchies through a recursive operation: a loop repeatedly accesses the related rows and UNIONs them back into the result, where they in turn go through the same process. Recursive indicators (such as a level number) are inserted into rows to help with processing the result. The limitation is that SQL recursive hierarchies have until now consisted of a single node type, with the hierarchy controlled by data. The SQL hierarchical data type hierarchy documented on this web site changes this, opening up more possibilities, including a combination of both methods.
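A minimal parts explosion using a standard recursive common table expression, run on SQLite through Python's `sqlite3` module. The `Part` table and its rows are hypothetical, and the `level` column plays the role of the recursive indicator described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Part (part TEXT, subpart TEXT);
    INSERT INTO Part VALUES ('Bike', 'Wheel'), ('Bike', 'Frame'), ('Wheel', 'Spoke');
""")

# The anchor row starts the loop; each pass joins the rows found so far to
# their subparts and UNIONs them back in. level is the recursive indicator.
rows = con.execute("""
    WITH RECURSIVE explode(part, level) AS (
        SELECT 'Bike', 0
        UNION ALL
        SELECT p.subpart, e.level + 1
        FROM Part p JOIN explode e ON p.part = e.part
    )
    SELECT part, level FROM explode ORDER BY level, part
""").fetchall()
print(rows)  # [('Bike', 0), ('Frame', 1), ('Wheel', 1), ('Spoke', 2)]
```

Every row in the result has the same single node type (a part); the hierarchy exists only in the data values, which is the limitation the paragraph above describes.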
XML data is considered ordered by default, and SQLfX® maintains this ordering unless it is overridden. SQL data is not ordered by default; it can only be ordered after it is retrieved into the result set, so the ordering is not persistent. Another problem is that because relational data is flat, even when it holds a multi-leg structure only one leg can be ordered in the result set: in a single flat structure, ordering one leg shuffles all the other legs. SQLfX® solves this by seamlessly supporting separate ordering of each leg, assisted by its post processing. Ordering hierarchical data against the structure can also change the hierarchical structure, and separate leg ordering solves this problem as well.
Conclusion
It is interesting and important to note that all of the unconventional XML capabilities above can be specified in XML, but the applications that process this XML must still handle these advanced capabilities and structures themselves. This means that not all documents can be processed by standard XML processors such as XSLT and XQuery, and even the standard XML parsing interfaces, DOM and SAX, can return different results. Fortunately, because our products can deliver the result in structured record format or in memory, current non-XML applications and new applications can seamlessly access XML, with all of its capabilities, using simple SQL requests. Many of these capabilities, such as structure transformations and variable structures, normally require very complex code to process, but not with SQLfX®.