Glossary of SQL/XML Integration and Hierarchical Terms
The terminology defined here is used in this web site and pertains to its use in this web site. Since many of the technology areas covered in this web site are new, many of the terms listed were coined to help describe this new technology.
Access path: The access path refers to a navigation path in a hierarchical structure from the root node of the structure to the node requiring access. This path must be followed when accessing a node by accessing each node along the path to the required node in order to maintain the semantics of the data structure.
Ad hoc query: An ad hoc query is a database query that can be specified interactively, often to answer an unanticipated question. This means that the database query does not need to be predefined to the database system processing the query. In relational systems, this requires dynamic SQL query processing.
ADT: An ADT is an Abstract Data Type. This is an SQL3 feature that allows user-defined complex data types to be defined in SQL. They can contain an internal structure of multiple values that can be hierarchically related, and the C code logic that knows how to operate on it.
Aggregate data: Data that is the result of applying a process to combine data elements collectively or in summary form.
Alternate key: An alternate key is a column or field in a relational table or record that can also be used as the key besides its primary key. As such, this key probably is not unique among other rows or records in the table or file. A foreign key can be considered an alternate key. The alternate key is usually the "many" side of a one-to-many relationship.
Ambiguous semantics: Semantics are about meaning. Ambiguous semantics are semantics that have more than one possible meaning. These meanings can be conflicting. Semantics should be singular in meaning to be most useful.
Ambiguous structures: Data structures such as network structures have ambiguous semantics when used to represent a singular view of the data. These structures do not have a singular meaning because data values in the structure can usually be reached from multiple paths, with each path representing a different semantics or meaning. Procedural structure navigation is necessary to get reliable results in these cases.
Hierarchical structures are unambiguous because they have only one path to each data value.
Ancestor nodes: Ancestor nodes are nodes that are further up the path from their related descendent node. Like a parent node, an ancestor node controls the existence or range of processing of the nodes under it.
ANSI: American National Standards Institute.
API: Application Programming Interface. SQL is an application programming interface for relational databases.
Application view: The application view is how the application visualizes the structure of the database. This structure should be hierarchical because hierarchical structures are singular (unambiguous) in meaning. This enhances the usefulness of the data structure semantics. With application views, applications can share views, and databases can support many different views.
Association table: Association tables are used in relational databases to maintain many-to-many data relationships such as Parts/Suppliers. This relationship can operate in either direction as a one-to-many relationship: Part over Suppliers or Supplier over Parts. Both directions cannot be maintained with just the Parts and Suppliers tables, so an association table is used between the Parts and Suppliers tables to maintain the one-to-many relationships in both directions when performing the necessary joins.
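As a minimal sketch of the Parts/Suppliers case, the following uses Python's built-in sqlite3 module with an in-memory database. The table names (part, supplier, part_supplier), the column names, and the sample rows are illustrative assumptions, not taken from this web site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_id INTEGER PRIMARY KEY, part_name TEXT);
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, supplier_name TEXT);
    -- Association table: one row per (part, supplier) pairing.
    CREATE TABLE part_supplier (part_id INTEGER, supplier_id INTEGER);
    INSERT INTO part VALUES (1, 'bolt'), (2, 'nut');
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Best');
    INSERT INTO part_supplier VALUES (1, 1), (1, 2), (2, 1);
""")

# Part over Suppliers: every supplier of one part.
suppliers_of_bolt = [row[0] for row in conn.execute("""
    SELECT s.supplier_name
    FROM part p
    JOIN part_supplier ps ON ps.part_id = p.part_id
    JOIN supplier s ON s.supplier_id = ps.supplier_id
    WHERE p.part_name = 'bolt'
    ORDER BY s.supplier_name
""")]

# Supplier over Parts: every part of one supplier, through the same table.
parts_of_acme = [row[0] for row in conn.execute("""
    SELECT p.part_name
    FROM supplier s
    JOIN part_supplier ps ON ps.supplier_id = s.supplier_id
    JOIN part p ON p.part_id = ps.part_id
    WHERE s.supplier_name = 'Acme'
    ORDER BY p.part_name
""")]

print(suppliers_of_bolt)  # ['Acme', 'Best']
print(parts_of_acme)      # ['bolt', 'nut']
```

The same part_supplier rows serve both directions; only the join order changes.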
Associative operation: An associative operation is one whose execution order can be changed, as long as the physical ordering of the operands is not altered, without changing the result. This is usually tested with the aid of parentheses. Addition and multiplication are associative; subtraction and division are not. For example, with multiplication: (5*2)*4 equals 5*(2*4), while with subtraction: (5-3)-1 does not equal 5-(3-1). Building a hierarchical structure is associative because hierarchical structures can be built top-down, bottom-up, and in any order.
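The parentheses test can be checked directly. This small Python sketch regroups the same operands in both ways; the variable names are only illustrative.

```python
# Same operands, same left-to-right order; only the grouping changes.
mul_left = (5 * 2) * 4
mul_right = 5 * (2 * 4)
sub_left = (5 - 3) - 1
sub_right = 5 - (3 - 1)

print(mul_left == mul_right)  # True: multiplication is associative
print(sub_left == sub_right)  # False: subtraction is not
```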
Atomic value: A basic value that is not composed of other classifiable parts.
Attribute: XML attributes are used to provide additional information about elements. XML attributes are specified by user-specified keyword names. Also see Element for an explanation. Attribute used in a relational sense refers to the columns in a relational table.
Attribute based or Attribute content: This is when an XML element contains only attribute data, no element text.
B2B: Stands for Business-to-Business. A method of describing the processing of business transactions over the Internet.
Base table: Normalized detailed table, not a view or a processed table.
Blob: A Blob is a relational column type used to hold Binary Large OBjects that can be composed of any type of data and is used mainly for storage. For example, it can store a native XML document. It is not meant to be processed directly by SQL inherent operations. It can be processed by User Defined Functions.
Bottom-up processing/execution: Bottom-up processing of Outer join hierarchical structures involves their construction by building them from the bottom of the structure upwards. This can change the normal table join order but does not affect the result since hierarchical structures can be built from the bottom up or top down. Top down may be more efficient because it avoids throwaway data.
Bushy query: A bushy query is a query that accesses and/or processes multiple legs of the hierarchical data structure being processed.
Business rules: The operational rules of a business can be embedded into the database using stored procedures and triggers. Triggers can turn the database into an active database by having it automatically act on the rules by invoking the stored procedures. This process can be further enhanced by the hierarchical data filtering capability of the Outer join which allows the database to better represent the rules with a finer data filtering capability.
Candidate key: A combination of table attributes that uniquely identifies each record within a table. Typically used when there is no unique key for the table.
Cardinality: Cardinality is a relational term for the number of rows in a table or result.
Cartesian product: A Cartesian product is the result when two relational tables are joined without, or before, applying join criteria. Each row of one table is joined with every row of the other table, creating all combinations. For this reason, the result is referred to as being exploded.
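A small sketch of the "exploded" result, using Python's built-in sqlite3 module. The dept and emp tables and their rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_name TEXT);
    CREATE TABLE emp (emp_name TEXT);
    INSERT INTO dept VALUES ('Sales'), ('Support');
    INSERT INTO emp VALUES ('Ann'), ('Bob'), ('Cy');
""")

# No join condition: every dept row is paired with every emp row.
rows = conn.execute(
    "SELECT dept_name, emp_name FROM dept CROSS JOIN emp"
).fetchall()

print(len(rows))  # 6, that is 2 rows x 3 rows
```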
Cascading delete: When a hierarchical node occurrence is deleted (filtered out), all of its dependent node occurrences are removed also. Also see Hierarchical data preservation for a complete description.
CDATA: The XML CDATA type construct specifies an escape block for an Element that specifies that the indicated text data should not be parsed because it has special characters or requires special processing.
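As an illustration, the sketch below parses a CDATA block with Python's standard xml.etree.ElementTree module, which is only one of many XML parsers. The rule element and its text are made-up sample data.

```python
import xml.etree.ElementTree as ET

# The characters < and & would be markup errors outside a CDATA block.
doc = "<rule><![CDATA[IF a < b && c > d THEN <skip/>]]></rule>"
rule = ET.fromstring(doc)

print(rule.text)        # IF a < b && c > d THEN <skip/>
print(len(list(rule)))  # 0: <skip/> was kept as text, not parsed as an element
```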
Centralized database: A centralized database takes the opposite approach to database management from federated databases. While federated databases access their data from remote heterogeneous databases when needed, centralized databases contain all the data they may require. Each has its own strong points and weak points.
CGI: Common Gateway Interface, a standard for external gateway programs to interface with information servers such as HTTP servers.
Child: A child is the next lower level table, or node in the data structure which follows the path downward. There can be multiple children definitions for a parent node definition, each one on a separate path from the parent. In a hierarchical structure, children data occurrences can not exist without an active parent data occurrence.
CLOB: Character Large Object column type in a relational database.
Coalescing: Coalescing is the inspection of key values under the same domain to return a single valid key value representing a valid non-null value among them. This has special significance for Outer joins, where null key values can be produced because of their data preserving ability. Coalescing can identify a key field among multiple keys when at least one non-null key value is present, so that it can be used as the only key field. This avoids multiple key fields, some valid and some not, and multiple key locations to check. The NATURAL option and the USING option of the ANSI SQL outer join perform this capability.
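The effect can be sketched with the SQL COALESCE function, run here through Python's built-in sqlite3 module. The tables a and b and their rows are hypothetical. The full outer join is emulated with two LEFT joins and a UNION so that the sketch does not depend on FULL OUTER JOIN support in the installed SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (k INTEGER, a_val TEXT);
    CREATE TABLE b (k INTEGER, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# Either a.k or b.k can be null after outer joining; COALESCE returns
# whichever one is non-null, so the result has a single usable key column.
rows = conn.execute("""
    SELECT COALESCE(a.k, b.k) AS k, a.a_val, b.b_val
    FROM a LEFT OUTER JOIN b ON a.k = b.k
    UNION
    SELECT COALESCE(a.k, b.k) AS k, a.a_val, b.b_val
    FROM b LEFT OUTER JOIN a ON a.k = b.k
    ORDER BY 1
""").fetchall()

print(rows)  # [(1, 'a1', None), (2, 'a2', 'b2'), (3, None, 'b3')]
```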
Collection: See Document collection or Node collection.
Column: A column is a relational term for a data field that is defined in a table. It usually holds an atomic (single) value but in post relational databases it can hold nested tables and even native XML. A relational column is also known as an attribute (not to be confused with an XML attribute).
Control data: A data item in the data range of the query whose value influences the operation of the query.
COM: Component Object Model, the Microsoft paradigm to connect components.
Common ancestor: A common ancestor refers to the next higher level node in the data structure that is a common link point of two sibling legs of a data structure. This condition can also be referred to loosely as a common parent. Common ancestors play an important role in determining semantics across sibling legs of the structure. Also known as Lowest Common Ancestor (LCA).
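One simple way to find the common ancestor of two nodes is to walk both paths toward the root and take the first node they share. The sketch below assumes a hypothetical structure stored as a child-to-parent map; the node names are illustrative only.

```python
# Hypothetical structure: each node maps to its parent (the root maps to None).
PARENT = {
    "Dept": None,
    "Employee": "Dept",
    "Project": "Dept",
    "Dependent": "Employee",
    "Task": "Project",
}

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def common_ancestor(a, b):
    """Return the lowest node that lies on both paths to the root."""
    seen = set(path_to_root(a))
    for node in path_to_root(b):
        if node in seen:
            return node
    return None

# Dependent and Task sit on sibling legs that meet at Dept.
print(common_ancestor("Dependent", "Task"))  # Dept
```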
Common parent: See Common Ancestor.
Commutable joins: Are joins that can change order or be replaced with other joins to produce the same results.
Commutative operation: A commutative binary operation is one where its two input arguments can be switched around without affecting the results. Addition and multiplication are commutative; subtraction and division are not. For example, with addition: 5+6 equals 6+5, while with subtraction: 4-2 does not equal 2-4. Symmetric outer join operations such as the Full outer join are commutative.
Complex data modeling: Complex data modeling used in the context of this web site applies to the ability to construct hierarchical data structures that contain multiple legs by using the Outer join operation. Multiple legs add another level of capabilities and complexity to the principles involved in defining data structures with the Outer join and to the semantics associated with the data structure.
Composite key: A composite key is a key that is comprised of multiple columns from the same table. It is usually used when it is necessary to construct a unique key value when no single column in the table represents a unique key.
Concatenated key: A concatenated key is a composite key that represents the keys going down a path of the data structure so that the path is identified and can quickly be re-navigated.
Conceptual view: A conceptual view is a view or schema that defines all possible data and the valid relationships they comprise in a database so that all required application views can be defined from it. As such, a conceptual view requires a network structure to define it because of the high probability of converging paths. A conceptual view sits between the internal and external views and acts as an automatic level of abstraction between the two.
Containment: Hierarchical data structures and the operations on them enable the powerful use and control of containment for node data. Data ranges can be easily selected and grouped together. XPath makes a very powerful use of containment and duplicate element use where the same (named) element types within specified hierarchical ranges of the structure are collected together.
Content model: There are three content models when XML elements are declared. These are data content, element content, and mixed content. With data content, elements can specify text data, but cannot contain sub elements. With element content, an element can only specify sub elements and optional rules for their use. With mixed content, elements can specify both text data and sub elements (in this case, without any rules).
Conventional data structures: Conventional data structures are structures that are in common business use. These include relational, flat, and fixed format hierarchical structures. Semistructured data, which is new to business, includes the capability to define structures that are still considered unconventional; these include structures that can have dynamically varying structure formats.
CORBA: Common Object Request Broker Architecture: CORBA is an architecture and specification for creating, distributing, and managing distributed program objects in a network.
Cousins: As used in this web site, cousins are nodes that are not directly related to other nodes on the same active path, but are related indirectly by a common ancestor data occurrence. This means that every node in the hierarchical structure is related directly or indirectly to each other.
Cross join: The Cross join is one of the ANSI-92's SQL join types. It creates a basic non restricted Inner join Cartesian product result and as such, it does not use or require a join condition, so no ON or USING clause is used with it.
Dangling tuple: Dangling tuples are the rows that are not matched in join operations. With Inner joins they are discarded, with Outer joins they can be preserved in the result by padding their unmatched row side with null values.
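The difference can be seen by running the same join as an Inner join and as a LEFT Outer join. This sketch uses Python's built-in sqlite3 module; the dept and emp tables and their rows are made-up sample data in which the Research department has no employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp (emp_name TEXT, dept_id INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO emp VALUES ('Ann', 1), ('Bob', 1);
""")

query = """
    SELECT d.dept_name, e.emp_name
    FROM dept d {join} emp e ON e.dept_id = d.dept_id
    ORDER BY d.dept_id, e.emp_name
"""
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()
outer_rows = conn.execute(query.format(join="LEFT OUTER JOIN")).fetchall()

print(inner_rows)  # [('Sales', 'Ann'), ('Sales', 'Bob')]
print(outer_rows)  # [('Sales', 'Ann'), ('Sales', 'Bob'), ('Research', None)]
```

The unmatched Research row is the dangling tuple: the Inner join discards it, while the Outer join keeps it and pads the employee side with a null (None in Python).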
Data abstraction: Data abstraction is the ability to hide the complexity of the data. In this web site, a good example would be a stored structured data view whose use helps hide the complexity of the hierarchical data structure.
Data centric/oriented: Are documents usually processed by software. They are highly hierarchically structured.
Data content model: See content model.
Data definition: A data definition is a definition of the characteristics of data in the database. This includes but is not limited to the data type, size, number of occurrences and structure relationships of the data to other data in the database. XML DTDs and Schemas can be classified as data definitions.
Data filtering: Data filtering is the dynamic process of selectively removing undesired data from the query result based on the values in the data. It is specified on the WHERE or ON clause, but is not considered part of the join criteria. The data filtering process operates differently when specified on the WHERE clause than the ON clause. The ON clause data filtering offers a much finer level of data filtering that follows the hierarchical structure defined from the point it is defined. The WHERE clause filters from the root node down (in relational terms this means filtering entire rows instead of portions of rows).
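The following sketch places the same filter condition first on the ON clause and then on the WHERE clause of a LEFT Outer join. It uses Python's built-in sqlite3 module; the tables, the salary column, and the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
    CREATE TABLE emp (emp_name TEXT, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO emp VALUES ('Ann', 1, 60), ('Bob', 1, 40), ('Cy', 2, 30);
""")

# Filter on the ON clause: only the emp side is filtered, so every
# department is still preserved in the result.
on_rows = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM dept d
    LEFT OUTER JOIN emp e ON e.dept_id = d.dept_id AND e.salary > 50
    ORDER BY d.dept_id
""").fetchall()

# Filter on the WHERE clause: entire joined rows are removed.
where_rows = conn.execute("""
    SELECT d.dept_name, e.emp_name
    FROM dept d
    LEFT OUTER JOIN emp e ON e.dept_id = d.dept_id
    WHERE e.salary > 50
    ORDER BY d.dept_id
""").fetchall()

print(on_rows)     # [('Sales', 'Ann'), ('Research', None)]
print(where_rows)  # [('Sales', 'Ann')]
```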
Data independence: Data independence is the characteristic that enables data to be easily combined into a usually unlimited number of different structures. Without this property, data cannot easily be combined to form different combinations of data. This property requires the normalizing of relational data by breaking it up into multiple tables following the rules of normalization.
Data inheritance: Data inheritance is the process of acquiring characteristics from an object that is included in another object. In the case of an Outer join structured view, this involves inheriting the data structure when it is included in a structured view of another structure being constructed so that the structured views can be combined.
Data integration: Combining dispersed data for analytic purposes from multiple heterogeneous systems.
Data modeling: Data modeling is the ability and process of specifying and constructing complex data structures that represent specific semantics. In SQL, this can be performed with the ANSI-92 LEFT Outer join operation which can inherently define and process complex data structures.
Data occurrence vs. data type: A data type is how and where the data is defined in the database and a data occurrence is an actual occurrence of the data in the database. A data occurrence may also be known as a data instance, a term that is more associated with data objects that contain more information than just a data occurrence.
Data partition: Breaking tables into multiple tables for different purposes. This can be done vertically or horizontally. Vertically, the columns of each row are split across multiple tables. Horizontally, the rows of a table are split based on some data value or range, such as names starting from A to F in one table and G to M in another table, or by office location.
Data record: See Record.
Data segment: See Node.
Data Structure Extraction (DSE) technology: The DSE patented technology defines the process for extracting the data structure metadata from Outer joins that define data structures. This metadata contains a detailed description of the data structure from which powerful and useful semantics can be derived.
Data structure metadata: Metadata, also known as meta information, is information about information. Data structure metadata is information about the data structure such as a detailed description of its structure and data relationships.
Data structure processing: The ability of a database query to process a complex data structure by following the semantics of its data structure applied with the semantics of the query.
Data type vs. data occurrence: A data type is how the data is defined in the database and a data occurrence is an actual occurrence of the data in the database. Each data type can have unlimited data occurrences.
Data warehouse: A data warehouse is an out of production storehouse of a company's past and present data used for performing all forms of analysis. For this reason, this data needs to be combined (data modeled) in infinite ways, and processed in an ad hoc, interactive manner.
Database record: See Record.
Database navigation: See Navigation.
DDL: Data Definition Language.
Declarative language: See Non procedural language.
Degree: Number of relational attributes (columns) in a table or result set.
Denormalization: Denormalization is the process of pre-joining normalized data and saving the result as a denormalized table. This is a deliberate data design decision. This avoids the overhead of performing the join operation each time the query that uses the denormalized data is used. Denormalization is performed for efficiency purposes, and puts the data in an unnormalized form which is the form the data would naturally have from the join processing regardless of how it was stored. The disadvantage of denormalized data is that its data independence is lost and the data can become stale.
Derived data: Derived data, as its name implies, is data derived from some process or calculation. When data is retrieved, derived data is computed from it after retrieval and placed in the input buffer as if it had been retrieved directly. For example, a birthday could be converted to an age. This is a good example because age is constantly changing.
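The birthday example can be sketched as a small Python function. The function name age_on and its parameters are illustrative assumptions; the point is that the age is computed at retrieval time from the stored birthday rather than being stored itself.

```python
from datetime import date

def age_on(birthday, today):
    """Derive an age in whole years from a stored birthday."""
    years = today.year - birthday.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (today.month, today.day) < (birthday.month, birthday.day):
        years -= 1
    return years

stored_birthday = date(2000, 6, 15)
print(age_on(stored_birthday, date(2024, 6, 14)))  # 23
print(age_on(stored_birthday, date(2024, 6, 15)))  # 24
```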
Derived table: See Temporary table.
Descendent node: A descendent node is a node that is further down the path from the related node.
Directed Graph: A one-way graph, like a hierarchical structure, which is only navigated top down.
Dirty data: Data that is or has become missing, inconsistent or erroneous.
Disparate heterogeneous database access: Disparate heterogeneous database access is the accessing of very different types of physical databases, possibly from different vendors, as if they were one logical database. Disparate heterogeneous database access includes intermixing different database types in the logical view such as both relational and non relational databases. This is how federated databases operate.
Document: See XML document.
Document centric/oriented: Are documents usually processed by humans directly. They are not highly hierarchically structured.
Document collection: A document collection is a grouping of document occurrences (all of the same document type) that have been placed under an added root node we will call a collection node. This is like an XML data set for documents. An advantage is that the collection node can contain additional information about the document collection such as customer name ranges and/or the effective date range for the documents in the collection.
Document round tripping: This is when a native XML document is stored relationally by shredding and reconstructed later for output as native XML. The native XML document should remain identical. When a document is deconstructed and then reconstructed, it often will not be exactly identical even though it may be semantically identical. This is often because of how white space is treated.
DOM: DOM is the Document Object Model API. A DOM processor is used to access, parse, store and retrieve tokens from an XML document. There are other APIs such as SAX that can be used to access native XML documents.
DOM tree: Is the entire document internal hierarchical structure produced when DOM accesses the next document occurrence. This can be many times the size of the actual document native occurrence.
Domain: Domain in relational terms usually applies to columns in one or more tables that have the same use and meaning. For example, relating two tables implies that their join columns are in the same domain. The domain also specifies the valid values or ranges the data can have.
DRDA: Distributed Relational Database Architecture, developed by IBM.
DSE technology: See Data Structure Extraction technology.
DTD: A DTD is a Document Type Definition that contains rules and metadata about a specific class of XML documents. A newer, more powerful XML data definition, the Schema, can also be used instead of the DTD. These data definitions help with the processing of XML documents, but are not absolutely necessary or required.
Duplicate data: In this web site, duplicate data is real data that naturally occurs. The term Replicated data describes data that is replicated because of operations applied to the data such as joining tables or flattening hierarchical structures.
Duplicate element usage: One of XML's advanced hierarchical capabilities allows a given Element type to be specified as a node in more than one location in the hierarchical structure. This is similar to object sub classes such as an Address class being used as a sub class in both Customer and Employee super classes. These duplicate named element types will show up in multiple locations of an XML hierarchical structure causing ambiguity problems for navigationless query languages such as SQL. On the other hand, this duplicate element usage enables new capabilities such as node search and collection features specified in XPath specifications. These duplicate element usages cause the data occurrences they define to be stored in the XML document at the location they are defined. This means addresses for customers are stored separately from addresses for employees.
Dynamic: A modifier for a database operation indicating that the operation it modifies can be performed dynamically or in an ad hoc fashion, such as a dynamic query or a dynamic joining of structures.
Dynamic path shortening: Dynamic path shortening is a database access optimization. It is used in Outer join processing where the active access path can be dynamically shortened at the first path node position where missing data is encountered. This is significant to the Outer join operation since missing data is not usually a reason to stop processing with the Outer join.
Dynamic rebuild/rewrite: Dynamic rebuild or rewrite is an SQL optimization where the SQL Query can be dynamically rewritten at the time of execution to take advantage of the latest features in the SQL system. With the Outer join containing meta information about the data structure being processed, there are significant possibilities for semantic optimizations to be applied dynamically. These include applying powerful new SQL3 features as they become available in the SQL processor.
Dynamic SQL specification: Dynamic SQL specification is the ability to build SQL query statements at run time. This enables SQL queries to be specified in an ad hoc, interactive fashion, not requiring pre-definition. This capability is automatically extended to data modeling, hierarchical structure processing, and hierarchical structure joining capabilities made possible by the ANSI-92 Outer join operation.
EAI: Enterprise Application Integration.
EDI: Electronic Data Interchange, contact between companies exchanging orders via the intranet or Internet.
EII: Enterprise Information Integration seeks to avoid moving large amounts of data by dynamically modeling and accessing virtual or federated databases in real-time. Other advantages include fresher data, access to real time data, and the ability to perform unanticipated queries.
Element: XML Elements define data in two ways, using a start and stop tag name which contains a text string that can also contain sub elements, and also through attributes which are name and value pairs. Either or both can be used unless restricted by a DTD or Schema. The tag names can be used to name the data values or act as markup in the text.
Element based or element content: With Element based content, only XML Element text is used to specify data, no attributes are used.
Element content model: See Content model.
Element sharing: This occurs when an XML IDRef is used in a document occurrence. It logically creates a network structure because it adds an alternate logical path to an element occurrence.
Embedded structure: A structure or fragment that is contained within another usually physical structure.
Embedded views: Embedding SQL views is the capability to nest views by placing views within views. This nesting capability seamlessly supports hierarchical structured views containing data structures defined by the Outer join operation. When expanded, the SQL will define the combined unified structure.
Empty element tag: An empty element tag represents an element declaration that cannot contain data; it has no start and stop tag. This is different than a regular element instance that does not contain any data. These empty element tags are often used as flags and may be treated differently than an element that contains no data. An empty element tag is represented as <tagname/>.
End tag: A matching tag for an XML start tag represented as </tagname>. It closes the definition of the current element occurrence.
Enterprise access: Enterprise access is the ability of an application or database system to access all databases in the corporate enterprise regardless of the database types or database locations involved.
Enterprise data: Data that is used or can be used across the entire corporation.
Enterprise modeling: The development of a consistent view of the data and its relationships across the enterprise.
Entity: An XML Entity specification operates like an include operation that brings different forms of data such as text or pictures into a document. For text, this is useful for boiler-plate material that exists in multiple locations in a single document occurrence or exists in many document occurrences or document types. For use in a single document occurrence the text can be defined once in the document and referred to multiple times.
Entity relationship diagram: An entity relationship diagram is a network structure diagram that depicts all of the data entities, their relationships, and their relationship types (i.e. one-to-many, many-to-one, many-to-many) in a database.
Equal join: An equal join is just that, a relational join that uses an equality operation to relate the tables. An equal join is also known in relational terms as an equi-join.
Equi-join: An equi-join is a fancy term for an equal join which is a relational join that uses an equality operation to relate the tables.
ETL: Extract, Transform and Load are utilities for accessing, converting and loading massive amounts of data. The newest ETL products are designed to convert and move relational data sources to XML sources, and XML sources to relational sources. This involves shredding (flattening) the XML data.
Existential qualifier: An existence test such as IF ANY NodeX THEN ...
This operation can become more controlled and useful with hierarchical processing by specifying the upper hierarchical structure bound, such as IF NodeY HAS ANY NodeX THEN ...
Expanded views: Expanded views are embedded views whose name reference is replaced with its representative source code so that the query can be processed (parsed). When structured views are expanded, they automatically form a unified hierarchical view that uniformly models the hierarchical structure being processed.
Extended Cartesian product: A relational Cartesian product produces all combinations of rows from two relational tables. An extended Cartesian product as used in this web site operates by augmenting each table with an all null row that is joined when no other row is matched when performing the Cartesian product. This result reflects the operation and semantics of Outer join operations.
Extendibility: The ability to easily add new functions and capabilities to software.
External entity: Part of an XML document that is not contained in the document, but is referred to via a URI.
External view: An external view is one of the three types of views that comprise the three tier model for database architecture. These being the internal, external and conceptual views. The external view is the view that the application and user of an application has of the database. For this reason, it is also known as the application view. With application views, applications can share views, and databases can support many views.
Federated database: A federated database accesses the data from other databases when the data is needed. This is the opposite of a centralized database system. Also see Disparate heterogeneous database access.
Field: A field in relational terms is a column in a table. It holds an atomic value.
First normal form: First normal form doesn't permit relational tables to contain repeating data types or groups in a single row. Repeating data should be placed in another table where each occurrence of the repeating data is placed in a different row. This allows a table to be a flat two dimensional structure. First normal form is not a prerequisite for good database design; it is only required for relational databases and their flat tables. Also see Non first normal form.
Fixed-occurring fields: Fixed-occurring fields are data fields that can occur multiple times in a record. They are fixed because the amount of space required to contain them is reserved in the record whether it is used or not. This means that a fixed-occurring field can contain a variable number of data fields, but is still considered fixed because it always uses the same fixed amount of storage space and can not exceed the maximum space allocated for it.
Flat file: A flat file is a file that has the same fixed, unvarying format for each record. It has no variable-occurring fields, but can have fixed-occurring fields. In this way, each record is of the same length. A flat file can be thought of as a relational table, with each of its fixed records as a row of the table.
Flat structure: A flat structure is a two dimensional data structure. It has no variable-occurring fields, but can have fixed-occurring fields. In this way, each record is of the same length.
Flattening: Flattening a data structure means taking a multi-level structure such as a hierarchical structure and converting it into a flat two dimensional, first normal form table or rowset. A side effect of this flattening is losing data structure information and introducing replicated data values to fill out the flat structure.
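A minimal Python sketch of flattening, assuming a hypothetical three-level structure (department, employees, skills) held as nested dictionaries and lists. The names and sample values are illustrative only.

```python
# Hypothetical hierarchical occurrence: one department, two employees,
# and a variable number of skills under each employee.
dept = {
    "name": "Sales",
    "employees": [
        {"name": "Ann", "skills": ["SQL", "XML"]},
        {"name": "Bob", "skills": []},
    ],
}

def flatten(dept):
    """Return one flat (dept, employee, skill) row per lowest-level occurrence."""
    rows = []
    for emp in dept["employees"]:
        # An employee with no skills still yields one row, padded with None.
        for skill in emp["skills"] or [None]:
            rows.append((dept["name"], emp["name"], skill))
    return rows

rows = flatten(dept)
for row in rows:
    print(row)
# ('Sales', 'Ann', 'SQL')
# ('Sales', 'Ann', 'XML')
# ('Sales', 'Bob', None)
```

In the flat rows, 'Sales' and 'Ann' are replicated, and nothing in the rows themselves records which columns belonged to which level of the original structure.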
Flwor: Pronounced "flower", FLWOR is an XQuery operational construct for performing iterative operations and procedure-like programming. It stands for For, Let, Where, Order by, and Return. Join operations are specified this way.
Flwr: An older form of FLWOR without Order represented in the name. Also pronounced "flower", FLWR is an XQuery operational construct for performing iterative operations and procedure-like programming. It stands for For, Let, Where and Return. Join operations are also specified this way.
Foreign key: A foreign key is an alternate key in one or more tables that relates to a primary key in another table creating either a one-to-many or many-to-one relationship.
Forest: A collection of separate trees in XML.
Four value logic: With XML there can be four value logic, True, False, no value, and empty value. This is because an element can have no value specified or it can be specified as an empty element. The no value and empty value can both be interpreted differently by the application.
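As a rough illustration of the four states, the sketch below uses Python's standard xml.etree.ElementTree module. The element names (items, item, active) and the helper flag_state are hypothetical. Treating any non-empty text other than "true" as False is an assumption made only for this example.

```python
import xml.etree.ElementTree as ET

def flag_state(record, name):
    """Classify a hypothetical boolean child element of `record`."""
    child = record.find(name)
    if child is None:
        return "no value"       # the element is not present at all
    text = (child.text or "").strip()
    if text == "":
        return "empty value"    # the element is present but empty
    # Assumption: anything other than "true" counts as False.
    return "True" if text.lower() == "true" else "False"

doc = ET.fromstring(
    "<items>"
    "<item><active>true</active></item>"
    "<item><active>false</active></item>"
    "<item><active/></item>"
    "<item/>"
    "</items>"
)

states = [flag_state(item, "active") for item in doc.findall("item")]
print(states)
```

How "no value" and "empty value" are then interpreted is left to the application, as the definition notes.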
Fourth generation language: A fixed form language for nonprocedural specification of queries. See Non procedural language for more information.
Fragment: An XML fragment is an isolated data substructure located within a hierarchical document that can be processed or returned. What is significant is that this substructure can be isolated below the root of the document, from within a document. Once a fragment has been retrieved and saved, the same document DTD can be used to define the fragment by specifying the new root segment as the root.
FTP: File Transfer Protocol, a standard Internet protocol to exchange files using TCP/IP.
Full join: A Full join is an Outer join type that preserves data on both sides of the join operation when rows are not matched up. Unmatched rows are padded with null values. This does not model a hierarchical structure; it models a flat structure because a full join is a symmetric operation. Full joins can be incorporated into a hierarchical structure as a single logical node comprised of two or more full joined tables or nodes.
Gateway: A software product or internal capability that allows one database interface to access data transparently from another data source which could be a heterogeneous source.
Heterogeneous database access: Heterogeneous database access is the accessing of databases from different vendors, which can consist of different types of databases, as if they were one logical unified database.
Hierarchical constraints: These are operations that are constrained using the hierarchical structure of the database, e.g. SELECT CompName, DeptName WHERE Department HAS ALL EmployeeAge>40.
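HAS ALL in the example above is not standard SQL. One way to approximate its intent in standard SQL is with NOT EXISTS, sketched here using Python's sqlite3 module. The table layout, key columns, and data are hypothetical. Requiring a department to have at least one employee is an assumption about the intended meaning, not something this web site states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company    (CompNo INTEGER PRIMARY KEY, CompName TEXT);
    CREATE TABLE Department (DeptNo INTEGER PRIMARY KEY, CompNo INTEGER, DeptName TEXT);
    CREATE TABLE Employee   (EmpNo  INTEGER PRIMARY KEY, DeptNo INTEGER, EmployeeAge INTEGER);
    INSERT INTO Company    VALUES (1, 'Acme');
    INSERT INTO Department VALUES (1, 1, 'Sales'), (2, 1, 'Ops');
    INSERT INTO Employee   VALUES (10, 1, 45), (11, 1, 52), (12, 2, 38), (13, 2, 50);
""")

# Keep a department only if it has employees and none of them is 40 or younger.
qualifying = conn.execute("""
    SELECT c.CompName, d.DeptName
    FROM Company c
    JOIN Department d ON d.CompNo = c.CompNo
    WHERE EXISTS (SELECT 1 FROM Employee e WHERE e.DeptNo = d.DeptNo)
      AND NOT EXISTS (SELECT 1 FROM Employee e
                      WHERE e.DeptNo = d.DeptNo AND e.EmployeeAge <= 40)
    ORDER BY d.DeptName
""").fetchall()

print(qualifying)
```

Ops is excluded because one of its employees is 38, even though another is 50. The constraint applies to the whole set of children under each department.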
Hierarchical data preservation: Hierarchical structures preserve their structure hierarchically. This is because parent nodes can exist without any children. This means that when a node data type is deleted, all children data types below the deleted node type are also deleted in what is known as a cascading delete.
Hierarchical data semantics: Hierarchical data is organized hierarchically using data nodes hierarchically connected. The data nodes are all related hierarchically to each other through their own semantic relationships (meaning). These are fixed relationships and meanings that follow hierarchical principles. These can be utilized to automatically process queries against the data.
Hierarchical data structure: Hierarchical data structures are multi level data structures where the tables or nodes at each level only have one parent. This means the tables have only one pathway leading to them from the next higher level table above them. This results in hierarchical structures only having a single path from the root of the structure to any data item, making their semantics unambiguous and powerful.
Also of great importance is that childless parent nodes are preserved.
Hierarchical join: As used in this web site, a hierarchical join means that the hierarchical structures being joined, one above the other, properly combine into a larger hierarchical structure with the correct combined structure. One-sided joins, Left or Right, can be used to perform hierarchical joins.
Hierarchical optimization: Un-referenced portions of hierarchical structures do not need to be accessed, and skipping them will not change the semantics of the query. This powerful semantic hierarchical optimization can be applied to hierarchical SQL views. This optimization is generally overlooked by relational optimizations because hierarchical structures are not recognized by the optimizer or relational engine.
Hierarchical relationships: Parent, Child, Siblings and Cousins are hierarchical relationships that exist between nodes of a hierarchical structure. Cousins exist on a different leg of the structure and are related by a common ancestor.
Hierarchical processing: The term hierarchical processing, as used in this web site, is the processing of hierarchical modeled structures so that the useful semantics of these structures are utilized. This means that the SQL processing of hierarchical modeled relational and non-relational data can be performed in non-first normal form to avoid flattening the data structures, which would cause semantic information loss.
Hierarchically restricted Cartesian product: See restricted Cartesian product.
Hierarchictivity: This is a term used in this web site to describe transformational principles of hierarchical structures that are not covered fully by commutative and associative principles. These apply to hierarchical semantic principles, such as the capability to reorder join operations without changing the semantics, that are not fully attributable to commutative and associative principles.
HTML: HyperText Markup Language used for formatting a web page for output. Its tags are fixed. You could say that it is an XML vocabulary for web output.
HTTP: HyperText Transfer Protocol.
Hyperlink: On the Web or other hypertext systems, hyperlink is a synonym for both link and hypertext link.
Hypertext: Hypertext is the organization of information units into connected associations that a user can choose to make. An example of such an association is called a link or hypertext link.
IDREF: An XML keyword that references another element making a logical pathway to it. This will most likely create a network structure logical connection because it supplies a secondary access path to a data value already having an access path to it, referred to here as Shared element data.
Illogical structure: An illogical or invalid structure is a non hierarchical data structure constructed by hierarchical Outer joins (i.e. one-sided Outer joins) that does not follow the join linking rules for creating hierarchical structures. The semantics of illogical structures are often ambiguous, but may be useful in very specific cases.
Implicit natural join: An implicit Natural join is a term used in this web site for ANSI-92 Natural joins that are specified by replacing the ON clause with the USING clause, which implies that a Natural join is to be performed, hence the use of the term implicit.
IMS: IMS is IBM's hierarchical database management system which is still a popular legacy system in wide use. This can also be used as a general term for Information Management System.
Indegree: Is the number of paths entering a node. Hierarchical structures only have a maximum of one.
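A small Python sketch of the indegree rule follows. The node names and the helper functions are hypothetical. The check covers only the "at most one entering path" condition and does not test for cycles or disconnected nodes.

```python
from collections import Counter

def indegrees(nodes, edges):
    """Count the paths entering each node; edges are (parent, child) pairs."""
    counts = Counter(child for _parent, child in edges)
    return {node: counts.get(node, 0) for node in nodes}

def has_hierarchical_indegrees(nodes, edges):
    """True when no node has more than one entering path."""
    return all(count <= 1 for count in indegrees(nodes, edges).values())

nodes = ["Company", "Dept", "Emp", "Proj"]
tree_edges = [("Company", "Dept"), ("Dept", "Emp"), ("Dept", "Proj")]
# Adding a second path into Proj turns the tree into a network structure.
network_edges = tree_edges + [("Emp", "Proj")]

print(indegrees(nodes, tree_edges))
print(has_hierarchical_indegrees(nodes, tree_edges))
print(has_hierarchical_indegrees(nodes, network_edges))
```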
Inner join: The Inner join is the standard default join. It does not preserve unmatched data rows under any circumstances. It is a symmetric join operation and therefore models only a flat structure.
Internal view: An internal view is one of the three types of views that comprise the three tier model for database architecture: the internal, external and conceptual views. The internal view is the view that the database system has of how the data is physically stored in the database.
Internet: A worldwide system of computer networks based on the TCP/IP set of protocols.
Interoperability: The ability of different software systems to work together such as SQL and XQuery. This does not mean that these systems are integrated which implies a smooth interaction.
Intersecting data: Intersecting data is additional data that is stored in an association table along with the association data. An association table holds the relationships between two tables which have a many-to-many relationship such as Parts/Suppliers. The intersecting data is uniquely related to the associated data in each row at the intersection point, which has a specific and unique meaning. An example of intersecting data is the price of a part from a specific supplier, which could be different from a different supplier.
Intranet: A private network that is contained within an enterprise using TCP/IP, and other Internet protocols.
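The Parts/Suppliers case can be sketched with Python's sqlite3 module. The table names, column names, and prices (stored in cents) are hypothetical. The association table PartSupplier carries the price as intersecting data, because a price belongs to neither the part nor the supplier alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Part     (PartNo INTEGER PRIMARY KEY, PartName TEXT);
    CREATE TABLE Supplier (SuppNo INTEGER PRIMARY KEY, SuppName TEXT);
    -- Association table: one row per (part, supplier) pair, plus intersecting data.
    CREATE TABLE PartSupplier (
        PartNo     INTEGER REFERENCES Part(PartNo),
        SuppNo     INTEGER REFERENCES Supplier(SuppNo),
        PriceCents INTEGER,
        PRIMARY KEY (PartNo, SuppNo)
    );
    INSERT INTO Part     VALUES (1, 'Bolt'), (2, 'Nut');
    INSERT INTO Supplier VALUES (1, 'Acme'), (2, 'Zenith');
    INSERT INTO PartSupplier VALUES (1, 1, 10), (1, 2, 12), (2, 1, 5);
""")

# The same part has a different price at each supplier.
bolt_prices = conn.execute("""
    SELECT s.SuppName, ps.PriceCents
    FROM Part p
    JOIN PartSupplier ps ON ps.PartNo = p.PartNo
    JOIN Supplier s      ON s.SuppNo  = ps.SuppNo
    WHERE p.PartName = 'Bolt'
    ORDER BY s.SuppName
""").fetchall()

print(bolt_prices)
```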
Invalid structure: See illogical structure.
Inverted index: An index where every data item is indexed. Many complex queries can be answered by just looking at the index.
Irregular data structure: An irregular data structure is a data structure that does not follow standard conventional formatting rules such as a semistructure dynamically varying format capability. Also see Unconventional data structure.
ISO: International Organization for Standardization.
IT: Information Technology. Used to refer to the field of information technology.
JDBC: JDBC is the Java Database Connectivity API. It uses SQL as the database interface language. It is an open database connection standard that can be used in the Java programming environment.
Join operation: Relational tables are joined across their rows creating a larger wider table.
Join table order: The table join order can be specified in the Outer join statement. This table join control is important in some Outer join operations where it can affect the result.
Join table reordering: Join table reordering is the process of altering the table join order to optimize the execution of Outer joins. This can not be done indiscriminately, since changing the table join order can affect the results of the Outer join operation. Analyzing the data structures defined by the Outer join operation and understanding its semantics is one way of determining when and how table join order can be optimized without changing the result.
Late binding: Late binding with Outer join data modeling is the ability of the database application to accept different data structures which can be specified at run time.
Left join: The Left join operation is an Outer join that preserves unmatched rows from the table specified on the dominant left side of the join operation.
It is a natural hierarchical operation which allows the hierarchical structure to be built from the top to the bottom. The Right Outer join, preserving data on the right, builds hierarchical structures from the bottom upward. The Left Outer join is easier and more natural to use because it progresses naturally from the left to the right in the same direction as its execution.
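The way a Left join preserves a childless parent, where an Inner join drops it, can be sketched with Python's sqlite3 module. The Dept and Emp tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Emp  (EmpNo  INTEGER PRIMARY KEY, DeptNo INTEGER, EmpName TEXT);
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'Ops');   -- Ops has no employees
    INSERT INTO Emp  VALUES (10, 1, 'Ann');
""")

# Inner join: the unmatched Ops row is lost.
inner_rows = conn.execute("""
    SELECT d.DeptName, e.EmpName
    FROM Dept d INNER JOIN Emp e ON e.DeptNo = d.DeptNo
    ORDER BY d.DeptName
""").fetchall()

# Left join: Dept is on the dominant left side, so Ops is kept and padded with NULL.
left_rows = conn.execute("""
    SELECT d.DeptName, e.EmpName
    FROM Dept d LEFT JOIN Emp e ON e.DeptNo = d.DeptNo
    ORDER BY d.DeptName
""").fetchall()

print(inner_rows)
print(left_rows)
```

sqlite3 returns SQL NULL as Python None, so the preserved Ops row appears with None in the employee column.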
Left sided nesting: Left sided nesting is the natural, intuitive way of specifying more than two tables in the join specification. Additional tables are introduced from the left side. Its name is derived from the way Outer join views are expanded when introduced from the left side.
Leg: A leg is a path in the data structure including the data that is stored along its path.
Legacy database: Legacy database applies to all pre-relational databases that are still in existence, or pre-relational database systems that are still in operation.
Link point: The link points are the connection node points, one in the upper and one or more in the lower data structures, which are connected by a pathway when the data structure is being built. This occurs using the Outer join operation and its ON clause join specification which specifies the link points.
Linking: Linking is the process defined in this web site for specifying a pathway between two structures which controls how the two structures are joined into a single structure.
Lists: Lists of data are assumed ordered while sets of data are assumed unordered. XML data is assumed ordered while relational data is assumed unordered.
Location transparency: The user does not have to know where the data is located.
Logical data structure: This is a data structure with logical linkages that rely on matching data values. These linkages can be made dynamically. An example is relational databases. Physical structures can also be logically linked into a larger structure that would be a hybrid structure, but could also be classified as a logical data structure.
Logical table: A logical table, as used in this web site, is a series of flat structures joined together that represent a single flat structure that is a single node in the overall hierarchical structure being modeled. This logical flat structure is modeled using inner or full outer joins which are symmetric join operations that model flat structures. This also enables inner and full outer join operations to be used in the modeling of hierarchical structures.
Lost data: See Missing data.
Lowest Common Ancestor: A lowest common ancestor (also known as a LCA) refers to the next higher level node in the data structure that is a common link point of two sibling legs of a data structure. Lowest Common Ancestors play an important role in determining the semantics across sibling legs of the hierarchical structure.
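A minimal Python sketch of finding the lowest common ancestor of two nodes follows. It assumes the structure is available as a child-to-parent map. The node names and the function are hypothetical. This version treats a node as an ancestor of itself, which is a simplification chosen for the example.

```python
def lowest_common_ancestor(parent, a, b):
    """Return the lowest node lying on the root paths of both a and b."""
    # Collect a and every node above it.
    ancestors = set()
    node = a
    while node is not None:
        ancestors.add(node)
        node = parent.get(node)
    # Walk up from b until a node on a's path is reached.
    node = b
    while node is not None:
        if node in ancestors:
            return node
        node = parent.get(node)
    return None

# Hypothetical structure: Company > Dept > (Emp > Dependent, Proj)
parent = {"Dept": "Company", "Emp": "Dept", "Dependent": "Emp", "Proj": "Dept"}

print(lowest_common_ancestor(parent, "Dependent", "Proj"))
```

Dependent and Proj sit on different legs, and Dept is the point where those legs meet.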
Many-to-many relationships: Many-to-many relationships are relationships used in data modeling where both sides of the relationship can have multiple occurrences. The classic example is the Parts-Suppliers relationship where one part can be carried by multiple suppliers, and one supplier can carry multiple parts. M-to-M relationships (also known as M-to-N) in relational databases require an association table to maintain the M-to-M relationships. These association tables can also contain intersecting data such as price of a specific part from a specific supplier.
Many-to-one relationships: Many-to-one relationships are relationships used in data modeling where the upper level of the relationship has many occurrences and the lower level has only a single occurrence. The classic example is the Employee-to-Department relationship where many employees can have the same department.
Markup data: Markup data contains markup elements that are used as markup indicators in the text. The rules for markup elements allow them to be freely nested in any fashion as you would need for markup data, but this has no meaning for representing hierarchical data structures. If possible, the entire markup text should be defined as CDATA and processed separately.
Markup element: This is when an XML element tag is used as an inline text markup indicator and not an element tag that defines a piece of text as a data field. This overdefining of uses does present a problem of being able to determine when an element tag is used for data definition or markup use. Markup requires mixed content.
Marshalling: Marshalling is a term that loosely means moving data around, which may require some conversion, such as marshalling data from here to there. It is used frequently with ETL products.
Mediator: A mediator is a product term used more frequently with earlier semistructure research to define a query processor that converts or integrates data between two different models such as relational and XML.
Meta Information: See metadata.
Metadata: Metadata, also known as meta information, is information about information. Semistructured data, such as XML, embeds metadata within its data. When used with data structures as in this web site, it pertains to information about the data structure such as its description.
Materialized view: A Materialized view is the process of generating the view’s logical value as a temporary table to replace the view in the processing of a query.
Middleware: Middleware is software that sits between the user and user interface, or between the user interface and the database, which adds value to the data.
Missing data: Missing data, also known as lost data, is the data that is lost in an Inner join when rows of the tables being joined do not match with any other rows. Missing data can also occur with one-sided joins on the side that is not being preserved. This definition ignores all the other reasons for missing data.
Mixed content or mixed element content: A combination of text and elements in any combination. Usually used for markup. Also see markup and content model.
Multi-leg semantics: There are semantics (semantic meaning) between every node type in a hierarchical structure. The semantics between nodes on the same leg (parent/child or ancestor/descendent) are well known and understood. The semantics between nodes on different legs of the structure are referred to here as multi-leg semantics, which are more complex to process, requiring common ancestor logic. These multi-leg related nodes are known as cousins. Also see Common ancestor.
Multi-leg structure: A multi-leg structure is a complex hierarchical structure with multiple legs. If any node or table in a hierarchical structure has more than one pathway exiting it, it defines more than one leg. Multiple legs in hierarchical data structures significantly increase the semantics and complicate their operational principles. This is why multi-leg structures are considered complex structures in this book.
Namespace: Documents can be built from multiple documents, which causes the problem of naming conflicts. Namespaces are used to solve this problem so that different locations of the document can have a namespace qualifying prefix. This is only a very simple use and description of namespaces, which can take up an entire chapter in a book.
Native XML: This is actual XML with its embedded metadata and not data that has been extracted and isolated from XML.
Natural ANSI join: A Natural operation is applied to an ANSI join operation, which causes the natural join's commonly named join key values to be coalesced into a single value in the result. This is very useful for Full joins when one of the two join keys may be null, so that a key value is always available in the same location in the row.
Navigation: Database navigation is the process of positioning to any record in the database structure. Not all databases support procedural navigation; for example, relational databases are navigationless (self navigating), operating transparently. The ANSI-92 SQL Outer join arguably does allow some level of procedural navigation since its join order can affect the result.
Navigationless: Fourth generation languages such as SQL by their very nature of being declarative languages, are navigationless. This means the user does not need to specify the database navigation or direct access. Even when SQL's processing is naturally raised to a hierarchical processing level, it remains navigationless. This keeps its hierarchical processing seamless and transparent.
Hierarchical structures can be automatically navigated because there is only one path between each node.
Nested display: A nested display is one in which the data is displayed in "What You See Is What You Get" (WYSIWYG) structured format. This format preserves the data structure and its semantics so the data and its structure can be displayed intuitively.
Nested relational processing: Nested relational processing is a post relational type of data where columns in relational databases can contain multiple values, tables, and even hierarchical structures. Nested relational processing can process these embedded structures hierarchically.
Nested structures: Nested structures are hierarchical structures whose hierarchical structure is represented physically and contiguously by nesting the hierarchical data. XML is an example of this.
Nested tables: See Nested relational processing.
Network structure: Unlike hierarchical data structures, network data structures can have multiple paths to the data stored in them. Like the hierarchical structure, this has specific uses. If the data can be reached from more than one path in a network structure, it makes the semantics of the data ambiguous from the point of view of the application. This limits its usefulness as a view for applications. But network structure are necessary, for example, to define a conceptual view with its capability to model intersecting paths that can represent all possible data relationships in the database.
In some cases XML structures can resemble network structures logically by the use of IDRefs and duplicate named elements.
Node: A node is a third normal form collection of closely related data connected in a graph or tree structure. For SQL data this is the rows of a table. For XML data it is element data. In many legacy hierarchical data systems these are known as data segments.
Node collection: When node promotion happens on multiple paths under a common ancestor, the descendent nodes from the different paths are collected under the common ancestor.
Node definition, Node declaration or Node type: This refers to the definition of a node in the structure and not a data occurrence of the node.
Node occurrence or Node instance: This refers to a single node data occurrence and not the definition of the node in the structure.
Node promotion: When a defined node in the structure has not been picked for data selection (no data projection) from it, it is not placed in the output structure and its selected descendent nodes are moved up the path around it to their next selected ancestor node.
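Node promotion can be sketched in Python as a walk up a child-to-parent map that skips unselected nodes. The structure, the node names, and the promote function are hypothetical and only illustrate the idea. They are not the processing used by any particular product.

```python
def promote(parent, selected):
    """Map each selected node to its nearest selected ancestor.

    `parent` maps child -> parent in the full structure; `selected` is the
    set of nodes projected for output. The output root maps to None.
    """
    output_parent = {}
    for node in selected:
        ancestor = parent.get(node)
        # Skip over ancestors that were not selected for output.
        while ancestor is not None and ancestor not in selected:
            ancestor = parent.get(ancestor)
        output_parent[node] = ancestor
    return output_parent

# Hypothetical structure: Company > Dept > (Emp > Dependent, Proj)
parent = {"Dept": "Company", "Emp": "Dept", "Dependent": "Emp", "Proj": "Dept"}
selected = {"Company", "Dependent", "Proj"}

output_structure = promote(parent, selected)
print(output_structure)
```

Dept and Emp are not selected, so Dependent and Proj are both promoted to Company. Because the promotion happens on two paths, the nodes end up collected under their common ancestor, as described under Node collection.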
Node type: See Node definition.
Non first normal form: In relational terms, non first normal form means that tables can support structured or nested data with repeating data (multiple occurrences of data in a single column). This form of relational data can be processed by a nested relational processor. The first normal form requirement is not a requirement for good database design or even a relational requirement, it is a requirement imposed by SQL and its requirement for two dimensional tables.
Non hierarchical join support: See Logical table.
Non procedural language: Non procedural languages are also known as fourth generation languages or declarative languages. The term declarative language got its name from the fact that with non procedural languages it is not necessary to specify how to perform a task, it is only necessary to specify what you want the task to accomplish.
Its advantages are that it is easier to specify, is automatically logically correct, and can be better optimized for access because it can be globally optimized.
Non relational database: A non relational database is any database that is not a relational database. These include legacy and post relational databases.
Normalization: Normalization is the process of designing a database following at least the first three relational normalization rules for good relational database design. All of the normalization rules require or rely on breaking the data apart and storing the data in multiple tables to increase its data independence. The join operation is used to combine the data back together when and as it is needed.
Null: Nulls are padding values that are used to represent missing data in Outer join results. Nulls are also used to represent unknown values when data is entered into a relational table.
OASIS: Organization for the Advancement of Structured Information Standards.
Object relational mapping: A form of modeling one to one, one to many, and many to many relationships with a relational database to model a hierarchical object. Unique keys are required to be used as object IDs.
ODBC: ODBC is the Open Database Connectivity API standard put forth by Microsoft Corporation. It uses SQL as the database interface language.
OID: OID is an Object IDentifier used in object programming and languages, but this term could be used elsewhere. XML uses this term and concept with their IDREF and ID= keywords. An OID is the unique name of a data object which is useful for referencing objects. In Object languages and programming, every object should have an OID assigned.
OMG: Object Management Group. It was formed by a group of vendors with the aim of creating a standard architecture for distributed objects in networks that resulted in the Common Object Request Broker Architecture (CORBA).
ON clause: The ON clause is used with the ANSI-92 Outer join operation to specify the join criteria for each table being joined in the join specification. The ON clause supplies greater control over Outer joining tables than is possible through a single WHERE clause. This shows that it has usefulness beyond the WHERE clause and is also crucial to performing outer join data modeling.
ON clause filtering: The ON clause is used with the ANSI-92 Outer join operation to specify the join criteria for each table being joined, but it can also specify hierarchical data filtering, which allows more control and a more precise level of data filtering than if specified in the WHERE clause.
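The difference between filtering in the ON clause and filtering in the WHERE clause can be sketched with Python's sqlite3 module. The Dept and Emp tables, the Age column, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Emp  (EmpNo  INTEGER PRIMARY KEY, DeptNo INTEGER, EmpName TEXT, Age INTEGER);
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'Ops');
    INSERT INTO Emp  VALUES (10, 1, 'Ann', 45), (11, 2, 'Bob', 30);
""")

# Filter in the ON clause: only the Emp level is filtered; every Dept is preserved.
on_rows = conn.execute("""
    SELECT d.DeptName, e.EmpName
    FROM Dept d
    LEFT JOIN Emp e ON e.DeptNo = d.DeptNo AND e.Age > 40
    ORDER BY d.DeptName
""").fetchall()

# Filter in the WHERE clause: applied to the joined row, so Ops is removed entirely.
where_rows = conn.execute("""
    SELECT d.DeptName, e.EmpName
    FROM Dept d
    LEFT JOIN Emp e ON e.DeptNo = d.DeptNo
    WHERE e.Age > 40
    ORDER BY d.DeptName
""").fetchall()

print(on_rows)
print(where_rows)
```

With the ON clause filter, Ops survives as a childless parent. With the WHERE clause filter, the whole Ops row is discarded.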
One-sided join: The one-sided join is the Left or Right join. These are known as one-sided joins since they preserve data only on one side, the dominant side.
One-to-many relationships: One-to-many relationships are relationships in data modeling where the upper level of the relationship has only one occurrence and the lower level has many related occurrences. The classic example of this is the Department-to-Employee relationship where each department can have many employees.
Ontology: Science of describing entities and how they relate in a problem domain giving a shared and common understanding of the data.
Open database interface: An open database interface is a database interface that is freely available to all potential users, and supplies access to most common database types.
Ordered data: Most data systems are either ordered or unordered systems by default. XML by default is ordered, and assumes the data is ordered. This is probably because XML was first a markup language where order is crucial. SQL is unordered by default. The SQL row order has no significance and rows can be returned in any order unless explicitly ordered. Ordered data are lists, and unordered data are sets.
Orthogonal: Term used to indicate that a feature or capability does not impose restrictions or limitations on normal processing.
Outdegree: Is the number of paths exiting a node.
Outer join: The Outer join operation is used to preserve data that doesn’t find a match in a join operation in order to preserve dangling tuples (partial rows). There are basically Full outer joins, which preserve data on both sides of the join, and One-sided outer joins, known as Left or Right joins, which preserve data only on one given side.
Outer union: Without getting too technical, an outer union operation is a way to union two tables (or rowsets) that have a different set of columns that are not union compatible. This is accomplished by padding one of the tables on the left with Nulls that match the form of the other table and the reverse is performed on the other table. This makes the tables union compatible. Also known as Union join. For more info see Union join.
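The padding step can be sketched in plain Python, with rows represented as dictionaries and None standing in for Null. The outer_union function and the sample rows are hypothetical. Aligning columns by name is an assumption made for this example.

```python
def outer_union(rows_a, rows_b):
    """Union two rowsets with different columns by padding with None."""
    all_rows = rows_a + rows_b
    # Build the combined column list in first-seen order.
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    # Pad every row out to the combined set of columns.
    padded = [tuple(row.get(name) for name in columns) for row in all_rows]
    return columns, padded

emp_rows = [{"EmpName": "Ann", "Age": 45}]
proj_rows = [{"ProjName": "Alpha"}]

columns, union_rows = outer_union(emp_rows, proj_rows)
print(columns)
print(union_rows)
```

Each input row keeps its own values and receives None in the columns that belong only to the other rowset, which makes the two rowsets union compatible.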
P2P: Peer to Peer networks eliminate the need for servers and allow all computers to communicate and share resources as peers.
Parent: A parent is the next higher level table, or node in the data structure which follows the path upward. In a hierarchical structure, parents are important because their children can not be created without them.
Path or Pathway: A path is a series of connected nodes in a data structure. In a relational database, these nodes are tables while in a non relational database they can be flat files or segments.
Path qualification: Path qualification is when the join conditions of ON clauses also reference higher level tables or nodes up the path from the link point of the upper level structure being joined. This adds additional qualifications to the active join operation based on the path already established above the table or structure being joined.
Path shortening: See dynamic path shortening.
Pathway: A pathway is a path from one node to another node in a hierarchical structure. These can be defined by an Outer join operation. No two pathways can lead to the same lower level table in a hierarchical structure.
PCDATA: XML PCDATA is standard Element text that will be parsed.
Persistent data: Data that is created and remains after the operation that created it and is available for reuse.
PHP: Hypertext preprocessor, a server-side HTML-embedded scripting language.
Physical data structure: Nodes that comprise physical databases are connected by physical address links (such as IBM's IMS database) or juxtaposition, proximity, or nesting (such as XML).
Post relational databases: Post relational databases are the next generation of relational databases, those with extended relational features such as nested relational processing.
Predicate: An expression used as a data filter in a query.
Primary key: A primary key is a database key that uniquely identifies a record or a row in a file or table and is usually required.
Procedural language: A procedural language is another name for a third generation language. With a procedural language, you have to procedurally specify or code how to perform the programming task you want performed.
Processing instruction: Used in XML to specify an application specific text processed by an application usually to specify some operational instruction. It should be avoided.
Projection: Selecting database data for output is referred to as projection. Projection controls which node types of the processed structure are output. This operation does not affect other data that has been selected for output.
Prolog: The first part of a document specifying the XML version, document character set and optional inline DTD.
Pseudo code: Pseudo code is high level code that is used in some of the examples in this web site that may not be totally complete or accurate, but is complete enough to easily convey the principles being demonstrated.
Query rewrite/rebuild: See Dynamic rewrite/rebuild.
RDBMS: Relational DataBase Management System.
RDF: Resource Description Framework, an XML application, providing a mechanism to exchange metadata.
Read-a-head: Read-a-head is a database access optimization technique that reads data before it is required to take advantage of current access optimization opportunities that may not be available when the data is required.
Real-time data: Up to the second fresh data.
Record: A database record is comprised of all node occurrences from the root node occurrence down. A record is an internalized (computer data types) value of the data.
Recursive structures: XML supports recursive structures where the same element node specification in a path can be specified again in the same path in the structure, causing a circular definition. This is used in structures to explode compound objects such as Parts, which can consist of other parts, repeated until their atomic parts are reached.
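A parts explosion over a recursive XML structure can be sketched with Python's standard xml.etree.ElementTree module. The Part element, its name attribute, and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical recursive structure: a Part element can contain Part elements.
doc = ET.fromstring(
    '<Part name="Bicycle">'
    '<Part name="Wheel">'
    '<Part name="Rim"/>'
    '<Part name="Spoke"/>'
    '</Part>'
    '<Part name="Frame"/>'
    '</Part>'
)

def atomic_parts(part):
    """Return the names of all parts below `part` that contain no other parts."""
    children = part.findall("Part")
    if not children:
        return [part.get("name")]   # an atomic part ends the recursion
    names = []
    for child in children:
        names.extend(atomic_parts(child))
    return names

print(atomic_parts(doc))
```

The function calls itself once for every nested Part element, mirroring the way the element definition refers back to itself.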
Regular data structure: A regular data structure is a data structure that follows standard conventional formatting rules. Also see Conventional data structure.
Replicated data: Replicated data is data that is replicated when a data structure is flattened into a two dimensional table structure in order to keep the structure flat and to preserve the data structure. This replicated data can throw summaries off, and has the potential to obscure the data structure. Replicated data is not the same as duplicate data whose identical data is semantically correct.
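How replicated data throws a summary off can be sketched with Python's sqlite3 module. The Dept and Emp tables, the Budget column, and the figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptName TEXT, Budget INTEGER);
    CREATE TABLE Emp  (EmpNo  INTEGER PRIMARY KEY, DeptNo INTEGER, EmpName TEXT);
    INSERT INTO Dept VALUES (1, 'Sales', 1000), (2, 'Ops', 500);
    INSERT INTO Emp  VALUES (10, 1, 'Ann'), (11, 1, 'Bob'), (12, 2, 'Cy');
""")

# Summing over the unflattened Dept table gives the real total.
true_total = conn.execute("SELECT SUM(Budget) FROM Dept").fetchone()[0]

# After flattening, the Sales budget is replicated once per employee.
flat_total = conn.execute("""
    SELECT SUM(d.Budget)
    FROM Dept d LEFT JOIN Emp e ON e.DeptNo = d.DeptNo
""").fetchone()[0]

print(true_total, flat_total)
```

The flattened total counts the Sales budget twice because Sales has two employees, so the summary no longer reflects the department level data.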
Reusable: Reusable with the Outer join is its ability to define structured views with sub structure views so they can be used many times in other structures. This also has the advantage that changes to the sub structure can be easily or automatically propagated to all of the structures it is used in.
Restricted Cartesian product: Relational Cartesian products are comprised of all row combinations from all the tables in the join with no restrictions applied. A restrictive Cartesian product has only the related combinations preserved so they properly reflect the data structure represented in the relationships specified for the relational query. In a hierarchical structure specified in a relational query, the Cartesian product is hierarchically restricted and produces a hierarchical result.
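The contrast between an unrestricted and a restricted Cartesian product can be sketched with Python's sqlite3 module. The Dept and Emp tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Emp  (EmpNo  INTEGER PRIMARY KEY, DeptNo INTEGER, EmpName TEXT);
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'Ops');
    INSERT INTO Emp  VALUES (10, 1, 'Ann'), (11, 1, 'Bob'), (12, 2, 'Cy');
""")

# Unrestricted: every Dept row is combined with every Emp row.
cross_rows = conn.execute("""
    SELECT d.DeptName, e.EmpName FROM Dept d CROSS JOIN Emp e
""").fetchall()

# Restricted: only combinations that follow the Dept-to-Emp relationship remain.
restricted_rows = conn.execute("""
    SELECT d.DeptName, e.EmpName
    FROM Dept d LEFT JOIN Emp e ON e.DeptNo = d.DeptNo
    ORDER BY d.DeptName, e.EmpName
""").fetchall()

print(len(cross_rows), len(restricted_rows))
print(restricted_rows)
```

Two departments and three employees give six unrestricted combinations, but only three combinations respect the parent/child relationship.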
Restructuring: Restructuring, as used in this web site, is the changing of the hierarchical structure of the data structure and its data. This will change the semantics of the data and can change the occurrence counts of the data by replicating data. This is a form of transformation and can occur when the data of the structure is ordered against the natural structure (i.e. bottom up).
Result set: The result set is the flat relational result returned by SQL.
Right join: The Right join operation is an Outer join that preserves unmatched data from the dominant table specified on the right side of the join operation.
Right sided nesting: Right sided nesting of SQL outer joins naturally occurs when outer join views are expanded for processing. This normal SQL view expansion process causes the current matching ON clause to be pushed to the right away from its join operation as a nested view expands, causing the current join operation to be temporarily put on hold. This causes its associated related working set to be stacked and a new one created for processing the new active join view operation that just expanded. When complete, the result is available as the right argument to the stacked join operation which is then unstacked and processed. This stacking process has the beneficial side effect of preserving the structure in all the stacked working sets so they cannot be influenced by the current join operation. This right sided nesting happens automatically and transparently when structured views are specified on the right side of outer joins, and SQL programmers do not need to be aware of this process. Also see Scope of control.
Root: The root of a hierarchical structure is the top most table or node in the structure. Since a hierarchical structure is an upside down tree, it makes sense that the starting table or node is called the root. All access to a hierarchical structure originates from the root.
Round tripping: See Document round tripping.
Row: Relational tables are made up of horizontal rows and vertical columns. The relational name for a row is a tuple. A row is analogous to a record in a flat file.
Rowset: A rowset is a flat relational data container. It can be an instance of a working set or result set. See Result set and Working set.
SAX: The Simple API for XML, a simpler and smaller XML API than DOM. DOM reads the entire document into memory, while SAX streams through the document and reports parsing events to the application as they occur. SAX is more memory efficient but more limited.
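The difference can be sketched with Python's standard xml.sax and xml.dom.minidom modules. The order document and the ItemCounter handler are hypothetical names used only for illustration:

```python
import xml.sax
from xml.dom import minidom

doc = b"<order><item>bolt</item><item>nut</item></order>"

class ItemCounter(xml.sax.ContentHandler):
    """SAX style: react to events as the parser streams past each tag."""
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1

handler = ItemCounter()
xml.sax.parseString(doc, handler)

# DOM style: build the whole tree in memory, then query it.
dom = minidom.parseString(doc)
dom_count = len(dom.getElementsByTagName("item"))

print(handler.count, dom_count)
```

The SAX handler keeps only a counter, while the DOM version holds every node of the document until it is discarded.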
Scalability: The ability to keep overhead down as the amount of processing and resources increases. Ideally, linear growth is desired as resources increase, but this is difficult to achieve.
Schema: An XML schema defines and maps a specific class of XML documents. It is newer and much more advanced than the older DTD which serves this same basic purpose.
Schema evolution: This is a process by which changes to the schema of a document can be made without invalidating older documents that use the same schema. This implies that the change applies to newly produced documents.
Scope of control: Each specific join operation joins two working sets or tables. This means the tables referenced by ON clauses during each join operation must belong to one of the two working sets being currently joined. Because of right sided nesting, there can be many working sets that are stacked and these should not be referenced until they are unstacked and become active. This ON clause range of acceptable table references is known as the scope of control. Also see Right sided nesting.
Secondary key: A secondary key is a key that is not necessarily unique so that searching on it will return multiple records usually of the same type. It is also known as an alternate key. A primary key is unique.
Segment: As used in this web site, a segment is an older term for a contiguous block of closely related data. A structured record is made up of different segments types and their occurrences that are linked into a hierarchical structure. This term is a hold over from legacy databases and is still a useful generic term for describing database structures. A more generic term used in this website is Node.
Selection: Relational selection is filtering row data based on a data value. WHERE clause data selection removes entire rows. ON clause data filtering removes pieces of data from selected rows which are replaced with NULL values.
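The two kinds of filtering can be compared in a minimal sketch using SQLite. The dept and emp tables and the salary threshold are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emp  (emp_id INTEGER PRIMARY KEY, dept_id INTEGER, salary INTEGER);
    INSERT INTO dept VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO emp  VALUES (10, 1, 900), (11, 2, 400);
""")

# WHERE clause selection: the Research row is removed entirely.
where_rows = con.execute("""
    SELECT d.name, e.emp_id
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.dept_id
    WHERE e.salary > 500
    ORDER BY d.dept_id
""").fetchall()

# ON clause filtering: Research is preserved and its employee data becomes NULL.
on_rows = con.execute("""
    SELECT d.name, e.emp_id
    FROM dept d LEFT JOIN emp e ON e.dept_id = d.dept_id AND e.salary > 500
    ORDER BY d.dept_id
""").fetchall()

print(where_rows)
print(on_rows)
```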
Semantic loss: Semantic loss, as used in this web site, occurs when semantic structural information is obscured or lost from a structure when it is transformed. In particular, when a hierarchical structure is flattened, the data structure and the structural semantics are significantly obscured.
Semantic mapping: Semantic mapping is the mapping of the meaning of the data derived from its data definition or its data structure. This can be determined in many ways such as the meaning in the data label names or data itself. As used in this web site, the meaning of the data is based on its hierarchical structure and the relationships between each node. For more info on hierarchical data semantics see Lowest Common Ancestor.
Semantic optimizations: Semantic optimizations are powerful optimizations based on the semantics of the data structure being accessed. They can be very high level optimizations where a single optimization can logically remove node types from the structure being accessed instead of optimizing accesses on an access by access basis.
Semantic web: The ability of software applications to understand and use the web as easily as humans can.
Semantically complex: Queries that operate across multiple legs of a hierarchical structure are semantically complex because of the semantically complex common ancestor node logic required to process them. All nodes in a multi-leg hierarchical structure are related to each other, which makes multi-leg processing very complex. Also see Common ancestor node.
Semi-Join: A semi-join operation is useful for decreasing I/O and transmission times in a multiprocessor system, usually by only having to transfer one side of the join.
Semistructured data: XML is a semistructured language where the data contains embedded metadata. This is also known as a self describing language. This allows for many advanced hierarchical structures and capabilities. These include variable structures, network like structures, and dynamically defined structures. Some of these capabilities require that the embedded metadata be examined in the same way as the actual data by the programmer using the semistructured query language.
Serializing: Serializing is the conversion of usually structured data like XML into a byte stream that can be easily transmitted and reconstructed at the receiving end. This is normally done through a depth first tree traversal.
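A depth first serialization can be sketched as follows. The serialize function is a simplified illustration that ignores attributes and character escaping, and the dept, emp, and proj element names are hypothetical:

```python
import xml.etree.ElementTree as ET

def serialize(node):
    """Depth first: emit the start tag, then each child in order, then the end tag."""
    parts = ["<" + node.tag + ">"]
    if node.text and node.text.strip():
        parts.append(node.text.strip())
    for child in node:
        parts.append(serialize(child))
    parts.append("</" + node.tag + ">")
    return "".join(parts)

# Build a small hierarchical structure in memory.
root = ET.Element("dept")
emp = ET.SubElement(root, "emp")
ET.SubElement(emp, "name").text = "Ann"
ET.SubElement(root, "proj").text = "X1"

# Convert to a byte stream, then reconstruct it at the "receiving end".
stream = serialize(root).encode("utf-8")
rebuilt = ET.fromstring(stream)

print(stream)
```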
Sets: Sets of data are assumed unordered while lists of data are assumed ordered. XML data is assumed ordered while relational data is assumed unordered.
SGML: Standard Generalized Markup Language. A standard for defining document markup languages. It was the predecessor of XML.
Shared element data: Shared element data as referred to in this web site is created by an XML IDREF usage that produces multiple paths into a node type so that the same physical data occurrences it defines is shared by two or more paths. For example, if all employees were automatically considered customers then the addresses for employees could be logically linked to their physical customer address. Also see IDREF.
Shredding: Shredding occurs when structured XML data is flattened and placed into multiple columns in relational tables, usually by an ETL process. It is a form of flattening data. Also see Flattening.
Sibling Nodes: Sibling nodes are the sibling node types of a parent node type. Their left-to-right defined order is application dependent.
Sibling legs: Sibling legs are parallel legs that are related directly through a common ancestor node. These legs are separate and are not directly related. They have no node by node occurrence correlation. This has specific consequences for the semantics of the data structure. For example, comparing data fields from two sibling legs requires comparing all combinations under the lowest common ancestor data occurrence.
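The comparison across sibling legs can be sketched in Python. The department occurrence below stands in for the lowest common ancestor, and its employee and project data are hypothetical:

```python
from itertools import product

# Hypothetical department occurrence (the lowest common ancestor)
# with two sibling legs: employees and projects.
dept = {
    "employees": [{"name": "Ann", "start": 2019}, {"name": "Bob", "start": 2022}],
    "projects":  [{"code": "P1", "start": 2020}, {"code": "P2", "start": 2021}],
}

# No employee occurrence is correlated with a particular project occurrence,
# so every combination under the department has to be tested.
matches = [
    (e["name"], p["code"])
    for e, p in product(dept["employees"], dept["projects"])
    if e["start"] < p["start"]
]

print(matches)
```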
Significant white space: Spaces, tabs and line break codes which are part of the document text and should be preserved and displayed when output.
Skolem function: A Skolem function creates a unique object ID using every value of its argument, usually based on values in its data segment. This is used when an object ID has not been assigned but one is needed, usually to make a dynamic linkage at a later date. This type of object ID is also known as a logical identifier.
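One possible way to sketch such a function is to hash the argument values. The skolem_id name, the separator, and the choice of a truncated SHA-256 digest are assumptions made for illustration; a truncated hash is repeatable but only probabilistically unique:

```python
import hashlib

def skolem_id(*values):
    """Derive a repeatable logical identifier from the argument values."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    key = "\x1f".join(str(v) for v in values)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

# The same values always yield the same ID, so a link can be made later.
a = skolem_id("Smith", "1970-01-01")
b = skolem_id("Smith", "1970-01-01")
c = skolem_id("Smith", "1970-01-02")

print(a == b, a == c)
```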
SMP: Symmetrical Multi-Processing. The "shared everything" approach of parallel processing.
SOAP: Simple Object Access Protocol. SOAP is a lightweight protocol for exchange of information in a decentralized, distributed environment. It is an XML based protocol that consists of three parts: an envelope that defines a framework for describing what is in a message and how to process it, a set of encoding rules for expressing instances of application-defined datatypes, and a convention for representing remote procedure calls and responses.
SQL: Structured Query Language. An ANSI and ISO standard interactive and programming language for getting information from and updating a database.
SQL-compliant: Conformity to the ANSI SQL standards.
SQL/XML Standard: This is the ANSI standard for defining syntax and functions in ANSI SQL to handle input and output of native XML from SQL. The input of XML has not been decided yet. A number of functions have been defined for XML output. These Output functions require nested use in order to form hierarchical XML documents.
SQL2: SQL2 is the ANSI standard that defines the Outer join operation which was ratified in 1992. It is also known as the ANSI-92 SQL standard.
SQL3: SQL3 is the object standard for relational databases. Its features and capabilities are showing up. These include support for Abstract Data Types (ADTs), User Defined Functions (UDFs), and User Defined Types (UDTs).
Start and stop tags: Enclose the content of an XML element. The first tag of a container element also names the element.
Static Query: A pre-optimized query, implies that query can not be specified dynamically.
Structure transformation: Involves changing the physical structure of a data structure which will automatically change the semantics of the data. Data will be deleted and duplicated as needed to fit the new structure.
Structured data: Used in this web site means the same as hierarchical data. See hierarchical data.
Structured data record: Structured data records are hierarchical structures stored contiguously top-down, left-to-right. They can be used directly by third and fourth generation languages.
Structured database processing: See nested relational processing.
Structured query output: See nested display.
Structured SQL views: SQL views that define logical or physical hierarchical structures and can be dynamically joined to form larger logical hierarchical structures. Structured SQL views are also self optimizing so they can be used more often greatly increasing their data abstraction.
Sub structure views: Sub structure views are SQL views that contain hierarchical data structures that can be seamlessly embedded in SQL statements and structured views to create larger views.
Surrogate key: Most relational keys serve two purposes, their use as a key, and their use as data. A surrogate is just used as a key, it is usually automatically generated because there was probably no data that could be used as the key.
Symmetric join: Inner and Full joins are referred to as symmetric joins because they are commutative in operation. They produce the same results when left and right table inputs are reversed. They model flat structures.
Table join order: See Join table order.
Tabular structure: A tabular structure is a flat two dimensional table structure with rows and columns.
Tags: See Elements or Start and stop tags.
TCP/IP: Transmission Control Protocol / Internet Protocol.
Text element: A text element is an XML data element, not to be confused with a markup element.
Three value logic: True, false, and unknown conditions used with relational processing.
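The three conditions can be observed directly in SQLite, where NULL plays the role of unknown and is returned to Python as None. This is a minimal sketch, not a full truth table:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# NULL stands for "unknown"; 1 is true and 0 is false.
results = con.execute("""
    SELECT NULL = 1,     -- unknown: a comparison with NULL is never true or false
           NULL AND 0,   -- false: one false operand decides an AND
           NULL AND 1,   -- unknown
           NULL OR 1,    -- true: one true operand decides an OR
           NOT (NULL)    -- unknown
""").fetchone()

print(results)
```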
Throwaways: The term throwaways, as used in this web site, refers to rows retrieved while performing a join operation that are later discarded in the same join operation because they are unmatched.
Top-down processing/execution: Top-down processing is the building and processing of hierarchical structures top-down. This is the best way to perform the join operations needed to create a hierarchical data structure since it avoids throwaways.
Transformation: XML documents require a lot of transformations. XSLT and XQuery, XML processing programs, are designed to do very complex transformations, that not only filter data, they reorganize it and change the data structure. This is what transformation involves.
Transitive property: If a > b and b > c, then a > c; greater-than is a transitive operation.
Tree model: A hierarchical structure which resembles an upside-down tree with branches.
Tree walking: Navigating a hierarchical structure.
Trigger: A previously defined condition that, when it occurs in the database, automatically causes a previously defined query statement to be executed. That statement can change the database by adding, changing, or deleting data.
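A trigger can be sketched in SQLite as follows. The emp and audit tables and the log_raise trigger name are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp   (emp_id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE audit (emp_id INTEGER, old_salary INTEGER, new_salary INTEGER);

    -- The condition: a salary update on emp.
    -- The predefined statement: an insert into audit.
    CREATE TRIGGER log_raise AFTER UPDATE OF salary ON emp
    BEGIN
        INSERT INTO audit VALUES (OLD.emp_id, OLD.salary, NEW.salary);
    END;

    INSERT INTO emp VALUES (10, 500);
    UPDATE emp SET salary = 650 WHERE emp_id = 10;
""")

# The audit row was written by the trigger, not by the application.
audit_rows = con.execute("SELECT * FROM audit").fetchall()
print(audit_rows)
```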
Tuple: A tuple is the relational term for a row of a table.
Twins: The different children node types of a parent node type represent different children with different data types and formats, known as siblings. Twins are the multiple data occurrences for a specific node type that has the same parent data node occurrence. The node type is the same across twins and the node parent data occurrence is the same, hence the name twin or twins.
UDDI: Universal Description, Discovery, and Integration. A standard for a platform-independent, open framework for describing services on the Internet.
UDF: A UDF is a User Defined Function that executes in SQL and is written in a third generation language. It is an SQL3 capability. These UDFs can handle SQL ADTs.
UDT: A UDT is a User-Defined Type allowing the SQL user to specify new data types with their own usage rules and C code methods to support their use.
Unambiguous semantics: Unambiguous semantics are semantics with only one meaning or interpretation. Hierarchical data structures have unambiguous semantics because they are singular in nature, having only one path to any value. This makes their semantics unambiguous which makes them very useful and powerful.
Unconventional data structure: Unconventional data structures are structures that are not yet in common business use. These include structures produced from semistructured data, which is new to business and includes the capability to define structures that can have dynamically varying structure formats.
Unified view: A unified view sits over heterogeneous data sources and offers a consistent view definition by defining the entire logical structure view. The ANSI SQL outer join can do this and it offers some advantages which follow. The unified view can be specified as sub views in separate manageable and reusable SQL views, and these SQL sub views can be specified and arranged dynamically at execution. In addition, sub view definitions can be specified dynamically, and when all the views are expanded they form a single, solid, unified view defined entirely by ANSI SQL syntax and semantics.
URI: Uniform (sometimes called Universal) Resource Identifier, either a URL or a URN (Uniform Resource Name).
Union join: A Union join is also called an Outer union. It actually unions two tables that have different formats. This operation was probably included in the join syntax because it can easily be simulated by the FULL join as in: T1 FULL JOIN T2 ON 1>2. Also see Outer union.
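The rows such an operation produces can be sketched in SQLite. Because FULL JOIN support depends on the SQLite version, this sketch substitutes UNION ALL with NULL padding, which yields the same rows as a FULL join whose ON condition is never true. The t1 and t2 tables are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b TEXT, c TEXT);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES ('x', 'y');
""")

# Each table contributes its own rows; the other table's columns are NULL.
rows = con.execute("""
    SELECT a, NULL AS b, NULL AS c FROM t1
    UNION ALL
    SELECT NULL, b, c FROM t2
""").fetchall()

print(rows)
```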
Universal Data Access: Universal Data Access (UDA) is a term that indicates that a given product can support access to all forms, types, and combinations of data and databases.
Universal quantifier: Testing for universal existence such as IF ALL …
Universal Resource Locator: See URL.
Unnormalized: Unnormalized data is data that has not been normalized for whatever reason. Denormalized data is data that is unnormalized on purpose such as pre-joining tables for efficiency reasons.
Unordered data: See Ordered data.
Unparsed form: Native XML with its complex hierarchical structure and embedded metadata requires parsing to be accessed. There are times when it is desired that areas of the XML data are to be bypassed by the parsing operation. This unparsed data is identified in the XML metadata as CDATA.
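As a small illustration using Python's standard ElementTree parser, the characters inside a CDATA section are passed through as plain text rather than being interpreted as markup. The note element and its content are hypothetical:

```python
import xml.etree.ElementTree as ET

# Outside CDATA, the "<" and "&" characters below would have to be escaped.
doc = "<note><![CDATA[if a < b && b < c then <ok>]]></note>"

note = ET.fromstring(doc)
text = note.text

# The "<ok>" inside the CDATA section did not become a child element.
child_count = len(note)

print(text, child_count)
```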
Unstructured data: Unstructured data has no real structure such as the data in an email and memos. Interestingly, estimates have 85% of all business information as unstructured data. There are now many products coming on the market that can put some structure into unstructured data so it can be categorized or organized hierarchically.
URL: A URL is a Uniform (sometimes called Universal) Resource Locator used to access data on the Internet or an intranet. It is a web address.
User Defined Function: See UDF.
User Defined Type: See UDT.
USING clause: The USING clause is used instead of the ON clause to specify that an Implicit Natural join option is to be applied to the join operation.
Valid XML document: An XML document that adheres strictly to a DTD or XML schema specified in its document-type definition.
Validation: Checking whether a document in HTML or XML conforms to its specification.
Variable data structure: With XML, the structure of the data structure can vary from document occurrence to occurrence or even within a given document. Within limits, SQL can do this using the ON clause based on a data value field at a higher level of the structure.
Variable length fields: Variable length fields are fields whose length can vary. They hold any type of value or field. The length of a variable length field is usually contained somewhere in the record (known to the application) preceding the variable length field. This means that a data record with variable length fields is a variable length record. This does not make it a varying (format) structure because this does not change the format.
Variable length records: Variable length data records in a data set are records that contain variable length fields and/or variable occurring fields, giving them a length that varies across the data set. This does not make it a varying (format) structure because this length change does not change the format.
Variable occurring fields: Variable occurring fields are data fields that can repeat sequentially for multiple occurrences in a record. They are variable because the amount of space required to contain them is variable, only using the space required. The active number of occurrences value usually directly precedes the variable occurring fields. This means that a record with variable occurring fields is a variable length record. This does not make it a varying (format) structure because this does not change the format.
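A variable occurring field preceded by its occurrence count can be sketched with Python's struct module. The record layout (a 4-byte ID, a 1-byte count, then that many 4-byte values) and the function names are assumptions made for illustration:

```python
import struct

def pack_record(emp_id, phones):
    """Assumed layout: 4-byte id, 1-byte occurrence count, then the occurrences."""
    header = struct.pack("<IB", emp_id, len(phones))
    body = struct.pack("<%dI" % len(phones), *phones)
    return header + body

def unpack_record(buf):
    # Read the count first; it says how many occurrences follow.
    emp_id, count = struct.unpack_from("<IB", buf, 0)
    phones = list(struct.unpack_from("<%dI" % count, buf, 5))
    return emp_id, phones

short_rec = pack_record(7, [5551234])
long_rec = pack_record(8, [5551234, 5555678, 5559012])

# The record length varies, but the format is the same for both records.
print(len(short_rec), len(long_rec))
print(unpack_record(long_rec))
```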
View materialization: View materialization is the process of creating a temporary table or working set that exactly reflects the data and semantics of the view.
View optimization: View optimization is a powerful Outer join semantic optimization that can dynamically exclude nodes in a view from access based on which columns are specified at view invocation. This means there is never a penalty for using an Outer join hierarchical view that contains more nodes than are needed. This also means that the number of required views can be reduced since one large view can do the job of many small ones.
View update: The term "view update" is really the capability to update a multi-table join view. This has always presented a problem because of the lack of semantics when multiple tables are joined. Modeling hierarchical structures allows much more flexibility with multi-table updates.
Views-within-views: See embedded views.
Virtual key: A virtual key is a logical key that does not physically exist in a row or record, but is used to retrieve it and is inserted when the row or record is retrieved into storage to act as its key. This can be the case when the key exists in an index and does not exist in the row or record that is indexed.
Virtual view: A virtual view makes multiple data sources from possibly distributed sites appear as one seamless view in heterogeneous queries.
Virtualization: Making multiple sources appear as one.
WAP: Wireless Application Protocol, a standard for accessing the Internet with wireless devices.
WebDAV: WWW Distributed Authoring and Versioning. HTTP extensions necessary to enable distributed web authoring tools to broadly interoperate.
Website: A website is a collection of Web files on a particular topic.
Web Service: Any software service that is available over the Internet using a standard XML messaging system not tied to a specific operating system.
Well-formed documents: An XML document that conforms to the XML standard but not necessarily to a DTD or XML schema. This reinforces the fact that XML documents do not require a predefined definition, allowing them to be created dynamically.
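A well-formedness check needs no DTD or schema, as this minimal sketch with Python's standard ElementTree parser suggests. The is_well_formed helper and the sample documents are hypothetical:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

ok = is_well_formed("<order><item>bolt</item></order>")
bad = is_well_formed("<order><item>bolt</order>")  # <item> is never closed

print(ok, bad)
```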
WHERE clause filtering: WHERE clauses can also specify data filtering criteria besides join criteria. When data filtering is specified on the WHERE clause, it can affect the entire row so that if the data filtering criteria causes the last node occurrence to be removed, the entire row is filtered out. This is not the case with ON clause filtering which allows for a finer level of hierarchical filtering with its data preserving operation.
White space: White space in XML documents is controlled by the space, carriage return and linefeed characters. Unfortunately, white space can become important and affect outcomes of certain types of processing. For example, when reconstructing a document from its deconstructed pieces it is difficult to recreate the white space exactly. This can throw off document comparisons made to see if they have changed. DOM and SAX also treat white space differently. This is why white space can be an important issue.
Working set: A working set is similar to a temporary table that is a temporary rowset work area used in the performing of the query.
Wrapper element: A non mapped DB element used to make some input stream a valid element, usually for transmission and processing. For example, specifying an SQL query or an argument list in a wrapper element.
WYSIWYG: A nested display that properly depicts the hierarchical structure of the data result. What You See, Is What You Get. Also see Nested display.
WWW: World Wide Web. All the resources and users on the Internet that use the Hypertext Transfer Protocol (HTTP).
W3C: World Wide Web Consortium. An industry consortium involved in the establishment of standards for the Web.
XHTML: Extensible HyperText Markup Language. A combination of XML and HTML.
Xlink: Xlink allows elements to be inserted into XML documents in order to create and describe links between resources. It uses XML syntax to create structures that can describe links similar to the simple unidirectional hyperlinks of today's HTML, and more sophisticated links.
XML fragment: See fragment.
XPath: XPath was a simple XML query language which is now used in most XML query languages as their navigational sub language. Unfortunately, XPath is single path oriented and cannot handle multi-leg (bushy) queries very well in a single use.
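The single path orientation can be illustrated with Python's standard ElementTree module, which implements only a subset of XPath. The dept document below is hypothetical:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<dept>
  <emp grade="A"><name>Ann</name></emp>
  <emp grade="B"><name>Bob</name></emp>
  <proj><code>P1</code></proj>
</dept>
""")

# Each path expression navigates down one leg of the structure.
names = [n.text for n in root.findall("./emp[@grade='A']/name")]
codes = [c.text for c in root.findall("./proj/code")]

print(names, codes)
```

Relating the emp leg to the proj leg takes two separate path expressions plus logic in the host language to combine their results.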
XML: XML stands for Extensible Markup Language. It is a semistructured language, which means it is self describing because its metadata is embedded along with its data. This also gives it many powerful new capabilities not found in current conventional data formats. Its data is stored in a nested form that controls its hierarchical data structure.
XML aware: When an application can accept and output XML documents. Also see XML enabled.
XML document: An XML document is an XML formatted hierarchical data record including embedded meta data and XML prolog.
XML enabled: This term means that the indicated application or utility can input and output XML. This means it can operate in an XML environment.
XML island: A piece of XML code contained in an HTML document.
XML vocabulary: XML vocabularies are formalized DTDs and XML schemas that are designed and developed to support a specific industry's terminology, rules, and message formats, allowing the members of that industry to communicate easily and automatically with one another.
Xpointer: Xpointer, which is based on the XML Path Language (XPath), supports addressing into the internal structures of XML documents. It allows for examination of a hierarchical document structure and choice of its internal parts based on various properties, such as element types, attribute values, character content, and relative position.
XQuery: XQuery is the newest XML query language endorsed by the W3C. It is a procedural-like language that is particularly good at textual transformation. As a separate procedural XML processor it is very good and powerful. Performing full multi-leg hierarchical processing will require complex procedural static processing.
XSL: Extensible Stylesheet Language. Takes XML (which defines data and not output) and adds formatting to it. This allows one XML document to be dynamically formatted to fit the current output requirements.
XSLT: XSL Transformational language for transforming an XML document into another form which may be XML, HTML or some other data format. Transformation can involve changing the data structure.