A Survey of XML Tree Patterns
ABSTRACT
With XML becoming a ubiquitous language for data interoperability purposes in various domains, efficiently
querying XML data is a critical issue. This has led to the design of algebraic frameworks based on tree-shaped
patterns akin to the tree-structured data model of XML. Tree patterns are graphic representations of queries over
data trees. They are matched against an input data tree to answer a query. Since the turn of the 21st
century, a considerable research effort has focused on tree pattern models and matching optimization (a
crucial issue). This paper is a comprehensive survey of these topics, in which we outline and compare the
various features of tree patterns. We also review and discuss the two main families of approaches for optimizing
tree pattern matching, namely pattern tree minimization and holistic matching. Finally, we present actual tree
pattern-based developments, to provide a global overview of this significant research topic.
Existing System
Efficiently evaluating path expressions in a tree-structured data model such as XML’s is crucial for the overall
performance of any query engine [10]. Initial efforts that mapped XML documents into relational databases
queried with SQL induced costly table joins. Thus, algebraic approaches based on tree-shaped patterns became
popular for processing XML natively instead. Tree algebras indeed provide a formal framework for
query expression and optimization, in a way similar to relational algebra with respect to the SQL language. In
this context, a tree pattern (TP), also called pattern tree or tree pattern query (TPQ) in the literature, models a
user query over a data tree. Simply put, a tree pattern is a graphic representation that provides an easy and
intuitive way of specifying the parts of an input data tree that are of interest and must appear in the query output.
More formally, a TP is matched against a tree-structured database to answer a query. The upper left-hand part of
the figure features a simple XML document (a book catalog), and the lower left-hand part a sample XQuery that
may be run against this document (and that returns the title and author of each book). The tree representations of
the XML document and the associated query are featured on the upper and lower right-hand sides, respectively.
At the tree level, answering the query translates into matching the TP against the data tree. This process can be
optimized and outputs a data tree that is eventually translated back into an XML document.
GLOBALSOFT TECHNOLOGIES
IEEE PROJECTS & SOFTWARE DEVELOPMENTS
IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE
BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS
CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401
Visit: www.finalyearprojects.org Mail to:ieeefinalsemprojects@gmail.com
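To make the book-catalog example concrete, the Python sketch below matches a small tree pattern against a data tree. It is an illustration under stated assumptions, not the survey's method: the catalog content is invented (the original figure is not reproduced), and PatternNode, candidates and match are hypothetical names. Each pattern node carries an axis ("/" for a parent-child edge, "//" for an ancestor-descendant edge) and an output flag; match enumerates every embedding of the pattern and returns, for each one, the text of the output nodes, i.e., the title and author of each book.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class PatternNode:
    label: str
    axis: str = "/"        # edge from the parent: "/" parent-child, "//" ancestor-descendant
    output: bool = False   # return this node's text in each answer
    children: List["PatternNode"] = field(default_factory=list)


def candidates(elem: ET.Element, axis: str) -> List[ET.Element]:
    """Data nodes reachable from elem along one pattern edge."""
    if axis == "/":
        return list(elem)            # children only
    return list(elem.iter())[1:]     # proper descendants (iter() yields elem first)


def match(p: PatternNode, elem: ET.Element) -> Iterator[Dict[str, str]]:
    """Yield one answer (output label -> text) per embedding of p rooted at elem."""
    if elem.tag != p.label:
        return
    partial = [{p.label: (elem.text or "").strip()} if p.output else {}]
    for child in p.children:
        # Extend every partial answer with every way of matching this branch.
        partial = [
            {**answer, **sub}
            for answer in partial
            for target in candidates(elem, child.axis)
            for sub in match(child, target)
        ]
        if not partial:
            return                   # one branch has no match: no embedding here
    yield from partial


# Invented sample catalog (hypothetical data).
CATALOG = """<catalog>
  <book><title>Book A</title><author>Author 1</author></book>
  <book><title>Book B</title><author>Author 2</author><author>Author 3</author></book>
</catalog>"""

# Tree pattern for "title and author of each book": catalog/book[title][author].
QUERY = PatternNode("catalog", children=[
    PatternNode("book", children=[
        PatternNode("title", output=True),
        PatternNode("author", output=True),
    ]),
])

results = list(match(QUERY, ET.fromstring(CATALOG)))
for answer in results:
    print(answer)
```

A book with two authors yields two answers, one per embedding. Answers are keyed by label, so this simple sketch assumes that output nodes have distinct labels.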
Disadvantage
Although the scope of their paper overlaps with ours, the two papers are complementary.
We do not address approaches related to the relational storage of XML data.
Since the efficiency of TP matching against tree-structured data is central to TP usage, we review the two main
families of TP matching optimization methods, as well as tangential but nonetheless interesting methods.
Proposed System
… the input data tree when performing actual matching operations. The initial binary join-based
approach for matching […]. Given the tremendous number of holistic matching algorithms proposed in the literature, it is
quite impossible to review them all. Hence, in the following sections we aim to present the most influential ones.
Many labeling schemes have been proposed in the literature. In this section, we particularly focus on the region
encoding (or containment) and the DeweyID (or prefix) labeling schemes that are used in holistic approaches.
However, other approaches do exist, based, for instance, on tree-traversal order, prime numbers, or a combination
of a structural index and inverted lists [58]. Various holistic algorithms actually achieve TP matching,
but they all exploit a data list that, for each node, contains all labels of nodes of the same type. In this section,
we first review the approaches based on the region encoding scheme, which were proposed first, and then the
approaches based on the DeweyID scheme. These approaches aim at avoiding repeated access to the input data
tree. Thus, they exploit structural summaries similar to the DataGuide proposed for semistructured documents.
Advantage
A DataGuide’s structure uses one single label to describe all the nodes whose labels are identical. Its
definition is based on targeted path sets, i.e., sets of nodes that are reached by traversing a given path.
Simply replacing the region encoding labeling scheme with the DeweyID scheme would not particularly
improve holistic matching approaches, since they would still need to read labels for all tree nodes.
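For tree-shaped data, the DataGuide summary described above can be approximated by grouping nodes under their root-to-node label path, each group being the targeted path set of that path. The Python sketch below assumes plain tree data (no ID/IDREF links, which turn the data into a graph that a real DataGuide must handle) and uses a hypothetical build_dataguide function; it only shows what a target set is, not how any particular system stores its summary.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

LabelPath = Tuple[str, ...]


def build_dataguide(root: ET.Element) -> Dict[LabelPath, List[ET.Element]]:
    """Map every distinct root-to-node label path to its target set,
    i.e., the nodes reached by that path, in document order."""
    guide: Dict[LabelPath, List[ET.Element]] = defaultdict(list)

    def visit(elem: ET.Element, path: LabelPath) -> None:
        path = path + (elem.tag,)
        guide[path].append(elem)     # one summary entry per distinct label path
        for child in elem:
            visit(child, path)

    visit(root, ())
    return dict(guide)


# Hypothetical sample document.
DOC = """<catalog>
  <book><title>Book A</title><author>Author 1</author></book>
  <book><title>Book B</title><author>Author 2</author><author>Author 3</author></book>
</catalog>"""

guide = build_dataguide(ET.fromstring(DOC))
for path, targets in guide.items():
    print("/".join(path), len(targets))
```

Each of the four distinct label paths appears once in the summary, however many data nodes it reaches; for instance catalog/book/author has a target set of three nodes.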
Module
Tree Pattern Minimization
Translating XML Queries
Labeling Phase
Computing Phase
Tree Homeomorphism
Time Complexity
Module Description
Tree Pattern Minimization
The efficiency of TP matching depends heavily on the size of the pattern. It is thus essential to identify and
eliminate redundant nodes in the pattern, and to do so as efficiently as possible. This process is called TP
minimization. All research related to TP minimization formulates the problem as follows: given a
TP, find an equivalent TP of the smallest size.
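As an illustration of TP minimization, the sketch below repeatedly deletes a leaf whenever the original pattern still maps homomorphically into the smaller one; since the smaller pattern is always at least as general, such a mapping means the two are equivalent, so each deletion is safe. It assumes Boolean patterns with child ("/") and descendant ("//") edges and branching, but no wildcards and no integrity constraints (ICs). TP, maps_into, without_one_leaf and minimize are hypothetical names, and the loop is deliberately naive rather than one of the optimized algorithms reviewed in the survey.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class TP:
    label: str
    axis: str = "/"                      # edge from the parent: "/" child, "//" descendant
    children: Tuple["TP", ...] = ()


def proper_descendants(q: TP) -> Iterator[TP]:
    for c in q.children:
        yield c
        yield from proper_descendants(c)


def maps_into(p: TP, q: TP) -> bool:
    """Homomorphism test: labels agree, "/" edges map onto "/" edges,
    and "//" edges map onto any downward path of q."""
    if p.label != q.label:
        return False
    for pc in p.children:
        if pc.axis == "/":
            targets = [qc for qc in q.children if qc.axis == "/"]
        else:
            targets = list(proper_descendants(q))
        if not any(maps_into(pc, t) for t in targets):
            return False
    return True


def without_one_leaf(p: TP) -> Iterator[TP]:
    """Yield every pattern obtained from p by deleting exactly one non-root leaf."""
    for i, c in enumerate(p.children):
        if not c.children:
            yield TP(p.label, p.axis, p.children[:i] + p.children[i + 1:])
        else:
            for smaller in without_one_leaf(c):
                yield TP(p.label, p.axis, p.children[:i] + (smaller,) + p.children[i + 1:])


def minimize(p: TP) -> TP:
    changed = True
    while changed:
        changed = False
        for candidate in without_one_leaf(p):
            if maps_into(p, candidate):  # p maps into candidate: they are equivalent
                p, changed = candidate, True
                break
    return p


# a[b][.//b] is equivalent to a[b]: the descendant branch is implied.
print(minimize(TP("a", children=(TP("b"), TP("b", "//")))))
```

The pattern a[b][.//b] reduces to a[b] because the child branch already guarantees a descendant b, whereas deleting the child branch instead would lose the stricter "/" constraint and is therefore rejected by the homomorphism test.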
Translating XML Queries
Expressiveness is a complex issue. Translating XML queries into TPs is indeed easier than
translating TPs back into an XML query plan. XQuery, although it is the standard XML query language, suffers
from limitations such as the lack of a group-by construct. Thus, it is more efficient to implement TPs and
exploit them to enrich XML querying in an ad hoc environment such as TIMBER’s. We think that the richer the
pattern (with matching options, ordering specifications, the possibility to associate it with many operators, and other
options if possible), the more effective querying is in terms of satisfying user needs.
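The easy direction, translating a query into a TP, can be sketched for a deliberately tiny XPath subset: absolute location paths made of child (/) and descendant (//) steps over element names, each optionally followed by [name] existence predicates. The Python below is a hypothetical translator (xpath_to_tp and PNode are illustrative names, and the virtual "#doc" root is an assumption of this sketch): predicates become extra branches, and the last step is marked as the output node. Anything outside the subset raises ValueError rather than being guessed at.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class PNode:
    label: str
    axis: str                          # edge from the parent: "/" or "//"
    children: List["PNode"] = field(default_factory=list)
    output: bool = False               # the node whose matches are returned


NAME = r"[A-Za-z_][\w.-]*"
# One step: an axis, an element name, then zero or more [name] predicates.
STEP = re.compile(rf"(//|/)({NAME})((?:\[{NAME}\])*)")
PRED = re.compile(rf"\[({NAME})\]")


def xpath_to_tp(path: str) -> PNode:
    """Translate an absolute path such as //book[author]/title into a TP."""
    root = PNode("#doc", "/")          # virtual document root
    current, pos = root, 0
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at offset {pos}: {path[pos:]!r}")
        axis, name, preds = m.groups()
        node = PNode(name, axis)
        current.children.append(node)
        # Each existence predicate becomes a child branch of the step's node.
        node.children.extend(PNode(p, "/") for p in PRED.findall(preds))
        current, pos = node, m.end()
    if current is root:
        raise ValueError("empty path")
    current.output = True
    return root


tp = xpath_to_tp("//book[author]/title")
print(tp)
```

For //book[author]/title, the book node hangs below the virtual root by a descendant edge and gets two child branches, author (from the predicate) and title (the next step, marked as output).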
Labeling Phase
The aim of data tree labeling schemes is to determine the relationship between two nodes of a tree from
their labels alone. Many labeling schemes have been proposed in the literature. In this section, we particularly
focus on the region encoding and the DeweyID labeling schemes that are used in holistic approaches.
However, other approaches do exist, based, for instance, on tree-traversal order, prime numbers, or a combination
of a structural index and inverted lists.
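Both labeling schemes can be sketched in a few lines of Python. In this illustrative version (label_tree and the is_* helpers are hypothetical names), region labels are (start, end, level) triples taken from a single depth-first counter, and DeweyIDs are tuples of sibling positions starting with (1,) for the root, which is one convention among several. Ancestor-descendant and parent-child relationships are then decided from the labels alone, as described above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

Region = Tuple[int, int, int]   # (start, end, level)
Dewey = Tuple[int, ...]


def label_tree(root: ET.Element) -> Tuple[Dict[ET.Element, Region], Dict[ET.Element, Dewey]]:
    """Assign region labels and DeweyIDs to every element in one depth-first pass."""
    region: Dict[ET.Element, Region] = {}
    dewey: Dict[ET.Element, Dewey] = {}
    counter = 0

    def visit(elem: ET.Element, level: int, dewey_id: Dewey) -> None:
        nonlocal counter
        counter += 1
        start = counter                          # counter value on entering the node
        dewey[elem] = dewey_id
        for i, child in enumerate(elem, start=1):
            visit(child, level + 1, dewey_id + (i,))
        counter += 1                             # counter value on leaving the node
        region[elem] = (start, counter, level)

    visit(root, 1, (1,))
    return region, dewey


def is_ancestor_region(a: Region, d: Region) -> bool:
    return a[0] < d[0] and d[1] < a[1]           # a's interval strictly contains d's


def is_parent_region(a: Region, d: Region) -> bool:
    return is_ancestor_region(a, d) and d[2] == a[2] + 1


def is_ancestor_dewey(a: Dewey, d: Dewey) -> bool:
    return len(a) < len(d) and d[:len(a)] == a   # a is a proper prefix of d


def is_parent_dewey(a: Dewey, d: Dewey) -> bool:
    return is_ancestor_dewey(a, d) and len(d) == len(a) + 1


root = ET.fromstring("<a><b><c/></b><d/></a>")
b, d = root[0], root[1]
c = b[0]
region, dewey = label_tree(root)
print(region[c], dewey[c])
```

Here c receives the region label (3, 4, 3) and the DeweyID (1, 1, 1). The region test needs both labels' intervals, while the Dewey test only compares prefixes, which is why DeweyIDs also reveal all ancestors of a node directly.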
Computing Phase
Various holistic algorithms actually achieve TP matching, but they all exploit a data list that, for each node,
contains all labels of nodes of the same type. In this section, we first review the approaches based on the region
encoding scheme, which were proposed first, and then the approaches based on the DeweyID scheme.
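The stack-based flavor of region-encoding holistic matching can be conveyed by a simplified sketch in the spirit of PathStack, one of the early holistic algorithms. As an assumption, it is restricted to path patterns whose edges are all descendant edges, and it takes one data list per pattern node: the (start, end) region labels of all data nodes with that node's label, sorted by start. Chains of stacks compactly encode partial matches, which avoids the per-edge intermediate results of binary structural joins. The function name path_stack and the input format are illustrative, not taken from the survey.

```python
import math
from typing import Iterator, List, Tuple

Region = Tuple[int, int]   # (start, end) region label of a data node


def path_stack(streams: List[List[Region]]) -> List[Tuple[Region, ...]]:
    """Match the path pattern q0//q1//...//qn.

    streams[i] holds the region labels of all data nodes labeled like query
    node i, sorted by start. Returns every match as a tuple of regions.
    """
    n = len(streams)
    cursors = [0] * n
    # stacks[i] holds (region, index of the top of stacks[i-1] at push time).
    stacks: List[List[Tuple[Region, int]]] = [[] for _ in range(n)]
    results: List[Tuple[Region, ...]] = []

    def next_start(i: int) -> float:
        return streams[i][cursors[i]][0] if cursors[i] < len(streams[i]) else math.inf

    def expand(i: int, idx: int) -> Iterator[Tuple[Region, ...]]:
        # All entries at or below the saved pointer in the parent stack
        # are ancestors of stacks[i][idx], so each one extends the match.
        region, ptr = stacks[i][idx]
        if i == 0:
            yield (region,)
            return
        for j in range(ptr + 1):
            for prefix in expand(i - 1, j):
                yield prefix + (region,)

    while cursors[-1] < len(streams[-1]):
        q = min(range(n), key=next_start)        # next data node in document order
        cur = streams[q][cursors[q]]
        cursors[q] += 1
        for stack in stacks:                     # pop nodes that end before cur starts
            while stack and stack[-1][0][1] < cur[0]:
                stack.pop()
        if q > 0 and not stacks[q - 1]:
            continue                             # no ancestor candidate for cur
        stacks[q].append((cur, len(stacks[q - 1]) - 1 if q > 0 else -1))
        if q == n - 1:                           # leaf: output its matches, then discard
            results.extend(expand(q, len(stacks[q]) - 1))
            stacks[q].pop()
    return results


# a(1,10) contains b(2,5) > c(3,4) and b(6,9) > c(7,8); query a//b//c.
print(path_stack([[(1, 10)], [(2, 5), (6, 9)], [(3, 4), (7, 8)]]))
```

Each data list is scanned once, in document order. Supporting child edges would additionally require comparing levels when expanding matches, and full twig patterns need the branching extensions (such as TwigStack) that the survey reviews.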
Tree Homeomorphism
The tree homeomorphism matching problem is a particular case of the TP matching problem. More precisely,
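the considered TPs only bear descendant edges. Formally, given a TP p and a data tree t, tree homeomorphism
matching aims at determining whether there is a mapping from the nodes of p to the nodes of t that preserves
node labels and such that, whenever a node u is a child of a node v in p, the image of u is a descendant of the
image of v in t.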
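A direct, memoized test for this problem is sketched below in Python. Patterns and data trees are both written as nested (label, children) tuples, an assumed representation; the mapping is not required to be injective, and the pattern root may map to any data node, both of which are assumptions of this sketch. Memoization keeps it polynomial, but it is not one of the optimized algorithms from the literature, and the names embeds_at and homeomorphic_match are hypothetical.

```python
from functools import lru_cache
from typing import Iterator, Tuple

# Assumed representation for patterns and data trees alike: (label, (child, ...)).
Node = Tuple[str, tuple]


def descendants(node: Node) -> Iterator[Node]:
    for child in node[1]:
        yield child
        yield from descendants(child)


@lru_cache(maxsize=None)
def embeds_at(p: Node, t: Node) -> bool:
    """Can pattern node p map onto data node t? Labels must be equal and every
    pattern child must map onto some proper descendant of t."""
    if p[0] != t[0]:
        return False
    return all(any(embeds_at(pc, d) for d in descendants(t)) for pc in p[1])


def homeomorphic_match(p: Node, t: Node) -> bool:
    """True if p maps somewhere in t (its root may map to any data node)."""
    return embeds_at(p, t) or any(embeds_at(p, d) for d in descendants(t))


data = ("a", (("x", (("b", ()),)), ("c", ())))
print(homeomorphic_match(("a", (("b", ()), ("c", ()))), data))   # True: b sits below x
print(homeomorphic_match(("b", (("c", ()),)), data))             # False: no c below b
```

Because every pattern edge is a descendant edge, intermediate data nodes such as x may be skipped freely, which is exactly what distinguishes this problem from general TP matching with child edges.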
Time Complexity
Time complexity is quite well documented for minimization approaches. Except for the first, naive
algorithm, all optimized minimization algorithms, whether they take integrity constraints (ICs) into account or not,
have a worst-case time complexity of […].
FLOW CHART
[Figure: flow chart involving Documents, Books, Title and Editor nodes]
CONCLUSION
We provide in this paper a comprehensive survey of XML tree patterns, which are nowadays considered
crucial in XML querying and its optimization. We first compare TPs from a structural point of view, concluding
that the richer a TP is with matching possibilities, the larger the subset of XQuery/XPath it encompasses, and
thus the closer to user expectations it is. Second, acknowledging that TP querying, i.e., matching a TP against a
data tree, is central in TP usage, we review methods for TP matching optimization. They belong to two main
families: TP minimization and holistic matching. We trust that we provide a good overview of these approaches’
evolution, and highlight the best algorithms in each family as of today. Moreover, we want to emphasize that
TP minimization and holistic matching are complementary and should both be used to wholly optimize TP
matching. Finally, we illustrate how TPs are actually exploited in several application domains such as
system optimization, network routing or knowledge discovery from XML sources. We especially demonstrate
the use of frequent TP mining and TP rewriting for various purposes. Although TP-related research, which has
been ongoing for more than a decade, could look mature in the light of this survey, it is perpetually challenged
by the ever-growing acceptance and usage of XML. For instance, recent applications require either querying
data with a complex or only partially known structure, or integrating heterogeneous XML data sources (e.g.,
when dealing with streams). The keyword search-based languages that address these problems cannot be
expressed with TPs. Thus, TPs must be extended, e.g., by the so-called partial tree-pattern queries (PTPQs) that
allow the partial specification of a TP and are not restricted by a total order on nodes. In turn, adapted matching
procedures must be devised, a trend that is likely to persist in the near future. Moreover, we purposely focus
on the core of TP-related topics in this survey (namely, TPs themselves, matching issues and a couple of
applications). There is nonetheless a large number of important topics that we could not address
due to space constraints, such as TP indexing, TP-based view selection, TP for probabilistic XML, and
continuous TP matching over XML streams.
