Multi-Core Processing of XML Twig Patterns
Abstract:
XML is based on a tree-structured data model. Naturally, the most popular
XML querying language (XPath) uses patterns of selection predicates, on
multiple elements related by a tree structure, which often may be
abstracted by twig patterns. Finding all occurrences of such a twig pattern
in an XML database is a basic operation for XML query processing. We
present the parallel path stack algorithm (PPS) and the parallel twig stack
algorithm (PTS). PPS and PTS are novel and efficient algorithms for
matching XML query twig patterns on a parallel multi-threaded computing
platform. PPS and PTS are based on the PathStack and TwigStack
algorithms [1]. These algorithms employ a sophisticated search technique
for limiting processing to specific subtrees. We conducted extensive
experimentation with PPS and PTS. We compared PPS and PTS to the
standard (sequential) PathStack and TwigStack algorithms in terms of run
time (to completion). We checked their performance for varying numbers
of threads. Experimental results indicate that using PPS and PTS
significantly reduces the running time of queries in comparison with the
PathStack and TwigStack algorithms (up to 44 times faster for DBLP queries
and up to 22 times faster for XMark queries).
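To make the notion of a twig pattern concrete, the following is a minimal sketch that finds all occurrences of a small twig, an article element with both a title child and an author child, by brute force. It is not PathStack, TwigStack, PPS or PTS; the sample document, the tag names and the function name match_twig are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modelled on bibliographic records.
DOC = """
<dblp>
  <article><title>A</title><author>Ann</author><author>Bob</author></article>
  <article><author>Cid</author></article>
  <book><title>B</title><author>Dee</author></book>
</dblp>
"""


def match_twig(root):
    """Return (title, author) text pairs for the twig article[title][author]."""
    matches = []
    for article in root.iter("article"):      # every article, in document order
        titles = article.findall("title")     # direct title children
        authors = article.findall("author")   # direct author children
        # A twig occurrence needs one binding for every query node,
        # so an article without a title contributes nothing.
        for title in titles:
            for author in authors:
                matches.append((title.text, author.text))
    return matches


root = ET.fromstring(DOC)
result = match_twig(root)
print(result)
```

Each returned pair is one occurrence of the twig. A brute-force scan like this revisits elements for every query node, which is the kind of cost that stack-based algorithms are designed to avoid.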
Existing System:
The problem we address is how to speed up processing for twig queries, an
important subset of the XPath language, within a multi-core architecture.
Unlike prior work, we deal with parallelizing the execution of a single
XPath query. The space of possible algorithms, together with their
associated storage structures and indices, is enormous.
Proposed System:
In this paper we present the Parallel Path Stack algorithm (PPS) and the
Parallel Twig Stack algorithm (PTS), novel parallel algorithms for
processing a single twig pattern in an LStream environment. PPS and PTS
are based on the PathStack and TwigStack algorithms.
TwigStack is a basic algorithm for XML documents that utilizes the
LStream representation. The PathStack and TwigStack algorithms are
designed so that no large intermediate results are created. The reason we
chose these particular algorithms is that they are very well understood and
effective, and therefore can provide a good vehicle to demonstrate the
power of parallelism in the context of XML querying.
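The general idea of splitting a single query across threads can be sketched as follows. This is an illustration under stated assumptions, not the PPS or PTS algorithm: it assumes each element is numbered with a (start, end) region, that elements are kept in per-tag streams sorted by start, and that the document is partitioned by top-level subtree. The names encode, path_join and parallel_path_join are hypothetical. The join handles only a two-node ancestor//descendant path, and the ancestor tag is assumed not to be the document root.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET


def encode(root):
    """Number elements depth-first; return per-tag streams and top-level regions."""
    streams = {}          # tag -> [(start, end), ...]
    counter = 1           # position 1 is the root's start

    def visit(node):
        nonlocal counter
        counter += 1
        start = counter
        for child in node:
            visit(child)
        counter += 1
        region = (start, counter)
        streams.setdefault(node.tag, []).append(region)
        return region

    partitions = [visit(child) for child in root]   # one region per top-level subtree
    counter += 1
    streams.setdefault(root.tag, []).append((1, counter))
    for stream in streams.values():
        stream.sort()     # streams must be ordered by start position
    return streams, partitions


def path_join(ancestors, descendants):
    """Stack-based join: every (a, d) where region a strictly contains region d."""
    out, stack, i = [], [], 0
    for d in descendants:
        # Push each ancestor candidate that starts before d.
        while i < len(ancestors) and ancestors[i][0] < d[0]:
            a = ancestors[i]
            while stack and stack[-1][1] < a[0]:
                stack.pop()           # ended before a starts
            stack.append(a)
            i += 1
        while stack and stack[-1][1] < d[0]:
            stack.pop()               # ended before d starts
        out.extend((a, d) for a in stack)   # the stack is a chain of nested regions
    return out


def parallel_path_join(root, anc_tag, desc_tag, workers=4):
    """Run path_join once per top-level subtree and concatenate the results."""
    streams, partitions = encode(root)
    anc = streams.get(anc_tag, [])
    desc = streams.get(desc_tag, [])

    def work(region):
        lo, hi = region
        local_anc = [e for e in anc if lo <= e[0] and e[1] <= hi]
        local_desc = [e for e in desc if lo <= e[0] and e[1] <= hi]
        return path_join(local_anc, local_desc)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, partitions))
    return [pair for part in parts for pair in part]


# Hypothetical sample document.
DOC = "<lib><sec><sec><p/></sec><p/></sec><sec><p/></sec><p/></lib>"
root = ET.fromstring(DOC)
streams, _ = encode(root)
sequential = path_join(streams["sec"], streams["p"])
parallel = parallel_path_join(root, "sec", "p")
print(sorted(parallel) == sorted(sequential), len(parallel))
```

Because every match of a non-root ancestor lies inside a single top-level subtree, the partitions can be joined independently and no partial results have to be exchanged between threads. In standard CPython the global interpreter lock prevents threads from speeding up CPU-bound work like this, so the sketch shows only the decomposition; measuring a real speedup would need processes or a runtime with native parallel threads.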
Hardware Requirements:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15-inch VGA Colour.
• Mouse : Logitech.
• RAM : 256 MB.
Software Requirements:
• Operating System : Windows XP.
• Front End : JSP.
• Back End : SQL Server.
Software Requirements:
• Operating System : Windows XP.
• Front End : .Net.
• Back End : SQL Server.
