XML Parsing Basics
Basics and understanding
Malintha Adikari
Software Engineer
What is XML Parsing
An XML parser is the piece of software that reads XML files and makes
the information from those files available to applications and
programming languages.
XML parsing approaches...
● Tree-based APIs
○ Object (DOM, JDOM, etc.)
● Event-based APIs
○ Push (SAX)
○ Pull (StAX)
XML parsing approaches...
● Tree-based APIs
○ The whole XML document is parsed and a model of it is built in memory
○ This makes it possible to move back and forth through parts of the XML that have already been read
○ The model is usually larger than the original XML document, so it duplicates the data and consumes
extra memory
● Event-based APIs
○ An event-based parser reads through the XML document and fires "events" based on the
content it encounters
Object model ( DOM Tree)
The Document Object Model parser is a hierarchy-based parser that creates an object model of the
entire XML document, then hands that model to you to work with
Push model (SAX)
An event-based sequential access parser API that only operates on portions of the XML
document at any one time
Push model (SAX)
● SAX is a push style API
● This means that the SAX parser iterates through the XML and calls methods on the
handler object you provide. For instance, when the SAX parser encounters the
beginning of an XML element, it calls the startElement() method on your handler object
● It "pushes" the information from the XML into your object. Hence the name "push" style
API. This is also referred to as an "event driven" API
● Your handler object is notified with event-calls when something interesting is found in the
XML document ("interesting" = elements, texts, comments etc.).
The SAX parser push style parsing is illustrated here:
Push model (SAX)
● The parser continuously pushes events to the calling application until it finishes reading the
whole XML document
● This is more memory-efficient than DOM, because no in-memory model of the document is built
● Once started, it runs to the end of the document, and the caller must be ready to handle all
the events in one pass
● The caller that invokes the parser cannot pause or step through the parsing process; it can only
abort it (by throwing an exception from the handler).
● Once started, both tree-based and event-based push parsers consume the whole data stream in one go.
Push model (SAX)
● It works by iterating over the XML and calling certain methods on a "listener" object when it
meets certain structural elements of the XML
● For instance, it will call the listener object for the following "events":
- startDocument
- startElement
- characters
- comments (reported through the optional LexicalHandler extension rather than the main handler)
- processing instructions
- endElement
- endDocument
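The event list above can be seen in action with a minimal sketch that uses the JDK's built-in SAX API (javax.xml.parsers.SAXParserFactory). The class name SaxEventLogger and the sample XML are hypothetical. The handler extends org.xml.sax.ext.DefaultHandler2 so that it can also receive comments, which arrive through the LexicalHandler extension registered with the "lexical-handler" property, not through the core ContentHandler methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical example class: records every SAX event that is pushed into it.
public class SaxEventLogger extends DefaultHandler2 {
    private final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument()   { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) events.add("characters " + text); // skip whitespace-only runs
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction " + target);
    }

    // Comments come through the LexicalHandler extension, not ContentHandler.
    @Override
    public void comment(char[] ch, int start, int length) {
        events.add("comment " + new String(ch, start, length).trim());
    }

    public static List<String> parse(String xml) throws Exception {
        SaxEventLogger handler = new SaxEventLogger();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        // The parser owns the loop and pushes each event into the handler.
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><book><?note check?><!-- draft --><title>XML</title></book>";
        parse(xml).forEach(System.out::println);
    }
}
```

Note that the application never asks for the next event; it only reacts. A parser may also split one text run across several characters() calls, so real handlers usually buffer text until endElement().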
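Pull model
A pull parser leaves the application in charge: the application asks the parser for the next item when it
is ready, instead of having events pushed to it. (The Java Architecture for XML Binding, JAXB, which maps
Java classes to XML documents, is a data-binding API rather than a pull parser.)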
Pull model (StAX parser)
● StAX is a pull style API.
● This means that you have to move the StAX parser from item to item in the XML file yourself,
just like you do with a standard Iterator or JDBC ResultSet.
● You can then access the XML information via the StAX parser for each such "item" encountered
in the XML file ("item" = elements, texts, comments etc.).
● Unlike SAX, the client has full control to start, proceed, pause, and resume the parsing
process
The StAX parser pull style parsing is illustrated here:
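The pull style described above can be sketched with the JDK's javax.xml.stream.XMLStreamReader. The class name StaxPullExample and the sample XML are hypothetical; the point is that the application's own while loop calls next() and decides what to do with each item, much like iterating over a ResultSet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class: the application pulls each event itself.
public class StaxPullExample {

    // Collects the text of every <title> element by stepping through the stream.
    public static List<String> titles(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // the parser advances only when we ask it to
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals("title")) {
                    // getElementText() reads the text up to the matching end tag
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close(); // XMLStreamReader is not AutoCloseable, so close it explicitly
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<library><book><title>DOM</title></book>"
                   + "<book><title>StAX</title></book></library>";
        System.out.println(titles(xml)); // [DOM, StAX]
    }
}
```

Because the loop belongs to the application, it could stop after the first match with a plain break, or pass the reader to another method and resume later.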
String operations*
Using String operations on a loaded XML document to find bits of information manually, treating the XML as
a plain String; for instance, using the String class's indexOf() and other built-in methods. This approach is
neither scalable nor reusable
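To show why the string approach is fragile, here is a small sketch of a naive indexOf()-based extractor. The class and method names are hypothetical. It works on the simplest input, but it fails as soon as the tag carries an attribute, and it returns entity references such as &amp;amp; undecoded, both of which a real XML parser handles.

```java
// Hypothetical helper illustrating the naive String-based approach.
public class NaiveStringExtract {

    // Returns the text between <tag> and </tag>, or null if either is not found.
    public static String extract(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) return null;
        start += open.length();
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extract("<book><title>XML</title></book>", "title"));
        // A valid attribute breaks the literal "<title>" search, so this prints null.
        System.out.println(extract("<book><title lang=\"en\">XML</title></book>", "title"));
        // Entities are not decoded: this prints "a &amp; b", not "a & b".
        System.out.println(extract("<t>a &amp; b</t>", "t"));
    }
}
```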
Let’s start with the DOM parser...
Document Object Model (DOM)
o The Java DOM API for XML parsing is intended for working with XML as an object graph in
memory - a "Document Object Model (DOM)"
o The parser traverses the XML file and creates the corresponding DOM objects
o These DOM objects are linked together in a tree structure
o Then you can traverse the DOM structure back and forth as you see fit
DOM structure
DOM - 3 pieces of XML
1. Elements (sometimes called tags)
2. Attributes
3. The data (also called values) that the elements and attributes
describe
Start with DOM parser
1. Creating a Java DOM XML parser
Creating a Java DOM XML parser is done using the javax.xml.parsers.DocumentBuilderFactory class. Here is an example:
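import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
    builder = builderFactory.newDocumentBuilder();
} catch (ParserConfigurationException e) {
    e.printStackTrace();
}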
Start with DOM parser
2. Parsing an XML file into a DOM tree
Parsing an XML file into a DOM tree using the DocumentBuilder is done like this:
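import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

Document document = null;
// try-with-resources closes the file once parsing has finished
try (InputStream in = new FileInputStream("/path/to/your/file.xml")) {
    document = builder.parse(in);
} catch (SAXException | IOException e) {
    e.printStackTrace();
}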
You are now ready to traverse the Document instance you have received from the DocumentBuilder
DOM document element
● A DOM object contains a lot of different nodes connected in a tree-like structure.
● At the top is the Document object.
● The Document object has a single root element, which is returned by calling getDocumentElement()
Element rootElement = document.getDocumentElement();
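Putting the steps above together, the following sketch parses a document, gets the root element, and walks its children, reading the three pieces of XML (element, attribute, data). The class name DomTraversalExample and the sample XML are hypothetical; the DOM calls (getChildNodes(), getNodeType(), getAttribute(), getTextContent()) are standard org.w3c.dom methods. It parses from an in-memory string instead of a file so it runs anywhere.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical example: parse, get the root element, then walk its children.
public class DomTraversalExample {

    public static List<String> describeBooks(String xml) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = document.getDocumentElement();

        List<String> result = new ArrayList<>();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Whitespace between tags is a text node too, so keep only elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element book = (Element) child;
                String id = book.getAttribute("id");      // the attribute
                String title = book.getTextContent();     // the element's data
                result.add(book.getTagName() + " " + id + ": " + title.trim());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>\n"
                   + "  <book id=\"1\">DOM basics</book>\n"
                   + "  <book id=\"2\">SAX basics</book>\n"
                   + "</library>";
        describeBooks(xml).forEach(System.out::println);
    }
}
```

Since the whole tree stays in memory, the loop could just as well run backwards or revisit earlier books, which is the random access that the event-based parsers do not offer.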
Node, Element, Attribute...
[Diagram: Node at the top of the hierarchy, with Element, Attribute and Document beneath it]
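In org.w3c.dom, Element, Attr and Document are all sub-interfaces of Node, and getNodeType() tells the kinds apart. The sketch below, with a hypothetical class name and sample XML, counts the node types directly under the root. It shows that a comment becomes a node of its own in the tree, and that DocumentBuilderFactory.setIgnoringComments(true) drops such nodes during parsing.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: tally the node types directly under the root element.
public class NodeTypeCounter {

    public static Map<String, Integer> countChildTypes(String xml, boolean ignoreComments)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setIgnoringComments(ignoreComments); // when true, Comment nodes are not created
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Map<String, Integer> counts = new TreeMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String kind;
            switch (children.item(i).getNodeType()) {
                case Node.ELEMENT_NODE: kind = "element"; break;
                case Node.TEXT_NODE:    kind = "text";    break;
                case Node.COMMENT_NODE: kind = "comment"; break;
                default:                kind = "other";
            }
            counts.merge(kind, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config><!-- note --><port>8080</port></config>";
        System.out.println(countChildTypes(xml, false)); // {comment=1, element=1}
        System.out.println(countChildTypes(xml, true));  // {element=1}
    }
}
```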
Demo
When should I use DOM
❖ When you need random access to document data
➢ If random access to information is crucial, it is better to use the DOM to build a tree
structure of the data in memory.
❖ When you want to implement complex searches
➢ The in-memory tree keeps context information, such as the attributes of the current
element, available without you having to maintain your own data structures
❖ When the environment has no SAX implementation, as in browsers
➢ e.g. Microsoft’s Internet Explorer
❖ When you need to perform XSLT transformations
❖ When you want to modify and save XML
➢ DOM allows you to create or modify a document in memory, as well as read a document
from an XML source file
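As a sketch of the "modify and save" case, the code below loads XML into a DOM, appends a new element, and serializes the result with an identity javax.xml.transform.Transformer. The class name DomModifyAndSave and the element names are hypothetical. It writes to a String for easy checking; passing new StreamResult(new File(...)) instead would save to disk.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical example: load XML into a DOM, change it, and write it back out.
public class DomModifyAndSave {

    public static String addBook(String xml, String title) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Create a new element in memory and attach it under the root.
        Element book = doc.createElement("book");
        book.setAttribute("status", "new");
        book.setTextContent(title);
        doc.getDocumentElement().appendChild(book);

        // A Transformer with no stylesheet copies the tree as-is (identity transform).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addBook("<library><book>DOM</book></library>", "StAX"));
    }
}
```

The same TransformerFactory, given an XSLT stylesheet via newTransformer(Source), is also how the XSLT case in the list above is usually handled in Java.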
When should I use SAX
o When your documents are large: perhaps the biggest advantage of SAX is that it requires
significantly less memory to process an XML document than DOM. With SAX, memory
consumption does not grow with the size of the file
o When you need to abort parsing: because SAX allows you to abort processing at any time,
you can use it to create applications that fetch only particular data
o When you want to retrieve small amounts of information: for many XML-based solutions,
it is not necessary to read the entire document to achieve the desired results. Scanning only
a small percentage of the document results in significant savings of system resources.
o When you want to create a new document structure: in some cases, you might want to
use SAX to create a data structure using only high-level objects, such as stock symbols and
news, and then combine the data from this XML file with other news sources. Rather than
build a DOM structure with low-level elements, attributes, and processing instructions, you
can build the document structure more efficiently and quickly using SAX
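The "abort parsing" point can be sketched like this: the handler throws a SAXException as soon as it has the value it needs, which unwinds the parser so the rest of the document is never read. The class FirstPriceFinder, its private StopParsing marker exception, and the stock-quote XML are hypothetical. The caller treats a SAXException as a normal early stop only if the price was already found, so genuine parse errors are still rethrown.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: stop parsing as soon as the first <price> value is known.
public class FirstPriceFinder extends DefaultHandler {

    // Marker exception used only to stop the parser early.
    private static class StopParsing extends SAXException {
        StopParsing() { super("first price found"); }
    }

    private final StringBuilder text = new StringBuilder();
    private boolean inPrice;
    private String price;
    private int elementsSeen;

    public String price() { return price; }
    public int elementsSeen() { return elementsSeen; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementsSeen++;
        if (qName.equals("price")) inPrice = true;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inPrice) text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("price")) {
            price = text.toString();
            throw new StopParsing(); // the rest of the document is never read
        }
    }

    public static FirstPriceFinder find(String xml) throws Exception {
        FirstPriceFinder handler = new FirstPriceFinder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (handler.price == null) throw e; // a genuine parse error, not our early stop
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<quotes><stock><symbol>ABC</symbol><price>12.5</price></stock>"
                   + "<stock><symbol>XYZ</symbol><price>99</price></stock></quotes>";
        FirstPriceFinder f = find(xml);
        System.out.println(f.price() + " found after " + f.elementsSeen() + " elements");
    }
}
```

Here only four of the seven elements are visited. Throwing is the only way to stop a push parser early, which is why the pull model's simple break is often considered easier to work with.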
When should I use StAX
Most applications that process XML benefit from stream parsing, and most of them do
not need the entire DOM model in memory. That is the main advantage of pull parsing;
let's look at the other aspects.
● The client controls this parsing model, and parsing proceeds according to the client's
requirements. In the push model, by contrast, the client is "pushed" data whether or not
it needs it.
● Pull parsing libraries are much smaller than the corresponding push libraries, and the
client code that interacts with them is also small, even for complex documents.
● Filtering elements is easier: because the client decides when to move on, it has time to
decide what to do with a particular element when it arrives
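For the filtering point, StAX also lets the client plug a javax.xml.stream.StreamFilter into XMLInputFactory.createFilteredReader(), so the application only ever sees the events it accepts. The class StaxFilterExample and the quote XML are hypothetical. The loop handles the current event before calling next(), because a filtered reader may already be positioned on the first accepted event, and it checks isStartElement() because the reader can still report END_DOCUMENT at the end.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example: a StreamFilter hides every event except <stock> start tags.
public class StaxFilterExample {

    public static List<String> symbols(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader raw = factory.createXMLStreamReader(new StringReader(xml));

        // The filter is asked about each event; rejected events are skipped.
        StreamFilter onlyStocks = r -> r.isStartElement() && r.getLocalName().equals("stock");
        XMLStreamReader reader = factory.createFilteredReader(raw, onlyStocks);

        List<String> result = new ArrayList<>();
        try {
            while (true) {
                if (reader.isStartElement()) {
                    result.add(reader.getAttributeValue(null, "symbol"));
                }
                if (!reader.hasNext()) break;
                reader.next();
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<quotes><stock symbol=\"ABC\"><price>12.5</price></stock>"
                   + "<stock symbol=\"XYZ\"><price>99</price></stock></quotes>";
        System.out.println(symbols(xml));
    }
}
```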
Questions?
Exercise
1. Download the “automation.xml” file from the following SVN location:
https://svn.wso2.org/repos/wso2/carbon/platform/branches/turing/platform-integration/test-automation-framework-2/org.wso2.carbon.automation.engine/4.3.0/src/main/resources/automation.xml
2. Load the data in the XML file into your own data structure using the DOM parser. Develop your own API to retrieve node values,
e.g. getNodeValue(String value, Node parent)
* The provided XML file contains comments. Identify the effect comments have on building the DOM tree, and remove the XML
comments before you build the DOM tree.
Take an XML file and remove its comments before you build the DOM. What happens? Try to fix it.
Contact us!
