HPD Presentation
Semi-Structured data &
XML
Presented by-
Diksha R. Gupta
Roll no.:- 7
Hpd ppt
Semistructured Data
 Another data model, based on trees.
 Motivation: flexible representation of data.
◦ Often, data comes from multiple sources
with differences in notation, meaning, etc.
 Motivation: sharing of documents among
systems and databases.
Graphs of Semistructured Data
 Nodes = objects.
 Labels on arcs (attributes, relationships).
 Atomic values at leaf nodes (nodes with no
arcs out).
 Flexibility: no restriction on:
◦ Labels out of a node.
◦ Number of successors with a given label.
XML
 XML = Extensible Markup Language.
 While HTML uses tags for formatting
(e.g., “italic”), XML uses tags for
semantics (e.g., “this is an address”).
 Key idea: create tag sets for a domain
(e.g., genomics), and translate all data into
properly tagged XML documents.
HTML and XML
XML stands for Extensible Markup Language.
 HTML is used to mark up text so it can be displayed to users; XML is used to mark up data so it can be processed by computers.
 HTML describes both structure (e.g. <p>, <h2>, <tr>, <td>) and appearance (e.g. <br>, <font>, <i>); XML describes only content, or “meaning”.
 HTML uses a fixed, unchangeable set of tags; in XML, you make up your own tags.
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i>
Abiteboul, Hull, Vianu
<br> Addison Wesley, 1995
<p> <i> Data on the Web </i>
Abiteboul, Buneman, Suciu
<br> Morgan Kaufmann, 1999
XML
<bibliography>
<book> <title> Foundations… </title>
<author> Abiteboul </author>
<author> Hull </author>
<author> Vianu </author>
<publisher> Addison Wesley </publisher>
<year> 1995 </year>
</book>
…
</bibliography>
XML describes the content
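Because the tags name the content, a program can pull out fields directly. The following is a rough sketch using Python's standard `xml.etree.ElementTree` module. The document string is an assumption: it copies the first book of the slide, with the title written out as it appears in the HTML version.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the first book from the slide, typed out in full.
doc = """
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)
book = root.find("book")

# Each field is addressed by its tag name rather than by its position in the text.
title = book.findtext("title")
authors = [author.text for author in book.findall("author")]
year = int(book.findtext("year"))

print(title, authors, year)
```

The HTML version offers no comparable handle: the authors are plain text after an `<i>` element, so a program would have to guess where the list starts and ends.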
Well-Formed and Valid XML
 Well-Formed XML allows you to invent
your own tags.
◦ Similar to labels in semistructured data.
 Valid XML involves a DTD (Document
Type Definition), a grammar for tags.
Well-Formed XML
 Start the document with a declaration,
surrounded by <?xml … ?> .
 Normal declaration is:
<?xml version="1.0" standalone="yes"?>
◦ standalone="yes" = “no external DTD is needed.”
 Balance of document is a root tag
surrounding nested tags.
Tags
 Tags, as in HTML, are normally matched
pairs, as <FOO> … </FOO> .
 Tags may be nested arbitrarily.
 XML tags are case sensitive.
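One way to see these rules in action is to hand a few strings to a parser. This is a small sketch using Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = [
    "<FOO><BAR>x</BAR></FOO>",  # matched pairs, properly nested
    "<FOO><BAR>x</FOO></BAR>",  # end tags in the wrong order
    "<FOO>x</foo>",             # FOO and foo are different tags
    "<FOO>x",                   # missing end tag
]

for sample in samples:
    print(sample, is_well_formed(sample))
```

Only the first sample is accepted; the other three each break one of the rules listed above.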
XML and Semistructured Data
 Well-Formed XML with nested tags is exactly the same idea as trees of semistructured data.
 We shall see that XML also enables non-tree structures, as does the semistructured data model.
Example
 The <BARS> XML document is:
[Figure: tree for the BARS document. The root BARS has BAR children; the first BAR has a NAME (Joe’s Bar) and two BEER children, each with a NAME and a PRICE (Bud, 2.50; Miller, 3.00).]
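The same tree can be written as XML text and read back. The sketch below is an assumption rather than the slide's own listing: the markup is reconstructed from the tag names and values visible in the figure, and it is parsed with Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Reconstructed from the figure: one BAR with a NAME and two BEER children.
bars_xml = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>
"""

bars = ET.fromstring(bars_xml)

# Walk BARS -> BAR -> BEER and collect one price per (bar, beer) pair.
prices = {}
for bar in bars.findall("BAR"):
    bar_name = bar.findtext("NAME")
    for beer in bar.findall("BEER"):
        prices[(bar_name, beer.findtext("NAME"))] = float(beer.findtext("PRICE"))

print(prices)
```

Note that NAME appears at two levels with two meanings (a bar's name and a beer's name); the path from the root is what tells them apart.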
XML Hierarchical (Tree) Data Model
(contd.)
 The basic object in XML is the XML document.
 There are two main structuring concepts that are used to construct an XML document:
◦ Elements
◦ Attributes
 Attributes in XML provide additional information that describes elements.
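To make the distinction concrete, here is a short sketch with Python's standard `xml.etree.ElementTree`. The `paper` element, its `keywords` attribute, and the values are hypothetical, chosen to echo the schema sample at the end of the deck.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one attribute on the element, two child elements inside it.
paper = ET.fromstring(
    '<paper keywords="xml semistructured">'
    "<author>Abiteboul</author>"
    "<author>Suciu</author>"
    "</paper>"
)

# Attributes are name/value pairs attached to the start tag.
keywords = paper.get("keywords").split()

# Child elements are nodes of their own and may repeat.
authors = [author.text for author in paper.findall("author")]

print(keywords, authors)
```

An attribute name can occur at most once on an element, whereas a child element such as `author` can appear any number of times.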
XML Hierarchical (Tree) Data Model
(contd.)
 As in HTML, elements are identified in a document by
their start tag and end tag.
◦ The tag names are enclosed between angle brackets <…>, and end tags are further identified by a slash </…>.
 Complex elements are constructed from other elements
hierarchically, whereas simple elements contain data
values.
 It is straightforward to see the correspondence between
the XML textual representation and the tree structure.
◦ In the tree representation, internal nodes represent
complex elements, whereas leaf nodes represent
simple elements.
◦ That is why the XML model is called a tree model or
a hierarchical model.
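The correspondence can be sketched in a few lines of Python with the standard `xml.etree.ElementTree` module. The function name `classify` and the sample fragment are illustrative assumptions; an element is treated as complex when it has child elements and as simple otherwise.

```python
import xml.etree.ElementTree as ET


def classify(element, depth=0, out=None):
    """Collect (depth, tag, kind) for every element, in document order."""
    if out is None:
        out = []
    # len(element) is the number of child elements: zero means a leaf node.
    kind = "complex" if len(element) > 0 else "simple"
    out.append((depth, element.tag, kind))
    for child in element:
        classify(child, depth + 1, out)
    return out


fragment = ET.fromstring(
    "<BAR><NAME>Joe's Bar</NAME>"
    "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR>"
)

for depth, tag, kind in classify(fragment):
    print("  " * depth + tag, kind)
```

The indentation printed by the loop mirrors the nesting of the tags, which is the tree structure the slide refers to.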
XML Hierarchical (Tree) Data Model
(contd.)
 It is possible to characterize three main types of XML documents:
1. Data-centric XML documents:
 These documents have many small data items that follow a specific structure, and hence may be extracted from a structured database. They are formatted as XML documents in order to exchange them or display them over the Web.
2. Document-centric XML documents:
 These are documents with large amounts of text, such as news articles or books. There are few or no structured data elements in these documents.
3. Hybrid XML documents:
 These documents may have parts that contain structured data and other parts that are predominantly textual or unstructured.
DTD Structure
<!DOCTYPE <root tag> [
<!ELEMENT <name>(<components>)>
. . . more elements . . .
]>
DTD Elements
 The description of an element consists of
its name (tag), and a parenthesized
description of any nested tags.
◦ Includes order of subtags and their
multiplicity.
 Leaves (text elements) have #PCDATA (Parsed Character DATA) in place of nested tags.
Example: DTD
<!DOCTYPE BARS [
<!ELEMENT BARS (BAR*)>
<!ELEMENT BAR (NAME, BEER+)>
<!ELEMENT NAME (#PCDATA)>
<!ELEMENT BEER (NAME, PRICE)>
<!ELEMENT PRICE (#PCDATA)>
]>
A BARS object has zero or more BARs nested within.
A BAR has one NAME and one or more BEER subobjects.
A BEER has a NAME and a PRICE.
NAME and PRICE are text.
Element Descriptions
 Subtags must appear in the order shown.
 A tag may be followed by a symbol to
indicate its multiplicity.
◦ * = zero or more.
◦ + = one or more.
◦ ? = zero or one.
 Symbol | can connect alternative sequences
of tags.
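The multiplicity symbols behave like regular-expression operators over the sequence of child tags. Python's standard `xml.etree.ElementTree` does not validate against a DTD, so the sketch below is a hand-rolled illustration, not a real validator: the content models of the BARS DTD are translated by hand into regular expressions, and the names `CONTENT_MODELS`, `TEXT_ONLY`, and `conforms` are hypothetical. It checks only the order and count of child elements.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of the BARS DTD. Each child tag is written as "TAG " so
# that a sequence of children becomes a single string to match.
CONTENT_MODELS = {
    "BARS": r"(BAR )*",        # (BAR*)
    "BAR": r"NAME (BEER )+",   # (NAME, BEER+)
    "BEER": r"NAME PRICE ",    # (NAME, PRICE)
}
TEXT_ONLY = {"NAME", "PRICE"}  # declared as (#PCDATA)


def conforms(element):
    """Check child order and multiplicity against the models, recursively."""
    children = "".join(child.tag + " " for child in element)
    if element.tag in TEXT_ONLY:
        return children == ""
    model = CONTENT_MODELS.get(element.tag)
    if model is None or re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


good = ET.fromstring(
    "<BARS><BAR><NAME>Joe's Bar</NAME>"
    "<BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER></BAR></BARS>"
)
no_beer = ET.fromstring("<BARS><BAR><NAME>Joe's Bar</NAME></BAR></BARS>")

print(conforms(good), conforms(no_beer))
```

The second document is well-formed but not valid: BEER+ demands at least one BEER inside every BAR.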
XML Schema
 In XML format
 Element names and types associated locally
 Includes primitive data types (integers, strings,
dates, etc.)
 Supports value-based constraints (e.g., integers > 100)
 User-definable structured types
 Inheritance (extension or restriction)
 Foreign keys
 Element-type reference constraints
Sample XML Schema
<schema version="1.0" xmlns="http://www.w3.org/1999/XMLSchema">
  <element name="author" type="string" />
  <element name="date" type="date" />
  <element name="abstract">
    <type>
      …
    </type>
  </element>
  <element name="paper">
    <type>
      <attribute name="keywords" type="string" />
      <element ref="author" minOccurs="0" maxOccurs="*" />
      <element ref="date" />
      <element ref="abstract" minOccurs="0" maxOccurs="1" />
      <element ref="body" />
    </type>
  </element>
</schema>