Amarnath Gupta
University of California San Diego
NIF as a Multi-Model Semantic
Information System
Part 1: Relational, XML, RDF and OWL models
Preamble – 1
 As we design and extend the NIF system we
recognize that
 Users will give us data in any form that is
convenient for them
 Standard data may be stored in a flat file
 Web service output can be in XML
 Semantic Web enthusiasts may represent data using
proper RDF
 However, regardless of the form in which data
may be represented
 The NIF system must treat them
(query, index, relate, ...) in a uniform manner
 The NIF system must utilize the underlying systems
Preamble – 2
 In this presentation we intend to
 Explain our perspective on these different
data models
 Provide a background on the data models
we consider
 Offer a sense of the “semantic character” of
these data models
 Present our design philosophy on
 Where to keep them separate
 Where to transform them into a common model
What is a Data Model?
 A conceptual data model
 A formal representation of the users’/application’s
mental model of data elements and their
relationships that should be put in a
database, manipulated, queried and operated upon
 A logical data model
 A formal description of the data model in a logical
structure that a computer can use to perform the
queries and other operations. In many cases, the
same conceptual model can be represented by
different logical models
 A physical data model
 An implementable version of the data model in
terms of data structures, access structures
(e.g., indices) and the set of low-level operations
A Conceptual Model
ORM Model – Terry Halpin
Object
Relationship/
Role
Value
Constraint
Uniqueness
Constraint
Inter-relationship
Constraint
Value
Type
n-ary
Role
A Logical Data Model
 A formal specification of
 The structure of the data
 The structure tells us how the data is organized
(123, “Purkinje Cell”, Cerebellum)
(828, Hippocampus, “Hilar Cell”)
 Often the structure of the data, together with some
constraints, represents some semantics
 If the data are not structured (like free text), the techniques for
handling them will be different.
 Operations on this structure
 Every data model is based on some mathematical principles
that define what you can do with the data
 the nature of data values
 Data domains and data types
 operations on data values
The Relational Data Model
NeuronID  NeuronName     BrainRegion    NeuroTransmitter  Current
1         Purkinje Cell  Cerebellum     Glutamate         Transient Na+
2         Hilar Cell     Dentate Gyrus  GABA              Ca2+
 Attribute domain: all possible values the attribute can
take
 Candidate key: a set of columns that uniquely
determines a row
 The relational model is a set-of-tuples (in practice, bag-of-tuples) model
 Metadata stored in a separate catalog which is also
relational
 First order constraints
 All queries are about some combination of
 Selecting rows, columns
 Combining tables by union, intersection, join
 Computing data or aggregate functions
 Grouping and sorting
Figure labels on the table: Relation name (Table: Neurons); Attribute name; Attribute value: Cannot be compl; Tuple
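The query classes listed on this slide can be tried out with Python's built-in sqlite3 module. This is only an illustrative sketch: the rows and column names come from the Neurons table above, while the in-memory database and the three specific queries are assumptions chosen for the example.

```python
import sqlite3

# Build the two-row Neurons relation from the slide in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    '''CREATE TABLE Neurons (
           NeuronID INTEGER PRIMARY KEY,  -- candidate key chosen as primary key
           NeuronName TEXT,
           BrainRegion TEXT,
           NeuroTransmitter TEXT,
           "Current" TEXT                 -- quoted because CURRENT is an SQL keyword
       )'''
)
conn.executemany(
    "INSERT INTO Neurons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Purkinje Cell", "Cerebellum", "Glutamate", "Transient Na+"),
        (2, "Hilar Cell", "Dentate Gyrus", "GABA", "Ca2+"),
    ],
)

# Selecting rows and columns.
gaba_neurons = conn.execute(
    "SELECT NeuronName FROM Neurons WHERE NeuroTransmitter = ?", ("GABA",)
).fetchall()

# Grouping, aggregating and sorting.
per_region = conn.execute(
    "SELECT BrainRegion, COUNT(*) FROM Neurons "
    "GROUP BY BrainRegion ORDER BY BrainRegion"
).fetchall()

# The catalog is itself relational: table metadata is queried like data.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

Note that the catalog query and the data queries are separate statements, in line with the point that metadata lives in its own (relational) catalog.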
Object Relational Model
 Eases some of the problems of the classical relational
model
 Data values can be of arbitrary data types
 Sets (e.g., multiple currents for a neuron)
 Tuples (e.g., references ordered by year)
 Time-series (e.g., raw EEG data)
 Spatial Data (e.g., atlases in CCDB)
 Each data type can have its own operations
 Find all data points within a neighborhood of a spatial location
 Queries are still values
 Catalog queries and data queries cannot be mixed in a single
query
 All industrial-strength DBMSs use some version of this
model
 Need to be a skilled DB programmer to develop custom
XML (Two Perspectives)
 Document Community
 data = linear text documents
 mark up (annotate) text pieces to describe
context, structure, semantics of the marked text
<physiologicalCondition> Oxidative stress </physiologicalCondition> has been
proposed to be involved in the <biologicalProcess context="disease"> pathogenesis
</biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible
source of <physiologicalCondition> oxidative stress </physiologicalCondition> in
<brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is
the redox reactions that specifically involve <chemical> dopamine </chemical> and
produce various <chemical context="biologicalAgent"> toxic </chemical> molecules.
XML (Two Perspectives)
 Database Community
 XML as a (most prominent) example of the semi-structured data model
=> captures the whole spectrum from highly
structured, regular data to unstructured data
(relational, object-oriented, marked up text, ...)
<?xml version="1.0" encoding="utf-8"?>
<NDTF_Annotation>
<description>A new annotation file </description>
<timeMarker>true</timeMarker>
<timeResolution>0.000001</timeResolution>
<interval group_id="04">
<eventNote timeOffset="1237888.230" attachedFile="sound1.wmv"
application="realplayer">Text message for the event
start.</eventNote>
<eventNote timeOffset="18958585.232">Text message for the event
end.</eventNote>
</interval>
</NDTF_Annotation>
From the CARMEN gro
XML as a Logical Data Model
 XML is a tree-structured
document
 Nodes
 Element nodes
 Children can be ordered
 Recursive elements
(parts under parts)
 Attribute nodes
 Mandatory or optional
 Edges
 Sub-element edges
 Attribute edges
 IDRef edges
 Constraints
 References
 Value restrictions, OneOf
 Cardinality
• Trees are more flexible than
tables
• Any number of nodes can be
added anywhere without
breaking the model
XML as a Logical Data Model
• XML has its own schema language
• Lets you specify a complex type system
• A database is a collection of XML trees
 Storing XML
 Mostly relational with some very clever indexing to encode
the hierarchy, tree paths, and order
 Querying XML
 Elements, attribute names, values and structure can be
queried
 Multiple trees can be joined by value
 Example (XPath)
 http://mousespinal.brain-map.org/imageseries/detail/100002661.xml
 Find images of the spinal column
 //image[//structurelabel/text()="SPINAL COLUMN"]/ish_image_path
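A query of this kind can be tried with Python's standard xml.etree.ElementTree, which implements a subset of XPath. The document below is a hypothetical stand-in, since the referenced file is not reproduced here; only the element names image, structurelabel and ish_image_path are taken from the slide, and the nesting and file paths are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the image-series document. The nesting of
# structurelabel inside image is an assumption made for this sketch.
DOC = """
<imageseries>
  <image>
    <structurelabel>SPINAL COLUMN</structurelabel>
    <ish_image_path>/images/0001.jpg</ish_image_path>
  </image>
  <image>
    <structurelabel>BRAIN STEM</structurelabel>
    <ish_image_path>/images/0002.jpg</ish_image_path>
  </image>
</imageseries>
"""

root = ET.fromstring(DOC)

# [child='text'] keeps elements that have a child with exactly that text;
# the trailing step then selects the image path of each match.
paths = [
    node.text
    for node in root.findall(".//image[structurelabel='SPINAL COLUMN']/ish_image_path")
]
```

Here the predicate is evaluated relative to each image element, so only images that themselves carry the label are returned.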
Misusing and Abusing XML
 Using XML if your data is relational
 It will result in flat trees that will suffer from complex
querying
 Encoding orders and hierarchies that need special
parsing
<Brand_Mixtures count="2">
<Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa)
</Brand_Mixture_1>
<Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets
(carbidopa + levodopa) </Brand_Mixture_2>
</Brand_Mixtures>
 Using implicit multi-valuedness
<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
<array dictRef="cml:calcCharge" dataType="xsd:decimal"
units="cml:electron">0.2 -0.3
0.1</array>
</atomArray>
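To see why implicit multi-valuedness is a problem, consider what an application has to do to recover individual atoms from the atomArray fragment above. The following sketch uses the standard xml.etree.ElementTree parser; the dictionary keys (id, element, hydrogen_count, charge) are names invented for this example.

```python
import xml.etree.ElementTree as ET

# The atomArray fragment from the slide. Prefixes such as cml: appear only
# inside attribute values, so no namespace declaration is needed to parse it.
FRAGMENT = """
<atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3">
  <array dictRef="cml:calcCharge" dataType="xsd:decimal"
         units="cml:electron">0.2 -0.3 0.1</array>
</atomArray>
"""

atom_array = ET.fromstring(FRAGMENT)

# An XML query language sees each attribute as one string, so the application
# has to split the strings and re-align the positions itself.
ids = atom_array.get("atomID").split()
elements = atom_array.get("elementType").split()
hydrogens = [int(h) for h in atom_array.get("hydrogenCount").split()]
charges = [float(c) for c in atom_array.find("array").text.split()]

atoms = [
    {"id": i, "element": e, "hydrogen_count": h, "charge": c}
    for i, e, h, c in zip(ids, elements, hydrogens, charges)
]
```

None of this positional alignment is visible to XPath or to an XML schema, which is the sense in which the encoding needs special parsing.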
Expressing Semantics in XML
 Adorning elements with Namespaces
 A namespace is identified by a unique URI (Uniform Resource
Identifier)
 To disambiguate between two elements that happen to share
the same name
 To group elements relating to a common idea together
<item xmlns:bp="http://www.biopax.org/release/biopax-level1.owl#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<bp:protein ID="Protein1">
<bp:NAME>Metalloelastase</bp:NAME>
<bp:XREF>
<bp:unificationXref rdf:ID="Xref1">
<bp:ID>NP_304845</bp:ID>
<bp:DB>RefSeq</bp:DB>
</bp:unificationXref>
</bp:XREF>
</bp:protein>
</item>
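The disambiguating role of namespaces can be illustrated with Python's xml.etree.ElementTree. This sketch assumes the fragment above is closed with an item end tag and that the rdf prefix is bound to the standard RDF syntax namespace; the variable names are invented for the example.

```python
import xml.etree.ElementTree as ET

BP = "http://www.biopax.org/release/biopax-level1.owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment, closed and given an rdf prefix so that it parses.
DOC = f"""
<item xmlns:bp="{BP}" xmlns:rdf="{RDF}">
  <bp:protein ID="Protein1">
    <bp:NAME>Metalloelastase</bp:NAME>
    <bp:XREF>
      <bp:unificationXref rdf:ID="Xref1">
        <bp:ID>NP_304845</bp:ID>
        <bp:DB>RefSeq</bp:DB>
      </bp:unificationXref>
    </bp:XREF>
  </bp:protein>
</item>
"""

root = ET.fromstring(DOC)
ns = {"bp": BP}  # prefix map used only by the queries below

# The bp:ID element and the un-prefixed ID attribute share a local name
# but are kept apart by the namespace.
protein = root.find("bp:protein", ns)
name = protein.find("bp:NAME", ns).text
xref_id = protein.find(".//bp:unificationXref/bp:ID", ns).text
expanded_tag = protein.tag  # ElementTree expands the prefix to {URI}localname
```

The expanded tag shows that the prefix itself carries no meaning; only the URI it is bound to does.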
The Problem with XML
Semantics
 Two different XML
representations of the
same kind of
information may not
be easily unifiable
 What did XML not
encode?
Resource Description Framework
(RDF)
Figure labels (RDF statement graph):
Node labels: rdf:statement, URI(CNTFR-, URI(modulates), URI(eSNCA-mediated neurotoxicity), URI(membrane-protein), URI(protein-mediated toxicity), rdf:property
Edge labels: rdf:subject, rdf:predicate, rdf:object, rdf:type
The Basic Constructs of RDF
 RDF meta-model basic elements
 All defined in rdf namespace
 http://www.w3.org/1999/02/22-rdf-syntax-ns#
 Types (or classes)
 rdfs:Resource – everything that can be identified (with a
URI); this class is declared in the RDF Schema namespace
 rdf:Property – specialization of a resource expressing a
binary relation between two resources
 rdf:Statement – a triple with properties
rdf:subject, rdf:predicate, rdf:object
 Properties
 rdf:type - subject is an instance of that category or
class defined by the value
 rdf:subject, rdf:predicate, rdf:object – relate elements
of statement tuple to a resource of type statement.
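The meta-model above can be sketched in a few lines of plain Python, without an RDF library. The resource names (ex:proteinA, ex:modulates, ex:processB, ex:stmt1) are hypothetical, and the helper functions reify and match are invented for this illustration; they are not part of any RDF API.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

# A hypothetical statement: some protein modulates some process.
stmt = Triple("ex:proteinA", "ex:modulates", "ex:processB")

def reify(statement_id: str, t: Triple) -> list[Triple]:
    """Describe a triple as a resource of type rdf:Statement (four triples)."""
    return [
        Triple(statement_id, "rdf:type", "rdf:Statement"),
        Triple(statement_id, "rdf:subject", t.subject),
        Triple(statement_id, "rdf:predicate", t.predicate),
        Triple(statement_id, "rdf:object", t.obj),
    ]

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]

# The graph holds the statement, the typing of its predicate, and the
# reified description of the statement itself.
graph = [stmt, Triple("ex:modulates", "rdf:type", "rdf:Property")] + reify("ex:stmt1", stmt)
```

Reification is what allows statements about statements: ex:stmt1 is an ordinary resource that other triples can refer to.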
Relational Data vis-à-vis RDF
 Node to edge ratio is
relatively small in
many applications
 Number of
relationships need not
be fixed at design time
 The general tendency is to keep the number of
edge labels small
 Graph-based operations (e.g., path traversal) can be
performed directly on RDF, whereas the same operations
require an unspecified number of joins over relational
data
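The last point can be made concrete with a small traversal over triples. The part-of edges and names below are hypothetical; the sketch only shows that the number of hops is discovered at run time, which in SQL would correspond to one more self-join per hop (or a recursive query).

```python
from collections import deque

# Hypothetical part-of edges; the names are illustrative only.
triples = [
    ("ex:CA1", "ex:partOf", "ex:Hippocampus"),
    ("ex:Hippocampus", "ex:partOf", "ex:LimbicSystem"),
    ("ex:LimbicSystem", "ex:partOf", "ex:Brain"),
    ("ex:Cerebellum", "ex:partOf", "ex:Brain"),
]

def reachable(graph, start, predicate):
    """Breadth-first traversal along one edge label.

    Each iteration of the outer loop plays the role of one more self-join
    on a triple table; how many are needed is not known in advance.
    """
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in graph:
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen

ancestors = reachable(triples, "ex:CA1", "ex:partOf")
```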
RDF Blank Nodes
 RDF allows one to create anonymous objects whose
existence is known but details are not
 There exists some neuron to which both NeuronX and
NeuronY connect
 <neurons:NeuronX
rdf:about="http://neurons.org/Neuron#NeuronX">
<conn:connectsTo>
<neurons:Neuron rdf:nodeID="n1"/>
</conn:connectsTo>
</neurons:NeuronX>
 <neurons:NeuronY
rdf:about="http://neurons.org/Neuron#NeuronY">
<conn:connectsTo>
<neurons:Neuron rdf:nodeID="n1"/>
</conn:connectsTo>
</neurons:NeuronY>
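The role of the blank node can be sketched with plain Python triples. The abbreviations ex:NeuronX and ex:NeuronY stand in for the two rdf:about URIs, and the helper names are invented for this example; only the three triples that involve the blank node are shown.

```python
# "_:n1" is a blank node label. It is local to this graph and is not a
# global identifier, unlike the URIs of NeuronX and NeuronY.
graph = {
    ("ex:NeuronX", "conn:connectsTo", "_:n1"),
    ("ex:NeuronY", "conn:connectsTo", "_:n1"),
    ("_:n1", "rdf:type", "neurons:Neuron"),
}

def is_blank(term: str) -> bool:
    """Blank nodes are written with the conventional '_:' prefix."""
    return term.startswith("_:")

def rename_blank(graph, old: str, new: str):
    """Relabel a blank node; the meaning of the graph does not change."""
    def swap(term):
        return new if term == old else term
    return {(swap(s), p, swap(o)) for s, p, o in graph}

def shared_targets(graph, a: str, b: str, predicate: str):
    """Nodes, named or anonymous, that both a and b point to via predicate."""
    targets_a = {o for s, p, o in graph if s == a and p == predicate}
    targets_b = {o for s, p, o in graph if s == b and p == predicate}
    return targets_a & targets_b

common = shared_targets(graph, "ex:NeuronX", "ex:NeuronY", "conn:connectsTo")
relabelled = rename_blank(graph, "_:n1", "_:b42")
```

The graph asserts that a common target exists without naming it; relabelling the blank node gives a graph that says exactly the same thing.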
RDF Schema
 Declaration of vocabularies
 classes, properties, and relationships defined by a
particular community
 rdfs:Class, rdfs:subClassOf
 Property-related
 rdfs:subPropertyOf
 relationship of properties to classes
 rdfs:domain, rdfs:range
 Provides substructure for inferences based on existing
triples
 NOT prescriptive, but descriptive
 This is different from XML Schema
 Schema language is an expression of basic RDF model
 uses meta-model constructs:
resources, statements, properties
Examples of RDF Inferencing
 From this we can infer
 (:alice rdf:type :parent)
 (:betty rdf:type :parent)
 (:eve rdf:type :female-person)
 (:charles rdf:type :person)
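Inferences like these follow from the rdfs:domain, rdfs:range and rdfs:subClassOf rules. The input graph that the slide refers to is not reproduced in the text, so the asserted triples below (the properties :hasChild and :hasMother and their domain and range declarations) are assumptions picked so that the four statements above come out; the function name rdfs_closure is likewise invented.

```python
RDF_TYPE = "rdf:type"
DOMAIN, RANGE, SUBCLASS = "rdfs:domain", "rdfs:range", "rdfs:subClassOf"

# Hypothetical input graph, chosen to yield the slide's inferred statements.
asserted = {
    (":hasChild", DOMAIN, ":parent"),
    (":hasChild", RANGE, ":person"),
    (":hasMother", RANGE, ":female-person"),
    (":alice", ":hasChild", ":betty"),
    (":betty", ":hasChild", ":charles"),
    (":alice", ":hasMother", ":eve"),
}

def rdfs_closure(graph):
    """Apply domain, range and subclass rules until no new triple appears."""
    closure = set(graph)
    while True:
        new = set()
        for s, p, o in closure:
            for p2, kind, cls in closure:
                if p2 == p and kind == DOMAIN:
                    new.add((s, RDF_TYPE, cls))      # subject gets the domain class
                if p2 == p and kind == RANGE:
                    new.add((o, RDF_TYPE, cls))      # object gets the range class
            if p == RDF_TYPE:
                for c, kind, sup in closure:
                    if c == o and kind == SUBCLASS:
                        new.add((s, RDF_TYPE, sup))  # instance moves up the hierarchy
        if new <= closure:
            return closure
        closure |= new

inferred = rdfs_closure(asserted) - asserted
```

Note that the rules only add triples. Nothing is rejected, which is the sense in which RDF Schema is descriptive rather than prescriptive.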
RDF as a Logical Data Model
 RDF does not distinguish between different
relationships
 Instance-to-type
 Instance-to-instance
 Type-to-instance
 No transitivity inference is possible over, say, rdf:type
 RDF (as well as XML) has lost the notion of the
abstract data type like spatial object or time
 Operations on object types do not mix well with RDF
 Constraints like uniqueness, 1-to-1
relationships, cannot be expressed
 SPARQL, the query language for RDF, is
 An edge-only language – it cannot express the //
construct of XML
 Blank nodes are treated as variables not output in the
results
 Parts of the language are undecidable!
A problem is undecidable if it can be proved that there can be no algorithm
OWL
 Components of an OWL Ontology
 Vocabulary (concepts)
 Structure (attributes of concepts and hierarchy)
 Concept-to-concept, concept-to-data, property-to-
property relationships
 Logical characteristics of relationships
 Domain and range restrictions
 Properties of relations (symmetry, transitivity)
 Cardinality of relations
 Open world vs. Closed world assumptions
 Contrast to most reasoning systems that assume
anything absent from knowledge base is not true
 Need to maintain monotonicity with tolerance for
contradictions
 OWL Classes
Class of all classes
Basic OWL Constructs
 Creating OWL Classes
 disjointWith
 Neurons are not glial cells
 equivalentClass (equivalence; sameClassAs in DAML+OIL)
 Class Gabaergic neuron is exactly the same class as
neurons which have GABA as neurotransmitter
 Enumerations (on instances)
 Class Cerebellar lobules are Lobule I, Lobule II, …
 Boolean set semantics (on classes)
 Union (logical disjunction)
 Class nerve cell is union of neuron, glial cell
 Intersection (logical conjunction of class with properties)
 Class hippocampal neuron is the conjunction of class Neuron
and things that have the property (has-soma-located-in) with a value in
(hippocampus union any class that is (part-of)
hippocampus)
 complementOf (logical negation)
 Class ‘benign tumor’ is the complement of class ‘malignant
tumor’
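The Boolean constructors have a direct set-theoretic reading, which can be sketched with Python sets over class extensions. The individuals (n1, n2, n3, g1) are hypothetical, and the sketch takes a closed, finite view of the universe, which is a simplification: an OWL reasoner works under the open world assumption and does not enumerate complements this way.

```python
# Hypothetical individuals; the class names follow the slide's examples.
neuron = {"n1", "n2", "n3"}
glial_cell = {"g1"}
uses_gaba = {"n2", "n3"}            # things with GABA as neurotransmitter
soma_in_hippocampus = {"n2", "g1"}  # things whose soma lies in the hippocampus

# disjointWith: the two extensions share no individual.
neurons_disjoint_from_glia = neuron.isdisjoint(glial_cell)

# Union (logical disjunction).
nerve_cell = neuron | glial_cell

# Intersection of a class with a property restriction.
gabaergic_neuron = neuron & uses_gaba
hippocampal_neuron = neuron & soma_in_hippocampus

# complementOf, taken relative to the known individuals only (an assumption).
universe = nerve_cell
not_neuron = universe - neuron
```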
Properties of OWL Properties
 Transitive Property
 P(x,y) and P(y,z) ⇒ P(x,z) (e.g., subClassOf)
 SymmetricProperty
 P(x,y) iff P(y,x) (e.g., is_functionally_related_to)
 Functional Property
 P(x,y) and P(x,z) ⇒ y=z (e.g., soma_located_in)
 inverseOf
 P1(x,y) iff P2(y,x) (e.g., regulates, is_regulated_by)
 InverseFunctional Property
 P(y,x) and P(z,x) ⇒ y=z (e.g., is_isoform_of)
 Cardinality
 Only 0 or 1 in OWL Lite; any non-negative integer in OWL DL and OWL Full
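The property characteristics above can be read as simple rules over sets of (x, y) pairs. The following is a minimal sketch with invented function names and hypothetical data, not how an OWL reasoner is actually implemented.

```python
def transitive_closure(pairs):
    """P(x,y) and P(y,z) implies P(x,z), repeated until stable."""
    closure = set(pairs)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

def symmetric_closure(pairs):
    """P(x,y) iff P(y,x)."""
    return set(pairs) | {(y, x) for x, y in pairs}

def inverse(pairs):
    """P2 = inverseOf(P1): P1(x,y) iff P2(y,x)."""
    return {(y, x) for x, y in pairs}

def functional_equalities(pairs):
    """For a functional P, values sharing a subject must denote the same thing.

    Returns, per subject, the set of values a reasoner would infer to be equal
    (or report as inconsistent if they are known to be different).
    """
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    return {x: ys for x, ys in values.items() if len(ys) > 1}

# Hypothetical data for each characteristic.
sub_class_of = transitive_closure({("Purkinje", "Neuron"), ("Neuron", "Cell")})
related = symmetric_closure({("geneA", "geneB")})
is_regulated_by = inverse({("geneA", "geneB")})
same_region = functional_equalities(
    {("n1", "RegionA"), ("n1", "RegionB"), ("n2", "RegionA")}
)
```

An inverse functional property can be checked the same way by applying functional_equalities to inverse(pairs).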
Instances in OWL
 Instances are distinct from Classes
 In RDF there is no distinction between class and
instances, so the following is allowed in RDF
 <Species, type, Class>
 <Lion, type, Species>
 <MyLion, type, Lion>
 OWL DL restrictions
 Type separation
 Class cannot also be an individual or property
 Property cannot also be an individual or class
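Type separation can be checked mechanically on the three triples from this slide. The sketch below is a simplification under stated assumptions: a triple of the form <x, type, Class> is treated as a class declaration, any other type triple makes its subject an individual and its object a class, and the function name is invented.

```python
RDF_TYPE = "type"
CLASS = "Class"

# The three triples from the slide.
triples = [
    ("Species", RDF_TYPE, CLASS),
    ("Lion", RDF_TYPE, "Species"),
    ("MyLion", RDF_TYPE, "Lion"),
]

def type_separation_violations(graph):
    """Names that are used both as a class and as an individual."""
    classes, individuals = set(), set()
    for s, p, o in graph:
        if p != RDF_TYPE:
            continue
        if o == CLASS:
            classes.add(s)        # <x, type, Class> declares x to be a class
        else:
            individuals.add(s)    # x is an instance of o ...
            classes.add(o)        # ... so o is used as a class
    return classes & individuals

violations = type_separation_violations(triples)
```

Here Lion is both an instance of Species and the class of MyLion, which plain RDF accepts and the type separation restriction of OWL DL rules out.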
A Rough Comparison
RDF and OWL do not represent n-ary roles
Querying OWL
 There are several languages in the making
 SPARQL engines (e.g., Virtuoso) are used often
 Pellet is used for reasoning tasks
 Subsumption
 Consistency
 New, more advanced languages like nSPARQL
are coming up
 vSPARQL is being developed to enable views on
SPARQL, which will lead to nested SPARQL
queries
 Our goal
 Develop a query processor for these advanced
languages
 Part of OntoQuest, our ontological information
Where does NIF stand in this?
 Not every model is directly inter-convertible with every
other model
 NIF is designed to
 Work with multiple models
 Ensure that the modeling capability and query capability of
every model is preserved in its native form
 Queries in our system get translated to queries in the native
forms of the databases we federate
 Express the local semantics of any data appropriately by
 Augmenting the semantic model of the data
 Connecting the data to NIF’s ontology
 Extending the NIF ontology in the process
 Develop a mechanism to create a common integrated
model over these models
 this model is an ontological graph that incorporates object and
temporal semantics
Example of An Ontological Extension
 Representing time and events
 Phenotypes, physiology, …
 Instants, intervals, and periods
 Temporal granularity of observation
 Events
 Multi-temporal observations based on conditions on properties
 Modeling states, objects in state, and state transitions
 One-only, repeatable, and time deictic events
 Subevents
 History of objects, events, roles
 Subtype migration, Temporal roles and role migration
 Progression of disease, symptom or recovery states
 Repeatability
Considering TOWL and Temporal ORM
Questions?

NIF as a Multi-Model Semantic Information System

  • 1. Amarnath Gupta University of California San Diego NIF as a Multi-Model Semantic Information System Part 1: Relational, XML, RDF and OWL models
  • 2. Preamble – 1  As we design and extend the NIF system we recognize that  Users will give us data in any form that is convenient for them  Standard data may be stored in a flat file  Web service output can be in XML  Semantic Web enthusiasts may represent data using proper RDF  However, regardless of the form in which data may be represented  The NIF system must treat them (query, index, relate, ...) in a uniform manner  The NIF system must utilize the underlying systems
  • 3. Preamble – 2  In this presentation we intend to  Explain our perspective on these different data models  Provide a background on the data models we consider  Offer a sense of the “semantic character” of these data models  Present our design philosophy on  Where to keep them separate  Where to transform them into a common model
  • 4. What is a Data Model?  A conceptual data model  A formal representation of the users’/application’s mental model of data elements and their relationships that should be put in a database, manipulated, queried and operated upon  A logical data model  A formal description of the data model in a logical structure that a computer can use to perform the queries and other operations. In many cases, the same conceptual model can be represented by different logical models  A physical data model  An implementable version of the data model in terms of data structures, access structures (e.g., indices) and the set of low-level operations
  • 5. A Conceptual Model ORM Model – Terry Halpin Object Relationship/ Role Value Constraint Uniqueness Constraint Inter-relationship Constraint Value Type n-ary Role
  • 6. A Logical Data Model  A formal specification of  The structure of the data  The structure tells us how the data is organized (123, “Purkinje Cell”, Cerebellum) (828, Hippocampus, “Hilar Cell”)  Often the structure of the data, together with some constraints, represent some semantics  If the data are not structured (like free text), the techniques for handling them will be different.  Operations on this structure  Every data model is based on some mathematical principles that define what you can do with the data  the nature of data values  Data domains and data types  operations on data values is not structured
  • 7. The Relational Data Model NeuronID NeuronName BrainRegion NeuroTransmitte r Current 1 Purkinje Cell Cerebellum Glutamate Transient Na+ 2 Hilar Cell Dentate Gyrus GABA Ca2+  Attribute Domain all possible values the attribute can take  Candidate key: a set of columns that uniquely determines a row  Relational model is a set (bag) of tuples model  Metadata stored in a separate catalog which is also relational  First order constraints  All queries are about some combination of  Selecting rows, columns  Combining tables by union, intersection, join  Computing data or aggregate functions  Grouping and sorting Table: Neurons Attribute name Attribute value: Cannot be compl Relation name Tuple
  • 8. Object Relational Model  Eases some of the problems of the classical relational model  Data values can be of arbitrary data types  Sets (e.g., multiple currents for a neuron)  Tuples (e.g., references ordered by year)  Time-series (e.g., raw EEG data)  Spatial Data (e.g., atlases in CCDB)  Each data type can have its own operations  Find all data points within a neighborhood of a spatial location  Queries are still values  Catalog queries and data queries cannot be mixed in a single query  All industrial-strength DBMSs use some version this model  Need to be a skilled DB programmer to develop custom
  • 9. XML (Two Perspectives)  Document Community  data = linear text documents  mark up (annotate) text pieces to describe context, structure, semantics of the marked text <physiologicalCondition> Oxidative stress </physiologicalCondition> has been proposed to be involved in the <biologicalProcess context=“disease”> pathogenesis </biologicalProcess> of <disease> Parkinson's disease</disease> (PD). A plausible source of <physiologicalCondition> oxidative stress </physiologicalCondition> in <brainRegion> nigral </brainRegion> <neuron> dopaminergic neurons </neuron> is the redox reactions that specifically involve <chemical> dopamine </chemical> and produce various <chemical context=“biologicalAgent”> toxic </chemical> molecules.
  • 10. XML (Two Perspectives)  Database Community  XML as a (most prominent) example of the semi- structured data model => captures the whole spectrum from highly structured, regular data to unstructured data (relational, object-oriented, marked up text, ...)<?xml version="1.0" encoding="utf-8"?> <NDTF_Annotation> <description>A new annotation file </description> <timeMarker>true</timeMarker> <timeResolution>0.000001</timeResolution> <interval group_id="04"> <eventNote timeOffset="1237888.230” attachedFile="sound1.wmv” application="realplayer">Text message for the event start.</eventNote> <eventNote timeOffset="18958585.232">Text message for the event end.</eventNote> </interval> From the CARMEN gro
  • 11. XML as a Logical Data Model  XML is a tree-structured document  Nodes  Element nodes  Children can be ordered  Recursive elements (parts under parts)  Attribute nodes  Mandatory or optional  Edges  Sub-element edges  Attribute edges  IDRef edges  Constraints  References  Value restrictions, OneOf  Cardinality • Trees are more flexible than tables • Any number of nodes can be added anywhere without breaking the model
  • 12. XML as a Logical Data Model • XML has its own schema language • Lets you specify a complex type system • A database is a collection of XML trees  Storing XML  Mostly relational with some very clever indexing to encode the hierarchy, tree paths, and order  Querying XML  Elements, attribute names, values and structure can be queried  Multiple trees can be joined by value  Example (Xpath)  http://mousespinal.brain- map.org/imageseries/detail/100002661.xml  Find images of the spinal column  //image[//structurelabel/text()=“SPINAL COLUMN”]/ish_image_path
  • 13. Misusing and Abusing XML  Using XML if your data is relational  It will result in flat trees that will suffer from complex querying  Encoding orders and hierarchies that need special parsing <Brand_Mixtures count=“2”> <Brand_Mixture_1> Apo-Levocarb (carbidopa + levodopa) </Brand_Mixture_1> <Brand_Mixture_2> Apo-Levocarb CR Controlled-Release Tablets (carbidopa + levodopa) </Brand_Mixture_2> </Brand_Mixtures>  Using implicit multi-valuedness <atomArray atomID="a1 a2 a3" elementType="O N C" hydrogenCount="1 1 3"> <array dictRef="cml:calcCharge" dataType="xsd:decimal" units="cml:electron">0.2 -0.3 0.1</array> </atomArray>
  • 14. Expressing Semantics in XML  Adorning elements with Namespaces  A namespace is a unique URI (Uniform Resource Locator)  To disambiguate between two elements that happen to share the same name  To group elements relating to a common idea together <item xmlns:bp="http://www.biopax.org/release/biopax- level1.owl#"> <bp:protein ID="Protein1"> <bp:NAME>Metalloelastase</bp:NAME> <bp:XREF> <bp:unificationXref rdf:ID="Xref1"> <bp:ID>NP_304845</bp:ID> <bp:DB>RefSeq</bp:DB> </bp:unificationXref> </bp:XREF> </bp:protein>
  • 15. The Problem with XML Semantics  Two different XML representations of the same kind of information may not be easily unifiable  What did XML not encode?
  • 16. Resource Description Format (RDF) Rdf:statement URI(CNTFR- URI(modulat es) URI(eSNCA- mediated neurotoxicity) Rdf:type Rdf:object Rdf:predicate Rdf:subject URI(membra ne-protein) Rdf:type URI(protein- mediated toxicity) Rdf:type Rdf:property
  • 17. The Basic Constructs of RDF  RDF meta-model basic elements  All defined in rdf namespace  http://www.w3.org/1999/02/22-rdf-syntax-ns#  Types (or classes)  rdf:resource – everything that can be identified (with a URI)  rdf:property – specialization of a resource expressing a binary relation between two resources  rdf:statement – a triple with properties rdf:subject, rdf:predicate, rdf:object  Properties  rdf:type - subject is an instance of that category or class defined by the value  rdf:subject, rdf:predicate, rdf:object – relate elements of statement tuple to a resource of type statement.
  • 18. Relational Data vis-à-vis RDF  Node to edge ratio is relatively small in many applications  Number of relationships need not be fixed at design time  The general tendency is keep the number of edge labels small  Graph-based operations can be performed on RDF, which requires an unspecified number of joins in relational data
  • 19. RDF Blank Nodes  RDF allows one to create anonymous objects whose existence is known but details are not  There exists some neuron to which both NeuronX and NeuronY connect  <neurons:NeuronX rdf:about="http://neurons.org/Neuron#NeuronX"> <conn:connectsTo> <neurons:Neuron rdf:nodeID=“n1"/> </conn:connectsTo> </neurons:NeuronX>  <neurons:NeuronY rdf:about="http://neurons.org/Neuron#NeuronY"> <conn:connectsTo> <neurons:Neuron rdf:nodeID=“n1"/> </conn:connectsTo> </neurons:NeuronY>
  • 20. RDF Schema  Declaration of vocabularies  classes, properties, and relationships defined by a particular community  rdfs:Class, rdfs:subClassOf  Property-related  rdfs:subPropertyOf  relationship of properties to classes  rdfs:domain, rdfs:range  Provides substructure for inferences based on existing triples  NOT prescriptive, but descriptive  This is different from XML Schema  Schema language is an expression of basic RDF model  uses meta-model constructs: resources, statements, properties
  • 21. Examples of RDF Inferencing  From this we can infer  (:alice rdf:type parent)  (:betty rdf:type parent)  (:eve rdf:type female-person)  (:charles rdf:type :person)
  • 22. RDF as a Logical Data Model  RDF does not distinguish between different relationships  Instance-to-type  Instance-to-instance  Type-to-instance  No transitivity inference is possible over, say, rdf:type  RDF (as well as XML) has lost the notion of the abstract data type like spatial object or time  Operations on object types does not mix well with RDF  Constraints like uniqueness, 1-to-1 relationships, cannot be expressed  SPARQL, the query language for RDF is  An edge-only language – it cannot express the // construct of XML  Blank nodes are treated as variables not output in the results  Parts of the language are undecidable! A problem is undecidable if it can be proved that there can be no algorithm
  • 23. OWL  Components of an OWL Ontology  Vocabulary (concepts)  Structure (attributes of concepts and hierarchy)  Concept-to-concept, concept-to-data, property-to-property relationships  Logical characteristics of relationships  Domain and range restrictions  Properties of relations (symmetry, transitivity)  Cardinality of relations  Open world vs. Closed world assumptions  In contrast to most reasoning systems, which assume that anything absent from the knowledge base is not true  Need to maintain monotonicity with tolerance for contradictions  OWL Classes Class of all classes
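The difference between the open-world and closed-world assumptions can be sketched in a few lines of Python. The facts and function names below are hypothetical and serve only as illustration. Under the closed-world assumption an unstated fact is false; under the open-world assumption it is merely unknown unless it is explicitly negated.

```python
# Hypothetical knowledge base; names are for illustration only.
FACTS = {("NeuronX", "connectsTo", "NeuronY")}
NEGATED = {("NeuronX", "connectsTo", "NeuronZ")}  # explicitly asserted to be false

def ask_closed_world(facts, triple):
    """Closed world: anything not stated is taken to be false."""
    return triple in facts

def ask_open_world(facts, negated, triple):
    """Open world: return True, False, or None when the answer is unknown."""
    if triple in facts:
        return True
    if triple in negated:
        return False
    return None

query = ("NeuronY", "connectsTo", "NeuronX")
closed_answer = ask_closed_world(FACTS, query)
open_answer = ask_open_world(FACTS, NEGATED, query)
```

Here the closed-world answer is `False`, while the open-world answer is `None`: the knowledge base simply does not say.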
  • 24. Basic OWL Constructs  Creating OWL Classes  disjointWith  Neurons are not glial cells  sameClassAs (equivalence)  Class Gabaergic neuron is exactly the same class as neurons which have GABA as neurotransmitter  Enumerations (on instances)  Class Cerebellar lobules are Lobule I, Lobule II, …  Boolean set semantics (on classes)  Union (logical disjunction)  Class nerve cell is the union of neuron and glial cell  Intersection (logical conjunction of class with properties)  Class hippocampal neurons is the conjunction of things that are of class Neuron and have property (has-soma-located-in) (hippocampus union any class that is (part-of) hippocampus)  complementOf (logical negation)  Class ‘benign tumor’ is the complement of class ‘malignant tumor’
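The Boolean class constructors on this slide behave like set operations over class extensions. The sketch below uses plain Python sets with hypothetical individuals (`n1`, `g1`, and so on). It assumes a small, fully known universe, which is a simplification: under the open-world assumption an OWL reasoner would not enumerate a complement this way.

```python
# Hypothetical individuals; a closed, fully known universe is assumed for illustration.
neuron = {"n1", "n2", "n3"}
glial_cell = {"g1", "g2"}
universe = neuron | glial_cell | {"x1"}

# Individuals whose soma is located in the hippocampus (hypothetical data).
soma_in_hippocampus = {"n2", "n3", "x1"}

# unionOf: nerve cell is the union of neuron and glial cell.
nerve_cell = neuron | glial_cell

# intersectionOf: hippocampal neurons are neurons that also satisfy the property restriction.
hippocampal_neuron = neuron & soma_in_hippocampus

# complementOf: everything in the known universe that is not a neuron.
not_neuron = universe - neuron

# disjointWith: neurons and glial cells share no individuals.
disjoint = neuron.isdisjoint(glial_cell)
```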
  • 25. Properties of OWL Properties  Transitive Property  P(x,y) and P(y,z) ⇒ P(x,z), e.g., subclassOf  SymmetricProperty  P(x,y) iff P(y,x), e.g., is_functionally_related_to  Functional Property  P(x,y) and P(x,z) ⇒ y=z, e.g., soma_located_in  inverseOf  P1(x,y) iff P2(y,x), e.g., regulates and is_regulated_by  InverseFunctional Property  P(y,x) and P(z,x) ⇒ y=z, e.g., is_isoform_of  Cardinality  Only 0 or 1 in OWL Lite; any non-negative integer in OWL DL and OWL Full
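The property characteristics above can be read as closure or equality rules over a set of (x, y) pairs. The following Python sketch treats each property as such a set; the function names and sample data are hypothetical. Note that a functional property does not raise an error on two values: it entails that the two values denote the same individual.

```python
def transitive_closure(pairs):
    """TransitiveProperty: P(x,y) and P(y,z) imply P(x,z)."""
    closure = set(pairs)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

def symmetric_closure(pairs):
    """SymmetricProperty: P(x,y) iff P(y,x)."""
    return set(pairs) | {(y, x) for x, y in pairs}

def inverse(pairs):
    """inverseOf: P1(x,y) iff P2(y,x)."""
    return {(y, x) for x, y in pairs}

def functional_equalities(pairs):
    """FunctionalProperty: P(x,y) and P(x,z) imply y = z.
    Return the pairs of values that must denote the same individual."""
    values = {}
    for x, y in pairs:
        values.setdefault(x, set()).add(y)
    same = set()
    for ys in values.values():
        ordered = sorted(ys)
        for other in ordered[1:]:
            same.add((ordered[0], other))
    return same

# Hypothetical sample data.
subclass_of = {("PurkinjeCell", "Neuron"), ("Neuron", "Cell")}
regulates = {("GeneA", "GeneB")}
soma_located_in = {("cell1", "CA1"), ("cell1", "CornuAmmonis1")}

closed = transitive_closure(subclass_of)
is_regulated_by = inverse(regulates)
same_as = functional_equalities(soma_located_in)
```

An inverse-functional property can be handled by applying `functional_equalities` to `inverse(pairs)`.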
  • 26. Instances in OWL  Instances are distinct from Classes  In RDF there is no distinction between class and instances  <Species, type, Class>  <Lion, type, Species>  <MyLion, type, Lion>  OWL DL restrictions  Type separation  Class cannot also be an individual or property  Property cannot also be an individual or class  Both are allowed in RDF
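A minimal sketch of the type-separation check, using the three triples from this slide. The helper `class_and_individual` is a hypothetical name; it simply reports terms that are used both as a class (something has them as its type, or they are declared a `Class`) and as an individual (they are themselves an instance of an ordinary class).

```python
# The three triples from the slide.
TRIPLES = [
    ("Species", "type", "Class"),
    ("Lion", "type", "Species"),
    ("MyLion", "type", "Lion"),
]

def class_and_individual(triples, metaclass="Class"):
    """Return terms that occur both in a class role and in an individual role."""
    # Class role: the object of a `type` statement, or explicitly declared a Class.
    used_as_class = {o for s, p, o in triples if p == "type" and o != metaclass}
    used_as_class |= {s for s, p, o in triples if p == "type" and o == metaclass}
    # Individual role: the subject of a `type` statement whose object is an ordinary class.
    used_as_individual = {s for s, p, o in triples if p == "type" and o != metaclass}
    return used_as_class & used_as_individual

violations = class_and_individual(TRIPLES)
```

Here `Lion` is an instance of `Species` and also the class of `MyLion`, which is fine in RDF but breaks type separation in OWL DL.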
  • 27. A Rough Comparison ~ RDF and OWL do not represent n-ary roles
  • 28. Querying OWL  There are several languages in the making  SPARQL engines (e.g., Virtuoso) are used often  Pellet is used for reasoning tasks  Subsumption  Consistency  New, more advanced languages like nSPARQL are coming up  vSPARQL is being developed to enable views on SPARQL, which will lead to nested SPARQL queries  Our goal  Develop a query processor for these advanced languages  Part of OntoQuest, our ontological information
  • 29. Where does NIF stand in this?  Not every model is directly inter-convertible with every other model  NIF is designed to  Work with multiple models  Ensure that the modeling capability and query capability of every model is preserved in its native form  Queries in our system get translated to queries in the native forms of the databases we federate  Express the local semantics of any data appropriately by  Augmenting the semantic model of the data  Connecting the data to NIF’s ontology  Extending the NIF ontology in the process  Develop a mechanism to create a common integrated model over these models  this model is an ontological graph that incorporates object and temporal semantics
  • 30. Example of An Ontological Extension  Representing time and events  Phenotypes, physiology, …  Instants, intervals, and periods  Temporal granularity of observation  Events  Multi-temporal observations based on conditions on properties  Modeling states, objects in state, and state transitions  One-only, repeatable, and time deictic events  Subevents  History of objects, events, roles  Subtype migration, Temporal roles and role migration  Progression of disease, symptom or recovery states  Repeatability  Considering TOWL and Temporal ORM