Linked Data Driven Data Virtualization for Web-scale Integration
Orri Erling, Program Manager, Virtuoso
© 2009 OpenLink Software, All rights reserved

Situation Analysis
  • Agility via ad hoc data access has prevailed throughout the history of IT.
  • Data and heterogeneity are growing exponentially across intranets, extranets, and the Internet.
  • Processing windows remain static (we still have only 24 hours in a day for personal and professional activities).
  • Individual and enterprise agility remains totally dependent on data access, manipulation, and dissemination.
  • Data remains dirty, and its context remains necessary for extracting meaning.
  • Data Virtualization (in the form of heterogeneous Linked Data Spaces) remains the only viable way forward.

What is Linked Data?
  • RDF (Resource Description Framework) data model: a graph model where records take the form of 3-tuples, i.e., subject-predicate-object or entity-attribute-value.
  • RDF data serialization formats: (X)HTML+RDFa, Turtle, N3, TriX, RDF/XML, and others.
  • RDF data item identity is HTTP URI based.
  • RDF is inherently schema-last and self-describing.
  • Linked Data: an application of the RDF model where record identifiers, fields, and optionally field values are endowed with HTTP scheme URIs, whether for instance data (ABox) or data dictionary data (TBox).
  • Linked Data enables follow-your-nose traversal of RDF data records, where every record identifier, field, or field value is a data pathway.
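
The 3-tuple model and follow-your-nose traversal can be illustrated with a minimal Python sketch. Everything under `example.org` is a hypothetical URI invented for this example; only the two FOAF terms are real vocabulary URIs. Dereferencing is simulated by an in-memory lookup, whereas on the Web it would be an HTTP request against each URI.

```python
from collections import defaultdict, deque

# Hypothetical example URIs (assumptions, not taken from the slides).
ALICE = "http://example.org/people/alice#this"
BOB = "http://example.org/people/bob#this"
ACME = "http://example.org/orgs/acme#this"
WORKS_FOR = "http://example.org/schema/worksFor"
# Real FOAF vocabulary terms.
KNOWS = "http://xmlns.com/foaf/0.1/knows"
NAME = "http://xmlns.com/foaf/0.1/name"

# Each record is a 3-tuple: (subject, predicate, object).
TRIPLES = [
    (ALICE, NAME, "Alice"),
    (ALICE, KNOWS, BOB),
    (BOB, NAME, "Bob"),
    (BOB, WORKS_FOR, ACME),
    (ACME, NAME, "Acme Corp"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject URI."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index


def follow_your_nose(start, triples):
    """Breadth-first walk over every subject reachable by following object URIs."""
    index = index_by_subject(triples)
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for _, obj in index.get(node, []):
            # An object that is itself a described subject is a data pathway.
            if obj in index and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return order


if __name__ == "__main__":
    for uri in follow_your_nose(ALICE, TRIPLES):
        print(uri)
```

Starting from one identifier, the walk reaches every connected record without knowing a schema in advance, which is the practical meaning of "schema-last and self-describing" here.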

The Linked Data Landscape
  • Core vocabularies: common terms facilitate integration.
      • FOAF for personal profiles
      • SIOC for social networking
      • Dublin Core for bibliography
      • GoodRelations for eCommerce
      • Geonames
  • Domain-specific vocabularies for all verticals:
      • OBO Foundry for biology
  • DBpedia, OpenCyc, YAGO, SUMO, Geonames, etc. define URIs for talking about almost any well-known real-world entity or class of entities.

The Linked Open Data Cloud

What Linked Data Offers for Data Integration
  • In RDF, all things have a single-part, global, HTTP-based identifier: anything can join with anything else through its URI.
  • Many people will use a different identifier for the same thing. Whether two things can be considered the same depends on context. OWL sameAs is a generic way of stating identity co-reference.
  • Literal values can be tagged by type or language, allowing explicit representation of units of measure, etc.
  • RDF triples are contained in Named Graphs. The graph usually denotes provenance, and it has a URI, about which further statements can be made.
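
A minimal sketch of how sameAs co-reference and named graphs could work together when merging two sources, written in Python. All `example.org` URIs, the `openTickets` predicate, and the data are hypothetical; only the `owl:sameAs` and FOAF URIs are real. Triples are stored as quads so that each statement keeps the URI of the graph it came from.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical identifiers for the same person in two systems.
CRM_ID = "http://example.org/crm/customer/42"
SUPPORT_ID = "http://example.org/support/user/jdoe"

# Quads: (subject, predicate, object, graph). The graph URI records provenance.
QUADS = [
    (CRM_ID, FOAF_NAME, "Jane Doe", "http://example.org/graph/crm"),
    (SUPPORT_ID, "http://example.org/schema/openTickets", 3,
     "http://example.org/graph/support"),
    (CRM_ID, OWL_SAME_AS, SUPPORT_ID, "http://example.org/graph/links"),
]


def canonical_ids(quads):
    """Build a union-find over owl:sameAs; returns a URI -> representative function."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for s, p, o, _ in quads:
        if p == OWL_SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Deterministic choice: the lexicographically smaller URI wins.
                parent[max(a, b)] = min(a, b)
    return find


def describe(uri, quads):
    """All (predicate, object, graph) statements about uri or any of its aliases."""
    find = canonical_ids(quads)
    target = find(uri)
    return [
        (p, o, g)
        for s, p, o, g in quads
        if p != OWL_SAME_AS and find(s) == target
    ]


if __name__ == "__main__":
    for statement in describe(SUPPORT_ID, QUADS):
        print(statement)
```

Because sameness depends on context, a real system would likely apply only the sameAs statements found in graphs it trusts; this sketch applies all of them unconditionally, which is a simplification.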

RDF vs. Relational
  • When the data is ragged and highly heterogeneous, with schema-last needs, use RDF and Linked Data.
  • The more different sources of data there are, the more you will need RDF and Linked Data.
  • If data is highly regular and uniform, relational offers higher performance: application-specific indices, etc. are faster than putting everything in a generic index scheme.
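
The trade-off can be sketched in Python with plain dictionaries. This is an illustration of the two shapes of data, not a benchmark, and every name and value in it (the `ex:` prefixed terms, the order rows) is made up for the example.

```python
from collections import defaultdict

# Regular, uniform data: a fixed schema plus an application-specific index.
ORDERS = [
    {"order_id": 1, "customer": "c1", "total": 40},
    {"order_id": 2, "customer": "c2", "total": 15},
    {"order_id": 3, "customer": "c1", "total": 25},
]
ORDERS_BY_CUSTOMER = defaultdict(list)
for row in ORDERS:
    ORDERS_BY_CUSTOMER[row["customer"]].append(row)

# Ragged data: each entity carries whatever attributes its source supplied.
TRIPLES = [
    ("ex:c1", "ex:name", "Jane Doe"),
    ("ex:c1", "ex:twitter", "@jane"),
    ("ex:c2", "ex:name", "Acme Corp"),
    ("ex:c2", "ex:vatNumber", "XX123"),
    ("ex:c2", "ex:foundedIn", 1999),
]


def build_generic_index(triples):
    """One predicate -> object -> subjects index that serves any attribute."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[p][o].add(s)
    return index


def add_attribute(triples, subject, predicate, value):
    """Schema-last: a new attribute is just another triple, with no schema change."""
    return triples + [(subject, predicate, value)]


if __name__ == "__main__":
    # The tailored index answers exactly one question, directly.
    print(sum(row["total"] for row in ORDERS_BY_CUSTOMER["c1"]))
    # The generic index answers questions about attributes nobody planned for.
    extended = add_attribute(TRIPLES, "ex:c1", "ex:vatNumber", "YY9")
    print(build_generic_index(extended)["ex:vatNumber"])
```

The tailored index is built for one access path over rows of identical shape, while the generic index accepts any predicate at the cost of treating all of them alike.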

Incentives for Publishing
  • If one is on the web, one is there in order to be found.
  • Publishing data in standard vocabularies allows applications to mesh data from many Web-addressable Data Spaces (e.g., pages).
  • In the end, Linked Data will enhance the end-user experience through added serendipitous discovery and increased relevance.

Models for Publishing
  • Linked data is usually published in large dumps which have a release cycle.
  • Any relational database's contents can be published as linked data by generating RDF on demand via a relational-to-RDF schema mapping.
  • Whether one generates RDF on demand or ETLs RDBs as RDF depends on the use case.
If one publishes data, whether as a product, for promotion, or for regulatory compliance, RDF/Linked Data is attractive because of a critical mass of reusable terms and a ready base of technology. As more data is published, the link density increases, leading to more novel ways of deriving value from the data.
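
On-demand generation via a schema mapping can be sketched with Python's standard `sqlite3` module. This is a generic illustration and not Virtuoso's mapping language; the `customer` table, the base URI, and the shape of the `MAPPING` dictionary are all assumptions made for the example.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical base URI

# Hypothetical mapping: table -> (subject URI template, {column: predicate URI}).
MAPPING = {
    "customer": (
        BASE + "customer/{id}#this",
        {
            "name": "http://xmlns.com/foaf/0.1/name",
            "homepage": "http://xmlns.com/foaf/0.1/homepage",
        },
    ),
}


def triples_for(conn, table, key):
    """Generate the triples for one row at request time; no RDF copy is stored."""
    subject_template, columns = MAPPING[table]
    names = list(columns)
    # Table and column names come from the trusted MAPPING, never from the caller;
    # the row key is passed as a bound parameter.
    row = conn.execute(
        f"SELECT {', '.join(names)} FROM {table} WHERE id = ?", (key,)
    ).fetchone()
    if row is None:
        return []
    subject = subject_template.format(id=key)
    return [
        (subject, columns[name], value)
        for name, value in zip(names, row)
        if value is not None  # a NULL column simply produces no triple
    ]


def demo_connection():
    """Build a small in-memory database to run the mapping against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, homepage TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Jane Doe", "http://example.org/~jane"), (2, "Acme Corp", None)],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for triple in triples_for(conn, "customer", 1):
        print(triple)
```

The ETL alternative would run the same mapping over every row once and load the resulting triples into a store, trading freshness for query speed.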

Use Case: CRM and MIS
  • At OpenLink internal IT, all CRM, support, blogs, and wikis are available as linked data.
  • Interactive drill-down from products to support tickets to customers to docs, etc.
  • Currently working on projects to expose enterprise CRM as linked data.

Use Case: The Neurocommons
Data sources shown in the diagram: AddGene Plasmids, NeuronDB, BAMS, Neurocommons text mining, Homologene, SWAN, Entrez Gene, Gene ontology annotations, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MESH, Reactome, Allen Brain Atlas, Publications, CCDB, Neuronbank, OBO Ontologies, NeuroMorpho, SAO, Coriell catalog.

Bio2RDF - some of the larger datasets

  Name           Triple count
  PubMed *        797,000,000
  NCBI GeneID     172,931,628
  Uniprot         797,000,000
  UniRef *        242,000,000
  UniParc *       490,000,000
  IproClass       149,342,977

Use Case: BBC Programs and Music Service
  • Data harvested via sitemap and web crawling.
  • 20M triples.
  • Integrated with Last.FM, DBpedia, and MusicBrainz: see what any of these has to say about an artist or work.
  • http://bbc.openlinksw.com

Use Case: Linked Open Data Cloud Service
  • DBpedia, Freebase, Geodata, Neurocommons, Bio2RDF, Govtrack, US Census, RKB Explorer, Pingthesemanticweb, GoodRelations, and more.
  • Entity ranks.
  • Full text, SPARQL, faceted browsing.
  • http://lod2.openlinksw.com, http://lod.openlinksw.com
  • 7.59 billion triples.

The Generations of the Web
  • Web 1.0: publishing for all.
  • Web 2.0: user-generated content, mashups, the citizen journalist.
  • Linked Data Web: big data, integration, and analytics for all.
