Big Data that might benefit from ontology technology, but why this usually fails

Barry Smith
National Center for Ontological Research

The strategy of annotation
Databases describe data using multiple heterogeneous
labels
If we can annotate (tag) these labels using terms from
common controlled vocabularies, then a virtual
arm’s-length integration can be achieved, providing
• immediate benefits for search and retrieval
• a starting point for the creation of net-centric
   reference data
• potential longer-term benefits for reasoning
with no need to modify existing systems, code or data

See Ceusters et al. Proceedings of DILS 2004.
http://ontology.buffalo.edu/bio/LinkSuite.pdf
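
The annotation strategy can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the term IDs, the database names (db_a, db_b) and the local labels do not come from the talk or from any real vocabulary. The point is that the annotations live in a separate mapping, so the source databases themselves are never modified.

```python
# Hypothetical controlled vocabulary: term ID -> preferred name.
VOCABULARY = {
    "T001": "brain",
    "T002": "hindbrain",
}

# Hypothetical annotations kept outside the source systems:
# (database, local label) -> controlled vocabulary term ID.
ANNOTATIONS = {
    ("db_a", "BRAIN_TISSUE"): "T001",
    ("db_b", "encephalon"): "T001",
    ("db_b", "rhombencephalon"): "T002",
}


def labels_for_term(term_id):
    """Return every (database, label) pair annotated with the given term."""
    return sorted(key for key, tid in ANNOTATIONS.items() if tid == term_id)


# One query on the shared term finds differently labelled data in both databases.
print(VOCABULARY["T001"], labels_for_term("T001"))
```

Because the lookup goes through the shared term rather than the local strings, a new database can be integrated by adding annotations alone.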
String searches yield partial results and rely on manual effort and on familiarity with existing database contents
Ontologies facilitate grouping of annotations

      Term                 Annotations
      brain                20
        hindbrain          15
          rhombomere       10

      Query ‘brain’ without ontology: 20
      Query ‘brain’ with ontology:    45 (20 + 15 + 10)
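
The difference between the two queries can be reproduced in a short sketch. The counts are those on the slide; the parent-to-child structure is an assumption read off the slide's indentation, and the function names are hypothetical.

```python
# Direct annotation counts per term, as given on the slide.
DIRECT_COUNTS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}

# Parent -> children, assumed from the indentation on the slide.
CHILDREN = {"brain": ["hindbrain"], "hindbrain": ["rhombomere"]}


def count_without_ontology(term):
    """Plain string match: only annotations made directly to the term."""
    return DIRECT_COUNTS.get(term, 0)


def count_with_ontology(term):
    """Ontology-aware query: the term plus all of its descendants."""
    total = DIRECT_COUNTS.get(term, 0)
    for child in CHILDREN.get(term, []):
        total += count_with_ontology(child)
    return total


print(count_without_ontology("brain"))  # 20
print(count_with_ontology("brain"))     # 45
```

The recursion assumes the hierarchy is a tree; in an ontology where a term can have several parents, descendants would need to be collected into a set first to avoid double counting.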
Examples of where this method works
• Reference Genome Annotation Project
  http://www.geneontology.org/GO.refgenome.shtml
• Human resources data in large organizations
  http://www.youtube.com/watch?v=OzW3Gc_yA9A
• Military intelligence data
  Salmen et al. in http://ceur-ws.org/Vol-808/
Other potential areas of application:
• Crime                      • Public health
• Insurance                  • Finance

But normally the method does not work

Semantic technology (OWL, …) seeks to break
  down data silos
Unfortunately it is now so easy to create
  ontologies that myriad incompatible ontologies
  are being created in ad hoc ways leading to the
  creation of new, semantic silos
The Semantic Web framework as currently
  conceived and governed by the W3C (modeled
  on HTML) yields minimal standardization
The more semantic technology is
  successful, the more we fail to achieve
  our goals
Reasons for this effect
• Just as it’s easier to build a new database, so it’s
  easier to build a new ontology for each new
  project
• You will not get paid for reusing existing
  ontologies (Let a million ontologies bloom)
• There are no ‘good’ ontologies, anyway (just
  arbitrary choices of terms and relations …)
• Information technology (hardware) changes
  constantly, so it is not worth the effort of getting
  things right
How to do it right?
• create an incremental, evolutionary
  process, where what is good survives, and what is
  bad fails
• create a scenario in which people will find it
  profitable to reuse ontologies, terminologies and
  coding systems which have been tried and tested
• silo effects will be avoided and results of
  investment in Semantic Technology will cumulate
  effectively
Uses of ‘ontology’ in PubMed abstracts
By far the most successful: GO (Gene Ontology)
GO provides a controlled vocabulary of terms for
  use in annotating (tagging) biological data

• multi-species, multi-disciplinary, open source
• built and maintained by domain experts
• contributing to the cumulativity of scientific results
  obtained by distinct research communities
• natural language and logical definitions for all
  terms to support consistent human application and
  computational exploitation
• rigorous governance process
• feedback loop connects users to editors
How to do it right
• ontologies should mimic the methodology used
  by the GO (following the principles of the OBO
  Foundry: http://obofoundry.org)
• ontologies in the same field should be
  developed in coordinated fashion to ensure
  that there is exactly one ontology for each
  subdomain
• ontologies should be developed incrementally
  in a way that builds on successful user testing at
  every stage
