Building a repository of biomedical
ontologies with Neo4j
Simon Jupp jupp@ebi.ac.uk, @simonjupp
Samples, Phenotypes and Ontologies Team
European Bioinformatics Institute
Cambridge, UK.
Biological data heavily interlinked
[Figure: genome, proteome and metabolome data from tissue (miRNA array, mRNA array, antibody array, LC-MS/MS, CE-MS) linked to pathways, protein interactions and drug targets; includes an example mass spectrum of intensity against m/z with annotated b- and y-ion peaks]
We need terminology standards
Dyschromatopsia
Search PubMed for “color blindness”
Search PubMed for “Dyschromatopsia”
Search PubMed for "abnormality of the eye"
The ontology of color blindness
HP:0011518 (Dichromacy)
UBERON:0000970 (Eye)
HP:0000551 (Abnormality of color vision)
HP:0007641 (Dyschromatopsia)
Is-a
Is-a
Disease-location
Synonym: “Colorblindness”
Definition: “A form of colorblindness in which only two of the three fundamental colors can be distinguished due to a lack of one of the retinal cone pigments.”
Ontologies for life sciences
Domains covered, from genotype to phenotype: Sequence, Proteins, Gene products, Transcript, Pathways, Cell type, BRENDA tissue / enzyme source, Development, Anatomy, Phenotype, Plasmodium life cycle
- Sequence types and features
- Genetic Context
- Molecule role
- Molecular Function
- Biological process
- Cellular component
- Protein covalent bond
- Protein domain
- UniProt taxonomy
- Pathway ontology
- Event (INOH pathway ontology)
- Systems Biology
- Protein-protein interaction
- Arabidopsis development
- Cereal plant development
- Plant growth and developmental stage
- C. elegans development
- Drosophila development (FBdv)
- Human developmental anatomy, abstract version
- Human developmental anatomy, timed version
- Mosquito gross anatomy
- Mouse adult gross anatomy
- Mouse gross anatomy and development
- C. elegans gross anatomy
- Arabidopsis gross anatomy
- Cereal plant gross anatomy
- Drosophila gross anatomy
- Dictyostelium discoideum anatomy
- Fungal gross anatomy (FAO)
- Plant structure
- Maize gross anatomy
- Medaka fish anatomy and development
- Zebrafish anatomy and development
- NCI Thesaurus
- Mouse pathology
- Human disease
- Cereal plant trait
- PATO (attribute and value)
- Mammalian phenotype
- Human phenotype
- Habronattus courtship
- Loggerhead nesting
- Animal natural history and life history
- eVOC (Expressed Sequence Annotation for Humans)
Ontology Lookup Service
• Ontology search engine (Solr)
• Graph database of terms (Neo4j)
• Powerful RESTful API (built with Spring Data Neo4j / Spring Data REST)
• Open source project
• Generic infrastructure (can load any ontology represented in OWL)
https://github.com/EBISPOT/OLS
Repository of over 140 biomedical ontologies (4.5 million terms, 11 million relations)
http://www.ebi.ac.uk/ols/beta
Web Ontology Language (OWL)
• W3C standard vocabulary for describing ontologies
• Powerful knowledge representation
However
• OWL ontologies aren’t graphs, but…
… can be represented as an RDF graph
… people want to use them as graphs
• Plenty of RDF databases around
• But incomplete w.r.t. OWL semantics
• SPARQL is an acquired taste
OWL to Neo4j schema
• Each node is labelled with one of {Class, Property, Individuals} and with its ontology name
• All OWL annotations become node properties (labels, IDs, descriptions, etc.)
• Superclass axioms (named classes and simple existential restrictions) become edges in Neo4j
• E.g. in OWL: “heart” subclassOf (part-of some “cardiovascular system”)
  In Neo4j: “heart” part-of “cardiovascular system”
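The mapping rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OLS loader: `Existential`, `Edge` and `axiom_to_edge` are hypothetical names, and the `SUBCLASSOF` / `Related` relationship types with a `label` property are borrowed from the Cypher queries that appear elsewhere in this deck.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Existential:
    """A simple existential restriction: (prop some filler)."""
    prop: str
    filler: str


@dataclass(frozen=True)
class Edge:
    """One relationship in the graph (hypothetical shape)."""
    source: str
    rel_type: str
    target: str
    label: Optional[str] = None


def axiom_to_edge(subclass: str, superclass: Union[str, Existential]) -> Edge:
    """Turn one subclassOf axiom into a single direct edge."""
    if isinstance(superclass, Existential):
        # "heart" subclassOf (part-of some "cardiovascular system")
        # becomes heart -[Related {label: part-of}]-> cardiovascular system.
        return Edge(subclass, "Related", superclass.filler, superclass.prop)
    # A named superclass becomes a plain SUBCLASSOF edge.
    return Edge(subclass, "SUBCLASSOF", superclass)


heart_edge = axiom_to_edge("heart", Existential("part-of", "cardiovascular system"))
print(heart_edge)
```

Flattening the restriction into one edge loses the OWL semantics of "some", which is the trade-off the slide describes as a simplified view of OWL.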
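What are the subtypes of “colorblindness”?
// Follow only SUBCLASSOF edges so that other relations (e.g. part_of) are not traversed
MATCH (n:Class {obo_id: 'HP:0007641'})<-[r:SUBCLASSOF*]-(types:Class)
RETURN n, r, types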
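What the variable-length pattern in this query computes can be illustrated without a database. The sketch below is a plain breadth-first traversal over an in-memory child map; the `descendants` function and the `T:` identifiers are hypothetical placeholders, not real ontology content or OLS code.

```python
from collections import deque
from typing import Dict, List, Set


def descendants(children: Dict[str, List[str]], root: str) -> Set[str]:
    """Collect every class reachable from root by following child links."""
    found: Set[str] = set()
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


# Toy hierarchy with placeholder IDs; T:4 has two parents, as ontology
# classes often do, and is still reported only once.
toy_children = {
    "T:1": ["T:2", "T:3"],
    "T:2": ["T:4"],
    "T:3": ["T:4"],
}
print(sorted(descendants(toy_children, "T:1")))
```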
What parts of the eye are related to diseases?
MATCH (eye:Class {obo_id: 'UBERON:0000970'})
      <-[r:Related {label: "part_of"}]-(eye_part:Class)
      <-[r1:Related {label: "has_disease_location"}]-(disease:Class)
RETURN eye, r, r1, eye_part, disease
Finding common ancestors via shortest path
MATCH p = shortestPath((a:Class)-[r:SUBCLASSOF*]-(b:Class))
RETURN nodes(p)
What is the common taxonomic superfamily of Gibbons and Chimpanzees? (or Hylobatidae and Pan troglodytes!)
Image: https://commons.wikimedia.org/wiki/File:Hylobates_lar_pair_of_white_and_black_01.jpg
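The idea behind this query can be sketched as an undirected breadth-first search over subclass links. The `shortest_path` function is a hypothetical helper, and the lineage below is a hand-written simplification that skips intermediate ranks; it is not data taken from OLS.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_path(parent_of: Dict[str, List[str]], a: str, b: str) -> Optional[List[str]]:
    """Breadth-first search that ignores edge direction, like -[...]- in Cypher."""
    neighbours: Dict[str, Set[str]] = {}
    for child, parents in parent_of.items():
        for parent in parents:
            neighbours.setdefault(child, set()).add(parent)
            neighbours.setdefault(parent, set()).add(child)
    previous: Dict[str, Optional[str]] = {a: None}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            # Walk the predecessor links back to the start.
            path: List[str] = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


# Simplified child -> parents lineage (intermediate ranks omitted).
lineage = {
    "Pan troglodytes": ["Pan"],
    "Pan": ["Hominidae"],
    "Hominidae": ["Hominoidea"],
    "Hylobatidae": ["Hominoidea"],
}
print(shortest_path(lineage, "Hylobatidae", "Pan troglodytes"))
```

In a tree-shaped hierarchy the path climbs from one class to the common ancestor and then descends to the other, so the ancestor is the node where the direction flips (Hominoidea in this toy lineage). With multiple inheritance the shortest undirected path is not guaranteed to pass through the nearest common ancestor.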
OLS visualisations
• Partonomy for heart from the UBERON anatomy ontology
MATCH path = (n:Class)-[r:SUBCLASSOF|PartOf*]->(ancestor)
RETURN path
REST API (Spring Data REST + Neo4j)
• Crawlable API - Hypermedia driven (HAL)
• Get ontology and term metadata
• /ontologies
• /ontologies/{name}
• /ontologies/{name}/terms
• /ontologies/{name}/terms/{termid}
• Get related terms and navigate ontology structure
• /ontologies/{name}/terms/{termid}/parent
• /ontologies/{name}/terms/{termid}/children
• /ontologies/{name}/terms/{termid}/descendants
• /ontologies/{name}/terms/{termid}/ancestors
• /ontologies/{name}/terms/{termid}/{relation} e.g. part_of
http://www.ebi.ac.uk/ols/beta/api
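The endpoint templates on this slide can be assembled with a small helper. This is a sketch only: `term_url` is a hypothetical function, the ontology name `hp` is an assumed value for `{name}`, and percent-encoding the term ID is an assumption, since the slides do not say how the service expects `{termid}` to be encoded.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "http://www.ebi.ac.uk/ols/beta/api"  # base URL given on the slide


def term_url(ontology: str, term_id: str,
             relation: Optional[str] = None, base: str = BASE_URL) -> str:
    """Build /ontologies/{name}/terms/{termid}[/{relation}] under the base URL."""
    parts = [
        base.rstrip("/"),
        "ontologies",
        quote(ontology, safe=""),
        "terms",
        quote(term_id, safe=""),  # assumed encoding, see lead-in
    ]
    if relation:
        # e.g. "children", "ancestors" or a relation name such as "part_of"
        parts.append(quote(relation, safe=""))
    return "/".join(parts)


print(term_url("hp", "HP:0007641"))
print(term_url("hp", "HP:0007641", "children"))
```

Because the API is hypermedia driven, a client would normally follow the links returned in each response rather than build every URL by hand; the helper is only useful for the first request.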
Building the index
• We check all of the more than 140 external ontology files nightly for changes
• We have a master build index
• When an ontology is updated, we remove the old version and reload it using the Neo4j BatchInserter (potentially fragile)
• We push the master index to various production data centers
• Provides load balancing
[Figure: nightly crawl of all >140 registered ontologies]
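The nightly change check could look something like the sketch below. The slides do not say how changes are detected, so comparing SHA-256 checksums is an assumption, and `checksum` and `ontologies_to_reload` are hypothetical names rather than part of OLS.

```python
import hashlib
from typing import Dict, List


def checksum(content: bytes) -> str:
    """Fingerprint of one downloaded ontology file."""
    return hashlib.sha256(content).hexdigest()


def ontologies_to_reload(previous: Dict[str, str],
                         downloaded: Dict[str, bytes]) -> List[str]:
    """Return names whose file is new or differs from the last recorded checksum."""
    changed = []
    for name, content in sorted(downloaded.items()):
        if previous.get(name) != checksum(content):
            changed.append(name)
    return changed


# Checksums recorded by the previous nightly run (toy data).
last_run = {"a": checksum(b"v1"), "b": checksum(b"v1")}
# Files fetched tonight: "a" unchanged, "b" updated, "c" newly registered.
tonight = {"a": b"v1", "b": b"v2", "c": b"new"}
print(ontologies_to_reload(last_run, tonight))
```

Only the names returned here would need their old version removed and reloaded into the master index; unchanged ontologies are left alone.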
Conclusion
• We’ve built a scalable repository of biomedical ontologies
with Neo4j
• Generic OWL indexer (simplified OWL)
• Powerful REST API built with Spring
• Acts as standalone OWL ontology server
• Now being deployed externally
• Beta: ~2,000 users / 10 million requests per month
• Would like to discuss
• Batch Inserter
• Migrating to Spring Data Neo4j 4
Acknowledgements
• Samples, Phenotypes and Ontologies Team - Tony Burdett, James Malone, Dani Welter, Catherine Leroy, Sira Sarntivijai, Ilinca Tudose, Helen Parkinson
• Matt Pearce – Flax (BioSOLR project)
• Michal Bachman and GraphAware team (Neo4j training)
• Funding
• European Molecular Biology Laboratory (EMBL)
• European Union projects: DIACHRON, BioMedBridges and
CORBEL

GraphConnect Europe 2016 - Building a Repository of Biomedical Ontologies with Neo4j - Simon Jupp
