Rod Page @rdmpage
http://iphylo.blogspot.com
Knowledge graphs
Holly Bik @hollybik
Let’s rise up to unite taxonomy and technology
10.1371/journal.pbio.2002231
http://ispecies.org
Simple Javascript mashup
DBpedia
GBIF
CrossRef
EOL
Open Tree of Life
TreeBASE
https://doi.org/10.7717/peerj.190
The Semantic web:
“The future of the web…
and always will be” –
Peter Norvig (Google)
Obstacles to building knowledge graphs
• Technical
• Social
Obstacles to building knowledge graphs
• Need globally unique, persistent identifiers (how to label the nodes of the graph)
• Need to create and agree on vocabularies (how to label the edges of the graph)
• Need to agree how to transmit the graph
• Who stores the global graph?
A new hope
• The identifier wars are (nearly) over (DOIs FTW)
• Lots of domain-specific vocabularies, but schema.org is “good enough” for most things
• XML is becoming a bedtime story to frighten the children; JSON is everywhere (JSON-LD FTW)
• Wikidata
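As a rough illustration of how these pieces fit together, the sketch below describes the paper cited earlier in the deck (10.1371/journal.pbio.2002231) as JSON-LD, using a DOI as the node identifier and schema.org terms as the vocabulary. The helper name `article_jsonld` and the choice of properties are assumptions made for this example, not something the slides prescribe.

```python
import json


def article_jsonld(doi, title, author_name):
    """Build a minimal schema.org description of an article (illustrative only)."""
    return {
        "@context": "https://schema.org",
        "@type": "ScholarlyArticle",
        # The DOI, written as a URL, is the globally unique label for this node.
        "@id": f"https://doi.org/{doi}",
        "name": title,
        "author": {"@type": "Person", "name": author_name},
    }


doc = article_jsonld(
    "10.1371/journal.pbio.2002231",
    "Let’s rise up to unite taxonomy and technology",
    "Holly Bik",
)
print(json.dumps(doc, indent=2, ensure_ascii=False))
```

Because the result is ordinary JSON, any web client can consume it, while the `@context` and `@id` keys let a graph-aware consumer read the same document as nodes and edges.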
Obstacles to building knowledge graphs
• Technical
• Social Economic
Identifiers, identifiers, identifiers, identifiers
How do we measure progress?
Linear growth (easy): before → now
Connectivity (hard): before → now
Need network effects
One is useless, two is “meh”, many is better
The Semantic web:
“The future of the web…
and always will be” –
Peter Norvig (Google)
The knowledge graph is
already here (it’s just
not evenly distributed)
William Gibson @GreatDismal
Google’s Knowledge Graph
Towards a biodiversity knowledge graph
# Classification below Hominini in Wikidata, as (parent, child) name pairs
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?root_name ?parent_name ?child_name WHERE
{
  VALUES ?root_name {"Hominini"}
  ?root wdt:P225 ?root_name .      # P225 = taxon name
  ?child wdt:P171+ ?root .         # P171 = parent taxon; "+" follows one or more links
  ?child wdt:P171 ?parent .
  ?child wdt:P225 ?child_name .
  ?parent wdt:P225 ?parent_name .
}
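To make the `wdt:P171+` step concrete, here is a small sketch of the same traversal over a toy child-to-parent table. The data is a simplified, hypothetical stand-in for Wikidata, and the function names are invented for this example.

```python
# Toy child -> parent links standing in for wdt:P171 (hypothetical, simplified data).
PARENT = {
    "Hominini": "Homininae",
    "Gorillini": "Homininae",
    "Hominina": "Hominini",
    "Panina": "Hominini",
    "Homo": "Hominina",
    "Pan": "Panina",
    "Homo sapiens": "Homo",
    "Pan troglodytes": "Pan",
}


def descends_from(taxon, root):
    """True if root is reachable from taxon via one or more parent links (P171+)."""
    current = PARENT.get(taxon)
    while current is not None:
        if current == root:
            return True
        current = PARENT.get(current)
    return False


def subtree_rows(root):
    """Return sorted (root, parent, child) rows, mirroring the SELECT variables."""
    return sorted(
        (root, parent, child)
        for child, parent in PARENT.items()
        if descends_from(child, root)
    )


for row in subtree_rows("Hominini"):
    print(row)
```

The root itself and its sibling Gorillini are excluded, because the property path requires at least one parent link leading to the root.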
http://biohackathon.org/d3sparql/
Toshiaki Katayama @tktym
http://iphylo.blogspot.ca/2017/01/displaying-taxonomic-classifications.html
“Citations for the sum of
human knowledge”
WikiCite @WikiCite
Goal 1: Every citation in the Wikipedias should be in Wikidata
Goal 2: Every citation should be in Wikidata (!?)
Small knowledge graphs (hexastores)
Very simple ontology
Tom Scott @derivadow
Leigh Dodds @ldodds
Towards a biodiversity knowledge graph
Hexastore
• A triple is [s, p, o]
• Finding all statements [s, ?, ?] is a simple array lookup (all elements with key “s”)
• Finding all statements [?, ?, o] is slow (scan all triples)…
• …unless we add an array of [o, s, p] triples; then it is a simple array lookup (all elements with key “o”)
• Six variations cover all queries: [s, p, o], [s, o, p], [p, s, o], [p, o, s], [o, s, p], [o, p, s] (hence “hexastore”)
• In-memory graph database in JavaScript (think offline apps)
http://crubier.github.io/Hexastore/
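The six-index idea can be sketched in a few lines. This is a minimal illustration written for these notes, not the API of the Hexastore library linked above; the class, its methods and the example identifiers are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations


class Hexastore:
    """Tiny in-memory triple store that keeps all six orderings of each triple."""

    # spo, sop, pso, pos, osp, ops
    ORDERS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        # One nested index per ordering: index[order][first][second] -> set of thirds
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            a, b, c = (triple[k] for k in order)
            self.index[order][a][b].add(c)

    def match(self, s=None, p=None, o=None):
        """Return sorted (s, p, o) triples matching the pattern; None is a wildcard."""
        pattern = {"s": s, "p": p, "o": o}
        bound = [k for k in "spo" if pattern[k] is not None]
        free = [k for k in "spo" if pattern[k] is None]
        # Pick the ordering that puts the known positions first, so they are key lookups.
        order = "".join(bound + free)
        level1 = self.index[order]
        results = []
        firsts = [pattern[order[0]]] if len(bound) >= 1 else list(level1)
        for a in firsts:
            if a not in level1:
                continue
            level2 = level1[a]
            seconds = [pattern[order[1]]] if len(bound) >= 2 else list(level2)
            for b in seconds:
                if b not in level2:
                    continue
                thirds = level2[b]
                if len(bound) == 3:
                    thirds = thirds & {pattern[order[2]]}
                for c in thirds:
                    found = dict(zip(order, (a, b, c)))
                    results.append((found["s"], found["p"], found["o"]))
        return sorted(results)


store = Hexastore()
store.add("paper:1", "author", "person:A")
store.add("paper:1", "about", "taxon:X")
store.add("paper:2", "author", "person:A")

print(store.match(s="paper:1"))   # lookup by subject uses the "spo" index
print(store.match(o="person:A"))  # lookup by object uses the "osp" index, no scan
```

The trade-off is the one the slide implies: every triple is stored six times, in exchange for never having to scan the whole store for a pattern with at least one known position.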
Towards a biodiversity knowledge graph
Xanadu, the web that wasn’t
Ted Nelson: hyperlinks and hypermedia, two-way links and “transclusion” = Xanadu
Tim Berners-Lee: HTTP, URL, HTML, one-way links = World Wide Web
[Diagram: web page → other web page]
Web linking: one-way and document-level; the “target” doesn’t know that it is linked to (“cited”), and the link can break (404)
[Diagram: text in a work ↔ text in its source]
Xanadu linking: two-way and fragment-level; the “source” knows it is linked to, source content is embedded, and links don’t break
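The difference between one-way and two-way links comes down to whether the linked-to document can see who points at it. A minimal sketch, assuming a hypothetical `TwoWayLinks` registry that records each link in both directions (it does not attempt fragment-level addressing or transclusion):

```python
from collections import defaultdict


class TwoWayLinks:
    """Link registry that stores every link in both directions."""

    def __init__(self):
        self.outgoing = defaultdict(set)  # citing document -> documents it links to
        self.incoming = defaultdict(set)  # cited document -> documents linking to it

    def link(self, citing, cited):
        self.outgoing[citing].add(cited)
        self.incoming[cited].add(citing)

    def links_from(self, document):
        """What the web gives us: the links a document makes."""
        return sorted(self.outgoing.get(document, set()))

    def cited_by(self, document):
        """What one-way web links lack: who links to this document."""
        return sorted(self.incoming.get(document, set()))


links = TwoWayLinks()
links.link("article", "flora")
links.link("blog post", "flora")
print(links.cited_by("flora"))
```

With only the `outgoing` table, answering `cited_by` would require crawling every document, which is roughly the position the web is in today.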
Xanadu
A New Account of the Genus
Horsfieldia (Myristicaceae), Pt 2
W J J O De Wilde
The Gardens' bulletin, Singapore 38(1): 55-144 (1985)
http://biostor.org/reference/175018
Horsfieldia lancifolia
BioStor @biostor_org
Biodiversity Heritage Library @biodivlibrary
Flora Malesiana. Series I - Seed Plants,
Volume 14. Myristicaceae
https://doi.org/10.3897/ab.e1141
Description (Flora), Description (Article)
Embedded markup (bad)…
Crocidura absconditus, new species
<i>Crocidura absconditus</i>, new species
…versus annotation (good)
Crocidura absconditus, new species
{ [0,20], “italics” }
(think NLM JATS XML markup versus Substance JSON used by Lens viewer, https://lens.elifesciences.org/about/)
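A small sketch of the annotation approach: the text stays untouched and the formatting lives in a separate list of character ranges. The `render` function and the tuple layout are assumptions made for this example, not the Substance or Lens data model. Note that the code uses Python's half-open offsets, so the name spans (0, 21), which is the same stretch of text as the inclusive [0,20] on the slide.

```python
import html


def render(text, annotations):
    """Apply non-overlapping stand-off annotations to plain text, producing HTML."""
    tags = {"italics": "i", "bold": "b"}
    out, cursor = [], 0
    for start, end, kind in sorted(annotations):
        tag = tags[kind]
        out.append(html.escape(text[cursor:start]))  # unannotated text before the span
        out.append(f"<{tag}>{html.escape(text[start:end])}</{tag}>")
        cursor = end
    out.append(html.escape(text[cursor:]))  # whatever follows the last span
    return "".join(out)


text = "Crocidura absconditus, new species"
name = "Crocidura absconditus"
start = text.index(name)
annotations = [(start, start + len(name), "italics")]

print(annotations)
print(render(text, annotations))
```

Because the text itself is never modified, further annotations (a taxonomic name, a link to a specimen) can be layered over the same offsets without re-parsing markup.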
@hypothes_is
Annotating a
scientific paper
Aggregating annotations (iPhylo)
http://iphylo.blogspot.co.uk/2016/06/aggregating-annotations-on-scientific_30.html
Taxonomic names, specimen codes, geographic localities, references are all annotations
Taxonomic databases
are not lists of names…
…they are lists of annotations
(“this name occurs on this page”)
Annotations are retrospective nanopublications
Annotating existing content
(extracting “facts”)
Today
Publishing “facts” as nanopublications
Stream of “facts”
Social design and the
knowledge graph
Obstacles to building knowledge graphs
• Technical
• Social Economic
Nico Franz @taxonbytes
ORCID
(person)
DOI
(publication)
LSID
(plant name)
Find my papers that
published new species
@SandyKnapp
Towards a biodiversity knowledge graph
ORCID
(person)
DOI
(publication)
LSID
(plant name)
#Iamataxonomist
(claim/demonstrate expertise)
What Sandy really wants
“Which specimens that I collected have been described as new species by other people?”
[Diagram: person → collected → specimen → type for → plant name → published in → publication → author → other person (not the same person)]
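Once people, specimens, names and publications are linked, the question on this slide becomes a short walk over the graph. The sketch below uses made-up triples and placeholder identifiers (not real ORCIDs, DOIs or LSIDs), and the predicate names are assumptions chosen to match the labels in the diagram.

```python
# Hypothetical triples; every identifier here is a placeholder.
TRIPLES = [
    ("specimen:1", "collected_by", "person:A"),
    ("specimen:1", "type_for", "name:1"),
    ("name:1", "published_in", "pub:1"),
    ("pub:1", "author", "person:B"),
    ("specimen:2", "collected_by", "person:A"),
    ("specimen:2", "type_for", "name:2"),
    ("name:2", "published_in", "pub:2"),
    ("pub:2", "author", "person:A"),
]


def objects(subject, predicate):
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def described_by_others(collector):
    """Specimens the collector gathered that are types for names published by others."""
    hits = set()
    for specimen in subjects("collected_by", collector):
        for name in objects(specimen, "type_for"):
            for pub in objects(name, "published_in"):
                authors = objects(pub, "author")
                # "Other people": the collector is not among the authors.
                if authors and collector not in authors:
                    hits.add((specimen, name, pub))
    return sorted(hits)


print(described_by_others("person:A"))
```

The walk only works if each hop uses the same identifier for the same thing, which is why the slides keep returning to identifiers.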
Towards a biodiversity knowledge graph
Knowledge graphs
considered harmful
(remember Impact Factors?)
http://www.museum-analytics.org/
Cited, linkable specimens
NMNH Vertebrate Zoology Herpetology Collections: 11194
CAS Herpetology Collection Catalog: 9619
MCZ Herpetology Collection: 6720
Herpetology Collection (University of Kansas Biodiversity Research Center): 5818
http://iphylo.blogspot.co.uk/2012/02/gbif-specimens-in-biostor-who-are-top.html
Towards a biodiversity knowledge graph
We will need to ensure our knowledge graph is
free, open, and used for good

Editor's Notes

  • #4: https://doi.org/10.1371/journal.pbio.2002231
  • #5: http://ispecies.org
  • #6: https://doi.org/10.7717/peerj.190
  • #18: https://www.wikidata.org
  • #19: http://iphylo.blogspot.ca/2017/01/displaying-taxonomic-classifications.html
  • #21: http://www.bbc.co.uk/nature/life/Steller's_Sea_Eagle
  • #25: http://crubier.github.io/Hexastore/
  • #33: Ted Nelson’s Xanadu project, linking and microcredit
  • #34: http://biostor.org/reference/175018
  • #35: https://doi.org/10.3897/ab.e1141
  • #37: https://lens.elifesciences.org/about/
  • #40: http://iphylo.blogspot.co.uk/2016/06/aggregating-annotations-on-scientific_30.html
  • #45: https://doi.org/10.1101/157214
  • #54: https://ontotext.com/knowledgehub/case-studies/sn-scigraph-uses-graphdb/. Springer SciGraph https://twitter.com/OntotextGraphDB/status/898143878724935681