Integrating data with phylogenies, at scale
Nico	Cellinese	
University	of	Florida	
&	
Hilmar	Lapp	
Duke	University
What’s in a name?
Chaos!
•  Names and concepts do not reconcile that easily
•  Names are text strings
•  Context is lacking or subjective
•  Meaning is not computable
Linnean names point to concepts
Antoine Laurent de Jussieu, Genera Plantarum, 1789
I don’t understand any of those concepts, whether in Latin or English, but I can still link them to their names, as in one object to one object
…and	200+	
…and	400+
Idiosyncratic Russian dolls syndrome
From a human perspective, we lose track of concepts. Hard to reconcile all of them. We need help! Can we compute them?
•  We can unclutter concepts, and thereby nomenclature
•  How do we navigate along the Tree of Life repurposing Linnean names, which are linked to traditional concepts?
Dark	taxa!	
How	do	we	integrate	data	with	this	tree?
Tree-thinking
Common descent → evolution at the center of taxonomy
[Figure: tree with tips A, B, C, D; labels: Branches, Synapomorphies, Clades = taxa]
Discovery
Communication: How??
[Figure: plot labeled "Density" and "Diversification rate"]
Tree-thinking	
Berberidopsidaceae	
Opiliones	
Zingiberaceae	
Hamamelidaceae	
Sarcolaenaceae	
Lingulidae	
Hymenoptera	
Mammalia	
Apocynaceae	
Galliformes	
Rubiaceae 	
Anarthriaceae	
Lineidae	
Crocodylidae	
Stylosiphonia
Andrenidae Cracidae
Gavialis
Globba
Micrella
Rhodoleia
Phalangiidae Tachyglossa
Lyginia
Mediusella
Chamaeclitandra
These names are not generated in an evolutionary-based framework
(Groups defined by character similarity vs. common descent)
Both	the	Encyclopedia	of	Life	(EOL)	and	the	Open	Tree	of	Life	suggest	that	
Campanuloideae	is	a	misspelling	of	Campaniloidea	(marine	gastropods!)		
GBIF	does	not	currently	have	Campanuloideae	in	its	backbone	taxonomy.
Are	you	kidding	me?	
These	are	the	Campanuloideae!	
Wang	et	al.	2014
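The mix-up above is what plain string similarity produces when a name carries no computable meaning. A minimal sketch using Python's standard `difflib`; the 0.85 cutoff and the `looks_like_misspelling` helper are assumptions made for illustration, not a description of how EOL or the Open Tree of Life actually match names:

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 score based purely on shared runs of characters."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def looks_like_misspelling(query: str, known: str, cutoff: float = 0.85) -> bool:
    """Hypothetical rule: treat near-identical strings as the same name."""
    return name_similarity(query, known) >= cutoff


# A plant group and a group of marine gastropods differ by only a few letters.
score = name_similarity("Campanuloideae", "Campaniloidea")
flagged = looks_like_misspelling("Campanuloideae", "Campaniloidea")
print(f"similarity={score:.2f}, treated as misspelling: {flagged}")
```

Nothing in the comparison knows that one string names plants and the other snails, which is the gap a semantic definition is meant to close.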
Life	as	a	street	map	How	to	navigate	life	as	a	machine
Mapping data to phylogenetic knowledge space
Street	signs	serve	people,	not	machines
•  How	do	we	build	a	reliable	GPS	for	phylogenies?	
•  How	do	we	reproducibly	find	the	right	nodes?	
	
FEED
Textual Definition –
The hyoglossus is a muscle that attaches to
the hyoid and tongue and is innervated by
Cranial Nerve XII.
Computable Definition –
('attached to' some 'hyoid bone')
and ('attached to' some tongue)
and ('innervated by' some 'hypoglossal nerve')
and spatially disjoint with 'intrinsic tongue muscle'
Druzinsky et al. (2015): Logic definitions of mammalian feeding muscles by means of necessary and sufficient conditions true for all mammals
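To make "computable" concrete, the definition above can be read as a set of conditions that a described structure either meets or does not. The sketch below is a toy, closed-world check in Python, not OWL reasoning; the `satisfies` helper and both example descriptions are hypothetical:

```python
# The four conditions from the computable definition, as (relation, target) pairs.
HYOGLOSSUS_DEFINITION = frozenset({
    ("attached to", "hyoid bone"),
    ("attached to", "tongue"),
    ("innervated by", "hypoglossal nerve"),
    ("spatially disjoint with", "intrinsic tongue muscle"),
})


def satisfies(facts: set, definition: frozenset = HYOGLOSSUS_DEFINITION) -> bool:
    """True if every condition of the definition is among the stated facts."""
    return definition <= facts


# Hypothetical description that states all four conditions plus an extra fact.
described = {
    ("attached to", "hyoid bone"),
    ("attached to", "tongue"),
    ("innervated by", "hypoglossal nerve"),
    ("spatially disjoint with", "intrinsic tongue muscle"),
    ("part of", "head"),
}

# Hypothetical description with the innervation left out.
incomplete = described - {("innervated by", "hypoglossal nerve")}

print(satisfies(described), satisfies(incomplete))
```

A real reasoner works under open-world semantics, so a missing statement means "unknown" rather than "false"; the set comparison here only illustrates that the definition is something a machine can test.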
Nomenclature ≠ Semantics
Phyloreference = Logic definition of a clade, using the property common to all of life
Phyloreferences
Statements formally expressing the patterns we discover (analogous to map coordinates)
[Figure: each type shown on a tree with tips A, B, C; X marks a character on the apomorphy-based tree]
•  Node-based: the clade originating with the last common ancestor of B and C.
•  Branch-based: the clade originating with the first ancestor of B that is not an ancestor of A.
•  Apomorphy-based: the clade originating with the first ancestor of C to evolve X.
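The three kinds of reference above can each be resolved mechanically once a tree is available. A minimal Python sketch, assuming the toy topology (A,(B,C)); the internal node names `root` and `n1`, the function names, and the trait states for X are all made up for illustration:

```python
from typing import Dict, List, Optional

# Toy rooted tree (A,(B,C)) stored as child -> parent.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root",
    "n1": "root",
    "B": "n1",
    "C": "n1",
}


def lineage(node: str) -> List[str]:
    """Path from a node up to the root, starting with the node itself."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def node_based(x: str, y: str) -> str:
    """Node where the clade of the last common ancestor of x and y begins."""
    ancestors_of_y = set(lineage(y))
    return next(n for n in lineage(x) if n in ancestors_of_y)


def branch_based(internal: str, external: str) -> str:
    """Oldest node on the lineage of `internal` that is not on the lineage of `external`."""
    ancestors_of_external = set(lineage(external))
    inside = [n for n in lineage(internal) if n not in ancestors_of_external]
    if not inside:
        raise ValueError("internal and external specifiers coincide")
    return inside[-1]


def apomorphy_based(tip: str, has_trait: Dict[str, bool]) -> str:
    """Oldest node in the unbroken run of trait-bearing nodes above `tip`."""
    with_trait = []
    for n in lineage(tip):
        if not has_trait.get(n, False):
            break
        with_trait.append(n)
    if not with_trait:
        raise ValueError("the tip itself does not show the trait")
    return with_trait[-1]


# Assumed trait reconstruction: X evolved on the branch leading to C only.
HAS_X = {"C": True}

print(node_based("B", "C"))         # last common ancestor of B and C
print(branch_based("B", "A"))       # first ancestor of B that is not an ancestor of A
print(apomorphy_based("C", HAS_X))  # first ancestor of C to evolve X
```

On this toy tree the first two calls resolve to `n1` and the third to `C`; on a different tree the same three statements would pick out whichever nodes satisfy them there.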
Phyloreferences	yield	a	
coordinate	system	for	the	Tree	of	Life	
•  Any	node,	branch,	subtree	is	referenceable	
•  References	are	unambiguous	
•  References	are	computable	
•  References	are	portable	
•  Adapts	to	new	and	changing	knowledge
Many needed technologies already exist
•  OWL ontologies designed for
–  Phylogenetic knowledge: CDAO
–  Phenotypic knowledge: Uberon, PATO, …
–  Efficient and expressive reasoners: FaCT++, HermiT, Racer, ELK
[Figure: phylogeny with tips Campanula_rotundifolia, Pseudonemacladus_oppositifolius, Lobelia_cardinalis, Campanula_latifolia, Cyphocarpus_rigescens, Wahlenbergia_linifolia, Nemacladus_ramosissmus, Lobelia_coronopifolia, Cyphia_elata, Pentaphragma, Crysanthemum, Sphenoclea, Platycodon_grandiflorus, Cyphia_bulbosa; internal nodes numbered 1 to 10; clades labeled Campanula, Lobelia, Cyphia]
Class: Campanulaceae_1889_to_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Crysanthemum
Class: Campanulaceae_1980
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Lobelia
Class: Campanulaceae_after_1995
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
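The three class expressions share one included taxon and differ only in the excluded one, so each picks out a different clade. The Python sketch below reads each expression as "the largest clade that contains the included taxon but not the excluded one". That reading, the `resolve` helper, and above all the five-tip topology are assumptions for illustration; the topology is NOT the tree shown on the slides, and taxon labels are spelled as they appear there:

```python
# Hypothetical nested-tuple topology, for illustration only.
TREE = ("Crysanthemum",
        ("Sphenoclea",
         ("Lobelia",
          ("Campanula_latifolia", "Campanula_rotundifolia"))))


def tips(node) -> frozenset:
    """All tip labels at or below a node (a tip is a plain string)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def subtrees(node):
    """Yield every node of the tree, root first."""
    yield node
    if not isinstance(node, str):
        for child in node:
            yield from subtrees(child)


def resolve(includes: str, excludes: str) -> frozenset:
    """Tips of the largest clade that contains `includes` but not `excludes`."""
    candidates = [tips(n) for n in subtrees(TREE)
                  if includes in tips(n) and excludes not in tips(n)]
    return max(candidates, key=len)


# (included taxon, excluded taxon) for each class from the slides.
DEFINITIONS = {
    "Campanulaceae_1889_to_1980": ("Campanula_latifolia", "Crysanthemum"),
    "Campanulaceae_1980": ("Campanula_latifolia", "Lobelia"),
    "Campanulaceae_after_1995": ("Campanula_latifolia", "Sphenoclea"),
}

RESOLVED = {name: resolve(inc, exc) for name, (inc, exc) in DEFINITIONS.items()}
for name, clade in RESOLVED.items():
    print(name, sorted(clade))
```

On this made-up tree the three definitions resolve to clades of four, two, and three tips respectively, which shows how one name can stand for several circumscriptions that a machine can tell apart.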
Phyloreferences as ontological expressions
Phyloreference expressions can be:
•  Easily generated by anyone
•  Applied to any tree
•  Named and registered
–  To promote reuse and consistency
–  To improve usability and accessibility

Class: Campanulaceae
Annotations:
    rdfs:label “Campanulaceae_after_1995”
    dc:description “the clade that includes Campanula latifolia but not Sphenoclea”
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea

vs.

Class: AGF4-SHRU-3560
EquivalentTo:
    cdao:has_Descendant value taxon:Campanula_latifolia
    and phyloref:excludes_lineage value taxon:Sphenoclea
Challenges
•  OWL-based data model to satisfy phylogenetic taxonomy, reasoning expressivity, scalability
•  Conventions for data transformation, and consequences of different choices
•  Least common ancestor reasoning for OWL data
•  Lack of canonical specimen identifier system
•  Specifier mapping ontologies
Tree of Life, ontologized:
A universal coordinate system
•  The Tree of Life is itself an aggregation and integration of our phylogenetic knowledge.
•  Phyloreferencing is addressing into a knowledge universe.
•  Ontologies, reasoning, and other KR techniques are powerful tools for this.
Acknowledgements	
•  National Science Foundation (DBI-1458484)
•  Ken	and	Linda	McGurn	
•  Phenoscape	
•  EvoIO
