Donat Agosti Plazi
http://plazi.org
Systematics Association
Oxford, 28. August 2015
Nothing in taxonomy makes sense
except in the light of Open Access
I want to be able, at any time and anywhere, to access, mine and analyse a
significant body of published and digitized taxonomic knowledge.
I want to build the catalogue of life by machine.
I hope taxonomic communication arrives in the 21st century.
Vision and hope
1. The demand
Before antbase.org, Harvard's Museum of Comparative Zoology could claim to be the
only location with a complete set of ant systematics publications from 1758 to the present.
Through antbase.org's digital library, access to this body of literature is worldwide,
and it is actively used (>10,000 visits in a single month).
2004
2. The corpus of taxonomic literature
Build and establish a TreatmentBank, such as Plazi, as a basis for
content mining of, and linking to, the taxonomic literature
3. The core corpus of taxonomic knowledge: Treatments
4. Make use of the semantic linked WWW
Avoid all the wasteful current publishing!
• Publish structured data
• Publish open access
• Make taxonomic literature first class literature by minting
DOIs and making digital copies accessible
• Add links to names, treatments, articles, DNA sequences,
digital objects
• Help by building your own public corpus of citable data
Pensoft journals (e.g. Biodiversity Data Journal, ZooKeys,
PhytoKeys) are the gold standard.
Surfing or the seduction of science (for a young kid)
Surfing or the seduction of science (for an adult)
Get a copy of the Cyclothone paper
Surfing or the seduction of science (for an adult)
Surfing or the imperative for science
Linking treatments and data with external resources
NCBI
Surfing or the imperative for science
Establish Plazi as, or use Plazi to build, TreatmentBank as a source for content mining of the
taxonomic literature
TreatmentBank
What are the species in Amazonia?
TreatmentBank
Countries (Region)
Australia (Queensland)
Export species materials citations (DwC)
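A minimal sketch of what such an export could look like, assuming each materials citation has already been extracted into a small record. The column names scientificName, country and stateProvince are Darwin Core terms; the MaterialsCitation class, the export_dwc_csv function and the placeholder species name are hypothetical and only illustrate the shape of the output, not Plazi's actual export.

```python
import csv
import io
from dataclasses import dataclass, asdict


# Hypothetical record for one materials citation extracted from a treatment.
@dataclass
class MaterialsCitation:
    scientificName: str  # Darwin Core term
    country: str         # Darwin Core term
    stateProvince: str   # Darwin Core term (the "Region" on the slide)


def export_dwc_csv(citations):
    """Serialise materials citations as CSV text with Darwin Core column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["scientificName", "country", "stateProvince"],
        lineterminator="\n",
    )
    writer.writeheader()
    for citation in citations:
        writer.writerow(asdict(citation))
    return buffer.getvalue()


# Placeholder species name; country and region are the ones shown on the slide.
sample = [MaterialsCitation("Genus species", "Australia", "Queensland")]
csv_text = export_dwc_csv(sample)
```

Keeping the column names identical to the Darwin Core terms means the file can be read by tools that already understand that vocabulary without a separate mapping step.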
Text mining tools: Visualization of treatment content
Summary of the content of 37 Zootaxa spider publications and 8
Biodiversity Data Journal publications (Miller et al., 2015).
Pseudomyrmex ants and Vachellia ant-acacias
are a classic example of mutualism in biology.
allenii
melanoceras
ruddiae
chiapensis
collinsii
cookii
cornigera
globulifera
hindsii
janzenii
mayana
sphaerocephala
boopis
flavicornis
hesperius
ita
janzeni
kuenckeli
mixtecus
nigrocinctus
nigropilosus
opaciceps
particeps
peperi
reconditus
satanicus
simulans
spinicola
subtilissimus
veneficus
ferrugineus
gentlei
gracilis
Transbiotic link network
Associated species linked through
references in taxonomic treatments
Acacia-ant species: Pseudomyrmex gracili
Treatment: redescription
Associated ant-acacia: Acacia gentlei
Ants Plants
Photocredits: Alex Wild
Treatment
Treatments linked
through citations
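The link network described here can be sketched as a simple undirected graph, assuming each treatment yields pairs of (treated taxon, associated taxon). The function name build_link_network is hypothetical, and the only pair used is the one spelled out on the slide, with the ant epithet taken from the species list above it; a real network would be built from all treatments.

```python
from collections import defaultdict


def build_link_network(associations):
    """Build an undirected network from (treated taxon, associated taxon) pairs."""
    network = defaultdict(set)
    for treated, associated in associations:
        # Store the link in both directions so it can be followed
        # from the ant to the plant and from the plant to the ant.
        network[treated].add(associated)
        network[associated].add(treated)
    return network


# The one association spelled out on the slide.
links = build_link_network([("Pseudomyrmex gracilis", "Acacia gentlei")])
```

Looking up either name in links returns the set of taxa it is associated with.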
Text mining tools: Visualization of treatment content
What does this mean?
The Linking Open Data cloud diagram
Linked Open Data Cloud
The demand: scientists and citizen scientists
Online catalogue
Open access
Online library
Online catalogue
The interest of big science
2004
2005
The demand: scientists and citizen scientists
The scientific challenge: Bridging the gap
1 tnntttccca cgaataaata atataagatt ttgattatta cctccttctt taattttatt
61 attatcaaga agattagttt ataaaggagt aggaacagga tgaactgttt atcctccttt
121 atctaataat ttatatcata atggattttc aactgattta gcaatttttt ctttacatat
181 tgcaggaata tcatcaatta taggagcaat taattttatt tcaacaattt taaatataca
241 tcataaaaat ttatcattag ataaaattcc attgttagtt tgatcaattt taattacagc
301 tattttatta ttattatctt tacctgtatt agcaggtgca attactatat tattaactga
361 tcgaaatcta aatacaactt tttttgatcc ttcgggtgga ggagatccaa ttttatatca
421 acatttattt
Where do we stand?
The bristlemouths are a rapacious
family of deep-sea fishes that include
the wildly successful genus
Cyclothone
In contrast, ichthyologists put the
likely figure for bristlemouths at
hundreds of trillions — and perhaps
quadrillions, or thousands of
trillions.
Taxonomy?
Source?
Issue USD 266.00
Article USD 48.00
Get a copy of the Cyclothone paper
Our contribution for a better understanding of biodiversity
Access to ant taxonomic publications through antbase.org / Smithsonian Institution, currently including the entire
body of non-copyrighted publications since 1758 (>4,000 publications or 85,000 pages). Source: Agosti (2005)
Access
• Limited access (copyright)
• Limited discoverability of content
• Research results cannot be cited
• Data mining does not work
Issues of access
Provide an open access, linked corpus of taxonomic literature
A solution
Surfing at breakfast table
article
treatment
Cites
httpURI
cites (DOI)
Scientific name
https://www.wikidata.org/wiki/
Property:P1992
Feed Wikipedia with taxonomic data
Surfing or the imperative for science
LODPDF
HNS
H
Surfing or the imperative for science: Use of name services
The goal
Create a citable open corpus of taxonomic publications
Biodiversity Literature Repository: Record
Biodiversity Literature Repository: Record
Treatment
Illustration
http://plazi.org/wiki/Blue_List
Patterson et al., 2014: http://dx.doi.org/10.1186/1756-0500-7-79
Legal issues
Workflow
Plazi
SRS
find scan «OCR» markup store +
access
Text
<tax:treatment>
<tax:nomenclature>
<tax:name>
<tax:xid source="HNS" identifier="193329"/>
<tax:xmldata>
<dc:Genus>Mystrium</dc:Genus>
<dc:Species>leonie</dc:Species>
</tax:xmldata>
Mystrium leonie
</tax:name>
<tax:status>n. sp.</tax:status>
Fig 1 D - F
</tax:nomenclature>
<tax:div type="description">
<tax:p>HOLOTYPE WORKER: TL 3.95, HL 1.02, HW 0.95, CI
1.30, SI 137, PW 0.73, ML 0.38. Mandible outer margi
to a sharp apical tooth, the apex parallel to the an
(Holotype with material in mandibles, so mandibles a
$ described below from paratypes.) Median clypeus
....
</tax:treatment>
Semantically
enhanced text
(TaxonX)
… alternatives: From human to machine readable text
RDF
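Once a treatment is marked up this way, a machine can read the name and its status directly. The following is a minimal sketch, assuming a complete, well-formed version of the nomenclature part of the markup shown on the slide. The namespace URIs are placeholders, because the slide shows only the tax: and dc: prefixes, and the function name read_nomenclature is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are placeholders: the slide shows only the "tax:" and "dc:"
# prefixes, not the URIs they are bound to.
NS = {"tax": "http://example.org/tax", "dc": "http://example.org/dc"}

SAMPLE = """\
<tax:treatment xmlns:tax="http://example.org/tax" xmlns:dc="http://example.org/dc">
  <tax:nomenclature>
    <tax:name>
      <tax:xid source="HNS" identifier="193329"/>
      <tax:xmldata>
        <dc:Genus>Mystrium</dc:Genus>
        <dc:Species>leonie</dc:Species>
      </tax:xmldata>
      Mystrium leonie
    </tax:name>
    <tax:status>n. sp.</tax:status>
  </tax:nomenclature>
</tax:treatment>
"""


def read_nomenclature(xml_text):
    """Return genus, species, status and external name identifier of a treatment."""
    root = ET.fromstring(xml_text)
    name = root.find("tax:nomenclature/tax:name", NS)
    xid = name.find("tax:xid", NS)
    return {
        "genus": name.findtext("tax:xmldata/dc:Genus", namespaces=NS),
        "species": name.findtext("tax:xmldata/dc:Species", namespaces=NS),
        "status": root.findtext("tax:nomenclature/tax:status", namespaces=NS),
        "xid_source": xid.get("source"),
        "xid": xid.get("identifier"),
    }


record = read_nomenclature(SAMPLE)
```

The same lookup on plain OCR text would need name recognition and guesswork; with the markup in place it is a handful of path expressions.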
Plazi tools: table extraction
«Treatment»
Scientific species name
Distribution record
Cataglyphis tartessica workers
Variable           mean ± SD
Head length        11.23 ± 0.12
Head width         11.15 ± 0.12
Scape length       11.47 ± 0.12
Mesosoma length    11.94 ± 0.16
Femur length       12.03 ± 0.14
Cephalic index     0 93.60 ± 3.940
Scape index        128.10 ± 7.660
Plazi tools: discovery of scientific names
Plazi tools: discovery and parsing of bibliographic references
Plazi tools: discovery and parsing of observation data
Plazi tools: discovery of treatments
Treatment: a well-defined part of an article that
defines the particular usage of a scientific name
by an authority at a given time (one or more pages in a
publication).
Treatment
The special case of taxonomic literature: the cited elements are
treatments, not articles
Formica obsoleta Linnaeus, 1758: 580
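A citation of this kind already carries the elements of the treatment definition: a name, an authority, a time and a page. As a minimal sketch, assuming only the simple "Genus species Authority, year: page" form shown above, it can be split into those parts as follows. The class and function names are hypothetical, and real citations come in many more variants than this pattern covers.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TreatmentCitation:
    genus: str
    species: str
    authority: str
    year: int
    page: int


# Covers only the simple "Genus species Authority, year: page" form.
_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+) (?P<species>[a-z]+) "
    r"(?P<authority>[^,]+), (?P<year>\d{4}): (?P<page>\d+)$"
)


def parse_treatment_citation(text):
    """Split a simple treatment citation into name, authority, year and page."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported citation form: {text!r}")
    return TreatmentCitation(
        genus=match["genus"],
        species=match["species"],
        authority=match["authority"],
        year=int(match["year"]),
        page=int(match["page"]),
    )


citation = parse_treatment_citation("Formica obsoleta Linnaeus, 1758: 580")
```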
Treatment
Original combinations
Reference to an original combination
Subsequent usages of names cite the referenced treatment
What is a treatment?
Treatment and treatment reference and citation
Treatment citation
Treatment
references
Treatment
Citing of treatments or linking of treatments to treatments
By minting persistent httpURIs for treatments, treatments
can be cited like a bibliographic reference
http://treatment.plazi.org/id/A9FFD1FC-4629-FFB4-968F-AD38386521BA
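A minimal sketch of how such identifiers could be handled in code, assuming that every treatment identifier has the same shape as the single example above (five hyphen-separated groups of upper-case hexadecimal digits). The prefix is copied verbatim from the slide and the function names are hypothetical.

```python
import re

# Prefix copied verbatim from the slide; treat it as a configurable assumption.
TREATMENT_PREFIX = "http://treatment.plazi.org/id/"

# Assumed identifier shape, inferred from the one example on the slide.
_ID_PATTERN = re.compile(r"[0-9A-F]{8}(?:-[0-9A-F]{4}){3}-[0-9A-F]{12}")


def treatment_uri(identifier, prefix=TREATMENT_PREFIX):
    """Build the citable URI for a treatment identifier."""
    if _ID_PATTERN.fullmatch(identifier) is None:
        raise ValueError(f"not a treatment identifier: {identifier!r}")
    return prefix + identifier


def treatment_id(uri, prefix=TREATMENT_PREFIX):
    """Extract the treatment identifier from a citable URI."""
    if not uri.startswith(prefix):
        raise ValueError(f"not a treatment URI: {uri!r}")
    identifier = uri[len(prefix):]
    if _ID_PATTERN.fullmatch(identifier) is None:
        raise ValueError(f"not a treatment identifier: {identifier!r}")
    return identifier
```

Because the URI is nothing more than a fixed prefix plus the identifier, a citing treatment only needs to store the identifier to link to the cited one.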
Status quo
• 50,000+ treatments live, growing daily
• RDF in beta version
• GoldenGate Imagine (PDF and text mining tool) in beta version
• Provider for data for NCBI, Wikidata, GBIF, EOL, antweb
• Biodiversity Literature Repository functional
Next steps
• Collaborate with ContentMine to extract >50
treatments/day
Next steps
Planned collaboration with ContentMine to extract treatments on a
daily basis
http://www.slideshare.net/petermurrayrust/?
BioDiv
Next steps
• Collaborate with ContentMine to extract 50 treatments/day
• 1 million treatments live
• RDF version accessible
• GoldenGate Imagine (text mining tool)
• Data provider for NCBI, GBIF, EOL, antweb
• Biodiversity Literature Repository with 100,000 bibliographic
references and digital copies (PDF, images, etc.)
Next steps
BUT
Next steps
Avoid all this waste (our next generation will have to clean up)!
Publish structured data
Publish open access
Publish in journals with DOI
Add links to names, treatments, articles, DNA sequences, digital
objects
Help build your own corpus of citable data
Pensoft journals (e.g. Biodiversity Data Journal, ZooKeys,
PhytoKeys) are the gold standard.
Thanks!
Donat Agosti
agosti@plazi.org
Acknowledgment: Pensoft, Zenodo/CERN, NCBI, Wikidata, ContentMine
