Leveraging an international infrastructure
Case studies from the Encyclopedia of Life




Cynthia Parr, Katja Schulz, and Jennifer Hammock
TDWG 2012                                          @cydparr
Beijing, China 25 October 2012                     @eol
Outline
• Briefly, who are we and why are we here?
• The information landscape of species descriptions
• Thoughts for the future

Note:
Rubenstein Fellows Proposals are due 15 November
See http://eol.org/info/Rubenstein_2013_competition
EOL aggregates and curates
[Diagram labels: Aggregate; Curate; Comment; Rate, Collect; Quality control; Third party apps; eol.org]
Why survey the landscape?
•   Improve standards and set goals
•   Prepare for text mining
•   Learn how best to support quality control
•   Baseline for improving multilingual, open
    content

• Because we can
>1.1 million taxon pages with content
from more than 200 providers and 1000s of individuals
5 million content objects
Total of 1,822,079 images, 9,586 videos, and 28,569 sounds
Number of text objects, by subject of text object
[Bar chart; x-axis: number of text objects, 0 to 800,000. Subjects shown: Distribution, TypeInformation, Habitat, Threats, Conservation, Trends, Associations, TrophicStrategy, PopulationBiology, Migration, LifeExpectancy, Behaviour, Diseases]
Many user-created EOL collections are local checklists
[Chart: Geographical checklists, n=618; Other checklists, n=1662]
License restrictions vary by object type
[Stacked bar chart; y-axis: 0% to 100%; object types: text, images, maps, videos, sounds; licenses: public domain, cc-by, cc-by-sa, cc-by-nc, cc-by-nc-sa; annotations: n=~5 million, n=~3 million]
EOL interface now in 12 languages
Via translatewiki.org
[Map labels: Norway, Dutch, USA, Taiwan, Mexico, China, Egypt, India, Costa Rica, Colombia, Peru, Australia, South Africa]
However…
Vernacular names in 163 languages
Text description objects in 17 languages
Some providers get higher ratings than others
[Stacked bar chart; y-axis: 0% to 100%; legend: 1 star to 5 stars]
Total n = 154,308 rating actions
Showing only those 17 providers who got at least 1000 ratings
Full curators down-rate and non-curators up-rate
[Stacked bar chart; y-axis: 0% to 100%; legend: ratings 1 to 5; groups: non-curators, assistant curators, full curators]
Assistant (n=177) & full curators (n=984) are different
[Stacked bar chart; y-axis: 0% to 100%; groups: assistant curators (5,816 actions), full/master curators (223,639 actions); legend: common names, set exemplar, rating, taxon associations, add articles, classifications, trust/untrust]
33 actions per assistant curator
227 actions per full curator
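The per-curator figures on this slide can be reproduced from the counts it reports. The sketch below is only an arithmetic check, assuming that the 5816 actions belong to the 177 assistant curators and the 223,639 actions to the 984 full curators; the variable and function names are illustrative and not part of EOL.

```python
# Counts as reported on the slide; the pairing of action totals with
# curator groups is an assumption based on the slide layout.
curators = {"assistant": 177, "full": 984}
actions = {"assistant": 5816, "full": 223_639}


def actions_per_curator(n_actions: int, n_curators: int) -> int:
    """Mean number of actions per curator, rounded to a whole action."""
    return round(n_actions / n_curators)


per_curator = {
    role: actions_per_curator(actions[role], curators[role])
    for role in curators
}
print(per_curator)
```

Under that assumption the result matches the slide's 33 and 227 actions per curator, so a full curator performed roughly seven times as many actions as an assistant curator on average.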
Quality control case studies
The case of “Panisopis”
1. Rod spots the error on EOL and posts about it on his blog
2. Cyndy reads the blog and posts it as a comment on EOL
3. The EOL comment gets sent to ITIS automatically
4. ITIS fixes its database
5. EOL hasn’t yet updated from ITIS
The case of the Far Side cartoon
Conclusions
• We’ve made a lot of progress
  – Large repository, many subjects
  – Great start on collections/checklists
  – Lots of CC-licenses
  – Lots of international partnerships and interface
    languages
  – Active curators

• We’ve got plenty of room for more
  – Please share your ideas for the future
Thanks to
John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Smithsonian Institution, Marine Biological Laboratory, Harvard University, David Rubenstein, and other funders and donors
All our users, content providers & global partners, especially the Chinese Academy of Sciences
eol.org    @cydparr    @eol

Editor's Notes

  • #2: As you may know, Encyclopedia of Life is a web site providing global access to knowledge about life on earth. Global: the whole world. Access: free, and freely re-usable. Knowledge: synthesized, not raw. Life on Earth: biological diversity.
  • #4: EOL takes information from about 200 sources so far, mostly scientific databases, but also including Flickr and Wikipedia, and automatically sorts it onto taxon pages. Our curators can then trust or untrust it, or anybody can provide comments or ratings. About a thousand credentialed scientists have already volunteered to help with quality control. Actions and comments get fed back to the original providers, and the material on EOL is also available to other applications via an Application Programming Interface, which I’ll talk more about in a moment. We’re partnering with over two hundred scientific databases as well as public contribution sites like Flickr and Wikipedia. 100+ partner databases; 700 curators / 1000s contributors / 46,000 members; 2.8 million pages; 500 thousand pages with Creative Commons content; over 2 million data objects and >1 million pages with links to research literature. Traffic in past year: 1.7 million unique users, 6.2 million page views.
  • #7: These numbers are a bit out of date now.
  • #8: These are only the top subjects; there are many more. Subjects are almost all infoitems from the TDWG Species Profile Model. Multiple topics includes several vague subjects like “Biology”, “TaxonBiology”, and “Description”.
  • #9: These are only checklists that have more than one item.
  • #10: Images are less than half the amount of text (1.37 million). Far fewer examples of videos and sounds, but these are expected to grow.
  • #13: Not going to name names, except to say that the two to the right here with more 5-star ratings are Flickr and Wikipedia, and the ones on the left are the museums and specimens.
  • #15: Full curators have credentials and have more power. Assistant curators do relatively more adding of common names, and are also more likely to add article text. Full curators are the only ones that can trust or untrust (red). Both spend a lot of time rating objects (1 to 5 stars). So far, few full curators are working with classifications.
  • #16: One point: it would have been faster if Rod had just posted the comment directly. Another point: obviously it would be better if EOL regularly updated from ITIS, because it has been four months and we still don’t have the correction on EOL.
  • #17: This was discovered on EOL by a curator. It was a slide in a Smithsonian botanist’s slide deck that an intern scanned and added to the specimen catalog. Luckily it was “identified” as “Undetermined”. Because it was spotted by an EOL curator, the museum was able to remove it from their catalog.