The Natural History Open Data Challenge @ OTA16
Diverse collections spanning
space and time
Challenge of scale:
>80 million specimens!
Challenge of speed
(digitising within a lifetime)
Ambitious digitisation
programme (DCP)
Institutional policy
“open by default” 
The Natural History Open Data Challenge @ OTA16
Higher Classification
Scientific name: Thymelicus lineola (Ochsenheimer, 1808)
Family: Hesperiidae
Location
Locality: Tilbury Docks
State/province: England
Country: United Kingdom
Continent: Europe
Decimal latitude: 51.4605
Decimal longitude: 0.3449
Collection Event
Recorded by: T G. Howarth; Howarth
Collection date: 31 / 07 / 1938
Most iCollections specimens will have ~30 fields containing data
(over 100 different fields across all collections)
There are some issues…
(where is H. M. Edelsten!?)
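
The example record above hints at the clean-up a hackathon team might need before analysis: dates arrive as text, and the "Recorded by" field can hold several name variants. A minimal sketch, assuming the record is loaded as a dictionary of strings; the key names and helper functions here are hypothetical and not the portal's actual field names:

```python
from datetime import date, datetime

# The example record from the slide. Key names are hypothetical.
record = {
    "scientific_name": "Thymelicus lineola (Ochsenheimer, 1808)",
    "family": "Hesperiidae",
    "locality": "Tilbury Docks",
    "decimal_latitude": "51.4605",
    "decimal_longitude": "0.3449",
    "recorded_by": "T G. Howarth; Howarth",
    "collection_date": "31 / 07 / 1938",
}


def parse_collection_date(text: str) -> date:
    """Parse a 'DD / MM / YYYY' string as shown on the slide."""
    return datetime.strptime(text.strip(), "%d / %m / %Y").date()


def split_collectors(text: str) -> list[str]:
    """Split a semicolon-separated 'Recorded by' value into trimmed names."""
    return [name.strip() for name in text.split(";") if name.strip()]


collected_on = parse_collection_date(record["collection_date"])
collectors = split_collectors(record["recorded_by"])
coordinates = (
    float(record["decimal_latitude"]),
    float(record["decimal_longitude"]),
)

print(collected_on, collectors, coordinates)
```

Splitting the names is only a first step: deciding whether "T G. Howarth" and "Howarth" are the same person still needs a matching rule or manual review.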
http://data.nhm.ac.uk
Complete NHM Specimen Dataset (3.3M records)
http://bit.ly/2goEpBB
GitHub Gist – NHM API:
http://bit.ly/2gtukRv
iCollections Datasets
http://bit.ly/2gGZub5
Even more data…
http://www.gbif.org/occurrence
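
The GBIF occurrence data can also be queried programmatically. A minimal sketch, assuming the public GBIF occurrence search endpoint and its `scientificName`, `country` and `limit` query parameters; the function name is hypothetical, and the code only builds the URL without sending a request:

```python
from urllib.parse import urlencode

# Assumed endpoint: GBIF's public occurrence search API.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(scientific_name: str, country: str = "GB", limit: int = 20) -> str:
    """Return a search URL for occurrences of one species in one country."""
    params = {
        "scientificName": scientific_name,
        "country": country,  # ISO 3166-1 alpha-2 country code
        "limit": limit,      # number of records per page
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


url = build_occurrence_query("Thymelicus lineola")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; the response is paged JSON, so larger pulls would need to loop over pages.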
Potential Challenges
How did collecting effort change over time?
Which collector collected from the most distinct localities?
Can we make a ranking table and mash up the data with Wikipedia or other sources?
What can we learn about the collectors – who travelled the furthest or most regularly?
Were most specimens collected in rural areas? Is there collection bias in particular counties?
How can we make the data more attractive to different audiences?
How could we display the data in more engaging or informative ways?
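
Two of the questions above (collecting effort over time, and a ranking of collectors by distinct localities) can be sketched in a few lines. This is one possible approach under stated assumptions, not the challenge's intended method: records are reduced to (collector, locality, year) tuples, and every row except the first is invented purely for illustration.

```python
from collections import Counter, defaultdict

# (collector, locality, year). Only the first row comes from the slide;
# the other rows are invented placeholders for illustration.
records = [
    ("T G. Howarth", "Tilbury Docks", 1938),
    ("Collector A", "Locality 1", 1938),
    ("Collector A", "Locality 2", 1939),
    ("Collector A", "Locality 2", 1939),
    ("Collector B", "Locality 1", 1950),
]


def specimens_per_decade(rows):
    """Count specimens by decade as a rough proxy for collecting effort."""
    return Counter((year // 10) * 10 for _, _, year in rows)


def rank_by_distinct_localities(rows):
    """Rank collectors by number of distinct localities, ties broken by name."""
    localities = defaultdict(set)
    for collector, locality, _ in rows:
        localities[collector].add(locality)
    ranking = [(collector, len(places)) for collector, places in localities.items()]
    return sorted(ranking, key=lambda item: (-item[1], item[0]))


effort = specimens_per_decade(records)
ranking = rank_by_distinct_localities(records)

print(dict(effort))
print(ranking)
```

On real data the ranking is only as good as the collector names, so name variants would need to be reconciled first.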

Editor's Notes

  • #2: Thanks for joining us today – to my knowledge this is the Natural History Museum’s first hackathon based on its specimen data!
  • #3: 80M awesome objects - aim to get them done in under a lifetime. Default policy is openness - data and images going on the portal. Hopefully 3D at some point!