Biodiversity Informatics
Dag Endresen | GBIF Node Manager for Norway
Lecture | NTNU, Trondheim, Norway | 27th January 2021
Global Biodiversity Information Facility
Illustration by Harry Potter Bygett (NTNU) Wiki Commons CC BY 2.0
WHY SHOULD STUDENTS LEARN OPEN SCIENCE?
WHY TEACH STUDENTS OPEN SCIENCE?
• We are in the middle of an ongoing paradigm shift in scientific practice (and impact metrics).
• The open science wave is moving fast!
• Young scientists will need different skills than were previously needed to succeed in academia.
• Researchers will need to develop different approaches than they needed in the past to remain relevant.
• Society is quickly gaining Big Data maturity and will expect new services from biodiversity information and research.
DATA CITATION AS A NEW CURRENCY OF SCIENCE
● Peer-reviewed scholarly papers in high impact journals
maintain considerable weight for impact metrics.
● A movement is under way to build similar status for
open data, open metadata, open material samples, and
other open scientific research products…
DECLARATION ON RESEARCH ASSESSMENT
● The Declaration on Research Assessment (DORA) recognizes the
need to improve the ways in which the outputs of scholarly research
are evaluated. Developed in 2012 in San Francisco.
● It has become a worldwide initiative covering all scholarly
disciplines and all key stakeholders including funders, publishers,
professional societies, institutions, and researchers.
● DORA’s vision is to advance practical and robust approaches to
research assessment globally and across all scholarly disciplines.
● The Research Council of Norway (RCN) signed DORA (May 2018)
To date (2021-01-27), 16 873 individuals and 2 139 organizations in 144 countries have signed DORA.
DATA CITATION PRINCIPLES
1. Data should be considered legitimate, citable products of research.
2. Data citations should give scholarly credit and attribution.
3. In scholarly literature, whenever claims are based on data, the data should always be cited.
4. A persistent method for identifying data that is machine-actionable, globally unique, and universal.
5. Data citations should facilitate access to the data, or at least to the metadata.
6. Unique identifiers should persist even beyond the lifespan of the data.
7. Data citations should identify and give access to the specific data that support verification of the claim (provenance, time-slice, version).
8. Citation methods should be flexible, but with attention to interoperability of practices across communities.
Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014; https://doi.org/10.25490/a97f-egyk
FAIR DATA PRINCIPLES
… researchers need to do more than simply post
their data on the web for it to be re-usable.
WHAT IS FAIR DATA?
● FINDABLE → Data and supplementary materials have sufficiently rich
metadata and a unique and persistent identifier.
● ACCESSIBLE → Metadata and data are understandable to humans and
machines. Data are deposited in a trusted repository.
● INTEROPERABLE → Metadata use a formal, accessible, shared, and
broadly applicable language for knowledge representation.
● REUSABLE → Data and collections have clear usage licenses and
provide accurate information on provenance.
● https://www.go-fair.org/
FAIR data is about machine-readable data …
Novel possibilities… (for novel curiosity-driven research)
[Figure labels: Open science; Traditional science; Young researcher]
CAREER OPPORTUNITIES
● Skills for open research and open data
are in increasing demand!
● Enables new research methodologies
that were not possible before.
● Bring benefits for your career as a
(young) researcher.
The dark side
REPRODUCIBILITY CRISIS
"Scientific irreproducibility —
the inability to repeat others'
experiments and reach the same
conclusion" (Nature 2016)
Baker (2016) 1,500 scientists lift the lid on reproducibility. Nature. doi:10.1038/533452a
Scientific irreproducibility
is a growing concern.
Open Science solution: researchers share
their methods, data, computer code
and results in central data repositories.
For physical real-world samples we also
need herbarium specimens and
bio-repositories (museums).
WILL ANYBODY TRUST CLOSED SCIENCE AGAIN?
● Studies (1,2) indicate that p-hacking
is a significant problem, sometimes
without the scientist even being
aware of doing it.
● Pre-registered (open) data provide
good insurance against suspicion of
both data dredging and plain data
falsification.
● “p-hacking” occurs when researchers collect or
select data or statistical analyses until nonsignificant
results become significant (data fishing, …)
(1) Head et al. (2015) The Extent and Consequences of P-Hacking in Science . PLoS Biol. doi:10.1371/journal.pbio.1002106
(2) Ioannidis (2005). "Why Most Published Research Findings Are False". PLoS Medicine. doi:10.1371/journal.pmed.0020124.
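To make the p-hacking definition above concrete, here is a minimal simulation sketch. It is not taken from the cited papers. It assumes a simple two-sided z-test on data drawn from a true null effect, and compares a fixed sample size with "optional stopping", where the analyst checks the p-value after every batch and stops as soon as p < 0.05. The function names, batch size and number of simulated studies are illustrative assumptions.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided z-test p-value for mean == 0, assuming known unit variance."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def honest_study(rng, n=100):
    """Collect all n observations first, then test exactly once."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return two_sided_p(sample) < 0.05


def peeking_study(rng, batch=10, max_n=100):
    """Test after every batch and stop at the first 'significant' result."""
    sample = []
    while len(sample) < max_n:
        sample.extend(rng.gauss(0.0, 1.0) for _ in range(batch))
        if two_sided_p(sample) < 0.05:
            return True
    return False


def false_positive_rate(study, n_studies=2000, seed=1):
    """Share of simulated null studies that report a 'significant' effect."""
    rng = random.Random(seed)
    return sum(study(rng) for _ in range(n_studies)) / n_studies


if __name__ == "__main__":
    print("fixed sample size:", false_positive_rate(honest_study))
    print("optional stopping:", false_positive_rate(peeking_study))
```

Because there is no real effect in the simulated data, every "significant" result is a false positive. The fixed-sample design should stay close to the nominal 5%, while repeated peeking pushes the rate well above it. A pre-registered analysis plan rules out exactly this kind of flexibility.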
TOOLS FOR REUSE OF RESEARCH DATA
DATA MANAGEMENT PLAN (DMP)
• A formal document that describes how data are to
be handled during a research project, and after the
project is completed.
• The goal is to plan data management before the
project begins.
• Including a plan for the costs of data management
and archiving.
• This saves time in the long run and promotes data
fitness for reuse.
• Reduce duplication of existing scientific studies.
• Reduce the loss of data.
https://innsida.ntnu.no/wiki/-/wiki/English/Data+management+plan
Illustration CC BY Jørgen Stamp
WHAT IS METADATA?
• Metadata, literally “data about
data”, are an essential component
of a data management system,
describing such aspects as
the “what, where, when, who
and how” pertaining to a
resource.
METADATA SHOULD ALLOW A
USER OF THE DATA TO
• Identify & discover its existence
• Learn how to access or acquire the data
• Understand its fitness-for-use
• Learn how to obtain a copy of the data
• Learn how the data should be used
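As a rough illustration of the "what, where, when, who and how" idea, the sketch below models a metadata record as a small Python data class and reports which of those questions are still unanswered. The class, field names and example values are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetMetadata:
    """Hypothetical minimal record; every field answers one question."""
    what: str = ""   # title or description of the data
    where: str = ""  # geographic scope
    when: str = ""   # temporal scope
    who: str = ""    # creator or contact
    how: str = ""    # collection method or protocol


def missing_answers(record):
    """Return the names of the questions the record leaves unanswered."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


record = DatasetMetadata(
    what="Vascular plant specimens (example)",
    where="Norway",
    when="1900/2020",
    who="Example Herbarium",
)
print(missing_answers(record))  # the 'how' question is still unanswered
```

A check like this could run before a dataset is deposited, so that a user can judge fitness-for-use without contacting the data owner.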
METADATA REDUCES
“DATA ENTROPY”
Illustration from: The Loss of Information about Data (Metadata) Over Time, Michener et al, 1997
WHAT IS A DATA PAPER?
• A data paper is a peer reviewed document
describing a dataset, published in a peer
reviewed journal.
• It takes effort to prepare, curate and
describe data.
• Data papers provide recognition for this
effort by means of a scholarly article.
• Getting scholarly recognition for improving
the fitness for reuse for your own datasets.
• Getting scholarly recognition for preparing
and making legacy research data available
for reuse.
https://www.gbif.org/data-papers
WHAT IS GBIF?
• Intergovernmental network and research infrastructure
• Provides anyone, anywhere, free and open access to data about all types of life on Earth
• Voluntary collaboration through Memorandum of Understanding (MoU)
• Participant nodes, Secretariat in Copenhagen, Denmark
https://www.gbif.org
GBIF PARTICIPANT COUNTRIES
https://www.gbif.org/the-gbif-network
41 Voting Participants
21 Associate Countries
39 Other Associate
1704 Data publishers
A WINDOW ON EVIDENCE ABOUT WHERE SPECIES HAVE LIVED, AND WHEN
https://www.gbif.org/occurrence/search
Data sources: digitized specimens; observations; literature; remote sensing; environmental DNA
Common standards (DwC)
Data publishing and indexing
Data discovery and use
GBIF: MULTIPLE-PURPOSE DATA PUBLISHING SERVICES
[Diagram labels: Darwin Core; research data portals; portal; bio-collections & ecology datasets; Norwegian Red List; EU Directive reporting (proposed workflow)]
BY THE NUMBERS | 27 JANUARY 2021
62 Country Participants
39 Organizational Participants
5 373 Peer-reviewed papers using data
1 650 849 882 Species occurrence records
55 902 Datasets
1 634 Publishers
23.6 billion Average records downloaded per month (2020)
BY THE NUMBERS | 27 JANUARY 2021 -- NORWAY
124 Peer-reviewed papers using data (co-author from Norway)
40 894 819 Species occurrence records (published from Norway)
301 Datasets (published from Norway)
38 Publishers (from Norway)
DATA TRENDS ON GBIF.org
https://www.gbif.org/analytics/global
[Chart: % specimens]
DATA RICHNESS LEVELS SUPPORTED BY GBIF
https://www.gbif.org/dataset-classes
• Dataset metadata (M): dataset description, taxonomic/geographic/temporal scope
• Species checklists (C): list of taxa, regional or thematic (e.g. invasive, medicinal)
• Sampling-event data (SE): species occurrences and sampling events; dates, coordinates, sampling effort / protocol, abundance
• Occurrence-only data (O): species occurrences; dates, coordinates, basis of record
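The difference between the occurrence-only and sampling-event levels can be illustrated with a small record check. The sketch below uses Darwin Core term names (`occurrenceID`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `basisOfRecord`, `eventID`, `samplingProtocol`, `samplingEffort`) as dictionary keys. The grouping of terms per level, the helper function and the example values are assumptions made for illustration; they are not GBIF's actual validation rules.

```python
# Assumed term sets per richness level (illustrative, not an official profile).
OCCURRENCE_TERMS = {
    "occurrenceID", "scientificName", "eventDate",
    "decimalLatitude", "decimalLongitude", "basisOfRecord",
}
SAMPLING_EVENT_TERMS = OCCURRENCE_TERMS | {
    "eventID", "samplingProtocol", "samplingEffort",
}


def richness_level(record):
    """Classify a record as 'SE', 'O' or 'incomplete' by the terms it fills in."""
    present = {term for term, value in record.items() if value not in (None, "")}
    if SAMPLING_EVENT_TERMS <= present:
        return "SE"
    if OCCURRENCE_TERMS <= present:
        return "O"
    return "incomplete"


# Hypothetical example record (identifier and values are made up).
observation = {
    "occurrenceID": "example:occurrence:1",
    "scientificName": "Betula pubescens Ehrh.",
    "eventDate": "2020-06-15",
    "decimalLatitude": 63.43,
    "decimalLongitude": 10.40,
    "basisOfRecord": "HumanObservation",
}
print(richness_level(observation))
```

Adding `eventID`, `samplingProtocol` and `samplingEffort` to the same record would move it to the sampling-event level in this sketch, which mirrors how richer datasets build on the occurrence-only core.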
SPECIES OCCURRENCE RECORDS
WITH MULTIMEDIA EVIDENCE
27th January 2021
75 million records with taxonomically
identified images (1.8 million from Norway)
• 41.1 million specimens (Norway: 884 763)
• 31.3 million human observations (Norway: 923 763)
• 1.4 million material samples (Norway: 38 386)
685 806 audio files (Norway: 3 566)
2 825 videos (Norway: 4)
https://www.gbif.org/occurrence/gallery
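Counts like those above can in principle be retrieved from the public GBIF occurrence search API. The sketch below only builds the request URL and does not contact the network. It assumes the `https://api.gbif.org/v1/occurrence/search` endpoint and its `country`, `mediaType`, `basisOfRecord` and `limit` query parameters; check the GBIF API documentation for current parameter names and values before relying on it. The helper name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of the GBIF occurrence search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def occurrence_search_url(country=None, media_type=None, basis_of_record=None, limit=0):
    """Build a search URL; limit=0 asks for no records, only the summary."""
    params = {}
    if country:
        params["country"] = country
    if media_type:
        params["mediaType"] = media_type
    if basis_of_record:
        params["basisOfRecord"] = basis_of_record
    params["limit"] = limit
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


# Specimens from Norway that have still images attached.
url = occurrence_search_url(
    country="NO",
    media_type="StillImage",
    basis_of_record="PRESERVED_SPECIMEN",
)
print(url)
```

Fetching this URL with any HTTP client should return a JSON document whose total is expected in a `count` field. The number will differ from the slide because the index keeps growing.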
SOURCES OF DATA IN GBIF: CITIZEN SCIENCE OBSERVATIONS
SOURCES OF DATA IN GBIF: DIGITIZED SPECIMENS FROM MUSEUM COLLECTIONS
SOURCES OF DATA IN GBIF: TAXONOMIC LITERATURE, OLD AND NEW
Data liberation
GLOBAL BIODIVERSITY VS. DIGITALLY AVAILABLE DATA
Image: FL Fawcett in Wheeler, Ann. Entomol. Soc. Am. 1990
Troudet et al., Nature Scientific Reports 2017
1200 million animals; 300 million plants; 20 million fungi; 16 million bacteria; 0.04 million viruses
LATIN NAMES ARE RULED BY THE CODES
Domain (Eukarya)
Kingdom (Animalia)
Phylum (Chordata)
Class (Mammalia)
Order (Primates)
Family (Hominidae)
Genus (Homo)
Species (Homo sapiens)
Codes: PhyloCode, ICN, ICZN
OTU = Operational Taxonomic Unit
OTU = SH, Species hypothesis number [DOI] (label in figure: SH ABC0001)
OTU = BIN, Barcode Index Number (label in figure: BIN DEF0002)
GBIF backbone taxonomy
Species DNA Barcode
SOURCES OF DATA IN GBIF: DNA SEQUENCE-DERIVED OCCURRENCE DATA
MGnify -- https://www.gbif.org/publisher/ab733144-7043-4e88-bd4f-fca7bf858880
NEW GBIF GUIDE: PUBLISHING SEQUENCE-DERIVED DATA THROUGH BIODIVERSITY DISCOVERY PLATFORMS
• Authors from Australia, Norway, Sweden, Denmark, UNITE, and GBIF
• Based on practical mapping and data publishing experiences
• Cross-platform
• A “cookbook” about 40 pages long:
  - Introduction – refresh your “data culinary” knowledge
  - Categorization – what “data ingredients” do you have to publish?
  - Mapping – choose and follow the “recipe”
  - Visuals – clarity and guidelines
  - Future prospects
  - Resources: glossary, links, references
Based on Darwin Core and MIxS data standards
https://doi.org/10.35035/doc-vf1a-nr22
POLICY LINKS
International treaties – national progress based on national GBIF data publication
POLICY LINKS: AICHI TARGETS
- Trend in invasive alien species introductions (through Global Register of Introduced and Invasive Species)
- Species Protection Index
- Protected Area Representativeness Index
- Comprehensiveness of conservation of socioeconomically/culturally valuable species
- Agrobiodiversity Index
- Crop Wild Relative Index
- Growth in species occurrence records accessible through GBIF
- Species Status Information Index
https://www.cbd.int/cooperation/csp/gbif.shtml | https://www.cbd.int/csp/survey/GBIF.pdf
A DATA RESOURCE TO SUPPORT RESEARCH AND SUSTAINABLE DEVELOPMENT
Conservation
- Protected areas
- Threatened species
- Invasive species risk
Food Security
- Crop wild relatives
- In situ, ex situ
conservation of
genetic diversity
- Fisheries planning
Climate change
- Modelling impacts on
species ranges
- Adaptation strategies
- Mitigation benefits,
risks
Human health
- Disease risk based on
occurrence of vectors,
hosts, reservoirs
- Medicinal plants
- Hazards e.g. snakebite
https://www.gbif.org/science-review
PEER-REVIEWED PUBLICATIONS USING GBIF-MEDIATED DATA September 2020
https://www.gbif.org/resource/search?contentType=literature&literatureType=journal&relevance=GBIF_USED&peerReview=true
Annual total of peer-reviewed publications (chart data labels):
Year   Papers
2008   52
2009   89
2010   148
2011   169
2012   229
2013   249
2014   350
2015   407
2016   428
2017   696
2018   676
2019   743
2020   938 (projection); 626 year-to-date
~ 2-3 papers a day
#CiteTheDOI
DATA USE IN PEER-REVIEWED JOURNALS
https://www.gbif.org/resource/search?contentType=literature&year=2020&literatureType=journal&relevance=GBIF_USED&countriesOfResearcher=NO&peerReview=true
GBIF-mediated data use citations   2020   2019
1   United States    260   215
2   China            192   175
3   Brazil           119   95
4   United Kingdom   107   89
5   Mexico           92    64
6   Germany          79    63
7   Spain            77    51
8   Canada           63    44
9   Australia        55    53
10  France           52    48
--  Norway           20    19
[Bar chart: Peer-reviewed uses by region, 2020 and 2019. Regions: Oceania, Africa, Asia, North America, Latin America, Europe. Data labels as extracted: 78, 58, 220, 261, 255, 597 and 47, 54, 176, 182, 214, 398. Axis 0 to 800.]
HOW TO CITE DATA MEDIATED BY GBIF
1. Download data from GBIF.org
2. Receive a recommended citation with a download DOI
3. Cite the DOI in published research or other work
Example: GBIF.org (12 October 2020) GBIF Occurrence Download https://doi.org/10.15468/dl.xxxxxx
https://www.gbif.org/citation-guidelines
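The recommended citation in the example above follows a fixed pattern, so it can be assembled from the download date and the download DOI. The sketch below is an illustration only: the function name is hypothetical, the pattern is copied from the example on this slide, and the DOI suffix `dl.xxxxxx` is the slide's own placeholder rather than a real download.

```python
from datetime import date

# English month names are spelled out to avoid locale-dependent formatting.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def gbif_download_citation(download_date, doi):
    """Format a citation string following the pattern shown on the slide."""
    when = f"{download_date.day} {MONTHS[download_date.month - 1]} {download_date.year}"
    return f"GBIF.org ({when}) GBIF Occurrence Download https://doi.org/{doi}"


# Placeholder DOI taken from the slide, not a resolvable identifier.
print(gbif_download_citation(date(2020, 10, 12), "10.15468/dl.xxxxxx"))
```

In practice GBIF supplies the citation text together with the download, so a helper like this is mainly useful for checking that the DOI has been carried through into a manuscript or script header.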
WHY CITE DATA?
• Good academic practice for transparent and reproducible research
• Credit institutions who shared data and supported your research
• Help data publishing institutions to demonstrate value of digitization
and data publication through research
• Correct citation encourages data sharing
• Data accessed through GBIF is free for all – but not free of
obligations: see the user agreement
https://www.gbif.org/citation-guidelines
Source datasets #1, #2, #3 → Filter → GBIF download → Process → Archive (final state of data) → Analyze & publish
Identifiers along the chain: Dataset DOIs, Download DOI, Archive DOI, Bibliographic DOI
(possible with persistent object identifiers)
DOI BASED DATA CITATION AT GBIF.ORG
NTNU Vascular plants: https://doi.org/10.15468/zrlqok
[Screenshot labels: citations; papers; dataset]
THE RESEARCH DATA LIFECYCLE
https://library.sydney.edu.au/research/data-management/research-data-management.html
GBIF.org
THANK YOU
www.gbif.org
Dag Endresen | GBIF Norway
helpdesk@gbif.no

2021-01-27--biodiversity-informatics-gbif-(52slides)

  • 1. Biodiversity Informatics Dag Endresen | GBIF Node Manager for Norway Lecture | NTNU, Trondheim, Norway | 27th January 2021 Global Biodiversity Information Facility Illustration by Harry Potter Bygett (NTNU) Wiki Commons CC BY 2.0
  • 2. WHY SHOULD STUDENTS LEARN OPEN SCIENCE?
  • 3. WHY TEACH STUDENTS OPEN SCIENCE? v We are in the middle of an ongoing paradigm shift in scientific practice (and impact metrics). v The open science wave is moving fast! v Young scientists will need different skills, than was needed previously to succeed in academia. v Researchers will need to develop different approaches, than they needed in the past – to remain relevant. v Society is quickly gaining Big Data maturity and will expect new services from biodiversity information and research.
  • 4. DATA CITATION AS A NEW CURRENCY OF SCIENCE ● Peer-reviewed scholarly papers in high impact journals maintain considerable weight for impact metrics. ● A movement is under way to build similar status for open data, open metadata, open material samples, and other open scientific research products…
  • 5. DECLARATION ON RESEARCH ASSESSMENT ● The Declaration on Research Assessment (DORA) recognizes the need to improve the ways in which the outputs of scholarly research are evaluated. Developed in 2012 in San Francisco. ● It has become a worldwide initiative covering all scholarly disciplines and all key stakeholders including funders, publishers, professional societies, institutions, and researchers. ● DORA’s vision is to advance practical and robust approaches to research assessment globally and across all scholarly disciplines. ● The Research Council of Norway (RCN) signed DORA (May 2018) To date (2021-01-27), 16 873 individuals and 2 139 organizations in 144 countries have signed DORA.
  • 6. DATA CITATION PRINCIPLES 1. Data to be legitimate citable products of research. 2. Data citations giving scholarly credit and attribution. 3. In scholarly literature, whenever claims are based on data, data should always be cited. 4. Persistent method for identification of data, that is machine actionable, globally unique, universal. 5. Data citation facilitate access to data or at least to metadata. 6. Unique identifiers that persist even beyond the lifespan of the data. 7. Data citation identify and access the specific data that support verification of the claim (provenance, time-slice, version). 8. Flexible, but attention to interoperability of practices across communities. Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014; https://doi.org/10.25490/a97f-egyk
  • 7. FAIR DATA PRINCIPLES … researchers need to do more than simply post their data on the web for it to be re-usable.
  • 8. WHAT IS FAIR DATA? ● FINDABLE à Data and supplementary materials have sufficiently rich metadata and a unique and persistent identifier. ● ACCESSIBLE à Metadata and data are understandable to humans and machines. Data is deposited in a trusted repository. ● INTEROPERABLE à Metadata use a formal, accessible, shared, and broadly applicable language for knowledge representation. ● REUSABLE à Data and collections have a clear usage licenses and provide accurate information on provenance. ● https://www.go-fair.org/
  • 9. FAIR data is about machine-readable data …
  • 10. Novel possibilities… (for novel curiosity-driven research) Open science Traditional science Young Researcher
  • 11. CAREER OPPORTUNITIES ● Skills for open research and open data are in increasing demand! ● Enables new research methodologies that were not possible before. ● Bring benefits for your career as a (young) researcher.
  • 13. REPRODUCIBILITY CRISIS "Scientific irreproducibility — the inability to repeat others' experiments and reach the same conclusion” (Nature 2016) Baker (2016) 1,500 scientists lift the lid on reproducibility. Nature. doi:10.1038/533452a
  • 14. Baker (2016) 1,500 scientists lift the lid on reproducibility. Nature. doi:10.1038/533452a Scientific irreproducibility is a growing concern. Open Science solution: researchers to share their methods, data, computer code and results in central data repositories. For physical real-world samples we also need herbarium specimen and bio- repositories (museums).
  • 15. WILL ANYBODY TRUST CLOSED SCIENCE AGAIN? ● Studies (1,2) indicates that p-hacking is a significant problem – sometimes even without the scientist even being aware of doing so. ● Pre-registered (open) data provides a good insurance against suspicion of both data dredging (and plain data falsification). ● “p-hacking” = occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant (data fishing, …) (1) Head et al. (2015) The Extent and Consequences of P-Hacking in Science . PLoS Biol. doi:10.1371/journal.pbio.1002106 (2) Ioannidis (2005). "Why Most Published Research Findings Are False". PLoS Medicine. doi:10.1371/journal.pmed.0020124.
  • 16. TOOLS FOR REUSE OF RESEARCH DATA
  • 17. DATA MANAGEMENT PLAN (DMP) • A formal document that describes how data are to be handled during a research project, and after the project is completed. • The goal is to plan data management before the project begins. • Including a plan for the costs of data management and archiving. • This saves time in the long run and promotes data fitness for reuse. • Reduce duplication of existing scientific studies. • Reduce the loss of data. https://innsida.ntnu.no/wiki/-/wiki/English/Data+management+plan Illustration CC BY Jørgen Stamp
  • 18. WHAT IS METADATA? • Metadata, literally means “data about data” are an essential component of a data management system, describing such aspects as the “what, where, when, who and how” pertaining to a resource.
  • 19. METADATA SHOULD ALLOW A USER OF THE DATA TO • Identify & discover its existence • Learn how to access or acquire the data • Understand its fitness-for-use • Learn how to obtain a copy of the data • Learn how the data should be used
  • 20. METADATA REDUCES “DATA ENTROPY” Illustration from: The Loss of Information about Data (Metadata) Over Time, Michener et al, 1997
  • 21. WHAT IS A DATA PAPER? • A data paper is a peer reviewed document describing a dataset, published in a peer reviewed journal. • It takes effort to prepare, curate and describe data. • Data papers provide recognition for this effort by means of a scholarly article. • Getting scholarly recognition for improving the fitness for reuse for your own datasets. • Getting scholarly recognition for preparing and making legacy research data available for reuse. https://www.gbif.org/data-papers
  • 23. Intergovernmental network and research infrastructure Provides anyone, anywhere, free and open access to data about all types of life on Earth Voluntary collaboration through Memorandum of Understanding (MoU) Participant nodes, Secretariat in Copenhagen, Denmark WHAT IS GBIF? https://www.gbif.org
  • 24. GBIF PARTICIPANT COUNTRIES https://www.gbif.org/the-gbif-network 41 Voting Participants 21 Associate Countries 39 Other Associate 1704 Data publishers
  • 25. A WINDOW ON EVIDENCE ABOUT WHERE SPECIES HAVE LIVED, AND WHEN https://www.gbif.org/occurrence/search Digitized specimens Observations Literature Remote-sensing Environmental DNA Common standards (DwC) Data publishing and indexing Data discovery and use
  • 26. Darwin Core Research data portals GBIF: MULTIPLE-PURPOSE DATA PUBLISHING SERVICES portal Bio-Collections & ecology datasets Norwegian Red List EU Directive reporting (proposed workflow)
  • 27. BY THE NUMBERS | 27 JANUARY 2021 62 Country Participants 39 Organizational Participants 5 373 Peer-review papers using data 1 650 849 882 Species occurrence records 55 902 Datasets 1 634 Publishers 23.6 billion Average records downloaded per month (2020)
  • 28. BY THE NUMBERS | 27 JANUARY 2021 -- NORWAY 124 Peer-review papers using data (co-author from Norway 40 894 819 Species occurrence records (published from) 301 Datasets (published from) 38 Publishers (from Norway)
  • 29. DATA TRENDS ON GBIF.org https://www.gbif.org/analytics/global % specimens
  • 30. DATA RICHNESS LEVELS SUPPORTED BY GBIF https://www.gbif.org/dataset-classes Dataset description, taxonomic/geographic/temporal scope Dataset metadata M List of taxa regional or thematic (e.g. invasive, medicinal) Species checklists C Species occurrences and sampling events dates, coordinates, sampling effort / protocol, abundance Sampling-event data SE Species occurrences dates, coordinates, basis of record Occurrence-only data O
  • 31. SPECIES OCCURRENCE RECORDS WITH MULTIMEDIA EVIDENCE (27 January 2021). 75 million records with taxonomically identified images (1.8 million from Norway): 41.1 million specimens (Norway: 884 763); 31.3 million human observations (Norway: 923 763); 1.4 million material samples (Norway: 38 386). 685 806 audio files (Norway: 3 566). 2 825 videos (Norway: 4). https://www.gbif.org/occurrence/gallery
  • 32. SOURCES OF DATA IN GBIF: CITIZEN SCIENCE OBSERVATIONS
  • 33. SOURCES OF DATA IN GBIF: DIGITIZED SPECIMENS FROM MUSEUM COLLECTIONS
  • 34. SOURCES OF DATA IN GBIF: TAXONOMIC LITERATURE, OLD AND NEW Data liberation
  • 35. GLOBAL BIODIVERSITY VS. DIGITALLY AVAILABLE DATA. Image: FL Fawcett in Wheeler, Ann. Entomol. Soc. Am. 1990; Troudet et al., Nature Scientific Reports 2017. 1200 million animals; 300 million plants; 20 million fungi; 16 million bacteria; 0.04 million viruses
  • 36. LATIN NAMES ARE RULED BY THE CODES. Domain (Eukarya); Kingdom (Animalia); Phylum (Chordata); Class (Mammalia); Order (Primates); Family (Hominidae); Genus (Homo); Species (Homo sapiens). Codes: PhyloCode; ICN; ICZN
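The rank hierarchy on this slide maps naturally onto an ordered list of ranks and a rank-to-name mapping. The following is a minimal sketch using the slide's own example. The names `RANKS`, `HOMO_SAPIENS`, and `lineage` are hypothetical.

```python
# Ranks in the order shown on the slide, from highest to lowest.
RANKS = ["domain", "kingdom", "phylum", "class", "order", "family", "genus", "species"]

# Classification of the slide's example species.
HOMO_SAPIENS = dict(zip(RANKS, [
    "Eukarya", "Animalia", "Chordata", "Mammalia",
    "Primates", "Hominidae", "Homo", "Homo sapiens",
]))


def lineage(classification, down_to):
    """Return the names from the highest rank down to `down_to`, in rank order."""
    cutoff = RANKS.index(down_to) + 1
    return [classification[rank] for rank in RANKS[:cutoff]]


print(" > ".join(lineage(HOMO_SAPIENS, "family")))
```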
  • 37. OTU = Operational Taxonomic Unit. OTU = SH, Species Hypothesis numbers [DOI]; OTU = BIN, Barcode Index Number. Diagram labels: GBIF backbone taxonomy; Species; DNA barcode; BIN DEF0002; SH ABC0001
  • 38. SOURCES OF DATA IN GBIF: DNA SEQUENCE-DERIVED OCCURRENCE DATA MGnify -- https://www.gbif.org/publisher/ab733144-7043-4e88-bd4f-fca7bf858880
  • 39. NEW GBIF GUIDE: PUBLISHING SEQUENCE-DERIVED DATA THROUGH BIODIVERSITY DISCOVERY PLATFORMS • Authors from Australia, Norway, Sweden, Denmark, UNITE, and GBIF • Based on practical mapping and data publishing experiences • Cross-platform • A “cookbook” of about 40 pages. Contents: Introduction (refresh your “data culinary” knowledge); Categorization (which “data ingredients” do you have to publish?); Mapping (choose and follow the “recipe”); Visuals (clarity and guidelines); Future prospects; Resources (glossary, links, references). Based on Darwin Core and MIxS data standards https://doi.org/10.35035/doc-vf1a-nr22
  • 40. POLICY LINKS International treaties – national progress based on national GBIF data publication
  • 41. POLICY LINKS: AICHI TARGETS. Trend in invasive alien species introductions (through Global Register of Introduced and Invasive Species); Species Protection Index; Protected Area Representativeness Index; Comprehensiveness of conservation of socioeconomically/culturally valuable species; Agrobiodiversity Index; Crop Wild Relative Index; Growth in species occurrence records accessible through GBIF; Species Status Information Index. https://www.cbd.int/cooperation/csp/gbif.shtml | https://www.cbd.int/csp/survey/GBIF.pdf
  • 42. A DATA RESOURCE TO SUPPORT RESEARCH AND SUSTAINABLE DEVELOPMENT. Conservation: protected areas; threatened species; invasive species risk. Food security: crop wild relatives; in situ, ex situ conservation of genetic diversity; fisheries planning. Climate change: modelling impacts on species ranges; adaptation strategies; mitigation benefits, risks. Human health: disease risk based on occurrence of vectors, hosts, reservoirs; medicinal plants; hazards, e.g. snakebite. https://www.gbif.org/science-review
  • 43. PEER-REVIEWED PUBLICATIONS USING GBIF-MEDIATED DATA (September 2020). About 2-3 papers a day. #CiteTheDOI https://www.gbif.org/resource/search?contentType=literature&literatureType=journal&relevance=GBIF_USED&peerReview=true [Bar chart: annual total of papers per year, 2008 to 2020, with year-to-date and projected values for 2020. Bar labels as extracted: 626 52 89 148 169 229 249 350 407 428 696 676 743 938]
  • 44. DATA USE IN PEER-REVIEWED JOURNALS https://www.gbif.org/resource/search?contentType=literature&year=2020&literatureType=journal&relevance=GBIF_USED&countriesOfResearcher=NO&peerReview=true GBIF-mediated data use citations (rank, country, 2020, 2019): 1 United States 260, 215; 2 China 192, 175; 3 Brazil 119, 95; 4 United Kingdom 107, 89; 5 Mexico 92, 64; 6 Germany 79, 63; 7 Spain 77, 51; 8 Canada 63, 44; 9 Australia 55, 53; 10 France 52, 48; -- Norway 20, 19. [Bar chart: peer-reviewed uses by region, series 2020 and 2019; regions Oceania, Africa, Asia, North America, Latin America, Europe. Bar labels as extracted: 78 58 220 261 255 597 and 47 54 176 182 214 398]
  • 45. HOW TO CITE DATA MEDIATED BY GBIF 1. Download data from GBIF.org 2. Receive a recommended citation with a download DOI 3. Cite the DOI in published research or other work. Example: GBIF.org (12 October 2020) GBIF Occurrence Download https://doi.org/10.15468/dl.xxxxxx https://www.gbif.org/citation-guidelines #CiteTheDOI
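The citation pattern in the example on this slide can be assembled from a download date and a download DOI. The following is a minimal sketch, assuming the pattern shown on the slide is the one wanted. The function name `download_citation` is hypothetical, and the DOI is the slide's own placeholder, not a real download.

```python
from datetime import date

# English month names, spelled out so the result does not depend on the locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def download_citation(download_date, doi):
    """Format a citation string following the example on the slide."""
    day_text = f"{download_date.day} {MONTHS[download_date.month - 1]} {download_date.year}"
    return f"GBIF.org ({day_text}) GBIF Occurrence Download https://doi.org/{doi}"


# Placeholder DOI copied from the slide; replace it with the DOI issued for the download.
citation = download_citation(date(2020, 10, 12), "10.15468/dl.xxxxxx")
print(citation)
```

In practice GBIF.org supplies the recommended citation with each download, so a helper like this is mainly useful for checking that a manuscript's citation follows the same pattern.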
  • 46. WHY CITE DATA? • Good academic practice for transparent and reproducible research • Credit institutions who shared data and supported your research • Help data publishing institutions to demonstrate the value of digitization and data publication through research • Correct citation encourages data sharing • Data accessed through GBIF is free for all, but not free of obligations: see the user agreement https://www.gbif.org/citation-guidelines #CiteTheDOI
  • 47. Workflow diagram: source datasets #1, #2, #3 (dataset DOIs); filter; GBIF download (download DOI); process; archive final state of data (archive DOI); analyze & publish (bibliographic DOI)
  • 49. Workflow diagram: source datasets #1, #2, #3 (dataset DOIs); filter; GBIF download (download DOI); process; archive final state of data (archive DOI); analyze & publish (bibliographic DOI). Tracing back through the chain is possible with persistent object identifiers
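The chain of identifiers in this diagram can be modelled as a small record that links a publication back to its source datasets. The following is a minimal sketch, assuming one identifier per stage as labelled on the slide. The class name `CitationChain`, the method `trace_back`, and all identifier strings are hypothetical placeholders, not real DOIs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CitationChain:
    """Identifiers collected along the workflow, from source data to paper."""
    dataset_dois: List[str]   # one DOI per source dataset
    download_doi: str         # issued for the filtered GBIF download
    archive_doi: str          # issued for the archived final state of the data
    bibliographic_doi: str    # issued for the published paper

    def trace_back(self):
        """List identifiers from the paper back to the source datasets."""
        return [self.bibliographic_doi, self.archive_doi,
                self.download_doi, *self.dataset_dois]


# Placeholder identifiers for illustration only.
chain = CitationChain(
    dataset_dois=["dataset-doi-1", "dataset-doi-2", "dataset-doi-3"],
    download_doi="download-doi",
    archive_doi="archive-doi",
    bibliographic_doi="paper-doi",
)
print(" <- ".join(chain.trace_back()))
```

Because each stage carries its own persistent identifier, a reader starting from the paper can, in principle, follow the chain back to every contributing dataset.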
  • 50. DOI-BASED DATA CITATION AT GBIF.ORG. NTNU Vascular plants: https://doi.org/10.15468/zrlqok [Screenshot labels: citations; papers; dataset]
  • 51. THE RESEARCH DATA LIFECYCLE https://library.sydney.edu.au/research/data-management/research-data-management.html GBIF.org
  • 52. THANK YOU www.gbif.org Dag Endresen | GBIF Norway helpdesk@gbif.no