Force 11 Scholarly Communications Institute
Summer School
31 July – 4 August 2017
University of California, San Diego
Data in the Scholarly Communications Lifecycle
Natasha Simons
Senior Research Data Specialist
Wednesday 2 August
Session one – data discovery
• What is data discovery?
• What is metadata?
• Exercise: describe this dataset!
• More about metadata including types, storage, standards,
vocabularies, the power of rich metadata, m-2-m metadata exchange
• Discussion groups exploring the FAIR data principles:
• Findable
• Accessible
• Interoperable
• Reusable
Duration: 60 mins
Why do people search for data*?
• Exploratory/Scoping
• Reuse/Secondary data analysis
• Can be starting point or ad hoc
• Peer review
• Reproduce/extend results
• Repurpose (e.g. for mashups, visualisations, simulations)
• Verify claims (e.g. report findings)
*Not in any order; not exhaustive!
How do people find data*?
• Google
• Ask a colleague
• Find link to data in a journal article
• Data journals
• Data registries e.g. re3data
• Open data portals e.g. data.gov
• Institutional repositories
• Data / Discipline repositories e.g. Dryad
• Project website
• Data discovery aggregators like Research Data Australia
• Library catalogues, databases
*Not in any order; not exhaustive!
Characteristics of finding data
• Movable feast / changing beast
• No standard practice, universal standard or vocab
• Databases are non-exhaustive
• Search methods and search terms are driven by why people are
looking and how the data is stored
When looking for data, people need to (FISO):
• Find
• Identify
• Select
• Obtain
This is our guide for creating metadata records!
What is metadata?
https://www.youtube.com/watch?v=ABF2FvSPVYE
Exercise: create metadata
Your task:
1. Divide into 2 groups
2. Each take one of the CSV data files
3. Describe the data!
Duration: 10 mins
Metadata records
How did you go?
What did you learn?
Here are the original metadata descriptions:
CSV dataset #1 - https://data.qld.gov.au/dataset/marine-oil-spills-data
CSV dataset #2 - https://data.qld.gov.au/dataset/koala-hospital-data
Types of metadata
Metadata elements can describe either a single item or a collection, and
can serve different purposes. Examples of metadata for a photograph
could include:
• descriptive metadata, such as the name of the photographer, the
location and subject of the photograph, the date and time that the
photograph was taken
• technical metadata, such as the type of camera used to take the
photograph, the file format in which the photograph is stored, the
exposure time and dimensions of the photograph, and so on
• access and rights metadata, defining who is allowed to view the
photograph under what conditions, and what they can do with it
(reuse)
• preservation metadata, which keeps track of actions taken to preserve
or sustain the photograph for later access and use.
Source: ANDS website
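The four metadata types above can be pictured as one structured record. The following is a minimal sketch only: the class name `PhotoMetadata`, the field names and all the values are hypothetical illustrations, not taken from the ANDS website or from any metadata standard.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class PhotoMetadata:
    """One attribute per metadata type from the list above (illustrative names)."""
    descriptive: dict = field(default_factory=dict)
    technical: dict = field(default_factory=dict)
    access_rights: dict = field(default_factory=dict)
    preservation: list = field(default_factory=list)


def missing_types(record):
    """Return the names of the metadata types that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# All values below are made up for illustration.
record = PhotoMetadata(
    descriptive={"photographer": "A. Example", "subject": "Koala", "date_taken": "2017-08-02"},
    technical={"camera": "Example Cam X1", "file_format": "JPEG", "exposure_time_s": 0.004},
    access_rights={"licence": "CC BY 3.0 AU", "access": "open"},
    preservation=[{"action": "checksum verified", "date": "2017-08-03"}],
)

print(missing_types(record))
print(missing_types(PhotoMetadata(descriptive={"subject": "Koala"})))
```

Keeping the types in separate groups makes it easy to see at a glance which kind of information a record still lacks.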
Where does metadata come from?
• Metadata can be created manually by people or automatically by
instruments or computers.
• Metadata capture is easiest if it is automatically generated when the
data is created, for example, the metadata your camera captures
every time you take a photo.
• For much research data, the researcher needs to create the
descriptive and provenance metadata, as only they have that
information.
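As a rough illustration of the split between automatic and manual metadata, the sketch below captures what a computer can work out about a file by itself (size, modification time, checksum) and then merges in the descriptive fields that only the researcher can supply. The function names, field names and sample values are assumptions made for this example.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def capture_file_metadata(path):
    """Technical metadata that can be generated automatically from the file."""
    stat = os.stat(path)
    with open(path, "rb") as fh:
        checksum = hashlib.sha256(fh.read()).hexdigest()
    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": checksum,
    }


def add_manual_metadata(auto, title, method):
    """Descriptive and provenance metadata that only the researcher knows."""
    return {**auto, "title": title, "collection_method": method}


# Demonstration with a tiny throwaway CSV file (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "counts.csv")
    with open(csv_path, "wb") as fh:
        fh.write(b"site,count\nA,3\n")
    auto = capture_file_metadata(csv_path)

full_record = add_manual_metadata(auto, "Example site counts", "Manual field survey")
print(full_record)
```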
When should metadata for research data be created?
Whenever it is needed, particularly:
• During the course of data collection
• When the data changes
• And at the end when the data is deposited and ‘published’
Where is metadata stored?
• Metadata can be stored in local source systems like repositories –
often with the data it is about.
• Metadata that enables research data to be discovered and accessed
should be published in discovery portals like Research Data Australia,
or in discipline or institutional portals.
• Metadata that gives detailed contextual information and supports
reuse, such as data-item-level metadata, workflows, analysis, and
detailed methods information, is usually stored with the data.
The power of rich metadata
• Well-described metadata records show the power of rich metadata in
making research data collections discoverable, citable, reusable and
accessible for the long term.
• The Two-Rocks moorings data 2004 - 2005 metadata record in the CSIRO
Data Access Portal contains 35 metadata fields, which enable
researchers to quickly and accurately assess the relevance of this
dataset to their research. The metadata record and the data are closely
linked through co-location on the same access page. The Files tab
contains additional metadata about each of the 17 files within this
collection: file type, last modified, and file size.
• Rich metadata allows records to be syndicated to other data
catalogues; the same Two-Rocks moorings data record is
syndicated to:
• Research Data Australia: Australia’s aggregated research data
catalogue
• Marlin Oceans and Atmosphere: a discipline-specific metadata
catalogue
Metadata standards
A metadata standard is a schema that has been formally approved and
published, with governance procedures in place to maintain and update
the standard.
Examples:
Dublin Core - http://dublincore.org/documents/dces/
RIF-CS - http://services.ands.org.au/documentation/rifcs/1.6/guidelines/rif-cs.html
How do I find metadata standards?
• See this disciplinary metadata directory
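To make the idea of a metadata standard concrete, here is a small sketch that writes a record using the 15 elements of the Dublin Core Metadata Element Set and its namespace `http://purl.org/dc/elements/1.1/`. The wrapper element `metadata`, the function name `to_dublin_core` and the sample values are assumptions for illustration; real repositories wrap Dublin Core in their own containers.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dublin_core(fields):
    """Build an XML element tree from a dict of Dublin Core element names."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        # Dublin Core elements are repeatable, so accept a list of values.
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = item
    return root


# Placeholder values, not a real dataset description.
example = {
    "title": "Example dataset title",
    "creator": "Example Researcher",
    "subject": ["oceanography", "moorings"],
    "format": "text/csv",
}
xml_text = ET.tostring(to_dublin_core(example), encoding="unicode")
print(xml_text)
```

Rejecting unknown element names is a design choice for the sketch: it shows how a schema constrains what a record may contain.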
Vocabularies and research data
• A vocabulary sets out the common language a discipline has agreed to
use to refer to concepts of interest in that discipline.
• Researchers planning observations or surveys need to define their data
items clearly.
• An agreed vocabulary (a standard) makes a good starting point for
translating concepts into other vocabularies so that collaboration can
occur.
• Indexing vocabularies are used to tag items in library catalogues and
search portals and to provide keywords for academic journal articles.
• A vocabulary service is a machine-to-machine service that can support
activities such as creating, managing and querying vocabularies.
Want more?
• Read the ANDS Guide on vocabularies for research data
• Explore Research Vocabularies Australia
• Check out the COAR vocabularies
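The indexing use of a vocabulary can be sketched as mapping free-text keywords onto agreed preferred terms. The tiny vocabulary below is invented for this example and is not drawn from Research Vocabularies Australia, COAR or any published vocabulary.

```python
# Hypothetical vocabulary: preferred term -> accepted variants.
VOCAB = {
    "oil spill": {"oil spill", "oil spills", "marine oil spill"},
    "koala": {"koala", "koalas", "phascolarctos cinereus"},
}


def to_preferred(term):
    """Return the preferred term for a keyword, or None if it is not in the vocabulary."""
    cleaned = " ".join(term.lower().split())  # normalise case and spacing
    for preferred, variants in VOCAB.items():
        if cleaned in variants:
            return preferred
    return None


def tag(keywords):
    """Split keywords into controlled tags and terms that need human review."""
    tagged, unmatched = [], []
    for keyword in keywords:
        preferred = to_preferred(keyword)
        if preferred is None:
            unmatched.append(keyword)
        elif preferred not in tagged:
            tagged.append(preferred)
    return tagged, unmatched


print(tag(["Koalas", "Oil  Spills", "dugong"]))
```

Keywords that do not match are returned separately rather than discarded, so a person can decide whether the vocabulary needs a new term.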
m-2-m metadata exchange
m-2-m (machine-to-machine) = networked devices exchanging information and
performing actions without the manual assistance of humans.
Examples:
OAI-PMH – also known as metadata harvesting and commonly used by
repositories and repository aggregators
APIs – such as the ORCID API that enables things like authenticating
against the ORCID registry
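An OAI-PMH harvest is an ordinary HTTP request with a `verb` parameter. The sketch below only builds the request URLs for the `ListRecords` verb with the `oai_dc` (Dublin Core) metadata prefix; it does not contact any server. The base URL `https://repository.example.org/oai` and the function names are placeholders, so substitute the endpoint your repository documents.

```python
from urllib.parse import urlencode

# Placeholder endpoint; every repository publishes its own base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build the first ListRecords request of a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date  # only records changed since this date
    if set_spec:
        params["set"] = set_spec  # optional subset of the repository
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build a follow-up request; the resumption token replaces all other arguments."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


first_request = list_records_url(BASE_URL, from_date="2017-01-01")
next_request = resume_url(BASE_URL, "abc123")
print(first_request)
print(next_request)
```

An aggregator would fetch the first URL, read the records from the XML response, and keep requesting with each returned resumption token until none is supplied.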
Want more?
Have a go at these Things:
Thing 4 – data discovery
Thing 11 – what’s my metadata schema?
Thing 12 – vocabularies for data description
Thing 13 – walk the crosswalk
Discussion: FAIR data
Your task:
• Divide into groups of 2 or 3 people
• Read through the FAIR data principles:
https://www.force11.org/group/fairgroup/fairprinciples
• Discuss:
• Are these good principles? Why? Why not?
• How might these principles be put into practice?
• Then we will re-group and you will be invited to share
Duration: 20 mins
FAIR data - resources
In Nature - https://www.nature.com/articles/sdata201618
EUDAT webinar - https://eudat.eu/events/webinar/fair-data-in-trustworthy-data-repositories-webinar
LIBER webinar - http://libereurope.eu/blog/2017/02/23/liber-webinar-fair-data-principles-fair/
The European Commission has established an Expert Group on Turning FAIR
Data into Reality (E03464) which will run until Spring 2018.
Horizon 2020 Guidelines on FAIR Data Management - http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
With the exception of logos, third party images or where otherwise indicated, this
work is licensed under the Creative Commons Australia Attribution 3.0 Licence.
ANDS is supported by the Australian
Government through the National Collaborative
Research Infrastructure Strategy Program.
Monash University leads the partnership with
the Australian National University and CSIRO.
Natasha Simons
natasha.simons@ands.org.au
Tw: @n_simons
ORCID: https://orcid.org/0000-0003-0635-1998
