Do It Yourself (DIY) Earth Science
Collaboratories Using Best Practices
and Breakthrough Technologies
IN13D-01
ERIC STEPHAN
December 11, 2017 1
Pacific Northwest National Laboratory
AGU Fall meeting 2017, New Orleans, LA
IN13D: Approaches for Curation to Data Discovery in the Era of Big Data Variety II
Addressing Data Challenges of Scientists on
Small and Midscale Budgets
Do-it-yourself (DIY) home project videos have taken the media by storm,
helping you reroof a house or replace a water pump.
DIY recommendations can even help you determine whether you can do it yourself!
This talk targets innovative, smaller-sized science projects that produce
quality science products, including data that can be shared with future
consumer communities.
Many best practices can be carried out in even the humblest situations.
For big data centers: smaller projects want more effective ways to connect to your
resources beyond ‘point and click’.
Emergence of Scientific Collaborative Tools –
Science inspired the Web and so much more!
Collaboratory - A center without walls, in which the nation’s researchers can perform their research
without regard to physical location, interacting with colleagues, accessing instrumentation, sharing data
and computational resources, [and] accessing information in digital libraries [1]
[1] The national collaboratory. In Towards a national collaboratory. Unpublished report of a National Science Foundation
invitational workshop, Rockefeller University, New York. 1988.
The DOE 2000 Project
Environmental Molecular Sciences
Laboratory (EMSL) User Facility
12 March 1989: Sir Tim Berners-Lee's
original “vague but exciting”
submission to CERN on a
distributed information system
National Institutes of Health:
The Human Genome Project
(HGP) began in 1989.
Engage with EMSL to advance your research
How can we work together?
• Collaborate with our experts
• Work within multi-disciplinary teams
to accelerate science
• Access world-class scientific
user facilities and specialized
instrumentation
• Provide research and career
opportunities for your students
December 8, 2017
www.emsl.pnnl.gov
www.universities.pnnl.gov
Examples of Off the Shelf and Standards
Deluge: What Works for You?
Attaining Data Study Afterlife?
[Figure labels: Signal, Message, Application, Database, File store, Archive, Deep Web; Science publications; Data; Visibility through commercial search engine]
New advancements in science
and engineering require
careful attention to keeping
scientific discovery literature
and data artifacts in
circulation
Example Data Lifecycle
“…Placed in storage, the data has as much
productive value as your labor value when
you sit on the sofa at night to watch TV.”
“…If you want to increase the value of your data
you have to increase its active circulation and
utility!” Steven Adler, DWBP co-Chair
Without some help, science can remain largely
invisible in the Deep Web
Increasing Lifespan, Reuse and Visibility DIY
• Choose from 35 DWBP best practices to match research functional needs
• Scope best practices with reference model sketches
• Assess off-the-shelf product capabilities and limitations with DWBP
• Identify required additional plumbing to accomplish research
https://www.w3.org/TR/dwbp/
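The assessment steps above can be sketched as a simple gap analysis. This is only an illustration under stated assumptions, not something the DWBP prescribes: the practice names come from the DWBP list, while the `required` selection, the imaginary product's `product_supports` set, and the `find_gaps` helper are all hypothetical.

```python
# Hypothetical gap analysis: which selected DWBP best practices does an
# off-the-shelf product leave uncovered (the "additional plumbing")?

def find_gaps(required, product_supports):
    """Return the required practices the product does not cover, sorted."""
    return sorted(set(required) - set(product_supports))

# Practices a small project decided it needs (illustrative selection).
required = {
    "Provide metadata",
    "Provide data license information",
    "Provide data provenance information",
    "Provide a version indicator",
    "Make data available through an API",
}

# Capabilities claimed for an imaginary off-the-shelf data portal.
product_supports = {
    "Provide metadata",
    "Make data available through an API",
    "Provide bulk download",
}

gaps = find_gaps(required, product_supports)
for practice in gaps:
    print("Needs additional plumbing:", practice)
```

Keeping the selection as plain sets makes it easy to rerun the comparison for each candidate product.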
DWBP Data Challenges and Motivating
Questions
• Metadata: How do I provide metadata?
• Data License: How do I permit/restrict access?
• Provenance: How can I convey transparency?
• Data Quality: How can I add trust?
• Versioning: How can I track version history?
• Identification: How can I create and use persistent identifiers?
• Data Formats: What non-proprietary structures should I use?
• Vocabularies: How do I make my data more easily understood?
• Access: How can I make data retrieval easy, robust, and intuitive?
• Preservation: What should I consider when archiving?
• Feedback: How can data producers and users be better engaged?
• Enrichment: How can I add better value to data?
• Replication: How do I use data responsibly?
“The Web is not a glorified USB Stick”,
Phil Archer, W3C Data Activity Lead https://www.w3.org/2017/Talks/0621-phila-oai/
http://w3c.github.io/dwbp/dwbp-implementation-report.html
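Several of the questions above (metadata, license, versioning, identification) can be addressed with one small machine-readable dataset description. The sketch below builds a JSON-LD record using DCAT, Dublin Core, and OWL terms. The `describe_dataset` helper, the example.org URIs, and the dataset values are invented for illustration and do not come from the talk.

```python
import json

def describe_dataset(uri, title, license_uri, version, download_url, media_type):
    """Build a minimal JSON-LD description of one dataset and one distribution."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "owl": "http://www.w3.org/2002/07/owl#",
        },
        "@id": uri,                             # persistent identifier
        "@type": "dcat:Dataset",
        "dct:title": title,                     # descriptive metadata
        "dct:license": {"@id": license_uri},    # data license
        "owl:versionInfo": version,             # version indicator
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": media_type,       # non-proprietary format
        },
    }

# All values below are made up for the example.
record = describe_dataset(
    uri="https://example.org/dataset/soil-moisture",
    title="Example soil moisture measurements",
    license_uri="https://creativecommons.org/licenses/by/4.0/",
    version="1.0",
    download_url="https://example.org/files/soil-moisture-1.0.csv",
    media_type="text/csv",
)
print(json.dumps(record, indent=2))
```

A record like this can sit next to the data file or be embedded in a landing page so that both people and crawlers can read it.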
Best Practices Benefit Measures
• Comprehension: humans will have a better understanding about the data
structure and meaning, the metadata and the nature of the dataset.
• Processability: machines can automatically ingest and operate on data.
• Discoverability: finding new associations between and in data resources.
• Reuse: increase intrinsic value to wider data consumer communities.
• Trust: improving the confidence that consumers have in the dataset.
• Linkability: it will be possible to associate data resources.
• Access: humans and machines will be able to retrieve relevant data in familiar
common formats.
• Interoperability: cooperation among data publishers and consumers.
Using Technology Agnostic Reference
Models to Assess Best Practice Relevance
ISO Open Archival Information System (OAIS) ISO 14721:2003
The Context, Containers, Components and Classes (C4) model for software architecture
Example Context Data Producer Reference
Models
• Provide data provenance information
• Provide data quality information
• Provide a version indicator
• Provide version history
• Preserve identifiers
• Provide metadata
• Provide structural metadata
• Use machine-readable standardized data formats
• Provide data in multiple formats
• Reuse vocabularies, preferably standardized
ones
• Provide subsets for large datasets
• Provide bulk download
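Two of the producer practices listed above, providing subsets of large datasets and providing data in multiple formats, can be sketched with the Python standard library alone. The records, field names, and helper functions below are hypothetical. Dates are kept as ISO 8601 strings, which are locale-neutral and compare correctly as text.

```python
import csv
import io
import json

FIELDS = ["date", "site", "soil_moisture"]

def subset(records, start, end):
    """Keep records whose ISO 8601 date falls within [start, end]."""
    return [r for r in records if start <= r["date"] <= end]

def to_csv(records):
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_json(records):
    """Serialize the same records as JSON text."""
    return json.dumps(records, indent=2)

# Made-up measurements standing in for a much larger dataset.
records = [
    {"date": "2017-01-15", "site": "A", "soil_moisture": 0.21},
    {"date": "2017-06-01", "site": "A", "soil_moisture": 0.34},
    {"date": "2017-11-30", "site": "B", "soil_moisture": 0.18},
]

summer = subset(records, "2017-05-01", "2017-09-30")
print(to_csv(summer))
print(to_json(summer))
```

Serving the same subset in both formats lets consumers pick whichever their tools already read.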
Use Case: Energy Exascale Earth System
Model (E3SM) and Mass Spectrometry
Focus: Recovering enough information to re-execute a given simulation
Achieves this through IETF, W3C
formats, W3C Provenance,
Interoperable Protocols
Off the shelf: Swagger, Jupyter
Notebook, NoSQL databases
Repurposed to support
reproducible Mass Spectrometry
Experiments
Thomas M, J Laskin, B Raju, EG Stephan, TO Elsethagen, NYS Van, and SN Nguyen. 2016. "Enabling
Re-executable Workflows with Near-real-time Visualization, Provenance Capture and Advanced Querying for Mass
Spectrometry Data." In NYSDS 2016 - Data-Driven Discovery.
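To make "recovering enough information to re-execute" concrete, the sketch below records which inputs a run used, with checksums, plus the command and timestamps. It is not the tooling described in the cited work. The layout is loosely modeled on the PROV-JSON serialization of W3C PROV, and the `ex:` prefix, `provenance_record` helper, file name, and `run_model` command are all hypothetical.

```python
import hashlib
import json

def checksum(data: bytes) -> str:
    """SHA-256 hex digest used to pin the exact input content."""
    return hashlib.sha256(data).hexdigest()

def provenance_record(run_id, command, inputs, started, ended):
    """Describe one run: inputs maps a file name to its bytes content."""
    entities = {
        f"ex:{name}": {"ex:sha256": checksum(content)}
        for name, content in inputs.items()
    }
    activity_id = f"ex:{run_id}"
    return {
        "prefix": {"ex": "http://example.org/"},
        "entity": entities,
        "activity": {
            activity_id: {
                "prov:startTime": started,
                "prov:endTime": ended,
                "ex:command": command,  # hypothetical attribute for re-execution
            }
        },
        # One "used" relation per input, linking the run to the entity.
        "used": {
            f"_:u{i}": {"prov:activity": activity_id, "prov:entity": entity_id}
            for i, entity_id in enumerate(sorted(entities), start=1)
        },
    }

record = provenance_record(
    run_id="run-001",
    command="run_model --config config.yaml",  # made-up command line
    inputs={"config.yaml": b"resolution: low\n"},
    started="2017-12-11T09:00:00Z",
    ended="2017-12-11T09:05:00Z",
)
print(json.dumps(record, indent=2))
```

Comparing stored checksums against the current files is one simple way to tell whether a later re-execution would use the same inputs.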
Example Context Data Publisher Reference
Model
• Provide metadata
• Provide descriptive metadata
• Provide structural metadata
• Provide data provenance information
• Use locale-neutral data representations
• Reuse vocabularies, preferably standardized ones
• Choose the right formalization level
• Gather feedback from data consumers
• Enrich data by generating new data
• Provide Complementary Presentations
• Interoperability
• Use persistent URIs as identifiers of datasets
• Use persistent URIs as identifiers within datasets
• Reuse vocabularies, preferably standardized ones
• Choose the right formalization level
• Make data available through an API
• Use Web Standards as the foundation of APIs
• Avoid Breaking Changes to Your API
• Provide Feedback to the Original Publisher
• Provide data provenance information
• Provide data quality information
• Provide a version indicator
• Provide version history
• Preserve identifiers
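The identifier and versioning practices in the lists above (persistent URIs, a version indicator, version history) can be illustrated with a toy in-memory registry. This is a sketch under assumptions: the `DatasetRegistry` class, the example.org base URI, and the archive paths are invented, and a real publisher would back this with a resolver service rather than a Python dictionary.

```python
class DatasetRegistry:
    """Toy registry: persistent URIs stay fixed while storage locations may move."""

    def __init__(self, base):
        self.base = base.rstrip("/")
        self.versions = {}  # dataset id -> list of (version, location), oldest first

    def publish(self, dataset_id, version, location):
        """Record a new version and return its persistent, version-specific URI."""
        self.versions.setdefault(dataset_id, []).append((version, location))
        return f"{self.base}/{dataset_id}/{version}"

    def resolve(self, dataset_id, version=None):
        """Return the storage location; with no version given, the latest one."""
        history = self.versions[dataset_id]
        if version is None:
            return history[-1][1]
        for known_version, location in history:
            if known_version == version:
                return location
        raise KeyError(f"{dataset_id} has no version {version}")

    def history(self, dataset_id):
        """Return the version indicators in publication order."""
        return [version for version, _ in self.versions[dataset_id]]

registry = DatasetRegistry("https://example.org/dataset")
uri_v1 = registry.publish("soil-moisture", "1.0", "/archive/sm_2016.csv")
uri_v2 = registry.publish("soil-moisture", "1.1", "/archive/sm_2017.csv")

print(uri_v1)
print(registry.resolve("soil-moisture"))          # latest version
print(registry.resolve("soil-moisture", "1.0"))   # a specific older version
print(registry.history("soil-moisture"))
```

Because older versions stay resolvable, consumers who cited version 1.0 are not broken when 1.1 is published.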
Example: curating and republishing to
support discovery
Based on a single soil moisture use case
1.4 billion triples of curated measurement
metadata (i.e., relationships, graph edges)
Including descriptions of 777,230 datasets,
2,767 data catalogs,
1,701 data centers,
52 data networks.
Chappell AR, JR Weaver, S Purohit, WP Smith, KL Schuchardt, P West, B Lee, and P Fox. 2015. "Enhancing the Impact of Science Data:
Toward Data Discovery and Reuse." In Proceedings of the 14th IEEE/ACIS International Conference on Computer and Information Science
2015.
Ontology alignment
Query Optimization with SPARQL and
Schema.org
Use of services such as geonames.org
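For readers new to triples, the sketch below shows what a graph edge looks like and how a pattern query finds datasets by what they measure. It is a toy in-memory stand-in for a SPARQL basic graph pattern, not the system described in the cited paper. The `match` helper and the `ex:` resources are hypothetical, while `schema:Dataset`, `schema:variableMeasured`, and `schema:includedInDataCatalog` are Schema.org terms.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Each tuple is one triple: (subject, predicate, object), i.e. one graph edge.
triples = [
    ("ex:dataset1", "rdf:type", "schema:Dataset"),
    ("ex:dataset1", "schema:variableMeasured", "soil moisture"),
    ("ex:dataset1", "schema:includedInDataCatalog", "ex:catalogA"),
    ("ex:dataset2", "rdf:type", "schema:Dataset"),
    ("ex:dataset2", "schema:variableMeasured", "air temperature"),
]

# Roughly equivalent SPARQL:
#   SELECT ?d WHERE { ?d schema:variableMeasured "soil moisture" . }
soil = [
    s for s, _, _ in match(
        triples, predicate="schema:variableMeasured", obj="soil moisture"
    )
]
print(soil)
```

A linear scan like this only works for a handful of triples; at the scale quoted above, a triple store with indexes and query optimization is needed.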
DWBP Implementation Report: Field
Guide to Examples of Best Practices
Use the evaluation criteria in the report to
assess your own technology stack and
data resources.
http://w3c.github.io/dwbp/dwbp-implementation-report.html
Indirect Collaborations
Producers
Publishers
Analysts
Researchers
There is real interest in your data from
emerging fields!
Using common methods and
approaches is an extremely helpful
form of indirect collaboration
Internationalizing your products can
widen your impact
Approach supports open and closed
(behind firewall) collaborations
Example Data Lifecycle
What Type of Data Terrain Are We Providing
for Future Science?
Active technical recommendation communities such as W3C are here to serve
you and are interested in your problems.
Evolving good practice as a guideline is less expensive than switching between
technology solutions without good practices.
Success criteria described in the DWBP can help you measure benefit to your
project.
Change is good. For legacy applications, adopting good practice and new
technology may be more impactful at a gradual pace.
Questions? Eric.Stephan@pnnl.gov
“Thank you for giving us level terrain to build upon”
Sir Tim Berners-Lee (inventor of the Web), recalling a conversation he had with Vint Cerf
(co-inventor of the Internet)
Paraphrased from notes on TBL’s remarks at the W3C Technical Plenary and Advisory Committee meeting 2014
The International Data on the Web Best
Practices Recommendations Team!
Contributors:
• Annette Greiner (Lawrence Berkeley National Laboratory)
• Antoine Isaac
• Carlos Iglesias
• Carlos Laufer
• Christophe Guéret
• Deirdre Lee (Working Group co-Chair)
• Doug Schepers
• Eric G. Stephan (Pacific Northwest National Laboratory)
• Eric Kauz
• Ghislain A. Atemezing
• Hadley Beeman (Working Group co-Chair)
• Ig Ibert Bittencourt
• João Paulo Almeida
• Makx Dekkers
• Peter Winstanley
• Phil Archer (Data Activity Chair)
• Riccardo Albertoni
• Sumit Purohit (Pacific Northwest National Laboratory)
• Yasodara Córdova
DWBP Editors:
• Bernadette Farias Lóscio
• Caroline Burle
• Newton Calegari
Working Group Chairs
• Hadley Beeman
• Deirdre Lee
• Yasodara Córdova
• Steven Adler, Perspective & Community Outreach
W3C Data Activity Lead, W3C Team Contact: Phil Archer
