Converged IT and Data Commons
Simon Twigger, Ph.D.
1
Molecular Med Tri-Con
February 13th 2018
BioTeam
2
Honest, Objective, Vendor and
Technology agnostic
… years bridging the gap
between IT and Science
Run by scientists forced to
learn IT to get the job done
Virtual company with …
About Me
3
Strategic assessments,
Cloud (AWS/Google),
DevOps, Data Commons,
software development,
Whole genome sequencing resource identifies 18 new
candidate genes for autism spectrum disorder
Nature Neuroscience 20, 602–611 (2017)
Overview
‣ Scope
• Organization’s perspective
• Planning & implementation considerations
‣ Strategy around ‘data commons’
• How might it support the ‘bigger picture’?
‣ Implementation of a data commons
• What does it involve, what tools/tech might be useful
4
Strategy for a Data Commons
5
Generic Scientist’s Data ‘Journey’
Stages: Preliminary Data, Supporting Data, Run Experiments, Raw Data Management, Analyses, Archive Data, Publish, Reuse Data
Experience: + / Neutral / -
• Likely external, download as needed
• Have equipment & instruments
• Not consistent, rarely any plan, data spread out
• No structure unless using core facility
• Instrument backup uncertain
• Compute OK, have software, have tools
• Backup?
• Hard to find, can’t track across project
• Storage limits, save long term
• Physical data - slides
• Rarely reused, may not be readily reusable by others
• Hard to find data, rely on original person or manual hunting
• Can find own data, may not use others
• No real issues, may store pub’d data in one spot
• Submit to GEO, etc.
What is a data commons?
7
An integrated (converged) environment that
provides access to shared data, compute and
analytic tools at a scale (or convenience)
greater than that typically available
Data Commons
Data Compute Tools
Where are the key areas for your research
What is in common for you?
8
Data Commons
Data
Compute
Tools
Do you need a commons….?
9
Lots of need for
compute, just need a
cluster?
Data Commons
Lots of data, not much
in common
Just need more
storage, and/or data
management?
Data Commons
Significant amounts of
data in common, plus
compute and tools
What problem are we solving, is a commons the
answer…
Strategy
10
‣ What does our data/compute/tools usage look like?
‣ What are the common issues that a Commons might
help with?
‣ What should our Commons contain?
‣ How does a Data Commons fit with our longer term
goals?
‣ How will we measure success?
To get some real information on what’s going on,
what the real problems (opportunities!) are
Digital Asset Inventory
‣ Experimental Approaches
• What type of analyses are they doing, what obstacles are getting in
their way?
• What data is the input, what is the output, file formats?
• Data volumes, storage & compute requirements
‣ Data Management
• Data management plan (ha!), “Wild West”?
• Metadata, descriptors, ontologies?
‣ Search/Retrieval/Sharing
• How do they go back and find old data, what do they search on
• What do they share (if anything), with whom
11
To get some real information on what’s going on,
what the real problems (opportunities!) are
Digital Asset Inventory
‣ Informatics/Core groups
• Algorithms, software (version control!), pipelines, data types,
data volumes, software packaging & deployment
• Workflow, workflow tools, data movement
• Data sharing, Data archiving & retrieval
‣ IT staff
• Current storage, rate of data growth, data lifecycle
• Compute resources, usage; Network, data flow
‣ Leadership
• Primary goals, data management strategy, budget, risks
12
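A first pass at the data volumes and file formats asked about above can be scripted rather than gathered by interview alone. The following is a minimal sketch, assuming the data lives on a file system the script can read. The function names are hypothetical and not part of any tool named in this deck.

```python
import os
from collections import defaultdict
from pathlib import Path


def inventory_by_extension(root):
    """Walk `root` and return {extension: {"files": n, "bytes": total}}."""
    summary = defaultdict(lambda: {"files": 0, "bytes": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            # Only the last suffix is used, so "reads.fastq.gz" counts as ".gz".
            # Files without a suffix are grouped under a placeholder label.
            ext = path.suffix.lower() or "(none)"
            summary[ext]["files"] += 1
            summary[ext]["bytes"] += size
    return dict(summary)


def print_report(summary):
    """Print one line per extension, largest total size first."""
    ordered = sorted(summary.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    for ext, stats in ordered:
        print(f"{ext:10s} {stats['files']:8d} files {stats['bytes'] / 1e9:10.3f} GB")
```

Calling `print_report(inventory_by_extension("/path/to/lab/share"))` on each lab's storage gives a rough picture of which formats dominate. It says nothing about how the data is used, so it supplements the conversations rather than replacing them.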
It could help address a number of challenges for a
variety of audiences
Reasons for a Commons
‣ Scientists
• manage data, find data, ideally share data
• Democratize access to data & associated compute
‣ IT
• Manage storage, reduce duplication, ensure backups and DR
• Security - ensure the environment is appropriate for the data being
used (e.g. clinical)
• Consolidate compute into fewer environments, converge towards a
common platform…
‣ Organizational Leadership
• Promote management/sharing/reuse of data, leverage existing data
for new discoveries, reduce risk
13
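To make 'reduce duplication' concrete, here is a minimal sketch of how byte-identical copies could be found on shared storage. It assumes plain files on a mounted file system. The function names are hypothetical, and a real deployment would more likely lean on the storage or data management platform's own reporting.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    # Group by size first: files of different sizes cannot be identical,
    # so only same-size candidates need to be hashed.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for path in paths:
                by_hash[sha256_of(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


def reclaimable_bytes(duplicate_groups):
    """Bytes that would be freed by keeping one copy from each group."""
    return sum(os.path.getsize(g[0]) * (len(g) - 1) for g in duplicate_groups)
```

The reclaimable total is one number IT can put in front of leadership when arguing for a shared environment.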
Implementation considerations
14
Lab2 Lab2
Raw
Data
Final
Data
Reuse
Data is generated, added to a ‘commons’
environment for others to use
General Commons Data Flow
15
Lab Lab
Core
Raw
Data
Final
Data
Data Commons
Publish
Potential Stakeholders
‣ Scientists
‣ Division/Group Head
‣ PI/Lab Head/Lab manager
‣ Lab Tech
‣ Postdoc, Student, etc.
‣ Collaborator
‣ Informatics Team Members
‣ Informatics team lead
‣ Data scientist
‣ Core Labs
‣ Head of Core Lab
‣ Core lab manager (if different from the
Head of the Core)
‣ Scientist within the core lab
‣ Information Technology Team
Members
‣ Person in charge of compute, HPC, VMs,
Containers, Cloud, etc
‣ Person in charge of storage, etc.
‣ Person in charge of managing backups,
replication, and archiving
‣ Person in charge of storage capacity
planning
‣ Person in charge of network, data
movement to and from HPC, storage
‣ Person in charge of maintaining commons-
related systems, deployment, updates,
maintenance.
‣ Security and Compliance Office
‣ Leadership
‣ Persons responsible for strategic IT
decisions and purchasing
‣ Billing - to assign storage costs to specific
groups/users
‣ Legal - to be able to find data to respond to
formal requests for information (e.g. FOIA),
institute legal holds, data retention policies
‣ Non-human users (scripts, etc.)
‣ Scripts written to find data, add metadata,
move data, catalog usage, etc.
16
Define use cases, stories, competency questions
Nail down the details
17
As a: Scientist
I want: as much useful metadata associated with my data files as
possible, while doing as little extra work as possible (preferably no
extra work) to add this metadata.
So that: I can benefit from searching, reporting, organization, etc.
that comes with high quality metadata without having to take away
time and effort from research to add the metadata manually.
(the defining) User Story…
Example Competency questions
https://biocaddie.org/workgroup-3-group-links
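The user story asks for metadata with little or no extra work. One way to approach that is to harvest whatever can be derived from the file itself and only ask the scientist for the rest. The sketch below assumes a JSON 'sidecar' file written next to each data file. The sidecar naming, the field names and the `extra` hook are all hypothetical, not a format used by any platform named in this deck.

```python
import hashlib
import json
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def harvest_metadata(path, extra=None):
    """Collect metadata that can be derived from the file with no user input."""
    path = Path(path)
    stat = path.stat()
    record = {
        "file_name": path.name,
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        # Guessed from the file name only; None when the type is unknown.
        "media_type": mimetypes.guess_type(path.name)[0],
        # Reads the whole file at once: fine for a sketch, chunk it for large files.
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }
    # Hypothetical hook: values supplied by an instrument or LIMS export,
    # e.g. {"instrument": "...", "project": "..."}.
    record.update(extra or {})
    return record


def write_sidecar(path, extra=None):
    """Write `<file name>.meta.json` next to the data file and return its path."""
    path = Path(path)
    sidecar = path.with_name(path.name + ".meta.json")
    sidecar.write_text(json.dumps(harvest_metadata(path, extra), indent=2))
    return sidecar
```

Run from a script that watches an instrument output folder, something like this gives every file a baseline record before anyone is asked to type anything.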
NIH Data Commons Stack
18
Generic Commons Architecture
19
Data Processing for the Commons
20
Tools and Technologies
21
https://cdis.uchicago.edu/gen3
Tools and Technologies
22
https://dockstore.org/
https://github.com/NERSC/shifter
Containers, workflows, containers on HPC
http://singularity.lbl.gov/
http://geekyap.blogspot.ch/2016/11/docker-vs-singularity-vs-shifter-in-hpc.html
http://biocontainers.pro/
Tools and Technologies
23
https://irods.org/
http://www.arcitecta.com/
http://www.starfishstorage.com/
Mediaflux
Data and storage management, metadata
Final Thoughts
24
Considerations
‣ Define the Commons for you
• Address real pain points for your community
• What does success look like?
‣ It’s a complex engineering challenge
• Databases, containers, compute, network, storage, etc.
• ‘Just clone the repo’ never quite works as hoped…
‣ It’s a complex social engineering challenge…
• Common metadata, formats, sharing, collaboration
• Scientists would rather share their toothbrush than…
‣ It’s (ideally) a long term commitment
• Funding, ‘evolvability’ to avoid technology lock-in
25
Metrics for these…?
Goals for a Commons
‣ Scientists
• Can manage data, find data, securely share data
• Have ready access to data & associated compute
‣ IT
• Has visibility into storage, has reduced duplication
• Has ensured backups and enabled (and tested) DR
• Is confident that the environment is appropriate for the data being used (e.g. clinical)
• Has consolidated compute into fewer environments and is converging towards a
common platform…
‣ Organizational Leadership
• Can demonstrate sharing/reuse of data
• Have examples of leveraging existing data for new discoveries
• Can quantify the reduction in risk
26
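As one possible answer to 'Metrics for these…?', the leadership goal of demonstrating reuse could be tracked as the share of datasets read by someone other than the person who deposited them. This is a sketch under stated assumptions: the `AccessEvent` record and the existence of an access log are hypothetical, and the deck does not prescribe this particular metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    dataset_id: str
    owner: str        # who deposited the dataset
    accessed_by: str  # who read it


def reuse_rate(all_dataset_ids, events):
    """Fraction of datasets read at least once by someone other than the owner."""
    datasets = set(all_dataset_ids)
    if not datasets:
        return 0.0
    reused = {
        e.dataset_id
        for e in events
        if e.dataset_id in datasets and e.accessed_by != e.owner
    }
    return len(reused) / len(datasets)
```

Tracked over time, a number like this shows whether the commons is changing behaviour or just holding data.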



Editor's Notes

  • #5: Scope - your institution or company has decided that a Data Commons is needed - now what?
  • #6: What problem are we solving?
  • #7: General view of scientist’s data journey - many areas are OK, things are getting done, however, room for improvement in many areas, and particularly in data management and reuse
  • #8: Greater Scale = Data is key, but it’s not just more data, compute or tools; it’s also more access for more people who couldn’t get at these types of resources previously
  • #9: One or more of these needs to be in common and significantly ‘big’/important/painful for a commons approach to make sense
  • #10: Data commons really requires a reasonable amount of Data in Common (common formats, commonly used, commonly accessed,
  • #11: Needs to address a real problem, have a demonstrable impact on something important. How might you find what these things are - can’t guess, you have to ask people and one way is to conduct a digital asset inventory
  • #12: Product development: talk to the users and find out what their problems are, particularly as they relate to issues that a Data Commons might help with. Here are some questions for the research staff
  • #13: Product development, Talk to the users, find out what their problems are, particularly as it relates to issues that a Data Commons might help with (more storage, compute, analysis, more access to all of the above). IT is a critical partner as they need to be on board, will have much to contribute and potentially have much to benefit from this type of environment. Leadership can help articulate the bigger vision, what their main goals are, primary concerns, budget, etc.
  • #14: Lots of potential benefits from a Commons environment, however, which one(s) are relevant and important to your organization/constituency?
  • #17: Lots of groups/people to consider with a project of this nature
  • #21: What metadata attributes are needed, what terms will be used, what QC is necessary? Note: scientists aren’t great at metadata…
  • #22: This is a really good set of high quality open source platforms. Docker, Dockstore, Packer, etc. are all great ways to go, either to reuse the Gen3 platform or to create your own environment.
  • #23: Dockstore - OICR, interesting. Shifter - containers on HPC, Docker-compatible. Singularity - containers on HPC, …
  • #24: This is a really good set of high quality open source platforms
  • #26: Complex engineering - do you have the staff with the skills to pull this off? Social - build it and they probably won’t come unless there’s clearly something in it for them. Long term - it’s not just the first, sexy, 3-6m commons initiative, it’s the long term dedication to data management, data sharing
  • #27: Lots of potential benefits from a Commons environment, however, which one(s) are relevant and important to your organization/constituency?