A Method to Select
e-Infrastructure Components
to Sustain
Daniel S. Katz
dkatz@nsf.gov & d.katz@ieee.org
Program Director, Division of
Advanced Cyberinfrastructure,
National Science Foundation
&
David Proctor
djproctor@gmail.com
International Consortium of
Research Staff Associations
Reusable Infrastructure
• Systems and components created by one or
more people and intended to be used by others
• Over the last century, research infrastructure
(from microscopes to telescopes, and from
sequencers to colliders) has become essential
for many types of research
• Over the past few decades, most interfaces to
research infrastructure, and often the
infrastructure itself, have become digital
e-Infrastructure Defined
• e-Infrastructure, or cyberinfrastructure, has
been defined by Craig Stewart as consisting of
“computing systems, data storage systems,
advanced instruments and data repositories,
visualization environments, and people, all
linked together by software and high
performance networks to improve research
productivity and enable breakthroughs not
otherwise possible”
• Can discuss both e-Infrastructure itself and
e-Infrastructure elements
e-Infrastructure Element Contexts
• Technical
– Architecture: How does it fit into the overall
infrastructure? How does it interact with other
infrastructure elements?
• Social
– Developers: Who has developed the element?
– Users: Who uses the element?
– Purpose: What is the intended use of the element?
• Political
– Funders: Who funds the development and
maintenance?
– Scope: Is the element national? International?
– Impact: How is the element valued by researchers
and funders?
e-Infrastructure Element
• What resources are needed to create it,
and how can those resources be
assembled and applied?
• What resources are needed to sustain it,
and how can those resources be
assembled and applied?
• Focus on 2nd half of questions here, since
amount and type of needed resources
vary widely by element
Infrastructure Challenges (1)
• Science
– Larger teams, more disciplines, more countries
• Data
– Size, complexity, rates all increasing rapidly
– Need for interoperability (systems and policies)
• Systems
– More cores, more architectures (GPUs), more memory
hierarchy
– Changing balances (latency versus bandwidth)
– Changing limits (power, funds)
– System architecture and business models changing
(clouds)
– Network capacity growing; increased networks ->
increased security
Infrastructure Challenges (2)
• Software
– Multiphysics algorithms, frameworks
– Programming models and abstractions for science,
data, and hardware
– Validation and verification, reproducibility
– Resilience & fault tolerance
• People
– Education and training
– Career paths
– Credit and attribution
e-Infrastructure Space
[Figure: e-Infrastructure element space; axis labels include Temporal Duration and Spatial Extent]
Temporal Duration
• Decisions that are made
about creating and
sustaining infrastructure
elements need to include
awareness of the expected
lifetime of the element.
• Rough estimates:
– Computer systems: 5 years
– Networks & instruments: 10
years
– Software: 20 years
– People: 40 years
– Data (& publications):
forever (or until they are no
longer useful, whichever
comes first)
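The rough estimates above can be written down as a small lookup table. This is only a sketch: the dictionary keys, the use of `math.inf` for "forever", and the helper `outlives` are illustrative assumptions, not part of the original framework.

```python
import math

# Rough expected lifetimes in years, copied from the estimates above.
# Assumption: math.inf stands in for "forever (or until no longer useful)".
EXPECTED_LIFETIME_YEARS = {
    "computer system": 5,
    "network": 10,
    "instrument": 10,
    "software": 20,
    "people": 40,
    "data": math.inf,
    "publication": math.inf,
}


def outlives(element_kind: str, dependency_kind: str) -> bool:
    """Return True if the element is expected to outlive what it depends on.

    Hypothetical helper: a True result hints that the element will have to
    survive at least one replacement of the thing it depends on.
    """
    return (
        EXPECTED_LIFETIME_YEARS[element_kind]
        > EXPECTED_LIFETIME_YEARS[dependency_kind]
    )
```

For example, `outlives("software", "computer system")` is true under these estimates, which is one way to see why software usually has to be ported across several generations of hardware.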
Spatial Extent
• In academia, could range:
– Lab
– Department
– College or school
– University
– University system or
regional alliance
– Nation
– Beyond
• General research institutions,
e.g. national labs, might have
similar type units with
different names, e.g. ‘Division’
rather than ‘Department’
Purpose
• Ranges from:
– Used for one problem
• Maybe not infrastructure
– Used for variety of problems
in a discipline (e.g., climate
data from Arctic ice cores)
– Used for variety of problems
across many disciplines
(e.g., molecular dynamics
software)
– Used across all disciplines
(e.g., network, HPC system)
• Linked to temporal duration
– Lifetime of software element
may be 20 years, but if the
element isn’t useful, its
lifetime will be shortened
Scale
• Number of users (and
cost) should be larger
the farther the
element is from the
origin in any direction
• Generically called
‘scale’
• Scale is a metric of
the space, though not
orthogonal to any of
the three axes
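One possible way to make the space concrete is to place each element at a point with three coordinates and treat its distance from the origin as a stand-in for scale. Everything below is an assumption made for illustration: the ordinal encoding of the spatial-extent and purpose levels, the 40-year cap used for normalization, and the choice of a Euclidean distance are not specified by the talk.

```python
import math
from dataclasses import dataclass
from enum import IntEnum


class SpatialExtent(IntEnum):
    # Levels taken from the Spatial Extent slide; the numbering is assumed.
    LAB = 0
    DEPARTMENT = 1
    COLLEGE = 2
    UNIVERSITY = 3
    REGION = 4
    NATION = 5
    BEYOND = 6


class Purpose(IntEnum):
    # Levels taken from the Purpose slide; the numbering is assumed.
    ONE_PROBLEM = 0
    ONE_DISCIPLINE = 1
    MANY_DISCIPLINES = 2
    ALL_DISCIPLINES = 3


MAX_YEARS = 40.0  # assumption: cap used only to normalize the time axis


@dataclass(frozen=True)
class Element:
    name: str
    duration_years: float
    extent: SpatialExtent
    purpose: Purpose

    def coordinates(self) -> tuple[float, float, float]:
        """Map the element to a point in the unit cube, one value per axis."""
        t = min(self.duration_years, MAX_YEARS) / MAX_YEARS
        s = int(self.extent) / int(SpatialExtent.BEYOND)
        p = int(self.purpose) / int(Purpose.ALL_DISCIPLINES)
        return (t, s, p)

    def distance_from_origin(self) -> float:
        """Euclidean distance from the origin, used here as a proxy for scale."""
        return math.sqrt(sum(c * c for c in self.coordinates()))


# Hypothetical elements, for illustration only.
lab_script = Element("lab script", 2, SpatialExtent.LAB, Purpose.ONE_PROBLEM)
national_network = Element(
    "national network", 10, SpatialExtent.NATION, Purpose.ALL_DISCIPLINES
)
```

Under this encoding the national network sits farther from the origin than the lab script, matching the expectation that it has more users and higher cost. A different weighting of the axes would be just as defensible.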
Defining Sustainability
• [Environment Change]: Response
• [Dependent Infrastructure] When infrastructure on which the
element relies changes, will the element continue to provide the
same functionality?
• [Collaborative Infrastructure] As both collaborative elements
and the element change, can the element still be
combined with other elements to meet user needs?
• [New Users] Are functionality and usability of the element
clearly explained to new users? Do users have a way to ask
questions and learn about the element?
• [Existing Users] Does the element provide the functionality
that current users want? As future needs develop, is the
element modular and adaptable so that it can meet them?
• [Science] As new science and theory develop, will the
element incorporate and implement them?
WSSSPE2 Sustainability Discussion
Enablers
********** healthy and vibrant communities; vibrant community to champion
software
********** designing for growth and extension - open development
******* culture in community for reuse
**** portability
**** culture in developer community to support transition between
developers
*** interdisciplinary people: science + IT experience
** planning for end of life
** make smart choices about dependencies
* thinking of software as product lines - long term vs short term view
not all communities need new software
converting use into resources
WSSSPE2 Sustainability Discussion
Barriers
******* lack of incentives, including promotion and tenure process; promotion
and tenure process in academia is incompatible with sustainability
***** absent or poor documentation
***** funding to ensure sustainability is difficult to obtain
**** developers are not computer scientists; don't have software
engineering practices (in particular, those needed to scale-up projects
to support and be developed by a large sustainable community)
*** overreliance on one or two people - bus test
** rate of change of underlying technologies
** lack of business models for sustainability
* lack of training for how to build sustainability into the system
maintenance needed for software is not visible, appears to “just
happen”
licensing issues
staff turnover - lack of continuity
Sustainability Models
• Open source
• Closed partnership
• For profit
• Dual licensing
• Open source and paid support
• Foundation or government
Governance
• Governance tells the community how the project makes
decisions and how they can be involved.
• Community = users and/or developers and/or advocates
...
• Important particularly when open source is involved, to a
lesser extent in the other models
• Examples of open source governance models
– Benevolent dictatorship, as in Linux kernel
– Meritocracy, as in Apache Foundation projects
• Can be considered top-down and bottom-up governance,
respectively
• Note: orthogonal to top-down (cathedral) and bottom-up
(bazaar) development
Which models work where?
• Research is needed to understand and correlate
success or failure of models with different
portions of infrastructure element space
• Some sample questions:
– Does the cathedral or bazaar governance model
correlate with successful projects along any or all
axes?
• For example, perhaps one works better at small scale,
and the other works better at large scale
– Do particular resource assembly and application
models cluster along any or all axes?
• Temporal duration: government funding at large values;
mix of models at middle values; closed partnerships
at small values?
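Once data on real elements exists, the clustering question could be explored with a simple tally of sustainability models per region of an axis. The sketch below bins temporal duration into small, middle, and large values and counts models in each bin. The bin thresholds and the function names are assumptions, and the sample records are placeholders shaped to echo the conjecture above so the code has something to run on; they are not observations or evidence.

```python
from collections import Counter, defaultdict


def duration_bin(years: float) -> str:
    """Assign a temporal duration to a coarse bin (thresholds are assumed)."""
    if years <= 5:
        return "small"
    if years <= 20:
        return "middle"
    return "large"


def models_by_duration(records):
    """Count sustainability models per duration bin.

    `records` is an iterable of (duration_years, model_name) pairs.
    """
    tally = defaultdict(Counter)
    for years, model in records:
        tally[duration_bin(years)][model] += 1
    return {bin_name: dict(counts) for bin_name, counts in tally.items()}


# Placeholder records, NOT real data: they only exercise the code.
records = [
    (3, "closed partnership"),
    (4, "closed partnership"),
    (10, "open source"),
    (15, "for profit"),
    (30, "foundation or government"),
]

result = models_by_duration(records)
```

The same tally could be repeated along the spatial extent and purpose axes. Whether any real clusters appear is exactly the open research question.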
Conclusions
• Aim: encourage thought about how
e-Infrastructure elements and e-Infrastructure
itself should be considered
• In terms of how different elements may have
commonalities and differences across types of
element, user communities, etc.
• Want to begin a discussion about these issues
• Potential questions:
– Are the axes meaningful?
– What factors correlate with them?
– Are there any clusters?
• We are eager to receive feedback
Reference
Katz, D.S. and Proctor, D.
“A Framework for Discussing
e-Research Infrastructure Sustainability”
Journal of Open Research Software
2(1):e13, 2014.
DOI: 10.5334/jors.av


Editor's Notes

  • #18: Sustainability Models
    – Open source: a leader (or a set of leaders) promotes a goal of creating an infrastructure element in a public manner and a community voluntarily forms to work together on this goal.
    – Closed partnership: a set of partners works together to create an infrastructure element, but the partnership is not open to external contributions.
    – For profit: a group creates an infrastructure element using its own resources with the goal of later selling, leasing, or licensing the element or its design to recover the expended resource and make a profit.
    – Dual licensing: a group creates an infrastructure element using its own resources with the goal of allowing academic free use (and depending on the license, perhaps gaining further free contributions from that academic community), while also selling, leasing, or licensing the element or its design to industry in order to recover the expended resource and perhaps make a profit, or at least, break even. This model also often has an implicit goal of not allowing other companies to financially profit directly from the element.
    – Open source and paid support: a group supports an open source element in exchange for resources from the users of that support. The support can include helping the users with the existing element, and adding features to the element for the supported users, though these added features become available to all users, not just those who have paid for support.
    – Foundation or government: one or more groups convince an organization that promotes public advancement that creating an infrastructure element will be a public good that should be supported.