Thoughts on Cybersecurity
informed by large international
science & the Open Science Grid
Frank Würthwein
OSG Executive Director
UCSD/SDSC
Let’s reset your perception first
Jensen Huang keynote @ SC19
The Largest Cloud Simulation in History
50k NVIDIA GPUs in the Cloud
350 Petaflops for 2 hours
Distributed across US, Europe & Asia
Saturday morning before SC19 we bought all GPU capacity that was for sale in
Amazon Web Services, Microsoft Azure, and Google Cloud Platform worldwide
Science with 51,000 GPUs
achieved as peak performance
[Figure: GPUs in use versus time in minutes. Each color is a different cloud region in the US, EU, or Asia; 28 regions in use in total.]
Peaked at 51,500 GPUs
~380 Petaflops of fp32
I can purchase one hour of 300 PFLOPS (fp32) in the cloud for $15k today,
and nobody asks me any questions about cybersecurity.
• Nothing about my nationality or visa or …
• Nothing about two-factor authentication or my software
• Everything is wide open on the internet
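As a rough check on the numbers quoted above, the average fp32 throughput per GPU and the price per PFLOPS-hour follow directly from the slide's own figures. This is only back-of-the-envelope arithmetic; the variable names are illustrative, and the inputs are the approximate values from the slides.

```python
# Back-of-the-envelope arithmetic using only the approximate figures quoted above.
PEAK_GPUS = 51_500               # peak number of GPUs in use
PEAK_PFLOPS_FP32 = 380           # approximate peak fp32 throughput, in PFLOPS
PRICE_USD = 15_000               # quoted price for one hour at 300 PFLOPS fp32
PURCHASED_PFLOPS_HOURS = 300     # 300 PFLOPS sustained for one hour

# Average fp32 throughput per GPU at peak, in TFLOPS (1 PFLOPS = 1000 TFLOPS).
tflops_per_gpu = PEAK_PFLOPS_FP32 * 1000 / PEAK_GPUS

# Price per PFLOPS-hour of fp32, in USD.
usd_per_pflops_hour = PRICE_USD / PURCHASED_PFLOPS_HOURS

print(f"{tflops_per_gpu:.1f} TFLOPS fp32 per GPU on average")
print(f"${usd_per_pflops_hour:.0f} per PFLOPS-hour of fp32")
```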
Should cybersecurity requirements imposed on open academic research executed at on-prem resources be adjusted to the realities of executing the same research on cloud resources?
Science is an International Team Sport
Science is a Team Sport
The ATLAS Collaboration
∼200 institutions across ∼40 countries
Cybersecurity enabling Science
• Humanity has built extraordinary instruments by
pooling human and financial resources globally.
• To derive science from the data and simulations
for those instruments requires globally
integrated Cyberinfrastructure.
• Cybersecurity is enabling this science.
  - Policy framework
  - Operational security
  - Infrastructure software
Disk space use per site by CMS
XENON Collaboration as
a “Midscale” Instrument Example
XENON1T Storage & Processing
Challenge
• Experiment in Gran Sasso, Italy
• Tape Archive in Sweden
• Disk storage in 7 locations across Holland, Italy, Israel, France, USA
  - Petabyte of data divided into 20k datasets
• Compute sites on EGI, OSG, and NSF HPC allocation
OSG took on the integration challenge via “embedded” technical support.
XENON1T Globally
Integrated Infrastructure
[Map of sites: NIKHEF (Amsterdam), SURFsara (Amsterdam), Comet (XD allocation), IN2P3 (Lyon), Weizmann (Tel Aviv)]
OSG integrates HPC allocations, contributions from collaborators,
and opportunistic capacity into a single platform to do science on.
Resource Federation
OSG Compute Federation
OSG federates ~200 clusters worldwide.
Owners determine policy of use.
Many allow opportunistic use of spare capacity.
> 2 billion CPU core hours per year
Federation Principle
• Any provider can bring their resources to the
table.
• Truth in advertising:
  - Resource providers accurately specify (some) details about the resource.
• Any consumer can decide which of the available resources they are willing to use.
OSG matches consumers to providers globally
following policies expressed locally.
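The federation principle can be sketched as two-sided matching: a provider advertises attributes and keeps its own acceptance policy, a consumer describes itself and decides which resources it is willing to use, and a match needs both sides to agree. The sketch below is a minimal illustration only, not OSG's actual matchmaking software; the class names, attribute keys, and the site and collaboration names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Attributes = Dict[str, object]


@dataclass
class Provider:
    name: str
    advertised: Attributes                  # "truth in advertising"
    accepts: Callable[[Attributes], bool]   # local policy, set by the owner


@dataclass
class Consumer:
    name: str
    described: Attributes
    willing_to_use: Callable[[Attributes], bool]  # consumer's own choice


def match(consumer: Consumer, providers: List[Provider]) -> List[str]:
    """Return providers where both the owner's policy and the consumer agree."""
    return [
        p.name
        for p in providers
        if p.accepts(consumer.described) and consumer.willing_to_use(p.advertised)
    ]


providers = [
    # Hypothetical site that allows opportunistic use by anyone.
    Provider("campus-cluster", {"gpus": 0, "region": "US"},
             accepts=lambda c: True),
    # Hypothetical site reserved for members of one collaboration.
    Provider("collab-site", {"gpus": 4, "region": "EU"},
             accepts=lambda c: c.get("collaboration") == "collab-a"),
]

consumer = Consumer("analysis-job", {"collaboration": "collab-b"},
                    willing_to_use=lambda r: r.get("region") in {"US", "EU"})

matched = match(consumer, providers)
print(matched)  # ['campus-cluster']
```

The matching function itself holds no policy: each predicate is written by the party it belongs to, and the matcher only evaluates them.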
“NETFLIX” for Open Science
• NETFLIX operates a CDN, providing streaming access to searchable curated data from anywhere at any time to any subscriber.
• For open science, the CDN needs to (in addition) be federated.
  - Anybody can share their data from their locally owned data origin into the CDN.
  - Data access is mediated via caches in the network and at endpoints, to minimize requirements on origins and thereby maximally stimulate sharing.
  - Performance of data access is determined by the location and performance of the closest cache rather than the data’s origin.
  - Locally defined and managed groups of users share data securely with each other globally. Data access is global.
Locally defined policies are enforced globally by the CDN
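The role of the caches can be illustrated with a small read-through sketch: a client reads from its closest cache, and only a cache miss reaches the origin, so repeated reads place no further load on the data owner. This is a toy model under stated assumptions, not the production caching middleware; the `Origin` and `Cache` classes, the `distance` field (as seen from a single client), and the path are hypothetical.

```python
from typing import Dict, List


class Origin:
    """A locally owned data origin; counts how often it is actually read."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects
        self.reads = 0

    def fetch(self, path: str) -> bytes:
        self.reads += 1
        return self._objects[path]


class Cache:
    """A read-through cache that contacts the origin only on a miss."""

    def __init__(self, name: str, distance: int, origin: Origin) -> None:
        self.name = name
        self.distance = distance  # hypothetical distance from one client
        self._origin = origin
        self._store: Dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path not in self._store:
            self._store[path] = self._origin.fetch(path)
        return self._store[path]


def read(path: str, caches: List[Cache]) -> bytes:
    """Serve the request from the closest cache."""
    nearest = min(caches, key=lambda c: c.distance)
    return nearest.get(path)


origin = Origin({"/demo/file.dat": b"payload"})
caches = [Cache("far", 90, origin), Cache("near", 5, origin)]

first = read("/demo/file.dat", caches)
second = read("/demo/file.dat", caches)
print(origin.reads)  # 1: the second read was served by the cache
```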
The OSG Data Federation
[Map: current StashCache infrastructure (US); labeled sites include GaTech]
We operate a production “prototype” of such a CDN
Two Challenges to think about
Authz: Person vs Capability
• Operations teams are a mix of “permanent” staff and transients.
  - E.g. CERN pays for “Operators” funded via “authorship fees”.
• Delegating a person’s identity to a computing
activity in order to authenticate the activity at a
remote server makes little sense.
• Delegating a capability to a computing activity
in order to authenticate it at a remote server
makes a lot of sense.
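The difference between delegating an identity and delegating a capability can be made concrete with a small token sketch: the token names no person, only what may be done, where, and until when, and the remote server checks just that. This is a minimal illustration using a shared-secret HMAC, and it is not presented as the mechanism used by OSG; the function names, claim fields, and secret are hypothetical, and a real deployment would more likely rely on public-key signed tokens.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-only-shared-secret"  # hypothetical; never hard-code real keys


def issue_capability(scope: str, path_prefix: str, lifetime_s: int,
                     now: Optional[float] = None) -> str:
    """Issue a token that carries a capability, not a person's identity."""
    now = time.time() if now is None else now
    claims = {"scope": scope, "path": path_prefix, "exp": now + lifetime_s}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def authorize(token: str, action: str, path: str,
              now: Optional[float] = None) -> bool:
    """Check signature, action, path prefix, and expiry of a capability token."""
    now = time.time() if now is None else now
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return False
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body.encode()))
    return (claims["scope"] == action
            and path.startswith(claims["path"])
            and now < claims["exp"])


token = issue_capability("read", "/demo/", lifetime_s=600, now=1000.0)
print(authorize(token, "read", "/demo/file.dat", now=1100.0))   # True
print(authorize(token, "write", "/demo/file.dat", now=1100.0))  # False
print(authorize(token, "read", "/demo/file.dat", now=2000.0))   # False (expired)
```

Because the token is scoped and short-lived, handing it to a computing activity exposes far less than handing over a person's credentials would.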
Division of Responsibility
• To maximize the capacity provided we need
to minimize the effort required to provide it.
• The services required for the CDN and/or compute federations are specialized and non-trivial.
  - Large learning curve to achieve low cost operations.
Service Operations is most (cost) effective
when separated from hardware operations
Network Cache Ops Model
• OSG supports the researchers using the Data Federation.
• OSG deploys & operates the caching middleware.
• PRP, TNRP, I2, Regionals, … are responsible for network performance.
• Hardware owners operate the hardware, install the OS, and join K8S for container orchestration.
Science Applications
Data Federation Services
Network Performance
Hardware & OS
A layered approach to distributed DevOps Responsibility
Cybersecurity Issues (I)
• Hardware owners only provide hardware
  - Deploy OS and Kubernetes.
• Service Operators (I)
  - A team that operates the K8S cluster.
• Service Operators (II)
  - A team that deploys and operates the CDN service as containers inside (and generally across multiple) K8S clusters.
• Software Operations
  - A team that provides the container images
How do you design a security model that supports this structure?
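One possible starting point, offered only as a sketch and not as an answer to the question above, is to write the division of responsibility down as data: each role gets an explicit set of actions, and any request outside that set is refused. The role and action names below are hypothetical labels for the four parties listed on this slide.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical encoding of the four parties and what each one may do.
ROLE_ACTIONS: Dict[str, FrozenSet[str]] = {
    "hardware_owner": frozenset({"install_os", "deploy_kubernetes"}),
    "cluster_operator": frozenset({"operate_cluster"}),
    "service_operator": frozenset({"deploy_service", "operate_service"}),
    "software_operations": frozenset({"publish_image"}),
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_ACTIONS.get(role, frozenset())


def overlapping_actions() -> Set[str]:
    """Actions granted to more than one role, i.e. unclear responsibility."""
    seen: Set[str] = set()
    overlaps: Set[str] = set()
    for actions in ROLE_ACTIONS.values():
        overlaps |= seen & actions
        seen |= actions
    return overlaps


print(is_allowed("service_operator", "deploy_service"))  # True
print(is_allowed("service_operator", "install_os"))      # False
print(overlapping_actions())                             # set()
```

Such a table only captures who is meant to do what; it says nothing yet about how hardware owners in different institutions or countries come to trust the operators, which is the harder part of the question.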
Cybersecurity Issues (II)
• Container Security Model
• Security Model that allows hardware owners to give service responsibility to service operators.
  - Diverse requirements
  - Some institutions will want to operate their own K8S simply because of the level of control that implies.
  - Others won’t because of the level of effort it requires.
  - How do DOE and other National Labs fit into this?
  - How can a service provider in the US operate a service on hardware in the EU and Asia? Or vice versa?
  - What about India, Pakistan, China, Iran, … pick your favorite country …
  - How to deal with institutions that require US citizenship even for sudo access?
The set of issues and diversity of constraints seems endless
And now think back to the beginning: All of this is trivial in the cloud!!!
Summary & Conclusions
• Humanity has built extraordinary instruments by
pooling human and financial resources globally.
• To derive science from the data and simulations
for those instruments requires globally
integrated Cyberinfrastructure.
• Cybersecurity is enabling this science.
  - Policy framework
  - Operational security
  - Infrastructure software
Contact us at: help@opensciencegrid.org
Or me personally at: fkw@ucsd.edu
Acknowledgements
• This work was partially supported by NSF grants OAC-1941481, MPS-1148698, OAC-1841530, OAC-1904444, and OAC-1826967.
