Team Argon
“A Commons Platform for Promoting Continuous FAIRness”
NIH Data Commons Pilot
Globus, University of Chicago
University of Southern California
Contact: Ian Foster, foster@uchicago.edu
PIs: Kyle Chard, Ian Foster, Carl Kesselman, Ravi Madduri
Three big picture themes
• Continuous FAIRness: Make all data findable, accessible,
interoperable, reusable at every stage, via pervasive use of
simple identifier and exchange format conventions
• Build on proven security, data, and computation building
blocks that have large user communities inside and outside
biomedicine (see subsequent slides for details)
• Solutions leverage industry best practices and professional
services team to meet scalability, interoperability,
sustainability, and reliability needs
Globus Auth: A foundational service for an
authentication and authorization ecosystem
[Diagram labels: App (Client), Service (Resource Server), Resource Owner, Resource server operator]
• A flexible security infrastructure that can be used across the Commons
• Enables federation across services using arbitrary linked identities (e.g., @gmail
@xsede @uchicago)
• Facilitates secure/authorized communication between users, services, clients
• Supports arbitrary clients including REST, web, command line, software
• Flexible token management
• Secure sharing between services
• Fine-grained user consents and revocation
https://docs.globus.org/api/auth/
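The idea of fine-grained consents and revocation can be illustrated with a toy model. This is only a sketch under stated assumptions: the `ConsentStore` class, the client name, and the scope strings are hypothetical and are not part of the Globus Auth API, which is an OAuth2/OIDC service reached over REST.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentStore:
    """Toy in-memory record of which scopes a user has granted to which client."""

    # Maps (user, client) to the set of scope strings the user has consented to.
    grants: dict[tuple[str, str], set[str]] = field(default_factory=dict)

    def grant(self, user: str, client: str, scope: str) -> None:
        self.grants.setdefault((user, client), set()).add(scope)

    def revoke(self, user: str, client: str, scope: str) -> None:
        # Revoking a scope that was never granted is a no-op.
        self.grants.get((user, client), set()).discard(scope)

    def is_allowed(self, user: str, client: str, scope: str) -> bool:
        return scope in self.grants.get((user, client), set())


# Hypothetical identity, client, and scope names, for illustration only.
store = ConsentStore()
store.grant("alice@uchicago.edu", "workspace-app", "transfer:read")
store.grant("alice@uchicago.edu", "workspace-app", "search:query")
store.revoke("alice@uchicago.edu", "workspace-app", "search:query")
```

Consent is tracked per user, per client, and per scope, so revoking one scope leaves the others in place.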
Standards-based, reliable, performant data management
• Globus Connect Server: S3-compatible HTTP/OAuth interface for secure client-server transfer
• Endpoints have DNS names
• Globus Transfer: Managed, high-performance, secure, reliable bulk asynchronous transfer
• In-place data sharing with flexible and secure ACLs
• Standards-compliant: S3, OAuth, OIDC, HTTP, GridFTP
https://docs.globus.org/api/transfer/
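One ingredient of reliable transfer is to verify a checksum after each attempt and retry on mismatch. The sketch below shows that pattern in isolation. It is not how Globus Transfer is implemented, and `transfer_with_verification` and the simulated source are hypothetical names used only for illustration.

```python
import hashlib
from typing import Callable, Tuple


def transfer_with_verification(
    read_source: Callable[[], bytes],
    expected_sha256: str,
    max_attempts: int = 3,
) -> Tuple[bytes, int]:
    """Read from the source until the SHA-256 digest matches, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        data = read_source()
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data, attempt
    raise IOError(f"checksum mismatch after {max_attempts} attempts")


# Simulated source: the first read is truncated, the second is complete.
payload = b"variant calls, chunk 1\n"
expected = hashlib.sha256(payload).hexdigest()
reads = iter([payload[:-5], payload])

data, attempts = transfer_with_verification(lambda: next(reads), expected)
```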
Interoperability: naming and exchange
Minid
• Lightweight identifiers for any product
at any stage
• Easily created, dereferenced, validated
• Global integrity – validate content
across the commons
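The create, dereference, and validate operations can be sketched with a small in-memory registry. This is a toy model under stated assumptions, not the Minid service or its client: the `IdentifierRegistry` class, the `example-id:` identifier format, and the sample location are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    checksum: str                 # SHA-256 hex digest of the content
    locations: tuple[str, ...]    # where the content can be retrieved
    title: str                    # simple descriptive metadata


class IdentifierRegistry:
    """Toy registry mapping lightweight identifiers to checksums and locations."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def create(self, content: bytes, locations: list[str], title: str) -> str:
        # Placeholder identifier format; a real service would mint a persistent ID.
        identifier = f"example-id:{len(self._records) + 1:04d}"
        digest = hashlib.sha256(content).hexdigest()
        self._records[identifier] = Record(digest, tuple(locations), title)
        return identifier

    def dereference(self, identifier: str) -> Record:
        return self._records[identifier]

    def validate(self, identifier: str, content: bytes) -> bool:
        # Content is valid if its digest matches the one recorded at creation.
        return hashlib.sha256(content).hexdigest() == self._records[identifier].checksum


registry = IdentifierRegistry()
content = b"sample,expression\nS1,0.42\n"
ident = registry.create(content, ["https://example.org/results.csv"], "Expression table")
record = registry.dereference(ident)
```

Because the checksum travels with the identifier, any party holding a copy of the content can check it against the registered record.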
BDBag
• Self-describing and flexible format
for exchange
• Extended BagIt Specification
• Standard manifest representation
that supports different protocols
[Figure: a bag containing Data and Metadata; its manifest lists File1 (2AG230..), File2 (A31FDC.., FTP), File3 (D0F142.., HTTP), …; bags are labeled Minid 001, Minid 007, Minid 719]
http://minid.bd2k.org
http://bd2k.ini.usc.edu/tools/bdbag/
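A manifest that lists checksums alongside remote locations can be sketched in the style of BagIt's `manifest-sha256.txt` and `fetch.txt` files. The example below is a minimal illustration using only the Python standard library. It does not use the BDBag tools, and the function names, file paths, and URL are assumptions made for the example.

```python
import hashlib


def manifest_text(checksums: dict[str, str]) -> str:
    """Render 'checksum path' lines in the style of a BagIt manifest-sha256.txt."""
    return "".join(f"{digest}  {path}\n" for path, digest in sorted(checksums.items()))


def fetch_text(remote: dict[str, tuple[str, int]]) -> str:
    """Render 'url length path' lines in the style of a BagIt fetch.txt."""
    return "".join(
        f"{url} {length} {path}\n" for path, (url, length) in sorted(remote.items())
    )


def missing_or_corrupt(manifest: str, local: dict[str, bytes]) -> dict[str, str]:
    """Compare local payload bytes against the manifest and report problems by path."""
    problems = {}
    for line in manifest.splitlines():
        digest, path = line.split(maxsplit=1)
        if path not in local:
            problems[path] = "missing"
        elif hashlib.sha256(local[path]).hexdigest() != digest:
            problems[path] = "checksum mismatch"
    return problems


# One file is present locally; the other is referenced by URL only.
local = {"data/file1.txt": b"sample one\n"}
remote_bytes = b"sample two\n"
checksums = {
    "data/file1.txt": hashlib.sha256(local["data/file1.txt"]).hexdigest(),
    "data/file2.txt": hashlib.sha256(remote_bytes).hexdigest(),
}
manifest = manifest_text(checksums)
fetch = fetch_text({"data/file2.txt": ("https://example.org/file2.txt", len(remote_bytes))})

before = missing_or_corrupt(manifest, local)
local["data/file2.txt"] = remote_bytes  # simulate retrieving the remote file
after = missing_or_corrupt(manifest, local)
```

The manifest covers every payload file, including those not yet downloaded, so a bag can be exchanged by reference and validated once its contents are fetched.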
Workspaces: Scalable compute for distributed data
• Workspaces bring together data and tools
• Infrastructure designed for scalability and portability
• Leverages
  • Federated identities & access control
  • Secure access to distributed data
  • Data interoperability, exchange
• Provenance
  • Tracking activity around data
  • By whom? With what?
• Publication & sharing of tools and workflows
• Cost-aware resource allocation for both compute and data movement
[Diagram labels: Infrastructure, My Workspace, Data, Tools]
Search, navigation, and virtual cohorts
• DERIVA: Digital asset management for heterogeneous data
• Organize, navigate, discover interrelated objects (e.g., assays from a sample over time)
• REST interface
• Entity/Relation model for organizing data
• Supports various DCPPC metadata models
• Fine-grained access control to support diverse
collaboration models
• Model evolution to enable continuous
publication and diverse, heterogeneous use cases
• Model-driven user interface that
self-configures to current data model
• Integration with Globus Auth, Minids, BDBags, and other components
• Complements Globus Search: Access-controlled search of derived data products
Integrating scenario
[Figure: Workspace Manager panel with columns Bags (minid_1, minid_2, minid_3), Workspaces (Galaxy, Jupyter, RStudio), and Pipelines (GTExRNA, GATKVar); sources labeled UCSC, GTEx, TOPMed, MOD, each with User catalogs; stages labeled Discover, Search, Analyze, Visualize, Publish & Reproduce]
• Uniform, secure, reliable access to storage
• Virtual cohorts in standard manifest with lightweight ID
• Uniform search across multiple data sources
• All results tracked via standard manifest and lightweight IDs
• Workspaces support Jupyter and Galaxy on different clouds
• Publication assigns DOIs and indexes datasets
Summary: Reusable components include...
• Globus Connect Server for data access, transfer, and sharing
  • HTTP/S3 access to many storage systems (POSIX, object store, etc.)
  • GridFTP for managed, reliable, secure, efficient transfers
  • Integration with Globus Auth for authentication and authorization
  • Offers: A universal storage API
• Globus Auth for securing all REST API interactions
  • OAuth2 and OIDC + fine-grained consents and revocation
  • Offers: A universal authentication and authorization API
• BDBag (“big data bags”: profiles on BagIt) tools
  • BagIt specification with profiles for “holey bags”, etc.
  • Offers: Common manifest for exchange of query results, virtual cohorts
• Identifier service for creating lightweight identifiers
  • ARKs, created on demand, associated checksum, simple metadata
  • Offers: Common mechanism for naming and tracking derived data products
