Tapis-Globus Integration
Joe Stubbs
(jstubbs@tacc.utexas.edu)
Globus World
May 11, 2021
Tapis Project
● 5-year, NSF-funded computing framework supporting multi-site computational
research
● Used to manage data and execute code on HPC, HTC and cloud systems
(>51K researcher accounts, 23 tenants & 15 gateways 2021-2022)
● Agentless, SSH-based communication with storage/compute systems
● Implemented as microservices with REST interfaces
● Users obtain a token by authenticating to Tapis using OAuth2
○ Subsequent API calls are authenticated using the token
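As a rough illustration of the token-based pattern above, the sketch below builds (but does not send) a request that carries a previously obtained token. The base URL, the `/v3` prefix, and the `X-Tapis-Token` header name are assumptions made for this example and do not appear in the slides; check the live docs before relying on them.

```python
import urllib.request

# Hypothetical values for illustration only; neither appears in the slides.
BASE_URL = "https://tapis.example.org/v3"
TOKEN_HEADER = "X-Tapis-Token"  # assumed header name; confirm against the live docs


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries the user's token."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        headers={TOKEN_HEADER: token, "Accept": "application/json"},
        method="GET",
    )


# The token would come from the OAuth2 authentication step; this is a placeholder.
req = build_request("/systems", token="example-token")
print(req.get_method(), req.full_url)
```

The same token is attached to every later call, so the user authenticates once rather than per request.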
https://tapis-project.org
Tapis Higher-Level Objectives
Tapis Gives Researchers The Ability to...
● Track your analysis provenance: Tapis records your input and output data, along
with the application used and its settings, so you know what you did in every run.
● Reproduce your analysis: Tapis records all your inputs, outputs, and parameters, so
you can re-run an analysis.
● Share your data, workflows/applications, and computational resources with
collaborators or your lab: Tapis enables sharing with access controls for all your
data, resources, and applications within Tapis.
All without having to install or support a complicated stack of technology.
Who Is Using Tapis?
Science Gateways
● CyVerse
● DesignSafe
● VDJServer
● Synergistic Discovery and Design (SD2)
● 3D Electron Microscopy (3DEM) Platform
● iMicrobe
● ʻIke Wai
Labs/Projects
● Planet Texas 2050
● Hawaii Data Science Institute
● iReceptor+
● C-MAIKI
● ECCO
● GenApp
● Acute to Chronic Pain Signatures (A2CPS)
Institutions
● TACC
● CDC
● UH
● NIH
● Compute Canada
Additional collaborations starting
soon...
Science Gateways Across Various Domains
Tapis Services
Tenancy, Authentication and Security
● Tenants
● Sites
● Tokens
● Authenticator
● Security Kernel
● *Postits
Data Management and Code Execution
● Systems
● Files
● Apps
● Jobs
Streaming Data, Events and Functions
● Functions (Actors)
● *Notifications
● Streams
Metadata Management
● Meta
● PgREST
https://tapis-project.github.io/live-docs
Data Management and Code Execution APIs
/systems /files /apps /jobs
● Register storage and compute systems
○ Systems have a type, such as Linux, s3, iRODS, etc.
● Ingest, move and transform data files and folders
● Register application containers on large systems
● Launch jobs to invoke applications and capture metadata about the workflow
(Diagram labels: HPC, HTC, Cloud)
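One way to picture how the four APIs fit together is as an ordered plan of calls: register a system, stage data, register an app, launch a job. The sketch below only builds that plan in memory. The four paths come from the slide, but the HTTP methods and every payload field (`id`, `systemId`, `path`, `appId`, `execSystemId`) are hypothetical placeholders, not the actual Tapis request schemas.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PlannedCall:
    """One REST call in the workflow: HTTP method, resource path, JSON body."""
    method: str
    path: str
    body: Dict[str, Any] = field(default_factory=dict)


def plan_workflow(system_id: str, app_id: str, input_path: str) -> List[PlannedCall]:
    """Return the calls in the order the slide lists them (all fields hypothetical)."""
    return [
        # 1. Register a storage/compute system with a type such as Linux.
        PlannedCall("POST", "/systems", {"id": system_id, "systemType": "Linux"}),
        # 2. Ingest or move the input data onto that system.
        PlannedCall("POST", "/files", {"systemId": system_id, "path": input_path}),
        # 3. Register the application container.
        PlannedCall("POST", "/apps", {"id": app_id}),
        # 4. Launch a job that runs the app on the system.
        PlannedCall("POST", "/jobs", {"appId": app_id, "execSystemId": system_id}),
    ]


calls = plan_workflow("my-cluster", "my-app", "inputs/data.csv")
for call in calls:
    print(call.method, call.path, call.body)
```

Because the job request names both the app and the system, the service has what it needs to record which application ran where, which is the metadata the last bullet refers to.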
Globus Integration: Motivation
Globus supports a massive community:
● 20,000+ Globus endpoints; 200K+ users
● Reliable, high-performance transfer
● Support for many storage protocols via connectors, such as cloud APIs and archival
tape systems
Many Tapis users are already using Globus, but currently this requires out-of-band
actions that Tapis is unaware of, causing issues:
● Data provenance and history get broken
● Staging data as part of a job cannot be done through Globus with Tapis
Globus Integration: Design
Design:
● New Tapis systemType “Globus” (existing types: “s3”, “IRODS”, …)
● New Tapis endpoints that walk the user through the Globus OAuth flow
○ Tapis obtains and manages the access and refresh tokens
● New Tapis-Globus Proxy
○ Lightweight service that translates Tapis data transfer requests to Globus API requests
○ Written in Python to take advantage of the Globus Python SDK
● Modify Tapis data transfer agent to utilize Tapis-Globus Proxy
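The translation step performed by the proxy might look roughly like the sketch below, which maps a Tapis-style transfer request onto a Globus Transfer API submission document. The input shape (`source`, `destination`, `endpoint_id`, `path`, `recursive`, `tag`) is hypothetical and chosen for illustration. The output follows the Globus transfer document layout as I understand it (`DATA_TYPE`, `source_endpoint`, `destination_endpoint`, and a `DATA` list of `transfer_item` entries). This is not the actual proxy code.

```python
from typing import Any, Dict


def to_globus_transfer(tapis_request: Dict[str, Any], submission_id: str) -> Dict[str, Any]:
    """Translate a hypothetical Tapis transfer request into a Globus transfer document."""
    src = tapis_request["source"]
    dst = tapis_request["destination"]
    return {
        "DATA_TYPE": "transfer",
        # Globus issues a submission id before the task is submitted.
        "submission_id": submission_id,
        "label": tapis_request.get("tag", "tapis-transfer"),
        "source_endpoint": src["endpoint_id"],
        "destination_endpoint": dst["endpoint_id"],
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src["path"],
                "destination_path": dst["path"],
                # Directories need recursive=True; single files do not.
                "recursive": tapis_request.get("recursive", False),
            }
        ],
    }


# All identifiers below are placeholders.
request = {
    "source": {"endpoint_id": "src-endpoint-uuid", "path": "/data/run1/"},
    "destination": {"endpoint_id": "dst-endpoint-uuid", "path": "/scratch/run1/"},
    "recursive": True,
}
document = to_globus_transfer(request, submission_id="submission-uuid")
print(document["DATA"][0])
```

In a real service the Globus Python SDK would build and submit this document using the user's stored access token. Keeping the translation in a pure function makes it easy to test without network access.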
Future Work
● OAuth SSH
○ Short-lived tokens, obtained via OAuth flows, that can be used to SSH to systems at TACC
and perhaps other HPC centers in the future.
○ Both Tapis and Globus would be interested in this.
○ Collaboration with TACC Security and HPC groups.
● Support for JWT in Globus Auth Tokens for OAuth SSH
○ Would allow tokens to be validated without an extra API call
○ Modify OAuth SSH to allow configurable policies (JWT and MFA lifetimes, restricted identity
domains, etc.)
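To show why a JWT can be checked without an extra API call, the sketch below verifies a token's signature and expiry locally using only the standard library. It uses a shared-secret HS256 signature purely to stay self-contained. That is an assumption for illustration: a production deployment would more likely use an asymmetric algorithm and a maintained JWT library, and the claim names here are placeholders rather than anything Globus Auth defines.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: Dict[str, Any], key: bytes) -> str:
    """Issue a compact HS256 token (stands in for the identity provider)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def validate_token(token: str, key: bytes, now: Optional[float] = None) -> Dict[str, Any]:
    """Check signature and expiry locally; no call to the issuer is needed."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims


key = b"demo-secret"  # placeholder key for the example
token = sign_token({"sub": "researcher1", "exp": 2000}, key)
print(validate_token(token, key, now=1000)["sub"])
```

The `exp` check is where a configurable lifetime policy would plug in: a short-lived token fails validation on its own once it expires, with no revocation lookup required.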
Thank You
Links
GitHub: https://github.com/tapis-project/
Reference: https://tapis.readthedocs.io
OpenAPI “live” docs: https://tapis-project.github.io/live-docs/
Funding
The Tapis Framework: NSF Office of Advanced CyberInfrastructure #1931439 and #1931575
Contact
Joe Stubbs (jstubbs@tacc.utexas.edu)
