Automating Research Data Workflows
Vas Vasiliadis
vas@uchicago.edu
CHPC National Conference
December 5, 2019
Data replication/migration/distribution
• For backup: initiated by the user or by an automated system backup
• Automated transfer of data from science instrument
• Staging of reference data subset from repository
[Diagram: recurring transfers with sync option; copy/ingest, daily @ 3:30am]
Staging data with compute jobs
• Stage data in or out as part of the job
• Transfer task is submitted when the job is run
– Endpoint may not be currently activated
• Alternative approaches
1. User adds directives to job submission script
2. Application manages data staging on user’s behalf
Application driven automation
• Application (e.g. portal, science gateway) submits a
transfer of compute results as the user
• Application monitors transfer, and initiates additional
processing and/or backup of data
Relevant Platform Capabilities
Globus Auth: Native apps
• Client that cannot keep a secret, e.g.:
– Command line, desktop apps
– Mobile apps
– Jupyter notebooks
• Native app is registered with Globus Auth
– Not a confidential client
• Native App Grant is used
– Variation on the Authorization Code Grant
• Globus SDK:
– To get tokens: NativeAppAuthClient
– To use tokens: AccessTokenAuthorizer
Native App grant
[Diagram: flow between the Browser, the Native App (Client), Globus Auth (Authorization Server), and the App/Service (Resource Server)]
1. Run application
2. URL to authenticate
3. Authenticate and consent
4. Auth code
5. Register auth code
6. Exchange code
7. Access tokens
8. Authenticate with access tokens to invoke transfer service as user
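Steps 1 and 2 of the flow above can be sketched with the standard library alone. This is a minimal illustration, not the Globus SDK's implementation: in practice `NativeAppAuthClient` builds the URL and exchanges the code for you. The endpoint URL, redirect URI, scope string, and the use of PKCE (RFC 7636) as the "variation" on the Authorization Code Grant are assumptions to verify against the Globus Auth documentation; `pkce_challenge`, `make_pkce_pair`, and `build_authorize_url` are hypothetical helper names.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Assumed values: verify against the Globus Auth documentation before use.
AUTHORIZE_URL = "https://auth.globus.org/v2/oauth2/authorize"
REDIRECT_URI = "https://auth.globus.org/v2/web/auth-code"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def pkce_challenge(verifier):
    """Derive the S256 code challenge from a code verifier (RFC 7636)."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair():
    """Return a fresh (verifier, challenge) pair; the verifier stays in the app."""
    verifier = secrets.token_urlsafe(64)
    return verifier, pkce_challenge(verifier)


def build_authorize_url(client_id, challenge, state, refresh_tokens=False):
    """Build the URL the user opens in a browser to authenticate and consent."""
    params = {
        "client_id": client_id,
        "redirect_uri": REDIRECT_URI,
        "scope": TRANSFER_SCOPE,
        "state": state,
        "response_type": "code",
        "code_challenge": challenge,
        "code_challenge_method": "S256",
        # "offline" asks for refresh tokens in addition to access tokens.
        "access_type": "offline" if refresh_tokens else "online",
    }
    return AUTHORIZE_URL + "?" + urlencode(params)
```

The app prints the URL, the user pastes back the auth code, and the app sends the code together with the original verifier when exchanging it for tokens. No client secret is involved, which is why this suits clients that cannot keep one.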
Refresh tokens
• Common use cases
– Portal checking transfer status when user is not logged in
– Running command line app from script
• Refresh tokens issued to client, in particular scope
• Client uses refresh token to get access token
– Confidential client: client_id and client_secret required
– Native app: client_secret not required
• Refresh token good for 6 months after last use
• Consent rescindment revokes all tokens
Refresh tokens
[Diagram: flow between the Browser, the Native App (Client), Globus Auth (Authorization Server), and the App/Service (Resource Server)]
1. Run application
2. URL to authenticate
3. Authenticate and consent
4. Auth code
5. Register auth code
6. Exchange code, request refresh tokens
7. Access tokens and refresh tokens
8. Store refresh tokens
9. Exchange refresh token for new access tokens
10. Access tokens
11. Authenticate with access tokens to invoke service as user
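Steps 8 and 9 (store the refresh token, then decide when a new access token is needed) might look like the sketch below. It uses only the standard library and does not call Globus Auth; the actual exchange would go through the SDK. The helper names and the JSON file layout are hypothetical. The `expires_at_seconds` key mirrors the field the SDK reports for each token, but treat that as an assumption and check the token response you actually receive.

```python
import json
import os
import time


def save_tokens(path, tokens):
    """Write the token dict to a file that is created with mode 0600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(tokens, f)


def load_tokens(path):
    """Return the stored token dict, or None if nothing has been saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None


def needs_refresh(tokens, now=None, margin_seconds=60):
    """True when the access token is missing or expires within the margin."""
    if not tokens or "access_token" not in tokens:
        return True
    now = time.time() if now is None else now
    return tokens.get("expires_at_seconds", 0) - now <= margin_seconds
```

A script would call `load_tokens`, use the stored refresh token to obtain a new access token whenever `needs_refresh` returns True, and call `save_tokens` again afterwards.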
Native App/Refresh Tokens Sample Code
github.com/globus/native-app-examples
• example_copy_paste.py
– User copies and pastes code to the app
• example_copy_paste_refresh_token.py
– Stores refresh token locally, uses it to get new access tokens
• See README for installation
Run them on your EC2 instance in ~globus/native-app-examples
Automation via the Globus CLI
Globus Command Line Interface (CLI)
• Native application: docs.globus.org/cli/installation
• Open source, uses Python SDK
– globus login – get access/refresh tokens (~/.globus.cfg)
– globus logout – delete tokens
• Service (Transfer/Auth) invocation uses tokens
• Getting help: globus --help, globus list-commands
docs.globus.org/cli/examples
Available on your EC2 instance (log in as user “globus”)
UUIDs everywhere
• UUIDs for endpoint, task, user identity, groups…
• Use search/list options
• get-identities for identity username to UUID
$ globus endpoint search 'Globus Tutorial'
$ globus task list
$ globus get-identities demodoc@globusid.org
df191fb8-ac4d-42b6-966e-a89f07a63dc0
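Because most CLI arguments expect UUIDs, a script can first check whether a value already is one and only look it up otherwise. The sketch below is illustrative; `is_uuid` and `get_identities_command` are hypothetical helpers, and the second one only builds the argument list for the `globus get-identities` call shown above.

```python
import uuid


def is_uuid(value):
    """Return True if value is a UUID string in canonical hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def get_identities_command(username):
    """Argument list for resolving an identity username to its UUID."""
    return ["globus", "get-identities", username]
```

For example, `is_uuid("demodoc@globusid.org")` is False, so the script would run the lookup command and use the UUID it prints.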
Batch Transfers
• Transfer tasks have one source/destination, but can have
any number of files
• Provide input source-dest pairs via local file
• e.g. move files listed in files.txt from $ep1 to $ep2
$ ep1=ddb59aef-6d04-11e5-ba46-22000b92c6ec
$ ep2=ddb59af0-6d04-11e5-ba46-22000b92c6ec
$ globus transfer $ep1:/share/godata/ $ep2:/~/ --batch --label 'CLI Batch' < files.txt
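A file such as files.txt can be generated rather than written by hand. The sketch below assumes the batch input format is one source and destination path per line, optionally preceded by `--recursive`, with shell-style quoting for paths that contain spaces; check `globus transfer --help` for the format your CLI version accepts. `batch_line` and `render_batch` are hypothetical names.

```python
import shlex


def batch_line(source, dest, recursive=False):
    """Format one batch input line, quoting paths that need it."""
    parts = ["--recursive"] if recursive else []
    parts += [shlex.quote(source), shlex.quote(dest)]
    return " ".join(parts)


def render_batch(pairs):
    """Return the full batch file text for a list of (source, dest) pairs."""
    return "".join(batch_line(src, dst) + "\n" for src, dst in pairs)


if __name__ == "__main__":
    # Paths are relative to the endpoint paths given on the command line.
    pairs = [("file1.txt", "file1.txt"), ("my data.csv", "archive/my data.csv")]
    with open("files.txt", "w") as f:
        f.write(render_batch(pairs))
```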
Useful submission options
• Safe resubmissions
– Applies to all tasks (transfer and delete)
– Get a task UUID and use it in submission
– $ globus task generate-submission-id
– Use --submission-id option in transfer command
• Task wait
– Useful for scripting conditional on transfer task status
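The two options combine naturally in a script: generate one submission ID, retry the identical submission until the CLI accepts it, then wait on the task. The sketch below only builds and runs CLI commands named on this slide; the helper names, the retry count, and the `--timeout` value are illustrative assumptions. The runner is passed in so the retry logic can be exercised without the CLI installed.

```python
import subprocess


def new_submission_id():
    """Ask the CLI for a submission ID that can be reused across retries."""
    done = subprocess.run(
        ["globus", "task", "generate-submission-id"],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()


def transfer_command(source, dest, submission_id, label=None):
    """Argument list for a transfer tied to a fixed submission ID."""
    cmd = ["globus", "transfer", source, dest, "--submission-id", submission_id]
    if label:
        cmd += ["--label", label]
    return cmd


def wait_command(task_id, timeout_seconds=3600):
    """Argument list that blocks until the task ends or the timeout passes."""
    return ["globus", "task", "wait", task_id, "--timeout", str(timeout_seconds)]


def run_cli(cmd):
    """Run a command and return its exit status."""
    return subprocess.run(cmd).returncode


def submit_with_retry(cmd, run=run_cli, attempts=3):
    """Re-run the same command until it exits 0; return the attempt number."""
    for attempt in range(1, attempts + 1):
        if run(cmd) == 0:
            return attempt
    raise RuntimeError(f"submission failed after {attempts} attempts")
```

Because every retry carries the same submission ID, a submission that reached the service but whose response was lost is not queued a second time.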
Parsing CLI output
• Default output is text; for JSON output use --format json
$ globus endpoint search --filter-scope my-endpoints
$ globus endpoint search --filter-scope my-endpoints --format json
• Extract specific attributes using --jmespath <expression>
$ globus endpoint search --filter-scope my-endpoints --jmespath 'DATA[].[id, display_name]'
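When the JSON output is consumed by a Python script instead, the same projection as the JMESPath expression above can be done with the `json` module. This sketch assumes the output is an object with a `DATA` list whose items carry `id` and `display_name` keys, as that expression implies; `id_and_name` is a hypothetical helper.

```python
import json


def id_and_name(json_text):
    """Return [[id, display_name], ...] for each entry under DATA."""
    doc = json.loads(json_text)
    return [[ep.get("id"), ep.get("display_name")] for ep in doc.get("DATA", [])]


if __name__ == "__main__":
    # Made-up sample standing in for `globus endpoint search ... --format json`.
    sample = '{"DATA": [{"id": "0000-aaaa", "display_name": "My laptop", "owner_id": "x"}]}'
    for endpoint_id, name in id_and_name(sample):
        print(endpoint_id, name)
```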
Managing notifications
• Turn off emails sent for tasks
• Useful when an application manages tasks for a user
• Disable notifications with the --notify option
--notify off (all notifications)
--notify succeeded|failed|inactive (select notifications)
Permission management
• Set and manage permissions on shared endpoint
• Requires access manager role
$ share=<shared_endpoint_UUID>
$ globus endpoint permission create --permissions r --identity demodoc@globusid.org $share:/NCARTest/
$ globus endpoint permission list $share
$ globus endpoint permission delete $share <perm_UUID>
Example: Recurring transfers
• Submit CLI transfer(s) via task manager/cron
• Useful for periodic sync, backup
• Interactions are as user: both for data access and to
invoke Globus services
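As a rough illustration of the cron approach, the sketch below renders a crontab entry that runs a recursive sync transfer at a fixed time each day. `--recursive`, `--sync-level`, and `--label` are options of `globus transfer`; the helper names, the label, and the choice of the `checksum` sync level are assumptions made for the example.

```python
import shlex


def sync_command(source, dest, label="Nightly sync", sync_level="checksum"):
    """Argument list for a recursive transfer that only copies changed files."""
    return [
        "globus", "transfer", "--recursive",
        "--sync-level", sync_level,
        "--label", label,
        source, dest,
    ]


def crontab_line(minute, hour, argv):
    """Render a crontab entry that runs argv every day at hour:minute."""
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError("minute must be 0-59 and hour 0-23")
    command = " ".join(shlex.quote(arg) for arg in argv)
    return f"{minute} {hour} * * * {command}"


if __name__ == "__main__":
    src = "ddb59aef-6d04-11e5-ba46-22000b92c6ec:/share/godata/"
    dst = "ddb59af0-6d04-11e5-ba46-22000b92c6ec:/~/backup/"
    print(crontab_line(30, 3, sync_command(src, dst)))
```

Cron runs with a minimal environment, so the entry may need the full path to the `globus` executable, and the user must already have run `globus login` so that stored tokens are available.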
Example: Job submission data staging
• CLI installed on head node
• User runs globus login; tokens stored in user’s
home directory
• Tokens accessible when job runs and submits stage
in/out tasks
• Use --skip-activation-check when submitting task
– Task accepted even if endpoint is not activated at submit time
– Task held until endpoint is activated
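A job wrapper could stage data in, run the computation, and stage results out with a helper along these lines. This is a sketch under assumptions: it relies on the `--skip-activation-check` flag from this slide (which may not exist in every CLI version), on `--jmespath 'task_id' --format unix` printing just the task ID, and on `globus task wait` returning once the task has finished. `run_cli` and `stage` are hypothetical names.

```python
import subprocess


def run_cli(argv):
    """Run a Globus CLI command and return its stdout; raise if it fails."""
    done = subprocess.run(argv, capture_output=True, text=True, check=True)
    return done.stdout


def stage(source, dest, label, run=run_cli):
    """Submit a recursive transfer, wait for it, and return the task ID."""
    task_id = run([
        "globus", "transfer", source, dest,
        "--recursive", "--label", label,
        # Accept the task even if an endpoint is not activated at submit time.
        "--skip-activation-check",
        "--jmespath", "task_id", "--format", "unix",
    ]).strip()
    run(["globus", "task", "wait", task_id])
    return task_id
```

A job script would call `stage(...)` once before the compute step for stage-in and once after it for stage-out, using the tokens that `globus login` stored in the user's home directory.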
Example: Automation with portals
• Portal needs to act as the user
• User grants “offline” access to the portal
– Portal gets and stores refresh tokens for each user
– Uses client id/secret + refresh tokens to get new access tokens
– Portal maintains state about transfers being managed (task id)
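The state a portal keeps about managed transfers can be very small. The sketch below uses an SQLite table keyed by task ID; the schema, the function names, and the decision to treat `SUCCEEDED` and `FAILED` as final states are illustrative assumptions, and refresh-token storage is deliberately left out.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS transfers (
    task_id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    status  TEXT NOT NULL DEFAULT 'ACTIVE'
)
"""


def open_store(path=":memory:"):
    """Open (and if needed create) the portal's transfer-tracking table."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def record_task(db, user_id, task_id):
    """Remember a transfer the portal submitted on a user's behalf."""
    with db:
        db.execute(
            "INSERT INTO transfers (task_id, user_id) VALUES (?, ?)",
            (task_id, user_id),
        )


def update_status(db, task_id, status):
    """Store the latest status reported for a task."""
    with db:
        db.execute(
            "UPDATE transfers SET status = ? WHERE task_id = ?",
            (status, task_id),
        )


def tasks_to_poll(db):
    """Return (user_id, task_id) rows for tasks not yet in a final state."""
    rows = db.execute(
        "SELECT user_id, task_id FROM transfers "
        "WHERE status NOT IN ('SUCCEEDED', 'FAILED') ORDER BY task_id"
    )
    return rows.fetchall()
```

A background worker would loop over `tasks_to_poll`, obtain a fresh access token for each user from that user's stored refresh token, query the task status, and call `update_status`.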
Automation Examples
• Syncing a directory
– bash script; calls the Globus CLI
– Python module; run as script or import as module
• Staging data in a shared directory
– bash and Python variants
• Removing directories after files are transferred
– Python script
github.com/globus/automation-examples
Walkthrough
• Sample script that uses sync option to transfer files
github.com/globus/automation-examples/blob/master/cli-sync.sh
• Same task, via an app that uses Python SDK
github.com/globus/automation-examples/blob/master/globus_folder_sync.py
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline
