Vas Vasiliadis
vas@uchicago.edu
February 27, 2024
Introduction to Research Automation
What do we mean by research “automation”?
Executing research tasks* reliably, at scale, with minimal (or no) human intervention when required.
*data management and computation
Stepping into automation using Globus
• Level 1: Use the web app; it’s manual, but it may be more automated than your current process :)
• Level 2: Semi-automated, recurring tasks
• Level 3: Automation using the Globus CLI
• Level 4: Automation using Globus Flows
• Level 5: “Lights-out” automation using Globus Flows with event triggers
A simple, and very common, use case
• Transfer: Transfer data to a system for sharing
• Share: Set access controls for sharing data
We’ll use this to demonstrate 5 levels of automation
Level 1: Point-and-click using the web app
Example: Ad hoc sharing of results with a collaborator
Transfer → Share
Level 2: Timer + manual setting of permissions
Example: Scheduled backup set to read-only access
Transfer → Share
Timers
• Scheduled and/or recurring file transfers
• Support all transfer and sync options
hpc.nih.gov/docs/globus/globus_cron.php#cron
Level 3: Parameterized CLI script
Example: On-demand data distribution from analysis
Transfer → Share
Globus Command Line Interface
• Automation of simple data management tasks
• Integration with existing scripts (job submission …)
• Open source, uses the Python SDK
$ globus
Usage: globus [OPTIONS] COMMAND [ARGS]...

  Interact with Globus from the command line

  All `globus` subcommands support `--help` documentation.

  Use `globus login` to get started!

  The documentation is also online at https://docs.globus.org/cli/

Options:
  -v, --verbose                  Control level of output
  -h, --help                     Show this message and exit.
  -F, --format [unix|json|text]  Output format for stdout. Defaults to text
  --jmespath, --jq TEXT          A JMESPath expression to apply to json
                                 output. Forces the format to be json
                                 processed by this expression
  --map-http-status TEXT         Map HTTP statuses to any of these exit
                                 codes: 0,1,50-99. e.g. "404=50,403=51"

Commands:
  api               Make API calls to Globus services
  bookmark          Manage endpoint bookmarks
  cli-profile-list  List all CLI profiles which have been used
  collection        Manage your Collections
  delete            Submit a delete task (asynchronous)
  endpoint          Manage Globus endpoint definitions
  flows             Interact with the Globus Flows service
Transfer and share CLI commands
$ globus transfer \
> --recursive \
> source_collection_uuid:source_path \
> guest_collection_uuid:destination_path
Message: The transfer has been accepted and a task has been created and queued for execution
Task ID: f5eb855c-4098-11ee-8ba2-2197ca2bfedc
$ globus endpoint permission create \
> --group $group_uuid \
> --permissions $permissions \
> guest_collection_uuid:destination_path
Granting group, ............., read access to the destination directory
Message: Access rule created successfully.
Rule ID: 7fe723a4-413b-11ee-88f9-03dc0e0dcc45
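A script that chains these two commands usually needs the task and rule IDs back, for example to wait on the transfer or to remove the access rule later. The following is a minimal Python sketch that pulls the IDs out of the text output shown above; the function name `extract_ids` is hypothetical, and the `-F json` option listed in the help output may be a sturdier choice for real scripts.

```python
import re

# Matches the "Task ID: <uuid>" and "Rule ID: <uuid>" lines in the CLI text output.
ID_LINE = re.compile(r"^(Task|Rule) ID:\s*([0-9a-fA-F-]{36})\s*$", re.MULTILINE)


def extract_ids(cli_output: str) -> dict:
    """Return e.g. {"task_id": "..."} for every ID line found in the output."""
    return {f"{kind.lower()}_id": value for kind, value in ID_LINE.findall(cli_output)}


# Sample text copied from the output shown on the slide.
sample = """Message: The transfer has been accepted and a task has been created and
queued for execution
Task ID: f5eb855c-4098-11ee-8ba2-2197ca2bfedc
Message: Access rule created successfully.
Rule ID: 7fe723a4-413b-11ee-88f9-03dc0e0dcc45
"""

ids = extract_ids(sample)
print(ids)
```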
Exercise: Run script using the Globus CLI
• Log into your instance
• Go to the ~/globus-tutorials directory
• Run the transfer_share.sh script
$ ./transfer_share.sh \
> --source-collection a6f165fa-aee2-4fe5-95f3-97429c28bf82 \
> --source-path /cli \
> --guest-collection fe2feb64-4ac0-4a40-ba90-94b99d06dd2c \
> --sharing-path /rpi/YOUR_NAME \
> --group-id 50b6a29c-63ac-11e4-8062-22000ab68755
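The contents of transfer_share.sh are not shown in the deck, so the following is only a hypothetical Python equivalent of a parameterized wrapper. It accepts the same options as the exercise and builds the two CLI commands from the previous slide as a dry run. The `--permissions` option and its default of `r` are assumptions added for illustration.

```python
import argparse
import shlex


def parse_args(argv):
    """Parse the same options the exercise passes to transfer_share.sh."""
    p = argparse.ArgumentParser(description="Transfer, then share (dry-run sketch)")
    p.add_argument("--source-collection", required=True)
    p.add_argument("--source-path", required=True)
    p.add_argument("--guest-collection", required=True)
    p.add_argument("--sharing-path", required=True)
    p.add_argument("--group-id", required=True)
    p.add_argument("--permissions", default="r")  # assumption: read-only by default
    return p.parse_args(argv)


def build_commands(args):
    """Return the transfer and permission commands as argv lists (nothing is run)."""
    dest = f"{args.guest_collection}:{args.sharing_path}"
    transfer = ["globus", "transfer", "--recursive",
                f"{args.source_collection}:{args.source_path}", dest]
    share = ["globus", "endpoint", "permission", "create",
             "--group", args.group_id, "--permissions", args.permissions, dest]
    return [transfer, share]


args = parse_args([
    "--source-collection", "a6f165fa-aee2-4fe5-95f3-97429c28bf82",
    "--source-path", "/cli",
    "--guest-collection", "fe2feb64-4ac0-4a40-ba90-94b99d06dd2c",
    "--sharing-path", "/rpi/YOUR_NAME",
    "--group-id", "50b6a29c-63ac-11e4-8062-22000ab68755",
])
commands = build_commands(args)
for cmd in commands:
    print(shlex.join(cmd))  # print instead of executing
```

To execute instead of printing, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where `globus login` has already been completed.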
Level 4: Using a Globus Flow
Example: Moving data from instrument to campus cluster for analysis
Transfer → Share
Level 4: Automation with Globus Flows
• Flows Service: A platform for managed, secure, reliable task orchestration
• Flows comprise Actions → invoke Globus services; extensible to support your own services
• Run via web app, CLI, API, event-based triggers*
Common tasks in most instrument scenarios
1. Transfer: Transfer raw images to HPC cluster (:transfer Action Provider)
2. Share: Set access controls to allow analysis (:set_permission Action Provider)
Both steps run as Actions in the Flows Service.
Flow lifecycle: Write once, run many
• Define using JSON/YAML
• Deploy to Flows service
• Set access policy for visibility and execution
• Run (debug) and monitor
• …and run again!
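Before the deploy step, a JSON definition can be sanity-checked locally. The sketch below is not part of the Flows service or its tooling. It assumes the `StartAt` / `States` / `Next` / `End` layout used in flow definitions and only handles simple linear flows (states that branch would need more rules). The function name `check_flow` and the stripped-down two-state definition are hypothetical.

```python
import json


def check_flow(text: str) -> list:
    """Return a list of problems found in a flow definition given as JSON text."""
    try:
        flow = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg} (line {err.lineno})"]
    if not isinstance(flow, dict):
        return ["definition must be a JSON object"]
    states = flow.get("States", {})
    problems = []
    if flow.get("StartAt") not in states:
        problems.append("StartAt does not name a defined state")
    for name, state in states.items():
        nxt = state.get("Next")
        if nxt is not None and nxt not in states:
            problems.append(f"{name}: Next points to unknown state {nxt}")
        if nxt is None and not state.get("End"):
            problems.append(f"{name}: needs either Next or End")
    return problems


# Stripped-down definition: only the fields this checker looks at.
good = """{
  "StartAt": "TransferFiles",
  "States": {
    "TransferFiles": {"Type": "Action", "Next": "SetPermission"},
    "SetPermission": {"Type": "Action", "End": true}
  }
}"""
bad = good.replace("true", "True")  # Python-style boolean is not valid JSON

print(check_flow(good))
print(check_flow(bad))
```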
Flow definition
"StartAt": "TransferFiles",
"States": {
  "TransferFiles": {
    "Comment": "Transfer to a guest collection",
    "Type": "Action",
    "ActionUrl": "https://actions.automate.globus.org/transfer/transfer",
    "Parameters": {
      "source_endpoint_id.$": "$.input.source.id",
      "destination_endpoint_id.$": "$.input.destination.id",
      "transfer_items": [
        {
          "source_path.$": "$.input.source.path",
          "destination_path.$": "$.input.destination.path",
          "recursive.$": "$.input.recursive_tx"
        }
      ]
    },
    "ResultPath": "$.TransferFiles",
    "WaitTime": 60,
    "Next": "SetPermission"
  },
  "SetPermission": {
    .....
    "End": true
  }
}
Callouts: Action (Type), Action Provider URL (ActionUrl), Action inputs (Parameters), Timeout in seconds (WaitTime), Next state (Next)
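Parameter keys ending in `.$` take their values from the flow's run-time state rather than from the definition itself. The following Python sketch illustrates that substitution for the `Parameters` block above. It is an illustration only, not the Flows service's implementation, and it assumes paths are plain dotted references such as `$.input.source.id`. The helper names and the placeholder collection IDs and paths are hypothetical.

```python
def lookup(path: str, state: dict):
    """Follow a simple "$.a.b.c" path through nested dicts."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = state
    for key in path[2:].split("."):
        value = value[key]
    return value


def resolve(params, state):
    """Replace every "name.$": "<path>" entry with "name": <value found at path>."""
    if isinstance(params, dict):
        out = {}
        for key, val in params.items():
            if key.endswith(".$"):
                out[key[:-2]] = lookup(val, state)
            else:
                out[key] = resolve(val, state)
        return out
    if isinstance(params, list):
        return [resolve(item, state) for item in params]
    return params


# Parameters copied from the flow definition above.
parameters = {
    "source_endpoint_id.$": "$.input.source.id",
    "destination_endpoint_id.$": "$.input.destination.id",
    "transfer_items": [{
        "source_path.$": "$.input.source.path",
        "destination_path.$": "$.input.destination.path",
        "recursive.$": "$.input.recursive_tx",
    }],
}

# Placeholder run input (values are made up for illustration).
run_state = {"input": {
    "source": {"id": "SOURCE_COLLECTION_UUID", "path": "/raw/"},
    "destination": {"id": "GUEST_COLLECTION_UUID", "path": "/shared/"},
    "recursive_tx": True,
}}

resolved = resolve(parameters, run_state)
print(resolved)
```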
Flow input schema
{
....
"properties": {
"input": {
"type": "object",
"required": [
"source",
"destination",
"recursive_tx"
],
"properties": {
"source": {
"type": "object",
"title": "Select source collection and path",
"description": "Source collection/path (MUST end with '/')",
"format": "globus-collection",
"required": [
"id",
"path"
],
"transfer_label": {
"type": "string",
"title": "Label for Transfer Task",
"pattern": "^[a-zA-Z0-9-_, ]+$",
"maxLength": 128
}
....
Required inputs
Custom schema
Input type
Input type
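The constraints visible in this excerpt can be checked by hand before starting a run. The following Python sketch mirrors only what the excerpt shows: the required keys, the trailing-slash note in the source description, and the label pattern and length. It is not a full JSON Schema validator. Placing `transfer_label` directly under `input` is an assumption, since the excerpt elides the surrounding structure, and the function name `validate_input` is hypothetical.

```python
import re

# Pattern and length limit copied from the schema excerpt.
LABEL_PATTERN = re.compile(r"^[a-zA-Z0-9-_, ]+$")
LABEL_MAX_LENGTH = 128


def validate_input(doc: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    body = doc.get("input")
    if not isinstance(body, dict):
        return ["missing 'input' object"]
    errors = []
    for key in ("source", "destination", "recursive_tx"):
        if key not in body:
            errors.append(f"input.{key} is required")
    source = body.get("source")
    if isinstance(source, dict):
        for key in ("id", "path"):
            if key not in source:
                errors.append(f"input.source.{key} is required")
        if "path" in source and not str(source["path"]).endswith("/"):
            errors.append("input.source.path must end with '/'")
    label = body.get("transfer_label")  # assumption: label sits under "input"
    if label is not None:
        if len(label) > LABEL_MAX_LENGTH:
            errors.append("input.transfer_label is longer than 128 characters")
        if not LABEL_PATTERN.match(label):
            errors.append("input.transfer_label contains unsupported characters")
    return errors


ok = {"input": {
    "source": {"id": "SOURCE_COLLECTION_UUID", "path": "/raw/"},
    "destination": {"id": "GUEST_COLLECTION_UUID", "path": "/shared/"},
    "recursive_tx": True,
    "transfer_label": "Tutorial run, attempt 1",
}}
print(validate_input(ok))
```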
We give you a head start
Run a flow
app.globus.org/flows
(make sure you’re a member of the “Tutorial Users” group)
Transfer → Share
Exercise: Run Globus Flow using the web app
• Find “Tutorial - Transfer and Share” in flows library
• Click “Start”
• Confirm the source and destination collections
• Change the name of target path: /rpi/YOUR_NAME
• Enter a label for the flow run
• Click “Start Run”
• Monitor flow progress on the “Event log” tab
Level 5: Triggering flows automatically
Transfer → Share
Simulating an instrument flow
[Diagram labels: EC2 instance (“Instrument”); monitor script; Globus Connect Server (shown twice); ALCF Eagle; transfer control. Steps: 0 trigger flow run, 1 transfer files, 2 set permissions, then access data and run analysis.]
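The monitor script itself is not shown in the deck. As a rough sketch of the idea, the Python below watches a directory and calls a trigger for each new file. The function names are hypothetical, and in real use the `trigger` callback would start a flow run rather than append to a list.

```python
import os
import tempfile


def find_new_files(directory, seen):
    """Return names of files not seen on earlier scans, and remember them."""
    current = {entry.name for entry in os.scandir(directory) if entry.is_file()}
    new = sorted(current - seen)
    seen.update(new)
    return new


def watch_once(directory, seen, trigger):
    """Run one scan and call trigger(path) for every newly arrived file."""
    for name in find_new_files(directory, seen):
        trigger(os.path.join(directory, name))


# Demo: a temporary directory stands in for the instrument's output folder.
triggered = []
with tempfile.TemporaryDirectory() as data_dir:
    seen = set()
    watch_once(data_dir, seen, triggered.append)   # empty directory, nothing fires
    open(os.path.join(data_dir, "image_001.tif"), "w").close()
    watch_once(data_dir, seen, triggered.append)   # new file fires the trigger
    watch_once(data_dir, seen, triggered.append)   # same file does not fire again

names = [os.path.basename(path) for path in triggered]
print(names)
```

A long-running monitor would call `watch_once` in a loop with a sleep between scans, or use a file-system event library instead of polling.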
Illustrating the possible…
A more interesting scenario: cryoEM
Globus Flows
Carbon! Correct, classify, …
• Transfer: Transfer raw files
• Compute: Launch analysis job
• Share: Set access controls
• Transfer: Move final files to repo
End-to-end Automation: Serial Crystallography
Globus Flows
Stages: Data capture, Image processing, Data publication
Carbon! Check threshold
• Transfer: Transfer raw files
• Transfer: Move results to repo
• Compute: Analyze images
• Compute: Visualize
• Compute: Gather metadata
• Share: Set access controls
• Compute: Launch QA job
• Search: Ingest to index
Extending the Flows ecosystem
Extending the ecosystem: Action providers
• Action Provider is a service endpoint
– Run
– Status
– Cancel
– Release
– Resume
• Action Provider Toolkit: action-provider-tools.readthedocs.io
[Diagram labels: compute, ACLs, delete, identifier, transfer, notify, ingest, mkdir, search, ls, Xtract, describe, web form, custom developed]
docs.globus.org/api/flows/hosted-action-providers
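To make the five operations concrete, here is a toy in-memory class with one method per operation. It is not the Action Provider Toolkit API and does not speak HTTP. The class name, the method signatures, and the status strings are all illustrative assumptions.

```python
import uuid


class ToyActionProvider:
    """In-memory stand-in showing the shape of run/status/cancel/release/resume."""

    def __init__(self):
        self._actions = {}

    def run(self, body: dict) -> str:
        """Start an action and return its ID."""
        action_id = str(uuid.uuid4())
        self._actions[action_id] = {"status": "ACTIVE", "body": body}
        return action_id

    def status(self, action_id: str) -> str:
        """Report the current status of an action."""
        return self._actions[action_id]["status"]

    def cancel(self, action_id: str) -> str:
        """Stop an action that has not finished yet."""
        if self.status(action_id) in ("ACTIVE", "INACTIVE"):
            self._actions[action_id]["status"] = "FAILED"
        return self.status(action_id)

    def resume(self, action_id: str) -> str:
        """Continue an action that was paused (marked INACTIVE)."""
        if self.status(action_id) == "INACTIVE":
            self._actions[action_id]["status"] = "ACTIVE"
        return self.status(action_id)

    def release(self, action_id: str) -> None:
        """Forget a finished action; refuse while it is still in progress."""
        if self.status(action_id) in ("ACTIVE", "INACTIVE"):
            raise RuntimeError("action is still in progress")
        del self._actions[action_id]


provider = ToyActionProvider()
action_id = provider.run({"message": "hello"})
print(provider.status(action_id))
print(provider.cancel(action_id))
provider.release(action_id)
```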
Support resources
• Flows service in web app: app.globus.org/flows
• Flows documentation: docs.globus.org/api/flows
• Helpdesk: support@globus.org
• Customer engagement team can advise on flows
• Professional services team can help build flows
