Building with
Globus platform
services:
Search and Flows
Vas Vasiliadis
GlobusWorld - May 13, 2021
Data distribution/access
via data portal, science
gateway, data commons…
Emerging best practice:
Modern Research Data
Portal design pattern
Canonical data access/distribution use case
• Portal/science gateway to
distribute data
• Interface to search and
discover data of interest
• Asynchronous transfer to
user’s system or via
HTTPS (e.g. for catalogs)
• Fine-grained authorization
enforced on (meta)data
Search and request
data of interest
Transfer
data to
destination
Common solution components
• Guest collection for “staging” data
• Registered application (manages permissions)
• Search index with faceted queries
• Data transfer, to and from shared endpoint
Relevant Globus platform features
• Guest collection creation requires authentication
– Cannot be completely automated
– Must be a managed endpoint
• Roles for management of endpoint and tasks
– Access Manager role grants the right to manage permissions
– Granted to other users, groups or applications
Application registration
• Set desired scopes
• Set callback URL
• Get client ID and secret
• Consents implement the
principle of least privilege
developers.globus.org
Data sharing permissions management
• Permissions are set per folder, on a guest collection
• Permissions management can be automated
• For a user
– Identity: user must log in with this identity
– Email: user gets a code via email; link to their Globus Account
• For a group
– Group UUID: search for group to get UUID
– Access governed by membership in the group
• For an application
– Application identity: appclientid@clients.auth.globus.org
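The three principal kinds above map onto the access rule document that is posted to the Transfer service. A minimal sketch, assuming the Transfer API access document fields (`DATA_TYPE`, `principal_type`, `principal`, `path`, `permissions`); the UUID and folder below are hypothetical placeholders, and an application is assumed to be addressed as an identity whose ID is its client ID:

```python
import json


def access_rule(principal_type, principal, path, permissions="r"):
    """Build a JSON body for POST /endpoint/{endpoint_id}/access.

    Only the cases named on the slide are handled here: a single
    identity (user or application) or a group.
    """
    if principal_type not in ("identity", "group"):
        raise ValueError("principal_type must be 'identity' or 'group'")
    if permissions not in ("r", "rw"):
        raise ValueError("permissions must be 'r' or 'rw'")
    # Permissions are set per folder, so the path is a directory path.
    if not (path.startswith("/") and path.endswith("/")):
        raise ValueError("path must start and end with '/'")
    return {
        "DATA_TYPE": "access",
        "principal_type": principal_type,
        "principal": principal,
        "path": path,
        "permissions": permissions,
    }


# Hypothetical group UUID and folder, for illustration only.
rule = access_rule("group", "00000000-0000-0000-0000-000000000000", "/staging/")
print(json.dumps(rule, indent=2))
```

The helper only builds the document; sending it requires a Transfer access token held by a principal with the Access Manager role.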
Application concepts
• Custom application that can automatically manage
permissions
– Can use Globus CLI
• Confidential apps: use client id and secret
– Ensure application is on a secure device
– Set up policy for rotation of secret (limited life tokens)
Client credential grant
1. Application, science gateway, or data portal (client) authenticates
to Globus Auth (authorization server) with app client id and secret
2. Globus Auth returns access tokens
3. Client authenticates as app to Globus Transfer (resource server)
with access tokens (to manage permissions)
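Step 1 of the grant can be sketched with the standard library alone. This assumes the Globus Auth token endpoint `https://auth.globus.org/v2/oauth2/token` and the Transfer scope string shown below; the client ID and secret are placeholders, and the request is only built here, not sent:

```python
import base64
import urllib.parse
import urllib.request

# Assumed endpoint and scope; check both against the Globus Auth docs.
TOKEN_URL = "https://auth.globus.org/v2/oauth2/token"
TRANSFER_SCOPE = "urn:globus:auth:scope:transfer.api.globus.org:all"


def build_token_request(client_id, client_secret, scope=TRANSFER_SCOPE):
    """Step 1: authenticate with the app client id and secret."""
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": scope}
    ).encode("ascii")
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def bearer_header(access_token):
    """Step 3: authenticate as the app with the access token."""
    return {"Authorization": f"Bearer {access_token}"}


# Placeholder credentials; real ones come from app registration.
request = build_token_request("my-client-id", "my-client-secret")
```

Passing `request` to `urllib.request.urlopen` would perform step 2 and return the token response as JSON. The Globus Python SDK wraps the same grant, which is usually the more convenient route in a real portal.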
Data description and discovery
• (Meta)data store with fine-grained visibility controls
• Schema agnostic → dynamic schemas
• Simple search using URL query parameters
• Complex search using search request document
docs.globus.org/api/search
github.com/globus/searchable-files-demo
Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "filetype",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["public"],
        "content": {
          "metadata-schema/file#type": "file"
        }
      },
      ...
    ]
  }
}
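The ingest document above can be assembled programmatically before it is posted. A minimal sketch that reproduces the slide's structure; the helper names `gmeta_entry` and `ingest_document` are illustrative and not part of any Globus library:

```python
import json


def gmeta_entry(subject, entry_id, content, visible_to=("public",)):
    """One entry in the GMetaList: metadata about a single subject."""
    return {
        "id": entry_id,
        "subject": subject,
        "visible_to": list(visible_to),
        "content": dict(content),
    }


def ingest_document(entries):
    """Wrap entries in the envelope expected by POST /index/{index_id}/ingest."""
    return {
        "ingest_type": "GMetaList",
        "ingest_data": {"gmeta": list(entries)},
    }


doc = ingest_document(
    [
        gmeta_entry(
            "https://search.api.globus.org/abc.txt",
            "filetype",
            {"metadata-schema/file#type": "file"},
        )
    ]
)
print(json.dumps(doc, indent=2))
```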
Data ingest with Globus Search
POST /index/{index_id}/ingest
{
  "ingest_type": "GMetaList",
  "ingest_data": {
    "gmeta": [
      {
        "id": "size",
        "subject": "https://search.api.globus.org/abc.txt",
        "visible_to": ["urn:globus:auth:identity:46bd0f56-e24f-11e5-a510-131bef46955c"],
        "content": {
          "metadata-schema/file#size": "1000000",
          "metadata-schema/file#size_human": "1MB"
        }
      },
      ...
    ]
  }
}
Visibility limited to Globus Auth identity
- Single user
- Globus Group
- Registered client application
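The `visible_to` values are principal URNs. A small sketch for building them, assuming the `urn:globus:auth:identity:` prefix shown on the slide and the `urn:globus:groups:id:` prefix for groups; a registered client application is assumed to use the identity form with its client ID:

```python
import uuid


def identity_urn(identity_id):
    """URN for a single user or a registered client application."""
    # uuid.UUID raises ValueError on malformed input.
    return f"urn:globus:auth:identity:{uuid.UUID(identity_id)}"


def group_urn(group_id):
    """URN for a Globus Group (prefix is an assumption, see lead-in)."""
    return f"urn:globus:groups:id:{uuid.UUID(group_id)}"


visible_to = [identity_urn("46bd0f56-e24f-11e5-a510-131bef46955c")]
print(visible_to)
```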
Data discovery with Globus Search
Simple query
GET /index/{index_id}/search?q=type%3Ahdf5
{
  "@datatype": "GSearchResult",
  "@version": "2017-09-01",
  "count": 1,
  "gmeta": [
    {
      "@datatype": "GMetaResult",
      "@version": "2019-08-27",
      "entries": [
        { ... }
      ],
      "subject": "https://..."
    }
  ],
  "offset": 0,
  "total": 1
}
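The simple query and its GSearchResult response can be handled with a few lines of standard-library Python. This sketch assumes the service base URL `https://search.api.globus.org/v1`; the index ID and the sample result are placeholders:

```python
from urllib.parse import urlencode

# Assumed base URL for the Globus Search API.
SEARCH_BASE = "https://search.api.globus.org/v1"


def simple_search_url(index_id, q):
    """Build the GET URL for a simple query; the query string is URL-encoded."""
    return f"{SEARCH_BASE}/index/{index_id}/search?{urlencode({'q': q})}"


def subjects(result):
    """List the subject of every GMetaResult in a GSearchResult document."""
    return [item["subject"] for item in result.get("gmeta", [])]


# Hypothetical index ID; "type:hdf5" is the query from the slide.
url = simple_search_url("my-index-id", "type:hdf5")
print(url)

# Hypothetical response with the same shape as the slide's example.
sample = {"count": 1, "gmeta": [{"subject": "https://example.org/a.h5", "entries": []}]}
print(subjects(sample))
```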
Data discovery with Globus Search
Complex query
POST /index/{index_id}/search
{
  "filters": [
    {
      "type": "range",
      "field_name": "pubdate",
      "values": [
        {
          "from": "*",
          "to": "2020-12-31"
        }
      ]
    }
  ],
  "facets": [
    {
      "name": "Publication Date",
      "field_name": "pubdate",
      ...
    }
  ]
}
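The filter portion of the search request document is easy to generate. A minimal sketch covering only the range filter shown on the slide; the facet definition is elided there, so facets are left out, and the optional `q` argument is an assumption about how a free-text query would be combined with filters:

```python
import json


def range_filter(field_name, start="*", end="*"):
    """Range filter on one field; '*' leaves that side of the range open."""
    return {
        "type": "range",
        "field_name": field_name,
        "values": [{"from": start, "to": end}],
    }


def search_request(filters, q=None):
    """Body for POST /index/{index_id}/search."""
    request = {"filters": list(filters)}
    if q is not None:
        request["q"] = q
    return request


# Everything published on or before 2020-12-31, as on the slide.
body = search_request([range_filter("pubdate", end="2020-12-31")])
print(json.dumps(body, indent=2))
```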
Data Access and Sharing
• Set guest collection access rule
  POST /endpoint/{endpoint_id}/access (Transfer service)
• Check authenticated user’s Group membership
  GET /groups/my_groups (Groups service)
• Submit Transfer task
  POST /transfer (Transfer service)
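Two of these calls can be sketched without any network access: checking a `GET /groups/my_groups` response for a group, and building the body for `POST /transfer`. The field names follow the Transfer API transfer document as best understood here and should be checked against the API docs; all IDs and paths are hypothetical, and the submission ID is assumed to come from a prior `GET /submission_id` call:

```python
def is_member(my_groups, group_id):
    """True if group_id appears in a GET /groups/my_groups response.

    Assumes the response is a JSON array of group objects with an "id" key.
    """
    return any(group.get("id") == group_id for group in my_groups)


def transfer_document(submission_id, source_endpoint, destination_endpoint, items):
    """Body for POST /transfer; items are (source, destination, recursive)."""
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }


# Hypothetical values for illustration only.
groups = [{"id": "group-1", "name": "Portal Users"}]
task = transfer_document(
    "submission-1", "source-ep", "dest-ep", [("/data/run1/", "/staging/run1/", True)]
)
print(is_member(groups, "group-1"), len(task["DATA"]))
```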
Jupyter notebook demonstration
github.com/globus/globus-jupyter-notebooks
(Metadata_Search_and_Discovery.ipynb)
Automation at scale using
Globus Flows
Multiple ways to “automate” data management
• Scripts using the CLI (+ cron?)
• Globus Timer service → scheduled/recurring transfers
• Your own code calling the Globus APIs (ugh!)
• Globus Flows service!
– Flows comprise Actions
– Actions execute against an Action Provider service endpoint
– Extend by using the Action Provider Toolkit
action-provider-tools.readthedocs.io/en/latest
Let’s deploy and run a
simple flow…
Start → Initiate a Globus transfer task to move data to a guest
collection → Add an access rule allowing a Group to access the
data → End
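The two-step flow can be written down as a flow definition, shown here as a Python dict. This is a sketch under loud assumptions: the state names are invented, and the action provider URLs and parameter names are not taken from the slides and must be verified against the Flows documentation before use. The small `walk` helper only checks that the states chain from start to end:

```python
# ASSUMPTIONS: action URLs and parameter names below are illustrative;
# verify them against the current Globus Flows documentation.
FLOW = {
    "StartAt": "TransferData",
    "States": {
        "TransferData": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/transfer",
            "Parameters": {
                "source_endpoint_id.$": "$.input.source_endpoint_id",
                "destination_endpoint_id.$": "$.input.guest_collection_id",
                "transfer_items": [
                    {
                        "source_path.$": "$.input.source_path",
                        "destination_path.$": "$.input.destination_path",
                        "recursive": True,
                    }
                ],
            },
            "ResultPath": "$.TransferResult",
            "Next": "GrantGroupAccess",
        },
        "GrantGroupAccess": {
            "Type": "Action",
            "ActionUrl": "https://actions.globus.org/transfer/set_permission",
            "Parameters": {
                "operation": "CREATE",
                "endpoint_id.$": "$.input.guest_collection_id",
                "path.$": "$.input.destination_path",
                "principal_type": "group",
                "principal.$": "$.input.group_id",
                "permissions": "r",
            },
            "ResultPath": "$.PermissionResult",
            "End": True,
        },
    },
}


def walk(flow):
    """Follow Next links from StartAt and return the state names in order."""
    order, seen, name = [], set(), flow["StartAt"]
    while True:
        if name in seen:
            raise ValueError(f"cycle at state {name!r}")
        seen.add(name)
        order.append(name)
        state = flow["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]


print(walk(FLOW))
```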
Jupyter notebook demonstration
github.com/globus/globus-jupyter-notebooks
(Automation_Using_Globus_Flows.ipynb)
Resources
• Globus API documentation: docs.globus.org/api
• Helpdesk and issue escalation: support@globus.org
• Mailing list: discuss@globus.org
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: app.globus.org
• Create a personal endpoint: app.globus.org/file-manager/gcp
• Documentation: docs.globus.org
• Engage: discuss@globus.org
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globus

GlobusWorld 2021 Tutorial: Building with the Globus Platform
