Globus Integrations
Vas Vasiliadis
vas@uchicago.edu
May 2, 2019
Enabling large-scale
data intensive science
with Jupyter
Andre Schleife, UIUC
16,000 CPU-hours per simulation
Sample
Experimental scattering
Material composition
Simulated structure
Simulated scattering
La 60%
Sr 40%
Evolutionary optimization
786,432 CPUs, 10 PFLOPS supercomputer
Argonne Leadership Computing Facility
MDF: Advanced materials research
Modeling stopping power
with time-dependent density
functional theory
@python_app
Logan Ward
Jupyter notebooks enable rapid iteration/results
But the data are big, distributed…
…and the science is collaborative
petrel.alcf.anl.gov
materialsdatafacility.org
2PB, 80Gbps store
3.2M materials data
Cooley: 290 TFLOPS
1. Query
2. Transfer
3. Learn
4. Share
Need multi-credential, multi-service authentication and data management
Hub
Configurable HTTP proxy
Authenticator
User DB
Spawner
Notebook
/api/auth
Browser
/hub/
/user/[name]/
• Multi-user hub
• Manages multiple instances
of Jupyter notebook server
• Configurable HTTP proxy
JupyterHub
Goal: Liberate the notebook!
• Tokens for remote services
• APIs for remote actions, e.g. data
management via Globus service
petrel.alcf.anl.gov
Securing JupyterHub with Globus Auth plugin
• Existing OAuth
framework
• Can restrict IdP
• Custom scopes
• Tokens passed into
notebook environment
github.com/jupyterhub/oauthenticator
github.com/jupyterhub/oauthenticator#globus-setup
Securing JupyterHub with Globus Auth
REST APIs
REST APIs
REST APIs
Bearer a45cd...
Hub
Configurable HTTP proxy
Authenticator
User DB
Spawner
Notebook
/api/auth
/hub/
/user/[name]/
login
Browser
{"tokens":...
{"tokens":...
Tokens in Jupyter notebooks
The world is your
oyster API…
• Globus Transfer
• Globus Search
• Your app
• Data portal
• Analysis engine
• …
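
Once tokens reach the notebook environment, each one can be turned into a Bearer header for the matching service. The sketch below is a minimal illustration, assuming the hub exports a base64-encoded JSON payload in an environment variable. The variable name `HUB_TOKEN_DATA`, the payload layout and the helper names are hypothetical; the actual variable and encoding depend on the authenticator version in use.

```python
import base64
import json
import os


def encode_token_payload(payload: dict) -> str:
    """Hub side (simulated here): serialize the token payload for an env var."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


def load_tokens(env_var: str = "HUB_TOKEN_DATA") -> dict:
    """Notebook side: decode the payload and return tokens keyed by service."""
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError(f"{env_var} is not set; was this server spawned by the hub?")
    payload = json.loads(base64.b64decode(raw))
    return payload["tokens"]


def bearer_header(tokens: dict, resource_server: str) -> dict:
    """Build the Authorization header for one remote service."""
    return {"Authorization": "Bearer " + tokens[resource_server]["access_token"]}


# Simulate what the hub would place in the notebook environment.
# "example-token" is a placeholder, not a real credential.
os.environ["HUB_TOKEN_DATA"] = encode_token_payload(
    {"tokens": {"transfer.api.globus.org": {"access_token": "example-token"}}}
)

tokens = load_tokens()
headers = bearer_header(tokens, "transfer.api.globus.org")
```

Keeping one token per service means a notebook only presents each credential to the service it was issued for.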
Ad hoc data analysis/results distribution
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir'
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
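
The mkdir call in the diagram can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's actual code: the base URL, the `/operation/endpoint/<id>/mkdir` path and the `DATA_TYPE` body field reflect my understanding of the Transfer REST API, and the endpoint ID and token are placeholders. The request is built but not sent.

```python
import json
import urllib.request

# Assumed base URL; the version segment mirrors the X-Transfer-API-Version header.
TRANSFER_BASE = "https://transfer.api.globusonline.org/v0.10"


def build_mkdir_request(endpoint_id: str, path: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a POST that creates a directory on an endpoint."""
    body = json.dumps({"DATA_TYPE": "mkdir", "path": path}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TRANSFER_BASE}/operation/endpoint/{endpoint_id}/mkdir",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder endpoint ID and token, for illustration only.
req = build_mkdir_request("ENDPOINT-ID-PLACEHOLDER", "/~/results", "example-token")

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, json.load(resp))
```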
Experiment with the demo notebook
• Log in to our JupyterHub*: jupyter.demo.globus.org
• Launch (spawn) a notebook server; get tokens
• Using the JupyterHub_Integration.ipynb notebook:
– Access Globus APIs; download some data
– “Analyze” data (generate plot)
– PUT results (graph) on an HTTPS endpoint
– Share the URL with others so they can access the results
*zero-to-jupyterhub.readthedocs.io
Leveraging the next
generation of services
Our (simplistic) data flow thus far…
• Adequate for ad hoc sharing (implicit knowledge)
• Broader access, reuse requires “formalization”
• Leverage additional Globus platform services
Notebook
Data
Repository
Bearer a45cd…
Dataset
Shared
endpoint
POST '/endpoint/a3c345f... /mkdir'
200 OK
...
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Analyze
Globus Search
• Scalable service → billions of entries
• Schema agnostic: use standard (e.g. DataCite) or custom
metadata
• Fine-grained access control: only returns results that are
visible to user
• Plain text search: ranked results
• Faceted search: facilitates data discovery
• Rich query language: ranges, expressions, regex, etc.
docs.globus.org/api/search
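
Two of the properties listed above, visibility-filtered results and faceted counts, can be illustrated with a toy in-memory index. This is not the Globus Search API: the `Entry` class, the `search` function and the sample records are hypothetical, and plain substring matching stands in for ranked text search.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Entry:
    subject: str
    content: dict
    # Principals allowed to see this entry; "public" means everyone.
    visible_to: set = field(default_factory=lambda: {"public"})


def visible_entries(entries, identities):
    """Keep only entries the caller's identities are allowed to see."""
    allowed = set(identities) | {"public"}
    return [e for e in entries if e.visible_to & allowed]


def search(entries, identities, text, facet_field):
    """Substring match over visible entries, plus counts per facet value."""
    hits = [
        e for e in visible_entries(entries, identities)
        if text.lower() in str(e.content).lower()
    ]
    facets = Counter(e.content[facet_field] for e in hits if facet_field in e.content)
    return hits, dict(facets)


index = [
    Entry("doc-1", {"title": "Perovskite scattering", "year": 2018}),
    Entry("doc-2", {"title": "Stopping power study", "year": 2019}, {"group:mdf"}),
    Entry("doc-3", {"title": "Scattering benchmarks", "year": 2019}, {"user:alice"}),
]

hits, facets = search(index, ["user:alice"], "scattering", "year")
anonymous_hits, _ = search(index, [], "scattering", "year")
```

The same query returns different results for different callers, which is the point of filtering on visibility before matching.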
Persistent identifiers
• Developing service for issuing persistent identifiers
– DOI, ARK, Handle, Globus
– e.g. https://identifiers.globus.org/doi:10.1145/2076450.2076468
• Within a namespace, e.g. your DataCite namespace
– Control which identities/groups can create identifiers
• Identifier attributes:
– Link to data: one or more https URLs, to file, folder or manifest
– Landing page: provided by service, or by user
– Visibility: identities, groups that can see identifier
– Checksum: of the file or manifest
– Metadata: as required by identifier (e.g., DataCite), extensible
– Replaces/replaced-by: for versioning
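
The identifier attributes above map naturally onto a small record type. The sketch below is a local data model only, assuming nothing about the identifier service's real API: `IdentifierRecord`, `new_version` and the `example:` identifiers are hypothetical names chosen for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IdentifierRecord:
    identifier: str
    locations: list                      # one or more https URLs
    landing_page: Optional[str] = None
    visible_to: list = field(default_factory=lambda: ["public"])
    checksum: Optional[str] = None       # of the file or manifest
    metadata: dict = field(default_factory=dict)
    replaces: Optional[str] = None
    replaced_by: Optional[str] = None


def sha256_of(data: bytes) -> str:
    """Checksum string for a file or manifest payload."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def new_version(old: IdentifierRecord, identifier: str, locations: list, data: bytes) -> IdentifierRecord:
    """Create a successor record and link both directions for versioning."""
    new = IdentifierRecord(
        identifier=identifier,
        locations=locations,
        visible_to=list(old.visible_to),
        checksum=sha256_of(data),
        metadata=dict(old.metadata),
        replaces=old.identifier,
    )
    old.replaced_by = new.identifier
    return new


v1 = IdentifierRecord(
    identifier="example:v1",
    locations=["https://example.org/data/v1/manifest.json"],
    checksum=sha256_of(b"manifest v1"),
    metadata={"title": "Example dataset"},
)
v2 = new_version(v1, "example:v2", ["https://example.org/data/v2/manifest.json"], b"manifest v2")
```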
Search, Identifier, Describe, Transfer, Auth
Extending the automation flow
• How can we enable more structured/robust data
discovery using Globus platform services?
Create folder
Transfer data
Get metadata
Mint persistent identifier
Catalog
Get credentials
Set ACL
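
One way to picture the extended flow is as an ordered list of steps sharing a context. The sketch below uses stub steps with no service calls. The step order is an assumption (the diagram's ordering is not recoverable from the text), and every path, principal and identifier value is a placeholder.

```python
def get_credentials(ctx):
    ctx["token"] = "example-token"  # placeholder, not a real credential


def create_folder(ctx):
    ctx["folder"] = "/results/run-001/"


def set_acl(ctx):
    ctx["acl"] = {"path": ctx["folder"], "principal": "group:collaborators", "permissions": "r"}


def transfer_data(ctx):
    ctx["files"] = [ctx["folder"] + "plot.png"]


def get_metadata(ctx):
    ctx["metadata"] = {"title": "Example analysis result", "files": ctx["files"]}


def mint_identifier(ctx):
    ctx["identifier"] = "example:run-001"


def catalog(ctx):
    ctx["catalog_entry"] = {"subject": ctx["identifier"], "content": ctx["metadata"]}


# Assumed ordering: each step only reads keys written by earlier steps.
FLOW = [get_credentials, create_folder, set_acl, transfer_data,
        get_metadata, mint_identifier, catalog]


def run_flow(steps):
    """Run steps in order, recording which ones completed."""
    ctx = {"completed": []}
    for step in steps:
        step(ctx)
        ctx["completed"].append(step.__name__)
    return ctx


result = run_flow(FLOW)
```

In a real notebook each stub would wrap a call to the corresponding platform service, with the shared context carrying tokens and intermediate results between steps.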
Other Globus integrations
• Web app development frameworks (Flask, Django)
• Content management systems (WordPress, Drupal)
• Development tools (Confluence, Jira)
• Scalable cyberinfrastructure (Kubernetes)
• Genomics analysis (Galaxy)
– galaxyproject.org/authnz/use/oidc/idps/globus
globus-integration-examples.readthedocs.io
Example
ALCF Data Discovery Portal
https://petreldata.net
Support resources
• Globus documentation: docs.globus.org
• Sample code: github.com/globus
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
Join the Globus community
• Access the service: globus.org/login
• Create a personal endpoint: globus.org/app/endpoints/create-gcp
• Documentation: docs.globus.org
• Engage: globus.org/mailing-lists
• Subscribe: globus.org/subscriptions
• Need help? support@globus.org
• Follow us: @globusonline
