Simplified Research Data Management
with the Globus Platform
CNI Membership Meeting, Fall 2018 - Project Update
Vas Vasiliadis
vas@uchicago.edu
Topics
• What is Globus?
• Globus from a researcher’s perspective
• Common use cases: research data automation
• Data publication with Globus
• Sustainability – it’s in our DNA
Research data management today (circa 2008)
How do we...
...move, share, describe,
discover, reproduce?
Index?
Facilitate data stewardship
Globus: A Brief History of Time
• Oct. 1998 – Globus Toolkit v1.0.0
• Nov. 2010 – Globus Online initial release
• Nov. 2013 – Sustainability model launched
• Dec. 2016 – 50,000 registered users, 200PB+ moved
• Jan. 2018 – Globus Toolkit support EOL
• Jan. 2019 – 100th subscriber signed, >50% sustainable
• ??? – Globus becomes fully self-sustaining
globus online
Globus… bridges data and people within and beyond organizational boundaries
Research Computing HPC
Desktop Workstations
Mass Storage Instruments
Personal Resources
Public/Private Cloud
National Resources
Unified access to data across storage tiers
Public / private cloud stores
External campus storage
Project repositories, replication stores
Public repositories
Sharing with collaborators, community
Globus: Core functions
1. Researcher initiates transfer request; or requested automatically by script, science gateway
2. Globus transfers files reliably, securely
3. Researcher selects files to share, selects user or group, and sets access permissions
4. Globus controls access to shared files on existing storage; no need to move files to cloud storage!
5. Collaborator logs in to Globus and accesses shared files; no local account required; download via Globus
6. Researcher assembles data set; describes it using metadata (Dublin Core and domain-specific)
7. Curator reviews and approves; data set published on campus or other system
8. Peers, collaborators search and discover datasets; transfer and share using Globus
Systems: Instrument, Compute Facility, Personal Computer, Publication Repository
Functions: Transfer, Share, Publish, Discover
• Use a Web browser
• Access any storage
• Use an existing identity
Demonstration
Globus for high assurance data management
• Restricted data handling: PHI, PII, CUI
• Security controls: NIST 800-53, 800-171 Low
• Business Associate Agreement (BAA) w/UChicago
– University of Chicago has a BAA with Amazon
High Assurance features
• Additional authentication assurance
– Per storage gateway policy on frequency of authentication with specific identity for access to data
– Ensure that user authenticates with the specific identity that gives them access within session
• Application instance isolation
– Authentication context is per application, per session
• Encryption of user data in transit and Globus data at rest
• Detailed audit log (on data transfer nodes)
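The "frequency of authentication" policy above can be pictured as a freshness check on the user's last login. This is only a minimal sketch under stated assumptions: the function name `needs_reauthentication` and the 30-minute limit are hypothetical and do not come from the Globus service or its configuration.

```python
from datetime import datetime, timedelta


def needs_reauthentication(last_auth: datetime, now: datetime, max_age: timedelta) -> bool:
    """Return True when the last authentication is older than the policy allows."""
    return now - last_auth > max_age


# Assumed policy value, chosen only for illustration.
policy_max_age = timedelta(minutes=30)
now = datetime(2018, 12, 10, 12, 0)

# Authenticated 15 minutes ago: still within the policy window.
print(needs_reauthentication(datetime(2018, 12, 10, 11, 45), now, policy_max_age))
# Authenticated 60 minutes ago: the user would be asked to log in again.
print(needs_reauthentication(datetime(2018, 12, 10, 11, 0), now, policy_max_age))
```

In a real deployment the limit would be set per storage gateway, as the slide states; the sketch only shows the comparison itself.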
Accessing Globus from your own storage
…client software that makes a storage system accessible via Globus
Globus Connect Personal
• Installers do not require admin access
• Zero configuration; auto updating
• No firewall changes required; handles NATs
Globus Connect Server
• Installed and managed by sysadmin
• Default access for all local accounts
docs.globus.org/globus-connect-server-installation-guide/
Local system users
Local Storage System (HPC cluster, NAS, …)
Globus Connect Server: MyProxy CA, GridFTP Server, OAuth Server
Data Transfer Node
• POSIX + connectors
• Native packaging
Linux: DEB, RPM
IBM Spectrum Scale
Current Planned
Storage Connectors - globus.org/connectors
ActiveScale
Use(r)-appropriate interfaces
GET /endpoint/go%23ep1
PUT /endpoint/vas#my_endpt
200 OK
X-Transfer-API-Version: 0.10
Content-Type: application/json
...
Globus service
Web
CLI
REST API
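The first request line on this slide writes the endpoint name `go#ep1` as `go%23ep1`, because a raw `#` in a URL starts a fragment. A minimal sketch of that encoding step, using only the Python standard library; the helper name `endpoint_path` is hypothetical and is not part of the Globus SDK or the Transfer API.

```python
from urllib.parse import quote, unquote


def endpoint_path(endpoint_name: str) -> str:
    """Build a request path for an endpoint name (hypothetical helper).

    safe="" makes quote() percent-encode every reserved character,
    including '#', which would otherwise begin a URL fragment.
    """
    return "/endpoint/" + quote(endpoint_name, safe="")


print(endpoint_path("go#ep1"))   # /endpoint/go%23ep1
print(unquote("go%23ep1"))       # go#ep1
```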
Globus Command Line Interface
• Full-featured (web++)
• Uses Python SDK
• Open source
github.com/globus/globus-cli
docs.globus.org/cli
Globus is PaaS… for building science gateways, portals, and other web applications in support of research and education
The Globus Platform
Globus Auth (identity and access management)
…
Globus APIs (Transfer, Search, Identifiers, …)
Globus Connect
Data Publication
File Sharing
File Transfer, Sync
Data Automation
Globus Auth
• Foundational Identity and Access Management (IAM) service
• Protects REST API communications
• Enables login for diverse app ecosystem, no new identity required
• Employs least-privilege security model
Auth
User Authentication
Secure service interactions
Application identity and interactions
Globus helps
automate data flows
Automated data movement
Scheduled backup
My Drive/backups
|__/20170930
|__/20170929
|__/20170928
|__…
Recurring transfers with sync option
Campus Lab
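The dated folders in this listing follow a `YYYYMMDD` pattern. A minimal sketch of how a scheduled job might compute such destination folders before submitting a transfer; the function names and the `backups` root are assumptions for illustration, and the transfer submission itself is left out.

```python
from datetime import date, timedelta


def backup_folder(day: date, root: str = "backups") -> str:
    """Return the dated destination folder for one day's backup."""
    return f"{root}/{day.strftime('%Y%m%d')}"


def recent_backup_folders(last_day: date, count: int) -> list[str]:
    """List the most recent `count` daily backup folders, newest first."""
    return [backup_folder(last_day - timedelta(days=i)) for i in range(count)]


print(recent_backup_folders(date(2017, 9, 30), 3))
# ['backups/20170930', 'backups/20170929', 'backups/20170928']
```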
Streamlined data distribution
My Drive/projectX
|__/source
|__/pipe0001
|__/pipe0002
|__…
Secure sharing with research community
Discover and access via data portal
HPC resource, Campus storage, …
Data distribution example: NCAR RDA
Reliable instrument data egress
My Drive/FASTQ
|__/cohort_0_0
|__/cohort_0_1
|__/cohort_0_2
|__…
Stage data for downstream analysis
NGS and high-res Imaging (APS, ALS, CryoEM, fMRI, …)
Instrument data egress example
• Kasthuri Lab at UChicago: brain aging and disease
• Construct connectome (map neuron connections)
Kasthuri Lab neuroanatomy reconstruction pipeline
UChicago, APS, ALCF, JLSE
Lab Server 1, Lab Server 2
1. Imaging
2. Acquisition
3. Pre-processing
4. Preview/Center
5. User validation
6. Reconstruction
7. Publication
8. Visualization
9. Science!
Data Management Plan enablement
My Drive/datasets
|__/afdb4523
|__/235fabcc
|__/cd23a421
|__…
Dataset assembly, description, curation
Access via persistent identifier: http://hdl.handle.net/11466/OMN5BFB
Diverse storage systems
Globus Data Publication V1
• Cloud-based web app
• BYO storage
• User-managed collections
• Select pre-defined schema
• Handle, DOI persistent identifiers
• >2000 users, >600 datasets
publish.globus.org
Many variations of data publication…
Citable Data
• Standard metadata
• Persistent identifiers
• Durable storage
Institutional Data
• Many domains
• Custom metadata
• Locally managed storage
Community Data
• Agreed schema
• Larger datasets
• Fine grained metadata

…Including active data management
Active Research Data
• Less standard and evolving schema
• Data organized independent of storage
• Support active collaboration
• Location agnostic identifiers
Publication v2 Platform
• Decompose Globus turnkey solution into microservices
• Enable flexible re-composition and adaptation of services
• Support extension and enhancement of publication flows
Services: Automate, Search, Identify, Describe, Transfer, Auth
Flow steps: Create folder, Transfer data, Get metadata, Mint persistent identifier, Catalog, Get credentials, Set ACL
Globus Search service
• Hosted, scalable service for research data discovery
• Schema agnostic
• Fine grained access control
• Plain text search
• Faceted search
• Rich query language
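To make "plain text search", "faceted search", and "fine grained access control" concrete, here is a small in-memory sketch. It does not call or imitate the Globus Search API; the records, the `visible_to` field, and both function names are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical records; "visible_to" stands in for per-record access control.
RECORDS = [
    {"title": "Brain scan A", "domain": "neuroscience", "visible_to": {"public"}},
    {"title": "Alloy fatigue", "domain": "materials", "visible_to": {"alice"}},
    {"title": "Brain scan B", "domain": "neuroscience", "visible_to": {"alice", "bob"}},
]


def search(records, text, user):
    """Plain-text match on the title, limited to records the user may see."""
    return [
        r for r in records
        if text.lower() in r["title"].lower()
        and ({"public", user} & r["visible_to"])
    ]


def facet_counts(results, field):
    """Count results per value of one field (a facet)."""
    return Counter(r[field] for r in results)


hits = search(RECORDS, "brain", "bob")
print([r["title"] for r in hits])      # ['Brain scan A', 'Brain scan B']
print(facet_counts(hits, "domain"))    # Counter({'neuroscience': 2})
```

The access check runs before facet counting, so facet totals never reveal records the user cannot see.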
Globus Identifiers service (limited beta release)
• Issue persistent identifiers…
• …within your namespace, with access control
• Identifiers have…
– …link to data
– …landing page
– …visibility
– …checksum
– …extensible metadata
– …versioning
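The attributes listed above can be pictured as one record per identifier. The sketch below is illustrative only: the class, its field names, the `ns/0001` identifier, and the example.org URLs are assumptions and do not reflect the Globus Identifiers service schema. It also shows how a stored checksum lets a consumer verify the data behind an identifier.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IdentifierRecord:
    """Illustrative record; field names are assumptions, not a Globus schema."""
    identifier: str
    data_link: str
    landing_page: str
    visibility: str
    checksum: str
    version: int = 1
    metadata: dict = field(default_factory=dict)


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(record: IdentifierRecord, data: bytes) -> bool:
    """Check that the data still matches the checksum stored at minting time."""
    return sha256_hex(data) == record.checksum


payload = b"example dataset contents"
record = IdentifierRecord(
    identifier="ns/0001",                              # hypothetical namespace/id
    data_link="https://example.org/data/0001",         # placeholder URL
    landing_page="https://example.org/landing/0001",   # placeholder URL
    visibility="public",
    checksum=sha256_hex(payload),
    metadata={"title": "Example dataset"},
)
print(verify(record, payload))             # True
print(verify(record, payload + b"!"))      # False
```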
Globus Automate (coming soon)
• Composition and execution service for automating research data management
• Higher level flow description language and authoring tools
• Pluggable API to integrate any actions
– e.g. automated validation, metadata extraction
• Flexible invocation of actions: user or event driven
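As a rough picture of "pluggable actions" composed into a flow, the sketch below chains plain Python functions that each take and return a state dictionary. It is not the Globus Automate API, which the slide describes as not yet released; every name here (`run_flow`, `validate`, `extract_metadata`, `on_event`) is hypothetical.

```python
from typing import Callable, Dict, List

# An action takes the current flow state and returns an updated copy.
Action = Callable[[Dict], Dict]


def validate(state: Dict) -> Dict:
    """Example action: automated validation (here, a simple file-type check)."""
    return {**state, "valid": state["path"].endswith(".csv")}


def extract_metadata(state: Dict) -> Dict:
    """Example action: metadata extraction (here, just the file name)."""
    return {**state, "metadata": {"name": state["path"].rsplit("/", 1)[-1]}}


def run_flow(actions: List[Action], state: Dict) -> Dict:
    """Run the actions in order, passing the state from one to the next."""
    for action in actions:
        state = action(state)
    return state


def on_event(event: Dict, actions: List[Action]) -> Dict:
    """Event-driven invocation: start the flow from an event payload."""
    return run_flow(actions, {"path": event["path"]})


flow = [validate, extract_metadata]
print(run_flow(flow, {"path": "datasets/run1.csv"}))   # user-driven
print(on_event({"path": "datasets/run2.txt"}, flow))   # event-driven
```

Because each action shares one signature, new steps can be added to the list without changing the runner.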
Globus
platform applications
Jupyter + Globus for interactive data science at scale
petrel.alcf.anl.gov
materialsdatafacility.org
2PB, 80Gbps store
3.2M materials data
Cooley: 290 TFLOPS
1. Query
2. Transfer
3. Learn
4. Share
Genotype imputation: Wellcome Sanger
National Resource Access
Identity Management
Globus PaaS developer resources
Python SDK
Sample Application
Jupyter Notebook
docs.globus.org/api
github.com/globus

…on sustainability
Globus by the numbers
• 8,300 active shared endpoints
• 70+ petabyte movers
• 500 PB moved
• 20,400 active personal endpoints
• 80 billion files processed
• 1,800 active server endpoints
• 94 subscribers
• 1 PB largest single transfer to date
• 99.9% availability
• 559 identity providers
• 1,923 most shared endpoints at a single institution
• 120,000 users
Thank you to our sponsors...
U.S. Department of Energy
…and THANK YOU, subscribers!
Globus sustainability model
• Standard Subscription
– Sharing, data publication
– HTTPS access
– Console, usage reporting
– Priority support
– App integration support
• High Assurance subscription
– App instance isolation
– Additional authentication assurance
– Audit logging
– NIST 800-53, NIST 800-171 (+ BAA)
• Branded Web Site
• Premium Storage Connectors
• Alternate Identity Provider (InCommon is standard)
Support resources
• Globus documentation: docs.globus.org
• Community email list: developer-discuss@globus.org
• Helpdesk and issue escalation: support@globus.org
• Customer engagement team
• Globus professional services team
– Assist with portal/gateway/app architecture and design
– Develop custom applications that leverage the Globus platform
– Advise on customized deployment and integration scenarios
