Cloud Computing and
Bioinformatics
Enis Afgan*, Nuwan Goonasekera†
* Johns Hopkins University, Taylor Lab, USA
† University of Melbourne, Victorian Life Science Computation Initiative, Australia
@ University of Colombo
Feb 2017
Overview
• The key characteristics of cloud computing
• Dynamically scaling cloud resources
• Using Cloud Computing for bioinformatics
Source: http://dilbert.com/strips/comic/2012-05-25/
Life before cloud computing
Source: http://www.rackspace.com/knowledge_center/whitepaper/revolution-not-evolution-how-cloud-computing-differs-from-traditional-it-and-why-it
Cloud Computing: A Definition
• NIST definition: “Cloud computing is a model for enabling
ubiquitous, convenient, on-demand network access to a
shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that
can be rapidly provisioned and released with minimal
management effort or service provider interaction.”
» National Institute of Standards and Technology
(http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf)
The Cloud Model
Deployment Models: Private, Community, Public, Hybrid
Delivery Models:
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
Essential Characteristics:
• On-demand self-service
• Broad network access
• Resource pooling
• Rapid elasticity
• Measured service
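Two of the essential characteristics, rapid elasticity and measured service, can be illustrated with a minimal sketch. The function names, the per-instance capacity and the hourly price are assumptions made up for illustration; they do not come from any provider.

```python
import math


def instances_needed(queued_jobs, jobs_per_instance, max_instances):
    """Rapid elasticity: size the pool to the current load, within a cap."""
    if queued_jobs <= 0:
        return 0
    wanted = math.ceil(queued_jobs / jobs_per_instance)
    return min(wanted, max_instances)


def metered_cost(instance_hours, price_per_hour):
    """Measured service: pay only for what was actually used."""
    return round(instance_hours * price_per_hour, 2)


# Hypothetical load over four hours (number of queued jobs each hour).
hourly_load = [0, 35, 120, 10]
pool_sizes = [instances_needed(n, jobs_per_instance=10, max_instances=8)
              for n in hourly_load]
total_cost = metered_cost(sum(pool_sizes), price_per_hour=0.10)
print(pool_sizes, total_cost)
```

The pool shrinks to zero when there is no work and is capped when demand spikes, so the bill follows usage rather than peak capacity.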
Delivery Models
Source: http://www.businessinsider.com.au/10-most-important-in-cloud-computing-2013-4?op=1#a-word-about-clouds-1
Public SaaS examples
• Gmail
• SharePoint
• Salesforce.com CRM
• OnLive
• Gaikai
• Microsoft Office 365
• Some definitions include services that do not require payment, e.g. ad-supported sites
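Public PaaS Examples
• Google App Engine
  Languages and developer tools: Python, Java, Go, PHP + JVM languages (Scala, Groovy, JRuby)
  Programming models supported by provider: MapReduce, Web, DataStore, Storage and other APIs
  Target applications and storage options: Web applications and BigTable storage
• Salesforce.com’s Force.com
  Languages and developer tools: Apex, Eclipse-based IDE, web-based wizard
  Programming models supported by provider: Workflow, Excel-like formula, web programming
  Target applications and storage options: Business applications such as CRM
• Microsoft Azure
  Languages and developer tools: .NET, Visual Studio, Azure tools
  Programming models supported by provider: Unrestricted model
  Target applications and storage options: Enterprise and web apps
• Amazon Elastic MapReduce
  Languages and developer tools: Hive, Pig, Java, Ruby etc.
  Programming models supported by provider: MapReduce
  Target applications and storage options: Data processing and e-commerce
• Aneka
  Languages and developer tools: .NET, stand-alone SDK
  Programming models supported by provider: Threads, task, MapReduce
  Target applications and storage options: .NET enterprise applications, HPC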
Infrastructure-as-a-Service (IaaS)
• Amazon Web Services (Market leader)
• Rackspace Cloud
• NeCTAR/OpenStack Research Cloud
• Joyent Cloud
• GoGrid
• FlexiScale
Common Terms
Machine Image: A stored image/template from which a new virtual
machine can be launched. E.g. Ubuntu
Instance: A running virtual machine based on some machine image.
Volume: Attachable Block Storage, which is the equivalent of a virtual disk
drive.
Object Store: A large store for storing simple binary objects + metadata
within containers
Security Groups: A means of specifying firewall rules
Key-pairs: Public/private key pairs for accessing a virtual machine
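How these terms relate to one another can be sketched as a small data model. This is only an illustration: the class and field names are hypothetical and do not correspond to any provider's API, and the object store is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MachineImage:
    image_id: str
    name: str  # e.g. "Ubuntu"


@dataclass
class FirewallRule:
    protocol: str
    from_port: int
    to_port: int
    cidr: str


@dataclass
class SecurityGroup:
    name: str
    rules: List[FirewallRule] = field(default_factory=list)


@dataclass
class KeyPair:
    name: str
    public_key: str  # the private half stays with the user


@dataclass
class Volume:
    size_gb: int
    attached_to: Optional[str] = None  # instance ID, if attached


@dataclass
class Instance:
    instance_id: str
    image: MachineImage
    key_pair: KeyPair
    security_groups: List[SecurityGroup]
    volumes: List[Volume] = field(default_factory=list)

    def attach(self, volume: Volume) -> None:
        """Attach a block-storage volume, like plugging in a virtual disk."""
        volume.attached_to = self.instance_id
        self.volumes.append(volume)


# Placeholder identifiers and key text, for illustration only.
ssh_only = SecurityGroup("ssh-only", [FirewallRule("tcp", 22, 22, "0.0.0.0/0")])
vm = Instance("i-001", MachineImage("img-001", "Ubuntu"),
              KeyPair("demo-key", "ssh-rsa AAAA..."), [ssh_only])
data_disk = Volume(size_gb=100)
vm.attach(data_disk)
```

An instance is created from one image, is reachable through its key pair, is protected by its security groups, and can gain or lose volumes while it runs.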
Getting started with Cloud resources
Demo 1
Many clouds exist - how do we use them?
Many clouds and many solutions
launch.genome.edu.au ; use.jetstream-cloud.org ; launch.usegalaxy.org
?!?!
Architectural stack
CloudLaunch.usegalaxy.org
A P P L I C A T I O N S
CloudBridge
CloudMan
Goonasekera, N., Lonie, A., Taylor, J., Afgan, E., “CloudBridge – a Simple Cross-Cloud Python
Library”, XSEDE 16, Miami, FL, July 2016.
Demo 2
CloudBridge Design Principles
A simple, open-source Python multi-cloud library.
• Uniform API irrespective of the underlying provider
  - No special casing of application code
  - Simpler code
• Provide a set of conformance tests for all supported clouds
  - No need to test against each cloud
  - “Write-once-run-anywhere”
• > 92% test coverage at present
• Supports OpenStack and AWS right now
  - Community contributions for GCE and Azure forthcoming!
http://cloudbridge.readthedocs.org/
https://github.com/gvlproject/cloudbridge
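The two central principles, a uniform API and a shared conformance test suite, can be sketched in a few lines. This is a toy illustration of the pattern, not CloudBridge's actual classes: `CloudProvider`, `FakeAWS`, `FakeOpenStack`, `deploy` and `conformance_check` are all hypothetical names.

```python
from abc import ABC, abstractmethod


class CloudProvider(ABC):
    """Hypothetical uniform interface; NOT CloudBridge's real class."""

    @abstractmethod
    def launch(self, name: str) -> str:
        """Start an instance and return its provider-specific ID."""


class FakeAWS(CloudProvider):
    def launch(self, name: str) -> str:
        return "i-" + name  # stand-in for a real AWS call


class FakeOpenStack(CloudProvider):
    def launch(self, name: str) -> str:
        return "os-" + name  # stand-in for a real OpenStack call


def deploy(provider: CloudProvider, name: str) -> str:
    """Application code: no special-casing on the provider type."""
    return provider.launch(name)


def conformance_check(provider: CloudProvider) -> bool:
    """One shared test that every supported provider must pass."""
    instance_id = provider.launch("probe")
    return isinstance(instance_id, str) and instance_id.endswith("probe")


results = {type(p).__name__: conformance_check(p)
           for p in (FakeAWS(), FakeOpenStack())}
print(results)
```

Because every provider passes the same check, application code such as `deploy` only needs to be written and tested once.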
Sample code: launch an instance
# Assumes `provider` is an already-configured CloudBridge provider object
# and `image_id` is the ID of a machine image available on that cloud.

# Create a key pair and save the private key to a file
kp = provider.security.key_pairs.create('cloudbridge_intro')
with open('cloudbridge_intro.pem', 'w') as f:
    f.write(kp.material)

# Create a security group that allows SSH (port 22) from any address
sg = provider.security.security_groups.create(
    'cloudbridge_intro', 'A security group used by CloudBridge')
sg.add_rule('tcp', 22, 22, '0.0.0.0/0')

# Launch an instance on the smallest type with vcpus >= 2 and ram >= 4
img = provider.compute.images.get(image_id)
inst_type = sorted(
    [t for t in provider.compute.instance_types.list()
     if t.vcpus >= 2 and t.ram >= 4],
    key=lambda x: x.vcpus * x.ram)[0]
inst = provider.compute.instances.create(
    name='CloudBridge-intro', image=img, instance_type=inst_type,
    key_pair=kp, security_groups=[sg])

# Wait until ready
inst.wait_till_ready()
# Show instance state
inst.state
# 'running'
inst.public_ips
# [u'54.166.125.219']
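The instance-type selection step in the sample is the least obvious line, so here is the same logic in isolation. It runs without a cloud account because the catalog is stubbed: `InstanceType`, `catalog` and `smallest_matching` are hypothetical stand-ins, and only the `vcpus` and `ram` attribute names are taken from the sample.

```python
from collections import namedtuple

# Stand-in for the objects returned by instance_types.list();
# the names and sizes below are made up for illustration.
InstanceType = namedtuple("InstanceType", ["name", "vcpus", "ram"])

catalog = [
    InstanceType("tiny", 1, 1),
    InstanceType("large", 8, 32),
    InstanceType("medium", 2, 4),
    InstanceType("highmem", 2, 16),
]


def smallest_matching(types, min_vcpus, min_ram):
    """Return the type with the smallest vcpus*ram that meets both minimums."""
    candidates = [t for t in types
                  if t.vcpus >= min_vcpus and t.ram >= min_ram]
    if not candidates:
        raise ValueError("no instance type satisfies the request")
    return min(candidates, key=lambda t: t.vcpus * t.ram)


chosen = smallest_matching(catalog, min_vcpus=2, min_ram=4)
print(chosen.name)
```

Using `min` with an explicit empty-list check avoids the `IndexError` that `sorted(...)[0]` would raise when no type is large enough.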
CloudLaunch Feature Highlights
• Portal for deploying cloud-enabled applications
  - Support for customization
  - Supports launching different versions, apps, configs and clouds → fills the role of a science gateway discovery and access portal
• Modular and extensible platform
  - App store for cloud-enabled applications
  - Users can develop and integrate custom application launch and management components, at both the UI and the backend
• Natively multi-cloud
  - Backed by CloudBridge
https://beta.launch.usegalaxy.org/
https://github.com/galaxyproject/cloudlaunch-ui
https://github.com/galaxyproject/cloudlaunch
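One common way to build a modular platform where users plug in their own launch components is a handler registry. The sketch below shows that general pattern only; it is an assumption for illustration and does not reproduce CloudLaunch's actual plug-in mechanism. All names (`LAUNCHERS`, `register`, `launch`, the config keys) are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical registry; CloudLaunch's real plug-in mechanism may differ.
LAUNCHERS: Dict[str, Callable[[dict], dict]] = {}


def register(app_name: str):
    """Decorator that adds a launch handler to the registry."""
    def wrapper(func):
        LAUNCHERS[app_name] = func
        return func
    return wrapper


@register("ubuntu")
def launch_ubuntu(config: dict) -> dict:
    return {"app": "ubuntu", "cloud": config["cloud"]}


@register("galaxy")
def launch_galaxy(config: dict) -> dict:
    # An application-specific component can add its own settings.
    return {"app": "galaxy", "cloud": config["cloud"],
            "storage_gb": config.get("storage_gb", 100)}


def launch(app_name: str, config: dict) -> dict:
    """Core code stays generic: look up the handler and delegate."""
    if app_name not in LAUNCHERS:
        raise KeyError("no launcher registered for " + app_name)
    return LAUNCHERS[app_name](config)


print(launch("galaxy", {"cloud": "aws"}))
```

Adding a new application then means registering one more handler, with no change to the core `launch` function.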
CloudLaunch architecture
• Frontend: Angular 2
• Backend: Django + REST framework + Celery
• Cloud layer: CloudBridge
• Pluggable components: GVL, CloudMan, Galaxy, Ubuntu
Pluggable component example
<form class="form" [ngFormModel]="gvlLaunchForm" (ngSubmit)="onSubmit(gvlLaunchForm.value)">
  <!-- GVL Component Selection -->
  <config-panel>
    <panel-header>GVL Settings</panel-header>
    <panel-body>
      <div class="form-group">
        <label>Auto-start the selected applications</label>
        <div class="checkbox">
          <label>
            <input type="checkbox" name="gvlapp_cmdlineutils" ngControl="gvl_cmdline_utilities" />
            GVL Commandline Utilities
          </label>
        </div>
        <div class="checkbox">
          <label>
            <input type="checkbox" name="gvlapp_smrt_analysis" ngControl="smrt_portal" />
            PacBio SMRT Analysis
          </label>
        </div>
      </div>
    </panel-body>
  </config-panel>
  <!-- CloudMan settings -->
  <cloudman-config [initialConfig]="initialConfig.config_gvl"></cloudman-config>
</form>
Cloud capacity is great, but what do we use it for?
Bioinformatics: in one slide
A multi-disciplinary science using computers for acquiring, managing and
analyzing biological data.
It is a data-driven science.
Bioinformatics sits at the intersection of Biology, Medicine, Math & Physics, and Computer Science.
What type of data are we talking about?
DNA → RNA → Protein → Complexes → Tissues → Organs → full Organisms
Each cell contains (almost) the same DNA in its nucleus.
An adult human body has approximately 37 trillion cells.
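The first two arrows of that chain (DNA → RNA → Protein) are simple enough to sketch in code. The codon table below is deliberately partial and holds only the codons used in the example; the function names are chosen for illustration.

```python
# Partial codon table: only the codons used in the example below.
CODON_TABLE = {"AUG": "M", "GCC": "A", "UGG": "W", "UAA": "*"}


def transcribe(dna: str) -> str:
    """DNA coding strand -> mRNA (replace thymine with uracil)."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """mRNA -> protein, reading three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCTGGTAA"
rna = transcribe(dna)
protein = translate(rna)
print(rna, protein)
```

Real sequences are millions to billions of bases long, which is where the data volumes discussed in this talk come from.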
What do we do with the data?
• Apply data transformations to extract useful information
• This is not always a well-defined process
• This is typically done with existing tools, or by developing one’s own
• Tools can be chained into workflows
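The idea that tools can be chained into workflows can be sketched with three toy "tools" whose outputs feed into one another. The tool functions and the sample reads are invented for illustration and are far simpler than real bioinformatics tools.

```python
from functools import reduce


def drop_short_reads(reads, min_length=4):
    """Tool 1: quality filter that discards very short reads."""
    return [r for r in reads if len(r) >= min_length]


def to_upper(reads):
    """Tool 2: normalise case."""
    return [r.upper() for r in reads]


def gc_content(reads):
    """Tool 3: fraction of G and C bases per read."""
    return [round((r.count("G") + r.count("C")) / len(r), 2) for r in reads]


def run_workflow(data, steps):
    """Feed the output of each tool into the next one."""
    return reduce(lambda current, step: step(current), steps, data)


raw_reads = ["acgt", "gg", "GGCCAT", "atat"]
result = run_workflow(raw_reads, [drop_short_reads, to_upper, gc_content])
print(result)
```

Because the workflow is just an ordered list of steps, it can be saved, shared and re-run on new data, which is the role a workflow system plays at scale.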
And how do we obtain such data?
The first methods, called Sanger sequencing, were developed in the mid-1970s.
The international Human Genome Project, launched in 1990, took 13 years to sequence the human genome.
In the 2000s, massively parallel Next Generation Sequencers (NGS) were developed that could sequence a human genome in days at a much lower cost.
Today, nanopore sequencers are emerging, offering real-time sequencing.
There are many public data repositories with free access to data (e.g., TCGA, 1000 Genomes, GenBank).
omicsmaps.com
Cloud computing and bioinformatics
Typical genomics flow: raw data to results
Raw data (100s to 1000s of GB) + external reference data → Results (a few GB)
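To get a feel for what these sizes mean in practice, a back-of-the-envelope calculation helps. The figures below are assumptions picked from within the ranges on the slide (500 GB in, 5 GB out) plus an assumed sustained 1 Gbit/s network link; none of them are measurements.

```python
def transfer_hours(size_gb, bandwidth_mbit_per_s):
    """Hours needed to move size_gb gigabytes at a sustained bandwidth."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    return size_megabits / bandwidth_mbit_per_s / 3600


# Assumed figures: 500 GB of raw data in, 5 GB of results out, 1 Gbit/s link.
raw_gb, results_gb, link = 500, 5, 1000
upload = transfer_hours(raw_gb, link)
download = transfer_hours(results_gb, link)
reduction = raw_gb / results_gb
print(round(upload, 2), round(download, 3), reduction)
```

Under these assumptions, moving the raw data takes about an hour while the results take well under a minute, which is one reason to run the analysis close to where the raw data is stored.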
A real-world system
Raw data (100s to 1000s of GB) → Results (a few GB)
Indexed genomes: 10s to 100s of GB
Requirements: some computers + reliable persistent data storage + bioinformatics tools + reference data + workflow system
Timeline: Aug, Sep, Oct, Nov, ...
Galaxy: accessible analysis system
• A data analysis and integration tool
• A (free for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
• Open source software that makes integrating your own tools and data and customizing for your own site simple
Three ways to use Galaxy
1. Download and run locally
2. Public website (http://usegalaxy.org)
3. Run on the Cloud
Progress roadmap
CloudBridge
CloudLaunch
CloudMan
CloudScale
Pathway
Expected Outcomes
Improved features (root volume size)
Cloud independence
Improved stability
Federated single sign-on
“One-click” launch
Bulk launch
Cloud Independence
Tasks
Complete
Use CloudBridge
Assemble image from Docker containers
Remove shared filesystem
Simpler deployment
Extensible platform
Scaling for institutional Galaxy instances
Scale-out support for labs
Audience
All users
Academic users
All
Virtual labs (e.g., GVL, CLIMB)
CLIMB/Other labs
Hosted services
Tutorials
Complete
Acknowledgments
Did this sound interesting?
This entire project is an effort from a large community!
Come talk to us - get involved.
enis.afgan@jhu.edu or nuwan.goonasekera@unimelb.edu.au
