Zero to a Bioinformatics Analysis Platform in Four Minutes
    Enis Afgan, Brad Chapman, Konstantinos Krampis, James Taylor
                                                         BOSC 2012
                                                     Long Beach, CA
Australian National Research Cloud



Provide computational infrastructure to support researchers’ needs

    Compute and Storage
    (~25,000 cores + ? PB)
What’s required for genomics?
✔ •    Compute
✔ •    Storage
  •    Data resources
       o  Ensembl, dbSNP, etc.

  •    Tools
  •    Visualisation
  •    Protocols
  •    Expertise
  •    Community!
Genomics Virtual Lab
Compute + Storage =   IaaS
shell vs. IDE

   We want it now
What’s required for genomics?
•    Compute
•    Storage
•    Data resources
     o  Ensembl, dbSNP, etc.

•    Tools
•    Visualisation
•    Protocols
•    Expertise
•    Community!
[Diagram: Galaxy, CloudMan, CloudBioLinux and BioCloudCentral.org]
Playing together
•    CloudBioLinux
     o  Quickly build-your-own tool suite / ready to roll
     o  Graphical & command line access

•    CloudMan
     o  Create a scalable and shareable processing platform

•    Galaxy
     o  Do exploratory analysis

•    BioCloudCentral.org
     o  Get started easily
•    Bundle infrastructure with an analysis tool suite, quickly
     o  Validate our approach
     o  Easier to maintain and replicate

•    Expose it all via a variety of interfaces
     o  Support meta-analysis workflow

•    Move forward
     o  Add new features
     o  Start using it
And one new thing…

blend
 o    A Python library for interacting with Galaxy’s API
 o    And CloudMan
 o    And BioCloudCentral
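blend talks to Galaxy through its REST API. As a rough illustration of the kind of request such a library issues under the hood, here is a minimal sketch using only Python's standard library. It assumes a Galaxy server at a hypothetical URL, the `/api/histories` endpoint, and API-key authentication via a `key` query parameter. The `GalaxyRequestBuilder` class is hypothetical and is not blend's own API; it only builds requests and never sends them.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


class GalaxyRequestBuilder:
    """Hypothetical helper that builds (but does not send) Galaxy API requests."""

    def __init__(self, base_url: str, api_key: str) -> None:
        # Strip a trailing slash so that API paths join cleanly.
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def _url(self, path: str) -> str:
        # Assumption: the API key is passed as a `key` query parameter.
        query = urlencode({"key": self.api_key})
        return f"{self.base_url}/api/{path.lstrip('/')}?{query}"

    def list_histories(self) -> Request:
        # GET /api/histories: list the current user's histories.
        return Request(self._url("histories"), method="GET")

    def create_history(self, name: str) -> Request:
        # POST /api/histories with a JSON body: create a new named history.
        body = json.dumps({"name": name}).encode("utf-8")
        return Request(
            self._url("histories"),
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )


if __name__ == "__main__":
    galaxy = GalaxyRequestBuilder("https://galaxy.example.org/", "MY_API_KEY")
    req = galaxy.list_histories()
    # Prints the HTTP method and the full request URL.
    print(req.get_method(), req.full_url)
```

To actually contact a server, a request built this way would be passed to `urllib.request.urlopen`. Keeping request construction separate from sending makes the sketch testable offline; a real client library such as blend also handles responses, errors and the CloudMan side of the workflow.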
Request compute infrastructure
Manipulate compute infrastructure
Upload data and run analyses

Test
Automate repetitive tasks
Distribute
Docs and examples included
http://blend.readthedocs.org/
Playing together
•    CloudBioLinux
     o  Build-your-own tool suite / ready to roll
     o  Graphical & command line access

•    CloudMan
     o  Create a scalable and shareable processing platform

•    BioCloudCentral.org
     o  Get started easily

•    Galaxy
     o  Do exploratory analysis

•    Blend library
     o  Automate repetitive tasks: analysis AND infrastructure
Questions?
  cloudbiolinux.org
  usecloudman.org
  usegalaxy.org
  biocloudcentral.org
  blend.readthedocs.org

  Visit the poster session (poster #10)