Connecting Chipster genome browser
to the cloud




                      Aleksi Kallio
           CSC – IT Center for Science, Finland
Architecture of Chipster platform

   [Diagram: Clients, Brokers (Message broker, File broker) and Computing services, plus an Authentication service and a Management service]

   ▪ Loosely coupled, independent components
   ▪ Message-oriented communication
   ▪ Flexible, scalable, robust
   ▪ In other words, very cloud-like
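As a rough illustration of why message-oriented components are easy to move into the cloud, here is a minimal in-process sketch. `ToyBroker`, the topic names and `compute_service` are hypothetical stand-ins invented for this example; they are not Chipster's actual classes or protocol.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-process stand-in for a message broker (hypothetical, not Chipster's API)."""

    def __init__(self):
        self._queues = defaultdict(deque)  # topic -> pending messages

    def publish(self, topic, message):
        self._queues[topic].append(message)

    def poll(self, topic):
        """Return the next message for a topic, or None if nothing is pending."""
        queue = self._queues[topic]
        return queue.popleft() if queue else None


def compute_service(broker, name):
    """Take one job from the shared queue and publish its result."""
    job = broker.poll("jobs")
    if job is None:
        return False
    broker.publish("results", {"job_id": job["job_id"],
                               "worker": name,
                               "output": job["payload"].upper()})
    return True


broker = ToyBroker()
# The client only knows the topic name, not which service will pick the job up.
for job_id, payload in enumerate(["acgt", "ttga", "ccat"]):
    broker.publish("jobs", {"job_id": job_id, "payload": payload})

# Interchangeable compute services take turns draining the same queue.
workers = ["node-a", "node-b"]
turn = 0
while compute_service(broker, workers[turn % len(workers)]):
    turn += 1

results = []
message = broker.poll("results")
while message is not None:
    results.append(message)
    message = broker.poll("results")
```

Because clients and services share only topic names, adding another worker (for example one running in the cloud) would not require changes on the client side in this sketch.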
Kallio bosc2010 chipster-cloud
Chipster in the cloud

 1) Deploying compute nodes in the cloud
    •   Easy, because the architecture is already loosely coupled and based
        on message passing
 2) Running large parallel jobs in the cloud
    •   The architecture allows this easily
    •   Cloud-compatible tools can be integrated quickly
 3) Using the cloud as a back end for interactive
  visualisations
    •   Maybe not so obvious
    •   So let's dig into this further...
Background: Chipster Genome Browser

   ▪ Interactive Swing-based GUI
   ▪ Shows reads and analysis results in genomic context
   ▪ Interactive zooming from chromosome down to nucleotide level
   ▪ Ensembl annotations for genes and transcripts
   ▪ Integrated with the rest of Chipster
   ▪ Parallel, distributed to some extent
Basic idea

 ▪ Preprocess data with Hadoop / MapReduce
 ▪ Generate power-of-two summaries for the data, like in
  Google Earth
    •   Doubles the data size
 ▪ The current genome browser samples data to produce
  summaries
 ▪ With preprocessing, summaries can be read directly
    –   Accurate results, significantly fewer disk seeks
 ▪ Distribute data to scale to massive datasets
    •   Use messaging to query independent data providers
 ▪ Aggregate results in the visualiser as (and if) they arrive
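The power-of-two summaries on this slide can be sketched as a small pyramid over per-position coverage. This is only an illustration under stated assumptions: the slides do not say which statistic is summarised or how a zoom level is chosen, so summing adjacent pairs and the names `build_pyramid` and `pick_level` are choices made for this example.

```python
def build_pyramid(coverage):
    """Return a list of levels; level k holds one value per 2**k positions.

    Level 0 is the raw per-position data. Each higher level sums adjacent
    pairs from the level below, so the values are exact rather than sampled.
    """
    levels = [list(coverage)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Sum adjacent pairs; an odd tail element is carried up alone.
        levels.append([sum(prev[i:i + 2]) for i in range(0, len(prev), 2)])
    return levels


def pick_level(levels, start, end, max_bins):
    """Choose the finest level that covers [start, end) with at most max_bins values."""
    for k in range(len(levels)):
        bin_size = 2 ** k
        first, last = start // bin_size, (end - 1) // bin_size
        if last - first + 1 <= max_bins:
            return k, levels[k][first:last + 1]
    return len(levels) - 1, levels[-1]


coverage = [3, 1, 4, 1, 5, 9, 2, 6]  # toy per-base read coverage
levels = build_pyramid(coverage)

# All summary levels together hold fewer values than the raw data,
# which is why storing them at most doubles the data size.
summary_size = sum(len(level) for level in levels[1:])

# A zoomed-out view that can draw only two bars reads level 2 directly.
level, bins = pick_level(levels, start=0, end=8, max_bins=2)
```

With 8 raw values the summary levels hold 4 + 2 + 1 = 7 values, and the zoomed-out query reads two precomputed sums instead of scanning all positions.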
Work in progress...

 ▪ Genome browser up and running
 ▪ Hadoop-based data processing at a very early stage
 ▪ Currently trying to get it to scale well
What's the point?

 ▪ Besides items (e.g., reads), the visualiser can receive
  “superitems” (e.g., summaries of reads)
    •   These summarise coverage, quality, SNPs etc. of the original reads
 ▪ All kinds of advanced information can be generated in
  the preprocessing step
    –   Such as features that combine a large number of genomes
    –   Generators should be pluggable
 ▪ We spend resources on the server side to improve user
  experience on the client side
    •   The server side needs CPU, memory and disk space
    •   But only for a short time (as in large batch jobs)
    •   Cheap commodity servers can be used
    •   And the experiment itself has already been expensive
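One way to read "generators should be pluggable" is a registry of functions that each turn the reads in a bin into one summary value. The sketch below assumes that reading; `Read`, `register`, `GENERATORS` and `make_superitems` are hypothetical names for this example, not part of Chipster.

```python
from typing import Callable, Dict, List, NamedTuple


class Read(NamedTuple):
    start: int      # 0-based start position
    length: int
    quality: float  # mean base quality (illustrative field)


# A generator maps the reads in one bin to a single summary value.
Generator = Callable[[List[Read]], float]
GENERATORS: Dict[str, Generator] = {}


def register(name: str):
    """Decorator that plugs a generator into the registry under a name."""
    def wrap(func: Generator) -> Generator:
        GENERATORS[name] = func
        return func
    return wrap


@register("read_count")
def read_count(reads):
    return float(len(reads))


@register("mean_quality")
def mean_quality(reads):
    return sum(r.quality for r in reads) / len(reads) if reads else 0.0


def make_superitems(reads, bin_size):
    """Group reads by the bin of their start position and summarise each bin."""
    bins: Dict[int, List[Read]] = {}
    for read in reads:
        bins.setdefault(read.start // bin_size, []).append(read)
    return {
        index * bin_size: {name: gen(members) for name, gen in GENERATORS.items()}
        for index, members in sorted(bins.items())
    }


reads = [Read(0, 36, 30.0), Read(10, 36, 20.0), Read(130, 36, 40.0)]
superitems = make_superitems(reads, bin_size=100)
```

Adding a new kind of summary then means registering one more function; the preprocessing loop and the visualiser's data format stay the same in this sketch.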
Summary

 ▪ Use cheap server resources to enable a better user
  experience
 ▪ Goal: to make data analysis quicker (and more fun)
 ▪ Tackle server-side unreliability on the client side
 ▪ Future development
        –    If this works out, it could also be used in other Chipster
             visualisers
        –    Integrating HBase queries into interactive visualisations
        –    Optimising data summarisation for visual truthfulness
 ▪ For more info: aleksi.kallio@csc.fi
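Tackling server-side unreliability on the client side could look roughly like the following: query every data provider in parallel, hand each answer to the visualiser as it arrives, and ignore providers that fail or miss a deadline. The slides describe messaging rather than threads, so the thread pool here only stands in for asynchronous replies, and `collect`, `provider_ok` and `provider_down` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError, as_completed


def collect(providers, region, on_result, timeout=1.0):
    """Query all providers; call on_result(name, data) for each answer that arrives.

    Providers that raise an error or miss the deadline are skipped, so the
    view is drawn from whatever data was actually received.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(providers)))
    futures = {pool.submit(fetch, region): name for name, fetch in providers.items()}
    try:
        for future in as_completed(futures, timeout=timeout):
            try:
                on_result(futures[future], future.result())
            except Exception:
                pass  # a failed provider just leaves a gap in the view
    except TimeoutError:
        pass  # deadline reached: keep what has arrived so far
    finally:
        pool.shutdown(wait=False)


def provider_ok(region):
    start, end = region
    return {"coverage": end - start}


def provider_down(region):
    raise ConnectionError("node unavailable")


shown = {}
collect(
    {"node-1": provider_ok, "node-2": provider_down, "node-3": provider_ok},
    region=(100, 200),
    on_result=shown.__setitem__,
)
```

Here the failing `node-2` simply contributes nothing, while the answers from the other two providers still reach the view.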
