Scientific Computing with Amazon Web Services
Deepak Singh

ACAT 2010. Jaipur, India
Plenary Talk at ACAT 2010
Via Reavel under a CC-BY-NC-ND license
life science industry
Credit: Bosco Ho
By ~Prescott under a CC-BY-NC license
data
Image: Wikipedia
Image: Matt Wood
couldn’t find a good picture for
arrays of sensors in the ocean
Image via image editor under a CC-BY License
years
weeks
days
days
minutes
days?
gigabytes
terabytes
petabytes
petabytes
exabytes
petabytes?
Image: Chris Dagdigian
scale has implications
data management
data processing
data sharing
amazon web services
the cloud
has_many :definitions
infrastructure as a service
Your Custom Applications and Services
Tools: AWS Toolkit for Eclipse, AWS Toolkit for .NET
Isolated Networks: Amazon Virtual Private Cloud
Monitoring: Amazon CloudWatch
Management: AWS Management Console
Payments: Amazon Flexible Payments Service (FPS)
On-Demand Workforce: Amazon Mechanical Turk
Parallel Processing: Amazon Elastic MapReduce
Messaging: Amazon Simple Queue Service (SQS)
Content Delivery: Amazon CloudFront
Compute: Amazon Elastic Compute Cloud (EC2), Elastic Load Balancing, Auto Scaling
Storage: Amazon Simple Storage Service (S3), AWS Import/Export
Database: Amazon RDS and SimpleDB
scalable
cost effective (pay as you go)
reliable
secure
Amazon EC2
servers on demand
highly scalable
elastic
3000 CPUs for one firm’s risk management application
[Chart: Number of EC2 Instances by day, Wednesday 4/22/2009 through Tuesday 4/28/2009; y-axis marks at 3000 and 300; annotation: 300 CPUs on weekends]
highly available systems
“Everything fails, all the time”
                   -- Werner Vogels
2.3% AFR (annualized failure rate) in population of 13,250
3.3% AFR in population of 22,400
4.2% AFR in population of 246,000




Source: James Hamilton
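To make the failure-rate figures concrete, the sketch below multiplies each quoted AFR by its population to get the expected number of failures per year and per day. The function names are illustrative and not from the talk; the only inputs are the three pairs quoted on the slide.

```python
# Expected annual failures implied by an annualized failure rate (AFR).
# The (AFR percent, population) pairs are the three figures quoted on the slide.
FLEETS = [
    (2.3, 13_250),
    (3.3, 22_400),
    (4.2, 246_000),
]


def expected_failures_per_year(afr_percent: float, population: int) -> float:
    """Expected number of failures in one year for a population with this AFR."""
    return afr_percent / 100.0 * population


def report(fleets):
    """Return (afr_percent, population, failures_per_year, failures_per_day) rows."""
    rows = []
    for afr_percent, population in fleets:
        per_year = expected_failures_per_year(afr_percent, population)
        rows.append((afr_percent, population, per_year, per_year / 365.0))
    return rows


if __name__ == "__main__":
    for afr, pop, per_year, per_day in report(FLEETS):
        print(f"{afr}% of {pop:,}: about {per_year:,.0f} failures/year, {per_day:.1f}/day")
```

At the largest population this works out to a few dozen failures every day, which is the point of the quote: at scale, failure is routine rather than exceptional.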
assume sw/hw failure
design apps to be resilient
automate & bootstrap
nothing fails
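One common way to act on "assume failure" in application code is to retry transient errors with exponential backoff. The sketch below is a generic illustration, not something shown in the talk: `call_with_retries` and `FlakyService` are hypothetical names, and the simulated service stands in for any remote call.

```python
import random
import time


def call_with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on failure wait base_delay * 2**n plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter spreads out retries


class FlakyService:
    """Hypothetical stand-in for a remote call that fails a few times first."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated transient failure")
        return "ok"


if __name__ == "__main__":
    service = FlakyService(failures_before_success=2)
    # Pass a no-op sleep so the demo finishes immediately.
    print(call_with_retries(service.fetch, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the backoff can be tested without waiting; production code would leave the default.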
elastic block store
elastic IP
SQS
US East Region: Availability Zone A, Availability Zone B, Availability Zone C, Availability Zone D
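Availability Zones are separate failure domains within a region, so spreading instances across them limits what a single zone outage can take down. The sketch below is a toy placement model, not an AWS API call: the zone labels are the ones drawn on the slide, and the instance names and helper functions are hypothetical.

```python
from collections import Counter
from itertools import cycle

# Zone labels as drawn on the slide; real zone identifiers differ.
ZONES = ["Zone A", "Zone B", "Zone C", "Zone D"]


def place_instances(instance_ids, zones):
    """Assign instances to zones round-robin so load is spread evenly."""
    return {instance: zone for instance, zone in zip(instance_ids, cycle(zones))}


def survivors(placement, failed_zone):
    """Instances still running if every instance in failed_zone is lost."""
    return [instance for instance, zone in placement.items() if zone != failed_zone]


if __name__ == "__main__":
    placement = place_instances([f"web-{n}" for n in range(8)], ZONES)
    print(Counter(placement.values()))
    print(survivors(placement, "Zone A"))
```

With eight instances over four zones, losing any one zone leaves six running.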
on-demand instances
reserved instances
spot instances
data storage
one size does not fit all
Amazon S3
distributed object store
durable
available
scalable
fast
simple
structured data anyone?
Amazon SimpleDB
zero administration
highly available
schema less
key-value store
Amazon Relational Database Service
single API call
MySQL database
automatic backup
scale up with API call
hosted hadoop service
hadoop easy and simple
[Diagram: Amazon Elastic MapReduce. Input data is placed in an input S3 bucket. The application is deployed from the web console or command line tools. Elastic MapReduce runs Hadoop on Amazon EC2 instances, reading the input dataset from S3 and writing output results to an output S3 bucket. At the end, the user is notified and gets the results from Amazon S3.]
apache hive
http://hadoop.apache.org/hive/
apache pig
http://hadoop.apache.org/pig/
cascading
http://www.cascading.org/
computing platforms
http://cyclecomputing.com
sudo gem install cloud-crowd
http://wiki.github.com/documentcloud/cloud-crowd
http://www.rightscale.com
application platforms
http://heroku.com
software distribution
http://www.cloudbiolinux.com/
http://bitbucket.org/galaxy/galaxy-central/wiki/Home
data distribution
http://aws.amazon.com/publicdatasets/
problem solving
3.7 million classifications in just over three days
~15 million in less than a month
>2.6 million clicks in 100 hours
software & algorithms
Crossbow: Rapid whole genome SNP analysis
Preprocessed reads -> Map: Bowtie -> Sort: Bin and partition -> Reduce: SoapSNP
Langmead B, Trapnell C, Pop M, Salzberg SL. Genome Biol 10 (3): R25.
Crossbow condenses over 1,000 hours of resequencing computation into a few hours without requiring the user to own or operate a computer cluster
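The map, sort, reduce shape of that pipeline can be sketched in a few lines of plain Python. This is a toy illustration of the pattern only: it does not run Bowtie, SoapSNP, or Hadoop, and the bin size, the read tuples, and the function names are all assumptions made for the example.

```python
from collections import defaultdict

BIN_SIZE = 100  # hypothetical genome bin width, chosen only for this example


def map_read(read):
    """Map step: emit (bin, position) for one read.

    A real pipeline would run an aligner here; this toy input already
    carries an aligned position.
    """
    _name, position = read
    return (position // BIN_SIZE, position)


def shuffle(pairs):
    """Sort step: group positions by bin, ordering the bins and each bin's contents."""
    bins = defaultdict(list)
    for key, value in pairs:
        bins[key].append(value)
    return {key: sorted(values) for key, values in sorted(bins.items())}


def reduce_bin(key, positions):
    """Reduce step: summarize one bin; here, just count the reads in it."""
    return (key, len(positions))


def run(reads):
    """Chain the three steps over an in-memory list of reads."""
    mapped = [map_read(read) for read in reads]
    grouped = shuffle(mapped)
    return [reduce_bin(key, positions) for key, positions in grouped.items()]


if __name__ == "__main__":
    toy_reads = [("r1", 12), ("r2", 250), ("r3", 57), ("r4", 203), ("r5", 199)]
    print(run(toy_reads))
```

Because each bin is reduced independently, the reduce step can be spread over as many machines as there are bins, which is what makes the pattern a good fit for a rented cluster.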
doing science
http://bioteam.net
BLAT @ U. PENN
Map 100 million 100-base paired-end reads
Quad core with 5 GB of RAM would take 16 days




30 high-memory instances; 32 hours; $195
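The arithmetic behind those figures is worth spelling out. The sketch below uses only the numbers quoted above; it assumes the $195 covers instance time alone, which the slide does not state.

```python
# Figures quoted on the slide.
SINGLE_MACHINE_DAYS = 16
INSTANCES = 30
CLOUD_HOURS = 32
TOTAL_COST_USD = 195


def summarize():
    """Derive wall-clock speedup and cost per instance-hour from the quoted figures."""
    single_machine_hours = SINGLE_MACHINE_DAYS * 24
    instance_hours = INSTANCES * CLOUD_HOURS
    return {
        "single_machine_hours": single_machine_hours,
        "instance_hours": instance_hours,
        "wall_clock_speedup": single_machine_hours / CLOUD_HOURS,
        # Assumes the whole bill is instance time (an assumption, see lead-in).
        "cost_per_instance_hour": TOTAL_COST_USD / instance_hours,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The run finishes 12 times sooner in wall-clock terms (384 hours against 32) at roughly $0.20 per instance-hour. The 960 instance-hours are not directly comparable to the 384 single-machine hours, since the machines differ.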
GALAXY MAPPING
Goal: Create an astrometric catalog of a billion stars with microarcsecond precision
Gaia satellite launch in 2011; observations until 2017; catalog ready in 2019
Problem: A single pass through the data for image processing would take 30 years (on one CPU)

Solution: Use AWS
[Figure: two panels plotting Resources against Time with Capacity and Demand curves: "Static data center" and "Data center in the cloud". The gap between capacity and demand is labeled "Unused resources".]
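The contrast between the two panels can be put in numbers with a small model. The daily demand series below is illustrative and not taken from the talk, and the elastic case is idealized: capacity is assumed to follow demand exactly, with no startup lag or minimum billing unit.

```python
# Hypothetical daily demand in CPUs over one week (illustrative, not from the talk).
DEMAND = [3000, 3000, 3000, 300, 300, 3000, 3000]


def static_provisioning(demand):
    """Fixed capacity sized for peak demand; returns (provisioned, unused) CPU-days."""
    capacity = max(demand)
    provisioned = capacity * len(demand)
    return provisioned, provisioned - sum(demand)


def elastic_provisioning(demand):
    """Capacity follows demand exactly (idealized); returns (provisioned, unused)."""
    return sum(demand), 0


if __name__ == "__main__":
    static_total, static_unused = static_provisioning(DEMAND)
    elastic_total, elastic_unused = elastic_provisioning(DEMAND)
    print(f"static:  {static_total} CPU-days provisioned, {static_unused} unused")
    print(f"elastic: {elastic_total} CPU-days provisioned, {elastic_unused} unused")
```

Under these assumptions the static data center provisions 21,000 CPU-days for 15,600 CPU-days of work, leaving 5,400 unused.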
HEAVY-ION COLLISIONS

Problem: Quark matter physics conference imminent but no compute resources handy

Solution: NIMBUS context broker allowed researchers to provision 300 nodes and get the simulations done
BELLE MONTE CARLO




Credit: Tom Fifield
AWS for the sciences
available resources
task-based resources
shared dataspaces
new software architectures
new computing platforms
http://aws.amazon.com/education
the cloud works
today
Thank you!

deesingh@amazon.com Twitter: @mndoci
Presentation ideas from James Hamilton and @lessig
