Cloud Computing and Innovations for Optimizing Life Sciences Research
Krittika Sasmal @ InterpretOmics
19th February 2014
Acknowledgement
Organizers of this event
Scientists and Researchers who work for others and make their research findings OPEN
Entrepreneurs who translate this knowledge
GNU, Open Software, NIH and other funding bodies that keep Biology & Medical information OPEN
19.02.14
4 Grand Social Challenges
Food Security
Health Security
Energy Security
Environmental Security
Common Thread is Biology
The 21st Century Biology
– The Quantitative Biology
Ref: A New Biology for the 21st Century, The National Academies
Will create a discovery engine able to tackle extremely complex biological and societal problems
Decoding the Book of Life
– milestone for Quantitative Biology
A Milestone for Humanity – the Human genome
Human Genome working draft announced, June 26th, 2000
Journey is
- From Reduction to Integration
− Many diseases were researched and understood through the process of reduction
− However, as the understanding of diseases matures and the need for proactive medicine increases, researchers find that many diseases, including cancer, are due to somatic mutations that cannot be understood in the reduced space
− Understanding such diseases and discovering drugs for them will need the reverse operation: integration and systems biology
Ref: Hiroaki Kitano, et al. Systems Biology: A Brief Overview, Science 295, 1662 (March 2002)
Translational Medicine
– Genomics + Clinical + Non-clinical Data to Discovery of Novel Therapeutics
[Figure: Data, Information, Knowledge. Labels: Literature/Molecular Data; Clinical/Bedside Data; Medical Knowledge; Target Data; Preprocessed Data; Transformed Data; Patterns; iOmics; Disease/Drug Data]
Next Generation Sequence Data
• FASTQ (Illumina)
• SFF (454)
• CCS (PacBio)
• ...
• Microarray
[Figure: read layouts for DNA/RNA/miRNA and cDNA/mRNA. Labels: Single End Sequences; Paired End or Mate-paired Sequences; Insert Size; Library Size; Overlapped reads; Random Order & Orientation; Long reads; Short reads; Fixed length reads; Variable length reads; Circular Consensus reads; Hundreds to Billions Bases; Billions to Hundreds Bases]
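A minimal sketch of reading the FASTQ format listed on this slide. It assumes plain-text, unwrapped four-line records and Phred+33 quality encoding; the slide states neither. The names `read_fastq` and `mean_phred` are illustrative only and are not part of iOMICS or any library.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip()
        if not header:  # end of input (a blank line also stops the sketch)
            break
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed record near {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Average base quality, assuming Phred+33 ASCII encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two made-up reads, used only to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!II\n")
records = list(read_fastq(example))
for name, seq, qual in records:
    print(name, len(seq), mean_phred(qual))
```

Reading record by record, rather than loading the whole file, keeps memory use flat as read counts grow.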
Data domains and Challenges
Source: Clevergene Biocorp
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to:
1. Maximize insight into a data set
2. Uncover underlying structure
3. Extract important variables
4. Detect outliers and anomalies
5. Test underlying assumptions
6. Develop parsimonious models
7. Determine optimal factor settings
Each of these is addressed through algorithms to solve a computational problem, typically based upon optimizing some mathematical criterion
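As one concrete instance of the "detect outliers and anomalies" goal, here is a small sketch using the common 1.5 × IQR rule. The rule, the function name `iqr_outliers`, and the sample values are assumptions chosen for illustration; the slides do not prescribe a particular method.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical expression-like measurements with one obviously extreme value.
measurements = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(measurements)
print(outliers)
```

The quartile fences play the role of the "mathematical criterion" here: a point is flagged only when it falls beyond a fixed multiple of the interquartile range.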
Trends in Genomic Medicine
J. J. McCarthy et al., Sci Transl Med 2013;5:189sr4
Data Driven Healthcare
[Figure: Health Care Today, Translational Medicine, Personalized Health Care. Stages: Non-specific (Treat Symptoms); Organized (Error Reduction); Personalized (Disease Prevention). Labels: Evolutionary Practices; Revolutionary Technology; Episodic Treatment; Digital Imaging; Electronic Health Records; Artificial Expert Systems; Automated Systems; Information Correlation; Clinical Genomics; Genetic Predisposition Testing; Molecular Medicine; CA Diagnosis; Pre-symptomatic Treatment; Lifetime Treatment; Data and Systems Integration; Distributed High-Throughput Analytics]
P6 Medicine
– Preventive, Predictive, Participatory, Personalized, Precision, and Pervasive
Personal Data
Population Data
[Figure: sources of population data. Labels: Registry; Claims Data; Clinical Trial; Drug reaction; Literature; Genomic Data; Survivability; Public Health Epidemiology]
Biomedical Data
[Figure: biomedical data sources. Labels: Clinical Repositories; Online Mendelian Inheritance in Man; Medical Subject Headings; Genome/Gene Annotations; University of Washington Digital Anatomist (UWDA); NCBI Taxonomies; Human Metabolome; RxNorm Drug; ICD10; Logical Observation Identifiers Names and Codes; UMLS (Unified Medical Language System)]
7V's in Healthcare Big-data
• Vexing. Algorithms must be designed so that data processing time stays linear. Many genomic analysis problems are NP-hard, so parallel algorithms are needed to access data in near real time.
• Volume. The physical volume of data that needs to be online, both structured and unstructured. Storage is available; the challenge is to determine relevance within large data volumes and to use analytics to create value from the relevant data.
• Velocity. Data must be retrieved in a timely manner. In healthcare, many data sources are outside the enterprise. Reacting quickly enough to deal with data velocity is critical for most healthcare applications.
• Variety. Data today comes in all types of formats: structured, numeric, and unstructured data such as CT scans, MRI, ultrasound, and X-rays. Also, most healthcare data is categorical, with or without any order. Managing, merging and governing different varieties of data is a challenge.
• Variability. Healthcare data are highly inconsistent, with periodic peaks. Cancer, for example, has four different types of variability: intratumoral, intermetastatic, intrametastatic, and interpatient. Discovery of the independent variables and the causal attributes is critical.
• Veracity. Quality, relevance, repeatability, quantification, meaningfulness, predictive value, reduction of error.
• Value. The final result, quantified as ROI, reduced readmissions, or reduced morbidity, is what will finally be measured.
Healthcare Analytics & Decision Support System
Analytics of 7Vs
Big Data in Life Sciences
The question is NOT whether we can do it, but HOW QUICKLY and HOW ACCURATELY we can do it: Speed, Repeatability, Reliability, Predictability, and Precision matter!
The Cloud
You don't buy a COW when you need Milk!
Likewise, a Biologist, a Breeder, or a Clinician need not worry about Data Analysis, Algorithms, Supercomputers, Pipelines, or even the Analytics and the Biomedical Informatics!
Cloud Computing Defined
Cloud computing is an emerging computing paradigm where data and applications reside in cyberspace; it allows users to access their data and information through any web-connected device, be it fixed or mobile.
Source: John B. Horrigan, Use of Cloud Computing Applications & Services, Data memo, Pew Internet & American Life Project, September 2008
Cloud Computing User – I (Amir)
Cloud Computing User – II (Fakir)
Characteristics of Cloud Computing
Virtual – Physical location and underlying infrastructure details are transparent to users
Scalable – Able to break complex workloads into pieces to be served across an incrementally expandable infrastructure
Efficient – Services Oriented Architecture for dynamic provisioning of shared compute resources
Flexible – Can serve a variety of workload types – both consumer and commercial
Benefits of Cloud
Unlimited Resource
• Unlimited computing power
• Unlimited storage (filestore & online memory)
Users can use resources without owning anything – converting Capex to Opex
Helping Green computing by lending out idle resources through Cycle Scavenging
Pay as you go
Cloud Computing Stack
[Figure: cloud service stack. Service models: Infrastructure as a Service; Platform as a Service; Software as a Service. Layer labels: Facilities; Hardware; Connectivity & delivery; API; Integration & Middleware; Data; Metadata; Content; Application; Presentation Modality; Presentation Platform. Cross-cutting labels: QoE & QoS; Security; Middleware. Actors: Original Cloud Provider; Cloud Vendor; Cloud User; User/Customer/Device; Next Generation Network / Intranet]
Divide and Conquer: MapReduce
[Figure: MapReduce data flow coordinated by a Master.
Input Files: Split 1, Split 2, Split 3, Split 4
Map (Workers), Intermediate files: k1:v1, k3:v2, k1:v3, k2:v4, k2:v5, k4:v6
Sort/Group: k1:v1,v3; k2:v4,v5; k3:v2; k4:v6
Reduce (Workers), Output files: Output 1, Output 2, Output 3]
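The Map, Sort/Group, and Reduce stages on this slide can be sketched in a single Python process. This is an illustration of the idea only: frameworks such as Hadoop run the map and reduce steps on separate workers and handle file splitting, scheduling, and failures. The base-counting task and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_split(split):
    """Map: emit a (key, value) pair for every base in every read of one split."""
    return [(base, 1) for read in split for base in read]


def sort_group(pairs):
    """Sort/Group: gather all values that share a key, e.g. k1:v1,v3."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_group(key, values):
    """Reduce: collapse the grouped values for one key into a single result."""
    return key, sum(values)


def map_reduce(splits):
    # Each split would go to a different worker in a real cluster.
    intermediate = [pair for split in splits for pair in map_split(split)]
    grouped = sort_group(intermediate)
    return dict(reduce_group(key, values) for key, values in grouped.items())


# Three made-up input splits of short reads.
splits = [["ACG", "TT"], ["GGA"], ["C"]]
counts = map_reduce(splits)
print(counts)
```

Because `map_split` looks at one split at a time and `reduce_group` at one key at a time, both steps can be parallelized without the workers sharing state.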
Open Source MapReduce
Hadoop
− Implemented in Java; enabled on Amazon
Twister
− Lightweight new arrival in town
Security in the Cloud
Security in the cloud needs to answer a few specific questions, such as:
1. How much do you trust the virtualized environment or the hypervisors in the cloud compared with your own physical hardware?
2. How much do you trust the cloud vendor compared with your own infrastructure?
3. How do you address regulatory and compliance requirements when your application might be running on infrastructure in a foreign country?
New generation software in bioinformatics needs to:
Be fast, or very fast, with low memory consumption
Be able to handle and analyze terabytes of data
Store data efficiently for querying
Distribute computation, not data
Be focused and useful in analysis
www.iomics.in
The Omics Lab on the Internet
Crop to Cancer
Analysis Pipelines in iOMICS
QA/QC of raw reads
SNP/InDel/CNV analysis
miRNA discovery
Exome sequencing
ChIP sequencing
... (New additions)
Meta-analysis
Visualization of results
Omics data
Systems Biology
SNP & CNV Analysis
Hierarchical Clustering
Gene Interaction/Enrichment
Cloud Computing Holds the Potential to Address the Challenges and Transform Biology and Healthcare
Krittika Sasmal
Email: “krittika” dot “sasmal” (at) “interpretomics” dot “co”