Predictive Analytics in Healthcare: Building Successful Pipelines
Danielle Belgrave
Researcher
Healthcare Machine Learning
@DaniCMBelg
My Research
• Latent Variable Modelling
• Missing Data
• Longitudinal Data Analysis
• Causality
• Multidisciplinary
• Patient-Centric Approach
[Figure: longitudinal data for Patients 1 to 4 over Time (hours), 0 to 24; causal diagram with nodes X, Y and Z]
Roadmap: Predictive Analytics for Healthcare
Part 1
• Drivers of predictive analytics in healthcare
Part 2
• Evaluation of data-driven healthcare
Part 3
• Important concepts in study design
Part 4
• Top 10 Data Science Skills
Part 1: Drivers of Predictive Analytics in Healthcare
The Key Stakeholders in Health
• Patient/Person
• Population
• Pharmaceuticals
• Providers
The Person at the Centre of Healthcare
Patient/Person
Machine learning has the capacity to transform healthcare
- Understanding physiological changes over time
- Forecasting progression or onset of disease
- Personalising treatment strategies
Population Data-driven Healthcare
Population
What we as a society do collectively to assure the conditions in which people can be healthy
• Elucidates average effects and deviations from average effects
• Policy recommendations
• Health education
• Outreach
• Research for disease detection and injury prevention
• Reduce healthcare inequalities
The Pharmaceutical Perspective: Drug Discovery and Therapeutics
Idea → Basic Research → Clinical Trials (Phase I, Phase II, Phase III) → Regulatory Approval → Patient Care
Data Protection and Connected Care: The Provider and Regulator Perspective
Providers
Part 2: Evaluation of Data-Driven Healthcare
Current State of Healthcare Data
• DATA: vast data volume, velocity, variety (TSUNAMI)
• METHODS & MODELS: supra-linear growth in papers & tools (BLIZZARD)
• EXPERTISE: human expertise to make sense of the growing data (DROUGHT)
Myth 1: ‘Big data’ are a solution
Myth 2: We have the models needed to turn healthcare data into decision support algorithms
Myth 3: Clinicians will continue to be the main source of data
Machine Learning has the Potential to
Disrupt and Impact Healthcare…
…But it’s Important to Remember the Beginnings
The study of the distribution and determinants of health-related states or events in specific populations & the application of this study to the control of health problems
The Beginnings of Data-Driven Health
Florence Nightingale (1820–1910)
Data visualisation: death toll of the Crimean War
Army data: 16,000 of 18,000 deaths were not due to battle wounds, but to preventable diseases spread by poor sanitation
Anchor Data Science in the Power of Observation
• Contextual phenomena: cholera incidence
• Ecological design: compare cholera rates by region
• Cohort design: compare cholera rates in exposed and non-exposed individuals
R.A. Fisher and the Principles of Experimental Design
1. Randomisation: unbiased allocation of treatments to different experimental plots
2. Replication: repetition of the treatment on more than one experimental plot
3. Error control: measures for reducing the error variance
Why do these 2 plants differ in growth?
Part 3: Important Concepts in Study Design
Principles of Study Design
• We need to set up a study to answer a research question
• Design is the most important aspect of a study and perhaps the most neglected
• The study design should match the research question, so that we don’t end up collecting useless data or failing to record the principal outcome
• No matter how good an algorithm is, if the study design is inadequate (garbage in) for answering the research question, we’ll get garbage out
Good Predictive Analytics for Healthcare starts with Good Study Design
Three questions:
1. What is the question or hypothesis I’m trying to investigate?
2. Are the data I have adequate to address this question?
3. How can I design a study that addresses these questions?
Types of Study Design
• Non-Experimental (Observational Studies)
  • Descriptive
    • Case Reports
    • Case Series
    • Cross-Sectional or Prevalence Study
  • Analytical
    • Case-control
    • Cohort Study
• Experimental (Intervention Studies)
  • Randomised Clinical Trial
  • Non-randomised / Field / Community Trial
Case Report
Profile of a single patient is reported in detail by one or more clinicians
Case Series
An individual case report that has been expanded to include a number of patients with a given disease
Cross-sectional Study
• All information collected at the same time point
• Snapshots of health state
• Designed to obtain information from samples regarding prevalence, distributions and interrelations
• E.g. survey
• There are no typical formats: they are designed or modified to meet the needs of the researcher or fit the topic of research
Cross-sectional Study
[Figure: flow diagram. Source population → ineligible / eligible → no participation / participation → participants classified at a single time point as exposed & bad outcome, exposed & good outcome, unexposed & bad outcome, or unexposed & good outcome]
Observational Studies
• Analytical
• Main objective is to test a hypothesis about the relationship between exposure to a risk factor and disease or another health outcome
• A measure of association is estimated
• The magnitude, precision and statistical significance of the association are determined
Cohort Study
• Identify a group of subjects
• Follow them over time
• Compare event rate in:
  • Subjects exposed to the risk factor and
  • Those not exposed to the risk factor
• Can be either retrospective or prospective
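The comparison of event rates in a cohort study is often summarised as a risk ratio. The following is a minimal sketch, not part of the original slides: the function name `risk_ratio` and all counts are hypothetical and chosen only for illustration.

```python
def risk_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Event rate in the exposed group divided by the event rate in the unexposed group."""
    rate_exposed = exposed_events / exposed_total
    rate_unexposed = unexposed_events / unexposed_total
    return rate_exposed / rate_unexposed


# Hypothetical counts, invented for illustration only.
rr = risk_ratio(exposed_events=30, exposed_total=100,
                unexposed_events=10, unexposed_total=100)
print(f"risk ratio = {rr:.2f}")
```

A risk ratio above 1 means the event was more frequent among exposed subjects; on its own it says nothing about chance, bias or confounding.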
Cohort Study
[Figure: flow diagram. Source population → ineligible / eligible → exposed and unexposed groups → no participation / participation → participants in each group followed to a bad outcome or a good outcome]
Important Concept: Randomisation
Definition: The process by which allocation of subjects to treatment groups is done by chance, without the ability to predict who is in what group
Aims:
- To prevent statistical bias in allocating subjects to treatment groups
- To achieve comparability between the groups
- To ensure samples are representative of the general population
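Methods of Randomisation
• Simple Random Sampling
• Permuted Block Randomisation
• Stratified Random Sampling
[Figure: three panels. Simple random sampling: a sample drawn from a numbered population (1 to 12). Permuted block randomisation: sequences of A/B allocations such as AABBAABB. Stratified random sampling: a population divided into strata, then sampled.]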
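Permuted block randomisation can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the procedure used in any particular trial: the function name `permuted_block_allocation`, the block size of 4 and the arm labels "A" and "B" are all hypothetical.

```python
import random


def permuted_block_allocation(n_subjects, block_size=4, arms=("A", "B"), seed=None):
    """Allocate subjects to arms in shuffled blocks so arm sizes stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]


allocation = permuted_block_allocation(12, block_size=4, seed=2019)
print("".join(allocation))
```

Within every complete block the two arms receive the same number of subjects, so group sizes never drift far apart even if recruitment stops early.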
Part 4: Top 10 Data Science Skills
Skill #1: ROC Curves
Useful Tool for Diagnostic Test Evaluation

                        Test: No disease                Test: Disease
Truth: No disease       Correct (specificity)           Type I error (false +), α
(D = 0)
Truth: Disease          Type II error (false -), β      Correct (power, 1 - β; sensitivity)
(D = 1)
Specific Example: How well does the IgE “score” for the Ara h 2 peanut component predict/diagnose peanut allergy?
[Figure: distributions of IgE response to Ara h 2 for patients with peanut allergy and patients without peanut allergy]
Threshold: IgE = 3.15
Patients below the threshold are called “non-peanut allergic”; patients above it are called “peanut allergic”
Some definitions ...
• True positives: patients with peanut allergy who fall above the threshold
• False positives: patients without peanut allergy who fall above the threshold
• True negatives: patients without peanut allergy who fall below the threshold
• False negatives: patients with peanut allergy who fall below the threshold
ROC curve
[Figure: ROC curve for IgE response to Ara h 2. x-axis: False Positive Rate (1 - specificity), 0% to 100%; y-axis: True Positive Rate (sensitivity), 0% to 100%]
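To make the threshold idea concrete, here is a small sketch that computes sensitivity and 1 - specificity at the slide's threshold of 3.15 and traces an ROC curve by sweeping the threshold. The IgE values below are invented for illustration and are not the study data; the function names are likewise hypothetical.

```python
def rates_at_threshold(allergic, non_allergic, threshold):
    """Return (TPR, FPR) when scores at or above the threshold are called positive."""
    tp = sum(score >= threshold for score in allergic)
    fp = sum(score >= threshold for score in non_allergic)
    return tp / len(allergic), fp / len(non_allergic)


def roc_points(allergic, non_allergic):
    """Sweep every observed score as a threshold and collect (FPR, TPR) points."""
    thresholds = sorted(set(allergic) | set(non_allergic), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr, fpr = rates_at_threshold(allergic, non_allergic, t)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical IgE responses to Ara h 2, invented for illustration only.
allergic = [1.2, 3.4, 4.8, 6.0, 7.5, 9.1]
non_allergic = [0.1, 0.4, 0.9, 1.5, 2.2, 3.6]

tpr, fpr = rates_at_threshold(allergic, non_allergic, 3.15)
area = auc(roc_points(allergic, non_allergic))
print(f"threshold 3.15: sensitivity={tpr:.2f}, 1-specificity={fpr:.2f}, AUC={area:.2f}")
```

Moving the threshold right lowers both rates; moving it left raises both. The ROC curve is the set of all such trade-offs.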
Skill #2: Power and Sample Size Calculation
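\[
n = \frac{r+1}{r} \cdot \frac{\sigma^2 \left(Z_\beta + Z_{\alpha/2}\right)^2}{\text{difference}^2}
\]
• $n$: sample size in the case group
• $r$: ratio of controls to cases
• $\sigma$: standard deviation of the outcome variable
• $Z_\beta$: represents the desired power (typically 0.84 for 80% power)
• $Z_{\alpha/2}$: represents the level of statistical significance (typically 1.96)
• difference: effect size (the difference in means)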
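The formula can be evaluated directly. The sketch below assumes a two-sided test and uses Python's standard-library `statistics.NormalDist` to obtain the Z values instead of hard-coding 0.84 and 1.96; the function name `cases_needed` and the example inputs (σ = 10, difference = 5) are hypothetical.

```python
import math
from statistics import NormalDist


def cases_needed(sigma, difference, r=1.0, power=0.80, alpha=0.05):
    """Sample size in the case group for detecting a difference in means.

    r is the ratio of controls to cases; alpha is two-sided.
    """
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    n = ((r + 1) / r) * sigma ** 2 * (z_beta + z_alpha) ** 2 / difference ** 2
    return math.ceil(n)


# Hypothetical inputs: outcome SD of 10, difference in means of 5.
print(cases_needed(sigma=10, difference=5))         # one control per case
print(cases_needed(sigma=10, difference=5, r=2.0))  # two controls per case
```

Recruiting more controls per case reduces the number of cases required, with diminishing returns as r grows.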
Skill #3: Data Visualisation
https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
Skill #4: A/B Testing
Patient Group
• The population is split into 2 groups by random allocation: Intervention and Control
• Outcomes for both groups are measured (Cured vs Still Diseased)
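One common way to compare the two groups' outcomes is a two-proportion z-test on the cure rates. The slides do not name a specific test, so this choice is an assumption; the function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(cured_a, n_a, cured_b, n_b):
    """Two-sided z-test for a difference in cure rates between two groups."""
    p_a, p_b = cured_a / n_a, cured_b / n_b
    pooled = (cured_a + cured_b) / (n_a + n_b)
    # Standard error under the null hypothesis of equal cure rates.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical outcomes: 60/100 cured on intervention, 45/100 cured on control.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Because allocation is random, a difference in cure rates that is unlikely under chance can be attributed to the intervention rather than to pre-existing differences between the groups.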
Skill #5 Linear Regression
• Linear regression allows us to look at the linear relationship between one normally distributed interval predictor and one normally distributed interval outcome variable
• Example: We wish to look at the relationship between FEV1 and IgE at age 5
• In other words, predicting FEV1 from IgE
Linear Regression: Example
• We see that the relationship between FEV1 and total IgE at age 5 is negative (B = -0.011)
• Based on the p-value (0.027 < 0.05), we would conclude this relationship is statistically significant.
• Hence, we would say there is a statistically significant negative linear relationship between FEV1 and total serum IgE at age 5.

Coefficients (dependent variable: FEV1, age 5)

Model 1                    Unstandardized Coefficients    Standardized Coefficients    t         Sig.
                           B         Std. Error           Beta
(Constant)                 1.146     .033                                              35.014    .000
Total serum IgE, age 5     -.011     .005                 -.221                        -2.242    .027
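As a sketch of where the B column comes from, the following fits a straight line by ordinary least squares. The data points are hypothetical: they were constructed so that the fitted line reproduces the slide's intercept (1.146) and slope (-0.011), and they are not the study data.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical (IgE, FEV1) pairs, built to match the slide's coefficients.
ige = [0, 10, 20, 30, 40]
fev1 = [1.156, 1.016, 0.946, 0.796, 0.716]

intercept, slope = least_squares(ige, fev1)
print(f"FEV1 = {intercept:.3f} + ({slope:.3f}) * IgE")
print(f"predicted FEV1 at IgE = 10: {intercept + slope * 10:.3f}")
```

The slope is the expected change in FEV1 for a one-unit increase in total IgE, so a negative slope corresponds to lower lung function at higher IgE.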
Skill #6: Logistic Regression
What are the Odds?

Asthma/wheeze symptom between 0 to 8 years * Ever antibiotic in first 12 months

             Antibiotic in first 12 months
             No              Yes             Total
Control      165 (39.1%)     257 (60.9%)     422 (100.0%)
Case          98 (20.5%)     380 (79.5%)     478 (100.0%)
Total        263 (29.2%)     637 (70.8%)     900 (100.0%)

Odds of developing asthma in the antibiotic group are:
(380/637)/(257/637) = 380/257 = 1.48
Odds of developing asthma in the non-antibiotic group are:
(98/263)/(165/263) = 98/165 = 0.59
The ratio of the odds for antibiotic users to the odds for non-antibiotic users is:
(380/257)/(98/165) = (380*165)/(257*98) = 2.49
The odds for antibiotic users are 2.49 times the odds for non-users
Variables in the Equation

Step 1       B        S.E.     Wald      df    Sig.    Exp(B)    95.0% C.I. for Exp(B)
                                                                 Lower     Upper
evrab12m     .912     .151     36.506    1     .000    2.489     1.852     3.347
Constant     -.521    .128     16.688    1     .000    .594
Variable(s) entered on step 1: evrab12m.

The intercept of -0.52 is the log odds for non-antibiotic users, since this is the reference group: exp(-0.52) = 0.59
The coefficient of evrab12m is the log of the odds ratio between antibiotic users and non-antibiotic users: exp(0.912) = 2.49
Skill #7: Causal Reasoning
• The questions that motivate most studies in the health, social and behavioral sciences are not associational but causal in nature.
• Before an association is assessed for the possibility that it is causal, other explanations such as chance, bias and confounding have to be excluded
• Causal questions require some knowledge of the data-generating process: they cannot be computed from the data alone, nor from the distributions governing the data
• Aim: to infer dynamics of beliefs under changing conditions, for example, changes induced by treatments or external interventions.
Pearl, Judea. "Causal inference in statistics: An overview." Statistics Surveys 3 (2009): 96-146.
Bradford-Hill Principles of Causality
• Plausibility: does causation make sense?
• Consistency: cause associated with disease in different populations and studies
• Temporality: cause precedes disease
• Strength: cause strongly associated with disease
• Specificity: does the cause lead to a specific effect?
• Dose-Response: the greater the exposure to the cause, the higher the risk of disease
Example: Personalisation of Cancer Treatment
[Figure: causal diagram with nodes Prognostic biomarker (risk factor), Treatment, Genetic Marker, Tumor Size, Outcome (Survival) and unobserved U]
Skill #8: Missing Data
Missing Completely At Random (MCAR)
The probability of data being missing does not depend on the observed or unobserved data
e.g. $\mathrm{logit}(p_{it}) = \theta_0$
Missing At Random (MAR)
The probability of data being missing does not depend on the unobserved data, conditional on the observed data
e.g. Children with missing wheeze data have better lung function
e.g. $\mathrm{logit}(p_{it}) = \theta_0 + \theta_1 t_i$ or $\mathrm{logit}(p_{it}) = \theta_0 + \theta_2 y_0$
Missing Not At Random (MNAR)
The probability of data being missing does depend on the unobserved data, conditional on the observed data.
e.g. Children with missing lung function have better lung function
e.g. $\mathrm{logit}(p_{it}) = \theta_0 + \theta_3 y_{it}$
Mason, Alexina. "Bayesian methods for modelling non-random missing data mechanisms in longitudinal studies." PhD thesis (2009).
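The three mechanisms can be simulated to see how each affects a naive complete-case mean. This is a toy sketch under stated assumptions: the data-generating model, the parameter values (`theta0`, `theta_x`, `theta_y`) and the function names are all hypothetical, with `x` standing for an always-observed covariate and `y` for the outcome that may go missing.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def complete_case_mean(x, y, theta0, theta_x=0.0, theta_y=0.0, seed=1):
    """Mean of y over records that stay observed when
    logit(P(missing)) = theta0 + theta_x * x + theta_y * y."""
    rng = random.Random(seed)
    kept = [yi for xi, yi in zip(x, y)
            if rng.random() >= sigmoid(theta0 + theta_x * xi + theta_y * yi)]
    return sum(kept) / len(kept)


# Toy data: x is always observed; y depends on x and may go missing.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(20000)]
y = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]

true_mean = sum(y) / len(y)
mcar = complete_case_mean(x, y, theta0=0.0)               # depends on nothing
mar = complete_case_mean(x, y, theta0=0.0, theta_x=2.0)   # depends on observed x
mnar = complete_case_mean(x, y, theta0=0.0, theta_y=2.0)  # depends on y itself

print(f"true={true_mean:.2f} MCAR={mcar:.2f} MAR={mar:.2f} MNAR={mnar:.2f}")
```

In this toy setup the complete-case mean is close to the truth only under MCAR. Under MAR the shift can in principle be corrected by modelling y given the observed x; under MNAR it cannot be corrected without assumptions about the missingness mechanism itself.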
The Challenge of Missing Data
• Missing data is a common problem in healthcare data and can produce biased parameter estimates
• Reasons for missingness may be informative for estimating model parameters
• Bayesian models: a coherent approach to incorporating uncertainty by assigning prior distributions
Mason, Alexina, Nicky Best, Sylvia Richardson, and Ian Plewis. "Strategy for modelling non-random missing data mechanisms in observational studies using Bayesian methods." Journal of Official Statistics (2010)
Skill #9: Latent Variable Modelling using Probabilistic Programming
Identifying Disease Endotypes
• To identify subgroups of complex disease risk
• Treatment outcome explained by distinctive underlying mechanism
• Foundation of Stratified Medicine
• Seeking better-targeted interventions
Parsimonious description of the data inferred from what is observed
Defining Asthma
Phenotypes: Observable Manifestations of Disease
[Figure: asthma phenotypes: poor lung function, wheeze, allergy, asthma medication, asthma symptoms, exacerbations, severity, grow out of asthma, asthma late in childhood, respond to treatment, don’t respond to treatment]
Endotypes Discovery: Different Diseases With Different Causes
Probabilistic Programming: Finding Patterns in Data
The basic components of a probabilistic reasoning system
• Probabilistic model: expresses general knowledge about a situation
• Evidence: contains specific information about a situation
• Queries: express the things that will help you make a decision
• Inference algorithm: uses the model to answer queries given evidence
• Answer: the answers to queries are framed as probabilities of different outcomes
Adapted from Pfeffer, Avi. "Practical probabilistic programming." International Conference on Inductive Logic Programming. Springer Berlin Heidelberg, 2010.
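The components listed above can be shown with a deliberately tiny example in plain Python rather than a real probabilistic programming language. Everything here is hypothetical: the two latent classes, the probabilities and the assumption that symptoms are independent given the class are invented for illustration and carry no clinical meaning.

```python
# Probabilistic model: general knowledge (all numbers are hypothetical).
prior = {"endotype_A": 0.7, "endotype_B": 0.3}
p_symptom = {
    "endotype_A": {"wheeze": 0.4, "allergy": 0.2},
    "endotype_B": {"wheeze": 0.8, "allergy": 0.7},
}


def posterior(evidence):
    """Inference algorithm: exact enumeration of P(endotype | evidence),
    assuming symptoms are independent given the endotype."""
    joint = {}
    for endotype, p in prior.items():
        for symptom, present in evidence.items():
            q = p_symptom[endotype][symptom]
            p *= q if present else 1 - q
        joint[endotype] = p
    total = sum(joint.values())
    return {endotype: p / total for endotype, p in joint.items()}


# Evidence: what was observed for one patient.
evidence = {"wheeze": True, "allergy": True}

# Query: which latent class is this patient likely to belong to?
answer = posterior(evidence)
for endotype, probability in answer.items():
    print(f"P({endotype} | evidence) = {probability:.2f}")
```

A probabilistic programming language lets the model be written down in a similar declarative way while the inference algorithm is supplied by the system, which matters once the model is too large to enumerate by hand.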
Skill #10: Interdisciplinary Research
1. Team Science: Discoveries about healthcare, not hypothesised a priori, have been made by experts explaining structure learned from data by algorithms tuned by those experts
2. Heuristic blend of biostatistics and machine learning for principled problem-led healthcare research
3. An ML approach to extracting knowledge from information in healthcare requires persistent integration of data, methods and expertise
Problem-led vs Data-driven Health
Think deeply about the clinical context. Find solutions which are specific to the problem.
Good science is about merging different schools of thought for developing the bigger picture.
Data-driven approach + Domain Knowledge = Problem-led approach with the patient at the centre
Danielle Belgrave, John Henderson, Angela Simpson, Iain Buchan, Christopher Bishop, and Adnan Custovic. "Disaggregating asthma: Big investigation versus big data." Journal of Allergy and Clinical Immunology 139, no. 2 (2017): 400-407.
Take Home Message
Problem-Led Patient-Centred Research
Healthcare ML at MSR Cambridge
Javier Alvarez-Valle, Danielle Belgrave, Chris Bishop, Laurence Bourn, David Carter, Isabel Chien, Antonio Criminisi, Pratik Ghosh, Richard Lowe, Hannah Murfet, Jay Nanavati, Aditya Nori, Kenton O’Hara, Konstantina Palla, Tim Regan, Anton Schwaighofer, Jan Steumer, Kenji Takeda, Ivan Tarapov, Anja Thieme, Sebastian Tschiatschek, Stefan Wijnen
Thank You
@DaniCMBelg
Cohort Study: Example
• A long-term follow-up study of doctors began in 1950 (Doll and Hill)
• Questionnaire sent to all men and women on the British Medical Register and resident in the UK
• Asked about age and smoking habits
• Replies obtained from 34,440 men
• Subsequent follow-up focused on men
• Further questionnaires sent at intervals and asked about changes in smoking habit
• During 1951-1971, 10,074 had died
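Missing Completely At Random
[Figure: graphical model for individual i with nodes $x_i$, $y_i$, $\mu_i$, $\beta$, $\sigma^2$, $m_i$, $p_i$ and $\theta$, split into a Model of Interest and a Model of Missingness]
$\mathrm{logit}(p_{it}) = \theta_0$
Missing At Random
[Figure: graphical model for individual i with nodes $x_i$, $y_i$, $\mu_i$, $\beta$, $\sigma^2$, $m_i$, $p_i$ and $\theta$, split into a Model of Interest and a Model of Missingness]
$\mathrm{logit}(p_{it}) = \theta_0 + \theta_1 x_i$
Missing Not At Random
[Figure: graphical model for individual i with nodes $x_i$, $y_i$, $\mu_i$, $\beta$, $\sigma^2$, $m_i$, $p_i$ and $\theta$, split into a Model of Interest and a Model of Missingness]
$\mathrm{logit}(p_{it}) = \theta_0 + \theta_3 y_{it}$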
Prognostic Biomarker (Risk Factor)
A biological measurement made before treatment to indicate long-term outcome for patients either untreated or receiving standard treatment
[Figure: causal diagram with nodes Prognostic biomarker (risk factor), Random Allocation (Treatment) and Outcomes]
Dunn, Graham, Richard Emsley, Hanhua Liu, and Sabine Landau. "Integrating biomarker information within trials to evaluate treatment mechanisms and efficacy for personalised medicine." Clinical Trials 10, no. 5 (2013): 709-719.
Predictive Biomarker (Moderator)
A variable that changes the impact of treatment on the outcome. A biological measurement made before treatment to identify patients likely or unlikely to benefit from a particular treatment
[Figure: causal diagram with nodes Predictive biomarker (moderator), Random Allocation (Treatment) and Outcomes]
Mediator
A mechanism by which one variable affects another variable. Omitted common causes (hidden confounding) should always be considered as a possible explanation for associations that might be interpreted as causal
[Figure: causal diagram with nodes Random Allocation (Treatment), Mediator, Outcomes and unobserved U]
Efficacy and mechanism evaluation: Causal Framework for investigating who medications work for
[Figure: causal diagram with nodes Prognostic biomarker (risk factor), Random Allocation, Predictive biomarker (moderator), Mediator, Outcomes and unobserved U]
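Moving the Threshold: right
[Figure: distributions of IgE response to Ara h 2 for patients with and without peanut allergy, with the "-" / "+" threshold moved right from 3.15 to 5.15, 7.15 and 9.15]
Moving the Threshold: left
[Figure: distributions of IgE response to Ara h 2 for patients with and without peanut allergy, with the "-" / "+" threshold moved left from 3.15 to 2.15, 1.15 and 0.15]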

Day 1 (Lecture 3): Predictive Analytics in Healthcare

  • 1. Predictive Analytics in Healthcare: Building Successful Pipelines Danielle Belgrave Researcher Healthcare Machine Learning @DaniCMBelg
  • 2. Latent Variable Modelling My Research Missing DataLongitudinal Data Analysis 0 2 4 6 8 10 12 14 16 18 20 22 24 Time (hours) Patient 1 Patient 2 Patient 3 Patient 4 MultidisciplinaryCausality X Y Z Patient-Centric Approach
  • 3. Roadmap: Predictive Analytics for Healthcare Part 1 • Drivers ofpredictive analyticsin healthcare Part 2 • Evaluationof data-driven healthcare Part3 • Important concepts in study design Part 4 • Top 10 DataScience Skills
  • 4. Part 1: Drivers of Predictive Analytics in healthcare PopulationPatient/Person Pharmaceuticals Providers TheKey Stakeholders inHealth
  • 5. ThePerson at the Centre of Healthcare Patient/Person MachineLearninghas the capacityto transformhealthcare - Understandingphysiological changes overtime - Forecastingofprogression or onsetof disease - Personalisingtreatmentstrategies
  • 6. Population Data-driven Healthcare Population Elucidatesaverage effectsand deviations fromaverage effects Policy recommendations Healtheducation Outreach Researchfor diseasedetectionand injury prevention Reducehealthcareinequalities Whatwe as asociety do collectively toassurethe conditionsin which people can be healthy
  • 7. The Pharmaceutical Perspective: Drug Discoveryand Therapeutics Idea Basic Research Clinical Trails Phase I Phase II Phase III Regulatory Approval Patient Care
  • 8. Data Protection and Connected Care: The Provider and Regulator Perspective Providers
  • 9. Predictive Analytics forHealthcare Part 1 • Drivers of predictive analyticsin healthcare Part 2 • Evaluation of data-driven healthcare Part3 • Important concepts in study design Part 4 • Top 10 DataScience Skills
  • 10. DATA EXPERTISE METHODS & MODELS Vast data volume, velocity, variety TSUNAMI Supra-linear growth in papers & tools BLIZZARD Human expertise to make sense of the growing data DROUGHT Myth 1: ‘Big data’ are a solution Myth 2: We have the models needed to turn healthcare data into decision support algorithms Myth 3: Clinicians will continue to be the main source of data Current Stateof Healthcare Data
  • 11. Machine Learning has the Potential to Disrupt and Impact Healthcare…
  • 12. …But it’s Important to Remember the Beginnings The study of the distribution and determinantsof healthrelated statesor events in specific populations & theapplications of this study to thecontrol of healthproblems
  • 13. Data visualisation: death toll of the Crimean War Army data: 16,000/18,000 deaths notdue to battle wounds, but to preventable diseases, spread by poor sanitation The Beginnings of Data-Driven Health Florence Nightingale(1820– 1910)
  • 14. Anchor Data Science in The Power ofObservation Contextualphenomena:cholera incidence Ecologicaldesign:compare cholera rates by region Cohort design:comparecholera ratesin exposedand non- exposedindividuals
  • 15. R.A. Fisher and the Principles of Experimental Design 1.Randomisation:Unbiasedallocationof treatmentsto differentexperimentalplot 2.Replication:repetitionofthe treatment tomorethan oneexperimentalplot 3.Error control: Measurefor reducing the error ofvariance Why do these 2 plants differ in growth?
  • 16. Predictive Analytics forHealthcare Part 1 • Drivers of predictive analyticsin healthcare Part 2 • Evaluationof data-driven healthcare Part3 • Important conceptsin study design Part 4 • Top 10 DataScience Skills
  • 17. Principles of Study Design Needtosetup a studytoanswer aresearch question Designmost importantaspect of a studyand perhaps themostneglected Thestudydesign should matchresearch question So that we don’t endup collecting useless data orthe principle outcome ends up not being recorded Nomatter howgoodanalgorithmis,if the studydesign is inadequate (garbage in) for answeringthe research question,we’ll get garbage out
  • 18. Good Predictive Analyticsfor Healthcare starts with Good Study Design Threequestions: 1. Whatis thequestionor hypothesis I’mtrying toinvestigate 2. Is thedataI have adequateinorder to address thisquestion 3. How can I design astudythataddresses thesequestions
  • 19. Types of Study Design Non-Experimental Observational Studies Descriptive Case Reports Case Series Cross-Sectional or Prevalence Study Analytical Case-control Cohort Study Experimental Intervention Studies Randomised Clinical Trial Non-randomized/ Field/ Community Trial
  • 20. CaseReport Profileofa singlepatientis reportedin detailby oneor more clinician
  • 21. CaseSeries An individualcase report thathas been expandedtoincludea number ofpatientswitha givendisease
  • 22. Cross-sectional study • Allinformationcollectedatsametimepoint • Snapshots of healthstate • Designedtoobtaininformationfromsamplesregarding prevalence, distributionsand interrelations • Eg:survey • Thereare no typical formats:they aredesignedor modifiedtomeettheneeds ofthe researcher or fitthetopicofresearch
  • 23. Cross-sectional Study Source Population Exposed & Bad outcome Exposed & Good outcome Unexposed & Bad outcome Unexposed & Good outcome Ineligible Eligible Participation No Paricipation
  • 24. Types of Study Design Non-Experimental Observational Studies Descriptive Case Reports Case Series Cross-Sectional or Prevalence Study Analytical Case-control Cohort Study Experimental Intervention Studies Randomised Clinical Trial Non-randomized/ Field/ Community Trial
  • 25. Observational Studies • Analytical • Main objective is to test a hypothesis of a relationship between exposure to a risk factor and disease or other health outcome • A measure of association is estimated • The magnitude, precision and statistical significance of the association are determined
  • 26. Cohort Study • Identify a group of subjects • Follow them over time • Compare event rate in: • Subjects exposed to risk factors and • Those not exposed to risk factors • Can be both retrospective and prospective
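The comparison of event rates between exposed and unexposed subjects is commonly summarised as a relative risk. The sketch below is one way to compute it; the function name and the counts are hypothetical and are not taken from the talk.

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Ratio of the event rate in exposed subjects to the rate in unexposed subjects."""
    rate_exposed = events_exposed / n_exposed
    rate_unexposed = events_unexposed / n_unexposed
    return rate_exposed / rate_unexposed


# Hypothetical cohort: 30 of 200 exposed and 10 of 200 unexposed subjects have the event.
rr = relative_risk(30, 200, 10, 200)
print(f"relative risk = {rr:.2f}")
```

A relative risk above 1 indicates a higher event rate among exposed subjects in the cohort; it does not by itself rule out chance, bias or confounding.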
  • 29. Important Concept: Randomisation Definition: The process by which allocation of subjects to treatment groups is done by chance, without the ability to predict who is in what group. Aims: - To prevent statistical bias in allocating subjects to treatment groups - To achieve comparability between the groups - To ensure samples are representative of the general population
  • 30. Methods of Randomisation: Simple Random Sampling (figure: a sample drawn from a numbered population of 12); Permuted Block Randomisation (figure: example sequences of A/B allocations); Stratified Random Sampling (figure: Population, Strata, Sample)
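As an illustration of one of the methods named on this slide, here is a minimal sketch of permuted block randomisation. The function name, the block size of 4 and the arm labels "A" and "B" are assumptions made for the example, not details from the talk.

```python
import random


def permuted_block_randomisation(n_subjects, block_size=4, arms=("A", "B"), seed=None):
    """Allocate subjects in shuffled blocks so the arms are balanced after every full block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_subjects]


allocation = permuted_block_randomisation(12, block_size=4, seed=1)
print("".join(allocation))
```

Because every block contains each arm equally often, group sizes can never drift far apart, which simple coin-flip allocation does not guarantee in small studies.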
  • 31. Predictive Analytics for Healthcare Part 1 • Drivers of predictive analytics in healthcare Part 2 • Evaluation of data-driven healthcare Part 3 • Important concepts in study design Part 4 • Top 10 Data Science Skills
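  • 32. Skill #1: ROC Curves: Useful Tool for Diagnostic Test Evaluation. True state vs. test result: No disease (D = 0), test says no disease: correct (Specificity); No disease (D = 0), test says disease: Type I error (False +), α; Disease (D = 1), test says no disease: Type II error (False -), β; Disease (D = 1), test says disease: correct (Power 1 - β; Sensitivity)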
  • 33. Specific Example: How well does the IgE “score” for the Ara h 2 peanut component predict/diagnose peanut allergy? Figure: IgE response to Ara h 2 for Pts with peanut allergy and Pts without the peanut allergy
  • 34. Figure: IgE response to Ara h 2, Threshold IgE = 3.15. On one side of the threshold: call these patients “non-peanut allergic”. On the other side: call these patients “peanut allergic”
  • 35. Some definitions ... True Positives: patients with peanut allergy who are called “peanut allergic” at the threshold IgE = 3.15
  • 36. False Positives: patients without peanut allergy who are called “peanut allergic” at the threshold IgE = 3.15
  • 37. True negatives: patients without peanut allergy who are called “non-peanut allergic” at the threshold IgE = 3.15
  • 38. False negatives: patients with peanut allergy who are called “non-peanut allergic” at the threshold IgE = 3.15
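The four definitions can be turned into counts for a single threshold. In this sketch the IgE values are hypothetical (not data from the talk), and calling a patient “peanut allergic” when the value is at or above the threshold is an assumed decision rule.

```python
def confusion_counts(values, has_allergy, threshold):
    """Count true/false positives and negatives when values >= threshold are called allergic."""
    tp = fp = tn = fn = 0
    for value, allergic in zip(values, has_allergy):
        called_allergic = value >= threshold
        if called_allergic and allergic:
            tp += 1
        elif called_allergic and not allergic:
            fp += 1
        elif not called_allergic and not allergic:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Hypothetical IgE responses to Ara h 2: six patients without and six with peanut allergy.
ige = [0.2, 0.9, 1.5, 2.8, 3.4, 4.0, 2.1, 3.3, 5.6, 8.2, 9.7, 12.0]
allergic = [False] * 6 + [True] * 6

tp, fp, tn, fn = confusion_counts(ige, allergic, threshold=3.15)
sensitivity = tp / (tp + fn)  # proportion of allergic patients called allergic
specificity = tn / (tn + fp)  # proportion of non-allergic patients called non-allergic
print(tp, fp, tn, fn, round(sensitivity, 2), round(specificity, 2))
```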
  • 40. Skill #2: Power and Sample Size Calculation https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f $n = \frac{r+1}{r} \cdot \frac{\sigma^2 (Z_\beta + Z_{\alpha/2})^2}{\text{difference}^2}$ where n = sample size in the case group; r = ratio of controls to cases; σ = standard deviation of the outcome variable; Z_β represents the desired power (typically 0.84 for 80% power); Z_{α/2} represents the level of statistical significance (typically 1.96); difference = effect size (the difference in means)
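A minimal sketch of the sample size formula on this slide, n = ((r + 1) / r) · σ² (Z_β + Z_{α/2})² / difference². The function name and the example inputs (σ = 10, difference = 5) are hypothetical; the z-values are computed from the standard normal distribution rather than hard-coded as 0.84 and 1.96.

```python
import math
from statistics import NormalDist


def cases_needed(sigma, difference, r=1.0, power=0.80, alpha=0.05):
    """Sample size in the case group for comparing two means (r = controls per case)."""
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for two-sided alpha = 0.05
    n = (r + 1) / r * sigma ** 2 * (z_beta + z_alpha) ** 2 / difference ** 2
    return math.ceil(n)  # round up to a whole subject


n_cases = cases_needed(sigma=10.0, difference=5.0)
print(n_cases)
```

Halving the detectable difference roughly quadruples the required sample size, since the difference enters the formula squared.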
  • 41. Skill #3: Data Visualisation https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
  • 42. Skill #4: A/B Testing Patient Group: Population is split into 2 groups by random allocation (Intervention and Control). Outcomes for both groups are measured (figure legend: Cured / Still Diseased)
  • 43. Skill #5: Linear Regression • Linear regression allows us to look at the linear relationship between one normally distributed interval predictor and one normally distributed interval outcome variable • Example: We wish to look at the relationship between FEV1 and IgE at age 5 • In other words, predicting FEV1 from IgE
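One common way to compare the measured outcomes of the two randomly allocated groups is a two-proportion z-test. The talk does not name a specific test, so this choice, the function name and the cure counts below are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(cured_a, n_a, cured_b, n_b):
    """Two-sided z-test for a difference in cure proportions between two groups."""
    p_a = cured_a / n_a
    p_b = cured_b / n_b
    pooled = (cured_a + cured_b) / (n_a + n_b)  # cure proportion under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical trial: 60 of 100 cured on intervention, 45 of 100 cured on control.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 3), round(p_value, 3))
```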
  • 44. Linear Regression: Example • We see that the relationship between FEV1 and Total IgE at age 5 is negative (B = -0.011) • Based on the p-value (0.027, which is <0.05), we would conclude this relationship is statistically significant. • Hence, we would say there is a statistically significant negative linear relationship between FEV1 and total IgE at age 5. Coefficients (Dependent Variable: FEV1, age 5): (Constant): B = 1.146, Std. Error = .033, t = 35.014, Sig. = .000; Total serum IgE, age 5: B = -.011, Std. Error = .005, standardized Beta = -.221, t = -2.242, Sig. = .027
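To show where an intercept and slope like the ones in the coefficients table come from, here is a small least-squares sketch. The data are hypothetical and deliberately lie on an exact line (FEV1 = 1.15 - 0.01 × IgE); they are not the study data, and the function name is illustrative.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical total IgE and FEV1 values at age 5.
total_ige = [1.0, 2.0, 3.0, 4.0, 5.0]
fev1 = [1.14, 1.13, 1.12, 1.11, 1.10]

intercept, slope = fit_simple_linear_regression(total_ige, fev1)
print(round(intercept, 3), round(slope, 3))
```

The t statistic reported by statistical packages is the coefficient divided by its standard error, which is why a small slope can still be statistically significant when its standard error is smaller still.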
  • 45. Skill #6: Logistic Regression. Cross-tabulation of asthma/wheeze symptom between 0 to 8 years by ever antibiotic in first 12 months (row percentages): Control: No antibiotic 165 (39.1%), Yes antibiotic 257 (60.9%), Total 422 (100.0%); Case: No antibiotic 98 (20.5%), Yes antibiotic 380 (79.5%), Total 478 (100.0%); Total: No antibiotic 263 (29.2%), Yes antibiotic 637 (70.8%), Total 900 (100.0%). Odds of developing asthma in the antibiotic group are: (380/637)/(257/637) = 380/257 = 1.48. Odds of developing asthma in the non-antibiotic group are: (98/263)/(165/263) = 98/165 = 0.59. The ratio of the odds for antibiotic users to the odds for non-antibiotic users is: (380/257)/(98/165) = (380*165)/(257*98) = 2.49. The odds for antibiotic users are 2.49 times the odds for non-users
  • 46. What are the Odds? Variables in the Equation (variable entered on step 1: evrab12m): evrab12m: B = .912, S.E. = .151, Wald = 36.506, df = 1, Sig. = .000, Exp(B) = 2.489, 95.0% C.I. for Exp(B) = 1.852 to 3.347; Constant: B = -.521, S.E. = .128, Wald = 16.688, df = 1, Sig. = .000, Exp(B) = .594. The intercept of -0.52 is the log odds for non-antibiotic users since this is the reference group: exp(-0.52) = 0.59. The coefficient of evrab12m is the log of the odds ratio between the antibiotic users and non-antibiotic users
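The logistic regression output can be cross-checked by hand from the counts in the cross-tabulation (380 and 257 for antibiotic users, 98 and 165 for non-users). The sketch below does so; the standard error uses the usual large-sample formula for a log odds ratio from a 2×2 table, which is an addition here and not something stated on the slides.

```python
import math

# Counts from the cross-tabulation on the slide.
cases_antibiotic, controls_antibiotic = 380, 257
cases_no_antibiotic, controls_no_antibiotic = 98, 165

odds_antibiotic = cases_antibiotic / controls_antibiotic
odds_no_antibiotic = cases_no_antibiotic / controls_no_antibiotic
odds_ratio = odds_antibiotic / odds_no_antibiotic

# Logistic regression scale: intercept = log odds in the reference group,
# coefficient = log of the odds ratio.
intercept = math.log(odds_no_antibiotic)
coefficient = math.log(odds_ratio)

# Large-sample standard error of the log odds ratio (assumed formula, see lead-in).
se = math.sqrt(1 / cases_antibiotic + 1 / controls_antibiotic
               + 1 / cases_no_antibiotic + 1 / controls_no_antibiotic)
ci_low = math.exp(coefficient - 1.96 * se)
ci_high = math.exp(coefficient + 1.96 * se)

print(round(odds_ratio, 2), round(coefficient, 3), round(intercept, 2))
print(round(se, 3), round(ci_low, 2), round(ci_high, 2))
```

The values agree with the reported B, S.E., Exp(B) and confidence interval to the precision shown in the table.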
  • 47. Skill #7: Causal Reasoning. The questions that motivate most studies in the health, social and behavioral sciences are not associational but causal in nature. Before an association is assessed for the possibility that it is causal, other explanations such as chance, bias and confounding have to be excluded. Require some knowledge of the data-generating process - cannot be computed from the data alone, nor from distributions governing data. Aim: to infer dynamics of beliefs under changing conditions, for example, changes induced by treatments or external interventions. Pearl, Judea. "Causal inference in statistics: An overview." Statistics Surveys 3 (2009): 96-146.
  • 48. Bradford-Hill Principles of Causality. Plausibility: Does causation make sense? Consistency: Cause associated with disease in different populations and studies. Temporality: Cause precedes disease. Strength: Cause strongly associated with disease. Specificity: Does the cause lead to a specific effect? Dose-Response: Greater exposure to cause, higher the risk of disease
  • 49. Example: Personalisation of Cancer Treatment (causal diagram with nodes: Prognostic biomarker (risk factor), Treatment, Genetic Marker, Tumor Size, Outcome (Survival), U)
  • 50. Skill #8: Missing Data. Missing Completely At Random (MCAR): The probability of data being missing does not depend on the observed or unobserved data, e.g. logit(p_it) = θ_0. Missing At Random (MAR): The probability of data being missing does not depend on the unobserved data, conditional on the observed data, e.g. children with missing wheeze data have better lung function; e.g. logit(p_it) = θ_0 + θ_1 t_i or logit(p_it) = θ_0 + θ_2 y_0. Missing Not At Random (MNAR): The probability of data being missing does depend on the unobserved data, conditional on the observed data, e.g. children with missing lung function have better lung function; e.g. logit(p_it) = θ_0 + θ_3 y_it. Alexina Mason. “Bayesian methods for modelling non-random missing data mechanisms in longitudinal studies” PhD Thesis (2009)
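The three mechanisms can be illustrated with a small simulation in which the probability of missingness follows the logit models above. Everything numeric here is a hypothetical choice for the sketch: the coefficient values (θ_0 = -0.5, slopes of 1.0), the way the outcome y depends on a fully observed baseline y0, and the sample size.

```python
import math
import random


def observed_vs_full_mean(mechanism, n=20000, seed=0):
    """Simulate an outcome y, delete values under a missingness mechanism, compare means."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)            # baseline value, always observed
        y = 0.8 * y0 + rng.gauss(0.0, 0.6)  # outcome that may go missing
        if mechanism == "MCAR":
            logit_p = -0.5                  # theta_0 only
        elif mechanism == "MAR":
            logit_p = -0.5 + 1.0 * y0       # depends on observed data only
        elif mechanism == "MNAR":
            logit_p = -0.5 + 1.0 * y        # depends on the value that goes missing
        else:
            raise ValueError("mechanism must be 'MCAR', 'MAR' or 'MNAR'")
        p_missing = 1.0 / (1.0 + math.exp(-logit_p))
        full.append(y)
        if rng.random() >= p_missing:
            observed.append(y)
    return sum(full) / len(full), sum(observed) / len(observed)


for mechanism in ("MCAR", "MAR", "MNAR"):
    full_mean, observed_mean = observed_vs_full_mean(mechanism)
    print(mechanism, round(full_mean, 3), round(observed_mean, 3))
```

Under MCAR the mean of the observed values stays close to the mean of the full data. Under MAR and MNAR the naive observed mean is pulled downwards in this setup; in the MAR case the bias can be addressed by conditioning on the observed y0, whereas under MNAR the missingness mechanism itself has to be modelled.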
  • 51. The Challenge of Missing Data. Missing data is a common problem in healthcare data and can produce biased parameter estimates. Reasons for missingness may be informative for estimating model parameters. Bayesian models: coherent approach to incorporating uncertainty by assigning prior distributions. Mason, Alexina, Nicky Best, Sylvia Richardson, and Ian Plewis. "Strategy for modelling non-random missing data mechanisms in observational studies using Bayesian methods." Journal of Official Statistics (2010)
  • 52. Skill #9: Latent Variable Modelling using Probabilistic Programming. To identify subgroups of complex disease risk. Treatment outcome explained by distinctive underlying mechanism. Foundation of Stratified Medicine. Seeking better-targeted interventions
  • 54. Defining Asthma. Phenotypes: Observable Manifestations of Disease (Poor Lung Function, Wheeze, Allergy, Asthma Medication, Asthma Symptoms, Exacerbations, Grow out of Asthma, Asthma Late in Childhood, Respond to treatment, Don’t Respond to treatment, Severity). Endotypes Discovery: Different Diseases With Different Causes
  • 55. Probabilistic Programming: Finding Patterns in Data. The basic components of a probabilistic reasoning system: Probabilistic model, Evidence, Queries, Inference Algorithm, Answer. The probabilistic model expresses general knowledge about a situation. The evidence contains specific information about a situation. The queries express the things that will help you make a decision. The inference algorithm uses the model to answer queries given evidence. The answers to queries are framed as probabilities of different outcomes. Adapted from Pfeffer, Avi. "Practical probabilistic programming." International Conference on Inductive Logic Programming. Springer Berlin Heidelberg, 2010.
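The components of a probabilistic reasoning system can be seen in a toy diagnostic example. Here the model is a disease prevalence plus a test's sensitivity and specificity, the evidence is a positive test, the query is the probability of disease, and inference is exact enumeration via Bayes' rule. All numbers and names are hypothetical, and this plain-Python sketch stands in for what a probabilistic programming system would automate.

```python
def posterior_disease_given_positive(prior, sensitivity, specificity):
    """Answer the query P(disease | positive test) by enumerating both disease states."""
    joint_disease = prior * sensitivity             # P(disease, positive test)
    joint_healthy = (1 - prior) * (1 - specificity)  # P(no disease, positive test)
    return joint_disease / (joint_disease + joint_healthy)


# Model: 10% prevalence, 90% sensitivity, 80% specificity. Evidence: a positive test.
posterior = posterior_disease_given_positive(prior=0.1, sensitivity=0.9, specificity=0.8)
print(round(posterior, 3))
```

The answer is framed as a probability: even after a positive test, disease remains less likely than not in this example because the prior prevalence is low.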
  • 56. Skill #10: Interdisciplinary Research 1. Team Science: Discoveries about healthcare, not hypothesised a priori, have been made by experts explaining structure learned from data by algorithms tuned by those experts 2. Heuristic blend of biostatistics and machine learning for principled problem-led healthcare research 3. An ML approach to extracting knowledge from information in healthcare requires persistent integration of Data, Methods and Expertise
  • 57. Problem-led vs Data-driven Health. Think deeply about the clinical context. Find solutions which are specific to the problem. Good science is about merging different schools of thought for developing the bigger picture. Data-driven approach + Domain Knowledge = Problem-led approach with the patient at the centre. Danielle Belgrave, John Henderson, Angela Simpson, Iain Buchan, Christopher Bishop, and Adnan Custovic. "Disaggregating asthma: Big investigation versus big data." Journal of Allergy and Clinical Immunology 139, no. 2 (2017): 400-407.
  • 58. Take Home Message Problem-Led Patient-Centred Research
  • 59. Healthcare ML at MSR Cambridge: Javier Alvarez-Valle, Danielle Belgrave, Chris Bishop, Laurence Bourn, David Carter, Richard Lowe, Hannah Murfet, Jay Nanavati, Kenton O’Hara, Konstantina Palla, Anton Schwaighofer, Kenji Takeda, Ivan Tarapov, Stefan Wijnen, Aditya Nori, Pratik Ghosh, Anja Thieme, Tim Regan, Isabel Chien, Jan Steumer, Antonio Criminisi, Sebastian Tschiatschek
  • 61. Cohort Study: Example • A long-term follow-up study of doctors began in 1950 (Doll and Hill) • Questionnaire sent to all men and women on British Medical Register and Resident in the UK • Asked about age and smoking habits • Replies obtained from 34,440 men • Subsequent follow-up focused on men • Further questionnaires sent at intervals and asked about changes in smoking habit • During 1951-1971 10,074 had died
  • 62. Missing Completely At Random (graphical model for individual i: Model of Interest with x_i, β, µ_i, σ², y_i; Model of Missingness with θ, p_i, m_i): logit(p_it) = θ_0
  • 63. Missing At Random (graphical model for individual i: Model of Interest with x_i, β, µ_i, σ², y_i; Model of Missingness with θ, p_i, m_i): logit(p_it) = θ_0 + θ_1 x_i
  • 64. Missing Not At Random (graphical model for individual i: Model of Interest with x_i, β, µ_i, σ², y_i; Model of Missingness with θ, p_i, m_i): logit(p_it) = θ_0 + θ_3 y_it
  • 65. Prognostic Biomarker (Risk Factor): A biological measurement made before treatment to indicate long-term outcome for patients either untreated or receiving standard treatment. Diagram: Prognostic biomarker (risk factor), Random Allocation (Treatment), Outcomes. Dunn, Graham, Richard Emsley, Hanhua Liu, and Sabine Landau. "Integrating biomarker information within trials to evaluate treatment mechanisms and efficacy for personalised medicine." Clinical Trials 10, no. 5 (2013): 709-719.
  • 66. Predictive Biomarker (Moderator): A variable that changes the impact of treatment on the outcome. A biological measurement made before treatment to identify patients likely or unlikely to benefit from a particular treatment. Diagram: Predictive biomarker (Moderator), Random Allocation (Treatment), Outcomes
  • 67. Mediator: A mechanism by which one variable affects another variable. Omitted common causes (hidden confounding) should always be considered as a possible explanation for associations that might be interpreted as causal. Diagram: Random Allocation (Treatment), Mediator, Outcomes, U
  • 68. Efficacy and mechanism evaluation: Causal Framework for investigating who medications work for. Diagram: Prognostic biomarker (risk factor), Random Allocation, Predictive biomarker (moderator), Mediator, Outcomes, U
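  • 69. Moving the Threshold: right. Figure: IgE response to Ara h 2 for patients without and with peanut allergy, with regions labelled “-” and “+”; thresholds shown at 3.15, 5.15, 7.15 and 9.15
  • 70. Moving the Threshold: left. Figure: IgE response to Ara h 2 for patients without and with peanut allergy, with regions labelled “-” and “+”; thresholds shown at 3.15, 2.15, 1.15 and 0.15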
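Moving the threshold right or left trades sensitivity against specificity, and recording that trade-off at every threshold traces out the ROC curve. The sketch below does this for hypothetical IgE values (not data from the talk), assuming that values at or above the threshold are called “+”.

```python
def roc_points(values, has_disease):
    """Return (false positive rate, true positive rate) pairs, one per candidate threshold."""
    n_pos = sum(has_disease)
    n_neg = len(has_disease) - n_pos
    points = [(0.0, 0.0)]  # threshold above every value: nobody is called "+"
    for threshold in sorted(set(values), reverse=True):
        tp = sum(1 for v, d in zip(values, has_disease) if v >= threshold and d)
        fp = sum(1 for v, d in zip(values, has_disease) if v >= threshold and not d)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical IgE responses: six patients without and six with peanut allergy.
ige = [0.2, 0.9, 1.5, 2.8, 3.4, 4.0, 2.1, 3.3, 5.6, 8.2, 9.7, 12.0]
allergic = [False] * 6 + [True] * 6

points = roc_points(ige, allergic)
print(round(auc(points), 3))
```

Moving the threshold to the right lowers the false positive rate at the cost of missing more allergic patients, and moving it left does the opposite; the area under the curve summarises performance across all thresholds.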