October 13, 2016
Deploying Predictive Analytics:
A practitioner’s guide
Eric Just
Senior Vice President
Levi Thatcher
Director of Data Science
Deploying Predictive Analytics in Healthcare
© 2016 Health Catalyst
Proprietary and Confidential
Poll Question #1
How important are predictive analytics for the future of healthcare?
1) Not at all important
2) Low importance
3) Neutral
4) Moderately important
5) Extremely important
6) Unsure or not applicable
Predictive analytics is about using pattern recognition to predict future events, but…
Predicting something is not good enough; you must have the data to act and intervene…
and the organizational wherewithal to intervene.
What Is Machine Learning?
Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction. In commercial use, this is known as predictive analytics.
https://en.wikipedia.org/wiki/Machine_learning
Predictive Analytics in Healthcare: “Classic” Approaches
Charlson Index, 1987
LACE Index, 2010
What Has Happened Since 2010?
ML-Based Predictive Analytics
Poll Question #2
What is the biggest barrier to implementing predictive analytics?
a) We do not have the people or skills
b) We do not have the right data or technical tools/infrastructure
c) We do not have executive support or budget
d) Past efforts have failed to show results
e) Other
f) Unsure or not applicable
Predictive analytics is easy (or at least easier!)
Organizations are struggling with making predictive analytics routine, pervasive, and actionable.
Typical ‘Current State’ for Predictive Analytics
[Diagram labels: Data Source; Gnarly SQL Query; Data Manipulation; Tools/Algorithms (SAS | Weka | R | Python); Predictive Model; Deploy; ?]
Three Key Recommendations for Scaling Predictive Analytics
• Fully leverage your analytics environment
• Standardize tools and methods using production-quality code
• Deploy with a strategy for intervention
Fully Leverage Your Analytics Environment
What is a Feature?
“In machine learning and pattern recognition, a feature is an individual measurable property of a phenomenon being observed. Choosing informative, discriminating and independent features is a crucial step for effective algorithms in pattern recognition, classification and regression.”
https://en.wikipedia.org/wiki/Feature_(machine_learning)
Leverage Your Analytics Environment
A data warehouse provides access to raw data and pre-defined data like:
• Clinical registries
• Comorbidity models (e.g., Charlson Score)
• Readmissions
• Length of stay
• Other calculated fields
Read-only access is not enough!
Polypharmacy Feature
Polypharmacy Data Mart
PatientID     MedicationID  StartDT     EndDT
Z006F600A51F  15862         7/27/2006   7/28/2006
Z006F600A51F  41801         7/27/2006   10/28/2009
Z006F600A51F  10994         7/27/2006   NULL
Z006F600A51F  15862         7/27/2006   7/28/2006
Z006F600A51F  41801         7/27/2006   NULL
Z006F600A51F  10994         7/27/2006   NULL
Z006F600A51F  15862         7/27/2006   7/28/2006
Z006F600A51F  41801         7/27/2006   NULL
Z006F600A51F  10994         7/27/2006   NULL
Z13148BF2583  4996          9/14/2005   11/15/2005
Z13148BF2583  11798         9/14/2005   11/15/2005
Z13148BF2583  15061         9/14/2005   11/15/2005
Z13148BF2583  2079          11/15/2005  11/15/2005

PatientEncounterID  PolypharmacyCNT
1048826             6
1048912             0
1048923             0
1048924             0
1048925             0
1048926             0
1049094             2
1049095             2
1049096             2
1049097             3
1049098             3
1049099             4
1049100             2

Step 1: Clean up data
• Missing end dates
• One-time doses
Step 2: Aggregate into polypharmacy count at each encounter
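The two steps above can be sketched in Python. This is only an illustration, not the deck's actual implementation: the function name `polypharmacy_count`, the rule that a missing end date means the medication is still active, and the rule that a record lasting a day or less is a one-time dose are all assumptions made for the example. The rows are a subset of the slide's example table.

```python
from datetime import date

# Medication rows as (PatientID, MedicationID, StartDT, EndDT); None stands for NULL.
MED_ROWS = [
    ("Z006F600A51F", 15862, date(2006, 7, 27), date(2006, 7, 28)),
    ("Z006F600A51F", 41801, date(2006, 7, 27), date(2009, 10, 28)),
    ("Z006F600A51F", 10994, date(2006, 7, 27), None),
    ("Z13148BF2583", 4996, date(2005, 9, 14), date(2005, 11, 15)),
    ("Z13148BF2583", 11798, date(2005, 9, 14), date(2005, 11, 15)),
    ("Z13148BF2583", 15061, date(2005, 9, 14), date(2005, 11, 15)),
    ("Z13148BF2583", 2079, date(2005, 11, 15), date(2005, 11, 15)),
]


def polypharmacy_count(rows, patient_id, encounter_date):
    """Count distinct medications active for a patient on the encounter date."""
    active = set()
    for pid, med_id, start, end in rows:
        if pid != patient_id:
            continue
        # Step 1 (assumed rule): skip one-time doses, i.e. records lasting a day or less.
        if end is not None and (end - start).days <= 1:
            continue
        # Step 1 (assumed rule): a missing end date means the medication is still active.
        still_active = end is None or encounter_date <= end
        if start <= encounter_date and still_active:
            active.add(med_id)
    # Step 2: aggregate to a count of distinct active medications.
    return len(active)
```

In a warehouse setting the same logic would more likely live in SQL or an ETL job; the Python form is used here only to make the cleaning rules explicit.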
What Is Feature Engineering?
“Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy...”
Jason Brownlee in “Discover Feature Engineering, How to Engineer Features and How to Get Good at It”
http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/
“Much of the success of machine learning is actually success in engineering features that a learner can understand.”
Scott Locklin in “Neglected Machine Learning Ideas”
https://scottlocklin.wordpress.com/2014/07/22/neglected-machine-learning-ideas/
Other Examples of Feature Engineering
The ability of data scientists to engineer features is critical to a successful predictive analytics/machine learning strategy.
• Number of ER visits in the last year
• Line days
• Number and types of comorbid conditions
Almost any input into a predictive model will be engineered in some way.
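As a small illustration of one of these engineered features, "number of ER visits in the last year" could be computed as below. The function name `er_visits_last_year` and the choice of a 365-day window that excludes the as-of date itself are assumptions for the sketch, not details from the deck.

```python
from datetime import date, timedelta


def er_visits_last_year(visit_dates, as_of):
    """Count ER visits in the 365 days before `as_of` (the as-of date is excluded)."""
    window_start = as_of - timedelta(days=365)
    return sum(1 for d in visit_dates if window_start <= d < as_of)


# Hypothetical visit history for one patient.
visits = [date(2015, 1, 10), date(2015, 11, 2), date(2016, 3, 5), date(2016, 10, 13)]
count = er_visits_last_year(visits, as_of=date(2016, 10, 13))
```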
Fully Leverage Your Analytics Environment
• Most feature engineering should be done in the analytics environment (data warehouse).
• Give data scientists enough access to the data warehouse to promote efficient re-use of engineered features.
• Use standard data warehouse ETL tools to operationalize engineered features.
Standardize Tools and Methods Using Production-Quality Code
You Need Lots of Smart People!
Data Scientist
• Formulates hypotheses about features driving a predictive model (with clinical input)
• Tries various models to determine best approach for prediction
• Assesses model output and accuracy and operationalizes the best approach
Machine Learning Engineer
• Develops software leveraged by data scientists to test and deploy their models
• Requires data science knowledge
• Requires knowledge of software engineering best practices
• A rare find!
Predictive Analytics Processes
Development Process (Readmission prediction)
1) Build model experiment (~30-40 features)
2) Split data into train and test
3) Run multiple algorithms
4) Measure & report performance
5) Select best algorithm & important features (~10 features)
6) Store parameters
Running the Model (Readmission prediction)
1) Load model parameters
2) Receive patient ‘record’ (~10 features)
3) Calculate prediction
4) Output prediction
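The hand-off between the two processes (store parameters, then load them, receive a record, and calculate a prediction) might look like the sketch below. It assumes a logistic-regression-style model whose parameters fit in a small JSON file; the function names, the feature names, and the coefficient values are hypothetical and are not taken from the deck.

```python
import json
import math
import os
import tempfile


def store_parameters(path, intercept, coefficients):
    """Development side: persist the selected model's parameters."""
    with open(path, "w") as f:
        json.dump({"intercept": intercept, "coefficients": coefficients}, f)


def load_parameters(path):
    """Production side: load the stored parameters."""
    with open(path) as f:
        return json.load(f)


def predict_risk(params, record):
    """Apply a logistic model to one patient record (a dict of feature values)."""
    z = params["intercept"] + sum(
        coef * record[name] for name, coef in params["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "readmission_params.json")
    # Made-up coefficients, for illustration only.
    store_parameters(path, -2.0, {"polypharmacy_cnt": 0.1, "er_visits_last_year": 0.3})
    params = load_parameters(path)
    risk = predict_risk(params, {"polypharmacy_cnt": 5, "er_visits_last_year": 2})
```

Keeping the scoring step independent of the training code is one way to let the "running" process depend only on the roughly ten selected features.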
Developing a Machine Learning Code Base
Why?
• Focus data scientists on model development, not writing code or reinventing the wheel
• Standardize methodologies so that best practices can be deployed
• Predictive models in production require production-quality code
Developing a Machine Learning Code Base: Best Practices
• Version control
• Unit testing
Also,
• Documentation
• Continuous integration
This is production code – it should be treated as such!
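To make the unit-testing practice concrete, here is a minimal sketch using Python's standard `unittest` module. The helper `impute_mean` is a hypothetical stand-in for a data-preparation function in a shared code base; it is not the deck's code.

```python
import unittest


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


class ImputeMeanTest(unittest.TestCase):
    def test_fills_missing_with_mean(self):
        self.assertEqual(impute_mean([1.0, None, 3.0]), [1.0, 2.0, 3.0])

    def test_rejects_all_missing(self):
        with self.assertRaises(ValueError):
            impute_mean([None, None])
```

Tests like these can be run with `python -m unittest` and wired into a continuous integration job so that every change to the shared code is checked automatically.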
Developing a Machine Learning Code Base: Technology Choices
R
• Open source, deeply entrenched in healthcare
• More familiar to analysts/statisticians
Python
• Open source, newer approach (with lots of momentum)
• More familiar to developers
AzureML
• Cloud-based
• Easy to deploy
Plenty of other choices
Our Code Base Includes:
Data Ingestion/Preparation
• Load data from database and/or CSV
• Impute missing values
• Remove ‘bad’ data (rows and columns)
• Date/time expansion (e.g., day of week, week of year)
Model Development
• Split test and train
• Feature selection
• Run algorithms
  • Random Forest
  • Lasso
  • Mixed Models (Q4 2016)
  • k-means (Q4 2016)
• Evaluate model performance
• Save model
Analysis
• Model performance reports
• Trend identification
• Risk-adjusted comparisons
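The date/time expansion step can be illustrated with a short sketch. The function name `expand_date` and the output keys are assumptions; ISO week numbering and ISO weekday numbers (Monday = 1) are used here as one reasonable convention, not necessarily the one the code base uses.

```python
from datetime import date


def expand_date(d):
    """Expand a date into simple calendar features a model can use."""
    _, iso_week, _ = d.isocalendar()
    return {
        "day_of_week": d.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "week_of_year": iso_week,       # ISO 8601 week number
        "month": d.month,
    }


features = expand_date(date(2016, 10, 13))
```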
Scaling People
Data Architects
• Great domain knowledge
• Often looking for opportunities to advance career/skills
With the right tools…
• Data architects make great feature engineers
• Data architects can easily get started in predictive analytics
“One awesome thing about the output from the R [package] you put together is the output aligns perfectly with creating Patient Stratification algorithms. …The fact that I feel comfortable running this stuff speaks to how easy you have made it. Thanks again, Levi.”
Putting Predictive Models in Production
Modality #1: Extract, Transform, Load (ETL) Process
Deploy in this modality if:
• Prediction is not based on highly dynamic data
• Prediction is used in an analytic application
• Intervention strategy is ‘OK’ with some latency (up to 24 hours)
• Example: Readmission prediction
Pipeline:
1) Load source tables (ETL)
2) Load engineered features (ETL)
3) Run predictions (ML code)
4) Write predictions to database (ML code)
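The last two stages of this modality (run predictions, write them back to the database) can be sketched as a batch job. The example uses an in-memory SQLite database purely so it is self-contained; the table names `EngineeredFeatures` and `Predictions`, the `ERVisitCNT` column, and the logistic coefficients are hypothetical. Only `PatientEncounterID` and `PolypharmacyCNT` appear in the deck.

```python
import math
import sqlite3


def score(features, intercept=-2.0, coefs=(0.1, 0.3)):
    """Logistic score for one row of features (made-up coefficients)."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))


def run_batch(conn):
    """Read engineered features, score each encounter, write predictions back."""
    rows = conn.execute(
        "SELECT PatientEncounterID, PolypharmacyCNT, ERVisitCNT FROM EngineeredFeatures"
    ).fetchall()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Predictions "
        "(PatientEncounterID INTEGER PRIMARY KEY, Risk REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO Predictions VALUES (?, ?)",
        [(enc_id, score((poly, er))) for enc_id, poly, er in rows],
    )
    conn.commit()
    return len(rows)


# Stand-in for the feature table that the ETL stages would have loaded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EngineeredFeatures "
    "(PatientEncounterID INTEGER, PolypharmacyCNT INTEGER, ERVisitCNT INTEGER)"
)
conn.executemany(
    "INSERT INTO EngineeredFeatures VALUES (?, ?, ?)",
    [(1048826, 6, 1), (1048912, 0, 0)],
)
written = run_batch(conn)
```

Because the job is idempotent (`INSERT OR REPLACE`), it can be re-run on each ETL cycle, which fits an intervention strategy that tolerates up to 24 hours of latency.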
Modality #2: Web Services
Deploy in this modality if:
• Prediction is based on dynamic data
• Prediction is used in a workflow application
• Intervention strategy is not ‘OK’ with latency of up to 24 hours
• Example: Sepsis early detection
[Diagram: real-time features from the application and historic features (EDW) are the model inputs to the early detection web service, which runs the prediction (ML code) and returns the model output.]
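A web-service deployment of this kind could be sketched with Python's built-in `http.server`. Everything specific here is an assumption for illustration: the `HISTORIC_FEATURES` dictionary stands in for a lookup against the EDW, the feature names and coefficients are made up and have no clinical meaning, and a production service would use a proper web framework with authentication and input validation.

```python
import json
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for historic features that would come from the EDW.
HISTORIC_FEATURES = {"1048826": {"polypharmacy_cnt": 6}}
INTERCEPT = -5.0
COEFS = {"polypharmacy_cnt": 0.1, "heart_rate": 0.02}  # made-up values


def predict(encounter_id, realtime_features):
    """Merge historic and real-time features, then apply a logistic model."""
    features = {**HISTORIC_FEATURES.get(encounter_id, {}), **realtime_features}
    z = INTERCEPT + sum(COEFS[name] * features.get(name, 0.0) for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expects JSON like {"encounter_id": "...", "realtime_features": {...}}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        risk = predict(payload["encounter_id"], payload["realtime_features"])
        body = json.dumps({"encounter_id": payload["encounter_id"], "risk": risk}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the service; call this explicitly, since it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

The scoring function is kept separate from the HTTP handler so the same `predict` logic can be unit-tested without starting a server.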
Deploy With a Strategy for Intervention
Case Study: Central Line-Associated Bloodstream Infection (CLABSI)
• Approximately 41,000 patients with central lines will end up with a bloodstream infection (CLABSI)
• One in four patients with a CLABSI will die
• CLABSI improvement team looking at compliance with evidence-based guidelines
• Retrospective analysis led to increased insight into problem areas and associated interventions
• Team wanted more proactive notification of high-risk patients
• Developed predictive algorithm based on 16 features
Discussing Models With Clinicians
• Clinicians will adopt predictive analytics… insofar as they understand it
• Complexity comes at a price
  • Regression can often strike a good balance between predictive value and interpretability
  • Use regularization techniques to penalize complexity
For example:
In statistics and machine learning, lasso (least absolute shrinkage and selection operator) (also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. (Wikipedia)
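For reference, the lasso in its standard linear-regression form can be written as below. The symbols are generic and not taken from the deck: n observations, p features, responses y_i, feature vectors x_i, and a tuning parameter lambda. For a classification problem such as readmission risk, the squared-error term would typically be replaced by a logistic loss.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of lambda push more coefficients exactly to zero, which is how the penalty performs variable selection and yields a smaller model that is easier to discuss with clinicians.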
Models Built To Date:
In Development
Built
Planned
Central line-associated bloodstream infection (CLABSI) – Clinical Decision Support
Forecast IBNR claims/year-end expenditures – Financial Decision Support
Congestive Heart Failure, Readmissions Risk – Clinical Decision Support
Predictive Risk & Cost – Population Health and Accountable Care
Patient Flight Path, Diabetes Future Risk – Clinical Decision Support
Patient Flight Path, Diabetes Future Cost – Clinical Decision Support
Patient Flight Path, Diabetes Top Treatments – Clinical Decision Support
Patient Flight Path, Diabetes Next Likely Complications (Glaucoma) – Clinical Decision Support
Patient Flight Path, Diabetes Next Likely Complications (Retinopathy) – Clinical Decision Support
Patient Flight Path, Diabetes Next Likely Complications (ESRD) – Clinical Decision Support
Plus several more… (Nephropathy, Cataracts, CHF, CAD, Ketoacidosis, Erectile Dysfunction, Foot Ulcers)
Predictive appointment no shows – Operations and Performance Management
Propensity to pay – Financial Decision Support
Pre-surgical risk (Bowel) – Clinical Decision Support and client request
Post-surgical risk (Hips and Knees) – Clinical Decision Support
Patient Flight Path, Congestive Heart Failure (5-6 new flight path algorithms similar to Patient Flight Path, Diabetes below)
Patient Flight Path, Coronary Artery Disease (5-6 new flight path algorithms similar to Patient Flight Path, Diabetes below)
Geo-spatial health system service area definition, network referral/leakage
INSIGHT socio-economic based risk – Clinical Decision Support and client request
Native SQL/R predictive framework and standard package - Platform
Feature selection, Parallel Models, Rank and Impact of Input Variables – Platform
Predictive ETL batch load times – Platform
Early detection of CLABSI, CAUTI, Clostridium difficile (c. diff) hospital infections – Clinical Decision Support
Early detection of Sepsis/Septicemia (Blood Infection) – Clinical Decision Support
Public data sets, benchmarks, “Catalyst Risk”, expected mortality, length of stay – CAFÉ collaboration
Clusters of population risk (near term risk/cost) – Population Health and Accountable Care
Poll Question #3
What are or would be the top three most important data sources to your
organization in making predictions? (select 3, if applicable)
a) Clinical EMR data
b) Claims data
c) Patient outcomes data
d) Financial data
e) Non-medical patient data (e.g. socioeconomic, behavioral)
f) Patient satisfaction data
g) Unsure or not applicable
Three Key Recommendations for Scaling Predictive Analytics
• Fully leverage your analytics environment
• Standardize tools and methods using production-quality code
• Deploy with a strategy for intervention
What the Future Holds
Closed Loop Architecture
Web/Mobile Apps (SMART on FHIR)
Data Warehouse
• Local patient data
• Clinical trials
• Social determinants data
• Patient reported outcomes
• Regional and National Data Sets
• Activity Based Costing
• Genomic (+ other ‘omics)
• Environmental
• Geospatial
• Device data (includes fitness)
• Patient engagement
API
Clustering Algorithms (Patients Like This)
Readmission Predictors
High and rising risk predictors
EHR
Analytic insights drive:
• Alerting
• Ordering
• Refills
• Inbox
• Diagnosis
• Referrals
• Dynamic screen generation
• Suggestions
• Risk management
Clinical Workflow Engine
Analytics Engine
Algorithm Library
Integrated Data Repository
Registry Definitions
Clinicians need analytic insight delivered in their natural workflow
Text Processing/NLP Algorithms
Email
Text Messaging
API (FHIR)
Patients need to be involved, too!
CAFÉ: 65M+ Patient Records
Thank You

Deploying Predictive Analytics in Healthcare

  • 1. October 13, 2016 Deploying Predictive Analytics: A practitioner’s guide Eric Just Senior Vice President Levi Thatcher Director of Data Science
  • 4. © 2016 Health Catalyst Proprietary and Confidential Poll Question #1 4 How important are predictive analytics for the future of healthcare? 1) Not at all important 2) Low importance 3) Neutral 4) Moderately important 5) Extremely important 6) Unsure or not applicable
  • 5. © 2016 Health Catalyst Proprietary and Confidential Predictive analytics is about using pattern recognition to predict future events but… Predicting something is not good enough; you must have the data to act and intervene… and the organizational wherewithal to inteverne 5
  • 6. © 2016 Health Catalyst Proprietary and Confidential What Is Machine Learning? 6 Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. Within the field of data analytics, machine learning is a method used to devise complex models and algorithms that lend themselves to prediction. In commercial use, this is known as predictive analytics. https://en.wikipedia.org/wiki/Machine_learning
  • 7. © 2016 Health Catalyst Proprietary and Confidential Predictive Analytics in Healthcare: “Classic” Approaches 7 Charlson Index, 1987 LACE Index, 2010
  • 8. © 2016 Health Catalyst Proprietary and Confidential What Has Happened Since 2010? 8
  • 9. © 2016 Health Catalyst Proprietary and Confidential 9 What Has Happened Since 2010? ML – Based Predictive Analytics
  • 10. © 2016 Health Catalyst Proprietary and Confidential Poll Question #2 10 What is the biggest barrier to implementing predictive analytics? a) We do not have the people or skills b) We do not have the right data or technical tools/infrastructure c) We do not have executive support or budget d) Past efforts have failed to show results e) Other f) Unsure or not applicable
  • 11. © 2016 Health Catalyst Proprietary and Confidential 11 Predictive analytics is easy (or at least easier!) Organizations are struggling with making predictive analytics routine, pervasive, and actionable.
  • 12. © 2016 Health Catalyst Proprietary and Confidential Typical ‘Current State’ for Predictive Analytics 12 Data Source Predictive Model ? Gnarly SQL Query Data Manipulation Tools/ Algorithms SAS | Weka | R | Python Deploy
  • 13. © 2016 Health Catalyst Proprietary and Confidential Three Key Recommendations for Scaling Predictive Analytics 13 Data Source Predictive Model ? Gnarly SQL Query Data Manipulation Tools/ Algorithms SAS | Weka | R | Python Deploy Deploy with a strategy for intervention Standardize tools and methods using production quality code Fully leverage your analytics environment
  • 14. © 2016 Health Catalyst Proprietary and Confidential Fully Leverage Your Analytics Environment 14
  • 15. © 2016 Health Catalyst Proprietary and Confidential What is a Feature? “In machine learning and pattern recognition, a feature is an individual measurable property of a phenomenon being observed. Choosing informative, discriminating and independent features is a crucial step for effective algorithms in pattern recognition, classification and regression.” https://en.wikipedia.org/wiki/Feature_(machine_learning) 15
  • 16. © 2016 Health Catalyst Proprietary and Confidential Leverage Your Analytics Environment 16 A data warehouse provides access to raw data and pre-defined data like:  Clinical registries  Comorbidity models (i.e. Charlson Score)  Readmissions  Length of stay  Other calculated fields Read-only access is not enough!
  • 17. © 2016 Health Catalyst Proprietary and Confidential Polypharmacy Feature 17
  • 18. © 2016 Health Catalyst Proprietary and Confidential Polypharmacy Data Mart 18 PatientID MedicationID StartDT EndDT Z006F600A51F 15862 7/27/2006 7/28/2006 Z006F600A51F 41801 7/27/2006 10/28/2009 Z006F600A51F 10994 7/27/2006 NULL Z006F600A51F 15862 7/27/2006 7/28/2006 Z006F600A51F 41801 7/27/2006 NULL Z006F600A51F 10994 7/27/2006 NULL Z006F600A51F 15862 7/27/2006 7/28/2006 Z006F600A51F 41801 7/27/2006 NULL Z006F600A51F 10994 7/27/2006 NULL Z13148BF2583 4996 9/14/2005 11/15/2005 Z13148BF2583 11798 9/14/2005 11/15/2005 Z13148BF2583 15061 9/14/2005 11/15/2005 Z13148BF2583 2079 11/15/2005 11/15/2005 PatientEncounterID PolypharmacyCNT 1048826 6 1048912 0 1048923 0 1048924 0 1048925 0 1048926 0 1049094 2 1049095 2 1049096 2 1049097 3 1049098 3 1049099 4 1049100 2 Step 1: Clean up data • Missing end dates • One time doses Step 2: Aggregate into polypharmacy count at each encounter
  • 19. © 2016 Health Catalyst Proprietary and Confidential What Is Feature Engineering? “Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy...” Jason Brownlee in “Discover Feature Engineering, How to Engineer Features and How to Get Good at It” http://machinelearningmastery.com/discover-feature-engineering-how-to-engineer-features-and-how-to-get-good-at-it/ “Much of the success of machine learning is actually success in engineering features that a learner can understand.” Scott Locklin in “Neglected Machine Learning Ideas” https://scottlocklin.wordpress.com/2014/07/22/neglected-machine-learning-ideas/ 19
  • 20. © 2016 Health Catalyst Proprietary and Confidential Other Examples of Feature Engineering 20 The ability for data scientists to engineer features is critical to a successful predictive analytics/ machine learning strategy Number of ER visits in the last year Line days Number and types of comorbid conditions Almost any input into a predictive model will be engineered in some way
  • 21. Fully Leverage Your Analytics Environment
      • Most feature engineering should be done in the analytics environment (data warehouse).
      • Give data scientists enough access to the data warehouse to promote efficient reuse of engineered features.
      • Use standard data warehouse ETL tools to operationalize engineered features.
  • 22. Standardize Tools and Methods Using Production-Quality Code
  • 23. Three Key Recommendations for Scaling Predictive Analytics
      [Diagram labels: Data Source; Gnarly SQL Query; Data Manipulation; Tools/Algorithms (SAS | Weka | R | Python); Predictive Model; Deploy; ?]
      • Fully leverage your analytics environment
      • Standardize tools and methods using production-quality code
  • 24. You Need Lots of Smart People!
      Data Scientist
      • Formulates hypotheses about features driving a predictive model (with clinical input)
      • Tries various models to determine best approach for prediction
      • Assesses model output and accuracy and operationalizes the best approach
      Machine Learning Engineer
      • Develops software leveraged by data scientists to test and deploy their models
      • Requires data science knowledge
      • Requires knowledge of software engineering best practices
      • A rare find!
  • 25. Predictive Analytics Processes
      Development Process (Readmission prediction)
      • Build model experiment (~30-40 features)
      • Split data into train and test
      • Run multiple algorithms
      • Measure & report performance
      • Select best algorithm & important features (~10 features)
      • Store parameters
      Running the Model (Readmission prediction)
      • Load model parameters
      • Receive patient ‘record’ (~10 features)
      • Calculate prediction
      • Output prediction
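The hand-off between the two processes (store parameters during development, then load them and score one patient record at run time) can be sketched as follows. Everything specific here is an assumption for illustration: the feature names, the coefficient values, the JSON file format, and the use of a logistic-regression scoring formula. The slide does not say which model or storage format is used.

```python
import json
import math
import os
import tempfile

# Hypothetical output of the development process; the numbers are invented.
model = {
    "intercept": -2.0,
    "coefficients": {
        "prior_admissions": 0.6,
        "polypharmacy_count": 0.1,
        "length_of_stay": 0.05,
    },
}


def store_parameters(params, path):
    """Development side: persist the selected model's parameters."""
    with open(path, "w") as f:
        json.dump(params, f)


def load_parameters(path):
    """Run-time side: load the stored parameters."""
    with open(path) as f:
        return json.load(f)


def predict(params, record):
    """Calculate a logistic-regression probability for one patient record."""
    z = params["intercept"] + sum(
        coef * record[name] for name, coef in params["coefficients"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


path = os.path.join(tempfile.mkdtemp(), "readmission_model.json")
store_parameters(model, path)

loaded = load_parameters(path)
record = {"prior_admissions": 2, "polypharmacy_count": 5, "length_of_stay": 4}
risk = predict(loaded, record)  # probability between 0 and 1
```

Keeping the run-time path this small is the point of the split: the expensive experimentation happens once, and scoring needs only the stored parameters and the selected features.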
  • 26. Developing a Machine Learning Code Base: Why?
      • Focus data scientists on model development, not on writing code or reinventing the wheel
      • Standardize methodologies so that best practices can be deployed
      • Predictive models in production require production-quality code
  • 27. Developing a Machine Learning Code Base: Best Practices
      • Version control
      • Unit testing
      Also:
      • Documentation
      • Continuous integration
      This is production code, and it should be treated as such!
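To make the unit-testing recommendation concrete, here is a minimal sketch using Python's standard `unittest` module. The function under test, `impute_mean`, is a hypothetical helper written for this example and is not taken from the presenters' code base.

```python
import unittest


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


class ImputeMeanTest(unittest.TestCase):
    def test_fills_missing_with_mean(self):
        self.assertEqual(impute_mean([1.0, None, 3.0]), [1.0, 2.0, 3.0])

    def test_rejects_all_missing(self):
        with self.assertRaises(ValueError):
            impute_mean([None, None])


# Run the tests programmatically; a CI job would typically run
# `python -m unittest` over the whole package instead.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ImputeMeanTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these are what allow a shared code base to change safely once several models depend on it.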
  • 28. Developing a Machine Learning Code Base: Technology Choices
      R
      • Open source, deeply entrenched in healthcare
      • More familiar to analysts/statisticians
      Python
      • Open source, newer approach (with lots of momentum)
      • More familiar to developers
      AzureML
      • Cloud-based
      • Easy to deploy
      Plenty of other choices
  • 29. Our Code Base Includes:
      Data Ingestion/Preparation
      • Load data from database and/or CSV
      • Impute missing values
      • Remove ‘bad’ data (rows and columns)
      • Date/time expansion (e.g., day of week, week of year)
      Model Development
      • Split test and train
      • Feature selection
      • Run algorithms: Random Forest; Lasso; Mixed Models (Q4 2016); k-means (Q4 2016)
      • Evaluate model performance
      • Save model
      Analysis
      • Model performance reports
      • Trend identification
      • Risk-adjusted comparisons
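The "date/time expansion" item in the list above can be illustrated with a short sketch that turns one timestamp into several model-ready features. The function name and the exact set of output fields are assumptions; ISO week numbering is used for week of year, which is one of several possible conventions.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def expand_datetime(ts):
    """Expand a timestamp into separate calendar features."""
    _, iso_week, _ = ts.isocalendar()
    return {
        "year": ts.year,
        "month": ts.month,
        "day_of_week": DAY_NAMES[ts.weekday()],  # Monday is index 0
        "week_of_year": iso_week,                # ISO 8601 week number
        "hour": ts.hour,
    }


features = expand_datetime(datetime(2016, 10, 13, 14, 30))
```

Expanding dates this way lets a model pick up patterns such as weekend admissions or seasonal effects that a raw timestamp hides.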
  • 30. Scaling People
      Data Architects
      • Great domain knowledge
      • Often looking for opportunities to advance career/skills
      With the right tools…
      • Data architects make great feature engineers
      • Data architects can easily get started in predictive analytics
      “One awesome thing about the output from the R [package] you put together is the output aligns perfectly with creating Patient Stratification algorithms. …The fact that I feel comfortable running this stuff speaks to how easy you have made it. Thanks again, Levi.”
  • 31. Putting Predictive Models in Production
  • 32. Modality #1: Extract, Transform, Load (ETL) Process
      Deploy in this modality if:
      • Prediction is not based on highly dynamic data
      • Prediction is used in an analytic application
      • Intervention strategy can tolerate some latency (up to 24 hours)
      • Example: Readmission prediction
      Pipeline: Load Source Tables (ETL) → Load Engineered Features (ETL) → Run Predictions (ML Code) → Write Predictions to Database (ML Code)
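A batch run in this modality reads engineered features from the warehouse, scores them, and writes the predictions back to a table. The sketch below uses an in-memory SQLite database so it can run anywhere; the table names, column names, and the placeholder `score` function are all hypothetical stand-ins for the real warehouse schema and ML code.

```python
import sqlite3


def score(prior_admissions, polypharmacy_count):
    """Placeholder scoring rule standing in for the real ML code."""
    return min(1.0, 0.1 * prior_admissions + 0.05 * polypharmacy_count)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EngineeredFeatures ("
    "PatientEncounterID INTEGER PRIMARY KEY, "
    "PriorAdmissionsCNT INTEGER, PolypharmacyCNT INTEGER)"
)
conn.execute(
    "CREATE TABLE ReadmissionPrediction ("
    "PatientEncounterID INTEGER PRIMARY KEY, RiskScore REAL)"
)
# In production, earlier ETL steps would have loaded this table.
conn.executemany(
    "INSERT INTO EngineeredFeatures VALUES (?, ?, ?)",
    [(1048826, 2, 6), (1048912, 0, 0)],
)


def run_batch(conn):
    """Run predictions for every encounter and write them to the database."""
    rows = conn.execute(
        "SELECT PatientEncounterID, PriorAdmissionsCNT, PolypharmacyCNT "
        "FROM EngineeredFeatures"
    ).fetchall()
    predictions = [(enc_id, score(adm, poly)) for enc_id, adm, poly in rows]
    # INSERT OR REPLACE keeps a nightly rerun from failing on existing rows.
    conn.executemany(
        "INSERT OR REPLACE INTO ReadmissionPrediction VALUES (?, ?)",
        predictions,
    )
    conn.commit()
    return len(predictions)


n_written = run_batch(conn)
```

Because the predictions land in an ordinary table, any analytic application that already reads from the warehouse can display them without new integration work.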
  • 33. Modality #2: Web Services
      Deploy in this modality if:
      • Prediction is based on highly dynamic data
      • Prediction is used in a workflow application
      • Intervention strategy cannot tolerate batch latency (up to 24 hours)
      • Example: Sepsis early detection
      Diagram: Real-time features from the application (model input) and historic features from the EDW (model input) feed the early detection web service, which runs the prediction (ML Code) and returns the model output.
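The core of such a service is a request handler that merges real-time features sent by the application with historic features looked up from the warehouse, then returns a score. The sketch below shows only that handler as a plain function, without an HTTP framework. The feature names, the in-memory `HISTORIC_FEATURES` lookup, and the point-based rule are hypothetical and are not a clinical sepsis model.

```python
import json

# Hypothetical stand-in for historic features pulled from the EDW.
HISTORIC_FEATURES = {"patient-1": {"prior_sepsis": 1}}
DEFAULT_HISTORIC = {"prior_sepsis": 0}


def predict_sepsis_risk(features):
    """Placeholder rule for illustration only, not a clinical model."""
    points = 0
    points += 1 if features["heart_rate"] > 90 else 0
    points += 1 if features["temperature_c"] > 38.0 else 0
    points += 1 if features["prior_sepsis"] else 0
    return points / 3


def handle_request(body):
    """Merge historic and real-time features, score, and return JSON."""
    request = json.loads(body)
    patient_id = request["patient_id"]
    features = dict(HISTORIC_FEATURES.get(patient_id, DEFAULT_HISTORIC))
    features.update(request["realtime"])  # real-time values from the app
    risk = predict_sepsis_risk(features)
    return json.dumps({"patient_id": patient_id, "risk": risk})


response = handle_request(json.dumps({
    "patient_id": "patient-1",
    "realtime": {"heart_rate": 104, "temperature_c": 38.6},
}))
```

Wrapping `handle_request` in an HTTP endpoint is then a framework choice; the scoring path itself stays the same as in the batch modality.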
  • 34. Scaling Predictive Analytics
      [Diagram labels: Data Source; Gnarly SQL Query; Data Manipulation; Tools/Algorithms (SAS | Weka | R | Python); Predictive Model; Deploy; ?]
      • Fully leverage your analytics environment
      • Standardize tools and methods using production-quality code
      • Deploy with a strategy for intervention
  • 35. Deploy With a Strategy for Intervention
  • 36. Case Study: Central Line-Associated Bloodstream Infection (CLABSI)
      • Approximately 41,000 patients with central lines will end up with a bloodstream infection (CLABSI)
      • One in four patients with a CLABSI will die
      • CLABSI improvement team looking at compliance with evidence-based guidelines
      • Retrospective analysis led to increased insight into problem areas and associated interventions
      • Team wanted more proactive notification of high-risk patients
      • Developed predictive algorithm based on 16 features
  • 38. Discussing Models With Clinicians
      • Clinicians will adopt predictive analytics … insofar as they understand it
      • Complexity comes at a price
          • Regression can often strike a good balance between predictive value and interpretability
          • Use regularization techniques to penalize complexity
      For example: In statistics and machine learning, lasso (least absolute shrinkage and selection operator) (also Lasso or LASSO) is a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. (Wikipedia)
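For reference, the standard lasso objective for linear regression is shown below. The notation is an assumption chosen for this note rather than anything from the slides: n observations, p features, response y_i, feature vector x_i, and penalty weight λ ≥ 0.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{ \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - \beta_0 - x_i^{\top}\beta \bigr)^2
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```

The absolute-value penalty drives some coefficients exactly to zero as λ grows, which is what yields a shorter, more explainable feature list. For a binary outcome such as readmission, the squared-error term would typically be replaced by the logistic negative log-likelihood with the same penalty.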
  • 39. Models Built To Date
      Status legend: In Development | Built | Planned
      • Central line-associated bloodstream infection (CLABSI) – Clinical Decision Support
      • Forecast IBNR claims/year-end expenditures – Financial Decision Support
      • Congestive Heart Failure, Readmissions Risk – Clinical Decision Support
      • Predictive Risk & Cost – Population Health and Accountable Care
      • Patient Flight Path, Diabetes Future Risk – Clinical Decision Support
      • Patient Flight Path, Diabetes Future Cost – Clinical Decision Support
      • Patient Flight Path, Diabetes Top Treatments – Clinical Decision Support
      • Patient Flight Path, Diabetes Next Likely Complications (Glaucoma) – Clinical Decision Support
      • Patient Flight Path, Diabetes Next Likely Complications (Retinopathy) – Clinical Decision Support
      • Patient Flight Path, Diabetes Next Likely Complications (ESRD) – Clinical Decision Support
      • Plus several more… (Nephropathy, Cataracts, CHF, CAD, Ketoacidosis, Erectile Dysfunction, Foot Ulcers)
      • Predictive appointment no shows – Operations and Performance Management
      • Propensity to pay – Financial Decision Support
      • Pre-surgical risk (Bowel) – Clinical Decision Support and client request
      • Post-surgical risk (Hips and Knees) – Clinical Decision Support
      • Patient Flight Path, Congestive Heart Failure (5-6 new flight path algorithms similar to those for Patient Flight Path, Diabetes)
      • Patient Flight Path, Coronary Artery Disease (5-6 new flight path algorithms similar to those for Patient Flight Path, Diabetes)
      • Geo-spatial health system service area definition, network referral/leakage
      • INSIGHT socio-economic based risk – Clinical Decision Support and client request
      • Native SQL/R predictive framework and standard package – Platform
      • Feature selection, Parallel Models, Rank and Impact of Input Variables – Platform
      • Predictive ETL batch load times – Platform
      • Early detection of CLABSI, CAUTI, Clostridium difficile (C. diff) hospital infections – Clinical Decision Support
      • Early detection of Sepsis/Septicemia (Blood Infection) – Clinical Decision Support
      • Public data sets, benchmarks, “Catalyst Risk”, expected mortality, length of stay – CAFÉ collaboration
      • Clusters of population risk (near term risk/cost) – Population Health and Accountable Care
  • 40. Poll Question #3
      What are or would be the top three most important data sources to your organization in making predictions? (select 3, if applicable)
      a) Clinical EMR data
      b) Claims data
      c) Patient outcomes data
      d) Financial data
      e) Non-medical patient data (e.g. socioeconomic, behavioral)
      f) Patient satisfaction data
      g) Unsure or not applicable
  • 41. Three Key Recommendations for Scaling Predictive Analytics
      [Diagram labels: Data Source; Gnarly SQL Query; Data Manipulation; Tools/Algorithms (SAS | Weka | R | Python); Predictive Model; Deploy; ?]
      • Fully leverage your analytics environment
      • Standardize tools and methods using production-quality code
      • Deploy with a strategy for intervention
  • 42. What the Future Holds
  • 43. Closed Loop Architecture
      Clinicians need analytic insight delivered in their natural workflow. Patients need to be involved, too!
      Diagram components:
      • EHR
      • Web/Mobile Apps (SMART on FHIR)
      • API; API (FHIR)
      • Email; Text Messaging
      • Clinical Workflow Engine
      • Analytics Engine
      • Algorithm Library: Clustering Algorithms (Patients Like This); Readmission Predictors; High and rising risk predictors; Text Processing/NLP Algorithms
      • Integrated Data Repository
      • Registry Definitions
      • CAFÉ: 65M+ Patient Records
      Data Warehouse:
      • Local patient data
      • Clinical trials
      • Social determinants data
      • Patient reported outcomes
      • Regional and National Data Sets
      • Activity Based Costing
      • Genomic (+ other ‘omics)
      • Environmental
      • Geospatial
      • Device data (includes fitness)
      • Patient engagement
      Analytic insights drive:
      • Alerting
      • Ordering
      • Refills
      • Inbox
      • Diagnosis
      • Referrals
      • Dynamic screen generation
      • Suggestions
      • Risk management
  • 44. Thank You