Clinical prediction modeling in the era of AI: a blessing and a curse

Maarten van Smeden, PhD
Julius Center for Health Sciences and Primary Care
8th Annual Danish Bioinformatics Conference
Copenhagen, 22 August 2024

Disclosures
• Nothing to disclose

In this lecture, I will talk about…

Copenhagen, 22 Aug 2024 @MaartenvSmeden
AI BLESSINGS AND CURSES

Img source: https://www.topbots.com/generative-vs-predictive-ai/

De Hond et al, Lancet Digital Health, 2024

Prediction
Source: https://www.intellspot.com/unsupervised-vs-supervised-learning/#google_vignette

van Smeden et al., JCE, 2021, doi: 10.1016/j.jclinepi.2021.01.009

Adversarial example
https://bit.ly/2N4mQFo; https://bit.ly/2W7X9rF

https://tinyurl.com/3knkuzs3

Image source: https://shorturl.at/styGJ

APGAR score
Apgar et al. JAMA, 1958

Still commonly used, but…
Sources: doi: 10.1097/ANC.0000000000000859, 10.1136/bmj.38117.665197.F7

“65% of U.S. physicians used MDCalc on a weekly basis”

Landscape of clinical prediction models
• 42 models for kidney failure in chronic kidney disease (Ramspek, 2019)
• 40 models for incident heart failure (Sahle, 2017)
• 37 models for treatment response in pulmonary TB (Peetluk, 2021)
• 35 models for in vitro fertilisation (Ratna, 2020)
• 34 models for stroke in type-2 diabetes (Chowdhury, 2019)
• 34 models for graft failure in kidney transplantation (Kabore, 2017)
• 31 models for length of stay in ICU (Verburg, 2016)
• 30 models for low back pain (Haskins, 2015)
• 27 models for pediatric early warning systems (Trubey, 2019)
• 27 models for malaria prognosis (Njim, 2019)
• 26 models for postoperative outcomes in colorectal cancer (Souwer, 2020)
• 26 models for childhood asthma (Kothalawala, 2020)
• 25 models for lung cancer risk (Gray, 2016)
• 25 models for readmission after admission for heart failure (Mahajan, 2018)
• 23 models for recovery after ischemic stroke (Jampathong, 2018)
• 23 models for delirium in older adults (Lindroth, 2018)
• 21 models for atrial fibrillation detection in community (Himmelreich, 2020)
• 19 models for survival after resectable pancreatic cancer (Strijker, 2019)
• 18 models for recurrence of hepatocellular carcinoma after liver transplant (Al-Ameri, 2020)
• 18 models for future hypertension in children (Hamoen, 2018)
• 18 models for risk of falls after stroke (Walsh, 2016)
• 18 models for mortality in acute pancreatitis (Di, 2016)
• 17 models for bacterial meningitis (van Zeggeren, 2019)
• 17 models for cardiovascular disease in hypertensive population (Cai, 2020)
• 14 models for ICU delirium risk (Chen, 2020)
• 14 models for diabetic retinopathy progression (Haider, 2019)
• 1382 models for cardiovascular disease (Wessler, 2021)
• 731 models related to COVID-19 (Wynants, 2020)
• 408 models for COPD prognosis (Bellou, 2019)
• 363 models for cardiovascular disease general population (Damen, 2016)
• 327 models for toxicity prediction after radiotherapy (Takada, 2022)
• 263 prognosis models in obstetrics (Kleinrouweler, 2016)
• 258 models for mortality after general trauma (Munter, 2017)
• 160 female-specific models for cardiovascular disease (Baart, 2019)
• 142 models for mortality prediction in preterm infants (van Beek, 2021)
• 119 models for critical care prognosis in LMIC (Haniffa, 2018)
• 101 models for primary gastric cancer prognosis (Feng, 2019)
• 99 models for neck pain (Wingbermühle, 2018)
• 81 models for sudden cardiac arrest (Carrick, 2020)
• 74 models for contrast-induced acute kidney injury (Allen, 2017)
• 73 models for 28/30 day hospital readmission (Zhou, 2016)
• 68 models for preeclampsia (De Kat, 2019)
• 68 models for living donor kidney/liver transplant counselling (Haller, 2022)
• 67 models for traumatic brain injury prognosis (Dijkland, 2019)
• 64 models for suicide / suicide attempt (Belsher, 2019)
• 61 models for dementia (Hou, 2019)
• 58 models for breast cancer prognosis (Phung, 2019)
• 52 models for pre‐eclampsia (Townsend, 2019)
• 52 models for colorectal cancer risk (Usher-Smith, 2016)
• 48 models for incident hypertension (Sun, 2017)
• 46 models for melanoma (Kaiser, 2020)
• 46 models for prognosis after carotid revascularisation (Volkers, 2017)
• 43 models for mortality in critically ill (Keuning, 2019)

Over 260 systematic reviews of clinical prediction models

Clinical prediction models
• > 150,000 clinical prediction models exist
• From simple scoring rules (e.g. APGAR) to increasingly complex AI-based prediction models
Source: Arshi et al., 2024, OSF, doi: 10.31219/osf.io/4txc6

A new clinical prediction model is developed every 1.5 hours
Source: Arshi et al., 2024, OSF, doi: 10.31219/osf.io/4txc6
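Taken at face value, the rate on this slide can be converted into daily and yearly counts with simple arithmetic. The sketch below assumes a constant rate and a 365-day year; both are simplifying assumptions for illustration, not figures from the cited source.

```python
# Rate quoted on the slide: one new clinical prediction model every 1.5 hours.
HOURS_PER_MODEL = 1.5

# Assumption for illustration: the rate is constant around the clock, all year.
models_per_day = 24 / HOURS_PER_MODEL
models_per_year = 365 * models_per_day

print(f"{models_per_day:.0f} models per day, {models_per_year:.0f} per year")
# prints "16 models per day, 5840 per year"
```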

PREDICTION MODELS USED IN PRACTICE
PREDICTION MODELS THAT WILL NEVER BE USED IN PRACTICE
RESEARCH WASTE?

Example: living review COVID-19 prediction models
• 731 prediction models between March 2020 and February 2021
• Many models poorly reported
• Only 4% low risk of bias

External validation of COVID-19 prediction models

Not just COVID

What is AI going to do to the field of clinical prediction models?

Self-driving cars, etc.
Created using DALL-E

IBM Watson winning Jeopardy! (2011)
https://bbc.in/2TMvV8I

IBM Watson for oncology
bit.ly/2LxiWGj ; bit.ly/3Esu68T

Tech company business model
https://bit.ly/2HSp8X5; https://bit.ly/2Z0Pfop; https://bit.ly/2KIcpHG; https://bit.ly/33IJhr9

Proportion of studies indexed in Medline with the Medical Subject Heading (MeSH) term “Artificial Intelligence”
Faes et al. doi: 10.3389/fdgth.2022.833912

Other success stories
https://go.nature.com/2VG2hS7; https://bbc.in/2Z1drXQ; https://bit.ly/2TAfRIP

https://twitter.com/AndrewLBeam/status/1620855064033382401?s=20&t=VO9_LdFFCj_wcwIQLvKcIQ

Source: Ilse Kant (UMC Utrecht)

https://bit.ly/2v2aokk

Ayers, JAMA Int Med, 2023, doi: 10.1001/jamainternmed.2023.1838
*Answers by healthcare professionals on Reddit vs ChatGPT

Source: https://twitter.com/TansuYegen/status/1635388676539813889?s=20

Source: https://www.science.org/content/article/alarmed-tech-leaders-call-ai-research-pause

Reviewer #2

Three myths about machine learning

Myth 1: “ML methods come from computer science”
• Leo Breiman (CART, random forest). Edu: Physics/Math. Job title: Professor of Statistics
• Jerome H Friedman (Gradient boosting). Edu: Physics. Job title: Professor of Statistics
• Trevor Hastie (Elements of statistical learning). Edu: Statistics. Job title: Professor of Statistics
• Robert Tibshirani (Lasso). Edu: Statistics. Job title: Professor of Statistics
• Daniela Witten (Introduction to statistical learning). Edu: Statistics. Job title: Professor of Statistics

Myth 2: “ML methods are for prediction, statistics is for explaining”
ML and causal inference, a small selection¹
• Superlearner (e.g. van der Laan)
• High dimensional propensity scores (e.g. Schneeweiss)
• Causal forests (e.g. Athey)
• The Book of Why (Pearl)
¹ See further: Kreif and DiazOrdaz; https://bit.ly/2m1eYdK

Two cultures
Breiman, Stat Sci, 2001, DOI: 10.1214/ss/1009213726

Faes et al. doi: 10.3389/fdgth.2022.833912
Language

Robert Tibshirani: https://stanford.io/2zqEGfr
Machine learning: large grant = $1,000,000
Statistics: large grant = $50,000

ML refers to a culture, not to methods
Distinguishing between statistics and machine learning
• Substantial overlap in methods used by both cultures
• Substantial overlap in analysis goals
• Attempts to separate the two frequently result in disagreement
Pragmatic approach:
I’ll use “ML” to refer to models roughly outside of the traditional regression types of analysis: decision trees (and descendants), SVMs, neural networks (including deep learning), boosting, etc.

doi: 10.1001/jamapediatrics.2023.0034

Myth 3: Machine learning is (always) better at prediction
Christodoulou et al. Journal of Clinical Epidemiology, 2019, doi: 10.1016/j.jclinepi.2019.02.004

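Sources of prediction error
$$Y = f(x) + \varepsilon$$
For a model $k$ the expected test prediction error is:
$$\underbrace{\sigma^2}_{\text{irreducible error}} + \underbrace{\mathrm{bias}^2\big(\hat{f}_k(x)\big) + \mathrm{var}\big(\hat{f}_k(x)\big)}_{\text{mean squared prediction error}}$$
(with $E(\varepsilon) = 0$, $\mathrm{var}(\varepsilon) = \sigma^2$, values in $x$ are not random)
Irreducible error ≈ what we don’t model
Mean squared prediction error ≈ how we model
See equation 2.46 in Hastie et al., The Elements of Statistical Learning, https://stanford.io/2voWjra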
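The decomposition above can be checked numerically. The following is a minimal sketch, not taken from the slides: it assumes a k-nearest-neighbour average as "model k", a fixed design, and an arbitrary true function f(x) = x². All function names and parameter values are hypothetical choices for illustration.

```python
import random
import statistics


def f(x):
    """True regression function; an arbitrary choice for illustration."""
    return x ** 2


def nearest_indices(xs, x0, k):
    """Indices of the k training inputs closest to x0."""
    return sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]


def expected_error(k, sigma=1.0, x0=0.5, n_sims=20000, seed=1):
    """Return (simulated, theoretical) expected squared prediction error at x0."""
    rng = random.Random(seed)
    xs = [i / 20 for i in range(41)]      # fixed design: 0, 0.05, ..., 2
    idx = nearest_indices(xs, x0, k)      # x is not random, so neighbours are fixed

    # Theoretical pieces for a k-nearest-neighbour average with fixed x.
    bias = sum(f(xs[i]) for i in idx) / k - f(x0)
    variance = sigma ** 2 / k
    theory = sigma ** 2 + bias ** 2 + variance

    squared_errors = []
    for _ in range(n_sims):
        ys = [f(x) + rng.gauss(0, sigma) for x in xs]   # new training outcomes
        prediction = sum(ys[i] for i in idx) / k        # prediction of model k
        y_new = f(x0) + rng.gauss(0, sigma)             # new test outcome at x0
        squared_errors.append((y_new - prediction) ** 2)
    return statistics.fmean(squared_errors), theory


if __name__ == "__main__":
    for k in (1, 5, 15):
        simulated, theory = expected_error(k)
        print(f"k={k:2d}  simulated={simulated:.3f}  theory={theory:.3f}")
```

With k = 1 the prediction has no bias at a design point but full noise variance, so the expected error is 2σ². Larger k trades some bias for lower variance, while the σ² term stays fixed whatever the modeler does.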

In words, the two main components of error in predictions are:
• Mean squared prediction error
• Under control of the modeler

[Figure: example model fits labelled overfitting, underfitting, and “just right”]
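Overfitting and underfitting can be illustrated by comparing training error with error on new data. The sketch below is an illustration under stated assumptions, not the example shown on the slide: it uses a k-nearest-neighbour average on simulated data from Y = sin(x) + noise, with k = 1 as a very flexible model, k = 200 (the overall mean) as a very rigid one, and k = 15 in between. All names and settings are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x0, k):
    """Average the outcomes of the k training points closest to x0."""
    idx = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in idx) / k


def mse(train_x, train_y, eval_x, eval_y, k):
    """Mean squared error of k-NN predictions on the evaluation set."""
    errors = [(y - knn_predict(train_x, train_y, x, k)) ** 2
              for x, y in zip(eval_x, eval_y)]
    return sum(errors) / len(errors)


def simulate(n, rng, sigma=0.5):
    """Draw n points from Y = sin(x) + noise (an arbitrary illustrative choice)."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, sigma) for x in xs]
    return xs, ys


rng = random.Random(2024)
train_x, train_y = simulate(200, rng)
test_x, test_y = simulate(1000, rng)

results = {}  # k -> (training MSE, test MSE)
for k in (1, 15, 200):
    results[k] = (mse(train_x, train_y, train_x, train_y, k),
                  mse(train_x, train_y, test_x, test_y, k))
    print(f"k={k:3d}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With k = 1 every training point predicts itself, so the training error is zero while the test error is not. That gap between apparent and new-data performance is the signature of overfitting.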

• Irreducible error
• Not under direct control of the modeler

What can we do to reduce “irreducible” error?
Changing the information
• Using text (NLP/text mining)
  • For research: e.g. predicting life expectancy, https://bit.ly/2k8Ao8e
• Analyzing social media posts
  • e.g. pharmacovigilance, adverse events monitoring via Twitter posts, https://bit.ly/2m0KKrg
• Speech signal processing
  • e.g. Parkinson’s disease, https://bit.ly/2v3ZdHR
• Medical imaging
• Medical imaging

Examples where AI has done well

Example: retinal disease
Gulshan et al., JAMA, 2016, doi: 10.1001/jama.2016.17216; Picture retinopathy: https://bit.ly/2kB3X2w
Diabetic retinopathy
Deep learning (= Neural network)
• 128,000 images
• Transfer learning (preinitialization)
• Sensitivity and specificity > .90
• Estimated from training data
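For reference, sensitivity and specificity are computed from a classifier's binary calls against the reference standard. The sketch below uses a small made-up set of labels purely for illustration; the data and the function name are hypothetical and unrelated to the cited study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = disease, 0 = no disease."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1          # diseased, flagged by the model
        elif truth == 1:
            fn += 1          # diseased, missed by the model
        elif pred == 0:
            tn += 1          # healthy, correctly cleared
        else:
            fp += 1          # healthy, falsely flagged
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 4 diseased and 6 healthy eyes.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_call = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(reference, model_call)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Both measures depend on the threshold used to turn a predicted probability into a positive or negative call, so they describe one operating point rather than the model as a whole.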

Approval of AI devices by FDA rapidly growing
Source: https://tinyurl.com/khn4dvyb (accessed 21/08/2024)

Examples where AI has done poorly

Predicting mortality – the conclusion
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344

Predicting mortality – the results
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344

Predicting mortality – the media
PLOS ONE, 2018, DOI: 10.1371/journal.pone.0202344; https://bit.ly/2Q6H41R; https://bit.ly/2m3RLrn

HYPE!

Recidivism Algorithm
ProPublica (2016) https://bit.ly/1XMKh5R

Skin cancer and rulers
Esteva et al., Nature, 2017, DOI: 10.1038/nature21056; https://bit.ly/2lE0vV0

https://www.tctmd.com/news/machine-learning-helps-predict-hospital-mortality-post-tavr-skepticism-abounds

AI assistance leads to more accurate diagnosis of liver cancer!

AI assistance leads to more accurate diagnosis of liver cancer! If AI is correct
AI assistance leads to less accurate diagnosis of liver cancer! If AI is incorrect

How can the field of clinical prediction models using AI maximise benefits and minimise risks and waste?

Image source: http://www.meditationcircle.org.uk/notes/acceptance/

The ML/AI model is only one small element in getting the model into clinical practice
Source: https://tinyurl.com/jr23pdsk; courtesy Dr Ilse Kant (UMCU)

Leaky pipeline of clinical prediction models
Van Royen et al, ERJ, doi: 10.1183/13993003.00250-2022, also credits to Laure Wynants

Flexible algorithms are data hungry
From slide deck Ben van Calster: https://bit.ly/38Aqmjs

Flexible algorithms are energy hungry
The costs of training (cloud computing) the Transformer once (!) are estimated at 1 to 3 million dollars
https://bit.ly/33Dj38X

Expect heterogeneity in model performance
Wessler, Circulation CQO, 2021, doi: 10.1161/CIRCOUTCOMES.121.007858
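One well-known reason performance varies between validation settings is case mix: the same model discriminates better in a population where the predictor values are more spread out. The sketch below is a simulation under stated assumptions, not an analysis from the cited paper. The logistic model, its coefficients, and the two predictor spreads are hypothetical.

```python
import math
import random


def c_statistic(risks, outcomes):
    """Probability that a random case has a higher predicted risk than a random
    non-case, computed from ranks (ties in predicted risk are not handled)."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    rank_sum = sum(rank for rank, i in enumerate(order, start=1) if outcomes[i] == 1)
    n_cases = sum(outcomes)
    n_controls = len(outcomes) - n_cases
    return (rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def validate(predictor_sd, n=5000, seed=7):
    """Apply one fixed, correctly specified logistic model in a simulated setting."""
    rng = random.Random(seed)
    risks, outcomes = [], []
    for _ in range(n):
        x = rng.gauss(0, predictor_sd)                 # case mix differs by setting
        risk = 1 / (1 + math.exp(-(-1.0 + 1.0 * x)))   # same model in every setting
        risks.append(risk)
        outcomes.append(1 if rng.random() < risk else 0)
    return c_statistic(risks, outcomes)


if __name__ == "__main__":
    print(f"homogeneous setting:   c = {validate(0.5):.2f}")
    print(f"heterogeneous setting: c = {validate(2.0):.2f}")
```

The model is identical and correct in both settings, yet the c-statistic differs. A single validation result therefore says little about performance elsewhere.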

Dutch guideline prediction models based on AI
https://www.leidraad-ai.nl/
• Phase 1: Collection and management of the data
• Phase 2: Development of the AIPA
• Phase 3: Validation of the AIPA
• Phase 4: Development of the required software
• Phase 5: Impact assessment of the AIPA in combination with the software
• Phase 6: Implementation and use of the AIPA with software in daily practice
Saskia Haitjema
Andre Dekker
Paul Algra
Amy Eikelenboom
Christian van Ginkel
Martine de Vries
Daniel Oberski
Desy Kakiay
Kicky van Leeuwen
Joran Lokkerbol
Evangelos Kanoulas
Gabrielle Davelaar
Wouter Veldhuis
Bart-Jan Verhoeff
Vincent Stirler
Daan van den Donk
Huib Burger
Giovanni Cina
Martijn van der Meulen
Maurits Kaptein
Floor van Leeuwen
Egge van der Poel
Marcel Hilgersom
Sade Faneyte
Jonas Teuwen
Teus Kappen
Ewout Steyerberg
Leo Hovestadt
René Drost
Bart Geerts
Anne de Hond
René Verhaart
Nynke Breimer
Karen Wiegant
Laure Wynants
Lysette Meuleman

AI ecosystem in the University Medical Center Utrecht
You are here

R&D concentrated in 5 AI labs
https://www.umcutrecht.nl/en/campaign/ai-labs

AI BLESSINGS AND CURSES
Blessings
• Real innovation
• Methods/architectures allowing (unstructured) use of new types of data at scale
• Computing power
• Software
• Clinical trials showing benefit of AI assistance
• Willingness to invest in prediction using AI
Curses
• Hype
• AI rebranding and reinventions
• Traditional issues such as low N, lack of validation, poor reporting, data quality, generalizability
• More research waste
• Energy consumption
• Other expenses beyond model training

Maarten van Smeden
Julius Center for Health Sciences and Primary Care
University Medical Center Utrecht
Director of UMC Utrecht AI methods lab
Team lead of health data science group
Head of Julius Center’s methods program
E-mail: M.vanSmeden@umcutrecht.nl
