High-Confidence Data Programming for Evaluating Suppression of Physiological Alarms
Sydney Pugh¹, Ivan Ruchkin¹, Christopher P. Bonafide², Sara B. DeMauro², Oleg Sokolsky¹, Insup Lee¹, James Weimer¹
¹ Department of Computer and Information Science, University of Pennsylvania
² Children’s Hospital of Philadelphia
Outline
• Introduction
• Motivating Scenarios
• Problem Statement
• Related Work
• Approach
• Conclusion and Future Work
Introduction
• <1% of alarms considered actionable or informative
• 7-10 minute delays due to alarm fatigue
• Clinical alarm fatigue in ECRI Top 10 Health Tech Hazards since 2007
[Diagram: Patient, Alarm-Generating Device, Suppression System, Alarms (suppressed and not suppressed), Clinical Expert, Performance Evaluation]
Performance Evaluation
• Key characteristics:
• Sensitivity
• Specificity
• Typically requires a labeled alarm dataset
• Manually label the alarms …but this is expensive
Motivating Scenarios
Scenario 1: Pre-Trial Evaluation
[Diagram: Suppression System, Hospital A, Hospital B]
• How well will it work?
Scenario 2: Tuning of a Deployed System
[Diagram: Tunable Suppression System, Hospital A]
• ALARM FATIGUE!!!
• How to tune the system?
Want to evaluate with unlabeled data
…but true labels will always be the gold standard
Problem Statement
Given an unlabeled dataset of alarms, predict the sensitivity/specificity
of an alarm suppression system that would result from an
observational study with true alarm labels.
Related Work
• Alarm suppression system evaluation
• Interventional and observational studies [MacMurchy, et al., 2017]
• High-precision methods of labeling alarm data
• Patient simulations and clinical trials [Jang, et al., 2018]
• Data programming [Ratner, et al., 2016]
Our work is not meant to replace these studies
…instead, it is meant to prioritize and guide them and to reduce their risk.
Approach
We will discuss the approach via a case study.
Data
• Collect a dataset of representative alarms and corresponding patient data
• $X = \{x_1, x_2, \dots\}$
Case Study: Low SpO2 Alarms
• Suppression system under study:
• Suppress alarm if SpO2 at time of alarm is above 𝑇, otherwise do not suppress
• Dataset:
• IRB study of 100 children from Children’s Hospital of Philadelphia
• Patient background info, continuously recorded vital signs, manually annotated
physiologic monitoring alarms
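The threshold rule and its evaluation against annotated alarms can be sketched as follows. This is a minimal illustration, not the authors' code: the `Alarm` fields, the SpO2 values, the threshold of 82, and the convention that "suppressible" is the positive class are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    spo2_at_alarm: float  # SpO2 (%) at the time the alarm fired (made-up values below)
    suppressible: bool    # true label from manual annotation (hypothetical field)


def suppress(alarm: Alarm, threshold: float) -> bool:
    """Rule from the slide: suppress iff SpO2 at the time of the alarm is above T."""
    return alarm.spo2_at_alarm > threshold


def sensitivity_specificity(alarms, threshold):
    """Assumed convention: positive class = suppressible alarm."""
    tp = sum(1 for a in alarms if a.suppressible and suppress(a, threshold))
    fn = sum(1 for a in alarms if a.suppressible and not suppress(a, threshold))
    tn = sum(1 for a in alarms if not a.suppressible and not suppress(a, threshold))
    fp = sum(1 for a in alarms if not a.suppressible and suppress(a, threshold))
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
alarms = [
    Alarm(88.0, True), Alarm(86.0, True), Alarm(84.0, True), Alarm(79.0, True),
    Alarm(83.0, False), Alarm(78.0, False), Alarm(87.0, False), Alarm(75.0, False),
]
sens, spec = sensitivity_specificity(alarms, threshold=82.0)
print(sens, spec)  # 0.75 0.5
```

With true labels this computation is direct; the rest of the approach addresses the case where the `suppressible` field is not available.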
Labeling Functions
• Ask clinicians to describe guidelines for alarm suppressibility
• Want quantitative guidelines (qualitative guidelines are difficult to encode)
• 1 guideline → 1 or more labeling functions
• Assumption: each labeling function has at least 50% accuracy
• This is difficult…will discuss more in future work
• Apply labeling functions $\Lambda$ to the data $X$ to get weak labels $L_\Lambda(X)$
Case Study: 12 Guidelines (62 Labeling Functions)
Example 1: Short Alarm
If alarm duration is less than 𝑡, then suppressible; otherwise abstain
Example 2: Repeat Alarms
If more than 𝑛 alarms occurred within 𝑡 of the alarm, then non-suppressible; otherwise abstain
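The two example guidelines could be encoded roughly as below. This is a sketch under stated assumptions: the label encoding, the dictionary keys `duration_s` and `time_s`, the default values of `t` and `n`, and the choice to count other alarms in a symmetric window around the alarm are all hypothetical, since the slides do not fix them.

```python
SUPPRESSIBLE, NON_SUPPRESSIBLE, ABSTAIN = 1, 0, -1  # assumed label encoding


def lf_short_alarm(i, alarms, t=10.0):
    """Example 1: duration less than t seconds -> suppressible; otherwise abstain."""
    return SUPPRESSIBLE if alarms[i]["duration_s"] < t else ABSTAIN


def lf_repeat_alarms(i, alarms, n=3, t=300.0):
    """Example 2: more than n other alarms within t seconds -> non-suppressible."""
    t0 = alarms[i]["time_s"]
    nearby = sum(
        1 for k, a in enumerate(alarms)
        if k != i and abs(a["time_s"] - t0) <= t
    )
    return NON_SUPPRESSIBLE if nearby > n else ABSTAIN


LABELING_FUNCTIONS = [lf_short_alarm, lf_repeat_alarms]


def apply_labeling_functions(alarms):
    """Return one row of weak labels per alarm, one column per labeling function."""
    return [[lf(i, alarms) for lf in LABELING_FUNCTIONS] for i in range(len(alarms))]


# Toy alarms, invented for illustration only.
alarms = [
    {"time_s": 0.0, "duration_s": 5.0},
    {"time_s": 30.0, "duration_s": 40.0},
    {"time_s": 60.0, "duration_s": 8.0},
    {"time_s": 90.0, "duration_s": 60.0},
    {"time_s": 120.0, "duration_s": 12.0},
    {"time_s": 2000.0, "duration_s": 3.0},
]
weak_labels = apply_labeling_functions(alarms)
```

Sweeping `t` and `n` over several values is one way a single guideline can yield more than one labeling function.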
Probabilistic Labeling
Case Study: Snorkel
• State-of-the-art tool for weak label combination
• Input: Weak Labels for Alarms $L_\Lambda(X)$ (noisy labels)
• Train Model $h \in H$, where $H \subset \{h : \mathcal{L}_\Lambda \to P(Y)\}$ is a Class of Generative Models
• Learn a weight for each labeling function; the weights are a function of the labeling functions' unknown accuracies.
• Output: Obtain Strong Labels and Associated Confidences $f(x), g(x)$ (probabilistic labels)
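To make the roles of $f(x)$ and $g(x)$ concrete, here is a small stand-in for the combination step. It is not Snorkel's algorithm: Snorkel learns the labeling-function accuracies from unlabeled data, whereas this sketch takes hypothetical accuracies as given and combines votes with a naive-Bayes log-odds rule, assuming conditionally independent labeling functions and class-independent abstentions.

```python
import math

ABSTAIN = -1  # assumed encoding; 1 = suppressible, 0 = non-suppressible


def probabilistic_label(votes, accuracies, prior_pos=0.5):
    """Return (f, g): the strong label and its confidence for one alarm.

    votes      -- one weak label per labeling function (1, 0, or ABSTAIN)
    accuracies -- assumed accuracy of each labeling function, strictly in (0.5, 1)
    """
    log_odds = math.log(prior_pos / (1.0 - prior_pos))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence under the stated assumptions
        weight = math.log(acc / (1.0 - acc))  # more accurate functions weigh more
        log_odds += weight if vote == 1 else -weight
    p_pos = 1.0 / (1.0 + math.exp(-log_odds))
    label = 1 if p_pos >= 0.5 else 0      # f(x); ties go to class 1 by convention
    confidence = max(p_pos, 1.0 - p_pos)  # g(x)
    return label, confidence


# Made-up votes and accuracies for a single alarm.
f_x, g_x = probabilistic_label([1, 0, 1], [0.9, 0.6, 0.7])
print(f_x, round(g_x, 4))  # 1 0.9333
```

The weight formula shows why the slide's assumption of at least 50% accuracy matters: below 0.5 the weight turns negative and the function's votes count against the class it names.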
Confidence Bounds
• Create interval
$$C = \big[\, R(I_j) - c_j,\; R(I_j) + c_j \,\big]$$
containing the sensitivity/specificity from an observational study $I_j^*$ with some probability $p$
• Interval size $c_j$ depends on two factors:
• the sampling randomness between $I_j$ and $I_j^*$
• the quality of the probabilistic labels in $I_j$
Confidence Bounds
[Diagram: High-Confidence Sample Sets, Sensitivity/Specificity Estimation, Confidence Bounds]
• We place more trust in samples that we label with high confidence
$$I_j \subseteq \{\, x \in X \mid f(x) = j \,\wedge\, g(x) \ge 1 - \varepsilon \,\}$$
• $R(I_j)$ is the proportion of high-confidence samples the system labeled $j$
• Assumption 1: Consistency across Datasets
$$\big|\, \mathbb{E}[R(I_j)] - \mathbb{E}[R(I_j^*)] \,\big| \le 1 - \eta_j$$
where $\mathbb{E}[R(I_j)]$ is the estimated rate, $\mathbb{E}[R(I_j^*)]$ is the “actual” rate, and $1 - \eta_j$ is the average label uncertainty
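Selecting a high-confidence sample set and computing its rate can be sketched as follows. The dictionary keys `f`, `g`, and `system`, the value of `eps`, and the sample values are hypothetical; here $I_j$ is taken to be the full set satisfying the condition, although the slide only requires a subset of it.

```python
def high_confidence_set(samples, j, eps):
    """I_j: samples whose strong label f is j with confidence g >= 1 - eps."""
    return [s for s in samples if s["f"] == j and s["g"] >= 1.0 - eps]


def rate(I_j, j):
    """R(I_j): proportion of samples in I_j that the suppression system labeled j."""
    return sum(1 for s in I_j if s["system"] == j) / len(I_j)


# Made-up samples: f = strong label, g = confidence, system = suppression system output.
samples = [
    {"f": 1, "g": 0.99, "system": 1},
    {"f": 1, "g": 0.97, "system": 1},
    {"f": 1, "g": 0.96, "system": 0},
    {"f": 1, "g": 0.80, "system": 0},  # excluded: confidence too low
    {"f": 0, "g": 0.98, "system": 0},
    {"f": 0, "g": 0.96, "system": 1},
    {"f": 0, "g": 0.60, "system": 1},  # excluded: confidence too low
]

I_1 = high_confidence_set(samples, j=1, eps=0.05)
I_0 = high_confidence_set(samples, j=0, eps=0.05)
print(len(I_1), rate(I_1, 1))  # 3 0.6666666666666666
print(len(I_0), rate(I_0, 0))  # 2 0.5
```

A smaller `eps` keeps only better-labeled samples but shrinks $|I_j|$, which is the trade-off the interval size $c_j$ has to balance.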
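• Theorem 1: Bounded Difference of Estimates
$$\mathbb{P}\big( |R(I_j) - R(I_j^*)| \ge c_j \big) \le 2\exp\!\big( -2\,|I_j^*|\,(c_j + \eta_j - 1 - \gamma_j)^2 \big) + 2\exp\!\big( -2\,|I_j|\,\gamma_j^2 \big)$$
• Optimization
$$c_j = \min \left[\, 1 - \eta_j + \gamma_j + \sqrt{ \frac{ \ln 2 - \ln\!\big( p_j - 2\exp( -2\,|I_j|\,\gamma_j^2 ) \big) }{ 2\,|I_j^*| } } \,\right]$$
[The minimization is over two variables whose symbols are illegible in the source.]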
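One way to evaluate the optimization numerically is a grid search over $\gamma_j$ with the sample set held fixed. This is only a sketch: the values of `eta`, `n` (for $|I_j|$), `n_star` (for $|I_j^*|$), and `p` are invented, and `p` is treated as the target value of the right-hand side of Theorem 1. It assumes the squared first exponent, which is what makes the closed form for $c_j$ follow from the theorem.

```python
import math


def bound_rhs(c, gamma, eta, n, n_star):
    """Right-hand side of Theorem 1 for half-width c and slack gamma."""
    return (2.0 * math.exp(-2.0 * n_star * (c + eta - 1.0 - gamma) ** 2)
            + 2.0 * math.exp(-2.0 * n * gamma ** 2))


def half_width(gamma, eta, n, n_star, p):
    """Closed-form c_j for a fixed gamma; infinite when gamma is infeasible."""
    slack = p - 2.0 * math.exp(-2.0 * n * gamma ** 2)
    if slack <= 0.0:
        return math.inf  # second term alone already exceeds the budget p
    root = math.sqrt((math.log(2.0) - math.log(slack)) / (2.0 * n_star))
    return 1.0 - eta + gamma + root


def optimize_half_width(eta, n, n_star, p, steps=10000):
    """Grid search over gamma in (0, 1); returns (best c_j, best gamma)."""
    best_c, best_gamma = math.inf, None
    for k in range(1, steps):
        gamma = k / steps
        c = half_width(gamma, eta, n, n_star, p)
        if c < best_c:
            best_c, best_gamma = c, gamma
    return best_c, best_gamma


# Hypothetical inputs, chosen only to exercise the formula.
c, gamma = optimize_half_width(eta=0.98, n=2000, n_star=500, p=0.05)
print(c, gamma)
```

Substituting the returned pair back into `bound_rhs` should reproduce `p`, which is a quick consistency check between the theorem and the closed form.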
Case Study: Confidence Bounds
              Sensitivity   Specificity   Trade-off
Bound Width   0.045         0.201         N/A
Containment   1.000         1.000         1.000
Comparative Approach
• Probabilistic labeling by majority vote
• Each labeling function has equal weight
• Weaknesses:
• Inaccurate labeling functions can yield incorrect labels with high confidence
• Does not consider class balance
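A minimal version of the equal-weight baseline is sketched below; the label encoding and the tie-breaking rule are assumptions. The last call illustrates the first weakness listed above: if a few labeling functions agree, the confidence is 1.0 regardless of how accurate they are.

```python
ABSTAIN = -1  # assumed encoding; 1 = suppressible, 0 = non-suppressible


def majority_vote(votes):
    """Equal-weight vote: returns (label, confidence) for one alarm.

    Confidence is the share of non-abstaining votes that agree with the label.
    If every labeling function abstains, no label is produced.
    """
    pos = sum(1 for v in votes if v == 1)
    neg = sum(1 for v in votes if v == 0)
    if pos + neg == 0:
        return None, 0.5
    p_pos = pos / (pos + neg)
    label = 1 if p_pos >= 0.5 else 0  # ties go to class 1 by convention
    return label, max(p_pos, 1.0 - p_pos)


print(majority_vote([1, 1, 0, -1]))  # (1, 0.6666666666666666)
print(majority_vote([0, 0, 0, -1]))  # (0, 1.0), even if all three functions are weak
```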
Comparative Approach
              Sensitivity   Specificity   Trade-off
Bound Width   0.060         0.072         N/A
Containment   0.800         0.780         0.115
Conclusion
• We proposed an approach for estimating the performance of a
physiologic alarm suppression system with access only to
unlabeled data
• We demonstrated that our approach achieves moderately tight
bounds with high containment in a low SpO2 alarm case study
• Future work:
• Automated extraction of labeling functions
• Unsupervised calibration for data programming
• More case studies!
THANK YOU!
http://precise.seas.upenn.edu
LET’S CONNECT!
Sydney Pugh
sfpugh.github.io

