High-Confidence Data Programming for Evaluating Suppression of Physiological Alarms

Sydney Pugh¹, Ivan Ruchkin¹, Christopher P. Bonafide², Sara B. DeMauro², Oleg Sokolsky¹, Insup Lee¹, James Weimer¹
¹ Department of Computer and Information Science, University of Pennsylvania
² Children’s Hospital of Philadelphia

Outline
• Introduction
• Motivating Scenarios
• Problem Statement
• Related Work
• Approach
• Conclusion and Future Work

Introduction
• Clinical alarm fatigue in ECRI Top 10 Health Tech Hazards since 2007
• <1% of alarms considered actionable or informative
• 7-10 minute delays due to alarm fatigue
[Diagram: Patient, Alarm-Generating Device, Suppression System, Alarms (suppressed and not suppressed), Clinical Expert]
Performance Evaluation
• Key characteristics:
• Sensitivity
• Specificity
• Typically requires a labeled alarm dataset
• Manually label the alarms
…but this is expensive

Motivating Scenarios
Scenario 1: Pre-Trial Evaluation
• Hospital A, Hospital B
• Suppression System
• How well will it work?
Scenario 2: Tuning of a Deployed System
• Hospital A: ALARM FATIGUE!!!
• Tunable Suppression System
• How to tune the system?
Want to evaluate with unlabeled data
…but true labels will always be the gold standard

Problem Statement
Given an unlabeled dataset of alarms, predict the sensitivity/specificity
of an alarm suppression system that would result from an
observational study with true alarm labels.

Related Work
• Alarm suppression system evaluation
• Interventional and observational studies [MacMurchy, et al., 2017]
• High-precision methods of labeling alarm data
• Patient simulations and clinical trials [Jang, et al., 2018]
• Data programming [Ratner, et al., 2016]
Our work is not meant to replace these studies
…instead, prioritize, guide, and reduce the risk of them.

Approach
We will discuss the approach via a case study.

Data
• Collect a dataset of representative alarms and corresponding patient data
• $X = \{x_1, x_2, \dots\}$
Case Study: Low SpO2 Alarms
• Suppression system under study:
• Suppress alarm if SpO2 at time of alarm is above 𝑇, otherwise do not suppress
• Dataset:
• IRB study of 100 children from Children’s Hospital of Philadelphia
• Patient background info, continuously recorded vital signs, manually annotated physiologic monitoring alarms
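The suppression rule under study can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the `Alarm` class, its `spo2_at_alarm` field, and the threshold value 88.0 are hypothetical placeholders for whatever $T$ and data schema are actually used.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    # SpO2 (%) recorded at the time the alarm fired (hypothetical field name)
    spo2_at_alarm: float


def suppress(alarm: Alarm, threshold: float) -> bool:
    """Suppress the alarm if SpO2 at alarm time is above T; otherwise do not."""
    return alarm.spo2_at_alarm > threshold


# Illustrative values only; 88.0 is an assumed T, not taken from the study.
alarms = [Alarm(91.0), Alarm(84.5), Alarm(88.0)]
decisions = [suppress(a, threshold=88.0) for a in alarms]
```

Because $T$ is a single parameter, sweeping it yields a family of systems whose sensitivity and specificity can be compared.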

Labeling Functions
• Ask clinicians to describe guidelines for alarm suppressibility
• Want quantitative guidelines (qualitative guidelines are difficult to encode)
• 1 guideline → 1 or more labeling functions
• Assumption: each labeling function has at least 50% accuracy
• This is difficult…will discuss more in future work
• Apply labeling functions $\Lambda$ to the data $X$ to get weak labels $L_\Lambda(X)$
Case Study: 12 Guidelines (62 Labeling Functions)
Example 1: Short Alarm
If alarm duration is less than $t$, then suppressible; otherwise abstain
Example 2: Repeat Alarms
If more than $n$ alarms occurred within $t$ of the alarm, then non-suppressible; otherwise abstain
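The two example guidelines can be encoded as labeling functions roughly as follows. This is a sketch under assumptions: the dictionary keys (`time_s`, `duration_s`, `other_alarm_times_s`), the label encoding, and the parameter values are all hypothetical, and the factory pattern is only one plausible way a single guideline could yield several labeling functions (one per parameter setting).

```python
from typing import Callable, Dict, List

# Assumed label encoding for this sketch
SUPPRESSIBLE, NON_SUPPRESSIBLE, ABSTAIN = 1, 0, -1

LabelingFunction = Callable[[Dict], int]


def make_short_alarm_lf(t: float) -> LabelingFunction:
    """Example 1: alarm duration less than t means suppressible; otherwise abstain."""
    def lf(alarm: Dict) -> int:
        return SUPPRESSIBLE if alarm["duration_s"] < t else ABSTAIN
    return lf


def make_repeat_alarms_lf(n: int, t: float) -> LabelingFunction:
    """Example 2: more than n other alarms within t means non-suppressible; otherwise abstain."""
    def lf(alarm: Dict) -> int:
        nearby = sum(
            1 for other in alarm["other_alarm_times_s"]
            if abs(other - alarm["time_s"]) <= t
        )
        return NON_SUPPRESSIBLE if nearby > n else ABSTAIN
    return lf


def apply_lfs(lfs: List[LabelingFunction], alarms: List[Dict]) -> List[List[int]]:
    """Build the weak-label matrix: one row per alarm, one column per labeling function."""
    return [[lf(a) for lf in lfs] for a in alarms]


# Illustrative parameters and data, not values from the case study
lfs = [make_short_alarm_lf(t=10.0), make_repeat_alarms_lf(n=2, t=300.0)]
alarms = [
    {"time_s": 1000.0, "duration_s": 4.0, "other_alarm_times_s": [100.0]},
    {"time_s": 2000.0, "duration_s": 45.0,
     "other_alarm_times_s": [1800.0, 1900.0, 2100.0]},
]
weak_labels = apply_lfs(lfs, alarms)
```

Each row of `weak_labels` holds one vote per labeling function, with abstentions kept explicit so that a later combination step can ignore them.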

Probabilistic Labeling
Case Study: Snorkel
• State-of-the-art tool for weak label combination
• Weak labels for alarms, $L_\Lambda(X)$ (noisy labels)
• Train a model $h \in H$ from a class of generative models, $H \subset \{h : \mathcal{L}_\Lambda \to P(Y)\}$
• Learn a weight for each labeling function; the weights are a function of the labeling functions' unknown accuracies
• Obtain strong labels and associated confidences, $f(x), g(x)$ (probabilistic labels)
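To make the role of the per-function weights concrete, the sketch below combines votes using log-odds weights derived from accuracies. It is a simplification and not Snorkel's algorithm: the accuracies are passed in as if known, whereas the generative model estimates them without ground truth, and the function name `combine` and the conditional-independence assumption are introduced here for illustration only.

```python
import math
from typing import List, Tuple

ABSTAIN = -1


def combine(votes: List[int], accuracies: List[float],
            prior_pos: float = 0.5) -> Tuple[int, float]:
    """Return (f, g): a label and its confidence.

    Assumes conditionally independent labeling functions whose accuracies
    lie strictly between 0 and 1 (a naive-Bayes-style simplification).
    """
    log_odds = math.log(prior_pos / (1.0 - prior_pos))
    for vote, acc in zip(votes, accuracies):
        if vote == ABSTAIN:
            continue  # abstentions carry no evidence
        weight = math.log(acc / (1.0 - acc))  # larger for more accurate functions
        log_odds += weight if vote == 1 else -weight
    p_pos = 1.0 / (1.0 + math.exp(-log_odds))
    label = 1 if p_pos >= 0.5 else 0
    confidence = max(p_pos, 1.0 - p_pos)
    return label, confidence


# Three votes and one abstention, with assumed accuracies
label, conf = combine([1, 1, 0, -1], [0.9, 0.7, 0.6, 0.8])
```

A labeling function with exactly 50% accuracy gets weight zero here, which is one way to read the assumption that each function should be at least 50% accurate.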

Confidence Bounds
• Create interval
$C = [R(I_j) - c_j,\ R(I_j) + c_j]$
containing the sensitivity/specificity from an observational study $I_j^*$ with some probability $p$
• Interval size $c_j$ depends on two factors:
• the sampling randomness between $I_j$ and $I_j^*$
• the quality of the probabilistic labels in $I_j$

Confidence Bounds
[Diagram: High-Confidence Sample Sets, Sensitivity/Specificity Estimation, Confidence Bounds]

Confidence Bounds
• We place more trust in samples that we label with high confidence
$I_j \subseteq \{x \in X \mid f(x) = j \wedge g(x) \ge 1 - \varepsilon\}$

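Confidence Bounds
$I_j \subseteq \{x \in X \mid f(x) = j \wedge g(x) \ge 1 - \varepsilon\}$
• $R(I_j)$ is the proportion of high-confidence samples the system labeled $j$
• Assumption 1: Consistency across Datasets
$\left| \mathbb{E}[R(I_j)] - \mathbb{E}[R(I_j^*)] \right| \le 1 - \eta_j$
where $\mathbb{E}[R(I_j)]$ is the estimated rate, $\mathbb{E}[R(I_j^*)]$ is the “actual” rate, and $1 - \eta_j$ is the average label uncertainty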
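Selecting a high-confidence sample set and computing the estimated rate on it might look like the sketch below. Several choices are assumptions made for illustration: each sample is a tuple of (label $f$, confidence $g$, system decision), the set is taken to be every qualifying sample rather than a proper subset, and the average confidence is returned as a stand-in for $\eta_j$, whose exact definition is not given on the slide.

```python
from typing import List, Tuple

# (f, g, system_label): probabilistic label, its confidence, system's decision
Sample = Tuple[int, float, int]


def high_confidence_rate(samples: List[Sample], j: int,
                         eps: float) -> Tuple[float, float, int]:
    """Return (rate, average confidence, set size) for class j.

    Keeps samples with f == j and g >= 1 - eps, then measures the
    proportion of them that the suppression system also labeled j.
    """
    selected = [(g, s) for f, g, s in samples if f == j and g >= 1.0 - eps]
    if not selected:
        raise ValueError("no high-confidence samples for this class")
    rate = sum(1 for _, s in selected if s == j) / len(selected)
    avg_conf = sum(g for g, _ in selected) / len(selected)
    return rate, avg_conf, len(selected)


# Illustrative data only
samples = [
    (1, 0.99, 1), (1, 0.97, 1), (1, 0.96, 0),
    (1, 0.80, 0), (0, 0.98, 0), (1, 0.98, 1),
]
rate, avg_conf, size = high_confidence_rate(samples, j=1, eps=0.05)
```

Lowering `eps` raises the average confidence but shrinks the set, which is the tension the bound width has to balance.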

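Confidence Bounds
$I_j \subseteq \{x \in X \mid f(x) = j \wedge g(x) \ge 1 - \varepsilon\}$
• Assumption 1: Consistency across Datasets
$\left| \mathbb{E}[R(I_j)] - \mathbb{E}[R(I_j^*)] \right| \le 1 - \eta_j$
• Theorem 1: Bounded Difference of Estimates
$\mathbb{P}\left( \left| R(I_j) - R(I_j^*) \right| \ge c_j \right) \le 2\exp\left(-2 |I_j^*| (c_j + \eta_j - 1 - \gamma_j)^2\right) + 2\exp\left(-2 |I_j| \gamma_j^2\right)$
• Optimization
$c_j = \min_{\eta_j,\, \gamma_j} \; 1 - \eta_j + \gamma_j + \sqrt{\dfrac{\ln 2 - \ln\left(p_j - 2\exp\left(-2 |I_j| \gamma_j^2\right)\right)}{2 |I_j^*|}}$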
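The optimization step can be illustrated numerically. The sketch below makes several assumptions that are not stated on the slide: the bound is read in its Hoeffding-style form with both exponents squared, $p_j$ is the value the right-hand side of Theorem 1 is set equal to, $\eta_j$ is held fixed, and $\gamma_j$ is searched over a simple grid. The function name `bound_width` and all input values are hypothetical.

```python
import math
from typing import Tuple


def bound_width(n_est: int, n_obs: int, eta: float, p: float,
                grid: int = 1000) -> Tuple[float, float]:
    """Grid-search gamma in (0, 1] to minimize
    c = 1 - eta + gamma + sqrt((ln 2 - ln(p - 2 exp(-2 n_est gamma^2))) / (2 n_obs)).

    n_est stands for |I_j|, n_obs for |I_j*|. Returns (c, gamma).
    """
    best = None
    for k in range(1, grid + 1):
        gamma = k / grid
        slack = p - 2.0 * math.exp(-2.0 * n_est * gamma ** 2)
        if slack <= 0.0:
            continue  # the second exponential term alone already exceeds p
        c = (1.0 - eta + gamma
             + math.sqrt((math.log(2.0) - math.log(slack)) / (2.0 * n_obs)))
        if best is None or c < best[0]:
            best = (c, gamma)
    if best is None:
        raise ValueError("no feasible gamma on the grid")
    return best


# Illustrative inputs only, not values from the case study
c, gamma = bound_width(n_est=2000, n_obs=2000, eta=0.99, p=0.05)
```

Substituting the returned `c` and `gamma` back into the right-hand side of Theorem 1 recovers `p`, which is a quick consistency check on the algebra.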

Case Study: Confidence Bounds
              Sensitivity   Specificity   Trade-off
Bound Width   0.045         0.201         N/A
Containment   1.000         1.000         1.000

Comparative Approach
• Probabilistic labeling by majority vote
• Each labeling function has equal weight
• Weaknesses:
• Inaccurate labeling functions can yield incorrect labels with high confidence
• Does not consider class balance
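The majority-vote baseline is simple enough to sketch directly. The label encoding and the tie-breaking rule (ties go to class 1) are assumptions made for this illustration.

```python
from typing import List, Tuple

ABSTAIN = -1


def majority_vote(votes: List[int]) -> Tuple[int, float]:
    """Equal-weight vote over non-abstaining labeling functions.

    Returns (label, confidence), where confidence is the winning vote
    share. Ties resolve to class 1; all-abstain returns (ABSTAIN, 0.5).
    """
    pos = sum(1 for v in votes if v == 1)
    neg = sum(1 for v in votes if v == 0)
    total = pos + neg
    if total == 0:
        return ABSTAIN, 0.5
    p_pos = pos / total
    label = 1 if p_pos >= 0.5 else 0
    return label, max(p_pos, 1.0 - p_pos)


mixed = majority_vote([1, 1, 0, -1])
unanimous = majority_vote([0, 0, 0])
```

The unanimous case shows the first weakness listed above: three agreeing functions give confidence 1.0 regardless of how accurate they are, and no class prior enters the calculation.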

Comparative Approach
              Sensitivity   Specificity   Trade-off
Bound Width   0.060         0.072         N/A
Containment   0.800         0.780         0.115

Conclusion
• We proposed an approach for estimating the performance of a physiologic alarm suppression system with access only to unlabeled data
• We demonstrated that our approach achieves moderately tight bounds with high containment in a low SpO2 alarm case study
• Future work:
• Automated extraction of labeling functions
• Unsupervised calibration for data programming
• More case studies!

THANK YOU!
http://precise.seas.upenn.edu
LET’S CONNECT!
Sydney Pugh
sfpugh.github.io
