Validation of a Natural Language Processing Protocol for Detecting
Heart Failure Signs and Symptoms in Electronic Health Record Text Notes

Roy J. Byrd (2), Steven R. Steinhubl (1), Jimeng Sun (2),
Shahram Ebadollahi (2), Zahra Daar (1), Walter F. Stewart (1)

(1) Geisinger Medical Center, Center for Health Research, Danville, PA
(2) IBM, T.J. Watson Research Center, Hawthorne, NY
Outline
•   Background and objectives
•   Datasets
•   Tools & Methods
•   Results
•   Discussion
    – Challenges
    – Opportunities
• Summary


• (Iterative annotation refinement)
Background and Objectives
• Background
   – Framingham criteria for HF published in 1971
   – Geisinger/IBM “PredMED” project on predictive modeling for
     early detection of HF, using longitudinal EHRs


• Overall Project Objective
    Better understand the presentation of HF in the primary care
     setting, in order to facilitate its more rapid identification and
                                  treatment


• Objective of this paper:
     Build and validate NLP extractors for Framingham criteria
     (signs and symptoms) from EHR clinical notes, so that they
       may be suitable for downstream diagnostic applications
Framingham HF Diagnostic Criteria

MAJOR SYMPTOMS
1. Paroxysmal Nocturnal Dyspnea (PND) or Orthopnea
2. Neck Vein Distension (JVD)
3. Rales
4. Radiographic Cardiomegaly
5. Acute Pulmonary Edema
6. S3 Gallop
7. Increased Central Venous Pressure (> 16 cm H2O at RA)
8. Circulation Time of 25 seconds**
9. Hepatojugular Reflux (HJR)
10. Weight loss of 4.5 kg in 5 days in response to treatment

MINOR SYMPTOMS
1. Bilateral Ankle Edema
2. Nocturnal Cough
3. Dyspnea on ordinary exertion
4. Hepatomegaly
5. Pleural effusion
6. A decrease in vital capacity by 1/3 of the maximal value recorded**
7. Tachycardia (>120 BPM)

** Not extracted, since these criteria are not documented in routine
   clinical practice.

N Engl J Med. 1971;285:1441-1446.
(Sample downstream analysis)

Reports of Framingham HF criteria in the year prior to diagnosis
(percent with documented criteria)

Criterion       Cases (N=4,644)   Controls (N=45,981)
PND                  17.2                7.2
Rales                17.9                5.8
JVD                   5.2                1.7
Pulm Edema            1.4                0.7
CMegaly              17.7                1.1
Ankle Edema          62.3               28.6
DOE                  65                 22.9
Datasets
• Clinical notes from longitudinal (2001-2010) EHR
  encounters for
   – 6,355 case patients
       • Meet operational criteria for HF**
   – 26,052 control patients
       • Clinic-, gender- and age-matched to cases
   – The case-control distinction is exploited in downstream
     applications; it’s not relevant for criteria extraction.
• Development dataset
   – 65 encounter notes
       • Selected for density of Framingham criteria
       • Annotated by a clinical expert
• Validation dataset
   – 400 encounter notes (200 cases & 200 controls)
       • Randomly selected
       • Annotated by consensus of 4 trained coders
       • N = 1492 criteria

**Operational HF Criteria
   – HF diagnosis on problem list,
   – HF diagnosis in EHR for two outpatient encounters,
   – Two or more medications with ICD-9 code for HF, or
   – One HF diagnosis and one medication with ICD-9 code for HF
Tools
      • LRW (1) – LanguageWare Resource Workbench
          – Basic Text Processing
          – Dictionaries
          – Grammars
      • UIMA (2) – Unstructured Information Management Architecture
          – Execution Pipeline, including I/O management
          – Text Analysis Engines
      • TextSTAT (3) – Simple Text Analysis Tool
          – Concordance program, used for linguistic analysis

[Diagram: UIMA Collection Processing Engine. Encounter Documents → Basic Processing (paragraphs, sentences, tokenization, etc.) → Dictionaries and Grammars for recognizing criteria candidates → Text Analysis Engines for applying constraints and annotating criteria → Extracted Criteria]

   (1) http://www/alphaworks.ibm.com/tech/lrw   (2) http://uima.apache.org   (3) http://neon.niederlandistik.fu-berlin.de/en/textstat
Criteria Extraction Methods: Dictionaries
• Framingham Criteria vocabulary
   – Words and phrases used to mention the 15 Framingham Criteria
   – edema, leg edema, oedema; shortness of breath, SOB
   – Size: ~75 “lemma forms” (main entries) and hundreds of variant forms
• Segment Header words and phrases
   – Patient History, Examination, Plan, Instruction
• Negating words
   – Used to deny criteria
       • no, free of, ruled out
• Counterfactual triggers
   – The criteria may not have occurred
       • if, should, as needed for
• Miscellaneous Classes
   – Weight loss phrases
       • lose weight, diurese
   – Time value words
       • day, week, month
   – Weight units
       • pound, kilogram
   – Diuretics
       • Bumex, Furosemide
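
The dictionary look-up step can be illustrated with a minimal Python sketch. It is not the LanguageWare implementation used in the project: the variant list is a tiny excerpt from the slide, and the labels (`Edema`, `AnkleEdema`, `Dyspnea`) and the function name `find_candidates` are hypothetical names chosen for illustration.

```python
import re

# Hypothetical mini-dictionary: variant form -> candidate label.
# The variants come from the slide; the labels are assumed names.
CRITERIA_VARIANTS = {
    "leg edema": "AnkleEdema",
    "edema": "Edema",
    "oedema": "Edema",
    "shortness of breath": "Dyspnea",
    "sob": "Dyspnea",
}


def _compile(phrases):
    # Try longer phrases first so that "leg edema" wins over "edema".
    ordered = sorted(phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


CRITERIA_RE = _compile(CRITERIA_VARIANTS)


def find_candidates(text):
    """Return (label, matched_text, start, end) for every dictionary hit."""
    hits = []
    for match in CRITERIA_RE.finditer(text):
        label = CRITERIA_VARIANTS[match.group(0).lower()]
        hits.append((label, match.group(0), match.start(), match.end()))
    return hits


if __name__ == "__main__":
    for hit in find_candidates("Denies SOB. Mild leg edema."):
        print(hit)
```

A hit produced this way is only a candidate: whether it counts as affirmed, denied, or ignored depends on the scope and constraint steps described on the following slides.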
Criteria Extraction Methods: Grammars
• Shallow English syntax
   – Noun Phrases
      • some moderate DOE
   – Compound Noun Phrases
      • chest pain, DOE, or night cough
   – Prepositional Phrases
• No full-sentential parses
   – Not needed for simple HF criteria
   – Unreliable sentence boundaries and syntax in clinical notes
• Negated Scope
   – regular rate and rhythm without murmurs, clicks, gallops, or rubs
• Counterfactual Scope
   – Patient should call if she experiences shortness of breath
• Weight Loss
   – 20 pound weight loss in a week with diuretics
• Tachycardia
   – tachy at 120 (to 130)
   – HR: 135
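
As a rough illustration of what the Weight Loss and Tachycardia grammars capture, the sketch below uses Python regular expressions in place of the LanguageWare grammars. The patterns are assumptions modeled only on the example phrases above, and the function names `parse_heart_rate` and `parse_weight_loss` are hypothetical.

```python
import re

# Assumed pattern for heart-rate mentions such as "tachy at 120" or "HR: 135".
HEART_RATE_RE = re.compile(
    r"\b(?:tachy(?:cardia|cardic)?|HR|heart rate)\s*(?:at|@|:|of)?\s*(\d{2,3})",
    re.IGNORECASE,
)

# Assumed pattern for phrases such as "20 pound weight loss in a week".
WEIGHT_LOSS_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*(pounds?|lbs?|kilograms?|kg)\s+weight\s+loss\s+in\s+"
    r"(an?|\d+)\s+(day|week|month)s?",
    re.IGNORECASE,
)


def parse_heart_rate(text):
    """Return the first heart-rate value mentioned in text, or None."""
    match = HEART_RATE_RE.search(text)
    return int(match.group(1)) if match else None


def parse_weight_loss(text):
    """Return (amount, unit, period_count, period_unit), or None."""
    match = WEIGHT_LOSS_RE.search(text)
    if match is None:
        return None
    amount = float(match.group(1))
    unit = match.group(2).lower()
    raw_count = match.group(3).lower()
    count = 1 if raw_count in ("a", "an") else int(raw_count)
    return amount, unit, count, match.group(4).lower()


if __name__ == "__main__":
    print(parse_heart_rate("tachy at 120 (to 130)"))
    print(parse_heart_rate("HR: 135"))
    print(parse_weight_loss("20 pound weight loss in a week with diuretics"))
```

The sketch only extracts the values; deciding whether they affirm a criterion is left to a later constraint step.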
Criteria Extraction Methods: Text Analysis Engines (TAEs)
• Rules to filter candidate criteria created from dictionaries and grammars.
• Deny criteria mentioned in negated contexts
   – regular rate and rhythm without murmurs, clicks, gallops, or rubs → S3Neg
• Ignore criteria in counterfactual contexts
   – Patient should call if she experiences shortness of breath
• Co-occurrence constraints
   – exercise HR: 135 doesn’t affirm Tachycardia
• Disambiguation
   – edema is recognized as APEdema, if near cxr, or in a “Radiology” note, or in a “Chest X-Ray” segment
• Numeric constraints
   – she lost 5 pounds over a month doesn’t affirm WeightLoss
   – tachy @ 115 doesn’t affirm Tachycardia
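
The filtering rules above can be sketched as a single function over candidate records. This is a hedged illustration, not the project's UIMA Text Analysis Engines: the `Candidate` fields, the `resolve` function, the order in which the rules fire, and the literal use of the Framingham thresholds (heart rate above 120 BPM; at least 4.5 kg lost within 5 days) are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

LB_TO_KG = 0.45359237  # pounds to kilograms


@dataclass
class Candidate:
    """A candidate criterion plus the context needed to filter it (assumed shape)."""
    criterion: str                     # e.g. "S3Gallop", "Tachycardia", "WeightLoss"
    negated: bool = False              # inside a negated scope
    counterfactual: bool = False       # inside a counterfactual scope
    context: str = ""                  # nearby words, lower-cased
    heart_rate: Optional[int] = None   # BPM, for Tachycardia candidates
    loss_kg: Optional[float] = None    # for WeightLoss candidates
    loss_days: Optional[int] = None    # for WeightLoss candidates


def resolve(candidate):
    """Return 'affirmed', 'denied', or None when the candidate is dropped."""
    if candidate.counterfactual:
        return None          # ignore criteria in counterfactual contexts
    if candidate.negated:
        return "denied"      # deny criteria in negated contexts
    if candidate.criterion == "Tachycardia":
        # Co-occurrence constraint: an exercise heart rate does not count.
        if "exercise" in candidate.context:
            return None
        # Numeric constraint (assumed): strictly more than 120 BPM.
        if candidate.heart_rate is None or candidate.heart_rate <= 120:
            return None
    if candidate.criterion == "WeightLoss":
        # Numeric constraint (assumed): at least 4.5 kg within 5 days.
        if candidate.loss_kg is None or candidate.loss_days is None:
            return None
        if candidate.loss_kg < 4.5 or candidate.loss_days > 5:
            return None
    return "affirmed"


if __name__ == "__main__":
    print(resolve(Candidate("S3Gallop", negated=True)))
    print(resolve(Candidate("Tachycardia", heart_rate=115)))
    print(resolve(Candidate("WeightLoss", loss_kg=5 * LB_TO_KG, loss_days=30)))
```

Returning `None` for a dropped candidate keeps "no evidence" distinct from an explicit denial, which matters when affirmed and denied criteria are evaluated separately.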
Encounter Labeling Methods
• We can label an encounter note with labels showing the
  criteria that the note mentions
   – The labels can be used by downstream analyses to gather
     information such as: “This patient exhibited those symptoms on
     that date.”
• 2 Methods:
   – Machine-learning
      • Using candidate criteria and scope annotations, as features, …
      • use a [CHAID decision tree] classifier to assign criteria as labels.
   – Rule-based
      • Run the full extractor pipeline, then …
      • Assign labels consisting of all unique criteria that survive filtering.
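
The rule-based method amounts to collecting the unique criteria that survive filtering. A minimal sketch follows, assuming each surviving criterion arrives as a `(criterion, status)` pair and that denied criteria carry a `Neg` suffix, as in the `S3Neg` example; the function name `label_encounter` is hypothetical.

```python
def label_encounter(resolved_criteria):
    """Build the label set for one encounter note.

    resolved_criteria: iterable of (criterion, status) pairs, where status is
    'affirmed', 'denied', or None for candidates that were filtered out.
    """
    labels = set()
    for criterion, status in resolved_criteria:
        if status == "affirmed":
            labels.add(criterion)
        elif status == "denied":
            labels.add(criterion + "Neg")  # assumed naming, after "S3Neg"
        # Filtered-out candidates (status None) contribute no label.
    return sorted(labels)


if __name__ == "__main__":
    note = [
        ("DOE", "affirmed"),
        ("DOE", "affirmed"),   # repeated mentions collapse to one label
        ("S3", "denied"),
        ("Tachycardia", None),
    ]
    print(label_encounter(note))
```

Because labels are a set, repeated mentions of the same criterion in one note yield a single label.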
Results
Evaluation Flow

[Diagram: Encounter Documents → Lexical Look-up & Scope → Lexical + Scope Annotations → Machine Learning or Rules → Encounter Labels → Encounter Label Evaluation. A further box is labeled Criteria.]

Metrics:
 Precision (Positive Predictive Value):
  #TruePositive / (#TruePositive + #FalsePositive)

 Recall (Sensitivity):
  #TruePositive / (#TruePositive + #FalseNegative)

 F-Score (the harmonic mean of Precision and Recall):
  (2 x Precision x Recall) / (Precision + Recall)
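
The three metrics translate directly into code. The sketch below is a plain Python rendering of the formulas above; the function names are illustrative, and the counts in the usage example are made up for demonstration, not taken from the study.

```python
def precision(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return (2 * p * r) / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Illustrative counts only.
    p = precision(tp=90, fp=10)
    r = recall(tp=90, fn=30)
    print(round(p, 3), round(r, 3), round(f_score(p, r), 3))
```

As a consistency check, `f_score(0.925234, 0.896864)` reproduces, to rounding, the overall (exact) F-score of 0.910828 reported for criteria extraction.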
Encounter Labeling Performance

                    Machine-learning method              Rule-based method
                 Recall     Precision   F-Score      Recall     Precision   F-Score
 Affirmed        0.675000   0.754190    0.712401     0.738532   0.899441    0.811083
 Denied          0.945556   0.905319    0.925000     0.987599   0.931915    0.958949
 Overall         0.896364   0.881144    0.888689     0.938462   0.926720    0.932554
 Overall 99%
 Conf. Int.                           (0.848-0.929)                       (0.900-0.964)

    Conclusion: Machine-learning labeling does not significantly underperform
    rule-based labeling.
Performance of Framingham
        Diagnostic Criteria Extraction
                          Precision   Recall     F-score    99% Confidence Interval (F-score)
        Overall (exact)   0.925234    0.896864   0.910828   (0.891 - 0.929)
        Overall (relaxed) 0.948239    0.919164   0.933475   (0.916 - 0.950)
        Affirmed          0.747801    0.789474   0.768072   (0.711 - 0.824)
        Denied            0.982857    0.928058   0.954672   (0.938 - 0.970)

Note: Performance on affirmed criteria is worse, possibly because of their
greater syntactic diversity. For example, we don’t find:
         PleuralEffusion: blunting of the right costophrenic angle
         DOExertion: she felt like she couldn’t get enough air in
Precision and Recall for Individual
             Criteria
Analysis of 1492 extracted criteria:
             PredMED extractions vs.
            Gold Standard annotations




[Confusion matrix: rows are PredMED extractions, columns are Gold Standard annotations; the last column is False Positive and the last row is False Negative.]
ANKED            90     6                                                                                                                           16
ANKEDNeg              230                                                                                                                            6
APED              8         5                                                        2                                                          1   22
APEDNeg                         0
DOE                                 116 17                                                         1                                                3
DOENeg                                3 135                                                        2                                                1
HEP                                           0     1
HEPNeg                                            125
HJR                                                     2   1
HJRNeg                                                      9
JVD                                                             7     2
JVDNeg                                                               91
NC                                                                        2
NCNeg                                                                         43                                                                    2
PLE                                                                                  8
PLENeg                                                                                     1
PND                                   1                                                        7    2
PNDNeg                                                                                             69
RALE                                                                                                    11                                          1
RALENeg                                                                                                      197
RC                                                                                                                 6
RCNeg                                                                                                                  1
S3G                                                                                                                          0
S3GNeg                                                                                                                           131
TACH                                                                                                                                    1           2
TACHNeg                                                                                                                                     0       4
WTL                                                                                                                                             0
False Negative    6    8    5   2     6   5   1    4    1            3               2         2   7         35    2   1     1     10
Discussion
• Challenges
   – Data quality: EHR text data is messy.
       • >10% (i.e., 26/237) of the errors are caused by misspellings & bad sentence boundaries
   – Human anatomy
       • We need a better solution than word co-occurrence constraints
   – Syntactic diversity of affirmed criteria
       • We need deeper syntactic and semantic analysis
   – Contradictions and redundancy
       • An issue for downstream analysis
• Opportunities
   – We can apply similar techniques to other collections of criteria.
       • NY Heart Association
       • European Society of Cardiology
       • MedicalCriteria.com
   – Many specific criteria extractors can be re-used in other settings.
   – For downstream applications, see posters and presentations from our project at this conference
Summary
• Extractors can identify affirmations and denials
  of Framingham HF criteria in EHR clinical notes
  with an overall F-Score of 0.91.
• Classifiers can label EHR encounters with the
  Framingham criteria they mention with an
  F-Score of 0.93.
• Information about HF criteria mentioned in EHR
  notes appears to be useful for downstream
  applications that seek to achieve early detection
  of HF.
Backup:
Iterative Annotation Refinement
Iterative Annotation Refinement
• What are the problems solved?
  – Annotations are required for training and evaluating
    criteria extractors.
  – Human annotators without guidelines have high
    precision but lower recall.
  – Domain experts’ intuitions (about the language for
    expressing criteria) are initially imprecise.
• What is produced?
  – Annotated dataset
  – Annotation guidelines         … that are consistent
  – Criteria extractors
The Development Process: Iterative Annotation Refinement

[Diagram with three phases: Initialization, Results, Iteration]
• Initialization
   – Expert and Linguist: Discuss the language of HF criteria
   – Expert: Write initial guidelines
   – Linguist: Build initial extractors
• Results
   – Expert Annotations
   – Annotation Guidelines
   – Encounter Texts
   – Criteria Extractors
• Iteration
   – Annotate texts with current extractors
   – Perform error analysis
   – Expert: Update the annotations and the guidelines
   – Linguist: Update the extractors
User interface for the annotation tool, which was
used to manage annotations during refinement.
Performance improvement during
         development
[Figure: Performance comparison. Precision (y-axis, 0.5 to 1) versus Recall (x-axis, 0.5 to 1) for PredMED and the Clinical Expert, each shown at an Initial and a Final point.]
Iterative methods for creating
 annotations, guidelines, and extractors
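Iterative Annotation Refinement
   – Extraction target: Framingham HF criteria
   – Result of using the method: Annotations, Guidelines, Extractor
   – Sources of annotations compared in each iteration: Expert and Extractor
   – Arbiter for disagreements at each iteration: Expert
   – Objective (and metric) for each iteration: Improve extractor performance (F-score)

Annotation Induction (Chapman, et al. J Biom Inf 2006)
   – Extraction target: Clinical conditions
   – Result of using the method: Guidelines (in the form of an annotation schema)
   – Sources of annotations compared in each iteration: Expert and Linguist
   – Arbiter for disagreements at each iteration: Consensus
   – Objective (and metric) for each iteration: Improve inter-annotator agreement (F-score)

CDKRM (Coden, et al., J Biom Inf 2009)
   – Extraction target: Classes in the cancer disease model
   – Result of using the method: Annotations, Guidelines
   – Sources of annotations compared in each iteration: 2 Experts
   – Arbiter for disagreements at each iteration: Consensus
   – Objective (and metric) for each iteration: Improve inter-annotator agreement (agreement %)

TALLAL (Carrell, et al, GHRI-IT poster, 2010)
   – Extraction target: PHI (protected health information) classes
   – Result of using the method: Annotations, Extractor
   – Sources of annotations compared in each iteration: Expert and Extractor
   – Arbiter for disagreements at each iteration: Expert
   – Objective (and metric) for each iteration: Annotate full dataset (to the expert’s satisfaction)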

A Descriptive Study of Vaccinations Occuring During Pregnancy HENNINGER
The Use of Administrative Data and Natural Language Processing to Estimate th...
Patient Views of KRAS Testing for Treatment of Metastatic Colorectal Cancer L...
Comparative Effectiveness of Chemotherapy Regimens for Advanced Lung Cancer C...
CER HUB An Informatics Platform for Conducting Compartive Effectiveness with ...
An Application of Doubly Robust Estimation JOHNSON
Risk Factors for Short Term Virologic Outcomes Among HIV Infected Patients Un...
Expanding SEER Reporting with Comorbidity Data Colorectal Cancer HORNBROOK
Drug Characteristics Associated with Medication Adherence Across Eight Diseas...
Feasibility of Implementing Screening Brief Intervention and Referral to Trea...
eCare for Heart Wellness A Trial to Test the Feasibility of Web Based Dietici...
A Telephone Based Diabetes Prevention Program and Social Support for Weight L...
Technological Resources & Personnel Costs Required to Implement an Automated ...
Online Patient Access to their Medical Record and Health Providers is Associa...
Documentations of Advanced Heath Care Directives Where Are They TAI_SEALE

Validation of a Natural Language Processing Protocol for Detecting Heart Failure Signs in Electronic Health Record Notes BYRD

  • 1. Validation of a Natural Language Processing Protocol for Detecting Heart Failure Signs and Symptoms in Electronic Health Record Text Notes
      Roy J. Byrd², Steven R. Steinhubl¹, Jimeng Sun², Shahram Ebadollahi², Zahra Daar¹, Walter F. Stewart¹
      ¹ Geisinger Medical Center, Center for Health Research, Danville, PA
      ² IBM, T.J. Watson Research Center, Hawthorne, NY
  • 2. Outline
      • Background and objectives
      • Datasets
      • Tools & Methods
      • Results
      • Discussion
          – Challenges
          – Opportunities
      • Summary
      • (Iterative annotation refinement)
  • 3. Background and Objectives
      • Background
          – Framingham criteria for HF published in 1971
          – Geisinger/IBM “PredMED” project on predictive modeling for early detection of HF, using longitudinal EHRs
      • Overall Project Objective
          Better understand the presentation of HF in the primary care setting, in order to facilitate its more rapid identification and treatment
      • Objective of this paper:
          Build and validate NLP extractors for Framingham criteria (signs and symptoms) from EHR clinical notes, so that they may be suitable for downstream diagnostic applications
  • 4. Framingham HF Diagnostic Criteria
      MAJOR SYMPTOMS
          1. Paroxysmal Nocturnal Dyspnea (PND) or Orthopnea
          2. Neck Vein Distension (JVD)
          3. Rales
          4. Radiographic Cardiomegaly
          5. Acute Pulmonary Edema
          6. S3 Gallop
          7. Increased Central Venous Pressure (> 16 cm H2O at RA)
          8. Circulation Time of 25 seconds**
          9. Hepatojugular Reflux (HJR)
          10. Weight loss 4.5kg in 5 days in response to treatment
      MINOR SYMPTOMS
          1. Bilateral Ankle Edema
          2. Nocturnal Cough
          3. Dyspnea on ordinary exertion
          4. Hepatomegaly
          5. Pleural effusion
          6. A decrease in vital capacity by 1/3 of the maximal value recorded**
          7. Tachycardia (>120 BPM)
      ** Not extracted, since these criteria are not documented in routine clinical practice.
      N Engl J Med. 1971;285:1441-1446.
  • 5. (Sample downstream analysis) Reports of Framingham HF criteria in the year prior to diagnosis
      [Bar chart: percent with documented criteria, Cases (N=4,644) vs. Controls (N=45,981), for PND, Rales, JVD, Pulm Edema, CMegaly, Ankle Edema, and DOE]
  • 6. Datasets
      • Clinical notes from longitudinal (2001-2010) EHR encounters for
          – 6,355 case patients
              • Meet operational criteria for HF**
          – 26,052 control patients
              • Clinic-, gender- and age-matched to cases
          – The case-control distinction is exploited in downstream applications; it’s not relevant for criteria extraction.
      • Development dataset
          – 65 encounter notes
              • Selected for density of Framingham criteria
              • Annotated by a clinical expert
      • Validation dataset
          – 400 encounter notes (200 cases & 200 controls)
              • Randomly selected
              • Annotated by consensus of 4 trained coders
              • N = 1492 criteria
      **Operational HF Criteria
          – HF diagnosis on problem list,
          – HF diagnosis in EHR for two outpatient encounters,
          – Two or more medications with ICD-9 code for HF, or
          – One HF diagnosis and one medication with ICD-9 code for HF
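The operational HF criteria on the Datasets slide read as a disjunction of four conditions, which can be sketched as a predicate. This is a minimal illustration only: the `PatientRecord` fields and the function name are hypothetical, and reading "one HF diagnosis" as at least one outpatient encounter carrying an HF diagnosis is an assumption, not something the slide states.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    """Hypothetical summary of one patient's structured EHR data."""
    hf_on_problem_list: bool
    outpatient_encounters_with_hf_dx: int  # encounters carrying an HF diagnosis
    hf_medications: int                    # medications with an ICD-9 code for HF


def meets_operational_hf_criteria(p: PatientRecord) -> bool:
    """Return True if any one of the four operational criteria holds."""
    return (
        p.hf_on_problem_list
        or p.outpatient_encounters_with_hf_dx >= 2
        or p.hf_medications >= 2
        # Assumption: "one HF diagnosis" means one encounter-level diagnosis.
        or (p.outpatient_encounters_with_hf_dx >= 1 and p.hf_medications >= 1)
    )
```

Under these assumptions, a patient with one HF encounter diagnosis and one HF medication qualifies as a case, while a patient with only a single encounter diagnosis does not.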
  • 7. Tools
      • LRW¹ – LanguageWare Resource Workbench
          – Basic Text Processing
          – Dictionaries
          – Grammars
      • UIMA² - Unstructured Information Management Architecture
          – Execution Pipeline, including I/O management
          – Text Analysis Engines
      • TextSTAT³ – Simple Text Analysis Tool
          – Concordance program, used for linguistic analysis
      [Diagram: UIMA Collection Processing Engine. Encounter Documents → Basic Text Processing (paragraphs, sentences, tokenization, etc.) → Dictionaries and Grammars (for recognizing criteria candidates) → Text Analysis Engines (for applying constraints and annotating criteria) → Extracted Criteria]
      ¹ http://www/alphaworks.ibm.com/tech/lrw
      ² http://uima.apache.org
      ³ http://neon.niederlandistik.fu-berlin.de/en/textstat
  • 8. Criteria Extraction Methods: Dictionaries
      • Framingham Criteria vocabulary
          – Words and phrases used to mention the 15 Framingham Criteria
          – edema, leg edema, oedema; shortness of breath, SOB
          – Size: ~75 “lemma forms” (main entries) and hundreds of variant forms
      • Segment Header words
          – Patient History, Examination, Plan, Instruction
      • Negating words
          – Used to deny criteria
              • no, free of, ruled out
      • Counterfactual triggers
          – The criteria may not have occurred
              • if, should, as needed for
      • Miscellaneous Classes
          – Weight loss phrases
              • lose weight, diurese
          – Time value words and phrases
              • day, week, month
          – Weight units
              • pound, kilogram
          – Diuretics
              • Bumex, Furosemide
  • 9. Criteria Extraction Methods: Grammars
      • Shallow English syntax
          – Noun Phrases
              • some moderate DOE
          – Compound Noun Phrases
              • chest pain, DOE, or night cough
          – Prepositional Phrases
      • No full-sentential parses
          – Not needed for simple HF criteria
          – Unreliable sentence boundaries and syntax in clinical notes
      • Negated Scope
          – regular rate and rhythm without murmurs, clicks, gallops, or rubs
      • Counterfactual Scope
          – Patient should call if she experiences shortness of breath
      • Weight Loss
          – 20 pound weight loss in a week with diuretics
      • Tachycardia
          – tachy at 120 (to 130)
          – HR: 135
  • 10. Criteria Extraction Methods: Text Analysis Engines (TAEs)
      • Rules to filter candidate criteria created from dictionaries and grammars.
      • Deny criteria mentioned in negated contexts
          – regular rate and rhythm without murmurs, clicks, gallops, or rubs → S3Neg
      • Ignore criteria in counterfactual contexts
          – Patient should call if she experiences shortness of breath
      • Co-occurrence constraints
          – exercise HR: 135 doesn’t affirm Tachycardia
      • Disambiguation
          – edema is recognized as APEdema, if near cxr, or in a “Radiology” note, or in a “Chest X-Ray” segment
      • Numeric constraints
          – she lost 5 pounds over a month doesn’t affirm WeightLoss
          – tachy @ 115 doesn’t affirm Tachycardia
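The filtering rules on the TAE slide can be approximated in a few lines of Python. This is a rough sketch and not the LRW/UIMA implementation: the three-entry term dictionary, the mapping of "shortness of breath" to a DOE label, the inclusion of "without" as a negator, and the "trigger anywhere to the left of the term" scope rule are all simplifying assumptions made for illustration.

```python
import re

# Hypothetical miniature dictionaries; the slides describe ~75 lemma forms.
TERMS = {"gallops": "S3", "shortness of breath": "DOE", "rales": "RALE"}
NEGATORS = ("no", "without", "free of", "ruled out")
COUNTERFACTUALS = ("if", "should", "as needed for")


def _has_trigger(text, triggers):
    """True if any trigger occurs in text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in triggers)


def extract(sentence):
    """Return criterion labels for one sentence; denied ones end in 'Neg'."""
    s = sentence.lower()
    found = []
    for term, label in TERMS.items():
        m = re.search(r"\b" + re.escape(term) + r"\b", s)
        if not m:
            continue
        left = s[:m.start()]  # assumed scope: everything before the term
        if _has_trigger(left, COUNTERFACTUALS):
            continue  # counterfactual context: ignore the candidate
        found.append(label + "Neg" if _has_trigger(left, NEGATORS) else label)
    # Numeric and co-occurrence constraints for tachycardia (>120 BPM).
    hr = re.search(r"\b(?:hr:?|tachy (?:at|@))\s*(\d+)", s)
    if hr and int(hr.group(1)) > 120 and "exercise" not in s[:hr.start()]:
        found.append("TACH")
    return found
```

With these toy rules, `extract("regular rate and rhythm without murmurs, clicks, gallops, or rubs")` returns `["S3Neg"]`, while `extract("exercise HR: 135")` and `extract("tachy @ 115")` both return an empty list, mirroring the slide's examples.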
  • 11. Encounter Labeling Methods
      • We can label an encounter note with labels showing the criteria that the note mentions
          – The labels can be used by downstream analyses to gather information such as: “This patient exhibited those symptoms on that date.”
      • 2 Methods:
          – Machine-learning
              • Using candidate criteria and scope annotations as features, …
              • use a [CHAID decision tree] classifier to assign criteria as labels.
          – Rule-based
              • Run the full extractor pipeline, then …
              • Assign labels consisting of all unique criteria that survive filtering.
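The rule-based labeling step amounts to collecting the distinct criteria that survive filtering. A minimal sketch follows; the `Candidate` type and its `survived` flag are hypothetical stand-ins for whatever annotation objects the extractor pipeline actually emits.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical candidate annotation produced by the extractor pipeline."""
    label: str      # e.g. "DOE" or "S3Neg"
    survived: bool  # False if a filtering rule discarded the candidate


def label_encounter(candidates: Iterable[Candidate]) -> List[str]:
    """Label an encounter with the unique criteria that survive filtering."""
    return sorted({c.label for c in candidates if c.survived})
```

Repeated mentions of the same criterion within a note collapse to a single label, and filtered candidates contribute nothing.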
  • 13. Evaluation
      Flow:
          [Flow diagram with boxes: Encounter Documents; Lexical Look-up & Scope Annotations; Machine Learning → Encounter Labels; Rules → Encounter Labels; Criteria; Encounter Label Evaluation]
      Metrics:
          Precision (Positive Predictive Value): #TruePositive / (#TruePositive + #FalsePositive)
          Recall (Sensitivity): #TruePositive / (#TruePositive + #FalseNegative)
          F-Score (the harmonic mean of Precision and Recall): (2 x Precision x Recall) / (Precision + Recall)
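The three metrics translate directly into code. The function names below are illustrative, and the counts in the usage note are made-up numbers, not results from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn)


def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)
```

For example, 8 true positives with 2 false positives and 2 false negatives give a precision, recall, and F-score of 0.8 each. As a consistency check, `f_score(0.925234, 0.896864)` evaluates to roughly 0.9108, which agrees with the overall (exact) F-score reported on the extraction performance slide.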
  • 14. Encounter Labeling Performance
                                Machine-learning method            Rule-based method
                                Recall    Precision  F-Score       Recall    Precision  F-Score
      Affirmed                  0.675000  0.754190   0.712401      0.738532  0.899441   0.811083
      Denied                    0.945556  0.905319   0.925000      0.987599  0.931915   0.958949
      Overall                   0.896364  0.881144   0.888689      0.938462  0.926720   0.932554
      Overall 99% Conf. Int.                         (0.848-0.929)                      (0.900-0.964)
      Conclusion: Machine-learning labeling does not significantly underperform rule-based labeling.
  • 15. Performance of Framingham Diagnostic Criteria Extraction
                           Precision  Recall    F-score   99% Confidence Interval (F-score)
      Overall (exact)      0.925234   0.896864  0.910828  (0.891 - 0.929)
      Overall (relaxed)    0.948239   0.919164  0.933475  (0.916 - 0.950)
      Affirmed             0.747801   0.789474  0.768072  (0.711 - 0.824)
      Denied               0.982857   0.928058  0.954672  (0.938 - 0.970)
      Note: Performance on affirmed criteria is worse, possibly because of their greater syntactic diversity. For example, we don’t find:
          PleuralEffusion: blunting of the right costophrenic angle
          DOExertion: she felt like she couldn’t get enough air in
  • 16. Precision and Recall for Individual Criteria
  • 17. Analysis of 1492 extracted criteria: PredMED extractions vs. Gold Standard annotations
      [Confusion matrix. Rows: PredMED extractions. Columns: Gold Standard annotations, plus a False Positive column. Labels, each with a "Neg" variant: ANKED, APED, DOE, HEP, HJR, JVD, NC, PLE, PND, RALE, RC, S3G, TACH, WTL. The non-empty cells of each row are listed below in left-to-right order; the column positions of the cells did not survive in the transcript.]
      ANKED 90 6 16
      ANKEDNeg 230 6
      APED 8 5 2 1 22
      APEDNeg 0
      DOE 116 17 1 3
      DOENeg 3 135 2 1
      HEP 0 1
      HEPNeg 125
      HJR 2 1
      HJRNeg 9
      JVD 7 2
      JVDNeg 91
      NC 2
      NCNeg 43 2
      PLE 8
      PLENeg 1
      PND 1 7 2
      PNDNeg 69
      RALE 11 1
      RALENeg 197
      RC 6
      RCNeg 1
      S3G 0
      S3GNeg 131
      TACH 1 2
      TACHNeg 0 4
      WTL 0
      False Negative 6 8 5 2 6 5 1 4 1 3 2 2 7 35 2 1 1 10
  • 18. Discussion
      • Challenges
          – Data quality: EHR text data is messy.
              • >10% (i.e., 26/237) of the errors are caused by misspellings & bad sentence boundaries
          – Human anatomy
              • We need a better solution than word co-occurrence constraints
          – Syntactic diversity of affirmed criteria
              • We need deeper syntactic and semantic analysis
          – Contradictions and redundancy
              • An issue for downstream analysis
      • Opportunities
          – We can apply similar techniques to other collections of criteria.
              • NY Heart Association
              • European Society of Cardiology
              • MedicalCriteria.com
          – Many specific criteria extractors can be re-used in other settings.
          – For downstream applications, see posters and presentations from our project at this conference
  • 20. Summary
      • Extractors can identify affirmations and denials of Framingham HF criteria in EHR clinical notes with an overall F-Score of 0.91.
      • Classifiers can label EHR encounters with the Framingham criteria they mention with an F-Score of 0.93.
      • Information about HF criteria mentioned in EHR notes appears to be useful for downstream applications that seek to achieve early detection of HF.
  • 22. Iterative Annotation Refinement
      • What are the problems solved?
          – Annotations are required for training and evaluating criteria extractors.
          – Human annotators without guidelines have high precision but lower recall.
          – Domain experts’ intuitions (about the language for expressing criteria) are initially imprecise.
      • What is produced?
          – Annotated dataset
          – Annotation guidelines
          – Criteria extractors
          … that are consistent
  • 23. The Development Process: Iterative Annotation Refinement
      [Process diagram in three stages]
      • Initialization
          – Expert: Write initial annotations and guidelines
          – Linguist: Build initial extractors
      • Iteration
          – Annotate Encounter Texts with current extractors
          – Perform error analysis
          – Discuss the language of HF criteria
          – Expert: Update the annotations and the guidelines
          – Linguist: Update the extractors
      • Results
          – Annotations
          – Annotation Guidelines
          – Criteria Extractors
  • 24. User interface for the annotation tool, which was used to manage annotations during refinement.
  • 25. Performance improvement during development
      [Scatter plot "Performance comparison": Precision (y-axis, 0.5 to 1) against Recall (x-axis, 0.5 to 1), showing Initial and Final points for PredMED and for the Clinical Expert]
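  • 26. Iterative methods for creating annotations, guidelines, and extractors
      • Iterative Annotation Refinement
          – Extraction target: Framingham HF criteria
          – Result of using the method: Annotations, Guidelines, Extractor
          – Sources of annotations compared in each iteration: Expert and Extractor
          – Arbiter for disagreements at each iteration: Expert
          – Objective (and metric) for each iteration: Improve extractor performance (F-score)
      • Annotation Induction (Chapman, et al. J Biom Inf 2006)
          – Extraction target: Clinical conditions
          – Result of using the method: Guidelines (in the form of an annotation schema)
          – Sources of annotations compared in each iteration: Expert and Linguist
          – Arbiter for disagreements at each iteration: Consensus
          – Objective (and metric) for each iteration: Improve inter-annotator agreement (F-score)
      • CDKRM (Coden, et al., J Biom Inf 2009)
          – Extraction target: Classes in the cancer disease model
          – Result of using the method: Annotations, Guidelines
          – Sources of annotations compared in each iteration: 2 Experts
          – Arbiter for disagreements at each iteration: Consensus
          – Objective (and metric) for each iteration: Improve inter-annotator agreement (agreement %)
      • TALLAL (Carrell, et al, GHRI-IT poster, 2010)
          – Extraction target: PHI (protected health information) classes
          – Result of using the method: Annotations, Extractor
          – Sources of annotations compared in each iteration: Expert and Extractor
          – Arbiter for disagreements at each iteration: Expert
          – Objective (and metric) for each iteration: Annotate full dataset (to the expert’s satisfaction)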