Tutorial 1:
Methods and Applications of Natural Language
Processing in Medicine
Rui Zhang1, Hua Xu2, Yanshan Wang3, Yifan Peng4
1University of Minnesota, 2University of Texas Health,
3Mayo Clinic, 4Weill Cornell Medicine
International Conference on Artificial Intelligence in Medicine (AIME 2020)
August 25, 2020
Purpose of this tutorial
• Review NLP systems and tools in solving clinical problems and
facilitating clinical research
• Showcase our real-world NLP application in clinical practice and
research across four institutions
• Discuss opportunities and challenges of NLP in medicine
JAMA 2014;311(24):2479-80
Healthcare Big Data
Motivation for Clinical NLP
• ~20% Structured Data: demographics, lab results, medications, diagnoses…
• ~80% Unstructured Data: clinical notes, patient-provided information, family history, social history, radiology reports, pathology reports, …
Speakers and Topics
Developing high-performance NLP solutions for
healthcare applications
Dr. Hua Xu is a Professor at the University of Texas Health School of Biomedical Informatics and a fellow of
the American College of Medical Informatics. His primary research interest is to develop NLP methods
and systems and apply them to clinical research and operations. He has worked on diverse clinical NLP
topics, such as entity recognition, relation extraction, syntactic parsing, word sense disambiguation,
and active learning, with over 200 publications. He has built multiple clinical NLP systems, including the
medication information extraction tool MedEx and, more recently, the comprehensive clinical NLP system CLAMP,
using machine learning and deep learning methods. These tools have been widely used in large clinical
consortia such as OHDSI and CTSA.
• NLP concepts and tasks
• Issues affecting NLP performance
• Tools to facilitate NLP development
• Applications to healthcare
https://sbmi.uth.edu/faculty-and-staff/hua-xu.htm
Transfer Learning of NLP in Medicine
Dr. Yifan Peng is an assistant professor of population health sciences in the Division of Health
Informatics at Weill Cornell Medicine. After receiving his Ph.D. in Computer Science from the
University of Delaware in 2016, Dr. Peng worked as a research fellow at the National Center for
Biotechnology Information at the National Library of Medicine, NIH. Dr. Peng's main research interests
include biomedical and clinical natural language processing and medical image analysis (by courtesy).
His current project focuses on applying information extracted through NLP and image analysis to
radiological data classification.
• Transfer learning
• Pre-training of BERT model on large-scale clinical corpora
• Fine-tuning the BERT model on specific tasks such as NER and RE
• Multi-task learning
http://vivo.med.cornell.edu/display/cwid-yip4002
Digital Phenotyping for Cohort Discovery
Dr. Yanshan Wang is an Assistant Professor at Mayo Clinic. His current work centers on developing
novel NLP and artificial intelligence (AI) methodologies to facilitate clinical research and solve real-world
clinical problems. Dr. Wang has extensive collaborative research experience with physicians,
epidemiology researchers, and statisticians. He has published over 40 peer-reviewed articles at
refereed computational linguistics conferences (e.g., NAACL) and in medical informatics journals and
conferences (e.g., JBI, JAMIA, JMIR, and AMIA). He has served on program committees for EMNLP,
NAACL, IEEE-ICHI, and IEEE-BIBM.
• Cohort retrieval
• Approaches for cohort retrieval
• Case study
• Patient cohort retrieval for clinical trials accrual
https://www.mayo.edu/research/faculty/wang-yanshan-ph-d/bio-20199713
Advances of NLP in Clinical Research
Dr. Rui Zhang is a McKnight Presidential Fellow and Associate Professor in the College of Pharmacy
and the Institute for Health Informatics (IHI), and also graduate faculty in Data Science, at the University
of Minnesota (UMN). He is the Director of NLP Services in the Clinical and Translational Science Institute
(CTSI) at the UMN. Dr. Zhang's research focuses on health and biomedical informatics, especially
biomedical NLP and text mining. His research interests include the secondary analysis of EHR data for
patient care as well as pharmacovigilance knowledge discovery through mining the biomedical literature.
• Background of NLP to Support Clinical Research
• NLP Systems and Tools for Clinical Research
• Case study
• NLP to Support Dietary Supplement Safety Research
http://ruizhang.umn.edu
Schedule
Time           Session                                                                 Presenter
9:00 – 9:05    Introduction                                                            Rui Zhang
9:05 – 9:45    Developing high-performance NLP solutions for healthcare applications   Hua Xu
9:45 – 10:25   Transfer Learning of NLP in Medicine: A case study with BERT            Yifan Peng
10:25 – 10:30  Break
10:30 – 11:10  Digital Phenotyping for Cohort Discovery                                Yanshan Wang
11:10 – 11:50  Advances of NLP for Clinical Research                                   Rui Zhang
11:50 – 12:00  Q&A
Building High-performance
NLP Systems in Healthcare
Hua Xu PhD
School of Biomedical Informatics, University of Texas
AIME NLP Tutorial
8/25/2020
[Diagram: intersection of Data Science, Biomedicine, and NLP]
Disclosure
§ Founder and CEO:
§ Melax Technologies Inc.
§ Consultant:
§ Hebta LLC
§ More Health Inc.
§ DCHealth Technologies Inc.
Outline
01 Overview & Challenges
02 Select the right algorithms
03 Annotate good data
04 Bring humans into the loop
Part 1
01 Overview & Challenges
NLP Tasks – Let’s focus on Information Extraction
• Information retrieval
• Information extraction
• Document classification
• Question answering
• Language generation
Text sources: Wikipedia, websites, social media, email, office files
NLP: "Computational techniques for analyzing and representing naturally occurring languages at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications."
Applications of Biomedical IE Systems
Inputs: clinical documents, drug labels, clinical trial protocols, biomedical literature
→ NLP →
Applications: decision support, business intelligence, clinical research, surveillance
Active Development of Biomedical IE Systems
General purpose: MedLEE, MetaMap, CLAMP, cTAKES
Specific purpose: smoking status, PHI de-identification, social determinants, bleeding events, cancer metastasis, ……
Challenges for End-user to Utilize Biomedical NLP
§ General clinical NLP systems exist, but their performance is often
suboptimal for user-specific applications
§ Specific-purpose NLP systems often show good performance on a given
task, but performance drops when these tools are transported to new settings
§ Generalizability issues arise when users build or deploy NLP applications
§ From one type of document to another
§ From one organization to another
§ From one application to another
An Example of Smoking Status Detection
§ Mayo Clinic cTAKES for smoking detection
§ Sentence-level mention detection and classification – machine learning (ML)
§ Document-level status classification – rules
§ Patient-level summarization – rules
§ Performance drops at deployment
§ i2b2 dataset, F-measure 85.5% (Savova et al. JAMIA 2008)
§ Vanderbilt dataset, F-measure 75% (Liu et al. AMIA 2012)
§ Steps to customize it to improve performance to 89%
§ Collect and annotate local data
§ Re-train models using specific algorithms
§ Specify rules by local physicians
Optimizing NLP performance
could be time-consuming and
costly …
Components for Building High-performance NLP Systems
§ Algorithm: rules, machine learning, deep learning
§ Human: conduct annotation, specify rules, curate knowledge bases
§ Data: what to annotate, annotation quality, annotation cost
Together, these enable practical NLP for biomedicine.
Part 2
02 Select the right algorithms
Rules vs. Machine Learning vs. Deep Learning
Rule-based approach to medication information extraction
§ Input: a clinical document, e.g., discharge summary
§ Output: all drug names with associated signature information such as dose,
frequency, route…
§ Issues:
§ Misspellings and abbreviations
§ ibuprofen ("ibuprfen"), augmentin ("qugmentin"), insulin ("inuslin"), and ASA ("aspirin")
§ Context of drug mentions
§ Allergy: pt is allergic to penicillin
§ Negation: never on warfarin
§ Lab tests: potassium level is normal vs. take potassium
§ Temporal status: was on warfarin 3 days before admission
§ Multiple signatures and multiple drugs in one sentence
§ Coumadin 2.5mg po dly except 5mg qTu,Th
§ start the patient on Lovenox for the duration of this pregnancy, followed by a transition to Coumadin postpartum, to be
continued for likely long-term, possibly lifelong duration.
MedEx – a rule-based tool to identify drug information from free text
• Semantic-based parsing (drug names and signatures)
• Maps to RxNorm concepts
Pipeline: Clinical Text → Pre-processing → Semantic Tagger (Lexicon & Rules) → Parser (Semantic Grammar) → Structured Output
Example: "She is currently maintained on Prograf 3mg bid." → Drugname: Prograf; Strength: 3mg; Frequency: bid

Table 1. Evaluation on discharge summaries from Vanderbilt.
Findings   Prec  Rec   F-Score
DrugName   95.0  91.5  93.2
Strength   98.8  90.5  94.5
Route      98.8  89.6  93.9
Frequency  98.9  93.2  96.0

Table 2. Evaluation on clinic visit notes from Vanderbilt.
Findings   Prec  Rec   F-Score
DrugName   96.7  88.0  92.1
Strength   94.7  94.7  94.7
Route      96.0  87.0  91.3
Frequency  96.8  89.2  92.9

Xu et al. JAMIA 2010; 17:19-24
Define Semantic Categories
Category         Examples
Drug Name        Lisinopril, Famotidine
Strength         50mg, 500/50
Route            by mouth, iv
Frequency        b.i.d., every 2 days
Form             tablet, ointment
Dose Amount      take one tablet
IntakeTime       cc, at 10am
Duration         for 10 days
Dispense Amount  dispensed #30
Refill           refills: 2
Necessity        prn, as needed
Entity Recognition
§ Lexicon lookup tagger
§ Drug names
§ Include RxNorm, UMLS, and manually collected drug terms
§ Exclude certain English terms ("sleep")
§ Regular expression-based tagger
§ Frequency, such as q8hrs
§ Transformation/Disambiguation
§ Rule-based transformation/disambiguation of initial tags into final semantic tags
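A regular-expression-based frequency tagger can be illustrated in a few lines of Python. This is a minimal sketch: the patterns below are illustrative stand-ins, not MedEx's actual rule set.

```python
import re

# Illustrative frequency patterns (q8hrs, bid, tid, qam, prn, ...)
FREQ_PATTERN = re.compile(
    r"\b(q\s*\d+\s*(?:h|hr|hrs|hours?)"                    # q8h, q8hrs
    r"|q\.?d\.?|b\.?i\.?d\.?|t\.?i\.?d\.?|q\.?i\.?d\.?"    # qd, bid, tid, qid
    r"|qam|qpm|prn)\b",
    re.IGNORECASE,
)

def tag_frequencies(text):
    """Return (start, end, matched_text) spans tagged as FREQ."""
    return [(m.start(), m.end(), m.group(0)) for m in FREQ_PATTERN.finditer(text)]

print(tag_frequencies("Prograf 3mg bid, ibuprofen 600mg q8hrs prn"))
```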
Parsing
§ A Chart Parser in NLTK
§ Semantic grammar
§ Parse tree → structured output
§ A regular-expression-based chunker

Figure 1. Simplified semantic grammar.
<S> := <DrugList>
<DrugList> := <Drug> | <Drug><DrugList>
<Drug> := <DGSSIG> | <DGMSIG>
<DGSSIG> := <DGN> | <DGN><SIG>
<SIG> := <DOSE> | <FORM> | <RUT> ….
Example parse (DGMSIG): "Prograf 3mg qam and 2mg qpm" → DGN [Prograf], SIG [3mg qam], SIG [2mg qpm]
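The chart-parsing step can be sketched with NLTK's ChartParser over a toy CFG in the spirit of Figure 1; MedEx's real semantic grammar and lexicon are far larger, so this is only a sketch.

```python
import nltk

# Toy semantic grammar: a drug name followed by signature items (dose, frequency).
grammar = nltk.CFG.fromstring("""
  S    -> DRUG
  DRUG -> DGN SIG | DGN
  SIG  -> DOSE | FREQ | DOSE SIG | FREQ SIG
  DGN  -> 'prograf'
  DOSE -> '3mg' | '2mg'
  FREQ -> 'qam' | 'qpm' | 'bid'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("prograf 3mg bid".split()):
    print(tree)  # the parse tree is then walked to emit structured output
```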
Extend MedEx for the 2009 i2b2 Challenge
Pipeline: Clinical Notes → Sentence Splitter → Section Identification → Spell Checker → MedEx (Semantic Tagger, Parser) → Post-processing → i2b2 Output

Table 3. Evaluation on the 2009 i2b2 data set.
Findings   Prec  Rec   F-Score
DrugName   84.2  87.1  85.6
Dose       89.5  81.8  85.5
Route      91.8  85.8  88.7
Frequency  87.9  85.8  86.8
Reason     45.9  29.6  36.0
Duration   36.4  35.8  36.1
All        83.9  80.3  82.1

Ranked 2nd out of 20 participating teams.
Doan et al. JAMIA 2010; 17:528-31
Machine Learning for clinical entity recognition
§ The 2010 i2b2 Challenge: recognize problem, treatment, and test
§ Convert it into a machine learning task
§ Optimize the ML models
§ ML algorithms: CRFs, SSVMs
§ Features: words, sections, dictionary, representations
§ Entity tag sets: BIO, BIESO
Example (BIO tagging):
She was given 1 unit of packed red blood cell .
O   O   O     O O    O  B      I   I     I    O
"Plavix was not recommended, given her recent GI bleeding."
Jiang et al. JAMIA 2011
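A minimal sketch of CRF-based BIO tagging, using sklearn-crfsuite as a stand-in; the original systems used far richer, optimized feature sets.

```python
import sklearn_crfsuite

def token_features(sent, i):
    # A deliberately tiny feature set: lowercase form plus immediate context.
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),
        "prev_word": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

sent = "She was given 1 unit of packed red blood cell .".split()
labels = ["O", "O", "O", "O", "O", "O", "B", "I", "I", "I", "O"]

X_train = [[token_features(sent, i) for i in range(len(sent))]]
y_train = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, y_train)
print(crf.predict(X_train))
```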
Results
Tags   Features   SSVMs – F (R/P)       CRFs – F (R/P)
BIO    Baseline   84.51 (82.61/86.49)   84.02 (81.32/86.90)
BIO    Optimized  85.22 (84.05/86.43)   85.16 (82.94/87.50)
BIESO  Baseline   84.71 (82.53/87.02)   84.22 (81.40/87.23)
BIESO  Optimized  85.82 (84.31/87.38)   85.59 (83.16/88.16)
Tang et al. BMC Medical Informatics and Decision Making 2013
Contextual embeddings for deep learning-based NER
Embedding methods:
• Traditional: word2vec, GloVe, fastText
• Contextual: ELMo, BERT-base, BERT-large, BioBERT
Pre-training:
• Open-domain: off-the-shelf (general) embeddings
• Clinical domain: pre-trained on clinical notes from MIMIC-III, starting from an open-domain checkpoint
Evaluation (entity recognition tasks): i2b2 2010, i2b2 2012, SemEval 2014 Task 7, SemEval 2014 Task 14
Si Y et al. Enhancing clinical concept extraction with contextual embeddings. JAMIA. 2020
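A sketch of the fine-tuning setup with Hugging Face transformers; "bert-base-uncased" is a placeholder where Si et al. would instead start from a checkpoint pre-trained on clinical notes, and the label set follows the 2010 i2b2 scheme.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "bert-base-uncased"  # stand-in for a clinical checkpoint
labels = ["O", "B-problem", "I-problem", "B-treatment", "I-treatment", "B-test", "I-test"]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(labels))

enc = tokenizer("Plavix was not recommended, given her recent GI bleeding.",
                return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits        # (1, seq_len, num_labels)
pred = logits.argmax(-1)[0]
print([labels[i] for i in pred])        # random until the head is fine-tuned
```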
Algorithm Comparison on Benchmark Dataset
§ Task: 2010 i2b2 challenge – entity recognition for problem, treatment, and test in discharge summaries

Algorithm                                    Features                                                        F1
CRFs (Jiang et al., 2010) (#2 in challenge)  Bag of words                                                    77.33
                                             Optimized features                                              83.60
Semi-Markov (de Bruijn B, et al., 2010)
(#1 in challenge)                            Optimized features + Brown clustering                           85.23
SSVMs (Tang et al., 2014)                    Optimized features + Brown clustering + random indexing         85.82
CNN (Wu et al., 2015)                        Word embedding                                                  82.77
Bi-LSTM-CRF (Wu et al., 2017)                Word embedding                                                  85.91
BERT (Si et al., 2020)                       Pre-trained language model (BERT), fine-tuned on clinical text  90.25
Additional thoughts on deep learning approaches
§ Parameter optimization
§ Computation resources (e.g., GPUs)
§ Prediction speed
§ CRF-based NER – ~1 second per discharge summary
§ BERT-based NER – ~20 seconds per discharge summary
§ Reliability and explainability
A Review of Deep Learning in Clinical NLP
Wu S et al. Deep learning in clinical natural language processing: a methodical review. JAMIA 2019
NLP Tasks and Applications
NLP Task                        Sub-tasks                                                                                                                    Applications
Word sequence                   POS tagging, language models, named entity recognition, relation extraction / semantic annotation (semantic role labeling, event detection, FrameNet)   Information extraction
Sequence to sequence            Encoders and decoders                                                                                                        Machine translation, summarization
Text classification/clustering  Document classification, sentence classification, sentiment analysis, topic models                                           Email spam, product sentiment
Information retrieval           Query expansion, indexing, relevance ranking                                                                                 Search engines
Dialog systems                  Speech recognition, natural language generation                                                                              Chat bots
Summary about algorithm selection
§ The simplest approach that can achieve good performance is the best
§ Take available resources into consideration
§ Computation resources
§ Both labeled and unlabeled data
§ Expertise in machine learning/deep learning
§ Keep deployment in mind
§ Technical architecture and infrastructure
§ Fitting into your workflow
§ Other requirements such as speed, robustness etc.
Part 3
03 Annotate good data
Availability, Quality, and Sample Size
Data Availability
§ Large unlabeled data is useful, especially for deep learning based approaches
§ High-quality annotated data is the key to machine learning/deep learning based approaches
§ Be aware of the privacy issues of biomedical textual data – de-identification programs that can remove protected health information (e.g., names, addresses, dates) are available
§ De-identification is essentially an NER task – many rule-based, ML-based, and hybrid approaches exist
§ Performance varies (some as high as 95%)
§ Examples: MIST, De-ID…
What about synthetic text?
§ Generating synthetic notes
§ Task – generate HPI section
§ Data – 826 clinical notes
§ Methods – SeqGAN, GPT-2, and CTRL
This is a 39 year-old female with a history of diabetes mellitus , coronary artery
disease , who presents with shortness of breath and cough . She has no relief from
antacids or antiinflammatories . She is admitted now with increasing radiation
damage to her home and extensive medical bills . She denies any pleural chest pain.
Annotation Quality Matters
§ Task: 2014 i2b2 challenge – extracting 36 risk factors, a document classification task
§ Dataset: 790 training and 514 test notes with document labels and evidence spans highlighted
§ The top-ranked system:
§ Traditional SVM classifiers
§ Re-annotated the corpus to:
§ Fix inconsistent boundaries
§ Identify negative mentions
Roberts, K., Shooshan, S.E., Rodriguez, L., Abhyankar, S., Kilicoglu, H. and Demner-Fushman, D., 2015. The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs. Journal of Biomedical Informatics, 58, pp.S111-S119.
Requirements for annotation
§ Annotation guideline
§ Clear definitions of entities and relations (an information model)
§ Appropriate granularity to benefit your application
§ Consistent and robust representation of information
§ High-quality annotation (e.g., consistent)
§ Annotator knowledge
§ Sufficient training
§ Adequate sample size
Annotation Workflow
Data collection → pre-annotation → guideline development → training & annotation → quality control → model training → ML models
Guideline development – content
§ Goals of the annotation
§ Definitions of entities, relations, etc.
§ Information model
§ Granularity
§ Detailed annotation rules (different scenarios)
§ Human vs. computer’s thinking
§ Provide many examples
§ Positive examples
§ Negative examples
Guideline development - workflow
§ Iterative process
§ Involvement of both domain experts and linguists/informaticians
Guideline development – example
Annotator selection and training
§ Annotator selection
§ Background: domain experts/linguists/lay persons
§ Sources: physicians/nurses, residents, students, or crowdsourcing (e.g., Amazon Mechanical Turk)
§ Annotator training
§ Iterative training until an expected performance is achieved
§ Quality checking during the annotation
§ Multi-annotator management
§ Train each annotator to ensure consistent annotations before starting
§ If resources allow, each sample can be double-annotated by two people, with a third, more experienced annotator adjudicating discrepancies
§ Otherwise, assign a small portion of the data to both annotators so that inter-annotator agreement (IAA) can be calculated
Annotation tools
§ BRAT
§ MAE
§ eHOST
§ ….
§ Prodigy
§ LightTag
§ CLAMP
Quality checking
§ Inter-annotator agreement
§ Precision/Recall/F-measure
§ Cohen's kappa, Fleiss' kappa (https://en.wikipedia.org/wiki/Cohen%27s_kappa)
§ Self-train and self-test
§ Build a model on one dataset and predict the same dataset again
§ Performance should be high; otherwise it indicates annotation inconsistencies
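Cohen's kappa can be computed directly with scikit-learn; the toy labels below stand in for two annotators' token-level output.

```python
from sklearn.metrics import cohen_kappa_score

# Toy token-level BIO labels from two annotators over the same sentence.
annotator_a = ["O", "B", "I", "O", "O", "B", "O", "O"]
annotator_b = ["O", "B", "I", "O", "B", "B", "O", "O"]

print(cohen_kappa_score(annotator_a, annotator_b))  # 1.0 = perfect agreement
```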
Sample Size
§ How many samples are needed for the required performance of the specific task? – No definite answer…
§ Many studies reported results on several hundred documents
§ Sample size could be estimated based on a power calculation
§ More precisely, we can plot a learning curve
[Figure: learning curve – F-measure (0.25–0.85) vs. number of sentences in the training set (8–8,192), uncertainty sampling]
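A learning curve can be plotted with scikit-learn; synthetic data stands in for an annotated corpus in this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for an annotated corpus.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.05, 1.0, 8), cv=5, scoring="f1")

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:5d} samples -> F1 {score:.3f}")  # watch where the curve flattens
```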
Challenges and potential solutions
§ Annotation cost/time
§ Requires a reasonably sized annotated corpus
§ Annotation by experts (e.g., physicians) is expensive
§ Technologies to save annotation time
§ Weak supervision: get low-quality labels more efficiently
§ Transfer learning: leverage labeled data/models from a different domain/task
§ Active learning: label informative samples to build better models
Summary
§ Data and annotation play important roles in machine learning/deep
learning based NLP systems
§ A good annotated corpus that leads to high performance ML models should
include:
§ Annotation guideline designed for the task
§ Knowledgeable and well-trained annotators
§ Enough annotated samples
§ Annotation could be costly and time-consuming
Part 4
04 Bring humans into the loop
Human annotation, rule augmentation, biomedical knowledge bases
Rule Augmentation is Effective
Task: 2018 n2c2 Drug-ADE challenge (each model with and without rule-based post-processing)
Relation           SVM     +post   CNN-RNN  +post   biLSTM-CRF  +post
Strength → Drug    0.9704  0.9792  0.9760   0.9853  0.9865      0.9916
Dosage → Drug      0.9637  0.9798  0.9642   0.9818  0.9720      0.9860
Duration → Drug    0.84    0.8947  0.8519   0.9125  0.8829      0.9292
Frequency → Drug   0.9525  0.9735  0.9592   0.9810  0.9692      0.9873
Form → Drug        0.9728  0.9867  0.9713   0.9864  0.9765      0.9890
Route → Drug       0.9581  0.9742  0.9668   0.9805  0.9736      0.9858
Reason → Drug      0.7328  0.8364  0.7464   0.8466  0.7579      0.8488
ADE → Drug         0.7604  0.8221  0.7528   0.8112  0.7946      0.8502
Overall            0.9256  0.9521  0.9304   0.9574  0.9399      0.9630
Wei, Q., et al. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. JAMIA, 2020 27(1), pp.13-21.
Active Learning to Reduce Annotation Cost
§ Goal: minimize annotation cost while maximizing the quality of the ML-based model
Active learning: select the most informative samples from the pool of unlabeled data for the human annotator; the machine learner retrains on the growing labeled set.
Passive learning: select samples randomly.
Querying Algorithms
§ Uncertainty-based querying
§ Clustering and uncertainty sampling engine (CAUSE): query the most uncertain and representative sentences
Toy example – inputs: clusters c1, c2, c3 with Score(c1)=0.6, Score(c2)=0.4, Score(c3)=0.1; number of queries = 2. Steps: (1) cluster scoring, (2) representative sampling. Output: samples a and c.
An active learning-enabled annotation system for clinical named entity recognition. Chen Y et al. BMC Med Inform & Decis Mak. 2017
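A minimal pool-based uncertainty-sampling loop (least confidence); CAUSE additionally clusters the pool for representativeness, which this sketch omits.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic pool standing in for unlabeled clinical sentences.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
labeled = list(range(10))                       # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for _ in range(5):                              # 5 querying rounds
    model.fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])
    uncertainty = 1.0 - proba.max(axis=1)       # least confidence
    query = [pool[i] for i in np.argsort(-uncertainty)[:20]]
    labeled += query                            # simulated oracle: reuse gold labels
    pool = [i for i in pool if i not in query]
print(model.score(X, y))
```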
Simulation Study
[Figure: learning curves – F-measure vs. number of sentences in the training set, comparing uncertainty, diversity, length, and random querying]
To achieve a model with 0.80 F-measure:
Annotation cost   Random sampling   Uncertainty sampling   Reduction
Sentences         8,702             2,971                  66%
Online Learning Interface
Active Learning Workflow
Start: load the data pool into the learning activator; rank the unlabeled sentences with the querying algorithm.
Annotation loop: the interface selects the top-ranked unlabeled sentence → the user annotates it → the sentence is encoded and added to the labeled set → the CRF model is retrained → the remaining unlabeled sentences are re-ranked.
End: when the user quits or time runs out.
Real-Time User Study on Active vs. Passive Learning
A linear regression model was used to estimate annotation time from the basic information, semantic complexity, and syntactic complexity of the sentence:
Cost(s) = b0 + Σ_i b_i · x_i(s)
Categories  Features
Basic       Number of words (NOW), number of entities (NOE), number of entity words (NOEW)
Syntactic   Entropy of POS tags (EOP)
Semantic    TFIDF
Example sentence: "MRI by report showed bilateral rotator cuff repairs and he was admitted for repair of the left rotator cuff."
Feature  NOW  NOE  NOEW  TFIDF  EOP
Value    20   3    11    35.36  2.28
Cost-aware Active Learning: rank by utility per cost, UPC(s) = Utility(s)/Cost(s)
A Larger User Study: 8 out of 9 users showed better performance with active learning than with random sampling; AL saved 20–30% of annotation time.
Wei Q. et al. Cost-aware Active Learning for Named Entity Recognition in Clinical Text. JAMIA 2019
Human Annotation Process is Complicated
§ Annotation speed vs. quality
[Figure: per-user annotation statistics – speed (words/minute) and annotation quality (F1 score) for 9 users]
Wei, Q., et al. AMIA 2018
§ Syntactic structure impact (regression coefficients per user)
      user1    user2   user3    user4   user5    user6    user7   user8   user9
DD    -0.038*  -0.036  -0.002   -0.036  -0.088*  -0.107*  -0.273  0.001   -0.13*
EOP   0.338*   0.245   1.046*   -0.459  0.695*   -1.066*  -0.459  -0.349  0.95*
NOP   0.243    0.705   -0.372   -0.609  0.431*   0.583*   0.478   -0.147  -0.297
ISC   -0.083   -0.397  -0.38    -0.196  0.421*   -0.737*  -0.452  -0.49*  -0.261
NOV   0.402*   0.634*  -0.35*   0.201   -0.22    0.139    1.175*  0.514*  1.17*
DOP   -0.234   -0.903  0.58*    0.592   -0.756*  1.213*   -0.885  0.716*  1.25*
DD – dependency distance; EOP – entropy of POS tags; NOP – number of phrase nodes; …….
Mapping to Standard Clinical Terminologies is Important
§ Encoding (Entity Linking) – find the corresponding concept ID in a
terminology for a given term/entity
§ Example:
§ Entity: “right below - knee amputation”
§ Candidates:
• 1: C2202463 amput below knee leg right
• 2: C0002692 amput below knee
• 3: C0002692 amput below bka knee
• …
§ Challenges
§ Lexical variation
§ Polysemy
§ Granularity differences
Entity Linking Framework – Map to UMLS
Pipeline: NE term → query expansion (LVG/abbreviations/synonyms/adjective-to-noun…) → query on UMLS concepts (against a UMLS index built by the index builder) → ranking by similarity scores (learning to rank) → post-processing, adjust CUI offset
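The first-stage candidate ranking can be sketched with the rank_bm25 package, scoring the UMLS candidate strings from the example above against the mention; a learning-to-rank model would then re-rank these candidates.

```python
from rank_bm25 import BM25Okapi

# Candidate CUIs and normalized names from the example above.
candidates = {
    "C2202463": "amput below knee leg right",
    "C0002692": "amput below knee",
}
corpus = [name.split() for name in candidates.values()]
bm25 = BM25Okapi(corpus)

mention = "right below knee amput".split()   # normalized mention tokens
scores = bm25.get_scores(mention)
for (cui, name), s in sorted(zip(candidates.items(), scores), key=lambda t: -t[1]):
    print(f"{cui}  {s:.3f}  {name}")
```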
Encoding Algorithms and Performance on Benchmark Data
Task                         Dataset                         Method                                                             Accuracy
SNOMED-CT, clinical text     2013 ShARe/CLEF, 2014 SemEval   BM25 + domain knowledge + RankSVM (#1 in challenge) (Zhang, 2014)  0.873
                                                             BM25 + domain knowledge + CNN (Tang, 2017)                         0.903
                                                             BM25 + BERT (Ji, 2019)                                             0.911
MedDRA, drug labels          2018 TAC ADR                    BM25 + translational model + RankSVM (#1 in challenge) (Xu, 2018)  0.911
                                                             BM25 + BERT (Ji, 2019)                                             0.932
MeSH, biomedical literature  NCBI                            BM25 + domain knowledge + CNN (Tang, 2017)                         0.861
                                                             BM25 + BERT (Ji, 2019)                                             0.891
Summary
§ Rules are still important when optimizing performance of biomedical NLP
systems – hybrid approaches often achieve best performance
§ Keeping humans in the loop with the data and algorithms is one way to improve model
performance while reducing annotation cost
§ Biomedical ontologies and other knowledge bases are valuable for many
NLP applications
Integrate All to Better Support End-Users
The CLAMP system
CLAMP - Clinical Language Annotation, Modeling, and Processing
Track record in clinical NLP challenges:
NLP Task                  Challenge                                                   Ranking
Named entity recognition  2009 i2b2 medication information extraction                 #2
                          2010 i2b2 problem, treatment, test extraction               #2
                          2013 ShARe/CLEF abbreviation recognition                    #1
                          2016 CEGS N-GRID de-identification                          #2
UMLS encoding             2014 SemEval disorder encoding                              #1
Relation extraction       2012 i2b2 temporal information extraction                   #1
                          2015 SemEval disease-modifier extraction                    #1
                          2015 BioCreative chemical-induced disease from literature   #1
                          2016 SemEval temporal information extraction                #1
                          2017 TAC ADR extraction from drug labels                    #1
                          2018 n2c2 medication and associated ADR extraction          #1
CLAMP Algorithms – the DeepMed Framework: CRFs, SSVMs, Bi-LSTM-CRF, BERT/BioBERT, ….., AutoML, Docker containers
CLAMP Data Annotation
CLAMP Rule Interface for Human
Available as:
• CLAMP-CMD
• CLAMP-GUI
• CLAMP-EE
CLAMP Users
When developing biomedical NLP applications, please
§ Identify the right NLP tasks for your projects
§ Assemble the development team with the required expertise (domain experts, business owners, informaticians, developers ….)
§ Collect and annotate data following a standard protocol (guideline development, annotator training/agreement check, annotation quality control …)
§ Select appropriate algorithms (accuracy, speed, implementation …) and carefully evaluate their performance/usability/interoperability
§ Keep humans (multidisciplinary) in the loop during the life cycle of the development
Transfer Learning of NLP in Medicine:
A Case Study with BERT
Yifan Peng
Department of Population Health Sciences
Transfer learning
• A technique that reuses a model already trained on one dataset and adapts it to a different dataset
• In the field of computer vision, researchers have repeatedly shown the value of transfer learning
• Two steps:
  • Pre-training: use a large training set to learn network parameters and save them for later use
  • Fine-tuning: train all (or part of) the layers of the pretrained network on the target dataset
Example: Detect lung diseases from chest X-rays
• Step 1: Pre-train the CNN on ImageNet (14 million images)
• Step 2: Fine-tune the model on NIH Chest X-ray (100,000 chest X-ray images)
Wang et al., ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. CVPR. 2017.
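In code, the two steps look roughly like this with torchvision; the number of classes and the freezing policy are assumptions, not the paper's exact setup.

```python
import torch.nn as nn
from torchvision import models

num_classes = 14                                          # assumed label count
model = models.resnet50(pretrained=True)                  # Step 1: ImageNet weights
model.fc = nn.Linear(model.fc.in_features, num_classes)   # Step 2: new task head

for p in model.parameters():        # optionally freeze the backbone first
    p.requires_grad = False
for p in model.fc.parameters():     # train only the new head initially
    p.requires_grad = True
```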
Transfer learning
• Makes it less difficult to train a complex network
• Speeds up the convergence of training
• How can transfer learning benefit NLP in medicine?
Outlines
• Word embedding
• ELMo
• BERT
• How to use pre-trained BERT
• Performance comparison of BERT in medicine
• Multi-task learning
Guiding questions: How did BERT's ideas gradually form? What is innovative? Why does it work so well?
How do we represent the meaning of a word?
• There are an estimated 13 million words in the English language
• They are not completely unrelated: hotel vs. motel vs. dog
• We want to encode each word into some representation that the machine can understand
Word vectors
• Encode similarity in the vectors themselves
• Some N-dimensional space (e.g., 200D) that is sufficient to encode all the semantics of our language
• Each dimension would encode some meaning that we transfer using speech
  • tense (past vs. present vs. future)
  • count (singular vs. plural)
Word Embeddings
• Word2vec, fastText, etc.
• BioWordVec: https://github.com/ncbi-nlp/BioWordVec
Sources                   Documents  Tokens
PubMed                    30M        4,000M
MIMIC-III clinical notes  2M         500M
Zhang et al., Improving biomedical word embeddings with subword information and MeSH. Scientific Data. 2019
Interesting semantic patterns emerge in the vectors
Word pair                       word2vec  BioWordVec
thalassemia / hemoglobinopathy  —         0.834
mycosis / histoplasmosis        0.353     0.706
thirsty / hunger                0.252     0.629
influenza / pneumoniae          0.482     0.611
atherosclerosis / angina        0.503     0.589
Zhang et al., Improving biomedical word embeddings with subword information and MeSH. Scientific Data. 2019
Interesting syntactic patterns emerge in the vectors
Rohde et al. An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence. 2005
Linear translations
• Algebraic relations:
  • vec("man") − vec("woman") + vec("aunt") ≈ vec("uncle")
  • vec("king") − vec("man") + vec("woman") ≈ vec("queen")
Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig, "Linguistic Regularities in Continuous Space Word Representations", NAACL-HLT 2013
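These relations can be tested with gensim; 'vectors.bin' is a placeholder path for any pretrained word2vec-format file (e.g., BioWordVec).

```python
from gensim.models import KeyedVectors

# Load pretrained vectors (placeholder path).
wv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# king - man + woman: "queen" (or a close neighbor) should rank first.
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```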
How to use Word Embeddings
[Figures: Convolutional Neural Network and Recurrent Neural Network architectures that take word embeddings as input]
Peng et al. Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database. 2018.
Word Embeddings in DL
• Evaluation of word embeddings in the protein-protein interaction (PPI) extraction task
Data Set  word2vec  BioWordVec
AIMed     0.445     0.487
BioInfer  0.524     0.549
BioInfer  0.603     0.623
IEPA      0.484     0.511
HPRD50    0.679     0.713
Limitations of Word Embeddings
The polysemy problem:
• "I arrived at the bank after crossing the river"
• "The bank has plans to branch through the country…"
Static word embeddings cannot resolve polysemous words: both senses of "bank" receive the same vector.
From Word Embeddings to ELMo
• ELMo: "Embedding from Language Models"
• Adjusts the word embedding representation of a word according to the semantics of the context words
Peters et al., Deep contextualized word representations. NAACL. 2018
ELMo
• A typical two-stage process
  • Stage 1: use a language model for pre-training
    [Figure: bidirectional language model over "no evidence of infiltrate", with left and right context around the target word]
  • Stage 2: extract the embeddings of each layer
• ELMo word representations are functions of the entire input sentence
ELMo in medical NLP
• Evaluation of ELMo in named entity recognition and relation extraction tasks
Task                      Dataset     SOTA  ELMo
Named entity recognition  ShARe/CLEF  70.0  75.6
Relation extraction       DDI         72.9  78.9
Relation extraction       ChemProt    64.1  66.6
Limitations of ELMo
• Hard to capture long-distance information
• Computationally expensive
From ELMo to BERT
• BERT: Bidirectional Encoder Representations from Transformers
Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL. 2019
Transformer
Why transformer?
• A self-attention mechanism that directly models relationships between all words in a sentence
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
Why transformer?
• Computation is done in parallel across positions: much faster and more space-efficient than recurrent models
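The core operation is scaled dot-product self-attention, a single parallel matrix computation; a toy NumPy version:

```python
import numpy as np

def self_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # (seq, seq) pairwise scores
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)         # softmax over each row
    return weights @ V                                # weighted mix of value vectors

x = np.random.randn(5, 8)                 # 5 tokens, 8-dim embeddings (Q = K = V here)
print(self_attention(x, x, x).shape)      # (5, 8)
```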
BERT and BlueBERT
• Pre-training corpora:
  Corpus            Words   Domain
  PubMed abstracts  4,000M  Biomedical
  MIMIC-III         500M    Clinical
• Fine-tuning tasks: sentence similarity, named entity recognition, relation extraction, etc.
Outlines
• Word embedding
• ELMo
• BERT
• How to use pre-trained BERT
• Performance of BERT in medicine (BLUE Benchmark)
• Multi-task learning
Guiding questions: How did BERT's ideas gradually form? What is innovative? Why does it work so well?
How to use BERT – Sentence classification
• Assign tags or categories to text according to its content
• For example: organizing millions of cancer-related references from PubMed into the Hallmarks of Cancer
How to use BERT – Relation extraction
• Extract semantic relationships from text
How to use BERT – Sentence similarity
• Predict similarity scores for sentence pairs. For example:
  • "The above was discussed with the patient, and she voiced understanding of the content and plan."
  • "The patient verbalized understanding of the information and was satisfied with the plan of care."
How to use BERT – Named entity recognition
• Locate and classify named entity mentions in text into pre-defined categories
Available at https://github.com/ncbi-nlp/bluebert:
• Pre-trained models
• Fine-tuning code
• Preprocessed PubMed texts
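A sketch of loading a BlueBERT checkpoint with Hugging Face transformers; the model id below is an assumption — check the bluebert repository for the official names.

```python
from transformers import AutoTokenizer, AutoModel

# Assumed hub id for the PubMed+MIMIC-III BlueBERT base model.
model_id = "bionlp/bluebert_pubmed_mimic_uncased_L-12_H-768_A-12"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

enc = tokenizer("The patient denies chest pain.", return_tensors="pt")
out = model(**enc)
print(out.last_hidden_state.shape)   # contextual embeddings for each token
```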
Outlines
• Word embedding
• ELMo
• BERT
• How to use pre-trained BERT
• Performance of BERT in medicine (BLUE Benchmark)
• Multi-task learning
Guiding questions: How did BERT's ideas gradually form? What is innovative? Why does it work so well?
BLUE Benchmark
• Significant advances in pretrained language representations in the general domain: ELMo, BERT, Transformer-XL, XLNet
• The General Language Understanding Evaluation (GLUE) benchmark exists for the general domain, but no publicly available benchmark existed for biomedicine
Biomedical Language Understanding Evaluation (BLUE) benchmark:
• Contains a diverse range of text genres (biomedical literature and clinical notes)
• Highlights common biomedical text-mining challenges
• Promotes development of language representations for the biomedical domain
BLUE benchmark
[Table: BLUE tasks and datasets; some datasets are not publicly available, but permissions can be requested]
Results
[Figure: BLUE results by task type – sentence similarity, named entity recognition, relation extraction, document classification, and inference]
https://github.com/ncbi-nlp/BLUE
Outlines
• Word embedding
• ELMo
• BERT
• How to use pre-trained BERT
• Performance of BERT in medicine (BLUE Benchmark)
• Multi-task learning
Guiding questions: How did BERT's ideas gradually form? What is innovative? Why does it work so well?
Multi-task learning
• Multi-task learning (MTL) is a field of machine learning where multiple tasks are learned in parallel using a shared representation
• Increases the effective sample size for training the model, which can improve performance by increasing the generalization of the model
• Particularly helpful in applications such as medical informatics, where (labeled) datasets are hard to collect
• Also helpful when researchers face the hassle of choosing a suitable model for new problems with limited training resources
Multi-task model
mt-dnn (https://github.com/namisan/mt-dnn)
Training procedure
• Pretraining
  • BlueBERT: pretrained on PubMed and MIMIC-III
  • BioBERT: pretrained on PubMed
• Refining via multi-task learning: refine all layers in the model
• Fine-tuning MT-BERT: continue training all layers on each specific task
Test results
[Tables: test results on clinical tasks, biomedical tasks, and all eight BLUE tasks]
• Fine-tuning BERT (4 models)
• Refining via multi-task learning (1 model): refine all layers in the model
• Fine-tuning MT-BERT (4 models): continue training all layers on each specific task
u Word embeddings à ELMo à BERT
u Pre-trained BERT models
u How to use BERT
u Performance comparison and benchmark
u Multi-task learning
Summary
Resources
• https://github.com/ncbi-nlp/BioWordVec
• https://github.com/ncbi-nlp/BioSentVec
• https://github.com/ncbi-nlp/bluebert
• https://github.com/ncbi-nlp/BLUE
Acknowledgment
• BERT, ELMo, and mt-dnn
• Shared tasks and datasets: BIOSSES, MedSTS, BioCreative V chemical-disease relation task, ShARe/CLEF eHealth task, DDI extraction 2013 task, BioCreative VI ChemProt, i2b2 2010 shared task, Hallmarks of Cancer corpus
• This work was supported by the Intramural Research Programs of the National Library of Medicine, National Institutes of Health, and K99LM013001.
Thank you!
yip4002@med.cornell.edu
AIME 2020
Digital Phenotyping for Cohort
Discovery using Electronic Health
Records
Yanshan Wang
Assistant Professor of Biomedical Informatics
Division of Digital Sciences Research
Mayo Clinic
Why take this tutorial?
• Patient cohort retrieval is still labor-intensive today.
• Most information is embedded in unstructured EHRs.
• Natural language processing is under-utilized for cohort retrieval.
Goal of this tutorial
• To gain an understanding of basic concepts of cohort retrieval in the clinical domain.
• To connect NLP theory with clinical knowledge.
• To get an introduction to clinical use cases of cohort retrieval.
Suggested reading
• Books
Suggested reading
• Papers
• A review of approaches to identifying patient phenotype
cohorts using electronic health records. Shivade et al. 2013.
• Case-based reasoning using electronic health records
efficiently identifies eligible patients for clinical trials. Miotto
et al. 2015.
• A survey of practices for the use of electronic health
records to support research recruitment. Obeid et al. 2017.
• Clinical information extraction applications: a literature
review. Wang et al. 2018
• Using clinical natural language processing for health
outcomes research: Overview and actionable suggestions
for future advances. Velupillai et al. 2018.
Agenda
• Basic Concepts
• EHR, Phenotyping, Evidence-based Clinical
Research, Knowledge Base, Common Data
Model
• Patient Cohort Discovery
• Brief Introduction to NLP
• NLP for Cohort Discovery
Basic Concepts
• Electronic Health Record
• Phenotyping
• Evidence-based clinical research
• Knowledge bases
• Common Data Model
Basic Concepts
• Electronic Health Record
Basic Concepts
• Phenotyping
• The phenotype (as opposed to genotype, which is the set of
genes in our DNA responsible for a particular trait) is the
physical expression, or characteristics, of that trait.
• Phenotyping is the practice of developing algorithms
designed to identify specific phenomic traits within an
individual1.
• Digital phenotyping using EHRs
• Traditionally, clinical studies often use self-report
questionnaires or clinical staff to obtain phenotypes from
patients (slow, expensive, and does not scale).
• EHR data come in both structured and unstructured
formats, and the use of both types of information can be
essential for creating accurate phenotypes2.
1. eMERGE network.
2. Wei, W. Q., & Denny, J. C. (2015). Extracting research-quality phenotypes from electronic health
records to support precision medicine. Genome medicine, 7(1), 41.
[Figure: extracting research-quality phenotypes from both structured and unstructured (NLP) EHR data]
Source: Wei, W. Q., & Denny, J. C. (2015). Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome medicine, 7(1), 41.
Evidence-based clinical research
• Observational studies
• Types of studies in epidemiology, such as the cohort study
and the case-control study.
• The investigators retrospectively assess associations
between the treatments given to participants and their
health status.
• Randomized control trials
• Clinical trials are prospective biomedical or behavioral
research studies on human participants that are designed
to answer specific questions about biomedical or behavioral
interventions including new treatments, such as novel
vaccines, drugs, and medical devices.
Basic Concepts
• Cohort/Eligibility Criteria
• Inclusion criteria
• Exclusion criteria
Example: https://clinicaltrials.gov/ct2/show/NCT03690193?cond=alzheimer%27s+disease&rank=5 (clinicaltrials.gov)
Basic Concepts
• Knowledge Bases
  • UMLS (Unified Medical Language System), including the Metathesaurus, Semantic Network, and Specialist Lexicon: used as a knowledge base and as a resource for a lexicon. The Metathesaurus provides the medical concept identifiers; the Semantic Network specifies the semantic categories for the medical concepts.
  • SNOMED-CT: standardized vocabulary of clinical terminology.
  • LOINC: standardized vocabulary for identifying health measurements, observations, and documents.
  • MeSH: NLM controlled vocabulary thesaurus used for indexing PubMed articles.
  • MedDRA: terminology specific to adverse events.
  • RxNorm: terminology specific to medications.
Basic Concepts
• Common Data Model
• Common Data Model (CDM) is a specification that
describes how data from multiple sources (e.g., multiple
EHR systems) can be combined. Many CDMs use a
relational database.
• Observational Medical Outcomes Partnership (OMOP)
CDM by Observational Health Data Sciences and
Informatics (OHDSI)
OMOP CDM v. 5.0
Source: https://www.ohdsi.org/data-standardization/the-common-data-model/
Why Natural Language Processing (NLP)?
Facts
• Artificial Intelligence (AI) is one of the most interesting fields of research today.
• The growth of and interest in AI is due to the recent advances in deep learning.
• Language is the most compelling manifestation of intelligence.
Natural Language Processing
• What is NLP?
  • "Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data." (Wikipedia)
Question Answering
[Examples: IBM Watson, voice assistants]
Source: https://www.youtube.com/watch?v=BkpAro4zIwU
Information Extraction
Example: "The patient's maternal grandmother was diagnosed with breast cancer at age 59 and passed away at age 80."
Entity normalization: "The patient's FAMILY_MEMBER was diagnosed with CONDITION at age AGE and LIVING_STATUS at age AGE."
A dependency parser then yields the structured output:
  Family Member: maternal grandmother
  Condition: breast cancer
  Age: 59
  Living Status: deceased
  Age: 80
Sentiment Analysis
Reviews:
■ "nice and compact to carry!"
■ "since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either!"
■ "the camera feels flimsy, is plastic and very light in weight; you have to be very delicate in the handling of this camera"
Attributes: zoom, affordability, size and weight, flash, ease of use
Per-review sentiment toward "size and weight": ✓ ✓ ✗
Source: https://web.stanford.edu/~jurafsky/NLPCourseraSlides.html
Information Retrieval
[Example: search engine result screenshots]
How to represent Natural Language
• Natural language text = sequences of discrete symbols (e.g., words).
• Vector representations of words: Vector Space Model
• Bag-of-words with a vocabulary list (I, love, NLP, and, like, dogs); each word of "I love NLP and like dogs" maps to a one-hot vector:
  I    = [1 0 0 0 0 0]
  love = [0 1 0 0 0 0]
  NLP  = [0 0 1 0 0 0]
  like = [0 0 0 0 1 0]
How to represent Natural Language
• Drawbacks of this sparse representation:
  • love = [0,1,0,0,0,0] AND like = [0,0,0,0,1,0] = 0!
  • Using such a representation, there is no meaningful (semantic) comparison we can make between words.
How to represent Natural Language
• With AI models, we learn the "meaning" of a word using dense semantic representations / word embeddings
  • Learning semantic representations from data (a corpus).
  • Simply by examining a large corpus, it is possible to learn word vectors that capture the semantics and relationships between words in a surprisingly expressive way.
  I    = [0.99 0.05 0.1 0.87 0.1 0.1]
  love = [0.1 0.85 0.99 0.1 0.83 0.09]
  NLP  = [0.67 0.23 0.01 0.02 0.01 0.81]
  like = [0.1 0.73 0.99 0.05 1.79 0.09]
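The contrast between the two representations is easy to verify numerically (toy numbers from the slides above):

```python
import numpy as np

# One-hot vectors: no similarity signal at all.
love_onehot = np.array([0, 1, 0, 0, 0, 0])
like_onehot = np.array([0, 0, 0, 0, 1, 0])
print(love_onehot @ like_onehot)        # 0: "love" and "like" look unrelated

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Dense embeddings: similar words get similar vectors.
love = np.array([0.1, 0.85, 0.99, 0.1, 0.83, 0.09])
like = np.array([0.1, 0.73, 0.99, 0.05, 1.79, 0.09])
print(round(cosine(love, like), 3))     # high cosine similarity
```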
NLP in AI is All About Learning a Better Representation of Language
• Images are easily represented by RGB values
Source: https://analyticsindiamag.com/computer-vision-primer-how-ai-sees-an-image/
• Language is much harder…
• “The weather’s looking gloomy today. I better wear
my trusty rubber boots!”
• “The weather’s looking gloomy today. I’m going to
stay inside.”1
• Will, will Will will Will Will's will? – Will (a person),
will (future tense helping verb) Will (a second
person) will (bequeath) [to] Will (a third person)
Will's (the second person) will (a document)?
(Someone asked Will 1 directly if Will 2 plans to
bequeath his own will, the document, to Will 3)2
Source: 1. Carlson L. Moral and Linguistic Perspectives on Pain and Suffering in Doctor-Patient Discourse. UMN Thesis.
2. Han, Bianca-Oana (2015). "On Language Peculiarities: when language evolves that much that speakers find it strange" (PDF). Philologia
(18): 140. ISSN 1582-9960. Archived (PDF) from the original on 14 October 2015.
Learning Better Representations ("Representation Learning")
Language → Features → Better Word Representation
Source: https://www.datasciencecentral.com/profiles/blogs/overview-of-artificial-intelligence-and-role-of-natural-language
NLP for Patient Cohort Discovery
Clinical Research Pathway
Research Question → Protocol Design → Feasibility → Identify Patients → Invite Patients → Pre-screening → Consent → Analysis → Report
Clinical Trials Eligibility Screening and Recruitment
• Clinical trials recruitment
  • Randomized clinical trials are fundamental to the advancement of medicine. However, patient recruitment for clinical trials remains the biggest barrier to clinical and translational research.
  • 20% of cancer patients are eligible¹, yet <5% of cancer patients participate¹, and 85% of clinical trials fail to retain enough patients².
1. Haddad TC, Helgeson J, Pomerleau K, Makey M, Lombardo P, Coverdill S, Urman A, Rammage M, Goetz MP, LaRusso N. Impact of a cognitive computing clinical trial matching system in an ambulatory oncology practice. American Society of Clinical Oncology; 2018.
2. Cote DN. Minimizing Trial Costs by Accelerating and Improving Enrollment and Retention. Global Clinical Trials for Alzheimer's Disease: Elsevier; 2014. p. 197-215.
[Figure: number of clinical trials between 2007 and 2010 at leading academic medical centers, ranging from 268 (Mayo Clinic) down to 68 (Case Western Reserve University)]
Source: Chen et al. Publication and reporting of clinical trial results: cross sectional analysis across academic medical centers. BMJ. 2016
NLP for Clinical Trials Eligibility Screening
EHR (all patients) + clinical trial criteria → Natural Language Processing → eligible patients → recruit
Goal: expedite patient screening and increase patient recruitment rates.
A Real-World Project
Example: Clinical trials eligibility
screening for GERD
Identify a cohort of patients with and without chronic reflux using the definitions spelled out below. We wish to
test people with and without chronic reflux as our working hypothesis is that the prevalence of Barrett's
esophagus is comparable between those with and without chronic reflux.
Inclusion criteria :
1. Age greater than 50 years.
2. Gastroesophageal reflux disease. This can be defined using ICD-9 or ICD-10 codes. Additional criteria which
could be used to define GERD broadly are chronic (> 3 mo) use of a proton pump inhibitor (drug names include
omeprazole, esomeprazole, pantoprazole, rabeprazole, dexlansoprazole, lansoprazole) or a H2 receptor blocker
(ranitidine, famotidine, cimetidine). Prior endoscopic diagnosis of erosive esophagitis can also be used to make
a diagnosis of GERD.
3. Male gender
4. Obesity defined as body mass index greater than equal to 30. This is a surrogate marker for central obesity.
5. Current or previous history of smoking
6. Family history of esophageal adenocarcinoma/cancer or Barrett's esophagus
Exclusion criteria
1. Previous history of esophageal adenocarcinoma/cancer or Barrett's esophagus, previous history of
endoscopic ablation for Barrett's esophagus.
2. Previous history of esophageal squamous cancer or squamous dysplasia.
3. Treatment with oral anticoagulation including warfarin/Coumadin.
4. History of cirrhosis or esophageal varices
5. History of Barrett’s esophagus : this can be defined with ICD 9/10 codes.
6. History of endoscopy (will need to use a procedure code for EGD) in the last 5 years.
Criteria                                                        ICD-9   ICD-10                       CPT-4                 Medication
Inclusion
1. Age greater than 50 years.
2. Gastroesophageal reflux disease (any of 2.1, 2.2, 2.3)
  2.1 GERD defined by Dx                                        530.81  K21.9
  2.2 GERD defined by drug, duration of use >= 3 months
      over the last 5 years                                                                                                omeprazole, esomeprazole, pantoprazole, rabeprazole, dexlansoprazole, lansoprazole, ranitidine, famotidine, cimetidine
  2.3 GERD defined by prior endoscopic diagnosis of
      erosive esophagitis                                       530.19  K21.0                        (no specific code for esophagitis)
3. Male gender
4. Obesity defined as body mass index >= 30.
5. Current or previous history of smoking
6. Family history of esophageal adenocarcinoma/cancer or Barrett's esophagus
7. Caucasian
Exclusion
1. Previous history of esophageal adenocarcinoma/cancer         150.9   C15.9
2. Previous history of endoscopic ablation for Barrett's
   esophagus                                                                                         43229, 43270, 43228, 43258
3. Previous history of esophageal squamous carcinoma
   (included in 1)                                              150.9   C15.9
4. Previous history of esophageal squamous dysplasia            622.10  N87.9
5. Current treatment with oral anticoagulation – warfarin                                                                  warfarin
6. Current treatment with oral anticoagulation – Coumadin
   (included in 5)                                                                                                         Coumadin
7. History of cirrhosis                                         571.5   K74.60
8. History of esophageal varices                                456.20  I85.00
9. History of Barrett's esophagus                               530.85  K22.7, K22.710, K22.711, K22.719
10. History of endoscopy in the last 5 years                                                         43235-43270
NLP-based Digital Phenotyping Algorithm
Screening workflow (i2b2 + NLP):
1. Screen patients by inclusion criteria 1, 3, 4, 7 and all exclusion criteria using i2b2 → patient set A (n=31,749)
2. From set A, screen by inclusion criterion 2.1 using i2b2 → patient set B (n=8,667)
3. From set A, screen by inclusion criterion 2.2 using i2b2 → patient set C (n=1,577)
4. From set A, screen by inclusion criteria 2.3, 5, 6 using ACE and NLP → patient set D (n=230)
5. Union of patient sets B, C, and D → patient set E (n=9,080)
Architecture of Current Solutions
Structured data → collating results → postprocessing using unstructured data → user interface (visualization, analytics, reporting, etc.)
An Integrated Framework
Structured data + unstructured data → collating results → user interface (visualization, analytics, reporting, etc.)
Information Retrieval for Cohort
Discovery
• Cohort retrieval is similar to modern search
engines.
CREATE (Cohort Retrieval Enhanced by the Analysis of Text from Electronic health records)
• Structured data flow: structured EHR data (ICD-9/10, CPT, SNOMED CT, ...) is transformed and loaded into a structured index, which structured queries run against.
• Unstructured data flow: clinical texts are processed with NLP into unstructured concepts and indexed; a full-text query (e.g., "Adults with inflammatory bowel disease (ulcerative colitis or Crohn's disease)") is parsed/edited by the end user.
• Both flows draw on the EHR clinical data repository (CDR); the combined queries produce a filtered cohort, and machine learning ranks the relevant cohort for the end user.
Liu et al. CREATE: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records using OMOP Common Data Model. 2019.
Another Way of Thinking of Cohort Retrieval: Patient Representation

Patient representation (illustrative values):
Patient 1: -0.0011 -0.0008 -0.0050 ...
Patient 2: 0.0108 -0.0194 0.0101 ...
Patient 3: -0.0433 0.0361 0.0272 ...
Patient 4: -0.0935 0.0655 ...

[Diagram: AI encodes each patient's EHR into a vector; the clinical trial is embedded in the same space, and a similarity measurement between the trial and patient vectors identifies target eligible patients. A sketch follows below.]
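A minimal sketch of the similarity measurement step, assuming patient and trial embeddings already live in the same vector space (all numbers are illustrative):

```python
import numpy as np

# Hypothetical embeddings: one row per patient, same space as the trial vector.
patients = np.array([
    [-0.0011, -0.0008, -0.0050],
    [ 0.0108, -0.0194,  0.0101],
    [-0.0433,  0.0361,  0.0272],
])
trial = np.array([0.0100, -0.0200, 0.0100])  # embedded eligibility criteria

# Cosine similarity between the trial and every patient.
sims = patients @ trial / (np.linalg.norm(patients, axis=1) * np.linalg.norm(trial))
ranking = np.argsort(-sims)  # most similar (most likely eligible) patients first
print(ranking, sims[ranking])
```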
Unsupervised Machine Learning for Patient Representation using EHRs
Poisson Dirichlet Model (PDM): an unsupervised generative probabilistic machine learning model.
[Figure: plate diagrams comparing Latent Dirichlet Allocation (LDA) and the Poisson Dirichlet Model (PDM).]
Wang et al. Unsupervised Machine Learning for the Discovery of Latent Disease Clusters and Patient Subgroups Using Electronic Health Records. Journal of Biomedical Informatics. 2019.
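PDM itself is not available in common toolkits, but the general topic-model recipe can be illustrated with gensim's standard LDA, which turns each patient's bag of extracted clinical concepts into a topic-proportion vector usable as a patient representation (toy data, substituted method):

```python
from gensim import corpora, models

# Each "document" is one patient's bag of extracted clinical concepts (toy data).
patients = [
    ["osteoporosis", "fracture", "vitamin_d", "calcium"],
    ["copd", "bronchiectasis", "dyspnea", "smoking"],
    ["dementia", "delirium", "confusion", "memory_loss"],
]

dictionary = corpora.Dictionary(patients)
corpus = [dictionary.doc2bow(p) for p in patients]

lda = models.LdaModel(corpus, num_topics=3, id2word=dictionary, random_state=0)

# Topic-proportion vector for patient 0: a dense patient representation
# that can feed the similarity measurement shown earlier.
print(lda.get_document_topics(corpus[0], minimum_probability=0.0))
```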
Unsupervised Machine Learning for Patient Representation using EHRs
[Diagram: AI applied to EHRs supports discovering disease clusters, discovering patient subgroups (e.g., diabetes comorbidities), enhancing disease risk prediction, discovering new underlying disease mechanisms, and ultimately personalized care, diagnosis, treatment, and prevention.]
Disease Clusters
[Figure: latent disease clusters learned by LDA vs. PDM for the Osteoporosis, Delirium/Dementia, and COPD/Bronchiectasis cohorts.]
Patient Subgroups
[Figure: patient subgroups identified within the Osteoporosis, Delirium/Dementia, and COPD/Bronchiectasis cohorts.]
NLP for Cohort Discovery Is All About Learning a Better Representation of the Patient
Research Collaborations
Thank you!
Q&A
Wang.Yanshan@mayo.edu
Advances of Natural Language Processing
in Clinical Research
Rui Zhang, Ph.D.
Associate Professor and McKnight Presidential Fellow
Institute for Health Informatics, Department of Pharmaceutical Care & Health
Systems, and Data Science
University of Minnesota, Twin Cities
August 25, 2020
Outline
• Part 1: NLP for Dietary Supplement Clinical
Research
• Part 2: Information Extraction in EHRs and Clinical
Trials
Clinical Research Informatics (CRI)
• CRI involves the use of informatics in the discovery and
management of new knowledge relating to health and
disease.
• It includes management of information related to clinical
trials and also involves informatics related to secondary
research use of clinical data.
• It involves approaches to collect, process, analyze, and
display health care and biomedical data for research.
Leveraging Big Data for Pharmacovigilance
[Figure: big data analytics pipeline for pharmacovigilance.]
https://knowledgent.com/whitepaper/big-data-enabling-better-pharmacovigilance/
Leveraging NLP in Healthcare Analytics
[Diagram: NLP (extract, classify, summarize) is applied to biomedical literature, clinical notes, and social media. From clinical notes it surfaces adverse events, substance use, family history, and medical history; from the literature it extracts biomedical knowledge as subject-predicate-object triples; from social media it yields pharmacovigilance signals (drug/supplement - adverse events). The outputs serve healthcare providers and clinical researchers.]
Part 1: NLP for Dietary Supplement Clinical Research
1R01AT009457 (PI: Rui Zhang)
• Integrated DS Knowledge Base (iDISK)
• Expanding DS terminology
• Detecting DS safety signals in clinical notes
• Mining biomedical literature to discover DSIs
• Active learning to reduce annotation costs
• Detecting DS safety signals on Twitter
Data sources: online resources, clinical notes, literature, social media
Introduction to Dietary Supplements
• Dietary supplements
Ø Herbs, vitamins, minerals, probiotics, amino acids, others.
• Use of supplements increasing
Ø More than half of U.S. adults take dietary supplements (Centers
for Disease Control and Prevention)
Ø One in six U.S. adults takes a supplement simultaneously with
prescription medications
Ø Sales over $6 billion per year in U.S. (American Botanical
Council, 2014)
https://nccih.nih.gov/health/supplements
Use of complementary and alternative medicine by children in Europe: Published data and expert perspectives. Complement
Ther Med. 2013 4;21.
Kaufman, Kelly, JAMA. 2002;287(3):337-344.
Dietary Supplement Use Among U.S. Adults Has Increased Since NHANES III (1988–1994). CDC, Nov 4, 2014.
Safety of Dietary Supplements
• Doctors are often poorly informed about supplements
Ø 75.5% of 1,157 clinicians
• Supplements are NOT always safe
Ø An average of 23,000 emergency department visits per year for supplement adverse events
Ø Drug-supplement interactions (DSIs)
• Concomitant administration of supplements and drugs
increases risks of DSIs
• Example: Docetaxel & St John’s Wort (hyperforin component
induces docetaxel metabolism via P450 3A4)
Kaufman, Kelly, JAMA. 2002;287(3):337-344.
Geller et al. New England J Med. 2015; 373:1531-40.
Gurley BJ. Molecular nutrition & food research. 2008, 52(7):772-9.
Regulation for Dietary Supplements
• Regulated by Dietary Supplement Health and Education
Act of 1994 (DSHEA)
Ø Different regulatory framework from prescription and over-
the-counter drugs
Ø Safety testing and FDA approval NOT required before
marketing
Ø Postmarketing reporting only required for serious adverse
events (hospitalization, significant disability or death)
Department of Health and Human Services, Food and Drug Administration. New dietary ingredients in dietary supplements —
background for industry. March 3, 2014
Dietary Supplement and Nonprescription Drug Consumer Protection Act. Public Law 109-462, 120 Stat 4500.
Limited Supplements Research
• Supplement safety research is limited
Ø Not required for clinical trials
Ø Not found until new supplement is on the market
Ø Voluntary adverse events reporting underestimates
the safety issues
Ø Pharmacy studies focus only on specific supplements
Ø DSI documentation is limited due to less rigorous
regulatory rules on supplements
Informatics and AI for Supplements Safety Research
• Online resources
Ø Provide DS knowledge across various resources
Ø Informatics methods are needed to standardize and integrate this knowledge
• Electronic health records
Ø EHR provides patient data for supplement use
Ø Detailed supplements usage information documented in
clinical notes
• Biomedical literature
Ø Contains pharmacokinetics and pharmacodynamics knowledge
Ø Discover undefined pathways for DSIs
Ø Find potential DSIs by linking information
Informatics and AI for Supplements Safety Research
• Social media
Ø Contains customers' DS use experiences
Ø Discover their information needs
• Adverse Event Reporting System (CAERS)
Ø Contains reported AEs
Ø A good resource to mine DS-AE signals
Challenges for Supplement Clinical Research
• No standardized and consistent DS knowledge
representation
• Lexical variations of supplements in clinical notes
• Detailed usage information related to supplements
• Differentiating adverse events from purpose of use
1.1. Supplement Knowledge Base
To generate an integrated and standardized DS knowledge base
Rizvi R, et al. AMIA CRI (student paper competition finalist) 2018.
JAMIA 2019. doi: 10.1093/jamia/ocz216
iDISK Development
q To build a one-stop Integrated DIetary Supplement Knowledge base (iDISK)
q DS-related content is represented in consistent and standardized forms
JAMIA 2019. doi: 10.1093/jamia/ocz216
DSLD, Dietary Supplement Label Database; MSKCC, Memorial Sloan Kettering
Cancer Center; NHP, Natural Health Products Database; NMCD, Natural Medicines
Comprehensive Database.
iDISK data model
Alfalfa in iDISK
• Evaluation showed that iDISK achieved high accuracy (98.5%-100%) across all data elements
iDISK Statistics
iDISK vs UMLS on DS Coverage
iDISK: 41,628 unique DS ingredient names
UMLSDistilled : Only with certain semantic types (Nucleic Acid, Nucleoside, or Nucleotide,
Organic Chemical, Pharmacologic Substance, Vitamin, Bacterium, Fish, Fungus, Plant, or
Food, etc)
UMLSDS: select all concepts using Parent-Children relationship under the “Dietary
Supplements” (C0242295) and “Vitamin” (C0042890) concepts.
| Matched against | iDISK element | Exact match (%) | +luiNorm (+%) | Total (%) | UMLS concepts |
| UMLS | Atoms | 27,992 (45.7%) | +550 (+0.9%) | 28,542 (46.6%) | 10,716 |
| UMLS | Unique terms | 12,744 (30.6%) | +474 (+1.1%) | 13,218 (31.7%) | |
| UMLSDistilled | Atoms | 27,553 (45.0%) | +524 (+0.9%) | 28,077 (45.9%) | 8,684 |
| UMLSDistilled | Unique terms | 12,397 (29.8%) | +450 (+1.0%) | 12,847 (30.8%) | |
| UMLSDS | Atoms | 12,096 (19.7%) | +407 (+0.7%) | 12,503 (20.4%) | 5,817 |
| UMLSDS | Unique terms | 4,899 (11.8%) | +308 (+0.7%) | 5,207 (12.5%) | |
Evaluation on a DS NER Task
Identifying 3,710 DS entities in 351 abstracts.

| Evaluation criterion | QuickUMLS installation | Precision | Recall | F1 |
| Lenient | UMLS | 0.08 | 0.91 | 0.15 |
| Lenient | UMLSDistilled | 0.25 | 0.89 | 0.39 |
| Lenient | UMLSDS | 0.32 | 0.86 | 0.46 |
| Lenient | iDISK | 0.51 | 0.82 | 0.63 |
| Lenient | Union | 0.32 | 0.91 | 0.48 |
| Strict | UMLS | 0.05 | 0.67 | 0.10 |
| Strict | UMLSDistilled | 0.19 | 0.69 | 0.30 |
| Strict | UMLSDS | 0.22 | 0.61 | 0.33 |
| Strict | iDISK | 0.43 | 0.69 | 0.53 |
| Strict | Union | 0.23 | 0.77 | 0.36 |
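This kind of dictionary-based matching can be reproduced with the QuickUMLS Python API; a sketch assuming a locally built QuickUMLS installation (the path and threshold below are placeholders):

```python
from quickumls import QuickUMLS

# Path to a locally built QuickUMLS data directory (hypothetical).
matcher = QuickUMLS("/path/to/quickumls_data", threshold=0.7)

text = "Patient takes black cohosh and melatonin for sleep."
for candidates in matcher.match(text, best_match=True, ignore_syntax=False):
    for c in candidates:
        # Each candidate carries the matched span ("ngram"), the UMLS CUI,
        # and a similarity score against the dictionary term.
        print(c["ngram"], c["cui"], round(c["similarity"], 2))
```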
1.2. Expanding Supplement Terminology

Objective
• To apply word embedding models to expand the terminology of DS found in clinical notes: semantic variants, brand names, misspellings
• Word embeddings
  • Reveal hidden relationships between words (similarity and relatedness)
  • More efficient; can be trained on large amounts of unannotated data
Supplements studied: calcium, chamomile, cranberry, dandelion, flaxseed, garlic, ginger, ginkgo, ginseng, glucosamine, lavender, melatonin, turmeric, valerian
Method Overview

Model Training
• Corpus size
• Hyperparameter tuning (see the training sketch below)
  • Window size (i.e., 4, 6, 8, 10, and 12)
  • Vector size (i.e., 100, 150, 200, 250)
• GloVe trained on the same corpus
  • Window size and vector size
• Optimal parameters were chosen based on human annotation (intrinsic evaluation)
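A minimal gensim (4.x) sketch of the training-and-expansion loop; the corpus is a toy placeholder for tokenized clinical-note sentences, and the hyperparameters are drawn from the tuning grid above:

```python
from gensim.models import Word2Vec

# Placeholder for tokenized clinical-note sentences.
sentences = [
    ["pt", "taking", "melatonin", "for", "sleep"],
    ["try", "melotonin", "nightly", "for", "insomnia"],
    ["continue", "alteril", "as", "needed", "for", "sleep"],
] * 200  # repeated only so the toy corpus clears min_count

model = Word2Vec(
    sentences,
    vector_size=200,  # from the tuning grid: 100, 150, 200, 250
    window=8,         # from the tuning grid: 4, 6, 8, 10, 12
    min_count=5,
    sg=1,             # skip-gram
    workers=4,
)

# Expand a supplement query with its nearest neighbors; on real notes these
# surface misspellings, brand names, and semantic variants.
print(model.wv.most_similar("melatonin", topn=10))
```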
Results: Query Expansion Examples

Black cohosh
• word2vec expansions: misspellings (black kohosh, black kohash); brand names (Remifemin, Estroven, Estrovan, estraven, icool, amberen, amberin, Estrovera, EstroFactor)
• Example notes: "Please try black cohash or Estroven for hot flashes." "Pt has discontinued Remifemin but still has symptoms." "Recommend Estroven trial for symptoms of menopause."

Turmeric
• word2vec expansions: misspelling (tumeric)
• Example notes: "Pt emailed wondering about taking Tumeric." "Patient states that she sometimes takes the supplements Tumeric."

Folic acid
• word2vec expansions: brand names (Folgard, Folbic); other name (folate)
• Example notes: "Patient is willing to try Folgard if ok with provider." "Patient is on folate and does not smoke."

Valerian
• word2vec expansions: misspelling (velarian); brand names (myocalm pm, somnapure)
• Example notes: "Taking Velarian root and benadryl as well." "I would recommend moving to 6mg dose first, then trying somnapure if still not helping."

Melatonin
• word2vec expansions: misspellings (melantonin, melotonin); brand names (alteril, neuro sleep)
• Example notes: "Can try melantonin for sleep aid." "Try alteril - it is over the counter sleep aid. Let me know if this is not better over the next few weeks."
Results: Comparison of Base and Expanded Queries

Results: Comparison of Word-Embedding-Expanded versus External-Resource-Expanded Queries
1.3. Detecting DS Indications and Adverse Event Signals in Clinical Texts
• Clinical notes document information related to patient safety
• AEs
  • "Patient gets headaches with black cohosh"
• Indications
  • "Presently, patient is taking black cohosh for night sweats and hot flashes"
• Temporal relationship between medical events
  • "Also headaches did start shortly after starting black cohosh"
• DS safety surveillance
  • NLP for medical concept and relation extraction
Fan et al. J Am Med Inform Assoc. 2020
Objectives
• To demonstrate the feasibility of deep learning models
applied to clinical notes to facilitate discovery of DS
safety knowledge
• To evaluate different deep learning (e.g., pre-trained
BERT) models on annotated DS-specific clinical corpora
Results of NER Models
7,000 sentences covering 7 DS were randomly selected. DS entities include generic names, brand names, abbreviations, and misspellings. P = precision, R = recall, Num = number of entities.

| Model | Entity type | P | R | F1 | Num |
| CRF | DS | 0.900 ± 0.00 | 0.791 ± 0.00 | 0.842 ± 0.00 | 1247 |
| CRF | Symptom | 0.714 ± 0.00 | 0.567 ± 0.00 | 0.632 ± 0.00 | 356 |
| CRF | Overall (micro) | 0.861 ± 0.00 | 0.741 ± 0.00 | 0.797 ± 0.00 | 1603 |
| Bi-LSTM-CRF (word only) | DS | 0.905 ± 0.002 | 0.854 ± 0.007 | 0.879 ± 0.003 | 1247 |
| Bi-LSTM-CRF (word only) | Symptom | 0.812 ± 0.015 | 0.825 ± 0.007 | 0.818 ± 0.009 | 356 |
| Bi-LSTM-CRF (word only) | Overall (micro) | 0.884 ± 0.004 | 0.847 ± 0.003 | 0.865 ± 0.003 | 1603 |
| Bi-LSTM-CRF (char lstm) | DS | 0.900 ± 0.006 | 0.860 ± 0.002 | 0.879 ± 0.003 | 1247 |
| Bi-LSTM-CRF (char lstm) | Symptom | 0.806 ± 0.008 | 0.837 ± 0.011 | 0.822 ± 0.008 | 356 |
| Bi-LSTM-CRF (char lstm) | Overall (micro) | 0.877 ± 0.003 | 0.855 ± 0.003 | 0.866 ± 0.002 | 1603 |
| Bi-LSTM-CRF (char cnn) | DS | 0.905 ± 0.006 | 0.864 ± 0.004 | 0.884 ± 0.003 | 1247 |
| Bi-LSTM-CRF (char cnn) | Symptom | 0.847 ± 0.018 | 0.845 ± 0.007 | 0.846 ± 0.011 | 356 |
| Bi-LSTM-CRF (char cnn) | Overall (micro) | 0.892 ± 0.006 | 0.860 ± 0.003 | 0.876 ± 0.004 | 1603 |
| Clinical BERT | DS | 0.931 ± 0.002 | 0.845 ± 0.002 | 0.886 ± 0.002 | 1247 |
| Clinical BERT | Symptom | 0.836 ± 0.014 | 0.840 ± 0.007 | 0.838 ± 0.008 | 356 |
| Clinical BERT | Overall (micro) | 0.908 ± 0.003 | 0.845 ± 0.002 | 0.875 ± 0.001 | 1603 |
| BERT | DS | 0.931 ± 0.005 | 0.850 ± 0.003 | 0.889 ± 0.003 | 1247 |
| BERT | Symptom | 0.860 ± 0.010 | 0.854 ± 0.006 | 0.857 ± 0.004 | 356 |
| BERT | Overall (micro) | 0.914 ± 0.007 | 0.851 ± 0.003 | 0.881 ± 0.003 | 1603 |
Results for Relation Extraction
3,000 sentences (200 sentences each) covering 15 DS: black cohosh, chamomile, cranberry, dandelion, folic acid, garlic, ginger, ginkgo, ginseng, glucosamine, green tea, lavender, melatonin, milk thistle, and saw palmetto.

| Model | Relation | P | R | F1 | Num |
| Random Forest | Positive | 0.835 ± 0.002 | 0.939 ± 0.003 | 0.884 ± 0.002 | 336 |
| Random Forest | Negative | 0.782 ± 0.009 | 0.716 ± 0.007 | 0.747 ± 0.006 | 109 |
| Random Forest | Not related | 0.825 ± 0.011 | 0.438 ± 0.006 | 0.572 ± 0.005 | 69 |
| Random Forest | Overall (micro) | 0.823 ± 0.003 | 0.824 ± 0.002 | 0.813 ± 0.002 | 514 |
| CNN | Positive | 0.937 ± 0.013 | 0.936 ± 0.031 | 0.936 ± 0.010 | 336 |
| CNN | Negative | 0.804 ± 0.057 | 0.926 ± 0.021 | 0.859 ± 0.026 | 109 |
| CNN | Not related | 0.824 ± 0.095 | 0.634 ± 0.060 | 0.721 ± 0.040 | 69 |
| CNN | Overall (micro) | 0.899 ± 0.013 | 0.896 ± 0.016 | 0.890 ± 0.016 | 514 |
| Att-BLSTM | Positive | 0.913 ± 0.011 | 0.967 ± 0.017 | 0.939 ± 0.004 | 336 |
| Att-BLSTM | Negative | 0.869 ± 0.035 | 0.861 ± 0.063 | 0.863 ± 0.024 | 109 |
| Att-BLSTM | Not related | 0.876 ± 0.028 | 0.798 ± 0.009 | 0.826 ± 0.007 | 69 |
| Att-BLSTM | Overall (micro) | 0.897 ± 0.006 | 0.899 ± 0.005 | 0.893 ± 0.004 | 514 |
Positive relationships (indication): 18,348 pairs

| Entity pair | In NMCD | Sentence |
| Vitamin C, Wound | ✓ | Starting mv, Vitamin C and zinc for wound healing. |
| Fish oil, Hyperlipidemia | ✓ | Patient has history of hyperlipidemia which was until recently well-controlled with fish oil and simvastatin. |
| Peppermint, Nausea | ✓ | He has much less nausea with peppermint oil and marijuana. |
| Vitamin E, Scar | ✓ | Vitamin E po apply 1 capsule daily as needed to scar on forehead. |
| Fish oil, Pain | ✓ | I suggested that she could try daily fish oil which may help the breast pain when it is taken for at least a month or two and could use iburprofen and heat for the pain as well. |
| Psyllium, Constipation | ✓ | Patients states she takes psyllium powder daily for constipation, and needs refills. |
| Vitamin C, UTI | ✓ | Patient with hx recurrent utis, on vitamin c for urinary acidification |
| Fish oil, Anxiety | ✗ | I encourage over the counter multi vitamin and fish oil pills, as they can help improve some anxiety and depression symptoms. |
| Peppermint, Pain | ✓ | She also has experienced pain relief when rubbing peppermint essential oil on the low back. |
Negative relationships (adverse event): 13,130 pairs

| Entity pair | In NMCD | Sentence |
| Niacin, Rash | ✗ | Lisinopril causes a cough and niacin causes a rash. |
| Niacin, Flushing | ✗ | She was having significant flushing with niacin, so she discontinued this about 6 months ago. |
| Niacin, Hives | ✗ | Patient stating reaction to niacin is hives though has used mvi in past without issues. |
| Fish oil, Rash | ✓ | Pt states when she takes fish oil tablets she get a small rash on her chin. |
| Fish oil, Vomiting | ✗ | Also, discussed vomiting with fish oil caps because she bit into them- would not pursue further at this time. |
| Vitamin C, Nausea | ✗ | She did have 1 or 2 episodes of nausea related to taking delayed-release vitamin c for wound healing |
| Niacin, GI disturbance | ✗ | Allergen reactions: niacin: gi disturnbance; simvastation: cramps. |
| Fish oil, Diarrhea | ✓ | Discussed titrating back up on fish oil as he tolerates, previously has been causing a lot of diarrhea so going slow. |
1.4. Mining Biomedical Literature to Discover
Drug-Supplement Interactions (DSIs)
http://www.wsj.com/articles/what-you-should-know-about-how-your-supplements-interact-with-prescription-drugs-1456777548
Researchers at the University of Minnesota in
Minneapolis are exploring interactions between
cancer drugs and dietary supplements, based on
data extracted from 23 million scientific
publications, according to lead author Rui
Zhang, a clinical assistant professor in health
informatics. In a study published last year by a
conference of the American Medical
Informatics Association, he says, they identified
some that were previously unknown.
Objective
• Explore potential DSIs by linking knowledge
extracted from biomedical literature
Literature-based Discovery

"We have shown that ECHINACEA preparations and some common alkylamides weakly inhibit several cytochrome P450 (CYP) isoforms, with considerable variation in potency." (19790031)
→ Echinacea - INHIBITS - CYP450

"Tamoxifen and toremifene are metabolised by the cytochrome p450 enzyme system, and raloxifene is metabolised by glucuronide conjugation." (12648026)
→ CYP450 - INTERACTS_WITH - Toremifene

Named entity recognition (NER) and relation extraction over big data (29 million abstracts) produce predication pairs; linking supplement-gene pairs (X-Y) with gene-drug pairs (Y-Z) infers candidate supplement-drug interactions (X-Z), e.g.:
Echinacea - <Potentially Interacts With> - Toremifene
(A join sketch follows below.)
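The X-Y / Y-Z linkage amounts to a join over extracted predications on the shared gene; a self-contained sketch using the slide's example triples:

```python
from collections import defaultdict

# Supplement -> gene/enzyme predications extracted from the literature.
supp_gene = [("Echinacea", "INHIBITS", "CYP450"),
             ("Kava preparation", "STIMULATES", "CYP3A4")]

# Gene/enzyme -> drug predications.
gene_drug = [("CYP450", "INTERACTS_WITH", "Toremifene"),
             ("CYP3A4", "INTERACTS_WITH", "Docetaxel")]

# Index drug predications by the shared gene, then join.
by_gene = defaultdict(list)
for gene, pred, drug in gene_drug:
    by_gene[gene].append((pred, drug))

for supp, pred1, gene in supp_gene:
    for pred2, drug in by_gene[gene]:
        print(f"{supp} -<potentially interacts with>- {drug} "
              f"(via {pred1} {gene} {pred2})")
```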
Results: Selected Interactions

| Supplement | Predicate | Gene/Gene class | Predicate | Drug | Known |
| Echinacea | INH | CYP450 | INT | Docetaxel | Y |
| Echinacea | INH | CYP450 | INT | Toremifene | N |
| Echinacea | STI | CYP1A1 | INT | Exemestane | N |
| Grape seed extract | INH | CYP3A4 | INT | Docetaxel | N |
| Kava preparation | STI | CYP3A4 | INT | Docetaxel | Y |

INH, INHIBITS; STI, STIMULATES; INT, INTERACTS_WITH
Echinacea: fights the common cold and viral infections
Grape seed extract: cardiac conditions
Kava: treats sleep problems, relieves anxiety and stress
Results: Selected Predications

| Semantic predication | Citation |
| Echinacea INHIBITS CYP450 | We have shown that ECHINACEA preparations and some common alkylamides weakly inhibit several cytochrome P450 (CYP) isoforms, with considerable variation in potency. (19790031) |
| Grape seed extract INHIBITS CYP3A4 | Four brands of GSE had no effect, while another five produced mild to moderate but variable inhibition of CYP3A4, ranging from 6.4% by Country Life GSE to 26.8% by Loma Linda Market brand. (19353999) |
| Melatonin INHIBITS Cyclooxygenase-2 | Moreover, Western blot analysis showed that melatonin inhibited LPS/IFN-gamma-induced expression of COX-2 protein, but not that of constitutive cyclooxygenase. (18078452) |
| CYP450 INTERACTS_WITH Toremifene | Tamoxifen and toremifene are metabolised by the cytochrome p450 enzyme system, and raloxifene is metabolised by glucuronide conjugation. (12648026) |
| CYP3A INHIBITS Docetaxel | Because docetaxel is inactivated by CYP3A, we studied the effects of the St. John's wort constituent hyperforin on docetaxel metabolism in a human hepatocyte model. (16203790) |
1.5. Active Learning to Reduce Annotation Costs for NLP Tasks
• NLP tasks require human annotations
  Ø Time-consuming and labor-intensive
• Active learning reduces annotation costs
  Ø Used in biomedical and clinical texts
  Ø Effectiveness varies across datasets and tasks
Chen Y, Lasko TA, Mei Q, Denny JC, Xu H. A study of active learning methods for named entity recognition in clinical text. J Biomed Inform 2015; 58: 11–8.
Chen Y, Cao H, Mei Q, Zheng K, Xu H. Applying active learning to supervised word sense disambiguation in MEDLINE. J Am Med Inform Assoc 2013; 20(5): 1001–6.
Objectives
• To assess the effectiveness of AL methods for filtering incorrect semantic predications
• To evaluate various query strategies and provide a comparative analysis of AL methods through visualization
Vasilakes J, Rizvi R, Melton G, Pakhomov S, Zhang R. J Am Med Info Assoc Open. 2018
Method Overview
Query strategies:
• Uncertainty sampling
• Representative sampling
• Combined sampling
Evaluation:
• 10-fold cross-validation
• Training = 2,700; L0 = 270
• Testing = 300, evaluated using AUC
Query Strategies (a least-confidence sketch follows below)
• Uncertainty
  Ø Simple margin
  Ø Least confidence
  Ø Least confidence with dynamic bias
• Representative
  Ø Distance to center
  Ø Density
  Ø Min-max
• Combined
  Ø Information density
  Ø Dynamic
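A minimal sketch of one least-confidence query round with scikit-learn, using random features as stand-ins for the predication feature vectors (the set sizes mirror the evaluation setup above):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(270, 20))     # L0 = 270 seed examples
y_labeled = rng.integers(0, 2, size=270)   # correct vs. incorrect predication
X_unlabeled = rng.normal(size=(2430, 20))  # remaining unlabeled pool

clf = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

# Least confidence: query the instances whose top predicted class
# probability is lowest, i.e., where the model is most uncertain.
confidence = clf.predict_proba(X_unlabeled).max(axis=1)
query_idx = np.argsort(confidence)[:10]  # next batch to send to annotators
print(query_idx)
```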
Datasets and Annotations
• Substance interaction (3,000):
  Ø INTERACTS_WITH, STIMULATES, or INHIBITS
• Clinical Medicine (3,000):
  Ø ADMINISTERED_TO, COEXISTS_WITH, COMPLICATES, DIAGNOSES, MANIFESTATION_OF, PRECEDES, PREVENTS, PROCESS_OF, PRODUCES, TREATS, or USES
• Inter-rater agreement:
  Ø Kappa: 0.74 (SI), 0.72 (CM)
  Ø Percentage agreement: 87% (SI), 91% (CM)
Performance Comparison
When L is small and U is large:
• it is unlikely that L is representative of U
• given that L is small and unrepresentative, the prediction model trained on L is likely to be poor
|U| is the size of the current unlabeled set; |L| is the size of the current labeled set
Results

| Query strategy | ALC |
| Passive learning | 0.590 |
| Uncertainty sampling | 0.597 - 0.607 |
| Representative sampling | 0.622 - 0.634 |
| ID (manual β) | 0.642 |
| ID (dynamic β) | 0.641 |

ID = information density (combined sampling).
Performance Analysis
Uncertainty Sampling
(worst performing)
Representative Sampling
(best performing)
Vasilakes J, Rizvi R, Melton G, Pakhomov S, Zhang R. J Am Med Info Assoc Open. 2018
1.6. Mining Twitter to Detect DS Adverse Events
• Objectives
  Ø To develop an end-to-end AI pipeline for identifying DS-AEs from tweets
  Ø To compare the DS-AEs discovered from tweets to those curated in iDISK

Data Collection
• Data collection
  Ø 332 DS terms, including 40 commonly used DS and their name variants
  Ø 14,143 AE terms from the ADR lexicon and the iDISK knowledge base
  Ø The final dataset includes 247,807 tweets (2012 to 2018) that contain at least one DS-AE pair
• Data preprocessing (see the sketch below)
  Ø Remove URLs, user handles (@username), hashtag symbols (#), and emojis
  Ø Contractions (e.g., can't) were expanded
  Ø Hashtags were segmented into constituent words
  Ø Stop words were kept (e.g., "throw up" is different from "throw")
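A sketch of the preprocessing rules; the contraction map and the emoji filter are simplified placeholders for the fuller resources used in the study:

```python
import re

CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am"}

def preprocess(tweet: str) -> str:
    t = tweet.lower()
    t = re.sub(r"https?://\S+", "", t)  # remove URLs
    t = re.sub(r"@\w+", "", t)          # remove user handles
    t = re.sub(r"#(\w+)", r"\1", t)     # keep hashtag text, drop the '#'
    t = re.sub(r"[^\x00-\x7f]", "", t)  # crude emoji / non-ASCII removal
    for short, full in CONTRACTIONS.items():
        t = t.replace(short, full)      # expand contractions
    return " ".join(t.split())          # note: stop words are kept

print(preprocess("I can't sleep, #niacin flush again @someuser 😩 https://t.co/x"))
```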
Results – Concept Extraction

| Concept type | Deep learning method | Precision | Recall | F1-measure |
| Supplement | LSTM-CRF + PubMed Word2Vec | 0.8587 ± 0.0211 | 0.8055 ± 0.0280 | 0.8310 ± 0.0218 |
| Supplement | LSTM-CRF + GloVe Twitter | 0.8491 ± 0.0321 | 0.8127 ± 0.0196 | 0.8300 ± 0.0179 |
| Supplement | LSTM-CRF + GloVe Crawl | 0.8736 ± 0.0210 | 0.8375 ± 0.0152 | 0.8551 ± 0.0157 |
| Supplement | LSTM-CRF + fastText | 0.8538 ± 0.0160 | 0.8092 ± 0.0231 | 0.8308 ± 0.0175 |
| Supplement | BioBERT | 0.8570 ± 0.0248 | 0.8725 ± 0.0212 | 0.8646 ± 0.0220 |
| Supplement | BERT | 0.8560 ± 0.0185 | 0.8736 ± 0.0198 | 0.8647 ± 0.0184 |
| Symptom | LSTM-CRF + PubMed Word2Vec | 0.7909 ± 0.0188 | 0.6794 ± 0.0258 | 0.7306 ± 0.0173 |
| Symptom | LSTM-CRF + GloVe Twitter | 0.8048 ± 0.0150 | 0.6994 ± 0.0244 | 0.7482 ± 0.0155 |
| Symptom | LSTM-CRF + GloVe Crawl | 0.8012 ± 0.0205 | 0.7146 ± 0.0344 | 0.7550 ± 0.0232 |
| Symptom | LSTM-CRF + fastText | 0.7784 ± 0.0247 | 0.6841 ± 0.0271 | 0.7277 ± 0.0182 |
| Symptom | BioBERT | 0.8416 ± 0.0204 | 0.8582 ± 0.0200 | 0.8497 ± 0.0172 |
| Symptom | BERT | 0.8393 ± 0.0161 | 0.8664 ± 0.0147 | 0.8526 ± 0.0138 |
Results – Relation Extraction (RE)

| Relation type | Deep learning method | Precision | Recall | F1-measure |
| Indication | CNN + GloVe Twitter | 0.7774 ± 0.0252 | 0.7946 ± 0.0318 | 0.7850 ± 0.0124 |
| Indication | CNN + GloVe Wiki GigaWord | 0.7720 ± 0.0206 | 0.7901 ± 0.0280 | 0.7804 ± 0.0142 |
| Indication | BioBERT | 0.8177 ± 0.0214 | 0.8595 ± 0.0321 | 0.8374 ± 0.0147 |
| Indication | BERT | 0.8181 ± 0.0319 | 0.8522 ± 0.0409 | 0.8335 ± 0.0169 |
| Adverse events | LSTM-CRF + PubMed Word2Vec | 0.6995 ± 0.0653 | 0.6381 ± 0.0539 | 0.6645 ± 0.0410 |
| Adverse events | LSTM-CRF + GloVe Twitter | 0.7069 ± 0.0553 | 0.5995 ± 0.0783 | 0.6456 ± 0.0561 |
| Adverse events | BioBERT | 0.7349 ± 0.0430 | 0.7603 ± 0.0519 | 0.7459 ± 0.0341 |
| Adverse events | BERT | 0.7312 ± 0.0694 | 0.7845 ± 0.1041 | 0.7538 ± 0.0376 |
Results
• 194,190 pairs were identified as DS indications
• 45,668 pairs were identified as DS-AEs
• 190,170 pairs had no relation
Results – DS-AE pairs examples
• Vitamin C – Kidney stones tweets: (iDISK has this entry)
Ø some medications yes even prolonged high dose vitamin c causes kidney
stones
Ø vitamin c is not actually an effective treatment for the common cold and
high doses may cause kidney stones nausea and diarrhea
Ø too much vitamin c can cause kidney stones
• Vitamin C – diarrhea tweets: (iDISK has this entry)
Ø i would eat this whole bag of oranges but vitamin c in high doses can
induce skin breakouts and diarrhea facts
Ø too much vitamin c or zinc could cause nausea diarrhea and stomach
cramps check your dose
Ø too much vitamin c can cause diarrhea and or nausea
Ø it can cause diarrhea because of all the vitamin c
Results – DS-AE pairs examples
• Niacin – Flush tweets: (not in iDISK)
Ø the niacin flush may be uncomfortable for a few mins but it is well
worth it it may be itchy or burn a little but it passes in 10 30
Ø note to self if you are used to 250 mg of niacin jump up to 500 mg
the niacin flush is so intense
Ø already got a niacin flush crap
• Fish oil – prostate cancer tweets: (not in iDISK)
Ø fish oil makes you more likely to get prostate cancer good enough for
me to stop taking it just a heads up
Ø some docs say fish oil can raise your risk of prostate cancer wait so i
should stop stuffing goldfish up there tgif
Ø correction study finds fish oil increases risk of high grade prostate
cancer by 71 percent <url>
Part 2:
Information Extraction in EHR and Clinical
Trials
• Extract Breast Cancer Receptor Status
• Identify Clinically New Information
• Parse Clinical Trial Eligibility Criteria
2.1. Breast Cancer Receptor Status
Phenotyping from EHR
Breitenstein MK, Liu H, Maxwell KN, Pathak J, Zhang R. Electronic health record phenotypes for precision medicine:
perspectives and caveats from treatment of breast cancer at a single institution. Clinical and Translational Science.
2018 Jan;11(1):85-92.
Phenotyping Granularity
• Phenotyping usually identifies cases or controls
• Precision medicine phenotypes of breast cancer subtypes
  Ø Estrogen receptor (ER)
  Ø Progesterone receptor (PR)
  Ø Human epidermal growth factor receptor 2 (HER2)
  Ø Triple-negative breast cancer (TNBC: ER-, PR-, HER2-)
Objectives
• Develop NLP-based breast cancer precision
medicine phenotyping methods to identify
receptor status
• Compare the clinical data source coverage on
receptor status
2.2. Identifying Clinically Relevant New Versus Redundant Information in Clinical Texts
• EHR "copy-and-paste" functionality
  Ø 74-90% of physicians copy and paste
  Ø 20-78% of physician notes are copied text
• Results
  Ø Little deletion, only addition
  Ø Longer notes, recombinant versions of previous notes
  Ø Errors repeat
• User issues
  Ø Information overload
  Ø Difficulties in finding information
Statistical N-gram Language Model
• Predict the probability of a word based on all previous words:
  P(w1...wn) = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1...wn-1) = ∏ k=1..n P(wk | w1...wk-1)
• Markov assumption
  Ø The probability of a word depends only on the previous n words:
  P(wk | w1...wk-1) ≈ P(wk | wk-n+1...wk-1)
• N-gram model
  Ø An (n-1)th-order Markov model
Example: P(congestion | a female presenting with a chief complaint of nasal)
  Bigram: P(congestion | nasal)
  Trigram: P(congestion | of nasal)
  Four-gram: P(congestion | complaint of nasal)
Manning and Schütze. Foundations of Statistical Natural Language Processing. The MIT Press; 2003.
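A minimal maximum-likelihood bigram sketch over the example sentence above:

```python
from collections import Counter

corpus = "a female presenting with a chief complaint of nasal congestion".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(word: str, prev: str) -> float:
    """MLE estimate P(word | prev) = C(prev, word) / C(prev)."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_bigram("congestion", "nasal"))  # 1.0 in this one-sentence corpus
```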
Statistical N-gram Language Model
• Sparseness of the corpus
  Ø Unseen events receive zero probability
  Ø A zero will propagate to the probability of a long string
• Smoothing methods
  Ø Decrease the probability of seen events to allow for the occurrence of unseen n-grams
  Ø Good-Turing estimation:
  If C(w1...wn) = r > 0: P_GT(w1...wn) = r*/N, where r* = (r+1) N_{r+1} / N_r
  If C(w1...wn) = 0: P_GT(w1...wn) = (1 - Σ r≥1 N_r r*/N) / N_0 ≈ N_1 / (N_0 N)
Manning and Schütze. Foundations of Statistical Natural Language Processing. The MIT Press; 2003.
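The adjusted count r* falls out of the frequency-of-frequencies table directly; a small sketch with toy n-gram counts:

```python
from collections import Counter

# Toy n-gram counts; N_r = number of distinct n-grams seen exactly r times.
ngram_counts = Counter({"of nasal": 3, "nasal congestion": 2,
                        "chief complaint": 2, "a female": 1, "a chief": 1})
N = sum(ngram_counts.values())
freq_of_freq = Counter(ngram_counts.values())  # r -> N_r

def good_turing_r_star(r: int) -> float:
    """r* = (r + 1) * N_{r+1} / N_r (only defined when N_r and N_{r+1} > 0)."""
    return (r + 1) * freq_of_freq[r + 1] / freq_of_freq[r]

print(good_turing_r_star(1))  # adjusted count for singletons
print(freq_of_freq[1] / N)    # N_1/N: total probability mass left for unseen n-grams
```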
Semantic Similarity Measures
• Measure semantic similarity between two
biomedical concepts by determining the closeness in
a hierarchy
• UMLS brings many biomedical vocabularies and
standards together
• UMLS::Similarity provides a platform to calculate
similarity using various methods
• Methods: Resnik; Jiang and Conrath; Lin
Pedersen T, Pakhomov S et al. 2nd ACM SIGHIT IHI Symp Proc, 2012.
Pakhomov S, McInnes B et al. AMIA Annu Symp Proc 2010: 572-6.
McInnes B, Pedersen T, Pakhomov S. AMIA Annu Symp Proc 2009:431-5.
Pedersen T, Pakhomov S et al. J Biomed Inform. 2007 Jun;40(3):288-99.
P. Resnik, International Joint Conference for Artificial Intelligence, 448-53, 1995.
J. Jiang and D. Conrath, Proceedings on International Conference on Research in CL, 9-33, 1997.
D. Lin, Proceedings of the International Conference on ML., 296-304, 1998.
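All three measures reduce to a few lines once the information content (IC) of each concept and of their least common subsumer (LCS) is known; the IC values below are hypothetical stand-ins for corpus-derived estimates:

```python
# Hypothetical information-content values, IC(c) = -log P(c), where P(c)
# is the concept's probability in a clinical corpus.
ic = {"pain": 2.1, "headache": 5.8, "migraine": 7.2}
ic_lcs = ic["pain"]  # assume "pain" is the LCS of headache and migraine

def resnik(c1, c2):  # similarity = IC of the LCS
    return ic_lcs

def lin(c1, c2):     # 2 * IC(lcs) / (IC(c1) + IC(c2))
    return 2 * ic_lcs / (ic[c1] + ic[c2])

def jcn(c1, c2):     # inverse of the Jiang-Conrath distance
    return 1 / (ic[c1] + ic[c2] - 2 * ic_lcs)

print(resnik("headache", "migraine"), lin("headache", "migraine"),
      jcn("headache", "migraine"))
```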
Results: Performance Comparison

| Algorithm | Recall | Precision | F1-measure | Optimal threshold |
| Baseline | 0.85 | 0.64 | 0.73 | - |
| Baseline + Lin | 0.87 | 0.62 | 0.72 | 0.9 |
| Baseline + Res | 0.87 | 0.61 | 0.72 | 0.9 |
| Baseline + Jcn | 0.87 | 0.61 | 0.72 | 0.9 |

Baseline: rule-based section information adjustment + removal of note formatting and noise + removal of stop words + lexical normalization
Semantic similarity methods: Lin; Resnik; Jiang and Conrath
Precision = TP/(TP+FP); Recall = TP/(TP+FN); F1-measure = 2 × Precision × Recall / (Precision + Recall)
NIP Score to Navigate Notes
[Figure: New Information Proportion (NIP, %) plotted against note index (notes 29-38) for 10-note and 20-note windows. Annotations: notes 30, 32, 33 & 35: nothing new; note 31, NEW: RUQ pain worse with eating greasy foods; note 34, NEW: pt visits diabetes RN; note 36, NEW: sore throat x 3 days; note 37, NEW: having chest pain, will try colchicine for pericarditis; note 38, NEW: depressive symptoms, bulging L TM on exam.]
• Cyclical pattern
• High correlation with human judgment
• Source note of redundant information
*NIP: New Information Proportion
New Information Semantic Types
[Figure: plot of NDIP (disease), NMIP (medication), and NLIP (laboratory) over time for a patient. Biomedical concepts for each note were automatically extracted and included in boxes. NDIP, new problem/disease information proportion; NMIP, new medication information proportion; NLIP, new laboratory information proportion.]
NIP = NDIP + NMIP + NLIP + NOIP
!"
#"
$!"
$#"
%!"
%#"
%#&'()&!*"
$+&,-.&!*"
/&01-&!*"
%%&234&!5"
$/&637&!5"
%&638&!5"
%$&2(4&!5"
$!&'()&!5"
%5&91:&!5"
$*&;<=&!5"
>&234&$!"
%?&@1A&$!"
$>&':7&$!"
?&2(4&$!"
%?&2(B&$!"
$+&91:&$!"
/&;<=&$!"
%/&01-&$!"
$$&@1A&$$"
%&':7&$$"
%%&638&$$"
$$&2(B&$$"
/!&'()&$$"
$5&,-.&$$"
!"#$%
!"
#"
$!"
$#"
%!"
%#"
%#&'()&!*"
$+&,-.&!*"
/&01-&!*"
%%&234&!5"
$/&637&!5"
%&638&!5"
%$&2(4&!5"
$!&'()&!5"
%5&91:&!5"
$*&;<=&!5"
>&234&$!"
%?&@1A&$!"
$>&':7&$!"
?&2(4&$!"
%?&2(B&$!"
$+&91:&$!"
/&;<=&$!"
%/&01-&$!"
$$&@1A&$$"
%&':7&$$"
%%&638&$$"
$$&2(B&$$"
/!&'()&$$"
$5&,-.&$$"
!"#$%
5-Sep-08: clonazepam
24-Sep-08: sertraline,
clonazepam
22-Oct-08:
sertraline
31-Dec-08:
glimepiride
24-Mar-09: tylenol,
ibuprofen, Imitrex
8-Mar-10: janumet,
metformin, Imitrex,
sertraline, estroven
7-May-10:
glipizide
25-Mar-11: buspirone,
venlafaxine
17-Sep-11: influenza vaccine
New Medication
Information
5-Sep-08: elbow pain, hand pain,
stress, depression, weight gain,
fatigure, osteoarthritis
24-Sep-08: sleepy, dizziness,
nausea, numbness, low back
pain, hip pain
22-Oct-08: anxiety
31-Dec-08: obesity, joint
tenderness, depression
23-Jan-09: hypoglycemia, hot
flushes, menorrhagia,
headache
24-Mar-09: arm pain,
migraine headaches, anxiety
8-Mar-10: depression,
back pain, fatigue
7-May-10: weight loss, family
stress, thirsty,
hypercholesterolemia
25-Mar-11: shoulder pain,
cramping, Leg pain,
patellofemoral syndrome
New Disease
Information
!"
#"
$!"
$#"
%!"
%#"
%#&'()&!*"
$+&,-.&!*"
/&01-&!*"
%%&234&!5"
$/&637&!5"
%&638&!5"
%$&2(4&!5"
$!&'()&!5"
%5&91:&!5"
$*&;<=&!5"
>&234&$!"
%?&@1A&$!"
$>&':7&$!"
?&2(4&$!"
%?&2(B&$!"
$+&91:&$!"
/&;<=&$!"
%/&01-&$!"
$$&@1A&$$"
%&':7&$$"
%%&638&$$"
$$&2(B&$$"
/!&'()&$$"
$5&,-.&$$"
!"#$%
5-Sep-08: BP, weight
24-Sep-08: breast cancer
screeing, X-ray spine
31-Dec-08: A1C, CHOL, HDL, LDL,
TRIG, Microalbuminuria
measurement, X-ray knee
24-Mar-09: A1C, BP
7-May-10: glucuse monitoring,
A1C, HDL, LDL, GLC, BP,
blood glucose
25-Mar-11: blood glucose
New Laboratory
Information
NOIP (Other types of new information)
New Information Visualization in Epic EHR System
2.3. Parsing Clinical Trial Eligibility Criteria
• Patient recruitment delays are remarkably common and costly
  Ø Nearly 80 percent of patient recruitment timelines in clinical trials are not met
  Ø Over 50 percent of patients are not enrolled within the planned time frames
• Objective
  Ø Use NLP to parse entities in trial inclusion/exclusion eligibility criteria
  Ø Use state-of-the-art methods on the CLAMP platform
NLP for Clinical Trial Matching: Entities and Attributes
Entities: Demographics, Observation, Procedure, Condition, Drug, Dietary Supplement, Diet, Device
Attributes: Measurement, Temporal Measurement, Qualifier
Annotations

| | Semantic class | Example criteria (entities and attributes are underlined and marked in blue in the original slide) |
| Entity | Demographics | Women must be > 18 to 45 years of age; BMI = 27 kg/m2 |
| Entity | Observation | Bilirubin greater than 1.2 g/dl; MMSE below 24, dementia or unstable clinical depression by exam |
| Entity | Procedure | History of bilateral hip replacement |
| Entity | Condition | Uncontrolled hypertension (BP over 180mm HG) |
| Entity | Drug | Taking metformin, propranolol and other medications |
| Entity | Dietary supplement (DS) | Use of St. John's Wort or any other dietary supplement |
| Entity | Device | Claustrophobia, metal implants, pacemaker or other factors affecting feasibility and/or safety of MRI scanning |
| Attribute | Measurement | BUN above 40 mg/dl, Cr above 1.8 mg/dl, CrCl < 60 mg/dl |
| Attribute | Qualifier | Signs and symptoms of increased intracranial pressure; severe hypercalcemia |
| Attribute | Temporal_measurement | Use of systemic corticosteroids within the last year |
| Attribute | Negation | Use of anti-diabetic drugs other than metformin |
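As a toy illustration of what criteria parsing produces (this is not the CLAMP pipeline), a regex sketch that pulls an entity, comparator, value, and unit out of simple measurement criteria:

```python
import re

criteria = [
    "Age greater than 50 years",
    "BMI = 27 kg/m2",
    "CrCl < 60 mg/dl",
]

# Hypothetical pattern: entity name, comparator, numeric value, optional unit.
pattern = re.compile(
    r"(?P<entity>[A-Za-z ]+?)\s*"
    r"(?P<op>greater than|less than|[<>=]+)\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[\w/%]*)"
)

for c in criteria:
    m = pattern.search(c)
    if m:
        print(m.groupdict())  # e.g. {'entity': 'Age', 'op': 'greater than', ...}
```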
Mapping to UMLS semantic groups across NLP systems
A: BioMedICUS, B: CLAMP, C: cTAKES and D: MetaMap.
Performances of individual NLP systems & Boolean ensemble
Anusha Bompelli, et al. Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical
Trials using NLP-ADAPT. AIME 2020 (will present on Aug 25 at 14:00, NLP session)
Performance Comparison of the Deep Learning Models
Eligibility criteria corpus (149 trials). Each cell reports (strict, lenient) scores; demographics through device are entity classes, measurement through negation are attribute classes.

| Semantic class | # | BERT(a) P | BERT R | BERT F1 | RoBERTa(b) P | RoBERTa R | RoBERTa F1 | ELECTRA(c) P | ELECTRA R | ELECTRA F1 |
| demographics | 194 | 0.856, 0.916 | 0.409, 0.451 | 0.554, 0.604 | 0.654, 0.928 | 0.537, 0.801 | 0.586, 0.859 | 0.500, 0.541 | 0.273, 0.294 | 0.353, 0.380 |
| observation | 868 | 0.694, 0.829 | 0.658, 0.829 | 0.675, 0.829 | 0.721, 0.910 | 0.684, 0.897 | 0.702, 0.904 | 0.663, 0.865 | 0.590, 0.795 | 0.624, 0.829 |
| procedure | 148 | 0.667, 1.000 | 0.600, 0.800 | 0.632, 0.889 | 0.542, 0.708 | 0.650, 0.850 | 0.591, 0.773 | 0.448, 0.552 | 0.650, 0.850 | 0.531, 0.669 |
| condition | 1832 | 0.794, 0.995 | 0.698, 0.851 | 0.743, 0.917 | 0.813, 0.949 | 0.767, 0.900 | 0.789, 0.924 | 0.778, 0.918 | 0.788, 0.893 | 0.744, 0.905 |
| drug | 890 | 0.935, 0.959 | 0.505, 0.543 | 0.655, 0.693 | 0.707, 0.846 | 0.699, 0.952 | 0.703, 0.895 | 0.423, 0.453 | 0.391, 0.447 | 0.406, 0.449 |
| supplement | 188 | 0.111, 0.278 | 0.250, 0.625 | 0.154, 0.385 | 0.296, 0.412 | 0.625, 1.000 | 0.400, 0.583 | 0.250, 0.438 | 0.500, 0.875 | 0.333, 0.583 |
| device | 37 | 0.857, 0.857 | 1.000, 1.000 | 0.923, 0.923 | 0.857, 0.857 | 1.000, 1.000 | 0.923, 0.923 | 0.750, 0.750 | 1.000, 1.000 | 0.857, 0.857 |
| measurement | 397 | 0.731, 0.851 | 0.700, 0.829 | 0.715, 0.840 | 0.781, 0.938 | 0.725, 0.870 | 0.752, 0.902 | 0.667, 0.810 | 0.600, 0.786 | 0.632, 0.797 |
| qualifier | 1137 | 0.730, 0.795 | 0.754, 0.822 | 0.742, 0.808 | 0.817, 0.872 | 0.761, 0.795 | 0.788, 0.831 | 0.705, 0.750 | 0.788, 0.839 | 0.744, 0.792 |
| temporal | 646 | 0.805, 0.931 | 0.729, 0.823 | 0.765, 0.874 | 0.837, 0.989 | 0.811, 0.926 | 0.824, 0.957 | 0.859, 0.976 | 0.760, 0.844 | 0.807, 0.905 |
| negation | 261 | 0.818, 0.879 | 0.562, 0.769 | 0.667, 0.820 | 0.914, 0.943 | 0.821, 0.846 | 0.825, 0.892 | 0.735, 0.765 | 0.641, 0.667 | 0.685, 0.712 |

(a) arXiv:1810.04805; (b) arXiv:1907.11692; (c) arXiv:2003.10555.
Acknowledgements
Extramural Funding
NCCIH 1R01AT009457 (Zhang)
OD R01AT009457-03S1 (Zhang)
NIA 3R01AT009457-04S1 (Zhang)
CTSA 1UL1TR002494 (Blazer)
AHRQ 1R01HS022085 (Melton)
Medtronic Inc. (Speedie)
Collaborators
Mayo Clinic (Liu, Wang), U of Florida (Bian), Florida State U
(He), NIH/NLM (Rindflesch, Bodenreider), UIUC (Kilicoglu)
Contact Information
Rui Zhang, Ph.D.
Email: zhan1386@umn.edu
Research Lab: http://ruizhang.umn.edu/

More Related Content

PDF
Using Large Language Models in 10 Lines of Code
PPTX
Evolutionary Computing
PPT
Artificial Intelligence: Knowledge Engineering
PPTX
Machine learning (ML) and natural language processing (NLP)
PDF
Train foundation model for domain-specific language model
PPTX
NAMED ENTITY RECOGNITION
PDF
Visual reasoning
PDF
Natural language processing (NLP) introduction
Using Large Language Models in 10 Lines of Code
Evolutionary Computing
Artificial Intelligence: Knowledge Engineering
Machine learning (ML) and natural language processing (NLP)
Train foundation model for domain-specific language model
NAMED ENTITY RECOGNITION
Visual reasoning
Natural language processing (NLP) introduction

What's hot (20)

PPTX
Natural language processing PPT presentation
PDF
Introduction to ChatGPT and Overview of its capabilities and functionality.pdf
PPTX
A brief primer on OpenAI's GPT-3
PDF
AI simple search strategies
PPT
Planning
PPTX
AI_Session 3 Problem Solving Agent and searching for solutions.pptx
PPT
Latent Semantic Indexing and Analysis
PDF
IE: Named Entity Recognition (NER)
PDF
Transformers - Part 1
ODP
Topic Modeling
PDF
LanGCHAIN Framework
PPT
Natural language procssing
PPTX
‘Big models’: the success and pitfalls of Transformer models in natural langu...
PPT
Artificial intelligence and knowledge representation
PDF
Introduction to Few shot learning
PPT
Natural language processing
PDF
Introduction to Transformers for NLP - Olga Petrova
PDF
Large Language Models Bootcamp
PDF
Transformer Introduction (Seminar Material)
PPTX
Data Science with Python Libraries
Natural language processing PPT presentation
Introduction to ChatGPT and Overview of its capabilities and functionality.pdf
A brief primer on OpenAI's GPT-3
AI simple search strategies
Planning
AI_Session 3 Problem Solving Agent and searching for solutions.pptx
Latent Semantic Indexing and Analysis
IE: Named Entity Recognition (NER)
Transformers - Part 1
Topic Modeling
LanGCHAIN Framework
Natural language procssing
‘Big models’: the success and pitfalls of Transformer models in natural langu...
Artificial intelligence and knowledge representation
Introduction to Few shot learning
Natural language processing
Introduction to Transformers for NLP - Olga Petrova
Large Language Models Bootcamp
Transformer Introduction (Seminar Material)
Data Science with Python Libraries
Ad

Similar to NLP tutorial at AIME 2020 (20)

PPTX
Natural Language Processing to Curate Unstructured Electronic Health Records
PDF
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
PPTX
Natural Language Understanding in Healthcare
PDF
CV_Min_Jiang
PPTX
Applying NLP to Personalized Healthcare - 2021
PDF
Nlp based retrieval of medical information for diagnosis of human diseases
PDF
Nlp based retrieval of medical information for diagnosis of human diseases
PDF
NLP Prescription for Healthcare Challenges.pdf
PDF
Challenges in understanding clinical notes: Why NLP Engines Fall Short
PDF
Learning to speak medicine
PDF
Natural Language Processing In Healthcare
PDF
1555 track2 talby
PDF
additional Reading dnbvbfdvfivddcdsvfbivdcsdlcd
PPTX
New Frontiers in Applied NLP​ - PAW Healthcare 2022
PPTX
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
DOCX
ROBOTICS ESSAYS ANSWERS BY KANTE- IRVIN MAKUWAZA.docx
PPTX
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
PDF
KCI_NLP_OHSUResearchWeek2016-NLPatOHSU-final
PPTX
How to Apply NLP to Analyze Clinical Trials
PPTX
HXR 2016: Data Insights: Mining, Modeling, and Visualizations- Niraj Katwala
Natural Language Processing to Curate Unstructured Electronic Health Records
Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthca...
Natural Language Understanding in Healthcare
CV_Min_Jiang
Applying NLP to Personalized Healthcare - 2021
Nlp based retrieval of medical information for diagnosis of human diseases
Nlp based retrieval of medical information for diagnosis of human diseases
NLP Prescription for Healthcare Challenges.pdf
Challenges in understanding clinical notes: Why NLP Engines Fall Short
Learning to speak medicine
Natural Language Processing In Healthcare
1555 track2 talby
additional Reading dnbvbfdvfivddcdsvfbivdcsdlcd
New Frontiers in Applied NLP​ - PAW Healthcare 2022
"Can NLP techniques be utilized as a reliable tool for medical science?" -Bui...
ROBOTICS ESSAYS ANSWERS BY KANTE- IRVIN MAKUWAZA.docx
Natural Language Understanding with Machine Learned Annotators and Deep Learn...
KCI_NLP_OHSUResearchWeek2016-NLPatOHSU-final
How to Apply NLP to Analyze Clinical Trials
HXR 2016: Data Insights: Mining, Modeling, and Visualizations- Niraj Katwala
Ad

Recently uploaded (20)

PPTX
Acceptance and paychological effects of mandatory extra coach I classes.pptx
PPTX
Leprosy and NLEP programme community medicine
PPT
lectureusjsjdhdsjjshdshshddhdhddhhd1.ppt
DOCX
Factor Analysis Word Document Presentation
PDF
Optimise Shopper Experiences with a Strong Data Estate.pdf
PPTX
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
PDF
Systems Analysis and Design, 12th Edition by Scott Tilley Test Bank.pdf
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PDF
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PDF
REAL ILLUMINATI AGENT IN KAMPALA UGANDA CALL ON+256765750853/0705037305
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
Database Infoormation System (DBIS).pptx
PPTX
importance of Data-Visualization-in-Data-Science. for mba studnts
PPTX
sac 451hinhgsgshssjsjsjheegdggeegegdggddgeg.pptx
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PDF
[EN] Industrial Machine Downtime Prediction
PPTX
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
PPTX
Managing Community Partner Relationships
PPTX
Copy of 16 Timeline & Flowchart Templates – HubSpot.pptx
Acceptance and paychological effects of mandatory extra coach I classes.pptx
Leprosy and NLEP programme community medicine
lectureusjsjdhdsjjshdshshddhdhddhhd1.ppt
Factor Analysis Word Document Presentation
Optimise Shopper Experiences with a Strong Data Estate.pdf
Microsoft-Fabric-Unifying-Analytics-for-the-Modern-Enterprise Solution.pptx
Systems Analysis and Design, 12th Edition by Scott Tilley Test Bank.pdf
Qualitative Qantitative and Mixed Methods.pptx
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
IBA_Chapter_11_Slides_Final_Accessible.pptx
REAL ILLUMINATI AGENT IN KAMPALA UGANDA CALL ON+256765750853/0705037305
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Database Infoormation System (DBIS).pptx
importance of Data-Visualization-in-Data-Science. for mba studnts
sac 451hinhgsgshssjsjsjheegdggeegegdggddgeg.pptx
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
[EN] Industrial Machine Downtime Prediction
01_intro xxxxxxxxxxfffffffffffaaaaaaaaaaafg
Managing Community Partner Relationships
Copy of 16 Timeline & Flowchart Templates – HubSpot.pptx

NLP tutorial at AIME 2020

  • 1. Tutorial 1: Methods and Applications of Natural Language Processing in Medicine Rui Zhang1, Hua Xu2, Yanshan Wang3, Yifan Peng4 1University of Minnesota, 2University of Texas Health, 3Mayo Clinic, 4Weill Cornell Medicine International Conference on Artificial Intelligence in Medicine (AIME 2020) August 25, 2020
  • 2. Purpose of this tutorial • Review NLP systems and tools in solving clinical problems and facilitating clinical research • Showcase our real-world NLP application in clinical practice and research across four institutions • Discuss opportunities and challenges of NLP in medicine
  • 4. Motivation for Clinical NLP 20% 80% Demographics, Lab results, Medication, Diagnosis… Clinical notes Patient provided information Family history Social history Radiology reports Pathology reports … Structured Data Unstructured Data
  • 6. Developing high-performance NLP solutions for healthcare applications Dr. Hua Xu is a Professor at University of Texas Health School of Biomedical Informatics and a fellow of the American College of Medical Informatics. His primary research interest is to develop NLP methods and systems and apply them to clinical research and operation. He has worked on different clinical NLP topics, such as entity recognition, relation extraction, syntactic parsing, word sense disambiguation, and active learning, with over 200 publications. He has built multiple clinical NLP systems including the medication information extraction tool MedEx and a recent comprehensive clinical NLP system CLAMP, using machine learning and deep learning methods. Those tools have been widely used in large clinical consortia such as OHDSI and CTSA. • NLP concepts and tasks • Issues affecting NLP performance • Tools to facilitate NLP development • Applications to healthcare https://sbmi.uth.edu/faculty-and-staff/hua-xu.htm
  • 7. Transfer Learning of NLP in Medicine Dr. Yifan Peng is an assistant professor of population health sciences in the Division of Health Informatics at Weill Cornell Medicine. After receiving his Ph.D. in Computer Science from the University of Delaware in 2016, Dr. Peng worked as a research fellow at the National Center for Biotechnology Information at National Library of Medicine at NIH. Dr. Peng’s main research interests include biomedical and clinical natural language processing and medical image analysis (by courtesy). His current project focuses on applying information extracted through NLP and image analysis on radiological data classification. • Transfer learning • Pre-training of BERT model on large-scale clinical corpora • Fine-tuning the BERT model on specific tasks such as NER and RE • Multi-task learning http://vivo.med.cornell.edu/display/cwid-yip4002
  • 8. Digital Phenotyping for Cohort Discovery Dr. Yanshan Wang is an Assistant Professor at Mayo Clinic. His current work is centered on developing novel NLP and artificial intelligence (AI) methodologies for facilitating clinical research and solving real- world clinical problems. Dr. Wang has extensive collaborative research experience with physicians, epidemiology researchers, and statisticians. Dr. Wang has published over 40 peer-reviewed articles at referred computational linguistic conferences (e.g., NAACL), and medical informatics journals and conference (e.g., JBI, JAMIA, JMIR and AMIA). He has served on program committees for EMNLP, NAACL, IEEE-ICHI, IEEE-BIBM. • Cohort retrieval • Approaches for cohort retrieval • Case study • Patient cohort retrieval for clinical trials accrual https://www.mayo.edu/research/faculty/wang-yanshan-ph-d/bio-20199713.
  • 9. Advances of NLP in Clinical Research Dr. Rui Zhang is an McKnight Presidential Fellow and Associate Professor in the College of Pharmacy and the Institute for Health Informatics (IHI), and also graduate faculty in Data Science at the University of Minnesota (UMN). He is the Director of NLP Services in Clinical and Transnational Science Institution (CTSI) at the UMN. Dr. Zhang’s research focuses on health and biomedical informatics, especially biomedical NLP and text mining. His research interests include the secondly analysis of EHR data for patient care as well as pharmacovigilance knowledge discovery through mining biomedical literature. • Background of NLP to Support Clinical Research • NLP Systems and Tools for Clinical Research • Case study • NLP to Support Dietary Supplement Safety Research http://ruizhang.umn.edu
  • 10. Schedule Time Session Presenter 9:00 - 9:05 Introduction Rui Zhang 9:05 - 9:45 Developing high-performance NLP solutions for healthcare applications Hua Xu 9:45 - 10:25 Transfer Learning of NLP in Medicine: A case study with BERT Yifan Peng 10:25 – 10:30 Break 10:30 – 11:10 Digital Phenotyping for Cohort Discovery Yanshan Wang 11:10 – 11:50 Advances of NLP for Clinical Research Rui Zhang 11:50 – 12: 00 Q&A
  • 11. Building High-performance NLP Systems in Healthcare Hua Xu PhD School of Biomedical Informatics, University of Texas AIME NLP Tutorial 8/25/2020 Data Science Biomedcine NLP
  • 12. Disclosure § Founder and CEO: § Melax Technologies Inc. § Consultant: § Hebta LLC § More Health Inc. § DCHealth Technologies Inc. 2
  • 13. Outline 01 Overview & Challenges 02 Select right algorithms 03 Annotate good data 3 04 Bring human into the loop
  • 14. Part 1 01 Overview & Challenges 4
  • 15. NLP Tasks – Let’s focus on Information Extraction Information Retrieval Information extraction Document classification Question answering Language generation Wikipedia Website Social media Email Office files Computational techniques for analyzing and representing naturally occurring languages at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications. 5
  • 16. Applications of Biomedical IE Systems Application Clinical document Drug labels Clinical trial protocols Biomedical literature NLP Decision support Business Intelligence Clinical Research Surveillance 6
  • 17. Active Development of Biomedical IE Systems General Purpose Specific Purpose MedLEE MetaMap CLAMP cTAKES Smoking status PHI De-identification Social determinants …… 7 Bleeding events Cancer metastasis
  • 18. Challenges for End-user to Utilize Biomedical NLP § General clinical NLP systems exist, but their performance is often suboptimal for user-specific applications § Specific-purpose NLP systems often show good performance in a given task, but performance drop when transporting these NLP tools § The generalizability issue when users build or deploy NLP applications § From one type of document to another § From one organization to another § From one application to another 8
  • 19. An Example of Smoking Status Detection § Mayo Clinic cTAKES for smoking detection § Sentence-level mention detection and classification – machine learning (ML) § Document-level status classification – rules § Patient-level summarization – rules § Performance drop at deployment § I2b2 dataset, F-measure 85.5% (Savova et al. JAMIA 2008) § Vanderbilt dataset, F-measure 75% (Liu et al. AMIA 2012) § Steps to customize it to improve performance to 89% § Collect and annotate local data § Re-train models using specific algorithms § Specify rules by local physicians 9 Optimizing NLP performance could be time-consuming and costly …
  • 20. Components for Building High-performance NLP Systems 10 Rules Machine learning Deep learning Algorithm Conduct annotation Specify rules Curate Knowledgebases Human What to annotate Annotation Quality Annotation Cost Data Practical NLP for Biomedcine
  • 21. Part 2 02 Select right algorithms 11 Rules vs. Machine Learning vs. Deep Learning
  • 22. Rule-based approach to medication information extraction § Input: a clinical document, e.g., discharge summary § Output: all drug names with associated signature information such as dose, frequency, route… § Issues: § Misspellings and abbreviations § ibuprofen ("ibuprfen"), augmentin ("qugmentin"), insulin ("inuslin"), and ASA ( aspirin ) § Context of drug mentions § Allergy: pt is allergic to penicillin § Negation: never on warfarin § Lab tests: potassium level is normal vs. take potassium § Temporal status: was on warfarin 3 days before admission § Multiple signatures and multiple drugs in one sentence § Coumadin 2.5mg po dly except 5mg qTu,Th § start the patient on Lovenox for the duration of this pregnancy, followed by a transition to Coumadin postpartum, to be continued for likely long-term, possibly lifelong duration. 12
  • 23. Findings Prec Rec F-Score DrugName 95.0 91.5 93.2 Strength 98.8 90.5 94.5 Route 98.8 89.6 93.9 Frequency 98.9 93.2 96.0 Table 1. Evaluation on discharge summaries from Vanderbilt. • Semantic-based parsing (Drug names and signatures) • Maps to RxNorm concepts Semantic Tagger Parser Semantic Grammar Lexicon & Rules MedEx Clinical Text She is currently maintained on Prograf 3mg bid. Structured Output Drugname: Prograf Strength: 3mg Frequency: bid Pre-processing Xu et al. JAMIA 2010; 17:19-24 Findings Prec Rec F-Score DrugName 96.7 88.0 92.1 Strength 94.7 94.7 94.7 Route 96.0 87.0 91.3 Frequency 96.8 89.2 92.9 Table 2. Evaluation on clinic visit notes from Vanderbilt. 13 MedEx – a rule-based tool to identify drug information from free text
  • 24. Drug Name Lisinopril , Famotidine Strength 50mg , 500/50 Route by mouth , iv Frequency b.i.d. , every 2 days Form tablet , ointment Dose Amount take one tablet IntakeTime cc , at 10am Duration for 10 days Dispense Amount dispensed #30 Refill refills: 2 Necessity prn , as needed 14 Define Semantic Categories
  • 25. Entity Recognition § Lexicon lookup tagger § Drug names § Include RxNorm, UMLS, and manually collected drug terms § Exclude certain English terms ( sleep ) § Regular expression-based tagger § Frequency, such as q8hrs § Transformation/Disambiguation § Rule-based transformation/disambiguation of initial tags to final semantic tags 15
  • 26. Parsing §A Chart Parser in NLTK § Semantic grammar § Parse Tree à Structured output §A Regular Expression based Chunker DGMSIG <S> :=<DrugList> <DrugList> := <Drug>|<Drug><DrugList> <Drug> := <DGSSIG> | <DGMSIG> <DGSSIG> := <DGN> | <DGN> <SIG> <SIG> := <DOSE> | <FORM> | <RUT> …. DGN SIG SIG FreqStr FreqStr Prograf 3mg qam and 2mg qpm Figure 1. Simplified semantic grammar. 16
  • 27. Extend MedEx for the 2009 i2b2 Challenge Semantic Tagger Parser MedEx Clinical Notes I2b2 Output Sentence Splitter Section Identification Post-processing Spell Checker Findings Prec Rec F-Score DrugName 84.2 87.1 85.6 Dose 89.5 81.8 85.5 Route 91.8 85.8 88.7 Frequency 87.9 85.8 86.8 Reason 45.9 29.6 36.0 Duration 36.4 35.8 36.1 All 83.9 80.3 82.1 Table 3. Evaluation on 2009 i2b2 data set. Ranked 2nd out of 20 participating teams. Doan et al. JAMIA 2010; 17: 528- 31 17
  • 28. § The 2010 i2b2 Challenge: recognize problem, treatment and test § Convert it into a machine learning task § Optimize the ML models § ML algorithms: CRFs, SSVMs § Features: words, sections, dictionary, representations § Entity tag sets: BIO, BIESO 18 Machine Learning for clinical entity recognition She was given 1 unit of packed red blood cell . O O O O O O B I I I O “Plavix was not recommended, given her recent GI bleeding.” Jiang et al. JAMIA 2011
  • 29. Results 19 Tags Features SSVMs - F(R/P) CRFs - F(R/P) BIO Baseline 84.51 (82.61/86.49) 84.02 (81.32/86.90) Optimized 85.22 (84.05/86.43) 85.16 (82.94/87.50) BIESO Baseline 84.71 (82.53/87.02) 84.22 (81.40/87.23) Optimized 85.82 (84.31/87.38) 85.59 (83.16/88.16) Tang et al. BMC Medical Informatics and Decision Making 2013
  • 30. Embedding Methods Traditional - word2vec - GloVe - fastText Contextual - ELMo - BERTBASE - BERTLARGE - BioBERT Open-domain - Off-the-shelf (General) Clinical domain - Pre-trained on clinical notes from MIMIC-III starting from open-domain checkpoint Entity Recognition Tasks - i2b2 2010 - i2b2 2012 - SemEval 2014 Task 7 - SemEval 2014 Task 14 Pre-training Evaluation Si Y et al. Enhancing clinical concept extraction with contextual embeddings. JAMIA. 2020 Contextual embeddings for deep learning-based NER
  • 31. Algorithm Comparison on Benchmark Dataset 21 Algorithms Feature F1 CRFs (Jiang et al., 2010) (#2 in challenge) Bag of words 77.33 Optimized features 83.60 Semi-Markov (deBruijn B, et al., 2010) (#1 in challenge) Optimized features + Brown clustering 85.23 SSVMs (Tang et al., 2014) Optimized features + Brown clustering + Random indexing 85.82 CNN (Wu et al., 2015) Word embedding 82.77 Bi-LSTM-CRF (Wu et al., 2017) Word embedding 85.91 BERT (Si et al., 2020) Pre-trained language model - BERT, fine tuned on clinical text 90.25 § Task: 2010 i2b2 challenge – entity recognition for problem, treatment, and test in discharge summaries
  • 32. Additional thoughts on deep learning approaches § Parameter optimization § Computation resources (e.g., GPU) § Prediction speed § CRF-based NER – 1 second per discharge summary § BERT-based NER – 20 second per discharge summary § Reliability and explainability
• 33. A Review of Deep Learning in Clinical NLP
Wu S et al. Deep learning in clinical natural language processing: a methodical review. JAMIA 2019
• 34. NLP Tasks and Applications
§ Word sequence labeling – sub-tasks: POS tagging, language models, named entity recognition, relation extraction / semantic annotation (semantic role labeling, event detection, FrameNet) – application: information extraction
§ Sequence to sequence (encoders and decoders) – applications: machine translation, summarization
§ Text classification/clustering – sub-tasks: document classification, sentence classification, sentiment analysis, topic models – applications: email spam filtering, product sentiment
§ Information retrieval – sub-tasks: query expansion, indexing, relevance ranking – application: search engines
§ Dialog systems – sub-tasks: speech recognition, natural language generation – application: chat bots
  • 35. Summary about algorithm selection § The simplest approach that can achieve good performance is the best § Take available resources into consideration § Computation resources § Both labeled and unlabeled data § Expertise in machine learning/deep learning § Keep deployment in mind § Technical architecture and infrastructure § Fitting into your workflow § Other requirements such as speed, robustness etc.
• 36. Part 3: Annotate Good Data (Availability, Quality, and Sample Size)
• 37. Data Availability
§ Large unlabeled data is useful, especially for deep learning-based approaches
§ High-quality annotated data is the key to machine learning/deep learning-based approaches
§ Be aware of the privacy issues of biomedical textual data - de-identification programs that can remove protected health information (e.g., names, addresses, dates) are available
§ De-identification is itself an NER task – many rule-based, ML-based, and hybrid approaches
§ Performance varies (some as high as 95%)
§ Examples: MIST, De-ID, ...
• 38. What about synthetic text?
§ Generating synthetic notes
§ Task – generate the HPI section
§ Data – 826 clinical notes
§ Methods – SeqGAN, GPT-2, and CTRL
Example generated note (note the implausible content): "This is a 39 year-old female with a history of diabetes mellitus, coronary artery disease, who presents with shortness of breath and cough. She has no relief from antacids or antiinflammatories. She is admitted now with increasing radiation damage to her home and extensive medical bills. She denies any pleural chest pain."
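A minimal sketch of note generation with an off-the-shelf GPT-2 via Hugging Face Transformers; the cited study fine-tuned on 826 real HPI sections first, which this sketch omits, and the prompt text is illustrative.

    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    prompt = ("This is a 39 year-old female with a history of "
              "diabetes mellitus who presents with")
    out = generator(prompt, max_new_tokens=40, do_sample=True,
                    num_return_sequences=1)
    print(out[0]["generated_text"])   # expect fluent but clinically unreliable text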
• 39. Annotation Quality Matters
§ Task: 2014 i2b2 challenge – extracting 36 risk factors, a document classification task
§ Dataset: 790 training and 514 test notes with document labels and evidence spans highlighted
§ The top-ranked system: traditional SVM classifiers
§ Re-annotated the corpus to: fix inconsistent boundaries; identify negative mentions
Roberts K, Shooshan SE, Rodriguez L, Abhyankar S, Kilicoglu H, Demner-Fushman D. The role of fine-grained annotations in supervised recognition of risk factors for heart disease from EHRs. J Biomed Inform. 2015;58:S111-S119.
• 40. Requirements for annotation
§ Annotation guideline
§ Clear definitions of entities and relations (an information model)
§ Appropriate granularity to benefit your application
§ Consistent and robust for representing information
§ High-quality annotation (e.g., consistent)
§ Annotator knowledge
§ Sufficient training
§ Adequate sample size
• 41. Annotation Workflow
Data collection → Pre-annotation → Guideline development → Training & annotation → Model training → Quality control → ML models
  • 42. Guideline development – content § Goals of the annotation § Definitions of entities, relations, etc. § Information model § Granularity § Detailed annotation rules (different scenarios) § Human vs. computer’s thinking § Provide many examples § Positive examples § Negative examples 32
  • 43. Guideline development - workflow § Iterative process § Involvement of both domain experts and linguists/informaticians 33
• 45. Annotator selection and training
§ Annotator selection
§ Background: domain experts / linguists / lay persons
§ Sources: physicians/nurses, residents, students, or crowdsourcing (e.g., Amazon Mechanical Turk)
§ Annotator training
§ Iterative training until an expected performance is achieved
§ Quality checking during the annotation
§ Multi-annotator management
§ Train each annotator to ensure consistent annotations before starting
§ If resources allow, each sample can be double-annotated by two people, with a third, more experienced annotator adjudicating discrepancies
§ Otherwise, assign a small portion of data to both annotators so that inter-annotator agreement (IAA) can be calculated
• 46. Annotation tools
§ BRAT
§ MAE
§ eHOST
§ Prodigy
§ LightTag
§ CLAMP
§ ...
• 47. Quality checking
§ Inter-annotator agreement
§ Precision/recall/F-measure
§ Cohen's kappa, Fleiss' kappa (https://en.wikipedia.org/wiki/Cohen%27s_kappa)
§ Self-train and self-test
§ On one dataset, build the model and then predict on the same dataset again
§ Performance should be high; otherwise it indicates issues with the annotation
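A toy sketch of computing inter-annotator agreement with Cohen's kappa using scikit-learn; the two label lists are fabricated for illustration (1 = entity, 0 = not an entity, over the same 8 mentions).

    from sklearn.metrics import cohen_kappa_score

    annotator_a = [1, 0, 1, 1, 0, 0, 1, 0]
    annotator_b = [1, 0, 1, 0, 0, 0, 1, 1]
    # kappa corrects raw agreement (6/8 here) for chance agreement
    print(cohen_kappa_score(annotator_a, annotator_b))   # 0.5, moderate agreement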
• 48. Sample Size
§ How many samples are needed for the required performance of the specific task? – No definite answer
§ Many studies report results on several hundred documents
§ Sample size can be estimated based on a power calculation
§ More precisely, we can plot a learning curve
[Figure: learning curve – F-measure (0.25 to 0.85) vs. number of sentences in the training set (8 to 8,192), uncertainty-based sampling]
• 49. Challenges and potential solutions
§ Annotation cost/time
§ Requires a reasonably sized annotated corpus
§ Annotation by experts (e.g., physicians) is expensive
§ Technologies to save annotation time
§ Weak supervision – get lower-quality labels more efficiently
§ Transfer learning – leverage labeled data/models from a different domain/task
§ Active learning – label informative samples to build better models
• 50. Summary
§ Data and annotation play important roles in machine learning/deep learning-based NLP systems
§ A good annotated corpus that leads to high-performance ML models should include:
§ An annotation guideline designed for the task
§ Knowledgeable and well-trained annotators
§ Enough annotated samples
§ Annotation can be costly and time-consuming
• 51. Part 4: Bring Human into the Loop (Human annotation, Rule augmentation, Biomedical knowledge bases)
• 52. Rule Augmentation is Effective
Task: 2018 n2c2 Drug-ADE challenge (relation F1, with and without rule-based post-processing)

Relation            SVM     +post    CNN-RNN  +post    biLSTM-CRF  +post
Strength -> Drug    0.9704  0.9792   0.9760   0.9853   0.9865      0.9916
Dosage -> Drug      0.9637  0.9798   0.9642   0.9818   0.9720      0.9860
Duration -> Drug    0.84    0.8947   0.8519   0.9125   0.8829      0.9292
Frequency -> Drug   0.9525  0.9735   0.9592   0.9810   0.9692      0.9873
Form -> Drug        0.9728  0.9867   0.9713   0.9864   0.9765      0.9890
Route -> Drug       0.9581  0.9742   0.9668   0.9805   0.9736      0.9858
Reason -> Drug      0.7328  0.8364   0.7464   0.8466   0.7579      0.8488
ADE -> Drug         0.7604  0.8221   0.7528   0.8112   0.7946      0.8502
Overall             0.9256  0.9521   0.9304   0.9574   0.9399      0.9630

Wei Q, et al. A study of deep learning approaches for medication and adverse drug event extraction from clinical text. JAMIA 2020;27(1):13-21.
• 53. Active Learning to Reduce Annotation Cost
§ Goal: minimize annotation cost while maximizing the quality of the ML-based model
[Figure: the active learning loop – the machine learner selects the most informative samples from a pool of unlabeled data, a human annotator labels them, and the labeled data retrains the learner; passive learning, by contrast, selects samples randomly]
• 54. Querying Algorithms
§ Uncertainty-based querying
§ Clustering and uncertainty sampling engine (CAUSE): query the most uncertain and representative sentences
Example (number of queries = 2): given clusters c1, c2, c3 with uncertainty scores Score(c1) = 0.6, Score(c2) = 0.4, Score(c3) = 0.1, the steps are (1) cluster scoring and (2) representative sampling; the outputs are samples a and c.
Chen Y et al. An active learning-enabled annotation system for clinical named entity recognition. BMC Med Inform Decis Mak. 2017
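A minimal sketch of the uncertainty half of such querying (least confidence); CAUSE additionally clusters the pool and samples representatives, which is omitted here. The `pool_probs` array stands in for any probabilistic model's predictions and is randomly fabricated.

    import numpy as np

    def least_confidence(probs):
        # probs: (n_samples, n_classes) predicted probabilities over the pool
        return 1.0 - probs.max(axis=1)     # high score = model is unsure

    rng = np.random.default_rng(0)
    pool_probs = rng.dirichlet(np.ones(3), size=10)   # fake model outputs
    scores = least_confidence(pool_probs)
    query_idx = np.argsort(scores)[::-1][:2]          # 2 most informative samples
    print("query these pool items next:", query_idx)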
• 55. Simulation Study
[Figure: learning curves – F-measure (0.25 to 0.85) vs. number of sentences in the training set (8 to 8,192) for uncertainty, diversity, length, and random querying]

Annotation cost to reach a model with 0.80 F-measure:
            Random sampling   Uncertainty sampling   Reduction
Sentences   8,702             2,971                  66%
• 57. Active Learning Workflow
[Flowchart: load the data pool and start; the Learning component encodes a CRF model and ranks unlabeled sentences with the querying algorithm; the Annotation interface presents the top-ranked unlabeled sentence to the user, whose annotation moves it to the labeled set; the loop repeats until the user quits or time runs out]
• 58. Real-Time User Study on Active vs. Passive Learning
• 59. Cost-aware Active Learning
Feature categories:
§ Basic: number of words (NOW), number of entities (NOE), number of entity words (NOEW)
§ Syntactic: entropy of POS tags (EOP)
§ Semantic: TF-IDF

Example sentence: "MRI by report showed bilateral rotator cuff repairs and he was admitted for repair of the left rotator cuff."
NOW = 20, NOE = 3, NOEW = 11, TFIDF = 35.36, EOP = 2.28

A linear regression model estimates annotation time from the basic, semantic, and syntactic complexity features of the sentence:
Cost(s) = w0 + Σ_i w_i · x_i(s)
Samples are then ranked by utility per cost: UPC(s) = Utility(s) / Cost(s)
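A toy sketch of the cost-aware idea: regress annotation time on the sentence features above, then rank pool samples by utility per predicted cost. All numbers (feature values, seconds, utilities) are fabricated for illustration.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Columns: NOW, NOE, NOEW, TFIDF, EOP
    X = np.array([[20, 3, 11, 35.4, 2.3],
                  [ 8, 1,  2, 10.1, 1.4],
                  [30, 5, 14, 50.2, 2.9]])
    seconds = np.array([42.0, 15.0, 66.0])        # observed annotation times
    cost_model = LinearRegression().fit(X, seconds)

    utility = np.array([0.9, 0.4, 0.8])           # e.g., model uncertainty
    upc = utility / cost_model.predict(X)         # UPC(s) = Utility(s) / Cost(s)
    print(upc.argsort()[::-1])                    # annotate highest-UPC sentences first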
• 60. A Larger User Study
8 out of 9 users showed better performance with active learning than with random sampling; AL saves 20-30% of annotation time.
Wei Q et al. Cost-aware active learning for named entity recognition in clinical text. JAMIA 2019
• 61. Human Annotation Process is Complicated
§ Annotation speed vs. quality
[Figure: per-user statistics of annotations – speed (words/minute, 0-100) and quality (F1 score, 68-88) across 9 users]
§ Syntactic structure impact (regression coefficients per user; * = significant):

       user1    user2   user3   user4   user5    user6    user7   user8   user9
DD     -0.038*  -0.036  -0.002  -0.036  -0.088*  -0.107*  -0.273  0.001   -0.13*
EOP    0.338*   0.245   1.046*  -0.459  0.695*   -1.066*  -0.459  -0.349  0.95*
NOP    0.243    0.705   -0.372  -0.609  0.431*   0.583*   0.478   -0.147  -0.297
ISC    -0.083   -0.397  -0.38   -0.196  0.421*   -0.737*  -0.452  -0.49*  -0.261
NOV    0.402*   0.634*  -0.35*  0.201   -0.22    0.139    1.175*  0.514*  1.17*
DOP    -0.234   -0.903  0.58*   0.592   -0.756*  1.213*   -0.885  0.716*  1.25*

DD – dependency distance; EOP – entropy of POS tags; NOP – number of phrase nodes; ...
Wei Q, et al. AMIA 2018
• 62. Mapping to Standard Clinical Terminologies is Important
§ Encoding (entity linking) – find the corresponding concept ID in a terminology for a given term/entity
§ Example:
§ Entity: "right below-knee amputation"
§ Candidates:
• 1: C2202463 amput below knee leg right
• 2: C0002692 amput below knee
• 3: C0002692 amput below bka knee
• ...
§ Challenges: lexical variation, polysemy, granularity differences
• 63. Entity Linking Framework – Map to UMLS
Pipeline: NE term → query expansion (LVG / abbreviations / synonyms / adjective-to-noun, ...) → query over UMLS concepts (via a UMLS index builder and UMLS index) → ranking by similarity scores (learning to rank) → post-processing, adjust CUI offsets
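A minimal sketch of the candidate-ranking step: character n-gram TF-IDF similarity between a mention and candidate concept names. Real systems in the next slide use BM25 retrieval plus learning-to-rank or BERT re-ranking; the CUIs below come from the slide's example, and the ranking method here is a stand-in.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    candidates = {
        "C2202463": "amputation below knee leg right",
        "C0002692": "amputation below knee",
    }
    mention = "right below - knee amputation"

    # Character n-grams are robust to word order and lexical variation
    vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
    matrix = vec.fit_transform([mention] + list(candidates.values()))
    sims = cosine_similarity(matrix[0], matrix[1:]).ravel()
    for (cui, name), s in sorted(zip(candidates.items(), sims), key=lambda x: -x[1]):
        print(cui, name, round(s, 3))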
• 64. Encoding Algorithms and Performance on Benchmark Data

Task                          Dataset                        Method                                                              Accuracy
SNOMED-CT, clinical text      2013 ShARe/CLEF, 2014 SemEval  BM25 + domain knowledge + RankSVM (#1 in challenge) (Zhang, 2014)   0.873
                                                             BM25 + domain knowledge + CNN (Tang, 2017)                          0.903
                                                             BM25 + BERT (Ji, 2019)                                              0.911
MedDRA, drug labels           2018 TAC ADR                   BM25 + translation model + RankSVM (#1 in challenge) (Xu, 2018)     0.911
                                                             BM25 + BERT (Ji, 2019)                                              0.932
MeSH, biomedical literature   NCBI                           BM25 + domain knowledge + CNN (Tang, 2017)                          0.861
                                                             BM25 + BERT (Ji, 2019)                                              0.891
  • 65. Summary § Rules are still important when optimizing performance of biomedical NLP systems – hybrid approaches often achieve best performance § Interacting human with data/algorithm is one way to improve model performance while reducing annotation cost § Biomedical ontologies and other knowledge bases are valuable for many NLP applications
  • 66. Integrate All to Better Support End-Users The CLAMP system
  • 67. CLAMP - Clinical Language Annotation, Modeling, and Processing
• 68. CLAMP Algorithms: Track Record in Clinical NLP Challenges

Task                  Challenge                                                 Ranking
Named entity          2009 i2b2, medication information extraction              #2
recognition           2010 i2b2, problem/treatment/test extraction              #2
                      2013 ShARe/CLEF, abbreviation recognition                 #1
                      2016 CEGS N-GRID, de-identification                       #2
UMLS encoding         2014 SemEval, disorder encoding                           #1
Relation extraction   2012 i2b2, temporal information extraction                #1
                      2015 SemEval, disease-modifier extraction                 #1
                      2015 BioCreative, chemical-induced disease (literature)   #1
                      2016 SemEval, temporal information extraction             #1
                      2017 TAC, ADR extraction from drug labels                 #1
                      2018 n2c2, medication and associated ADRs                 #1

The DeepMed framework: CRFs, SSVMs, Bi-LSTM-CRF, BERT/BioBERT, ..., AutoML, Docker container
  • 70. CLAMP Rule Interface for Human
• 71. CLAMP Users
Available as:
• CLAMP-CMD
• CLAMP-GUI
• CLAMP-EE
• 72. When developing biomedical NLP applications, please
§ Identify the right NLP tasks for your projects
§ Assemble a development team with the required expertise (domain experts, business owners, informaticians, developers, ...)
§ Collect and annotate data following a standard protocol (guideline development, annotator training/agreement checks, annotation quality control, ...)
§ Select appropriate algorithms (accuracy, speed, implementation, ...) and carefully evaluate their performance/usability/interoperability
§ Keep humans (multidisciplinary) in the loop throughout the life cycle of development
  • 73. 1 Transfer Learning of NLP in Medicine: A Case Study with BERT Yifan Peng Department of Population Health Sciences
• 74. Transfer learning
u A technique that reuses a model already trained on one dataset by adapting it to a different dataset
u In the field of computer vision, researchers have repeatedly shown the value of transfer learning
u Two steps:
u Pre-training: use a large training set to learn network parameters and save them for later use
u Fine-tuning: train all (or part) of the layers of the pretrained network on the target dataset
  • 75. 3 u Step 1: Pre-train the CNN on ImageNet (14 million images) Example: Detect lung diseases from chest X-ray
  • 76. 4 u Step 2: Fine-tune the model on NIH Chest X-ray (100,000 chest X-ray images) Example: Detect lung diseases from chest X-ray Wang et al., ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. CVPR. 2017.
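A minimal sketch of this two-step recipe in PyTorch/torchvision: load ImageNet-pretrained weights, replace the classifier head, and fine-tune. Dataset loading is omitted; `num_findings = 14` matches a ChestX-ray14-style label set and the training loop is indicated in comments only.

    import torch
    import torchvision

    # Step 1: start from a network pre-trained on ImageNet
    model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    num_findings = 14
    model.fc = torch.nn.Linear(model.fc.in_features, num_findings)  # new head

    # Step 2: fine-tune all (or part of the) layers on the target dataset
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.BCEWithLogitsLoss()   # multi-label chest findings
    # for images, labels in chest_xray_loader:   # assumed DataLoader
    #     loss = criterion(model(images), labels)
    #     optimizer.zero_grad(); loss.backward(); optimizer.step()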
• 77. Transfer learning
u Makes it less difficult to train a complex network
u Speeds up the convergence of training
u How can transfer learning benefit NLP in medicine?
• 78. Outlines
u Word embedding
u ELMo
u BERT
u How to use pre-trained BERT
u Performance comparison of BERT in medicine
u Multi-task learning
• How were BERT's ideas gradually formed?
• What has been innovated?
• Why does it work well?
• 79. How do we represent the meaning of a word?
u There are an estimated 13 million words in the English language
u They are not completely unrelated: hotel vs. motel vs. dog
u We want to encode each word into some representation that the machine can understand
  • 80. 8 u Encode similarity in the vectors themselves u Some N-dimensional space (e.g., 200D) that is sufficient to encode all semantics of our language. u Each dimension would encode some meaning that we transfer using speech. u tense (past vs. present vs. future) u count (singular vs. plural) Word vector
• 81. Word Embeddings
u Word2vec, fastText, etc.
u BioWordVec: https://github.com/ncbi-nlp/BioWordVec

Sources                    Documents   Tokens
PubMed                     30M         4,000M
MIMIC-III clinical notes   2M          500M

Zhang et al., Improving biomedical word embeddings with subword information and MeSH. Scientific Data. 2019
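A minimal sketch of querying pre-trained vectors with gensim; the file name is a placeholder for whichever BioWordVec binary is downloaded from the GitHub link above, and the analogy query previews the "linear translations" slide below.

    from gensim.models import KeyedVectors

    # Placeholder path; use the actual BioWordVec release file
    wv = KeyedVectors.load_word2vec_format(
        "BioWordVec_PubMed_MIMICIII_d200.vec.bin", binary=True)

    print(wv.similarity("thirsty", "hunger"))          # cosine similarity
    print(wv.most_similar(positive=["aunt", "man"],
                          negative=["woman"])[:3])     # aunt - woman + man ≈ uncle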
• 82. Interesting semantic patterns emerge in the vectors

Word pair                        word2vec   BioWordVec
thalassemia, hemoglobinopathy    —          0.834
mycosis, histoplasmosis          0.353      0.706
thirsty, hunger                  0.252      0.629
influenza, pneumoniae            0.482      0.611
atherosclerosis, angina          0.503      0.589

Zhang et al., Improving biomedical word embeddings with subword information and MeSH. Scientific Data. 2019
  • 83. 11 Interesting syntactic patterns emerge in the vectors Rohde et al. An Improved Model of Semantic Similarity Based on Lexical Co-Occurrence. 2005
• 84. Linear translations
u Algebraic relations:
u vec("man") − vec("woman") + vec("aunt") ≈ vec("uncle")
u vec("man") − vec("woman") + vec("queen") ≈ vec("king")
Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig, "Linguistic Regularities in Continuous Space Word Representations", NAACL-HLT 2013
• 85. How to use Word Embeddings
[Figure: embedding layers feeding a Convolutional Neural Network and a Recurrent Neural Network]
Peng et al. Extracting chemical-protein relations with ensembles of SVM and deep learning models. Database. 2018.
• 86. Word Embeddings in DL
u Evaluation of word embeddings in the protein-protein interaction (PPI) extraction task (F1)

Data set   Word2vec   BioWordVec
AIMed      0.445      0.487
BioInfer   0.524      0.549
BioInfer   0.603      0.623
IEPA       0.484      0.511
HPRD50     0.679      0.713
• 87. Limitations of Word Embeddings
The polysemy problem:
u "I arrived at the bank after crossing the river"
u "The bank has a plan to branch through the country..."
Static word embeddings cannot distinguish the different senses of a polysemous word.
• 88. From Word Embedding to ELMo
u ELMo ("Embeddings from Language Models") adjusts the word embedding representation of a word according to the semantics of its context
Peters et al., Deep contextualized word representations. NAACL. 2018
• 89. ELMo
u A typical two-stage process
u The first stage uses a language model for pre-training
[Figure: a bidirectional language model predicts the target word from its left context and right context, e.g., "no evidence of infiltrate"]
• 90. ELMo
u A typical two-stage process
u The first stage uses a language model for pre-training
u The second stage extracts the embeddings of each layer
ELMo word representations are functions of the entire input sentence.
• 91. ELMo in medical NLP
u Evaluation of ELMo in named entity recognition and relation extraction tasks

Task                       Dataset      SOTA   ELMo
Named entity recognition   ShARe/CLEF   70.0   75.6
Relation extraction        DDI          72.9   78.9
Relation extraction        ChemProt     64.1   66.6
• 92. Limitations of ELMo
u Hard to capture long-distance information
u Computationally expensive
  • 93. 21 u Bidirectional Encoder Representations from Transformers From ELMo to BERT Devlin et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL. 2019
  • 95. 23 u A self-attention mechanism which directly models relationships between all words in a sentence. Why transformer? https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html
• 97. Why transformer?
u Computation runs in parallel across tokens: much faster and more space-efficient
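A minimal numpy sketch of scaled dot-product self-attention, the mechanism inside the Transformer: every token scores every other token at once, so the whole sentence is processed in parallel. All matrices are random and purely illustrative.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])        # token-to-token relevance
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True) # softmax over tokens
        return weights @ V                             # context-mixed representations

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))                        # 5 tokens, 8-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)         # (5, 8)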
• 98. BERT and BlueBERT
u Pre-training: PubMed abstracts and clinical notes
u Fine-tuning: sentence similarity, named entity recognition, relation extraction, etc.

Corpus             Words    Domain
PubMed abstracts   4,000M   Biomedical
MIMIC-III          500M     Clinical
• 99. Outlines
u Word embedding
u ELMo
u BERT
u How to use pre-trained BERT
u Performance of BERT in medicine (BLUE Benchmark)
u Multi-task learning
• How were BERT's ideas gradually formed?
• What has been innovated?
• Why does it work well?
• 100. How to use BERT - Sentence classification
Assign tags or categories to text according to its content, for example:
u Organizing millions of cancer-related references from PubMed into the Hallmarks of Cancer
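A minimal sketch of BERT sentence classification with Hugging Face Transformers: a fresh classification head on top of a pretrained encoder, to be fine-tuned on labeled sentences. The generic `bert-base-uncased` checkpoint and `num_labels=10` are placeholders; for medical text, swap in a clinical/biomedical checkpoint such as the BlueBERT weights linked below.

    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "bert-base-uncased"   # placeholder; use a clinical checkpoint in practice
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=10)

    batch = tok(["Sustained proliferative signaling drives tumour growth."],
                padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits   # one score per class (e.g., a Hallmark of Cancer)
    print(logits.shape)              # torch.Size([1, 10])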
  • 101. 29 Extract semantic relationships from a text How to use BERT - Relation extraction
• 102. How to use BERT - Sentence similarity
Predict similarity scores based on sentence pairs. For example:
u "The above was discussed with the patient, and she voiced understanding of the content and plan."
u "The patient verbalized understanding of the information and was satisfied with the plan of care."
  • 103. 31 u Locate and classify named entity mentions in text into pre-defined categories How to use BERT - Named entity recognition
  • 104. 32 u Pre-trained models u Fine-tuning codes u Preprocessed texts in PubMed https://github.com/ncbi-nlp/bluebert
• 105. Outlines
u Word embedding
u ELMo
u BERT
u How to use pre-trained BERT
u Performance of BERT in medicine (BLUE Benchmark)
u Multi-task learning
• How were BERT's ideas gradually formed?
• What has been innovated?
• Why does it work well?
• 106. BLUE Benchmark
u Significant advances in the development of pretrained language representations in the general domain: ELMo, BERT, Transformer-XL, XLNet
u The General Language Understanding Evaluation (GLUE) benchmark exists in the general domain, but there was no publicly available benchmark in the biomedical domain
The Biomedical Language Understanding Evaluation (BLUE) benchmark:
u Contains a diverse range of text genres (biomedical literature and clinical notes)
u Highlights common biomedical text-mining challenges
u Promotes development of language representations in the biomedical domain
• 107. BLUE benchmark
[Table: BLUE benchmark datasets] Some datasets are not publicly available, but permission to use them can be requested.
• 108. Results
[Table: results across the five task types – sentence similarity, named entity recognition, relation extraction, document classification, and inference]
• 110. Outlines
u Word embedding
u ELMo
u BERT
u How to use pre-trained BERT
u Performance of BERT in medicine (BLUE Benchmark)
u Multi-task learning
• How were BERT's ideas gradually formed?
• What has been innovated?
• Why does it work well?
• 111. Multi-task learning
u Multi-task learning (MTL) is a field of machine learning where multiple tasks are learned in parallel using a shared representation
u It increases the effective sample size for training the model, improving performance by increasing the generalization of the model
u This is particularly helpful in applications such as medical informatics, where (labeled) datasets are hard to collect
u It may also help when researchers face the hassle of choosing a suitable model for new problems where training resources are limited
• 113. Training procedure
u Pretraining
u BlueBERT: pretrained on PubMed and MIMIC-III
u BioBERT: pretrained on PubMed
u Refining via multi-task learning: refine all layers in the model
u Fine-tuning MT-BERT: continue training all layers on each specific task
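A minimal PyTorch sketch of the MT-BERT idea: one shared encoder and one small head per task, so batches from every task update the shared layers. To stay self-contained, `encoder` below is a tiny stand-in for BERT, and the task sizes and inputs are fabricated.

    import torch
    from torch import nn

    class MultiTaskModel(nn.Module):
        def __init__(self, hidden=128, task_sizes=(2, 3, 5)):
            super().__init__()
            # Shared representation (stand-in for a BERT encoder)
            self.encoder = nn.Sequential(nn.Linear(768, hidden), nn.ReLU())
            # One lightweight classification head per task
            self.heads = nn.ModuleList([nn.Linear(hidden, n) for n in task_sizes])

        def forward(self, x, task_id):
            return self.heads[task_id](self.encoder(x))

    model = MultiTaskModel()
    x = torch.randn(4, 768)                    # e.g., [CLS] vectors from BERT
    loss = nn.functional.cross_entropy(model(x, task_id=1),
                                       torch.randint(0, 3, (4,)))
    loss.backward()                            # gradients flow into the shared encoder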
  • 114. 42 Test results on clinical tasks • Fine-tuning BERT (4 models) • Refining via Multi-task learning (1 model) • Refine all layers in the model • Fine-tuning MT-BERT (4 models) • Continue training all layers on each specific task
  • 115. 43 Test results on biomedical tasks • Fine-tuning BERT (4 models) • Refining via Multi-task learning (1 model) • Refine all layers in the model • Fine-tuning MT-BERT (4 models) • Continue training all layers on each specific task
  • 116. 44 Test results on eight BLUE tasks
  • 117. 45 u Word embeddings à ELMo à BERT u Pre-trained BERT models u How to use BERT u Performance comparison and benchmark u Multi-task learning Summary
  • 118. 46 u https://github.com/ncbi-nlp/BioWordVec u https://github.com/ncbi-nlp/BioSentVec u https://github.com/ncbi-nlp/bluebert u https://github.com/ncbi-nlp/BLUE Resources
• 119. Acknowledgment
u BERT, ELMo, and MT-DNN
u Shared tasks and datasets: BIOSSES, MedSTS, BioCreative V chemical-disease relation task, ShARe/CLEF eHealth task, DDI extraction 2013 task, BioCreative VI ChemProt, i2b2 2010 shared task, Hallmarks of Cancer corpus
u This work was supported by the Intramural Research Programs of the National Library of Medicine, National Institutes of Health, and K99LM013001.
  • 121. AIME 2020 Digital Phenotyping for Cohort Discovery using Electronic Health Records Yanshan Wang Assistant Professor of Biomedical Informatics Division of Digital Sciences Research Mayo Clinic
• 122. AIME 2020 Why take this tutorial?
• Patient cohort retrieval is still labor-intensive today.
• Most information is embedded in unstructured EHRs.
• Natural language processing is under-utilized for cohort retrieval.
• 123. AIME 2020 Goal of this tutorial
• To gain an understanding of basic concepts of cohort retrieval in the clinical domain.
• To connect NLP theory with clinical knowledge.
• To get an introduction to clinical use cases of cohort retrieval.
  • 125. AIME 2020 Suggested reading • Papers • A review of approaches to identifying patient phenotype cohorts using electronic health records. Shivade et al. 2013. • Case-based reasoning using electronic health records efficiently identifies eligible patients for clinical trials. Miotto et al. 2015. • A survey of practices for the use of electronic health records to support research recruitment. Obeid et al. 2017. • Clinical information extraction applications: a literature review. Wang et al. 2018 • Using clinical natural language processing for health outcomes research: Overview and actionable suggestions for future advances. Velupillai et al. 2018. 5
  • 126. AIME 2020 Agenda • Basic Concepts • EHR, Phenotyping, Evidence-based Clinical Research, Knowledge Base, Common Data Model • Patient Cohort Discovery • Brief Introduction to NLP • NLP for Cohort Discovery
  • 127. AIME 2020 Basic Concepts • Electronic Health Record • Phenotyping • Evidence-based clinical research • Knowledge bases • Common Data Model
  • 128. AIME 2020 Basic Concepts • Electronic Health Record
• 129. AIME 2020 Basic Concepts
• Phenotyping
• The phenotype (as opposed to genotype, the set of genes in our DNA responsible for a particular trait) is the physical expression, or characteristics, of that trait.
• Phenotyping is the practice of developing algorithms designed to identify specific phenomic traits within an individual1.
• Digital phenotyping using EHRs
• Traditionally, clinical studies often use self-report questionnaires or clinical staff to obtain phenotypes from patients (slow, expensive, and does not scale).
• EHR data come in both structured and unstructured formats, and the use of both types of information can be essential for creating accurate phenotypes2.
1. eMERGE network. 2. Wei, W. Q., & Denny, J. C. (2015). Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome medicine, 7(1), 41.
• 130. AIME 2020 Basic Concepts
• Phenotyping: the practice of developing algorithms designed to identify specific phenomic traits within an individual1.
• Digital phenotyping using EHRs: EHR data come in both structured and unstructured formats, and the use of both types of information can be essential for creating accurate phenotypes2 (NLP handles the unstructured part).
Source: Wei, W. Q., & Denny, J. C. (2015). Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome medicine, 7(1), 41.
  • 131. AIME 2020 Evidence-based clinical research • Observational studies • Types of studies in epidemiology, such as the cohort study and the case-control study. • The investigators retrospectively assess associations between the treatments given to participants and their health status. • Randomized control trials • Clinical trials are prospective biomedical or behavioral research studies on human participants that are designed to answer specific questions about biomedical or behavioral interventions including new treatments, such as novel vaccines, drugs, and medical devices.
  • 132. AIME 2020 Basic Concepts • Cohort/Eligibility Criteria • Inclusion criteria • Exclusion criteria
  • 133. AIME 2020 Basic Concepts • Cohort/Eligibility Criteria • Inclusion criteria • Exclusion criteria https://clinicaltrials.gov/ct2/show/NCT03690193?cond=alzheimer%27s+disease&rank=5 clinicaltrials.gov
• 138. AIME 2020 Basic Concepts
• Knowledge Bases
• UMLS (Unified Medical Language System), including the Metathesaurus, Semantic Network, and SPECIALIST Lexicon
• Used as a knowledge base and as a resource for a lexicon. The Metathesaurus provides the medical concept identifiers; the Semantic Network specifies the semantic categories for the medical concepts.
• SNOMED-CT
• Standardized vocabulary of clinical terminology.
• LOINC
• Standardized vocabulary for identifying health measurements, observations, and documents.
• MeSH
• NLM controlled vocabulary thesaurus used for indexing PubMed articles.
• MedDRA
• Terminology specific to adverse events.
• RxNorm
• Terminology specific to medications.
  • 139. AIME 2020 Basic Concepts • Common Data Model • Common Data Model (CDM) is a specification that describes how data from multiple sources (e.g., multiple EHR systems) can be combined. Many CDMs use a relational database. • Observational Medical Outcomes Partnership (OMOP) CDM by Observational Health Data Sciences and Informatics (OHDSI)
  • 140. AIME 2020 OMOP CDM v. 5.0 Source: https://www.ohdsi.org/data-standardization/the-common-data-model/
  • 141. AIME 2020 Why Natural Language Processing (NLP)?
• 142. AIME 2020 Facts
• Artificial Intelligence (AI) is one of the most interesting fields of research today.
• The growth of and interest in AI is due to recent advances in deep learning.
• Language is the most compelling manifestation of intelligence.
• 145. AIME 2020 Natural Language Processing
• What is NLP?
• "Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data." (Wikipedia)
  • 147. AIME 2020 Question Answering 27 Source: https://www.youtube.com/watch?v=BkpAro4zIwU IBM Watson Voice Assistant
  • 149. AIME 2020 Information Extraction 29 The patient’s maternal grandmother was diagnosed with breast cancer at age 59 and passed away at age 80.
• 150. AIME 2020 Information Extraction
The patient's maternal grandmother was diagnosed with breast cancer at age 59 and passed away at age 80.
Entity normalization: The patient's FAMILY_MEMBER was diagnosed with CONDITION at age AGE and LIVING_STATUS at age AGE.
Dependency parser output: Family Member = maternal grandmother; Condition = breast cancer; Age = 59; Living Status = deceased; Age = 80
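A minimal regex sketch of the slide's template over this one sentence; real systems combine entity normalization with a dependency parser rather than a single hand-written pattern, and the group names below are illustrative.

    import re

    text = ("The patient's maternal grandmother was diagnosed with breast cancer "
            "at age 59 and passed away at age 80.")
    pattern = (r"patient's (?P<member>[\w ]+?) was diagnosed with "
               r"(?P<condition>[\w ]+?) at age (?P<age>\d+)"
               r"(?: and (?P<status>passed away|is alive) at age (?P<age2>\d+))?")
    m = re.search(pattern, text)
    print(m.groupdict())
    # {'member': 'maternal grandmother', 'condition': 'breast cancer',
    #  'age': '59', 'status': 'passed away', 'age2': '80'}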
  • 151. AIME 2020 Sentiment Analysis 31 ■ nice and compact to carry! ■ since the camera is small and light, I won't need to carry around those heavy, bulky professional cameras either! ■ the camera feels flimsy, is plastic and very light in weight you have to be very delicate in the handling of this camera Reviews: Attributes: zoom affordability size and weight flash ease of use ✓ ✗ ✓ Source: https://web.stanford.edu/~jurafsky/NLPCourseraSlides.html
• 155. AIME 2020 How to represent Natural Language
• Natural language text = sequences of discrete symbols (e.g., words).
• Vector representations of words: Vector Space Model
• Bag-of-words: each word is a one-hot vector over the vocabulary list (here, "I love NLP and like dogs"):

I     1 0 0 0 0 0
love  0 1 0 0 0 0
NLP   0 0 1 0 0 0
like  0 0 0 0 1 0
• 156. AIME 2020 How to represent Natural Language
• Drawbacks of this sparse representation:
• love = [0,1,0,0,0,0] and like = [0,0,0,0,1,0]: their dot product is 0!
• Using such a representation, there is no meaningful (semantic) comparison we can make between words.
• 157. AIME 2020 How to represent Natural Language
• With AI models, we learn the "meaning" of a word using a dense semantic representation (word embeddings)
• Learning semantic representations from data (a corpus)
• Simply by examining a large corpus, it is possible to learn word vectors that capture the semantics of and relationships between words in a surprisingly expressive way.

Dense semantic representation (values illustrative):
I    = [ 0.99 0.05 0.10 0.87 0.10 0.10 ]
love = [ 0.10 0.85 0.99 0.10 0.83 0.09 ]
NLP  = [ 0.67 0.23 0.01 0.02 0.01 0.81 ]
like = [ 0.10 0.73 0.99 0.05 1.79 0.09 ]
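A toy contrast between the sparse one-hot vectors two slides above and these dense vectors: cosine similarity is exactly 0 for any two distinct one-hot words, but high for semantically related dense vectors (the numbers are the illustrative values from the slides).

    import numpy as np

    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    love_onehot = np.array([0, 1, 0, 0, 0, 0])
    like_onehot = np.array([0, 0, 0, 0, 1, 0])
    print(cos(love_onehot, like_onehot))           # 0.0 - no notion of similarity

    love_dense = np.array([0.10, 0.85, 0.99, 0.10, 0.83, 0.09])
    like_dense = np.array([0.10, 0.73, 0.99, 0.05, 1.79, 0.09])
    print(round(cos(love_dense, like_dense), 2))   # ~0.92 - "love" is close to "like"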
  • 158. AIME 2020 NLP in AI is All About Learning A Better Representation of Language
• 159. AIME 2020 • Images are easy to represent, as RGB values
Source: https://analyticsindiamag.com/computer-vision-primer-how-ai-sees-an-image/
  • 160. AIME 2020 • Language is much harder… • “The weather’s looking gloomy today. I better wear my trusty rubber boots!” • “The weather’s looking gloomy today. I’m going to stay inside.”1 • Will, will Will will Will Will's will? – Will (a person), will (future tense helping verb) Will (a second person) will (bequeath) [to] Will (a third person) Will's (the second person) will (a document)? (Someone asked Will 1 directly if Will 2 plans to bequeath his own will, the document, to Will 3)2 Source: 1. Carlson L. Moral and Linguistic Perspectives on Pain and Suffering in Doctor-Patient Discourse. UMN Thesis. 2. Han, Bianca-Oana (2015). "On Language Peculiarities: when language evolves that much that speakers find it strange" (PDF). Philologia (18): 140. ISSN 1582-9960. Archived (PDF) from the original on 14 October 2015.
• 161. AIME 2020 Learning Better Representations
[Figure: "representation learning" – from language features to better word representations]
Source: https://www.datasciencecentral.com/profiles/blogs/overview-of-artificial-intelligence-and-role-of-natural-language
  • 162. AIME 2020 NLP for Patient Cohort Discovery
• 163. AIME 2020 Clinical Research Pathway
Research Question → Protocol Design → Feasibility → Identify Patients → Invite Patients → Pre-screening → Consent → Analysis → Report
• 165. AIME 2020 Clinical Trials Eligibility Screening and Recruitment
• Clinical trials recruitment
• Randomized clinical trials are fundamental to the advancement of medicine. However, patient recruitment for clinical trials remains the biggest barrier to clinical and translational research.
• 20% of cancer patients are eligible1, fewer than 5% participate1, and 85% of clinical trials fail to retain enough patients2.
1. Haddad TC, Helgeson J, Pomerleau K, Makey M, Lombardo P, Coverdill S, Urman A, Rammage M, Goetz MP, LaRusso N. Impact of a cognitive computing clinical trial matching system in an ambulatory oncology practice. American Society of Clinical Oncology; 2018.
2. Cote DN. Minimizing Trial Costs by Accelerating and Improving Enrollment and Retention. Global Clinical Trials for Alzheimer's Disease: Elsevier; 2014. p. 197-215.
• 166. AIME 2020
[Figure: number of trials between 2007 and 2010 by academic medical center – Mayo Clinic leads with 268, followed by Johns Hopkins University (190), Duke University (187), MD Anderson Cancer Center (177), Massachusetts General Hospital (170), and roughly two dozen other institutions ranging from 141 down to 68]
Source: Chen et al. Publication and reporting of clinical trial results: cross sectional analysis across academic medical centers. BMJ. 2016
• 167. AIME 2020 NLP for Clinical Trials Eligibility Screening
[Figure: all patients' EHRs plus clinical trial criteria → Natural Language Processing → eligible patients → recruit]
Expedite patient screening and increase patient recruitment rates.
• 169. AIME 2020 Example: Clinical trials eligibility screening for GERD
Identify a cohort of patients with and without chronic reflux using the definitions spelled out below. We wish to test people with and without chronic reflux, as our working hypothesis is that the prevalence of Barrett's esophagus is comparable between those with and without chronic reflux.
Inclusion criteria:
1. Age greater than 50 years.
2. Gastroesophageal reflux disease. This can be defined using ICD-9 or ICD-10 codes. Additional criteria which could be used to define GERD broadly are chronic (> 3 mo) use of a proton pump inhibitor (drug names include omeprazole, esomeprazole, pantoprazole, rabeprazole, dexlansoprazole, lansoprazole) or an H2 receptor blocker (ranitidine, famotidine, cimetidine). Prior endoscopic diagnosis of erosive esophagitis can also be used to make a diagnosis of GERD.
3. Male gender
4. Obesity defined as body mass index greater than or equal to 30. This is a surrogate marker for central obesity.
5. Current or previous history of smoking
6. Family history of esophageal adenocarcinoma/cancer or Barrett's esophagus
Exclusion criteria:
1. Previous history of esophageal adenocarcinoma/cancer or Barrett's esophagus, previous history of endoscopic ablation for Barrett's esophagus.
2. Previous history of esophageal squamous cancer or squamous dysplasia.
3. Treatment with oral anticoagulation including warfarin/Coumadin.
4. History of cirrhosis or esophageal varices
5. History of Barrett's esophagus: this can be defined with ICD-9/10 codes.
6. History of endoscopy (will need to use a procedure code for EGD) in the last 5 years.
• 170. AIME 2020 Mapping the criteria to structured codes and medications
Inclusion:
1. Age greater than 50 years.
2. Gastroesophageal reflux disease (any of 2.1, 2.2, 2.3)
  2.1 GERD defined by diagnosis: ICD-9 530.81 / ICD-10 K21.9
  2.2 GERD defined by drug, duration of use >= 3 months over the last 5 years: omeprazole, esomeprazole, pantoprazole, rabeprazole, dexlansoprazole, lansoprazole, ranitidine, famotidine, cimetidine
  2.3 GERD defined by prior endoscopic diagnosis of erosive esophagitis: ICD-9 530.19 / ICD-10 K21.0 (no specific code exists for erosive esophagitis)
3. Male gender
4. Obesity defined as body mass index >= 30.
5. Current or previous history of smoking
6. Family history of esophageal adenocarcinoma/cancer or Barrett's esophagus
7. Caucasian
Exclusion:
1. Previous history of esophageal adenocarcinoma/cancer: ICD-9 150.9 / ICD-10 C15.9
2. Previous history of endoscopic ablation for Barrett's esophagus: CPT 43229, 43270, 43228, 43258
3. Previous history of esophageal squamous carcinoma (included in 1): ICD-9 150.9 / ICD-10 C15.9
4. Previous history of esophageal squamous dysplasia: ICD-9 622.10 / ICD-10 N87.9
5. Current treatment with oral anticoagulation - warfarin
6. Current treatment with oral anticoagulation - Coumadin (included in 5)
7. History of cirrhosis: ICD-9 571.5 / ICD-10 K74.60
8. History of esophageal varices: ICD-9 456.20 / ICD-10 I85.00
9. History of Barrett's esophagus: ICD-9 530.85 / ICD-10 K22.7, K22.710, K22.711, K22.719
10. History of endoscopy in the last 5 years: CPT 43235-43270
• 171. AIME 2020 Criteria without reliable structured codes (e.g., 2.2 duration of drug use, 2.3 erosive esophagitis, smoking history, family history) are targeted by the NLP-based digital phenotyping algorithm (next slide).
• 172. AIME 2020
Screening patients by inclusion criteria 1, 3, 4, 7 and all exclusion criteria using i2b2 → patient set A (n=31,749)
From patient set A, screening by inclusion criterion 2.1 using i2b2 → patient set B (n=8,667)
From patient set A, screening by inclusion criterion 2.2 using i2b2 → patient set C (n=1,577)
From patient set A, screening by inclusion criteria 2.3, 5, 6 using ACE and NLP → patient set D (n=230)
Union of patient sets B, C, and D → patient set E (n=9,080)
• 173. AIME 2020 Architecture of Current Solutions
[Figure: structured data is queried first, results are post-processed using unstructured data, collated, and presented in a user interface (visualization, analytics, reporting, etc.)]
• 174. AIME 2020 An Integrated Framework
[Figure: structured data and unstructured data are queried together; results are collated and presented in a user interface (visualization, analytics, reporting, etc.)]
  • 175. AIME 2020 Information Retrieval for Cohort Discovery • Cohort retrieval is similar to modern search engines.
• 176. AIME 2020 CREATE (Cohort Retrieval Enhanced by the Analysis of Text from Electronic Health Records)
[Figure: the end user's query, e.g., "Adults with inflammatory bowel disease (ulcerative colitis or Crohn's disease)", is parsed/edited and transformed into a structured query over coded data (ICD-9/10, CPT, SNOMED CT, ...) and a full-text query over clinical texts; structured EHR data and NLP-derived unstructured concepts are indexed separately, and the structured index yields a filtered cohort while the unstructured index, with machine learning, yields the relevant cohort]
Liu et al. CREATE: Cohort Retrieval Enhanced by Analysis of Text from Electronic Health Records using OMOP Common Data Model. 2019.
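A minimal sketch of treating cohort discovery as search: index each patient's note text and rank patients against a free-text criterion with BM25 (here via the open-source rank-bm25 package). CREATE additionally normalizes criteria to OMOP concepts and combines structured filters; the patient notes below are toy strings.

    from rank_bm25 import BM25Okapi   # pip install rank-bm25

    patients = {
        "pt1": "ulcerative colitis managed with mesalamine, colonoscopy 2018",
        "pt2": "type 2 diabetes mellitus, metformin, no GI history",
        "pt3": "crohn's disease with prior bowel resection",
    }
    corpus = [doc.lower().split() for doc in patients.values()]
    bm25 = BM25Okapi(corpus)

    query = "adults with inflammatory bowel disease ulcerative colitis or crohn's disease"
    scores = bm25.get_scores(query.lower().split())
    print(sorted(zip(patients, scores), key=lambda x: -x[1]))   # pt1/pt3 rank first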
• 180. AIME 2020 Another Way of Thinking of Cohort Retrieval: Patient Representation
[Figure: EHR data plus AI produce a vector representation per patient (e.g., Patient 1: -0.0011, -0.0008, -0.0050, ...; Patient 2: 0.0108, -0.0194, 0.0101, ...); a clinical trial target is embedded in the same space, and similarity measurement identifies eligible patients]
• 181. AIME 2020 Unsupervised Machine Learning for Patient Representation using EHRs
Poisson Dirichlet Model: an unsupervised generative probabilistic machine learning model.
[Figure: graphical models of Latent Dirichlet Allocation (LDA) and the Poisson Dirichlet Model (PDM)]
Wang et al. Unsupervised Machine Learning for the Discovery of Latent Disease Clusters and Patient Subgroups Using Electronic Health Records. Journal of biomedical informatics. 2019.
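A minimal sketch of topic-model patient representations with LDA (the paper's Poisson Dirichlet Model has no off-the-shelf scikit-learn implementation, so LDA stands in here). Each patient is treated as a "document" of diagnosis/term tokens; the tokens are fabricated for illustration.

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    patients = ["copd bronchiectasis dyspnea smoker",
                "osteoporosis fracture vitamin_d",
                "dementia delirium confusion fall"]
    counts = CountVectorizer().fit_transform(patients)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    patient_vectors = lda.transform(counts)   # low-dimensional patient representation
    print(patient_vectors.round(2))           # rows can be compared by similarity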
• 182. AIME 2020 Unsupervised Machine Learning for Patient Representation using EHRs
[Figure: EHR data plus AI support discovering disease clusters (e.g., diabetes comorbidities) and discovering patient subgroups, which enhance disease risk prediction, help discover new underlying disease mechanisms, and enable personalized care, diagnosis, treatment, and prevention]
Wang et al. Unsupervised Machine Learning for the Discovery of Latent Disease Clusters and Patient Subgroups Using Electronic Health Records. Journal of biomedical informatics. 2019.
• 184. AIME 2020 Patient Subgroups
[Figure: patient subgroups discovered within the osteoporosis, delirium/dementia, and COPD/bronchiectasis cohorts]
Wang et al. Unsupervised Machine Learning for the Discovery of Latent Disease Clusters and Patient Subgroups Using Electronic Health Records. Journal of biomedical informatics. 2019.
  • 185. AIME 2020 NLP for Cohort Discovery is All About Learning A Better Representation of Patient
  • 188. Advances of Natural Language Processing in Clinical Research Rui Zhang, Ph.D. Associate Professor and McKnight Presidential Fellow Institute for Health Informatics, Department of Pharmaceutical Care & Health Systems, and Data Science University of Minnesota, Twin Cities August 25, 2020 1 AIME 2020 Tutorial 1
  • 189. Outline • Part 1: NLP for Dietary Supplement Clinical Research • Part 2: Information Extraction in EHRs and Clinical Trials 2
  • 190. Clinical Research Informatics (CRI) • CRI involves the use of informatics in the discovery and management of new knowledge relating to health and disease. • It includes management of information related to clinical trials and also involves informatics related to secondary research use of clinical data. • It involves approaches to collect, process, analyze, and display health care and biomedical data for research 3
  • 191. Leveraging Big Data for Pharmacovigilance Big Data Analytics
  • 192. Leveraging Big Data for Pharmacovigilance https://knowledgent.com/whitepaper/big-data-enabling-better-pharmacovigilance/ 5
• 193. Leveraging NLP in Healthcare Analytics
[Figure: NLP (extract, classify, summarize) applied to biomedical literature, clinical notes (notes 1 to n), and social media yields biomedical knowledge (subject - predicate - object triples), patient information (adverse events, substance use, family history, medical history), and pharmacovigilance signals (drug/supplement - adverse events) for healthcare providers and clinical researchers]
• 194. Part 1: NLP for Dietary Supplement Clinical Research 1R01AT009457 (PI: Rui Zhang)
• Integrated DS Knowledge Base (iDISK)
• Expanding DS terminology
• Detecting DS safety signals in clinical notes
• Mining biomedical literature to discover DSIs
• Active learning to reduce annotation costs
• Detecting DS safety signals on Twitter
(Data sources: online resources, clinical notes, literature, social media)
  • 195. Introduction to Dietary Supplements • Dietary supplements Ø Herbs, vitamins, minerals, probiotics, amino acids, others. • Use of supplements increasing Ø More than half of U.S. adults take dietary supplements (Center for Disease Control and Prevention) Ø One in six U.S. adults takes a supplement simultaneously with prescription medications Ø Sales over $6 billion per year in U.S. (American Botanical Council, 2014) https://nccih.nih.gov/health/supplements Use of complementary and alternative medicine by children in Europe: Published data and expert perspectives. Complement Ther Med. 2013 4;21. Kaufman, Kelly, JAMA. 2002;287(3):337-344. Dietary Supplement Use Among U.S. Adults Has Increased Since NHANES III (1988–1994). 2014(Nov 4, 2014). CDC. 8
  • 196. Safety of Dietary Supplements • Doctors often poorly informed about supplements Ø 75.5% of 1,157 clinicians • Supplements are NOT always safe Ø Averagely 23,000 annual emergency visits for supplements adverse events Ø Drug-supplement interactions (DSIs) • Concomitant administration of supplements and drugs increases risks of DSIs • Example: Docetaxel & St John’s Wort (hyperforin component induces docetaxel metabolism via P450 3A4) Kaufman, Kelly, JAMA. 2002;287(3):337-344. Geller et al. New England J Med. 2015; 373:1531-40. Gurley BJ. Molecular nutrition & food research. 2008, 52(7):772-9. 9
  • 197. Regulation for Dietary Supplements • Regulated by Dietary Supplement Health and Education Act of 1994 (DSHEA) Ø Different regulatory framework from prescription and over- the-counter drugs Ø Safety testing and FDA approval NOT required before marketing Ø Postmarketing reporting only required for serious adverse events (hospitalization, significant disability or death) Department of Health and Human Services, Food and Drug Administration. New dietary ingredients in dietary supplements — background for industry. March 3, 2014 Dietary Supplement and Nonprescription Drug Consumer Protection Act. Public Law 109-462, 120 Stat 4500. 10
  • 198. Limited Supplements Research • Supplement safety research is limited Ø Not required for clinical trials Ø Not found until new supplement is on the market Ø Voluntary adverse events reporting underestimates the safety issues Ø Pharmacy studies only focuses on specific supplements Ø DSI documentation is limited due to less rigorous regulatory rules on supplements 11
  • 199. Informatics and AI for Supplements Safety Research • Online resources Ø Provides DS knowledge across various resources Ø Need informatics method to standard and integrate knowledge • Electronic health records Ø EHR provides patient data for supplement use Ø Detailed supplements usage information documented in clinical notes • Biomedical literature Ø Contains pharmacokinetics and pharmacodynamics knowledge Ø Discover undefined pathways for DSIs Ø Find potential DSIs by linking information 12
  • 200. Informatics and AI for Supplements Safety Research • Social media Ø Contains customer’s DS use experience Ø Discover their information needs • Adverse Event Reporting System (CARES) Ø Contains reported AEs Ø A good resource to mine DS-AE signals 13
  • 201. Challenges for Supplement Clinical Research • No standardized and consistent DS knowledge representation • Lexical variations of supplements in clinical notes • Detailed usage information related to supplements • Differentiate adverse events vs purpose use 14
  • 202. 1.1. Supplement Knowledge Base 15 To generate an integrated and standardized DS knowledge base Rizvi R, et al. AMIA CRI (student paper competition finalist) 2018. JAMIA 2019. doi: 10.1093/jamia/ocz216
  • 203. iDISK Development q To build a one-stop Integrated DIetary Supplement Knowledge base (iDISK) q DS related content is represented in: consistent and standardized forms 16JAMIA 2019. doi: 10.1093/jamia/ocz216 DSLD, Dietary Supplement Label Database; MSKCC, Memorial Sloan Kettering Cancer Center; NHP, Natural Health Products Database; NMCD, Natural Medicines Comprehensive Database.
  • 206. • Evaluation showed that iDISK achieved high accuracy (98.5%-100%) across all data elements iDISK Statistics 19
• 207. iDISK vs UMLS on DS Coverage
iDISK: 41,628 unique DS ingredient names
UMLSDistilled: only concepts with certain semantic types (Nucleic Acid, Nucleoside, or Nucleotide; Organic Chemical; Pharmacologic Substance; Vitamin; Bacterium; Fish; Fungus; Plant; Food; etc.)
UMLSDS: all concepts under the "Dietary Supplements" (C0242295) and "Vitamin" (C0042890) concepts via parent-child relationships.

Matched against   iDISK element   Exact match (%)   +luiNorm (+%)   Total (%)        UMLS concepts
UMLS              Atoms           27,992 (45.7%)    +550 (+0.9%)    28,542 (46.6%)   10,716
                  Unique terms    12,744 (30.6%)    +474 (+1.1%)    13,218 (31.7%)
UMLSDistilled     Atoms           27,553 (45.0%)    +524 (+0.9%)    28,077 (45.9%)   8,684
                  Unique terms    12,397 (29.8%)    +450 (+1.0%)    12,847 (30.8%)
UMLSDS            Atoms           12,096 (19.7%)    +407 (+0.7%)    12,503 (20.4%)   5,817
                  Unique terms    4,899 (11.8%)     +308 (+0.7%)    5,207 (12.5%)
• 208. Evaluation on a DS NER Task (identifying 3,710 DS entities in 351 abstracts)

Criterion   QuickUMLS installation   Precision   Recall   F1
Lenient     UMLS                     0.08        0.91     0.15
            UMLSDistilled            0.25        0.89     0.39
            UMLSDS                   0.32        0.86     0.46
            iDISK                    0.51        0.82     0.63
            Union                    0.32        0.91     0.48
Strict      UMLS                     0.05        0.67     0.10
            UMLSDistilled            0.19        0.69     0.30
            UMLSDS                   0.22        0.61     0.33
            iDISK                    0.43        0.69     0.53
            Union                    0.23        0.77     0.36
  • 209. 1.2. Expanding Supplement Terminology 22
  • 210. Objective 23 • To apply word embedding models to expand the terminology of DS from clinical notes: semantic variants, brand names, misspellings • Word embeddings • Reveal hidden relationship between words (similarity and relatedness) • More efficient; can be trained a large amount of unannotated data calcium chamomile cranberry dandelion flaxseed garlic ginger ginkgo ginseng glucosamine lavender melatonin turmeric valerian
• 212. Model Training
• Corpus size
• Hyperparameter tuning
• Window size (i.e., 4, 6, 8, 10, and 12)
• Vector size (i.e., 100, 150, 200, 250)
• GloVe trained on the same corpus, with the same window and vector sizes
• Optimal parameters were chosen based on human annotation (intrinsic evaluation)
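A minimal sketch of the training-and-expansion loop with gensim's word2vec; the three "sentences" below are a toy stand-in for the clinical-note corpus, so the neighbors it returns are not meaningful, but the same call on a large corpus surfaces variant/brand/misspelling candidates for manual review.

    from gensim.models import Word2Vec

    sentences = [["patient", "takes", "black_cohosh", "for", "hot", "flashes"],
                 ["recommend", "estroven", "trial", "for", "menopause"],
                 ["has", "tried", "black_kohosh", "without", "relief"]]
    model = Word2Vec(sentences, vector_size=100, window=8, min_count=1, seed=0)
    # Nearest neighbors of a seed term = candidate query expansions
    print(model.wv.most_similar("black_cohosh", topn=5))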
• 213. Results: Query Expansion Examples

Black cohosh – misspellings: black kohosh, black kohash; brand name: Remifemin; expanded: Estroven, Estrovan, estraven, icool, amberen, amberin, Estrovera, EstroFactor
• "Please try black cohash or Estroven for hot flashes." • "Pt has discontinued Remifemin but still has symptoms." • "Recommend Estroven trial for symptoms of menopause."
Turmeric – misspelling: tumeric
• "Pt emailed wondering about taking Tumeric" • "Patient states that she sometimes takes the supplements Tumeric"
Folic acid – brand names: Folgard, Folbic; other name: folate
• "Patient is willing to try Folgard if ok with provider." • "Patient is on folate and does not smoke."
Valerian – misspelling: velarian; brand names: myocalm pm, somnapure
• "Taking Velarian root and benadryl as well" • "I would recommend moving to 6mg dose first, then trying somnapure if still not helping."
Melatonin – misspellings: melantonin, melotonin; brand names: alteril, neuro sleep
• "Can try melantonin for sleep aid." • "Try alteril - it is over the counter sleep aid. Let me know if this is not better over the next few weeks."
  • 214. Results: Comparison of Base and Expanded Queries 27
  • 215. Results: Comparison of word embedding expanded versus external resource expanded queries 28
• 216. 1.3. Detecting DS Indications and Adverse Event Signals in Clinical Texts
• Clinical notes document information related to patient safety
• AEs
• "Patient gets headaches with black cohosh"
• Indications
• "Presently, patient is taking black cohosh for night sweats and hot flashes"
• Temporal relationships between medical events
• "Also headaches did start shortly after starting black cohosh"
• DS safety surveillance
• NLP for medical concept and relation extraction
Fan, et al, J Am Med Inform Assoc. 2020
  • 217. Objectives • To demonstrate the feasibility of deep learning models applied to clinical notes to facilitate discovery of DS safety knowledge • To evaluate different deep learning (e.g., pre-trained BERT) models on annotated DS-specific clinical corpora 30
• 218. Results of NER Models
7,000 sentences on 7 DS were randomly selected. DS entities include generic names, brand names, abbreviations, and misspellings. (Values are P / R / F1, mean ± s.d.)

Model                     DS (n=1247)                              Symptom (n=356)                          Overall, micro (n=1603)
CRF                       0.900±0.00 / 0.791±0.00 / 0.842±0.00     0.714±0.00 / 0.567±0.00 / 0.632±0.00     0.861±0.00 / 0.741±0.00 / 0.797±0.00
Bi-LSTM-CRF (word only)   0.905±0.002 / 0.854±0.007 / 0.879±0.003  0.812±0.015 / 0.825±0.007 / 0.818±0.009  0.884±0.004 / 0.847±0.003 / 0.865±0.003
Bi-LSTM-CRF (char LSTM)   0.900±0.006 / 0.860±0.002 / 0.879±0.003  0.806±0.008 / 0.837±0.011 / 0.822±0.008  0.877±0.003 / 0.855±0.003 / 0.866±0.002
Bi-LSTM-CRF (char CNN)    0.905±0.006 / 0.864±0.004 / 0.884±0.003  0.847±0.018 / 0.845±0.007 / 0.846±0.011  0.892±0.006 / 0.860±0.003 / 0.876±0.004
Clinical BERT             0.931±0.002 / 0.845±0.002 / 0.886±0.002  0.836±0.014 / 0.840±0.007 / 0.838±0.008  0.908±0.003 / 0.845±0.002 / 0.875±0.001
BERT                      0.931±0.005 / 0.850±0.003 / 0.889±0.003  0.860±0.010 / 0.854±0.006 / 0.857±0.004  0.914±0.007 / 0.851±0.003 / 0.881±0.003
• 219. Results for Relation Extraction
3,000 sentences (200 per DS) across 15 DS: black cohosh, chamomile, cranberry, dandelion, folic acid, garlic, ginger, ginkgo, ginseng, glucosamine, green tea, lavender, melatonin, milk thistle, and saw palmetto. Num: Positive = 336; Negative = 109; Not related = 69; Overall = 514.

Model | Positive (P / R / F1) | Negative (P / R / F1) | Not related (P / R / F1) | Overall, micro (P / R / F1)
Random Forest | 0.835 ± 0.002 / 0.939 ± 0.003 / 0.884 ± 0.002 | 0.782 ± 0.009 / 0.716 ± 0.007 / 0.747 ± 0.006 | 0.825 ± 0.011 / 0.438 ± 0.006 / 0.572 ± 0.005 | 0.823 ± 0.003 / 0.824 ± 0.002 / 0.813 ± 0.002
CNN | 0.937 ± 0.013 / 0.936 ± 0.031 / 0.936 ± 0.010 | 0.804 ± 0.057 / 0.926 ± 0.021 / 0.859 ± 0.026 | 0.824 ± 0.095 / 0.634 ± 0.060 / 0.721 ± 0.040 | 0.899 ± 0.013 / 0.896 ± 0.016 / 0.890 ± 0.016
Att-BLSTM | 0.913 ± 0.011 / 0.967 ± 0.017 / 0.939 ± 0.004 | 0.869 ± 0.035 / 0.861 ± 0.063 / 0.863 ± 0.024 | 0.876 ± 0.028 / 0.798 ± 0.009 / 0.826 ± 0.007 | 0.897 ± 0.006 / 0.899 ± 0.005 / 0.893 ± 0.004
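The relation task here is sentence-level classification over three labels; with BERT it reduces to a sequence-classification head, roughly as in this sketch (assumed stack; the label names and checkpoint are illustrative, and the head is untrained until fine-tuned on the 3,000 annotated sentences).

    # Classify a DS-symptom sentence as positive / negative / not related.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    labels = ["positive", "negative", "not_related"]
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(labels)
    )

    sentence = "Presently, patient is taking black cohosh for night sweats and hot flashes"
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        pred = model(**enc).logits.argmax(dim=-1).item()
    print(labels[pred])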
• 220. Positive Relationships (Indication): 18,348 pairs
Entity pair | In NMCD? | Sentence
Vitamin C, Wound | ✓ | Starting mv, Vitamin C and zinc for wound healing.
Fish oil, Hyperlipidemia | ✓ | Patient has history of hyperlipidemia which was until recently well-controlled with fish oil and simvastatin.
Peppermint, Nausea | ✓ | He has much less nausea with peppermint oil and marijuana.
Vitamin E, Scar | ✓ | Vitamin E po apply 1 capsule daily as needed to scar on forehead.
Fish oil, Pain | ✓ | I suggested that she could try daily fish oil which may help the breast pain when it is taken for at least a month or two and could use iburprofen and heat for the pain as well.
Psyllium, Constipation | ✓ | Patients states she takes psyllium powder daily for constipation, and needs refills.
Vitamin C, UTI | ✓ | Patient with hx recurrent utis, on vitamin c for urinary acidification
Fish oil, Anxiety | ✗ | I encourage over the counter multi vitamin and fish oil pills, as they can help improve some anxiety and depression symptoms.
Peppermint, Pain | ✓ | She also has experienced pain relief when rubbing peppermint essential oil on the low back.
• 221. Negative Relationships (Adverse Event): 13,130 pairs
Entity pair | In NMCD? | Sentence
Niacin, Rash | ✗ | Lisinopril causes a cough and niacin causes a rash.
Niacin, Flushing | ✗ | She was having significant flushing with niacin, so she discontinued this about 6 months ago.
Niacin, Hives | ✗ | Patient stating reaction to niacin is hives though has used mvi in past without issues.
Fish oil, Rash | ✓ | Pt states when she takes fish oil tablets she get a small rash on her chin.
Fish oil, Vomiting | ✗ | Also, discussed vomiting with fish oil caps because she bit into them- would not pursue further at this time.
Vitamin C, Nausea | ✗ | She did have 1 or 2 episodes of nausea related to taking delayed-release vitamin c for wound healing
Niacin, GI disturbance | ✗ | Allergen reactions: niacin: gi disturnbance; simvastation: cramps.
Fish oil, Diarrhea | ✓ | Discussed titrating back up on fish oil as he tolerates, previously has been causing a lot of diarrhea so going slow.
• 222. 1.4. Mining Biomedical Literature to Discover Drug-Supplement Interactions (DSIs)
http://www.wsj.com/articles/what-you-should-know-about-how-your-supplements-interact-with-prescription-drugs-1456777548
"Researchers at the University of Minnesota in Minneapolis are exploring interactions between cancer drugs and dietary supplements, based on data extracted from 23 million scientific publications, according to lead author Rui Zhang, a clinical assistant professor in health informatics. In a study published last year by a conference of the American Medical Informatics Association, he says, they identified some that were previously unknown."
• 223. Objective
• Explore potential DSIs by linking knowledge extracted from biomedical literature
• 224. Literature-based Discovery
• "We have shown that ECHINACEA preparations and some common alkylamides weakly inhibit several cytochrome P450 (CYP) isoforms, with considerable variation in potency." (PMID 19790031) → Echinacea - INHIBITS - CYP450
• "Tamoxifen and toremifene are metabolised by the cytochrome p450 enzyme system, and raloxifene is metabolised by glucuronide conjugation." (PMID 12648026) → CYP450 - INTERACTS_WITH - Toremifene
• Named entity recognition (NER) and relation extraction over big data: 29 million abstracts
• Joining the two predications yields: Echinacea - <Potentially Interacts With> - Toremifene
• Schematically: known X-Y pairs (X1-Y1, X2-Y2, ..., Xm-Ym) are joined with known Y-Z pairs (Y1-Z1, Y3-Z2, ..., Yn-Zn) to infer novel X-Z pairs (X1-Z1, ..., Xk-Zt)
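The discovery step itself is a join on the shared B term. A minimal sketch in Python (the triples are the two examples from this slide plus one more from the next slide, not the full literature-scale extraction):

    # Join supplement->gene predications with gene->drug predications
    # to propose supplement-drug interaction candidates (A-B-C model).
    supp_gene = {
        ("Echinacea", "INHIBITS", "CYP450"),
        ("Grape seed extract", "INHIBITS", "CYP3A4"),
    }
    gene_drug = {
        ("CYP450", "INTERACTS_WITH", "Toremifene"),
        ("CYP3A4", "INTERACTS_WITH", "Docetaxel"),
    }

    candidates = {
        (supp, drug)
        for supp, _, gene_a in supp_gene
        for gene_b, _, drug in gene_drug
        if gene_a == gene_b  # shared B term links the two predications
    }
    print(candidates)  # {('Echinacea', 'Toremifene'), ('Grape seed extract', 'Docetaxel')}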
• 225. Results: Selected Interactions
Supplement | Predicate | Gene/Gene Class | Predicate | Drug | Known
Echinacea | INH | CYP450 | INT | Docetaxel | Y
Echinacea | INH | CYP450 | INT | Toremifene | N
Echinacea | STI | CYP1A1 | INT | Exemestane | N
Grape seed extract | INH | CYP3A4 | INT | Docetaxel | N
Kava preparation | STI | CYP3A4 | INT | Docetaxel | Y
INH, INHIBITS; STI, STIMULATES; INT, INTERACTS_WITH
Echinacea: fights the common cold and viral infections. Grape seed extract: cardiac conditions. Kava: treats sleep problems, relieves anxiety and stress.
• 226. Results: Selected Predications
Ø Echinacea INHIBITS CYP450: "We have shown that ECHINACEA preparations and some common alkylamides weakly inhibit several cytochrome P450 (CYP) isoforms, with considerable variation in potency." (PMID 19790031)
Ø Grape seed extract INHIBITS CYP3A4: "Four brands of GSE had no effect, while another five produced mild to moderate but variable inhibition of CYP3A4, ranging from 6.4% by Country Life GSE to 26.8% by Loma Linda Market brand." (PMID 19353999)
Ø Melatonin INHIBITS Cyclooxygenase-2: "Moreover, Western blot analysis showed that melatonin inhibited LPS/IFN-gamma-induced expression of COX-2 protein, but not that of constitutive cyclooxygenase." (PMID 18078452)
Ø CYP450 INTERACTS_WITH Toremifene: "Tamoxifen and toremifene are metabolised by the cytochrome p450 enzyme system, and raloxifene is metabolised by glucuronide conjugation." (PMID 12648026)
Ø CYP3A INHIBITS Docetaxel: "Because docetaxel is inactivated by CYP3A, we studied the effects of the St. John's wort constituent hyperforin on docetaxel metabolism in a human hepatocyte model." (PMID 16203790)
• 227. 1.5. Active Learning to Reduce Annotation Costs for NLP Tasks
• NLP tasks require human annotations
Ø Time-consuming and labor-intensive
• Active learning reduces annotation costs
Ø Used on biomedical and clinical texts
Ø Effectiveness varies across datasets and tasks
Chen Y, Lasko TA, Mei Q, Denny JC, Xu H. A study of active learning methods for named entity recognition in clinical text. J Biomed Inform 2015; 58: 11-8.
Chen Y, Cao H, Mei Q, Zheng K, Xu H. Applying active learning to supervised word sense disambiguation in MEDLINE. J Am Med Inform Assoc 2013; 20 (5): 1001-6.
• 228. Objectives
• To assess the effectiveness of AL methods for filtering incorrect semantic predications
• To evaluate various query strategies and provide a comparative analysis of AL methods through visualization
Vasilakes J, Rizvi R, Melton G, Pakhomov S, Zhang R. J Am Med Info Assoc Open. 2018
• 229. Method Overview
Query strategies:
• Uncertainty sampling
• Representative sampling
• Combined sampling
Evaluation:
• 10-fold cross-validation
• Training = 2,700; L0 = 270
• Testing = 300; evaluated using AUC
• 230. Query Strategies
• Uncertainty
Ø Simple margin
Ø Least confidence (see the sketch after this list)
Ø Least confidence with dynamic bias
• Representative
Ø Distance to center
Ø Density
Ø Min-max
• Combined
Ø Information density
Ø Dynamic
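As an illustration of the first family, least confidence queries the instances whose top predicted class probability is lowest. A minimal sketch with scikit-learn (assumed stack; the synthetic pools stand in for L and U, sized to echo the L0 = 270 split on the previous slide):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_L, y_L = rng.normal(size=(270, 20)), rng.integers(0, 2, 270)  # labeled pool L0
    X_U = rng.normal(size=(2430, 20))                               # unlabeled pool U

    clf = LogisticRegression(max_iter=1000).fit(X_L, y_L)

    # Least confidence: 1 - max class probability; higher = more uncertain.
    probs = clf.predict_proba(X_U)
    uncertainty = 1.0 - probs.max(axis=1)
    query_idx = np.argsort(uncertainty)[-10:]  # 10 most uncertain instances
    print(query_idx)                           # send these to the annotator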
• 231. Datasets and Annotations
• Substance Interaction (3,000):
Ø INTERACTS_WITH, STIMULATES, or INHIBITS
• Clinical Medicine (3,000):
Ø ADMINISTERED_TO, COEXISTS_WITH, COMPLICATES, DIAGNOSES, MANIFESTATION_OF, PRECEDES, PREVENTS, PROCESS_OF, PRODUCES, TREATS, or USES
• Inter-rater agreement:
Ø Kappa: 0.74 (SI), 0.72 (CM)
Ø Percentage agreement: 87% (SI), 91% (CM)
• 232. Performance Comparison
|L| is the size of the current labeled set; |U| is the size of the current unlabeled set.
When L is small and U is large:
• it is unlikely that L is representative of U
• given that L is small and unrepresentative, the prediction model trained on L is likely to be poor
• 233. Results: Query Strategy Comparison (slides 233-236 build this table cumulatively; ID, information density)
Query Strategy | ALC
Passive Learning | 0.590
Uncertainty Sampling | 0.597 - 0.607
Representative Sampling | 0.622 - 0.634
ID (manual β) | 0.642
ID (dynamic β) | 0.641
• 237. Performance Analysis
(Figure: visualization of query behavior for Uncertainty Sampling, the worst performing strategy, and Representative Sampling, the best performing.)
Vasilakes J, Rizvi R, Melton G, Pakhomov S, Zhang R. J Am Med Info Assoc Open. 2018
• 238. 1.6. Mining Twitter to Detect DS Adverse Events
• Objectives
Ø To develop an end-to-end AI pipeline for identifying DS-AEs from tweets
Ø To compare the DS-AEs discovered from the tweets to those curated in iDISK
• 239. Data Collection
• Data collection
Ø 332 DS terms, including 40 commonly used DS and their name variants
Ø 14,143 AE terms from an ADR lexicon and the iDISK knowledge base
Ø The final dataset includes 247,807 tweets from 2012 to 2018 that contain at least one DS-AE pair
• Data preprocessing (a sketch follows below)
Ø Remove URLs, user handles (@username), hashtag symbols (#), and emojis
Ø Contractions (e.g., can't) were expanded
Ø Hashtags were segmented into constituent words
Ø Stop words were kept (e.g., "throw up" is different from "throw")
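A minimal sketch of that preprocessing (the regex patterns, contraction table, and lowercasing are illustrative assumptions; the original pipeline's exact rules and its hashtag segmenter are not shown on the slide):

    import re

    CONTRACTIONS = {"can't": "cannot", "won't": "will not", "n't": " not"}

    def preprocess(tweet: str) -> str:
        tweet = tweet.lower()
        tweet = re.sub(r"https?://\S+", "", tweet)             # remove URLs
        tweet = re.sub(r"@\w+", "", tweet)                     # remove user handles
        tweet = tweet.replace("#", "")                         # drop the hashtag symbol only
        tweet = re.sub(r"[\U0001F300-\U0001FAFF]", "", tweet)  # strip most emojis
        for short, full in CONTRACTIONS.items():               # expand contractions
            tweet = tweet.replace(short, full)
        return re.sub(r"\s+", " ", tweet).strip()              # note: stop words are kept

    print(preprocess("Can't sleep, trying #melatonin again @friend https://t.co/x"))
    # -> "cannot sleep, trying melatonin again"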
• 240. Results – Concept Extraction

Supplement: Method | Precision | Recall | F1-measure
LSTM-CRF + PubMed Word2Vec | 0.8587 ± 0.0211 | 0.8055 ± 0.0280 | 0.8310 ± 0.0218
LSTM-CRF + GloVe Twitter | 0.8491 ± 0.0321 | 0.8127 ± 0.0196 | 0.8300 ± 0.0179
LSTM-CRF + GloVe Crawl | 0.8736 ± 0.0210 | 0.8375 ± 0.0152 | 0.8551 ± 0.0157
LSTM-CRF + fastText | 0.8538 ± 0.0160 | 0.8092 ± 0.0231 | 0.8308 ± 0.0175
BioBERT | 0.8570 ± 0.0248 | 0.8725 ± 0.0212 | 0.8646 ± 0.0220
BERT | 0.8560 ± 0.0185 | 0.8736 ± 0.0198 | 0.8647 ± 0.0184

Symptom: Method | Precision | Recall | F1-measure
LSTM-CRF + PubMed Word2Vec | 0.7909 ± 0.0188 | 0.6794 ± 0.0258 | 0.7306 ± 0.0173
LSTM-CRF + GloVe Twitter | 0.8048 ± 0.0150 | 0.6994 ± 0.0244 | 0.7482 ± 0.0155
LSTM-CRF + GloVe Crawl | 0.8012 ± 0.0205 | 0.7146 ± 0.0344 | 0.7550 ± 0.0232
LSTM-CRF + fastText | 0.7784 ± 0.0247 | 0.6841 ± 0.0271 | 0.7277 ± 0.0182
BioBERT | 0.8416 ± 0.0204 | 0.8582 ± 0.0200 | 0.8497 ± 0.0172
BERT | 0.8393 ± 0.0161 | 0.8664 ± 0.0147 | 0.8526 ± 0.0138
• 241. Results – Relation Extraction (RE)

Indication: Method | Precision | Recall | F1-measure
CNN + GloVe Twitter | 0.7774 ± 0.0252 | 0.7946 ± 0.0318 | 0.7850 ± 0.0124
CNN + GloVe Wiki GigaWord | 0.7720 ± 0.0206 | 0.7901 ± 0.0280 | 0.7804 ± 0.0142
BioBERT | 0.8177 ± 0.0214 | 0.8595 ± 0.0321 | 0.8374 ± 0.0147
BERT | 0.8181 ± 0.0319 | 0.8522 ± 0.0409 | 0.8335 ± 0.0169

Adverse events: Method | Precision | Recall | F1-measure
LSTM-CRF + PubMed Word2Vec | 0.6995 ± 0.0653 | 0.6381 ± 0.0539 | 0.6645 ± 0.0410
LSTM-CRF + GloVe Twitter | 0.7069 ± 0.0553 | 0.5995 ± 0.0783 | 0.6456 ± 0.0561
BioBERT | 0.7349 ± 0.0430 | 0.7603 ± 0.0519 | 0.7459 ± 0.0341
BERT | 0.7312 ± 0.0694 | 0.7845 ± 0.1041 | 0.7538 ± 0.0376
• 242. Results
• 194,190 pairs were identified as DS indications
• 45,668 pairs were identified as DS-AEs
• 190,170 pairs have no relation
• 243. Results – DS-AE Pair Examples
• Vitamin C - kidney stones tweets (iDISK has this entry):
Ø some medications yes even prolonged high dose vitamin c causes kidney stones
Ø vitamin c is not actually an effective treatment for the common cold and high doses may cause kidney stones nausea and diarrhea
Ø too much vitamin c can cause kidney stones
• Vitamin C - diarrhea tweets (iDISK has this entry):
Ø i would eat this whole bag of oranges but vitamin c in high doses can induce skin breakouts and diarrhea facts
Ø too much vitamin c or zinc could cause nausea diarrhea and stomach cramps check your dose
Ø too much vitamin c can cause diarrhea and or nausea
Ø it can cause diarrhea because of all the vitamin c
• 244. Results – DS-AE Pair Examples
• Niacin - flush tweets (not in iDISK):
Ø the niacin flush may be uncomfortable for a few mins but it is well worth it it may be itchy or burn a little but it passes in 10 30
Ø note to self if you are used to 250 mg of niacin jump up to 500 mg the niacin flush is so intense
Ø already got a niacin flush crap
• Fish oil - prostate cancer tweets (not in iDISK):
Ø fish oil makes you more likely to get prostate cancer good enough for me to stop taking it just a heads up
Ø some docs say fish oil can raise your risk of prostate cancer wait so i should stop stuffing goldfish up there tgif
Ø correction study finds fish oil increases risk of high grade prostate cancer by 71 percent <url>
• 245. Part 2: Information Extraction in EHRs and Clinical Trials
• Extract Breast Cancer Receptor Status
• Identify Clinically New Information
• Parse Clinical Trial Eligibility Criteria
• 246. 2.1. Breast Cancer Receptor Status Phenotyping from EHR
Breitenstein MK, Liu H, Maxwell KN, Pathak J, Zhang R. Electronic health record phenotypes for precision medicine: perspectives and caveats from treatment of breast cancer at a single institution. Clinical and Translational Science. 2018 Jan;11(1):85-92.
• 247. Phenotyping Granularity
• Phenotyping usually identifies cases or controls
• Precision medicine phenotypes of breast cancer subtypes
Ø Estrogen receptor (ER)
Ø Progesterone receptor (PR)
Ø Human epidermal growth factor receptor 2 (HER2)
Ø Triple-negative breast cancer (TNBC: ER-, PR-, HER2-)
• 248. Objectives
• Develop NLP-based breast cancer precision medicine phenotyping methods to identify receptor status
• Compare the coverage of receptor status across clinical data sources
• 250. 2.2. Identifying Clinically Relevant New Versus Redundant Information in Clinical Texts
• EHR "copy-and-paste" functionality
Ø 74-90% of physicians copy and paste
Ø 20-78% of physician note text is copied
• Results
Ø Little deletion, only addition
Ø Longer notes; recombinant versions of previous notes
Ø Errors repeat
• User issues
Ø Information overload
Ø Difficulties in finding information
• 251. Statistical N-gram Language Model
• Predict the probability of a word based on all previous words:
P(w_1^n) = P(w_1) P(w_2 | w_1) P(w_3 | w_1 w_2) ... P(w_n | w_1 w_2 ... w_{n-1}) = ∏_{k=1}^{n} P(w_k | w_1^{k-1})
• Markov assumption
Ø The probability of a word depends only on the previous n-1 words (for an n-gram model):
P(w_k | w_1^{k-1}) ≈ P(w_k | w_{k-n+1}^{k-1})
• N-gram model: an (n-1)th-order Markov model
• Example: P(congestion | a female presenting with a chief complaint of nasal)
Ø Bigram: P(congestion | nasal)
Ø Trigram: P(congestion | of nasal)
Ø Four-gram: P(congestion | complaint of nasal)
Manning and Schütze. Foundations of Statistical Natural Language Processing. The MIT Press; 2003.
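A toy instance of the bigram case in the example above, using plain maximum-likelihood counts (a sketch; smoothing is the subject of the next slide):

    from collections import Counter

    tokens = "a female presenting with a chief complaint of nasal congestion".split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    # MLE bigram estimate: count of the bigram over the count of its history.
    def p_bigram(w, prev):
        return bigrams[(prev, w)] / unigrams[prev]

    print(p_bigram("congestion", "nasal"))  # 1.0 in this one-sentence corpus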
• 252. Statistical N-gram Language Model: Smoothing
• Sparseness of corpus
Ø Unseen events get zero probability
Ø A single zero propagates to the probability of an entire long string
• Smoothing methods
Ø Decrease the probability of seen events to allow for the occurrence of unseen n-grams
Ø Good-Turing:
If C(w_1 ... w_n) = r > 0: P_GT(w_1 ... w_n) = r*/N, where r* = (r+1) N_{r+1} / N_r
If C(w_1 ... w_n) = 0: P_GT(w_1 ... w_n) = (1 − Σ_{r=1}^{∞} N_r r*/N) / N_0 ≈ N_1 / (N_0 N)
(N, total number of observed n-grams; N_r, number of distinct n-grams occurring exactly r times; N_0, number of unseen n-grams)
Manning and Schütze. Foundations of Statistical Natural Language Processing. The MIT Press; 2003.
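A minimal sketch of the Good-Turing estimate above with toy counts (real implementations also smooth the N_r curve itself, and N_0 must come from the vocabulary size; both are simplified assumptions here):

    from collections import Counter

    ngram_counts = Counter({"of nasal": 3, "nasal congestion": 2,
                            "chief complaint": 2, "a chief": 1, "a female": 1})
    N = sum(ngram_counts.values())       # total observed n-gram tokens
    Nr = Counter(ngram_counts.values())  # N_r: how many n-grams occur exactly r times

    def r_star(r):
        # Discounted count r* = (r + 1) * N_{r+1} / N_r
        return (r + 1) * Nr[r + 1] / Nr[r]

    def p_gt(ngram, N0=100.0):           # N0: number of unseen n-grams (assumed)
        r = ngram_counts[ngram]
        if r > 0:
            return r_star(r) / N
        return Nr[1] / (N0 * N)          # leftover mass spread over unseen events

    print(p_gt("a chief"), p_gt("never seen"))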
• 253. Semantic Similarity Measures
• Measure semantic similarity between two biomedical concepts by determining their closeness in a hierarchy
• The UMLS brings many biomedical vocabularies and standards together
• UMLS::Similarity provides a platform to calculate similarity using various methods
• Methods: Resnik; Jiang and Conrath; Lin
Pedersen T, Pakhomov S, et al. 2nd ACM SIGHIT IHI Symp Proc, 2012.
Pakhomov S, McInnes B, et al. AMIA Annu Symp Proc 2010: 572-6.
McInnes B, Pedersen T, Pakhomov S. AMIA Annu Symp Proc 2009: 431-5.
Pedersen T, Pakhomov S, et al. J Biomed Inform. 2007 Jun;40(3):288-99.
P. Resnik, International Joint Conference for Artificial Intelligence, 448-53, 1995.
J. Jiang and D. Conrath, Proceedings on International Conference on Research in CL, 9-33, 1997.
D. Lin, Proceedings of the International Conference on ML, 296-304, 1998.
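Of the measures named above, Lin's has a compact closed form: sim(c1, c2) = 2 × IC(lcs(c1, c2)) / (IC(c1) + IC(c2)), with information content IC(c) = -log P(c) and lcs the least common subsumer in the hierarchy. The sketch below is a toy stand-in for what UMLS::Similarity derives from the UMLS; the concept probabilities and the LCS are invented for illustration:

    import math

    # Toy corpus probabilities for three concepts in a small hierarchy.
    p = {"disorder": 0.30, "diabetes": 0.01, "hypoglycemia": 0.005}
    ic = {c: -math.log(prob) for c, prob in p.items()}  # information content

    def lin(c1, c2, lcs):
        return 2 * ic[lcs] / (ic[c1] + ic[c2])

    # "disorder" assumed to be the least common subsumer of the two concepts.
    print(round(lin("diabetes", "hypoglycemia", lcs="disorder"), 3))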
• 254. Results: Performance Comparison
Algorithm | Recall | Precision | F1-Measure | Optimal Threshold
Baseline | 0.85 | 0.64 | 0.73 | -
Baseline + Lin | 0.87 | 0.62 | 0.72 | 0.9
Baseline + Res | 0.87 | 0.61 | 0.72 | 0.9
Baseline + Jcn | 0.87 | 0.61 | 0.72 | 0.9
Baseline: rule-based section-information adjustment + removal of note formatting and noise + removal of stop words + lexical normalization
Semantic similarity methods: Lin; Resnik; Jiang and Conrath
Precision = TP/(TP+FP); Recall = TP/(TP+FN); F1-Measure = 2 × Precision × Recall / (Precision + Recall)
• 255. NIP Score to Navigate Notes
(Figure: New Information Proportion (%) plotted against note index, with series for the last 10 and 20 notes.)
• Notes 30, 32, 33 & 35: nothing new
• Note 31, NEW: RUQ pain worse with eating greasy foods
• Note 34, NEW: pt visits diabetes RN
• Note 36, NEW: sore throat x 3 days
• Note 37, NEW: having chest pain, will try colchicine for pericarditis
• Note 38, NEW: depressive symptoms, bulging L TM on exam
• Cyclical pattern
• High correlation with human judgment
• Points to the source note of redundant information
*NIP: New Information Proportion
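The published method scores each word of a note against the smoothed n-gram language model of the patient's prior notes (combined with semantic similarity). As a rough, set-based simplification of the same idea, a note's new-information proportion can be approximated as the share of its bigrams unseen in prior notes; a minimal sketch under that assumption:

    def nip(note, prior_notes):
        """Percent of the note's bigrams not seen in any prior note."""
        def bigrams(text):
            toks = text.lower().split()
            return set(zip(toks, toks[1:]))
        seen = set().union(*(bigrams(n) for n in prior_notes)) if prior_notes else set()
        current = bigrams(note)
        return 100.0 * len(current - seen) / len(current)

    prior = ["patient reports nasal congestion", "nasal congestion improving"]
    print(nip("patient reports sore throat x 3 days", prior))  # mostly new content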
• 256. New Information Semantic Types
Figure: Plot of NDIP (disease), NMIP (medication), and NLIP (laboratory) over time for a patient; biomedical concepts for each note were automatically extracted. NDIP, new problem/disease information proportion; NMIP, new medication information proportion; NLIP, new laboratory information proportion; NOIP, other types of new information.
NIP = NDIP + NMIP + NLIP + NOIP
New medication information:
Ø 5-Sep-08: clonazepam
Ø 24-Sep-08: sertraline, clonazepam
Ø 22-Oct-08: sertraline
Ø 31-Dec-08: glimepiride
Ø 24-Mar-09: tylenol, ibuprofen, Imitrex
Ø 8-Mar-10: janumet, metformin, Imitrex, sertraline, estroven
Ø 7-May-10: glipizide
Ø 25-Mar-11: buspirone, venlafaxine
Ø 17-Sep-11: influenza vaccine
New disease information:
Ø 5-Sep-08: elbow pain, hand pain, stress, depression, weight gain, fatigue, osteoarthritis
Ø 24-Sep-08: sleepy, dizziness, nausea, numbness, low back pain, hip pain
Ø 22-Oct-08: anxiety
Ø 31-Dec-08: obesity, joint tenderness, depression
Ø 23-Jan-09: hypoglycemia, hot flushes, menorrhagia, headache
Ø 24-Mar-09: arm pain, migraine headaches, anxiety
Ø 8-Mar-10: depression, back pain, fatigue
Ø 7-May-10: weight loss, family stress, thirsty, hypercholesterolemia
Ø 25-Mar-11: shoulder pain, cramping, leg pain, patellofemoral syndrome
New laboratory information:
Ø 5-Sep-08: BP, weight
Ø 24-Sep-08: breast cancer screening, X-ray spine
Ø 31-Dec-08: A1C, CHOL, HDL, LDL, TRIG, microalbuminuria measurement, X-ray knee
Ø 24-Mar-09: A1C, BP
Ø 7-May-10: glucose monitoring, A1C, HDL, LDL, GLC, BP, blood glucose
Ø 25-Mar-11: blood glucose
  • 257. New Information Visualization in Epic EHR System
• 258. 2.3. Parsing Clinical Trial Eligibility Criteria
• Patient recruitment delays are remarkably common and costly
Ø Nearly 80 percent of patient recruitment timelines in clinical trials are not met
Ø Over 50 percent of patients are not enrolled within the planned time frames
• Objective
Ø Use NLP to parse entities in trial inclusion/exclusion eligibility criteria
Ø Use state-of-the-art methods on the CLAMP platform
• 260. Annotations
Semantic classes with example criteria (entities and attributes are underlined and marked in blue on the original slide):
Entity classes:
Ø Demographics: Women must be > 18 to 45 years of age; BMI = 27 kg/m2
Ø Observation: Bilirubin greater than 1.2 g/dl; MMSE below 24, dementia or unstable clinical depression by exam
Ø Procedure: History of bilateral hip replacement
Ø Condition: Uncontrolled hypertension (BP over 180mm HG)
Ø Drug: Taking metformin, propranolol and other medications
Ø Dietary supplement (DS): Use of St. John's Wort or any other dietary supplement
Ø Device: Claustrophobia, metal implants, pacemaker or other factors affecting feasibility and / or safety of MRI scanning
Attribute classes:
Ø Measurement: BUN above 40 mg/dl, Cr above 1.8 mg/dl, CrCl < 60 mg/dl
Ø Qualifier: Signs and symptoms of increased intracranial pressure; severe hypercalcemia
Ø Temporal_measurement: Use of systemic corticosteroids within the last year
Ø Negation: Use of anti-diabetic drugs other than metformin
  • 261. Mapping to UMLS semantic groups across NLP systems
• 262. Performances of Individual NLP Systems & Boolean Ensemble
(Figure panels: A: BioMedICUS, B: CLAMP, C: cTAKES, D: MetaMap.)
Anusha Bompelli, et al. Comparing NLP Systems to Extract Entities of Eligibility Criteria in Dietary Supplements Clinical Trials using NLP-ADAPT. AIME 2020 (will present on Aug 25 at 14:00, NLP session)
• 263. Performance Comparison of the Deep Learning Models
Eligibility Criteria Corpus (149 trials). Each cell reports strict/lenient scores. Models: BERT (arXiv:1810.04805), RoBERTa (arXiv:1907.11692), ELECTRA (arXiv:2003.10555).

Entity classes:
demographics (n=194): BERT P 0.856/0.916, R 0.409/0.451, F1 0.554/0.604 | RoBERTa P 0.654/0.928, R 0.537/0.801, F1 0.586/0.859 | ELECTRA P 0.500/0.541, R 0.273/0.294, F1 0.353/0.380
observation (n=868): BERT P 0.694/0.829, R 0.658/0.829, F1 0.675/0.829 | RoBERTa P 0.721/0.910, R 0.684/0.897, F1 0.702/0.904 | ELECTRA P 0.663/0.865, R 0.590/0.795, F1 0.624/0.829
procedure (n=148): BERT P 0.667/1.000, R 0.600/0.800, F1 0.632/0.889 | RoBERTa P 0.542/0.708, R 0.650/0.850, F1 0.591/0.773 | ELECTRA P 0.448/0.552, R 0.650/0.850, F1 0.531/0.669
condition (n=1832): BERT P 0.794/0.995, R 0.698/0.851, F1 0.743/0.917 | RoBERTa P 0.813/0.949, R 0.767/0.900, F1 0.789/0.924 | ELECTRA P 0.778/0.918, R 0.788/0.893, F1 0.744/0.905
drug (n=890): BERT P 0.935/0.959, R 0.505/0.543, F1 0.655/0.693 | RoBERTa P 0.707/0.846, R 0.699/0.952, F1 0.703/0.895 | ELECTRA P 0.423/0.453, R 0.391/0.447, F1 0.406/0.449
supplement (n=188): BERT P 0.111/0.278, R 0.250/0.625, F1 0.154/0.385 | RoBERTa P 0.296/0.412, R 0.625/1.000, F1 0.400/0.583 | ELECTRA P 0.250/0.438, R 0.500/0.875, F1 0.333/0.583
device (n=37): BERT P 0.857/0.857, R 1.000/1.000, F1 0.923/0.923 | RoBERTa P 0.857/0.857, R 1.000/1.000, F1 0.923/0.923 | ELECTRA P 0.750/0.750, R 1.000/1.000, F1 0.857/0.857

Attribute classes:
measurement (n=397): BERT P 0.731/0.851, R 0.700/0.829, F1 0.715/0.840 | RoBERTa P 0.781/0.938, R 0.725/0.870, F1 0.752/0.902 | ELECTRA P 0.667/0.810, R 0.600/0.786, F1 0.632/0.797
qualifier (n=1137): BERT P 0.730/0.795, R 0.754/0.822, F1 0.742/0.808 | RoBERTa P 0.817/0.872, R 0.761/0.795, F1 0.788/0.831 | ELECTRA P 0.705/0.750, R 0.788/0.839, F1 0.744/0.792
temporal (n=646): BERT P 0.805/0.931, R 0.729/0.823, F1 0.765/0.874 | RoBERTa P 0.837/0.989, R 0.811/0.926, F1 0.824/0.957 | ELECTRA P 0.859/0.976, R 0.760/0.844, F1 0.807/0.905
negation (n=261): BERT P 0.818/0.879, R 0.562/0.769, F1 0.667/0.820 | RoBERTa P 0.914/0.943, R 0.821/0.846, F1 0.825/0.892 | ELECTRA P 0.735/0.765, R 0.641/0.667, F1 0.685/0.712
• 264. Acknowledgements
Extramural funding: NCCIH 1R01AT009457 (Zhang); OD R01AT009457-03S1 (Zhang); NIA 3R01AT009457-04S1 (Zhang); CTSA 1UL1TR002494 (Blazer); AHRQ 1R01HS022085 (Melton); Medtronic Inc. (Speedie)
Collaborators: Mayo Clinic (Liu, Wang), U of Florida (Bian), Florida State U (He), NIH/NLM (Rindflesch, Bodenreider), UIUC (Kilicoglu)
Contact: Rui Zhang, Ph.D. Email: zhan1386@umn.edu Research Lab: http://ruizhang.umn.edu/