BIOSTEC 2022
Learning Embeddings from Free-Text Triage Notes Using
Pretrained Transformer Models
Émilien Arnaud, Mahmoud Elbattah, Maxime Gignon, Gilles Dequen
Université de Picardie Jules Verne (UPJV), France
mahmoud.elbattah@u-picardie.fr
https://www.researchgate.net/publication/358867707_Learning_Embeddings_from_Free-text_Triage_Notes_using_Pretrained_Transformer_Models
Study Context
• Exploring pre-trained BERT models as a mechanism for learning embeddings
from clinical notes.
• A use case of triage notes in the French language, from the University
Hospital of Picardie Jules Verne in France.
Data Description
#    Field Name                    Type
1    Arrival (Week Day / Hour)     Categorical
2    Gender                        Categorical
3    Origin                        Categorical
4    Arrival Modality              Categorical
5    Accompaniers                  Categorical
6    Family Status                 Categorical
7    Waiting Modality              Categorical
8    Reason for Encounter          Categorical
9    Circumstances                 Categorical
10   Age                           Numeric
11   Oxygen Flow                   Numeric
12   Heart Rate                    Numeric
13   Respiration Rate              Numeric
14   Systolic Blood Pressure       Numeric
15   Diastolic Blood Pressure      Numeric
16   Pain Scale                    Numeric
17   Temperature                   Numeric
18   Oxygen Saturation             Numeric
19   Capillary Blood Glucose       Numeric
20   Capillary Blood Hemoglobin    Numeric
21   Bladder Volume                Numeric
22   Capillary Blood Ketones       Numeric
23   Breath Test of Alcohol        Numeric
24   Nurse Triage Scale            Numeric
25   Nurse Notes                   Text
26   Psychiatric History           Text
27   Surgical History              Text
28   Medical History               Text
The dataset contains more than 260K emergency department (ED) records
covering January 2015 to June 2019.
BIOSTEC 2022
Data Description (cont’d)
Specialty / Label                        Hospitalization %
Surgery / CHIR                           19.7%
Short-Term Hospitalization Unit / UHCD   42.4%
Medical Specialty / MED                  33%
Other                                    4.9%
Our Earlier Work
• Early prediction of hospitalization [1].
• Prediction of medical specialties for hospitalized patients [2].
[1] Arnaud, E., Elbattah, M., Gignon, M., & Dequen, G. (2020). Deep learning to predict hospitalization at triage: Integration of structured data and unstructured text.
In Proceedings of the IEEE International Conference on Big Data. IEEE.
[2] Arnaud, E., Elbattah, M., Gignon, M., & Dequen, G. (2021). NLP-based prediction of medical specialties at hospital admission using triage notes. In Proceedings of the
IEEE International Conference on Healthcare Informatics (ICHI).
Approach Overview
Transformer Models
• CamemBERT (Martin et al. 2019)
• FlauBERT (Le et al. 2019)
• mBART (Liu et al. 2020)
• All models were accessed through the HuggingFace repository.
Feature Extraction Experiments
Model       Params   Embedding Dimension   Runtime
CamemBERT   110M     768                   31 min
FlauBERT    137M     768                   32 min
mBART       610M     1024                  64 min
Full transfer learning was applied for the feature extraction process.
A single NVIDIA V100 GPU was used.
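One way to picture the feature extraction step is to pass each note through a pretrained encoder and collapse its token vectors into a single fixed-size embedding. The sketch below assumes mean pooling over non-padding tokens and the `camembert-base` checkpoint from the Hugging Face hub. Neither the pooling strategy nor the checkpoint name is stated on the slides, and `mean_pool` and `embed_notes` are hypothetical helper names.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(column) / len(kept) for column in zip(*kept)]


def embed_notes(notes, model_name="camembert-base"):
    """Return one embedding per note (checkpoint and pooling are assumptions)."""
    # Imported lazily so mean_pool can be used without these packages installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # weights stay frozen: the model is used as a feature extractor

    batch = tokenizer(notes, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # shape: (notes, tokens, dim)

    masks = batch["attention_mask"].tolist()
    return [mean_pool(vectors, mask) for vectors, mask in zip(hidden.tolist(), masks)]
```

With a 768-dimensional encoder such as CamemBERT, each note would map to a list of 768 floats that can be fed to a clustering algorithm.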
Clustering Experiments
Parameter                 Value
Number of Clusters (K)    2–10
Centroid Initialisation   k-means++
Similarity Metric         Euclidean Distance
Number of Iterations      200
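The clustering settings above can be illustrated with a small, dependency-free k-means. This is a minimal sketch rather than the authors' implementation: it assumes k-means++ seeding, Euclidean distance, and a cap of 200 iterations as listed in the table, and the function names and the `seed` parameter are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centroids are drawn with probability
    proportional to the squared distance to the nearest chosen centroid."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(euclidean(p, c) ** 2 for c in centroids) for p in points]
        # Assumes at least k distinct points, so the weights are not all zero.
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def kmeans(points, k, max_iter=200, seed=0):
    """Lloyd's algorithm; stops early once assignments no longer change."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

To mirror the sweep over K, one would call `kmeans(embeddings, k)` for each `k` in `range(2, 11)` and score every result. At the scale of the full dataset, an optimized library implementation would be the practical choice.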
Evaluation of Clusters
• Silhouette Score:
• Fowlkes-Mallows Score:
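The formulas for the two scores did not survive in this transcript, so the sketch below uses their standard textbook definitions, which may differ in detail from what the slide showed. The silhouette score averages (b - a) / max(a, b) over all points, where a is the mean distance to the other points in the same cluster and b is the lowest mean distance to any other cluster. The Fowlkes-Mallows score compares a clustering against reference labels as TP / sqrt((TP + FP)(TP + FN)), counted over pairs of points. Function names are hypothetical, and the pairwise loops are meant for small illustrative inputs only.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # The distance of the point to itself is 0, so divide by len(own) - 1.
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def fowlkes_mallows_score(labels_true, labels_pred):
    """Pair-counting agreement between reference labels and cluster labels."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))
```

The silhouette score needs only the embeddings and the cluster assignments, while the Fowlkes-Mallows score additionally needs a reference labelling. Both are higher when clusters are better.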
Results: Silhouette Score
Results: Fowlkes-Mallows Score
Conclusions
• BERT-based contextual embeddings produced clusters of generally good
coherence.
• Our experiments largely validated the suitability of transfer learning in
this context.
• Pretrained transformers can serve as an effective mechanism for learning
embeddings from free-text notes, which are ubiquitous in healthcare settings.
Thank You!
mahmoud.elbattah@u-picardie.fr

