Generating Computable Phenotype Intersection Metadata Using the Phenoflow Library
Toward Implementation: Addressing Real-World Deployments
S25
Martin Chapman, Vasa Curcin
King’s College London
Luke V. Rasmussen, Jennifer A. Pacheco
Northwestern University
Laura K. Wiley
WashU Medicine
DISCLOSURE OF CONFLICTS OF INTEREST
I have not had any relationships with ACCME-defined ineligible companies within
the past 24 months.
Background: Computable phenotypes
Knowledge objects that capture the logic required to identify individuals with a
disease or condition from their medical records.
Phenotype libraries
Online phenotype catalogues, which store a significant number of computable
phenotypes for the same disease or condition.
Phenotype definition multiplicity
This is a good thing (mostly)…
• It is not desirable (or feasible) to have a single computable phenotype for
every condition. Different use cases necessitate different logic.
But…
• We need to understand which use cases are already supported, to facilitate
reuse. In other words, we need to understand what is unique about each
phenotype. This can then be stored as metadata.
Phenotype intersection
To understand what is unique about each
phenotype (and thus which use cases it best
supports), we can first do the opposite and
understand how two phenotypes for the
same condition intersect.
We can aim to do this automatically and
therefore at scale.
Barriers to automated intersection analysis
1. Identifying when two computable phenotypes target the same disease or
condition in the first place.
• e.g. ‘T2DM Implementation’ vs. ‘Type 2 Diabetes Mellitus’ (PheKB)
2. Comparing different forms of computable phenotypes
• e.g. codelists vs. Natural Language Processing (NLP)
Methods: Identifying same disease/condition
1. Levenshtein distance to identify text similarity.
2. HDR UK API calls, using common keywords, to identify phenotypes
that target the same condition but lack text similarity.
3. Large Language Model (LLM) (Llama 3.1) to validate the
additional phenotypes returned in step 2 (not all 161 definitions
are actually for diabetes).
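As an illustration of step 1, the sketch below computes Levenshtein distance between two phenotype names and normalises it to a similarity score. It is a minimal sketch, not the Phenoflow curator's actual code: the function names `levenshtein` and `name_similarity`, the lower-casing, and the normalisation by the longer name are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after lower-casing (assumed normalisation)."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# The PheKB pair from the barriers slide scores below 0.5, which is why
# text similarity alone is not enough and steps 2 and 3 are needed.
print(name_similarity("T2DM Implementation", "Type 2 Diabetes Mellitus"))
```

Any threshold applied to this score would be a tuning choice; the slides do not state one.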
Methods: Comparing definitions
Results: Intersection – Condition groups
1171 definitions loaded into the Phenoflow library.
137 condition groups (conditions with two or more phenotypes). PPV 95%.
574 definitions exist as a part of a group (49%).
Good insight into the extent of the definition multiplicity phenomenon.
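The percentage above follows directly from the reported counts. The short check below reproduces it; the mean group size is derived here from the same counts and is not a figure reported on the slide.

```python
total_definitions = 1171    # definitions loaded into the Phenoflow library
grouped_definitions = 574   # definitions that belong to a condition group
condition_groups = 137      # conditions with two or more phenotypes

share = grouped_definitions / total_definitions
print(f"Share of definitions in a group: {share:.1%}")

# Derived for illustration only; not reported in the slides.
mean_group_size = grouped_definitions / condition_groups
print(f"Mean definitions per group: {mean_group_size:.2f}")
```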
Results: Intersection – Steps
Trend: Across the 10 largest condition
groups, the average number of steps in
common between pairs of definitions
relative to the average number of
steps in the group is low.
While definition multiplicity exists,
definitions still have a considerable
number of unique steps.
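One way to read the trend above is as a ratio: the mean number of steps shared by each pair of definitions, divided by the mean number of steps per definition in the group. The sketch below computes that ratio and the steps unique to each definition. It assumes each definition can be reduced to a set of step identifiers; the function names, the definition names and the step names are hypothetical, and the analysis linked at the end of the deck may measure overlap differently.

```python
from itertools import combinations
from statistics import mean


def intersection_ratio(group: dict[str, set[str]]) -> float:
    """Mean shared steps per pair of definitions over mean steps per definition."""
    pairs = list(combinations(group.values(), 2))
    if not pairs:
        raise ValueError("a condition group needs at least two definitions")
    mean_shared = mean(len(a & b) for a, b in pairs)
    mean_steps = mean(len(steps) for steps in group.values())
    return mean_shared / mean_steps if mean_steps else 0.0


def unique_steps(group: dict[str, set[str]]) -> dict[str, set[str]]:
    """Steps that appear in exactly one definition of the group."""
    result = {}
    for name, steps in group.items():
        others = set().union(*(s for n, s in group.items() if n != name))
        result[name] = steps - others
    return result


# Hypothetical condition group with made-up step identifiers.
example_group = {
    "def_a": {"icd10_codes", "metformin_rx", "hba1c_lab"},
    "def_b": {"icd10_codes", "read_codes"},
    "def_c": {"nlp_notes", "hba1c_lab"},
}
print(intersection_ratio(example_group))
print(unique_steps(example_group))
```

A low ratio corresponds to the finding on this slide: definitions in the same group share few steps, so each contributes unique steps that could be recorded as metadata.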
Results: LLM impact
We observed our LLM:
• Identifying false positives (e.g. matches between phenotypes for different
types of heart failure).
• Identifying false negatives (e.g. phenotype names that do not include the
condition but still aim to identify the condition via the presence of
medications).
Summary and Future work
The use of Phenoflow has allowed us to compare definitions to understand more
about definition multiplicity (extensive) and intersection (limited).
Integrating an LLM increases the reliability of this process.
Unique steps will soon be added to Phenoflow as metadata to support reuse.
To complement definition intersection insight (horizontal), definition
subsumption (vertical) will be explored next.
Links
Implementation (Python): https://github.com/phenoflow/curator
Data analysis (Jupyter): https://github.com/phenoflow/intersection-analysis
Live Phenoflow site: https://kclhi.org/phenoflow
