Interpretable machine learning models in endocrinology and beyond
Michael Biehl, www.cs.rug.nl/~biehl
Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, The Netherlands
Centre for Systems Modelling & Quantitative Biomedicine
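supervised learning: regression / classification
data: observations, e.g. vectors of numerical values, each with a known target value or class label
regression problems: predict a quantitative property, e.g. estimate the weight of an animal; the model contains parameters a and b, which are fitted to an example data set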
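A minimal sketch of the regression step, assuming for illustration a linear model weight ≈ a · girth + b (the slide names only the parameters a and b). The helper fit_line and the numbers are hypothetical; a and b are obtained by ordinary least squares, i.e. by minimizing the sum of squared deviations between predicted and observed weights.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ~ a*x + b (hypothetical helper, illustrative only)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# invented toy data: girth (cm) and weight (kg) lying exactly on a line
girth = [150.0, 170.0, 190.0, 210.0]
weight = [3.0 * g - 50.0 for g in girth]
a, b = fit_line(girth, weight)
print(a, b)  # 3.0 -50.0
```

With noisy real measurements the fitted line would no longer pass through every point; least squares then returns the line with the smallest summed squared error.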
classification tasks: assign data to a category, e.g. bulls vs. cows, described by the features x1 (girth) and x2 (length)
training: optimize the model parameters on the example data
model: linear separation in the (x1, x2) plane, one side → “bull”, else “cow”
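The linear separation can be illustrated with the classic perceptron rule, which is only one of many ways to optimize the parameters of a linear classifier; the slide does not specify a training method. Function names and the integer toy data (x1 = girth, x2 = length, arbitrary units) are hypothetical.

```python
def train_perceptron(data, labels, epochs=100, lr=1.0):
    """Perceptron training for the rule: 'bull' if w1*x1 + w2*x2 + w0 > 0 else 'cow'."""
    w1 = w2 = w0 = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), lab in zip(data, labels):
            target = 1 if lab == "bull" else -1
            out = 1 if w1 * x1 + w2 * x2 + w0 > 0 else -1
            if out != target:
                # shift the decision line toward the misclassified example
                w1 += lr * target * x1
                w2 += lr * target * x2
                w0 += lr * target
                errors += 1
        if errors == 0:  # every training example is on the correct side
            break
    return w1, w2, w0

def classify(w, x):
    w1, w2, w0 = w
    return "bull" if w1 * x[0] + w2 * x[1] + w0 > 0 else "cow"

# invented toy data: (girth, length)
data = [(4, 3), (5, 4), (5, 2), (1, 2), (2, 3), (1, 1)]
labels = ["bull", "bull", "bull", "cow", "cow", "cow"]
w = train_perceptron(data, labels)
print(w, classify(w, (6, 3)), classify(w, (1, 3)))
```

For linearly separable data such as this, the perceptron is guaranteed to stop after finitely many updates; for overlapping classes it would not converge, which is one reason other objective functions are used in practice.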
interpretable machine learning
preferred: transparent, interpretable "white box" models
avoid blind application of ML in "black box" mode
popular keywords: explainable AI (XAI); fair, honest, trustworthy … AI
interpretable machine learning helps to
- understand how decisions are taken
- avoid artifacts, e.g. due to hidden bias in the data
- obtain insight into the data set/problem
- simplify the model post hoc …
"accuracy is not enough" [Paulo Lisboa]
… and accuracy is not necessarily the goal (e.g. basic research, biomarker identification)
😺 vs. 🐶: sometimes accuracy is the goal
one intuitive framework: prototype systems for distance-based classification
Learning Vector Quantization (LVQ) in an N-dim. feature space
• training: represent the data by one or several prototypes per class
• working: classify a query according to the label of the nearest prototype
• decision boundaries are determined by (Euclidean) distances
+ parameterized in feature space, intuitive and interpretable
+ low storage needs, little computational effort
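The working phase described above can be sketched in a few lines, assuming the squared Euclidean distance and made-up prototypes in a 2-dim. feature space (two for class "A", one for class "B"). The function names are illustrative and not taken from any LVQ toolbox; training, i.e. placing the prototypes, is not shown.

```python
def squared_euclidean(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def nearest_prototype_label(query, prototypes, proto_labels):
    """Winner-takes-all: return the label of the closest prototype."""
    distances = [squared_euclidean(query, w) for w in prototypes]
    winner = min(range(len(prototypes)), key=distances.__getitem__)
    return proto_labels[winner]

prototypes = [(1.0, 1.0), (1.5, 2.5), (4.0, 3.0)]  # invented prototype positions
proto_labels = ["A", "A", "B"]
print(nearest_prototype_label((1.2, 2.0), prototypes, proto_labels))  # A
print(nearest_prototype_label((3.5, 3.0), prototypes, proto_labels))  # B
```

Only the prototypes and their labels must be stored, and classifying a query costs one distance computation per prototype, which is where the low storage and computational needs come from.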
distance measures and relevance learning
a distance measure compares prototypes and data points, e.g. the (squared) Euclidean distance; but
- are all features equally important?
- are the features of the same type/scale?
- are the features independent?
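For reference, the (squared) Euclidean distance between a data point x and a prototype w in N-dim. feature space, in standard notation chosen here for illustration:

```latex
d(\mathbf{w}, \mathbf{x}) \;=\; (\mathbf{x} - \mathbf{w})^{\top} (\mathbf{x} - \mathbf{w}) \;=\; \sum_{j=1}^{N} (x_j - w_j)^2
```

Every feature j enters with the same weight and no products of different features appear. A feature measured on a larger scale therefore dominates the sum, and correlations between features are ignored.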
a generalized distance measure compares prototypes and data points using a relevance matrix:
- diagonal elements: relevance of a particular single feature
- off-diagonal elements: contribution of a pair of features
training: optimize prototypes and relevance matrix w.r.t. the performance on the training data (objective function)
Generalized Matrix Relevance LVQ (GMLVQ)
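A minimal, pure-Python sketch of GMLVQ training in its standard formulation: the generalized distance is d(w, x) = (x − w)ᵀ Λ (x − w) with Λ = ΩᵀΩ (so Λ is positive semi-definite), normalized such that Σ_j Λ_jj = 1, and the objective function is the sum over training examples of μ = (d_J − d_K)/(d_J + d_K), where d_J (d_K) is the distance to the closest prototype with the correct (a wrong) label. Assumptions made here for brevity: one prototype per class initialized at the class mean, plain stochastic gradient descent with fixed, arbitrarily chosen learning rates, no additional transfer function applied to μ, and invented 2-dim. toy data in which only feature 1 separates the classes. This is an illustration, not the published GMLVQ code.

```python
import math

def gmlvq_distance(x, w, omega):
    """d(x, w) = (x - w)^T Omega^T Omega (x - w) = |Omega (x - w)|^2."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    proj = [sum(o * v for o, v in zip(row, diff)) for row in omega]
    return sum(p * p for p in proj)

def relevance_matrix(omega):
    """Lambda = Omega^T Omega."""
    n = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n)] for i in range(n)]

def normalize(omega):
    """Rescale Omega so that trace(Lambda) = sum of squared entries of Omega = 1."""
    s = math.sqrt(sum(o * o for row in omega for o in row))
    return [[o / s for o in row] for row in omega]

def train_gmlvq(data, labels, epochs=30, eta_w=0.05, eta_omega=0.01):
    classes = sorted(set(labels))
    n = len(data[0])
    protos = []  # one prototype per class, initialised at the class mean (assumption)
    for c in classes:
        members = [x for x, l in zip(data, labels) if l == c]
        protos.append([sum(col) / len(members) for col in zip(*members)])
    # start from the (normalized) Euclidean distance: Omega proportional to identity
    omega = normalize([[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)])
    for _ in range(epochs):
        for x, lab in zip(data, labels):
            d = [gmlvq_distance(x, w, omega) for w in protos]
            J = min((k for k, c in enumerate(classes) if c == lab), key=d.__getitem__)
            K = min((k for k, c in enumerate(classes) if c != lab), key=d.__getitem__)
            dJ, dK = d[J], d[K]
            gJ = 2.0 * dK / (dJ + dK) ** 2   # d mu / d dJ (> 0)
            gK = -2.0 * dJ / (dJ + dK) ** 2  # d mu / d dK (< 0)
            lam = relevance_matrix(omega)
            vJ = [xi - wi for xi, wi in zip(x, protos[J])]
            vK = [xi - wi for xi, wi in zip(x, protos[K])]
            lvJ = [sum(lam[i][j] * vJ[j] for j in range(n)) for i in range(n)]
            lvK = [sum(lam[i][j] * vK[j] for j in range(n)) for i in range(n)]
            ovJ = [sum(o * v for o, v in zip(row, vJ)) for row in omega]
            ovK = [sum(o * v for o, v in zip(row, vK)) for row in omega]
            # gradient descent on mu, using d d / d w = -2 Lambda (x - w):
            # the correct prototype moves toward x, the wrong one away from it
            protos[J] = [w + 2.0 * eta_w * gJ * g for w, g in zip(protos[J], lvJ)]
            protos[K] = [w + 2.0 * eta_w * gK * g for w, g in zip(protos[K], lvK)]
            # d d / d Omega_ij = 2 [Omega (x - w)]_i (x - w)_j
            omega = normalize([[omega[i][j] - 2.0 * eta_omega * (gJ * ovJ[i] * vJ[j] + gK * ovK[i] * vK[j])
                                for j in range(n)] for i in range(len(omega))])
    return classes, protos, omega

def classify(x, classes, protos, omega):
    d = [gmlvq_distance(x, w, omega) for w in protos]
    return classes[min(range(len(protos)), key=d.__getitem__)]

# invented toy data: feature 1 separates the classes, feature 2 is noise
data = [(0.0, -2.0), (0.3, 2.0), (-0.3, -1.0), (0.1, 1.0), (-0.1, 0.0), (0.2, 1.5),
        (2.0, -2.0), (2.3, 2.0), (1.7, -1.0), (2.1, 1.0), (1.9, 0.0), (2.2, -1.5)]
labels = [0] * 6 + [1] * 6
classes, protos, omega = train_gmlvq(data, labels)
lam = relevance_matrix(omega)
print([[round(v, 3) for v in row] for row in lam])
```

On this toy data the trained Λ_11 exceeds Λ_22, i.e. the relevance matrix singles out the discriminative feature while the Euclidean starting point weighted both features equally.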
application example: steroid metabolomics
adrenocortical tumors: benign adrenocortical adenoma (ACA) vs. malignant adrenocortical carcinoma (ACC), www.ensat.org
features: 32 steroid metabolite excretion values, measured by GC/MS
non-invasive measurement (24-hr urine)
set of labelled example data (excretion values per steroid)
aim: develop a tool / support system for differential diagnosis
idea: analyse retrospective data by machine learning and identify characteristic steroid prototypes and relevances
application example: steroid metabolomics
Generalized Matrix LVQ, ACC vs. ACA classification
• pre-processing: log-transformation of excretion values
• data split into 90% training, 10% validation set
• training: determine prototypes and relevance matrix
  - prototypes: representative profiles (1 per class)
  - relevance matrix: parameterizes the distance measure
• validation: apply the classifier to the 10% hold-out data to estimate the expected performance (error rates, ROC, …)
• repeat and average results over many random splits
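The validation protocol above can be sketched as follows. To keep the example short, a nearest class-mean classifier stands in for GMLVQ (an assumption: it corresponds to one prototype per class with Euclidean distance and no relevance learning). The function names, the number of splits, and the toy data are hypothetical.

```python
import math
import random

def nearest_mean_fit(X, y):
    """Stand-in for GMLVQ training: one prototype per class at the class mean."""
    protos = {}
    for c in set(y):
        members = [x for x, l in zip(X, y) if l == c]
        protos[c] = [sum(col) / len(members) for col in zip(*members)]
    return protos

def nearest_mean_predict(protos, x):
    return min(protos, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, protos[c])))

def repeated_validation(X, y, n_splits=100, val_fraction=0.1, seed=0):
    """Mean validation error over repeated random 90% / 10% splits."""
    rng = random.Random(seed)
    X = [[math.log(v) for v in x] for x in X]  # pre-processing: log-transform (values > 0)
    idx = list(range(len(X)))
    errors = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        n_val = max(1, round(val_fraction * len(idx)))
        val, train = idx[:n_val], idx[n_val:]
        protos = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(nearest_mean_predict(protos, X[i]) != y[i] for i in val)
        errors.append(wrong / n_val)
    return sum(errors) / len(errors)

# invented excretion-like values: feature 1 differs strongly between the classes
X = [[10.0 + i, 50.0 + 2 * i] for i in range(10)] + [[1000.0 + 10 * i, 55.0 + 2 * i] for i in range(10)]
y = ["A"] * 10 + ["B"] * 10
print(repeated_validation(X, y))
```

Averaging over many random splits reduces the dependence of the estimated error on one particular, possibly lucky or unlucky, hold-out set.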
ROC characteristics: sensitivity vs. 1-specificity, validation performance averaged over 1000 randomized splits
AUC:
- Euclidean distance (no relevances): 0.87
- diagonal relevances only: 0.93
- full relevance matrix: 0.97
clear improvement due to relevance learning
more than accuracy?
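The ROC curves and AUC values above summarize how well a continuous score separates the two classes as the decision threshold is varied. A minimal sketch, assuming each validation case already has a real-valued score where larger means "more likely positive" (for a prototype classifier one could use, e.g., the distance to the negative-class prototype minus the distance to the positive-class prototype; this choice is an assumption). The AUC is computed via its equivalence with the Mann-Whitney statistic and cross-checked against the trapezoidal area under the ROC points; the scores are invented.

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) for every threshold t, predicting positive if score >= t."""
    n_pos = sum(1 for l in labels if l == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_area(points):
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

def auc_mann_whitney(scores, labels):
    """Probability that a random positive case scores higher than a random negative one (ties: 1/2)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]  # invented scores
labels = [1, 1, 1, 0, 0, 0]              # 1 = positive class
print(auc_mann_whitney(scores, labels), trapezoid_area(roc_points(scores, labels)))
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so both numbers equal 8/9; an AUC of 0.5 would correspond to random guessing, 1.0 to perfect ranking.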
insight: prototypes
prototypes: steroid excretion in ACA/ACC
figure: z-score transformed metabolite excretion of the ACA and ACC prototypes, from below average to above average
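The prototypes are displayed as z-scores. A minimal sketch of that transformation, assuming each metabolite is standardized with the mean and sample standard deviation of the same feature over the whole data set; the slide does not state the reference statistics, and the helper name and numbers are hypothetical.

```python
import statistics

def zscore_prototype(prototype, data):
    """Each component in units of the feature's standard deviation, relative to the
    feature mean over the data set: > 0 above average, < 0 below average."""
    columns = list(zip(*data))
    means = [statistics.mean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    return [(p - m) / s for p, m, s in zip(prototype, means, sds)]

data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]   # invented, e.g. log excretion values
print(zscore_prototype([3.0, 15.0], data))         # [1.0, -0.5]
```

Expressing both class prototypes on this common scale makes metabolites with very different absolute excretion levels directly comparable in one plot.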
insight: relevance matrix
importance of single markers … pairs of markers (e.g. 5-PT, 5-PD, THS)
facilitates selection of reduced panels with similar performance
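Reading single-marker importance and pair contributions off a trained relevance matrix can be sketched as follows. The matrix values and the feature names f1 to f3 are invented, and the helper names are hypothetical. Ranking by the diagonal alone ignores the off-diagonal (pair) terms, so any reduced panel chosen this way would still have to be re-validated.

```python
def rank_by_relevance(lam, names):
    """Features sorted by diagonal relevance Lambda_jj, largest first."""
    diag = [lam[j][j] for j in range(len(names))]
    return sorted(zip(names, diag), key=lambda t: t[1], reverse=True)

def reduced_panel(lam, names, k):
    """The k individually most relevant features."""
    return [name for name, _ in rank_by_relevance(lam, names)[:k]]

def strongest_pair(lam, names):
    """Feature pair with the largest |off-diagonal| element Lambda_ij."""
    n = len(names)
    i, j = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: abs(lam[p[0]][p[1]]))
    return names[i], names[j]

lam = [[0.20, 0.05, 0.00],
       [0.05, 0.70, -0.10],
       [0.00, -0.10, 0.10]]   # invented: symmetric, positive definite, trace = 1
names = ["f1", "f2", "f3"]
print(rank_by_relevance(lam, names))
print(reduced_panel(lam, names, 2))   # ['f2', 'f1']
print(strongest_pair(lam, names))     # ('f2', 'f3')
```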
confirm – surprise – visualize
relevances, ACA vs. ACC: marker (19) THS is individually discriminative
confirm – surprise – visualize
relevances, ACC vs. ACA: (8) 5α-THA, (12) TH-Doc ???
GMLVQ performs a multivariate analysis and can identify discriminative combinations of markers
confirm – surprise – visualize
the relevance matrix (ACA vs. ACC) is dominated by its leading eigenvectors
• visualize the data set and prototypes, e.g. projected onto the leading eigenvectors
  → misclassifications?
• inspect individual cases
  → uncertain cases
  → outliers
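Because Λ is symmetric and positive semi-definite, its leading eigenvectors define natural axes for a low-dimensional view of data and prototypes. A minimal sketch, assuming a trained Λ is available: the leading eigenvectors are found by power iteration with Gram-Schmidt orthogonalization (a simple stand-in for a library eigensolver), and a point is projected onto them, here scaled by the square root of the eigenvalue, which is one common choice. The matrix below is invented.

```python
import math

def leading_eigenvectors(lam, k=2, iters=200):
    """Power iteration for the k leading eigenpairs of a symmetric PSD matrix."""
    n = len(lam)
    vals, vecs = [], []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary start vector
        for _ in range(iters):
            v = [sum(a * b for a, b in zip(row, v)) for row in lam]  # v <- Lambda v
            for u in vecs:  # stay orthogonal to eigenvectors already found
                c = sum(a * b for a, b in zip(u, v))
                v = [a - c * b for a, b in zip(v, u)]
            norm = math.sqrt(sum(a * a for a in v))
            if norm == 0.0:  # no remaining direction (rank-deficient Lambda)
                break
            v = [a / norm for a in v]
        lv = [sum(a * b for a, b in zip(row, v)) for row in lam]
        vals.append(sum(a * b for a, b in zip(v, lv)))  # Rayleigh quotient
        vecs.append(v)
    return vals, vecs

def project(x, vals, vecs):
    """Low-dim. coordinates of x: projection onto sqrt(eigenvalue) * eigenvector."""
    return [math.sqrt(max(val, 0.0)) * sum(a * b for a, b in zip(vec, x))
            for val, vec in zip(vals, vecs)]

lam = [[0.5, 0.3], [0.3, 0.5]]  # invented relevance matrix, eigenvalues 0.8 and 0.2
vals, vecs = leading_eigenvectors(lam)
print(vals, project([1.0, 1.0], vals, vecs))
```

Plotting all cases and both prototypes in these coordinates makes misclassified, uncertain, or outlying cases visible at a glance.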
summary
GMLVQ: example of an interpretable classifier
- class representatives in terms of the original feature space
- relevances of single features / combinations thereof
- visualization & low-dimensional representation
example application: steroid metabolomics-based tumor classification (et al., prospective)
outlook
steroid metabolomics: ongoing and future work
- identify reduced panels of metabolites
- monitoring of patients, detection of recurrences
- other disorders related to steroid metabolism …
other biomedical applications of GMLVQ and similar methods:
- analysis of cytokine markers in rheumatoid arthritis
- neuroimaging: FDG-PET scans in neurodegenerative disorders
- gene expression for risk prediction in cancer
- mRNA expression for the analysis of ribosome composition …
methodological extensions:
- high-dimensional data, heterogeneous data
- modified distance measures, local relevances
- probabilistic classification, forms of regression …
some take stay home messages
exploit domain knowledge
(image: IEEE Members News, March 2021; (c) https://twitter.com/jessenleon)
some links and example references
www.cs.rug.nl/~biehl: publications, news, links
GMLVQ code: Matlab, Python, Java
- M. Biehl, B. Hammer, T. Villmann. Prototype-based models in machine learning. Advanced Review in WIREs Cognitive Science 7(2): 92-111, 2016
- M. Biehl. Biomedical Applications of Prototype Based Classifiers and Relevance Learning. In: International Conference on Algorithms for Computational Biology (AlCoB 2017), Springer Lecture Notes in Computer Science 10252: 3-23, 2017
- R. van Veen, V. Gurvits, R. Kogan, S. Meles, G.-J. de Vries, R. Renken et al. An application of Generalized Matrix Learning Vector Quantization in Neuroimaging. Computer Methods and Programs in Biomedicine 197: 105708, 2020
- A. Moolla, J. de Boer, D. Pavlov, A. Amin et al. Accurate non-invasive diagnosis and staging of non-alcoholic fatty liver disease using the urinary steroid metabolome. Alimentary Pharmacology and Therapeutics 51: 1188-1197, 2020

