1
Improved Interpretability for Computer-Aided Severity
Assessment of Retinopathy of Prematurity
M. Graziani, J. M. Brown, V. Andrearczyk, V. Yildiz, J. P. Campbell, D. Erdogmus, S.
Ioannidis, M. F. Chiang, J. Kalpathy-Cramer, and H. Müller
2
What is Retinopathy Of Prematurity (ROP)?
‣ Abnormal growth of blood vessels in the retina

‣ 14,000 to 16,000 premature infants affected per year in the U.S.1; about 10% require prompt
treatment to avoid retinal detachment and blindness (incidence is growing)

‣ Staging from 1 to 5, detection of pre-plus or plus

‣ Detection of Plus disease: high disagreement among experts
1. U.S. National Eye Institute (nei.nih.gov/health/rop/rop)
Fig 1. Stage 3: visual examples of Normal, Pre-plus and Plus cases
3
Detection of Plus: a Machine Learning approach
Four steps:

‣ Vessel Segmentation

‣ Centerline Tracing 

‣ Feature Extraction

‣ Classification
* Rate of change of velocity between points with respect to curve length

Table 1. Feature types and descriptions

We extract 11 types of handcrafted features from the images, whose importance to the evaluation of plus
disease was evaluated by Ataer-Cansizoglu et al.2 The impact of the feature choice on the diagnosis was
thoroughly investigated in the literature, and the selected method constitutes a reference standard with high
inter-expert agreement.2 For each type of feature, 8 traditional statistics (such as minimum, maximum, mean,
median and second and third moments) and 5 Gaussian Mixture Model (GMM) statistics are extracted, for
a total of 143 handcrafted features (more details in Appendix 1). Such features are extracted from the
automated vessel segmentations and express curvature, tortuosity and dilation of retinal arteries and veins
(details reported in Table 1). The features in Table 1 are first computed independently for each vessel in the
image. The "vesselness" of the whole retinal sample is then summarized by standard statistics such as mean and
median of the per-vessel features. A ranking of the features is computed on the basis of their Gini coefficient for
random forest classification of normal versus pre-plus or worse on 100 random train-test splits (with replacement)
of the data. The retaining criterion used for this analysis identified a set of six measures that covered a wide set
of clinically interpretable features, discarding measures with a frequency of appearance lower than 10% in the
ranking. The retained measures were: curvature mean, curvature median, avg point diameter mean, avg segment
diameter mean, cti mean and cti median. Notwithstanding, the same analysis can be repeated with a different
criterion or with the exhaustive analysis of all the 143 features.
Feature | Description | Clinical interpretation
curvature | (s) | rate of direction change
avg segment diameter | #pixels/Lc(x) | global dilation
avg point diameter | Wn(x) | absolute dilation
Cumulative Tortuosity Index (CTI) | cti(x) = Lc(x)/Lx(x) | curving, curling, twisting rate
Table 1: Handcrafted feature description and clinical interpretation. (s) describes the rate of changing velocity
between points with respect to the rate of changing the curve length between points. Lc and Lx denote
respectively curve and chord length. Wn denotes the width of the vessel in the normal direction.
Fig 2. Classification pipeline of handcrafted features: segmentation, tracing, feature engineering, classification
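The retention rule described above (keep measures that appear in at least 10% of the per-split importance rankings) can be sketched as follows. The synthetic importances and the top-k cutoff are stand-ins, since the actual Gini importances come from the trained random forests:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_names = [f"f{i}" for i in range(143)]   # 143 handcrafted features
N_SPLITS, TOP_K = 100, 10                       # TOP_K is an assumed cutoff

appearances = {name: 0 for name in feature_names}
for _ in range(N_SPLITS):
    # stand-in for the per-split Gini importances of a random forest;
    # the first six features are given an artificial advantage
    importances = rng.random(len(feature_names))
    importances[:6] += 1.0
    for idx in np.argsort(importances)[::-1][:TOP_K]:
        appearances[feature_names[idx]] += 1

# retain measures appearing in at least 10% of the rankings
retained = sorted(n for n, c in appearances.items() if c / N_SPLITS >= 0.10)
```

With real importances this yields the six retained measures listed above (curvature mean/median, avg point diameter mean, avg segment diameter mean, cti mean/median).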
[Fig. 1 from Ronneberger et al.: U-net architecture (example for 32x32 pixels in the lowest resolution). A contracting path of unpadded 3x3 convolutions (ReLU) and 2x2 max pooling, an expanding path of 2x2 up-convolutions with copy-and-crop skip connections, and a final 1x1 convolution; a 572x572 input tile yields a 388x388 output segmentation map.]
4
Detection of Plus: a Deep Learning approach
Inception V1
Normal

Preplus

Plus
UNet
Performance significantly higher than non-experts!
Fig 3. End-to-end classification with Deep Learning

[Brown J., et al., 2018] SPIE 2018
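The tile sizes on the U-net figure follow from simple arithmetic: each unpadded 3x3 convolution trims 2 pixels, each 2x2 max pooling halves the size, and each 2x2 up-convolution doubles it. A quick check (the loop structure mirrors the figure, not any released code):

```python
def conv3x3_valid(size):
    # each unpadded 3x3 convolution trims one pixel per border
    return size - 2

def down(size):
    # 2x2 max pooling halves the resolution
    return size // 2

def up(size):
    # 2x2 up-convolution doubles the resolution
    return size * 2

size = 572  # input tile size from the figure
# contracting path: 4 blocks of (conv, conv, pool), then 2 convs at the bottom
for _ in range(4):
    size = down(conv3x3_valid(conv3x3_valid(size)))
size = conv3x3_valid(conv3x3_valid(size))   # 28 at the lowest resolution
# expanding path: 4 blocks of (up-conv, conv, conv)
for _ in range(4):
    size = conv3x3_valid(conv3x3_valid(up(size)))
print(size)  # 388, matching the 388x388 output segmentation map
```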
5
Detection of Plus: a Deep Learning approach
>5K images
3024 training
(1084 normal; 1074 pre-plus; 1080 plus)
965 validation
(817 normal; 148 pre-plus; 20 plus)
Inception V1
Normal
Preplus
Plus
UNet
Performance comparable to experts and
significantly higher than non-experts
TRUST ?
INTERPRET ?
EXPLAIN ?
6
If only we could make sure that the network is looking at the same things that we look at…
Can we relate hand-crafted visual
features to DL features?
7
8
Interpretability with Concept Activation Vectors
1. Select concept and images
2. Classification in the activation space
3. Relevance scores for each concept
[Kim B. et al., 2018] ICML 2018
Fig 4. Credits: Testing with Concept Activation Vectors, Kim B. et al., 2018
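The three steps can be sketched in a few lines: fit a linear classifier in the activation space to separate concept examples from counterexamples, take its weight vector as the Concept Activation Vector, and score relevance as the fraction of inputs whose class score increases along that direction. Everything below is synthetic stand-in data, and a least-squares fit replaces the linear classifier for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy activations at one layer: concept images vs. random counterexamples
acts_concept = rng.normal(loc=1.0, size=(50, 8))
acts_random = rng.normal(loc=-1.0, size=(50, 8))

X = np.vstack([acts_concept, acts_random])
y = np.array([1.0] * 50 + [-1.0] * 50)

# linear classifier via least squares; the weight vector (normal to the
# separating hyperplane) serves as the Concept Activation Vector
w, *_ = np.linalg.lstsq(X, y, rcond=None)
cav = w / np.linalg.norm(w)

# TCAV score: fraction of inputs whose class logit increases along the CAV,
# here with stand-in gradients of the logit w.r.t. the activations
grads = rng.normal(loc=0.5, size=(100, 8))
tcav_score = float(np.mean(grads @ cav > 0))
```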
11
Concept measures for continuous features
‣ Medical applications often rely on continuous measures, which can be
related to a clinical interpretation

‣ Such measures are often used to compute hand-crafted visual features.
‣ Regression Concept Vectors extend TCAV to continuous concept
measures [Graziani et al., 2018]

[Graziani et al., 2018] iMIMIC at MICCAI 2018
Table 1. Feature types and clinical interpretation
12
Regression Concept Vectors (RCVs)
Main steps:

‣ Selection of concept measures (set of images, annotations)

‣ LLS regression of the concept measure given the activation vector, giving the direction of greatest increase of the measure (the RCV)

‣ Computation of sensitivity and relevance scores
‣ Replace classification with Linear Least Squares (LLS) regression of the
concept measures for a set of inputs
Fig 5. Interpretation of the model of Brown et al. (SPIE 2018) with RCVs
‣ Sensitivity scores for individual explanations
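The main steps above can be sketched as follows, with synthetic activations standing in for the network's layer outputs (the ground-truth direction and the noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# activations of N inputs at one layer, and a continuous concept measure
# (e.g. the curvature mean of each image) -- synthetic stand-ins
N, D = 200, 16
acts = rng.normal(size=(N, D))
true_dir = rng.normal(size=D)
concept = acts @ true_dir + 0.1 * rng.normal(size=N)

# RCV: LLS regression of the concept measure given the activation vector;
# the coefficient vector is the direction of greatest increase of the concept
A = np.hstack([acts, np.ones((N, 1))])      # bias column
coef, *_ = np.linalg.lstsq(A, concept, rcond=None)
rcv = coef[:D] / np.linalg.norm(coef[:D])

# determination coefficient R^2 of the regression
pred = A @ coef
ss_res = np.sum((concept - pred) ** 2)
ss_tot = np.sum((concept - concept.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
```

A high R^2 indicates that the concept measure is well encoded (linearly) in that layer's activation space, which is what makes the RCV direction meaningful.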
13
Sensitivity and relevance
‣ Br for global explanations: the regression determination coefficient combined with the mean and standard deviation of the sensitivity scores
Fig 6. Directional derivative of the decision function over the RCV direction

More details can be found in [Graziani et al., 2018] iMIMIC at MICCAI 2018
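A schematic of the two scores, using stand-in gradients; the weighting inside Br below is a simplified form, and the precise definition is in [Graziani et al., 2018]:

```python
import numpy as np

rng = np.random.default_rng(3)

D = 16
rcv = rng.normal(size=D)
rcv /= np.linalg.norm(rcv)

# stand-in gradients of the class logit w.r.t. the layer activations,
# one row per test input
grads = rng.normal(loc=0.3, size=(100, D))

# sensitivity: directional derivative of the decision function along the RCV
sensitivities = grads @ rcv

# global relevance: the regression fit (R^2) combined with the mean and
# spread of the sensitivities; schematic form only
r2 = 0.8  # determination coefficient of the concept regression (stand-in)
br = r2 * sensitivities.mean() / sensitivities.std()
```

A large positive Br then indicates a concept that both regresses well onto the activations and consistently pushes the decision in one direction.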
14
Regression is better in plus
Figure 7. Comparison of the R2 for inputs of class normal (left) vs plus (right)
15
Sensitivity is negative for normal inputs
[Fig. 8 residue: two cases, each shown as raw and segmented image with individual relevance scores (scale -1 to 1) for cti median, cti mean, curvature median, avg point diameter mean, avg segment diameter median and curvature mean, next to the original concept measures. Predicted probabilities pn = 0.22, ppre = 0.70, pplus = 0.08 for one case and pn = 0.99, ppre = 0.009, pplus = 0.0 for the other (GT: normal; prediction: normal); original concept measures 1.082, 1.168, 0.118, 0.447, 5.24, 5.89 and 1.030, 1.045, 0.040, 0.095, 3.775, 4.247.]
Figure 8. Interpretation of the network’s decision for the single datapoint
16
Bidirectional scores give global explanations
Figure 9. Comparison of the R2 for inputs of class normal (left) vs plus (right)
17
Summary
Can we relate hand-crafted visual features to DL features?
YES!
‣ RCVs make it possible to measure the relevance of hand-crafted visual features
in the end-to-end classification by the deep network

‣ Sensitivity scores are large and positive for plus cases

‣ Concepts of tortuosity, curvature and dilation are relevant to the
classification
18
Questions?
https://github.com/medgift/SPIE2019_interpretableROP
http://medgift.hevs.ch/wordpress/
mara.graziani@hevs.ch

6
Detection of Plus: a Deep Learning approach
>5K images: 3024 training (1084 normal; 1074 pre-plus; 1080 plus); 965 validation (817 normal; 148 pre-plus; 20 plus)
U-Net vessel segmentation followed by Inception V1 classification into Normal, Pre-plus and Plus.
Performance comparable to experts and significantly higher than non-experts.
TRUST? INTERPRET? EXPLAIN?
If only we could make sure that the network is looking at the same things that we look at…
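The two-stage pipeline above can be sketched as follows. This is a minimal stand-in for illustration only: the threshold "segmenter" and the fixed linear head are hypothetical placeholders for the U-Net and Inception V1 of Brown et al., not their implementation.

```python
import numpy as np

CLASSES = ("normal", "pre-plus", "plus")

def segment_vessels(image: np.ndarray) -> np.ndarray:
    """Stand-in for the U-Net: a simple intensity threshold producing a vessel map."""
    return (image > image.mean()).astype(np.float32)

def classify(vessel_map: np.ndarray) -> np.ndarray:
    """Stand-in for Inception V1: a fixed linear head followed by a softmax."""
    features = np.array([vessel_map.mean(), vessel_map.std(), vessel_map.max()])
    logits = np.array([1.0, -0.5, 0.2]) * features.sum()
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# End-to-end: raw image -> vessel segmentation -> class probabilities.
image = np.random.default_rng(0).random((64, 64))
probs = classify(segment_vessels(image))
print(dict(zip(CLASSES, np.round(probs, 3))))
```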
7
Can we relate hand-crafted visual features to DL features?
8-10
Interpretability with Concept Activation Vectors [Kim B. et al., 2018] ICML 2018
1. Select concept and images
2. Classification in the activation space
3. Relevance scores for each concept
Fig 4. Credits: Testing with Concept Activation Vectors, Kim B. et al., 2018
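The three TCAV steps can be sketched with synthetic activations. The least-squares linear probe and the mocked gradients below are illustrative assumptions, not the original implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# Step 1: layer activations of concept images vs. random counterexamples
# (synthetic: concept examples are shifted along a hidden direction).
hidden = np.zeros(d); hidden[0] = 1.0
concept_acts = rng.normal(size=(100, d)) + 2.0 * hidden
random_acts = rng.normal(size=(100, d))

# Step 2: linear classifier in activation space; its normal vector is the CAV.
# (A least-squares linear probe stands in for the paper's linear classifier.)
X = np.vstack([concept_acts, random_acts])
y = np.r_[np.ones(100), -np.ones(100)]
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
cav = w[:d] / np.linalg.norm(w[:d])

# Step 3: TCAV score = fraction of class inputs whose logit gradient has a
# positive component along the CAV (gradients are mocked for this sketch).
grads = rng.normal(size=(50, d)) + 0.5 * cav
tcav_score = float(np.mean(grads @ cav > 0))
print(round(tcav_score, 2))
```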
11
Concept measures for continuous features
‣ Medical applications often rely on continuous measures that carry a clinical interpretation
‣ Such measures are often used to compute hand-crafted visual features
‣ Regression Concept Vectors (RCVs) extend TCAV to continuous concept measures [Graziani et al., 2018] iMIMIC at MICCAI 2018
Feature                           | Description                    | Clinical interpretation
curvature                         | κ(s): rate of direction change |
avg segment diameter              | #pixels / Lc(x)                | global dilation
avg point diameter                | Wn(x)                          | absolute dilation
Cumulative Tortuosity Index (CTI) | cti(x) = Lc(x)/Lx(x)           | curving, curling, twisting rate

Table 1. Handcrafted feature description and clinical interpretation. κ(s) describes the rate of change of direction between points with respect to the rate of change of the curve length between points. Lc and Lx denote respectively curve and chord length. Wn denotes the width of the vessel along the normal direction.

RCVs are computed by seeking, in the activation space of a layer, the direction of greatest increase of a set of measurements for one retinal concept. This direction is computed as the LLS regression of the retinal concept measures.
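As a concrete instance of Table 1, the CTI of a vessel is simply curve length over chord length. This minimal sketch assumes the vessel centerline is given as an ordered array of 2-D points:

```python
import numpy as np

def cumulative_tortuosity_index(points: np.ndarray) -> float:
    """CTI = Lc/Lx: curve length over chord length of a vessel centerline."""
    curve_len = float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))
    chord_len = float(np.linalg.norm(points[-1] - points[0]))
    return curve_len / chord_len

# A straight vessel has CTI = 1; a curved one has CTI > 1.
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
wavy = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(cumulative_tortuosity_index(straight))  # 1.0
print(cumulative_tortuosity_index(wavy))      # ≈ 1.414
```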
12
Regression Concept Vectors (RCVs)
Main steps:
‣ Selection of concept measures (set of images, annotations)
‣ LLS regression of the concept measure given the activation vectors: classification is replaced with a Linear Least Squares (LLS) regression of the concept measures for a set of inputs
‣ Computation of sensitivity and relevance scores
Fig 5. Interpretation of the model of Brown et al. (SPIE 2018) with RCVs. The trained network assigns each input to ‘normal’, ‘pre-plus’ or ‘plus’.
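A minimal numerical sketch of the LLS step, with synthetic activations standing in for a network layer and a synthetic concept measure (e.g. curvature mean); the hidden ground-truth direction exists only to verify the sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32

# Synthetic activations and a continuous concept measure that increases
# linearly along a hidden direction of the activation space.
true_dir = np.zeros(d); true_dir[0] = 1.0
acts = rng.normal(size=(200, d))
measure = acts @ true_dir + 0.05 * rng.normal(size=200)

# LLS regression of the measure on the activations; the normalized weight
# vector is the RCV, the direction of greatest increase of the concept.
A = np.c_[acts, np.ones(len(acts))]
w, *_ = np.linalg.lstsq(A, measure, rcond=None)
rcv = w[:d] / np.linalg.norm(w[:d])

# Determination coefficient R^2 of the fit (used later to weight relevance).
pred = A @ w
r2 = 1.0 - np.sum((measure - pred) ** 2) / np.sum((measure - measure.mean()) ** 2)

print(round(float(abs(rcv @ true_dir)), 2))  # close to 1: direction recovered
```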
13
Sensitivity and relevance
‣ Sensitivity score for individual explanations: directional derivative of the decision function along the RCV direction
‣ Bidirectional relevance Br for global explanations: combines the regression determination coefficient with the mean and standard deviation of the sensitivity scores
Fig 6. Directional derivative of the decision function over the RCV direction
More details can be found in [Graziani et al., 2018] iMIMIC at MICCAI 2018
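The sensitivity score is the directional derivative of the decision function along the RCV. A sketch with a toy linear decision function standing in for the network's "plus" logit; the finite difference then recovers w · rcv exactly:

```python
import numpy as np

def sensitivity(f, activation, rcv, eps=1e-4):
    """Directional derivative of decision function f along the RCV,
    estimated by a central finite difference."""
    return (f(activation + eps * rcv) - f(activation - eps * rcv)) / (2 * eps)

# Toy linear decision function (hypothetical, for illustration only).
w = np.array([2.0, -1.0, 0.5])
f = lambda h: float(w @ h)

rcv = np.array([1.0, 0.0, 0.0])  # hypothetical RCV for one concept
s = sensitivity(f, np.zeros(3), rcv)
print(round(s, 4))  # 2.0: the class score increases along the concept direction
```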
14
Regression fits better for plus inputs
Figure 7. Comparison of the R² for inputs of class normal (left) vs plus (right)
15
Sensitivity is negative for normal inputs
Figure 8. Interpretation of the network's decision for two single datapoints. Each case shows the raw and segmented image, the individual relevance of each concept (cti median, cti mean, curvature median, curvature mean, avg point diameter mean, avg segment diameter median) and the original concept measures (1.082, 1.168, 0.118, 0.447, 5.24, 5.89 and 1.030, 1.045, 0.040, 0.095, 3.775, 4.247). Prediction probabilities: pn = 0.22, ppre = 0.70, pplus = 0.08 (GT: normal; prediction: normal) and pn = 0.99, ppre = 0.009, pplus = 0.0.
16
Bidirectional scores give global explanations
Figure 9. Bidirectional relevance scores Br of the concepts (global explanation)
17
Summary
Can we relate hand-crafted visual features to DL features? YES!
‣ RCVs make it possible to measure the relevance of hand-crafted visual features in the end-to-end classification by the deep network
‣ Sensitivity scores are large and positive for plus cases
‣ Concepts of tortuosity, curvature and dilation are relevant to the classification