MEDICAL IMAGE COMPUTING (CAP 5937)
LECTURE 14: Evaluation Framework for Medical Image
Segmentation
Dr. Ulas Bagci
HEC 221, Center for Research in Computer
Vision (CRCV), University of Central Florida
(UCF), Orlando, FL 32814.
bagci@ucf.edu or bagci@crcv.ucf.edu
SPRING 2017
Outline
• How to evaluate accuracy of image segmentation?
– Gold standard ~ surrogate of truths
– Qualitative
• Visual
• Inter- and intra-observer agreement rates
– Quantitative
• Volumetric measurements (regression)
• Region overlaps
• Shape based measurements
• Theoretical comparisons
• STAPLE, Uncertainty guidance, and evaluation w/o truths
Visual Assessment
Manual image segmentation from the full spectrum of IDEAL MRI data to delineate red: SAT, green: VAT, blue: liver, yellow: pancreas, purple: kidneys. Left to right: water-only, fat-only, in-phase, out-of-phase, fat fraction, and segmented labels from SliceOmatic.
Reference: Assessment of Abdominal Adiposity and Organ Fat with Magnetic Resonance Imaging (chp11).
Inherent Uncertainty
Comparison of glioblastoma multiforme (GBM) segmentation results on an axial slice: semi-automatic segmentation under Slicer (green, left image) and pure manual segmentation (blue, middle image). Egger et al., Nat Sci Rep., 2012.
Inherent Uncertainty
red: endocardium; green: epicardium; yellow: ground truth
Queiros et al., European Heart Journal, 2016.
Segmentation Evaluation
Can be considered to consist of two components:
(1) Theoretical
Study mathematical equivalence among algorithms.
(2) Empirical
Study practical performance of algorithms in specific application
domains.
Segmentation Evaluation: Theoretical
Fundamental challenges in segmentation evaluation:
(Ch1) Are major pI (purely image based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, and watersheds truly distinct, or does some level of equivalence exist among them?
(Ch2) How to develop truly distinct methods constituting real advance?
(Ch3) How to choose a method for a given application domain?
(Ch4) How to set an algorithm optimally for an application domain?
Currently any method A can be shown empirically to be better than any method B, even when they are equivalent.
Segmentation Evaluation: Theoretical
Attributes commonly used by segmentation methods:
(1) Connectedness
(2) Texture
(3) Smoothness of boundary
(4) Gradient / homogeneity
(5) Shape information about object
(6) Noise handling
(7) Optimization employed
(8) Orientedness of boundary
Segmentation Evaluation: Theoretical
Attributes utilized by well-known delineation models

Model          Connected          Gradient           Texture            Smooth  Shape  Noise     Optimize
Fuzzy con      Yes                Gr = hom affinity  Obj feat affinity  No      No     Scale FC  In RFC
Chan-Vese      No                 No                 Yes                Yes     No     No        Yes
Mum-Shah       No                 No                 Yes                Yes     No     Yes       Yes
KWT snake      Boundary           Yes                No                 Yes     No     No        Yes
MSV LS         Fg when expandng   Yes                No                 No      No     No        No
Live wire      Boundary           Yes                Yes                Yes     User   No        Yes
Act. shape     Yes                No                 No                 No      Yes    No        Yes
Act. app       Yes                No                 Yes                No      Yes    No        Yes
Graph cut      Usly not           Yes                Possible           No      No     No        Yes
Clustering     No                 No                 Yes                No      No     No        Yes
Deep Learning  Yes                Yes                Yes                Yes     Yes    Yes       Yes
Segmentation Evaluation: Empirical
Application domain: A particular triple $\langle T, B, P \rangle$.
T: A task. Example: Estimating the volume of brain.
B: A body region. Example: Head.
P: Imaging protocol. Example: T2 weighted MR imaging with a particular set of parameters.
Q: A set of scenes acquired for a particular application domain $\langle T, B, P \rangle$.
Segmentation Evaluation: Empirical
The segmentation efficacy of a method M in an application domain $\langle T, B, P \rangle$ may be characterized by three groups of factors:
Precision (Reliability): Repeatability taking into account all subjective actions influencing the result.
Accuracy (Validity): Degree to which the result agrees with truth.
Efficiency (Viability): Practical viability of the method.
Validation of Image Segmentation
• Spectrum of accuracy versus realism in reference standard.
• Digital phantoms.
– Ground truth known accurately.
– Not so realistic.
• Acquisitions and careful segmentation.
– Some uncertainty in ground truth.
– More realistic.
• Autopsy/histopathology.
– Addresses pathology directly; resolution.
• Clinical data ?
– Hard to know ground truth.
– Most realistic model.
Slide Credit: N. Archip
Comparison To Higher Resolution
MRI Photograph MRI
Provided by Peter Ratiu and Florin Talos.
Credit: N. Archip
Segmentation Evaluation: Empirical
Precision
Repeatability taking into account all subjective actions that influence the segmentation result.
Intra operator variations
Inter operator variations
Intra scanner variations
Inter scanner variations
Inter scanner variations include variations due to the same brand and different brands.
Segmentation Evaluation: Empirical
Precision
A measure of precision for method M in a trial that produces $C_M^{O_1}$ and $C_M^{O_2}$ for situation $T_i$ is given by
$$PR_M^{T_i} = 1 - \frac{\left| C_M^{O_1} - C_M^{O_2} \right|}{\left( \left| C_M^{O_1} \right| + \left| C_M^{O_2} \right| \right) / 2}, \quad i = 3, 4.$$
Situations $T_i$: intra/inter operator, intra/inter scanner.
$C_M^{O_1}$, $C_M^{O_2}$ may be binary or fuzzy segmentations.
Segmentation Evaluation: Empirical
Accuracy
The degree to which segmentations agree with true segmentation.
Surrogates of truth are needed.
For any image C acquired for application domain $\langle T, B, P \rangle$:
$C_M^O$ - segmentation of O in C by method M,
$C_{td}$ - surrogate of true delineation of O in C.
[Figure: the true segmentation $C_{td}$ and the segmentation $C_M^O$ by algorithm M inside the reference superset $U_d$, with regions labeled TP, FP, FN, and TN.]
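Segmentation Evaluation: Empirical
$$FNVF_M^d = \frac{\left| C_{td} - C_M^O \right|}{\left| C_{td} \right|}, \qquad TPVF_M^d = \frac{\left| C_{td} \cap C_M^O \right|}{\left| C_{td} \right|},$$
$$FPVF_M^d = \frac{\left| C_M^O - C_{td} \right|}{\left| U_d - C_{td} \right|}, \qquad TNVF_M^d = \frac{\left| U_d - C_M^O - C_{td} \right|}{\left| U_d - C_{td} \right|}.$$
$U_d$: A binary scene representing a reference super set (for example, this may be the body region that is imaged).
$FNVF_M^d$: Amount of tissue truly in O that is missed by M.
$FPVF_M^d$: Amount of tissue falsely delineated by M.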
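As a rough illustration of the four volume fractions defined above, the following sketch computes them for binary masks with NumPy. The function name `delineation_fractions` and the default choice of the whole array as the reference superset $U_d$ are assumptions made for this example, not part of the lecture. It also assumes the truth surrogate and its complement are both non-empty.

```python
import numpy as np

def delineation_fractions(seg, truth, superset=None):
    """Return FNVF, TPVF, FPVF, TNVF for binary masks (illustrative sketch).

    seg      : segmentation C_M^O produced by method M
    truth    : surrogate of true delineation C_td
    superset : reference superset U_d (assumed: whole array if omitted)
    """
    seg = np.asarray(seg, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if superset is None:
        superset = np.ones_like(truth, dtype=bool)
    else:
        superset = np.asarray(superset, dtype=bool)

    # Restrict both masks to the reference superset U_d.
    seg = seg & superset
    truth = truth & superset

    n_truth = truth.sum()                    # |C_td|
    n_background = (superset & ~truth).sum() # |U_d - C_td|

    fn = (truth & ~seg).sum()                # |C_td - C_M^O|
    tp = (truth & seg).sum()                 # |C_td ∩ C_M^O|
    fp = (seg & ~truth).sum()                # |C_M^O - C_td|
    tn = (superset & ~seg & ~truth).sum()    # |U_d - C_M^O - C_td|

    return {
        "FNVF": fn / n_truth,
        "TPVF": tp / n_truth,
        "FPVF": fp / n_background,
        "TNVF": tn / n_background,
    }
```

By construction FNVF + TPVF = 1 and FPVF + TNVF = 1, so reporting one fraction from each pair is enough.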
Segmentation Evaluation: Empirical
Requirements for accuracy metrics:
(1) Capture M’s behavior of trade-off between FP and FN.
(2) Satisfy laws of tissue conservation:
    $FNVF_M^d = 1 - TPVF_M^d$
    $FPVF_M^d = 1 - TNVF_M^d$
(3) Capable of characterizing the range of behavior of M.
(4) Any monotonic function g(FNVF, FPVF) is fine as a metric.
(5) Appropriate for $\langle T, B, P \rangle$.
Segmentation Evaluation: Empirical
Delineation Operating Characteristic (DOC)
[Figure: DOC curve for brain WM segmentation in PD MR images; axes FPVF and 1-FNVF.]
Each value of parameter vector p of M gives a point on the DOC curve.
The DOC curve characterizes the behavior of M over a range of parametric values of M.
$A_M$: Area under the DOC curve
Segmentation Evaluation: Empirical
Optimally setting an algorithm for $\langle T, B, P \rangle$.
p - parameter vector for method M
$g_p(FPVF, FNVF)$ - monotonic fn
$p^* = \arg\min_p [g_p(FPVF, FNVF)]$
Set M to operate at p*.
[Figure: DOC curve with axes FPVF and 1-FNVF, each ranging from 0 to 1.]
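The DOC idea can be sketched with a toy method M whose only parameter p is an intensity threshold. Everything below is an assumption made for illustration: the thresholding method, the helper names, the choice g = FPVF + FNVF, and the (0, 0) and (1, 1) endpoints added before integrating. None of it is taken from the lecture.

```python
import numpy as np

def doc_points(image, truth, thresholds):
    """One (p, FPVF, FNVF) triple per threshold p; M is plain thresholding (assumed)."""
    truth = np.asarray(truth, dtype=bool)
    points = []
    for p in thresholds:
        seg = np.asarray(image) >= p
        fnvf = (truth & ~seg).sum() / truth.sum()
        fpvf = (seg & ~truth).sum() / (~truth).sum()
        points.append((float(p), float(fpvf), float(fnvf)))
    return points

def area_under_doc(points):
    """Trapezoidal area under the (FPVF, 1 - FNVF) curve.

    The endpoints (0, 0) and (1, 1) are added as an assumption so the
    curve spans the full FPVF range.
    """
    xy = sorted((fpvf, 1.0 - fnvf) for _, fpvf, fnvf in points)
    x = np.array([0.0] + [a for a, _ in xy] + [1.0])
    y = np.array([0.0] + [b for _, b in xy] + [1.0])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

def optimal_parameter(points, g=lambda fpvf, fnvf: fpvf + fnvf):
    """p* = arg min_p g(FPVF, FNVF); the default g is an illustrative choice."""
    return min(points, key=lambda t: g(t[1], t[2]))[0]
```

A different monotonic g, for instance one that penalizes false negatives more heavily, would move p* along the same curve, which is why the choice of g should reflect the application domain.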
Existing Segmentation Data
[Figure: original image and segmentations by Expert 1, Expert 2, Expert 3, and Expert 4.]
• Manual segmentation performed by 4 independent experts
• Low grade glioma
Expert and Student Segmentations
[Figure: test image with the expert consensus segmentation and the segmentations by Student 1, Student 2, and Student 3; the panels are first shown unlabeled.]
Segmentation Evaluation: Empirical
Efficiency
Describes practical viability of a method.
Four factors should be considered:
(1) Computational time for one-time training of M ($t_M^{c1}$)
(2) Computational time for segmenting each scene ($t_M^{c2}$)
(3) Human time for one-time training of M ($t_M^{h1}$)
(4) Human time for segmenting each scene ($t_M^{h2}$)
(2) and (4) are crucial. (4) determines the degree of automation of M.
Segmentation Evaluation: Empirical
Precision:
$PR_M^{T_1}$: intra operator
$PR_M^{T_2}$: inter operator
$PR_M^{T_3}$: intra scanner
$PR_M^{T_4}$: inter scanner
Accuracy:
$FNVF_M^d$: FN fraction for delineation
$FPVF_M^d$: FP fraction for delineation
$A_M$: Area under the DOC curve
Efficiency:
$t_M^{c1}$: computational time for algorithm training.
$t_M^{c2}$: computational time for scene segmentation.
$t_M^{h1}$: operator time for algorithm training.
$t_M^{h2}$: operator time for scene segmentation.
Remarks
(1) Precision, accuracy, efficiency are interdependent.
    accuracy → efficiency.
    precision and accuracy → difficult.
(2) “Automatic segmentation method” has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, efficiency, and with $t_M^{h2} = 0$.
(3) A descriptive answer to “is method M1 better than M2 under $\langle T, B, P \rangle$?” in terms of the 11 parameters is more meaningful than a “yes” or “no” answer.
(4) DOC is essential to describe the range of behavior of M.
Velazquez et al, Scientific Reports 2013.
Shape Based Metrics for Segmentation Evaluation
Example 1: Sensitivity = 94.69%, Specificity = 94.19%
Example 2: Sensitivity = 72.99%, Specificity = 78.16%
If you use only the DSC (Dice similarity coefficient, an overlap measure), the DSC values are similar to each other in both examples, but the sensitivity and specificity values are not.
Is the overlap measure alone sufficient?
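To see how the Dice coefficient, sensitivity, and specificity can diverge, the three can be computed side by side from the same confusion counts. This is a minimal sketch for binary masks; the function name `overlap_metrics` is hypothetical, and the code assumes each denominator is non-zero.

```python
import numpy as np

def overlap_metrics(seg, truth):
    """Dice, sensitivity, and specificity for two binary masks (illustrative sketch)."""
    seg = np.asarray(seg, dtype=bool)
    truth = np.asarray(truth, dtype=bool)

    tp = (seg & truth).sum()    # voxels labeled object in both
    fp = (seg & ~truth).sum()   # voxels falsely labeled object
    fn = (~seg & truth).sum()   # object voxels that were missed
    tn = (~seg & ~truth).sum()  # background voxels in both

    return {
        "dice": 2.0 * tp / (2.0 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Note that specificity depends on how much background is included in the masks, whereas Dice ignores true negatives entirely. That is one reason the two kinds of measures can tell different stories.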
Hausdorff Distance
• Can be used as a complementary evaluation metric to the overlap measure for measuring boundary mismatches!
• Lower Hausdorff Distance (HD), better segmentation accuracy!
$$HD(A, B) = \max\left( \max_{a \in A} d_B(a),\ \max_{b \in B} d_A(b) \right)$$
$d_B(a) = \min_{b \in B} d(a, b)$ is the distance of one point a on A from B.
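The definition above translates almost directly into a brute-force computation on two sets of boundary points. The sketch below assumes Euclidean distance for d(a, b) and point sets given as coordinate arrays. The function names are chosen for this example, and the full pairwise distance matrix makes it suitable only for small point sets.

```python
import numpy as np

def directed_distance(A, B):
    """max over a in A of d_B(a), where d_B(a) = min over b in B of ||a - b||."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise Euclidean distances, shape (len(A), len(B)).
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return float(d.min(axis=1).max())

def hausdorff_distance(A, B):
    """Symmetric Hausdorff distance HD(A, B) as the max of both directed terms."""
    return max(directed_distance(A, B), directed_distance(B, A))
```

The two directed terms are generally different, so taking only one of them can hide a boundary mismatch that the symmetric form exposes.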
Segmentation Evaluation: STAPLE
38
• STAPLE (Simultaneous Truth and Performance Level
Estimation):
– An algorithm for estimating performance and ground truth from a
collection of independent segmentations.
– Warfield, Zou, Wells MICCAI 2002.
– Warfield, Zou, Wells, IEEE TMI 2004.
– Publicly Available
– The STAPLE algorithm ( Warfield et al., 2004) is a region formulation
for producing consensus segmentations.
– When foreground is small → weight w is small
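The idea of estimating performance and a consensus together can be sketched as an expectation-maximization loop over binary rater decisions. This is a simplified illustration in the spirit of Warfield et al., not the publicly available implementation. The single global prior (taken as the mean of all decisions), the initial value 0.99, the fixed iteration count, and the clipping of parameters away from 0 and 1 are all assumptions made here.

```python
import numpy as np

def staple_binary(decisions, n_iter=50, init=0.99, eps=1e-6):
    """Simplified binary STAPLE-style EM (illustrative sketch).

    decisions : (n_voxels, n_raters) array of 0/1 labels.
    Returns (W, p, q): per-voxel probability of foreground, and the
    estimated sensitivity p and specificity q of each rater.
    """
    D = np.asarray(decisions, dtype=bool)
    n_raters = D.shape[1]
    p = np.full(n_raters, init)   # sensitivities
    q = np.full(n_raters, init)   # specificities
    prior = D.mean()              # assumed global foreground prior

    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        a = prior * np.prod(np.where(D, p, 1.0 - p), axis=1)
        b = (1.0 - prior) * np.prod(np.where(D, 1.0 - q, q), axis=1)
        W = a / (a + b)

        # M-step: re-estimate each rater's performance from the soft truth.
        p = (W[:, None] * D).sum(axis=0) / W.sum()
        q = ((1.0 - W)[:, None] * ~D).sum(axis=0) / (1.0 - W).sum()

        # Keep parameters strictly inside (0, 1) to avoid 0/0 (assumption).
        p = np.clip(p, eps, 1.0 - eps)
        q = np.clip(q, eps, 1.0 - eps)

    return W, p, q
```

Thresholding W at 0.5 gives a consensus segmentation, while p and q summarize how each rater performed against it. Because every voxel is treated independently, this sketch has no spatial regularity in its model.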
Segmentation Evaluation: STAPLE
• Segmentations are generated by sampling independently at
each voxel.
• However, the produced segmentations may not be realistic
for two reasons.
– First, the variability of the segmentation does not account for the intensity in the image, so borders with strong gradients are as variable as borders with weak gradients. This is counterintuitive, as the basic hypothesis of image segmentation is that changes of intensity are correlated with changes of labels.
– Second, borders of the segmented structures are unrealistic, mainly due to their lack of geometric regularity.
Regression Analysis in Clinical Problems
• Linear regression between volume(s)
– automated segmentation’s volume vs. manual segmentation’s volume
– Bland-Altman plot
• Linear regression between visual inspection (raters)
– Kappa statistics
– t-test / p-value
• Significantly different volumes ? Score ?
40
Regression Analysis in Clinical Problems
41
Manual segmentation
Vedentham, et al. JCIS, 2014
What is a Bland-Altman plot?
• A Bland-Altman plot is a method of data plotting used in analyzing the agreement between two different assays.
• Claim: any two methods that are designed to measure the same parameter should have good correlation.
– X-axis: mean of the two measurements
– Y-axis: difference between the two values
• Good first step in analyzing the data!
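The quantities behind a Bland-Altman plot are simple to compute, for example when comparing automated and manual segmentation volumes. The sketch below returns the plot coordinates plus the bias and the conventional 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The limits are standard practice rather than something stated on the slide, and the function name is an assumption.

```python
import numpy as np

def bland_altman(measure_1, measure_2):
    """Coordinates and summary statistics for a Bland-Altman plot (sketch).

    Returns (mean, diff, bias, limits): per-case x and y coordinates, the
    mean difference, and the (lower, upper) 95% limits of agreement.
    """
    m1 = np.asarray(measure_1, dtype=float)
    m2 = np.asarray(measure_2, dtype=float)

    mean = (m1 + m2) / 2.0      # x-axis: mean of the two measurements
    diff = m1 - m2              # y-axis: difference between the two values
    bias = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation of differences

    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return mean, diff, bias, limits
```

Plotting `diff` against `mean` with horizontal lines at the bias and the two limits gives the usual figure. A bias far from zero suggests a systematic offset between the two methods.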
Bland-Altman Plots (e.g., airway segmentation
evaluation)
44
Xu, Bagci, et al. MedIA, 2015.
New Directions: Sampling Image Segmentations (Le et al, MedIA, 2016)
• Automatically produce plausible image segmentation samples from a single expert segmentation!
• A probability distribution of image segmentation boundaries is defined as a Gaussian process, which leads to segmentations that are spatially coherent and consistent with the presence of salient borders in the image.
The Gaussian Density
47
Remark: Gaussian Process (GP) ?
48
Credit: Ghahramani
Sample segmentation contours according to mean inter-sample Dice coefficient!
(Top Left) Mean of the GP µ; (Top Middle) Sample of the level set function φ(a) drawn from 𝒢𝒫(µ,Σ); (Others) GPSSI samples. The ground truth is outlined in red, the GPSSI samples are outlined in orange.
New Directions: Sampling Image Segmentations (Le et al, MedIA, 2016)
(Left) Signed geodesic distance µ(a) of the ROI with isocontours –45, 0, 45, 100, 200. (Right) One can check that the samples most probably lie in the region delineated by the isocontours µ(a)=±45. The sampled contours are in orange.
Provocative Question?
• Can we evaluate segmentation error without the ground truth?
– With machine learning support, can we design a classifier that LEARNS segmentation error and adapts itself for better delineation?
Summary
• Segmentation Evaluation
– Theoretical vs. Empirical
– Visual Assessment
– Volumetric Agreement
– Efficacy (efficiency, accuracy, …)
– STAPLE
– New Trends!
– Segmentation Challenges (choose your project!)
60
Slide Credits and References
• Credits to: Jayaram K. Udupa of Univ. of Penn., MIPG
• Bagci’s CV Course 2015 Fall.
• K.D. Toennies, Guide to Medical Image Analysis,
• Handbook of Medical Imaging, Vol. 2. SPIE Press.
• Handbook of Biomedical Imaging, Paragios, Duncan, Ayache.
• Suetens, P., Medical Imaging, Cambridge Press.
• Neculai Archip, Ph.D
• Simon K. Warfield, Ph.D. (See STAPLE Algorithm)
61


Lec14: Evaluation Framework for Medical Image Segmentation

  • 1. MEDICAL IMAGE COMPUTING (CAP 5937) LECTURE 14: Evaluation Framework for Medical Image Segmentation Dr. Ulas Bagci HEC 221, Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), Orlando, FL 32814. bagci@ucf.edu or bagci@crcv.ucf.edu 1SPRING 2017
  • 2. Outline • How to evaluate accuracy of image segmentation? – Gold standard ~ surrogate of truths – Qualitative • Visual • Inter- and intra-observer agreement rates – Quantitative • Volumetric measurements (regression) • Region overlaps • Shape based measurements • Theoretical comparisons • STAPLE, Uncertainty guidance, and evaluation w/o truths 2
  • 3. Visual Assessment 3 Manual image segmentation from the full spectrum of IDEAL MRI data to delineate red: SAT, green: VAT, blue: liver, yellow: pancreas, purple: kidneys. Left to right: water- only, fat-only, in- phase, out-of-phase, fat fraction, and segmented labels from SliceOmatic. Reference: Assessment of Abdominal Adiposity and Organ Fat with Magnetic Resonance Imaging (chp11).
  • 4. Inherent Uncertainty 4 Comparison of glioblastoma multiforme (GBM) segmentation results on an axial slice: semi- automatic segmentation under Slicer (green, left image) and pure manual segmentation (blue, middle image). Egger et al., Nat Sci Rep., 2012.
  • 5. Inherent Uncertainty 5 red: endocardium; green: epicardium; yellow: ground truth Queiros et al., European Heart Journal, 2016.
  • 6. Segmentation Evaluation Can be considered to consist of two components: (1) Theoretical Study mathematical equivalence among algorithms. (2) Empirical Study practical performance of algorithms in specific application domains. 6
  • 7. Segmentation Evaluation: Theoretical Fundamental challenges in segmentation evaluation: (Ch1) Are major pI (purely Image based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them? 7
  • 8. Segmentation Evaluation: Theoretical Fundamental challenges in segmentation evaluation: (Ch1) Are major pI (purely Image based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them? (Ch2) How to develop truly distinct methods constituting real advance? 8
  • 9. Segmentation Evaluation: Theoretical Fundamental challenges in segmentation evaluation: (Ch1) Are major pI (purely Image based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them? (Ch2) How to develop truly distinct methods constituting real advance? (Ch3) How to choose a method for a given application domain? 9
  • 10. Segmentation Evaluation: Theoretical Fundamental challenges in segmentation evaluation: (Ch1) Are major pI (purely Image based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them? (Ch2) How to develop truly distinct methods constituting real advance? (Ch3) How to choose a method for a given application domain? (Ch4) How to set an algorithm optimally for an application domain? 10
  • 11. Segmentation Evaluation: Theoretical Fundamental challenges in segmentation evaluation: (Ch1) Are major pI (purely Image based) frameworks such as active contours, level sets, graph cuts, fuzzy connectedness, watersheds, truly distinct or some level of equivalence exists among them? (Ch2) How to develop truly distinct methods constituting real advance? (Ch3) How to choose a method for a given application domain? (Ch4) How to set an algorithm optimally for an application domain? Currently any method A can be shown empirically to be better than any method B, even when they are equivalent. 11
  • 12. Segmentation Evaluation: Theoretical Attributes commonly used by segmentation methods: (1) Connectedness (2) Texture (3) Smoothness of boundary (4) Gradient / homogeneity (5) Shape information about object (6) Noise handling (7) Optimization employed (8) Orientedness of boundary
  • 13. Attributes utilized by well-known delineation models Connected Gradient Texture Smooth Shape Noise Optimize Fuzzy con Yes Gr = hom affinity Obj feat affinity No No Scale FC In RFC Chan-Vese No No Yes Yes No No Yes Mum-Shah No No Yes Yes No Yes Yes KWT snake Boundary Yes No Yes No No Yes MSV LS Fg when expandng Yes No No No No No Live wire Boundary Yes Yes Yes User No Yes Act. shape Yes No No No Yes No Yes Act. app Yes No Yes No Yes No Yes Graph cut Usly not Yes Possible No No No Yes Clustering No No Yes No No No Yes SEGMENTATION EVALUATION: Theoretical
  • 14. Attributes utilized by well-known delineation models Connected Gradient Texture Smooth Shape Noise Optimize Fuzzy con Yes Gr = hom affinity Obj feat affinity No No Scale FC In RFC Chan-Vese No No Yes Yes No No Yes Mum-Shah No No Yes Yes No Yes Yes KWT snake Boundary Yes No Yes No No Yes MSV LS Fg when expandng Yes No No No No No Live wire Boundary Yes Yes Yes User No Yes Act. shape Yes No No No Yes No Yes Act. app Yes No Yes No Yes No Yes Graph cut Usly not Yes Possible No No No Yes Clustering No No Yes No No No Yes SEGMENTATION EVALUATION: Theoretical Deep Learning Yes Yes Yes Yes Yes Yes Yes
  • 15. Segmentation Evaluation: Empirical T : B : P : Example: Estimating the volume of brain. A body region - Imaging protocol - Application domain: A particular triple . A task - Example: Head. Example: T2 weighted MR imaging with a particular set of parameters. Q: A set of scenes acquired for a particular application domain , ,á ñT B P , , .T B Pá ñ
  • 16. Segmentation Evaluation: Empirical 16 The segmentation efficacy of a method M in an application domain may be characterized by three groups of factors: Precision : (Reliability) Repeatability taking into account all subjective actions influencing the result. Accuracy : (Validity) Degree to which the result agrees with truth. Efficiency : (Viability) Practical viability of the method. , ,T B Pá ñ
  • 17. Validation of Image Segmentation • Spectrum of accuracy versus realism in reference standard. • Digital phantoms. – Ground truth known accurately. – Not so realistic. • Acquisitions and careful segmentation. – Some uncertainty in ground truth. – More realistic. • Autopsy/histopathology. – Addresses pathology directly; resolution. • Clinical data ? – Hard to know ground truth. – Most realistic model. Slide Credit: N. Archip
  • 18. Comparison To Higher Resolution MRI Photograph MRI Provided by Peter Ratiu and Florin Talos. Credit: N. Archip
  • 19. Segmentation Evaluation: Empirical 19 Intra operator variations Inter operator variations Intra scanner variations Inter scanner variations Inter scanner variations include variations due to the same brand and different brands. Repeatability taking into account all subjective actions that influence the segmentation result. Precision
  • 20. Segmentation Evaluation: Empirical 20 Precision ( ) - 1 - , = 3, 4. + 2 1 2 i 1 2 O O M MT M O O M M PR i= C C C C A measure of precision for method M in a trial that produces and for situation Ti is given by Intra/inter operator Intra/inter scanner may be binary or fuzzy segmentations. 1O MC 2O MC CM O1 ,CM O2
  • 21. Segmentation Evaluation: Empirical 21 Accuracy The degree to which segmentations agree with true segmentation. Surrogates of truth are needed. For any image C acquired for application domain CM O - segmentation of O in C by method M, Ctd - surrogate of true delineation of O in C.
  • 23. Segmentation Evaluation: Empirical 23 FNVFM d = Ctd − CM O Ctd , TPVFM d = Ctd ∩ CM O Ctd FPVFM d = CM O − Ctd Ud - Ctd , TNVFM d = Ud − CM O -Ctd Ud -Ctd , Ud : A binary scene representing a reference super set (for example, this may be the body region that is imaged). : Amount of tissue truly in that is missed by . : Amount of tissue falsely delineated by . d M d M FNVF O M FPVF M
  • 24. Segmentation Evaluation: Empirical 24 Requirements for accuracy metrics: (1) Capture M’s behavior of trade-off between FP and FN. (2) Satisfy laws of tissue conservation: (3) Capable of characterizing the range of behavior of M. (4) Any monotonic function g(FNVF, FPVF) is fine as a metric. (5) Appropriate for 1 1 d d M M d d M M FNVF TPVF FPVF TNVF = - = - , , .T B Pá ñ
  • 26. Segmentation Evaluation: Empirical. Delineation Operating Characteristic (DOC): a plot of 1 − FNVF against FPVF. Example: brain WM segmentation in PD MR images. Each value of the parameter vector p of M gives a point on the DOC curve. The DOC curve characterizes the behavior of M over a range of its parametric values. $A_M$: area under the DOC curve.
  • 27. Segmentation Evaluation: Empirical. Optimally setting an algorithm for $\langle T, B, P \rangle$ (DOC axes: FPVF and 1 − FNVF, each ranging from 0 to 1). p: parameter vector for method M. $g_p(FPVF, FNVF)$: monotonic function. $p^* = \arg\min_p [g_p(FPVF, FNVF)]$. Set M to operate at $p^*$.
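As a rough sketch of how a DOC curve and the operating point p* could be computed, the example below sweeps a single threshold parameter. Everything specific here is an assumption for illustration: the toy method (label a voxel as object when its intensity is at least p), the made-up intensities, the trapezoidal area estimate with the end points (0, 0) and (1, 1) added, and the choice g = sqrt(FPVF² + FNVF²) as one possible monotonic function.

```python
import math


def doc_point(intensities, truth, p):
    """(FPVF, TPVF) of the toy method 'voxel is object if intensity >= p'."""
    seg = [v >= p for v in intensities]
    n_obj = sum(truth)
    n_bg = len(truth) - n_obj
    tp = sum(1 for s, t in zip(seg, truth) if s and t)
    fp = sum(1 for s, t in zip(seg, truth) if s and not t)
    return fp / n_bg, tp / n_obj


def doc_curve(intensities, truth, params):
    """One (p, FPVF, 1 - FNVF) row per parameter value."""
    return [(p, *doc_point(intensities, truth, p)) for p in params]


def area_under_doc(curve):
    """Trapezoidal estimate of A_M; (0, 0) and (1, 1) are added as end points."""
    pts = sorted({(fp, tp) for _, fp, tp in curve} | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


def best_parameter(curve, g=math.hypot):
    """p* = argmin_p g(FPVF, FNVF); math.hypot is an assumed choice of g."""
    return min(curve, key=lambda row: g(row[1], 1.0 - row[2]))[0]


# Hypothetical intensities: the first five voxels are background, the last five object.
intensities = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9, 0.35]
truth = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
curve = doc_curve(intensities, truth, [0.0, 0.25, 0.45, 0.65, 1.0])
a_m = area_under_doc(curve)
p_star = best_parameter(curve)
```

A different g (for example a weighted sum that penalizes false negatives more heavily) would generally move p* along the curve, which is why the choice of g should reflect the application domain.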
  • 28. Existing Segmentation Data. Figure panels: original image, Expert 1, Expert 2, Expert 3, Expert 4. • Manual segmentation performed by 4 independent experts • Low-grade glioma
  • 29. Expert and Student Segmentations. Figure panels: test image and four unlabeled segmentations.
  • 30. Expert and Student Segmentations. Figure panels: test image, expert consensus, Student 1, Student 2, Student 3.
  • 31. Segmentation Evaluation: Empirical. Efficiency: describes the practical viability of a method. Four factors should be considered: (1) Computational time for one-time training of M, $t_M^{c_1}$. (2) Computational time for segmenting each scene, $t_M^{c_2}$. (3) Human time for one-time training of M, $t_M^{h_1}$. (4) Human time for segmenting each scene, $t_M^{h_2}$. (2) and (4) are crucial. (4) determines the degree of automation of M.
  • 32. Segmentation Evaluation: Empirical. Precision: $PR_M^{T_1}$: intra operator; $PR_M^{T_2}$: inter operator; $PR_M^{T_3}$: intra scanner; $PR_M^{T_4}$: inter scanner. Accuracy: $FNVF_M^d$: FN fraction for delineation; $FPVF_M^d$: FP fraction for delineation; $A_M$: area under the DOC curve. Efficiency: $t_M^{c_1}$: computational time for algorithm training; $t_M^{c_2}$: computational time for scene segmentation; $t_M^{h_1}$: operator time for algorithm training; $t_M^{h_2}$: operator time for scene segmentation.
  • 33. Remarks. (1) Precision, accuracy, and efficiency are interdependent. accuracy → efficiency. precision and accuracy → difficult. (2) “Automatic segmentation method” has no meaning unless the results are proven on a large number of data sets with acceptable precision, accuracy, and efficiency, and with $t_M^{h_2} = 0$. (3) A descriptive answer to “is method M1 better than M2 under $\langle T, B, P \rangle$?” in terms of the 11 parameters is more meaningful than a “yes” or “no” answer. (4) DOC is essential to describe the range of behavior of M.
  • 34. Velazquez et al., Scientific Reports, 2013.
  • 35. Shape Based Metrics for Segmentation Evaluation. Example 1: Sensitivity = 94.69%, Specificity = 94.19%. Example 2: Sensitivity = 72.99%, Specificity = 78.16%. If only the DSC (Dice similarity coefficient, an overlap measure) is used, the two examples receive similar DSC values, even though their sensitivity and specificity values differ. Is the DSC alone sufficient?
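The point that an overlap score can hide very different error patterns can be checked on a toy case. The sketch below is only an illustration with made-up voxel sets (it does not reproduce the figures on the slide): an under-segmentation and an over-segmentation are constructed to have the same DSC while their sensitivity and specificity differ. The function name `overlap_metrics` is hypothetical.

```python
def overlap_metrics(truth, seg, universe):
    """Dice, sensitivity, and specificity for binary segmentations given as voxel sets."""
    tp = len(truth & seg)
    fp = len(seg - truth)
    fn = len(truth - seg)
    tn = len(universe - truth - seg)
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical 1-D "image" of 100 voxels with a 30-voxel object.
universe = set(range(100))
truth = set(range(30))
under = overlap_metrics(truth, set(range(18)), universe)  # misses 12 object voxels
over = overlap_metrics(truth, set(range(50)), universe)   # adds 20 background voxels
```

Both segmentations score a DSC of 0.75, yet the first has sensitivity 0.6 with perfect specificity and the second has perfect sensitivity with specificity 50/70.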
  • 37. Hausdorff Distance • Can be used as an evaluation metric complementary to the overlap measure, for measuring boundary mismatches. • Lower Hausdorff distance (HD) means better segmentation accuracy. $HD(A, B) = \max\left( \max_{a \in A} d_B(a), \; \max_{b \in B} d_A(b) \right)$, where $d_B(a) = \min_{b \in B} d(a, b)$ is the distance of a point a on A from B.
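A brute-force version of the Hausdorff distance follows directly from the definition. This is a minimal sketch, assuming the two boundaries are given as small lists of point coordinates and that d is the Euclidean distance (`math.dist`, available from Python 3.8); the function names and the sample points are hypothetical. For large boundaries a spatial index or a distance transform would be the usual choice.

```python
import math


def directed_hausdorff(a_pts, b_pts):
    """max over a in A of d_B(a), where d_B(a) = min over b in B of d(a, b)."""
    return max(min(math.dist(a, b) for b in b_pts) for a in a_pts)


def hausdorff(a_pts, b_pts):
    """Symmetric Hausdorff distance HD(A, B)."""
    return max(directed_hausdorff(a_pts, b_pts), directed_hausdorff(b_pts, a_pts))


# Hypothetical boundaries that agree except for one outlying point on B.
boundary_a = [(0, 0), (1, 0), (2, 0)]
boundary_b = [(0, 0), (1, 0), (2, 3)]
hd = hausdorff(boundary_a, boundary_b)
```

Here the directed distance from A to B is 1 while the one from B to A is 3, so a single outlying boundary point determines HD. That sensitivity to outliers is what makes it a useful complement to overlap measures.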
  • 38. Segmentation Evaluation: STAPLE • STAPLE (Simultaneous Truth and Performance Level Estimation): – An algorithm for estimating performance and ground truth from a collection of independent segmentations. – Warfield, Zou, Wells, MICCAI 2002. – Warfield, Zou, Wells, IEEE TMI 2004. – Publicly available. – The STAPLE algorithm (Warfield et al., 2004) is a region formulation for producing consensus segmentations. – When foreground is small → weight w is small.
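To make the idea of jointly estimating a consensus and per-rater performance concrete, here is a minimal expectation-maximization sketch for binary labels in the spirit of STAPLE. It is not the reference implementation: the single global prior taken from the mean foreground fraction, the initial sensitivity and specificity of 0.9, the fixed iteration count, the clamping constant, and the name `staple_binary` are all assumptions made for illustration.

```python
def staple_binary(decisions, n_iter=50, p0=0.9, q0=0.9, eps=1e-6):
    """EM consensus for binary segmentations.

    decisions[j][i] is the 0/1 label given by rater j to voxel i.
    Returns (w, p, q): per-voxel foreground probabilities, per-rater
    sensitivities, and per-rater specificities.
    """
    n_raters, n_vox = len(decisions), len(decisions[0])
    prior = sum(sum(row) for row in decisions) / (n_raters * n_vox)
    prior = min(max(prior, eps), 1.0 - eps)
    p = [p0] * n_raters
    q = [q0] * n_raters
    w = [prior] * n_vox
    for _ in range(n_iter):
        # E-step: posterior probability that each voxel is foreground.
        for i in range(n_vox):
            a, b = prior, 1.0 - prior
            for j in range(n_raters):
                if decisions[j][i]:
                    a *= p[j]
                    b *= 1.0 - q[j]
                else:
                    a *= 1.0 - p[j]
                    b *= q[j]
            w[i] = a / (a + b)
        # M-step: re-estimate each rater's sensitivity and specificity.
        sum_w = max(sum(w), eps)
        sum_not_w = max(n_vox - sum(w), eps)
        for j in range(n_raters):
            p_j = sum(wi for wi, d in zip(w, decisions[j]) if d) / sum_w
            q_j = sum(1.0 - wi for wi, d in zip(w, decisions[j]) if not d) / sum_not_w
            p[j] = min(max(p_j, eps), 1.0 - eps)  # keep strictly inside (0, 1)
            q[j] = min(max(q_j, eps), 1.0 - eps)
    return w, p, q


# Hypothetical raters: two agree with each other, the third makes four errors.
raters = [
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 0, 0],
]
w, sens, spec = staple_binary(raters)
consensus = [int(wi > 0.5) for wi in w]
```

In this toy run the consensus follows the two agreeing raters, and the third rater ends up with a lower estimated sensitivity and specificity, which is the "performance level" part of the estimate.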
  • 39. Segmentation Evaluation: STAPLE • Segmentations are generated by sampling independently at each voxel. • However, the produced segmentations may not be realistic, for two reasons. – First, the variability of the segmentation does not account for the intensity in the image, so borders with strong gradients are as variable as borders with weak gradients. This is counterintuitive, as the basic hypothesis of image segmentation is that changes of intensity are correlated with changes of labels. – Second, borders of the segmented structures are unrealistic, mainly due to their lack of geometric regularity.
  • 40. Regression Analysis in Clinical Problems • Linear regression between volume(s) – automated segmentation’s volume vs. manual segmentation’s volume – Bland-Altman plot • Linear regression between visual inspection (raters) – Kappa statistics – t-test / p-value • Significantly different volumes? Score?
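The volume comparison can be sketched as an ordinary least-squares fit of automated volumes against manual volumes, together with the Pearson correlation. The volumes below are invented purely for illustration (they are not results from any study), and the function name `linear_fit` is hypothetical.

```python
import math


def linear_fit(x, y):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical volumes (arbitrary units) for five cases.
manual_volumes = [10.0, 20.0, 30.0, 40.0, 50.0]
automated_volumes = [12.0, 21.0, 33.0, 41.0, 53.0]
slope, intercept, r = linear_fit(manual_volumes, automated_volumes)
```

A slope near 1 with an intercept near 0 suggests that the automated volumes track the manual ones without systematic scaling or offset. A high r alone does not show agreement, which is one reason to pair the regression with a Bland-Altman plot.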
  • 41. Regression Analysis in Clinical Problems. Figure axis label: manual segmentation. Vedentham et al., JCIS, 2014.
  • 43. What is a Bland-Altman plot? • A method of data plotting used in analyzing the agreement between two different assays. • Claim: any two methods that are designed to measure the same parameter should have good correlation. – X-axis: mean of the two measurements – Y-axis: difference between the two values • A good first step in analyzing the data!
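The quantities behind a Bland-Altman plot are simple to compute. The sketch below returns the per-case means (X-axis) and differences (Y-axis) described above. It also returns the mean difference (bias) and the 95% limits of agreement at bias ± 1.96 standard deviations, which are commonly drawn on such plots but are an addition to what the slide lists. The data and the name `bland_altman` are hypothetical.

```python
import statistics


def bland_altman(a, b):
    """Per-case means and differences, bias, and 95% limits of agreement."""
    means = [(x + y) / 2 for x, y in zip(a, b)]
    diffs = [x - y for x, y in zip(a, b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Hypothetical paired volume measurements (arbitrary units).
automated = [12.0, 21.0, 33.0, 41.0, 53.0]
manual = [10.0, 20.0, 30.0, 40.0, 50.0]
means, diffs, bias, limits = bland_altman(automated, manual)
```

Plotting `diffs` against `means` with horizontal lines at `bias` and at the two limits gives the usual picture. A bias far from zero or a trend in the differences points to systematic disagreement between the two methods.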
  • 44. Bland-Altman Plots (e.g., airway segmentation evaluation). Xu, Bagci, et al., MedIA, 2015.
  • 46. New Directions: Sampling Image Segmentations (Le et al., MedIA, 2016) • Automatically produce plausible image segmentation samples from a single expert segmentation! • A probability distribution over image segmentation boundaries is defined as a Gaussian process, which leads to segmentations that are spatially coherent and consistent with the presence of salient borders in the image.
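The general mechanism of drawing spatially coherent segmentation samples from a Gaussian process can be sketched on a 1-D toy problem. This is not the method of Le et al.: the squared-exponential kernel, the 1-D grid, the convention that the object is where the level set function is at most zero, and all function names are assumptions chosen to keep the example short. A sample is drawn as mean plus L z, where L is the Cholesky factor of the covariance and z is standard normal noise.

```python
import math
import random


def sq_exp_kernel(xs, variance, length_scale):
    """Squared-exponential covariance matrix over the positions xs."""
    return [[variance * math.exp(-((a - b) ** 2) / (2 * length_scale ** 2)) for b in xs]
            for a in xs]


def cholesky(m, jitter):
    """Lower-triangular L with L L^T = m + jitter * I."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] + jitter - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def sample_level_set(mu, xs, variance, length_scale, rng):
    """Draw one level set function phi ~ GP(mu, Sigma) on the grid xs."""
    cov = sq_exp_kernel(xs, variance, length_scale)
    low = cholesky(cov, jitter=1e-6 * variance)  # jitter keeps the factorization stable
    z = [rng.gauss(0.0, 1.0) for _ in xs]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]


# Hypothetical mean: signed distance to an interval centred at x = 10 (negative inside).
xs = list(range(21))
mu = [abs(x - 10) - 4.5 for x in xs]
phi = sample_level_set(mu, xs, variance=4.0, length_scale=2.0, rng=random.Random(0))
sampled_segmentation = [value <= 0 for value in phi]
```

Because neighbouring grid points are correlated through the kernel, the sampled boundary shifts smoothly instead of flipping independently at each voxel, which is the contrast with voxel-wise sampling.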
  • 48. Remark: Gaussian Process (GP)? Credit: Ghahramani
  • 54. Sample segmentation contours according to mean inter-sample Dice coefficient! (Top left) Mean of the GP µ; (top middle) sample of the level set function φ(a) drawn from 𝒢𝒫(µ, Σ); (others) GPSSI samples. The ground truth is outlined in red, the GPSSI samples are outlined in orange.
  • 55. New Directions: Sampling Image Segmentations (Le et al., MedIA, 2016) (Left) Signed geodesic distance µ(a) of the ROI with isocontours -45, 0, 45, 100, 200. (Right) One can check that the samples most probably lie in the region delineated by the isocontours µ(a) = ±45. The sampled contours are in orange.
  • 56. New Directions: Sampling Image Segmentations (Le et al., MedIA, 2016)
  • 59. Provocative Question? • Can we evaluate segmentation error without the ground truth? – With machine learning support, can we design a classifier that LEARNS segmentation error and adapts itself for better delineation?
  • 60. Summary • Segmentation Evaluation – Theoretical vs. Empirical – Visual Assessment – Volumetric Agreement – Efficacy (efficiency, accuracy, …) – STAPLE – New Trends! – Segmentation Challenges (choose your project!)
  • 61. Slide Credits and References • Credits to: Jayaram K. Udupa of Univ. of Penn., MIPG • Bagci’s CV Course 2015 Fall. • K.D. Toennies, Guide to Medical Image Analysis. • Handbook of Medical Imaging, Vol. 2. SPIE Press. • Handbook of Biomedical Imaging, Paragios, Duncan, Ayache. • Suetens, P., Medical Imaging, Cambridge Press. • Neculai Archip, Ph.D. • Simon K. Warfield, Ph.D. (See STAPLE Algorithm)