Introduction to Computer Vision
Quantitative Biomedical Imaging Group
Institute of Biomedical Engineering
Big Data Institute
University of Oxford
Jens Rittscher
Mathematics (with Computer Science)
University of Bonn, Germany
Mathematics, a universal language, plays a role in many disciplines
questions / opportunities
Economics Biology Computer Vision
Machine Learning for Computer Vision
What is Computer Vision?
o train machines to interpret the visual world
o analyse what objects are in an image
o detect specific objects of interest
Edge Detection
Image Segmentation
Classification
Visual Motion
Course Themes
Jens Rittscher
Institute of Biomedical
Engineering & Big Data Institute
University of Oxford
GE Global Research, Niskayuna, NY
University of Oxford
DPhil - Engineering Science
(Computer Vision)
Title: Recognising Human Motion
Senior Scientist
Computer Vision and
Visualisation
Project Leader
Biomedical Imaging
Manager
Computer Vision
Senior Research Fellow (IBME)
Group Leader (TDI)
Adjunct Member (LICR)
2000
2005
2013
Professor of Engineering Science
Cell Tracking
Zebrafish Imaging
Computational Pathology
Re-identification
Group Segmentation
University of Oxford
Tissue Imaging
Endoscopy
[Slide illustration: excerpt from Sharma et al., Gastroenterology, Vol. 131, No. 5, describing the Prague C & M criteria for endoscopically recognized Barrett's esophagus. The circumferential (C) and maximal (M) extents are measured above the gastroesophageal junction, so that C = 2 cm with a 3 cm distal "tongue" gives the classification C2M5. Internal validation by 5 working-group members on 50 video clips yielded reliability coefficients of 0.91 for C and 0.66 for M; external validation by 22 assessors on 29 clips yielded 0.94 for C and 0.93 for M, with near-perfect recognition of the gastric folds and diaphragmatic hiatus (0.88 and 0.85).]
• Learn image processing & machine learning
techniques in the context of a concrete application
setting
• Gain experience in working with images and the
application of machine learning models
Machine Learning for Computer Vision
Lectures
Exercises
Course Components
Data Science
Theory
You have a strong
background in
mathematics and statistics
and like to apply the
methods to real-world
problems.
Practice
You have the necessary
practical programming
skills to implement your
ideas and work on large
data sets.
Context
You have a strong interest
or background knowledge
in a particular scientific
field that excites you.
Structure of the course
Feature Extraction Image Segmentation Object Detection
Traditional Computer Vision
Revisiting Computer Vision with Deep Learning
Object Detection
Semantic Segmentation
Machine Learning
Deep Learning
Motion & Tracking
Course structure
Unit: Core topics (lectures & exercises)
Day 1: Introduction, representation of digital images, filtering, feature extraction (Lectures 1, 2)
Day 2: Image segmentation (Lectures 3, 4; Exercises 1, 2, (3))
Day 3: Machine learning (part 1); discussion of exercise sheet 1 (Lecture 5)
Day 4: Machine learning (part 2); object detection (Lectures 6, 7; Exercises 3, 4, (5))
Day 5: Deep learning elements; discussion of exercise sheet 2 (Lecture 8)
Course structure
Unit: Core topics (lectures & exercises)
Day 6: Deep learning detection; deep learning segmentation (Lectures 9, 10; Exercises 6, 7, (8))
Day 7: Autoencoders; discussion of exercise sheet 3 (Lecture 11)
Day 8: Video processing; visual tracking (Lectures 12, 13; Exercises 9, 10, (11))
Day 9: Application and translation of AI; discussion of exercise sheet 4 (Lecture 14)
Day 10: Research Talk
The exercises are a fundamental part of the course; they help you understand the course material in more depth.
They will cover the following aspects:
• Understanding of the core methods
• Help to apply the concepts in practice
• Provide direction for additional study
The points from the exercises account for 30% of the final grade.
Exercises 3, 6, 9, 12, 18 are optional.
Exercises
Programming and software
Python libraries
We advise working with the Anaconda distribution, which is based on Python 3.x. Missing packages can be installed with the conda installer; a quick import check is sketched after the list.
• NumPy
• scikit-image (http://scikit-image.org/)
• scikit-learn (http://scikit-learn.org/)
• OpenCV (http://opencv.org/) – not required for the exercises
• pyDICOM
For medical image processing:
• SimpleITK (http://www.simpleitk.org/)
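To check that these packages are available in your environment, you can run a short snippet along the following lines (a minimal sketch; the strings are the standard import names of the packages above, e.g. skimage for scikit-image, sklearn for scikit-learn, cv2 for OpenCV):

```python
import importlib

# Standard import names for the packages listed above; cv2 and SimpleITK are optional.
packages = ["numpy", "skimage", "sklearn", "pydicom", "cv2", "SimpleITK"]

for name in packages:
    try:
        module = importlib.import_module(name)
        print(f"{name:10s} OK (version {getattr(module, '__version__', 'unknown')})")
    except ImportError:
        print(f"{name:10s} missing - install it with conda or pip")
```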
You will find a set of Python
notebooks on GitHub.
You can copy these onto your local
computer or run them online.
Your Python setup
Python example
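Below is a small, self-contained example in the spirit of this slide. It is only a sketch, assuming the packages listed above and using a sample image bundled with scikit-image: the image is converted to grayscale, smoothed with a Gaussian filter, and passed through the Canny edge detector.

```python
from skimage import data, color, filters, feature

# Load a bundled sample image and convert it to a grayscale float array
image = color.rgb2gray(data.astronaut())

# Linear filtering: Gaussian smoothing suppresses noise before edge detection
smoothed = filters.gaussian(image, sigma=2)

# Feature extraction: Canny edge detector applied to the smoothed image
edges = feature.canny(smoothed, sigma=1.5)

print(image.shape, edges.sum(), "edge pixels")
```

Running the notebook versions of such examples is a good way to verify your Python setup before the first exercise sheet.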
Literature
Some history …
• Computer vision started in the late 1960s in groups that pioneered
artificial intelligence
• The goal was to build machines and systems that could ‘see’, i.e.
interpret the visual world
• As such the field has very close links with robotics
Computer Vision
• A seminal book that describes a
general framework for understanding
visual perception
• Reconstructing the scene from a set of
primitives (lines, simple geometric
structures) is a central theme
David Marr - Vision
Takeo Kanade
Over 50 years of contributions to computer vision and robotics
PhD Thesis 1974
Kyoto, Japan
Neural Network Based
Face Detection
H. A. Rowley, S. Baluja, T. Kanade
CVPR 1996
[Figure from the paper: the detection pipeline. An input image pyramid is subsampled into extracted 20 by 20 pixel windows; each window is preprocessed (lighting correction, histogram equalization) and fed to a neural network whose hidden units have localized receptive fields, producing the network output.]
variation across the face. The linear function will approx-
imate the overall brightness of each part of the window,
and can be subtracted from the window to compensate for a
variety of lighting conditions. Then histogram equalization
is performed, which non-linearly maps the intensity values
to expand the range of intensities in the window. The his-
togram is computed for pixels inside an oval region in the
window. This compensates for differences in camera input
gains, and improves the contrast in some cases.
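A rough NumPy/scikit-image sketch of this preprocessing is given below. It is an approximation for illustration only (not the authors' code), and the oval mask over the window is omitted.

```python
import numpy as np
from skimage import exposure

def preprocess_window(window):
    """Lighting correction and histogram equalization for a 20x20 window."""
    h, w = window.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Least-squares fit of a linear brightness function I(x, y) ~ a*x + b*y + c
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, window.ravel().astype(float), rcond=None)
    plane = (A @ coeffs).reshape(h, w)
    # Subtracting the fitted plane compensates for overall brightness variation
    corrected = window.astype(float) - plane
    # Histogram equalization non-linearly expands the range of intensities
    return exposure.equalize_hist(corrected)
```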
The preprocessed window is then passed through a neural
network. The network has retinal connections to its input
layer; the receptive fields of hidden units are shown in
Figure 1. There are three types of hidden units: 4 which
look at 10x10 pixel subregions, 16 which look at 5x5 pixel
subregions, and 6 which look at overlapping 20x5 pixel
horizontal stripes of pixels. Each of these types was chosen
to allow the hidden units to represent localized features that
might be important for face detection. Although the figure
shows a single hidden unit for each subregion of the input,
these units can be replicated. For the experiments which
are described later, we use networks with two and three sets
of these hidden units. Similar input connection patterns are
commonly used in speech and character recognition tasks
[Waibel et al., 1989, Le Cun et al., 1989]. The network has
a single, real-valued output, which indicates whether or not
the window contains a face.
To train the neural network used in stage one to serve as an
accurate filter, a large number of face and non-face images
are needed. Nearly 1050 face examples were gathered
from face databases at CMU and Harvard. The images
were normalized in scale, orientation, and position, as follows:
1. Rotate image so both eyes appear on a horizontal line.
2. Scale image so the distance from the point between
the eyes to the upper lip is 12 pixels.
3. Extract a 20x20 pixel region, centered 1 pixel above
the point between the eyes and the upper lip.
In the training set, 15 face examples are generated from each
original image, by randomly rotating the images (about their
center points) up to 10°, scaling between 90% and 110%,
translating up to half a pixel, and mirroring. Each 20x20
window in the set is then preprocessed (by applying lighting
correction and histogram equalization). The randomization
gives the filter invariance to translations of less than a pixel
and scalings of 10%. Larger changes in translation and
scale are dealt with by applying the filter at every pixel
position in an image pyramid, in which the images are
scaled by factors of 1.2.
Practically any image can serve as a non-face example
because the space of non-face images is much larger than
the space of face images. However, collecting a small yet
“representative” set of non-faces is difficult. Instead of col-
lecting the images before training is started, the images are
collected during training in the following manner, adapted
from [Sung and Poggio, 1994]:
1. Create an initial set of non-face images by generating
1000 images with random pixel intensities. Apply the
preprocessing steps to each of these images.
2. Train the neural network to produce an output of 1 for the face examples, and -1 for the non-face examples.
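The multi-scale search described above (applying the detector at every pixel position of an image pyramid scaled by factors of 1.2) can be sketched with scikit-image as follows. This is an illustrative approximation, not the original implementation; a real detector would preprocess each window and pass it through the trained network.

```python
from skimage import color, data, transform

def sliding_windows(image, window=20, downscale=1.2):
    """Yield every 20x20 window from a Gaussian image pyramid (downscale factor 1.2)."""
    gray = color.rgb2gray(image) if image.ndim == 3 else image
    for scale, level in enumerate(transform.pyramid_gaussian(gray, downscale=downscale)):
        if min(level.shape) < window:
            break
        for y in range(level.shape[0] - window + 1):
            for x in range(level.shape[1] - window + 1):
                yield scale, (y, x), level[y:y + window, x:x + window]

# Example: count the candidate windows in a sample image
n_windows = sum(1 for _ in sliding_windows(data.astronaut()))
print(n_windows, "windows to classify")
```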
A Statistical Approach to 3D Object Detection
Applied to Faces and Cars
H. Schneiderman and T. Kanade
2000
[Figures 15 and 16 from the paper: the wavelet representation of an image, with LL, LH, HL, and HH subbands at three levels; each subsequent level represents a higher octave of spatial frequencies, and the wavelet coefficients are each quantized to five values.]
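Such a decomposition can be reproduced with PyWavelets (pywt), which is not on the course library list; the sketch below computes a three-level 2D wavelet transform of a sample image and prints the shapes of the approximation (LL) and detail subbands.

```python
import pywt
from skimage import data

image = data.camera().astype(float)  # 512x512 grayscale sample image

# Three-level 2D discrete wavelet transform (Haar basis for simplicity)
coeffs = pywt.wavedec2(image, wavelet="haar", level=3)

print("LL (approximation):", coeffs[0].shape)
for details in coeffs[1:]:  # from the coarsest to the finest detail level
    h, v, d = details       # horizontal, vertical, and diagonal detail subbands
    print("detail subbands:", h.shape, v.shape, d.shape)
```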
Structure from motion
C. Tomasi and T. Kanade, 1991
Feature tracking
B.D. Lucas and T. Kanade 1981
C. Tomasi and T. Kanade 1991
If we now partition the matrices $L$, $\Sigma$, and $R$ as follows:

$$L = \begin{bmatrix} L' & L'' \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \Sigma' & 0 \\ 0 & \Sigma'' \end{bmatrix}, \qquad
R = \begin{bmatrix} R' \\ R'' \end{bmatrix},$$

we have

$$L \Sigma R = L' \Sigma' R' + L'' \Sigma'' R''.$$

Let $W^*$ be the ideal measurement matrix, that is, the matrix we would obtain in the absence of noise. Because of the rank principle, the non-zero singular values of $W^*$ are at most three. Since the singular values in $\Sigma$ are sorted in non-increasing order, $\Sigma'$ must contain all the singular values of $W^*$ that exceed the noise level. As a consequence, the term $L'' \Sigma'' R''$ must be due entirely to noise, and the product $L' \Sigma' R'$ is the best possible rank-3 approximation to $W^*$.

We can now restate our key point.

The Rank Principle for Noisy Measurements: all the shape and motion information in $W$ is contained in its three greatest singular values, together with the corresponding left and right eigenvectors. Thus, the best possible approximation to the ideal measurement matrix $W^*$ is the product $\hat{W} = L' \Sigma' R'$.
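To make the rank principle concrete, here is a minimal NumPy sketch (not code from the paper) that truncates the SVD of a measurement matrix W, whose 2F rows stack the image coordinates of P tracked features, to its three largest singular values.

```python
import numpy as np

def rank3_factorization(W):
    """Truncate the SVD of the measurement matrix W (2F x P) to rank 3."""
    L, s, Rt = np.linalg.svd(W, full_matrices=False)
    L3, S3, Rt3 = L[:, :3], np.diag(s[:3]), Rt[:3, :]
    W_hat = L3 @ S3 @ Rt3        # best rank-3 approximation of W
    motion = L3 @ np.sqrt(S3)    # 2F x 3 motion factor (up to an affine ambiguity)
    shape = np.sqrt(S3) @ Rt3    # 3 x P shape factor
    return W_hat, motion, shape
```

The split into motion and shape factors is only determined up to an invertible 3x3 transformation; the metric constraints used in the full Tomasi-Kanade method to resolve this ambiguity are omitted here.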
• Registration can be achieved through a local search using gradients
• Tracking is improved by selecting which features should be tracked (see the sketch below)
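A minimal sketch of these two ideas using OpenCV (listed earlier as optional): select Shi-Tomasi "good features to track", then follow them with the pyramidal Lucas-Kanade tracker. The video path is a placeholder.

```python
import cv2

cap = cv2.VideoCapture("example_video.mp4")  # placeholder path
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

# Select corners with strong gradient structure (features worth tracking)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or pts is None or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Pyramidal Lucas-Kanade: gradient-based local search for each feature
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)
    prev_gray = gray

cap.release()
```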
• The robotic system controls 30 cameras based
on the operator-controlled master camera
• The feeds from 30 cameras are blended into
one dynamic panorama
In collaboration with CBS & Princeton Video Imaging
Image Guided Navigation System
to Measure Intraoperatively
Acetabular Implant Alignment
1998
The transformation is first determined using manually specified anatomical landmarks to perform corresponding point registration [6]. Once this initial estimate is determined, the surface-based registration algorithm described in [15] uses the pre- and intra-operative data to refine the initial transformation estimate.
Once the location of the pelvis is determined via registration, navigational feedback can be provided to the surgeon on a television monitor, as seen in Fig. 7. This feedback is used by the surgeon to accurately position the acetabular implant within the acetabular cavity. To align the cup within the acetabulum in the placement determined by the pre-operative plan, the cross-hairs representing the tip of the implant and the top of the handle must be aligned at the fixed cross hair in the center of the image. Once aligned, the implant is in the pre-operatively planned orientation.
Fig. 6. Surface-based registration. Fig. 7. Navigational feedback. Fig. 8. Real-time tracking of the pelvis.
Foundation of the Quality of Life Technology Center, CMU, 2008
Understanding the Phase Contrast Optics
to Restore Artifact-free Microscopy
Images for Segmentation
MICCAI 2012
Link to the YouTube lecture.
Takeo Kanade’s Kyoto Prize lecture
• Lectures and attendance
• 20% attendance
• 20% exercises
• Examination
• 30% mid-term exam
• 30% final exam
Course evaluation
• Form a study group of four students – in the second half of the course
we will have a small challenge and you will have to work as a team
• Every week we will devote 15 minutes to answering questions. Over the entire course we expect each group to prepare 3 questions. Please submit these questions to the coordinator.
• In week 5 we will have a revision class. Each group should submit one question/topic they would like to revise.
Group working & participation