Robust Real-time Face Detection
by
Paul Viola and Michael Jones, 2002
Presentation by Kostantina Palla & Alfredo Kalaitzis
School of Informatics
University of Edinburgh
February 20, 2009
Overview
• Robust: a very high detection rate (true-positive rate) and a very low false-positive rate, consistently.
• Real time: for practical applications, at least 2 frames per second must be processed.
• Face detection, not recognition: the goal is to distinguish faces from non-faces (face detection is the first step in the identification process).
Three goals & a conclusion
1. Feature Computation: which features, and how can they be computed as quickly as possible?
2. Feature Selection: select the most discriminating features.
3. Real-timeliness: focus computation on potentially positive areas (those that may contain faces).
4. Conclusion: presentation of results and discussion of detection issues.
How did Viola & Jones deal with these challenges?
Three solutions
1. Feature Computation: the "integral image" representation
2. Feature Selection: the AdaBoost training algorithm
3. Real-timeliness: a cascade of classifiers
Features
• Can a simple feature (i.e. a single value) indicate the existence of a face?
• All faces share some similar properties:
  - The eye region is darker than the upper cheeks.
  - The nose-bridge region is brighter than the eyes.
• That is useful domain knowledge.
• This domain knowledge needs to be encoded:
  - Location and size: eyes & nose-bridge region
  - Value: darker / brighter
Overview | Integral Image | AdaBoost | Cascade
Rectangle features
• Rectangle features:
  - Value = ∑ (pixels in black area) - ∑ (pixels in white area)
  - Three types: two-, three-, and four-rectangle features (Viola & Jones use all three)
  - For example: the difference in brightness between the white and black rectangles over a specific area
  - Each feature is tied to a specific location in the sub-window
  - Each feature may have any size
• Why features rather than raw pixels?
  - Features encode domain knowledge
  - Feature-based systems operate faster
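To make the feature value concrete, here is a minimal sketch that evaluates one two-rectangle feature directly from the pixels. The function names and the layout are assumptions made for illustration: the upper half of the region is treated as the black area and the lower half as the white area, and an image is a plain list of rows of grey values.

```python
def rect_sum_naive(image, top, left, height, width):
    """Sum one rectangle by visiting every pixel in it (deliberately naive)."""
    return sum(
        image[row][col]
        for row in range(top, top + height)
        for col in range(left, left + width)
    )


def two_rect_feature_naive(image, top, left, height, width):
    """Two-rectangle feature: sum(black area) - sum(white area).

    Assumed layout (hypothetical): black = upper half, white = lower half.
    """
    half = height // 2
    black = rect_sum_naive(image, top, left, half, width)
    white = rect_sum_naive(image, top + half, left, half, width)
    return black - white


# A 4x4 patch: two dark rows on top of two bright rows.
patch = [[10] * 4, [10] * 4, [200] * 4, [200] * 4]
value = two_rect_feature_naive(patch, 0, 0, 4, 4)
```

Every evaluation touches each pixel of both rectangles, so the cost grows with the feature size. That cost is what the integral image removes.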
Integral Image Representation
(also check back-up slide #1)
• Given a detection resolution of 24x24 (the smallest sub-window), the set of different rectangle features contains ~160,000 features!
• Need for speed
• Introducing the integral image representation
  - Definition: the integral image at location (x,y) is the sum of the pixels above and to the left of (x,y), inclusive
• The integral image is computed in a single pass over the image, once per image; every sub-window then reuses it.
Formal definition:
$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')$$
Recursive definition:
$$s(x, y) = s(x, y - 1) + i(x, y)$$
$$ii(x, y) = ii(x - 1, y) + s(x, y)$$
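The recursive definition translates almost line by line into code. The sketch below is one possible reading, not the authors' implementation: the function name is hypothetical, the image is a list of rows indexed as image[y][x], and the boundary values s(x, -1) = 0 and ii(-1, y) = 0 are assumed. The sample values are the 4x4 image from back-up slide #1.

```python
def integral_image(image):
    """Return ii with ii[y][x] = sum of image[y'][x'] for all y' <= y, x' <= x."""
    height = len(image)
    width = len(image[0]) if height else 0
    ii = [[0] * width for _ in range(height)]
    for x in range(width):
        s = 0  # cumulative sum s(x, y), starting from the assumed s(x, -1) = 0
        for y in range(height):
            s += image[y][x]                     # s(x, y) = s(x, y-1) + i(x, y)
            left = ii[y][x - 1] if x > 0 else 0  # assumed ii(-1, y) = 0
            ii[y][x] = left + s                  # ii(x, y) = ii(x-1, y) + s(x, y)
    return ii


# The example image from back-up slide #1.
image = [
    [0, 1, 1, 1],
    [1, 2, 2, 3],
    [1, 2, 1, 1],
    [1, 3, 1, 0],
]
ii = integral_image(image)
```

Each pixel is read exactly once, which is the "single pass" property.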
back-up slide #1
IMAGE
0 1 1 1
1 2 2 3
1 2 1 1
1 3 1 0
INTEGRAL IMAGE
0  1  2  3
1  4  7 11
2  7 11 16
3 11 16 21
Rapid computation of rectangular features
• Back to feature evaluation...
• Using the integral image representation, we can compute the value of any rectangular sum (the building block of every feature) in constant time.
  - For example, the sum of the pixels inside rectangle D can be computed as: ii(d) + ii(a) - ii(b) - ii(c)
• Two-, three-, and four-rectangle features can be computed with 6, 8 and 9 array references respectively.
• As a result, feature computation takes far less time.
[Figure: rectangles A, B, C, D with corner points a, b, c, d]
ii(a) = A
ii(b) = A + B
ii(c) = A + C
ii(d) = A + B + C + D
D = ii(d) + ii(a) - ii(b) - ii(c)
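The four-corner rule can be sketched as follows. All names are hypothetical and the block is an illustration under stated assumptions: rectangles are given as (top, left, height, width), lookups outside the image count as 0, and the two-rectangle feature treats the upper half as the black area. For simplicity the feature calls the rectangle sum twice (8 lookups) instead of sharing corners to reach the 6 references quoted above.

```python
def integral_image(image):
    """Row-by-row integral image: ii[y][x] = sum of pixels above and left, inclusive."""
    ii = []
    for y, row in enumerate(image):
        running = 0  # sum of the current row up to column x
        out = []
        for x, pixel in enumerate(row):
            running += pixel
            above = ii[y - 1][x] if y > 0 else 0
            out.append(above + running)
        ii.append(out)
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of a rectangle from four lookups: ii(d) + ii(a) - ii(b) - ii(c)."""
    bottom, right = top + height - 1, left + width - 1

    def at(y, x):
        # Positions above or left of the image contribute nothing.
        return ii[y][x] if y >= 0 and x >= 0 else 0

    d = at(bottom, right)      # everything up to the rectangle's bottom-right corner
    a = at(top - 1, left - 1)  # region above and to the left
    b = at(top - 1, right)     # region above
    c = at(bottom, left - 1)   # region to the left
    return d + a - b - c


def two_rect_feature(ii, top, left, height, width):
    """sum(black) - sum(white); assumed layout: black = upper half, white = lower half."""
    half = height // 2
    return rect_sum(ii, top, left, half, width) - rect_sum(ii, top + half, left, half, width)


image = [[0, 1, 1, 1], [1, 2, 2, 3], [1, 2, 1, 1], [1, 3, 1, 0]]
ii = integral_image(image)
inner = rect_sum(ii, 1, 1, 2, 2)          # the central 2x2 block
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

The cost of rect_sum does not depend on the rectangle size, which is why features of any scale are equally cheap.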
Three goals
1. Feature Computation: features must be
computed as quickly as possible
2. Feature Selection: select the most
discriminating features
3. Real-timeliness: must focus on potentially
positive image areas (that contain faces)
How did Viola & Jones deal with these challenges?
Feature selection
• Problem: too many features
  - A 24x24 sub-window contains ~160,000 features (all possible combinations of orientation, location and scale of these feature types)
  - Computing all of them is impractical (computationally expensive)
• We have to select a subset of relevant, informative features to model a face
• Hypothesis: "A very small subset of features can be combined to form an effective classifier"
• How?
  - The AdaBoost algorithm
[Figure: a relevant feature next to an irrelevant feature]
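The size of the feature pool can be checked by brute-force counting. The deck does not say how its ~160,000 figure is obtained, so the enumeration below is an assumption: five base shapes (two-rectangle in both orientations, three-rectangle in both orientations, and four-rectangle), each stretched by whole multiples of its base size and placed at every position inside the window.

```python
def count_rectangle_features(window=24, base_shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every scaled and shifted copy of each base shape inside a square window.

    base_shapes holds (width, height) of the smallest version of each
    feature type; this particular set of shapes is an assumption.
    """
    total = 0
    for base_w, base_h in base_shapes:
        # Widths and heights grow in whole multiples of the base size.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                positions = (window - w + 1) * (window - h + 1)
                total += positions
    return total


total = count_rectangle_features()
```

Under these assumptions the count comes to 162,336, in line with the ~160,000 quoted above and far more than the 576 pixels in the window.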
AdaBoost
• Stands for "Adaptive Boosting"
• Constructs a "strong" classifier as a weighted linear combination of simple "weak" classifiers
[Figure: the strong classifier as a weighted sum of weak classifiers applied to an image; labels: strong classifier, weak classifier, weight, image]
AdaBoost - Characteristics
• Features as weak classifiers
  - Each single rectangle feature may be regarded as a simple weak classifier
• An iterative algorithm
  - AdaBoost performs a series of rounds, each time selecting a new weak classifier
• Weights are applied over the set of example images
  - During each iteration, each example image receives a weight determining its importance
AdaBoost - Getting the idea...
(pseudo-code at back-up slide #2)
• Given: example images labeled +/-
  - Initially, all weights are set equally
• Repeat T times
  - Step 1: choose the most effective weak classifier, which becomes a component of the final strong classifier (Problem! Remember the huge number of features...)
  - Step 2: update the weights to emphasize the examples that were incorrectly classified
    - This makes the next weak classifier focus on "harder" examples
• The final (strong) classifier is a weighted combination of the T "weak" classifiers
  - Weighted according to their accuracy
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
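The strong classifier is a weighted vote compared against half of the total weight. A minimal sketch, with a hypothetical function name and the weak-classifier outputs h_t(x) passed in as a list of 0/1 values:

```python
def strong_classify(weak_outputs, alphas):
    """Return 1 if sum(alpha_t * h_t(x)) >= 0.5 * sum(alpha_t), else 0.

    weak_outputs: the values h_t(x) in {0, 1} for one sub-window x.
    alphas: the weight alpha_t of each weak classifier.
    """
    if len(weak_outputs) != len(alphas):
        raise ValueError("need exactly one alpha per weak classifier")
    weighted_vote = sum(alpha * h for alpha, h in zip(alphas, weak_outputs))
    threshold = 0.5 * sum(alphas)
    return 1 if weighted_vote >= threshold else 0


alphas = [2.0, 1.0, 0.5]                         # made-up weights for illustration
accepted = strong_classify([1, 0, 0], alphas)    # one accurate classifier fires
rejected = strong_classify([0, 1, 1], alphas)    # two less accurate ones fire
```

With these made-up weights, a single highly weighted weak classifier outvotes two weaker ones, which is what "weighted according to their accuracy" means in practice.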
AdaBoost - Feature Selection
Problem
• On each round there is a large set of possible weak classifiers (each simple classifier consists of a single feature). Which one to choose?
  - Choose the most effective one (the one that best separates the examples, i.e. has the lowest error)
  - The choice of a classifier corresponds to the choice of a feature
• At the end, the "strong" classifier consists of T features
Conclusion
• AdaBoost searches for a small number of good classifiers, i.e. features (feature selection)
• It adaptively constructs a final strong classifier, taking into account the failures of each of the chosen weak classifiers (through the example weights)
• AdaBoost is used both to select a small set of features and to train a strong classifier
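One round of feature selection can be sketched as a search for the single thresholded feature with the lowest weighted error. The names below are hypothetical, and the weak classifier form (output 1 when polarity * f(x) < polarity * threshold) is an assumption about how a feature value is turned into a decision. The search is brute force for readability, not speed.

```python
def best_stump(feature_values, labels, weights):
    """Best (error, threshold, polarity) for one feature under the given weights."""
    values = sorted(set(feature_values))
    # Candidate thresholds: below all values, between neighbours, above all values.
    thresholds = [values[0] - 1]
    thresholds += [(a + b) / 2 for a, b in zip(values, values[1:])]
    thresholds += [values[-1] + 1]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = 0.0
            for f, label, weight in zip(feature_values, labels, weights):
                prediction = 1 if polarity * f < polarity * threshold else 0
                if prediction != label:
                    error += weight  # weighted error: sum of weights of the mistakes
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def select_feature(feature_table, labels, weights):
    """feature_table[j][i] = value of feature j on example i.

    Returns (feature index, error, threshold, polarity) of the best stump.
    """
    best = None
    for j, values in enumerate(feature_table):
        error, threshold, polarity = best_stump(values, labels, weights)
        if best is None or error < best[1]:
            best = (j, error, threshold, polarity)
    return best


# Two toy features on four examples (labels: 1 = face, 0 = non-face).
feature_table = [[5, 1, 4, 2], [1, 2, 8, 9]]
labels = [1, 1, 0, 0]
weights = [0.25, 0.25, 0.25, 0.25]
chosen = select_feature(feature_table, labels, weights)
```

Here the second feature separates the toy examples perfectly, so it is the one selected.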
AdaBoost example
• AdaBoost starts with a uniform distribution of "weights" over the training examples.
• Select the classifier with the lowest weighted error (i.e. a "weak" classifier).
• Increase the weights on the training examples that were misclassified.
• (Repeat)
• At the end, carefully make a linear combination of the weak classifiers obtained at all iterations.
$$h_{\text{strong}}(\mathbf{x}) = \begin{cases} 1 & \text{if } \alpha_1 h_1(\mathbf{x}) + \dots + \alpha_n h_n(\mathbf{x}) \ge \frac{1}{2}\left(\alpha_1 + \dots + \alpha_n\right) \\ 0 & \text{otherwise} \end{cases}$$
Slide taken from a presentation by Qing Chen, Discover Lab, University of Ottawa
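The re-weighting step can be sketched as below. The slides defer the exact rule to a back-up slide that is not part of this transcript, so the formulas used here are taken as an assumption from the usual statement of the Viola & Jones variant: beta = error / (1 - error), correctly classified examples are scaled by beta, the weights are renormalised, and the classifier weight is alpha = log(1 / beta). The function name and the clamping of the error are hypothetical additions.

```python
import math


def adaboost_round(weights, is_correct):
    """Update example weights after one weak classifier has been chosen.

    weights: current example weights (any positive scale).
    is_correct: True where the chosen weak classifier got the example right.
    Returns (new normalised weights, alpha of the chosen classifier).
    """
    total = sum(weights)
    weights = [w / total for w in weights]  # make the weights a distribution
    error = sum(w for w, ok in zip(weights, is_correct) if not ok)
    # Clamp to avoid division by zero for a perfect or useless classifier
    # (an added safeguard, not part of the slides).
    error = min(max(error, 1e-10), 1 - 1e-10)
    beta = error / (1.0 - error)
    alpha = math.log(1.0 / beta)
    # Shrink the weights of correct examples; mistakes keep theirs, so their
    # relative weight increases once everything is renormalised.
    updated = [w * beta if ok else w for w, ok in zip(weights, is_correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


new_weights, alpha = adaboost_round([0.25] * 4, [True, True, True, False])
```

After the update, the misclassified examples together carry half of the total weight, so the weak classifier just chosen is no better than chance on the re-weighted set and the next round has to pick something different.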
Now we have a good face detector
• We can build a 200-feature classifier!
• Experiments showed that a 200-feature classifier achieves:
  - a 95% detection rate
  - an FP rate of 1 in 14,084 (about 0.071 x 10^-3)
  - a scan of all sub-windows of a 384x288 pixel image in 0.7 seconds (on an Intel PIII 700 MHz)
• The more the better (?)
  - Gain in classifier performance
  - Loss in CPU time
• Verdict: good & fast, but not enough
  - Competitors achieve an FP rate close to 1 in 1,000,000!
  - 0.7 sec / frame IS NOT real-time.
Three goals
1. Feature Computation: features must be
computed as quickly as possible
2. Feature Selection: select the most
discriminating features
3. Real-timeliness: must focus on potentially
positive image areas (that contain faces)
How did Viola & Jones deal with these challenges?
The attentional cascade
• On average only 0.01% of all sub-windows are positive (are faces)
• Status quo: equal computation time is spent on all sub-windows
• Most of the time should be spent only on potentially positive sub-windows.
• A simple 2-feature classifier can achieve an almost 100% detection rate with a 50% FP rate.
• That classifier can act as the 1st layer of a series, filtering out most negative windows
• A 2nd layer with 10 features can tackle the "harder" negative windows that survived the 1st layer, and so on...
• A cascade of gradually more complex classifiers achieves even better detection performance: each layer removes more false positives while keeping nearly all faces.
On average, far fewer features are computed per sub-window (roughly a 10x speed-up)
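The early-rejection idea can be sketched as follows. This is an illustration under assumptions, not the authors' code: a stage is a hypothetical triple (weak classifiers, alphas, threshold), each weak classifier is a callable returning 0 or 1, and a "window" is simply a list of numbers standing in for feature values.

```python
def cascade_classify(window, stages):
    """Run a sub-window through the stages, stopping at the first rejection.

    Returns (is_face, number of weak classifiers evaluated).
    """
    evaluated = 0
    for weak_classifiers, alphas, threshold in stages:
        score = 0.0
        for h, alpha in zip(weak_classifiers, alphas):
            score += alpha * h(window)
            evaluated += 1
        if score < threshold:
            return False, evaluated  # rejected: later stages are never computed
    return True, evaluated           # survived every stage


# A toy two-stage cascade: a cheap 2-feature stage, then a 3-feature stage.
stage1 = ([lambda w: int(w[0] < 5), lambda w: int(w[1] > 2)], [1.0, 1.0], 1.0)
stage2 = (
    [lambda w: int(w[0] < 3), lambda w: int(w[1] > 4), lambda w: int(w[2] > 0)],
    [1.0, 1.0, 1.0],
    2.0,
)
stages = [stage1, stage2]

easy_negative = cascade_classify([9, 0, 1], stages)  # rejected by stage 1
hard_negative = cascade_classify([4, 3, 0], stages)  # passes stage 1, fails stage 2
positive = cascade_classify([1, 6, 1], stages)       # passes both stages
```

The easy negative costs only the two features of the first stage. Since almost all sub-windows are easy negatives, the average cost per window stays close to that of the first layer.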
Training a cascade of classifiers
Strong classifier definition:
$$h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}$$
where the $h_t$ are the T selected weak classifiers and the $\alpha_t$ are their weights.
• Keep in mind:
  - Competitors achieved a 95% TP rate at a 10^-6 FP rate
  - These are the goals. The final cascade must do better!
• Given the goals, to design a cascade we must choose:
  - The number of layers in the cascade (strong classifiers)
  - The number of features of each strong classifier (the T in the definition)
  - The threshold of each strong classifier (the $\frac{1}{2}\sum_{t=1}^{T}\alpha_t$ in the definition)
• Optimization problem:
  - Can we find the optimum combination?
  - TREMENDOUSLY DIFFICULT PROBLEM
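A quick calculation shows how per-layer rates combine into the overall goals. The sketch assumes that each layer's rates are measured on the sub-windows that actually reach that layer, so the overall rates are simply the products of the per-layer rates. The function name and the example of 10 identical layers are illustrative choices, not figures from the slides.

```python
def cascade_rates(per_layer_fp, per_layer_tp):
    """Overall (FP rate, TP rate) of a cascade as products of the per-layer rates."""
    overall_fp, overall_tp = 1.0, 1.0
    for fp, tp in zip(per_layer_fp, per_layer_tp):
        overall_fp *= fp
        overall_tp *= tp
    return overall_fp, overall_tp


# Ten layers, each letting through 30% of non-faces and 99% of faces.
overall_fp, overall_tp = cascade_rates([0.30] * 10, [0.99] * 10)
```

Ten such layers give an overall FP rate of about 6 x 10^-6 and an overall TP rate of about 0.90. Individually mediocre layers can therefore approach the 10^-6 FP goal, but only if each layer keeps almost every face, because detection losses multiply as well.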
A simple framework for cascade training
• Do not despair. Viola & Jones suggested a heuristic algorithm for cascade training (pseudo-code at backup slide #3):
  - it does not guarantee optimality
  - but it produces an "effective" cascade that meets the previous goals
• Manual tweaking:
  - the overall training outcome is highly dependent on the user's choices
  - select f_i (maximum acceptable false-positive rate per layer)
  - select d_i (minimum acceptable true-positive rate per layer)
  - select F_target (target overall FP rate)
  - possibly repeat the trial-and-error process for a given training set
• Until F_target is met:
  - Add a new layer:
    - Until the f_i, d_i rates are met for this layer:
      - Increase the number of features and train a new strong classifier with AdaBoost
      - Determine the rates of the layer on a validation set
backup slide #3
• User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
• User selects the target overall false positive rate F_target.
• P = set of positive examples
• N = set of negative examples
• F_0 = 1.0; D_0 = 1.0; i = 0
While F_i > F_target
    i++
    n_i = 0; F_i = F_{i-1}
    while F_i > f x F_{i-1}
        n_i++
        Use P and N to train a classifier with n_i features using AdaBoost
        Evaluate the current cascaded classifier on the validation set to determine F_i and D_i
        Decrease the threshold for the i-th classifier until the current cascaded classifier has a detection rate of at least d x D_{i-1} (this also affects F_i)
    N = ∅
    If F_i > F_target then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N.
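The "decrease threshold" step of the pseudo-code can be sketched in isolation. This is a simplified reading with hypothetical names: it assumes we already have the strong-classifier scores of the validation faces that passed all earlier layers, and it picks the highest threshold that still keeps at least a fraction d of them.

```python
import math


def lower_threshold_for_detection(positive_scores, min_detection_rate):
    """Largest threshold t such that at least min_detection_rate of the
    positive scores satisfy score >= t.

    positive_scores: stage scores of validation faces reaching this layer.
    """
    if not positive_scores:
        raise ValueError("need at least one positive example")
    ranked = sorted(positive_scores, reverse=True)
    # Number of faces that must be kept; the small epsilon guards against
    # floating-point noise in the product before rounding up.
    needed = max(1, math.ceil(min_detection_rate * len(ranked) - 1e-9))
    return ranked[needed - 1]


scores = [0.9, 0.2, 0.6, 0.8]  # made-up validation scores
threshold = lower_threshold_for_detection(scores, 0.75)
```

Lowering the threshold keeps more faces but also lets more non-faces through, which is why the pseudo-code notes that this step "also affects F_i" and re-checks the false positive rate afterwards.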
Three goals
1. Feature Computation: features must be
computed as quickly as possible
2. Feature Selection: select the most
discriminating features
3. Real-timeliness: must focus on potentially
positive image areas (that contain faces)
How did Viola & Jones deal with these challenges?
[Figure: system overview. Training phase: training set (sub-windows), integral representation, feature computation, AdaBoost feature selection, cascade trainer. Testing phase: classifier cascade framework with strong classifier 1 (cascade stage 1), strong classifier 2 (cascade stage 2), ..., strong classifier N (cascade stage N), ending in "FACE IDENTIFIED".]
pros ...
• Extremely fast feature computation
• Efficient feature selection
• Scale and location invariant detector
  - Instead of scaling the image itself (e.g. pyramid filters), we scale the features.
• Such a generic detection scheme can be trained to detect other types of objects (e.g. cars, hands)
... and cons
• The detector is most effective only on frontal images of faces
  - it can hardly cope with a 45° face rotation
• Sensitive to lighting conditions
• We might get multiple detections of the same face, due to overlapping sub-windows.
Results
(detailed results at back-up slide #4)
Results (Cont.)
• Viola & Jones prepared their final detector cascade:
  - 38 layers, 6060 features in total
  - 1st classifier layer, 2 features
    - 50% FP rate, 99.9% TP rate
  - 2nd classifier layer, 10 features
    - 20% FP rate, 99.9% TP rate
  - next 2 layers 25 features each, next 3 layers 50 features each
  - and so on...
• Tested on the MIT+CMU test set
• Processing a 384x288 pixel image on a PC (dated 2001) took about 0.067 seconds (roughly 15 frames per second)
backup slide #4
Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces (Viola & Jones 2002)

                        False detections
Detector                10      31      50      65      78      95      167     422
Viola-Jones             76.1%   88.4%   91.4%   92.0%   92.1%   92.9%   93.9%   94.1%
Rowley-Baluja-Kanade    83.2%   86.0%   -       -       89.2%   89.2%   90.1%   89.9%
Schneiderman-Kanade     -       -       -       94.4%   -       -       -       -
Roth-Yang-Ahuja         -       -       -       -       -       -       -       -
Thank you for listening!
Robust Real Time Face Detection

  • 1. Robust Real-time FaceRobust Real-time Face DetectionDetection byby Paul Viola and Michael Jones, 2002Paul Viola and Michael Jones, 2002 Presentation by Kostantina Palla & Alfredo Kalaitzis School of Informatics University of Edinburgh February 20, 2009
  • 2. OverviewOverview  Robust – very high Detection Rate (True-Positive Rate) & very low False-Positive Rate… always.  Real Time – For practical applications at least 2 frames per second must be processed.  Face Detection – not recognition. The goal is to distinguish faces from non-faces (face detection is the first step in the identification process)
  • 3. Three goals & a conlcusionThree goals & a conlcusion 1. Feature Computation: what features? And how can they be computed as quickly as possible 2. Feature Selection: select the most discriminating features 3. Real-timeliness: must focus on potentially positive areas (that contain faces) 4. Conclusion: presentation of results and discussion of detection issues. How did Viola & Jones deal with these challenges?
  • 4. 1. Feature Computation The “Integral” image representation 2. Feature Selection The AdaBoost training algorithm 3. Real-timeliness A cascade of classifiers Three solutionsThree solutions
  • 5. FeaturesFeatures  Can a simple feature (i.e. a value) indicate the existence of a face?  All faces share some similar properties  The eyes region is darker than the upper-cheeks.  The nose bridge region is brighter than the eyes.  That is useful domain knowledge  Need for encoding of Domain Knowledge:  Location - Size: eyes & nose bridge region  Value: darker / brighter OverviewOverview || Integral ImageIntegral Image | AdaBoost| AdaBoost | Cascade| Cascade
  • 6. Rectangle featuresRectangle features  Rectangle features:  Value = ∑ (pixels in black area) - ∑ (pixels in white area)  Three types: two-, three-, four-rectangles, Viola&Jones used two-rectangle features  For example: the difference in brightness between the white &black rectangles over a specific area  Each feature is related to a special location in the sub-window  Each feature may have any size  Why not pixels instead of features?  Features encode domain knowledge  Feature based systems operate faster Overview |Overview | Integral ImageIntegral Image | AdaBoost| AdaBoost | Cascade| Cascade
  • 7. Integral Image RepresentationIntegral Image Representation (also check back-up slide #1)(also check back-up slide #1)  Given a detection resolution of 24x24 (smallest sub-window), the set of different rectangle features is ~160,000 !  Need for speed  Introducing Integral Image Representation  Definition: The integral image at location (x,y), is the sum of the pixels above and to the left of (x,y), inclusive  The Integral image can be computed in a single pass and only once for each sub-window! ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ' , ' formal definition: , ', ' Recursive definition: , , 1 , , 1, , x x y y ii x y i x y s x y s x y i x y ii x y ii x y s x y ≤ ≤ = = − + = − + ∑ Overview |Overview | Integral ImageIntegral Image | AdaBoost| AdaBoost | Cascade| Cascade y x
  • 8. back-up slide #1back-up slide #1 Overview |Overview | Integral ImageIntegral Image | AdaBoost| AdaBoost | Cascade| Cascade 0 1 1 1 1 2 2 3 1 2 1 1 1 3 1 0 IMAGE 0 1 2 3 1 4 7 11 2 7 11 16 3 11 16 21 INTEGRAL IMAGE
  • 9. Rapid computation of rectangular featuresRapid computation of rectangular features  Back to feature evaluation . . .  Using the integral image representation we can compute the value of any rectangular sum (part of features) in constant time  For example the integral sum inside rectangle D can be computed as: ii(d) + ii(a) – ii(b) – ii(c)  two-, three-, and four-rectangular features can be computed with 6, 8 and 9 array references respectively.  As a result: feature computation takes less time ii(a) = A ii(b) = A+B ii(c) = A+C ii(d) = A+B+C+D D = ii(d)+ii(a)- ii(b)-ii(c) Overview |Overview | Integral ImageIntegral Image | AdaBoost| AdaBoost | Cascade| Cascade
  • 10. Three goalsThree goals 1. Feature Computation: features must be computed as quickly as possible 2. Feature Selection: select the most discriminating features 3. Real-timeliness: must focus on potentially positive image areas (that contain faces) How did Viola & Jones deal with these challenges? Overview |Overview | Integral ImageIntegral Image | AdaBoost| AdaBoost | Cascade| Cascade
  • 11. Feature selectionFeature selection  Problem: Too many features  In a sub-window (24x24) there are ~160,000 features (all possible combinations of orientation, location and scale of these feature types)  impractical to compute all of them (computationally expensive)  We have to select a subset of relevant features – which are informative - to model a face  Hypothesis: “A very small subset of features can be combined to form an effective classifier”  How?  AdaBoost algorithm Overview | Integral ImageOverview | Integral Image || AdaBoostAdaBoost | Cascade| Cascade Relevant feature Irrelevant feature
  • 12. AdaBoostAdaBoost  Stands for “Adaptive” boost  Constructs a “strong” classifier as a linear combination of weighted simple “weak” classifiers Overview | Integral ImageOverview | Integral Image || AdaBoostAdaBoost | Cascade| Cascade Strong classifier Weak classifier WeightImage
  • 13. AdaBoost -AdaBoost - CharacteristicsCharacteristics  Features as weak classifiers Each single rectangle feature may be regarded as a simple weak classifier  An iterative algorithm AdaBoost performs a series of trials, each time selecting a new weak classifier  Weights are being applied over the set of the example images During each iteration, each example/image receives a weight determining its importance Overview | Integral ImageOverview | Integral Image || AdaBoostAdaBoost | Cascade| Cascade
  • 14. AdaBoost -AdaBoost - Getting the idea…Getting the idea…  Given: example images labeled +/-  Initially, all weights set equally  Repeat T times  Step 1: choose the most efficient weak classifier that will be a component of the final strong classifier (Problem! Remember the huge number of features…)  Step 2: Update the weights to emphasize the examples which were incorrectly classified  This makes the next weak classifier to focus on “harder” examples  Final (strong) classifier is a weighted combination of the T “weak” classifiers  Weighted according to their accuracy Overview | Integral ImageOverview | Integral Image || AdaBoostAdaBoost | Cascade| Cascade     ≥ = ∑ ∑= = otherwise x xh T t T t ttt h 0 2 1 )(1 )( 1 1αα (pseudo-code at back-up slide #2)(pseudo-code at back-up slide #2)
  • 15. AdaBoost –AdaBoost – Feature SelectionFeature Selection Problem  On each round, large set of possible weak classifiers (each simple classifier consists of a single feature) – Which one to choose?  choose the most efficient (the one that best separates the examples – the lowest error)  choice of a classifier corresponds to choice of a feature  At the end, the ‘strong’ classifier consists of T features Conclusion  AdaBoost searches for a small number of good classifiers – features (feature selection)  adaptively constructs a final strong classifier taking into account the failures of each one of the chosen weak classifiers (weight appliance)  AdaBoost is used to both select a small set of features and train a strong classifier Overview | Integral ImageOverview | Integral Image || AdaBoostAdaBoost | Cascade| Cascade
  • 16.  AdaBoost starts with a uniform distribution of “weights” over training examples.  Select the classifier with the lowest weighted error (i.e. a “weak” classifier)  Increase the weights on the training examples that were misclassified.  (Repeat)  At the end, carefully make a linear combination of the weak classifiers obtained at all iterations. AdaBoost exampleAdaBoost example ( )1 1 1 strong 1 1 ( ) ( ) ( ) 2 0 otherwise n n nh h h  α + + α ≥ α + + α =   x x x K K Slide taken from a presentation by Qing Chen, Discover Lab, University of Ottawa Overview | Integral ImageOverview | Integral Image || AdaBoostAdaBoost | Cascade| Cascade
  • 17. Now we have a good face detectorNow we have a good face detector  We can build a 200-feature classifier!  Experiments showed that a 200- feature classifier achieves:  95% detection rate  0.14x10-3 FP rate (1 in 14084)  Scans all sub-windows of a 384x288 pixel image in 0.7 seconds (on Intel PIII 700MHz)  The more the better (?)  Gain in classifier performance  Lose in CPU time  Verdict: good & fast, but not enough  Competitors achieve close to 1 in a 1.000.000 FP rate!  0.7 sec / frame IS NOT real-time. Overview | Integral ImageOverview | Integral Image | AdaBoost| AdaBoost || CascadeCascade
  • 18. Three goalsThree goals 1. Feature Computation: features must be computed as quickly as possible 2. Feature Selection: select the most discriminating features 3. Real-timeliness: must focus on potentially positive image areas (that contain faces) How did Viola & Jones deal with these challenges? Overview | Integral ImageOverview | Integral Image | AdaBoost| AdaBoost || CascadeCascade
  • 19. The attentional cascadeThe attentional cascade  On average only 0.01% of all sub- windows are positive (are faces)  Status Quo: equal computation time is spent on all sub-windows  Must spend most time only on potentially positive sub-windows.  A simple 2-feature classifier can achieve almost 100% detection rate with 50% FP rate.  That classifier can act as a 1st layer of a series to filter out most negative windows  2nd layer with 10 features can tackle “harder” negative-windows which survived the 1st layer, and so on…  A cascade of gradually more complex classifiers achieves even better detection rates. Overview | Integral ImageOverview | Integral Image | AdaBoost| AdaBoost || CascadeCascade On average, much fewer features are computed per sub-window (i.e. speed x 10)
  • 20. Training a cascade of classifiers
      - Strong classifier definition: $h(x) = 1$ if $\sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2}\sum_{t=1}^{T} \alpha_t$ and $h(x) = 0$ otherwise, where $\alpha_t = \log(1/\beta_t)$, $\beta_t = \epsilon_t / (1 - \epsilon_t)$, and $\epsilon_t$ is the weighted error of the $t$-th weak classifier
      - Keep in mind:
          - Competitors achieved a 95% TP rate and a 10^-6 FP rate.
          - These are the goals. The final cascade must do better!
      - Given the goals, to design a cascade we must choose:
          - the number of layers in the cascade (strong classifiers)
          - the number of features of each strong classifier (the $T$ in the definition)
          - the threshold of each strong classifier (the $\frac{1}{2}\sum_{t=1}^{T} \alpha_t$ in the definition)
      - Optimization problem:
          - Can we find the optimum combination? TREMENDOUSLY DIFFICULT PROBLEM
  • 21. A simple framework for cascade training
      - Do not despair. Viola & Jones suggested a heuristic algorithm for cascade training (pseudo-code on backup slide #3):
          - it does not guarantee optimality
          - but it produces an “effective” cascade that meets the previous goals
      - Manual tweaking:
          - the overall training outcome is highly dependent on the user’s choices
          - select f_i (maximum acceptable false positive rate per layer)
          - select d_i (minimum acceptable true positive rate per layer)
          - select F_target (target overall FP rate)
          - possibly repeat the trial & error process for a given training set
      - Until F_target is met:
          - Add a new layer:
              - Until the f_i, d_i rates are met for this layer:
                  - increase the number of features & train a new strong classifier with AdaBoost
                  - determine the rates of the layer on the validation set
  • 22. backup slide #3
      - User selects values for f, the maximum acceptable false positive rate per layer, and d, the minimum acceptable detection rate per layer.
      - User selects the target overall false positive rate F_target.
      - P = set of positive examples
      - N = set of negative examples
      - F_0 = 1.0; D_0 = 1.0; i = 0
      While F_i > F_target
          i++
          n_i = 0; F_i = F_{i-1}
          While F_i > f x F_{i-1}
              n_i++
              Use P and N to train a classifier with n_i features using AdaBoost
              Evaluate the current cascaded classifier on the validation set to determine F_i and D_i
              Decrease the threshold for the i-th classifier until the current cascaded classifier has a detection rate of at least d x D_{i-1} (this also affects F_i)
          N = ∅
          If F_i > F_target then evaluate the current cascaded detector on the set of non-face images and put any false detections into the set N
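The loop conditions in the pseudo-code (F_i ≤ f × F_{i-1} and D_i ≥ d × D_{i-1}) compound multiplicatively across layers. The sketch below makes that arithmetic explicit under a simplifying assumption: every layer exactly meets the same per-layer rates f and d. The helper names `layers_needed` and `overall_detection_rate` are hypothetical.

```python
def layers_needed(f, f_target):
    """Smallest number of layers K such that f**K <= f_target."""
    if not 0.0 < f < 1.0:
        raise ValueError("per-layer false positive rate must be in (0, 1)")
    if f_target <= 0.0:
        raise ValueError("target false positive rate must be positive")
    layers, overall_fp = 0, 1.0
    while overall_fp > f_target:
        overall_fp *= f  # each extra layer multiplies the overall FP rate by f
        layers += 1
    return layers


def overall_detection_rate(d, layers):
    """Overall detection rate if every layer keeps a fraction d of the faces."""
    return d ** layers


k = layers_needed(0.5, 1e-6)
print(k, overall_detection_rate(0.999, k))
```

With f = 0.5 and a target of 10^-6, twenty layers are needed (0.5^20 is about 9.5 × 10^-7), which shows why each layer must keep its detection rate very close to 1: the per-layer losses multiply as well.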
  • 23. Three goals
      1. Feature Computation: features must be computed as quickly as possible
      2. Feature Selection: select the most discriminating features
      3. Real-timeliness: must focus on potentially positive image areas (that contain faces)
      How did Viola & Jones deal with these challenges?
  • 24. Classifier cascade framework (diagram). Training phase: Training set (sub-windows), Integral representation, Feature computation, AdaBoost feature selection, Cascade trainer. Testing phase: Strong Classifier 1 (cascade stage 1), Strong Classifier 2 (cascade stage 2), Strong Classifier N (cascade stage N), FACE IDENTIFIED.
  • 25. pros … and cons
      pros:
      - Extremely fast feature computation
      - Efficient feature selection
      - Scale and location invariant detector
          - Instead of scaling the image itself (e.g. pyramid filters), we scale the features.
      - Such a generic detection scheme can be trained to detect other types of objects (e.g. cars, hands)
      cons:
      - Detector is most effective only on frontal images of faces
          - it can hardly cope with a 45° face rotation
      - Sensitive to lighting conditions
      - We might get multiple detections of the same face, due to overlapping sub-windows.
  • 26. Results (detailed results on backup slide #4)
  • 28. backup slide #4
      - Viola & Jones prepared their final detector cascade:
          - 38 layers, 6060 features in total
          - 1st classifier layer, 2 features: 50% FP rate, 99.9% TP rate
          - 2nd classifier layer, 10 features: 20% FP rate, 99.9% TP rate
          - next 2 layers with 25 features each, next 3 layers with 50 features each
          - and so on…
      - Tested on the MIT+CMU test set
          - a 384x288 pixel image on a PC (dated 2001) took about 0.067 seconds
      Detection rates for various numbers of false detections on the MIT+CMU test set containing 130 images and 507 faces (Viola & Jones 2002):
      Detector \ False detections   10      31      50      65      78      95      167     422
      Viola-Jones                   76.1%   88.4%   91.4%   92.0%   92.1%   92.9%   93.9%   94.1%
      Rowley-Baluja-Kanade          83.2%   86.0%   -       -       89.2%   89.2%   90.1%   89.9%
      Schneiderman-Kanade           -       -       -       94.4%   -       -       -       -
      Roth-Yang-Ahuja               -       -       -       -       -       -       -       -
  • 29. Thank you for listening!

Editor's Notes

  • #3: A bit about the paper: published in 2001. It deals with the object detection task and was demonstrated specifically on face detection, which was also the motivation behind the work. There is a strong emphasis on speed: the world's first real-time face detection system. It has since been implemented, refined and extended; among others, it is implemented in the OpenCV package (my examples are slightly more refined, but follow the same principle). Open questions: does it matter where the paper was published? Was it really the first real-time face detection? International Journal of Computer Vision.
  • #4: The meat of the paper really starts now; up to this point it was a demonstration.
  • #5: The meat of the paper really starts now; up to this point it was a demonstration.
  • #6: The building blocks of the classifiers, through which we represent objects. A feature is a function that computes differences between rectangular regions in the image. It is depicted visually as a rectangle divided into sub-rectangles, some white and some gray. The difference computed is between the rectangles of the different colors. For example... In the paper we use four kinds of features. Additional information: 3 rectangular feature types: two-rectangle feature type (horizontal/vertical), three-rectangle feature type, four-rectangle feature type. The detector doesn't operate on raw grayscale pixel values but rather on values obtained from applying simple filters to the pixels.
  • #7: We demonstrate the use of features on faces. A face can be described well using features: there are more prominent parts such as the nose, forehead and cheeks, which catch more light, as opposed to more recessed regions such as the eye sockets, which are more shadowed; in addition there are the hair, the mouth and the structure of the skull. So we have distinct brightness differences. For example... What happens if we build a final classifier from these two features? We get a 100% detection rate and a 40% false positive rate. These results are not sufficient for detection, but we will use them later for an initial filtering of background windows. From here on, our algorithm will not deal with the image pixels directly, but only through feature computations, i.e. differences. Improvements: perhaps even use a more familiar picture.
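  • #8: Definition. Verbally: the integral representation of an image at the point (x, y) is the sum of the pixels above and to the left of the point, inclusive. Formally: [formula on the slide]. Recursively: [recurrence on the slide]. Computed in a single pass; time complexity is proportional to the image size.
  • #9: Definition. Verbally: the integral representation of an image at the point (x, y) is the sum of the pixels above and to the left of the point, inclusive. Formally: [formula on the slide]. Recursively: [recurrence on the slide]. Computed in a single pass; time complexity is proportional to the image size.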
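The definition in these notes translates directly into code. The following is a minimal pure-Python sketch, assuming the image is a list of rows of grayscale values; the function names are hypothetical, the extra zero row and column are a convenience chosen here to avoid boundary checks, and which half of the two-rectangle feature counts as "black" is likewise an assumption.

```python
def integral_image(img):
    """Single pass: ii[y + 1][x + 1] = sum of img over rows 0..y and columns 0..x."""
    height, width = len(img), len(img[0])
    ii = [[0] * (width + 1) for _ in range(height + 1)]  # zero-padded border
    for y in range(height):
        row_sum = 0  # cumulative sum of the current row
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w-by-h rectangle with top-left corner (x, y): 4 lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half ("black") minus right half ("white")."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Once the integral image is built, any rectangle sum costs four array lookups regardless of the rectangle's size, which is what allows the features, rather than the image, to be rescaled cheaply.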
  • #11: The meat of the paper really starts now; up to this point it was a demonstration.
  • #12: It is not enough to find the best features. If we picked the 200 best features (based on detection rate), we would get more or less the same kind of feature over and over (which, in the case of faces, would be the eye region being darker than the nose bridge). We need a representative variety of distinctive features. Without a variety of distinguishing characteristics, our false positive rate rises considerably. A variety of features is therefore important in order to discriminate.
  • #13: Additional information: AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers.
  • #14: Additional information: AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers.
  • #15: Additional information: AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers.
  • #16: Additional information: AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers.
  • #17: Tell the audience what they are seeing. Each weak classifier is weighted (α) according to its own success. The strong classifier decides by a weighted majority over these weights. Additional information: the authors call the final linear combination a Perceptron.
  • #18: Additional information: AdaBoost is adaptive in the sense that subsequent classifiers built are tweaked in favor of those instances misclassified by previous classifiers.
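  • #19: The meat of the paper really starts now; up to this point it was a demonstration.
  • #23: Definition. Verbally: the integral representation of an image at the point (x, y) is the sum of the pixels above and to the left of the point, inclusive. Formally: [formula on the slide]. Recursively: [recurrence on the slide]. Computed in a single pass; time complexity is proportional to the image size.
  • #24: The meat of the paper really starts now; up to this point it was a demonstration.
  • #25: The meat of the paper really starts now; up to this point it was a demonstration.
  • #27: Actual results from the experiment.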