Matrioska: A Multi-Level Approach to Fast
Tracking by Learning
Mario Edoardo Maresca and Alfredo Petrosino
Department of Science and Technology, Parthenope University of Naples,
Centro Direzionale 80143, Napoli, Italy
marioedoardo.maresca@studenti.uniparthenope.it
petrosino@uniparthenope.it

Abstract. In this paper we propose a novel framework for the real-time detection and tracking of an unknown object in a video stream. We decompose the problem into two separate modules: detection and learning.
The detection module can use multiple keypoint-based methods (ORB,
FREAK, BRISK, SIFT, SURF and more) inside a fallback model, to correctly localize the object frame by frame exploiting the strengths of each
method. The learning module updates the object model, with a growing
and pruning approach, to account for changes in its appearance and extracts negative samples to further improve the detector performance. To
show the effectiveness of the proposed tracking-by-detection algorithm,
we present numerous quantitative results on a number of challenging sequences in which the target object goes through changes of pose, scale
and illumination.
Keywords: Tracking by detection, real-time, keypoint-based methods,
learning, interest points.

1 Introduction

Despite recent innovations, real-time object tracking remains one of the most
challenging problems in a wide range of computer vision applications. The task
of tracking an unknown object in a video can be referred to as long-term tracking [13] or model-free tracking [14]. The goal of such systems is to localize the
object (we will refer to it as target object) in a generic video sequence, given
only the first bounding box that defines the object in the first frame. Tracking
objects is challenging because the system must deal with changes of appearance, illumination, occlusions, out-of-plane rotations and real-time processing requirements.
In its simplest form, tracking can be defined as the problem of estimating the
object motion in the image plane. Numerous approaches have been proposed; they primarily differ in the choice of object representation, which can include: (i) points, (ii) primitive geometric shapes, (iii) object silhouettes, (iv) skeletal models and more. For further details we refer the reader to [10].
The main challenge for an object tracking system is handling the appearance changes of the target object. These changes can
be caused by intrinsic changes such as pose, scale and shape variation and by
extrinsic changes such as illumination, camera motion, camera viewpoint, and
occlusions. To model such variability, various approaches have been proposed,
such as updating a low-dimensional subspace representation [15], MIL-based approaches [14], and template- or patch-based methods.
Robust algorithms for long-term tracking are generally designed as the union
of different modules: a tracker, that performs object motion analysis, a detector,
that localizes the object when the tracker accumulates errors during run-time
and a learner that updates the object model. A system that uses only a tracker
is prone to failure: when the object is occluded or disappears from the camera view, the tracker will usually drift. For this reason we choose to design the
proposed framework as the union of only two modules: the detector and the
learner. The detector can use multiple keypoint-based methods to correctly localize the object, despite changes of illumination, scale, pose and occlusions,
within a fallback model. The learner updates the training pool used by the detector to account for large changes in the object appearance. Quantitative evaluations demonstrate the effectiveness of our approach that can be classified as a
“tracking-by-detection” algorithm, since we track the target object by detecting
it frame by frame.
The rest of the paper is organized as follows. Section 2 proposes an outline of
the current keypoint-based methods. Section 3 introduces in detail the proposed
framework (Matrioska): Subsections 3.1 and 3.4 analyze the detector and the learning module, respectively. Section 4 shows experimental results.

2 Overview of known keypoint-based methods

Numerous new keypoint-based methods (also known as local feature-based or interest point-based) have been proposed over the years. The most recent technique, KAZE [9], was published in 2012.
In reverse chronological order, the main methods are: KAZE (2012), operating completely in a nonlinear scale space [9]; FREAK (2012), inspired by the human visual system and more precisely by the retina [8]; BRISK (2011), Binary Robust Invariant Scalable Keypoints [6]; ORB (2011), Oriented FAST and Rotated BRIEF [4]; ASIFT (2009), a fully affine invariant image comparison method [7]; SURF (2006), Speeded Up Robust Features [3]; GLOH (2005), Gradient Location-Orientation Histogram [5]; PCA-SIFT (2004), applying Principal Component Analysis (PCA) to the normalized gradient patch [2]; and SIFT (1999), the Scale-Invariant Feature Transform [1].

3 Matrioska

In the next sections we describe our novel framework for object detection and
tracking (belonging to the category of “tracking-by-detection”). First we will describe the principal components of the proposed detection module: (i) a detector
that uses the information of multiple keypoint-based methods, (ii) a filtering
stage with a modified Generalized Hough Transform, and (iii) a scale identification process. Then we will explain the learning module, based on a growing-and-pruning approach. Later we will perform quantitative tests to show that,
by using multiple keypoint-based methods, it is possible to achieve both a faster
overall detection time and an improved recall. In this stage we will disable the
online learning module to only focus on the outcome of the usage of multiple
methods. Then we will enable the online learning module to test Matrioska on
a number of challenging video clips that present strong changes in the object
appearance. Note that we intentionally choose not to apply any motion analysis,
as we only want to focus on the detector and learning components.

Fig. 1. Snapshots from TLD and MILTrack datasets. The proposed method is able
to correctly detect the target object despite large occlusion, illumination and pose
changes.

3.1 Detection: combining multiple keypoint-based methods

As shown in Section 2, the strong interest in keypoint-based methods is evident: new methods are presented at a very high rate. There are essentially two reasons for this interest: (i) they are fast, as they
only focus on a sparse set of points and (ii) they are inherently robust to a
series of challenges (changes in illumination, point of view, rotation, scale and
occlusion).
The development of Matrioska starts from these considerations: we want to
achieve real-time performance and a high degree of robustness. We believe that
the integration of various keypoint-based methods represents one of the best
ways to achieve these goals. Furthermore, by combining in a single framework
the results coming from different techniques, we are able to take advantage of
the strengths of each of them, thus increasing the overall robustness.
Figure 2 shows an outline of Matrioska’s detector behaviour. The algorithm
proceeds as follows:
1. Detect keypoints in the nth frame with the first registered method in the technique pool.
2. Perform a k-nearest neighbor (k-NN) search between keypoints (of the same class) of the training pool.
3. Apply a first outlier filtering using the NNDR (Nearest Neighbor Distance Ratio).
4. Apply a second filtering to discard all matches whose first nearest neighbor was found on a negative sample.
5. Perform a third, more specific filtering with the Generalized Hough Transform (see Section 3.2).
6. Estimate the scale to accurately draw the bounding box according to the parameters obtained by the GHT (see Section 3.3).
Steps 1-5 are encapsulated in a fallback model: the next method is used only if the previous ones were unable to identify the target object. This model ensures that no more keypoint-based methods are run than strictly necessary.
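The fallback logic of these steps can be sketched as follows. This is a minimal Python sketch, not the paper's OpenCV C++ implementation; the method/model interfaces and the 0.8 NNDR ratio are illustrative assumptions.

```python
def nndr_filter(matches, ratio=0.8):
    """Nearest Neighbor Distance Ratio test (step 3): keep a match only
    when its best neighbor is clearly closer than the second best."""
    return [m for m in matches if m.dist1 < ratio * m.dist2]

def detect(frame, technique_pool, model):
    """Fallback model over the registered keypoint-based methods: the next
    (slower) method runs only if the previous ones failed, so cheap
    detectors such as ORB or FREAK shield expensive ones such as SIFT."""
    for method in technique_pool:
        keypoints = method.detect_and_describe(frame)        # step 1
        matches = model.knn_match(keypoints, method.name)    # step 2
        matches = nndr_filter(matches)                       # step 3
        matches = [m for m in matches if not m.negative]     # step 4
        box = model.ght_localize(matches)                    # steps 5-6
        if box is not None:
            return box  # early exit: no further method is run this frame
    return None  # no registered method found the object in this frame
```

The early `return` is what keeps the average per-frame cost low: on easy frames only the first, cheapest method ever executes.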

Fig. 2. An arbitrary number of techniques can be added to the technique pool. Using the fallback model, Matrioska runs, frame by frame, only as many techniques as needed to detect the object, saving valuable computational resources.

3.2 Outlier filtering

The main drawback of using multiple keypoint-based methods is the fact that
each method will add a considerable amount of new outliers, making the filtering
stage a challenging process. Furthermore we must operate in real-time, therefore
the filtering process should be as fast as possible.
In this scenario, filtering outliers with well-known fitting methods, such as
RANSAC or LMedS (Least Median of Squares), would not produce good results
because the percentage of inliers can fall much lower than 50%. For this reason
we employ a filtering process based on the Generalized Hough Transform (GHT)
in which each match of keypoints specifies three parameters: 2D object’s center
and orientation. To estimate the target object center we store, for each trained
keypoint, the size of the corresponding training image, therefore we can project
the center of this image on the current frame with a translation and a rotation.
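The center-voting step can be sketched as follows. This is an illustrative Python sketch: the match fields, the bin widths and the degree-based angle convention are assumptions, not values from the paper.

```python
import math
from collections import Counter

def ght_best_bin(matches, pos_bin=25.0, ang_bin=30.0):
    """Each match votes for a (center x, center y, orientation) bin: the
    stored training-image center is projected onto the current frame via
    the matched keypoints' translation and relative rotation. The bin
    with the most votes localizes the object; its matches are the inliers."""
    votes = Counter()
    for m in matches:
        dtheta = m.query_angle - m.train_angle  # relative rotation (degrees)
        # offset from the training keypoint to the training image center
        dx, dy = m.train_cx - m.train_x, m.train_cy - m.train_y
        c, s = math.cos(math.radians(dtheta)), math.sin(math.radians(dtheta))
        # projected object center in the current frame
        cx = m.query_x + c * dx - s * dy
        cy = m.query_y + s * dx + c * dy
        votes[(int(cx // pos_bin), int(cy // pos_bin), int(dtheta // ang_bin))] += 1
    return votes.most_common(1)[0] if votes else None
```

Unlike RANSAC, accumulating coarse votes stays robust even when inliers are far below 50% of the matches, since outliers scatter across many bins.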
These parameters are sufficient to localize the object if the scale does not change during tracking, but this is a strong assumption and generally does not hold. To account for scale changes we could use a GHT with four parameters, adding the scale to the previous three, similar to the solution
proposed by D. Lowe [1]. This, however, would seriously limit the keypoint-based methods Matrioska can use: identifying the right scale bin requires the octave in which each keypoint was detected, but not all methods implement octave scaling, and even when they do, we tend to fix the number of octaves to one for performance reasons. Furthermore, the octave number gives only a very broad indication, because increasing the scale by an octave means doubling the size of the smoothing kernel, whose effect is roughly equivalent to halving the image resolution. Instead, we want a higher precision, resolving scale changes down to a factor of 0.01.
3.3 Scale identification

The identification of the current object scale is an important step and deserves
a separate section because it is not directly related to the GHT discussed earlier. We want to achieve a stable and accurate method to correctly identify the
scale of the object without relying on the keypoint’s octave. In order to satisfy
these constraints we study the geometric distance between pairs of keypoints:
we compute the ratio between the distance of consecutive pairs of keypoints belonging to one training image and the query image. We repeat the process for
each training image having at least two matches. After this process, we obtain
the final object size by calculating the mean of all training image sizes scaled by
the factor found with the ratio of distances. The final size can be obtained with
the following equation:
$$
S_o = \frac{1}{N} \sum_{i=1}^{N} \frac{S_i}{J-1} \sum_{k=1}^{J-1} \frac{\left\lVert P^{Q}_{k} - P^{Q}_{k+1} \right\rVert}{\left\lVert P^{T_i}_{k} - P^{T_i}_{k+1} \right\rVert} \tag{1}
$$

where
– $S_o$ and $S_i$ are two vectors representing width and height: $S_o$ is the size of the object, $S_i$ the size of the $i$th training image.
– $N$ is the number of training images.
– $J$ is the number of keypoints found on the $i$th training image.
– $P^{Q}_{k}$ and $P^{T_i}_{k}$ are the $k$th keypoints found on the query image $Q$ and matched to the training image $T_i$.
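Under these definitions, Eq. (1) can be implemented directly. The sketch below assumes per-training-image lists of matched (x, y) keypoints, an assumed data layout rather than the paper's.

```python
import math

def estimate_object_size(train_sizes, train_pts, query_pts):
    """Eq. (1): scale each training-image size S_i by the mean ratio of
    distances between consecutive matched keypoints in the query image
    and in that training image, then average over the training images
    that have at least two matches (J >= 2)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    scaled = []
    for (w, h), t_pts, q_pts in zip(train_sizes, train_pts, query_pts):
        j = len(q_pts)
        if j < 2:
            continue  # a ratio needs at least one pair of consecutive keypoints
        ratio = sum(dist(q_pts[k], q_pts[k + 1]) / dist(t_pts[k], t_pts[k + 1])
                    for k in range(j - 1)) / (j - 1)
        scaled.append((w * ratio, h * ratio))
    n = len(scaled)
    return (sum(w for w, _ in scaled) / n, sum(h for _, h in scaled) / n)
```

For example, a single training image of size 10x20 whose matched keypoints appear twice as far apart in the query image yields an estimated object size of 20x40.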

Fig. 3. We compute the ratio between the distances of consecutive pairs of keypoints belonging to one training image and the query image, represented here as segments of the same color. The process is iterated over pairs of consecutive keypoints for each training image to obtain the final object size.

3.4 Learning: growing and pruning

The learning component is imperative to solve one of the toughest challenges of
visual tracking: adapting the model to account for changes in the target object
appearance (shape deformation, lighting variation, large variation of pose).
In this section we present the proposed schema aiming to update the model
(the training pool) used by the detection module (section 3.1) to track the object.
Inspired by [15, 14, 13, 11, 16] our learning model is an incremental growing and
pruning approach: while tracking the object we must learn both new positive
and negative samples that will be added to the training pool.
One of the key factors in learning is the choice of the new positive sample: we
must learn a new sample only if the object appearance is undergoing changes.
This will: (i) avoid saturating the training pool with duplicated samples, (ii) add
valuable information to the detection component and (iii) not let the NNDR
discard good matches. To achieve this we must carefully determine the selection
+
criterion. The proposed criterion to choose a new candidate positive sample Sc
is a combination of two different conditions: (1) the detection module failed to
detect the object in the previous frame, (2) the current best GHT’s bin has less
than 2V votes. To learn a new positive sample one of these conditions must be
verified. A similar but simpler process is employed for the negative samples: we
learn as negative the keypoints found outside the bounding box when the ratio
between the number of positive keypoints and the negatives exceeds a given
threshold.
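The update criterion can be sketched as follows. This is an illustrative sketch; the pool interface, the threshold V and the negative-ratio threshold are assumptions, not the paper's values.

```python
def update_model(pool, detection, prev_frame_failed, best_bin_votes, v,
                 pos_keypoints, neg_keypoints, neg_ratio=3.0):
    """Growing update: learn a new positive sample only when the
    appearance is likely changing, i.e. (1) the detector failed on the
    previous frame, or (2) the best GHT bin gathered fewer than 2V votes.
    Keypoints outside the box become negatives when positives outnumber
    negatives by more than neg_ratio."""
    learned_pos = learned_neg = False
    if detection is not None and (prev_frame_failed or best_bin_votes < 2 * v):
        pool.add_positive(detection)   # grow the training pool
        learned_pos = True
    if neg_keypoints and len(pos_keypoints) / len(neg_keypoints) > neg_ratio:
        pool.add_negatives(neg_keypoints)
        learned_neg = True
    return learned_pos, learned_neg
```

Gating the positive update this way avoids saturating the pool with near-duplicate samples while still capturing genuine appearance changes.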
Fig. 4. Some of the positive samples that passed the selection criterion and were learned online. The tested video clip is the Carchase sequence from TLD.

4 Experimental Results

We have tested Matrioska on several challenging video sequences from the TLD [13] and MILTrack [14] datasets. The first tests show the performance changes given by the use of multiple keypoint-based methods; we only show some possible combinations, since testing all configurations would not be feasible. Our aim is to demonstrate that by using multiple methods we can achieve both a faster overall detection time and an improved accuracy. In this stage we disable the online learning module to focus only on the outcome of using multiple methods.
In the second tests we will enable the online learning module to test Matrioska
on a number of challenging video clips that present strong changes in the object
appearance. In all the tests we initialize the algorithm only with the location of
the object in the first frame.
To avoid confusion we use the same metric in all sequences to evaluate Matrioska's performance: precision P = correctDetections/detections, recall R = correctDetections/trueDetections and F-measure, where correctDetections is the number of detections whose overlap with the ground truth bounding box is larger than 25%, if ground truth is defined. The overlap is defined as intersection/(GTarea + BBarea − intersection), where GTarea is the area of the ground truth and BBarea is the area of the bounding box [13].
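These metrics can be computed as follows (a Python sketch assuming boxes are given as (x, y, width, height) tuples, an assumed convention):

```python
def overlap(gt, bb):
    """Overlap = intersection / (GT area + BB area - intersection),
    with boxes as (x, y, w, h)."""
    ix = max(0.0, min(gt[0] + gt[2], bb[0] + bb[2]) - max(gt[0], bb[0]))
    iy = max(0.0, min(gt[1] + gt[3], bb[1] + bb[3]) - max(gt[1], bb[1]))
    inter = ix * iy  # intersection area (zero when boxes are disjoint)
    return inter / (gt[2] * gt[3] + bb[2] * bb[3] - inter)

def precision_recall_f(correct, detections, true_detections):
    """P = correct/detections, R = correct/trueDetections,
    F-measure = harmonic mean of P and R."""
    p = correct / detections
    r = correct / true_detections
    return p, r, 2 * p * r / (p + r)
```

A detection then counts as correct when `overlap(gt, bb) > 0.25`.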
4.1 Detection with multiple methods

In this section we disable the online learning module to only evaluate the results
of using multiple keypoint-based methods. We will test ORB, BRISK, FREAK,
SURF and SIFT alone on some sequences and then we will try different combinations. Note that our aim is to show the advantage obtained by combining
multiple keypoint-based methods rather than running comparative evaluations
of single methods (such as [17–19]).
As Tables 1-4 show, the best results in terms of recall are obtained with a combination of two methods. Furthermore, Table 2 is indicative of the contribution of the fallback model: we obtained a recall of 0.69 while halving the time per frame, from 0.11 seconds (9 FPS) using SIFT only to 0.055 seconds (18 FPS) using FREAK + SIFT. This is possible because
Table 1. Tiger2 (MILTrack)

Method(s)      FPS  Recall
ORB            30   0.15
BRISK          20   0.01
FREAK          29   0.23
SURF            6   0.04
SIFT            4   0.05
ORB + FREAK    24   0.38
FREAK + SURF   10   0.37

Table 2. Face occlusion 2 (TLD)

Method(s)      FPS  Recall
ORB            26   0.44
BRISK          25   0.44
FREAK          27   0.63
SURF           16   0.57
SIFT            9   0.64
FREAK + SURF   21   0.68
FREAK + SIFT   18   0.69

Table 3. Motocross (TLD)

Method(s)      FPS  Recall
ORB            38   0.31
BRISK          24   0.10
FREAK          25   0.05
SURF            9   0.04
SIFT            6   0.05
ORB + SURF     15   0.34
ORB + SIFT     10   0.40

Table 4. Car (TLD)

Method(s)      FPS  Recall
ORB            28   0.48
BRISK          30   0.23
FREAK          33   0.54
SURF           15   0.48
SIFT            8   0.67
FREAK + SIFT   14   0.95
ORB + FREAK    20   0.87

SIFT will be used by Matrioska only when necessary. Table 4 shows an almost perfect result on the Car sequence (TLD) even without the online learning module enabled: the target object does not change its appearance during the sequence, and our detector, with a combination of two methods, is enough to obtain robust performance.
4.2 Detection and Learning

In the following evaluation we enable the online learning module of Matrioska to test our approach against the datasets used by TLD and MILTrack. The gain in
performance compared to the use of the detection module alone is clearly visible
in Table 5. Figure 1 shows snapshots of the tested sequences with examples
of detection. The obtained results are better than many other state-of-the-art
approaches [24, 22, 25, 21, 15, 23, 14, 13, 26].
It must be noted that: (i) although we could have chosen for each sequence the most suitable methods to obtain better performance, we registered in the technique pool only ORB and FREAK regardless of the tested sequence; (ii) the use of keypoint-based methods forced us to double the size of the smaller sequences (and relocate the ground truth accordingly), because when the target object is too small we cannot compute a feature vector for each keypoint found inside it (e.g. Tiger2 and Coke11); (iii) we slightly enlarged the first bounding box to be able to compute the feature vectors near the borders of the target object; and (iv) our OpenCV C++ single-threaded implementation runs at 25 FPS on an Intel Core i7-920 with a QVGA video stream.
Table 5. Evaluation of Matrioska with online learning enabled. We provided only the first object location and the algorithm tracked the target object up to the final frame. The results are better than many other state-of-the-art approaches.

Sequence                     Frames  Correct D. / True D.  P / R / F-measure
Car (TLD)                       945  854 / 860             0.97 / 0.99 / 0.98
Carchase (TLD)                 9928  7551 / 8660           0.97 / 0.87 / 0.92
Motocross (TLD)                2665  1357 / 1412           0.84 / 0.96 / 0.90
Pedestrian 2 (TLD)              338  260 / 266             0.94 / 0.98 / 0.96
Pedestrian 3 (TLD)              184  145 / 156             0.96 / 0.92 / 0.94
Coke11 (MILTrack)               292  59 / 59               1.00 / 1.00 / 1.00
David (MILTrack)                462  93 / 93               1.00 / 1.00 / 1.00
Face occlusion 2 (MILTrack)     816  163 / 163             1.00 / 1.00 / 1.00
Tiger2 (MILTrack)               365  67 / 73               0.93 / 0.91 / 0.92
Sylvester (MILTrack)           1345  269 / 269             1.00 / 1.00 / 1.00

5 Conclusions and Future Work

In this paper we presented a novel framework to address the problem of tracking
an unknown object in a video sequence. We used a combination of two modules:
(i) a detector that, using keypoint-based methods, can identify an object in
presence of illumination, scale, rotation and other changes, and (ii) a learning
module that updates the object model to account for large variations of the target
appearance. Several tests validated this approach and showed its efficiency.
As for future developments, to fully exploit the capabilities of Matrioska it would be ideal to develop a series of new keypoint-based techniques, each based on the analysis of a particular feature (color, shape and more): fast and simple when used individually, but robust when used together. Within a fallback model, these techniques would also ensure low computational complexity, as only as many features as necessary would be used to correctly detect the object. Finally, we obtained results comparable, if not superior, to current state-of-the-art approaches using only two components (a detector and a learning module); the integration of a tracker could provide even better overall results.

References
1. Lowe, D. G.: Distinctive Image Features from Scale-Invariant Keypoints. Int. J.
Comput. Vision. 60, 91–110 (2004)
2. Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image descriptors. Proceedings of the 2004 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition 2, 506–513 (2004)
3. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 110, 346–359 (2008)
4. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: An efficient alternative
to SIFT or SURF. 2011 IEEE International Conference on Computer Vision, 2564–
2571 (2011)
5. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE
Transactions on Pattern Analysis and Machine Intelligence 27, 1615–1630 (2005)
6. Leutenegger, S., Chli, M., Siegwart, R. Y.: BRISK: Binary Robust invariant scalable
keypoints. 2011 IEEE International Conference on Computer Vision (ICCV), 2548–
2555 (2011)
7. Morel, J.-M., Yu, G.: ASIFT: A New Framework for Fully Affine Invariant Image
Comparison. SIAM J. Img. Sci. 2, 438–469 (2009)
8. Ortiz, R.: FREAK: Fast Retina Keypoint. Proceedings of the 2012 IEEE Conference
on Computer Vision and Pattern Recognition, 510–517 (2012)
9. Alcantarilla, P. F., Bartoli, A., Davison, A. J.: KAZE features. Proceedings of the
12th European conference on Computer Vision, 214–227 (2012)
10. Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. ACM Comput. Surv.
38 (2006)
11. Kloihofer, W., Kampel, M.: Interest Point Based Tracking. 2010 20th International
Conference on Pattern Recognition, 3549–3552 (2010)
12. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple
features. Proceedings of the 2001 IEEE Computer Society Conference on Computer
Vision and Pattern Recognition 1, 511–518 (2001)
13. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-Learning-Detection. IEEE Trans.
on Pattern Anal. Mach. Intell. 34, 1409–1422 (2012)
14. Babenko, B., Yang, M.-H., Belongie, S.: Robust Object Tracking with Online Multiple Instance Learning. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1619–1632
(2011)
15. Ross, D. A., Lim, J., Lin, R.-S. Yang, M.-H.: Incremental Learning for Robust
Visual Tracking. Int. J. Comput. Vision 77, 125–141 (2008)
16. Hare, S., Saffari, A., Torr, P.H.S.: Efficient online structured output learning for
keypoint-based object tracking. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 1894–1901 (2012)
17. Heinly, J., Dunn, E., Frahm, J.-M.: Comparative evaluation of binary features.
Proceedings of the 12th European conference on CV, 759–773 (2012)
18. Gauglitz, S., Höllerer, T., Turk, M.: Evaluation of Interest Point Detectors and Feature Descriptors for Visual Tracking. Int. J. Comput. Vision 94, 335–360 (2011)
19. Khvedchenia, I.: A battle of three descriptors: SURF, FREAK and BRISK (2012). http://computer-vision-talks.com/
20. Godec, M., Roth, P. M., Bischof, H.: Hough-based tracking of non-rigid objects.
Proceedings of the 2011 International Conference on Computer Vision, 81–88 (2011)
21. Yu, Q., Dinh, T. B., Medioni, G.: Online Tracking and Reacquisition Using Cotrained Generative and Discriminative Trackers. Proceedings of the 10th European
Conference on Computer Vision, 678–691 (2008)
22. Avidan, S.: Ensemble Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 29, 261–
271 (2007)
23. Santner, J., Leistner C., Saffari, A., Pock, T., Bischof, H.: PROST Parallel Robust
Online Simple Tracking. 2010 IEEE Conference on CVPR, 723–730 (2010)
24. Grabner, H., Bischof, H.: On-line Boosting and Vision. Proceedings of the 2006
IEEE Computer Society Conference on Computer Vision and Pattern Recognition
1, 260–267 (2006)
25. Grabner, H., Leistner, C., Bischof, H.: Semi-supervised On-Line Boosting for Robust Tracking. Proceedings of the 10th European Conference on Computer Vision,
234–247 (2008)
26. Pernici, F.: FaceHugger: The ALIEN Tracker Applied to Faces. Computer Vision
ECCV 2012. Workshops and Demonstrations 7585, 597–601 (2012)

More Related Content

PDF
International Journal of Engineering Research and Development (IJERD)
PDF
Fast Motion Estimation for Quad-Tree Based Video Coder Using Normalized Cross...
PDF
3 d mrf based video tracking in the compressed domain
PDF
3 d mrf based video tracking in the compressed domain
PDF
A New Algorithm for Tracking Objects in Videos of Cluttered Scenes
PDF
Detection and Tracking of Moving Object: A Survey
PDF
Wireless Vision based Real time Object Tracking System Using Template Matching
PPTX
Multiple Object Tracking
International Journal of Engineering Research and Development (IJERD)
Fast Motion Estimation for Quad-Tree Based Video Coder Using Normalized Cross...
3 d mrf based video tracking in the compressed domain
3 d mrf based video tracking in the compressed domain
A New Algorithm for Tracking Objects in Videos of Cluttered Scenes
Detection and Tracking of Moving Object: A Survey
Wireless Vision based Real time Object Tracking System Using Template Matching
Multiple Object Tracking

What's hot (18)

PPTX
Object tracking
PDF
SAR Image Classification by Multilayer Back Propagation Neural Network
PDF
AN ADAPTIVE MESH METHOD FOR OBJECT TRACKING
PDF
An optimized framework for detection and tracking of video objects in challen...
PPTX
Arp zmp
PDF
An ann approach for network
PDF
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
PDF
Ijctt v7 p104
PDF
An Efficient Approach for Multi-Target Tracking in Sensor Networks using Ant ...
PDF
GTSRB Traffic Sign recognition using machine learning
PDF
International Journal of Engineering Research and Development
PDF
A Survey On Tracking Moving Objects Using Various Algorithms
PDF
A Review on Classification Based Approaches for STEGanalysis Detection
PDF
A Novel GA-SVM Model For Vehicles And Pedestrial Classification In Videos
PDF
An Effective Implementation of Configurable Motion Estimation Architecture fo...
PDF
Ay33292297
PDF
Deep sort and sort paper introduce presentation
PPTX
Intrusion Detection Model using Self Organizing Maps.
Object tracking
SAR Image Classification by Multilayer Back Propagation Neural Network
AN ADAPTIVE MESH METHOD FOR OBJECT TRACKING
An optimized framework for detection and tracking of video objects in challen...
Arp zmp
An ann approach for network
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
Ijctt v7 p104
An Efficient Approach for Multi-Target Tracking in Sensor Networks using Ant ...
GTSRB Traffic Sign recognition using machine learning
International Journal of Engineering Research and Development
A Survey On Tracking Moving Objects Using Various Algorithms
A Review on Classification Based Approaches for STEGanalysis Detection
A Novel GA-SVM Model For Vehicles And Pedestrial Classification In Videos
An Effective Implementation of Configurable Motion Estimation Architecture fo...
Ay33292297
Deep sort and sort paper introduce presentation
Intrusion Detection Model using Self Organizing Maps.
Ad

Similar to Matrioska tracking keypoints in real-time (20)

PDF
F1063337
PDF
International Journal of Engineering Research and Development
PDF
Real time object tracking and learning using template matching
PDF
PDF
Real time implementation of object tracking through
PDF
Objects detection and tracking using fast principle component purist and kalm...
DOCX
Object tracking using python
PDF
Object video tracking using a pan tilt-zoom system
PPT
2D/Multi-view Segmentation and Tracking
PDF
real-time-object
PDF
ess-autonomousnavigation-ijrr10final.pdf
PDF
Robust Tracking Via Feature Mapping Method and Support Vector Machine
PDF
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
PDF
A survey on moving object tracking in video
PPTX
Object Detection & Tracking
PDF
object ttacking real time embdded ystem using imag processing
PPTX
CSTalks - Object detection and tracking - 25th May
PDF
GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers
PDF
MULTIPLE OBJECTS TRACKING IN SURVEILLANCE VIDEO USING COLOR AND HU MOMENTS
PPTX
Motion Analysis in Image Processing using ML
F1063337
International Journal of Engineering Research and Development
Real time object tracking and learning using template matching
Real time implementation of object tracking through
Objects detection and tracking using fast principle component purist and kalm...
Object tracking using python
Object video tracking using a pan tilt-zoom system
2D/Multi-view Segmentation and Tracking
real-time-object
ess-autonomousnavigation-ijrr10final.pdf
Robust Tracking Via Feature Mapping Method and Support Vector Machine
Proposed Multi-object Tracking Algorithm Using Sobel Edge Detection operator
A survey on moving object tracking in video
Object Detection & Tracking
object ttacking real time embdded ystem using imag processing
CSTalks - Object detection and tracking - 25th May
GPGPU-Assisted Subpixel Tracking Method for Fiducial Markers
MULTIPLE OBJECTS TRACKING IN SURVEILLANCE VIDEO USING COLOR AND HU MOMENTS
In its simplest form, tracking can be defined as the problem of estimating the object motion in the image plane. Numerous approaches have been proposed; they primarily differ in the choice of the object representation, which can include: (i) points, (ii) primitive geometric shapes, (iii) object silhouettes, (iv) skeletal models and more. For further details we refer the reader to [10].

The main challenge of an object tracking system is handling the appearance changes of the target object. Appearance changes can be caused by intrinsic factors, such as pose, scale and shape variation, and by extrinsic factors, such as illumination, camera motion, camera viewpoint and occlusions. To model such variability, various approaches have been proposed, such as updating a low-dimensional subspace representation [15], MIL-based methods [14] and template- or patch-based methods.

Robust algorithms for long-term tracking are generally designed as the union of different modules: a tracker, which performs object motion analysis; a detector, which localizes the object when the tracker accumulates errors at run-time; and a learner, which updates the object model. A system that uses only a tracker is prone to failure: when the object is occluded or disappears from the camera view, the tracker will usually drift. For this reason we choose to design the proposed framework as the union of only two modules: the detector and the learner. The detector can use multiple keypoint-based methods, within a fallback model, to correctly localize the object despite changes of illumination, scale, pose and occlusions. The learner updates the training pool used by the detector to account for large changes in the object appearance. Quantitative evaluations demonstrate the effectiveness of our approach, which can be classified as a "tracking-by-detection" algorithm, since we track the target object by detecting it frame by frame.

The rest of the paper is organized as follows. Section 2 gives an outline of current keypoint-based methods. Section 3 introduces the proposed framework (Matrioska) in detail: subsections 3.1 and 3.4 analyze the detector and the learning module, respectively. Section 4 shows experimental results.

2 Overview of known keypoint-based methods

Numerous new keypoint-based methods (also known as local feature-based or interest point-based) have been proposed over the years. The most recent technique, KAZE [9], was published in 2012.
In reverse chronological order, these methods are:

- KAZE (2012), operating completely in a nonlinear scale space [9];
- FREAK (2012), inspired by the human visual system, and more precisely by the retina [8];
- BRISK (2011), Binary Robust Invariant Scalable Keypoints [6];
- ORB (2011), Oriented FAST and Rotated BRIEF [4];
- ASIFT (2009), a fully affine invariant image comparison method [7];
- SURF (2006), Speeded Up Robust Features [3];
- GLOH (2005), Gradient Location-Orientation Histogram [5];
- PCA-SIFT (2004), applying Principal Components Analysis (PCA) to the normalized gradient patch [2];
- SIFT (1999), the Scale-Invariant Feature Transform [1].

3 Matrioska

In the next sections we describe our novel framework for object detection and tracking (belonging to the category of "tracking-by-detection"). First we describe the principal components of the proposed detection module: (i) a detector that uses the information of multiple keypoint-based methods, (ii) a filtering stage based on a modified Generalized Hough Transform, and (iii) a scale identification process. Then we explain the learning module, based on a growing-and-pruning approach.

Later we perform quantitative tests to show that, by using multiple keypoint-based methods, it is possible to achieve both a faster overall detection time and an improved recall. In this stage we disable the online learning module to focus only on the outcome of using multiple methods. We then enable the online learning module to test Matrioska on a number of challenging video clips that present strong changes in the object appearance. Note that we intentionally choose not to apply any motion analysis, as we only want to focus on the detector and learning components.

Fig. 1. Snapshots from TLD and MILTrack datasets. The proposed method is able to correctly detect the target object despite large occlusion, illumination and pose changes.

3.1 Detection: combining multiple keypoint-based methods

As shown in Section 2, the strong interest in keypoint-based methods is evident, as new methods are presented at a very high rate. The reasons for this interest are essentially two: (i) they are fast, as they only focus on a sparse set of points, and (ii) they are inherently robust to a series of challenges (changes in illumination, point of view, rotation, scale and occlusion). The development of Matrioska starts from these considerations: we want to achieve real-time performance and a high degree of robustness. We believe that the integration of various keypoint-based methods represents one of the best ways to achieve these goals. Furthermore, by combining in a single framework the results coming from different techniques, we are able to take advantage of the strengths of each of them, thus increasing the overall robustness.

Figure 2 shows an outline of the behaviour of Matrioska's detector. The algorithm proceeds as follows:

1. Detection of the keypoints in the nth frame with the first registered method in the technique pool.
2. K-nearest neighbor (k-NN) search between the detected keypoints and those (of the same class) in the training pool.
3. A first outlier filtering using the NNDR (Nearest Neighbor Distance Ratio).
4. A second filtering to discard all matches whose first nearest neighbor was found on a negative sample.
5. A third, more specific filtering, performed with the Generalized Hough Transform (see Section 3.2).
6. A final scale estimation step to accurately draw the bounding box according to the parameters obtained by the GHT (see Section 3.3).

Steps 1-5 are encapsulated in a fallback model: we use the next method only if the previous ones were not able to identify the target object. This model ensures that no more keypoint-based methods than necessary will be used.

Fig. 2. We can add an indefinite number of techniques to the technique pool. Matrioska, using the fallback model, will use, frame by frame, only the number of techniques sufficient to detect the object, saving valuable computational resources.

3.2 Outlier filtering

The main drawback of using multiple keypoint-based methods is that each method adds a considerable number of new outliers, making the filtering stage a challenging process. Furthermore, we must operate in real-time, so the filtering process should be as fast as possible. In this scenario, filtering outliers with well-known fitting methods, such as RANSAC or LMedS (Least Median of Squares), would not produce good results, because the percentage of inliers can fall well below 50%. For this reason we employ a filtering process based on the Generalized Hough Transform (GHT), in which each keypoint match specifies three parameters: the 2D object center and the orientation. To estimate the target object center we store, for each trained keypoint, the size of the corresponding training image, so that we can project the center of this image onto the current frame with a translation and a rotation.

These parameters are sufficient to localize the object if the scale does not change during tracking, but this is a strong assumption that generally does not hold. To account for scale changes we could use a GHT with four parameters, adding the scale to the previous three, similar to the solution proposed by D. Lowe [1]. However, this could pose serious limitations on the keypoint-based methods that Matrioska can use: to identify the right scale bin, we would need the octave in which each keypoint was detected. We cannot rely on it, because not all methods implement octave scaling, and even when they do, we tend to fix the number of octaves to one for performance reasons. Furthermore, the octave number gives only a very broad indication: increasing the scale by an octave means doubling the size of the smoothing kernel, whose effect is roughly equivalent to halving the image resolution. Instead, we want to achieve a higher precision, with a resolution of up to 0.01 in scale.
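The center-voting filter described in this section can be sketched in a few lines. The following is a minimal pure-Python illustration, not Matrioska's actual implementation: the match structure, field names and bin size are assumptions. Each match predicts an object center by rotating and translating the stored center offset; matches landing in the most-voted bin of a coarse accumulator are kept as inliers.

```python
import math
from collections import defaultdict

def ght_filter(matches, bin_size=16.0):
    """Keep only the matches whose predicted object center falls in the
    most-voted bin of a coarse 2D Hough accumulator.

    Each match is a dict with (illustrative fields):
      'pt'     -- (x, y) keypoint position in the query frame
      'offset' -- (dx, dy) vector from the training keypoint to the
                  center of the training image it belongs to
      'angle'  -- relative rotation (radians) between the two keypoints
    """
    votes = defaultdict(list)
    for m in matches:
        dx, dy = m['offset']
        c, s = math.cos(m['angle']), math.sin(m['angle'])
        # Project the stored center offset into the current frame
        # (translation + rotation, as in Sec. 3.2).
        cx = m['pt'][0] + c * dx - s * dy
        cy = m['pt'][1] + s * dx + c * dy
        votes[(int(cx // bin_size), int(cy // bin_size))].append(m)
    # The bin with the most votes is taken as the object hypothesis.
    return max(votes.values(), key=len)
```

Unlike RANSAC, this voting scheme still isolates the consistent matches when inliers are far below 50% of the total, since outliers scatter their votes over many bins.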
3.3 Scale identification

The identification of the current object scale is an important step and deserves a separate section, because it is not directly related to the GHT discussed earlier. We want a stable and accurate method to correctly identify the scale of the object without relying on the keypoint octave. To satisfy these constraints we study the geometric distance between pairs of keypoints: we compute the ratio between the distances of consecutive pairs of matched keypoints in the query image and in one training image, and repeat the process for each training image having at least two matches. After this process, we obtain the final object size by averaging all training image sizes scaled by the factors found with the ratios of distances. The final size is given by:

    S_o = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{1}{J-1} \sum_{k=1}^{J-1} \frac{\lVert P^{Q}_{k} - P^{Q}_{k+1} \rVert}{\lVert P^{T_i}_{k} - P^{T_i}_{k+1} \rVert}    (1)

where:

- S_o and S_i are two vectors representing width and height: S_o is the size of the object, while S_i is the size of the i-th training image;
- N is the number of training images;
- J is the number of keypoints found on the i-th training image;
- P^{Q}_{k} and P^{T_i}_{k} are the k-th keypoints found on the query image Q and matched to the training image T_i.

Fig. 3. We compute the ratio between the distances of consecutive pairs of keypoints belonging to one training image and to the query image, represented here as segments of the same color. The process is iterated over pairs of consecutive keypoints for each training image to obtain the final object size.

3.4 Learning: growing and pruning

The learning component is imperative to solve one of the toughest challenges of visual tracking: adapting the model to account for changes in the target object appearance (shape deformation, lighting variation, large variations of pose). In this section we present the proposed scheme for updating the model (the training pool) used by the detection module (Section 3.1) to track the object. Inspired by [15, 14, 13, 11, 16], our learning model is an incremental growing-and-pruning approach: while tracking the object we must learn both new positive and negative samples, which are added to the training pool.

One of the key factors in learning is the choice of the new positive sample: we must learn a new sample only if the object appearance is undergoing changes. This will: (i) avoid saturating the training pool with duplicated samples, (ii) add valuable information to the detection component and (iii) not let the NNDR discard good matches. To achieve this we must carefully determine the selection criterion. The proposed criterion to choose a new candidate positive sample S_c^+ is a combination of two conditions: (1) the detection module failed to detect the object in the previous frame; (2) the current best GHT bin has fewer than 2V votes. To learn a new positive sample, one of these conditions must be verified.
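Equation (1) in Section 3.3 can be read directly as code. The following pure-Python sketch assumes a hypothetical data layout in which each training image carries its size and its matched keypoint pairs; it is an illustration of the formula, not the paper's implementation.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def estimate_size(train_images):
    """Estimate the object size S_o as in Eq. (1): for every training image
    with at least two matched keypoints, average the distance ratios of
    consecutive keypoint pairs (query vs. training), then average the
    training-image sizes scaled by those ratios.

    train_images: list of dicts (illustrative structure):
      'size' -- (w, h) of the training image, S_i
      'q'    -- matched keypoint positions in the query frame,  P^Q_k
      't'    -- corresponding positions in the training image,  P^{T_i}_k
    """
    sizes = []
    for ti in train_images:
        J = len(ti['q'])
        if J < 2:
            continue  # need at least one pair of consecutive keypoints
        ratio = sum(dist(ti['q'][k], ti['q'][k + 1]) /
                    dist(ti['t'][k], ti['t'][k + 1])
                    for k in range(J - 1)) / (J - 1)
        sizes.append((ti['size'][0] * ratio, ti['size'][1] * ratio))
    n = len(sizes)
    return (sum(w for w, _ in sizes) / n,
            sum(h for _, h in sizes) / n)
```

For example, if the matched keypoints in the query frame are spread twice as far apart as in a 100x50 training image, the estimated object size is 200x100.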
A similar but simpler process is employed for the negative samples: we learn as negatives the keypoints found outside the bounding box whenever the ratio between the number of positive keypoints and the number of negative ones exceeds a given threshold.
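The growing and pruning rules of this section reduce to two small predicates. The sketch below is illustrative only: the vote threshold V and the exact data layout are not specified in this excerpt, so the signatures and the (x, y, w, h) box convention are assumptions.

```python
def should_learn_positive(prev_frame_detected, best_bin_votes, v):
    """Growing criterion (Sec. 3.4): learn a new positive sample S_c^+ if
    (1) the detector failed on the previous frame, or (2) the best GHT bin
    collected fewer than 2V votes. V is a vote threshold whose value is
    not given in this excerpt."""
    return (not prev_frame_detected) or (best_bin_votes < 2 * v)

def mine_negatives(keypoints, bbox, pos_neg_ratio, ratio_threshold):
    """Negative mining (Sec. 3.4): when the positive-to-negative keypoint
    ratio exceeds a threshold, the keypoints outside the bounding box are
    learned as negative samples. bbox is (x, y, w, h); illustrative only."""
    if pos_neg_ratio <= ratio_threshold:
        return []
    x, y, w, h = bbox
    return [p for p in keypoints
            if not (x <= p[0] <= x + w and y <= p[1] <= y + h)]
```

Gating the growth this way keeps the training pool small: a frame where the detector succeeded with a strongly voted bin adds no new sample.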
Fig. 4. Some of the positive samples that have passed the selection criterion and have been learned online. The tested video clip is the Carchase sequence from TLD.

4 Experimental Results

We have tested Matrioska on several challenging video sequences from the TLD [13] and MILTrack [14] datasets. The first tests show the performance changes given by the use of multiple keypoint-based methods. We only show some possible combinations, because testing all possible configurations would not be feasible; our aim is to demonstrate that by using multiple methods we can achieve both a faster overall detection time and an improved accuracy. In this stage we disable the online learning module to focus only on the outcome of using multiple methods. In the second set of tests we enable the online learning module to test Matrioska on a number of challenging video clips that present strong changes in the object appearance. In all tests we initialize the algorithm only with the location of the object in the first frame.

To avoid confusion we use the same metrics in all sequences to evaluate Matrioska's performance: precision P = correctDetections/detections, recall R = correctDetections/trueDetections, and f-measure, where correctDetections is the number of detections whose overlap with the ground truth bounding box is larger than 25%, where ground truth is defined. The overlap is defined as intersection/(GTarea + BBarea - intersection), where GTarea is the area of the ground truth and BBarea is the area of the bounding box [13].

4.1 Detection with multiple methods

In this section we disable the online learning module to evaluate only the results of using multiple keypoint-based methods. We test ORB, BRISK, FREAK, SURF and SIFT alone on some sequences and then try different combinations.
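The evaluation metrics just defined are straightforward to implement; a direct Python rendering, assuming an (x, y, w, h) box layout:

```python
def overlap(gt, bb):
    """Overlap used in Sec. 4: intersection / (GTarea + BBarea - intersection).
    Boxes are (x, y, w, h)."""
    ix = max(0.0, min(gt[0] + gt[2], bb[0] + bb[2]) - max(gt[0], bb[0]))
    iy = max(0.0, min(gt[1] + gt[3], bb[1] + bb[3]) - max(gt[1], bb[1]))
    inter = ix * iy
    return inter / (gt[2] * gt[3] + bb[2] * bb[3] - inter)

def prf(correct, detections, true_detections):
    """Precision, recall and f-measure as defined in Sec. 4."""
    p = correct / detections
    r = correct / true_detections
    return p, r, 2 * p * r / (p + r)
```

A detection counts as correct when `overlap(gt, bb) > 0.25`, matching the 25% threshold above.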
Note that our aim is to show the advantage obtained by combining multiple keypoint-based methods, rather than to run comparative evaluations of single methods (such as [17-19]).

As Tables 1, 2, 3 and 4 show, the best results in terms of recall are obtained with a combination of two methods. Furthermore, Table 2 is indicative of the contribution given by the fallback model: we obtained a recall of 0.69 while halving the time complexity, from the 0.11 seconds per frame (9 FPS) required using SIFT only, to 0.055 seconds (18 FPS) using FREAK + SIFT. This is possible because SIFT is used by Matrioska only when necessary.

Table 1. Tiger2 (MILTrack)

Method(s)      FPS  Recall
ORB            30   0.15
BRISK          20   0.01
FREAK          29   0.23
SURF           6    0.04
SIFT           4    0.05
ORB + FREAK    24   0.38
FREAK + SURF   10   0.37

Table 2. Face occlusion 2 (TLD)

Method(s)      FPS  Recall
ORB            26   0.44
BRISK          25   0.44
FREAK          27   0.63
SURF           16   0.57
SIFT           9    0.64
FREAK + SURF   21   0.68
FREAK + SIFT   18   0.69

Table 3. Motocross (TLD)

Method(s)      FPS  Recall
ORB            38   0.31
BRISK          24   0.10
FREAK          25   0.05
SURF           9    0.04
SIFT           6    0.05
ORB + SURF     15   0.34
ORB + SIFT     10   0.40

Table 4. Car (TLD)

Method(s)      FPS  Recall
ORB            28   0.48
BRISK          30   0.23
FREAK          33   0.54
SURF           15   0.48
SIFT           8    0.67
FREAK + SIFT   14   0.95
ORB + FREAK    20   0.87

Table 4 shows an almost perfect result on the Car sequence (TLD), even without the online learning module enabled: this is due to the fact that the target object does not change its appearance during the sequence, so our detector, with a combination of two methods, is enough to obtain a robust performance.

4.2 Detection and Learning

In the following evaluation we enable the online learning module of Matrioska to test our approach on the datasets used by TLD and MILTrack. The gain in performance compared to the use of the detection module alone is clearly visible in Table 5. Figure 1 shows snapshots of the tested sequences with examples of detection. The obtained results are better than those of many other state-of-the-art approaches [24, 22, 25, 21, 15, 23, 14, 13, 26]. It must be noted that: (i) we could have carefully chosen the most suitable methods for each sequence to obtain better performance, but instead we registered only ORB and FREAK in the technique pool, regardless of the tested sequence; (ii) the use of keypoint-based methods forced us to double the size of the smaller sequences (and relocate the ground truth accordingly), because when the target object is too small we cannot compute a feature vector for each keypoint found inside it (e.g. Tiger2 and Coke11); (iii) we slightly enlarged the first bounding box to be able to compute the feature vectors near the borders of the target object; and (iv) our OpenCV C++ single-threaded implementation runs at 25 FPS on an Intel Core i7-920 with a QVGA video stream.
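The fallback behaviour that produces the timings above (a slow method such as SIFT runs only on the frames where a faster one fails) reduces to a simple chain of detectors. The following is a control-flow sketch in pure Python, not the paper's OpenCV C++ code; the technique callables and their return convention are assumptions.

```python
def detect_with_fallback(frame, technique_pool):
    """Fallback model of Sec. 3.1: run the registered methods in order and
    stop at the first one that localizes the object, so later (slower)
    methods are invoked only on the frames where earlier ones fail.
    Each technique is modelled as a callable returning a bounding box
    or None when the object is not found."""
    for technique in technique_pool:
        bbox = technique(frame)
        if bbox is not None:
            return bbox
    return None  # no registered method found the object in this frame
```

With a pool such as `[freak_detect, sift_detect]`, the average per-frame cost approaches that of the fast method alone whenever it succeeds on most frames, which is exactly the effect measured for FREAK + SIFT.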
Table 5. Evaluation of Matrioska with online learning enabled. We provided only the first object location and the algorithm tracked the target object up to the final frame. The results are better than those of many other state-of-the-art approaches.

Sequence                      Frames  Correct D. / True D.  P / R / F-measure
Car (TLD)                     945     854 / 860             0.97 / 0.99 / 0.98
Carchase (TLD)                9928    7551 / 8660           0.97 / 0.87 / 0.92
Motocross (TLD)               2665    1357 / 1412           0.84 / 0.96 / 0.90
Pedestrian2 (TLD)             338     260 / 266             0.94 / 0.98 / 0.96
Pedestrian3 (TLD)             184     145 / 156             0.96 / 0.92 / 0.94
Coke11 (MILTrack)             292     59 / 59               1.00 / 1.00 / 1.00
David (MILTrack)              462     93 / 93               1.00 / 1.00 / 1.00
Face occlusion 2 (MILTrack)   816     163 / 163             1.00 / 1.00 / 1.00
Tiger2 (MILTrack)             365     67 / 73               0.93 / 0.91 / 0.92
Sylvester (MILTrack)          1345    269 / 269             1.00 / 1.00 / 1.00

5 Conclusions and Future Work

In this paper we presented a novel framework to address the problem of tracking an unknown object in a video sequence. We used a combination of two modules: (i) a detector that, using keypoint-based methods, can identify an object in the presence of illumination, scale, rotation and other changes, and (ii) a learning module that updates the object model to account for large variations of the target appearance. Several tests validated this approach and showed its efficiency. As for future developments, to fully exploit the capabilities of Matrioska it would be ideal to develop a series of new keypoint-based techniques, each based on the analysis of a particular feature (color, shape and more), so as to obtain techniques that are fast and simple when used individually, but robust when used together. Within a fallback model, these techniques would in addition ensure a low computational complexity, as only the features sufficient for the correct detection of the object would be used.
Furthermore, we obtained results comparable, if not better, than those of the current state-of-the-art approaches using only two components (a detector and a learning module). The integration of a tracker could provide even better overall results.

References

1. Lowe, D. G.: Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vision 60, 91–110 (2004)
2. Ke, Y., Sukthankar, R.: PCA-SIFT: a more distinctive representation for local image descriptors. Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2, 506–513 (2004)
3. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 110, 346–359 (2008)
4. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: An efficient alternative to SIFT or SURF. 2011 IEEE International Conference on Computer Vision, 2564–2571 (2011)
5. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1615–1630 (2005)
6. Leutenegger, S., Chli, M., Siegwart, R. Y.: BRISK: Binary Robust Invariant Scalable Keypoints. 2011 IEEE International Conference on Computer Vision (ICCV), 2548–2555 (2011)
7. Morel, J.-M., Yu, G.: ASIFT: A New Framework for Fully Affine Invariant Image Comparison. SIAM J. Img. Sci. 2, 438–469 (2009)
8. Ortiz, R.: FREAK: Fast Retina Keypoint. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 510–517 (2012)
9. Alcantarilla, P. F., Bartoli, A., Davison, A. J.: KAZE features. Proceedings of the 12th European Conference on Computer Vision, 214–227 (2012)
10. Yilmaz, A., Javed, O., Shah, M.: Object tracking: A survey. ACM Comput. Surv. 38 (2006)
11. Kloihofer, W., Kampel, M.: Interest Point Based Tracking. 2010 20th International Conference on Pattern Recognition, 3549–3552 (2010)
12. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1, 511–518 (2001)
13. Kalal, Z., Mikolajczyk, K., Matas, J.: Tracking-Learning-Detection. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1409–1422 (2012)
14. Babenko, B., Yang, M.-H., Belongie, S.: Robust Object Tracking with Online Multiple Instance Learning. IEEE Trans. Pattern Anal. Mach. Intell. 33, 1619–1632 (2011)
15. Ross, D. A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental Learning for Robust Visual Tracking. Int. J. Comput. Vision 77, 125–141 (2008)
16. Hare, S., Saffari, A., Torr, P. H. S.: Efficient online structured output learning for keypoint-based object tracking. Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 1894–1901 (2012)
17. Heinly, J., Dunn, E., Frahm, J.-M.: Comparative evaluation of binary features. Proceedings of the 12th European Conference on Computer Vision, 759–773 (2012)
18. Gauglitz, S., Höllerer, T., Turk, M.: Evaluation of Interest Point Detectors and Feature Descriptors for Visual Tracking. Int. J. Comput. Vision 94, 335–360 (2011)
19. Khvedchenia, I.: A battle of three descriptors: SURF, FREAK and BRISK (2012). http://computer-vision-talks.com/
20. Godec, M., Roth, P. M., Bischof, H.: Hough-based tracking of non-rigid objects. Proceedings of the 2011 International Conference on Computer Vision, 81–88 (2011)
21. Yu, Q., Dinh, T. B., Medioni, G.: Online Tracking and Reacquisition Using Co-trained Generative and Discriminative Trackers. Proceedings of the 10th European Conference on Computer Vision, 678–691 (2008)
22. Avidan, S.: Ensemble Tracking. IEEE Trans. Pattern Anal. Mach. Intell. 29, 261–271 (2007)
23. Santner, J., Leistner, C., Saffari, A., Pock, T., Bischof, H.: PROST: Parallel Robust Online Simple Tracking. 2010 IEEE Conference on Computer Vision and Pattern Recognition, 723–730 (2010)
24. Grabner, H., Bischof, H.: On-line Boosting and Vision. Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1, 260–267 (2006)
25. Grabner, H., Leistner, C., Bischof, H.: Semi-supervised On-Line Boosting for Robust Tracking. Proceedings of the 10th European Conference on Computer Vision, 234–247 (2008)
26. Pernici, F.: FaceHugger: The ALIEN Tracker Applied to Faces. Computer Vision - ECCV 2012, Workshops and Demonstrations 7585, 597–601 (2012)