Multi-view Vehicle Detection and Tracking in
Crossroads
Liwei Liu, Junliang Xing, Haizhou Ai
Computer Science and Technology Department,
Tsinghua University, Beijing 100084, China
Email: ahz@mail.tsinghua.edu.cn
Abstract—Multi-view vehicle detection and tracking in cross-
roads is of fundamental importance in traffic surveillance yet
still remains a very challenging task. The view changes of
different vehicles and their occlusions in crossroads are two
main difficulties that often fail many existing methods. To handle
these difficulties, we propose a new method for multi-view vehicle
detection and tracking that innovates mainly on two aspects: the
two-stage view selection and the dual-layer occlusion handling.
For the two-stage view selection, a Multi-Modal Particle Filter
(MMPF) is proposed to track vehicles in explicit views, i.e. frontal
(rear) view or side view. In the second stage, for the vehicles in
inexplicit views, i.e. intermediate views between frontal and side
view, spatial-temporal analysis is employed to further decide their
views so as to maintain the consistency of view transition. For the
dual-layer occlusion handling, a cluster based dedicated vehicle
model for partial occlusion and a backward retracking procedure
for full occlusion are integrated complementarily to deal with
occlusion problems. The two-stage view selection is efficient
for fusing multiple detectors, while the dual-layer occlusion
handling improves tracking performance effectively. Extensive
experiments under different weather conditions, including snowy,
sunny and cloudy, demonstrate the effectiveness and efficiency
of our method.
I. INTRODUCTION
Detection and tracking of vehicles in traffic scenes is of fundamental importance for surveillance systems and has clear commercial value, providing great potential for many high-level computer vision applications such as traffic analysis, intelligent scheduling and abnormal activity detection. The difficulties behind this problem, however, are considerable: vehicle view and type changes, partial and full vehicle occlusions, and gradual and sudden illumination changes. These difficulties are inevitable in practical applications and thus noticeably aggravate the problem.
Vehicle detection and tracking has been researched for many years, and significant advances have been achieved. Traditional methods try to detect vehicles based on background subtraction [1][2][3] and track them using techniques like the Kalman Filter [3] and the Spatial-Temporal Markov Random Field [2] with different observations such as contour [1] and appearance [3]. Since these methods are sensitive to foreground noise, particular cases such as camera adjustment, rain, snow and shadow cause them to fail. Moreover, they all require that vehicles be identified separately before occlusion happens, which is a strong constraint that limits their practical application in crowded scenarios.
Fig. 1. The flow chart of our approach.
In the last decade, the fast development of object detection techniques has resulted in many promising methods for detecting particular object classes, e.g., faces [4][5], pedestrians [6][7], and vehicles [8]. These object detectors provide good observation models for detection based tracking algorithms. All the detection based methods can be categorized into three classes according to the types of detectors: single view detector [4][7], integration of multiple view detectors [6], and single multi-view detector [8]. Obviously, a single view detector is unsuitable for scenarios that contain multi-view targets, e.g. crossroads. In consideration of the connections and distinctions among multiple view detectors, a tracking algorithm based on multiple detectors must have a sophisticated integration strategy. A single multi-view detector (often used in onboard systems) requires high affinity of targets in each view and a uniform aspect ratio of vehicles, so this approach does not work in our problem. In addition, Data-Driven MCMC [9] has been used to recover trajectories of targets of interest over time, but this method requires all the videos in advance and uses optimization algorithms to solve the problem, which conflicts with the requirements of online and real-time processing in our problem. As far as we know, there are very few works on multi-view vehicle detection and tracking in crossroads based on detection techniques that can process online and in real-time. Our approach is motivated by this practical requirement.
In this work, we focus on videos taken by a single camera at a height above the ground, as is common in surveillance applications. The vehicle videos are acquired in crossroads where occlusions among vehicles and viewpoint changes are rather severe.
Fig. 2. Results of view confidence weights (from left to right: side view (red), intermediate view, and frontal view (green); the histogram in the bottom-right corner of each figure gives a quantitative comparison of the weights).
Owing to its detection based techniques, our approach is much more robust to shadow and illumination changes than background subtraction based ones. The main contributions of our approach are: (1) a real-time and online processing system that can deal with view changes and occlusions effectively; (2) a two-stage view selection technique that can efficiently fuse multiple detectors; (3) a dual-layer occlusion handling technique that can deal with partial and full occlusions integrally.
The rest of this paper is organized as follows. The details of the proposed method are elaborated in Section II. Experimental results are demonstrated in Section III. Conclusions and discussions are given in Section IV.
II. THE PROPOSED APPROACH
The flow chart of our multi-view vehicle detection and
tracking system is shown in Fig.1. Multiple view detectors are
not only employed to search for new targets but also coupled
together in MMPF to guide the tracking process and perform
view selection of targets in explicit views. For those targets
in inexplicit views, spatial-temporal analysis is explored to
smooth their view transitions and maintain the consistency of traffic flow. To handle occlusions, we devise a cluster based dedicated vehicle model and a backward retracking procedure for partial occlusion and full occlusion,
respectively. In the following, after briefly introducing multiple
view detectors, we will focus our illustration on the two-
stage view selection and dual-layer occlusion handling, which
mainly differentiate our approach from previous methods.
A. Multiple View Detectors
For vehicle surveillance videos in crossroads, it is very
difficult to train one detector that covers all views due to
the large variance of the vehicle appearance. So we train
detectors that cover typical views like frontal (rear) view and
side view. The two detectors are offline trained in the boosting
framework with Joint Sparse Granular Features (JSGF)†, which
has been proven to be effective for object detection and robust
to illumination variation. They provide very discriminative and
steady observation models for multi-view vehicle tracking.
B. Two-Stage View Selection
Having frontal and side view detectors is far from enough
for multi-view vehicle tracking due to response conflict of
the two detectors. In other words, if an unreliable observation is chosen to track a target with conflicting responses, the target may be lost when it cannot obtain enough supporting observations. So we propose the two-stage view selection to integrate the two independent detectors for multi-view vehicle tracking. The two-stage view selection consists of the Multi-Modal Particle Filter and spatial-temporal analysis, which are introduced below.
†Specified object detection apparatus, Chinese Patent 200710305499.7, inventors: Haizhou Ai, Chang Huang, Shihong Lao, Takayoshi Yamashita.
TABLE I
THE FRAMEWORK OF TWO-STAGE VIEW SELECTION
Given: each object $s_{t-1}$ has its supporting multi-view particle set $\{s^n_{t-1,v}, \pi^n_{t-1,v}\}_{n=1,v=1}^{N,V}$, where $N$ is the number of particles per view, $V$ is the number of views, $t-1$ is the frame number and $\pi^n_{t-1,v}$ is the weight of particle $s^n_{t-1,v}$:
• For the particles of the dominant view $dv_{t-1} \in V$:
  + Predict, resample and update as in a traditional particle filter;
  + Obtain the weighted mean state $s_{t,dv_{t-1}}$ of the dominant view;
• For each other view $\{v' \mid v' \in V,\ v' \neq dv_{t-1}\}$:
  + If a new target matches this view:
    - Reinitialize the particles with the new target;
    - Use the detector to evaluate the particles;
  + Else if $\sum_{n=1}^{N} \pi^n_{t-1,v'} < T_S$, or the distance between the centers of $s_{t-1,v'}$ and $s_{t-1,dv_{t-1}}$ satisfies $Dis(v', dv_{t-1}) > T_{Dis}$:
    - Reset all the particles according to $s_{t,dv_{t-1}}$;
    - Update with the reset particles;
  + Else:
    - Predict, resample and update as in a traditional particle filter;
  + Obtain the weighted mean state $s_{t,v'}$ of $v'$;
• If there is a view $v$ with $\sum_{n=1}^{N} \pi^n_{t,v} - \sum_{n=1}^{N} \pi^n_{t,v'} > T_W$ for all $\{v' \mid v' \in V,\ v' \neq v\}$:
  + $dv_t = v$;
• Else:
  + Perform spatial-temporal analysis;
1) Multi-Modal Particle Filter: Multi-Modal Particle Filter
(MMPF) is devised to track multi-view targets. As the name
suggests, a target has two possible views (frontal and side
views) as its two modes, but at any given time it reveals only one view; MMPF is employed to integrate the two view detectors to track it and perform the first-stage view selection.
Different from traditional particle filter or CONDENSA-
TION [10], MMPF maintains two groups of particles for a
target, one for frontal view and the other for side view, not
only to track the target but also to acquire its view transition. In
the MMPF framework (Table 1), each particle is evaluated by
a confidence reflecting the likelihood of the target belonging
to the corresponding view. To select the dominant view, the
total confidence of its particles is calculated for each view. If
the difference between two views’ total confidences is bigger
than a threshold TW (equation (1)), then the bigger one (as
Fig.2(a) and Fig.2(c)) will be treated as the dominant view,
otherwise (as Fig.2(b)) a second stage view selection will be
adopted. Denoting $N$ as the number of particles and $\pi^n_{t,v}$ as the $n$th particle's confidence for view $v$ in frame $t$:

$$\sum_{n=1}^{N} \pi^n_{t,v} - \sum_{n=1}^{N} \pi^n_{t,v'} > T_W \quad (1)$$
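To make this first-stage test concrete, here is a minimal sketch in Python; the function and variable names are ours, and only the thresholded comparison of per-view total confidences comes from the paper, with $T_W = 5$ as in the experimental settings of Section III-A:

```python
import numpy as np

def select_dominant_view(weights_by_view, T_W=5.0):
    """First-stage view selection, Eq. (1).

    weights_by_view: dict mapping a view name to the array of its
    particle confidences pi^n_{t,v}. Returns the dominant view if its
    total confidence exceeds every other view's by more than T_W,
    otherwise None (the view is inexplicit and the second stage,
    spatial-temporal analysis, must decide).
    """
    totals = {v: float(np.sum(w)) for v, w in weights_by_view.items()}
    best = max(totals, key=totals.get)
    if all(totals[best] - c > T_W for v, c in totals.items() if v != best):
        return best
    return None
```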
Since the two groups of particles are not independent, the
traditional procedures of predict, resample and update [10]
Fig. 3. (a) Tracking result (a green box represents frontal view and a red box denotes side view). (b) The predefined confidences of particles (brightness indicates confidence; the position is the center of a particle).
for particles are unsuitable for our framework. So MMPF
needs redesigned procedures to deal with the special cases in which the observation of the minor view (the view other than the dominant view) becomes unreliable or drifts. The redesigned predict, resample and update procedures can be formalized as equation (2) (following the framework in Table I):
Predict by $p(s_{t,dv} \mid s_{t-1,dv})$: $\{s^{(i)}_{t,v}, \pi^{(i)}_{t-1,v}\} \sim p(s_{t,v} \mid O_{t-1,v})$

Resample:
  $\{s^{(i)}_{t,dv},\ 1/N_{dv}\} \sim p(s_{t,dv} \mid O_{t-1,dv})$
  $\{N(s_{new}, \delta^2),\ 1\} \sim p(s_{t,mv} \mid O_{t-1,mv})$
  $\{T(s^{(i)}_{t,dv}),\ 1\} \sim p(s_{t,mv} \mid O_{t-1,mv})$
  $\{s^{(i)}_{t,mv},\ 1/N_{mv}\} \sim p(s_{t,mv} \mid O_{t-1,mv})$

Update: $\pi^{(n)}_{t,v} \propto p(o_{t,v} \mid s_{t,v})$, $\{s^{(i)}_{t,v}, \pi^{(i)}_{t,v}\} \sim p(s_{t,v} \mid O_{t,v})$ \quad (2)
where $dv$ is the dominant view and $mv$ is the minor view, $dv \cup mv = V$. The tracking algorithm first predicts all the particles according to a motion model $p(s_{t,dv} \mid s_{t-1,dv})$ of $dv$. In the resample stage, the particles of $dv$ are resampled according to their weights. For the minor view $mv$, however, different measures are adopted depending on the circumstances: when a new target ($s_{new}$) matches the minor view, $N(s_{new}, \delta^2)$ is used to generate new particles through Gaussian sampling; when the observation becomes unreliable (the total confidence is too small) or the particles drift to another target, $T(s^{(i)}_{t,dv})$ resets the particles according to the particles of the dominant view (with the same center and scale). Apart from these two situations, the particles of the minor view are resampled like the dominant view's. Finally, the tracking algorithm updates the states of both views with the weighted mean of all the resampled particles.
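The resample-stage branching for the minor view might be sketched as follows. This is a simplification under our own assumptions: states are [x, y, w, h] arrays, the drift test uses a track-center distance, and the threshold values are illustrative rather than the paper's:

```python
import numpy as np

def resample_minor_view(particles, weights, dominant_state,
                        new_target=None, T_S=1.0, T_dis=30.0, delta=2.0):
    """Resample-stage branching of Eq. (2) for the minor view mv.

    particles: (N, 4) array of states [x, y, w, h]; weights: (N,)
    particle confidences; dominant_state: weighted mean state s_{t,dv}.
    T_S, T_dis and delta are illustrative values, not the paper's.
    """
    N = len(particles)
    if new_target is not None:
        # A new detection matched this view: reinitialize by Gaussian
        # sampling around it, N(s_new, delta^2).
        return new_target + delta * np.random.randn(N, 4)
    center = particles[:, :2].mean(axis=0)
    drifted = np.linalg.norm(center - dominant_state[:2]) > T_dis
    if weights.sum() < T_S or drifted:
        # Unreliable or drifting observation: T(s_t,dv) resets the
        # minor view to the dominant view's center and scale.
        return np.tile(dominant_state, (N, 1))
    # Otherwise resample by weight, as in a traditional particle filter.
    idx = np.random.choice(N, size=N, p=weights / weights.sum())
    return particles[idx]
```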
In the MMPF framework, the observation models need to give a confidence reflecting the likelihood of the target belonging to the corresponding view. The outputs of each view detector can in principle provide particle confidences that yield the corresponding view confidence, but they are inaccurate and differ from view to view, so they cannot be used directly without post-processing. We therefore utilize the number of layers a particle passes, $l$, and the output of its last layer, $conf_{det}$, to predefine the confidence of a particle.
$$x_l = \exp(a \times (conf_{det} - T^l_{det})) \quad (3)$$

$$\bar{p}_l = c^{\,l - l_{max}} \quad (4)$$

$$p_l = p_{l-1} + \frac{x_l}{1 + x_l} \times (\bar{p}_l - p_{l-1}) \quad (5)$$
Fig. 4. (a) Vehicles in transition views produce responses from different view detectors. (b) Some kinds of vehicles have appearances similar to those of vehicles in other views. (c) Different positions cause appearances similar to those of vehicles in other views.
where $x_l$ is the exponential amplification of the difference between $conf_{det}$ and $T^l_{det}$, $T^l_{det}$ is the confidence threshold of the detector at layer $l$, and $a$ is a constant (set to 5 in experiments). In (4), $\bar{p}_l$ is the basis confidence of layer $l$, $c$ is also a constant (set to 1.1), and $l_{max}$ is the total number of layers of the corresponding detector. $p_l$ in (5) is the redefined confidence.
The frontal view and side view detectors are trained in the same way, with the same number of layers and the same per-layer detection rate, so their pass rates of positive samples in each layer are the same; hence the number of layers a particle passes is an important cue for evaluating the particle. Since layer numbers are discrete and the raw outputs of the detectors are inaccurate, integrating the two metrics to redefine the confidence is more appropriate than using either of them alone. After our redefinition, the confidence is normalized to [0, 1): the higher the layer a particle passes, the bigger its confidence. Figure 3(b) shows the predefined confidences.
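As an illustration, one step of the confidence redefinition could be written as below. This is a sketch under the assumption that $p_{l-1}$ is the confidence carried over from the previous layer; the paper does not specify how $p_0$ is initialized:

```python
import numpy as np

def redefine_confidence(l, conf_det, T_det_l, l_max, p_prev, a=5.0, c=1.1):
    """One step of Eqs. (3)-(5).

    l: number of cascade layers the particle passed; conf_det: detector
    output at that layer; T_det_l: that layer's confidence threshold
    T^l_det; l_max: total layers of the detector; p_prev: p_{l-1}.
    a = 5 and c = 1.1 follow the paper's experimental settings.
    """
    x_l = np.exp(a * (conf_det - T_det_l))  # Eq. (3): amplified margin over the threshold
    p_bar = c ** (l - l_max)                # Eq. (4): basis confidence of layer l, in (0, 1]
    sig = x_l / (1.0 + x_l)                 # logistic squashing into (0, 1)
    return p_prev + sig * (p_bar - p_prev)  # Eq. (5): blend previous confidence toward basis
```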
2) Spatial-Temporal Analysis: Although MMPF is effective in most cases, it is likely to fail when targets reveal inexplicit views (Fig. 4(a)), which may lead to frequent view switches. What is more, some targets may confuse MMPF when their appearances are ambiguous due to their types (Fig. 4(b)) or their distance to the camera (Fig. 4(c)). To address these problems, spatial-temporal analysis is employed to perform the second-stage view selection, which smooths the view switching procedure so that the selected view coincides not only with the traffic flow but also with the view variation tendency.
During the spatial-temporal analysis, four different types of energy terms are explored to vote for the correct view: primary particles, velocity difference, historical views and neighboring targets.

Primary Particles. This is the number of confident particles, which reflects the likelihood of a target belonging to a view from another perspective:

$$|P|, \quad P = \{p \mid Conf_p > T_c\} \quad (6)$$
Velocity Difference. Since vehicles in different views have different moving directions in crossroads, the velocity difference can be used as an energy term. Taking the side view as an example, the velocity along the x-direction is larger than that along the y-direction. We adopt the mean velocity over the most recent 10 frames as a target's velocity, because the velocity between two contiguous frames is inaccurate.

$$V_{Side} = |V_x| - |V_y| \quad (7)$$

$$V_{Frontal} = |V_y| - |V_x| \quad (8)$$
TABLE II
COEFFICIENTS OF THE ENERGY TERMS
Energy Term:  Primary Particles   Velocity Difference   Historical Views   Neighboring Targets
Coefficient:  α = 1/200           β = 1                 γ = 1/10           δ = 1/4
Historical Views. As temporal information, historical views are utilized to smooth the view variation tendency. In our experiments, we record each target's view over the last n frames (n = 10), and use the number of side view occurrences $H_{Side}$ and the number of frontal view occurrences $H_{Frontal}$ as the temporal energy terms.
Neighboring Targets. Since the traffic flow is consistent at a given time, a target's view is always the same as its neighbors'. So the numbers of nearby targets with the same view, $N_{Side}$ and $N_{Frontal}$, are introduced into the spatial-temporal analysis as spatial information.
The composite energy function is defined as equation (9), with the coefficients shown in Table II. Maximum likelihood estimation is used to select the dominant view.

$$U_v = \alpha \times P_v + \beta \times V_v + \gamma \times H_v + \delta \times N_v \quad (9)$$
As the second stage of view selection, spatial-temporal analysis uses spatial and temporal information to help targets in inexplicit views obtain reliable views. The efficient fusion of MMPF and spatial-temporal analysis makes it possible to track multi-view vehicles by seizing their primary observations.
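A sketch of how this second stage might be computed, with the Table II coefficients; the per-target record `TrackInfo` and the confident-particle threshold $T_c$ are our assumptions, since the paper does not report a value for $T_c$:

```python
from dataclasses import dataclass
import numpy as np

ALPHA, BETA, GAMMA, DELTA = 1 / 200, 1.0, 1 / 10, 1 / 4  # Table II

@dataclass
class TrackInfo:
    """Hypothetical per-target record used by the second stage."""
    particle_conf: dict   # view name -> array of particle confidences
    mean_velocity: tuple  # (vx, vy), averaged over ~10 recent frames
    view_history: list    # views selected in the last frames
    neighbor_views: list  # current views of nearby targets

def spatial_temporal_view(track: TrackInfo, T_c: float = 0.5) -> str:
    """Second-stage view selection via the composite energy of Eq. (9).

    T_c (confident-particle threshold) is a placeholder value; the
    paper does not report it.
    """
    vx, vy = track.mean_velocity
    energy = {}
    for view in ("side", "frontal"):
        P = int(np.sum(track.particle_conf[view] > T_c))                 # Eq. (6)
        V = abs(vx) - abs(vy) if view == "side" else abs(vy) - abs(vx)   # Eqs. (7)-(8)
        H = sum(v == view for v in track.view_history[-10:])             # historical views
        N = sum(v == view for v in track.neighbor_views)                 # neighboring targets
        energy[view] = ALPHA * P + BETA * V + GAMMA * H + DELTA * N      # Eq. (9)
    return max(energy, key=energy.get)  # view with the maximal energy
```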
C. Dual-layer Complementary Occlusion Handling
Besides the difficulty of selecting the corresponding vehicle view, occlusion between multiple vehicles is another tough problem. Occlusion can be divided into two types: partial occlusion and full occlusion. Under partial occlusion, the detectors tend to drift due to their congenital deficiency in distinguishing different targets. To solve this problem, a dedicated vehicle model based on clustering is proposed to prevent responses from drifting. As for full occlusion, and for those partial occlusions whose observations become unreliable or lost, a backward smoothing [10] process is adopted to handle them.
Taking advantage of the traffic scene, we propose a dedicated vehicle model based on clustering to handle partial occlusion effectively. The model fuses multiple cues, including position, size and moving trend, to label particles in order to prevent them from drifting. When one target is partially occluded by another target, its particle filter may fail because of response drifting: in the resample stage, some randomly resampled particles cover the other target and receive high confidences, so the merged result gradually drifts to the other target and ultimately breaks the particle filter. It is therefore necessary to label the high-confidence particles in an occlusion cluster before merging, to prevent responses from drifting. For this purpose, we adopt K-Means to cluster the confident particles, exploring the features of position, size and moving-trend difference. We denote the feature vector of a particle as $(x_n, y_n, w_n, h_n, dv^i_{n,x}, dv^i_{n,y})$, where $x_n, y_n, w_n, h_n$ indicate its location and size, $dv^i_{n,x}, dv^i_{n,y}$ are the moving-trend differences in the x and y directions, and $i$ is the target id in the occlusion cluster. For example, for a particle belonging to object 1, $dv^1_{n,x}$ and $dv^1_{n,y}$ represent the differences between the velocity from the target's position in the last frame ($t-1$) to the position of the particle and the velocity of the target. These differences are formalized in equations (10) and (11); the smaller they are, the more likely the particle belongs to the target.

$$dv^i_{n,x} = x_n - x^i_{t-1} - v_{t-1,x} \quad (10)$$

$$dv^i_{n,y} = y_n - y^i_{t-1} - v_{t-1,y} \quad (11)$$
To accelerate convergence and increase accuracy, we use the states of the targets in the occlusion cluster at the last frame ($t-1$) as the initial clustering centers ($dv^i_{n,x} = 0$, $dv^i_{n,y} = 0$). After K-Means clustering, the obtained cluster centers are deemed the states of the targets and are used to reinitialize the corresponding particle sets. If the overlap ratio of two merged targets is bigger than a threshold $T_{overlap}$, which indicates that one target tends to be fully occluded by the other, the second layer of occlusion handling is performed.
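A minimal sketch of this clustering step, assuming the confident particles of one occlusion group have been pooled; assigning each particle its nearest target's moving-trend difference is our simplification of the per-target id $i$:

```python
import numpy as np
from sklearn.cluster import KMeans

def relabel_occluded_particles(particles, prev_states, prev_velocities):
    """Cluster the confident particles of one occlusion group.

    particles: (M, 4) array of [x, y, w, h] pooled from the occluding
    targets' confident particles; prev_states: (K, 4) target states at
    frame t-1; prev_velocities: (K, 2) target velocities. Features are
    (x, y, w, h, dv_x, dv_y), with the differences of Eqs. (10)-(11).
    """
    K = len(prev_states)
    # Moving-trend differences of every particle w.r.t. every target.
    dvx = particles[:, 0:1] - prev_states[:, 0] - prev_velocities[:, 0]  # (M, K)
    dvy = particles[:, 1:2] - prev_states[:, 1] - prev_velocities[:, 1]  # (M, K)
    rows = np.arange(len(particles))
    nearest = np.argmin(np.hypot(dvx, dvy), axis=1)
    feats = np.hstack([particles,
                       dvx[rows, nearest][:, None],
                       dvy[rows, nearest][:, None]])
    # Initialize the centers at the previous target states (dv = 0),
    # which accelerates convergence as the paper suggests.
    init = np.hstack([prev_states, np.zeros((K, 2))])
    km = KMeans(n_clusters=K, init=init, n_init=1).fit(feats)
    # Cluster centers are deemed the new target states.
    return km.labels_, km.cluster_centers_[:, :4]
```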
To surmount full occlusion, a backward smoothing [10] process is adopted to retrack lost targets. When a target cannot get enough supporting particles, its track is buffered for future backward smoothing with newly collected observations. The process first scores matches between the new targets and the buffered targets by their affinity (the overlap rate), and then the Hungarian algorithm is employed to obtain the optimal match between the two sets.
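This association step might look as follows, using the overlap rate as affinity and SciPy's Hungarian solver; the gating threshold `min_affinity` is our own addition, as the paper does not report one:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Overlap rate of two boxes (x, y, w, h): the affinity measure."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    return inter / (a[2] * a[3] + b[2] * b[3] - inter)

def match_buffered_tracks(new_boxes, buffered_boxes, min_affinity=0.3):
    """Associate new targets with buffered (occluded) tracks so that
    backward smoothing can retrack them."""
    cost = np.array([[1.0 - iou(n, b) for b in buffered_boxes]
                     for n in new_boxes])
    rows, cols = linear_sum_assignment(cost)  # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols)
            if 1.0 - cost[r, c] >= min_affinity]
```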
III. EXPERIMENTS
Experiments are carried out on videos collected from traffic surveillance cameras in crossroads (with camera adjustment, rain, snow and shadow) and on some real-world video data collected with a hand-held camera. The system runs at more than 22 fps on VGA-size (640×480) video on an Intel Core Quad 2.66 GHz CPU with 4 GB RAM.
A. Experiment Settings
In our experiments, the offline frontal view detector is trained from 18567 samples with normalized size 24×24, while the side view detector is trained from 9814 samples with normalized size 48×24. The total confidence threshold $T_W$, used to check whether a target's view is explicit, is set to 5. For the dual-layer occlusion handling, if the overlap between two merged targets after particle clustering is bigger than 0.9 ($T_{overlap}$), the backward smoothing procedure is adopted.
B. Detection Performance Evaluation
The evaluation set contains 3334 images with manually labeled ground truth, covering 24800 multi-view vehicles captured from traffic surveillance videos under different weather conditions. We evaluate the detection performance on the tracking results, which reflects the precision of the tracking algorithm (detectors
Fig. 5. ROC curve of multi-view vehicle tracking results.
are used to search for new targets in a part of the image pyramid and to offer confidences for particles). We compare our work (red
curve) with two other baseline methods: 1) Simple Integration: detect (same detectors) and track (particle filter) targets in frontal view and side view respectively; when a target reveals both views (overlap), the one with the larger confidence is selected and the corresponding detector is used to track it afterwards; 2) Frontal + Side View: detect and track targets in frontal view and side view respectively, with no post-processing. In Fig. 5, we can see that our method achieves a higher detection rate than the other two methods at the same false alarm level while using the same detectors. We attribute this to the two-stage view selection, since it makes the MMPF seize the primary observations of targets and track them effectively.
C. Tracking Performance Evaluation
We adopt the same metrics for evaluating tracking performance as in [6][10]. These metrics are defined as follows. MT: number of Mostly Tracked trajectories; ML: number of Mostly Lost trajectories; Frmt: number of trajectory Fragments; FAT: number of False Alarm Trajectories; IDS: frequency of Identity Switches.
The video we use to evaluate tracking performance consists of 10002 frames at 640×480 resolution and contains frequent partial occlusions and intensive full occlusions. To evaluate the performance of the dual-layer occlusion handling, we compare our algorithm with a variant without occlusion handling. From Table III, which gives the comparison results, we can see that the dual-layer occlusion handling achieves an improvement on almost all the metrics, especially on Frmt. We attribute this significant improvement to the dual-layer occlusion handling, since it provides progressive association for tracking occluded targets, which overcomes most of the fragments. The improvement in Frmt further increases the MT of our method. Fig. 6 gives some typical tracking results.
TABLE III
TRACKING COMPARISON
Algorithm                     GT    MT    ML   Frmt   FAT   IDS
Our method                    215   187   6    41     3     5
Without occlusion handling    215   167   6    129    5     7
Fig. 6. Typical tracking results (first row: complex background; second row:
shadow and occlusion; third row: pedestrian disturbance).
IV. CONCLUSION
In this paper, we present a robust multi-view vehicle detection and tracking algorithm for crossroads. It is a real-time and
online processing system that can deal with view changes and
occlusions effectively. The two-stage view selection is efficient
in fusing multiple detectors while the dual-layer occlusion
handling technique can tackle both partial and full occlusions.
Experiments under different weather conditions (snowy, sunny
and cloudy) demonstrate the effectiveness and efficiency of our
method.
ACKNOWLEDGMENT
This work is supported by Beijing Educational Committee
Program (YB20081000303).
REFERENCES
[1] D. Koller, J. Weber, and J. Malik, “Robust multiple car tracking with
occlusion reasoning,” in Eur. Conf. Comput. Vis., 1994.
[2] S. Kamijo, Y. Matsushita, K. Ikeuchi, and M. Sakauchi, “Occlusion
robust tracking utilizing spatio-temporal markov random field model,”
in IEEE Int. Conf. Pattern Recognition, 2000.
[3] B. T. Morris and M. M. Trivedi, “Learning, modeling, and classification
of vehicle track patterns from live video,” IEEE Trans. Intell. Transp.
Syst., vol. 9, pp. 425–437, 2008.
[4] P. Viola and M. J. Jones, “Robust real-time face detection,” Int. J.
Comput. Vis., vol. 57, pp. 137–154, 2004.
[5] C. Huang, H. Ai, Y. Li, and S. Lao, “High performance rotation invariant
multiview face detection,” IEEE Trans. Pattern Anal. Mach. Intel.,
vol. 29, pp. 671–686, 2007.
[6] B. Wu and R. Nevatia, “Detection and tracking of multiple, partially
occluded humans by bayesian combination of edgelet based part detec-
tors,” Int. J. Comput. Vis., vol. 75, pp. 247–266, 2007.
[7] N. Dalal and B. Triggs, “Histograms of oriented gradients for human
detection,” in IEEE Int. Conf. Comput. Vis. Pattern Recognition, 2005.
[8] C.-H. Kuo and R. Nevatia, “Robust multi-view car detection using
unsupervised sub-categorization,” in Appl. of Comput. Vis., 2009.
[9] Q. Yu and G. Medioni, “Integrated detection and tracking for multiple
moving objects using data-driven mcmc data association,” in IEEE
Motion and Video Computing, 2008.
[10] J. Xing, H. Ai, L. Liu, and S. Lao, “Multiple player tracking in sports
video: a dual-mode two-way bayesian inference approach with progres-
sive observation modeling," IEEE Trans. Image Processing, vol. 20, pp.
1652–1667, 2011.