Anisotropic Partial Differential Equation
based Video Saliency Detection
Vartika Sharma, Vembarasan Vaitheeswaran, Chee Seng Chan
Original LESD Model
Our Contributions
• First, we propose a novel method to generate a static saliency map based on an adaptive nonlinear PDE model. It builds on the Linear Elliptic System with Dirichlet boundary (LESD) model for image saliency detection.
• We refine this model for video saliency detection, because the original LESD model does not consider the orientation and motion information contained in a video.
• Further, the original LESD algorithm was evaluated on the MSRA and Berkeley datasets, where images are mostly noiseless and the salient object lies near the image center. Most video datasets, in contrast, contain heavy noise, and the salient object usually moves within the frames. For this reason, we do not use the center prior of the original LESD model; instead, we use an extensive direction map consisting of background prior, color prior, texture, and luminance features.
• We then combine the static map with a motion map, built from motion features extracted from the motion vectors of the predicted frames, to obtain the final saliency map. Figure 1 shows the pipeline of our model.
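As a rough illustration of this final fusion step, the sketch below combines a static and a motion saliency map with a simple weighted sum. The min-max normalization and the weight `alpha` are assumptions for illustration, not necessarily the exact fusion rule used in our model.

```python
import numpy as np

def fuse_saliency(static_map, motion_map, alpha=0.5):
    """Combine a static and a motion saliency map into a final map.

    The weighted-sum rule and the weight alpha are illustrative
    assumptions; the actual fusion in the model may differ.
    """
    # normalize each map to [0, 1] so the weights are comparable
    s = (static_map - static_map.min()) / (static_map.max() - static_map.min() + 1e-8)
    m = (motion_map - motion_map.min()) / (motion_map.max() - motion_map.min() + 1e-8)
    return alpha * s + (1 - alpha) * m
```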
CVPR presentation
Addition of Non-Linear Metric Tensor
• The diffusion PDE seen previously does not give reliable information
in the presence of flow-like structures (e.g. fingerprints).
• We extend our model to flow-like structures, where the PDE flow must
be rotated towards the orientation of the interesting features.
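One standard way to recover the orientation of flow-like features, which this rotated diffusion needs, is the structure tensor built from image gradients. The sketch below is illustrative only: the usual Gaussian smoothing of the tensor entries is omitted for brevity, and our model's actual tensor may differ.

```python
import numpy as np

def structure_tensor_orientation(img):
    """Per-pixel orientation of the dominant gradient structure.

    Illustrative sketch: tensor entries are not smoothed here,
    though a Gaussian smoothing of the products is usual.
    """
    # image gradients via central differences (axis 0 = y, axis 1 = x)
    gy, gx = np.gradient(img.astype(float))
    # structure-tensor entries
    Jxx, Jxy, Jyy = gx * gx, gx * gy, gy * gy
    # orientation of the dominant eigenvector per pixel
    theta = 0.5 * np.arctan2(2 * Jxy, Jxx - Jyy)
    return theta
```

For a purely horizontal intensity ramp the dominant orientation is zero, i.e. aligned with the x-axis.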
Feature Extraction From DCT Coefficients
• Three features, luminance, color, and texture, are extracted from the
unpredicted frames (I-frames) using DCT coefficients.
• On a given video frame, the DCT operates on one 8×8 block at a time.
Each block contains 64 elements (64 coefficients), and the DCT
traverses the block from left to right and top to bottom (zig-zag
sequencing).
Feature Extraction From DCT Coefficients
• A 64-element DCT transform yields 1 DC coefficient and 63
AC coefficients.
• The DC coefficient represents the average color of the 8×8 region
(color and luminance priors).
• The 63 AC coefficients represent color change across the
block (texture).
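A minimal sketch of the transform behind these priors: an orthonormal 2-D DCT-II of an 8×8 block, where the DC coefficient captures the block average and the AC coefficients capture variation (texture). The zig-zag scan and the construction of the actual priors are omitted.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, via the DCT matrix."""
    N = block.shape[0]
    n = np.arange(N)
    # DCT-II basis matrix: C[k, i] = cos(pi * (2i + 1) * k / (2N))
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C[0, :] *= 1 / np.sqrt(2)       # orthonormal scaling of the DC row
    C *= np.sqrt(2 / N)
    return C @ block @ C.T

# a flat gray 8x8 block: all energy lands in the DC coefficient
block = np.full((8, 8), 100.0)
coeffs = dct2(block)
# DC = N * mean for the orthonormal DCT (here 8 * 100 = 800);
# all 63 AC coefficients vanish because the block has no texture
```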
Motion Feature Extraction from Motion
Vectors
• Motion Vector: A two-dimensional vector used for inter prediction
that provides an offset from the coordinates in the decoded picture to
the coordinates in a reference picture.
• There are two types of predicted frames: P frames use motion
compensated prediction from a past reference frame, while B frames
are bidirectionally predictive-coded by using motion compensated
prediction from a past and/or a future reference frame.
Motion Feature Extraction from Motion
Vectors
• As there is just one prediction direction for P frames (prediction
from a past reference frame), the original motion vector MV is used
to represent the motion feature for P frames.
• As B frames may include two types of motion-compensated prediction
(backward and forward), we calculate combined motion vectors for
B frames from both prediction directions.
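As a sketch of this step, the helper below forms a single motion feature for a B frame by averaging the two prediction directions, with the backward vector negated because it points toward a future reference. The averaging rule and the sign convention are assumptions for illustration; the exact combination used in our model may differ.

```python
import numpy as np

def b_frame_motion(mv_forward, mv_backward):
    """Combine a B frame's two prediction directions into one vector.

    Illustrative assumption: the backward vector points toward a
    future reference, so it is negated before averaging with the
    forward vector.
    """
    return 0.5 * (np.asarray(mv_forward, dtype=float)
                  - np.asarray(mv_backward, dtype=float))
```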
Anisotropic Partial Differential Equation based Video Saliency
Detection
Vartika Sharma, Vembarasan Vaitheeswaran, Chee Seng Chan
Result of our Video Saliency Detection model on KTH Action
Dataset
Results on KTH Action Datasetϯ
Number of action classes = 6
{boxing, hand clapping, hand waving, jogging, running, walking}
Boxing Hand Clapping Hand Waving Jogging Running Walking
Original Action Videos*
Final Saliency Maps
* For convenience, I have chosen only 16 frames per video
Ϯ "Recognizing Human Actions: A Local SVM Approach", Christian Schuldt, Ivan Laptev and Barbara Caputo; in Proc. ICPR'04, Cambridge, UK.
Layout of coefficients in an 8×8 DCT block (DC at top-left):
DC AC01 AC02 AC03 AC04 AC05 AC06 AC07
AC10 AC11 AC12 AC13 AC14 AC15 AC16 AC17
AC20 AC21 AC22 AC23 AC24 AC25 AC26 AC27
AC30 AC31 AC32 AC33 AC34 AC35 AC36 AC37
AC40 AC41 AC42 AC43 AC44 AC45 AC46 AC47
AC50 AC51 AC52 AC53 AC54 AC55 AC56 AC57
AC60 AC61 AC62 AC63 AC64 AC65 AC66 AC67
AC70 AC71 AC72 AC73 AC74 AC75 AC76 AC77
• We performed salient region segmentation using the MCMC segmentation method
proposed by Barbu et al. [Barbu2012] for crowd counting. The main purpose of
our experiment is to estimate the crowd in a particular video frame and to
calculate the rate at which the crowd count changes across consecutive frames.
Although CCTV cameras are now very common for video surveillance, very few
algorithms are available for real-time automated crowd counting. It is important
to note that our focus is on the rate of change of the crowd count rather than
the actual crowd count of every frame. A sudden increase or decrease in the
crowd count can act as a warning sign of an unusual activity such as an
explosion, a fight, or some other emergency. For our experiment, we calculate
the standard deviation of the crowd count over consecutive video frames for
every 10 seconds as a risk indicator. We train our algorithm on 2000 video
frames from the Mall Dataset [Loy2013] to set the threshold on the standard
deviation below which the rate of change of the crowd count is 'safe'. We
further test our algorithm on a few videos from the Pedestrian Traffic Database.
The figure shows our result on the Mall Dataset for crowd counting.
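A minimal sketch of this risk indicator, assuming per-frame crowd counts are already available: it computes the standard deviation of the count in each 10-second window and flags windows above a trained threshold. The frame rate, window length, and threshold value below are illustrative, not the values learned from the Mall Dataset.

```python
import statistics

def risky_windows(counts, fps=25, window_sec=10, threshold=5.0):
    """Flag windows whose crowd-count standard deviation exceeds
    a threshold learned from training footage.

    fps, window_sec, and threshold are illustrative assumptions.
    """
    win = fps * window_sec  # frames per window
    flags = []
    for start in range(0, len(counts) - win + 1, win):
        sd = statistics.pstdev(counts[start:start + win])
        flags.append(sd > threshold)
    return flags
```

A stable count produces no flags, while a sudden jump within a window raises one.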
• Figure: Crowd counting result on frames of the Mall Dataset. (a) the original video frames, (b) our saliency detection results, and (c) the segmentation based on the MCMC method.
Comparison of final saliency maps: CIOFM, MRS, SURPRISE, CA, and Our Model.
Thank You
