Final PPT

Malicious Activity Prediction in Public Surveillance
using Real Time Video Acquisition
Abhilash Dhondalkar(11EC07), Arjun A(11EC14),
M Ranga Sai Shreyas(11EC42), Tawfeeq Ahmad(11EC103)
Project Guide: Prof M S Bhat
Dept of E & C Engg
July 2014 - May 2015

Algorithm
Acquire a recorded video using USB
Malicious Object Recognition using HOG Features
If a malicious object is recognised, perform scene/environment
description using visual-semantic alignments on images; after
scene description, set Prediction = 1
If Prediction = 1, perform super resolution on a set of 10
neighbouring frames to improve image clarity for facial
recognition
Perform Facial Recognition and get parameters from the database,
such as UID, registered weapons and past criminal records
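
The control flow of the steps above can be sketched as follows. This is only an illustrative outline: the four stage functions (detect_object, describe_scene, super_resolve, recognise_face) and the PipelineResult class are hypothetical placeholders passed in by the caller, not the project's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PipelineResult:
    prediction: int = 0                 # 1 once a malicious object has been described
    description: Optional[str] = None   # sentence produced by the scene-description stage
    record: Optional[dict] = None       # database parameters returned by face recognition


def run_pipeline(frames, detect_object, describe_scene,
                 super_resolve, recognise_face, window=10):
    """Run the stages in order on the first frame that triggers a detection.

    All four stage callables are hypothetical stand-ins for the real modules.
    """
    for i, frame in enumerate(frames):
        if not detect_object(frame):
            continue
        result = PipelineResult()
        result.description = describe_scene(frame)
        result.prediction = 1
        # Take `window` neighbouring frames around the detection,
        # clipped so the slice stays inside the video.
        start = max(0, min(i - window // 2, len(frames) - window))
        neighbours = frames[start:start + window]
        sharp = super_resolve(neighbours)
        result.record = recognise_face(sharp)
        return result
    return PipelineResult()  # nothing detected: prediction stays 0
```

Passing the stages in as callables keeps the sketch independent of any particular detector or database.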

Gradient Computation (Filtering using Sobel masks)
Orientation Binning (Creating cell histograms)
Descriptor Blocks (Rectangular and Circular HOG Blocks)
Block Normalization (L1, L2 norm; L2-hys; L1-sqrt)
SVM Classifier (Hyperplane as decision function)
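
A minimal NumPy sketch of the first four stages listed above (Sobel gradients, cell histograms, rectangular blocks, L2-hys normalization). The cell size of 8 pixels, 9 unsigned orientation bins, 2 x 2 cell blocks and clipping value 0.2 are assumed defaults taken from common HOG practice, not values confirmed by the slides; the function names are illustrative. The SVM stage is left out.

```python
import numpy as np


def sobel_gradients(img):
    """Horizontal and vertical gradients using 3x3 Sobel masks (edge-padded)."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) \
        - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return gx, gy


def cell_histograms(img, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned orientations for each cell."""
    gx, gy = sobel_gradients(img)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # unsigned, in [0, 180)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(idx[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=bins)
    return hist


def l2_hys(block, clip=0.2, eps=1e-6):
    """L2-normalise, clip large values, then L2-normalise again."""
    v = block.ravel().astype(float)
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)
    v = np.minimum(v, clip)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)


def hog_descriptor(img, cell=8, bins=9, block=2):
    """Concatenate normalised, overlapping rectangular blocks of cells."""
    hist = cell_histograms(img, cell, bins)
    rows, cols, _ = hist.shape
    feats = [l2_hys(hist[r:r + block, c:c + block])
             for r in range(rows - block + 1)
             for c in range(cols - block + 1)]
    return np.concatenate(feats)
```

The resulting vector is what a linear SVM would take as input; votes here go to the nearest bin only, without the interpolation between bins that fuller implementations use.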

Deep Visual Semantic Description of Images
Malicious Activity Detection by describing a crowded
scene using the multi-modal approach
Extract CNN Features to recognise objects - 16-layer
neural network from VGG
RNN to predict sentence equivalents for extracted CNN
features; trained using BPTT
Models and checkpoints used - MS COCO and Flickr8k
Visualization of result - Sentence equivalent to image viewable
on HTML pop-up; accuracy of result - evaluated using BLEU
score
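
As a rough illustration of the BLEU evaluation mentioned above, the sketch below scores one generated sentence against a single reference sentence using clipped n-gram precision and a brevity penalty. It assumes whitespace tokenisation, one reference, and no smoothing (any n-gram order with no match gives a score of 0); the project's actual evaluation script may differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(c_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, r_counts[g]) for g, count in c_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalise candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A lower max_n is the usual workaround for very short captions, where 4-gram matches are rare.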

Deep Visual Semantic Description of Images - Technical
Details
CNN Feature Extraction - Load a batch of 10 images;
pre-process each of them; reorder each image to the standard
Caffe input dimension order
Predict using caffe_classifier and return transposed result
Training to create a checkpoint model - Initialize a set of
variables, structures and the model for generator class; fetch
the data provider
Go over training sentences and find vocabulary to use; Load
checkpoints; initialize solver, cost function and structures for
JSON work status
Mathematical Model for prediction - fetch a batch of test
images, evaluate cost and gradient; perform parameter update
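
The reordering step in the CNN feature extraction can be sketched as below. Caffe networks take batches in N x C x H x W order, conventionally with BGR channels and a per-channel mean subtracted. The function name and the mean values are illustrative placeholders, and resizing or cropping to the network's input size is assumed to have been done already.

```python
import numpy as np


def to_caffe_batch(images, mean_bgr=(104.0, 117.0, 123.0)):
    """Convert equal-sized H x W x 3 RGB images to an N x 3 x H x W BGR batch.

    mean_bgr is a placeholder per-channel mean, not the project's value.
    """
    batch = np.stack([np.asarray(im, dtype=np.float32) for im in images])  # N,H,W,3
    batch = batch[..., ::-1]                                # RGB -> BGR
    batch = batch - np.asarray(mean_bgr, dtype=np.float32)  # subtract channel means
    return np.ascontiguousarray(batch.transpose(0, 3, 1, 2))  # N,3,H,W
```

For a batch of 10 images of size 224 x 224, the output has shape (10, 3, 224, 224).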

Deep Visual Semantic Description of Images - Technical
Details - Contd.
Sentence Description Model using Multi-Modal
Approach
Load checkpoints and the MS COCO model
Initialize output blobs to dump to JSON for visualization
Load the tasks file and features for all images containing a
4096 x N NumPy array of features
Iterate over all images and predict

Results for Image Descriptions

Results for Image Descriptions - Contd.

Super Resolution - Technical Details and Motion Estimators
Improvement in resolution and image clarity by
increasing the frequency contents in image sequences
Mathematical models - Forward (DHS) model and Inverse
model (least squares solution and need for Criterion function)
Choosing the optimal regularization parameter in inverse
model for SR image
Motion Estimation - Integer Pixel Displacement
Combinatorial Motion Estimation - minimizing the discrepancy
between real and synthetic images
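
Integer-pixel displacement estimation can be illustrated with an exhaustive search that picks the shift minimising the mean squared discrepancy between a reference frame and a shifted neighbour. This is a simplified sketch, not the estimator from the cited work: it assumes pure translation, uses a circular shift (np.roll) for convenience, and the function name and search range are illustrative.

```python
import numpy as np


def estimate_shift(reference, frame, max_shift=4):
    """Return (dy, dx) such that shifting `frame` by it best matches `reference`."""
    if max_shift < 1:
        raise ValueError("max_shift must be at least 1")
    ref = reference.astype(float)
    frm = frame.astype(float)
    m = max_shift
    best, best_err = (0, 0), np.inf
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            shifted = np.roll(frm, (dy, dx), axis=(0, 1))
            # Compare only the interior, where wrap-around cannot contaminate.
            err = np.mean((shifted[m:-m, m:-m] - ref[m:-m, m:-m]) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best
```

Once each of the neighbouring frames has a displacement relative to the reference, the frames can be registered onto a common grid before the inverse model is solved.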

Results for Super Resolution - Forward Model

Results for Super Resolution - Inverse Model

Face Detection and Recognition
Face Detection using the classic Viola-Jones algorithm
Computation of features similar to Haar basis functions;
Learning function - AdaBoost followed by Cascading
Face Recognition using PCA (Eigenfaces)
Creation of Face Space and calculation of eigenvalues
Training of Eigenfaces
Recognition using test and trained features - minimal
Euclidean distance
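
The recognition steps above (face space, eigenvalues, minimal Euclidean distance) can be sketched in NumPy as follows. It assumes the faces are already detected, cropped to a common size and flattened to rows of a matrix; the function names and the UID-style labels are illustrative. The number of components k should not exceed N - 1 for N training faces, since mean-centring leaves at most N - 1 non-zero eigenvalues.

```python
import numpy as np


def train_eigenfaces(faces, k):
    """faces: N x D matrix, one flattened face per row. Returns the face space."""
    mean = faces.mean(axis=0)
    A = faces - mean
    # Eigen-decompose the small N x N matrix A A^T rather than the
    # D x D covariance matrix; both share the same non-zero eigenvalues.
    vals, vecs = np.linalg.eigh(A @ A.T)
    order = np.argsort(vals)[::-1][:k]          # k largest eigenvalues
    eigenfaces = A.T @ vecs[:, order]           # D x k
    eigenfaces /= np.linalg.norm(eigenfaces, axis=0)
    weights = A @ eigenfaces                    # training faces in face space
    return mean, eigenfaces, weights


def recognise(face, mean, eigenfaces, weights, labels):
    """Project a face and return the label at minimal Euclidean distance."""
    w = (face - mean) @ eigenfaces
    d = np.linalg.norm(weights - w, axis=1)
    i = int(np.argmin(d))
    return labels[i], float(d[i])
```

The returned distance can be compared against a threshold to reject faces that are not in the database; the threshold would have to be tuned on real data.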

Drawbacks
Processing could not be done in real time
Too many false positives during malicious object recognition
Generating Image Descriptions and CNN feature extraction
were found to be slow processes; need for a GPU to accelerate
the process
Face Recognition using the Eigenface approach is not very
accurate, but it was the approach best suited to the project

References
Histograms of Oriented Gradients for Human Detection, N Dalal
and B Triggs, CVPR, 2005
Deep Visual-Semantic Alignments for Generating Image
Descriptions, Andrej Karpathy and Li Fei-Fei, Stanford
University
Super-resolution of Image Sequences, Krokhin, Northeastern
University, Boston, Massachusetts
Robust Real-Time Face Detection, Paul Viola and Michael J
Jones, IJCV, 2004
