Malicious Activity Prediction in Public Surveillance
using Real Time Video Acquisition
Abhilash Dhondalkar(11EC07), Arjun A(11EC14),
M Ranga Sai Shreyas(11EC42), Tawfeeq Ahmad(11EC103)
Project Guide: Prof M S Bhat
Dept of E & C Engg
July 2014 - May 2015
Work Outline
Algorithm
Acquire a recorded video using USB
Recognise malicious objects using HOG features
If a malicious object is recognised, describe the scene/environment using visual-semantic alignments on images; after the scene description, set Prediction = 1
If Prediction = 1, perform super resolution on a set of 10 neighbouring frames to improve image clarity for facial recognition
Perform facial recognition and fetch parameters from the database, such as UID, registered weapons and past criminal records and history
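The control flow of these steps can be sketched as below. This is only an outline under stated assumptions: the `Stages` container, its four callables and the `neighbouring_frames` helper are hypothetical names standing in for the real detector, captioner, super-resolution and face-recognition stages, none of which are shown in the slides.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Frame = List[List[int]]  # assumption: a grayscale frame as rows of pixel values


@dataclass
class Stages:
    """Hypothetical callables standing in for the real pipeline stages."""
    detect_object: Callable[[Frame], bool]
    describe_scene: Callable[[Frame], str]
    super_resolve: Callable[[List[Frame]], Frame]
    recognise_face: Callable[[Frame], Optional[dict]]


def neighbouring_frames(frames, index, count=10):
    """Return up to `count` frames around `index`, clipped to the video."""
    start = max(0, min(index - count // 2, len(frames) - count))
    return frames[start:start + count]


def run_pipeline(frames, stages):
    """Apply the outlined steps to the first frame with a detection."""
    for i, frame in enumerate(frames):
        if not stages.detect_object(frame):
            continue
        description = stages.describe_scene(frame)
        prediction = 1  # set after the scene description, as in the outline
        sharp = stages.super_resolve(neighbouring_frames(frames, i))
        record = stages.recognise_face(sharp)
        return {"frame": i, "prediction": prediction,
                "description": description, "record": record}
    return {"frame": None, "prediction": 0, "description": None, "record": None}
```

Passing the stages in as callables keeps the sketch independent of any particular library; each stub can be swapped for a real implementation without touching the loop.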
Malicious Object Recognition using HOG Features
Gradient Computation (Filtering using Sobel masks)
Orientation Binning (Creating cell histograms)
Descriptor Blocks (Rectangular and Circular HOG Blocks)
Block Normalization (L1, L2 norm; L2-hys; L1-sqrt)
SVM Classifier (Hyperplane as decision function)
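A minimal pure-Python sketch of the stages listed above, for a single cell: Sobel gradients, a magnitude-weighted orientation histogram, L2 normalisation and a linear SVM decision. The 9 unsigned bins, the `eps` value and all function names are assumptions for illustration; a full descriptor would also group cells into overlapping blocks before normalising.

```python
import math


def sobel_gradients(img):
    """Return (gx, gy) from 3x3 Sobel masks; border pixels are left at zero."""
    h, w = len(img), len(img[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx[y][x] = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy[y][x] = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return gx, gy


def cell_histogram(gx, gy, bins=9):
    """Magnitude-weighted histogram of unsigned orientations (0-180 degrees)."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for row_x, row_y in zip(gx, gy):
        for dx, dy in zip(row_x, row_y):
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 180.0
            hist[min(int(angle // width), bins - 1)] += magnitude
    return hist


def l2_normalise(vec, eps=1e-6):
    """L2 normalisation: v / sqrt(||v||^2 + eps^2)."""
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]


def svm_decision(weights, bias, descriptor):
    """Linear SVM decision function: the sign of w . x + b."""
    score = sum(w * x for w, x in zip(weights, descriptor)) + bias
    return 1 if score >= 0 else -1
```

For an image with a single vertical edge, all gradient energy falls into the first orientation bin, which is a quick sanity check on the binning.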
Results
Deep Visual Semantic Description of Images
Malicious activity detection by describing a crowded scene using the multi-modal approach
Extract CNN features to recognise objects - 16-layer neural network from VGG
RNN to predict sentence equivalents for the extracted CNN features; trained using BPTT (backpropagation through time)
Models and checkpoints used - MS COCO and Flickr8k
Visualization of result - sentence equivalent to the image viewable in an HTML pop-up; accuracy of result evaluated using BLEU score
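To make the BLEU evaluation concrete, here is a small sentence-level sketch: clipped n-gram precision combined by a geometric mean and scaled by a brevity penalty. Using only unigrams and bigrams (`max_n=2`) and no smoothing is an assumption made to keep the example short; it is not necessarily the configuration used for the reported results.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram count is capped by
    its maximum count in any single reference."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram])
                  for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=2):
    """Sentence-level BLEU with uniform weights and a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Closest reference length, ties broken towards the shorter reference.
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_mean)
```

A generated caption identical to a reference scores 1.0; dropping a word lowers both the bigram precision and the brevity penalty.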
Deep Visual Semantic Description of Images - Technical Details
CNN feature extraction - load a batch of 10 images; pre-process each of them; reorder each image to the standard Caffe input dimension order
Predict using caffe_classifier and return the transposed result
Training to create a checkpoint model - initialize a set of variables, structures and the model for the generator class; fetch the data provider
Go over the training sentences and find the vocabulary to use; load checkpoints; initialize the solver, cost function and structures for JSON work status
Mathematical model for prediction - fetch a batch of test images, evaluate cost and gradient; perform parameter update
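The vocabulary step can be illustrated with a short sketch: count words across the training sentences, keep those above a frequency threshold, and map each kept word to an index. The threshold `min_count`, the `END_TOKEN` marker and the function names are assumptions for illustration, not values taken from the project.

```python
from collections import Counter

END_TOKEN = "<end>"  # hypothetical marker for the end of a sentence


def build_vocabulary(sentences, min_count=2):
    """Keep words seen at least `min_count` times; index 0 is the end token."""
    counts = Counter(w for s in sentences for w in s.lower().split())
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    index_to_word = [END_TOKEN] + kept
    word_to_index = {w: i for i, w in enumerate(index_to_word)}
    return word_to_index, index_to_word


def encode(sentence, word_to_index):
    """Map a sentence to indices, skipping out-of-vocabulary words and
    closing with the end token."""
    ids = [word_to_index[w] for w in sentence.lower().split()
           if w in word_to_index]
    return ids + [word_to_index[END_TOKEN]]
```

Dropping rare words keeps the output layer of the sentence generator small, at the cost of never producing those words in a description.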
Deep Visual Semantic Description of Images - Technical Details
Sentence description model using the multi-modal approach
Load checkpoints and the MS COCO model
Initialize output blobs to dump to JSON for visualization
Load the tasks file and the features for all images, stored as a 4096 x N NumPy array
Iterate over all images and predict
Results for Image Descriptions
Results for Image Descriptions - Contd.
Super Resolution - Technical Details and Motion Estimators
Improvement in resolution and image clarity by increasing the frequency content of image sequences
Mathematical models - forward (DHS) model and inverse model (least squares solution and the need for a criterion function)
Choosing the optimal regularization parameter in the inverse model for the SR image
Motion estimation - integer pixel displacement
Combinatorial motion estimation - minimizing the discrepancy between real and synthetic images
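Integer-pixel motion estimation can be sketched as an exhaustive search: shift a reference frame by every candidate displacement to synthesise a frame, then keep the displacement whose synthetic frame differs least from the observed one. The sum-of-squared-differences measure, the search range `max_shift` and the function names are assumptions; this brute-force search is a stand-in and not necessarily the combinatorial estimator used in the project.

```python
def shift(img, dy, dx, fill=0):
    """Synthesise a frame by moving `img` down by dy and right by dx pixels."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def discrepancy(a, b, margin):
    """Sum of squared differences, ignoring a border of `margin` pixels."""
    h, w = len(a), len(a[0])
    return sum((a[y][x] - b[y][x]) ** 2
               for y in range(margin, h - margin)
               for x in range(margin, w - margin))


def estimate_displacement(reference, observed, max_shift=2):
    """Try every integer shift and keep the one whose synthetic frame is
    closest to the observed frame."""
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cost = discrepancy(shift(reference, dy, dx), observed, max_shift)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]
```

The border is excluded from the comparison because shifted frames have no valid data there; the cost grows quadratically with `max_shift`, which is one reason such searches are slow on full-size video frames.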
Results for Super Resolution - Forward Model
Results for Super Resolution - Inverse Model
Results for Motion Estimator
Face Detection and Recognition
Face detection using the classic Viola-Jones algorithm
Computation of features similar to Haar basis functions; learning function - AdaBoost followed by cascading
Face recognition using PCA on Eigenfaces
Creation of the face space and calculation of eigenvalues
Training of Eigenfaces
Recognition using test and trained features - minimal Euclidean distance
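The Haar-like features in Viola-Jones are cheap because of the integral image: once it is built, the sum over any rectangle takes four lookups. The sketch below shows that idea with one horizontal two-rectangle feature; the function names and the choice of feature are illustrative assumptions, and AdaBoost training and the cascade are not shown.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle from four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))
```

A strongly negative or positive response indicates a vertical intensity edge inside the window, which is the kind of cue the boosted classifier combines.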
Results for Face Recognition
Final Results
Drawbacks
Processing could not be done in real time
Too many false positives during malicious object recognition
Generating image descriptions and extracting CNN features were found to be slow; a GPU is needed to accelerate them
Face recognition using the Eigenfaces approach is not very accurate, but it is the best-suited approach
References
Histograms of Oriented Gradients for Human Detection, N. Dalal and B. Triggs, CVPR, 2005
Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy and Li Fei-Fei, Stanford University
Super-resolution of Image Sequences, Krokhin, Northeastern University, Boston, Massachusetts
Robust Real-Time Face Detection, Paul Viola and Michael J. Jones, IJCV, 2004

More Related Content

PDF
Image Object Detection Pipeline
DOCX
Fr app e detecting malicious facebook applications
DOCX
DETECTING MALICIOUS FACEBOOK APPLICATIONS - IEEE PROJECTS IN PONDICHERRY,BUL...
DOCX
Fr app e detecting malicious facebook applications
PPTX
FRAppE Detecting Malicious Facebook Applications
PDF
Identification and Analysis of Malicious Content on Facebook: A Survey
PPTX
Android security
DOCX
Detecting malicious facebook applications
Image Object Detection Pipeline
Fr app e detecting malicious facebook applications
DETECTING MALICIOUS FACEBOOK APPLICATIONS - IEEE PROJECTS IN PONDICHERRY,BUL...
Fr app e detecting malicious facebook applications
FRAppE Detecting Malicious Facebook Applications
Identification and Analysis of Malicious Content on Facebook: A Survey
Android security
Detecting malicious facebook applications

Viewers also liked (14)

PDF
Frappe ERPNext Open Day February 2014
PDF
Facebook Attacks - an in-depth analysis
PPTX
Automatic test packet generation
PDF
ATPG Methods and Algorithms
PPTX
Motor Control Centre
PDF
101: Convolutional Neural Networks
PDF
Convolution Neural Networks
PDF
Word Embeddings - Introduction
PDF
Convolutional Neural Networks (CNN)
PPTX
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
PPTX
IEEE Presentation
PPT
IEEE Standards
PDF
Deep Learning - Convolutional Neural Networks
PPT
Slideshare Powerpoint presentation
Frappe ERPNext Open Day February 2014
Facebook Attacks - an in-depth analysis
Automatic test packet generation
ATPG Methods and Algorithms
Motor Control Centre
101: Convolutional Neural Networks
Convolution Neural Networks
Word Embeddings - Introduction
Convolutional Neural Networks (CNN)
Lecture 29 Convolutional Neural Networks - Computer Vision Spring2015
IEEE Presentation
IEEE Standards
Deep Learning - Convolutional Neural Networks
Slideshare Powerpoint presentation
Ad

Similar to Final PPT (20)

PPT
Automated Face Detection System
PPTX
Drone ppt
PPTX
Vision based system for monitoring the loss of attention in automotive driver
PDF
Smart Face Recognition System Analysis
PDF
Improving the Perturbation-Based Explanation of Deepfake Detectors Through th...
PPT
JPEG XR objective and subjective evaluations
PDF
OpenCV Introduction
PDF
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
PDF
Learning from Computer Simulation to Tackle Real-World Problems
PPTX
Anomaly Detection with Azure and .NET
PPTX
Object detection
PDF
Iaetsd multi-view and multi band face recognition
PDF
Mirko Lucchese - Deep Image Processing
PDF
Dataset creation for Deep Learning-based Geometric Computer Vision problems
PPTX
Scene recognition using Convolutional Neural Network
PPTX
ppt 20BET1024.pptx
PPTX
cvpresentation-190812154654 (1).pptx
PPTX
ObjectDetection.pptx
PPTX
Ai use cases
PDF
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Automated Face Detection System
Drone ppt
Vision based system for monitoring the loss of attention in automotive driver
Smart Face Recognition System Analysis
Improving the Perturbation-Based Explanation of Deepfake Detectors Through th...
JPEG XR objective and subjective evaluations
OpenCV Introduction
SkyStitch: a Cooperative Multi-UAV-based Real-time Video Surveillance System ...
Learning from Computer Simulation to Tackle Real-World Problems
Anomaly Detection with Azure and .NET
Object detection
Iaetsd multi-view and multi band face recognition
Mirko Lucchese - Deep Image Processing
Dataset creation for Deep Learning-based Geometric Computer Vision problems
Scene recognition using Convolutional Neural Network
ppt 20BET1024.pptx
cvpresentation-190812154654 (1).pptx
ObjectDetection.pptx
Ai use cases
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif...
Ad

Final PPT

  • 1. Malicious Activity Prediction in Public Surveillance using Real Time Video Acquisition Abhilash Dhondalkar(11EC07), Arjun A(11EC14), M Ranga Sai Shreyas(11EC42), Tawfeeq Ahmad(11EC103) Project Guide: Prof M S Bhat Dept of E & C Engg July 2014 - May 2015
  • 3. Algorithm Acquire a recorded video using USB Malicious Object Recognition using HOG Features If malicious object recognised, perform scene/ environment description using visual-semantic alignments on images; After scene description, set Prediction = 1 If Prediction = 1, perform super resolution on a set of 10 neighbouring frames to improve image clarity for facial recognition Perform Facial Recognition and get parameters from Database such as UID, registered weapons and past criminal records and history
  • 4. Algorithm Acquire a recorded video using USB Malicious Object Recognition using HOG Features If malicious object recognised, perform scene/ environment description using visual-semantic alignments on images; After scene description, set Prediction = 1 If Prediction = 1, perform super resolution on a set of 10 neighbouring frames to improve image clarity for facial recognition Perform Facial Recognition and get parameters from Database such as UID, registered weapons and past criminal records and history
  • 5. Algorithm Acquire a recorded video using USB Malicious Object Recognition using HOG Features If malicious object recognised, perform scene/ environment description using visual-semantic alignments on images; After scene description, set Prediction = 1 If Prediction = 1, perform super resolution on a set of 10 neighbouring frames to improve image clarity for facial recognition Perform Facial Recognition and get parameters from Database such as UID, registered weapons and past criminal records and history
  • 6. Algorithm Acquire a recorded video using USB Malicious Object Recognition using HOG Features If malicious object recognised, perform scene/ environment description using visual-semantic alignments on images; After scene description, set Prediction = 1 If Prediction = 1, perform super resolution on a set of 10 neighbouring frames to improve image clarity for facial recognition Perform Facial Recognition and get parameters from Database such as UID, registered weapons and past criminal records and history
  • 7. Algorithm Acquire a recorded video using USB Malicious Object Recognition using HOG Features If malicious object recognised, perform scene/ environment description using visual-semantic alignments on images; After scene description, set Prediction = 1 If Prediction = 1, perform super resolution on a set of 10 neighbouring frames to improve image clarity for facial recognition Perform Facial Recognition and get parameters from Database such as UID, registered weapons and past criminal records and history
  • 8. Malicious Object Recognition using HOG Features Gradient Computation (Filtering using Sobel masks) Orientation Binning (Creating cell histograms) Descriptor Blocks (Rectangular and Circular HOG Blocks) Block Normalization (L1, L2 norm; L2-hys; L1-sqrt) SVM Classifier (Hyper plane as decision function)
  • 9. Malicious Object Recognition using HOG Features Gradient Computation (Filtering using Sobel masks) Orientation Binning (Creating cell histograms) Descriptor Blocks (Rectangular and Circular HOG Blocks) Block Normalization (L1, L2 norm; L2-hys; L1-sqrt) SVM Classifier (Hyper plane as decision function)
  • 10. Malicious Object Recognition using HOG Features Gradient Computation (Filtering using Sobel masks) Orientation Binning (Creating cell histograms) Descriptor Blocks (Rectangular and Circular HOG Blocks) Block Normalization (L1, L2 norm; L2-hys; L1-sqrt) SVM Classifier (Hyper plane as decision function)
  • 11. Malicious Object Recognition using HOG Features Gradient Computation (Filtering using Sobel masks) Orientation Binning (Creating cell histograms) Descriptor Blocks (Rectangular and Circular HOG Blocks) Block Normalization (L1, L2 norm; L2-hys; L1-sqrt) SVM Classifier (Hyper plane as decision function)
  • 12. Malicious Object Recognition using HOG Features Gradient Computation (Filtering using Sobel masks) Orientation Binning (Creating cell histograms) Descriptor Blocks (Rectangular and Circular HOG Blocks) Block Normalization (L1, L2 norm; L2-hys; L1-sqrt) SVM Classifier (Hyper plane as decision function)
  • 15. Deep Visual Semantic Description of Images Malicious Activity Detection by describing a crowded scene using the multi-modal approach Extract CNN Features to recognise objects - 16 layer NeuralNetfrom VGG RNN to predict sentence equivalents for extracted CNN features; trained using BPTT Models and checkpoints used - MSCOCo and Flickr8k Visualization of result - Sentence equivalent to image viewable on HTML pop-up; accuracy of result - evaluated using BLEU score
  • 16. Deep Visual Semantic Description of Images Malicious Activity Detection by describing a crowded scene using the multi-modal approach Extract CNN Features to recognise objects - 16 layer NeuralNetfrom VGG RNN to predict sentence equivalents for extracted CNN features; trained using BPTT Models and checkpoints used - MSCOCo and Flickr8k Visualization of result - Sentence equivalent to image viewable on HTML pop-up; accuracy of result - evaluated using BLEU score
  • 17. Deep Visual Semantic Description of Images Malicious Activity Detection by describing a crowded scene using the multi-modal approach Extract CNN Features to recognise objects - 16 layer NeuralNetfrom VGG RNN to predict sentence equivalents for extracted CNN features; trained using BPTT Models and checkpoints used - MSCOCo and Flickr8k Visualization of result - Sentence equivalent to image viewable on HTML pop-up; accuracy of result - evaluated using BLEU score
  • 18. Deep Visual Semantic Description of Images Malicious Activity Detection by describing a crowded scene using the multi-modal approach Extract CNN Features to recognise objects - 16 layer NeuralNetfrom VGG RNN to predict sentence equivalents for extracted CNN features; trained using BPTT Models and checkpoints used - MSCOCo and Flickr8k Visualization of result - Sentence equivalent to image viewable on HTML pop-up; accuracy of result - evaluated using BLEU score
  • 19. Deep Visual Semantic Description of Images Malicious Activity Detection by describing a crowded scene using the multi-modal approach Extract CNN Features to recognise objects - 16 layer NeuralNetfrom VGG RNN to predict sentence equivalents for extracted CNN features; trained using BPTT Models and checkpoints used - MSCOCo and Flickr8k Visualization of result - Sentence equivalent to image viewable on HTML pop-up; accuracy of result - evaluated using BLEU score
  • 20. Deep Visual Semantic Description of Images - Technical Details CNN Feature Extraction - Load a batch of 10 images; pre-process each of them; reorder the image to standard caffe input dimension order Predict using caffe_classifier and return transposed result Training to create a checkpoint model - Initialize a set of variables, structures and the model for generator class; fetch the data provider Go over training sentences and find vocabulary to use; Load checkpoints; initialize solver, cost function and structures for JSON work status Mathematical Model for prediction - fetch a batch of test images, evaluate cost and gradient; perform parameter update
  • 21. Deep Visual Semantic Description of Images - Technical Details CNN Feature Extraction - Load a batch of 10 images; pre-process each of them; reorder the image to standard caffe input dimension order Predict using caffe_classifier and return transposed result Training to create a checkpoint model - Initialize a set of variables, structures and the model for generator class; fetch the data provider Go over training sentences and find vocabulary to use; Load checkpoints; initialize solver, cost function and structures for JSON work status Mathematical Model for prediction - fetch a batch of test images, evaluate cost and gradient; perform parameter update
  • 22. Deep Visual Semantic Description of Images - Technical Details CNN Feature Extraction - Load a batch of 10 images; pre-process each of them; reorder the image to standard caffe input dimension order Predict using caffe_classifier and return transposed result Training to create a checkpoint model - Initialize a set of variables, structures and the model for generator class; fetch the data provider Go over training sentences and find vocabulary to use; Load checkpoints; initialize solver, cost function and structures for JSON work status Mathematical Model for prediction - fetch a batch of test images, evaluate cost and gradient; perform parameter update
  • 23. Deep Visual Semantic Description of Images - Technical Details CNN Feature Extraction - Load a batch of 10 images; pre-process each of them; reorder the image to standard caffe input dimension order Predict using caffe_classifier and return transposed result Training to create a checkpoint model - Initialize a set of variables, structures and the model for generator class; fetch the data provider Go over training sentences and find vocabulary to use; Load checkpoints; initialize solver, cost function and structures for JSON work status Mathematical Model for prediction - fetch a batch of test images, evaluate cost and gradient; perform parameter update
  • 24. Deep Visual Semantic Description of Images - Technical Details CNN Feature Extraction - Load a batch of 10 images; pre-process each of them; reorder the image to standard caffe input dimension order Predict using caffe_classifier and return transposed result Training to create a checkpoint model - Initialize a set of variables, structures and the model for generator class; fetch the data provider Go over training sentences and find vocabulary to use; Load checkpoints; initialize solver, cost function and structures for JSON work status Mathematical Model for prediction - fetch a batch of test images, evaluate cost and gradient; perform parameter update
  • 25. Deep Visual Semantic Description of Images - Technical Details. Sentence description model using a multi-modal approach: load checkpoints and the MS COCO model; initialize output blobs to dump to JSON for visualization; load the tasks file and the features for all images (a 4096 x N NumPy array of features); iterate over all images and predict.
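The prediction loop described above might look roughly like the following sketch. It assumes a 4096 x N feature array with one column per image. The function names, the JSON keys and the placeholder predictor are all hypothetical, and the real sentence generator is not reproduced here.

```python
import json
import numpy as np

def describe_all(features, image_names, predict_fn):
    """Run predict_fn on each column of a 4096 x N feature array.

    Returns a dict that can be dumped to JSON for visualization.
    """
    if features.shape[1] != len(image_names):
        raise ValueError("need one feature column per image")
    blob = {"images": []}
    for i, name in enumerate(image_names):
        feature = features[:, i]            # 4096-dimensional CNN feature
        sentence = predict_fn(feature)      # hypothetical sentence generator
        blob["images"].append({"image": name, "description": sentence})
    return blob

def placeholder_predict(feature):
    # Stand-in for the trained model; it only reports the feature norm.
    return "feature norm %.1f" % np.linalg.norm(feature)

if __name__ == "__main__":
    feats = np.ones((4096, 2))
    result = describe_all(feats, ["a.jpg", "b.jpg"], placeholder_predict)
    print(json.dumps(result, indent=2))
```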
  • 30. Results for Image Descriptions
  • 31. Results for Image Descriptions - Contd.
  • 32. Results for Image Descriptions
  • 33. Results for Image Descriptions
  • 34. Super Resolution - Technical Details and Motion Estimators. Improvement in resolution and image clarity by increasing the frequency content in image sequences. Mathematical models: forward (DHS) model and inverse model (least-squares solution and the need for a criterion function). Choosing the optimal regularization parameter in the inverse model for the SR image. Motion estimation: integer pixel displacement; combinatorial motion estimation, which minimizes the discrepancy between real and synthetic images.
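A one-dimensional sketch of the forward and inverse models may help. It reads "DHS" as decimation (D), blur (H) and shift (S) applied to the high-resolution signal, which is an interpretation of the slide. The inverse step uses a plain Tikhonov penalty as the criterion function, which is an assumption: the deck does not say which regularizer was used. All function names are illustrative.

```python
import numpy as np

def shift_matrix(n, k):
    """Circular shift: shift_matrix(n, k) @ x equals np.roll(x, k)."""
    return np.roll(np.eye(n), k, axis=0)

def blur_matrix(n):
    """Circular blur with the kernel [0.25, 0.5, 0.25]."""
    eye = np.eye(n)
    return 0.5 * eye + 0.25 * np.roll(eye, 1, axis=1) + 0.25 * np.roll(eye, -1, axis=1)

def decimation_matrix(n, factor):
    """Keep every factor-th sample."""
    return np.eye(n)[::factor]

def forward_model(x, shifts, factor):
    """Simulate low-resolution frames y_k = D H S_k x for each shift k."""
    n = x.size
    D, H = decimation_matrix(n, factor), blur_matrix(n)
    mats = [D @ H @ shift_matrix(n, k) for k in shifts]
    frames = [A @ x for A in mats]
    return mats, frames

def inverse_model(mats, frames, lam):
    """Regularized least squares: minimize ||A x - y||^2 + lam * ||x||^2."""
    A = np.vstack(mats)
    y = np.concatenate(frames)
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
```

The regularization parameter lam trades fidelity against stability: the blur above removes the highest frequency entirely, so without the penalty the normal equations are singular.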
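Integer pixel displacement can be illustrated with an exhaustive search: shift a reference frame by every candidate offset to form a synthetic image, and keep the offset whose synthetic image differs least from the real frame. This is a minimal sketch under assumptions (circular shifts, sum of squared differences as the discrepancy, a hypothetical function name); it is not the estimator used in the project.

```python
import numpy as np

def estimate_integer_shift(reference, frame, max_shift=3):
    """Return the (dy, dx) shift of `reference` that best matches `frame`."""
    reference = reference.astype(np.float64)
    frame = frame.astype(np.float64)
    best_shift, best_error = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Synthetic image: the reference moved by the candidate offset.
            synthetic = np.roll(reference, (dy, dx), axis=(0, 1))
            error = np.sum((frame - synthetic) ** 2)
            if error < best_error:
                best_shift, best_error = (dy, dx), error
    return best_shift
```

The search costs (2 * max_shift + 1)^2 image comparisons per frame, which is one reason such estimators are usually restricted to small displacements.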
  • 39. Results for Super Resolution - Forward Model
  • 40. Results for Super Resolution - Inverse Model
  • 41. Results for Motion Estimator
  • 42. Face Detection and Recognition. Face detection using the classic Viola-Jones algorithm: computation of features similar to Haar basis functions; learning function: AdaBoost followed by cascading. Face recognition using PCA on Eigenfaces: creation of the face space and calculation of eigenvalues; training of Eigenfaces; recognition using test and trained features (minimal Euclidean distance).
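The Haar-like features can be computed in constant time per rectangle with an integral image, as in the Viola-Jones paper. The sketch below shows only that step, for one two-rectangle feature. The function names are illustrative, and the AdaBoost training and the cascade are not covered.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums with a leading zero row and column."""
    ii = np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))  # zero padding avoids edge cases

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of the window (a vertical-edge feature)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```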
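The Eigenface steps listed above (face space, eigenvalues, training, nearest match by Euclidean distance) can be sketched with NumPy. This is an illustration under assumptions: faces are already cropped, equal-sized and flattened to rows, the PCA is done through an SVD, and the function names are hypothetical.

```python
import numpy as np

def train_eigenfaces(faces, num_components):
    """faces: M x P array with one flattened face per row."""
    mean_face = faces.mean(axis=0)
    centred = faces - mean_face
    # Rows of vt are the principal directions (the eigenfaces).
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    eigenvalues = singular_values ** 2 / (len(faces) - 1)
    eigenfaces = vt[:num_components]
    weights = centred @ eigenfaces.T  # training faces projected into face space
    return mean_face, eigenfaces, eigenvalues[:num_components], weights

def recognise(face, mean_face, eigenfaces, weights, labels):
    """Return the label of the nearest training face and its distance."""
    w = (face - mean_face) @ eigenfaces.T
    distances = np.linalg.norm(weights - w, axis=1)  # Euclidean distance
    idx = int(np.argmin(distances))
    return labels[idx], float(distances[idx])
```

In practice a distance threshold is usually added so that faces far from every training sample are reported as unknown rather than matched to the nearest identity.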
  • 48. Results for Face Recognition
  • 51. Drawbacks. Processing could not be done in real time. Malicious object recognition produced too many false positives. Generating image descriptions and extracting CNN features were slow, so a GPU is needed to accelerate them. Face recognition with the Eigenface approach is not very accurate, but it was the best-suited option.
  • 55. References. N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection", CVPR, 2005. A. Karpathy and L. Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", Stanford University. Krokhin, "Super-resolution of Image Sequences", Northeastern University, Boston, Massachusetts. P. Viola and M. J. Jones, "Robust Real-Time Face Detection", IJCV, 2004.