GROUP MEMBERS
Jawad Sajid FA16-BCS-181
Mutahhar Ahmad FA16-BCS-122
Usman Sajid FA16-BCS-288
Introduction and Problem Statement
• Strive for Industry 4.0
• These perspectives are not just ideas!
• Current situation of Pakistani Industry
• What are the problems we focus on?
1. Govern Access
2. Monitor Activities in Real-time
3. Alert to Environmental Risks
Honeywell’s Video Analytics
Microsoft’s Amazing Works!
Amazon Works!
“Real-world Anomaly Detection in Surveillance
Videos” by Waqas Sultani, Chen Chen and
Mubarak Shah
“A Review of the Applications of Computer
Vision to Construction Safety” by Brian H.W
Guo, Yang Zou and Long Chen
On-site 3D Vision Tracking of Construction Personnel
by Francisco Cordova and Ioannis Brilakis
MobileNets: Efficient Convolutional Neural
Networks for Mobile Vision Applications by Google
Deep Face Recognition by University of Oxford
OpenPose: Realtime Multi-Person 2D Pose
Estimation using Part Affinity Fields
Simple Online and Realtime Tracking with a Deep
Association Metric by Queensland University
• Problem of Deploying the Application?
• Why two databases?
• Importance of API Layer
• System logins? Why?
• Policy-Oriented Structure
• DATABASE
• API DESIGN
• FACIAL RECOGNITION
• UNIFORM DETECTION
• POSE RECOGNITION
• UI
• INTEGRATION
• COMMUNICATION MODULE
Smart environment for industry 4.0
• Initial Database
• Not Normalized
• Repeated Information
• No Centralized Table
• Duty Roster Missing
• Normalized
• Duty Roster
• Centralized
Information
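A minimal sketch of what a normalized layout with a duty roster could look like, using Python's built-in sqlite3 in place of the MSSQL Server database the project uses. The table and column names (employee, duty_roster, shift, zone) are hypothetical; the slides do not show the actual schema.

```python
import sqlite3

# In-memory database for illustration only (the project itself uses MSSQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (            -- hypothetical centralized table
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE duty_roster (         -- hypothetical roster table
    id          INTEGER PRIMARY KEY,
    employee_id INTEGER NOT NULL REFERENCES employee(id),
    shift       TEXT NOT NULL,
    zone        TEXT NOT NULL
);
""")

# Employee details are stored once; roster rows only reference them by id.
conn.execute("INSERT INTO employee (id, name) VALUES (1, 'Worker A')")
conn.executemany(
    "INSERT INTO duty_roster (employee_id, shift, zone) VALUES (?, ?, ?)",
    [(1, "morning", "assembly"), (1, "evening", "storage")],
)


def roster_for(connection, employee_id):
    """Return (name, shift, zone) rows for one employee via a join."""
    return connection.execute(
        "SELECT e.name, r.shift, r.zone FROM duty_roster r "
        "JOIN employee e ON e.id = r.employee_id "
        "WHERE e.id = ? ORDER BY r.id",
        (employee_id,),
    ).fetchall()


rows = roster_for(conn, 1)
print(rows)
```

Because the name lives in a single row of the employee table, renaming a worker touches one record instead of every roster entry.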
API and
Database
• Apache Thrift Services
• Command: thrift --gen <language> <.thrift file>
• Reusable Interfaces
• Server based on C# with Thrift
• Client based on Python with Thrift
• TBinary Protocol Layer
• TSocket Transport Layer
• MSSQL Server DB
• Django Admin Panel
• Thrift Architecture
• API Interfaces
• C# Thrift Server
• Attaining an Admin Panel with Django and Database
Face Recognition
Face Recognition Pipeline
• Locate and Extract faces
• Identify Facial Features
• Represent Face As Measurement
• Compare Faces
Our Works
1- Face Recognition Using VGG-Face (Transfer Learning)
2- DLIB based Facial Recognition
VGG-FACE?
LABELED FACES IN THE WILD (LFW)
TRANSFER LEARNING?
OUR DATASET?
FACE DETECTION? MMOD FACE DETECTOR
FACE EMBEDDINGS?
REMOVING THE LAST ACTIVATION LAYER?
OUR CLASSIFIER?
Training Loss: 0.0421
Training Accuracy: 0.9962
Validation Loss: 0.1581
Validation Accuracy: 0.9423
Our Results!
Problems? Scope?
• Training takes a long time.
• Dataset too small.
• Did augmentation help? Results improved, but not enough to meet our requirements.
• Runtime training?
Functionalities
• DLIB Face Detector
• Face Locations
• CNN based
• HOG based*
• Face Encodings (128
measurements)
• Compare Faces
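The "Compare Faces" step can be sketched as a distance check between 128-value encodings. This is a minimal illustration in plain Python, not the project's code: the function names and the 0.6 tolerance are assumptions, and the toy vectors stand in for encodings that a real face encoder would produce.

```python
import math


def face_distance(enc_a, enc_b):
    """Euclidean distance between two equal-length face encodings."""
    if len(enc_a) != len(enc_b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(enc_a, enc_b)))


def compare_faces(known_encodings, candidate, tolerance=0.6):
    """Return one boolean per known encoding: True when within tolerance.

    The tolerance value is an assumption; a stricter (smaller) value
    trades missed matches for fewer false matches.
    """
    return [face_distance(k, candidate) <= tolerance for k in known_encodings]


# Toy 128-value encodings (real ones come from the face encoder).
known = [[0.0] * 128, [0.5] * 128]
probe = [0.01] * 128
matches = compare_faces(known, probe)
print(matches)
```

The probe lies close to the first known encoding and far from the second, so only the first comparison reports a match.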
Multi-label Classification
Problem!
Workflow
• Creating Dataset
• Preprocessing Data
• Train Our Model
• Testing Our Model
Creating Our
Dataset!
• Firstly, we scraped
images from
Google. Problem?
• Using Microsoft’s
Bing Image Search
API.
Preprocessing Data!
• Extract Multi-class Labels
• The labels list is a “list of lists”
• Scaling
• Binarize the labels – MultiLabelBinarizer
• transform? Two-hot encoding
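The binarization step can be illustrated without any library. The sketch below mimics what a multi-label binarizer does: it collects the distinct labels, then turns each image's label list into a 0/1 vector. The helper names and the example labels (colour plus clothing item) are hypothetical; the project itself uses MultiLabelBinarizer.

```python
def fit_classes(label_lists):
    """Collect the sorted set of distinct labels across all samples."""
    return sorted({label for labels in label_lists for label in labels})


def binarize(label_lists, classes):
    """Turn each list of labels into a 0/1 vector aligned with classes."""
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows


# Hypothetical labels: each image carries two labels, hence "two-hot" rows.
labels = [["blue", "helmet"], ["red", "vest"], ["blue", "vest"]]
classes = fit_classes(labels)
encoded = binarize(labels, classes)
print(classes)
print(encoded)
```

Each row has exactly two ones because each image carries two labels, which is why the final layer of a multi-label model scores every class independently.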
• Our Model?
SmallerVGGNet!
• Image Augmentation
Training Loss: 0.0405
Training Accuracy: 0.9857
Validation Loss: 0.0429
Validation Accuracy: 0.9842
Testing!
OpenPose
DeepSort
Activity Recognition?
What is OpenPose?
Ildoo Kim's Amazing Work
Pipeline
• Heatmaps and PAFs
• NMS – get part candidates
• Bipartite Graphs
• Line Integral
• Assignment
• Merging!
Pretrained Model. Why not the pyopenpose library?
Trained on the COCO dataset with Thin MobileNet
Depthwise? Pointwise? MobileNet's depthwise separable convolutions.
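A short arithmetic sketch of why the depthwise-plus-pointwise split makes MobileNet light. It counts weights only (biases ignored) for one layer; the function names and the example layer sizes are illustrative assumptions, not values from the project's model.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: every output channel sees every input channel."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution.

    Depthwise: one kernel x kernel filter per input channel.
    Pointwise: a 1x1 convolution that mixes channels.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(standard, separable, round(standard / separable, 1))
```

For this example layer the separable form needs roughly an eighth of the weights, which is the kind of saving that makes the model practical for real-time use.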
EXTENSION OF SORT (SIMPLE ONLINE AND REALTIME TRACKING)
• THE KALMAN FILTER
• THE ASSIGNMENT PROBLEM
• DISTANCE METRIC
• EFFICIENT ALGORITHM – HUNGARIAN ALGORITHM
• THE APPEARANCE FEATURE VECTOR
Pretrained Model on:
• MARS Dataset (Motion Analysis and Re-identification Set)
DeepSort
Activity Recognition
To get started with Activity Recognition:
• Using OpenCV
• Kinetics Dataset – Created in 2017
• 400 Activities (78.4 – 94.5% Accuracy)
• “Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?” - 2018
To understand pose recognition, we need the help of human pose estimation
Dataset!
stand = 0, walk = 1,
operate = 2,
fall_down = 3
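# Imports needed to run the snippet (Keras API)
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

# Fully connected classifier; the 4 output units match the activity labels
# (stand, walk, operate, fall_down)
model = Sequential()
model.add(Dense(units=128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=64, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=16, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=4, activation='softmax'))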
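A small sketch of how the four softmax outputs can be mapped back to the activity labels from the dataset slide. The function name and the example probability vector are assumptions for illustration; a real vector would come from the classifier's prediction.

```python
# Label ids as listed on the dataset slide.
LABELS = {0: "stand", 1: "walk", 2: "operate", 3: "fall_down"}


def decode_prediction(probabilities):
    """Return (label, probability) for the most likely class."""
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return LABELS[best], probabilities[best]


# Example softmax output (made up for illustration).
label, score = decode_prediction([0.05, 0.10, 0.15, 0.70])
print(label, score)
```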
Integration
Direct Approach
• Using Models for Predictions Directly
Integrated with UI
• Cannot Use Parallel Approach for
Recognitions
• Can Only Be Used On a Single
Machine At a Time
Service Oriented Approach
• Distributed Approach
• Uses Django REST API services
• All recognition applications are
services
• Parallel request handling for multiple
requests
• Uses POST request for data
• JSON data for each service
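A sketch of what a client call to one of the recognition services might look like, using only the Python standard library. The endpoint URL, the JSON field names (camera_id, frame) and the base64 encoding of the frame are all assumptions; the slides only state that each service takes a POST request with JSON data.

```python
import base64
import json
import urllib.request


def build_recognition_request(url, frame_bytes, camera_id):
    """Build (but do not send) a JSON POST request for one video frame."""
    payload = {
        "camera_id": camera_id,  # hypothetical field name
        # Raw bytes are not valid JSON, so the frame is base64-encoded.
        "frame": base64.b64encode(frame_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; urllib.request.urlopen(request) would send it.
request = build_recognition_request(
    "http://localhost:8000/api/face/", b"\x00\x01fake-image-bytes", "cam-1"
)
print(request.get_method(), request.full_url)
```

Since every recognizer sits behind its own endpoint, the UI can issue such requests to several services in parallel instead of loading every model in one process.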
• PyQt5
• Application UI
• Output UI
• Training UI
Conclusion
• To be completed?
• Our Integration.
• Our Communication Module.
• Scope?
• Our goals in the future?
Our Dataset
• Face Detection?
• mmod_human_face_detector used!
• Cropped Images
• VGG-Face to create embeddings!
• (224,224) Target Image
• (1,2622) Dimensional Tensor
• Output layer
• vgg_face = Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)
Before Discarding Output Layer
After Discarding Output Layer
• Our Classifier!
• Tuning parameters
• Learning rate?
• Epochs?
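# Imports needed to run the snippet (Keras API)
from keras.models import Sequential
from keras.layers import (Activation, BatchNormalization, Conv2D, Dense,
                          Dropout, Flatten, MaxPooling2D)

# Assumption: the slide shows only the layer stack, so the wrapper function,
# its signature and the channels-last input layout are illustrative.
def build_smaller_vggnet(width, height, depth, classes, finalAct="softmax"):
    inputShape = (height, width, depth)
    chanDim = -1  # channels-last
    model = Sequential()
    # CONV => RELU => POOL
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Dropout(0.25))
    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(128, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(128, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # first (and only) set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    # output layer: finalAct="sigmoid" for multi-label, "softmax" for single-label
    model.add(Dense(classes))
    model.add(Activation(finalAct))
    return model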
• Parts and Pairs?
• Heatmaps? Marks Confidence for a
Part.
• Part Affinity Fields? Position and
Orientation of Pairs.
• Non-Maximum Suppression? Transform
Confidence into Certainty.
• Extract the Local Maxima
• Compare and Suppress!
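The non-maximum suppression step can be sketched on a toy heatmap: a cell is kept as a part candidate only if it exceeds a confidence threshold and beats all eight neighbours. This is a plain-Python illustration under assumed names and an assumed threshold of 0.1, not the OpenPose implementation.

```python
def find_peaks(heatmap, threshold=0.1):
    """Return (row, col, score) for cells above threshold that beat all 8 neighbours."""
    peaks = []
    rows, cols = len(heatmap), len(heatmap[0])
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score <= threshold:
                continue  # too weak to be a candidate
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Toy confidence map for one body part with two blobs.
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.1],
    [0.0, 0.0, 0.3, 0.8, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
peaks = find_peaks(heatmap)
print(peaks)
```

Each blob of confidence collapses to a single coordinate, giving one candidate location per visible instance of the part.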
• Bipartite Graph?
Connect to Form Pairs.
• Assignment Problem! Edges Should Have Weights.
• Line Integral! Measure the effect of a field
along a connection.
• Assignment!
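The line integral that weights a candidate connection can be approximated by sampling the part affinity field along the segment between two part candidates and averaging its alignment with the segment direction. The sketch below is a simplified illustration: the function name, the sample count and the toy constant field are assumptions.

```python
import math


def paf_score(paf_x, paf_y, start, end, samples=10):
    """Average dot product between the field and the unit vector from start to end.

    paf_x / paf_y are 2D lists indexed as [y][x]; start and end are (x, y) points.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1) if samples > 1 else 0.0
        x = int(round(start[0] + t * dx))
        y = int(round(start[1] + t * dy))
        total += paf_x[y][x] * ux + paf_y[y][x] * uy
    return total / samples


# Toy 5x5 field pointing in the +x direction everywhere.
size = 5
paf_x = [[1.0] * size for _ in range(size)]
paf_y = [[0.0] * size for _ in range(size)]

aligned = paf_score(paf_x, paf_y, (0, 2), (4, 2))        # along the field
perpendicular = paf_score(paf_x, paf_y, (2, 0), (2, 4))  # across the field
print(aligned, perpendicular)
```

A connection that follows the field scores high and one that cuts across it scores near zero; these scores become the edge weights of the bipartite graph used in the assignment step.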
• Merging! Keeping in
mind the same index.
• Results!
Object
Tracking
• Detecting Objects
• Analyzing Temporal Information
• Challenges!
1. Occlusion
2. Variation in View Points
3. Non-stationary Camera
4. Annotating Training Data
Traditional Methods
• Centroid Tracking
• Meanshift
• Optical Flow
• Kalman Filter
Deep Learning
based approaches
• ROLO (Recurrent YOLO)
Centroid Tracking
Step 1 – Accept
bounding box
coordinates and
compute centroids
Centroid Tracking
Step 2 – Compute
Euclidean distance
between new bounding
boxes and existing
objects
Centroid Tracking
Step 3 – Update (x,y)
coordinates of existing
objects.
Lonely objects?
Associate centroid with
minimum distances
between subsequent
frames.
Centroid Tracking
Step 4 – Register new
objects.
Step 5 – Deregister old
objects.
Object lost?
Disappeared? Left the
field of view.
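The five centroid tracking steps can be put together in a compact sketch. This is a simplified illustration rather than the project's tracker: the class name, the greedy nearest-pair matching and the max_disappeared parameter are assumptions.

```python
import math


class CentroidTracker:
    def __init__(self, max_disappeared=5):
        self.next_id = 0
        self.objects = {}      # object id -> (x, y) centroid
        self.disappeared = {}  # object id -> consecutive missed frames
        self.max_disappeared = max_disappeared

    def _register(self, centroid):
        self.objects[self.next_id] = centroid
        self.disappeared[self.next_id] = 0
        self.next_id += 1

    def _deregister(self, object_id):
        del self.objects[object_id]
        del self.disappeared[object_id]

    def update(self, boxes):
        # Step 1: centroids from (x1, y1, x2, y2) bounding boxes.
        centroids = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in boxes]
        # Step 2: Euclidean distance between every existing object and new centroid.
        pairs = sorted(
            (math.dist(pos, c), oid, j)
            for oid, pos in self.objects.items()
            for j, c in enumerate(centroids)
        )
        # Step 3: greedily associate the closest pairs and update coordinates.
        used_ids, used_cols = set(), set()
        for _, oid, j in pairs:
            if oid in used_ids or j in used_cols:
                continue
            self.objects[oid] = centroids[j]
            self.disappeared[oid] = 0
            used_ids.add(oid)
            used_cols.add(j)
        # Step 5: count misses; deregister objects gone for too long.
        for oid in list(self.objects):
            if oid not in used_ids:
                self.disappeared[oid] += 1
                if self.disappeared[oid] > self.max_disappeared:
                    self._deregister(oid)
        # Step 4: register unmatched centroids as new objects.
        for j, c in enumerate(centroids):
            if j not in used_cols:
                self._register(c)
        return dict(self.objects)


tracker = CentroidTracker(max_disappeared=1)
frame1 = tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)])
frame2 = tracker.update([(102, 101, 112, 111), (1, 2, 11, 12)])
print(frame1, frame2)
```

In the second frame the boxes arrive in a different order, yet each object keeps its id because matching relies on distance rather than list position.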
Object Tracking
Limitations of Centroid Tracking
1 – Requires an object detector on every frame; with a computationally expensive detector, the pipeline slows down tremendously.
2 – Underlying assumption of centroid tracking? Centroids stay close between frames. Overlapping objects? Object ID switching.
3 – Just Euclidean distance? Need more heuristics.
Meanshift or Mode seeking
• Used in clustering and unsupervised problems
• Replaces centroid technique of calculating
clusters with a weighted average
• Gives importance to points closer to mean
• Find modes in the given data distribution
• Extract certain features
• Tracks new largest mode in each frame
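Mode seeking can be illustrated in one dimension: starting from a guess, the estimate is repeatedly replaced by a weighted average of the data, with nearby points weighted more heavily. The sketch below uses a Gaussian kernel; the function name, bandwidth and toy data are assumptions, and a real tracker would run this over image features rather than a list of numbers.

```python
import math


def mean_shift_mode(points, start, bandwidth=1.0, max_iter=100, tol=1e-6):
    """Shift an estimate toward the nearest density mode of 1D points."""
    x = start
    for _ in range(max_iter):
        # Points close to the current estimate get larger weights.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        total = sum(weights)
        if total == 0:
            break
        new_x = sum(w * p for w, p in zip(weights, points)) / total
        if abs(new_x - x) < tol:
            x = new_x
            break
        x = new_x
    return x


# Two clusters: one around 0, one around 10.
points = [-0.5, 0.0, 0.5, 9.5, 10.0, 10.5]
low_mode = mean_shift_mode(points, start=1.0)
high_mode = mean_shift_mode(points, start=9.0)
print(low_mode, high_mode)
```

The starting point decides which mode is found, which is why a tracker initializes each frame's search at the object's previous location.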
Object Tracking
Optical Flow
• Uses spatio-temporal image brightness variations at a pixel level
• Focus on displacement vector
• Assumptions
1 – Brightness Constancy
2 – Spatial Coherence
3 – Temporal Persistence
4 – Limited Motion
• Lucas-Kanade method to obtain equation for the velocity of certain
points to be tracked.
DeepSort
The Kalman Filter
• Core idea? Use available detections and previous predictions.
• Errors?
• Constant Velocity Model.
• Noise component? Process Noise? Measurement Noise?
• Recursive Nature.
• Why Kalman works? Gaussian Realm.
The Kalman Filter
• Our state contains 8 variables (u, v, a, h, u’, v’, a’, h’): box center, aspect ratio, height and their velocities.
• Assumptions? Variables have absolute positions and velocity factors.
• Kalman? Good fit for bounding boxes.
• For every detection, create a track.
• Create tracks, update tracks and delete stale tracks.
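The predict and update cycle of a constant velocity Kalman filter can be sketched for a single coordinate. DeepSort tracks eight state variables; this reduced version keeps only one position and its velocity so the matrix algebra can be written out by hand. The class name and the noise values (p0, q, r) are assumptions chosen for illustration.

```python
class ConstantVelocityKalman1D:
    """Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, position=0.0, velocity=0.0, p0=100.0, q=1e-4, r=1.0):
        self.x = [position, velocity]
        self.P = [[p0, 0.0], [0.0, p0]]  # state covariance (initial uncertainty)
        self.q = q                       # process noise (added to the diagonal)
        self.r = r                       # measurement noise variance

    def predict(self, dt=1.0):
        """Project the state forward: position += dt * velocity; P = F P F^T + Q."""
        p, v = self.x
        self.x = [p + dt * v, v]
        P = self.P
        a00 = P[0][0] + dt * P[1][0]
        a01 = P[0][1] + dt * P[1][1]
        a10, a11 = P[1][0], P[1][1]
        self.P = [
            [a00 + dt * a01 + self.q, a01],
            [a10 + dt * a11, a11 + self.q],
        ]

    def update(self, z):
        """Correct the prediction with a measured position z."""
        P = self.P
        residual = z - self.x[0]
        s = P[0][0] + self.r                 # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        self.x = [self.x[0] + k0 * residual, self.x[1] + k1 * residual]
        self.P = [
            [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
            [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
        ]


# An object moving 2 units per frame, measured without noise.
kf = ConstantVelocityKalman1D()
for step in range(20):
    kf.predict()
    kf.update(2.0 * step)
print(kf.x)
```

The filter never observes velocity directly; it infers it from successive positions, which is what lets a track coast through a few frames without detections.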
DeepSort
The Assignment Problem - How to
associate new detections with new
predictions?
We need two things:
1 – A Distance Metric
2 – An Efficient Algorithm
The Distance Metric
Squared Mahalanobis distance to incorporate the
uncertainties from the Kalman filter.
The Efficient Algorithm
Hungarian Algorithm for the data association problem.
The question is “WHERE IS DEEP LEARNING IN ALL THIS?”
New distance metric on the basis of “appearance” of the
object.
(D = Lambda * D_k + (1 - Lambda) * D_a)
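A sketch of how the combined metric feeds the assignment step. The cost matrices below are made-up numbers, the function names are hypothetical, and the matching uses a brute-force search over permutations as a stand-in for the Hungarian algorithm: it returns the same optimum on tiny matrices but does not scale, so a real system would call an optimized solver.

```python
from itertools import permutations


def combine_costs(d_k, d_a, lam):
    """Element-wise D = lam * D_k + (1 - lam) * D_a for two cost matrices."""
    return [
        [lam * k + (1.0 - lam) * a for k, a in zip(row_k, row_a)]
        for row_k, row_a in zip(d_k, d_a)
    ]


def best_assignment(cost):
    """Minimum-cost one-to-one matching of tracks (rows) to detections (columns).

    Brute force over all permutations; only suitable for small square matrices.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(enumerate(best))


# Motion distance cannot separate the two detections here (all entries equal),
# so the appearance distance decides the match.
d_motion = [[1.0, 1.0], [1.0, 1.0]]
d_appearance = [[0.9, 0.1], [0.2, 0.8]]
combined = combine_costs(d_motion, d_appearance, lam=0.5)
assignment = best_assignment(combined)
print(assignment)
```

Setting lam closer to 1 trusts the Kalman-based motion distance more, while lam closer to 0 lets appearance dominate, which helps after long occlusions when motion predictions have drifted.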