Smart environment for industry 4.0

GROUP MEMBERS
Jawad Sajid FA16-BCS-181
Mutahhar Ahmad FA16-BCS-122
Usman Sajid FA16-BCS-288

Introduction and Problem Statement
• Strive for Industry 4.0
• These perspectives are not just ideas!
• Current situation of Pakistani Industry
• What are the problems we focus on?
1. Govern Access
2. Monitor Activities in Real-time
3. Alert to Environmental Risks

Honeywell’s Video Analytics
Microsoft’s Amazing Works!
Amazon Works!
“Real-world Anomaly Detection in Surveillance Videos” by Waqas Sultani, Chen Chen and Mubarak Shah
“A Review of the Applications of Computer Vision to Construction Safety” by Brian H.W. Guo, Yang Zou and Long Chen

On-site 3D Vision Tracking of Construction Personnel by Francisco Cordova and Ioannis Brilakis
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Google
Deep Face Recognition by University of Oxford
OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields
Simple Online and Realtime Tracking with a Deep Association Metric by Queensland University

• Problem of Deploying the Application?
• Why two databases?
• Importance of API Layer
• System logins? Why?

DATABASE
API DESIGN
FACIAL RECOGNITION
UNIFORM DETECTION
POSE RECOGNITION
UI
INTEGRATION
COMMUNICATION MODULE


• Initial Database
• Not Normalized
• Repeated Information
• No Centralized Table
• Duty Roster Missing

• Normalized
• Duty Roster
• Centralized Information

API and Database
• Apache Thrift Services
• Command: thrift --gen <language> <.thrift file>
• Reusable Interfaces
• Server based on C# with Thrift
• Client based on Python with Thrift
• TBinaryProtocol Layer
• TSocket Transport Layer
• MSSQL Server DB
• Django Admin Panel

• Thrift Architecture
• API Interfaces
• C# Thrift Server
• Attaining an Admin Panel with Django and Database

Face Recognition
Face Recognition Pipeline
• Locate and Extract faces
• Identify Facial Features
• Represent Face As Measurement
• Compare Faces
Our Works
1- Face Recognition Using VGG-Face (Transfer Learning)
2- DLIB based Facial Recognition

VGG-FACE?
LABELED FACES IN THE WILD (LFW)
TRANSFER LEARNING?

OUR DATASET?
FACE DETECTION? MMOD FACE DETECTOR
FACE EMBEDDINGS?
REMOVING THE LAST ACTIVATION LAYER?
OUR CLASSIFIER?

Training Loss: 0.0421
Training Accuracy: 0.9962
Validation Loss: 0.1581
Validation Accuracy: 0.9423

Problems? Scope?
• Training takes a long time.
• Dataset too small.
• Did augmentation help? Not enough to meet the requirement, although results improved.
• Runtime training?

Functionalities
• DLIB Face Detector
• Face Locations
• CNN based
• HOG based*
• Face Encodings (128 measurements)
• Compare Faces
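
The comparison step can be sketched in plain Python: two 128-measurement encodings are compared by Euclidean distance, and a match is declared when the distance falls under a tolerance. The function names, the toy vectors and the 0.6 tolerance are illustrative assumptions, not values taken from the project.

```python
import math

def face_distance(enc_a, enc_b):
    """Euclidean distance between two face encodings of equal length."""
    if len(enc_a) != len(enc_b):
        raise ValueError("encodings must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(enc_a, enc_b)))

def compare_faces(known_encodings, candidate, tolerance=0.6):
    """Return one boolean per known encoding: True if within tolerance."""
    return [face_distance(k, candidate) <= tolerance for k in known_encodings]

# Toy 128-measurement encodings (made-up numbers, for illustration only).
alice = [0.10] * 128
bob = [0.30] * 128
probe = [0.11] * 128   # close to alice, far from bob

matches = compare_faces([alice, bob], probe)
```

A lower tolerance makes matching stricter; the right value depends on the encoder and the dataset.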

Multi-label Classification Problem!
Workflow
• Creating Dataset
• Preprocessing Data
• Train Our Model
• Testing Our Model

Creating Our Dataset!
• Firstly, we scraped images from Google. Problem?
• Using Microsoft’s Bing Image Search API.

Preprocessing Data!
• Extract Multi-class Labels
• The labels list is a “list of lists”
• Scaling
• Binarize the labels – MultiLabelBinarizer
• transform? Two-hot encoding
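
What binarizing a "list of lists" produces can be sketched without any library: every image gets a 0/1 vector with a one for each label it carries, so an image with two labels gets a two-hot vector. The label names below are hypothetical, and the helper functions only mimic the fit-then-transform idea of a multi-label binarizer.

```python
def fit_classes(label_lists):
    """Collect the sorted set of all labels seen in the dataset."""
    return sorted({label for labels in label_lists for label in labels})

def binarize(label_lists, classes):
    """Turn each list of labels into a 0/1 vector over `classes`."""
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows

# Hypothetical labels: each image carries a colour and a garment.
labels = [["blue", "uniform"], ["red", "vest"], ["blue", "vest"]]
classes = fit_classes(labels)
encoded = binarize(labels, classes)
```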

• Our Model? SmallerVGGNet!
• Image Augmentation
Training Loss: 0.0405
Training Accuracy: 0.9857
Validation Loss: 0.0429
Validation Accuracy: 0.9842

OpenPose
DeepSort
Activity Recognition?

What is OpenPose?
Ildoo Kim’s Amazing Work
Pipeline
• Heatmaps and PAFs
• NMS – get part candidates
• Bipartite Graphs
• Line Integral
• Assignment
• Merging!

Pretrained Model. Why not the pyopenpose library?
Trained on COCO dataset with Thin MobileNet
Depth-wise and point-wise convolutions.
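
Why splitting a convolution into a depth-wise and a point-wise step makes the network lighter can be shown with a weight count. This is a small arithmetic sketch; the layer sizes (3x3 kernel, 32 input channels, 64 output channels) are made-up examples, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out

def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depth-wise conv followed by a 1x1 point-wise conv."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 conv that mixes the channels
    return depthwise + pointwise

standard = standard_conv_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
ratio = standard / separable
```

For these sizes the separable version needs roughly an eighth of the weights.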

EXTENSION OF SORT (SIMPLE ONLINE AND REALTIME TRACKING)
THE KALMAN FILTER
THE ASSIGNMENT PROBLEM
DISTANCE METRIC
EFFICIENT ALGORITHM – HUNGARIAN ALGORITHM
THE APPEARANCE FEATURE VECTOR

Pretrained Model on:
• MARS Dataset (Motion Analysis and Re-identification Set)

Activity Recognition
To get started with Activity Recognition:
• Using OpenCV
• Kinetics Dataset – Created in 2017
• 400 Activities (78.4 – 94.5% Accuracy)
• “Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?” - 2019

To understand pose recognition we need the help of human pose estimation
Dataset!
stand = 0, walk = 1, operate = 2, fall_down = 3

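from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Dense
# Classifier with one softmax output per activity (stand, walk, operate, fall_down)
model = Sequential()
model.add(Dense(units=128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=4, activation='softmax'))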
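
How a four-way softmax output maps back to an activity name can be sketched in plain Python. The label indices follow the dataset encoding (stand = 0, walk = 1, operate = 2, fall_down = 3); the raw scores are made-up and the helper names are hypothetical.

```python
import math

# Class indices as used for the pose dataset.
LABELS = {0: "stand", 1: "walk", 2: "operate", 3: "fall_down"}

def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decode(scores):
    """Return (label, probability) for the most likely class."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Made-up raw scores for one frame.
label, prob = decode([0.2, 0.1, 0.3, 2.5])
```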

Integration
Direct Approach
• Using Models for Predictions Directly Integrated with UI
• Cannot Use Parallel Approach for Recognitions
• Can Only Be Used On a Single Machine At a Time
Service Oriented Approach
• Distributed Approach
• Uses Django REST API services
• All recognition applications are services
• Parallel request handling for multiple requests
• Uses POST request for data
• JSON data for each service
• PyQt5
• Application UI
• Output UI
• Training UI

Conclusion
• To be completed?
• Our Integration.
• Our Communication Module.
• Scope?
• Our goals in the future?

• Face Detection?
• mmod_human_face_detector used!
• Cropped Images

• VGG-Face to create embeddings!
• (224,224) Target Image
• (1,2622) Dimensional Tensor
• Output layer
• vgg_face=Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)

Before Discarding Output Layer
After Discarding Output Layer

• Our Classifier!
• Tuning parameters
• Learning rate?
• Epochs?

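from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, MaxPooling2D)
# Excerpt: inputShape, chanDim, classes and finalAct are supplied by the caller
model = Sequential()
# CONV => RELU => POOL
model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=chanDim))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Dropout(0.25))
# (CONV => RELU) * 2 => POOL
model.add(Conv2D(64, (3, 3), padding="same"))
# (CONV => RELU) * 2 => POOL
# first (and only) set of FC => RELU layers
model.add(Flatten())
model.add(Dense(1024))
# output layer: one unit per class, activation given by finalAct
model.add(Dense(classes))
model.add(Activation(finalAct))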

• Parts and Pairs?
• Heatmaps? Marks Confidence for a Part.
• Part Affinity Fields? Position and Orientation of Pairs.

• Non Maximum Suppression? Transform Confidence into Certainty.
• Extract the Local Maxima
• Compare and Suppress!
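
Non-maximum suppression on a confidence heatmap can be sketched as a search for cells that beat a threshold and all eight of their neighbours. This is a minimal plain-Python illustration on a made-up 5x5 heatmap; the threshold and function name are assumptions, and a real pipeline would run this on full-resolution arrays.

```python
def find_peaks(heatmap, threshold=0.1):
    """Return (row, col, score) for cells above threshold that beat all 8 neighbours."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            # Keep the cell only if it is a strict local maximum.
            if all(score > n for n in neighbours):
                peaks.append((r, c, score))
    return peaks

# Made-up heatmap for one body part with two candidates.
heatmap = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0, 0.0],
    [0.0, 0.2, 0.1, 0.3, 0.1],
    [0.0, 0.0, 0.3, 0.8, 0.2],
    [0.0, 0.0, 0.1, 0.2, 0.1],
]
peaks = find_peaks(heatmap)
```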

• Bipartite Graph? Connect to Form Pairs.
• Assignment Problem! Edges Should Have Weights.
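
Once every edge of the bipartite graph carries a weight, one simple way to pick pairs is greedily: take the heaviest edge first and never reuse a part. The sketch below assumes a small made-up weight matrix; the greedy strategy is an illustrative choice, not necessarily the exact method used in the project.

```python
def greedy_assignment(weights):
    """Pick pairs by descending weight so each part is used at most once.

    `weights[i][j]` scores connecting candidate i of one part type
    to candidate j of the other (higher is better).
    """
    edges = sorted(
        ((w, i, j) for i, row in enumerate(weights) for j, w in enumerate(row)),
        reverse=True,
    )
    used_left, used_right, pairs = set(), set(), []
    for w, i, j in edges:
        if w <= 0 or i in used_left or j in used_right:
            continue
        used_left.add(i)
        used_right.add(j)
        pairs.append((i, j, w))
    return pairs

# Made-up weights: 2 elbow candidates (rows) vs 2 wrist candidates (columns).
weights = [
    [0.9, 0.1],
    [0.2, 0.7],
]
pairs = greedy_assignment(weights)
```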

• Line Integral! Measure the effect of a field along a connection.
• Assignment!
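
The line integral can be approximated by sampling the field at points along the candidate connection and averaging its dot product with the connection's direction. In this sketch the field is a plain function standing in for a Part Affinity Field map, and the sample count is an arbitrary assumption.

```python
import math

def paf_score(paf, start, end, samples=10):
    """Average alignment between the field and the start->end direction.

    `paf(x, y)` returns the field vector (vx, vy) at a point.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        return 0.0
    ux, uy = dx / length, dy / length
    total = 0.0
    for k in range(samples):
        t = k / (samples - 1)
        vx, vy = paf(start[0] + t * dx, start[1] + t * dy)
        total += vx * ux + vy * uy   # dot product with the unit direction
    return total / samples

# Toy field that points in +x everywhere.
def field(x, y):
    return (1.0, 0.0)

aligned = paf_score(field, (0, 0), (10, 0))    # connection along the field
crossing = paf_score(field, (0, 0), (0, 10))   # connection perpendicular to it
```

A connection that follows the field scores close to 1, one that cuts across it scores close to 0, and these scores can serve as the edge weights for assignment.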

• Merging! Keeping in mind the same index.
• Results!

Object Tracking
• Detecting Objects
• Analyzing Temporal Information
• Challenges!
1. Occlusion
2. Variation in View Points
3. Non-stationary Camera
4. Annotating Training Data

Traditional Methods
• Centroid Tracking
• Meanshift
• Optical Flow
• Kalman Filter
Deep Learning based approaches
• ROLO-Recurrent YOLO

Centroid Tracking
Step 1 – Accept bounding box coordinates and compute centroids

Centroid Tracking
Step 2 – Compute Euclidean distance between new bounding boxes and existing objects

Centroid Tracking
Step 3 – Update (x,y) coordinates of existing objects.
Lonely objects? Associate centroid with minimum distances between subsequent frames.

Centroid Tracking
Step 4 – Register new objects.
Step 5 – Deregister old objects.
Object lost? Disappeared? Left the field of view.
Object Tracking
Limitations of Centroid Tracking
1 – With a computationally expensive object detector, the per-frame detection pipeline slows down tremendously.
2 – Underlying assumption of centroid tracking? Overlapping? Object ID switching.
3 – Just Euclidean distance? Need more heuristics.

Meanshift or Mode seeking
• Used in clustering and unsupervised problems
• Replaces centroid technique of calculating clusters with a weighted average
• Gives importance to points closer to mean
• Find modes in the given data distribution
• Extract certain features
• Tracks new largest mode in each frame
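
Mode seeking with a weighted average can be illustrated in one dimension: the estimate is repeatedly replaced by the average of all points, weighted so that nearby points count more, until it stops moving. The Gaussian kernel, the bandwidth and the sample data below are assumptions chosen for the sketch.

```python
import math

def mean_shift_mode(points, start, bandwidth=1.0, tol=1e-6, max_iter=100):
    """Move `start` uphill to a mode of the 1-D point density."""
    x = start
    for _ in range(max_iter):
        # Gaussian weights: points near the current estimate count more.
        weights = [math.exp(-((p - x) ** 2) / (2 * bandwidth ** 2)) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Two clumps of made-up samples, around 1 and around 8.
data = [0.8, 1.0, 1.2, 0.9, 1.1, 7.9, 8.0, 8.1]
mode_low = mean_shift_mode(data, start=2.0)
mode_high = mean_shift_mode(data, start=7.0)
```

Each starting point climbs to the nearest dense region, which is what lets a tracker follow the mode of a feature distribution from frame to frame.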

Object Tracking
Optical Flow
• Uses spatio-temporal image brightness variations at a pixel level
• Focus on displacement vector
• Assumptions
1 – Brightness Consistency
2 – Spatial Coherence
3 – Temporal Persistence
4 – Limited Motion
• Lucas-Kanade method to obtain an equation for the velocity of the points to be tracked.
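
Under the brightness-constancy assumption each pixel gives one equation, Ix*u + Iy*v + It = 0, and the Lucas-Kanade method solves these equations for a whole window in the least-squares sense. The sketch below solves the resulting 2x2 system directly; the gradient values are made-up and generated from a known motion so the result can be checked.

```python
def lucas_kanade(ix, iy, it):
    """Least-squares flow (u, v) for one window.

    ix, iy, it are per-pixel spatial and temporal brightness derivatives.
    Solves the 2x2 normal equations of  ix*u + iy*v + it = 0.
    """
    sxx = sum(a * a for a in ix)
    syy = sum(b * b for b in iy)
    sxy = sum(a * b for a, b in zip(ix, iy))
    sxt = sum(a * t for a, t in zip(ix, it))
    syt = sum(b * t for b, t in zip(iy, it))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to estimate flow")
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Made-up gradients for a 4-pixel window moving by (u, v) = (1.0, -0.5).
ix = [1.0, 0.5, -0.3, 0.8]
iy = [0.2, -1.0, 0.7, 0.4]
it = [-(a * 1.0 + b * -0.5) for a, b in zip(ix, iy)]
u, v = lucas_kanade(ix, iy, it)
```

The determinant check reflects the method's limits: a flat or single-edge window does not contain enough information to recover both velocity components.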

DeepSort
The Kalman Filter
• Core idea? Use available detections and previous predictions.
• Errors?
• Constant Velocity Model.
• Noise component? Process Noise? Measurement Noise?
• Recursive Nature.
• Why Kalman works? Gaussian Realm.
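
The predict-then-update recursion can be sketched for the simplest case, a single coordinate with state [position, velocity] under a constant velocity model. The noise values q and r, the initial covariance and the measurements are all made-up assumptions; a real tracker would use a matrix library and tuned noise terms.

```python
def kalman_step(x, P, z, dt=1.0, q=0.01, r=1.0):
    """One predict + update cycle for state x = [position, velocity].

    P is the 2x2 state covariance, z the measured position,
    q the process-noise scale and r the measurement-noise variance.
    """
    # Predict with the constant-velocity model: position += velocity * dt.
    px = x[0] + dt * x[1]
    pv = x[1]
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update: only position is measured (H = [1, 0]).
    innovation = z - px
    s = p00 + r                    # innovation variance
    k0, k1 = p00 / s, p10 / s      # Kalman gain
    x_new = [px + k0 * innovation, pv + k1 * innovation]
    P_new = [
        [(1 - k0) * p00, (1 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return x_new, P_new

# Start with no knowledge of the velocity and a large uncertainty.
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for t in range(1, 21):
    z = 2.0 * t   # noise-free positions of an object moving 2 units per frame
    x, P = kalman_step(x, P, z)
```

Although velocity is never measured, the filter recovers it from the sequence of positions, which is why the previous prediction matters as much as the current detection.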

The Kalman Filter
• Our state contains 8 variables (u,v,a,h,u’,v’,a’,h’)
• Assumptions? Variables have absolute positions and velocity factors.
• Kalman? Good fit for bounding boxes.
• For every detection, create a track.
• Track, then delete tracks that are no longer matched.
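
The eight-variable state can be made concrete with a short sketch: a detection box is converted to (u, v, a, h), meaning box centre, aspect ratio and height, and the constant velocity model advances each of the four by its velocity. The corner-format input and the function names are assumptions for illustration.

```python
def box_to_measurement(x1, y1, x2, y2):
    """Convert corner coordinates to (u, v, a, h): centre, aspect ratio, height."""
    w, h = x2 - x1, y2 - y1
    return (x1 + w / 2, y1 + h / 2, w / h, h)

def predict(state, dt=1.0):
    """Advance an 8-variable state by one step of constant velocity."""
    u, v, a, h, du, dv, da, dh = state
    return (u + dt * du, v + dt * dv, a + dt * da, h + dt * dh, du, dv, da, dh)

# A new track starts at the detection with zero velocities.
measurement = box_to_measurement(10, 20, 50, 100)
track = measurement + (0.0, 0.0, 0.0, 0.0)
# Pretend the filter has since estimated a velocity of 3 px/frame in u.
track = track[:4] + (3.0, 0.0, 0.0, 0.0)
predicted = predict(track)
```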

DeepSort
The Assignment Problem - How to associate new detections with new predictions?
We need two things:
1 – A Distance Metric
2 – An Efficient Algorithm

The Distance Metric
Squared Mahalanobis distance to incorporate the uncertainties from the Kalman filter.
The Efficient Algorithm
Hungarian Algorithm for simple data association problem.
The question is “WHERE IS DEEP LEARNING IN ALL THIS?”
New distance metric on the basis of “appearance” of the object.
(D = Lambda * D_k + (1 - Lambda) * D_a)
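
The effect of the combined metric can be shown on a tiny example with two tracks and two detections. In this sketch the motion distances D_k are made-up numbers, the appearance distance D_a is a cosine distance between toy feature vectors, and an exhaustive search over permutations stands in for the Hungarian algorithm, which reaches the same optimum efficiently on larger problems.

```python
from itertools import permutations

def cosine_distance(a, b):
    """1 - cosine similarity between two appearance feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

def combined_cost(d_k, d_a, lam):
    """D = lam * D_k + (1 - lam) * D_a, element by element."""
    return [
        [lam * k + (1 - lam) * a for k, a in zip(row_k, row_a)]
        for row_k, row_a in zip(d_k, d_a)
    ]

def best_assignment(cost):
    """Exhaustive minimum-cost matching for a small square matrix.

    Stand-in for the Hungarian algorithm on toy-sized inputs.
    """
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )

# Made-up numbers: 2 tracks (rows) vs 2 detections (columns).
d_k = [[0.5, 0.6], [0.6, 0.5]]            # motion distances
track_feats = [[1.0, 0.0], [0.0, 1.0]]
det_feats = [[0.0, 1.0], [1.0, 0.0]]      # appearances are swapped
d_a = [[cosine_distance(t, d) for d in det_feats] for t in track_feats]
cost = combined_cost(d_k, d_a, lam=0.5)
assignment = best_assignment(cost)        # assignment[i] = detection for track i
motion_only = best_assignment(d_k)
```

With motion alone each track keeps the detection in its own column, while the appearance term flips the match to the detections that actually look like the tracked objects.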
