© 2021 University of Auckland
Person Re-Identification and
Tracking at the Edge:
Challenges and Techniques
Morteza Biglari-Abhari
Department of Electrical, Computer
and Software Engineering
University of Auckland, New Zealand
Outline
➢ Why Person Re-Identification and Tracking
➢ Key Challenges and Current Approaches
➢ Appearance Based One-shot / Unsupervised Re-Identification
➢ Spatio-Temporal Based Tracking
➢ Fused Appearance and Spatio-Temporal Approach
➢ Privacy Issues
➢ Summary and Conclusions
Why Person Re-Identification and Tracking
The aim is to match images of people as viewed through multiple cameras in
different positions and locations, and to determine a unique identity for each person.
Possible target applications:
o Surveillance for Security and Public Safety
o Healthcare and Industrial Facilities
o Commercial Entities (such as supermarkets) to monitor customer behavior
o Intelligent Transportation System
o Smart Cities
Key Challenges
Challenges: the appearance of a person varies, even within the same camera view
(variations in pose, lighting, color, resolution, motion blur, obstacles, occlusions).
Current Approaches
[Figure: block diagram with the stages Person Detection, feature extraction or feature learning, classification, and Person Re-Identification (appearance based / spatio-temporal based).]
➢ Person detection methods should be robust enough to detect people under different conditions.
➢ A person model needs to be robust against various conditions: varying lighting conditions, partially obscured views, and different camera view angles.
Person Detection Techniques
o Early person detection work relied on blob detection [Krumm et al, 2000 – Everingham & Zisserman, 2006]
    o Low computational complexity but low accuracy
o Histogram of Oriented Gradients (HOG) features with a Support Vector Machine (SVM) classifier [Krumm et al, 2000 – Dalal & Triggs, 2005]
o Deformable Parts Model (DPM) – uses HOG features but includes the structural relationship between parts of the person [Cho et al, 2012 – Yan et al, 2014]
    o Calculating HOG features is computationally very expensive
    o Using background estimation can improve the accuracy
o Aggregate Channel Features (ACF) – improves detection speed by isolating the features that contribute most to accurate person detection (focusing on gradient magnitude, HOG, and the LUV color channels). It may be slightly less accurate than DPM, but it is faster. [Dollar et al, 2014 – De Smedt & Goedeme, 2015]
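To give a feel for where the cost of HOG comes from, the sketch below computes a magnitude-weighted orientation histogram for a single cell. It is a simplified illustration only: the function name `cell_orientation_histogram`, the 9 unsigned bins, and the central-difference gradient are assumptions, and the bin interpolation and block normalization of a full HOG descriptor are left out.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a 2-D list of grayscale intensities; only interior pixels are used.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Central differences approximate the image gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[min(int(angle // bin_width), bins - 1)] += magnitude
    return hist

# A vertical edge: intensity steps from 0 to 10 along the x axis.
cell = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(cell)
print(hist)
```

Every pixel needs a gradient, a magnitude, and an angle, and a detector repeats this over many cells, windows, and scales, which is one reason background estimation that narrows the search area can help.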
DPM versus ACF
[Figure: An example of person parts being extracted using DPM]
[Figure: An example of ACF pedestrian detection [Benfold & Reid, 2011]]
CNN-based Techniques for Person Detection
CNN-based Techniques:
o R-CNN: regions of interest (ROIs) that potentially contain targets are extracted for further analysis and fed to a CNN for feature extraction and classification. To improve the processing speed, Fast R-CNN and Faster R-CNN have been proposed. [Girshick, 2015 – Ren et al, 2017]
o Although Faster R-CNN was faster than other CNN-based techniques, it could process only 5 fps using a high-end GPU.
o Single Shot Detector (SSD): the image is parsed only once rather than processing multiple, potentially overlapping windows. It achieves a similar level of accuracy to Faster R-CNN but takes less processing time. YOLO is in this category.
o Pre-trained ResNet-50 and MobileNet-V2 have been used for person detection and re-identification (for specific datasets such as CUHK03 and DukeMTMC), but they are computationally very intensive for real-time detection on edge devices.
Feature Vectors (Person Detection)
The aim is to select features that allow for high inter-class variation (significantly
different between multiple people) while maintaining low intra-class variation (similar
for the same person).
➢ Features may include color (RGB, HSV, YCbCr),
texture, and structure.
➢ Descriptors that include both color and texture
perform better than either one alone. [Gou et al, 2017]
A visual representation of an example feature vector (made up of HSV color and LBP texture
histograms) representing an entire person.
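A feature vector of this kind can be sketched by concatenating a color histogram and a texture histogram. The example below is a minimal illustration rather than the presented implementation: the helper names, the use of hue only (8 bins), and the basic 8-neighbour LBP with 256 bins are all assumptions.

```python
import colorsys

def hue_histogram(pixels, bins=8):
    """Normalized histogram of HSV hue over a list of (r, g, b) pixels in 0..255."""
    hist = [0.0] * bins
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hist[min(int(h * bins), bins - 1)] += 1.0
    total = sum(hist)
    return [count / total for count in hist]

def lbp_code(gray, y, x):
    """8-neighbour Local Binary Pattern code of the pixel at (y, x)."""
    center = gray[y][x]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if gray[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(gray, bins=256):
    """Normalized histogram of LBP codes over the interior pixels of a grayscale patch."""
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            hist[lbp_code(gray, y, x)] += 1.0
    total = sum(hist)
    return [count / total for count in hist]

def feature_vector(pixels, gray):
    """Concatenate the color and texture histograms into one descriptor."""
    return hue_histogram(pixels) + lbp_histogram(gray)

# Toy patch: three red pixels and one blue pixel, plus a 3x3 grayscale patch.
pixels = [(255, 0, 0)] * 3 + [(0, 0, 255)]
gray = [[9, 9, 9], [9, 5, 9], [9, 9, 9]]
fv = feature_vector(pixels, gray)
print(len(fv), fv[:8])
```

Normalizing each histogram keeps the descriptor independent of the patch size, so people detected at different distances from the camera remain comparable.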
Appearance Based Re-Identification
[Figure: appearance-based re-identification pipeline. Input Image → Person Detection (DPM) → Feature Extraction (Color & LBP Histograms) → PCA & Metric Learning (Covariance Metric) → Compute Similarity to Classes (Euclidean Distance) → If match, Update Class Model (Sequential K-Means) → Class / Identity Label. The stages are grouped into feature vector extraction and identity classification.]
Fast one-shot/unsupervised re-identification (feature vector extraction):
o A combination of HSV for color and LBP (Local Binary Pattern) for texture is used to represent patches or parts of the detected people.
o Principal Components Analysis (PCA) can be used as an unsupervised method of determining the most important dimensions of the feature vectors in terms of variation.
o Metric learning, as a useful pre-processing step, transforms the vectors so that they are more linearly separable into identity classes.
Appearance Based Re-Identification (continued)
o Metric learning reduces the computational complexity and improves the accuracy.
o A covariance metric transformation is used in this case study:
$$\Sigma_{ij} = \mu[X_i X_j] - \mu_i \mu_j$$
where Σ is the output matrix, X is the input feature vector, μ is the mean, and i and j refer to the positions of elements within the vector/matrix.
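The covariance formula can be evaluated directly from a set of feature vectors. The sketch below only computes Σ itself (with means taken over all samples); the function name `covariance_matrix` is an assumption, and how the case study applies Σ as a transformation is not specified on the slide, so that step is not shown.

```python
def covariance_matrix(samples):
    """Sigma[i][j] = mean(X_i * X_j) - mean(X_i) * mean(X_j), taken over all samples."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    return [
        [sum(s[i] * s[j] for s in samples) / n - mean[i] * mean[j] for j in range(dim)]
        for i in range(dim)
    ]

# Two 2-dimensional feature vectors whose second element is twice the first.
samples = [[1.0, 2.0], [3.0, 6.0]]
sigma = covariance_matrix(samples)
print(sigma)
```

The diagonal entries are the per-dimension variances, and the off-diagonal entries show which dimensions vary together.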
Appearance Based Re-Identification (continued)
Fast one-shot/unsupervised re-identification (identity classification):
o Each class represents a single identity, and the aim is to classify the transformed feature vectors into classes.
o Supervised learning, which requires large amounts of training data, may not be suitable in applications where any individual may enter an unconstrained camera view.
o Unsupervised learning may not be suitable because of its very poor accuracy.
As a compromise, one-shot learning methods may be used, where a single sample (per class) is used during training and the model is updated at run-time.
Reducing the impact of misclassification in one-shot/unsupervised learning
Gallery Approach:
A gallery of N feature vectors is maintained for each identity class:
➢ Create a new class for a new person and use the extracted feature as an anchor
➢ Establish a gallery of N samples (initially all identical) for each class
Two main parts: Classification step and Model Update step
Classification
o Compare the new sample (probe) with each target sample in the gallery of each class (i.e. calculate the Euclidean distance)
o If the distance is below a specified threshold, then they match
o If the number of matches is more than a specified numMin, then the new sample is classified as part of that identity class
Model Update
o A random target sample in the class gallery is replaced with the classified probe sample.
o The constraint on N keeps the model stable, and numMin reduces the impact of misclassification
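The gallery approach can be sketched as a small class. This is an illustrative reading of the slide, not the authors' code: the class name `GalleryClassifier`, the parameter values, and the rule of picking the class with the most matches when several qualify are assumptions.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class GalleryClassifier:
    """Keeps a gallery of n feature vectors for each identity class."""

    def __init__(self, n=5, threshold=1.0, num_min=3, seed=0):
        self.n = n
        self.threshold = threshold
        self.num_min = num_min
        self.galleries = []  # one list of n feature vectors per class
        self.rng = random.Random(seed)

    def classify(self, probe):
        best_class, best_matches = None, 0
        for class_id, gallery in enumerate(self.galleries):
            # A target matches when its distance to the probe is below the threshold.
            matches = sum(1 for target in gallery
                          if euclidean(probe, target) < self.threshold)
            # Assumed tie-break: prefer the class with the most matches.
            if matches > self.num_min and matches > best_matches:
                best_class, best_matches = class_id, matches
        if best_class is None:
            # New person: the probe becomes the anchor, copied n times.
            self.galleries.append([list(probe) for _ in range(self.n)])
            return len(self.galleries) - 1
        # Model update: replace one random target with the classified probe.
        self.galleries[best_class][self.rng.randrange(self.n)] = list(probe)
        return best_class

clf = GalleryClassifier(n=5, threshold=1.0, num_min=3)
labels = [clf.classify(p) for p in ([0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.2, 0.0])]
print(labels)
```

Because a single probe replaces only one of the N targets, one misclassified sample cannot by itself pull later probes over the numMin vote.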
Reducing the impact of misclassification in one-shot/unsupervised learning (continued)
Sequential k-Means Approach:
A modified form of k-Means clustering that supports online learning is used to classify feature vectors. Each class/cluster represents one person.
➢ Each class is represented only as a cluster mean (instead of retaining all the data points).
o Use the first sample’s feature vector to initialize a new cluster center ($m_c$ for class $c$)
o Compare a new probe feature vector $X$ to the cluster mean $m_c$ for each existing class
o The probe feature vector $X$ is classified into the class $c$ with the lowest Euclidean distance $\|m_c - X\|$
o Update the selected cluster mean using $m_c = \beta X + (1 - \beta)\, m_c$
A proper value for β can be determined through parameter sweeping for the specific data set.
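The sequential k-means steps can be sketched as follows. The slide does not say when a probe should start a new cluster instead of joining the nearest one, so the `new_class_distance` threshold below is an assumption, as are the class name and the value of β.

```python
import math

class SequentialKMeans:
    """Online clustering that keeps only one mean vector per identity class."""

    def __init__(self, beta=0.5, new_class_distance=1.0):
        self.beta = beta
        # Assumed rule: a probe farther than this from every mean starts a new class.
        self.new_class_distance = new_class_distance
        self.means = []

    def classify(self, x):
        if self.means:
            distances = [math.sqrt(sum((m - v) ** 2 for m, v in zip(mean, x)))
                         for mean in self.means]
            c = min(range(len(distances)), key=distances.__getitem__)
            if distances[c] <= self.new_class_distance:
                # Update rule from the slide: m_c = beta * X + (1 - beta) * m_c
                self.means[c] = [self.beta * v + (1 - self.beta) * m
                                 for m, v in zip(self.means[c], x)]
                return c
        # The first sample of a new person initializes its cluster center.
        self.means.append(list(x))
        return len(self.means) - 1

model = SequentialKMeans(beta=0.5, new_class_distance=1.0)
labels = [model.classify(p) for p in ([0.0, 0.0], [0.4, 0.0], [5.0, 5.0], [5.2, 5.0])]
print(labels, model.means)
```

Compared with the gallery approach, only one vector per class is stored and compared, which lowers memory use and the number of distance computations per probe.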
Spatio-Temporal Based Tracking
➢ Camera calibration matrices are used to
convert the image-space pixel coordinate
for the person to a real-world coordinate
on a map
➢ Position of each person detected in frame
N (the current frame) is classified based
on their proximity to each of the
predictions in frame N-1 (the previous
frame).
➢ Kalman Filters are used to predict the
next position of each track in a way that
takes the kinematics of the person into
account, with robustness against noise
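The prediction and proximity steps above can be sketched with a constant-velocity Kalman filter per map axis. This is a minimal illustration under stated assumptions: the class names, the noise values, the one-frame time step, and the nearest-prediction assignment function are hypothetical, and the camera calibration step is assumed to have already produced map coordinates.

```python
import math

class ConstantVelocityKalman1D:
    """Kalman filter for one map axis with state [position, velocity]."""

    def __init__(self, position, process_noise=0.01, measurement_noise=0.25):
        self.x = [position, 0.0]
        self.p = [[1.0, 0.0], [0.0, 1.0]]
        self.q = process_noise
        self.r = measurement_noise

    def predict(self, dt=1.0):
        pos, vel = self.x
        self.x = [pos + vel * dt, vel]
        (p00, p01), (p10, p11) = self.p
        # P = F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.p = [[p00 + dt * (p10 + p01) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]
        return self.x[0]

    def update(self, measured_position):
        (p00, p01), (p10, p11) = self.p
        innovation = measured_position - self.x[0]
        s = p00 + self.r                 # innovation variance, H = [1, 0]
        k0, k1 = p00 / s, p10 / s        # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        # P = (I - K H) P
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

class Track:
    """One tracked person: independent filters for the x and y map axes."""

    def __init__(self, x, y):
        self.kx = ConstantVelocityKalman1D(x)
        self.ky = ConstantVelocityKalman1D(y)

    def predict(self):
        return (self.kx.predict(), self.ky.predict())

    def update(self, x, y):
        self.kx.update(x)
        self.ky.update(y)

def nearest_prediction(predictions, detection):
    """Index of the predicted position closest to the detected position."""
    return min(range(len(predictions)),
               key=lambda i: math.hypot(predictions[i][0] - detection[0],
                                        predictions[i][1] - detection[1]))

# Track A walks one unit per frame along x; track B stands still at (10, 10).
track_a, track_b = Track(0.0, 0.0), Track(10.0, 10.0)
for step in (1.0, 2.0, 3.0):
    track_a.predict(); track_a.update(step, 0.0)
    track_b.predict(); track_b.update(10.0, 10.0)

predictions = [track_a.predict(), track_b.predict()]
nearest = nearest_prediction(predictions, (4.1, 0.2))
print(predictions, nearest)
```

After three frames the filter has learned most of the walking velocity, so its prediction for track A lands close to the next true position and the new detection is assigned to it.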
Fused Appearance and Spatio-Temporal Approach
[Figure: block diagram of the fused pipeline. Blocks: Input Image; Person Detection (DPM); Feature Extraction (Color & LBP Histograms); PCA & Metric Learning (Covariance Metric); Position Estimation (Image-space Coordinates); Compute Similarity to Classes (Euclidean Distance); Model Fusion (Linear Weighting); Update Class Model (Sequential K-Means); Update Class Model (Kalman Filter). Signal labels: feature vector, position measurement, similarity ranking.]
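Model fusion by linear weighting can be sketched as a weighted sum of the per-class distances from the two models. The function name `fuse_distances`, the weight value, and the assumption that both distances are already normalized to comparable ranges are illustrative choices, not details taken from the slides.

```python
def fuse_distances(appearance, spatial, weight=0.5):
    """Return weight * appearance + (1 - weight) * spatial for every class id.

    Both inputs map class id -> distance (lower means more similar) and are
    assumed to be normalized to comparable ranges beforehand.
    """
    return {c: weight * appearance[c] + (1.0 - weight) * spatial[c] for c in appearance}

# Appearance alone slightly prefers class A, but the position strongly prefers B.
appearance = {"A": 0.2, "B": 0.3}
spatial = {"A": 0.9, "B": 0.1}
fused = fuse_distances(appearance, spatial, weight=0.5)
best = min(fused, key=fused.get)
print(fused, best)
```

In this toy case the fused ranking picks class B, overriding an ambiguous appearance match with strong positional evidence. Like β, the weight would need tuning for the data set.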
Initial Experimental Results - A Case Study
UoA-Indoor Dataset:
➢ Three hours of footage from four overlapping cameras (resolution 1920 x 1080 at 15 frames per second)
➢ 19 different identities annotated across 150,000 frames
Experiments were conducted on two cases:
o Walk (where there is only one person in the room at a time)
o Group sequences (up to four people in the room at the same time, interacting with each other)
Initial Experimental Results - A Case Study (Continued)
While people may not be in the room at the same time, the system remembers the
identities of people it has seen before.
Initial Experimental Results - A Case Study (Continued)
Comparing DPM vs. ACF:

Detector | Classification Model | One-shot learning accuracy % | Unsupervised learning accuracy % | Processing Speed (fps)
DPM      | Appearance only      | 51.9                         | 48.8                             |
DPM      | Spatio-temporal only | 33.3                         | 31.7                             |
DPM      | Fused                | 65.7                         | 61.6                             | 9.8
ACF      | Appearance only      | 53.8                         | 47.1                             |
ACF      | Spatio-temporal only | 34.3                         | 30.6                             |
ACF      | Fused                | 69.4                         | 56.7                             | 22.3
Privacy Issues
Privacy issues may be considered in two parts: protecting personal information, and security vulnerabilities that may affect sensitive information.
The first issue can be addressed through Privacy-by-Design.
➢ Privacy-Aware framework: Based on the target application requirements, parts of captured images may be censored to avoid individual identification (where not necessary).
➢ Privacy-Affirming framework: Only the necessary data is extracted from the input image through computer vision techniques.
Security Vulnerabilities (may affect sensitive information)
Security threats and attacks on
machine learning based
computer vision systems may
significantly compromise the
data integrity and robustness of
object detection and tracking.
Source: [Hanif et al, 2018]
Summary and Conclusions
➢ An appearance-based person re-identification and tracking approach was presented, considering some trade-offs between accuracy and computational complexity.
➢ A spatio-temporal model was discussed to further aid the classification of detected individuals into identity classes, using Kalman Filters to predict the future positions of people.
➢ A fused appearance-based and spatio-temporal approach was presented to improve the accuracy.
➢ The effectiveness of existing approaches is application dependent.
o Machine learning based and traditional image processing techniques can be employed for edge device implementations (depending on the required accuracy and application requirements).
o Some traditional image processing techniques may be more suitable for implementation on edge devices (HOG based techniques are more energy efficient than CNN-based approaches).
o Privacy, security and data integrity are additional challenges for implementation at the edge.
Acknowledgements
▪ Dr. Andrew Tzer-Yeu Chen
▪ Dr. Kevin I-Kai Wang
Resources: Related Publications
- Chen, A. T., Biglari-Abhari, M., & Wang, K. I. K. (2020) Fusing Appearance and Spatio-Temporal Models for Person Re-Identification and Tracking. J. Imaging 2020, 6, 27. https://doi.org/10.3390/jimaging6050027
- Chen, A. T., Biglari-Abhari, M., & Wang, K. I. K. (2019) Investigating fast re-identification for multi-camera indoor person tracking, Elsevier Journal of Computers & Electrical Engineering, Vol. 77, pp. 273 – 288, 2019. https://doi.org/10.1016/j.compeleceng.2019.06.009
- Chen, A. T-Y., Biglari-Abhari, M., Wang, K. (2018) SuperBE: Computationally-Light Background Estimation with Superpixels, Journal of Real-time
Image Processing, January 2018
- Chen, A. T., Gupta, R., Borzenko, A., Wang, K. I. K & Biglari-Abhari, M. (2018). Accelerating SuperBE with Hardware/Software Co-Design, in
Journal of Imaging, 2018, 4(10), 122; doi: 10.3390/jimaging410012
- Chen, A. T., Biglari-Abhari, M., & Wang, K. (2018). Fast One-Shot Learning for Identity Classification in Person Re-identification and Tracking, in
Proceedings of the 15th IEEE International Conference on Control, Automation, Robotics and Vision (ICARCV-2018), Singapore, 18-21 Nov.
2018
- Chen, A. T., Biglari-Abhari, M., & Wang, K. (2018). Context is King: Privacy Perceptions of Camera-based Surveillance, in Proceedings of the
15th IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland - New Zealand, 27-30 November 2018
- Chen, A. T., Biglari-Abhari, M., Wang, K. I. K., Bouzerdoum, A., & Tivive, F. H. -C. (2018). Convolutional Neural Network Acceleration with
Hardware/Software Co-Design. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-
Solving Technologies. 48 (5), 1288-1301, doi:10.1007/s10489-017-1007-z
- Chen, A. T-Y., Biglari-Abhari, M., Wang, K. I-K., (2017) Trusting the Computer in Computer Vision: A Privacy-Affirming Framework, Proceedings of
The First International Workshop on The Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security (CV-
COPS 2017), Honolulu, Hawaii — July 21, 2017
- Chen, A. T-Y., Fan, J., Biglari-Abhari, M., Wang, K. I-K., (2017) A Computationally Efficient Pipeline for Camera-based Indoor Person Tracking,
Proceedings of Image and Vision Computing New Zealand (IVCNZ 2017), Christchurch, New Zealand — 4 – 6 Dec. 2017
Other Related Works (Our Research Team)
Other Related Embedded Computer Vision Systems publications:
- Hemmati, M., Biglari-Abhari, M., & Niar, S. (2019) Adaptive Vehicle Detection for Real-time Autonomous Driving System, in Proceedings of the 2019 IEEE Conference on Design, Automation & Test in Europe (DATE), Florence, Italy, 25-28 March 2019, pp. 1034-1039, doi:10.23919/DATE.2019.8714818
- Porter, R., Morgan, S., Biglari-Abhari, M. (2019) Extending a Soft-Core RISC-V Processor to Accelerate CNN Inference, to appear in Proceedings of the Sixth Annual Conference on Computational Science & Computational Intelligence, Las Vegas, Nevada, 5-7 December 2019
- Hemmati, M., Biglari-Abhari, M., Niar, S., Berber, S., (2017) Real-Time Multi-Scale Pedestrian Detection for Driver Assistance Systems, ACM/IEEE Proceedings of the 54th Design Automation Conference (DAC), Austin, TX, 18-22 June 2017
- Hemmati, M., Biglari-Abhari, M., Berber, S., & Niar, S. (2014) HOG Feature Extractor Hardware Accelerator for Real-time Pedestrian Detection. Proceedings of 17th Euromicro Conference on Digital System Design (DSD), Verona, Italy, 27 August - 29 August 2014, pp. 543-550
References
[Benfold & Reid, 2011] B. Benfold and I. Reid, “Stable multi-target tracking in real-time surveillance video,” in Conference on Computer Vision and
Pattern Recognition (CVPR), 2011, pp. 3457–3464.
[Cho et al, 2012] H. Cho, P. E. Rybski, A. Bar-Hillel, and W. Zhang, “Real-time pedestrian detection with deformable part models,” in Intelligent
Vehicles Symposium (IVS), 2012, pp. 1035–1042.
[Dalal & Triggs, 2005] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Conference on Computer Vision and Pattern
Recognition (CVPR), vol. 1, 2005, pp. 886–893.
[De Smedt & Goedeme, 2015] F. De Smedt and T. Goedemé, “Open framework for combined pedestrian detection,” in International Conference on Computer
Vision Theory and Applications (VISIGRAPP), 2015, pp. 551–558.
[Dollar et al, 2014] P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. 36, no. 8, pp. 1532–1545, 2014.
[Everingham & Zisserman, 2006] M. Everingham and A. Zisserman, “Automated person identification in video,” in International Conference on Image
and Video Retrieval (CIVR), 2006, pp. 289–298.
[Girshick, 2015] R. Girshick, “Fast R-CNN,” in International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
[Gou et al, 2017] M. Gou, S. Karanam, W. Liu, O. Camps, and R. J. Radke, “DukeMTMC4ReID: A large-scale multi-camera person re-identification
dataset,” in Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 10–19.
[Hanif et al, 2018] M. A. Hanif, F. Khalid, R. V. W. Putra, S. Rehman and M. Shafique, "Robust Machine Learning Systems: Reliability and Security for
Deep Neural Networks," 2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), pp. 257-260
[Krumm et al, 2000] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale, and S. Shafer, “Multi-camera multi-person tracking for EasyLiving,” in
International Workshop on Visual Surveillance, 2000, pp. 3–10.
[Ren et al, 2017] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137– 1149, 2017.
[Yan et al, 2014] J. Yan, Z. Lei, L. Wen, and S. Z. Li, “The fastest deformable part model for object detection,” in Conference on Computer Vision and
Pattern Recognition (CVPR), 2014, pp. 2497–2504.

More Related Content

PDF
Deep learning for person re-identification
PPTX
Detection and recognition of face using neural network
PPTX
Image segmentation
PPT
Image processing
PPTX
Image classification with Deep Neural Networks
DOCX
Tweening and morphing
PPT
Segmentation
PPT
Presentation on deformable model for medical image segmentation
Deep learning for person re-identification
Detection and recognition of face using neural network
Image segmentation
Image processing
Image classification with Deep Neural Networks
Tweening and morphing
Segmentation
Presentation on deformable model for medical image segmentation

What's hot (20)

PPTX
IMAGE SEGMENTATION TECHNIQUES
PDF
Deep learning for medical imaging
PDF
digital image processing, image processing
PDF
Jpeg2000
PPTX
Chapter 9 morphological image processing
PPTX
Image Sensing and Acquisition.pptx
PPT
Chapter10 image segmentation
PPSX
Edge Detection and Segmentation
PPTX
Histogram Processing
PDF
Image Segmentation
PPTX
Image Representation & Descriptors
PPT
introduction to Digital Image Processing
PPTX
COM2304: Introduction to Computer Vision & Image Processing
PPTX
Object detection
PPTX
Object Detection using Deep Neural Networks
PPTX
Image Classification using deep learning
PPTX
Texture,pattern and pattern classes
PPTX
Region based segmentation
PPTX
Digital image processing
PPTX
Canny Edge Detection
IMAGE SEGMENTATION TECHNIQUES
Deep learning for medical imaging
digital image processing, image processing
Jpeg2000
Chapter 9 morphological image processing
Image Sensing and Acquisition.pptx
Chapter10 image segmentation
Edge Detection and Segmentation
Histogram Processing
Image Segmentation
Image Representation & Descriptors
introduction to Digital Image Processing
COM2304: Introduction to Computer Vision & Image Processing
Object detection
Object Detection using Deep Neural Networks
Image Classification using deep learning
Texture,pattern and pattern classes
Region based segmentation
Digital image processing
Canny Edge Detection
Ad

Similar to “Person Re-Identification and Tracking at the Edge: Challenges and Techniques,” a Presentation from the University of Auckland (20)

PDF
30 ijcse-01238-8 thangaponu
PPTX
Paper Introduction "Density-aware person detection and tracking in crowds"
PDF
Analysis of Different Techniques for Person Re-Identification: An Assessment
PPT
Face identification
PDF
Online Multi-Person Tracking Using Variance Magnitude of Image colors and Sol...
PDF
Real time pedestrian detection, tracking, and distance estimation
PPT
photo detection in personal photo collection
PDF
Report
PDF
Gj3511231126
PDF
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
PDF
Crowd Density Estimation Using Base Line Filtering
PPT
Gesture recog parag
PDF
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
PPTX
Object Detection and Tracking using Statistical and Stochastic Techniques
PDF
IRJET- Different Techniques for Mob Density Evaluation
PDF
C6524029320
PPT
Exploiting Dissimilarity Representations for Person Re-Identification
PPTX
ObjRecog2-17 (1).pptx
PPTX
Final_ppt1
PPT
Person re-identification, PhD Day 2011
30 ijcse-01238-8 thangaponu
Paper Introduction "Density-aware person detection and tracking in crowds"
Analysis of Different Techniques for Person Re-Identification: An Assessment
Face identification
Online Multi-Person Tracking Using Variance Magnitude of Image colors and Sol...
Real time pedestrian detection, tracking, and distance estimation
photo detection in personal photo collection
Report
Gj3511231126
M.Sc. Thesis - Automatic People Counting in Crowded Scenes
Crowd Density Estimation Using Base Line Filtering
Gesture recog parag
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
Object Detection and Tracking using Statistical and Stochastic Techniques
IRJET- Different Techniques for Mob Density Evaluation
C6524029320
Exploiting Dissimilarity Representations for Person Re-Identification
ObjRecog2-17 (1).pptx
Final_ppt1
Person re-identification, PhD Day 2011
Ad

More from Edge AI and Vision Alliance (20)

PDF
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PDF
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
PDF
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
PDF
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
PDF
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
PDF
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
PDF
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
PDF
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
PDF
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
PDF
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
PDF
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
PDF
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
PDF
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...

Recently uploaded (20)

PPT
Teaching material agriculture food technology
PDF
KodekX | Application Modernization Development
PPTX
Big Data Technologies - Introduction.pptx
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Encapsulation theory and applications.pdf
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
sap open course for s4hana steps from ECC to s4
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PPTX
Spectroscopy.pptx food analysis technology
PDF
Electronic commerce courselecture one. Pdf
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Approach and Philosophy of On baking technology
PDF
Machine learning based COVID-19 study performance prediction
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
Teaching material agriculture food technology
KodekX | Application Modernization Development
Big Data Technologies - Introduction.pptx
MIND Revenue Release Quarter 2 2025 Press Release
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Encapsulation theory and applications.pdf
Reach Out and Touch Someone: Haptics and Empathic Computing
sap open course for s4hana steps from ECC to s4
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Digital-Transformation-Roadmap-for-Companies.pptx
Understanding_Digital_Forensics_Presentation.pptx
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Spectroscopy.pptx food analysis technology
Electronic commerce courselecture one. Pdf
Mobile App Security Testing_ A Comprehensive Guide.pdf
Encapsulation_ Review paper, used for researhc scholars
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Approach and Philosophy of On baking technology
Machine learning based COVID-19 study performance prediction
The Rise and Fall of 3GPP – Time for a Sabbatical?

“Person Re-Identification and Tracking at the Edge: Challenges and Techniques,” a Presentation from the University of Auckland

  • 1. © 2021 University of Auckland Person Re-Identification and Tracking at the Edge: Challenges and Techniques Morteza Biglari-Abhari Department of Electrical, Computer and Software Engineering University of Auckland, New Zealand
  • 2. © 2021 University of Auckland Outline 2 ➢ Why Person Re-Identification and Tracking ➢ Key Challenges and Current Approaches ➢ Appearance Based One-shot / Unsupervised Re-Identification ➢ Spatio-Temporal Based Tracking ➢ Fused Appearance and Spatio-Temporal Approach ➢ Privacy Issues ➢ Summary and Conclusions
  • 3. © 2021 University of Auckland Why Person Re-Identification and Tracking 3 The aim is matching images of people as viewed through multiple cameras in different positions and locations and determine a unique identity. Possible target applications: o Surveillance for Security and Public Safety o Healthcare and Industrial Facilities o Commercial Entities (such as supermarkets) to monitor customer behavior o Intelligent Transportation System o Smart Cities
  • 4. © 2021 University of Auckland Key Challenges 4 Challenges: variations in the appearance of a person (even in the same camera view) (variations in pose, lighting, color, resolution, motion blur, obstacles, occlusions)
  • 5. © 2021 University of Auckland Current Approaches 5 Person Detection feature extraction or feature learning classification Person Re-Identification ➢Person detection methods should be robust to detect people in different conditions. ➢A person model needs to be robust against various conditions: varying lighting conditions, partially obscured views, different camera view angles Appearance based / Spatio-temporal based
  • 6. © 2021 University of Auckland Person Detection Techniques 6 o Early person detection works relied on using blob detection [Krumm et al, 2000 – Everingham & Zisserman, 2006] o Low computational complexity but low accuracy o Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) algorithm [Krumm et al, 2000 – Dalal & Trigs, 2005] o Deformable Parts Module (DPM) – uses HOG features but includes structural relationship between parts of the person [Cho et al, 2012 – Yan et al, 2014] o Calculating HOG features is very computationally expensive o Using background estimation can improve the accuracy o Aggregate Channel Features (ACF) – improves detection speed through isolating features that have the largest contribution towards accurate person detection (focusing on gradient magnitude, HOG, and the LUV color channel). It may be slightly less accurate but faster than DPM. [Dollar et al, 2014 – De Smedt & Goedeme, 2015]
  • 7. © 2021 University of Auckland DPM versus ACF 7 An example of person parts being extracted using DPM An example of ACF pedestrian detection [Benfold & Reid, 2011]
  • 8. © 2021 University of Auckland CNN-based Techniques for Person Detection 8 CNN-based Techniques: o R-CNN: regions of interest (ROI) are extracted that potentially have targets for further analysis and fed to CNN for feature extraction and classification. To improve the processing speed, Fast R-CNN and Faster R-CNN have been proposed. [Girshick, 2015 – Ren et al, 2017] o While this was faster than other CNN-based techniques, it could process 5 fps using a high-end GPU. o Single Shot Detector (SSD): the image is only parsed once rather than processing multiple potentially overlapping windows. Achieving similar level of accuracy to Faster R-CNN but takes less processing time. YOLO is in this category. o Pre-trained ResNet-50 and MobileNet-V2 have been used for person detection and re-identification (for specific datasets such as CUHK03 and DukeMTMC), but they are computationally very intensive for real-time detection on edge devices.
  • 9. Feature Vectors (Person Detection)
    The aim is to select features that allow for high inter-class variation (significantly different between multiple people) while maintaining low intra-class variation (similar for the same person).
    ➢ Features may include color (RGB, HSV, YCbCr), texture, and structure.
    ➢ Descriptors that include both color and texture perform better than either one alone. [Gou et al, 2017]
    [Figure: A visual representation of an example feature vector (made up of HSV color and LBP texture histograms) representing an entire person.]
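To make the HSV-plus-LBP descriptor on slide 9 concrete, the following is a minimal sketch using only the Python standard library. The function names, the bin count of 8, the grey-level conversion by channel averaging, and the basic 3x3 LBP variant are all assumptions for illustration; the slides do not specify these details.

```python
import colorsys

# Offsets of the 8 neighbours, clockwise from the top-left pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def hsv_histogram(rgb_pixels, bins=8):
    """Concatenated H, S and V histograms for a list of (r, g, b) pixels in 0..255.

    Each of the three channel histograms is normalised to sum to 1.
    """
    hist = [0.0] * (3 * bins)
    for r, g, b in rgb_pixels:
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for channel, value in enumerate(hsv):
            index = min(int(value * bins), bins - 1)  # value == 1.0 goes in the last bin
            hist[channel * bins + index] += 1.0
    total = float(len(rgb_pixels)) or 1.0
    return [count / total for count in hist]


def lbp_histogram(gray):
    """Normalised 256-bin histogram of basic 3x3 LBP codes for a 2-D list of grey levels."""
    hist = [0.0] * 256
    count = 0
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            code = 0
            for bit, (dy, dx) in enumerate(NEIGHBOURS):
                # Set the bit when the neighbour is at least as bright as the centre.
                if gray[y + dy][x + dx] >= gray[y][x]:
                    code |= 1 << bit
            hist[code] += 1.0
            count += 1
    return [h / count for h in hist] if count else hist


def feature_vector(rgb_image, bins=8):
    """Colour + texture descriptor for one image patch (2-D list of (r, g, b) tuples)."""
    pixels = [p for row in rgb_image for p in row]
    gray = [[(r + g + b) / 3.0 for r, g, b in row] for row in rgb_image]  # assumed conversion
    return hsv_histogram(pixels, bins) + lbp_histogram(gray)
```

With `bins=8` the descriptor has 24 color entries followed by 256 texture entries; in practice the patch would be the detected person or one of its parts.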
  • 10. Appearance Based Re-Identification
    [Pipeline diagram: Input Image, Person Detection (DPM), Feature Extraction (Color & LBP Histograms), PCA & Metric Learning (Covariance Metric), Compute Similarity to Classes (Euclidean Distance), If match: Update Class Model (Sequential K-Means), Class / Identity Label. Stages are grouped into feature vector extraction and identity classification.]
    Fast one-shot/unsupervised re-identification (feature vector extraction):
    o A combination of HSV for color and LBP (Local Binary Pattern) for texture is used to represent patches or parts of the detected people.
    o Principal Components Analysis (PCA) can be used as an unsupervised method of determining the most important dimensions of the feature vectors in terms of variation.
    o Metric learning is a useful pre-processing step that transforms the vectors so that they are more linearly separable into identity classes.
  • 11. Appearance Based Re-Identification (continued)
    [Pipeline diagram as on slide 10]
    o Metric Learning reduces the computational complexity and improves the accuracy.
    o A Covariance Metric transformation is used in this case study: $\Sigma_{ij} = \mu[X_i X_j] - \mu_i \mu_j$, where Σ is the output matrix, X is the input feature vector, μ is the mean, and i and j refer to the positions of elements within the vector/matrix.
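The covariance formula on slide 11 can be evaluated directly from a set of feature vectors. The sketch below only computes the matrix Σ element by element; how Σ is then applied as a metric transformation is not spelled out on the slide, so that step is left out. The function name and the list-of-lists data layout are assumptions.

```python
def covariance_matrix(samples):
    """Return Sigma with Sigma[i][j] = mean(X_i * X_j) - mean(X_i) * mean(X_j).

    `samples` is a list of equally long feature vectors (lists of floats).
    """
    n = len(samples)
    d = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    sigma = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            mean_product = sum(s[i] * s[j] for s in samples) / n
            sigma[i][j] = mean_product - mean[i] * mean[j]
    return sigma


# Two 2-D samples whose second element is always twice the first.
print(covariance_matrix([[1.0, 2.0], [3.0, 6.0]]))  # [[1.0, 2.0], [2.0, 4.0]]
```

This is the population form of the covariance (division by the number of samples), which is what the formula as written describes.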
  • 12. Appearance Based Re-Identification (continued)
    [Pipeline diagram as on slide 10]
    Fast one-shot/unsupervised re-identification (identity classification):
    o Each class represents a single identity, and the aim is to classify the transformed feature vectors into classes.
    o Supervised learning, which requires a large amount of training data, may not be suitable in applications where any individual may enter an unconstrained camera view.
    o Unsupervised learning may not be suitable due to very poor accuracy.
    As a compromise, one-shot learning methods may be used, where a single sample (per class) is used during training and the model is updated at run-time.
  • 13. Reducing the impact of misclassification in one-shot/unsupervised learning
    Gallery Approach: A gallery of N feature vectors is maintained for each identity class:
    ➢ Create a new class for a new person and use the extracted feature as an anchor
    ➢ Establish a gallery of N samples (initially all identical) for each class
    Two main parts: Classification step and Model Update step
    Classification:
    o Compare the new sample (probe) with each target sample in the gallery of each class (i.e. calculate the Euclidean distance)
    o If the distance is below a specified threshold, then they match
    o If the number of matches is more than a specified numMin, then the new sample is classified as part of that identity class
    Model Update:
    o A random target sample in the class gallery is replaced with the classified probe sample.
    o The constraint on N keeps the model stable, and numMin reduces the impact of misclassification
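The gallery approach on slide 13 can be sketched as follows. This is an illustrative outline, not the authors' implementation: the class and method names are hypothetical, and the tie-break between several qualifying classes (pick the one with the most matches) is an assumption, since the slide does not say how ties are resolved.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class GalleryClassifier:
    """One gallery of N feature vectors per identity class."""

    def __init__(self, gallery_size, threshold, num_min, rng=None):
        self.gallery_size = gallery_size  # N on the slide
        self.threshold = threshold        # distance below which two samples match
        self.num_min = num_min            # numMin on the slide
        self.rng = rng or random.Random()
        self.galleries = []

    def classify(self, probe):
        """Return the index of the matching class, or None if no class qualifies."""
        best_class, best_matches = None, 0
        for class_id, gallery in enumerate(self.galleries):
            matches = sum(1 for target in gallery
                          if euclidean(probe, target) < self.threshold)
            # Assumed tie-break: keep the class with the most matching samples.
            if matches > self.num_min and matches > best_matches:
                best_class, best_matches = class_id, matches
        return best_class

    def observe(self, probe):
        """Classify the probe, then update the model; returns the class index."""
        class_id = self.classify(probe)
        if class_id is None:
            # New person: the probe is the anchor, copied N times.
            self.galleries.append([list(probe) for _ in range(self.gallery_size)])
            return len(self.galleries) - 1
        # Model update: replace one randomly chosen target sample with the probe.
        slot = self.rng.randrange(self.gallery_size)
        self.galleries[class_id][slot] = list(probe)
        return class_id
```

Because the gallery size stays fixed at N, memory per identity is bounded, and a single misclassified probe can overwrite at most one of the N samples.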
  • 14. Reducing the impact of misclassification in one-shot/unsupervised learning (continued)
    Sequential k-Means Approach: A modified form of k-Means clustering that supports online learning is used to classify feature vectors. Each class/cluster represents one person.
    ➢ Each class is represented only as a cluster mean (instead of retaining all the data points).
    o Use the first sample's feature vector to initialize a new cluster center ($m_c$ for class c)
    o Compare a new probe feature vector X to the cluster mean $m_c$ for each existing class
    o The probe feature vector X is classified into the class c with the lowest Euclidean distance $\lVert m_c - X \rVert$
    o Update the selected cluster mean using $m_c = \beta X + (1 - \beta)\, m_c$
    A proper value for β can be determined through parameter sweeping for the specific data set.
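The sequential k-means steps on slide 14 can be sketched as below. The update rule follows the slide; the `new_class_threshold` parameter, which decides when a probe starts a new cluster instead of joining the nearest one, is an assumption added so that the example is complete, and the class name is hypothetical.

```python
import math


class SequentialKMeans:
    """Online clustering that keeps only one mean vector per identity class."""

    def __init__(self, beta, new_class_threshold):
        self.beta = beta                                # learning rate in the update rule
        self.new_class_threshold = new_class_threshold  # assumed criterion for new classes
        self.means = []

    def observe(self, x):
        """Assign feature vector x to a class, update that class, return its index."""
        if self.means:
            distances = [math.dist(m, x) for m in self.means]
            c = min(range(len(distances)), key=distances.__getitem__)
            if distances[c] <= self.new_class_threshold:
                # m_c = beta * X + (1 - beta) * m_c
                self.means[c] = [self.beta * xi + (1 - self.beta) * mi
                                 for xi, mi in zip(x, self.means[c])]
                return c
        # The first sample of a new person initialises a new cluster centre.
        self.means.append(list(x))
        return len(self.means) - 1
```

A larger β makes the class mean follow recent appearances more quickly, while a smaller β makes it more resistant to an occasional misclassified probe.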
  • 15. Spatio-Temporal Based Tracking
    ➢ Camera calibration matrices are used to convert the image-space pixel coordinates of a person to real-world coordinates on a map
    ➢ Each person detected in frame N (the current frame) is classified based on the proximity of their position to each of the predictions made from frame N-1 (the previous frame).
    ➢ Kalman Filters are used to predict the next position of each track in a way that takes the kinematics of the person into account, with robustness against noise
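The two ingredients on slide 15 can be sketched in plain Python. Several assumptions are made here: the calibration is modelled as a single 3x3 planar homography, the motion model is constant velocity applied independently per map axis, and the noise values `q` and `r` and the initial covariance are arbitrary. All names are hypothetical.

```python
def pixel_to_map(h, u, v):
    """Map pixel (u, v) to map coordinates with a 3x3 homography given as a list of rows."""
    x = h[0][0] * u + h[0][1] * v + h[0][2]
    y = h[1][0] * u + h[1][1] * v + h[1][2]
    w = h[2][0] * u + h[2][1] * v + h[2][2]
    return x / w, y / w


class AxisKalman:
    """Constant-velocity Kalman filter for one axis; state is (position, velocity)."""

    def __init__(self, position, q=0.01, r=0.1):
        self.p, self.v = position, 0.0
        self.c = [[10.0, 0.0], [0.0, 10.0]]  # state covariance (assumed initial value)
        self.q, self.r = q, r                # process and measurement noise (assumed)

    def predict(self, dt=1.0):
        """Advance the state by dt and return the predicted position."""
        (c00, c01), (c10, c11) = self.c
        self.p += self.v * dt
        # C = F C F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.c = [[c00 + dt * (c01 + c10) + dt * dt * c11 + self.q, c01 + dt * c11],
                  [c10 + dt * c11, c11 + self.q]]
        return self.p

    def update(self, z):
        """Correct the state with a measured position z."""
        (c00, c01), (c10, c11) = self.c
        s = c00 + self.r              # innovation covariance
        k0, k1 = c00 / s, c10 / s     # Kalman gain for position and velocity
        residual = z - self.p
        self.p += k0 * residual
        self.v += k1 * residual
        self.c = [[(1 - k0) * c00, (1 - k0) * c01],
                  [c10 - k1 * c00, c11 - k1 * c01]]


class Track:
    """One tracked person on the map: an independent filter per axis."""

    def __init__(self, x, y):
        self.kx, self.ky = AxisKalman(x), AxisKalman(y)

    def predict(self, dt=1.0):
        return self.kx.predict(dt), self.ky.predict(dt)

    def update(self, x, y):
        self.kx.update(x)
        self.ky.update(y)
```

In use, each frame would call `predict()` on every track, assign each new detection (after `pixel_to_map`) to the nearest prediction, and then call `update()` on the matched track.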
  • 16. Fused Appearance and Spatio-Temporal Approach
    [Pipeline diagram. Blocks: Input Image, Person Detection (DPM), Feature Extraction (Color & LBP Histograms), PCA & Metric Learning (Covariance Metric), Position Estimation (Image-space Coordinates), Compute Similarity to Classes (Euclidean Distance), Model Fusion (Linear Weighting), Update Class Model (Sequential K-Means), Update Class Model (Kalman Filter). Connection labels: feature vector, position measurement, similarity ranking.]
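The "Model Fusion (Linear Weighting)" block on slide 16 is not detailed in the slides. One plausible reading, sketched here under stated assumptions, is a weighted sum of the appearance distance and the spatio-temporal distance for each candidate class, followed by ranking. The normalisation by the largest distance, the `weight` parameter, and the function name are all hypothetical.

```python
def fuse_and_rank(appearance_dist, spatial_dist, weight):
    """Rank class ids from most to least similar.

    Both arguments map class id -> distance (smaller is more similar).
    `weight` in [0, 1] is the share given to the appearance model.
    """
    def normalise(distances):
        # Scale to [0, 1] so the two models are comparable (assumed scheme).
        top = max(distances.values()) or 1.0
        return {key: value / top for key, value in distances.items()}

    a = normalise(appearance_dist)
    s = normalise(spatial_dist)
    fused = {c: weight * a[c] + (1 - weight) * s[c] for c in a}
    return sorted(fused, key=fused.get)


appearance = {"A": 0.2, "B": 0.4}   # appearance says the probe looks more like A
spatial = {"A": 3.0, "B": 1.0}      # position says the probe is closer to B
print(fuse_and_rank(appearance, spatial, weight=0.5))  # ['B', 'A']
print(fuse_and_rank(appearance, spatial, weight=1.0))  # ['A', 'B']
```

The weight would be tuned per dataset, in the same spirit as the parameter sweep suggested for β on slide 14.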
  • 17. Initial Experimental Results - A Case Study
    UoA-Indoor Dataset:
    ➢ Three hours of footage from four overlapping cameras (resolution 1920 x 1080 at 15 frames per second)
    ➢ 19 different identities annotated across 150,000 frames
    Experiments were conducted on two cases:
    o Walk sequences (only one person in the room at a time)
    o Group sequences (up to four people in the room at the same time, interacting with each other)
  • 18. Initial Experimental Results - A Case Study (Continued)
    While people may not be in the room at the same time, the system remembers the identities of people it has seen before.
  • 19. Initial Experimental Results - A Case Study (Continued)
    Comparing DPM vs. ACF:
    Detector | Classification model | One-shot learning accuracy (%) | Unsupervised learning accuracy (%) | Processing speed (fps)
    DPM | Appearance only | 51.9 | 48.8 |
    DPM | Spatio-temporal only | 33.3 | 31.7 |
    DPM | Fused | 65.7 | 61.6 | 9.8
    ACF | Appearance only | 53.8 | 47.1 |
    ACF | Spatio-temporal only | 34.3 | 30.6 |
    ACF | Fused | 69.4 | 56.7 | 22.3
  • 20. Privacy Issues
    Privacy issues cover two concerns: protecting personal information, and security vulnerabilities that may affect sensitive information. The first can be addressed through Privacy-by-Design.
    ➢ Privacy-Aware framework: Based on the target application requirements, parts of captured images may be censored to avoid individual identification (where not necessary).
    ➢ Privacy-Affirming framework: Only the necessary data is extracted from the input image through computer vision techniques.
  • 21. Security Vulnerabilities (may affect sensitive information)
    Security threats and attacks on machine learning based computer vision systems may significantly compromise the data integrity and robustness of object detection and tracking.
    [Figure. Source: [Hanif et al, 2018]]
  • 22. Summary and Conclusions
    ➢ An appearance-based person re-identification and tracking approach was presented, considering trade-offs between accuracy and computational complexity.
    ➢ A spatio-temporal model was discussed to further aid the classification of detected individuals into identity classes, using Kalman Filters to predict the future positions of people.
    ➢ A fused appearance-based and spatio-temporal approach was presented to improve the accuracy.
    ➢ The effectiveness of existing approaches is application dependent.
    o Machine learning based and traditional image processing techniques can be employed for edge device implementations (depending on the required accuracy and application requirements).
    o Some traditional image processing techniques may be more suitable for implementation on edge devices (HOG-based techniques are more energy efficient than CNN-based approaches).
    o Privacy, security and data integrity are additional challenges for implementation at the edge.
  • 23. Acknowledgements
    ▪ Dr. Andrew Tzer-Yeu Chen
    ▪ Dr. Kevin I-Kai Wang
  • 24. Resources: Related Publications
    - Chen, A. T., Biglari-Abhari, M., & Wang, K. I. K. (2020) Fusing Appearance and Spatio-Temporal Models for Person Re-Identification and Tracking. J. Imaging 2020, 6, 27. https://doi.org/10.3390/jimaging6050027
    - Chen, A. T., Biglari-Abhari, M., & Wang, K. I. K. (2019) Investigating fast re-identification for multi-camera indoor person tracking, Elsevier Journal of Computers & Electrical Engineering, Vol. 77, pp. 273–288, 2019. https://doi.org/10.1016/j.compeleceng.2019.06.009
    - Chen, A. T-Y., Biglari-Abhari, M., Wang, K. (2018) SuperBE: Computationally-Light Background Estimation with Superpixels, Journal of Real-time Image Processing, January 2018
    - Chen, A. T., Gupta, R., Borzenko, A., Wang, K. I. K & Biglari-Abhari, M. (2018). Accelerating SuperBE with Hardware/Software Co-Design, in Journal of Imaging, 2018, 4(10), 122; doi: 10.3390/jimaging410012
    - Chen, A. T., Biglari-Abhari, M., & Wang, K. (2018). Fast One-Shot Learning for Identity Classification in Person Re-identification and Tracking, in Proceedings of the 15th IEEE International Conference on Control, Automation, Robotics and Vision (ICARCV-2018), Singapore, 18-21 Nov. 2018
    - Chen, A. T., Biglari-Abhari, M., & Wang, K. (2018). Context is King: Privacy Perceptions of Camera-based Surveillance, in Proceedings of the 15th IEEE International Conference on Advanced Video and Signal-based Surveillance, Auckland, New Zealand, 27-30 November 2018
    - Chen, A. T., Biglari-Abhari, M., Wang, K. I. K., Bouzerdoum, A., & Tivive, F. H. -C. (2018). Convolutional Neural Network Acceleration with Hardware/Software Co-Design. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 48 (5), 1288-1301, doi:10.1007/s10489-017-1007-z
    - Chen, A. T-Y., Biglari-Abhari, M., Wang, K. I-K., (2017) Trusting the Computer in Computer Vision: A Privacy-Affirming Framework, Proceedings of The First International Workshop on The Bright and Dark Sides of Computer Vision: Challenges and Opportunities for Privacy and Security (CV-COPS 2017), Honolulu, Hawaii, July 21, 2017
    - Chen, A. T-Y., Fan, J., Biglari-Abhari, M., Wang, K. I-K., (2017) A Computationally Efficient Pipeline for Camera-based Indoor Person Tracking, Proceedings of Image and Vision Computing New Zealand (IVCNZ 2017), Christchurch, New Zealand, 4-6 Dec. 2017
  • 25. Other Related Works (Our Research Team)
    Other Related Embedded Computer Vision Systems publications:
    - Hemmati, M., Biglari-Abhari, M., & Niar, S. (2019) Adaptive Vehicle Detection for Real-time Autonomous Driving System, in Proceedings of the 2019 IEEE Conference on Design, Automation & Test in Europe (DATE), Florence, Italy, 25-28 March 2019, pp. 1034-1039, doi:10.23919/DATE.2019.8714818
    - Porter, R., Morgan, S., Biglari-Abhari, M. (2019) Extending a Soft-Core RISC-V Processor to Accelerate CNN Inference, to appear in Proceedings of the Sixth Annual Conference on Computational Science & Computational Intelligence, Las Vegas, Nevada, 5-7 December 2019
    - Hemmati, M., Biglari-Abhari, M., Niar, S., Berber, S., (2017) Real-Time Multi-Scale Pedestrian Detection for Driver Assistance Systems, ACM/IEEE Proceedings of the 54th Design Automation Conference (DAC), Austin, TX, 18-22 June 2017
    - Hemmati, M., Biglari-Abhari, M., Berber, S., & Niar, S. (2014) HOG Feature Extractor Hardware Accelerator for Real-time Pedestrian Detection. Proceedings of 17th Euromicro Conference on Digital System Design (DSD), Verona, Italy, 27-29 August 2014, pp. 543-550
  • 26. References
    [Benfold & Reid, 2011] B. Benfold and I. Reid, “Stable multi-target tracking in real-time surveillance video,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 3457–3464.
    [Cho et al, 2012] H. Cho, P. E. Rybski, A. Bar-Hillel, and W. Zhang, “Real-time pedestrian detection with deformable part models,” in Intelligent Vehicles Symposium (IVS), 2012, pp. 1035–1042.
    [Dalal & Triggs, 2005] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, 2005, pp. 886–893.
    [De Smedt & Goedeme, 2015] F. De Smedt and T. Goedemé, “Open framework for combined pedestrian detection,” in International Conference on Computer Vision Theory and Applications (VISIGRAPP), 2015, pp. 551–558.
    [Dollar et al, 2014] P. Dollár, R. Appel, S. Belongie, and P. Perona, “Fast feature pyramids for object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 8, pp. 1532–1545, 2014.
    [Everingham & Zisserman, 2006] M. Everingham and A. Zisserman, “Automated person identification in video,” in International Conference on Image and Video Retrieval (CIVR), 2006, pp. 289–298.
    [Girshick, 2015] R. Girshick, “Fast R-CNN,” in International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
    [Gou et al, 2017] M. Gou, S. Karanam, W. Liu, O. Camps, and R. J. Radke, “DukeMTMC4ReID: A large-scale multi-camera person re-identification dataset,” in Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 10–19.
    [Hanif et al, 2018] M. A. Hanif, F. Khalid, R. V. W. Putra, S. Rehman, and M. Shafique, “Robust Machine Learning Systems: Reliability and Security for Deep Neural Networks,” in IEEE 24th International Symposium on On-Line Testing and Robust System Design (IOLTS), 2018, pp. 257–260.
    [Krumm et al, 2000] J. Krumm, S. Harris, B. Meyers, B. Brumitt, M. Hale, and S. Shafer, “Multi-camera multi-person tracking for EasyLiving,” in International Workshop on Visual Surveillance, 2000, pp. 3–10.
    [Ren et al, 2017] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
    [Yan et al, 2014] J. Yan, Z. Lei, L. Wen, and S. Z. Li, “The fastest deformable part model for object detection,” in Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2497–2504.