Copyright © 2014 videantis 1
Implementing Histogram of Oriented Gradients on a Parallel Vision Processor
Marco Jacobs
May 29, 2014

The Challenge: Make Our Phones, Cars, Etc. Smarter Than Us
• 50% of the brain is used for vision
• The body uses 100W
• The brain consumes 20W
  • → about 10W for vision analysis
• Challenge: beat the human
• Seems really hard, but we can focus on specific areas:
  • Build machines that are faster, safer, cheaper, longer-lasting, more accurate, etc.

Object Detection: Key Vision Algorithm
• Object detection/recognition: “That’s a person”, “That’s a car”
• Dalal & Triggs, “Histograms of Oriented Gradients for Human Detection”, INRIA (France), 2005
• Seminal paper: “100x accuracy increase in object detection”
Computer Vision: Algorithms and Applications, Richard Szeliski

Object Detection: Sample Applications
• Automotive: pedestrian detection
• Automotive: vehicle detection
• Surveillance: perimeter detection
• Research: animal detection
• Industry: object inspection
• Search: image categorization

Step 1: Training the Classifier, Offline
• Normalized, fixed resolution images with yes/no annotations (pedestrian / no pedestrian)
• Extract feature vector (HOG)
• Learn binary classifier (SVM)
• Object yes / no
• → resample false positives → retrain the SVM classifier
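
The resample-and-retrain loop on this slide can be sketched as follows. This is a minimal illustration, not the presenter's training code: a perceptron stands in for the SVM (an assumption made to keep the example dependency-free), the feature vectors are toy 2-D points rather than HOG descriptors, and all function names are hypothetical.

```python
def train_linear(samples, epochs=50, lr=0.1):
    """Train a linear classifier with the perceptron rule.

    Assumption: this replaces the SVM from the slide. The decision
    function has the same form (sign of w.x + b); only training differs.
    """
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, label in samples:              # label is +1 or -1
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * score <= 0:            # misclassified: move the boundary
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b


def predict(model, x):
    """Return +1 (object) or -1 (no object)."""
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def mine_hard_negatives(model, negative_pool):
    """Negatives that the current model wrongly accepts (false positives)."""
    return [x for x in negative_pool if predict(model, x) == 1]


def train_with_resampling(positives, negatives, negative_pool, rounds=3):
    """Train, collect false positives from the pool, add them, retrain."""
    samples = [(x, 1) for x in positives] + [(x, -1) for x in negatives]
    model = train_linear(samples)
    for _ in range(rounds):
        hard = mine_hard_negatives(model, negative_pool)
        if not hard:
            break
        samples += [(x, -1) for x in hard]    # resample false positives
        model = train_linear(samples)         # retrain
    return model
```

In a real detector the pool would be windows scanned from images known to contain no pedestrians, and each `x` would be a HOG feature vector.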

Step 2: Object Detection, Real-time
• Scan image at different scales and locations (detection window)
• Extract features over window (vector)
• Run SVM classifier (object yes/no)
• Fuse multiple detections
• Final object detected
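
The "scan at different scales and locations" step can be illustrated with a short enumeration of window positions. This is a sketch under stated assumptions: the 1.2 scale step between pyramid levels and the function names are illustrative choices, not values from the slides; the 64x128 window and the 8-pixel stride follow the Dalal and Triggs setup described later in the deck.

```python
def pyramid_scales(img_w, img_h, win_w=64, win_h=128, scale_step=1.2):
    """Yield (scale, width, height) for each level that still fits one window."""
    scale = 1.0
    while True:
        w, h = int(img_w / scale), int(img_h / scale)
        if w < win_w or h < win_h:
            break
        yield scale, w, h
        scale *= scale_step               # assumed step between pyramid levels


def window_positions(level_w, level_h, win_w=64, win_h=128, stride=8):
    """Yield the top-left corner of every detection window on one level."""
    for y in range(0, level_h - win_h + 1, stride):
        for x in range(0, level_w - win_w + 1, stride):
            yield x, y


def scan(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Yield every window as (x, y, w, h) in original-image coordinates."""
    for scale, w, h in pyramid_scales(img_w, img_h, win_w, win_h, scale_step):
        for x, y in window_positions(w, h, win_w, win_h, stride):
            yield (int(x * scale), int(y * scale),
                   int(win_w * scale), int(win_h * scale))
```

With these assumptions a VGA frame has 73 x 45 = 3285 window positions at full resolution alone, which gives a feel for how many classifier evaluations a single frame needs.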

Training: INRIA people dataset
• Variety of poses
• Variable appearance / clothing
• Complex background
• Unconstrained illumination
• Occlusions and different scales
• Main assumption:
  • clearly visible, mostly upright people

Step 2: Object Detection, Real-time
• Grayscale input image
• Generate multiscale pyramid
• Gamma normalization
• Gradient calculation (calculate angle and magnitude)
• Histogram per block
• SVM per window position
• Non-max suppression
Figure: 64x128 detection window containing a 16x16 block
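
The gradient and histogram stages of this pipeline can be sketched in plain Python. The sketch makes several simplifying assumptions that are not taken from the slides: gamma normalization is skipped, border pixels are clamped, each pixel votes into its nearest orientation bin (no interpolation between bins), and blocks are L2-normalized. Function names are hypothetical.

```python
import math


def gradients(img):
    """Centered differences; returns per-pixel (magnitude, angle in [0, 180))."""
    h, w = len(img), len(img[0])
    out = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp at the border so every pixel has two neighbours.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned orientation
            out[y][x] = (mag, ang)
    return out


def cell_histogram(grad, x0, y0, cell=8, bins=9):
    """Magnitude-weighted orientation histogram of one cell."""
    hist = [0.0] * bins
    width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            mag, ang = grad[y][x]
            hist[int(ang // width) % bins] += mag   # nearest-bin vote
    return hist


def block_descriptor(grad, x0, y0, cell=8, eps=1e-6):
    """Concatenate the 2x2 cell histograms of a 16x16 block, then L2-normalize."""
    vec = []
    for dy in (0, cell):
        for dx in (0, cell):
            vec += cell_histogram(grad, x0 + dx, y0 + dy, cell)
    norm = math.sqrt(sum(v * v for v in vec) + eps * eps)
    return [v / norm for v in vec]
```

For a horizontal intensity ramp all gradient energy lands in the first bin of each cell, and for a vertical ramp in the bin that contains 90 degrees, which is a quick sanity check on the binning.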

Step 2: Object Detection, Real-time
• Grayscale input image
• Generate multiscale pyramid
• Gamma normalization
• Gradient calculation (calculate angle and magnitude)
• Histogram per block
• SVM per window position
• Non-max suppression
Figure: each 16x16 block is split into four 8x8 cells; gradient direction and magnitude are accumulated into 4 histograms of 9 bins
• A 64x128-pixel window contains 7*15=105 positions of the 16x16 block
• Per 64x128 window: feature vector of 105x4x9=3780
• Multiply by SVM vector → object detected yes/no
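
The feature-vector size quoted on this slide follows from the window, block and cell geometry. The small helper below reproduces the arithmetic; the function names are illustrative, and the 8-pixel block stride is the overlap mentioned later in the deck.

```python
def blocks_per_window(win_w, win_h, block=16, stride=8):
    """Number of block positions horizontally and vertically."""
    return (win_w - block) // stride + 1, (win_h - block) // stride + 1


def descriptor_length(win_w, win_h, block=16, stride=8, cell=8, bins=9):
    """Blocks per window x cells per block x bins per cell."""
    bx, by = blocks_per_window(win_w, win_h, block, stride)
    cells_per_block = (block // cell) ** 2
    return bx * by * cells_per_block * bins


print(blocks_per_window(64, 128))   # (7, 15)
print(descriptor_length(64, 128))   # 3780
```

The same helper gives 5 x 11 blocks and 1980 values for the 48x96 Daimler window used in the SVM mapping slide.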

HOG in Combination With Feature Detect & Track
• HOG compute complexity is ~10x that of optical flow (at full frame rate and resolution)
• To reduce complexity, locate features inside the detected object window and track them across frames
• Tracking can also be used to calculate the direction of the object
• Significantly reduces processor load
Per-frame schedule:
• frame 1: HOG, then feature detect
• frame 2: feature track
• frame 3: feature track
• frame 4: feature track
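
The detect-then-track schedule can be written down as a simple per-frame plan. This is a sketch only: the 30 fps video rate and 2 fps HOG rate are the figures quoted on the performance slide, while the function name and the labels are hypothetical.

```python
def frame_schedule(num_frames, video_fps=30, hog_fps=2):
    """Label each frame with the work it gets: full HOG or cheap tracking."""
    interval = video_fps // hog_fps           # frames between two HOG runs
    plan = []
    for i in range(num_frames):
        if i % interval == 0:
            plan.append((i, "hog + feature detect"))
        else:
            plan.append((i, "feature track"))
    return plan


for frame, work in frame_schedule(4):
    print(frame, work)
```

With these numbers only 2 of every 30 frames run the full detector; the remaining 28 only track features found inside the detected windows.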

Videantis v-MP4000HDX Architecture
Heterogeneous, scalable multi-core IP
• v-SP for bitstream parsing/generation in video codecs
• v-MP for pixel-processing:
  • vision, video encoding, decoding, image processing
• Each v-MP is VLIW & SIMD with its own DMA
• v-MP4280HDX delivers:
  • 8 v-MPs x ~25.6 GOPS each at 800MHz, total >200 GOPS
  • Less than 2mm² in 28nm
Figure labels: embedded vision subsystem; bitstream (un)packers for video codecs
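
The throughput figures on this slide can be cross-checked with a few lines of arithmetic. The per-cycle number is derived here, not stated in the slides, and how it splits between VLIW issue slots and SIMD lanes is not given in the source.

```python
def total_gops(cores=8, gops_per_core=25.6):
    """Aggregate throughput of all v-MP cores."""
    return cores * gops_per_core


def ops_per_cycle(gops_per_core=25.6, clock_hz=800e6):
    """Operations each core must complete per clock cycle to reach its GOPS figure."""
    return gops_per_core * 1e9 / clock_hz


print(total_gops())      # about 204.8, consistent with ">200 GOPS"
print(ops_per_cycle())   # about 32 operations per cycle per core
```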

Architecture Trade-offs for Vision Algos
ILP (VLIW or superscalar):
• Host CPU: superscalar (superscalar is expensive in HW)
• GPUs: varies, not disclosed; needs CPU
• Imaging DSPs: 4-issue; >2-issue VLIW causes NOPs and requires loop unrolling
• v-MP4000HDX: 2-issue VLIW; right trade-off
SIMD:
• Host CPU: 128-bit; requires second pipeline, RF, etc.
• GPUs: very wide array, not used efficiently by block-based algos
• Imaging DSPs: >128-bit SIMD; wide SIMD can’t be used efficiently by block-based algos
• v-MP4000HDX: 64/128-bit; right trade-off for imaging and video
Multicore:
• Host CPU: 1-4 cores, but cache coherency introduces overhead
• GPUs: many cores, with many restrictions
• Imaging DSPs: 1 core
• v-MP4000HDX: 1-8+ cores; supports diverse algorithms; scales to low or high end apps
Processor frequency:
• Host CPU: 2GHz+; long pipeline introduces hardware overhead
• GPUs: ~1GHz; medium/long pipelines
• Imaging DSPs: 500MHz-1GHz; medium pipeline
• v-MP4000HDX: 500MHz-1GHz; medium pipeline
Caches / DMA:
• Host CPU: multi-level caches
• GPUs: multi-level caches
• Imaging DSPs: no cache, single DMA
• v-MP4000HDX: no cache, DMA per core

Seamless OpenCV Acceleration
• “Lower-level” pixel processing runs on the accelerator
• How to enable acceleration on v-MP4280HDX:
  • Replace all image data allocators
      cvCreateMatHeader(…);
      cvCreateData(…);
      hog.detectMultiScale(…);
  • with the new “shared memory” allocator
      cvCreateMatHeader(…);
      cvCreateDataOvl(…);
      hog.detectMultiScale(…);
  • The API internally takes care of moving data and processing onto the accelerator
• “Higher-level” processing remains on the host CPU for the initial accelerated version
Figure: partitioning of the pipeline stages between host and v-MP4280HDX (image pyramid, HOG, SVM, fuse multiple detections)

Mapping of HOG to v-MP4280HDX
Calculating HOG feature vectors in parallel:
• Each v-MP gets a slice 16 pixels high
• Within the slice, we calculate the HOG feature vector per 16x16 block
• We DMA in the next 8x16 block of data while the previous 16x16 block is processed
Figure labels: process, DMA, shift (a 16x16 block being processed while the next 8x16 strip arrives)
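
The slice-and-prefetch scheme described above can be modelled in a few lines. This is a behavioural sketch, not the v-MP firmware: the round-robin distribution of slices, the 8-row overlap between neighbouring slices, and all names are assumptions, since the slides only state the 16-pixel slice height and the 8x16 prefetch.

```python
def assign_slices(img_h, num_cores=8, slice_h=16, slice_step=8):
    """Map each 16-pixel-high slice (by top row) to a core, round-robin.

    Assumption: slices overlap by 8 rows so vertically overlapping
    blocks are covered; the slides do not say how slices are distributed.
    """
    slices = {}
    for idx, y0 in enumerate(range(0, img_h - slice_h + 1, slice_step)):
        slices.setdefault(idx % num_cores, []).append(y0)
    return slices


def process_slice(img_w, block=16, step=8):
    """Walk one slice left to right, logging the transfer that overlaps each block."""
    log = []
    loaded = block                            # first 16 columns fetched up front
    x = 0
    while x + block <= img_w:
        prefetch = None
        if loaded + step <= img_w:
            # Next 8-column strip is fetched while this block is computed.
            prefetch = (loaded, loaded + step)
            loaded += step
        log.append({"block_x": x, "prefetch_cols": prefetch})
        x += step                             # shift: reuse the right half of the block
    return log
```

Because consecutive blocks share half their columns, only 8 new columns are needed per step, which is why the transfer is 8x16 rather than 16x16.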

Mapping of SVM to v-MP4280HDX
Calculating the SVM dot products in parallel:
• We use the Daimler detector: a 48x96 window versus the 64x128 window of the original Dalal and Triggs detector. The Daimler detector detects pedestrians that appear smaller in view
• 4 histograms x 9 bins x 5x11 blocks of 16x16 pixels, using 8-pixel overlap
• Process a column per v-MP. Keep the fixed SVM vector local to the v-MP
• Process a sliding window in the vertical direction, preloading the next 5 x 9x4 histograms
Figure labels: process, DMA, shift (window of 5 x 11 blocks, 4x9 values per block)
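
The vertical sliding window can be sketched as a fixed-length queue of block rows: each step brings in one new row of 5 blocks (5 x 36 = 180 values) and drops the oldest. This is an illustrative model with hypothetical names; the row-major layout of the weight vector is an assumption, and real code would run the dot product on SIMD units.

```python
from collections import deque

BLOCKS_X, BLOCKS_Y, BLOCK_LEN = 5, 11, 36     # 5x11 blocks, 4 histograms x 9 bins
ROW_LEN = BLOCKS_X * BLOCK_LEN                # 180 values arrive per vertical step


def svm_column_scores(block_rows, weights, bias=0.0):
    """Score every vertical window position in one column.

    block_rows: list of rows, each a flat list of ROW_LEN histogram values.
    weights:    flat SVM vector of BLOCKS_Y * ROW_LEN values (assumed row-major).
    """
    window = deque(maxlen=BLOCKS_Y)           # oldest row drops out automatically
    scores = []
    for row in block_rows:                    # one new row of 5 blocks per step
        window.append(row)
        if len(window) == BLOCKS_Y:
            s = bias
            for r, hist_row in enumerate(window):
                base = r * ROW_LEN
                s += sum(w * v for w, v in
                         zip(weights[base:base + ROW_LEN], hist_row))
            scores.append(s)
    return scores
```

A window is reported as a detection when its score exceeds the classifier threshold; the weight vector stays fixed, so only histogram data has to move.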

v-MP4280HDX HOG Performance and Power
• HOG on each image at 30 fps (every frame of the video) or at 2 fps (in combination with tracking)
• A 1.2GHz ARM Cortex-A9 runs VGA at ~1fps
• Performance of v-MP4280HDX compared to ARM: 135x at the same frequency
• Power of v-MP4280HDX compared to ARM: >400x lower
* performance and power measured on videantis 40nm silicon
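
As a back-of-the-envelope reading of these figures, the snippet below estimates the clock rate at which the accelerator would reach a target frame rate. It is an estimate, not a measured result: it assumes the 135x speedup applies to the same VGA workload and that throughput scales linearly with clock frequency, neither of which the slides state. The function name is hypothetical.

```python
def required_clock_hz(target_fps, ref_fps=1.0, ref_clock_hz=1.2e9, speedup=135.0):
    """Clock needed for target_fps, assuming linear scaling with frequency."""
    fps_per_hz = ref_fps * speedup / ref_clock_hz
    return target_fps / fps_per_hz


print(round(required_clock_hz(30) / 1e6))   # roughly 267 (MHz) for 30 fps
print(round(required_clock_hz(2) / 1e6))    # roughly 18 (MHz) for 2 fps
```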

Conclusions
• HOG is a key algorithm for object detection
  • ~90% detection rate with 10^-4 false positives per window
• Computationally demanding algorithm, ~10x more complex than feature detection or optical flow
• The algorithms can be implemented efficiently at high resolution while consuming low power on the videantis v-MP4000HDX vision processor
Please drop by our booth for a silicon demonstration

"Implementing Histogram of Oriented Gradients on a Parallel Vision Processor," a Presentation from videantis
