© 2019 Renesas
Dynamically Reconfigurable Processor Technology for Vision Processing
Yoshio Sato
Renesas
May 2019
Renesas Broad Product Portfolio
RZ 32/64-bit Arm-Based High-End Embedded MPU Family
Microcontrollers and Microprocessors
▪ 8/16-bit: ultra-low energy, sensing and motor control
▪ 32-bit: high power efficiency, motor control
▪ 32/64-bit: Arm-based high-end embedded MPU
▪ Automotive: Renesas 40nm process
…and more
Analog & Mixed Signal, Power, Discrete
▪ Analog Devices
▪ Interface
▪ Memory
▪ Optoelectronics
▪ Power Management
▪ Sensors
▪ Space & Harsh Environment
▪ Timing & Digital Logic
For automotive:
▪ Power Management
▪ Battery Management
▪ Video & Display
…and more
SoC, Integrated Platforms
▪ Factory automation
▪ Automotive: Renesas autonomy™
▪ IoT: Renesas Synergy™
▪ HMI: Renesas RZ/G Linux
Image Processing Market Dynamics
Vision Evolution Drives Need for More Computing Power and Connectivity Bandwidth
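[Chart: vision applications plotted by resolution (VGA, 1MP, 2MP, 3MP, 5MP, 8MP+) against frame rate (30, 60, 100, 150+ FPS). Relative to VGA at 30 FPS, resolution grows up to 26x and frame rate up to 5x, for up to 130x more pixel throughput. Segments: Consumer, Healthcare, Security/Marketing, Machine Vision. Examples: network video marketing camera, intruder alarm, access control, motion capture and sports analysis, medical camera (endoscope), digital microscope, picking and sorting, barcode and character reading, visual inspection.]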
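The chart's multipliers compose multiplicatively. A minimal C sketch of the arithmetic, assuming VGA means 640x480 at 30 FPS and taking "8MP" as exactly 8,000,000 pixels (an assumption: a 3840x2160 sensor, for example, gives 27x rather than 26x). The function names are illustrative only:

```c
#include <stdint.h>

/* Pixels per second for a given frame size and frame rate. */
uint64_t pixel_rate(uint64_t width, uint64_t height, uint64_t fps)
{
    return width * height * fps;
}

/* Throughput of a workload relative to VGA (640x480) at 30 FPS.
 * pixels_per_frame is the sensor resolution in pixels, e.g. 8000000 for an
 * assumed "8MP" sensor. */
double growth_vs_vga30(uint64_t pixels_per_frame, uint64_t fps)
{
    const double vga30 = (double)pixel_rate(640, 480, 30); /* 9,216,000 px/s */
    return (double)(pixels_per_frame * fps) / vga30;
}
```

growth_vs_vga30(8000000, 150) comes out at about 130.2, i.e. roughly 26x resolution times 5x frame rate.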
Problem Statement
Limitations of Existing Architectures
Multiprocessor
• Limited scalability
• Power hungry
Dedicated IP
• Adding functionality increases cost and power consumption
• Not flexible
FPGA
• Power hungry
• Poor area efficiency
Multiprocessor Problem: Amdahl's Law
Speedup is Limited by the Serial Part of the Program
https://en.wikipedia.org/wiki/Amdahl%27s_law#
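Amdahl's law makes the limit concrete: if a fraction p of a program's runtime can be parallelized perfectly across N processors, the speedup is S(N) = 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p) however many processors are added. A minimal C sketch (the function names are ours):

```c
/* Amdahl's law: speedup on n processors when a fraction p (0..1) of the
 * runtime is perfectly parallelizable and the remainder stays serial. */
double amdahl_speedup(double p, unsigned n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Ceiling on the speedup as n grows without bound: 1 / (1 - p). */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

With p = 0.95, even 1024 processors give only about 19.6x, just under the 20x ceiling set by the 5% serial part.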
Multiprocessor Problem: Von Neumann Architecture
Data Transfer Drives Power Consumption
Conventional Software Implementation (CPU/DSP/GPU)
• Sequential functions with many memory accesses: Func 1 to Func 4 each read their input from DRAM and write their result back to DRAM
• → Power is consumed just for data transfer
Wired Logic Architecture (Dedicated IP/FPGA)
• Combining multiple functions (Func. 1 to Func. 4) into one circuit saves memory accesses: DRAM is touched only for the input and the output
• → Less memory access reduces power
[Chart: relative energy consumption per equivalent MAC operation for compute, local SRAM and main SRAM on chip, and main DRAM off chip via I/O; values shown are 1x, 2x and 128x, with off-chip DRAM the most expensive. Reference: presentation by Bert Moons]
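To make the memory-traffic argument concrete, a minimal C sketch that counts frame-buffer accesses for four chained per-pixel functions. The functions f1 to f4 are hypothetical placeholders, and "accesses" counts one per pixel read or write to a frame buffer, a simplification that ignores caches:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-pixel functions standing in for Func 1 to Func 4. */
static uint8_t f1(uint8_t x) { return (uint8_t)(x / 2); }
static uint8_t f2(uint8_t x) { return (uint8_t)(x + 10); }
static uint8_t f3(uint8_t x) { return (uint8_t)(x > 100 ? 100 : x); }
static uint8_t f4(uint8_t x) { return (uint8_t)(255 - x); }

typedef uint8_t (*pixel_fn)(uint8_t);
static const pixel_fn stages[4] = { f1, f2, f3, f4 };

/* Software style: each function makes a full pass over a frame buffer,
 * reading every pixel and writing every result back. tmp holds n pixels.
 * Returns the number of frame-buffer accesses (reads + writes). */
size_t run_staged(const uint8_t *in, uint8_t *out, uint8_t *tmp, size_t n)
{
    size_t accesses = 0;
    const uint8_t *src = in;
    for (int s = 0; s < 4; s++) {
        uint8_t *dst = (s == 3) ? out : tmp;
        for (size_t i = 0; i < n; i++) {
            dst[i] = stages[s](src[i]);
            accesses += 2;                 /* one read + one write */
        }
        src = dst;
    }
    return accesses;
}

/* Wired-logic style: the four functions are chained per pixel, so the
 * intermediates stay in registers and only input and output hit memory. */
size_t run_fused(const uint8_t *in, uint8_t *out, size_t n)
{
    size_t accesses = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = f4(f3(f2(f1(in[i]))));
        accesses += 2;                     /* one read + one write */
    }
    return accesses;
}
```

For n pixels, the staged version makes 8n buffer accesses and the fused version 2n, while both produce the same output.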
Dedicated IP Tradeoff
Performance Increase at Cost of Larger Die Size and Power Consumption
[Chart: functionality, cost and power rising with each product generation (Gen.1, Gen.2, Gen.3)]
Dedicated IP Limitations
Limited Flexibility to Fix Bugs in Pre-Production and Mass Production
[Chart: functionality, cost and power over product generations Gen 1, Gen 2, Gen 2.1 and Gen 2.2; each derivative (bug fix, upgrade, after-market optimization) adds to time-to-market]
Bug found! Hide it! Re-spin!
What if we find the bug in the field?
FPGA Problem
Efficient Mapping is Challenging; Adding Features Increases Die Size and Power
• Programmability requires a tradeoff of increased die size, power, and cost
• Mapping a design efficiently to FPGA resources is difficult
• Adding new features may force a move to a larger, more expensive, and more power-hungry FPGA
[Diagram: FPGA fabric of logic blocks and switch blocks joined by interconnect, ringed by IO pads, with unused (free) area]
Dynamically Reconfigurable Processor (DRP)
Benefits of Software Flexibility with Pipelined Hardware Performance
10
Performance HighLow
High
Low
DRP
Dedicated IP
or
FPGA
CPU
Flexible, but limited
performance
High performance, but limited
runtime reconfigurability
Allocate system
functions in
complementary
manner
Runtime-reconfigurable
hardwareProgrammability
DRP (Dynamically Reconfigurable Processor)
Spatial and Time-Multiplexed Computing
• CPU: processes one to a few operations per cycle; high degree of freedom, but limited performance
• GPU (SIMD): processes 10-100 identical operations per cycle; efficient only when all the parallel data are prepared
• FPGA: spatially expands various operations into a LUT architecture; high performance, but uses a huge area due to the expansion
• DRP: processes spatially expanded operations as a pipeline and time-multiplexes them, combining the performance of spatial expansion with area efficiency via dynamic reconfiguration
[Diagram: over time, the CPU issues a pipelined stream of single instructions (ld, ld, add, sub, bne, st, …), the GPU issues each instruction on several data at once (ld ld, add add, sub sub), the FPGA lays out +, × and - side by side, and the DRP lays out a spatially pipelined set of operations and swaps it for another set in a time-multiplexed way]
DRP in the System
Reconfigurable Array + Fast DMA
Fast and Intelligent Data Mover (DMA)
◼ Fast external memory access via dedicated DMA
◼ C programming with an access API
◼ Dynamic configuration loading from external memory in less than 1 ms
Reconfigurable Processing Elements Array
◼ Reconfigures as a pipelined circuit fitted to the algorithm
◼ Runs complex algorithms by switching configurations in one DRP clock cycle (up to 64 contexts)
◼ Programmed in C via a high-level synthesis tool
Reduce memory bandwidth and achieve low latency
[Diagram: the DRP's DMA controller and FIFOs stream the input image from external DRAM through the memory controller and write the output image back, alongside the CPU]
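The FIFOs in the diagram decouple the DMA side from the processing array. As a rough software model only (this is not the DRP driver API; the fifo_t type, the function names and the 16-entry depth are hypothetical), a bounded single-producer, single-consumer ring buffer in C looks like this:

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_CAPACITY 16u   /* illustrative depth; must be a power of two */

typedef struct {
    uint8_t  data[FIFO_CAPACITY];
    uint32_t head;   /* total items ever pushed (wraps modulo 2^32) */
    uint32_t tail;   /* total items ever popped */
} fifo_t;

void fifo_init(fifo_t *f) { f->head = 0; f->tail = 0; }

/* Unsigned subtraction stays correct across wraparound. */
uint32_t fifo_count(const fifo_t *f) { return f->head - f->tail; }

/* Producer side (e.g. DMA delivering input pixels). Fails when full,
 * which models back-pressure on the producer. */
bool fifo_push(fifo_t *f, uint8_t v)
{
    if (fifo_count(f) == FIFO_CAPACITY)
        return false;
    f->data[f->head % FIFO_CAPACITY] = v;
    f->head++;
    return true;
}

/* Consumer side (e.g. the processing pipeline). Fails when empty. */
bool fifo_pop(fifo_t *f, uint8_t *v)
{
    if (fifo_count(f) == 0)
        return false;
    *v = f->data[f->tail % FIFO_CAPACITY];
    f->tail++;
    return true;
}
```

A full FIFO stalls the producer and an empty one stalls the consumer, so each side can run at its own pace as long as the average rates match.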
DRP Hardware Architecture
PE (Processing Elements), SRAMs, ALUs, STC (State Machine Controller) and DMAC
• Flexible coarse-grained reconfigurable architecture (binary/8-bit)
• Tile-based scalability (48 PEs x 6 tiles); each tile can execute independently
• Embedded intelligent DMA controller
[Diagram: DMACs, STCs and upper/lower custom modules around the processing element array; building blocks include 8-bit ALUs, 7-byte registers, 16-bit multipliers, 512B and 4KB dual-port (2p) SRAMs, and a 128-bit AXI bus interface]
DRP Reconfiguration Architecture
State Machine Controller and Dynamically Reconfigurable Data-path
[Diagram: STCs, DMACs and custom modules around the processing element array; the data-path is built from ALUs, 16-bit multipliers, registers, and 512B and 4KB dual-port SRAMs]
Each STC state defines the data-path structure, and the data-path can switch to a new configuration within one clock cycle.
DRP Programming Method
C-to-Hardware Design Flow at a Glance
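/* 5-point sharpening stencil on an N x N image; border pixels are
   skipped so every neighbor index stays in range */
for (i = 1; i < N - 1; i++) {
    for (j = 1; j < N - 1; j++) {
        fn[i][j] = 5 * f[i][j] - f[i][j - 1] - f[i - 1][j]
                 - f[i + 1][j] - f[i][j + 1];
    }
}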
Algorithm written in C → dedicated compiler (automatic synthesis from C to hardwired logic, then automatic mapping from logic to DRP configuration) → dynamically reconfigurable data-path
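For reference, a self-contained C version of the stencil, written for an 8-bit grayscale image with the border copied through unchanged and results clamped to 0..255 (both are our assumptions; the slide does not specify border handling or saturation). The second function computes the same result while keeping only a three-row window of the input, the kind of line buffering that lets a hardware pipeline read each input pixel from external memory once. It is an illustration, not the output of the Renesas compiler, and MAX_W is a hypothetical limit:

```c
#include <stdint.h>
#include <string.h>

#define MAX_W 1024  /* illustrative upper bound on image width */

static uint8_t clamp_u8(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Direct version: random access to the whole h x w input image f. */
void sharpen_direct(const uint8_t *f, uint8_t *out, int h, int w)
{
    memcpy(out, f, (size_t)h * (size_t)w);            /* border passes through */
    for (int i = 1; i < h - 1; i++)
        for (int j = 1; j < w - 1; j++) {
            int v = 5 * f[i * w + j] - f[i * w + j - 1] - f[(i - 1) * w + j]
                  - f[(i + 1) * w + j] - f[i * w + j + 1];
            out[i * w + j] = clamp_u8(v);
        }
}

/* Streaming version: rows arrive one at a time (as from a DMA/FIFO) and only
 * a rolling window of three rows is kept, so each input pixel is read once.
 * Requires w <= MAX_W. */
void sharpen_streaming(const uint8_t *f, uint8_t *out, int h, int w)
{
    uint8_t lines[3][MAX_W];
    for (int r = 0; r < h; r++) {
        memcpy(lines[r % 3], &f[r * w], (size_t)w);   /* the only read of row r */
        if (r == 0) {
            memcpy(out, lines[0], (size_t)w);          /* top border row */
            continue;
        }
        if (r == 1)
            continue;                                  /* need three rows first */
        const uint8_t *up  = lines[(r - 2) % 3];
        const uint8_t *mid = lines[(r - 1) % 3];
        const uint8_t *dn  = lines[r % 3];
        uint8_t *o = &out[(r - 1) * w];                /* emit row r - 1 */
        o[0] = mid[0];
        o[w - 1] = mid[w - 1];
        for (int j = 1; j < w - 1; j++)
            o[j] = clamp_u8(5 * mid[j] - mid[j - 1] - up[j] - dn[j] - mid[j + 1]);
    }
    if (h >= 2)                                        /* bottom border row */
        memcpy(&out[(h - 1) * w], lines[(h - 1) % 3], (size_t)w);
}
```

Both functions produce identical output; the streaming one needs only 3 x w bytes of row buffering.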
Dynamic Reconfiguration Mechanism
Cycle-Based Configuration for STC (State Machine Controller) and Data-path
Switch between multiple data-paths with each DRP clock cycle to execute complex algorithms
[Diagram: a finite state machine in the STC plus a set of data-path configurations (DP#1 to DP#7, each a different arrangement of +, x and - operations) make up the DRP hardware; state-by-state reconfiguration maps the configuration selected by each state onto the PE array with its memories (Mem) and multipliers (MPY), for example cycling through DP#1, DP#3, DP#4 and DP#5]
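The mechanism behaves like a finite state machine whose current state selects the data-path that is active for that cycle. A minimal C model, assuming each state names one data-path configuration and a fixed next state (the slides do not detail how the STC chooses transitions; the operations and names below are placeholders, not real DRP contexts):

```c
#include <stdint.h>

/* A "data-path configuration" modeled as a function of the running value
 * and one input sample. */
typedef int32_t (*datapath_fn)(int32_t acc, int32_t x);

/* Placeholder configurations (contexts). */
static int32_t dp_add(int32_t acc, int32_t x) { return acc + x; }
static int32_t dp_mul(int32_t acc, int32_t x) { return acc * x; }
static int32_t dp_sub(int32_t acc, int32_t x) { return acc - x; }

typedef struct {
    datapath_fn datapath;   /* configuration active while in this state */
    int next_state;         /* state entered on the next clock cycle */
} stc_state;

/* One loop iteration stands in for one clock cycle: apply the current
 * state's data-path, then switch to the next state's configuration. */
int32_t run_stc(const stc_state *table, int start, const int32_t *input,
                int cycles)
{
    int32_t acc = 0;
    int s = start;
    for (int c = 0; c < cycles; c++) {
        acc = table[s].datapath(acc, input[c]);
        s = table[s].next_state;
    }
    return acc;
}
```

With a three-state table add → multiply → subtract → add and inputs 3, 4, 5, 6, four cycles compute ((0 + 3) x 4 - 5) + 6 = 13. In hardware each context is a real circuit; here it is only a function pointer.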
Dynamic Loading to Support Multiple Algorithms
Switch to New Algorithm In As Little As One Millisecond
Dynamic reconfiguration in one DRP clock cycle
• Up to 64 contexts (HW configurations) stored in DRP
• Context switching managed by State Machine Controller
Dynamic loading of new configuration in as little as 1 ms
• Loads from external memory without interrupting execution
• Change to HW with completely different function set
• Time division execution of huge applications
DRP vs CPU Performance
Boosting Image Processing Algorithms
DRP is 12 to 20 times faster than CPU in these image processing examples
Image size: 640x480 (VGA)
Image color: Grayscale 8BPP
CPU: RZ/A2M Cortex-A9 @ 528MHz
DRP: max. 66MHz
Execution time (ms):
Process                   DRP        CPU*
Canny Edge Detection      9.0        110.6
Harris Corner Detection   11.6       235
*CPU: using OpenCV
[Charts: processing time and video frame rate for the example algorithms, Canny edge detection and Harris corner detection]
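The speedups follow directly from the table. A small C helper (the names are ours) reproduces them and converts per-frame processing time into a processing-only frame-rate ceiling, which ignores capture, transfer and display overheads:

```c
/* Speedup of the DRP over the CPU for the same task. */
double speedup(double cpu_ms, double drp_ms)
{
    return cpu_ms / drp_ms;
}

/* Frame-rate ceiling if processing one frame takes t_ms (processing only). */
double max_fps(double t_ms)
{
    return 1000.0 / t_ms;
}
```

Canny: 110.6 / 9.0 ≈ 12.3x; Harris: 235 / 11.6 ≈ 20.3x, matching the 12x to 20x range. For Canny, processing alone would allow about 111 FPS on the DRP versus about 9 FPS on the CPU.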
Dynamic Loading in Milliseconds
Accelerate Sequential Algorithms / Parallel Processing Across Multiple DRP Slices
Image size: 640x480 (VGA)
Image color: RGB 24BPP, Grayscale 8BPP
CPU: RZ/A2M Cortex-A9 @ 528MHz
DRP: max. 66MHz
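To see why millisecond-scale loading matters for running sequential algorithms on one DRP, here is a back-of-the-envelope frame budget in C. It assumes a 30 FPS stream (about 33.3 ms per frame), reuses the DRP execution times from the earlier benchmark (Canny 9.0 ms, Harris 11.6 ms), and charges a hypothetical 1 ms per configuration load with no overlap between loading and execution. Actual figures depend on the configurations and the system:

```c
#include <stdbool.h>
#include <stddef.h>

/* Per-frame time when stages run back to back on one DRP, with a fresh
 * configuration loaded before each stage and no overlap (an assumption). */
double frame_time_ms(const double *stage_ms, size_t n, double load_ms)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += load_ms + stage_ms[i];
    return total;
}

/* True if one frame's work fits inside the period of the given frame rate. */
bool fits_frame_rate(double frame_ms, double fps)
{
    return frame_ms <= 1000.0 / fps;
}
```

Under these assumptions the two stages take about 22.6 ms per frame. That fits a 33.3 ms frame at 30 FPS but not a 16.7 ms frame at 60 FPS. Loading in the background without interrupting execution would reduce the load cost further.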
DRP Benchmarking
Over 10x Higher Performance and More Stable Execution Time than CPU
The DRP methodology converts the source program into a hard-wired circuit for high-speed, jitter-free execution.
Canny Edge Detection Benchmark on Streaming Video
• DRP: 10.4 ms, no jitter
• CPU: 141.9 ms to 142.3 ms
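Jitter here can be read as the spread between the slowest and fastest observed execution times. This max-minus-min definition is an assumption; other analyses use the standard deviation instead. A minimal C sketch:

```c
#include <stddef.h>

/* Jitter as the spread between the slowest and fastest of n >= 1
 * measured execution times, in milliseconds. */
double jitter_ms(const double *t_ms, size_t n)
{
    double lo = t_ms[0], hi = t_ms[0];
    for (size_t i = 1; i < n; i++) {
        if (t_ms[i] < lo) lo = t_ms[i];
        if (t_ms[i] > hi) hi = t_ms[i];
    }
    return hi - lo;
}
```

Applied to the two CPU endpoints reported above, it gives 142.3 - 141.9 = 0.4 ms, while a constant 10.4 ms DRP time gives 0. The corresponding speedup, 141.9 / 10.4, is about 13.6x.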
RZ/A2M
First RZ Product with DRP
DRP Multipurpose Accelerator
Large 4MB SRAM
- High-speed access & Low BOM Cost
High performance HMI functions
- Cortex-A9 528MHz with NEON DSP
- 2D Graphics accelerator & Sprite Engine
- MIPI-CSI camera interface
- JPEG Hardware codec
Rich Connectivity
- Dual Ethernet
- USB/SDHI/MMC/NAND
- 8bit DDR Memory Interface
High Security
- Arm TrustZone
- Trusted Secure IP (TSIP)
- Hardware crypto acceleration
- Key management and storage
- Access management
Packages
- 324pin BGA: 19mm/0.8mm pitch
- 272pin BGA: 17mm/0.8mm pitch
- 256pin BGA: 11mm/0.5mm pitch
- 176pin BGA: 13mm/0.8mm pitch
Winner!
RZ/A2M Development Support
Software Packages and Tools Ready to Download
Free RZ/A2M Software Package
• Download from website
• FreeRTOS v10 integrated
Free Development Environment
• Download e2 studio IDE from website
Free Compiler
• Download GNU Arm Embedded Toolchain 6-2017-q2-update
Fast and easy development
https://www.renesas.com/us/en/products/software-tools/software-os-middleware-driver/software-package/rza2-software-development-kit-free-rtos.html
[Diagram: the RZ/A2M Software Package contains device drivers, middleware, sample projects, the DRP library and DRP drivers; tools are the e2 studio IDE and tool plug-ins]
Sample Projects Using DRP Libraries
Shrink Time-to-Market with Free Application Code
• Simple Image Signal Processing (ISP): image enhancement for fast, robust recognition
• 2D Barcode Detection & Recognition: 720p resolution, 10x faster than CPU decode, longer battery life
• Iris Detection & Recognition: 30 cm range, fast, encrypted identity storage and transmission
https://www.renesas.com/us/en/products/software-tools/boards-and-kits/eval-demo/rz-a2m-evaluation-board-kit.html#downloads
More to come
RZ/A2M Evaluation Kit
First Prototyping-Ready Kit
Complete RZ/A2M Evaluation Platform
DRP evaluation enabled
MIPI Camera Module (MIPI CSI)
HyperMCP with HyperFlash™ and HyperRAM™
RGB conversion board for HDMI display
2ch Ethernet communication
Other peripheral functions: SDHI, USB, etc.
Segger J-Link Lite debugger
Available now for RZ/A2M customer evaluation
Part number: RTK7921053S00000BE
Summary and Next Steps
Embedded vision evolution demands more computing power and connectivity bandwidth
Available architectures have limitations in scalability, power consumption, flexibility, and efficiency
DRP addresses these problems with spatial and time-multiplexed computing
Roadmap for DRP technology to enhance embedded AI inference capabilities
Renesas is looking for partners who would like to work with our DRP technology
Dynamically Reconfigurable Processor (DRP)
Technology Resources
RZ/A2 Support Material
Visit RZ/A2 MPU Product Page
Watch RZ/A2 MPU Overview Video
Watch DRP Overview Video
https://www.renesas.com/rza2m
Buy RZ/A2M Evaluation Kit
https://www.renesas.com/us/en/products/software-tools/boards-and-kits/eval-demo/rz-a2m-evaluation-board-kit.html
Get RZ/A2M Demos, Downloads, and Application Guides
https://www.renesas.com/us/en/products/software-tools/boards-and-kits/eval-demo/rz-a2m-evaluation-board-kit.html#downloads
DRP Vision Processing Demo
Please visit us in the Vision Technology Showcase to see a demonstration of the video processing capabilities of the RZ/A2M with DRP.
Embedded Vision Summit
Dynamically Reconfigurable Processor Technology for Vision Processing
Thank You
Backup Material
Who We Are
The World’s Leading Embedded Solution Provider
• 2002: NEC Electronics, spin-off from NEC
• 2003: Renesas Technology, spin-off from the Hitachi and Mitsubishi merger
• 2010: Renesas Electronics started operation (NEC Electronics and Renesas Technology merged)
• 2017: Acquisition of Intersil, strengthening leadership in the analog market
• 2019: Acquisition of IDT (Integrated Device Technology), the leading supplier of analog mixed-signal products including sensors, connectivity and wireless power
Originating from Hitachi, Mitsubishi, NEC, and Intersil
Net sales 757.4 billion yen in 2018
19,000+ employees worldwide
Headquartered in Tokyo, Japan
RZ Family of Microprocessors
Maximum Performance for Cognitive, HMI, Industrial Networks
RZ/G Series, RZ/A Series
www.renesas.com/rz
RZ Series Target Applications
RZ/A2M MPUs add DRP Technology to RZ/A1 Features
RZ/G MPU Series Lineup
Now with 2nd Generation RZ/G MPUs — Scalable from Entry-class RZ/G2E to High-end RZ/G2H
Additional RZ MPU Resources
RZ MPU Family Overview
RZ Family of 64-Bit & 32-Bit Arm-Based High-End MPUs
https://www.renesas.com/us/en/products/microcontrollers-microprocessors/rz.html
RZ/A1 Support Material
RZA1 Development Software and Kits
https://www.renesas.com/us/en/products/microcontrollers-microprocessors/rz/rza/rza-startup.html
RZ/G Support Material
Get Started with the RZ/G Linux Platform
https://www.renesas.com/us/en/products/rzg-linux-platform/get-started.html
34

More Related Content

PPTX
PPTX
Enfabrica - Bridging the Network and Memory Worlds
PDF
DPDK in Containers Hands-on Lab
ODP
Dpdk performance
PDF
EBPF and Linux Networking
PDF
Meet cute-between-ebpf-and-tracing
PDF
LSFMM 2019 BPF Observability
PDF
DPDK In Depth
Enfabrica - Bridging the Network and Memory Worlds
DPDK in Containers Hands-on Lab
Dpdk performance
EBPF and Linux Networking
Meet cute-between-ebpf-and-tracing
LSFMM 2019 BPF Observability
DPDK In Depth

What's hot (20)

PPTX
ACI DHCP Config Guide
PPTX
Dpdk applications
PDF
AMD EPYC™ Microprocessor Architecture
 
PDF
DevConf 2014 Kernel Networking Walkthrough
PDF
Static Partitioning with Xen, LinuxRT, and Zephyr: A Concrete End-to-end Exam...
PPTX
CXL Fabric Management Standards
PDF
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
PDF
Network Programming: Data Plane Development Kit (DPDK)
PDF
BPF Hardware Offload Deep Dive
PPTX
Q1 Memory Fabric Forum: Building Fast and Secure Chips with CXL IP
PDF
Introduction to eBPF
PDF
Enabling new protocol processing with DPDK using Dynamic Device Personalization
PPTX
Kali Linux
PDF
7 hands on
PDF
DPDK: Multi Architecture High Performance Packet Processing
PDF
Linux 4.x Tracing: Performance Analysis with bcc/BPF
PDF
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
PPTX
ゼロから始める自作 CPU 入門
PDF
Open Ethernet: an open-source approach to modern network design
PPTX
Baremetal openstackのご紹介
ACI DHCP Config Guide
Dpdk applications
AMD EPYC™ Microprocessor Architecture
 
DevConf 2014 Kernel Networking Walkthrough
Static Partitioning with Xen, LinuxRT, and Zephyr: A Concrete End-to-end Exam...
CXL Fabric Management Standards
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Network Programming: Data Plane Development Kit (DPDK)
BPF Hardware Offload Deep Dive
Q1 Memory Fabric Forum: Building Fast and Secure Chips with CXL IP
Introduction to eBPF
Enabling new protocol processing with DPDK using Dynamic Device Personalization
Kali Linux
7 hands on
DPDK: Multi Architecture High Performance Packet Processing
Linux 4.x Tracing: Performance Analysis with bcc/BPF
Ayar Labs TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Pack...
ゼロから始める自作 CPU 入門
Open Ethernet: an open-source approach to modern network design
Baremetal openstackのご紹介
Ad

Similar to "Dynamically Reconfigurable Processor Technology for Vision Processing," a Presentation from Renesas (20)

PDF
10 Reasons to Use Next-Generation HMI Solution Kits for RZ/A
PDF
01 renesas MCU 開發環境
PDF
É possível rodar Linux com menos de 10 MB de RAM?
PPTX
Renesas microcontroller and advanced controller.pptx
PPT
UIC Thesis Candiloro
PPT
DRESD In a Nutshell July07
PPT
Rev2 HPPS Project 2007
PPTX
Snapdragon SoC and ARMv7 Architecture
PPT
Reconfigurable Computing
PPT
RCW@DEI - Design Flow 4 SoPc
PPT
Rev1 HPPS Projects 2007
PDF
Implementing AI: Running AI at the Edge: Embedding low-cost intelligence with...
 
PPT
DRESD Project Presentation - December 2006
PPTX
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
PDF
2011 DDR4 Mini Workshop.pdf
PDF
Lightweight DNN Processor Design (based on NVDLA)
PDF
Deep learning: Hardware Landscape
PPTX
lecture one of fpga course on reconfig sys
PDF
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
PPTX
10 factors to consider when choosing your next 32-bit MCU
10 Reasons to Use Next-Generation HMI Solution Kits for RZ/A
01 renesas MCU 開發環境
É possível rodar Linux com menos de 10 MB de RAM?
Renesas microcontroller and advanced controller.pptx
UIC Thesis Candiloro
DRESD In a Nutshell July07
Rev2 HPPS Project 2007
Snapdragon SoC and ARMv7 Architecture
Reconfigurable Computing
RCW@DEI - Design Flow 4 SoPc
Rev1 HPPS Projects 2007
Implementing AI: Running AI at the Edge: Embedding low-cost intelligence with...
 
DRESD Project Presentation - December 2006
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
2011 DDR4 Mini Workshop.pdf
Lightweight DNN Processor Design (based on NVDLA)
Deep learning: Hardware Landscape
lecture one of fpga course on reconfig sys
Webinar Renesas - IoT é Segura? Com Renesas Synergy sim! E o SSP 1.5 tornou a...
10 factors to consider when choosing your next 32-bit MCU
Ad

More from Edge AI and Vision Alliance (20)

PDF
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
PDF
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
PDF
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
PDF
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
PDF
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
PDF
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
PDF
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
PDF
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
PDF
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
PDF
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
PDF
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
PDF
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
PDF
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
PDF
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
PDF
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
PDF
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
“Visual Search: Fine-grained Recognition with Embedding Models for the Edge,”...
“Optimizing Real-time SLAM Performance for Autonomous Robots with GPU Acceler...
“LLMs and VLMs for Regulatory Compliance, Quality Control and Safety Applicat...
“Simplifying Portable Computer Vision with OpenVX 2.0,” a Presentation from AMD
“Quantization Techniques for Efficient Deployment of Large Language Models: A...
“Introduction to Data Types for AI: Trade-Offs and Trends,” a Presentation fr...
“Introduction to Radar and Its Use for Machine Perception,” a Presentation fr...
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
“Squinting Vision Pipelines: Detecting and Correcting Errors in Vision Models...
“ONNX and Python to C++: State-of-the-art Graph Compilation,” a Presentation ...
“Beyond the Demo: Turning Computer Vision Prototypes into Scalable, Cost-effe...
“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, ...
“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accele...
“A Re-imagination of Embedded Vision System Design,” a Presentation from Imag...
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...

Recently uploaded (20)

DOCX
The AUB Centre for AI in Media Proposal.docx
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPT
Teaching material agriculture food technology
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Encapsulation theory and applications.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
KodekX | Application Modernization Development
PDF
Approach and Philosophy of On baking technology
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
cuic standard and advanced reporting.pdf
PPTX
Cloud computing and distributed systems.
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
The AUB Centre for AI in Media Proposal.docx
20250228 LYD VKU AI Blended-Learning.pptx
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Teaching material agriculture food technology
Digital-Transformation-Roadmap-for-Companies.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Advanced methodologies resolving dimensionality complications for autism neur...
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Encapsulation theory and applications.pdf
Encapsulation_ Review paper, used for researhc scholars
KodekX | Application Modernization Development
Approach and Philosophy of On baking technology
Chapter 3 Spatial Domain Image Processing.pdf
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
cuic standard and advanced reporting.pdf
Cloud computing and distributed systems.
Understanding_Digital_Forensics_Presentation.pptx
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...

"Dynamically Reconfigurable Processor Technology for Vision Processing," a Presentation from Renesas

  • 1. © 2019 Renesas Dynamically Reconfigurable Processor Technology for Vision Processing Yoshio Sato Renesas May 2019
  • 2. © 2019 Renesas Renesas Broad Product Portfolio RZ 32/64-bit Arm-Based High-End Embedded MPU Family 2 8/16-bit Ultra-low energy Sensing and Motor Control Microcontrollers and Microprocessors 32/64-bit Arm-Based High-End Embedded MPU 32-bit High power efficiency Motor Control Renesas 40nm process Automotive …and more Analog & Mixed Signal, Power, Discrete ▪ Analog Devices ▪ Interface ▪ Memory ▪ Optoelectronics ▪ Power Management ▪ Sensors ▪ Space & Harsh Environment ▪ Timing & Digital Logic For automotive: ▪ Power Management ▪ Battery Management ▪ Video & Display …and more SoC, Integrated Platforms Factory automation Automotive AutomotiveRenesas autonomy™ IoTRenesas Synergy™ HMIRenesas RZ/G Linux
  • 3. © 2019 Renesas Image Processing Market Dynamics Vision Evolution Drives Need for More Computing Power and Connectivity Bandwidth 3 1x 26x 1x 5x 1x 130x Resolution FPS VGA 30 10060 Consumer Healthcare Picking, Sorting 8MP+ Network Video Marketing Camera 2MP 1MP 5MP Motion Capture, Sports Analysis Intruder Alarm 150+ Security, Marketing Machine Vision 3MP Medical Camera (Endoscope) Digital Microscope Access Control Barcode, Character Visual Inspection
  • 4. © 2019 Renesas Problem Statement Limitations of EXISTING ARCHITECTURES Multi processor • Limited scalability • Power hungry Dedicated IP • Adding functionality increases cost and power consumption • Not flexible FPGA • Power hungry • Area efficiency challenge 4
  • 5. © 2019 Renesas Multi Processor Problem: Amdahl's Law Speedup is Limited by the Serial Part of the Program 5 https://en.wikipedia.org/wiki/Amdahl%27s_law#
  • 6. © 2019 Renesas Multi Processor Problem: Von Neumann Architecture Data Transfer Drives Power Consumption 6 Conventional Software Implementation (CPU/DSP/GPU) Wired Logic Architecture (Dedicated IP/FPGA) Saving memory access by combining multiple functions → Reducing power by less memory access DRAM DRAM DRAM DRAM DRAM DRAM DRAM Sequential functions with many memory accesses → Consuming power only for data transfer Func 1 Func 2 Func 3 Func 4 Func. 1 Func. 2 Func. 3 Func. 4 Compute Local SRAM Main SRAM Main DRAM 1x 2x 128x CHIP Reference: presentation by Bert Moons Relative energy consumption per equivalent MAC operation I/O
  • 7. © 2019 Renesas Dedicated IP Tradeoff Performance Increase at Cost of Larger Die Size and Power Consumption 7 0 10 20 30 40 50 Product Generation Gen.1 Gen.2 Gen.3 Functionality,Cost,Power
  • 8. © 2019 Renesas Dedicated IP Limitations Limited Flexibility to Fix Bugs for Pre Production and Mass Production 8 Product GenerationGen 1 Gen 2 Functionality,Cost,Power Gen 2.1 Gen 2.2 Bug Fix Upgrade Optimize after market Time-to-market Bug Found! Hide it! Re-spin! What if we find the bug in the field?
  • 9. © 2019 Renesas FPGA Problem Efficient Mapping is Challenging; Adding Features Increases Die Size and Power Programmability requires tradeoff of increased die size, power, and cost Mapping design efficiently to FPGA resources is difficult Adding new features may force move to larger, more expensive, and power-hungry FPGA 9 IO Pads Interconnect Switch Block Logic Block Free Area FreeFree
  • 10. © 2019 Renesas Dynamically Reconfigurable Processor (DRP) Benefits of Software Flexibility with Pipelined Hardware Performance 10 Performance HighLow High Low DRP Dedicated IP or FPGA CPU Flexible, but limited performance High performance, but limited runtime reconfigurability Allocate system functions in complementary manner Runtime-reconfigurable hardwareProgrammability
  • 11. © 2019 Renesas DRP (Dynamically Reconfigurable Processor) Spatial and Time-Multiplexed Computing 11 CPU GPU (SIMD) FPGA DRP ALU ALU ALU ALU Process 1 to a few operations in a cycle Process 10-100 same operations in a cycle Spatially expand various operations into LUT architecture Efficient if all parallel data are prepared High degree of freedom but limited performance High performance but use huge area due to expansion Pipelined processing Reg Reg ld ld add sub bne st ld ld add sub ld ld ld ld add add sub sub Time + × - Process spatially expanded operations Combine performance with expansion and area efficiency via dynamic reconfiguration +× × - Time Spatially pipelined Time-multiplexed
  • 12. © 2019 Renesas DRP in the System Reconfigurable Array + Fast DMA 12 Fast and Intelligent Data Mover (DMA) ◼ Fast external memory access via dedicated DMA ◼ C programming with access API ◼ Dynamic configuration loading in less 1ms from external memory Reconfigurable Processing Elements Array ◼ Reconfigure as pipelined circuit to fit with algorithm ◼ Run complex algorithms by switching configurations within a nanosecond (up to 64 contexts) ◼ Programming with C language by high-level Synthesis tool Reduce memory bandwidth and achieve low latency Memory Controller DMA controller FIFO FIFO CPU External DRAM Output ImageInput Image DRP
  • 13. © 2019 Renesas DMAC DMAC STC Custom Module-upper STC Custom Module-upper Processing Element Array Custom Module-lower Custom Module-lower DRP Hardware Architecture PE (Processor Elements), SRAMs, ALUs, STC (State Machine Controller) and DMAC 13 13 8b ALU Reg (7Byte) 8b ALU 512B 2p SRAM 16b Mul 16b Mul AXI(128-bit) Flexible coarse-grained reconfigurable architecture (Binary/8-bit) Tile-based scalability (48 PEs x 6 Tiles) Embedded Intelligent DMA controller 4KB 2p SRAM 16b Mul Reg (7Byte) One Tile (Executable Independently)
  • 14. © 2019 Renesas DRP Reconfiguration Architecture State Machine Controller and Dynamically Reconfigurable Data-path 14 DMAC DMAC STC Custom Module-upper STC Custom Module-upper Processing Element Array Custom Module-lower Custom Module-lower ALUALU 512B 2p SRAM 16b Mul 16b Mul 4KB 2p SRAM 16b Mul Reg Reg The state defines the data-path structure, which can transition to a new configuration within one clock cycle
  • 15. © 2019 Renesas DRP Programming Method C-to-Hardware Design Flow at a Glance 15 for( i = 0; i < N; i++ ){ for( j = 0; j < N; j++){ fn(i, j) = 5*f(i, j) – f(i, j-1) – f(i-1, j) – f(i+1, j) – f(i, j+1); } } Dedicated compiler Automatic synthesis from C to hardwired logic Automatic mapping from logic to DRP configuration Algorithm written in C Dynamically reconfigurable data-path
  • 16. © 2019 Renesas STC PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE PE Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem Mem MPY MPY MPY MPY MPY MPY MPY MPY 1 1 2 3 4 5 6 7 3 4 5 State-by-State Reconfiguration Dynamic Reconfiguration Mechanism Cycle-base Configuration for STC (State Machine Controller) and Data-path Switch between multiple Data-paths with each DRP clock cycle to execute complex algorithms 16 + x - DP#1 - x DP#2 +x DP#3 - x DP#4 + x DP#5 x + x + DP#6 +x DP#7 + x - DP#1 x + DP#3 x - DP#4 + x DP#5 + x - DP#1 + x - DP#1 x + DP#3 x - DP#4 + x DP#5 + x - DP#1 Finite State Machine + Datapath Configurations DRP Hardware
  • 17. © 2019 Renesas Dynamic Loading to Support Multiple Algorithms Switch to New Algorithm In As Little As One Millisecond 17 Dynamic reconfiguration in one DRP clock cycle • Up to 64 contexts (HW configurations) stored in DRP • Context switching managed by State Machine Controller Dynamic loading of new configuration in as little as 1 ms • Loads from external memory without interrupting execution • Change to HW with completely different function set • Time division execution of huge applications
  • 18. © 2019 Renesas DRP vs CPU Performance Boosting Image Processing Algorithms DRP is 12 to 20 times faster than CPU in these image processing examples 18 Image size : 640x480 VGA Image Color : Grayscale 8BPP CPU : RZ/A2M Cortex-A9 @528MHz DRP : max.66MHz Video Frame Rate Example Algorithm: Harris Corner Detection Process Execution Time (ms) DRP CPU Canny Edge Detection 9.0 110.6 * Harris Corner Detection 11.6 235 * *CPU: Using Open CV) Processing Time Example Algorithm: Canny Edge Detection
  • 19. Dynamic Loading in Milliseconds: Accelerate Sequential Algorithms / Parallel Processing Across Multiple DRP Slices. Test conditions: image size 640x480 (VGA); RGB 24 BPP and grayscale 8 BPP; CPU: RZ/A2M Cortex-A9 @ 528 MHz; DRP: max. 66 MHz.
  • 20. DRP Benchmarking: Over 10x Higher Performance and More Stable Execution Time than the CPU. The DRP methodology converts the source program into a hard-wired circuit for high-speed, jitter-free execution. Canny Edge Detection benchmark on streaming video: DRP 10.4 ms with no jitter; CPU from 141.9 ms to 142.3 ms.
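The stability claim can be quantified from per-frame timings. The C sketch below summarizes a series of execution times and reports jitter as the peak-to-peak spread (max - min), which is one common definition; the slide does not say how it measured jitter. The names `timing_stats`, `summarize`, and `jitter_ms` are hypothetical, and any per-frame sample values fed to it beyond the slide's two CPU endpoints would be illustrative only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double min_ms, max_ms, mean_ms;
} timing_stats;

/* Summarises per-frame execution times (at least one sample required). */
timing_stats summarize(const double *samples_ms, size_t n)
{
    assert(samples_ms != NULL && n > 0);
    timing_stats st = { samples_ms[0], samples_ms[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (samples_ms[i] < st.min_ms) st.min_ms = samples_ms[i];
        if (samples_ms[i] > st.max_ms) st.max_ms = samples_ms[i];
        sum += samples_ms[i];
    }
    st.mean_ms = sum / (double)n;
    return st;
}

/* Peak-to-peak jitter: spread between slowest and fastest frame. */
double jitter_ms(timing_stats st)
{
    return st.max_ms - st.min_ms;
}
```

With the slide's endpoints, the CPU spread is 142.3 - 141.9 = 0.4 ms, while the DRP's fixed circuit gives the same 10.4 ms every frame. Even against the CPU's best case, the DRP is 141.9 / 10.4 ≈ 13.6 times faster, consistent with the "over 10x" headline.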
  • 21. RZ/A2M: First RZ Product with DRP. DRP multipurpose accelerator. Large 4MB SRAM: high-speed access and low BOM cost. High-performance HMI functions: Cortex-A9 @ 528 MHz with NEON DSP; 2D graphics accelerator and sprite engine; MIPI-CSI camera interface; JPEG hardware codec. Rich connectivity: dual Ethernet; USB/SDHI/MMC/NAND; 8-bit DDR memory interface. High security: Arm TrustZone; Trusted Secure IP (TSIP); hardware crypto acceleration; key management and storage; access management. Packages: 324-pin BGA, 19 mm / 0.8 mm pitch; 272-pin BGA, 17 mm / 0.8 mm pitch; 256-pin BGA, 11 mm / 0.5 mm pitch; 176-pin BGA, 13 mm / 0.8 mm pitch. Winner!
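To relate the 4MB on-chip SRAM to the image formats used in the benchmarks (640x480, grayscale 8 BPP or RGB 24 BPP), a quick sizing calculation helps. This C sketch assumes 4MB means 4 × 1024 × 1024 bytes and ignores SRAM used by code, stacks, and other buffers, so it is an upper bound rather than a usable budget; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one uncompressed frame (bits_per_pixel must be a multiple of 8). */
size_t frame_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

/* Number of whole frames that fit in mem_bytes of memory. */
size_t frames_that_fit(size_t mem_bytes, size_t width, size_t height,
                       size_t bits_per_pixel)
{
    size_t fb = frame_bytes(width, height, bits_per_pixel);
    assert(fb > 0);
    return mem_bytes / fb;
}
```

A VGA grayscale frame is 640 × 480 = 307,200 bytes and an RGB 24 BPP frame is 921,600 bytes, so at most 13 grayscale frames or 4 RGB frames fit in 4,194,304 bytes, enough to keep several full VGA frame buffers on chip without external DRAM.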
  • 22. RZ/A2M Development Support: Software Packages and Tools Ready to Download. Free RZ/A2M Software Package: download from the website, with FreeRTOS v10 integrated. Free development environment: download the e2 studio IDE from the website. Free compiler: download the GNU Arm Embedded Toolchain (6-2017-q2-update). Fast and easy development. https://www.renesas.com/us/en/products/software-tools/software-os-middleware-driver/software-package/rza2-software-development-kit-free-rtos.html [Diagram: the RZ/A2M Software Package (device drivers, middleware, sample projects, DRP library, DRP drivers) alongside the e2 studio IDE and tool plug-ins.]
  • 23. Sample Projects Using DRP Libraries: Shrink Time-to-Market with Free Application Code. Simple Image Signal Processing (ISP): image enhancement for fast, robust recognition. 2D Barcode Detection & Recognition: 720p resolution, 10x faster than CPU decode, longer battery life. Iris Detection & Recognition: 30 cm range, fast, encrypted identity storage and transmission. https://www.renesas.com/us/en/products/software-tools/boards-and-kits/eval-demo/rz-a2m-evaluation-board-kit.html#downloads More to come.
  • 24. RZ/A2M Evaluation Kit: First Prototyping-Ready Kit. Complete RZ/A2M evaluation platform: evaluation of the DRP-enabled MIPI camera module (MIPI CSI); HyperMCP with HyperFlash™ and HyperRAM™; RGB conversion board for HDMI display; 2-channel Ethernet communication; other peripheral functions (SDHI, USB, etc.); Segger J-Link Lite debugger. Available now for RZ/A2M customer evaluation. Part number: RTK7921053S00000BE.
  • 25. Summary and Next Steps. Embedded vision evolution demands more computing power and connectivity bandwidth. Available architectures have limitations in scalability, power consumption, flexibility, and efficiency. DRP solves these problems with spatial and time-multiplexed computing. A roadmap for DRP technology aims to enhance embedded AI inference capabilities. Renesas is looking for partners who would like to work with its DRP technology.
  • 26. Dynamically Reconfigurable Processor (DRP) Technology Resources. RZ/A2 support material: visit the RZ/A2 MPU product page, watch the RZ/A2 MPU overview video, and watch the DRP overview video (https://www.renesas.com/rza2m). Buy the RZ/A2M Evaluation Kit: https://www.renesas.com/us/en/products/software-tools/boards-and-kits/eval-demo/rz-a2m-evaluation-board-kit.html. Get RZ/A2M demos, downloads, and application guides: https://www.renesas.com/us/en/products/software-tools/boards-and-kits/eval-demo/rz-a2m-evaluation-board-kit.html#downloads. DRP Vision Processing Demo: please visit us in the Vision Technology Showcase to see a demonstration of the video processing capabilities of the RZ/A2M with DRP. Embedded Vision Summit: Dynamically Reconfigurable Processor Technology for Vision Processing.
  • 29. Who We Are: The World's Leading Embedded Solution Provider. Originating from Hitachi, Mitsubishi, NEC, and Intersil. Timeline: 2002, NEC Electronics spun off from NEC; 2003, Renesas Technology spun off from the Hitachi and Mitsubishi merger; 2010, Renesas Electronics started operation after NEC Electronics and Renesas Technology merged; 2017, acquisition of Intersil to strengthen leadership in the analog market; 2019, acquisition of IDT (Integrated Device Technology), the leading supplier of analog mixed-signal products including sensors, connectivity, and wireless power. Net sales: 757.4 billion yen in 2018. Employees: 19,000+ worldwide. Headquartered in Tokyo, Japan.
  • 30. RZ Family of Microprocessors: Maximum Performance for Cognitive, HMI, and Industrial Networks. RZ/G Series and RZ/A Series. www.renesas.com/rz
  • 31. RZ Series Target Applications
  • 32. RZ/A2M MPUs add DRP Technology to RZ/A1 Features
  • 33. RZ/G MPU Series Lineup: Now with 2nd Generation RZ/G MPUs, Scalable from Entry-Class RZ/G2E to High-End RZ/G2H
  • 34. Additional RZ MPU Resources. RZ MPU Family Overview: RZ Family of 64-Bit & 32-Bit Arm-Based High-End MPUs, https://www.renesas.com/us/en/products/microcontrollers-microprocessors/rz.html. RZ/A1 Support Material: RZ/A1 Development Software and Kits, https://www.renesas.com/us/en/products/microcontrollers-microprocessors/rz/rza/rza-startup.html. RZ/G Support Material: Get Started with the RZ/G Linux Platform, https://www.renesas.com/us/en/products/rzg-linux-platform/get-started.html.