An Introduction to Asymmetric Multiprocessing:
when this architecture can be a game changer
and how to survive it.
Nicola La Gloria, Laura Nao
Kynetics LLC
Santa Clara, California
Hybrid Architecture: warpx.io
The Hybrid Design Architecture (HDA)
combines the power of an application
processor with the ease-of-use of
micro-controllers.
Software on warpx HDA
SMP vs AMP
SMP on homogeneous architectures:
● Single OS controlling two or more
identical cores sharing system
resources
● Dynamic scheduling and load
balancing
[Diagram: a single SMP kernel/OS spanning Core1 … CoreN, with applications on top]
SMP vs AMP
AMP on heterogeneous architectures:
● A different OS on each core: a full-featured OS alongside a real-time kernel
● An inter-processor communication protocol
● Efficient when the application can be statically partitioned across cores, so that high performance is achieved locally on each core
[Diagram: Core1 … CoreN; a full OS running apps alongside an OS/RTOS running tasks, connected through MCAPI]
Supervised vs Not Supervised
● Supervised (a software layer, such as a hypervisor, manages the cores):
○ Strong isolation
○ Hides non-trivial AMP details (e.g. resource assignment, inter-core communication)
○ Security and robustness
○ Overhead of the supervising software layer
● Not supervised:
○ Best performance, since each OS runs natively
○ Boot sequence complexity
○ Harder to debug
Interprocessor Communication
NXP i.MX7 overview
● Cortex-A7 core + Cortex-M4 core
● Master-slave architecture
○ A7 is the master
○ M4 is the slave
● Inter-processor communication
○ MU - Messaging Unit
○ RPMsg component (OpenAMP framework)
● Safe sharing of resources
○ RDC - Resource Domain Controller
NXP i.MX7 - RDC
NXP i.MX7 IPC - MU
MAC (VirtIO)
The OpenAMP framework - RPMsg
RPMsg on Linux
Hybrid Linux/FreeRTOS Demo
Demo Goal:
● IMU sensor (I2C) read by an MCU task
● Compute an objective function (magnitude of the accelerometer, magnetometer and gyroscope vectors)
● Log/plot sensor samples on the MPU
● Recover safely from a kernel panic
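A minimal sketch of the objective function, assuming the sensor delivers signed 16-bit counts per axis and that each queue item packs the three magnitudes as 32-bit integers (which would account for the 12-byte item size quoted in the demo parameters). The names vec3_raw_t, imu_item_t, isqrt32 and objective are hypothetical. An integer square root is used so the example needs no libm; floating point with sqrtf() would work just as well.

```c
#include <assert.h>
#include <stdint.h>

/* Raw 3-axis reading in signed 16-bit counts (hypothetical sensor format). */
typedef struct {
    int16_t x, y, z;
} vec3_raw_t;

/* Hypothetical 12-byte queue item: one magnitude per sensor. */
typedef struct {
    uint32_t acc;
    uint32_t mag;
    uint32_t gyro;
} imu_item_t;

_Static_assert(sizeof(imu_item_t) == 12, "items are 12 bytes in the demo");

/* Integer square root (digit-by-digit method): returns floor(sqrt(n)). */
static uint32_t isqrt32(uint32_t n)
{
    uint32_t root = 0;
    uint32_t bit = 1u << 30; /* highest power of four that fits in 32 bits */

    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Modulus of a vector, sqrt(x^2 + y^2 + z^2), rounded down. */
static uint32_t vec3_modulus(vec3_raw_t v)
{
    /* Each square is at most 2^30, so the sum of three fits in 32 bits. */
    uint32_t sq = (uint32_t)(v.x * v.x) + (uint32_t)(v.y * v.y) +
                  (uint32_t)(v.z * v.z);
    uint32_t r = isqrt32(sq);

    /* Debug check: r is the floor of the square root. */
    assert((uint64_t)r * r <= sq && (uint64_t)(r + 1) * (r + 1) > sq);
    return r;
}

/* Objective function: reduce the three IMU vectors to their magnitudes. */
static imu_item_t objective(vec3_raw_t acc, vec3_raw_t mag, vec3_raw_t gyro)
{
    imu_item_t item = {
        .acc = vec3_modulus(acc),
        .mag = vec3_modulus(mag),
        .gyro = vec3_modulus(gyro),
    };
    return item;
}
```

For instance, an accelerometer reading of (3, 4, 0) yields a magnitude of 5.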
Hardware Setup
● Boundary Devices Nitrogen7 board, Toradex Colibri i.MX7 SoM
○ NXP i.MX7D processor: dual ARM Cortex-A7 + ARM Cortex-M4
● Segger J-Link probe
Cortex-M4 Bring-Up (1)
Environment setup:
● Download the FreeRTOS sources:
https://github.com/boundarydevices/freertos-boundary.git
● Download the GNU ARM Embedded Toolchain:
https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/download
● Example applications for the Cortex-M4 are located in the examples/imx7d_nitrogen7_m4/ folder
● Scripts for building both debug and release binaries are available in the armgcc subfolder
Cortex-M4 Bring-Up (2)
The M4 application binary can be loaded onto the Cortex-M4 in several ways:
● From U-Boot: ums gadget + m4update
● From Linux userspace, through the remoteproc framework
● From Linux userspace, with NXP's imx-m4fwloader:
https://github.com/codeauroraforum/imx-m4fwloader
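Loading through remoteproc from userspace boils down to writing two sysfs attributes. The sketch below assumes a kernel whose remoteproc core exposes the firmware and state attributes (added to mainline shortly after 4.9, so the 4.9 kernel used in this demo may not have them), a remoteproc driver for the i.MX7 M4 bound to, for example, /sys/class/remoteproc/remoteproc0, and an image placed in /lib/firmware. rproc_write_attr and rproc_boot are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write one value into a sysfs attribute, e.g. <dir>/firmware.
 * Returns 0 on success or a negative errno value. */
static int rproc_write_attr(const char *dir, const char *attr, const char *value)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t w = write(fd, value, len);
    int err = (w < 0) ? -errno : ((size_t)w == len ? 0 : -EIO);
    close(fd);
    return err;
}

/* Select an image (looked up by name in /lib/firmware) and start the core. */
static int rproc_boot(const char *dir, const char *fw_name)
{
    assert(dir != NULL && fw_name != NULL);
    int err = rproc_write_attr(dir, "firmware", fw_name);
    if (err)
        return err;
    return rproc_write_attr(dir, "state", "start");
}
```

A call such as rproc_boot("/sys/class/remoteproc/remoteproc0", "imu_demo.elf") would then start the M4 with a (hypothetical) /lib/firmware/imu_demo.elf; writing "stop" to the same state attribute halts it.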
The M4 code can be linked to, and loaded into, one of the following memories:
● TCM - 32KB (preferred)
● OCRAM - 32KB
● DDR - up to 1MB
● QSPI Flash - 128KB
IDE Setup
● Eclipse for C/C++
○ GNU MCU Eclipse: plugins and tools for embedded ARM development:
https://marketplace.eclipse.org/content/gnu-mcu-eclipse
● GDB
● J-Link scripts for the i.MX7D, for debugging both Cortex-A7 cores and the Cortex-M4:
https://wiki.segger.com/IMX7D
● FreeRTOS Kernel Awareness plugin from NXP:
http://freescale.com/lgfiles/updates/Eclipse/KDS
● ARM DS-5 (not free)
● Sourcery Codebench (not free)
Workbench
[Screenshot: code editor, MPU console, MCU breakpoints, FreeRTOS kernel awareness view, MCU console]
Demo Parameters
Remote core:
● Sample the IMU every 10 ms
● Compute the objective function (magnitude of the vectors) on the MCU
● Buffer of 300 elements of 12 bytes each = 3600 bytes, about 3.5 KB (stored in TCM, which is only 32 KB)
● Items are dequeued and sent to the master 10 at a time every 100 ms
Master core:
● The master reads incoming samples by polling the character device
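The parameters above are consistent with each other: sampling every 10 ms produces 100 items/s, and sending 10 items every 100 ms drains exactly 100 items/s, so the 300-item buffer (300 × 12 B = 3600 B) rides out roughly 3 s of stalls on the master side. Each 120-byte batch also fits in a default 512-byte RPMsg buffer (496 bytes of payload after the 16-byte header in the Linux virtio_rpmsg_bus driver). Below is a minimal ring-buffer sketch under those assumptions; the names are hypothetical and the code is not thread-safe, so on FreeRTOS the sampling and sender tasks would need a mutex or critical section around it (a native queue from xQueueCreate(300, 12) is an alternative).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ITEM_SIZE 12u  /* bytes per item (from the slide) */
#define QUEUE_LEN 300u /* items buffered on the MCU */
#define BATCH_LEN 10u  /* items per message to the master */
#define SAMPLE_MS 10u  /* sampling period */
#define SEND_MS   100u /* transmit period */

/* Drain rate must match the production rate or the queue eventually fills. */
_Static_assert(BATCH_LEN * SAMPLE_MS == SEND_MS, "send rate must match sample rate");
_Static_assert(QUEUE_LEN * ITEM_SIZE == 3600u, "buffer occupies 3600 bytes");
_Static_assert(BATCH_LEN * ITEM_SIZE <= 496u, "batch must fit one RPMsg payload");

typedef struct {
    uint8_t data[QUEUE_LEN][ITEM_SIZE]; /* intended to be placed in TCM */
    size_t head;                        /* next slot to write */
    size_t count;                       /* items currently stored */
} item_queue_t;

/* Enqueue one item; returns 0, or -1 when full (caller may drop the sample). */
static int queue_push(item_queue_t *q, const void *item)
{
    assert(q != NULL && item != NULL);
    if (q->count == QUEUE_LEN)
        return -1;
    memcpy(q->data[q->head], item, ITEM_SIZE);
    q->head = (q->head + 1) % QUEUE_LEN;
    q->count++;
    return 0;
}

/* Dequeue up to BATCH_LEN items, oldest first, into a 120-byte message. */
static size_t queue_pop_batch(item_queue_t *q, uint8_t out[BATCH_LEN * ITEM_SIZE])
{
    assert(q != NULL && out != NULL);
    size_t n = q->count < BATCH_LEN ? q->count : BATCH_LEN;
    size_t tail = (q->head + QUEUE_LEN - q->count) % QUEUE_LEN;

    for (size_t i = 0; i < n; i++) {
        memcpy(out + i * ITEM_SIZE, q->data[tail], ITEM_SIZE);
        tail = (tail + 1) % QUEUE_LEN;
    }
    q->count -= n;
    return n;
}
```

The sender task would call queue_pop_batch every SEND_MS and transmit n × ITEM_SIZE bytes when n is non-zero.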
Architecture Overview
[Diagram: Linux 4.9 on the Cortex-A7 (master) and FreeRTOS 1.0.1 on the Cortex-M4 (remote), exchanging start_cmd, stop_cmd and heartbeat messages]
The OpenAMP framework - RPMsg
Control Flow (2 cores)
Master (Linux):
● S0 RPMsg channel is down
● S1 RPMsg channel is up, /dev/rpmsg0 is created
● S2 RPMsg channel is up, endpoint created, data is dumped into a log file
Remote (FreeRTOS):
● S0 RPMsg channel is down
● S1 RPMsg channel is up (sampling the IMU sensor, buffering data)
● S2 RPMsg channel is up, sending data to the master core (while sampling the IMU sensor and buffering data)
[Diagram: on Linux, rpmsg_char_client registers the rpmsg char driver (+ probe), then opens, reads and closes /dev/rpmsg0; on FreeRTOS, the data sender task. Transitions: channel created, start_cmd, stop_cmd, master heartbeat, "master is dead"]
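On the master side, the open/read/close sequence above amounts to a poll-and-copy loop. The sketch assumes the caller has already opened /dev/rpmsg0 for reading and a log file for writing; drain_samples is a hypothetical name, and the 512-byte buffer is sized to hold one default-size RPMsg message.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy data from in_fd (e.g. an open /dev/rpmsg0) to log_fd until nothing
 * arrives for timeout_ms or the peer closes. Returns bytes copied or -errno. */
static ssize_t drain_samples(int in_fd, int log_fd, int timeout_ms)
{
    char buf[512]; /* large enough for one default-size RPMsg message */
    ssize_t total = 0;

    assert(in_fd >= 0 && log_fd >= 0);
    for (;;) {
        struct pollfd pfd = { .fd = in_fd, .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (r == 0)
            return total; /* idle for timeout_ms: hand control back */
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -EIO;

        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -errno;
        }
        if (n == 0)
            return total; /* end of file: the other side went away */

        /* Append everything that was read to the log, handling short writes. */
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(log_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            off += w;
        }
        total += n;
    }
}
```

Returning on an idle timeout lets the caller interleave other work, such as sending the heartbeat, between reads.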
What if the Linux Kernel Panics
● Kexec: a system call that loads and boots into another kernel from the currently running one (here 4.9.74).
○ crashkernel=128M [normal kernel cmdline]
○ irqpoll, nosmp, reset_devices [crash kernel cmdline]
○ --load-panic option of the kexec tool (loads the kernel to be started on panic)
● Kdump: Linux mechanism that dumps the machine's memory content on a kernel panic.
● Kexec/Kdump support on ARM platforms is still experimental.
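Before it starts streaming, the Linux application could verify that a panic kernel has actually been armed with kexec --load-panic. Kernels built with kexec support report this in /sys/kernel/kexec_crash_loaded (1 when a crash kernel is loaded, 0 otherwise). The helper below is a minimal sketch that reads that flag; crash_kernel_loaded is a hypothetical name, and the path is a parameter so the function can be exercised against an ordinary file.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if a crash kernel is loaded, 0 if not, -1 on error.
 * On a real system, pass "/sys/kernel/kexec_crash_loaded". */
static int crash_kernel_loaded(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    int v = -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v == 1 ? 1 : (v == 0 ? 0 : -1);
}
```

A logger could refuse to start, or at least warn, when this returns anything other than 1.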
Video of the Demo
Pitfalls
● Before announcing the remote service, the MCU checks whether the master is up: if the notification (the virtqueue kick) arrives too early, while the crash kernel is still booting, the system may hang.
● kexec sometimes still hangs and fails to soft-reboot; this happens more often when streaming continuously than when sending data in bursts, and the root cause is not yet understood.
References
● OpenAMP project page: https://github.com/OpenAMP/
● Asymmetric multiprocessing and embedded Linux (ELC 2017):
https://elinux.org/images/3/3b/NOVAK_CERVENKA.pdf
● Maintainers:
○ OpenAMP:
■ Wendy Liang
○ RPMsg (Linux)
■ Ohad Ben-Cohen
■ Bjorn Andersson
○ Kexec (Linux)
■ Eric Biederman
○ Kdump (Linux)
■ Dave Young
■ Baoquan He
Q/A


Asymmetric Multiprocessing - Kynetics, ELC 2018 Portland
