Memory Access Control in
Multiprocessor for Real-time Systems
       with Mixed Criticality

   Heechul Yun+, Gang Yao+, Rodolfo Pellizzoni*,
             Marco Caccamo+, Lui Sha+
  University of Illinois at Urbana-Champaign+
               University of Waterloo*
Multi-core Systems
• Mainstream in smartphones
   – Dual/quad-core smartphones
   – More performance with less power

[Image: Tegra 3 (4 cores)]

• Traditional embedded/real-time domains
  – Avionics companies are investigating multi-core processors [Nowotsch12]
      • 8-core P4080 processor from Freescale

    [Nowotsch12] “Leveraging Multi-Core Computing Architectures in Avionics”, EDCC, 2012
Challenge

         Core1    Core2          Core3   Core4


                      System bus


                          DRAM


• Timing isolation is hard to achieve



Challenge
         App1      App2          App3    App4

        Core1     Core2          Core3   Core4


                      System bus


                          DRAM


• Cores compete for shared HW resources
  – System bus, DRAM controller, shared cache, …


Effect of Memory Contention
[Figure: bar chart of run-time increase (%), y-axis from 0% to 70%, for 429.mcf, 471.omnetpp, 473.astar, 433.milc, and 470.lbm; annotation: 60% WCET increase is unacceptable]
[Diagram: App and membomb, each on its own core, sharing the system bus and memory]

• Run-time increase due to contention
  – Five SPEC2006 benchmarks
  – Compared to solo execution
Goal
• Mechanism to control memory contention
  – Software-based controller for COTS multi-core
    processors


• Response time analysis that accounts for memory
  contention effects
  – Based on the proposed software-based controller
Outline
•   Motivation
•   Memory Access Control System
•   Response Time Analysis
•   Evaluation
•   Conclusion




System Architecture
          Core1       Core2            Core3    Core4


            Memory bandwidth controllers (Part of OS)
           20%       30%          10%             40%


                              System bus


                                DRAM

• Assign memory bandwidth to each core using per-core
  memory bandwidth controller

Memory Bandwidth Controller
• Periodic server for the memory resource

• Periodically monitors the core's memory accesses and
  enforces the user-specified bandwidth using the OS
  scheduler
   – Monitoring is done efficiently with the per-core
     hardware performance counter
   – Bandwidth = # memory accesses × avg. access time
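The relation between an access budget and a bandwidth share can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the helper name `bandwidth_share` and its parameters are hypothetical, and it assumes a constant average access time expressed in the same time unit as the regulation period.

```python
def bandwidth_share(accesses, avg_access_time, period):
    """Fraction of one period that a core may spend on memory accesses.

    Hypothetical helper: applies the slide's relation
    bandwidth = (# memory accesses) x (avg. access time)
    and normalizes it by the regulation period.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    return accesses * avg_access_time / period


# 2 accesses of 1 time unit each within a period of 10 time units.
print(bandwidth_share(2, 1, 10))  # 0.2
```

With these numbers the core is allowed to use 20% of the period for memory accesses.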
Memory Bandwidth Controller
  • Period: 10 time units, Budget: 2 memory accesses
         – Each memory access takes 1 time unit

[Figure: budget (0 to 2) and task execution over time 0 to 20; events labeled "Dequeue tasks" and "Enqueue tasks"; legend: computation, memory fetch]
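The example above (period of 10 time units, budget of 2 memory accesses, 1 time unit per access) can be replayed with a small simulation. This is a sketch under stated assumptions rather than the kernel implementation: the function `simulate` and the trace encoding (`c` for one unit of computation, `m` for one memory fetch, `-` for a throttled time unit) are hypothetical, and the core is assumed to stay idle from the moment its budget reaches zero until the next period starts.

```python
def simulate(trace, period=10, budget=2):
    """Replay a task trace under a per-core memory access budget.

    trace:  string of 'c' (1 unit of computation) and 'm' (1 memory fetch)
    Returns a timeline string with '-' marking throttled time units.
    Assumption: the budget is replenished at every multiple of `period`,
    and the task is dequeued as soon as the budget reaches zero.
    """
    if period <= 0 or budget <= 0:
        raise ValueError("period and budget must be positive")
    timeline = []
    remaining = budget
    idx = 0
    t = 0
    while idx < len(trace):
        if t % period == 0:
            remaining = budget      # new period: replenish, enqueue tasks
        if remaining == 0:
            timeline.append("-")    # budget exhausted: task is dequeued
        else:
            op = trace[idx]
            if op == "m":
                remaining -= 1      # each memory fetch consumes budget
            timeline.append(op)
            idx += 1
        t += 1
    return "".join(timeline)


print(simulate("cmmccm"))  # cmm-------ccm
```

The two fetches at time 1 and 2 use up the budget, so the task waits until time 10 before it continues.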
Outline
•   Motivation
•   Memory Bandwidth Control System
•   Response Time Analysis
•   Evaluation
•   Conclusion




System Model
   Critical core                Interfering cores
      Core1        Core2            Core3           Core4

                    Memory bandwidth controller

                           System bus

                             DRAM

• Cores are partitioned based on criticality
• The critical core runs periodic real-time tasks under a
  fixed-priority scheduling algorithm
• Interfering cores run non-critical workloads and are
  regulated by the proposed memory access controller
Assumptions
    Critical core                Interfering cores
       Core1        Core2            Core3           Core4

                     Memory bandwidth controller

                            System bus

                              DRAM

•   Private or partitioned last level cache (LLC)
•   Round-robin bus arbitration policy
•   Memory access latency is constant
•   1 LLC miss = 1 DRAM access
Simple Case: One Interfering Core
                   Critical            Interfering

                    Core                     Core

                                            Memory
                                           bandwidth
                                           controller



                              System bus


                                DRAM

• Critical core – the core under analysis
• Interfering core – generates memory interference
Problem Formulation
• Given a periodic real-time task set $T = \{\tau_1, \tau_2, \ldots, \tau_n\}$
  on a critical core
• Problem:
  – Determine whether $T$ is schedulable on the critical core
    given the memory access control budget Q and period
    P on the interfering core
Task Model

[Figure: task execution over time, with labels C and CM; legend: computation, memory fetch (cache stall)]

– C : WCET of a task on isolated core (no interference)
– CM: number of last level cache misses (DRAM accesses)
– L: stall time of single cache miss

Memory Interference Model




– P: memory access controller period
– Q: memory access time budget
– $\alpha^u(t)$: linearized upper bound on interfering memory traffic
Background [Pellizzoni07]
• Accounting for memory interference
  – Cache bound: maximum interference time <=
    maximum number of cache misses (CM) × L of the task
    under analysis
  – Traffic bound: maximum interference time <=
    maximum bus time requested by the interfering core

[Equation: WCET with memory stall delay; its two terms are labeled "cache bound" and "traffic bound"]
    C: WCET accounting for memory stall delay
    L: stall time of a single cache miss

    [Pellizzoni07] “Toward the Predictable Integration of Real-Time COTS Based Systems,” RTSS, 2007
Classic Response Time Analysis

$$R_i^{k+1} = C_i + \sum_{j<i} \left\lceil \frac{R_i^k}{T_j} \right\rceil C_j$$

– Tasks are sorted in priority order
    • low index = high priority task
– $C_i$: WCET of task i (in isolation, without memory interference)
– $R_i$: response time of task i
– $T_j$: period of task j
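The recurrence above is solved by fixed-point iteration starting from the task's own WCET. The following Python sketch illustrates this; the function name `classic_rta`, the `(C, T)` tuple layout, and the assumption that each deadline equals the period are illustrative choices, not taken from the slides.

```python
import math


def classic_rta(tasks, i, max_iter=1000):
    """Response time of task i under fixed-priority scheduling.

    tasks: list of (C, T) tuples sorted by priority (index 0 = highest).
    Returns the converged response time, or None if it exceeds the
    period T_i (assumption: deadline equals period).
    """
    C_i, T_i = tasks[i]
    R = C_i
    for _ in range(max_iter):
        # Preemption from all higher-priority tasks j < i.
        R_next = C_i + sum(math.ceil(R / T_j) * C_j for C_j, T_j in tasks[:i])
        if R_next == R:
            return R            # fixed point reached
        if R_next > T_i:
            return None         # deadline (= period) missed
        R = R_next
    return None


# Hypothetical task set: (WCET, period), highest priority first.
tasks = [(1, 4), (2, 6), (3, 12)]
print([classic_rta(tasks, i) for i in range(len(tasks))])  # [1, 3, 10]
```

For the lowest-priority task the iteration visits 3, 6, 7, 9, 10 and stops at 10, which is below its period of 12.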
Extended Response Time Analysis

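$$R_i^{k+1} = C_i + \sum_{j<i} \left\lceil \frac{R_i^k}{T_j} \right\rceil C_j + \min\left\{ N(R_i^k) \cdot L,\ \alpha^u(R_i^k) \right\}$$

where

$$N(t) = \sum_{j \le i} \left\lceil \frac{t}{T_j} \right\rceil CM_j$$

$$\alpha^u(t) = \frac{Q}{P}\, t + \frac{2Q(P - Q)}{P}$$

   –   $N(t)$: aggregated cache misses over time t
   –   $\alpha^u(t)$: upper bound on interfering memory traffic over time t
   –   P: memory access control period
   –   Q: memory access time budget

• The proposed method yields a tighter response time bound than using the WCET that accounts for memory stall delay [Pellizzoni07]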
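The extended recurrence adds one interference term to the classic iteration: the smaller of the cache bound $N(t) \cdot L$ and the traffic bound $\alpha^u(t)$. A minimal Python sketch follows; the function names, the `(C, T, CM)` tuple layout, the deadline-equals-period check, and the convergence tolerance (needed because $\alpha^u$ is real-valued) are all assumptions made for illustration.

```python
import math


def alpha_u(t, Q, P):
    """Linear upper bound on interfering memory traffic over an interval t."""
    return (Q / P) * t + 2 * Q * (P - Q) / P


def cache_misses(t, tasks, i):
    """N(t): cache misses of task i and all higher-priority tasks over t."""
    return sum(math.ceil(t / T_j) * CM_j for _, T_j, CM_j in tasks[: i + 1])


def extended_rta(tasks, i, L, Q, P, tol=1e-9, max_iter=1000):
    """Response time of task i including memory interference.

    tasks: list of (C, T, CM) tuples sorted by priority (index 0 = highest).
    L: stall time of one cache miss; Q, P: budget and period of the
    interfering core's controller. Returns None if the response time
    exceeds T_i (assumption: deadline equals period).
    """
    if P <= 0 or not 0 <= Q <= P:
        raise ValueError("require P > 0 and 0 <= Q <= P")
    C_i, T_i, _ = tasks[i]
    R = C_i
    for _ in range(max_iter):
        preemption = sum(math.ceil(R / T_j) * C_j for C_j, T_j, _ in tasks[:i])
        # Interference is limited by both the cache bound and the traffic bound.
        interference = min(cache_misses(R, tasks, i) * L, alpha_u(R, Q, P))
        R_next = C_i + preemption + interference
        if abs(R_next - R) < tol:
            return R_next
        if R_next > T_i:
            return None
        R = R_next
    return None


# Hypothetical single task: C=10, T=100, CM=4; L=1, Q=2, P=10.
print(extended_rta([(10, 100, 4)], 0, L=1, Q=2, P=10))  # 14
```

In this example the cache bound (4 misses × 1 time unit) is smaller than the traffic bound, so the response time grows from 10 to 14. With a larger CM the traffic bound takes over, and lowering Q then shortens the response time.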
Outline
•   Motivation
•   Memory Bandwidth Control System
•   Response Time Analysis
•   Evaluation
•   Conclusion




Linux Kernel Implementation
• Extends the CPU bandwidth reservation feature of the
  group scheduler
  – Specify the core and bandwidth (memory budget, period)
     •   mkdir /sys/fs/cgroup/core3; cd /sys/fs/cgroup/core3
     •   echo 3 > cpuset.cpus                       # core 3
     •   echo 10000 > cpu.cfs_period_us             # period
     •   echo 500000 > cpu.cfs_quota_event          # cache-miss budget (added feature)

  – Memory usage is monitored at every scheduler tick and
    context switch
Experimental Platform
                                    Intel Core2Quad

             Core 0              Core 1               Core 2              Core 3

          L1-I   L1-D        L1-I    L1-D           L1-I   L1-D       L1-I    L1-D


                      L2 Cache                                 L2 Cache



                                          System Bus

                                            DRAM

• Cores 0 and 2 were disabled to emulate a private LLC system
• Running a modified Linux 3.2 kernel
   – https://github.com/heechul/linux-sched-coreidle/tree/sched-3.2-throttle-v2
Synthetic Task

[Diagram: App on Core1 and membomb on Core3, sharing the system bus and memory]

   •    The core under analysis runs a synthetic task with 50% memory bandwidth
   •    The throttling budget of the interfering core is varied from 0 to 100%
   •    Two findings: (1) interference can be controlled, (2) the analysis provides an upper
        bound (albeit still pessimistic)
H.264 Movie Playback




• Cache-miss counts sampled every 100 ms
• Some inaccuracy in regulation due to an implementation limitation
    – The current version is more accurate because it uses the hardware overflow interrupt
Conclusion
• Shared hardware resources in multi-core systems
  are a major challenge for designing real-time systems

• We proposed and implemented a mechanism that
  provides memory bandwidth reservation
  capability on COTS multi-core processors

• We developed a response time analysis method
  that uses the proposed memory access control
  mechanism
Thank you.




