Power Management Working
Group – sched_mc
Linaro connect Q4-11
Vincent Guittot <vincent.guittot@linaro.org>
https://wiki.linaro.org/WorkingGroups/PowerManagement/Specs/sched_mc
Overview
● CPU topology & sched_domain
● Load balance & trigger
● Power saving load balance
● Quad cortex-A9
● Dual cortex-A9
● Open issue
● Big.Little
● New Load Balance
CPU topology & sched domain
● Dual/Quad Cortex-A9 MP
● New sched_domain configuration
● MC level instead of CPU level
● Add SD_SHARE_PKG_RESOURCES flag
Load Balance
● Load balance monitoring
● On running cores
● On idle cores
● Load balance check on events
● Newly idle cpu
● Wake up task
– Select an idle core with SD_SHARE_PKG_RESOURCES
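The wake-up selection described on this slide can be illustrated with a toy model. This is a minimal sketch, not the kernel's implementation: `select_idle_core`, the `nr_running` dictionary and the `shared_cpus` set are hypothetical names, and the set simply stands in for the CPUs of a domain flagged with SD_SHARE_PKG_RESOURCES.

```python
def select_idle_core(target, nr_running, shared_cpus):
    """Pick a CPU for a waking task (toy model, hypothetical helper).

    target      -- CPU the task would wake up on by default
    nr_running  -- dict mapping CPU id to its number of runnable tasks
    shared_cpus -- CPUs that share package resources with the target
    """
    # Keep the task where it is if that CPU is already idle.
    if nr_running[target] == 0:
        return target
    # Otherwise take the lowest-numbered idle CPU among the sharing CPUs.
    for cpu in sorted(shared_cpus):
        if nr_running[cpu] == 0:
            return cpu
    # No idle CPU available: fall back to the original target.
    return target


if __name__ == "__main__":
    nr_running = {0: 2, 1: 0, 2: 1, 3: 0}
    print(select_idle_core(0, nr_running, {0, 1, 2, 3}))
```

With all four cores in the same sharing domain, a waking task can land on any idle core, which is why tasks tend to spread at wake up in the default configuration.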
Load Balance trigger
● Done at each sched_domain level
● Only one for ARM cortex-A9 MP
● Some requirements for load balancing:
● Only on group which is out of capacity
● Default core capacity is 1 thread
● If there is an obvious imbalance (> imbalance_pct %)
● Asym packing
● Load the group with the lowest cpu number first
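The two requirements listed above can be sketched as a simple predicate. This is a simplified illustration under stated assumptions: the function name and its parameters are hypothetical, the default of 125 for `imbalance_pct` is an assumed value, and the real scheduler combines these checks with several others.

```python
def needs_balancing(busiest_nr_running, busiest_capacity,
                    busiest_load, local_load, imbalance_pct=125):
    """Toy version of the load balance trigger (hypothetical helper).

    A pull is considered only if the busiest group runs more threads than
    its capacity AND its load exceeds the local load by more than
    imbalance_pct percent.
    """
    out_of_capacity = busiest_nr_running > busiest_capacity
    obvious_imbalance = 100 * busiest_load > imbalance_pct * local_load
    return out_of_capacity and obvious_imbalance


if __name__ == "__main__":
    # Two threads on a core of capacity 1, and twice the local load.
    print(needs_balancing(2, 1, 2048, 1024))
    # Same thread count, but the loads differ by less than 25 %.
    print(needs_balancing(2, 1, 1100, 1024))
```

The integer comparison avoids a division, which is how percentage thresholds are usually tested in kernel-style code.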
Powersaving Load Balance
● Requirements :
● At least 2 sched_groups
● SD_POWERSAVINGS_BALANCE flag (CPU level only)
● A nearly idle group and a nearly full group
● 0 < running threads < group capacity
● group_capacity = group_power / SCHED_POWER_SCALE
● Not possible with Cortex-A9 MP system
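The capacity formula above explains why power-saving balance cannot trigger when every sched_group holds a single core. The following is a minimal sketch: it assumes SCHED_POWER_SCALE is 1024 and that the division rounds to the nearest integer, and `can_pack` is a hypothetical helper name.

```python
SCHED_POWER_SCALE = 1024  # assumed value of the kernel constant


def group_capacity(group_power):
    """group_capacity = group_power / SCHED_POWER_SCALE, rounded to nearest
    (the rounding mode is an assumption of this sketch)."""
    return (group_power + SCHED_POWER_SCALE // 2) // SCHED_POWER_SCALE


def can_pack(nr_running, group_power):
    """Power-saving condition from the slide:
    0 < running threads < group capacity."""
    return 0 < nr_running < group_capacity(group_power)


if __name__ == "__main__":
    # One core per group: capacity is 1, so no thread count qualifies.
    print([n for n in range(5) if can_pack(n, 1 * SCHED_POWER_SCALE)])
    # Two cores per group: capacity is 2, so a group with 1 thread qualifies.
    print([n for n in range(5) if can_pack(n, 2 * SCHED_POWER_SCALE)])
```

With a capacity of 1 there is no integer strictly between 0 and 1, so a group can never be "partly used"; the condition only becomes satisfiable once a group's capacity exceeds 1.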
Quad cortex-A9
● Change cpu topology for power saving mode
● Use arch_update_cpu_topology
● Use CPU sched_domain level
● with group_capacity > 1
● Emulate a virtual dual package
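The virtual dual package idea can be pictured as regrouping the four cores into two sched_groups of two cores each. This sketch assumes that cores 0-1 and 2-3 are paired, which the slides do not state; `virtual_packages` is a hypothetical helper and does not reflect what arch_update_cpu_topology actually does internally.

```python
def virtual_packages(cpus, cores_per_package=2):
    """Group CPU ids into virtual packages (pairing by id is an assumption)."""
    groups = {}
    for cpu in cpus:
        groups.setdefault(cpu // cores_per_package, []).append(cpu)
    return groups


if __name__ == "__main__":
    groups = virtual_packages(range(4))
    for pkg, cores in groups.items():
        # With one unit of cpu_power per core, each group has capacity 2,
        # so a group running a single thread still has room to pack another.
        print(f"virtual package {pkg}: cores {cores}, capacity {len(cores)}")
```

Each group now has a capacity greater than 1, which is the precondition for the power-saving balance at the CPU sched_domain level.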
Quad cortex-A9
● Pack tasks on one virtual package
● Low cpu load example :
– Cyclictest with 10 threads
● With default configuration
– Tasks can be spread on 4 cores at wake up
● With virtual dual package
– Tasks can be spread on 2 cores at wake up
– Periodic load balance could spread on 4 cores
● Default config (without cpu topology patch)
– Only periodic load balance will spread on 4 cores
Quad cortex-A9
● sched_mc 0
● sched_mc 2
Quad cortex-A9
● Use all cores
● Heavy cpu load example :
– Sysbench
● With default configuration
– Tasks can be spread on 4 cores at wake up
● With virtual dual package
– Tasks can be spread on 2 cores at wake up
– Periodic load balance will spread on 4 cores
● Default config (without cpu topology patch)
– Only periodic load balance will spread on 4 cores
Quad cortex-A9
sched_mc 0
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 12
Doing CPU performance benchmark
Threads started!
Time limit exceeded, exiting...
(last message repeated 11 times)
Done.
Maximum prime number checked in CPU test: 10000
Test execution summary:
total time: 20.3001s
total number of events: 517
total time taken by event execution: 242.7256
per-request statistics:
min: 315.64ms
avg: 469.49ms
max: 922.72ms
approx. 95 percentile: 491.18ms
Threads fairness:
events (avg/stddev): 43.0833/0.28
execution time (avg/stddev): 20.2271/0.05

sched_mc 2
sysbench 0.4.12: multi-threaded system evaluation benchmark
Running the test with following options:
Number of threads: 12
Doing CPU performance benchmark
Threads started!
Time limit exceeded, exiting...
(last message repeated 11 times)
Done.
Maximum prime number checked in CPU test: 10000
Test execution summary:
total time: 20.2956s
total number of events: 528
total time taken by event execution: 242.4893
per-request statistics:
min: 372.61ms
avg: 459.26ms
max: 513.43ms
approx. 95 percentile: 473.56ms
Threads fairness:
events (avg/stddev): 44.0000/0.00
execution time (avg/stddev): 20.2074/0.04
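To compare the two sysbench runs, the relative change of each metric can be computed directly from the figures above. This is a small sketch: it assumes the first output corresponds to sched_mc 0 and the second to sched_mc 2 (the order in which they appear on the slide), and `pct_change` is a hypothetical helper. A single run per setting is not enough to draw statistical conclusions.

```python
# Figures copied from the two sysbench outputs on this slide.
first_run = {"events": 517, "avg_ms": 469.49, "max_ms": 922.72, "p95_ms": 491.18}
second_run = {"events": 528, "avg_ms": 459.26, "max_ms": 513.43, "p95_ms": 473.56}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for metric in first_run:
        change = pct_change(first_run[metric], second_run[metric])
        print(f"{metric}: {first_run[metric]} -> {second_run[metric]} "
              f"({change:+.2f} %)")
```

The event count and average latency differ by only a few percent between the two runs, while the maximum latency is the metric that differs most.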
Dual cortex-A9
● Change cpu topology for power saving mode
● Use CPU sched_domain level
● with group_capacity > 1
● Increase cpu_power
● cpu_capacity = cpu_power / SCHED_POWER_SCALE
● Use arch_scale_freq_power to increase cpu_power
● Pull several tasks on 1 core
● Emulate a dual package
● Default config without cpu topology
Dual cortex-A9
● Pack tasks on one core
● Low cpu load example :
– Cyclictest with 10 threads
● With default configuration
– Tasks can be spread on 2 cores at wake up
● With virtual dual package and increase of cpu_power
– Periodic load balance could spread on 2 cores
Dual cortex-A9
● sched_mc 0
● sched_mc 2
Dual Cortex-A9
● One cpu will be used while it has capacity
● The trigger is the number of running threads
● And both cores when core0 is out of capacity
● Heavy cpu load use case with few tasks
● Use all cores
● Use cpufreq as a light cpu load detector
● At lowest frequency, increase cpu_power and pull tasks
● At other frequencies, use default cpu_power and spread tasks
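The cpufreq-based policy on this slide can be sketched as a function that returns a boosted cpu_power only at the lowest frequency. This is an illustrative model, not the actual arch_scale_freq_power hook: the function names, the boost factor of 2, the frequencies in kHz and the value 1024 for SCHED_POWER_SCALE are all assumptions.

```python
SCHED_POWER_SCALE = 1024  # assumed value of the kernel constant


def scaled_cpu_power(cur_freq_khz, min_freq_khz, boost=2):
    """Hypothetical policy: boost cpu_power at the lowest frequency
    (light load), use the default cpu_power otherwise."""
    if cur_freq_khz <= min_freq_khz:
        return boost * SCHED_POWER_SCALE
    return SCHED_POWER_SCALE


def cpu_capacity(cpu_power):
    """cpu_capacity = cpu_power / SCHED_POWER_SCALE (integer division)."""
    return cpu_power // SCHED_POWER_SCALE


if __name__ == "__main__":
    for freq in (300_000, 600_000, 1_000_000):  # example frequencies in kHz
        power = scaled_cpu_power(freq, min_freq_khz=300_000)
        print(f"{freq} kHz: cpu_power={power}, capacity={cpu_capacity(power)}")
```

At the lowest frequency the core advertises a capacity of 2 and can pull a second task; once cpufreq raises the frequency, the capacity returns to 1 and tasks spread over both cores again.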
Open issue
● Using cpu_power to pull tasks
● Setting cpu_power to increase cpu_capacity is not advised
● Intermediate step
● cpu_power update
● Updated during “Idle” and “Not idle” load balance
● But periodic load balance might not be called for a while
– Cyclictest example
● RFC Patch available to ensure a periodic update
Open issue
● Idle load balance
● The ILB call can be locked for a while
● RFC patch available to solve such issue
● Need to check new modifications around ILB call
● Spurious wake up
● Idle load balance is called when nr_running > 1 on a cpu
● No longer true if cpu_power has been increased
● To be studied, with a patch to propose
Big.Little
● cpu_power can be used to define an asymmetric system
● More tasks will run on Big
● How to differentiate heavy and light cpu load tasks ?
● Time weighted cpu load (see new load balance)
● How to differentiate background/foreground tasks ?
● How to differentiate IO task ?
New Load balance
● Load balance modifications ongoing
● Discussion during ELCE
● Take into account different kinds of topology
● Dual/Quad cores (1 package)
● Big.Little
Questions?

Q4.11: Sched_mc on dual / quad cores
