Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading

Yosuke Wakisaka Naoki Shibata Junji Kitamichi‡
Keiichi Yasumoto Minoru Ito
Nara Institute of Science and Technology
‡University of Aizu

Background
• Multicore processors
– Widely used in distributed computing environments
• Data centers, supercomputers, ordinary PCs, mobile devices
– Dynamic control of the CPU’s operating frequency
• Turbo Boost (Intel), Turbo Core (AMD), dynamic frequency scaling (NVIDIA GPUs)
– Efficient computation while saving power
• Hyper-Threading (Intel), simultaneous multithreading (IBM POWER)

Objective of Research
• Efficient task scheduling
– Utilizing the latest technologies
– Minimizing the total computation time
• No known scheduling algorithm takes these technologies into account
(Figure: dynamic scaling of operating speed by Turbo Boost and Hyper-Threading; communication delay caused by network contention)

Related Works (1/2)
• Task scheduling algorithm taking account of network contention [1]
• Formulates communication delay between processors
– Reduces network contention
– Does not consider multicore processors
– Does not consider scaling of the operating frequency
[1] O. Sinnen and L. A. Sousa: "Communication Contention in Task Scheduling," IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 6, pp. 503-515, 2005.

Related Works (2/2)
• Task scheduling algorithm taking account of multicore processors and the fail-stop model [2]
– Considers both network contention and the failure of multiple processors on a single die
– Does not consider changes in the operating frequency
[2] S. Gotoda, N. Shibata and M. Ito: "Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault," International Symposium on Cluster, Cloud, and Grid Computing (CCGrid), pp. 260-267, 2012.

Outline
1. Background
2. Related Works
3. Modelling of Computing System
4. Proposed Method
5. Evaluation
6. Conclusion

Task Graph
• A group of tasks that can be
executed in parallel
• Vertex (task node)
– Task to be executed on a single
CPU core
• Edge (task link)
– Data dependence between tasks
(Figure: example task graph showing task nodes and task links)
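A task graph is a directed acyclic graph. Its critical path is one way to gauge how much parallelism it offers, because no schedule can finish earlier than the longest chain of dependent tasks. Below is a minimal Python sketch. It assumes each task node carries a single computation cost and it ignores communication on task links. `critical_path_length` is a hypothetical name and is not part of the original work.

```python
from graphlib import TopologicalSorter

def critical_path_length(cost, edges):
    """Length of the longest computation path through a task graph.

    cost:  task node -> computation cost (assumed: one number per task)
    edges: iterable of task links (u, v), meaning v needs data produced by u
    Communication costs are ignored, so this is only a lower bound on the
    execution time of the whole task graph.
    """
    preds = {v: set() for v in cost}
    for u, v in edges:
        preds[v].add(u)
    longest = {}  # task -> length of the longest path that ends at this task
    # static_order() yields each task after all of its predecessors and
    # raises graphlib.CycleError if the graph is not acyclic.
    for v in TopologicalSorter(preds).static_order():
        longest[v] = cost[v] + max((longest[u] for u in preds[v]), default=0)
    return max(longest.values())
```

Take the diamond a → {b, c} → d with costs 2, 3, 1 and 2. Its critical path is a, b, d, so the function returns 7.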

Processor Graph
• Topology of the computer network
• Vertex (Processor node)
– CPU core (circle)
• Has exactly one link
– Switch (rectangle)
• Has two or more links
• Edge (Processor link)
– Communication path between processors
(Figure: example processor graph with processor nodes, processor links and a switch)
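A transfer between tasks on different cores uses every processor link on the route between those two cores. Knowing the route is what allows a contention-aware model such as [1] to detect links that simultaneous transfers share. This is a minimal sketch. It assumes routes are shortest paths found by breadth-first search, which is a choice made here and is not stated in the slides. `build_processor_graph` and `route` are hypothetical helpers.

```python
from collections import deque

def build_processor_graph(links, cores):
    """links: (node, node) processor links; cores: CPU-core nodes.
    Every other node is treated as a switch."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    for c in cores:
        # In this model a CPU core attaches to the network by exactly one link.
        if len(adj.get(c, [])) != 1:
            raise ValueError(f"core {c} must have exactly one link")
    return adj

def route(adj, src, dst):
    """Fewest-hop route from src to dst (breadth-first search), or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if dst not in prev:
        return None  # dst is unreachable from src
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]
```

A core has only one link, so it can appear only at either end of a route. Every intermediate node is therefore a switch.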

Task Scheduling
• Task scheduling problem
– assigns a processor node to each
task node
– minimizes total execution time
– An NP-hard problem
(Figure: one processor node of the processor graph is assigned to each task node of the task graph)
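Existing schedulers assume a fixed execution speed. Under that assumption, the scheduling objective can be evaluated directly for any given assignment. This sketch makes two assumptions: tasks on the same processor run in topological order, and a task link costs its communication time only when its two endpoints are on different processors. It ignores link contention, and the function name is hypothetical. Trying all P^N assignments of N tasks to P processors with an evaluator like this is feasible only for tiny instances. That is why heuristics are used for this NP-hard problem.

```python
from graphlib import TopologicalSorter

def makespan_fixed_speed(cost, comm, assign):
    """Finish time of the last task when every processor runs at one fixed speed.

    cost:   task -> execution time at the fixed speed
    comm:   (u, v) -> communication time, paid only if u and v are on
            different processors (link contention is ignored here)
    assign: task -> processor node
    """
    preds = {v: [] for v in cost}
    for u, v in comm:
        preds[v].append(u)
    finish = {}
    proc_free = {}  # processor -> time at which it becomes idle
    for v in TopologicalSorter(preds).static_order():
        data_ready = max(
            (finish[u] + (comm[(u, v)] if assign[u] != assign[v] else 0.0)
             for u in preds[v]),
            default=0.0,
        )
        start = max(data_ready, proc_free.get(assign[v], 0.0))
        finish[v] = start + cost[v]
        proc_free[assign[v]] = finish[v]
    return max(finish.values())
```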

Turbo Boost
• Technology for accelerating multicore processors
– Scales operating frequencies according to the usage of each core
– Higher frequency when only a few cores are working
– Lower frequency when all of the cores are working
• Assumptions
– The operating frequency depends only on the usage of each core
• It does not depend on temperature
– The frequency switches instantly, at any time
(Figure: operating frequency of Core0 to Core3 when 2 cores are working and the others are idle)
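Under the two assumptions above, the Turbo Boost part of the model reduces to a lookup from the number of working cores to a frequency. The sketch below uses the frequencies measured for computation-heavy programs on the evaluation CPU: 3.7, 3.5, 3.3 and 3.1 GHz for 1 to 4 working cores. For illustration it also assumes that a task's work is expressed in billions of cycles, so dividing by GHz gives seconds. The function names are hypothetical.

```python
# Measured with computation-heavy programs on the evaluation CPU:
# number of working physical cores -> operating frequency in GHz.
TURBO_GHZ = {1: 3.7, 2: 3.5, 3: 3.3, 4: 3.1}

def operating_frequency(working_cores: int) -> float:
    """Frequency under the model's assumptions: it depends only on how many
    cores are working (not on temperature) and switches instantly."""
    if working_cores not in TURBO_GHZ:
        raise ValueError("working_cores must be between 1 and 4")
    return TURBO_GHZ[working_cores]

def execution_time(gcycles: float, working_cores: int) -> float:
    """Seconds for a task of `gcycles` billion cycles (assumed unit) while
    the number of working cores stays constant."""
    return gcycles / operating_frequency(working_cores)
```

With these values, a task takes about 19% longer when all four cores are working than when it runs alone (3.7 / 3.1 ≈ 1.19).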

Hyper-Threading
• Multiple hardware threads are executed on a single physical processor core
– Hardware resources are shared among the threads
– Reduces the performance penalty caused by pipeline stalls
• We define the effective operating frequency
– A speed index that reflects the reduction in execution speed caused by sharing resources between hardware threads
(Figure: one physical processor core whose shared hardware resources serve Hardware Thread 1 and Hardware Thread 2, each with its own register file, running Thread 1 and Thread 2)

Modelling Frequency Control by Turbo Boost and Hyper-Threading
• The effective operating frequency depends only on
– The number of working processors
– The properties of the executed programs
• We measure beforehand the operating frequencies corresponding to each processor state
– Programs with different properties are executed on multiple cores simultaneously
– We measure the operating frequency on the real devices

Modelling Effective Frequency
• We want to predict the operating frequency from the number of working cores
– We changed the number of working cores
– We measured the operating frequency
Operating frequency when Turbo Boost is enabled (the processor has 4 cores and 8 hardware threads):
State of cores                  Frequency
[c, i] [i, i] [i, i] [i, i]     To be measured
[c, i] [c, i] [i, i] [i, i]     To be measured
Each bracket is one physical core; its two entries are the states of that core's two hardware threads.
i: idle, c: computation heavy, m: memory-access heavy, n: in between c and m

Results of Measurement
Only Turbo Boost is used:
State of cores                  Effective freq.
[c, i] [i, i] [i, i] [i, i]     3.7GHz
[c, i] [c, i] [i, i] [i, i]     3.5GHz
[c, i] [c, i] [c, i] [i, i]     3.3GHz
[c, i] [c, i] [c, i] [c, i]     3.1GHz
[m, i] [i, i] [i, i] [i, i]     3.7GHz
[m, i] [m, i] [m, i] [m, i]     3.1GHz
[n, i] [i, i] [i, i] [i, i]     3.7GHz
[n, i] [n, i] [n, i] [n, i]     3.1GHz
Only Hyper-Threading is used:
State     Freq     Ratio   Effective freq.
[c, c]    3.1GHz   0.84    2.6GHz
[m, m]    3.1GHz   0.76    2.3GHz
[n, n]    3.1GHz   0.79    2.5GHz
State of cores: i: idle, c: computation heavy, m: memory-access heavy, n: in between c and m
• When Turbo Boost is used, the effective frequency depends on the number of working cores
• When Hyper-Threading is used, the effective frequency depends on the properties of the programs
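The two tables can be combined into a single effective-frequency function. The number of working physical cores selects the Turbo Boost frequency, and that frequency is scaled by the Hyper-Threading ratio when the sibling hardware thread is also busy. This is a sketch under stated assumptions. The ratios are assumed not to depend on how many cores are working. Mixed pairs such as [c, m], which the tables do not cover, are rejected rather than guessed. `effective_frequency` is a hypothetical name.

```python
# Measured Turbo Boost frequencies: working physical cores -> GHz.
TURBO_GHZ = {1: 3.7, 2: 3.5, 3: 3.3, 4: 3.1}
# Measured Hyper-Threading ratios for two threads of the same kind.
# Assumption: the same ratio applies at any number of working cores.
HT_RATIO = {("c", "c"): 0.84, ("m", "m"): 0.76, ("n", "n"): 0.79}

def effective_frequency(state, core, thread):
    """state: list of (thread0, thread1) tuples, one per physical core, using
    i (idle), c (computation heavy), m (memory-access heavy), n (in between).
    Returns the effective GHz seen by hardware thread `thread` of `core`."""
    kind = state[core][thread]
    if kind == "i":
        raise ValueError("the requested hardware thread is idle")
    # A physical core counts as working if either of its threads is busy.
    working_cores = sum(1 for pair in state if pair != ("i", "i"))
    ghz = TURBO_GHZ[working_cores]
    sibling = state[core][1 - thread]
    if sibling == "i":
        return ghz  # only Turbo Boost affects this thread
    if (kind, sibling) not in HT_RATIO:
        raise ValueError(f"no measured ratio for [{kind}, {sibling}]")
    return ghz * HT_RATIO[(kind, sibling)]
```

For [c, c] on every core this gives 3.1 × 0.84 ≈ 2.6 GHz, which matches the table. For [m, m] and [n, n] the products are about 2.36 and 2.45 GHz. These are within 0.1 GHz of the reported 2.3 and 2.5 GHz, presumably because the listed ratios are rounded.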

Outline
1. Background
2. Related Works
3. Modelling of Computing System
4. Proposed Method
5. Evaluation
6. Conclusion

Problem in Existing Scheduling
• Tasks are assigned one by one, in order
• A fixed execution speed is assumed
• When the operating frequency changes dynamically
– The operating frequency is high when the first task is assigned
– The frequency drops as more tasks are assigned to cores
(Figure: task graph with tasks a to e and the schedule produced by the existing scheduling technique on Core1 to Core4, with task lengths computed at the frequency for 1-core execution)


(Figure: resulting schedule after task b is assigned; the execution time of a increases because b is assigned)
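The effect in the figure can be reproduced with simple arithmetic. A task's remaining cycles are consumed at whatever frequency is in force, so a frequency drop partway through pushes back its finish time. This is a minimal sketch. It assumes work is measured in billions of cycles, uses the measured 3.7 GHz (1 working core) and 3.5 GHz (2 working cores), and picks a hypothetical start time of 1.0 s for task b. `finish_time` is a hypothetical helper.

```python
def finish_time(gcycles, start, segments):
    """Finish time of a task of `gcycles` billion cycles started at `start` s.

    segments: [(time, GHz), ...] sorted by time. Each frequency holds until
    the next entry, and the last one holds forever. The first entry must not
    be later than `start`.
    """
    if not segments or segments[0][0] > start:
        raise ValueError("segments must cover the start time")
    remaining = gcycles
    t = start
    for i, (_, ghz) in enumerate(segments):
        seg_end = segments[i + 1][0] if i + 1 < len(segments) else float("inf")
        if seg_end <= t:
            continue  # this segment ends before the task starts
        capacity = ghz * (seg_end - t)  # billions of cycles doable in [t, seg_end)
        if capacity >= remaining:
            return t + remaining / ghz
        remaining -= capacity
        t = seg_end
    raise AssertionError("unreachable: the last segment is unbounded")
```

Running alone, a task of 7.4 billion cycles finishes at 2.0 s. If b starts at 1.0 s and the frequency drops to 3.5 GHz, the task finishes at 1.0 + 3.7 / 3.5 ≈ 2.06 s.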

Problem and Solution
Problem
• The execution time of a task changes when subsequent tasks are assigned
• We cannot know which task is assigned to which core until the scheduling is finished
Solution
• Tentatively assign tasks and estimate the operating frequencies

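Estimating Execution Time of Tasks
• Initial assignment of the first task (N1)
(Figure: task graph and processor graph, with a Gantt chart of Core1 to Core4 on Proc 1 and Proc 2 along the execution-time axis; the legend distinguishes communication (comm.), Turbo Boost with 1 core working (TB 1 core), Turbo Boost with 2 cores working (TB 2 cores) and Hyper-Threading (HT))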

• Initial assignment of the second task (N2)
(Figure: the same Gantt chart with N1 and N2 assigned)

Tentative Assignment
• Communication time is calculated using the existing method
• The operating frequency is tentatively fixed
(Figure: Gantt chart with tasks N1 to N5, communications C, and the tentatively assigned tasks)

Determining Operating Freq.
• The operating frequency is determined by observing the usage of each core
(Figure: Gantt chart of tasks N1 to N5 and communications C, showing the operating frequency in each interval)
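Determining the operating frequency from core usage amounts to an event-driven simulation. Between consecutive events (a task starting or finishing, or input data arriving), the set of working cores is fixed, so the frequency is fixed too. This is a sketch under simplifying assumptions that go beyond the slides. Only Turbo Boost is modelled, with one task per physical core and 4 cores. Communication delays are fixed numbers rather than being scheduled on links. Each core's task order is given. All names are hypothetical.

```python
# Measured Turbo Boost table: number of working cores -> GHz (4-core CPU).
TURBO_GHZ = {1: 3.7, 2: 3.5, 3: 3.3, 4: 3.1}

def simulate(work, preds, comm, core_of, order, eps=1e-12):
    """Estimate finish times when the frequency follows the number of busy cores.

    work:    task -> billions of cycles (assumed unit)
    preds:   task -> list of predecessor tasks
    comm:    (u, v) -> seconds of transfer if u and v run on different cores
    core_of: task -> core id
    order:   core id -> tasks in the order that core executes them
    """
    finish = {}
    remaining = dict(work)
    head = {c: 0 for c in order}  # index of the next task in each core's queue
    running = {}                  # core -> task currently executing
    t = 0.0

    def data_arrival(v):
        # Time at which every input of v is available on v's core.
        return max(
            (finish[u] + (comm.get((u, v), 0.0) if core_of[u] != core_of[v] else 0.0)
             for u in preds.get(v, [])),
            default=0.0,
        )

    while len(finish) < len(work):
        waiting = []  # future data-arrival times of tasks that cannot start yet
        for c, queue in order.items():
            if c in running or head[c] >= len(queue):
                continue
            v = queue[head[c]]
            if any(u not in finish for u in preds.get(v, [])):
                continue  # a predecessor has not finished yet
            arrival = data_arrival(v)
            if arrival <= t + eps:
                running[c] = v
                head[c] += 1
            else:
                waiting.append(arrival)
        if not running:
            if not waiting:
                raise ValueError("deadlock: per-core order conflicts with dependencies")
            t = min(waiting)  # jump to the next data arrival
            continue
        ghz = TURBO_GHZ[len(running)]  # one frequency shared by all busy cores
        dt = min(remaining[v] / ghz for v in running.values())
        if waiting:
            dt = min(dt, min(waiting) - t)
        t += dt
        for c, v in list(running.items()):
            remaining[v] -= ghz * dt
            if remaining[v] <= eps:
                finish[v] = t
                del running[c]
    return finish
```

Consider two independent tasks of 7.4 and 7.0 billion cycles on cores 0 and 1. Both run at 3.5 GHz until the second finishes at 2.0 s. The first then speeds up to 3.7 GHz for its last 0.4 billion cycles and finishes at about 2.11 s.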

Selecting Processor to Assign
• The processor is chosen so that the execution time of the whole task graph is minimized
(Figure: Gantt charts of two candidate assignments on a 0 to 10 time axis, marked 9 and 11; the execution times of the whole task graph are compared)
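The selection loop can be sketched independently of how the estimate is computed. Each task is tentatively placed on every candidate processor, and the execution time of the whole task graph is re-estimated with the frequencies that placement implies. The best candidate is then kept. In this sketch the estimator is passed in as a function. `greedy_schedule` and `toy_estimate` are hypothetical. The toy estimator stands in for the frequency-aware estimation and simply slows every processor down as more of them are used.

```python
def greedy_schedule(tasks, candidates, estimate):
    """tasks: tasks in the order they are assigned (e.g. a topological order);
    candidates: processors (cores or hardware threads) a task may go to;
    estimate: maps a partial assignment {task: processor} to the estimated
    execution time of the whole task graph."""
    if not candidates:
        raise ValueError("no candidate processors")
    assignment = {}
    for task in tasks:
        best_proc, best_time = None, float("inf")
        for proc in candidates:
            assignment[task] = proc          # tentative assignment
            makespan = estimate(assignment)  # re-estimate with this placement
            if makespan < best_time:         # ties keep the earlier candidate
                best_proc, best_time = proc, makespan
        assignment[task] = best_proc         # commit the best candidate
    return assignment

# Hypothetical stand-in estimator: each task is one unit of work, and all
# processors slow down as more of them are in use (like Turbo Boost).
SPEED = {1: 1.0, 2: 0.9, 3: 0.8, 4: 0.7}

def toy_estimate(assignment):
    loads = {}
    for proc in assignment.values():
        loads[proc] = loads.get(proc, 0) + 1
    return max(loads.values()) / SPEED[len(loads)]
```

With three unit tasks and three processors, the toy estimator makes the loop spread the tasks over processors 0, 1 and 2. Adding another processor slows things down less than putting two tasks on one processor does.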

Outline
1. Background
2. Related Works
3. Modelling of Computing System
4. Proposed Method
5. Evaluation
6. Conclusion

Evaluation
• Items to be evaluated
– Execution time of the whole task graph
– Accuracy of the estimated execution time
• Compared methods
– We extended Sinnen's scheduling method in a straightforward way
• Sinnen's method only considers network contention
• We replaced its subroutine for calculating task execution time with our model
• The first method assigns tasks only to physical cores (SinnenPhysical)
• The second method assigns tasks to all hardware threads (SinnenLogical)

Configuration
• System
– CPU: Intel Core i7-3770T (2.5 GHz, 4 cores, 8 hardware threads, single socket)
– Memory: 16.0 GB
– OS: Windows 7 SP1 (64-bit)
– Java(TM) SE (1.6.0_21, 64-bit)
– Gigabit LAN subsystem using the Intel 82579V Gigabit Ethernet Controller
• Network
– 4 computers are connected with Gigabit Ethernet
– 420 Mbps bandwidth (measured on the real system)
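Given the measured 420 Mbps, the time a transfer occupies a link follows directly from the data size. This is a minimal sketch. It assumes latency and protocol overhead are already included in the measured bandwidth. It also assumes that transfers sharing one link are sent one after another, in the spirit of the contention model of [1]. The function names are hypothetical.

```python
MEASURED_BPS = 420e6  # bandwidth measured on the real system: 420 Mbps

def transfer_time(num_bytes, bps=MEASURED_BPS):
    """Seconds a transfer of num_bytes occupies one link (overheads are
    assumed to be included in the measured bandwidth)."""
    return num_bytes * 8 / bps

def sequential_finish_times(sizes, start=0.0, bps=MEASURED_BPS):
    """Finish times of transfers that share one link and are sent one after
    another, beginning at `start` seconds."""
    t, finishes = start, []
    for size in sizes:
        t += transfer_time(size, bps)
        finishes.append(t)
    return finishes
```

For example, 1 MB (10^6 bytes) takes 8 × 10^6 / (420 × 10^6) ≈ 0.019 s.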

Network Topologies for Evaluation
• Star: 16 processors, 5 switches
• Mesh: 16 processors, 4 switches (only for simulation)
• Tree: 16 processors, 7 switches
(Figure: the three topologies, each connecting four groups of cores numbered 1 to 4)

Task Graphs for Evaluation
• High parallelism (Matrix Solver)
– 98 nodes
– 67 edges
• Low parallelism (Robot Control)
– 90 nodes
– 135 edges
T. Tobita and H. Kasahara: "A standard task graph set for fair evaluation of multi-processor scheduling algorithms," Journal of Scheduling, Vol. 5, pp. 379-394, 2002.

Execution Time of Whole Task Graph
• Execution time is reduced by up to 35%
• Higher parallelism leads to a larger reduction
– Because there is more freedom in choosing the core to which each task is assigned
(Figure: bar chart of task execution time (sec) for SinnenPhysical, SinnenLogical and Proposed on the Robot and Solver task graphs with the Tree and Star topologies; reductions of 16%, 35%, 9% and 24% are annotated)

Accuracy of Estimation of Execution Time
• Estimation error is 4% on average and 7% at maximum
(Figure: bar chart of task execution time (sec) obtained by simulation and on the experiment system for SinnenPhysical, SinnenLogical and Proposed on the Robot and Solver task graphs with the Tree and Star topologies; annotated errors range from 1% to 7%)
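The slides do not state how the estimation error is defined. This sketch assumes a natural reading: the relative difference between the simulated and the measured execution time, averaged and maximized over all configurations. No values are taken from the chart, and the function names are hypothetical.

```python
def relative_error(estimated: float, measured: float) -> float:
    """Assumed definition: |estimated - measured| / measured."""
    return abs(estimated - measured) / measured

def error_summary(pairs):
    """pairs: iterable of (estimated, measured) execution times in seconds.
    Returns (average error, maximum error) as fractions."""
    errors = [relative_error(est, meas) for est, meas in pairs]
    if not errors:
        raise ValueError("no measurements")
    return sum(errors) / len(errors), max(errors)
```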

Conclusion
• Task scheduling method for multicore processor systems
– Considering dynamic scaling of the operating frequency by Turbo Boost and Hyper-Threading
– Considering network contention
• We modeled frequency control by Turbo Boost and Hyper-Threading

Yosuke Wakisaka, Naoki Shibata, Keiichi Yasumoto, Minoru Ito, and Junji Kitamichi: "Task Scheduling Algorithm for Multicore Processor Systems with Turbo Boost and Hyper-Threading," In Proc. of The 2014 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'14), pp. 229-235, 2014-07-21.
