Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads 
Aleksandar Prokopec 
Martin Odersky 
Uniform workload
(0 until 10000000) reduce (_ + _)
sum = sum + x
[Figure: cycles spent per element, over elements 0 to N]
Baseline workload
for (i <- 0 until 10000000) {}
[Figure: cycles spent per element, over elements 0 to N]
Irregular workload
for {
  x <- 0 until width
  y <- 0 until height
} image(x, y) = compute(x, y)
[Figure: cycles spent per element, over elements 0 to N]
Workload function
workload(n): work spent on element n, as measured after the data-parallel operation has completed
Could be runtime-value dependent:
for {
  x <- 0 until width
  y <- 0 until height
} img(x, y) = compute(x, y)
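A minimal sketch of a runtime-value-dependent workload, assuming compute is a Mandelbrot-style escape-time function. The slides do not say what compute does, so the object name, the extra parameters and the coordinate mapping are all hypothetical. The point is only that the iteration count, and so workload(n), depends on the pixel's value and is not known before the loop runs.

```scala
object MandelbrotWorkload {
  // Hypothetical stand-in for `compute`: returns the number of iterations
  // performed before the point escapes (or maxIter if it never does).
  def compute(x: Int, y: Int, width: Int, height: Int, maxIter: Int): Int = {
    // Assumed mapping of pixel (x, y) onto the square [-2, 1] x [-1.5, 1.5].
    val cr = -2.0 + 3.0 * x / width
    val ci = -1.5 + 3.0 * y / height
    var zr = 0.0
    var zi = 0.0
    var i = 0
    while (i < maxIter && zr * zr + zi * zi <= 4.0) {
      val t = zr * zr - zi * zi + cr
      zi = 2 * zr * zi + ci
      zr = t
      i += 1
    }
    i
  }
}
```

With a 300 x 300 image, the corner pixel (0, 0) escapes after a single iteration, while pixel (200, 150) maps to the origin and runs for all maxIter iterations.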
Could be execution-schedule dependent:
for (n <- nodes)
  n.neighbours += new Node
Could be totally random:
for ((x, y) <- img.indices)
  img(x, y) = sample(x + random(), y + random())
Data-parallel scheduler
Assign loop elements to workers without knowledge about the workload function.
1. Linear speedup for the baseline workload
2. Optimal speedup for irregular workloads
Static batching
Decides on the worker-element assignment before the data-parallel operation begins.
No knowledge → divide uniformly.
Not optimal for even mildly irregular workloads.
[Figure: cycles spent per element, over elements 0 to N]
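A minimal sketch of static batching, assuming the loop range is [0, n) and the split is into contiguous, near-equal batches. The object and method names are illustrative, not taken from the slides.

```scala
object StaticBatching {
  /** Splits [0, n) into `workers` contiguous batches whose sizes differ by
    * at most one. The assignment is fixed before any element is processed. */
  def batches(n: Int, workers: Int): Seq[(Int, Int)] =
    (0 until workers).map { w =>
      // Long arithmetic avoids overflow of w * n for large ranges.
      val from = (w.toLong * n / workers).toInt
      val until = ((w + 1).toLong * n / workers).toInt
      (from, until)
    }
}
```

Because the split ignores the workload function, the operation finishes only when the worker with the most expensive batch finishes, which is why an irregular workload hurts this scheme.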
Fixed-size batching
Workload-driven: decides during execution.
[Figure: cycles spent per element, over elements 0 to N, with a shared progress counter. Workers T0 and T1 claim batches by CAS on the counter, which advances 0, 2, 4, 6, 8, 10, 12.]
Pros: lightweight
Cons: minimum batch size, contention
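The counter-and-CAS scheme above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the names claim and parallelSum are hypothetical, and summing the indices stands in for the loop body.

```scala
import java.util.concurrent.atomic.AtomicInteger

object FixedSizeBatching {
  /** Claims the next batch [from, until) from the shared counter by CAS,
    * or returns None once all n elements have been claimed. */
  @annotation.tailrec
  def claim(progress: AtomicInteger, n: Int, batch: Int): Option[(Int, Int)] = {
    val from = progress.get
    if (from >= n) None
    else {
      val until = math.min(from + batch, n)
      if (progress.compareAndSet(from, until)) Some((from, until))
      else claim(progress, n, batch) // another worker won the CAS: retry
    }
  }

  /** Sums 0 until n with `workers` threads that all share one counter. */
  def parallelSum(n: Int, batch: Int, workers: Int): Long = {
    val progress = new AtomicInteger(0)
    val partial = new Array[Long](workers) // slot w is written only by worker w
    val threads = (0 until workers).map { w =>
      new Thread(() => {
        var next = claim(progress, n, batch)
        while (next.nonEmpty) {
          val (from, until) = next.get
          var i = from
          while (i < until) { partial(w) += i; i += 1 }
          next = claim(progress, n, batch)
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    partial.sum
  }
}
```

Every batch costs one CAS on the same memory location, so small batches mean more contention and large batches mean coarser load balancing. That is the trade-off listed under Cons.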
Fixed-size batching - contention 
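Factoring, guided self-scheduling (GSS), trapezoid self-scheduling (TS)
Batch size varies.
[Figure: cycles spent per element, over elements 0 to N, with a shared progress counter]
Pros: lightweight
Cons: contention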
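As one example of a varying batch size, guided self-scheduling is usually described as handing out ceil(remaining / P) elements per request, for P workers. The sketch below only computes the resulting batch sizes. The names are illustrative and the slides do not give the formula.

```scala
object GuidedSelfScheduling {
  /** Batch sizes handed out for n elements and p workers, assuming each
    * request receives ceil(remaining / p) elements. */
  def batchSizes(n: Int, p: Int): List[Int] = {
    val sizes = List.newBuilder[Int]
    var remaining = n
    while (remaining > 0) {
      val size = (remaining + p - 1) / p // integer ceil(remaining / p)
      sizes += size
      remaining -= size
    }
    sizes.result()
  }
}
```

Batches start large and shrink toward the end of the range, but every request still goes through the same shared counter, hence the contention.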
Task-based work-stealing
[Figure: cycles spent per element, over elements 0 to N. The range 0..16 is split into tasks 0..2, 2..4, 4..8, 8..16 across workers T0 and T1; after a steal, 8..16 is split further into 8..10, 10..12, 12..16.]
steal: a rare event
Pros: can be adaptive, uses stealing information
Cons: heavyweight, minimum batch size much larger
A task cannot be stolen after T0 starts processing it.
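The task ranges in the figure are consistent with repeated halving: keep the left half, push the right half as a stealable task, and stop at a threshold. A minimal sketch under that assumption, with a hypothetical split helper:

```scala
object TaskSplitting {
  /** Repeatedly halves [from, until), keeping the left half and recording
    * the right half as a task, until the left part is at most `threshold`
    * elements long. Returns the tasks in processing order. */
  def split(from: Int, until: Int, threshold: Int): List[(Int, Int)] = {
    var tasks = List.empty[(Int, Int)]
    var hi = until
    while (hi - from > threshold) {
      val mid = from + (hi - from) / 2
      tasks = (mid, hi) :: tasks // right half becomes a separate task
      hi = mid
    }
    (from, hi) :: tasks
  }
}
```

Splitting 0..16 with threshold 2 gives 0..2, 2..4, 4..8, 8..16, and a thief that takes 8..16 and splits it again gets 8..10, 10..12, 12..16, matching the ranges on the slide.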
Work-stealing tree
Each node is drawn with four fields, read here as (start, progress, owner, end).
owned: (0, 0, T0, N)
owned: (0, 50, T0, N) after T0: CAS
completed: (0, N, T0, N) after T0: CAS
What about stealing?
stolen: (0, -51, T0, N) after T1: CAS
expanded: (0, -51, T0, N) with children (50, 50, T0, M) and (M, M, T1, N), after T0 or T1: CAS
M = (50 + N) / 2
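The node states above can be sketched as follows. This is an illustration under assumptions: the field and method names (progress, children, advance, steal, expand) are hypothetical, and the stolen encoding -p - 1 is inferred from the slide, where progress 50 becomes -51.

```scala
import java.util.concurrent.atomic.{AtomicInteger, AtomicReference}

object WorkStealingNode {
  final class Node(val start: Int, val end: Int, val owner: Int) {
    val progress = new AtomicInteger(start)
    val children = new AtomicReference[(Node, Node)](null)

    /** Owner claims up to `step` elements by CAS. Returns the claimed range,
      * or None if the node is completed or has been stolen. */
    @annotation.tailrec
    def advance(step: Int): Option[(Int, Int)] = {
      val p = progress.get
      if (p < 0 || p >= end) None
      else {
        val np = math.min(p + step, end)
        if (progress.compareAndSet(p, np)) Some((p, np)) else advance(step)
      }
    }

    /** Marks the node stolen by replacing progress p with -p - 1.
      * Returns false if the node was already completed. */
    @annotation.tailrec
    def steal(): Boolean = {
      val p = progress.get
      if (p < 0) true
      else if (p >= end) false
      else if (progress.compareAndSet(p, -p - 1)) true
      else steal()
    }

    /** Expands a stolen node: the remaining range is halved between the
      * original owner and the thief. Only one CAS installs the children;
      * the pair that was installed is returned. None if not stolen. */
    def expand(thief: Int): Option[(Node, Node)] = {
      val p = progress.get
      if (p >= 0) None
      else {
        val from = -p - 1
        val mid = from + (end - from) / 2 // same value as (from + end) / 2
        children.compareAndSet(null, (new Node(from, mid, owner), new Node(mid, end, thief)))
        Some(children.get)
      }
    }
  }
}
```

With N = 100, advancing to 50 and then stealing leaves progress at -51, and expansion yields children (50, 75) for the owner and (75, 100) for the thief, so M = (50 + 100) / 2 = 75. Using -p - 1 rather than -p keeps a node stolen at progress 0 distinguishable from an untouched one.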
Work-stealing tree - contention 
Work-stealing tree scheduling
1) find a non-expanded, non-completed node
2) if not found, terminate
3) if not owned, steal and/or expand, and descend
4) advance until node is completed or stolen
5) go to 1)
Choosing the node to steal
[Figure: a tree with nodes labeled 2, 9, 5 and 3, repeated for each strategy]
Find first, in-order traversal: catastrophic, a lot of stealing, huge trees.
Find first, random order traversal: works reasonably well.
Find most elements: generates the fewest nodes, seems to be best.
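A minimal sketch contrasting find-first with find-most on a simplified, immutable tree. The Tree, Leaf and Inner types are hypothetical, and reading the labels 2, 9, 5, 3 as remaining element counts is an assumption about the figure.

```scala
object FindMost {
  sealed trait Tree
  final case class Leaf(remaining: Int) extends Tree
  final case class Inner(left: Tree, right: Tree) extends Tree

  /** In-order "find first": the first leaf that still has elements. */
  def findFirst(t: Tree): Option[Leaf] = t match {
    case l: Leaf     => if (l.remaining > 0) Some(l) else None
    case Inner(a, b) => findFirst(a).orElse(findFirst(b))
  }

  /** "Find most elements": the leaf with the most remaining elements. */
  def findMost(t: Tree): Option[Leaf] = t match {
    case l: Leaf => if (l.remaining > 0) Some(l) else None
    case Inner(a, b) =>
      (findMost(a), findMost(b)) match {
        case (Some(x), Some(y)) => Some(if (y.remaining > x.remaining) y else x)
        case (x, y)             => x.orElse(y)
      }
  }
}
```

On a tree with leaves 2, 9, 5, 3, find-first picks the leaf with 2 elements and find-most picks the one with 9. Stealing the larger node leaves more work per steal, which fits the observation that it generates the fewest nodes.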
Comparison with fixed-size batching
Comparison with task work-stealing
Thank you! Questions?
Finding work
Other workloads
