Mårten Rånge
WCOM AB

@marten_range
Concurrency
Examples for .NET
Concurrency scalability
Responsive
Performance
Scalable algorithms
Three pillars of Concurrency
- Scalability (CPU)
  - Parallel.For
- Responsiveness
  - Task/Future
  - async/await
- Consistency
  - lock/synchronized
  - Interlocked.*
  - Mutex/Event/Semaphore
  - Monitor
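The consistency primitives listed above can be compared on a shared counter. This is a minimal sketch, not code from the talk: the class and method names are made up for illustration. It shows an atomic increment, a locked increment, and an unsynchronized increment that can lose updates.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; not part of the original slides.
public static class ConsistencyDemo
{
    // Atomic increment: correct without taking a lock.
    public static int CountWithInterlocked(int iterations)
    {
        var count = 0;
        Parallel.For(0, iterations, i => Interlocked.Increment(ref count));
        return count;
    }

    // Locked increment: correct, but every iteration is serialized on the lock.
    public static int CountWithLock(int iterations)
    {
        var gate = new object();
        var count = 0;
        Parallel.For(0, iterations, i => { lock (gate) { ++count; } });
        return count;
    }

    // Unsynchronized read-modify-write: updates can be lost, so the
    // result can be lower than the number of iterations.
    public static int CountUnsynchronized(int iterations)
    {
        var count = 0;
        Parallel.For(0, iterations, i => { ++count; });
        return count;
    }

    public static void Main()
    {
        const int Iterations = 1000000;
        Console.WriteLine("Interlocked:    " + CountWithInterlocked(Iterations));
        Console.WriteLine("lock:           " + CountWithLock(Iterations));
        Console.WriteLine("unsynchronized: " + CountUnsynchronized(Iterations));
    }
}
```

The first two variants always report the full iteration count. The third may report less, and how much less varies from run to run.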
Scalability
Which is fastest?
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
    ints[inner] = random.Next();
}
// -----------------------------------------------
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
    0,
    InnerLoop,
    i => ints[i] = random.Next()
    );
SHARED STATE: Race condition
In the Parallel.For version, every iteration calls Next() on the same Random instance. Random is not thread-safe, so concurrent calls race on its internal state.
SHARED STATE: Poor performance
In the Parallel.For version, every core reads and writes the state of the same Random object, so the cores contend for the same memory instead of working independently.
Then and now
Metric          VAX-11/750 (’80)   Today                 Improvement
MHz             6                  3300                  550x
Memory MB       2                  16384                 8192x
Memory MB/s     13                 R ~10000 / W ~2500    770x / 190x
Memory nsec     225                70                    3x
Memory cycles   1.4                210                   -150x
299,792,458 m/s
Speed of light is too slow
0.09 m per clock cycle (at 3300 MHz)
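The per-cycle figure follows from dividing the speed of light by the clock frequency. The sketch below redoes that arithmetic with the 3300 MHz and 6 MHz clock rates from the "Then and now" table. The class and method names are assumptions for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not part of the original slides.
public static class LightPerCycle
{
    public const double SpeedOfLightMetersPerSecond = 299792458.0;

    // Distance light travels in a vacuum during one clock cycle.
    public static double MetersPerCycle(double clockMHz)
    {
        return SpeedOfLightMetersPerSecond / (clockMHz * 1e6);
    }

    public static void Main()
    {
        // Today's 3300 MHz clock: roughly 0.09 m per cycle.
        Console.WriteLine(MetersPerCycle(3300).ToString("F3", CultureInfo.InvariantCulture));
        // The 6 MHz VAX-11/750: roughly 50 m per cycle.
        Console.WriteLine(MetersPerCycle(6).ToString("F1", CultureInfo.InvariantCulture));
    }
}
```

At 6 MHz a signal could in principle cross a machine room within one cycle. At 3300 MHz it covers about 9 cm, which limits how far away memory can sit before a round trip costs several cycles.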
99% - latency mitigation
1% - computation
2 Core CPU
CPU1    CPU2
L1      L1
L2      L2
     L3
    RAM
2 Core CPU – L1 Cache
CPU1    CPU2
L1      L1
new Random ()
new int[InnerLoop]
[Diagram sequence over several slides: CPU1 and CPU2, each with its own L1 cache; the Random object is shown at both cores.]
4 Core CPU – L1 Cache
CPU1    CPU2    CPU3    CPU4
L1      L1      L1      L1
new Random ()
new int[InnerLoop]
2x4 Core CPU
CPU1 CPU2 CPU3 CPU4    CPU5 CPU6 CPU7 CPU8
L1   L1   L1   L1      L1   L1   L1   L1
L2   L2   L2   L2      L2   L2   L2   L2
         L3                     L3
                   RAM
Solution 1 – Locks
var ints = new int[InnerLoop];
var random = new Random();
Parallel.For(
    0,
    InnerLoop,
    i => { lock (ints) { ints[i] = random.Next(); } }
    );
Solution 2 – No sharing
var ints = new int[InnerLoop];
Parallel.For(
    0,
    InnerLoop,
    () => new Random(),
    (i, pls, random) =>
        { ints[i] = random.Next(); return random; },
    random => {}
    );
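A self-contained variant of the no-sharing approach is sketched below. One detail is an assumption added here and is not on the slide: each worker's Random gets an explicit, distinct seed. On .NET Framework the parameterless constructor seeds from the system clock, so instances created at nearly the same moment can produce identical sequences. The names NoSharingDemo, Fill and baseSeed are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class; not part of the original slides.
public static class NoSharingDemo
{
    public static int[] Fill(int length, int baseSeed)
    {
        var ints = new int[length];
        var seed = baseSeed;
        Parallel.For(
            0,
            length,
            // localInit: runs once per worker task, so each worker
            // gets its own Random with its own seed (an assumption
            // added here; the slide uses new Random()).
            () => new Random(Interlocked.Increment(ref seed)),
            // body: uses only the worker-local Random.
            (i, pls, random) =>
            {
                ints[i] = random.Next();
                return random;
            },
            // localFinally: nothing to merge or dispose.
            random => { });
        return ints;
    }

    public static void Main()
    {
        var ints = Fill(1000000, 42);
        Console.WriteLine("filled " + ints.Length + " elements");
    }
}
```

No lock is needed because no Random instance is ever visible to more than one worker, and each array element is written by exactly one iteration.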
Parallel.For adds overhead
Level0
    Level1
        Level2: ints[0], ints[1]
        Level2: ints[2], ints[3]
    Level1
        Level2: ints[4], ints[5]
        Level2: ints[6], ints[7]
Solution 3 – Less overhead
var ints = new int[InnerLoop];
Parallel.For(
    0,
    InnerLoop / Modulus,    // assumes InnerLoop is a multiple of Modulus; otherwise the tail is not filled
    () => new Random(),
    (i, pls, random) =>
    {
        // Each iteration fills one chunk of Modulus elements
        var begin = i * Modulus;
        var end   = begin + Modulus;
        for (var iter = begin; iter < end; ++iter)
        {
            ints[iter] = random.Next();
        }
        return random;
    },
    random => {}
    );
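The same chunking can also be expressed with a range partitioner, which handles a final chunk shorter than the chunk size. This swaps in a different technique from the slide, Partitioner.Create with Parallel.ForEach instead of manual index arithmetic, and is only a sketch. ChunkedFillDemo, Fill and chunkSize are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical demo class; not part of the original slides.
public static class ChunkedFillDemo
{
    public static int[] Fill(int length, int chunkSize)
    {
        var ints = new int[length];
        // Partitioner.Create rejects an empty range, so handle it here.
        if (length == 0) return ints;

        Parallel.ForEach(
            // Splits [0, length) into ranges of at most chunkSize elements.
            Partitioner.Create(0, length, chunkSize),
            () => new Random(),
            (range, pls, random) =>
            {
                // range.Item1 is inclusive, range.Item2 is exclusive.
                for (var iter = range.Item1; iter < range.Item2; ++iter)
                {
                    ints[iter] = random.Next();
                }
                return random;
            },
            random => { });
        return ints;
    }

    public static void Main()
    {
        // 1003 is deliberately not a multiple of the chunk size.
        var ints = Fill(1003, 100);
        Console.WriteLine("filled " + ints.Length + " elements");
    }
}
```

As with the slide's version, the scheduling overhead is paid once per chunk, not once per element.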
var ints = new int[InnerLoop];
var random = new Random();
for (var inner = 0; inner < InnerLoop; ++inner)
{
    ints[inner] = random.Next();
}
Solution 4 – Independent runs
var tasks = Enumerable.Range (0, 8).Select (
    i => Task.Factory.StartNew (
        () =>
        {
            var ints = new int[InnerLoop];
            var random = new Random ();
            while (counter.CountDown ())
            {
                for (var inner = 0; inner < InnerLoop; ++inner)
                {
                    ints[inner] = random.Next();
                }
            }
        },
        TaskCreationOptions.LongRunning))
    .ToArray ();
Task.WaitAll (tasks);
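The slide does not show how `counter` is defined. One possible shape is a shared countdown that hands out a fixed number of runs with an atomic decrement. The sketch below assumes that shape: SharedCountdown, IndependentRunsDemo and the `completed` tally are hypothetical and exist only to make the example checkable.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the `counter` object used on the slide.
public sealed class SharedCountdown
{
    private int remaining;

    public SharedCountdown(int count) { remaining = count; }

    // Returns true while there is still a run left to claim.
    public bool CountDown()
    {
        return Interlocked.Decrement(ref remaining) >= 0;
    }
}

public static class IndependentRunsDemo
{
    // Returns the number of runs actually executed across all workers.
    public static int Run(int workers, int totalRuns, int innerLoop)
    {
        var counter = new SharedCountdown(totalRuns);
        var completed = 0;
        var tasks = Enumerable.Range(0, workers).Select(
            i => Task.Factory.StartNew(
                () =>
                {
                    // Each worker owns its array and its Random.
                    var ints = new int[innerLoop];
                    var random = new Random();
                    while (counter.CountDown())
                    {
                        for (var inner = 0; inner < innerLoop; ++inner)
                        {
                            ints[inner] = random.Next();
                        }
                        Interlocked.Increment(ref completed);
                    }
                },
                TaskCreationOptions.LongRunning))
            .ToArray();
        Task.WaitAll(tasks);
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("runs completed: " + Run(8, 1000, 10000));
    }
}
```

The only shared state is the countdown, which is touched once per run instead of once per element.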
Parallel.For
- Only for CPU-bound problems
Sharing is bad
- Kills performance
- Race conditions
- Deadlocks
Cache locality
- RAM is a misnomer
- Class design
- Avoid GC
Natural concurrency
- Avoid Parallel.For
Act like an engineer
- Measure before and after
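Measuring before and after can be as simple as timing both variants with Stopwatch. The harness below is a minimal sketch under assumptions: MeasureDemo and AverageTime are hypothetical names, and the repetition count and array size are arbitrary. A careful benchmark would also control for GC activity and run-to-run variance.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical timing harness; not part of the original slides.
public static class MeasureDemo
{
    // Runs the action once as warm-up, then returns the average
    // wall-clock time over `repetitions` timed runs.
    public static TimeSpan AverageTime(Action action, int repetitions)
    {
        action(); // warm-up, so JIT compilation is not measured
        var stopwatch = Stopwatch.StartNew();
        for (var r = 0; r < repetitions; ++r)
        {
            action();
        }
        stopwatch.Stop();
        return TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / repetitions);
    }

    public static void Main()
    {
        const int InnerLoop = 1000000;
        var ints = new int[InnerLoop];

        var sequential = AverageTime(() =>
        {
            var random = new Random();
            for (var inner = 0; inner < InnerLoop; ++inner)
            {
                ints[inner] = random.Next();
            }
        }, 10);

        var parallel = AverageTime(() =>
        {
            Parallel.For(
                0,
                InnerLoop,
                () => new Random(),
                (i, pls, random) => { ints[i] = random.Next(); return random; },
                random => { });
        }, 10);

        Console.WriteLine("sequential:   " + sequential.TotalMilliseconds + " ms");
        Console.WriteLine("Parallel.For: " + parallel.TotalMilliseconds + " ms");
    }
}
```

The numbers depend on the machine, the runtime and the build configuration, so the comparison is only meaningful when both variants are measured in the same environment.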
One more thing…
http://tinyurl.com/wcom-cpuscalability