Multicore Programming
Agenda
• Part 1 – Current state of affairs
• Part 2 – Multithreaded algorithms
• Part 3 – Task Parallel Library
Multicore Programming
Part 1: Current state of affairs
Why Moore's law is not working anymore
• Power consumption
• Wire delays
• DRAM access latency
• Diminishing returns of more instruction-level parallelism
Power consumption

[Figure: power density (W/cm²) on a log scale from 1 to 10,000, plotted against year ('70 to '10) for the 8080, 386, 486, and Pentium® processors, with reference levels for Hot Plate, Nuclear Reactor, Rocket Nozzle, and Sun's Surface.
Source: Pat Gelsinger, Intel Developer Forum, Spring 2004]
Wire delays
DRAM access latency
Diminishing returns
• '80s: 10 CPI → 1 CPI (CPI = cycles per instruction)
• '90s: 1 CPI → 0.5 CPI
• '00s: multicore
The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software
        -- Herb Sutter
Survival
• To scale performance, put many processing cores on the microprocessor chip
• The new edition of Moore's law is about doubling the number of cores
Quotations
• No matter how fast processors get, software consistently finds new ways to eat up the extra speed
• "If you haven't done so already, now is the time to take a hard look at the design of your application, determine what operations are CPU-sensitive now or are likely to become so soon, and identify how those places could benefit from concurrency."
        -- Herb Sutter, C++ Architect at Microsoft (March 2005)
• "After decades of single core processors, the high volume processor industry has gone from single to dual to quad-core in just the last two years. Moore's Law scaling should easily let us hit the 80-core mark in mainstream processors within the next ten years and quite possibly even less."
        -- Justin Rattner, CTO, Intel (February
What keeps us away from multicore
• A sequential way of thinking
• The belief that parallel programming is difficult and error-prone
• Unwillingness to accept that the sequential era is over
• Neglecting performance
What has been done
• Many frameworks have been created that bring parallelism to the application level
• Vendors try hard to teach the programming community how to write parallel programs
• MIT and other education centers have done a lot of research in this area
Multicore Programming
Part 2: Multithreaded algorithms
Chapter 27: Multithreaded Algorithms (Introduction to Algorithms, 3rd edition)
Multithreaded algorithms
• No single architecture of parallel computers → no single, widely accepted model of parallel computing
• We rely on a parallel shared-memory computer
Dynamic multithreaded model (DMM)
• Lets the programmer express "logical parallelism" without worrying about the issues of static threading
• Two main features:
  - Nested parallelism (the parent can proceed while a spawned child computes its result)
  - Parallel loops (iterations of the loop can execute concurrently)
DMM - advantages
• A simple extension of the serial model: only 3 new keywords (parallel, spawn, and sync)
• Provides a theoretically clean way of quantifying parallelism, based on the notions of "work" and "span"
• Many MT algorithms based on nested parallelism follow naturally from the divide-and-conquer approach
Multithreaded execution model
Work
Span
Speedup
Parallelism
Performance summary
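These five slides are images in the original deck; in the standard CLRS notation they presumably define the following quantities, stated here for reference:

  work:        $T_1$, the total running time on one processor
  span:        $T_\infty$, the length of the critical path (the longest chain of dependent computations)
  speedup:     $T_1 / T_P$ on $P$ processors, which can never exceed $P$
  parallelism: $T_1 / T_\infty$, the maximum possible speedup
  work law:    $T_P \ge T_1 / P$;  span law: $T_P \ge T_\infty$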
Example: fib(4)
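The fib(4) slide presumably shows the computation DAG of the CLRS P-Fib procedure. A minimal C# sketch of the same spawn/sync structure, using the TPL tasks introduced in Part 3 (the mapping of spawn onto a Task is my illustration, not code from the deck):

using System;
using System.Threading.Tasks;

class ParallelFib
{
    // P-Fib in TPL terms: "spawn" becomes starting a Task,
    // "sync" becomes waiting on its Result.
    static long Fib(int n)
    {
        if (n < 2) return n;                               // base case
        var x = Task.Factory.StartNew(() => Fib(n - 1));   // spawn Fib(n - 1)
        long y = Fib(n - 2);                               // runs concurrently with the child
        return x.Result + y;                               // sync: wait for the spawned child
    }

    static void Main() => Console.WriteLine(Fib(4));       // fib(4) = 3
}

Spawning a task per call is far too fine-grained to be fast in practice; the point is only to mirror the structure of the DAG on the slide.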
Scheduler role
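The slide content is an image; the standard result it presumably presents is the greedy-scheduler bound from CLRS: on $P$ processors a greedy scheduler achieves $T_P \le T_1/P + T_\infty$. Since any schedule must satisfy both the work law $T_P \ge T_1/P$ and the span law $T_P \ge T_\infty$, a greedy scheduler is always within a factor of 2 of optimal.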
Analyzing MT algorithms: Matrix multiplication

P-Square-Matrix-Multiply(A, B):
1. n = A.rows
2. let C be a new n × n matrix
3. parallel for i = 1 to n
4.     parallel for j = 1 to n
5.         c_ij = 0
6.         for k = 1 to n
7.             c_ij = c_ij + a_ik · b_kj
8. return C
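As a hedged sketch, here is the same algorithm in C# using the TPL's Parallel.For (covered in Part 3). Only the outer loop is parallelized, which is usually enough to occupy all cores:

using System.Threading.Tasks;

static double[,] PSquareMatrixMultiply(double[,] A, double[,] B)
{
    int n = A.GetLength(0);                      // n = A.rows
    var C = new double[n, n];                    // let C be a new n × n matrix
    Parallel.For(0, n, i =>                      // parallel for i: rows are
    {                                            // partitioned across worker threads
        for (int j = 0; j < n; j++)
        {
            double sum = 0;
            for (int k = 0; k < n; k++)          // serial inner loop, as in the pseudocode
                sum += A[i, k] * B[k, j];
            C[i, j] = sum;                       // each thread writes disjoint elements,
        }                                        // so no locking is needed
    });
    return C;
}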
Analyzing MT algorithms: Matrix multiplication
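The analysis slide is an image; the standard CLRS analysis of P-Square-Matrix-Multiply is: work $T_1(n) = \Theta(n^3)$ (three nested loops of $n$ iterations each); span $T_\infty(n) = \Theta(\lg n) + \Theta(\lg n) + \Theta(n) = \Theta(n)$ (each parallel for contributes $\Theta(\lg n)$ of control depth, and the serial inner loop contributes $\Theta(n)$); hence parallelism $T_1/T_\infty = \Theta(n^2)$, which is enormous for any realistic matrix size.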
Chess Lesson
Multicore Programming
Part 3: Task Parallel Library
TPL building blocks
• Consists of:
  - Tasks
  - Thread-safe scalable collections
  - Phases and work exchange
  - Partitioning
  - Looping
  - Control
  - Breaking
  - Exceptions
  - Results
Data parallelism

Parallel.ForEach(letters, ch => Capitalize(ch));
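A self-contained sketch of the slide's one-liner (letters and Capitalize are illustrative names from the slide, not TPL APIs):

using System;
using System.Threading.Tasks;

class DataParallelismDemo
{
    static char Capitalize(char ch) => char.ToUpper(ch);   // stand-in for the slide's Capitalize

    static void Main()
    {
        var letters = new[] { 'a', 'b', 'c', 'd', 'e' };
        // Data parallelism: the same operation is applied to every element,
        // and the TPL partitions the input across thread-pool workers.
        // Output order is nondeterministic.
        Parallel.ForEach(letters, ch => Console.Write(Capitalize(ch)));
    }
}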
Task parallelism

Parallel.Invoke(() => Average(), () => Minimum(), …);
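A self-contained sketch (Average and Minimum stand in for any independent operations):

using System;
using System.Linq;
using System.Threading.Tasks;

class TaskParallelismDemo
{
    static void Main()
    {
        int[] data = Enumerable.Range(1, 1000000).ToArray();
        double average = 0;
        int minimum = 0;

        // Task parallelism: different, independent operations run concurrently;
        // Parallel.Invoke returns when all of them have finished.
        Parallel.Invoke(
            () => average = data.Average(),
            () => minimum = data.Min());

        Console.WriteLine("average = " + average + ", minimum = " + minimum);
    }
}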
Thread Pool in .NET 3.5
Thread Pool in .NET 4.0
Task Scheduler & Thread pool
• .NET 3.5 ThreadPool.QueueUserWorkItem disadvantages:
  - Zero information about each work item
  - Fairness: a single FIFO queue must be maintained
• .NET 4.0 improvements:
  - More efficient FIFO queue (ConcurrentQueue)
  - Enhanced API that gets more information from the user:
    - Task
    - Work stealing
    - Thread injection
    - Waiting for completion, handling exceptions, getting the computation result
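A sketch of the difference (the summing lambda is illustrative):

using System;
using System.Threading;
using System.Threading.Tasks;

class SchedulerDemo
{
    static void Main()
    {
        // .NET 3.5 style: fire-and-forget; the pool gets no information about
        // the work item, and the caller gets no handle for waiting, results,
        // or exceptions. (It may not even run before the process exits.)
        ThreadPool.QueueUserWorkItem(_ => Console.WriteLine("work item ran"));

        // .NET 4.0 style: a Task is a first-class handle to the work.
        Task<int> sum = Task.Factory.StartNew(() =>
        {
            int s = 0;
            for (int i = 1; i <= 100; i++) s += i;
            return s;
        });

        try
        {
            Console.WriteLine(sum.Result);    // wait for completion: prints 5050
        }
        catch (AggregateException ex)         // exceptions are captured in the task
        {                                     // and rethrown when observed
            Console.WriteLine(ex.InnerException);
        }
    }
}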
New Primitives
• Thread-safe, scalable collections
  - IProducerConsumerCollection<T>
    - ConcurrentQueue<T>
    - ConcurrentStack<T>
    - ConcurrentBag<T>
    - ConcurrentDictionary<TKey,TValue>
• Phases and work exchange
  - Barrier
  - BlockingCollection<T>
  - CountdownEvent
• Partitioning
  - {Orderable}Partitioner<T>
    - Partitioner.Create
• Exception handling
  - AggregateException
• Initialization
  - Lazy<T>
    - LazyInitializer.EnsureInitialized<T>
  - ThreadLocal<T>
• Locks
  - ManualResetEventSlim
  - SemaphoreSlim
  - SpinLock
  - SpinWait
• Cancellation
  - CancellationToken{Source}
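A small sketch combining three of these primitives (a bounded BlockingCollection for work exchange, tasks, and a CancellationTokenSource):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class PrimitivesDemo
{
    static void Main()
    {
        var queue = new BlockingCollection<int>(boundedCapacity: 4);
        var cts = new CancellationTokenSource();

        var producer = Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 10; i++)
                queue.Add(i, cts.Token);      // blocks while the buffer is full
            queue.CompleteAdding();           // signal: no more items are coming
        });

        var consumer = Task.Factory.StartNew(() =>
        {
            // GetConsumingEnumerable blocks until items arrive and completes
            // once CompleteAdding has been called and the queue drains.
            foreach (int item in queue.GetConsumingEnumerable(cts.Token))
                Console.WriteLine(item);
        });

        Task.WaitAll(producer, consumer);     // cts.Cancel() would abort both loops
    }
}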
References
• The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software (Herb Sutter)
• MIT Introduction to Algorithms video lectures
• Chapter 27, "Multithreaded Algorithms", Introduction to Algorithms, 3rd edition
• CLR 4.0 ThreadPool Improvements: Part 1
• Multicore Programming Primer
• ThreadPool on Channel 9