Multithreading and
      Parallelization
               Dmitri Nesteruk
dmitrinesteruk@gmail.com | http://nesteruk.org/seminars
Agenda
 Overview
 Multithreading
   PowerThreading (AsyncEnumerator)
 Multi-core parallelization
   Parallel Extensions to .NET Framework
 Multi-computer parallelization
   PureMPI.NET
Why now?
 Manycore paradigm shift
   CPU clock speeds have hit production challenges
   (not at the limit yet)
   Growth now comes from adding cores
 Processor features
   Hyper-threading
   SIMD
CPU Scope
 Past: more transistors per chip (yesterday: 1x-core)
 Present: more cores per chip (today: 2x-core is the norm, 4x-core available)
 Future: even more cores per chip; NUMA & other specialties (tomorrow: 32x-core?)
Machine Scope
 Machine → Cluster → Cloud
 Most clients are concerned with one-machine use
 Clustering helps leverage performance across machines
 Clouds take this further
Multithreading vs. Parallelization
 Multithreading
    Using threads/thread pool to perform async
    operations
    Explicit (# of threads known)
 Parallelization
    Implicit parallelization
    No explicit thread operation
Ways to Parallelize/Multithread
 Managed: System.Threading, Parallel Extensions, libraries
 Unmanaged: OpenMP, libraries
 Specialized: GPGPU, FPGA
Managed
 System.Threading
 Libraries
   Parallel Extensions (TPL + PLINQ)
   PowerThreading
 Languages/frameworks
   Sing#, CCR
 Remoting, WCF, MPI.NET, PureMPI.NET, etc.
   Use over many machines
Unmanaged
 OpenMP
 – #pragma directives in C++ code
 Intel multi-core libraries
   Threading Building Blocks (low-level)
   Integrated Performance Primitives
   Math Kernel Library (also has MPI support)
 MPI, PVM, etc.
   Use over many machines
Specialized Ex. (Intrinsic Parallelization)
  GPU Computation (GPGPU)
    Calculations on the graphics card
    Uses programmable pixel shaders
    See, e.g., NVidia CUDA, GPGPU.org
  FPGA
    Hardware-specific solutions
    E.g., in-socket accelerators
    Requires HDL programming & custom hardware
Part I

Multithreading: a look at
AsyncEnumerator
Multithreading
 Goals
   Do stuff concurrently
   Preserve safety/consistency
 Tools
   Threads
   ThreadPool
   Synchronization objects
   Framework async APIs
A Look at Delegates
 Making a delegate for a function is easy
 Given void a() { … }
  – ThreadStart del = a;
 Given void a(int n) { … }
  – Action<int> del = a;
 Given float a(int n, double m) {…}
  – Func<int, double, float> del = a;
 Otherwise, make your own!
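
For illustration, here is a minimal, self-contained sketch of these assignments; the method names (A, B, C) and the TryParseHandler delegate type are illustrative, not from the deck:

using System;
using System.Threading;

class DelegateDemo
{
  static void A() { Console.WriteLine("no arguments"); }
  static void B(int n) { Console.WriteLine(n); }
  static float C(int n, double m) { return (float)(n * m); }

  // a custom delegate type for a signature Action/Func cannot express (out parameter)
  delegate bool TryParseHandler(string text, out int value);

  static void Main()
  {
    ThreadStart del1 = A;                 // void, no arguments
    Action<int> del2 = B;                 // void, one argument
    Func<int, double, float> del3 = C;    // two arguments, returns float
    TryParseHandler del4 = int.TryParse;  // custom delegate type

    del1();
    del2(42);
    Console.WriteLine(del3(2, 3.5));
    int value;
    Console.WriteLine(del4("123", out value) ? value : -1);
  }
}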
Delegate Methods
 Invoke()
   Synchronous, blocks your thread 
 BeginInvoke
   Executes in ThreadPool
   Returns IAsyncResult
 EndInvoke
   Waits for completion
   Takes the IAsyncResult from BeginInvoke
Usage
 Fire and forget
  – del.BeginInvoke(null, null);
 Fire, and wait until done
  – IAsyncResult ar = del.BeginInvoke(null,null);
    …
    del.EndInvoke(ar);
 Fire, and call a function when done
  – del.BeginInvoke(firedWhenDone, null);
    // firedWhenDone is the callback parameter
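
A runnable sketch of the last pattern (fire, then call a function when done), assuming a Work method that stands in for real work:

using System;
using System.Threading;

class CallbackDemo
{
  static void Work() { Thread.Sleep(500); }

  static void Main()
  {
    ThreadStart del = Work;

    // the callback receives the IAsyncResult, so EndInvoke can be called
    // right there instead of storing the cookie somewhere
    del.BeginInvoke(ar =>
    {
      del.EndInvoke(ar);  // always pair BeginInvoke with EndInvoke
      Console.WriteLine("done");
    }, null);

    // thread-pool threads are background threads; keep the process
    // alive long enough for the callback to run
    Console.ReadLine();
  }
}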
WaitAny and WaitAll
 To wait until either delegate completes
  – WaitHandle.WaitAny(
      new WaitHandle[] {
        ar1.AsyncWaitHandle,
        ar2.AsyncWaitHandle
      }); // wait until either completes
 To wait until all delegates complete
    Use WaitAll instead of WaitAny
  – WaitAll on multiple handles requires an [MTAThread]; on an STA thread, use Pulse & Wait instead
Example
Execute a() and b() in parallel; wait on both

ThreadStart delA = a;
ThreadStart delB = b;
IAsyncResult arA = delA.BeginInvoke(null, null);
IAsyncResult arB = delB.BeginInvoke(null, null);
WaitHandle.WaitAll(new [] {
  arA.AsyncWaitHandle,
  arB.AsyncWaitHandle });
LINQ Example
Execute a() and b() in parallel; wait on both

WaitHandle.WaitAll(
  new ThreadStart[] { a, b }               // make an array of delegates
    .Select(f => f.BeginInvoke(null, null) // call each delegate
                  .AsyncWaitHandle)        // get a wait handle for each
    .ToArray());                           // convert from IEnumerable to array
Asynchronous Programming Model (APM)
 Basic goal
  – IAsyncResult ar =
      del.BeginXXX(null,null);
    …
    del.EndXXX(ar);
 Supported by Framework classes, e.g.,
  – FileStream
  – WebRequest
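
As a sketch of the same Begin/End pairing on a Framework class, here is WebRequest used via APM (the URL is just an example, and the End call simply blocks for brevity):

using System;
using System.Net;

class ApmDemo
{
  static void Main()
  {
    WebRequest req = WebRequest.Create("http://nesteruk.org");
    IAsyncResult ar = req.BeginGetResponse(null, null);

    // ... do other useful work while the request is in flight ...

    using (WebResponse resp = req.EndGetResponse(ar))  // blocks if not finished yet
    {
      Console.WriteLine(resp.ContentType);
    }
  }
}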
Difficulties
  Async calls do not always succeed
    Timeout
    Exceptions
    Cancelation
  Results in too many functions/anonymous
  delegates
    Async workflow code becomes difficult to read
PowerThreading
 A free library from Wintellect (Jeffrey Richter)
 Get it at wintellect.com
 Also check out PowerCollections
 Resource locks
   ReaderWriterGate
   SyncGate
 Async. prog. model
   AsyncEnumerator
 Other features
   IO, state manager, NumaInformation :)
AsyncEnumerator
 Simplifies APM programming
 No need to manually manage
 IAsyncResult cookies
 Fewer functions, cleaner code
Usage patterns
 1 async op → process
 X async ops → process all
 X async ops → process each one as it
 completes
 X async ops → process some, discard the rest
 X async ops → process some until
 cancellation/timeout occurs, discard the rest
AsyncEnumerator Basics
 Has three methods
   Execute(IEnumerator<Int32>)
   BeginExecute
   EndExecute
 Also exists as AsyncEnumerator<T> when a
 return value is required
Inside the Function
internal IEnumerator<Int32> GetFile(
AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Signature
 Function must return IEnumerator<Int32>
 Function must accept AsyncEnumerator as one of the parameters (order unimportant)

internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Callback
 Call the async BeginXXX() methods
 Pass ae.End() as the callback parameter

internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);   // ae.End() is the callback
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Yield
 Now yield return the number of pending asynchronous operations

internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;   // one operation is pending
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());
  // use response
}
Wait & Process
 Call the async EndXXX() methods
 Pass ae.DequeueAsyncResult() as the parameter

internal IEnumerator<Int32> GetFile(
  AsyncEnumerator ae, string uri)
{
  WebRequest wr = WebRequest.Create(uri);
  wr.BeginGetResponse(ae.End(), null);
  yield return 1;
  WebResponse resp = wr.EndGetResponse(
    ae.DequeueAsyncResult());   // resumes here once the response is ready
  // use response
}
Usage
 Init the enumerator
  – var ae = new AsyncEnumerator();
 Use it, passing itself as a parameter
  – ae.Execute(GetFile(
      ae, "http://nesteruk.org"));
Exception Handling
 Break out of function
  – try {
      resp = wr.EndGetResponse(
        ae.DequeueAsyncResult());
    } catch (WebException e) {
      // process e
      yield break;
    }
 Propagate a parameter
Discard Groups
 Sometimes, you want to ignore the result of
 some calls
   E.g., you already got the data elsewhere
 To discard a group of calls
   Use overloaded End(…) methods to specify
     Group number
     Cleanup delegate
   Call DiscardGroup(…) with group number
Cancellation
 External code can cancel the iterator
  – ae.Cancel(…)
 Or specify a timeout
  – ae.SetCancelTimeout(…)
 Check whether iterator is cancelled with
  – ae.IsCanceled(…)
    just call yield break if it is
Part II

Parallel Extensions to .NET Framework: TPL and PLINQ
Parallelization
 Algorithms vary
   Some parallelize well (e.g., matrix multiplication)
   Some not so (e.g., matrix inversion)
   Some not at all
 Sometimes algorithms must be restructured to parallelize them
Parallel Extensions to .NET Framework (PFX)
 A library for parallelization
 Consists of
    Task Parallel Library
    Parallel LINQ (PLINQ)
 Currently in CTP stage
 Maybe in .NET 4.0?
Task Parallel Library Features
 System.Linq
    Parallel LINQ
 System.Threading
    Implicit parallelism (Parallel.Xxx)
 System.Threading.Collections
    Thread-safe stack and queue
 System.Threading.Tasks
    Task manager, tasks, futures
System.Threading
 Implicit parallelization: Parallel.For | Parallel.ForEach
 Aggregate exceptions: AggregateException
 Other useful classes: LazyInit<T>, WriteOnce<T>, other goodies 
Parallel.For
 Parallelizes a for loop
 Instead of

 for (int i = 0; i < 10; ++i) { … }

 We write

 Parallel.For(0, 10, i => { … });
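
A self-contained sketch of the same idea, assuming the CTP's System.Threading.Parallel class described on the earlier slide; each iteration writes only to its own array slot, so no locking is needed:

using System;
using System.Threading;

class ParallelForDemo
{
  static void Main()
  {
    double[] input  = new double[1000];
    double[] output = new double[1000];
    for (int i = 0; i < input.Length; ++i) input[i] = i;

    // iterations may run on different cores; each one touches only output[i]
    Parallel.For(0, input.Length, i =>
    {
      output[i] = Math.Sqrt(input[i]);
    });

    Console.WriteLine(output[999]);
  }
}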
Parallel.For Overloads
 Step size
 ParallelState for cancelation
 Thread-local initialization
 Thread-local finalization
 References to a TaskManager
 Task creation options
Parallel.ForEach
 Same features as Parallel.For except
    No counters or steps
 Takes an IEnumerable<T> 
Cancelation
 Parallel.For takes an Action<Int32>
 delegate
 Can also take an
 Action<Int32, ParallelState>
   ParallelState keeps track of the state of parallel
   execution
   ParallelState.Stop() stops execution in all threads
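
A sketch of cooperative stopping, assuming the Action<Int32, ParallelState> overload and ParallelState.Stop() described above (exact CTP signatures may differ slightly):

using System;
using System.Threading;

class StopDemo
{
  static void Main()
  {
    int[] data = { 3, 1, 4, 1, 5, 9, 2, 6 };

    Parallel.For(0, data.Length, (int i, ParallelState state) =>
    {
      if (data[i] == 9)
      {
        Console.WriteLine("Found 9 at index " + i);
        state.Stop();   // ask all other iterations to stop as soon as possible
      }
    });
  }
}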
Parallel.For Exceptions
 The AggregateException class holds all
 exceptions thrown
 Created even if only one thread throws
 Used by both Parallel.Xxx and PLINQ
 Original exceptions stored in
 InnerExceptions property.
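
A sketch of catching it, using only the members mentioned above (same CTP namespace assumption as before):

using System;
using System.Threading;

class AggregateDemo
{
  static void Main()
  {
    try
    {
      Parallel.For(0, 10, i =>
      {
        if (i % 4 == 0) throw new InvalidOperationException("bad item " + i);
      });
    }
    catch (AggregateException ae)
    {
      // the exceptions thrown on the worker threads are collected here
      foreach (Exception inner in ae.InnerExceptions)
        Console.WriteLine(inner.Message);
    }
  }
}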
LazyInit<T>
 Lazy initialization of a single variable
 Options
  – AllowMultipleExecution
    Init function can be called by many threads, only
    one value published
  – EnsureSingleExecution
    Init function executed only once
  – ThreadLocal
    One init call & value per thread
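
A sketch of the EnsureSingleExecution mode, assuming a constructor that takes the init function and one of the modes listed above (the exact CTP API may differ):

using System;
using System.Threading;

class LazyInitDemo
{
  // the Random instance is not created until somebody asks for it,
  // and in this mode the init function runs at most once
  static LazyInit<Random> rng =
    new LazyInit<Random>(() => new Random(), LazyInitMode.EnsureSingleExecution);

  static void Main()
  {
    Console.WriteLine(rng.Value.Next(100));
  }
}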
WriteOnce<T>
 Single-assignment structure
 Just like Nullable:
   HasValue
   Value
 Also try methods
   TryGetValue
   TrySetValue
Futures
 A future is the name of a value that will
 eventually be produced by a computation
 Thus, we can decide what to do with the
 value before we know it
Futures of T
• Future is a factory
• Future<T> is the actual future (and also has
  factory methods)
  To make a future
  – var f = Future.Create(() => g());
  To use a future
    Get f.Value
    The accessor does an async computation
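
A minimal sketch, assuming the Future.Create factory and the blocking Value accessor described above (ExpensiveComputation is an illustrative name):

using System;
using System.Threading;
using System.Threading.Tasks;

class FutureDemo
{
  static int ExpensiveComputation()
  {
    Thread.Sleep(1000);
    return 42;
  }

  static void Main()
  {
    // the computation is started now...
    Future<int> f = Future.Create(() => ExpensiveComputation());

    // ... other work can happen here ...

    // ... and we block only at the point where the value is needed
    Console.WriteLine(f.Value);
  }
}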
Tasks & TaskManager
 A better Thread+ThreadPool combination
 TaskManager
   A very clever thread pool :)
   Adjusts worker threads to # of CPUs/cores
   Keeps all cores busy
 Task
   A unit of work
   May (or may not) run concurrently
 http://channel9.msdn.com/posts/DanielMoth/ParallelFX-Task-and-friends/
Task
 Just like a future, a task takes an Action<T>
  – Task t = Task.Create(DoSomeWork);
    Overloads exist :)
 Fires off immediately. To wait on completion
  – t.Wait();
 Unlike the thread pool, task manager will use
 as many threads as there are cores
Parallel LINQ (PLINQ)
 Parallel evaluation in
    LINQ to Objects
    LINQ to XML
 Features
    IParallelEnumerable<T>
    ParallelEnumerable.AsParallel static
    method
Example
IEnumerable<T> data = ...;
var q = data.AsParallel()
  .Where(x => p(x))
  .OrderBy(x => k(x))
  .Select(x => f(x));

foreach (var e in q)
  a(e);
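
A concrete version of the query, assuming the AsParallel() extension described above; the filter, key, and projection are arbitrary stand-ins for p, k, and f:

using System;
using System.Linq;

class PlinqDemo
{
  static void Main()
  {
    int[] data = Enumerable.Range(0, 1000).ToArray();

    var q = data.AsParallel()
      .Where(x => x % 3 == 0)   // p(x)
      .OrderBy(x => -x)         // k(x)
      .Select(x => x * x);      // f(x)

    foreach (var e in q)
      Console.WriteLine(e);
  }
}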
Part III

Interprocess communication with
PureMPI.NET
Message Passing Interface
 An API for general-purpose IPC
 Works across cores & machines
 C++ and Fortran
   Some Intel libraries support it explicitly
 http://www.mcs.anl.gov/research/projects/mpich2/
PureMPI.NET
 A free library available at http://purempi.net
 Uses WCF endpoints for communication
 Uses MPI syntax
 Features
   A library DLL for WCF functionality
   An EXE for easy deployment over network
How it works
 Your computers run a service that connects
 them together
 Your program exposes WCF endpoints
 You use the MPI interfaces to communicate
Communicator & Rank
 A communicator (comm) is a group of computers
   In most scenarios, you would have one group, MPI_COMM_WORLD
 Rank identifies a process within the comm
   Useful for determining which role the current process plays
   (see the Send/Receive example below)
Main
static void Main(string[] args)
{
  // "MPIEnvironment" names a configuration section in app.config;
  // ProcessorGroup runs MpiProcess on all machines
  using (ProcessorGroup processors =
    new ProcessorGroup("MPIEnvironment",
                       MpiProcess))
  {
    processors.Start();             // start each one
    processors.WaitForCompletion(); // wait on all
  }
}
Sending & Receiving
 Blocking or non-blocking methods
   Send/Receive (blocking)
   Begin|End Send/Receive (async)
   Invoked on the comm
Send/Receive
static void MpiProcess(IDictionary<string, Comm> comms)
{
  // get the default comm from the dictionary
  Comm comm = comms["MPI_COMM_WORLD"];
  if (comm.Rank == 0)
  {
    // get a message from rank 1 (blocking)
    string msg = comm.Receive<string>(1, string.Empty);
    Console.WriteLine("Got " + msg);
  }
  else if (comm.Rank == 1)
  {
    // send a message to rank 0 (also blocking)
    comm.Send(0, string.Empty, "Hello");
  }
}
Extras
 Can use async ops
 Can send to all (Broadcast)
 Can distribute work and then collect it
 (Gather/Scatter)
Thank You!
