MPI.NET
Supercomputing in .NET using the Message Passing Interface
David Ross
Email: willmation@gmail.com
Blog: www.pebblesteps.com
Computationally complex problems in enterprise software
- ETL load into a Data Warehouse takes too long: use compute clusters to quickly produce a summary report
- Analyse massive database tables by processing chunks in parallel on the compute cluster
- Increase the speed of Monte Carlo analysis problems
- Filter/analyse massive log files: click-through analysis from IIS logs, firewall logs
Three Pillars of Concurrency
Herb Sutter and David Callahan break parallel computing techniques into:
- Responsiveness and Isolation via Asynchronous Agents: Active Objects, GUIs, Web Services, MPI
- Throughput and Scalability via Concurrent Collections: Parallel LINQ, work stealing, OpenMP
- Consistency via Safely Shared Resources: mutable shared objects, transactional memory
Source: Dr. Dobb's Journal, http://www.ddj.com/hpc-high-performance-computing/200001985
The Logical Supercomputer
- Supercomputer: a massively parallel machine or a workstation cluster
- Batch orientated: a big problem goes in; some time later a result comes out
- Single System Image: no matter how the supercomputer is implemented in hardware or software, it appears to its users as a SINGLE machine
- Deployment of a program onto 1000 machines MUST be automated

Message Passing Interface
- A C-based API for messaging
- A specification, not an implementation (standardised by the MPI Forum)
- Different vendors (including open source projects) provide implementations of the specification
- MS-MPI is a fork of MPICH2 by Microsoft to run on its HPC servers: includes Active Directory support and fast access to the MS network stack
MPI Implementation
- The standard defines the coding interface (C header files)
- An MPI implementation is responsible for:
  - Communication with the OS and hardware (network cards, pipes, NUMA, etc.)
  - Data transport/buffering

MPI
- Fork-Join parallelism: work is segmented off to worker nodes; results are collated back to the root node
- No memory is shared: separate machines or processes, hence data locking is unnecessary/impossible
- Speed critical: throughput over development time
- Large data-orientated problems: numerical analysis (matrices) is easily parallelised
MPI.NET
- MPI.NET is a wrapper around MS-MPI
- MPI is complex because the C runtime cannot infer array lengths or the size of complex types
- MPI.NET is far simpler: the size of collections etc. is inferred from the type system automatically
- IDisposable is used to set up and tear down the MPI session
- MPI.NET uses "unsafe" handcrafted IL for very fast marshalling of .NET objects to the unmanaged MPI API
Single Program Multiple Node
- The same application is deployed to each node
- The node id (rank) is used to drive application/orchestration logic
- Fork-Join/Map-Reduce are the core paradigms
Hello World in MPI

public class FrameworkSetup {
    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            string s = String.Format("My processor is {0}. My rank is {1}",
                MPI.Environment.ProcessorName,
                Communicator.world.Rank);
            Console.WriteLine(s);
        }
    }
}
Executing
- MPI.NET is designed to be hosted in Windows HPC Server
- MPI.NET has recently been ported to Mono/Linux; still under development and not recommended
- Windows HPC Pack SDK

mpiexec -n 4 SkillsMatter.MIP.Net.FrameworkSetup.exe
My processor is LPDellDevSL.digiterre.com. My rank is 0
My processor is LPDellDevSL.digiterre.com. My rank is 3
My processor is LPDellDevSL.digiterre.com. My rank is 2
My processor is LPDellDevSL.digiterre.com. My rank is 1
Send/Receive
Logical topology: the rank drives the parallelism.

static void Main(string[] args) {
    using (new MPI.Environment(ref args)) {
        if (Communicator.world.Size != 2)
            throw new Exception("This application must be run with MPI Size == 2");
        for (int i = 0; i < NumberOfPings; i++) {
            if (Communicator.world.Rank == 0) {
                string send = "Hello Msg:" + i;
                Console.WriteLine("Rank " + Communicator.world.Rank + " is sending: " + send);
                // Blocking send: data, destination, message tag
                Communicator.world.Send<string>(send, 1, 0);
            }
            else {
                // Blocking receive: source, message tag
                string s = Communicator.world.Receive<string>(0, 0);
                Console.WriteLine("Rank " + Communicator.world.Rank + " received: " + s);
            }

Result:
Rank 0 is sending: Hello Msg:0
Rank 0 is sending: Hello Msg:1
Rank 0 is sending: Hello Msg:2
Rank 0 is sending: Hello Msg:3
Rank 0 is sending: Hello Msg:4
Rank 1 received: Hello Msg:0
Rank 1 received: Hello Msg:1
Rank 1 received: Hello Msg:2
Rank 1 received: Hello Msg:3
Rank 1 received: Hello Msg:4
Send/Receive/Barrier
- Send/Receive: blocking point-to-point messaging
- ImmediateSend/ImmediateReceive: asynchronous point-to-point messaging; the Request object has flags to indicate whether the operation is complete
- Barrier: a global block; all processes halt until the statement has been executed on all nodes
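The asynchronous calls and the barrier can be sketched as below, assuming a two-process run. The Request/ReceiveRequest handles, Test(), Wait() and GetValue() are recalled from the MPI.NET API; treat the exact signatures as an assumption rather than a definitive reference.

```csharp
using System;
using MPI;

public class ImmediatePing {
    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            Intracommunicator world = Communicator.world;
            if (world.Rank == 0) {
                // Non-blocking send: returns immediately with a Request handle
                Request send = world.ImmediateSend("hello", 1, 0);
                // ...do other useful work here while the message is in flight...
                send.Wait();              // block until the send has completed
            } else if (world.Rank == 1) {
                ReceiveRequest recv = world.ImmediateReceive<string>(0, 0);
                while (recv.Test() == null) {
                    // Test() returns null until the message has arrived
                }
                Console.WriteLine((string)recv.GetValue());
            }
            // Barrier: every process blocks here until all have arrived
            world.Barrier();
        }
    }
}
```

Run with mpiexec -n 2 as in the earlier example; the overlap between ImmediateSend and "other useful work" is where the asynchronous style pays off.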
Broadcast/Scatter/Gather/Reduce
- Broadcast: send data from one node to all other nodes; in a many-node system, as soon as a node receives the shared data it passes it on
- Scatter: split an array into Communicator.world.Size chunks and send a chunk to each node; typically used for sharing the rows of a matrix
- Gather: each node sends a chunk of data to the root node; the inverse of the Scatter operation
- Reduce: calculate a result on each node, then combine the results into a single value through a reduction (Min, Max, Add, or a custom delegate, etc.)
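Gather as the inverse of Scatter can be sketched as follows: each rank contributes one value and the root receives the full array. The Gather signature is assumed from the MPI.NET collectives, so treat it as a sketch rather than a definitive reference.

```csharp
using System;
using MPI;

public class GatherExample {
    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            Intracommunicator world = Communicator.world;
            // Each node computes a partial result...
            double partial = world.Rank * 10.0;
            // ...and the root (rank 0) gathers one value from every rank
            double[] all = world.Gather(partial, 0);
            if (world.Rank == 0) {
                // 'all' holds world.Size elements, indexed by rank
                Console.WriteLine("Gathered " + all.Length + " partial results");
            }
        }
    }
}
```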
Data orientated problem

static void Main(string[] args) {
    using (new MPI.Environment(ref args)) {
        // Load the grades: rank 0 populates, then shares the count
        int numberOfGrades = 0;
        double[] allGrades = null;
        if (Communicator.world.Rank == RANK_0) {
            allGrades = LoadStudentGrades();
            numberOfGrades = allGrades.Length;
        }
        Communicator.world.Broadcast(ref numberOfGrades, 0);
        // Root splits up the array and sends it to the compute nodes.
        // The array is broken into pageSize chunks; each chunk is deserialised into grades.
        double[] grades = null;
        int pageSize = numberOfGrades / Communicator.world.Size;
        if (Communicator.world.Rank == RANK_0) {
            Communicator.world.ScatterFromFlattened(allGrades, pageSize, 0, ref grades);
        } else {
            Communicator.world.ScatterFromFlattened(null, pageSize, 0, ref grades);
        }
        // Calculate the sum on each node (summarise)
        double sumOfMarks = Communicator.world.Reduce<double>(grades.Sum(), Operation<double>.Add, 0);

        // Calculate and publish the average mark (share)
        double averageMark = 0.0;
        if (Communicator.world.Rank == RANK_0) {
            averageMark = sumOfMarks / numberOfGrades;
        }
        Communicator.world.Broadcast(ref averageMark, 0);
        ...
Result:
Rank: 3, Sum of Marks:0, Average:50.7409948765608, stddev:0
Rank: 2, Sum of Marks:0, Average:50.7409948765608, stddev:0
Rank: 0, Sum of Marks:202963.979506243, Average:50.7409948765608, stddev:28.9402362588477
Rank: 1, Sum of Marks:0, Average:50.7409948765608, stddev:0
Fork-Join Parallelism
- Load the problem parameters
- Share the problem with the compute nodes
- Wait and gather the results
- Repeat
Best practice:
- Each Fork-Join block should be treated as a separate Unit of Work
- Preferably as an individual module, otherwise spaghetti code can ensue
When to use
- PLINQ or the Task Parallel Library (1st choice): Map-Reduce operations that utilise all the cores on a box
- Web Services / WCF (2nd choice): no data sharing between nodes; a load balancer in front of a web farm is far easier development
- MPI: lots of sharing of intermediate results; huge data sets; project appetite to invest in a cluster or to deploy to a cloud
- MPI + PLINQ hybrid (3rd choice): MPI moves the data, PLINQ utilises the cores
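The hybrid option can be sketched by combining ScatterFromFlattened from the grades example with a PLINQ aggregation on each node. This is a sketch under the same assumptions as that example; LoadStudentGrades is a hypothetical loader standing in for the real data source.

```csharp
using System;
using System.Linq;
using MPI;

public class HybridExample {
    const int RANK_0 = 0;

    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            Intracommunicator world = Communicator.world;
            double[] all = null;
            int count = 0;
            if (world.Rank == RANK_0) {
                all = LoadStudentGrades();   // hypothetical loader
                count = all.Length;
            }
            world.Broadcast(ref count, 0);

            // MPI moves the data: each node receives its chunk
            double[] chunk = null;
            world.ScatterFromFlattened(all, count / world.Size, 0, ref chunk);

            // PLINQ utilises the cores: parallel sum within the node
            double localSum = chunk.AsParallel().Sum();

            // MPI combines the per-node results back at the root
            double total = world.Reduce(localSum, Operation<double>.Add, 0);
            if (world.Rank == RANK_0)
                Console.WriteLine("Average: " + total / count);
        }
    }

    static double[] LoadStudentGrades() {
        // Illustrative data only: random marks between 0 and 100
        var rnd = new Random(42);
        return Enumerable.Range(0, 1000).Select(_ => rnd.NextDouble() * 100).ToArray();
    }
}
```

The division of labour matches the slide: the cluster-level distribution stays in MPI, while the per-box core utilisation is delegated to PLINQ.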

More Information
- MPI.NET: http://www.osl.iu.edu/research/mpi.net/software/
- Google: Windows HPC Pack 2008 SP1
- MPI Forum: http://www.mpi-forum.org/
- Slides and source: http://www.pebblesteps.com
Thanks for listening...

Editor's Notes

  • #2: Good evening. My name is David Ross and tonight I will be talking about the Message Passing Interface, which is a high-speed messaging framework used in the development of software on supercomputers. Due to the commoditisation of hardware and
  • #4: Before I discuss MPI, it is very important when discussing concurrency or parallel programming that we understand the problem space the technology is trying to solve. Herb Sutter, who runs the C++ team at Microsoft, has broken the multi-core concurrency problem into three aspects: isolated components, concurrent collections and safety. MPI falls into the first bucket and is focussed on high-speed data transfer between compute nodes.
  • #5: Assuming we have a problem that is too slow to run in a single process or on a single machine, we have the problem of orchestrating data transfers and commands between the nodes.
  • #6: MPI is an API standard, not a product, and there are many different implementations of the standard. MPI.NET, meanwhile, uses knowledge of the .NET type system to remove the complexity of using MPI from a language like C. In C you must explicitly state the size of collections and the width of complex types being transferred. In MPI.NET the serialisation is far easier.
  • #9: As we will see in the code examples later, MPI.NET is a