MPI.NET
Supercomputing in .NET using the Message Passing Interface
David Ross
Email: willmation@gmail.com
Blog: www.pebblesteps.com
Computationally complex problems in enterprise software
- ETL load into a Data Warehouse takes too long: use compute clusters to quickly produce a summary report
- Analyse massive database tables by processing chunks in parallel on the compute cluster
- Increase the speed of Monte Carlo analysis problems
- Filter/analyse massive log files: click-through analysis from IIS logs, firewall logs
Three Pillars of Concurrency
Herb Sutter and David Callahan break parallel computing techniques into:
- Responsiveness and Isolation via Asynchronous Agents: Active Objects, GUIs, Web Services, MPI
- Throughput and Scalability via Concurrent Collections: Parallel LINQ, work stealing, OpenMP
- Consistency via Safely Shared Resources: mutable shared objects, transactional memory
Source: Dr. Dobb's Journal, http://www.ddj.com/hpc-high-performance-computing/200001985
The Logical Supercomputer
- Supercomputer: a massively parallel machine or a workstation cluster
- Batch orientated: a big problem goes in; some time later a result comes out
- Single System Image: no matter how the supercomputer is implemented in hardware or software, it appears to its users as a SINGLE machine
- Deployment of a program onto 1000 machines MUST be automated

Message Passing Interface
- A C-based API for messaging
- A specification, not an implementation (standardised by the MPI Forum)
- Different vendors (including open source projects) provide implementations of the specification
- MS-MPI is a fork of MPICH2 by Microsoft to run on its HPC servers: includes Active Directory support and fast access to the MS network stack
MPI Implementation
- The standard defines the coding interface (C header files)
- An MPI implementation is responsible for:
  - Communication with the OS and hardware (network cards, pipes, NUMA, etc.)
  - Data transport/buffering

MPI
- Fork-Join parallelism: work is segmented off to worker nodes; results are collated back to the root node
- No memory is shared: separate machines or processes, hence data locking is unnecessary/impossible
- Speed critical: throughput over development time
- Large data-orientated problems: numerical analysis (matrices) is easily parallelised
MPI.NET
- MPI.NET is a wrapper around MS-MPI
- MPI is complex because the C runtime cannot infer array lengths or the size of complex types
- MPI.NET is far simpler: the size of collections etc. is inferred from the type system automatically
- IDisposable is used to set up and tear down the MPI session
- MPI.NET uses "unsafe" handcrafted IL for very fast marshalling of .NET objects to the unmanaged MPI API
Single Program Multiple Node
- The same application is deployed to each node
- The node id (rank) is used to drive application/orchestration logic
- Fork-Join/Map-Reduce are the core paradigms
Hello World in MPI

public class FrameworkSetup {
    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            string s = String.Format("My processor is {0}. My rank is {1}",
                MPI.Environment.ProcessorName,
                Communicator.world.Rank);
            Console.WriteLine(s);
        }
    }
}
Executing
- MPI.NET is designed to be hosted in Windows HPC Server
- MPI.NET has recently been ported to Mono/Linux; still under development and not recommended
- Windows HPC Pack SDK

mpiexec -n 4 SkillsMatter.MIP.Net.FrameworkSetup.exe
My processor is LPDellDevSL.digiterre.com. My rank is 0
My processor is LPDellDevSL.digiterre.com. My rank is 3
My processor is LPDellDevSL.digiterre.com. My rank is 2
My processor is LPDellDevSL.digiterre.com. My rank is 1
Send/Receive
Logical topology: the rank drives the parallelism.

static void Main(string[] args) {
    using (new MPI.Environment(ref args)) {
        if (Communicator.world.Size != 2)
            throw new Exception("This application must be run with MPI Size == 2");
        for (int i = 0; i < NumberOfPings; i++) {
            if (Communicator.world.Rank == 0) {
                string send = "Hello Msg:" + i;
                Console.WriteLine("Rank " + Communicator.world.Rank + " is sending: " + send);
                // Blocking send: data, destination, message tag
                Communicator.world.Send<string>(send, 1, 0);
            }
            else {
                // Blocking receive: source, message tag
                string s = Communicator.world.Receive<string>(0, 0);
                Console.WriteLine("Rank " + Communicator.world.Rank + " received: " + s);
            }

Result:
Rank 0 is sending: Hello Msg:0
Rank 0 is sending: Hello Msg:1
Rank 0 is sending: Hello Msg:2
Rank 0 is sending: Hello Msg:3
Rank 0 is sending: Hello Msg:4
Rank 1 received: Hello Msg:0
Rank 1 received: Hello Msg:1
Rank 1 received: Hello Msg:2
Rank 1 received: Hello Msg:3
Rank 1 received: Hello Msg:4
Send/Receive/Barrier
- Send/Receive: blocking point-to-point messaging
- ImmediateSend/ImmediateReceive: asynchronous point-to-point messaging; the Request object has flags to indicate whether the operation is complete
- Barrier: a global block; all processes halt until the statement has been executed on all nodes
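The asynchronous calls and the barrier can be sketched as below, assuming a two-process run. The Request/ReceiveRequest handles, Test(), Wait() and GetValue() are recalled from the MPI.NET API; treat the exact signatures as an assumption rather than a definitive reference.

```csharp
using System;
using MPI;

public class ImmediatePing {
    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            Intracommunicator world = Communicator.world;
            if (world.Rank == 0) {
                // Non-blocking send: returns immediately with a Request handle
                Request send = world.ImmediateSend("hello", 1, 0);
                // ...do other useful work here while the message is in flight...
                send.Wait();              // block until the send has completed
            } else if (world.Rank == 1) {
                ReceiveRequest recv = world.ImmediateReceive<string>(0, 0);
                while (recv.Test() == null) {
                    // Test() returns null until the message has arrived
                }
                Console.WriteLine((string)recv.GetValue());
            }
            // Barrier: every process blocks here until all have arrived
            world.Barrier();
        }
    }
}
```

Run with mpiexec -n 2 as in the earlier example; the overlap between ImmediateSend and "other useful work" is where the asynchronous style pays off.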
Broadcast/Scatter/Gather/Reduce
- Broadcast: send data from one node to all other nodes; in a many-node system, as soon as a node receives the shared data it passes it on
- Scatter: split an array into Communicator.world.Size chunks and send a chunk to each node; typically used for sharing the rows of a matrix
- Gather: each node sends a chunk of data to the root node; the inverse of the Scatter operation
- Reduce: calculate a result on each node, then combine the results into a single value through a reduction (Min, Max, Add, or a custom delegate, etc.)
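Gather as the inverse of Scatter can be sketched as follows: each rank contributes one value and the root receives the full array. The Gather signature is assumed from the MPI.NET collectives, so treat it as a sketch rather than a definitive reference.

```csharp
using System;
using MPI;

public class GatherExample {
    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            Intracommunicator world = Communicator.world;
            // Each node computes a partial result...
            double partial = world.Rank * 10.0;
            // ...and the root (rank 0) gathers one value from every rank
            double[] all = world.Gather(partial, 0);
            if (world.Rank == 0) {
                // 'all' holds world.Size elements, indexed by rank
                Console.WriteLine("Gathered " + all.Length + " partial results");
            }
        }
    }
}
```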
Data orientated problem

static void Main(string[] args) {
    using (new MPI.Environment(ref args)) {
        // Load the grades: rank 0 populates, then shares the count
        int numberOfGrades = 0;
        double[] allGrades = null;
        if (Communicator.world.Rank == RANK_0) {
            allGrades = LoadStudentGrades();
            numberOfGrades = allGrades.Length;
        }
        Communicator.world.Broadcast(ref numberOfGrades, 0);
        // Root splits up the array and sends it to the compute nodes.
        // The array is broken into pageSize chunks; each chunk is deserialised into grades.
        double[] grades = null;
        int pageSize = numberOfGrades / Communicator.world.Size;
        if (Communicator.world.Rank == RANK_0) {
            Communicator.world.ScatterFromFlattened(allGrades, pageSize, 0, ref grades);
        } else {
            Communicator.world.ScatterFromFlattened(null, pageSize, 0, ref grades);
        }
        // Calculate the sum on each node (summarise)
        double sumOfMarks = Communicator.world.Reduce<double>(grades.Sum(), Operation<double>.Add, 0);

        // Calculate and publish the average mark (share)
        double averageMark = 0.0;
        if (Communicator.world.Rank == RANK_0) {
            averageMark = sumOfMarks / numberOfGrades;
        }
        Communicator.world.Broadcast(ref averageMark, 0);
        ...
Result:
Rank: 3, Sum of Marks:0, Average:50.7409948765608, stddev:0
Rank: 2, Sum of Marks:0, Average:50.7409948765608, stddev:0
Rank: 0, Sum of Marks:202963.979506243, Average:50.7409948765608, stddev:28.9402362588477
Rank: 1, Sum of Marks:0, Average:50.7409948765608, stddev:0
Fork-Join Parallelism
- Load the problem parameters
- Share the problem with the compute nodes
- Wait and gather the results
- Repeat
Best practice:
- Each Fork-Join block should be treated as a separate Unit of Work
- Preferably as an individual module, otherwise spaghetti code can ensue
When to use
- PLINQ or the Task Parallel Library (1st choice): Map-Reduce operations that utilise all the cores on a box
- Web Services / WCF (2nd choice): no data sharing between nodes; a load balancer in front of a web farm is far easier development
- MPI: lots of sharing of intermediate results; huge data sets; project appetite to invest in a cluster or to deploy to a cloud
- MPI + PLINQ hybrid (3rd choice): MPI moves the data, PLINQ utilises the cores
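The hybrid option can be sketched by combining ScatterFromFlattened from the grades example with a PLINQ aggregation on each node. This is a sketch under the same assumptions as that example; LoadStudentGrades is a hypothetical loader standing in for the real data source.

```csharp
using System;
using System.Linq;
using MPI;

public class HybridExample {
    const int RANK_0 = 0;

    static void Main(string[] args) {
        using (new MPI.Environment(ref args)) {
            Intracommunicator world = Communicator.world;
            double[] all = null;
            int count = 0;
            if (world.Rank == RANK_0) {
                all = LoadStudentGrades();   // hypothetical loader
                count = all.Length;
            }
            world.Broadcast(ref count, 0);

            // MPI moves the data: each node receives its chunk
            double[] chunk = null;
            world.ScatterFromFlattened(all, count / world.Size, 0, ref chunk);

            // PLINQ utilises the cores: parallel sum within the node
            double localSum = chunk.AsParallel().Sum();

            // MPI combines the per-node results back at the root
            double total = world.Reduce(localSum, Operation<double>.Add, 0);
            if (world.Rank == RANK_0)
                Console.WriteLine("Average: " + total / count);
        }
    }

    static double[] LoadStudentGrades() {
        // Illustrative data only: random marks between 0 and 100
        var rnd = new Random(42);
        return Enumerable.Range(0, 1000).Select(_ => rnd.NextDouble() * 100).ToArray();
    }
}
```

The division of labour matches the slide: the cluster-level distribution stays in MPI, while the per-box core utilisation is delegated to PLINQ.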

More Information
- MPI.NET: http://www.osl.iu.edu/research/mpi.net/software/
- Google: Windows HPC Pack 2008 SP1
- MPI Forum: http://www.mpi-forum.org/
- Slides and source: http://www.pebblesteps.com
Thanks for listening...

Editor's Notes

  • #2: Good evening. My name is David Ross and tonight I will be talking about the Message Passing Interface, which is a high-speed messaging framework used in the development of software on supercomputers. Due to the commoditisation of hardware and
  • #4: Before I discuss MPI, it is very important when discussing concurrency or parallel programming that we understand the problem space the technology is trying to solve. Herb Sutter, who runs the C++ team at Microsoft, has broken the multi-core concurrency problem into three aspects: isolated components, concurrent collections and safety. MPI falls into the first bucket and is focussed on high-speed data transfer between compute nodes.
  • #5: Assuming we have a problem that is too slow to run in a single process or on a single machine, we have the problem of orchestrating data transfers and commands between the nodes.
  • #6: MPI is an API standard, not a product, and there are many different implementations of the standard. MPI.NET, meanwhile, uses knowledge of the .NET type system to remove the complexity of using MPI from a language like C. In C you must explicitly state the size of collections and the width of complex types being transferred. In MPI.NET the serialisation is far easier.
  • #9: As we will see in the code examples later, MPI.NET is a