Parallel Computing
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com
CSCI-UA.0480-003
MPI - III
Many slides of this
lecture are adapted
and slightly modified from:
• Gerassimos Barlas
• Peter S. Pacheco
Collective vs. point-to-point
Data distributions
Copyright © 2010, Elsevier Inc.
All rights Reserved
Sequential version
Different partitions of a 12-component
vector among 3 processes
• Block: Assign blocks of consecutive components to each process.
• Cyclic: Assign components in a round robin fashion.
• Block-cyclic: Use a cyclic distribution of blocks of components.
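To make the three schemes concrete, here is a small illustrative sketch (my own, not from the slides) that computes which process owns component i of an n-component vector under each distribution; comm_sz is the number of processes and blk is the block size used by the block-cyclic scheme.

```c
/* Illustrative sketch: which process owns component i of an n-component
   vector under each distribution?  comm_sz = number of processes,
   blk = block size for the block-cyclic scheme. */
int block_owner(int i, int n, int comm_sz) {
    return i / (n / comm_sz);          /* assumes comm_sz evenly divides n */
}
int cyclic_owner(int i, int comm_sz) {
    return i % comm_sz;                /* round robin */
}
int block_cyclic_owner(int i, int comm_sz, int blk) {
    return (i / blk) % comm_sz;        /* cyclic distribution of blocks */
}
```

For the 12-component vector on 3 processes, block gives process 0 components 0-3, cyclic gives it 0, 3, 6, 9, and block-cyclic with blk = 2 gives it 0, 1, 6, 7.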
Parallel implementation of
vector addition
Copyright © 2010, Elsevier Inc.
All rights Reserved
How will you distribute parts of x[] and y[] to processes?
Scatter
• Read an entire vector on process 0
• MPI_Scatter sends the needed
components to each of the other
processes.
send_count: # data items going
to each process
Important:
• All arguments are important for the source process (process 0 in our example)
• For all other processes, only recv_buf_p, recv_count, recv_type, src_proc,
and comm are important
Reading and distributing a vector
Copyright © 2010, Elsevier Inc.
All rights Reserved
Note that process 0 itself
also receives data.
• send_buf_p
– is not used except by the sender.
– However, it must be defined or NULL on others to make the
code correct.
– Must have at least communicator size * send_count elements
• All processes must call MPI_Scatter, not only the sender.
• send_count is the number of data items sent to each process.
• recv_buf_p must have at least send_count elements
• MPI_Scatter uses a block distribution.
Example: the 9-element vector 0 1 2 3 4 5 6 7 8 on Process 0 is scattered in blocks of three:
Process 0 gets 0 1 2
Process 1 gets 3 4 5
Process 2 gets 6 7 8
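A minimal sketch of this pattern (my own code, not the slides'): process 0 reads the whole vector and MPI_Scatter hands each process its block, assuming comm_sz evenly divides n.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(void) {
    int comm_sz, my_rank, n = 12;
    MPI_Init(NULL, NULL);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    int local_n = n / comm_sz;            /* assumes comm_sz evenly divides n */
    double *a = NULL;
    double *local_a = malloc(local_n * sizeof(double));

    if (my_rank == 0) {                   /* only the source reads the vector */
        a = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) a[i] = i;
    }

    /* send_count (local_n) is the number of items going to EACH process */
    MPI_Scatter(a, local_n, MPI_DOUBLE,
                local_a, local_n, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    printf("Process %d: first local element = %.1f\n", my_rank, local_a[0]);

    free(local_a);
    if (my_rank == 0) free(a);
    MPI_Finalize();
    return 0;
}
```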
Gather
• MPI_Gather collects all of the
components of the vector onto the
destination process (dest_proc), ordered by rank.
Important:
• All arguments are important for the destination process.
• For all other processes, only send_buf_p, send_count, send_type, dest_proc,
and comm are important
recv_count: number of elements
for any single receive
send_count: number of elements
in send_buf_p
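Going the other way, a hedged sketch of gathering the distributed blocks back onto process 0 (the function name and structure are my own):

```c
#include <stdlib.h>
#include <mpi.h>

/* Sketch: collect each process's block of local_n doubles on process 0,
   in rank order.  Returns the full vector on rank 0, NULL elsewhere. */
double *gather_on_root(double local_a[], int local_n,
                       int my_rank, int comm_sz, MPI_Comm comm) {
    double *a = NULL;
    if (my_rank == 0)                          /* only the destination needs a buffer */
        a = malloc(comm_sz * local_n * sizeof(double));

    MPI_Gather(local_a, local_n, MPI_DOUBLE,   /* send_count: elements in send_buf_p */
               a,       local_n, MPI_DOUBLE,   /* recv_count: for any single receive */
               0, comm);                       /* dest_proc */
    return a;
}
```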
Print a distributed vector (1)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Print a distributed vector (2)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Allgather
• Concatenates the contents of each
process’ send_buf_p and stores this in
each process’ recv_buf_p.
• As usual, recv_count is the amount of data
being received from each process.
Copyright © 2010, Elsevier Inc.
All rights Reserved
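As a sketch (illustrative, assuming a block distribution in which every process holds local_n components), the call looks like this; afterwards every process has the complete vector in x:

```c
#include <mpi.h>

/* Sketch: every process contributes its block of local_n doubles and every
   process receives the full concatenated vector in x. */
void allgather_vector(double local_x[], int local_n, double x[], MPI_Comm comm) {
    /* recv_count is the amount of data received from EACH process */
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x,       local_n, MPI_DOUBLE,
                  comm);
}
```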
Matrix-vector multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
The i-th component of y is the dot product of the i-th row of A with x.
Matrix-vector multiplication
Pseudo-code Serial Version
C style arrays
Copyright © 2010, Elsevier Inc.
All rights Reserved
A two-dimensional array in C is stored as a one-dimensional array in row-major order: row 0 first, then row 1, and so on.
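Using that layout, a serial sketch of y = A x might look like this (my own code, in the spirit of the slides' pseudo-code, assuming an m-by-n matrix so element (i, j) sits at A[i*n + j]):

```c
/* Serial matrix-vector multiply y = A*x, with the m x n matrix A stored
   row-major as a one-dimensional array of m*n doubles. */
void mat_vect_mult(double A[], double x[], double y[], int m, int n) {
    for (int i = 0; i < m; i++) {
        y[i] = 0.0;
        for (int j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];   /* dot product of row i with x */
    }
}
```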
Serial matrix-vector multiplication
Let’s assume x[] is distributed among the different processes
An MPI matrix-vector
multiplication function (1)
Copyright © 2010, Elsevier Inc.
All rights Reserved
An MPI matrix-vector
multiplication function (2)
Copyright © 2010, Elsevier Inc.
All rights Reserved
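A rough reconstruction of how such a function might be organized (not the exact code on the slides): each process owns local_m rows of A and local_n components of x, an MPI_Allgather assembles the full x, and each process then computes its own rows of y.

```c
#include <stdlib.h>
#include <mpi.h>

/* Sketch: parallel y = A*x.  local_A holds this process's local_m rows
   (row-major, n columns), local_x its local_n components of x, and
   local_y receives its local_m components of y. */
void mpi_mat_vect_mult(double local_A[], double local_x[], double local_y[],
                       int local_m, int n, int local_n, MPI_Comm comm) {
    double *x = malloc(n * sizeof(double));

    /* every process needs all of x for its dot products */
    MPI_Allgather(local_x, local_n, MPI_DOUBLE,
                  x,       local_n, MPI_DOUBLE,
                  comm);

    for (int i = 0; i < local_m; i++) {
        local_y[i] = 0.0;
        for (int j = 0; j < n; j++)
            local_y[i] += local_A[i * n + j] * x[j];
    }
    free(x);
}
```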
Keep in mind …
• In distributed memory systems,
communication is more expensive than
computation.
• Distributing a fixed amount of data
among several messages is more
expensive than sending a single big
message.
Derived datatypes
• Used to represent any collection of data
items by storing both the types of the items
and their relative locations in memory.
• If a function that sends data knows this
information about a collection of data items,
it can collect the items from memory before
they are sent.
• A function that receives data can distribute
the items into their correct destinations in
memory when they’re received.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Derived datatypes
• A sequence of basic MPI data types
together with a displacement for each
of the data types.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example: a and b are double; n is int. Each variable has an address in memory where it is stored, and a displacement measured from the beginning of the type (we assume the type starts with a).
MPI_Type_create_struct
• Builds a derived datatype that consists
of individual elements that have
different basic types.
Displacements are measured from the address of item 0 and have type MPI_Aint, an integer type that is big enough to store an address on the system. The first argument, count, gives the number of elements in the type.
Before you start using your new datatype, call MPI_Type_commit.
This allows the MPI implementation to
optimize its internal representation of
the datatype for use in communication
functions.
When you are finished with your new type, call MPI_Type_free.
This frees any additional storage used.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example (1)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example (2)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Example (3)
Copyright © 2010, Elsevier Inc.
All rights Reserved
The receiving end can use
the received complex data
item as if it were a structure.
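A hedged reconstruction of the whole pattern for the two doubles a, b and the int n from the earlier slide (my own code, not necessarily the slides' exact example): build the type, commit it, use it for one broadcast, then free it.

```c
#include <mpi.h>

/* Sketch: build a derived type describing {a, b, n}, commit it, use it to
   broadcast all three variables in a single message, then free it. */
void bcast_input(double *a_p, double *b_p, int *n_p, MPI_Comm comm) {
    int          blocklengths[3] = {1, 1, 1};
    MPI_Datatype types[3] = {MPI_DOUBLE, MPI_DOUBLE, MPI_INT};
    MPI_Aint     displs[3], a_addr, b_addr, n_addr;
    MPI_Datatype input_mpi_t;

    /* displacements are measured from the address of the first item, a */
    MPI_Get_address(a_p, &a_addr);
    MPI_Get_address(b_p, &b_addr);
    MPI_Get_address(n_p, &n_addr);
    displs[0] = 0;
    displs[1] = b_addr - a_addr;
    displs[2] = n_addr - a_addr;

    MPI_Type_create_struct(3, blocklengths, displs, types, &input_mpi_t);
    MPI_Type_commit(&input_mpi_t);           /* must commit before first use */

    MPI_Bcast(a_p, 1, input_mpi_t, 0, comm); /* one message carries a, b, and n */

    MPI_Type_free(&input_mpi_t);             /* release any extra storage */
}
```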
MEASURING TIME IN MPI
We have seen in the past …
• time in Linux
• clock() inside your code
• Does MPI offer anything else?
Elapsed parallel time
• MPI_Wtime returns the number of seconds that
have elapsed since some time in the past.
• It reports elapsed (wall-clock) time for
the calling process.
How to Sync Processes?
MPI_Barrier
• Ensures that no process will return from
calling it until every process in the
communicator has started calling it.
Copyright © 2010, Elsevier Inc.
All rights Reserved
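The two are often combined into a timing pattern like the following sketch (my own, following the common idiom): a barrier so all processes start together, MPI_Wtime around the work, and a reduction so process 0 reports the slowest process's time, which is what determines the parallel run-time.

```c
#include <stdio.h>
#include <mpi.h>

/* Sketch: time work() on every process and report the maximum elapsed time. */
void timed_run(void (*work)(void), MPI_Comm comm) {
    int my_rank;
    double start, finish, local_elapsed, elapsed;
    MPI_Comm_rank(comm, &my_rank);

    MPI_Barrier(comm);                  /* everyone starts (roughly) together */
    start = MPI_Wtime();
    work();                             /* the code being timed */
    finish = MPI_Wtime();

    local_elapsed = finish - start;
    MPI_Reduce(&local_elapsed, &elapsed, 1, MPI_DOUBLE, MPI_MAX, 0, comm);
    if (my_rank == 0)
        printf("Elapsed time = %e seconds\n", elapsed);
}
```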
Let's see how we can analyze the
performance of an MPI program,
using the matrix-vector multiplication as our example.
Run-times of serial and parallel
matrix-vector multiplication (in seconds)
Copyright © 2010, Elsevier Inc.
All rights Reserved
Speedups of Parallel Matrix-
Vector Multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
Efficiencies of Parallel Matrix-
Vector Multiplication
Copyright © 2010, Elsevier Inc.
All rights Reserved
Conclusions
• Reducing messages is a good
performance strategy!
– Collective vs point-to-point
• Distributing a fixed amount of data
among several messages is more
expensive than sending a single big
message.