A Diagnostic Research on the Effect of
Matrix Multiplication Parallelism on Multicore
Dr. D. Christopher Durairaj & Mrs. A. Bharathi Lakshmi
30.06.2020
Objective
• Matrix multiplication is an important core computation in many areas of scientific computing.
• For small multiplications we normally lean towards the naive matrix multiplication algorithm, which has rich data parallelism.
• To obtain more performance from that algorithm, we parallelized it using OpenMP.
• The system was implemented on an Intel Core i9-7900X with a 3.30 GHz x 20 CPU and 31.1 GB RAM.
• The operating system used is Fedora 27, and the software is the Geany Version 2 IDE with the GCC Linux compiler.
Parallel Computing
• Parallel computing has been a fast-evolving research area over the last few decades.
• Parallel computing means dividing a task into subtasks that execute simultaneously on multiple processors to yield the output.
• This paper shows the effect of the square naive matrix multiplication algorithm in both sequential and parallel implementations.
• Performance was evaluated on a multicore system on the basis of run time.
• Chip microprocessors: as chip capacity increased, placing multiple processors on a single chip became practical.
• OpenMP is an API that supports multithreaded, shared-memory parallelism (a minimal illustration follows).
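As a minimal illustration of this model (one task split into subtasks that run simultaneously), here is a generic OpenMP reduction sketch, not code from the paper; it builds with GCC's -fopenmp flag:

    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        // One task (summing an array) divided into subtasks that run
        // simultaneously: each thread sums its own slice, and the
        // reduction combines the partial results into a single output.
        const int n = 1000000;
        std::vector<double> data(n, 1.0);

        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; i++)
            sum += data[i];

        std::printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }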
Square Naive Matrix Multiplication
• Matrix multiplication executed with the naive algorithm on square matrices is referred to as square naive matrix multiplication.
• The product of two square matrices A and B yields the square matrix C.
• Each entry of the product is computed as a sum of n pairwise products, where n is the number of rows/columns.
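• Written out, cij = ai1·b1j + ai2·b2j + … + ain·bnj, i.e. the sum of aik·bkj over k = 1 to n.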
SQUARE_NAIVE_MATRIX-MULTIPLY (A, B, C)
    N = A.rows
    for i = 1 to N
        for j = 1 to N
            Cij = 0
            for k = 1 to N
                Cij = Cij + Aik * Bkj
    return C

Pseudo Code for Serial Version
PARALLEL_SQUARE_NAIVE_MATRIX-MULTIPLY (A, B, C)
    // assumes i, j, k are declared beforehand and C is zero-initialized
    #pragma omp parallel for schedule(dynamic, chunk) \
            collapse(2) private(i, j, k) shared(A, B, C)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
    return C;

Pseudo Code for Parallel Version
• With collapse(2) the i and j loops merge into a single n×n iteration space, so distinct threads always update distinct C[i][j] entries and no synchronization is needed; schedule(dynamic, chunk) hands out chunks of that space to whichever thread is idle.
OpenMP Clauses
• num_threads(integer)
• schedule(type_of_schedule, chunk_size)
• collapse(number_of_nested_for_loops)
• private(list_of_variables)
• shared(list_of_variables)
• These clauses are combined in the sketch below.
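A minimal sketch combining the clauses above on the naive product; the function name, the double** matrix representation, and the parameter names are illustrative assumptions, not from the slides:

    #include <omp.h>

    // Illustrative combination of the clauses listed above.
    // Assumes A, B, C point to n x n matrices and C is zero-initialized.
    // num_threads sets the team size; schedule(dynamic, chunk) hands chunks
    // of the collapsed (i, j) iteration space to idle threads; collapse(2)
    // merges the i and j loops into one parallel loop; private/shared mark
    // per-thread indices versus data visible to all threads.
    void parallel_multiply(double **A, double **B, double **C, int n,
                           int nthreads, int chunk)
    {
        int i, j, k;
        #pragma omp parallel for num_threads(nthreads) \
                schedule(dynamic, chunk) collapse(2) \
                private(i, j, k) shared(A, B, C)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                for (k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
    }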
Design
Implementation – Sequential Version
• The sequential square naive matrix multiplication algorithm was implemented in C++ with the GCC compiler, without OpenMP.
• The parallel codes were written in C++ with the GCC compiler using OpenMP.
• C = A × B, where A is an n×n matrix with elements aij, B is an n×n matrix with elements bij, and the product C is an n×n matrix with elements cij.
• The performance of the sequential matrix multiplication is mostly limited by inefficient memory access, especially when implemented in C++ (see the sketch below).
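A minimal, self-contained sketch of the sequential version; timing with std::chrono and the matrix contents are our assumptions, as the slides do not name the timer used:

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const int n = 1000;   // matrix order (illustrative)
        // n x n matrices stored row-major; C zero-initialized.
        std::vector<std::vector<double>> A(n, std::vector<double>(n, 1.0));
        std::vector<std::vector<double>> B(n, std::vector<double>(n, 2.0));
        std::vector<std::vector<double>> C(n, std::vector<double>(n, 0.0));

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    // B[k][j] walks down a column: this strided access
                    // is the memory inefficiency noted above.
                    C[i][j] += A[i][k] * B[k][j];
        auto stop = std::chrono::steady_clock::now();

        std::chrono::duration<double> elapsed = stop - start;
        std::printf("n=%d time=%.4f s\n", n, elapsed.count());
        return 0;
    }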
Implementation – Parallel Version
• The major design issues in implementing with OpenMP can be identified here.
• Parallelism in the square naive matrix multiplication algorithm is accomplished through OpenMP components: compiler directives, runtime library routines, and environment variables.
• Both codes were executed on the Fedora 27 Linux operating system on an Intel Core i9-7900X multicore system.
• The running time and performance (speedup, efficiency) of the serial and parallel square naive matrix multiplication algorithms were measured (a timing sketch follows).
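A sketch of how each parallel run could be timed, using omp_set_num_threads and omp_get_wtime; the harness shape and the chunk size of 64 are assumptions, and only the algorithm and platform come from the slides:

    #include <omp.h>

    // Time the parallel naive product for one thread count.
    // A, B, C are n x n; C must be zeroed before each run.
    double timed_multiply(double **A, double **B, double **C, int n,
                          int threads)
    {
        omp_set_num_threads(threads);
        double start = omp_get_wtime();
        #pragma omp parallel for schedule(dynamic, 64) collapse(2)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
        return omp_get_wtime() - start;   // elapsed wall-clock seconds
    }

Calling it for 1, 2, 4, 8, 12, 16, 18, and 20 threads (re-zeroing C before each run) covers the core counts reported in the tables below.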
Speedup and Efficiency
• Speedup = Sequential execution time / Parallel execution time
• Efficiency = Speedup / Number of cores
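• For example, for the 1000 × 1000 case on 2 cores (values from the tables below): Speedup = 4.1621 / 2.1539 = 1.9323, and Efficiency = 1.9323 / 2 = 0.9662.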
Result – Time Taken
Cores 1000×1000 2000×2000 3000×3000 4000×4000 5000×5000
1 4.1621 54.4683 233.9659 639.1539 1282.2257
2 2.1539 25.9129 118.3784 316.9857 641.8125
4 1.0993 17.0027 64.2888 172.9639 329.7763
8 0.5822 8.5168 34.0960 81.1930 163.0773
12 0.5074 5.8074 22.6845 62.6338 135.0753
16 0.5061 4.7371 19.4545 57.1035 126.0156
18 0.4708 4.6368 18.6277 52.2070 120.5529
20 0.4487 0.5695 0.6275 0.5502 0.5177
Result - Speedup
Cores 1000×1000 2000×2000 3000×3000 4000×4000 5000×5000
2 1.9323 2.1020 1.9764 2.0163 1.9978
4 3.7861 3.2035 3.6393 3.6953 3.8882
8 7.1488 6.3954 6.8620 7.8720 7.8627
12 8.2022 9.3791 10.3139 10.2046 9.4927
16 8.2241 11.4983 12.0263 11.1929 10.1751
18 8.8407 11.7469 12.5601 12.2427 10.6362
20 11.3795 14.6478 15.6904 15.2592 13.7029
Result - Efficiency
Cores 1000×1000 2000×2000 3000×3000 4000×4000 5000×5000
2 0.9662 1.0510 0.9882 1.0082 0.9989
4 0.9465 0.8009 0.9098 0.9238 0.9720
8 0.8936 0.7994 0.8577 0.9840 0.9828
12 0.6835 0.7816 0.8595 0.8504 0.7911
16 0.5140 0.7186 0.7516 0.6996 0.6359
18 0.4912 0.6526 0.6978 0.6801 0.5909
20 0.4002 0.5826 0.6621 0.6423 0.5482
Summary
• As the number of threads per core and the matrix size increase, the performance (speedup and efficiency) of the parallel square naive matrix multiplication implementation also increases.
• Parallel computing with OpenMP achieved better execution times than serial computing, especially when handling larger datasets.