A Diagnostic Research on the Effect of
Matrix Multiplication Parallelism on Multicore
Dr. D. Christopher Durairaj & Mrs. A. Bharathi Lakshmi
30.06.2020
Objective
• Matrix multiplication is an important core computation in many areas of scientific computing.
• For small multiplications we normally lean towards the naive matrix multiplication algorithm, which has rich data parallelism.
• To obtain more performance from that algorithm, we parallelized it using OpenMP.
• The system was implemented on an Intel Core i9-7900X with a 3.30 GHz x 20 CPU and 31.1 GB RAM.
• The operating system used is Fedora 27, and the software is the Geany Version 2 IDE with the GCC Linux compiler.
Parallel Computing
• Parallel computing has been a fast-evolving research area over the last few decades.
• Parallel computing means dividing a task into subtasks that execute simultaneously on multiple processors to yield the output.
• This paper shows the effect of the square naive matrix multiplication algorithm in both sequential and parallel implementations.
• Performance was evaluated on a multicore system on the basis of run time.
• Chip microprocessors: as chip capacity increased, placing multiple processors on a single chip became practical.
• OpenMP is an API that supports multithreaded, shared-memory parallelism (a minimal illustration follows).
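As a minimal illustration of this model (one task split into subtasks that run simultaneously), here is a generic OpenMP reduction sketch, not code from the paper; it builds with GCC's -fopenmp flag:

    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main()
    {
        // One task (summing an array) divided into subtasks that run
        // simultaneously: each thread sums its own slice, and the
        // reduction combines the partial results into a single output.
        const int n = 1000000;
        std::vector<double> data(n, 1.0);

        double sum = 0.0;
        #pragma omp parallel for reduction(+ : sum)
        for (int i = 0; i < n; i++)
            sum += data[i];

        std::printf("sum = %.1f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }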
Square Naive Matrix Multiplication
• Matrix multiplication executed with the naive algorithm on square matrices is referred to as square naive matrix multiplication.
• The product of two square matrices A and B yields the square matrix C.
• Each entry of the product is computed as a sum of n pairwise products, where n is the number of rows/columns.
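• Written out, cij = ai1·b1j + ai2·b2j + … + ain·bnj, i.e. the sum of aik·bkj over k = 1 to n.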
SQUARE_NAIVE_MATRIX-MULTIPLY (A, B, C)
    N = A.rows
    for i = 1 to N
        for j = 1 to N
            Cij = 0
            for k = 1 to N
                Cij = Cij + Aik * Bkj
    return C

Pseudo Code for Serial Version
PARALLEL_SQUARE_NAIVE_MATRIX-MULTIPLY (A, B, C)
    // assumes i, j, k are declared beforehand and C is zero-initialized
    #pragma omp parallel for schedule(dynamic, chunk) \
            collapse(2) private(i, j, k) shared(A, B, C)
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
    return C;

Pseudo Code for Parallel Version
• With collapse(2) the i and j loops merge into a single n×n iteration space, so distinct threads always update distinct C[i][j] entries and no synchronization is needed; schedule(dynamic, chunk) hands out chunks of that space to whichever thread is idle.
OpenMP Clauses
• num_threads(integer)
• schedule(type_of_schedule, chunk_size)
• collapse(number_of_nested_for_loops)
• private(list_of_variables)
• shared(list_of_variables)
• These clauses are combined in the sketch below.
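A minimal sketch combining the clauses above on the naive product; the function name, the double** matrix representation, and the parameter names are illustrative assumptions, not from the slides:

    #include <omp.h>

    // Illustrative combination of the clauses listed above.
    // Assumes A, B, C point to n x n matrices and C is zero-initialized.
    // num_threads sets the team size; schedule(dynamic, chunk) hands chunks
    // of the collapsed (i, j) iteration space to idle threads; collapse(2)
    // merges the i and j loops into one parallel loop; private/shared mark
    // per-thread indices versus data visible to all threads.
    void parallel_multiply(double **A, double **B, double **C, int n,
                           int nthreads, int chunk)
    {
        int i, j, k;
        #pragma omp parallel for num_threads(nthreads) \
                schedule(dynamic, chunk) collapse(2) \
                private(i, j, k) shared(A, B, C)
        for (i = 0; i < n; i++)
            for (j = 0; j < n; j++)
                for (k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
    }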
Design
Implementation – Sequential Version
• The sequential square naive matrix multiplication algorithm was implemented in C++ with the GCC compiler, without OpenMP.
• The parallel codes were written in C++ with the GCC compiler using OpenMP.
• C = A × B, where A is an n×n matrix with elements aij, B is an n×n matrix with elements bij, and the product C is an n×n matrix with elements cij.
• The performance of the sequential matrix multiplication is mostly limited by inefficient memory access, especially when implemented in C++ (see the sketch below).
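A minimal, self-contained sketch of the sequential version; timing with std::chrono and the matrix contents are our assumptions, as the slides do not name the timer used:

    #include <chrono>
    #include <cstdio>
    #include <vector>

    int main()
    {
        const int n = 1000;   // matrix order (illustrative)
        // n x n matrices stored row-major; C zero-initialized.
        std::vector<std::vector<double>> A(n, std::vector<double>(n, 1.0));
        std::vector<std::vector<double>> B(n, std::vector<double>(n, 2.0));
        std::vector<std::vector<double>> C(n, std::vector<double>(n, 0.0));

        auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    // B[k][j] walks down a column: this strided access
                    // is the memory inefficiency noted above.
                    C[i][j] += A[i][k] * B[k][j];
        auto stop = std::chrono::steady_clock::now();

        std::chrono::duration<double> elapsed = stop - start;
        std::printf("n=%d time=%.4f s\n", n, elapsed.count());
        return 0;
    }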
Implementation – Parallel Version
• The major design issues in implementing with OpenMP can be identified here.
• Parallelism in the square naive matrix multiplication algorithm is accomplished through OpenMP components: compiler directives, runtime library routines, and environment variables.
• Both codes were executed on the Fedora 27 Linux operating system on an Intel Core i9-7900X multicore system.
• The running time and performance (speedup, efficiency) of the serial and parallel square naive matrix multiplication algorithms were measured (a timing sketch follows).
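A sketch of how each parallel run could be timed, using omp_set_num_threads and omp_get_wtime; the harness shape and the chunk size of 64 are assumptions, and only the algorithm and platform come from the slides:

    #include <omp.h>

    // Time the parallel naive product for one thread count.
    // A, B, C are n x n; C must be zeroed before each run.
    double timed_multiply(double **A, double **B, double **C, int n,
                          int threads)
    {
        omp_set_num_threads(threads);
        double start = omp_get_wtime();
        #pragma omp parallel for schedule(dynamic, 64) collapse(2)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    C[i][j] += A[i][k] * B[k][j];
        return omp_get_wtime() - start;   // elapsed wall-clock seconds
    }

Calling it for 1, 2, 4, 8, 12, 16, 18, and 20 threads (re-zeroing C before each run) covers the core counts reported in the tables below.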
Speedup and Efficiency
• Speedup = Sequential execution time / Parallel execution time
• Efficiency = Speedup / Number of cores
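• For example, for the 1000 × 1000 case on 2 cores (values from the tables below): Speedup = 4.1621 / 2.1539 = 1.9323, and Efficiency = 1.9323 / 2 = 0.9662.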
Result – Time Taken
Cores 1000×1000 2000×2000 3000×3000 4000×4000 5000×5000
1 4.1621 54.4683 233.9659 639.1539 1282.2257
2 2.1539 25.9129 118.3784 316.9857 641.8125
4 1.0993 17.0027 64.2888 172.9639 329.7763
8 0.5822 8.5168 34.0960 81.1930 163.0773
12 0.5074 5.8074 22.6845 62.6338 135.0753
16 0.5061 4.7371 19.4545 57.1035 126.0156
18 0.4708 4.6368 18.6277 52.2070 120.5529
20 0.4487 0.5695 0.6275 0.5502 0.5177
Result - Speedup
Cores 1000×1000 2000×2000 3000×3000 4000×4000 5000×5000
2 1.9323 2.1020 1.9764 2.0163 1.9978
4 3.7861 3.2035 3.6393 3.6953 3.8882
8 7.1488 6.3954 6.8620 7.8720 7.8627
12 8.2022 9.3791 10.3139 10.2046 9.4927
16 8.2241 11.4983 12.0263 11.1929 10.1751
18 8.8407 11.7469 12.5601 12.2427 10.6362
20 11.3795 14.6478 15.6904 15.2592 13.7029
Result - Efficiency
Cores 1000×1000 2000×2000 3000×3000 4000×4000 5000×5000
2 0.9662 1.0510 0.9882 1.0082 0.9989
4 0.9465 0.8009 0.9098 0.9238 0.9720
8 0.8936 0.7994 0.8577 0.9840 0.9828
12 0.6835 0.7816 0.8595 0.8504 0.7911
16 0.5140 0.7186 0.7516 0.6996 0.6359
18 0.4912 0.6526 0.6978 0.6801 0.5909
20 0.4002 0.5826 0.6621 0.6423 0.5482
Summary
• As the number of threads per core and the matrix size increase, the performance (speedup and efficiency) of the parallel square naive matrix multiplication implementation also increases.
• Parallel computing with OpenMP achieved better execution times than serial computing, especially when handling larger datasets.