Parallel Computing
30.09.2020
Holy Cross College, Puttady Kerala
International Webinar
Dr.A.Bharathi Lakshmi
Head of IT Department, VVVC, VNR
Content
•What
•Why
•Architecture
•Software and Processors
•Parallel Programming
•Research Work
Preliminary
Parallel Computing
Serial Computing
Parallel Computing
Why Parallel Computing
•Save Time
•Memory Usage
•Concurrency
Architecture
Architecture
•Flynn’s Taxonomy
•Feng’s Classification
•Handler Classification
Flynn’s Taxonomy
Flynn’s Taxonomy - SISD
Flynn’s Taxonomy – SIMD
Flynn’s Taxonomy - MISD
Flynn’s Taxonomy - MIMD
Memory Architecture
•Shared Memory
•Uniform Memory Access (UMA)
•Non Uniform Memory Access (NUMA)
•Distributed Memory
•Hybrid Memory
Memory Architecture - UMA
Memory Architecture - NUMA
Memory Architecture - Distributed Memory
Memory Architecture - Hybrid Memory
Types of Parallel Computing
•Data Parallel
•Task Parallel
•Pipeline Parallel
OS & Processor
• Multiprocessing
• Multitasking
• Multithreading
• AMD
• 4 – 32 Cores
• 4 – 64 Threads
• Intel
• 2 – 7 Cores
• Duo – multithreading
• I7 – 8 Cores
Programming Languages and Frameworks
• Apache Hadoop
• Apache Spark
• Apache Flink
• Apache Beam
• CUDA
• OpenCL
• OpenHMPP
• OpenMP for C, C++ and Fortran (Shared Memory)
• Message Passing Interface (MPI) for C, C++ and Fortran (Distributed Memory)
OpenMP
•Thread Modeling
•Converting Serial to Parallel program is easy
•Unix Pthreads
•Compiler directives
•Runtime Library
•Environment Variables
•Fork-Join Model
OpenMP – Fork-Join Model
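The fork-join model can be sketched in a few lines of C. This is a minimal illustration, assuming an OpenMP-enabled compiler; the function name `count_team_threads` is hypothetical and does not come from the slides.

```c
/* Hypothetical helper: forks a team of threads, lets every thread add 1 to
   its private copy of the counter, and returns the combined total after the
   join. */
int count_team_threads(void)
{
    int members = 0;

    /* Fork: the master thread creates a team and every thread runs the block. */
    #pragma omp parallel reduction(+:members)
    {
        members += 1;
    }
    /* Join: the team synchronizes here and only the master thread continues. */

    return members;
}
```

When built with OpenMP enabled, the function returns the size of the team. When the pragma is ignored (a serial build), the block runs once and the function returns 1.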
OpenMP - Directives
• For Parallel work-sharing
• parallel - #pragma omp parallel
• for - #pragma omp [parallel] for [clauses]
• sections - #pragma omp [parallel] sections [clauses]
• single - #pragma omp single [clauses]
• For Master and Synchronization
• master
• critical
• barrier
• atomic
• flush
• ordered
• #pragma omp directive
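A short sketch of two of the directives listed above, `sections` for work-sharing and `atomic` for synchronization. The function names and the test data are illustrative assumptions, not part of the original slides.

```c
#include <stddef.h>

/* Hypothetical example: computes the sum and the maximum of an array as two
   independent tasks, one per section. Assumes n >= 1. */
void sum_and_max(const int *data, size_t n, long *sum_out, int *max_out)
{
    long sum = 0;
    int max = data[0];

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            /* Only this section writes to sum. */
            for (size_t i = 0; i < n; i++)
                sum += data[i];
        }
        #pragma omp section
        {
            /* Only this section writes to max. */
            for (size_t i = 1; i < n; i++)
                if (data[i] > max)
                    max = data[i];
        }
    }

    *sum_out = sum;
    *max_out = max;
}

/* Hypothetical example: counts even values; the shared counter is updated
   by several threads, so the increment is protected with atomic. */
int count_even(const int *data, int n)
{
    int count = 0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        if (data[i] % 2 == 0) {
            #pragma omp atomic
            count++;
        }
    }
    return count;
}
```

In `sum_and_max` no synchronization is needed because each shared variable has exactly one writer. In `count_even` every thread may write to `count`, which is why the update is marked `atomic`.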
OpenMP
•omp.h – Runtime Library
•Environment Variables
•OMP_DYNAMIC
•OMP_NUM_THREADS
•OMP_SCHEDULE
•OMP_NESTED
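OMP_DYNAMIC
•Syntax
OMP_DYNAMIC=value
•value – true | false
•true – the runtime may adjust the number of threads used for parallel regions
•false – the runtime does not adjust the number of threads
•Default value – false in common implementations (the OpenMP specification leaves it implementation-defined)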
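OMP_NUM_THREADS
• Syntax
OMP_NUM_THREADS=num_list
• num_list – one or more positive integer values, separated by commas
• Single value
• OMP_DYNAMIC true & parallel construct without num_threads: the value is the maximum number of threads
• OMP_DYNAMIC false & parallel construct without num_threads: the value is the number of threads requested
• Multiple values (one value per nesting level of parallel regions)
• OMP_DYNAMIC true & parallel construct without num_threads: each value is the maximum number of threads at its level
• OMP_DYNAMIC false & parallel construct without num_threads: each value is the number of threads requested at its level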
OMP_SCHEDULE
• Syntax
OMP_SCHEDULE[=type[,size]]
• Type
• dynamic
• guided
• runtime (as a schedule clause type, it defers the choice to OMP_SCHEDULE)
• static
• Size
• Chunk size in iterations
• Positive integer
• Not valid with runtime
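The effect of a schedule type and chunk size can be sketched with a loop whose iterations have uneven cost. This is an illustrative example under stated assumptions: the function name `triangular_work` and the chunk size of 4 are hypothetical choices, not values from the slides.

```c
/* Hypothetical workload: iteration i performs i + 1 additions, so later
   iterations cost more than earlier ones. Returns n * (n + 1) / 2. */
long triangular_work(int n)
{
    long total = 0;

    /* dynamic,4: a thread that becomes idle fetches the next chunk of 4
       iterations, which helps balance uneven iteration costs.
       schedule(runtime) would instead read the type and chunk size from
       the OMP_SCHEDULE environment variable. */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++)
            total += 1;
    }
    return total;
}
```

The result does not depend on the schedule; only the distribution of iterations over threads, and therefore the running time, changes.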
OMP_NESTED
• Syntax
OMP_NESTED[=true | false]
• true – nested parallelism enabled
• false – nested parallelism disabled
• Default – false
Functions
• void omp_set_num_threads(int num_threads)
• int omp_get_num_threads(void)
• int omp_get_max_threads(void)
• int omp_get_thread_num(void)
• int omp_get_num_procs(void)
• void omp_set_dynamic(int dynamic_threads)
• int omp_get_dynamic(void)
• void omp_set_nested(int nested)
• int omp_get_nested(void)
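A minimal sketch of how some of these runtime routines fit together. The helper name `request_threads` is hypothetical, and the `_OPENMP` guard is an assumption added so that the code also builds when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: requests a fixed team size and reports how many
   threads the next parallel region may use. */
int request_threads(int wanted)
{
#ifdef _OPENMP
    omp_set_dynamic(0);            /* ask the runtime not to adjust the team size */
    omp_set_num_threads(wanted);   /* takes precedence over OMP_NUM_THREADS */
    return omp_get_max_threads();  /* upper bound for the next parallel region */
#else
    (void)wanted;                  /* serial build: the request has no effect */
    return 1;
#endif
}
```

A call such as `request_threads(4)` would typically be made once, before the first parallel region. `omp_get_num_threads` and `omp_get_thread_num` are only informative inside a parallel region; outside one they report a team of 1 and thread id 0.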
OpenMP - Clauses
• For General attributes
• if - if(expression)
• num_threads – num_threads(num)
• ordered - ordered
• schedule – schedule(type[,size])
• nowait - nowait
• For data-sharing attributes
• private – private(var)
• firstprivate – firstprivate(var)
• lastprivate – lastprivate(var)
• shared – shared(var)
• default – default(shared | none)
• reduction – reduction(operator:list)
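The data-sharing clauses are easier to follow in a small loop. The sketch below is illustrative only; the function `scale_and_sum` and its parameters are hypothetical, and it assumes `n >= 1`.

```c
/* Hypothetical example: multiplies each input element by factor, stores the
   result in out, and returns the sum of the results. The value computed by
   the last iteration is written to *last_out. Assumes n >= 1. */
long scale_and_sum(const int *in, int *out, int n, int factor, int *last_out)
{
    long sum = 0;
    int last = 0;

    /* firstprivate(factor): each thread gets its own copy, initialized
                             from the value before the loop.
       lastprivate(last):    after the loop, last holds the value assigned
                             in the sequentially final iteration.
       reduction(+:sum):     each thread sums privately; the partial sums
                             are combined at the end. */
    #pragma omp parallel for firstprivate(factor) lastprivate(last) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        int scaled = in[i] * factor;   /* declared in the loop body: private */
        out[i] = scaled;
        last = scaled;
        sum += scaled;
    }

    *last_out = last;
    return sum;
}
```

The arrays `in` and `out` are shared, which is safe here because every iteration touches a different element.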
Paradigm for using OMP
• Write a sequential program
• Identify the portion to be parallelized
•Add directive/pragmas
• Optionally, call runtime library routines and set environment variables
• The parallel program is ready
• Compile with an OpenMP-enabled compiler (for example, gcc -fopenmp)
•Run the program
Matrix Multiplication
•Serial coding
for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];
Matrix Multiplication
• Parallel coding
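#include <omp.h>
omp_set_num_threads(4);   /* call from inside a function, before the loop */
/* i, k and j are declared in the loops, so they are private to each thread */
/* the directive must be followed directly by the for loop, not by a { } block */
#pragma omp parallel for
for (int i = 0; i < n; i++)
    for (int k = 0; k < n; k++)
        for (int j = 0; j < m; j++)
            c[i][j] = c[i][j] + a[i][k] * b[k][j];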
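The fragment above leaves out the declarations of `a`, `b`, `c`, `n` and `m`. A self-contained version might look like the following sketch, which assumes row-major flat arrays of `double`; the function name `matmul_add` and the storage layout are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: c += a * b, where a is n x n, b is n x m and c is
   n x m, all stored row by row in flat arrays. The caller is expected to
   zero c first if a plain product is wanted. */
void matmul_add(int n, int m, const double *a, const double *b, double *c)
{
    /* The outer loop is split across threads. Each thread works on its own
       rows of c, so no two threads write to the same element. */
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++)
            for (int j = 0; j < m; j++)
                c[(size_t)i * m + j] += a[(size_t)i * n + k] * b[(size_t)k * m + j];
}
```

Only the outer loop is parallelized. Parallelizing the `k` loop instead would make several threads update the same `c[i][j]` and would need a reduction or other synchronization.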
Sum of an Array
•Serial coding
sum = 0;
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        sum += a[i][j];
Sum of an Array
• Parallel coding
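#include <omp.h>
omp_set_num_threads(4);   /* call from inside a function, before the loop */
sum = 0;
/* i and j are declared in the loops, so they are private to each thread */
/* reduction(+:sum) gives each thread a private partial sum and adds them at the end */
#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        sum += a[i][j];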
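A self-contained version of the array sum is sketched below, assuming an n x m array of `double` stored row by row in a flat buffer. The function name `array_sum` and the layout are illustrative assumptions.

```c
#include <stddef.h>

/* Hypothetical helper: returns the sum of all elements of an n x m array
   stored row by row in a flat buffer. */
double array_sum(int n, int m, const double *a)
{
    double sum = 0.0;

    /* Without the reduction clause, all threads would update the shared
       variable sum at the same time and the result would be unreliable. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            sum += a[(size_t)i * m + j];

    return sum;
}
```

Because the partial sums are combined in an unspecified order, floating-point results can differ in the last digits from the serial loop.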
Image Reconstruction - Pseudocode
Image Reconstruction – Time complexity
[Figures: Time complexity graph; Speedup graph]
Time complexity
          10          12          15          20          30
FBP       0.008455    0.007704    0.00781     0.015839    0.021083
SIRT      75.6588     76.6664     91.628      56.3881     176.8353
SART      50.5161     57.6855     56.3243     56.3881     202.067
ART       1609.1      1699.73     1889.8      918.3131    723.983
MLEM      522.894     462.973     750.215     709.861     2134.4
MAPEM     726.522     532.309     727.098     532.317     771.465
2 Core    502.087     332.341     502.65      332.347     463.192
4 Core    398.953     297.146     399.495     297.143     447.483
8 Core    198.488     145.926     199.045     145.934     259.513
Square Naïve Matrix Multiplication
Time complexity
[Figure: Time complexity graph. Time versus matrix size (1000 × 1000 to 5000 × 5000), one series each for 1, 2, 4, 8, 12, 16, 18 and 20 cores; vertical axis 0 to 1400]
[Figure: Speedup graph. Speedup versus matrix size (1000 × 1000 to 5000 × 5000), one series each for 2, 4, 8, 12, 16, 18 and 20 cores; vertical axis 0 to 18]
Cores   1000 × 1000   2000 × 2000   3000 × 3000   4000 × 4000   5000 × 5000
1       4.1621        54.4683       233.965       639.153       1282.2257
2       2.1539        25.9129       118.3784      316.9857      641.8125
4       1.0993        17.0027       64.2888       172.9639      329.7763
8       0.5822        8.5168        34.0960       81.1930       163.0773
12      0.5074        5.8074        22.6845       62.6338       135.0753
16      0.5061        4.7371        19.455        57.1035       126.0156
18      0.4708        4.6368        18.6277       52.2070       120.5529
20      0.4487        0.5695        0.6275        0.5502        0.5177
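Speedup is conventionally computed as the one-core time divided by the p-core time, and efficiency as speedup divided by p. The slides do not state the formula used for the speedup graph, so the sketch below assumes this standard definition; the function names are hypothetical.

```c
/* Speedup S = T1 / Tp, where T1 is the time on one core and Tp the time
   on p cores (standard definition, assumed here). */
double speedup(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;
}

/* Efficiency E = S / p: the fraction of ideal linear speedup achieved. */
double efficiency(double t_serial, double t_parallel, int cores)
{
    return speedup(t_serial, t_parallel) / cores;
}
```

For the 1000 × 1000 column of the table above, `speedup(4.1621, 2.1539)` is about 1.93 on 2 cores, which corresponds to an efficiency of about 0.97.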
Hot research
•NVIDIA
•Data mining – tremendous data
•Lacking in techniques and Computational power
•AI/Machine learning
•Image Processing
•Medical Field
• Image Reconstruction