UNIT 1 - Introduction
AUTOMATING PARALLEL PROGRAMMING
 When writing code, we typically don’t need to understand the details of the target system, as the
compiler handles it.
 Developers usually think in terms of a single CPU and sequential processing during coding and
debugging.
 Implementing an algorithm in parallel software and implementing it in parallel hardware are more closely related activities than they might appear.
 Parallelism in software and hardware shares common challenges and approaches.
AUTOMATING PARALLEL PROGRAMMING
Layers of Implementation
 Layer 5 - Application Layer:
 Defines the application or problem to be implemented on a parallel computing platform.
 Specifies the inputs and outputs, including data storage and timing requirements.
 Layer 4 - Algorithm Development:
 Focuses on defining the tasks and their interdependencies.
 Parallelism may not be evident in this layer, as tasks are usually developed for linear execution.
 The output of this layer is a dependence graph, directed graph (DG), or adjacency matrix that summarizes the task dependencies (see the sketch below).
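As an illustration (not taken from the slides), a dependence graph produced in this layer might be recorded as an adjacency matrix; the four-task graph and the array name dep below are invented for the example, where dep[i][j] = 1 means task j depends on the result of task i.

    #include <stdio.h>

    #define NTASKS 4

    int main(void) {
        /* Hypothetical dependence graph: task 0 feeds tasks 1 and 2,
           and tasks 1 and 2 both feed task 3.  dep[i][j] == 1 means
           task j depends on (must wait for) task i. */
        int dep[NTASKS][NTASKS] = {
            {0, 1, 1, 0},
            {0, 0, 0, 1},
            {0, 0, 0, 1},
            {0, 0, 0, 0},
        };

        /* A task with no incoming edges has no unmet dependencies,
           so it can start immediately; such tasks could run in parallel. */
        for (int j = 0; j < NTASKS; j++) {
            int indegree = 0;
            for (int i = 0; i < NTASKS; i++)
                indegree += dep[i][j];
            if (indegree == 0)
                printf("task %d has no dependencies and is ready\n", j);
        }
        return 0;
    }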
AUTOMATING PARALLEL PROGRAMMING
 Layer 3 - Parallelization Layer:
 Extracts parallelism from the algorithm developed in Layer 4.
 It generates thread timing and processor assignments for software or hardware implementations.
 This layer is crucial for optimizing the algorithm for parallel execution.
 Layer 2 - Coding Layer:
 Involves writing the parallel algorithm in a high-level language.
 The language depends on the target parallel computing platform. For general-purpose platforms, languages and frameworks such as Cilk++, OpenMP, or CUDA (Compute Unified Device Architecture) are used (see the sketch below).
 For custom platforms, hardware description languages (HDLs) such as Verilog or VHDL are used.
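As a concrete, hedged illustration of this coding layer, the sketch below expresses an independent loop in C with OpenMP, one of the languages named above; the array names and size are chosen only for the example, and with a typical compiler it is built with an OpenMP flag such as -fopenmp.

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    static double a[N], b[N], c[N];

    int main(void) {
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* The pragma asks the OpenMP runtime to divide the iterations
           among the available threads; each iteration is independent. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[%d] = %f\n", N - 1, c[N - 1]);   /* sanity check */
        return 0;
    }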
AUTOMATING PARALLEL PROGRAMMING
 Layer 1 - Realization Layer:
 The algorithm is realized on a parallel computing platform, using methods such as multithreading or custom parallel
processors (e.g., ASICs (application-specific integrated circuits) or FPGAs (field-programmable gate arrays)).
Automatic Programming in Parallel Computing:
 Automatic serial programming: The programmer writes code in high-level languages (C, Java, FORTRAN), and the
code is compiled automatically.
 Parallel computing: more complex, because the programmer must manage how tasks are distributed and executed across
multiple processors.
 Parallelizing compilers can handle simple loops and embarrassingly parallel algorithms (tasks that can be split into independent pieces with little effort); a simple loop with no cross-iteration dependences is sketched below.
 For more complex tasks, the programmer needs intimate knowledge of how the processors interact and when each task executes.
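To make the distinction concrete, here is a small sketch of mine (not from the slides): the first loop has fully independent iterations, the kind a parallelizing compiler or a simple OpenMP directive can handle, while the second carries a dependence from one iteration to the next, which prevents naive parallelization.

    #define M 1000

    /* Embarrassingly parallel: y[i] depends only on x[i], so the
       iterations can be executed in any order or all at once. */
    void independent_iterations(const double x[M], double y[M]) {
        for (int i = 0; i < M; i++)
            y[i] = 3.0 * x[i] + 1.0;
    }

    /* Loop-carried dependence: x[i] needs the x[i-1] produced by the
       previous iteration, so the iterations cannot simply be split
       across processors without restructuring the algorithm. */
    void prefix_sum_in_place(double x[M]) {
        for (int i = 1; i < M; i++)
            x[i] += x[i - 1];
    }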
Parallel Algorithms and Parallel Architectures
 Parallel algorithms and parallel hardware are interconnected; the development of one often depends on
the other.
 Parallelism can be implemented at different levels in a computing system through hardware and
software techniques:
Data-Level Parallelism
 Operates on multiple bits of a datum or multiple data simultaneously.
 Examples: Bit-parallel addition, multiplication, division, vector processor arrays, and systolic arrays.
Instruction-Level Parallelism (ILP)
 Executes multiple instructions simultaneously within a processor. Example:
 Instruction pipelining.
Parallel Algorithms and Parallel Architectures
Thread-Level Parallelism (TLP)
 Executes multiple threads (lightweight processes that share the resources of a processor) simultaneously.
 Threads can run on one processor or on multiple processors (a minimal threading sketch follows this list).
Process-Level Parallelism
 Manages multiple independent processes, each with dedicated resources like memory and registers.
Example:
 Classic multitasking and time-sharing across single or multiple machines.
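A minimal thread-level-parallelism sketch in C using POSIX threads (the threading API is my choice; the slides do not prescribe one): each thread sums its own slice of a shared array, and the main thread combines the partial results.

    #include <stdio.h>
    #include <pthread.h>

    #define NTHREADS 4
    #define LEN 1000000

    static double data[LEN];
    static double partial[NTHREADS];   /* one result slot per thread */

    static void *sum_slice(void *arg) {
        long t  = (long)arg;                                /* thread index */
        long lo = t * (LEN / NTHREADS);
        long hi = (t == NTHREADS - 1) ? LEN : lo + LEN / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++)
            s += data[i];
        partial[t] = s;
        return NULL;
    }

    int main(void) {
        pthread_t tid[NTHREADS];
        for (long i = 0; i < LEN; i++) data[i] = 1.0;

        for (long t = 0; t < NTHREADS; t++)                 /* create threads */
            pthread_create(&tid[t], NULL, sum_slice, (void *)t);

        double total = 0.0;
        for (long t = 0; t < NTHREADS; t++) {               /* wait and merge */
            pthread_join(tid[t], NULL);
            total += partial[t];
        }
        printf("total = %.1f (expected %d.0)\n", total, LEN);
        return 0;
    }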
Measuring benefits of Parallel Computing
Speedup Factor
 The benefit of parallel computing is measured by comparing the time taken to complete a task on a single processor with the time taken on N parallel processors. The speedup S(N) is defined as
S(N) = Tp(1) / Tp(N)
 where Tp(1) is the algorithm processing time on a single processor and Tp(N) is the processing time on the N parallel processors.
 In the ideal situation of a fully parallelizable algorithm, and when the communication time between processors and memory is neglected, Tp(N) = Tp(1)/N, and the above equation gives S(N) = N (a measurement sketch follows).
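The same speedup can be estimated empirically. The sketch below is an illustration of mine, assuming an OpenMP-capable compiler and an arbitrary work loop: it times one run with a single thread and one with N threads using omp_get_wtime(), then reports Tp(1)/Tp(N).

    #include <stdio.h>
    #include <math.h>
    #include <omp.h>

    #define LEN 10000000

    static double a[LEN];

    /* Run an independent-iteration loop on the requested number of
       threads and return the elapsed wall-clock time in seconds. */
    static double timed_run(int nthreads) {
        double t0 = omp_get_wtime();
        #pragma omp parallel for num_threads(nthreads)
        for (long i = 0; i < LEN; i++)
            a[i] = sin((double)i) * cos((double)i);
        return omp_get_wtime() - t0;
    }

    int main(void) {
        int N = 4;                       /* number of parallel processors */
        double tp1 = timed_run(1);       /* Tp(1)                         */
        double tpN = timed_run(N);       /* Tp(N)                         */
        printf("S(%d) = Tp(1) / Tp(N) = %.2f\n", N, tp1 / tpN);
        return 0;
    }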
Communication Overhead
 Both single and parallel computing systems require data transfer between processors and memory.
 Communication delays occur due to a speed mismatch between the processor and memory.
 Parallel systems need processors to exchange data via interconnection networks, adding complexity.
Issues Affecting Communication Efficiency:
Interconnection Network Delay:
 Delays arise from factors like:
 Bit propagation.
 Message transmission.
 Queuing within the network.
 These delays depend on network topology, data size, and network speed.
Communication Overhead
Memory Bandwidth:
 A single-port memory can transfer only one word per memory cycle, which limits the bandwidth available to the processors.
Memory Collisions:
 Occur when multiple processors try to access the same memory module simultaneously.
 Arbitration mechanisms are required to resolve access conflicts.
Memory Wall:
 Memory transfer speeds lag behind processor speeds.
 This problem is mitigated by using a memory hierarchy, such as
 registers → cache → RAM → electronic disk → magnetic disk → optical disk (a small illustration of its effect follows).
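As a rough, hedged illustration of why the hierarchy matters, the sketch below (mine, not from the slides) sums the same array twice: once sequentially, which is cache-friendly, and once with a large stride that defeats the caches; on typical hardware the strided sweep is noticeably slower even though it performs the same arithmetic.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define LEN    (1 << 24)   /* 16M ints, larger than typical caches */
    #define STRIDE 4096        /* far apart enough to miss the cache   */

    static double seconds(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        int *a = malloc((size_t)LEN * sizeof *a);
        for (long i = 0; i < LEN; i++) a[i] = 1;

        double t0 = seconds();
        long seq = 0;
        for (long i = 0; i < LEN; i++)          /* sequential: good locality */
            seq += a[i];
        double t1 = seconds();

        long strided = 0;
        for (long s = 0; s < STRIDE; s++)       /* same elements, poor locality */
            for (long i = s; i < LEN; i += STRIDE)
                strided += a[i];
        double t2 = seconds();

        printf("sequential %.3f s, strided %.3f s (sums %ld, %ld)\n",
               t1 - t0, t2 - t1, seq, strided);
        free(a);
        return 0;
    }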
Estimating Speedup Factor and Communication Overhead
 Assume we have a parallel algorithm consisting of N independent tasks that can be executed either on a single processor or on N processors.
 Under these ideal circumstances, Tp(N) = Tp(1)/N and the speedup is S(N) = N; in practice, the communication overhead between the processors and memory reduces the achievable speedup below N.
Amdahl's Law
 Amdahl's Law is a fundamental principle used to estimate the potential speedup that can be achieved by
parallelizing a computation.
 It describes the maximum expected improvement in the execution time of a program when part of the
computation is parallelized.
Amdahl's Law
 Maximum overall speedup = 1 / (1 – f), where f (the "fraction enhanced") is the portion of the execution time that can be improved; this is the limit reached when the enhanced portion becomes arbitrarily fast.
 In the special case f = 1, where the entire computation can be enhanced, this bound disappears and the speedup is limited only by the number of processors. Amdahl's law thus states that the maximum potential improvement to the performance of a system is limited by the portion of the system that cannot be improved.
 In other words, the performance improvement of a system as a whole is limited by its bottlenecks. The
law is often used to predict the potential performance improvement of a system when adding more
processors or improving the speed of individual processors.
 It is named after Gene Amdahl, who first proposed it in 1967.
Amdahl's Law
 The formula for Amdahl's law is:
S = 1 / ((1 – P) + (P / N)) where:
 S is the speedup of the system,
 P is the proportion of the execution time that can be parallelized (improved), and
 N is the number of processors applied to that portion.
 For example, suppose only 20% of a program's execution time can be parallelized (P = 0.2), so the remaining 80% is a serial bottleneck, and the parallel portion runs on 5 processors (for instance, by adding 4 processors to a single-processor system). The speedup would be:
S = 1 / (1 – 0.2 + (0.2 / 5))
S = 1 / (0.8 + 0.04)
S = 1 / 0.84
S ≈ 1.19
 This means the overall performance improves by only about 19% even with 5 processors, because the 80% serial portion dominates (a small program evaluating the formula follows).
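The arithmetic above can be checked with a few lines of C; this is my sketch, and only the P = 0.2, N = 5 case comes from the slide's example.

    #include <stdio.h>

    /* Amdahl's law: S = 1 / ((1 - p) + p / n), where p is the fraction of
       execution time that can be parallelized and n is the processor count. */
    static double amdahl(double p, int n) {
        return 1.0 / ((1.0 - p) + p / (double)n);
    }

    int main(void) {
        printf("P = 0.2, N = 5       -> S = %.2f\n", amdahl(0.2, 5));       /* ~1.19 */
        printf("P = 0.8, N = 5       -> S = %.2f\n", amdahl(0.8, 5));       /* ~2.78 */
        printf("P = 0.8, N = 1000000 -> S = %.2f\n", amdahl(0.8, 1000000)); /* ~5.00 */
        return 0;
    }

Note how, with P = 0.8, even an enormous processor count cannot push the speedup past 1 / (1 – P) = 5, which is the ceiling stated earlier.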
APPLICATIONS OF PARALLEL COMPUTING
Scientific Research and Simulation:
 Weather Forecasting: Running complex models to predict weather patterns and climate changes.
 Astrophysics and Cosmology: Simulating celestial bodies, universe evolution, etc.
 Molecular Dynamics: Studying molecular interactions, protein folding, drug discovery, etc.
Big Data Analytics and Data Processing:
 Data Mining: Analyzing vast datasets to extract patterns, trends, and insights.
 Machine Learning and AI: Training deep neural networks, processing large datasets in real-time.
 Web Search Engines: Indexing and retrieving information from enormous web databases.
APPLICATIONS OF PARALLEL COMPUTING
High-Performance Computing (HPC):
 Financial Modeling: Performing risk analysis, option pricing, and portfolio optimization.
 Fluid Dynamics and Computational Chemistry: Simulating fluid flows, chemical reactions, etc.
 Finite Element Analysis: Solving complex engineering problems in aerospace, automotive industries, etc.
Parallel Databases and Search Algorithms:
 Parallel Database Systems: Handling concurrent queries and transactions in large-scale databases.
 Parallel Search Algorithms: Speeding up searches in large datasets, such as in cryptography and pattern
matching.
APPLICATIONS OF PARALLEL COMPUTING
Image and Signal Processing:
 Medical Imaging: Processing MRI, CT scans for diagnostics and treatment planning.
 Video Processing: Real-time video encoding, decoding, and analysis.
Distributed Systems and Networking:
 Distributed Computing: Handling distributed tasks efficiently in cloud computing environments.
 Network Routing and Traffic Analysis: Optimizing routing algorithms, analyzing network traffic.
Real-Time Systems and Simulation:
 Robotics and Automation: Controlling multiple robots simultaneously for complex tasks.
 Virtual Reality and Gaming: Rendering complex scenes and simulations in real-time.
SHARED-MEMORY MULTIPROCESSORS (UNIFORM MEMORY ACCESS [UMA])
 Shared-memory processors are popular due to their simple and general programming model, enabling
easy development of parallel software.
 Another term for shared-memory processors is Parallel Random Access Machine (PRAM).
 A shared address space is used for communication between processors, with all processors accessing a common memory space (a minimal shared-memory sketch follows).
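A minimal shared-address-space sketch in C with OpenMP (assumed tooling; the histogram example itself is invented): every thread reads the same data array and updates the same histogram through the common memory space, and the atomic directive arbitrates simultaneous updates to the same location, much like the memory-collision arbitration discussed under communication overhead.

    #include <stdio.h>
    #include <omp.h>

    #define BINS 8
    #define LEN  1000000

    static int  data[LEN];
    static long histogram[BINS];   /* lives in the shared address space */

    int main(void) {
        for (long i = 0; i < LEN; i++)
            data[i] = (int)(i % BINS);

        #pragma omp parallel for
        for (long i = 0; i < LEN; i++) {
            /* All threads see the same histogram; the atomic update
               serializes conflicting writes to the same element. */
            #pragma omp atomic
            histogram[data[i]]++;
        }

        for (int b = 0; b < BINS; b++)
            printf("bin %d: %ld\n", b, histogram[b]);   /* each LEN/BINS */
        return 0;
    }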