Detecting and enhancing loop
level parallelism
When the dependence distance is large,
loop unrolling exposes more potential
parallelism.
A longer distance may provide enough
independent work to keep the
processor busy.
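To make the idea of dependence distance concrete, here is a minimal Python sketch. The loop body `a[i] = a[i - 5] + b[i]`, the array names, and the two helper functions are illustrative assumptions, not taken from the slides. With a distance of 5, any five consecutive iterations read only values written before that group started, so the five statements of an unrolled body have no dependences among themselves.

```python
DISTANCE = 5  # assumed dependence distance for this illustration


def run_sequential(a, b):
    """Reference loop: iteration i reads the value written by iteration i - 5."""
    a = list(a)
    for i in range(DISTANCE, len(a)):
        a[i] = a[i - DISTANCE] + b[i]
    return a


def run_unrolled(a, b):
    """Same loop unrolled by 5: the statements in one group are independent."""
    a = list(a)
    for start in range(DISTANCE, len(a), DISTANCE):
        group = range(start, min(start + DISTANCE, len(a)))
        # Every read in this group targets an index below `start`, so all
        # right-hand sides can be evaluated before any result is stored.
        results = [a[i - DISTANCE] + b[i] for i in group]
        for i, value in zip(group, results):
            a[i] = value
    return a


a0 = list(range(12))
b0 = [10] * 12
assert run_sequential(a0, b0) == run_unrolled(a0, b0)
```

If the distance were 1 instead, every statement in the unrolled body would wait for the one before it, and unrolling alone would expose no extra parallelism.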
docsity.com
Back Substitution
Back substitution increases the
amount of parallelism, but it sometimes
also increases the amount of
computation required.
These techniques can be applied both:
- within a basic block; and
- within a loop
Eliminating dependent computations
Within a basic block:
Algebraic simplification of
expressions is used together with an
optimization called copy propagation,
which eliminates operations that copy
values.
Eliminating dependent computations
For example, copy propagation of
DADDUI R1,R2,#4
DADDUI R1,R1,#4
results in
DADDUI R1,R2,#8
Here, one computation is eliminated, which
removes the dependence between the two
instructions.
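The equivalence of the two forms can be checked with a small Python sketch. The `daddui` helper and the dictionary register file are hypothetical modelling choices for illustration, and the 64-bit wrap-around is a simplified model of the instruction rather than a full description of it.

```python
MASK = (1 << 64) - 1  # assumed 64-bit registers that wrap without trapping


def daddui(regs, dest, src, imm):
    """Model DADDUI dest,src,#imm as dest = (src + imm) mod 2**64."""
    regs[dest] = (regs[src] + imm) & MASK


def original(r2):
    regs = {"R1": 0, "R2": r2}
    daddui(regs, "R1", "R2", 4)  # R1 = R2 + 4
    daddui(regs, "R1", "R1", 4)  # R1 = R1 + 4, must wait for the line above
    return regs["R1"]


def simplified(r2):
    regs = {"R1": 0, "R2": r2}
    daddui(regs, "R1", "R2", 8)  # R1 = R2 + 8, no intermediate value needed
    return regs["R1"]


assert original(10) == simplified(10)
```

The simplified form has one instruction instead of two, and nothing in it waits on an earlier result.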
Eliminating dependent computations
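Optimization:
Tree-Height Reduction Technique
It is also possible to increase the
parallelism of the code, possibly at
the cost of increasing the number of
operations.
Such an optimization is called tree
height reduction, because it reduces the
height of the tree structure that
represents the computation.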
Eliminating dependent computations
For example, the code sequence
ADD R1,R2,R3
ADD R4,R1,R6
ADD R8,R4,R7
requires at least three cycles for execution,
because each instruction depends on its
immediate predecessor and so the
instructions cannot be issued in parallel.
Eliminating dependent computations
Taking advantage of associativity,
the code can be transformed into the
form shown below:
ADD R1,R2,R3
ADD R4,R6,R7
ADD R8,R1,R4
This sequence can be computed in two
execution cycles by issuing the first two
instructions in parallel.
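The cycle counts can be reproduced with a rough Python sketch. It assumes a serial chain in which the third instruction reads R4 (R8 = ((R2 + R3) + R6) + R7), one-cycle ADD latency, and enough issue width; it tracks only true (read-after-write) dependences. The tuple encoding and the function names are hypothetical.

```python
def chain_height(instructions):
    """Return the length of the longest chain of dependent instructions.

    Each instruction is (dest, src1, src2). A register that is never written
    in the sequence is treated as ready at height 0.
    """
    ready = {}  # register -> height at which its value becomes available
    height = 0
    for dest, src1, src2 in instructions:
        level = 1 + max(ready.get(src1, 0), ready.get(src2, 0))
        ready[dest] = level
        height = max(height, level)
    return height


def run(instructions, regs):
    """Execute the ADDs on a copy of regs and return the final registers."""
    regs = dict(regs)
    for dest, src1, src2 in instructions:
        regs[dest] = regs[src1] + regs[src2]
    return regs


serial = [("R1", "R2", "R3"), ("R4", "R1", "R6"), ("R8", "R4", "R7")]
balanced = [("R1", "R2", "R3"), ("R4", "R6", "R7"), ("R8", "R1", "R4")]

start = {"R2": 1, "R3": 2, "R6": 3, "R7": 4}
assert chain_height(serial) == 3
assert chain_height(balanced) == 2
assert run(serial, start)["R8"] == run(balanced, start)["R8"]
```

Both sequences leave the same value in R8, but R4 ends up holding a different intermediate value, so the rewrite is only safe when the original R4 is not needed afterwards.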
Eliminating dependent computations
Recurrences are expressions
whose value in one iteration is
given by a function that depends
on the previous iterations.
A common type of recurrence occurs
in:
sum = sum + x;
Eliminating dependent
computations
Assume a loop containing this
recurrence is unrolled five times.
Let the values of x in these five
iterations be x1, x2, x3,
x4 and x5.
Then the value of sum at the end of
each unrolled iteration can be written as:
Eliminating dependent
computations
sum = sum + x1 + x2 + x3 + x4 + x5;
Unoptimized, this expression
requires five dependent
operations.
It can be rewritten as:
Eliminating dependent
computations
sum = ((sum + x1) + (x2 + x3)) + (x4 + x5);
This can be evaluated in only three
dependent operations.
Recurrences also arise from implicit
calculations, such as those associated
with array indexing.
With unrolling, the dependent
computations can be minimised.
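The counts of five and three dependent operations can be verified with a short Python sketch that represents each form of the sum as a nested tuple and measures the height of the resulting expression tree. The tuple encoding and the `depth` and `evaluate` helpers are assumptions made for illustration.

```python
def depth(expr):
    """Height of an expression tree: a leaf is a name, a node is (left, right)."""
    if isinstance(expr, str):
        return 0
    left, right = expr
    return 1 + max(depth(left), depth(right))


def evaluate(expr, env):
    """Add up the leaves of the tree using the values in env."""
    if isinstance(expr, str):
        return env[expr]
    left, right = expr
    return evaluate(left, env) + evaluate(right, env)


# ((((sum + x1) + x2) + x3) + x4) + x5
serial = ((((("sum", "x1"), "x2"), "x3"), "x4"), "x5")
# ((sum + x1) + (x2 + x3)) + (x4 + x5)
balanced = ((("sum", "x1"), ("x2", "x3")), ("x4", "x5"))

env = {"sum": 100, "x1": 1, "x2": 2, "x3": 3, "x4": 4, "x5": 5}
assert depth(serial) == 5
assert depth(balanced) == 3
assert evaluate(serial, env) == evaluate(balanced, env)
```

The example uses integers, for which addition is associative. Floating-point addition is not exactly associative, so the same regrouping may change the rounded result.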