Unit-3: Pipeline Processing
Pipeline Processing
Basic concepts of Pipeline Processing
Instruction pipeline
Arithmetic pipeline
Handling Data, Control and Structural hazards
Compiler techniques for improving performance
Let's say that there are four loads of dirty laundry that need to be washed, dried, and folded.
We could put the first load in the washer for 30 minutes, dry it for 40 minutes, and then
take 20 minutes to fold the clothes. Then pick up the second load and wash, dry, and fold,
and repeat for the third and fourth loads. Supposing we started at 6 PM and worked as
efficiently as possible, we would still be doing laundry until midnight.
However, a smarter approach to the problem would be to put the second load of dirty
laundry into the washer after the first was already clean and whirling happily in the dryer.
Then, while the first load was being folded, the second load would dry, and a third load
could be added to the pipeline of laundry. Using this method, the laundry would be finished
by 9:30.
Instruction Execution
The processing of an instruction is divided into 5 subtasks:
1. Instruction fetch (IF).
2. Instruction decode (ID).
3. Operand fetch (OF).
4. Instruction Execution (IE).
5. Output store (OS).
COA_Unit-3_slides_Pipeline Processing .pdf
• Instructions are executed one by one, in a non-parallel fashion.
• A single H/W component can take only one task at a time from its input and produce the result at the output.
Drawback
• Only one input can be processed at a time.
• Partial or intermediate output is not possible.
Pipelining
• Pipelining is a technique where multiple instructions are overlapped
during execution.
• Pipeline is divided into stages and these stages are connected with one
another to form a pipe like structure.
• Improves the throughput of the system, i.e., the number of instructions completed per unit time.
Pipelining
Execution in Pipelined Architecture
• Parallel execution of instructions takes place.
• At a particular time slot, the instructions in flight are all in different phases.
• Instead of a single H/W component, the H/W design is split into small components.
• The segments are connected with each other through interface registers, and they can execute multiple tasks independently and in parallel.
4-Stage instruction pipeline
•The processing of each instruction is divided into 4 segments.
•FI → the segment that fetches the instruction.
•DA → the segment that decodes the instruction and calculates the effective address.
•FO → the segment that fetches the operand.
•EX → the segment that executes the instruction.
Instruction Cycle:
• Fetch Instruction
• Decode Instruction (identify opcode and operands)
• Execute Instruction
• Write Back Result (in register/memory)
Registers Involved In Each Instruction Cycle:
• Memory Address Register (MAR): It is connected to the address lines of the system bus. It specifies the address in memory for a read or write operation.
• Memory Buffer Register (MBR): It is connected to the data lines of the system bus. It contains the value to be stored in memory or the last value read from memory.
• Program Counter (PC): Holds the address of the next instruction to be fetched.
• Instruction Register (IR): Holds the last instruction fetched.
Stages of Pipelining
• Instructions of the program execute in parallel. When one instruction moves from the nth stage to the (n+1)th stage, another instruction moves from the (n-1)th stage to the nth stage.
Pipelining
• Advantages
• Pipelining improves the throughput of the system.
• In every clock cycle, a new instruction finishes its
execution.
• Allow multiple instructions to be executed concurrently.
• Disadvantages
• The design of a pipelined processor is complex and costly to manufacture.
• The latency of an individual instruction increases.
Types of Pipelining
•Instruction Pipelining
•Arithmetic Pipelining
Instruction Pipelining
• Instruction pipelining is a technique in computer architecture where multiple
instruction stages (fetch, decode, execute, etc.) are overlapped to improve
CPU performance.
• It allows the next instruction to begin before the previous one completes,
increasing throughput and reducing execution time.
1. Fetch instruction from memory.
2. Decode the instruction.
3. Calculate effective address.
4. Fetch operand from memory.
5. Execute instruction.
6. Store the result in memory.
Problems with the instruction pipeline (Pipeline Hazards)
Pipeline hazards are issues that disrupt the smooth execution of instructions in a
pipeline. The main types are:
1. Resource Hazard (Resource Conflict): Occurs when two instructions need the same resource (like memory or registers) simultaneously, causing a conflict.
2. Data Hazard (Data Dependency): Arises when an instruction depends on the result of a previous instruction that hasn't completed, leading to delays.
3. Branch Hazard (Branch Difficulties): Happens when the pipeline cannot predict the outcome of a branch (like a conditional jump) early enough, leading to incorrect instruction execution.
Arithmetic Pipeline
• Arithmetic pipelining is a technique used in processors to break down
complex arithmetic operations (like multiplication or division) into smaller
stages, allowing multiple operations to be processed concurrently.
• This improves performance by executing parts of several instructions in
parallel.
Arithmetic Pipeline
• The combined operation of floating-point addition and subtraction
is divided into four segments. Each segment contains the
corresponding suboperation.
• The suboperations in the four segments are:
• Compare the exponents by subtraction.
• Align the mantissas.
• Add or subtract the mantissas.
• Normalize the result.
Floating-point addition (shown with a decimal example):
X = A × 10^a = 0.9504 × 10^3
Y = B × 10^b = 0.8200 × 10^2
1. Compare exponents by subtraction:
• The exponents are compared by
subtracting them to determine their
difference. The larger exponent is
chosen as the exponent of the result.
• The difference of the exponents,
i.e., 3 - 2 = 1 determines how many
times the mantissa associated with the
smaller exponent must be shifted to the
right.
2. Align the mantissas:
• The mantissa associated with the smaller exponent is shifted right according to the exponent difference determined in segment one.
X = 0.9504 × 10^3
Y = 0.08200 × 10^3
3. Add the mantissas:
The two mantissas are added in segment three.
Z = X + Y = 1.0324 × 10^3
4. Normalize the result:
After normalization, the result is written as:
Z = 0.10324 × 10^4
Parameters that Determine the Performance of a Pipeline
• Speed-Up Ratio (S_k)
• Latency (L_k)
• Efficiency (E_k)
• Throughput (H_k)
Parameters that Determine the Performance of a Pipeline
Parameters that Determine the Performance of a Pipeline: Speed-up (S)
Latency measures the time to complete a single instruction, and while it doesn’t
decrease with pipelining, pipelining allows more instructions to be completed
concurrently.
Throughput is significantly improved with pipelining because multiple
instructions are processed in parallel, leading to the completion of one
instruction per clock cycle after the pipeline is filled.
Speedup quantifies the performance improvement achieved by pipelining, often
close to the number of pipeline stages, provided hazards are minimized.
Efficiency is a measure of how effectively the pipeline stages are utilized. It is
affected by hazards, stalls, and the balance of work between stages.
Example: Space-Time Diagram
• Number of segments = 4
• Number of tasks = 6
• Speed-up ratio = ?
Branch instruction hazard
•In a pipeline, a branch instruction hazard occurs when the
pipeline cannot determine the next instruction to fetch because
it depends on the outcome of a branch instruction. This creates
a delay (or "stall") in the pipeline until the branch's outcome is
known. Let’s break down this scenario with the given
information:
•Number of time slots: 12
•Total instructions: 7
•Branch instruction: 3rd instruction
•Number of stalls due to the branch: 2
•Space-Time Diagram with Stalls for the Branch Instruction
•Here’s a pipeline diagram showing how these stalls impact the
execution. Each cell represents the stage that an instruction is in
during a specific time slot.
• Explanation of Diagram
• IF (Instruction Fetch), ID (Instruction Decode), EX (Execute), MEM (Memory
Access), and WB (Writeback) represent the stages each instruction goes
through.
• The 3rd instruction is a branch, which, after the EX stage, introduces 2 stalls in
time slots 6 and 7.
• These stalls occur because the pipeline must wait for the branch outcome.
• Once the branch direction is resolved, instruction 4 resumes at time slot 8.
• Impact of Stalls: The branch hazard delays the pipeline by 2 time slots, causing
subsequent instructions to start later than they would in an uninterrupted
pipeline.
• Calculating the Total Time with Stalls
• In this case:
• Without stalls, the pipeline could ideally complete all 7 instructions in 10 time
slots.
• With the 2 stalls caused by the branch, the total execution time increases to 12
time slots.
Pipeline
Hazard
• Any condition that causes a 'stall' in the pipeline operation is called a hazard.
• Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle.
• Hazard Occurs:
A <- 3+A
B <- 4*A
• No Hazard
A <- 5*C
B <- 20+C
Pipeline: It is a technique of decomposing a sequential process into a number of subprocesses, with each subprocess executed in a special dedicated segment that operates concurrently with all other segments.
Pipeline Hazards
a) Data Hazard
b) Control or Instruction Hazard
c) Structural Hazard
Pipeline Hazard
a) Data Hazards
•An instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction.
•In other words, an instruction attempts to use a result before it is ready.
There are three types of data hazard:
1) RAW (Read after Write) [Flow/True data dependency]
2) WAR (Write after Read) [Anti-Data dependency]
3) WAW (Write after Write) [Output data dependency]
Pipeline Hazard
• Let there be two instructions I and J, such that J follows I. Then,
• RAW hazard occurs when instruction J tries to read data before instruction I writes
it.
• Eg:
• I: R2 <- R1 + R3
• J: R4 <- R2 + R3
• WAR hazard occurs when instruction J tries to write data before instruction I reads
it.
• Eg:
• I: R2 <- R1 + R3
• J: R3 <- R4 + R5
Pipeline Hazard
•WAW hazard occurs when instruction J tries to write output before
instruction I writes it.
Eg:
I: R2 <- R1 + R3
J: R2 <- R4 + R5
•WAR and WAW hazards occur during the out-of-order execution of
the instructions.
#Observations
• All the instructions after the ADD use the result of the ADD instruction (in R1). The ADD instruction writes the value of R1 in the WB stage (shown black), and the SUB instruction reads the value during its ID stage (ID_sub). This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it.
• The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of cycle 5 (shown black). Thus, the AND instruction, which reads the registers during cycle 4 (ID_and), will receive the wrong result.
• The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register-file reads in the second half of the cycle and writes in the first half. Because both the WB of ADD and the ID of OR (ID_or) occur in cycle 5, the write to the register file by ADD is performed in the first half of the cycle and the read of the registers by OR in the second half.
• The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by ADD.
Control Hazard
•A control hazard (or branch hazard) occurs in pipelined processors
when the pipeline cannot determine which instruction to fetch next due
to a conditional branch instruction. This uncertainty causes a delay
until the branch outcome (whether to take the branch or not) is known.
Control hazard
Memory
Location
Instructions
12: If R1 = R3 Jump to label 36
16: AND R2, R3, R5
20: MUL R6, R1, R7
24: ADD R8, R1, R9
Label 36 DIV R10,R1, R11
Control Hazard
For instruction 12:
• Whether it will jump to address location 36 or not becomes known only at the 'MEM' phase, i.e., at CC4.
• This means that by the time instruction 12 reaches its 4th clock cycle, the next three instructions at memory locations 16, 20 and 24 have already entered the pipe and are performing operations in their respective phases.
• In the normal pipeline flow, these instructions (at addresses 16, 20 and 24) have already entered the pipe.
• Now, if the condition (R1 = R3) turns out to be true, everything that happened with the subsequent instructions becomes wrong, because at this point it is clear that the instruction at location 36 should be the next one executed, instead of 16, 20 and 24.
• So the wrongly entered/processed instructions must be flushed out.
• Therefore, a stall of 3 CC occurs: the instructions at locations 16, 20 and 24 must not execute and are flushed from the pipe.
Control hazard
•Suppose the branch instruction decides to go to location 36 for the next instruction in the MEM stage, i.e., in CC4.
•The three subsequent instructions that follow the branch are fetched and begin executing as in the normal scenario, before the branch jumps to location 36.
•Generally, the pipeline is not stopped, so the instructions after the branch also get executed; but if the branch condition is true, the unwanted instructions are simply flushed out of the pipe.
•In this case the branch penalty is 3 cycles.
Control Hazard
So,
• The instruction fetch unit of the CPU is responsible for providing a stream of instructions to the execution unit.
• The instructions fetched by the fetch unit come from consecutive memory locations until a special condition or branch occurs.
• The problem arises when one of the instructions is a branch instruction and execution needs to go to some different memory location.
• In this case all the unwanted instructions fetched into the pipeline from consecutive memory locations are now invalid and need to be removed, i.e., flushed out of the pipe.
Memory Location | Instruction
12 | If R1 = R3, jump to label 36
16 | AND R2, R3, R5   <- flushed out
20 | MUL R6, R1, R7   <- flushed out
24 | ADD R8, R1, R9   <- flushed out
36 | DIV R10, R1, R11
Control Hazard
• This causes a stall in the pipeline until the new, corrected instructions are fetched from memory.
• The time lost as a result of this is called the branch penalty.
• To reduce the resulting delay, dedicated hardware is incorporated in the fetch/decode unit to identify a possible branch instruction in advance.
• This increases the cost.
Structural Hazard
• Occurs when multiple instructions need the same resource.
• In a computer organization, common resources are shared by multiple instructions during their execution.
• These resources include memory, different kinds of registers, the ALU, the common bus, etc.
• We have a limited number of resources and a large number of instructions, so conflicts can easily occur.
• This disturbs the normal pipeline flow and is called a structural hazard.
Structural Hazard
• For I1, focus on CC4: in the example below, I1 accesses 'MEM' in CC4 to load/store data in memory.
• For I4, focus on CC4: in the same CC4, I4 is fetching its instruction from memory, so the two instructions I1 and I4 use the same resource at the same time.
        CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
I1      IF   ID   EX   MEM  WB
I2           IF   ID   EX   MEM  WB
I3                IF   ID   EX   MEM  WB
I4                     IF   ID   EX   MEM  WB
Structural Hazard
        CC1  CC2  CC3  CC4    CC5  CC6  CC7  CC8  CC9
I1      IF   ID   EX   MEM    WB
I2           IF   ID   EX     MEM  WB
I3                IF   ID     EX   MEM  WB
I4                     STALL  IF   ID   EX   MEM  WB
• For I4: if we make a stall at CC4 and start I4's fetch in CC5, a similar problem still exists, because in CC5, I2 is using the memory along with I4.
• The same kind of problem occurs again in CC6, where I3 is in its MEM stage.
Then what will be the solution for this?
Structural Hazard solved by Stalling
With two stalls, I4's fetch in CC6 still collides with I3's MEM access:
        CC1  CC2  CC3  CC4    CC5    CC6  CC7  CC8  CC9  CC10
I1      IF   ID   EX   MEM    WB
I2           IF   ID   EX     MEM    WB
I3                IF   ID     EX     MEM  WB
I4                     STALL  STALL  IF   ID   EX   MEM  WB
With three stalls, the conflict is finally resolved:
        CC1  CC2  CC3  CC4    CC5    CC6    CC7  CC8  CC9  CC10  CC11
I1      IF   ID   EX   MEM    WB
I2           IF   ID   EX     MEM    WB
I3                IF   ID     EX     MEM    WB
I4                     STALL  STALL  STALL  IF   ID   EX   MEM   WB
Structural Hazard solution by hardware technique
• One simple solution is to provide separate memories for instructions and data.
• In the Von Neumann architecture, the same memory stores both data and instructions, which is a big drawback of that architecture.
• If we use the Harvard architecture, instructions and operand data are stored in different memories.
VON NEUMANN: one memory holding instructions and data interleaved
(I1, I2, DATA1, I3, DATA2, …, I10).
HARVARD ARCHITECTURE: separate instruction memory (I1, I2, I3, I4, …, I10)
and operand-data memory (D1, D2, D3, D4, D5, …, D10).
Structural Hazard
        CC1  CC2  CC3  CC4    CC5  CC6  CC7  CC8  CC9
I1      IF   ID   EX   MEM    WB
I2           IF   ID   EX     MEM  WB
I3                IF   ID     EX   MEM  WB
I4                     STALL  IF   ID   EX   MEM  WB
• Again, at CC6, I2 is writing its result to the register file while I4 is in its decode stage. There is a chance that I2 and I4 use the same register in these stages, so a better register architecture is required.
• This means we need to keep multiple register files for specific purposes.
• Similarly, the common bus is accessed by multiple instructions in their different stages, so there is a great chance of conflict.
• Therefore, a better bus organization is also required.
Methods of Optimizing Against Hazards – Compiler/Software Level
• While stalling is a universal remedy in that it can be used to resolve any pipelining
hazard, the costs are high, and stalls impact a chip’s ability to perform efficiently.
However, there are other methods available to resolve hazards that help retain
efficiency.
• The first we will examine are all performed in the compiler; no additional hardware
need be added to implement them because the improvements are made to the
code itself, not to the machine running it.
• Implemented correctly, compiler-level optimizations, as opposed to hardware-level
optimizations, can provide a solution to hazards that does not require extra power
and can be performed on any hardware.
Resolving Structural Hazards using Compiler/Software Level
•One approach to structural hazards is to reorder operations so that two instructions needing the same resource are never close enough together for this sort of hazard to occur.
•Reordering operations to prevent this kind of hazard is often viable at compile time.
Resolving Data Hazards using
Compiler/Software Level
•Data hazards occur in a pipeline when an instruction depends on
the result of a previous instruction that hasn't completed yet.
•Resolving data hazards is crucial for maintaining pipeline efficiency,
and common techniques include stalling, data forwarding
(bypassing), and reordering instructions.
Cycle | ADD        | SUB
1     | Fetch      |
2     | Decode     | Fetch
3     | Execute    | Decode
4     | Memory     | Execute (receives R1 forwarded from ADD's Execute)
5     | Write Back | Memory
6     |            | Write Back
Resolving Control Hazards using
Compiler/Software Level
•There are several ways to handle control hazards, including
stalling (waiting for branch resolution), branch prediction,
and branch delay slots.
Resolving Control Hazards using software/compiler level solution
Managing control hazards at the compiler level involves distancing the logical operation on which the branch is based from the branch itself, and limiting the number of branches. Both can be accomplished by loop unrolling, which also increases performance by reducing the number of iterations (and hence branches) executed.
Loop unrolling essentially expands the body of a loop so that fewer branches are necessary.
Example:
for (i=1000; i>0; i=i-1)
x[i] = x[i] + s;
Unrolled loop (4 times):
for (i = 1000; i > 0; i = i - 4) {
x[i] = x[i] + s;
x[i - 1] = x[i - 1] + s;
x[i - 2] = x[i - 2] + s;
x[i - 3] = x[i - 3] + s;
}
Challenges in Loop Unrolling
• Increased Code Size and Cache Misses
• Trade-off of loop unrolling: a larger code footprint may increase cache misses
• Optimal Unrolling Factor
• Depends on factors such as cache size and register availability
Limitations of Compiler-Level Optimization
• Limitations at Compile Time
• Many runtime hazards, like data dependencies, can’t be fully predicted
• Examples of Runtime-Only Hazards
• Control flow changes, unpredictable loops
• Conclusion: Need for hardware-level solutions for complex hazards
Methods of Optimizing Against Hazards – Hardware-Level
Resolving Structural Hazards
• Resource Duplication
• Resource Pipelining
• Dynamic Scheduling
Resource Duplication
•Resource duplication offers a more direct approach to eliminating
structural hazards. By duplicating critical resources, such as providing
multiple ALUs or additional memory ports, the pipeline can handle
concurrent requests without conflicts. This eliminates the need for
stalls and ensures smoother instruction flow. However, resource
duplication increases the hardware complexity and cost of the
processor. The trade-off between performance gain and increased
cost must be carefully considered.
Resource Duplication
• One simple solution is to provide separate memories for instructions and data.
• In the Von Neumann architecture, the same memory stores both data and instructions, which is a big drawback of that architecture.
• If we use the Harvard architecture, instructions and operand data are stored in different memories.
VON NEUMANN: one memory holding instructions and data interleaved
(I1, I2, DATA1, I3, DATA2, …, I10).
HARVARD ARCHITECTURE: separate instruction memory (I1, I2, I3, I4, …, I10)
and operand-data memory (D1, D2, D3, D4, D5, …, D10).
Pipelining the Resource
•Pipelining the resource itself is another effective strategy to mitigate
structural hazards. Instead of duplicating the entire resource, it is
divided into smaller pipelined stages. This allows the resource to
handle multiple instructions concurrently, as different stages can
process different parts of the instructions. For example, a pipelined
ALU can have separate stages for operand fetching, arithmetic
operation, and result writing. This approach improves throughput
without requiring full resource duplication, but it may increase the
latency of the resource itself.
Dynamic Scheduling
•Dynamic scheduling employs sophisticated hardware mechanisms to
dynamically analyze and reorder instructions at runtime, aiming to
avoid structural hazards. Techniques like scoreboarding and
Tomasulo's algorithm track resource availability and instruction
dependencies, allowing the processor to schedule instructions out of
order to maximize resource utilization and minimize stalls. While
highly efficient, dynamic scheduling significantly increases the
complexity of the processor's control logic and can lead to higher
power consumption.
Resolving Data Hazards
• Register Renaming
• Operand forwarding (bypassing)
Register Renaming: Unlocking Parallelism
• Register renaming eliminates false dependencies by mapping logical to physical registers.
• This technique allows multiple instructions to execute concurrently without interference.
• It frees up resources and enhances the performance of the pipeline significantly.
• Register renaming is essential for modern processors aiming for higher instruction throughput.
• It’s a cornerstone strategy in overcoming data hazards.
• Example:
• Instruction 1: R1 = R2 + R3 (stores the result in R1)
• Instruction 2: R1 = R4 + R5 (also writes R1, overwriting the previous value)
• Without register renaming, the second instruction could overwrite R1 too soon, before consumers of the first instruction's result have read it, leading to incorrect results. With register renaming:
• Instruction 1: R6 = R2 + R3 (R1 renamed to R6)
• Instruction 2: R7 = R4 + R5 (R1 renamed to R7)
• This ensures that the two instructions write different physical registers (R6 and R7), allowing parallel execution without any false dependency.
Operand forwarding
• Operand forwarding (also called bypassing) allows the dynamic transfer of data between pipeline stages.
This technique minimizes delays by forwarding results directly to dependent instructions.
It enhances throughput and reduces latency in executing instructions smoothly.
By doing so, it optimizes resource utilization within the pipeline.
Operand forwarding is a key strategy to combat data hazards.
Resolving Control Hazards
• Branch Prediction
Branch Prediction
• Guessing whether a branch (like an if-statement) will go one way or another to keep the pipeline moving
without delays.
• There are two types 1) Static and 2) Dynamic
Static Prediction:
Uses simple rules, like always assuming branches are "taken" or "not taken".
• Pros: Reduces pauses and keeps the system working smoothly.
• Cons: Wrong guesses lead to wasted work and lost time.
Dynamic Branch prediction
• A smarter type of branch prediction that looks at what happened in the past to make better guesses about
branches.
• Common techniques include 1-bit and 2-bit prediction tables, and more complex methods like Pattern
History Tables (PHT) or Branch History Tables (BHT).
• Pros: More accurate than basic guessing, which means fewer stalls.
• Cons: Needs extra hardware, and wrong guesses can still waste time.
Consider the following 4-stage instruction pipeline, where different instructions take different amounts of time in different stages. How many clock cycles are required to complete these four instructions?

     IF  ID  EX  WB
I1    2   1   2   2
I2    1   3   3   1
I3    2   2   2   2
I4    1   2   1   2

(Space-time diagram: fill in slots t1 … t16 for I1-I4.)
Q. Consider the following program segment, executed in a 4-stage pipeline: Fetch (F), Decode (D), Execute (E), Write Back (W).

ADD R0, R1, R2
MUL R3, R4, R6
SUB R7, R8, R9
DIV R10, R11, R12
STORE X, R13

Fetch, Decode and Write Back take 1 CC each, while Execute takes 3 cycles for MUL and DIV and 1 cycle for the remaining instructions. What is the speed-up?
Per-stage cycle counts for the program segment above:

     F   D   E   W
I1   1   1   1   1
I2   1   1   3   1
I3   1   1   1   1
I4   1   1   3   1
I5   1   1   1   1

Speed-up = ?
Q) A CPU has a 5-stage pipeline and operates at a frequency of 1 GHz. Instruction fetch happens in the first stage. A conditional branch instruction computes the target address and evaluates the condition in the 3rd stage. The CPU stalls and does not fetch new instructions after a conditional branch until the branch outcome is known. Given that a program consists of 1 billion instructions, of which 20% are conditional branch instructions, and each instruction takes 1 clock cycle on average, calculate the total time required to complete the program.
• Consider two pipeline implementations that have the same instruction structure and support overlapping of all instructions except memory-related operations. If memory operations cannot be executed simultaneously, one stall cycle results. In the program, 20% of the instructions involve memory-related operations. Pipeline 1 uses a 1-port memory, while Pipeline 2 uses a 2-port memory. If the speed-up factors of the respective pipelines are S1 and S2, what is the value of S2/S1?

More Related Content

PPT
Pipeline hazard
PDF
Computer SAarchitecture Lecture 6_Pip.pdf
PPT
Computer architecture pipelining
PPTX
pipeline in computer architecture design
PPT
Instruction pipelining
PPT
pipelining
PPT
Pipelining in computer architecture
PPTX
pipelining.pptx
Pipeline hazard
Computer SAarchitecture Lecture 6_Pip.pdf
Computer architecture pipelining
pipeline in computer architecture design
Instruction pipelining
pipelining
Pipelining in computer architecture
pipelining.pptx

Similar to COA_Unit-3_slides_Pipeline Processing .pdf (20)

PPSX
Concept of Pipelining
PPT
PPT
Pipelining slides
PPT
Pipelining
PPT
Pipelining
PPTX
Core pipelining
PPTX
3 Pipelining
PDF
Pipelining
PDF
Pipeline Organization Overview and Performance.pdf
PPTX
Pipelining
PPTX
Instruction pipeline: Computer Architecture
PPTX
Assembly p1
PPTX
Pipeline processing - Computer Architecture
PPT
Chapter6 pipelining
PDF
Module 2 of apj Abdul kablam university hpc.pdf
PPTX
COA Unit-5.pptx
PDF
instruction pipeline in computer architecture and organization.pdf
PPTX
Pipelining , structural hazards
PPTX
Computer organisation and architecture .
PPTX
Pipeline & Nonpipeline Processor
Concept of Pipelining
Pipelining slides
Pipelining
Pipelining
Core pipelining
3 Pipelining
Pipelining
Pipeline Organization Overview and Performance.pdf
Pipelining
Instruction pipeline: Computer Architecture
Assembly p1
COA_Unit-3_slides_Pipeline Processing .pdf

  • 2. Pipeline Processing Basic concepts of Pipeline Processing Instruction pipeline Arithmetic pipeline Handling Data, Control and Structural hazards Compiler techniques for improving performance
  • 3. Let's say that there are four loads of dirty laundry that need to be washed, dried, and folded. We could put the first load in the washer for 30 minutes, dry it for 40 minutes, and then take 20 minutes to fold the clothes. Then pick up the second load and wash, dry, and fold, and repeat for the third and fourth loads. Supposing we started at 6 PM and worked as efficiently as possible, we would still be doing laundry until midnight.
  • 4. However, a smarter approach to the problem would be to put the second load of dirty laundry into the washer after the first was already clean and whirling happily in the dryer. Then, while the first load was being folded, the second load would dry, and a third load could be added to the pipeline of laundry. Using this method, the laundry would be finished by 9:30.
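The laundry timings above can be checked with a short simulation sketch (illustrative Python, not part of the original slides): each load enters a stage as soon as it has finished the previous stage and the stage is free.

```python
def schedule(num_loads, stage_minutes):
    """Return per-load, per-stage finish times for a pipeline with unequal stage times.

    A load enters a stage only when (a) it has left the previous stage and
    (b) the previous load has left this stage (one load per stage at a time).
    """
    finish = [[0] * len(stage_minutes) for _ in range(num_loads)]
    for i in range(num_loads):
        for s, dur in enumerate(stage_minutes):
            prev_stage_done = finish[i][s - 1] if s > 0 else 0
            stage_free = finish[i - 1][s] if i > 0 else 0
            finish[i][s] = max(prev_stage_done, stage_free) + dur
    return finish

stages = [30, 40, 20]          # wash, dry, fold (minutes)
sequential = 4 * sum(stages)   # one load at a time: 360 min (6 PM -> midnight)
pipelined = schedule(4, stages)[-1][-1]
print(sequential, pipelined)   # 360 210 -> pipelined laundry done by 9:30 PM
```

Note that the drying stage (40 min) is the bottleneck: the pipelined finish time is dominated by it, which is why pipelining improves throughput but not the time for any single load.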
  • 5. Instruction Execution An instruction in a process is divided into 5 subtasks: 1. Instruction fetch (IF). 2. Instruction decode (ID). 3. Operand fetch (OF). 4. Instruction Execution (IE). 5. Output store (OS).
  • 7. • Instructions are executed one by one, in a non-parallel fashion. • A single hardware component can take only one task at a time from its input and produce the result at the output. Drawback • Only one input can be processed at a time. • Partial or intermediate output is not possible
  • 8. Pipelining • Pipelining is a technique where multiple instructions are overlapped during execution. • The pipeline is divided into stages, and these stages are connected with one another to form a pipe-like structure. • Pipelining improves the overall instruction throughput of the system.
  • 9. Pipelining Execution in Pipelined Architecture • Parallel execution of instructions takes place. • At a particular time slot, all the instructions are in different phases. • Instead of a single hardware component, we can split the hardware design into small components. • The segments are connected with each other through interface registers, and they can execute multiple tasks independently, in parallel.
  • 10. 4-Stage instruction pipeline • The processing of each instruction is divided into 4 segments. • FI → the segment that fetches the instruction. • DA → the segment that decodes the instruction and calculates the effective address. • FO → the segment that fetches the operand. • EX → the segment that executes the instruction.
  • 11. Instruction Cycle: • Fetch Instruction • Decode Instruction(Identify opcode and operand) • Execute Instruction • Write Back Result(In Register/Memory) Registers Involved In Each Instruction Cycle: • Memory address registers(MAR) : It is connected to the address lines of the system bus. It specifies the address in memory for a read or write operation. • Memory Buffer Register(MBR) : It is connected to the data lines of the system bus. It contains the value to be stored in memory or the last value read from the memory. • Program Counter(PC) : Holds the address of the next instruction to be fetched. • Instruction Register(IR) : Holds the last instruction fetched.
  • 12. Stages of Pipelining • Instructions of the program execute in parallel. When one instruction moves from the nth stage to the (n+1)th stage, another instruction moves from the (n-1)th stage to the nth stage.
  • 13. Pipelining • Advantages • Pipelining improves the throughput of the system. • In every clock cycle, a new instruction finishes its execution. • Allows multiple instructions to be executed concurrently. • Disadvantages • The design of a pipelined processor is complex and costly to manufacture. • The latency of an individual instruction may increase.
  • 14. Types of Pipelining •Instruction Pipelining •Arithmetic Pipelining
  • 15. Instruction Pipelining • Instruction pipelining is a technique in computer architecture where multiple instruction stages (fetch, decode, execute, etc.) are overlapped to improve CPU performance. • It allows the next instruction to begin before the previous one completes, increasing throughput and reducing execution time. 1. Fetch instruction from memory. 2. Decode the instruction. 3. Calculate effective address. 4. Fetch operand from memory. 5. Execute instruction. 6. Store the result in memory.
  • 17. Problems with instruction pipeline (Pipeline Hazards). Pipeline hazards are issues that disrupt the smooth execution of instructions in a pipeline. The main types are: 1.Resource Hazard (Resource Conflict): Occurs when two instructions need the same resource (like memory or registers) simultaneously, causing a conflict. 2.Data Hazard (Data Dependency): Arises when an instruction depends on the result of a previous instruction that hasn’t completed, leading to delays. 3.Branch Hazard (Branch Difficulties): Happens when the pipeline cannot predict the outcome of a branch (like a conditional jump) early enough, leading to incorrect instruction execution.
  • 18. Arithmetic Pipeline • Arithmetic pipelining is a technique used in processors to break down complex arithmetic operations (like multiplication or division) into smaller stages, allowing multiple operations to be processed concurrently. • This improves performance by executing parts of several instructions in parallel.
  • 19. Arithmetic Pipeline • The combined operation of floating-point addition and subtraction is divided into four segments. Each segment contains the corresponding suboperation. • The suboperations in the four segments are: • Compare the exponents by subtraction. • Align the mantissas. • Add or subtract the mantissas. • Normalize the results
  • 20. Floating-point addition: X = A × 10^a = 0.9504 × 10^3, Y = B × 10^b = 0.8200 × 10^2. 1. Compare exponents by subtraction: • The exponents are compared by subtracting them to determine their difference. The larger exponent is chosen as the exponent of the result. • The difference of the exponents, i.e., 3 − 2 = 1, determines how many times the mantissa associated with the smaller exponent must be shifted to the right. 2. Align the mantissas: • The mantissa associated with the smaller exponent is shifted according to the difference of exponents determined in segment one. X = 0.9504 × 10^3, Y = 0.08200 × 10^3
  • 21. 3. Add mantissas: The two mantissas are added in segment three. Z = X + Y = 1.0324 × 10^3. 4. Normalize the result: After normalization, the result is written as: Z = 0.10324 × 10^4
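A minimal Python sketch of these four segments (not from the slides; mantissas are held as decimal integers to keep the arithmetic exact, and the 4-digit mantissa width is an assumption taken from the example values):

```python
def fp_add(x, y, digits=4):
    """Four-segment floating-point addition, one step per pipeline segment.

    Numbers are (mantissa, exponent) pairs meaning 0.mantissa * 10**exponent,
    with `mantissa` stored as a `digits`-digit integer: (9504, 3) is
    0.9504 * 10**3.  Returns (mantissa, exponent, digits) of the result.
    """
    (mx, ex), (my, ey) = x, y

    # Segment 1: compare the exponents by subtraction; keep the larger one.
    diff = ex - ey
    # Segment 2: align the mantissa of the smaller exponent by shifting right.
    if diff > 0:
        my, ey = my // 10**diff, ex
    elif diff < 0:
        mx, ex = mx // 10**(-diff), ey
    # Segment 3: add the mantissas.
    mz, ez = mx + my, ex
    # Segment 4: normalize -- if the mantissa overflowed its digit budget,
    # widen it by one digit and bump the exponent.
    while mz >= 10**digits:
        ez += 1
        digits += 1
    return mz, ez, digits

# X = 0.9504 * 10^3, Y = 0.8200 * 10^2
print(fp_add((9504, 3), (8200, 2)))   # (10324, 4, 5)  i.e. Z = 0.10324 * 10^4
```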
  • 22. Parameters that Determine the Performance of the pipeline process • Speed-Up Ratio (S_k) • Latency (L_k) • Efficiency (E_k) • Throughput (H_k)
  • 23. Parameters that Determine the Performance of the pipeline process
  • 25. Parameters that Determine the Performance of the pipeline process — Speed-up (S)
  • 29. Latency measures the time to complete a single instruction, and while it doesn’t decrease with pipelining, pipelining allows more instructions to be completed concurrently. Throughput is significantly improved with pipelining because multiple instructions are processed in parallel, leading to the completion of one instruction per clock cycle after the pipeline is filled. Speedup quantifies the performance improvement achieved by pipelining, often close to the number of pipeline stages, provided hazards are minimized. Efficiency is a measure of how effectively the pipeline stages are utilized. It is affected by hazards, stalls, and the balance of work between stages.
  • 30. Example: Space-Time Diagram. Number of segments = 4, number of tasks = 6. Speed-Up Ratio = ?
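The formula slides did not survive extraction, so here are the usual textbook relations for an ideal k-segment pipeline with clock period t_p processing n tasks: time = (k + n − 1)·t_p, speed-up S_k = n·k / (k + n − 1), efficiency E_k = S_k / k, throughput H_k = n / ((k + n − 1)·t_p). A quick check for the slide's example (k = 4, n = 6):

```python
def pipeline_metrics(k, n, tp=1.0):
    """Ideal k-segment pipeline metrics for n tasks with clock period tp."""
    cycles = k + n - 1                 # cycles until the last task drains
    speedup = (n * k) / cycles         # vs. n*k cycles without pipelining
    efficiency = speedup / k           # fraction of stage-slots kept busy
    throughput = n / (cycles * tp)     # tasks completed per unit time
    return cycles, speedup, efficiency, throughput

cycles, s, e, h = pipeline_metrics(k=4, n=6)
print(cycles, round(s, 3))   # 9 cycles, speed-up = 24/9, roughly 2.667
```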
  • 32. Branch instruction hazard •In a pipeline, a branch instruction hazard occurs when the pipeline cannot determine the next instruction to fetch because it depends on the outcome of a branch instruction. This creates a delay (or "stall") in the pipeline until the branch's outcome is known. Let’s break down this scenario with the given information: •Number of time slots: 12 •Total instructions: 7 •Branch instruction: 3rd instruction •Number of stalls due to the branch: 2
  • 33. •Space-Time Diagram with Stalls for the Branch Instruction •Here’s a pipeline diagram showing how these stalls impact the execution. Each cell represents the stage that an instruction is in during a specific time slot.
  • 34. • Explanation of Diagram • IF (Instruction Fetch), ID (Instruction Decode), EX (Execute), MEM (Memory Access), and WB (Writeback) represent the stages each instruction goes through. • The 3rd instruction is a branch, which, after the EX stage, introduces 2 stalls in time slots 6 and 7. • These stalls occur because the pipeline must wait for the branch outcome. • Once the branch direction is resolved, instruction 4 resumes at time slot 8. • Impact of Stalls: The branch hazard delays the pipeline by 2 time slots, causing subsequent instructions to start later than they would in an uninterrupted pipeline. • Calculating the Total Time with Stalls • In this case: • Without stalls, the pipeline could ideally complete all 7 instructions in 10 time slots. • With the 2 stalls caused by the branch, the total execution time increases to 12 time slots.
  • 35. Pipeline Hazard • Any condition that causes a ‘stall’ in the pipeline operation is called a hazard. • Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. • Hazard occurs: A <- 3+A, B <- 4*A (B depends on the new value of A) • No hazard: A <- 5*C, B <- 20+C (the two are independent) Pipeline: a technique of decomposing a sequential process into a number of subprocesses, with each subprocess executed in a special dedicated segment that operates concurrently with all other segments.
  • 36. Pipeline Hazards a) Data Hazard b) Control or Instruction Hazard c) Structure Hazard
  • 37. Pipeline Hazard a) Data Hazards • An instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction. • In other words, an instruction attempts to use a resource before it is ready. There are three types of data hazard: 1) RAW (Read after Write) [Flow/True data dependency] 2) WAR (Write after Read) [Anti-data dependency] 3) WAW (Write after Write) [Output data dependency]
  • 38. Pipeline Hazard • Let there be two instructions I and J, such that J follow I. Then, • RAW hazard occurs when instruction J tries to read data before instruction I writes it. • Eg: • I: R2 <- R1 + R3 • J: R4 <- R2 + R3 • WAR hazard occurs when instruction J tries to write data before instruction I reads it. • Eg: • I: R2 <- R1 + R3 • J: R3 <- R4 + R5
  • 39. Pipeline Hazard •WAW hazard occurs when instruction J tries to write output before instruction I writes it. Eg: I: R2 <- R1 + R3 J: R2 <- R4 + R5 •WAR and WAW hazards occur during the out-of-order execution of the instructions.
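The three dependences between instructions I and J above can be checked mechanically. A small illustrative sketch (Python; encoding each instruction as a (destination, source-set) pair is an assumption made for the example, not something from the slides):

```python
def classify_hazards(i_instr, j_instr):
    """Return the data-hazard types between instruction I and a later J.

    Each instruction is a (dest, set_of_sources) pair, e.g.
    R2 <- R1 + R3  is  ('R2', {'R1', 'R3'}).
    """
    i_dest, i_src = i_instr
    j_dest, j_src = j_instr
    hazards = set()
    if i_dest in j_src:      # J reads what I writes  -> RAW (true dependency)
        hazards.add('RAW')
    if j_dest in i_src:      # J writes what I reads  -> WAR (anti-dependency)
        hazards.add('WAR')
    if j_dest == i_dest:     # both write same target -> WAW (output dependency)
        hazards.add('WAW')
    return hazards

print(classify_hazards(('R2', {'R1', 'R3'}), ('R4', {'R2', 'R3'})))  # {'RAW'}
print(classify_hazards(('R2', {'R1', 'R3'}), ('R3', {'R4', 'R5'})))  # {'WAR'}
print(classify_hazards(('R2', {'R1', 'R3'}), ('R2', {'R4', 'R5'})))  # {'WAW'}
```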
  • 40. #Observations • All the instructions after the ADD use the result of the ADD instruction (in R1). The ADD instruction writes the value of R1 in the WB stage (shown black), and the SUB instruction reads the value during its ID stage (ID_sub). This problem is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it. • The AND instruction is also affected by this data hazard. The write of R1 does not complete until the end of cycle 5 (shown black). Thus, the AND instruction that reads the registers during cycle 4 (ID_and) will receive the wrong result. • The OR instruction can be made to operate without incurring a hazard by a simple implementation technique: perform register file reads in the second half of the cycle, and writes in the first half. Because both WB for ADD and ID_or for OR occur in cycle 5, the write to the register file by ADD happens in the first half of the cycle, and the read of the registers by OR happens in the second half. • The XOR instruction operates properly, because its register read occurs in cycle 6, after the register write by ADD.
  • 41. Control Hazard •A control hazard (or branch hazard) occurs in pipelined processors when the pipeline cannot determine which instruction to fetch next due to a conditional branch instruction. This uncertainty causes a delay until the branch outcome (whether to take the branch or not) is known.
  • 42. Control hazard

  Memory Location   Instruction
  12                If R1 = R3 Jump to label 36
  16                AND R2, R3, R5
  20                MUL R6, R1, R7
  24                ADD R8, R1, R9
  36 (Label 36)     DIV R10, R1, R11
  • 43. Control Hazard For instruction 12: • Whether it will jump to address 36 or not becomes known only at the ‘MEM’ phase, i.e., at CC4. • This means that while instruction 12 is in its 4th clock cycle, the next three instructions at memory locations 16, 20 and 24 have already entered the pipe and are performing operations in their respective phases. • Under the normal pipeline scheme, these instructions (at addresses 16, 20 and 24) have entered the pipe. • Now, if the condition (R1 = R3) turns out to be true, everything done for the subsequent instructions becomes wrong, because at this point it is clear that the instruction at location 36 should be executed next, instead of 16, 20 and 24. • So the wrongly entered/processed instructions must be flushed out. • Therefore, a stall of 3 CC occurs: the instructions at locations 16, 20 and 24 must not be allowed to execute, and they are flushed from the pipe.
  • 44. Control hazard • Suppose the branch instruction decides, in the MEM stage (i.e., in CC4), to go to location 36 for the next instruction to be executed. • The three subsequent instructions that follow the branch are fetched and begin executing as in the normal scenario, before the branch redirects to location 36. • Generally, the pipeline is not stopped, so the instructions after the branch also get executed; but if the branch condition is true, the unwanted instructions are simply flushed out of the pipe. • In this case the Branch Penalty is 3 cycles.
  • 45. Control Hazard So, • The instruction fetch unit of the CPU is responsible for providing a stream of instructions to the execution unit. • The instructions fetched by the fetch unit come from consecutive memory locations until some special condition or branch occurs. • The problem arises when one of the instructions is a branch instruction and execution must move to a different memory location. • In that case all the unwanted instructions fetched into the pipeline from consecutive memory locations are invalid and must be removed, i.e., flushed out of the pipe. Memory Location / Instructions: 12 If R1 = R3 Jump to label 36; 16 AND R2, R3, R5; 20 MUL R6, R1, R7; 24 ADD R8, R1, R9; 36 DIV R10, R1, R11 — Flush OUT
  • 46. Control Hazard • This causes STALL in the pipeline till new corrected instruction are fetched from the memory. • Thus the time lost as a result of this called as Branch Penalty. • For reducing the resulting delay, dedicated hardware is incorporated in the fetch/decode unit to identify branch instruction possibility of occurrence in advance. • It can increase the cost.
  • 47. Structural Hazard • Occurs when multiple instructions need the same resource. • In a computer organization, common resources are used by multiple instructions during their execution. • These resources include memory, registers of various kinds, the ALU, the common bus, etc. • We have a limited number of resources and a large number of instructions, so many conflicts can occur. • When such a conflict disturbs the normal pipeline flow, it is called a Structural Hazard.
  • 48. Structural Hazard • For I1: focus on CC4 — in this example, I1 accesses ‘MEM’ in CC4 to load/store data in memory. • For I4: in the same CC4, I4 is fetching its instruction from memory, so the two instructions I1 and I4 use the same resource at the same time.

        CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
  I1    IF   ID   EX   MEM  WB
  I2         IF   ID   EX   MEM  WB
  I3              IF   ID   EX   MEM  WB
  I4                   IF   ID   EX   MEM  WB
  • 49. Structural Hazard

        CC1  CC2  CC3  CC4    CC5  CC6  CC7  CC8  CC9
  I1    IF   ID   EX   MEM    WB
  I2         IF   ID   EX     MEM  WB
  I3              IF   ID     EX   MEM  WB
  I4                   STALL  IF   ID   EX   MEM  WB

  • For I4: if we stall at CC4 and start I4's fetch in CC5, a similar problem still exists, because in CC5, I2 is using the memory (MEM) along with I4's fetch. • The same kind of conflict can occur again in CC6 and CC7. Then what is the solution?
  • 50. Structural Hazard solved by Stalling

  Attempt (2 stalls — the fetch in CC6 still collides with I3's MEM):
        CC1  CC2  CC3  CC4    CC5    CC6  CC7  CC8  CC9
  I1    IF   ID   EX   MEM    WB
  I2         IF   ID   EX     MEM    WB
  I3              IF   ID     EX     MEM  WB
  I4                   STALL  STALL  IF   ID   EX   MEM

  Solution (3 stalls):
        CC1  CC2  CC3  CC4    CC5    CC6    CC7  CC8  CC9  CC10  CC11
  I1    IF   ID   EX   MEM    WB
  I2         IF   ID   EX     MEM    WB
  I3              IF   ID     EX     MEM    WB
  I4                   STALL  STALL  STALL  IF   ID   EX   MEM   WB
  • 51. Structural Hazard solution by hardware technique • One simple solution is to provide separate memories for instructions and data. • In the Von Neumann architecture, the same memory is used for storing both data and instructions, which is a big drawback here. • If we use the Harvard architecture, we can store instructions and operand data in separate memories.

  Von Neumann (single memory):      I1, I2, DATA1, I3, DATA2, …, I10
  Harvard (separate memories):      Instructions: I1, I2, I3, I4, …, I10
                                    Operand data: D1, D2, D3, D4, …, D10
  • 52. Structural Hazard

        CC1  CC2  CC3  CC4    CC5  CC6  CC7  CC8  CC9
  I1    IF   ID   EX   MEM    WB
  I2         IF   ID   EX     MEM  WB
  I3              IF   ID     EX   MEM  WB
  I4                   STALL  IF   ID   EX   MEM  WB

  • Again at CC6, I2 is writing its result to the register file while I4 is in its decode stage. There is a chance that both I2 and I4 use the same register in these stages, so a better register architecture is required. • This means we need multiple register files, each for a specific purpose. • Similarly, the common bus is accessed by multiple instructions in their different stages, so there is a great chance of conflict — a better bus organization is required.
  • 53. Methods of Optimizing Against Hazards – Compiler/Software Level • While stalling is a universal remedy in that it can be used to resolve any pipelining hazard, the costs are high, and stalls impact a chip’s ability to perform efficiently. However, there are other methods available to resolve hazards that help retain efficiency. • The first we will examine are all performed in the compiler; no additional hardware need be added to implement them because the improvements are made to the code itself, not to the machine running it. • Implemented correctly, compiler-level optimizations, as opposed to hardware-level optimizations, can provide a solution to hazards that does not require extra power and can be performed on any hardware.
  • 54. Resolving Structural Hazards using Compiler/Software Level • One approach to structural hazards is to reorder operations such that two instructions are never so close to one another that this sort of hazard occurs. • Reordering operations to prevent this kind of hazard is often viable at compile time.
  • 57. Resolving Data Hazards using Compiler/Software Level •Data hazards occur in a pipeline when an instruction depends on the result of a previous instruction that hasn't completed yet. •Resolving data hazards is crucial for maintaining pipeline efficiency, and common techniques include stalling, data forwarding (bypassing), and reordering instructions.
  • 60.
  CYCLE  ADD         SUB
  1      Fetch
  2      Decode      Fetch
  3      Execute     Decode
  4      Memory      Execute (receives R1 from ADD's execute stage)
  5      Write back  Memory
  6                  Write back
  • 64. Resolving Control Hazards using Compiler/Software Level •There are several ways to handle control hazards, including stalling (waiting for branch resolution), branch prediction, and branch delay slots.
  • 68. Resolving Control Hazards using software/compiler-level solutions. Managing control hazards at the compiler level means distancing the logical operation on which the branch is based from the branch itself, and limiting the number of branches. Both are accomplished by loop unrolling, which also improves performance by reducing the number of iterations. Loop unrolling essentially expands the body of a loop so that fewer branches are necessary. Example:

  for (i = 1000; i > 0; i = i - 1)
      x[i] = x[i] + s;

  Unrolled loop (4 times):

  for (i = 1000; i > 0; i = i - 4) {
      x[i] = x[i] + s;
      x[i - 1] = x[i - 1] + s;
      x[i - 2] = x[i - 2] + s;
      x[i - 3] = x[i - 3] + s;
  }
  • 69. Challenges in Loop Unrolling • Increased Code Size and Cache Misses • Trade-offs of loop unrolling: larger code footprint may increase cache misses • Optimal Unrolling Factor • Explanation of factors influencing optimal unrolling: cache size, register
  • 70. Limitations of Compiler-Level Optimization • Limitations at Compile Time • Many runtime hazards, like data dependencies, can’t be fully predicted • Examples of Runtime-Only Hazards • Control flow changes, unpredictable loops • Conclusion: Need for hardware-level solutions for complex hazards
  • 71. Methods of Optimizing Against Hazards – Hardware-Level Resolving Structural Hazards • Resource Duplication • Resource Pipelining • Dynamic Scheduling
  • 72. Resource Duplication •Resource duplication offers a more direct approach to eliminating structural hazards. By duplicating critical resources, such as providing multiple ALUs or additional memory ports, the pipeline can handle concurrent requests without conflicts. This eliminates the need for stalls and ensures smoother instruction flow. However, resource duplication increases the hardware complexity and cost of the processor. The trade-off between performance gain and increased cost must be carefully considered.
  • 73. Resource Duplication • One of the simple solution is to give separate memory for keeping the instructions and the data. • In Von Neumann Architecture, same memory is used for storing the data and the instructions, so it is a big drawback of Von Neumann Architecture. • If we use harvard Architecture, we can store instructions and operand data in different slots of memory locations. VON-NEUMANN I1 I2 DATA1 I3 DATA2 . . I10 Memory for Instruction Memory for operand Data I1 D1 I2 D2 I3 D3 I4 D4 . . D5 . . I10 D10 Harvard Architecture
  • 74. Pipelining the Resource •Pipelining the resource itself is another effective strategy to mitigate structural hazards. Instead of duplicating the entire resource, it is divided into smaller pipelined stages. This allows the resource to handle multiple instructions concurrently, as different stages can process different parts of the instructions. For example, a pipelined ALU can have separate stages for operand fetching, arithmetic operation, and result writing. This approach improves throughput without requiring full resource duplication, but it may increase the latency of the resource itself.
  • 75. Dynamic Scheduling •Dynamic scheduling employs sophisticated hardware mechanisms to dynamically analyze and reorder instructions at runtime, aiming to avoid structural hazards. Techniques like scoreboarding and Tomasulo's algorithm track resource availability and instruction dependencies, allowing the processor to schedule instructions out of order to maximize resource utilization and minimize stalls. While highly efficient, dynamic scheduling significantly increases the complexity of the processor's control logic and can lead to higher power consumption.
  • 76. Resolving Data Hazards • Register Renaming • Operand Forwarding (Bypassing)
  • 77. Register Renaming: Unlocking Parallelism • Register renaming eliminates false dependencies by mapping logical registers to physical registers. • This technique allows multiple instructions to execute concurrently without interference. • It frees up resources and significantly enhances pipeline performance. • Register renaming is essential for modern processors aiming for higher instruction throughput; it is a cornerstone strategy for overcoming data hazards. • Example: • Instruction 1: R1 = R2 + R3 (stores the result in R1) • Instruction 2: R1 = R4 + R5 (overwrites R1) • Without register renaming, the second instruction could overwrite R1 too soon, before the first instruction's result is consumed, leading to incorrect results. With register renaming: • Instruction 1: R6 = R2 + R3 (R1 renamed to R6) • Instruction 2: R7 = R4 + R5 (R1 renamed to R7) • Both instructions now use different physical registers (R6 and R7), allowing parallel execution without any conflict.
  • 78. Operand forwarding • Operand forwarding (bypassing) allows the dynamic transfer of data between pipeline stages. This technique minimizes delays by forwarding results directly to dependent instructions instead of waiting for them to be written back. It enhances throughput, reduces latency, and optimizes resource utilization within the pipeline. Operand forwarding is a key strategy to combat data hazards.
  • 79. Resolving Control Hazards • Branch Prediction
  • 80. Branch Prediction • Guessing whether a branch (like an if-statement) will go one way or the other, to keep the pipeline moving without delays. • There are two types: 1) Static and 2) Dynamic. Static prediction uses simple rules, like always assuming branches are "taken" or "not taken." • Pros: reduces pauses and keeps the system working smoothly. • Cons: wrong guesses lead to wasted work and lost time.
  • 81. Dynamic Branch prediction • A smarter type of branch prediction that looks at what happened in the past to make better guesses about branches. • Common techniques include 1-bit and 2-bit prediction tables, and more complex methods like Pattern History Tables (PHT) or Branch History Tables (BHT). • Pros: More accurate than basic guessing, which means fewer stalls. • Cons: Needs extra hardware, and wrong guesses can still waste time.
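As a sketch of the 2-bit counter mentioned above (illustrative Python; the 0–3 state encoding and the "weakly taken" initial state are assumptions, not from the slides):

```python
class TwoBitPredictor:
    """2-bit saturating-counter branch predictor for a single branch.

    States: 0 = strongly not taken, 1 = weakly not taken,
            2 = weakly taken,       3 = strongly taken.
    Two wrong guesses in a row are needed to flip the prediction,
    which is what makes this scheme robust on loop branches.
    """
    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2          # True = predict "taken"

    def update(self, taken):
        # Saturate at the ends of the 0..3 range.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch that is taken 9 times and then falls through once:
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
correct = sum(1 for taken in outcomes
              if (p.predict() == taken, p.update(taken))[0])
print(correct, len(outcomes))   # 9 10 -> only the loop exit is mispredicted
```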
  • 83. Consider the following 4-stage instruction pipeline where different instructions take different amounts of time in different stages. How many CC will be required to complete these four instructions in the given pipeline?

        IF  ID  EX  WB
  I1    2   1   2   2
  I2    1   3   3   1
  I3    2   2   2   2
  I4    1   2   1   2
  • 84.
        IF  ID  EX  WB
  I1    2   1   2   2
  I2    1   3   3   1
  I3    2   2   2   2
  I4    1   2   1   2

        t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15 t16
  I1
  I2
  I3
  I4
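The blank space–time grid above can be filled in mechanically with the usual rule: an instruction enters a stage only once it has left the previous stage and the earlier instruction has left this stage (no interstage buffering beyond the single interface register is assumed). A Python sketch:

```python
def pipeline_time(stage_times):
    """Finish time (in CC) of the last instruction in a pipeline where
    stage_times[i][s] is the number of cycles instruction i spends in stage s.

    Instruction i enters stage s only when it has left stage s-1 and
    instruction i-1 has left stage s (one instruction per stage at a time).
    """
    n, k = len(stage_times), len(stage_times[0])
    finish = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            start = max(finish[i][s - 1] if s else 0,
                        finish[i - 1][s] if i else 0)
            finish[i][s] = start + stage_times[i][s]
    return finish[-1][-1]

#            IF ID EX WB
times = [[2, 1, 2, 2],   # I1
         [1, 3, 3, 1],   # I2
         [2, 2, 2, 2],   # I3
         [1, 2, 1, 2]]   # I4
print(pipeline_time(times))   # 15 clock cycles under these assumptions
```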
  • 85. Fetch(F), Decode(D), Execute(E) Write(W). ADD R0, R1,R2 MUL R3, R4, R6 SUB R7,R8,R9 DIV R10, R11, R12 STORE X, R13 Fetch, Decode, Write Back takes 1 CC while Execution takes 3 Cycle for remaining instructions. What is the speed up? Q. Consider the following program segment which is executed in the 4- stage pipeline.
  • 86. Q. Consider the following program segment, executed in the 4-stage pipeline Fetch(F), Decode(D), Execute(E), Write Back(W): ADD R0, R1, R2; MUL R3, R4, R6; SUB R7, R8, R9; DIV R10, R11, R12; STORE X, R13. Fetch, Decode and Write Back take 1 CC each, while Execute takes 3 CC for MUL and DIV. What is the speed-up?

        F   D   E   W
  I1    1   1   1   1
  I2    1   1   3   1
  I3    1   1   1   1
  I4    1   1   3   1
  I5    1   1   1   1
  • 87.
        F   D   E   W
  I1    1   1   1   1
  I2    1   1   3   1
  I3    1   1   1   1
  I4    1   1   3   1
  I5    1   1   1   1

  I1 I2 I3 I4 I5 — Speed Up = ?
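Under the usual assumptions (non-pipelined time = sum of every stage of every instruction; pipelined time from the stage-occupancy rule that an instruction enters a stage only when it has left the previous stage and the stage is free), the speed-up can be checked with a short sketch:

```python
def pipeline_time(stage_times):
    """Finish time (CC) of the last instruction; one instruction per stage,
    entry into a stage requires both the stage and the instruction free."""
    n, k = len(stage_times), len(stage_times[0])
    finish = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            start = max(finish[i][s - 1] if s else 0,
                        finish[i - 1][s] if i else 0)
            finish[i][s] = start + stage_times[i][s]
    return finish[-1][-1]

#            F  D  E  W
times = [[1, 1, 1, 1],   # ADD
         [1, 1, 3, 1],   # MUL   (Execute takes 3 CC)
         [1, 1, 1, 1],   # SUB
         [1, 1, 3, 1],   # DIV   (Execute takes 3 CC)
         [1, 1, 1, 1]]   # STORE
non_pipelined = sum(sum(row) for row in times)    # 24 CC
pipelined = pipeline_time(times)                  # 12 CC
print(non_pipelined / pipelined)                  # speed-up = 2.0
```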
  • 88. Q) A CPU has a 5-stage pipeline and operates at a frequency of 1 GHz. The instruction fetch happens in the first stage. A conditional branch instruction computes the target address and evaluates the condition in the 3rd stage. The CPU stalls and does not fetch new instructions following a conditional branch instruction until the branch outcome is known. Given that a program consists of 1 billion instructions, where 20% of these instructions are conditional branch instructions, and each instruction takes 1 clock cycle on average, calculate the total time required for the completion of the program.
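One standard reading of this problem: fetch happens in stage 1 and the branch resolves at the end of stage 3, so fetching resumes after 2 stall cycles per conditional branch. A sketch of the arithmetic (the 2-stall figure follows from that fetch/resolve assumption):

```python
freq_hz = 1e9                # 1 GHz clock
instructions = 1e9           # 1 billion instructions
branch_fraction = 0.20       # 20% are conditional branches
stalls_per_branch = 2        # fetch (stage 1) waits for resolution (stage 3)

base_cycles = instructions * 1                      # 1 CPI on average
stall_cycles = instructions * branch_fraction * stalls_per_branch
total_seconds = (base_cycles + stall_cycles) / freq_hz
print(total_seconds)   # 1.4 -> the program takes 1.4 seconds
```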
  • 89. • Consider two pipeline implementations that have the same instruction structure and support overlapping of all instructions, except for memory-related operations. If memory operations cannot be executed simultaneously, each such conflict results in one stall cycle. In the program, 20% of the instructions involve memory-related operations. Pipeline 1 uses a 1-port memory, while Pipeline 2 uses a 2-port memory. If the speed-up factors for the respective pipelines are S1 and S2, what is the value of S2/S1?
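One plausible reading of this exercise (an assumption, since the slide leaves details open): with a 1-port memory, every memory-related instruction causes one structural-stall cycle (CPI = 1 + 0.2 × 1 = 1.2), while a 2-port memory removes those stalls (CPI = 1). Since both pipelines run the same program against the same non-pipelined baseline, the ratio of speed-ups is the inverse ratio of CPIs:

```python
mem_fraction = 0.20          # 20% of instructions touch memory

cpi_1port = 1 + mem_fraction * 1   # each memory op adds one stall cycle
cpi_2port = 1 + mem_fraction * 0   # dual-ported memory: no structural stall

# Speed-up over the same baseline is inversely proportional to CPI,
# so S2/S1 = cpi_1port / cpi_2port.
ratio = cpi_1port / cpi_2port
print(ratio)   # 1.2
```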