Design pipeline architecture for various stage pipelines
WELCOME TO OUR PRESENTATION
Slide 1
Group Members
 Mahmudul Hasan
sojib6@yahoo.com
Slide 2
Slide 3
• Single-cycle control: hardwired
– Low CPI (1)
– Long clock period (to accommodate slowest instruction)
• Multi-cycle control: micro-programmed
– Short clock period
– High CPI
Single-cycle:  insn0.(fetch,decode,exec)  insn1.(fetch,decode,exec)
Multi-cycle:   insn0.fetch  insn0.dec  insn0.exec  insn1.fetch  insn1.dec  insn1.exec
(time →)
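Either way, total execution time is (number of instructions) × CPI × (clock period): the single-cycle design wins on CPI but pays the slowest instruction's latency on every cycle, while the multi-cycle design shortens the clock at the cost of several cycles per instruction.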
Slide 3
• Start with multi-cycle design
• When insn0 goes from stage 1 to stage 2
… insn1 starts stage 1
• Each instruction passes through all stages
… but instructions enter and leave at faster rate
Multi-cycle:  insn0.fetch  insn0.dec  insn0.exec  insn1.fetch  insn1.dec  insn1.exec
(time →)
Pipelined:    insn0.fetch  insn0.dec    insn0.exec
                           insn1.fetch  insn1.dec    insn1.exec
                                        insn2.fetch  insn2.dec   insn2.exec
Can have as many insns in flight as there are stages.
Slide 4
• Pipelining is a series of stages, where some work is done at each stage in parallel.
• The stages are connected one to the next to form a pipe: instructions enter at one end, progress through the stages, and exit at the other end.
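To make the overlap concrete, here is a minimal Python sketch (an illustration, not part of the slides) that schedules a toy three-stage fetch/decode/exec pipe with no hazards and prints which instructions are in flight each cycle:

```python
# Each instruction i occupies stage s during cycle i + s, so at most
# len(stages) instructions are in flight, and one instruction completes
# per cycle once the pipe is full.
stages = ["fetch", "dec", "exec"]
n_insns = 3
for cycle in range(n_insns + len(stages) - 1):
    in_flight = [f"insn{i}.{stages[cycle - i]}"
                 for i in range(n_insns) if 0 <= cycle - i < len(stages)]
    print(f"cycle {cycle}: " + "  ".join(in_flight))
```

The output shows the pipe filling and then draining: a 3-stage pipe finishes 3 instructions in 3 + (3 − 1) = 5 cycles instead of 9.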
Slide 5
Pipeline categories
 Linear pipelines
A linear pipeline processor is a series of processing stages and memory accesses connected in a fixed, straight-through order.
 Non-linear pipelines
A non-linear pipeline (also called a dynamic pipeline) can be configured to perform different functions at different times. In a dynamic pipeline there are also feed-forward or feed-back connections. A non-linear pipeline also allows very long instruction words.
Slide 6
Processor Pipeline Review
Slide 7
Instruction Pipeline
• An instruction pipeline has six operations:
 Fetch instruction (FI)
 Decode instruction (DI)
 Calculate operands (CO)
 Fetch operands (FO)
 Execute instructions (EI)
 Write result (WR)
 Overlap these operations
Slide 8
Stage 1: Fetch Diagram
(Diagram: the PC indexes the Instruction Cache; the instruction bits and PC+1 are latched into the IF / ID pipeline register; a MUX selects between PC+1 and the branch target from Decode as the next PC, gated by enable signals.)
Slide 9
Stage 1: Instruction Fetch
• Fetch an instruction from memory every cycle
– Use PC to index memory
– Increment PC (assume no branches for now)
• Write state to the pipeline register (IF/ID)
– The next stage will read this pipeline register
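A minimal Python sketch of this stage (the dictionary fields and text-encoded instructions are illustrative assumptions, not the slides' actual hardware):

```python
# Sketch of Stage 1: use the PC to index instruction memory, then latch the
# instruction bits and the incremented PC into the IF/ID pipeline register.
def fetch(pc, imem):
    return {"insn_bits": imem[pc],   # instruction bits for the next stage
            "pc_plus_1": pc + 1}     # assume no branches for now

imem = ["add r1,r2,r3", "add r4,r1,r5"]   # toy, text-encoded program
if_id = fetch(0, imem)
print(if_id)   # {'insn_bits': 'add r1,r2,r3', 'pc_plus_1': 1}
```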
Slide 10
Stage 2: Decode Diagram
(Diagram: the instruction bits and PC+1 arrive from the IF / ID pipeline register; the Register File reads regA and regB, with the write data and destReg coming back from Write-back; regA contents, regB contents, PC+1, control signals, and destReg are latched into the ID / EX pipeline register, between Fetch and Execute; the branch target is passed back toward Fetch.)
Slide 11
Stage 2: Instruction Decode
• Decode opcode bits
– Set up Control signals for later stages
• Read input operands from register file
– Specified by decoded instruction bits
• Write state to the pipeline register (ID/EX)
– Opcode
– Register contents
– PC+1 (even though decode didn’t use it)
– Control signals (from insn) for opcode and destReg
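Continuing the same illustrative model, a sketch of the decode stage; the three-operand text format and field names are assumptions for the example:

```python
# Sketch of Stage 2: split the opcode, read both source registers, and latch
# operands, PC+1, and the control information into the ID/EX register.
def decode(if_id, regfile):
    op, operands = if_id["insn_bits"].split(" ", 1)
    dest, src_a, src_b = operands.split(",")
    return {"op": op,                         # control signals derive from this
            "destReg": dest,
            "valA": regfile[src_a],           # regA contents
            "valB": regfile[src_b],           # regB contents
            "pc_plus_1": if_id["pc_plus_1"]}  # carried along even if unused here

regs = {"r2": 5, "r3": 7}
if_id = {"insn_bits": "add r1,r2,r3", "pc_plus_1": 1}
print(decode(if_id, regs))
```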
Slide 12
Stage 3: Execute Diagram
(Diagram: regA contents, regB contents, PC+1, and control signals arrive from the ID / EX pipeline register; a MUX selects regB or the constant offset as the second ALU input; an adder forms PC+1+offset; the ALU result, regB contents, the PC+1+offset target, control signals, and destReg are latched into the EX/Mem pipeline register, between Decode and Memory.)
Slide 13
Stage 3: Execution
• Perform ALU operations
– Calculate result of instruction
• Control signals select operation
• Contents of regA used as one input
• Either regB or constant offset (from insn) used as second input
– Calculate PC-relative branch target
• PC+1+(constant offset)
• Write state to the pipeline register (EX/Mem)
– ALU result, contents of regB, and PC+1+offset
– Control signals (from insn) for opcode and destReg
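An illustrative sketch of the execute stage under the same assumptions; only add/sub are modeled, and the `use_offset` flag stands in for the control signal driving the second-input MUX:

```python
# Sketch of Stage 3: a MUX picks regB or the constant offset as the second ALU
# input, the ALU computes the result, and the branch target is PC+1+offset.
def execute(id_ex):
    second = id_ex["offset"] if id_ex.get("use_offset") else id_ex["valB"]
    alu_result = {"add": id_ex["valA"] + second,
                  "sub": id_ex["valA"] - second}[id_ex["op"]]
    return {"op": id_ex["op"], "destReg": id_ex["destReg"],
            "alu_result": alu_result,
            "valB": id_ex["valB"],                            # carried to Memory
            "target": id_ex["pc_plus_1"] + id_ex.get("offset", 0)}

id_ex = {"op": "add", "destReg": "r1", "valA": 5, "valB": 7,
         "pc_plus_1": 1, "use_offset": False}
print(execute(id_ex))   # alu_result == 12
```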
Slide 14
Stage 4: Memory Diagram
(Diagram: the ALU result from the EX/Mem pipeline register drives the Data Cache address (in_addr), the regB contents drive the write data (in_data), and the control signals supply the enable and R/W inputs; the ALU result, loaded data, control signals, and destReg are latched into the Mem/WB pipeline register, between Execute and Write-back; the PC+1+offset target is also carried along.)
Slide 15
Stage 4: Memory
• Perform data cache access
– ALU result contains address for LD or ST
– Opcode bits control R/W and enable signals
• Write state to the pipeline register (Mem/WB)
– ALU result and Loaded data
– Control signals (from insn) for opcode and destReg
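An illustrative sketch of the memory stage under the same assumptions, with a Python dict standing in for the data cache:

```python
# Sketch of Stage 4: the ALU result is the data-cache address; opcode bits
# decide whether to read (LD), write (ST), or bypass the cache entirely.
def memory(ex_mem, dcache):
    loaded = None
    if ex_mem["op"] == "ld":                      # read enable asserted
        loaded = dcache[ex_mem["alu_result"]]
    elif ex_mem["op"] == "st":                    # write enable asserted
        dcache[ex_mem["alu_result"]] = ex_mem["valB"]
    return {"op": ex_mem["op"], "destReg": ex_mem["destReg"],
            "alu_result": ex_mem["alu_result"], "loaded": loaded}

dcache = {12: 99}
ex_mem = {"op": "ld", "destReg": "r4", "alu_result": 12, "valB": 0}
print(memory(ex_mem, dcache))   # loaded == 99
```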
Slide 16
Stage 5: Write-back Diagram
(Diagram: the ALU result and loaded data from the Mem/WB pipeline register feed a MUX; the selected data and destReg are written back to the register file, under the control signals carried from the Memory stage.)
Slide 17
Stage 5: Write Back
• Write result to register file (if required)
– Write Loaded data to destReg for LD
– Write ALU result to destReg for arithmetic insn
– Opcode bits control register write enable signal
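An illustrative sketch of the write-back stage under the same assumptions:

```python
# Sketch of Stage 5: a MUX picks loaded data (for LD) or the ALU result (for
# arithmetic), and the register write enable suppresses the write for stores.
def write_back(mem_wb, regfile):
    if mem_wb["op"] == "st":                      # no register write for stores
        return
    value = mem_wb["loaded"] if mem_wb["op"] == "ld" else mem_wb["alu_result"]
    regfile[mem_wb["destReg"]] = value

regs = {}
write_back({"op": "ld", "destReg": "r4", "alu_result": 12, "loaded": 99}, regs)
write_back({"op": "add", "destReg": "r1", "alu_result": 12, "loaded": None}, regs)
print(regs)   # {'r4': 99, 'r1': 12}
```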
Slide 18
Putting It All Together
(Diagram: the complete five-stage pipelined datapath — the PC and Instruction Cache feed the IF/ID register; the register file (R0–R7) supplies valA and valB to the ID/EX register along with PC+1, the offset, the opcode, and dest; the ALU and the branch-target adder feed the EX/Mem register with the ALU result, valB, the target, and an eq? comparison; the Data Cache feeds the Mem/WB register with the memory data (mdata) and ALU result; a final MUX selects the value written back to the register file.)
Slide 19
Six-Stage Instruction Pipeline
Slide 20
• Let's examine our datapath and control diagram.
• Associate resources with states.
• Ensure that flows do not conflict, or figure out how to resolve them.
• Assert control in the appropriate stage.
Slide 21
 Register memory: all ALU operations are performed on register operands.
 Instruction & data memory: there are two kinds of memory system, 1) instruction memory and 2) data memory.
 Only explicit memory instructions access memory, since ALU operations work on register operands.
 Five instruction cycles (one per stage) must complete to execute an operation.
5 Steps of MIPS Datapath
(Diagram: the unpipelined MIPS datapath — Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, and Write Back — built from the Next PC adder (+4), instruction memory, register file (RS1, RS2, RD), sign-extended immediate, ALU with Zero? detect, data memory, the LMD latch, and the write-back MUXes.)
What do we need to do to pipeline the process?
Slide 22
5 Steps of MIPS/DLX Datapath
(Diagram: the same five stages — Instruction Fetch, Instruction Decode / Register Fetch, Execute / Address Calculation, Memory Access, and Write Back — now separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers; Next SEQ PC, the register specifiers (RS1, RS2, RD), the sign-extended immediate, and the WB data travel down the pipe with each instruction.)
• Data stationary control
– local decode for each instruction phase / pipeline stage
Slide 23
Graphically Representing Pipelines
• Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
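For an ideal k-stage pipeline running n instructions with no stalls, the first question has a simple answer: k + (n − 1) cycles — k cycles for the first instruction to pass through, then one completion per cycle.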
Slide 24
Visualizing Pipelining
(Diagram: time in clock cycles 1–7 runs across, instruction order runs down; each of four instructions passes through Ifetch, Reg, ALU, DMem, and Reg, with each instruction starting one cycle after the previous one so the stages overlap.)
Slide 25
Conventional Pipelined Execution Representation
(Diagram: six instructions, each drawn as IFetch → Dcd → Exec → Mem → WB, staggered one cycle apart; program flow runs down the page and time runs to the right.)
Slide 26
Single Cycle, Multiple Cycle, vs. Pipeline
(Diagram: timing over cycles 1–10 — the single-cycle implementation gives every instruction one long clock, so Load, Store, and R-type each occupy a full cycle and the shorter ones waste time; the multiple-cycle implementation breaks Load and Store into Ifetch, Reg, Exec, Mem, and Wr steps on a short clock; the pipeline implementation overlaps Load, Store, and R-type so a new instruction starts every cycle.)
Slide 27
• Suppose we execute 100 instructions
• Single Cycle Machine
– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
– 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
• Ideal pipelined machine
– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
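The same arithmetic, written out as a small Python check (the cycle times and the 4.6 multicycle CPI are the figures quoted above, not independent measurements):

```python
# Execution time = cycle time x CPI x instruction count, plus drain cycles
# for the pipelined case.
n = 100
single    = 45 * 1 * n           # 45 ns/cycle, CPI = 1            -> 4500 ns
multi     = 10 * 4.6 * n         # 10 ns/cycle, CPI = 4.6          -> 4600 ns
pipelined = 10 * (1 * n + 4)     # 10 ns/cycle, CPI = 1, 4-cycle drain -> 1040 ns
print(single, multi, pipelined)
```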
Slide 28
Pipeline Performance
 Pipelining
  – At best no impact on latency
    • Still need to wait “n” stages (cycles) for completion of an instruction
  – Improves “throughput”
    • No single instruction executes faster, but overall throughput is higher
    • Average instruction execution time decreases
    • Successive instructions complete in each successive cycle (no 5-cycle wait between instructions)
  – Reality
    • Clock determined by slowest stage
    • Pipeline overhead
      – Clock skew
      – Register delay
      – Pipeline fill and drain
Slide 29
 Consider a non-pipelined machine with 6 execution stages of lengths 50 ns, 50 ns, 60 ns, 60 ns, 50 ns, and 50 ns.
   - Find the instruction latency on this machine.
   - How much time does it take to execute 100 instructions?
 Solution:
   Instruction latency = 50+50+60+60+50+50 = 320 ns
   Time to execute 100 instructions = 100 × 320 = 32,000 ns
 Suppose we introduce pipelining on this machine. Assume that when introducing pipelining, the clock skew adds 5 ns of overhead to each execution stage.
   - What is the instruction latency on the pipelined machine?
   - How much time does it take to execute 100 instructions?
 Solution:
   In the pipelined implementation, the pipe stages must all be the same length: the speed of the slowest stage plus overhead. With 5 ns of overhead:
   Length of a pipelined stage = MAX(lengths of unpipelined stages) + overhead = 60 + 5 = 65 ns
   Instruction latency = 6 × 65 = 390 ns
   Time to execute 100 instructions = 65×6×1 + 65×1×99 = 390 + 6435 = 6825 ns
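The same calculation as a small Python sketch, so the formula generalizes to other stage counts and overheads:

```python
# Pipelined cycle time = slowest stage + per-stage overhead; the first
# instruction needs one cycle per stage, and each later instruction
# completes one cycle after the previous one.
stage_ns = [50, 50, 60, 60, 50, 50]
overhead = 5                                    # ns of clock skew per stage
n = 100

print(sum(stage_ns) * n)                        # unpipelined, 100 insns: 32000 ns
cycle = max(stage_ns) + overhead                # 65 ns
print(len(stage_ns) * cycle)                    # pipelined latency: 390 ns
print(cycle * (len(stage_ns) + n - 1))          # pipelined, 100 insns: 6825 ns
```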
Slide 30
Slide 31
 Three types of pipeline hazards
  – Structural hazards – arise from resource conflicts when the hardware cannot support all possible combinations of instructions in simultaneous overlapped execution.
  – Data hazards – arise when an instruction depends on the result of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline (see the sketch after this list).
  – Control hazards – arise from the pipelining of branches and other instructions that change the PC.
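As a small illustration of the data-hazard case (a sketch under a five-stage, no-forwarding, no-stall assumption, not taken from the slides):

```python
# RAW hazard: the second instruction reads r1 in its ID stage (cycle 3),
# before the first instruction writes r1 back in its WB stage (cycle 5),
# so without forwarding or a stall it would see a stale value.
stages = ["IF", "ID", "EX", "MEM", "WB"]
program = ["add r1,r2,r3   (produces r1)",
           "sub r4,r1,r5   (consumes r1)"]
for i, insn in enumerate(program):
    row = ["   "] * i + stages          # each instruction starts one cycle later
    print(f"{insn:32s}" + " ".join(f"{s:>4}" for s in row))
```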
 Pipelining makes efficient use of resources.
 Quicker execution of a large number of instructions.
 The parallelism is invisible to the programmer.
Slide 32
 Pipelining involves adding hardware to the chip.
 Inability to continuously run the pipeline at full speed because of pipeline hazards, which disrupt the smooth execution of the pipeline.
Slide 33
 The pipelining concept brings advantages to many systems, although it also has some hazards. Pipelining instructions reduces the CPI, increases the speed of execution, and increases the throughput of the overall system.
 This is a basic concept of any system, and much improvement can still be made to pipelining to increase system speed. The future scope is to apply the concept to different embedded systems and see how performance increases.
Slide 34
Inquiry
2014-2-60-035@ewu.edu.bd
rasadin.ewu@gmail.com
Slide 35
References
 https://en.wikipedia.org/wiki/Pipeline_(computing)
 https://en.wikipedia.org/wiki/Instruction_pipelining
 https://docs.marklogic.com/guide/cpf/pipelines#id_50300
 https://www.tutorialspoint.com/computer_organization/instruction_pipeline_architecture.asp
Slide 36