ASSIGNMENT # 1
Subject
“COMPUTER ARCHITECTURE”
Teacher
“Ma’am Aden Iqbal”
By
“Farwa Abdul Hannan”
(12-CS-13)
Monday, 28 March, 2016
NFC – INSTITUTE OF ENGINEERING AND
FERTILIZER RESEARCH, FSD
Tomasulo Algorithm
1) Consider the code sequence shown below.
LD F6, 12(R2)
LD F2, 16(R3)
ADDD F0, F2, F4
DIVD F10, F0, F6
SUBD F8, F6, F2
ADDI R2, R2, 8
ADDI R3, R3, 16
ADDD F6, F8, F2
a) Identify all WAR, WAW, and RAW dependencies in the instruction stream.
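Each pair is listed as earlier instruction -> later instruction, followed by the register involved.
RAW (true data dependences):
LD F6, 12(R2)    -> DIVD F10, F0, F6   (F6)
LD F6, 12(R2)    -> SUBD F8, F6, F2    (F6)
LD F2, 16(R3)    -> ADDD F0, F2, F4    (F2)
LD F2, 16(R3)    -> SUBD F8, F6, F2    (F2)
LD F2, 16(R3)    -> ADDD F6, F8, F2    (F2)
ADDD F0, F2, F4  -> DIVD F10, F0, F6   (F0)
SUBD F8, F6, F2  -> ADDD F6, F8, F2    (F8)
WAR (antidependences):
LD F6, 12(R2)    -> ADDI R2, R2, 8     (R2)
LD F2, 16(R3)    -> ADDI R3, R3, 16    (R3)
DIVD F10, F0, F6 -> ADDD F6, F8, F2    (F6)
SUBD F8, F6, F2  -> ADDD F6, F8, F2    (F6)
WAW (output dependences):
LD F6, 12(R2)    -> ADDD F6, F8, F2    (F6)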
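Finding these pairs is mechanical, so a short script can double-check the list. The Python sketch below is illustrative only: the PROGRAM encoding and the dependences function are hypothetical and were written for this note. It reports a pair only when no instruction in between rewrites the register. It also counts the loads' base registers as sources, which is what exposes the WAR hazards between the loads and the ADDI instructions.

```python
from itertools import combinations

# (text, destination register, source registers); the loads' base
# registers R2 and R3 count as sources.
PROGRAM = [
    ("LD F6, 12(R2)",    "F6",  ("R2",)),
    ("LD F2, 16(R3)",    "F2",  ("R3",)),
    ("ADDD F0, F2, F4",  "F0",  ("F2", "F4")),
    ("DIVD F10, F0, F6", "F10", ("F0", "F6")),
    ("SUBD F8, F6, F2",  "F8",  ("F6", "F2")),
    ("ADDI R2, R2, 8",   "R2",  ("R2",)),
    ("ADDI R3, R3, 16",  "R3",  ("R3",)),
    ("ADDD F6, F8, F2",  "F6",  ("F8", "F2")),
]


def dependences(program):
    """List (kind, i, j, reg) for every RAW, WAR and WAW pair with i < j.

    A pair is reported only when no instruction strictly between i and j
    writes the register, so each hazard is tied to its nearest partner.
    """
    found = []
    for i, j in combinations(range(len(program)), 2):
        _, dest_i, srcs_i = program[i]
        _, dest_j, srcs_j = program[j]
        redefined = {program[k][1] for k in range(i + 1, j)}
        if dest_i in srcs_j and dest_i not in redefined:
            found.append(("RAW", i, j, dest_i))   # j reads what i wrote
        if dest_j in srcs_i and dest_j not in redefined:
            found.append(("WAR", i, j, dest_j))   # j overwrites what i read
        if dest_i == dest_j and dest_i not in redefined:
            found.append(("WAW", i, j, dest_i))   # both write the same register
    return found


if __name__ == "__main__":
    for kind, i, j, reg in dependences(PROGRAM):
        print(f"{kind}  {PROGRAM[i][0]:<18} -> {PROGRAM[j][0]:<18} ({reg})")
```

For this sequence it finds seven RAW, four WAR and one WAW dependence.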
b) Draw a pipeline diagram of how instructions would issue in a machine using
Tomasulo’s algorithm as discussed in class. Assume that the FP Add unit has 4
EX phases, the FP Multiply unit has 7 EX phases, and divide has 24 EX phases.
FP adds, subtracts, and multiplies are fully pipelined, while divide operations
are NOT pipelined.
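The issue, execute-complete and write cycles in the snapshots that follow can be cross-checked mechanically. The Python sketch below is a hypothetical helper written for this note and is not part of the assignment. It makes four assumptions: loads take 2 execute cycles (LD F6 issues in cycle 1 and completes in cycle 3), ADDI takes 1 cycle in the integer unit, a result broadcast in cycle t can be used for execution from cycle t+1, and the oldest finished instruction wins the single CDB. Reservation-station limits are not modelled, because the three add stations are enough for this sequence.

```python
# (text, functional unit, destination, sources)
PROGRAM = [
    ("LD F6, 12(R2)",    "load", "F6",  ("R2",)),
    ("LD F2, 16(R3)",    "load", "F2",  ("R3",)),
    ("ADDD F0, F2, F4",  "add",  "F0",  ("F2", "F4")),
    ("DIVD F10, F0, F6", "div",  "F10", ("F0", "F6")),
    ("SUBD F8, F6, F2",  "add",  "F8",  ("F6", "F2")),
    ("ADDI R2, R2, 8",   "int",  "R2",  ("R2",)),
    ("ADDI R3, R3, 16",  "int",  "R3",  ("R3",)),
    ("ADDD F6, F8, F2",  "add",  "F6",  ("F8", "F2")),
]

# Execute cycles per unit. FP add (4) and divide (24) come from the problem;
# load = 2 and integer = 1 are assumptions.
LATENCY = {"load": 2, "add": 4, "div": 24, "int": 1}
PIPELINED = {"load": True, "add": True, "div": False, "int": True}


def schedule(program, latency, pipelined):
    """Return (issue, exec_complete, write) for each instruction.

    A single oldest-first pass is exact for this sequence: the oldest
    finished instruction always wins the CDB, so a younger instruction
    never delays an older one, and there is only one divide.
    """
    producer = {}        # register -> index of the latest instruction writing it
    cdb_used = set()     # cycles in which the single CDB already carries a result
    unit_free = {}       # non-pipelined unit -> first cycle it can start again
    rows = []
    for idx, (_, unit, dest, srcs) in enumerate(program):
        issue = idx + 1                                  # one issue per cycle
        # A value broadcast in cycle t can be used for execution from t + 1.
        ready = max((rows[producer[r]][2] for r in srcs if r in producer),
                    default=0)
        start = max(issue, ready) + 1
        if not pipelined[unit]:
            start = max(start, unit_free.get(unit, 1))
            unit_free[unit] = start + latency[unit]
        done = start + latency[unit] - 1
        write = done + 1
        while write in cdb_used:                         # wait for a free CDB slot
            write += 1
        cdb_used.add(write)
        rows.append((issue, done, write))
        producer[dest] = idx
    return rows


if __name__ == "__main__":
    rows = schedule(PROGRAM, LATENCY, PIPELINED)
    for (text, *_), (issue, done, write) in zip(PROGRAM, rows):
        note = "  <- waited for the CDB" if write > done + 1 else ""
        print(f"{text:<18} issue {issue:>2}  exec {done:>2}  write {write:>2}{note}")
```

Running it flags SUBD F8, F6, F2. That instruction finishes executing in cycle 9 but cannot write until cycle 11, because the older ADDD F0, F2, F4 uses the CDB in cycle 10. This is the stall that part c) asks about.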
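Conventions used in the snapshots:
- Loads take 2 execute cycles.
- ADDI executes for 1 cycle in the integer unit, whose reservation stations are not shown.
- An operand broadcast on the CDB in cycle t can be used for execution from cycle t+1. An instruction issued in cycle t also captures it.
- Only one result can be written on the CDB per clock, and the oldest waiting instruction goes first.

Notation:
- Exec is the cycle in which execution completes.
- Time is the number of execute cycles still remaining.
- "-" marks an empty field.
- M(A1) and M(A2) are the values loaded by Load1 and Load2, and R(F4) is F4 read from the register file.
- M+R4, M-M, M-M+M and (M+R4)/M are the results of ADDD F0, SUBD F8, ADDD F6 and DIVD F10.

Cycles 1-3 (state at the end of cycle 3)
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      -
LD   F2, 16(R3)        2       -      -
ADDD F0, F2, F4        3       -      -
DIVD F10, F0, F6       -       -      -
SUBD F8, F6, F2        -       -      -
ADDI R2, R2, 8         -       -      -
ADDI R3, R3, 16        -       -      -
ADDD F6, F8, F2        -       -      -
Load buffers: Load1 Yes (address 12+R2), Load2 Yes (address 16+R3), Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   Yes   ADDD  -      R(F4)  Load2  -
-     ADD2   No
-     ADD3   No
-     MULT1  No
-     MULT2  No
Register result status: F0 = ADD1, F2 = Load2, F6 = Load1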
Cycle 4
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      -
ADDD F0, F2, F4        3       -      -
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        -       -      -
ADDI R2, R2, 8         -       -      -
ADDI R3, R3, 16        -       -      -
ADDD F6, F8, F2        -       -      -
Load buffers: Load1 No, Load2 Yes (address 16+R3), Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   Yes   ADDD  -      R(F4)  Load2  -
-     ADD2   No
-     ADD3   No
-     MULT1  Yes   DIVD  -      M(A1)  ADD1   -
-     MULT2  No
Register result status: F0 = ADD1, F2 = Load2, F6 = M(A1), F10 = MULT1
Cycle 5
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       -      -
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       -      -
ADDI R2, R2, 8         -       -      -
ADDI R3, R3, 16        -       -      -
ADDD F6, F8, F2        -       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
4     ADD1   Yes   ADDD  M(A2)  R(F4)  -      -
4     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
-     ADD3   No
-     MULT1  Yes   DIVD  -      M(A1)  ADD1   -
-     MULT2  No
Register result status: F0 = ADD1, F2 = M(A2), F6 = M(A1), F8 = ADD2, F10 = MULT1
Cycle 6
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       -      -
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       -      -
ADDI R2, R2, 8         6       -      -
ADDI R3, R3, 16        -       -      -
ADDD F6, F8, F2        -       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
3     ADD1   Yes   ADDD  M(A2)  R(F4)  -      -
3     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
-     ADD3   No
-     MULT1  Yes   DIVD  -      M(A1)  ADD1   -
-     MULT2  No
Register result status: F0 = ADD1, F2 = M(A2), F6 = M(A1), F8 = ADD2, F10 = MULT1
Cycle 7
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       -      -
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       -      -
ADDI R2, R2, 8         6       7      -
ADDI R3, R3, 16        7       -      -
ADDD F6, F8, F2        -       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
2     ADD1   Yes   ADDD  M(A2)  R(F4)  -      -
2     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
-     ADD3   No
-     MULT1  Yes   DIVD  -      M(A1)  ADD1   -
-     MULT2  No
Register result status: F0 = ADD1, F2 = M(A2), F6 = M(A1), F8 = ADD2, F10 = MULT1
Cycle 8
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       -      -
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       -      -
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      -
ADDD F6, F8, F2        8       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
1     ADD1   Yes   ADDD  M(A2)  R(F4)  -      -
1     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
-     ADD3   Yes   ADDD  -      M(A2)  ADD2   -
-     MULT1  Yes   DIVD  -      M(A1)  ADD1   -
-     MULT2  No
Register result status: F0 = ADD1, F2 = M(A2), F6 = ADD3, F8 = ADD2, F10 = MULT1
Cycle 9
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9      -
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       9      -
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
0     ADD1   Yes   ADDD  M(A2)  R(F4)  -      -
0     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
-     ADD3   Yes   ADDD  -      M(A2)  ADD2   -
-     MULT1  Yes   DIVD  -      M(A1)  ADD1   -
-     MULT2  No
Register result status: F0 = ADD1, F2 = M(A2), F6 = ADD3, F8 = ADD2, F10 = MULT1
Cycle 10
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9     10
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       9      -
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   No
0     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
-     ADD3   Yes   ADDD  -      M(A2)  ADD2   -
24    MULT1  Yes   DIVD  M+R4   M(A1)  -      -
-     MULT2  No
Register result status: F0 = M+R4, F2 = M(A2), F6 = ADD3, F8 = ADD2, F10 = MULT1
SUBD F8 has finished executing but must wait, because ADDD F0 uses the CDB in this cycle.
Cycle 11
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9     10
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       9     11
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   No
-     ADD2   No
4     ADD3   Yes   ADDD  M-M    M(A2)  -      -
23    MULT1  Yes   DIVD  M+R4   M(A1)  -      -
-     MULT2  No
Register result status: F0 = M+R4, F2 = M(A2), F6 = ADD3, F8 = M-M, F10 = MULT1
Cycle 14
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9     10
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       9     11
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8       -      -
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   No
-     ADD2   No
1     ADD3   Yes   ADDD  M-M    M(A2)  -      -
20    MULT1  Yes   DIVD  M+R4   M(A1)  -      -
-     MULT2  No
Register result status: F0 = M+R4, F2 = M(A2), F6 = ADD3, F8 = M-M, F10 = MULT1
Cycle 16
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9     10
DIVD F10, F0, F6       4       -      -
SUBD F8, F6, F2        5       9     11
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8      15     16
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   No
-     ADD2   No
-     ADD3   No
18    MULT1  Yes   DIVD  M+R4   M(A1)  -      -
-     MULT2  No
Register result status: F0 = M+R4, F2 = M(A2), F6 = M-M+M, F8 = M-M, F10 = MULT1
Cycle 34
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9     10
DIVD F10, F0, F6       4      34      -
SUBD F8, F6, F2        5       9     11
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8      15     16
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   No
-     ADD2   No
-     ADD3   No
0     MULT1  Yes   DIVD  M+R4   M(A1)  -      -
-     MULT2  No
Register result status: F0 = M+R4, F2 = M(A2), F6 = M-M+M, F8 = M-M, F10 = MULT1
Cycle 35
Instruction status
Instruction          Issue   Exec   Write
LD   F6, 12(R2)        1       3      4
LD   F2, 16(R3)        2       4      5
ADDD F0, F2, F4        3       9     10
DIVD F10, F0, F6       4      34     35
SUBD F8, F6, F2        5       9     11
ADDI R2, R2, 8         6       7      8
ADDI R3, R3, 16        7       8      9
ADDD F6, F8, F2        8      15     16
Load buffers: Load1 No, Load2 No, Load3 No
Reservation stations
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
-     ADD1   No
-     ADD2   No
-     ADD3   No
-     MULT1  No
-     MULT2  No
Register result status: F0 = M+R4, F2 = M(A2), F6 = M-M+M, F8 = M-M, F10 = (M+R4)/M
c) Tomasulo’s algorithm has a disadvantage. Only one result can complete per
clock, per CDB. Using the same latencies as above, find a code sequence of no
more than 12 instructions where Tomasulo’s algorithm must stall due to CDB
contention. Indicate where this occurs in your sequence.
The sequence above already stalls on the CDB, so it serves as the example (8 instructions, within the limit of 12). The conflict is visible at the end of cycle 9:
Instruction          Issue   Exec   Write
ADDD F0, F2, F4        3       9      -
SUBD F8, F6, F2        5       9      -
Time  Name   Busy  Op    Vj     Vk     Qj     Qk
0     ADD1   Yes   ADDD  M(A2)  R(F4)  -      -
0     ADD2   Yes   SUBD  M(A1)  M(A2)  -      -
ADDD F0, F2, F4 and SUBD F8, F6, F2 both receive their last operand, M(A2), when Load2 writes in cycle 5. Both execute in the pipelined FP adder in cycles 6-9, and both want to broadcast in cycle 10.

Only one result can be written per clock on the single CDB. The older ADDD F0 therefore writes in cycle 10, and SUBD F8 stays in ADD2 until it writes in cycle 11. The stall propagates to ADDD F6, F8, F2: it needs F8, so it cannot begin executing until cycle 12.
2) Evaluate the performance of several implementation options for
the following workload:
LOOP:
L.D     F3, R4(R6)      # F3 = MEM[R4+R6]
MUL.D   F4, F3, F2      # F4 = F3*F2
S.D     F4, R3(R6)      # MEM[R3+R6] = F4
A.D     F4, F3, F3      # F4 = F3+F3
A.D     F10, F10, F4    # F10 = F10+F4
DSUBUI  R6, R6, #4      # R6 = R6-4
BNEQ    R6, LOOP        # if R6 != 0, jump to LOOP
Assume the processor implements Tomasulo’s algorithm (with reservation stations and no reorder
buffer), as well as the following:
- A single instruction is issued per cycle.
- None of the functional units are pipelined.
- There is no forwarding between or within functional units; results are communicated only via
the single CDB.
- The memory execution unit takes three cycles for a load and two cycles for a store. Loads and
stores have separate reservation stations, but only one load or store can execute at a time,
since they share the memory port.
- The issue and write-result stages take one cycle each. Address generation is performed in the
load and store buffers, separately from the ALU.
- Branches execute in the integer unit, and instructions issued after a branch wait until the
branch has been resolved and broadcast on the CDB.
Functional unit queues and latencies:
Functional unit    Units   Latency (EX cycles)   Reservation stations
Memory – Load        1             3                      2
Memory – Store       1             2                      2
Integer              1             1                      5
FP – Add             1             4                      3
FP – Multiply        1             2                      2
a) Perform a simulation of the first two iterations for a single issue architecture.
Create the table below
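The problem leaves some timing conventions open. The tables assume the following:
- An operand broadcast on the CDB in cycle t can be used for execution from cycle t+1.
- When several results are ready in the same cycle, the oldest instruction gets the CDB.
- A non-pipelined unit can start a new instruction in the cycle after its previous one finishes executing.
- Address generation adds no cycles.
- Stores do not write the CDB.
- The branch broadcasts its outcome on the CDB, and the next instruction issues one cycle later.

Execute gives the first and last execute cycles.

Iteration 1
Instruction          Issue   Execute   Write
L.D    F3, R4(R6)      1       2-4       5
MUL.D  F4, F3, F2      2       6-7       8
S.D    F4, R3(R6)      3      9-10       -
A.D    F4, F3, F3      4       6-9      10
A.D    F10, F10, F4    5     11-14      15
DSUBUI R6, R6, #4      6         7       9
BNEQ   R6, LOOP        7        10      11

Iteration 2
Instruction          Issue   Execute   Write
L.D    F3, R4(R6)     12     13-15      16
MUL.D  F4, F3, F2     13     17-18      19
S.D    F4, R3(R6)     14     20-21       -
A.D    F4, F3, F3     15     17-20      21
A.D    F10, F10, F4   16     22-25      26
DSUBUI R6, R6, #4     17        18      20
BNEQ   R6, LOOP       18        21      22

DSUBUI finishes executing in cycle 7 but loses the CDB in cycle 8 to the older MUL.D, so it writes in cycle 9. Every instruction in iteration 2 occurs exactly 11 cycles after its counterpart in iteration 1.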
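The timings for the first iterations can be generated with a small cycle-by-cycle model. The Python sketch below is hypothetical and was written for this note. Beyond the latencies in the table, it makes these assumptions:
- An operand broadcast in cycle t is usable from cycle t+1.
- The oldest finished instruction wins the single CDB.
- A non-pipelined unit can accept new work in the cycle after it finishes executing.
- Address generation adds no cycles, and stores do not use the CDB.
- Reservation-station limits never bind in the iterations simulated.

```python
OPS = {  # opcode -> (functional unit, execute cycles, broadcasts on the CDB?)
    "L.D":    ("mem",  3, True),
    "S.D":    ("mem",  2, False),   # loads and stores share the memory port
    "MUL.D":  ("fmul", 2, True),
    "A.D":    ("fadd", 4, True),
    "DSUBUI": ("int",  1, True),
    "BNEQ":   ("int",  1, True),    # the branch outcome goes out on the CDB
}

# One loop iteration: (opcode, destination or None, source registers).
BODY = [
    ("L.D",    "F3",  ("R4", "R6")),
    ("MUL.D",  "F4",  ("F3", "F2")),
    ("S.D",    None,  ("F4", "R3", "R6")),
    ("A.D",    "F4",  ("F3", "F3")),
    ("A.D",    "F10", ("F10", "F4")),
    ("DSUBUI", "R6",  ("R6",)),
    ("BNEQ",   None,  ("R6",)),
]


def simulate(iterations, max_cycles=1000):
    prog = BODY * iterations
    n = len(prog)
    # Register renaming: each source waits on the latest earlier writer.
    producers = [[max((j for j in range(i) if prog[j][1] == r), default=None)
                  for r in srcs] for i, (_, _, srcs) in enumerate(prog)]
    issue, start, done, write = ([None] * n for _ in range(4))
    unit_free = {}        # unit -> first cycle it can start a new instruction
    nxt = 0               # next instruction to issue, in program order
    for c in range(1, max_cycles):
        # Write result: the oldest instruction that has finished gets the CDB.
        waiting = [i for i in range(n) if done[i] is not None and done[i] < c
                   and write[i] is None and OPS[prog[i][0]][2]]
        if waiting:
            write[waiting[0]] = c
        # Execute: operands broadcast in an earlier cycle and the unit idle.
        for i in range(n):
            if issue[i] is None or start[i] is not None:
                continue
            if any(p is not None and (write[p] is None or write[p] >= c)
                   for p in producers[i]):
                continue
            unit, lat, _ = OPS[prog[i][0]]
            if unit_free.get(unit, 1) > c:
                continue
            start[i], done[i] = c, c + lat - 1
            unit_free[unit] = c + lat           # no unit is pipelined
        # Issue: one per cycle, never past a branch that has not broadcast.
        if nxt < n:
            after_branch = nxt > 0 and prog[nxt - 1][0] == "BNEQ"
            if not after_branch or (write[nxt - 1] is not None
                                    and write[nxt - 1] < c):
                issue[nxt] = c
                nxt += 1
        if nxt == n and all(d is not None for d in done) and all(
                write[i] is not None for i in range(n) if OPS[prog[i][0]][2]):
            break
    return prog, issue, start, done, write


if __name__ == "__main__":
    prog, issue, start, done, write = simulate(iterations=2)
    for i, (op, _, _) in enumerate(prog):
        if i % len(BODY) == 0:
            print(f"Iteration {i // len(BODY) + 1}")
        w = "-" if write[i] is None else write[i]
        print(f"  {op:<7} issue {issue[i]:>2}  execute {start[i]:>2}-{done[i]:<2}  write {w}")
```

Under these assumptions, the first L.D of iterations 1, 2 and 3 issues in cycles 1, 12 and 23, which is 11 cycles per iteration. Issue is blocked in cycles 8-11 while the first BNEQ waits for R6 and then broadcasts its outcome.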
b) What is the performance bottleneck?
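The bottleneck is the branch. Instructions after BNEQ cannot issue until it has been resolved and broadcast on the CDB, and BNEQ cannot execute until DSUBUI has written R6. As a result, issue stalls for four cycles in every iteration (cycles 8-11 in iteration 1), so the single-issue front end is idle for 4 of every 11 cycles.

The single CDB contributes as well. MUL.D, DSUBUI and A.D F4, F3, F3 finish within a few cycles of one another and must broadcast one at a time (cycles 8, 9 and 10), so the branch cannot broadcast its outcome before cycle 11. The non-pipelined FP adder is the next most heavily used resource: the two A.D instructions keep it busy for 8 of the 11 cycles.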
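c) What is the “steady state” of this loop, that is, how many cycles will an average
loop iteration take if loop startup and shutdown effects are ignored?
About 11 cycles per iteration. Iterations begin issuing in cycles 1, 12 and 23: each new iteration's L.D issues one cycle after the previous BNEQ broadcasts. Every instruction of iteration 2 executes and writes exactly 11 cycles after its counterpart in iteration 1, so the pattern repeats with a period of 11 cycles. R6 reaching zero is what ends the loop; it is not the steady state.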
d) Where will the first issue stall occur?
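The first issue stall occurs in cycle 8, at the L.D that starts iteration 2. That L.D cannot issue until the BNEQ of iteration 1 has been resolved and broadcast on the CDB (cycle 11), so it issues in cycle 12. No earlier instruction stalls at issue, because a reservation station is always free.

The RAW dependence of MUL.D F4, F3, F2 on the F3 produced by L.D delays MUL.D's execution (until cycle 6), but not its issue. Under Tomasulo’s algorithm, MUL.D issues in cycle 2 and waits in its reservation station for F3.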
