L4 speeding-up-execution

General Aspects of Computer Organization
(Lecture-4)
R S Ananda Murthy
Associate Professor and Head
Department of Electrical & Electronics Engineering,
Sri Jayachamarajendra College of Engineering,
Mysore 570 006

Data Path Inside the CPU
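[Figure: data path inside the CPU. Operands A and B travel from the registers over the ALU input bus into the ALU input registers; the ALU computes A+B into the ALU output register, from where it is stored back in a register.]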
Feeding two operands to the ALU and storing the ALU output in an internal register is called the data path cycle.
A faster data path cycle results in faster program execution.
Multiple ALUs operating in parallel speed up the data path by performing several operations in the same cycle.
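The data path cycle described above can be sketched in a few lines. This is a minimal illustrative model, assuming a dictionary-based register file with hypothetical register names (R0, R1, R2) and a hypothetical `DataPath` class; it only mirrors the three steps of one cycle, not real hardware timing.

```python
class DataPath:
    """Toy model of one data path cycle (names are illustrative assumptions)."""

    def __init__(self, registers):
        self.registers = dict(registers)  # general-purpose register file
        self.alu_in_a = 0                 # ALU input register A
        self.alu_in_b = 0                 # ALU input register B
        self.alu_out = 0                  # ALU output register

    def cycle(self, src_a, src_b, dest, op=lambda a, b: a + b):
        # 1. Feed two operands over the ALU input bus into the ALU input registers.
        self.alu_in_a = self.registers[src_a]
        self.alu_in_b = self.registers[src_b]
        # 2. The ALU operates on them and latches the result in its output register.
        self.alu_out = op(self.alu_in_a, self.alu_in_b)
        # 3. Store the ALU output back in an internal register.
        self.registers[dest] = self.alu_out
        return self.alu_out


dp = DataPath({"R0": 3, "R1": 4, "R2": 0})
dp.cycle("R0", "R1", "R2")   # R2 <- R0 + R1
print(dp.registers["R2"])    # 7
```

A faster data path cycle means each call to `cycle` would complete in less time; the model itself makes no timing claims.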

RISC Design Speeds Up Program Execution
Most manufacturers today implement the following features in their processors to improve performance:
All instructions are executed directly by hardware instead of being interpreted by a microprogram.
Maximize the rate at which instructions are issued by exploiting instruction-level parallelism.
Use simple fixed-length instructions to speed up decoding.
Avoid performing arithmetic and logical operations directly on data in memory; that is, only LOAD and STORE instructions should reference memory.
Provide plenty of registers inside the CPU.
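Two of these features, fixed-length instructions and load/store-only memory access, can be illustrated together. The sketch below assumes a hypothetical 16-bit instruction format (4-bit opcode, 4-bit destination register, 8-bit operand field) that does not belong to any real ISA; only the mnemonics LOAD and STORE come from the slide. Because every instruction has the same width, decoding is just fixed shifts and masks, and ADD is only allowed to work on registers.

```python
# Hypothetical 16-bit fixed-length format (an assumption, not a real ISA):
#   bits 15-12 opcode | bits 11-8 rd | bits 7-0 operand field
LOAD, STORE, ADD = 0x1, 0x2, 0x3


def encode(op, rd, field):
    return (op << 12) | (rd << 8) | (field & 0xFF)


def decode(word):
    # Fields sit at fixed bit positions, so no length decoding is needed.
    op = (word >> 12) & 0xF
    rd = (word >> 8) & 0xF
    field = word & 0xFF
    return op, rd, field


def run(program, memory, num_regs=16):
    regs = [0] * num_regs
    for word in program:
        op, rd, field = decode(word)
        if op == LOAD:          # only LOAD reads memory
            regs[rd] = memory[field]
        elif op == STORE:       # only STORE writes memory
            memory[field] = regs[rd]
        elif op == ADD:         # arithmetic works on registers only
            rs1, rs2 = field >> 4, field & 0xF
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return regs


memory = [7, 5, 0, 0]
program = [
    encode(LOAD, 1, 0),             # r1 <- mem[0]
    encode(LOAD, 2, 1),             # r2 <- mem[1]
    encode(ADD, 3, (1 << 4) | 2),   # r3 <- r1 + r2
    encode(STORE, 3, 2),            # mem[2] <- r3
]
regs = run(program, memory)
print(memory)  # [7, 5, 12, 0]
```

Computing mem[2] = mem[0] + mem[1] therefore takes two LOADs, one register-only ADD and one STORE, rather than a single memory-to-memory instruction.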

Pipelining for High Performance
The number of stages in a pipeline depends on the hardware design of the CPU.
Each stage in a pipeline is executed by a dedicated hardware unit inside the control unit (CU).
Each stage in a pipeline takes the same amount of time to complete its task.
The hardware units of different stages can work concurrently.
The operation of the hardware units is synchronized by the clock signal.
To implement pipelining, instructions must be of fixed length and have the same instruction cycle time.
Pipelining requires sophisticated compilation techniques to be implemented in the compiler.

A 4-Stage Pipeline
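[Figure: timing diagram of a 4-stage pipeline. Instructions I1 to I4 each pass through stages F, D, E and W in consecutive clock cycles, each instruction starting one clock cycle after the previous one.]
Clock cycle  1   2   3   4   5   6   7
I1           F1  D1  E1  W1
I2               F2  D2  E2  W2
I3                   F3  D3  E3  W3
I4                       F4  D4  E4  W4
Hardware stages in the pipeline:
F: Fetch instruction
D: Decode and get operands
E: Execute the instruction
W: Write result at destination
T: Period of the clock signal
n: Number of stages in the pipeline
[Figure: F (Fetch instruction) → B1 → D (Decode and get operands) → B2 → E (Execute operation) → B3 → W (Write results)]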
B1, B2, and B3 are storage buffers.
Information is passed from one stage to the next through
storage buffers.
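The clock-driven behaviour of the 4-stage pipeline and its buffers can be simulated cycle by cycle. This is an idealized sketch assuming no stalls or hazards; the function name `simulate` and the slot-list representation are illustrative choices. The slots between consecutive stages play the role of buffers B1, B2 and B3, and every clock edge moves each instruction one stage forward.

```python
STAGES = ["F", "D", "E", "W"]


def simulate(num_instructions, stages=STAGES):
    """Return ({(instruction, cycle): label}, total_cycles) for an ideal pipeline."""
    n = len(stages)
    slots = [None] * n          # slots[k] = instruction currently in stage k
    timeline = {}
    next_instr = 1
    cycle = 0
    while True:
        cycle += 1
        # Clock edge: pass each instruction to the next stage via its buffer.
        # Shifting from the back avoids overwriting a slot before it is copied.
        for k in range(n - 1, 0, -1):
            slots[k] = slots[k - 1]
        slots[0] = next_instr if next_instr <= num_instructions else None
        next_instr += 1
        if all(s is None for s in slots):
            return timeline, cycle - 1
        for k, instr in enumerate(slots):
            if instr is not None:
                timeline[(instr, cycle)] = f"{stages[k]}{instr}"


timeline, total = simulate(4)   # total == 7, as in the timing diagram
for i in range(1, 5):
    row = "".join(f"{timeline.get((i, c), ''):<4}" for c in range(1, total + 1))
    print(f"I{i}  {row}")
```

The printed rows reproduce the staircase of the timing diagram: four stage units are busy at once from cycle 4 onward.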
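With n pipeline stages and a clock period of T seconds, the time taken to execute each instruction is nT.
Once the pipeline is full, one instruction completes in every clock period, so the processor bandwidth is 1/(T × 10^6) MIPS (million instructions per second).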
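The timing relations above can be worked through numerically. The sketch assumes a hypothetical clock period of 10 ns, n = 4 stages and 100 instructions; the total-time formula (n + N − 1)T matches the diagram (7 cycles for 4 instructions in 4 stages), and the unpipelined comparison assumes each instruction would occupy the full nT before the next one starts.

```python
def pipeline_metrics(n_stages, clock_period_s, num_instructions):
    """Return (latency_s, total_time_s, bandwidth_mips, speedup)."""
    # Each instruction passes through all n stages: latency nT.
    latency = n_stages * clock_period_s
    # The first instruction finishes after nT; each later one finishes one T
    # after the previous, so N instructions take (n + N - 1)T.
    total_time = (n_stages + num_instructions - 1) * clock_period_s
    # One instruction completes per clock period: 1/T per second = 1/(T * 10^6) MIPS.
    bandwidth_mips = 1 / (clock_period_s * 1e6)
    # Assumed unpipelined baseline: each instruction takes nT, one after another.
    unpipelined_time = num_instructions * n_stages * clock_period_s
    speedup = unpipelined_time / total_time
    return latency, total_time, bandwidth_mips, speedup


lat, tot, mips, sp = pipeline_metrics(n_stages=4, clock_period_s=10e-9, num_instructions=100)
print(f"latency   = {lat * 1e9:.1f} ns")   # latency   = 40.0 ns
print(f"total     = {tot * 1e9:.1f} ns")   # total     = 1030.0 ns
print(f"bandwidth = {mips:.1f} MIPS")      # bandwidth = 100.0 MIPS
print(f"speedup   = {sp:.2f}")             # speedup   = 3.88
```

As N grows, the speedup approaches n, because the (n − 1) cycles needed to fill the pipeline become negligible.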

Superscalar Architecture
[Figure: superscalar architecture with two pipelines. A single instruction fetch unit (S1) feeds two parallel pipelines, each with its own instruction decode unit (S2), operand fetch unit (S3), instruction execution unit (S4) and write back unit (S5).]
A superscalar architecture has multiple pipelines, as shown above.
In the above example, a single fetch unit fetches a pair of instructions together and puts each one into its own pipeline, complete with its own ALU for parallel operation.
The compiler must ensure that the two instructions fetched together do not conflict over resource usage.
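The conflict check the compiler must perform can be sketched as a simplified pairing rule. This is a hypothetical model: the `Instr` record, the register names and the assumption that both pipelines share a single memory port (`SHARED_UNITS`) are illustrative, not part of the lecture. Two adjacent instructions are paired only if the second does not read or overwrite the first one's destination and they do not compete for a non-duplicated resource.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None
    srcs: Tuple[str, ...]     # registers read
    unit: str                 # resource the instruction needs


# Assumption: each pipeline has its own ALU (as in the figure), but both
# pipelines share one memory port, so two memory instructions would clash.
SHARED_UNITS = {"MEM"}


def can_pair(first: Instr, second: Instr) -> bool:
    # The second instruction must not need the first one's result.
    if first.dest is not None and first.dest in second.srcs:
        return False
    # Both instructions must not write the same destination register.
    if first.dest is not None and first.dest == second.dest:
        return False
    # They must not compete for a resource that is not duplicated.
    if first.unit == second.unit and first.unit in SHARED_UNITS:
        return False
    # Write-after-read is not checked: in this lockstep model both fetch
    # operands in S3 before either writes back in S5.
    return True


def pair_up(stream: List[Instr]) -> List[Tuple[str, ...]]:
    """Greedily group an in-order stream into issue slots of one or two."""
    slots = []
    i = 0
    while i < len(stream):
        if i + 1 < len(stream) and can_pair(stream[i], stream[i + 1]):
            slots.append((stream[i].text, stream[i + 1].text))
            i += 2
        else:
            slots.append((stream[i].text,))
            i += 1
    return slots


program = [
    Instr("ADD R1,R2,R3", "R1", ("R2", "R3"), "ALU"),
    Instr("SUB R4,R1,R5", "R4", ("R1", "R5"), "ALU"),  # reads R1 written just above
    Instr("ADD R6,R7,R8", "R6", ("R7", "R8"), "ALU"),
    Instr("LOAD R9,(R10)", "R9", ("R10",), "MEM"),
]
for slot in pair_up(program):
    print(slot)
```

Here the first ADD must issue alone because SUB depends on R1, while SUB and the second ADD are independent and can share a fetch cycle.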

Superscalar Architecture with Five Functional Units
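[Figure: superscalar processor with a single pipeline: instruction fetch unit (S1), instruction decode unit (S2), operand fetch unit (S3), five functional units in stage S4 (LOAD, STORE, two ALUs and a floating-point unit), and write back unit (S5).]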
Nowadays the word “superscalar” is used to describe processors that issue multiple instructions, often four to six, in a single clock cycle.
Such superscalar processors generally have one pipeline with multiple functional units, as shown above.
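How several instructions can be issued per clock cycle to the five functional units can be sketched with a simple in-order dispatcher. The unit mix (one LOAD, one STORE, two ALUs, one floating-point unit) is read off the figure; the issue width of 4, the function name `issue_schedule` and the rule that issue stops at the first instruction whose unit is already busy are illustrative assumptions. Data dependences are deliberately ignored.

```python
from collections import Counter

# Unit mix taken from the figure: one LOAD, one STORE, two ALUs, one FP unit.
UNITS = {"LOAD": 1, "STORE": 1, "ALU": 2, "FP": 1}


def issue_schedule(unit_stream, issue_width=4, units=UNITS):
    """Return a list of cycles, each a list of instruction indices issued then.

    In-order issue (an assumption): a cycle ends when issue_width instructions
    have issued or the next instruction's functional unit is already in use.
    """
    cycles = []
    i = 0
    while i < len(unit_stream):
        used = Counter()        # functional units claimed in this cycle
        issued = []
        while i < len(unit_stream) and len(issued) < issue_width:
            unit = unit_stream[i]
            if used[unit] >= units[unit]:
                break           # structural conflict: wait for the next cycle
            used[unit] += 1
            issued.append(i)
            i += 1
        cycles.append(issued)
    return cycles


stream = ["LOAD", "ALU", "ALU", "FP", "ALU", "STORE", "LOAD"]
print(issue_schedule(stream))  # [[0, 1, 2, 3], [4, 5, 6]]
```

Seven instructions issue in two cycles here; a stream of three ALU instructions would need two cycles because only two ALUs are available.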

License
This work is licensed under a
Creative Commons Attribution 4.0 International License.
