Basic non pipelined cpu architecture

Contents
•CPU Architecture Types
•Detailed data path of a typical register-based CPU
•Fetch-Decode-Execute Cycle
•Implementation of Control Unit: Hardwired Approach and Microprogrammed Approach
•Calculations of CPI and MIPS parameters

Recall
• A simple CPU consists of a set of registers, Arithmetic Logic Unit
(ALU), and Control Unit (CU).
• Operand: Any operation performed by the CPU involves information that needs to be addressed. In computer terminology, such information is called an operand.

• Registers: A processor register (CPU register) is one of a small set of
data holding places that are part of the computer processor.
A register may hold an instruction, a storage address, or any kind of
data (such as a bit sequence or individual characters). Some instructions
specify registers as part of the instruction.
• Accumulator: A one-address instruction takes the form ADD R1. In this case the instruction implicitly refers to a register called the accumulator, Racc. The contents of the accumulator are added to the contents of register R1, and the result is stored back into the accumulator Racc.

A Simple Machine
• Our simple machine is an accumulator-based processor, which
has five 16-bit registers: Program Counter (PC), Instruction
Register (IR), Address Register (AR), Accumulator (AC), and
Data Register (DR). The PC contains the address of the next
instruction to be executed. The IR contains the operation code
portion of the instruction being executed. The AR contains the
address portion (if any) of the instruction being executed. The
AC serves as the implicit source and destination of data. The
DR is used to hold data. The memory unit is made up of 4096
words of storage. The word size is 16 bits.
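The register set and memory described above can be modeled in Python as a rough illustration. This is a minimal sketch and is not part of the original material. The class `SimpleMachine` and the method `load_instruction` are hypothetical names. The split of a 16-bit instruction into a 4-bit opcode (IR) and a 12-bit address (AR) is also an assumption. It was chosen only because 12 bits are enough to address 4096 words.

```python
from dataclasses import dataclass, field

WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1   # 0xFFFF: every register and word is 16 bits
MEMORY_WORDS = 4096                # 4096 words of storage
ADDR_BITS = 12                     # 2**12 == 4096, so 12 address bits suffice
ADDR_MASK = (1 << ADDR_BITS) - 1   # 0x0FFF

@dataclass
class SimpleMachine:
    """Register set of the simple accumulator machine (illustrative sketch)."""
    pc: int = 0  # Program Counter: address of the next instruction
    ir: int = 0  # Instruction Register: operation-code portion
    ar: int = 0  # Address Register: address portion
    ac: int = 0  # Accumulator: implicit source and destination of data
    dr: int = 0  # Data Register: holds data
    memory: list = field(default_factory=lambda: [0] * MEMORY_WORDS)

    def load_instruction(self, word: int) -> None:
        """Split a 16-bit instruction word into IR (opcode) and AR (address).

        ASSUMPTION: 4-bit opcode in the top bits, 12-bit address below.
        """
        word &= WORD_MASK
        self.ir = word >> ADDR_BITS
        self.ar = word & ADDR_MASK

m = SimpleMachine()
m.load_instruction(0x3A05)
print(hex(m.ir), hex(m.ar))  # 0x3 0xa05
```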

CPU Architecture Types
• Accumulator (before 1960, e.g. 68HC11):
1-address   add A            acc ← acc + mem[A]
• Stack (1960s to 1970s):
0-address   add              tos ← tos + next
• Register-Memory (1970s to present, e.g. 80x86):
2-address   add R1, A        R1 ← R1 + mem[A]
            load R1, A       R1 ← mem[A]
• Register-Register (Load/Store) (1960s to present, e.g. MIPS):
3-address   add R1, R2, R3   R1 ← R2 + R3
            load R1, R2      R1 ← mem[R2]
            store R1, R2     mem[R1] ← R2

Code Sequence C = A + B for Four Instruction Sets

Stack     Accumulator   Register            Register
                        (register-memory)   (load-store)
Push A    Load A        Load R1, A          Load R1, A
Push B    Add B         Add R1, B           Load R2, B
Add       Store C       Store C, R1         Add R3, R1, R2
Pop C                                       Store C, R3

Effect of the Add instruction:
• Stack: tos = tos + next
• Accumulator: acc = acc + mem[B]
• Register (register-memory): R1 = R1 + mem[B]
• Register (load-store): R3 = R1 + R2

Stack Architectures
•Instruction set:
add, sub, mult, div, . . .
push A, pop A
•Example: A*B - (A+C*B)
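push A      stack: A
push B      stack: A, B
mul         stack: A*B
push A      stack: A*B, A
push C      stack: A*B, A, C
push B      stack: A*B, A, C, B
mul         stack: A*B, A, B*C
add         stack: A*B, A+B*C
sub         stack: result
(The top of the stack is the rightmost entry.)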
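The stack program above can be reproduced with a small interpreter. This is a minimal Python sketch that makes several assumptions:

- Values are integers.
- `div` is integer division.
- A binary operation pops the top of stack (tos) as its right operand and the entry below it (next) as its left operand, so `sub` computes next - tos.

`run_stack` is a hypothetical helper. With A = 2, B = 3 and C = 4, the program should leave A*B - (A+C*B) = 6 - 14 = -8 as the only stack entry.

```python
import operator

# Binary ALU operations; integer division for "div" is an assumption.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "mult": operator.mul, "div": operator.floordiv}

def run_stack(program, memory):
    """Execute 0-address instructions; memory maps variable names to values."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "push":            # push X: copy mem[X] onto the stack
            stack.append(memory[args[0]])
        elif op == "pop":           # pop X: move the top of stack into mem[X]
            memory[args[0]] = stack.pop()
        else:                       # ALU op: both operands come from the stack
            tos = stack.pop()
            nxt = stack.pop()
            stack.append(OPS[op](nxt, tos))
    return stack

program = ["push A", "push B", "mul", "push A", "push C",
           "push B", "mul", "add", "sub"]
print(run_stack(program, {"A": 2, "B": 3, "C": 4}))  # [-8]
```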

Accumulator Architectures
• Instruction set:
add A, sub A, mult A, div A, . . .
load A, store A
• Example: A*B - (A+C*B)
load B      acc = B
mul C       acc = B*C
add A       acc = A+B*C
store D     D = A+B*C
load A      acc = A
mul B       acc = A*B
sub D       acc = A*B - D = result
acc = acc +,-,*,/ mem[A]
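The next sketch is a matching one for the accumulator machine. It again assumes integer values and uses the hypothetical helper name `run_accumulator`. Every ALU instruction uses the accumulator as both its left operand and its destination. That is why the example needs the temporary `store D`.

```python
import operator

# Integer division for "div" is an assumption.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "mult": operator.mul, "div": operator.floordiv}

def run_accumulator(program, memory):
    """Execute 1-address instructions; the accumulator is the implicit operand."""
    acc = 0
    for instr in program:
        op, name = instr.split()
        if op == "load":            # acc <- mem[X]
            acc = memory[name]
        elif op == "store":         # mem[X] <- acc
            memory[name] = acc
        else:                       # acc <- acc op mem[X]
            acc = OPS[op](acc, memory[name])
    return acc

memory = {"A": 2, "B": 3, "C": 4}
program = ["load B", "mul C", "add A", "store D",
           "load A", "mul B", "sub D"]
print(run_accumulator(program, memory))  # -8
print(memory["D"])                       # 14
```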

Register-Memory Architectures
• Instruction set:
add R1, A   sub R1, A   mul R1, B
load R1, A   store R1, A
• Example: A*B - (A+C*B)
load R1, A
mul R1, B     /* A*B */
load R2, C
mul R2, B     /* C*B */
add R2, A     /* A + C*B */
store R2, D   /* D = A + C*B */
sub R1, D     /* A*B - (A + C*B) */
R1 = R1 +,-,*,/ mem[B]
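Here is a minimal Python sketch of a register-memory machine. It uses the hypothetical helper `run_register_memory` and assumes integer values. Each two-address ALU instruction computes `Rn = Rn op mem[X]`, so the register is always the left operand. The program in the sketch works in four stages:

1. It keeps A*B in R1.
2. It builds A + C*B in R2.
3. It stores that value in D.
4. It ends with `sub R1, D`, which leaves A*B - (A+C*B) in R1.

```python
import operator

# Integer division for "div" is an assumption.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.floordiv}

def run_register_memory(program, memory):
    """Execute 2-address instructions of the form `op Rn, X`."""
    regs = {}
    for instr in program:
        op, operands = instr.split(maxsplit=1)
        reg, name = (s.strip() for s in operands.split(","))
        if op == "load":            # Rn <- mem[X]
            regs[reg] = memory[name]
        elif op == "store":         # mem[X] <- Rn
            memory[name] = regs[reg]
        else:                       # Rn <- Rn op mem[X]
            regs[reg] = OPS[op](regs[reg], memory[name])
    return regs

program = ["load R1, A", "mul R1, B",               # R1 = A*B
           "load R2, C", "mul R2, B", "add R2, A",  # R2 = A + C*B
           "store R2, D", "sub R1, D"]              # R1 = A*B - (A + C*B)
print(run_register_memory(program, {"A": 2, "B": 3, "C": 4}))
# {'R1': -8, 'R2': 14}
```

Ending instead with `sub R2, D` would compute (A + C*B) - A*B = 8 in R2. That is the negation of the intended result, because the register operand is always the minuend.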

Register (Load-Store) Architectures
• Instruction set:
add R1, R2, R3   sub R1, R2, R3   mul R1, R2, R3
load R1, &A   store R1, &A   move R1, R2
• Example: A*B - (A+C*B)
load R1, &A
load R2, &B
load R3, &C
mul R7, R3, R2     /* C*B */
add R8, R7, R1     /* A + C*B */
mul R9, R1, R2     /* A*B */
sub R10, R9, R8    /* A*B - (A+C*B) */
R3 = R1 +,-,*,/ R2
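For comparison, here is a sketch of a load-store (register-register) machine. It makes the same assumptions (integer values) and uses the hypothetical helper `run_load_store`. Only `load` and `store` access memory. The three-address ALU instructions read and write registers only, using the form `op Rd, Rs1, Rs2` with `Rd = Rs1 op Rs2`.

```python
import operator

# Integer division for "div" is an assumption.
OPS = {"add": operator.add, "sub": operator.sub,
       "mul": operator.mul, "div": operator.floordiv}

def run_load_store(program, memory):
    """Only load/store touch memory; ALU ops use registers only."""
    regs = {}
    for instr in program:
        op, operands = instr.split(maxsplit=1)
        args = [s.strip() for s in operands.split(",")]
        if op == "load":            # load Rd, &X : Rd <- mem[X]
            regs[args[0]] = memory[args[1].lstrip("&")]
        elif op == "store":         # store Rs, &X : mem[X] <- Rs
            memory[args[1].lstrip("&")] = regs[args[0]]
        else:                       # op Rd, Rs1, Rs2 : Rd <- Rs1 op Rs2
            regs[args[0]] = OPS[op](regs[args[1]], regs[args[2]])
    return regs

program = ["load R1, &A", "load R2, &B", "load R3, &C",
           "mul R7, R3, R2",    # C*B
           "add R8, R7, R1",    # A + C*B
           "mul R9, R1, R2",    # A*B
           "sub R10, R9, R8"]   # A*B - (A+C*B)
regs = run_load_store(program, {"A": 2, "B": 3, "C": 4})
print(regs["R10"])  # -8
```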

Detailed data path of a typical register-based CPU
•DATAPATH:
•The CPU can be divided into a data section (data
path: contains the registers and the ALU) and a
control section (control unit: issues control signals
to the data path).
•Internal data movement among registers and
between the ALU and registers may be carried out
using different organizations including one-bus,
two-bus, or three-bus organizations.

One-Bus Data path
• Using one bus, the CPU registers and the ALU use a single bus to
move outgoing and incoming data.
• Since a bus can handle only a single data movement within one clock
cycle, two-operand operations will need two cycles to fetch the
operands for the ALU.
• Additional registers may also be needed to buffer data for the ALU.
This bus organization is the simplest and least expensive, but it limits
the amount of data transfer that can be done in the same clock cycle,
which will slow down the overall performance.
• The figure shows a one-bus data path consisting of a set of general-purpose registers, a memory address register (MAR), a memory data register (MDR), an instruction register (IR), a program counter (PC), and an ALU.

Two-Bus Data path
• Using two buses is a faster solution than the one-bus organization. In this case, general-purpose registers are connected to both buses, so data can be transferred from two different registers to the two ALU inputs at the same time.
• Therefore, a two operand operation can fetch both operands in the same
clock cycle. An additional buffer register may be needed to hold the
output of the ALU when the two buses are busy carrying the two
operands. Figure a shows a two-bus organization.
• In some cases, one of the buses may be dedicated to moving data into registers (in-bus), while the other is dedicated to transferring data out of the registers (out-bus).
• In this case, the additional buffer register may be used, as one of the
ALU inputs, to hold one of the operands.
• The ALU output can be connected directly to the in-bus, which will
transfer the result into one of the registers. Figure b shows a two-bus
organization with in-bus and out-bus.

Figure a: An example of a two-bus data path.

Figure b: An example of a two-bus data path with in-bus and out-bus.

Three-Bus Data path
• In a three-bus organization, two buses may be used as source buses
while the third is used as destination.
• The source buses move data out of registers (out-bus), and the
destination bus may move data into a register (in-bus).
• Each of the two out-buses is connected to an ALU input point. The
output of the ALU is connected directly to the in-bus.
• As can be expected, the more buses we have, the more data we can
move within a single clock cycle.
• However, increasing the number of buses also increases the complexity of the hardware. The figure shows an example of a three-bus data path.
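A deliberately simplified model makes the trade-off between bus count and transfers per cycle concrete. The sketch below rests on assumptions and does not describe any specific data path:

- A register-to-register ALU operation such as R3 = R1 + R2 needs three bus transfers: two operands in and one result out.
- Each bus carries one transfer per clock cycle.
- Buffer registers hold values between cycles.

`cycles_for_alu_op` is a hypothetical name.

```python
import math

def cycles_for_alu_op(num_buses: int, transfers: int = 3) -> int:
    """Clock cycles for a register-to-register ALU operation.

    Simplified model (an assumption): `transfers` bus transfers are needed
    (two operands in, one result out), each bus carries one transfer per
    clock cycle, and buffer registers hold values between cycles.
    """
    if num_buses < 1:
        raise ValueError("need at least one bus")
    return math.ceil(transfers / num_buses)

for buses in (1, 2, 3):
    print(f"{buses}-bus: {cycles_for_alu_op(buses)} cycle(s)")
# 1-bus: 3 cycle(s)
# 2-bus: 2 cycle(s)
# 3-bus: 1 cycle(s)
```

Under this model:

- One bus needs two cycles just to bring in both operands, plus one for the result.
- Two buses deliver both operands in one cycle.
- Three buses complete the whole operation in a single cycle.

It ignores control overhead and memory accesses.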

Fetch-Decode-Execute Cycle
Both the data and the program that acts upon that data are loaded
into main memory (RAM) by the operating system. The CPU is
now ready to do some work.

Steps of the Fetch-Decode-Execute Cycle
• Get the next instruction
• Figure out what to do
• Gather the data needed to do it
• Do it
• Save the result, and
• Repeat (billions of times/second)!

Fetch Cycle
• The Program Counter (PC) contains the address of the next
instruction to be fetched
• The address contained in the PC is copied to the Memory
Address Register (MAR).
• The instruction is copied from the memory location contained in
the MAR and placed in the Memory Buffer Register (MBR).
• The entire instruction is copied from the MBR and placed in the
Current Instruction Register (CIR)
• The PC is incremented so that it points to the next
instruction to be fetched
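Each fetch step listed above is a register transfer, so the cycle can be written out one transfer per line. This is a minimal Python sketch. The register names follow the list (PC, MAR, MBR, CIR). The dictionary representation, the `fetch` function name and the instruction word stored at location 100 are hypothetical.

```python
def fetch(state: dict, memory: list) -> None:
    """One fetch cycle expressed as register transfers."""
    state["MAR"] = state["PC"]            # PC -> MAR
    state["MBR"] = memory[state["MAR"]]   # memory[MAR] -> MBR
    state["CIR"] = state["MBR"]           # MBR -> CIR (whole instruction)
    state["PC"] = state["PC"] + 1         # PC now points to the next instruction

memory = [0] * 256
memory[100] = 0x1234   # hypothetical instruction word at location 100
state = {"PC": 100, "MAR": 0, "MBR": 0, "CIR": 0}
fetch(state, memory)
print(state)  # {'PC': 101, 'MAR': 100, 'MBR': 4660, 'CIR': 4660}
```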

Execute Cycle
• The address part of the instruction is placed in the MAR
• The instruction is decoded and executed
• The processor checks for interrupts (signals from devices or
other sources seeking the attention of the processor) and
either branches to the relevant interrupt service routine or
starts the cycle again.

1. The PC contains the address of location 100.
2. The CU fetches the instruction at location 100.
3. A copy of the instruction is placed in the IR.
4. The PC is incremented by 1.
5. The right circuits are activated to execute the instruction.

1. The PC contains the address of location 101.
2. The CU fetches the instruction at location 101.
3. A copy of the instruction is placed in the IR.
4. The PC is incremented by 1.
5. The right circuits are activated to execute the instruction.
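The two walk-throughs above repeat the same fetch-decode-execute loop. The Python sketch below ties the fetch, execute and interrupt-check steps together. It is illustrative only, and the following are all hypothetical:

- the 4-bit opcode / 12-bit address instruction format
- the opcodes `LOAD`, `ADD`, `STORE` and `HALT`
- the helper `encode`
- the `check_interrupt` hook, which may return the address of an interrupt service routine

```python
MASK16 = 0xFFFF
LOAD, ADD, STORE, HALT = 0x1, 0x2, 0x3, 0xF   # hypothetical opcodes

def encode(opcode, address):
    """Hypothetical format: 4-bit opcode, 12-bit address."""
    return (opcode << 12) | (address & 0x0FFF)

def run(memory, pc, check_interrupt=lambda pc: None, max_cycles=1000):
    """Fetch-decode-execute loop; returns the addresses fetched, in order."""
    ac = 0
    fetched = []
    for _ in range(max_cycles):
        # Fetch: PC -> MAR, memory[MAR] -> MBR -> CIR, then increment PC
        mar = pc
        mbr = memory[mar]
        cir = mbr
        fetched.append(pc)
        pc += 1
        # Decode: split the instruction word into opcode and address part
        opcode, address = cir >> 12, cir & 0x0FFF
        if opcode == HALT:
            break
        # Execute: the address part goes to MAR, then the operation runs
        mar = address
        if opcode == LOAD:
            ac = memory[mar]
        elif opcode == ADD:
            ac = (ac + memory[mar]) & MASK16
        elif opcode == STORE:
            memory[mar] = ac
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at {fetched[-1]}")
        # Interrupt check: branch to a service routine or start the cycle again
        isr = check_interrupt(pc)
        if isr is not None:
            pc = isr
    return fetched

memory = [0] * 4096
memory[100:104] = [encode(LOAD, 200), encode(ADD, 201),
                   encode(STORE, 202), encode(HALT, 0)]
memory[200], memory[201] = 5, 7
print(run(memory, pc=100))   # [100, 101, 102, 103]
print(memory[202])           # 12
```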

Control Unit
• The CU is the engine that runs the entire computer by means of control signals.
• It performs the correct sequencing of the correct signals.
• It controls everything with a few control signals to points within the processor and a few control signals to the system bus.
• The CU controls all micro-operations by performing two basic tasks:
• Sequencing: It causes the processor to step through the series of micro-operations in the proper sequence, based on the program being executed.
• Execution: It causes each micro-operation to be performed.

Control Signal Sources
• Clock
• It helps to synchronize the operation. It causes one micro-operation to be performed for each clock pulse.
• Instruction Register
• Op-code for current instruction
• Determines which micro-instructions are performed
• Flags
• State of CPU
• Results of previous operations
• From Control Bus
• Interrupts / Bus Requests
• Acknowledgements

Control Signal Outputs
• Within Processor
• Cause data movement
• Activate specific functions
• Via Main Bus
• To memory
• To I/O modules

Types
• There are two design approaches for the CU:
• Hardwired approach
• Micro-programming approach

Hardwired Approach
• The control signals are generated by hardware.
• It can be designed as a clocked sequential circuit.
• It is implemented with logic gates, flip-flops, decoders, multiplexers, and other logic building blocks.
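In a hardwired control unit, each control signal is a fixed Boolean function of three inputs: the decoded opcode, the current timing step and the status flags. The Python function below mimics that mapping as a minimal sketch. The signal names (`PC_out`, `MAR_in`, and so on), the `LOAD` and `JZ` opcodes and the step numbering are hypothetical. Changing the behavior means editing the logic itself.

```python
def control_signals(opcode: str, step: int, zero_flag: bool) -> frozenset:
    """Signals asserted at timing step `step` (hardwired-style sketch).

    The result depends only on the opcode, the step counter and a flag,
    like a fixed network of gates fed by the decoder and step counter.
    """
    # Fetch steps 0-2 are identical for every instruction.
    fetch_steps = {
        0: {"PC_out", "MAR_in"},
        1: {"MEM_read", "MBR_in", "PC_inc"},
        2: {"MBR_out", "IR_in"},
    }
    if step in fetch_steps:
        return frozenset(fetch_steps[step])
    # Execute steps: decoder output AND step counter (AND flags).
    if opcode == "LOAD":
        if step == 3:
            return frozenset({"IR_addr_out", "MAR_in"})
        if step == 4:
            return frozenset({"MEM_read", "MBR_in"})
        if step == 5:
            return frozenset({"MBR_out", "AC_in"})
    if opcode == "JZ" and step == 3 and zero_flag:
        return frozenset({"IR_addr_out", "PC_in"})  # conditional jump taken
    return frozenset()  # nothing asserted in this step

for step in range(6):
    print(step, sorted(control_signals("LOAD", step, zero_flag=False)))
```

Stepping a counter through 0, 1, 2, ... and calling the function at each step plays the role of the timing generator driven by the clock.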

Micro programmed Approach
• All control signals that can be activated simultaneously are grouped together to form control words.
• These words are stored in the control memory.
• The control words are fetched from the control memory and
are routed to various functional units to enable appropriate
processing hardware.

Comparison
Attributes                               Hardwired Control     Microprogrammed Control
Speed                                    Fast                  Slow
Cost of implementation                   Higher                Lower
Flexibility                              Difficult to modify   Flexible
Ability to handle complex instructions   Difficult             Easier
Decoding                                 Complex               Easy
Application                              RISC                  CISC
Instruction set size                     Small                 Large
Control memory                           Absent                Present

Control Unit Function
• The sequencing logic unit issues a read command.
• The word specified in the control address register is read into the control buffer register.
• The control buffer register contents generate control signals and next-address information.
• The sequencing logic loads a new address into the control address register based on the next-address information from the control buffer register and the ALU flags.
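The four steps above can be sketched as a tiny sequencer. This Python sketch makes the following assumptions:

- Control memory is a dictionary of control words.
- The control address register (CAR) is an integer.
- The control buffer register (CBR) is the word just read.

The micro-routine contents and signal names are hypothetical. So is the single `branch_if_zero` field, which stands in for the ALU flag test.

```python
from typing import NamedTuple, Optional

class ControlWord(NamedTuple):
    signals: frozenset                     # control signals this word asserts
    next_addr: int                         # next-address information
    branch_if_zero: Optional[int] = None   # alternate next address if zero flag set

# Hypothetical control memory: a fetch micro-routine, then a flag test.
CONTROL_MEMORY = {
    0: ControlWord(frozenset({"PC_out", "MAR_in"}), 1),
    1: ControlWord(frozenset({"MEM_read", "MBR_in", "PC_inc"}), 2),
    2: ControlWord(frozenset({"MBR_out", "IR_in"}), 3),
    3: ControlWord(frozenset(), 0, branch_if_zero=10),   # conditional branch
    10: ControlWord(frozenset({"IR_addr_out", "PC_in"}), 0),
}

def micro_step(car: int, zero_flag: bool):
    """One control-unit step; returns (signals, new CAR)."""
    cbr = CONTROL_MEMORY[car]   # word at CAR is read into the CBR
    signals = cbr.signals       # CBR contents generate the control signals
    # Sequencing logic: next CAR from CBR next-address info and ALU flags
    if cbr.branch_if_zero is not None and zero_flag:
        return signals, cbr.branch_if_zero
    return signals, cbr.next_addr

def trace(zero_flag: bool, steps: int = 5):
    """Return the sequence of control-memory addresses visited."""
    car, visited = 0, []
    for _ in range(steps):
        visited.append(car)
        _, car = micro_step(car, zero_flag)
    return visited

print(trace(zero_flag=True))   # [0, 1, 2, 3, 10]
print(trace(zero_flag=False))  # [0, 1, 2, 3, 0]
```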

Calculations of CPI and MIPS parameters
We denote the number of CPU clock cycles for executing a job to be the
cycle count (CC), the cycle time by CT, and the clock frequency by
f=1/CT. The time taken by the CPU to execute a job can be expressed as
CPU time = CC x CT = CC / f
It may be easier to count the number of instructions executed in a given
program as compared to counting the number of CPU clock cycles
needed for executing that program. Therefore, the average number of
clock cycles per instruction (CPI) has been used as an alternate
performance measure. The following equation shows how to compute
the CPI.
CPI = CPU clock cycles for the program/Instruction count
CPU time = Instruction count x CPI x Clock cycle time
= (Instruction count x CPI) / Clock rate
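The CPU time equation translates directly into code. This is a minimal Python sketch. The function name `cpu_time` and the job figures in the usage line are hypothetical.

```python
def cpu_time(instruction_count: int, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = Instruction count x CPI x Clock cycle time (in seconds)."""
    cycle_count = instruction_count * cpi      # CC = IC x CPI
    clock_cycle_time = 1.0 / clock_rate_hz     # CT = 1 / f
    return cycle_count * clock_cycle_time      # CC x CT = CC / f

# Hypothetical job: 1,000,000 instructions, CPI of 2, 200 MHz clock.
t = cpu_time(1_000_000, 2.0, 200e6)
print(f"{t * 1e3:.1f} ms")  # 10.0 ms
```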

Calculations of CPI and MIPS parameters
(Contd.)
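Overall CPI can be computed as

$\text{CPI} = \dfrac{\sum_{i=1}^{n} I_i \times \text{CPI}_i}{\text{Instruction count}}$

where $I_i$ is the number of times an instruction of type $i$ is executed in the program, $\text{CPI}_i$ is the average number of clock cycles needed to execute such an instruction, and $n$ is the number of instruction types (so the instruction count is $\sum_{i=1}^{n} I_i$).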

Example 1
• Consider computing the overall CPI for a machine A for which the
following performance measures were recorded when executing a
set of benchmark programs. Assume that the clock rate of the CPU is
200 MHz.
• Assuming the execution of 100 instructions, the overall CPI can be
computed as
Instruction category   Percentage of occurrence   No. of cycles per instruction
ALU                    38                         1
Load & store           15                         3
Branch                 42                         4
Others                 5                          5

Answer
CPI = (38*1+15*3+42*4+5*5)/100
=2.76

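MIPS (Million Instructions Per Second)
MIPS is the rate of instruction execution per unit time:

$\text{MIPS} = \dfrac{\text{Instruction count}}{\text{Execution time} \times 10^6} = \dfrac{\text{Clock rate}}{\text{CPI} \times 10^6}$

What is the MIPS rating for the machine considered in the previous example?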

Answer
MIPS = (200 x 10^6) / (2.76 x 10^6)
= 72.46

Exercise 1
Suppose that the same set of benchmark programs considered above
were executed on another machine, call it machine B, for which the
following measures were recorded.
What is the MIPS rating of machine B, assuming a clock rate of 200 MHz?
Instruction category   Percentage of occurrence   No. of cycles per instruction
ALU                    35                         1
Load & store           30                         2
Branch                 15                         3
Others                 20                         5

Answer
CPI = (35*1 + 30*2 + 15*3 + 20*5 )/ 100
= 2.4
MIPS = (200 x 10^6) / (2.4 x 10^6)
= 83.33
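The CPI and MIPS arithmetic in Example 1 and Exercise 1 can be checked with a short script. In this minimal Python sketch, `overall_cpi` weights each category's cycles per instruction by its percentage of occurrence. This is the weighted-sum form of overall CPI, with the percentages standing in for instruction counts out of 100 instructions. `mips` computes Clock rate / (CPI x 10^6). Both function names are hypothetical.

```python
def overall_cpi(mix):
    """mix: list of (percentage_of_occurrence, cycles_per_instruction)."""
    total = sum(pct for pct, _ in mix)
    return sum(pct * cycles for pct, cycles in mix) / total

def mips(clock_rate_hz, cpi):
    """MIPS = Clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

machine_a = [(38, 1), (15, 3), (42, 4), (5, 5)]   # ALU, load/store, branch, others
machine_b = [(35, 1), (30, 2), (15, 3), (20, 5)]
for name, mix in (("A", machine_a), ("B", machine_b)):
    cpi = overall_cpi(mix)
    print(f"Machine {name}: CPI = {cpi:.2f}, MIPS = {mips(200e6, cpi):.2f}")
# Machine A: CPI = 2.76, MIPS = 72.46
# Machine B: CPI = 2.40, MIPS = 83.33
```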

Exercise 2
• Write the code sequence for each of the four types of CPU architecture for the following:

Reference
• Mostafa Abd-El-Barr (King Fahd University of Petroleum & Minerals, KFUPM) and Hesham El-Rewini (Southern Methodist University), Fundamentals of Computer Organization and Architecture.
