2. Processor Implementation Style
• Single Cycle
– Perform each instruction in one cycle
– Clock cycle must be long enough for the slowest instruction
– Disadvantage: every instruction runs only as fast as the slowest one.
• Multi Cycle
– Breaks the fetch/execute cycle into multiple steps
– Performs one step in each clock cycle
– Advantage: each instruction uses only as many cycles as it needs
• Pipelined
– Executes each instruction in multiple steps
– Performs one step per instruction in each clock cycle
– Processes multiple instructions in parallel.
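To make the trade-off between the three styles concrete, the sketch below compares total execution time for a short instruction sequence. The step latencies (200 ps each), the step lists per instruction, and the four-instruction program are illustrative assumptions, not figures from these slides, and the pipelined case ignores stalls.

```python
# Hypothetical latency of each step in picoseconds (assumed values).
STEP_PS = {"fetch": 200, "decode": 200, "execute": 200, "memory": 200, "writeback": 200}

# Steps each instruction class needs (a simplified assumption).
STEPS = {
    "lw":  ["fetch", "decode", "execute", "memory", "writeback"],
    "sw":  ["fetch", "decode", "execute", "memory"],
    "add": ["fetch", "decode", "execute", "writeback"],
    "beq": ["fetch", "decode", "execute"],
}

def single_cycle_time(program):
    # One long cycle per instruction, sized for the slowest instruction class.
    cycle = max(sum(STEP_PS[s] for s in steps) for steps in STEPS.values())
    return len(program) * cycle

def multi_cycle_time(program):
    # One short cycle per step, sized for the slowest step.
    cycle = max(STEP_PS.values())
    return sum(len(STEPS[op]) for op in program) * cycle

def pipelined_time(program):
    # Ideal pipeline: fill it once, then complete one instruction per cycle.
    cycle = max(STEP_PS.values())
    depth = len(STEP_PS)
    return (depth + len(program) - 1) * cycle

program = ["lw", "add", "sw", "beq"]
print(single_cycle_time(program), multi_cycle_time(program), pipelined_time(program))
```

With unequal step latencies the multicycle clock is still set by the slowest step, so its advantage over single cycle depends on how evenly the work is divided.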
3. ALU
• The Arithmetic Logic Unit (ALU) is the
heart of any CPU. An ALU is used for three
kinds of operations:
• Arithmetic operations such as
addition and subtraction,
• Logical operations such as AND, OR, etc.,
and
• Address calculation for data movement
operations such as Load and Store
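A minimal software model of such an ALU might look like the sketch below. The function name `alu`, the string operation codes, and the returned zero flag are assumptions made for illustration; `slt` is included because it belongs to the MIPS subset used later in this deck.

```python
MASK32 = 0xFFFFFFFF  # results are truncated to 32 bits

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(op, a, b):
    """Return (result, zero_flag) for one ALU operation on 32-bit inputs."""
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "slt":
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported ALU operation: " + op)
    result &= MASK32
    return result, result == 0

print(alu("add", 26, 30))
```

The same `add` path serves both arithmetic instructions and load/store address calculation, and the zero flag from `sub` is what a branch-on-equal comparison would test.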
4. • A CPU executes an instruction by
moving the data associated with the
instruction. This movement of data is
carried out by the Datapath.
5. 5.1 Introduction
A subset of the core MIPS instruction set:
• The memory-reference instructions – lw, sw
• The arithmetic-logical instructions – add,
sub, and, or, slt
• The branch instructions – branch equal
(beq), jump (j)
7. An overview of the implementation
For every instruction, the first two steps are
identical:
1. Send the PC to the memory that contains the
code and fetch the instruction from that memory.
2. Read one or two registers, using fields of the
instruction to select the registers to read.
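The second step, selecting registers from fields of the instruction, can be sketched as plain bit slicing. The function name `decode` and the dictionary layout are assumptions for illustration; the bit positions follow the standard 32-bit MIPS instruction formats.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 31-26
        "rs":    (word >> 21) & 0x1F,  # bits 25-21
        "rt":    (word >> 16) & 0x1F,  # bits 20-16
        "rd":    (word >> 11) & 0x1F,  # bits 15-11 (R-type only)
        "shamt": (word >> 6) & 0x1F,   # bits 10-6  (R-type only)
        "funct": word & 0x3F,          # bits 5-0   (R-type only)
        "imm":   word & 0xFFFF,        # bits 15-0  (lw, sw, beq)
    }

# add $t1, $t2, $t3 encodes as 0x014B4820 ($t1 = 9, $t2 = 10, $t3 = 11).
fields = decode(0x014B4820)
print(fields["rs"], fields["rt"], fields["rd"])
```

All fields are extracted unconditionally, which mirrors the hardware: the register file is read before the control unit has decided which fields matter.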
8. An overview of the implementation
(Cont.)
Even across different instruction classes
there are some similarities: Example
• All instruction classes, except jump, use the ALU after
reading the registers.
Memory-reference – address calculation
Arithmetic-logic – operation execution
Branch – comparison.
9. CPU Data Path
What is a datapath? Why a datapath?
• "A class has 26 boys and 30 girl students. Find the
total number of students?"
– Identify how many boys, how many girls
– Understand that the total is found by addition
– In your notebook, on a new page, create a workspace
– In the workspace, write 26 on one line and 30 on another
– Now do the addition, using rough space for the working if required
– Write the result as result = 56
• Oh! you are a geek. Your computer also needs to do all
these or more steps to solve this problem.
10. • In fact, in the process of solving the
problem, the CPU has to
– get this instruction from memory
– know the address of the memory location where the data about
boys and girls is kept
– decode the instruction as ADD
– get the boys and girls data from memory into the CPU workspace
(registers)
– route this data to the ALU so that the addition can be carried out
– write the result from the ALU to the result space.
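The steps above can be sketched as a toy fetch-decode-execute loop. The memory addresses, register numbers, and tuple-based instruction encoding are all hypothetical choices made for readability, not a real MIPS encoding.

```python
# Hypothetical data layout: boys at 0x100, girls at 0x104, result at 0x108.
memory = {0x100: 26, 0x104: 30, 0x108: 0}
regs = [0] * 32  # CPU workspace (registers)

# Toy program: load both operands, add them, store the result.
program = [
    ("lw", 8, 0x100),    # regs[8]  <- memory[0x100]
    ("lw", 9, 0x104),    # regs[9]  <- memory[0x104]
    ("add", 10, 8, 9),   # regs[10] <- regs[8] + regs[9]
    ("sw", 10, 0x108),   # memory[0x108] <- regs[10]
]

pc = 0
while pc < len(program):
    instr = program[pc]          # get the instruction
    op = instr[0]                # decode it
    if op == "lw":
        regs[instr[1]] = memory[instr[2]]
    elif op == "add":
        regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
    elif op == "sw":
        memory[instr[2]] = regs[instr[1]]
    pc += 1                      # move on to the next instruction

print(memory[0x108])
```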
11. • So, where we use our brain and notepad space, the CPU uses
registers, the ALU, memory, etc.: the functional components that
together meet the requirements of executing every instruction. This
collection of registers, ALUs, multiplexers, status registers and
their interconnections is called the DATAPATH.
• A DATAPATH is the collection of state elements,
computation elements, and interconnections that
together provide a conduit for the flow and
transformation of data in the processor during
execution
12. An overview of the implementation
(Cont.)
Fig. – 5.1 An abstract view of the implementation.
13. Essentially a DATAPATH consists of the
following elements.
• ALU – one or more, to carry out the computation. The ALU is used not only for data
operations but also for address calculation.
• Instruction Register and decoder – to decode which instruction is to be
executed and how to execute it.
• Program Counter – always points to the next instruction to be executed and
manages the flow of instructions.
• Memory –
– Instruction memory is mostly read-only in the fetch phase.
– Data memory is needed to read operands and to write results.
– The memory is accessed over a bus from the CPU.
– To access memory, the address of the memory location is required in addition to Read/Write
of data.
– The Memory Address Register (MAR) holds the address of the memory location to be
accessed.
– The Memory Data Register (MDR) holds the data: on a memory read it holds the data
read from memory (Data-in); on a memory write it holds the data to be written into the
memory location. Thus the MDR is a bidirectional register.
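The roles of the MAR and MDR can be illustrated with a small model. The class name `MemoryInterface` and its word-indexed storage are assumptions for this sketch; a real bus interface also involves timing and control lines that are not modelled here.

```python
class MemoryInterface:
    """Toy memory accessed only through a MAR and an MDR."""

    def __init__(self, size_words):
        self.cells = [0] * size_words
        self.mar = 0  # Memory Address Register: which location to access
        self.mdr = 0  # Memory Data Register: data going in or coming out

    def read(self, address):
        self.mar = address                # the address goes into the MAR
        self.mdr = self.cells[self.mar]   # memory places the data in the MDR
        return self.mdr

    def write(self, address, data):
        self.mar = address                # the address goes into the MAR
        self.mdr = data                   # data to be written goes into the MDR
        self.cells[self.mar] = self.mdr   # memory stores the MDR contents

mem = MemoryInterface(16)
mem.write(3, 56)
print(mem.read(3))
```

The MDR is written by the CPU on a store and by memory on a load, which is the bidirectional behaviour described above.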
14. • Registers – Registers are physically close to the ALU and
internal to the CPU. They are much faster than memory.
Most of the time the operands are brought from memory
and kept in registers. In effect, the registers are used as a
workspace and rough space for working.
• Register files – These are multiport register sets enabling
faster, parallel access to the registers.
• Internal Registers – Instruction Register, Memory
Address Register, Memory Data Register. These are not
accessible to the programmer.
• Multiplexers – These select which of several inputs is
passed to the output, based on the selection
input.
• Internal bus – which connects all these elements.
• Control unit – the master which manages the datapath
elements.
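Of the elements listed, the multiplexer is the simplest to model. In the sketch below the function name `mux` and the example signal name `alu_src` are assumptions; the example mirrors the common case of choosing the second ALU operand from either a register value or an immediate.

```python
def mux(select, *inputs):
    """Return the input chosen by the select value (0 picks the first input)."""
    if not 0 <= select < len(inputs):
        raise ValueError("select is out of range for this multiplexer")
    return inputs[select]

# Hypothetical use: pick the second ALU operand.
register_value = 7
immediate_value = 99
alu_src = 1  # 0 = register value, 1 = immediate (assumed convention)
second_operand = mux(alu_src, register_value, immediate_value)
print(second_operand)
```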
15. The MIPS Subset Implementation
Uses a single long clock cycle for every instruction.
Begins execution on one clock edge and completes
execution on the next clock edge.
16. Building a Datapath
• Fetching Instructions
• Instruction memory (IM) and PC are state elements.
• IM holds the instructions and supplies one when given an address.
• PC holds the address of the current instruction.
• An adder increments the PC to the address of the next
instruction.
17. Building a Datapath (Cont.)
• Fetching Instructions
• Use the PC to supply the instruction address
• Fetch the instruction from memory and increment the PC
• Use fields of the instruction to select the registers to read
• Execute depending on the instruction
• Repeat…
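The fetch portion of this loop can be sketched as follows. The function name `fetch` and the dictionary used as instruction memory are assumptions; the increment of 4 reflects that MIPS instructions are 4 bytes long and byte addressed.

```python
def fetch(instruction_memory, pc):
    """Read the instruction at address pc and return it with the next PC."""
    word = instruction_memory[pc]  # instruction memory supplies the word
    next_pc = pc + 4               # the adder computes the next address
    return word, next_pc

# Hypothetical instruction memory keyed by byte address.
instruction_memory = {
    0: 0x014B4820,  # add $t1, $t2, $t3
    4: 0x8D280000,  # lw  $t0, 0($t1)
}

pc = 0
fetched = []
while pc in instruction_memory:
    word, pc = fetch(instruction_memory, pc)
    fetched.append(word)

print([hex(w) for w in fetched], pc)
```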
19. Building a Datapath (Cont.)
• Arithmetic-logical Instructions
• add $t1, $t2, $t3
20. Building a Datapath (Cont.)
• Arithmetic-logical operations:
• The register file contains all the registers and has two read
ports and one write port.
• The register file always outputs the contents of the registers
corresponding to the read register inputs.
• A write is controlled by the write control signal.
• We need four inputs in total (three for register numbers and one
for data).
• The ALU takes two 32-bit inputs and produces a 32-bit result; the
ALU operation signal that selects the operation is only a few bits wide.
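A register file with two read ports and one write port can be modelled as in the sketch below. The class and parameter names are assumptions chosen to echo the inputs described above; treating register 0 as hard-wired to zero follows the MIPS convention.

```python
class RegisterFile:
    """Toy 32-entry register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Both read ports are always active; no control signal is needed.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        # The write happens only when the write control signal is asserted.
        if reg_write and write_reg != 0:  # $zero cannot be overwritten
            self.regs[write_reg] = write_data & 0xFFFFFFFF

# add $t1, $t2, $t3 with $t1 = 9, $t2 = 10, $t3 = 11
rf = RegisterFile()
rf.write(10, 26, True)
rf.write(11, 30, True)
a, b = rf.read(10, 11)
rf.write(9, a + b, True)
print(rf.read(9, 0))
```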
25. Building a Datapath (Cont.)
• Load/store word:
• The memory unit is a state element with inputs for the
address and write data, and a single output for the
read result
• Separate read and write control signals
• The sign-extension unit has a 16-bit input that is sign-
extended into a 32-bit result appearing on the output
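Sign extension and the resulting load/store address calculation can be sketched as below. The function names are assumptions; the logic copies bit 15 of the offset into the upper 16 bits and then adds the base register value, wrapping to 32 bits.

```python
def sign_extend16(value):
    """Extend a 16-bit two's-complement value to a 32-bit pattern."""
    value &= 0xFFFF
    if value & 0x8000:               # sign bit set: fill the upper half with 1s
        return value | 0xFFFF0000
    return value

def effective_address(base, offset16):
    """Address used by lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

print(hex(sign_extend16(0xFFFC)), hex(effective_address(0x1000, 0xFFFC)))
```

An offset of 0xFFFC represents -4, so the example accesses the word just below the base address.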
29. Building a Datapath (Cont.)
• Branch Instructions
• beq $t1, $t2, offset
• Details of branch instructions:
– It has three operands: two registers that are compared for
equality and a 16-bit offset used to compute the branch
target address.
– Branch target address = (PC + 4) + (sign-extended offset field shifted left 2 bits)
– To compute the branch target address, the branch datapath
includes 1. a sign-extension unit and 2. an adder
– The offset field is shifted left 2 bits because it counts words, not bytes.
6 bits   5 bits   5 bits   16 bits
op       rs       rt       address / immediate
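The branch target calculation can be sketched as follows, assuming the standard MIPS rule that the offset is relative to the address of the instruction after the branch (PC + 4). The function names are hypothetical.

```python
def sign_extend16(value):
    """Extend a 16-bit two's-complement value to a 32-bit pattern."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value

def branch_target(pc, offset16):
    """Target of a taken branch: (PC + 4) + (sign-extended offset << 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

def next_pc_beq(pc, reg_a, reg_b, offset16):
    """beq: branch if the two register values are equal, else fall through."""
    return branch_target(pc, offset16) if reg_a == reg_b else pc + 4

print(hex(next_pc_beq(0x40, 5, 5, 3)), hex(next_pc_beq(0x40, 5, 6, 3)))
```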
30. Building a Datapath (Cont.)
• Jump
• j 2500 # go to 10000
• Replace the lower 28 bits of the PC with the
lower 26 bits of the instruction shifted left by 2
bits.
6 bits   26 bits
op       address
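The jump address formation can be sketched as below. The function name is an assumption, and the upper 4 bits are taken from PC + 4, following the usual MIPS definition.

```python
def jump_target(pc, address26):
    """Keep the upper 4 bits of PC + 4 and replace the lower 28 bits
    with the 26-bit address field shifted left by 2."""
    upper = (pc + 4) & 0xF0000000
    lower = (address26 & 0x03FFFFFF) << 2
    return upper | lower

print(jump_target(0, 2500))  # j 2500 goes to byte address 10000
```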
33. A Simple Implementation Scheme
• A datapath is a collection of functional units, such
as arithmetic logic units or multipliers, that perform data
processing operations, together with registers and buses. Along with
the control unit it composes the central processing unit
(CPU).
34. A Simple Implementation Scheme
• Creating a Single Datapath
– Single cycle implementation
– No datapath resource can be used more than once
per instruction.
– So any element needed more than once must be
duplicated (instruction memory and data memory are
separate)
– To share a datapath element between instruction classes we
need a multiplexer (data selector).
48. A Simple Implementation Scheme (Cont.)
• Designing the Main Control Unit
• R-type
31-26   25-21   20-16   15-11   10-6    5-0
op      rs      rt      rd      shamt   funct
• lw, sw
31-26      25-21   20-16   15-0
35 or 43   rs      rt      address
• Branch
31-26   25-21   20-16   15-0
4       rs      rt      address
49. A Simple Implementation Scheme (Cont.)
• Designing the Main Control Unit
• Observations:
– The op field is always contained in bits 31-26.
– The two registers to be read are always specified at
positions 25-21 and 20-16 (R-type, beq, sw).
– The base register for lw and sw is always in 25-21.
– The 16-bit offset (beq, lw, sw) is always in 15-0.
– The destination register is in 20-16 (lw) or 15-11 (R-type).
50. • Designing the Main Control Unit
• The two possible positions of the destination register imply a
selector (i.e., a mux) to pick the appropriate field for each type of
instruction.
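A sketch of how the main control unit could drive that selector is shown below. The signal names (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch) follow common textbook naming and are an assumption here, as is the use of 0 for signals that are really don't-cares; opcode 0 for R-type is the standard MIPS value.

```python
# Control signals per opcode. For sw and beq, RegDst and MemtoReg are
# don't-cares in hardware; 0 is used here only as a placeholder.
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0, MemWrite=0, Branch=0),  # R-type
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1, MemWrite=0, Branch=0),  # lw
    43: dict(RegDst=0, ALUSrc=1, MemtoReg=0, RegWrite=0, MemRead=0, MemWrite=1, Branch=0),  # sw
    4:  dict(RegDst=0, ALUSrc=0, MemtoReg=0, RegWrite=0, MemRead=0, MemWrite=0, Branch=1),  # beq
}

def destination_register(word):
    """Pick the write register: bits 15-11 (rd) if RegDst is 1, else bits 20-16 (rt)."""
    op = (word >> 26) & 0x3F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    return rd if CONTROL[op]["RegDst"] else rt

print(destination_register(0x014B4820))  # add $t1, $t2, $t3
print(destination_register(0x8D280000))  # lw  $t0, 0($t1)
```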
58. A Multicycle Implementation
• Each step in the execution will take 1 clock cycle.
• Multicycle implementation allows a functional unit to
be used more than once per instruction, as long as it
is used on different clock cycles.
• This sharing can help reduce the amount of hardware
required.
60. A Multicycle Implementation
• Advantages:
• The ability to allow instructions to take different
numbers of clock cycles.
• The ability to share functional units within the
execution of a single instruction.
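The first advantage can be quantified with an average cycles-per-instruction (CPI) estimate. The per-class cycle counts below are the ones commonly quoted for the textbook multicycle MIPS design and are an assumption here, since these slides do not list them; the instruction mix is purely hypothetical.

```python
# Assumed clock cycles per instruction class in a multicycle design.
CYCLES = {"lw": 5, "sw": 4, "r_type": 4, "beq": 3, "j": 3}

def average_cpi(mix):
    """Weighted average CPI for an instruction mix given as counts or percentages."""
    total = sum(mix.values())
    return sum(CYCLES[name] * count for name, count in mix.items()) / total

# Hypothetical instruction mix in percent.
mix = {"lw": 25, "sw": 10, "r_type": 50, "beq": 12, "j": 3}
print(average_cpi(mix))
```

A single-cycle design would charge every instruction the time of the longest one, whereas here only loads pay for all five steps.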
63. A Multicycle Implementation
• New Registers:
The following temporary registers are important to the multicycle datapath
implementation discussed in this section:
• Instruction Register (IR) saves the data output of memory for a subsequent
instruction read;
• Memory Data Register (MDR) saves memory output for a data read operation;
• A and B Registers (A, B) store ALU operand values read from the register file; and
• ALU Output Register (ALUout) contains the result produced by the ALU.
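How these temporary registers carry values from one clock cycle to the next can be sketched for a load word instruction. The five-step breakdown, the function name, and the dictionary-based memory are assumptions for illustration; a single memory holds both the instruction and the data.

```python
def run_lw_multicycle(memory, regs, pc):
    """Execute one lw instruction step by step, one step per clock cycle."""
    state = {}
    # Cycle 1: instruction fetch; IR holds the word for the later cycles.
    state["IR"] = memory[pc]
    pc += 4
    # Cycle 2: decode and register read; A and B hold the operands.
    ir = state["IR"]
    rs = (ir >> 21) & 0x1F
    rt = (ir >> 16) & 0x1F
    state["A"] = regs[rs]
    state["B"] = regs[rt]
    # Cycle 3: address calculation; ALUout holds base + sign-extended offset.
    imm = ir & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    state["ALUout"] = (state["A"] + imm) & 0xFFFFFFFF
    # Cycle 4: memory read; MDR holds the loaded data.
    state["MDR"] = memory[state["ALUout"]]
    # Cycle 5: write back from MDR to the register file.
    regs[rt] = state["MDR"]
    return pc, state

# lw $t0, 4($t1) encodes as 0x8D280004 ($t0 = 8, $t1 = 9).
memory = {0: 0x8D280004, 0x104: 56}
regs = [0] * 32
regs[9] = 0x100
pc, state = run_lw_multicycle(memory, regs, 0)
print(regs[8], hex(state["ALUout"]))
```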
64. Differences between a single-cycle and
multi-cycle datapath
• In the multicycle datapath, one memory unit stores both instructions
and data, whereas the single-cycle datapath requires separate
instruction and data memories.
• The multicycle datapath uses one ALU, versus an ALU and two adders
in the single-cycle datapath.
• In the single-cycle implementation, the instruction executes in one cycle
(by design) and the outputs of all functional units must stabilize within
one cycle. In contrast, the multicycle implementation uses one or more
registers to temporarily store (buffer) the ALU or functional unit outputs.
This buffering action stores a value in a temporary register until it is
needed or used in a subsequent clock cycle.