IT3030E, Fall 2024 1
Computer Architecture
Ngo Lam Trung, Pham Ngoc Hung, Hoang Van Hiep
Department of Computer Engineering
School of Information and Communication Technology (SoICT)
Hanoi University of Science and Technology
E-mail: [trungnl, hungpn, hiephv]@soict.hust.edu.vn
Chapter 3: Instruction Set Architecture
(Language of the Computer)
[with materials from COD, RISC-V 2nd Edition, Patterson & Hennessy 2021,
M.J. Irwin’s presentation, PSU 2008,
The RISC-V Instruction Set Manual, Volume I, ver. 2.2]
Content
❑ Introduction
❑ RISC-V Instruction Set Architecture
  • Operands
  • Instruction set (basic RV32I variant)
  • RISC-V instruction formats
  • Other RISC-V instructions
❑ Basic programming structures
  • Branch and loop
  • Procedure call
  • Array and string
What is RISC-V and its advantages (over ARM, x86)?
❑ Developed at UC Berkeley as an open ISA (2010).
❑ Typical of many modern ISAs, which have a large share
of the embedded market.
  • RISC CPUs: Pioneers of Modern Computer Architecture
    Receive ACM A.M. Turing Award
❑ Now managed by the RISC-V Foundation/RISC-V
International (https://riscv.org/, since 2015).
❑ “RISC-V combines a modular technical approach with an open,
royalty-free ISA — meaning that anyone, anywhere can benefit
from the IP contributed and produced by RISC-V.” - RISC-V
International.
❑ “RISC-V does not take a political position on behalf of any
geography.” - RISC-V International.
Computer language: hardware operation
❑ Want to command the computer?
➔ You need to speak its language!
❑ Example: RISC-V assembly instruction
    add a, b, c    # a ← b + c
❑ Operation performed:
    add b and c,
    then store the result into a
❑ Parts of the instruction:
    add          a, b, c      # a ← b + c
    (operation)  (operands)   (comment)
Hardware operation
❑ What does the following code do?
    add t0, g, h     # t0 = g + h
    add t1, i, j     # t1 = i + j
    sub f, t0, t1    # f = t0 - t1
❑ Equivalent C code
    f = (g + h) - (i + j);
➔ Why not provide 4- or 5-input instructions?
➔ DP1 (design principle 1): Simplicity favors regularity!
Instruction format significantly influences
hardware design.
Operands
❑ Object of operation
  • Source operand: provides input data
  • Destination operand: stores the result of the operation
❑ RISC-V operands
  • Registers
  • Memory
  • Constant/Immediate
Data types in RISC-V
RV32 registers hold 32-bit (4-byte) words. Other common
data sizes include byte, halfword, and doubleword.
  • Byte = 8 bits
  • Halfword = 2 bytes
  • Word = 4 bytes
  • Doubleword = 8 bytes
Register operand: RISC-V Register File
❑ Special memory inside the CPU, called the register file
❑ 32 slots, each slot is called a register (RV32I)
❑ Each register holds 32 bits of data
❑ Each register has a unique 5-bit address
[Figure: register file with 32 locations of 32 bits each. Read ports: two 5-bit addresses (src1 addr, src2 addr) and two 32-bit data outputs (src1 data, src2 data). Write port: 5-bit dst addr, 32-bit write data, and a write control signal]
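The register file described above can be sketched as a small C model. This is only an illustration, not how the hardware is built: the names `RegFile`, `regfile_read` and `regfile_write` are hypothetical, and the only behaviour assumed beyond the slide is that x0 always reads as zero and ignores writes.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS 32

/* Hypothetical software model of the RV32I integer register file. */
typedef struct {
    uint32_t x[NUM_REGS];   /* 32 locations, 32 bits each */
} RegFile;

/* Read port: a 5-bit address selects one of the 32 registers. */
uint32_t regfile_read(const RegFile *rf, unsigned addr) {
    assert(addr < NUM_REGS);              /* address must fit in 5 bits */
    return addr == 0 ? 0u : rf->x[addr];  /* x0 always reads as zero */
}

/* Write port: 5-bit destination address plus 32-bit data. */
void regfile_write(RegFile *rf, unsigned addr, uint32_t data) {
    assert(addr < NUM_REGS);
    if (addr != 0) {                      /* writes to x0 are discarded */
        rf->x[addr] = data;
    }
}
```

An instruction such as add reads two registers and writes one, which is why the figure shows two read ports and one write port.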
RISC-V Register Convention
❑ RISC-V: load/store machine
❑ Data processing done on registers inside CPU
RISC-V integer registers
Register operand: RISC-V Register File
❑ Register file: the “workplace” right inside the CPU.
❑ A larger register file should be better: more flexibility for
CPU operation.
❑ Moore’s law: the number of transistors doubles roughly every 18 months.
❑ So why only 32 registers, not more?
➔ DP2: Smaller is faster!
Effective use of the register file is critical!
Memory operand
❑ Memory operands are stored in main memory
  • Large size
  • Outside the CPU → slower than the register file (100 to 500 times)
❑ High-level language programs use memory operands
  • Variables
  • Arrays and strings
  • Composite data structures
❑ Operations with memory operands
  • Units of byte/halfword/word/doubleword
  • Load data from memory to register
  • Store data from register to memory
RISC-V memory organization
❑ Byte addressable
❑ Words are accessed via byte address
❑ Only accessible via load/store instructions
❑ Data alignment
  • word address = 4 * word number
  • RISC-V does not require data alignment, but it is
    strongly recommended.
  ➔ handled by compiler
    Byte address (32 bit)    Word
    0x0c - 0x0f              Word 3
    0x08 - 0x0b              Word 2
    0x04 - 0x07              Word 1
    0x00 - 0x03              Word 0
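The address rule "word address = 4 * word number" and the notion of alignment can be checked with a short C sketch. The helper names `word_address` and `is_aligned` are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Byte address of the n-th 32-bit word: word address = 4 * word number. */
uint32_t word_address(uint32_t word_number) {
    return 4u * word_number;
}

/* An address is aligned for a K-byte item when it is a multiple of K. */
bool is_aligned(uint32_t addr, uint32_t k) {
    assert(k != 0);
    return addr % k == 0;
}
```

For example, Word 3 starts at byte address 0x0c, and address 0x0e is aligned for a halfword but not for a word.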
RISC-V memory organization
[Figure: memory map with 32-bit byte addresses 0x00 to 0x0f, grouped into Word 0 (0x00-0x03) to Word 3 (0x0c-0x0f)]
❑ Is declaring a struct in C like this optimal?
    struct data
    {
        char x;
        short y;
        int z;
    };
❑ Aligned data
  • A primitive data type requires K bytes
  • Its address must be a multiple of K
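One way to explore the struct question is to ask the compiler where it places each field. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the actual offsets depend on the target ABI. On a typical ABI with 2-byte short and 4-byte int one would expect x at offset 0, y at 2 and z at 4, with one padding byte after x.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct data {
    char  x;
    short y;
    int   z;
};

/* True when `offset` is a multiple of the item size K. */
bool offset_is_aligned(size_t offset, size_t k) {
    assert(k != 0);
    return offset % k == 0;
}

/* Print the layout the compiler chose for struct data. */
void print_layout(void) {
    printf("x at %zu, y at %zu, z at %zu, total %zu bytes\n",
           offsetof(struct data, x), offsetof(struct data, y),
           offsetof(struct data, z), sizeof(struct data));
}
```

Any padding bytes the compiler inserts show up as the difference between `sizeof(struct data)` and the sum of the field sizes.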
Example: z = x + y
❑ x, y, z are allocated in memory, but x and y must be transferred to registers before adding
❑ Note: the focus for now is on the instruction set; assembly programming
will be presented later
Byte Order
❑ Big Endian: word address points to the MSB (most significant byte)
    IBM 360/370, Motorola 68k, SPARC, HP PA
❑ Little Endian: word address points to the LSB (least significant byte)
    Intel 80x86, DEC, MIPS, RISC-V
❑ Byte offsets within a word, from MSB to LSB:
                           MSB            LSB
    little endian order     3    2    1    0
    big endian order        0    1    2    3
Example
❑ Consider a word in RISC-V memory consisting of 4 bytes
with the hexadecimal values below
    address   value
    X+3       FA
    X+2       5D
    X+1       1B
    X         68
❑ What is the word’s value?
❑ RISC-V is little-endian: the address of the LSB is X
➔ word’s value: 0xFA5D1B68
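The byte-order rule can be sketched in C by assembling a word from four bytes. The sketch assumes the byte at address X is 0x68 and the byte at X+3 is 0xFA, which is the arrangement consistent with the stated answer FA5D1B68; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian: bytes[0] (address X) is the LSB, bytes[3] (X+3) the MSB. */
uint32_t load_word_le(const uint8_t bytes[4]) {
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Big-endian reading of the same memory: bytes[0] is the MSB. */
uint32_t load_word_be(const uint8_t bytes[4]) {
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24)
         | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)
         | (uint32_t)bytes[3];
}
```

With the bytes 68, 1B, 5D, FA at X..X+3, the little-endian reading gives 0xFA5D1B68, while a big-endian machine would read the same memory as 0x681B5DFA.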
Immediate operand
❑ An immediate value is specified by a constant number
❑ Examples:
  • Assignment: int x = 2024;
  • Constant in an expression: x = y + 10;
  • Branching: if.. else.., goto,…
❑ Does not need to be stored in the register file or memory
  • Value stored right in the instruction → faster
  • Fixed value specified when the program is written
  • Cannot change at run time
❑ What is the most-used constant?
  • The value 0 is available in a special register: zero (x0)
  • Make common cases fast!
Instruction set
❑ Instruction: a binary string representing opcode + operands
❑ RISC-V (RV32 variant) base instructions are 32 bits long.
  • Must be word-aligned in memory.
❑ 6 instruction formats
➔ Why not only one format? Or 20 formats?
➔ DP3: Good design demands good compromises!
Instruction categories
❑ Arithmetic: addition, subtraction,…
❑ Data transfer: transfer data between registers, memory,
and immediate
❑ Logical and bitwise: and, or, xor, shift left/right…
❑ Branch: conditional and unconditional
Overview of RISC-V instruction set
Fig. 2.1
RISC-V Instruction set: Arithmetic operations
❑ RISC-V arithmetic statement
    add  rd, rs1, rs2    # rd ← rs1 + rs2
    sub  rd, rs1, rs2    # rd ← rs1 - rs2
    addi rd, rs1, imm    # rd ← rs1 + imm
  • rs1: 5-bit register file address of the 1st source operand
  • rs2: 5-bit register file address of the 2nd source operand
  • rd: 5-bit register file address of the result’s destination
❑ Why is there no subi instruction?
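A rough C model of these three instructions hints at an answer to the subi question: addi sign-extends its 12-bit immediate, so adding a negative constant already covers subtraction. The function names are hypothetical, and register values are modelled as 32-bit unsigned integers that wrap around.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of value (1 <= bits <= 31). */
int32_t sign_extend(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    uint32_t sign = 1u << (bits - 1);
    uint32_t low  = value & ((sign << 1) - 1u);
    return (int32_t)(low ^ sign) - (int32_t)sign;
}

uint32_t rv_add(uint32_t rs1, uint32_t rs2) { return rs1 + rs2; }
uint32_t rv_sub(uint32_t rs1, uint32_t rs2) { return rs1 - rs2; }

/* addi: the 12-bit immediate field is sign-extended before the addition. */
uint32_t rv_addi(uint32_t rs1, uint32_t imm12) {
    return rs1 + (uint32_t)sign_extend(imm12, 12);
}
```

Here `rv_addi(6, 0xFFE)` models `addi rd, rs1, -2` with rs1 = 6: the field 0xFFE sign-extends to -2, so the result is 4.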
Example
❑ Currently s1 = 6
❑ What is the value of s1 after executing the following
instructions?
addi s2, s1, 3
addi s1, s1, -2
sub s1, s2, s1
RISC-V Instruction set: Logical operations
❑ Bitwise operations
RISC-V Instruction set: Logical operations
❑ Basic logic operations
    and  rd, rs1, rs2    # rd ← rs1 & rs2
    andi rd, rs1, imm    # rd ← rs1 & imm
    or   rd, rs1, rs2    # rd ← rs1 | rs2
    ori  rd, rs1, imm    # rd ← rs1 | imm
    xor  rd, rs1, rs2    # rd ← rs1 ^ rs2
    xori rd, rs1, imm    # rd ← rs1 ^ imm
❑ Example: s1 = 8 = 0000 1000, s2 = 14 = 0000 1110
    and s3, s1, s2
    or  s4, s1, s2
RISC-V Instruction set: Shift operations
❑ Logical shift and arithmetic shift: move all the bits left or
right
    sll  rd, rs1, rs2    # rd ← rs1 << rs2
    srl  rd, rs1, rs2    # rd ← rs1 >> rs2 (zero fill)
    sra  rd, rs1, rs2    # rd ← rs1 >> rs2 (keep sign bit)
    slli rd, rs1, imm    # rd ← rs1 << imm
    srli rd, rs1, imm    # rd ← rs1 >> imm (zero fill)
    srai rd, rs1, imm    # rd ← rs1 >> imm (keep sign bit)
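The difference between the logical and arithmetic right shift can be sketched in C. The names are hypothetical; the sketch relies on the RV32I rule that only the low 5 bits of the shift amount are used, and it builds the sign fill by hand because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* RV32I uses only the low 5 bits of the shift amount. */
uint32_t rv_sll(uint32_t rs1, uint32_t rs2) { return rs1 << (rs2 & 31u); }
uint32_t rv_srl(uint32_t rs1, uint32_t rs2) { return rs1 >> (rs2 & 31u); }

/* Arithmetic right shift: vacated high bits are filled with the sign bit. */
uint32_t rv_sra(uint32_t rs1, uint32_t rs2) {
    unsigned n = rs2 & 31u;
    uint32_t shifted = rs1 >> n;
    if (rs1 & 0x80000000u) {
        shifted |= (uint32_t)~(0xFFFFFFFFu >> n);  /* set the top n bits */
    }
    return shifted;
}

/* Small usage example: -16 shifted right by 2. */
void shift_demo(void) {
    assert(rv_srl(0xFFFFFFF0u, 2u) == 0x3FFFFFFCu);  /* large positive value */
    assert(rv_sra(0xFFFFFFF0u, 2u) == 0xFFFFFFFCu);  /* -4, sign preserved */
}
```

For non-negative values, srl and sra give the same result; they differ only when the sign bit is set.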
RISC-V Instruction set: Memory access instructions
❑ RISC-V has two basic data transfer instructions for
accessing memory
    lw rd, imm(rs1)     # load word from memory
    sw rs2, imm(rs1)    # store word to memory
❑ The data is loaded into (lw) or stored from (sw) a register
in the register file
❑ The memory address is formed by adding the contents of
the base address register (rs1) to the offset value (imm)
❑ The offset can be negative
❑ Data alignment is strongly recommended
❑ Why is the instruction not simply lw rd, imm?
RISC-V Instruction set: Load Instruction
❑ Load/Store Instruction Format:
    lw t0, 24(s3)    # t0 ← Mem[24 + s3]
❑ Which memory word will be loaded into t0?
  • s3 = 0x12004094
  • 24 (decimal) + s3:
        . . . 0001 1000   (0x18 = 24)
      + . . . 1001 0100   (0x94)
      = . . . 1010 1100   (0xac)
  • Effective address = 0x120040ac
[Figure: memory drawn as data words at word addresses 0x00000000, 0x00000004, 0x00000008, 0x0000000c, ..., 0xffffffff; t0 receives the word at address 0x120040ac]
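The address calculation for this load can be reproduced with a few lines of C. The function name `effective_address` is made up for the sketch; the only assumption beyond the slide is that the offset must fit in the signed 12-bit immediate field.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of lw/sw: base register plus sign-extended offset. */
uint32_t effective_address(uint32_t base, int32_t offset) {
    assert(offset >= -2048 && offset <= 2047);  /* 12-bit signed immediate */
    return base + (uint32_t)offset;             /* wraps modulo 2^32 */
}
```

With base 0x12004094 and offset 24 this yields 0x120040ac, matching the hand calculation; a negative offset simply moves the address downward.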
RISC-V Instruction set: Load Instruction
❑ Given the integer array A stored in memory, with its base
address stored in x13.
    int A[100]; //x13 holds address of A[0]
❑ What is the equivalent C code for this?
lw x10, 12(x13)
addi x12, x10, 10
sw x12, 40(x13)
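One plausible reading of the three instructions, written as C, is sketched below. It assumes 4-byte int elements as on RV32, so a byte offset of 12 reaches A[3] and 40 reaches A[10]; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Element index reached by a byte offset from the array base,
   assuming 4-byte elements. */
unsigned index_from_offset(unsigned byte_offset) {
    assert(byte_offset % 4 == 0);   /* word-aligned offsets only */
    return byte_offset / 4;
}

/* The three instructions as C: A[10] = A[3] + 10. */
void update(int32_t A[100]) {
    int32_t tmp = A[index_from_offset(12)];  /* lw   x10, 12(x13) */
    tmp = tmp + 10;                          /* addi x12, x10, 10 */
    A[index_from_offset(40)] = tmp;          /* sw   x12, 40(x13) */
}
```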
RISC-V control flow instructions
❑ RISC-V conditional branch instructions:
    bne rs1, rs2, Dest    # go to Dest if rs1 != rs2
    beq rs1, rs2, Dest    # go to Dest if rs1 == rs2
    bge rs1, rs2, Dest    # go to Dest if rs1 >= rs2
    blt rs1, rs2, Dest    # go to Dest if rs1 < rs2
    bgeu/bltu: unsigned comparison
❑ Ex: if (i==j)
          h = i + j;
          bne s0, s1, Exit
          add s3, s0, s1
    Exit: ...
Example
start:
addi s0, zero, 2 #load value for s0
addi s1, zero, 2
addi s3, zero, 0
beq s0, s1, Exit
add s3, s2, s1
Exit: add s2, s3, s1
.end start
What is the final value of s2?
Unconditional branch
❑ Unconditional branch instruction or jump instruction:
j Dest #go to Dest
❑ Note: this is a pseudo-instruction implemented with the
jal instruction, and auipc instruction if necessary
Comparison instruction
❑ Set flag based on condition: slt
❑ Set on less than instruction:
    slt t0, s0, s1      # if s0 < s1 then
                        #     t0 = 1 else
                        #     t0 = 0
❑ Alternate versions of slt
    slti  t0, s0, 25    # if s0 < 25 then t0=1 ...
    sltu  t0, s0, s1    # if s0 < s1 (unsigned) then t0=1 ...
    sltiu t0, s0, 25    # if s0 < 25 (unsigned) then t0=1 ...
❑ Can be combined with bne/beq for conditional branches
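The signed and unsigned variants give different answers for the same bit patterns, which a small C sketch makes visible. The function names are hypothetical; the signed comparison is implemented by flipping the sign bit of both operands, a portable trick that maps signed order onto unsigned order.

```c
#include <assert.h>
#include <stdint.h>

/* sltu: compare the register contents as unsigned numbers. */
uint32_t rv_sltu(uint32_t rs1, uint32_t rs2) {
    return rs1 < rs2 ? 1u : 0u;
}

/* slt: compare as two's-complement signed numbers. */
uint32_t rv_slt(uint32_t rs1, uint32_t rs2) {
    return (rs1 ^ 0x80000000u) < (rs2 ^ 0x80000000u) ? 1u : 0u;
}

/* Usage example: the bit pattern 0xFFFFFFFF is -1 signed, 4294967295 unsigned. */
void slt_demo(void) {
    uint32_t minus_one = 0xFFFFFFFFu;
    assert(rv_slt(minus_one, 1u) == 1u);   /* -1 < 1 */
    assert(rv_sltu(minus_one, 1u) == 0u);  /* 4294967295 is not < 1 */
}
```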
Example
❑ Write assembly code to do the following
if (i<5)
X = 3;
else
X = 10;
Solution
slti t0,s1,5 # i<5? (inverse condition)
beq t0,zero,else # if i>=5 goto else part
addi t1,zero,3 # X = 3
j endif # skip the else part
else: addi t1,zero,10 # X = 10
endif:...
Representation of RISC-V instruction
❑ All RV32I base instructions are 32 bits wide
The RISC-V Instruction Set Manual, Volume I: User-Level ISA
R-format instruction: all operands are registers
❑ All fields are encoded by mnemonic names
❑ Examples
Example of R-format instruction
    add s1, s4, s5
    add x9, x20, x21
    funct7    rs2     rs1     funct3   rd      opcode
    0         21      20      0        9       51
    0000000   10101   10100   000      01001   0110011
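The field packing in this example can be sketched as a C function. `encode_r` is a hypothetical helper; it uses the standard R-format field widths (7, 5, 5, 3, 5 and 7 bits from left to right).

```c
#include <assert.h>
#include <stdint.h>

/* Pack the six R-format fields into one 32-bit instruction word. */
uint32_t encode_r(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                  uint32_t funct3, uint32_t rd, uint32_t opcode) {
    assert(funct7 < 128 && rs2 < 32 && rs1 < 32);
    assert(funct3 < 8 && rd < 32 && opcode < 128);
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
         | (funct3 << 12) | (rd << 7) | opcode;
}
```

Calling `encode_r(0, 21, 20, 0, 9, 51)` for add x9, x20, x21 gives 0x015A04B3, which is the bit string shown above written in hexadecimal.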
I-format instruction: 2 registers + 1 immediate
❑ The funct7 and rs2 fields are combined into a 12-bit immediate
❑ Examples
Example
❑ Find machine codes of the following instructions
lw t0, 0(s1) # initialize maximum to A[0]
addi t1, zero, 0 # initialize index i to 0
addi t1, t1, 1 # increment index i by 1
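A way to check hand-encoded answers for this exercise is a small I-format encoder in C. `encode_i` is a hypothetical helper. The sketch assumes the standard register numbers (zero = x0, t0 = x5, t1 = x6, s1 = x9) and encodings (lw: opcode 0x03, funct3 2; addi: opcode 0x13, funct3 0).

```c
#include <assert.h>
#include <stdint.h>

/* Pack the I-format fields: imm[11:0] | rs1 | funct3 | rd | opcode. */
uint32_t encode_i(int32_t imm, uint32_t rs1, uint32_t funct3,
                  uint32_t rd, uint32_t opcode) {
    assert(imm >= -2048 && imm <= 2047);   /* 12-bit signed immediate */
    assert(rs1 < 32 && funct3 < 8 && rd < 32 && opcode < 128);
    return (((uint32_t)imm & 0xFFFu) << 20) | (rs1 << 15)
         | (funct3 << 12) | (rd << 7) | opcode;
}
```

Under those assumptions, `encode_i(0, 9, 2, 5, 0x03)` gives 0x0004A283 for lw t0, 0(s1), `encode_i(0, 0, 0, 6, 0x13)` gives 0x00000313 for addi t1, zero, 0, and `encode_i(1, 6, 0, 6, 0x13)` gives 0x00130313 for addi t1, t1, 1.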
S-format instruction: 2 registers + 1 immediate
❑ The funct7 and rd fields are combined into a 12-bit immediate
❑ Used for the store instructions, which do not need rd
❑ Examples
B-format instruction: 2 registers + 1 immediate
❑ The funct7 and rd fields are combined into a 13-bit immediate
  • LSB = 0, so only imm[12:1] is stored (halfword instruction addresses, more on
    this later).
  • Bits keep the same positions as in the S-format where possible
  • MSB is always in bit 31 of the instruction word (simplified sign
    extension, also more on this later)
❑ As a result, the bits of the 13-bit immediate are scattered across the instruction word
❑ Used for conditional branch instructions
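The scattered immediate is easier to follow in code. The sketch below assumes the standard B-format layout (imm[12] in bit 31, imm[10:5] in bits 30 to 25, imm[4:1] in bits 11 to 8, imm[11] in bit 7); the function names are hypothetical and only the immediate bits are handled, not the register or opcode fields.

```c
#include <assert.h>
#include <stdint.h>

/* Scatter a branch offset into the B-format immediate bit positions. */
uint32_t b_imm_encode(int32_t offset) {
    assert(offset % 2 == 0 && offset >= -4096 && offset <= 4094);
    uint32_t u = (uint32_t)offset;
    return (((u >> 12) & 0x1u)  << 31)    /* imm[12]   -> bit 31      */
         | (((u >> 5)  & 0x3Fu) << 25)    /* imm[10:5] -> bits 30..25 */
         | (((u >> 1)  & 0xFu)  << 8)     /* imm[4:1]  -> bits 11..8  */
         | (((u >> 11) & 0x1u)  << 7);    /* imm[11]   -> bit 7       */
}

/* Gather the immediate bits back and sign-extend from bit 12. */
int32_t b_imm_decode(uint32_t inst) {
    uint32_t u = (((inst >> 31) & 0x1u)  << 12)
               | (((inst >> 25) & 0x3Fu) << 5)
               | (((inst >> 8)  & 0xFu)  << 1)
               | (((inst >> 7)  & 0x1u)  << 11);
    return (int32_t)(u ^ 0x1000u) - 0x1000;
}
```

Because bit 0 of the offset is always 0 it is never stored, which is how a 12-bit field carries a 13-bit offset.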
U- and J-format instruction: 1 register + 1 immediate
❑ The funct7, rs2, rs1 and funct3 fields are combined into a 20-bit
immediate
❑ U-format: for loading/adding a 20-bit upper immediate to a register
    lui   rd, upimm    # rd ← {upimm, 0x000}
    auipc rd, upimm    # rd ← PC + {upimm, 0x000}
❑ J-format: for jump and link
    jal rd, label      # rd ← PC+4, PC ← PC+addr
                       # addr = SignExt{imm, 0}
❑ Pseudo-instruction j label ➔ jal x0, label
Working with wide immediates and addresses
❑ Many operations need 32-bit immediates
  • Loading 32-bit immediates into registers
  • Loading addresses into registers
❑ However, instructions are only 32 bits long
  • Not enough room to store a 32-bit immediate in one instruction
  • → combine 2 instructions to support wide immediates
❑ Example: load the value 0x3D0100 into s0
    lui  s0, 0x003D0       # s0 ← 0x003D0000
    addi s0, s0, 0x0100    # s0 ← 0x003D0100
❑ Pseudo-instructions: combinations of real instructions, for
convenience
  • li, la
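Splitting a 32-bit constant into a lui part and an addi part has one subtlety: addi sign-extends its 12-bit immediate, so when bit 11 of the constant is set the upper part must be increased by one to compensate. The C sketch below illustrates this; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t upper20;   /* operand for lui  */
    int32_t  lower12;   /* operand for addi, in -2048..2047 */
} Imm32Parts;

/* Split a 32-bit constant into lui/addi operands. */
Imm32Parts split_imm32(uint32_t value) {
    Imm32Parts p;
    /* Adding 0x800 rounds the upper part up when bit 11 is set. */
    p.upper20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
    /* Sign-extend the low 12 bits. */
    p.lower12 = (int32_t)((value & 0xFFFu) ^ 0x800u) - 0x800;
    return p;
}

/* What lui followed by addi computes (modulo 2^32). */
uint32_t join_imm32(Imm32Parts p) {
    assert(p.upper20 <= 0xFFFFFu);
    assert(p.lower12 >= -2048 && p.lower12 <= 2047);
    return (p.upper20 << 12) + (uint32_t)p.lower12;
}
```

For 0x003D0100 the split is simply 0x003D0 and 0x100, as in the slide; for a value such as 0x12345FFF it becomes 0x12346 and -1.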
Working with wide immediates and addresses
❑ Special case: long jump
❑ Conditional branches: blt, bne,..
  • B-format, with a 12-bit immediate field (13-bit offset, LSB always 0)
  • Range limited to ±4 KB → limits branching distance
  • Solution: change to jal
❑ Unconditional jump (jal)
  • Distance is limited to ±1 MB
  • Solution: use jalr, combined with auipc if necessary
    jalr rd, rs1, imm    # rd = PC+4, then PC = rs1 + SignExt(imm)
❑ Example: replacing a branch whose target is out of range
    beq x10, x0, L1      # limit 4KB
  becomes
    bne x10, x0, L2      # inverted condition skips the jump
    jal x0, L1           # limit 1MB
L2:
Exercise
❑ How is a branch instruction executed?
❑ ➔ PC-relative addressing mode
    slti t0, s1, 5
    bne t0, zero, else    # how can the CPU jump from here to the “else” label?
    addi t1, zero, 3
    j endif
else: addi t1, zero, 10
endif:...
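PC-relative addressing can be worked through numerically. The sketch below assumes every instruction is 4 bytes, that `j` expands to a single jal, and (purely for illustration) that slti sits at address 0, so bne is at address 4 and the else label at address 16. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset the assembler stores in the branch: label address minus branch address. */
int32_t branch_offset(uint32_t branch_pc, uint32_t label_pc) {
    return (int32_t)((int64_t)label_pc - (int64_t)branch_pc);
}

/* Target the CPU computes when the branch is taken: PC + offset. */
uint32_t branch_target(uint32_t pc, int32_t offset) {
    assert(offset % 2 == 0);   /* branch offsets are always even */
    return pc + (uint32_t)offset;
}
```

With these assumptions the assembler encodes an offset of 12 in the bne, and at run time the CPU adds 12 to the branch's own address to reach the else label, wherever the code is loaded.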
Example
The simple switch
    switch(test) {
        case 0:
            a=a+1; break;
        case 1:
            a=a-1; break;
        case 2:
            b=2*b; break;
        default:
            break;
    }
Assuming that: test, a, b are stored in s1, s2, s3,
and t0, t1, t2 have been loaded with the constants 0, 1, 2
Solution
        beq s1,t0,case_0
        beq s1,t1,case_1
        beq s1,t2,case_2
        j default
case_0:
        addi s2,s2,1    #a=a+1
        j continue
case_1:
        sub s2,s2,t1    #a=a-1
        j continue
case_2:
        add s3,s3,s3    #b=2*b
        j continue
default:
continue:
Example
❑ Write assembly code corresponding to the following C code
    for (i = 0; i < n; i++)
        sum = sum + A[i];
Assuming that: i and sum are in s1 and s0, the address of A[0] is in a0, and n is in a1
        addi s1,zero,0    #i=0
loop:
        bge s1,a1,exit    #if i >= n, leave the loop
        add t1,s1,s1      #t1=2*i
        add t1,t1,t1      #t1=4*i
        add t1,t1,a0      #t1 <- address of A[i]
        lw t0,0(t1)       #load value of A[i] in t0
        add s0,s0,t0      #sum = sum+A[i]
        addi s1,s1,1      #i=i+1
        j loop            #next iteration
exit:
Example
The simple while loop: while (A[i]==k) i=i+1;
Assuming that: i, k, A are stored in x22,x24,x25
Solution
Loop:
slli x10, x22, 2 #i*4
add x10, x10, x25 #A[i] address
lw x9, 0(x10) #A[i] value
bne x9, x24, Exit #break if != k
addi x22, x22, 1 #next element
beq x0, x0, Loop
Exit: …
Procedures
❑ Stack structure
❑ Passing control
  • To beginning of procedure code
  • Back to return point
❑ Passing data
  • Procedure arguments
  • Return value
❑ Register saving conventions
❑ Memory management
  • Allocate during procedure execution
  • Deallocate upon return
    P(…) {
        ...
        y = Q(x);
        print(y)
        ...
    }
    int Q(int i)
    {
        int t = 3*i;
        int v[10];
        ...
        return v[t];
    }
Stack structure
❑ A region of memory operating on a Last In First Out
(LIFO) principle
❑ The bottom of the stack is at the highest address; the stack grows toward lower addresses
❑ sp: points to the top of the stack
[Figure: stack before calling, holding items a, b, c, with sp at the top of the stack and fp marking the frame for the current procedure]
Stack structure
❑ To push data onto the stack
    addi sp, sp, -4
    sw   t0, 0(sp)
❑ To pop data from the stack
    lw   t0, 0(sp)
    addi sp, sp, 4
  • Note: the popped data is still there in memory, but we
    are not going to work with it anymore.
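The push and pop sequences can be mirrored in a small C model of a descending stack. Everything here is illustrative: the `Stack` type, its size and the function names are assumptions, and `sp` is modelled as a byte offset into a word array.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* byte offset; starts just past the highest word */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_WORDS * 4u; }

void stack_push(Stack *s, uint32_t value) {
    assert(s->sp >= 4u);             /* room for one more word */
    s->sp -= 4u;                     /* addi sp, sp, -4 */
    s->mem[s->sp / 4u] = value;      /* sw   t0, 0(sp)  */
}

uint32_t stack_pop(Stack *s) {
    assert(s->sp < STACK_WORDS * 4u);      /* stack must not be empty */
    uint32_t value = s->mem[s->sp / 4u];   /* lw   t0, 0(sp) */
    s->sp += 4u;                           /* addi sp, sp, 4 */
    return value;                          /* old word stays in mem */
}
```

Pushing 1 then 2 and popping twice returns 2 then 1, and sp ends where it started, which is the LIFO behaviour described above.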
Passing control flow
❑ Procedure call: using the RISC-V procedure call instruction
    jal rd, ProcAddress    # jump and link
  • Saves the return address (PC+4) in destination register rd
    (usually ra, i.e. x1)
  • Jumps to ProcAddress
❑ Return address:
  • Address of the instruction right after the call
❑ Procedure return:
    jalr x0, 0(x1)
  • Sets PC = ra
  • Execution continues at that address (the instruction right after the
    procedure call)
Passing control
[Figure: main prepares to call and executes jal proc; proc saves state, does its work, restores, and returns with jalr x0, 0(x1) (labelled jr $ra in the figure); main then prepares to continue]
Passing control
❑ Demonstrate on the RARS simulator
❑ Watch the values of the pc and ra registers!
Procedure call and nested procedure call
Example of nested procedure call
Question: how can the CPU resume the main program execution?
Passing data
❑ Use registers
  • Input arguments:
    - a0-a7
    - 8 parameters (arguments) maximum
  • Return value:
    - a0
❑ What if we want to pass more than 8 arguments?
→ use the stack:
  • Caller pushes the arguments onto the stack before calling the
    callee
  • Callee gets the arguments from the stack
  • (Optional) Callee saves the return value to the stack
  • Question: who cleans the stack, caller or callee?
Register saving convention
❑ Registers to be saved by the caller
  • ra, t0-t6, a0-a7
❑ Registers that must be saved by the callee
  • sp, s0-s11
❑ Note: a register only needs saving if you are going to
modify it.
❑ Question: where to save the above registers?
Memory management
❑ Question: where to locate the
local variables: t and v?
  • Use registers: the number of registers is
    limited
  • Use the stack
    P(…) {
        ...
        y = Q(x);
        print(y)
        ...
    }
    int Q(int i)
    {
        int t = 3*i;
        int v[10];
        ...
        return v[t];
    }
Stack Frame
❑ Current Stack Frame (“Top” to
Bottom)
  • “Argument build”:
    parameters for the function about to be called
  • Local variables,
    if they can’t be kept in registers
  • Saved register context
  • Old frame pointer (optional)
❑ Caller Stack Frame
  • Return address
    - Placed in ra by the jal instruction; saved on the stack if needed
  • Arguments for this call
[Figure: stack layout. Caller frame: arguments 9+, return address. Current frame, from frame pointer fp down to stack pointer sp: old fp (optional), saved registers + local variables, argument build (optional)]
Six Steps in the Execution of a Procedure
1. Main routine (caller) places parameters in a place
where the procedure (callee) can access them
  • a0 – a7 (x10 – x17): 8 argument registers
2. Caller transfers control to the callee (jal)
3. Callee acquires the storage resources needed
4. Callee performs the desired task
5. Callee places the result value in a place where the
caller can access it
  • a0 – a1: two value registers for result values
6. Callee returns control to the caller (jalr)
  • ra (x1): one return address register to return to caller
Procedure that does not call another proc.
❑ C code:
    int leaf_example (int g, int h, int i, int j)
    {
        int f;
        f = (g + h) - (i + j);
        return f;
    }
  • g, h, i, j stored in a0, a1, a2, a3
  • f in s0 (needs to be saved)
  • t0 and t1 used for temporary data
  • Preserve all of s0, t0, t1 for safety
  • Result in a0
Sample code
leaf_example:
    addi sp, sp, -12     # room for 3 items
    sw   t1, 8(sp)       # save t1
    sw   t0, 4(sp)       # save t0
    sw   s0, 0(sp)       # save s0
    add  t0, a0, a1      # t0 = g+h
    add  t1, a2, a3      # t1 = i+j
    sub  s0, t0, t1      # s0 = (g+h)-(i+j)
    add  a0, s0, zero    # return value in a0
    lw   s0, 0(sp)       # restore s0
    lw   t0, 4(sp)       # restore t0
    lw   t1, 8(sp)       # restore t1
    addi sp, sp, 12      # deallocate
    jalr zero, 0(ra)     # return to caller
Exercise: write code that calls the procedure above
Stack usage
Procedure with nested proc.
❑ C code:
    int fact (int n)
    {
        if (n < 1) return (1);
        else return n * fact(n - 1);
    }
  • n in a0
  • Result in a0
Sample code
fact:
    addi sp, sp, -8      # 2 items in stack
    sw   ra, 4(sp)       # save return address
    sw   a0, 0(sp)       # and current n
    addi t0, a0, -1      # n-1 >= 0?
    bge  t0, zero, L1    # if so, continue at L1
    addi a0, zero, 1     # the base case, return 1
    addi sp, sp, 8       # deallocate 2 words
    jr   ra              # and return
L1: addi a0, a0, -1      # otherwise reduce n
    jal  fact            # then call fact again
    add  a1, a0, zero    # a1 = fact(n-1)
    lw   a0, 0(sp)       # restore n
    lw   ra, 4(sp)       # and return address
    addi sp, sp, 8       # shrink stack
    mul  a0, a0, a1      # value for normal case: multiply with n
    jalr x0, 0(ra)       # and return
RISC-V memory configuration
❑ Program text: stores machine code of program, declared
with .text
❑ Static data: data segment, declared with .data
❑ Heap: for dynamic allocation
❑ Stack: for local variable and dynamic allocation (LIFO)
Accessing characters and string
❑ A string is accessed as an array of characters
❑ Accessing 1-byte characters
    lb  rd, imm(rs1)     # load byte with sign-extension
    lbu rd, imm(rs1)     # load byte with zero-extension
    sb  rs2, imm(rs1)    # store the least significant byte to memory
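The difference between lb and lbu only shows up when the top bit of the byte is set. A minimal C sketch, with hypothetical function names, models the value that ends up in the 32-bit destination register.

```c
#include <assert.h>
#include <stdint.h>

/* lb: copy bit 7 of the byte into bits 31..8 of the register. */
uint32_t rv_lb(uint8_t byte) {
    return (byte & 0x80u) ? (0xFFFFFF00u | byte) : (uint32_t)byte;
}

/* lbu: bits 31..8 of the register are always zero. */
uint32_t rv_lbu(uint8_t byte) {
    return (uint32_t)byte;
}

/* Usage example: the byte 0x80 is -128 as signed, 128 as unsigned. */
void load_byte_demo(void) {
    assert(rv_lb(0x80u) == 0xFFFFFF80u);
    assert(rv_lbu(0x80u) == 0x00000080u);
}
```

For plain ASCII characters (values below 0x80) both instructions load the same value, which is why the choice only matters for bytes with the high bit set.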
Accessing characters and string
❑ Accessing 2-byte characters
    lh  rd, imm(rs1)     # load half with sign-extension
    lhu rd, imm(rs1)     # load half with zero-extension
    sh  rs2, imm(rs1)    # store the 2 least significant bytes to memory
❑ Example: string copy
    void strcpy (char x[], char y[])
    {
        int i = 0;
        while ((x[i] = y[i]) != '\0')
            i += 1;
    }
Accessing characters and string
# x and y are in a0 and a1, i in s0
strcpy:
      addi sp, sp, -4       # adjust stack for 1 more item
      sw   s0, 0(sp)        # save s0
      add  s0, zero, zero   # i = 0
L1:   add  t1, s0, a1       # address of y[i] in t1
      lbu  t2, 0(t1)        # t2 = y[i]
      add  t3, s0, a0       # address of x[i] in t3
      sb   t2, 0(t3)        # x[i] = y[i]
      beq  t2, zero, L2     # if y[i] == 0, go to L2
      addi s0, s0, 1        # i = i + 1
      beq  x0, x0, L1       # go to L1
L2:   lw   s0, 0(sp)        # y[i] == 0: end of string.
                            # Restore old s0
      addi sp, sp, 4        # pop 1 word off stack
      jalr zero, 0(ra)      # return
Interchange sort function
    void sort (int v[], int n)
    {
        int i, j;
        for (i = 0; i < n; i += 1)
        {
            for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1)
            {
                swap(v, j);
            }
        }
    }
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }
RISC-V Instruction Set Extensions
❑ The RV32I Instruction Set (that we have learnt so far)
  • Instruction word: 32 bits
  • Only works on integers
  • Supports arithmetic, logic and shift, data transfer, branches
❑ How about:
  • Other instruction word lengths?
  • Data other than integers?
  • Additional operations: multiplication, division…?
❑ Instruction Set Extensions
  • Additional operations and data types
  • Additional formats and custom formats
  • → RISC-V is a scalable ISA
RISC-V Standard Extensions
❑ 32-bit instruction extensions
  • “M”: Integer Multiplication and Division Instructions
  • “A”: Atomic (Memory) Instructions
  • “F”: Single-Precision Floating-Point Instructions
  • “D”: Double-Precision Floating-Point Instructions
❑ 16-bit: “C”: Compressed Instructions
RISC-V instruction length encoding
RV32M: Integer Multiplication and Division Extension
❑ Support integer multiplication, division (div and rem)
operations.
❑ All are R-format.
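The results of mul, div and rem can be modelled in C, including the two special cases defined by the M extension: division by zero does not trap (the quotient is all ones, i.e. -1, and the remainder is the dividend), and the overflow case of dividing the most negative value by -1 returns that value with remainder 0. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* mul: low 32 bits of the product. */
uint32_t rv_mul(uint32_t rs1, uint32_t rs2) { return rs1 * rs2; }

/* div: signed division, rounding toward zero. */
int32_t rv_div(int32_t rs1, int32_t rs2) {
    if (rs2 == 0) return -1;                               /* divide by zero */
    if (rs1 == INT32_MIN && rs2 == -1) return INT32_MIN;   /* overflow case */
    return rs1 / rs2;
}

/* rem: signed remainder; its sign follows the dividend. */
int32_t rv_rem(int32_t rs1, int32_t rs2) {
    if (rs2 == 0) return rs1;                              /* divide by zero */
    if (rs1 == INT32_MIN && rs2 == -1) return 0;           /* overflow case */
    return rs1 % rs2;
}

/* Outside the special cases, quotient and remainder satisfy a == q*b + r. */
void check_div_rem(int32_t a, int32_t b) {
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    assert(rv_div(a, b) * b + rv_rem(a, b) == a);
}
```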
RV32A: Atomic Extension
❑ Supports synchronized “atomic” memory access.
  • Load + data op + Store become atomic.
  • Similar to semaphore/mutex in multithreaded software.
  • Basically R-format, with aq (acquire) and rl (release) bits.
RV32F / D Floating-Point Extensions
❑ Support floating-point operations.
  • Additional floating-point register file for the new data type.
  • Additional instructions to work with the new register file.
  • Additional load/store instructions.
❑ Data representation and computation are compliant with
the IEEE 754-2008 standard (chapter 4).
  • “F”: 32-bit single-precision floating-point numbers (float in C).
  • “D”: 64-bit double-precision floating-point numbers (double in C).
RISC-V floating point register file
RV32F / D Floating-Point Extensions
❑ Single-precision instructions
❑ .s for single, .d for double
RVC Compressed Extension
❑ 16-bit instructions
  • Double the code density of 32-bit instructions.
  • Limited to the most frequently used instructions/operands.
  • Overall 25% - 30% code-size reduction.
RVC Instruction Formats
RVC Registers for CIW, CL, CS, CB instructions
RVC Compressed Instructions
Further reading
❑ MIPS instruction set
❑ ARM instruction set
❑ x86 instruction set
The end

More Related Content

PDF
IT3030E-CA-Chap3-ISA-Exercises_aaaaa.pdf
PPT
COMPUTER ARCHITECTURE MIPS INTRODUCTION ISA
PDF
Cmps290 classnoteschap02
PPT
C language programming
PPT
MIPS instruction set microprocessor lecture notes
PPT
EMBEDDED SYSTEMS 4&5
PPTX
HDL17_MIPS CPU Design using Verilog.pptx
PPT
Chapter Eight(3)
IT3030E-CA-Chap3-ISA-Exercises_aaaaa.pdf
COMPUTER ARCHITECTURE MIPS INTRODUCTION ISA
Cmps290 classnoteschap02
C language programming
MIPS instruction set microprocessor lecture notes
EMBEDDED SYSTEMS 4&5
HDL17_MIPS CPU Design using Verilog.pptx
Chapter Eight(3)

Similar to IT3030E-CA-Chap3-Instruction Set Architecture.pdf (20)

PDF
system software 16 marks
PPTX
ISA.pptx
PPTX
Computer Organization Unit 3 Computer Fundamentals
PPT
leccccccccccc14_combinational_blocks.ppt
PPTX
Instruction Set Architecture: MIPS
PPTX
Instruction Set Architecture
PDF
Instructions: Language of the Computer pdf document is very useful
PPT
C language programming
PPTX
dinoC_ppt.pptx
PPTX
Lecture 2 coal sping12
PDF
Develop Embedded Software Module-Session 3
DOC
GSP 215 Doing by learn/newtonhelp.com
DOC
GSP 215 Perfect Education/newtonhelp.com
DOC
GSP 215 Become Exceptional/newtonhelp.com
DOC
Gsp 215 Future Our Mission/newtonhelp.com
PPTX
8086 Micro-processor and MDA 8086 Trainer Kit
PDF
cyber_systems.pdf
PPT
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
PPT
Chapter Eight(2)
system software 16 marks
ISA.pptx
Computer Organization Unit 3 Computer Fundamentals
leccccccccccc14_combinational_blocks.ppt
Instruction Set Architecture: MIPS
Instruction Set Architecture
Instructions: Language of the Computer pdf document is very useful
C language programming
dinoC_ppt.pptx
Lecture 2 coal sping12
Develop Embedded Software Module-Session 3
GSP 215 Doing by learn/newtonhelp.com
GSP 215 Perfect Education/newtonhelp.com
GSP 215 Become Exceptional/newtonhelp.com
Gsp 215 Future Our Mission/newtonhelp.com
8086 Micro-processor and MDA 8086 Trainer Kit
cyber_systems.pdf
Lec19 Intro to Computer Engineering by Hsien-Hsin Sean Lee Georgia Tech -- Pr...
Chapter Eight(2)
Ad

Recently uploaded (20)

PPTX
PLC ANALOGUE DONE BY KISMEC KULIM TD 5 .0
PPTX
STEEL- intro-1.pptxhejwjenwnwnenemwmwmwm
PPTX
Wireless and Mobile Backhaul Market.pptx
PPTX
Sem-8 project ppt fortvfvmat uyyjhuj.pptx
PPTX
02fdgfhfhfhghghhhhhhhhhhhhhhhhhhhhh.pptx
PPTX
Fundamentals of Computer.pptx Computer BSC
PPTX
quadraticequations-111211090004-phpapp02.pptx
PPTX
Nanokeyer nano keyekr kano ketkker nano keyer
PDF
Smarter Security: How Door Access Control Works with Alarms & CCTV
PPTX
Syllabus Computer Six class curriculum s
PPTX
1.pptxsadafqefeqfeqfeffeqfqeqfeqefqfeqfqeffqe
PPTX
Embeded System for Artificial intelligence 2.pptx
PPTX
Presentacion compuuuuuuuuuuuuuuuuuuuuuuu
PDF
Layer23-Switch.com The Cisco Catalyst 9300 Series is Cisco’s flagship stackab...
PPTX
ERP good ERP good ERP good ERP good good ERP good ERP good
PDF
Cableado de Controladores Logicos Programables
PPTX
Operating System Processes_Scheduler OSS
PPTX
code of ethics.pptxdvhwbssssSAssscasascc
PPTX
Computers and mobile device: Evaluating options for home and work
DOCX
fsdffdghjjgfxfdghjvhjvgfdfcbchghgghgcbjghf
PLC ANALOGUE DONE BY KISMEC KULIM TD 5 .0
STEEL- intro-1.pptxhejwjenwnwnenemwmwmwm
Wireless and Mobile Backhaul Market.pptx
Sem-8 project ppt fortvfvmat uyyjhuj.pptx
02fdgfhfhfhghghhhhhhhhhhhhhhhhhhhhh.pptx
Fundamentals of Computer.pptx Computer BSC
quadraticequations-111211090004-phpapp02.pptx
Nanokeyer nano keyekr kano ketkker nano keyer
Smarter Security: How Door Access Control Works with Alarms & CCTV
Syllabus Computer Six class curriculum s
1.pptxsadafqefeqfeqfeffeqfqeqfeqefqfeqfqeffqe
Embeded System for Artificial intelligence 2.pptx
Presentacion compuuuuuuuuuuuuuuuuuuuuuuu
Layer23-Switch.com The Cisco Catalyst 9300 Series is Cisco’s flagship stackab...
ERP good ERP good ERP good ERP good good ERP good ERP good
Cableado de Controladores Logicos Programables
Operating System Processes_Scheduler OSS
code of ethics.pptxdvhwbssssSAssscasascc
Computers and mobile device: Evaluating options for home and work
fsdffdghjjgfxfdghjvhjvgfdfcbchghgghgcbjghf
Ad

IT3030E-CA-Chap3-Instruction Set Architecture.pdf

  • 1. IT3030E, Fall 2024 1 Computer Architecture Ngo Lam Trung, Pham Ngoc Hung, Hoang Van Hiep Department of Computer Engineering School of Information and Communication Technology (SoICT) Hanoi University of Science and Technology E-mail: [trungnl, hungpn, hiephv]@soict.hust.edu.vn
  • 2. IT3030E, Fall 2024 2 Chapter 3: Instruction Set Architecture (Language of the Computer) [with materials from COD, RISC-V 2nd Edition, Patterson & Hennessy 2021, M.J. Irwin’s presentation, PSU 2008, The RISC-V Instruction Set Manual, Volume I, ver. 2.2]
  • 3. IT3030E, Fall 2024 3 Content ❑ Introduction ❑ RISC-V Instruction Set Architecture l Operands l Instruction set (basic RV32I variant) l RISC-V instruction formats l Other RISC-V instructions ❑ Basic programming structures l Branch and loop l Procedure call l Array and string
  • 4. IT3030E, Fall 2024 4 What is RISC-V and its advantages (over ARM, x86)? ❑ Developed at UC Berkeley as open ISA (2010). ❑ Typical of many modern ISAs, which have a large share of embedded market. l RISC CPUs: Pioneers of Modern Computer Architecture Receive ACM A.M. Turing Award ❑ Now managed by the RISC-V Foundation/RISC-V International (https://riscv.org/, since 2015). ❑ “RISC-V combines a modular technical approach with an open, royalty-free ISA — meaning that anyone, anywhere can benefit from the IP contributed and produced by RISC-V.” - RISC-V International. ❑ “RISC-V does not take a political position on behalf of any geography.” - RISC-V International.
  • 5. IT3030E, Fall 2024 5 Computer language: hardware operation ❑ Want to command the computer? ➔ You need to speak its language!!! ❑ Example: RISC-V assembly instruction add a, b, c #a  b + c ❑ Operation performed add b and c, then store result into a add a, b, c #a  b + c operation operands comments
  • 6. IT3030E, Fall 2024 6 Hardware operation ❑ What does the following code do? add t0, g, h # t0 = g + h add t1, i, j # t1 = i + j sub f, t0, t1 # f = t0 - t1 ❑ Equivalent C code f = (g + h) – (i + j) ➔ Why not making 4- or 5-input instructions? ➔ DP1: Simplicity favors regularity! Instruction format significantly influences hardware design.
  • 7. IT3030E, Fall 2024 7 Operands ❑ Object of operation l Source operand: provides input data l Destination operand: stores the result of operation ❑ RISC-V operands l Registers l Memory l Constant/Immediate
  • 8. IT3030E, Fall 2024 8 Data types in RISC-V RV32 registers hold 32-bit (4-byte) words. Other common data sizes include byte, halfword, and doubleword. Byte Halfword Word Doubleword Byte = 8 bits Word = 4 bytes Doubleword = 8 bytes Halfword = 2 bytes
  • 9. IT3030E, Fall 2024 9 Register operand: RISC-V Register File ❑ Special memory inside CPU, called register file ❑ 32 slots, each slot is called a register (RV32I) ❑ Each register holds 32 bits of data ❑ Each register has a unique 5-bit address Register File src1 addr src2 addr dst addr write data 32 bits src1 data src2 data 32 locations 32 5 32 5 5 32 write control Read ports addresses Read ports data Write ports address and data
  • 10. IT3030E, Fall 2024 10 RISC-V Register Convention ❑ RISC-V: load/store machine ❑ Data processing done on registers inside CPU RISC-V integer registers
  • 11. IT3030E, Fall 2024 11 Register operand: RISC-V Register File ❑ Register file: “work place” right inside CPU. ❑ Larger register file should be better, more flexibility for CPU operation. ❑ Moore’s law: doubled number of transistor every 18 mo. ❑ Why only 32 registers, not more? ➔ DP2: Smaller is faster! Effective use of register file is critical!
  • 12. IT3030E, Fall 2024 12 Memory operand ❑ Memory operands are stored in main memory l Large size l Outsize CPU →Slower than register file (100 to 500 times) ❑ High level language programs use memory operands l Variables l Array and string l Composite data structures ❑ Operations with memory operands l Units of byte/half word/word/double word l Load data from memory to register l Store data from register to memory
  • 13. IT3030E, Fall 2024 13 RISC-V memory organization ❑ Byte addressable (32-bit byte addresses) ❑ Words are accessed via byte address ❑ Only accessible via load/store instructions ❑ Data alignment: word address = 4 * word number. RISC-V does not require data alignment, but it is strongly recommended. ➔ handled by compiler [Figure: memory map of byte addresses 0x00 to 0x0f grouped into Word 0 (0x00-0x03), Word 1 (0x04-0x07), Word 2 (0x08-0x0b) and Word 3 (0x0c-0x0f)]
  • 14. IT3030E, Fall 2024 14 RISC-V memory organization ❑ Is it optimal to declare a struct in C like this? struct data { char x; short y; int z; }; ❑ Aligned data: a primitive data type that requires K bytes must be stored at an address that is a multiple of K [Figure: memory map of byte addresses 0x00 to 0x0f (32-bit byte addresses) grouped into Word 0 to Word 3]
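The alignment rule above can be checked with a small model. This is only a sketch in Python, assuming natural alignment (a K-byte field sits at a multiple of K) and a struct size rounded up to the largest field alignment; the helper name `layout` is hypothetical, and a real compiler's layout depends on its ABI.

```python
def layout(fields):
    """Return (offsets, total_size) for fields given as (name, size_in_bytes).

    Assumption: natural alignment, i.e. a K-byte field is placed at the next
    address that is a multiple of K, and the struct size is rounded up to a
    multiple of the largest field alignment.
    """
    offset = 0
    offsets = {}
    max_align = 1
    for name, size in fields:
        align = size
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Field order from the slide: char x; short y; int z;
ordered = layout([("x", 1), ("y", 2), ("z", 4)])
# A hypothetical worse order: char x; int z; short y;
reordered = layout([("x", 1), ("z", 4), ("y", 2)])
print(ordered)
print(reordered)
```

Under these assumptions the slide's field order needs one padding byte (after `x`), while the reordered variant needs more padding, which is why field order matters.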
  • 15. IT3030E, Fall 2024 15 Example: z = x + y ❑ x, y, z are allocated in memory, but x and y must be transferred to registers before adding ❑ Note: the focus for now is on the instruction set; assembly programming will be presented later
  • 16. IT3030E, Fall 2024 16 Byte Order ❑ Big Endian: word address points to MSB IBM 360/370, Motorola 68k, Sparc, HP PA ❑ Little Endian: word address points to LSB Intel 80x86, DEC, MIPS, RISC-V MSB LSB 3 2 1 0 little endian order 0 1 2 3 big endian order (most significant byte) (least significant byte)
  • 17. IT3030E, Fall 2024 17 Example ❑ Consider a word in RISC-V memory consisting of 4 bytes with the hexadecimal values below. What is the word’s value? Address: X, X+1, X+2, X+3; value: 68, 1B, 5D, FA ❑ RISC-V is little-endian: the address of the LSB is X ➔ word’s value: FA5D1B68
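The byte-order example can be reproduced with Python's built-in `int.from_bytes`. This is a sketch that assumes the four bytes sit at addresses X, X+1, X+2, X+3 in the order 68, 1B, 5D, FA, which is the reading consistent with the slide's answer.

```python
# Assumed memory contents at addresses X, X+1, X+2, X+3
mem = bytes([0x68, 0x1B, 0x5D, 0xFA])

# Little endian: the byte at the lowest address is the least significant byte
little = int.from_bytes(mem, "little")
# Big endian: the byte at the lowest address is the most significant byte
big = int.from_bytes(mem, "big")

print(f"little endian: 0x{little:08X}")
print(f"big endian:    0x{big:08X}")
```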
  • 18. IT3030E, Fall 2024 18 Immediate operand ❑ Immediate value specified by a constant number ❑ Examples: l Assignment: int x = 2024; l Const in expression: x = y + 10; l Branching: if.. else.., goto,… ❑ Does not need to be stored in register file or memory l Value stored right in instruction → faster l Fixed value specified at design time l Cannot change value at run time ❑ What is the most-used constant? l 0 value is stored in the special register: zero (x0) l Make common cases fast!
  • 19. IT3030E, Fall 2024 19 Instruction set ❑ Instruction: binary string representing opcode + operands ❑ RISC-V (RV32 variant) base instructions are 32 bits long. l Must be word-aligned in memory. ❑ 6 instruction formats ➔ Why not only one format? Or 20 formats? ➔ DP3: Good design demands good compromises!
  • 20. IT3030E, Fall 2024 20 Instruction categories ❑ Arithmetic: addition, subtraction,… ❑ Data transfer: transfer data between registers, memory, and immediate ❑ Logical and bitwise: and, or, xor, shift left/right… ❑ Branch: conditional and unconditional
  • 21. IT3030E, Fall 2024 21 Overview of RISC-V instruction set Fig. 2.1
  • 22. IT3030E, Fall 2024 22 Overview of RISC-V instruction set
  • 23. IT3030E, Fall 2024 23 RISC-V Instruction set: Arithmetic operations
    ❑ RISC-V arithmetic statement
        add  rd, rs1, rs2   # rd ← rs1 + rs2
        sub  rd, rs1, rs2   # rd ← rs1 - rs2
        addi rd, rs1, imm   # rd ← rs1 + imm
      • rs1: 5-bit register file address of the 1st source operand
      • rs2: 5-bit register file address of the 2nd source operand
      • rd: 5-bit register file address of the result’s destination
    ❑ Why is there no subi instruction?
  • 24. IT3030E, Fall 2024 24 Example
    ❑ Currently s1 = 6
    ❑ What is the value of s1 after executing the following instructions?
        addi s2, s1, 3
        addi s1, s1, -2
        sub  s1, s2, s1
  • 25. IT3030E, Fall 2024 25 RISC-V Instruction set: Logical operations ❑ Bitwise operations
  • 26. IT3030E, Fall 2024 26 RISC-V Instruction set: Logical operations
    ❑ Basic logic operations
        and  rd, rs1, rs2   # rd ← rs1 & rs2
        andi rd, rs1, imm   # rd ← rs1 & imm
        or   rd, rs1, rs2   # rd ← rs1 | rs2
        ori  rd, rs1, imm   # rd ← rs1 | imm
        xor  rd, rs1, rs2   # rd ← rs1 ^ rs2
        xori rd, rs1, imm   # rd ← rs1 ^ imm
    ❑ Example: s1 = 8 = 0000 1000, s2 = 14 = 0000 1110
        and s3, s1, s2
        or  s4, s1, s2
  • 27. IT3030E, Fall 2024 27 RISC-V Instruction set: Shift operations
    ❑ Logical shift and arithmetic shift: move all the bits left or right
        sll  rd, rs1, rs2   # rd ← rs1 << rs2
        srl  rd, rs1, rs2   # rd ← rs1 >> rs2
        sra  rd, rs1, rs2   # rd ← rs1 >> rs2 (keep sign bit)
        slli rd, rs1, imm   # rd ← rs1 << imm
        srli rd, rs1, imm   # rd ← rs1 >> imm
        srai rd, rs1, imm   # rd ← rs1 >> imm (keep sign bit)
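The difference between the logical and arithmetic right shifts shows up only when the sign bit is set. A minimal Python model follows, assuming 32-bit registers and a shift amount taken from the low 5 bits as in RV32I; the function names simply mirror the mnemonics.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def sll(x, shamt):
    """Shift left logical: bits shifted out on the left are lost."""
    return (x << (shamt & 31)) & MASK


def srl(x, shamt):
    """Shift right logical: zeros are shifted in on the left."""
    return (x & MASK) >> (shamt & 31)


def sra(x, shamt):
    """Shift right arithmetic: copies of the sign bit are shifted in."""
    x &= MASK
    signed = x - (1 << 32) if x & 0x80000000 else x
    return (signed >> (shamt & 31)) & MASK


print(hex(sll(1, 4)))
print(hex(srl(0x80000000, 4)))
print(hex(sra(0x80000000, 4)))
```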
  • 28. IT3030E, Fall 2024 28 RISC-V Instruction set: Memory access instructions
    ❑ RISC-V has two basic data transfer instructions for accessing memory
        lw rd, imm(rs1)    # load word from memory
        sw rs2, imm(rs1)   # store word to memory
    ❑ The data is loaded into (lw) or stored from (sw) a register in the register file
    ❑ The memory address is formed by adding the contents of the base address register to the offset value
    ❑ Offset can be negative
    ❑ Data alignment is strongly recommended
    ❑ Why is the instruction not simply lw rd, imm?
  • 29. IT3030E, Fall 2024 29 RISC-V Instruction set: Load Instruction
    ❑ Load/Store instruction format:
        lw t0, 24(s3)   # t0 ← Mem[s3 + 24]
    ❑ Which memory word will be loaded into t0?
      With s3 = 0x12004094: 24 (decimal) + s3 = 0x00000018 + 0x12004094 = 0x120040ac
      (low byte: 0001 1000 + 1001 0100 = 1010 1100)
      ➔ the word at address 0x120040ac is loaded into t0
    [Figure: memory drawn as data words at word addresses 0x00000000, 0x00000004, 0x00000008, 0x0000000c, ..., 0xffffffff]
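The address calculation on this slide is a single 32-bit addition. A short Python sketch follows; the helper name `effective_address` is hypothetical, and the negative-offset case is an added illustration, not part of the slide.

```python
def effective_address(base, offset):
    """Base register value plus signed offset, wrapped to 32 bits."""
    return (base + offset) & 0xFFFFFFFF


s3 = 0x12004094                      # base register value from the slide
addr = effective_address(s3, 24)     # lw t0, 24(s3)
below = effective_address(s3, -4)    # illustration: a negative offset
print(hex(addr), hex(below))
```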
  • 30. IT3030E, Fall 2024 30 RISC-V Instruction set: Load Instruction ❑ Given the integer array A stored in memory, with base address stored in x13. int A[100]; //x13 holds address of A[0] ❑ What is equivalent C code of this? lw x10, 12(x13) addi x12, x10, 10 sw x12, 40(x13)
  • 31. IT3030E, Fall 2024 31 RISC-V control flow instructions
    ❑ RISC-V conditional branch instructions:
        bne rs1, rs2, Dest   # go to Dest if rs1 != rs2
        beq rs1, rs2, Dest   # go to Dest if rs1 == rs2
        bge rs1, rs2, Dest   # go to Dest if rs1 >= rs2
        blt rs1, rs2, Dest   # go to Dest if rs1 < rs2
      bgeu/bltu: unsigned comparison
    ❑ Ex: if (i==j) h = i + j;
              bne s0, s1, Exit
              add s3, s0, s1
        Exit: ...
  • 32. IT3030E, Fall 2024 32 Example
        start: addi s0, zero, 2    # load value for s0
               addi s1, zero, 2
               addi s3, zero, 0
               beq  s0, s1, Exit
               add  s3, s2, s1
        Exit:  add  s2, s3, s1
        .end start
    What is the final value of s2?
  • 33. IT3030E, Fall 2024 33 Unconditional branch ❑ Unconditional branch instruction or jump instruction: j Dest #go to Dest ❑ Note: this is a pseudo-instruction implemented with the jal instruction, and auipc instruction if necessary
  • 34. IT3030E, Fall 2024 34 Comparison instruction
    ❑ Set a register to 1 or 0 based on a condition: slt
    ❑ Set on less than instruction:
        slt   t0, s0, s1   # if s0 < s1 then t0 = 1 else t0 = 0 (signed)
    ❑ Alternate versions of slt
        slti  t0, s0, 25   # if s0 < 25 then t0 = 1 ... (signed)
        sltu  t0, s0, s1   # if s0 < s1 then t0 = 1 ... (unsigned)
        sltiu t0, s0, 25   # if s0 < 25 then t0 = 1 ... (unsigned)
    ❑ Can be combined with bne/beq for conditional branches
  • 35. IT3030E, Fall 2024 35 Example
    ❑ Write assembly code to do the following
        if (i<5) X = 3; else X = 10;
      Solution (i in s1, X in t1)
              slti t0, s1, 5        # t0 = 1 if i < 5, else 0
              beq  t0, zero, else   # inverse condition: if i >= 5 goto else part
              addi t1, zero, 3      # X = 3
              j    endif            # skip the else part
        else: addi t1, zero, 10     # X = 10
        endif: ...
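The signed and unsigned variants differ only in how the same 32 bits are interpreted. A minimal Python model, assuming 32-bit register values; the helper `to_signed` is hypothetical.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def slt(a, b):
    """Signed set-on-less-than: result is 1 or 0."""
    return 1 if to_signed(a) < to_signed(b) else 0


def sltu(a, b):
    """Unsigned set-on-less-than: result is 1 or 0."""
    return 1 if (a & MASK) < (b & MASK) else 0


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned
print(slt(0xFFFFFFFF, 1), sltu(0xFFFFFFFF, 1))
```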
  • 36. IT3030E, Fall 2024 36 Representation of RISC-V instruction ❑ All RV32I base instructions are 32 bits wide [Figure source: The RISC-V Instruction Set Manual, Volume I: User-Level ISA]
  • 37. IT3030E, Fall 2024 37 R-format instruction: all operands are registers ❑ All fields are encoded by mnemonic names ❑ Examples
  • 38. IT3030E, Fall 2024 38 Example of R-format instruction
        add s1, s4, s5   ➔   add x9, x20, x21
        Field:    funct7   rs2    rs1    funct3  rd     opcode
        Decimal:  0        21     20     0       9      51
        Binary:   0000000  10101  10100  000     01001  0110011
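The field values above can be packed into the 32-bit instruction word with shifts and ORs. A sketch in Python follows; `encode_r` is a hypothetical helper name, and the bit positions are those of the standard R-format (funct7 at bit 25, rs2 at 20, rs1 at 15, funct3 at 12, rd at 7, opcode at 0).

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack R-format fields into a 32-bit instruction word."""
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# add x9, x20, x21  (add s1, s4, s5)
word = encode_r(funct7=0, rs2=21, rs1=20, funct3=0, rd=9, opcode=0b0110011)
print(f"0x{word:08X}")
print(f"{word:032b}")
```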
  • 39. IT3030E, Fall 2024 39 I-format instruction: 2 registers + 1 immediate ❑ Combines the funct7 and rs2 for 12-bit immediate ❑ Examples
  • 40. IT3030E, Fall 2024 40 Example ❑ Find machine codes of the following instructions lw t0, 0(s1) # initialize maximum to A[0] addi t1, zero, 0 # initialize index i to 0 addi t1, t1, 1 # increment index i by 1
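One way to work this exercise is to pack the I-format fields directly. The Python sketch below assumes the standard register numbers (zero = x0, t0 = x5, t1 = x6, s1 = x9), opcode 0000011 with funct3 010 for lw, and opcode 0010011 with funct3 000 for addi; `encode_i` is a hypothetical helper name.

```python
def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack I-format fields; imm is a signed 12-bit value."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-format immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


LW, ADDI = 0b0000011, 0b0010011

lw_t0 = encode_i(0, rs1=9, funct3=0b010, rd=5, opcode=LW)        # lw   t0, 0(s1)
addi_init = encode_i(0, rs1=0, funct3=0b000, rd=6, opcode=ADDI)  # addi t1, zero, 0
addi_inc = encode_i(1, rs1=6, funct3=0b000, rd=6, opcode=ADDI)   # addi t1, t1, 1

for word in (lw_t0, addi_init, addi_inc):
    print(f"0x{word:08X}")
```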
  • 41. IT3030E, Fall 2024 41 S-format instruction: 2 registers + 1 immediate ❑ Uses the funct7 and rd field positions to hold the 12-bit immediate ❑ Used for the store instructions, which do not require rd ❑ Examples
  • 42. IT3030E, Fall 2024 42 B-format instruction: 2 registers + 1 immediate ❑ Uses the funct7 and rd field positions to hold a 13-bit immediate l The LSB is always 0 and is not stored: only imm[12:1] is encoded, giving half-word granularity for instruction addresses (more on this later). l Most immediate bits keep the same bit positions as in S-format l The MSB is always in bit 31 of the instruction word (simplifies sign-extension, also more on this later) ❑ As a result, the bits of the 13-bit immediate are stored out of order ❑ Used for conditional branch instructions
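The scrambled immediate is easiest to see by packing a branch by hand. The Python sketch below follows the standard B-format bit placement (imm[12] in bit 31, imm[10:5] in bits 30-25, imm[4:1] in bits 11-8, imm[11] in bit 7); `encode_b` is a hypothetical helper name, and opcode 1100011 with funct3 000 is assumed for beq.

```python
def encode_b(offset, rs2, rs1, funct3, opcode=0b1100011):
    """Pack a B-format branch; offset is a signed, even byte offset from the PC."""
    if offset % 2 != 0 or not -4096 <= offset <= 4094:
        raise ValueError("branch offset must be even and fit in 13 signed bits")
    imm = offset & 0x1FFF  # 13-bit two's-complement pattern, bit 0 is always 0
    return (
        (((imm >> 12) & 0x1) << 31)    # imm[12]
        | (((imm >> 5) & 0x3F) << 25)  # imm[10:5]
        | (rs2 << 20)
        | (rs1 << 15)
        | (funct3 << 12)
        | (((imm >> 1) & 0xF) << 8)    # imm[4:1]
        | (((imm >> 11) & 0x1) << 7)   # imm[11]
        | opcode
    )


forward = encode_b(8, rs2=0, rs1=10, funct3=0)    # beq x10, x0, PC+8
backward = encode_b(-4, rs2=0, rs1=0, funct3=0)   # beq x0, x0, PC-4
print(f"0x{forward:08X} 0x{backward:08X}")
```

Because imm[12] always lands in bit 31, the hardware can sign-extend every immediate format from the same instruction bit.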
  • 43. IT3030E, Fall 2024 43 U- and J-format instruction: 1 register + 1 immediate
    ❑ Combines the funct7, rs2, rs1 and funct3 fields for a 20-bit immediate
    ❑ U-format: for load/add 20-bit upper immediate to register
        lui   rd, upimm   # rd ← {upimm, 0x000}
        auipc rd, upimm   # rd ← PC + {upimm, 0x000}
    ❑ J-format: for jump and link
        jal rd, label     # rd ← PC+4, PC ← PC + addr, where addr = SignExt{imm, 0}
    ❑ Pseudo-instruction
        j label ➔ jal x0, label
  • 44. IT3030E, Fall 2024 44 Working with wide immediates and addresses
    ❑ Many operations need 32-bit immediates
      - Loading 32-bit immediates to registers
      - Loading addresses to registers
    ❑ However, instructions are only 32 bits long
      - Not enough room to hold a 32-bit immediate in one instruction
      - → combine 2 instructions to support wide immediates
    ❑ Example: load the value 0x003D0100 into s0
        lui  s0, 0x003D0     # s0 ← 0x003D0000
        addi s0, s0, 0x100   # s0 ← 0x003D0100
    ❑ Pseudo-instructions: combinations of real instructions, for convenience
      - li, la
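Splitting a constant into a lui part and an addi part needs one extra step that the example above does not exercise: addi sign-extends its 12-bit immediate, so when bit 11 of the constant is 1 the upper part must be increased by one to compensate. The Python sketch below shows one way to do the split; `split_imm32` is a hypothetical helper, and 0xDEADBEEF is an arbitrary illustration value.

```python
def split_imm32(value):
    """Return (upper20, lower12) such that lui upper20 followed by
    addi lower12 produces value; lower12 is signed."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    if lower >= 0x800:        # addi would sign-extend this as negative
        lower -= 0x1000
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower


def rebuild(upper, lower):
    """Model lui followed by addi on a 32-bit register."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


print(split_imm32(0x003D0100))
print(split_imm32(0xDEADBEEF))
```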
  • 45. IT3030E, Fall 2024 45 Working with wide immediates and addresses
    ❑ Special case: long jump
    ❑ Conditional branches: blt, bne, ...
      - B-format, with a 13-bit signed byte offset (12 bits encoded, LSB always 0)
      - Limited to ±4 KB → limited branching distance
      - Solution: invert the condition and jump with jal
              beq x10, x0, L1   # limit ±4 KB
        becomes
              bne x10, x0, L2
              jal x0, L1        # limit ±1 MB
        L2:
    ❑ Unconditional jump (jal)
      - Distance is limited to ±1 MB
      - Solution: use jalr, combined with auipc if necessary
              jalr rd, rs1, imm   # PC = rs1 + SignExt(imm), rd = (old PC) + 4
  • 46. IT3030E, Fall 2024 46 Exercise
    ❑ How is a branch instruction executed?
              slti t0, s1, 5
              bne  t0, zero, else   # How can the CPU jump from here to the “else” label?
              addi t1, zero, 3
              j    endif
        else: addi t1, zero, 10
        endif: ...
    ❑ ➔ PC-relative addressing mode
  • 47. IT3030E, Fall 2024 47 Example: the simple switch
        switch(test) {
          case 0: a=a+1; break;
          case 1: a=a-1; break;
          case 2: b=2*b; break;
          default:
        }
    Assuming that: test, a, b are stored in s1, s2, s3, and t0, t1, t2 hold the constants 0, 1, 2
    Solution
                beq  s1, t0, case_0
                beq  s1, t1, case_1
                beq  s1, t2, case_2
                j    default
        case_0: addi s2, s2, 1     # a=a+1
                j    continue
        case_1: sub  s2, s2, t1    # a=a-1
                j    continue
        case_2: add  s3, s3, s3    # b=2*b
                j    continue
        default:
        continue:
  • 48. IT3030E, Fall 2024 48 Example
    ❑ Write assembly code corresponding to the following C code
        for (i = 0; i < n; i++) sum = sum + A[i];
    Assuming that: sum and i are in s0 and s1 (both initially 0), the address of A is in a0, n is in a1, and n > 0
        loop: add  t1, s1, s1     # t1 = 2*i
              add  t1, t1, t1     # t1 = 4*i
              add  t1, t1, a0     # t1 <- address of A[i]
              lw   t0, 0(t1)      # load value of A[i] in t0
              add  s0, s0, t0     # sum = sum+A[i]
              addi s1, s1, 1      # i = i+step
              bne  s1, a1, loop   # if i != n, goto loop
  • 49. IT3030E, Fall 2024 49 Example
    The simple while loop: while (A[i]==k) i=i+1;
    Assuming that: i, k, A are stored in x22, x24, x25
    Solution
        Loop: slli x10, x22, 2      # i*4
              add  x10, x10, x25    # A[i] address
              lw   x9, 0(x10)       # A[i] value
              bne  x9, x24, Exit    # break if != k
              addi x22, x22, 1      # next element
              beq  x0, x0, Loop
        Exit: …
  • 50. IT3030E, Fall 2024 50 Procedures ❑ Stack structure ❑ Passing control l To beginning of procedure code l Back to return point ❑ Passing data l Procedure arguments l Return value ❑ Register saving conventions ❑ Memory management l Allocate during procedure execution l Deallocate upon return P(…) { • • y = Q(x); print(y) • } int Q(int i) { int t = 3*i; int v[10]; • • return v[t]; }
  • 51. IT3030E, Fall 2024 51 Stack structure ❑ A region of memory operating on a Last In First Out (LIFO) principle ❑ The bottom of the stack is at the highest address ❑ sp: points to the top of the stack [Figure: stack before a call, with items a, b, c; the frame for the current procedure lies between fp and sp and holds saved registers and local variables]
  • 52. IT3030E, Fall 2024 52 Stack structure ❑ To push data into stack l addi sp, sp, -4 l sw t0, 0(sp) ❑ To pop data from the stack l lw t0, 0(sp) l addi sp, sp, 4 l Note that: the data is still there in the stack, but we are not going to work with it anymore.
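The push and pop sequences can be modelled with a dictionary standing in for memory. This Python sketch assumes 4-byte items; the `Stack` class and its initial sp value are hypothetical. It also shows the slide's last point: popping only moves sp, the old value stays in memory.

```python
class Stack:
    """Word-addressed model of the push/pop sequences on the slide."""

    def __init__(self, top=0x7FFFFFF0):  # hypothetical initial sp value
        self.sp = top
        self.memory = {}

    def push(self, value):
        self.sp -= 4                              # addi sp, sp, -4
        self.memory[self.sp] = value & 0xFFFFFFFF  # sw t0, 0(sp)

    def pop(self):
        value = self.memory[self.sp]              # lw t0, 0(sp)
        self.sp += 4                              # addi sp, sp, 4
        return value


stack = Stack()
stack.push(10)
stack.push(20)
print(stack.pop(), stack.pop())  # last in, first out
```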
  • 53. IT3030E, Fall 2024 53 Passing control flow ❑ Procedure call: using RISC-V procedure call instruction jal rd, ProcAddress #jump and link l Saves the return address (PC+4) in destination register rd (usually in ra or x1) l Jump to the ProcAddress ❑ Return address: l Address of the next instruction right after call ❑ Procedure return: procedure return with jalr x0, 0(x1) l Update the value of PC = ra l Jump to the address of PC (the next instruction right after procedure call)
  • 54. IT3030E, Fall 2024 54 Passing control [Figure: main prepares to call and executes jal proc; proc saves registers, does its work, restores them, and returns with jalr x0, 0(x1) (jr ra), which restores the PC; main then prepares to continue]
  • 55. IT3030E, Fall 2024 55 Passing control ❑ Demonstration on the RARS simulator ❑ Pay attention to the values of the pc and ra registers!
  • 56. IT3030E, Fall 2024 56 Procedure call and nested procedure call Example of nested procedure call Question: how can the CPU resume the main program execution?
  • 57. IT3030E, Fall 2024 57 Passing data ❑ Use registers l Input arguments: - a0-a7 - 8 parameters (arguments) maximum l Return value: - a0 ❑ What if we want to pass more than 8 arguments? → use the stack: l Caller pushes arguments onto the stack before calling the callee l Callee gets arguments from the stack l (Optional) Callee saves the return value to the stack l Question: who cleans up the stack, caller or callee?
  • 58. IT3030E, Fall 2024 58 Register saving convention ❑ Registers to be saved by the caller l ra, t0-t6, a0-a7 ❑ Registers must be saved by the callee l sp, s0-s11 ❑ Note: save the registers just in case you need to modify them. ❑ Question: where to save the above registers?
  • 59. IT3030E, Fall 2024 59 Memory management ❑ Question: where to locate the local variables: t and v? l Use registers: # of registers is limited l Use Stack P(…) { • • y = Q(x); print(y) • } int Q(int i) { int t = 3*i; int v[10]; • • return v[t]; }
  • 60. IT3030E, Fall 2024 60 Stack Frame ❑ Current stack frame (“top” to bottom) l “Argument build”: parameters for the function about to be called l Local variables, if they cannot be kept in registers l Saved register context l Old frame pointer (optional) ❑ Caller stack frame l Return address - placed in ra by the jal instruction; the callee saves it on the stack if needed l Arguments for this call (9th argument onward) [Figure: stack layout showing the caller frame (arguments 9+), return address, old fp (optional), saved registers + local variables, and argument build (optional), with the frame pointer fp (optional) and the stack pointer sp]
  • 61. IT3030E, Fall 2024 61 Six Steps in the Execution of a Procedure 1. Main routine (caller) places parameters in a place where the procedure (callee) can access them l a0 – a7 (x10 – x17): 8 argument registers 2. Caller transfers control to the callee (jal) 3. Callee acquires the storage resources needed 4. Callee performs the desired task 5. Callee places the result value in a place where the caller can access it l a0 – a1: two value registers for result values 6. Callee returns control to the caller (jalr) l ra (x1): one return address register to return to caller
  • 62. IT3030E, Fall 2024 62 Procedure that does not call another proc.
    ❑ C code:
        int leaf_example (int g, int h, int i, int j) {
            int f;
            f = (g + h) - (i + j);
            return f;
        }
      - g, h, i, j stored in a0, a1, a2, a3
      - f in s0 (needs to be saved)
      - t0 and t1 used for temporary data
      - Preserve all of s0, t0, t1 for safety
      - Result in a0
  • 63. IT3030E, Fall 2024 63 Sample code
        leaf_example:
            addi sp, sp, -12     # room for 3 items
            sw   t1, 8(sp)       # save t1
            sw   t0, 4(sp)       # save t0
            sw   s0, 0(sp)       # save s0
            add  t0, a0, a1      # t0 = g+h
            add  t1, a2, a3      # t1 = i+j
            sub  s0, t0, t1      # s0 = (g+h)-(i+j)
            add  a0, s0, zero    # return value in a0
            lw   s0, 0(sp)       # restore s0
            lw   t0, 4(sp)       # restore t0
            lw   t1, 8(sp)       # restore t1
            addi sp, sp, 12      # deallocate
            jalr zero, 0(ra)     # return to caller
    Exercise: write code to utilize the procedure above
  • 64. IT3030E, Fall 2024 64 Stack usage
  • 65. IT3030E, Fall 2024 65 Procedure with nested proc.
    ❑ C code:
        int fact (int n) {
            if (n < 1) return (1);
            else return n * fact(n - 1);
        }
      - n in a0
      - Result in a0
  • 66. IT3030E, Fall 2024 66 Sample code
        fact: addi sp, sp, -8      # 2 items in stack
              sw   ra, 4(sp)       # save return address
              sw   a0, 0(sp)       # and current n
              addi t0, a0, -1      # t0 = n-1
              bge  t0, zero, L1    # if n-1 >= 0, continue at L1
              addi a0, zero, 1     # the base case, return 1
              addi sp, sp, 8       # deallocate 2 words
              jr   ra              # and return
        L1:   addi a0, a0, -1      # otherwise reduce n
              jal  fact            # then call fact again
              add  a1, a0, zero    # a1 = fact(n-1)
              lw   a0, 0(sp)       # restore n
              lw   ra, 4(sp)       # and return address
              addi sp, sp, 8       # shrink stack
              mul  a0, a0, a1      # value for normal case: multiply with n
              jalr x0, 0(ra)       # and return
  • 67. IT3030E, Fall 2024 67 RISC-V memory configuration ❑ Program text: stores machine code of program, declared with .text ❑ Static data: data segment, declared with .data ❑ Heap: for dynamic allocation ❑ Stack: for local variable and dynamic allocation (LIFO)
  • 68. IT3030E, Fall 2024 68 Accessing characters and string
    ❑ String is accessed as array of characters
    ❑ Accessing 1-byte characters
        lb  rd, imm(rs1)    # load byte with sign-extension
        lbu rd, imm(rs1)    # load byte with zero-extension
        sb  rs2, 0(rs1)     # store LSB to memory
  • 69. IT3030E, Fall 2024 69 Accessing characters and string
    ❑ Accessing 2-byte characters
        lh  rd, imm(rs1)    # load half with sign-extension
        lhu rd, imm(rs1)    # load half with zero-extension
        sh  rs2, 0(rs1)     # store 2 LSB to memory
    ❑ Example: string copy
        void strcpy (char x[], char y[]) {
            int i = 0;
            while ((x[i] = y[i]) != '\0')
                i += 1;
        }
  • 70. IT3030E, Fall 2024 70 Accessing characters and string
        # x and y are in a0 and a1, i in s0
        strcpy: addi sp, sp, -4       # adjust stack for 1 more item
                sw   s0, 0(sp)        # save s0
                add  s0, zero, zero   # i = 0
        L1:     add  t1, s0, a1       # address of y[i] in t1
                lbu  t2, 0(t1)        # t2 = y[i]
                add  t3, s0, a0       # address of x[i] in t3
                sb   t2, 0(t3)        # x[i] = y[i]
                beq  t2, zero, L2     # if y[i] == 0, go to L2
                addi s0, s0, 1        # i = i + 1
                beq  x0, x0, L1       # go to L1
        L2:     lw   s0, 0(sp)        # y[i] == 0: end of string; restore old s0
                addi sp, sp, 4        # pop 1 word off stack
                jalr zero, 0(ra)      # return
  • 71. IT3030E, Fall 2024 71 Interchange sort function
        void sort (int v[], int n) {
            int i, j;
            for (i = 0; i < n; i += 1) {
                for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
                    swap(v, j);
                }
            }
        }
        void swap(int v[], int k) {
            int temp;
            temp = v[k];
            v[k] = v[k+1];
            v[k+1] = temp;
        }
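The byte-by-byte copy loop can be traced on a model of memory. The Python sketch below uses a `bytearray` as memory and integer addresses for x and y; the function mirrors the loop structure of the assembly (load byte, store byte, test for zero, increment), and the addresses used in the demo are arbitrary.

```python
def strcpy(mem, x, y):
    """Copy the zero-terminated string at address y to address x.

    Returns the number of characters copied, not counting the terminator.
    """
    i = 0
    while True:
        byte = mem[y + i]   # lbu t2, 0(t1)
        mem[x + i] = byte   # sb  t2, 0(t3)
        if byte == 0:       # beq t2, zero, L2
            break
        i += 1              # addi s0, s0, 1
    return i


mem = bytearray(16)
mem[8:11] = b"hi\x00"       # source string placed at address 8
copied = strcpy(mem, 0, 8)  # destination at address 0
print(copied, bytes(mem[0:3]))
```

Note that the terminating zero byte is stored before the loop exits, just as in the assembly, where sb comes before the beq test.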
  • 72. IT3030E, Fall 2024 72 RISC-V Instruction Set Extensions ❑ The RV32I Instruction Set (that we have learnt so far) l Instruction word: 32 bits l Only works on integers l Supports arithmetic, logic and shift, data transfer, branches ❑ How about: l Other instruction word lengths? l Data other than integers? l Additional operations: multiplication, division…? ❑ Instruction Set Extensions l Additional operations and data types l Additional formats and custom formats l → RISC-V scalable ISA
  • 73. IT3030E, Fall 2024 73 RISC-V Standard Extensions ❑ 32-bit instruction extensions l “M”: Integer Multiplication and Division Instructions l “A”: Atomic (Memory) Instructions l “F”: Single-Precision Floating-Point Instructions l “D”: Double-Precision Floating-Point Instructions ❑ 16-bit: “C”: Compressed Instructions RISC-V instruction length encoding
  • 74. IT3030E, Fall 2024 74 RV32M: Integer Multiplication and Division Extension ❑ Support integer multiplication, division (div and rem) operations. ❑ All are R-format.
  • 75. IT3030E, Fall 2024 75 RV32A: Atomic Extension ❑ Support synchronized “atomic” memory access. l Load + data op + Store become atomic. l Similar to semaphore/mutex in multithread software. l Basically R-format, with aq (acquire) and rl (release) bits.
  • 76. IT3030E, Fall 2024 76 RV32F / D Floating-Point Extensions ❑ Support floating point operations. l Additional floating point register file for new data type. l Additional instructions to work with the new register file. l Additional load/store instructions. ❑ Data representation and computation are compliant with the IEEE 754-2008 standard (chapter 4). l “F”: 32-bit single precision floating point numbers (float in C). l “D”: 64-bit double precision floating point numbers (double in C). RISC-V floating point register file
  • 77. IT3030E, Fall 2024 77 RV32F / D Floating-Point Extensions ❑ Single precision instructions ❑ .s for single, .d for double
  • 78. IT3030E, Fall 2024 78 RVC Compressed Extension ❑ 16-bit length instructions l Double code density compared to 32-bit instructions. l Limited to most frequently-used instructions/operands. l Overall 25% - 30% code-size reduction. RVC Instruction Formats RVC Registers for CIW, CL, CS, CB instructions
  • 79. IT3030E, Fall 2024 79 RVC Compressed Instructions
  • 80. IT3030E, Fall 2024 80 Further reading ❑ MIPS instruction set ❑ ARM instruction set ❑ x86 instruction set
  • 81. IT3030E, Fall 2024 81 The end