chapter 6 here is about risc processors and ciscs

Processor Design
The Language of Bits
Prof. Smruti Ranjan Sarangi
IIT Delhi
Basic Computer Architecture
Chapter 6: RISC-V Assembly Language

(c) Smruti R. Sarangi, 2024
Download the PDF of the book, videos, slides, software, and solution manual: www.basiccomparch.com
Print version (Publisher: WhiteFalcon, 2021): available on e-commerce sites.
The PDF version of the book and all the learning resources can be freely downloaded from the website: www.basiccomparch.com
2nd version

Outline
* RISC-V Machine Model
* Integer Instructions
* Control Transfer and Memory Instructions
* Floating Point Instructions
* Instruction Encoding
Code snippets beautified with carbon.app (https://carbon.now.sh)

Requirements of a Modern ISA
Why did the need for RISC-V arise?
• Compatible with all kinds of devices: 32, 64, and 128 bits
• No licensing requirement
• Large software developer community
• Extensibility: possible to add all kinds of novel extensions (vector, crypto, AI/ML, etc.)
Timeline
• Project begins in Berkeley (2010)
• Creative Commons License
• Maintained by the RISC-V Foundation (2015)

RISC-V Machine Model
Different variants of the instruction set
Name     Description
RV32     32-bit ISA
RV32E    32-bit ISA (embedded version)
RV64     64-bit ISA
RV64E    64-bit ISA (embedded version)
RV128    128-bit ISA
Extensions
Mnemonic  Meaning
E         Embedded version
I         Base integer ISA
M         Integer multiplication/division instructions
A         Atomic instructions
F         Single-precision floating point
D         Double-precision floating point
V         Vector instructions

More about Extensions
• Integer variants of the ISA
  • RV32I is a regular 32-bit ISA with integer instructions
  • RV64I, RV128I
• RV32IMA → integer, mul/div, and atomic instructions
• RV32I1p3
  • Major version number: 1
  • Minor version number: 3
• G → base integer ISA, additional integer instructions, FP instructions, basic synchronization instructions
• Popular format: RV32G

Embedded and Compressed Instructions
• We quickly run out of letters!
  • The 'Z' series was introduced
  • Zfa → additional floating-point instructions
• Embedded version
  • Does not reduce the instruction width (still 32 bits)
  • Reduces the number of registers by 50%
• Compressed version
  • Width = 16 bits
  • Access only 8 registers
  • Limited opcode support, smaller immediate values
• Even in the 64-bit and 128-bit formats, instructions are 32 bits wide.
• 48-bit and 64-bit instructions are getting ratified.

RISC-V Integer Registers
Register  Mnemonic  Description
x0        zero      Hardwired to zero
x1        ra        Return address
x2        sp        Stack pointer
x3        gp        Global pointer
x4        tp        Thread pointer (thread-local storage)
x5-7      t0-2      Temporary registers
x8        s0/fp     Saved register/frame pointer
x9        s1        Saved register
x10-11    a0-1      Function args/return values
x12-17    a2-7      Function args
x18-27    s2-11     Saved registers
x28-31    t3-6      Temporary registers
• x0 is hardwired to 0
• There is a dedicated pc register
• Bi-endian

[Figure: View of Registers. Registers x0/zero through x31 and the PC, shown with their widths in RV32 (32 bits), RV64 (64 bits), and RV128 (128 bits).]


Moving values to registers
Semantics            Example           Explanation
addi rd, rs1, imm    addi x1, x0, 5    x1 ← 0 + 5
add rd, rs1, rs2     add x1, x0, x3    x1 ← 0 + x3
• Register x0/zero is hardwired to 0
• Moving a value to a register is accomplished by adding it to zero
• Immediates are normally 12 bits in RISC-V
• The addi instruction adds an immediate to a register (which could be zero)
• The regular add instruction can move a value from one register to another if the first source register is set to zero
• There is no subi instruction because the immediate is signed

Loading Immediates Directly
• The lui instruction loads a 20-bit immediate directly into the upper 20 bits of a register.
• Note that other instructions only load the lower 12 bits
• The li assembler directive (a pseudo-instruction) can load a 32-bit value directly
  • Translated to two instructions (internally)
• We can either use the x-series registers or the mnemonics
  • The mnemonics are normally preferred because they make it possible to honor the register-usage semantics.
Semantics               Example              Explanation
lui rd, imm (20-bit)    lui x1, 5            x1 ← 5 << 12
li rd, imm (32-bit)     li t0, 0xABCD1234    t0 ← 0xABCD1234
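The two-instruction translation of li can be illustrated with a small Python model. This is only a sketch: it assumes the usual lui + addi pairing, and the helper names split_li and lui_addi are made up for illustration. The one subtlety is that addi sign-extends its 12-bit immediate, so the upper part must be bumped by one whenever the lower 12 bits are 0x800 or more.

```python
def split_li(value):
    """Split a 32-bit constant into (upper20, lower12) for a lui + addi pair."""
    value &= 0xFFFFFFFF
    lower = value & 0xFFF
    # addi sign-extends its 12-bit immediate, so treat it as signed
    if lower >= 0x800:
        lower -= 0x1000
    # Compensate in the upper part for a negative lower part
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower


def lui_addi(upper, lower):
    """Model of: lui rd, upper ; addi rd, rd, lower (32-bit wraparound)."""
    return ((upper << 12) + lower) & 0xFFFFFFFF


upper, lower = split_li(0xABCD1234)
print(hex(upper), hex(lower))          # the two immediates
print(hex(lui_addi(upper, lower)))     # reconstructs the original constant
```

An actual assembler may pick a different (or shorter) sequence, for example a single addi when the constant fits in 12 bits.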

Add 409932 + 409823.

• Using assembler directives like li makes writing assembly code easier
Add 409932 + 409823.

Add/Sub Instructions
Semantics            Example           Explanation
add rd, rs1, rs2     add x1, x2, x3    x1 ← x2 + x3
addi rd, rs1, imm    addi x1, x2, 5    x1 ← x2 + 5
sub rd, rs1, rs2     sub x1, x2, x3    x1 ← x2 - x3

Compute 4 + 5 - 19.

Swap two numbers stored in x1 and x2 (resp.).

Multiplication and Division
• Part of the 'M' extension
• The default mul instruction computes the lower 32 bits of the product
  • The full product ideally requires 64 bits
• The mulh and mulhu instructions compute the upper 32 bits (for signed and unsigned operands, respectively)
• The HW can dynamically recognize a consecutive mul-mulh instruction pair and replace it with a single multiplication
Semantics           Example           Explanation
mul rd, rs1, rs2    mul x1, x2, x3    x1 ← x2 * x3
div rd, rs1, rs2    div x1, x2, x3    x1 ← x2 / x3
rem rd, rs1, rs2    rem x1, x2, x3    x1 ← remainder of (x2 / x3)
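How the lower and upper halves of a product relate can be sketched in Python for the 32-bit case. The function names below mirror the instruction mnemonics, but the code is only an illustrative model of the arithmetic, not of any hardware implementation.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):
    """Lower 32 bits of the product."""
    return (to_signed32(a) * to_signed32(b)) & MASK32


def mulh(a, b):
    """Upper 32 bits of the signed x signed 64-bit product."""
    return ((to_signed32(a) * to_signed32(b)) >> 32) & MASK32


def mulhu(a, b):
    """Upper 32 bits of the unsigned x unsigned 64-bit product."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


# 3 * (-17) = -51: the lower half holds -51, the upper half is all ones
print(to_signed32(mul(3, -17)), hex(mulh(3, -17)))
```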

Division Rules
Division Operation   Quotient   Remainder
4 ÷ 3                 1          1
4 ÷ (-3)             -1          1
(-4) ÷ 3             -1         -1
(-4) ÷ (-3)           1         -1
• The quotient is rounded towards 0
• The sign of the remainder is the same as the sign of the dividend
• Consecutive division and remainder instructions (with the same operands) can be fused.
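The rounding rules above differ from Python's own // and % operators (which round towards -∞), so a short model helps to check the table. This is a sketch with a hypothetical helper name, div_rem, and it assumes a nonzero divisor.

```python
def div_rem(dividend, divisor):
    """Quotient rounded towards zero; remainder takes the sign of the dividend.

    Assumes divisor != 0.
    """
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor
    return quotient, remainder


for dividend, divisor in [(4, 3), (4, -3), (-4, 3), (-4, -3)]:
    print(dividend, divisor, div_rem(dividend, divisor))
```

In every case, dividend = quotient × divisor + remainder holds.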

Multiply 3 with (-17).

Compute 12³ + 1.

Divide -50 / 3.
Dynamically fused by hardware.

Logical and Shift Instructions
Semantics            Example            Explanation
and rd, rs1, rs2     and x1, x2, x3     x1 ← x2 AND x3
andi rd, rs1, imm    andi x1, x2, 6     x1 ← x2 AND 6
or rd, rs1, rs2      or x1, x2, x3      x1 ← x2 OR x3
ori rd, rs1, imm     ori x1, x2, 9      x1 ← x2 OR 9
xor rd, rs1, rs2     xor x1, x2, x3     x1 ← x2 XOR x3
xori rd, rs1, imm    xori x1, x2, 9     x1 ← x2 XOR 9
sll rd, rs1, rs2     sll x1, x2, x3     x1 ← x2 << x3
srl rd, rs1, rs2     srl x1, x2, x3     x1 ← x2 >> x3
sra rd, rs1, rs2     sra x1, x2, x3     x1 ← x2 >>> x3
slli rd, rs1, imm    slli x1, x2, 3     x1 ← x2 << 3
srli rd, rs1, imm    srli x1, x2, 3     x1 ← x2 >> 3
srai rd, rs1, imm    srai x1, x2, 3     x1 ← x2 >>> 3
Note: >>> denotes the arithmetic right shift (sra, srai); >> denotes the logical right shift.
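The difference between the logical and the arithmetic right shift only shows up for values whose most significant bit is set. The following Python sketch models the three register-register shifts for RV32, assuming (as in the RISC-V specification) that only the low 5 bits of the shift amount are used.

```python
MASK32 = 0xFFFFFFFF


def sll(x, shamt):
    """Shift left logical; bits shifted past bit 31 are dropped."""
    return (x << (shamt & 31)) & MASK32


def srl(x, shamt):
    """Shift right logical; zeros are shifted in from the left."""
    return (x & MASK32) >> (shamt & 31)


def sra(x, shamt):
    """Shift right arithmetic; the sign bit is replicated."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative number
    return (x >> (shamt & 31)) & MASK32


value = 0xFFFFFFF0            # -16 as a 32-bit two's complement number
print(hex(srl(value, 4)), hex(sra(value, 4)))
```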

Generic Format of Shift Instructions
Generic format
{sll | srl | sra}      rd, rs1, rs2
{slli | srli | srai}   rd, rs1, imm

Compute the bitwise OR of A and B. Let A = 4 and B = 1.

Compute t1 = t2 + t3 × 4
Strength reduction: multiplications are slow; shifts are much faster.


Conditional Control Transfer Instructions
Semantics               Example              Explanation
beq rs1, rs2, label     beq x1, x2, .foo     Branch to the .foo label if x1 = x2
bne rs1, rs2, label     bne x1, x2, .foo     Branch to the .foo label if x1 ≠ x2
bge rs1, rs2, label     bge x1, x2, .foo     Branch to the .foo label if x1 ≥ x2
blt rs1, rs2, label     blt x1, x2, .foo     Branch to the .foo label if x1 < x2
bgeu rs1, rs2, label    bgeu x1, x2, .foo    Similar to bge, considers unsigned values
bltu rs1, rs2, label    bltu x1, x2, .foo    Similar to blt, considers unsigned values
slt rd, rs1, rs2        slt x1, x2, x3       if (x2 < x3) set x1 to 1, else 0
slti rd, rs1, imm       slti x1, x2, 5       if (x2 < 5) set x1 to 1, else 0
sltu rd, rs1, rs2       sltu x1, x2, x3      if (x2 <unsigned x3) set x1 to 1, else 0
sltiu rd, rs1, imm      sltiu x1, x2, 5      if (x2 <unsigned 5) set x1 to 1, else 0

Control Transfer Instructions
RISC-V does not have a flags register.
Instead, the result of a comparison is stored in a register that the programmer specifies.
Compare 2 and 5 and store the comparison result in t2. [t2 = (2 < 5)]

Add two 64-bit numbers (each stored in two 32-bit registers)
Note: sltu does an unsigned comparison.
t0 < t4 → there is a carry
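Since there is no flags register, the carry out of the lower halves has to be recovered with an unsigned comparison: if the wrapped 32-bit sum is smaller than one of its operands, the addition overflowed. A minimal Python sketch of this idea follows; the function add64 and its argument names are illustrative and do not reproduce the register assignment used on the slide.

```python
MASK32 = 0xFFFFFFFF


def sltu(a, b):
    """Unsigned set-less-than: 1 if a < b, else 0."""
    return 1 if (a & MASK32) < (b & MASK32) else 0


def add64(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit numbers held as (low, high) 32-bit halves."""
    sum_lo = (a_lo + b_lo) & MASK32          # add (wraps around at 32 bits)
    carry = sltu(sum_lo, a_lo)               # wrapped sum < operand means carry
    sum_hi = (a_hi + b_hi + carry) & MASK32  # add the upper halves and the carry
    return sum_lo, sum_hi


print(add64(0xFFFFFFFF, 0, 1, 0))            # carry propagates to the upper half
```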

Add all the numbers from 1 to 10.

Unconditional Branches
• The jal instruction performs the role of both an unconditional branch and a call instruction:
  • The return address register is explicitly specified (there is no default)
  • If the return address register is zero (x0), then this is a regular unconditional branch instruction
  • Otherwise, it is a call instruction
• jalr is like a return instruction
  • In this case, the return address is stored in rd
  • If the return address register is zero (x0), then nothing is stored. It then functions like an indirect branch.
Semantics               Example            Explanation
jal rd, label           jal x1, func       Jump to the func label and store the return address in x1
jalr rd, rs1, offset    jalr x1, x2, 20    Jump to the address x2 + 20 and store the return address in x1
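The call/return behaviour described above can be sketched as a tiny Python model. It assumes 32-bit instructions (so the return address is pc + 4) and uses a plain list as the register file; the function signatures are hypothetical and chosen only for illustration.

```python
def jal(pc, regs, rd, offset):
    """Jump to pc + offset; store the return address in rd unless rd is x0."""
    if rd != 0:
        regs[rd] = pc + 4
    return pc + offset


def jalr(pc, regs, rd, rs1, offset):
    """Jump to regs[rs1] + offset; store the return address in rd unless rd is x0."""
    target = (regs[rs1] + offset) & ~1   # computed first, in case rd == rs1
    if rd != 0:
        regs[rd] = pc + 4
    return target


regs = [0] * 32
pc = jal(0x100, regs, 1, 0x40)   # call: x1 (ra) receives the return address
pc = jalr(pc, regs, 0, 1, 0)     # return: jump to x1, store nothing
print(hex(pc))
```

With rd = x0, jal degenerates into a plain unconditional branch and jalr into an indirect branch.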

Check whether the number stored in a1 is a perfect square.

Check whether the number stored in a1 is a prime number.

Find the number of ones in a 32-bit number stored in a1.

A Program with a Simple Function Call
Return value

Compute x^n and store the result in a0.
Initialization
Computation

Load-Store Instructions
Semantics            Example           Explanation
lw rd, imm(rs1)      lw x1, 32(sp)     x1 ← mem[sp + 32]
sw rs2, imm(rs1)     sw x1, 32(sp)     mem[sp + 32] ← x1
la rd, label         la x1, pi         x1 ← address(pi)
• The lw and sw instructions load and store 32-bit values
• There is a need for an assembler directive (la)
  • Assign a constant to a label
  • Load the label directly to a register
  • Generate assembly statements to load the constant to a register

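[Figure: Load Instruction, lw x1, imm(x5). The address x5 + imm is read from memory and the value is written to x1 in the register file.]

[Figure: Store Instruction, sw x1, imm(x5). The value of x1 in the register file is written to memory at the address x5 + imm.]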

Load a0 with the contents of the memory address sp − s0 × 4 − 12.
Load 17 into the register s0.

Create a copy of a 10-element array. Assume the starting address of the
original array is stored in a1 and that of the destination array is stored in a2.

Compute the sum of the elements in a 10-element array. Assume that the base
address of the array is stored in a1. Store the result in a0.

Contd…
Add the array
element to a0

Compute the result of the factorial function using a recursive algorithm.

Contd …


Floating Point Registers
Register  Mnemonic  Description
f0-7      ft0-7     Temporary registers
f8-9      fs0-1     Saved registers
f10-11    fa0-1     Arguments/return values
f12-17    fa2-7     Function arguments
f18-27    fs2-11    Saved registers
f28-31    ft8-11    Temporary registers
• No register is hardwired to 0
• It is not possible to load immediate values into FP registers directly
• They are initialized by loading values from memory (like x86) or by converting integer values (fcvt)

Floating Point Control and Status Register
Layout (bit 31 down to bit 0)
Bits     Width   Field
31..8    24      Reserved
7..5     3       Rounding mode (frm)
4..0     5       Accrued exceptions (fflags): NV DZ OF UF NX (1 bit each)
Accrued exceptions (fflags)
Mnemonic  Explanation
NV        Invalid operation
DZ        Divide by zero
OF        Overflow
UF        Underflow
NX        Inexact
Rounding modes
Encoding   Mnemonic  Meaning
000        RNE       Round to nearest, prefer even LSBs
001        RTZ       Round towards zero
010        RDN       Round down (towards -∞)
011        RUP       Round up (towards +∞)
100        RMM       Round to nearest (prefer maximum abs. val.)
101, 110             Invalid. Reserved for future use
111        DYN       Select a rounding mode dynamically (frm field)
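Extracting the two fields of the control and status register is a matter of shifting and masking. The Python sketch below follows the layout given above (frm in bits 7..5, fflags in bits 4..0 with NX as bit 0); the function name decode_fcsr and the returned tuple are illustrative choices, not a defined API.

```python
ROUNDING_MODES = {0: "RNE", 1: "RTZ", 2: "RDN", 3: "RUP", 4: "RMM", 7: "DYN"}
FLAG_NAMES = ["NX", "UF", "OF", "DZ", "NV"]   # bit 0 up to bit 4


def decode_fcsr(fcsr):
    """Return (rounding mode mnemonic, list of accrued exception flags)."""
    frm = (fcsr >> 5) & 0b111
    fflags = fcsr & 0b11111
    flags = [name for bit, name in enumerate(FLAG_NAMES) if fflags & (1 << bit)]
    return ROUNDING_MODES.get(frm, "reserved"), flags


# frm = 001 (RTZ), fflags = 01001 (DZ and NX set)
print(decode_fcsr(0b00101001))
```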

Floating Point Load and Store Operations
• We place the value in memory first, and then load it into an FP register.
Semantics             Example            Explanation
flw rd, imm(rs1)      flw f1, 32(sp)     f1 ← mem[sp + 32]
fsw rs2, imm(rs1)     fsw f1, 32(sp)     mem[sp + 32] ← f1

Load the value of a constant val into the
floating point register fs1.
Define the constant.
Store its address in a1
Load into an FP register.

Arithmetic Instructions
Semantics               Example              Explanation
fadd.s rd, rs1, rs2     fadd.s f1, f2, f3    f1 ← f2 + f3
fsub.s rd, rs1, rs2     fsub.s f1, f2, f3    f1 ← f2 - f3
fmul.s rd, rs1, rs2     fmul.s f1, f2, f3    f1 ← f2 × f3
fdiv.s rd, rs1, rs2     fdiv.s f1, f2, f3    f1 ← f2 ÷ f3
fmin.s rd, rs1, rs2     fmin.s f1, f2, f3    f1 ← min(f2, f3)
fmax.s rd, rs1, rs2     fmax.s f1, f2, f3    f1 ← max(f2, f3)
fsqrt.s rd, rs1         fsqrt.s f1, f2       f1 ← √f2
• .s is for single precision
• .d is for double precision

Compute , and store the result in fa0.

Multiplication and Conversion Instructions
Semantics                     Example                   Explanation
fmadd.s rd, rs1, rs2, rs3     fmadd.s f1, f2, f3, f4    f1 ← f2 * f3 + f4
fmsub.s rd, rs1, rs2, rs3     fmsub.s f1, f2, f3, f4    f1 ← f2 * f3 - f4
fcvt.s.w rd, rs1              fcvt.s.w f1, x5           f1 ← (float) x5
fcvt.w.s rd, rs1              fcvt.w.s x5, f1           x5 ← (int) f1
• fmadd.s and fmsub.s are compound instructions: multiply + add/subtract
• The convert instruction (fcvt) is another way of loading and storing FP values

Compute π × e + 4, and store the result in ft0. Convert the result to an integer and
store the result in a0.
Need to convert
4 to 4.0 first

Floating Point Comparison Instructions
Same as integer comparison: the result is written to an integer register.
Semantics              Example             Explanation
flt.s rd, rs1, rs2     flt.s s1, f2, f3    if (f2 < f3) set s1 to 1
fle.s rd, rs1, rs2     fle.s s1, f2, f3    if (f2 ≤ f3) set s1 to 1
feq.s rd, rs1, rs2     feq.s s1, f2, f3    if (f2 == f3) set s1 to 1

First, initialize a0 = 0, then set a0 = 17 if e < π.
Perform the
comparison and
store the result in t0
Check the result of
the comparison.


Instruction Formats in 32-bit RISC-V
Format/Type  Structure        Instructions
R            rd, rs1, rs2     add, sub, mul, div, rem, and, or, xor, sll, srl, sra, slt, sltu
I            rd, rs1, imm     addi, andi, ori, xori, slli, srli, srai, slti, sltiu
             rd, imm(rs1)     lw, jalr
S            rs2, imm(rs1)    sw
U            rd, imm          lui
• Define a new instruction type for store instructions
• The default immediate size is 12 bits
• The U-type supports a 20-bit immediate
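R-Type (32 bits, MSB to LSB; field widths in parentheses)
funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
• funct3: differentiates between instructions that share an opcode (e.g., add, slt and xor)
• funct7: further differentiation between instructions (e.g., add, sub)
I-Type (32 bits, MSB to LSB)
imm (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7)
• imm: 12-bit immediate field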
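Packing the R-type and I-type fields into a 32-bit word can be sketched in Python as follows. The field positions follow the layouts above. The concrete opcode, funct3 and funct7 values (0x33 for register-register operations, 0x13 for register-immediate operations, funct7 = 0x20 for sub) are taken from the RISC-V specification rather than from these slides, and the helper names are made up for illustration.

```python
def encode_r(funct7, rs2, rs1, funct3, rd, opcode):
    """Pack the R-type fields: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_i(imm, rs1, funct3, rd, opcode):
    """Pack the I-type fields: imm[11..0] | rs1 | funct3 | rd | opcode."""
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


OP = 0x33        # opcode for add, sub, ... (value from the RISC-V spec)
OP_IMM = 0x13    # opcode for addi, andi, ... (value from the RISC-V spec)

print(hex(encode_r(0x00, 3, 2, 0, 1, OP)))   # add x1, x2, x3
print(hex(encode_r(0x20, 3, 2, 0, 1, OP)))   # sub x1, x2, x3 (differs only in funct7)
print(hex(encode_i(5, 0, 0, 1, OP_IMM)))     # addi x1, x0, 5
```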

U-Type (32 bits, MSB to LSB)
imm (20) | rd (5) | opcode (7)
• 20-bit immediate
S-Type (32 bits, MSB to LSB)
imm[11..5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4..0] (5) | opcode (7)
• Split the immediate (into imm[11..5] and imm[4..0])
Do not define a new format

J-Type (32 bits, MSB to LSB); range ≈ ±1 MB
imm[20] (1) | imm[10..1] (10) | imm[11] (1) | imm[19..12] (8) | rd (5) | opcode (7)
B-Type (32 bits, MSB to LSB); range ≈ ±4 KB
imm[12], imm[10..5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4..1], imm[11] (5) | opcode (7)
• Immediate bits: the LSB is 0 → do not represent it
Format  Structure        Instructions
B       rs1, rs2, imm    beq, bne, blt, bge, bltu, bgeu
J       rd, imm          jal
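The scattered placement of the B-type immediate bits is easier to follow in code. The Python sketch below splits a branch offset into the 7-bit upper field (imm[12], imm[10..5]) and the 5-bit lower field (imm[4..1], imm[11]) and then reassembles it; the function names and the two-field return value are illustrative assumptions.

```python
def encode_b_imm(offset):
    """Split an even branch offset in [-4096, 4094] into the two B-type fields."""
    assert offset % 2 == 0 and -4096 <= offset < 4096
    imm = offset & 0x1FFF                                   # 13-bit two's complement
    hi = (((imm >> 12) & 1) << 6) | ((imm >> 5) & 0x3F)     # imm[12], imm[10..5]
    lo = (((imm >> 1) & 0xF) << 1) | ((imm >> 11) & 1)      # imm[4..1], imm[11]
    return hi, lo


def decode_b_imm(hi, lo):
    """Reassemble the signed offset; bit 0 is implicitly 0."""
    imm = ((((hi >> 6) & 1) << 12) | ((lo & 1) << 11)
           | ((hi & 0x3F) << 5) | (((lo >> 1) & 0xF) << 1))
    return imm - 0x2000 if imm & 0x1000 else imm


hi, lo = encode_b_imm(-4)
print(hex(hi), hex(lo), decode_b_imm(hi, lo))
```

Because bit 0 is never stored, 12 encoded bits cover a 13-bit signed offset, which gives the ≈ ±4 KB range.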

LOAD-FP (32 bits, MSB to LSB)
imm (12) | rs1 (5) | width (3) | rd (5) | opcode (7)
• imm: 12-bit immediate field
STORE-FP (32 bits, MSB to LSB)
imm[11..5] (7) | rs2 (5) | rs1 (5) | width (3) | imm[4..0] (5) | opcode (7)
• width: #bytes loaded/stored

FP Instructions
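FP format (32 bits, MSB to LSB): used by all FP arithmetic, compare and conversion instructions
funct5 (5) | fmt (2) | rs2 (5) | rs1 (5) | rm (3) | rd (5) | opcode (7)
Fused multiply-add instruction format
rs3 (5) | fmt (2) | rs2 (5) | rs1 (5) | rm (3) | rd (5) | opcode (7)
• fmt: precision
• rm: rounding mode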

THE END
