Computer Architecture
Memory System Design
Chapter Outline
7.1 Introduction: The Components of the Memory System
7.2 RAM Structure: The Logic Designer’s Perspective
7.3 Memory Boards and Modules
7.4 Two-Level Memory Hierarchy
7.5 The Cache
7.6 Virtual Memory
7.7 The Memory Subsystem in the Computer
Introduction
• So far, we’ve treated memory as an array of words limited in size only by the number of address bits.
• Real-world issues arise: cost, speed, size, power consumption, volatility, ...
Topics Covered
• Memory components:
  • RAM memory cells and cell arrays
  • Static RAM—more expensive, but less complex
  • Tree and matrix decoders—needed for large RAM chips
  • Dynamic RAM—less expensive, but needs “refreshing”
    • Chip organization
    • Timing
  • ROM—read-only memory
• Memory boards
  • Arrays of chips give more addresses and/or wider words
  • 2-D and 3-D chip arrays
• Memory modules
  • Large systems can benefit by partitioning memory for separate access by system components and for fast access to multiple words
Topics Covered
• The memory hierarchy: from fast and expensive to slow and cheap
  • Example: Registers → Cache → Main Memory → Disk
  • At first, consider just two adjacent levels in the hierarchy
• The cache: high speed and expensive
  • Kinds: direct mapped, associative, set associative
• Virtual memory—makes the hierarchy transparent
  • Translate the address from the CPU’s logical address to the physical address where the information is actually stored
  • Memory management—how to move information back and forth
  • Multiprogramming—what to do while we wait
  • The Translation Look-aside Buffer, “TLB,” helps speed up the address translation process
• Overall consideration of the memory as a subsystem
The CPU–Memory Interface
• Sequence of events:
  • Read:
    1. CPU loads MAR, issues Read, and REQUEST
    2. Main memory transmits words to MDR
    3. Main memory asserts COMPLETE
  • Write:
    1. CPU loads MAR and MDR, asserts Write, and REQUEST
    2. Value in MDR is written into address in MAR
    3. Main memory asserts COMPLETE
  (A software sketch of this handshake follows the figure below.)
[Figure: the CPU–memory interface. The CPU’s MAR drives an m-bit address bus and its MDR connects to a b-bit data bus (lines A0–Am–1 and D0–Db–1); R/W, REQUEST, and COMPLETE control lines coordinate transfers with a main memory of 2^m s-bit locations.]
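A minimal C sketch (structure and names assumed, not from the slides) modeling the read and write handshake in software:

/* Toy model of the CPU–memory interface: MAR holds the address,
   MDR the data; COMPLETE signals the end of a memory operation. */
#include <stdint.h>
#include <stdbool.h>

#define MEM_WORDS 1024

typedef struct { uint32_t mem[MEM_WORDS]; bool complete; } Memory;
typedef struct { uint32_t mar, mdr; } Cpu;

/* Read: CPU has loaded MAR and issued Read + REQUEST. */
static void mem_read(Cpu *cpu, Memory *m) {
    m->complete = false;            /* REQUEST in progress */
    cpu->mdr = m->mem[cpu->mar];    /* memory transmits word to MDR */
    m->complete = true;             /* memory asserts COMPLETE */
}

/* Write: CPU has loaded MAR and MDR, asserted Write + REQUEST. */
static void mem_write(Cpu *cpu, Memory *m) {
    m->complete = false;
    m->mem[cpu->mar] = cpu->mdr;    /* MDR written to address in MAR */
    m->complete = true;
}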
The CPU–Memory Interface (cont’d.)
• Additional points:
  • If b < w, main memory must make w/b transfers of b bits each
  • Some CPUs allow reading and writing of word sizes < w
    • Example: Intel 8088: m = 20, w = 16, s = b = 8; 8- and 16-bit values can be read and written
  • If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted
  • Some systems use separate R and W lines, and omit REQUEST
Some Memory Properties
Symbol   Definition                          Intel 8088   Intel 8086   PowerPC 601
w        CPU word size                       16 bits      16 bits      64 bits
m        Bits in a logical memory address    20 bits      20 bits      32 bits
s        Bits in smallest addressable unit   8 bits       8 bits       8 bits
b        Data bus size                       8 bits       16 bits      64 bits
2^m      Memory word capacity, s-sized wds   2^20 wds     2^20 wds     2^32 wds
2^m × s  Memory bit capacity                 2^20 × 8     2^20 × 8     2^32 × 8
Big-Endian and Little-Endian Storage
• When a data type whose word size is larger than the smallest addressable unit is stored in memory, the question arises:
  • is the least significant part of the word stored at the lowest address (little-endian, little end first), or
  • is the most significant part of the word stored at the lowest address (big-endian, big end first)?

Example: the 16-bit hexadecimal number ABCDH (msb = AB, lsb = CD), stored at address 0:

  Little-Endian        Big-Endian
  Address 0: CD        Address 0: AB
  Address 1: AB        Address 1: CD
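A short C check (illustrative, not from the slides) that stores 0xABCD and inspects the byte at the lowest address to discover the machine’s byte order:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t word = 0xABCD;
    uint8_t *bytes = (uint8_t *)&word;     /* view the word as bytes */

    /* Little-endian: bytes[0] holds the lsb, 0xCD.
       Big-endian:    bytes[0] holds the msb, 0xAB. */
    printf("byte 0 = %02X -> %s-endian\n",
           bytes[0], bytes[0] == 0xCD ? "little" : "big");
    return 0;
}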
Big-Endian and Little-Endian Storage of a PowerPC 32-Bit Word

CPU word bit locations:         b31...b24  b23...b16  b15...b8  b7...b0
Big-endian byte addresses:          0          1          2         3
Little-endian byte addresses:       3          2          1         0

Little-Endian Storage        Big-Endian Storage
Address  Contents            Address  Contents
0        b7 ... b0           0        b31 ... b24
1        b15 ... b8          1        b23 ... b16
2        b23 ... b16         2        b15 ... b8
3        b31 ... b24         3        b7 ... b0
Memory Performance Parameters
Symbol            Definition         Units      Meaning
t_a               Access time        Time       Time to access a memory word, up to the assertion of the memory completion signal
t_c               Cycle time         Time       Time from start of read to start of next memory operation
k                 Block size         Words      Number of words per block
β                 Bandwidth          Words/sec  Word transmission rate
t_l               Latency            Time       Time to access the first of a sequence of words
t_bl = t_l + k/β  Block access time  Time       Time to access an entire block from start of read
The Memory Hierarchy, Cost, and Performance
• Some typical values:

Component        CPU (registers)    Cache     Main Memory  Disk Memory  Tape Memory
Access type      Random             Random    Random       Direct       Sequential
Capacity, bytes  64–1024            8–512 KB  8–64 MB      1–10 GB      1 TB
Latency          1–10 ns            20 ns     50 ns        10 ms        10 ms–10 s
Block size       1 word             16 words  16 words     4 KB         4 KB
Bandwidth        System clock rate  8 MB/s    1 MB/s       1 MB/s       1 MB/s
Cost/MB          High               $500      $30          $0.25        $0.02
Memory Hierarchy
[Figure: the memory hierarchy. The CPU exchanges words with the cache (primary level); the cache exchanges cache lines with main memory (secondary level); main memory exchanges blocks with the disk (ternary level), the domain of virtual memory. Moving away from the CPU, block size, latency, and capacity increase (fast & small to slow & large) while the frequency of transactions decreases; the principle of locality ties the levels together. Key parameters at each level: block size, latency, bandwidth, hit rate.]
Memory Hierarchy
• Considering any two adjacent levels of the memory hierarchy, some definitions:
  • Temporal locality: the property of most programs that if a given memory location is referenced, it is likely to be referenced again, “soon.”
  • Spatial locality: if a given memory location is referenced, those locations near it are likely to be referenced “soon.”
  • Working set: the set of memory locations referenced over a fixed period of time, or within a time window.
• Notice that temporal and spatial locality both work to assure that the contents of the working set change only slowly over execution time.

[Figure: defining the primary and secondary levels—of any two adjacent levels in the hierarchy, the one closer to the CPU is the faster, smaller primary level; the other is the slower, larger secondary level.]
Primary and Secondary Levels of the Memory Hierarchy
• The item of commerce between any two levels is the block.
• Blocks may differ in size at different levels in the hierarchy. Example: cache block size ~16–64 bytes; disk block size ~1–4 KB.
• As the working set changes, blocks are moved back and forth through the hierarchy to satisfy memory access requests.
• The primary-level block size is often determined more by the properties of information transfer than by the spatial locality principle.
• Speed between levels is defined by latency—the time to access the first word—and bandwidth—the number of words per second transmitted between levels.
• Typical latencies: cache, a few clocks; disk, 100,000 clocks.
Primary and Secondary Addresses
• A complication: addresses will differ depending on the level.
  • Primary address: the address of a value in the primary level.
  • Secondary address: the address of a value in the secondary level.
• Primary and secondary address examples:
  • Main memory address: an unsigned integer.
  • Disk address: track number, sector number, offset of word in sector.
Addressing and Accessing a Two-Level Hierarchy
The computer system, HW or SW, must perform any address translation that is required.

Two ways of forming the address: segmentation and paging. Paging is more common. Sometimes the two are used together, one “on top of” the other. More about address translation and paging later...

[Figure: the memory management unit (MMU). A system address enters the translation function (mapping tables, permissions, etc.); a hit yields the block and word address in the primary level, while a miss yields the address of the block in secondary memory.]
Primary Address Formation
[Figure: primary address formation. (a) Paging: the block field of the system address indexes a lookup table that supplies the primary block number; the word field passes through unchanged. (b) Segmentation: the lookup table supplies a base address that is added to the word offset to form the primary address.]
Hits and Misses; Paging; Block Placement
• Hit: the word was found at the level from which it was requested.
• Miss: the word was not found at the level from which it was requested.
  • A miss will result in a request for the block containing the word from the next higher level in the hierarchy.
• Hit ratio (or hit rate): h = (number of hits) / (total number of references)
• Miss ratio: 1 − h
• t_p = primary memory access time; t_s = secondary memory access time
• Access time: t_a = h·t_p + (1 − h)·t_s (computed numerically in the sketch below)
• Page: commonly, a disk block.
• Page fault: synonymous with a miss.
• Demand paging: pages are moved from disk to main memory only when a word in the page is requested by the processor.
• Block placement and replacement decisions must be made each time a block is moved.
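The access-time formula as a small C program (the hit ratio and level times are assumed example values, not from the slides):

/* Average access time across two adjacent levels:
   t_a = h*t_p + (1-h)*t_s. */
#include <stdio.h>

int main(void) {
    double h  = 0.95;     /* hit ratio (assumed) */
    double tp = 20.0;     /* primary access time, ns (assumed) */
    double ts = 70.0;     /* secondary access time, ns (assumed) */

    double ta = h * tp + (1.0 - h) * ts;
    printf("t_a = %.1f ns\n", ta);   /* 0.95*20 + 0.05*70 = 22.5 ns */
    return 0;
}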
Hits and Misses; Paging; Block Placement
• The cache regime:
  • Secondary latency is only about 10 times as long as the primary-level access time.
  • Having the CPU wait is affordable as long as the miss ratio is small.
  • A small fast memory (sometimes part of the CPU), called a cache, is connected to a slower memory, typically of a different semiconductor technology.
  • The cache’s access time matches the processor speed.
  • Two-level caches:
    • Primary cache (faster level)
    • Secondary cache (slower level)
  • Since the CPU waits on a cache miss, software cannot be used in the miss response; everything has to be done in hardware.
    • This means that placement and replacement decisions and secondary address formation must be kept simple so they can be done quickly with limited hardware.
    • There is less flexibility of placement in a cache than in the primary level of a virtual memory system.
Hits and Misses; Paging; Block Placement
• The cache miss versus the main memory miss:
  • The cache hit time is typically 1–2 clock cycles.
  • The cache miss penalty is only a few tens of clock cycles, to bring the desired block into the cache from main memory.
  • The main memory miss penalty typically runs to 10,000–100,000 clock cycles, to bring the block from the disk.
  • Cache miss rates of no greater than about 2% are tolerable;
  • main memory miss rates must be no greater than about 0.001% to avoid significant degradation in system performance.
Virtual Memory
• A virtual memory is a memory hierarchy, usually consisting of at least main memory and disk, in which the processor issues all memory references as effective addresses in a flat address space. All translations to primary and secondary addresses are handled transparently to the process making the address reference, thus providing the illusion of a flat address space.
• Recall that disk accesses may require 100,000 clock cycles to complete, due to the slow access time of the disk subsystem. Once the processor has, through mediation of the operating system, made the proper request to the disk subsystem, it is available for other tasks.
• Multiprogramming shares the processor among independent programs that are resident in main memory and thus available for execution.
Decisions in Designing a 2-Level Hierarchy
• Translation procedure—how to translate from system address to primary address.
• Block size—block transfer efficiency and miss ratio will be affected.
• Processor dispatch on miss—processor wait or processor multiprogrammed.
• Primary-level placement—direct, associative, or a combination. Discussed later.
• Replacement policy—which block is to be replaced upon a miss.
• Direct access to secondary level—in the cache regime, can the processor directly access main memory upon a cache miss?
  • Write through—can the processor write directly to main memory upon a cache miss?
  • Read through—can the processor read directly from main memory upon a cache miss as the cache is being updated?
  • Read or write bypass—can certain infrequent read or write misses be satisfied by a direct access of main memory without any block movement?
The Cache Mapping Function
[Figure: the cache mapping function stands between the CPU and main memory, mapping main memory blocks onto cache blocks. Example sizes: 256 KB cache, 16-word blocks, 32 MB main memory.]

The cache mapping function is responsible for all cache operations:
• Placement strategy: where to place an incoming block in the cache
• Replacement strategy: which block to replace upon a miss
• Read and write policy: how to handle reads and writes upon cache misses

The mapping function must be implemented in hardware. Three different types of mapping functions:
• Associative
• Direct mapped
• Block-set associative
Memory Fields and Address Translation
Example of a processor-issued 32-bit virtual address:

  bit 31 ···································· bit 0   (32 bits)

That same 32-bit address partitioned into two fields, a block field and a word field. The word field represents the offset into the block specified in the block field:

  | Block number: 26 bits | Word: 6 bits |

giving 2^26 blocks of 64 words each.

Example of a specific memory reference—block 9, word 11:

  block field 00···001001, word field 001011

(The sketch below extracts these fields with shifts and masks.)
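A small C sketch (illustrative; the constants mirror the example above) that splits such an address into its block and word fields:

#include <stdio.h>
#include <stdint.h>

#define WORD_BITS 6                        /* 64-word blocks */
#define WORD_MASK ((1u << WORD_BITS) - 1)

int main(void) {
    uint32_t addr = (9u << WORD_BITS) | 11u;   /* block 9, word 11 */

    uint32_t block = addr >> WORD_BITS;        /* upper 26 bits */
    uint32_t word  = addr & WORD_MASK;         /* lower 6 bits  */

    printf("block %u, word %u\n", block, word);
    return 0;
}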
Associative Cache
Associative-mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory address (16 bits, while unrealistically small, simplifies the examples).

[Figure: an associative cache of 256 lines of 8 bytes each. The tag memory holds a 13-bit tag and a valid bit per line; the tag records which of main memory’s 8,192 blocks occupies the line (e.g., blocks 119, 2, and 421 are cached), and a valid bit of 0 marks an empty line. The 16-bit main memory address divides into a 13-bit tag field and a 3-bit byte field, and the tag memory is searched associatively.]
Associative Cache
• In general:
  • B – number of bytes per block
  • T – number of tag bits in the tag memory, i.e., the width of the block-number field of the main memory address
  • L – number of blocks in the cache
  • S_MM = B × 2^T – size of main memory in bytes
  • S_CM = L × B – size of cache memory in bytes
  • Main memory address length in bits: log2 S_MM = T + log2 B
  (These sizes are computed in the sketch below.)
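A sketch computing these sizes, using the example values from the preceding figure (8-byte lines, 13-bit tags, 256 lines):

/* Associative cache sizing: S_MM = B * 2^T, S_CM = L * B. */
#include <stdio.h>

int main(void) {
    unsigned B = 8;      /* bytes per block (cache line) */
    unsigned T = 13;     /* tag bits = block-number bits */
    unsigned L = 256;    /* lines in the cache */

    unsigned long s_mm = (unsigned long)B << T;   /* 8 * 2^13 = 65,536 */
    unsigned long s_cm = (unsigned long)L * B;    /* 256 * 8  = 2,048  */

    printf("S_MM = %lu bytes, S_CM = %lu bytes\n", s_mm, s_cm);
    return 0;
}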
Associative Cache Mechanism
Because any block can reside anywhere in the cache, an associative (content-addressable) memory is used: all locations are searched simultaneously.

[Figure: associative cache mechanism. The 13-bit tag of the incoming main memory address is loaded into an argument register and compared against every entry of the associative tag memory at once. A match whose valid bit is set raises the match bit, gates the corresponding 8-byte cache line out, and the 3-bit byte field selects the byte forwarded to the CPU.]
Advantages and Disadvantages
of the Associative Mapped Cache
• Advantage
  • Most flexible of all—any MM block can go anywhere in the cache.
• Disadvantages
  • Large tag memory.
  • The need to search the entire tag memory simultaneously means lots of hardware.
  • Replacement policy is an issue when the cache is full.

Q.: How is an associative search conducted at the logic gate level?

Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache:
Direct-Mapped Cache
Key idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number. Now the cache need only examine the single group that its reference specifies.

[Figure: direct-mapped cache with 256 groups. Main memory’s 8,192 blocks form 32 columns (tag #s 0–31) of 256 groups, so blocks 0, 256, 512, ..., 7936 all map to group 0. Each 8-byte cache line has a 5-bit tag and a valid bit; the 16-bit main memory address divides into a 5-bit tag, an 8-bit group, and a 3-bit byte field, with the group and byte fields forming the cache address.]
Direct-Mapped Cache
• In general:
  • B – number of bytes per block
  • T – number of tag bits in the tag memory; main memory has 2^T columns of blocks
  • G – number of groups, equal to L
  • L – number of blocks in the cache
  • S_MM = G × B × 2^T – size of main memory in bytes
  • S_CM = L × B – size of cache memory in bytes
  • Main memory address length in bits: log2 S_MM = T + log2 G + log2 B
  (See the companion sketch below.)
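A companion sketch for direct-mapped sizing, with the example values from the figure above (256 groups, 8-byte lines, 5-bit tags):

/* Direct-mapped sizing: S_MM = G * B * 2^T, S_CM = L * B, with G = L.
   Address length = T + log2(G) + log2(B) = 5 + 8 + 3 = 16 bits. */
#include <stdio.h>

int main(void) {
    unsigned B = 8;      /* bytes per block */
    unsigned G = 256;    /* groups = cache blocks L */
    unsigned T = 5;      /* tag bits: 2^5 = 32 columns */

    unsigned long s_mm = ((unsigned long)G * B) << T;  /* 65,536 bytes */
    unsigned long s_cm = (unsigned long)G * B;         /* 2,048 bytes  */

    printf("S_MM = %lu bytes, S_CM = %lu bytes\n", s_mm, s_cm);
    return 0;
}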
Direct-Mapped Cache Operation
1. Decode the group number of the incoming MM address to select the group.
2. Gate out the tag field of the selected entry.
3. Compare the cache tag with the incoming tag.
4. If they match AND the entry is valid, it is a hit;
5. then gate out the cache line,
6. and use the word field to select the desired word.

[Figure: direct-mapped cache operation. The 8-bit group field drives an 8-to-256 decoder that selects one tag-memory entry; a 5-bit comparator checks the stored tag against the incoming tag and, gated by the valid bit, signals cache hit or cache miss; on a hit the 64-bit cache line is gated out and a selector uses the 3-bit byte field to pick the desired word.]

(A software model of this lookup follows.)
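A minimal software model of the direct-mapped lookup (purely illustrative; the field widths follow the 16-bit example):

/* Direct-mapped lookup: 5-bit tag | 8-bit group | 3-bit byte. */
#include <stdint.h>
#include <stdbool.h>

#define GROUPS 256

typedef struct {
    bool    valid[GROUPS];
    uint8_t tag[GROUPS];       /* 5-bit tags */
    uint8_t line[GROUPS][8];   /* 8-byte cache lines */
} DirectCache;

/* Returns true on a hit and stores the selected byte in *out. */
static bool cache_lookup(const DirectCache *c, uint16_t addr, uint8_t *out) {
    uint8_t byte  = addr & 0x7;           /* low 3 bits: byte in line */
    uint8_t group = (addr >> 3) & 0xFF;   /* next 8 bits: group number */
    uint8_t tag   = addr >> 11;           /* top 5 bits: tag */

    if (c->valid[group] && c->tag[group] == tag) {   /* match AND valid */
        *out = c->line[group][byte];                 /* gate out the word */
        return true;                                 /* cache hit */
    }
    return false;                                    /* cache miss */
}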
Direct-Mapped Caches
• The direct-mapped cache uses less hardware, but is much more restrictive in block placement.
• If two blocks from the same group are frequently referenced, the cache will “thrash”: that is, repeatedly bring the two competing blocks into and out of the cache. This will cause a performance degradation.
• Block replacement strategy is trivial.
• Compromise—allow several cache blocks in each group: the block-set-associative cache.
2-Way Set-Associative Cache
The example shows 256 groups with a set of two blocks per group—referred to as a 2-way set-associative cache.

[Figure: 2-way set-associative cache. Each of the 256 sets holds two cache lines, each with its own 5-bit tag and valid bit, so two main memory blocks that map to the same set—e.g., blocks 512 (tag 2) and 7680 (tag 30), both in set 0—can reside in the cache at the same time. The main memory address divides into a 5-bit tag, an 8-bit set field, and a 3-bit byte field.]
K-Way Set-Associative Cache
[Figure: generalized K-way set-associative cache. The tag memory (T-bit tags), valid bits, and cache memory (B-byte lines) are each organized as S sets of K entries, for L = K × S lines in all; main memory holds 2^T × S blocks. The main memory address divides into a T-bit tag, a log2 S set field, and a log2 B byte field; the set and byte fields form the primary address.]
K-Way Set-Associative Cache
• A K = 1-way block-set-associative cache is a direct-mapped cache.
• An S = 1 block-set-associative cache (a single set) is an associative cache.
Read/Write Policies
• Cache write policies upon a cache hit:
  • Write-through: updates both the cache and main memory upon each write to a block that is present in the cache.
  • Write-back: postpones the main memory update until the block is replaced; uses a “dirty bit,” set upon the first write to a block in the cache.
• Read and write miss policies:
  • Miss upon a read:
    • Read-through: the word is forwarded immediately (from MM) to the CPU, then the cache is updated; or
    • load the block into the cache first.
  • Miss upon a write:
    • Write-allocate: the block is first brought into the cache and then updated (in MM).
    • Write no-allocate: write directly to main memory (without bringing the block into the cache).
Block Replacement Policy
• Not needed with a direct-mapped cache.
• Least Recently Used (LRU). Track usage with a counter; each time a block is accessed:
  • clear the counter of the accessed block;
  • increment the counters with values less than the one just cleared;
  • all others remain unchanged.
  When the set is full, remove the line with the highest count. (A sketch of this counter scheme follows the list.)
• Random replacement—replace a block at random.
  • Even random replacement is a fairly effective strategy.
• FIFO—the longest-residing block is replaced first.
• Smaller cache memories: random worst, LRU best.
• Large cache memories: random and LRU comparable.
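A sketch of the LRU counter scheme for one set (the 4-way set size is an assumption for illustration):

/* LRU via counters: on access, clear the accessed block's counter and
   increment only counters that were below it; evict the highest count. */
#include <stdint.h>

#define WAYS 4

static void lru_touch(uint8_t count[WAYS], int accessed) {
    uint8_t old = count[accessed];
    for (int i = 0; i < WAYS; i++)
        if (count[i] < old)
            count[i]++;           /* ages blocks used more recently */
    count[accessed] = 0;          /* most recently used */
}

static int lru_victim(const uint8_t count[WAYS]) {
    int victim = 0;
    for (int i = 1; i < WAYS; i++)
        if (count[i] > count[victim])
            victim = i;           /* highest count = least recently used */
    return victim;
}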
Performance and Cache Design
• Direct-mapped and block-set-associative caches may suffer minor time penalties due to the need to decode the group (set) field.
• For an equivalent amount of hardware (e.g., number of gates), direct-mapped and set-associative caches will have more capacity than an associative cache.
• Quantitative analysis of cache performance is quite complex and is the subject of ongoing research.
• Simplified mathematical analysis, however, will provide sufficient insight into the issues involved:
Performance and Cache Design
• Definitions:
  • h – hit ratio; (1 − h) – miss ratio
  • T_C – cache access time upon a cache hit
  • T_M – access time upon a cache miss
  • T_a – average memory access time during a particular machine execution interval
• Recall the access time for primary (p) and secondary (s) levels:
  T_a = h·T_p + (1 − h)·T_s
• With T_p = T_C for the cache and T_s = T_M for MM:
  T_a = h·T_C + (1 − h)·T_M
• This equation is valid for a read-through cache, where the value is forwarded from main memory to the CPU directly as it is received, without waiting for the entire cache line to be loaded before initiating another cache read to forward the value.
• If this is not the case, T_M is replaced by the cache line fill time.
Performance and Cache Design
• Speedup:
  S = T_without / T_with
• where
  • T_without is the time taken without the improvement, e.g., without the cache, and
  • T_with is the time the process takes with the improvement, e.g., with the cache.
• Having a model for cache and MM access times and cache line fill time, the speedup can be calculated once the hit ratio is known.
Performance and Cache Design
• Example:
  • Changing a certain cache from direct-mapped to two-way set-associative caused the hit ratio measured for a certain set of benchmarks to improve from 0.91 to 0.96, with T_C = 20 ns and T_M = 70 ns. What is the speedup, assuming no change in T_C from the set-associative multiplexer?
• Answer (reproduced in the sketch below):
  • T_without = 0.91 × 20 + (1 − 0.91) × 70 = 24.5 ns
  • T_with = 0.96 × 20 + (1 − 0.96) × 70 = 22.0 ns
  • Thus the speedup is S = 24.5/22.0 = 1.11, an 11% improvement in execution time.
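The same calculation as a small C program:

/* Speedup S = T_without / T_with, where each T is the average
   access time t_a = h*T_C + (1-h)*T_M at the measured hit ratio. */
#include <stdio.h>

static double avg_access(double h, double tc, double tm) {
    return h * tc + (1.0 - h) * tm;
}

int main(void) {
    double tc = 20.0, tm = 70.0;                    /* ns */
    double t_without = avg_access(0.91, tc, tm);    /* 24.5 ns */
    double t_with    = avg_access(0.96, tc, tm);    /* 22.0 ns */

    printf("S = %.2f\n", t_without / t_with);       /* ~1.11 */
    return 0;
}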
Virtual Memory
• The memory management unit, MMU, is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory.

[Figure: on the CPU chip, the CPU issues a logical address to the MMU, which consults its mapping tables and presents the resulting physical address to the cache, main memory, and disk; the virtual address stands between the logical and physical forms.]

A word about addresses:
• Effective address—an address computed by the processor while executing a program. Synonymous with logical address.
  • The term effective address is often used when referring to activity inside the CPU; logical address is most often used when referring to addresses as viewed from outside the CPU.
• Virtual address—the address generated from the logical address by the memory management unit, MMU, prior to making the address translation.
• Physical address—the address presented to the memory unit. (Note: every address reference must be translated.)
Virtual Addresses—Why?
• The logical address provided by the CPU is translated to a virtual address by the MMU. Often the virtual address space is larger than the logical address space, allowing program units to be mapped to a much larger virtual address space.
Virtual Addressing—Advantages
• Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system.
  • No address relocation is required at load time.
  • No need to fragment the program to accommodate memory limitations.
• Cost-effective use of physical memory.
  • Less expensive secondary (disk) storage can replace primary storage. (The MMU will bring portions of the program into physical memory as required; not all portions of the program need occupy physical memory at one time.)
• Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges.
  • This allows access/security control at the most fundamental levels.
  • It can be used to prevent buggy programs and intruders from causing damage to other users or the system.
  • This is the origin of those “bus error” and “segmentation fault” messages.
Memory Management
[Figure (repeated from “Primary Address Formation”): (a) paging—the block field of the system address indexes a lookup table that supplies the primary block number, with the word field passing through unchanged; (b) segmentation—the lookup table supplies a base address that is added to the word offset to form the primary address.]
Memory Management by Segmentation
• Notice that each segment’s virtual address starts at 0, different from its physical address.
• Repeated movement of segments into and out of physical memory will result in gaps between segments. This is called external fragmentation.
• Compaction (defragmentation) routines must occasionally be run to remove these fragments.

[Figure: main memory holding segments 1, 5, 6, 9, and 3 with gaps between them; each segment’s virtual addresses begin at 0 (e.g., 0000–0FFF), while its physical addresses depend on where it was placed.]
Segmentation Mechanism
• The computation of the physical address from the virtual address requires an integer addition for each memory reference, and a comparison if segment limits are checked (see the sketch below).
• Q: How does the MMU switch references from one segment to another?

[Figure: the virtual memory address from the CPU (an offset in the segment) is added to the segment base register to form the physical address; a comparison against the segment limit register raises a bounds error if the offset is out of range.]
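A minimal sketch (types and register names are illustrative) of the base-plus-offset translation with a limit check:

/* Segment translation: physical = base + offset, if offset < limit. */
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t base;    /* segment base register */
    uint32_t limit;   /* segment limit register */
} Segment;

/* Returns true and sets *phys on success; false on a bounds error. */
static bool seg_translate(const Segment *seg, uint32_t offset,
                          uint32_t *phys) {
    if (offset >= seg->limit)
        return false;               /* bounds error */
    *phys = seg->base + offset;     /* one integer addition per reference */
    return true;
}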
Memory Management by Paging
• This figure shows the mapping between virtual memory pages, physical memory pages, and pages in secondary memory. Page n − 1 is not present in physical memory, but only in secondary memory.
• The MMU manages this mapping (logical to physical address).
• Demand paging: pages are brought into physical memory only as they are needed.

[Figure: a program unit’s virtual pages 0 through n − 1 map to scattered physical memory pages, with page n − 1 resident only in secondary memory.]
Virtual Address Translation in a Paged MMU
A page fault will result in 100,000 or more cycles passing before the page has been brought from secondary storage to MM.

• One page table per user per program unit.
• One translation per memory access.
• Potentially large page table.

[Figure: virtual address translation in a paged MMU. The page number of the virtual address is checked against the page table limit register (a bounds error otherwise), added to the page table base register, and used as an offset into the page table. Each entry carries access-control bits (presence bit, dirty bit, usage bits) and either a physical page number or a pointer to secondary storage. A hit (page in primary memory) concatenates the physical page with the offset-in-page to form the physical address of the desired word in main memory; a miss (page fault) means the page is in secondary memory, and the entry translates to a disk address.]

(The lookup is sketched below.)
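A simplified sketch of the page-table walk (the entry layout and 4 KB page size are assumptions for illustration):

/* Paged translation: virtual address = page number | offset in page. */
#include <stdint.h>
#include <stdbool.h>

#define PAGE_BITS 12                       /* assume 4 KB pages */
#define PAGE_MASK ((1u << PAGE_BITS) - 1)

typedef struct {
    bool     present;   /* presence bit: page is in primary memory */
    bool     dirty;     /* dirty bit */
    uint32_t frame;     /* physical page number (disk pointer if absent) */
} PageTableEntry;

/* Returns true on a hit; false signals a bounds error or page fault. */
static bool page_translate(const PageTableEntry *table, uint32_t table_len,
                           uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & PAGE_MASK;

    if (page >= table_len)
        return false;                      /* bounds error */
    if (!table[page].present)
        return false;                      /* miss: page fault */

    *paddr = (table[page].frame << PAGE_BITS) | offset;   /* hit */
    return true;
}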
Page Placement and Replacement
• Page tables are direct mapped, since the physical page is computed directly from the virtual page number.
• But physical pages can reside anywhere in physical memory.
• Page tables such as those on the previous slide result in large page tables, since there must be a page table entry for every page in the program unit.
• Some implementations resort to hash tables instead, which need have entries only for those pages actually present in physical memory.
• Replacement strategies are generally LRU, or at least employ a “use bit” to guide replacement.
Fast Address Translation: Regaining Lost Ground
• The concept of virtual memory is very attractive, but leads to considerable overhead:
  • There must be a translation for every memory reference.
  • There must be two memory references for every program reference: one to retrieve the page table entry, and one to retrieve the value.
  • Most caches are addressed by physical address, so there must be a virtual-to-physical translation before the cache can be accessed.
Fast Address Translation: Regaining Lost Ground
• The answer: a small cache in the processor that retains the last few virtual-to-physical translations: a Translation Lookaside Buffer, TLB.
• The TLB contains not only the virtual-to-physical translations, but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly.
• The TLB is usually implemented as a fully associative cache (see the sketch below):
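A toy model (entry count and page size assumed) of the fully associative TLB lookup:

/* Fully associative TLB: every entry is checked for the virtual page
   number; a miss falls back to the page-table walk sketched earlier. */
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16
#define PAGE_BITS   12
#define PAGE_MASK   ((1u << PAGE_BITS) - 1)

typedef struct {
    bool     valid;
    uint32_t vpage;   /* virtual page number */
    uint32_t frame;   /* physical page number */
} TlbEntry;

static bool tlb_lookup(const TlbEntry tlb[TLB_ENTRIES],
                       uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpage = vaddr >> PAGE_BITS;
    for (int i = 0; i < TLB_ENTRIES; i++) {        /* associative search */
        if (tlb[i].valid && tlb[i].vpage == vpage) {
            *paddr = (tlb[i].frame << PAGE_BITS) | (vaddr & PAGE_MASK);
            return true;                           /* TLB hit */
        }
    }
    return false;    /* TLB miss: consult the page table */
}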
Translation Lookaside Buffer Structure and Operation
[Figure: TLB structure and operation. The page number of the virtual address from the CPU is looked up associatively in the TLB, whose entries hold the virtual page number, access-control bits (presence, dirty, valid, usage bits), and the physical page number. A TLB hit means the page is in primary memory: the physical page number is concatenated with the word offset to form the physical address sent to main memory or the cache. A TLB miss sends the page number to the page table to look for the physical page.]
Operation of the Memory Hierarchy
[Flowchart: operation of the memory hierarchy for a virtual address from the CPU. Search the TLB; on a TLB hit, generate the physical address and search the cache; on a cache hit, return the value from the cache. On a TLB miss, search the page table; on a page table hit, update the TLB, generate the physical address, and proceed as above. On a cache miss, update the cache from MM. On a page table miss (page fault), get the page from secondary memory and update MM, the cache, and the page table.]
I/O Connection to a Memory with a Cache
• The memory system is quite complex, and affords many possible tradeoffs.
• The only realistic way to choose among these alternatives is to study a typical workload, using either simulations or prototype systems.
• Instruction and data accesses usually have different patterns.
• It is possible to employ a cache at the disk level, using the disk hardware.
• Traffic between MM and disk is I/O, and direct memory access, DMA, can be used to speed the transfers:

[Figure: the CPU accesses the cache and main memory directly; paging traffic and other I/O between main memory and disk move by DMA.]
Summary
• Most memory systems are multileveled—cache, main memory, and disk.
• Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
• Chips are organized into boards and modules.
• Larger, slower memory is attached to faster memory in a hierarchical structure.
• The cache-to-main-memory interface requires hardware address translation.
• Virtual memory—the main memory–disk interface—can employ software for address translation because of the slower speeds involved.
• The hierarchy must be carefully designed to ensure optimum price-performance.
More Related Content

PPTX
Main Memory Management in Operating System
PPT
Cache Memory
PPTX
Computer Memory Hierarchy Computer Architecture
PPT
04 Cache Memory
PPTX
CPU Memory Hierarchy and Caching Techniques
PPT
cache memory introduction, level, function
PPT
04_Cache_Memory-cust memori memori memori.ppt
PPTX
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
Main Memory Management in Operating System
Cache Memory
Computer Memory Hierarchy Computer Architecture
04 Cache Memory
CPU Memory Hierarchy and Caching Techniques
cache memory introduction, level, function
04_Cache_Memory-cust memori memori memori.ppt
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx

Similar to 15845007 Computer architecture and Org.ppt (20)

PPTX
memorytechnologyandoptimization-140416131506-phpapp02.pptx
PDF
Unit I Memory technology and optimization
PPTX
Memory technology and optimization in Advance Computer Architechture
PPT
Kiến trúc máy tính - COE 301 - Memory.ppt
PPT
Cache memory
PPT
Cache memory
PPT
Memory Hierarchy PPT of Computer Organization
PPT
PPTX
cache cache memory memory cache memory.pptx
PDF
Architecture_L5 (3).pdf wwwwwwwwwwwwwwwwwwwwwwwwwww
PPTX
Memory System Design-converted.pptx
PPT
Akanskaha_ganesh_kullarni_memory_computer.ppt
PPTX
2021Arch_5_ch2_2.pptx How to improve the performance of Memory hierarchy
PPT
Memory Organization and Cache mapping.ppt
PPTX
PPT
04 cache memory
PPTX
Memory Organization
PPTX
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
PPTX
CS 542 Putting it all together -- Storage Management
PPTX
CAO-Unit-III.pptx
memorytechnologyandoptimization-140416131506-phpapp02.pptx
Unit I Memory technology and optimization
Memory technology and optimization in Advance Computer Architechture
Kiến trúc máy tính - COE 301 - Memory.ppt
Cache memory
Cache memory
Memory Hierarchy PPT of Computer Organization
cache cache memory memory cache memory.pptx
Architecture_L5 (3).pdf wwwwwwwwwwwwwwwwwwwwwwwwwww
Memory System Design-converted.pptx
Akanskaha_ganesh_kullarni_memory_computer.ppt
2021Arch_5_ch2_2.pptx How to improve the performance of Memory hierarchy
Memory Organization and Cache mapping.ppt
04 cache memory
Memory Organization
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
CS 542 Putting it all together -- Storage Management
CAO-Unit-III.pptx
Ad

Recently uploaded (20)

PDF
Yogi Goddess Pres Conference Studio Updates
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PDF
Computing-Curriculum for Schools in Ghana
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
Updated Idioms and Phrasal Verbs in English subject
PPTX
Lesson notes of climatology university.
PDF
Paper A Mock Exam 9_ Attempt review.pdf.
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
Orientation - ARALprogram of Deped to the Parents.pptx
PPTX
UNIT III MENTAL HEALTH NURSING ASSESSMENT
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
RMMM.pdf make it easy to upload and study
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
What if we spent less time fighting change, and more time building what’s rig...
PPTX
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
PPTX
master seminar digital applications in india
PPTX
202450812 BayCHI UCSC-SV 20250812 v17.pptx
Yogi Goddess Pres Conference Studio Updates
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
Supply Chain Operations Speaking Notes -ICLT Program
Computing-Curriculum for Schools in Ghana
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
Updated Idioms and Phrasal Verbs in English subject
Lesson notes of climatology university.
Paper A Mock Exam 9_ Attempt review.pdf.
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Orientation - ARALprogram of Deped to the Parents.pptx
UNIT III MENTAL HEALTH NURSING ASSESSMENT
Microbial disease of the cardiovascular and lymphatic systems
RMMM.pdf make it easy to upload and study
Module 4: Burden of Disease Tutorial Slides S2 2025
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
What if we spent less time fighting change, and more time building what’s rig...
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
master seminar digital applications in india
202450812 BayCHI UCSC-SV 20250812 v17.pptx
Ad

15845007 Computer architecture and Org.ppt

  • 2. February 26, 2025 Veton Këpuska 2 Chapter Outline 7.1 Introduction: The Components of the Memory System 7.2 RAM Structure: The Logic Designer’s Perspective 7.3 Memory Boards and Modules 7.4 Two-Level Memory Hierarchy 7.5 The Cache 7.6 Virtual Memory 7.7 The Memory Subsystem in the Computer
  • 3. February 26, 2025 Veton Këpuska 3 Introduction  So far, we’ve treated memory as an array of words limited in size only by the number of address bits.  Real world issues arise:  cost  speed  size  power consumption  volatility  ...
  • 4. February 26, 2025 Veton Këpuska 4 Topics Covered  Memory components:  RAM memory cells and cell arrays  Static RAM—more expensive, but less complex  Tree and matrix decoders—needed for large RAM chips  Dynamic RAM—less expensive, but needs “refreshing”  Chip organization  Timing  ROM—Read-only memory  Memory boards  Arrays of chips give more addresses and/or wider words  2-D and 3-D chip arrays  Memory modules  Large systems can benefit by partitioning memory for  separate access by system components  fast access to multiple words
  • 5. February 26, 2025 Veton Këpuska 5 Topics Covered • The memory hierarchy: from fast and expensive to slow and cheap  Example: Registers Cache Main Memory Disk  At first, consider just two adjacent levels in the hierarchy  The cache: High speed and expensive  Kinds: Direct mapped, associative, set associative  Virtual memory—makes the hierarchy transparent  Translate the address from CPU’s logical address to the physical address where the information is actually stored  Memory management—how to move information back and forth  Multiprogramming—what to do while we wait  The Translation Look-aside Buffer - “TLB” helps in speeding the address translation process • Overall consideration of the memory as a subsystem
  • 6. February 26, 2025 Veton Këpuska 6 The CPU–Memory Interface  Sequence of events:  Read: 1. CPU loads MAR, issues Read, and REQUEST 2. Main memory transmits words to MDR 3. Main memory asserts COMPLETE  Write: 1. CPU loads MAR and MDR, asserts Write, and REQUEST 2. Value in MDR is written into address in MAR 3. Main memory asserts COMPLETE CPU m Main memory Address bus Data bus s Address 0 1 2 3 2m – 1 A0 – Am– 1 D0 – Db– 1 R/W REQUEST COMPLETE MDR Register file Control signals m w w MAR b
  • 7. February 26, 2025 Veton Këpuska 7 The CPU–Memory Interface (cont’d.)  Additional points:  If b < w, main memory must make (w/b) b-bit transfers  Some CPUs allow reading and writing of word sizes < w  Example: Intel 8088: m = 20, w = 16, s = b = 8  8- and 16-bit values can be read and written  If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted  Some systems use separate R and W lines, and omit REQUEST CPU m Main memory Address bus Data bus s Address 0 1 2 3 2m – 1 A0 – AmÐ1 D0 – DbÐ1 R/W REQUEST COMPLETE MDR Register file Control signals m w w MAR b
  • 8. February 26, 2025 Veton Këpuska 8 Some Memory Properties Symbol Definition Intel Intel PowerPC 8088 8086 601 w CPU word size 16 bits 16 bits 64 bits m Bits in a logical memory address 20 bits 20 bits 32 bits s Bits in smallest addressable unit 8 bits 8 bits 8 bits b Data bus size 8 bits 16 bits 64 bits 2m Memory word capacity, s-sized wds 220 wds 220 wds 232 wds 2mxsMemory bit capacity 220x8 bits 220 8 bits 232x8 bits
  • 9. February 26, 2025 Veton Këpuska 9 Big-Endian and Little-Endian Storage  When data types having a word size larger than the smallest addressable unit are stored in memory the question arises,  “Is the least significant part of the word stored at the lowest address (little-Endian, little end first) or—  is the most significant part of the word stored at the lowest address (big-Endian, big end first)”? Example: The hexadecimal 16-bit number ABCDH, stored at address 0: AB CD msb ... lsb AB CD 0 1 AB CD 0 1 Little-Endian Big-Endian
  • 10. February 26, 2025 Veton Këpuska 10 Bin-Endian and Little-Endian Storage of a PowerPC 32- Bit Word Little-Endian Storage Memory Address Contents 0 b7 … b0 1 b15 … b8 2 b23 … b16 3 b31 … b24 … CPU Word CPU Word bit locations b31...b24 b23…b16 b15...b8 b7…b0 Big-Endian byte addresses 0 1 2 3 Little-Endian byte addresses 3 2 1 0 Big-Endian Storage Memory Address Contents 0 b31 … b24 1 b23 … b16 2 b15 … b8 3 b7 … b0 …
  • 11. February 26, 2025 Veton Këpuska 11 Memory Performance Parameters Symbol Definition Units Meaning ta Access time Time Time to access a memory word until the assertion of the memory completion signal tc Cycle time Time Time from start of read to start of next memory operation k Block size Words Number of words per block  Bandwidth Words/ sec Word Transmission Rate tl Latency Time Time to access first of a sequence of words tbl=t1+k/ Block access time Time Time to access entire block form start of read
  • 12. February 26, 2025 Veton Këpuska 12 The Memory Hierarchy, Cost, and Performance  Some Typical Values: Component Access type Random Random Random Direct Sequential access access access access access Capacity, bytes 64–1024 8–512 KB 8–64 MB 1–10 GB 1 TB Latency 1–10 ns 20 ns 50 ns 10 ms 10 ms–10 s Block size 1 word 16 words 16 words 4 KB 4 KB Bandwidth System 8 MB/s 1 MB/s 1 MB/s 1 MB/s clock rate Cost/MB High $500 $30 $0.25 $0.02 CPU Cache MainMemory DiskMemory Tape Memory
  • 13. February 26, 2025 Veton Këpuska 13 Memory Hierarchy  Memory Hierarchy CPU Cache Main Memory HD word Primary Level Secondary Level Ternary Level 2nd -ary level Block 3rd -ary level Block Cache lines Virtual Memory •Block Size •Latency •Bandwidth •Hit Rate Increases Frequency of transaction Decreases Principle of Locality Fast & Small Slow & Large
  • 14. February 26, 2025 Veton Këpuska 14 Memory Hierarchy  Considering Any Two Adjacent Levels of the Memory Hierarchy  Some definitions:  Temporal locality: the property of most programs that if a given memory location is referenced, it is likely to be referenced again, “soon.”  Spatial locality: if a given memory location is referenced, those locations near it are likely to be referenced “soon.”  Working set: The set of memory locations referenced over a fixed period of time, or within a time window.  Notice that temporal and spatial locality both work to assure that the contents of the working set change only slowly over execution time. CPU • • • • • • Slower, larger Defining the primary and secondary levels: Secondary level Primary level Faster, smaller two adjacent levels in the hierarchy
  • 15. February 26, 2025 Veton Këpuska 15 Primary and Secondary Levels of the Memory Hierarchy Secondary level • The item of commerce between any two levels is the block. • Blocks may/will differ in size at different levels in the hierarchy. Example: Cache block size ~ 16–64 bytes. Disk block size: ~ 1–4 Kbytes. • As working set changes, blocks are moved back/forth through the hierarchy to satisfy memory access requests. • The primary level block size is often determined more by the properties of information transfer than by spatial locality principle. Speed between levels defined by latency: time to access first word, and bandwidth: the number of words per second transmitted between levels. Typical latencies: Cache latency: a few clocks Disk latency: 100,000 clocks Primary level
  • 16. February 26, 2025 Veton Këpuska 16 Primary and Secondary Addresses • A complication: Addresses will differ depending on the level. • Primary address: the address of a value in the primary level. • Secondary address: the address of a value in the secondary level.  Primary and Secondary Address Examples  Main memory address: unsigned integer  Disk address: track number, sector number, offset of word in sector.
  • 17. February 26, 2025 Veton Këpuska 17 Addressing and Accessing a Two-Level Hierarchy The computer system, HW or SW, must perform any address translation that is required: Two ways of forming the address: Segmentation and Paging. Paging is more common. Sometimes the two are used together, one “on top of” the other. More about address translation and paging later... Miss System address Hit Address in secondary memory Memory management unit (MMU) Address in primary memory Block Word Primary level Secondary level Translation function (mapping tables, permissions, etc.)
  • 18. February 26, 2025 Veton Këpuska 18 Primary Address Formation Block System address Lookup table Word Block Primary address (a) Paging Word Block System address Lookup table Word Base address Primary address (b) Segmentation Word +
  • 19. February 26, 2025 Veton Këpuska 19 Hits and Misses; Paging; Block Placement  Hit: the word was found at the level from which it was requested.  Miss: the word was not found at the level from which it was requested.  A miss will result in a request for the block containing the word from the next higher level in the hierarchy.  Hit ratio (or hit rate) = h = number of hits  Miss ratio: 1 - hit ratio  tp = primary memory access time.  ts = secondary memory access time  Access time, ta = h • tp + (1-h) • ts  Page: commonly, a disk block.  Page fault: synonymous with a miss.  Demand paging: pages are moved from disk to main memory only when a word in the page is requested by the processor.  Block placement and replacement decisions must be made each time a block is moved. total number of references
  • 20. February 26, 2025 Veton Këpuska 20 Hits and Misses; Paging; Block Placement  The Cache Regime:  Secondary latency only about 10 time as long as primary-level access time.  CPU wait afforded as long as miss ratio is small.  A small fast memory (sometimes part of CPU), called cache, connected to a slower memory typically of different semiconductor technology.  Cache’s access time matches the processor speed.  Two level cashes:  Primarily cache (faster level)  Secondary cache (slower level)  Since CPU waits on cache miss, software cannot be used in miss response thus everything has to be done in hardware.  This means that placement and replacement decisions and secondary address formation must be kept simple so they can be done quickly with limited hardware.  Less flexibility of placement in cache then in primary level of a virtual memory system.
  • 21. February 26, 2025 Veton Këpuska 21 Hits and Misses; Paging; Block Placement  The Cache Miss versus the Main Memory Miss  The cache hit time is typically 1-2 clock cycles.  Miss penalty only a few 10’s of clock cycles to bring the desired block into the cache from main memory.  Main memory miss penalty typically runs to the 10,000-100,000 of clock cycles to bring the block from the disk.  Cache miss rates of no greater than 2% are tolerable  Main memory miss rates no greater than 0.001% to avoid significant degradation in system performance.
  • 22. February 26, 2025 Veton Këpuska 22 Virtual Memory  A virtual memory is a memory hierarchy, usually consisting of at least main memory and disk, in which the processor issues all memory references as effective addresses in a flat address space. All translations to primary and secondary addresses are handled transparently to the process making the address reference, thus providing the illusion of a flat address space.  Recall that disk accesses may require 100,000 clock cycles to complete, due to the slow access time of the disk subsystem. Once the processor has, through mediation of the operating system, made the proper request to the disk subsystem, it is available for other tasks.  Multiprogramming shares the processor among independent programs that are resident in main memory and thus available for execution.
  • 23. February 26, 2025 Veton Këpuska 23 Decisions in Designing a 2-Level Hierarchy  Translation procedure to translate from system address to primary address.  Block size—block transfer efficiency and miss ratio will be affected.  Processor dispatch on miss—processor wait or processor multiprogrammed.  Primary-level placement—direct, associative, or a combination. Discussed later.  Replacement policy—which block is to be replaced upon a miss.  Direct access to secondary level—in the cache regime, can the processor directly access main memory upon a cache miss?  Write through—can the processor write directly to main memory upon a cache miss?  Read through—can the processor read directly from main memory upon a cache miss as the cache is being updated?  Read or write bypass—can certain infrequent read or write misses be satisfied by a direct access of main memory without any block movement?
  • 24. February 26, 2025 Veton Këpuska 24 The Cache Mapping Function CPU Cache Block Main memory Mapping function Address Word The cache mapping function is responsible for all cache operations: • Placement strategy: where to place an incoming block in the cache • Replacement strategy: which block to replace upon a miss • Read and write policy: how to handle reads and writes upon cache misses Mapping function must be implemented in hardware. Three different types of mapping functions: • Associative • Direct mapped • Block-set associative Example: 256 KB 16 words 32 MB
  • 25. February 26, 2025 Veton Këpuska 25 Memory Fields and Address Translation Example of processor-issued 32-bit virtual address: 0 31 32 bits That same 32-bit address partitioned into two fields, a block field, and a word field. The word field represents the offset into the block specified in the block field: Block number Word 26 6 226 64 word blocks 00 ••• 001001 001011 Example of a specific memory reference: Block 9, word 11.
  • 26. February 26, 2025 Veton Këpuska 26 Associative Cache *16 bits, while unrealistically small, simplifies the examples Associative mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory.* Cache memory Main memory Valid bits 0 1 2 255 421 ? 119 2 Cache block 0 MM block 0 MM block 1 MM block 2 MM block 119 MM block 421 MM block 8191 ? Cache block 2 Cache block 255 1 0 1 3 13 1 Tag memory Tag field, 13 bits Tag Main memory address: Byte One cache line, 8 bytes One cache line, 8 bytes Valid, 1 bit Associative Memory Search
  • 27. February 26, 2025 Veton Këpuska 27 Associative Cache  In General:  B- Number of Bytes per Block  T-Address size in bits of Main Memory, also Number of Tag bits in the Tag Memory  L-Number of Blocks in Cache.  SMM=Bx2T – Size of Main Memory in Bytes  SCM=LxB – Size of Cache Memory in Bytes  Main Memory address length in bits: log2SMM = T+log2B
  • 28. February 26, 2025 Veton Këpuska 28 Associative Cache Mechanism Because any block can reside anywhere in the cache, an associative (content addressable) memory is used. All locations are searched simultaneously. Match bit Valid bit Match 64 3 8 To CPU Argument register Associative tag memory 3 13 Selector Tag Main memory address Byte Cache block 0 ? Cache block 2 Cache block 255 One cache line, 8 bytes 2 3 1 4 5 6
  • 29. February 26, 2025 Veton Këpuska 29 Advantages and Disadvantages of the Associative Mapped Cache  Advantage  Most flexible of all—any MM block can go anywhere in the cache.  Disadvantages  Large tag memory.  Need to search entire tag memory simultaneously means lots of hardware.  Replacement Policy is an issue when the cache is full. Q.: How is an associative search conducted at the logic gate level? Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache:
  • 30. February 26, 2025 Veton Këpuska 30 Direct-Mapped Cache Key Idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number. Now the cache needs only examine the single group that its reference specifies. Cache memory Main memory block numbers Group #: Valid bits 0 1 2 255 30 9 1 1 1 1 1 3 8 3 8 1 Tag memory Tag field, 5 bits Group 5 Tag Cache address: Main memory address: Byte One cache line, 8 bytes One cache line, 8 bytes 0 1 2 255 256 257 258 511 512 513 514 767 2305 7680 7681 7682 7936 7937 7938 0 1 2 255 8191 0 Tag #: 1 2 9 30 31
  • 31. February 26, 2025 Veton Këpuska 31 Direct-Mapped Cache  In General:  B- Number of Bytes per Block  T- Number of Tag bits in the Tag Memory also Number of Columns in Main Memory  G-Number of Groups  L  L-Number of Blocks in Cache.  SMM=GxBx2T – Size of Main Memory in Bytes  SCM=LxB – Size of Cache Memory in Bytes  Main Memory address length in bits: log2SMM = T+log2G+log2B
  • 32. February 26, 2025 Veton Këpuska 32 Direct-Mapped Cache Operation 1. Decode the group number of the incoming MM address to select the group 2. If Match AND Valid 3. Then gate out the tag field 4. Compare cache tag with incoming tag 5. If a hit, then gate out the cache line 6. and use the word field to select the desired word. Cache memory Valid bits 0 1 2 255 30 9 1 1 1 5 5 5 8 64 3 1 256 1 1 3 8 1 Tag memory Tag field, 5 bits Group 5 Tag Cache miss Cache hit =  Main memory address Byte Hit 5-bit comparator Selector 8– 256 decoder 4 3 1 5 2 6
  • 33. February 26, 2025 Veton Këpuska 33 Direct-Mapped Caches  The direct mapped cache uses less hardware, but is much more restrictive in block placement.  If two blocks from the same group are frequently referenced, then the cache will “thrash.” That is, repeatedly bring the two competing blocks into and out of the cache. This will cause a performance degradation.  Block replacement strategy is trivial.  Compromise—allow several cache blocks in each group—the Block-Set-Associative Cache
  • 34. February 26, 2025 Veton Këpuska 34 2-Way Set-Associative Cache Example shows 256 groups, a set of two per group. Sometimes referred to as a 2-way set-associative cache. Cache memory Main memory block numbers Group #: 7680 2304 258 0 1 2 255 2 2 0 3 8 3 8 Tag memory Tag field, 5 bits Set 5 Tag Cache group address: Main memory address: Byte One cache line, 8 bytes One cache line, 8 bytes 512 513 255 0 1 2 255 256 257 258 511 512 513 514 767 2304 7680 7681 7682 7936 7937 7938 0 1 2 255 8191 0 Tag #: 1 2 9 30 31 30 9 1 1 511
  • 35. February 26, 2025 Veton Këpuska 35 K-Way Set-Associative Cache Tag Memory Valid Bits Cache Memory L T K L 1 K 2T LS B T log2S log2B Primary Address Main Memory Address
  • 36. February 26, 2025 Veton Këpuska 36 K-Way Set-Associative Cache  K=1-way Block-Set Associative Cache  Direct Mapped Cache  S=1 Block-Set Associative Cache  Associative Cache.
  • 37. February 26, 2025 Veton Këpuska 37 Read/Write Policies  Cache Write Policies upon a Cache Hit:  Write-through:  Updates both cache & Main Memory upon each write to a block that is present in the cache.  Write-back:  Postpones main memory update until block is replaced uses “dirty bit” (this bit is set upon first write in a block of the cache).  Read and Write Miss Policies  Miss upon a read:  Read-through: Immediate forwarding (from MM) to CPU then cache is updated  Load the block into the cache first.  Miss upon a write:  Write allocate: block is first brought into the cache and then (MM) updated.  Write no-allocate: write directly to main memory (without brining block to cache).
  • 38. February 26, 2025 Veton Këpuska 38 Block Replacement Policy  Not needed with direct-mapped cache  Least Recently Used (LRU)  Track usage with a counter. Each time a block is accessed:  Clear counter of accessed block  Increment counters with values less than the one just cleared.  All others remain unchanged  When set is full, remove line with highest count  Random replacement—replace block at random  Even random replacement is a fairly effective strategy  FIFO – Longest residing block replaced first.  Smaller cache memories:  Random worst, LRU best  Large cache memories:  Random, LRU comparable
  • 39. February 26, 2025 Veton Këpuska 39 Performance and Cache Design  Direct-Mapped and Block-Set Associative Caches may suffer from minor time penalties due to the need to decode the group (set) field.  For equivalent amount of hardware (e.g., # number of gates), direct-mapped and set-associative caches will have more capacity.  Quantitative analysis of cache performance is quite complex and is the subject of ongoing research.  Simplified mathematical analysis, however, will provide sufficient insight into the issues involved:
  • 40. February 26, 2025 Veton Këpuska 40 Performance and Cache Design  Definitions:  h - hit ratio and (1-h) miss ratio.  TC cache access time upon a cache hit.  TM access time upon a cache miss.  Ta average memory access during a particular machine execution interval:  Recall - Access time: Ta = h • Tp + (1 - h) • Ts for primary (p) and secondary (s) levels.  For Tp =Tc for cache and Ts=TM for MM, Ta = hTC + (1-h)TM  This equation is valid for read-through cache where the value is forwarded from main memory to the CPU directly as it is received without waiting for the entire cache line to be loaded before initiating another cache read to forward the value.  If this is not the case TM is replaced by the cache line fill time.
  • 41. February 26, 2025 Veton Këpuska 41 Performance and Cache Design  Speedup S: where  Twithout is the time taken without the improvement, e.g., without the cache, and  Twith is the time the process takes with the improvement, e.g., with the cache.  Having a model for cache and MM access times and cache line fill time, the speedup can be calculated once the hit ratio is known. with without T T S 
  • 42. February 26, 2025 Veton Këpuska 42 Performance and Cache Design  Example:  Changing a certain cache from direct mapped to two- way set-associative caused the hit ratio measured for a certain set of benchmarks to improve form 0.91 to 0.96 with TC=20 ns, and TM=70ns. What is the speedup, assuming no change in Tc from the set- associative multiplexer.  Answer:  Twithout = 0.91x20 + (1-0.91)x70 = 24.5 ns;  Twith = 0.96x20 + (1-0.96)x70 = 22.5 ns;  Thus the speedup is:  S=24.5/22.0=1.11  11% improvement in execution time.
  • 43. February 26, 2025 Veton Këpuska 43 Virtual Memory  The memory management unit, MMU, is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory. A word about addresses: CPU MainMemory Cache Disk MMU Logical Address Physical Address Mapping Tables Virtual Address CPU Chip • Effective address — an address computed by the processor while executing a program. Synonymous with logical address. • The term effective address is often used when referring to activity inside the CPU. Logical address is most often used when referring to addresses when viewed from outside the CPU. • Virtual address — the address generated from the logical address by the memory management unit, MMU, prior to making address translation. • Physical address — the address presented to the memory unit. (Note: Every address reference must be translated.)
• 44. Virtual Addresses—Why?
 The logical address provided by the CPU is translated to a virtual address by the MMU. The virtual address space is often larger than the logical address space, allowing program units to be mapped to a much larger virtual address space.
• 45. Virtual Addressing—Advantages
 Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system.
  No address relocation is required at load time.
  No need to fragment the program to accommodate memory limitations.
 Cost-effective use of physical memory.
  Less expensive secondary (disk) storage can replace primary storage. (The MMU brings portions of the program into physical memory as required; not all portions of the program need occupy physical memory at one time.)
 Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges.
  This allows access/security control at the most fundamental level.
  It can be used to prevent buggy programs and intruders from damaging other users or the system.
  This is the origin of those "bus error" and "segmentation fault" messages.
• 46. Memory Management
 [Figure: (a) Paging - the block field of the system address indexes a lookup table that supplies the primary-memory block directly; the word field passes through unchanged. (b) Segmentation - the lookup table supplies a base address, which is added to the word field to form the primary address.]
• 47. Memory Management by Segmentation
 Notice that each segment's virtual address starts at 0, unlike its physical address.
 Repeated movement of segments into and out of physical memory leaves gaps between segments. This is called external fragmentation.
 Compaction (defragmentation) routines must occasionally be run to remove these fragments.
 [Figure: segments 1, 5, 6, 9, and 3 occupy main memory with gaps between them; each segment's virtual addresses begin at 0000, independent of its physical location.]
• 48. Segmentation Mechanism
 [Figure: the offset field of the virtual address from the CPU is added to the segment base register to form the main-memory address; the offset is also compared against the segment limit register, and a bounds error is raised if the limit is exceeded.]
 Computing the physical address from the virtual address requires an integer addition for each memory reference, plus a comparison if segment limits are checked.
 Q: How does the MMU switch references from one segment to another?
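A minimal sketch of this base-plus-offset translation with a limit check (C; the struct and function names are illustrative, not from the slides):

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t base;   /* segment base register: physical start of segment */
    uint32_t limit;  /* segment limit register: segment length in bytes  */
} SegmentRegs;

/* Translate an offset within the current segment to a physical address.
   Returns false (bounds error) if the offset lies outside the segment. */
static bool segment_translate(const SegmentRegs *seg,
                              uint32_t offset, uint32_t *phys) {
    if (offset >= seg->limit)
        return false;            /* bounds error: raise a fault */
    *phys = seg->base + offset;  /* one integer addition per reference */
    return true;
}
```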
• 49. Memory Management by Paging
 [Figure: the mapping between virtual-memory pages, physical-memory pages, and pages in secondary memory for a program unit of pages 0 through n - 1; page n - 1 is present only in secondary memory, not in physical memory.]
 The MMU manages this mapping (logical to physical address).
 Demand paging: pages are brought into physical memory only as they are needed.
• 50. Virtual Address Translation in a Paged MMU
 [Figure: the virtual address from the CPU is split into a page number and an offset in the page. The page number is added to the page table base register to index the page table; an index beyond the page table limit register raises a bounds error. Each entry holds access-control bits (presence bit, dirty bit, usage bits) and either a physical page number or a pointer to secondary storage. Hit: the page is in primary memory, and the physical page number is combined with the offset to form the physical address of the desired word. Miss (page fault): the page is in secondary memory, and the entry translates to a disk address.]
 A page fault will result in 100,000 or more cycles passing before the page has been brought from secondary storage to main memory.
 One page table per user per program unit.
 One translation per memory access.
 Potentially large page table.
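A sketch of one such translation step (C; the field widths and names are assumptions for illustration, e.g., 4 KB pages and a 32-bit virtual address):

```c
#include <stdint.h>

#define PAGE_BITS   12                 /* assumed 4 KB pages */
#define PAGE_SIZE   (1u << PAGE_BITS)
#define OFFSET_MASK (PAGE_SIZE - 1)

typedef struct {
    uint32_t present : 1;   /* presence bit: page is in main memory    */
    uint32_t dirty   : 1;   /* set when the page has been written      */
    uint32_t used    : 1;   /* usage bit, consulted by replacement     */
    uint32_t frame   : 20;  /* physical page number (a disk pointer
                               would live here when present == 0)      */
} PTE;

/* One translation per memory access: index the page table with the
   virtual page number, then splice frame and offset back together. */
static int page_translate(const PTE *page_table, uint32_t table_limit,
                          uint32_t vaddr, uint32_t *paddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;   /* page number    */
    uint32_t offset = vaddr & OFFSET_MASK;  /* offset in page */

    if (vpn >= table_limit)
        return -1;                          /* bounds error   */
    if (!page_table[vpn].present)
        return -2;                          /* page fault: 100,000+
                                               cycles to fetch from disk */
    *paddr = ((uint32_t)page_table[vpn].frame << PAGE_BITS) | offset;
    return 0;                               /* hit            */
}
```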
• 51. Page Placement and Replacement
 Page tables are direct mapped, since the page table entry is found directly from the virtual page number.
 The physical pages themselves, however, can reside anywhere in physical memory.
 Page tables such as the one on the previous slide are large, since there must be a page table entry for every page in the program unit.
 Some implementations resort to hash tables instead, which need entries only for the pages actually present in physical memory.
 Replacement strategies are generally LRU, or at least employ a "use bit" to guide replacement.
• 52. Fast Address Translation: Regaining Lost Ground
 The concept of virtual memory is very attractive, but it carries considerable overhead:
  There must be a translation for every memory reference.
  There must be two memory references for every program reference:
   one to retrieve the page table entry,
   one to retrieve the value.
  Most caches are addressed by physical address, so the virtual-to-physical translation must happen before the cache can even be accessed.
• 53. Fast Address Translation: Regaining Lost Ground
 The answer: a small cache in the processor that retains the last few virtual-to-physical translations, the Translation Lookaside Buffer, TLB.
 The TLB contains not only the virtual-to-physical translations but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly.
 The TLB is usually implemented as a fully associative cache:
• 54. Translation Lookaside Buffer Structure and Operation
 [Figure: the page number of the virtual address from the CPU is looked up associatively in the TLB. On a TLB hit, the page is in primary memory: the entry's physical page number and access-control bits (presence bit, dirty bit, valid bit, usage bits) yield the physical address, which is sent with the word offset to main memory or the cache. On a TLB miss, the physical page is looked up in the page table.]
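A sketch of the fully associative lookup (C; the structure layout and entry count are assumptions, not taken from any real MMU):

```c
#include <stdint.h>

#define TLB_ENTRIES 16  /* assumed size; real TLBs are similarly small */

typedef struct {
    uint32_t vpn;    /* virtual page number (the tag)       */
    uint32_t frame;  /* physical page number                */
    uint8_t  valid;  /* entry holds a live translation      */
    uint8_t  dirty;  /* copied from the page table entry    */
    uint8_t  prot;   /* read/write/execute protection bits  */
} TLBEntry;

/* Fully associative lookup: compare the virtual page number against
   every entry (in hardware, all comparisons happen in parallel).
   Returns 1 on a TLB hit, 0 on a miss (go to the page table). */
static int tlb_lookup(const TLBEntry tlb[TLB_ENTRIES],
                      uint32_t vpn, uint32_t *frame) {
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *frame = tlb[i].frame;
            return 1;  /* hit: access physical memory directly */
        }
    }
    return 0;          /* miss: look up the page table */
}
```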
• 55. Operation of the Memory Hierarchy
 [Flowchart: the CPU issues a virtual address.
  1. Search the TLB. On a TLB hit, generate the physical address; on a miss, search the page table.
  2. On a page table hit, update the TLB and generate the physical address. On a miss, a page fault occurs: get the page from secondary memory, then update main memory, the cache, and the page table.
  3. With the physical address, search the cache. On a cache hit, return the value from the cache; on a miss, update the cache from main memory and then return the value.]
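The same flow, condensed into a toy, self-contained C simulation (every name and size here is an assumption chosen for illustration; a real hierarchy would also model the cache, which is elided below):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12           /* assumed 4 KB pages                 */
#define NPAGES    8            /* assumed tiny virtual address space */
#define TLB_SZ    2

/* Toy state: a presence-bit page table and a FIFO-filled TLB. */
static int      pt_present[NPAGES];
static uint32_t pt_frame[NPAGES];
static int      tlb_vpn[TLB_SZ], tlb_valid[TLB_SZ];
static uint32_t tlb_frame[TLB_SZ];
static int      next_tlb;

static uint32_t memory_access(uint32_t vaddr) {  /* vaddr < NPAGES pages */
    uint32_t vpn = vaddr >> PAGE_BITS, frame = 0;
    int hit = 0;

    for (int i = 0; i < TLB_SZ; i++)             /* 1. search the TLB   */
        if (tlb_valid[i] && tlb_vpn[i] == (int)vpn) {
            frame = tlb_frame[i]; hit = 1;
        }
    if (!hit) {                                  /* 2. search page table */
        if (!pt_present[vpn]) {                  /* page fault           */
            printf("page fault on page %u: fetch from disk, "
                   "update MM and page table\n", vpn);
            pt_present[vpn] = 1; pt_frame[vpn] = vpn; /* toy placement  */
        }
        frame = pt_frame[vpn];
        tlb_vpn[next_tlb] = (int)vpn; tlb_frame[next_tlb] = frame;
        tlb_valid[next_tlb] = 1;                 /* update the TLB       */
        next_tlb = (next_tlb + 1) % TLB_SZ;
    }
    /* 3. generate the physical address, then search the cache
       (elided: on a miss, update the cache from MM, then return). */
    return (frame << PAGE_BITS) | (vaddr & ((1u << PAGE_BITS) - 1));
}

int main(void) {
    printf("paddr = 0x%x\n", memory_access(0x1234));  /* faults, maps */
    printf("paddr = 0x%x\n", memory_access(0x1238));  /* TLB hit      */
    return 0;
}
```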
• 56. I/O Connection to a Memory with a Cache
 The memory system is quite complex and affords many possible tradeoffs.
 The only realistic way to choose among these alternatives is to study a typical workload, using either simulations or prototype systems.
 Instruction and data accesses usually have different patterns.
 It is possible to employ a cache at the disk level, using the disk hardware.
 Traffic between MM and disk is I/O, and direct memory access, DMA, can be used to speed the transfers.
 [Figure: DMA paths connect the disk and other I/O devices directly to main memory, bypassing the CPU and cache; paging traffic moves between main memory and disk.]
• 57. Summary
 Most memory systems are multileveled: cache, main memory, and disk.
 Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
 Chips are organized into boards and modules.
 Larger, slower memory is attached to faster memory in a hierarchical structure.
 The cache to main memory interface requires address translation in hardware.
 Virtual memory, the main memory to disk interface, can employ software for address translation because of the slower speeds involved.
 The hierarchy must be carefully designed to ensure optimum price-performance.