Slide 2
Chapter Outline
7.1 Introduction: The Components of the Memory System
7.2 RAM Structure: The Logic Designer’s Perspective
7.3 Memory Boards and Modules
7.4 Two-Level Memory Hierarchy
7.5 The Cache
7.6 Virtual Memory
7.7 The Memory Subsystem in the Computer
Slide 3
Introduction
So far, we’ve treated memory as an array of words limited in size only by the number of address bits.
Real-world issues arise:
cost
speed
size
power consumption
volatility
...
Slide 4
Topics Covered
Memory components:
RAM memory cells and cell arrays
Static RAM—more expensive, but simpler to use (no refresh needed)
Tree and matrix decoders—needed for large RAM chips
Dynamic RAM—less expensive, but needs “refreshing”
Chip organization
Timing
ROM—Read-only memory
Memory boards
Arrays of chips give more addresses and/or wider words
2-D and 3-D chip arrays
Memory modules
Large systems can benefit by partitioning memory for separate access by system components
fast access to multiple words
Slide 5
Topics Covered
• The memory hierarchy: from fast and expensive to slow and cheap
Example: Registers → Cache → Main Memory → Disk
At first, consider just two adjacent levels in the hierarchy
The cache: High speed and expensive
Kinds: Direct mapped, associative, set associative
Virtual memory—makes the hierarchy transparent
Translate the address from CPU’s logical address to the physical address where the information is actually stored
Memory management—how to move information back and forth
Multiprogramming—what to do while we wait
The Translation Look-aside Buffer (TLB) helps speed up the address translation process
• Overall consideration of the memory as a subsystem
Slide 6
The CPU–Memory Interface
Sequence of events:
Read:
1. CPU loads MAR, issues Read and REQUEST
2. Main memory transmits words to MDR
3. Main memory asserts COMPLETE
Write:
1. CPU loads MAR and MDR, asserts Write, and REQUEST
2. Value in MDR is written into address in MAR
3. Main memory asserts COMPLETE
[Figure: the CPU–main memory interface. The CPU’s MAR drives the m-bit address bus (A0–Am–1) and its MDR connects to the b-bit data bus (D0–Db–1); the control lines R/W, REQUEST, and COMPLETE coordinate transfers with a memory of 2^m addressable units of s bits each.]
Slide 7
The CPU–Memory Interface (cont’d.)
Additional points:
If b < w, main memory must make (w/b) b-bit transfers
Some CPUs allow reading and writing of word sizes < w
Example: Intel 8088: m = 20, w = 16, s = b = 8
8- and 16-bit values can be read and written
If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted
Some systems use separate R and W lines, and omit REQUEST
[Figure: the same CPU–main memory interface as on the previous slide.]
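To make the (w/b)-transfers rule concrete, here is a minimal Python sketch (illustrative, not from the slides) that assembles a 16-bit word from two 8-bit bus transfers, as on the 8088:

    # Reading a w-bit word over a b-bit data bus takes w/b bus transfers.
    # Little-endian byte order is assumed here, as on the Intel 8088.

    def read_word(memory: bytes, addr: int, w: int = 16, b: int = 8) -> int:
        """Assemble one w-bit word from w // b byte-wide (b = 8) transfers."""
        assert w % b == 0, "word size must be a multiple of the bus width"
        word = 0
        for i in range(w // b):              # one loop pass per bus transfer
            chunk = memory[addr + i]         # one b-bit transfer from memory
            word |= chunk << (b * i)         # little-endian assembly
        return word

    mem = bytes([0xCD, 0xAB, 0x00, 0x00])    # 0xABCD stored little-endian at 0
    print(hex(read_word(mem, 0)))            # -> 0xabcd, via two transfers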
Slide 8
Some Memory Properties

Symbol  | Definition                          | Intel 8088    | Intel 8086    | PowerPC 601
w       | CPU word size                       | 16 bits       | 16 bits       | 64 bits
m       | Bits in a logical memory address    | 20 bits       | 20 bits       | 32 bits
s       | Bits in smallest addressable unit   | 8 bits        | 8 bits        | 8 bits
b       | Data bus size                       | 8 bits        | 16 bits       | 64 bits
2^m     | Memory word capacity, s-sized words | 2^20 words    | 2^20 words    | 2^32 words
2^m × s | Memory bit capacity                 | 2^20 × 8 bits | 2^20 × 8 bits | 2^32 × 8 bits
Slide 9
Big-Endian and Little-Endian Storage
When data types having a word size larger than the smallest addressable unit are stored in memory, the question arises:
“Is the least significant part of the word stored at the lowest address (little-Endian, little end first), or is the most significant part of the word stored at the lowest address (big-Endian, big end first)?”
Example: The hexadecimal 16-bit number ABCDH, stored at address 0:
[Figure: the word ABCDH (most significant byte AB, least significant byte CD) stored both ways:
Little-Endian: address 0 holds CD, address 1 holds AB.
Big-Endian: address 0 holds AB, address 1 holds CD.]
Slide 10
Big-Endian and Little-Endian Storage of a PowerPC 32-Bit Word

CPU word bit locations:       b31...b24 | b23...b16 | b15...b8 | b7...b0
Big-Endian byte addresses:        0     |     1     |     2    |    3
Little-Endian byte addresses:     3     |     2     |     1    |    0

Little-Endian Storage           Big-Endian Storage
Memory Address | Contents       Memory Address | Contents
0              | b7 ... b0      0              | b31 ... b24
1              | b15 ... b8     1              | b23 ... b16
2              | b23 ... b16    2              | b15 ... b8
3              | b31 ... b24    3              | b7 ... b0
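For concreteness, a short Python sketch (illustrative only, using the standard library’s int.to_bytes) that reproduces both layouts for a 32-bit word:

    # Show how one 32-bit word lands in memory under each byte order.
    word = 0xB3B2B1B0     # stand-in value: the byte holding bits 8N+7..8N is 0xBN

    for order in ("little", "big"):
        stored = word.to_bytes(4, byteorder=order)
        print(f"{order:>6}-endian:",
              ", ".join(f"addr {a} -> {b:#04x}" for a, b in enumerate(stored)))

    # little-endian: addr 0 -> 0xb0, addr 1 -> 0xb1, addr 2 -> 0xb2, addr 3 -> 0xb3
    #    big-endian: addr 0 -> 0xb3, addr 1 -> 0xb2, addr 2 -> 0xb1, addr 3 -> 0xb0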
Slide 11
Memory Performance Parameters

Symbol          | Definition        | Units     | Meaning
ta              | Access time       | Time      | Time to access a memory word, up to the assertion of the memory completion signal
tc              | Cycle time        | Time      | Time from start of read to start of next memory operation
k               | Block size        | Words     | Number of words per block
BW              | Bandwidth         | Words/sec | Word transmission rate
tl              | Latency           | Time      | Time to access the first of a sequence of words
tbl = tl + k/BW | Block access time | Time      | Time to access an entire block from the start of the read
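A quick numeric check of the block-access-time formula; the latency and bandwidth values below are assumed for illustration:

    # Block access time: tbl = tl + k / BW
    tl = 20e-9      # latency: 20 ns to reach the first word (assumed value)
    k = 16          # block size: 16 words
    BW = 8e6        # bandwidth: 8 million words/sec (assumed value)

    tbl = tl + k / BW
    print(f"tbl = {tbl * 1e6:.2f} microseconds")   # 0.02 us + 2 us = 2.02 us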
Slide 12
The Memory Hierarchy, Cost, and Performance
Some Typical Values:

Component       | CPU               | Cache         | Main Memory   | Disk Memory   | Tape Memory
Access type     | Random access     | Random access | Random access | Direct access | Sequential access
Capacity, bytes | 64–1024           | 8–512 KB      | 8–64 MB       | 1–10 GB       | 1 TB
Latency         | 1–10 ns           | 20 ns         | 50 ns         | 10 ms         | 10 ms–10 s
Block size      | 1 word            | 16 words      | 16 words      | 4 KB          | 4 KB
Bandwidth       | System clock rate | 8 MB/s        | 1 MB/s        | 1 MB/s        | 1 MB/s
Cost/MB         | High              | $500          | $30           | $0.25         | $0.02
Slide 13
Memory Hierarchy
[Figure: the memory hierarchy CPU – Cache – Main Memory – HD. The cache is the primary level, main memory the secondary level, and disk (virtual memory) the ternary level. Words move between the CPU and the cache, cache lines between the cache and main memory, and blocks between main memory and disk. Each level is characterized by its block size, latency, bandwidth, and hit rate. Moving down the hierarchy, the frequency of transactions decreases: fast & small at the top, slow & large at the bottom, with the principle of locality making the arrangement work.]
Slide 14
Memory Hierarchy
Considering Any Two Adjacent Levels of the Memory Hierarchy
Some definitions:
Temporal locality: the property of most programs that if a given memory location is referenced, it is likely to be referenced again, “soon.”
Spatial locality: if a given memory location is referenced, those locations near it are likely to be referenced “soon.”
Working set: the set of memory locations referenced over a fixed period of time, or within a time window.
Notice that temporal and spatial locality both work to assure that the contents of the working set change only slowly over execution time.
[Figure: defining the primary and secondary levels. The CPU connects to the faster, smaller primary level, which in turn connects to the slower, larger secondary level; any two adjacent levels of the hierarchy can be viewed this way.]
Slide 15
Primary and Secondary Levels of the Memory Hierarchy
• The item of commerce between any two levels is the block.
• Blocks may/will differ in size at different levels in the hierarchy.
Example: Cache block size ~ 16–64 bytes. Disk block size ~ 1–4 KB.
• As the working set changes, blocks are moved back and forth through the hierarchy to satisfy memory access requests.
• The primary-level block size is often determined more by the properties of information transfer than by the spatial locality principle.
Speed between levels is defined by
latency: the time to access the first word, and
bandwidth: the number of words per second transmitted between levels.
Typical latencies:
Cache latency: a few clocks
Disk latency: 100,000 clocks
Slide 16
Primary and Secondary Addresses
• A complication: addresses will differ depending on the level.
• Primary address: the address of a value in the primary level.
• Secondary address: the address of a value in the secondary level.
Primary and Secondary Address Examples
Main memory address: unsigned integer
Disk address: track number, sector number, offset of word in sector.
Slide 17
Addressing and Accessing a Two-Level Hierarchy
The computer system, HW or SW, must perform any address translation that is required.
Two ways of forming the address: Segmentation and Paging. Paging is more common. Sometimes the two are used together, one “on top of” the other. More about address translation and paging later...
[Figure: the memory management unit (MMU) applies a translation function (mapping tables, permissions, etc.) to the system address. On a hit, it produces the block and word address in primary memory; on a miss, it produces the address in secondary memory from which the block must be fetched.]
Slide 18
Primary Address Formation
[Figure: two ways of forming the primary address from a system address split into block and word fields.
(a) Paging: the block field indexes a lookup table that yields the primary block number; the word field passes through unchanged as the offset into the block.
(b) Segmentation: the lookup table yields a base address, which is added to the word offset to form the primary address.]
Slide 19
Hits and Misses; Paging; Block Placement
Hit: the word was found at the level from which it was requested.
Miss: the word was not found at the level from which it was requested.
A miss will result in a request for the block containing the word from the next higher level in the hierarchy.
Hit ratio (or hit rate): h = (number of hits) / (total number of references)
Miss ratio: 1 - h
tp = primary memory access time
ts = secondary memory access time
Access time: ta = h • tp + (1 - h) • ts
Page: commonly, a disk block.
Page fault: synonymous with a miss.
Demand paging: pages are moved from disk to main memory only when a word in the page is requested by the processor.
Block placement and replacement decisions must be made each time a block is moved.
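A quick numeric illustration of the access-time formula; the 20 ns / 70 ns / 95% figures below are assumed values, not from the slide:

    # Effective access time across two adjacent levels: ta = h*tp + (1 - h)*ts
    def effective_access_time(h: float, tp: float, ts: float) -> float:
        """h: hit ratio; tp, ts: primary and secondary access times."""
        return h * tp + (1 - h) * ts

    # Assumed values: 20 ns cache, 70 ns main memory, 95% hit ratio.
    print(effective_access_time(0.95, 20e-9, 70e-9))   # ~2.25e-08 s = 22.5 ns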
Slide 20
Hits and Misses; Paging; Block Placement
The Cache Regime:
Secondary latency is only about 10 times as long as the primary-level access time.
Making the CPU wait is affordable as long as the miss ratio is small.
A small fast memory (sometimes part of the CPU), called a cache, is connected to a slower memory, typically of a different semiconductor technology.
The cache’s access time matches the processor speed.
Two-level caches:
Primary cache (faster level)
Secondary cache (slower level)
Since the CPU waits on a cache miss, software cannot be used in the miss response; everything has to be done in hardware.
This means that placement and replacement decisions and secondary address formation must be kept simple so they can be done quickly with limited hardware.
There is less flexibility of placement in a cache than in the primary level of a virtual memory system.
Slide 21
Hits and Misses; Paging; Block Placement
The Cache Miss versus the Main Memory Miss
The cache hit time is typically 1–2 clock cycles.
The cache miss penalty is only a few tens of clock cycles to bring the desired block into the cache from main memory.
The main memory miss penalty typically runs to 10,000–100,000 clock cycles to bring the block from the disk.
Cache miss rates of no greater than 2% are tolerable.
Main memory miss rates must be no greater than 0.001% to avoid significant degradation in system performance.
Slide 22
Virtual Memory
A virtual memory is a memory hierarchy, usually consisting of at least main memory and disk, in which the processor issues all memory references as effective addresses in a flat address space. All translations to primary and secondary addresses are handled transparently to the process making the address reference, thus providing the illusion of a flat address space.
Recall that disk accesses may require 100,000 clock cycles to complete, due to the slow access time of the disk subsystem. Once the processor has, through mediation of the operating system, made the proper request to the disk subsystem, it is available for other tasks.
Multiprogramming shares the processor among independent programs that are resident in main memory and thus available for execution.
Slide 23
Decisions in Designing a 2-Level Hierarchy
Translation procedure to translate from system address to primary address.
Block size—block transfer efficiency and miss ratio will be affected.
Processor dispatch on miss—processor wait or processor multiprogrammed.
Primary-level placement—direct, associative, or a combination. Discussed later.
Replacement policy—which block is to be replaced upon a miss.
Direct access to secondary level—in the cache regime, can the processor directly access main memory upon a cache miss?
Write through—can the processor write directly to main memory upon a cache miss?
Read through—can the processor read directly from main memory upon a cache miss as the cache is being updated?
Read or write bypass—can certain infrequent read or write misses be satisfied by a direct access of main memory without any block movement?
Slide 24
The Cache Mapping Function
[Figure: the CPU issues an address; the mapping function determines where the addressed word’s block resides in the cache, with blocks moving between cache and main memory as needed. Example sizes: 256-KB cache, 16-word blocks, 32-MB main memory.]
The cache mapping function is responsible for all cache operations:
• Placement strategy: where to place an incoming block in the cache
• Replacement strategy: which block to replace upon a miss
• Read and write policy: how to handle reads and writes upon cache misses
The mapping function must be implemented in hardware.
Three different types of mapping functions:
• Associative
• Direct mapped
• Block-set associative
Slide 25
Memory Fields and Address Translation
Example of a processor-issued 32-bit virtual address: bits 31 down to 0 (32 bits).
That same 32-bit address partitioned into two fields, a block field and a word field. The word field represents the offset into the block specified in the block field:

Block number (26 bits) | Word (6 bits)
2^26 blocks of 64 words each

Example of a specific memory reference: block 9, word 11:
00...001001 | 001011
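A minimal sketch of this field split in Python, mirroring the slide’s 26/6 partition:

    # Split a 32-bit address into a 26-bit block number and a 6-bit word offset.
    WORD_BITS = 6                                  # 2**6 = 64 words per block

    def split_address(addr: int) -> tuple[int, int]:
        block = addr >> WORD_BITS                  # upper 26 bits
        word = addr & ((1 << WORD_BITS) - 1)       # lower 6 bits
        return block, word

    addr = (9 << WORD_BITS) | 11                   # encode block 9, word 11
    print(split_address(addr))                     # -> (9, 11)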
Slide 26
Associative Cache
Associative mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory.*
*16 bits, while unrealistically small, simplifies the examples
[Figure: a 256-line cache (cache blocks 0–255, one cache line = 8 bytes) with a tag memory holding a 13-bit tag and a valid bit for each line, searched associatively. Main memory holds 8192 blocks; in the example, MM blocks 119 and 421 are resident in cache lines whose valid bits are 1. The 16-bit main memory address splits into a 13-bit tag field and a 3-bit byte field.]
Slide 27
Associative Cache
In general:
B = number of bytes per block
T = number of bits in a main memory block number, also the number of tag bits in the tag memory
L = number of blocks in the cache
SMM = B × 2^T — size of main memory in bytes
SCM = L × B — size of cache memory in bytes
Main memory address length in bits: log2(SMM) = T + log2(B)
Slide 28
Associative Cache Mechanism
Because any block can reside anywhere in the cache, an associative (content addressable) memory is used. All locations are searched simultaneously.
[Figure: the 13-bit tag field of the main memory address is placed in an argument register and compared against every entry of the associative tag memory at once. A match whose valid bit is set raises the match line for that line; the selector then gates out the matching 8-byte (64-bit) cache line, and the 3-bit byte field selects the desired byte (8 bits to the CPU).]
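A behavioral Python sketch of the associative lookup; a dict stands in for the parallel hardware search, and the resident block number reused from the figure is illustrative:

    # Associative cache: any MM block may occupy any line, so a lookup is a
    # parallel search of all tags. A dict keyed by tag models that
    # content-addressable search here.

    class AssociativeCache:
        def __init__(self, block_bytes: int = 8):
            self.block_bytes = block_bytes
            self.lines = {}                    # tag -> line data (valid entries only)

        def read(self, address: int):
            tag, byte = divmod(address, self.block_bytes)
            if tag in self.lines:              # "all locations searched at once"
                return self.lines[tag][byte]   # hit: select the byte in the line
            return None                        # miss: block must be fetched

    cache = AssociativeCache()
    cache.lines[421] = bytes(range(8))         # pretend MM block 421 is resident
    print(cache.read(421 * 8 + 5))             # hit  -> 5
    print(cache.read(119 * 8))                 # miss -> None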
Slide 29
Advantages and Disadvantages of the Associative Mapped Cache
Advantage
Most flexible of all—any MM block can go anywhere in the cache.
Disadvantages
Large tag memory.
The need to search the entire tag memory simultaneously means lots of hardware.
Replacement policy is an issue when the cache is full.
Q.: How is an associative search conducted at the logic gate level?
Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache:
Slide 30
Direct-Mapped Cache
Key idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number.
Now the cache needs only examine the single group that its reference specifies.
[Figure: a 256-line cache with a tag memory holding a 5-bit tag and a valid bit per line. Main memory’s 8192 blocks are arranged as 256 groups (rows 0–255) by 32 tags (columns 0–31), so MM block number = tag × 256 + group; e.g., group 1 can hold exactly one of blocks 1, 257, 513, ..., 7937. The main memory address splits into a 5-bit tag, an 8-bit group, and a 3-bit byte field; the group and byte fields form the cache address.]
Slide 31
Direct-Mapped Cache
In general:
B = number of bytes per block
T = number of tag bits in the tag memory; main memory has 2^T columns of blocks
G = number of groups, equal to the number of blocks in the cache
L = number of blocks in the cache (so G = L)
SMM = G × B × 2^T — size of main memory in bytes
SCM = L × B — size of cache memory in bytes
Main memory address length in bits: log2(SMM) = T + log2(G) + log2(B)
Slide 32
Direct-Mapped Cache Operation
1. Decode the group number of the incoming MM address to select the group.
2. Gate out the tag field of the selected tag-memory entry.
3. Compare the stored cache tag with the incoming tag.
4. If Match AND Valid,
5. then it is a hit: gate out the cache line,
6. and use the word field to select the desired word.
[Figure: the 8-bit group field drives an 8-to-256 decoder that selects one tag-memory entry; a 5-bit comparator checks the stored tag against the incoming tag and, qualified by the valid bit, signals cache hit or cache miss. On a hit, the selector picks the desired byte out of the 8-byte (64-bit) cache line using the 3-bit byte field.]
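A behavioral sketch of the direct-mapped lookup, using the slide’s sizes (256 groups, 8-byte lines, 5-bit tags); the resident block chosen below is illustrative:

    # Direct-mapped cache: the group field picks exactly one line; a hit needs
    # the valid bit set and a matching tag.

    GROUPS, BLOCK_BYTES = 256, 8

    tags = [0] * GROUPS                        # tag memory
    valid = [False] * GROUPS                   # valid bits
    data = [bytes(BLOCK_BYTES)] * GROUPS       # cache lines

    def read(address: int):
        byte = address % BLOCK_BYTES
        group = (address // BLOCK_BYTES) % GROUPS
        tag = address // (BLOCK_BYTES * GROUPS)
        if valid[group] and tags[group] == tag:    # Match AND Valid -> hit
            return data[group][byte]
        return None                                # miss

    # Make MM block 2305 (tag 9, group 1) resident, then probe it.
    tags[1], valid[1], data[1] = 9, True, bytes(range(8))
    print(read(2305 * BLOCK_BYTES + 3))   # hit -> 3
    print(read(257 * BLOCK_BYTES))        # same group, tag 1 -> miss (None)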
Slide 33
Direct-Mapped Caches
The direct-mapped cache uses less hardware, but is much more restrictive in block placement.
If two blocks from the same group are frequently referenced, then the cache will “thrash,” that is, repeatedly bring the two competing blocks into and out of the cache. This will cause a performance degradation.
The block replacement strategy is trivial.
Compromise—allow several cache blocks in each group: the Block-Set-Associative Cache.
Slide 34
2-Way Set-Associative Cache
Example shows 256 groups, a set of two blocks per group. Sometimes referred to as a 2-way set-associative cache.
[Figure: like the direct-mapped cache, but each of the 256 sets holds two cache lines (8 bytes each), each with its own 5-bit tag and valid bit, so two blocks from the same group can be resident at once; e.g., set 0 holds MM blocks 7680 (tag 30) and 2304 (tag 9). The main memory address splits into a 5-bit tag, an 8-bit set, and a 3-bit byte field; MM block number = tag × 256 + set.]
Slide 35
K-Way Set-Associative Cache
[Figure: general structure of a K-way set-associative cache with L cache lines of B bytes, organized as S = L/K sets of K lines each; the tag memory holds one T-bit tag plus a valid bit per line. Main memory holds 2^T × S blocks, and the main memory address splits into a T-bit tag, log2(S) set bits, and log2(B) byte bits; the set and byte fields form the primary (cache) address.]
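A behavioral sketch of the K-way lookup (2-way here); LRU bookkeeping is omitted and sketched separately with the replacement policies later:

    # K-way set-associative cache: the set field picks one set; only that set's
    # K tags are compared. Direct-mapped is K = 1; fully associative is K = L.

    class SetAssociativeCache:
        def __init__(self, num_sets: int = 256, k: int = 2, block_bytes: int = 8):
            self.num_sets, self.k, self.block_bytes = num_sets, k, block_bytes
            self.sets = [[] for _ in range(num_sets)]   # each set: up to k (tag, data)

        def read(self, address: int):
            byte = address % self.block_bytes
            set_idx = (address // self.block_bytes) % self.num_sets
            tag = address // (self.block_bytes * self.num_sets)
            for line_tag, line_data in self.sets[set_idx]:  # search K lines only
                if line_tag == tag:
                    return line_data[byte]                  # hit
            return None                                     # miss

    cache = SetAssociativeCache()
    cache.sets[0] = [(30, bytes(range(8))), (9, bytes(8))]  # blocks 7680 and 2304
    print(cache.read(7680 * 8 + 4))   # hit -> 4
    print(cache.read(2304 * 8 + 1))   # hit -> 0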
Slide 37
Read/Write Policies
Cache write policies upon a cache hit:
Write-through: updates both the cache and main memory upon each write to a block that is present in the cache.
Write-back: postpones the main memory update until the block is replaced; uses a “dirty bit,” which is set upon the first write to a block in the cache.
Read and write miss policies:
Miss upon a read:
Read-through: the value is immediately forwarded from MM to the CPU, and then the cache is updated.
Otherwise: the block is loaded into the cache first.
Miss upon a write:
Write-allocate: the block is first brought into the cache and then updated.
Write no-allocate: write directly to main memory (without bringing the block into the cache).
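A compact behavioral sketch contrasting the two hit-time write policies; this models the bookkeeping only, not any particular hardware:

    # Write policies on a cache hit. Write-through sends every write to main
    # memory at once; write-back only sets the dirty bit and updates MM when
    # the line is eventually replaced.

    class Line:
        def __init__(self, data: bytes):
            self.data, self.dirty = bytearray(data), False

    def write_hit(line, byte, value, mm, block_addr, policy="write-back"):
        line.data[byte] = value
        if policy == "write-through":
            mm[block_addr + byte] = value      # MM updated on every write
        else:
            line.dirty = True                  # MM updated only at replacement

    def evict(line, mm, block_addr):
        if line.dirty:                         # write-back flush of a dirty line
            mm[block_addr:block_addr + len(line.data)] = line.data

    mm = bytearray(64)
    line = Line(mm[8:16])                      # cache the 8-byte block at MM addr 8
    write_hit(line, 0, 0xAA, mm, 8)            # write-back: MM not yet updated
    print(mm[8], line.dirty)                   # -> 0 True
    evict(line, mm, 8)
    print(mm[8])                               # -> 170 (0xAA reaches MM at eviction)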
Slide 38
Block Replacement Policy
Not needed with a direct-mapped cache.
Least Recently Used (LRU):
Track usage with a counter per line. Each time a block is accessed:
Clear the counter of the accessed block.
Increment the counters whose values are less than the one just cleared.
All others remain unchanged.
When the set is full, replace the line with the highest count.
Random replacement—replace a block at random.
Even random replacement is a fairly effective strategy.
FIFO—the longest-resident block is replaced first.
Smaller cache memories: random is worst, LRU is best.
Large cache memories: random and LRU are comparable.
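A sketch of the counter-based LRU scheme just described, for one set of four lines; the counters play the role of hardware age bits:

    # Counter-based LRU for one cache set: counter value = age since last access.
    # On an access, the touched line's counter clears to 0 and every counter
    # that was smaller is incremented; the victim is the line with the highest count.

    K = 4
    counters = list(range(K))        # line 0 most recent ... line 3 least recent

    def touch(line: int):
        old = counters[line]
        for i in range(K):
            if counters[i] < old:    # only younger lines age by one
                counters[i] += 1
        counters[line] = 0

    def victim() -> int:
        return counters.index(max(counters))   # least recently used line

    touch(2)                          # access line 2
    print(counters)                   # -> [1, 2, 0, 3]
    print(victim())                   # -> 3 (line 3 is least recently used)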
Slide 39
Performance and Cache Design
Direct-mapped and block-set-associative caches may suffer from minor time penalties due to the need to decode the group (set) field.
For an equivalent amount of hardware (e.g., number of gates), direct-mapped and set-associative caches will have more capacity than an associative cache.
Quantitative analysis of cache performance is quite complex and is the subject of ongoing research. A simplified mathematical analysis, however, will provide sufficient insight into the issues involved:
Slide 40
Performance and Cache Design
Definitions:
h — hit ratio; (1 - h) — miss ratio.
TC — cache access time upon a cache hit.
TM — access time upon a cache miss.
Ta — average memory access time during a particular machine execution interval.
Recall the access time for primary (p) and secondary (s) levels:
Ta = h • Tp + (1 - h) • Ts
With Tp = TC for the cache and Ts = TM for MM:
Ta = h • TC + (1 - h) • TM
This equation is valid for a read-through cache, where the value is forwarded from main memory directly to the CPU as it is received, without waiting for the entire cache line to be loaded before initiating another cache read to forward the value.
If this is not the case, TM is replaced by the cache line fill time.
Slide 41
Performance and Cache Design
Speedup S:
S = Twithout / Twith
where
Twithout is the time taken without the improvement, e.g., without the cache, and
Twith is the time the process takes with the improvement, e.g., with the cache.
Having a model for cache and MM access times and cache line fill time, the speedup can be calculated once the hit ratio is known.
Slide 42
Performance and Cache Design
Example:
Changing a certain cache from direct-mapped to two-way set-associative caused the hit ratio measured for a certain set of benchmarks to improve from 0.91 to 0.96, with TC = 20 ns and TM = 70 ns. What is the speedup, assuming no change in TC from the set-associative multiplexer?
Answer:
Twithout = 0.91 × 20 + (1 - 0.91) × 70 = 24.5 ns
Twith = 0.96 × 20 + (1 - 0.96) × 70 = 22.0 ns
Thus the speedup is S = 24.5/22.0 = 1.11, an 11% improvement in execution time.
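The same example checked in a few lines of Python:

    # Verify the speedup example: Ta = h*Tc + (1 - h)*Tm, S = Ta_before / Ta_after
    Tc, Tm = 20.0, 70.0                            # ns

    def ta(h):
        return h * Tc + (1 - h) * Tm

    t_before, t_after = ta(0.91), ta(0.96)
    print(t_before, t_after)                       # 24.5 22.0
    print(f"speedup = {t_before / t_after:.2f}")   # speedup = 1.11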
Slide 43
Virtual Memory
The memory management unit, MMU, is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory.
[Figure: on the CPU chip, the CPU issues a logical address to the MMU, which uses its mapping tables to form the virtual address and produce the physical address presented to the cache and main memory; the disk sits behind main memory.]
A word about addresses:
• Effective address — an address computed by the processor while executing a program. Synonymous with logical address.
• The term effective address is often used when referring to activity inside the CPU. Logical address is most often used when referring to addresses viewed from outside the CPU.
• Virtual address — the address generated from the logical address by the memory management unit, MMU, prior to making the address translation.
• Physical address — the address presented to the memory unit. (Note: every address reference must be translated.)
Slide 44
Virtual Addresses—Why?
The logical address provided by the CPU is translated to a virtual address by the MMU. Often the virtual address space is larger than the logical address space, allowing program units to be mapped to a much larger virtual address space.
Slide 45
Virtual Addressing—Advantages
Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system.
No address relocation is required at load time.
No need to fragment the program to accommodate memory limitations.
Cost-effective use of physical memory.
Less expensive secondary (disk) storage can replace primary storage. (The MMU will bring portions of the program into physical memory as required; not all portions of the program need to occupy physical memory at one time.)
Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges.
This allows access/security control at the most fundamental levels.
Can be used to prevent buggy programs and intruders from causing damage to other users or the system.
This is the origin of those “bus error” and “segmentation fault” messages.
Slide 46
Memory Management
[Figure: repeated from Slide 18 — primary address formation by (a) paging, where a lookup table maps the block field and the word field passes through as the offset, and (b) segmentation, where the table supplies a base address that is added to the offset.]
Slide 47
Memory Management by Segmentation
Notice that each segment’s virtual address starts at 0, different from its physical address.
Repeated movement of segments into and out of physical memory will result in gaps between segments. This is called external fragmentation.
Compaction (defragmentation) routines must occasionally be run to remove these fragments.
[Figure: main memory holding segments 1, 5, 6, 9, and 3 at various physical addresses, separated by gaps; each segment’s virtual addresses run from 0 up to its length (e.g., 0 to FFF), while its physical addresses depend on where it currently resides.]
Slide 48
Segmentation Mechanism
The computation of the physical address from the virtual address requires an integer addition for each memory reference, and a comparison if segment limits are checked.
Q: How does the MMU switch references from one segment to another?
[Figure: the virtual memory address from the CPU is an offset within the current segment. The offset is added to the segment base register to form the physical address in main memory; a comparison against the segment limit register raises a bounds error if the offset is out of range.]
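A minimal sketch of the base/limit translation; the segment base and limit values are assumed for illustration, and a real MMU does this in hardware on every reference:

    # Segmentation: physical address = segment base + offset, with a limit check.

    def translate(offset: int, base: int, limit: int) -> int:
        if offset >= limit:              # comparison against the limit register
            raise MemoryError("bounds error: offset outside segment")
        return base + offset             # one integer addition per reference

    SEG_BASE, SEG_LIMIT = 0x4000, 0x1000     # assumed: 4-KB segment at 0x4000
    print(hex(translate(0x0123, SEG_BASE, SEG_LIMIT)))   # -> 0x4123
    try:
        translate(0x2000, SEG_BASE, SEG_LIMIT)
    except MemoryError as e:
        print(e)                         # bounds error: offset outside segment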
Slide 49
Memory Management by Paging
This figure shows the mapping between virtual memory pages, physical memory pages, and pages in secondary memory. Page n - 1 is not present in physical memory, but only in secondary memory.
The MMU manages this mapping (logical to physical address).
[Figure: a program unit’s virtual pages 0 through n - 1 map to scattered page frames in physical memory, except page n - 1, which resides only in secondary memory.]
Demand Paging: pages are brought into physical memory only as they are needed.
Slide 50
Virtual Address Translation in a Paged MMU
A page fault will result in 100,000 or more cycles passing before the page has been brought from secondary storage to MM.
[Figure: the virtual address from the CPU is split into a page number and an offset in the page. The page number, added to the page table base register (with a bounds check against the page table limit register), selects a page table entry. The entry holds access-control bits (presence bit, dirty bit, usage bits) and either the physical page number or a pointer to secondary storage. Hit: the page is in primary memory, and the physical page number concatenated with the word offset forms the physical address of the desired word in main memory. Miss (page fault): the page is in secondary memory, and the entry is translated to a disk address.]
• 1 page table per user per program unit
• One translation per memory access
• Potentially large page table
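A behavioral sketch of this page-table walk, assuming 4-KB pages; the page-table contents are illustrative:

    # Paged translation: virtual address -> (page number, offset) -> page table
    # entry -> physical address, or a page fault if the presence bit is clear.

    PAGE_SIZE = 4096                     # assumed 4-KB pages

    # Each entry: (presence bit, physical page number or disk pointer)
    page_table = [(True, 7), (False, "disk block 42"), (True, 3)]

    def translate(vaddr: int) -> int:
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page >= len(page_table):
            raise MemoryError("bounds error")
        present, target = page_table[page]
        if not present:                              # page fault: OS must act
            raise MemoryError(f"page fault: fetch {target}")
        return target * PAGE_SIZE + offset           # physical page + offset

    print(hex(translate(0x0010)))                # page 0 -> frame 7 -> 0x7010
    print(hex(translate(2 * PAGE_SIZE + 5)))     # page 2 -> frame 3 -> 0x3005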
Slide 51
Page Placement and Replacement
Page tables are direct mapped, since the physical page is computed directly from the virtual page number.
But physical pages can reside anywhere in physical memory.
Page tables such as those on the previous slide result in large page tables, since there must be a page table entry for every page in the program unit.
Some implementations resort to hash tables instead, which need entries only for those pages actually present in physical memory.
Replacement strategies are generally LRU, or at least employ a “use bit” to guide replacement.
Slide 52
Fast Address Translation: Regaining Lost Ground
The concept of virtual memory is very attractive, but leads to considerable overhead:
There must be a translation for every memory reference.
There must be two memory references for every program reference:
One to retrieve the page table entry,
One to retrieve the value.
Most caches are addressed by physical address, so there must be a virtual-to-physical translation before the cache can be accessed.
Slide 53
Fast Address Translation: Regaining Lost Ground
The answer: a small cache in the processor that retains the last few virtual-to-physical translations: a Translation Lookaside Buffer, TLB.
The TLB contains not only the virtual-to-physical translations, but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly.
The TLB is usually implemented as a fully associative cache:
Slide 54
Translation Lookaside Buffer Structure and Operation
[Figure: the virtual page number of the incoming virtual address is looked up associatively in the TLB. Each TLB entry holds a virtual page number, access-control bits (presence bit, dirty bit, valid bit, usage bits), and a physical page number. TLB hit: the page is in primary memory, and the physical page number concatenated with the word offset forms the physical address sent to main memory or cache. TLB miss: look for the physical page in the page table.]
Slide 55
Operation of the Memory Hierarchy
[Figure: flow of one memory reference through CPU, cache, main memory, and secondary memory. The virtual address first searches the TLB. On a TLB hit, generate the physical address and search the cache; on a cache hit, return the value from the cache; on a cache miss, update the cache from MM first. On a TLB miss, search the page table; on a page table hit, generate the physical address and update the TLB; on a page table miss (page fault), get the page from secondary memory and update MM, the cache, and the page table.]
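The same flow written as straight-line Python; the helper names and toy frame assignment are illustrative stubs for the hardware and OS actions in the figure:

    # One memory reference through TLB, page table, cache, and MM, following
    # the flowchart. All structures are dicts; names are illustrative stubs.

    PAGE_SIZE = 4096

    def fetch_from_disk(page: int) -> int:
        """Stub for the page-fault path: assign a frame and fill it from disk."""
        return page % 8                          # toy frame assignment

    def access(vaddr, tlb, page_table, cache, mm):
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in tlb:                      # TLB miss
            if page not in page_table:           # page fault
                page_table[page] = fetch_from_disk(page)   # update MM/page table
            tlb[page] = page_table[page]         # update TLB
        paddr = tlb[page] * PAGE_SIZE + offset   # generate physical address
        if paddr not in cache:                   # cache miss
            cache[paddr] = mm.get(paddr, 0)      # update cache from MM
        return cache[paddr]                      # return value from cache

    print(access(0x2345, tlb={}, page_table={}, cache={}, mm={0x2345: 99}))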
Slide 56
I/O Connection to a Memory with a Cache
The memory system is quite complex, and affords many possible tradeoffs.
The only realistic way to choose among these alternatives is to study a typical workload, using either simulations or prototype systems.
Instruction and data accesses usually have different patterns.
It is possible to employ a cache at the disk level, using the disk hardware.
Traffic between MM and disk is I/O, and direct memory access, DMA, can be used to speed the transfers:
[Figure: the CPU connects through the cache to main memory; paging traffic and other I/O move between main memory and the disk by DMA.]
Slide 57
Summary
Most memory systems are multileveled—cache, main memory, and disk.
Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
Chips are organized into boards and modules.
Larger, slower memory is attached to faster memory in a hierarchical structure.
The cache-to-main-memory interface requires hardware address translation.
Virtual memory—the main memory–disk interface—can employ software for address translation because of the slower speeds involved.
The hierarchy must be carefully designed to ensure optimum price-performance.