Slide 2
Chapter Outline
7.1 Introduction: The Components of the Memory System
7.2 RAM Structure: The Logic Designer’s Perspective
7.3 Memory Boards and Modules
7.4 Two-Level Memory Hierarchy
7.5 The Cache
7.6 Virtual Memory
7.7 The Memory Subsystem in the Computer
Slide 3
Introduction
So far, we’ve treated memory as an array of words limited in size only by the number of address bits.
Real-world issues arise:
cost
speed
size
power consumption
volatility
...
Slide 4
Topics Covered
Memory components:
RAM memory cells and cell arrays
Static RAM—more expensive, but simpler to use (no refresh needed)
Tree and matrix decoders—needed for large RAM chips
Dynamic RAM—less expensive, but needs “refreshing”
Chip organization
Timing
ROM—Read-only memory
Memory boards
Arrays of chips give more addresses and/or wider words
2-D and 3-D chip arrays
Memory modules
Large systems can benefit by partitioning memory for separate access by system components
fast access to multiple words
Slide 5
Topics Covered
• The memory hierarchy: from fast and expensive to slow and cheap
Example: Registers → Cache → Main Memory → Disk
At first, consider just two adjacent levels in the hierarchy
The cache: High speed and expensive
Kinds: Direct mapped, associative, set associative
Virtual memory—makes the hierarchy transparent
Translate the address from CPU’s logical address to the physical address where the information is actually stored
Memory management—how to move information back and forth
Multiprogramming—what to do while we wait
The Translation Look-aside Buffer (TLB) helps speed up the address translation process
• Overall consideration of the memory as a subsystem
Slide 6
The CPU–Memory Interface
Sequence of events:
Read:
1. CPU loads MAR, issues Read and REQUEST
2. Main memory transmits words to MDR
3. Main memory asserts COMPLETE
Write:
1. CPU loads MAR and MDR, asserts Write, and REQUEST
2. Value in MDR is written into address in MAR
3. Main memory asserts COMPLETE
[Figure: the CPU–main memory interface. The CPU’s MAR drives the m-bit address bus (A0–Am–1) and its MDR connects to the b-bit data bus (D0–Db–1); the control lines R/W, REQUEST, and COMPLETE coordinate transfers with a memory of 2^m addressable units of s bits each.]
Slide 7
The CPU–Memory Interface (cont’d.)
Additional points:
If b < w, main memory must make (w/b) b-bit transfers
Some CPUs allow reading and writing of word sizes < w
Example: Intel 8088: m = 20, w = 16, s = b = 8
8- and 16-bit values can be read and written
If memory is sufficiently fast, or if its response is predictable, then COMPLETE may be omitted
Some systems use separate R and W lines, and omit REQUEST
[Figure: the same CPU–main memory interface as on the previous slide.]
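To make the (w/b)-transfers rule concrete, here is a minimal Python sketch (illustrative, not from the slides) that assembles a 16-bit word from two 8-bit bus transfers, as on the 8088:

    # Reading a w-bit word over a b-bit data bus takes w/b bus transfers.
    # Little-endian byte order is assumed here, as on the Intel 8088.

    def read_word(memory: bytes, addr: int, w: int = 16, b: int = 8) -> int:
        """Assemble one w-bit word from w // b byte-wide (b = 8) transfers."""
        assert w % b == 0, "word size must be a multiple of the bus width"
        word = 0
        for i in range(w // b):              # one loop pass per bus transfer
            chunk = memory[addr + i]         # one b-bit transfer from memory
            word |= chunk << (b * i)         # little-endian assembly
        return word

    mem = bytes([0xCD, 0xAB, 0x00, 0x00])    # 0xABCD stored little-endian at 0
    print(hex(read_word(mem, 0)))            # -> 0xabcd, via two transfers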
Slide 8
Some Memory Properties

Symbol  | Definition                          | Intel 8088    | Intel 8086    | PowerPC 601
w       | CPU word size                       | 16 bits       | 16 bits       | 64 bits
m       | Bits in a logical memory address    | 20 bits       | 20 bits       | 32 bits
s       | Bits in smallest addressable unit   | 8 bits        | 8 bits        | 8 bits
b       | Data bus size                       | 8 bits        | 16 bits       | 64 bits
2^m     | Memory word capacity, s-sized words | 2^20 words    | 2^20 words    | 2^32 words
2^m × s | Memory bit capacity                 | 2^20 × 8 bits | 2^20 × 8 bits | 2^32 × 8 bits
Slide 9
Big-Endian and Little-Endian Storage
When data types having a word size larger than the smallest addressable unit are stored in memory, the question arises:
“Is the least significant part of the word stored at the lowest address (little-Endian, little end first), or is the most significant part of the word stored at the lowest address (big-Endian, big end first)?”
Example: The hexadecimal 16-bit number ABCDH, stored at address 0:
[Figure: the word ABCDH (most significant byte AB, least significant byte CD) stored both ways:
Little-Endian: address 0 holds CD, address 1 holds AB.
Big-Endian: address 0 holds AB, address 1 holds CD.]
Slide 10
Big-Endian and Little-Endian Storage of a PowerPC 32-Bit Word

CPU word bit locations:       b31...b24 | b23...b16 | b15...b8 | b7...b0
Big-Endian byte addresses:        0     |     1     |     2    |    3
Little-Endian byte addresses:     3     |     2     |     1    |    0

Little-Endian Storage           Big-Endian Storage
Memory Address | Contents       Memory Address | Contents
0              | b7 ... b0      0              | b31 ... b24
1              | b15 ... b8     1              | b23 ... b16
2              | b23 ... b16    2              | b15 ... b8
3              | b31 ... b24    3              | b7 ... b0
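For concreteness, a short Python sketch (illustrative only, using the standard library’s int.to_bytes) that reproduces both layouts for a 32-bit word:

    # Show how one 32-bit word lands in memory under each byte order.
    word = 0xB3B2B1B0     # stand-in value: the byte holding bits 8N+7..8N is 0xBN

    for order in ("little", "big"):
        stored = word.to_bytes(4, byteorder=order)
        print(f"{order:>6}-endian:",
              ", ".join(f"addr {a} -> {b:#04x}" for a, b in enumerate(stored)))

    # little-endian: addr 0 -> 0xb0, addr 1 -> 0xb1, addr 2 -> 0xb2, addr 3 -> 0xb3
    #    big-endian: addr 0 -> 0xb3, addr 1 -> 0xb2, addr 2 -> 0xb1, addr 3 -> 0xb0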
Slide 11
Memory Performance Parameters

Symbol          | Definition        | Units     | Meaning
ta              | Access time       | Time      | Time to access a memory word, up to the assertion of the memory completion signal
tc              | Cycle time        | Time      | Time from start of read to start of next memory operation
k               | Block size        | Words     | Number of words per block
BW              | Bandwidth         | Words/sec | Word transmission rate
tl              | Latency           | Time      | Time to access the first of a sequence of words
tbl = tl + k/BW | Block access time | Time      | Time to access an entire block from the start of the read
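A quick numeric check of the block-access-time formula; the latency and bandwidth values below are assumed for illustration:

    # Block access time: tbl = tl + k / BW
    tl = 20e-9      # latency: 20 ns to reach the first word (assumed value)
    k = 16          # block size: 16 words
    BW = 8e6        # bandwidth: 8 million words/sec (assumed value)

    tbl = tl + k / BW
    print(f"tbl = {tbl * 1e6:.2f} microseconds")   # 0.02 us + 2 us = 2.02 us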
Slide 12
The Memory Hierarchy, Cost, and Performance
Some Typical Values:

Component       | CPU               | Cache         | Main Memory   | Disk Memory   | Tape Memory
Access type     | Random access     | Random access | Random access | Direct access | Sequential access
Capacity, bytes | 64–1024           | 8–512 KB      | 8–64 MB       | 1–10 GB       | 1 TB
Latency         | 1–10 ns           | 20 ns         | 50 ns         | 10 ms         | 10 ms–10 s
Block size      | 1 word            | 16 words      | 16 words      | 4 KB          | 4 KB
Bandwidth       | System clock rate | 8 MB/s        | 1 MB/s        | 1 MB/s        | 1 MB/s
Cost/MB         | High              | $500          | $30           | $0.25         | $0.02
Slide 13
Memory Hierarchy
[Figure: the memory hierarchy CPU – Cache – Main Memory – HD. The cache is the primary level, main memory the secondary level, and disk (virtual memory) the ternary level. Words move between the CPU and the cache, cache lines between the cache and main memory, and blocks between main memory and disk. Each level is characterized by its block size, latency, bandwidth, and hit rate. Moving down the hierarchy, the frequency of transactions decreases: fast & small at the top, slow & large at the bottom, with the principle of locality making the arrangement work.]
Slide 14
Memory Hierarchy
Considering Any Two Adjacent Levels of the Memory Hierarchy
Some definitions:
Temporal locality: the property of most programs that if a given memory location is referenced, it is likely to be referenced again, “soon.”
Spatial locality: if a given memory location is referenced, those locations near it are likely to be referenced “soon.”
Working set: the set of memory locations referenced over a fixed period of time, or within a time window.
Notice that temporal and spatial locality both work to assure that the contents of the working set change only slowly over execution time.
[Figure: defining the primary and secondary levels. The CPU connects to the faster, smaller primary level, which in turn connects to the slower, larger secondary level; any two adjacent levels of the hierarchy can be viewed this way.]
Slide 15
Primary and Secondary Levels of the Memory Hierarchy
• The item of commerce between any two levels is the block.
• Blocks may/will differ in size at different levels in the hierarchy.
Example: Cache block size ~ 16–64 bytes. Disk block size ~ 1–4 KB.
• As the working set changes, blocks are moved back and forth through the hierarchy to satisfy memory access requests.
• The primary-level block size is often determined more by the properties of information transfer than by the spatial locality principle.
Speed between levels is defined by
latency: the time to access the first word, and
bandwidth: the number of words per second transmitted between levels.
Typical latencies:
Cache latency: a few clocks
Disk latency: 100,000 clocks
Slide 16
Primary and Secondary Addresses
• A complication: addresses will differ depending on the level.
• Primary address: the address of a value in the primary level.
• Secondary address: the address of a value in the secondary level.
Primary and Secondary Address Examples
Main memory address: unsigned integer
Disk address: track number, sector number, offset of word in sector.
Slide 17
Addressing and Accessing a Two-Level Hierarchy
The computer system, HW or SW, must perform any address translation that is required.
Two ways of forming the address: Segmentation and Paging. Paging is more common. Sometimes the two are used together, one “on top of” the other. More about address translation and paging later...
[Figure: the memory management unit (MMU) applies a translation function (mapping tables, permissions, etc.) to the system address. On a hit, it produces the block and word address in primary memory; on a miss, it produces the address in secondary memory from which the block must be fetched.]
Slide 18
Primary Address Formation
[Figure: two ways of forming the primary address from a system address split into block and word fields.
(a) Paging: the block field indexes a lookup table that yields the primary block number; the word field passes through unchanged as the offset into the block.
(b) Segmentation: the lookup table yields a base address, which is added to the word offset to form the primary address.]
Slide 19
Hits and Misses; Paging; Block Placement
Hit: the word was found at the level from which it was requested.
Miss: the word was not found at the level from which it was requested.
A miss will result in a request for the block containing the word from the next higher level in the hierarchy.
Hit ratio (or hit rate): h = (number of hits) / (total number of references)
Miss ratio: 1 - h
tp = primary memory access time
ts = secondary memory access time
Access time: ta = h • tp + (1 - h) • ts
Page: commonly, a disk block.
Page fault: synonymous with a miss.
Demand paging: pages are moved from disk to main memory only when a word in the page is requested by the processor.
Block placement and replacement decisions must be made each time a block is moved.
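A quick numeric illustration of the access-time formula; the 20 ns / 70 ns / 95% figures below are assumed values, not from the slide:

    # Effective access time across two adjacent levels: ta = h*tp + (1 - h)*ts
    def effective_access_time(h: float, tp: float, ts: float) -> float:
        """h: hit ratio; tp, ts: primary and secondary access times."""
        return h * tp + (1 - h) * ts

    # Assumed values: 20 ns cache, 70 ns main memory, 95% hit ratio.
    print(effective_access_time(0.95, 20e-9, 70e-9))   # ~2.25e-08 s = 22.5 ns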
Slide 20
Hits and Misses; Paging; Block Placement
The Cache Regime:
Secondary latency is only about 10 times as long as the primary-level access time.
Making the CPU wait is affordable as long as the miss ratio is small.
A small fast memory (sometimes part of the CPU), called a cache, is connected to a slower memory, typically of a different semiconductor technology.
The cache’s access time matches the processor speed.
Two-level caches:
Primary cache (faster level)
Secondary cache (slower level)
Since the CPU waits on a cache miss, software cannot be used in the miss response; everything has to be done in hardware.
This means that placement and replacement decisions and secondary address formation must be kept simple so they can be done quickly with limited hardware.
There is less flexibility of placement in a cache than in the primary level of a virtual memory system.
Slide 21
Hits and Misses; Paging; Block Placement
The Cache Miss versus the Main Memory Miss
The cache hit time is typically 1–2 clock cycles.
The cache miss penalty is only a few tens of clock cycles to bring the desired block into the cache from main memory.
The main memory miss penalty typically runs to 10,000–100,000 clock cycles to bring the block from the disk.
Cache miss rates of no greater than 2% are tolerable.
Main memory miss rates must be no greater than 0.001% to avoid significant degradation in system performance.
Slide 22
Virtual Memory
A virtual memory is a memory hierarchy, usually consisting of at least main memory and disk, in which the processor issues all memory references as effective addresses in a flat address space. All translations to primary and secondary addresses are handled transparently to the process making the address reference, thus providing the illusion of a flat address space.
Recall that disk accesses may require 100,000 clock cycles to complete, due to the slow access time of the disk subsystem. Once the processor has, through mediation of the operating system, made the proper request to the disk subsystem, it is available for other tasks.
Multiprogramming shares the processor among independent programs that are resident in main memory and thus available for execution.
Slide 23
Decisions in Designing a 2-Level Hierarchy
Translation procedure to translate from system address to primary address.
Block size—block transfer efficiency and miss ratio will be affected.
Processor dispatch on miss—processor wait or processor multiprogrammed.
Primary-level placement—direct, associative, or a combination. Discussed later.
Replacement policy—which block is to be replaced upon a miss.
Direct access to secondary level—in the cache regime, can the processor directly access main memory upon a cache miss?
Write through—can the processor write directly to main memory upon a cache miss?
Read through—can the processor read directly from main memory upon a cache miss as the cache is being updated?
Read or write bypass—can certain infrequent read or write misses be satisfied by a direct access of main memory without any block movement?
Slide 24
The Cache Mapping Function
[Figure: the CPU issues an address; the mapping function determines where the addressed word’s block resides in the cache, with blocks moving between cache and main memory as needed. Example sizes: 256-KB cache, 16-word blocks, 32-MB main memory.]
The cache mapping function is responsible for all cache operations:
• Placement strategy: where to place an incoming block in the cache
• Replacement strategy: which block to replace upon a miss
• Read and write policy: how to handle reads and writes upon cache misses
The mapping function must be implemented in hardware.
Three different types of mapping functions:
• Associative
• Direct mapped
• Block-set associative
Slide 25
Memory Fields and Address Translation
Example of a processor-issued 32-bit virtual address: bits 31 down to 0 (32 bits).
That same 32-bit address partitioned into two fields, a block field and a word field. The word field represents the offset into the block specified in the block field:

Block number (26 bits) | Word (6 bits)
2^26 blocks of 64 words each

Example of a specific memory reference: block 9, word 11:
00...001001 | 001011
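A minimal sketch of this field split in Python, mirroring the slide’s 26/6 partition:

    # Split a 32-bit address into a 26-bit block number and a 6-bit word offset.
    WORD_BITS = 6                                  # 2**6 = 64 words per block

    def split_address(addr: int) -> tuple[int, int]:
        block = addr >> WORD_BITS                  # upper 26 bits
        word = addr & ((1 << WORD_BITS) - 1)       # lower 6 bits
        return block, word

    addr = (9 << WORD_BITS) | 11                   # encode block 9, word 11
    print(split_address(addr))                     # -> (9, 11)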
Slide 26
Associative Cache
Associative mapped cache model: any block from main memory can be put anywhere in the cache. Assume a 16-bit main memory.*
*16 bits, while unrealistically small, simplifies the examples
[Figure: a 256-line cache (cache blocks 0–255, one cache line = 8 bytes) with a tag memory holding a 13-bit tag and a valid bit for each line, searched associatively. Main memory holds 8192 blocks; in the example, MM blocks 119 and 421 are resident in cache lines whose valid bits are 1. The 16-bit main memory address splits into a 13-bit tag field and a 3-bit byte field.]
Slide 27
Associative Cache
In general:
B = number of bytes per block
T = number of bits in a main memory block number, also the number of tag bits in the tag memory
L = number of blocks in the cache
SMM = B × 2^T — size of main memory in bytes
SCM = L × B — size of cache memory in bytes
Main memory address length in bits: log2(SMM) = T + log2(B)
Slide 28
Associative Cache Mechanism
Because any block can reside anywhere in the cache, an associative (content addressable) memory is used. All locations are searched simultaneously.
[Figure: the 13-bit tag field of the main memory address is placed in an argument register and compared against every entry of the associative tag memory at once. A match whose valid bit is set raises the match line for that line; the selector then gates out the matching 8-byte (64-bit) cache line, and the 3-bit byte field selects the desired byte (8 bits to the CPU).]
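A behavioral Python sketch of the associative lookup; a dict stands in for the parallel hardware search, and the resident block number reused from the figure is illustrative:

    # Associative cache: any MM block may occupy any line, so a lookup is a
    # parallel search of all tags. A dict keyed by tag models that
    # content-addressable search here.

    class AssociativeCache:
        def __init__(self, block_bytes: int = 8):
            self.block_bytes = block_bytes
            self.lines = {}                    # tag -> line data (valid entries only)

        def read(self, address: int):
            tag, byte = divmod(address, self.block_bytes)
            if tag in self.lines:              # "all locations searched at once"
                return self.lines[tag][byte]   # hit: select the byte in the line
            return None                        # miss: block must be fetched

    cache = AssociativeCache()
    cache.lines[421] = bytes(range(8))         # pretend MM block 421 is resident
    print(cache.read(421 * 8 + 5))             # hit  -> 5
    print(cache.read(119 * 8))                 # miss -> None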
Slide 29
Advantages and Disadvantages of the Associative Mapped Cache
Advantage
Most flexible of all—any MM block can go anywhere in the cache.
Disadvantages
Large tag memory.
The need to search the entire tag memory simultaneously means lots of hardware.
Replacement policy is an issue when the cache is full.
Q.: How is an associative search conducted at the logic gate level?
Direct-mapped caches simplify the hardware by allowing each MM block to go into only one place in the cache:
Slide 30
Direct-Mapped Cache
Key idea: all the MM blocks from a given group can go into only one location in the cache, corresponding to the group number.
Now the cache needs only examine the single group that its reference specifies.
[Figure: a 256-line cache with a tag memory holding a 5-bit tag and a valid bit per line. Main memory’s 8192 blocks are arranged as 256 groups (rows 0–255) by 32 tags (columns 0–31), so MM block number = tag × 256 + group; e.g., group 1 can hold exactly one of blocks 1, 257, 513, ..., 7937. The main memory address splits into a 5-bit tag, an 8-bit group, and a 3-bit byte field; the group and byte fields form the cache address.]
Slide 31
Direct-Mapped Cache
In general:
B = number of bytes per block
T = number of tag bits in the tag memory; main memory has 2^T columns of blocks
G = number of groups, equal to the number of blocks in the cache
L = number of blocks in the cache (so G = L)
SMM = G × B × 2^T — size of main memory in bytes
SCM = L × B — size of cache memory in bytes
Main memory address length in bits: log2(SMM) = T + log2(G) + log2(B)
Slide 32
Direct-Mapped Cache Operation
1. Decode the group number of the incoming MM address to select the group.
2. Gate out the tag field of the selected tag-memory entry.
3. Compare the stored cache tag with the incoming tag.
4. If Match AND Valid,
5. then it is a hit: gate out the cache line,
6. and use the word field to select the desired word.
[Figure: the 8-bit group field drives an 8-to-256 decoder that selects one tag-memory entry; a 5-bit comparator checks the stored tag against the incoming tag and, qualified by the valid bit, signals cache hit or cache miss. On a hit, the selector picks the desired byte out of the 8-byte (64-bit) cache line using the 3-bit byte field.]
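A behavioral sketch of the direct-mapped lookup, using the slide’s sizes (256 groups, 8-byte lines, 5-bit tags); the resident block chosen below is illustrative:

    # Direct-mapped cache: the group field picks exactly one line; a hit needs
    # the valid bit set and a matching tag.

    GROUPS, BLOCK_BYTES = 256, 8

    tags = [0] * GROUPS                        # tag memory
    valid = [False] * GROUPS                   # valid bits
    data = [bytes(BLOCK_BYTES)] * GROUPS       # cache lines

    def read(address: int):
        byte = address % BLOCK_BYTES
        group = (address // BLOCK_BYTES) % GROUPS
        tag = address // (BLOCK_BYTES * GROUPS)
        if valid[group] and tags[group] == tag:    # Match AND Valid -> hit
            return data[group][byte]
        return None                                # miss

    # Make MM block 2305 (tag 9, group 1) resident, then probe it.
    tags[1], valid[1], data[1] = 9, True, bytes(range(8))
    print(read(2305 * BLOCK_BYTES + 3))   # hit -> 3
    print(read(257 * BLOCK_BYTES))        # same group, tag 1 -> miss (None)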
Slide 33
Direct-Mapped Caches
The direct-mapped cache uses less hardware, but is much more restrictive in block placement.
If two blocks from the same group are frequently referenced, then the cache will “thrash,” that is, repeatedly bring the two competing blocks into and out of the cache. This will cause a performance degradation.
The block replacement strategy is trivial.
Compromise—allow several cache blocks in each group: the Block-Set-Associative Cache.
Slide 34
2-Way Set-Associative Cache
Example shows 256 groups, a set of two blocks per group. Sometimes referred to as a 2-way set-associative cache.
[Figure: like the direct-mapped cache, but each of the 256 sets holds two cache lines (8 bytes each), each with its own 5-bit tag and valid bit, so two blocks from the same group can be resident at once; e.g., set 0 holds MM blocks 7680 (tag 30) and 2304 (tag 9). The main memory address splits into a 5-bit tag, an 8-bit set, and a 3-bit byte field; MM block number = tag × 256 + set.]
Slide 35
K-Way Set-Associative Cache
[Figure: general structure of a K-way set-associative cache with L cache lines of B bytes, organized as S = L/K sets of K lines each; the tag memory holds one T-bit tag plus a valid bit per line. Main memory holds 2^T × S blocks, and the main memory address splits into a T-bit tag, log2(S) set bits, and log2(B) byte bits; the set and byte fields form the primary (cache) address.]
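A behavioral sketch of the K-way lookup (2-way here); LRU bookkeeping is omitted and sketched separately with the replacement policies later:

    # K-way set-associative cache: the set field picks one set; only that set's
    # K tags are compared. Direct-mapped is K = 1; fully associative is K = L.

    class SetAssociativeCache:
        def __init__(self, num_sets: int = 256, k: int = 2, block_bytes: int = 8):
            self.num_sets, self.k, self.block_bytes = num_sets, k, block_bytes
            self.sets = [[] for _ in range(num_sets)]   # each set: up to k (tag, data)

        def read(self, address: int):
            byte = address % self.block_bytes
            set_idx = (address // self.block_bytes) % self.num_sets
            tag = address // (self.block_bytes * self.num_sets)
            for line_tag, line_data in self.sets[set_idx]:  # search K lines only
                if line_tag == tag:
                    return line_data[byte]                  # hit
            return None                                     # miss

    cache = SetAssociativeCache()
    cache.sets[0] = [(30, bytes(range(8))), (9, bytes(8))]  # blocks 7680 and 2304
    print(cache.read(7680 * 8 + 4))   # hit -> 4
    print(cache.read(2304 * 8 + 1))   # hit -> 0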
Slide 37
Read/Write Policies
Cache write policies upon a cache hit:
Write-through: updates both the cache and main memory upon each write to a block that is present in the cache.
Write-back: postpones the main memory update until the block is replaced; uses a “dirty bit,” which is set upon the first write to a block in the cache.
Read and write miss policies:
Miss upon a read:
Read-through: the value is immediately forwarded from MM to the CPU, and then the cache is updated.
Otherwise: the block is loaded into the cache first.
Miss upon a write:
Write-allocate: the block is first brought into the cache and then updated.
Write no-allocate: write directly to main memory (without bringing the block into the cache).
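A compact behavioral sketch contrasting the two hit-time write policies; this models the bookkeeping only, not any particular hardware:

    # Write policies on a cache hit. Write-through sends every write to main
    # memory at once; write-back only sets the dirty bit and updates MM when
    # the line is eventually replaced.

    class Line:
        def __init__(self, data: bytes):
            self.data, self.dirty = bytearray(data), False

    def write_hit(line, byte, value, mm, block_addr, policy="write-back"):
        line.data[byte] = value
        if policy == "write-through":
            mm[block_addr + byte] = value      # MM updated on every write
        else:
            line.dirty = True                  # MM updated only at replacement

    def evict(line, mm, block_addr):
        if line.dirty:                         # write-back flush of a dirty line
            mm[block_addr:block_addr + len(line.data)] = line.data

    mm = bytearray(64)
    line = Line(mm[8:16])                      # cache the 8-byte block at MM addr 8
    write_hit(line, 0, 0xAA, mm, 8)            # write-back: MM not yet updated
    print(mm[8], line.dirty)                   # -> 0 True
    evict(line, mm, 8)
    print(mm[8])                               # -> 170 (0xAA reaches MM at eviction)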
Slide 38
Block Replacement Policy
Not needed with a direct-mapped cache.
Least Recently Used (LRU):
Track usage with a counter per line. Each time a block is accessed:
Clear the counter of the accessed block.
Increment the counters whose values are less than the one just cleared.
All others remain unchanged.
When the set is full, replace the line with the highest count.
Random replacement—replace a block at random.
Even random replacement is a fairly effective strategy.
FIFO—the longest-resident block is replaced first.
Smaller cache memories: random is worst, LRU is best.
Large cache memories: random and LRU are comparable.
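A sketch of the counter-based LRU scheme just described, for one set of four lines; the counters play the role of hardware age bits:

    # Counter-based LRU for one cache set: counter value = age since last access.
    # On an access, the touched line's counter clears to 0 and every counter
    # that was smaller is incremented; the victim is the line with the highest count.

    K = 4
    counters = list(range(K))        # line 0 most recent ... line 3 least recent

    def touch(line: int):
        old = counters[line]
        for i in range(K):
            if counters[i] < old:    # only younger lines age by one
                counters[i] += 1
        counters[line] = 0

    def victim() -> int:
        return counters.index(max(counters))   # least recently used line

    touch(2)                          # access line 2
    print(counters)                   # -> [1, 2, 0, 3]
    print(victim())                   # -> 3 (line 3 is least recently used)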
Slide 39
Performance and Cache Design
Direct-mapped and block-set-associative caches may suffer from minor time penalties due to the need to decode the group (set) field.
For an equivalent amount of hardware (e.g., number of gates), direct-mapped and set-associative caches will have more capacity than an associative cache.
Quantitative analysis of cache performance is quite complex and is the subject of ongoing research. A simplified mathematical analysis, however, will provide sufficient insight into the issues involved:
Slide 40
Performance and Cache Design
Definitions:
h — hit ratio; (1 - h) — miss ratio.
TC — cache access time upon a cache hit.
TM — access time upon a cache miss.
Ta — average memory access time during a particular machine execution interval.
Recall the access time for primary (p) and secondary (s) levels:
Ta = h • Tp + (1 - h) • Ts
With Tp = TC for the cache and Ts = TM for MM:
Ta = h • TC + (1 - h) • TM
This equation is valid for a read-through cache, where the value is forwarded from main memory directly to the CPU as it is received, without waiting for the entire cache line to be loaded before initiating another cache read to forward the value.
If this is not the case, TM is replaced by the cache line fill time.
Slide 41
Performance and Cache Design
Speedup S:
S = Twithout / Twith
where
Twithout is the time taken without the improvement, e.g., without the cache, and
Twith is the time the process takes with the improvement, e.g., with the cache.
Having a model for cache and MM access times and cache line fill time, the speedup can be calculated once the hit ratio is known.
Slide 42
Performance and Cache Design
Example:
Changing a certain cache from direct-mapped to two-way set-associative caused the hit ratio measured for a certain set of benchmarks to improve from 0.91 to 0.96, with TC = 20 ns and TM = 70 ns. What is the speedup, assuming no change in TC from the set-associative multiplexer?
Answer:
Twithout = 0.91 × 20 + (1 - 0.91) × 70 = 24.5 ns
Twith = 0.96 × 20 + (1 - 0.96) × 70 = 22.0 ns
Thus the speedup is S = 24.5/22.0 = 1.11, an 11% improvement in execution time.
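The same example checked in a few lines of Python:

    # Verify the speedup example: Ta = h*Tc + (1 - h)*Tm, S = Ta_before / Ta_after
    Tc, Tm = 20.0, 70.0                            # ns

    def ta(h):
        return h * Tc + (1 - h) * Tm

    t_before, t_after = ta(0.91), ta(0.96)
    print(t_before, t_after)                       # 24.5 22.0
    print(f"speedup = {t_before / t_after:.2f}")   # speedup = 1.11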
Slide 43
Virtual Memory
The memory management unit, MMU, is responsible for mapping logical addresses issued by the CPU to physical addresses that are presented to the cache and main memory.
[Figure: on the CPU chip, the CPU issues a logical address to the MMU, which uses its mapping tables to form the virtual address and produce the physical address presented to the cache and main memory; the disk sits behind main memory.]
A word about addresses:
• Effective address — an address computed by the processor while executing a program. Synonymous with logical address.
• The term effective address is often used when referring to activity inside the CPU. Logical address is most often used when referring to addresses viewed from outside the CPU.
• Virtual address — the address generated from the logical address by the memory management unit, MMU, prior to making the address translation.
• Physical address — the address presented to the memory unit. (Note: every address reference must be translated.)
Slide 44
Virtual Addresses—Why?
The logical address provided by the CPU is translated to a virtual address by the MMU. Often the virtual address space is larger than the logical address space, allowing program units to be mapped to a much larger virtual address space.
Slide 45
Virtual Addressing—Advantages
Simplified addressing. Each program unit can be compiled into its own memory space, beginning at address 0 and potentially extending far beyond the amount of physical memory present in the system.
No address relocation is required at load time.
No need to fragment the program to accommodate memory limitations.
Cost-effective use of physical memory.
Less expensive secondary (disk) storage can replace primary storage. (The MMU will bring portions of the program into physical memory as required; not all portions of the program need to occupy physical memory at one time.)
Access control. As each memory reference is translated, it can be simultaneously checked for read, write, and execute privileges.
This allows access/security control at the most fundamental levels.
Can be used to prevent buggy programs and intruders from causing damage to other users or the system.
This is the origin of those “bus error” and “segmentation fault” messages.
Slide 46
Memory Management
[Figure: repeated from Slide 18 — primary address formation by (a) paging, where a lookup table maps the block field and the word field passes through as the offset, and (b) segmentation, where the table supplies a base address that is added to the offset.]
Slide 47
Memory Management by Segmentation
Notice that each segment’s virtual address starts at 0, different from its physical address.
Repeated movement of segments into and out of physical memory will result in gaps between segments. This is called external fragmentation.
Compaction (defragmentation) routines must occasionally be run to remove these fragments.
[Figure: main memory holding segments 1, 5, 6, 9, and 3 at various physical addresses, separated by gaps; each segment’s virtual addresses run from 0 up to its length (e.g., 0 to FFF), while its physical addresses depend on where it currently resides.]
Slide 48
Segmentation Mechanism
The computation of the physical address from the virtual address requires an integer addition for each memory reference, and a comparison if segment limits are checked.
Q: How does the MMU switch references from one segment to another?
[Figure: the virtual memory address from the CPU is an offset within the current segment. The offset is added to the segment base register to form the physical address in main memory; a comparison against the segment limit register raises a bounds error if the offset is out of range.]
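A minimal sketch of the base/limit translation; the segment base and limit values are assumed for illustration, and a real MMU does this in hardware on every reference:

    # Segmentation: physical address = segment base + offset, with a limit check.

    def translate(offset: int, base: int, limit: int) -> int:
        if offset >= limit:              # comparison against the limit register
            raise MemoryError("bounds error: offset outside segment")
        return base + offset             # one integer addition per reference

    SEG_BASE, SEG_LIMIT = 0x4000, 0x1000     # assumed: 4-KB segment at 0x4000
    print(hex(translate(0x0123, SEG_BASE, SEG_LIMIT)))   # -> 0x4123
    try:
        translate(0x2000, SEG_BASE, SEG_LIMIT)
    except MemoryError as e:
        print(e)                         # bounds error: offset outside segment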
Slide 49
Memory Management by Paging
This figure shows the mapping between virtual memory pages, physical memory pages, and pages in secondary memory. Page n - 1 is not present in physical memory, but only in secondary memory.
The MMU manages this mapping (logical to physical address).
[Figure: a program unit’s virtual pages 0 through n - 1 map to scattered page frames in physical memory, except page n - 1, which resides only in secondary memory.]
Demand Paging: pages are brought into physical memory only as they are needed.
Slide 50
Virtual Address Translation in a Paged MMU
A page fault will result in 100,000 or more cycles passing before the page has been brought from secondary storage to MM.
[Figure: the virtual address from the CPU is split into a page number and an offset in the page. The page number, added to the page table base register (with a bounds check against the page table limit register), selects a page table entry. The entry holds access-control bits (presence bit, dirty bit, usage bits) and either the physical page number or a pointer to secondary storage. Hit: the page is in primary memory, and the physical page number concatenated with the word offset forms the physical address of the desired word in main memory. Miss (page fault): the page is in secondary memory, and the entry is translated to a disk address.]
• 1 page table per user per program unit
• One translation per memory access
• Potentially large page table
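A behavioral sketch of this page-table walk, assuming 4-KB pages; the page-table contents are illustrative:

    # Paged translation: virtual address -> (page number, offset) -> page table
    # entry -> physical address, or a page fault if the presence bit is clear.

    PAGE_SIZE = 4096                     # assumed 4-KB pages

    # Each entry: (presence bit, physical page number or disk pointer)
    page_table = [(True, 7), (False, "disk block 42"), (True, 3)]

    def translate(vaddr: int) -> int:
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page >= len(page_table):
            raise MemoryError("bounds error")
        present, target = page_table[page]
        if not present:                              # page fault: OS must act
            raise MemoryError(f"page fault: fetch {target}")
        return target * PAGE_SIZE + offset           # physical page + offset

    print(hex(translate(0x0010)))                # page 0 -> frame 7 -> 0x7010
    print(hex(translate(2 * PAGE_SIZE + 5)))     # page 2 -> frame 3 -> 0x3005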
Slide 51
Page Placement and Replacement
Page tables are direct mapped, since the physical page is computed directly from the virtual page number.
But physical pages can reside anywhere in physical memory.
Page tables such as those on the previous slide result in large page tables, since there must be a page table entry for every page in the program unit.
Some implementations resort to hash tables instead, which need entries only for those pages actually present in physical memory.
Replacement strategies are generally LRU, or at least employ a “use bit” to guide replacement.
Slide 52
Fast Address Translation: Regaining Lost Ground
The concept of virtual memory is very attractive, but leads to considerable overhead:
There must be a translation for every memory reference.
There must be two memory references for every program reference:
One to retrieve the page table entry,
One to retrieve the value.
Most caches are addressed by physical address, so there must be a virtual-to-physical translation before the cache can be accessed.
Slide 53
Fast Address Translation: Regaining Lost Ground
The answer: a small cache in the processor that retains the last few virtual-to-physical translations: a Translation Lookaside Buffer, TLB.
The TLB contains not only the virtual-to-physical translations, but also the valid, dirty, and protection bits, so a TLB hit allows the processor to access physical memory directly.
The TLB is usually implemented as a fully associative cache:
Slide 54
Translation Lookaside Buffer Structure and Operation
[Figure: the virtual page number of the incoming virtual address is looked up associatively in the TLB. Each TLB entry holds a virtual page number, access-control bits (presence bit, dirty bit, valid bit, usage bits), and a physical page number. TLB hit: the page is in primary memory, and the physical page number concatenated with the word offset forms the physical address sent to main memory or cache. TLB miss: look for the physical page in the page table.]
Slide 55
Operation of the Memory Hierarchy
[Figure: flow of one memory reference through CPU, cache, main memory, and secondary memory. The virtual address first searches the TLB. On a TLB hit, generate the physical address and search the cache; on a cache hit, return the value from the cache; on a cache miss, update the cache from MM first. On a TLB miss, search the page table; on a page table hit, generate the physical address and update the TLB; on a page table miss (page fault), get the page from secondary memory and update MM, the cache, and the page table.]
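The same flow written as straight-line Python; the helper names and toy frame assignment are illustrative stubs for the hardware and OS actions in the figure:

    # One memory reference through TLB, page table, cache, and MM, following
    # the flowchart. All structures are dicts; names are illustrative stubs.

    PAGE_SIZE = 4096

    def fetch_from_disk(page: int) -> int:
        """Stub for the page-fault path: assign a frame and fill it from disk."""
        return page % 8                          # toy frame assignment

    def access(vaddr, tlb, page_table, cache, mm):
        page, offset = divmod(vaddr, PAGE_SIZE)
        if page not in tlb:                      # TLB miss
            if page not in page_table:           # page fault
                page_table[page] = fetch_from_disk(page)   # update MM/page table
            tlb[page] = page_table[page]         # update TLB
        paddr = tlb[page] * PAGE_SIZE + offset   # generate physical address
        if paddr not in cache:                   # cache miss
            cache[paddr] = mm.get(paddr, 0)      # update cache from MM
        return cache[paddr]                      # return value from cache

    print(access(0x2345, tlb={}, page_table={}, cache={}, mm={0x2345: 99}))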
Slide 56
I/O Connection to a Memory with a Cache
The memory system is quite complex, and affords many possible tradeoffs.
The only realistic way to choose among these alternatives is to study a typical workload, using either simulations or prototype systems.
Instruction and data accesses usually have different patterns.
It is possible to employ a cache at the disk level, using the disk hardware.
Traffic between MM and disk is I/O, and direct memory access, DMA, can be used to speed the transfers:
[Figure: the CPU connects through the cache to main memory; paging traffic and other I/O move between main memory and the disk by DMA.]
Slide 57
Summary
Most memory systems are multileveled—cache, main memory, and disk.
Static and dynamic RAM are the fastest components, and their speed has the strongest effect on system performance.
Chips are organized into boards and modules.
Larger, slower memory is attached to faster memory in a hierarchical structure.
The cache-to-main-memory interface requires hardware address translation.
Virtual memory—the main memory–disk interface—can employ software for address translation because of the slower speeds involved.
The hierarchy must be carefully designed to ensure optimum price-performance.