Chapter 4
Cache Memory
Table 4.1
Key Characteristics of Computer Memory Systems
• Location: internal (e.g. processor registers, cache, main memory); external (e.g. optical disks, magnetic disks, tapes)
• Capacity: number of words; number of bytes
• Unit of Transfer: word; block
• Access Method: sequential; direct; random; associative
• Performance: access time; cycle time; transfer rate
• Physical Type: semiconductor; magnetic; optical; magneto-optical
• Physical Characteristics: volatile/nonvolatile; erasable/nonerasable
• Organization: memory modules
Characteristics of Memory Systems
• Location
• Refers to whether memory is internal or external to the computer
• Internal memory is often equated with main memory
• Processor requires its own local memory, in the form of registers
• Cache is another form of internal memory
• External memory consists of peripheral storage devices that are accessible to
the processor via I/O controllers
• Capacity
• Memory capacity is typically expressed in terms of bytes
• Unit of transfer
• For internal memory the unit of transfer is equal to the number of electrical
lines into and out of the memory module
Method of Accessing Units of Data
Sequential access
• Memory is organized into units of data called records
• Access must be made in a specific linear sequence
• Access time is variable
Direct access
• Involves a shared read-write mechanism
• Individual blocks or records have a unique address based on physical location
• Access time is variable
Random access
• Each addressable location in memory has a unique, physically wired-in addressing mechanism
• The time to access a given location is independent of the sequence of prior accesses and is constant
• Any location can be selected at random and directly addressed and accessed
• Main memory and some cache systems are random access
Associative
• A word is retrieved based on a portion of its contents rather than its address
• Each location has its own addressing mechanism and retrieval time is constant independent of location or prior access patterns
• Cache memories may employ associative access
Capacity and Performance:
The two most important characteristics of memory
Three performance parameters are used:
Access time (latency)
• For random-access memory it is the
time it takes to perform a read or
write operation
• For non-random-access memory it is
the time it takes to position the
read-write mechanism at the desired
location
Memory cycle time
• Access time plus any additional time required before a second access can commence
• Additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively
• Concerned with the system bus, not the processor
Transfer rate
• The rate at which data can be
transferred into or out of a memory
unit
• For random-access memory it is
equal to 1/(cycle time)
PROBLEMS
Q1. Calculate the cycle time of the system if the access time is 10 nsec and the delay time is 1% of the access time. Also calculate the data transfer rate.
DATA:
A.T = 10 nsec = 10 × 10^-9 sec
Delay = 1% of A.T = 0.1 nsec = 0.1 × 10^-9 sec
C.T = ?
D.T.R = ?
SOLUTION:
C.T = A.T + Delay
= (10 × 10^-9) + (0.1 × 10^-9) sec
C.T = 10.1 nsec
D.T.R = 1/C.T
D.T.R = 1/(10.1 × 10^-9 sec) = 0.0990 × 10^9 per sec
D.T.R = 99.0 × 10^6 per sec
D.T.R = 99.0 Mbits/sec (taking one bit transferred per cycle)
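The arithmetic in Q1 can be checked with a short Python sketch. The function names and the `bits_per_cycle` parameter are illustrative assumptions, not part of the slides; one bit per cycle is assumed so that 1/(cycle time) comes out in bits per second.

```python
def cycle_time_ns(access_time_ns, delay_ns):
    """Cycle time = access time + additional delay before the next access."""
    return access_time_ns + delay_ns


def transfer_rate_bits_per_sec(cycle_ns, bits_per_cycle=1):
    """Transfer rate = 1 / (cycle time); bits_per_cycle=1 is an assumption."""
    return bits_per_cycle / (cycle_ns * 1e-9)


access_ns = 10.0                # access time from Q1
delay_ns = 0.01 * access_ns     # delay is 1% of the access time
ct = cycle_time_ns(access_ns, delay_ns)
rate = transfer_rate_bits_per_sec(ct)
print(f"cycle time    = {ct:.1f} nsec")
print(f"transfer rate = {rate / 1e6:.1f} Mbits/sec")
```

The same two functions can be reused for the remaining problems by changing the access time and delay values.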
PROBLEMS
Q2. Calculate the cycle time of the system if the
access time is 20nsec and the delay time is 2%
of the access time. Also calculate the data
transfer rate.
PROBLEMS
Q3. Calculate the access time of the system if
the data transfer rate is 100M bits/sec and the
delay time is 2nsec.
PROBLEMS
Q4. Calculate the access time of the system if
the data transfer rate is 50M bits/sec and the
delay time is 1nsec.
PROBLEMS
Q5. In a system the address generated by CPU
appears in the address register at 110nsec of the
clock pulse. The data appears in the data register
at 200nsec of the clock pulse. The delay of the
system is 20nsec. Calculate access time, cycle
time and DTR.
Memory
• The most common forms are:
• Semiconductor memory
• Magnetic surface memory
• Optical
• Magneto-optical
• Several physical characteristics of data storage are important:
• Volatile memory
• Information decays naturally or is lost when electrical power is switched off
• Nonvolatile memory
• Once recorded, information remains without deterioration until deliberately changed
• No electrical power is needed to retain information
• Magnetic-surface memories
• Are nonvolatile
• Semiconductor memory
• May be either volatile or nonvolatile
• Nonerasable memory
• Cannot be altered, except by destroying the storage unit
• Semiconductor memory of this type is known as read-only memory (ROM)
• For random-access memory the organization is a key design issue
• Organization refers to the physical arrangement of bits to form words
Memory Hierarchy
• Design constraints on a computer’s memory can be summed up
by three questions:
• How much, how fast, how expensive
• There is a trade-off among capacity, access time, and cost
• Faster access time, greater cost per bit
• Greater capacity, smaller cost per bit
• Greater capacity, slower access time
• The way out of the memory dilemma is not to rely on a single
memory component or technology, but to employ a memory
hierarchy
Figure 4.1 The Memory Hierarchy (inboard memory: registers, cache, main memory; outboard storage: magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM, Blu-Ray; off-line storage: magnetic tape)
Figure 4.2 Performance of a Simple Two-Level Memory (x-axis: fraction of accesses involving only Level 1 (hit ratio), from 0 to 1; y-axis: average access time, with T1, T2, and T1 + T2 marked)
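The relationship plotted in Figure 4.2 can be sketched numerically. The sketch below assumes the usual two-level model in which a hit costs T1 and a miss costs T1 + T2, which matches the T1 and T1 + T2 marks on the figure's vertical axis; the function name and the sample timings are hypothetical.

```python
def average_access_time(hit_ratio, t1, t2):
    """Average access time for a two-level memory.

    Assumed model: a hit costs t1, a miss costs t1 + t2.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# Hypothetical timings: level 1 = 10 nsec, level 2 = 100 nsec
for h in (0.0, 0.5, 0.95, 1.0):
    print(f"hit ratio {h:.2f}: {average_access_time(h, 10.0, 100.0):.1f} nsec")
```

At a hit ratio of 0 the result is T1 + T2, and at a hit ratio of 1 it falls to T1, which are the two endpoints of the curve.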
Memory
• The use of three levels exploits the fact that semiconductor
memory comes in a variety of types which differ in speed and cost
• Data are stored more permanently on external mass storage
devices
• External, nonvolatile memory is also referred to as secondary
memory or auxiliary memory
• Disk cache
• A portion of main memory can be used as a buffer to hold data
temporarily that is to be read out to disk
• A few large transfers of data can be used instead of many small transfers
of data
• Data can be retrieved rapidly from the software cache rather than slowly
from the disk
Figure 4.3 Cache and Main Memory
(a) Single cache: CPU, cache (fast, word transfer to the CPU), main memory (slow, block transfer to the cache)
(b) Three-level cache organization: CPU, Level 1 (L1) cache (fastest), Level 2 (L2) cache (fast), Level 3 (L3) cache (less fast), main memory (slow)
Figure 4.4 Cache/Main-Memory Structure
(a) Cache: C lines numbered 0 to C – 1, each holding a tag and a block of K words
(b) Main memory: 2^n words at addresses 0 to 2^n – 1, grouped into M blocks (Block 0 to Block M – 1) of K words each
Figure 4.5 Cache Read Operation
START
• Receive address RA from CPU
• Is block containing RA in cache?
• Yes: fetch RA word and deliver to CPU, then DONE
• No: access main memory for block containing RA; allocate cache line for main memory block; load main memory block into cache line and deliver RA word to CPU, then DONE
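The read flow of Figure 4.5 can be sketched as a small Python model. Everything here is illustrative: the class name, the use of a list as main memory, and the choice to evict the oldest loaded block when no line is free are assumptions, since the flowchart itself does not say how a line is allocated.

```python
class SimpleCache:
    """Toy model of the cache read operation (names are hypothetical)."""

    def __init__(self, memory, words_per_block, num_lines):
        self.memory = memory          # main memory as a list of words
        self.k = words_per_block      # K words per block
        self.num_lines = num_lines    # number of cache lines
        self.lines = {}               # block number -> copy of the block
        self.hits = 0
        self.misses = 0

    def read(self, ra):
        """Return the word at address ra, loading its block on a miss."""
        block_no, offset = divmod(ra, self.k)
        if block_no in self.lines:            # block containing RA in cache?
            self.hits += 1
            return self.lines[block_no][offset]
        self.misses += 1
        start = block_no * self.k             # access main memory for the block
        block = self.memory[start:start + self.k]
        if len(self.lines) >= self.num_lines:
            # Allocate a line; evicting the oldest block is an assumption
            # (dicts keep insertion order, so the first key is the oldest).
            del self.lines[next(iter(self.lines))]
        self.lines[block_no] = block          # load block into the cache line
        return block[offset]                  # deliver RA word to the CPU


memory = list(range(100, 132))                # 32 words of sample data
cache = SimpleCache(memory, words_per_block=4, num_lines=2)
for address in (5, 6, 13, 21, 5):
    print(address, cache.read(address))
print("hits:", cache.hits, "misses:", cache.misses)
```

The second read (address 6) hits because the first read brought in the whole block containing addresses 4 to 7.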
Figure 4.6 Typical Cache Organization (the processor and the cache are connected by address, control, and data lines; an address buffer and a data buffer connect these lines to the system bus)
Table 4.2
Elements of Cache Design
• Cache Addresses: logical; physical
• Cache Size
• Mapping Function: direct; associative; set associative
• Replacement Algorithm: least recently used (LRU); first in first out (FIFO); least frequently used (LFU); random
• Write Policy: write through; write back
• Line Size
• Number of Caches: single or two level; unified or split
Cache Addresses
• Virtual memory
• Facility that allows programs to address memory from a logical point of
view, without regard to the amount of main memory physically available
• When used, the address fields of machine instructions contain virtual
addresses
• For reads from and writes to main memory, a hardware memory
management unit (MMU) translates each virtual address into a physical
address in main memory
Virtual Memory
Figure 4.7 Logical and Physical Caches
(a) Logical cache: the cache sits between the processor and the MMU and is accessed with logical addresses
(b) Physical cache: the cache sits between the MMU and main memory and is accessed with physical addresses
Table 4.3
Cache Sizes of Some Processors
(Table can be found on page 134 in the textbook.)
Processor | Type | Year of Introduction | L1 Cache (a) | L2 Cache | L3 Cache
IBM 360/85 | Mainframe | 1968 | 16 to 32 kB | - | -
PDP-11/70 | Minicomputer | 1975 | 1 kB | - | -
VAX 11/780 | Minicomputer | 1978 | 16 kB | - | -
IBM 3033 | Mainframe | 1978 | 64 kB | - | -
IBM 3090 | Mainframe | 1985 | 128 to 256 kB | - | -
Intel 80486 | PC | 1989 | 8 kB | - | -
Pentium | PC | 1993 | 8 kB/8 kB | 256 to 512 KB | -
PowerPC 601 | PC | 1993 | 32 kB | - | -
PowerPC 620 | PC | 1996 | 32 kB/32 kB | - | -
PowerPC G4 | PC/server | 1999 | 32 kB/32 kB | 256 KB to 1 MB | 2 MB
IBM S/390 G6 | Mainframe | 1999 | 256 kB | 8 MB | -
Pentium 4 | PC/server | 2000 | 8 kB/8 kB | 256 KB | -
IBM SP | High-end server/supercomputer | 2000 | 64 kB/32 kB | 8 MB | -
CRAY MTA (b) | Supercomputer | 2000 | 8 kB | 2 MB | -
Itanium | PC/server | 2001 | 16 kB/16 kB | 96 KB | 4 MB
Itanium 2 | PC/server | 2002 | 32 kB | 256 KB | 6 MB
IBM POWER5 | High-end server | 2003 | 64 kB | 1.9 MB | 36 MB
CRAY XD-1 | Supercomputer | 2004 | 64 kB/64 kB | 1 MB | -
IBM POWER6 | PC/server | 2007 | 64 kB/64 kB | 4 MB | 32 MB
IBM z10 | Mainframe | 2008 | 64 kB/128 kB | 3 MB | 24-48 MB
Intel Core i7 EE 990 | Workstation/server | 2011 | 6 × 32 kB/32 kB | 1.5 MB | 12 MB
IBM zEnterprise 196 | Mainframe/server | 2011 | 24 × 64 kB/128 kB | 24 × 1.5 MB | 24 MB L3, 192 MB L4
(a) Two values separated by a slash refer to instruction and data caches.
(b) Both caches are instruction only; no data caches.
Mapping Function
• Because there are fewer cache lines than main memory blocks, an
algorithm is needed for mapping main memory blocks into cache lines
• Three techniques can be used:
Direct
•The simplest technique
•Maps each block of main
memory into only one
possible cache line
Associative
•Permits each main memory
block to be loaded into any
line of the cache
•The cache control logic
interprets a memory address
simply as a Tag and a Word
field
•To determine whether a
block is in the cache, the
cache control logic must
simultaneously examine
every line’s Tag for a match
Set Associative
•A compromise that exhibits
the strengths of both the
direct and associative
approaches while reducing
their disadvantages
Figure 4.8 Mapping From Main Memory to Cache: Direct and Associative
(a) Direct mapping: the first m blocks of main memory (equal to size of cache), B0 to Bm–1, map to the m cache lines L0 to Lm–1
(b) Associative mapping: one block of main memory can be loaded into any of the lines L0 to Lm–1 of cache memory
b = length of block in bits
t = length of tag in bits
Figure 4.9 Direct-Mapping Cache Organization (the (s + w)-bit memory address is split into Tag (s – r bits), Line (r bits), and Word (w bits) fields; the Line field selects one cache line Li, whose stored tag is compared with the address Tag; a match is a hit in cache, no match is a miss in cache and block Bj is read from main memory)
Figure 4.10 Direct Mapping Example (16-MByte main memory with 24-bit addresses and 32-bit data; 16-Kline cache holding an 8-bit tag and 32 bits of data per line; main memory address = Tag (8 bits) + Line (14 bits) + Word (2 bits))
Note: Memory address values are in binary representation; other values are in hexadecimal
Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s – r) bits
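The field split described in the summary can be sketched in Python. This sketch uses the field widths of Figure 4.10 (8-bit tag, 14-bit line, 2-bit word, so s = 22, r = 14, w = 2); the function name and the sample address are illustrative assumptions.

```python
def split_direct(address, s, r, w):
    """Split an (s + w)-bit address into (tag, line, word) for direct mapping."""
    if not 0 <= address < (1 << (s + w)):
        raise ValueError("address does not fit in s + w bits")
    word = address & ((1 << w) - 1)          # lowest w bits
    line = (address >> w) & ((1 << r) - 1)   # next r bits select the cache line
    tag = address >> (w + r)                 # remaining s - r bits
    return tag, line, word


# 24-bit address with s = 22, r = 14, w = 2
tag, line, word = split_direct(0x16339C, s=22, r=14, w=2)
print(f"tag={tag:02X} line={line:04X} word={word}")   # tag=16 line=0CE7 word=0
```

Because only the line field selects the cache location, two addresses with the same line bits but different tags compete for the same line.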
PROBLEMS
Q1. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG field, the SLOT field, and the BYTE (or WORD) field. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each of these fields using the direct mapping function.
Victim Cache
• Originally proposed as an approach to reduce the conflict misses of
direct mapped caches without affecting their fast access time
• Fully associative cache
• Typical size is 4 to 16 cache lines
• Residing between direct mapped L1 cache and the next level of
memory
Figure 4.11 Fully Associative Cache Organization (the (s + w)-bit memory address is split into Tag (s bits) and Word (w bits) fields; the address Tag is compared with the tag of every cache line L0 to Lm–1; a match is a hit in cache, no match is a miss in cache and block Bj is read from main memory)
Figure 4.12 Associative Mapping Example (16 MByte main memory with 24-bit addresses and 32-bit data; 16 Kline cache holding a 22-bit tag and 32 bits of data per line; main memory address = Tag (22 bits) + Word (2 bits))
Note: Memory address values are in binary representation; other values are in hexadecimal
Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits
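A minimal Python sketch of the associative case follows, using the 22-bit tag and 2-bit word widths of Figure 4.12. The function names and the cache contents are hypothetical, and the loop stands in for what the hardware does by comparing all tags simultaneously.

```python
def split_associative(address, w):
    """Split an address into (tag, word); the tag is everything above w bits."""
    return address >> w, address & ((1 << w) - 1)


def lookup(cache_lines, address, w):
    """Return the addressed word on a hit, or None on a miss.

    cache_lines is a list of (tag, block) pairs in no particular order.
    Hardware examines every tag in parallel; this loop is a sequential
    stand-in for that comparison.
    """
    tag, word = split_associative(address, w)
    for line_tag, block in cache_lines:
        if line_tag == tag:
            return block[word]
    return None


tag, word = split_associative(0x16339C, w=2)
print(f"tag={tag:06X} word={word}")                 # tag=058CE7 word=0

lines = [(0x058CE7, [10, 11, 12, 13])]              # hypothetical cache contents
print(lookup(lines, 0x16339D, w=2))                 # hit, word 1 of the block
print(lookup(lines, 0x000004, w=2))                 # miss
```

Since there is no line field, the block may sit in any line, which is why every stored tag has to be checked.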
PROBLEMS
Q2. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG field and the WORD field. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each of these fields using the associative mapping function.
Set Associative Mapping
• Compromise that exhibits the strengths of both the direct and
associative approaches while reducing their disadvantages
• Cache consists of a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
• e.g. 2 lines per set
• 2 way associative mapping
• A given block can be in one of 2 lines in only one set
Figure 4.13 Mapping From Main Memory to Cache: k-way Set Associative
(a) v associative-mapped caches: the first v blocks of main memory (equal to number of sets), B0 to Bv–1, map to cache memory set 0 to set v–1; each set holds k lines, L0 to Lk–1
(b) k direct-mapped caches: the first v blocks of main memory (equal to number of sets), B0 to Bv–1, map to lines L0 to Lv–1 in each of cache memory way 1 to way k; one set consists of the same line in every way
Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w)/2^w = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = m = kv = k × 2^d
• Size of cache = k × 2^(d+w) words or bytes
• Size of tag = (s – d) bits
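The set-associative formulas can be checked with a short Python sketch. It uses the two-way configuration of Figure 4.15 (9-bit tag, 13-bit set, 2-bit word, so k = 2, d = 13, w = 2); the function names and the sample address are illustrative assumptions.

```python
def split_set_associative(address, d, w):
    """Split an address into (tag, set, word) for set-associative mapping."""
    word = address & ((1 << w) - 1)              # lowest w bits
    set_no = (address >> w) & ((1 << d) - 1)     # next d bits select the set
    tag = address >> (w + d)                     # remaining s - d bits
    return tag, set_no, word


def cache_lines(k, d):
    """Number of lines in cache = k * 2^d."""
    return k * (1 << d)


def cache_size(k, d, w):
    """Size of cache = k * 2^(d+w) words or bytes."""
    return k * (1 << (d + w))


tag, set_no, word = split_set_associative(0x16339C, d=13, w=2)
print(f"tag={tag:03X} set={set_no:04X} word={word}")   # tag=02C set=0CE7 word=0
print(cache_lines(k=2, d=13), cache_size(k=2, d=13, w=2))
```

Doubling k while halving the number of sets keeps the cache size the same but moves one bit from the set field to the tag field.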
Figure 4.15 Two-Way Set Associative Mapping Example (16 MByte main memory with 24-bit addresses and 32-bit data; 16 Kline cache with two lines per set, each line holding a 9-bit tag and 32 bits of data; main memory address = Tag (9 bits) + Set (13 bits) + Word (2 bits))
Note: Memory address values are in binary representation; other values are in hexadecimal
Figure 4.16 Varying Associativity over Cache Size (x-axis: cache size in bytes, 1k to 1M; y-axis: hit ratio, 0.0 to 1.0; curves: direct, 2-way, 4-way, 8-way, 16-way)
PROBLEMS
Q3. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG field, the SLOT field, and the WORD field. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each of these fields using the two-way set-associative mapping function.
PROBLEMS
Q4. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG field, the SLOT field, and the WORD field. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each of these fields using the four-way set-associative mapping function.
Replacement Algorithms
• Once the cache has been filled, when a new block is brought into
the cache, one of the existing blocks must be replaced
• For direct mapping there is only one possible line for any
particular block and no choice is possible
• For the associative and set-associative techniques a replacement
algorithm is needed
• To achieve high speed, an algorithm must be implemented in
hardware
The most common replacement algorithms are:
• Least recently used (LRU)
• Most effective
• Replace that block in the set that has been in the cache longest with no
reference to it
• Because of its simplicity of implementation, LRU is the most popular
replacement algorithm
• First-in-first-out (FIFO)
• Replace that block in the set that has been in the cache longest
• Easily implemented as a round-robin or circular buffer technique
• Least frequently used (LFU)
• Replace that block in the set that has experienced the fewest references
• Could be implemented by associating a counter with each line
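As an illustration of the LRU rule, here is a minimal software sketch for a single set. It is not how a hardware cache implements LRU; the function name, the reference sequence, and the three-line set size are all hypothetical.

```python
from collections import OrderedDict


def lru_trace(references, num_lines):
    """Count hits and misses for one set under LRU replacement."""
    resident = OrderedDict()      # keys ordered from least to most recently used
    hits = misses = 0
    for block in references:
        if block in resident:
            hits += 1
            resident.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            if len(resident) == num_lines:
                resident.popitem(last=False)   # evict the least recently used
            resident[block] = True
    return hits, misses, list(resident)


# Hypothetical reference sequence and a set of 3 lines
print(lru_trace([0, 1, 2, 0, 3, 0, 4], num_lines=3))   # (2, 5, [3, 0, 4])
```

Block 0 survives both evictions because each reference to it moves it to the most-recently-used end.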
First-in-first-out (FIFO)
Consider the following reference string: 0, 2, 1, 6, 4, 0, 1, 0, 3, 1, 2, 1. Trace the
FIFO replacement algorithm on it.
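The FIFO trace for this reference string can be sketched in Python. The slide does not state how many lines (or page frames) are available, so the value of 4 used below is purely an assumption, and the function name is illustrative.

```python
from collections import deque


def fifo_trace(references, num_lines):
    """Count hits and misses under FIFO replacement."""
    resident = deque()            # oldest loaded block is at the left
    hits = misses = 0
    for block in references:
        if block in resident:
            hits += 1             # a hit does not change the FIFO order
            continue
        misses += 1
        if len(resident) == num_lines:
            resident.popleft()    # evict the block that was loaded first
        resident.append(block)
    return hits, misses, list(resident)


refs = [0, 2, 1, 6, 4, 0, 1, 0, 3, 1, 2, 1]
# num_lines=4 is an assumption; the slide does not give this value
print(fifo_trace(refs, num_lines=4))   # (3, 9, [0, 3, 1, 2])
```

Unlike LRU, a hit leaves the eviction order untouched, so a heavily used block can still be the next one replaced.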
Write Policy
When a block that is resident in the cache is to be replaced there are two cases to consider:
• If the old block in the cache has not been altered then it may be overwritten with a new block without first writing out the old block
• If at least one write operation has been performed on a word in that line of the cache then main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block
There are two problems to contend with:
• More than one device may have access to main memory
• A more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache: if a word is altered in one cache it could conceivably invalidate a word in other caches
Write Through and Write Back
• Write through
• Simplest technique
• All write operations are made to main memory as well as to the cache
• The main disadvantage of this technique is that it generates substantial
memory traffic and may create a bottleneck
• Write back
• Minimizes memory writes
• Updates are made only in the cache
• Portions of main memory are invalid and hence accesses by I/O modules can
be allowed only through the cache
• This makes for complex circuitry and a potential bottleneck
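The difference in memory traffic between the two policies can be sketched with a simple count. The sketch assumes that every written line stays in the cache until a single flush at the end, so write back writes each dirty line to main memory exactly once; the function names and the write sequence are hypothetical.

```python
def memory_writes_write_through(write_lines):
    """Write through: every write also goes to main memory."""
    return len(write_lines)


def memory_writes_write_back(write_lines):
    """Write back: each dirty line is written to memory once, when flushed.

    Assumes no line is evicted before the final flush.
    """
    dirty = set()
    for line in write_lines:
        dirty.add(line)           # update only the cache, mark line dirty
    return len(dirty)             # one memory write per dirty line at flush


writes = [3, 3, 3, 7, 3, 7]       # hypothetical cache lines written by the CPU
print(memory_writes_write_through(writes))   # 6
print(memory_writes_write_back(writes))      # 2
```

The gap grows with the number of repeated writes to the same line, which is the case where write through generates the most redundant traffic.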
Line Size
• When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved
• As the block size increases the hit ratio will at first increase because of the principle of locality
• As the block size increases more useful data are brought into the cache
• The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced
Two specific effects come into play:
• Larger blocks reduce the number of blocks that fit into a cache
• As a block becomes larger each additional word is farther from the requested word
Multilevel Caches
• As logic density has increased it has become possible to have a
cache on the same chip as the processor
• The on-chip cache reduces the processor’s external bus activity,
speeds up execution, and increases overall system
performance
• When the requested instruction or data is found in the on-chip cache, the
bus access is eliminated
• On-chip cache accesses will complete appreciably faster than would even
zero-wait state bus cycles
• During this period the bus is free to support other transfers
• Two-level cache:
• Internal cache designated as level 1 (L1)
• External cache designated as level 2 (L2)
• Potential savings due to the use of an L2 cache depends on the hit
rates in both the L1 and L2 caches
• The use of multilevel caches complicates all of the design issues
related to caches, including size, replacement algorithm, and write
policy
Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1 (x-axis: L2 cache size in bytes, 1k to 2M; y-axis: hit ratio, 0.78 to 0.98; curves: L1 = 8k, L1 = 16k)
Unified Versus Split Caches
• It has become common to split the cache:
• One dedicated to instructions
• One dedicated to data
• Both exist at the same level, typically as two L1 caches
• Advantages of unified cache:
• Higher hit rate
• Balances load of instruction and data fetches automatically
• Only one cache needs to be designed and implemented
• Trend is toward split caches at the L1 and unified caches for higher
levels
• Advantages of split cache:
• Eliminates cache contention between instruction fetch/decode unit and
execution unit
• Important in pipelining
Table 4.4 Intel Cache Evolution
Problem | Solution | Processor on Which Feature First Appears
External memory slower than the system bus. | Add external cache using faster memory technology. | 386
Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486
Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486
Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place. | Create separate data and instruction caches. | Pentium
Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro
(same problem as above) | Move L2 cache on to the processor chip. | Pentium II
Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III
(same problem as above) | Move L3 cache on-chip. | Pentium 4
Figure 4.18 Pentium 4 Block Diagram (components shown: instruction fetch/decode unit; L1 instruction cache (12K µops); out-of-order execution logic; integer register file; FP register file; load address unit; store address unit; two simple integer ALUs; complex integer ALU; FP/MMX unit; FP move unit; L1 data cache (16 KB); L2 cache (512 KB); L3 cache (1 MB); system bus; data paths labeled 64 bits and 256 bits)
Table 4.5 Pentium 4 Cache Operating Modes
Control bits: CD, NW. Operating mode: Cache Fills, Write Throughs, Invalidates.
CD | NW | Cache Fills | Write Throughs | Invalidates
0 | 0 | Enabled | Enabled | Enabled
1 | 0 | Disabled | Enabled | Enabled
1 | 1 | Disabled | Disabled | Disabled
Note: CD = 0; NW = 1 is an invalid combination.
Summary
Chapter 4 Cache Memory
• Computer memory system overview: characteristics of memory systems; memory hierarchy
• Cache memory principles
• Elements of cache design: cache addresses; cache size; mapping function; replacement algorithms; write policy; line size; number of caches
• Pentium 4 cache organization

More Related Content

PDF
INF3703 - Chapter 14 Distributed Databases
PPTX
Chapter 9
PPTX
Database ms priyanka
PPT
Memory Hierarchy PPT of Computer Organization
PPT
PPTX
CH04-COA10e.pptx
PPSX
Coa presentation3
PPTX
Computer memory
INF3703 - Chapter 14 Distributed Databases
Chapter 9
Database ms priyanka
Memory Hierarchy PPT of Computer Organization
CH04-COA10e.pptx
Coa presentation3
Computer memory

Similar to cache cache memory memory cache memory.pptx (20)

PPTX
Computer Memory Hierarchy Computer Architecture
PPTX
Memory Organization
PPTX
CPU Memory Hierarchy and Caching Techniques
PPTX
WEEK6_COMPUTER_ORGANIZATION.pptx
PPTX
Unit IV Memory.pptx
PDF
3 computer memory
PPT
04_Cache_Memory-cust memori memori memori.ppt
PPT
Ct213 memory subsystem
PPT
Internal Memory FIT NED UNIVERSITY OF EN
PPT
cache memory introduction, level, function
PPT
Chapter 8 computer memory system overview
PPTX
Exploring Of System Hardware
PPTX
Unit-6.pptx lk kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
COA (Unit_4.pptx)
PPTX
memorytechnologyandoptimization-140416131506-phpapp02.pptx
PDF
Unit I Memory technology and optimization
PPTX
Memory technology and optimization in Advance Computer Architechture
PPT
Presentacion pujol
PPT
Cs intro-ca
PPTX
System Unit
Computer Memory Hierarchy Computer Architecture
Memory Organization
CPU Memory Hierarchy and Caching Techniques
WEEK6_COMPUTER_ORGANIZATION.pptx
Unit IV Memory.pptx
3 computer memory
04_Cache_Memory-cust memori memori memori.ppt
Ct213 memory subsystem
Internal Memory FIT NED UNIVERSITY OF EN
cache memory introduction, level, function
Chapter 8 computer memory system overview
Exploring Of System Hardware
Unit-6.pptx lk kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
COA (Unit_4.pptx)
memorytechnologyandoptimization-140416131506-phpapp02.pptx
Unit I Memory technology and optimization
Memory technology and optimization in Advance Computer Architechture
Presentacion pujol
Cs intro-ca
System Unit
Ad

Recently uploaded (20)

PPTX
Foundation to blockchain - A guide to Blockchain Tech
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
Artificial Intelligence
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
Current and future trends in Computer Vision.pptx
PPTX
UNIT-1 - COAL BASED THERMAL POWER PLANTS
PPTX
UNIT 4 Total Quality Management .pptx
PPT
Project quality management in manufacturing
PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
additive manufacturing of ss316l using mig welding
PDF
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PDF
PPT on Performance Review to get promotions
PPTX
Construction Project Organization Group 2.pptx
PDF
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
Geodesy 1.pptx...............................................
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PPTX
web development for engineering and engineering
Foundation to blockchain - A guide to Blockchain Tech
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Artificial Intelligence
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Current and future trends in Computer Vision.pptx
UNIT-1 - COAL BASED THERMAL POWER PLANTS
UNIT 4 Total Quality Management .pptx
Project quality management in manufacturing
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
additive manufacturing of ss316l using mig welding
Unit I ESSENTIAL OF DIGITAL MARKETING.pdf
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPT on Performance Review to get promotions
Construction Project Organization Group 2.pptx
Mohammad Mahdi Farshadian CV - Prospective PhD Student 2026
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Geodesy 1.pptx...............................................
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
web development for engineering and engineering
Ad

cache cache memory memory cache memory.pptx

  • 2. Table 4.1 Key Characteristics of Computer Memory Systems Location Internal (e.g. processor registers, cache, main memory) External (e.g. optical disks, magnetic disks, tapes) Capacity Number of words Number of bytes Unit of Transfer Word Block Access Method Sequential Direct Random Associative Performance Access time Cycle time Transfer rate Physical Type Semiconductor Magnetic Optical Magneto-optical Physical Characteristics Volatile/nonvolatile Erasable/nonerasable Organization Memory modules
  • 3. Characteristics of Memory Systems • Location • Refers to whether memory is internal and external to the computer • Internal memory is often equated with main memory • Processor requires its own local memory, in the form of registers • Cache is another form of internal memory • External memory consists of peripheral storage devices that are accessible to the processor via I/O controllers • Capacity • Memory is typically expressed in terms of bytes • Unit of transfer • For internal memory the unit of transfer is equal to the number of electrical lines into and out of the memory module
  • 4. Method of Accessing Units of Data Sequential access Memory is organized into units of data called records Access must be made in a specific linear sequence Access time is variable Direct access Involves a shared read-write mechanism Individual blocks or records have a unique address based on physical location Access time is variable Random access Each addressable location in memory has a unique, physically wired-in addressing mechanism The time to access a given location is independent of the sequence of prior accesses and is constant Any location can be selected at random and directly addressed and accessed Main memory and some cache systems are random access Associative A word is retrieved based on a portion of its contents rather than its address Each location has its own addressing mechanism and retrieval time is constant independent of location or prior access patterns Cache memories may employ associative access
  • 5. Capacity and Performance: The two most important characteristics of memory Three performance parameters are used: Access time (latency) • For random-access memory it is the time it takes to perform a read or write operation • For non-random-access memory it is the time it takes to position the read-write mechanism at the desired location Memory cycle time •Access time plus any additional time required before second access can commence •Additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively •Concerned with the system bus, not the processor Transfer rate • The rate at which data can be transferred into or out of a memory unit • For random-access memory it is equal to 1/(cycle time)
  • 6. PROBLEMS Q1. Calculate the cycle time of the system if the access time is 10msec and the delay time is 1% of the access time. Also calculate the data transfer rate.
  • 7. DATA: C.T=? D.T=10nsec= 10−9 sec Delay=1% of A.T= 1−9 sec DTR=? SOLUTION: C.T = A.T + Delay = (10−9 )+(1−9 ) C.T =10.1 nsec
  • 8. D.T.R = 1/C.T D.T.R = 1/10.1 nsec=0.099 nsec D.T.R= 0.0999 x 10−3 /10−3 D.T.R =99 10 6 D.T.R =99.0 Mbits/sec
  • 9. PROBLEMS Q2. Calculate the cycle time of the system if the access time is 20nsec and the delay time is 2% of the access time. Also calculate the data transfer rate.
  • 11. PROBLEMS Q3. Calculate the access time of the system if the data transfer rate is 100M bits/sec and the delay time is 2nsec.
  • 13. PROBLEMS Q4. Calculate the access time of the system if the data transfer rate is 50M bits/sec and the delay time is 1nsec.
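Q3 and Q4 run the calculation in the other direction. A minimal sketch, again assuming one bit is transferred per cycle (the function name and parameter are hypothetical): invert the transfer rate to obtain the cycle time, then subtract the delay.

```python
def access_time_ns(rate_mbits, delay_ns, bits_per_cycle=1):
    """Access time = cycle time - delay, with cycle time = 1 / transfer rate.

    rate_mbits is in Mbits/sec; bits_per_cycle is an assumption (default 1).
    """
    cycle_ns = bits_per_cycle * 1e3 / rate_mbits   # nsec per cycle
    return cycle_ns - delay_ns


print(access_time_ns(100, 2))   # Q3: 100 Mbits/sec, 2 nsec delay
print(access_time_ns(50, 1))    # Q4: 50 Mbits/sec, 1 nsec delay
```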
  • 15. PROBLEMS Q5. In a system the address generated by CPU appears in the address register at 110nsec of the clock pulse. The data appears in the data register at 200nsec of the clock pulse. The delay of the system is 20nsec. Calculate access time, cycle time and DTR.
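One possible reading of Q5, sketched below, takes the access time as the interval between the address appearing in the address register and the data appearing in the data register. That interpretation, the variable names, and the one-bit-per-cycle transfer are assumptions.

```python
address_ready_ns = 110.0   # address appears in the address register
data_ready_ns = 200.0      # data appears in the data register
delay_ns = 20.0            # additional delay of the system

# Assumed definition: access time = data available - address presented.
access_ns = data_ready_ns - address_ready_ns
cycle_ns = access_ns + delay_ns
rate_mbits = 1e3 / cycle_ns            # 1 bit per cycle assumed

print(f"A.T = {access_ns:.0f} nsec, C.T = {cycle_ns:.0f} nsec, "
      f"D.T.R = {rate_mbits:.2f} Mbits/sec")
```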
  • 18. Memory • The most common forms are: • Semiconductor memory • Magnetic surface memory • Optical • Magneto-optical • Several physical characteristics of data storage are important: • Volatile memory • Information decays naturally or is lost when electrical power is switched off • Nonvolatile memory • Once recorded, information remains without deterioration until deliberately changed • No electrical power is needed to retain information • Magnetic-surface memories • Are nonvolatile • Semiconductor memory • May be either volatile or nonvolatile • Nonerasable memory • Cannot be altered, except by destroying the storage unit • Semiconductor memory of this type is known as read-only memory (ROM) • For random-access memory the organization is a key design issue • Organization refers to the physical arrangement of bits to form words
  • 19. Memory Hierarchy • Design constraints on a computer’s memory can be summed up by three questions: • How much, how fast, how expensive • There is a trade-off among capacity, access time, and cost • Faster access time, greater cost per bit • Greater capacity, smaller cost per bit • Greater capacity, slower access time • The way out of the memory dilemma is not to rely on a single memory component or technology, but to employ a memory hierarchy
  • 20. Figure 4.1 The Memory Hierarchy: inboard memory (registers, cache, main memory); outboard storage (magnetic disk, CD-ROM, CD-RW, DVD-RW, DVD-RAM, Blu-Ray); off-line storage (magnetic tape)
  • 21. Figure 4.2 Performance of a Simple Two-Level Memory: average access time plotted against the fraction of accesses involving only Level 1 (hit ratio); the average falls from T1 + T2 at a hit ratio of 0 to T1 at a hit ratio of 1
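The straight line in Figure 4.2 follows from weighting the two cases by the hit ratio: a hit costs T1, a miss costs T1 + T2. A minimal sketch, where the function name and the sample timings are illustrative assumptions:

```python
def average_access_time(hit_ratio, t1, t2):
    """Average access time of a two-level memory.

    A hit (probability hit_ratio) costs t1; a miss costs t1 + t2,
    because level 1 is checked before level 2 is accessed.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# Illustrative values only: T1 = 0.01 us, T2 = 0.1 us, 95% hit ratio.
print(average_access_time(0.95, 0.01, 0.1))
```

At a hit ratio of 0 the result is T1 + T2, and at a hit ratio of 1 it is T1, matching the two ends of the plot.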
  • 22. Memory • The use of three levels exploits the fact that semiconductor memory comes in a variety of types which differ in speed and cost • Data are stored more permanently on external mass storage devices • External, nonvolatile memory is also referred to as secondary memory or auxiliary memory • Disk cache • A portion of main memory can be used as a buffer to hold data temporarily that is to be read out to disk • A few large transfers of data can be used instead of many small transfers of data • Data can be retrieved rapidly from the software cache rather than slowly from the disk
  • 23. Figure 4.3 Cache and Main Memory: (a) Single cache: word transfer between CPU and cache (fast), block transfer between cache and main memory (slow). (b) Three-level cache organization: CPU, Level 1 (L1) cache (fastest), Level 2 (L2) cache (fast), Level 3 (L3) cache (less fast), main memory (slow)
  • 24. Figure 4.4 Cache/Main-Memory Structure: (a) Cache: lines numbered 0 to C - 1, each holding a tag and a block of K words. (b) Main memory: word addresses 0 to 2^n - 1, grouped into blocks 0 to M - 1 of K words each
  • 25. Figure 4.5 Cache Read Operation: START. Receive address RA from CPU. Is the block containing RA in the cache? Yes: fetch the RA word and deliver it to the CPU. No: access main memory for the block containing RA, allocate a cache line for the main memory block, load the main memory block into the cache line, and deliver the RA word to the CPU. DONE
  • 26. Figure 4.6 Typical Cache Organization: the processor connects to the cache through address, control, and data lines; an address buffer and a data buffer connect the cache to the system bus
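The read flow of Figure 4.5 can be sketched as a toy simulation. The class name, the sizes, and the use of direct mapping to pick the line are assumptions made for illustration; the flowchart itself does not fix a mapping function.

```python
class ToyCache:
    """Toy model of the cache read operation (assumed direct-mapped)."""

    def __init__(self, memory, num_lines=4, block_words=4):
        self.memory = memory            # main memory as a list of words
        self.num_lines = num_lines
        self.block_words = block_words
        self.lines = {}                 # line index -> (block number, words)
        self.hits = 0
        self.misses = 0

    def read(self, ra):
        block = ra // self.block_words
        line = block % self.num_lines   # assumption: direct mapping
        entry = self.lines.get(line)
        if entry is not None and entry[0] == block:
            self.hits += 1              # block containing RA is in the cache
        else:
            self.misses += 1            # access main memory for the block
            start = block * self.block_words
            entry = (block, self.memory[start:start + self.block_words])
            self.lines[line] = entry    # allocate the line and load the block
        return entry[1][ra % self.block_words]   # deliver RA word to the CPU


memory = list(range(100, 164))          # 64 words of illustrative data
cache = ToyCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 17, 0)]
print(values, cache.hits, cache.misses)
```

Addresses 1 and 2 hit because the first read loaded their whole block; address 17 maps to the same line and displaces it, so the final read of address 0 misses again.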
  • 27. Table 4.2 Elements of Cache Design • Cache Addresses: Logical, Physical • Cache Size • Mapping Function: Direct, Associative, Set Associative • Replacement Algorithm: Least recently used (LRU), First in first out (FIFO), Least frequently used (LFU), Random • Write Policy: Write through, Write back • Line Size • Number of Caches: Single or two level, Unified or split
  • 28. Cache Addresses • Virtual memory • Facility that allows programs to address memory from a logical point of view, without regard to the amount of main memory physically available • When used, the address fields of machine instructions contain virtual addresses • For reads from and writes to main memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory
  • 29. Figure 4.7 Logical and Physical Caches: (a) Logical cache: the cache sits between the processor and the MMU and is accessed with logical addresses. (b) Physical cache: the cache sits between the MMU and main memory and is accessed with physical addresses
  • 30. Table 4.3 Cache Sizes of Some Processors (Table can be found on page 134 in the textbook.)
      Processor | Type | Year of Introduction | L1 Cache (a) | L2 Cache | L3 Cache
      IBM 360/85 | Mainframe | 1968 | 16 to 32 kB | - | -
      PDP-11/70 | Minicomputer | 1975 | 1 kB | - | -
      VAX 11/780 | Minicomputer | 1978 | 16 kB | - | -
      IBM 3033 | Mainframe | 1978 | 64 kB | - | -
      IBM 3090 | Mainframe | 1985 | 128 to 256 kB | - | -
      Intel 80486 | PC | 1989 | 8 kB | - | -
      Pentium | PC | 1993 | 8 kB/8 kB | 256 to 512 KB | -
      PowerPC 601 | PC | 1993 | 32 kB | - | -
      PowerPC 620 | PC | 1996 | 32 kB/32 kB | - | -
      PowerPC G4 | PC/server | 1999 | 32 kB/32 kB | 256 KB to 1 MB | 2 MB
      IBM S/390 G6 | Mainframe | 1999 | 256 kB | 8 MB | -
      Pentium 4 | PC/server | 2000 | 8 kB/8 kB | 256 KB | -
      IBM SP | High-end server/supercomputer | 2000 | 64 kB/32 kB | 8 MB | -
      CRAY MTA (b) | Supercomputer | 2000 | 8 kB | 2 MB | -
      Itanium | PC/server | 2001 | 16 kB/16 kB | 96 KB | 4 MB
      Itanium 2 | PC/server | 2002 | 32 kB | 256 KB | 6 MB
      IBM POWER5 | High-end server | 2003 | 64 kB | 1.9 MB | 36 MB
      CRAY XD-1 | Supercomputer | 2004 | 64 kB/64 kB | 1 MB | -
      IBM POWER6 | PC/server | 2007 | 64 kB/64 kB | 4 MB | 32 MB
      IBM z10 | Mainframe | 2008 | 64 kB/128 kB | 3 MB | 24-48 MB
      Intel Core i7 EE 990 | Workstation/server | 2011 | 6 × 32 kB/32 kB | 1.5 MB | 12 MB
      IBM zEnterprise 196 | Mainframe/server | 2011 | 24 × 64 kB/128 kB | 24 × 1.5 MB | 24 MB L3, 192 MB L4
      (a) Two values separated by a slash refer to instruction and data caches.
      (b) Both caches are instruction only; no data caches.
  • 31. Mapping Function • Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines • Three techniques can be used: Direct •The simplest technique •Maps each block of main memory into only one possible cache line Associative •Permits each main memory block to be loaded into any line of the cache •The cache control logic interprets a memory address simply as a Tag and a Word field •To determine whether a block is in the cache, the cache control logic must simultaneously examine every line’s Tag for a match Set Associative •A compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages
  • 32. Figure 4.8 Mapping From Main Memory to Cache: Direct and Associative: (a) Direct mapping: the first m blocks of main memory (B0 to Bm-1, equal to the size of the cache) map to the m cache lines L0 to Lm-1. (b) Associative mapping: one block of main memory can be placed in any line of cache memory. b = length of block in bits; t = length of tag in bits
  • 33. Figure 4.9 Direct-Mapping Cache Organization: the (s + w)-bit memory address is divided into Tag (s - r bits), Line (r bits), and Word (w bits); the tag stored in the selected line is compared with the address tag, giving a hit in cache on a match and a miss in cache on no match
  • 34. Figure 4.10 Direct Mapping Example: 16-MByte main memory, 16-Kline cache with 32 bits of data per line. Main memory address = Tag (8 bits) + Line (14 bits) + Word (2 bits). For example, address 000101100011001110011100 has tag 16 and line number 0CE7. Note: Memory address values are in binary representation; other values are in hexadecimal
  • 35. Direct Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2^(s+w) words or bytes • Block size = line size = 2^w words or bytes • Number of blocks in main memory = 2^(s+w)/2^w = 2^s • Number of lines in cache = m = 2^r • Size of tag = (s - r) bits
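The field split in the summary can be sketched with shifts and masks. The function name is a hypothetical helper; the widths r = 14 and w = 2 are the ones shown in Figure 4.10 (24-bit address, 8-bit tag).

```python
def split_direct(address, r, w):
    """Split an address into (tag, line, word) for a direct-mapped cache.

    w = number of word bits, r = number of line bits; the remaining
    high-order (s - r) bits form the tag.
    """
    word = address & ((1 << w) - 1)
    line = (address >> w) & ((1 << r) - 1)
    tag = address >> (w + r)
    return tag, line, word


# Address 000101100011001110011100 (binary) from Figure 4.10 is 0x16339C.
tag, line, word = split_direct(0x16339C, r=14, w=2)
print(f"tag={tag:02X} line={line:04X} word={word}")
```

With these widths the example address yields tag 16 and line 0CE7, the same values listed in the figure.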
  • 36. PROBLEMS Q1. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG, SLOT, and WORD (byte) fields. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each field using the direct mapping function.
  • 39. Victim Cache • Originally proposed as an approach to reduce the conflict misses of direct mapped caches without affecting its fast access time • Fully associative cache • Typical size is 4 to 16 cache lines • Residing between direct mapped L1 cache and the next level of memory
  • 40. Figure 4.11 Fully Associative Cache Organization: the (s + w)-bit memory address is divided into Tag (s bits) and Word (w bits); the address tag is compared with the tag of every line L0 to Lm-1, giving a hit in cache on a match and a miss in cache on no match
  • 41. Figure 4.12 Associative Mapping Example: 16-MByte main memory, 16-Kline cache with 32 bits of data per line. Main memory address = Tag (22 bits) + Word (2 bits). For example, address 000101100011001110011100 has tag 058CE7. Note: Memory address values are in binary representation; other values are in hexadecimal
  • 42. Associative Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2^(s+w) words or bytes • Block size = line size = 2^w words or bytes • Number of blocks in main memory = 2^(s+w)/2^w = 2^s • Number of lines in cache = undetermined • Size of tag = s bits
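With associative mapping the address has only a tag and a word field, and a lookup must compare the tag against every line. A minimal sketch under stated assumptions: the function names and the cache contents are illustrative, and the sequential loop stands in for the comparison that hardware performs on all lines at once.

```python
def split_associative(address, w):
    """Split an address into (tag, word); the tag is all s high-order bits."""
    return address >> w, address & ((1 << w) - 1)


def lookup(lines, address, w):
    """Return the addressed unit on a hit, or None on a miss.

    lines is a list of (tag, block) pairs in no particular order.
    """
    tag, word = split_associative(address, w)
    for stored_tag, block in lines:     # hardware checks all tags in parallel
        if stored_tag == tag:
            return block[word]
    return None


# Illustrative contents: one line holding a 4-byte block under tag 058CE7.
lines = [(0x058CE7, bytes.fromhex("FEDCBA98"))]
print(split_associative(0x16339C, w=2))   # tag 0x058CE7, word 0
print(lookup(lines, 0x16339C, w=2))       # first byte of the block
print(lookup(lines, 0x000004, w=2))       # miss
```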
  • 43. PROBLEMS Q2. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG and WORD fields. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each field using the associative mapping function.
  • 45. Set Associative Mapping • Compromise that exhibits the strengths of both the direct and associative approaches while reducing their disadvantages • Cache consists of a number of sets • Each set contains a number of lines • A given block maps to any line in a given set • e.g. 2 lines per set • 2 way associative mapping • A given block can be in one of 2 lines in only one set
  • 46. Figure 4.13 Mapping From Main Memory to Cache: k-way Set Associative: (a) v associative-mapped caches: the first v blocks of main memory (B0 to Bv-1, equal to the number of sets) map to cache sets 0 to v-1, each set holding k lines (L0 to Lk-1). (b) k direct-mapped caches: the first v blocks of main memory map to cache ways 1 to k, each way holding v lines (L0 to Lv-1); one set consists of the corresponding line in each way
  • 47. Set Associative Mapping Summary • Address length = (s + w) bits • Number of addressable units = 2^(s+w) words or bytes • Block size = line size = 2^w words or bytes • Number of blocks in main memory = 2^(s+w)/2^w = 2^s • Number of lines in set = k • Number of sets = v = 2^d • Number of lines in cache = m = kv = k × 2^d • Size of cache = k × 2^(d+w) words or bytes • Size of tag = (s - d) bits
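The relations in the summary can be turned into a small calculator for the field widths. This is a sketch that assumes all sizes are powers of two; the function names are hypothetical, and the sample sizes (16-MByte memory, 16K lines of 4 bytes, giving a 64-KByte cache) are those of the two-way example figure.

```python
def set_assoc_layout(memory_bytes, cache_bytes, block_bytes, k):
    """Return (tag_bits, set_bits, word_bits) for a k-way set-associative cache.

    Assumes every size is a power of two.
    """
    address_bits = (memory_bytes - 1).bit_length()   # s + w
    w = (block_bytes - 1).bit_length()               # word field
    lines = cache_bytes // block_bytes               # m = k * v
    d = (lines // k - 1).bit_length()                # v = 2^d sets
    return address_bits - d - w, d, w


def split_set_assoc(address, d, w):
    """Split an address into (tag, set, word)."""
    word = address & ((1 << w) - 1)
    set_index = (address >> w) & ((1 << d) - 1)
    return address >> (w + d), set_index, word


MB, KB = 2**20, 2**10
print(set_assoc_layout(16 * MB, 64 * KB, 4, k=2))   # two-way
print(set_assoc_layout(16 * MB, 64 * KB, 4, k=4))   # four-way
print(split_set_assoc(0x16339C, d=13, w=2))
```

Doubling the associativity halves the number of sets, so one bit moves from the set field to the tag field.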
  • 48. Figure 4.15 Two-Way Set Associative Mapping Example: 16-MByte main memory, 16-Kline cache with 32 bits of data per line. Main memory address = Tag (9 bits) + Set (13 bits) + Word (2 bits). For example, address 000101100011001110011100 has tag 02C and set number 0CE7. Note: Memory address values are in binary representation; other values are in hexadecimal
  • 49. Figure 4.16 Varying Associativity over Cache Size: hit ratio (0.0 to 1.0) plotted against cache size (1k to 1M bytes) for direct, 2-way, 4-way, 8-way, and 16-way mapping
  • 50. PROBLEMS Q3. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG, SLOT (set), and WORD fields. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each field using the two-way set-associative mapping function.
  • 53. PROBLEMS Q4. The size of the main memory is 16M bytes, the size of the cache memory is 64K bytes, and the main memory block is 4 bytes. Calculate the number of bits required for the TAG, SLOT (set), and WORD fields. If the address generated by the CPU is (4096AF) in hexadecimal, specify the bit pattern of each field using the four-way set-associative mapping function.
  • 56. Replacement Algorithms • Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced • For direct mapping there is only one possible line for any particular block and no choice is possible • For the associative and set-associative techniques a replacement algorithm is needed • To achieve high speed, an algorithm must be implemented in hardware
  • 57. The most common replacement algorithms are: • Least recently used (LRU) • Most effective • Replace that block in the set that has been in the cache longest with no reference to it • Because of its simplicity of implementation, LRU is the most popular replacement algorithm • First-in-first-out (FIFO) • Replace that block in the set that has been in the cache longest • Easily implemented as a round-robin or circular buffer technique • Least frequently used (LFU) • Replace that block in the set that has experienced the fewest references • Could be implemented by associating a counter with each line
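LRU for a single set can be sketched in software as follows. Real caches implement the policy in hardware; the function name, the sample reference string, and the three-line set are illustrative assumptions.

```python
from collections import OrderedDict


def simulate_lru(references, num_lines):
    """Count (hits, misses) for one set managed with LRU replacement."""
    cache = OrderedDict()               # oldest reference first
    hits = 0
    for block in references:
        if block in cache:
            hits += 1
            cache.move_to_end(block)    # mark as most recently used
        else:
            if len(cache) == num_lines:
                cache.popitem(last=False)   # evict least recently used
            cache[block] = True
    return hits, len(references) - hits


print(simulate_lru([1, 2, 3, 1, 4, 1], num_lines=3))
```

In this string the second reference to block 1 protects it from eviction when block 4 arrives, so the final reference to block 1 is a hit.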
  • 58. First-in-first-out (FIFO) Consider the following reference string: 0, 2, 1, 6, 4, 0, 1, 0, 3, 1, 2, 1. Trace the replacements using the FIFO page replacement algorithm.
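A FIFO trace of this reference string can be sketched as below. The slide does not state how many frames are available, so the value of 3 is purely an assumption, as is the function name.

```python
from collections import deque


def simulate_fifo(references, num_frames):
    """Return (hits, misses, final frames) under FIFO replacement."""
    frames = deque()                    # oldest loaded block on the left
    hits = 0
    for block in references:
        if block in frames:
            hits += 1                   # a hit does not change FIFO order
        else:
            if len(frames) == num_frames:
                frames.popleft()        # evict the block loaded earliest
            frames.append(block)
    return hits, len(references) - hits, list(frames)


refs = [0, 2, 1, 6, 4, 0, 1, 0, 3, 1, 2, 1]
print(simulate_fifo(refs, num_frames=3))   # 3 frames is an assumption
```

With 3 frames the trace gives 3 hits and 9 misses; a different frame count changes the result.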
  • 59. Write Policy: When a block that is resident in the cache is to be replaced, there are two cases to consider. If the old block in the cache has not been altered, it may be overwritten with a new block without first writing out the old block. If at least one write operation has been performed on a word in that line of the cache, main memory must be updated by writing the line of cache out to the block of memory before bringing in the new block. There are two problems to contend with. First, more than one device may have access to main memory. Second, a more complex problem occurs when multiple processors are attached to the same bus and each processor has its own local cache: if a word is altered in one cache, it could conceivably invalidate a word in other caches.
  • 60. Write Through and Write Back • Write through • Simplest technique • All write operations are made to main memory as well as to the cache • The main disadvantage of this technique is that it generates substantial memory traffic and may create a bottleneck • Write back • Minimizes memory writes • Updates are made only in the cache • Portions of main memory are invalid and hence accesses by I/O modules can be allowed only through the cache • This makes for complex circuitry and a potential bottleneck
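The difference in memory traffic between the two policies can be illustrated with a rough count of main-memory writes. This sketch rests on simplifying assumptions: the function name is hypothetical, the block size is arbitrary, and under write back each altered line is assumed to be written out exactly once, when it is eventually replaced.

```python
def memory_writes(write_addresses, policy, block_words=4):
    """Count writes to main memory for a sequence of processor writes."""
    if policy == "write-through":
        # Every write goes to main memory as well as to the cache.
        return len(write_addresses)
    if policy == "write-back":
        # Only altered lines are written, once each (assumption: no line
        # is replaced and re-altered during the sequence).
        dirty_blocks = {addr // block_words for addr in write_addresses}
        return len(dirty_blocks)
    raise ValueError(f"unknown policy: {policy}")


writes = [0, 1, 2, 3, 0, 1, 16]         # illustrative write addresses
print(memory_writes(writes, "write-through"))
print(memory_writes(writes, "write-back"))
```

Repeated writes to the same block are where write back saves traffic; a sequence that touches each block only once gains nothing.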
  • 61. Line Size: When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved. As the block size increases, the hit ratio will at first increase because of the principle of locality: more useful data are brought into the cache. The hit ratio will begin to decrease as the block becomes bigger and the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced. Two specific effects come into play: • Larger blocks reduce the number of blocks that fit into a cache • As a block becomes larger, each additional word is farther from the requested word
  • 62. Multilevel Caches • As logic density has increased it has become possible to have a cache on the same chip as the processor • The on-chip cache reduces the processor’s external bus activity and speeds up execution time and increases overall system performance • When the requested instruction or data is found in the on-chip cache, the bus access is eliminated • On-chip cache accesses will complete appreciably faster than would even zero-wait state bus cycles • During this period the bus is free to support other transfers • Two-level cache: • Internal cache designated as level 1 (L1) • External cache designated as level 2 (L2) • Potential savings due to the use of an L2 cache depends on the hit rates in both the L1 and L2 caches • The use of multilevel caches complicates all of the design issues related to caches, including size, replacement algorithm, and write policy
  • 63. Figure 4.17 Total Hit Ratio (L1 and L2) for 8 Kbyte and 16 Kbyte L1: hit ratio (0.78 to 0.98) plotted against L2 cache size (1k to 2M bytes), with one curve for L1 = 8k and one for L1 = 16k
  • 64. Unified Versus Split Caches • Has become common to split cache: • One dedicated to instructions • One dedicated to data • Both exist at the same level, typically as two L1 caches • Advantages of unified cache: • Higher hit rate • Balances load of instruction and data fetches automatically • Only one cache needs to be designed and implemented • Trend is toward split caches at the L1 and unified caches for higher levels • Advantages of split cache: • Eliminates cache contention between instruction fetch/decode unit and execution unit • Important in pipelining
  • 65. Table 4.4 Intel Cache Evolution
      Problem | Solution | Processor on which Feature First Appears
      External memory slower than the system bus. | Add external cache using faster memory technology. | 386
      Increased processor speed results in external bus becoming a bottleneck for cache access. | Move external cache on-chip, operating at the same speed as the processor. | 486
      Internal cache is rather small, due to limited space on chip. | Add external L2 cache using faster technology than main memory. | 486
      Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place. | Create separate data and instruction caches. | Pentium
      Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. | Pentium Pro
      Increased processor speed results in external bus becoming a bottleneck for L2 cache access. | Move L2 cache on to the processor chip. | Pentium II
      Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Add external L3 cache. | Pentium III
      Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. | Move L3 cache on-chip. | Pentium 4
  • 66. Figure 4.18 Pentium 4 Block Diagram: system bus; L3 cache (1 MB); L2 cache (512 KB); L1 instruction cache (12K µops); instruction fetch/decode unit; out-of-order execution logic; integer register file; FP register file; load address unit; store address unit; two simple integer ALUs; complex integer ALU; FP/MMX unit; FP move unit; L1 data cache (16 KB); 64-bit and 256-bit data paths
  • 67. Table 4.5 Pentium 4 Cache Operating Modes
      CD | NW | Cache Fills | Write Throughs | Invalidates
      0 | 0 | Enabled | Enabled | Enabled
      1 | 0 | Disabled | Enabled | Enabled
      1 | 1 | Disabled | Disabled | Disabled
      Note: CD and NW are control bits; CD = 0, NW = 1 is an invalid combination.
  • 68. Summary, Chapter 4: Cache Memory • Computer memory system overview • Characteristics of Memory Systems • Memory Hierarchy • Cache memory principles • Elements of cache design • Cache addresses • Cache size • Mapping function • Replacement algorithms • Write policy • Line size • Number of caches • Pentium 4 cache organization