Chapter 6: Memory
• CPU accesses memory at least once per fetch-execute cycle:
– Instruction fetch
– Possible operand reads
– Possible operand write
• RAM is much slower than the CPU, so we need a compromise:
– Cache
• We will explore memory here
– RAM, ROM, Cache, Virtual Memory
• Memory is organized into a hierarchy
– Memory near the top of the hierarchy is faster,
but also more expensive, so we have less of it
in the computer – this presents a challenge
• how do we make use of faster memory without
having to go down the hierarchy to slower
memory?
Types of Memory
• Cache
– SRAM (static RAM) made up of flip-flops
(like Registers)
– Slower than registers because of added
circuits to find the proper cache location,
but much faster than RAM
• DRAM is 10-100 times slower than SRAM
• ROM
– Read-only memory – contents of memory
are fused into place
– Variations:
• PROM – programmable (comes blank and
the user can program it once)
• EPROM – erasable PROM, where the
contents of all of PROM can be erased by
using ultraviolet light
• EEPROM – electrically erasable PROM; electrical fields can alter parts
of the contents, so it is selectively erasable. A newer variation, flash
memory, provides greater speed
• RAM
– stands for random access memory because you access into memory by
supplying the address
• it should be called read-write memory (Cache and ROMs are also random
access memories)
– Actually known as DRAM (dynamic RAM) and is built out of capacitors
• Capacitors lose their charge, so must be recharged often (every couple
of milliseconds) and have destructive reads, so must be recharged after
a read
Memory Hierarchy Terms
• The goal of the memory hierarchy is to keep the contents
that are needed now at or near the top of the hierarchy
– We discuss the performance of the memory hierarchy using the
following terms:
• Hit – when the datum being accessed is found at the current level
• Miss – when the datum being accessed is not found and the next level of
the hierarchy must be examined
• Hit rate – how many hits out of all memory accesses
• Miss rate – how many misses out of all memory accesses
– NOTE: hit rate = 1 – miss rate, miss rate = 1 – hit rate
• Hit time – time to access this level of the hierarchy
• Miss penalty – time to access the next level
Effective Access Time Formula
• We want to determine the impact that the memory
hierarchy has on the CPU
– In a pipeline machine, we expect 1 instruction to leave the
pipeline each cycle
• the system clock is usually set to the speed of cache
• but a memory access to DRAM takes more time, so this impacts the
CPU’s performance
– On average, we want to know how long a memory access takes
(whether it is cache, DRAM or elsewhere)
• effective access time = hit time + miss rate * miss penalty
– that is, our memory access, on average, is the time it takes to access the
cache, plus for a miss, how much time it takes to access memory
– With a 2-level cache, we can expand our formula:
• average memory access time = hit time0 + miss rate0 * (hit time1 + miss
rate1 * miss penalty1 )
– We can expand the formula more to include access to swap space (hard
disk); a short sketch of this calculation follows below
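To make the formula concrete, here is a minimal Python sketch of both the one-level and two-level versions; the sample numbers plugged in at the bottom are invented for illustration, not taken from the slides.

```python
def effective_access_time(hit_time, miss_rate, miss_penalty):
    """One-level version: EAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

def two_level_eat(hit0, miss0, hit1, miss1, penalty1):
    """Two-level cache: EAT = hit0 + miss0 * (hit1 + miss1 * penalty1)."""
    return hit0 + miss0 * (hit1 + miss1 * penalty1)

# Hypothetical numbers: 5 ns cache hit, 5% miss rate, 60 ns DRAM access.
print(effective_access_time(5, 0.05, 60))     # 8.0 ns
# Hypothetical two-level numbers: 5 ns / 5% L1, 10 ns / 4% L2, 60 ns DRAM.
print(two_level_eat(5, 0.05, 10, 0.04, 60))   # 5 + 0.05*(10 + 2.4) = 5.62 ns
```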
Locality of Reference
• The better the hit rate for level 0, the better off we are
– Similarly, if we use 2 caches, we want the hit rate of level 1 to
be as high as possible
– We want to implement the memory hierarchy to follow
Locality of Reference
• accesses to memory will generally be near recent memory accesses
and those in the near future will be around this current access
– Three forms of locality:
• Temporal locality – recently accessed items tend to be accessed again
in the near future (local variables, instructions inside a loop)
• Spatial locality – accesses tend to be clustered (accessing a[i] will
probably be followed by a[i+1] in the near future)
• Sequential locality – instructions tend to be accessed sequentially
– How do we support locality of reference?
• If we bring something into cache, bring in neighbors as well
• Keep an item in the cache for a while as we hope to keep using it (a short
sketch contrasting good and poor locality follows below)
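To make locality concrete, here is a small illustrative sketch (the matrix size and function names are invented). In a language with contiguous 2-D arrays such as C the timing difference is dramatic; Python's list-of-lists layout blunts the effect, but the access patterns are the point: the row-by-row loop exhibits spatial and sequential locality, while the column-by-column loop defeats it.

```python
N = 1024
matrix = [[0] * N for _ in range(N)]

def row_major_sum(m):
    """Good spatial locality: consecutive elements of a row are neighbours,
    so a cache line fetched for m[i][j] also serves m[i][j+1]."""
    total = 0
    for i in range(N):
        for j in range(N):
            total += m[i][j]
    return total

def column_major_sum(m):
    """Poor spatial locality: successive accesses jump across rows,
    so neighbouring accesses are far apart and cache lines are wasted."""
    total = 0
    for j in range(N):
        for i in range(N):
            total += m[i][j]
    return total
```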
Cache
• Cache is fast memory
– Used to store instructions and data
• It is hoped that what is needed will be in cache and what isn’t needed
will be moved out of cache back to memory
• Issues:
– What size cache? How many caches?
– How do you access what you need?
• since cache only stores part of what is in memory, we need a
mechanism to map from the memory address to the location in cache
• this is known as the cache’s mapping function
– If you have to bring in something new, what do you discard?
• this is known as the replacement strategy
– What happens if you write a new value to cache?
• we must update the now obsolete value(s) in memory
Cache and Memory Organization
• Group memory locations into lines (or refill lines)
– For instance, 1 line might store 16 bytes or 4 words
• The line size varies architecture-to-architecture
– All main memory addresses are broken into two parts
• the line #
• the location in the line
– If we have 256 Megabytes, word-addressed, with a word size of 4 bytes and 4
words per line, we would have 16,777,216 lines, so our 26-bit address has 24
bits for the line number and 2 bits for the word in the line
– The cache has the same organization but there are far fewer line
numbers (say 1024 lines of 4 words each)
• So the remainder of the address becomes the tag
– The tag is used to make sure that the line we want is the line we found
The valid bit indicates whether the given line currently holds valid data;
a separate dirty (modified) bit records whether the line has been written,
so that memory can be updated before the line is replaced (a short
address-splitting sketch follows below)
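A minimal sketch of this address split, using the slide's numbers (a 26-bit word address with a 24-bit line number and a 2-bit word-in-line field); the sample address is arbitrary.

```python
WORD_BITS = 2        # 4 words per line -> 2 bits for the word within the line
LINE_BITS = 24       # 2^26 words / 4 words per line = 2^24 lines

def split_address(addr):
    """Split a 26-bit word address into (line number, word within line)."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = addr >> WORD_BITS
    return line, word

# Example with an arbitrary 26-bit word address.
line, word = split_address(0x2ABCDE1)
print(f"line = {line}, word = {word}")
```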
Types of Cache
• The mapping function is based on the type of cache
– Direct-mapped – each entry in memory has 1 specific place
where it can be placed in cache
• this is a cheap, fast, and easy cache to implement, but because each line has
only one possible location (so no replacement strategy is needed), it has the
poorest hit rate
– Associative – any memory item can be placed in any cache line
• this cache uses associative memory so that an entry is searched for in
parallel – this is expensive and tends to be slower than a direct-mapped
cache, however, because we are free to place an entry anywhere, we can
use a replacement strategy and thus get the best hit rate
– Set-associative – a compromise between these two extremes
• by grouping lines into sets so that a line is mapped into a given set, but
within that set, the line can go anywhere
• a replacement strategy is used to determine which line within a set should
be used, so this cache improves on the hit rate of the direct-mapped cache
• while not being as expensive or as slow as the associative cache
Direct Mapped Cache
• Assume m refill lines
– A line j in memory will be found in cache at location j mod m
• Since each line has 1 and only 1 location in cache, there is no need for a
replacement strategy
– This yields poor hit rate but fast performance (and cheap)
– All addresses are broken into 3 parts
• a line number (to determine the line in cache)
• a word number
• the rest is the tag – compare the tag to make sure you have the right line
• Example: assume 24-bit addresses; if the cache has 16,384 lines, each storing
4 words, the address fields are: Tag (s-r) = 8 bits, Line or Slot (r) = 14 bits,
Word (w) = 2 bits (a lookup sketch follows below)
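A minimal Python sketch of the lookup just described, using the 8/14/2-bit field split from the example above; the cache representation (a list of valid/tag/data entries) is an assumption made for illustration, not how the hardware is built.

```python
LINE_BITS, WORD_BITS = 14, 2
NUM_LINES = 1 << LINE_BITS            # 16,384 lines

# Each entry holds (valid, tag, line data of 4 words); all start invalid.
cache = [(False, None, None)] * NUM_LINES

def lookup(addr):
    """Return the cached word on a hit, or None on a miss."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # memory line j -> j mod NUM_LINES
    tag  = addr >> (WORD_BITS + LINE_BITS)
    valid, stored_tag, data = cache[line]
    if valid and stored_tag == tag:
        return data[word]             # hit
    return None                       # miss: fetch the line from the next level
```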
Associative Cache
• Any line in memory can be placed in any line in cache
– No line number portion of the address, just a tag and a word within the
line
– Because the tag is longer, more tag storage space is needed in the cache,
so these caches need more space and so are more costly
• All tags are searched simultaneously using “associative memory”
to find the tag requested
– This is both more expensive and slower than direct-mapped caches but,
because there are choices of where to place a new line, associative caches
require a replacement strategy which might require additional hardware to
implement
From our previous example, the address now looks like this: Tag = 22 bits,
Word = 2 bits. Notice how big the tag is – the cache now requires more space
just to store the tags!
Set Associative Cache
• In order to provide some degree of variability in placement, we need more
than a direct-mapped cache
– A 2-way set associative cache provides 2 refill lines for each line number
• Instead of n refill lines, there are now n / 2 sets, each set storing 2
refill lines
– We can think of this as having 2 direct-mapped caches of half the size
• Because there are ½ as many indexed positions (sets), the line number field
has 1 fewer bit and the tag has 1 more
• We can expand this to:
– 4-way set associative
– 8-way set associative
– 16-way set associative, etc.
• As the number of ways increases, the hit rate improves, but the expense also
increases and the hit time gets worse
• Eventually we reach an n-way cache, which is a fully associative cache
• For the same 24-bit address and 16,384-line cache organized 2-way, the
address fields are: Tag (s-r) = 9 bits, Line or Slot (r) = 13 bits,
Word (w) = 2 bits (a set-lookup sketch follows below)
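A similar sketch for the 2-way set-associative case, using the 9/13/2-bit split above; again the data structures are invented for illustration, and the loop over the two ways stands in for the parallel comparison done in hardware.

```python
SET_BITS, WORD_BITS, WAYS = 13, 2, 2
NUM_SETS = 1 << SET_BITS              # 8,192 sets of 2 lines each

# Each set holds WAYS entries of (valid, tag, line data); all start invalid.
sets = [[(False, None, None)] * WAYS for _ in range(NUM_SETS)]

def lookup(addr):
    word   = addr & ((1 << WORD_BITS) - 1)
    set_ix = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)  # memory line j -> set j mod NUM_SETS
    tag    = addr >> (WORD_BITS + SET_BITS)
    for valid, stored_tag, data in sets[set_ix]:           # search both ways of the set
        if valid and stored_tag == tag:
            return data[word]         # hit
    return None                       # miss: a replacement strategy picks which way to evict
```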
Replacement And Write Strategies
• When we need to bring in a new line from memory, we will have to throw out
a line
– Which one?
• No choice in a direct-mapped cache
• For associative and set-associative, we have choices
– We rely on a replacement strategy to make the best choice
• this should promote locality of reference
– 3 replacement strategies are
• Least recently used (hard to implement – how do we determine which line
was least recently used? see the sketch below)
• First-in, first-out (easy to implement, but not very good results)
• Random
• If we are to write a datum to cache, what about writing it to memory?
– Write-through – write to both cache and memory at the same time
• if we write to several data in the same line though, this becomes inefficient
– Write-back – wait until the refill line is being discarded and write back any
changed values to memory at that time
• this leaves stale (dirty) values in memory until the write-back occurs
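Here is a minimal sketch of least-recently-used replacement for a single set (or a fully associative cache), built on Python's OrderedDict; it only illustrates the bookkeeping LRU requires, not how hardware implements it. The fetch_line callback is a placeholder for going to the next level of the hierarchy.

```python
from collections import OrderedDict

class LRUSet:
    """Tracks which of a small number of cached lines was least recently used."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()          # tag -> line data, oldest first

    def access(self, tag, fetch_line):
        if tag in self.lines:               # hit: mark this line most recently used
            self.lines.move_to_end(tag)
            return self.lines[tag]
        if len(self.lines) >= self.ways:    # miss with a full set: evict the LRU line
            self.lines.popitem(last=False)
        data = fetch_line(tag)              # miss penalty: fetch from the next level
        self.lines[tag] = data
        return data
```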
Virtual Memory
• Just as DRAM acts as a backup for cache, hard disk
(known as the swap space) acts as a backup for DRAM
• This is known as virtual memory
– Virtual memory is necessary because most programs are too
large to store entirely in memory
• Also, there are parts of a program that are not used very often, so
why waste the time loading those parts into memory if they won’t be
used?
– Page – a fixed sized unit of memory – all programs and data
are broken into pages
– Paging – the process of bringing in a page when it is needed
(this might require throwing a page out of memory, moving
it back to the swap disk)
• The operating system is in charge of Virtual Memory for us
– it moves needed pages into memory from disk and keeps track of where
a specific page is placed
The Paging Process
• When the CPU generates a memory address, it is a
logical (or virtual) address
– The first address of a program is 0, so the logical address is
merely an offset into the program or into the data segment
• For instance, address 25 is located 25 from the beginning of the program
• But 25 is not the physical address in memory, so the logical address must
be translated (or mapped) into a physical address
– Assume memory is broken into fixed size units known as
frames (1 page fits into 1 frame)
• We know the logical address as its page # and the offset into the page
– We have to translate the page # into the frame # (that is, where is that
particular page currently stored in memory – or is it even in memory?)
• Thus, the mapping process for paging means finding the frame # and
replacing the page # with it (a short translation sketch follows below)
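A minimal sketch of this translation step. The 10-bit page offset is an assumption chosen for illustration; the page-to-frame mapping matches the Example of Paging that follows.

```python
PAGE_BITS = 10                           # assume 1K-word pages for illustration

# page -> frame for resident pages; pages 0, 3, 4, 7 sit in frames 2, 0, 1, 3.
page_table = {0: 2, 3: 0, 4: 1, 7: 3}

def translate(logical_addr):
    """Map a logical address to a physical address via the page table."""
    page   = logical_addr >> PAGE_BITS
    offset = logical_addr & ((1 << PAGE_BITS) - 1)
    frame  = page_table.get(page)
    if frame is None:
        raise LookupError("page fault")  # the OS must bring the page in from disk
    return (frame << PAGE_BITS) | offset
```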
Example of Paging
Here, we have a process of 8 pages but only 4 physical frames in
memory – therefore we must place a page into one of the available
frames in memory whenever a page is needed
At this point in time, pages 0, 3, 4 and 7 have been moved into
memory at frames 2, 0, 1 and 3 respectively
This information (of which page is stored in which frame) is stored
in memory in a location known as the Page Table. The page table also
stores whether the given page is currently resident (a valid bit) and
whether it has been modified (a dirty bit) – much like our cache
A More Complete Example
The figure shows a virtual address mapped to a physical address: the logical
and physical memory for our program, and the page table. Address 1010 is
page 101, item 0. Page 101 (5) is located in frame 11 (3), so item 1010 is
found at physical address 110.
Page Faults
• Just as cache is limited in size, so is main memory – a
process is usually given a limited number of frames
• What if a referenced page is not currently in memory?
– The memory reference causes a page fault
• The page fault requires that the OS handle the problem
– The process’ status is saved and the CPU switches to the OS
– The OS determines if there is an empty frame for the referenced
page, if not, then the OS uses a replacement strategy to select a
page to discard
• if that page is dirty, then the page must be written to disk instead of
discarded
– The OS locates the requested page on disk and loads it into the
appropriate frame in memory
– The page table is modified to reflect the change
• Page faults are time consuming because of the disk access – this causes
our effective memory access time to deteriorate badly! (a sketch of the
fault-handling steps follows below)
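The fault-handling steps above can be summarised in a rough sketch; the page-table layout, free-frame list, replacement callback, and disk routines are all placeholders, not a real OS interface.

```python
def handle_page_fault(page, page_table, free_frames, choose_victim,
                      read_from_disk, write_to_disk):
    """Bring `page` into memory, evicting a victim page if no frame is free."""
    # 1. Find a frame: use a free one if available, otherwise evict a victim.
    if free_frames:
        frame = free_frames.pop()
    else:
        victim = choose_victim(page_table)          # replacement strategy
        frame = page_table[victim]["frame"]
        if page_table[victim]["dirty"]:             # dirty pages must be written back
            write_to_disk(victim, frame)
        del page_table[victim]
    # 2. Load the requested page from disk and update the page table.
    read_from_disk(page, frame)
    page_table[page] = {"frame": frame, "dirty": False}
    return frame
```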
Another Paging Example
Here, we have 13 bits for our addresses even though main memory is only 4K = 2^12
The Full Paging Process
We want to avoid memory accesses (we prefer cache accesses) – but if every
memory access now requires first accessing the page table, which is in memory,
it slows down our computer.
So we move the most used portion of the page table into a special cache known
as the Translation Lookaside Buffer (sometimes called the Table Lookaside
Buffer), abbreviated as the TLB (a lookup sketch follows below).
The process is also shown in the next slide as a flowchart.
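A sketch of the TLB-then-page-table lookup order described above; the TLB here is just a small dictionary standing in for the associative hardware, and the 10-bit offset is the same illustrative assumption as before.

```python
tlb = {}                                     # page -> frame, a small fast subset of the page table

def translate_with_tlb(page, offset, page_table, page_bits=10):
    if page in tlb:                          # TLB hit: no memory access for the translation
        frame = tlb[page]
    else:                                    # TLB miss: consult the page table in memory
        frame = page_table.get(page)
        if frame is None:
            raise LookupError("page fault")  # not resident: the OS loads it from disk
        tlb[page] = frame                    # cache the translation for next time
    return (frame << page_bits) | offset
```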
A Variation: Segmentation
• One flaw of paging is that, because a page is fixed in size, a
chunk of code might be divided into two or more pages
– So page faults can occur any time
• Consider, as an example, a loop which crosses 2 pages
• If the OS must remove one of the two pages to load the other, then the
OS generates 2 page faults for each loop iteration!
• A variation of paging is segmentation
– instead of fixed-size blocks, programs are divided into variable-sized
procedural units, each sized to fit the code or data it holds
• We subdivide programs into procedures
• We subdivide data into structures (e.g., arrays, structs)
– We still use the “on-demand” approach of virtual memory, but when a
block of code is loaded into memory, the entire needed block is loaded in
• Segmentation uses a segment table instead of a page table and works similarly
although addresses are put together differently
• But segmentation causes fragmentation – when a segment is discarded from
memory for a new segment, there may be a chunk of memory that goes unused
• One solution to fragmentation is to use paging with segmentation
Effective Access With Paging
• We modify our previous formula to include the impact of
paging:
– effective access time = hit time0 + miss rate0 * (hit time1 + miss
rate1 * (hit time2 + miss rate2 * miss penalty2))
• Level 0 is on-chip cache
• Level 1 is off-chip cache
• Level 2 is main memory
• Level 3 is disk (miss penalty2 is disk access time, which is lengthy)
– Example:
• On-chip cache hit rate is 90%, hit time is 5 ns; off-chip cache hit rate is
96%, hit time is 10 ns; main memory hit rate is 99.8%, hit time is 60 ns;
memory miss penalty is 10,000 ns
– memory miss penalty is the same as the disk hit time, or disk access time
• Access time = 5 ns + .10 * (10 ns + .04 * (60 ns + .002 * 10,000 ns)) =
6.32 ns (a short check of this arithmetic follows below)
– So our memory hierarchy adds over 20% to our memory access time
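The arithmetic in this example can be checked with a few lines of Python:

```python
# Numbers from the example: on-chip cache, off-chip cache, main memory, disk.
hit_time  = [5, 10, 60]          # ns at each level
miss_rate = [0.10, 0.04, 0.002]  # 1 - hit rate at each level
disk_penalty = 10_000            # ns (the disk access time in this example)

eat = hit_time[0] + miss_rate[0] * (hit_time[1] + miss_rate[1] *
      (hit_time[2] + miss_rate[2] * disk_penalty))
print(eat)                       # 6.32 ns, about 26% above the 5 ns cache hit time
```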
Memory Organization
Here we see a typical memory layout:
Two on-chip caches, one for data and one for instructions, with part of each
cache reserved for a TLB
One off-chip cache to back up both on-chip caches
Main memory, backed up by virtual memory
