Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
Eng. Mshari Alabdulkarim
Outline:
Introduction.
Baseline Design.
Reducing Conflict Misses.
Reducing Capacity and Compulsory Misses.

Introduction:
Goal: improve the performance of caches.
Why is it important to enhance the performance of the cache? Because it has a dramatic effect on the performance of advanced processors.
The miss cost has been increasing over the last decade because:
Cycle times have been decreasing much faster than main memory access times.
The average number of machine cycles per instruction has also been decreasing.

VAX was an instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC) in the mid-1970s.
The VAX-11/780, the first model in the VAX line, was a complex instruction set computer (CISC) machine introduced on October 25, 1977.
It served for years as a baseline in CPU benchmarks because its speed was about one MIPS.

(figure: the increasing cost of cache misses)
Baseline Design:
Chip specifications:
The cycle time of this chip is 3 to 8 times longer than the instruction issue interval (i.e., 3 to 8 instructions issue per cycle).
The first- and second-level caches are assumed to be direct-mapped, which gives the fastest effective access time.
The data cache can be either write-through or write-back.
How is such a high issue rate obtained?
1. A very fast on-chip clock.
2. Issuing many instructions per cycle.
3. Using higher-speed technology for the processor chip.
System parameters:
Instruction issue rate = 1000 MIPS (for example, at 4 instructions per cycle this corresponds to a 250 MHz clock).
There are two separate first-level caches: an instruction cache and a data cache.
Each first-level cache is 4KB with 16B lines.
The second-level cache is 1MB with 128B lines.
The miss penalties are assumed to be 24 instruction times for the first level and 320 instruction times for the second level.
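For concreteness, the sketch below shows how these first-level parameters decompose an address under standard direct-mapped indexing. This is an illustration, not something from the slides, and the 32-bit address width is an assumption.

# Standard direct-mapped address decomposition for the baseline L1:
# 4KB / 16B lines = 256 lines, so 4 offset bits, 8 index bits, and
# (assuming 32-bit addresses, which the slides do not specify) 20 tag bits.
CACHE_BYTES, LINE_BYTES, ADDR_BITS = 4096, 16, 32

OFFSET_BITS = LINE_BYTES.bit_length() - 1                  # 4
INDEX_BITS = (CACHE_BYTES // LINE_BYTES).bit_length() - 1  # 8
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS            # 20

def split(addr):
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(TAG_BITS, INDEX_BITS, OFFSET_BITS)    # 20 8 4
print([hex(f) for f in split(0x12345678)])  # ['0x12345', '0x67', '0x8']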
(figure: test program characteristics)
(figure: baseline system first-level cache miss rates)
(figure: performance lost in the memory hierarchy, baseline design performance, and net performance of the system)
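The measured numbers are in the figures above; as a rough illustration of how the two penalties turn miss rates into lost performance (the miss rates below are invented for the example, not the slides' data):

# Hypothetical cost model matching the stated penalties: each instruction
# costs 1 instruction time, plus 24 per first-level miss and a further 320
# per second-level miss. The example miss rates are made up.
ISSUE_MIPS, L1_PENALTY, L2_PENALTY = 1000, 24, 320

def net_mips(m1, m2):
    # m1: first-level misses per instruction
    # m2: second-level misses per instruction (global second-level rate)
    time_per_inst = 1 + m1 * L1_PENALTY + m2 * L2_PENALTY
    return ISSUE_MIPS / time_per_inst

# 5% of instructions missing in L1, 1% also missing in L2:
print(round(net_mips(0.05, 0.01)))  # -> 185 of the nominal 1000 MIPS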
Reducing Conflict Misses:
Cache misses fall into four categories: conflict, compulsory, capacity, and coherence.
Conflict misses: misses that would not occur if the cache were fully associative with LRU replacement.
Compulsory misses: misses required in any cache organization, because they are the first references to an instruction or piece of data.
Capacity misses: occur when the cache size is not sufficient to hold the data between references.
Coherence misses: occur as a result of invalidations to preserve multiprocessor cache consistency.
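A minimal simulation sketch of this taxonomy (my illustration, not the paper's tooling): a miss in the direct-mapped cache that would have hit in a fully-associative LRU cache of equal capacity is a conflict miss. Coherence misses are not modeled here, since the setting is a uniprocessor.

from collections import OrderedDict

LINE = 16
NLINES = 4096 // LINE        # 256 lines in the baseline 4KB cache

def classify(addresses):
    direct = {}              # index -> tag (direct-mapped contents)
    full = OrderedDict()     # shadow fully-associative LRU cache
    seen = set()             # line addresses referenced so far
    stats = {"hit": 0, "compulsory": 0, "conflict": 0, "capacity": 0}
    for addr in addresses:
        line = addr // LINE
        index, tag = line % NLINES, line // NLINES
        dm_hit = direct.get(index) == tag
        fa_hit = line in full
        if fa_hit:
            full.move_to_end(line)        # LRU touch
        else:
            full[line] = None
            if len(full) > NLINES:
                full.popitem(last=False)  # evict least recently used
        if dm_hit:
            stats["hit"] += 1
        elif line not in seen:
            stats["compulsory"] += 1      # first reference ever
        elif fa_hit:
            stats["conflict"] += 1        # only the mapping caused it
        else:
            stats["capacity"] += 1        # too much live data
        direct[index] = tag
        seen.add(line)
    return stats

# Two passes over two arrays whose lines collide on the same indices:
trace = [base + i * LINE for _ in range(2) for i in range(64) for base in (0, 4096)]
print(classify(trace))  # {'hit': 0, 'compulsory': 128, 'conflict': 128, 'capacity': 0}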
Conflict misses can be reduced using either miss caching or victim caching.
(figure: percentage of conflict misses, 4KB instruction and data caches, 16B lines)

Miss Caching:
We can add associativity to a direct-mapped cache by placing a small miss cache on-chip between the first-level cache and the access port to the second-level cache.
A miss cache is a small fully-associative cache containing on the order of two to five cache lines of data.
Misses in the cache that hit in the miss cache have only a one-cycle miss penalty, as opposed to a many-cycle miss penalty without the miss cache.
(figure: miss cache organization)
How does a miss cache work?
Each time the upper cache is probed, the miss cache is probed as well.
When a miss occurs, data is returned to both the direct-mapped cache and the miss cache under it, where it replaces the least recently used entry.
If a miss occurs in the upper cache but the address hits in the miss cache, the direct-mapped cache can be reloaded in the next cycle from the miss cache.
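A minimal sketch of that behavior, assuming a 4-entry miss cache in front of the baseline 4KB/16B direct-mapped cache (the class and its names are mine, not the paper's):

from collections import OrderedDict

LINE, NLINES, MISS_CACHE_ENTRIES = 16, 256, 4

class MissCachedDM:
    def __init__(self):
        self.dm = {}                     # index -> tag
        self.mc = OrderedDict()          # line address -> None, LRU order
    def access(self, addr):
        line = addr // LINE
        index, tag = line % NLINES, line // NLINES
        if self.dm.get(index) == tag:
            return "hit"
        self.dm[index] = tag             # refill the direct-mapped line
        if line in self.mc:              # one-cycle refill from miss cache
            self.mc.move_to_end(line)
            return "miss-cache hit"
        self.mc[line] = None             # fetched line goes in BOTH caches
        if len(self.mc) > MISS_CACHE_ENTRIES:
            self.mc.popitem(last=False)  # evict LRU miss cache entry
        return "full miss"

c = MissCachedDM()
for a in [0, 4096, 0, 4096]:             # two lines that conflict on index 0
    print(c.access(a))
# -> full miss, full miss, miss-cache hit, miss-cache hit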
(figure: conflict misses removed by miss caching)

Victim Caching:
How does it work?
When a miss occurs, the fully-associative victim cache is loaded with the victim line evicted from the direct-mapped cache.
On a miss in the direct-mapped cache that hits in the victim cache, the contents of the direct-mapped cache line and the matching victim cache line are swapped.
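The same sketch adapted to victim caching. The behavioral difference from the miss cache is that the buffer holds only lines evicted from the direct-mapped cache, so a line is never duplicated between the two structures:

from collections import OrderedDict

LINE, NLINES, VICTIM_ENTRIES = 16, 256, 4

class VictimCachedDM:
    def __init__(self):
        self.dm = {}                      # index -> tag
        self.vc = OrderedDict()           # victim line address -> None (LRU)
    def access(self, addr):
        line = addr // LINE
        index, tag = line % NLINES, line // NLINES
        if self.dm.get(index) == tag:
            return "hit"
        victim_tag = self.dm.get(index)   # line being evicted, if any
        self.dm[index] = tag
        result = "full miss"
        if line in self.vc:               # swap: the promoted line leaves
            del self.vc[line]             # the victim cache...
            result = "victim-cache hit"
        if victim_tag is not None:        # ...and the evicted line enters it
            self.vc[victim_tag * NLINES + index] = None
            if len(self.vc) > VICTIM_ENTRIES:
                self.vc.popitem(last=False)
        return result

c = VictimCachedDM()
for a in [0, 4096, 0, 4096]:              # same conflicting pair as before
    print(c.access(a))
# -> full miss, full miss, victim-cache hit, victim-cache hit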
(figure: victim cache organization)
(figure: conflict misses removed by victim caching)
(figure: victim cache performance with varying direct-mapped data cache size)
(figure: victim cache performance with varying data cache line size, 4KB direct-mapped cache)

Reducing Capacity and Compulsory Misses:
Recall that compulsory misses are required in any cache organization because they are the first references to a piece of data, and capacity misses occur when the cache is not large enough to hold the data between references.
One way of reducing the number of capacity and compulsory misses is to prefetch, either implicitly with longer cache lines or with an explicit prefetch algorithm.
Prefetch algorithms:
1. Prefetch always: prefetch after every reference.
2. Prefetch on miss: on a miss, always fetch the next line as well.
3. Tagged prefetch: when a block is prefetched, its tag bit is set to zero; each time a block is used, its tag bit is set to one. When a block undergoes a zero-to-one transition, its successor block is prefetched.
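A minimal sketch of the tagged scheme (my illustration, not the paper's hardware; eviction and timing are ignored):

LINE = 16

class TaggedPrefetchCache:
    def __init__(self):
        self.blocks = {}                 # line address -> tag bit (0 or 1)
        self.fetched = []                # demand + prefetch traffic log
    def _prefetch(self, line):
        if line not in self.blocks:
            self.blocks[line] = 0        # arrives with tag bit = 0
            self.fetched.append(("prefetch", line))
    def access(self, addr):
        line = addr // LINE
        if line not in self.blocks:      # demand miss
            self.blocks[line] = 0
            self.fetched.append(("demand", line))
        if self.blocks[line] == 0:       # zero-to-one transition:
            self.blocks[line] = 1        # first use of this block,
            self._prefetch(line + 1)     # so prefetch its successor

c = TaggedPrefetchCache()
for a in range(0, 64, 4):                # a sequential scan
    c.access(a)
print(c.fetched)
# one demand fetch for line 0, then the prefetches stay a line ahead:
# [('demand', 0), ('prefetch', 1), ('prefetch', 2), ('prefetch', 3), ('prefetch', 4)]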

(figure: limited time for prefetch)
Stream Buffers:
Goal: start the prefetch before a tag transition can take place.
How does a stream buffer work?
When a miss occurs, the stream buffer begins prefetching successive lines starting at the miss target.
As each prefetch request is sent out, the tag for the address is entered into the stream buffer and the available bit is set to false.
When the prefetched data returns, it is placed in the entry with its tag and the available bit is set to true.
If a reference misses in the cache but hits in the buffer, the cache can be reloaded in a single cycle from the stream buffer.
When a line is moved from the stream buffer to the cache, the entries in the stream buffer shift up by one and a new successive address is fetched.
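A minimal sketch of a 4-entry sequential stream buffer under simplified timing: prefetched data is assumed to have returned before it is needed, so the available bit is always true here. The class is an illustration, not the paper's design:

from collections import deque

ENTRIES = 4

class StreamBuffer:
    def __init__(self):
        self.entries = deque()                 # (line address, available bit)
    def _restart(self, line):
        self.entries.clear()
        for i in range(1, ENTRIES + 1):        # launch successive prefetches;
            self.entries.append((line + i, True))  # data assumed returned
    def lookup(self, line):
        """Called on a cache miss; True if the buffer supplies the line."""
        if self.entries and self.entries[0] == (line, True):
            self.entries.popleft()             # entries shift up by one...
            nxt = self.entries[-1][0] + 1      # ...and the next successive
            self.entries.append((nxt, True))   # address is fetched
            return True
        self._restart(line)                    # restart stream at miss target
        return False

buf = StreamBuffer()
for line in range(6):                          # sequential line misses
    print(line, buf.lookup(line))
# line 0 starts the stream; lines 1 through 5 are supplied by the buffer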
(figure: sequential stream buffer performance; a 4-entry I-stream buffer backing the 4KB I-cache and a 4-entry D-stream buffer backing the 4KB D-cache, both with 16B lines)

Multi-Way Stream Buffers:
A single stream buffer could remove 72% of the instruction cache misses, but only 25% of the data cache misses.
One reason for this is that data references tend to consist of interleaved streams of data from different sources.
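A sketch of the four-way variant, dropping the available bit for brevity: a cache miss checks the head of every buffer, and when none supplies the line, the least recently used buffer is cleared and restarted at the miss target:

from collections import deque

ENTRIES, WAYS = 4, 4

class MultiWayStreamBuffer:
    def __init__(self):
        self.ways = [deque() for _ in range(WAYS)]
        self.lru = list(range(WAYS))      # least recently used way first
    def _touch(self, w):
        self.lru.remove(w)
        self.lru.append(w)
    def lookup(self, line):
        for w, buf in enumerate(self.ways):
            if buf and buf[0] == line:    # head comparison, as in one way
                buf.popleft()             # shift up...
                buf.append(buf[-1] + 1)   # ...and fetch the successor
                self._touch(w)
                return True
        w = self.lru[0]                   # no way hit: restart the LRU way
        self.ways[w] = deque(line + i for i in range(1, ENTRIES + 1))
        self._touch(w)
        return False

buf = MultiWayStreamBuffer()
# two interleaved sequential streams, as in strided data references:
for line in [0, 100, 1, 101, 2, 102, 3, 103]:
    print(line, buf.lookup(line))
# each stream misses once to get started; every later reference hits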
(figure: four-way stream buffer performance)
(figure: stream buffer performance vs. cache size)
(figure: stream buffer performance vs. line size)
(figure: system performance with victim cache and stream buffers)
Reference:
N. P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," in Proceedings of the 17th Annual International Symposium on Computer Architecture (ISCA), 1990, pp. 364-373.
Thank you for your attention.