Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
Eng. Mshari Alabdulkarim
Outline:
Introduction.
Baseline Design.
Reducing Conflict Misses.
Reducing Capacity and Compulsory Misses.

Introduction:
Goal: improve the performance of caches.
Why is it important to enhance the performance of the cache? Because it has a dramatic effect on the performance of advanced processors.
The miss cost has been increasing over the last decade because:
Cycle times have been decreasing much faster than main memory access times.
The average number of machine cycles per instruction has also been decreasing.

VAX was an instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC) in the mid-1970s.
The VAX-11/780, the first model in the VAX line, was a complex instruction set computer (CISC) machine introduced on October 25, 1977.
It served for years as a baseline in CPU benchmarks because its speed was about one MIPS.

(figure: the increasing cost of cache misses)
Baseline Design:
Chip specifications:
The cycle time of this chip is 3 to 8 times longer than the instruction issue interval (i.e., 3 to 8 instructions issue per cycle).
The first- and second-level caches are assumed to be direct-mapped, which gives the fastest effective access time.
The data cache can be either write-through or write-back.
How is such a high issue rate obtained?
1. A very fast on-chip clock.
2. Issuing many instructions per cycle.
3. Using higher-speed technology for the processor chip.
System parameters:
Instruction issue rate = 1000 MIPS (for example, at 4 instructions per cycle this corresponds to a 250 MHz clock).
There are two separate first-level caches: an instruction cache and a data cache.
Each first-level cache is 4KB with 16B lines.
The second-level cache is 1MB with 128B lines.
The miss penalties are assumed to be 24 instruction times for the first level and 320 instruction times for the second level.
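For concreteness, the sketch below shows how these first-level parameters decompose an address under standard direct-mapped indexing. This is an illustration, not something from the slides, and the 32-bit address width is an assumption.

# Standard direct-mapped address decomposition for the baseline L1:
# 4KB / 16B lines = 256 lines, so 4 offset bits, 8 index bits, and
# (assuming 32-bit addresses, which the slides do not specify) 20 tag bits.
CACHE_BYTES, LINE_BYTES, ADDR_BITS = 4096, 16, 32

OFFSET_BITS = LINE_BYTES.bit_length() - 1                  # 4
INDEX_BITS = (CACHE_BYTES // LINE_BYTES).bit_length() - 1  # 8
TAG_BITS = ADDR_BITS - INDEX_BITS - OFFSET_BITS            # 20

def split(addr):
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(TAG_BITS, INDEX_BITS, OFFSET_BITS)    # 20 8 4
print([hex(f) for f in split(0x12345678)])  # ['0x12345', '0x67', '0x8']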
(figure: test program characteristics)
(figure: baseline system first-level cache miss rates)
(figure: performance lost in the memory hierarchy, baseline design performance, and net performance of the system)
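The measured numbers are in the figures above; as a rough illustration of how the two penalties turn miss rates into lost performance (the miss rates below are invented for the example, not the slides' data):

# Hypothetical cost model matching the stated penalties: each instruction
# costs 1 instruction time, plus 24 per first-level miss and a further 320
# per second-level miss. The example miss rates are made up.
ISSUE_MIPS, L1_PENALTY, L2_PENALTY = 1000, 24, 320

def net_mips(m1, m2):
    # m1: first-level misses per instruction
    # m2: second-level misses per instruction (global second-level rate)
    time_per_inst = 1 + m1 * L1_PENALTY + m2 * L2_PENALTY
    return ISSUE_MIPS / time_per_inst

# 5% of instructions missing in L1, 1% also missing in L2:
print(round(net_mips(0.05, 0.01)))  # -> 185 of the nominal 1000 MIPS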
Reducing Conflict Misses:
Cache misses fall into four categories: conflict, compulsory, capacity, and coherence.
Conflict misses: misses that would not occur if the cache were fully associative with LRU replacement.
Compulsory misses: misses required in any cache organization, because they are the first references to an instruction or piece of data.
Capacity misses: occur when the cache size is not sufficient to hold the data between references.
Coherence misses: occur as a result of invalidations to preserve multiprocessor cache consistency.
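A minimal simulation sketch of this taxonomy (my illustration, not the paper's tooling): a miss in the direct-mapped cache that would have hit in a fully-associative LRU cache of equal capacity is a conflict miss. Coherence misses are not modeled here, since the setting is a uniprocessor.

from collections import OrderedDict

LINE = 16
NLINES = 4096 // LINE        # 256 lines in the baseline 4KB cache

def classify(addresses):
    direct = {}              # index -> tag (direct-mapped contents)
    full = OrderedDict()     # shadow fully-associative LRU cache
    seen = set()             # line addresses referenced so far
    stats = {"hit": 0, "compulsory": 0, "conflict": 0, "capacity": 0}
    for addr in addresses:
        line = addr // LINE
        index, tag = line % NLINES, line // NLINES
        dm_hit = direct.get(index) == tag
        fa_hit = line in full
        if fa_hit:
            full.move_to_end(line)        # LRU touch
        else:
            full[line] = None
            if len(full) > NLINES:
                full.popitem(last=False)  # evict least recently used
        if dm_hit:
            stats["hit"] += 1
        elif line not in seen:
            stats["compulsory"] += 1      # first reference ever
        elif fa_hit:
            stats["conflict"] += 1        # only the mapping caused it
        else:
            stats["capacity"] += 1        # too much live data
        direct[index] = tag
        seen.add(line)
    return stats

# Two passes over two arrays whose lines collide on the same indices:
trace = [base + i * LINE for _ in range(2) for i in range(64) for base in (0, 4096)]
print(classify(trace))  # {'hit': 0, 'compulsory': 128, 'conflict': 128, 'capacity': 0}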
Conflict misses can be reduced using either miss caching or victim caching.
(figure: percentage of conflict misses, 4KB instruction and data caches, 16B lines)

Miss Caching:
We can add associativity to a direct-mapped cache by placing a small miss cache on-chip between the first-level cache and the access port to the second-level cache.
A miss cache is a small fully-associative cache containing on the order of two to five cache lines of data.
Misses in the cache that hit in the miss cache have only a one-cycle miss penalty, as opposed to a many-cycle miss penalty without the miss cache.
(figure: miss cache organization)
How does a miss cache work?
Each time the upper cache is probed, the miss cache is probed as well.
When a miss occurs, data is returned to both the direct-mapped cache and the miss cache under it, where it replaces the least recently used entry.
If a miss occurs in the upper cache but the address hits in the miss cache, the direct-mapped cache can be reloaded in the next cycle from the miss cache.
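A minimal sketch of that behavior, assuming a 4-entry miss cache in front of the baseline 4KB/16B direct-mapped cache (the class and its names are mine, not the paper's):

from collections import OrderedDict

LINE, NLINES, MISS_CACHE_ENTRIES = 16, 256, 4

class MissCachedDM:
    def __init__(self):
        self.dm = {}                     # index -> tag
        self.mc = OrderedDict()          # line address -> None, LRU order
    def access(self, addr):
        line = addr // LINE
        index, tag = line % NLINES, line // NLINES
        if self.dm.get(index) == tag:
            return "hit"
        self.dm[index] = tag             # refill the direct-mapped line
        if line in self.mc:              # one-cycle refill from miss cache
            self.mc.move_to_end(line)
            return "miss-cache hit"
        self.mc[line] = None             # fetched line goes in BOTH caches
        if len(self.mc) > MISS_CACHE_ENTRIES:
            self.mc.popitem(last=False)  # evict LRU miss cache entry
        return "full miss"

c = MissCachedDM()
for a in [0, 4096, 0, 4096]:             # two lines that conflict on index 0
    print(c.access(a))
# -> full miss, full miss, miss-cache hit, miss-cache hit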
(figure: conflict misses removed by miss caching)

Victim Caching:
How does it work?
When a miss occurs, the fully-associative victim cache is loaded with the victim line evicted from the direct-mapped cache.
On a miss in the direct-mapped cache that hits in the victim cache, the contents of the direct-mapped cache line and the matching victim cache line are swapped.
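The same sketch adapted to victim caching. The behavioral difference from the miss cache is that the buffer holds only lines evicted from the direct-mapped cache, so a line is never duplicated between the two structures:

from collections import OrderedDict

LINE, NLINES, VICTIM_ENTRIES = 16, 256, 4

class VictimCachedDM:
    def __init__(self):
        self.dm = {}                      # index -> tag
        self.vc = OrderedDict()           # victim line address -> None (LRU)
    def access(self, addr):
        line = addr // LINE
        index, tag = line % NLINES, line // NLINES
        if self.dm.get(index) == tag:
            return "hit"
        victim_tag = self.dm.get(index)   # line being evicted, if any
        self.dm[index] = tag
        result = "full miss"
        if line in self.vc:               # swap: the promoted line leaves
            del self.vc[line]             # the victim cache...
            result = "victim-cache hit"
        if victim_tag is not None:        # ...and the evicted line enters it
            self.vc[victim_tag * NLINES + index] = None
            if len(self.vc) > VICTIM_ENTRIES:
                self.vc.popitem(last=False)
        return result

c = VictimCachedDM()
for a in [0, 4096, 0, 4096]:              # same conflicting pair as before
    print(c.access(a))
# -> full miss, full miss, victim-cache hit, victim-cache hit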
(figure: victim cache organization)
(figure: conflict misses removed by victim caching)
(figure: victim cache performance with varying direct-mapped data cache size)
(figure: victim cache performance with varying data cache line size, 4KB direct-mapped cache)

Reducing Capacity and Compulsory Misses:
Recall that compulsory misses are required in any cache organization because they are the first references to a piece of data, and capacity misses occur when the cache is not large enough to hold the data between references.
One way of reducing the number of capacity and compulsory misses is to prefetch, either implicitly with longer cache lines or with an explicit prefetch algorithm.
Prefetch algorithms:
1. Prefetch always: prefetch after every reference.
2. Prefetch on miss: on a miss, always fetch the next line as well.
3. Tagged prefetch: when a block is prefetched, its tag bit is set to zero; each time a block is used, its tag bit is set to one. When a block undergoes a zero-to-one transition, its successor block is prefetched.
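A minimal sketch of the tagged scheme (my illustration, not the paper's hardware; eviction and timing are ignored):

LINE = 16

class TaggedPrefetchCache:
    def __init__(self):
        self.blocks = {}                 # line address -> tag bit (0 or 1)
        self.fetched = []                # demand + prefetch traffic log
    def _prefetch(self, line):
        if line not in self.blocks:
            self.blocks[line] = 0        # arrives with tag bit = 0
            self.fetched.append(("prefetch", line))
    def access(self, addr):
        line = addr // LINE
        if line not in self.blocks:      # demand miss
            self.blocks[line] = 0
            self.fetched.append(("demand", line))
        if self.blocks[line] == 0:       # zero-to-one transition:
            self.blocks[line] = 1        # first use of this block,
            self._prefetch(line + 1)     # so prefetch its successor

c = TaggedPrefetchCache()
for a in range(0, 64, 4):                # a sequential scan
    c.access(a)
print(c.fetched)
# one demand fetch for line 0, then the prefetches stay a line ahead:
# [('demand', 0), ('prefetch', 1), ('prefetch', 2), ('prefetch', 3), ('prefetch', 4)]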

(figure: limited time for prefetch)
Stream Buffers:
Goal: start the prefetch before a tag transition can take place.
How does a stream buffer work?
When a miss occurs, the stream buffer begins prefetching successive lines starting at the miss target.
As each prefetch request is sent out, the tag for the address is entered into the stream buffer and the available bit is set to false.
When the prefetched data returns, it is placed in the entry with its tag and the available bit is set to true.
If a reference misses in the cache but hits in the buffer, the cache can be reloaded in a single cycle from the stream buffer.
When a line is moved from the stream buffer to the cache, the entries in the stream buffer shift up by one and a new successive address is fetched.
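A minimal sketch of a 4-entry sequential stream buffer under simplified timing: prefetched data is assumed to have returned before it is needed, so the available bit is always true here. The class is an illustration, not the paper's design:

from collections import deque

ENTRIES = 4

class StreamBuffer:
    def __init__(self):
        self.entries = deque()                 # (line address, available bit)
    def _restart(self, line):
        self.entries.clear()
        for i in range(1, ENTRIES + 1):        # launch successive prefetches;
            self.entries.append((line + i, True))  # data assumed returned
    def lookup(self, line):
        """Called on a cache miss; True if the buffer supplies the line."""
        if self.entries and self.entries[0] == (line, True):
            self.entries.popleft()             # entries shift up by one...
            nxt = self.entries[-1][0] + 1      # ...and the next successive
            self.entries.append((nxt, True))   # address is fetched
            return True
        self._restart(line)                    # restart stream at miss target
        return False

buf = StreamBuffer()
for line in range(6):                          # sequential line misses
    print(line, buf.lookup(line))
# line 0 starts the stream; lines 1 through 5 are supplied by the buffer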
(figure: sequential stream buffer performance; a 4-entry I-stream buffer backing the 4KB I-cache and a 4-entry D-stream buffer backing the 4KB D-cache, both with 16B lines)

Multi-Way Stream Buffers:
A single stream buffer could remove 72% of the instruction cache misses, but only 25% of the data cache misses.
One reason for this is that data references tend to consist of interleaved streams of data from different sources.
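A sketch of the four-way variant, dropping the available bit for brevity: a cache miss checks the head of every buffer, and when none supplies the line, the least recently used buffer is cleared and restarted at the miss target:

from collections import deque

ENTRIES, WAYS = 4, 4

class MultiWayStreamBuffer:
    def __init__(self):
        self.ways = [deque() for _ in range(WAYS)]
        self.lru = list(range(WAYS))      # least recently used way first
    def _touch(self, w):
        self.lru.remove(w)
        self.lru.append(w)
    def lookup(self, line):
        for w, buf in enumerate(self.ways):
            if buf and buf[0] == line:    # head comparison, as in one way
                buf.popleft()             # shift up...
                buf.append(buf[-1] + 1)   # ...and fetch the successor
                self._touch(w)
                return True
        w = self.lru[0]                   # no way hit: restart the LRU way
        self.ways[w] = deque(line + i for i in range(1, ENTRIES + 1))
        self._touch(w)
        return False

buf = MultiWayStreamBuffer()
# two interleaved sequential streams, as in strided data references:
for line in [0, 100, 1, 101, 2, 102, 3, 103]:
    print(line, buf.lookup(line))
# each stream misses once to get started; every later reference hits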
(figure: four-way stream buffer performance)
(figure: stream buffer performance vs. cache size)
(figure: stream buffer performance vs. line size)
(figure: system performance with victim cache and stream buffers)
Reference:
N. P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," in Proceedings of the 17th Annual International Symposium on Computer Architecture (ISCA), 1990, pp. 364-373.
Thank you for your attention.