Understand CPU Caching Concepts
Abhijit K Rao
IIIT - Bangalore
Concept of Caching
The need for a cache has come about for two reasons:
The concept of locality of reference.
   -> Roughly 5 percent of the data is accessed 95 percent of the time, so it makes sense to cache that 5 percent.
The gap between CPU and main memory speeds.
   -> By analogy with the producer-consumer problem, the CPU is the consumer and RAM and hard disks act as producers; slow producers limit the performance of the consumer.
Locality of Reference
Spatial locality: if a particular memory location, say the nth location, is referenced at a particular time, then it is likely that the (n+1)th memory location will be referenced in the near future. The actual piece of data that was requested is called the critical word, and the surrounding group of bytes that gets fetched along with it is called a cache line or cache block.
Temporal locality: if a particular memory location is referenced at some time T, then it is likely that the same location will be referenced again at time T+delta. This is very similar to the concept of a working set, i.e., the set of pages which the CPU frequently accesses.
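As an illustration of spatial locality (not part of the original slides), here is a minimal C sketch: traversing a matrix row by row touches consecutive addresses, so each fetched cache line is fully used, while column-by-column traversal jumps across lines and misses far more often. The matrix size is an arbitrary assumption.

#include <stddef.h>

#define N 1024

static int matrix[N][N];

long sum_row_major(void)            /* cache-friendly: stride of 1 element */
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            sum += matrix[i][j];    /* consecutive addresses, same line reused */
    return sum;
}

long sum_column_major(void)         /* cache-hostile: stride of N elements */
{
    long sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            sum += matrix[i][j];    /* each access lands on a different line */
    return sum;
}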
CPU Cache and its operation
A CPU cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. The concept of locality of reference drives caching: we cache the most frequently used data and instructions for faster access. A CPU cache may be a data cache or an instruction cache. Unlike RAM, cache is not expandable.
The CPU first checks the L1 cache for data; if it does not find it at L1, it moves on to L2 and finally L3. If the data is not found at L3, RAM is searched next, followed by the hard drive. If the CPU finds the requested data in cache, it is a cache hit; if not, it is a cache miss.
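A minimal C sketch of the lookup order just described, assuming a hypothetical level_lookup() helper; real CPUs do this in hardware, so this only illustrates the search logic, not an actual API.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { L1_CACHE, L2_CACHE, L3_CACHE, MAIN_MEMORY, HARD_DRIVE } level_t;

/* Hypothetical stub: reports whether a level currently holds the address.
   Here only main memory and the drive are assumed to hold everything. */
static bool level_lookup(level_t level, uintptr_t addr)
{
    (void)addr;
    return level >= MAIN_MEMORY;
}

/* Search from the fastest, smallest level outward, as the slide describes. */
static level_t find_data(uintptr_t addr)
{
    for (level_t lv = L1_CACHE; lv < HARD_DRIVE; lv++)
        if (level_lookup(lv, addr))
            return lv;              /* hit at this level */
    return HARD_DRIVE;              /* the backing store always has the data */
}

int main(void)
{
    printf("data found at level %d\n", (int)find_data(0x1000));
    return 0;
}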
Levels of caching and speed, size comparisons
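The original slide presents this comparison as a chart. As a rough software-side illustration (an assumption-laden microbenchmark sketch, not a precise measurement tool), the C code below times dependent loads over buffers of growing size; the time per pass jumps once the working set no longer fits in a given cache level.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    for (size_t kb = 4; kb <= 64 * 1024; kb *= 2) {
        size_t n = kb * 1024 / sizeof(size_t);
        size_t *buf = malloc(n * sizeof(size_t));
        if (!buf)
            return 1;
        /* Build a pointer chain through the buffer with a large odd stride,
           so each load depends on the previous one. */
        for (size_t i = 0; i < n; i++)
            buf[i] = (i + 4099) % n;
        size_t idx = 0;
        clock_t t0 = clock();
        for (long step = 0; step < 10 * 1000 * 1000; step++)
            idx = buf[idx];         /* serialized, cache-bound accesses */
        clock_t t1 = clock();
        printf("%6zu KiB: %.3fs (idx=%zu)\n",
               kb, (double)(t1 - t0) / CLOCKS_PER_SEC, idx);
        free(buf);
    }
    return 0;
}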
Cache organization
When the processor needs to read or write a location in main memory, it first checks whether that memory location is in the cache. This is accomplished by comparing the address of the memory location to all tags in the cache that might contain that address. If the processor finds that the memory location is in the cache, we say that a cache hit has occurred; otherwise, we speak of a cache miss.
Cache Entry structure
Cache row entries usually have the following structure: tag, data block, and flag bits.
The data block (cache line) contains the actual data fetched from main memory. The memory address is split into the tag, the index, and the displacement (offset), while the valid bit denotes that this particular entry holds valid data.
The index is ⌈log2(r)⌉ bits long for r cache rows and describes which row the data has been put in.
The displacement is ⌈log2(b)⌉ bits long for a block of b bytes and specifies which byte within the stored block we need.
The tag length is address length − index length − displacement length.

Cache organization - 1
Cache is divided into blocks. The blocks form the basic unit of cache organization. RAM is also organized into blocks of the same size as the cache's blocks. When the CPU requests a byte from a particular RAM block, it needs to be able to determine three things very quickly (see the sketch below):
1. Whether or not the needed block is actually in the cache
2. The location of the block within the cache
3. The location of the desired byte within the block
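A minimal C sketch of answering these three questions by splitting an address into tag, index, and offset. The geometry (64-byte lines, 1024 rows) is an illustrative assumption, not taken from the slides.

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 6                   /* log2(64-byte cache line) */
#define INDEX_BITS  10                  /* log2(1024 cache rows)    */

int main(void)
{
    uint64_t addr = 0x7ffdcafe1234;     /* arbitrary example address */

    uint64_t offset = addr & ((1u << OFFSET_BITS) - 1);            /* byte in block */
    uint64_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* cache row */
    uint64_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);          /* remaining bits */

    printf("tag=0x%llx index=%llu offset=%llu\n",
           (unsigned long long)tag,
           (unsigned long long)index,
           (unsigned long long)offset);
    return 0;
}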
Mapping RAM blocks to cache blocks
Fully associative: any RAM block can be stored in any available block frame. The problem with this scheme is that to retrieve a specific block from the cache, you have to check the tag of every single block frame in the entire cache, because the desired block could be in any of the frames.
Direct mapping: in a direct-mapped cache, each block frame can cache only a certain subset of the blocks in main memory. For example, RAM blocks whose block number modulo the number of cache blocks equals 1 are always stored in cache block 1. The problem with this approach is that certain cache blocks could remain unused while others suffer frequent eviction of their entries.
N-way associative: a compromise between the two; RAM block X can be cached in any of the N block frames of one particular set, so in a 2-way design, block X could be mapped to either cache block X or Y.
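The placement rules can be made concrete with a short C sketch; the cache geometry here (256 block frames, 4-way sets) is an assumed example.

#include <stdio.h>

#define CACHE_BLOCKS 256
#define WAYS         4
#define SETS         (CACHE_BLOCKS / WAYS)

int main(void)
{
    unsigned ram_block = 1025;          /* arbitrary example block number */

    /* Direct mapped: exactly one legal frame. */
    unsigned frame = ram_block % CACHE_BLOCKS;

    /* N-way set associative: one legal set, any of its WAYS frames. */
    unsigned set = ram_block % SETS;

    printf("direct-mapped frame: %u\n", frame);
    printf("4-way set %u, frames %u..%u\n",
           set, set * WAYS, set * WAYS + WAYS - 1);

    /* Fully associative: any of the CACHE_BLOCKS frames is legal,
       so there is nothing to compute, only tags to compare. */
    return 0;
}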
Handling Cache Miss
In order to make room for a new entry on a cache miss, the cache has to evict one of the existing entries. The heuristic that it uses to choose the entry to evict is called the replacement policy. The fundamental problem with any replacement policy is that it must predict which existing cache entry is least likely to be used in the future. Some of the replacement policies are (LRU is sketched after this list):
Random eviction: evicting a cache entry chosen at random.
LIFO: evicting the newest cache entry.
FIFO: evicting the oldest cache entry.
LRU: evicting the least recently used cache entry.
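As one concrete example, here is a minimal C sketch of LRU victim selection for a single 4-way set, using per-way timestamps; the structure and names are hypothetical illustrations, not hardware.

#include <stdint.h>

#define WAYS 4

typedef struct {
    uint32_t tag[WAYS];
    int      valid[WAYS];
    uint64_t last_used[WAYS];   /* logical clock of the most recent access */
} cache_set_t;

/* Pick the victim way: any invalid way first, else the least recently used. */
int choose_victim(const cache_set_t *set)
{
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set->valid[w])
            return w;                               /* free frame, no eviction */
        if (set->last_used[w] < set->last_used[victim])
            victim = w;                             /* older = less recently used */
    }
    return victim;
}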
Mirroring Cache to Main memory
If data are written to the cache, they must at some point be written to main memory and the higher-order caches as well. The timing of this write is controlled by what is known as the write policy.
In a write-through cache, every write to the cache causes a write to main memory and to higher-order caches like L2 and L3.
In a write-back (copy-back) cache, writes are not immediately mirrored to main memory. Instead, the cache tracks which locations have been written over (marked dirty). Dirty entries are written to main memory and higher-order caches just before eviction of the cache entry.
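A minimal C sketch of the write-back behaviour, assuming a hypothetical memory_write() stand-in for the actual write to RAM or higher-order caches; the dirty bit records which lines must be flushed on eviction.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64

typedef struct {
    uint64_t tag;
    bool     valid;
    bool     dirty;                 /* set when the cached line is modified */
    uint8_t  data[LINE_SIZE];
} cache_line_t;

/* Stub standing in for the write to RAM / higher-order caches. */
void memory_write(uint64_t tag, const uint8_t *data)
{
    (void)tag;
    (void)data;
}

/* Write into the cache only; main memory is updated lazily. */
void cache_write(cache_line_t *line, size_t off, uint8_t byte)
{
    line->data[off] = byte;
    line->dirty = true;             /* remember the line is out of sync */
}

/* On eviction, dirty lines are flushed; clean lines are simply dropped. */
void evict(cache_line_t *line)
{
    if (line->valid && line->dirty)
        memory_write(line->tag, line->data);
    line->valid = false;
    line->dirty = false;
}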
Stale data in cache
The data in main memory being cached may be changed by other entities (e.g. peripherals using direct memory access, or another core of a multi-core processor), in which case the copy in the cache may become out-of-date or stale. Alternatively, when a CPU in a multi-core processor updates data in its cache, copies of the data in caches associated with other cores become stale. Communication protocols between the cache managers which keep the data consistent are known as cache coherence protocols, e.g. snoopy-based, directory-based, and token-based.
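As a small illustration, the C sketch below models the line states of a MESI-style snoopy protocol (a common member of the snoopy family, though the slides do not name one) and the single transition where a remote write invalidates the local copy; the full protocol has many more transitions.

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

/* Snoop handler: another core broadcasts a write to a line we also cache. */
mesi_t snoop_remote_write(mesi_t state)
{
    if (state == MODIFIED) {
        /* Our copy is the only up-to-date one: it must be written back
           (or forwarded) before being given up. */
        /* flush_line();  -- hypothetical write-back step */
    }
    return INVALID;     /* in every state, the local copy becomes stale */
}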
State of the Art today
Current research on cache design and on handling cache coherence is biased towards multicore architectures.

References
Wikipedia: http://en.wikipedia.org/wiki/CPU_cache
ArsTechnica: http://arstechnica.com/
Intel Software: http://software.intel.com
What Every Programmer Should Know About Memory, Ulrich Drepper, Red Hat, Inc.

Q/A