Pipelining Cache
By Riman Mandal
Contents
▪ What is Pipelining?
▪ Cache optimization
▪ Why Pipelining cache?
▪ Cache Hit and Cache Access
▪ How can we implement pipelining in a cache?
▪ Cache Pipelining effects
▪ References
What is Pipelining?
▪ Un-pipelined: start and finish a job before moving to the next job.
[Figure: three cars built one after another on a timeline, 24 hrs per car]
– Throughput: 1 car / 24 hrs
– Parallelism: 1
What is Pipelining? (cont.)
▪ Pipelined: break the job into small stages.
[Figure: assembly line with Engine, Body, and Paint stages; four cars overlap, each stage takes 8 hrs, so three cars are in flight at once]
– Throughput: 1 car / 8 hrs
– Parallelism: 3
What is Pipelining? (cont.)
▪ Un-pipelined: start and finish an instruction's execution before moving to the next instruction.
[Figure: three instructions, each running FET → DEC → EXE back to back, 3 ns per instruction, one instruction per cycle (Cyc 1, Cyc 2, Cyc 3)]
What is Pipelining? (cont.)
▪ Pipelined: break the instruction execution into small stages.
[Figure: FET, DEC, and EXC stages overlap for instructions IR1–IR4; each stage takes 1 ns, so a new instruction completes every cycle (Cyc 1, Cyc 2, Cyc 3)]
▪ Un-pipelined clock speed = 1 / 3 ns ≈ 333 MHz
▪ Pipelined clock speed = 1 / 1 ns = 1 GHz
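A minimal sketch of the clock-speed arithmetic above; the 3 ns and 1 ns stage delays come from the slide, the helper itself is only illustrative:

```python
def clock_speed_ghz(stage_delay_ns: float) -> float:
    """Clock frequency when the cycle time equals the longest stage delay."""
    return 1.0 / stage_delay_ns  # 1 / ns = GHz

print(clock_speed_ghz(3.0))  # un-pipelined: one 3 ns stage -> ~0.333 GHz (333 MHz)
print(clock_speed_ghz(1.0))  # pipelined: three 1 ns stages  -> 1.0 GHz
```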
Cache optimization
▪ Average memory access time (AMAT) = Hit time + Miss rate × Miss penalty (see the sketch below)
▪ Five metrics: hit time, miss rate, miss penalty, bandwidth, power consumption
▪ Optimizing cache access time
– Reducing the hit time (small and simple first-level caches, way prediction)
– Increasing cache bandwidth (pipelined caches, non-blocking caches, multibanked caches)
– Reducing the miss penalty (critical word first, merging write buffers)
– Reducing the miss rate (compiler optimizations)
– Reducing the miss penalty or miss rate via parallelism (prefetching)
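A minimal sketch of the AMAT formula above; the hit time, miss rate, and miss penalty values are illustrative assumptions, not numbers from the slides:

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical L1 cache: 1 ns hit time, 5% miss rate, 20 ns miss penalty.
print(amat(1.0, 0.05, 20.0))  # -> 2.0 ns on average per access
```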
Why Pipelining Cache?
▪ Mainly used for the L1 cache.
▪ The cache takes multiple cycles to access:
– An access arrives in cycle N (hit)
– Another access arrives in cycle N+1 (hit) and has to wait until the first one finishes
▪ Hit time = Actual hit time + wait time (see the sketch below)
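A minimal sketch of the effect described above, assuming a hypothetical 3-cycle cache access and back-to-back requests arriving every cycle; without pipelining each request waits for the previous one to drain, with pipelining a new request can start every cycle:

```python
def effective_hit_times(arrivals, access_cycles=3, pipelined=False):
    """Return the observed hit time (wait + access) for each request.

    arrivals: cycle numbers at which requests arrive (sorted).
    access_cycles: cycles the cache array itself needs (assumed value).
    pipelined: if True, a new request may start every cycle;
               if False, the cache is busy until the previous access finishes.
    """
    times = []
    free_at = 0  # first cycle at which the cache can accept the next request
    for t in arrivals:
        start = max(t, free_at)
        times.append((start - t) + access_cycles)  # wait time + actual hit time
        free_at = start + (1 if pipelined else access_cycles)
    return times

requests = [0, 1, 2, 3]
print(effective_hit_times(requests, pipelined=False))  # [3, 5, 7, 9] -> waits pile up
print(effective_hit_times(requests, pipelined=True))   # [3, 3, 3, 3] -> no waiting
```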
Cache Hit and Cache Access
[Figure: a set-associative cache. The address is split into Tag, Set (index), and Offset fields. The index selects a set (Set 0, Set 1, Set 2, ...); each entry in the set holds a valid bit, a tag, and a data block. The stored tags and valid bits are compared against the address tag to decide "Hit?" and "Where?", and the selected data is returned ("Done").]
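A minimal sketch of the hit check shown in the figure; the cache geometry (block size, number of sets, associativity) is an illustrative assumption, not taken from the slides:

```python
# Illustrative geometry: 64-byte blocks, 128 sets, 2 ways per set.
BLOCK_BITS = 6    # offset field width
INDEX_BITS = 7    # set (index) field width
NUM_SETS = 1 << INDEX_BITS

def split_address(addr: int):
    """Split an address into (tag, set index, block offset)."""
    offset = addr & ((1 << BLOCK_BITS) - 1)
    index = (addr >> BLOCK_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_BITS + INDEX_BITS)
    return tag, index, offset

# sets[i] is a list of ways; each way holds a valid bit, a tag, and a data block.
sets = [[{"valid": False, "tag": 0, "data": None} for _ in range(2)]
        for _ in range(NUM_SETS)]

def lookup(addr: int):
    """Compare valid bit and tag in the indexed set; return (hit, way)."""
    tag, index, _ = split_address(addr)
    for way, entry in enumerate(sets[index]):
        if entry["valid"] and entry["tag"] == tag:
            return True, way          # "Hit?" and "Where?"
    return False, None                # miss
```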
Designing a 3-Stage Pipelined Cache
▪ Stage 1: Read the tag and valid bit.
▪ Stage 2: Combine them to determine the actual hit and start the data read.
▪ Stage 3: Finish the data read and transfer the data to the CPU.
Retrieve tag and valid bit → Is hit? Start data read → Serve CPU request (see the cycle-by-cycle sketch below)
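A minimal sketch of how accesses overlap in the three stages above; the stage names follow the slide, while the scheduling code itself is only an illustrative model:

```python
STAGES = ["read tag + valid bit", "check hit, start data read", "finish read, serve CPU"]

def schedule(num_accesses: int, stages=STAGES):
    """Print which access occupies which pipeline stage in each cycle."""
    last_cycle = num_accesses + len(stages) - 1
    for cycle in range(last_cycle):
        busy = []
        for s, name in enumerate(stages):
            access = cycle - s            # the access that entered s cycles ago
            if 0 <= access < num_accesses:
                busy.append(f"A{access} in stage {s + 1} ({name})")
        print(f"cycle {cycle}: " + "; ".join(busy))

schedule(3)
# cycle 0: A0 in stage 1
# cycle 1: A1 in stage 1; A0 in stage 2
# cycle 2: A2 in stage 1; A1 in stage 2; A0 in stage 3
# ... a new access can enter every cycle, so bandwidth is one access per cycle.
```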
Stage 1: Read tag and valid bit
[Figure: the same cache diagram, with the index selecting a set and the tags and valid bits being read out.]
Stage 2: If hit, start reading data
[Figure: the same cache diagram, with the tag/valid-bit comparison producing "Hit?" and "Where?" and the data read starting.]
Stage 3: Supply data to CPU
[Figure: the same cache diagram, with the selected data block being returned to the CPU ("Done").]
Designing a 2-Stage Pipelined Cache
▪ Stage 1: Read the tag and valid bit, combine them to determine the actual hit, and locate the data.
▪ Stage 2: Read the data and serve the CPU request.
Retrieve tag and valid bit, is hit? → Serve CPU request (see the sketch below)
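A minimal sketch of the trade-off between the 2-stage and 3-stage designs; the total array access time and the perfectly even split across stages are illustrative assumptions:

```python
# Hypothetical total time needed by the raw cache array access.
ARRAY_TIME_NS = 2.4

def pipeline_tradeoff(num_stages: int):
    """Clock period and hit latency if the access is split evenly into stages."""
    cycle_ns = ARRAY_TIME_NS / num_stages  # assumes an even split and no latch overhead
    return cycle_ns, num_stages            # (clock period, hit latency in cycles)

for n in (2, 3):
    cycle, cycles = pipeline_tradeoff(n)
    print(f"{n}-stage: {cycle:.1f} ns cycle, hit takes {cycles} cycles")
# 2-stage: longer cycle time, but the data is ready a cycle sooner;
# 3-stage: shorter cycle time (higher bandwidth), but more cycles until the data arrives.
```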
Example
▪ Instruction-cache pipeline stages:
– Pentium: 1 stage
– Pentium Pro through Pentium III: 2 stages
– Pentium 4: 4 stages
Pipeline Cache Efficiency
▪ Increases the cache bandwidth.
▪ Increasing the number of pipeline stages leads to:
– a greater penalty on mispredicted branches
– more clock cycles between issuing the load and using the data (see the sketch after the table)

Technique         | Hit time | Bandwidth | Miss penalty | Miss rate | Power consumption
Pipelining Cache  |    –     |     +     |              |           |
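A minimal sketch of why deeper cache pipelines hurt performance per the bullets above; the base CPI, instruction mix, stall probability, and misprediction rate are all hypothetical assumptions used only to illustrate the direction of the effect:

```python
def cpi_with_cache_pipeline(load_use_cycles: int, branch_penalty_cycles: int,
                            base_cpi: float = 1.0,
                            load_freq: float = 0.25, load_use_stall_prob: float = 0.4,
                            branch_freq: float = 0.2, mispredict_rate: float = 0.1) -> float:
    """Rough CPI estimate: base CPI plus stalls from load-use delay and branch mispredictions."""
    load_stalls = load_freq * load_use_stall_prob * (load_use_cycles - 1)
    branch_stalls = branch_freq * mispredict_rate * branch_penalty_cycles
    return base_cpi + load_stalls + branch_stalls

# More cache pipeline stages -> larger load-use delay and branch penalty -> higher CPI.
print(cpi_with_cache_pipeline(load_use_cycles=2, branch_penalty_cycles=2))  # shallower pipeline
print(cpi_with_cache_pipeline(load_use_cycles=4, branch_penalty_cycles=4))  # deeper pipeline
```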
References
▪ https://www.udacity.com/course/high-performance-computer-architecture--ud007
▪ https://www.youtube.com/watch?v=r9AxfQB_qlc
▪ "Computer Architecture: A Quantitative Approach, Fifth Edition", by Hennessy & Patterson