Computer Memory
Terminologies
 Capacity: the amount of information that can be
contained in a memory unit -- usually in terms of words or
bytes
 Word: the natural unit of organization in the memory,
typically the number of bits used to represent a number
 Addressable unit: the fundamental data element size
that can be addressed in the memory -- typically either
the word size or individual bytes
 Unit of transfer: The number of data elements
transferred at a time – usually bits in main memory and
blocks in secondary memory
Terminologies
 Access time:
– For RAM, the time to address the unit and perform the transfer
– For non-random access memory, the time to position the R/W
head over the desired location
 Memory cycle time: Access time plus any other time required before a
second access can be started
 Transfer rate: Rate at which data is transferred to/from the memory
device.
 For random-access memory, it is 1/cycle time
 For non-random-access memory
TN = TA + N/R, where
TN = Average time to read or write N bits
TA = Average access time
N = Number of bits
R = Transfer rate, in bps
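As a worked illustration of this formula (a minimal sketch; the function name and example numbers are mine, not from the slides):

```python
# Minimal sketch of T_N = T_A + N/R for a non-random-access device.

def transfer_time(t_access_s: float, n_bits: int, rate_bps: float) -> float:
    """Average time to read or write n_bits: access time plus streaming time."""
    return t_access_s + n_bits / rate_bps

# Hypothetical example: 5 ms average access time, a 4 KiB block, 100 Mbps rate.
t_n = transfer_time(t_access_s=5e-3, n_bits=4 * 1024 * 8, rate_bps=100e6)
print(f"T_N = {t_n * 1e3:.3f} ms")  # ~5.328 ms: dominated by the access time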
Access technique
 How are memory contents accessed?
– Random access:
» Each location has a unique physical address
» Locations can be accessed in any order and
all access times are the same
» What we term “RAM” is more aptly called
read/write memory, since this access technique
applies to ROMs as well
 » Example: main memory
Access technique
 – Sequential access:
» Data does not have a unique address
» Must read all data items in sequence until the
desired item is found
» Access times are highly variable
» Example: tape drive units
 – Direct access:
» Data items have unique addresses
» Access is done using a combination of moving to a
general memory “area” followed by a sequential access
to reach the desired data item
» Example: disk drives
Access technique
 – Associative access:
» A variation of random access memory
» Data items are accessed based on their
contents rather than their actual location
» Search all data items in parallel for a match to
a given search pattern
» All memory locations are searched in parallel
without regard to the size of the memory --
extremely fast for large memory sizes
» Cost per bit is 5-10 times that of a “normal”
RAM cell
» Example: some cache memory units
Principle of Locality
 Principle of Locality: “Programs tend to reuse data and
instructions they have used recently.”
 A widely held rule of thumb is that a program spends 90% of its
execution time in only 10% of the code. This implies that we
can predict with reasonable accuracy what instructions and data
a program will use in the near future based on its accesses in
the recent past.
 Two different types of locality have been observed:
 Temporal Locality: Recently accessed items are likely to be
accessed in the near future.
 Spatial Locality: Items whose addresses are near one another
tend to be referenced close together in time.
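Both kinds of locality show up in even a trivial loop; the snippet below is illustrative only and not from the slides:

```python
# Illustrative access patterns (not from the slides).
data = list(range(1024))

total = 0
for x in data:            # spatial locality: neighboring addresses, in order
    total += x

for _ in range(100):      # temporal locality: the same locations reused soon
    total += data[0]
```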
Programmers want an unlimited amount
of fast memory
 Computer pioneers correctly predicted that
programmers would want an unlimited amount of fast
memory. An economical solution to that desire is a
memory hierarchy, which takes advantage of locality
and the cost-performance of memory technologies. The
goal is to provide a memory system with cost almost
as low as the cheapest level of memory and speed
almost as fast as the fastest level.
Challenges in designing Memory Hierarchy
The trade-off among three characteristics of
memory (cost, capacity, and access time):
 Faster access time, greater cost per bit
 Greater capacity, smaller cost per bit
 Greater capacity, slower access time
The levels in a typical memory hierarchy
Cache
 Cache is the name given to the first level of the
memory hierarchy encountered once the address
leaves the CPU.
 Cache Hit- when the CPU finds a requested data item in
the cache, it is called a cache hit.
 Cache Miss- when the CPU does not find a data
item it needs in the cache, a cache miss occurs.
How caches work
 A fixed size collection of data containing the
requested word, called a block, is retrieved from the
main memory and placed into the cache. Temporal
locality tells us that we are likely to need this word
again in the near future, so it is useful to place it in
the cache where it can be accessed quickly.
Because of spatial locality, there is a high probability
that the other data in the block will be needed soon.
Cache Read Operation
Cache Performance
Memory Stall Cycles- the number of cycles during which the CPU is
stalled waiting for a memory access is called the memory stall cycles.
The performance is then the product of the clock cycle time and the
sum of the CPU cycles and the memory stall cycles:
CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock Cycle time
This equation assumes that the CPU clock cycles include the time to
handle a cache hit, and that the CPU is stalled during a cache miss.
The number of memory stall cycles depends on both the number of
misses and the cost per miss, which is called the miss penalty.
Memory stall cycles = Number of misses × Miss penalty
= IC × Memory accesses per instruction × Miss rate × Miss penalty
Cache performance (math)
Example: assume we have a computer where the cycles per instruction (CPI)
is 1.0 when all memory accesses hit in the cache. The only data accesses are
loads and stores, and these total 50% of the instructions. If the miss penalty is
25 clock cycles and the miss rate is 2%, how much faster would the computer
be if all instructions were cache hits?
Answer: first compute the performance for the computer that always hits:
CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock Cycle time
= (IC × CPI + 0) × Clock Cycle time
= IC × 1.0 × Clock Cycle time
Now for the computer with the real cache, first we compute memory stall cycles:
Memory stall cycles = IC × Memory accesses per instruction × Miss rate × Miss penalty
= IC × (1 + 0.5) × 0.02 × 25
= IC × 0.75
(here (1 + 0.5) represents 1 instruction access and 0.5 data accesses per instruction)
Cache performance (math cont.)
Thus the total execution time is
CPU execution time_cache = (IC × 1.0 + IC × 0.75) × Clock Cycle time
= IC × 1.75 × Clock Cycle time
Since performance is the inverse of execution time, the performance ratio is
CPU execution time_cache / CPU execution time = (IC × 1.75 × Clock Cycle time) / (IC × 1.0 × Clock Cycle time)
= 1.75
The computer with no cache misses is 1.75 times faster.
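The same arithmetic, written out as a minimal Python sketch (the variable names are mine; the numbers come from the example above):

```python
# Reproduces the worked example; times are in units of IC × clock cycle time.
cpi = 1.0                      # CPI when every memory access hits
accesses_per_instr = 1 + 0.5   # 1 instruction fetch + 0.5 data accesses
miss_rate = 0.02
miss_penalty = 25              # clock cycles per miss

stalls_per_instr = accesses_per_instr * miss_rate * miss_penalty  # 0.75
time_with_cache = cpi + stalls_per_instr                          # 1.75
time_all_hits = cpi                                               # 1.00
print(time_with_cache / time_all_hits)                            # 1.75
```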
Where can a block be placed in a cache?
There are three categories of cache organization based on the
restriction on where a block is placed.
1. Fully associative – if a block can be placed anywhere in the
cache, the cache is said to be fully associative.
2. Direct mapped – if each block has only one place it can
appear in the cache, the cache is said to be direct mapped.
The mapping is usually
(Block address) MOD (Number of blocks in the cache)
3. Set associative – if a block can be placed in a restricted set
of places in the cache, the cache is called set associative. A
set is a group of blocks in the cache. A block can be placed
anywhere within its set. The set is usually chosen by bit
selection, that is,
(Block address ) MOD (Number of sets in the cache)
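Both bit-selection mappings reduce to a modulus. A minimal sketch, using the example sizes from the figure below (eight block frames, four sets):

```python
# Minimal sketch of the two MOD mappings (sizes match the figure below).
NUM_BLOCKS = 8   # block frames in the cache (direct mapped)
NUM_SETS = 4     # sets in the set-associative cache

def direct_mapped_frame(block_addr: int) -> int:
    return block_addr % NUM_BLOCKS   # (Block address) MOD (blocks in cache)

def set_index(block_addr: int) -> int:
    return block_addr % NUM_SETS     # (Block address) MOD (sets in cache)

print(direct_mapped_frame(12))  # 4 -> block 12 maps only to frame 4
print(set_index(12))            # 0 -> block 12 may go anywhere in set 0
```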
Where can a block be placed in a
cache?(cont.)
[Figure: block placement example. This example cache has eight block
frames (0-7) and memory has 32 blocks (0-31). Fully associative: block 12
can go anywhere. Direct mapped: block 12 can go only into frame 4
(12 mod 8). Set associative with four sets (0-3, two blocks per set):
block 12 can go anywhere in set 0 (12 mod 4).]
How is a block found if it is in the cache?
Caches have an address tag on each block frame that
gives the block address. The tag of every cache block
that might contain the desired information is checked
to see if it matches the block address from the CPU.
The figure below shows the three portions of an address in a
set-associative or direct-mapped cache. The tag is
used to check all the blocks in the set, and the index is
used to select the set. The block offset is the address
of the desired data within the block. Fully associative
caches have no index field.
[Figure: the three portions of an address]
| Tag | Index | Block offset |
The Tag and Index fields together form the block address.
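A minimal sketch of extracting the three fields from a byte address; the field widths (64-byte blocks, 128 sets) are assumptions for illustration, not values from the slides:

```python
# Splits an address into tag / index / block offset (assumed field widths).
OFFSET_BITS = 6   # 64-byte blocks -> 6-bit block offset
INDEX_BITS = 7    # 128 sets       -> 7-bit index

def split_address(addr: int) -> tuple[int, int, int]:
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x12345))  # (9, 13, 5): tag 9, set 13, byte 5 in the block
```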
Which block should be replaced on a cache
miss?
When a cache miss occurs, the cache controller must
select a block to be replaced with the desired data. There
are three primary strategies for selecting which
block to replace:
Random- candidate blocks are randomly selected.
Least Recently Used (LRU) – to reduce the chance of
throwing out information that will be needed soon,
accesses to blocks are recorded. The block that has been
unused for the longest time is replaced first.
First in, First out (FIFO) – because LRU can be complicated
to calculate, this method replaces the oldest block first.
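As a software analogy of LRU bookkeeping (a minimal sketch; real controllers track this in hardware, often only approximately):

```python
# Minimal LRU bookkeeping for one cache set, using an ordered dict.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = OrderedDict()        # least recently used tag first

    def access(self, tag):
        if tag in self.blocks:             # hit: mark as most recently used
            self.blocks.move_to_end(tag)
            return None
        victim = None
        if len(self.blocks) == self.ways:  # miss in a full set: evict the LRU
            victim, _ = self.blocks.popitem(last=False)
        self.blocks[tag] = True
        return victim

s = LRUSet(ways=2)
for tag in [1, 2, 1, 3]:                   # accessing 3 evicts 2, not 1
    print(tag, "->", s.access(tag))
```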
What happens on a write?
There are two basic options when writing to the cache:
Write through- the information is written to both the block in the
cache and the block in the lower-level memory.
Write back – the information is written only to the block in the
cache. The modified cache block is written to main memory only when
it is replaced.
Dirty Bit – to reduce the frequency of writing back blocks on
replacement, a status bit called the dirty bit is commonly used.
It indicates whether the block is dirty (modified
in the cache) or clean (not modified). If it is clean, the block
is not written back on a miss, since identical information
is already present in the lower levels.
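A minimal sketch of the write-back policy with a dirty bit (the class and variable names are mine):

```python
# Write-back with a dirty bit: memory is touched only on eviction of a
# modified block; clean victims are simply dropped.
class CacheBlock:
    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

    def write(self, data):
        self.data = data
        self.dirty = True        # modified in the cache only

    def evict(self, memory: dict):
        if self.dirty:           # many cache writes -> one memory write
            memory[self.tag] = self.data

memory = {}
blk = CacheBlock(tag=0x40, data=0)
blk.write(1)
blk.write(2)                     # two cache writes, zero memory traffic so far
blk.evict(memory)                # a single write-back of the final value
print(memory)                    # {64: 2}
```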
Comparison between write back and write
through
Both write back and write through have their advantages.
Write back –
Write occurs at the speed of cache memory.
Multiple writes within a block require only one write to the lower-
level memory.
Uses less memory bandwidth.
Saves power.
Write through-
Is easier to implement than write back.
Simplifies data coherency.
Six basic cache optimizations
1. Larger block size to reduce miss rate.
2. Bigger cache to reduce miss rate.
3. Higher associativity to reduce miss rate.
4. Multilevel caches to reduce miss penalty.
5. Giving priority to read misses over write misses to
reduce miss penalty.
6. Avoiding address translation during indexing of the
cache to reduce hit time.
Semiconductor memory
 – Typically random access
 – RAM: actually read-write memory
 » Dynamic RAM
 Storage cell is essentially a transistor acting as a capacitor
 Capacitor charge dissipates over time, causing a 1 to flip to a 0
 Cells must be refreshed periodically to avoid this
 Very high packaging density
 » Static RAM:
 Basically an array of flip-flop storage cells
 Uses 5-10x more transistors than a similar dynamic cell, so
packaging density is 10x lower
 Faster than a dynamic cell
SRAM vs DRAM
Semiconductor memory
 – Read Only Memories (ROM)
 » “Permanent” data storage
 » ROMs
 Data is “wired in” during fabrication at
a chip manufacturer’s plant
 Purchased in lots of 10k or more
 » PROMs
 Programmable ROM
 Data can be written once by the user employing a PROM programmer
 Useful for small production runs
 » EPROM
 Erasable PROM
 Programming is similar to a PROM
 Can be erased by exposing to UV light
Semiconductor memory
 » EEPROMs
 Electrically erasable PROMs
 Can be written to many times while remaining in a system
 Does not have to be erased first
 Programs individual bytes
 Writes require several hundred usec per byte
 Used in systems for development, personalization, and other
tasks requiring unique information to be stored
 » Flash Memory
 Similar to EEPROM in using electrical erase
 Fast erasures, block erasures
 Higher density than EEPROM
RAID (Redundant Array Of Independent
Disks)
 RAID (Redundant Array Of Independent Disks) is a
data storage virtualization technology that combines
multiple physical disk drive components into a single
logical unit for the purpose of data redundancy and
performance improvement. Earlier, RAID was known
as Redundant Array of Inexpensive Disks.
 Common RAID levels:
 RAID 0
 RAID 1
 RAID 5
 RAID 10
RAID 0
a. It splits data among two or more disks.
b. Provides good performance.
c. Lack of data redundancy means there is
no failover support with this
configuration.
d. In the diagram to the right, the odd
blocks are written to disk 0 and the even
blocks to disk 1, such that A1, A2, A3,
A4, … would be the order of blocks read
if read sequentially from the beginning.
e. Used in read-only NFS systems and
gaming systems.
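A minimal sketch of the striping rule described in (d): logical block i lands on disk i mod N (the function name is mine):

```python
# RAID 0 striping: block i -> disk (i mod N), at stripe row (i // N).
def raid0_location(block: int, num_disks: int) -> tuple[int, int]:
    return block % num_disks, block // num_disks

for b in range(6):   # with two disks, blocks alternate: 0, 1, 0, 1, ...
    disk, row = raid0_location(b, num_disks=2)
    print(f"block A{b + 1} -> disk {disk}, stripe {row}")
```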
RAID 1
• RAID1 is ‘data mirroring’.
• Two copies of the data are held on two
physical disks, and the data is always
identical.
• Twice as many disks are required to store
the same data when compared to RAID 0.
• Array continues to operate so long as at
least one drive is functioning.
RAID 3
• a single redundant disk is used -- the
parity drive
• Parity bit is computed for the set of
individual bits in the same
position on all disks
• If a drive fails, parity information
on the redundant disk can be
used to calculate the data from
the failed disk “on the fly”
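The “on the fly” reconstruction is just XOR; a minimal sketch with made-up bit patterns:

```python
# XOR parity: the parity drive stores the XOR of same-position data, and
# XOR-ing the survivors regenerates whatever a failed drive held.
from functools import reduce

data_disks = [0b1011, 0b0110, 0b1100]            # same bit positions, 3 disks
parity = reduce(lambda a, b: a ^ b, data_disks)  # written to the parity drive

lost = data_disks[1]                             # pretend disk 1 fails
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = reduce(lambda a, b: a ^ b, survivors)
assert rebuilt == lost                           # data recovered "on the fly"
print(bin(rebuilt))                              # 0b110
```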
RAID 5
• RAID 5 is an ideal combination of
good performance, good fault tolerance
and high capacity and storage
efficiency.
• An arrangement of parity and CRC helps
rebuild drive data in case of
disk failures.
• “Distributed parity” is the key phrase
here.
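To make “distributed parity” concrete: instead of a dedicated parity drive, the parity block rotates across the disks from stripe to stripe. The layout below is one common left-symmetric scheme, offered as an assumption rather than the only possibility:

```python
# One common RAID 5 parity rotation (left-symmetric layout, assumed here).
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    return (num_disks - 1 - stripe) % num_disks

for stripe in range(4):   # with 4 disks the parity walks 3, 2, 1, 0, 3, ...
    print(f"stripe {stripe}: parity on disk {raid5_parity_disk(stripe, 4)}")
```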
RAID 10
a. Combines RAID 1 and RAID 0.
b. This provides both good performance
and good failover handling.
c. Also called ‘Nested RAID’.