Computer Memory
Terminologies
 Capacity: the amount of information that can be
contained in a memory unit -- usually in terms of words or
bytes
 Word: the natural unit of organization in the memory,
typically the number of bits used to represent a number
 Addressable unit: the fundamental data element size
that can be addressed in the memory -- typically either
the word size or individual bytes
 Unit of transfer: The number of data elements
transferred at a time – usually bits in main memory and
blocks in secondary memory
Terminologies
 Access time:
– For RAM, the time to address the unit and perform the transfer
– For non-random access memory, the time to position the R/W
head over the desired location
 Memory cycle time: Access time plus any other time required before a
second access can be started
 Transfer rate: Rate at which data is transferred to/from the memory
device.
 For random-access memory, it is 1/cycle time
 For non-random-access memory
TN = TA + N/R, where
TN = Average time to read or write N bits
TA = Average access time
N = Number of bits
R = Transfer rate, in bps
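As a worked illustration of this formula (a minimal sketch; the function name and example numbers are mine, not from the slides):

```python
# Minimal sketch of T_N = T_A + N/R for a non-random-access device.

def transfer_time(t_access_s: float, n_bits: int, rate_bps: float) -> float:
    """Average time to read or write n_bits: access time plus streaming time."""
    return t_access_s + n_bits / rate_bps

# Hypothetical example: 5 ms average access time, a 4 KiB block, 100 Mbps rate.
t_n = transfer_time(t_access_s=5e-3, n_bits=4 * 1024 * 8, rate_bps=100e6)
print(f"T_N = {t_n * 1e3:.3f} ms")  # ~5.328 ms: dominated by the access time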
Access technique
 How are memory contents accessed?
– Random access:
» Each location has a unique physical address
» Locations can be accessed in any order and
all access times are the same
» What we term “RAM” is more aptly called
read/write memory, since this access technique
applies to ROMs as well
 » Example: main memory
Access technique
 – Sequential access:
» Data does not have a unique address
» Must read all data items in sequence until the
desired item is found
» Access times are highly variable
» Example: tape drive units
 – Direct access:
» Data items have unique addresses
» Access is done using a combination of moving to a
general memory “area” followed by a sequential access
to reach the desired data item
» Example: disk drives
Access technique
 – Associative access:
» A variation of random access memory
» Data items are accessed based on their
contents rather than their actual location
» Search all data items in parallel for a match to
a given search pattern
» All memory locations are searched in parallel
without regard to the size of the memory --
extremely fast for large memory sizes
» Cost per bit is 5-10 times that of a “normal”
RAM cell
» Example: some cache memory units
Principle of Locality
 Principle of Locality: “Programs tend to reuse data and
instructions they have used recently.”
 A widely held rule of thumb is that a program spends 90% of its
execution time in only 10% of the code. This implies that we
can predict with reasonable accuracy what instructions and data
a program will use in the near future based on its accesses in
the recent past.
 Two different types of locality have been observed:
 Temporal Locality: Recently accessed items are likely to be
accessed in the near future.
 Spatial Locality: Items whose addresses are near one another
tend to be referenced close together in time.
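Both kinds of locality show up in even a trivial loop; the snippet below is illustrative only and not from the slides:

```python
# Illustrative access patterns (not from the slides).
data = list(range(1024))

total = 0
for x in data:            # spatial locality: neighboring addresses, in order
    total += x

for _ in range(100):      # temporal locality: the same locations reused soon
    total += data[0]
```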
Programmers want an unlimited amount
of fast memory
 Computer pioneers correctly predicted that
programmers would want an unlimited amount of fast
memory. An economical solution to that desire is a
memory hierarchy, which takes advantage of locality
and the cost-performance of memory technologies. The
goal is to provide a memory system with cost almost
as low as the cheapest level of memory and speed
almost as fast as the fastest level.
Challenges in designing Memory Hierarchy
The trade-off among three characteristics of
memory (cost, capacity, and access time):
 Faster access time, greater cost per bit
 Greater capacity, smaller cost per bit
 Greater capacity, slower access time
The levels in a typical memory hierarchy
Cache
 Cache is the name given to the first level of the
memory hierarchy encountered once the address
leaves the CPU.
 Cache Hit- when the CPU finds a requested data item in
the cache, it is called a cache hit.
 Cache Miss- when the CPU does not find a data
item it needs in the cache, a cache miss occurs.
How caches work
 A fixed size collection of data containing the
requested word, called a block, is retrieved from the
main memory and placed into the cache. Temporal
locality tells us that we are likely to need this word
again in the near future, so it is useful to place it in
the cache where it can be accessed quickly.
Because of spatial locality, there is a high probability
that the other data in the block will be needed soon.
Cache Read Operation
Cache Performance
Memory Stall Cycles- the number of cycles during which the CPU is
stalled waiting for a memory access is called the memory stall cycles.
The performance is then the product of the clock cycle time and the
sum of the CPU cycles and the memory stall cycles:
CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock Cycle time
This equation assumes that the CPU clock cycles include the time to
handle a cache hit, and that the CPU is stalled during a cache miss.
The number of memory stall cycles depends on both the number of
misses and the cost per miss, which is called the miss penalty.
Memory stall cycles = Number of misses × Miss penalty
= IC × Memory accesses per instruction × Miss rate × Miss penalty
Cache performance (math)
Example: assume we have a computer where the cycles per instruction (CPI)
is 1.0 when all memory accesses hit in the cache. The only data accesses are
loads and stores, and these total 50% of the instructions. If the miss penalty is
25 clock cycles and the miss rate is 2%, how much faster would the computer
be if all instructions were cache hits?
Answer: first compute the performance for the computer that always hits:
CPU execution time = (CPU clock cycles + Memory stall cycles) × Clock Cycle time
= (IC × CPI + 0) × Clock Cycle time
= IC × 1.0 × Clock Cycle time
Now for the computer with the real cache, first we compute memory stall cycles:
Memory stall cycles = IC × Memory accesses per instruction × Miss rate × Miss penalty
= IC × (1 + 0.5) × 0.02 × 25
= IC × 0.75
(here (1 + 0.5) represents 1 instruction access and 0.5 data accesses per instruction)
Cache performance (math cont.)
Thus the total execution time is
CPU execution time_cache = (IC × 1.0 + IC × 0.75) × Clock Cycle time
= IC × 1.75 × Clock Cycle time
Since performance is the inverse of execution time, the performance ratio is
CPU execution time_cache / CPU execution time = (IC × 1.75 × Clock Cycle time) / (IC × 1.0 × Clock Cycle time)
= 1.75
The computer with no cache misses is 1.75 times faster.
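The same arithmetic, written out as a minimal Python sketch (the variable names are mine; the numbers come from the example above):

```python
# Reproduces the worked example; times are in units of IC × clock cycle time.
cpi = 1.0                      # CPI when every memory access hits
accesses_per_instr = 1 + 0.5   # 1 instruction fetch + 0.5 data accesses
miss_rate = 0.02
miss_penalty = 25              # clock cycles per miss

stalls_per_instr = accesses_per_instr * miss_rate * miss_penalty  # 0.75
time_with_cache = cpi + stalls_per_instr                          # 1.75
time_all_hits = cpi                                               # 1.00
print(time_with_cache / time_all_hits)                            # 1.75
```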
Where can a block be placed in a cache?
There are three categories of cache organization based on the
restriction on where a block is placed.
1. Fully associative – if a block can be placed anywhere in the
cache, the cache is said to be fully associative.
2. Direct mapped – if each block has only one place it can
appear in the cache, the cache is said to be direct mapped.
The mapping is usually
(Block address) MOD (Number of blocks in the cache)
3. Set associative – if a block can be placed in a restricted set
of places in the cache, the cache is called set associative. A
set is a group of blocks in the cache. A block can be placed
anywhere within its set. The set is usually chosen by bit
selection, that is,
(Block address ) MOD (Number of sets in the cache)
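Both bit-selection mappings reduce to a modulus. A minimal sketch, using the example sizes from the figure below (eight block frames, four sets):

```python
# Minimal sketch of the two MOD mappings (sizes match the figure below).
NUM_BLOCKS = 8   # block frames in the cache (direct mapped)
NUM_SETS = 4     # sets in the set-associative cache

def direct_mapped_frame(block_addr: int) -> int:
    return block_addr % NUM_BLOCKS   # (Block address) MOD (blocks in cache)

def set_index(block_addr: int) -> int:
    return block_addr % NUM_SETS     # (Block address) MOD (sets in cache)

print(direct_mapped_frame(12))  # 4 -> block 12 maps only to frame 4
print(set_index(12))            # 0 -> block 12 may go anywhere in set 0
```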
Where can a block be placed in a
cache?(cont.)
[Figure: block placement example. This example cache has eight block
frames (0-7) and memory has 32 blocks (0-31). Fully associative: block 12
can go anywhere. Direct mapped: block 12 can go only into frame 4
(12 mod 8). Set associative with four sets (0-3, two blocks per set):
block 12 can go anywhere in set 0 (12 mod 4).]
How is a block found if it is in the cache?
Caches have an address tag on each block frame that
gives the block address. The tag of every cache block
that might contain the desired information is checked
to see if it matches the block address from the CPU.
The figure below shows the three portions of an address in a
set-associative or direct-mapped cache. The tag is
used to check all the blocks in the set, and the index is
used to select the set. The block offset is the address
of the desired data within the block. Fully associative
caches have no index field.
[Figure: the three portions of an address]
| Tag | Index | Block offset |
The Tag and Index fields together form the block address.
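A minimal sketch of extracting the three fields from a byte address; the field widths (64-byte blocks, 128 sets) are assumptions for illustration, not values from the slides:

```python
# Splits an address into tag / index / block offset (assumed field widths).
OFFSET_BITS = 6   # 64-byte blocks -> 6-bit block offset
INDEX_BITS = 7    # 128 sets       -> 7-bit index

def split_address(addr: int) -> tuple[int, int, int]:
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x12345))  # (9, 13, 5): tag 9, set 13, byte 5 in the block
```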
Which block should be replaced on a cache
miss?
When a cache miss occurs, the cache controller must
select a block to be replaced with the desired data. There
are three primary strategies for selecting which
block to replace:
Random- candidate blocks are randomly selected.
Least Recently Used (LRU) – to reduce the chance of
throwing out information that will be needed soon,
accesses to blocks are recorded. The block that has been
unused for the longest time is replaced first.
First in, First out (FIFO) – because LRU can be complicated
to calculate, this method replaces the oldest block first.
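As a software analogy of LRU bookkeeping (a minimal sketch; real controllers track this in hardware, often only approximately):

```python
# Minimal LRU bookkeeping for one cache set, using an ordered dict.
from collections import OrderedDict

class LRUSet:
    def __init__(self, ways: int):
        self.ways = ways
        self.blocks = OrderedDict()        # least recently used tag first

    def access(self, tag):
        if tag in self.blocks:             # hit: mark as most recently used
            self.blocks.move_to_end(tag)
            return None
        victim = None
        if len(self.blocks) == self.ways:  # miss in a full set: evict the LRU
            victim, _ = self.blocks.popitem(last=False)
        self.blocks[tag] = True
        return victim

s = LRUSet(ways=2)
for tag in [1, 2, 1, 3]:                   # accessing 3 evicts 2, not 1
    print(tag, "->", s.access(tag))
```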
What happens on a write?
There are two basic options when writing to the cache:
Write through- the information is written to both the block in the
cache and the block in the lower-level memory.
Write back – the information is written only to the block in the
cache. The modified cache block is written to main memory only when
it is replaced.
Dirty Bit – to reduce the frequency of writing back blocks on
replacement, a status bit called the dirty bit is commonly used.
It indicates whether the block is dirty (modified
in the cache) or clean (not modified). If it is clean, the block
is not written back on a miss, since identical information
is already present in the lower levels.
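A minimal sketch of the write-back policy with a dirty bit (the class and variable names are mine):

```python
# Write-back with a dirty bit: memory is touched only on eviction of a
# modified block; clean victims are simply dropped.
class CacheBlock:
    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

    def write(self, data):
        self.data = data
        self.dirty = True        # modified in the cache only

    def evict(self, memory: dict):
        if self.dirty:           # many cache writes -> one memory write
            memory[self.tag] = self.data

memory = {}
blk = CacheBlock(tag=0x40, data=0)
blk.write(1)
blk.write(2)                     # two cache writes, zero memory traffic so far
blk.evict(memory)                # a single write-back of the final value
print(memory)                    # {64: 2}
```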
Comparison between write back and write
through
Both write back and write through have their advantages.
Write back –
Write occurs at the speed of cache memory.
Multiple writes within a block require only one write to the lower-
level memory.
Uses less memory bandwidth.
Saves power.
Write through-
Is easier to implement than write back.
Simplifies data coherency.
Six basic cache optimizations
1. Larger block size to reduce miss rate.
2. Bigger cache to reduce miss rate.
3. Higher associativity to reduce miss rate.
4. Multilevel caches to reduce miss penalty.
5. Giving priority to read misses over write misses to
reduce miss penalty.
6. Avoiding address translation during indexing of the
cache to reduce hit time.
Semiconductor memory
 – Typically random access
 – RAM: actually read-write memory
 » Dynamic RAM
 Storage cell is essentially a transistor acting as a capacitor
 Capacitor charge dissipates over time, causing a 1 to flip to a 0
 Cells must be refreshed periodically to avoid this
 Very high packaging density
 » Static RAM:
 Basically an array of flip-flop storage cells
 Uses 5-10x more transistors than a similar dynamic cell, so
packaging density is 10x lower
 Faster than a dynamic cell
SRAM vs DRAM
Semiconductor memory
 – Read Only Memories (ROM)
 » “Permanent” data storage
 » ROMs
 Data is “wired in” during fabrication at
a chip manufacturer’s plant
 Purchased in lots of 10k or more
 » PROMs
 Programmable ROM
 Data can be written once by the user employing a PROM programmer
 Useful for small production runs
 » EPROM
 Erasable PROM
 Programming is similar to a PROM
 Can be erased by exposing to UV light
Semiconductor memory
 » EEPROMs
 Electrically erasable PROMs
 Can be written to many times while remaining in a system
 Does not have to be erased first
 Programs individual bytes
 Writes require several hundred usec per byte
 Used in systems for development, personalization, and other
tasks requiring unique information to be stored
 » Flash Memory
 Similar to EEPROM in using electrical erase
 Fast erasures, block erasures
 Higher density than EEPROM
RAID (Redundant Array Of Independent
Disks)
 RAID (Redundant Array Of Independent Disks) is a
data storage virtualization technology that combines
multiple physical disk drive components into a single
logical unit for the purpose of data redundancy and
performance improvement. Earlier, RAID was known
as Redundant Array of Inexpensive Disks.
 Common RAID levels:
 RAID 0
 RAID 1
 RAID 5
 RAID 10
RAID 0
a. It splits data among two or more disks.
b. Provides good performance.
c. Lack of data redundancy means there is
no failover support with this
configuration.
d. In the diagram to the right, the odd
blocks are written to disk 0 and the even
blocks to disk 1, such that A1, A2, A3,
A4, … would be the order of blocks read
if read sequentially from the beginning.
e. Used in read-only NFS systems and
gaming systems.
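A minimal sketch of the striping rule described in (d): logical block i lands on disk i mod N (the function name is mine):

```python
# RAID 0 striping: block i -> disk (i mod N), at stripe row (i // N).
def raid0_location(block: int, num_disks: int) -> tuple[int, int]:
    return block % num_disks, block // num_disks

for b in range(6):   # with two disks, blocks alternate: 0, 1, 0, 1, ...
    disk, row = raid0_location(b, num_disks=2)
    print(f"block A{b + 1} -> disk {disk}, stripe {row}")
```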
RAID 1
• RAID1 is ‘data mirroring’.
• Two copies of the data are held on two
physical disks, and the data is always
identical.
• Twice as many disks are required to store
the same data when compared to RAID 0.
• Array continues to operate so long as at
least one drive is functioning.
RAID 3
• a single redundant disk is used -- the
parity drive
• Parity bit is computed for the set of
individual bits in the same
position on all disks
• If a drive fails, parity information
on the redundant disk can be
used to calculate the data from
the failed disk “on the fly”
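The “on the fly” reconstruction is just XOR; a minimal sketch with made-up bit patterns:

```python
# XOR parity: the parity drive stores the XOR of same-position data, and
# XOR-ing the survivors regenerates whatever a failed drive held.
from functools import reduce

data_disks = [0b1011, 0b0110, 0b1100]            # same bit positions, 3 disks
parity = reduce(lambda a, b: a ^ b, data_disks)  # written to the parity drive

lost = data_disks[1]                             # pretend disk 1 fails
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = reduce(lambda a, b: a ^ b, survivors)
assert rebuilt == lost                           # data recovered "on the fly"
print(bin(rebuilt))                              # 0b110
```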
RAID 5
• RAID 5 is an ideal combination of
good performance, good fault tolerance
and high capacity and storage
efficiency.
• An arrangement of parity and CRC helps
rebuild drive data in case of
disk failures.
• “Distributed parity” is the key phrase
here.
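To make “distributed parity” concrete: instead of a dedicated parity drive, the parity block rotates across the disks from stripe to stripe. The layout below is one common left-symmetric scheme, offered as an assumption rather than the only possibility:

```python
# One common RAID 5 parity rotation (left-symmetric layout, assumed here).
def raid5_parity_disk(stripe: int, num_disks: int) -> int:
    return (num_disks - 1 - stripe) % num_disks

for stripe in range(4):   # with 4 disks the parity walks 3, 2, 1, 0, 3, ...
    print(f"stripe {stripe}: parity on disk {raid5_parity_disk(stripe, 4)}")
```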
RAID 10
a. Combines RAID 1 and RAID 0.
b. This provides both good performance
and good failover handling.
c. Also called ‘Nested RAID’.