Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Dr.S.Gunasundari
Mr. A. Arockia Abins &
Ms. R. Amirthavalli,
CSE,
Velammal Engineering College
Subject Code / Name:
19IT202T /
Computer Architecture
Syllabus – Unit V
UNIT-V MEMORY & I/O SYSTEMS
Memory Hierarchy – memory technologies – Cache Memory
– Performance Considerations – Virtual Memory, TLBs –
Accessing I/O devices – Interrupts – Direct Memory Access –
Bus Structure – Bus operation.
Text Books
● Book 1:
● Name: Computer Organization and Design: The
Hardware/Software Interface
● Authors: David A. Patterson and John L. Hennessy
● Publisher: Morgan Kaufmann / Elsevier
● Edition: Fifth Edition, 2014
● Book 2:
● Name: Computer Organization and Embedded Systems
● Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig
Manjikian
● Publisher: Tata McGraw Hill
● Edition: Sixth Edition, 2012
The Memory
System
Memory
● A computer memory is the storage space in the
computer system where both data and the
program instructions are stored.
Memory size
● bit – ‘0’ or ‘1’
● 8 bits – 1 byte
Traditional Architecture
[Figure: Connection of the memory to the processor. The processor's MAR
drives a k-bit address bus and its MDR connects to an n-bit data bus;
control lines (R/W, MFC, etc.) complete the connection. The memory holds
up to 2^k addressable locations, with a word length of n bits.]
Basic Concepts
● The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses → 2^16 = 64K memory locations
● Most modern computers are byte-addressable.
[Figure: Byte and word address assignments in a byte-addressable memory
with 32-bit words. (a) Big-endian assignment: word 0 holds bytes 0, 1, 2, 3,
with byte 0 as the most significant byte; the last word holds bytes
2^k-4 … 2^k-1. (b) Little-endian assignment: word 0 holds bytes 3, 2, 1, 0,
with byte 0 as the least significant byte.]
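As a quick check of the two assignments, Python's struct module can pack a 32-bit word both ways (a sketch added for illustration; the word value 0x12345678 is an assumed example):

```python
import struct

# Hypothetical 32-bit word value stored at word address 0.
word = 0x12345678

big = struct.pack(">I", word)     # big-endian byte order
little = struct.pack("<I", word)  # little-endian byte order

# Big-endian: most significant byte at the lowest byte address.
assert list(big) == [0x12, 0x34, 0x56, 0x78]
# Little-endian: least significant byte at the lowest byte address.
assert list(little) == [0x78, 0x56, 0x34, 0x12]
```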
Internal organization of memory chips
Memory hierarchy
Memory hierarchy
● A structure that uses multiple levels of memories; as
the distance from the processor increases, the size
of the memories and the access time both
increase.
Speed, Size, and Cost
[Figure: Memory hierarchy – processor registers at the top, then the
primary (L1) cache, the secondary (L2) cache, main memory, and
magnetic-disk secondary memory. Moving down the hierarchy the size
increases; moving up, the speed and the cost per bit increase.]
● Check whether the following statements are true or
not:
● Most of the cost of the memory hierarchy
is at the highest level.
● Most of the capacity of the memory
hierarchy is at the lowest level.
Basic Terms in Memory
1. Hit: If the data requested by the processor appears in some block in the
upper level, this is called a hit.
2. Miss: If the data is not found in the upper level, the request is called a
miss. The lower level in the hierarchy is then accessed to retrieve the block
containing the requested data.
3. Hit rate or hit ratio: The fraction of memory accesses found in the upper
level; it is often used as a measure of the performance of the memory
hierarchy.
4. Miss rate: The fraction of memory accesses not found in the upper level.
5. Hit time: The time to access the upper level of the memory hierarchy,
which includes the time needed to determine whether the access is a hit or
a miss.
6. Miss penalty: The time to replace a block in the upper level with the
corresponding block from the lower level, plus the time to deliver this block
to the processor.
Memory Technologies
Four primary technologies:
1. Main memory -- DRAM
2. Cache memory -- SRAM
3. Flash memory -- nonvolatile memory used as the
secondary memory in Personal Mobile Devices
4. Secondary memory -- magnetic disk
SRAM Cell
◼ Two transistor inverters are cross-connected to implement a basic
flip-flop.
◼ The cell is connected to one word line and two bit lines by transistors
T1 and T2.
◼ When the word line is at ground level, the transistors are turned off and
the latch retains its state.
◼ Read operation: to read the state of the SRAM cell, the word line is
activated to close switches T1 and T2. Sense/Write circuits at the
bottom monitor the states of b and b′.
[Figure: SRAM cell – a latch formed by cross-coupled inverters (points X
and Y) is connected to the bit lines b and b′ through transistors T1 and
T2, which are controlled by the word line.]
SRAM Technology
● The circuits are capable of retaining their state as long as
power is applied.
● value is stored on a pair of inverting gates
● very fast but takes up more space than DRAM (4 to 6 transistors)
● SRAMs are said to be volatile memories because their contents are
lost when power is interrupted.
● Static RAMs can be accessed very quickly
DRAMs
● Static RAMs are fast, but they occupy more area and are more expensive.
● Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their
state indefinitely – they need to be periodically refreshed.
● value is stored as a charge on capacitor (must be refreshed)
● very small but slower than SRAM (factor of 5 to 10)
[Figure: Single-transistor dynamic memory cell – a capacitor C stores the
bit value and is connected to the bit line through transistor T, which is
controlled by the word line.]
◼ Static RAMs (SRAMs):
▪ Consist of circuits that are capable of retaining their state as long
as the power is applied.
▪ Volatile memories, because their contents are lost when power is
interrupted.
▪ Access times of static RAMs are in the range of a few nanoseconds.
▪ However, the cost is usually high.
◼ Dynamic RAMs (DRAMs):
▪ Do not retain their state indefinitely.
▪ Contents must be periodically refreshed.
▪ Contents may be refreshed while accessing them for reading.
Questions
◼ What is meant by memory hierarchy?
◼ What factors are to be considered in a memory
hierarchy?
◼ Define hit and miss.
◼ Compare SRAM and DRAM.
Latency & Bandwidth
◼ Memory latency is the time it takes to transfer a
word of data to or from memory
◼ Memory bandwidth is the number of bits or bytes
that can be transferred in one second.
Flash Memory
▪ Uses an approach similar to EEPROM.
▪ The contents of a single cell can be read, but writes replace the contents
of an entire block of cells.
▪ Flash devices have greater density.
▪ Higher capacity and lower storage cost per bit.
▪ Power consumption of flash memory is very low, making it attractive for
use in equipment that is battery-driven.
▪ Single flash chips are not sufficiently large, so
larger memory modules are implemented using
flash cards and flash drives.
Secondary Memory
● Permanent memory or Non-volatile memory
● Used to store programs and data on a long-term
basis
● Storage capacity : very high (ex: 1 TB)
● Access speed : very slow
Organization of Data on a Disk
[Figure: Organization of one surface of a disk – concentric tracks, each
divided into sectors (e.g. sector 0 of track 0, sector 0 of track 1, …,
sector 3 of track n).]
Secondary Memory
Tracks divided into sectors
Magnetic Disk Structure
Surface organized into tracks
Disk Access
[Figure: Head in position above a track; the disk rotates counter-clockwise.]
Disk Access – Read
[Figure: About to read the blue sector; the read completes after the sector
passes under the head.]
Disk Access – Read
[Figure: After the blue read, the red request is scheduled next.]
Disk Access – Seek
[Figure: Seek to red's track.]
Disk Access – Rotational Latency
[Figure: Wait for the red sector to rotate around under the head.]
Disk Access – Read
[Figure: Complete the read of red.]
Disk Access – Service Time Components
Service time = Seek + Rotational Latency + Data Transfer.
Speed, Size, and Cost
◼ A big challenge in the design of a computer system is to
provide a sufficiently large memory, with a reasonable
speed at an affordable cost.
◼ Static RAM:
▪ Very fast, but expensive, because a basic SRAM cell has a complex
circuit making it impossible to pack a large number of cells onto a single
chip.
◼ Dynamic RAM:
▪ Simpler basic cell circuit, hence are much less expensive, but significantly
slower than SRAMs.
◼ Magnetic disks:
▪ Storage provided by DRAMs is higher than SRAMs, but is still less than
what is necessary.
▪ Secondary storage such as magnetic disks provide a large amount
of storage, but is much slower than DRAMs.
Cache Memories
Cache
● What is cache memory?
● Why do we need it?
● Locality of reference (very important)
- temporal
- spatial
● Cache block – cache line
● A set of contiguous address locations of some size
What is cache memory?
● Cache memory is a small, high-speed volatile memory
that provides high-speed data access to the processor.
● The cache memory stores the frequently used
computer programs, applications and the program
data.
Locality of Reference
◼ Analysis of programs indicates that many instructions
in localized areas of a program are executed
repeatedly during some period of time, while the
others are accessed relatively less frequently.
▪ These instructions may be the ones in a loop, nested loop or few
procedures calling each other repeatedly.
▪ This is called “locality of reference”.
◼ Temporal locality of reference:
▪ Recently executed instruction is likely to be executed again very
soon.
◼ Spatial locality of reference:
▪ Instructions with addresses close to a recently executed instruction
are likely to be executed soon.
Cache
● Replacement algorithm
● Hit / miss
● Write-through / Write-back
● Load through
Use of a cache memory:
[Figure: Processor ↔ Cache ↔ Main memory.]
Cache hit
• If the data is in the cache it is called a Read or Write hit.
• Read hit:
▪ The data is obtained from the cache.
• Write hit:
▪ Cache has a replica of the contents of the main memory.
▪ Contents of the cache and the main memory may be updated
simultaneously. This is the write-through protocol.
▪ Update the contents of the cache, and mark it as updated by setting a bit
known as the dirty bit or modified bit. The contents of the main
memory are updated when this block is replaced. This is write-back
or copy-back protocol.
Cache miss
• If the data is not present in the cache, then a Read miss
or Write miss occurs.
• Read miss:
▪ Block of words containing this requested word is transferred from the memory.
▪ After the block is transferred, the desired word is forwarded to the processor.
▪ The desired word may also be forwarded to the processor as soon as it is transferred
without waiting for the entire block to be transferred. This is called load-through or
early-restart.
• Write miss:
▪ If the write-through protocol is used, the contents of the main memory are
updated directly.
▪ If write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word
is overwritten with new information.
Mapping functions
◼ Mapping functions determine how memory blocks
are placed in the cache.
◼ A simple processor example:
▪ Cache consisting of 128 blocks of 16 words each.
▪ Total size of cache is 2048 (2K) words.
▪ Main memory is addressable by a 16-bit address.
▪ Main memory has 64K words.
▪ Main memory has 4K blocks of 16 words each.
◼ Three mapping functions:
▪ Direct mapping
▪ Associative mapping
▪ Set-associative mapping.
Direct Mapping
[Figure 5.15: Direct-mapped cache – main memory blocks 0–4095 map onto
cache blocks 0–127; each cache block carries a tag.]
Block j of main memory maps onto block j modulo 128 of the cache.
Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)
● 4 word bits: select one of 16 words (each block has 16 = 2^4 words).
● 7 block bits: point to a particular block in the cache (128 = 2^7).
● 5 tag bits: compared with the tag bits associated with that cache
location, to identify which of the 32 memory blocks that map there
(4096/128 = 32) is currently resident in the cache.
Direct Mapping
Example address 11101 1111111 1100 (Tag | Block | Word):
● Tag: 11101
● Block: 1111111 = 127, the 127th block of the cache
● Word: 1100 = 12, the 12th word of the 127th block in the cache
Associative Mapping
[Figure 5.16: Associative-mapped cache – any main memory block (0–4095)
can be placed in any cache block (0–127); each cache block stores a
12-bit tag.]
Main memory address: Tag (12 bits) | Word (4 bits)
● 4 word bits: select one of 16 words (each block has 16 = 2^4 words).
● 12 tag bits: identify which of the 4096 = 2^12 memory blocks is resident
in a cache block.
Associative Mapping
Example address 111011111111 1100 (Tag | Word):
● Tag: 111011111111
● Word: 1100 = 12, the 12th word of a block in the cache
Set-Associative Mapping
[Figure 5.17: Set-associative-mapped cache with two blocks per set – the
128 cache blocks are grouped into 64 sets (Set 0: blocks 0–1, …, Set 63:
blocks 126–127); main memory blocks 0–4095 map onto the sets, and each
cache block carries a tag.]
Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)
● 4 word bits: select one of 16 words (each block has 16 = 2^4 words).
● 6 set bits: point to a particular set in the cache (128/2 = 64 = 2^6).
● 6 tag bits: checked to see whether the desired block is present in that
set (4096/64 = 2^6 memory blocks map to each set).
Set-Associative Mapping
Example address 111011 111111 1100 (Tag | Set | Word):
● Tag: 111011
● Set: 111111 = 63, the 63rd set of the cache
● Word: 1100 = 12, the 12th word of the block within that set
Replacement Algorithms
● It is difficult to determine which blocks to kick out.
● Least Recently Used (LRU) block replacement.
● The cache controller tracks references to all blocks as
computation proceeds.
● The usage counters are incremented or cleared when a hit
or a miss occurs.
Questions
● List out the mapping functions.
● What is meant by cache memory?
Virtual
Memories
Overview
● Physical main memory is not as large as the address space
spanned by an address issued by the processor.
2^32 = 4 GB, 2^64 = …
● When a program does not completely fit into the main
memory, the parts of it not currently being executed are
stored on secondary storage devices.
● Techniques that automatically move program and data
blocks into the physical main memory when they are
required for execution are called virtual-memory
techniques.
● Virtual addresses will be translated into physical addresses.
Virtual Memory
◼ Virtual memory is an architectural solution to increase the effective
size of the memory system.
Overview
● Memory Management Unit (MMU): hardware that translates virtual
addresses into physical addresses.
Address Translation
● All programs and data are composed of fixed-length units called
pages, each of which consists of a block of words that occupy
contiguous locations in the main memory.
● Page cannot be too small or too large.
● The virtual memory mechanism bridges the size and
speed gaps between the main memory and
secondary storage – similar to cache.
Address Translation
[Figure 5.27: Virtual-memory address translation – the virtual address
from the processor is split into a virtual page number and an offset.
The page table base register plus the virtual page number locate the
entry in the page table in memory; the entry holds control bits and the
page frame number, and the page frame combined with the offset forms the
physical address in main memory.]
Address Translation
● The page table information is used by the MMU for
every access, so it is supposed to be with the MMU.
● However, since MMU is on the processor chip and
the page table is rather large, only small portion of
it, which consists of the page table entries that
correspond to the most recently accessed pages,
can be accommodated within the MMU.
● Translation Lookaside Buffer (TLB)
TLB
[Figure 5.28: Use of an associative-mapped TLB – the virtual page number
from the processor is compared (=?) with the virtual page numbers held in
the TLB. On a hit, the corresponding page frame number is combined with
the offset to form the physical address in main memory; on a miss, the
page table in memory must be consulted.]
Address translation (contd..)
◼ What happens if a program generates an access
to a page that is not in the main memory?
◼ In this case, a page fault is said to occur.
▪ Whole page must be brought into the main memory from the
disk, before the execution can proceed.
◼ When the MMU detects a page fault, the
following actions occur:
▪ MMU asks the operating system to intervene by raising an
exception.
▪ Processing of the active task which caused the page fault is
interrupted.
▪ Control is transferred to the operating system.
▪ Operating system copies the requested page from secondary
storage to the main memory.
▪ Once the page is copied, control is returned to the task which
was interrupted.
Address translation (contd..)
◼ When a new page is to be brought into the main
memory from secondary storage, the main memory
may be full.
▪ Some page from the main memory must be replaced with this new
page.
◼ How to choose which page to replace?
▪ This is similar to the replacement that occurs when the cache is full.
▪ The principle of locality of reference can also be applied here.
▪ A replacement strategy similar to LRU can be applied.
◼ Since the size of the main memory is relatively larger
compared to cache, a relatively large amount of
programs and data can be held in the main memory.
▪ Minimizes the frequency of transfers between secondary storage and
main memory.
Cache & Virtual Memory
◼ Cache memory:
▪ Introduced to bridge the speed gap between the processor and
the main memory.
▪ Implemented in hardware.
◼ Virtual memory:
▪ Introduced to bridge the speed gap between the main memory
and secondary storage.
▪ Implemented in part by software.
INPUT /
OUTPUT
ORGANIZATION
Accessing I/O
Devices
Accessing I/O devices
Bus
I/O device 1 I/O device n
Processor Memory
• Multiple I/O devices may be connected to the processor and the memory via a bus.
• Bus consists of three sets of lines to carry address, data and control signals.
• Each I/O device is assigned a unique address.
• To access an I/O device, the processor places the address on the address lines.
• The device recognizes the address, and responds to the control signals.
Accessing I/O devices
(contd..)
 I/O devices and the memory may share the same
address space:
 Memory-mapped I/O.
 Any machine instruction that can access memory can be used to transfer
data to or from an I/O device.
 Simpler software.
 I/O devices and the memory may have different address
spaces:
 Special instructions to transfer data to and from I/O devices.
 I/O devices may have to deal with fewer address lines.
 I/O address lines need not be physically separate from memory address
lines.
 In fact, address lines may be shared between I/O devices and memory,
with a control signal to indicate whether it is a memory address or an I/O
address.
Accessing I/O devices
(contd..)
[Figure: An input device connected to the bus through an I/O interface –
the interface contains an address decoder, data and status registers, and
control circuits, and attaches to the bus's address, data, and control
lines.]
• I/O device is connected to the bus using an I/O interface circuit which has:
- Address decoder, control circuit, and data and status registers.
• Address decoder decodes the address placed on the address lines thus enabling the
device to recognize its address.
• Data register holds the data being transferred to or from the processor.
• Status register holds information necessary for the operation of the I/O device.
• Data and status registers are connected to the data lines, and have unique addresses.
• I/O interface circuit coordinates I/O transfers.
Accessing I/O devices
(contd..)
 Recall that the rate of transfer to and from I/O
devices is slower than the speed of the processor.
This creates the need for mechanisms to synchronize
data transfers between them.
 Program-controlled I/O:
 Processor repeatedly monitors a status flag to achieve the
necessary synchronization.
 Processor polls the I/O device.
 Two other mechanisms used for synchronizing data
transfers between the processor and memory:
 Interrupts.
 Direct Memory Access.
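Program-controlled I/O reduces to a busy-wait loop on a status flag; a toy sketch (the Device model and its countdown are invented for illustration, not a real device interface):

```python
import random

class Device:
    """Toy device model: the status flag becomes ready after a few polls."""
    def __init__(self):
        self.ready = False
        self._countdown = random.randint(1, 5)

    def poll_status(self):
        # Each poll models one status-register read by the processor.
        self._countdown -= 1
        if self._countdown <= 0:
            self.ready = True
        return self.ready

    def read_data(self):
        return 0x41  # arbitrary data byte

dev = Device()
while not dev.poll_status():  # the processor busy-waits on the flag
    pass
data = dev.read_data()
```

While the loop spins, the processor does no useful work, which is exactly the drawback interrupts and DMA address.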
Interrupts
Interrupts
• In program-controlled I/O, when the processor
continuously monitors the status of the device, it
does not perform any useful tasks.
• An alternate approach would be for the I/O device
to alert the processor when it becomes ready.
o Do so by sending a hardware signal called an interrupt to the
processor.
o At least one of the bus control lines, called an interrupt-request
line is dedicated for this purpose.
• Processor can perform other useful tasks while it is
waiting for the device to be ready.
Interrupt
• An interrupt is an event that causes the
execution of one program to be suspended and the
execution of another program to begin.
Interrupts (contd..)
[Figure: Program 1 is executing the instruction at address i when an
interrupt occurs; control transfers to the interrupt-service routine,
and after it completes, execution resumes at address i+1.]
• Processor is executing the instruction located at address i when an interrupt occurs.
• Routine executed in response to an interrupt request is called the interrupt-service routine.
• When an interrupt occurs, control must be transferred to the interrupt service routine.
• But before transferring control, the current contents of the PC (i+1), must be saved in a known
location.
• This will enable the return-from-interrupt instruction to resume execution at i+1.
• Return address, or the contents of the PC are usually stored on the processor stack.
Example
• Some computations + print
• Two subroutines: COMPUTE and PRINT
• The printer accepts only one line of text at a time.
• Try to overlap printing and computation.
 COMPUTE produces the first n lines of text;
 PRINT sends the first line to the printer; then PRINT is suspended;
COMPUTE continues to perform other computations;
 After the printer finishes printing the first line, it sends an
interrupt-request signal to the processor;
 In response, the processor interrupts execution of COMPUTE
and transfers control to PRINT to send the next line;
 COMPUTE resumes;
 …
Handling Multiple
Devices
• How can the processor recognize the device requesting an
interrupt?
• Given that different devices are likely to require different
interrupt-service routines, how can the processor obtain the
starting address of the appropriate routine in each case?
• (Vectored interrupts)
• Should a device be allowed to interrupt the processor while
another interrupt is being serviced?
• (Interrupt nesting)
• How should two or more simultaneous interrupt requests be
handled?
• (Daisy-chain)
Vectored Interrupts
• The device requesting an interrupt may identify itself
directly to the processor.
o The device can do so by sending a special code (4 to 8 bits) to the
processor over the bus.
o Code supplied by the device may represent a part of the starting
address of the interrupt-service routine.
o The remainder of the starting address is obtained by the processor
based on other information such as the range of memory
addresses where interrupt service routines are located.
• Usually the location pointed to by the interrupting
device is used to store the starting address of the
interrupt-service routine.
Interrupt Nesting
[Figure: Each device (Device 1 … Device p) has its own interrupt-request
(INTR1 … INTRp) and interrupt-acknowledge (INTA1 … INTAp) line connected
to the priority arbitration circuit in the processor.]
• Each device has a separate interrupt-request and interrupt-acknowledge line.
• Each interrupt-request line is assigned a different priority level.
• Interrupt requests received over these lines are sent to a priority arbitration circuit
in the processor.
• If the interrupt request has a higher priority level than the priority of the processor,
then the request is accepted.
Interrupts (contd..)
[Figure: Devices 1 … n share a common interrupt-request line INTR; the
interrupt-acknowledge line INTA is daisy-chained through the devices.]
Polling scheme:
• The processor polls the status registers of the I/O devices to determine
which device is requesting an interrupt.
• In this case the priority is determined by the order in which the devices are polled.
• The first device with status bit set to 1 is the device whose interrupt request is
accepted.
Daisy chain scheme:
• Devices are connected to form a daisy chain.
• Devices share the interrupt-request line, and interrupt-acknowledge line is connected
to form a daisy chain.
• When devices raise an interrupt request, the interrupt-request line is activated.
• The processor in response activates interrupt-acknowledge.
• Received by device 1, if device 1 does not need service, it passes the signal to device 2.
• Device that is electrically closest to the processor has the highest priority.
Interrupts (contd..)
• When I/O devices were organized into a priority structure, each device had its own
interrupt-request and interrupt-acknowledge line.
• When I/O devices were organized in a daisy chain fashion, the devices shared an
interrupt-request line, and the interrupt-acknowledge propagated through the devices.
• A combination of the priority structure and the daisy chain scheme can also be used.
[Figure: Combined scheme – groups of devices, each group with its own
daisy-chained INTR/INTA pair (INTR1/INTA1 … INTRp/INTAp) into the
priority arbitration circuit in the processor.]
• Devices are organized into groups.
• Each group is assigned a different priority level.
• All the devices within a single group share an interrupt-request line, and are
connected to form a daisy chain.
Exceptions
 So far, we have considered interrupts caused by interrupt-requests
sent by I/O devices.
 Interrupts could be used in many other situations where
the execution of one program needs to be suspended
and execution of another program needs to be started.
 In general, the term exception is used to refer to any
event that causes an interruption.
 Interrupt-requests from I/O devices are one type of exception.
 Other types of exceptions are:
 Recovery from errors
 Debugging
 Privilege exception
Direct Memory
Access
Direct Memory Access
(contd..)
 Direct Memory Access (DMA):
 A special control unit may be provided to transfer a block of data
directly between an I/O device and the main memory, without
continuous intervention by the processor.
 The control unit that performs these transfers is a part
of the I/O device's interface circuit. This control unit
is called a DMA controller.
 DMA controller performs functions that would be
normally carried out by the processor:
 For each word, it provides the memory address and all the control
signals.
 To transfer a block of data, it increments the memory addresses
and keeps track of the number of transfers.
Direct Memory Access
(contd..)
 DMA controller can transfer a block of data between an
external device and the main memory, without any intervention
from the processor.
 However, the operation of the DMA controller must be under the control of
a program executed by the processor. That is, the processor must initiate
the DMA transfer.
 To initiate the DMA transfer, the processor informs the DMA
controller of:
 Starting address,
 Number of words in the block.
 Direction of transfer (I/O device to the memory, or memory to the I/O
device).
 Once the DMA controller completes the DMA transfer, it
informs the processor by raising an interrupt signal.
Direct Memory Access
[Figure: A system bus connects the processor, main memory, a network
interface with its own DMA controller, a keyboard, a printer, and a
disk/DMA controller managing two disks.]
• A DMA controller connects a high-speed network interface to the computer bus.
• The disk controller, which controls two disks, also has DMA capability. It
provides two DMA channels.
• It can perform two independent DMA operations, as if each disk had its own
DMA controller. The registers that store the memory address, word count, and
status and control information are duplicated.
Direct Memory Access
(contd..)
 Processor and DMA controllers have to use the bus in an
interwoven fashion to access the memory.
 DMA devices are given higher priority than the processor to access the
bus.
 Among different DMA devices, high priority is given to high-speed
peripherals such as a disk or a graphics display device.
 Processor originates most memory access cycles on the
bus.
 DMA controller can be said to “steal” memory access cycles from the
bus. This interweaving technique is called “cycle stealing”.
 An alternate approach is to provide the DMA controller
exclusive capability to initiate transfers on the bus, and
hence exclusive access to the main memory. This is
known as the block or burst mode.
Bus arbitration
 Processor and DMA controllers both need to initiate
data transfers on the bus and access main memory.
 The device that is allowed to initiate transfers on the
bus at any given time is called the bus master.
 When the current bus master relinquishes its status as
the bus master, another device can acquire this status.
 The process by which the next device to become the bus master is
selected and bus mastership is transferred to it is called bus arbitration.
 Centralized arbitration:
 A single bus arbiter performs the arbitration.
 Distributed arbitration:
 All devices participate in the selection of the next bus master.
Centralized Bus Arbitration
[Figure: The processor grants the bus over a daisy-chained bus-grant line
(BG1 into DMA controller 1, passed on as BG2 to DMA controller 2); the
controllers share the Bus Request (BR) and Bus Busy (BBSY) lines.]
Centralized Bus
Arbitration(cont.,)
• Bus arbiter may be the processor or a separate unit
connected to the bus.
• Normally, the processor is the bus master, unless it grants bus
mastership to one of the DMA controllers.
• DMA controller requests the control of the bus by asserting the
Bus Request (BR) line.
• In response, the processor activates the Bus-Grant1 (BG1) line,
indicating that the controller may use the bus when it is free.
• BG1 signal is connected to all DMA controllers in a daisy chain
fashion.
• When the BBSY signal is 0, the bus is busy. When BBSY
becomes 1, the DMA controller that asserted BR can
acquire control of the bus.
Centralized arbitration
(contd..)
[Figure: Timing of the BR, BG1, BG2, and BBSY signals as bus mastership
passes from the processor to DMA controller 2 and back to the processor.]
• DMA controller 2 asserts the BR signal.
• The processor asserts the BG1 signal.
• The BG1 signal propagates through DMA controller 1 and reaches DMA
controller 2 as BG2.
• The processor relinquishes control of the bus by setting BBSY to 1.
Distributed arbitration
 All devices waiting to use the bus share the responsibility of
carrying out the arbitration process.
 Arbitration process does not depend on a central arbiter and hence
distributed arbitration has higher reliability.
 Each device is assigned a 4-bit ID number.
 All the devices are connected using 5 lines, 4 arbitration lines
to transmit the ID, and one line for the Start-Arbitration signal.
 To request the bus a device:
 Asserts the Start-Arbitration signal.
 Places its 4-bit ID number on the arbitration lines.
 The pattern that appears on the arbitration lines is the
logical-OR of all the 4-bit device IDs placed on the arbitration
lines.
Distributed arbitration
[Figure: Devices connected to the four open-collector arbitration lines
and the Start-Arbitration line.]
Distributed
arbitration(Contd.,)
• Arbitration process:
o Each device compares the pattern that appears on the arbitration lines to
its own ID, starting with MSB.
o If it detects a difference, it transmits 0s on the arbitration lines for that and
all lower bit positions.
o The pattern that appears on the arbitration lines is the logical-OR of all the
4-bit device IDs placed on the arbitration lines.
Distributed arbitration
(contd..)
• Device A has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
• Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
• Pattern that appears on the arbitration lines is the logical OR of the patterns:
- Pattern 0111 appears on the arbitration lines.
Arbitration process:
• Each device compares the pattern that appears on the arbitration lines to its own
ID, starting with MSB.
• If it detects a difference, it transmits 0s on the arbitration lines for that and all lower
bit positions.
• Device A compares its ID pattern 0101 to the pattern 0111 on the arbitration lines.
• It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the
arbitration lines.
• The pattern that appears on the arbitration lines is the logical-OR of 0100 and 0110,
which is 0110.
• This pattern is the same as the device ID of B, and hence B has won the arbitration.
Buses
Buses
• Processor, main memory, and I/O devices are
interconnected by means of a bus.
• Bus provides a communication path for the transfer
of data.
o Bus also includes lines to support interrupts and arbitration.
• A bus protocol is the set of rules that govern the
behavior of various devices connected to the bus,
as to when to place information on the bus, when
to assert control signals, etc.
Buses (contd..)
 Bus lines may be grouped into three types:
 Data
 Address
 Control
 Control signals specify:
 Whether it is a read or a write operation.
 Required size of the data, when several operand sizes (byte, word,
long word) are possible.
 Timing information to indicate when the processor and I/O devices
may place data or receive data from the bus.
 Schemes for timing of data transfers over a bus can be
classified into:
 Synchronous,
 Asynchronous.
Synchronous bus
[Figure: Bus clock waveform – one clock period defines a bus cycle.]
Synchronous bus (contd..)
[Figure: Timing of an input (Read) transfer on a synchronous bus. At time
t0 the master places the device address and command on the bus and
indicates that it is a Read operation; at t1 the addressed slave places
the data on the data lines; at t2 the master “strobes” the data on the
data lines into its input buffer. The interval from t0 to t2 is one bus
cycle.]
• In the case of a Write operation, the master places the data on the bus along
with the address and commands at time t0.
• The slave strobes the data into its input buffer at time t2.
Synchronous bus (contd..)
• Once the master places the device address and
command on the bus, it takes time for this information
to propagate to the devices:
o This time depends on the physical and electrical characteristics of
the bus.
• Also, all the devices have to be given enough time to
decode the address and control signals, so that the
addressed slave can place data on the bus.
• Width of the pulse t1 - t0 depends on:
o Maximum propagation delay between two devices connected to
the bus.
o Time taken by all the devices to decode the address and control
signals, so that the addressed slave can respond at time t1.
Synchronous bus (contd..)
• At the end of the clock cycle, at time t2, the master
strobes the data on the data lines into its input
buffer if it’s a Read operation.
o “Strobe” means to capture the values of the data and store them
into a buffer.
• When data are to be loaded into a storage buffer
register, the data should be available for a period
longer than the setup time of the device.
• Width of the pulse t2 - t1 should be longer than:
o Maximum propagation time of the bus plus
o Set up time of the input buffer register of the master.
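A quick back-of-the-envelope calculation makes this constraint concrete. The delay values below are assumed for illustration only, not taken from the slides:

```python
# Assumed timing parameters (illustrative values):
prop_delay = 4   # ns, maximum propagation delay between two devices on the bus
decode     = 6   # ns, time for the slowest device to decode address/control signals
setup      = 2   # ns, setup time of the master's input buffer register

t1_t0 = prop_delay + decode    # minimum width of the t1 - t0 interval
t2_t1 = prop_delay + setup     # minimum width of the t2 - t1 interval
min_period = t1_t0 + t2_t1     # minimum clock period t2 - t0
print(f"minimum bus clock period: {min_period} ns")
```

With these numbers the clock period cannot be shorter than 16 ns, so the bus clock can run at roughly 62 MHz at most; a slower device on the bus would lower this further.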
Synchronous bus (contd..)
[Timing diagram: the same synchronous Read transfer, showing each signal as seen by the master and as seen by the slave.]
• Signals do not appear on the bus as soon as they are placed on the bus, due to the
propagation delay in the interface circuits.
• Signals reach the devices after a propagation delay which depends on the
characteristics of the bus.
• Data must remain on the bus for some time after t2 equal to the hold time of the buffer.
• In the diagram, tAM marks when the address & command appear on the bus, tAS when
they reach the slave, tDS when the data appears on the bus, and tDM when the data
reaches the master.
Synchronous bus (contd..)
• Data transfer has to be completed within one clock
cycle.
o Clock period t2 - t0 must be such that the longest propagation
delay on the bus and the slowest device interface must be
accommodated.
o Forces all the devices to operate at the speed of the slowest
device.
• Processor just assumes that the data are available
at t2 in case of a Read operation, or are read by
the device in case of a Write operation.
o What if the device has actually failed, and never responds?
Synchronous bus (contd..)
• Most buses have control signals to represent a
response from the slave.
• Control signals serve two purposes:
o Inform the master that the slave has recognized the address, and
is ready to participate in a data transfer operation.
o Enable the master to adjust the duration of the data transfer operation
based on the speed of the participating slaves.
• When a high-frequency bus clock is used:
o Data transfer spans several clock cycles instead of just one clock
cycle as in the earlier case.
Synchronous bus (contd..)
[Timing diagram: a Read transfer spanning four clock cycles, showing the Clock, Address, Command, Data, and Slave-ready lines.]
• Address & command requesting a Read operation appear on the bus.
• Slave places the data on the bus, and asserts the Slave-ready signal.
• Master strobes the data into the input buffer.
• Clock changes are seen by all the devices at the same time.
Asynchronous bus
 Data transfers on the bus are controlled by a handshake
between the master and the slave.
 Common clock in the synchronous bus case is replaced
by two timing control lines:
 Master-ready,
 Slave-ready.
 Master-ready signal is asserted by the master to indicate
to the slave that it is ready to participate in a data
transfer.
 Slave-ready signal is asserted by the slave in response to
the master-ready from the master, and it indicates to the
master that the slave is ready to participate in a data
transfer.
Asynchronous bus (contd..)
• Data transfer using the handshake protocol:
o Master places the address and command information on the bus.
o Asserts the Master-ready signal to indicate to the slaves that the
address and command information has been placed on the bus.
o All devices on the bus decode the address.
o Addressed slave performs the required operation, and informs the
processor that it has done so by asserting the Slave-ready signal.
o Master removes all the signals from the bus, once Slave-ready is
asserted.
o If the operation is a Read operation, Master also strobes the data
into its input buffer.
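The handshake steps above can be sketched as a small simulation. This is an illustrative model with names of our own choosing, not a real bus interface:

```python
class Slave:
    """Models an addressed slave device holding some readable locations."""
    def __init__(self, memory):
        self.memory = memory

    def respond(self, address):
        # Slave decodes the address, performs the read, places the data
        # on the bus, and asserts Slave-ready.
        return self.memory[address], True          # (data, slave_ready)

def read_transfer(address, slave):
    """One Read transfer using the Master-ready / Slave-ready handshake."""
    master_ready = True                            # master asserts Master-ready
    data, slave_ready = slave.respond(address)     # addressed slave responds
    assert slave_ready                             # master waits for Slave-ready
    buffer = data                                  # master strobes data into its buffer
    master_ready = False                           # master removes signals from the bus
    return buffer

slave = Slave({0x10: 0xAB})
print(hex(read_transfer(0x10, slave)))             # prints 0xab
```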
Asynchronous bus (contd..)
[Timing diagram: handshake control of a data transfer, showing the Address and command, Master-ready, Data, and Slave-ready lines over one bus cycle, with events at t0 through t5.]
t0 - Master places the address and command information on the bus.
t1 - Master asserts the Master-ready signal. It is asserted at t1 rather than t0 so that
the address and command signals can settle on the bus (allowing for bus skew)
before the slaves act on them.
t2 - Addressed slave places the data on the bus and asserts the Slave-ready signal.
t3 - Slave-ready signal arrives at the master, which strobes the data into its input buffer.
t4 - Master removes the address and command information from the bus.
t5 - Slave receives the transition of the Master-ready signal from 1 to 0. It removes
the data and the Slave-ready signal from the bus.
Asynchronous vs. Synchronous bus
• Advantages of asynchronous bus:
o Eliminates the need for a common clock to synchronize the sender
and the receiver.
o Can accommodate varying delays automatically, using the
Slave-ready signal.
• Disadvantages of asynchronous bus:
o Data transfer rate with a full handshake is limited by two round-trip
delays.
o Data transfers using a synchronous bus involve only one round-trip
delay, and hence a synchronous bus can achieve faster transfer
rates.
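The two-round-trip penalty can be quantified with assumed numbers (the 10 ns one-way delay below is illustrative only):

```python
one_way = 10                    # ns, assumed end-to-end bus propagation delay
sync_cycle  = 2 * one_way       # synchronous: roughly one round-trip delay per transfer
async_cycle = 4 * one_way       # full handshake: two round-trip delays per transfer
for name, t in [("synchronous", sync_cycle), ("asynchronous", async_cycle)]:
    print(f"{name}: {t} ns per transfer, about {1e3 / t:.0f} million transfers/s")
```

Under this simple model the synchronous bus sustains twice the transfer rate, which is the trade-off described above.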
Thank You…

More Related Content

PPTX
MEMORY & I/O SYSTEMS
PPTX
CA UNIT V..pptx
PPTX
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
PPT
Internal Memory FIT NED UNIVERSITY OF EN
PPTX
Computer Architecture and Organization--Memory and I/O
PDF
Unit IV Memory and I/O Organization
PPT
Computer Organisation and Architecture
PPTX
Unit 4 Memory Organization_Computer Organization and Architecture
MEMORY & I/O SYSTEMS
CA UNIT V..pptx
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
Internal Memory FIT NED UNIVERSITY OF EN
Computer Architecture and Organization--Memory and I/O
Unit IV Memory and I/O Organization
Computer Organisation and Architecture
Unit 4 Memory Organization_Computer Organization and Architecture

Similar to Computer Architecture Input Output Memory.pptx (20)

PDF
Architecture_L5 (3).pdf wwwwwwwwwwwwwwwwwwwwwwwwwww
PPT
Computer Architecture- University of Dodoma
PPT
Chapter 8 computer memory system overview
PPT
sramanddram.ppt
PPTX
Memory Organization
PPT
PDF
Computer organization memory
PDF
Detail explanation of memory organization
PPTX
COA notes
PPT
Memory Hierarchy PPT of Computer Organization
PPT
memory systems-module 3 presentation ppt
PPTX
Computer architecture memory system
PPTX
Memory hierarchy
PPT
Chapter5 the memory-system-jntuworld
PPTX
Computer organizatin.Chapter Seven.pptxs
PPT
Memory system in computer architecture system
PPT
computer chapter5-the memory system (1) (2).ppt
PPT
chapter5-the memory system chapter .ppt
PPTX
Introduction to the memory system embedded.pptx
PDF
Memory hierarchy.pdf
Architecture_L5 (3).pdf wwwwwwwwwwwwwwwwwwwwwwwwwww
Computer Architecture- University of Dodoma
Chapter 8 computer memory system overview
sramanddram.ppt
Memory Organization
Computer organization memory
Detail explanation of memory organization
COA notes
Memory Hierarchy PPT of Computer Organization
memory systems-module 3 presentation ppt
Computer architecture memory system
Memory hierarchy
Chapter5 the memory-system-jntuworld
Computer organizatin.Chapter Seven.pptxs
Memory system in computer architecture system
computer chapter5-the memory system (1) (2).ppt
chapter5-the memory system chapter .ppt
Introduction to the memory system embedded.pptx
Memory hierarchy.pdf
Ad

More from Gunasundari Selvaraj (9)

PPTX
Unit 4 Computer Architecture Multicore Processor.pptx
PPTX
PROCESSOR AND CONTROL UNIT - unit 3 Architecture
PDF
Computer Architecture: ARITHMETIC FOR COMPUTERS
PPTX
Introduction to Computer Architecture: unit 1
Unit 4 Computer Architecture Multicore Processor.pptx
PROCESSOR AND CONTROL UNIT - unit 3 Architecture
Computer Architecture: ARITHMETIC FOR COMPUTERS
Introduction to Computer Architecture: unit 1
Ad

Recently uploaded (20)

PPTX
TNA_Presentation-1-Final(SAVE)) (1).pptx
PDF
FORM 1 BIOLOGY MIND MAPS and their schemes
PPTX
History, Philosophy and sociology of education (1).pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PPTX
20th Century Theater, Methods, History.pptx
PDF
What if we spent less time fighting change, and more time building what’s rig...
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
PDF
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
PPTX
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
PDF
1_English_Language_Set_2.pdf probationary
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PDF
advance database management system book.pdf
PDF
IGGE1 Understanding the Self1234567891011
PPTX
Introduction to Building Materials
TNA_Presentation-1-Final(SAVE)) (1).pptx
FORM 1 BIOLOGY MIND MAPS and their schemes
History, Philosophy and sociology of education (1).pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
Share_Module_2_Power_conflict_and_negotiation.pptx
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
20th Century Theater, Methods, History.pptx
What if we spent less time fighting change, and more time building what’s rig...
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
1_English_Language_Set_2.pdf probationary
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
advance database management system book.pdf
IGGE1 Understanding the Self1234567891011
Introduction to Building Materials

Computer Architecture Input Output Memory.pptx

  • 1. Velammal Engineering College Department of Computer Science and Engineering Welcome… Dr.S.Gunasundari Mr. A. Arockia Abins & Ms. R. Amirthavalli, CSE, Velammal Engineering College
  • 2. Subject Code / Name: 19IT202T / Computer Architecture
  • 3. Syllabus – Unit V UNIT-V MEMORY & I/O SYSTEMS Memory Hierarchy – memory technologies – Cache Memory – Performance Considerations Virtual Memory, TLB’s – Accessing I/O devices – Interrupts – Direct Memory Access – Bus Structure – Bus operation.
  • 4. Text Books ● Book 1: ● Name: Computer Organization and Design: The Hardware/Software Interface ● Authors: David A. Patterson and John L. Hennessy ● Publisher: Morgan Kaufmann / Elsevier ● Edition: Fifth Edition, 2014 ● Book 2: ● Name: Computer Organization and Embedded Systems Interface ● Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian ● Publisher: Tata McGraw Hill ● Edition: Sixth Edition, 2012
  • 6. Memory ● A computer memory is the storage space in the computer system where both data and the program instructions are stored.
  • 7. Memory size ● bit – ‘0’ or ‘1’ ● 8 bits – 1 byte
  • 8. Traditional Architecture Up to 2k addressable MDR MAR Connection of the memory to the processor. k- bit address bus n- bit data bus Control lines ( , MFC, etc.) Processo r Memory locations Word length = n bits W R /
  • 9. Basic Concepts ● The maximum size of the memory that can be used in any computer is determined by the addressing scheme. 16-bit addresses = 216 = 64K memory locations ● Most modern computers are byte addressable. 2 k 4 - 2 k 3 - 2 k 2 - 2 k 1 - 2 k 4 - 2 k 4 - 0 1 2 3 4 5 6 7 0 0 4 2 k 1 - 2 k 2 - 2 k 3 - 2 k 4 - 3 2 1 0 7 6 5 4 Byte address Byte address (a) Big-endian assignment (b) Little-endian assignment 4 Word address • • • • • •
  • 10. Internal organization of memory chips
  • 12. Memory hierarchy ● A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
  • 13. Speed, Size, and Cost Processor Primary cach e Secondar y cach e Mai n Magnetic disk memory Increasin g siz e Increasing spee d Memory hierarchy secondary memory Increasing cost per bit Registers L 1 L2
  • 14. ● Check whether the following statements are true not? ● Most of the cost of the memory hierarchy is at the highest level. ● Most of the capacity of the memory hierarchy is at the lowest level.
  • 15. Basic Terms in Memory 1.HIT : If the data requested by the processor appears in some block in the upper level, this is called a hit . 2. Miss : If the data is not found in the upper level, the request is called a miss. The lower level in the hierarchy is then accessed to retrieve the block containing the requested data. 3. HIT RATE OR HIT RATIO : It is the fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy. 4.MISS RATE : It is the fraction of memory accesses not found in the upper level. 5. HIT TIME : It is the time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss. 6. MISS PENALTY : It is the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to
  • 16. Memory Technologies Four primary Technologies: 1. Main Memory -- DRAM 2. Cache Memory -- SRAM 3. Flash Memory -- nonvolatile memory is the secondary memory in Personal Mobile Devices 4.Secondary Memory -- Magnetic Disk
  • 17. SRAM Cell ◼ Two transistor inverters are cross connected to implement a basic flip-flop. ◼ The cell is connected to one word line and two bits lines by transistors T1 and T2 ◼ When word line is at ground level, the transistors are turned off and the latch retains its state ◼ Read operation: In order to read state of SRAM cell, the word line is activated to close switches T1 and T2. Sense/Write circuits at the bottom monitor the state of b and b’ Y X Word line Bit lines b T 2 T1 b ′
  • 18. SRAM Technology ● The circuits are capable of retaining their state as long as power is applied. ● value is stored on a pair of inverting gates ● very fast but takes up more space than DRAM (4 to 6 transistors) ● SRAMs are said to be volatile memories because their contents are lost when power is interrupted. ● Static RAMs can be accessed very quickly
  • 19. DRAMs ● Static RAMs are fast, but they cost more area and are more expensive. ● Dynamic RAMs (DRAMs) are cheap and area efficient, but they can not retain their state indefinitely – need to be periodically refreshed. ● value is stored as a charge on capacitor (must be refreshed) ● very small but slower than SRAM (factor of 5 to 10) single-transistor dynamic memory cell T C Word line Bit line
  • 20. ◼ Static RAMs (SRAMs): ▪ Consist of circuits that are capable of retaining their state as long as the power is applied. ▪ Volatile memories, because their contents are lost when power is interrupted. ▪ Access times of static RAMs are in the range of few nanoseconds. ▪ However, the cost is usually high. ◼ Dynamic RAMs (DRAMs): ▪ Do not retain their state indefinitely. ▪ Contents must be periodically refreshed. ▪ Contents may be refreshed while accessing them for reading.
  • 21. Questions ◼ What is mean by memory hierarchy? ◼ What are the factors to be considered in memory hierarchy? ◼ Define hit and miss. ◼ Comparison between SRAM and DRAM.
  • 22. Latency & Bandwidth ◼ Memory latency is the time it takes to transfer a word of data to or from memory ◼ Memory bandwidth is the number of bits or bytes that can be transferred in one second.
  • 23. Flash Memory ▪ Has similar approach to EEPROM. ▪ Read the contents of a single cell, but write the contents of an entire block of cells. ▪ Flash devices have greater density. ▪ Higher capacity and low storage cost per bit. ▪ Power consumption of flash memory is very low, making it attractive for use in equipment that is battery-driven. ▪ Single flash chips are not sufficiently large, so larger memory modules are implemented using flash cards and flash drives.
  • 24. Secondary Memory ● Permanent memory or Non-volatile memory ● Used to store programs and data on a long-term basis ● Storage capacity : very high (ex: 1 TB) ● Access speed : very slow
  • 25. Organization of Data on a Disk Sector 0, track 0 Sector 3, track n Organization of one surface of a disk. Sector 0, track 1
  • 27. Tracks divided into sectors Magnetic Disk Structure Surface organized into tracks
  • 28. Disk Access Head in position above a track
  • 29. Disk Access Rotation is counter-clockwise
  • 30. Disk Access – Read About to read blue sector
  • 31. Disk Access – Read After BLUE read After reading blue sector
  • 32. Disk Access – Read After BLUE read Red request scheduled next
  • 33. Disk Access – Seek After BLUE read Seek for RED Seek to red’s track
  • 34. Disk Access – Rotational Latency After BLUE read Seek for RED Rotational latency Wait for red sector to rotate around
  • 35. Disk Access – Read After BLUE read Seek for RED Rotational latency After RED read Complete read of red
  • 36. Disk Access – Service Time Components After BLUE read Seek for RED Rotational latency After RED read Seek Rotational Latency Data Transfer
  • 37. Speed, Size, and Cost ◼ A big challenge in the design of a computer system is to provide a sufficiently large memory, with a reasonable speed at an affordable cost. ◼ Static RAM: ▪ Very fast, but expensive, because a basic SRAM cell has a complex circuit making it impossible to pack a large number of cells onto a single chip. ◼ Dynamic RAM: ▪ Simpler basic cell circuit, hence are much less expensive, but significantly slower than SRAMs. ◼ Magnetic disks: ▪ Storage provided by DRAMs is higher than SRAMs, but is still less than what is necessary. ▪ Secondary storage such as magnetic disks provide a large amount of storage, but is much slower than DRAMs.
  • 39. Cache ● What is cache memory? ● Why we need it? ● Locality of reference (very important) - temporal - spatial ● Cache block – cache line ● A set of contiguous address locations of some size
  • 40. What is cache memory? ● The cache memory is a small-sized high speed volatile memory that provides high speed data access to a processor. ● The cache memory stores the frequently used computer programs, applications and the program data.
  • 41. Locality of Reference ◼ Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently. ▪ These instructions may be the ones in a loop, nested loop or few procedures calling each other repeatedly. ▪ This is called “locality of reference”. ◼ Temporal locality of reference: ▪ Recently executed instruction is likely to be executed again very soon. ◼ Spatial locality of reference: ▪ Instructions with addresses close to a recently instruction are likely to be executed soon.
  • 42. Cache ● Replacement algorithm ● Hit / miss ● Write-through / Write-back ● Load through Use of a cache memory. Cache Main memory Processor
  • 43. Cache hit • If the data is in the cache it is called a Read or Write hit. • Read hit: ▪ The data is obtained from the cache. • Write hit: ▪ Cache has a replica of the contents of the main memory. ▪ Contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol. ▪ Update the contents of the cache, and mark it as updated by setting a bit known as the dirty bit or modified bit. The contents of the main memory are updated when this block is replaced. This is write-back or copy-back protocol.
  • 44. Cache miss • If the data is not present in the cache, then a Read miss or Write miss occurs. • Read miss: ▪ Block of words containing this requested word is transferred from the memory. ▪ After the block is transferred, the desired word is forwarded to the processor. ▪ The desired word may also be forwarded to the processor as soon as it is transferred without waiting for the entire block to be transferred. This is called load-through or early-restart. • Write-miss: ▪ Write-through protocol is used, then the contents of the main memory are updated directly. ▪ If write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is overwritten with new information.
  • 45. Mapping functions ◼ Mapping functions determine how memory blocks are placed in the cache. ◼ A simple processor example: ▪ Cache consisting of 128 blocks of 16 words each. ▪ Total size of cache is 2048 (2K) words. ▪ Main memory is addressable by a 16-bit address. ▪ Main memory has 64K words. ▪ Main memory has 4K blocks of 16 words each. ◼ Three mapping functions: ▪ Direct mapping ▪ Associative mapping ▪ Set-associative mapping.
  • 46. Direct Mapping ta g ta g ta g Cache Main memor y Block 0 Block 1 Block 127 Block 128 Block 129 Block 255 Block 256 Block 257 Block 4095 Block 0 Block 1 Block 127 7 4 Main memory address Tag Block Wor d Figure 5.15. Direct-mapped cache. 5 4: one of 16 words. (each block has 16=24 words) 7: points to a particular block in the cache (128=27 ) 5: 5 tag bits are compared with the tag bits associated with its location in the cache. Identify which of the 32 blocks that are resident in the cache (4096/128). Block j of main memory maps onto block j modulo 128 of the cache
  • 47. Direct Mapping ● Tag: 11101 ● Block: 1111111=127, in the 127th block of the cache ● Word:1100=12, the 12th word of the 127th block in the cache 7 4 Main memory address Tag Block Wor d 5 11101,1111111,1100
  • 48. Associative Mapping 4 ta g ta g ta g Cache Main memor y Block 0 Block 1 Block i Block 4095 Block 0 Block 1 Block 127 12 Main memory address Figure 5.16. Associative-mapped cache. Tag Wor d 4: one of 16 words. (each block has 16=24 words) 12: 12 tag bits Identify which of the 4096 blocks that are resident in the cache 4096=212 .
  • 49. Associative Mapping ● Tag: 111011111111 ● Word:1100=12, the 12th word of a block in the cache 111011111111,1100 4 12 Main memory address Tag Wor d
  • 50. Set-Associative Mapping ta g ta g ta g Cache Main memor y Block 0 Block 1 Block 63 Block 64 Block 65 Block 127 Block 128 Block 129 Block 4095 Block 0 Block 1 Block 126 ta g ta g Block 2 Block 3 ta g Block 127 Main memory address 6 6 4 Tag Se t Wor d Set 0 Set 1 Set 63 Figure 5.17. Set-associative-mapped cache with two blocks per set. 4: one of 16 words. (each block has 16=24 words) 6: points to a particular set in the cache (128/2=64=26 ) 6: 6 tag bits is used to check if the desired block is present (4096/64=26 ).
  • 51. Set-Associative Mapping ● Tag: 111011 ● Set: 111111=63, in the 63th set of the cache ● Word:1100=12, the 12th word of the 63th set in the cache Main memory address 6 6 4 Tag Se t Wor d 111011,111111,1100
  • 52. Replacement Algorithms ● Difficult to determine which blocks to kick out ● Least Recently Used (LRU) block ● The cache controller tracks references to all blocks as computation proceeds. ● Increase / clear track counters when a hit/miss occurs
  • 53. ● List out the mapping functions. ● What is mean by cache memory?
  • 55. Overview ● Physical main memory is not as large as the address space spanned by an address issued by the processor. 232 = 4 GB, 264 = … ● When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices. ● Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques. ● Virtual addresses will be translated into physical addresses.
  • 56. Virtual Memory ◼ Virtual memory is an architectural solution to increase the effective size of the memory system.
  • 58. Address Translation ● All programs and data are composed of fixed- length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory. ● Page cannot be too small or too large. ● The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage – similar to cache.
  • 59. Address Translation Page frame Virtual address from processor in memory Offset Offset Virtual page number Page table address Page table base register Figure 5.27. Virtual-memory address translation. Contro l bit s Physical address in main memory PAGE TABLE Page frame +
  • 60. Address Translation ● The page table information is used by the MMU for every access, so it is supposed to be with the MMU. ● However, since MMU is on the processor chip and the page table is rather large, only small portion of it, which consists of the page table entries that correspond to the most recently accessed pages, can be accommodated within the MMU. ● Translation Lookaside Buffer (TLB)
  • 61. TLB Figure 5.28. Use of an associative-mapped TLB. No Yes Hit Miss Virtual address from processor TLB Offset Virtual page number number Virtual page Page frame in memory Control bits Offset Physical address in main memory Page frame =?
  • 62. Address translation (contd..) ◼ What happens if a program generates an access to a page that is not in the main memory? ◼ In this case, a page fault is said to occur. ▪ Whole page must be brought into the main memory from the disk, before the execution can proceed. ◼ Upon detecting a page fault by the MMU, following actions occur: ▪ MMU asks the operating system to intervene by raising an exception. ▪ Processing of the active task which caused the page fault is interrupted. ▪ Control is transferred to the operating system. ▪ Operating system copies the requested page from secondary storage to the main memory. ▪ Once the page is copied, control is returned to the task which was interrupted. 62
  • 63. Address translation (contd..) ◼ When a new page is to be brought into the main memory from secondary storage, the main memory may be full. ▪ Some page from the main memory must be replaced with this new page. ◼ How to choose which page to replace? ▪ This is similar to the replacement that occurs when the cache is full. ▪ The principle of locality of reference (?) can also be applied here. ▪ A replacement strategy similar to LRU can be applied. ◼ Since the size of the main memory is relatively larger compared to cache, a relatively large amount of programs and data can be held in the main memory. ▪ Minimizes the frequency of transfers between secondary storage and main memory. 63
  • 64. Cache & Virtual Memory ◼ Cache memory: ▪ Introduced to bridge the speed gap between the processor and the main memory. ▪ Implemented in hardware. ◼ Virtual memory: ▪ Introduced to bridge the speed gap between the main memory and secondary storage. ▪ Implemented in part by software. 64
  • 67. Accessing I/O devices Bus I/O device 1 I/O device n Processor Memory • Multiple I/O devices may be connected to the processor and the memory via a bus. • Bus consists of three sets of lines to carry address, data and control signals. • Each I/O device is assigned an unique address. • To access an I/O device, the processor places the address on the address lines. • The device recognizes the address, and responds to the control signals.
  • 68. Accessing I/O devices (contd..)  I/O devices and the memory may share the same address space:  Memory-mapped I/O.  Any machine instruction that can access memory can be used to transfer data to or from an I/O device.  Simpler software.  I/O devices and the memory may have different address spaces:  Special instructions to transfer data to and from I/O devices.  I/O devices may have to deal with fewer address lines.  I/O address lines need not be physically separate from memory address lines.  In fact, address lines may be shared between I/O devices and memory, with a control signal to indicate whether it is a memory address or an I/O address. 68
  • 69. Accessing I/O devices (contd..) I/O interface decoder Address Data and status registers Control circuits Input device Bus Address lines Data lines Control lines • I/O device is connected to the bus using an I/O interface circuit which has: - Address decoder, control circuit, and data and status registers. • Address decoder decodes the address placed on the address lines thus enabling the device to recognize its address. • Data register holds the data being transferred to or from the processor. • Status register holds information necessary for the operation of the I/O device. • Data and status registers are connected to the data lines, and have unique addresses. • I/O interface circuit coordinates I/O transfers.
  • 70. Accessing I/O devices (contd..)  Recall that the rate of transfer to and from I/O devices is slower than the speed of the processor. This creates the need for mechanisms to synchronize data transfers between them.  Program-controlled I/O:  Processor repeatedly monitors a status flag to achieve the necessary synchronization.  Processor polls the I/O device.  Two other mechanisms used for synchronizing data transfers between the processor and memory:  Interrupts.  Direct Memory Access.
  • 72. Interrupts • In program-controlled I/O, when the processor continuously monitors the status of the device, it does not perform any useful tasks. • An alternate approach would be for the I/O device to alert the processor when it becomes ready. o Do so by sending a hardware signal called an interrupt to the processor. o At least one of the bus control lines, called an interrupt-request line is dedicated for this purpose. • Processor can perform other useful tasks while it is waiting for the device to be ready.
  • 73. Interrupt • An interrupt is an event that causes the execution of one program to be suspended and execution of another program to be begin.
  • 74. Interrupts (contd..) Interrupt Service routine Program 1 here Interrupt occurs M i 2 1 i 1 + • Processor is executing the instruction located at address i when an interrupt occurs. • Routine executed in response to an interrupt request is called the interrupt-service routine. • When an interrupt occurs, control must be transferred to the interrupt service routine. • But before transferring control, the current contents of the PC (i+1), must be saved in a known location. • This will enable the return-from-interrupt instruction to resume execution at i+1. • Return address, or the contents of the PC are usually stored on the processor stack.
  • 75. Example • Some computations + print • Two subroutines: COMPUTE and PRINT • The printer accepts only one line of text at a time. • Try to overlap printing and computation.  COMPUTE produces first n lines of text;  PRINT sends the first line to the printer; then PRINT is suspended; COMPUTE continues to perform other computations;  After the printer finishes printing the first line, it send an interrupt-request signal to the processor;  In response, the processor interrupts execution of COMPUTE and transfers control to PRINT to send the next line;  COMPUTE resumes;  …
  • 76. Handling Multiple Devices • How can the processor recognize the device requesting an interrupt? • Given that different devices are likely to require different interrupt-service routines, how can the processor obtain the starting address of the appropriate routine in each case? • (Vectored interrupts) • Should a device be allowed to interrupt the processor while another interrupt is being serviced? • (Interrupt nesting) • How should two or more simultaneous interrupt requests be handled? • (Daisy-chain)
  • 77. Vectored Interrupts • The device requesting an interrupt may identify itself directly to the processor. o The device can do so by sending a special code (4 to 8 bits) to the processor over the bus. o The code supplied by the device may represent a part of the starting address of the interrupt-service routine. o The remainder of the starting address is obtained by the processor based on other information, such as the range of memory addresses where interrupt-service routines are located. • Usually the location pointed to by the interrupting device is used to store the starting address of the interrupt-service routine.
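The lookup described above can be sketched as follows; the vector-table base address, entry size, and memory layout are illustrative assumptions, not values from the text.

```python
# Illustrative only: base address and entry size are made-up constants.
VECTOR_TABLE_BASE = 0x0100   # where the interrupt-vector table lives
ENTRY_SIZE = 4               # bytes per vector-table entry

def isr_address(device_code, memory):
    """The device's code selects a location; that location stores the
    starting address of the interrupt-service routine."""
    vector_location = VECTOR_TABLE_BASE + device_code * ENTRY_SIZE
    return memory[vector_location]

# Device 3's vector entry points at an ISR placed at 0x2000.
memory = {VECTOR_TABLE_BASE + 3 * ENTRY_SIZE: 0x2000}
print(hex(isr_address(3, memory)))  # -> 0x2000
```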
  • 78. Interrupt Nesting • Each device has a separate interrupt-request (INTR1 … INTRp) and interrupt-acknowledge (INTA1 … INTAp) line. • Each interrupt-request line is assigned a different priority level. • Interrupt requests received over these lines are sent to a priority arbitration circuit in the processor. • If the interrupt request has a higher priority level than the priority of the processor, the request is accepted.
  • 79. Interrupts (contd..) Polling scheme: • The processor polls the status registers of the I/O devices to determine which device is requesting an interrupt. • The priority is determined by the order in which the devices are polled. • The first device found with its status bit set to 1 is the device whose interrupt request is accepted. Daisy chain scheme: • Devices are connected to form a daisy chain. • Devices share the interrupt-request line, and the interrupt-acknowledge line is connected to form a daisy chain. • When a device raises an interrupt request, the interrupt-request line is activated. • The processor responds by activating interrupt-acknowledge, which is received by device 1; if device 1 does not need service, it passes the signal on to device 2, and so on. • The device that is electrically closest to the processor has the highest priority.
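The daisy-chain priority rule, that the closest requesting device absorbs the acknowledge, can be sketched as a few lines of behavioral code:

```python
def daisy_chain_ack(requests):
    """requests[i] is True if device i needs service, with device 0
    electrically closest to the processor. The interrupt-acknowledge
    signal propagates down the chain until a requesting device absorbs
    it, so the closest requester wins."""
    for position, requesting in enumerate(requests):
        if requesting:
            return position   # this device identifies itself
    return None               # no device requested: spurious interrupt

# Devices 2 and 4 both request; device 2 wins because it is closer.
print(daisy_chain_ack([False, False, True, False, True]))  # -> 2
```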
  • 80. Interrupts (contd..) • When I/O devices are organized into a priority structure, each device has its own interrupt-request and interrupt-acknowledge line. • When I/O devices are organized in a daisy-chain fashion, the devices share an interrupt-request line, and the interrupt-acknowledge signal propagates through the devices. • A combination of the priority structure and the daisy-chain scheme can also be used: • Devices are organized into groups, and each group is assigned a different priority level (with its own INTR and INTA lines). • All the devices within a single group share an interrupt-request line and are connected to form a daisy chain.
  • 81. Exceptions  So far we have considered interrupts caused by interrupt-requests sent by I/O devices.  Interrupts can be used in many other situations where the execution of one program needs to be suspended and the execution of another program needs to be started.  In general, the term exception is used to refer to any event that causes an interruption.  An interrupt-request from an I/O device is one type of exception.  Other types of exceptions are:  Recovery from errors  Debugging  Privilege exception
  • 83. Direct Memory Access (contd..)  Direct Memory Access (DMA):  A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.  The control unit which performs these transfers is a part of the I/O device’s interface circuit. This control unit is called a DMA controller.  The DMA controller performs functions that would normally be carried out by the processor:  For each word, it provides the memory address and all the control signals.  To transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.
  • 84. Direct Memory Access (contd..)  DMA controller can transfer a block of data from an external device to the processor, without any intervention from the processor.  However, the operation of the DMA controller must be under the control of a program executed by the processor. That is, the processor must initiate the DMA transfer.  To initiate the DMA transfer, the processor informs the DMA controller of:  Starting address,  Number of words in the block.  Direction of transfer (I/O device to the memory, or memory to the I/O device).  Once the DMA controller completes the DMA transfer, it informs the processor by raising an interrupt signal.
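The initiation protocol above can be sketched behaviorally. The three parameters (starting address, word count, direction) are from the slide; the class, method, and device names below are hypothetical, chosen only for illustration.

```python
# Behavioral sketch of DMA initiation; not a real controller's interface.
class DMAController:
    def __init__(self, on_complete):
        self.on_complete = on_complete   # raised as an interrupt when done

    def start(self, start_address, word_count, direction, device, memory):
        # The processor writes only these parameters; the controller then
        # supplies every memory address and control signal by itself.
        for i in range(word_count):
            addr = start_address + i
            if direction == "to_memory":      # I/O device -> main memory
                memory[addr] = device.read_word()
            else:                             # main memory -> I/O device
                device.write_word(memory[addr])
        self.on_complete()   # inform the processor that the block is done

class FakeDevice:
    """Stand-in for an I/O device that streams words."""
    def __init__(self, words):
        self.words = list(words)
    def read_word(self):
        return self.words.pop(0)
    def write_word(self, w):
        self.words.append(w)

done = []
memory = {}
dma = DMAController(on_complete=lambda: done.append(True))
dma.start(start_address=0x100, word_count=3, direction="to_memory",
          device=FakeDevice([7, 8, 9]), memory=memory)
print(memory, done)  # three words land at 0x100..0x102, then an interrupt
```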
  • 85. Direct Memory Access • Consider a system in which the processor, main memory, keyboard, printer, network interface, and disk controller are connected by the system bus. • A DMA controller connects the high-speed network interface to the computer bus. • The disk controller, which controls two disks, also has DMA capability; it provides two DMA channels. • It can perform two independent DMA operations, as if each disk had its own DMA controller: the registers that store the memory address, word count, and status and control information are duplicated.
  • 86. Direct Memory Access (contd..)  The processor and the DMA controllers have to use the bus in an interwoven fashion to access the memory.  DMA devices are given higher priority than the processor to access the bus.  Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.  The processor originates most memory access cycles on the bus.  The DMA controller can be said to “steal” memory access cycles from the bus. This interweaving technique is called “cycle stealing”.  An alternative approach is to give a DMA controller the exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block or burst mode.
  • 87. Bus arbitration  Processor and DMA controllers both need to initiate data transfers on the bus and access main memory.  The device that is allowed to initiate transfers on the bus at any given time is called the bus master.  When the current bus master relinquishes its status as the bus master, another device can acquire this status.  The process by which the next device to become the bus master is selected and bus mastership is transferred to it is called bus arbitration.  Centralized arbitration:  A single bus arbiter performs the arbitration.  Distributed arbitration:  All devices participate in the selection of the next bus master.
  • 89. Centralized Bus Arbitration(cont.,) • The bus arbiter may be the processor or a separate unit connected to the bus. • Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers. • A DMA controller requests control of the bus by asserting the Bus Request (BR) line. • In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free. • The BG1 signal is connected to all DMA controllers in a daisy-chain fashion. • When the BBSY signal is 0, the bus is busy; when BBSY becomes 1, the DMA controller which asserted BR can acquire control of the bus.
  • 90. Centralized arbitration (contd..) Timing sequence of the BR, BG1, BG2 and BBSY signals: • DMA controller 2 asserts the BR signal. • The processor asserts the BG1 signal, which propagates along the daisy chain to DMA controller 2. • The processor relinquishes control of the bus by setting BBSY to 1; DMA controller 2 then becomes the bus master.
  • 91. Distributed arbitration  All devices waiting to use the bus share the responsibility of carrying out the arbitration process.  Arbitration process does not depend on a central arbiter and hence distributed arbitration has higher reliability.  Each device is assigned a 4-bit ID number.  All the devices are connected using 5 lines, 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.  To request the bus a device:  Asserts the Start-Arbitration signal.  Places its 4-bit ID number on the arbitration lines.  The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
  • 93. Distributed arbitration(Contd.,) • Arbitration process: o Each device compares the pattern that appears on the arbitration lines to its own ID, starting with MSB. o If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. o The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
  • 94. Distributed arbitration (contd..) • Device A has the ID 5 and wants to request the bus: - Transmits the pattern 0101 on the arbitration lines. • Device B has the ID 6 and wants to request the bus: - Transmits the pattern 0110 on the arbitration lines. • The pattern that appears on the arbitration lines is the logical OR of the two patterns: - Pattern 0111 appears on the arbitration lines. Arbitration process: • Each device compares the pattern that appears on the arbitration lines to its own ID, starting with the MSB. • If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. • Device A compares its ID pattern 0101 to the pattern 0111 on the lines. • It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the arbitration lines. • The pattern that now appears on the arbitration lines is the logical OR of 0100 and 0110, which is 0110. • This pattern is the same as the device ID of B, and hence B has won the arbitration.
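The worked example above can be simulated. The sketch below models the wired-OR arbitration lines settling one bit at a time from the MSB down, which is a simplification of the real asynchronous settling process:

```python
def distributed_arbitration(ids, width=4):
    """Model of the open-collector arbitration lines: each line carries the
    logical OR of what every contender drives. Bits settle MSB-first; a
    device that sees a 1 on a line where its own bit is 0 drops out by
    driving 0s on that and all lower bit positions. Returns the winner."""
    own = {d: [(d >> (width - 1 - b)) & 1 for b in range(width)] for d in ids}
    driven = {d: bits[:] for d, bits in own.items()}
    for bit in range(width):                     # settle one bit at a time
        line = 0
        for bits in driven.values():
            line |= bits[bit]                    # wired-OR of this line
        for d, bits in driven.items():
            if own[d][bit] == 0 and line == 1:   # difference detected
                for lower in range(bit, width):
                    bits[lower] = 0              # transmit 0s from here down
    for d, bits in driven.items():               # winner still drives its ID
        if bits == own[d]:
            return d

# Devices A (ID 5 = 0101) and B (ID 6 = 0110) both request the bus.
print(distributed_arbitration([5, 6]))  # -> 6, as in the worked example
```

Note that the highest ID always wins, which is why this scheme gives a fixed priority ordering among the contending devices.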
  • 95. Buses
  • 96. Buses • Processor, main memory, and I/O devices are interconnected by means of a bus. • Bus provides a communication path for the transfer of data. o Bus also includes lines to support interrupts and arbitration. • A bus protocol is the set of rules that govern the behavior of various devices connected to the bus, as to when to place information on the bus, when to assert control signals, etc.
  • 97. Buses (contd..)  Bus lines may be grouped into three types:  Data  Address  Control  Control signals specify:  Whether it is a read or a write operation.  Required size of the data, when several operand sizes (byte, word, long word) are possible.  Timing information to indicate when the processor and I/O devices may place data or receive data from the bus.  Schemes for timing of data transfers over a bus can be classified into:  Synchronous,  Asynchronous.
  • 99. Synchronous bus (contd..) Timing of one bus cycle on the bus clock (times t0, t1, t2): • At t0, the master places the device address and command on the bus, and indicates that it is a Read operation. • At t1, the addressed slave places the data on the data lines. • At t2, the master “strobes” the data on the data lines into its input buffer, for a Read operation. • In the case of a Write operation, the master places the data on the bus along with the address and command at time t0, and the slave strobes the data into its input buffer at time t2.
  • 100. Synchronous bus (contd..) • Once the master places the device address and command on the bus, it takes time for this information to propagate to the devices: o This time depends on the physical and electrical characteristics of the bus. • Also, all the devices have to be given enough time to decode the address and control signals, so that the addressed slave can place data on the bus. • Width of the pulse t1 - t0 depends on: o Maximum propagation delay between two devices connected to the bus. o Time taken by all the devices to decode the address and control signals, so that the addressed slave can respond at time t1.
  • 101. Synchronous bus (contd..) • At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its input buffer if it’s a Read operation. o “Strobe” means to capture the values of the data and store them into a buffer. • When data are to be loaded into a storage buffer register, the data should be available for a period longer than the setup time of the device. • Width of the pulse t2 - t1 should be longer than: o Maximum propagation time of the bus plus o Set up time of the input buffer register of the master.
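The two pulse-width constraints on slides 100 and 101 can be checked with back-of-the-envelope arithmetic; all the delay values below are illustrative assumptions, not figures from the text.

```python
# Minimum clock period of a synchronous bus, from its worst-case delays.
t_prop = 4     # ns, worst-case propagation delay between two devices
t_decode = 6   # ns, address/command decode time of the slowest device
t_setup = 2    # ns, setup time of the master's input buffer register

t1_minus_t0 = t_prop + t_decode   # slave must be ready to respond at t1
t2_minus_t1 = t_prop + t_setup    # data must settle and meet setup at t2
min_clock_period = t1_minus_t0 + t2_minus_t1

print(min_clock_period)  # -> 16 (ns); every device is held to this rate
```

This is the sense in which a single-cycle synchronous bus forces all devices to operate at the speed of the slowest device.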
  • 102. Synchronous bus (contd..) Signals as seen by the master and by the slave (with delays tAM, tAS, tDS, tDM): • Signals do not appear on the bus as soon as they are placed on it, due to the propagation delay in the interface circuits. • Signals reach the devices after a propagation delay that depends on the characteristics of the bus: the address and command appear on the bus at tAM and reach the slave at tAS; the data appears on the bus at tDS and reaches the master at tDM. • Data must remain on the bus for some time after t2, equal to the hold time of the buffer.
  • 103. Synchronous bus (contd..) • The data transfer has to be completed within one clock cycle. o The clock period t2 - t0 must be such that the longest propagation delay on the bus and the slowest device interface can be accommodated. o This forces all the devices to operate at the speed of the slowest device. • The processor just assumes that the data are available at t2 in the case of a Read operation, or are read by the device in the case of a Write operation. o What if the device has actually failed and never responds?
  • 104. Synchronous bus (contd..) • Most buses have control signals to represent a response from the slave. • These control signals serve two purposes: o They inform the master that the slave has recognized the address and is ready to participate in a data transfer operation. o They make it possible to adjust the duration of the data transfer operation based on the speed of the participating slaves. • A high-frequency bus clock is used: o A data transfer spans several clock cycles instead of just one clock cycle as in the earlier case.
  • 105. Synchronous bus (contd..) A transfer spanning several clock cycles (1-4): • Clock changes are seen by all the devices at the same time. • The address and command requesting a Read operation appear on the bus. • The slave places the data on the bus and asserts the Slave-ready signal. • The master strobes the data into its input buffer.
  • 106. Asynchronous bus  Data transfers on the bus is controlled by a handshake between the master and the slave.  Common clock in the synchronous bus case is replaced by two timing control lines:  Master-ready,  Slave-ready.  Master-ready signal is asserted by the master to indicate to the slave that it is ready to participate in a data transfer.  Slave-ready signal is asserted by the slave in response to the master-ready from the master, and it indicates to the master that the slave is ready to participate in a data transfer.
  • 107. Asynchronous bus (contd..) • Data transfer using the handshake protocol: o Master places the address and command information on the bus. o Master asserts the Master-ready signal to indicate to the slaves that the address and command information has been placed on the bus. o All devices on the bus decode the address. o The addressed slave performs the required operation, and informs the processor it has done so by asserting the Slave-ready signal. o Master removes all the signals from the bus, once Slave-ready is asserted. o If the operation is a Read operation, the master also strobes the data into its input buffer.
  • 108. Asynchronous bus (contd..) Handshake timing over one bus cycle (note that Master-ready is asserted at t1, not t0): • t0 - Master places the address and command information on the bus. • t1 - Master asserts the Master-ready signal. • t2 - Addressed slave places the data on the bus and asserts the Slave-ready signal. • t3 - Slave-ready signal arrives at the master. • t4 - Master removes the address and command information. • t5 - Slave receives the transition of the Master-ready signal from 1 to 0. It removes the data and the Slave-ready signal from the bus, completing the bus cycle.
  • 109. Asynchronous vs. Synchronous bus • Advantages of the asynchronous bus: o Eliminates the need for synchronization between the sender and the receiver. o Can accommodate varying delays automatically, using the Slave-ready signal. • Disadvantages of the asynchronous bus: o The data transfer rate with a full handshake is limited by two round-trip delays. o Data transfers using a synchronous bus involve only one round-trip delay, so a synchronous bus can achieve faster rates.
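The round-trip argument can be quantified; the delay value below is an assumption chosen only to show the factor-of-two effect, not a figure from the text.

```python
# Illustrative comparison of best-case transfer rates on the two bus types.
one_way_delay_ns = 10                      # assumed end-to-end bus delay

sync_transfer_ns = 2 * one_way_delay_ns    # one round trip per transfer
async_transfer_ns = 4 * one_way_delay_ns   # full handshake: two round trips

sync_rate = 1e3 / sync_transfer_ns         # transfers per microsecond
async_rate = 1e3 / async_transfer_ns
print(sync_rate, async_rate)  # -> 50.0 25.0; the synchronous bus is 2x faster
```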

Editor's Notes

  • #27: Goal: Show the inefficiency of current disk requests. Conveyed Ideas: Rotational latency is wasted time that can be used to service tasks Background Information: None. Slide Background: None. Kill text and arrows
  • #83: This alternative approach is called direct memory access. DMA uses a special control unit provided to transfer a block of data directly between an I/O device and the main memory without continuous intervention by the processor. The control unit which performs these transfers without the intervention of the processor is part of the I/O device’s interface circuit, and this controller is called the DMA controller. The DMA controller performs functions that would normally be performed by the processor. The processor would have to provide a memory address and all the control signals; likewise, the DMA controller provides the memory address where the data is going to be stored, along with the necessary control signals. When a block of data needs to be transferred, the DMA controller also has to increment the memory addresses and keep track of the number of words that have been transferred.
  • #84: A DMA controller can be used to transfer a block of data from an external device to the memory, without requiring any help from the processor. As a result the processor is free to execute other programs. However, the DMA controller performs the task of transferring data to or from an I/O device on behalf of a program that is being executed by the processor. That is, the DMA controller does not and should not have the capability to determine when a data transfer operation should take place; the processor must initiate the DMA transfer when it is required by the program being executed. When the processor determines that the program being executed requires a DMA transfer, it informs the DMA controller, which sits in the interface circuit of the device, of three things: the starting address of the memory block, the number of words that need to be transferred, and the direction of transfer, that is, whether the data needs to be transferred from the I/O device to the memory or from the memory to the I/O device. After initiating the DMA transfer, the processor suspends the program that initiated the transfer and continues with the execution of some other program. The program whose execution is suspended is said to be in the blocked state.
  • #85: Let us consider a memory organization with two DMA controllers. In this organization, one DMA controller is used to connect a high-speed network to the computer bus. In addition, the disk controller, which controls two disks, also has DMA capability. The disk controller provides two DMA channels and can perform two independent DMA operations, as if each disk had its own DMA controller. Each DMA channel has three registers: one to store the memory address, one to store the word count, and one to store the status and control information. There are two copies of these three registers, that is, the registers are duplicated, in order to perform independent DMA operations.
  • #86: The processor has to transfer data to and from the main memory, and the DMA controller is responsible for transferring data between the I/O devices and the main memory. Both the processor and the DMA controller have to use the external bus to talk to the main memory. Usually, DMA controllers are given higher priority than the processor to access the bus. We also need to decide the priority among different DMA devices that may need to use the bus; among these, high priority is given to high-speed peripherals such as a disk or a graphics display device. Usually, the processor originates most cycles on the bus, and the DMA controller can be said to steal memory access cycles from the bus. Thus, the processor and the DMA controller use the bus in an interwoven fashion; this interweaving technique is called cycle stealing. An alternative approach is to give DMA controllers the exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block mode or the burst mode of operation.
  • #87: The processor and the DMA controllers both need to initiate data transfers on the bus and access main memory. The process of using the bus to perform a data transfer operation is called the initiation of a transfer operation. At any point in time only one device is allowed to initiate transfers on the bus; the device that is allowed to do so at any given time is called the bus master. When the current bus master releases control of the bus, another device can acquire the status of bus master. How does one determine which device will next acquire the status of bus master? Note that there may be several DMA controllers plus the processor requiring access to the bus. The process by which the next device to become the bus master is selected, and bus mastership is transferred to it, is called bus arbitration. There are two types of bus arbitration: centralized arbitration and distributed arbitration. In centralized arbitration, a single bus arbiter performs the arbitration, whereas in distributed arbitration all devices which need to initiate data transfers on the bus participate in the selection of the next bus master.
  • #97: Recall that one device plays the role of a master. The device that initiates the data transfer on the bus by issuing read or write control signals is called as a master. The device that is being addressed by the master is called a slave or a target.
  • #105: Slave-ready signal is an acknowledgement from the slave to the master to confirm that the valid data has been sent. Depending on when the slave-ready signal is asserted, the duration of the data transfer can change.