Velammal Engineering College
Department of Computer Science
and Engineering
Welcome…
Dr.S.Gunasundari
Mr. A. Arockia Abins &
Ms. R. Amirthavalli,
CSE,
Velammal Engineering College
Subject Code / Name:
19IT202T /
Computer Architecture
Syllabus – Unit V
UNIT-V MEMORY & I/O SYSTEMS
Memory Hierarchy – memory technologies – Cache Memory
– Performance Considerations – Virtual Memory, TLBs –
Accessing I/O devices – Interrupts – Direct Memory Access –
Bus Structure – Bus operation.
Text Books
● Book 1:
● Name: Computer Organization and Design: The
Hardware/Software Interface
● Authors: David A. Patterson and John L. Hennessy
● Publisher: Morgan Kaufmann / Elsevier
● Edition: Fifth Edition, 2014
● Book 2:
● Name: Computer Organization and Embedded Systems
● Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig
Manjikian
● Publisher: Tata McGraw Hill
● Edition: Sixth Edition, 2012
The Memory
System
Memory
● A computer memory is the storage space in the
computer system where both data and the
program instructions are stored.
Memory size
● bit – ‘0’ or ‘1’
● 8 bits – 1 byte
Traditional Architecture
[Figure: Connection of the memory to the processor. The processor's MAR
drives a k-bit address bus and its MDR connects to an n-bit data bus;
control lines (R/W, MFC, etc.) complete the connection. The memory holds
up to 2^k addressable locations, with a word length of n bits.]
Basic Concepts
● The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
16-bit addresses → 2^16 = 64K memory locations
● Most modern computers are byte-addressable.
[Figure: Byte and word address assignments in a byte-addressable memory
with 32-bit words. (a) Big-endian assignment: word 0 holds bytes 0, 1, 2, 3,
with byte 0 as the most significant byte; the last word holds bytes
2^k-4 … 2^k-1. (b) Little-endian assignment: word 0 holds bytes 3, 2, 1, 0,
with byte 0 as the least significant byte.]
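As a quick check of the two assignments, Python's struct module can pack a 32-bit word both ways (a sketch added for illustration; the word value 0x12345678 is an assumed example):

```python
import struct

# Hypothetical 32-bit word value stored at word address 0.
word = 0x12345678

big = struct.pack(">I", word)     # big-endian byte order
little = struct.pack("<I", word)  # little-endian byte order

# Big-endian: most significant byte at the lowest byte address.
assert list(big) == [0x12, 0x34, 0x56, 0x78]
# Little-endian: least significant byte at the lowest byte address.
assert list(little) == [0x78, 0x56, 0x34, 0x12]
```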
Internal organization of memory chips
Memory hierarchy
Memory hierarchy
● A structure that uses multiple levels of memories; as
the distance from the processor increases, the size
of the memories and the access time both
increase.
Speed, Size, and Cost
[Figure: Memory hierarchy – processor registers at the top, then the
primary (L1) cache, the secondary (L2) cache, main memory, and
magnetic-disk secondary memory. Moving down the hierarchy the size
increases; moving up, the speed and the cost per bit increase.]
● Check whether the following statements are true or
not:
● Most of the cost of the memory hierarchy
is at the highest level.
● Most of the capacity of the memory
hierarchy is at the lowest level.
Basic Terms in Memory
1. Hit: If the data requested by the processor appears in some block in the
upper level, this is called a hit.
2. Miss: If the data is not found in the upper level, the request is called a
miss. The lower level in the hierarchy is then accessed to retrieve the block
containing the requested data.
3. Hit rate or hit ratio: The fraction of memory accesses found in the upper
level; it is often used as a measure of the performance of the memory
hierarchy.
4. Miss rate: The fraction of memory accesses not found in the upper level.
5. Hit time: The time to access the upper level of the memory hierarchy,
which includes the time needed to determine whether the access is a hit or
a miss.
6. Miss penalty: The time to replace a block in the upper level with the
corresponding block from the lower level, plus the time to deliver this block
to the processor.
Memory Technologies
Four primary technologies:
1. Main memory -- DRAM
2. Cache memory -- SRAM
3. Flash memory -- nonvolatile memory used as the
secondary memory in Personal Mobile Devices
4. Secondary memory -- magnetic disk
SRAM Cell
◼ Two transistor inverters are cross-connected to implement a basic
flip-flop.
◼ The cell is connected to one word line and two bit lines by transistors
T1 and T2.
◼ When the word line is at ground level, the transistors are turned off and
the latch retains its state.
◼ Read operation: to read the state of the SRAM cell, the word line is
activated to close switches T1 and T2. Sense/Write circuits at the
bottom monitor the states of b and b′.
[Figure: SRAM cell – a latch formed by cross-coupled inverters (points X
and Y) is connected to the bit lines b and b′ through transistors T1 and
T2, which are controlled by the word line.]
SRAM Technology
● The circuits are capable of retaining their state as long as
power is applied.
● value is stored on a pair of inverting gates
● very fast but takes up more space than DRAM (4 to 6 transistors)
● SRAMs are said to be volatile memories because their contents are
lost when power is interrupted.
● Static RAMs can be accessed very quickly
DRAMs
● Static RAMs are fast, but they occupy more area and are more expensive.
● Dynamic RAMs (DRAMs) are cheap and area-efficient, but they cannot retain their
state indefinitely – they need to be periodically refreshed.
● value is stored as a charge on capacitor (must be refreshed)
● very small but slower than SRAM (factor of 5 to 10)
[Figure: Single-transistor dynamic memory cell – a capacitor C stores the
bit value and is connected to the bit line through transistor T, which is
controlled by the word line.]
◼ Static RAMs (SRAMs):
▪ Consist of circuits that are capable of retaining their state as long
as the power is applied.
▪ Volatile memories, because their contents are lost when power is
interrupted.
▪ Access times of static RAMs are in the range of a few nanoseconds.
▪ However, the cost is usually high.
◼ Dynamic RAMs (DRAMs):
▪ Do not retain their state indefinitely.
▪ Contents must be periodically refreshed.
▪ Contents may be refreshed while accessing them for reading.
Questions
◼ What is meant by memory hierarchy?
◼ What factors are to be considered in a memory
hierarchy?
◼ Define hit and miss.
◼ Compare SRAM and DRAM.
Latency & Bandwidth
◼ Memory latency is the time it takes to transfer a
word of data to or from memory
◼ Memory bandwidth is the number of bits or bytes
that can be transferred in one second.
Flash Memory
▪ Uses an approach similar to EEPROM.
▪ The contents of a single cell can be read, but writes replace the contents
of an entire block of cells.
▪ Flash devices have greater density.
▪ Higher capacity and lower storage cost per bit.
▪ Power consumption of flash memory is very low, making it attractive for
use in equipment that is battery-driven.
▪ Single flash chips are not sufficiently large, so
larger memory modules are implemented using
flash cards and flash drives.
Secondary Memory
● Permanent memory or Non-volatile memory
● Used to store programs and data on a long-term
basis
● Storage capacity : very high (ex: 1 TB)
● Access speed : very slow
Organization of Data on a Disk
[Figure: Organization of one surface of a disk – concentric tracks, each
divided into sectors (e.g. sector 0 of track 0, sector 0 of track 1, …,
sector 3 of track n).]
Secondary Memory
Tracks divided into sectors
Magnetic Disk Structure
Surface organized into tracks
Disk Access
[Figure: Head in position above a track; the disk rotates counter-clockwise.]
Disk Access – Read
[Figure: About to read the blue sector; the read completes after the sector
passes under the head.]
Disk Access – Read
[Figure: After the blue read, the red request is scheduled next.]
Disk Access – Seek
[Figure: Seek to red's track.]
Disk Access – Rotational Latency
[Figure: Wait for the red sector to rotate around under the head.]
Disk Access – Read
[Figure: Complete the read of red.]
Disk Access – Service Time Components
Service time = Seek + Rotational Latency + Data Transfer.
Speed, Size, and Cost
◼ A big challenge in the design of a computer system is to
provide a sufficiently large memory, with a reasonable
speed at an affordable cost.
◼ Static RAM:
▪ Very fast, but expensive, because a basic SRAM cell has a complex
circuit making it impossible to pack a large number of cells onto a single
chip.
◼ Dynamic RAM:
▪ Simpler basic cell circuit, hence are much less expensive, but significantly
slower than SRAMs.
◼ Magnetic disks:
▪ Storage provided by DRAMs is higher than SRAMs, but is still less than
what is necessary.
▪ Secondary storage such as magnetic disks provide a large amount
of storage, but is much slower than DRAMs.
Cache Memories
Cache
● What is cache memory?
● Why do we need it?
● Locality of reference (very important)
- temporal
- spatial
● Cache block – cache line
● A set of contiguous address locations of some size
What is cache memory?
● Cache memory is a small, high-speed volatile memory
that provides high-speed data access to the processor.
● The cache memory stores the frequently used
computer programs, applications and the program
data.
Locality of Reference
◼ Analysis of programs indicates that many instructions
in localized areas of a program are executed
repeatedly during some period of time, while the
others are accessed relatively less frequently.
▪ These instructions may be the ones in a loop, nested loop or few
procedures calling each other repeatedly.
▪ This is called “locality of reference”.
◼ Temporal locality of reference:
▪ Recently executed instruction is likely to be executed again very
soon.
◼ Spatial locality of reference:
▪ Instructions with addresses close to a recently executed instruction
are likely to be executed soon.
Cache
● Replacement algorithm
● Hit / miss
● Write-through / Write-back
● Load through
Use of a cache memory:
[Figure: Processor ↔ Cache ↔ Main memory.]
Cache hit
• If the data is in the cache it is called a Read or Write hit.
• Read hit:
▪ The data is obtained from the cache.
• Write hit:
▪ Cache has a replica of the contents of the main memory.
▪ Contents of the cache and the main memory may be updated
simultaneously. This is the write-through protocol.
▪ Update the contents of the cache, and mark it as updated by setting a bit
known as the dirty bit or modified bit. The contents of the main
memory are updated when this block is replaced. This is write-back
or copy-back protocol.
Cache miss
• If the data is not present in the cache, then a Read miss
or Write miss occurs.
• Read miss:
▪ Block of words containing this requested word is transferred from the memory.
▪ After the block is transferred, the desired word is forwarded to the processor.
▪ The desired word may also be forwarded to the processor as soon as it is transferred
without waiting for the entire block to be transferred. This is called load-through or
early-restart.
• Write miss:
▪ If the write-through protocol is used, the contents of the main memory are
updated directly.
▪ If write-back protocol is used, the block containing the
addressed word is first brought into the cache. The desired word
is overwritten with new information.
Mapping functions
◼ Mapping functions determine how memory blocks
are placed in the cache.
◼ A simple processor example:
▪ Cache consisting of 128 blocks of 16 words each.
▪ Total size of cache is 2048 (2K) words.
▪ Main memory is addressable by a 16-bit address.
▪ Main memory has 64K words.
▪ Main memory has 4K blocks of 16 words each.
◼ Three mapping functions:
▪ Direct mapping
▪ Associative mapping
▪ Set-associative mapping.
Direct Mapping
[Figure 5.15: Direct-mapped cache – main memory blocks 0–4095 map onto
cache blocks 0–127; each cache block carries a tag.]
Block j of main memory maps onto block j modulo 128 of the cache.
Main memory address: Tag (5 bits) | Block (7 bits) | Word (4 bits)
● 4 word bits: select one of 16 words (each block has 16 = 2^4 words).
● 7 block bits: point to a particular block in the cache (128 = 2^7).
● 5 tag bits: compared with the tag bits associated with that cache
location, to identify which of the 32 memory blocks that map there
(4096/128 = 32) is currently resident in the cache.
Direct Mapping
Example address 11101 1111111 1100 (Tag | Block | Word):
● Tag: 11101
● Block: 1111111 = 127, the 127th block of the cache
● Word: 1100 = 12, the 12th word of the 127th block in the cache
Associative Mapping
[Figure 5.16: Associative-mapped cache – any main memory block (0–4095)
can be placed in any cache block (0–127); each cache block stores a
12-bit tag.]
Main memory address: Tag (12 bits) | Word (4 bits)
● 4 word bits: select one of 16 words (each block has 16 = 2^4 words).
● 12 tag bits: identify which of the 4096 = 2^12 memory blocks is resident
in a cache block.
Associative Mapping
Example address 111011111111 1100 (Tag | Word):
● Tag: 111011111111
● Word: 1100 = 12, the 12th word of a block in the cache
Set-Associative Mapping
[Figure 5.17: Set-associative-mapped cache with two blocks per set – the
128 cache blocks are grouped into 64 sets (Set 0: blocks 0–1, …, Set 63:
blocks 126–127); main memory blocks 0–4095 map onto the sets, and each
cache block carries a tag.]
Main memory address: Tag (6 bits) | Set (6 bits) | Word (4 bits)
● 4 word bits: select one of 16 words (each block has 16 = 2^4 words).
● 6 set bits: point to a particular set in the cache (128/2 = 64 = 2^6).
● 6 tag bits: checked to see whether the desired block is present in that
set (4096/64 = 2^6 memory blocks map to each set).
Set-Associative Mapping
Example address 111011 111111 1100 (Tag | Set | Word):
● Tag: 111011
● Set: 111111 = 63, the 63rd set of the cache
● Word: 1100 = 12, the 12th word of the block within that set
Replacement Algorithms
● It is difficult to determine which blocks to kick out.
● Least Recently Used (LRU) block replacement.
● The cache controller tracks references to all blocks as
computation proceeds.
● The usage counters are incremented or cleared when a hit
or a miss occurs.
Questions
● List out the mapping functions.
● What is meant by cache memory?
Virtual
Memories
Overview
● Physical main memory is not as large as the address space
spanned by an address issued by the processor.
2^32 = 4 GB, 2^64 = …
● When a program does not completely fit into the main
memory, the parts of it not currently being executed are
stored on secondary storage devices.
● Techniques that automatically move program and data
blocks into the physical main memory when they are
required for execution are called virtual-memory
techniques.
● Virtual addresses will be translated into physical addresses.
Virtual Memory
◼ Virtual memory is an architectural solution to increase the effective
size of the memory system.
Overview
● Memory Management Unit (MMU): hardware that translates virtual
addresses into physical addresses.
Address Translation
● All programs and data are composed of fixed-length units called
pages, each of which consists of a block of words that occupy
contiguous locations in the main memory.
● Page cannot be too small or too large.
● The virtual memory mechanism bridges the size and
speed gaps between the main memory and
secondary storage – similar to cache.
Address Translation
[Figure 5.27: Virtual-memory address translation – the virtual address
from the processor is split into a virtual page number and an offset.
The page table base register plus the virtual page number locate the
entry in the page table in memory; the entry holds control bits and the
page frame number, and the page frame combined with the offset forms the
physical address in main memory.]
Address Translation
● The page table information is used by the MMU for
every access, so it is supposed to be with the MMU.
● However, since MMU is on the processor chip and
the page table is rather large, only small portion of
it, which consists of the page table entries that
correspond to the most recently accessed pages,
can be accommodated within the MMU.
● Translation Lookaside Buffer (TLB)
TLB
[Figure 5.28: Use of an associative-mapped TLB – the virtual page number
from the processor is compared (=?) with the virtual page numbers held in
the TLB. On a hit, the corresponding page frame number is combined with
the offset to form the physical address in main memory; on a miss, the
page table in memory must be consulted.]
Address translation (contd..)
◼ What happens if a program generates an access
to a page that is not in the main memory?
◼ In this case, a page fault is said to occur.
▪ Whole page must be brought into the main memory from the
disk, before the execution can proceed.
◼ When the MMU detects a page fault, the
following actions occur:
▪ MMU asks the operating system to intervene by raising an
exception.
▪ Processing of the active task which caused the page fault is
interrupted.
▪ Control is transferred to the operating system.
▪ Operating system copies the requested page from secondary
storage to the main memory.
▪ Once the page is copied, control is returned to the task which
was interrupted.
Address translation (contd..)
◼ When a new page is to be brought into the main
memory from secondary storage, the main memory
may be full.
▪ Some page from the main memory must be replaced with this new
page.
◼ How to choose which page to replace?
▪ This is similar to the replacement that occurs when the cache is full.
▪ The principle of locality of reference can also be applied here.
▪ A replacement strategy similar to LRU can be applied.
◼ Since the size of the main memory is relatively larger
compared to cache, a relatively large amount of
programs and data can be held in the main memory.
▪ Minimizes the frequency of transfers between secondary storage and
main memory.
Cache & Virtual Memory
◼ Cache memory:
▪ Introduced to bridge the speed gap between the processor and
the main memory.
▪ Implemented in hardware.
◼ Virtual memory:
▪ Introduced to bridge the speed gap between the main memory
and secondary storage.
▪ Implemented in part by software.
INPUT /
OUTPUT
ORGANIZATION
Accessing I/O
Devices
Accessing I/O devices
Bus
I/O device 1 I/O device n
Processor Memory
• Multiple I/O devices may be connected to the processor and the memory via a bus.
• Bus consists of three sets of lines to carry address, data and control signals.
• Each I/O device is assigned a unique address.
• To access an I/O device, the processor places the address on the address lines.
• The device recognizes the address, and responds to the control signals.
Accessing I/O devices
(contd..)
 I/O devices and the memory may share the same
address space:
 Memory-mapped I/O.
 Any machine instruction that can access memory can be used to transfer
data to or from an I/O device.
 Simpler software.
 I/O devices and the memory may have different address
spaces:
 Special instructions to transfer data to and from I/O devices.
 I/O devices may have to deal with fewer address lines.
 I/O address lines need not be physically separate from memory address
lines.
 In fact, address lines may be shared between I/O devices and memory,
with a control signal to indicate whether it is a memory address or an I/O
address.
Accessing I/O devices
(contd..)
[Figure: An input device connected to the bus through an I/O interface –
the interface contains an address decoder, data and status registers, and
control circuits, and attaches to the bus's address, data, and control
lines.]
• I/O device is connected to the bus using an I/O interface circuit which has:
- Address decoder, control circuit, and data and status registers.
• Address decoder decodes the address placed on the address lines thus enabling the
device to recognize its address.
• Data register holds the data being transferred to or from the processor.
• Status register holds information necessary for the operation of the I/O device.
• Data and status registers are connected to the data lines, and have unique addresses.
• I/O interface circuit coordinates I/O transfers.
Accessing I/O devices
(contd..)
 Recall that the rate of transfer to and from I/O
devices is slower than the speed of the processor.
This creates the need for mechanisms to synchronize
data transfers between them.
 Program-controlled I/O:
 Processor repeatedly monitors a status flag to achieve the
necessary synchronization.
 Processor polls the I/O device.
 Two other mechanisms used for synchronizing data
transfers between the processor and memory:
 Interrupts.
 Direct Memory Access.
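Program-controlled I/O reduces to a busy-wait loop on a status flag; a toy sketch (the Device model and its countdown are invented for illustration, not a real device interface):

```python
import random

class Device:
    """Toy device model: the status flag becomes ready after a few polls."""
    def __init__(self):
        self.ready = False
        self._countdown = random.randint(1, 5)

    def poll_status(self):
        # Each poll models one status-register read by the processor.
        self._countdown -= 1
        if self._countdown <= 0:
            self.ready = True
        return self.ready

    def read_data(self):
        return 0x41  # arbitrary data byte

dev = Device()
while not dev.poll_status():  # the processor busy-waits on the flag
    pass
data = dev.read_data()
```

While the loop spins, the processor does no useful work, which is exactly the drawback interrupts and DMA address.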
Interrupts
Interrupts
• In program-controlled I/O, when the processor
continuously monitors the status of the device, it
does not perform any useful tasks.
• An alternate approach would be for the I/O device
to alert the processor when it becomes ready.
o Do so by sending a hardware signal called an interrupt to the
processor.
o At least one of the bus control lines, called an interrupt-request
line is dedicated for this purpose.
• Processor can perform other useful tasks while it is
waiting for the device to be ready.
Interrupt
• An interrupt is an event that causes the
execution of one program to be suspended and the
execution of another program to begin.
Interrupts (contd..)
[Figure: Program 1 is executing the instruction at address i when an
interrupt occurs; control transfers to the interrupt-service routine,
and after it completes, execution resumes at address i+1.]
• Processor is executing the instruction located at address i when an interrupt occurs.
• Routine executed in response to an interrupt request is called the interrupt-service routine.
• When an interrupt occurs, control must be transferred to the interrupt service routine.
• But before transferring control, the current contents of the PC (i+1), must be saved in a known
location.
• This will enable the return-from-interrupt instruction to resume execution at i+1.
• Return address, or the contents of the PC are usually stored on the processor stack.
Example
• Some computations + print
• Two subroutines: COMPUTE and PRINT
• The printer accepts only one line of text at a time.
• Try to overlap printing and computation.
 COMPUTE produces the first n lines of text;
 PRINT sends the first line to the printer; then PRINT is suspended;
COMPUTE continues to perform other computations;
 After the printer finishes printing the first line, it sends an
interrupt-request signal to the processor;
 In response, the processor interrupts execution of COMPUTE
and transfers control to PRINT to send the next line;
 COMPUTE resumes;
 …
Handling Multiple
Devices
• How can the processor recognize the device requesting an
interrupt?
• Given that different devices are likely to require different
interrupt-service routines, how can the processor obtain the
starting address of the appropriate routine in each case?
• (Vectored interrupts)
• Should a device be allowed to interrupt the processor while
another interrupt is being serviced?
• (Interrupt nesting)
• How should two or more simultaneous interrupt requests be
handled?
• (Daisy-chain)
Vectored Interrupts
• The device requesting an interrupt may identify itself
directly to the processor.
o The device can do so by sending a special code (4 to 8 bits) to the
processor over the bus.
o Code supplied by the device may represent a part of the starting
address of the interrupt-service routine.
o The remainder of the starting address is obtained by the processor
based on other information such as the range of memory
addresses where interrupt service routines are located.
• Usually the location pointed to by the interrupting
device is used to store the starting address of the
interrupt-service routine.
Interrupt Nesting
[Figure: Each device (Device 1 … Device p) has its own interrupt-request
(INTR1 … INTRp) and interrupt-acknowledge (INTA1 … INTAp) line connected
to the priority arbitration circuit in the processor.]
• Each device has a separate interrupt-request and interrupt-acknowledge line.
• Each interrupt-request line is assigned a different priority level.
• Interrupt requests received over these lines are sent to a priority arbitration circuit
in the processor.
• If the interrupt request has a higher priority level than the priority of the processor,
then the request is accepted.
Interrupts (contd..)
[Figure: Devices 1 … n share a common interrupt-request line INTR; the
interrupt-acknowledge line INTA is daisy-chained through the devices.]
Polling scheme:
• The processor polls the status registers of the I/O devices to determine
which device is requesting an interrupt.
• In this case the priority is determined by the order in which the devices are polled.
• The first device with status bit set to 1 is the device whose interrupt request is
accepted.
Daisy chain scheme:
• Devices are connected to form a daisy chain.
• Devices share the interrupt-request line, and interrupt-acknowledge line is connected
to form a daisy chain.
• When devices raise an interrupt request, the interrupt-request line is activated.
• The processor in response activates interrupt-acknowledge.
• Received by device 1, if device 1 does not need service, it passes the signal to device 2.
• Device that is electrically closest to the processor has the highest priority.
Interrupts (contd..)
• When I/O devices were organized into a priority structure, each device had its own
interrupt-request and interrupt-acknowledge line.
• When I/O devices were organized in a daisy chain fashion, the devices shared an
interrupt-request line, and the interrupt-acknowledge propagated through the devices.
• A combination of the priority structure and the daisy chain scheme can also be used.
[Figure: Combined scheme – groups of devices, each group with its own
daisy-chained INTR/INTA pair (INTR1/INTA1 … INTRp/INTAp) into the
priority arbitration circuit in the processor.]
• Devices are organized into groups.
• Each group is assigned a different priority level.
• All the devices within a single group share an interrupt-request line, and are
connected to form a daisy chain.
Exceptions
 So far, we have considered interrupts caused by interrupt-requests
sent by I/O devices.
 Interrupts could be used in many other situations where
the execution of one program needs to be suspended
and execution of another program needs to be started.
 In general, the term exception is used to refer to any
event that causes an interruption.
 Interrupt-requests from I/O devices are one type of exception.
 Other types of exceptions are:
 Recovery from errors
 Debugging
 Privilege exception
Direct Memory
Access
Direct Memory Access
(contd..)
 Direct Memory Access (DMA):
 A special control unit may be provided to transfer a block of data
directly between an I/O device and the main memory, without
continuous intervention by the processor.
 The control unit that performs these transfers is a part
of the I/O device's interface circuit. This control unit
is called a DMA controller.
 DMA controller performs functions that would be
normally carried out by the processor:
 For each word, it provides the memory address and all the control
signals.
 To transfer a block of data, it increments the memory addresses
and keeps track of the number of transfers.
Direct Memory Access
(contd..)
 DMA controller can transfer a block of data between an
external device and the main memory, without any intervention
from the processor.
 However, the operation of the DMA controller must be under the control of
a program executed by the processor. That is, the processor must initiate
the DMA transfer.
 To initiate the DMA transfer, the processor informs the DMA
controller of:
 Starting address,
 Number of words in the block.
 Direction of transfer (I/O device to the memory, or memory to the I/O
device).
 Once the DMA controller completes the DMA transfer, it
informs the processor by raising an interrupt signal.
Direct Memory Access
[Figure: A system bus connects the processor, main memory, a network
interface with its own DMA controller, a keyboard, a printer, and a
disk/DMA controller managing two disks.]
• A DMA controller connects a high-speed network interface to the computer bus.
• The disk controller, which controls two disks, also has DMA capability. It
provides two DMA channels.
• It can perform two independent DMA operations, as if each disk had its own
DMA controller. The registers that store the memory address, word count, and
status and control information are duplicated.
Direct Memory Access
(contd..)
 Processor and DMA controllers have to use the bus in an
interwoven fashion to access the memory.
 DMA devices are given higher priority than the processor to access the
bus.
 Among different DMA devices, high priority is given to high-speed
peripherals such as a disk or a graphics display device.
 Processor originates most memory access cycles on the
bus.
 DMA controller can be said to “steal” memory access cycles from the
bus. This interweaving technique is called “cycle stealing”.
 An alternate approach is to provide the DMA controller
exclusive capability to initiate transfers on the bus, and
hence exclusive access to the main memory. This is
known as the block or burst mode.
Bus arbitration
 Processor and DMA controllers both need to initiate
data transfers on the bus and access main memory.
 The device that is allowed to initiate transfers on the
bus at any given time is called the bus master.
 When the current bus master relinquishes its status as
the bus master, another device can acquire this status.
 The process by which the next device to become the bus master is
selected and bus mastership is transferred to it is called bus arbitration.
 Centralized arbitration:
 A single bus arbiter performs the arbitration.
 Distributed arbitration:
 All devices participate in the selection of the next bus master.
Centralized Bus Arbitration
[Figure: The processor grants the bus over a daisy-chained bus-grant line
(BG1 into DMA controller 1, passed on as BG2 to DMA controller 2); the
controllers share the Bus Request (BR) and Bus Busy (BBSY) lines.]
Centralized Bus
Arbitration(cont.,)
• Bus arbiter may be the processor or a separate unit
connected to the bus.
• Normally, the processor is the bus master, unless it grants bus
mastership to one of the DMA controllers.
• DMA controller requests the control of the bus by asserting the
Bus Request (BR) line.
• In response, the processor activates the Bus-Grant1 (BG1) line,
indicating that the controller may use the bus when it is free.
• BG1 signal is connected to all DMA controllers in a daisy chain
fashion.
• When the BBSY signal is 0, the bus is busy. When BBSY
becomes 1, the DMA controller that asserted BR can
acquire control of the bus.
Centralized arbitration
(contd..)
[Figure: Timing of the BR, BG1, BG2, and BBSY signals as bus mastership
passes from the processor to DMA controller 2 and back to the processor.]
• DMA controller 2 asserts the BR signal.
• The processor asserts the BG1 signal.
• The BG1 signal propagates through DMA controller 1 and reaches DMA
controller 2 as BG2.
• The processor relinquishes control of the bus by setting BBSY to 1.
Distributed arbitration
 All devices waiting to use the bus share the responsibility of
carrying out the arbitration process.
 Arbitration process does not depend on a central arbiter and hence
distributed arbitration has higher reliability.
 Each device is assigned a 4-bit ID number.
 All the devices are connected using 5 lines, 4 arbitration lines
to transmit the ID, and one line for the Start-Arbitration signal.
 To request the bus a device:
 Asserts the Start-Arbitration signal.
 Places its 4-bit ID number on the arbitration lines.
 The pattern that appears on the arbitration lines is the
logical-OR of all the 4-bit device IDs placed on the arbitration
lines.
Distributed arbitration
[Figure: Devices connected to the four open-collector arbitration lines
and the Start-Arbitration line.]
Distributed
arbitration(Contd.,)
• Arbitration process:
o Each device compares the pattern that appears on the arbitration lines to
its own ID, starting with MSB.
o If it detects a difference, it transmits 0s on the arbitration lines for that and
all lower bit positions.
o The pattern that appears on the arbitration lines is the logical-OR of all the
4-bit device IDs placed on the arbitration lines.
Distributed arbitration
(contd..)
• Device A has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
• Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
• Pattern that appears on the arbitration lines is the logical OR of the patterns:
- Pattern 0111 appears on the arbitration lines.
Arbitration process:
• Each device compares the pattern that appears on the arbitration lines to its own
ID, starting with MSB.
• If it detects a difference, it transmits 0s on the arbitration lines for that and all lower
bit positions.
• Device A compares its ID pattern 0101 to the pattern 0111 on the arbitration lines.
• It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the
arbitration lines.
• The pattern that appears on the arbitration lines is the logical-OR of 0100 and 0110,
which is 0110.
• This pattern is the same as the device ID of B, and hence B has won the arbitration.
Buses
Buses
• Processor, main memory, and I/O devices are
interconnected by means of a bus.
• Bus provides a communication path for the transfer
of data.
o Bus also includes lines to support interrupts and arbitration.
• A bus protocol is the set of rules that govern the
behavior of various devices connected to the bus,
as to when to place information on the bus, when
to assert control signals, etc.
Buses (contd..)
 Bus lines may be grouped into three types:
 Data
 Address
 Control
 Control signals specify:
 Whether it is a read or a write operation.
 Required size of the data, when several operand sizes (byte, word,
long word) are possible.
 Timing information to indicate when the processor and I/O devices
may place data or receive data from the bus.
 Schemes for timing of data transfers over a bus can be
classified into:
 Synchronous,
 Asynchronous.
Synchronous bus
[Figure: Bus clock waveform – one clock period defines a bus cycle.]
Synchronous bus (contd..)
[Figure: Timing of an input (Read) transfer on a synchronous bus. At time
t0 the master places the device address and command on the bus and
indicates that it is a Read operation; at t1 the addressed slave places
the data on the data lines; at t2 the master “strobes” the data on the
data lines into its input buffer. The interval from t0 to t2 is one bus
cycle.]
• In the case of a Write operation, the master places the data on the bus along
with the address and commands at time t0.
• The slave strobes the data into its input buffer at time t2.
Synchronous bus (contd..)
• Once the master places the device address and
command on the bus, it takes time for this information
to propagate to the devices:
o This time depends on the physical and electrical characteristics of
the bus.
• Also, all the devices have to be given enough time to
decode the address and control signals, so that the
addressed slave can place data on the bus.
• Width of the pulse t1 - t0 depends on:
o Maximum propagation delay between two devices connected to
the bus.
o Time taken by all the devices to decode the address and control
signals, so that the addressed slave can respond at time t1.
Synchronous bus (contd..)
• At the end of the clock cycle, at time t2, the master
strobes the data on the data lines into its input
buffer if it’s a Read operation.
o “Strobe” means to capture the values of the data and store them
into a buffer.
• When data are to be loaded into a storage buffer
register, the data should be available for a period
longer than the setup time of the device.
• Width of the pulse t2 - t1 should be longer than:
o Maximum propagation time of the bus plus
o Set up time of the input buffer register of the master.
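A quick back-of-the-envelope calculation makes this constraint concrete. The delay values below are assumed for illustration only, not taken from the slides:

```python
# Assumed timing parameters (illustrative values):
prop_delay = 4   # ns, maximum propagation delay between two devices on the bus
decode     = 6   # ns, time for the slowest device to decode address/control signals
setup      = 2   # ns, setup time of the master's input buffer register

t1_t0 = prop_delay + decode    # minimum width of the t1 - t0 interval
t2_t1 = prop_delay + setup     # minimum width of the t2 - t1 interval
min_period = t1_t0 + t2_t1     # minimum clock period t2 - t0
print(f"minimum bus clock period: {min_period} ns")
```

With these numbers the clock period cannot be shorter than 16 ns, so the bus clock can run at roughly 62 MHz at most; a slower device on the bus would lower this further.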
Synchronous bus (contd..)
[Timing diagram: the same synchronous Read transfer, showing each signal as seen by the master and as seen by the slave.]
• Signals do not appear on the bus as soon as they are placed on the bus, due to the
propagation delay in the interface circuits.
• Signals reach the devices after a propagation delay which depends on the
characteristics of the bus.
• Data must remain on the bus for some time after t2 equal to the hold time of the buffer.
• In the diagram, tAM marks when the address & command appear on the bus, tAS when
they reach the slave, tDS when the data appears on the bus, and tDM when the data
reaches the master.
Synchronous bus (contd..)
• Data transfer has to be completed within one clock
cycle.
o Clock period t2 - t0 must be such that the longest propagation
delay on the bus and the slowest device interface must be
accommodated.
o Forces all the devices to operate at the speed of the slowest
device.
• Processor just assumes that the data are available
at t2 in case of a Read operation, or are read by
the device in case of a Write operation.
o What if the device has actually failed, and never responds?
Synchronous bus (contd..)
• Most buses have control signals to represent a
response from the slave.
• Control signals serve two purposes:
o Inform the master that the slave has recognized the address, and
is ready to participate in a data transfer operation.
o Enable the master to adjust the duration of the data transfer operation
based on the speed of the participating slaves.
• When a high-frequency bus clock is used:
o Data transfer spans several clock cycles instead of just one clock
cycle as in the earlier case.
Synchronous bus (contd..)
[Timing diagram: a Read transfer spanning four clock cycles, showing the Clock, Address, Command, Data, and Slave-ready lines.]
• Address & command requesting a Read operation appear on the bus.
• Slave places the data on the bus, and asserts the Slave-ready signal.
• Master strobes the data into the input buffer.
• Clock changes are seen by all the devices at the same time.
Asynchronous bus
 Data transfers on the bus are controlled by a handshake
between the master and the slave.
 Common clock in the synchronous bus case is replaced
by two timing control lines:
 Master-ready,
 Slave-ready.
 Master-ready signal is asserted by the master to indicate
to the slave that it is ready to participate in a data
transfer.
 Slave-ready signal is asserted by the slave in response to
the master-ready from the master, and it indicates to the
master that the slave is ready to participate in a data
transfer.
Asynchronous bus (contd..)
• Data transfer using the handshake protocol:
o Master places the address and command information on the bus.
o Asserts the Master-ready signal to indicate to the slaves that the
address and command information has been placed on the bus.
o All devices on the bus decode the address.
o Addressed slave performs the required operation, and informs the
processor that it has done so by asserting the Slave-ready signal.
o Master removes all the signals from the bus, once Slave-ready is
asserted.
o If the operation is a Read operation, Master also strobes the data
into its input buffer.
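The handshake steps above can be sketched as a small simulation. This is an illustrative model with names of our own choosing, not a real bus interface:

```python
class Slave:
    """Models an addressed slave device holding some readable locations."""
    def __init__(self, memory):
        self.memory = memory

    def respond(self, address):
        # Slave decodes the address, performs the read, places the data
        # on the bus, and asserts Slave-ready.
        return self.memory[address], True          # (data, slave_ready)

def read_transfer(address, slave):
    """One Read transfer using the Master-ready / Slave-ready handshake."""
    master_ready = True                            # master asserts Master-ready
    data, slave_ready = slave.respond(address)     # addressed slave responds
    assert slave_ready                             # master waits for Slave-ready
    buffer = data                                  # master strobes data into its buffer
    master_ready = False                           # master removes signals from the bus
    return buffer

slave = Slave({0x10: 0xAB})
print(hex(read_transfer(0x10, slave)))             # prints 0xab
```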
Asynchronous bus (contd..)
[Timing diagram: handshake control of a data transfer, showing the Address and command, Master-ready, Data, and Slave-ready lines over one bus cycle, with events at t0 through t5.]
t0 - Master places the address and command information on the bus.
t1 - Master asserts the Master-ready signal. It is asserted at t1 rather than t0 so that
the address and command signals can settle on the bus (allowing for bus skew)
before the slaves act on them.
t2 - Addressed slave places the data on the bus and asserts the Slave-ready signal.
t3 - Slave-ready signal arrives at the master, which strobes the data into its input buffer.
t4 - Master removes the address and command information from the bus.
t5 - Slave receives the transition of the Master-ready signal from 1 to 0. It removes
the data and the Slave-ready signal from the bus.
Asynchronous vs. Synchronous bus
• Advantages of asynchronous bus:
o Eliminates the need for a common clock to synchronize the sender
and the receiver.
o Can accommodate varying delays automatically, using the
Slave-ready signal.
• Disadvantages of asynchronous bus:
o Data transfer rate with a full handshake is limited by two round-trip
delays.
o Data transfers using a synchronous bus involve only one round-trip
delay, and hence a synchronous bus can achieve faster transfer
rates.
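The two-round-trip penalty can be quantified with assumed numbers (the 10 ns one-way delay below is illustrative only):

```python
one_way = 10                    # ns, assumed end-to-end bus propagation delay
sync_cycle  = 2 * one_way       # synchronous: roughly one round-trip delay per transfer
async_cycle = 4 * one_way       # full handshake: two round-trip delays per transfer
for name, t in [("synchronous", sync_cycle), ("asynchronous", async_cycle)]:
    print(f"{name}: {t} ns per transfer, about {1e3 / t:.0f} million transfers/s")
```

Under this simple model the synchronous bus sustains twice the transfer rate, which is the trade-off described above.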
Thank You…

More Related Content

PPTX
MEMORY & I/O SYSTEMS
PPTX
CA UNIT V..pptx
PPTX
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
PPT
Internal Memory FIT NED UNIVERSITY OF EN
PPTX
Computer Architecture and Organization--Memory and I/O
PDF
Unit IV Memory and I/O Organization
PPT
Computer Organisation and Architecture
PPTX
Unit 4 Memory Organization_Computer Organization and Architecture
MEMORY & I/O SYSTEMS
CA UNIT V..pptx
onur-comparch-fall2018-lecture3b-memoryhierarchyandcaches-afterlecture.pptx
Internal Memory FIT NED UNIVERSITY OF EN
Computer Architecture and Organization--Memory and I/O
Unit IV Memory and I/O Organization
Computer Organisation and Architecture
Unit 4 Memory Organization_Computer Organization and Architecture

Similar to Computer Architecture Input Output Memory.pptx (20)

PDF
Architecture_L5 (3).pdf wwwwwwwwwwwwwwwwwwwwwwwwwww
PPT
Computer Architecture- University of Dodoma
PPT
Chapter 8 computer memory system overview
PPT
sramanddram.ppt
PPTX
Memory Organization
PPT
PDF
Computer organization memory
PDF
Detail explanation of memory organization
PPTX
COA notes
PPT
Memory Hierarchy PPT of Computer Organization
PPT
memory systems-module 3 presentation ppt
PPTX
Computer architecture memory system
PPTX
Memory hierarchy
PPT
Chapter5 the memory-system-jntuworld
PPTX
Computer organizatin.Chapter Seven.pptxs
PPT
Memory system in computer architecture system
PPT
computer chapter5-the memory system (1) (2).ppt
PPT
chapter5-the memory system chapter .ppt
PPTX
Introduction to the memory system embedded.pptx
PDF
Memory hierarchy.pdf
Architecture_L5 (3).pdf wwwwwwwwwwwwwwwwwwwwwwwwwww
Computer Architecture- University of Dodoma
Chapter 8 computer memory system overview
sramanddram.ppt
Memory Organization
Computer organization memory
Detail explanation of memory organization
COA notes
Memory Hierarchy PPT of Computer Organization
memory systems-module 3 presentation ppt
Computer architecture memory system
Memory hierarchy
Chapter5 the memory-system-jntuworld
Computer organizatin.Chapter Seven.pptxs
Memory system in computer architecture system
computer chapter5-the memory system (1) (2).ppt
chapter5-the memory system chapter .ppt
Introduction to the memory system embedded.pptx
Memory hierarchy.pdf
Ad

More from Gunasundari Selvaraj (9)

PPTX
Unit 4 Computer Architecture Multicore Processor.pptx
PPTX
PROCESSOR AND CONTROL UNIT - unit 3 Architecture
PDF
Computer Architecture: ARITHMETIC FOR COMPUTERS
PPTX
Introduction to Computer Architecture: unit 1
Unit 4 Computer Architecture Multicore Processor.pptx
PROCESSOR AND CONTROL UNIT - unit 3 Architecture
Computer Architecture: ARITHMETIC FOR COMPUTERS
Introduction to Computer Architecture: unit 1
Ad

Recently uploaded (20)

PPTX
TNA_Presentation-1-Final(SAVE)) (1).pptx
PDF
FORM 1 BIOLOGY MIND MAPS and their schemes
PPTX
History, Philosophy and sociology of education (1).pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
PPTX
Share_Module_2_Power_conflict_and_negotiation.pptx
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PPTX
20th Century Theater, Methods, History.pptx
PDF
What if we spent less time fighting change, and more time building what’s rig...
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
PDF
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
PPTX
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
PDF
1_English_Language_Set_2.pdf probationary
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PDF
advance database management system book.pdf
PDF
IGGE1 Understanding the Self1234567891011
PPTX
Introduction to Building Materials
TNA_Presentation-1-Final(SAVE)) (1).pptx
FORM 1 BIOLOGY MIND MAPS and their schemes
History, Philosophy and sociology of education (1).pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
Share_Module_2_Power_conflict_and_negotiation.pptx
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
20th Century Theater, Methods, History.pptx
What if we spent less time fighting change, and more time building what’s rig...
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
Τίμαιος είναι φιλοσοφικός διάλογος του Πλάτωνα
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
1_English_Language_Set_2.pdf probationary
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
advance database management system book.pdf
IGGE1 Understanding the Self1234567891011
Introduction to Building Materials

Computer Architecture Input Output Memory.pptx

  • 1. Velammal Engineering College Department of Computer Science and Engineering Welcome… Dr.S.Gunasundari Mr. A. Arockia Abins & Ms. R. Amirthavalli, CSE, Velammal Engineering College
  • 2. Subject Code / Name: 19IT202T / Computer Architecture
  • 3. Syllabus – Unit V UNIT-V MEMORY & I/O SYSTEMS Memory Hierarchy – memory technologies – Cache Memory – Performance Considerations Virtual Memory, TLB’s – Accessing I/O devices – Interrupts – Direct Memory Access – Bus Structure – Bus operation.
  • 4. Text Books ● Book 1: ● Name: Computer Organization and Design: The Hardware/Software Interface ● Authors: David A. Patterson and John L. Hennessy ● Publisher: Morgan Kaufmann / Elsevier ● Edition: Fifth Edition, 2014 ● Book 2: ● Name: Computer Organization and Embedded Systems Interface ● Authors: Carl Hamacher, Zvonko Vranesic, Safwat Zaky and Naraig Manjikian ● Publisher: Tata McGraw Hill ● Edition: Sixth Edition, 2012
  • 6. Memory ● A computer memory is the storage space in the computer system where both data and the program instructions are stored.
  • 7. Memory size ● bit – ‘0’ or ‘1’ ● 8 bits – 1 byte
  • 8. Traditional Architecture Up to 2k addressable MDR MAR Connection of the memory to the processor. k- bit address bus n- bit data bus Control lines ( , MFC, etc.) Processo r Memory locations Word length = n bits W R /
  • 9. Basic Concepts ● The maximum size of the memory that can be used in any computer is determined by the addressing scheme. 16-bit addresses = 216 = 64K memory locations ● Most modern computers are byte addressable. 2 k 4 - 2 k 3 - 2 k 2 - 2 k 1 - 2 k 4 - 2 k 4 - 0 1 2 3 4 5 6 7 0 0 4 2 k 1 - 2 k 2 - 2 k 3 - 2 k 4 - 3 2 1 0 7 6 5 4 Byte address Byte address (a) Big-endian assignment (b) Little-endian assignment 4 Word address • • • • • •
  • 10. Internal organization of memory chips
  • 12. Memory hierarchy ● A structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
  • 13. Speed, Size, and Cost Processor Primary cach e Secondar y cach e Mai n Magnetic disk memory Increasin g siz e Increasing spee d Memory hierarchy secondary memory Increasing cost per bit Registers L 1 L2
  • 14. ● Check whether the following statements are true not? ● Most of the cost of the memory hierarchy is at the highest level. ● Most of the capacity of the memory hierarchy is at the lowest level.
  • 15. Basic Terms in Memory 1.HIT : If the data requested by the processor appears in some block in the upper level, this is called a hit . 2. Miss : If the data is not found in the upper level, the request is called a miss. The lower level in the hierarchy is then accessed to retrieve the block containing the requested data. 3. HIT RATE OR HIT RATIO : It is the fraction of memory accesses found in the upper level; it is often used as a measure of the performance of the memory hierarchy. 4.MISS RATE : It is the fraction of memory accesses not found in the upper level. 5. HIT TIME : It is the time to access the upper level of the memory hierarchy, which includes the time needed to determine whether the access is a hit or a miss. 6. MISS PENALTY : It is the time to replace a block in the upper level with the corresponding block from the lower level, plus the time to deliver this block to
  • 16. Memory Technologies Four primary Technologies: 1. Main Memory -- DRAM 2. Cache Memory -- SRAM 3. Flash Memory -- nonvolatile memory is the secondary memory in Personal Mobile Devices 4.Secondary Memory -- Magnetic Disk
  • 17. SRAM Cell ◼ Two transistor inverters are cross connected to implement a basic flip-flop. ◼ The cell is connected to one word line and two bits lines by transistors T1 and T2 ◼ When word line is at ground level, the transistors are turned off and the latch retains its state ◼ Read operation: In order to read state of SRAM cell, the word line is activated to close switches T1 and T2. Sense/Write circuits at the bottom monitor the state of b and b’ Y X Word line Bit lines b T 2 T1 b ′
  • 18. SRAM Technology ● The circuits are capable of retaining their state as long as power is applied. ● value is stored on a pair of inverting gates ● very fast but takes up more space than DRAM (4 to 6 transistors) ● SRAMs are said to be volatile memories because their contents are lost when power is interrupted. ● Static RAMs can be accessed very quickly
  • 19. DRAMs ● Static RAMs are fast, but they cost more area and are more expensive. ● Dynamic RAMs (DRAMs) are cheap and area efficient, but they can not retain their state indefinitely – need to be periodically refreshed. ● value is stored as a charge on capacitor (must be refreshed) ● very small but slower than SRAM (factor of 5 to 10) single-transistor dynamic memory cell T C Word line Bit line
  • 20. ◼ Static RAMs (SRAMs): ▪ Consist of circuits that are capable of retaining their state as long as the power is applied. ▪ Volatile memories, because their contents are lost when power is interrupted. ▪ Access times of static RAMs are in the range of few nanoseconds. ▪ However, the cost is usually high. ◼ Dynamic RAMs (DRAMs): ▪ Do not retain their state indefinitely. ▪ Contents must be periodically refreshed. ▪ Contents may be refreshed while accessing them for reading.
  • 21. Questions ◼ What is mean by memory hierarchy? ◼ What are the factors to be considered in memory hierarchy? ◼ Define hit and miss. ◼ Comparison between SRAM and DRAM.
  • 22. Latency & Bandwidth ◼ Memory latency is the time it takes to transfer a word of data to or from memory ◼ Memory bandwidth is the number of bits or bytes that can be transferred in one second.
  • 23. Flash Memory ▪ Has similar approach to EEPROM. ▪ Read the contents of a single cell, but write the contents of an entire block of cells. ▪ Flash devices have greater density. ▪ Higher capacity and low storage cost per bit. ▪ Power consumption of flash memory is very low, making it attractive for use in equipment that is battery-driven. ▪ Single flash chips are not sufficiently large, so larger memory modules are implemented using flash cards and flash drives.
  • 24. Secondary Memory ● Permanent memory or Non-volatile memory ● Used to store programs and data on a long-term basis ● Storage capacity : very high (ex: 1 TB) ● Access speed : very slow
  • 25. Organization of Data on a Disk Sector 0, track 0 Sector 3, track n Organization of one surface of a disk. Sector 0, track 1
  • 27. Tracks divided into sectors Magnetic Disk Structure Surface organized into tracks
  • 28. Disk Access Head in position above a track
  • 29. Disk Access Rotation is counter-clockwise
  • 30. Disk Access – Read About to read blue sector
  • 31. Disk Access – Read After BLUE read After reading blue sector
  • 32. Disk Access – Read After BLUE read Red request scheduled next
  • 33. Disk Access – Seek After BLUE read Seek for RED Seek to red’s track
  • 34. Disk Access – Rotational Latency After BLUE read Seek for RED Rotational latency Wait for red sector to rotate around
  • 35. Disk Access – Read After BLUE read Seek for RED Rotational latency After RED read Complete read of red
  • 36. Disk Access – Service Time Components After BLUE read Seek for RED Rotational latency After RED read Seek Rotational Latency Data Transfer
  • 37. Speed, Size, and Cost ◼ A big challenge in the design of a computer system is to provide a sufficiently large memory, with a reasonable speed at an affordable cost. ◼ Static RAM: ▪ Very fast, but expensive, because a basic SRAM cell has a complex circuit making it impossible to pack a large number of cells onto a single chip. ◼ Dynamic RAM: ▪ Simpler basic cell circuit, hence are much less expensive, but significantly slower than SRAMs. ◼ Magnetic disks: ▪ Storage provided by DRAMs is higher than SRAMs, but is still less than what is necessary. ▪ Secondary storage such as magnetic disks provide a large amount of storage, but is much slower than DRAMs.
  • 39. Cache ● What is cache memory? ● Why we need it? ● Locality of reference (very important) - temporal - spatial ● Cache block – cache line ● A set of contiguous address locations of some size
  • 40. What is cache memory? ● The cache memory is a small-sized high speed volatile memory that provides high speed data access to a processor. ● The cache memory stores the frequently used computer programs, applications and the program data.
  • 41. Locality of Reference ◼ Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently. ▪ These instructions may be the ones in a loop, nested loop or few procedures calling each other repeatedly. ▪ This is called “locality of reference”. ◼ Temporal locality of reference: ▪ Recently executed instruction is likely to be executed again very soon. ◼ Spatial locality of reference: ▪ Instructions with addresses close to a recently instruction are likely to be executed soon.
  • 42. Cache ● Replacement algorithm ● Hit / miss ● Write-through / Write-back ● Load through Use of a cache memory. Cache Main memory Processor
  • 43. Cache hit • If the data is in the cache it is called a Read or Write hit. • Read hit: ▪ The data is obtained from the cache. • Write hit: ▪ Cache has a replica of the contents of the main memory. ▪ Contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol. ▪ Update the contents of the cache, and mark it as updated by setting a bit known as the dirty bit or modified bit. The contents of the main memory are updated when this block is replaced. This is write-back or copy-back protocol.
  • 44. Cache miss • If the data is not present in the cache, then a Read miss or Write miss occurs. • Read miss: ▪ Block of words containing this requested word is transferred from the memory. ▪ After the block is transferred, the desired word is forwarded to the processor. ▪ The desired word may also be forwarded to the processor as soon as it is transferred without waiting for the entire block to be transferred. This is called load-through or early-restart. • Write-miss: ▪ Write-through protocol is used, then the contents of the main memory are updated directly. ▪ If write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is overwritten with new information.
  • 45. Mapping functions ◼ Mapping functions determine how memory blocks are placed in the cache. ◼ A simple processor example: ▪ Cache consisting of 128 blocks of 16 words each. ▪ Total size of cache is 2048 (2K) words. ▪ Main memory is addressable by a 16-bit address. ▪ Main memory has 64K words. ▪ Main memory has 4K blocks of 16 words each. ◼ Three mapping functions: ▪ Direct mapping ▪ Associative mapping ▪ Set-associative mapping.
  • 46. Direct Mapping ta g ta g ta g Cache Main memor y Block 0 Block 1 Block 127 Block 128 Block 129 Block 255 Block 256 Block 257 Block 4095 Block 0 Block 1 Block 127 7 4 Main memory address Tag Block Wor d Figure 5.15. Direct-mapped cache. 5 4: one of 16 words. (each block has 16=24 words) 7: points to a particular block in the cache (128=27 ) 5: 5 tag bits are compared with the tag bits associated with its location in the cache. Identify which of the 32 blocks that are resident in the cache (4096/128). Block j of main memory maps onto block j modulo 128 of the cache
  • 47. Direct Mapping ● Tag: 11101 ● Block: 1111111=127, in the 127th block of the cache ● Word:1100=12, the 12th word of the 127th block in the cache 7 4 Main memory address Tag Block Wor d 5 11101,1111111,1100
  • 48. Associative Mapping 4 ta g ta g ta g Cache Main memor y Block 0 Block 1 Block i Block 4095 Block 0 Block 1 Block 127 12 Main memory address Figure 5.16. Associative-mapped cache. Tag Wor d 4: one of 16 words. (each block has 16=24 words) 12: 12 tag bits Identify which of the 4096 blocks that are resident in the cache 4096=212 .
  • 49. Associative Mapping ● Tag: 111011111111 ● Word:1100=12, the 12th word of a block in the cache 111011111111,1100 4 12 Main memory address Tag Wor d
  • 50. Set-Associative Mapping ta g ta g ta g Cache Main memor y Block 0 Block 1 Block 63 Block 64 Block 65 Block 127 Block 128 Block 129 Block 4095 Block 0 Block 1 Block 126 ta g ta g Block 2 Block 3 ta g Block 127 Main memory address 6 6 4 Tag Se t Wor d Set 0 Set 1 Set 63 Figure 5.17. Set-associative-mapped cache with two blocks per set. 4: one of 16 words. (each block has 16=24 words) 6: points to a particular set in the cache (128/2=64=26 ) 6: 6 tag bits is used to check if the desired block is present (4096/64=26 ).
  • 51. Set-Associative Mapping ● Tag: 111011 ● Set: 111111=63, in the 63th set of the cache ● Word:1100=12, the 12th word of the 63th set in the cache Main memory address 6 6 4 Tag Se t Wor d 111011,111111,1100
  • 52. Replacement Algorithms ● Difficult to determine which blocks to kick out ● Least Recently Used (LRU) block ● The cache controller tracks references to all blocks as computation proceeds. ● Increase / clear track counters when a hit/miss occurs
  • 53. ● List out the mapping functions. ● What is mean by cache memory?
  • 55. Overview ● Physical main memory is not as large as the address space spanned by an address issued by the processor. 232 = 4 GB, 264 = … ● When a program does not completely fit into the main memory, the parts of it not currently being executed are stored on secondary storage devices. ● Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques. ● Virtual addresses will be translated into physical addresses.
  • 56. Virtual Memory ◼ Virtual memory is an architectural solution to increase the effective size of the memory system.
  • 58. Address Translation ● All programs and data are composed of fixed- length units called pages, each of which consists of a block of words that occupy contiguous locations in the main memory. ● Page cannot be too small or too large. ● The virtual memory mechanism bridges the size and speed gaps between the main memory and secondary storage – similar to cache.
  • 59. Address Translation Page frame Virtual address from processor in memory Offset Offset Virtual page number Page table address Page table base register Figure 5.27. Virtual-memory address translation. Contro l bit s Physical address in main memory PAGE TABLE Page frame +
  • 60. Address Translation ● The page table information is used by the MMU for every access, so it is supposed to be with the MMU. ● However, since MMU is on the processor chip and the page table is rather large, only small portion of it, which consists of the page table entries that correspond to the most recently accessed pages, can be accommodated within the MMU. ● Translation Lookaside Buffer (TLB)
  • 61. TLB Figure 5.28. Use of an associative-mapped TLB. No Yes Hit Miss Virtual address from processor TLB Offset Virtual page number number Virtual page Page frame in memory Control bits Offset Physical address in main memory Page frame =?
  • 62. Address translation (contd..) ◼ What happens if a program generates an access to a page that is not in the main memory? ◼ In this case, a page fault is said to occur. ▪ Whole page must be brought into the main memory from the disk, before the execution can proceed. ◼ Upon detecting a page fault by the MMU, following actions occur: ▪ MMU asks the operating system to intervene by raising an exception. ▪ Processing of the active task which caused the page fault is interrupted. ▪ Control is transferred to the operating system. ▪ Operating system copies the requested page from secondary storage to the main memory. ▪ Once the page is copied, control is returned to the task which was interrupted. 62
  • 63. Address translation (contd..) ◼ When a new page is to be brought into the main memory from secondary storage, the main memory may be full. ▪ Some page from the main memory must be replaced with this new page. ◼ How to choose which page to replace? ▪ This is similar to the replacement that occurs when the cache is full. ▪ The principle of locality of reference (?) can also be applied here. ▪ A replacement strategy similar to LRU can be applied. ◼ Since the size of the main memory is relatively larger compared to cache, a relatively large amount of programs and data can be held in the main memory. ▪ Minimizes the frequency of transfers between secondary storage and main memory. 63
  • 64. Cache & Virtual Memory ◼ Cache memory: ▪ Introduced to bridge the speed gap between the processor and the main memory. ▪ Implemented in hardware. ◼ Virtual memory: ▪ Introduced to bridge the speed gap between the main memory and secondary storage. ▪ Implemented in part by software. 64
  • 67. Accessing I/O devices Bus I/O device 1 I/O device n Processor Memory • Multiple I/O devices may be connected to the processor and the memory via a bus. • Bus consists of three sets of lines to carry address, data and control signals. • Each I/O device is assigned an unique address. • To access an I/O device, the processor places the address on the address lines. • The device recognizes the address, and responds to the control signals.
  • 68. Accessing I/O devices (contd..)  I/O devices and the memory may share the same address space:  Memory-mapped I/O.  Any machine instruction that can access memory can be used to transfer data to or from an I/O device.  Simpler software.  I/O devices and the memory may have different address spaces:  Special instructions to transfer data to and from I/O devices.  I/O devices may have to deal with fewer address lines.  I/O address lines need not be physically separate from memory address lines.  In fact, address lines may be shared between I/O devices and memory, with a control signal to indicate whether it is a memory address or an I/O address. 68
  • 69. Accessing I/O devices (contd..) I/O interface decoder Address Data and status registers Control circuits Input device Bus Address lines Data lines Control lines • I/O device is connected to the bus using an I/O interface circuit which has: - Address decoder, control circuit, and data and status registers. • Address decoder decodes the address placed on the address lines thus enabling the device to recognize its address. • Data register holds the data being transferred to or from the processor. • Status register holds information necessary for the operation of the I/O device. • Data and status registers are connected to the data lines, and have unique addresses. • I/O interface circuit coordinates I/O transfers.
  • 70. Accessing I/O devices (contd..)  Recall that the rate of transfer to and from I/O devices is slower than the speed of the processor. This creates the need for mechanisms to synchronize data transfers between them.  Program-controlled I/O:  Processor repeatedly monitors a status flag to achieve the necessary synchronization.  Processor polls the I/O device.  Two other mechanisms used for synchronizing data transfers between the processor and memory:  Interrupts.  Direct Memory Access.
  • 72. Interrupts • In program-controlled I/O, when the processor continuously monitors the status of the device, it does not perform any useful tasks. • An alternate approach would be for the I/O device to alert the processor when it becomes ready. o Do so by sending a hardware signal called an interrupt to the processor. o At least one of the bus control lines, called an interrupt-request line is dedicated for this purpose. • Processor can perform other useful tasks while it is waiting for the device to be ready.
  • 73. Interrupt • An interrupt is an event that causes the execution of one program to be suspended and execution of another program to be begin.
  • 74. Interrupts (contd..) Interrupt Service routine Program 1 here Interrupt occurs M i 2 1 i 1 + • Processor is executing the instruction located at address i when an interrupt occurs. • Routine executed in response to an interrupt request is called the interrupt-service routine. • When an interrupt occurs, control must be transferred to the interrupt service routine. • But before transferring control, the current contents of the PC (i+1), must be saved in a known location. • This will enable the return-from-interrupt instruction to resume execution at i+1. • Return address, or the contents of the PC are usually stored on the processor stack.
  • 75. Example • Some computations + print • Two subroutines: COMPUTE and PRINT • The printer accepts only one line of text at a time. • Try to overlap printing and computation.  COMPUTE produces first n lines of text;  PRINT sends the first line to the printer; then PRINT is suspended; COMPUTE continues to perform other computations;  After the printer finishes printing the first line, it send an interrupt-request signal to the processor;  In response, the processor interrupts execution of COMPUTE and transfers control to PRINT to send the next line;  COMPUTE resumes;  …
  • 76. Handling Multiple Devices • How can the processor recognize the device requesting an interrupt? • Given that different devices are likely to require different interrupt-service routines, how can the processor obtain the starting address of the appropriate routine in each case? • (Vectored interrupts) • Should a device be allowed to interrupt the processor while another interrupt is being serviced? • (Interrupt nesting) • How should two or more simultaneous interrupt requests be handled? • (Daisy-chain)
  • 77. Vectored Interrupts • The device requesting an interrupt may identify itself directly to the processor. o The device can do so by sending a special code (4 to 8 bits) to the processor over the bus. o The code supplied by the device may represent a part of the starting address of the interrupt-service routine. o The remainder of the starting address is obtained by the processor based on other information, such as the range of memory addresses where interrupt-service routines are located. • Usually the location pointed to by the interrupting device is used to store the starting address of the interrupt-service routine.
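The lookup described above can be sketched as follows; the vector-table base address, entry size, and memory layout are illustrative assumptions, not values from the text.

```python
# Illustrative only: base address and entry size are made-up constants.
VECTOR_TABLE_BASE = 0x0100   # where the interrupt-vector table lives
ENTRY_SIZE = 4               # bytes per vector-table entry

def isr_address(device_code, memory):
    """The device's code selects a location; that location stores the
    starting address of the interrupt-service routine."""
    vector_location = VECTOR_TABLE_BASE + device_code * ENTRY_SIZE
    return memory[vector_location]

# Device 3's vector entry points at an ISR placed at 0x2000.
memory = {VECTOR_TABLE_BASE + 3 * ENTRY_SIZE: 0x2000}
print(hex(isr_address(3, memory)))  # -> 0x2000
```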
  • 78. Interrupt Nesting • Each device has a separate interrupt-request (INTR1 … INTRp) and interrupt-acknowledge (INTA1 … INTAp) line. • Each interrupt-request line is assigned a different priority level. • Interrupt requests received over these lines are sent to a priority arbitration circuit in the processor. • If the interrupt request has a higher priority level than the priority of the processor, the request is accepted.
  • 79. Interrupts (contd..) Polling scheme: • The processor polls the status registers of the I/O devices to determine which device is requesting an interrupt. • The priority is determined by the order in which the devices are polled. • The first device found with its status bit set to 1 is the device whose interrupt request is accepted. Daisy chain scheme: • Devices are connected to form a daisy chain. • Devices share the interrupt-request line, and the interrupt-acknowledge line is connected to form a daisy chain. • When a device raises an interrupt request, the interrupt-request line is activated. • The processor responds by activating interrupt-acknowledge, which is received by device 1; if device 1 does not need service, it passes the signal on to device 2, and so on. • The device that is electrically closest to the processor has the highest priority.
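The daisy-chain priority rule, that the closest requesting device absorbs the acknowledge, can be sketched as a few lines of behavioral code:

```python
def daisy_chain_ack(requests):
    """requests[i] is True if device i needs service, with device 0
    electrically closest to the processor. The interrupt-acknowledge
    signal propagates down the chain until a requesting device absorbs
    it, so the closest requester wins."""
    for position, requesting in enumerate(requests):
        if requesting:
            return position   # this device identifies itself
    return None               # no device requested: spurious interrupt

# Devices 2 and 4 both request; device 2 wins because it is closer.
print(daisy_chain_ack([False, False, True, False, True]))  # -> 2
```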
  • 80. Interrupts (contd..) • When I/O devices are organized into a priority structure, each device has its own interrupt-request and interrupt-acknowledge line. • When I/O devices are organized in a daisy-chain fashion, the devices share an interrupt-request line, and the interrupt-acknowledge signal propagates through the devices. • A combination of the priority structure and the daisy-chain scheme can also be used: • Devices are organized into groups, and each group is assigned a different priority level (with its own INTR and INTA lines). • All the devices within a single group share an interrupt-request line and are connected to form a daisy chain.
  • 81. Exceptions  So far we have considered interrupts caused by interrupt-requests sent by I/O devices.  Interrupts can be used in many other situations where the execution of one program needs to be suspended and the execution of another program needs to be started.  In general, the term exception is used to refer to any event that causes an interruption.  An interrupt-request from an I/O device is one type of exception.  Other types of exceptions are:  Recovery from errors  Debugging  Privilege exception
  • 83. Direct Memory Access (contd..)  Direct Memory Access (DMA):  A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.  The control unit which performs these transfers is a part of the I/O device’s interface circuit. This control unit is called a DMA controller.  The DMA controller performs functions that would normally be carried out by the processor:  For each word, it provides the memory address and all the control signals.  To transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.
  • 84. Direct Memory Access (contd..)  DMA controller can transfer a block of data from an external device to the processor, without any intervention from the processor.  However, the operation of the DMA controller must be under the control of a program executed by the processor. That is, the processor must initiate the DMA transfer.  To initiate the DMA transfer, the processor informs the DMA controller of:  Starting address,  Number of words in the block.  Direction of transfer (I/O device to the memory, or memory to the I/O device).  Once the DMA controller completes the DMA transfer, it informs the processor by raising an interrupt signal.
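The initiation protocol above can be sketched behaviorally. The three parameters (starting address, word count, direction) are from the slide; the class, method, and device names below are hypothetical, chosen only for illustration.

```python
# Behavioral sketch of DMA initiation; not a real controller's interface.
class DMAController:
    def __init__(self, on_complete):
        self.on_complete = on_complete   # raised as an interrupt when done

    def start(self, start_address, word_count, direction, device, memory):
        # The processor writes only these parameters; the controller then
        # supplies every memory address and control signal by itself.
        for i in range(word_count):
            addr = start_address + i
            if direction == "to_memory":      # I/O device -> main memory
                memory[addr] = device.read_word()
            else:                             # main memory -> I/O device
                device.write_word(memory[addr])
        self.on_complete()   # inform the processor that the block is done

class FakeDevice:
    """Stand-in for an I/O device that streams words."""
    def __init__(self, words):
        self.words = list(words)
    def read_word(self):
        return self.words.pop(0)
    def write_word(self, w):
        self.words.append(w)

done = []
memory = {}
dma = DMAController(on_complete=lambda: done.append(True))
dma.start(start_address=0x100, word_count=3, direction="to_memory",
          device=FakeDevice([7, 8, 9]), memory=memory)
print(memory, done)  # three words land at 0x100..0x102, then an interrupt
```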
  • 85. Direct Memory Access • Consider a system in which the processor, main memory, keyboard, printer, network interface, and disk controller are connected by the system bus. • A DMA controller connects the high-speed network interface to the computer bus. • The disk controller, which controls two disks, also has DMA capability; it provides two DMA channels. • It can perform two independent DMA operations, as if each disk had its own DMA controller: the registers that store the memory address, word count, and status and control information are duplicated.
  • 86. Direct Memory Access (contd..)  The processor and the DMA controllers have to use the bus in an interwoven fashion to access the memory.  DMA devices are given higher priority than the processor to access the bus.  Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.  The processor originates most memory access cycles on the bus.  The DMA controller can be said to “steal” memory access cycles from the bus. This interweaving technique is called “cycle stealing”.  An alternative approach is to give a DMA controller the exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block or burst mode.
  • 87. Bus arbitration  Processor and DMA controllers both need to initiate data transfers on the bus and access main memory.  The device that is allowed to initiate transfers on the bus at any given time is called the bus master.  When the current bus master relinquishes its status as the bus master, another device can acquire this status.  The process by which the next device to become the bus master is selected and bus mastership is transferred to it is called bus arbitration.  Centralized arbitration:  A single bus arbiter performs the arbitration.  Distributed arbitration:  All devices participate in the selection of the next bus master.
  • 89. Centralized Bus Arbitration(cont.,) • The bus arbiter may be the processor or a separate unit connected to the bus. • Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers. • A DMA controller requests control of the bus by asserting the Bus Request (BR) line. • In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free. • The BG1 signal is connected to all DMA controllers in a daisy-chain fashion. • When the BBSY signal is 0, the bus is busy; when BBSY becomes 1, the DMA controller which asserted BR can acquire control of the bus.
  • 90. Centralized arbitration (contd..) Timing sequence of the BR, BG1, BG2 and BBSY signals: • DMA controller 2 asserts the BR signal. • The processor asserts the BG1 signal, which propagates along the daisy chain to DMA controller 2. • The processor relinquishes control of the bus by setting BBSY to 1; DMA controller 2 then becomes the bus master.
  • 91. Distributed arbitration  All devices waiting to use the bus share the responsibility of carrying out the arbitration process.  Arbitration process does not depend on a central arbiter and hence distributed arbitration has higher reliability.  Each device is assigned a 4-bit ID number.  All the devices are connected using 5 lines, 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.  To request the bus a device:  Asserts the Start-Arbitration signal.  Places its 4-bit ID number on the arbitration lines.  The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
  • 93. Distributed arbitration(Contd.,) • Arbitration process: o Each device compares the pattern that appears on the arbitration lines to its own ID, starting with MSB. o If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. o The pattern that appears on the arbitration lines is the logical-OR of all the 4-bit device IDs placed on the arbitration lines.
  • 94. Distributed arbitration (contd..) • Device A has the ID 5 and wants to request the bus: - Transmits the pattern 0101 on the arbitration lines. • Device B has the ID 6 and wants to request the bus: - Transmits the pattern 0110 on the arbitration lines. • The pattern that appears on the arbitration lines is the logical OR of the two patterns: - Pattern 0111 appears on the arbitration lines. Arbitration process: • Each device compares the pattern that appears on the arbitration lines to its own ID, starting with the MSB. • If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. • Device A compares its ID pattern 0101 to the pattern 0111 on the lines. • It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the arbitration lines. • The pattern that now appears on the arbitration lines is the logical OR of 0100 and 0110, which is 0110. • This pattern is the same as the device ID of B, and hence B has won the arbitration.
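The worked example above can be simulated. The sketch below models the wired-OR arbitration lines settling one bit at a time from the MSB down, which is a simplification of the real asynchronous settling process:

```python
def distributed_arbitration(ids, width=4):
    """Model of the open-collector arbitration lines: each line carries the
    logical OR of what every contender drives. Bits settle MSB-first; a
    device that sees a 1 on a line where its own bit is 0 drops out by
    driving 0s on that and all lower bit positions. Returns the winner."""
    own = {d: [(d >> (width - 1 - b)) & 1 for b in range(width)] for d in ids}
    driven = {d: bits[:] for d, bits in own.items()}
    for bit in range(width):                     # settle one bit at a time
        line = 0
        for bits in driven.values():
            line |= bits[bit]                    # wired-OR of this line
        for d, bits in driven.items():
            if own[d][bit] == 0 and line == 1:   # difference detected
                for lower in range(bit, width):
                    bits[lower] = 0              # transmit 0s from here down
    for d, bits in driven.items():               # winner still drives its ID
        if bits == own[d]:
            return d

# Devices A (ID 5 = 0101) and B (ID 6 = 0110) both request the bus.
print(distributed_arbitration([5, 6]))  # -> 6, as in the worked example
```

Note that the highest ID always wins, which is why this scheme gives a fixed priority ordering among the contending devices.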
  • 95. Buses
  • 96. Buses • Processor, main memory, and I/O devices are interconnected by means of a bus. • Bus provides a communication path for the transfer of data. o Bus also includes lines to support interrupts and arbitration. • A bus protocol is the set of rules that govern the behavior of various devices connected to the bus, as to when to place information on the bus, when to assert control signals, etc.
  • 97. Buses (contd..)  Bus lines may be grouped into three types:  Data  Address  Control  Control signals specify:  Whether it is a read or a write operation.  Required size of the data, when several operand sizes (byte, word, long word) are possible.  Timing information to indicate when the processor and I/O devices may place data or receive data from the bus.  Schemes for timing of data transfers over a bus can be classified into:  Synchronous,  Asynchronous.
  • 99. Synchronous bus (contd..) Timing of one bus cycle on the bus clock (times t0, t1, t2): • At t0, the master places the device address and command on the bus, and indicates that it is a Read operation. • At t1, the addressed slave places the data on the data lines. • At t2, the master “strobes” the data on the data lines into its input buffer, for a Read operation. • In the case of a Write operation, the master places the data on the bus along with the address and command at time t0, and the slave strobes the data into its input buffer at time t2.
  • 100. Synchronous bus (contd..) • Once the master places the device address and command on the bus, it takes time for this information to propagate to the devices: o This time depends on the physical and electrical characteristics of the bus. • Also, all the devices have to be given enough time to decode the address and control signals, so that the addressed slave can place data on the bus. • Width of the pulse t1 - t0 depends on: o Maximum propagation delay between two devices connected to the bus. o Time taken by all the devices to decode the address and control signals, so that the addressed slave can respond at time t1.
  • 101. Synchronous bus (contd..) • At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its input buffer if it’s a Read operation. o “Strobe” means to capture the values of the data and store them into a buffer. • When data are to be loaded into a storage buffer register, the data should be available for a period longer than the setup time of the device. • Width of the pulse t2 - t1 should be longer than: o Maximum propagation time of the bus plus o Set up time of the input buffer register of the master.
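The two pulse-width constraints on slides 100 and 101 can be checked with back-of-the-envelope arithmetic; all the delay values below are illustrative assumptions, not figures from the text.

```python
# Minimum clock period of a synchronous bus, from its worst-case delays.
t_prop = 4     # ns, worst-case propagation delay between two devices
t_decode = 6   # ns, address/command decode time of the slowest device
t_setup = 2    # ns, setup time of the master's input buffer register

t1_minus_t0 = t_prop + t_decode   # slave must be ready to respond at t1
t2_minus_t1 = t_prop + t_setup    # data must settle and meet setup at t2
min_clock_period = t1_minus_t0 + t2_minus_t1

print(min_clock_period)  # -> 16 (ns); every device is held to this rate
```

This is the sense in which a single-cycle synchronous bus forces all devices to operate at the speed of the slowest device.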
  • 102. Synchronous bus (contd..) Signals as seen by the master and by the slave (with delays tAM, tAS, tDS, tDM): • Signals do not appear on the bus as soon as they are placed on it, due to the propagation delay in the interface circuits. • Signals reach the devices after a propagation delay that depends on the characteristics of the bus: the address and command appear on the bus at tAM and reach the slave at tAS; the data appears on the bus at tDS and reaches the master at tDM. • Data must remain on the bus for some time after t2, equal to the hold time of the buffer.
  • 103. Synchronous bus (contd..) • The data transfer has to be completed within one clock cycle. o The clock period t2 - t0 must be such that the longest propagation delay on the bus and the slowest device interface can be accommodated. o This forces all the devices to operate at the speed of the slowest device. • The processor just assumes that the data are available at t2 in the case of a Read operation, or are read by the device in the case of a Write operation. o What if the device has actually failed and never responds?
  • 104. Synchronous bus (contd..) • Most buses have control signals to represent a response from the slave. • These control signals serve two purposes: o They inform the master that the slave has recognized the address and is ready to participate in a data transfer operation. o They make it possible to adjust the duration of the data transfer operation based on the speed of the participating slaves. • A high-frequency bus clock is used: o A data transfer spans several clock cycles instead of just one clock cycle as in the earlier case.
  • 105. Synchronous bus (contd..) A transfer spanning several clock cycles (1-4): • Clock changes are seen by all the devices at the same time. • The address and command requesting a Read operation appear on the bus. • The slave places the data on the bus and asserts the Slave-ready signal. • The master strobes the data into its input buffer.
  • 106. Asynchronous bus  Data transfers on the bus is controlled by a handshake between the master and the slave.  Common clock in the synchronous bus case is replaced by two timing control lines:  Master-ready,  Slave-ready.  Master-ready signal is asserted by the master to indicate to the slave that it is ready to participate in a data transfer.  Slave-ready signal is asserted by the slave in response to the master-ready from the master, and it indicates to the master that the slave is ready to participate in a data transfer.
  • 107. Asynchronous bus (contd..) • Data transfer using the handshake protocol: o Master places the address and command information on the bus. o Master asserts the Master-ready signal to indicate to the slaves that the address and command information has been placed on the bus. o All devices on the bus decode the address. o The addressed slave performs the required operation, and informs the processor it has done so by asserting the Slave-ready signal. o Master removes all the signals from the bus, once Slave-ready is asserted. o If the operation is a Read operation, the master also strobes the data into its input buffer.
  • 108. Asynchronous bus (contd..) Handshake timing over one bus cycle (note that Master-ready is asserted at t1, not t0): • t0 - Master places the address and command information on the bus. • t1 - Master asserts the Master-ready signal. • t2 - Addressed slave places the data on the bus and asserts the Slave-ready signal. • t3 - Slave-ready signal arrives at the master. • t4 - Master removes the address and command information. • t5 - Slave receives the transition of the Master-ready signal from 1 to 0. It removes the data and the Slave-ready signal from the bus, completing the bus cycle.
  • 109. Asynchronous vs. Synchronous bus • Advantages of the asynchronous bus: o Eliminates the need for synchronization between the sender and the receiver. o Can accommodate varying delays automatically, using the Slave-ready signal. • Disadvantages of the asynchronous bus: o The data transfer rate with a full handshake is limited by two round-trip delays. o Data transfers using a synchronous bus involve only one round-trip delay, so a synchronous bus can achieve faster rates.
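The round-trip argument can be quantified; the delay value below is an assumption chosen only to show the factor-of-two effect, not a figure from the text.

```python
# Illustrative comparison of best-case transfer rates on the two bus types.
one_way_delay_ns = 10                      # assumed end-to-end bus delay

sync_transfer_ns = 2 * one_way_delay_ns    # one round trip per transfer
async_transfer_ns = 4 * one_way_delay_ns   # full handshake: two round trips

sync_rate = 1e3 / sync_transfer_ns         # transfers per microsecond
async_rate = 1e3 / async_transfer_ns
print(sync_rate, async_rate)  # -> 50.0 25.0; the synchronous bus is 2x faster
```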

Editor's Notes

  • #27: Goal: Show the inefficiency of current disk requests. Conveyed Ideas: Rotational latency is wasted time that can be used to service tasks Background Information: None. Slide Background: None. Kill text and arrows
  • #83: This alternative approach is called direct memory access. DMA uses a special control unit provided to transfer a block of data directly between an I/O device and the main memory without continuous intervention by the processor. The control unit which performs these transfers without the intervention of the processor is part of the I/O device’s interface circuit, and this controller is called the DMA controller. The DMA controller performs functions that would normally be performed by the processor. The processor would have to provide a memory address and all the control signals; likewise, the DMA controller provides the memory address where the data is going to be stored, along with the necessary control signals. When a block of data needs to be transferred, the DMA controller also has to increment the memory addresses and keep track of the number of words that have been transferred.
  • #84: A DMA controller can be used to transfer a block of data from an external device to the memory, without requiring any help from the processor. As a result the processor is free to execute other programs. However, the DMA controller performs the task of transferring data to or from an I/O device on behalf of a program that is being executed by the processor. That is, the DMA controller does not and should not have the capability to determine when a data transfer operation should take place; the processor must initiate the DMA transfer when it is required by the program being executed. When the processor determines that the program being executed requires a DMA transfer, it informs the DMA controller, which sits in the interface circuit of the device, of three things: the starting address of the memory block, the number of words that need to be transferred, and the direction of transfer, that is, whether the data needs to be transferred from the I/O device to the memory or from the memory to the I/O device. After initiating the DMA transfer, the processor suspends the program that initiated the transfer and continues with the execution of some other program. The program whose execution is suspended is said to be in the blocked state.
  • #85: Let us consider a memory organization with two DMA controllers. In this organization, one DMA controller is used to connect a high-speed network to the computer bus. In addition, the disk controller, which controls two disks, also has DMA capability. The disk controller provides two DMA channels and can perform two independent DMA operations, as if each disk had its own DMA controller. Each DMA channel has three registers: one to store the memory address, one to store the word count, and one to store the status and control information. There are two copies of these three registers, that is, the registers are duplicated, in order to perform independent DMA operations.
  • #86: The processor has to transfer data to and from the main memory, and the DMA controller is responsible for transferring data between the I/O devices and the main memory. Both the processor and the DMA controller have to use the external bus to talk to the main memory. Usually, DMA controllers are given higher priority than the processor to access the bus. We also need to decide the priority among different DMA devices that may need to use the bus; among these, high priority is given to high-speed peripherals such as a disk or a graphics display device. Usually, the processor originates most cycles on the bus, and the DMA controller can be said to steal memory access cycles from the bus. Thus, the processor and the DMA controller use the bus in an interwoven fashion; this interweaving technique is called cycle stealing. An alternative approach is to give DMA controllers the exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block mode or the burst mode of operation.
  • #87: The processor and the DMA controllers both need to initiate data transfers on the bus and access main memory. The process of using the bus to perform a data transfer operation is called the initiation of a transfer operation. At any point in time only one device is allowed to initiate transfers on the bus; the device that is allowed to do so at any given time is called the bus master. When the current bus master releases control of the bus, another device can acquire the status of bus master. How does one determine which device will next acquire the status of bus master? Note that there may be several DMA controllers plus the processor requiring access to the bus. The process by which the next device to become the bus master is selected, and bus mastership is transferred to it, is called bus arbitration. There are two types of bus arbitration: centralized arbitration and distributed arbitration. In centralized arbitration, a single bus arbiter performs the arbitration, whereas in distributed arbitration all devices which need to initiate data transfers on the bus participate in the selection of the next bus master.
  • #97: Recall that one device plays the role of a master. The device that initiates the data transfer on the bus by issuing read or write control signals is called as a master. The device that is being addressed by the master is called a slave or a target.
  • #105: Slave-ready signal is an acknowledgement from the slave to the master to confirm that the valid data has been sent. Depending on when the slave-ready signal is asserted, the duration of the data transfer can change.