1
ECE 152 / 496
Introduction to Computer Architecture
Input/Output (I/O)
Benjamin C. Lee
Duke University
Slides from Daniel Sorin (Duke)
and are derived from work by
Amir Roth (Penn) and Alvy Lebeck (Duke)
2
Where We Are in This Course Right Now
• So far:
• We know how to design a processor that can fetch, decode, and
execute the instructions in an ISA
• We understand how to design caches and memory
• Now:
• We learn about the lowest level of storage (disks)
• We learn about input/output in general
• Next:
• Multicore processors
• Evaluating and improving performance
3
This Unit: I/O
• I/O system structure
• Devices, controllers, and buses
• Device characteristics
• Disks
• Bus characteristics
• I/O control
• Polling and interrupts
• DMA
[Sidebar figure: system layer stack – Application, OS, Firmware, Compiler, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
4
Readings
• Patterson and Hennessy
• Chapter 6
5
Computers Interact with Outside World
• Input/output (I/O)
• Otherwise, how will we ever tell a computer what to do…
• …or exploit the results of its work?
• Computers without I/O are not useful
• ICQ: What kinds of I/O do computers have?
6
One Instance of I/O
• Have briefly seen one instance of I/O
• Disk: bottom of memory hierarchy
• Holds whatever can’t fit in memory
• ICQ: What else do disks hold?
[Figure: memory hierarchy – CPU with I$ and D$, then L2, Main Memory, and Disk (swap) at the bottom]
7
A More General/Realistic I/O System
• A computer system
• CPU, including cache(s)
• Memory (DRAM)
• I/O peripherals: disks, input devices, displays, network cards, ...
• With built-in or separate I/O (or DMA) controllers
• All connected by a system bus
[Figure: CPU ($) and Main Memory on the “System” (memory-I/O) bus, with disk, keyboard, display, and NIC attached through DMA and I/O controllers]
8
I/O: Control + Data Transfer
• I/O devices have two ports
• Control: commands and status reports
• How we tell I/O what to do
• How I/O tells us about itself
• Control is the tricky part (especially status reports)
• Data
• Labor-intensive part
• “Interesting” I/O devices do data transfers (to/from memory)
• Display: video memory → monitor
• Disk: memory ↔ disk
• Network interface: memory ↔ network
9
Operating System (OS) Plays a Big Role
• I/O interface is typically under OS control
• User applications access I/O devices indirectly (e.g., SYSCALL)
• Why?
• Device drivers are “programs” that OS uses to manage devices
• Virtualization: same argument as for memory
• Physical devices shared among multiple programs
• Direct access could lead to conflicts – example?
• Synchronization
• Most have asynchronous interfaces, require unbounded waiting
• OS handles asynchrony internally, presents synchronous interface
• Standardization
• Devices of a certain type (disks) can/will have different interfaces
• OS handles differences (via drivers), presents uniform interface
10
I/O Device Characteristics
• Primary characteristic
• Data rate (aka bandwidth)
• Contributing factors
• Partner: humans have slower output data rates than machines
• Input or output or both (input/output)
Device        | Partner | Input or Output | Data Rate (KB/s)
Keyboard      | Human   | Input           | 0.01
Mouse         | Human   | Input           | 0.02
Speaker       | Human   | Output          | 0.60
Printer       | Human   | Output          | 200
Display       | Human   | Output          | 240,000
Modem         | Machine | I/O             | 7
Ethernet card | Machine | I/O             | ~1,000,000
Disk          | Machine | I/O             | ~10,000
11
I/O Device Bandwidth: Some Examples
• Keyboard
• 1 B/key * 10 keys/s = 10 B/s
• Mouse
• 2 B/transfer * 10 transfers/s = 20 B/s
• Display
• 4 B/pixel * 1M pixel/display * 60 displays/s = 240 MB/s
12
I/O Device: Disk
• Disk: like stack of record players
• Collection of platters
• Each with read/write head
• Platters divided into concentric tracks
• Head seeks (forward/backward) to track
• All heads move in unison
• Each track divided into sectors
• ZBR (zone bit recording)
• More sectors on outer tracks
• Sectors rotate under head
• Controller
• Seeks heads, waits for sectors
• Turns heads on/off
• May have its own cache (made w/DRAM)
[Figure: disk platter with read/write head, tracks, and sectors]
13
Disk Parameters
• Aside: a slightly newer disk from Toshiba (0.85”, 4 GB) was used in the iPod mini
                    | Seagate ST3200 | Seagate Savvio | Toshiba MK1003
Diameter            | 3.5”           | 2.5”           | 1.8”
Capacity            | 200 GB         | 73 GB          | 10 GB
RPM                 | 7200 RPM       | 10000 RPM      | 4200 RPM
Cache               | 8 MB           | ?              | 512 KB
Disks/Heads         | 2/4            | 2/4            | 1/2
Average Seek        | 8 ms           | 4.5 ms         | 7 ms
Peak Data Rate      | 150 MB/s       | 200 MB/s       | 200 MB/s
Sustained Data Rate | 58 MB/s        | 94 MB/s        | 16 MB/s
Interface           | ATA            | SCSI           | ATA
Use                 | Desktop        | Laptop         | iPod
14
Disk Read/Write Latency
• Disk read/write latency has four components
• Seek delay (t_seek): head seeks to right track
• Rotational delay (t_rotation): right sector rotates under head
• On average: time to go halfway around disk
• Transfer time (t_transfer): data actually being transferred
• Controller delay (t_controller): controller overhead (on either side)
• Example: time to read a 4 KB page assuming…
• 128 sectors/track, 512 B/sector, 6000 RPM, 10 ms t_seek, 1 ms t_controller
• 6000 RPM → 100 R/s → 10 ms/R → t_rotation = 10 ms / 2 = 5 ms
• 4 KB page → 8 sectors → t_transfer = 10 ms * 8/128 ≈ 0.6 ms
• t_disk = t_seek + t_rotation + t_transfer + t_controller
= 10 + 5 + 0.6 + 1 ≈ 16.6 ms
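As a quick check on the arithmetic, here is a small C sketch that reproduces the example (all values are the ones given above; the exact answer is 16.625 ms, rounded to 16.6 ms on the slide):

    #include <stdio.h>

    int main(void) {
        double t_full_rot = 60.0 * 1000.0 / 6000.0;        /* 6000 RPM -> 10 ms/rotation */
        double t_seek     = 10.0;                          /* ms, given                  */
        double t_rotation = t_full_rot / 2.0;              /* average: half a rotation   */
        double sectors    = 4096.0 / 512.0;                /* 4 KB page = 8 sectors      */
        double t_transfer = t_full_rot * sectors / 128.0;  /* 8 of 128 sectors per track */
        double t_ctrl     = 1.0;                           /* ms, given                  */
        printf("t_disk = %.3f ms\n", t_seek + t_rotation + t_transfer + t_ctrl);
        return 0;
    }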
15
Disk Bandwidth
• Disk is bandwidth-inefficient for page-sized transfers
• Actual data transfer (t_transfer) is a small part of disk access (and cycle)
• Increase bandwidth: stripe data across multiple disks
• Striping strategy depends on disk usage model
• “File System” or “web server”: many small files
• Map entire files to disks
• “Supercomputer” or “database”: several large files
• Stripe single file across multiple disks
• Both bandwidth and individual transaction latency
important
16
Error Correction: RAID
• Error correction: more important for disk than for
memory
• Mechanical disk failures (entire disk lost) are a common failure mode
• Entire file system can be lost if files striped across multiple disks
• RAID (redundant array of inexpensive disks)
• Similar to DRAM error correction, but…
• Major difference: which disk failed is known
• Even parity can be used to recover from single failures
• Parity disk can be used to reconstruct data on the faulty disk
• RAID design balances bandwidth and fault-tolerance
• Many flavors of RAID exist
• Tradeoff: extra disks (cost) vs. performance vs. reliability
• Deeper discussion of RAID in ECE 252 and ECE 254
• RAID doesn’t solve all problems – can you think of any examples?
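To make the parity idea concrete, here is a hedged C sketch of single-failure reconstruction (RAID-4/5 style). Because parity = d0 xor d1 xor ... xor d(n-1), and we know which disk died, XORing all surviving blocks (data plus parity) regenerates the lost block. Function and parameter names are illustrative, not from any real RAID implementation.

    #include <stdint.h>
    #include <stddef.h>

    /* Rebuild one block of the failed disk from the surviving disks' blocks
     * (data blocks plus the parity block). Works only for a single, known
     * failure, which is exactly the common disk failure mode.              */
    void raid_rebuild_block(uint8_t *rebuilt, const uint8_t *surviving[],
                            size_t num_surviving, size_t block_bytes)
    {
        for (size_t i = 0; i < block_bytes; i++) {
            uint8_t x = 0;
            for (size_t d = 0; d < num_surviving; d++)
                x ^= surviving[d][i];   /* XOR cancels every known term...   */
            rebuilt[i] = x;             /* ...leaving only the missing one   */
        }
    }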
17
The System Bus
• System bus: connects system components together
• Important: insufficient bandwidth can bottleneck entire system
• Performance factors
• Physical length
• Number and type of connected devices (taps)
[Figure: CPU ($), Main Memory, and I/O devices (disk, keyboard, display, NIC) with their DMA and I/O controllers on the “System” (memory-I/O) bus]
18
Three Buses
• Processor-memory bus
• Connects CPU and memory, no direct I/O interface
+ Short, few taps → fast, high-bandwidth
– System specific
• I/O bus
• Connects I/O devices, no direct P-M interface
– Longer, more taps → slower, lower-bandwidth
+ Industry standard
• Connect P-M bus to I/O bus using adapter
• Backplane bus
• CPU, memory, I/O connected to same bus
+ Industry standard, cheap (no adapters needed)
– Processor-memory performance compromised
[Figures: (top) separate processor-memory bus and I/O bus joined by an adapter; (bottom) a single backplane bus shared by CPU, memory, and I/O]
19
Bus Design
• Goals
• High Performance: low latency and high bandwidth
• Standardization: flexibility in dealing with many devices
• Low Cost
• Processor-memory bus emphasizes performance, then cost
• I/O & backplane emphasize standardization, then
performance
• Design issues
• Width/multiplexing: are wires shared or separate?
• Clocking: is bus clocked or not?
• Switching: how/when is bus control acquired and released?
• Arbitration: how do we decide who gets the bus next?
[Figure: a bus consists of data lines, address lines, and control lines]
20
(1) Bus Width and Multiplexing
• Wider
+ More bandwidth
– More expensive and more susceptible to skew
• Multiplexed: address and data share same lines
+ Cheaper
– Less bandwidth
• Burst transfers (bus parking)
• Multiple sequential data transactions for single address
+ Increase bandwidth at relatively little cost
21
(2) Bus Clocking
• Synchronous: clocked
+ Fast
– Bus must be short to minimize clock skew
• Asynchronous: un-clocked
+ Can be longer: no clock skew, deals with devices of different speeds
– Slower: requires “hand-shaking” protocol
• For example, asynchronous read
• Multiplexed data/address lines, 3 control lines
• Processor drives address onto bus, asserts Request line
• Memory asserts Ack line, processor stops driving
• Memory drives data on bus, asserts DataReady line
• Processor asserts Ack line, memory stops driving
• P-M buses are synchronous
• I/O and backplane buses asynchronous or slow-clock synchronous
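The four-step asynchronous read above can be sketched as code. This is only a software analogy (the struct, its field names, and the busy-waits stand in for real wires and signal edges), but it shows who drives what and in which order:

    #include <stdbool.h>
    #include <stdint.h>

    struct bus {                       /* multiplexed address/data + 3 control lines */
        uint32_t lines;
        volatile bool req, ack, data_ready;
    };

    uint32_t async_read(struct bus *b, uint32_t addr)
    {
        b->lines = addr;               /* 1. processor drives address,               */
        b->req = true;                 /*    asserts Request                         */
        while (!b->ack) { }            /* 2. memory asserts Ack...                   */
        b->req = false;                /*    ...processor stops driving the address  */
        while (!b->data_ready) { }     /* 3. memory drives data, asserts DataReady;  */
        uint32_t data = b->lines;      /*    processor latches the data              */
        b->ack = true;                 /* 4. processor asserts Ack,                  */
        return data;                   /*    memory stops driving                    */
    }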
22
(3) Bus Switching
• Atomic: bus “busy” between request and reply
+ Simple
– Low utilization
• Split-transaction: requests/replies can be interleaved
+ Higher utilization → higher throughput
– Complex, requires sending IDs to match replies to requests
23
(4) Bus Arbitration
• Bus master: component that can initiate a bus request
• Bus typically has several masters, including processor
• I/O devices can also be masters (Why? We’ll see in a bit)
• Arbitration: choosing a master among multiple
requests
• Try to implement priority and fairness (no device “starves”)
• Daisy-chain: devices connect to bus in priority order
• High-priority devices intercept/deny requests by low-priority ones
± Simple, but slow and can’t ensure fairness
• Centralized: special arbiter chip collects requests, decides
± Ensures fairness, but arbiter chip may itself become bottleneck
• Distributed: everyone sees all requests simultaneously
• Back off and retry if not the highest priority request
± No bottlenecks and fair, but needs a lot of control lines
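A fixed-priority arbiter (the policy a daisy chain implements in hardware) is easy to sketch in C; the starvation risk noted above is visible, since master 0 always wins as long as it keeps requesting. The function name and 32-master limit are illustrative assumptions.

    #include <stdint.h>

    /* request_mask bit m == 1 means master m wants the bus.
     * Grant the lowest-numbered (highest-priority) requester; -1 if none. */
    int arbitrate_fixed_priority(uint32_t request_mask)
    {
        for (int m = 0; m < 32; m++)
            if (request_mask & (1u << m))
                return m;       /* higher-priority masters shadow lower ones */
        return -1;
    }

A centralized or distributed scheme would typically rotate priorities (round-robin) to restore fairness.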
24
Standard Bus Examples
• USB (universal serial bus)
• Popular for low/moderate bandwidth external peripherals
+ Packetized interface (like TCP), extremely flexible
+ Also supplies power to the peripheral
                | PCI/PCIe       | SCSI         | USB
Type            | Backplane      | I/O          | I/O
Width           | 32–64 bits     | 8–32 bits    | 1 bit
Multiplexed?    | Yes            | Yes          | Yes
Clocking        | 33 (66) MHz    | 5 (10) MHz   | Asynchronous
Data rate       | 133 (266) MB/s | 10 (20) MB/s | 0.2, 1.5, 60 MB/s
Arbitration     | Distributed    | Distributed  | Daisy-chain
Maximum masters | 1024           | 7–31         | 127
Maximum length  | 0.5 m          | 2.5 m        | –
25
This Unit: I/O
• I/O system structure
• Devices, controllers, and buses
• Device characteristics
• Disks
• Bus characteristics
• I/O control
• Polling and interrupts
• DMA
[Sidebar figure: system layer stack – Application, OS, Firmware, Compiler, CPU, I/O, Memory, Digital Circuits, Gates & Transistors]
26
I/O Control and Interfaces
• Now that we know how I/O devices and buses work…
• How does I/O actually happen?
• How does CPU give commands to I/O devices?
• How do I/O devices execute data transfers?
• How does CPU know when I/O devices are done?
27
Sending Commands to I/O Devices
• Remember: only OS can do this! Two options …
• I/O instructions
• OS only? Instructions must be privileged (only OS can execute)
• E.g., IA-32 (deprecated)
• Memory-mapped I/O
• Portion of physical address space reserved for I/O
• OS maps physical addresses to I/O device control registers
• Stores/loads to these addresses are commands to I/O devices
• Main memory ignores them, I/O devices recognize and
respond
• Address specifies both I/O device and command
• These addresses are not cached – why?
• OS only? I/O physical addresses only mapped in OS address space
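A hedged C sketch of what a memory-mapped command looks like from the OS's point of view; the base address, register offsets, and command encoding are invented for illustration. The volatile casts matter: the compiler must not cache or reorder these accesses, because each store is a device command rather than an ordinary memory write.

    #include <stdint.h>

    #define DEV_BASE   0xFFFF0000u                               /* hypothetical base */
    #define DEV_CMD    (*(volatile uint32_t *)(DEV_BASE + 0x0))  /* command register  */
    #define DEV_ARG    (*(volatile uint32_t *)(DEV_BASE + 0x4))  /* argument register */
    #define DEV_STATUS (*(volatile uint32_t *)(DEV_BASE + 0x8))  /* status register   */

    void device_read_sector(uint32_t sector)
    {
        DEV_ARG = sector;   /* a store to this address means "here is the argument"  */
        DEV_CMD = 0x1;      /* ...and this one means "command: read a sector"        */
    }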
28
Querying I/O Device Status
• Now that we’ve sent command to I/O device …
• How do we query I/O device status?
• So that we know if data we asked for is ready?
• So that we know if device is ready to receive next command?
• Polling: Ready now? How about now? How about now???
• Processor queries I/O device status register (e.g., with MM load)
• Loops until it gets status it wants (ready for next command)
• Or tries again a little later
+ Simple
– Waste of processor’s time
• Processor much faster than I/O device
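Polling is just a loop of loads from the device's status register. This sketch reuses the hypothetical DEV_STATUS register from the memory-mapped I/O example above; bit 0 meaning "ready" is an assumed encoding.

    #include <stdint.h>

    #define DEV_STATUS (*(volatile uint32_t *)0xFFFF0008u)   /* hypothetical status reg */

    void poll_until_ready(void)
    {
        while ((DEV_STATUS & 0x1) == 0)   /* "ready now? how about now?"          */
            ;                             /* every iteration burns CPU cycles     */
    }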
29
Polling Overhead: Example #1
• Parameters
• 2 GHz CPU
• Polling event takes 400 cycles
• Overhead for polling a mouse 30 times per second?
• Cycles per second for polling = (30 poll/s) * (400 cycles/poll)
→ 12,000 cycles/second for polling
• (12000 cycles/second)/(2 G cycles/second) = 0.0006% overhead
+ Negligible overhead
30
Polling Overhead: Example #2
• Same parameters
• 2 GHz CPU, polling event takes 400 cycles
• Overhead for polling a 4 MB/s disk with 16 B interface?
• Must poll often enough not to miss data from disk
• Polling rate = (4MB/s)/(16 B/poll) >> mouse polling rate
• Cycles per second for polling = [(4 MB/s)/(16 B/poll)] * (400 cyc/poll)
→ 100 M cycles/second for polling
• (100 M cycles/second)/(2 G cycles/second) = 5% overhead
– Not so good
• This is the overhead of polling, not actual data transfer
• Really bad if disk is not being used (pure overhead!)
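Both overhead results can be reproduced with a few lines of C (treating MB as 10^6 bytes, as the slides do):

    #include <stdio.h>

    int main(void) {
        const double cpu_hz    = 2e9;      /* 2 GHz CPU                         */
        const double poll_cost = 400.0;    /* cycles per polling event          */

        double mouse = 30.0 * poll_cost / cpu_hz;          /* 30 polls/s        */
        double disk  = (4e6 / 16.0) * poll_cost / cpu_hz;  /* 250,000 polls/s   */

        printf("mouse polling overhead: %.4f%%\n", mouse * 100.0);  /* ~0.0006  */
        printf("disk  polling overhead: %.1f%%\n",  disk  * 100.0); /* ~5.0     */
        return 0;
    }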
31
Interrupt-Driven I/O
• Interrupts: alternative to polling
• I/O device generates interrupt when status changes, data ready
• OS handles interrupts just like exceptions (e.g., page faults)
• Identity of interrupting I/O device recorded in ECR
• ECR: exception cause register
• I/O interrupts are asynchronous
• Not associated with any one instruction
• Don’t need to be handled immediately
• I/O interrupts are prioritized
• Synchronous interrupts (e.g., page faults) have highest priority
• High-bandwidth I/O devices have higher priority than low-bandwidth ones
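A hedged sketch of the OS side of this flow; every name here (read_ecr, service_device, the 8-bit device-ID field) is invented. The handler reads the exception cause register to learn which device interrupted, then runs that device's driver code.

    #include <stdint.h>
    #include <stdio.h>

    static uint32_t read_ecr(void)              /* stand-in for reading the ECR     */
    {
        return 0x05;                            /* pretend device 5 interrupted     */
    }

    static void service_device(uint32_t dev)    /* per-device driver work           */
    {
        printf("device %u: read data/status registers, clear interrupt\n",
               (unsigned)dev);
    }

    void io_interrupt_handler(void)
    {
        uint32_t dev = read_ecr() & 0xFFu;      /* which device interrupted?        */
        service_device(dev);                    /* runs asynchronously with respect */
    }                                           /* to the interrupted instructions  */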
32
Interrupt Overhead
• Parameters
• 2 GHz CPU
• Polling event takes 400 cycles
• Interrupt handler takes 400 cycles
• Data transfer takes 100 cycles
• 4 MB/s, 16 B interface disk, transfers data only 5% of time
• Percent of time processor spends transferring data
• 0.05 * (4 MB/s)/(16 B/xfer)*[(100 c/xfer)/(2 G c/s)] = 0.06%
• Overhead for polling?
• (4 MB/s)/(16 B/poll) * [(400 c/poll)/(2 G c/s)] = 5%
• Overhead for interrupts?
+ 0.05 * (4 MB/s)/(16 B/int) * [(400 c/int)/(2 G c/s)] = 0.25%
Note: when the disk is transferring data, the interrupt rate is the same as the polling rate
33
Direct Memory Access (DMA)
• Interrupts remove overhead of polling…
• But still requires OS to transfer data one word at a time
• OK for low bandwidth I/O devices: mice, microphones, etc.
• Bad for high bandwidth I/O devices: disks, monitors, etc.
• Direct Memory Access (DMA)
• Transfer data between I/O and memory without processor
control
• Transfers entire blocks (e.g., pages, video frames) at a time
• Can use bus “burst” transfer mode if available
• Only interrupts processor when done (or if error occurs)
34
DMA Controllers
• To do DMA, I/O device attached to DMA controller
• Multiple devices can be connected to one DMA controller
• Controller itself seen as a memory mapped I/O device
• Processor initializes start memory address, transfer size, etc.
• DMA controller takes care of bus arbitration and transfer details
• So that’s why buses support arbitration and multiple masters!
[Figure: CPU ($) and Main Memory on the bus, with disk, display, and NIC attached through DMA and I/O controllers]
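A hedged C sketch of "processor initializes start address, transfer size, etc.": because the DMA controller's registers are themselves memory-mapped, kicking off a transfer is just a handful of stores. The base address, register layout, and start bit are invented.

    #include <stdint.h>

    #define DMA_BASE 0xFFFE0000u                                /* hypothetical      */
    #define DMA_SRC  (*(volatile uint32_t *)(DMA_BASE + 0x0))   /* source address    */
    #define DMA_DST  (*(volatile uint32_t *)(DMA_BASE + 0x4))   /* destination addr  */
    #define DMA_LEN  (*(volatile uint32_t *)(DMA_BASE + 0x8))   /* bytes to transfer */
    #define DMA_CTRL (*(volatile uint32_t *)(DMA_BASE + 0xC))   /* bit 0 = start     */

    void dma_start(uint32_t src, uint32_t dst, uint32_t nbytes)
    {
        DMA_SRC  = src;
        DMA_DST  = dst;
        DMA_LEN  = nbytes;
        DMA_CTRL = 0x1;   /* controller now arbitrates for the bus, moves the     */
    }                     /* whole block, and interrupts the CPU when it is done  */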
35
I/O Processors
• A DMA controller is a very simple component
• May be as simple as an FSM with some local memory
• Some I/O requires complicated sequences of transfers
• I/O processor: heavier DMA controller that executes instructions
• Can be programmed to do complex transfers
• E.g., programmable network card
[Figure: the same system, with one device’s DMA controller replaced by an I/O processor (IOP)]
36
DMA Overhead
• Parameters
• 2 GHz CPU
• Interrupt handler takes 400 cycles
• Data transfer takes 100 cycles
• 4 MB/s, 16 B interface, disk transfers data 50% of time
• DMA setup takes 1600 cycles, transfers one 16 KB page at a time
• Processor overhead for interrupt-driven I/O?
• 0.5 * (4M B/s)/(16 B/xfer) * [(500 c/xfer)/(2 G c/s)] = 3.1% (500 = 400-cycle interrupt + 100-cycle transfer)
• Processor overhead with DMA?
• Processor only gets involved once per page, not once per 16 B
+ 0.5 * (4M B/s)/(16K B/page) * [(2000 c/page)/(2 G c/s)] = 0.01% (2000 = 1600-cycle setup + 400-cycle interrupt)
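In code (MB treated as 10^6 bytes, cycle counts from the parameters above):

    #include <stdio.h>

    int main(void) {
        const double cpu_hz = 2e9;    /* 2 GHz                                    */
        const double busy   = 0.5;    /* disk is transferring 50% of the time     */
        const double rate   = 4e6;    /* 4 MB/s disk bandwidth                    */

        /* interrupt-driven: (400-cycle interrupt + 100-cycle copy) per 16 B      */
        double intr = busy * (rate / 16.0) * 500.0 / cpu_hz;
        /* DMA: (1600-cycle setup + 400-cycle interrupt) per 16 KB page           */
        double dma  = busy * (rate / 16384.0) * 2000.0 / cpu_hz;

        printf("interrupt-driven overhead: %.2f%%\n", intr * 100.0);  /* ~3.1   */
        printf("DMA overhead:              %.3f%%\n", dma  * 100.0);  /* ~0.01  */
        return 0;
    }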
37
DMA and Memory Hierarchy
• DMA is good, but is not without challenges
• Without DMA: processor initiates all data transfers
• All transfers go through address translation
+ Transfers can be of any size and cross virtual page
boundaries
• All values seen by cache hierarchy
+ Caches never contain stale data
• With DMA: DMA controllers initiate data transfers
• Do they use virtual or physical addresses?
• What if they write data to a cached memory location?
38
DMA and Address Translation
• Which addresses does processor specify to DMA
controller?
• Virtual DMA
+ Can specify large cross-page transfers
– DMA controller has to do address translation internally
• DMA contains small translation buffer (TB)
• OS initializes buffer contents when it requests an I/O transfer
• Physical DMA
+ DMA controller is simple
– Can only do short, at most page-sized transfers
• OS breaks large transfers into page-size chunks
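A hedged sketch of how an OS might chop a large transfer into page-sized pieces for a physical-address DMA engine. virt_to_phys() is a stand-in for the page-table lookup, and dma_start() is the hypothetical controller-programming routine sketched earlier; neither name comes from a real API.

    #include <stdint.h>

    #define PAGE_SIZE 4096u

    extern uint32_t virt_to_phys(uint32_t va);   /* stand-in: OS page-table lookup  */
    extern void dma_start(uint32_t src, uint32_t dst, uint32_t nbytes);

    /* Copy nbytes from a device (at physical address dev_pa) into a virtually
     * contiguous buffer, one physically contiguous piece at a time.            */
    void dma_read_into_buffer(uint32_t dev_pa, uint32_t buf_va, uint32_t nbytes)
    {
        while (nbytes > 0) {
            uint32_t offset = buf_va & (PAGE_SIZE - 1);
            uint32_t chunk  = PAGE_SIZE - offset;         /* stay within one page */
            if (chunk > nbytes) chunk = nbytes;
            dma_start(dev_pa, virt_to_phys(buf_va), chunk);
            /* (a real driver would queue or wait for each piece to complete)    */
            dev_pa += chunk;  buf_va += chunk;  nbytes -= chunk;
        }
    }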
39
DMA and Caching
• Caches are good
• Reduce CPU’s observed instruction and data access latency
+ But also, reduce CPU’s use of memory…
+ …leaving majority of memory/bus bandwidth for DMA I/O
• But they also introduce a coherence problem for DMA
• Input problem
• DMA write into memory version of cached location
• Cached version now stale
• Output problem: write-back caches only
• DMA read from memory version of “dirty” cached location
• Output stale value
40
Solutions to Coherence Problem
• Route all DMA I/O accesses to cache
+ Solves problem
– Expensive: CPU must contend for access to caches with DMA
• Disallow caching of I/O data
+ Also works
– Expensive in a different way: CPU access to those regions slow
• Selective flushing/invalidations of cached data
• Flush all dirty blocks in “I/O region”
• Invalidate blocks in “I/O region” as DMA writes those addresses
+ The high performance solution
• Hardware cache coherence mechanisms for doing this
– Expensive in yet a third way: must implement this mechanism
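The selective flush/invalidate option, sketched in C with invented cache-maintenance primitives (real systems expose equivalents as privileged instructions or kernel APIs); dma_start() is again the hypothetical routine from the DMA-controller sketch.

    #include <stdint.h>
    #include <stddef.h>

    extern void cache_flush_range(const void *buf, size_t len);  /* write back dirty lines */
    extern void cache_invalidate_range(void *buf, size_t len);   /* discard cached copies  */
    extern void dma_start(uint32_t src, uint32_t dst, uint32_t nbytes);

    /* Output: the device will read memory, so push any dirty cached data first. */
    void dma_output(const void *buf, uint32_t buf_pa, uint32_t dev_pa, uint32_t len)
    {
        cache_flush_range(buf, len);
        dma_start(buf_pa, dev_pa, len);
    }

    /* Input: the device will write memory, so drop the soon-to-be-stale cached copies. */
    void dma_input(void *buf, uint32_t buf_pa, uint32_t dev_pa, uint32_t len)
    {
        cache_invalidate_range(buf, len);
        dma_start(dev_pa, buf_pa, len);
    }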
41
H/W Cache Coherence (more later on this)
• D$ and L2 “snoop” bus traffic
• Observe transactions
• Check if written addresses are resident
• Self-invalidate those blocks
+ Doesn’t require access to data part
– Does require access to tag part
• May need 2nd copy of tags for this
• That’s OK, tags smaller than data
• Bus addresses are physical
• L2 is easy (physical index/tag)
• D$ is harder (virtual index/physical
tag)
[Figure: CPU with I$ and D$ (virtually indexed, with TLBs), physically indexed L2, Main Memory, and a disk doing DMA on the bus; bus addresses are physical (PA), CPU-side addresses are virtual (VA)]
42
Designing an I/O System for Bandwidth
• Approach
• Find bandwidths of individual components
• Configure components you can change…
• To match bandwidth of bottleneck component you can’t
• Example (from P&H textbook, 3rd edition)
• Parameters
• 300 MIPS CPU, 100 MB/s backplane bus
• 50K OS insns + 100K user insns per I/O operation
• SCSI-2 controllers (20 MB/s): each accommodates up to 7 disks
• 5 MB/s disks with t_seek + t_rotation = 10 ms, 64 KB reads
• Determine
• What is the maximum sustainable I/O rate?
• How many SCSI-2 controllers and disks does it require?
43
Designing an I/O System for Bandwidth
• First: determine I/O rates of components we can’t change
• CPU: (300M insn/s) / (150K Insns/IO) = 2000 IO/s
• Backplane: (100M B/s) / (64K B/IO) = 1562 IO/s
• Peak I/O rate determined by bus: 1562 IO/s
• Second: configure remaining components to match rate
• Disk: 1 / [10 ms/IO + (64K B/IO) / (5M B/s)] = 43.9 IO/s
• How many disks?
• (1562 IO/s) / (43.9 IO/s) = 36 disks
• How many controllers?
• (43.9 IO/s) * (64K B/IO) = 2.74M B/s per disk
• (20M B/s) / (2.74M B/s) = 7.2, so at most 7 disks per SCSI-2 controller
• (36 disks) / (7 disks/controller) = 5.1, rounded up to 6 SCSI-2 controllers
• Caveat: real I/O systems modeled with simulation
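The whole calculation fits in a few lines of C (the slide's convention of 64K = 64,000 bytes and decimal MB is kept so the numbers match):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double io_bytes = 64e3;                        /* 64 KB per I/O           */
        double cpu_rate  = 300e6 / 150e3;                    /* 2000 IO/s               */
        double bus_rate  = 100e6 / io_bytes;                 /* 1562 IO/s (bottleneck)  */
        double disk_rate = 1.0 / (10e-3 + io_bytes / 5e6);   /* ~43.9 IO/s per disk     */

        double peak  = cpu_rate < bus_rate ? cpu_rate : bus_rate;
        double disks = ceil(peak / disk_rate);                         /* 36 disks      */
        double per_disk_bw    = disk_rate * io_bytes;                  /* ~2.74 MB/s    */
        double disks_per_ctrl = fmin(floor(20e6 / per_disk_bw), 7.0);  /* SCSI-2 cap    */
        double ctrls = ceil(disks / disks_per_ctrl);                   /* 6 controllers */

        printf("peak: %.0f IO/s, disks: %.0f, controllers: %.0f\n", peak, disks, ctrls);
        return 0;
    }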
44
Designing an I/O System for Latency
• Previous system designed for bandwidth
• Some systems have latency requirements as well
• E.g., database system may require maximum or average latency
• Latencies are actually harder to deal with than bandwidths
• Unloaded system: few concurrent IO transactions
• Latency is easy to calculate
• Loaded system: many concurrent IO transactions
• Contention can lead to queuing
• Latencies can rise dramatically
• Queuing theory can help if transactions obey fixed distribution
• Otherwise simulation is needed
45
Summary
• Role of the OS
• Device characteristics
• Data bandwidth
• Disks
• Structure and latency: seek, rotation, transfer, controller delays
• Bus characteristics
• Processor-memory, I/O, and backplane buses
• Width, multiplexing, clocking, switching, arbitration
• I/O control
• I/O instructions vs. memory mapped I/O
• Polling vs. interrupts
• Processor controlled data transfer vs. DMA
• Interaction of DMA with memory system