Chapter 1
Computer System
Overview
Seventh Edition
By William Stallings
Operating
Systems:
Internals
and Design
Principles
Operating Systems:
Internals and Design Principles
“No artifact designed by man is so convenient for this kind of
functional description as a digital computer. Almost the only ones
of its properties that are detectable in its behavior are the
organizational properties. Almost no interesting statement that
one can make about an operating computer bears any particular
relation to the specific nature of the hardware. A computer is an
organization of elementary functional components in which, to a
high approximation, only the function performed by those
components is relevant to the behavior of the whole system.”
THE SCIENCES OF THE ARTIFICIAL,
Herbert Simon
Operating System
 Exploits the hardware resources of one or more
processors to provide a set of services to system
users
 Manages secondary memory and I/O devices
Basic Elements
Processor
Main Memory
Volatile
Contents of memory are lost when the
computer is shut down
Referred to as real memory
or primary memory
I/O Modules
System Bus
Provides for
communication among
processors, main memory,
and I/O modules
Top-Level
View
Microprocessor
Invention that brought about desktop
and handheld computing
Processor on a single chip
Fastest general purpose processor
Multiprocessors
Each chip (socket) contains multiple
processors (cores)
Graphical Processing
Units (GPUs)
Provide efficient computation on arrays
of data using Single-Instruction Multiple
Data (SIMD) techniques
Used for general numerical processing
Physics simulations for games
Computations on large spreadsheets
Digital Signal Processors
(DSPs)
Deal with streaming signals such as
audio or video
Used to be embedded in devices like
modems
Encoding/decoding speech and video
(codecs)
Support for encryption and security
System on a Chip
(SoC)
To satisfy the requirements of handheld
devices, the microprocessor is giving way
to the SoC
Components such as DSPs, GPUs,
codecs and main memory, in
addition to the CPUs and caches,
are on the same chip
Instruction Execution
 A program consists of a set of instructions
stored in memory
Basic Instruction Cycle
 The processor fetches the instruction from
memory
 Program counter (PC) holds address of the
instruction to be fetched next
 PC is incremented after each fetch
Instruction Register (IR)
Fetched instruction is
loaded into Instruction
Register (IR)
 Processor interprets the
instruction and performs
required action:
 Processor-memory
 Processor-I/O
 Data processing
 Control
Characteristics of a
Hypothetical Machine
Example of
Program
Execution
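The program execution of Figure 1.4 can be sketched in a few lines of Python. The opcodes (1 = load AC from memory, 2 = store AC to memory, 5 = add to AC) follow Figure 1.3; the operand values placed at addresses 940 and 941 are illustrative assumptions.

```python
# Minimal sketch of the hypothetical machine of Figures 1.3/1.4:
# 16-bit words, 4-bit opcode, 12-bit address, single accumulator (AC).

def run(memory, pc):
    ac = 0
    while pc in memory:
        ir = memory[pc]               # fetch stage: instruction into IR
        pc += 1                       # PC incremented after each fetch
        opcode, addr = ir >> 12, ir & 0xFFF
        if opcode == 0x1:             # load AC from memory
            ac = memory[addr]
        elif opcode == 0x2:           # store AC to memory
            memory[addr] = ac
        elif opcode == 0x5:           # add contents of memory word to AC
            ac = (ac + memory[addr]) & 0xFFFF
        else:
            break                     # unrecognized opcode: halt
    return ac

# The fragment of Figure 1.4: add the word at 940 to the word at 941
# and store the result in 941 (the data values 3 and 2 are assumed).
mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
       0x940: 0x0003, 0x941: 0x0002}
run(mem, 0x300)
print(hex(mem[0x941]))  # → 0x5
```

Three fetch/execute cycles suffice, exactly as the slide's walkthrough describes.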
Interrupts
 Interrupt the normal sequencing of the
processor
 Provided to improve processor utilization
 most I/O devices are slower than the processor
 processor must pause to wait for device
 wasteful use of the processor
Common Classes
of Interrupts
Flow of Control
Without
Interrupts
Interrupts:
Short I/O Wait
Transfer of Control via Interrupts
Instruction Cycle With Interrupts
Program Timing:
Short I/O Wait
Program Timing:
Long I/O wait
Simple
Interrupt
Processing
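The save-and-restore sequence of Figure 1.10 can be sketched as follows; the register values, handler entry address, and class layout are illustrative assumptions, not any real architecture.

```python
# Sketch of the minimum steps when an interrupt is taken: save the PSW
# and PC on a control stack, branch to the handler, then restore state so
# the user program resumes at the point of interruption.

class CPU:
    def __init__(self):
        self.pc = 0x300          # next user instruction (illustrative)
        self.psw = 0b0010        # program status word (contents illustrative)
        self.stack = []          # control stack

    def interrupt(self, handler_entry):
        # save information needed to resume the current program
        self.stack.append((self.psw, self.pc))
        # load PC with the entry location of the interrupt-handling routine
        self.pc = handler_entry

    def return_from_interrupt(self):
        # restore saved PSW and PC; user program continues where it left off
        self.psw, self.pc = self.stack.pop()

cpu = CPU()
cpu.interrupt(handler_entry=0xF000)
assert cpu.pc == 0xF000          # now executing the handler
cpu.return_from_interrupt()
print(hex(cpu.pc))               # → 0x300, resumed at point of interrupt
```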
Multiple Interrupts
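One way to picture nested interrupt handling is a toy model in which a higher-priority interrupt arriving mid-handler preempts it, and the preempted handler resumes afterward; the device names and priorities below are assumptions for illustration.

```python
# Toy model of nested interrupts: while a lower-priority handler runs,
# a higher-priority interrupt preempts it; the preempted handler resumes
# once the higher-priority one completes.

def run_handler(name, priority, arrivals, log):
    log.append(f"{name} start")
    # an interrupt arriving mid-handler preempts us if its priority is higher
    while arrivals and arrivals[0][1] > priority:
        nxt_name, nxt_prio = arrivals.pop(0)
        run_handler(nxt_name, nxt_prio, arrivals, log)
    log.append(f"{name} done")

log = []
arrivals = [("comm line", 4)]        # arrives while the printer handler runs
run_handler("printer", 2, arrivals, log)
print(log)
# → ['printer start', 'comm line start', 'comm line done', 'printer done']
```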
Memory Hierarchy
 Major constraints on memory
 amount
 speed
 expense
 Memory must be able to keep up with the processor
 Cost of memory must be reasonable in relationship to
the other components
Memory Relationships
The Memory Hierarchy
 Going down the
hierarchy:
 decreasing cost per bit
 increasing capacity
 increasing access time
 decreasing frequency of
access to the memory by
the processor
Performance of a Simple
Two-Level Memory
Figure 1.15 Performance of a Simple Two-Level Memory
 Memory references by the processor tend to
cluster
 Data is organized so that the percentage of
accesses to each successively lower level is
substantially less than that of the level above
 Can be applied across more than two levels of
memory
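The benefit of clustering can be made concrete with the usual two-level average-access-time calculation; the hit ratio and access times below are made-up illustrative values.

```python
# Average access time of a simple two-level memory (cf. Figure 1.15),
# assuming level 1 is checked first; on a miss the word is fetched from
# level 2 after the failed level-1 check.

def avg_access_time(hit_ratio, t1_ns, t2_ns):
    # hit: only level 1 accessed; miss: level-1 check plus level-2 access
    return hit_ratio * t1_ns + (1 - hit_ratio) * (t1_ns + t2_ns)

# With strong locality, most references hit the fast level:
print(round(avg_access_time(0.95, 1, 100), 3))   # → 6.0 (ns, near level-1 speed)
print(avg_access_time(0.50, 1, 100))             # → 51.0 (ns)
```

The closer the hit ratio is to 1, the closer overall speed is to that of the fast, small level, which is why the percentage of accesses to each lower level must be kept small.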
Cache Memory
 Invisible to the OS
 Interacts with other memory management hardware
 Processor must access memory at least once per instruction
cycle
 Processor execution is limited by memory cycle time
 Exploit the principle of locality with a small, fast memory
 Contains a copy of a portion of main memory
 Processor first checks cache
 If not found, a block of memory is read into cache
 Because of locality of reference, it is likely that many of the
future memory references will be to other bytes in the block
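A toy direct-mapped cache model shows why locality pays off: a miss brings the whole block into the cache, so subsequent references to neighboring bytes hit. The block size and line count here are arbitrary illustrative choices.

```python
# Toy direct-mapped cache: on a miss the block containing the word is
# read into a cache line; later references to bytes in that block hit.

BLOCK = 16      # bytes per block (illustrative)
LINES = 64      # number of cache lines (illustrative)

def simulate(addresses):
    lines = {}                  # line index -> tag currently cached
    hits = 0
    for a in addresses:
        block = a // BLOCK
        tag, line = block // LINES, block % LINES
        if lines.get(line) == tag:
            hits += 1
        else:
            lines[line] = tag   # miss: bring the whole block into cache
    return hits / len(addresses)

# Sequential access: each block misses once, then the other 15 bytes hit.
print(simulate(range(0, 4096)))        # → 0.9375 hit ratio
```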
Cache and
Main
Memory
Cache/Main-Memory Structure
I/O Techniques
∗ When the processor encounters an instruction relating to
I/O, it executes that instruction by issuing a command to
the appropriate I/O module
Programmed I/O
 The I/O module performs the requested action
then sets the appropriate bits in the I/O status
register
 The processor periodically checks the status of the
I/O module until it determines the instruction is
complete
 With programmed I/O the performance level of
the entire system is severely degraded
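The degradation comes from busy-waiting, which a short sketch makes visible; the device model and cycle counts below are invented for illustration.

```python
# Sketch of programmed I/O: after issuing a command, the processor
# busy-waits, repeatedly checking the status register until the module
# reports completion. The device "finishes" after a fixed number of
# polling steps in this toy model.

class IOModule:
    def __init__(self, cycles_needed):
        self.cycles = cycles_needed
        self.busy = False
    def command(self, data):
        self.data, self.busy = data, True
    def poll(self):
        # each status check corresponds to one time step of device progress
        self.cycles -= 1
        if self.cycles <= 0:
            self.busy = False
        return self.busy

dev = IOModule(cycles_needed=1000)
dev.command("output buffer")
polls = 0
while dev.poll():          # processor does nothing useful in this loop
    polls += 1
print(polls)               # → 999 status checks wasted busy-waiting
```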
Interrupt-Driven I/O
Interrupt-Driven I/O
Drawbacks
 Transfer rate is limited by the speed with
which the processor can test and service a
device
 The processor is tied up in managing an I/O
transfer
 a number of instructions must be
executed for each I/O transfer
Direct Memory Access
(DMA)
∗ Performed by a separate module on the system bus or
incorporated into an I/O module
 Transfers the entire block of data directly to
and from memory without going through the
processor
 processor is involved only at the beginning and end of the
transfer
 processor executes more slowly during a transfer when
processor access to the bus is required
 More efficient than interrupt-driven or
programmed I/O
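A back-of-envelope comparison illustrates the gain; the word count and per-interrupt overhead below are assumed figures, not measurements.

```python
# Illustrative comparison: interrupt-driven I/O interrupts the processor
# once per word transferred, while DMA involves the processor only at the
# beginning and end of a whole block transfer.

WORDS = 4096        # words in the block (assumed)
OVERHEAD = 100      # processor cycles per interrupt/service (assumed)

interrupt_driven = WORDS * OVERHEAD     # service routine runs per word
dma = 2 * OVERHEAD                      # setup + completion interrupt only

print(interrupt_driven, dma)            # → 409600 200
```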
Symmetric Multiprocessors
(SMP)
 A stand-alone computer system with the
following characteristics:
 two or more similar processors of comparable capability
 processors share the same main memory and are
interconnected by a bus or other internal connection scheme
 processors share access to I/O devices
 all processors can perform the same functions
 the system is controlled by an integrated operating system
that provides interaction between processors and their
programs at the job, task, file, and data element levels
SMP Organization
Figure 1.19 Symmetric Multiprocessor Organization
Multicore Computer
 Also known as a chip multiprocessor
 Combines two or more processors (cores) on a
single piece of silicon (die)
 each core consists of all of the components of an
independent processor
 In addition, multicore chips also include L2
cache and in some cases L3 cache
Intel Core i7
Intel
Core i7
Figure 1.20 Intel Core i7 Block Diagram
Summary
Basic Elements
 processor, main memory, I/O modules, system
bus
 GPUs, SIMD, DSPs, SoC
 Instruction execution
 processor-memory, processor-I/O, data processing,
control
 Interrupt/Interrupt Processing
 Memory Hierarchy
 Cache/cache principles and designs
 Multiprocessor/multicore
Editor's Notes
  • #2: “Operating Systems: Internals and Design Principles”, 7/e, by William Stallings, Chapter 1 “Computer System Overview”.
  • #3: Chapter 1 provides an overview of computer system hardware. In most areas, the survey is brief, as it is assumed that the reader is familiar with this subject. However, several areas are covered in some detail because of their importance to topics covered later in the book.
  • #4: An operating system (OS) exploits the hardware resources of one or more processors to provide a set of services to system users. The OS also manages secondary memory and I/O (input/output) devices on behalf of its users. Accordingly, it is important to have some understanding of the underlying computer system hardware before we begin our examination of operating systems.
  • #5: At a top level, a computer consists of processor, memory, and I/O components, with one or more modules of each type. These components are interconnected in some fashion to achieve the main function of the computer, which is to execute programs. Thus, there are four main structural elements: Processor I/O Modules Main Memory System Bus
  • #6: Processor : Controls the operation of the computer and performs its data processing functions. When there is only one processor, it is often referred to as the central processing unit (CPU).
  • #7: Main memory : Stores data and programs. This memory is typically volatile; that is, when the computer is shut down, the contents of the memory are lost. In contrast, the contents of disk memory are retained even when the computer system is shut down. Main memory is also referred to as real memory or primary memory.
  • #8: I/O modules : Move data between the computer and its external environment. The external environment consists of a variety of devices, including secondary memory devices (e.g., disks), communications equipment, and terminals.
  • #9: System bus : Provides for communication among processors, main memory, and I/O modules.
  • #10: Figure 1.1 depicts these top-level components. One of the processor’s functions is to exchange data with memory. For this purpose, it typically makes use of two internal (to the processor) registers: a memory address register (MAR), which specifies the address in memory for the next read or write; and a memory buffer register (MBR), which contains the data to be written into memory or which receives the data read from memory. Similarly, an I/O address register (I/OAR) specifies a particular I/O device. An I/O buffer register (I/OBR) is used for the exchange of data between an I/O module and the processor. A memory module consists of a set of locations, defined by sequentially numbered addresses. Each location contains a bit pattern that can be interpreted as either an instruction or data. An I/O module transfers data from external devices to processor and memory, and vice versa. It contains internal buffers for temporarily holding data until they can be sent on.
  • #11: The hardware revolution that brought about desktop and handheld computing was the invention of the microprocessor, which contained a processor on a single chip. Though originally much slower than multichip processors, microprocessors have continually evolved to the point that they are now much faster for most computations due to the physics involved in moving information around in sub-nanosecond timeframes. Not only have microprocessors become the fastest general purpose processors available, they are now multiprocessors; each chip (called a socket) contains multiple processors (called cores), each with multiple levels of large memory caches, and multiple logical processors sharing the execution units of each core. As of 2010, it is not unusual for even a laptop to have 2 or 4 cores, each with 2 hardware threads, for a total of 4 or 8 logical processors.
  • #12: Although processors provide very good performance for most forms of computing, there is increasing demand for numerical computation. Graphical Processing Units (GPUs) provide efficient computation on arrays of data using Single-Instruction Multiple Data (SIMD) techniques pioneered in supercomputers. GPUs are no longer used just for rendering advanced graphics, but they are also used for general numerical processing, such as physics simulations for games or computations on large spreadsheets. Simultaneously, the CPUs themselves are gaining the capability of operating on arrays of data—with increasingly powerful vector units integrated into the processor architecture of the x86 and AMD64 families. Processors and GPUs are not the end of the computational story for the modern PC.
  • #13: Digital Signal Processors (DSPs) are also present, for dealing with streaming signals—such as audio or video. DSPs used to be embedded in I/O devices, like modems, but they are now becoming first-class computational devices, especially in handhelds. Other specialized computational devices (fixed function units) co-exist with the CPU to support other standard computations, such as encoding/decoding speech and video (codecs), or providing support for encryption and security.
  • #14: To satisfy the requirements of handheld devices, the classic microprocessor is giving way to the System on a Chip (SoC), where not just the CPUs and caches are on the same chip, but also many of the other components of the system, such as DSPs, GPUs, I/O devices (such as radios and codecs), and main memory.
  • #15: A program to be executed by a processor consists of a set of instructions stored in memory. In its simplest form, instruction processing consists of two steps: The processor reads ( fetches ) instructions from memory one at a time and executes each instruction. Program execution consists of repeating the process of instruction fetch and instruction execution. Instruction execution may involve several operations and depends on the nature of the instruction.
  • #16: The processing required for a single instruction is called an instruction cycle. Using a simplified two-step description, the instruction cycle is depicted in Figure 1.2 . The two steps are referred to as the fetch stage and the execute stage. Program execution halts only if the processor is turned off, some sort of unrecoverable error occurs, or a program instruction that halts the processor is encountered.
  • #17: At the beginning of each instruction cycle, the processor fetches an instruction from memory. Typically, the program counter (PC) holds the address of the next instruction to be fetched. Unless instructed otherwise, the processor always increments the PC after each instruction fetch so that it will fetch the next instruction in sequence (i.e., the instruction located at the next higher memory address).
  • #18: The fetched instruction is loaded into the instruction register (IR). The instruction contains bits that specify the action the processor is to take. The processor interprets the instruction and performs the required action. In general, these actions fall into four categories: • Processor-memory: Data may be transferred from processor to memory or from memory to processor. • Processor-I/O: Data may be transferred to or from a peripheral device by transferring between the processor and an I/O module. • Data processing: The processor may perform some arithmetic or logic operation on data. • Control: An instruction may specify that the sequence of execution be altered. An instruction’s execution may involve a combination of these actions.
  • #19: Consider a simple example using a hypothetical processor that includes the characteristics listed in Figure 1.3. The processor contains a single data register, called the accumulator (AC). Both instructions and data are 16 bits long, and memory is organized as a sequence of 16-bit words. The instruction format provides 4 bits for the opcode, allowing as many as 2^4 = 16 different opcodes (represented by a single hexadecimal digit). The opcode defines the operation the processor is to perform. With the remaining 12 bits of the instruction format, up to 2^12 = 4,096 (4K) words of memory (denoted by three hexadecimal digits) can be directly addressed.
  • #20: Figure 1.4 illustrates a partial program execution, showing the relevant portions of memory and processor registers. The program fragment shown adds the contents of the memory word at address 940 to the contents of the memory word at address 941 and stores the result in the latter location. Three instructions, which can be described as three fetch and three execute stages, are required: The PC contains 300, the address of the first instruction. This instruction (the value 1940 in hexadecimal) is loaded into the IR and the PC is incremented. Note that this process involves the use of a memory address register (MAR) and a memory buffer register (MBR). For simplicity, these intermediate registers are not shown. 2. The first 4 bits (first hexadecimal digit) in the IR indicate that the AC is to be loaded from memory. The remaining 12 bits (three hexadecimal digits) specify the address, which is 940. 3. The next instruction (5941) is fetched from location 301 and the PC is incremented. 4. The old contents of the AC and the contents of location 941 are added and the result is stored in the AC. 5. The next instruction (2941) is fetched from location 302 and the PC is incremented. 6. The contents of the AC are stored in location 941. In this example, three instruction cycles, each consisting of a fetch stage and an execute stage, are needed to add the contents of location 940 to the contents of 941. With a more complex set of instructions, fewer instruction cycles would be needed. Most modern processors include instructions that contain more than one address. Thus the execution stage for a particular instruction may involve more than one reference to memory. Also, instead of memory references, an instruction may specify an I/O operation.
  • #21: Virtually all computers provide a mechanism by which other modules (I/O, memory) may interrupt the normal sequencing of the processor. Interrupts are provided primarily as a way to improve processor utilization. For example, most I/O devices are much slower than the processor. Suppose that the processor is transferring data to a printer using the instruction cycle scheme of Figure 1.2 . After each write operation, the processor must pause and remain idle until the printer catches up. The length of this pause may be on the order of many thousands or even millions of instruction cycles. Clearly, this is a very wasteful use of the processor.
  • #22: Table 1.1 lists the most common classes of interrupts.
  • #23: To give a specific example, consider a PC that operates at 1 GHz, which would allow roughly 10^9 instructions per second. A typical hard disk has a rotational speed of 7200 revolutions per minute for a half-track rotation time of 4 ms, which is 4 million times slower than the processor. Figure 1.5a illustrates this state of affairs. The user program performs a series of WRITE calls interleaved with processing. The solid vertical lines represent segments of code in a program. Code segments 1, 2, and 3 refer to sequences of instructions that do not involve I/O. The WRITE calls are to an I/O routine that is a system utility and that will perform the actual I/O operation. The I/O program consists of three sections: • A sequence of instructions, labeled 4 in the figure, to prepare for the actual I/O operation. This may include copying the data to be output into a special buffer and preparing the parameters for a device command. • The actual I/O command. Without the use of interrupts, once this command is issued, the program must wait for the I/O device to perform the requested function (or periodically check the status of, or poll, the I/O device). The program might wait by simply repeatedly performing a test operation to determine if the I/O operation is done. • A sequence of instructions, labeled 5 in the figure, to complete the operation. This may include setting a flag indicating the success or failure of the operation. The dashed line represents the path of execution followed by the processor; that is, this line shows the sequence in which instructions are executed. Thus, after the first WRITE instruction is encountered, the user program is interrupted and execution continues with the I/O program. After the I/O program execution is complete, execution resumes in the user program immediately following the WRITE instruction.
Because the I/O operation may take a relatively long time to complete, the I/O program is hung up waiting for the operation to complete; hence, the user program is stopped at the point of the WRITE call for some considerable period of time.
  • #24: With interrupts, the processor can be engaged in executing other instructions while an I/O operation is in progress. Consider the flow of control in Figure 1.5b. As before, the user program reaches a point at which it makes a system call in the form of a WRITE call. The I/O program that is invoked in this case consists only of the preparation code and the actual I/O command. After these few instructions have been executed, control returns to the user program. Meanwhile, the external device is busy accepting data from computer memory and printing it. This I/O operation is conducted concurrently with the execution of instructions in the user program. When the external device becomes ready to be serviced, that is, when it is ready to accept more data from the processor, the I/O module for that external device sends an interrupt request signal to the processor. The processor responds by suspending operation of the current program; branching off to a routine to service that particular I/O device, known as an interrupt handler; and resuming the original execution after the device is serviced. The points at which such interrupts occur are indicated by an asterisk in Figure 1.5b. Note that an interrupt can occur at any point in the main program, not just at one specific instruction.
  • #25: For the user program, an interrupt suspends the normal sequence of execution. When the interrupt processing is completed, execution resumes ( Figure 1.6 ). Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the OS are responsible for suspending the user program and then resuming it at the same point.
  • #26: To accommodate interrupts, an interrupt stage is added to the instruction cycle, as shown in Figure 1.7 (compare Figure 1.2). In the interrupt stage, the processor checks to see if any interrupts have occurred, indicated by the presence of an interrupt signal. If no interrupts are pending, the processor proceeds to the fetch stage and fetches the next instruction of the current program. If an interrupt is pending, the processor suspends execution of the current program and executes an interrupt-handler routine. The interrupt-handler routine is generally part of the OS. Typically, this routine determines the nature of the interrupt and performs whatever actions are needed. In the example we have been using, the handler determines which I/O module generated the interrupt and may branch to a program that will write more data out to that I/O module. When the interrupt-handler routine is completed, the processor can resume execution of the user program at the point of interruption. It is clear that there is some overhead involved in this process. Extra instructions must be executed (in the interrupt handler) to determine the nature of the interrupt and to decide on the appropriate action. Nevertheless, because of the relatively large amount of time that would be wasted by simply waiting on an I/O operation, the processor can be employed much more efficiently with the use of interrupts.
  • #27: To appreciate the gain in efficiency, consider Figure 1.8 , which is a timing diagram based on the flow of control in Figures 1.5a and 1.5b . Figures 1.5b and 1.8 assume that the time required for the I/O operation is relatively short: less than the time to complete the execution of instructions between write operations in the user program. The more typical case, especially for a slow device such as a printer, is that the I/O operation will take much more time than executing a sequence of user instructions. Figure 1.5c indicates this state of affairs. In this case, the user program reaches the second WRITE call before the I/O operation spawned by the first call is complete. The result is that the user program is hung up at that point.
  • #28: When the preceding I/O operation is completed, this new WRITE call may be processed, and a new I/O operation may be started. Figure 1.9 shows the timing for this situation with and without the use of interrupts. We can see that there is still a gain in efficiency because part of the time during which the I/O operation is underway overlaps with the execution of user instructions.
  • #29: An interrupt triggers a number of events, both in the processor hardware and in software. Figure 1.10 shows a typical sequence. When an I/O device completes an I/O operation, the following sequence of hardware events occurs: 1. The device issues an interrupt signal to the processor. 2. The processor finishes execution of the current instruction before responding to the interrupt, as indicated in Figure 1.7. 3. The processor tests for a pending interrupt request, determines that there is one, and sends an acknowledgment signal to the device that issued the interrupt. The acknowledgment allows the device to remove its interrupt signal. 4. The processor next needs to prepare to transfer control to the interrupt routine. To begin, it saves information needed to resume the current program at the point of interrupt. The minimum information required is the program status word (PSW) and the location of the next instruction to be executed, which is contained in the program counter (PC). These can be pushed onto a control stack (see Appendix 1B). 5. The processor then loads the program counter with the entry location of the interrupt-handling routine that will respond to this interrupt. Depending on the computer architecture and OS design, there may be a single program, one for each type of interrupt, or one for each device and each type of interrupt. If there is more than one interrupt-handling routine, the processor must determine which one to invoke. This information may have been included in the original interrupt signal, or the processor may have to issue a request to the device that issued the interrupt to get a response that contains the needed information.
  • #30: So far, we have discussed the occurrence of a single interrupt. Suppose, however, that one or more interrupts can occur while an interrupt is being processed. For example, a program may be receiving data from a communications line and printing results at the same time. The printer will generate an interrupt every time that it completes a print operation. The communication line controller will generate an interrupt every time a unit of data arrives. The unit could either be a single character or a block, depending on the nature of the communications discipline. In any case, it is possible for a communications interrupt to occur while a printer interrupt is being processed.
  • #31: The design constraints on a computer’s memory can be summed up by three questions: How much? How fast? How expensive? The question of how much is somewhat open ended. If the capacity is there, applications will likely be developed to use it. The question of how fast is, in a sense, easier to answer. To achieve greatest performance, the memory must be able to keep up with the processor. That is, as the processor is executing instructions, we would not want it to have to pause waiting for instructions or operands. The final question must also be considered. For a practical system, the cost of memory must be reasonable in relationship to other components.
  • #32: As might be expected, there is a trade-off among the three key characteristics of memory: namely, capacity, access time, and cost. A variety of technologies are used to implement memory systems, and across this spectrum of technologies, the following relationships hold: • Faster access time, greater cost per bit • Greater capacity, smaller cost per bit • Greater capacity, slower access speed The dilemma facing the designer is clear. The designer would like to use memory technologies that provide for large-capacity memory, both because the capacity is needed and because the cost per bit is low. However, to meet performance requirements, the designer needs to use expensive, relatively lower-capacity memories with fast access times.
  • #33: The way out of this dilemma is to not rely on a single memory component or technology, but to employ a memory hierarchy . A typical hierarchy is illustrated in Figure 1.14 . As one goes down the hierarchy, the following occur: a. Decreasing cost per bit b. Increasing capacity c. Increasing access time d. Decreasing frequency of access to the memory by the processor Thus, smaller, more expensive, faster memories are supplemented by larger, cheaper, slower memories. The key to the success of this organization is the decreasing frequency of access at lower levels. We will examine this concept in greater detail later in this chapter, when we discuss the cache, and when we discuss virtual memory later in this book. A brief explanation is provided at this point. Suppose that the processor has access to two levels of memory. Level 1 contains 1,000 bytes and has an access time of 0.1 μs; level 2 contains 100,000 bytes and has an access time of 1 μs. Assume that if a byte to be accessed is in level 1, then the processor accesses it directly. If it is in level 2, then the byte is first transferred to level 1 and then accessed by the processor. For simplicity, we ignore the time required for the processor to determine whether the byte is in level 1 or level 2.
  • #34: Suppose that the processor has access to two levels of memory. Level 1 contains 1,000 bytes and has an access time of 0.1 μs; level 2 contains 100,000 bytes and has an access time of 1 μs. Assume that if a byte to be accessed is in level 1, then the processor accesses it directly. If it is in level 2, then the byte is first transferred to level 1 and then accessed by the processor. For simplicity, we ignore the time required for the processor to determine whether the byte is in level 1 or level 2. Figure 1.15 shows the general shape of the curve that models this situation. The figure shows the average access time to a two-level memory as a function of the hit ratio H, where H is defined as the fraction of all memory accesses that are found in the faster memory (e.g., the cache), T1 is the access time to level 1, and T2 is the access time to level 2. As can be seen, for high percentages of level 1 access, the average total access time is much closer to that of level 1 than that of level 2. In our example, suppose 95% of the memory accesses are found in the cache (H = 0.95). Then the average time to access a byte can be expressed as (0.95)(0.1 μs) + (0.05)(0.1 μs + 1 μs) = 0.095 + 0.055 = 0.15 μs. The result is close to the access time of the faster memory. So the strategy of using two memory levels works in principle, but only if conditions (a) through (d) in the preceding list apply. By employing a variety of technologies, a spectrum of memory systems exists that satisfies conditions (a) through (c). Fortunately, condition (d) is also generally valid.
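The average-access-time model above is easy to verify in a few lines. A minimal sketch, using the values from the text (H = 0.95, T1 = 0.1 μs, T2 = 1 μs):

```python
# Two-level memory model: hits cost t1; a miss first moves the byte
# into level 1 (t1 + t2) and only then delivers it to the processor.
def avg_access_time(hit_ratio, t1, t2):
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# Values from the text: H = 0.95, T1 = 0.1 us, T2 = 1 us.
t = avg_access_time(0.95, 0.1, 1.0)   # 0.095 + 0.055 = 0.15 us
```

Sweeping `hit_ratio` from 0 to 1 reproduces the curve of Figure 1.15: the average slides from T1 + T2 down toward T1 as the hit ratio rises.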
  • #35: The basis for the validity of condition (d) is a principle known as locality of reference [DENN68]. During the course of execution of a program, memory references by the processor, for both instructions and data, tend to cluster. Programs typically contain a number of iterative loops and subroutines. Once a loop or subroutine is entered, there are repeated references to a small set of instructions. Similarly, operations on tables and arrays involve access to a clustered set of data bytes. Over a long period of time, the clusters in use change, but over a short period of time, the processor is primarily working with fixed clusters of memory references. Accordingly, it is possible to organize data across the hierarchy such that the percentage of accesses to each successively lower level is substantially less than that of the level above. Consider the two-level example already presented. Let level 2 memory contain all program instructions and data. The current clusters can be temporarily placed in level 1. From time to time, one of the clusters in level 1 will have to be swapped back to level 2 to make room for a new cluster coming in to level 1. On average, however, most references will be to instructions and data contained in level 1. This principle can be applied across more than two levels of memory. The fastest, smallest, and most expensive type of memory consists of the registers internal to the processor. Typically, a processor will contain a few dozen such registers, although some processors contain hundreds of registers. Skipping down two levels, main memory is the principal internal memory system of the computer. Each location in main memory has a unique address, and most machine instructions refer to one or more main memory addresses. Main memory is usually extended with a higher-speed, smaller cache. The cache is not usually visible to the programmer or, indeed, to the processor. 
It is a device for staging the movement of data between main memory and processor registers to improve performance.
  • #36: The three forms of memory just described are, typically, volatile and employ semiconductor technology. The use of three levels exploits the fact that semiconductor memory comes in a variety of types, which differ in speed and cost. Data are stored more permanently on external mass storage devices, of which the most common are hard disk and removable media, such as removable disk, tape, and optical storage. External, nonvolatile memory is also referred to as secondary memory or auxiliary memory . These are used to store program and data files, and are usually visible to the programmer only in terms of files and records, as opposed to individual bytes or words. A hard disk is also used to provide an extension to main memory known as virtual memory, which is discussed in Chapter 8 .
  • #37: Although cache memory is invisible to the OS, it interacts with other memory management hardware. Furthermore, many of the principles used in virtual memory schemes (discussed in Chapter 8 ) are also applied in cache memory. On all instruction cycles, the processor accesses memory at least once, to fetch the instruction, and often one or more additional times, to fetch operands and/ or store results. The rate at which the processor can execute instructions is clearly limited by the memory cycle time (the time it takes to read one word from or write one word to memory). This limitation has been a significant problem because of the persistent mismatch between processor and main memory speeds: Over the years, processor speed has consistently increased more rapidly than memory access speed. We are faced with a trade-off among speed, cost, and size. Ideally, main memory should be built with the same technology as that of the processor registers, giving memory cycle times comparable to processor cycle times. This has always been too expensive a strategy. The solution is to exploit the principle of locality by providing a small, fast memory between the processor and main memory, namely the cache.
  • #38: Cache memory is intended to provide memory access time approaching that of the fastest memories available and at the same time support a large memory size that has the price of less expensive types of semiconductor memories. The concept is illustrated in Figure 1.16a . There is a relatively large and slow main memory together with a smaller, faster cache memory. The cache contains a copy of a portion of main memory. When the processor attempts to read a byte or word of memory, a check is made to determine if the byte or word is in the cache. If so, the byte or word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of bytes, is read into the cache and then the byte or word is delivered to the processor. Because of the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that many of the near-future memory references will be to other bytes in the block.
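The read logic described above can be sketched as a tiny direct-mapped cache simulation. The class name, sizes, and direct-mapped placement policy are illustrative assumptions (the text does not fix a mapping scheme); the point is the check-then-fetch-block behavior:

```python
# Illustrative direct-mapped cache: on a miss, the whole K-word block is
# copied into a slot before the requested word is delivered.
class Cache:
    def __init__(self, slots=4, block_size=4):
        self.slots = [None] * slots          # each entry: (tag, block of words)
        self.block_size = block_size
        self.hits = self.misses = 0

    def read(self, memory, addr):
        block = addr // self.block_size      # which main-memory block
        slot = block % len(self.slots)       # direct-mapped slot choice
        tag = block // len(self.slots)       # identifies the block in the slot
        entry = self.slots[slot]
        if entry is not None and entry[0] == tag:
            self.hits += 1                   # word served straight from cache
        else:
            self.misses += 1                 # fetch the entire block
            base = block * self.block_size
            self.slots[slot] = (tag, memory[base:base + self.block_size])
        return self.slots[slot][1][addr % self.block_size]

mem = list(range(64))                        # toy main memory
c = Cache()
c.read(mem, 10)                              # miss: fills the block holding 8..11
c.read(mem, 11)                              # locality pays off: a hit
```

The second read hits precisely because the miss on address 10 brought in its neighbors, which is the locality-of-reference payoff the paragraph describes.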
  • #39: Cache and main memory illustration. Figure 1.16b depicts the use of multiple levels of cache. The L2 cache is slower and typically larger than the L1 cache, and the L3 cache is slower and typically larger than the L2 cache.
  • #40: Figure 1.17 depicts the structure of a cache/main memory system. Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. That is, there are M = 2^n/K blocks. Cache consists of C slots (also referred to as lines) of K words each, and the number of slots is considerably less than the number of main memory blocks (C << M). Some subset of the blocks of main memory resides in the slots of the cache. If a word in a block of memory that is not in the cache is read, that block is transferred to one of the slots of the cache. Because there are more blocks than slots, an individual slot cannot be uniquely and permanently dedicated to a particular block. Therefore, each slot includes a tag that identifies which particular block is currently being stored. The tag is usually some number of higher-order bits of the address and refers to all addresses that begin with that sequence of bits.
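The tag derivation can be made concrete by splitting an address into its fields. A sketch under assumed parameters (a 16-bit address, K = 4 words per block, C = 64 slots, direct mapping — none of these values come from the text):

```python
# Split an address into tag / slot / word fields, assuming power-of-two
# block size and slot count (hypothetical: K=4 words, C=64 slots).
def split_address(addr, words_per_block=4, cache_slots=64):
    word = addr % words_per_block        # offset within the block
    block = addr // words_per_block      # which main-memory block
    slot = block % cache_slots           # where that block may live
    tag = block // cache_slots           # high-order bits stored with the slot
    return tag, slot, word

tag, slot, word = split_address(0b1010_1100_0011_0110)
```

All K addresses inside one block share the same tag and slot, which is exactly why a single stored tag suffices to identify the whole block currently occupying a line.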
  • #41: When the processor is executing a program and encounters an instruction relating to I/O, it executes that instruction by issuing a command to the appropriate I/O module. Three techniques are possible for I/O operations: programmed I/O, interrupt-driven I/O, and direct memory access (DMA).
  • #42: In the case of programmed I/O , the I/O module performs the requested action and then sets the appropriate bits in the I/O status register but takes no further action to alert the processor. In particular, it does not interrupt the processor. Thus, after the I/O instruction is invoked, the processor must take some active role in determining when the I/O instruction is completed. For this purpose, the processor periodically checks the status of the I/O module until it finds that the operation is complete. With programmed I/O, the processor has to wait a long time for the I/O module of concern to be ready for either reception or transmission of more data. The processor, while waiting, must repeatedly interrogate the status of the I/O module. As a result, the performance level of the entire system is severely degraded.
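The busy-wait behavior that degrades performance can be sketched as follows; the `IOModule` class and its cycle counter are invented stand-ins for a real status register, purely to make the polling loop visible:

```python
# Programmed-I/O sketch: the processor repeatedly interrogates a simulated
# status register and can do no other useful work until it reads "ready".
import itertools

class IOModule:
    def __init__(self, cycles_needed):
        self.cycles = cycles_needed      # device cycles until completion
    def status(self):
        self.cycles -= 1                 # device makes progress each poll
        return "ready" if self.cycles <= 0 else "busy"

def programmed_read(module):
    for polls in itertools.count(1):     # busy-wait loop: pure overhead
        if module.status() == "ready":
            return polls                 # number of wasted status checks

wasted_polls = programmed_read(IOModule(5))
```

Every iteration of that loop is a processor cycle spent checking status rather than executing the user program, which is the inefficiency interrupt-driven I/O and DMA are designed to eliminate.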
  • #43: An alternative, known as interrupt-driven I/O , is for the processor to issue an I/O command to a module and then go on to do some other useful work. The I/O module will then interrupt the processor to request service when it is ready to exchange data with the processor. The processor then executes the data transfer, as before, and then resumes its former processing.
  • #44: Interrupt-driven I/O, though more efficient than simple programmed I/O, still requires the active intervention of the processor to transfer data between memory and an I/O module, and any data transfer must traverse a path through the processor. Thus, both of these forms of I/O suffer from two inherent drawbacks: 1. The I/O transfer rate is limited by the speed with which the processor can test and service a device. 2. The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer.
  • #45: When large volumes of data are to be moved, a more efficient technique is required: direct memory access (DMA) . The DMA function can be performed by a separate module on the system bus or it can be incorporated into an I/O module. In either case, the technique works as follows. When the processor wishes to read or write a block of data, it issues a command to the DMA module, by sending to the DMA module the following information: • Whether a read or write is requested • The address of the I/O device involved • The starting location in memory to read data from or write data to • The number of words to be read or written
  • #46: The processor then continues with other work. It has delegated this I/O operation to the DMA module, and that module will take care of it. The DMA module transfers the entire block of data, one word at a time, directly to or from memory without going through the processor. When the transfer is complete, the DMA module sends an interrupt signal to the processor. Thus, the processor is involved only at the beginning and end of the transfer. The DMA module needs to take control of the bus to transfer data to and from memory. Because of this competition for bus usage, there may be times when the processor needs the bus and must wait for the DMA module. Note that this is not an interrupt; the processor does not save a context and do something else. Rather, the processor pauses for one bus cycle (the time it takes to transfer one word across the bus). The overall effect is to cause the processor to execute more slowly during a DMA transfer when processor access to the bus is required. Nevertheless, for a multiple-word I/O transfer, DMA is far more efficient than interrupt-driven or programmed I/O.
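The four-field command block listed in #45 and the hands-off transfer described above can be sketched together. Field and function names here are invented for illustration; a real DMA controller would be programmed through device registers, not a Python object:

```python
# Sketch of the DMA command block (field names hypothetical) and the
# transfer the DMA module performs without processor involvement.
from dataclasses import dataclass

@dataclass
class DMACommand:
    is_read: bool        # whether a read or write is requested
    device_addr: int     # the address of the I/O device involved
    memory_start: int    # starting location in memory
    word_count: int      # number of words to be read or written

def dma_transfer(cmd, device_data, memory):
    for i in range(cmd.word_count):      # one stolen bus cycle per word
        memory[cmd.memory_start + i] = device_data[i]
    return "interrupt"                   # signal the processor only at the end

mem = [0] * 8
cmd = DMACommand(is_read=True, device_addr=1, memory_start=2, word_count=3)
result = dma_transfer(cmd, [7, 8, 9], mem)
```

The processor's only involvement is constructing `cmd` at the start and handling the completion interrupt at the end; the per-word loop runs entirely in the DMA module.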
  • #47: An SMP can be defined as a stand-alone computer system with the following characteristics: 1. There are two or more similar processors of comparable capability. 2. These processors share the same main memory and I/O facilities and are interconnected by a bus or other internal connection scheme, such that memory access time is approximately the same for each processor. 3. All processors share access to I/O devices, either through the same channels or through different channels that provide paths to the same device. 4. All processors can perform the same functions (hence the term symmetric ). 5. The system is controlled by an integrated operating system that provides interaction between processors and their programs at the job, task, file, and data element levels. Points 1 to 4 should be self-explanatory. Point 5 illustrates one of the contrasts with a loosely coupled multiprocessing system, such as a cluster. In the latter, the physical unit of interaction is usually a message or complete file. In an SMP, individual data elements can constitute the level of interaction, and there can be a high degree of cooperation between processes.
  • #48: An SMP organization has a number of potential advantages over a uniprocessor organization, including the following: • Performance: If the work to be done by a computer can be organized so that some portions of the work can be done in parallel, then a system with multiple processors will yield greater performance than one with a single processor of the same type. • Availability: In a symmetric multiprocessor, because all processors can perform the same functions, the failure of a single processor does not halt the machine. Instead, the system can continue to function at reduced performance. • Incremental growth: A user can enhance the performance of a system by adding an additional processor. • Scaling: Vendors can offer a range of products with different price and performance characteristics based on the number of processors configured in the system. It is important to note that these are potential, rather than guaranteed, benefits. The operating system must provide tools and functions to exploit the parallelism in an SMP system. An attractive feature of an SMP is that the existence of multiple processors is transparent to the user. The operating system takes care of scheduling of tasks on individual processors and of synchronization among processors.
  • #49: ORGANIZATION Figure 1.19 illustrates the general organization of an SMP. There are multiple processors, each of which contains its own control unit, arithmetic logic unit, and registers. Each processor has access to a shared main memory and the I/O devices through some form of interconnection mechanism; a shared bus is a common facility. The processors can communicate with each other through memory (messages and status information left in shared address spaces). It may also be possible for processors to exchange signals directly. The memory is often organized so that multiple simultaneous accesses to separate blocks of memory are possible. In modern computers, processors generally have at least one level of cache memory that is private to the processor. This use of cache introduces some new design considerations. Because each local cache contains an image of a portion of main memory, if a word is altered in one cache, it could conceivably invalidate a word in another cache. To prevent this, the other processors must be alerted that an update has taken place. This problem is known as the cache coherence problem and is typically addressed in hardware rather than by the OS.
  • #50: A multicore computer, also known as a chip multiprocessor, combines two or more processors (called cores) on a single piece of silicon (called a die). Typically, each core consists of all of the components of an independent processor, such as registers, ALU, pipeline hardware, and control unit, plus L1 instruction and data caches. In addition to the multiple cores, contemporary multicore chips also include L2 cache and, in some cases, L3 cache. The motivation for the development of multicore computers can be summed up as follows. For decades, microprocessor systems have experienced a steady, usually exponential, increase in performance. This is partly due to hardware trends, such as an increase in clock frequency and the ability to put cache memory closer to the processor because of the increasing miniaturization of microcomputer components. Performance has also been improved by the increased complexity of processor design to exploit parallelism in instruction execution and memory access. In brief, designers have come up against practical limits in the ability to achieve greater performance by means of more complex processors. Designers have found that the best way to exploit continuing advances in hardware is to put multiple processors and a substantial amount of cache memory on a single chip. A detailed discussion of the rationale for this trend is beyond our scope, but is summarized in Appendix C.
  • #51: An example of a multicore system is the Intel Core i7, which includes four x86 processors, each with a dedicated L2 cache, and with a shared L3 cache ( Figure 1.20 ). One mechanism Intel uses to make its caches more effective is prefetching, in which the hardware examines memory access patterns and attempts to fill the caches speculatively with data that’s likely to be requested soon.
  • #52: The Core i7 chip supports two forms of external communications to other chips. The DDR3 memory controller brings the controller for the DDR (double data rate) main memory onto the chip. The interface supports three channels that are 8 bytes wide, for a total bus width of 192 bits and an aggregate data rate of up to 32 GB/s. With the memory controller on the chip, the Front Side Bus is eliminated. The QuickPath Interconnect (QPI) is a point-to-point electrical interconnect specification that enables high-speed communications among connected processor chips. The QPI link operates at 6.4 GT/s (gigatransfers per second).
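The 32 GB/s figure can be sanity-checked from the channel geometry. The per-channel transfer rate of 1.333 gigatransfers/s is an assumption on my part (consistent with DDR3-1333 memory and with the stated aggregate), not a value given in the text:

```python
# Sanity check of the memory bandwidth figure: 3 channels x 8 bytes per
# transfer, at an assumed DDR3-1333 rate of 1.333e9 transfers/s per channel.
channels = 3
bytes_per_transfer = 8                      # 64-bit channel width
transfers_per_sec = 1.333e9                 # assumption: DDR3-1333
bw_bytes = channels * bytes_per_transfer * transfers_per_sec
bw_gb = bw_bytes / 1e9                      # ~32 GB/s, matching the text
```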
  • #53: Summary of Chapter 1.