Performance Modeling and Design of Computer Systems
Queueing Theory in Action
Mor Harchol-Balter
Computer systems design is full of conundrums:
• Given a choice between a single machine with speed s, or n machines each
with speed s/n, which should we choose?
• If both the arrival rate and service rate double, will the mean response time
stay the same?
• Should systems really aim to balance load, or is this a convenient myth?
• If a scheduling policy favors one set of jobs, does it necessarily hurt some
other jobs, or are these “conservation laws” being misinterpreted?
• Do greedy, shortest-delay, routing strategies make sense in a server farm, or is
what is good for the individual disastrous for the system as a whole?
• How do high job size variability and heavy-tailed workloads affect the choice
of a scheduling policy?
• How should one trade off energy and delay in designing a computer system?
• If 12 servers are needed to meet delay guarantees when the arrival rate is 9
jobs/sec, will we need 12,000 servers when the arrival rate is 9,000 jobs/sec?
Tackling the questions that systems designers care about, this book brings
queueing theory decisively back to computer science. The book is written with
computer scientists and engineers in mind and is full of examples from computer
systems, as well as manufacturing and operations research. Fun and readable,
the book is highly approachable, even for undergraduates, while still being
thoroughly rigorous and also covering a much wider span of topics than many
queueing books.
Readers benefit from a lively mix of motivation and intuition, with illustrations,
examples, and more than 300 exercises – all while acquiring the skills needed
to model, analyze, and design large-scale systems with good performance
and low cost. The exercises are an important feature, teaching research-level
counterintuitive lessons in the design of computer systems. The goal is to train
readers not only to customize existing analyses but also to invent their own.
Mor Harchol-Balter is an Associate Professor in the Computer Science
Department at Carnegie Mellon University. She is a leader in the ACM Sigmetrics
Conference on Measurement and Modeling of Computer Systems, having served
as technical program committee chair in 2007 and conference chair in 2013.
Cover photo © Ferenc Cegledi / Shutterstock.com
Cover design by James F. Brisson
Performance Modeling and Design of Computer Systems
Performance Modeling and
Design of Computer Systems
Queueing Theory in Action
Mor Harchol-Balter
Carnegie Mellon University, Pennsylvania
cambridge university press
Cambridge, New York, Melbourne, Madrid, Cape Town,
Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press
32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9781107027503
© Mor Harchol-Balter 2013
This publication is in copyright. Subject to statutory exception
and to the provisions of relevant collective licensing agreements,
no reproduction of any part may take place without the written
permission of Cambridge University Press.
First published 2013
Printed in the United States of America
A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication Data
Harchol-Balter, Mor, 1966–
Performance modeling and design of computer systems : queueing theory in
action / Mor Harchol-Balter.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-02750-3
1. Transaction systems (Computer systems) – Mathematical models. 2. Computer
systems – Design and construction – Mathematics. 3. Queueing theory.
4. Queueing networks (Data transmission) I. Title.
QA76.545.H37 2013
519.8'2–dc23 2012019844
ISBN 978-1-107-02750-3 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party Internet websites referred to in this publication and
does not guarantee that any content on such websites is, or will remain, accurate or
appropriate.
To my loving husband Andrew, my awesome son Danny,
and my parents, Irit and Micha
I have always been interested in finding better designs for computer systems, designs
that improve performance without the purchase of additional resources. When I look
back at the problems that I have solved and I look ahead to the problems I hope to
solve, I realize that the problem formulations keep getting simpler and simpler, and my
footing less secure. Every wisdom that I once believed, I have now come to question:
If a scheduling policy helps one set of jobs, does it necessarily hurt some other jobs,
or are these “conservation laws” being misinterpreted? Do greedy routing strategies
make sense in server farms, or is what is good for the individual actually disastrous for
the system as a whole? When comparing a single fast machine with n slow machines,
each of 1/nth the speed, the single fast machine is typically much more expensive – but
does that mean that it is necessarily better? Should distributed systems really aim to
balance load, or is this a convenient myth? Cycle stealing, where machines can help
each other when they are idle, sounds like a great idea, but can we quantify the actual
benefit? How much is the performance of scheduling policies affected by variability
in the arrival rate and service rate and by fluctuations in the load, and what can we do
to combat variability? Inherent in these questions is the impact of real user behaviors
and real-world workloads with heavy-tailed, highly variable service demands, as
well as correlated arrival processes. Also intertwined in my work are the tensions
between theoretical analysis and the realities of implementation, each motivating the
other. In my search to discover new research techniques that allow me to answer
these and other questions, I find that I am converging toward the fundamental core
that defines all these problems, and that makes the counterintuitive more believable.
Contents
Preface xvii
Acknowledgments xxiii
I Introduction to Queueing
1 Motivating Examples of the Power of Analytical Modeling 3
1.1 What Is Queueing Theory? 3
1.2 Examples of the Power of Queueing Theory 5
2 Queueing Theory Terminology 13
2.1 Where We Are Heading 13
2.2 The Single-Server Network 13
2.3 Classification of Queueing Networks 16
2.4 Open Networks 16
2.5 More Metrics: Throughput and Utilization 17
2.6 Closed Networks 20
2.6.1 Interactive (Terminal-Driven) Systems 21
2.6.2 Batch Systems 22
2.6.3 Throughput in a Closed System 23
2.7 Differences between Closed and Open Networks 24
2.7.1 A Question on Modeling 25
2.8 Related Readings 25
2.9 Exercises 26
II Necessary Probability Background
3 Probability Review 31
3.1 Sample Space and Events 31
3.2 Probability Defined on Events 32
3.3 Conditional Probabilities on Events 33
3.4 Independent Events and Conditionally Independent Events 34
3.5 Law of Total Probability 35
3.6 Bayes Law 36
3.7 Discrete versus Continuous Random Variables 37
3.8 Probabilities and Densities 38
3.8.1 Discrete: Probability Mass Function 38
3.8.2 Continuous: Probability Density Function 41
3.9 Expectation and Variance 44
3.10 Joint Probabilities and Independence 47
3.11 Conditional Probabilities and Expectations 49
3.12 Probabilities and Expectations via Conditioning 53
3.13 Linearity of Expectation 54
3.14 Normal Distribution 57
3.14.1 Linear Transformation Property 58
3.14.2 Central Limit Theorem 61
3.15 Sum of a Random Number of Random Variables 62
3.16 Exercises 64
4 Generating Random Variables for Simulation 70
4.1 Inverse-Transform Method 70
4.1.1 The Continuous Case 70
4.1.2 The Discrete Case 72
4.2 Accept-Reject Method 72
4.2.1 Discrete Case 73
4.2.2 Continuous Case 75
4.2.3 Some Harder Problems 77
4.3 Readings 78
4.4 Exercises 78
5 Sample Paths, Convergence, and Averages 79
5.1 Convergence 79
5.2 Strong and Weak Laws of Large Numbers 83
5.3 Time Average versus Ensemble Average 84
5.3.1 Motivation 85
5.3.2 Definition 86
5.3.3 Interpretation 86
5.3.4 Equivalence 88
5.3.5 Simulation 90
5.3.6 Average Time in System 90
5.4 Related Readings 91
5.5 Exercise 91
III The Predictive Power of Simple Operational Laws: “What-If”
Questions and Answers
6 Little’s Law and Other Operational Laws 95
6.1 Little’s Law for Open Systems 95
6.2 Intuitions 96
6.3 Little’s Law for Closed Systems 96
6.4 Proof of Little’s Law for Open Systems 97
6.4.1 Statement via Time Averages 97
6.4.2 Proof 98
6.4.3 Corollaries 100
6.5 Proof of Little’s Law for Closed Systems 101
6.5.1 Statement via Time Averages 101
6.5.2 Proof 102
6.6 Generalized Little’s Law 102
6.7 Examples Applying Little’s Law 103
6.8 More Operational Laws: The Forced Flow Law 106
6.9 Combining Operational Laws 107
6.10 Device Demands 110
6.11 Readings and Further Topics Related to Little’s Law 111
6.12 Exercises 111
7 Modification Analysis: “What-If” for Closed Systems 114
7.1 Review 114
7.2 Asymptotic Bounds for Closed Systems 115
7.3 Modification Analysis for Closed Systems 118
7.4 More Modification Analysis Examples 119
7.5 Comparison of Closed and Open Networks 122
7.6 Readings 122
7.7 Exercises 122
IV From Markov Chains to Simple Queues
8 Discrete-Time Markov Chains 129
8.1 Discrete-Time versus Continuous-Time Markov Chains 130
8.2 Definition of a DTMC 130
8.3 Examples of Finite-State DTMCs 131
8.3.1 Repair Facility Problem 131
8.3.2 Umbrella Problem 132
8.3.3 Program Analysis Problem 132
8.4 Powers of P: n-Step Transition Probabilities 133
8.5 Stationary Equations 135
8.6 The Stationary Distribution Equals the Limiting Distribution 136
8.7 Examples of Solving Stationary Equations 138
8.7.1 Repair Facility Problem with Cost 138
8.7.2 Umbrella Problem 139
8.8 Infinite-State DTMCs 139
8.9 Infinite-State Stationarity Result 140
8.10 Solving Stationary Equations in Infinite-State DTMCs 142
8.11 Exercises 145
9 Ergodicity Theory 148
9.1 Ergodicity Questions 148
9.2 Finite-State DTMCs 149
9.2.1 Existence of the Limiting Distribution 149
9.2.2 Mean Time between Visits to a State 153
9.2.3 Time Averages 155
9.3 Infinite-State Markov Chains 155
9.3.1 Recurrent versus Transient 156
9.3.2 Infinite Random Walk Example 160
9.3.3 Positive Recurrent versus Null Recurrent 162
9.4 Ergodic Theorem of Markov Chains 164
9.5 Time Averages 166
9.6 Limiting Probabilities Interpreted as Rates 168
9.7 Time-Reversibility Theorem 170
9.8 When Chains Are Periodic or Not Irreducible 171
9.8.1 Periodic Chains 171
9.8.2 Chains that Are Not Irreducible 177
9.9 Conclusion 177
9.10 Proof of Ergodic Theorem of Markov Chains∗ 178
9.11 Exercises 183
10 Real-World Examples: Google, Aloha, and Harder Chains∗ 190
10.1 Google’s PageRank Algorithm 190
10.1.1 Google’s DTMC Algorithm 190
10.1.2 Problems with Real Web Graphs 192
10.1.3 Google’s Solution to Dead Ends and Spider Traps 194
10.1.4 Evaluation of the PageRank Algorithm 195
10.1.5 Practical Implementation Considerations 195
10.2 Aloha Protocol Analysis 195
10.2.1 The Slotted Aloha Protocol 196
10.2.2 The Aloha Markov Chain 196
10.2.3 Properties of the Aloha Markov Chain 198
10.2.4 Improving the Aloha Protocol 199
10.3 Generating Functions for Harder Markov Chains 200
10.3.1 The z-Transform 201
10.3.2 Solving the Chain 201
10.4 Readings and Summary 203
10.5 Exercises 204
11 Exponential Distribution and the Poisson Process 206
11.1 Definition of the Exponential Distribution 206
11.2 Memoryless Property of the Exponential 207
11.3 Relating Exponential to Geometric via δ-Steps 209
11.4 More Properties of the Exponential 211
11.5 The Celebrated Poisson Process 213
11.6 Merging Independent Poisson Processes 218
11.7 Poisson Splitting 218
11.8 Uniformity 221
11.9 Exercises 222
12 Transition to Continuous-Time Markov Chains 225
12.1 Defining CTMCs 225
12.2 Solving CTMCs 229
12.3 Generalization and Interpretation 232
12.3.1 Interpreting the Balance Equations for the CTMC 234
12.3.2 Summary Theorem for CTMCs 234
12.4 Exercises 234
13 M/M/1 and PASTA 236
13.1 The M/M/1 Queue 236
13.2 Examples Using an M/M/1 Queue 239
13.3 PASTA 242
13.4 Further Reading 245
13.5 Exercises 245
V Server Farms and Networks: Multi-server, Multi-queue Systems
14 Server Farms: M/M/k and M/M/k/k 253
14.1 Time-Reversibility for CTMCs 253
14.2 M/M/k/k Loss System 255
14.3 M/M/k 258
14.4 Comparison of Three Server Organizations 263
14.5 Readings 264
14.6 Exercises 264
15 Capacity Provisioning for Server Farms 269
15.1 What Does Load Really Mean in an M/M/k? 269
15.2 The M/M/∞ 271
15.2.1 Analysis of the M/M/∞ 271
15.2.2 A First Cut at a Capacity Provisioning Rule for the M/M/k 272
15.3 Square-Root Staffing 274
15.4 Readings 276
15.5 Exercises 276
16 Time-Reversibility and Burke’s Theorem 282
16.1 More Examples of Finite-State CTMCs 282
16.1.1 Networks with Finite Buffer Space 282
16.1.2 Batch System with M/M/2 I/O 284
16.2 The Reverse Chain 285
16.3 Burke’s Theorem 288
16.4 An Alternative (Partial) Proof of Burke’s Theorem 290
16.5 Application: Tandem Servers 291
16.6 General Acyclic Networks with Probabilistic Routing 293
16.7 Readings 294
16.8 Exercises 294
17 Networks of Queues and Jackson Product Form 297
17.1 Jackson Network Definition 297
17.2 The Arrival Process into Each Server 298
17.3 Solving the Jackson Network 300
17.4 The Local Balance Approach 301
17.5 Readings 306
17.6 Exercises 306
18 Classed Network of Queues 311
18.1 Overview 311
18.2 Motivation for Classed Networks 311
18.3 Notation and Modeling for Classed Jackson Networks 314
18.4 A Single-Server Classed Network 315
18.5 Product Form Theorems 317
18.6 Examples Using Classed Networks 322
18.6.1 Connection-Oriented ATM Network Example 322
18.6.2 Distribution of Job Classes Example 325
18.6.3 CPU-Bound and I/O-Bound Jobs Example 326
18.7 Readings 329
18.8 Exercises 329
19 Closed Networks of Queues 331
19.1 Motivation 331
19.2 Product-Form Solution 333
19.2.1 Local Balance Equations for Closed Networks 333
19.2.2 Example of Deriving Limiting Probabilities 335
19.3 Mean Value Analysis (MVA) 337
19.3.1 The Arrival Theorem 338
19.3.2 Iterative Derivation of Mean Response Time 340
19.3.3 An MVA Example 341
19.4 Readings 343
19.5 Exercises 343
VI Real-World Workloads: High Variability and Heavy Tails
20 Tales of Tails: A Case Study of Real-World Workloads 349
20.1 Grad School Tales . . . Process Migration 349
20.2 UNIX Process Lifetime Measurements 350
20.3 Properties of the Pareto Distribution 352
20.4 The Bounded Pareto Distribution 353
20.5 Heavy Tails 354
20.6 The Benefits of Active Process Migration 354
20.7 Pareto Distributions Are Everywhere 355
20.8 Exercises 357
21 Phase-Type Distributions and Matrix-Analytic Methods 359
21.1 Representing General Distributions by Exponentials 359
21.2 Markov Chain Modeling of PH Workloads 364
21.3 The Matrix-Analytic Method 366
21.4 Analysis of Time-Varying Load 367
21.4.1 High-Level Ideas 367
21.4.2 The Generator Matrix, Q 368
21.4.3 Solving for R 370
21.4.4 Finding π0 371
21.4.5 Performance Metrics 372
21.5 More Complex Chains 372
21.6 Readings and Further Remarks 376
21.7 Exercises 376
22 Networks with Time-Sharing (PS) Servers (BCMP) 380
22.1 Review of Product-Form Networks 380
22.2 BCMP Result 380
22.2.1 Networks with FCFS Servers 381
22.2.2 Networks with PS Servers 382
22.3 M/M/1/PS 384
22.4 M/Cox/1/PS 385
22.5 Tandem Network of M/G/1/PS Servers 391
22.6 Network of PS Servers with Probabilistic Routing 393
22.7 Readings 394
22.8 Exercises 394
23 The M/G/1 Queue and the Inspection Paradox 395
23.1 The Inspection Paradox 395
23.2 The M/G/1 Queue and Its Analysis 396
23.3 Renewal-Reward Theory 399
23.4 Applying Renewal-Reward to Get Expected Excess 400
23.5 Back to the Inspection Paradox 402
23.6 Back to the M/G/1 Queue 403
23.7 Exercises 405
24 Task Assignment Policies for Server Farms 408
24.1 Task Assignment for FCFS Server Farms 410
24.2 Task Assignment for PS Server Farms 419
24.3 Optimal Server Farm Design 424
24.4 Readings and Further Follow-Up 428
24.5 Exercises 430
25 Transform Analysis 433
25.1 Definitions of Transforms and Some Examples 433
25.2 Getting Moments from Transforms: Peeling the Onion 436
25.3 Linearity of Transforms 439
25.4 Conditioning 441
25.5 Distribution of Response Time in an M/M/1 443
25.6 Combining Laplace and z-Transforms 444
25.7 More Results on Transforms 445
25.8 Readings 446
25.9 Exercises 446
26 M/G/1 Transform Analysis 450
26.1 The z-Transform of the Number in System 450
26.2 The Laplace Transform of Time in System 454
26.3 Readings 456
26.4 Exercises 456
27 Power Optimization Application 457
27.1 The Power Optimization Problem 457
27.2 Busy Period Analysis of M/G/1 459
27.3 M/G/1 with Setup Cost 462
27.4 Comparing ON/IDLE versus ON/OFF 465
27.5 Readings 467
27.6 Exercises 467
VII Smart Scheduling in the M/G/1
28 Performance Metrics 473
28.1 Traditional Metrics 473
28.2 Commonly Used Metrics for Single Queues 474
28.3 Today’s Trendy Metrics 474
28.4 Starvation/Fairness Metrics 475
28.5 Deriving Performance Metrics 476
28.6 Readings 477
29 Scheduling: Non-Preemptive, Non-Size-Based Policies 478
29.1 FCFS, LCFS, and RANDOM 478
29.2 Readings 481
29.3 Exercises 481
30 Scheduling: Preemptive, Non-Size-Based Policies 482
30.1 Processor-Sharing (PS) 482
30.1.1 Motivation behind PS 482
30.1.2 Ages of Jobs in the M/G/1/PS System 483
30.1.3 Response Time as a Function of Job Size 484
30.1.4 Intuition for PS Results 487
30.1.5 Implications of PS Results for Understanding FCFS 487
30.2 Preemptive-LCFS 488
30.3 FB Scheduling 490
30.4 Readings 495
30.5 Exercises 496
31 Scheduling: Non-Preemptive, Size-Based Policies 499
31.1 Priority Queueing 499
31.2 Non-Preemptive Priority 501
31.3 Shortest-Job-First (SJF) 504
31.4 The Problem with Non-Preemptive Policies 506
31.5 Exercises 507
32 Scheduling: Preemptive, Size-Based Policies 508
32.1 Motivation 508
32.2 Preemptive Priority Queueing 508
32.3 Preemptive-Shortest-Job-First (PSJF) 512
32.4 Transform Analysis of PSJF 514
32.5 Exercises 516
33 Scheduling: SRPT and Fairness 518
33.1 Shortest-Remaining-Processing-Time (SRPT) 518
33.2 Precise Derivation of SRPT Waiting Time∗ 521
33.3 Comparisons with Other Policies 523
33.3.1 Comparison with PSJF 523
33.3.2 SRPT versus FB 523
33.3.3 Comparison of All Scheduling Policies 524
33.4 Fairness of SRPT 525
33.5 Readings 529
Bibliography 531
Index 541
Preface
The ad hoc World of Computer System Design
The design of computer systems is often viewed very much as an art rather than a
science. Decisions about which scheduling policy to use, how many servers to run,
what speed to operate each server at, and the like are often based on intuitions rather
than mathematically derived formulas. Specific policies built into kernels are often
riddled with secret “voodoo constants,”1
which have no explanation but seem to “work
well” under some benchmarked workloads. Computer systems students are often told
to first build the system and then make changes to the policies to improve system
performance, rather than first creating a formal model and design of the system on
paper to ensure the system meets performance goals.
Even when trying to evaluate the performance of an existing computer system, students
are encouraged to simulate the system and spend many days running their simulation
under different workloads waiting to see what happens. Given that the search space of
possible workloads and input parameters is often huge, vast numbers of simulations
are needed to properly cover the space. Despite this fact, mathematical models of the
system are rarely created, and we rarely characterize workloads stochastically. There is
no formal analysis of the parameter space under which the computer system is likely to
perform well versus that under which it is likely to perform poorly. It is no wonder that
computer systems students are left feeling that the whole process of system evaluation
and design is very ad hoc. As an example, consider the trial-and-error approach to
updating resource scheduling in the many versions of the Linux kernel.
Analytical Modeling for Computer Systems
But it does not have to be this way! These same systems designers could mathematically
model the system, stochastically characterize the workloads and performance goals,
and then analytically derive the performance of the system as a function of workload
and input parameters. The fields of analytical modeling and stochastic processes have
existed for close to a century, and they can be used to save systems designers huge
numbers of hours in trial and error while improving performance. Analytical modeling
can also be used in conjunction with simulation to help guide the simulation, reducing
the number of cases that need to be explored.
1 The term “voodoo constants” was coined by Prof. John Ousterhout during his lectures at the University of
California, Berkeley.
Unfortunately, of the hundreds of books written on stochastic processes, almost none
deal with computer systems. The examples in those books and the material covered are
oriented toward operations research areas such as manufacturing systems, or human
operators answering calls in a call center, or some assembly-line system with different
priority jobs.
In many ways the analysis used in designing manufacturing systems is not all that
different from computer systems. There are many parallels between a human operator
and a computer server: There are faster human operators and slower ones (just as
computer servers); the human servers sometimes get sick (just as computer servers
sometimes break down); when not needed, human operators can be sent home to save
money (just as computer servers can be turned off to save power); there is a startup
overhead to bringing back a human operator (just as there is a warmup cost to turning
on a computer server); and the list goes on.
However, there are also many differences between manufacturing systems and com-
puter systems. To start, computer systems workloads have been shown to have ex-
tremely high variability in job sizes (service requirements), with squared coefficients
of variation upward of 100. This is very different from the low-variability service times
characteristic of job sizes in manufacturing workloads. This difference in variability
can result in performance differences of orders of magnitude. Second, computer work-
loads are typically preemptible, and time-sharing (Processor-Sharing) of the CPU is
extremely common. By contrast, most manufacturing workloads are non-preemptive
(first-come-first-serve service order is the most common). Thus most books on stochas-
tic processes and queueing omit chapters on Processor-Sharing or more advanced pre-
emptive policies like Shortest-Remaining-Processing-Time, which are very much at
the heart of computer systems. Processor-Sharing is particularly relevant when analyz-
ing server farms, which, in the case of computer systems, are typically composed of
Processor-Sharing servers, not First-Come-First-Served ones. It is also relevant in any
computing application involving bandwidth being shared between users, which typi-
cally happens in a processor-sharing style, not first-come-first-serve order. Performance
metrics may also be different for computer systems as compared with manufacturing
systems (e.g., power usage, an important metric for computer systems, is not mentioned
in stochastic processes books). Closed-loop architectures, in which new jobs are not
created until existing jobs complete, and where the performance goal is to maximize
throughput, are very common in computer systems, but are often left out of queueing
books. Finally, the particular types of interactions that occur in disks, networking pro-
tocols, databases, memory controllers, and other computer systems are very different
from what has been analyzed in traditional queueing books.
The Goal of This Book
Many times I have walked into a fellow computer scientist’s office and was pleased to
find a queueing book on his shelf. Unfortunately, when questioned, my colleague was
quick to answer that he never uses the book because “The world doesn’t look like an
M/M/1 queue, and I can’t understand anything past that chapter.” The problem is that
the queueing theory books are not “friendly” to computer scientists. The applications
are not computer-oriented, and the assumptions used are often unrealistic for computer
systems. Furthermore, these books are abstruse and often impenetrable by anyone who
has not studied graduate-level mathematics. In some sense this is hard to avoid: If one
wants to do more than provide readers with formulas to “plug into,” then one has to
teach them to derive their own formulas, and this requires learning a good deal of math.
Fortunately, as one of my favorite authors, Sheldon Ross, has shown, it is possible to
teach a lot of stochastic analysis in a fun and simple way that does not require first
taking classes in measure theory and real analysis.
My motive in writing this book is to improve the design of computer systems by intro-
ducing computer scientists to the powerful world of queueing-theoretic modeling and
analysis. Personally, I have found queueing-theoretic analysis to be extremely valuable
in much of my research including: designing routing protocols for networks, designing
better scheduling algorithms for web servers and database management systems, disk
scheduling, memory-bank allocation, supercomputing resource scheduling, and power
management and capacity provisioning in data centers. Content-wise, I have two goals
for the book. First, I want to provide enough applications from computer systems to
make the book relevant and interesting to computer scientists. Toward this end, almost
half the chapters of the book are “application” chapters. Second, I want to make the
book mathematically rich enough to give readers the ability to actually develop new
queueing analysis, not just apply existing analysis. As computer systems and their
workloads continue to evolve and become more complex, it is unrealistic to assume
that they can be modeled with known queueing frameworks and analyses. As a designer
of computer systems myself, I am constantly finding that I have to invent new queueing
concepts to model aspects of computer systems.
How This Book Came to Be
In 1998, as a postdoc at MIT, I developed and taught a new computer science class,
which I called “Performance Analysis and Design of Computer Systems.” The class
had the following description:
In designing computer systems one is usually constrained by certain performance
goals (e.g., low response time or high throughput or low energy). On the other hand,
one often has many choices: One fast disk, or two slow ones? What speed CPU will
suffice? Should we invest our money in more buffer space or a faster processor?
How should jobs be scheduled by the processor? Does it pay to migrate active jobs?
Which routing policy will work best? Should one balance load among servers? How
can we best combat high-variability workloads? Often answers to these questions are
counterintuitive. Ideally, one would like to have answers to these questions before
investing the time and money to build a system. This class will introduce students
to analytic stochastic modeling, which allows system designers to answer questions
such as those above.
Since then, I have further developed the class via 10 more iterations taught within
the School of Computer Science at Carnegie Mellon, where I taught versions of the
class to both PhD students and advanced undergraduates in the areas of computer
science, engineering, mathematics, and operations research. In 2002, the Operations
Management department within the Tepper School of Business at Carnegie Mellon
made the class a qualifier requirement for all operations management students.
As other faculty, including my own former PhD students, adopted my lecture notes in
teaching their own classes, I was frequently asked to turn the notes into a book. This is
“version 1” of that book.
Outline of the Book
This book is written in a question/answer style, which mimics the Socratic style that
I use in teaching. I believe that a class “lecture” should ideally be a long sequence
of bite-sized questions, which students can easily provide answers to and which lead
students to the right intuitions. In reading this book, it is extremely important to try
to answer each question without looking at the answer that follows the question. The
questions are written to remind the reader to “think” rather than just “read,” and to
remind the teacher to ask questions rather than just state facts.
There are exercises at the end of each chapter. The exercises are an integral part of the
book and should not be skipped. Many exercises are used to illustrate the application
of the theory to problems in computer systems design, typically with the purpose of
illuminating a key insight. All exercises are related to the material covered in the
chapter, with early exercises being straightforward applications of the material and
later exercises exploring extensions of the material involving greater difficulty.
The book is divided into seven parts, which mostly build on each other.
Part I introduces queueing theory and provides motivating examples from computer
systems design that can be answered using basic queueing analysis. Basic queueing
terminology is introduced including closed and open queueing models and performance
metrics.
Part II is a probability refresher. To make this book self-contained, we have included
in these chapters all the probability that will be needed throughout the rest of the book.
This includes a summary of common discrete and continuous random variables, their
moments, and conditional expectations and probabilities. Also included is some mate-
rial on generating random variables for simulation. Finally we end with a discussion of
sample paths, convergence of sequences of random variables, and time averages versus
ensemble averages.
Part III is about operational laws, or “back of the envelope” analysis. These are
very simple laws that hold for all well-behaved queueing systems. In particular, they
do not require that any assumptions be made about the arrival process or workload
(like Poisson arrivals or Exponential service times). These laws allow us to quickly
reason at a high level (averages only) about system behavior and make design decisions
regarding what modifications will have the biggest performance impact. Applications
to high-level computer system design are provided throughout.
Part IV is about Markov chains and their application toward stochastic analysis of
computer systems. Markov chains allow a much more detailed analysis of systems
by representing the full space of possible states that the system can be in. Whereas
the operational laws in Part III often allow us to answer questions about the overall
mean number of jobs in a system, Markov chains allow us to derive the probability
of exactly i jobs being queued at server j of a multi-server system. Part IV includes
both discrete-time and continuous-time Markov chains. Applications include Google’s
PageRank algorithm, the Aloha (Ethernet) networking protocol, and an analysis of
dropping probabilities in finite-buffer routers.
Part V develops the Markov chain theory introduced in Part IV to allow the analysis of
more complex networks, including server farms. We analyze networks of queues with
complex routing rules, where jobs can be associated with a “class” that determines
their route through the network (these are known as BCMP networks). Part V also
derives theorems on capacity provisioning of server farms, such as the “square-root
staffing rule,” which determines the minimum number of servers needed to provide
certain delay guarantees.
The fact that Parts IV and V are based on Markov chains necessitates that certain
“Markovian” (memoryless) assumptions are made in the analysis. In particular, it is
assumed that the service requirements (sizes) of jobs follow an Exponential distribu-
tion and that the times between job arrivals are also Exponentially distributed. Many
applications are reasonably well modeled via these Exponential assumptions, allowing
us to use Markov analysis to get good insights into system performance. However,
in some cases, it is important to capture the high-variability job size distributions or
correlations present in the empirical workloads.
Part VI introduces techniques that allow us to replace these Exponential distributions
with high-variability distributions. Phase-type distributions are introduced, which allow
us to model virtually any general distribution by a mixture of Exponentials, leverag-
ing our understanding of Exponential distributions and Markov chains from Parts IV
and V. Matrix-analytic techniques are then developed to analyze systems with phase-
type workloads in both the arrival process and service process. The M/G/1 queue
is introduced, and notions such as the Inspection Paradox are discussed. Real-world
workloads are described including heavy-tailed distributions. Transform techniques
are also introduced that facilitate working with general distributions. Finally, even
the service order at the queues is generalized from simple first-come-first-served ser-
vice order to time-sharing (Processor-Sharing) service order, which is more common
in computer systems. Applications abound: Resource allocation (task assignment) in
server farms with high-variability job sizes is studied extensively, both for server farms
with non-preemptive workloads and for web server farms with time-sharing servers.
Power management policies for single servers and for data centers are also studied.
Part VII, the final part of the book, is devoted to scheduling. Smart scheduling is
extremely important in computer systems, because it can dramatically improve system
performance without requiring the purchase of any new hardware. Scheduling is at the
heart of operating systems, bandwidth allocation in networks, disks, databases, memory
hierarchies, and the like. Much of the research being done in the computer systems
area today involves the design and adoption of new scheduling policies. Scheduling can
be counterintuitive, however, and the analysis of even basic scheduling policies is far
from simple. Scheduling policies are typically evaluated via simulation. In introducing
the reader to analytical techniques for evaluating scheduling policies, our hope is that
more such policies might be evaluated via analysis.
We expect readers to mostly work through the chapters in order, with the following
exceptions: First, any chapter or section marked with a star (*) can be skipped without
disturbing the flow. Second, the chapter on transforms, Chapter 25, is purposely moved
to the end, so that most of the book does not depend on knowing transform analysis.
However, because learning transform analysis takes some time, we recommend that
any teacher who plans to cover transforms introduce the topic a little at a time, starting
early in the course. To facilitate this, we have included a large number of exercises at
the end of Chapter 25 that do not require material in later chapters and can be assigned
earlier in the course to give students practice manipulating transforms.
Finally, we urge readers to please check the following websites for new errors/software:
http://www.cs.cmu.edu/~harchol/PerformanceModeling/errata.html
http://www.cs.cmu.edu/~harchol/PerformanceModeling/software.html
Please send any additional errors to harchol@cs.cmu.edu.
Acknowledgments
Writing a book, I quickly realized, is very different from writing a research paper, even
a very long one. Book writing actually bears much more similarity to teaching a class.
That is why I would like to start by thanking the three people who most influenced my
teaching. Manuel Blum, my PhD advisor, taught me the art of creating a lecture out
of a series of bite-sized questions. Dick Karp taught me that you can cover an almost
infinite amount of material in just one lecture if you spend enough time in advance
simplifying that material into its cleanest form. Sheldon Ross inspired me by the depth
of his knowledge in stochastic processes (a knowledge so deep that he never once
looked at his notes while teaching) and by the sheer clarity and elegance of both his
lectures and his many beautifully written books.
I would also like to thank Carnegie Mellon University, and the School of Computer
Science at Carnegie Mellon, which has at its core the theme of interdisciplinary re-
search, particularly the mixing of theoretical and applied research. CMU has been the
perfect environment for me to develop the analytical techniques in this book, all in
the context of solving hard applied problems in computer systems design. CMU has
also provided me with a never-ending stream of gifted students, who have inspired
many of the exercises and discussions in this book. Much of this book came from the
research of my own PhD students, including Sherwin Doroudi, Anshul Gandhi, Varun
Gupta, Yoongu Kim, David McWherter, Takayuki Osogami, Bianca Schroeder, Adam
Wierman, and Timothy Zhu. In addition, Mark Crovella, Mike Kozuch, and particu-
larly Alan Scheller-Wolf, all longtime collaborators of mine, have inspired much of
my thinking via their uncanny intuitions and insights.
A great many people have proofread parts of this book or tested out the book and
provided me with useful feedback. These include Sem Borst, Doug Down, Erhun
Ozkan, Katsunobu Sasanuma, Alan Scheller-Wolf, Thrasyvoulos Spyropoulos, Jarod
Wang, and Zachary Young. I would also like to thank my editors, Diana Gillooly and
Lauren Cowles from Cambridge University Press, who were very quick to answer my
endless questions, and who greatly improved the presentation of this book. Finally, I am
very grateful to Miso Kim, my illustrator, a PhD student at the Carnegie Mellon School
of Design, who spent hundreds of hours designing all the fun figures in the book.
On a more personal note, I would like to thank my mother, Irit Harchol, for making
my priorities her priorities, allowing me to maximize my achievements. I did not know
what this meant until I had a child of my own. Lastly, I would like to thank my
husband, Andrew Young. He won me over by reading all my online lecture notes and
doing every homework problem – this was his way of asking me for a first date. His
ability to understand it all without attending any lectures made me believe that my
lecture notes might actually “work” as a book. His willingness to sit by my side every
night for many months gave me the motivation to make it happen.
PART I
Introduction to Queueing
Part I serves as an introduction to analytical modeling.
We begin in Chapter 1 with a number of paradoxical examples that come up in the
design of computer systems, showing off the power of analytical modeling in making
design decisions.
Chapter 2 introduces the reader to basic queueing theory terminology and notation
that is used throughout the rest of the book. Readers are introduced to both open and
closed queueing networks and to standard performance metrics, such as response time,
throughput, and the number of jobs in the system.
CHAPTER 1
Motivating Examples of the
Power of Analytical Modeling
1.1 What Is Queueing Theory?
Queueing theory is the theory behind what happens when you have lots of jobs,
scarce resources, and subsequently long queues and delays. It is literally the “theory
of queues”: what makes queues appear and how to make them go away.
Imagine a computer system, say a web server, where there is only one job. The job
arrives, it uses certain resources (some CPU, some I/O), and then it departs. Given the
job’s resource requirements, it is very easy to predict exactly when the job will depart.
There is no delay because there are no queues. If every job indeed got to run on its own
computer, there would be no need for queueing theory. Unfortunately, that is rarely the
case.
Figure 1.1. Illustration of a queue, in which customers wait to be served, and a server. The
picture shows one customer being served at the server and five others waiting in the queue.
Queueing theory applies anywhere that queues come up (see Figure 1.1). We all have
had the experience of waiting in line at the bank, wondering why there are not more
tellers, or waiting in line at the supermarket, wondering why the express lane is for 8
items or less rather than 15 items or less, or whether it might be best to actually have two
express lanes, one for 8 items or less and the other for 15 items or less. Queues are also
at the heart of any computer system. Your CPU uses a time-sharing scheduler to serve
a queue of jobs waiting for CPU time. A computer disk serves a queue of jobs waiting
to read or write blocks. A router in a network serves a queue of packets waiting to be
routed. The router queue is a finite capacity queue, in which packets are dropped when
demand exceeds the buffer space. Memory banks serve queues of threads requesting
memory blocks. Databases sometimes have lock queues, where transactions wait to
acquire the lock on a record. Server farms consist of many servers, each with its own
queue of jobs. The list of examples goes on and on.
The goals of a queueing theorist are twofold. The first is predicting the system perfor-
mance. Typically this means predicting mean delay or delay variability or the proba-
bility that delay exceeds some Service Level Agreement (SLA). However, it can also
mean predicting the number of jobs that will be queueing or the mean number of servers
being utilized (e.g., total power needs), or any other such metric. Although prediction
is important, an even more important goal is finding a superior system design to im-
prove performance. Commonly this takes the form of capacity planning, where one
determines which additional resources to buy to meet delay goals (e.g., is it better to
buy a faster disk or a faster CPU, or to add a second slow disk). Many times, however,
without buying any additional resources at all, one can improve performance just by
deploying a smarter scheduling policy or different routing policy to reduce delays.
Given the importance of smart scheduling in computer systems, all of Part VII of this
book is devoted to understanding scheduling policies.
Queueing theory is built on a much broader area of mathematics called stochastic
modeling and analysis. Stochastic modeling represents the service demands of jobs and
the interarrival times of jobs as random variables. For example, the CPU requirements
of UNIX processes might be modeled using a Pareto distribution [84], whereas the
arrival process of jobs at a busy web server might be well modeled by a Poisson
process with Exponentially distributed interarrival times. Stochastic models can also
be used to model dependencies between jobs, as well as anything else that can be
represented as a random variable.
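For readers who want to experiment, below is a minimal sketch (my own, not from the book)
of generating such a stochastic workload: Exponentially distributed interarrival times
(forming a Poisson process) and heavy-tailed Pareto job sizes, sampled via the
inverse-transform method covered in Chapter 4. The rate and shape parameters are
illustrative assumptions.

```python
import random

def exponential_interarrivals(lam, n):
    """n i.i.d. Exponential(rate=lam) interarrival times (a Poisson process)."""
    return [random.expovariate(lam) for _ in range(n)]

def pareto_job_sizes(alpha, x_min, n):
    """n i.i.d. Pareto(alpha) job sizes with minimum size x_min."""
    # Inverse transform: if U ~ Uniform(0,1], then x_min * U**(-1/alpha) is Pareto.
    return [x_min * (1.0 - random.random()) ** (-1.0 / alpha) for _ in range(n)]

arrivals = exponential_interarrivals(lam=3.0, n=100_000)  # 3 jobs/sec on average
sizes = pareto_job_sizes(alpha=1.5, x_min=0.01, n=100_000)
print(f"mean interarrival time: {sum(arrivals) / len(arrivals):.4f} sec")  # ~1/3
print(f"mean job size:          {sum(sizes) / len(sizes):.4f} sec")        # ~0.03
```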
Although it is generally possible to come up with a stochastic model that adequately
represents the jobs or customers in a system and its service dynamics, these stochastic
models are not always analytically tractable with respect to solving for performance.
As we discuss in Part IV, Markovian assumptions, such as assuming Exponentially
distributed service demands or a Poisson arrival process, greatly simplify the analysis;
hence much of the existing queueing literature relies on such Markovian assumptions.
In many cases these are a reasonable approximation. For example, the arrival process of
book orders on Amazon might be reasonably well approximated by a Poisson process,
given that there are many independent users, each independently submitting requests
at a low rate (although this all breaks down when a new Harry Potter book comes
out). However, in some cases Markovian assumptions are very far from reality; for
example, in the case in which service demands of jobs are highly variable or are
correlated.
While many queueing texts downplay the Markovian assumptions being made, this
book does just the opposite. Much of my own research is devoted to demonstrating the
impact of workload assumptions on correctly predicting system performance. I have
found many cases where making simplifying assumptions about the workload can lead
to very inaccurate performance results and poor system designs. In my own research,
I therefore put great emphasis on integrating measured workload distributions into the
analysis. Rather than trying to hide the assumptions being made, this book highlights
all assumptions about workloads. We will discuss specifically whether the workload
models are accurate and how our model assumptions affect performance and design,
as well as look for more accurate workload models. In my opinion, a major reason
why computer scientists are so slow to adopt queueing theory is that the standard
Markovian assumptions often do not fit. However, there are often ways to work around
these assumptions, many of which are shown in this book, such as using phase-type
distributions and matrix-analytic methods, introduced in Chapter 21.
1.2 Examples of the Power of Queueing Theory
The remainder of this chapter is devoted to showing some concrete examples of the
power of queueing theory. Do not expect to understand everything in the examples. The
examples are developed in much greater detail later in the book. Terms like “Poisson
process” that you may not be familiar with are also explained later in the book. These
examples are just here to highlight the types of lessons covered in this book.
As stated earlier, one use of queueing theory is as a predictive tool, allowing one to
predict the performance of a given system. For example, one might be analyzing a
network, with certain bandwidths, where different classes of packets arrive at certain
rates and follow certain routes throughout the network simultaneously. Then queueing
theory can be used to compute quantities such as the mean time that packets spend
waiting at a particular router i, the distribution on the queue buildup at router i, or the
mean overall time to get from router i to router j in the network.
We now turn to the usefulness of queueing theory as a design tool in choosing the
best system design to minimize response time. The examples that follow illustrate that
system design is often a counterintuitive process.
Design Example 1 – Doubling Arrival Rate
Consider a system consisting of a single CPU that serves a queue of jobs in First-Come-
First-Served (FCFS) order, as illustrated in Figure 1.2. The jobs arrive according to some
random process with some average arrival rate, say λ = 3 jobs per second. Each job
has some CPU service requirement, drawn independently from some distribution of job
service requirements (we can assume any distribution on the job service requirements
for this example). Let’s say that the average service rate is μ = 5 jobs per second (i.e.,
each job on average requires 1/5 of a second of service). Note that the system is not
in overload (3 < 5). Let E [T] denote the mean response time of this system, where
response time is the time from when a job arrives until it completes service, a.k.a.
sojourn time.
Figure 1.2. A system with a single CPU that serves jobs in FCFS order (λ = 3, μ = 5). If
λ → 2λ, by how much should μ increase?
Question: Your boss tells you that starting tomorrow the arrival rate will double. You
are told to buy a faster CPU to ensure that jobs experience the same mean response
time, E [T]. That is, customers should not notice the effect of the increased arrival
rate. By how much should you increase the CPU speed? (a) Double the CPU speed;
(b) More than double the CPU speed; (c) Less than double the CPU speed.
Answer: (c) Less than double.
Question: Why not (a)?
Answer: It turns out that doubling CPU speed together with doubling the arrival
rate will generally result in cutting the mean response time in half! We prove this in
Chapter 13. Therefore, the CPU speed does not need to double.
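As a quick numeric check, the sketch below uses the M/M/1 special case, whose mean
response time formula E [T] = 1/(μ − λ) is derived in Chapter 13; the rough argument that
follows shows the scaling holds more generally.

```python
# Sketch: E[T] = 1/(mu - lam) for an M/M/1 queue (formula derived in Chapter 13).
def mm1_mean_response_time(lam, mu):
    assert lam < mu, "system must not be overloaded"
    return 1.0 / (mu - lam)

lam, mu = 3.0, 5.0
print(mm1_mean_response_time(lam, mu))          # 0.50 sec: original system
print(mm1_mean_response_time(2 * lam, 2 * mu))  # 0.25 sec: both rates doubled
# Doubling both rates halves E[T]. To merely preserve E[T] = 0.5 sec once the
# arrival rate doubles to 6, mu = 8 suffices (1/(8-6) = 0.5) -- less than double.
print(mm1_mean_response_time(2 * lam, 8.0))     # 0.50 sec
```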
Question: Can you immediately see a rough argument for this result that does not
involve any queueing theory formulas? What happens if we double the service rate and
double the arrival rate?
Answer: Imagine that there are two types of time: Federation time and Klingon time.
Klingon seconds are faster than Federation seconds. In fact, each Klingon second is
equivalent to a half-second in Federation time. Now, suppose that in the Federation,
there is a CPU serving jobs. Jobs arrive with rate λ jobs per second and are served at
some rate μ jobs per second. The Klingons steal the system specs and reengineer the
same system in the Klingon world. In the Klingon system, the arrival rate is λ jobs
per Klingon second, and the service rate is μ jobs per Klingon second. Note that both
systems have the same mean response time, E [T], except that the Klingon system
response time is measured in Klingon seconds, while the Federation system response
time is measured in Federation seconds. Consider now that Captain Kirk is observing
both the Federation system and the Klingon reengineered system. From his perspective,
the Klingon system has twice the arrival rate and twice the service rate; however, the
mean response time in the Klingon system has been halved (because Klingon seconds
are half-seconds in Federation time).
Question: Suppose the CPU employs time-sharing service order (known as Processor-
Sharing, or PS for short), instead of FCFS. Does the answer change?
Answer: No. The same basic argument still works.
Design Example 2 – Sometimes “Improvements” Do Nothing
Consider the batch system shown in Figure 1.3. There are always N = 6 jobs in this
system (this is called the multiprogramming level). As soon as a job completes service,
a new job is started (this is called a “closed” system). Each job must go through the
“service facility.” At the service facility, with probability 1/2 the job goes to server 1,
and with probability 1/2 it goes to server 2. Server 1 services jobs at an average rate
of 1 job every 3 seconds. Server 2 also services jobs at an average rate of 1 job every 3
seconds. The distribution on the service times of the jobs is irrelevant for this problem.
Response time is defined as usual as the time from when a job first arrives at the service
facility (at the fork) until it completes service.
Figure 1.3. A closed batch system: N = 6 jobs, each routed with probability ½ to Server 1
or with probability ½ to Server 2, where each server has rate μ = ⅓.
Question: You replace server 1 with a server that is twice as fast (the new server
services jobs at an average rate of 2 jobs every 3 seconds). Does this “improvement”
affect the average response time in the system? Does it affect the throughput? (Assume
that the routing probabilities remain constant at 1/2 and 1/2.)
Answer: Not really. Both the average response time and throughput are hardly affected.
This is explained in Chapter 7.
Question: Suppose that the system had a higher multiprogramming level, N. Does the
answer change?
Answer: No. The already negligible effect on response time and throughput goes to
zero as N increases.
Question: Suppose the system had a lower value of N. Does the answer change?
Answer: Yes. If N is sufficiently low, then the “improvement” helps. Consider, for
example, the case N = 1.
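A simulation sketch of the system in Figure 1.3 makes the dependence on N visible (the sketch is ours and assumes exponential service times, even though the text notes the distribution is irrelevant for the qualitative conclusion):

import random

def batch_throughput(mu1, mu2, N, events=400_000, seed=2):
    """Throughput of the closed batch system of Figure 1.3."""
    rng = random.Random(seed)
    n1 = N                                # jobs at server 1; the rest are at server 2
    t, done = 0.0, 0
    for _ in range(events):
        r1 = mu1 if n1 > 0 else 0.0
        r2 = mu2 if n1 < N else 0.0
        t += rng.expovariate(r1 + r2)     # time until the next completion
        if rng.random() < r1 / (r1 + r2):
            n1 -= 1                       # the completion happened at server 1
        done += 1                         # a job finished; a new job starts...
        if rng.random() < 0.5:
            n1 += 1                       # ...and is routed to server 1
    return done / t

for N in (1, 6, 40):
    print(N, batch_throughput(1/3, 1/3, N), batch_throughput(2/3, 1/3, N))

At N = 1 the faster server raises throughput by roughly a third; by N = 40 the two columns nearly coincide, because server 2 (still at rate ⅓, fed half the jobs) has become the bottleneck in both cases.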
Question: Suppose the system is changed into an open system, rather than a closed
system, as shown in Figure 1.4, where arrival times are independent of service com-
pletions. Now does the “improvement” reduce mean response time?
Answer: Absolutely!
Figure 1.4. An open system: each arrival goes with probability ½ to Server 1 or to Server 2, each with service rate µ = ⅓.
Design Example 3 – One Machine or Many?
You are given a choice between one fast CPU of speed s, or n slow CPUs each of speed
s/n (see Figure 1.5). Your goal is to minimize mean response time. To start, assume
that jobs are non-preemptible (i.e., each job must be run to completion).
Figure 1.5. Which is better for minimizing mean response time: many slow servers (four servers, each with rate µ = 1) or one fast server (µ = 4)?
Question: Which is the better choice: one fast machine or many slow ones?
Hint: Suppose that I tell you that the answer is, “It depends on the workload.” What
aspects of the workload do you think the answer depends on?
Answer: It turns out that the answer depends on the variability of the job size distribu-
tion, as well as on the system load.
Question: Which system do you prefer when job size variability is high?
Answer: When job size variability is high, we prefer many slow servers because we
do not want short jobs getting stuck behind long ones.
Question: Which system do you prefer when load is low?
Answer: When load is low, not all servers will be utilized, so it seems better to go with
one fast server.
These observations are revisited many times throughout the book.
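The load side of this tradeoff can be probed with the standard M/M/k formulas (a sketch of ours; exponential job sizes, so it says nothing about the high-variability regime). Hold the total capacity fixed at 4, as in Figure 1.5, and compare one fast server against four slow ones at a moderate load:

import math

def erlang_c(k, a):
    """P(arrival waits) in an M/M/k queue with offered load a = lam/mu < k."""
    head = sum(a**i / math.factorial(i) for i in range(k))
    tail = a**k / (math.factorial(k) * (1 - a / k))
    return tail / (head + tail)

def mmk_mean_response(lam, mu, k):
    """Mean response time E[T] of an M/M/k queue."""
    return erlang_c(k, lam / mu) / (k * mu - lam) + 1 / mu

lam = 2.0
print(mmk_mean_response(lam, mu=4.0, k=1))   # one fast server:  0.5 sec
print(mmk_mean_response(lam, mu=1.0, k=4))   # four slow servers: about 1.09 sec

With exponential (only moderately variable) job sizes, the single fast server wins here, consistent with the low-load intuition above; high variability can reverse the preference, as just discussed.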
Question: Now suppose we ask the same question, but jobs are preemptible; that is,
they can be stopped and restarted where they left off. When do we prefer many slow
machines as compared to a single fast machine?
Answer: If your jobs are preemptible, you could always use a single fast machine
to simulate the effect of n slow machines. Hence a single fast machine is at least as
good.
The question of many slow servers versus a few fast ones has huge applicability in a
wide range of areas, because anything can be viewed as a resource, including CPU,
power, and bandwidth.
For an example involving power management in data centers, consider the problem
from [69] where you have a fixed power budget P and a server farm consisting of
n servers. You have to decide how much power to allocate to each server, so as
to minimize overall mean response time for jobs arriving at the server farm. There
is a function that specifies the relationship between the power allocated to a server
and the speed (frequency) at which it runs – generally, the more power you allocate
to a server, the faster it runs (the higher its frequency), subject to some maximum
possible frequency and some minimum power level needed just to turn the server on.
To answer the question of how to allocate power, you need to think about whether
you prefer many slow servers (allocate just a little power to every server) or a few fast
ones (distribute all the power among a small number of servers). In [69], queueing
theory is used to optimally answer this question under a wide variety of parameter
settings.
As another example, if bandwidth is the resource, we can ask when it pays to partition
bandwidth into smaller chunks and when it is better not to. The problem is also
interesting when performance is combined with price. For example, it is often cheaper
(financially) to purchase many slow servers than a few fast servers. Yet in some cases,
many slow servers can consume more total power than a few fast ones. All of these
factors can further influence the choice of architecture.
Design Example 4 – Task Assignment in a Server Farm
Consider a server farm with a central dispatcher and several hosts. Each arriving job is
immediately dispatched to one of the hosts for processing. Figure 1.6 illustrates such
a system.
Figure 1.6. A distributed server system with a central dispatcher (load balancer) routing arrivals to Host 1, Host 2, and Host 3.
Server farms like this are found everywhere. Web server farms typically deploy a
front-end dispatcher like Cisco’s Local Director or IBM’s Network Dispatcher. Super-
computing sites might use LoadLeveler or some other dispatcher to balance load and
assign jobs to hosts.
For the moment, let’s assume that all the hosts are identical (homogeneous) and that
all jobs only use a single resource. Let’s also assume that once jobs are assigned to a
host, they are processed there in FCFS order and are non-preemptible.
There are many possible task assignment policies that can be used for dispatching jobs
to hosts. Here are a few:
Random: Each job is routed to a host chosen uniformly at random.
Round-Robin: The ith job goes to host i mod n, where n is the number of hosts,
and hosts are numbered 0, 1, . . . , n − 1.
Shortest-Queue: Each job goes to the host with the fewest number of jobs.
Size-Interval-Task-Assignment (SITA): “Short” jobs go to the first host, “medium”
jobs go to the second host, “long” jobs go to the third host, etc., for some definition
of “short,” “medium,” and “long.”
Least-Work-Left (LWL): Each job goes to the host with the least total remaining
work, where the “work” at a host is the sum of the sizes of jobs there.
Central-Queue: Rather than have a queue at each host, jobs are pooled at one central
queue. When a host is done working on a job, it grabs the first job in the central
queue to work on.
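To make the policies concrete, here is an illustrative sketch of the dispatch rules in code (the Host class, the cutoff values, and all names are ours; the bookkeeping that drains jobs and work as services complete is omitted):

import random

class Host:
    def __init__(self):
        self.jobs = 0      # number of jobs at the host (queued plus in service)
        self.work = 0.0    # total remaining work: sum of remaining job sizes

def dispatch(policy, hosts, size, i, rng, cutoffs=None):
    """Pick a host index for the i-th arriving job of the given size."""
    if policy == "Random":
        return rng.randrange(len(hosts))
    if policy == "Round-Robin":
        return i % len(hosts)
    if policy == "Shortest-Queue":
        return min(range(len(hosts)), key=lambda h: hosts[h].jobs)
    if policy == "LWL":
        return min(range(len(hosts)), key=lambda h: hosts[h].work)
    if policy == "SITA":    # e.g., cutoffs=[1.0, 10.0]: short/medium/long for 3 hosts
        return sum(size > c for c in cutoffs)
    raise ValueError(policy)

hosts = [Host() for _ in range(3)]
rng = random.Random(0)
for p in ("Random", "Round-Robin", "Shortest-Queue", "LWL", "SITA"):
    print(p, dispatch(p, hosts, size=4.2, i=7, rng=rng, cutoffs=[1.0, 10.0]))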
Question: Which of these task assignment policies yields the lowest mean response
time?
Answer: Given the ubiquity of server farms, it is surprising how little is known about
this question. If job size variability is low, then the LWL policy is best. If job size
variability is high, then it is important to keep short jobs from getting stuck behind long
ones, so a SITA-like policy, which affords short jobs isolation from long ones, can be
far better. In fact, for a long time it was believed that SITA is always better than LWL
when job size variability is high. However, it was recently discovered (see [90]) that
SITA can be far worse than LWL even as job size variability tends to infinity. It
turns out that other properties of the workload, including load and fractional moments
of the job size distribution, matter as well.
Question: For the previous question, how important was it to know the size of jobs?
For example, how does LWL, which requires knowing job size, compare with Central-
Queue, which does not?
Answer: Actually, most task assignment policies do not require knowing the size of
jobs. For example, it can be proven by induction that LWL is equivalent to Central-
Queue. Even policies like SITA, which by definition are based on knowing the job size,
can be well approximated by other policies that do not require knowing the job size;
see [82].
Question: Now consider a different model, in which jobs are preemptible. Specifically,
suppose that the servers are Processor-Sharing (PS) servers, which time-share among
all the jobs at the server, rather than serving them in FCFS order. Which task assignment
policy is preferable now? Is the answer the same as that for FCFS servers?
Answer: The task assignment policies that are best for FCFS servers are often a
disaster under PS servers. For PS servers, the Shortest-Queue policy is near optimal
([79]), whereas that policy is pretty bad for FCFS servers if job size variability is high.
There are many open questions with respect to task assignment policies. The case of
server farms with PS servers, for example, has received almost no attention, and even
the case of FCFS servers is still only partly understood. There are also many other
task assignment policies that have not been mentioned. For example, cycle stealing
(taking advantage of a free host to process jobs in some other queue) can be combined
with many existing task assignment policies to create improved policies. There are also
other metrics to consider, like minimizing the variance of response time, rather than
mean response time, or maximizing fairness. Finally, task assignment can become even
more complex, and more important, when the workload changes over time.
Task assignment is analyzed in great detail in Chapter 24, after we have had a chance
to study empirical workloads.
Design Example 5 – Scheduling Policies
Suppose you have a single server. Jobs arrive according to a Poisson process. Assume
anything you like about the distribution of job sizes. The following are some possible
service orders (scheduling orders) for serving jobs:
First-Come-First-Served (FCFS): When the server completes a job, it starts working
on the job that arrived earliest.
Non-Preemptive Last-Come-First-Served (LCFS): When the server completes a
job, it starts working on the job that arrived last.
Random: When the server completes a job, it starts working on a random job.
Question: Which of these non-preemptive service orders will result in the lowest mean
response time?
Answer: Believe it or not, they all have the same mean response time.
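A simulation sketch (ours) that feeds the identical arrival sequence to a single non-preemptive server under each of the three orders bears this out; exponential sizes are used below, but any size distribution gives the same agreement in the mean.

import random

def simulate(order, arrivals, sizes, rng):
    """Mean response time under a given non-preemptive service order."""
    queue, t, i, total, served = [], 0.0, 0, 0.0, 0
    n = len(arrivals)
    while served < n:
        while i < n and arrivals[i] <= t:     # admit everyone who has arrived by t
            queue.append((arrivals[i], sizes[i]))
            i += 1
        if not queue:
            t = arrivals[i]                   # server idles until the next arrival
            continue
        if order == "FCFS":
            k = 0
        elif order == "LCFS":
            k = len(queue) - 1
        else:                                 # Random
            k = rng.randrange(len(queue))
        arr, s = queue.pop(k)
        t += s
        total += t - arr                      # this job's response time
        served += 1
    return total / served

rng = random.Random(1)
lam, mu, n = 1.0, 2.0, 100_000
arrivals, t = [], 0.0
for _ in range(n):
    t += rng.expovariate(lam)
    arrivals.append(t)
sizes = [rng.expovariate(mu) for _ in range(n)]
for order in ("FCFS", "LCFS", "Random"):
    print(order, simulate(order, arrivals, sizes, random.Random(2)))
# All three print roughly 1/(mu - lam) = 1 second.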
Question: Suppose we change the non-preemptive LCFS policy to a Preemptive-LCFS
policy (PLCFS), which works as follows: Whenever a new arrival enters the system,
it immediately preempts the job in service. How does the mean response time of this
policy compare with the others?
Answer: It depends on the variability of the job size distribution. If the job size
distribution is at least moderately variable, then PLCFS will be a huge improvement.
If the job size distribution is hardly variable (basically constant), then the PLCFS policy
will be up to a factor of 2 worse.
We study many counterintuitive scheduling theory results toward the very end of the
book, in Chapters 28 through 33.
More Design Examples
There are many more questions in computer systems design that lend themselves to a
queueing-theoretic solution.
One example is the notion of a setup cost. It turns out that it can take both significant time
and power to turn on a server that is off. In designing an efficient power management
policy, we often want to leave servers off (to save power), but then we have to pay the
setup cost to get them back on when jobs arrive. Given performance goals, both with
respect to response time and power usage, an important question is whether it pays to
turn a server off. If so, one can then ask exactly how many servers should be left on.
These questions are discussed in Chapters 15 and 27.
There are also questions involving optimal scheduling when jobs have priorities (e.g.,
certain users have paid more for their jobs to have priority over other users’ jobs, or
some jobs are inherently more vital than others). Again, queueing theory is very useful
in designing the right priority scheme to maximize the value of the work completed.
Figure 1.7. Example of a difficult problem: The M/G/2 queue consists of a single queue and
two servers. When a server completes a job, it starts working on the job at the head of the
queue. Job sizes follow a general distribution, G.
However, queueing theory (and more generally analytical modeling) is not currently
all-powerful! There are lots of very simple problems that we can at best only analyze
approximately. As an example, consider the simple two-server network shown in
Figure 1.7, where job sizes come from a general distribution. No one knows how to
derive mean response time for this network. Approximations exist, but they are quite
poor, particularly when job size variability gets high [76]. We mention many such open
problems in this book, and we encourage readers to attempt to solve these!
CHAPTER 2
Queueing Theory Terminology
2.1 Where We Are Heading
Queueing theory is the study of queueing behavior in networks and systems. Figure 2.1
shows the solution process.
Figure 2.1. Solution process: a real-world system with a question (“Should we buy a faster disk or a faster CPU?”) is modeled as a queueing network, the network is analyzed, and the result is translated back to the real-world system.
In Chapter 1, we looked at examples of the power of queueing theory as a design tool.
In this chapter, we start from scratch and define the terminology used in queueing
theory.
2.2 The Single-Server Network
A queueing network is made up of servers.
The simplest example of a queueing network is the single-server network, as shown
in Figure 2.2. The discussion in this section is limited to the single-server network with
First-Come-First-Served (FCFS) service order. You can think of the server as being a
CPU.
Figure 2.2. Single-server network: jobs arrive at rate λ = 3 and are served in FCFS order at rate μ = 4.
There are several parameters associated with the single-server network:
Service Order This is the order in which jobs will be served by the server. Unless
otherwise stated, assume First-Come-First-Served (FCFS).
Average Arrival Rate This is the average rate, λ, at which jobs arrive to the server
(e.g., λ = 3 jobs/sec).
Mean Interarrival Time This is the average time between successive job arrivals (e.g., 1/λ = 1/3 sec).
Service Requirement, Size The “size” of a job is typically denoted by the random
variable S. This is the time it would take the job to run on this server if there were
no other jobs around (no queueing). In a queueing model, the size (a.k.a. service
requirement) is typically associated with the server (e.g., this job will take 5 seconds
on this server).
Mean Service Time This is the expected value of S, namely the average time required to service a job on this CPU, where “service” does not include queueing time. In Figure 2.2, E [S] = 1/4 sec.
Average Service Rate This is the average rate, μ, at which jobs are served (e.g., μ = 4 jobs/sec = 1/E [S]).
Observe that this way of speaking is different from the way we normally talk about
servers in conversation. For example, nowhere have we mentioned the absolute speed
of the CPU; rather we have only defined the CPU’s speed in terms of the set of jobs
that it is working on.
In normal conversation, we might say something like the following:
• The average arrival rate of jobs is 3 jobs per second.
• Jobs have different service requirements, but the average number of cycles required by a job is 5,000 cycles per job.
• The CPU speed is 20,000 cycles per second.
That is, an average of 15,000 cycles of work arrive at the CPU each second, and the
CPU can process 20,000 cycles of work a second.
In the queueing-theoretic way of talking, we would never mention the word “cycle.”
Instead, we would simply say
• The average arrival rate of jobs is 3 jobs per second.
• The average rate at which the CPU can service jobs is 4 jobs per second.
This second way of speaking suppresses some of the detail and thus makes the problem
a little easier to think about. You should feel comfortable going back and forth between
the two.
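The translation between the two ways of speaking is simple arithmetic; here it is as a quick sketch using the numbers above:

arrival_rate   = 3        # jobs/sec
cycles_per_job = 5_000    # average cycles of work per job
cpu_speed      = 20_000   # cycles/sec

mu = cpu_speed / cycles_per_job            # service rate: 4 jobs/sec
work_in = arrival_rate * cycles_per_job    # 15,000 cycles of work arriving per sec
print(mu, work_in)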
We consider these common performance metrics in the context of a single-server
system:
Response Time, Turnaround Time, Time in System, or Sojourn Time (T) We
define a job’s response time by T = tdepart − tarrive, where tdepart is the time when the
job leaves the system, and tarrive is the time when the job arrived to the system. We
are interested in E [T], the mean response time; Var(T), the variance in response
time; and the tail behavior of T, P {T > t}.
Waiting Time or Delay (TQ ) This is the time that the job spends in the queue, not
being served. It is also called the “time in queue” or the “wasted time.” Notice that
E [T] = E [TQ ] + E [S]. Under FCFS service order, waiting time can be defined as
the time from when a job arrives to the system until it first receives service.
Number of Jobs in the System (N) This includes those jobs in the queue, plus the
one being served (if any).
Number of Jobs in Queue (NQ ) This denotes only the number of jobs waiting (in
queue).
There are some immediate observations that we can make about the single-server
network. First, observe that as λ, the mean arrival rate, increases, all the performance
metrics mentioned earlier increase (get worse). Also, as μ, the mean service rate,
increases, all the performance metrics mentioned earlier decrease (improve).
We require that λ ≤ μ (we always assume λ < μ).
Question: If λ > μ, what happens?
Answer: If λ > μ, the queue length goes to infinity over time.
Question: Can you provide the intuition?
Answer: Consider a large time t. Then, if N(t) is the number of jobs in the system
at time t, and A(t) (respectively, D(t)) denotes the number of arrivals (respectively,
departures) by time t, then we have:
E[N(t)] = E[A(t)] − E[D(t)] ≥ λt − μt = t(λ − μ).
(The inequality comes from the fact that the expected number of departures by time t
is actually smaller than μt, because the server is not always busy). Now observe that if
λ > μ, then t(λ − μ) → ∞ as t → ∞.
Throughout the book we assume λ < μ, which is needed for stability (keeping queue sizes from growing unboundedly). Situations where λ ≥ μ are touched on in Chapter 9.
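A quick sketch (ours; Poisson arrivals and exponential services assumed) makes the instability visible: with λ = 4 > μ = 3, the number in system at time t comes out near (λ − μ)t.

import random

def number_in_system(t_end, lam, mu, seed=0):
    """Simulate a single FCFS queue as a birth-death process; return N(t_end)."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    while t < t_end:
        rate = lam + (mu if n > 0 else 0.0)   # arrivals always; departures only if busy
        t += rng.expovariate(rate)
        if rng.random() < lam / rate:
            n += 1                            # an arrival
        else:
            n -= 1                            # a departure
    return n

for t_end in (100, 1_000, 10_000):
    print(t_end, number_in_system(t_end, lam=4.0, mu=3.0))   # roughly (4 - 3) * t_end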
Question: Given the previous stability condition (λ < μ), suppose that the interarrival
distribution and the service time distribution are Deterministic (i.e., both are constants).
What is TQ ? What is T?
Answer: TQ = 0, and T = S.
Therefore queueing (waiting) results from variability in service time and/or interarrival
time distributions. Here is an example of how variability leads to queues: Let’s discretize
time. Suppose at each time step, an arrival occurs with probability p = 1/6. Suppose at
each time step, a departure occurs with probability q = 1/3. Then there is a non-zero
probability that the queue will build up (temporarily) if several arrivals occur without
a departure.
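Here is that discrete-time example as a sketch (ours; the ordering of arrival before departure within a step is our choice). Even though the departure probability q = 1/3 exceeds the arrival probability p = 1/6, sample paths routinely climb above one job before draining:

import random

def peak_queue(p=1/6, q=1/3, steps=1_000, seed=3):
    """One sample path of the discrete-time queue; return its peak length."""
    rng = random.Random(seed)
    n = peak = 0
    for _ in range(steps):
        if rng.random() < p:              # an arrival occurs this time step
            n += 1
        if n > 0 and rng.random() < q:    # a departure occurs, if anyone is present
            n -= 1
        peak = max(peak, n)
    return peak

print([peak_queue(seed=s) for s in range(5)])   # peaks of five sample paths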
2.3 Classification of Queueing Networks
Queueing networks can be classified into two categories: open networks and closed
networks. Stochastic processes books (e.g., [149, 150]) usually limit their discussion
to open networks. By contrast, the systems performance analysis books (e.g., [117,
125]) almost exclusively discuss closed networks. Open networks are introduced in
Section 2.4. Closed networks are introduced in Section 2.6.
2.4 Open Networks
An open queueing network has external arrivals and departures. Four examples of open
networks are illustrated in this section.
Example: The Single-Server System
This was shown in Figure 2.2.
Example: Network of Queues with Probabilistic Routing
This is shown in Figure 2.3. Here server i receives external arrivals (“outside arrivals”)
with rate ri. Server i also receives internal arrivals from some of the other servers. A
packet that finishes service at server i is next routed to server j with probability pij .
We can even allow the probabilities to depend on the “class” of the packet, so that not
all packets have to follow the same routing scheme.
Figure 2.3. Network of queues with probabilistic routing: servers 1, 2, and 3 (rates µ1, µ2, µ3) receive outside arrivals at rates r1, r2, r3; the routing probabilities shown include p12, p13, p23, p31, p1,out, and p2,out.
Application: In modeling packet flows in the Internet, for example, one could make
the class of the packet (and hence its route) depend on its source and destination IP
addresses. In modeling delays, each wire might be replaced by a server that would be
used to model the wire latency. The goal might be to predict mean round-trip times for
packets on a particular route, given the presence of the other packets. We solve this
problem in Chapter 18.
Example: Network of Queues with Non-Probabilistic Routing
This is shown in Figure 2.4. Here all jobs follow a predetermined route: CPU to disk 1
to disk 2 to disk 1 to disk 2 to disk 1 and out.
Figure 2.4. Network of queues with non-probabilistic routing: arriving jobs (rate λ) visit the CPU and then Disk 1, Disk 2, Disk 1, Disk 2, Disk 1 (two times around) before departing.
Example: Finite Buffer
An example of a single-server network with finite buffer is shown in Figure 2.5. Any
arrival that finds no room is dropped.
Figure 2.5. Single-server network with finite buffer capacity: arrival rate λ, CPU service rate µCPU, and space for 9 jobs in queue plus 1 in service.
2.5 More Metrics: Throughput and Utilization
We have already seen four performance metrics: E [N], E [T], E [NQ ], and E [TQ ].
Although these were applied to a single-server system, they can also be used to describe
performance in a multi-server, multi-queue system. For example, E [T] would denote
the mean time a job spends in the whole system, including all time spent in various
queues and time spent receiving service at various servers, whereas E [TQ ] refers to just
the mean time the job “wasted” waiting in various queues. If we want to refer to just the
ith queue in such a system, we typically write E [Ni] to denote the expected number
of jobs both queueing and in service at server i, and E [Ti] to denote the expected time
a job spends queueing and in service at server i.
Now we introduce two new performance metrics: throughput and utilization. Through-
put is arguably the performance metric most used in conversation. Everyone wants
higher throughput! Let’s see why.
Question: How does maximizing throughput relate to minimizing response time? For
example, in Figure 2.6, which system has higher throughput?
Figure 2.6. Comparing the throughput of two systems: the same arrival stream feeds a server of rate µ = ⅓ versus a faster server.
Answer: We will see soon.
Let’s start by defining utilization.
Device Utilization (ρi) is the fraction of time device i is busy. Note our current
definition of utilization applies only to a single device (server). When the device is
implied, we simply write ρ (omitting the subscript).
Suppose we watch a device i for a long period of time. Let τ denote the length of the
observation period. Let B denote the total time during the observation period that the
device is non-idle (busy). Then
ρi = B/τ.
Device Throughput (Xi) is the rate of completions at device i (e.g., jobs/sec). The
throughput (X) of the system is the rate of job completions in the system.
Let C denote the total number of jobs completed at device i during time τ. Then
Xi = C/τ.
So how does Xi relate to ρi? Well,
C/τ = (C/B) · (B/τ).
Question: So what is C/B?
Answer: Well, B/C = E [S]. So C/B = 1/E [S] = μi.
So we have
Xi = μi · ρi.
Here is another way to derive this expression by conditioning:
Xi = Mean rate of completion at server i
= E [Rate of completion at server i | server i is busy] · P {server i is busy}
+ E [Rate of completion at server i | server i is idle] · P {server i is idle}
= μi · P {server i is busy} + 0
= μi · ρi
Or, equivalently,
ρi = Xi · E [S] .
This latter formulation has a name: the Utilization Law.
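As a sanity check, here is the Utilization Law on made-up measurements, chosen to match the running λ = 3, μ = 4 example (all numbers are illustrative):

tau = 3600.0    # observation period: one hour
B   = 2700.0    # seconds the device was busy
C   = 10_800    # jobs completed during the hour

X   = C / tau   # throughput: 3 jobs/sec
ES  = B / C     # mean service time E[S]: 0.25 sec
rho = B / tau   # utilization: 0.75
assert abs(rho - X * ES) < 1e-12   # the Utilization Law: rho = X * E[S]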
Example: Single-Server Network: What Is the Throughput?
In Figure 2.7 we have a single-server system.
Figure 2.7. Single-server model: arrival rate λ, FCFS service at rate µ = ⅓.
Question: What is X?
Answer: X = ρ · μ. But what is ρ? In Chapter 6, we will prove that ρ = λ/μ. For now, here is a hand-wavy but intuitive way to see this (it is not a proof!):
ρ = Fraction of time server is busy
= (Average service time required by a job) / (Average time between arrivals)
= (1/μ) / (1/λ)
= λ/μ.
So, this leaves us with
X = ρ · μ = (λ/μ) · μ = λ.
So the throughput does not depend on the service rate whatsoever!
In particular, in the example shown in Figure 2.6, repeated again in Figure 2.8, both
systems have the same throughput of 1/6 jobs/sec. In the case of the faster processor,
the response time drops and the queue length drops, but X does not change. Therefore
lower response time is not related to higher throughput.
Figure 2.8. Same model, but different values of μ. Throughput, X, is the same in both.
Question: Explain why X does not change.
Answer: No matter how high we make μ, the completion rate is still bounded by the
arrival rate: “rate in = rate out.” Changing μ affects the maximum possible X, but
not the actual X. Note that because we assume a stable system, then, for large t, the
number of arrivals during t is approximately the number of completions during t.
Example: Probabilistic Network of Queues: What is the Throughput?
For Figure 2.3, ri denotes the average outside arrival rate into server i, and μi denotes
the average service rate at server i.
Question: What is the system throughput, X, in Figure 2.3?
Answer: X = Σi ri.
Question: What is the throughput at server i, Xi?
Answer: Let λi denote the total arrival rate into server i. Then Xi = λi. But to get λi
we need to solve these simultaneous equations:
λi = ri + Σj λj pji    (2.1)
Question: How are the ri’s constrained in these equations?
Answer: For the network to reach “equilibrium” (flow into server = flow out of server),
we must have λi < μi for all i, and this constrains the ri’s (see Exercise 2.1).
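Equations (2.1) are linear and easy to solve numerically. In the sketch below (ours), the outside rates and routing probabilities are made up, since Figure 2.3 gives none; we iterate λ ← r + Pᵀλ, which converges because every job eventually leaves the network:

r = [1.0, 0.5, 0.5]            # hypothetical outside arrival rates r_i
P = [[0.0, 0.3, 0.2],          # P[i][j] = p_ij; each row sums to less than 1,
     [0.0, 0.0, 0.4],          # the remainder being the probability of exiting
     [0.3, 0.0, 0.0]]

lam = r[:]                     # start the fixed-point iteration at lambda = r
for _ in range(200):
    lam = [r[i] + sum(lam[j] * P[j][i] for j in range(3)) for i in range(3)]
print(lam)                     # total arrival rates; X_i = lambda_i provided lambda_i < mu_i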
Example: Network of Queues with Non-Probabilistic Routing:
What is the Throughput?
Question: What is X in Figure 2.4?
Answer: X = λ.
Question: What are XDisk1 and XDisk2?
Answer: XDisk1 = 3λ and XDisk2 = 2λ.
Example: Finite Buffer: What is the Throughput?
For Figure 2.5, the outside arrival rate is λ and the service rate is μ.
Question: What is X?
Answer: X = ρμ. But we need stochastic analysis to determine ρ because it is no
longer simply λ/μ. Observe that X < λ because some arrivals get dropped.
2.6 Closed Networks
Closed queueing networks have no external arrivals or departures. They can be classified
into two categories as shown in Figure 2.9.
Figure 2.9. Closed network categories: interactive (terminal-driven) systems and batch systems.
2.6.1 Interactive (Terminal-Driven) Systems
An example of an interactive (terminal-driven) system is shown in Figure 2.10. Ter-
minals represent users who each send a job to the “central subsystem” and then wait
for a response. The central subsystem is a network of queues. A user cannot submit
her next job before her previous job returns. Thus, the number of jobs in the system is
fixed (equal to the number of terminals). This number is sometimes called the load or
MPL (multiprogramming level), not to be confused with device utilization.
Figure 2.10. Interactive system: N user terminals submit jobs to the central subsystem (“in”) and wait for the results (“out”).
There is a think time, Z, which is a random variable representing the time at each
terminal between receiving the result of one job and sending out the next job. Note
that the number of jobs in the central subsystem is at most the number of terminals,
because some users might be in the “thinking” state.
An example of an interactive system such as the one shown in Figure 2.10 is a data-
entry application. N users each sit at terminals filling out the entries on their screens.
Several fields of the screen must be filled out, and then the whole screen is submitted
to the central subsystem for appropriate processing and database update. A new screen
cannot be filled out until the previous update is performed. The “think time,” Z, is the
time to key data to the screen.
An individual user (terminal) oscillates between the think state and the submitted state
as shown in Figure 2.11.
Figure 2.11. The user alternates between thinking and waiting for the submitted job to return.
Question: How would you define the response time for the interactive system?
Answer: Response time is the time it takes a job to go between “in” and “out” in
Figures 2.10 and 2.11. We denote the average time to get from “in” to “out” by
E [Response Time] or E [R] to differentiate it from E [T], which is defined as
E [T] = E [R] + E [Z]
Important: Although “response time” in open systems is denoted by the random
variable (r.v.) T, for closed interactive systems, we refer to T as the system time (or
“time in system”) and reserve the r.v. R for response time.
Goal: The goal in an interactive system is to find a way to allow as many users
as possible to get onto the system at once, so they can all get their work done,
while keeping E [R] low enough. Note that interactive systems are very different
from open systems in that a small change in N has a profound effect on the system
behavior.
The typical questions asked by systems designers are:
• Given the original system, how high can I make N while keeping E [R] below some threshold? That is, how does E [R] rise with N?
• Assume a fixed multiprogramming level, N. Given that we can make changes to the central subsystem (i.e., make certain devices faster), which changes will improve E [R] the most?
Question: Say we are modeling performance at a website. Would you model the
website as a closed interactive system or an open system?
Answer: The jury is still out. There are research papers of both types. On the one hand,
once a user clicks on a link (submits a job), he typically waits to receive the result
before clicking on another link. Thus users behave as if the website is a closed system.
On the other hand, a website may have a huge number of users, each of whom is very
transient in his or her use of the website. In this respect, the website might behave more
like an open system.
Schroeder et al. [165] propose the idea of a “partly-open” system. Here users arrive
from outside as in an open system, but make k requests to the system when they arrive,
where each request can only be made when the previous request completes (as in a
closed system).
2.6.2 Batch Systems
An example of a batch system is shown in Figure 2.12. A batch system looks like an
interactive system with a think time of zero. However, the goals are somewhat different
for batch systems. In a batch system, typically one is running many jobs overnight. As
soon as one job completes, another one is started. So there are always N jobs in the
central subsystem. The MPL is usually predetermined and fixed. For example the MPL
might be the number of jobs that fit into memory.
Figure 2.12. Batch system: the central subsystem always holds N jobs.
Goal: For a batch system, the goal is to obtain high throughput, so that as many jobs
as possible are completed overnight.
The typical question asked by systems designers is, “How can we improve the central
subsystem so as to maximize throughput?”
Note that we are typically constrained by some fixed maximum MPL (because only so
many jobs fit into memory or for some other reason). Thus the only method we have
for increasing throughput is changing the central subsystem, either by changing the
routing or by speeding up some device. Observe that in the batch system we are not
concerned with response times because the jobs are running overnight.
Question: What does X mean in a closed system?
Answer: X is the number of jobs crossing “out” per second. Note that “in” = “out” for
the batch system.
2.6.3 Throughput in a Closed System
Let’s look at some examples.
Example: Single Server
Figure 2.13 shows a closed network consisting of a single server.
Figure 2.13. Single-server closed network with MPL = N and service rate µ.
Question: What is the throughput, X, in Figure 2.13?
Answer: X = μ.
Observe that this is very different from the case of the open network where throughput
was independent of service rate!
Question: What is the mean response time, E [R], in Figure 2.13?
Answer: For a closed batch system, E [R] = E [T], namely the response time and
time in system are the same. For Figure 2.13, E [T] = N/μ, because every “arrival”
waits behind N − 1 jobs and then runs.
Note that X and E [R] are inversely related!
Example: Tandem Servers
Now consider the example of a more complicated closed network, as shown in Fig-
ure 2.14.
Figure 2.14. Tandem servers closed network: MPL = N, with servers of rates µ1 and µ2.
Question: What is the throughput?
Answer: We would like to say X = min(μ1, μ2) . . .
Question: Why is this previous answer not necessarily correct?
Answer: The previous answer is correct if we know that the slower server is always
busy, but that is not necessarily the case. Imagine N = 1. Then it is certainly not the
case that the slower server is always busy.
Question: OK, but what happens when N = 2. Now it appears that there is always at
least one job at the slow server, doesn’t it?
Answer: Nope, the slower server is still not always busy. What we’re missing here is
the fact that sometimes the slow server is faster than the fast server – because these
service rates are just averages! So do we in fact need to take the job size distribution
into account to get the exact answer? Does the job size distribution really affect the
answer very much?
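Before that, a simulation sketch (ours; exponential service times assumed) gives an empirical peek at Figure 2.14. With μ1 = μ2 = 1 the estimates come out near N/(N + 1), the product-form answer for this balanced exponential case: well below min(μ1, μ2) = 1 for small N, approaching it only as N grows.

import random

def tandem_throughput(mu1, mu2, N, events=400_000, seed=5):
    """Throughput of the closed tandem network: N jobs cycling through two servers."""
    rng = random.Random(seed)
    n1 = N                                # jobs at server 1; the other N - n1 are at server 2
    t, done = 0.0, 0
    for _ in range(events):
        r1 = mu1 if n1 > 0 else 0.0       # server 1 works only when it holds a job
        r2 = mu2 if n1 < N else 0.0       # likewise server 2
        t += rng.expovariate(r1 + r2)     # time until the next completion
        if rng.random() < r1 / (r1 + r2):
            n1 -= 1                       # a job moves from server 1 to server 2
        else:
            n1 += 1                       # a job finishes server 2 and cycles back
            done += 1                     # one full pass through the system
    return done / t

for N in (1, 2, 10):
    print(N, tandem_throughput(1.0, 1.0, N))   # about 0.50, 0.67, 0.91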
We will answer these questions soon enough . . . For now, let’s sum up the differences
between the behavior of open and closed networks and why we need to consider both.
2.7 Differences between Closed and Open Networks
Open Systems
• Throughput, X, is independent of the μi’s.
• X is not affected by doubling the μi’s.
• Throughput and response time are not related.
Closed Systems
• X depends on the μi’s.
• If we double all the μi’s while holding N constant, then X changes.
• In fact, we see in Chapter 6 that for closed systems,
Higher throughput ⇐⇒ Lower avg. response time.
2.7.1 A Question on Modeling
Here is a final question: A few years ago I got a call from some folks at IBM. They
were trying to model their blade server as a single-server queue. They knew the arrival
rate into the server, λ, in jobs/sec. However they were wondering how to get E [S], the
mean job size.
Question: How do you obtain E [S] in practice for your single-server system?
Answer: At first glance, you might reason that because E [S] is the mean time required
for a job in isolation, you should just send a single job into the system and measure
its response time, repeating that experiment a hundred times to get an average. This
makes sense in theory, but does not work well in practice, because cache conditions
and other factors are very different for the scenario of just a single job compared with
the case when the system has been loaded for some time.
A better approach is to recall that E [S] = 1/μ, so it suffices to think about the service
rate of the server in jobs/second. To get μ, assuming an open system, we can make λ
higher and higher, which will increase the completion rate, until the completion rate
levels off at some value, which will be rate μ.
An even better idea is to put our server into a closed system, with zero think time. This
way the server is guaranteed to always be occupied with work. Now, if we measure the
completion rate at the server (jobs completing per second), then that will give us μ for
the server. E [S] is then the reciprocal of μ.
2.8 Related Readings
Especially helpful in understanding closed queueing networks are Lazowska (pp. 58–
59) [117] and Menascé (pp. 84–87) [125]. Both of these are wonderful books.
There is surprisingly very little known in the literature on how closed systems compare
to open systems. For example, consider a closed interactive single-server system with
load ρ, versus the corresponding open system with load ρ. How do these compare to
each other with respect to their mean response time? How does variability in service
time affect closed systems versus open ones? These questions and many others are
considered in [186] and [24], as well as in Exercises 7.2, 7.5, 13.7, and 13.8. Another
question is how the scheduling policy (service order) at the server affects closed systems
versus open systems. This question was not really discussed until 2006 [165]. For a
BU_FCAI_SCC430_Modeling&Simulation_Ch03.pdf
Report_Internships
Unit 4 queuing models
Handbook of healthcare operations management
queuing-theory.2858801.powerpoint.pptx
Simulation, Modeling, it’s application, advantage & disadvantage
CMGT 580Introduction to Systems Engineering ManagementClass .docx
Linux capacity planning
Ad

Recently uploaded (20)

PPTX
Pharma ospi slides which help in ospi learning
PDF
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
PDF
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
O7-L3 Supply Chain Operations - ICLT Program
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
PPTX
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Weekly quiz Compilation Jan -July 25.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PPTX
GDM (1) (1).pptx small presentation for students
Pharma ospi slides which help in ospi learning
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
GENETICS IN BIOLOGY IN SECONDARY LEVEL FORM 3
Anesthesia in Laparoscopic Surgery in India
Tissue processing ( HISTOPATHOLOGICAL TECHNIQUE
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Final Presentation General Medicine 03-08-2024.pptx
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Final Presentation General Medicine 03-08-2024.pptx
human mycosis Human fungal infections are called human mycosis..pptx
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
O7-L3 Supply Chain Operations - ICLT Program
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
grade 11-chemistry_fetena_net_5883.pdf teacher guide for all student
Introduction-to-Literarature-and-Literary-Studies-week-Prelim-coverage.pptx
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
2.FourierTransform-ShortQuestionswithAnswers.pdf
Weekly quiz Compilation Jan -July 25.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
GDM (1) (1).pptx small presentation for students
Ad

Performance Modeling And Design Of Computer Systems Queueing Theory In Action Mor Harcholbalter

  • 8. Performance Modeling and Design of Computer Systems
Queueing Theory in Action
Mor Harchol-Balter
Carnegie Mellon University, Pennsylvania
  • 9. Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City
Cambridge University Press, 32 Avenue of the Americas, New York, NY 10013-2473, USA
www.cambridge.org
Information on this title: www.cambridge.org/9781107027503
© Mor Harchol-Balter 2013
This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published 2013
Printed in the United States of America
A catalog record for this publication is available from the British Library.
Library of Congress Cataloging in Publication Data
Harchol-Balter, Mor, 1966–
Performance modeling and design of computer systems : queueing theory in action / Mor Harchol-Balter.
pages cm
Includes bibliographical references and index.
ISBN 978-1-107-02750-3
1. Transaction systems (Computer systems) – Mathematical models. 2. Computer systems – Design and construction – Mathematics. 3. Queueing theory. 4. Queueing networks (Data transmission) I. Title.
QA76.545.H37 2013
519.8'2–dc23 2012019844
ISBN 978-1-107-02750-3 Hardback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party Internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
  • 10. To my loving husband Andrew, my awesome son Danny, and my parents, Irit and Micha
  • 11. I have always been interested in finding better designs for computer systems, designs that improve performance without the purchase of additional resources. When I look back at the problems that I have solved and I look ahead to the problems I hope to solve, I realize that the problem formulations keep getting simpler and simpler, and my footing less secure. Every wisdom that I once believed, I have now come to question:

If a scheduling policy helps one set of jobs, does it necessarily hurt some other jobs, or are these “conservation laws” being misinterpreted? Do greedy routing strategies make sense in server farms, or is what is good for the individual actually disastrous for the system as a whole? When comparing a single fast machine with n slow machines, each of 1/nth the speed, the single fast machine is typically much more expensive – but does that mean that it is necessarily better? Should distributed systems really aim to balance load, or is this a convenient myth? Cycle stealing, where machines can help each other when they are idle, sounds like a great idea, but can we quantify the actual benefit? How much is the performance of scheduling policies affected by variability in the arrival rate and service rate and by fluctuations in the load, and what can we do to combat variability?

Inherent in these questions is the impact of real user behaviors and real-world workloads with heavy-tailed, highly variable service demands, as well as correlated arrival processes. Also intertwined in my work are the tensions between theoretical analysis and the realities of implementation, each motivating the other. In my search to discover new research techniques that allow me to answer these and other questions, I find that I am converging toward the fundamental core that defines all these problems, and that makes the counterintuitive more believable.
  • 12. Contents
Preface xvii
Acknowledgments xxiii
I Introduction to Queueing
1 Motivating Examples of the Power of Analytical Modeling 3
  1.1 What Is Queueing Theory? 3
  1.2 Examples of the Power of Queueing Theory 5
2 Queueing Theory Terminology 13
  2.1 Where We Are Heading 13
  2.2 The Single-Server Network 13
  2.3 Classification of Queueing Networks 16
  2.4 Open Networks 16
  2.5 More Metrics: Throughput and Utilization 17
  2.6 Closed Networks 20
    2.6.1 Interactive (Terminal-Driven) Systems 21
    2.6.2 Batch Systems 22
    2.6.3 Throughput in a Closed System 23
  2.7 Differences between Closed and Open Networks 24
    2.7.1 A Question on Modeling 25
  2.8 Related Readings 25
  2.9 Exercises 26
II Necessary Probability Background
3 Probability Review 31
  3.1 Sample Space and Events 31
  3.2 Probability Defined on Events 32
  3.3 Conditional Probabilities on Events 33
  3.4 Independent Events and Conditionally Independent Events 34
  3.5 Law of Total Probability 35
  3.6 Bayes Law 36
  3.7 Discrete versus Continuous Random Variables 37
  3.8 Probabilities and Densities 38
    3.8.1 Discrete: Probability Mass Function 38
    3.8.2 Continuous: Probability Density Function 41
  3.9 Expectation and Variance 44
  3.10 Joint Probabilities and Independence 47
  • 13.
  3.11 Conditional Probabilities and Expectations 49
  3.12 Probabilities and Expectations via Conditioning 53
  3.13 Linearity of Expectation 54
  3.14 Normal Distribution 57
    3.14.1 Linear Transformation Property 58
    3.14.2 Central Limit Theorem 61
  3.15 Sum of a Random Number of Random Variables 62
  3.16 Exercises 64
4 Generating Random Variables for Simulation 70
  4.1 Inverse-Transform Method 70
    4.1.1 The Continuous Case 70
    4.1.2 The Discrete Case 72
  4.2 Accept-Reject Method 72
    4.2.1 Discrete Case 73
    4.2.2 Continuous Case 75
    4.2.3 Some Harder Problems 77
  4.3 Readings 78
  4.4 Exercises 78
5 Sample Paths, Convergence, and Averages 79
  5.1 Convergence 79
  5.2 Strong and Weak Laws of Large Numbers 83
  5.3 Time Average versus Ensemble Average 84
    5.3.1 Motivation 85
    5.3.2 Definition 86
    5.3.3 Interpretation 86
    5.3.4 Equivalence 88
    5.3.5 Simulation 90
    5.3.6 Average Time in System 90
  5.4 Related Readings 91
  5.5 Exercise 91
III The Predictive Power of Simple Operational Laws: “What-If” Questions and Answers
6 Little’s Law and Other Operational Laws 95
  6.1 Little’s Law for Open Systems 95
  6.2 Intuitions 96
  6.3 Little’s Law for Closed Systems 96
  6.4 Proof of Little’s Law for Open Systems 97
    6.4.1 Statement via Time Averages 97
    6.4.2 Proof 98
    6.4.3 Corollaries 100
  6.5 Proof of Little’s Law for Closed Systems 101
    6.5.1 Statement via Time Averages 101
    6.5.2 Proof 102
  6.6 Generalized Little’s Law 102
  • 14.
  6.7 Examples Applying Little’s Law 103
  6.8 More Operational Laws: The Forced Flow Law 106
  6.9 Combining Operational Laws 107
  6.10 Device Demands 110
  6.11 Readings and Further Topics Related to Little’s Law 111
  6.12 Exercises 111
7 Modification Analysis: “What-If” for Closed Systems 114
  7.1 Review 114
  7.2 Asymptotic Bounds for Closed Systems 115
  7.3 Modification Analysis for Closed Systems 118
  7.4 More Modification Analysis Examples 119
  7.5 Comparison of Closed and Open Networks 122
  7.6 Readings 122
  7.7 Exercises 122
IV From Markov Chains to Simple Queues
8 Discrete-Time Markov Chains 129
  8.1 Discrete-Time versus Continuous-Time Markov Chains 130
  8.2 Definition of a DTMC 130
  8.3 Examples of Finite-State DTMCs 131
    8.3.1 Repair Facility Problem 131
    8.3.2 Umbrella Problem 132
    8.3.3 Program Analysis Problem 132
  8.4 Powers of P: n-Step Transition Probabilities 133
  8.5 Stationary Equations 135
  8.6 The Stationary Distribution Equals the Limiting Distribution 136
  8.7 Examples of Solving Stationary Equations 138
    8.7.1 Repair Facility Problem with Cost 138
    8.7.2 Umbrella Problem 139
  8.8 Infinite-State DTMCs 139
  8.9 Infinite-State Stationarity Result 140
  8.10 Solving Stationary Equations in Infinite-State DTMCs 142
  8.11 Exercises 145
9 Ergodicity Theory 148
  9.1 Ergodicity Questions 148
  9.2 Finite-State DTMCs 149
    9.2.1 Existence of the Limiting Distribution 149
    9.2.2 Mean Time between Visits to a State 153
    9.2.3 Time Averages 155
  9.3 Infinite-State Markov Chains 155
    9.3.1 Recurrent versus Transient 156
    9.3.2 Infinite Random Walk Example 160
    9.3.3 Positive Recurrent versus Null Recurrent 162
  9.4 Ergodic Theorem of Markov Chains 164
  • 15.
  9.5 Time Averages 166
  9.6 Limiting Probabilities Interpreted as Rates 168
  9.7 Time-Reversibility Theorem 170
  9.8 When Chains Are Periodic or Not Irreducible 171
    9.8.1 Periodic Chains 171
    9.8.2 Chains that Are Not Irreducible 177
  9.9 Conclusion 177
  9.10 Proof of Ergodic Theorem of Markov Chains∗ 178
  9.11 Exercises 183
10 Real-World Examples: Google, Aloha, and Harder Chains∗ 190
  10.1 Google’s PageRank Algorithm 190
    10.1.1 Google’s DTMC Algorithm 190
    10.1.2 Problems with Real Web Graphs 192
    10.1.3 Google’s Solution to Dead Ends and Spider Traps 194
    10.1.4 Evaluation of the PageRank Algorithm 195
    10.1.5 Practical Implementation Considerations 195
  10.2 Aloha Protocol Analysis 195
    10.2.1 The Slotted Aloha Protocol 196
    10.2.2 The Aloha Markov Chain 196
    10.2.3 Properties of the Aloha Markov Chain 198
    10.2.4 Improving the Aloha Protocol 199
  10.3 Generating Functions for Harder Markov Chains 200
    10.3.1 The z-Transform 201
    10.3.2 Solving the Chain 201
  10.4 Readings and Summary 203
  10.5 Exercises 204
11 Exponential Distribution and the Poisson Process 206
  11.1 Definition of the Exponential Distribution 206
  11.2 Memoryless Property of the Exponential 207
  11.3 Relating Exponential to Geometric via δ-Steps 209
  11.4 More Properties of the Exponential 211
  11.5 The Celebrated Poisson Process 213
  11.6 Merging Independent Poisson Processes 218
  11.7 Poisson Splitting 218
  11.8 Uniformity 221
  11.9 Exercises 222
12 Transition to Continuous-Time Markov Chains 225
  12.1 Defining CTMCs 225
  12.2 Solving CTMCs 229
  12.3 Generalization and Interpretation 232
    12.3.1 Interpreting the Balance Equations for the CTMC 234
    12.3.2 Summary Theorem for CTMCs 234
  12.4 Exercises 234
  • 16.
13 M/M/1 and PASTA 236
  13.1 The M/M/1 Queue 236
  13.2 Examples Using an M/M/1 Queue 239
  13.3 PASTA 242
  13.4 Further Reading 245
  13.5 Exercises 245
V Server Farms and Networks: Multi-server, Multi-queue Systems
14 Server Farms: M/M/k and M/M/k/k 253
  14.1 Time-Reversibility for CTMCs 253
  14.2 M/M/k/k Loss System 255
  14.3 M/M/k 258
  14.4 Comparison of Three Server Organizations 263
  14.5 Readings 264
  14.6 Exercises 264
15 Capacity Provisioning for Server Farms 269
  15.1 What Does Load Really Mean in an M/M/k? 269
  15.2 The M/M/∞ 271
    15.2.1 Analysis of the M/M/∞ 271
    15.2.2 A First Cut at a Capacity Provisioning Rule for the M/M/k 272
  15.3 Square-Root Staffing 274
  15.4 Readings 276
  15.5 Exercises 276
16 Time-Reversibility and Burke’s Theorem 282
  16.1 More Examples of Finite-State CTMCs 282
    16.1.1 Networks with Finite Buffer Space 282
    16.1.2 Batch System with M/M/2 I/O 284
  16.2 The Reverse Chain 285
  16.3 Burke’s Theorem 288
  16.4 An Alternative (Partial) Proof of Burke’s Theorem 290
  16.5 Application: Tandem Servers 291
  16.6 General Acyclic Networks with Probabilistic Routing 293
  16.7 Readings 294
  16.8 Exercises 294
17 Networks of Queues and Jackson Product Form 297
  17.1 Jackson Network Definition 297
  17.2 The Arrival Process into Each Server 298
  17.3 Solving the Jackson Network 300
  17.4 The Local Balance Approach 301
  17.5 Readings 306
  17.6 Exercises 306
18 Classed Network of Queues 311
  18.1 Overview 311
  18.2 Motivation for Classed Networks 311
  • 17.
  18.3 Notation and Modeling for Classed Jackson Networks 314
  18.4 A Single-Server Classed Network 315
  18.5 Product Form Theorems 317
  18.6 Examples Using Classed Networks 322
    18.6.1 Connection-Oriented ATM Network Example 322
    18.6.2 Distribution of Job Classes Example 325
    18.6.3 CPU-Bound and I/O-Bound Jobs Example 326
  18.7 Readings 329
  18.8 Exercises 329
19 Closed Networks of Queues 331
  19.1 Motivation 331
  19.2 Product-Form Solution 333
    19.2.1 Local Balance Equations for Closed Networks 333
    19.2.2 Example of Deriving Limiting Probabilities 335
  19.3 Mean Value Analysis (MVA) 337
    19.3.1 The Arrival Theorem 338
    19.3.2 Iterative Derivation of Mean Response Time 340
    19.3.3 An MVA Example 341
  19.4 Readings 343
  19.5 Exercises 343
VI Real-World Workloads: High Variability and Heavy Tails
20 Tales of Tails: A Case Study of Real-World Workloads 349
  20.1 Grad School Tales . . . Process Migration 349
  20.2 UNIX Process Lifetime Measurements 350
  20.3 Properties of the Pareto Distribution 352
  20.4 The Bounded Pareto Distribution 353
  20.5 Heavy Tails 354
  20.6 The Benefits of Active Process Migration 354
  20.7 Pareto Distributions Are Everywhere 355
  20.8 Exercises 357
21 Phase-Type Distributions and Matrix-Analytic Methods 359
  21.1 Representing General Distributions by Exponentials 359
  21.2 Markov Chain Modeling of PH Workloads 364
  21.3 The Matrix-Analytic Method 366
  21.4 Analysis of Time-Varying Load 367
    21.4.1 High-Level Ideas 367
    21.4.2 The Generator Matrix, Q 368
    21.4.3 Solving for R 370
    21.4.4 Finding π0 371
    21.4.5 Performance Metrics 372
  21.5 More Complex Chains 372
  21.6 Readings and Further Remarks 376
  21.7 Exercises 376
  • 18.
22 Networks with Time-Sharing (PS) Servers (BCMP) 380
  22.1 Review of Product-Form Networks 380
  22.2 BCMP Result 380
    22.2.1 Networks with FCFS Servers 381
    22.2.2 Networks with PS Servers 382
  22.3 M/M/1/PS 384
  22.4 M/Cox/1/PS 385
  22.5 Tandem Network of M/G/1/PS Servers 391
  22.6 Network of PS Servers with Probabilistic Routing 393
  22.7 Readings 394
  22.8 Exercises 394
23 The M/G/1 Queue and the Inspection Paradox 395
  23.1 The Inspection Paradox 395
  23.2 The M/G/1 Queue and Its Analysis 396
  23.3 Renewal-Reward Theory 399
  23.4 Applying Renewal-Reward to Get Expected Excess 400
  23.5 Back to the Inspection Paradox 402
  23.6 Back to the M/G/1 Queue 403
  23.7 Exercises 405
24 Task Assignment Policies for Server Farms 408
  24.1 Task Assignment for FCFS Server Farms 410
  24.2 Task Assignment for PS Server Farms 419
  24.3 Optimal Server Farm Design 424
  24.4 Readings and Further Follow-Up 428
  24.5 Exercises 430
25 Transform Analysis 433
  25.1 Definitions of Transforms and Some Examples 433
  25.2 Getting Moments from Transforms: Peeling the Onion 436
  25.3 Linearity of Transforms 439
  25.4 Conditioning 441
  25.5 Distribution of Response Time in an M/M/1 443
  25.6 Combining Laplace and z-Transforms 444
  25.7 More Results on Transforms 445
  25.8 Readings 446
  25.9 Exercises 446
26 M/G/1 Transform Analysis 450
  26.1 The z-Transform of the Number in System 450
  26.2 The Laplace Transform of Time in System 454
  26.3 Readings 456
  26.4 Exercises 456
27 Power Optimization Application 457
  27.1 The Power Optimization Problem 457
  27.2 Busy Period Analysis of M/G/1 459
  27.3 M/G/1 with Setup Cost 462
  • 19.
  27.4 Comparing ON/IDLE versus ON/OFF 465
  27.5 Readings 467
  27.6 Exercises 467
VII Smart Scheduling in the M/G/1
28 Performance Metrics 473
  28.1 Traditional Metrics 473
  28.2 Commonly Used Metrics for Single Queues 474
  28.3 Today’s Trendy Metrics 474
  28.4 Starvation/Fairness Metrics 475
  28.5 Deriving Performance Metrics 476
  28.6 Readings 477
29 Scheduling: Non-Preemptive, Non-Size-Based Policies 478
  29.1 FCFS, LCFS, and RANDOM 478
  29.2 Readings 481
  29.3 Exercises 481
30 Scheduling: Preemptive, Non-Size-Based Policies 482
  30.1 Processor-Sharing (PS) 482
    30.1.1 Motivation behind PS 482
    30.1.2 Ages of Jobs in the M/G/1/PS System 483
    30.1.3 Response Time as a Function of Job Size 484
    30.1.4 Intuition for PS Results 487
    30.1.5 Implications of PS Results for Understanding FCFS 487
  30.2 Preemptive-LCFS 488
  30.3 FB Scheduling 490
  30.4 Readings 495
  30.5 Exercises 496
31 Scheduling: Non-Preemptive, Size-Based Policies 499
  31.1 Priority Queueing 499
  31.2 Non-Preemptive Priority 501
  31.3 Shortest-Job-First (SJF) 504
  31.4 The Problem with Non-Preemptive Policies 506
  31.5 Exercises 507
32 Scheduling: Preemptive, Size-Based Policies 508
  32.1 Motivation 508
  32.2 Preemptive Priority Queueing 508
  32.3 Preemptive-Shortest-Job-First (PSJF) 512
  32.4 Transform Analysis of PSJF 514
  32.5 Exercises 516
33 Scheduling: SRPT and Fairness 518
  33.1 Shortest-Remaining-Processing-Time (SRPT) 518
  33.2 Precise Derivation of SRPT Waiting Time∗ 521
  • 20.
  33.3 Comparisons with Other Policies 523
    33.3.1 Comparison with PSJF 523
    33.3.2 SRPT versus FB 523
    33.3.3 Comparison of All Scheduling Policies 524
  33.4 Fairness of SRPT 525
  33.5 Readings 529
Bibliography 531
Index 541
  • 22. Preface

The ad hoc World of Computer System Design

The design of computer systems is often viewed very much as an art rather than a science. Decisions about which scheduling policy to use, how many servers to run, what speed to operate each server at, and the like are often based on intuitions rather than mathematically derived formulas. Specific policies built into kernels are often riddled with secret “voodoo constants,”¹ which have no explanation but seem to “work well” under some benchmarked workloads. Computer systems students are often told to first build the system and then make changes to the policies to improve system performance, rather than first creating a formal model and design of the system on paper to ensure the system meets performance goals.

Even when trying to evaluate the performance of an existing computer system, students are encouraged to simulate the system and spend many days running their simulation under different workloads waiting to see what happens. Given that the search space of possible workloads and input parameters is often huge, vast numbers of simulations are needed to properly cover the space. Despite this fact, mathematical models of the system are rarely created, and we rarely characterize workloads stochastically. There is no formal analysis of the parameter space under which the computer system is likely to perform well versus that under which it is likely to perform poorly. It is no wonder that computer systems students are left feeling that the whole process of system evaluation and design is very ad hoc. As an example, consider the trial-and-error approach to updating resource scheduling in the many versions of the Linux kernel.

Analytical Modeling for Computer Systems

But it does not have to be this way! These same systems designers could mathematically model the system, stochastically characterize the workloads and performance goals, and then analytically derive the performance of the system as a function of workload and input parameters. The fields of analytical modeling and stochastic processes have existed for close to a century, and they can be used to save systems designers huge numbers of hours in trial and error while improving performance. Analytical modeling can also be used in conjunction with simulation to help guide the simulation, reducing the number of cases that need to be explored.

¹ The term “voodoo constants” was coined by Prof. John Ousterhout during his lectures at the University of California, Berkeley.
  • 23. Unfortunately, of the hundreds of books written on stochastic processes, almost none deal with computer systems. The examples in those books and the material covered are oriented toward operations research areas such as manufacturing systems, or human operators answering calls in a call center, or some assembly-line system with different priority jobs. In many ways the analysis used in designing manufacturing systems is not all that different from computer systems. There are many parallels between a human operator and a computer server: There are faster human operators and slower ones (just as computer servers); the human servers sometimes get sick (just as computer servers sometimes break down); when not needed, human operators can be sent home to save money (just as computer servers can be turned off to save power); there is a startup overhead to bringing back a human operator (just as there is a warmup cost to turning on a computer server); and the list goes on.

However, there are also many differences between manufacturing systems and computer systems. To start, computer systems workloads have been shown to have extremely high variability in job sizes (service requirements), with squared coefficients of variation upward of 100. This is very different from the low-variability service times characteristic of job sizes in manufacturing workloads. This difference in variability can result in performance differences of orders of magnitude. Second, computer workloads are typically preemptible, and time-sharing (Processor-Sharing) of the CPU is extremely common. By contrast, most manufacturing workloads are non-preemptive (first-come-first-serve service order is the most common). Thus most books on stochastic processes and queueing omit chapters on Processor-Sharing or more advanced preemptive policies like Shortest-Remaining-Processing-Time, which are very much at the heart of computer systems. Processor-Sharing is particularly relevant when analyzing server farms, which, in the case of computer systems, are typically composed of Processor-Sharing servers, not First-Come-First-Served ones. It is also relevant in any computing application involving bandwidth being shared between users, which typically happens in a processor-sharing style, not first-come-first-serve order.

Performance metrics may also be different for computer systems as compared with manufacturing systems (e.g., power usage, an important metric for computer systems, is not mentioned in stochastic processes books). Closed-loop architectures, in which new jobs are not created until existing jobs complete, and where the performance goal is to maximize throughput, are very common in computer systems, but are often left out of queueing books. Finally, the particular types of interactions that occur in disks, networking protocols, databases, memory controllers, and other computer systems are very different from what has been analyzed in traditional queueing books.

The Goal of This Book

Many times I have walked into a fellow computer scientist’s office and was pleased to find a queueing book on his shelf. Unfortunately, when questioned, my colleague was quick to answer that he never uses the book because “The world doesn’t look like an M/M/1 queue, and I can’t understand anything past that chapter.” The problem is that
  • 24. the queueing theory books are not “friendly” to computer scientists. The applications are not computer-oriented, and the assumptions used are often unrealistic for computer systems. Furthermore, these books are abstruse and often impenetrable by anyone who has not studied graduate-level mathematics. In some sense this is hard to avoid: If one wants to do more than provide readers with formulas to “plug into,” then one has to teach them to derive their own formulas, and this requires learning a good deal of math. Fortunately, as one of my favorite authors, Sheldon Ross, has shown, it is possible to teach a lot of stochastic analysis in a fun and simple way that does not require first taking classes in measure theory and real analysis.

My motive in writing this book is to improve the design of computer systems by introducing computer scientists to the powerful world of queueing-theoretic modeling and analysis. Personally, I have found queueing-theoretic analysis to be extremely valuable in much of my research including: designing routing protocols for networks, designing better scheduling algorithms for web servers and database management systems, disk scheduling, memory-bank allocation, supercomputing resource scheduling, and power management and capacity provisioning in data centers.

Content-wise, I have two goals for the book. First, I want to provide enough applications from computer systems to make the book relevant and interesting to computer scientists. Toward this end, almost half the chapters of the book are “application” chapters. Second, I want to make the book mathematically rich enough to give readers the ability to actually develop new queueing analysis, not just apply existing analysis. As computer systems and their workloads continue to evolve and become more complex, it is unrealistic to assume that they can be modeled with known queueing frameworks and analyses. As a designer of computer systems myself, I am constantly finding that I have to invent new queueing concepts to model aspects of computer systems.

How This Book Came to Be

In 1998, as a postdoc at MIT, I developed and taught a new computer science class, which I called “Performance Analysis and Design of Computer Systems.” The class had the following description:

In designing computer systems one is usually constrained by certain performance goals (e.g., low response time or high throughput or low energy). On the other hand, one often has many choices: One fast disk, or two slow ones? What speed CPU will suffice? Should we invest our money in more buffer space or a faster processor? How should jobs be scheduled by the processor? Does it pay to migrate active jobs? Which routing policy will work best? Should one balance load among servers? How can we best combat high-variability workloads? Often answers to these questions are counterintuitive. Ideally, one would like to have answers to these questions before investing the time and money to build a system. This class will introduce students to analytic stochastic modeling, which allows system designers to answer questions such as those above.

Since then, I have further developed the class via 10 more iterations taught within the School of Computer Science at Carnegie Mellon, where I taught versions of the
  • 25. class to both PhD students and advanced undergraduates in the areas of computer science, engineering, mathematics, and operations research. In 2002, the Operations Management department within the Tepper School of Business at Carnegie Mellon made the class a qualifier requirement for all operations management students. As other faculty, including my own former PhD students, adopted my lecture notes in teaching their own classes, I was frequently asked to turn the notes into a book. This is “version 1” of that book.

Outline of the Book

This book is written in a question/answer style, which mimics the Socratic style that I use in teaching. I believe that a class “lecture” should ideally be a long sequence of bite-sized questions, which students can easily provide answers to and which lead students to the right intuitions. In reading this book, it is extremely important to try to answer each question without looking at the answer that follows the question. The questions are written to remind the reader to “think” rather than just “read,” and to remind the teacher to ask questions rather than just state facts.

There are exercises at the end of each chapter. The exercises are an integral part of the book and should not be skipped. Many exercises are used to illustrate the application of the theory to problems in computer systems design, typically with the purpose of illuminating a key insight. All exercises are related to the material covered in the chapter, with early exercises being straightforward applications of the material and later exercises exploring extensions of the material involving greater difficulty.

The book is divided into seven parts, which mostly build on each other.

Part I introduces queueing theory and provides motivating examples from computer systems design that can be answered using basic queueing analysis. Basic queueing terminology is introduced including closed and open queueing models and performance metrics.

Part II is a probability refresher. To make this book self-contained, we have included in these chapters all the probability that will be needed throughout the rest of the book. This includes a summary of common discrete and continuous random variables, their moments, and conditional expectations and probabilities. Also included is some material on generating random variables for simulation. Finally we end with a discussion of sample paths, convergence of sequences of random variables, and time averages versus ensemble averages.

Part III is about operational laws, or “back of the envelope” analysis. These are very simple laws that hold for all well-behaved queueing systems. In particular, they do not require that any assumptions be made about the arrival process or workload (like Poisson arrivals or Exponential service times). These laws allow us to quickly reason at a high level (averages only) about system behavior and make design decisions regarding what modifications will have the biggest performance impact. Applications to high-level computer system design are provided throughout.
  • 26. Part IV is about Markov chains and their application toward stochastic analysis of computer systems. Markov chains allow a much more detailed analysis of systems by representing the full space of possible states that the system can be in. Whereas the operational laws in Part III often allow us to answer questions about the overall mean number of jobs in a system, Markov chains allow us to derive the probability of exactly i jobs being queued at server j of a multi-server system. Part IV includes both discrete-time and continuous-time Markov chains. Applications include Google’s PageRank algorithm, the Aloha (Ethernet) networking protocol, and an analysis of dropping probabilities in finite-buffer routers.

Part V develops the Markov chain theory introduced in Part IV to allow the analysis of more complex networks, including server farms. We analyze networks of queues with complex routing rules, where jobs can be associated with a “class” that determines their route through the network (these are known as BCMP networks). Part V also derives theorems on capacity provisioning of server farms, such as the “square-root staffing rule,” which determines the minimum number of servers needed to provide certain delay guarantees.

The fact that Parts IV and V are based on Markov chains necessitates that certain “Markovian” (memoryless) assumptions are made in the analysis. In particular, it is assumed that the service requirements (sizes) of jobs follow an Exponential distribution and that the times between job arrivals are also Exponentially distributed. Many applications are reasonably well modeled via these Exponential assumptions, allowing us to use Markov analysis to get good insights into system performance. However, in some cases, it is important to capture the high-variability job size distributions or correlations present in the empirical workloads.

Part VI introduces techniques that allow us to replace these Exponential distributions with high-variability distributions. Phase-type distributions are introduced, which allow us to model virtually any general distribution by a mixture of Exponentials, leveraging our understanding of Exponential distributions and Markov chains from Parts IV and V. Matrix-analytic techniques are then developed to analyze systems with phase-type workloads in both the arrival process and service process. The M/G/1 queue is introduced, and notions such as the Inspection Paradox are discussed. Real-world workloads are described including heavy-tailed distributions. Transform techniques are also introduced that facilitate working with general distributions. Finally, even the service order at the queues is generalized from simple first-come-first-served service order to time-sharing (Processor-Sharing) service order, which is more common in computer systems. Applications abound: Resource allocation (task assignment) in server farms with high-variability job sizes is studied extensively, both for server farms with non-preemptive workloads and for web server farms with time-sharing servers. Power management policies for single servers and for data centers are also studied.

Part VII, the final part of the book, is devoted to scheduling. Smart scheduling is extremely important in computer systems, because it can dramatically improve system performance without requiring the purchase of any new hardware. Scheduling is at the heart of operating systems, bandwidth allocation in networks, disks, databases, memory hierarchies, and the like.
Much of the research being done in the computer systems
  • 27. area today involves the design and adoption of new scheduling policies. Scheduling can be counterintuitive, however, and the analysis of even basic scheduling policies is far from simple. Scheduling policies are typically evaluated via simulation. In introducing the reader to analytical techniques for evaluating scheduling policies, our hope is that more such policies might be evaluated via analysis.

We expect readers to mostly work through the chapters in order, with the following exceptions: First, any chapter or section marked with a star (*) can be skipped without disturbing the flow. Second, the chapter on transforms, Chapter 25, is purposely moved to the end, so that most of the book does not depend on knowing transform analysis. However, because learning transform analysis takes some time, we recommend that any teacher who plans to cover transforms introduce the topic a little at a time, starting early in the course. To facilitate this, we have included a large number of exercises at the end of Chapter 25 that do not require material in later chapters and can be assigned earlier in the course to give students practice manipulating transforms.

Finally, we urge readers to please check the following websites for new errors/software:
http://www.cs.cmu.edu/∼harchol/PerformanceModeling/errata.html
http://www.cs.cmu.edu/∼harchol/PerformanceModeling/software.html
Please send any additional errors to harchol@cs.cmu.edu.
  • 28. Acknowledgments

Writing a book, I quickly realized, is very different from writing a research paper, even a very long one. Book writing actually bears much more similarity to teaching a class. That is why I would like to start by thanking the three people who most influenced my teaching. Manuel Blum, my PhD advisor, taught me the art of creating a lecture out of a series of bite-sized questions. Dick Karp taught me that you can cover an almost infinite amount of material in just one lecture if you spend enough time in advance simplifying that material into its cleanest form. Sheldon Ross inspired me by the depth of his knowledge in stochastic processes (a knowledge so deep that he never once looked at his notes while teaching) and by the sheer clarity and elegance of both his lectures and his many beautifully written books.

I would also like to thank Carnegie Mellon University, and the School of Computer Science at Carnegie Mellon, which has at its core the theme of interdisciplinary research, particularly the mixing of theoretical and applied research. CMU has been the perfect environment for me to develop the analytical techniques in this book, all in the context of solving hard applied problems in computer systems design. CMU has also provided me with a never-ending stream of gifted students, who have inspired many of the exercises and discussions in this book. Much of this book came from the research of my own PhD students, including Sherwin Doroudi, Anshul Gandhi, Varun Gupta, Yoongu Kim, David McWherter, Takayuki Osogami, Bianca Schroeder, Adam Wierman, and Timothy Zhu. In addition, Mark Crovella, Mike Kozuch, and particularly Alan Scheller-Wolf, all longtime collaborators of mine, have inspired much of my thinking via their uncanny intuitions and insights.

A great many people have proofread parts of this book or tested out the book and provided me with useful feedback. These include Sem Borst, Doug Down, Erhun Ozkan, Katsunobu Sasanuma, Alan Scheller-Wolf, Thrasyvoulos Spyropoulos, Jarod Wang, and Zachary Young. I would also like to thank my editors, Diana Gillooly and Lauren Cowles from Cambridge University Press, who were very quick to answer my endless questions, and who greatly improved the presentation of this book. Finally, I am very grateful to Miso Kim, my illustrator, a PhD student at the Carnegie Mellon School of Design, who spent hundreds of hours designing all the fun figures in the book.

On a more personal note, I would like to thank my mother, Irit Harchol, for making my priorities her priorities, allowing me to maximize my achievements. I did not know what this meant until I had a child of my own. Lastly, I would like to thank my husband, Andrew Young. He won me over by reading all my online lecture notes and doing every homework problem – this was his way of asking me for a first date. His ability to understand it all without attending any lectures made me believe that my lecture notes might actually “work” as a book. His willingness to sit by my side every night for many months gave me the motivation to make it happen.
  • 30. PART I
Introduction to Queueing

Part I serves as an introduction to analytical modeling. We begin in Chapter 1 with a number of paradoxical examples that come up in the design of computer systems, showing off the power of analytical modeling in making design decisions. Chapter 2 introduces the reader to basic queueing theory terminology and notation that is used throughout the rest of the book. Readers are introduced to both open and closed queueing networks and to standard performance metrics, such as response time, throughput, and the number of jobs in the system.
  • 32. CHAPTER 1
Motivating Examples of the Power of Analytical Modeling

1.1 What Is Queueing Theory?

Queueing theory is the theory behind what happens when you have lots of jobs, scarce resources, and subsequently long queues and delays. It is literally the “theory of queues”: what makes queues appear and how to make them go away.

Imagine a computer system, say a web server, where there is only one job. The job arrives, it uses certain resources (some CPU, some I/O), and then it departs. Given the job’s resource requirements, it is very easy to predict exactly when the job will depart. There is no delay because there are no queues. If every job indeed got to run on its own computer, there would be no need for queueing theory. Unfortunately, that is rarely the case.

Figure 1.1. Illustration of a queue, in which customers wait to be served, and a server. The picture shows one customer being served at the server and five others waiting in the queue.

Queueing theory applies anywhere that queues come up (see Figure 1.1). We all have had the experience of waiting in line at the bank, wondering why there are not more tellers, or waiting in line at the supermarket, wondering why the express lane is for 8 items or less rather than 15 items or less, or whether it might be best to actually have two express lanes, one for 8 items or less and the other for 15 items or less. Queues are also at the heart of any computer system. Your CPU uses a time-sharing scheduler to serve a queue of jobs waiting for CPU time. A computer disk serves a queue of jobs waiting to read or write blocks. A router in a network serves a queue of packets waiting to be routed. The router queue is a finite capacity queue, in which packets are dropped when demand exceeds the buffer space. Memory banks serve queues of threads requesting memory blocks. Databases sometimes have lock queues, where transactions wait to acquire the lock on a record. Server farms consist of many servers, each with its own queue of jobs. The list of examples goes on and on.

The goals of a queueing theorist are twofold. The first is predicting the system performance. Typically this means predicting mean delay or delay variability or the probability that delay exceeds some Service Level Agreement (SLA). However, it can also mean predicting the number of jobs that will be queueing or the mean number of servers
  • 33. being utilized (e.g., total power needs), or any other such metric. Although prediction is important, an even more important goal is finding a superior system design to improve performance. Commonly this takes the form of capacity planning, where one determines which additional resources to buy to meet delay goals (e.g., is it better to buy a faster disk or a faster CPU, or to add a second slow disk). Many times, however, without buying any additional resources at all, one can improve performance just by deploying a smarter scheduling policy or different routing policy to reduce delays. Given the importance of smart scheduling in computer systems, all of Part VII of this book is devoted to understanding scheduling policies.

Queueing theory is built on a much broader area of mathematics called stochastic modeling and analysis. Stochastic modeling represents the service demands of jobs and the interarrival times of jobs as random variables. For example, the CPU requirements of UNIX processes might be modeled using a Pareto distribution [84], whereas the arrival process of jobs at a busy web server might be well modeled by a Poisson process with Exponentially distributed interarrival times. Stochastic models can also be used to model dependencies between jobs, as well as anything else that can be represented as a random variable.

Although it is generally possible to come up with a stochastic model that adequately represents the jobs or customers in a system and its service dynamics, these stochastic models are not always analytically tractable with respect to solving for performance. As we discuss in Part IV, Markovian assumptions, such as assuming Exponentially distributed service demands or a Poisson arrival process, greatly simplify the analysis; hence much of the existing queueing literature relies on such Markovian assumptions. In many cases these are a reasonable approximation. For example, the arrival process of book orders on Amazon might be reasonably well approximated by a Poisson process, given that there are many independent users, each independently submitting requests at a low rate (although this all breaks down when a new Harry Potter book comes out). However, in some cases Markovian assumptions are very far from reality; for example, in the case in which service demands of jobs are highly variable or are correlated.

While many queueing texts downplay the Markovian assumptions being made, this book does just the opposite. Much of my own research is devoted to demonstrating the impact of workload assumptions on correctly predicting system performance. I have found many cases where making simplifying assumptions about the workload can lead to very inaccurate performance results and poor system designs. In my own research, I therefore put great emphasis on integrating measured workload distributions into the analysis. Rather than trying to hide the assumptions being made, this book highlights all assumptions about workloads. We will discuss specifically whether the workload models are accurate and how our model assumptions affect performance and design, as well as look for more accurate workload models. In my opinion, a major reason why computer scientists are so slow to adopt queueing theory is that the standard Markovian assumptions often do not fit.
However, there are often ways to work around these assumptions, many of which are shown in this book, such as using phase-type distributions and matrix-analytic methods, introduced in Chapter 21.
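To make the phase-type idea concrete: one of the simplest phase-type distributions is the two-branch hyperexponential, which mixes a “small job” Exponential with a “large job” Exponential and can push the squared coefficient of variation (C²) far above the value of 1 that any single Exponential has. The sketch below samples from such a distribution; the branch probability and rates are illustrative choices of ours, not values from the book.

```python
import random

def sample_h2(p=0.9, mu_small=10.0, mu_large=0.1):
    """Sample from a 2-branch hyperexponential (H2) distribution.

    With probability p the job is small (Exponential with rate mu_small);
    otherwise it is large (Exponential with rate mu_large). Mixing two
    Exponentials this way is the simplest phase-type distribution and
    yields job-size variability far beyond a single Exponential.
    """
    rate = mu_small if random.random() < p else mu_large
    return random.expovariate(rate)

# Estimate the mean and the squared coefficient of variation, C^2.
samples = [sample_h2() for _ in range(200_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"mean = {mean:.3f}, C^2 = {var / mean**2:.1f}")  # C^2 comes out near 16
```

Any single Exponential has C² = 1; with these parameters C² comes out around 16, and it can be pushed arbitrarily high by adjusting them, which is what makes phase-type distributions useful stand-ins for the high-variability workloads of Part VI.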
1.2 Examples of the Power of Queueing Theory

The remainder of this chapter is devoted to showing some concrete examples of the power of queueing theory. Do not expect to understand everything in the examples. The examples are developed in much greater detail later in the book. Terms like “Poisson process” that you may not be familiar with are also explained later in the book. These examples are just here to highlight the types of lessons covered in this book.

As stated earlier, one use of queueing theory is as a predictive tool, allowing one to predict the performance of a given system. For example, one might be analyzing a network, with certain bandwidths, where different classes of packets arrive at certain rates and follow certain routes throughout the network simultaneously. Then queueing theory can be used to compute quantities such as the mean time that packets spend waiting at a particular router i, the distribution on the queue buildup at router i, or the mean overall time to get from router i to router j in the network.

We now turn to the usefulness of queueing theory as a design tool in choosing the best system design to minimize response time. The examples that follow illustrate that system design is often a counterintuitive process.

Design Example 1 – Doubling Arrival Rate

Consider a system consisting of a single CPU that serves a queue of jobs in First-Come-First-Served (FCFS) order, as illustrated in Figure 1.2. The jobs arrive according to some random process with some average arrival rate, say λ = 3 jobs per second. Each job has some CPU service requirement, drawn independently from some distribution of job service requirements (we can assume any distribution on the job service requirements for this example). Let’s say that the average service rate is μ = 5 jobs per second (i.e., each job on average requires 1/5 of a second of service). Note that the system is not in overload (3 < 5). Let E[T] denote the mean response time of this system, where response time is the time from when a job arrives until it completes service, a.k.a. sojourn time.

Figure 1.2. A system with a single CPU (μ = 5) that serves jobs arriving at rate λ = 3 in FCFS order. If λ → 2λ, by how much should μ increase?

Question: Your boss tells you that starting tomorrow the arrival rate will double. You are told to buy a faster CPU to ensure that jobs experience the same mean response time, E[T]. That is, customers should not notice the effect of the increased arrival rate. By how much should you increase the CPU speed? (a) Double the CPU speed; (b) More than double the CPU speed; (c) Less than double the CPU speed.

Answer: (c) Less than double.
Question: Why not (a)?

Answer: It turns out that doubling CPU speed together with doubling the arrival rate will generally result in cutting the mean response time in half! We prove this in Chapter 13. Therefore, the CPU speed does not need to double.

Question: Can you immediately see a rough argument for this result that does not involve any queueing theory formulas? What happens if we double the service rate and double the arrival rate?

Answer: Imagine that there are two types of time: Federation time and Klingon time. Klingon seconds are faster than Federation seconds. In fact, each Klingon second is equivalent to a half-second in Federation time. Now, suppose that in the Federation, there is a CPU serving jobs. Jobs arrive with rate λ jobs per second and are served at some rate μ jobs per second. The Klingons steal the system specs and reengineer the same system in the Klingon world. In the Klingon system, the arrival rate is λ jobs per Klingon second, and the service rate is μ jobs per Klingon second. Note that both systems have the same mean response time, E[T], except that the Klingon system response time is measured in Klingon seconds, while the Federation system response time is measured in Federation seconds. Consider now that Captain Kirk is observing both the Federation system and the Klingon reengineered system. From his perspective, the Klingon system has twice the arrival rate and twice the service rate; however, the mean response time in the Klingon system has been halved (because Klingon seconds are half-seconds in Federation time).

Question: Suppose the CPU employs time-sharing service order (known as Processor-Sharing, or PS for short), instead of FCFS. Does the answer change?

Answer: No. The same basic argument still works.
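The scaling argument is also easy to check numerically. Below is a minimal simulation sketch (my illustration, not from the book) that estimates E[T] for a FCFS queue via the Lindley recursion; for concreteness it assumes Exponential interarrival times and Exponential job sizes, though the doubling argument does not depend on that. The rates match Figure 1.2.

    import random

    def sim_fcfs_mean_T(lam, mu, n_jobs=200_000, seed=1):
        # Estimate E[T] for a single FCFS queue via the Lindley
        # recursion: W_next = max(0, W + S - A), where W is the wait,
        # S the job size, and A the next interarrival time.
        rng = random.Random(seed)
        wait, total_T = 0.0, 0.0
        for _ in range(n_jobs):
            size = rng.expovariate(mu)          # job size, mean 1/mu
            total_T += wait + size              # T = T_Q + S
            wait = max(0.0, wait + size - rng.expovariate(lam))
        return total_T / n_jobs

    print(sim_fcfs_mean_T(lam=3, mu=5))    # about 0.50 sec
    print(sim_fcfs_mean_T(lam=6, mu=10))   # about 0.25 sec: halved!

Doubling both rates cuts the estimated E[T] in half, exactly as the Klingon argument predicts; to hold E[T] fixed at its original value, μ must grow by less than a factor of 2.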
Design Example 2 – Sometimes “Improvements” Do Nothing

Consider the batch system shown in Figure 1.3. There are always N = 6 jobs in this system (this is called the multiprogramming level). As soon as a job completes service, a new job is started (this is called a “closed” system). Each job must go through the “service facility.” At the service facility, with probability 1/2 the job goes to server 1, and with probability 1/2 it goes to server 2. Server 1 services jobs at an average rate of 1 job every 3 seconds. Server 2 also services jobs at an average rate of 1 job every 3 seconds. The distribution on the service times of the jobs is irrelevant for this problem. Response time is defined as usual as the time from when a job first arrives at the service facility (at the fork) until it completes service.

Figure 1.3. A closed batch system: N = 6 jobs circulate; each job goes to Server 1 or Server 2 with probability 1/2; each server has service rate μ = 1/3.

Question: You replace server 1 with a server that is twice as fast (the new server services jobs at an average rate of 2 jobs every 3 seconds). Does this “improvement” affect the average response time in the system? Does it affect the throughput? (Assume that the routing probabilities remain constant at 1/2 and 1/2.)

Answer: Not really. Both the average response time and throughput are hardly affected. This is explained in Chapter 7.

Question: Suppose that the system had a higher multiprogramming level, N. Does the answer change?

Answer: No. The already negligible effect on response time and throughput goes to zero as N increases.

Question: Suppose the system had a lower value of N. Does the answer change?

Answer: Yes. If N is sufficiently low, then the “improvement” helps. Consider, for example, the case N = 1.

Question: Suppose the system is changed into an open system, rather than a closed system, as shown in Figure 1.4, where arrival times are independent of service completions. Now does the “improvement” reduce mean response time?

Answer: Absolutely!

Figure 1.4. An open system: arrivals split with probability 1/2 between Server 1 and Server 2, each with service rate μ = 1/3.

Design Example 3 – One Machine or Many?

You are given a choice between one fast CPU of speed s, or n slow CPUs each of speed s/n (see Figure 1.5). Your goal is to minimize mean response time. To start, assume that jobs are non-preemptible (i.e., each job must be run to completion).

Figure 1.5. Which is better for minimizing mean response time: many slow servers (each with rate μ = 1) or one fast server (with rate μ = 4)?
Question: Which is the better choice: one fast machine or many slow ones? Hint: Suppose that I tell you that the answer is, “It depends on the workload.” What aspects of the workload do you think the answer depends on?

Answer: It turns out that the answer depends on the variability of the job size distribution, as well as on the system load.

Question: Which system do you prefer when job size variability is high?

Answer: When job size variability is high, we prefer many slow servers because we do not want short jobs getting stuck behind long ones.

Question: Which system do you prefer when load is low?

Answer: When load is low, not all servers will be utilized, so it seems better to go with one fast server. These observations are revisited many times throughout the book.

Question: Now suppose we ask the same question, but jobs are preemptible; that is, they can be stopped and restarted where they left off. When do we prefer many slow machines as compared to a single fast machine?

Answer: If your jobs are preemptible, you could always use a single fast machine to simulate the effect of n slow machines. Hence a single fast machine is at least as good.

The question of many slow servers versus a few fast ones has huge applicability in a wide range of areas, because anything can be viewed as a resource, including CPU, power, and bandwidth. For an example involving power management in data centers, consider the problem from [69] where you have a fixed power budget P and a server farm consisting of n servers. You have to decide how much power to allocate to each server, so as to minimize overall mean response time for jobs arriving at the server farm. There is a function that specifies the relationship between the power allocated to a server and the speed (frequency) at which it runs – generally, the more power you allocate to a server, the faster it runs (the higher its frequency), subject to some maximum possible frequency and some minimum power level needed just to turn the server on. To answer the question of how to allocate power, you need to think about whether you prefer many slow servers (allocate just a little power to every server) or a few fast ones (distribute all the power among a small number of servers). In [69], queueing theory is used to optimally answer this question under a wide variety of parameter settings.

As another example, if bandwidth is the resource, we can ask when it pays to partition bandwidth into smaller chunks and when it is better not to. The problem is also interesting when performance is combined with price. For example, it is often cheaper (financially) to purchase many slow servers than a few fast servers. Yet in some cases, many slow servers can consume more total power than a few fast ones. All of these factors can further influence the choice of architecture.
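For the special case of Exponential job sizes, the trade-off can be computed exactly with the M/M/k (Erlang-C) formula, derived later in the book. The sketch below (my illustration, not the book's) compares one server of speed 4 against four servers of speed 1. With Exponential (low-variability) sizes, the single fast server wins at every load, though the gap nearly closes at high load; it is high job size variability, which M/M/k cannot model, that can flip the preference.

    from math import factorial

    def mmk_mean_T(lam, mu, k):
        # Exact E[T] for an M/M/k queue via the Erlang-C formula.
        # lam: total arrival rate; mu: rate of EACH of the k servers.
        # Requires lam < k * mu for stability.
        a = lam / mu                          # offered load (Erlangs)
        rho = a / k                           # per-server utilization
        last = a**k / (factorial(k) * (1 - rho))
        p_wait = last / (sum(a**i / factorial(i) for i in range(k)) + last)
        return p_wait / (k * mu - lam) + 1 / mu

    for desc, lam in [("low load ", 0.5), ("high load", 3.6)]:
        one_fast = mmk_mean_T(lam, mu=4.0, k=1)   # one server, speed 4
        four_slow = mmk_mean_T(lam, mu=1.0, k=4)  # four servers, speed 1
        print(f"{desc}: one fast {one_fast:.2f}s vs four slow {four_slow:.2f}s")

At low load the single fast server is more than three times better (0.29s versus 1.00s); at high load the two designs are nearly tied (2.50s versus 2.97s).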
Design Example 4 – Task Assignment in a Server Farm

Consider a server farm with a central dispatcher and several hosts. Each arriving job is immediately dispatched to one of the hosts for processing. Figure 1.6 illustrates such a system.

Figure 1.6. A distributed server system with a central dispatcher (load balancer) routing arrivals to Host 1, Host 2, and Host 3.

Server farms like this are found everywhere. Web server farms typically deploy a front-end dispatcher like Cisco’s Local Director or IBM’s Network Dispatcher. Supercomputing sites might use LoadLeveler or some other dispatcher to balance load and assign jobs to hosts.

For the moment, let’s assume that all the hosts are identical (homogeneous) and that all jobs use only a single resource. Let’s also assume that once jobs are assigned to a host, they are processed there in FCFS order and are non-preemptible.

There are many possible task assignment policies that can be used for dispatching jobs to hosts. Here are a few:

• Random: Each job flips a fair coin to determine where it is routed.
• Round-Robin: The ith job goes to host i mod n, where n is the number of hosts, and hosts are numbered 0, 1, . . . , n − 1.
• Shortest-Queue: Each job goes to the host with the fewest number of jobs.
• Size-Interval-Task-Assignment (SITA): “Short” jobs go to the first host, “medium” jobs go to the second host, “long” jobs go to the third host, etc., for some definition of “short,” “medium,” and “long.”
• Least-Work-Left (LWL): Each job goes to the host with the least total remaining work, where the “work” at a host is the sum of the sizes of the jobs there.
• Central-Queue: Rather than have a queue at each host, jobs are pooled at one central queue. When a host is done working on a job, it grabs the first job in the central queue to work on.

Question: Which of these task assignment policies yields the lowest mean response time?
Answer: Given the ubiquity of server farms, it is surprising how little is known about this question. If job size variability is low, then the LWL policy is best. If job size variability is high, then it is important to keep short jobs from getting stuck behind long ones, so a SITA-like policy, which affords short jobs isolation from long ones, can be far better. In fact, for a long time it was believed that SITA is always better than LWL when job size variability is high. However, it was recently discovered (see [90]) that SITA can be far worse than LWL even under job size variability tending to infinity. It turns out that other properties of the workload, including load and fractional moments of the job size distribution, matter as well.

Question: For the previous question, how important was it to know the size of jobs? For example, how does LWL, which requires knowing job size, compare with Central-Queue, which does not?

Answer: Actually, most task assignment policies do not require knowing the size of jobs. For example, it can be proven by induction that LWL is equivalent to Central-Queue. Even policies like SITA, which by definition are based on knowing the job size, can be well approximated by other policies that do not require knowing the job size; see [82].

Question: Now consider a different model, in which jobs are preemptible. Specifically, suppose that the servers are Processor-Sharing (PS) servers, which time-share among all the jobs at the server, rather than serving them in FCFS order. Which task assignment policy is preferable now? Is the answer the same as that for FCFS servers?

Answer: The task assignment policies that are best for FCFS servers are often a disaster under PS servers. For PS servers, the Shortest-Queue policy is near optimal ([79]), whereas that policy is pretty bad for FCFS servers if job size variability is high.

There are many open questions with respect to task assignment policies. The case of server farms with PS servers, for example, has received almost no attention, and even the case of FCFS servers is still only partly understood. There are also many other task assignment policies that have not been mentioned. For example, cycle stealing (taking advantage of a free host to process jobs in some other queue) can be combined with many existing task assignment policies to create improved policies. There are also other metrics to consider, like minimizing the variance of response time, rather than mean response time, or maximizing fairness. Finally, task assignment can become even more complex, and more important, when the workload changes over time. Task assignment is analyzed in great detail in Chapter 24, after we have had a chance to study empirical workloads.
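To get a feel for how much the dispatching rule alone matters, here is a small simulation sketch (mine, not the book's) comparing Random against Least-Work-Left for FCFS hosts, assuming Poisson arrivals and Exponential job sizes. It uses the fact that, under FCFS, a job's response time equals the unfinished work it finds at its host plus its own size.

    import random

    def sim_dispatch(policy, lam=2.0, mu=1.0, n_hosts=4,
                     n_jobs=200_000, seed=2):
        # U[h] = unfinished work currently at host h. Under FCFS, an
        # arriving job's response time is U[host] + its own size.
        rng = random.Random(seed)
        U = [0.0] * n_hosts
        total_T = 0.0
        for _ in range(n_jobs):
            size = rng.expovariate(mu)
            if policy == "random":
                h = rng.randrange(n_hosts)
            else:                              # "lwl": Least-Work-Left
                h = min(range(n_hosts), key=lambda i: U[i])
            total_T += U[h] + size
            U[h] += size
            dt = rng.expovariate(lam)          # time until next arrival
            U = [max(0.0, u - dt) for u in U]  # hosts work meanwhile
        return total_T / n_jobs

    for policy in ("random", "lwl"):
        print(policy, round(sim_dispatch(policy), 2))

At these rates (per-host load 1/2), Random yields a mean response time near 2 seconds while LWL yields roughly 1.1 seconds, consistent with LWL being equivalent to Central-Queue. Simulating Shortest-Queue or SITA would additionally require tracking the individual jobs at each host.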
Design Example 5 – Scheduling Policies

Suppose you have a single server. Jobs arrive according to a Poisson process. Assume anything you like about the distribution of job sizes. The following are some possible service orders (scheduling orders) for serving jobs:

• First-Come-First-Served (FCFS): When the server completes a job, it starts working on the job that arrived earliest.
• Non-Preemptive Last-Come-First-Served (LCFS): When the server completes a job, it starts working on the job that arrived last.
• Random: When the server completes a job, it starts working on a random job.

Question: Which of these non-preemptive service orders will result in the lowest mean response time?

Answer: Believe it or not, they all have the same mean response time.

Question: Suppose we change the non-preemptive LCFS policy to a Preemptive-LCFS policy (PLCFS), which works as follows: Whenever a new arrival enters the system, it immediately preempts the job in service. How does the mean response time of this policy compare with the others?

Answer: It depends on the variability of the job size distribution. If the job size distribution is at least moderately variable, then PLCFS will be a huge improvement. If the job size distribution is hardly variable (basically constant), then the PLCFS policy will be up to a factor of 2 worse.

We study many counterintuitive scheduling theory results toward the very end of the book, in Chapters 28 through 33.
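Two classical M/G/1 results, both derived later in the book, make this answer concrete: FCFS satisfies the Pollaczek-Khinchine formula, which grows with E[S^2], whereas PLCFS has E[T] = E[S]/(1 − ρ) regardless of the job size distribution. The numbers below are my own illustration (load ρ = 0.9, mean size 1), not the book's.

    def fcfs_mean_T(lam, ES, ES2):
        # M/G/1 FCFS (Pollaczek-Khinchine):
        #   E[T] = E[S] + lam * E[S^2] / (2*(1 - rho)),  rho = lam*E[S]
        return ES + lam * ES2 / (2 * (1 - lam * ES))

    def plcfs_mean_T(lam, ES):
        # M/G/1 Preemptive-LCFS: E[T] = E[S] / (1 - rho),
        # insensitive to the job size distribution.
        return ES / (1 - lam * ES)

    lam, ES = 0.9, 1.0                    # load rho = 0.9
    for name, C2 in [("constant      (C^2 = 0) ", 0.0),
                     ("exponential   (C^2 = 1) ", 1.0),
                     ("high-variance (C^2 = 19)", 19.0)]:
        ES2 = (C2 + 1) * ES**2            # E[S^2] from variability C^2
        print(name, f"FCFS {fcfs_mean_T(lam, ES, ES2):6.1f}",
                    f"PLCFS {plcfs_mean_T(lam, ES):5.1f}")

For constant sizes FCFS gives 5.5 versus 10.0 for PLCFS (approaching the factor-of-2 penalty as load grows), while at C^2 = 19 FCFS gives 91.0 versus PLCFS's unchanged 10.0, a huge improvement.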
More Design Examples

There are many more questions in computer systems design that lend themselves to a queueing-theoretic solution. One example is the notion of a setup cost. It turns out that it can take both significant time and power to turn on a server that is off. In designing an efficient power management policy, we often want to leave servers off (to save power), but then we have to pay the setup cost to get them back on when jobs arrive. Given performance goals, both with respect to response time and power usage, an important question is whether it pays to turn a server off. If so, one can then ask exactly how many servers should be left on. These questions are discussed in Chapters 15 and 27.

There are also questions involving optimal scheduling when jobs have priorities (e.g., certain users have paid more for their jobs to have priority over other users’ jobs, or some jobs are inherently more vital than others). Again, queueing theory is very useful in designing the right priority scheme to maximize the value of the work completed.

However, queueing theory (and more generally analytical modeling) is not currently all-powerful! There are lots of very simple problems that we can at best only analyze approximately. As an example, consider the simple two-server network shown in Figure 1.7, where job sizes come from a general distribution. No one knows how to derive mean response time for this network. Approximations exist, but they are quite poor, particularly when job size variability gets high [76]. We mention many such open problems in this book, and we encourage readers to attempt to solve these!

Figure 1.7. Example of a difficult problem: The M/G/2 queue consists of a single queue and two servers. When a server completes a job, it starts working on the job at the head of the queue. Job sizes follow a general distribution, G.
CHAPTER 2

Queueing Theory Terminology

2.1 Where We Are Heading

Queueing theory is the study of queueing behavior in networks and systems. Figure 2.1 shows the solution process.

Figure 2.1. Solution process: a real-world system with a question (“Should we buy a faster disk or a faster CPU?”) is modeled as a queueing network; the model is analyzed; and the result is translated back to the real-world system.

In Chapter 1, we looked at examples of the power of queueing theory as a design tool. In this chapter, we start from scratch and define the terminology used in queueing theory.

2.2 The Single-Server Network

A queueing network is made up of servers. The simplest example of a queueing network is the single-server network, as shown in Figure 2.2. The discussion in this section is limited to the single-server network with First-Come-First-Served (FCFS) service order. You can think of the server as being a CPU.

Figure 2.2. Single-server network: jobs arrive at rate λ = 3 to a FCFS queue with service rate μ = 4.
There are several parameters associated with the single-server network:

Service Order: This is the order in which jobs will be served by the server. Unless otherwise stated, assume First-Come-First-Served (FCFS).

Average Arrival Rate: This is the average rate, λ, at which jobs arrive to the server (e.g., λ = 3 jobs/sec).

Mean Interarrival Time: This is the average time between successive job arrivals (e.g., 1/λ = 1/3 sec).

Service Requirement, Size: The “size” of a job is typically denoted by the random variable S. This is the time it would take the job to run on this server if there were no other jobs around (no queueing). In a queueing model, the size (a.k.a. service requirement) is typically associated with the server (e.g., this job will take 5 seconds on this server).

Mean Service Time: This is the expected value of S, namely the average time required to service a job on this CPU, where “service” does not include queueing time. In Figure 2.2, E[S] = 1/4 sec.

Average Service Rate: This is the average rate, μ, at which jobs are served (e.g., μ = 4 jobs/sec = 1/E[S]).

Observe that this way of speaking is different from the way we normally talk about servers in conversation. For example, nowhere have we mentioned the absolute speed of the CPU; rather we have only defined the CPU’s speed in terms of the set of jobs that it is working on. In normal conversation, we might say something like the following:

• The average arrival rate of jobs is 3 jobs per second.
• Jobs have different service requirements, but the average number of cycles required by a job is 5,000 cycles per job.
• The CPU speed is 20,000 cycles per second.

That is, an average of 15,000 cycles of work arrive at the CPU each second, and the CPU can process 20,000 cycles of work a second. In the queueing-theoretic way of talking, we would never mention the word “cycle.” Instead, we would simply say

• The average arrival rate of jobs is 3 jobs per second.
• The average rate at which the CPU can service jobs is 4 jobs per second.

This second way of speaking suppresses some of the detail and thus makes the problem a little easier to think about. You should feel comfortable going back and forth between the two.
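To make the translation between the two ways of speaking explicit, here is the small worked check, using the numbers above:

    \mu = \frac{20{,}000\ \text{cycles/sec}}{5{,}000\ \text{cycles/job}} = 4\ \text{jobs/sec},
    \qquad
    \frac{\lambda}{\mu} = \frac{3}{4} = \frac{15{,}000\ \text{cycles/sec}}{20{,}000\ \text{cycles/sec}}.

Both statements carry the same information: on average, three-quarters of the server's capacity is consumed each second.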
We consider these common performance metrics in the context of a single-server system:

Response Time, Turnaround Time, Time in System, or Sojourn Time (T): We define a job’s response time by T = tdepart − tarrive, where tdepart is the time when the job leaves the system, and tarrive is the time when the job arrived to the system. We are interested in E[T], the mean response time; Var(T), the variance in response time; and the tail behavior of T, P{T > t}.

Waiting Time or Delay (TQ): This is the time that the job spends in the queue, not being served. It is also called the “time in queue” or the “wasted time.” Notice that E[T] = E[TQ] + E[S]. Under FCFS service order, waiting time can be defined as the time from when a job arrives to the system until it first receives service.

Number of Jobs in the System (N): This includes those jobs in the queue, plus the one being served (if any).

Number of Jobs in Queue (NQ): This denotes only the number of jobs waiting (in queue).

There are some immediate observations that we can make about the single-server network. First, observe that as λ, the mean arrival rate, increases, all the performance metrics mentioned earlier increase (get worse). Also, as μ, the mean service rate, increases, all the performance metrics mentioned earlier decrease (improve). We require that λ ≤ μ (in fact, we always assume λ < μ).

Question: If λ > μ, what happens?

Answer: If λ > μ, the queue length goes to infinity over time.

Question: Can you provide the intuition?

Answer: Consider a large time t. Then, if N(t) is the number of jobs in the system at time t, and A(t) (respectively, D(t)) denotes the number of arrivals (respectively, departures) by time t, then we have:

E[N(t)] = E[A(t)] − E[D(t)] ≥ λt − μt = t(λ − μ).

(The inequality comes from the fact that the expected number of departures by time t is actually smaller than μt, because the server is not always busy.) Now observe that if λ > μ, then t(λ − μ) → ∞, as t → ∞.

Throughout the book we assume λ < μ, which is needed for stability (keeping queue sizes from growing unboundedly). Situations where λ ≥ μ are touched on in Chapter 9.

Question: Given the previous stability condition (λ < μ), suppose that the interarrival distribution and the service time distribution are Deterministic (i.e., both are constants). What is TQ? What is T?

Answer: TQ = 0, and T = S. Therefore queueing (waiting) results from variability in service time and/or interarrival time distributions.

Here is an example of how variability leads to queues: Let’s discretize time. Suppose at each time step, an arrival occurs with probability p = 1/6. Suppose at each time step, a departure occurs with probability q = 1/3. Then there is a non-zero probability that the queue will build up (temporarily) if several arrivals occur without a departure.
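This discrete-time example is easy to try directly. The sketch below (mine, not the book's) simulates the slotted queue just described; even though the departure probability q is twice the arrival probability p, randomness alone makes the queue build up from time to time.

    import random

    # Slotted time: each step brings an arrival w.p. p = 1/6 and, if
    # the queue is nonempty, a departure w.p. q = 1/3.
    rng = random.Random(0)
    p, q = 1/6, 1/3
    queue, max_queue, busy_slots = 0, 0, 0
    for _ in range(10_000):
        if rng.random() < p:
            queue += 1
        if queue > 0 and rng.random() < q:
            queue -= 1
        max_queue = max(max_queue, queue)
        busy_slots += (queue > 0)
    print("max queue length seen:", max_queue)   # typically much larger than 1
    print("fraction of slots with a queue:", busy_slots / 10_000)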
2.3 Classification of Queueing Networks

Queueing networks can be classified into two categories: open networks and closed networks. Stochastic processes books (e.g., [149, 150]) usually limit their discussion to open networks. By contrast, the systems performance analysis books (e.g., [117, 125]) almost exclusively discuss closed networks. Open networks are introduced in Section 2.4. Closed networks are introduced in Section 2.6.

2.4 Open Networks

An open queueing network has external arrivals and departures. Four examples of open networks are illustrated in this section.

Example: The Single-Server System

This was shown in Figure 2.2.

Example: Network of Queues with Probabilistic Routing

This is shown in Figure 2.3. Here server i receives external arrivals (“outside arrivals”) with rate ri. Server i also receives internal arrivals from some of the other servers. A packet that finishes service at server i is next routed to server j with probability pij. We can even allow the probabilities to depend on the “class” of the packet, so that not all packets have to follow the same routing scheme.

Figure 2.3. Network of queues with probabilistic routing: three servers with service rates μ1, μ2, μ3, outside arrival rates r1, r2, r3, routing probabilities p12, p13, p23, p31, and exit probabilities p1,out and p2,out.

Application: In modeling packet flows in the Internet, for example, one could make the class of the packet (and hence its route) depend on its source and destination IP addresses. In modeling delays, each wire might be replaced by a server that would be used to model the wire latency. The goal might be to predict mean round-trip times for packets on a particular route, given the presence of the other packets. We solve this problem in Chapter 18.
Example: Network of Queues with Non-Probabilistic Routing

This is shown in Figure 2.4. Here all jobs follow a predetermined route: CPU to disk 1 to disk 2 to disk 1 to disk 2 to disk 1 and out.

Figure 2.4. Network of queues with non-probabilistic routing: arriving jobs (rate λ) visit the CPU and then go twice around Disk 1 and Disk 2 (Disk 1, 2, 1, 2, 1) before leaving.

Example: Finite Buffer

An example of a single-server network with finite buffer is shown in Figure 2.5. Any arrival that finds no room is dropped.

Figure 2.5. Single-server network with finite buffer capacity: arrival rate λ into a CPU with service rate μCPU and space for 9 jobs plus 1 in service.

2.5 More Metrics: Throughput and Utilization

We have already seen four performance metrics: E[N], E[T], E[NQ], and E[TQ]. Although these were applied to a single-server system, they can also be used to describe performance in a multi-server, multi-queue system. For example, E[T] would denote the mean time a job spends in the whole system, including all time spent in various queues and time spent receiving service at various servers, whereas E[TQ] refers to just the mean time the job “wasted” waiting in various queues. If we want to refer to just the ith queue in such a system, we typically write E[Ni] to denote the expected number of jobs both queueing and in service at server i, and E[Ti] to denote the expected time a job spends queueing and in service at server i.

Now we introduce two new performance metrics: throughput and utilization. Throughput is arguably the performance metric most used in conversation. Everyone wants higher throughput! Let’s see why.

Question: How does maximizing throughput relate to minimizing response time? For example, in Figure 2.6, which system has higher throughput?
Figure 2.6. Comparing the throughput of two systems: both have arrival rate λ = 1/6, but one server has service rate μ = 1/3 and the other is faster.

Answer: We will see soon. Let’s start by defining utilization.

Device Utilization (ρi) is the fraction of time device i is busy. Note our current definition of utilization applies only to a single device (server). When the device is implied, we simply write ρ (omitting the subscript). Suppose we watch a device i for a long period of time. Let τ denote the length of the observation period. Let B denote the total time during the observation period that the device is non-idle (busy). Then

ρi = B/τ.

Device Throughput (Xi) is the rate of completions at device i (e.g., jobs/sec). The throughput (X) of the system is the rate of job completions in the system. Let C denote the total number of jobs completed at device i during time τ. Then

Xi = C/τ.

So how does Xi relate to ρi? Well, C/τ = (C/B) · (B/τ).

Question: So what is C/B?

Answer: Well, B/C = E[S]. So C/B = 1/E[S] = μi. So we have

Xi = μi · ρi.

Here is another way to derive this expression by conditioning:

Xi = Mean rate of completion at server i
   = E[Rate of completion at server i | server i is busy] · P{server i is busy}
     + E[Rate of completion at server i | server i is idle] · P{server i is idle}
   = μi · P{server i is busy} + 0
   = μi · ρi
Or, equivalently,

ρi = Xi · E[S].

This latter formulation has a name: the Utilization Law.

Example: Single-Server Network: What Is the Throughput?

In Figure 2.7 we have a single-server system.

Figure 2.7. Single-server model: arrivals at rate λ to a FCFS queue with service rate μ = 1/3.

Question: What is X?

Answer: X = ρ · μ. But what is ρ? In Chapter 6, we will prove that ρ = λ/μ. For now here is a hand-wavy but intuitive way to see this, but not a proof!!

ρ = Fraction of time server is busy
  = (Average service time required by a job) / (Average time between arrivals)
  = (1/μ) / (1/λ)
  = λ/μ.

So, this leaves us with

X = ρ · μ = (λ/μ) · μ = λ.

So the throughput does not depend on the service rate whatsoever! In particular, in the example shown in Figure 2.6, repeated again in Figure 2.8, both systems have the same throughput of 1/6 jobs/sec. In the case of the faster processor, the response time drops and the queue length drops, but X does not change. Therefore lower response time is not related to higher throughput.

Figure 2.8. Same model, but different values of μ. Throughput, X, is the same (1/6 jobs/sec) in both.

Question: Explain why X does not change.

Answer: No matter how high we make μ, the completion rate is still bounded by the arrival rate: “rate in = rate out.” Changing μ affects the maximum possible X, but not the actual X.
Note that because we assume a stable system, then, for large t, the number of arrivals during t is approximately the number of completions during t.

Example: Probabilistic Network of Queues: What is the Throughput?

For Figure 2.3, ri denotes the average outside arrival rate into server i, and μi denotes the average service rate at server i.

Question: What is the system throughput, X, in Figure 2.3?

Answer: X = Σi ri.

Question: What is the throughput at server i, Xi?

Answer: Let λi denote the total arrival rate into server i. Then Xi = λi. But to get λi we need to solve these simultaneous equations (a small numerical example of solving them appears at the end of this section):

λi = ri + Σj λj pji    (2.1)

Question: How are the ri’s constrained in these equations?

Answer: For the network to reach “equilibrium” (flow into server = flow out of server), we must have λi < μi, ∀i, and this constrains the ri’s (see Exercise 2.1).

Example: Network of Queues with Non-Probabilistic Routing: What is the Throughput?

Question: What is X in Figure 2.4?

Answer: X = λ.

Question: What are XDisk1 and XDisk2?

Answer: XDisk1 = 3λ and XDisk2 = 2λ.

Example: Finite Buffer: What is the Throughput?

For Figure 2.5, the outside arrival rate is λ and the service rate is μ.

Question: What is X?

Answer: X = ρμ. But we need stochastic analysis to determine ρ because it is no longer simply λ/μ. Observe that X < λ because some arrivals get dropped.
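Before leaving open networks, here is the promised numerical sketch for the traffic equations (2.1). In matrix form they read λ = r + Pᵀλ, i.e., (I − Pᵀ)λ = r, so any concrete instance can be solved mechanically. The routing probabilities and arrival rates below are made-up illustrative values, not ones specified by Figure 2.3.

    import numpy as np

    # Traffic equations (2.1): lambda_i = r_i + sum_j lambda_j * p_ji,
    # i.e., (I - P^T) lambda = r. Entry P[i][j] is the probability that
    # a job finishing at server i goes next to server j; each row may
    # sum to less than 1, the remainder being the probability of exit.
    P = np.array([[0.0, 0.5, 0.2],
                  [0.0, 0.0, 0.3],
                  [0.4, 0.0, 0.0]])
    r = np.array([1.0, 0.5, 0.5])             # outside arrival rates
    lam = np.linalg.solve(np.eye(3) - P.T, r)
    print("total arrival rate into each server:", lam.round(3))
    print("system throughput X = sum of r_i:", r.sum())
    # For stability we would also need lam[i] < mu[i] at every server.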
2.6 Closed Networks

Closed queueing networks have no external arrivals or departures. They can be classified into two categories as shown in Figure 2.9.

Figure 2.9. Closed network categories: interactive (terminal-driven) systems and batch systems.

2.6.1 Interactive (Terminal-Driven) Systems

An example of an interactive (terminal-driven) system is shown in Figure 2.10. Terminals represent users who each send a job to the “central subsystem” and then wait for a response. The central subsystem is a network of queues. A user cannot submit her next job before her previous job returns. Thus, the number of jobs in the system is fixed (equal to the number of terminals). This number is sometimes called the load or MPL (multiprogramming level), not to be confused with device utilization.

Figure 2.10. Interactive system: N user terminals submit jobs (“in”) to the central subsystem and wait for the results (“out”).

There is a think time, Z, which is a random variable representing the time at each terminal between receiving the result of one job and sending out the next job. Note that the number of jobs in the central subsystem is at most the number of terminals, because some users might be in the “thinking” state.

An example of an interactive system such as the one shown in Figure 2.10 is a data-entry application. N users each sit at terminals filling out the entries on their screens. Several fields of the screen must be filled out, and then the whole screen is submitted to the central subsystem for appropriate processing and database update. A new screen cannot be filled out until the previous update is performed. The “think time,” Z, is the time to key data to the screen. An individual user (terminal) oscillates between the think state and the submitted state as shown in Figure 2.11.

Figure 2.11. The user alternates between thinking and waiting for the submitted job to return.
Question: How would you define the response time for the interactive system?

Answer: Response time is the time it takes a job to go between “in” and “out” in Figures 2.10 and 2.11. We denote the average time to get from “in” to “out” by E[Response Time] or E[R] to differentiate it from E[T], which is defined as

E[T] = E[R] + E[Z].

Important: Although “response time” in open systems is denoted by the random variable (r.v.) T, for closed interactive systems, we refer to T as the system time (or “time in system”) and reserve the r.v. R for response time.

Goal: The goal in an interactive system is to find a way to allow as many users as possible to get onto the system at once, so they can all get their work done, while keeping E[R] low enough. Note that interactive systems are very different from open systems in that a small change in N has a profound effect on the system behavior.

The typical questions asked by systems designers are:

• Given the original system, how high can I make N while keeping E[R] below some threshold? That is, how does E[R] rise with N?
• Assume a fixed multiprogramming level, N. Given that we can make changes to the central subsystem (i.e., make certain devices faster), which changes will improve E[R] the most?

Question: Say we are modeling performance at a website. Would you model the website as a closed interactive system or an open system?

Answer: The jury is still out. There are research papers of both types. On the one hand, once a user clicks on a link (submits a job), he typically waits to receive the result before clicking on another link. Thus users behave as if the website is a closed system. On the other hand, a website may have a huge number of users, each of whom is very transient in his or her use of the website. In this respect, the website might behave more like an open system. Schroeder et al. [165] propose the idea of a “partly-open” system. Here users arrive from outside as in an open system, but make k requests to the system when they arrive, where each request can only be made when the previous request completes (as in a closed system).
2.6.2 Batch Systems

An example of a batch system is shown in Figure 2.12. A batch system looks like an interactive system with a think time of zero. However, the goals are somewhat different for batch systems. In a batch system, typically one is running many jobs overnight. As soon as one job completes, another one is started. So there are always N jobs in the central subsystem. The MPL is usually predetermined and fixed. For example, the MPL might be the number of jobs that fit into memory.

Figure 2.12. Batch system.

Goal: For a batch system, the goal is to obtain high throughput, so that as many jobs as possible are completed overnight. The typical question asked by systems designers is, “How can we improve the central subsystem so as to maximize throughput?” Note that we are typically constrained by some fixed maximum MPL (because only so many jobs fit into memory or for some other reason). Thus the only method we have for increasing throughput is changing the central subsystem, either by changing the routing or by speeding up some device. Observe that in the batch system we are not concerned with response times because the jobs are running overnight.

Question: What does X mean in a closed system?

Answer: X is the number of jobs crossing “out” per second. Note that “in” = “out” for the batch system.

2.6.3 Throughput in a Closed System

Let’s look at some examples.

Example: Single Server

Figure 2.13 shows a closed network consisting of a single server.

Figure 2.13. Single-server closed network with MPL = N and service rate μ.

Question: What is the throughput, X, in Figure 2.13?

Answer: X = μ. Observe that this is very different from the case of the open network where throughput was independent of service rate!
Question: What is the mean response time, E[R], in Figure 2.13?

Answer: For a closed batch system, E[R] = E[T], namely the response time and time in system are the same. For Figure 2.13, E[T] = N/μ, because every “arrival” waits behind N − 1 jobs and then runs. Note that X and E[R] are inversely related!

Example: Tandem Servers

Now consider the example of a more complicated closed network, as shown in Figure 2.14.

Figure 2.14. Tandem servers closed network: MPL = N, with servers of rates μ1 and μ2.

Question: What is the throughput?

Answer: We would like to say X = min(μ1, μ2) . . .

Question: Why is this previous answer not necessarily correct?

Answer: The previous answer is correct if we know that the slower server is always busy, but that is not necessarily the case. Imagine N = 1. Then it is certainly not the case that the slower server is always busy.

Question: OK, but what happens when N = 2? Now it appears that there is always at least one job at the slow server, doesn’t it?

Answer: Nope, the slower server is still not always busy. What we’re missing here is the fact that sometimes the slow server is faster than the fast server – because these service rates are just averages! So do we in fact need to take the job size distribution into account to get the exact answer? Does the job size distribution really affect the answer very much? We will answer these questions soon enough . . .
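Here is a preview of how much N matters, under the additional assumption that both servers are Exponential. In that case the tandem network of Figure 2.14 reduces to a birth-death chain on the number of jobs at server 1, and the throughput can be computed exactly; the short sketch below (mine, not the book's) does so.

    def closed_tandem_X(mu1, mu2, N):
        # Exact throughput of a closed tandem of two Exponential
        # servers with N circulating jobs. Let k = number of jobs at
        # server 1; the stationary probabilities satisfy
        # pi_k proportional to (mu2/mu1)^k, and X = mu1 * P{server 1 busy}.
        ratio = mu2 / mu1
        pi = [ratio**k for k in range(N + 1)]   # unnormalized
        return mu1 * (1 - pi[0] / sum(pi))

    for N in (1, 2, 5, 20):
        print(N, round(closed_tandem_X(mu1=1.0, mu2=1.0, N=N), 3))
    # Prints 0.5, 0.667, 0.833, 0.952: always below min(mu1, mu2) = 1,
    # approaching it only as N grows.

So even in the Exponential case, X stays strictly below min(μ1, μ2) for every finite N.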
For now, let’s sum up the differences between the behavior of open and closed networks and why we need to consider both.

2.7 Differences between Closed and Open Networks

Open Systems
• Throughput, X, is independent of the μi’s.
• X is not affected by doubling the μi’s.
• Throughput and response time are not related.

Closed Systems
• X depends on the μi’s.
• If we double all the μi’s while holding N constant, then X changes.
• In fact we see in Chapter 6 that for closed systems,

Higher throughput ⇐⇒ Lower avg. response time.

2.7.1 A Question on Modeling

Here is a final question: A few years ago I got a call from some folks at IBM. They were trying to model their blade server as a single-server queue. They knew the arrival rate into the server, λ, in jobs/sec. However they were wondering how to get E[S], the mean job size.

Question: How do you obtain E[S] in practice for your single-server system?

Answer: At first glance, you might reason that because E[S] is the mean time required for a job in isolation, you should just send a single job into the system and measure its response time, repeating that experiment a hundred times to get an average. This makes sense in theory, but does not work well in practice, because cache conditions and other factors are very different for the scenario of just a single job compared with the case when the system has been loaded for some time.

A better approach is to recall that E[S] = 1/μ, so it suffices to think about the service rate of the server in jobs/second. To get μ, assuming an open system, we can make λ higher and higher, which will increase the completion rate, until the completion rate levels off at some value, which will be rate μ. An even better idea is to put our server into a closed system, with zero think time. This way the server is guaranteed to always be occupied with work. Now, if we measure the completion rate at the server (jobs completing per second), then that will give us μ for the server. E[S] is then the reciprocal of μ.

2.8 Related Readings

Especially helpful in understanding closed queueing networks are Lazowska (pp. 58–59) [117] and Menascé (pp. 84–87) [125]. Both of these are wonderful books. There is surprisingly very little known in the literature on how closed systems compare to open systems. For example, consider a closed interactive single-server system with load ρ, versus the corresponding open system with load ρ. How do these compare to each other with respect to their mean response time? How does variability in service time affect closed systems versus open ones? These questions and many others are considered in [186] and [24], as well as in Exercises 7.2, 7.5, 13.7, and 13.8. Another question is how the scheduling policy (service order) at the server affects closed systems versus open systems. This question was not really discussed until 2006 [165]. For a
  • 74. credit card donations. To donate, please visit: www.gutenberg.org/donate. Section 5. General Information About Project Gutenberg™ electronic works Professor Michael S. Hart was the originator of the Project Gutenberg™ concept of a library of electronic works that could be freely shared with anyone. For forty years, he produced and distributed Project Gutenberg™ eBooks with only a loose network of volunteer support. Project Gutenberg™ eBooks are often created from several printed editions, all of which are confirmed as not protected by copyright in the U.S. unless a copyright notice is included. Thus, we do not necessarily keep eBooks in compliance with any particular paper edition. Most people start at our website which has the main PG search facility: www.gutenberg.org. This website includes information about Project Gutenberg™, including how to make donations to the Project Gutenberg Literary Archive Foundation, how to help produce our new eBooks, and how to subscribe to our email newsletter to hear about new eBooks.