Artificial Intelligence
A Modern Approach
Third Edition
PRENTICE HALL SERIES
IN ARTIFICIAL INTELLIGENCE
Stuart Russell and Peter Norvig, Editors
FORSYTH & PONCE Computer Vision: A Modern Approach
GRAHAM ANSI Common Lisp
JURAFSKY & MARTIN Speech and Language Processing, 2nd ed.
NEAPOLITAN Learning Bayesian Networks
RUSSELL & NORVIG Artificial Intelligence: A Modern Approach, 3rd ed.
Artificial Intelligence
A Modern Approach
Third Edition
Stuart J. Russell and Peter Norvig
Contributing writers:
Ernest Davis
Douglas D. Edwards
David Forsyth
Nicholas J. Hay
Jitendra M. Malik
Vibhu Mittal
Mehran Sahami
Sebastian Thrun
Boston Columbus Indianapolis New York San Francisco Upper Saddle River
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto
Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
 
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
and Associated Companies throughout the world
Visit us on the World Wide Web at:
www.pearsonglobaleditions.com
© Pearson Education Limited 2016
The rights of Stuart J. Russell and Peter Norvig to be identified as the authors of this work have
been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Authorized adaptation from the United States edition, entitled Artificial Intelligence: A Modern
Approach, Third Edition, ISBN 9780136042594, by Stuart J. Russell and Peter Norvig published
by Pearson Education © 2010.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying, recording or
otherwise, without either the prior written permission of the publisher or a license permitting
restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron
House, 6–10 Kirby Street, London EC1N 8TS.
All trademarks used herein are the property of their respective owners. The use of any trademark
in this text does not vest in the author or publisher any trademark ownership rights in such
trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this
book by such owners.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
10 9 8 7 6 5 4 3 2 1
ISBN 10: 1292153962
ISBN 13: 9781292153964
Printed and bound in Malaysia
	
  
For Loy, Gordon, Lucy, George, and Isaac — S.J.R.
For Kris, Isabella, and Juliet — P.N.
Preface
Artificial Intelligence (AI) is a big field, and this is a big book. We have tried to explore the
full breadth of the field, which encompasses logic, probability, and continuous mathematics;
perception, reasoning, learning, and action; and everything from microelectronic devices to
robotic planetary explorers. The book is also big because we go into some depth.
The subtitle of this book is “A Modern Approach.” The intended meaning of this rather
empty phrase is that we have tried to synthesize what is now known into a common frame-
work, rather than trying to explain each subfield of AI in its own historical context. We
apologize to those whose subfields are, as a result, less recognizable.
New to this edition
This edition captures the changes in AI that have taken place since the last edition in 2003.
There have been important applications of AI technology, such as the widespread deploy-
ment of practical speech recognition, machine translation, autonomous vehicles, and house-
hold robotics. There have been algorithmic landmarks, such as the solution of the game of
checkers. And there has been a great deal of theoretical progress, particularly in areas such
as probabilistic reasoning, machine learning, and computer vision. Most important from our
point of view is the continued evolution in how we think about the field, and thus how we
organize the book. The major changes are as follows:
• We place more emphasis on partially observable and nondeterministic environments,
especially in the nonprobabilistic settings of search and planning. The concepts of
belief state (a set of possible worlds) and state estimation (maintaining the belief state)
are introduced in these settings; later in the book, we add probabilities.
• In addition to discussing the types of environments and types of agents, we now cover
in more depth the types of representations that an agent can use. We distinguish among
atomic representations (in which each state of the world is treated as a black box),
factored representations (in which a state is a set of attribute/value pairs), and structured
representations (in which the world consists of objects and relations between them).
• Our coverage of planning goes into more depth on contingent planning in partially
observable environments and includes a new approach to hierarchical planning.
• We have added new material on first-order probabilistic models, including open-universe
models for cases where there is uncertainty as to what objects exist.
• We have completely rewritten the introductory machine-learning chapter, stressing a
wider variety of more modern learning algorithms and placing them on a firmer theo-
retical footing.
• We have expanded coverage of Web search and information extraction, and of tech-
niques for learning from very large data sets.
• 20% of the citations in this edition are to works published after 2003.
• We estimate that about 20% of the material is brand new. The remaining 80% reflects
older work but has been largely rewritten to present a more unified picture of the field.
Overview of the book
The main unifying theme is the idea of an intelligent agent. We define AI as the study of
agents that receive percepts from the environment and perform actions. Each such agent im-
plements a function that maps percept sequences to actions, and we cover different ways to
represent these functions, such as reactive agents, real-time planners, and decision-theoretic
systems. We explain the role of learning as extending the reach of the designer into unknown
environments, and we show how that role constrains agent design, favoring explicit knowl-
edge representation and reasoning. We treat robotics and vision not as independently defined
problems, but as occurring in the service of achieving goals. We stress the importance of the
task environment in determining the appropriate agent design.
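As a concrete, if toy, illustration of this agent view, the short Python sketch below (ours, not code from the book's repository or the pseudocode of Appendix B) implements an agent program as a closure over its percept history, mapping each new percept, and implicitly the whole sequence so far, to an action. The two-square vacuum world is the example used in Chapter 2; the string encoding of percepts and the action names here are our own assumptions.

from typing import Callable, List

Percept = str   # assumed encoding: "A,Dirty" means (location, status)
Action = str    # "Suck", "Left", or "Right"

def make_reflex_vacuum_agent() -> Callable[[Percept], Action]:
    """Return an agent program: a function from percepts to actions.

    The closed-over list records the percept sequence, so the agent is,
    formally, a function of the whole sequence, even though this simple
    reflex policy only consults the latest percept.
    """
    percept_history: List[Percept] = []

    def agent_program(percept: Percept) -> Action:
        percept_history.append(percept)
        location, status = percept.split(",")
        if status == "Dirty":
            return "Suck"
        return "Right" if location == "A" else "Left"

    return agent_program

agent = make_reflex_vacuum_agent()
for p in ["A,Dirty", "A,Clean", "B,Dirty"]:
    print(p, "->", agent(p))   # Suck, Right, Suck
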
Our primary aim is to convey the ideas that have emerged over the past fifty years of AI
research and the past two millennia of related work. We have tried to avoid excessive formal-
ity in the presentation of these ideas while retaining precision. We have included pseudocode
algorithms to make the key ideas concrete; our pseudocode is described in Appendix B.
This book is primarily intended for use in an undergraduate course or course sequence.
The book has 27 chapters, each requiring about a week’s worth of lectures, so working
through the whole book requires a two-semester sequence. A one-semester course can use
selected chapters to suit the interests of the instructor and students. The book can also be
used in a graduate-level course (perhaps with the addition of some of the primary sources
suggested in the bibliographical notes). Sample syllabi are available at the book’s Web site,
aima.cs.berkeley.edu. The only prerequisite is familiarity with basic concepts of
computer science (algorithms, data structures, complexity) at a sophomore level. Freshman
calculus and linear algebra are useful for some of the topics; the required mathematical back-
ground is supplied in Appendix A.
Exercises are given at the end of each chapter. Exercises requiring significant pro-
gramming are marked with a keyboard icon. These exercises can best be solved by taking
advantage of the code repository at aima.cs.berkeley.edu. Some of them are large
enough to be considered term projects. A number of exercises require some investigation of
the literature; these are marked with a book icon.
Throughout the book, important points are marked with a pointing icon. We have in-
cluded an extensive index of around 6,000 items to make it easy to find things in the book.
Wherever a new term is first defined, it is also marked in the margin.
About the Web site
aima.cs.berkeley.edu, the Web site for the book, contains
• implementations of the algorithms in the book in several programming languages,
• a list of over 1000 schools that have used the book, many with links to online course
materials and syllabi,
• an annotated list of over 800 links to sites around the Web with useful AI content,
• a chapter-by-chapter list of supplementary material and links,
• instructions on how to join a discussion group for the book,
• instructions on how to contact the authors with questions or comments,
• instructions on how to report errors in the book, in the likely event that some exist, and
• slides and other materials for instructors.
Pearson offers many different products around the world to facilitate learning. In countries
outside the United States, some products and services related to this textbook may not be
available due to copyright and/or permissions restrictions. If you have questions, you can
contact your local office by visiting www.pearsonhighered.com/international or you can con-
tact your local Pearson representative.
About the cover
The cover depicts the final position from the decisive game 6 of the 1997 match between
chess champion Garry Kasparov and program DEEP BLUE. Kasparov, playing Black, was
forced to resign, making this the first time a computer had beaten a world champion in a
chess match. Kasparov is shown at the top. To his left is the Asimo humanoid robot and
to his right is Thomas Bayes (1702–1761), whose ideas about probability as a measure of
belief underlie much of modern AI technology. Below that we see a Mars Exploration Rover,
a robot that landed on Mars in 2004 and has been exploring the planet ever since. To the
right is Alan Turing (1912–1954), whose fundamental work defined the fields of computer
science in general and artificial intelligence in particular. At the bottom is Shakey (1966–
1972), the first robot to combine perception, world-modeling, planning, and learning. With
Shakey is project leader Charles Rosen (1917–2002). At the bottom right is Aristotle (384
B.C.–322 B.C.), who pioneered the study of logic; his work was state of the art until the 19th
century (copy of a bust by Lysippos). At the bottom left, lightly screened behind the authors’
names, is a planning algorithm by Aristotle from De Motu Animalium in the original Greek.
Behind the title is a portion of the CPCS Bayesian network for medical diagnosis (Pradhan
et al., 1994). Behind the chess board is part of a Bayesian logic model for detecting nuclear
explosions from seismic signals.
Credits: Stan Honda/Getty (Kasparov), Library of Congress (Bayes), NASA (Mars
rover), National Museum of Rome (Aristotle), Peter Norvig (book), Ian Parker (Berkeley
skyline), Shutterstock (Asimo, Chess pieces), Time Life/Getty (Shakey, Turing).
Acknowledgments
This book would not have been possible without the many contributors whose names did not
make it to the cover. Jitendra Malik and David Forsyth wrote Chapter 24 (computer vision)
and Sebastian Thrun wrote Chapter 25 (robotics). Vibhu Mittal wrote part of Chapter 22
(natural language). Nick Hay, Mehran Sahami, and Ernest Davis wrote some of the exercises.
Zoran Duric (George Mason), Thomas C. Henderson (Utah), Leon Reznik (RIT), Michael
Gourley (Central Oklahoma) and Ernest Davis (NYU) reviewed the manuscript and made
helpful suggestions. We thank Ernie Davis in particular for his tireless ability to read multiple
drafts and help improve the book. Nick Hay whipped the bibliography into shape and on
deadline stayed up to 5:30 AM writing code to make the book better. Jon Barron formatted
and improved the diagrams in this edition, while Tim Huang, Mark Paskin, and Cynthia
Bruyns helped with diagrams and algorithms in previous editions. Ravi Mohan and Ciaran
O’Reilly wrote and maintain the Java code examples on the Web site. John Canny wrote
the robotics chapter for the first edition and Douglas Edwards researched the historical notes.
Tracy Dunkelberger, Allison Michael, Scott Disanno, and Jane Bonnell at Pearson tried their
best to keep us on schedule and made many helpful suggestions. Most helpful of all has
been Julie Sussman, P.P.A., who read every chapter and provided extensive improvements. In
previous editions we had proofreaders who would tell us when we left out a comma and said
which when we meant that; Julie told us when we left out a minus sign and said xi when we
meant xj. For every typo or confusing explanation that remains in the book, rest assured that
Julie has fixed at least five. She persevered even when a power failure forced her to work by
lantern light rather than LCD glow.
Stuart would like to thank his parents for their support and encouragement and his
wife, Loy Sheflott, for her endless patience and boundless wisdom. He hopes that Gordon,
Lucy, George, and Isaac will soon be reading this book after they have forgiven him for
working so long on it. RUGS (Russell’s Unusual Group of Students) have been unusually
helpful, as always.
Peter would like to thank his parents (Torsten and Gerda) for getting him started,
and his wife (Kris), children (Bella and Juliet), colleagues, and friends for encouraging and
tolerating him through the long hours of writing and longer hours of rewriting.
We both thank the librarians at Berkeley, Stanford, and NASA and the developers of
CiteSeer, Wikipedia, and Google, who have revolutionized the way we do research. We can’t
acknowledge all the people who have used the book and made suggestions, but we would like
to note the especially helpful comments of Gagan Aggarwal, Eyal Amir, Ion Androutsopou-
los, Krzysztof Apt, Warren Haley Armstrong, Ellery Aziel, Jeff Van Baalen, Darius Bacon,
Brian Baker, Shumeet Baluja, Don Barker, Tony Barrett, James Newton Bass, Don Beal,
Howard Beck, Wolfgang Bibel, John Binder, Larry Bookman, David R. Boxall, Ronen Braf-
man, John Bresina, Gerhard Brewka, Selmer Bringsjord, Carla Brodley, Chris Brown, Emma
Brunskill, Wilhelm Burger, Lauren Burka, Carlos Bustamante, Joao Cachopo, Murray Camp-
bell, Norman Carver, Emmanuel Castro, Anil Chakravarthy, Dan Chisarick, Berthe Choueiry,
Roberto Cipolla, David Cohen, James Coleman, Julie Ann Comparini, Corinna Cortes, Gary
Cottrell, Ernest Davis, Tom Dean, Rina Dechter, Tom Dietterich, Peter Drake, Chuck Dyer,
Doug Edwards, Robert Egginton, Asma’a El-Budrawy, Barbara Engelhardt, Kutluhan Erol,
Oren Etzioni, Hana Filip, Douglas Fisher, Jeffrey Forbes, Ken Ford, Eric Fosler-Lussier,
John Fosler, Jeremy Frank, Alex Franz, Bob Futrelle, Marek Galecki, Stefan Gerberding,
Stuart Gill, Sabine Glesner, Seth Golub, Gosta Grahne, Russ Greiner, Eric Grimson, Bar-
bara Grosz, Larry Hall, Steve Hanks, Othar Hansson, Ernst Heinz, Jim Hendler, Christoph
Herrmann, Paul Hilfinger, Robert Holte, Vasant Honavar, Tim Huang, Seth Hutchinson, Joost
Jacob, Mark Jelasity, Magnus Johansson, Istvan Jonyer, Dan Jurafsky, Leslie Kaelbling, Keiji
Kanazawa, Surekha Kasibhatla, Simon Kasif, Henry Kautz, Gernot Kerschbaumer, Max
Khesin, Richard Kirby, Dan Klein, Kevin Knight, Roland Koenig, Sven Koenig, Daphne
Koller, Rich Korf, Benjamin Kuipers, James Kurien, John Lafferty, John Laird, Gus Lars-
son, John Lazzaro, Jon LeBlanc, Jason Leatherman, Frank Lee, Jon Lehto, Edward Lim,
Phil Long, Pierre Louveaux, Don Loveland, Sridhar Mahadevan, Tony Mancill, Jim Martin,
Andy Mayer, John McCarthy, David McGrane, Jay Mendelsohn, Risto Miikkulainen, Brian
Milch, Steve Minton, Vibhu Mittal, Mehryar Mohri, Leora Morgenstern, Stephen Muggleton,
Kevin Murphy, Ron Musick, Sung Myaeng, Eric Nadeau, Lee Naish, Pandu Nayak, Bernhard
Nebel, Stuart Nelson, XuanLong Nguyen, Nils Nilsson, Illah Nourbakhsh, Ali Nouri, Arthur
Nunes-Harwitt, Steve Omohundro, David Page, David Palmer, David Parkes, Ron Parr, Mark
Paskin, Tony Passera, Amit Patel, Michael Pazzani, Fernando Pereira, Joseph Perla, Wim Pi-
jls, Ira Pohl, Martha Pollack, David Poole, Bruce Porter, Malcolm Pradhan, Bill Pringle, Lor-
raine Prior, Greg Provan, William Rapaport, Deepak Ravichandran, Ioannis Refanidis, Philip
Resnik, Francesca Rossi, Sam Roweis, Richard Russell, Jonathan Schaeffer, Richard Scherl,
Hinrich Schuetze, Lars Schuster, Bart Selman, Soheil Shams, Stuart Shapiro, Jude Shav-
lik, Yoram Singer, Satinder Singh, Daniel Sleator, David Smith, Bryan So, Robert Sproull,
Lynn Stein, Larry Stephens, Andreas Stolcke, Paul Stradling, Devika Subramanian, Marek
Suchenek, Rich Sutton, Jonathan Tash, Austin Tate, Bas Terwijn, Olivier Teytaud, Michael
Thielscher, William Thompson, Sebastian Thrun, Eric Tiedemann, Mark Torrance, Randall
Upham, Paul Utgoff, Peter van Beek, Hal Varian, Paulina Varshavskaya, Sunil Vemuri, Vandi
Verma, Ubbo Visser, Jim Waldo, Toby Walsh, Bonnie Webber, Dan Weld, Michael Wellman,
Kamin Whitehouse, Michael Dean White, Brian Williams, David Wolfe, Jason Wolfe, Bill
Woods, Alden Wright, Jay Yagnik, Mark Yasuda, Richard Yen, Eliezer Yudkowsky, Weixiong
Zhang, Ming Zhao, Shlomo Zilberstein, and our esteemed colleague Anonymous Reviewer.
About the Authors
Stuart Russell was born in 1962 in Portsmouth, England. He received his B.A. with first-
class honours in physics from Oxford University in 1982, and his Ph.D. in computer science
from Stanford in 1986. He then joined the faculty of the University of California at Berkeley,
where he is a professor of computer science, director of the Center for Intelligent Systems,
and holder of the Smith–Zadeh Chair in Engineering. In 1990, he received the Presidential
Young Investigator Award of the National Science Foundation, and in 1995 he was cowinner
of the Computers and Thought Award. He was a 1996 Miller Professor of the University of
California and was appointed to a Chancellor’s Professorship in 2000. In 1998, he gave the
Forsythe Memorial Lectures at Stanford University. He is a Fellow and former Executive
Council member of the American Association for Artificial Intelligence. He has published
over 100 papers on a wide range of topics in artificial intelligence. His other books include
The Use of Knowledge in Analogy and Induction and (with Eric Wefald) Do the Right Thing:
Studies in Limited Rationality.
Peter Norvig is currently Director of Research at Google, Inc., and was the director respon-
sible for the core Web search algorithms from 2002 to 2005. He is a Fellow of the American
Association for Artificial Intelligence and the Association for Computing Machinery. Previ-
ously, he was head of the Computational Sciences Division at NASA Ames Research Center,
where he oversaw NASA’s research and development in artificial intelligence and robotics,
and chief scientist at Junglee, where he helped develop one of the first Internet information
extraction services. He received a B.S. in applied mathematics from Brown University and
a Ph.D. in computer science from the University of California at Berkeley. He received the
Distinguished Alumni and Engineering Innovation awards from Berkeley and the Exceptional
Achievement Medal from NASA. He has been a professor at the University of Southern Cal-
ifornia and a research faculty member at Berkeley. His other books are Paradigms of AI
Programming: Case Studies in Common Lisp and Verbmobil: A Translation System for Face-
to-Face Dialog and Intelligent Help Systems for UNIX.
Contents
I Artificial Intelligence
1 Introduction 1
1.1 What Is AI? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Foundations of Artificial Intelligence . . . . . . . . . . . . . . . . . . 5
1.3 The History of Artificial Intelligence . . . . . . . . . . . . . . . . . . . . 16
1.4 The State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 29
2 Intelligent Agents 34
2.1 Agents and Environments . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2 Good Behavior: The Concept of Rationality . . . . . . . . . . . . . . . . 36
2.3 The Nature of Environments . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4 The Structure of Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 59
II Problem-solving
3 Solving Problems by Searching 64
3.1 Problem-Solving Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.2 Example Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.3 Searching for Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.4 Uninformed Search Strategies . . . . . . . . . . . . . . . . . . . . . . . . 81
3.5 Informed (Heuristic) Search Strategies . . . . . . . . . . . . . . . . . . . 92
3.6 Heuristic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 108
4 Beyond Classical Search 120
4.1 Local Search Algorithms and Optimization Problems . . . . . . . . . . . 120
4.2 Local Search in Continuous Spaces . . . . . . . . . . . . . . . . . . . . . 129
4.3 Searching with Nondeterministic Actions . . . . . . . . . . . . . . . . . . 133
4.4 Searching with Partial Observations . . . . . . . . . . . . . . . . . . . . . 138
4.5 Online Search Agents and Unknown Environments . . . . . . . . . . . . 147
4.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 153
5 Adversarial Search 161
5.1 Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.2 Optimal Decisions in Games . . . . . . . . . . . . . . . . . . . . . . . . 163
5.3 Alpha–Beta Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
5.4 Imperfect Real-Time Decisions . . . . . . . . . . . . . . . . . . . . . . . 171
5.5 Stochastic Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.6 Partially Observable Games . . . . . . . . . . . . . . . . . . . . . . . . . 180
5.7 State-of-the-Art Game Programs . . . . . . . . . . . . . . . . . . . . . . 185
5.8 Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
5.9 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 189
6 Constraint Satisfaction Problems 202
6.1 Defining Constraint Satisfaction Problems . . . . . . . . . . . . . . . . . 202
6.2 Constraint Propagation: Inference in CSPs . . . . . . . . . . . . . . . . . 208
6.3 Backtracking Search for CSPs . . . . . . . . . . . . . . . . . . . . . . . . 214
6.4 Local Search for CSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
6.5 The Structure of Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 222
6.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 227
III Knowledge, reasoning, and planning
7 Logical Agents 234
7.1 Knowledge-Based Agents . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.2 The Wumpus World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
7.3 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
7.4 Propositional Logic: A Very Simple Logic . . . . . . . . . . . . . . . . . 243
7.5 Propositional Theorem Proving . . . . . . . . . . . . . . . . . . . . . . . 249
7.6 Effective Propositional Model Checking . . . . . . . . . . . . . . . . . . 259
7.7 Agents Based on Propositional Logic . . . . . . . . . . . . . . . . . . . . 265
7.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 274
8 First-Order Logic 285
8.1 Representation Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 285
8.2 Syntax and Semantics of First-Order Logic . . . . . . . . . . . . . . . . . 290
8.3 Using First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
8.4 Knowledge Engineering in First-Order Logic . . . . . . . . . . . . . . . . 307
8.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 313
9 Inference in First-Order Logic 322
9.1 Propositional vs. First-Order Inference . . . . . . . . . . . . . . . . . . . 322
9.2 Unification and Lifting . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
9.3 Forward Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
9.4 Backward Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
9.5 Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
9.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 357
10 Classical Planning 366
10.1 Definition of Classical Planning . . . . . . . . . . . . . . . . . . . . . . . 366
10.2 Algorithms for Planning as State-Space Search . . . . . . . . . . . . . . . 373
10.3 Planning Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
10.4 Other Classical Planning Approaches . . . . . . . . . . . . . . . . . . . . 387
10.5 Analysis of Planning Approaches . . . . . . . . . . . . . . . . . . . . . . 392
10.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 393
11 Planning and Acting in the Real World 401
11.1 Time, Schedules, and Resources . . . . . . . . . . . . . . . . . . . . . . . 401
11.2 Hierarchical Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406
11.3 Planning and Acting in Nondeterministic Domains . . . . . . . . . . . . . 415
11.4 Multiagent Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
11.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 430
12 Knowledge Representation 437
12.1 Ontological Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
12.2 Categories and Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
12.3 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
12.4 Mental Events and Mental Objects . . . . . . . . . . . . . . . . . . . . . 450
12.5 Reasoning Systems for Categories . . . . . . . . . . . . . . . . . . . . . 453
12.6 Reasoning with Default Information . . . . . . . . . . . . . . . . . . . . 458
12.7 The Internet Shopping World . . . . . . . . . . . . . . . . . . . . . . . . 462
12.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 467
IV Uncertain knowledge and reasoning
13 Quantifying Uncertainty 480
13.1 Acting under Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . 480
13.2 Basic Probability Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 483
13.3 Inference Using Full Joint Distributions . . . . . . . . . . . . . . . . . . . 490
13.4 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
13.5 Bayes’ Rule and Its Use . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
13.6 The Wumpus World Revisited . . . . . . . . . . . . . . . . . . . . . . . . 499
13.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 503
14 Probabilistic Reasoning 510
14.1 Representing Knowledge in an Uncertain Domain . . . . . . . . . . . . . 510
14.2 The Semantics of Bayesian Networks . . . . . . . . . . . . . . . . . . . . 513
14.3 Efficient Representation of Conditional Distributions . . . . . . . . . . . . 518
14.4 Exact Inference in Bayesian Networks . . . . . . . . . . . . . . . . . . . 522
14.5 Approximate Inference in Bayesian Networks . . . . . . . . . . . . . . . 530
14.6 Relational and First-Order Probability Models . . . . . . . . . . . . . . . 539
14.7 Other Approaches to Uncertain Reasoning . . . . . . . . . . . . . . . . . 546
14.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 551
15 Probabilistic Reasoning over Time 566
15.1 Time and Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
15.2 Inference in Temporal Models . . . . . . . . . . . . . . . . . . . . . . . . 570
15.3 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
15.4 Kalman Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
15.5 Dynamic Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . 590
15.6 Keeping Track of Many Objects . . . . . . . . . . . . . . . . . . . . . . . 599
15.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 603
16 Making Simple Decisions 610
16.1 Combining Beliefs and Desires under Uncertainty . . . . . . . . . . . . . 610
16.2 The Basis of Utility Theory . . . . . . . . . . . . . . . . . . . . . . . . . 611
16.3 Utility Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
16.4 Multiattribute Utility Functions . . . . . . . . . . . . . . . . . . . . . . . 622
16.5 Decision Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
16.6 The Value of Information . . . . . . . . . . . . . . . . . . . . . . . . . . 628
16.7 Decision-Theoretic Expert Systems . . . . . . . . . . . . . . . . . . . . . 633
16.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 636
17 Making Complex Decisions 645
17.1 Sequential Decision Problems . . . . . . . . . . . . . . . . . . . . . . . . 645
17.2 Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
17.3 Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
17.4 Partially Observable MDPs . . . . . . . . . . . . . . . . . . . . . . . . . 658
17.5 Decisions with Multiple Agents: Game Theory . . . . . . . . . . . . . . . 666
17.6 Mechanism Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
17.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 684
V Learning
18 Learning from Examples 693
18.1 Forms of Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
18.2 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
18.3 Learning Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
18.4 Evaluating and Choosing the Best Hypothesis . . . . . . . . . . . . . . . 708
18.5 The Theory of Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
18.6 Regression and Classification with Linear Models . . . . . . . . . . . . . 717
18.7 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 727
18.8 Nonparametric Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 737
18.9 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . 744
18.10 Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748
18.11 Practical Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . 753
18.12 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 757
19 Knowledge in Learning 768
19.1 A Logical Formulation of Learning . . . . . . . . . . . . . . . . . . . . . 768
19.2 Knowledge in Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 777
19.3 Explanation-Based Learning . . . . . . . . . . . . . . . . . . . . . . . . 780
19.4 Learning Using Relevance Information . . . . . . . . . . . . . . . . . . . 784
19.5 Inductive Logic Programming . . . . . . . . . . . . . . . . . . . . . . . . 788
19.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 797
20 Learning Probabilistic Models 802
20.1 Statistical Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802
20.2 Learning with Complete Data . . . . . . . . . . . . . . . . . . . . . . . . 806
20.3 Learning with Hidden Variables: The EM Algorithm . . . . . . . . . . . . 816
20.4 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 825
21 Reinforcement Learning 830
21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830
21.2 Passive Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . 832
21.3 Active Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . 839
21.4 Generalization in Reinforcement Learning . . . . . . . . . . . . . . . . . 845
21.5 Policy Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
21.6 Applications of Reinforcement Learning . . . . . . . . . . . . . . . . . . 850
21.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 853
VI Communicating, perceiving, and acting
22 Natural Language Processing 860
22.1 Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860
22.2 Text Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 865
22.3 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867
22.4 Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873
22.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 882
23 Natural Language for Communication 888
23.1 Phrase Structure Grammars . . . . . . . . . . . . . . . . . . . . . . . . . 888
23.2 Syntactic Analysis (Parsing) . . . . . . . . . . . . . . . . . . . . . . . . . 892
23.3 Augmented Grammars and Semantic Interpretation . . . . . . . . . . . . 897
23.4 Machine Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 907
23.5 Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 912
23.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 918
24 Perception 928
24.1 Image Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929
24.2 Early Image-Processing Operations . . . . . . . . . . . . . . . . . . . . . 935
24.3 Object Recognition by Appearance . . . . . . . . . . . . . . . . . . . . . 942
24.4 Reconstructing the 3D World . . . . . . . . . . . . . . . . . . . . . . . . 947
24.5 Object Recognition from Structural Information . . . . . . . . . . . . . . 957
24.6 Using Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961
24.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 965
25 Robotics 971
25.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
25.2 Robot Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973
25.3 Robotic Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978
25.4 Planning to Move . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986
25.5 Planning Uncertain Movements . . . . . . . . . . . . . . . . . . . . . . . 993
25.6 Moving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997
25.7 Robotic Software Architectures . . . . . . . . . . . . . . . . . . . . . . . 1003
25.8 Application Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1006
25.9 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 1010
VII Conclusions
26 Philosophical Foundations 1020
26.1 Weak AI: Can Machines Act Intelligently? . . . . . . . . . . . . . . . . . 1020
26.2 Strong AI: Can Machines Really Think? . . . . . . . . . . . . . . . . . . 1026
26.3 The Ethics and Risks of Developing Artificial Intelligence . . . . . . . . . 1034
26.4 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 1040
27 AI: The Present and Future 1044
27.1 Agent Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044
27.2 Agent Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047
27.3 Are We Going in the Right Direction? . . . . . . . . . . . . . . . . . . . 1049
27.4 What If AI Does Succeed? . . . . . . . . . . . . . . . . . . . . . . . . . 1051
A Mathematical background 1053
A.1 Complexity Analysis and O() Notation . . . . . . . . . . . . . . . . . . . 1053
A.2 Vectors, Matrices, and Linear Algebra . . . . . . . . . . . . . . . . . . . 1055
A.3 Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057
B Notes on Languages and Algorithms 1060
B.1 Defining Languages with Backus–Naur Form (BNF) . . . . . . . . . . . . 1060
B.2 Describing Algorithms with Pseudocode . . . . . . . . . . . . . . . . . . 1061
B.3 Online Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062
Bibliography 1063
Index 1095
1 INTRODUCTION
In which we try to explain why we consider artificial intelligence to be a subject
most worthy of study, and in which we try to decide what exactly it is, this being a
good thing to decide before embarking.
We call ourselves Homo sapiens—man the wise—because our intelligence is so important
to us. For thousands of years, we have tried to understand how we think; that is, how a mere
handful of matter can perceive, understand, predict, and manipulate a world far larger and
more complicated than itself. The field of artificial intelligence, or AI, goes further still: it
attempts not just to understand but also to build intelligent entities.
AI is one of the newest fields in science and engineering. Work started in earnest soon
after World War II, and the name itself was coined in 1956. Along with molecular biology,
AI is regularly cited as the “field I would most like to be in” by scientists in other disciplines.
A student in physics might reasonably feel that all the good ideas have already been taken by
Galileo, Newton, Einstein, and the rest. AI, on the other hand, still has openings for several
full-time Einsteins and Edisons.
AI currently encompasses a huge variety of subfields, ranging from the general (learning
and perception) to the specific, such as playing chess, proving mathematical theorems, writing
poetry, driving a car on a crowded street, and diagnosing diseases. AI is relevant to any
intellectual task; it is truly a universal field.
1.1 WHAT IS AI?
We have claimed that AI is exciting, but we have not said what it is. In Figure 1.1 we see
eight definitions of AI, laid out along two dimensions. The definitions on top are concerned
with thought processes and reasoning, whereas the ones on the bottom address behavior. The
definitions on the left measure success in terms of fidelity to human performance, whereas
the ones on the right measure against an ideal performance measure, called rationality. A
system is rational if it does the “right thing,” given what it knows.
Thinking Humanly
• “The exciting new effort to make computers think . . . machines with minds, in the full and literal sense.” (Haugeland, 1985)
• “[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning . . .” (Bellman, 1978)

Thinking Rationally
• “The study of mental faculties through the use of computational models.” (Charniak and McDermott, 1985)
• “The study of the computations that make it possible to perceive, reason, and act.” (Winston, 1992)

Acting Humanly
• “The art of creating machines that perform functions that require intelligence when performed by people.” (Kurzweil, 1990)
• “The study of how to make computers do things at which, at the moment, people are better.” (Rich and Knight, 1991)

Acting Rationally
• “Computational Intelligence is the study of the design of intelligent agents.” (Poole et al., 1998)
• “AI . . . is concerned with intelligent behavior in artifacts.” (Nilsson, 1998)

Figure 1.1 Some definitions of artificial intelligence, organized into four categories.

Historically, all four approaches to AI have been followed, each by different people with different methods. A human-centered approach must be in part an empirical science, involving observations and hypotheses about human behavior. A rationalist1 approach involves a combination of mathematics and engineering. The various groups have both disparaged and helped each other. Let us look at the four approaches in more detail.
1.1.1 Acting humanly: The Turing Test approach
The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory
operational definition of intelligence. A computer passes the test if a human interrogator, after
posing some written questions, cannot tell whether the written responses come from a person
or from a computer. Chapter 26 discusses the details of the test and whether a computer would
really be intelligent if it passed. For now, we note that programming a computer to pass a
rigorously applied test provides plenty to work on. The computer would need to possess the
following capabilities:
• natural language processing to enable it to communicate successfully in English;
• knowledge representation to store what it knows or hears;
• automated reasoning to use the stored information to answer questions and to draw
new conclusions;
• machine learning to adapt to new circumstances and to detect and extrapolate patterns.
1 By distinguishing between human and rational behavior, we are not suggesting that humans are necessarily
“irrational” in the sense of “emotionally unstable” or “insane.” One merely need note that we are not perfect:
not all chess players are grandmasters; and, unfortunately, not everyone gets an A on the exam. Some systematic
errors in human reasoning are cataloged by Kahneman et al. (1982).
Turing’s test deliberately avoided direct physical interaction between the interrogator and the
computer, because physical simulation of a person is unnecessary for intelligence. However,
the so-called total Turing Test includes a video signal so that the interrogator can test the
subject’s perceptual abilities, as well as the opportunity for the interrogator to pass physical
objects “through the hatch.” To pass the total Turing Test, the computer will need
• computer vision to perceive objects, and
• robotics to manipulate objects and move about.
These six disciplines compose most of AI, and Turing deserves credit for designing a test
that remains relevant 60 years later. Yet AI researchers have devoted little effort to passing
the Turing Test, believing that it is more important to study the underlying principles of in-
telligence than to duplicate an exemplar. The quest for “artificial flight” succeeded when the
Wright brothers and others stopped imitating birds and started using wind tunnels and learn-
ing about aerodynamics. Aeronautical engineering texts do not define the goal of their field
as making “machines that fly so exactly like pigeons that they can fool even other pigeons.”
1.1.2 Thinking humanly: The cognitive modeling approach
If we are going to say that a given program thinks like a human, we must have some way of
determining how humans think. We need to get inside the actual workings of human minds.
There are three ways to do this: through introspection—trying to catch our own thoughts as
they go by; through psychological experiments—observing a person in action; and through
brain imaging—observing the brain in action. Once we have a sufficiently precise theory of
the mind, it becomes possible to express the theory as a computer program. If the program’s
input–output behavior matches corresponding human behavior, that is evidence that some of
the program’s mechanisms could also be operating in humans. For example, Allen Newell
and Herbert Simon, who developed GPS, the “General Problem Solver” (Newell and Simon,
1961), were not content merely to have their program solve problems correctly. They were
more concerned with comparing the trace of its reasoning steps to traces of human subjects
solving the same problems. The interdisciplinary field of cognitive science brings together
computer models from AI and experimental techniques from psychology to construct precise
and testable theories of the human mind.
Cognitive science is a fascinating field in itself, worthy of several textbooks and at least
one encyclopedia (Wilson and Keil, 1999). We will occasionally comment on similarities or
differences between AI techniques and human cognition. Real cognitive science, however, is
necessarily based on experimental investigation of actual humans or animals. We will leave
that for other books, as we assume the reader has only a computer for experimentation.
In the early days of AI there was often confusion between the approaches: an author
would argue that an algorithm performs well on a task and that it is therefore a good model
of human performance, or vice versa. Modern authors separate the two kinds of claims;
this distinction has allowed both AI and cognitive science to develop more rapidly. The two
fields continue to fertilize each other, most notably in computer vision, which incorporates
neurophysiological evidence into computational models.
1.1.3 Thinking rationally: The “laws of thought” approach
The Greek philosopher Aristotle was one of the first to attempt to codify “right thinking,” that
is, irrefutable reasoning processes. His syllogisms provided patterns for argument structures
that always yielded correct conclusions when given correct premises—for example, “Socrates
is a man; all men are mortal; therefore, Socrates is mortal.” These laws of thought were
supposed to govern the operation of the mind; their study initiated the field called logic.
Logicians in the 19th century developed a precise notation for statements about all kinds
of objects in the world and the relations among them. (Contrast this with ordinary arithmetic
notation, which provides only for statements about numbers.) By 1965, programs existed
that could, in principle, solve any solvable problem described in logical notation. (Although
if no solution exists, the program might loop forever.) The so-called logicist tradition within
artificial intelligence hopes to build on such programs to create intelligent systems.
There are two main obstacles to this approach. First, it is not easy to take informal
knowledge and state it in the formal terms required by logical notation, particularly when
the knowledge is less than 100% certain. Second, there is a big difference between solving
a problem “in principle” and solving it in practice. Even problems with just a few hundred
facts can exhaust the computational resources of any computer unless it has some guidance
as to which reasoning steps to try first. Although both of these obstacles apply to any attempt
to build computational reasoning systems, they appeared first in the logicist tradition.
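As a toy illustration of the logicist idea of stating knowledge formally and deriving conclusions mechanically, the Python sketch below forward-chains over ground facts using a single rule schema; the syllogism about Socrates from the start of this section is the example. Representing facts as strings and grounding the rule over known constants are simplifications we have chosen for brevity, not the machinery of a real theorem prover.

def forward_chain(facts, rules, constants):
    """Apply each rule schema, grounded over the known constants, to a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:              # e.g., ("Man({x})", "Mortal({x})")
            for c in constants:
                p, q = premise.format(x=c), conclusion.format(x=c)
                if p in facts and q not in facts:
                    facts.add(q)
                    changed = True
    return facts

facts = {"Man(Socrates)"}
rules = [("Man({x})", "Mortal({x})")]                  # "all men are mortal"
print("Mortal(Socrates)" in forward_chain(facts, rules, {"Socrates"}))   # True
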
1.1.4 Acting rationally: The rational agent approach
An agent is just something that acts (agent comes from the Latin agere, to do). Of course,
all computer programs do something, but computer agents are expected to do more: operate
autonomously, perceive their environment, persist over a prolonged time period, adapt to
change, and create and pursue goals. A rational agent is one that acts so as to achieve the
best outcome or, when there is uncertainty, the best expected outcome.
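To make "best expected outcome" concrete, the small sketch below chooses among actions by computing, for each, the probability-weighted average utility of its possible outcomes and picking the maximum. The umbrella decision, its probabilities, and its utility numbers are invented purely for illustration; Chapters 16 and 17 develop the real theory.

from typing import Dict

def expected_utility(outcome_probs: Dict[str, float], utility: Dict[str, float]) -> float:
    """Expected utility of one action: sum of P(outcome) * U(outcome)."""
    return sum(p * utility[o] for o, p in outcome_probs.items())

def rational_choice(actions: Dict[str, Dict[str, float]], utility: Dict[str, float]) -> str:
    """A rational agent picks the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))

utility = {"dry": 10.0, "wet": 0.0, "dry_but_encumbered": 8.0}
actions = {
    "take_umbrella":  {"dry_but_encumbered": 1.0},   # certain but slightly inconvenient
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},      # gamble on the weather
}
print(rational_choice(actions, utility))   # take_umbrella (8.0 beats 0.7 * 10 = 7.0)
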
In the “laws of thought” approach to AI, the emphasis was on correct inferences. Mak-
ing correct inferences is sometimes part of being a rational agent, because one way to act
rationally is to reason logically to the conclusion that a given action will achieve one’s goals
and then to act on that conclusion. On the other hand, correct inference is not all of ration-
ality; in some situations, there is no provably correct thing to do, but something must still be
done. There are also ways of acting rationally that cannot be said to involve inference. For
example, recoiling from a hot stove is a reflex action that is usually more successful than a
slower action taken after careful deliberation.
All the skills needed for the Turing Test also allow an agent to act rationally. Knowledge
representation and reasoning enable agents to reach good decisions. We need to be able to
generate comprehensible sentences in natural language to get by in a complex society. We
need learning not only for erudition, but also because it improves our ability to generate
effective behavior.
The rational-agent approach has two advantages over the other approaches. First, it
is more general than the “laws of thought” approach because correct inference is just one
of several possible mechanisms for achieving rationality. Second, it is more amenable to
scientific development than are approaches based on human behavior or human thought. The
standard of rationality is mathematically well defined and completely general, and can be
“unpacked” to generate agent designs that provably achieve it. Human behavior, on the other
hand, is well adapted for one specific environment and is defined by, well, the sum total
of all the things that humans do. This book therefore concentrates on general principles
of rational agents and on components for constructing them. We will see that despite the
apparent simplicity with which the problem can be stated, an enormous variety of issues
come up when we try to solve it. Chapter 2 outlines some of these issues in more detail.
One important point to keep in mind: We will see before too long that achieving perfect
rationality—always doing the right thing—is not feasible in complicated environments. The
computational demands are just too high. For most of the book, however, we will adopt the
working hypothesis that perfect rationality is a good starting point for analysis. It simplifies
the problem and provides the appropriate setting for most of the foundational material in
the field. Chapters 5 and 17 deal explicitly with the issue of limited rationality—acting
appropriately when there is not enough time to do all the computations one might like.
1.2 THE FOUNDATIONS OF ARTIFICIAL INTELLIGENCE
In this section, we provide a brief history of the disciplines that contributed ideas, viewpoints,
and techniques to AI. Like any history, this one is forced to concentrate on a small number
of people, events, and ideas and to ignore others that also were important. We organize the
history around a series of questions. We certainly would not wish to give the impression that
these questions are the only ones the disciplines address or that the disciplines have all been
working toward AI as their ultimate fruition.
1.2.1 Philosophy
• Can formal rules be used to draw valid conclusions?
• How does the mind arise from a physical brain?
• Where does knowledge come from?
• How does knowledge lead to action?
Aristotle (384–322 B.C.), whose bust appears on the front cover of this book, was the first
to formulate a precise set of laws governing the rational part of the mind. He developed an
informal system of syllogisms for proper reasoning, which in principle allowed one to gener-
ate conclusions mechanically, given initial premises. Much later, Ramon Lull (d. 1315) had
the idea that useful reasoning could actually be carried out by a mechanical artifact. Thomas
Hobbes (1588–1679) proposed that reasoning was like numerical computation, that “we add
and subtract in our silent thoughts.” The automation of computation itself was already well
under way. Around 1500, Leonardo da Vinci (1452–1519) designed but did not build a me-
chanical calculator; recent reconstructions have shown the design to be functional. The first
known calculating machine was constructed around 1623 by the German scientist Wilhelm
Schickard (1592–1635), although the Pascaline, built in 1642 by Blaise Pascal (1623–1662),
is more famous. Pascal wrote that “the arithmetical machine produces effects which appear
nearer to thought than all the actions of animals.” Gottfried Wilhelm Leibniz (1646–1716)
built a mechanical device intended to carry out operations on concepts rather than numbers,
but its scope was rather limited. Leibniz did surpass Pascal by building a calculator that
could add, subtract, multiply, and take roots, whereas the Pascaline could only add and sub-
tract. Some speculated that machines might not just do calculations but actually be able to
think and act on their own. In his 1651 book Leviathan, Thomas Hobbes suggested the idea
of an “artificial animal,” arguing “For what is the heart but a spring; and the nerves, but so
many strings; and the joints, but so many wheels.”
It’s one thing to say that the mind operates, at least in part, according to logical rules, and
to build physical systems that emulate some of those rules; it’s another to say that the mind
itself is such a physical system. René Descartes (1596–1650) gave the first clear discussion
of the distinction between mind and matter and of the problems that arise. One problem with
a purely physical conception of the mind is that it seems to leave little room for free will:
if the mind is governed entirely by physical laws, then it has no more free will than a rock
“deciding” to fall toward the center of the earth. Descartes was a strong advocate of the power
of reasoning in understanding the world, a philosophy now called rationalism, and one that
counts Aristotle and Leibniz as members. But Descartes was also a proponent of dualism.
He held that there is a part of the human mind (or soul or spirit) that is outside of nature,
exempt from physical laws. Animals, on the other hand, did not possess this dual quality;
they could be treated as machines. An alternative to dualism is materialism, which holds
that the brain’s operation according to the laws of physics constitutes the mind. Free will is
simply the way that the perception of available choices appears to the choosing entity.
Given a physical mind that manipulates knowledge, the next problem is to establish
the source of knowledge. The empiricism movement, starting with Francis Bacon’s (1561–
1626) Novum Organum,2 is characterized by a dictum of John Locke (1632–1704): “Nothing
is in the understanding, which was not first in the senses.” David Hume’s (1711–1776) A
Treatise of Human Nature (Hume, 1739) proposed what is now known as the principle of
induction: that general rules are acquired by exposure to repeated associations between their
elements. Building on the work of Ludwig Wittgenstein (1889–1951) and Bertrand Russell
(1872–1970), the famous Vienna Circle, led by Rudolf Carnap (1891–1970), developed the
doctrine of logical positivism. This doctrine holds that all knowledge can be characterized by
logical theories connected, ultimately, to observation sentences that correspond to sensory
inputs; thus logical positivism combines rationalism and empiricism.3 The confirmation the-
ory of Carnap and Carl Hempel (1905–1997) attempted to analyze the acquisition of knowl-
edge from experience. Carnap’s book The Logical Structure of the World (1928) defined an
explicit computational procedure for extracting knowledge from elementary experiences. It
was probably the first theory of mind as a computational process.
2 The Novum Organum is an update of Aristotle’s Organon, or instrument of thought. Thus Aristotle can be
seen as both an empiricist and a rationalist.
3 In this picture, all meaningful statements can be verified or falsified either by experimentation or by analysis
of the meaning of the words. Because this rules out most of metaphysics, as was the intention, logical positivism
was unpopular in some circles.
The final element in the philosophical picture of the mind is the connection between
knowledge and action. This question is vital to AI because intelligence requires action as well
as reasoning. Moreover, only by understanding how actions are justified can we understand
how to build an agent whose actions are justifiable (or rational). Aristotle argued (in De Motu
Animalium) that actions are justified by a logical connection between goals and knowledge of
the action’s outcome (the last part of this extract also appears on the front cover of this book,
in the original Greek):
But how does it happen that thinking is sometimes accompanied by action and sometimes
not, sometimes by motion, and sometimes not? It looks as if almost the same thing
happens as in the case of reasoning and making inferences about unchanging objects. But
in that case the end is a speculative proposition . . . whereas here the conclusion which
results from the two premises is an action. . . . I need covering; a cloak is a covering. I
need a cloak. What I need, I have to make; I need a cloak. I have to make a cloak. And
the conclusion, the “I have to make a cloak,” is an action.
In the Nicomachean Ethics (Book III. 3, 1112b), Aristotle further elaborates on this topic,
suggesting an algorithm:
We deliberate not about ends, but about means. For a doctor does not deliberate whether
he shall heal, nor an orator whether he shall persuade, . . . They assume the end and
consider how and by what means it is attained, and if it seems easily and best produced
thereby; while if it is achieved by one means only they consider how it will be achieved
by this and by what means this will be achieved, till they come to the first cause, . . . and
what is last in the order of analysis seems to be first in the order of becoming. And if we
come on an impossibility, we give up the search, e.g., if we need money and this cannot
be got; but if a thing appears possible we try to do it.
Aristotle’s algorithm was implemented 2300 years later by Newell and Simon in their GPS
program. We would now call it a regression planning system (see Chapter 10).
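Aristotle's procedure, in modern terms, works backward from a goal to an action whose effect achieves it and then treats that action's preconditions as new subgoals. A minimal sketch of this regression idea, with an action table invented purely for illustration (this is not the GPS program itself):

ACTIONS = {
    # goal achieved: (action that achieves it, its preconditions)
    "have cloak": ("make cloak", ["have cloth"]),
    "have cloth": ("buy cloth", ["have money"]),
}

def plan(goal, have):
    """Work backward from the goal, as in means-ends analysis."""
    if goal in have:
        return []                           # already satisfied
    if goal not in ACTIONS:
        raise ValueError("impossibility: give up the search on " + goal)
    action, preconditions = ACTIONS[goal]
    steps = []
    for p in preconditions:                 # achieve each precondition first
        steps += plan(p, have)
    return steps + [action]

print(plan("have cloak", have={"have money"}))   # ['buy cloth', 'make cloak']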
Goal-based analysis is useful, but does not say what to do when several actions will
achieve the goal or when no action will achieve it completely. Antoine Arnauld (1612–1694)
correctly described a quantitative formula for deciding what action to take in cases like this
(see Chapter 16). John Stuart Mill’s (1806–1873) book Utilitarianism (Mill, 1863) promoted
the idea of rational decision criteria in all spheres of human activity. The more formal theory
of decisions is discussed in the following section.
1.2.2 Mathematics
• What are the formal rules to draw valid conclusions?
• What can be computed?
• How do we reason with uncertain information?
Philosophers staked out some of the fundamental ideas of AI, but the leap to a formal science
required a level of mathematical formalization in three fundamental areas: logic, computa-
tion, and probability.
The idea of formal logic can be traced back to the philosophers of ancient Greece, but
its mathematical development really began with the work of George Boole (1815–1864), who
worked out the details of propositional, or Boolean, logic (Boole, 1847). In 1879, Gottlob
Frege (1848–1925) extended Boole’s logic to include objects and relations, creating the first-
order logic that is used today.4 Alfred Tarski (1902–1983) introduced a theory of reference
that shows how to relate the objects in a logic to objects in the real world.
The next step was to determine the limits of what could be done with logic and com-
putation. The first nontrivial algorithm is thought to be Euclid’s algorithm for computing
greatest common divisors. The word algorithm (and the idea of studying them) comes from
al-Khowarazmi, a Persian mathematician of the 9th century, whose writings also introduced
Arabic numerals and algebra to Europe. Boole and others discussed algorithms for logical
deduction, and, by the late 19th century, efforts were under way to formalize general mathe-
matical reasoning as logical deduction. In 1930, Kurt Gödel (1906–1978) showed that there
exists an effective procedure to prove any true statement in the first-order logic of Frege and
Russell, but that first-order logic could not capture the principle of mathematical induction
needed to characterize the natural numbers. In 1931, Gödel showed that limits on deduc-
tion do exist. His incompleteness theorem showed that in any formal theory as strong as
Peano arithmetic (the elementary theory of natural numbers), there are true statements that
are undecidable in the sense that they have no proof within the theory.
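As an aside, Euclid's algorithm mentioned above is short enough to state in full; in its modern remainder form it repeatedly replaces the pair (a, b) by (b, a mod b):

def gcd(a, b):
    """Euclid's algorithm for the greatest common divisor."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(1071, 462))   # 21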
This fundamental result can also be interpreted as showing that some functions on the
integers cannot be represented by an algorithm—that is, they cannot be computed. This
motivated Alan Turing (1912–1954) to try to characterize exactly which functions are com-
putable—capable of being computed. This notion is actually slightly problematic because
the notion of a computation or effective procedure really cannot be given a formal definition.
However, the Church–Turing thesis, which states that the Turing machine (Turing, 1936) is
capable of computing any computable function, is generally accepted as providing a sufficient
definition. Turing also showed that there were some functions that no Turing machine can
compute. For example, no machine can tell in general whether a given program will return
an answer on a given input or run forever.
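Turing's argument can be paraphrased as a short self-referential program. Suppose, purely hypothetically, that a function halts(f, x) could always decide whether f(x) halts; the following sketch turns that assumption against itself:

def paradox(f):
    if halts(f, f):          # hypothetical oracle: would f halt when run on itself?
        while True:          # ...then loop forever
            pass
    else:
        return               # ...otherwise halt at once

Running paradox(paradox) would then halt exactly when it does not halt, so no such halts function can exist.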
Although decidability and computability are important to an understanding of computa-
tion, the notion of tractability has had an even greater impact. Roughly speaking, a problem
is called intractable if the time required to solve instances of the problem grows exponentially
with the size of the instances. The distinction between polynomial and exponential growth
in complexity was first emphasized in the mid-1960s (Cobham, 1964; Edmonds, 1965). It is
important because exponential growth means that even moderately large instances cannot be
solved in any reasonable time. Therefore, one should strive to divide the overall problem of
generating intelligent behavior into tractable subproblems rather than intractable ones.
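To make the contrast concrete, consider a nominal machine executing 10^9 steps per second: an algorithm taking n^3 steps finishes an instance of size n = 1000 in about one second, whereas one taking 2^n steps would need roughly 2^100 ≈ 1.3 × 10^30 steps, or about 4 × 10^13 years, for an instance of size only 100.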
How can one recognize an intractable problem? The theory of NP-completeness, pio-
neered by Steven Cook (1971) and Richard Karp (1972), provides a method. Cook and Karp
showed the existence of large classes of canonical combinatorial search and reasoning prob-
lems that are NP-complete. Any problem class to which the class of NP-complete problems
can be reduced is likely to be intractable. (Although it has not been proved that NP-complete
4 Frege’s proposed notation for first-order logic—an arcane combination of textual and geometric features—
never became popular.
problems are necessarily intractable, most theoreticians believe it.) These results contrast
with the optimism with which the popular press greeted the first computers—“Electronic
Super-Brains” that were “Faster than Einstein!” Despite the increasing speed of computers,
careful use of resources will characterize intelligent systems. Put crudely, the world is an
extremely large problem instance! Work in AI has helped explain why some instances of
NP-complete problems are hard, yet others are easy (Cheeseman et al., 1991).
Besides logic and computation, the third great contribution of mathematics to AI is the
theory of probability. The Italian Gerolamo Cardano (1501–1576) first framed the idea of
probability, describing it in terms of the possible outcomes of gambling events. In 1654,
Blaise Pascal (1623–1662), in a letter to Pierre Fermat (1601–1665), showed how to pre-
dict the future of an unfinished gambling game and assign average payoffs to the gamblers.
Probability quickly became an invaluable part of all the quantitative sciences, helping to deal
with uncertain measurements and incomplete theories. James Bernoulli (1654–1705), Pierre
Laplace (1749–1827), and others advanced the theory and introduced new statistical meth-
ods. Thomas Bayes (1702–1761), who appears on the front cover of this book, proposed
a rule for updating probabilities in the light of new evidence. Bayes’ rule underlies most
modern approaches to uncertain reasoning in AI systems.
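In its simplest form the rule reads P(cause | evidence) = P(evidence | cause) P(cause) / P(evidence). A worked example with invented numbers: suppose a condition has prior probability 0.01, and a test reports positive with probability 0.9 when the condition is present and 0.05 when it is absent. Then P(condition | positive) = (0.9)(0.01) / [(0.9)(0.01) + (0.05)(0.99)] ≈ 0.15, far lower than the test's 90 percent hit rate might suggest, because the prior probability is so small.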
1.2.3 Economics
• How should we make decisions so as to maximize payoff?
• How should we do this when others may not go along?
• How should we do this when the payoff may be far in the future?
The science of economics got its start in 1776, when Scottish philosopher Adam Smith
(1723–1790) published An Inquiry into the Nature and Causes of the Wealth of Nations.
While the ancient Greeks and others had made contributions to economic thought, Smith was
the first to treat it as a science, using the idea that economies can be thought of as consist-
ing of individual agents maximizing their own economic well-being. Most people think of
economics as being about money, but economists will say that they are really studying how
people make choices that lead to preferred outcomes. When McDonald’s offers a hamburger
for a dollar, they are asserting that they would prefer the dollar and hoping that customers will
prefer the hamburger. The mathematical treatment of “preferred outcomes” or utility was
first formalized by Léon Walras (pronounced "Valrasse") (1834–1910) and was improved by
Frank Ramsey (1931) and later by John von Neumann and Oskar Morgenstern in their book
The Theory of Games and Economic Behavior (1944).
Decision theory, which combines probability theory with utility theory, provides a for-
mal and complete framework for decisions (economic or otherwise) made under uncertainty—
that is, in cases where probabilistic descriptions appropriately capture the decision maker’s
environment. This is suitable for “large” economies where each agent need pay no attention
to the actions of other agents as individuals. For “small” economies, the situation is much
more like a game: the actions of one player can significantly affect the utility of another
(either positively or negatively). Von Neumann and Morgenstern’s development of game
theory (see also Luce and Raiffa, 1957) included the surprising result that, for some games,
a rational agent should adopt policies that are (or at least appear to be) randomized. Unlike de-
cision theory, game theory does not offer an unambiguous prescription for selecting actions.
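The decision-theoretic prescription itself is easy to state: choose the action whose expected utility, the probability-weighted average of the utilities of its possible outcomes, is highest. A minimal sketch, with probabilities and utilities invented for illustration:

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

actions = {
    "carry umbrella": [(0.3, 60), (0.7, 80)],    # rain / no rain
    "leave umbrella": [(0.3, 0), (0.7, 100)],
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)   # 'carry umbrella' (expected utility 74 versus 70)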
For the most part, economists did not address the third question listed above, namely,
how to make rational decisions when payoffs from actions are not immediate but instead re-
sult from several actions taken in sequence. This topic was pursued in the field of operations
research, which emerged in World War II from efforts in Britain to optimize radar installa-
tions, and later found civilian applications in complex management decisions. The work of
Richard Bellman (1957) formalized a class of sequential decision problems called Markov
decision processes, which we study in Chapters 17 and 21.
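In one common modern notation, the heart of Bellman's formulation is the recursive value equation V(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V(s') ]: the value of a state is the best achievable sum of immediate reward and discounted expected value of what follows. Iterating this equation is the basis of the dynamic programming methods treated in those chapters.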
Work in economics and operations research has contributed much to our notion of ra-
tional agents, yet for many years AI research developed along entirely separate paths. One
reason was the apparent complexity of making rational decisions. The pioneering AI re-
searcher Herbert Simon (1916–2001) won the Nobel Prize in economics in 1978 for his early
work showing that models based on satisficing—making decisions that are “good enough,”
rather than laboriously calculating an optimal decision—gave a better description of actual
human behavior (Simon, 1947). Since the 1990s, there has been a resurgence of interest in
decision-theoretic techniques for agent systems (Wellman, 1995).
1.2.4 Neuroscience
• How do brains process information?
Neuroscience is the study of the nervous system, particularly the brain. Although the exact
way in which the brain enables thought is one of the great mysteries of science, the fact that it
does enable thought has been appreciated for thousands of years because of the evidence that
strong blows to the head can lead to mental incapacitation. It has also long been known that
human brains are somehow different; in about 335 B.C. Aristotle wrote, “Of all the animals,
man has the largest brain in proportion to his size.”5 Still, it was not until the middle of the
18th century that the brain was widely recognized as the seat of consciousness. Before then,
candidate locations included the heart and the spleen.
Paul Broca’s (1824–1880) study of aphasia (speech deficit) in brain-damaged patients
in 1861 demonstrated the existence of localized areas of the brain responsible for specific
cognitive functions. In particular, he showed that speech production was localized to the
portion of the left hemisphere now called Broca’s area.6 By that time, it was known that
the brain consisted of nerve cells, or neurons, but it was not until 1873 that Camillo Golgi
(1843–1926) developed a staining technique allowing the observation of individual neurons
in the brain (see Figure 1.2). This technique was used by Santiago Ramon y Cajal (1852–
1934) in his pioneering studies of the brain’s neuronal structures.7 Nicolas Rashevsky (1936,
1938) was the first to apply mathematical models to the study of the nervous system.
5 Since then, it has been discovered that the tree shrew (Scandentia) has a higher ratio of brain to body mass.
6 Many cite Alexander Hood (1824) as a possible prior source.
7 Golgi persisted in his belief that the brain’s functions were carried out primarily in a continuous medium in
which neurons were embedded, whereas Cajal propounded the “neuronal doctrine.” The two shared the Nobel
prize in 1906 but gave mutually antagonistic acceptance speeches.
Figure 1.2 The parts of a nerve cell or neuron. Each neuron consists of a cell body,
or soma, that contains a cell nucleus. Branching out from the cell body are a number of
fibers called dendrites and a single long fiber called the axon. The axon stretches out for a
long distance, much longer than the scale in this diagram indicates. Typically, an axon is
1 cm long (100 times the diameter of the cell body), but can reach up to 1 meter. A neuron
makes connections with 10 to 100,000 other neurons at junctions called synapses. Signals are
propagated from neuron to neuron by a complicated electrochemical reaction. The signals
control brain activity in the short term and also enable long-term changes in the connectivity
of neurons. These mechanisms are thought to form the basis for learning in the brain. Most
information processing goes on in the cerebral cortex, the outer layer of the brain. The basic
organizational unit appears to be a column of tissue about 0.5 mm in diameter, containing
about 20,000 neurons and extending the full depth of the cortex (about 4 mm in humans).
We now have some data on the mapping between areas of the brain and the parts of the
body that they control or from which they receive sensory input. Such mappings are able to
change radically over the course of a few weeks, and some animals seem to have multiple
maps. Moreover, we do not fully understand how other areas can take over functions when
one area is damaged. There is almost no theory on how an individual memory is stored.
The measurement of intact brain activity began in 1929 with the invention by Hans
Berger of the electroencephalograph (EEG). The recent development of functional magnetic
resonance imaging (fMRI) (Ogawa et al., 1990; Cabeza and Nyberg, 2001) is giving neu-
roscientists unprecedentedly detailed images of brain activity, enabling measurements that
correspond in interesting ways to ongoing cognitive processes. These are augmented by
advances in single-cell recording of neuron activity. Individual neurons can be stimulated
electrically, chemically, or even optically (Han and Boyden, 2007), allowing neuronal input–
output relationships to be mapped. Despite these advances, we are still a long way from
understanding how cognitive processes actually work.
The truly amazing conclusion is that a collection of simple cells can lead to thought,
action, and consciousness or, in the pithy words of John Searle (1992), brains cause minds.
                       Supercomputer                  Personal Computer          Human Brain
Computational units    10^4 CPUs, 10^12 transistors   4 CPUs, 10^9 transistors   10^11 neurons
Storage units          10^14 bits RAM                 10^11 bits RAM             10^11 neurons
                       10^15 bits disk                10^13 bits disk            10^14 synapses
Cycle time             10^-9 sec                      10^-9 sec                  10^-3 sec
Operations/sec         10^15                          10^10                      10^17
Memory updates/sec     10^14                          10^10                      10^14
Figure 1.3 A crude comparison of the raw computational resources available to the IBM
BLUE GENE supercomputer, a typical personal computer of 2008, and the human brain. The
brain’s numbers are essentially fixed, whereas the supercomputer’s numbers have been in-
creasing by a factor of 10 every 5 years or so, allowing it to achieve rough parity with the
brain. The personal computer lags behind on all metrics except cycle time.
The only real alternative theory is mysticism: that minds operate in some mystical realm that
is beyond physical science.
Brains and digital computers have somewhat different properties. Figure 1.3 shows that
computers have a cycle time that is a million times faster than a brain. The brain makes up
for that with far more storage and interconnection than even a high-end personal computer,
although the largest supercomputers have a capacity that is similar to the brain’s. (It should
be noted, however, that the brain does not seem to use all of its neurons simultaneously.)
Futurists make much of these numbers, pointing to an approaching singularity at which
computers reach a superhuman level of performance (Vinge, 1993; Kurzweil, 2005), but the
raw comparisons are not especially informative. Even with a computer of virtually unlimited
capacity, we still would not know how to achieve the brain’s level of intelligence.
1.2.5 Psychology
• How do humans and animals think and act?
The origins of scientific psychology are usually traced to the work of the German physi-
cist Hermann von Helmholtz (1821–1894) and his student Wilhelm Wundt (1832–1920).
Helmholtz applied the scientific method to the study of human vision, and his Handbook
of Physiological Optics is even now described as “the single most important treatise on the
physics and physiology of human vision” (Nalwa, 1993, p.15). In 1879, Wundt opened the
first laboratory of experimental psychology, at the University of Leipzig. Wundt insisted
on carefully controlled experiments in which his workers would perform a perceptual or as-
sociative task while introspecting on their thought processes. The careful controls went a
long way toward making psychology a science, but the subjective nature of the data made
it unlikely that an experimenter would ever disconfirm his or her own theories. Biologists
studying animal behavior, on the other hand, lacked introspective data and developed an ob-
jective methodology, as described by H. S. Jennings (1906) in his influential work Behavior of
the Lower Organisms. Applying this viewpoint to humans, the behaviorism movement, led
by John Watson (1878–1958), rejected any theory involving mental processes on the grounds
that introspection could not provide reliable evidence. Behaviorists insisted on studying only
objective measures of the percepts (or stimulus) given to an animal and its resulting actions
(or response). Behaviorism discovered a lot about rats and pigeons but had less success at
understanding humans.
Cognitive psychology, which views the brain as an information-processing device,
can be traced back at least to the works of William James (1842–1910). Helmholtz also
insisted that perception involved a form of unconscious logical inference. The cognitive
viewpoint was largely eclipsed by behaviorism in the United States, but at Cambridge’s Ap-
plied Psychology Unit, directed by Frederic Bartlett (1886–1969), cognitive modeling was
able to flourish. The Nature of Explanation, by Bartlett’s student and successor Kenneth
Craik (1943), forcefully reestablished the legitimacy of such “mental” terms as beliefs and
goals, arguing that they are just as scientific as, say, using pressure and temperature to talk
about gases, despite their being made of molecules that have neither. Craik specified the
three key steps of a knowledge-based agent: (1) the stimulus must be translated into an inter-
nal representation, (2) the representation is manipulated by cognitive processes to derive new
internal representations, and (3) these are in turn retranslated back into action. He clearly
explained why this was a good design for an agent:
If the organism carries a “small-scale model” of external reality and of its own possible
actions within its head, it is able to try out various alternatives, conclude which is the best
of them, react to future situations before they arise, utilize the knowledge of past events
in dealing with the present and future, and in every way to react in a much fuller, safer,
and more competent manner to the emergencies which face it. (Craik, 1943)
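Craik's three steps map directly onto the skeleton of an agent program. A minimal, runnable sketch, with a deliberately toy "small-scale model" of a one-dimensional world (all details invented for illustration):

GOAL = 5

def perceive(stimulus):
    return int(stimulus)                     # (1) stimulus -> internal representation

def simulate(position, action):              # the agent's small-scale model of the world
    return position + {"left": -1, "stay": 0, "right": +1}[action]

def agent_step(stimulus):
    position = perceive(stimulus)
    # (2) try out the alternatives on the model before acting
    predictions = {a: simulate(position, a) for a in ("left", "stay", "right")}
    # (3) translate the preferred prediction back into action
    return min(predictions, key=lambda a: abs(predictions[a] - GOAL))

print(agent_step("2"))   # 'right': it moves toward the goal at position 5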
After Craik’s death in a bicycle accident in 1945, his work was continued by Donald Broad-
bent, whose book Perception and Communication (1958) was one of the first works to model
psychological phenomena as information processing. Meanwhile, in the United States, the
development of computer modeling led to the creation of the field of cognitive science. The
field can be said to have started at a workshop in September 1956 at MIT. (We shall see that
this is just two months after the conference at which AI itself was “born.”) At the workshop,
George Miller presented The Magic Number Seven, Noam Chomsky presented Three Models
of Language, and Allen Newell and Herbert Simon presented The Logic Theory Machine.
These three influential papers showed how computer models could be used to address the
psychology of memory, language, and logical thinking, respectively. It is now a common
(although far from universal) view among psychologists that “a cognitive theory should be
like a computer program” (Anderson, 1980); that is, it should describe a detailed information-
processing mechanism whereby some cognitive function might be implemented.
1.2.6 Computer engineering
• How can we build an efficient computer?
For artificial intelligence to succeed, we need two things: intelligence and an artifact. The
computer has been the artifact of choice. The modern digital electronic computer was in-
vented independently and almost simultaneously by scientists in three countries embattled in
World War II. The first operational computer was the electromechanical Heath Robinson,8
built in 1940 by Alan Turing’s team for a single purpose: deciphering German messages. In
1943, the same group developed the Colossus, a powerful general-purpose machine based
on vacuum tubes.9 The first operational programmable computer was the Z-3, the inven-
tion of Konrad Zuse in Germany in 1941. Zuse also invented floating-point numbers and the
first high-level programming language, Plankalkül. The first electronic computer, the ABC,
was assembled by John Atanasoff and his student Clifford Berry between 1940 and 1942
at Iowa State University. Atanasoff’s research received little support or recognition; it was
the ENIAC, developed as part of a secret military project at the University of Pennsylvania
by a team including John Mauchly and John Eckert, that proved to be the most influential
forerunner of modern computers.
Since that time, each generation of computer hardware has brought an increase in speed
and capacity and a decrease in price. Performance doubled every 18 months or so until around
2005, when power dissipation problems led manufacturers to start multiplying the number of
CPU cores rather than the clock speed. Current expectations are that future increases in power
will come from massive parallelism—a curious convergence with the properties of the brain.
Of course, there were calculating devices before the electronic computer. The earliest
automated machines, dating from the 17th century, were discussed on page 6. The first pro-
grammable machine was a loom, devised in 1805 by Joseph Marie Jacquard (1752–1834),
that used punched cards to store instructions for the pattern to be woven. In the mid-19th
century, Charles Babbage (1792–1871) designed two machines, neither of which he com-
pleted. The Difference Engine was intended to compute mathematical tables for engineering
and scientific projects. It was finally built and shown to work in 1991 at the Science Museum
in London (Swade, 2000). Babbage’s Analytical Engine was far more ambitious: it included
addressable memory, stored programs, and conditional jumps and was the first artifact capa-
ble of universal computation. Babbage’s colleague Ada Lovelace, daughter of the poet Lord
Byron, was perhaps the world’s first programmer. (The programming language Ada is named
after her.) She wrote programs for the unfinished Analytical Engine and even speculated that
the machine could play chess or compose music.
AI also owes a debt to the software side of computer science, which has supplied the
operating systems, programming languages, and tools needed to write modern programs (and
papers about them). But this is one area where the debt has been repaid: work in AI has pio-
neered many ideas that have made their way back to mainstream computer science, including
time sharing, interactive interpreters, personal computers with windows and mice, rapid de-
velopment environments, the linked list data type, automatic storage management, and key
concepts of symbolic, functional, declarative, and object-oriented programming.
8 Heath Robinson was a cartoonist famous for his depictions of whimsical and absurdly complicated contrap-
tions for everyday tasks such as buttering toast.
9 In the postwar period, Turing wanted to use these computers for AI research—for example, one of the first
chess programs (Turing et al., 1953). His efforts were blocked by the British government.
1.2.7 Control theory and cybernetics
• How can artifacts operate under their own control?
Ktesibios of Alexandria (c. 250 B.C.) built the first self-controlling machine: a water clock
with a regulator that maintained a constant flow rate. This invention changed the definition
of what an artifact could do. Previously, only living things could modify their behavior in
response to changes in the environment. Other examples of self-regulating feedback control
systems include the steam engine governor, created by James Watt (1736–1819), and the
thermostat, invented by Cornelis Drebbel (1572–1633), who also invented the submarine.
The mathematical theory of stable feedback systems was developed in the 19th century.
The central figure in the creation of what is now called control theory was Norbert
Wiener (1894–1964). Wiener was a brilliant mathematician who worked with Bertrand Rus-
sell, among others, before developing an interest in biological and mechanical control systems
and their connection to cognition. Like Craik (who also used control systems as psychological
models), Wiener and his colleagues Arturo Rosenblueth and Julian Bigelow challenged the
behaviorist orthodoxy (Rosenblueth et al., 1943). They viewed purposive behavior as aris-
ing from a regulatory mechanism trying to minimize “error”—the difference between current
state and goal state. In the late 1940s, Wiener, along with Warren McCulloch, Walter Pitts,
and John von Neumann, organized a series of influential conferences that explored the new
mathematical and computational models of cognition. Wiener’s book Cybernetics (1948) be-
came a bestseller and awoke the public to the possibility of artificially intelligent machines.
Meanwhile, in Britain, W. Ross Ashby (Ashby, 1940) pioneered similar ideas. Ashby, Alan
Turing, Grey Walter, and others formed the Ratio Club for “those who had Wiener’s ideas
before Wiener’s book appeared.” Ashby’s Design for a Brain (1948, 1952) elaborated on his
idea that intelligence could be created by the use of homeostatic devices containing appro-
priate feedback loops to achieve stable adaptive behavior.
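The error-minimizing view of purposive behavior is easy to see in the simplest feedback loop, a proportional controller that repeatedly pushes the state toward the goal; the gain and dynamics below are invented for illustration:

goal, state, gain = 20.0, 5.0, 0.5

for step in range(10):
    error = goal - state        # the quantity the regulator tries to drive to zero
    state += gain * error       # corrective action proportional to the error

print(round(state, 3))          # close to 20.0 after a few steps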
Modern control theory, especially the branch known as stochastic optimal control, has
as its goal the design of systems that maximize an objective function over time. This roughly
matches our view of AI: designing systems that behave optimally. Why, then, are AI and
control theory two different fields, despite the close connections among their founders? The
answer lies in the close coupling between the mathematical techniques that were familiar to
the participants and the corresponding sets of problems that were encompassed in each world
view. Calculus and matrix algebra, the tools of control theory, lend themselves to systems that
are describable by fixed sets of continuous variables, whereas AI was founded in part as a way
to escape from these perceived limitations. The tools of logical inference and computation
allowed AI researchers to consider problems such as language, vision, and planning that fell
completely outside the control theorist’s purview.
1.2.8 Linguistics
• How does language relate to thought?
In 1957, B. F. Skinner published Verbal Behavior. This was a comprehensive, detailed ac-
count of the behaviorist approach to language learning, written by the foremost expert in
the field. But curiously, a review of the book became as well known as the book itself, and
served to almost kill off interest in behaviorism. The author of the review was the linguist
Noam Chomsky, who had just published a book on his own theory, Syntactic Structures.
Chomsky pointed out that the behaviorist theory did not address the notion of creativity in
language—it did not explain how a child could understand and make up sentences that he or
she had never heard before. Chomsky’s theory—based on syntactic models going back to the
Indian linguist Panini (c. 350 B.C.)—could explain this, and unlike previous theories, it was
formal enough that it could in principle be programmed.
Modern linguistics and AI, then, were “born” at about the same time, and grew up
together, intersecting in a hybrid field called computational linguistics or natural language
processing. The problem of understanding language soon turned out to be considerably more
complex than it seemed in 1957. Understanding language requires an understanding of the
subject matter and context, not just an understanding of the structure of sentences. This might
seem obvious, but it was not widely appreciated until the 1960s. Much of the early work in
knowledge representation (the study of how to put knowledge into a form that a computer
can reason with) was tied to language and informed by research in linguistics, which was
connected in turn to decades of work on the philosophical analysis of language.
1.3 THE HISTORY OF ARTIFICIAL INTELLIGENCE
With the background material behind us, we are ready to cover the development of AI itself.
1.3.1 The gestation of artificial intelligence (1943–1955)
The first work that is now generally recognized as AI was done by Warren McCulloch and
Walter Pitts (1943). They drew on three sources: knowledge of the basic physiology and
function of neurons in the brain; a formal analysis of propositional logic due to Russell and
Whitehead; and Turing’s theory of computation. They proposed a model of artificial neurons
in which each neuron is characterized as being “on” or “off,” with a switch to “on” occurring
in response to stimulation by a sufficient number of neighboring neurons. The state of a
neuron was conceived of as “factually equivalent to a proposition which proposed its adequate
stimulus.” They showed, for example, that any computable function could be computed by
some network of connected neurons, and that all the logical connectives (and, or, not, etc.)
could be implemented by simple net structures. McCulloch and Pitts also suggested that
suitably defined networks could learn. Donald Hebb (1949) demonstrated a simple updating
rule for modifying the connection strengths between neurons. His rule, now called Hebbian
learning, remains an influential model to this day.
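A McCulloch–Pitts unit is simply a threshold test on a weighted sum of its inputs. The sketch below, with weights and thresholds chosen by hand purely for illustration, realizes the basic connectives this way; Hebb's rule appears as a comment, since in its simplest form it is a one-line update:

def unit(weights, threshold):
    return lambda *x: int(sum(w * xi for w, xi in zip(weights, x)) >= threshold)

AND = unit([1, 1], 2)
OR = unit([1, 1], 1)
NOT = unit([-1], 0)

print(AND(1, 1), AND(1, 0), OR(1, 0), OR(0, 0), NOT(0), NOT(1))   # 1 0 1 0 1 0

# Hebb's rule, in its simplest form, strengthens a connection whenever the two
# units it joins fire together: w <- w + eta * x * y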
Two undergraduate students at Harvard, Marvin Minsky and Dean Edmonds, built the
first neural network computer in 1950. The SNARC, as it was called, used 3000 vacuum
tubes and a surplus automatic pilot mechanism from a B-24 bomber to simulate a network of
40 neurons. Later, at Princeton, Minsky studied universal computation in neural networks.
His Ph.D. committee was skeptical about whether this kind of work should be considered
mathematics, but von Neumann reportedly said, “If it isn’t now, it will be someday.” Minsky
was later to prove influential theorems showing the limitations of neural network research.
There were a number of early examples of work that can be characterized as AI, but
Alan Turing’s vision was perhaps the most influential. He gave lectures on the topic as early
as 1947 at the London Mathematical Society and articulated a persuasive agenda in his 1950
article “Computing Machinery and Intelligence.” Therein, he introduced the Turing Test,
machine learning, genetic algorithms, and reinforcement learning. He proposed the Child
Programme idea, explaining “Instead of trying to produce a programme to simulate the adult
mind, why not rather try to produce one which simulated the child’s?”
1.3.2 The birth of artificial intelligence (1956)
Princeton was home to another influential figure in AI, John McCarthy. After receiving his
PhD there in 1951 and working for two years as an instructor, McCarthy moved to Stan-
ford and then to Dartmouth College, which was to become the official birthplace of the field.
McCarthy convinced Minsky, Claude Shannon, and Nathaniel Rochester to help him bring
together U.S. researchers interested in automata theory, neural nets, and the study of intel-
ligence. They organized a two-month workshop at Dartmouth in the summer of 1956. The
proposal states:10
We propose that a 2 month, 10 man study of artificial intelligence be carried
out during the summer of 1956 at Dartmouth College in Hanover, New Hamp-
shire. The study is to proceed on the basis of the conjecture that every aspect of
learning or any other feature of intelligence can in principle be so precisely de-
scribed that a machine can be made to simulate it. An attempt will be made to find
how to make machines use language, form abstractions and concepts, solve kinds
of problems now reserved for humans, and improve themselves. We think that a
significant advance can be made in one or more of these problems if a carefully
selected group of scientists work on it together for a summer.
There were 10 attendees in all, including Trenchard More from Princeton, Arthur Samuel
from IBM, and Ray Solomonoff and Oliver Selfridge from MIT.
Two researchers from Carnegie Tech,11 Allen Newell and Herbert Simon, rather stole
the show. Although the others had ideas and in some cases programs for particular appli-
cations such as checkers, Newell and Simon already had a reasoning program, the Logic
Theorist (LT), about which Simon claimed, “We have invented a computer program capable
of thinking non-numerically, and thereby solved the venerable mind–body problem.”12 Soon
after the workshop, the program was able to prove most of the theorems in Chapter 2 of Rus-
10 This was the first official usage of McCarthy’s term artificial intelligence. Perhaps “computational rationality”
would have been more precise and less threatening, but “AI” has stuck. At the 50th anniversary of the Dartmouth
conference, McCarthy stated that he resisted the terms “computer” or “computational” in deference to Norbert
Wiener, who was promoting analog cybernetic devices rather than digital computers.
11 Now Carnegie Mellon University (CMU).
12 Newell and Simon also invented a list-processing language, IPL, to write LT. They had no compiler and
translated it into machine code by hand. To avoid errors, they worked in parallel, calling out binary numbers to
each other as they wrote each instruction to make sure they agreed.
sell and Whitehead’s Principia Mathematica. Russell was reportedly delighted when Simon
showed him that the program had come up with a proof for one theorem that was shorter than
the one in Principia. The editors of the Journal of Symbolic Logic were less impressed; they
rejected a paper coauthored by Newell, Simon, and Logic Theorist.
The Dartmouth workshop did not lead to any new breakthroughs, but it did introduce
all the major figures to each other. For the next 20 years, the field would be dominated by
these people and their students and colleagues at MIT, CMU, Stanford, and IBM.
Looking at the proposal for the Dartmouth workshop (McCarthy et al., 1955), we can
see why it was necessary for AI to become a separate field. Why couldn’t all the work done
in AI have taken place under the name of control theory or operations research or decision
theory, which, after all, have objectives similar to those of AI? Or why isn’t AI a branch
of mathematics? The first answer is that AI from the start embraced the idea of duplicating
human faculties such as creativity, self-improvement, and language use. None of the other
fields were addressing these issues. The second answer is methodology. AI is the only one
of these fields that is clearly a branch of computer science (although operations research does
share an emphasis on computer simulations), and AI is the only field to attempt to build
machines that will function autonomously in complex, changing environments.
1.3.3 Early enthusiasm, great expectations (1952–1969)
The early years of AI were full of successes—in a limited way. Given the primitive comput-
ers and programming tools of the time and the fact that only a few years earlier computers
were seen as things that could do arithmetic and no more, it was astonishing whenever a com-
puter did anything remotely clever. The intellectual establishment, by and large, preferred to
believe that “a machine can never do X.” (See Chapter 26 for a long list of X’s gathered
by Turing.) AI researchers naturally responded by demonstrating one X after another. John
McCarthy referred to this period as the “Look, Ma, no hands!” era.
Newell and Simon’s early success was followed up with the General Problem Solver,
or GPS. Unlike Logic Theorist, this program was designed from the start to imitate human
problem-solving protocols. Within the limited class of puzzles it could handle, it turned out
that the order in which the program considered subgoals and possible actions was similar to
that in which humans approached the same problems. Thus, GPS was probably the first pro-
gram to embody the “thinking humanly” approach. The success of GPS and subsequent pro-
grams as models of cognition led Newell and Simon (1976) to formulate the famous physical
symbol system hypothesis, which states that “a physical symbol system has the necessary and
sufficient means for general intelligent action.” What they meant is that any system (human
or machine) exhibiting intelligence must operate by manipulating data structures composed
of symbols. We will see later that this hypothesis has been challenged from many directions.
At IBM, Nathaniel Rochester and his colleagues produced some of the first AI pro-
grams. Herbert Gelernter (1959) constructed the Geometry Theorem Prover, which was
able to prove theorems that many students of mathematics would find quite tricky. Starting
in 1952, Arthur Samuel wrote a series of programs for checkers (draughts) that eventually
learned to play at a strong amateur level. Along the way, he disproved the idea that comput-
ers can do only what they are told to: his program quickly learned to play a better game than
its creator. The program was demonstrated on television in February 1956, creating a strong
impression. Like Turing, Samuel had trouble finding computer time. Working at night, he
used machines that were still on the testing floor at IBM’s manufacturing plant. Chapter 5
covers game playing, and Chapter 21 explains the learning techniques used by Samuel.
John McCarthy moved from Dartmouth to MIT and there made three crucial contribu-
tions in one historic year: 1958. In MIT AI Lab Memo No. 1, McCarthy defined the high-level
language Lisp, which was to become the dominant AI programming language for the next 30
years. With Lisp, McCarthy had the tool he needed, but access to scarce and expensive com-
puting resources was also a serious problem. In response, he and others at MIT invented time
sharing. Also in 1958, McCarthy published a paper entitled Programs with Common Sense,
in which he described the Advice Taker, a hypothetical program that can be seen as the first
complete AI system. Like the Logic Theorist and Geometry Theorem Prover, McCarthy’s
program was designed to use knowledge to search for solutions to problems. But unlike the
others, it was to embody general knowledge of the world. For example, he showed how
some simple axioms would enable the program to generate a plan to drive to the airport. The
program was also designed to accept new axioms in the normal course of operation, thereby
allowing it to achieve competence in new areas without being reprogrammed. The Advice
Taker thus embodied the central principles of knowledge representation and reasoning: that
it is useful to have a formal, explicit representation of the world and its workings and to be
able to manipulate that representation with deductive processes. It is remarkable how much
of the 1958 paper remains relevant today.
1958 also marked the year that Marvin Minsky moved to MIT. His initial collaboration
with McCarthy did not last, however. McCarthy stressed representation and reasoning in for-
mal logic, whereas Minsky was more interested in getting programs to work and eventually
developed an anti-logic outlook. In 1963, McCarthy started the AI lab at Stanford. His plan
to use logic to build the ultimate Advice Taker was advanced by J. A. Robinson’s discov-
ery in 1965 of the resolution method (a complete theorem-proving algorithm for first-order
logic; see Chapter 9). Work at Stanford emphasized general-purpose methods for logical
reasoning. Applications of logic included Cordell Green’s question-answering and planning
systems (Green, 1969b) and the Shakey robotics project at the Stanford Research Institute
(SRI). The latter project, discussed further in Chapter 25, was the first to demonstrate the
complete integration of logical reasoning and physical activity.
Minsky supervised a series of students who chose limited problems that appeared to
require intelligence to solve. These limited domains became known as microworlds. James
Slagle’s SAINT program (1963) was able to solve closed-form calculus integration problems
typical of first-year college courses. Tom Evans’s ANALOGY program (1968) solved geo-
metric analogy problems that appear in IQ tests. Daniel Bobrow’s STUDENT program (1967)
solved algebra story problems, such as the following:
If the number of customers Tom gets is twice the square of 20 percent of the number
of advertisements he runs, and the number of advertisements he runs is 45, what is the
number of customers Tom gets?
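(For the record, the intended arithmetic is simple once the sentence has been parsed: 20 percent of 45 is 9, the square of 9 is 81, and twice 81 gives 162 customers.)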
Figure 1.4 A scene from the blocks world. SHRDLU (Winograd, 1972) has just completed
the command “Find a block which is taller than the one you are holding and put it in the box.”
The most famous microworld was the blocks world, which consists of a set of solid blocks
placed on a tabletop (or more often, a simulation of a tabletop), as shown in Figure 1.4.
A typical task in this world is to rearrange the blocks in a certain way, using a robot hand
that can pick up one block at a time. The blocks world was home to the vision project of
David Huffman (1971), the vision and constraint-propagation work of David Waltz (1975),
the learning theory of Patrick Winston (1970), the natural-language-understanding program
of Terry Winograd (1972), and the planner of Scott Fahlman (1974).
Early work building on the neural networks of McCulloch and Pitts also flourished.
The work of Winograd and Cowan (1963) showed how a large number of elements could
collectively represent an individual concept, with a corresponding increase in robustness and
parallelism. Hebb’s learning methods were enhanced by Bernie Widrow (Widrow and Hoff,
1960; Widrow, 1962), who called his networks adalines, and by Frank Rosenblatt (1962)
with his perceptrons. The perceptron convergence theorem (Block et al., 1962) says that
the learning algorithm can adjust the connection strengths of a perceptron to match any input
data, provided such a match exists. These topics are covered in Chapter 20.
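The learning rule itself is only a few lines: whenever the current weights misclassify an example, nudge them toward the correct answer. The sketch below learns the linearly separable AND function; the learning rate and number of passes are arbitrary:

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, eta = [0.0, 0.0], 0.0, 0.1

def predict(x):
    return int(w[0] * x[0] + w[1] * x[1] + b > 0)

for _ in range(20):                      # a few passes over the data suffice here
    for x, target in data:
        error = target - predict(x)      # -1, 0, or +1
        w[0] += eta * error * x[0]
        w[1] += eta * error * x[1]
        b += eta * error

print([predict(x) for x, _ in data])     # [0, 0, 0, 1]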
1.3.4 A dose of reality (1966–1973)
From the beginning, AI researchers were not shy about making predictions of their coming
successes. The following statement by Herbert Simon in 1957 is often quoted:
It is not my aim to surprise or shock you—but the simplest way I can summarize is to say
that there are now in the world machines that think, that learn and that create. Moreover,
their ability to do these things is going to increase rapidly until—in a visible future—the
range of problems they can handle will be coextensive with the range to which the human
mind has been applied.
Terms such as “visible future” can be interpreted in various ways, but Simon also made
more concrete predictions: that within 10 years a computer would be chess champion, and
a significant mathematical theorem would be proved by machine. These predictions came
true (or approximately true) within 40 years rather than 10. Simon’s overconfidence was due
to the promising performance of early AI systems on simple examples. In almost all cases,
however, these early systems turned out to fail miserably when tried out on wider selections
of problems and on more difficult problems.
The first kind of difficulty arose because most early programs knew nothing of their
subject matter; they succeeded by means of simple syntactic manipulations. A typical story
occurred in early machine translation efforts, which were generously funded by the U.S. Na-
tional Research Council in an attempt to speed up the translation of Russian scientific papers
in the wake of the Sputnik launch in 1957. It was thought initially that simple syntactic trans-
formations based on the grammars of Russian and English, and word replacement from an
electronic dictionary, would suffice to preserve the exact meanings of sentences. The fact is
that accurate translation requires background knowledge in order to resolve ambiguity and
establish the content of the sentence. The famous retranslation of “the spirit is willing but
the flesh is weak” as “the vodka is good but the meat is rotten” illustrates the difficulties en-
countered. In 1966, a report by an advisory committee found that “there has been no machine
translation of general scientific text, and none is in immediate prospect.” All U.S. government
funding for academic translation projects was canceled. Today, machine translation is an im-
perfect but widely used tool for technical, commercial, government, and Internet documents.
The second kind of difficulty was the intractability of many of the problems that AI was
attempting to solve. Most of the early AI programs solved problems by trying out different
combinations of steps until the solution was found. This strategy worked initially because
microworlds contained very few objects and hence very few possible actions and very short
solution sequences. Before the theory of computational complexity was developed, it was
widely thought that “scaling up” to larger problems was simply a matter of faster hardware
and larger memories. The optimism that accompanied the development of resolution theorem
proving, for example, was soon dampened when researchers failed to prove theorems involv-
ing more than a few dozen facts. The fact that a program can find a solution in principle does
not mean that the program contains any of the mechanisms needed to find it in practice.
The illusion of unlimited computational power was not confined to problem-solving
programs. Early experiments in machine evolution (now called genetic algorithms) (Fried-
berg, 1958; Friedberg et al., 1959) were based on the undoubtedly correct belief that by
making an appropriate series of small mutations to a machine-code program, one can gen-
erate a program with good performance for any particular task. The idea, then, was to try
random mutations with a selection process to preserve mutations that seemed useful. De-
spite thousands of hours of CPU time, almost no progress was demonstrated. Modern genetic
algorithms use better representations and have shown more success.
Failure to come to grips with the “combinatorial explosion” was one of the main criti-
cisms of AI contained in the Lighthill report (Lighthill, 1973), which formed the basis for the
decision by the British government to end support for AI research in all but two universities.
(Oral tradition paints a somewhat different and more colorful picture, with political ambitions
and personal animosities whose description is beside the point.)
A third difficulty arose because of some fundamental limitations on the basic structures
being used to generate intelligent behavior. For example, Minsky and Papert’s book Percep-
trons (1969) proved that, although perceptrons (a simple form of neural network) could be
shown to learn anything they were capable of representing, they could represent very little. In
particular, a two-input perceptron (restricted to be simpler than the form Rosenblatt originally
studied) could not be trained to recognize when its two inputs were different. Although their
results did not apply to more complex, multilayer networks, research funding for neural-net
research soon dwindled to almost nothing. Ironically, the new back-propagation learning al-
gorithms for multilayer networks that were to cause an enormous resurgence in neural-net
research in the late 1980s were actually discovered first in 1969 (Bryson and Ho, 1969).
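The two-input result is elementary to verify. A single threshold unit with weights w1, w2 and threshold θ that outputs 1 exactly when its inputs differ would have to satisfy 0 ≤ θ (so that input (0, 0) gives 0), w1 > θ and w2 > θ (so that (1, 0) and (0, 1) give 1), and w1 + w2 ≤ θ (so that (1, 1) gives 0). Adding the middle pair gives w1 + w2 > 2θ ≥ θ ≥ w1 + w2, a contradiction, so no such unit exists; a hidden layer removes the difficulty.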
1.3.5 Knowledge-based systems: The key to power? (1969–1979)
The picture of problem solving that had arisen during the first decade of AI research was of
a general-purpose search mechanism trying to string together elementary reasoning steps to
find complete solutions. Such approaches have been called weak methods because, although
general, they do not scale up to large or difficult problem instances. The alternative to weak
methods is to use more powerful, domain-specific knowledge that allows larger reasoning
steps and can more easily handle typically occurring cases in narrow areas of expertise. One
might say that to solve a hard problem, you have to almost know the answer already.
The DENDRAL program (Buchanan et al., 1969) was an early example of this approach.
It was developed at Stanford, where Ed Feigenbaum (a former student of Herbert Simon),
Bruce Buchanan (a philosopher turned computer scientist), and Joshua Lederberg (a Nobel
laureate geneticist) teamed up to solve the problem of inferring molecular structure from the
information provided by a mass spectrometer. The input to the program consists of the ele-
mentary formula of the molecule (e.g., C6H13NO2) and the mass spectrum giving the masses
of the various fragments of the molecule generated when it is bombarded by an electron beam.
For example, the mass spectrum might contain a peak at m = 15, corresponding to the mass
of a methyl (CH3) fragment.
The naive version of the program generated all possible structures consistent with the
formula, and then predicted what mass spectrum would be observed for each, comparing this
with the actual spectrum. As one might expect, this is intractable for even moderate-sized
molecules. The DENDRAL researchers consulted analytical chemists and found that they
worked by looking for well-known patterns of peaks in the spectrum that suggested common
substructures in the molecule. For example, the following rule is used to recognize a ketone
(C=O) subgroup (which weighs 28):
if there are two peaks at x1 and x2 such that
(a) x1 + x2 = M + 28 (M is the mass of the whole molecule);
(b) x1 − 28 is a high peak;
(c) x2 − 28 is a high peak;
(d) At least one of x1 and x2 is high.
then there is a ketone subgroup
Recognizing that the molecule contains a particular substructure reduces the number of pos-
sible candidates enormously. DENDRAL was powerful because
All the relevant theoretical knowledge to solve these problems has been mapped over from
its general form in the [spectrum prediction component] (“first principles”) to efficient
special forms (“cookbook recipes”). (Feigenbaum et al., 1971)
The significance of DENDRAL was that it was the first successful knowledge-intensive sys-
tem: its expertise derived from large numbers of special-purpose rules. Later systems also
incorporated the main theme of McCarthy’s Advice Taker approach—the clean separation of
the knowledge (in the form of rules) from the reasoning component.
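The ketone rule quoted above translates almost word for word into code; the representation of a spectrum as a mapping from masses to peak heights, and the cutoff for a "high" peak, are invented here for illustration:

HIGH = 0.5   # arbitrary cutoff for what counts as a high peak

def suggests_ketone(spectrum, M):
    """spectrum: dict mapping mass to peak height; M: mass of the whole molecule."""
    peaks = [m for m, height in spectrum.items() if height > 0]
    for x1 in peaks:
        for x2 in peaks:
            if (x1 + x2 == M + 28 and                        # condition (a)
                    spectrum.get(x1 - 28, 0) > HIGH and      # condition (b)
                    spectrum.get(x2 - 28, 0) > HIGH and      # condition (c)
                    max(spectrum[x1], spectrum[x2]) > HIGH): # condition (d)
                return True
    return False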
With this lesson in mind, Feigenbaum and others at Stanford began the Heuristic Pro-
gramming Project (HPP) to investigate the extent to which the new methodology of expert
systems could be applied to other areas of human expertise. The next major effort was in
the area of medical diagnosis. Feigenbaum, Buchanan, and Dr. Edward Shortliffe developed
MYCIN to diagnose blood infections. With about 450 rules, MYCIN was able to perform
as well as some experts, and considerably better than junior doctors. It also contained two
major differences from DENDRAL. First, unlike the DENDRAL rules, no general theoretical
model existed from which the MYCIN rules could be deduced. They had to be acquired from
extensive interviewing of experts, who in turn acquired them from textbooks, other experts,
and direct experience of cases. Second, the rules had to reflect the uncertainty associated with
medical knowledge. MYCIN incorporated a calculus of uncertainty called certainty factors
(see Chapter 14), which seemed (at the time) to fit well with how doctors assessed the impact
of evidence on the diagnosis.
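One piece of the certainty-factor calculus is easy to illustrate: when two independent rules both support the same conclusion with positive factors cf1 and cf2, MYCIN-style systems combine them as cf1 + cf2(1 − cf1), so each new piece of supporting evidence closes part of the remaining gap to full certainty. A minimal sketch (the numbers are invented):

def combine_positive(cf1, cf2):
    return cf1 + cf2 * (1 - cf1)

print(combine_positive(0.4, 0.3))   # 0.58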
The importance of domain knowledge was also apparent in the area of understanding
natural language. Although Winograd’s SHRDLU system for understanding natural language
had engendered a good deal of excitement, its dependence on syntactic analysis caused some
of the same problems as occurred in the early machine translation work. It was able to
overcome ambiguity and understand pronoun references, but this was mainly because it was
designed specifically for one area—the blocks world. Several researchers, including Eugene
Charniak, a fellow graduate student of Winograd’s at MIT, suggested that robust language
understanding would require general knowledge about the world and a general method for
using that knowledge.
At Yale, linguist-turned-AI-researcher Roger Schank emphasized this point, claiming,
“There is no such thing as syntax,” which upset a lot of linguists but did serve to start a useful
discussion. Schank and his students built a series of programs (Schank and Abelson, 1977;
Wilensky, 1978; Schank and Riesbeck, 1981; Dyer, 1983) that all had the task of under-
standing natural language. The emphasis, however, was less on language per se and more on
the problems of representing and reasoning with the knowledge required for language under-
standing. The problems included representing stereotypical situations (Cullingford, 1981),
describing human memory organization (Rieger, 1976; Kolodner, 1983), and understanding
plans and goals (Wilensky, 1983).
The widespread growth of applications to real-world problems caused a concurrent in-
crease in the demands for workable knowledge representation schemes. A large number
of different representation and reasoning languages were developed. Some were based on
logic—for example, the Prolog language became popular in Europe, and the PLANNER fam-
ily in the United States. Others, following Minsky’s idea of frames (1975), adopted a more
structured approach, assembling facts about particular object and event types and arranging
the types into a large taxonomic hierarchy analogous to a biological taxonomy.
1.3.6 AI becomes an industry (1980–present)
The first successful commercial expert system, R1, began operation at the Digital Equipment
Corporation (McDermott, 1982). The program helped configure orders for new computer
systems; by 1986, it was saving the company an estimated $40 million a year. By 1988,
DEC’s AI group had 40 expert systems deployed, with more on the way. DuPont had 100 in
use and 500 in development, saving an estimated $10 million a year. Nearly every major U.S.
corporation had its own AI group and was either using or investigating expert systems.
In 1981, the Japanese announced the “Fifth Generation” project, a 10-year plan to build
intelligent computers running Prolog. In response, the United States formed the Microelec-
tronics and Computer Technology Corporation (MCC) as a research consortium designed to
assure national competitiveness. In both cases, AI was part of a broad effort, including chip
design and human-interface research. In Britain, the Alvey report reinstated the funding that
was cut by the Lighthill report.13 In all three countries, however, the projects never met their
ambitious goals.
Overall, the AI industry boomed from a few million dollars in 1980 to billions of dollars
in 1988, including hundreds of companies building expert systems, vision systems, robots,
and software and hardware specialized for these purposes. Soon after that came a period
called the “AI Winter,” in which many companies fell by the wayside as they failed to deliver
on extravagant promises.
1.3.7 The return of neural networks (1986–present)
In the mid-1980s at least four different groups reinvented the back-propagation learning
algorithm first found in 1969 by Bryson and Ho. The algorithm was applied to many learn-
ing problems in computer science and psychology, and the widespread dissemination of the
results in the collection Parallel Distributed Processing (Rumelhart and McClelland, 1986)
caused great excitement.
These so-called connectionist models of intelligent systems were seen by some as di-
rect competitors both to the symbolic models promoted by Newell and Simon and to the
logicist approach of McCarthy and others (Smolensky, 1988). It might seem obvious that
at some level humans manipulate symbols—in fact, Terrence Deacon’s book The Symbolic
13 To save embarrassment, a new field called IKBS (Intelligent Knowledge-Based Systems) was invented because
Artificial Intelligence had been officially canceled.
Species (1997) suggests that this is the defining characteristic of humans—but the most ar-
dent connectionists questioned whether symbol manipulation had any real explanatory role in
detailed models of cognition. This question remains unanswered, but the current view is that
connectionist and symbolic approaches are complementary, not competing. As occurred with
the separation of AI and cognitive science, modern neural network research has bifurcated
into two fields, one concerned with creating effective network architectures and algorithms
and understanding their mathematical properties, the other concerned with careful modeling
of the empirical properties of actual neurons and ensembles of neurons.
1.3.8 AI adopts the scientific method (1987–present)
Recent years have seen a revolution in both the content and the methodology of work in
artificial intelligence.14 It is now more common to build on existing theories than to propose
brand-new ones, to base claims on rigorous theorems or hard experimental evidence rather
than on intuition, and to show relevance to real-world applications rather than toy examples.
AI was founded in part as a rebellion against the limitations of existing fields like control
theory and statistics, but now it is embracing those fields. As David McAllester (1998) put it:
In the early period of AI it seemed plausible that new forms of symbolic computation,
e.g., frames and semantic networks, made much of classical theory obsolete. This led to
a form of isolationism in which AI became largely separated from the rest of computer
science. This isolationism is currently being abandoned. There is a recognition that
machine learning should not be isolated from information theory, that uncertain reasoning
should not be isolated from stochastic modeling, that search should not be isolated from
classical optimization and control, and that automated reasoning should not be isolated
from formal methods and static analysis.
In terms of methodology, AI has finally come firmly under the scientific method. To be ac-
cepted, hypotheses must be subjected to rigorous empirical experiments, and the results must
be analyzed statistically for their importance (Cohen, 1995). It is now possible to replicate
experiments by using shared repositories of test data and code.
The field of speech recognition illustrates the pattern. In the 1970s, a wide variety of
different architectures and approaches were tried. Many of these were rather ad hoc and
fragile, and were demonstrated on only a few specially selected examples. In recent years,
approaches based on hidden Markov models (HMMs) have come to dominate the area. Two
aspects of HMMs are relevant. First, they are based on a rigorous mathematical theory. This
has allowed speech researchers to build on several decades of mathematical results developed
in other fields. Second, they are generated by a process of training on a large corpus of
real speech data. This ensures that the performance is robust, and in rigorous blind tests the
HMMs have been improving their scores steadily. Speech technology and the related field of
handwritten character recognition are already making the transition to widespread industrial
14 Some have characterized this change as a victory of the neats—those who think that AI theories should be
grounded in mathematical rigor—over the scruffies—those who would rather try out lots of ideas, write some
programs, and then assess what seems to be working. Both approaches are important. A shift toward neatness
implies that the field has reached a level of stability and maturity. Whether that stability will be disrupted by a
new scruffy idea is another question.
and consumer applications. Note that there is no scientific claim that humans use HMMs to
recognize speech; rather, HMMs provide a mathematical framework for understanding the
problem and support the engineering claim that they work well in practice.
Machine translation follows the same course as speech recognition. In the 1950s there
was initial enthusiasm for an approach based on sequences of words, with models learned
according to the principles of information theory. That approach fell out of favor in the
1960s, but returned in the late 1990s and now dominates the field.
Neural networks also fit this trend. Much of the work on neural nets in the 1980s was
done in an attempt to scope out what could be done and to learn how neural nets differ from
“traditional” techniques. Using improved methodology and theoretical frameworks, the field
arrived at an understanding in which neural nets can now be compared with corresponding
techniques from statistics, pattern recognition, and machine learning, and the most promising
technique can be applied to each application. As a result of these developments, so-called
data mining technology has spawned a vigorous new industry.
Judea Pearl’s (1988) Probabilistic Reasoning in Intelligent Systems led to a new accep-
tance of probability and decision theory in AI, following a resurgence of interest epitomized
by Peter Cheeseman’s (1985) article “In Defense of Probability.” The Bayesian network
formalism was invented to allow efficient representation of, and rigorous reasoning with,
uncertain knowledge. This approach largely overcomes many problems of the probabilistic
reasoning systems of the 1960s and 1970s; it now dominates AI research on uncertain reason-
ing and expert systems. The approach allows for learning from experience, and it combines
the best of classical AI and neural nets. Work by Judea Pearl (1982a) and by Eric Horvitz and
David Heckerman (Horvitz and Heckerman, 1986; Horvitz et al., 1986) promoted the idea of
normative expert systems: ones that act rationally according to the laws of decision theory
and do not try to imitate the thought steps of human experts. The Windows™ operating sys-
tem includes several normative diagnostic expert systems for correcting problems. Chapters
13 to 16 cover this area.
Similar gentle revolutions have occurred in robotics, computer vision, and knowledge
representation. A better understanding of the problems and their complexity properties, com-
bined with increased mathematical sophistication, has led to workable research agendas and
robust methods. Although increased formalization and specialization led fields such as vision
and robotics to become somewhat isolated from “mainstream” AI in the 1990s, this trend has
reversed in recent years as tools from machine learning in particular have proved effective for
many problems. The process of reintegration is already yielding significant benefits.
1.3.9 The emergence of intelligent agents (1995–present)
Perhaps encouraged by the progress in solving the subproblems of AI, researchers have also
started to look at the “whole agent” problem again. The work of Allen Newell, John Laird,
and Paul Rosenbloom on SOAR (Newell, 1990; Laird et al., 1987) is the best-known example
of a complete agent architecture. One of the most important environments for intelligent
agents is the Internet. AI systems have become so common in Web-based applications that
the “-bot” suffix has entered everyday language. Moreover, AI technologies underlie many
Internet tools, such as search engines, recommender systems, and Web site aggregators.
One consequence of trying to build complete agents is the realization that the previously
isolated subfields of AI might need to be reorganized somewhat when their results are to be
tied together. In particular, it is now widely appreciated that sensory systems (vision, sonar,
speech recognition, etc.) cannot deliver perfectly reliable information about the environment.
Hence, reasoning and planning systems must be able to handle uncertainty. A second major
consequence of the agent perspective is that AI has been drawn into much closer contact
with other fields, such as control theory and economics, that also deal with agents. Recent
progress in the control of robotic cars has derived from a mixture of approaches ranging from
better sensors, control-theoretic integration of sensing, localization and mapping, as well as
a degree of high-level planning.
Despite these successes, some influential founders of AI, including John McCarthy
(2007), Marvin Minsky (2007), Nils Nilsson (1995, 2005) and Patrick Winston (Beal and
Winston, 2009), have expressed discontent with the progress of AI. They think that AI should
put less emphasis on creating ever-improved versions of applications that are good at a spe-
cific task, such as driving a car, playing chess, or recognizing speech. Instead, they believe
AI should return to its roots of striving for, in Simon’s words, “machines that think, that learn
and that create.” They call the effort human-level AI or HLAI; their first symposium was in
2004 (Minsky et al., 2004). The effort will require very large knowledge bases; Hendler et al.
(1995) discuss where these knowledge bases might come from.
A related idea is the subfield of Artificial General Intelligence or AGI (Goertzel and
Pennachin, 2007), which held its first conference and organized the Journal of Artificial Gen-
eral Intelligence in 2008. AGI looks for a universal algorithm for learning and acting in
any environment, and has its roots in the work of Ray Solomonoff (1964), one of the atten-
dees of the original 1956 Dartmouth conference. Guaranteeing that what we create is really
Friendly AI is also a concern (Yudkowsky, 2008; Omohundro, 2008), one we will return to
in Chapter 26.
1.3.10 The availability of very large data sets (2001–present)
Throughout the 60-year history of computer science, the emphasis has been on the algorithm
as the main subject of study. But some recent work in AI suggests that for many problems, it
makes more sense to worry about the data and be less picky about what algorithm to apply.
This is true because of the increasing availability of very large data sources: for example,
trillions of words of English and billions of images from the Web (Kilgarriff and Grefenstette,
2006); or billions of base pairs of genomic sequences (Collins et al., 2003).
One influential paper in this line was Yarowsky’s (1995) work on word-sense disam-
biguation: given the use of the word “plant” in a sentence, does that refer to flora or factory?
Previous approaches to the problem had relied on human-labeled examples combined with
machine learning algorithms. Yarowsky showed that the task can be done, with accuracy
above 96%, with no labeled examples at all. Instead, given a very large corpus of unanno-
tated text and just the dictionary definitions of the two senses—“works, industrial plant” and
“flora, plant life”—one can label examples in the corpus, and from there bootstrap to learn
new patterns that help label new examples. Banko and Brill (2001) show that techniques
like this perform even better as the amount of available text goes from a million words to a
billion and that the increase in performance from using more data exceeds any difference in
algorithm choice; a mediocre algorithm with 100 million words of unlabeled training data
outperforms the best known algorithm with 1 million words.
As another example, Hays and Efros (2007) discuss the problem of filling in holes in a
photograph. Suppose you use Photoshop to mask out an ex-friend from a group photo, but
now you need to fill in the masked area with something that matches the background. Hays
and Efros defined an algorithm that searches through a collection of photos to find something
that will match. They found the performance of their algorithm was poor when they used
a collection of only ten thousand photos, but crossed a threshold into excellent performance
when they grew the collection to two million photos.
Work like this suggests that the “knowledge bottleneck” in AI—the problem of how to
express all the knowledge that a system needs—may be solved in many applications by learn-
ing methods rather than hand-coded knowledge engineering, provided the learning algorithms
have enough data to go on (Halevy et al., 2009). Reporters have noticed the surge of new ap-
plications and have written that “AI Winter” may be yielding to a new Spring (Havenstein,
2005). As Kurzweil (2005) writes, “today, many thousands of AI applications are deeply
embedded in the infrastructure of every industry.”
1.4 THE STATE OF THE ART
What can AI do today? A concise answer is difficult because there are so many activities in
so many subfields. Here we sample a few applications; others appear throughout the book.
Robotic vehicles: A driverless robotic car named STANLEY sped through the rough
terrain of the Mojave desert at 22 mph, finishing the 132-mile course first to win the 2005
DARPA Grand Challenge. STANLEY is a Volkswagen Touareg outfitted with cameras, radar,
and laser rangefinders to sense the environment and onboard software to command the steer-
ing, braking, and acceleration (Thrun, 2006). Two years later, CMU's BOSS won the 2007 Ur-
ban Challenge, safely driving in traffic through the streets of a closed Air Force base, obeying
traffic rules and avoiding pedestrians and other vehicles.
Speech recognition: A traveler calling United Airlines to book a flight can have the en-
tire conversation guided by an automated speech recognition and dialog management system.
Autonomous planning and scheduling: A hundred million miles from Earth, NASA’s
Remote Agent program became the first on-board autonomous planning program to control
the scheduling of operations for a spacecraft (Jonsson et al., 2000). REMOTE AGENT gen-
erated plans from high-level goals specified from the ground and monitored the execution of
those plans—detecting, diagnosing, and recovering from problems as they occurred. Succes-
sor program MAPGEN (Al-Chang et al., 2004) plans the daily operations for NASA’s Mars
Exploration Rovers, and MEXAR2 (Cesta et al., 2007) did mission planning—both logistics
and science planning—for the European Space Agency’s Mars Express mission in 2008.
Game playing: IBM’s DEEP BLUE became the first computer program to defeat the
world champion in a chess match when it bested Garry Kasparov by a score of 3.5 to 2.5 in
an exhibition match (Goodman and Keene, 1997). Kasparov said that he felt a “new kind of
intelligence” across the board from him. Newsweek magazine described the match as “The
brain’s last stand.” The value of IBM’s stock increased by $18 billion. Human champions
studied Kasparov’s loss and were able to draw a few matches in subsequent years, but the
most recent human-computer matches have been won convincingly by the computer.
Spam fighting: Each day, learning algorithms classify over a billion messages as spam,
saving the recipient from having to waste time deleting what, for many users, could comprise
80% or 90% of all messages, if not classified away by algorithms. Because the spammers are
continually updating their tactics, it is difficult for a static programmed approach to keep up,
and learning algorithms work best (Sahami et al., 1998; Goodman and Heckerman, 2004).
Logistics planning: During the Persian Gulf crisis of 1991, U.S. forces deployed a
Dynamic Analysis and Replanning Tool, DART (Cross and Walker, 1994), to do automated
logistics planning and scheduling for transportation. This involved up to 50,000 vehicles,
cargo, and people at a time, and had to account for starting points, destinations, routes, and
conflict resolution among all parameters. The AI planning techniques generated in hours
a plan that would have taken weeks with older methods. The Defense Advanced Research
Project Agency (DARPA) stated that this single application more than paid back DARPA’s
30-year investment in AI.
Robotics: The iRobot Corporation has sold over two million Roomba robotic vacuum
cleaners for home use. The company also deploys the more rugged PackBot to Iraq and
Afghanistan, where it is used to handle hazardous materials, clear explosives, and identify
the location of snipers.
Machine Translation: A computer program automatically translates from Arabic to
English, allowing an English speaker to see the headline “Ardogan Confirms That Turkey
Would Not Accept Any Pressure, Urging Them to Recognize Cyprus.” The program uses a
statistical model built from examples of Arabic-to-English translations and from examples of
English text totaling two trillion words (Brants et al., 2007). None of the computer scientists
on the team speak Arabic, but they do understand statistics and machine learning algorithms.
These are just a few examples of artificial intelligence systems that exist today. Not
magic or science fiction—but rather science, engineering, and mathematics, to which this
book provides an introduction.
1.5 SUMMARY
This chapter defines AI and establishes the cultural background against which it has devel-
oped. Some of the important points are as follows:
• Different people approach AI with different goals in mind. Two important questions to
ask are: Are you concerned with thinking or behavior? Do you want to model humans
or work from an ideal standard?
• In this book, we adopt the view that intelligence is concerned mainly with rational
action. Ideally, an intelligent agent takes the best possible action in a situation. We
study the problem of building agents that are intelligent in this sense.
• Philosophers (going back to 400 B.C.) made AI conceivable by considering the ideas
that the mind is in some ways like a machine, that it operates on knowledge encoded in
some internal language, and that thought can be used to choose what actions to take.
• Mathematicians provided the tools to manipulate statements of logical certainty as well
as uncertain, probabilistic statements. They also set the groundwork for understanding
computation and reasoning about algorithms.
• Economists formalized the problem of making decisions that maximize the expected
outcome to the decision maker.
• Neuroscientists discovered some facts about how the brain works and the ways in which
it is similar to and different from computers.
• Psychologists adopted the idea that humans and animals can be considered information-
processing machines. Linguists showed that language use fits into this model.
• Computer engineers provided the ever-more-powerful machines that make AI applica-
tions possible.
• Control theory deals with designing devices that act optimally on the basis of feedback
from the environment. Initially, the mathematical tools of control theory were quite
different from AI, but the fields are coming closer together.
• The history of AI has had cycles of success, misplaced optimism, and resulting cutbacks
in enthusiasm and funding. There have also been cycles of introducing new creative
approaches and systematically refining the best ones.
• AI has advanced more rapidly in the past decade because of greater use of the scientific
method in experimenting with and comparing approaches.
• Recent progress in understanding the theoretical basis for intelligence has gone hand in
hand with improvements in the capabilities of real systems. The subfields of AI have
become more integrated, and AI has found common ground with other disciplines.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
The methodological status of artificial intelligence is investigated in The Sciences of the Artifi-
cial, by Herb Simon (1981), which discusses research areas concerned with complex artifacts.
It explains how AI can be viewed as both science and mathematics. Cohen (1995) gives an
overview of experimental methodology within AI.
The Turing Test (Turing, 1950) is discussed by Shieber (1994), who severely criticizes
the usefulness of its instantiation in the Loebner Prize competition, and by Ford and Hayes
(1995), who argue that the test itself is not helpful for AI. Bringsjord (2008) gives advice for
a Turing Test judge. Shieber (2004) and Epstein et al. (2008) collect a number of essays on
the Turing Test. Artificial Intelligence: The Very Idea, by John Haugeland (1985), gives a
readable account of the philosophical and practical problems of AI. Significant early papers
in AI are anthologized in the collections by Webber and Nilsson (1981) and by Luger (1995).
The Encyclopedia of AI (Shapiro, 1992) contains survey articles on almost every topic in
AI, as does Wikipedia. These articles usually provide a good entry point into the research
literature on each topic. An insightful and comprehensive history of AI is given by Nils
Nilsson (2009), one of the early pioneers of the field.
The most recent work appears in the proceedings of the major AI conferences: the bi-
ennial International Joint Conference on AI (IJCAI), the annual European Conference on AI
(ECAI), and the National Conference on AI, more often known as AAAI, after its sponsoring
organization. The major journals for general AI are Artificial Intelligence, Computational
Intelligence, the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE In-
telligent Systems, and the electronic Journal of Artificial Intelligence Research. There are also
many conferences and journals devoted to specific areas, which we cover in the appropriate
chapters. The main professional societies for AI are the American Association for Artificial
Intelligence (AAAI), the ACM Special Interest Group in Artificial Intelligence (SIGART),
and the Society for Artificial Intelligence and Simulation of Behaviour (AISB). AAAI’s AI
Magazine contains many topical and tutorial articles, and its Web site, aaai.org, contains
news, tutorials, and background information.
EXERCISES
These exercises are intended to stimulate discussion, and some might be set as term projects.
Alternatively, preliminary attempts can be made now, and these attempts can be reviewed
after the completion of the book.
1.1 Define in your own words: (a) intelligence, (b) artificial intelligence, (c) agent, (d)
rationality, (e) logical reasoning.
1.2 Every year the Loebner Prize is awarded to the program that comes closest to passing
a version of the Turing Test. Research and report on the latest winner of the Loebner Prize.
What techniques does it use? How does it advance the state of the art in AI?
1.3 Are reflex actions (such as flinching from a hot stove) rational? Are they intelligent?
1.4 There are well-known classes of problems that are intractably difficult for computers,
and other classes that are provably undecidable. Does this mean that AI is impossible?
1.5 The neural structure of the sea slug Aplysia has been widely studied (first by Nobel
Laureate Eric Kandel) because it has only about 20,000 neurons, most of them large and
easily manipulated. Assuming that the cycle time for an Aplysia neuron is roughly the same
as for a human neuron, how does the computational power, in terms of memory updates per
second, compare with the high-end computer described in Figure 1.3?
1.6 How could introspection—reporting on one’s inner thoughts—be inaccurate? Could I
be wrong about what I’m thinking? Discuss.
1.7 To what extent are the following computer systems instances of artificial intelligence:
• Supermarket bar code scanners.
• Voice-activated telephone menus.
• Spelling and grammar correction features in Microsoft Word.
• Internet routing algorithms that respond dynamically to the state of the network.
1.8 Many of the computational models of cognitive activities that have been proposed in-
volve quite complex mathematical operations, such as convolving an image with a Gaussian
or finding a minimum of the entropy function. Most humans (and certainly all animals) never
learn this kind of mathematics at all, almost no one learns it before college, and almost no
one can compute the convolution of a function with a Gaussian in their head. What sense
does it make to say that the “vision system” is doing this kind of mathematics, whereas the
actual person has no idea how to do it?
1.9 Some authors have claimed that perception and motor skills are the most important part
of intelligence, and that “higher level” capacities are necessarily parasitic—simple add-ons to
these underlying facilities. Certainly, most of evolution and a large part of the brain have been
devoted to perception and motor skills, whereas AI has found tasks such as game playing and
logical inference to be easier, in many ways, than perceiving and acting in the real world. Do
you think that AI’s traditional focus on higher-level cognitive abilities is misplaced?
1.10 Is AI a science, or is it engineering? Or neither or both? Explain.
1.11 “Surely computers cannot be intelligent—they can do only what their programmers
tell them.” Is the latter statement true, and does it imply the former?
1.12 “Surely animals cannot be intelligent—they can do only what their genes tell them.”
Is the latter statement true, and does it imply the former?
1.13 “Surely animals, humans, and computers cannot be intelligent—they can do only what
their constituent atoms are told to do by the laws of physics.” Is the latter statement true, and
does it imply the former?
1.14 Examine the AI literature to discover whether the following tasks can currently be
solved by computers:
a. Playing a decent game of table tennis (Ping-Pong).
b. Driving in the center of Cairo, Egypt.
c. Driving in Victorville, California.
d. Buying a week’s worth of groceries at the market.
e. Buying a week’s worth of groceries on the Web.
f. Playing a decent game of bridge at a competitive level.
g. Discovering and proving new mathematical theorems.
h. Writing an intentionally funny story.
i. Giving competent legal advice in a specialized area of law.
j. Translating spoken English into spoken Swedish in real time.
k. Performing a complex surgical operation.
For the currently infeasible tasks, try to find out what the difficulties are and predict when, if
ever, they will be overcome.
1.15 Various subfields of AI have held contests by defining a standard task and inviting re-
searchers to do their best. Examples include the DARPA Grand Challenge for robotic cars,
The International Planning Competition, the Robocup robotic soccer league, the TREC infor-
mation retrieval event, and contests in machine translation and speech recognition. Investigate
five of these contests, and describe the progress made over the years. To what degree have the
contests advanced the state of the art in AI? To what degree do they hurt the field by drawing
energy away from new ideas?
2 INTELLIGENT AGENTS
In which we discuss the nature of agents, perfect or otherwise, the diversity of
environments, and the resulting menagerie of agent types.
Chapter 1 identified the concept of rational agents as central to our approach to artificial
intelligence. In this chapter, we make this notion more concrete. We will see that the concept
of rationality can be applied to a wide variety of agents operating in any imaginable environ-
ment. Our plan in this book is to use this concept to develop a small set of design principles
for building successful agents—systems that can reasonably be called intelligent.
We begin by examining agents, environments, and the coupling between them. The
observation that some agents behave better than others leads naturally to the idea of a rational
agent—one that behaves as well as possible. How well an agent can behave depends on
the nature of the environment; some environments are more difficult than others. We give a
crude categorization of environments and show how properties of an environment influence
the design of suitable agents for that environment. We describe a number of basic “skeleton”
agent designs, which we flesh out in the rest of the book.
2.1 AGENTS AND ENVIRONMENTS
An agent is anything that can be viewed as perceiving its environment through sensors and
acting upon that environment through actuators. This simple idea is illustrated in Figure 2.1.
A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and so
on for actuators. A robotic agent might have cameras and infrared range finders for sensors
and various motors for actuators. A software agent receives keystrokes, file contents, and
network packets as sensory inputs and acts on the environment by displaying on the screen,
writing files, and sending network packets.
We use the term percept to refer to the agent’s perceptual inputs at any given instant. An
agent’s percept sequence is the complete history of everything the agent has ever perceived.
In general, an agent’s choice of action at any given instant can depend on the entire percept
sequence observed to date, but not on anything it hasn’t perceived. By specifying the agent’s
choice of action for every possible percept sequence, we have said more or less everything
Figure 2.1 Agents interact with environments through sensors and actuators.
there is to say about the agent. Mathematically speaking, we say that an agent’s behavior is
described by the agent function that maps any given percept sequence to an action.
We can imagine tabulating the agent function that describes any given agent; for most
agents, this would be a very large table—infinite, in fact, unless we place a bound on the
length of percept sequences we want to consider. Given an agent to experiment with, we can,
in principle, construct this table by trying out all possible percept sequences and recording
which actions the agent does in response.1 The table is, of course, an external characterization
of the agent. Internally, the agent function for an artificial agent will be implemented by an
agent program. It is important to keep these two ideas distinct. The agent function is an
abstract mathematical description; the agent program is a concrete implementation, running
within some physical system.
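To make the distinction concrete, here is a minimal sketch of a table-driven agent program in Python (our own illustration, not taken from the book's code; the names are invented). The table is an explicit, partial tabulation of the agent function, and the program simply appends each new percept to the sequence observed so far and looks that sequence up:

def make_table_driven_agent(table):
    # table maps tuples of percepts (the percept sequence to date) to actions;
    # sequences missing from the table get no action (None).
    percepts = []
    def agent_program(percept):
        percepts.append(percept)
        return table.get(tuple(percepts))
    return agent_program

# Example: the first two rows of Figure 2.3, using percepts of the form (location, status).
agent = make_table_driven_agent({(('A', 'Clean'),): 'Right',
                                 (('A', 'Dirty'),): 'Suck'})
assert agent(('A', 'Dirty')) == 'Suck'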
To illustrate these ideas, we use a very simple example—the vacuum-cleaner world
shown in Figure 2.2. This world is so simple that we can describe everything that happens;
it’s also a made-up world, so we can invent many variations. This particular world has just two
locations: squares A and B. The vacuum agent perceives which square it is in and whether
there is dirt in the square. It can choose to move left, move right, suck up the dirt, or do
nothing. One very simple agent function is the following: if the current square is dirty, then
suck; otherwise, move to the other square. A partial tabulation of this agent function is shown
in Figure 2.3 and an agent program that implements it appears in Figure 2.8 on page 48.
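As a preview, the following is a rough Python sketch of such an agent program (ours, not the book's Figure 2.8). It implements exactly the rule just stated, using percepts of the form (location, status):

def reflex_vacuum_agent(percept):
    # If the current square is dirty, suck; otherwise move to the other square.
    location, status = percept
    if status == 'Dirty':
        return 'Suck'
    return 'Right' if location == 'A' else 'Left'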
Looking at Figure 2.3, we see that various vacuum-world agents can be defined simply
by filling in the right-hand column in various ways. The obvious question, then, is this: What
is the right way to fill out the table? In other words, what makes an agent good or bad,
intelligent or stupid? We answer these questions in the next section.
1 If the agent uses some randomization to choose its actions, then we would have to try each sequence many
times to identify the probability of each action. One might imagine that acting randomly is rather silly, but we
show later in this chapter that it can be very intelligent.
Figure 2.2 A vacuum-cleaner world with just two locations.
Percept sequence                                   Action

[A, Clean]                                         Right
[A, Dirty]                                         Suck
[B, Clean]                                         Left
[B, Dirty]                                         Suck
[A, Clean], [A, Clean]                             Right
[A, Clean], [A, Dirty]                             Suck
...                                                ...
[A, Clean], [A, Clean], [A, Clean]                 Right
[A, Clean], [A, Clean], [A, Dirty]                 Suck
...                                                ...

Figure 2.3 Partial tabulation of a simple agent function for the vacuum-cleaner world
shown in Figure 2.2.
Before closing this section, we should emphasize that the notion of an agent is meant to
be a tool for analyzing systems, not an absolute characterization that divides the world into
agents and non-agents. One could view a hand-held calculator as an agent that chooses the
action of displaying “4” when given the percept sequence “2 + 2 =,” but such an analysis
would hardly aid our understanding of the calculator. In a sense, all areas of engineering can
be seen as designing artifacts that interact with the world; AI operates at (what the authors
consider to be) the most interesting end of the spectrum, where the artifacts have significant
computational resources and the task environment requires nontrivial decision making.
2.2 GOOD BEHAVIOR: THE CONCEPT OF RATIONALITY
A rational agent is one that does the right thing—conceptually speaking, every entry in the
table for the agent function is filled out correctly. Obviously, doing the right thing is better
than doing the wrong thing, but what does it mean to do the right thing?
We answer this age-old question in an age-old way: by considering the consequences
of the agent’s behavior. When an agent is plunked down in an environment, it generates a
sequence of actions according to the percepts it receives. This sequence of actions causes the
environment to go through a sequence of states. If the sequence is desirable, then the agent
has performed well. This notion of desirability is captured by a performance measure that
evaluates any given sequence of environment states.
Notice that we said environment states, not agent states. If we define success in terms
of the agent’s opinion of its own performance, an agent could achieve perfect rationality simply
by deluding itself that its performance was perfect. Human agents in particular are notorious
for “sour grapes”—believing they did not really want something (e.g., a Nobel Prize) after
not getting it.
Obviously, there is not one fixed performance measure for all tasks and agents; typically,
a designer will devise one appropriate to the circumstances. This is not as easy as it sounds.
Consider, for example, the vacuum-cleaner agent from the preceding section. We might
propose to measure performance by the amount of dirt cleaned up in a single eight-hour shift.
With a rational agent, of course, what you ask for is what you get. A rational agent can
maximize this performance measure by cleaning up the dirt, then dumping it all on the floor,
then cleaning it up again, and so on. A more suitable performance measure would reward the
agent for having a clean floor. For example, one point could be awarded for each clean square
at each time step (perhaps with a penalty for electricity consumed and noise generated). As
a general rule, it is better to design performance measures according to what one actually
wants in the environment, rather than according to how one thinks the agent should behave.
Even when the obvious pitfalls are avoided, there remain some knotty issues to untangle.
For example, the notion of “clean floor” in the preceding paragraph is based on average
cleanliness over time. Yet the same average cleanliness can be achieved by two different
agents, one of which does a mediocre job all the time while the other cleans energetically but
takes long breaks. Which is preferable might seem to be a fine point of janitorial science, but
in fact it is a deep philosophical question with far-reaching implications. Which is better—
a reckless life of highs and lows, or a safe but humdrum existence? Which is better—an
economy where everyone lives in moderate poverty, or one in which some live in plenty
while others are very poor? We leave these questions as an exercise for the diligent reader.
2.2.1 Rationality
What is rational at any given time depends on four things:
• The performance measure that defines the criterion of success.
• The agent’s prior knowledge of the environment.
• The actions that the agent can perform.
• The agent’s percept sequence to date.
This leads to a definition of a rational agent:
For each possible percept sequence, a rational agent should select an action that is ex-
pected to maximize its performance measure, given the evidence provided by the percept
sequence and whatever built-in knowledge the agent has.
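Read as pseudocode, the definition says: pick the action whose expected performance is highest, given the percept sequence. A schematic Python sketch of this choice (our own; expected_performance is a hypothetical helper standing in for whatever estimate the agent's built-in knowledge supports):

def rational_action(actions, percept_sequence, expected_performance):
    # expected_performance(action, percept_sequence) should return the value of the
    # performance measure the agent expects to achieve, given the evidence to date.
    return max(actions, key=lambda a: expected_performance(a, percept_sequence))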
Consider the simple vacuum-cleaner agent that cleans a square if it is dirty and moves to the
other square if not; this is the agent function tabulated in Figure 2.3. Is this a rational agent?
That depends! First, we need to say what the performance measure is, what is known about
the environment, and what sensors and actuators the agent has. Let us assume the following:
• The performance measure awards one point for each clean square at each time step,
over a “lifetime” of 1000 time steps.
• The “geography” of the environment is known a priori (Figure 2.2) but the dirt distri-
bution and the initial location of the agent are not. Clean squares stay clean and sucking
cleans the current square. The Left and Right actions move the agent left and right
except when this would take the agent outside the environment, in which case the agent
remains where it is.
• The only available actions are Left, Right, and Suck.
• The agent correctly perceives its location and whether that location contains dirt.
We claim that under these circumstances the agent is indeed rational; its expected perfor-
mance is at least as high as any other agent’s. Exercise 2.1 asks you to prove this.
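One way to make this claim concrete (though not a proof) is to simulate the environment directly. The sketch below is our own illustration: it implements the two-square world under the assumptions above, awards one point per clean square per time step over a 1000-step lifetime, and averages over the eight possible starting configurations, using the reflex_vacuum_agent sketched earlier.

import itertools

def simulate(agent, dirt, location, steps=1000):
    # Two-square vacuum world: clean squares stay clean, Suck cleans the current
    # square, Left/Right move between A and B, and one point is awarded for each
    # clean square at each time step.
    score = 0
    for _ in range(steps):
        action = agent((location, 'Dirty' if dirt[location] else 'Clean'))
        if action == 'Suck':
            dirt[location] = False
        elif action == 'Right':
            location = 'B'
        elif action == 'Left':
            location = 'A'
        score += sum(1 for square in 'AB' if not dirt[square])
    return score

# Average over the eight starting configurations: dirt present or absent in each
# square, and the agent starting in A or B.
configs = [({'A': a, 'B': b}, loc)
           for a, b in itertools.product([True, False], repeat=2)
           for loc in 'AB']
average = sum(simulate(reflex_vacuum_agent, dict(d), loc)
              for d, loc in configs) / len(configs)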
One can see easily that the same agent would be irrational under different circum-
stances. For example, once all the dirt is cleaned up, the agent will oscillate needlessly back
and forth; if the performance measure includes a penalty of one point for each movement left
or right, the agent will fare poorly. A better agent for this case would do nothing once it is
sure that all the squares are clean. If clean squares can become dirty again, the agent should
occasionally check and re-clean them if needed. If the geography of the environment is un-
known, the agent will need to explore it rather than stick to squares A and B. Exercise 2.1
asks you to design agents for these cases.
2.2.2 Omniscience, learning, and autonomy
We need to be careful to distinguish between rationality and omniscience. An omniscient
agent knows the actual outcome of its actions and can act accordingly; but omniscience is
impossible in reality. Consider the following example: I am walking along the Champs
Elysées one day and I see an old friend across the street. There is no traffic nearby and I’m
not otherwise engaged, so, being rational, I start to cross the street. Meanwhile, at 33,000
feet, a cargo door falls off a passing airliner,2 and before I make it to the other side of the
street I am flattened. Was I irrational to cross the street? It is unlikely that my obituary would
read “Idiot attempts to cross street.”
This example shows that rationality is not the same as perfection. Rationality max-
imizes expected performance, while perfection maximizes actual performance. Retreating
from a requirement of perfection is not just a question of being fair to agents. The point is
that if we expect an agent to do what turns out to be the best action after the fact, it will be
impossible to design an agent to fulfill this specification—unless we improve the performance
of crystal balls or time machines.
2 See N. Henderson, “New door latches urged for Boeing 747 jumbo jets,” Washington Post, August 24, 1989.
Our definition of rationality does not require omniscience, then, because the rational
choice depends only on the percept sequence to date. We must also ensure that we haven’t
inadvertently allowed the agent to engage in decidedly underintelligent activities. For exam-
ple, if an agent does not look both ways before crossing a busy road, then its percept sequence
will not tell it that there is a large truck approaching at high speed. Does our definition of
rationality say that it’s now OK to cross the road? Far from it! First, it would not be rational
to cross the road given this uninformative percept sequence: the risk of accident from cross-
ing without looking is too great. Second, a rational agent should choose the “looking” action
before stepping into the street, because looking helps maximize the expected performance.
Doing actions in order to modify future percepts—sometimes called information gather-
ing—is an important part of rationality and is covered in depth in Chapter 16. A second
example of information gathering is provided by the exploration that must be undertaken by
a vacuum-cleaning agent in an initially unknown environment.
Our definition requires a rational agent not only to gather information but also to learn
as much as possible from what it perceives. The agent’s initial configuration could reflect
some prior knowledge of the environment, but as the agent gains experience this may be
modified and augmented. There are extreme cases in which the environment is completely
known a priori. In such cases, the agent need not perceive or learn; it simply acts correctly.
Of course, such agents are fragile. Consider the lowly dung beetle. After digging its nest and
laying its eggs, it fetches a ball of dung from a nearby heap to plug the entrance. If the ball of
dung is removed from its grasp en route, the beetle continues its task and pantomimes plug-
ging the nest with the nonexistent dung ball, never noticing that it is missing. Evolution has
built an assumption into the beetle’s behavior, and when it is violated, unsuccessful behavior
results. Slightly more intelligent is the sphex wasp. The female sphex will dig a burrow, go
out and sting a caterpillar and drag it to the burrow, enter the burrow again to check all is
well, drag the caterpillar inside, and lay its eggs. The caterpillar serves as a food source when
the eggs hatch. So far so good, but if an entomologist moves the caterpillar a few inches
away while the sphex is doing the check, it will revert to the “drag” step of its plan and will
continue the plan without modification, even after dozens of caterpillar-moving interventions.
The sphex is unable to learn that its innate plan is failing, and thus will not change it.
To the extent that an agent relies on the prior knowledge of its designer rather than
on its own percepts, we say that the agent lacks autonomy. A rational agent should be
autonomous—it should learn what it can to compensate for partial or incorrect prior knowl-
edge. For example, a vacuum-cleaning agent that learns to foresee where and when additional
dirt will appear will do better than one that does not. As a practical matter, one seldom re-
quires complete autonomy from the start: when the agent has had little or no experience, it
would have to act randomly unless the designer gave some assistance. So, just as evolution
provides animals with enough built-in reflexes to survive long enough to learn for themselves,
it would be reasonable to provide an artificial intelligent agent with some initial knowledge
as well as an ability to learn. After sufficient experience of its environment, the behavior
of a rational agent can become effectively independent of its prior knowledge. Hence, the
incorporation of learning allows one to design a single rational agent that will succeed in a
vast variety of environments.
2.3 THE NATURE OF ENVIRONMENTS
Now that we have a definition of rationality, we are almost ready to think about building
rational agents. First, however, we must think about task environments, which are essen-
tially the “problems” to which rational agents are the “solutions.” We begin by showing how
to specify a task environment, illustrating the process with a number of examples. We then
show that task environments come in a variety of flavors. The flavor of the task environment
directly affects the appropriate design for the agent program.
2.3.1 Specifying the task environment
In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify
the performance measure, the environment, and the agent’s actuators and sensors. We group
all these under the heading of the task environment. For the acronymically minded, we call
this the PEAS (Performance, Environment, Actuators, Sensors) description. In designing an
agent, the first step must always be to specify the task environment as fully as possible.
The vacuum world was a simple example; let us consider a more complex problem: an
automated taxi driver. We should point out, before the reader becomes alarmed, that a fully
automated taxi is currently somewhat beyond the capabilities of existing technology. (page 28
describes an existing driving robot.) The full driving task is extremely open-ended. There is
no limit to the novel combinations of circumstances that can arise—another reason we chose
it as a focus for discussion. Figure 2.4 summarizes the PEAS description for the taxi’s task
environment. We discuss each element in more detail in the following paragraphs.
Agent Type: Taxi driver
Performance Measure: Safe, fast, legal, comfortable trip, maximize profits
Environment: Roads, other traffic, pedestrians, customers
Actuators: Steering, accelerator, brake, signal, horn, display
Sensors: Cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard

Figure 2.4 PEAS description of the task environment for an automated taxi.
First, what is the performance measure to which we would like our automated driver
to aspire? Desirable qualities include getting to the correct destination; minimizing fuel con-
sumption and wear and tear; minimizing the trip time or cost; minimizing violations of traffic
laws and disturbances to other drivers; maximizing safety and passenger comfort; maximiz-
ing profits. Obviously, some of these goals conflict, so tradeoffs will be required.
Next, what is the driving environment that the taxi will face? Any taxi driver must
deal with a variety of roads, ranging from rural lanes and urban alleys to 12-lane freeways.
The roads contain other traffic, pedestrians, stray animals, road works, police cars, puddles,
and potholes. The taxi must also interact with potential and actual passengers. There are also
some optional choices. The taxi might need to operate in Southern California, where snow
is seldom a problem, or in Alaska, where it seldom is not. It could always be driving on the
right, or we might want it to be flexible enough to drive on the left when in Britain or Japan.
Obviously, the more restricted the environment, the easier the design problem.
The actuators for an automated taxi include those available to a human driver: control
over the engine through the accelerator and control over steering and braking. In addition, it
will need output to a display screen or voice synthesizer to talk back to the passengers, and
perhaps some way to communicate with other vehicles, politely or otherwise.
The basic sensors for the taxi will include one or more controllable video cameras so
that it can see the road; it might augment these with infrared or sonar sensors to detect dis-
tances to other cars and obstacles. To avoid speeding tickets, the taxi should have a speedome-
ter, and to control the vehicle properly, especially on curves, it should have an accelerometer.
To determine the mechanical state of the vehicle, it will need the usual array of engine, fuel,
and electrical system sensors. Like many human drivers, it might want a global positioning
system (GPS) so that it doesn’t get lost. Finally, it will need a keyboard or microphone for
the passenger to request a destination.
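For concreteness, a PEAS description is just a four-part record, and it can be written down as such. The sketch below is our own Python rendering of the taxi's task environment, with the field values drawn from the discussion above:

from dataclasses import dataclass

@dataclass
class PEAS:
    # The four elements of a task environment description.
    performance_measure: list
    environment: list
    actuators: list
    sensors: list

taxi = PEAS(
    performance_measure=['safe', 'fast', 'legal', 'comfortable trip', 'maximize profits'],
    environment=['roads', 'other traffic', 'pedestrians', 'customers'],
    actuators=['steering', 'accelerator', 'brake', 'signal', 'horn', 'display'],
    sensors=['cameras', 'sonar', 'speedometer', 'GPS', 'odometer',
             'accelerometer', 'engine sensors', 'keyboard'])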
In Figure 2.5, we have sketched the basic PEAS elements for a number of additional
agent types. Further examples appear in Exercise 2.4. It may come as a surprise to some read-
ers that our list of agent types includes some programs that operate in the entirely artificial
environment defined by keyboard input and character output on a screen. “Surely,” one might
say, “this is not a real environment, is it?” In fact, what matters is not the distinction between
“real” and “artificial” environments, but the complexity of the relationship among the behav-
ior of the agent, the percept sequence generated by the environment, and the performance
measure. Some “real” environments are actually quite simple. For example, a robot designed
to inspect parts as they come by on a conveyor belt can make use of a number of simplifying
assumptions: that the lighting is always just so, that the only thing on the conveyor belt will
be parts of a kind that it knows about, and that only two actions (accept or reject) are possible.
In contrast, some software agents (or software robots or softbots) exist in rich, unlim-
ited domains. Imagine a softbot Web site operator designed to scan Internet news sources and
show the interesting items to its users, while selling advertising space to generate revenue.
To do well, that operator will need some natural language processing abilities, it will need
to learn what each user and advertiser is interested in, and it will need to change its plans
dynamically—for example, when the connection for one news source goes down or when a
new one comes online. The Internet is an environment whose complexity rivals that of the
physical world and whose inhabitants include many artificial and human agents.
2.3.2 Properties of task environments
The range of task environments that might arise in AI is obviously vast. We can, however,
identify a fairly small number of dimensions along which task environments can be catego-
rized. These dimensions determine, to a large extent, the appropriate agent design and the
applicability of each of the principal families of techniques for agent implementation. First,
Medical diagnosis system — Performance measure: healthy patient, reduced costs.
Environment: patient, hospital, staff. Actuators: display of questions, tests, diagnoses,
treatments, referrals. Sensors: keyboard entry of symptoms, findings, patient's answers.

Satellite image analysis system — Performance measure: correct image categorization.
Environment: downlink from orbiting satellite. Actuators: display of scene categorization.
Sensors: color pixel arrays.

Part-picking robot — Performance measure: percentage of parts in correct bins.
Environment: conveyor belt with parts; bins. Actuators: jointed arm and hand.
Sensors: camera, joint angle sensors.

Refinery controller — Performance measure: purity, yield, safety. Environment: refinery,
operators. Actuators: valves, pumps, heaters, displays. Sensors: temperature, pressure,
chemical sensors.

Interactive English tutor — Performance measure: student's score on test. Environment:
set of students, testing agency. Actuators: display of exercises, suggestions, corrections.
Sensors: keyboard entry.

Figure 2.5 Examples of agent types and their PEAS descriptions.
we list the dimensions, then we analyze several task environments to illustrate the ideas. The
definitions here are informal; later chapters provide more precise statements and examples of
each kind of environment.
Fully observable vs. partially observable: If an agent’s sensors give it access to the
complete state of the environment at each point in time, then we say that the task environ-
ment is fully observable. A task environment is effectively fully observable if the sensors
detect all aspects that are relevant to the choice of action; relevance, in turn, depends on the
performance measure. Fully observable environments are convenient because the agent need
not maintain any internal state to keep track of the world. An environment might be partially
observable because of noisy and inaccurate sensors or because parts of the state are simply
missing from the sensor data—for example, a vacuum agent with only a local dirt sensor
cannot tell whether there is dirt in other squares, and an automated taxi cannot see what other
drivers are thinking. If the agent has no sensors at all then the environment is unobserv-
able. One might think that in such cases the agent’s plight is hopeless, but, as we discuss in
Chapter 4, the agent’s goals may still be achievable, sometimes with certainty.
Single agent vs. multiagent: The distinction between single-agent and multiagent en-
vironments may seem simple enough. For example, an agent solving a crossword puzzle by
itself is clearly in a single-agent environment, whereas an agent playing chess is in a two-
agent environment. There are, however, some subtle issues. First, we have described how an
entity may be viewed as an agent, but we have not explained which entities must be viewed
as agents. Does an agent A (the taxi driver for example) have to treat an object B (another
vehicle) as an agent, or can it be treated merely as an object behaving according to the laws of
physics, analogous to waves at the beach or leaves blowing in the wind? The key distinction
is whether B’s behavior is best described as maximizing a performance measure whose value
depends on agent A’s behavior. For example, in chess, the opponent entity B is trying to
maximize its performance measure, which, by the rules of chess, minimizes agent A’s per-
formance measure. Thus, chess is a competitive multiagent environment. In the taxi-driving
environment, on the other hand, avoiding collisions maximizes the performance measure of
all agents, so it is a partially cooperative multiagent environment. It is also partially com-
petitive because, for example, only one car can occupy a parking space. The agent-design
problems in multiagent environments are often quite different from those in single-agent en-
vironments; for example, communication often emerges as a rational behavior in multiagent
environments; in some competitive environments, randomized behavior is rational because
it avoids the pitfalls of predictability.
Deterministic vs. stochastic: If the next state of the environment is completely deter-
mined by the current state and the action executed by the agent, then we say the environment
is deterministic; otherwise, it is stochastic. In principle, an agent need not worry about uncer-
tainty in a fully observable, deterministic environment. (In our definition, we ignore uncer-
tainty that arises purely from the actions of other agents in a multiagent environment; thus,
a game can be deterministic even though each agent may be unable to predict the actions of
the others.) If the environment is partially observable, however, then it could appear to be
stochastic. Most real situations are so complex that it is impossible to keep track of all the
unobserved aspects; for practical purposes, they must be treated as stochastic. Taxi driving is
clearly stochastic in this sense, because one can never predict the behavior of traffic exactly;
moreover, one’s tires blow out and one’s engine seizes up without warning. The vacuum
world as we described it is deterministic, but variations can include stochastic elements such
as randomly appearing dirt and an unreliable suction mechanism (Exercise 2.13). We say an
environment is uncertain if it is not fully observable or not deterministic. One final note:
our use of the word “stochastic” generally implies that uncertainty about outcomes is quan-
tified in terms of probabilities; a nondeterministic environment is one in which actions are
characterized by their possible outcomes, but no probabilities are attached to them. Nonde-
terministic environment descriptions are usually associated with performance measures that
require the agent to succeed for all possible outcomes of its actions.
Episodic vs. sequential: In an episodic task environment, the agent’s experience is
divided into atomic episodes. In each episode the agent receives a percept and then performs
a single action. Crucially, the next episode does not depend on the actions taken in previous
episodes. Many classification tasks are episodic. For example, an agent that has to spot
defective parts on an assembly line bases each decision on the current part, regardless of
previous decisions; moreover, the current decision doesn’t affect whether the next part is
defective. In sequential environments, on the other hand, the current decision could affect
all future decisions.3 Chess and taxi driving are sequential: in both cases, short-term actions
can have long-term consequences. Episodic environments are much simpler than sequential
environments because the agent does not need to think ahead.
Static vs. dynamic: If the environment can change while an agent is deliberating, then
we say the environment is dynamic for that agent; otherwise, it is static. Static environments
are easy to deal with because the agent need not keep looking at the world while it is deciding
on an action, nor need it worry about the passage of time. Dynamic environments, on the
other hand, are continuously asking the agent what it wants to do; if it hasn’t decided yet,
that counts as deciding to do nothing. If the environment itself does not change with the
passage of time but the agent’s performance score does, then we say the environment is
semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving
while the driving algorithm dithers about what to do next. Chess, when played with a clock,
is semidynamic. Crossword puzzles are static.
Discrete vs. continuous: The discrete/continuous distinction applies to the state of the
environment, to the way time is handled, and to the percepts and actions of the agent. For
example, the chess environment has a finite number of distinct states (excluding the clock).
Chess also has a discrete set of percepts and actions. Taxi driving is a continuous-state and
continuous-time problem: the speed and location of the taxi and of the other vehicles sweep
through a range of continuous values and do so smoothly over time. Taxi-driving actions are
also continuous (steering angles, etc.). Input from digital cameras is discrete, strictly speak-
ing, but is typically treated as representing continuously varying intensities and locations.
Known vs. unknown: Strictly speaking, this distinction refers not to the environment
itself but to the agent’s (or designer’s) state of knowledge about the “laws of physics” of
the environment. In a known environment, the outcomes (or outcome probabilities if the
environment is stochastic) for all actions are given. Obviously, if the environment is unknown,
the agent will have to learn how it works in order to make good decisions. Note that the
distinction between known and unknown environments is not the same as the one between
fully and partially observable environments. It is quite possible for a known environment
to be partially observable—for example, in solitaire card games, I know the rules but am
still unable to see the cards that have not yet been turned over. Conversely, an unknown
environment can be fully observable—in a new video game, the screen may show the entire
game state but I still don’t know what the buttons do until I try them.
As one might expect, the hardest case is partially observable, multiagent, stochastic,
sequential, dynamic, continuous, and unknown. Taxi driving is hard in all these senses, except
that for the most part the driver’s environment is known. Driving a rented car in a new country
with unfamiliar geography and traffic laws is a lot more exciting.
Figure 2.6 lists the properties of a number of familiar environments. Note that the
answers are not always cut and dried. For example, we describe the part-picking robot as
episodic, because it normally considers each part in isolation. But if one day there is a large
3 The word “sequential” is also used in computer science as the antonym of “parallel.” The two meanings are
largely unrelated.
Task Environment Observable Agents Deterministic Episodic Static Discrete
Crossword puzzle Fully Single Deterministic Sequential Static Discrete
Chess with a clock Fully Multi Deterministic Sequential Semi Discrete
Poker Partially Multi Stochastic Sequential Static Discrete
Backgammon Fully Multi Stochastic Sequential Static Discrete
Taxi driving Partially Multi Stochastic Sequential Dynamic Continuous
Medical diagnosis Partially Single Stochastic Sequential Dynamic Continuous
Image analysis Fully Single Deterministic Episodic Semi Continuous
Part-picking robot Partially Single Stochastic Episodic Dynamic Continuous
Refinery controller Partially Single Stochastic Sequential Dynamic Continuous
Interactive English tutor Partially Multi Stochastic Sequential Dynamic Discrete
Figure 2.6 Examples of task environments and their characteristics.
batch of defective parts, the robot should learn from several observations that the distribution
of defects has changed, and should modify its behavior for subsequent parts. We have not
included a “known/unknown” column because, as explained earlier, this is not strictly a prop-
erty of the environment. For some environments, such as chess and poker, it is quite easy to
supply the agent with full knowledge of the rules, but it is nonetheless interesting to consider
how an agent might learn to play these games without such knowledge.
Several of the answers in the table depend on how the task environment is defined. We
have listed the medical-diagnosis task as single-agent because the disease process in a patient
is not profitably modeled as an agent; but a medical-diagnosis system might also have to
deal with recalcitrant patients and skeptical staff, so the environment could have a multiagent
aspect. Furthermore, medical diagnosis is episodic if one conceives of the task as selecting a
diagnosis given a list of symptoms; the problem is sequential if the task can include proposing
a series of tests, evaluating progress over the course of treatment, and so on. Also, many
environments are episodic at higher levels than the agent’s individual actions. For example,
a chess tournament consists of a sequence of games; each game is an episode because (by
and large) the contribution of the moves in one game to the agent’s overall performance is
not affected by the moves in its previous game. On the other hand, decision making within a
single game is certainly sequential.
The code repository associated with this book (aima.cs.berkeley.edu) includes imple-
mentations of a number of environments, together with a general-purpose environment simu-
lator that places one or more agents in a simulated environment, observes their behavior over
time, and evaluates them according to a given performance measure. Such experiments are
often carried out not for a single environment but for many environments drawn from an en-
vironment class. For example, to evaluate a taxi driver in simulated traffic, we would want to
run many simulations with different traffic, lighting, and weather conditions. If we designed
the agent for a single scenario, we might be able to take advantage of specific properties
of the particular case but might not identify a good design for driving in general. For this
reason, the code repository also includes an environment generator for each environment
class that selects particular environments (with certain likelihoods) in which to run the agent.
For example, the vacuum environment generator initializes the dirt pattern and agent location
randomly. We are then interested in the agent’s average performance over the environment
class. A rational agent for a given environment class maximizes this average performance.
Exercises 2.9 to 2.13 take you through the process of developing an environment class and
evaluating various agents therein.
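As an aside, the kind of harness described above can be sketched in a few lines of Python. The following is a minimal illustration only, with class and function names of our own choosing rather than the repository’s actual interfaces; it uses the two-square vacuum world and the one-point-per-clean-square-per-step performance measure discussed earlier in the chapter.

import random

class VacuumEnvironment:
    def __init__(self):
        # Environment generator: random dirt pattern and random agent location.
        self.dirt = {square: random.choice([True, False]) for square in "AB"}
        self.location = random.choice("AB")
        self.score = 0
    def percept(self):
        return (self.location, "Dirty" if self.dirt[self.location] else "Clean")
    def execute(self, action):
        if action == "Suck":
            self.dirt[self.location] = False
        elif action == "Right":
            self.location = "B"
        elif action == "Left":
            self.location = "A"
        # Performance measure: one point per clean square at each time step.
        self.score += sum(not dirty for dirty in self.dirt.values())

def run(agent_program, env, steps=1000):
    # The simulator loop: pass the percept to the agent program, apply its action.
    for _ in range(steps):
        env.execute(agent_program(env.percept()))
    return env.score

def average_score(agent_program, trials=20, steps=1000):
    # Average performance over the environment class (many random initial states).
    return sum(run(agent_program, VacuumEnvironment(), steps) for _ in range(trials)) / trials

Evaluating an agent with average_score rather than with a single run is exactly the “average performance over the environment class” that a rational agent maximizes.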
2.4 THE STRUCTURE OF AGENTS
So far we have talked about agents by describing behavior—the action that is performed after
any given sequence of percepts. Now we must bite the bullet and talk about how the insides
work. The job of AI is to design an agent program that implements the agent function—
the mapping from percepts to actions. We assume this program will run on some sort of
computing device with physical sensors and actuators—we call this the architecture:
agent = architecture + program.
Obviously, the program we choose has to be one that is appropriate for the architecture. If the
program is going to recommend actions like Walk, the architecture had better have legs. The
architecture might be just an ordinary PC, or it might be a robotic car with several onboard
computers, cameras, and other sensors. In general, the architecture makes the percepts from
the sensors available to the program, runs the program, and feeds the program’s action choices
to the actuators as they are generated. Most of this book is about designing agent programs,
although Chapters 24 and 25 deal directly with the sensors and actuators.
2.4.1 Agent programs
The agent programs that we design in this book all have the same skeleton: they take the
current percept as input from the sensors and return an action to the actuators.4 Notice the
difference between the agent program, which takes the current percept as input, and the agent
function, which takes the entire percept history. The agent program takes just the current
percept as input because nothing more is available from the environment; if the agent’s actions
need to depend on the entire percept sequence, the agent will have to remember the percepts.
We describe the agent programs in the simple pseudocode language that is defined in
Appendix B. (The online code repository contains implementations in real programming
languages.) For example, Figure 2.7 shows a rather trivial agent program that keeps track of
the percept sequence and then uses it to index into a table of actions to decide what to do.
The table—an example of which is given for the vacuum world in Figure 2.3—represents
explicitly the agent function that the agent program embodies. To build a rational agent in
4 There are other choices for the agent program skeleton; for example, we could have the agent programs be
coroutines that run asynchronously with the environment. Each such coroutine has an input and output port and
consists of a loop that reads the input port for percepts and writes actions to the output port.
function TABLE-DRIVEN-AGENT(percept) returns an action
persistent: percepts, a sequence, initially empty
table, a table of actions, indexed by percept sequences, initially fully specified
append percept to the end of percepts
action ← LOOKUP(percepts,table)
return action
Figure 2.7 The TABLE-DRIVEN-AGENT program is invoked for each new percept and
returns an action each time. It retains the complete percept sequence in memory.
this way, we as designers must construct a table that contains the appropriate action for every
possible percept sequence.
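For readers who prefer a concrete rendering, Figure 2.7 can be transcribed into Python roughly as follows; the percept encoding and the small table fragment are illustrative assumptions on our part, not the repository’s actual code.

def make_table_driven_agent(table):
    percepts = []                           # persistent: the percept sequence, initially empty
    def program(percept):
        percepts.append(percept)            # append percept to the end of percepts
        return table.get(tuple(percepts))   # action <- LOOKUP(percepts, table)
    return program

# A two-step fragment consistent with the vacuum-world agent function of Figure 2.3:
table = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
    (("A", "Clean"), ("B", "Clean")): "Left",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}
agent = make_table_driven_agent(table)
print(agent(("A", "Dirty")))                # -> Suck

Note that the program itself is tiny; all of the complexity has been pushed into the table, which is exactly the problem discussed next.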
It is instructive to consider why the table-driven approach to agent construction is
doomed to failure. Let P be the set of possible percepts and let T be the lifetime of the
agent (the total number of percepts it will receive). The lookup table will contain
$\sum_{t=1}^{T} |P|^{t}$
entries. Consider the automated taxi: the visual input from a single camera comes in at the
rate of roughly 27 megabytes per second (30 frames per second, 640 × 480 pixels with 24
bits of color information). This gives a lookup table with over 10^250,000,000,000 entries for an
hour’s driving. Even the lookup table for chess—a tiny, well-behaved fragment of the real
world—would have at least 10^150 entries. The daunting size of these tables (the number of
atoms in the observable universe is less than 10^80) means that (a) no physical agent in this
universe will have the space to store the table, (b) the designer would not have time to create
the table, (c) no agent could ever learn all the right table entries from its experience, and (d)
even if the environment is simple enough to yield a feasible table size, the designer still has
no guidance about how to fill in the table entries.
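The arithmetic behind these figures is easy to check. The following back-of-the-envelope sketch is our own calculation, included only to make the rounding in the text concrete:

import math
frames_per_second = 30
pixels = 640 * 480
bits_per_pixel = 24
bytes_per_second = frames_per_second * pixels * bits_per_pixel / 8
print(f"camera data rate: {bytes_per_second / 1e6:.1f} MB/s")     # about 27.6 MB/s
# Even a single frame can take on 2^(640 * 480 * 24) distinct values,
# i.e. more than 10^2,000,000 different percepts per frame:
log10_distinct_frames = pixels * bits_per_pixel * math.log10(2)
print(f"distinct frames: about 10^{log10_distinct_frames:,.0f}")

Raising numbers of that size to the power of an hour’s worth of frames gives table sizes like the one quoted above.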
Despite all this, TABLE-DRIVEN-AGENT does do what we want: it implements the
desired agent function. The key challenge for AI is to find out how to write programs that,
to the extent possible, produce rational behavior from a smallish program rather than from
a vast table. We have many examples showing that this can be done successfully in other
areas: for example, the huge tables of square roots used by engineers and schoolchildren prior
to the 1970s have now been replaced by a five-line program for Newton’s method running
on electronic calculators. The question is, can AI do for general intelligent behavior what
Newton did for square roots? We believe the answer is yes.
In the remainder of this section, we outline four basic kinds of agent programs that
embody the principles underlying almost all intelligent systems:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.
Each kind of agent program combines particular components in particular ways to generate
actions. Section 2.4.6 explains in general terms how to convert all these agents into learning
function REFLEX-VACUUM-AGENT([location,status]) returns an action
if status = Dirty then return Suck
else if location = A then return Right
else if location = B then return Left
Figure 2.8 The agent program for a simple reflex agent in the two-state vacuum environ-
ment. This program implements the agent function tabulated in Figure 2.3.
agents that can improve the performance of their components so as to generate better actions.
Finally, Section 2.4.7 describes the variety of ways in which the components themselves can
be represented within the agent. This variety provides a major organizing principle for the
field and for the book itself.
2.4.2 Simple reflex agents
The simplest kind of agent is the simple reflex agent. These agents select actions on the basis
of the current percept, ignoring the rest of the percept history. For example, the vacuum agent
whose agent function is tabulated in Figure 2.3 is a simple reflex agent, because its decision
is based only on the current location and on whether that location contains dirt. An agent
program for this agent is shown in Figure 2.8.
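In Python, the program of Figure 2.8 is a one-screen transcription (a sketch only; the percept is assumed to arrive as a (location, status) pair):

def reflex_vacuum_agent(percept):
    # The simple reflex agent for the two-square vacuum world (Figure 2.8).
    location, status = percept
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:                                  # location is "B"
        return "Left"

print(reflex_vacuum_agent(("A", "Dirty")))   # -> Suck
print(reflex_vacuum_agent(("B", "Clean")))   # -> Left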
Notice that the vacuum agent program is very small indeed compared to the correspond-
ing table. The most obvious reduction comes from ignoring the percept history, which cuts
down the number of possibilities from 4^T to just 4. A further, small reduction comes from
the fact that when the current square is dirty, the action does not depend on the location.
Simple reflex behaviors occur even in more complex environments. Imagine yourself
as the driver of the automated taxi. If the car in front brakes and its brake lights come on, then
you should notice this and initiate braking. In other words, some processing is done on the
visual input to establish the condition we call “The car in front is braking.” Then, this triggers
some established connection in the agent program to the action “initiate braking.” We call
such a connection a condition–action rule,5 written as
if car-in-front-is-braking then initiate-braking.
Humans also have many such connections, some of which are learned responses (as for driv-
ing) and some of which are innate reflexes (such as blinking when something approaches the
eye). In the course of the book, we show several different ways in which such connections
can be learned and implemented.
The program in Figure 2.8 is specific to one particular vacuum environment. A more
general and flexible approach is first to build a general-purpose interpreter for condition–
action rules and then to create rule sets for specific task environments. Figure 2.9 gives the
structure of this general program in schematic form, showing how the condition–action rules
allow the agent to make the connection from percept to action. (Do not worry if this seems
5 Also called situation–action rules, productions, or if–then rules.
Figure 2.9 Schematic diagram of a simple reflex agent. (In the diagram, the sensors report “what the world is like now,” the condition–action rules determine “what action I should do now,” and the actuators carry out that action in the environment.)
function SIMPLE-REFLEX-AGENT(percept) returns an action
persistent: rules, a set of condition–action rules
state ← INTERPRET-INPUT(percept)
rule ← RULE-MATCH(state,rules)
action ← rule.ACTION
return action
Figure 2.10 A simple reflex agent. It acts according to a rule whose condition matches
the current state, as defined by the percept.
trivial; it gets more interesting shortly.) We use rectangles to denote the current internal state
of the agent’s decision process, and ovals to represent the background information used in
the process. The agent program, which is also very simple, is shown in Figure 2.10. The
INTERPRET-INPUT function generates an abstracted description of the current state from the
percept, and the RULE-MATCH function returns the first rule in the set of rules that matches
the given state description. Note that the description in terms of “rules” and “matching” is
purely conceptual; actual implementations can be as simple as a collection of logic gates
implementing a Boolean circuit.
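A Python sketch of Figure 2.10 makes the conceptual nature of “rules” and “matching” concrete; here the rules are simply (condition, action) pairs, and for the vacuum world INTERPRET-INPUT can be taken to pass the percept through unchanged. These representational choices are ours, not prescribed by the text.

def simple_reflex_agent(rules, interpret_input):
    def program(percept):
        state = interpret_input(percept)       # state <- INTERPRET-INPUT(percept)
        for condition, action in rules:        # rule <- RULE-MATCH(state, rules)
            if condition(state):
                return action                  # return rule.ACTION
    return program

# Vacuum-world rule set; the "state" here is just the raw (location, status) percept.
rules = [
    (lambda s: s[1] == "Dirty", "Suck"),
    (lambda s: s[0] == "A", "Right"),
    (lambda s: s[0] == "B", "Left"),
]
agent = simple_reflex_agent(rules, interpret_input=lambda p: p)
print(agent(("B", "Dirty")))                   # -> Suck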
Simple reflex agents have the admirable property of being simple, but they turn out to be
of limited intelligence. The agent in Figure 2.10 will work only if the correct decision can be
made on the basis of only the current percept—that is, only if the environment is fully observ-
able. Even a little bit of unobservability can cause serious trouble. For example, the braking
rule given earlier assumes that the condition car-in-front-is-braking can be determined from
the current percept—a single frame of video. This works if the car in front has a centrally
mounted brake light. Unfortunately, older models have different configurations of taillights,
brake lights, and turn-signal lights, and it is not always possible to tell from a single image
whether the car is braking. A simple reflex agent driving behind such a car would either brake
continuously and unnecessarily, or, worse, never brake at all.
We can see a similar problem arising in the vacuum world. Suppose that a simple reflex
vacuum agent is deprived of its location sensor and has only a dirt sensor. Such an agent
has just two possible percepts: [Dirty] and [Clean]. It can Suck in response to [Dirty]; what
should it do in response to [Clean]? Moving Left fails (forever) if it happens to start in square
A, and moving Right fails (forever) if it happens to start in square B. Infinite loops are often
unavoidable for simple reflex agents operating in partially observable environments.
Escape from infinite loops is possible if the agent can randomize its actions. For ex-
ample, if the vacuum agent perceives [Clean], it might flip a coin to choose between Left and
Right. It is easy to show that the agent will reach the other square in an average of two steps.
Then, if that square is dirty, the agent will clean it and the task will be complete. Hence, a
randomized simple reflex agent might outperform a deterministic simple reflex agent.
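(The “average of two steps” is the mean of a geometric distribution: whichever square the agent is in, the coin flip moves it to the other square with probability 1/2 and bumps it into a wall otherwise, so

$E[\text{steps}] \;=\; \sum_{n=1}^{\infty} n \left(\tfrac{1}{2}\right)^{n} \;=\; \frac{1/2}{(1 - 1/2)^{2}} \;=\; 2,$

a standard calculation that the text leaves implicit.)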
We mentioned in Section 2.3 that randomized behavior of the right kind can be rational
in some multiagent environments. In single-agent environments, randomization is usually not
rational. It is a useful trick that helps a simple reflex agent in some situations, but in most
cases we can do much better with more sophisticated deterministic agents.
2.4.3 Model-based reflex agents
The most effective way to handle partial observability is for the agent to keep track of the
part of the world it can’t see now. That is, the agent should maintain some sort of internal
state that depends on the percept history and thereby reflects at least some of the unobserved
aspects of the current state. For the braking problem, the internal state is not too extensive—
just the previous frame from the camera, allowing the agent to detect when two red lights at
the edge of the vehicle go on or off simultaneously. For other driving tasks such as changing
lanes, the agent needs to keep track of where the other cars are if it can’t see them all at once.
And for any driving to be possible at all, the agent needs to keep track of where its keys are.
Updating this internal state information as time goes by requires two kinds of knowl-
edge to be encoded in the agent program. First, we need some information about how the
world evolves independently of the agent—for example, that an overtaking car generally will
be closer behind than it was a moment ago. Second, we need some information about how
the agent’s own actions affect the world—for example, that when the agent turns the steering
wheel clockwise, the car turns to the right, or that after driving for five minutes northbound
on the freeway, one is usually about five miles north of where one was five minutes ago. This
knowledge about “how the world works”—whether implemented in simple Boolean circuits
or in complete scientific theories—is called a model of the world. An agent that uses such a
model is called a model-based agent.
Figure 2.11 gives the structure of the model-based reflex agent with internal state, show-
ing how the current percept is combined with the old internal state to generate the updated
description of the current state, based on the agent’s model of how the world works. The agent
program is shown in Figure 2.12. The interesting part is the function UPDATE-STATE, which
Figure 2.11 A model-based reflex agent. (In the diagram, the internal state, knowledge of how the world evolves, and knowledge of what the agent’s actions do are combined with the sensor input to form “what the world is like now”; the condition–action rules then determine “what action I should do now,” which the actuators execute.)
function MODEL-BASED-REFLEX-AGENT(percept) returns an action
persistent: state, the agent’s current conception of the world state
model, a description of how the next state depends on current state and action
rules, a set of condition–action rules
action, the most recent action, initially none
state ← UPDATE-STATE(state,action,percept,model)
rule ← RULE-MATCH(state,rules)
action ← rule.ACTION
return action
Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world,
using an internal model. It then chooses an action in the same way as the reflex agent.
is responsible for creating the new internal state description. The details of how models and
states are represented vary widely depending on the type of environment and the particular
technology used in the agent design. Detailed examples of models and updating algorithms
appear in Chapters 4, 12, 11, 15, 17, and 25.
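A skeletal Python version of Figure 2.12 looks like the following sketch; representing the model as a plain function argument is our own simplification.

def model_based_reflex_agent(rules, update_state, model, initial_state=None):
    state = initial_state
    action = None                              # the most recent action, initially none
    def program(percept):
        nonlocal state, action
        # UPDATE-STATE folds the percept, the last action, and the model
        # ("how the world evolves" and "what my actions do") into the new state.
        state = update_state(state, action, percept, model)
        for condition, act in rules:           # rule <- RULE-MATCH(state, rules)
            if condition(state):
                action = act                   # action <- rule.ACTION
                break
        return action
    return program

All of the interesting work is hidden inside update_state, which is where the two kinds of knowledge described above must be encoded.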
Regardless of the kind of representation used, it is seldom possible for the agent to
determine the current state of a partially observable environment exactly. Instead, the box
labeled “what the world is like now” (Figure 2.11) represents the agent’s “best guess” (or
sometimes best guesses). For example, an automated taxi may not be able to see around the
large truck that has stopped in front of it and can only guess about what may be causing the
hold-up. Thus, uncertainty about the current state may be unavoidable, but the agent still has
to make a decision.
A perhaps less obvious point about the internal “state” maintained by a model-based
agent is that it does not have to describe “what the world is like now” in a literal sense. For
Figure 2.13 A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its goals. (In the diagram, the model-based machinery produces “what the world is like now” and “what it will be like if I do action A”; the goals then determine “what action I should do now.”)
example, the taxi may be driving back home, and it may have a rule telling it to fill up with
gas on the way home unless it has at least half a tank. Although “driving back home” may
seem to be an aspect of the world state, the fact of the taxi’s destination is actually an aspect of
the agent’s internal state. If you find this puzzling, consider that the taxi could be in exactly
the same place at the same time, but intending to reach a different destination.
2.4.4 Goal-based agents
Knowing something about the current state of the environment is not always enough to decide
what to do. For example, at a road junction, the taxi can turn left, turn right, or go straight
on. The correct decision depends on where the taxi is trying to get to. In other words, as well
as a current state description, the agent needs some sort of goal information that describes
situations that are desirable—for example, being at the passenger’s destination. The agent
program can combine this with the model (the same information as was used in the model-
based reflex agent) to choose actions that achieve the goal. Figure 2.13 shows the goal-based
agent’s structure.
Sometimes goal-based action selection is straightforward—for example, when goal sat-
isfaction results immediately from a single action. Sometimes it will be more tricky—for
example, when the agent has to consider long sequences of twists and turns in order to find a
way to achieve the goal. Search (Chapters 3 to 5) and planning (Chapters 10 and 11) are the
subfields of AI devoted to finding action sequences that achieve the agent’s goals.
Notice that decision making of this kind is fundamentally different from the condition–
action rules described earlier, in that it involves consideration of the future—both “What will
happen if I do such-and-such?” and “Will that make me happy?” In the reflex agent designs,
this information is not explicitly represented, because the built-in rules map directly from
percepts to actions. The reflex agent brakes when it sees brake lights. A goal-based agent, in
principle, could reason that if the car in front has its brake lights on, it will slow down. Given
the way the world usually evolves, the only action that will achieve the goal of not hitting
other cars is to brake.
Although the goal-based agent appears less efficient, it is more flexible because the
knowledge that supports its decisions is represented explicitly and can be modified. If it starts
to rain, the agent can update its knowledge of how effectively its brakes will operate; this will
automatically cause all of the relevant behaviors to be altered to suit the new conditions.
For the reflex agent, on the other hand, we would have to rewrite many condition–action
rules. The goal-based agent’s behavior can easily be changed to go to a different destination,
simply by specifying that destination as the goal. The reflex agent’s rules for when to turn
and when to go straight will work only for a single destination; they must all be replaced to
go somewhere new.
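This chapter gives no agent-program figure for the goal-based agent (Exercise 2.7 asks you to write one), but a deliberately minimal one-step-lookahead sketch in Python conveys the idea: use the model to predict the result of each action and pick one whose predicted state satisfies the goal.

def goal_based_agent(actions, model, goal_test, update_state, initial_state=None):
    state = initial_state
    def program(percept):
        nonlocal state
        state = update_state(state, percept)    # what the world is like now
        for action in actions:
            if goal_test(model(state, action)): # what it will be like if I do action A
                return action
        return None                             # no single action reaches the goal; search is needed
    return program

Real goal-based agents replace the single loop with the search and planning algorithms of later chapters, precisely because a goal usually cannot be achieved in one step.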
2.4.5 Utility-based agents
Goals alone are not enough to generate high-quality behavior in most environments. For
example, many action sequences will get the taxi to its destination (thereby achieving the
goal) but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a
crude binary distinction between “happy” and “unhappy” states. A more general performance
measure should allow a comparison of different world states according to exactly how happy
they would make the agent. Because “happy” does not sound very scientific, economists and
computer scientists use the term utility instead.6
We have already seen that a performance measure assigns a score to any given sequence
of environment states, so it can easily distinguish between more and less desirable ways of
getting to the taxi’s destination. An agent’s utility function is essentially an internalization
of the performance measure. If the internal utility function and the external performance
measure are in agreement, then an agent that chooses actions to maximize its utility will be
rational according to the external performance measure.
Let us emphasize again that this is not the only way to be rational—we have already
seen a rational agent program for the vacuum world (Figure 2.8) that has no idea what its
utility function is—but, like goal-based agents, a utility-based agent has many advantages in
terms of flexibility and learning. Furthermore, in two kinds of cases, goals are inadequate but
a utility-based agent can still make rational decisions. First, when there are conflicting goals,
only some of which can be achieved (for example, speed and safety), the utility function
specifies the appropriate tradeoff. Second, when there are several goals that the agent can
aim for, none of which can be achieved with certainty, utility provides a way in which the
likelihood of success can be weighed against the importance of the goals.
Partial observability and stochasticity are ubiquitous in the real world, and so, therefore,
is decision making under uncertainty. Technically speaking, a rational utility-based agent
chooses the action that maximizes the expected utility of the action outcomes—that is, the
utility the agent expects to derive, on average, given the probabilities and utilities of each
6 The word “utility” here refers to “the quality of being useful,” not to the electric company or waterworks.
Figure 2.14 A model-based, utility-based agent. It uses a model of the world, along with a utility function that measures its preferences among states of the world. Then it chooses the action that leads to the best expected utility, where expected utility is computed by averaging over all possible outcome states, weighted by the probability of the outcome. (In the diagram, the utility box evaluates “what it will be like if I do action A” as “how happy I will be in such a state,” which determines “what action I should do now.”)
outcome. (Appendix A defines expectation more precisely.) In Chapter 16, we show that any
rational agent must behave as if it possesses a utility function whose expected value it tries
to maximize. An agent that possesses an explicit utility function can make rational decisions
with a general-purpose algorithm that does not depend on the specific utility function being
maximized. In this way, the “global” definition of rationality—designating as rational those
agent functions that have the highest performance—is turned into a “local” constraint on
rational-agent designs that can be expressed in a simple program.
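Written out, the decision rule is to choose the action $a$ that maximizes $\sum_{s'} P(s' \mid a)\,U(s')$. A few lines of Python make the rule concrete; the taxi-route numbers below are invented purely for illustration.

def best_action(actions, outcomes, utility):
    # outcomes(a) yields (probability, resulting_state) pairs for action a.
    def expected_utility(a):
        return sum(p * utility(s) for p, s in outcomes(a))
    return max(actions, key=expected_utility)

outcome_model = {
    "fast_route": [(0.8, "on_time"), (0.2, "stuck_in_traffic")],
    "safe_route": [(1.0, "slightly_late")],
}
utilities = {"on_time": 10, "stuck_in_traffic": -20, "slightly_late": 6}
print(best_action(outcome_model.keys(), lambda a: outcome_model[a], lambda s: utilities[s]))
# -> safe_route  (expected utilities: 0.8*10 + 0.2*(-20) = 4 versus 6)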
The utility-based agent structure appears in Figure 2.14. Utility-based agent programs
appear in Part IV, where we design decision-making agents that must handle the uncertainty
inherent in stochastic or partially observable environments.
At this point, the reader may be wondering, “Is it that simple? We just build agents that
maximize expected utility, and we’re done?” It’s true that such agents would be intelligent,
but it’s not simple. A utility-based agent has to model and keep track of its environment,
tasks that have involved a great deal of research on perception, representation, reasoning,
and learning. The results of this research fill many of the chapters of this book. Choosing
the utility-maximizing course of action is also a difficult task, requiring ingenious algorithms
that fill several more chapters. Even with these algorithms, perfect rationality is usually
unachievable in practice because of computational complexity, as we noted in Chapter 1.
2.4.6 Learning agents
We have described agent programs with various methods for selecting actions. We have
not, so far, explained how the agent programs come into being. In his famous early paper,
Turing (1950) considers the idea of actually programming his intelligent machines by hand.
Figure 2.15 A general learning agent. (In the diagram, the critic compares the sensor input against the external performance standard and sends feedback to the learning element; the learning element makes changes to the performance element’s knowledge and sets learning goals for the problem generator, which suggests exploratory actions; the performance element chooses the actions sent to the actuators.)
He estimates how much work this might take and concludes “Some more expeditious method
seems desirable.” The method he proposes is to build learning machines and then to teach
them. In many areas of AI, this is now the preferred method for creating state-of-the-art
systems. Learning has another advantage, as we noted earlier: it allows the agent to operate
in initially unknown environments and to become more competent than its initial knowledge
alone might allow. In this section, we briefly introduce the main ideas of learning agents.
Throughout the book, we comment on opportunities and methods for learning in particular
kinds of agents. Part V goes into much more depth on the learning algorithms themselves.
A learning agent can be divided into four conceptual components, as shown in Fig-
ure 2.15. The most important distinction is between the learning element, which is re-
sponsible for making improvements, and the performance element, which is responsible for
selecting external actions. The performance element is what we have previously considered
to be the entire agent: it takes in percepts and decides on actions. The learning element uses
feedback from the critic on how the agent is doing and determines how the performance
element should be modified to do better in the future.
The design of the learning element depends very much on the design of the performance
element. When trying to design an agent that learns a certain capability, the first question is
not “How am I going to get it to learn this?” but “What kind of performance element will my
agent need to do this once it has learned how?” Given an agent design, learning mechanisms
can be constructed to improve every part of the agent.
The critic tells the learning element how well the agent is doing with respect to a fixed
performance standard. The critic is necessary because the percepts themselves provide no
indication of the agent’s success. For example, a chess program could receive a percept
indicating that it has checkmated its opponent, but it needs a performance standard to know
that this is a good thing; the percept itself does not say so. It is important that the performance
standard be fixed. Conceptually, one should think of it as being outside the agent altogether
because the agent must not modify it to fit its own behavior.
The last component of the learning agent is the problem generator. It is responsible
for suggesting actions that will lead to new and informative experiences. The point is that
if the performance element had its way, it would keep doing the actions that are best, given
what it knows. But if the agent is willing to explore a little and do some perhaps suboptimal
actions in the short run, it might discover much better actions for the long run. The problem
generator’s job is to suggest these exploratory actions. This is what scientists do when they
carry out experiments. Galileo did not think that dropping rocks from the top of a tower in
Pisa was valuable in itself. He was not trying to break the rocks or to modify the brains of
unfortunate passers-by. His aim was to modify his own brain by identifying a better theory
of the motion of objects.
To make the overall design more concrete, let us return to the automated taxi example.
The performance element consists of whatever collection of knowledge and procedures the
taxi has for selecting its driving actions. The taxi goes out on the road and drives, using
this performance element. The critic observes the world and passes information along to the
learning element. For example, after the taxi makes a quick left turn across three lanes of traf-
fic, the critic observes the shocking language used by other drivers. From this experience, the
learning element is able to formulate a rule saying this was a bad action, and the performance
element is modified by installation of the new rule. The problem generator might identify
certain areas of behavior in need of improvement and suggest experiments, such as trying out
the brakes on different road surfaces under different conditions.
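The wiring of Figure 2.15 can also be expressed as a purely schematic Python skeleton; every interface below is our own invention, intended only to show how the four components fit together.

class LearningAgent:
    def __init__(self, performance_element, critic, learning_element, problem_generator):
        self.performance_element = performance_element   # selects external actions
        self.critic = critic                             # feedback against a fixed performance standard
        self.learning_element = learning_element         # modifies the performance element
        self.problem_generator = problem_generator       # suggests exploratory actions
    def program(self, percept):
        feedback = self.critic(percept)
        self.learning_element(self.performance_element, feedback)
        exploratory_action = self.problem_generator(percept)
        if exploratory_action is not None:
            return exploratory_action
        return self.performance_element(percept)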
The learning element can make changes to any of the “knowledge” components shown
in the agent diagrams (Figures 2.9, 2.11, 2.13, and 2.14). The simplest cases involve learning
directly from the percept sequence. Observation of pairs of successive states of the environ-
ment can allow the agent to learn “How the world evolves,” and observation of the results of
its actions can allow the agent to learn “What my actions do.” For example, if the taxi exerts
a certain braking pressure when driving on a wet road, then it will soon find out how much
deceleration is actually achieved. Clearly, these two learning tasks are more difficult if the
environment is only partially observable.
The forms of learning in the preceding paragraph do not need to access the external
performance standard—in a sense, the standard is the universal one of making predictions
that agree with experiment. The situation is slightly more complex for a utility-based agent
that wishes to learn utility information. For example, suppose the taxi-driving agent receives
no tips from passengers who have been thoroughly shaken up during the trip. The external
performance standard must inform the agent that the loss of tips is a negative contribution to
its overall performance; then the agent might be able to learn that violent maneuvers do not
contribute to its own utility. In a sense, the performance standard distinguishes part of the
incoming percept as a reward (or penalty) that provides direct feedback on the quality of the
agent’s behavior. Hard-wired performance standards such as pain and hunger in animals can
be understood in this way. This issue is discussed further in Chapter 21.
In summary, agents have a variety of components, and those components can be repre-
sented in many ways within the agent program, so there appears to be great variety among
learning methods. There is, however, a single unifying theme. Learning in intelligent agents
can be summarized as a process of modification of each component of the agent to bring the
components into closer agreement with the available feedback information, thereby improv-
ing the overall performance of the agent.
2.4.7 How the components of agent programs work
We have described agent programs (in very high-level terms) as consisting of various compo-
nents, whose function it is to answer questions such as: “What is the world like now?” “What
action should I do now?” “What do my actions do?” The next question for a student of AI
is, “How on earth do these components work?” It takes about a thousand pages to begin to
answer that question properly, but here we want to draw the reader’s attention to some basic
distinctions among the various ways that the components can represent the environment that
the agent inhabits.
Roughly speaking, we can place the representations along an axis of increasing com-
plexity and expressive power—atomic, factored, and structured. To illustrate these ideas,
it helps to consider a particular agent component, such as the one that deals with “What my
actions do.” This component describes the changes that might occur in the environment as
the result of taking an action, and Figure 2.16 provides schematic depictions of how those
transitions might be represented.
Figure 2.16 Three ways to represent states and the transitions between them. (a) Atomic representation: a state (such as B or C) is a black box with no internal structure; (b) Factored representation: a state consists of a vector of attribute values; values can be Boolean, real-valued, or one of a fixed set of symbols. (c) Structured representation: a state includes objects, each of which may have attributes of its own as well as relationships to other objects.
In an atomic representation each state of the world is indivisible—it has no internal
structure. Consider the problem of finding a driving route from one end of a country to the
other via some sequence of cities (we address this problem in Figure 3.2 on page 68). For the
purposes of solving this problem, it may suffice to reduce the state of the world to just the name
of the city we are in—a single atom of knowledge; a “black box” whose only discernible
property is that of being identical to or different from another black box. The algorithms
underlying search and game-playing (Chapters 3–5), Hidden Markov models (Chapter 15),
and Markov decision processes (Chapter 17) all work with atomic representations—or, at
least, they treat representations as if they were atomic.
Now consider a higher-fidelity description for the same problem, where we need to be
concerned with more than just atomic location in one city or another; we might need to pay
attention to how much gas is in the tank, our current GPS coordinates, whether or not the oil
warning light is working, how much spare change we have for toll crossings, what station is
on the radio, and so on. A factored representation splits up each state into a fixed set of
variables or attributes, each of which can have a value. While two different atomic states
have nothing in common—they are just different black boxes—two different factored states
can share some attributes (such as being at some particular GPS location) and not others (such
as having lots of gas or having no gas); this makes it much easier to work out how to turn
one state into another. With factored representations, we can also represent uncertainty—for
example, ignorance about the amount of gas in the tank can be represented by leaving that
attribute blank. Many important areas of AI are based on factored representations, including
constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning
(Chapters 10 and 11), Bayesian networks (Chapters 13–16), and the machine learning al-
gorithms in Chapters 18, 20, and 21.
For many purposes, we need to understand the world as having things in it that are
related to each other, not just variables with values. For example, we might notice that a
large truck ahead of us is reversing into the driveway of a dairy farm but a cow has got loose
and is blocking the truck’s path. A factored representation is unlikely to be pre-equipped
with the attribute TruckAheadBackingIntoDairyFarmDrivewayBlockedByLooseCow with
value true or false. Instead, we would need a structured representation, in which ob-
jects such as cows and trucks and their various and varying relationships can be described
explicitly. (See Figure 2.16(c).) Structured representations underlie relational databases
and first-order logic (Chapters 8, 9, and 12), first-order probability models (Chapter 14),
knowledge-based learning (Chapter 19) and much of natural language understanding
(Chapters 22 and 23). In fact, almost everything that humans express in natural language
concerns objects and their relationships.
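To make the contrast tangible, here is a small Python sketch (our own, purely illustrative) of the same situation in each style: an atomic state is an opaque name, a factored state is a fixed set of variables with values, and a structured state is a collection of objects and relations.

from dataclasses import dataclass

# Atomic: the state is just a name; all we can do is test two states for equality.
atomic_state = "Bucharest"

# Factored: a fixed set of attributes, each with a value (None marks an unknown value).
factored_state = {"city": "Bucharest", "gas": 0.5, "oil_light_ok": True, "toll_change": None}

# Structured: explicit objects and explicit relationships between them.
@dataclass(frozen=True)
class Obj:
    name: str
    kind: str

truck = Obj("T1", "Truck")
cow = Obj("C1", "Cow")
driveway = Obj("D1", "DairyFarmDriveway")
relations = {("BackingInto", truck, driveway), ("Blocks", cow, truck)}

The structured version can state that the loose cow blocks the truck without anyone having anticipated a TruckAheadBackingIntoDairyFarmDrivewayBlockedByLooseCow attribute in advance.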
As we mentioned earlier, the axis along which atomic, factored, and structured repre-
sentations lie is the axis of increasing expressiveness. Roughly speaking, a more expressive
representation can capture, at least as concisely, everything a less expressive one can capture,
plus some more. Often, the more expressive language is much more concise; for example, the
rules of chess can be written in a page or two of a structured-representation language such
as first-order logic but require thousands of pages when written in a factored-representation
language such as propositional logic. On the other hand, reasoning and learning become
more complex as the expressive power of the representation increases. To gain the benefits
of expressive representations while avoiding their drawbacks, intelligent systems for the real
world may need to operate at all points along the axis simultaneously.
2.5 SUMMARY
This chapter has been something of a whirlwind tour of AI, which we have conceived of as
the science of agent design. The major points to recall are as follows:
• An agent is something that perceives and acts in an environment. The agent function
for an agent specifies the action taken by the agent in response to any percept sequence.
• The performance measure evaluates the behavior of the agent in an environment. A
rational agent acts so as to maximize the expected value of the performance measure,
given the percept sequence it has seen so far.
• A task environment specification includes the performance measure, the external en-
vironment, the actuators, and the sensors. In designing an agent, the first step must
always be to specify the task environment as fully as possible.
• Task environments vary along several significant dimensions. They can be fully or
partially observable, single-agent or multiagent, deterministic or stochastic, episodic or
sequential, static or dynamic, discrete or continuous, and known or unknown.
• The agent program implements the agent function. There exists a variety of basic
agent-program designs reflecting the kind of information made explicit and used in the
decision process. The designs vary in efficiency, compactness, and flexibility. The
appropriate design of the agent program depends on the nature of the environment.
• Simple reflex agents respond directly to percepts, whereas model-based reflex agents
maintain internal state to track aspects of the world that are not evident in the current
percept. Goal-based agents act to achieve their goals, and utility-based agents try to
maximize their own expected “happiness.”
• All agents can improve their performance through learning.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
The central role of action in intelligence—the notion of practical reasoning—goes back at
least as far as Aristotle’s Nicomachean Ethics. Practical reasoning was also the subject of
McCarthy’s (1958) influential paper “Programs with Common Sense.” The fields of robotics
and control theory are, by their very nature, concerned principally with physical agents. The
concept of a controller in control theory is identical to that of an agent in AI. Perhaps sur-
prisingly, AI has concentrated for most of its history on isolated components of agents—
question-answering systems, theorem-provers, vision systems, and so on—rather than on
whole agents. The discussion of agents in the text by Genesereth and Nilsson (1987) was an
influential exception. The whole-agent view is now widely accepted and is a central theme in
recent texts (Poole et al., 1998; Nilsson, 1998; Padgham and Winikoff, 2004; Jones, 2007).
Chapter 1 traced the roots of the concept of rationality in philosophy and economics. In
AI, the concept was of peripheral interest until the mid-1980s, when it began to suffuse many
discussions about the proper technical foundations of the field. A paper by Jon Doyle (1983)
predicted that rational agent design would come to be seen as the core mission of AI, while
other popular topics would spin off to form new disciplines.
Careful attention to the properties of the environment and their consequences for ra-
tional agent design is most apparent in the control theory tradition—for example, classical
control systems (Dorf and Bishop, 2004; Kirk, 2004) handle fully observable, deterministic
environments; stochastic optimal control (Kumar and Varaiya, 1986; Bertsekas and Shreve,
2007) handles partially observable, stochastic environments; and hybrid control (Henzinger
and Sastry, 1998; Cassandras and Lygeros, 2006) deals with environments containing both
discrete and continuous elements. The distinction between fully and partially observable en-
vironments is also central in the dynamic programming literature developed in the field of
operations research (Puterman, 1994), which we discuss in Chapter 17.
Reflex agents were the primary model for psychological behaviorists such as Skinner
(1953), who attempted to reduce the psychology of organisms strictly to input/output or stim-
ulus/response mappings. The advance from behaviorism to functionalism in psychology,
which was at least partly driven by the application of the computer metaphor to agents (Put-
nam, 1960; Lewis, 1966), introduced the internal state of the agent into the picture. Most
work in AI views the idea of pure reflex agents with state as too simple to provide much
leverage, but work by Rosenschein (1985) and Brooks (1986) questioned this assumption
(see Chapter 25). In recent years, a great deal of work has gone into finding efficient algo-
rithms for keeping track of complex environments (Hamscher et al., 1992; Simon, 2006). The
Remote Agent program (described on page 28) that controlled the Deep Space One spacecraft
is a particularly impressive example (Muscettola et al., 1998; Jonsson et al., 2000).
Goal-based agents are presupposed in everything from Aristotle’s view of practical rea-
soning to McCarthy’s early papers on logical AI. Shakey the Robot (Fikes and Nilsson,
1971; Nilsson, 1984) was the first robotic embodiment of a logical, goal-based agent. A
full logical analysis of goal-based agents appeared in Genesereth and Nilsson (1987), and a
goal-based programming methodology called agent-oriented programming was developed by
Shoham (1993). The agent-based approach is now extremely popular in software engineer-
ing (Ciancarini and Wooldridge, 2001). It has also infiltrated the area of operating systems,
where autonomic computing refers to computer systems and networks that monitor and con-
trol themselves with a perceive–act loop and machine learning methods (Kephart and Chess,
2003). Noting that a collection of agent programs designed to work well together in a true
multiagent environment necessarily exhibits modularity—the programs share no internal state
and communicate with each other only through the environment—it is common within the
field of multiagent systems to design the agent program of a single agent as a collection of
autonomous sub-agents. In some cases, one can even prove that the resulting system gives
the same optimal solutions as a monolithic design.
The goal-based view of agents also dominates the cognitive psychology tradition in the
area of problem solving, beginning with the enormously influential Human Problem Solv-
ing (Newell and Simon, 1972) and running through all of Newell’s later work (Newell, 1990).
Goals, further analyzed as desires (general) and intentions (currently pursued), are central to
the theory of agents developed by Bratman (1987). This theory has been influential both in
natural language understanding and multiagent systems.
Horvitz et al. (1988) specifically suggest the use of rationality conceived as the maxi-
mization of expected utility as a basis for AI. The text by Pearl (1988) was the first in AI to
cover probability and utility theory in depth; its exposition of practical methods for reasoning
and decision making under uncertainty was probably the single biggest factor in the rapid
shift towards utility-based agents in the 1990s (see Part IV).
The general design for learning agents portrayed in Figure 2.15 is classic in the machine
learning literature (Buchanan et al., 1978; Mitchell, 1997). Examples of the design, as em-
bodied in programs, go back at least as far as Arthur Samuel’s (1959, 1967) learning program
for playing checkers. Learning agents are discussed in depth in Part V.
Interest in agents and in agent design has risen rapidly in recent years, partly because of
the growth of the Internet and the perceived need for automated and mobile softbots (Etzioni
and Weld, 1994). Relevant papers are collected in Readings in Agents (Huhns and Singh,
1998) and Foundations of Rational Agency (Wooldridge and Rao, 1999). Texts on multiagent
systems usually provide a good introduction to many aspects of agent design (Weiss, 2000a;
Wooldridge, 2002). Several conference series devoted to agents began in the 1990s, including
the International Workshop on Agent Theories, Architectures, and Languages (ATAL), the
International Conference on Autonomous Agents (AGENTS), and the International Confer-
ence on Multi-Agent Systems (ICMAS). In 2002, these three merged to form the International
Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). The journal
Autonomous Agents and Multi-Agent Systems was founded in 1998. Finally, Dung Beetle
Ecology (Hanski and Cambefort, 1991) provides a wealth of interesting information on the
behavior of dung beetles. YouTube features inspiring video recordings of their activities.
EXERCISES
2.1 Let us examine the rationality of various vacuum-cleaner agent functions.
a. Show that the simple vacuum-cleaner agent function described in Figure 2.3 is indeed
rational under the assumptions listed on page 38.
b. Describe a rational agent function for the case in which each movement costs one point.
Does the corresponding agent program require internal state?
c. Discuss possible agent designs for the cases in which clean squares can become dirty
and the geography of the environment is unknown. Does it make sense for the agent to
learn from its experience in these cases? If so, what should it learn? If not, why not?
2.2 Write an essay on the relationship between evolution and one or more of autonomy,
intelligence, and learning.
2.3 For each of the following assertions, say whether it is true or false and support your
answer with examples or counterexamples where appropriate.
a. An agent that senses only partial information about the state cannot be perfectly rational.
b. There exist task environments in which no pure reflex agent can behave rationally.
c. There exists a task environment in which every agent is rational.
d. The input to an agent program is the same as the input to the agent function.
e. Every agent function is implementable by some program/machine combination.
f. Suppose an agent selects its action uniformly at random from the set of possible actions.
There exists a deterministic task environment in which this agent is rational.
g. It is possible for a given agent to be perfectly rational in two distinct task environments.
h. Every agent is rational in an unobservable environment.
i. A perfectly rational poker-playing agent never loses.
2.4 For each of the following activities, give a PEAS description of the task environment
and characterize it in terms of the properties listed in Section 2.3.2.
• Performing a gymnastics floor routine.
• Exploring the subsurface oceans of Titan.
• Playing soccer.
• Shopping for used AI books on the Internet.
• Practicing tennis against a wall.
• Performing a high jump.
• Bidding on an item at an auction.
2.5 Define in your own words the following terms: agent, agent function, agent program,
rationality, autonomy, reflex agent, model-based agent, goal-based agent, utility-based agent,
learning agent.
2.6 This exercise explores the differences between agent functions and agent programs.
a. Can there be more than one agent program that implements a given agent function?
Give an example, or show why one is not possible.
b. Are there agent functions that cannot be implemented by any agent program?
c. Given a fixed machine architecture, does each agent program implement exactly one
agent function?
d. Given an architecture with n bits of storage, how many different possible agent pro-
grams are there?
e. Suppose we keep the agent program fixed but speed up the machine by a factor of two.
Does that change the agent function?
2.7 Write pseudocode agent programs for the goal-based and utility-based agents.
2.8 Consider a simple thermostat that turns on a furnace when the temperature is at least 3
degrees below the setting, and turns off a furnace when the temperature is at least 3 degrees
above the setting. Is a thermostat an instance of a simple reflex agent, a model-based reflex
agent, or a goal-based agent?
The following exercises all concern the implementation of environments and agents for the
vacuum-cleaner world.
2.9 Implement a performance-measuring environment simulator for the vacuum-cleaner
world depicted in Figure 2.2 and specified on page 38. Your implementation should be modu-
lar so that the sensors, actuators, and environment characteristics (size, shape, dirt placement,
etc.) can be changed easily. (Note: for some choices of programming language and operating
system there are already implementations in the online code repository.)
2.10 Consider a modified version of the vacuum environment in Exercise 2.9, in which the
agent is penalized one point for each movement.
a. Can a simple reflex agent be perfectly rational for this environment? Explain.
b. What about a reflex agent with state? Design such an agent.
c. How do your answers to a and b change if the agent’s percepts give it the clean/dirty
status of every square in the environment?
2.11 Consider a modified version of the vacuum environment in Exercise 2.9, in which the
geography of the environment—its extent, boundaries, and obstacles—is unknown, as is the
initial dirt configuration. (The agent can go Up and Down as well as Left and Right.)
a. Can a simple reflex agent be perfectly rational for this environment? Explain.
b. Can a simple reflex agent with a randomized agent function outperform a simple reflex
agent? Design such an agent and measure its performance on several environments.
c. Can you design an environment in which your randomized agent will perform poorly?
Show your results.
d. Can a reflex agent with state outperform a simple reflex agent? Design such an agent
and measure its performance on several environments. Can you design a rational agent
of this type?
2.12 Repeat Exercise 2.11 for the case in which the location sensor is replaced with a
“bump” sensor that detects the agent’s attempts to move into an obstacle or to cross the
boundaries of the environment. Suppose the bump sensor stops working; how should the
agent behave?
2.13 The vacuum environments in the preceding exercises have all been deterministic. Dis-
cuss possible agent programs for each of the following stochastic versions:
a. Murphy’s law: twenty-five percent of the time, the Suck action fails to clean the floor if
it is dirty and deposits dirt onto the floor if the floor is clean. How is your agent program
affected if the dirt sensor gives the wrong answer 10% of the time?
b. Small children: At each time step, each clean square has a 10% chance of becoming
dirty. Can you come up with a rational agent design for this case?
3 SOLVING PROBLEMS BY
SEARCHING
In which we see how an agent can find a sequence of actions that achieves its
goals when no single action will do.
The simplest agents discussed in Chapter 2 were the reflex agents, which base their actions on
a direct mapping from states to actions. Such agents cannot operate well in environments for
which this mapping would be too large to store and would take too long to learn. Goal-based
agents, on the other hand, consider future actions and the desirability of their outcomes.
This chapter describes one kind of goal-based agent called a problem-solving agent.
Problem-solving agents use atomic representations, as described in Section 2.4.7—that is,
states of the world are considered as wholes, with no internal structure visible to the problem-
solving algorithms. Goal-based agents that use more advanced factored or structured rep-
resentations are usually called planning agents and are discussed in Chapters 7 and 10.
Our discussion of problem solving begins with precise definitions of problems and their
solutions and gives several examples to illustrate these definitions. We then describe several
general-purpose search algorithms that can be used to solve these problems. We will see
several uninformed search algorithms—algorithms that are given no information about the
problem other than its definition. Although some of these algorithms can solve any solvable
problem, none of them can do so efficiently. Informed search algorithms, on the other hand,
can do quite well given some guidance on where to look for solutions.
In this chapter, we limit ourselves to the simplest kind of task environment, for which
the solution to a problem is always a fixed sequence of actions. The more general case—where
the agent’s future actions may vary depending on future percepts—is handled in Chapter 4.
This chapter uses the concepts of asymptotic complexity (that is, O() notation) and
NP-completeness. Readers unfamiliar with these concepts should consult Appendix A.
3.1 PROBLEM-SOLVING AGENTS
Intelligent agents are supposed to maximize their performance measure. As we mentioned
in Chapter 2, achieving this is sometimes simplified if the agent can adopt a goal and aim at
satisfying it. Let us first look at why and how an agent might do this.
Imagine an agent in the city of Arad, Romania, enjoying a touring holiday. The agent’s
performance measure contains many factors: it wants to improve its suntan, improve its Ro-
manian, take in the sights, enjoy the nightlife (such as it is), avoid hangovers, and so on. The
decision problem is a complex one involving many tradeoffs and careful reading of guide-
books. Now, suppose the agent has a nonrefundable ticket to fly out of Bucharest the follow-
ing day. In that case, it makes sense for the agent to adopt the goal of getting to Bucharest.
Courses of action that don’t reach Bucharest on time can be rejected without further consid-
eration and the agent’s decision problem is greatly simplified. Goals help organize behavior
by limiting the objectives that the agent is trying to achieve and hence the actions it needs
to consider. Goal formulation, based on the current situation and the agent’s performance
measure, is the first step in problem solving.
We will consider a goal to be a set of world states—exactly those states in which the
goal is satisfied. The agent’s task is to find out how to act, now and in the future, so that it
reaches a goal state. Before it can do this, it needs to decide (or we need to decide on its
behalf) what sorts of actions and states it should consider. If it were to consider actions at
the level of “move the left foot forward an inch” or “turn the steering wheel one degree left,”
the agent would probably never find its way out of the parking lot, let alone to Bucharest,
because at that level of detail there is too much uncertainty in the world and there would be
too many steps in a solution. Problem formulation is the process of deciding what actions
and states to consider, given a goal. We discuss this process in more detail later. For now, let
us assume that the agent will consider actions at the level of driving from one major town to
another. Each state therefore corresponds to being in a particular town.
Our agent has now adopted the goal of driving to Bucharest and is considering where
to go from Arad. Three roads lead out of Arad, one toward Sibiu, one to Timisoara, and one
to Zerind. None of these achieves the goal, so unless the agent is familiar with the geography
of Romania, it will not know which road to follow.1 In other words, the agent will not know
which of its possible actions is best, because it does not yet know enough about the state
that results from taking each action. If the agent has no additional information—i.e., if the
environment is unknown in the sense defined in Section 2.3—then it has no choice but to
try one of the actions at random. This sad situation is discussed in Chapter 4.
But suppose the agent has a map of Romania. The point of a map is to provide the
agent with information about the states it might get itself into and the actions it can take. The
agent can use this information to consider subsequent stages of a hypothetical journey via
each of the three towns, trying to find a journey that eventually gets to Bucharest. Once it has
found a path on the map from Arad to Bucharest, it can achieve its goal by carrying out the
driving actions that correspond to the legs of the journey. In general, an agent with several
immediate options of unknown value can decide what to do by first examining future actions
that eventually lead to states of known value.
To be more specific about what we mean by “examining future actions,” we have to
be more specific about properties of the environment, as defined in Section 2.3. For now,
1 We are assuming that most readers are in the same position and can easily imagine themselves to be as clueless
as our agent. We apologize to Romanian readers who are unable to take advantage of this pedagogical device.
we assume that the environment is observable, so the agent always knows the current state.
For the agent driving in Romania, it’s reasonable to suppose that each city on the map has a
sign indicating its presence to arriving drivers. We also assume the environment is discrete,
so at any given state there are only finitely many actions to choose from. This is true for
navigating in Romania because each city is connected to a small number of other cities. We
will assume the environment is known, so the agent knows which states are reached by each
action. (Having an accurate map suffices to meet this condition for navigation problems.)
Finally, we assume that the environment is deterministic, so each action has exactly one
outcome. Under ideal conditions, this is true for the agent in Romania—it means that if it
chooses to drive from Arad to Sibiu, it does end up in Sibiu. Of course, conditions are not
always ideal, as we show in Chapter 4.
Under these assumptions, the solution to any problem is a fixed sequence of actions.
“Of course!” one might say, “What else could it be?” Well, in general it could be a branching
strategy that recommends different actions in the future depending on what percepts arrive.
For example, under less than ideal conditions, the agent might plan to drive from Arad to
Sibiu and then to Rimnicu Vilcea but may also need to have a contingency plan in case it
arrives by accident in Zerind instead of Sibiu. Fortunately, if the agent knows the initial state
and the environment is known and deterministic, it knows exactly where it will be after the
first action and what it will perceive. Since only one percept is possible after the first action,
the solution can specify only one possible second action, and so on.
The process of looking for a sequence of actions that reaches the goal is called search.
A search algorithm takes a problem as input and returns a solution in the form of an action
sequence. Once a solution is found, the actions it recommends can be carried out. This
is called the execution phase. Thus, we have a simple "formulate, search, execute" design
for the agent, as shown in Figure 3.1. After formulating a goal and a problem to solve,
the agent calls a search procedure to solve it. It then uses the solution to guide its actions,
doing whatever the solution recommends as the next thing to do—typically, the first action of
the sequence—and then removing that step from the sequence. Once the solution has been
executed, the agent will formulate a new goal.
Notice that while the agent is executing the solution sequence it ignores its percepts
when choosing an action because it knows in advance what they will be. An agent that
carries out its plans with its eyes closed, so to speak, must be quite certain of what is going
on. Control theorists call this an open-loop system, because ignoring the percepts breaks the
loop between agent and environment.
We first describe the process of problem formulation, and then devote the bulk of the
chapter to various algorithms for the SEARCH function. We do not discuss the workings of
the UPDATE-STATE and FORMULATE-GOAL functions further in this chapter.
3.1.1 Well-defined problems and solutions
A problem can be defined formally by five components:
• The initial state that the agent starts in. For example, the initial state for our agent in
Romania might be described as In(Arad).
function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action
  persistent: seq, an action sequence, initially empty
              state, some description of the current world state
              goal, a goal, initially null
              problem, a problem formulation

  state ← UPDATE-STATE(state, percept)
  if seq is empty then
    goal ← FORMULATE-GOAL(state)
    problem ← FORMULATE-PROBLEM(state, goal)
    seq ← SEARCH(problem)
    if seq = failure then return a null action
  action ← FIRST(seq)
  seq ← REST(seq)
  return action
Figure 3.1 A simple problem-solving agent. It first formulates a goal and a problem,
searches for a sequence of actions that would solve the problem, and then executes the actions
one at a time. When this is complete, it formulates another goal and starts over.
• A description of the possible actions available to the agent. Given a particular state s,
ACTIONS(s) returns the set of actions that can be executed in s. We say that each of
these actions is applicable in s. For example, from the state In(Arad), the applicable
actions are {Go(Sibiu), Go(Timisoara), Go(Zerind)}.
• A description of what each action does; the formal name for this is the transition
model, specified by a function RESULT(s, a) that returns the state that results from
doing action a in state s. We also use the term successor to refer to any state reachable
from a given state by a single action.2 For example, we have
RESULT(In(Arad), Go(Zerind)) = In(Zerind) .
Together, the initial state, actions, and transition model implicitly define the state space
of the problem—the set of all states reachable from the initial state by any sequence
of actions. The state space forms a directed network or graph in which the nodes
are states and the links between nodes are actions. (The map of Romania shown in
Figure 3.2 can be interpreted as a state-space graph if we view each road as standing
for two driving actions, one in each direction.) A path in the state space is a sequence
of states connected by a sequence of actions.
• The goal test, which determines whether a given state is a goal state. Sometimes there
is an explicit set of possible goal states, and the test simply checks whether the given
state is one of them. The agent’s goal in Romania is the singleton set {In(Bucharest)}.
2 Many treatments of problem solving, including previous editions of this book, use a successor function, which
returns the set of all successors, instead of separate ACTIONS and RESULT functions. The successor function
makes it difficult to describe an agent that knows what actions it can try but not what they achieve. Also, note
some authors use RESULT(a, s) instead of RESULT(s, a), and some use DO instead of RESULT.
Figure 3.2 A simplified road map of part of Romania.
Sometimes the goal is specified by an abstract property rather than an explicitly enumer-
ated set of states. For example, in chess, the goal is to reach a state called “checkmate,”
where the opponent’s king is under attack and can’t escape.
• A path cost function that assigns a numeric cost to each path. The problem-solving
agent chooses a cost function that reflects its own performance measure. For the agent
trying to get to Bucharest, time is of the essence, so the cost of a path might be its length
in kilometers. In this chapter, we assume that the cost of a path can be described as the
sum of the costs of the individual actions along the path.3 The step cost of taking action
a in state s to reach state s′ is denoted by c(s, a, s′). The step costs for Romania are
shown in Figure 3.2 as route distances. We assume that step costs are nonnegative.4
The preceding elements define a problem and can be gathered into a single data structure
that is given as input to a problem-solving algorithm. A solution to a problem is an action
sequence that leads from the initial state to a goal state. Solution quality is measured by the
path cost function, and an optimal solution has the lowest path cost among all solutions.
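As a concrete illustration, the five components might be packaged roughly as follows in Python (a minimal sketch; the class and method names are illustrative choices, not part of the formal definition):

    # A sketch of the five problem components as a Python class.
    # actions() and result() play the roles of ACTIONS and RESULT above.
    class Problem:
        def __init__(self, initial, goal=None):
            self.initial = initial        # the initial state
            self.goal = goal              # an explicit goal state, if there is one

        def actions(self, state):
            """Return the actions applicable in state."""
            raise NotImplementedError

        def result(self, state, action):
            """Transition model: the state that results from doing action in state."""
            raise NotImplementedError

        def goal_test(self, state):
            """Return True if state is a goal state."""
            return state == self.goal

        def step_cost(self, state, action, next_state):
            """The step cost c(s, a, s') of taking action in state to reach next_state."""
            return 1                      # default: every action costs 1

Subclasses supply the state representation and override these methods; the search algorithms later in the chapter need nothing more than this interface.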
3.1.2 Formulating problems
In the preceding section we proposed a formulation of the problem of getting to Bucharest in
terms of the initial state, actions, transition model, goal test, and path cost. This formulation
seems reasonable, but it is still a model—an abstract mathematical description—and not the
3 This assumption is algorithmically convenient but also theoretically justifiable—see page 652 in Chapter 17.
4 The implications of negative costs are explored in Exercise 3.8.
real thing. Compare the simple state description we have chosen, In(Arad), to an actual cross-
country trip, where the state of the world includes so many things: the traveling companions,
the current radio program, the scenery out of the window, the proximity of law enforcement
officers, the distance to the next rest stop, the condition of the road, the weather, and so on.
All these considerations are left out of our state descriptions because they are irrelevant to the
problem of finding a route to Bucharest. The process of removing detail from a representation
is called abstraction.
In addition to abstracting the state description, we must abstract the actions themselves.
A driving action has many effects. Besides changing the location of the vehicle and its oc-
cupants, it takes up time, consumes fuel, generates pollution, and changes the agent (as they
say, travel is broadening). Our formulation takes into account only the change in location.
Also, there are many actions that we omit altogether: turning on the radio, looking out of
the window, slowing down for law enforcement officers, and so on. And of course, we don’t
specify actions at the level of “turn steering wheel to the left by one degree.”
Can we be more precise about defining the appropriate level of abstraction? Think of the
abstract states and actions we have chosen as corresponding to large sets of detailed world
states and detailed action sequences. Now consider a solution to the abstract problem: for
example, the path from Arad to Sibiu to Rimnicu Vilcea to Pitesti to Bucharest. This abstract
solution corresponds to a large number of more detailed paths. For example, we could drive
with the radio on between Sibiu and Rimnicu Vilcea, and then switch it off for the rest of
the trip. The abstraction is valid if we can expand any abstract solution into a solution in the
more detailed world; a sufficient condition is that for every detailed state that is “in Arad,”
there is a detailed path to some state that is “in Sibiu,” and so on.5 The abstraction is useful
if carrying out each of the actions in the solution is easier than the original problem; in this
case they are easy enough that they can be carried out without further search or planning by
an average driving agent. The choice of a good abstraction thus involves removing as much
detail as possible while retaining validity and ensuring that the abstract actions are easy to
carry out. Were it not for the ability to construct useful abstractions, intelligent agents would
be completely swamped by the real world.
3.2 EXAMPLE PROBLEMS
The problem-solving approach has been applied to a vast array of task environments. We
list some of the best known here, distinguishing between toy and real-world problems. A
toy problem is intended to illustrate or exercise various problem-solving methods. It can be
given a concise, exact description and hence is usable by different researchers to compare the
performance of algorithms. A real-world problem is one whose solutions people actually
care about. Such problems tend not to have a single agreed-upon description, but we can give
the general flavor of their formulations.
5 See Section 11.2 for a more complete set of definitions and algorithms.
Figure 3.3 The state space for the vacuum world. Links denote actions: L = Left, R =
Right, S = Suck.
3.2.1 Toy problems
The first example we examine is the vacuum world first introduced in Chapter 2. (See
Figure 2.2.) This can be formulated as a problem as follows:
• States: The state is determined by both the agent location and the dirt locations. The
agent is in one of two locations, each of which might or might not contain dirt. Thus,
there are 2 × 2² = 8 possible world states. A larger environment with n locations has
n · 2ⁿ states.
• Initial state: Any state can be designated as the initial state.
• Actions: In this simple environment, each state has just three actions: Left, Right, and
Suck. Larger environments might also include Up and Down.
• Transition model: The actions have their expected effects, except that moving Left in
the leftmost square, moving Right in the rightmost square, and Sucking in a clean square
have no effect. The complete state space is shown in Figure 3.3.
• Goal test: This checks whether all the squares are clean.
• Path cost: Each step costs 1, so the path cost is the number of steps in the path.
Compared with the real world, this toy problem has discrete locations, discrete dirt, reliable
cleaning, and it never gets any dirtier. Chapter 4 relaxes some of these assumptions.
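This formulation can be written down almost verbatim. The sketch below (one possible encoding, mirroring the Problem interface sketched in Section 3.1.1) represents a state as a pair (agent location, frozenset of dirty squares), which gives exactly the eight states described above:

    class VacuumProblem:
        """State = (agent location, frozenset of dirty squares); locations are 'A' and 'B'."""
        def __init__(self, initial):
            self.initial = initial

        def actions(self, state):
            return ['Left', 'Right', 'Suck']

        def result(self, state, action):
            loc, dirt = state
            if action == 'Suck':
                return (loc, dirt - {loc})     # sucking in a clean square changes nothing
            return ('A', dirt) if action == 'Left' else ('B', dirt)

        def goal_test(self, state):
            return not state[1]                # goal: no square is dirty

        def step_cost(self, state, action, next_state):
            return 1                           # each step costs 1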
The 8-puzzle, an instance of which is shown in Figure 3.4, consists of a 3×3 board with
eight numbered tiles and a blank space. A tile adjacent to the blank space can slide into the
space. The object is to reach a specified goal state, such as the one shown on the right of the
figure. The standard formulation is as follows:
Figure 3.4 A typical instance of the 8-puzzle.
• States: A state description specifies the location of each of the eight tiles and the blank
in one of the nine squares.
• Initial state: Any state can be designated as the initial state. Note that any given goal
can be reached from exactly half of the possible initial states (Exercise 3.5).
• Actions: The simplest formulation defines the actions as movements of the blank space
Left, Right, Up, or Down. Different subsets of these are possible depending on where
the blank is.
• Transition model: Given a state and action, this returns the resulting state; for example,
if we apply Left to the start state in Figure 3.4, the resulting state has the 5 and the blank
switched.
• Goal test: This checks whether the state matches the goal configuration shown in Fig-
ure 3.4. (Other goal configurations are possible.)
• Path cost: Each step costs 1, so the path cost is the number of steps in the path.
What abstractions have we included here? The actions are abstracted to their beginning and
final states, ignoring the intermediate locations where the block is sliding. We have abstracted
away actions such as shaking the board when pieces get stuck and ruled out extracting the
pieces with a knife and putting them back again. We are left with a description of the rules of
the puzzle, avoiding all the details of physical manipulations.
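The same formulation might look like the following Python sketch, with a state stored as a 9-tuple read row by row and 0 marking the blank (the goal layout chosen here is arbitrary; any fixed configuration would do):

    class EightPuzzle:
        def __init__(self, initial, goal=(0, 1, 2, 3, 4, 5, 6, 7, 8)):
            self.initial, self.goal = initial, goal

        def actions(self, state):
            """Movements of the blank that stay on the 3x3 board."""
            i = state.index(0)                       # position of the blank
            moves = []
            if i % 3 > 0: moves.append('Left')
            if i % 3 < 2: moves.append('Right')
            if i // 3 > 0: moves.append('Up')
            if i // 3 < 2: moves.append('Down')
            return moves

        def result(self, state, action):
            i = state.index(0)
            j = {'Left': i - 1, 'Right': i + 1, 'Up': i - 3, 'Down': i + 3}[action]
            s = list(state)
            s[i], s[j] = s[j], s[i]                  # slide the adjacent tile into the blank
            return tuple(s)

        def goal_test(self, state):
            return state == self.goal

        def step_cost(self, state, action, next_state):
            return 1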
The 8-puzzle belongs to the family of sliding-block puzzles, which are often used as
test problems for new search algorithms in AI. This family is known to be NP-complete,
so one does not expect to find methods significantly better in the worst case than the search
algorithms described in this chapter and the next. The 8-puzzle has 9!/2 = 181,440 reachable
states and is easily solved. The 15-puzzle (on a 4×4 board) has around 1.3 trillion states, and
random instances can be solved optimally in a few milliseconds by the best search algorithms.
The 24-puzzle (on a 5 × 5 board) has around 10²⁵ states, and random instances take several
hours to solve optimally.
The goal of the 8-queens problem is to place eight queens on a chessboard such that
no queen attacks any other. (A queen attacks any piece in the same row, column or diago-
nal.) Figure 3.5 shows an attempted solution that fails: the queen in the rightmost column is
attacked by the queen at the top left.
Figure 3.5 Almost a solution to the 8-queens problem. (Solution is left as an exercise.)
Although efficient special-purpose algorithms exist for this problem and for the whole
n-queens family, it remains a useful test problem for search algorithms. There are two main
kinds of formulation. An incremental formulation involves operators that augment the state
description, starting with an empty state; for the 8-queens problem, this means that each
action adds a queen to the state. A complete-state formulation starts with all 8 queens on
the board and moves them around. In either case, the path cost is of no interest because only
the final state counts. The first incremental formulation one might try is the following:
• States: Any arrangement of 0 to 8 queens on the board is a state.
• Initial state: No queens on the board.
• Actions: Add a queen to any empty square.
• Transition model: Returns the board with a queen added to the specified square.
• Goal test: 8 queens are on the board, none attacked.
In this formulation, we have 64 · 63 · · · 57 ≈ 1.8 × 10¹⁴ possible sequences to investigate. A
better formulation would prohibit placing a queen in any square that is already attacked:
• States: All possible arrangements of n queens (0 ≤ n ≤ 8), one per column in the
leftmost n columns, with no queen attacking another.
• Actions: Add a queen to any square in the leftmost empty column such that it is not
attacked by any other queen.
This formulation reduces the 8-queens state space from 1.8 × 10¹⁴ to just 2,057, and solutions
are easy to find. On the other hand, for 100 queens the reduction is from roughly 10⁴⁰⁰ states
to about 10⁵² states (Exercise 3.6)—a big improvement, but not enough to make the problem
tractable. Section 4.1 describes the complete-state formulation, and Chapter 6 gives a simple
algorithm that solves even the million-queens problem with ease.
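The figure of 2,057 can be checked by brute force: enumerate every non-attacking arrangement of 0 to 8 queens filling the leftmost columns, one per column. The sketch below (illustrative code, representing a state as a tuple of queen rows) does exactly that:

    def attacks(col_a, row_a, col_b, row_b):
        return row_a == row_b or abs(row_a - row_b) == abs(col_a - col_b)

    def count_states(n=8):
        states, frontier = [()], [()]                  # start from the empty board
        while frontier:
            rows = frontier.pop()
            col = len(rows)
            if col == n:
                continue
            for row in range(n):                       # add a queen to the next column
                if all(not attacks(c, r, col, row) for c, r in enumerate(rows)):
                    child = rows + (row,)
                    states.append(child)
                    frontier.append(child)
        return len(states)

    print(count_states())                              # 2057; the 92 full boards are the solutions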
Our final toy problem was devised by Donald Knuth (1964) and illustrates how infinite
state spaces can arise. Knuth conjectured that, starting with the number 4, a sequence of fac-
torial, square root, and floor operations will reach any desired positive integer. For example,
we can reach 5 from 4 as follows:
        ⌊√√√√√(4!)!⌋ = 5 .
The problem definition is very simple:
• States: Positive numbers.
• Initial state: 4.
• Actions: Apply factorial, square root, or floor operation (factorial for integers only).
• Transition model: As given by the mathematical definitions of the operations.
• Goal test: State is the desired positive integer.
To our knowledge there is no bound on how large a number might be constructed in the pro-
cess of reaching a given target—for example, the number 620,448,401,733,239,439,360,000
is generated in the expression for 5—so the state space for this problem is infinite. Such
state spaces arise frequently in tasks involving the generation of mathematical expressions,
circuits, proofs, programs, and other recursively defined objects.
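A few lines of Python confirm the arithmetic in the example above (an illustrative check; the expression applies one factorial to 4!, then five square roots, then a final floor):

    import math

    n = math.factorial(math.factorial(4))    # (4!)! = 620,448,401,733,239,439,360,000
    for _ in range(5):
        n = math.sqrt(n)                      # five nested square roots
    print(math.floor(n))                      # prints 5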
3.2.2 Real-world problems
We have already seen how the route-finding problem is defined in terms of specified loca-
tions and transitions along links between them. Route-finding algorithms are used in a variety
of applications. Some, such as Web sites and in-car systems that provide driving directions,
are relatively straightforward extensions of the Romania example. Others, such as routing
video streams in computer networks, military operations planning, and airline travel-planning
systems, involve much more complex specifications. Consider the airline travel problems that
must be solved by a travel-planning Web site:
• States: Each state obviously includes a location (e.g., an airport) and the current time.
Furthermore, because the cost of an action (a flight segment) may depend on previous
segments, their fare bases, and their status as domestic or international, the state must
record extra information about these “historical” aspects.
• Initial state: This is specified by the user’s query.
• Actions: Take any flight from the current location, in any seat class, leaving after the
current time, leaving enough time for within-airport transfer if needed.
• Transition model: The state resulting from taking a flight will have the flight’s desti-
nation as the current location and the flight’s arrival time as the current time.
• Goal test: Are we at the final destination specified by the user?
• Path cost: This depends on monetary cost, waiting time, flight time, customs and im-
migration procedures, seat quality, time of day, type of airplane, frequent-flyer mileage
awards, and so on.
Commercial travel advice systems use a problem formulation of this kind, with many addi-
tional complications to handle the byzantine fare structures that airlines impose. Any sea-
soned traveler knows, however, that not all air travel goes according to plan. A really good
system should include contingency plans—such as backup reservations on alternate flights—
to the extent that these are justified by the cost and likelihood of failure of the original plan.
Touring problems are closely related to route-finding problems, but with an impor-
tant difference. Consider, for example, the problem “Visit every city in Figure 3.2 at least
once, starting and ending in Bucharest.” As with route finding, the actions correspond
to trips between adjacent cities. The state space, however, is quite different. Each state
must include not just the current location but also the set of cities the agent has visited.
So the initial state would be In(Bucharest), Visited({Bucharest}), a typical intermedi-
ate state would be In(Vaslui), Visited({Bucharest, Urziceni, Vaslui}), and the goal test
would check whether the agent is in Bucharest and all 20 cities have been visited.
The traveling salesperson problem (TSP) is a touring problem in which each city
must be visited exactly once. The aim is to find the shortest tour. The problem is known to
be NP-hard, but an enormous amount of effort has been expended to improve the capabilities
of TSP algorithms. In addition to planning trips for traveling salespersons, these algorithms
have been used for tasks such as planning movements of automatic circuit-board drills and of
stocking machines on shop floors.
A VLSI layout problem requires positioning millions of components and connections
on a chip to minimize area, minimize circuit delays, minimize stray capacitances, and max-
imize manufacturing yield. The layout problem comes after the logical design phase and is
usually split into two parts: cell layout and channel routing. In cell layout, the primitive
components of the circuit are grouped into cells, each of which performs some recognized
function. Each cell has a fixed footprint (size and shape) and requires a certain number of
connections to each of the other cells. The aim is to place the cells on the chip so that they do
not overlap and so that there is room for the connecting wires to be placed between the cells.
Channel routing finds a specific route for each wire through the gaps between the cells. These
search problems are extremely complex, but definitely worth solving. Later in this chapter,
we present some algorithms capable of solving them.
Robot navigation is a generalization of the route-finding problem described earlier.
Rather than following a discrete set of routes, a robot can move in a continuous space with
(in principle) an infinite set of possible actions and states. For a circular robot moving on a
flat surface, the space is essentially two-dimensional. When the robot has arms and legs or
wheels that must also be controlled, the search space becomes many-dimensional. Advanced
techniques are required just to make the search space finite. We examine some of these
methods in Chapter 25. In addition to the complexity of the problem, real robots must also
deal with errors in their sensor readings and motor controls.
Automatic assembly sequencing of complex objects by a robot was first demonstrated
by FREDDY (Michie, 1972). Progress since then has been slow but sure, to the point where
the assembly of intricate objects such as electric motors is economically feasible. In assembly
problems, the aim is to find an order in which to assemble the parts of some object. If the
wrong order is chosen, there will be no way to add some part later in the sequence without
undoing some of the work already done. Checking a step in the sequence for feasibility is a
difficult geometrical search problem closely related to robot navigation. Thus, the generation
of legal actions is the expensive part of assembly sequencing. Any practical algorithm must
avoid exploring all but a tiny fraction of the state space. Another important assembly problem
is protein design, in which the goal is to find a sequence of amino acids that will fold into a
three-dimensional protein with the right properties to cure some disease.
3.3 SEARCHING FOR SOLUTIONS
Having formulated some problems, we now need to solve them. A solution is an action
sequence, so search algorithms work by considering various possible action sequences. The
possible action sequences starting at the initial state form a search tree with the initial state
at the root; the branches are actions and the nodes correspond to states in the state space of
the problem. Figure 3.6 shows the first few steps in growing the search tree for finding a route
from Arad to Bucharest. The root node of the tree corresponds to the initial state, In(Arad).
The first step is to test whether this is a goal state. (Clearly it is not, but it is important to
check so that we can solve trick problems like “starting in Arad, get to Arad.”) Then we
need to consider taking various actions. We do this by expanding the current state; that is,
applying each legal action to the current state, thereby generating a new set of states. In
this case, we add three branches from the parent node In(Arad) leading to three new child
nodes: In(Sibiu), In(Timisoara), and In(Zerind). Now we must choose which of these three
possibilities to consider further.
This is the essence of search—following up one option now and putting the others aside
for later, in case the first choice does not lead to a solution. Suppose we choose Sibiu first.
We check to see whether it is a goal state (it is not) and then expand it to get In(Arad),
In(Fagaras), In(Oradea), and In(RimnicuVilcea). We can then choose any of these four or go
back and choose Timisoara or Zerind. Each of these six nodes is a leaf node, that is, a node
with no children in the tree. The set of all leaf nodes available for expansion at any given
point is called the frontier. (Many authors call it the open list, which is both geographically
less evocative and less accurate, because other data structures are better suited than a list.) In
Figure 3.6, the frontier of each tree consists of those nodes with bold outlines.
The process of expanding nodes on the frontier continues until either a solution is found
or there are no more states to expand. The general TREE-SEARCH algorithm is shown infor-
mally in Figure 3.7. Search algorithms all share this basic structure; they vary primarily
according to how they choose which state to expand next—the so-called search strategy.
The eagle-eyed reader will notice one peculiar thing about the search tree shown in Fig-
ure 3.6: it includes the path from Arad to Sibiu and back to Arad again! We say that In(Arad)
is a repeated state in the search tree, generated in this case by a loopy path. Considering
such loopy paths means that the complete search tree for Romania is infinite because there
is no limit to how often one can traverse a loop. On the other hand, the state space—the
map shown in Figure 3.2—has only 20 states. As we discuss in Section 3.4, loops can cause
certain algorithms to fail, making otherwise solvable problems unsolvable. Fortunately, there
is no need to consider loopy paths. We can rely on more than intuition for this: because path
costs are additive and step costs are nonnegative, a loopy path to any given state is never
better than the same path with the loop removed.
Loopy paths are a special case of the more general concept of redundant paths, which
exist whenever there is more than one way to get from one state to another. Consider the paths
Arad–Sibiu (140 km long) and Arad–Zerind–Oradea–Sibiu (297 km long). Obviously, the
second path is redundant—it’s just a worse way to get to the same state. If you are concerned
about reaching the goal, there’s never any reason to keep more than one path to any given
state, because any goal state that is reachable by extending one path is also reachable by
extending the other.
In some cases, it is possible to define the problem itself so as to eliminate redundant
paths. For example, if we formulate the 8-queens problem (page 71) so that a queen can be
placed in any column, then each state with n queens can be reached by n! different paths; but
if we reformulate the problem so that each new queen is placed in the leftmost empty column,
then each state can be reached only through one path.
(a) The initial state. (b) After expanding Arad. (c) After expanding Sibiu.
Figure 3.6 Partial search trees for finding a route from Arad to Bucharest. Nodes that
have been expanded are shaded; nodes that have been generated but not yet expanded are
outlined in bold; nodes that have not yet been generated are shown in faint dashed lines.
function TREE-SEARCH(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    expand the chosen node, adding the resulting nodes to the frontier

function GRAPH-SEARCH(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  initialize the explored set to be empty
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    add the node to the explored set
    expand the chosen node, adding the resulting nodes to the frontier
      only if not in the frontier or explored set
Figure 3.7 An informal description of the general tree-search and graph-search algo-
rithms. The parts of GRAPH-SEARCH marked in bold italic are the additions needed to
handle repeated states.
In other cases, redundant paths are unavoidable. This includes all problems where
the actions are reversible, such as route-finding problems and sliding-block puzzles. Route-
finding on a rectangular grid (like the one used later for Figure 3.9) is a particularly impor-
tant example in computer games. In such a grid, each state has four successors, so a search
tree of depth d that includes repeated states has 4ᵈ leaves; but there are only about 2d² distinct
states within d steps of any given state. For d = 20, this means about a trillion nodes but only
about 800 distinct states. Thus, following redundant paths can cause a tractable problem to
become intractable. This is true even for algorithms that know how to avoid infinite loops.
As the saying goes, algorithms that forget their history are doomed to repeat it. The
way to avoid exploring redundant paths is to remember where one has been. To do this, we
augment the TREE-SEARCH algorithm with a data structure called the explored set (also
known as the closed list), which remembers every expanded node. Newly generated nodes
that match previously generated nodes—ones in the explored set or the frontier—can be dis-
carded instead of being added to the frontier. The new algorithm, called GRAPH-SEARCH, is
shown informally in Figure 3.7. The specific algorithms in this chapter draw on this general
design.
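In Python, the graph-search skeleton of Figure 3.7 might be rendered as the following sketch; states are assumed to be hashable, expand is assumed to yield the successors of a state, and the FIFO frontier ordering is just one possible choice:

    from collections import deque

    def graph_search(initial, goal_test, expand):
        frontier = deque([(initial, [initial])])         # (state, path of states to it)
        frontier_states = {initial}
        explored = set()
        while frontier:
            state, path = frontier.popleft()             # FIFO here; other orders give other strategies
            frontier_states.discard(state)
            if goal_test(state):
                return path
            explored.add(state)
            for child in expand(state):
                if child not in explored and child not in frontier_states:
                    frontier.append((child, path + [child]))
                    frontier_states.add(child)
        return None                                      # failure: nothing left to expand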
Clearly, the search tree constructed by the GRAPH-SEARCH algorithm contains at most
one copy of each state, so we can think of it as growing a tree directly on the state-space graph,
as shown in Figure 3.8. The algorithm has another nice property: the frontier separates the
state-space graph into the explored region and the unexplored region, so that every path from
Figure 3.8 A sequence of search trees generated by a graph search on the Romania prob-
lem of Figure 3.2. At each stage, we have extended each path by one step. Notice that at the
third stage, the northernmost city (Oradea) has become a dead end: both of its successors are
already explored via other paths.
Figure 3.9 The separation property of GRAPH-SEARCH, illustrated on a rectangular-grid
problem. The frontier (white nodes) always separates the explored region of the state space
(black nodes) from the unexplored region (gray nodes). In (a), just the root has been ex-
panded. In (b), one leaf node has been expanded. In (c), the remaining successors of the root
have been expanded in clockwise order.
the initial state to an unexplored state has to pass through a state in the frontier. (If this
seems completely obvious, try Exercise 3.14 now.) This property is illustrated in Figure 3.9.
As every step moves a state from the frontier into the explored region while moving some
states from the unexplored region into the frontier, we see that the algorithm is systematically
examining the states in the state space, one by one, until it finds a solution.
3.3.1 Infrastructure for search algorithms
Search algorithms require a data structure to keep track of the search tree that is being con-
structed. For each node n of the tree, we have a structure that contains four components:
• n.STATE: the state in the state space to which the node corresponds;
• n.PARENT: the node in the search tree that generated this node;
• n.ACTION: the action that was applied to the parent to generate the node;
• n.PATH-COST: the cost, traditionally denoted by g(n), of the path from the initial state
to the node, as indicated by the parent pointers.
Figure 3.10 Nodes are the data structures from which the search tree is constructed. Each
has a parent, a state, and various bookkeeping fields. Arrows point from child to parent.
Given the components for a parent node, it is easy to see how to compute the necessary
components for a child node. The function CHILD-NODE takes a parent node and an action
and returns the resulting child node:
function CHILD-NODE(problem, parent, action) returns a node
  return a node with
    STATE = problem.RESULT(parent.STATE, action),
    PARENT = parent, ACTION = action,
    PATH-COST = parent.PATH-COST + problem.STEP-COST(parent.STATE, action)
The node data structure is depicted in Figure 3.10. Notice how the PARENT pointers
string the nodes together into a tree structure. These pointers also allow the solution path to be
extracted when a goal node is found; we use the SOLUTION function to return the sequence
of actions obtained by following parent pointers back to the root.
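A possible Python rendering of the node structure, CHILD-NODE, and SOLUTION is sketched below; the problem object is assumed to expose result() and step_cost() as in the earlier Problem sketch:

    class Node:
        def __init__(self, state, parent=None, action=None, path_cost=0):
            self.state = state            # the state this node corresponds to
            self.parent = parent          # the node that generated this one
            self.action = action          # the action applied to the parent
            self.path_cost = path_cost    # g(n): cost of the path from the initial state

    def child_node(problem, parent, action):
        state = problem.result(parent.state, action)
        cost = parent.path_cost + problem.step_cost(parent.state, action, state)
        return Node(state, parent, action, cost)

    def solution(node):
        """Follow parent pointers back to the root and return the action sequence."""
        actions = []
        while node.parent is not None:
            actions.append(node.action)
            node = node.parent
        return list(reversed(actions))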
Up to now, we have not been very careful to distinguish between nodes and states, but in
writing detailed algorithms it’s important to make that distinction. A node is a bookkeeping
data structure used to represent the search tree. A state corresponds to a configuration of the
world. Thus, nodes are on particular paths, as defined by PARENT pointers, whereas states
are not. Furthermore, two different nodes can contain the same world state if that state is
generated via two different search paths.
Now that we have nodes, we need somewhere to put them. The frontier needs to be
stored in such a way that the search algorithm can easily choose the next node to expand
according to its preferred strategy. The appropriate data structure for this is a queue. The
operations on a queue are as follows:
• EMPTY?(queue) returns true only if there are no more elements in the queue.
• POP(queue) removes the first element of the queue and returns it.
• INSERT(element, queue) inserts an element and returns the resulting queue.
Queues are characterized by the order in which they store the inserted nodes. Three common
variants are the first-in, first-out or FIFO queue, which pops the oldest element of the queue;
the last-in, first-out or LIFO queue (also known as a stack), which pops the newest element
of the queue; and the priority queue, which pops the element of the queue with the highest
priority according to some ordering function.
The explored set can be implemented with a hash table to allow efficient checking for
repeated states. With a good implementation, insertion and lookup can be done in roughly
constant time no matter how many states are stored. One must take care to implement the
hash table with the right notion of equality between states. For example, in the traveling
salesperson problem (page 74), the hash table needs to know that the set of visited cities
{Bucharest,Urziceni,Vaslui} is the same as {Urziceni,Vaslui,Bucharest}. Sometimes this can
be achieved most easily by insisting that the data structures for states be in some canonical
form; that is, logically equivalent states should map to the same data structure. In the case
of states described by sets, for example, a bit-vector representation or a sorted list without
repetition would be canonical, whereas an unsorted list would not.
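In Python, for instance, the three queue variants and the canonical-form idea might look like this sketch:

    from collections import deque
    import heapq

    fifo = deque(); fifo.append('A'); fifo.popleft()           # FIFO: oldest element first
    stack = []; stack.append('A'); stack.pop()                 # LIFO (stack): newest element first

    pq = []                                                    # priority queue as a heap
    heapq.heappush(pq, (140, 'Sibiu')); heapq.heappush(pq, (75, 'Zerind'))
    heapq.heappop(pq)                                          # (75, 'Zerind'): lowest value first

    # Canonical form for set-valued state components: a frozenset is hashable and
    # order-independent, so both expressions below denote the same explored entry.
    explored = {('In(Vaslui)', frozenset({'Bucharest', 'Urziceni', 'Vaslui'}))}
    print(('In(Vaslui)', frozenset({'Urziceni', 'Vaslui', 'Bucharest'})) in explored)   # True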
3.3.2 Measuring problem-solving performance
Before we get into the design of specific search algorithms, we need to consider the criteria
that might be used to choose among them. We can evaluate an algorithm’s performance in
four ways:
• Completeness: Is the algorithm guaranteed to find a solution when there is one?
• Optimality: Does the strategy find the optimal solution, as defined on page 68?
• Time complexity: How long does it take to find a solution?
• Space complexity: How much memory is needed to perform the search?
Time and space complexity are always considered with respect to some measure of the prob-
lem difficulty. In theoretical computer science, the typical measure is the size of the state
space graph, |V | + |E|, where V is the set of vertices (nodes) of the graph and E is the set
of edges (links). This is appropriate when the graph is an explicit data structure that is input
to the search program. (The map of Romania is an example of this.) In AI, the graph is often
represented implicitly by the initial state, actions, and transition model and is frequently infi-
nite. For these reasons, complexity is expressed in terms of three quantities: b, the branching
factor or maximum number of successors of any node; d, the depth of the shallowest goal
node (i.e., the number of steps along the path from the root); and m, the maximum length of
any path in the state space. Time is often measured in terms of the number of nodes generated
during the search, and space in terms of the maximum number of nodes stored in memory.
For the most part, we describe time and space complexity for search on a tree; for a graph,
the answer depends on how “redundant” the paths in the state space are.
To assess the effectiveness of a search algorithm, we can consider just the search cost—
which typically depends on the time complexity but can also include a term for memory
usage—or we can use the total cost, which combines the search cost and the path cost of the
solution found. For the problem of finding a route from Arad to Bucharest, the search cost
is the amount of time taken by the search and the solution cost is the total length of the path
in kilometers. Thus, to compute the total cost, we have to add milliseconds and kilometers.
There is no “official exchange rate” between the two, but it might be reasonable in this case to
convert kilometers into milliseconds by using an estimate of the car’s average speed (because
time is what the agent cares about). This enables the agent to find an optimal tradeoff point
at which further computation to find a shorter path becomes counterproductive. The more
general problem of tradeoffs between different goods is taken up in Chapter 16.
3.4 UNINFORMED SEARCH STRATEGIES
This section covers several search strategies that come under the heading of uninformed
search (also called blind search). The term means that the strategies have no additional
information about states beyond that provided in the problem definition. All they can do is
generate successors and distinguish a goal state from a non-goal state. All search strategies
are distinguished by the order in which nodes are expanded. Strategies that know whether
one non-goal state is “more promising” than another are called informed search or heuristic
search strategies; they are covered in Section 3.5.
3.4.1 Breadth-first search
Breadth-first search is a simple strategy in which the root node is expanded first, then all the
successors of the root node are expanded next, then their successors, and so on. In general,
all the nodes are expanded at a given depth in the search tree before any nodes at the next
level are expanded.
Breadth-first search is an instance of the general graph-search algorithm (Figure 3.7) in
which the shallowest unexpanded node is chosen for expansion. This is achieved very simply
by using a FIFO queue for the frontier. Thus, new nodes (which are always deeper than their
parents) go to the back of the queue, and old nodes, which are shallower than the new nodes,
get expanded first. There is one slight tweak on the general graph-search algorithm, which is
that the goal test is applied to each node when it is generated rather than when it is selected for
expansion. This decision is explained below, where we discuss time complexity. Note also
that the algorithm, following the general template for graph search, discards any new path to
a state already in the frontier or explored set; it is easy to see that any such path must be at
least as deep as the one already found. Thus, breadth-first search always has the shallowest
path to every node on the frontier.
Pseudocode is given in Figure 3.11. Figure 3.12 shows the progress of the search on a
simple binary tree.
How does breadth-first search rate according to the four criteria from the previous sec-
tion? We can easily see that it is complete—if the shallowest goal node is at some finite depth
d, breadth-first search will eventually find it after generating all shallower nodes (provided
the branching factor b is finite). Note that as soon as a goal node is generated, we know it
is the shallowest goal node because all shallower nodes must have been generated already
and failed the goal test. Now, the shallowest goal node is not necessarily the optimal one;
function BREADTH-FIRST-SEARCH(problem) returns a solution, or failure
  node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  frontier ← a FIFO queue with node as the only element
  explored ← an empty set
  loop do
    if EMPTY?(frontier) then return failure
    node ← POP(frontier)   /* chooses the shallowest node in frontier */
    add node.STATE to explored
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      if child.STATE is not in explored or frontier then
        if problem.GOAL-TEST(child.STATE) then return SOLUTION(child)
        frontier ← INSERT(child, frontier)
Figure 3.11 Breadth-first search on a graph.
technically, breadth-first search is optimal if the path cost is a nondecreasing function of the
depth of the node. The most common such scenario is that all actions have the same cost.
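A Python sketch corresponding to Figure 3.11, with the goal test applied when a node is generated, is given below; states are assumed to be hashable and problem follows the earlier Problem sketch:

    from collections import deque

    def breadth_first_search(problem):
        if problem.goal_test(problem.initial):
            return []                                  # the empty action sequence already solves it
        frontier = deque([(problem.initial, [])])      # FIFO queue of (state, actions so far)
        frontier_states = {problem.initial}
        explored = set()
        while frontier:
            state, actions = frontier.popleft()        # shallowest node in the frontier
            frontier_states.discard(state)
            explored.add(state)
            for action in problem.actions(state):
                child = problem.result(state, action)
                if child not in explored and child not in frontier_states:
                    if problem.goal_test(child):       # goal test at generation time
                        return actions + [action]
                    frontier.append((child, actions + [action]))
                    frontier_states.add(child)
        return None                                    # failure

Run on the vacuum sketch of Section 3.2.1 from the state ('A', frozenset({'A', 'B'})), this should return the three-step solution ['Suck', 'Right', 'Suck'].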
So far, the news about breadth-first search has been good. The news about time and
space is not so good. Imagine searching a uniform tree where every state has b successors.
The root of the search tree generates b nodes at the first level, each of which generates b more
nodes, for a total of b2 at the second level. Each of these generates b more nodes, yielding b3
nodes at the third level, and so on. Now suppose that the solution is at depth d. In the worst
case, it is the last node generated at that level. Then the total number of nodes generated is
b + b² + b³ + · · · + bᵈ = O(bᵈ) .
(If the algorithm were to apply the goal test to nodes when selected for expansion, rather than
when generated, the whole layer of nodes at depth d would be expanded before the goal was
detected and the time complexity would be O(bᵈ⁺¹).)
As for space complexity: for any kind of graph search, which stores every expanded
node in the explored set, the space complexity is always within a factor of b of the time
complexity. For breadth-first graph search in particular, every node generated remains in
memory. There will be O(bᵈ⁻¹) nodes in the explored set and O(bᵈ) nodes in the frontier,
Figure 3.12 Breadth-first search on a simple binary tree. At each stage, the node to be
expanded next is indicated by a marker.
so the space complexity is O(bᵈ), i.e., it is dominated by the size of the frontier. Switching
to a tree search would not save much space, and in a state space with many redundant paths,
switching could cost a great deal of time.
An exponential complexity bound such as O(bᵈ) is scary. Figure 3.13 shows why. It
lists, for various values of the solution depth d, the time and memory required for a breadth-
first search with branching factor b = 10. The table assumes that 1 million nodes can be
generated per second and that a node requires 1000 bytes of storage. Many search problems
fit roughly within these assumptions (give or take a factor of 100) when run on a modern
personal computer.
Depth       Nodes      Time                 Memory
    2         110      0.11 milliseconds    107 kilobytes
    4      11,110      11 milliseconds      10.6 megabytes
    6        10^6      1.1 seconds          1 gigabyte
    8        10^8      2 minutes            103 gigabytes
   10       10^10      3 hours              10 terabytes
   12       10^12      13 days              1 petabyte
   14       10^14      3.5 years            99 petabytes
   16       10^16      350 years            10 exabytes
Figure 3.13 Time and memory requirements for breadth-first search. The numbers shown
assume branching factor b = 10; 1 million nodes/second; 1000 bytes/node.
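The table's entries follow directly from its three assumptions, and a few lines of Python reproduce them (times are printed in seconds and memory in bytes; converting to days, years, petabytes, and so on gives the figures above):

b, nodes_per_second, bytes_per_node = 10, 10**6, 1000
for d in range(2, 17, 2):
    nodes = sum(b**i for i in range(1, d + 1))      # b + b^2 + ... + b^d nodes generated
    print(f"d={d:2d}  nodes={nodes:,}  time={nodes / nodes_per_second:,.2f} s  "
          f"memory={nodes * bytes_per_node:,} bytes")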
Two lessons can be learned from Figure 3.13. First, the memory requirements are a
bigger problem for breadth-first search than is the execution time. One might wait 13 days
for the solution to an important problem with search depth 12, but no personal computer has
the petabyte of memory it would take. Fortunately, other strategies require less memory.
The second lesson is that time is still a major factor. If your problem has a solution at
depth 16, then (given our assumptions) it will take about 350 years for breadth-first search (or
indeed any uninformed search) to find it. In general, exponential-complexity search problems
cannot be solved by uninformed methods for any but the smallest instances.
3.4.2 Uniform-cost search
When all step costs are equal, breadth-first search is optimal because it always expands the
shallowest unexpanded node. By a simple extension, we can find an algorithm that is optimal
with any step-cost function. Instead of expanding the shallowest node, uniform-cost search
expands the node n with the lowest path cost g(n). This is done by storing the frontier as a
priority queue ordered by g. The algorithm is shown in Figure 3.14.
In addition to the ordering of the queue by path cost, there are two other significant
differences from breadth-first search. The first is that the goal test is applied to a node when
it is selected for expansion (as in the generic graph-search algorithm shown in Figure 3.7)
rather than when it is first generated. The reason is that the first goal node that is generated
function UNIFORM-COST-SEARCH(problem) returns a solution, or failure
  node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0
  frontier ← a priority queue ordered by PATH-COST, with node as the only element
  explored ← an empty set
  loop do
    if EMPTY?(frontier) then return failure
    node ← POP(frontier)   /* chooses the lowest-cost node in frontier */
    if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
    add node.STATE to explored
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      if child.STATE is not in explored or frontier then
        frontier ← INSERT(child, frontier)
      else if child.STATE is in frontier with higher PATH-COST then
        replace that frontier node with child
Figure 3.14 Uniform-cost search on a graph. The algorithm is identical to the general
graph search algorithm in Figure 3.7, except for the use of a priority queue and the addition
of an extra check in case a shorter path to a frontier state is discovered. The data structure for
frontier needs to support efficient membership testing, so it should combine the capabilities
of a priority queue and a hash table.
Sibiu -(99)- Fagaras -(211)- Bucharest
Sibiu -(80)- Rimnicu Vilcea -(97)- Pitesti -(101)- Bucharest
Figure 3.15 Part of the Romania state space, selected to illustrate uniform-cost search.
may be on a suboptimal path. The second difference is that a test is added in case a better
path is found to a node currently on the frontier.
Both of these modifications come into play in the example shown in Figure 3.15, where
the problem is to get from Sibiu to Bucharest. The successors of Sibiu are Rimnicu Vilcea and
Fagaras, with costs 80 and 99, respectively. The least-cost node, Rimnicu Vilcea, is expanded
next, adding Pitesti with cost 80 + 97 = 177. The least-cost node is now Fagaras, so it is
expanded, adding Bucharest with cost 99 + 211 = 310. Now a goal node has been generated,
but uniform-cost search keeps going, choosing Pitesti for expansion and adding a second path
to Bucharest with cost 80+97+101 = 278. Now the algorithm checks to see if this new path
is better than the old one; it is, so the old one is discarded. Bucharest, now with g-cost 278,
is selected for expansion and the solution is returned.
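A compact Python rendering of the same idea is sketched below. Instead of the in-place replacement in Figure 3.14 it uses the common "lazy deletion" trick: a better path simply pushes a second queue entry, and stale entries are skipped when popped, so the frontier needs no membership test. The dict-of-dicts graph format is an assumption of the sketch; the example data is the Figure 3.15 fragment just traced above.

import heapq

def uniform_cost_search(start, goal, graph):
    """Uniform-cost graph search on graph[state][next_state] = step cost.
    Returns (path, path_cost) or None."""
    frontier = [(0, start, [start])]                 # priority queue ordered by path cost g
    best_g = {start: 0}
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float('inf')):
            continue                                 # stale entry: a cheaper path was found later
        if state == goal:                            # goal test when selected for expansion
            return path, g
        for child, step in graph.get(state, {}).items():
            child_g = g + step
            if child_g < best_g.get(child, float('inf')):
                best_g[child] = child_g
                heapq.heappush(frontier, (child_g, child, path + [child]))
    return None

# The Figure 3.15 fragment: the goal generated first (via Fagaras, cost 310) is not returned;
# the cheaper route through Rimnicu Vilcea and Pitesti (cost 278) is.
romania = {'Sibiu': {'Fagaras': 99, 'Rimnicu Vilcea': 80},
           'Rimnicu Vilcea': {'Pitesti': 97},
           'Fagaras': {'Bucharest': 211},
           'Pitesti': {'Bucharest': 101}}
print(uniform_cost_search('Sibiu', 'Bucharest', romania))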
It is easy to see that uniform-cost search is optimal in general. First, we observe that
whenever uniform-cost search selects a node n for expansion, the optimal path to that node
has been found. (Were this not the case, there would have to be another frontier node n′ on
the optimal path from the start node to n, by the graph separation property of Figure 3.9;
by definition, n′ would have lower g-cost than n and would have been selected first.) Then,
because step costs are nonnegative, paths never get shorter as nodes are added. These two
facts together imply that uniform-cost search expands nodes in order of their optimal path
cost. Hence, the first goal node selected for expansion must be the optimal solution.
Uniform-cost search does not care about the number of steps a path has, but only about
their total cost. Therefore, it will get stuck in an infinite loop if there is a path with an infinite
sequence of zero-cost actions—for example, a sequence of NoOp actions.6 Completeness is
guaranteed provided the cost of every step exceeds some small positive constant ε.
Uniform-cost search is guided by path costs rather than depths, so its complexity is not
easily characterized in terms of b and d. Instead, let C∗ be the cost of the optimal solution,7
and assume that every action costs at least ε. Then the algorithm’s worst-case time and space
complexity is O(b^{1+⌊C∗/ε⌋}), which can be much greater than b^d. This is because uniform-
cost search can explore large trees of small steps before exploring paths involving large and
perhaps useful steps. When all step costs are equal, b^{1+⌊C∗/ε⌋} is just b^{d+1}. When all step
costs are the same, uniform-cost search is similar to breadth-first search, except that the latter
stops as soon as it generates a goal, whereas uniform-cost search examines all the nodes at
the goal’s depth to see if one has a lower cost; thus uniform-cost search does strictly more
work by expanding nodes at depth d unnecessarily.
3.4.3 Depth-first search
Depth-first search always expands the deepest node in the current frontier of the search tree.
The progress of the search is illustrated in Figure 3.16. The search proceeds immediately
to the deepest level of the search tree, where the nodes have no successors. As those nodes
are expanded, they are dropped from the frontier, so then the search “backs up” to the next
deepest node that still has unexplored successors.
The depth-first search algorithm is an instance of the graph-search algorithm in Fig-
ure 3.7; whereas breadth-first search uses a FIFO queue, depth-first search uses a LIFO queue.
A LIFO queue means that the most recently generated node is chosen for expansion. This
must be the deepest unexpanded node because it is one deeper than its parent—which, in turn,
was the deepest unexpanded node when it was selected.
As an alternative to the GRAPH-SEARCH-style implementation, it is common to im-
plement depth-first search with a recursive function that calls itself on each of its children in
turn. (A recursive depth-first algorithm incorporating a depth limit is shown in Figure 3.17.)
6 NoOp, or “no operation,” is the name of an assembly language instruction that does nothing.
7 Here, and throughout the book, the “star” in C∗ means an optimal value for C.
Figure 3.16 Depth-first search on a binary tree. The unexplored region is shown in light
gray. Explored nodes with no descendants in the frontier are removed from memory. Nodes
at depth 3 have no successors and M is the only goal node.
The properties of depth-first search depend strongly on whether the graph-search or
tree-search version is used. The graph-search version, which avoids repeated states and re-
dundant paths, is complete in finite state spaces because it will eventually expand every node.
The tree-search version, on the other hand, is not complete—for example, in Figure 3.6 the
algorithm will follow the Arad–Sibiu–Arad–Sibiu loop forever. Depth-first tree search can be
modified at no extra memory cost so that it checks new states against those on the path from
the root to the current node; this avoids infinite loops in finite state spaces but does not avoid
the proliferation of redundant paths. In infinite state spaces, both versions fail if an infinite
non-goal path is encountered. For example, in Knuth’s 4 problem, depth-first search would
keep applying the factorial operator forever.
For similar reasons, both versions are nonoptimal. For example, in Figure 3.16, depth-
first search will explore the entire left subtree even if node C is a goal node. If node J were
also a goal node, then depth-first search would return it as a solution instead of C, which
would be a better solution; hence, depth-first search is not optimal.
The time complexity of depth-first graph search is bounded by the size of the state space
(which may be infinite, of course). A depth-first tree search, on the other hand, may generate
all of the O(b^m) nodes in the search tree, where m is the maximum depth of any node; this
can be much greater than the size of the state space. Note that m itself can be much larger
than d (the depth of the shallowest solution) and is infinite if the tree is unbounded.
So far, depth-first search seems to have no clear advantage over breadth-first search,
so why do we include it? The reason is the space complexity. For a graph search, there is
no advantage, but a depth-first tree search needs to store only a single path from the root
to a leaf node, along with the remaining unexpanded sibling nodes for each node on the
path. Once a node has been expanded, it can be removed from memory as soon as all its
descendants have been fully explored. (See Figure 3.16.) For a state space with branching
factor b and maximum depth m, depth-first search requires storage of only O(bm) nodes.
Using the same assumptions as for Figure 3.13 and assuming that nodes at the same depth as
the goal node have no successors, we find that depth-first search would require 156 kilobytes
instead of 10 exabytes at depth d = 16, a factor of 7 trillion times less space. This has
led to the adoption of depth-first tree search as the basic workhorse of many areas of AI,
including constraint satisfaction (Chapter 6), propositional satisfiability (Chapter 7), and logic
programming (Chapter 9). For the remainder of this section, we focus primarily on the tree-
search version of depth-first search.
A variant of depth-first search called backtracking search uses still less memory. (See
Chapter 6 for more details.) In backtracking, only one successor is generated at a time rather
than all successors; each partially expanded node remembers which successor to generate
next. In this way, only O(m) memory is needed rather than O(bm). Backtracking search
facilitates yet another memory-saving (and time-saving) trick: the idea of generating a suc-
cessor by modifying the current state description directly rather than copying it first. This
reduces the memory requirements to just one state description and O(m) actions. For this to
work, we must be able to undo each modification when we go back to generate the next suc-
cessor. For problems with large state descriptions, such as robotic assembly, these techniques
are critical to success.
3.4.4 Depth-limited search
The embarrassing failure of depth-first search in infinite state spaces can be alleviated by
supplying depth-first search with a predetermined depth limit ℓ. That is, nodes at depth ℓ are
treated as if they have no successors. This approach is called depth-limited search. The
depth limit solves the infinite-path problem. Unfortunately, it also introduces an additional
source of incompleteness if we choose ℓ < d, that is, the shallowest goal is beyond the depth
limit. (This is likely when d is unknown.) Depth-limited search will also be nonoptimal if
we choose ℓ > d. Its time complexity is O(b^ℓ) and its space complexity is O(bℓ). Depth-first
search can be viewed as a special case of depth-limited search with ℓ = ∞.
Sometimes, depth limits can be based on knowledge of the problem. For example, on
the map of Romania there are 20 cities. Therefore, we know that if there is a solution, it must
be of length 19 at the longest, so ℓ = 19 is a possible choice. But in fact if we studied the
function DEPTH-LIMITED-SEARCH(problem, limit) returns a solution, or failure/cutoff
  return RECURSIVE-DLS(MAKE-NODE(problem.INITIAL-STATE), problem, limit)

function RECURSIVE-DLS(node, problem, limit) returns a solution, or failure/cutoff
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  else if limit = 0 then return cutoff
  else
    cutoff_occurred? ← false
    for each action in problem.ACTIONS(node.STATE) do
      child ← CHILD-NODE(problem, node, action)
      result ← RECURSIVE-DLS(child, problem, limit − 1)
      if result = cutoff then cutoff_occurred? ← true
      else if result ≠ failure then return result
    if cutoff_occurred? then return cutoff else return failure
Figure 3.17 A recursive implementation of depth-limited tree search.
map carefully, we would discover that any city can be reached from any other city in at most
9 steps. This number, known as the diameter of the state space, gives us a better depth limit,
which leads to a more efficient depth-limited search. For most problems, however, we will
not know a good depth limit until we have solved the problem.
Depth-limited search can be implemented as a simple modification to the general tree-
or graph-search algorithm. Alternatively, it can be implemented as a simple recursive al-
gorithm as shown in Figure 3.17. Notice that depth-limited search can terminate with two
kinds of failure: the standard failure value indicates no solution; the cutoff value indicates
no solution within the depth limit.
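A direct Python transcription of Figure 3.17 might look like this; the goal_test/successors interface is an assumption carried over from the earlier sketches, and the string 'cutoff' stands in for the cutoff value.

def depth_limited_search(state, goal_test, successors, limit):
    """Recursive depth-limited tree search; returns a path, 'cutoff', or None (failure)."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return 'cutoff'
    cutoff_occurred = False
    for child in successors(state):
        result = depth_limited_search(child, goal_test, successors, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [state] + result
    return 'cutoff' if cutoff_occurred else None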
3.4.5 Iterative deepening depth-first search
Iterative deepening search (or iterative deepening depth-first search) is a general strategy,
often used in combination with depth-first tree search, that finds the best depth limit. It does
this by gradually increasing the limit—first 0, then 1, then 2, and so on—until a goal is found.
This will occur when the depth limit reaches d, the depth of the shallowest goal node. The
algorithm is shown in Figure 3.18. Iterative deepening combines the benefits of depth-first
and breadth-first search. Like depth-first search, its memory requirements are modest: O(bd)
to be precise. Like breadth-first search, it is complete when the branching factor is finite and
optimal when the path cost is a nondecreasing function of the depth of the node. Figure 3.19
shows four iterations of ITERATIVE-DEEPENING-SEARCH on a binary search tree, where the
solution is found on the fourth iteration.
Iterative deepening search may seem wasteful because states are generated multiple
times. It turns out this is not too costly. The reason is that in a search tree with the same (or
nearly the same) branching factor at each level, most of the nodes are in the bottom level,
so it does not matter much that the upper levels are generated multiple times. In an iterative
deepening search, the nodes on the bottom level (depth d) are generated once, those on the
function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure
  for depth = 0 to ∞ do
    result ← DEPTH-LIMITED-SEARCH(problem, depth)
    if result ≠ cutoff then return result
Figure 3.18 The iterative deepening search algorithm, which repeatedly applies depth-
limited search with increasing limits. It terminates when a solution is found or if the depth-
limited search returns failure, meaning that no solution exists.
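Given the depth_limited_search sketch above (assumed to be in scope), iterative deepening is only a few more lines:

from itertools import count

def iterative_deepening_search(start, goal_test, successors):
    """Applies depth-limited search with limits 0, 1, 2, ... until a solution is found
    or depth-limited search reports plain failure (no solution at any depth)."""
    for depth in count():
        result = depth_limited_search(start, goal_test, successors, depth)
        if result != 'cutoff':
            return result

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
print(iterative_deepening_search('A', lambda s: s == 'E', lambda s: tree.get(s, [])))  # ['A', 'B', 'E']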
Figure 3.19 Four iterations of iterative deepening search on a binary tree.
next-to-bottom level are generated twice, and so on, up to the children of the root, which are
generated d times. So the total number of nodes generated in the worst case is
N(IDS) = (d)b + (d − 1)b^2 + · · · + (1)b^d ,

which gives a time complexity of O(b^d)—asymptotically the same as breadth-first search.
There is some extra cost for generating the upper levels multiple times, but it is not large. For
example, if b = 10 and d = 5, the numbers are

N(IDS) = 50 + 400 + 3,000 + 20,000 + 100,000 = 123,450
N(BFS) = 10 + 100 + 1,000 + 10,000 + 100,000 = 111,110 .
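Both totals are easy to check mechanically:

b, d = 10, 5
n_ids = sum((d - i + 1) * b**i for i in range(1, d + 1))   # (d)b + (d-1)b^2 + ... + (1)b^d
n_bfs = sum(b**i for i in range(1, d + 1))                 # b + b^2 + ... + b^d
print(n_ids, n_bfs)                                        # 123450 111110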
If you are really concerned about repeating the repetition, you can use a hybrid approach
that runs breadth-first search until almost all the available memory is consumed, and then
runs iterative deepening from all the nodes in the frontier. In general, iterative deepening is
the preferred uninformed search method when the search space is large and the depth of the
solution is not known.
Iterative deepening search is analogous to breadth-first search in that it explores a com-
plete layer of new nodes at each iteration before going on to the next layer. It would seem
worthwhile to develop an iterative analog to uniform-cost search, inheriting the latter algo-
rithm’s optimality guarantees while avoiding its memory requirements. The idea is to use
increasing path-cost limits instead of increasing depth limits. The resulting algorithm, called
iterative lengthening search, is explored in Exercise 3.18. It turns out, unfortunately, that
iterative lengthening incurs substantial overhead compared to uniform-cost search.
3.4.6 Bidirectional search
The idea behind bidirectional search is to run two simultaneous searches—one forward from
the initial state and the other backward from the goal—hoping that the two searches meet in
the middle (Figure 3.20). The motivation is that b^{d/2} + b^{d/2} is much less than b^d, or in the
figure, the area of the two small circles is less than the area of one big circle centered on the
start and reaching to the goal.
Bidirectional search is implemented by replacing the goal test with a check to see
whether the frontiers of the two searches intersect; if they do, a solution has been found.
(It is important to realize that the first such solution found may not be optimal, even if the
two searches are both breadth-first; some additional search is required to make sure there
isn’t another short-cut across the gap.) The check can be done when each node is generated
or selected for expansion and, with a hash table, will take constant time. For example, if a
problem has solution depth d = 6, and each direction runs breadth-first search one node at a
time, then in the worst case the two searches meet when they have generated all of the nodes
at depth 3. For b = 10, this means a total of 2,220 node generations, compared with 1,111,110
for a standard breadth-first search. Thus, the time complexity of bidirectional search using
breadth-first searches in both directions is O(b^{d/2}). The space complexity is also O(b^{d/2}).
We can reduce this by roughly half if one of the two searches is done by iterative deepening,
but at least one of the frontiers must be kept in memory so that the intersection check can be
done. This space requirement is the most significant weakness of bidirectional search.
Figure 3.20 A schematic view of a bidirectional search that is about to succeed when a
branch from the start node meets a branch from the goal node.
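A minimal Python sketch of bidirectional breadth-first search on an explicit undirected graph is given below. The neighbours-dictionary interface is an assumption of the sketch, and the path through the first meeting state is returned as is, without the extra search that, as noted above, would be needed to guarantee optimality in general.

from collections import deque

def bidirectional_bfs(start, goal, neighbours):
    """Breadth-first search from both ends, expanding the two frontiers in alternation
    and stopping at the first state reached from both sides."""
    if start == goal:
        return [start]
    fwd_parent, bwd_parent = {start: None}, {goal: None}    # predecessor maps
    fwd_frontier, bwd_frontier = deque([start]), deque([goal])

    def expand_layer(frontier, parent, other_parent):
        """Expand one whole level; return a meeting state if the frontiers touch."""
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for child in neighbours.get(state, []):
                if child not in parent:
                    parent[child] = state
                    if child in other_parent:               # the two searches meet
                        return child
                    frontier.append(child)
        return None

    while fwd_frontier and bwd_frontier:
        meet = expand_layer(fwd_frontier, fwd_parent, bwd_parent)
        if meet is None:
            meet = expand_layer(bwd_frontier, bwd_parent, fwd_parent)
        if meet is not None:                                # stitch the two half-paths together
            path, s = [], meet
            while s is not None:
                path.append(s)
                s = fwd_parent[s]
            path.reverse()
            s = bwd_parent[meet]
            while s is not None:
                path.append(s)
                s = bwd_parent[s]
            return path
    return None

# bidirectional_bfs('S', 'G', {'S': ['A'], 'A': ['S', 'B'], 'B': ['A', 'G'], 'G': ['B']})
# returns ['S', 'A', 'B', 'G'].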
The reduction in time complexity makes bidirectional search attractive, but how do we
search backward? This is not as easy as it sounds. Let the predecessors of a state x be all
those states that have x as a successor. Bidirectional search requires a method for computing
predecessors. When all the actions in the state space are reversible, the predecessors of x are
just its successors. Other cases may require substantial ingenuity.
Consider the question of what we mean by “the goal” in searching “backward from the
goal.” For the 8-puzzle and for finding a route in Romania, there is just one goal state, so the
backward search is very much like the forward search. If there are several explicitly listed
goal states—for example, the two dirt-free goal states in Figure 3.3—then we can construct a
new dummy goal state whose immediate predecessors are all the actual goal states. But if the
goal is an abstract description, such as the goal that “no queen attacks another queen” in the
n-queens problem, then bidirectional search is difficult to use.
3.4.7 Comparing uninformed search strategies
Figure 3.21 compares search strategies in terms of the four evaluation criteria set forth in
Section 3.3.2. This comparison is for tree-search versions. For graph searches, the main
differences are that depth-first search is complete for finite state spaces and that the space and
time complexities are bounded by the size of the state space.
Criterion    Breadth-First   Uniform-Cost          Depth-First   Depth-Limited   Iterative Deepening   Bidirectional (if applicable)
Complete?    Yes^a           Yes^{a,b}             No            No              Yes^a                 Yes^{a,d}
Time         O(b^d)          O(b^{1+⌊C∗/ε⌋})       O(b^m)        O(b^ℓ)          O(b^d)                O(b^{d/2})
Space        O(b^d)          O(b^{1+⌊C∗/ε⌋})       O(bm)         O(bℓ)           O(bd)                 O(b^{d/2})
Optimal?     Yes^c           Yes                   No            No              Yes^c                 Yes^{c,d}

Figure 3.21 Evaluation of tree-search strategies. b is the branching factor; d is the depth
of the shallowest solution; m is the maximum depth of the search tree; ℓ is the depth limit.
Superscript caveats are as follows: ^a complete if b is finite; ^b complete if step costs ≥ ε for
positive ε; ^c optimal if step costs are all identical; ^d if both directions use breadth-first search.
3.5 INFORMED (HEURISTIC) SEARCH STRATEGIES
This section shows how an informed search strategy—one that uses problem-specific knowl-
edge beyond the definition of the problem itself—can find solutions more efficiently than can
an uninformed strategy.
The general approach we consider is called best-first search. Best-first search is an
instance of the general TREE-SEARCH or GRAPH-SEARCH algorithm in which a node is
selected for expansion based on an evaluation function, f(n). The evaluation function is
construed as a cost estimate, so the node with the lowest evaluation is expanded first. The
implementation of best-first graph search is identical to that for uniform-cost search (Fig-
ure 3.14), except for the use of f instead of g to order the priority queue.
The choice of f determines the search strategy. (For example, as Exercise 3.22 shows,
best-first tree search includes depth-first search as a special case.) Most best-first algorithms
include as a component of f a heuristic function, denoted h(n):
h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
(Notice that h(n) takes a node as input, but, unlike g(n), it depends only on the state at that
node.) For example, in Romania, one might estimate the cost of the cheapest path from Arad
to Bucharest via the straight-line distance from Arad to Bucharest.
Heuristic functions are the most common form in which additional knowledge of the
problem is imparted to the search algorithm. We study heuristics in more depth in Section 3.6.
For now, we consider them to be arbitrary, nonnegative, problem-specific functions, with one
constraint: if n is a goal node, then h(n) = 0. The remainder of this section covers two ways
to use heuristic information to guide search.
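The following Python sketch makes this concrete: it is the uniform-cost search loop with an arbitrary evaluation function f in place of g. The interface (successors yielding (step-cost, child) pairs, and f taking the state together with its path cost g) is an assumption of the sketch, and the priority queue uses lazy deletion rather than the in-place replacement of Figure 3.14.

import heapq
from itertools import count

def best_first_graph_search(start, goal_test, successors, f):
    """Best-first graph search: order the frontier by an arbitrary evaluation f(state, g).
    Returns (path, path_cost) or None."""
    ticket = count()                       # tie-breaker so heapq never compares states
    frontier = [(f(start, 0), next(ticket), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float('inf')):
            continue                       # stale entry: a cheaper path was found later
        if goal_test(state):               # goal test when selected for expansion
            return path, g
        for step_cost, child in successors(state):
            child_g = g + step_cost
            if child_g < best_g.get(child, float('inf')):
                best_g[child] = child_g
                heapq.heappush(frontier,
                               (f(child, child_g), next(ticket), child_g, child, path + [child]))
    return None

# f = lambda s, g: g            gives uniform-cost search;
# f = lambda s, g: h(s)         gives greedy best-first search (Section 3.5.1);
# f = lambda s, g: g + h(s)     gives A* search (Section 3.5.2).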
3.5.1 Greedy best-first search
Greedy best-first search8 tries to expand the node that is closest to the goal, on the grounds
that this is likely to lead to a solution quickly. Thus, it evaluates nodes by using just the
heuristic function; that is, f(n) = h(n).
Let us see how this works for route-finding problems in Romania; we use the straight-
line distance heuristic, which we will call hSLD . If the goal is Bucharest, we need to
know the straight-line distances to Bucharest, which are shown in Figure 3.22. For exam-
ple, hSLD (In(Arad)) = 366. Notice that the values of hSLD cannot be computed from the
problem description itself. Moreover, it takes a certain amount of experience to know that
hSLD is correlated with actual road distances and is, therefore, a useful heuristic.
Figure 3.23 shows the progress of a greedy best-first search using hSLD to find a path
from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu because it
is closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will
be Fagaras because it is closest. Fagaras in turn generates Bucharest, which is the goal. For
this particular problem, greedy best-first search using hSLD finds a solution without ever
8 Our first edition called this greedy search; other authors have called it best-first search. Our more general
usage of the latter term follows Pearl (1984).
Arad            366        Mehadia           241
Bucharest         0        Neamt             234
Craiova         160        Oradea            380
Drobeta         242        Pitesti           100
Eforie          161        Rimnicu Vilcea    193
Fagaras         176        Sibiu             253
Giurgiu          77        Timisoara         329
Hirsova         151        Urziceni           80
Iasi            226        Vaslui            199
Lugoj           244        Zerind            374
Figure 3.22 Values of hSLD —straight-line distances to Bucharest.
expanding a node that is not on the solution path; hence, its search cost is minimal. It is
not optimal, however: the path via Sibiu and Fagaras to Bucharest is 32 kilometers longer
than the path through Rimnicu Vilcea and Pitesti. This shows why the algorithm is called
“greedy”—at each step it tries to get as close to the goal as it can.
Greedy best-first tree search is also incomplete even in a finite state space, much like
depth-first search. Consider the problem of getting from Iasi to Fagaras. The heuristic sug-
gests that Neamt be expanded first because it is closest to Fagaras, but it is a dead end. The
solution is to go first to Vaslui—a step that is actually farther from the goal according to
the heuristic—and then to continue to Urziceni, Bucharest, and Fagaras. The algorithm will
never find this solution, however, because expanding Neamt puts Iasi back into the frontier,
Iasi is closer to Fagaras than Vaslui is, and so Iasi will be expanded again, leading to an infi-
nite loop. (The graph search version is complete in finite spaces, but not in infinite ones.) The
worst-case time and space complexity for the tree version is O(b^m), where m is the maximum
depth of the search space. With a good heuristic function, however, the complexity can be
reduced substantially. The amount of the reduction depends on the particular problem and on
the quality of the heuristic.
3.5.2 A* search: Minimizing the total estimated solution cost
The most widely known form of best-first search is called A∗ search (pronounced “A-star
search”). It evaluates nodes by combining g(n), the cost to reach the node, and h(n), the cost
to get from the node to the goal:
f(n) = g(n) + h(n) .
Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost
of the cheapest path from n to the goal, we have
f(n) = estimated cost of the cheapest solution through n .
Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the
node with the lowest value of g(n) + h(n). It turns out that this strategy is more than just
reasonable: provided that the heuristic function h(n) satisfies certain conditions, A∗ search is
both complete and optimal. The algorithm is identical to UNIFORM-COST-SEARCH except
that A∗ uses g + h instead of g.
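With the best_first_graph_search sketch from the beginning of Section 3.5, greedy best-first search and A∗ are one-line instantiations. The road costs below are those recoverable from the g components of the node labels in Figure 3.24, and the straight-line distances are the Figure 3.22 values; the dictionary and variable names are only for this illustration.

roads = {'Arad': {'Zerind': 75, 'Timisoara': 118, 'Sibiu': 140},
         'Sibiu': {'Arad': 140, 'Oradea': 151, 'Fagaras': 99, 'Rimnicu Vilcea': 80},
         'Rimnicu Vilcea': {'Sibiu': 80, 'Pitesti': 97, 'Craiova': 146},
         'Fagaras': {'Sibiu': 99, 'Bucharest': 211},
         'Pitesti': {'Rimnicu Vilcea': 97, 'Craiova': 138, 'Bucharest': 101},
         'Craiova': {'Rimnicu Vilcea': 146, 'Pitesti': 138},
         'Zerind': {'Arad': 75}, 'Timisoara': {'Arad': 118},
         'Oradea': {'Sibiu': 151}, 'Bucharest': {}}
h_sld = {'Arad': 366, 'Bucharest': 0, 'Craiova': 160, 'Fagaras': 176, 'Oradea': 380,
         'Pitesti': 100, 'Rimnicu Vilcea': 193, 'Sibiu': 253, 'Timisoara': 329, 'Zerind': 374}

succ = lambda s: [(cost, city) for city, cost in roads[s].items()]
is_goal = lambda s: s == 'Bucharest'

greedy = best_first_graph_search('Arad', is_goal, succ, lambda s, g: h_sld[s])
astar  = best_first_graph_search('Arad', is_goal, succ, lambda s, g: g + h_sld[s])
print(greedy)   # (['Arad', 'Sibiu', 'Fagaras', 'Bucharest'], 450)
print(astar)    # (['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'], 418)

The two results match the text: greedy best-first settles for the 450-kilometer route through Fagaras, while A∗ finds the 418-kilometer route through Rimnicu Vilcea and Pitesti.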
Figure 3.23 Stages in a greedy best-first tree search for Bucharest with the straight-line
distance heuristic hSLD . Nodes are labeled with their h-values.
Conditions for optimality: Admissibility and consistency
The first condition we require for optimality is that h(n) be an admissible heuristic. An
admissible heuristic is one that never overestimates the cost to reach the goal. Because g(n)
is the actual cost to reach n along the current path, and f(n) = g(n) + h(n), we have as an
immediate consequence that f(n) never overestimates the true cost of a solution along the
current path through n.
Admissible heuristics are by nature optimistic because they think the cost of solving
the problem is less than it actually is. An obvious example of an admissible heuristic is the
straight-line distance hSLD that we used in getting to Bucharest. Straight-line distance is
admissible because the shortest path between any two points is a straight line, so the straight
line cannot be an overestimate. In Figure 3.24, we show the progress of an A∗ tree search for
Bucharest. The values of g are computed from the step costs in Figure 3.2, and the values of
hSLD are given in Figure 3.22. Notice in particular that Bucharest first appears on the frontier
at step (e), but it is not selected for expansion because its f-cost (450) is higher than that of
Pitesti (417). Another way to say this is that there might be a solution through Pitesti whose
cost is as low as 417, so the algorithm will not settle for a solution that costs 450.
A second, slightly stronger condition called consistency (or sometimes monotonicity)
is required only for applications of A∗ to graph search.9 A heuristic h(n) is consistent if, for
every node n and every successor n′ of n generated by any action a, the estimated cost of
reaching the goal from n is no greater than the step cost of getting to n′ plus the estimated
cost of reaching the goal from n′:
h(n) ≤ c(n, a, n′) + h(n′) .
This is a form of the general triangle inequality, which stipulates that each side of a triangle
cannot be longer than the sum of the other two sides. Here, the triangle is formed by n, n′,
and the goal Gn closest to n. For an admissible heuristic, the inequality makes perfect sense:
if there were a route from n to Gn via n′ that was cheaper than h(n), that would violate the
property that h(n) is a lower bound on the cost to reach Gn.
It is fairly easy to show (Exercise 3.32) that every consistent heuristic is also admissible.
Consistency is therefore a stricter requirement than admissibility, but one has to work quite
hard to concoct heuristics that are admissible but not consistent. All the admissible heuristics
we discuss in this chapter are also consistent. Consider, for example, hSLD . We know that
the general triangle inequality is satisfied when each side is measured by the straight-line
distance and that the straight-line distance between n and n′ is no greater than c(n, a, n′).
Hence, hSLD is a consistent heuristic.
Optimality of A*
As we mentioned earlier, A∗ has the following properties: the tree-search version of A∗ is
optimal if h(n) is admissible, while the graph-search version is optimal if h(n) is consistent.
We show the second of these two claims since it is more useful. The argument es-
sentially mirrors the argument for the optimality of uniform-cost search, with g replaced by
f—just as in the A∗ algorithm itself.
The first step is to establish the following: if h(n) is consistent, then the values of
f(n) along any path are nondecreasing. The proof follows directly from the definition of
consistency. Suppose n′ is a successor of n; then g(n′) = g(n) + c(n, a, n′) for some action
a, and we have
f(n′) = g(n′) + h(n′) = g(n) + c(n, a, n′) + h(n′) ≥ g(n) + h(n) = f(n) .
The next step is to prove that whenever A∗ selects a node n for expansion, the optimal path
to that node has been found. Were this not the case, there would have to be another frontier
node n′ on the optimal path from the start node to n, by the graph separation property of
9 With an admissible but inconsistent heuristic, A∗ requires some extra bookkeeping to ensure optimality.
(Panels: (a) the initial state; (b) after expanding Arad; (c) after expanding Sibiu; (d) after
expanding Rimnicu Vilcea; (e) after expanding Fagaras; (f) after expanding Pitesti.)
Figure 3.24 Stages in an A∗
search for Bucharest. Nodes are labeled with f = g + h. The
h values are the straight-line distances to Bucharest taken from Figure 3.22.
Figure 3.25 Map of Romania showing contours at f = 380, f = 400, and f = 420, with
Arad as the start state. Nodes inside a given contour have f-costs less than or equal to the
contour value.
Figure 3.9; because f is nondecreasing along any path, n′ would have lower f-cost than n
and would have been selected first.
From the two preceding observations, it follows that the sequence of nodes expanded
by A∗ using GRAPH-SEARCH is in nondecreasing order of f(n). Hence, the first goal node
selected for expansion must be an optimal solution because f is the true cost for goal nodes
(which have h = 0) and all later goal nodes will be at least as expensive.
The fact that f-costs are nondecreasing along any path also means that we can draw
contours in the state space, just like the contours in a topographic map. Figure 3.25 shows
an example. Inside the contour labeled 400, all nodes have f(n) less than or equal to 400,
and so on. Then, because A∗ expands the frontier node of lowest f-cost, we can see that an
A∗ search fans out from the start node, adding nodes in concentric bands of increasing f-cost.
With uniform-cost search (A∗ search using h(n) = 0), the bands will be “circular”
around the start state. With more accurate heuristics, the bands will stretch toward the goal
state and become more narrowly focused around the optimal path. If C∗ is the cost of the
optimal solution path, then we can say the following:
• A∗ expands all nodes with f(n) < C∗.
• A∗ might then expand some of the nodes right on the “goal contour” (where f(n) = C∗)
before selecting a goal node.
Completeness requires that there be only finitely many nodes with cost less than or equal to
C∗, a condition that is true if all step costs exceed some finite ε and if b is finite.
Notice that A∗ expands no nodes with f(n) > C∗—for example, Timisoara is not
expanded in Figure 3.24 even though it is a child of the root. We say that the subtree below
Timisoara is pruned; because hSLD is admissible, the algorithm can safely ignore this subtree
while still guaranteeing optimality. The concept of pruning—eliminating possibilities from
consideration without having to examine them—is important for many areas of AI.
One final observation is that among optimal algorithms of this type—algorithms that
extend search paths from the root and use the same heuristic information—A∗ is optimally
efficient for any given consistent heuristic. That is, no other optimal algorithm is guaran-
teed to expand fewer nodes than A∗ (except possibly through tie-breaking among nodes with
f(n) = C∗). This is because any algorithm that does not expand all nodes with f(n) < C∗
runs the risk of missing the optimal solution.
That A∗ search is complete, optimal, and optimally efficient among all such algorithms
is rather satisfying. Unfortunately, it does not mean that A∗ is the answer to all our searching
needs. The catch is that, for most problems, the number of states within the goal contour
search space is still exponential in the length of the solution. The details of the analysis are
beyond the scope of this book, but the basic results are as follows. For problems with constant
step costs, the growth in run time as a function of the optimal solution depth d is analyzed in
terms of the absolute error or the relative error of the heuristic. The absolute error is
defined as ∆ ≡ h∗ − h, where h∗ is the actual cost of getting from the root to the goal, and
the relative error is defined as ε ≡ (h∗ − h)/h∗.
The complexity results depend very strongly on the assumptions made about the state
space. The simplest model studied is a state space that has a single goal and is essentially a
tree with reversible actions. (The 8-puzzle satisfies the first and third of these assumptions.)
In this case, the time complexity of A∗ is exponential in the maximum absolute error, that is,
O(b^∆). For constant step costs, we can write this as O(b^{εd}), where d is the solution depth.
For almost all heuristics in practical use, the absolute error is at least proportional to the path
cost h∗, so ε is constant or growing and the time complexity is exponential in d. We can
also see the effect of a more accurate heuristic: O(b^{εd}) = O((b^ε)^d), so the effective branching
factor (defined more formally in the next section) is b^ε.
When the state space has many goal states—particularly near-optimal goal states—the
search process can be led astray from the optimal path and there is an extra cost proportional
to the number of goals whose cost is within a factor ε of the optimal cost. Finally, in the
general case of a graph, the situation is even worse. There can be exponentially many states
with f(n) < C∗ even if the absolute error is bounded by a constant. For example, consider
a version of the vacuum world where the agent can clean up any square for unit cost without
even having to visit it: in that case, squares can be cleaned in any order. With N initially dirty
squares, there are 2^N states where some subset has been cleaned and all of them are on an
optimal solution path—and hence satisfy f(n) < C∗—even if the heuristic has an error of 1.
The complexity of A∗ often makes it impractical to insist on finding an optimal solution.
One can use variants of A∗ that find suboptimal solutions quickly, or one can sometimes
design heuristics that are more accurate but not strictly admissible. In any case, the use of a
good heuristic still provides enormous savings compared to the use of an uninformed search.
In Section 3.6, we look at the question of designing good heuristics.
Computation time is not, however, A∗’s main drawback. Because it keeps all generated
nodes in memory (as do all GRAPH-SEARCH algorithms), A∗ usually runs out of space long
function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure
  return RBFS(problem, MAKE-NODE(problem.INITIAL-STATE), ∞)

function RBFS(problem, node, f_limit) returns a solution, or failure and a new f-cost limit
  if problem.GOAL-TEST(node.STATE) then return SOLUTION(node)
  successors ← [ ]
  for each action in problem.ACTIONS(node.STATE) do
    add CHILD-NODE(problem, node, action) into successors
  if successors is empty then return failure, ∞
  for each s in successors do   /* update f with value from previous search, if any */
    s.f ← max(s.g + s.h, node.f)
  loop do
    best ← the lowest f-value node in successors
    if best.f > f_limit then return failure, best.f
    alternative ← the second-lowest f-value among successors
    result, best.f ← RBFS(problem, best, min(f_limit, alternative))
    if result ≠ failure then return result
Figure 3.26 The algorithm for recursive best-first search.
before it runs out of time. For this reason, A∗ is not practical for many large-scale prob-
lems. There are, however, algorithms that overcome the space problem without sacrificing
optimality or completeness, at a small cost in execution time. We discuss these next.
3.5.3 Memory-bounded heuristic search
The simplest way to reduce memory requirements for A∗ is to adapt the idea of iterative
deepening to the heuristic search context, resulting in the iterative-deepening A∗ (IDA∗) al-
gorithm. The main difference between IDA∗ and standard iterative deepening is that the cutoff
used is the f-cost (g +h) rather than the depth; at each iteration, the cutoff value is the small-
est f-cost of any node that exceeded the cutoff on the previous iteration. IDA∗ is practical
for many problems with unit step costs and avoids the substantial overhead associated with
keeping a sorted queue of nodes. Unfortunately, it suffers from the same difficulties with real-
valued costs as does the iterative version of uniform-cost search described in Exercise 3.18.
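IDA∗ is short enough to sketch directly in Python. This version is a tree search with no cycle checking, in keeping with the description above; the successors/heuristic interface mirrors the earlier sketches in this chapter.

def ida_star(start, goal_test, successors, h):
    """Iterative-deepening A*: repeated depth-first searches with an increasing f-cost cutoff.
    successors(state) yields (step_cost, child) pairs; returns (path, cost) or None."""
    INF = float('inf')

    def search(path, g, cutoff):
        state = path[-1]
        f = g + h(state)
        if f > cutoff:
            return 'cutoff', f              # report the f-cost that exceeded the cutoff
        if goal_test(state):
            return 'found', (list(path), g)
        next_cutoff = INF
        for step, child in successors(state):
            path.append(child)
            status, value = search(path, g + step, cutoff)
            path.pop()
            if status == 'found':
                return status, value
            next_cutoff = min(next_cutoff, value)
        return 'cutoff', next_cutoff

    cutoff = h(start)
    while True:
        status, value = search([start], 0, cutoff)
        if status == 'found':
            return value
        if value == INF:                    # nothing exceeded the cutoff: no solution exists
            return None
        cutoff = value                      # smallest f-cost that exceeded the previous cutoff

# With the Romania fragment and h_sld from the A* example, ida_star('Arad', is_goal, succ,
# h_sld.get) returns the same 418-cost route to Bucharest.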
This section briefly examines two other memory-bounded algorithms, called RBFS and MA∗.
Recursive best-first search (RBFS) is a simple recursive algorithm that attempts to
mimic the operation of standard best-first search, but using only linear space. The algorithm
is shown in Figure 3.26. Its structure is similar to that of a recursive depth-first search, but
rather than continuing indefinitely down the current path, it uses the f limit variable to keep
track of the f-value of the best alternative path available from any ancestor of the current
node. If the current node exceeds this limit, the recursion unwinds back to the alternative
path. As the recursion unwinds, RBFS replaces the f-value of each node along the path
with a backed-up value—the best f-value of its children. In this way, RBFS remembers the
f-value of the best leaf in the forgotten subtree and can therefore decide whether it’s worth
Figure 3.27 Stages in an RBFS search for the shortest route to Bucharest. The f-limit
value for each recursive call is shown on top of each current node, and every node is labeled
with its f-cost. (a) The path via Rimnicu Vilcea is followed until the current best leaf (Pitesti)
has a value that is worse than the best alternative path (Fagaras). (b) The recursion unwinds
and the best leaf value of the forgotten subtree (417) is backed up to Rimnicu Vilcea; then
Fagaras is expanded, revealing a best leaf value of 450. (c) The recursion unwinds and the
best leaf value of the forgotten subtree (450) is backed up to Fagaras; then Rimnicu Vilcea is
expanded. This time, because the best alternative path (through Timisoara) costs at least 447,
the expansion continues to Bucharest.
reexpanding the subtree at some later time. Figure 3.27 shows how RBFS reaches Bucharest.
RBFS is somewhat more efficient than IDA∗, but still suffers from excessive node re-
generation. In the example in Figure 3.27, RBFS follows the path via Rimnicu Vilcea, then
“changes its mind” and tries Fagaras, and then changes its mind back again. These mind
changes occur because every time the current best path is extended, its f-value is likely to
increase—h is usually less optimistic for nodes closer to the goal. When this happens, the
second-best path might become the best path, so the search has to backtrack to follow it.
Each mind change corresponds to an iteration of IDA∗ and could require many reexpansions
of forgotten nodes to recreate the best path and extend it one more node.
Like A∗ tree search, RBFS is an optimal algorithm if the heuristic function h(n) is
admissible. Its space complexity is linear in the depth of the deepest optimal solution, but
its time complexity is rather difficult to characterize: it depends both on the accuracy of the
heuristic function and on how often the best path changes as nodes are expanded.
IDA∗ and RBFS suffer from using too little memory. Between iterations, IDA∗ retains
only a single number: the current f-cost limit. RBFS retains more information in memory,
but it uses only linear space: even if more memory were available, RBFS has no way to make
use of it. Because they forget most of what they have done, both algorithms may end up reex-
panding the same states many times over. Furthermore, they suffer the potentially exponential
increase in complexity associated with redundant paths in graphs (see Section 3.3).
It seems sensible, therefore, to use all available memory. Two algorithms that do this
are MA∗ (memory-bounded A∗) and SMA∗ (simplified MA∗). SMA∗ is—well—simpler, so
we will describe it. SMA∗ proceeds just like A∗, expanding the best leaf until memory is full.
At this point, it cannot add a new node to the search tree without dropping an old one. SMA∗
always drops the worst leaf node—the one with the highest f-value. Like RBFS, SMA∗
then backs up the value of the forgotten node to its parent. In this way, the ancestor of a
forgotten subtree knows the quality of the best path in that subtree. With this information,
SMA∗ regenerates the subtree only when all other paths have been shown to look worse than
the path it has forgotten. Another way of saying this is that, if all the descendants of a node n
are forgotten, then we will not know which way to go from n, but we will still have an idea
of how worthwhile it is to go anywhere from n.
The complete algorithm is too complicated to reproduce here,10 but there is one subtlety
worth mentioning. We said that SMA∗ expands the best leaf and deletes the worst leaf. What
if all the leaf nodes have the same f-value? To avoid selecting the same node for deletion
and expansion, SMA∗ expands the newest best leaf and deletes the oldest worst leaf. These
coincide when there is only one leaf, but in that case, the current search tree must be a single
path from root to leaf that fills all of memory. If the leaf is not a goal node, then even if it is on
an optimal solution path, that solution is not reachable with the available memory. Therefore,
the node can be discarded exactly as if it had no successors.
SMA∗ is complete if there is any reachable solution—that is, if d, the depth of the
shallowest goal node, is less than the memory size (expressed in nodes). It is optimal if any
optimal solution is reachable; otherwise, it returns the best reachable solution. In practical
terms, SMA∗ is a fairly robust choice for finding optimal solutions, particularly when the state
space is a graph, step costs are not uniform, and node generation is expensive compared to
the overhead of maintaining the frontier and the explored set.
10 A rough sketch appeared in the first edition of this book.
On very hard problems, however, it will often be the case that SMA∗ is forced to switch
back and forth continually among many candidate solution paths, only a small subset of which
can fit in memory. (This resembles the problem of thrashing in disk paging systems.) Then
the extra time required for repeated regeneration of the same nodes means that problems
that would be practically solvable by A∗, given unlimited memory, become intractable for
SMA∗. That is to say, memory limitations can make a problem intractable from the point
of view of computation time. Although no current theory explains the tradeoff between time
and memory, it seems that this is an inescapable problem. The only way out is to drop the
optimality requirement.
3.5.4 Learning to search better
We have presented several fixed strategies—breadth-first, greedy best-first, and so on—that
have been designed by computer scientists. Could an agent learn how to search better? The
answer is yes, and the method rests on an important concept called the metalevel state space.
Each state in a metalevel state space captures the internal (computational) state of a program
that is searching in an object-level state space such as Romania. For example, the internal
state of the A∗ algorithm consists of the current search tree. Each action in the metalevel state
space is a computation step that alters the internal state; for example, each computation step
in A∗ expands a leaf node and adds its successors to the tree. Thus, Figure 3.24, which shows
a sequence of larger and larger search trees, can be seen as depicting a path in the metalevel
state space where each state on the path is an object-level search tree.
Now, the path in Figure 3.24 has five steps, including one step, the expansion of Fagaras,
that is not especially helpful. For harder problems, there will be many such missteps, and a
metalevel learning algorithm can learn from these experiences to avoid exploring unpromis-
ing subtrees. The techniques used for this kind of learning are described in Chapter 21. The
goal of learning is to minimize the total cost of problem solving, trading off computational
expense and path cost.
3.6 HEURISTIC FUNCTIONS
In this section, we look at heuristics for the 8-puzzle, in order to shed light on the nature of
heuristics in general.
The 8-puzzle was one of the earliest heuristic search problems. As mentioned in Sec-
tion 3.2, the object of the puzzle is to slide the tiles horizontally or vertically into the empty
space until the configuration matches the goal configuration (Figure 3.28).
The average solution cost for a randomly generated 8-puzzle instance is about 22 steps.
The branching factor is about 3. (When the empty tile is in the middle, four moves are
possible; when it is in a corner, two; and when it is along an edge, three.) This means
that an exhaustive tree search to depth 22 would look at about 3^22 ≈ 3.1 × 10^10 states.
A graph search would cut this down by a factor of about 170,000 because only 9!/2 =
181,440 distinct states are reachable. (See Exercise 3.5.) This is a manageable number, but
Start State        Goal State

7  2  4               1  2
5     6            3  4  5
8  3  1            6  7  8
Figure 3.28 A typical instance of the 8-puzzle. The solution is 26 steps long.
the corresponding number for the 15-puzzle is roughly 10^13, so the next order of business is
to find a good heuristic function. If we want to find the shortest solutions by using A∗, we
need a heuristic function that never overestimates the number of steps to the goal. There is a
long history of such heuristics for the 15-puzzle; here are two commonly used candidates:
• h1 = the number of misplaced tiles. For Figure 3.28, all of the eight tiles are out of
position, so the start state would have h1 = 8. h1 is an admissible heuristic because it
is clear that any tile that is out of place must be moved at least once.
• h2 = the sum of the distances of the tiles from their goal positions. Because tiles
cannot move along diagonals, the distance we will count is the sum of the horizontal
and vertical distances. This is sometimes called the city block distance or Manhattan
distance. h2 is also admissible because all any move can do is move one tile one step
closer to the goal. Tiles 1 to 8 in the start state give a Manhattan distance of
h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18 .
As expected, neither of these overestimates the true solution cost, which is 26.
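Both heuristics take only a few lines of Python. The start and goal configurations below are assumed to be the Figure 3.28 instance (they are consistent with the tile-by-tile Manhattan sum quoted above); 0 marks the blank square.

start = (7, 2, 4,
         5, 0, 6,
         8, 3, 1)
goal  = (0, 1, 2,
         3, 4, 5,
         6, 7, 8)

def h1(state):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != g and s != 0)

def h2(state):
    """Sum of the Manhattan distances of the tiles from their goal squares."""
    total = 0
    for index, tile in enumerate(state):
        if tile != 0:
            goal_index = goal.index(tile)
            total += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return total

print(h1(start), h2(start))   # 8 18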
3.6.1 The effect of heuristic accuracy on performance
One way to characterize the quality of a heuristic is the effective branching factor b∗. If the
total number of nodes generated by A∗ for a particular problem is N and the solution depth is
d, then b∗ is the branching factor that a uniform tree of depth d would have to have in order
to contain N + 1 nodes. Thus,
N + 1 = 1 + b∗ + (b∗)^2 + · · · + (b∗)^d .
For example, if A∗ finds a solution at depth 5 using 52 nodes, then the effective branching
factor is 1.92. The effective branching factor can vary across problem instances, but usually
it is fairly constant for sufficiently hard problems. (The existence of an effective branching
factor follows from the result, mentioned earlier, that the number of nodes expanded by A∗
grows exponentially with solution depth.) Therefore, experimental measurements of b∗ on a
small set of problems can provide a good guide to the heuristic’s overall usefulness. A well-
designed heuristic would have a value of b∗ close to 1, allowing fairly large problems to be
solved at reasonable computational cost.
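There is no closed form for b∗ in general, but it is easy to solve for numerically; the sketch below uses simple bisection and reproduces the value 1.92 quoted above for N = 52 and d = 5.

def effective_branching_factor(nodes_generated, depth, tolerance=1e-6):
    """Solve N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection."""
    def generated(b):
        return sum(b**i for i in range(1, depth + 1))
    low, high = 1.0, float(nodes_generated)          # b* lies between 1 and N
    while high - low > tolerance:
        mid = (low + high) / 2
        if generated(mid) < nodes_generated:
            low = mid
        else:
            high = mid
    return (low + high) / 2

print(round(effective_branching_factor(52, 5), 2))   # 1.92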
To test the heuristic functions h1 and h2, we generated 1200 random problems with
solution lengths from 2 to 24 (100 for each even number) and solved them with iterative
deepening search and with A∗ tree search using both h1 and h2. Figure 3.29 gives the average
number of nodes generated by each strategy and the effective branching factor. The results
suggest that h2 is better than h1, and is far better than using iterative deepening search. Even
for small problems with d = 12, A∗ with h2 is 50,000 times more efficient than uninformed
iterative deepening search.
                Search Cost (nodes generated)        Effective Branching Factor
 d          IDS      A∗(h1)    A∗(h2)            IDS      A∗(h1)    A∗(h2)
 2           10           6         6           2.45        1.79      1.79
 4          112          13        12           2.87        1.48      1.45
 6          680          20        18           2.73        1.34      1.30
 8         6384          39        25           2.80        1.33      1.24
10        47127          93        39           2.79        1.38      1.22
12      3644035         227        73           2.78        1.42      1.24
14            –         539       113              –        1.44      1.23
16            –        1301       211              –        1.45      1.25
18            –        3056       363              –        1.46      1.26
20            –        7276       676              –        1.47      1.27
22            –       18094      1219              –        1.48      1.28
24            –       39135      1641              –        1.48      1.26
Figure 3.29 Comparison of the search costs and effective branching factors for the
ITERATIVE-DEEPENING-SEARCH and A∗
algorithms with h1, h2. Data are averaged over
100 instances of the 8-puzzle for each of various solution lengths d.
One might ask whether h2 is always better than h1. The answer is “Essentially, yes.” It
is easy to see from the definitions of the two heuristics that, for any node n, h2(n) ≥ h1(n).
We thus say that h2 dominates h1. Domination translates directly into efficiency: A∗ using
h2 will never expand more nodes than A∗ using h1 (except possibly for some nodes with
f(n) = C∗). The argument is simple. Recall the observation on page 97 that every node
with f(n) < C∗ will surely be expanded. This is the same as saying that every node with
h(n) < C∗ − g(n) will surely be expanded. But because h2 is at least as big as h1 for all
nodes, every node that is surely expanded by A∗ search with h2 will also surely be expanded
with h1, and h1 might cause other nodes to be expanded as well. Hence, it is generally
better to use a heuristic function with higher values, provided it is consistent and that the
computation time for the heuristic is not too long.
3.6.2 Generating admissible heuristics from relaxed problems
We have seen that both h1 (misplaced tiles) and h2 (Manhattan distance) are fairly good
heuristics for the 8-puzzle and that h2 is better. How might one have come up with h2? Is it
possible for a computer to invent such a heuristic mechanically?
h1 and h2 are estimates of the remaining path length for the 8-puzzle, but they are also
perfectly accurate path lengths for simplified versions of the puzzle. If the rules of the puzzle
were changed so that a tile could move anywhere instead of just to the adjacent empty square,
then h1 would give the exact number of steps in the shortest solution. Similarly, if a tile could
move one square in any direction, even onto an occupied square, then h2 would give the exact
number of steps in the shortest solution. A problem with fewer restrictions on the actions is
called a relaxed problem. The state-space graph of the relaxed problem is a supergraph of the original state space because the removal of restrictions creates added edges in the graph.
Because the relaxed problem adds edges to the state space, any optimal solution in the
original problem is, by definition, also a solution in the relaxed problem; but the relaxed
problem may have better solutions if the added edges provide short cuts. Hence, the cost of
an optimal solution to a relaxed problem is an admissible heuristic for the original problem.
Furthermore, because the derived heuristic is an exact cost for the relaxed problem, it must
obey the triangle inequality and is therefore consistent (see page 95).
If a problem definition is written down in a formal language, it is possible to construct
relaxed problems automatically.11 For example, if the 8-puzzle actions are described as
A tile can move from square A to square B if
A is horizontally or vertically adjacent to B and B is blank,
we can generate three relaxed problems by removing one or both of the conditions:
(a) A tile can move from square A to square B if A is adjacent to B.
(b) A tile can move from square A to square B if B is blank.
(c) A tile can move from square A to square B.
From (a), we can derive h2 (Manhattan distance). The reasoning is that h2 would be the
proper score if we moved each tile in turn to its destination. The heuristic derived from (b) is
discussed in Exercise 3.34. From (c), we can derive h1 (misplaced tiles) because it would be
the proper score if tiles could move to their intended destination in one step. Notice that it is
crucial that the relaxed problems generated by this technique can be solved essentially without
search, because the relaxed rules allow the problem to be decomposed into eight independent
subproblems. If the relaxed problem is hard to solve, then the values of the corresponding
heuristic will be expensive to obtain.12
A program called ABSOLVER can generate heuristics automatically from problem def-
initions, using the “relaxed problem” method and various other techniques (Prieditis, 1993).
ABSOLVER generated a new heuristic for the 8-puzzle that was better than any preexisting
heuristic and found the first useful heuristic for the famous Rubik’s Cube puzzle.
One problem with generating new heuristic functions is that one often fails to get a
single “clearly best” heuristic. If a collection of admissible heuristics h1 . . . hm is available
for a problem and none of them dominates any of the others, which should we choose? As it
turns out, we need not make a choice. We can have the best of all worlds, by defining
h(n) = max{h1(n), . . . , hm(n)} .
11 In Chapters 8 and 10, we describe formal languages suitable for this task; with formal descriptions that can be
manipulated, the construction of relaxed problems can be automated. For now, we use English.
12 Note that a perfect heuristic can be obtained simply by allowing h to run a full breadth-first search “on the
sly.” Thus, there is a tradeoff between accuracy and computation time for heuristic functions.
Figure 3.30 A subproblem of the 8-puzzle instance given in Figure 3.28. The task is to
get tiles 1, 2, 3, and 4 into their correct positions, without worrying about what happens to
the other tiles.
This composite heuristic uses whichever function is most accurate on the node in question.
Because the component heuristics are admissible, h is admissible; it is also easy to prove that
h is consistent. Furthermore, h dominates all of its component heuristics.
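A minimal Python sketch of this construction, reusing the hypothetical h1 and h2 functions from the earlier sketch:

def max_heuristic(*components):
    """Combine admissible heuristics by taking their pointwise maximum."""
    return lambda state: max(h(state) for h in components)

h = max_heuristic(h1, h2)    # h(n) = max{h1(n), h2(n)}; dominates both components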
3.6.3 Generating admissible heuristics from subproblems: Pattern databases
Admissible heuristics can also be derived from the solution cost of a subproblem of a given problem. For example, Figure 3.30 shows a subproblem of the 8-puzzle instance in Fig-
ure 3.28. The subproblem involves getting tiles 1, 2, 3, 4 into their correct positions. Clearly,
the cost of the optimal solution of this subproblem is a lower bound on the cost of the com-
plete problem. It turns out to be more accurate than Manhattan distance in some cases.
The idea behind pattern databases is to store these exact solution costs for every possible subproblem instance—in our example, every possible configuration of the four tiles
and the blank. (The locations of the other four tiles are irrelevant for the purposes of solv-
ing the subproblem, but moves of those tiles do count toward the cost.) Then we compute
an admissible heuristic hDB for each complete state encountered during a search simply by
looking up the corresponding subproblem configuration in the database. The database itself is
constructed by searching back13 from the goal and recording the cost of each new pattern en-
countered; the expense of this search is amortized over many subsequent problem instances.
The choice of 1-2-3-4 is fairly arbitrary; we could also construct databases for 5-6-7-8,
for 2-4-6-8, and so on. Each database yields an admissible heuristic, and these heuristics can
be combined, as explained earlier, by taking the maximum value. A combined heuristic of
this kind is much more accurate than the Manhattan distance; the number of nodes generated
when solving random 15-puzzles can be reduced by a factor of 1000.
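The following Python sketch (an illustration under stated assumptions, not the algorithm from any particular implementation) builds such a database for tiles 1-2-3-4 by breadth-first search backward from the goal in the abstracted state space, where the remaining tiles are treated as indistinguishable. Following the text, each stored configuration records the four tiles and the blank, and every move costs 1. The state representation matches the earlier 8-puzzle sketch.

from collections import deque

PATTERN = (1, 2, 3, 4)
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)            # square i holds tile GOAL[i]; 0 is the blank

def adjacent(square):
    r, c = divmod(square, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < 3 and 0 <= c + dc < 3:
            yield (r + dr) * 3 + (c + dc)

def abstract(state):
    """Project a full state onto (blank square, tile-1 square, ..., tile-4 square)."""
    return tuple(state.index(t) for t in (0,) + PATTERN)

def build_pattern_database():
    start = abstract(GOAL)
    cost = {start: 0}
    frontier = deque([start])
    while frontier:                            # breadth-first search backward from the goal
        node = frontier.popleft()
        blank, tiles = node[0], node[1:]
        for sq in adjacent(blank):
            # The blank moves to sq; if sq holds a pattern tile, that tile moves to blank.
            new_tiles = tuple(blank if t == sq else t for t in tiles)
            successor = (sq,) + new_tiles
            if successor not in cost:
                cost[successor] = cost[node] + 1
                frontier.append(successor)
    return cost                                # 9*8*7*6*5 = 15120 entries

PDB = build_pattern_database()

def h_db(state):
    """Admissible heuristic: exact cost of getting tiles 1-4 (and the blank) into place."""
    return PDB[abstract(state)]

For a disjoint pattern database, the backward search would instead count only moves of the pattern tiles, so that the values for 1-2-3-4 and 5-6-7-8 could be added.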
One might wonder whether the heuristics obtained from the 1-2-3-4 database and the
5-6-7-8 could be added, since the two subproblems seem not to overlap. Would this still give
an admissible heuristic? The answer is no, because the solutions of the 1-2-3-4 subproblem
and the 5-6-7-8 subproblem for a given state will almost certainly share some moves—it is
13 By working backward from the goal, the exact solution cost of every instance encountered is immediately
available. This is an example of dynamic programming, which we discuss further in Chapter 17.
unlikely that 1-2-3-4 can be moved into place without touching 5-6-7-8, and vice versa. But
what if we don’t count those moves? That is, we record not the total cost of solving the 1-2-
3-4 subproblem, but just the number of moves involving 1-2-3-4. Then it is easy to see that
the sum of the two costs is still a lower bound on the cost of solving the entire problem. This
is the idea behind disjoint pattern databases. With such databases, it is possible to solve random 15-puzzles in a few milliseconds—the number of nodes generated is reduced by a
factor of 10,000 compared with the use of Manhattan distance. For 24-puzzles, a speedup of
roughly a factor of a million can be obtained.
Disjoint pattern databases work for sliding-tile puzzles because the problem can be
divided up in such a way that each move affects only one subproblem—because only one tile
is moved at a time. For a problem such as Rubik’s Cube, this kind of subdivision is difficult
because each move affects 8 or 9 of the 26 cubies. More general ways of defining additive,
admissible heuristics have been proposed that do apply to Rubik’s cube (Yang et al., 2008),
but they have not yielded a heuristic better than the best nonadditive heuristic for the problem.
3.6.4 Learning heuristics from experience
A heuristic function h(n) is supposed to estimate the cost of a solution beginning from the
state at node n. How could an agent construct such a function? One solution was given in
the preceding sections—namely, to devise relaxed problems for which an optimal solution
can be found easily. Another solution is to learn from experience. “Experience” here means
solving lots of 8-puzzles, for instance. Each optimal solution to an 8-puzzle problem provides
examples from which h(n) can be learned. Each example consists of a state from the solu-
tion path and the actual cost of the solution from that point. From these examples, a learning
algorithm can be used to construct a function h(n) that can (with luck) predict solution costs
for other states that arise during search. Techniques for doing just this using neural nets, de-
cision trees, and other methods are demonstrated in Chapter 18. (The reinforcement learning
methods described in Chapter 21 are also applicable.)
Inductive learning methods work best when supplied with features of a state that are relevant to predicting the state’s value, rather than with just the raw state description. For
example, the feature “number of misplaced tiles” might be helpful in predicting the actual
distance of a state from the goal. Let’s call this feature x1(n). We could take 100 randomly
generated 8-puzzle configurations and gather statistics on their actual solution costs. We
might find that when x1(n) is 5, the average solution cost is around 14, and so on. Given
these data, the value of x1 can be used to predict h(n). Of course, we can use several features.
A second feature x2(n) might be “number of pairs of adjacent tiles that are not adjacent in the
goal state.” How should x1(n) and x2(n) be combined to predict h(n)? A common approach
is to use a linear combination:
h(n) = c1x1(n) + c2x2(n) .
The constants c1 and c2 are adjusted to give the best fit to the actual data on solution costs.
One expects both c1 and c2 to be positive because misplaced tiles and incorrect adjacent pairs
make the problem harder to solve. Notice that this heuristic does satisfy the condition that
h(n) = 0 for goal states, but it is not necessarily admissible or consistent.
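A small illustrative sketch of such a fit using least squares; the feature values and costs below are invented purely for the example, not measured data.

import numpy as np

# Each row gives (x1, x2) for one training state; y is the true remaining solution cost.
# These numbers are made up for illustration only.
X = np.array([[5, 3], [7, 6], [2, 1], [8, 7], [4, 2]], dtype=float)
y = np.array([14.0, 20.0, 6.0, 24.0, 11.0])

# Least-squares fit with no intercept, so h = 0 whenever both features are 0 (goal states).
(c1, c2), *_ = np.linalg.lstsq(X, y, rcond=None)

def h_learned(x1, x2):
    return c1 * x1 + c2 * x2

print(round(h_learned(5, 3), 1))    # predicted cost for a state with x1 = 5, x2 = 3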
3.7 SUMMARY
This chapter has introduced methods that an agent can use to select actions in environments
that are deterministic, observable, static, and completely known. In such cases, the agent can
construct sequences of actions that achieve its goals; this process is called search.
• Before an agent can start searching for solutions, a goal must be identified and a well-
defined problem must be formulated.
• A problem consists of five parts: the initial state, a set of actions, a transition model
describing the results of those actions, a goal test function, and a path cost function.
The environment of the problem is represented by a state space. A path through the
state space from the initial state to a goal state is a solution.
• Search algorithms treat states and actions as atomic: they do not consider any internal
structure they might possess.
• A general TREE-SEARCH algorithm considers all possible paths to find a solution,
whereas a GRAPH-SEARCH algorithm avoids consideration of redundant paths.
• Search algorithms are judged on the basis of completeness, optimality, time complex-
ity, and space complexity. Complexity depends on b, the branching factor in the state
space, and d, the depth of the shallowest solution.
• Uninformed search methods have access only to the problem definition. The basic
algorithms are as follows:
– Breadth-first search expands the shallowest nodes first; it is complete, optimal
for unit step costs, but has exponential space complexity.
– Uniform-cost search expands the node with lowest path cost, g(n), and is optimal
for general step costs.
– Depth-first search expands the deepest unexpanded node first. It is neither com-
plete nor optimal, but has linear space complexity. Depth-limited search adds a
depth bound.
– Iterative deepening search calls depth-first search with increasing depth limits
until a goal is found. It is complete, optimal for unit step costs, has time complexity
comparable to breadth-first search, and has linear space complexity.
– Bidirectional search can enormously reduce time complexity, but it is not always
applicable and may require too much space.
• Informed search methods may have access to a heuristic function h(n) that estimates
the cost of a solution from n.
– The generic best-first search algorithm selects a node for expansion according to
an evaluation function.
– Greedy best-first search expands nodes with minimal h(n). It is not optimal but
is often efficient.
– A∗ search expands nodes with minimal f(n) = g(n) + h(n). A∗ is complete and
optimal, provided that h(n) is admissible (for TREE-SEARCH) or consistent (for
GRAPH-SEARCH). The space complexity of A∗ is still prohibitive.
– RBFS (recursive best-first search) and SMA∗ (simplified memory-bounded A∗)
are robust, optimal search algorithms that use limited amounts of memory; given
enough time, they can solve problems that A∗ cannot solve because it runs out of
memory.
• The performance of heuristic search algorithms depends on the quality of the heuristic
function. One can sometimes construct good heuristics by relaxing the problem defi-
nition, by storing precomputed solution costs for subproblems in a pattern database, or
by learning from experience with the problem class.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
The topic of state-space search originated in more or less its current form in the early years of
AI. Newell and Simon’s work on the Logic Theorist (1957) and GPS (1961) led to the estab-
lishment of search algorithms as the primary weapons in the armory of 1960s AI researchers
and to the establishment of problem solving as the canonical AI task. Work in operations
research by Richard Bellman (1957) showed the importance of additive path costs in sim-
plifying optimization algorithms. The text on Automated Problem Solving by Nils Nilsson
(1971) established the area on a solid theoretical footing.
Most of the state-space search problems analyzed in this chapter have a long history
in the literature and are less trivial than they might seem. The missionaries and cannibals
problem used in Exercise 3.9 was analyzed in detail by Amarel (1968). It had been consid-
ered earlier—in AI by Simon and Newell (1961) and in operations research by Bellman and
Dreyfus (1962).
The 8-puzzle is a smaller cousin of the 15-puzzle, whose history is recounted at length
by Slocum and Sonneveld (2006). It was widely believed to have been invented by the fa-
mous American game designer Sam Loyd, based on his claims to that effect from 1891 on-
ward (Loyd, 1959). Actually it was invented by Noyes Chapman, a postmaster in Canastota,
New York, in the mid-1870s. (Chapman was unable to patent his invention, as a generic
patent covering sliding blocks with letters, numbers, or pictures was granted to Ernest Kinsey
in 1878.) It quickly attracted the attention of the public and of mathematicians (Johnson and
Story, 1879; Tait, 1880). The editors of the American Journal of Mathematics stated, “The
‘15’ puzzle for the last few weeks has been prominently before the American public, and may
safely be said to have engaged the attention of nine out of ten persons of both sexes and all
ages and conditions of the community.” Ratner and Warmuth (1986) showed that the general
n × n version of the 15-puzzle belongs to the class of NP-complete problems.
The 8-queens problem was first published anonymously in the German chess maga-
zine Schach in 1848; it was later attributed to one Max Bezzel. It was republished in 1850
and at that time drew the attention of the eminent mathematician Carl Friedrich Gauss, who
attempted to enumerate all possible solutions; initially he found only 72, but eventually he
found the correct answer of 92, although Nauck published all 92 solutions first, in 1850.
Netto (1901) generalized the problem to n queens, and Abramson and Yung (1989) found an
O(n) algorithm.
Each of the real-world search problems listed in the chapter has been the subject of a
good deal of research effort. Methods for selecting optimal airline flights remain proprietary
for the most part, but Carl de Marcken (personal communication) has shown that airline ticket
pricing and restrictions have become so convoluted that the problem of selecting an optimal
flight is formally undecidable. The traveling-salesperson problem is a standard combinato-
rial problem in theoretical computer science (Lawler et al., 1992). Karp (1972) proved the
TSP to be NP-hard, but effective heuristic approximation methods were developed (Lin and
Kernighan, 1973). Arora (1998) devised a fully polynomial approximation scheme for Eu-
clidean TSPs. VLSI layout methods are surveyed by Shahookar and Mazumder (1991), and
many layout optimization papers appear in VLSI journals. Robotic navigation and assembly
problems are discussed in Chapter 25.
Uninformed search algorithms for problem solving are a central topic of classical com-
puter science (Horowitz and Sahni, 1978) and operations research (Dreyfus, 1969). Breadth-
first search was formulated for solving mazes by Moore (1959). The method of dynamic
programming (Bellman, 1957; Bellman and Dreyfus, 1962), which systematically records
solutions for all subproblems of increasing lengths, can be seen as a form of breadth-first
search on graphs. The two-point shortest-path algorithm of Dijkstra (1959) is the origin
of uniform-cost search. These works also introduced the idea of explored and frontier sets
(closed and open lists).
A version of iterative deepening designed to make efficient use of the chess clock was
first used by Slate and Atkin (1977) in the CHESS 4.5 game-playing program. Martelli’s
algorithm B (1977) includes an iterative deepening aspect and also dominates A∗’s worst-case
performance with admissible but inconsistent heuristics. The iterative deepening technique
came to the fore in work by Korf (1985a). Bidirectional search, which was introduced by
Pohl (1971), can also be effective in some cases.
The use of heuristic information in problem solving appears in an early paper by Simon
and Newell (1958), but the phrase “heuristic search” and the use of heuristic functions that
estimate the distance to the goal came somewhat later (Newell and Ernst, 1965; Lin, 1965).
Doran and Michie (1966) conducted extensive experimental studies of heuristic search. Al-
though they analyzed path length and “penetrance” (the ratio of path length to the total num-
ber of nodes examined so far), they appear to have ignored the information provided by the
path cost g(n). The A∗ algorithm, incorporating the current path cost into heuristic search,
was developed by Hart, Nilsson, and Raphael (1968), with some later corrections (Hart et al.,
1972). Dechter and Pearl (1985) demonstrated the optimal efficiency of A∗.
The original A∗ paper introduced the consistency condition on heuristic functions. The
monotone condition was introduced by Pohl (1977) as a simpler replacement, but Pearl (1984)
showed that the two were equivalent.
Pohl (1977) pioneered the study of the relationship between the error in heuristic func-
tions and the time complexity of A∗. Basic results were obtained for tree search with unit step
costs and a single goal node (Pohl, 1977; Gaschnig, 1979; Huyn et al., 1980; Pearl, 1984) and
with multiple goal nodes (Dinh et al., 2007). The “effective branching factor” was proposed
by Nilsson (1971) as an empirical measure of the efficiency; it is equivalent to assuming a
time cost of O((b∗)d). For tree search applied to a graph, Korf et al. (2001) argue that the time
cost is better modeled as O(bd−k), where k depends on the heuristic accuracy; this analysis
has elicited some controversy, however. For graph search, Helmert and Röger (2008) noted
that several well-known problems contained exponentially many nodes on optimal solution
paths, implying exponential time complexity for A∗ even with constant absolute error in h.
There are many variations on the A∗ algorithm. Pohl (1973) proposed the use of dynamic
weighting, which uses a weighted sum fw(n) = wgg(n) + whh(n) of the current path length
and the heuristic function as an evaluation function, rather than the simple sum f(n) = g(n)+
h(n) used in A∗. The weights wg and wh are adjusted dynamically as the search progresses.
Pohl’s algorithm can be shown to be ε-admissible—that is, guaranteed to find solutions within a factor 1 + ε of the optimal solution, where ε is a parameter supplied to the algorithm. The same property is exhibited by the A∗ε algorithm (Pearl, 1984), which can select any node from the frontier provided its f-cost is within a factor 1 + ε of the lowest-f-cost frontier node. The
selection can be done so as to minimize search cost.
Bidirectional versions of A∗ have been investigated; a combination of bidirectional A∗
and known landmarks was used to efficiently find driving routes for Microsoft’s online map
service (Goldberg et al., 2006). After caching a set of paths between landmarks, the algorithm
can find an optimal path between any pair of points in a 24 million point graph of the United
States, searching less than 0.1% of the graph. Other approaches to bidirectional search
include a breadth-first search backward from the goal up to a fixed depth, followed by a
forward IDA∗ search (Dillenburg and Nelson, 1994; Manzini, 1995).
A∗ and other state-space search algorithms are closely related to the branch-and-bound
techniques that are widely used in operations research (Lawler and Wood, 1966). The
relationships between state-space search and branch-and-bound have been investigated in
depth (Kumar and Kanal, 1983; Nau et al., 1984; Kumar et al., 1988). Martelli and Monta-
nari (1978) demonstrate a connection between dynamic programming (see Chapter 17) and
certain types of state-space search. Kumar and Kanal (1988) attempt a “grand unification” of
heuristic search, dynamic programming, and branch-and-bound techniques under the name
of CDP—the “composite decision process.”
Because computers in the late 1950s and early 1960s had at most a few thousand words
of main memory, memory-bounded heuristic search was an early research topic. The Graph
Traverser (Doran and Michie, 1966), one of the earliest search programs, commits to an
operator after searching best-first up to the memory limit. IDA∗ (Korf, 1985a, 1985b) was the
first widely used optimal, memory-bounded heuristic search algorithm, and a large number
of variants have been developed. An analysis of the efficiency of IDA∗ and of its difficulties
with real-valued heuristics appears in Patrick et al. (1992).
RBFS (Korf, 1993) is actually somewhat more complicated than the algorithm shown
in Figure 3.26, which is closer to an independently developed algorithm called iterative ex-
pansion (Russell, 1992). RBFS uses a lower bound as well as the upper bound; the two algorithms behave identically with admissible heuristics, but RBFS expands nodes in best-first
order even with an inadmissible heuristic. The idea of keeping track of the best alternative
path appeared earlier in Bratko’s (1986) elegant Prolog implementation of A∗ and in the DTA∗
algorithm (Russell and Wefald, 1991). The latter work also discusses metalevel state spaces
and metalevel learning.
The MA∗ algorithm appeared in Chakrabarti et al. (1989). SMA∗, or Simplified MA∗,
emerged from an attempt to implement MA∗ as a comparison algorithm for IE (Russell, 1992).
Kaindl and Khorsand (1994) have applied SMA∗ to produce a bidirectional search algorithm
that is substantially faster than previous algorithms. Korf and Zhang (2000) describe a divide-
and-conquer approach, and Zhou and Hansen (2002) introduce memory-bounded A∗ graph
search and a strategy for switching to breadth-first search to increase memory-efficiency
(Zhou and Hansen, 2006). Korf (1995) surveys memory-bounded search techniques.
The idea that admissible heuristics can be derived by problem relaxation appears in the
seminal paper by Held and Karp (1970), who used the minimum-spanning-tree heuristic to
solve the TSP. (See Exercise 3.33.)
The automation of the relaxation process was implemented successfully by Priedi-
tis (1993), building on earlier work with Mostow (Mostow and Prieditis, 1989). Holte and
Hernadvolgyi (2001) describe more recent steps towards automating the process. The use of
pattern databases to derive admissible heuristics is due to Gasser (1995) and Culberson and
Schaeffer (1996, 1998); disjoint pattern databases are described by Korf and Felner (2002);
a similar method using symbolic patterns is due to Edelkamp (2009). Felner et al. (2007)
show how to compress pattern databases to save space. The probabilistic interpretation of
heuristics was investigated in depth by Pearl (1984) and Hansson and Mayer (1989).
By far the most comprehensive source on heuristics and heuristic search algorithms
is Pearl’s (1984) Heuristics text. This book provides especially good coverage of the wide
variety of offshoots and variations of A∗, including rigorous proofs of their formal properties.
Kanal and Kumar (1988) present an anthology of important articles on heuristic search, and
Rayward-Smith et al. (1996) cover approaches from Operations Research. Papers about new
search algorithms—which, remarkably, continue to be discovered—appear in journals such
as Artificial Intelligence and Journal of the ACM.
The topic of parallel search algorithms was not covered in the chapter, partly because it requires a lengthy discussion of parallel computer architectures. Parallel search became a
popular topic in the 1990s in both AI and theoretical computer science (Mahanti and Daniels,
1993; Grama and Kumar, 1995; Crauser et al., 1998) and is making a comeback in the era
of new multicore and cluster architectures (Ralphs et al., 2004; Korf and Schultze, 2005).
Also of increasing importance are search algorithms for very large graphs that require disk
storage (Korf, 2008).
EXERCISES
3.1 Explain why problem formulation must follow goal formulation.
3.2 Give a complete problem formulation for each of the following problems. Choose a
formulation that is precise enough to be implemented.
a. There are six glass boxes in a row, each with a lock. Each of the first five boxes holds a
key unlocking the next box in line; the last box holds a banana. You have the key to the
first box, and you want the banana.
b. You start with the sequence ABABAECCEC, or in general any sequence made from A,
B, C, and E. You can transform this sequence using the following equalities: AC = E,
AB = BC, BB = E, and Ex = x for any x. For example, ABBC can be transformed into
AEC, and then AC, and then E. Your goal is to produce the sequence E.
c. There is an n × n grid of squares, each square initially being either unpainted floor or a
bottomless pit. You start standing on an unpainted floor square, and can either paint the
square under you or move onto an adjacent unpainted floor square. You want the whole
floor painted.
d. A container ship is in port, loaded high with containers. There are 13 rows of containers,
each 13 containers wide and 5 containers tall. You control a crane that can move to any
location above the ship, pick up the container under it, and move it onto the dock. You
want the ship unloaded.
3.3 You have a 9 × 9 grid of squares, each of which can be colored red or blue. The grid
is initially colored all blue, but you can change the color of any square any number of times.
Imagining the grid divided into nine 3 × 3 sub-squares, you want each sub-square to be all
one color but neighboring sub-squares to be different colors.
a. Formulate this problem in the straightforward way. Compute the size of the state space.
b. You need to color a square only once. Reformulate, and compute the size of the state
space. Would breadth-first graph search perform faster on this problem than on the one
in (a)? How about iterative deepening tree search?
c. Given the goal, we need consider only colorings where each sub-square is uniformly
colored. Reformulate the problem and compute the size of the state space.
d. How many solutions does this problem have?
e. Parts (b) and (c) successively abstracted the original problem (a). Can you give a trans-
lation from solutions in problem (c) into solutions in problem (b), and from solutions in
problem (b) into solutions for problem (a)?
3.4 Suppose two friends live in different cities on a map, such as the Romania map shown
in Figure 3.2. On every turn, we can simultaneously move each friend to a neighboring city
on the map. The amount of time needed to move from city i to neighbor j is equal to the road
distance d(i, j) between the cities, but on each turn the friend that arrives first must wait until
the other one arrives (and calls the first on his/her cell phone) before the next turn can begin.
We want the two friends to meet as quickly as possible.
a. Write a detailed formulation for this search problem. (You will find it helpful to define
some formal notation here.)
b. Let D(i, j) be the straight-line distance between cities i and j. Which of the following
heuristic functions are admissible? (i) D(i, j); (ii) 2 · D(i, j); (iii) D(i, j)/2.
Figure 3.31 A scene with polygonal obstacles. S and G are the start and goal states.
c. Are there completely connected maps for which no solution exists?
d. Are there maps in which all solutions require one friend to visit the same city twice?
3.5 Show that the 8-puzzle states are divided into two disjoint sets, such that any state is
reachable from any other state in the same set, while no state is reachable from any state in
the other set. (Hint: See Berlekamp et al. (1982).) Devise a procedure to decide which set a
given state is in, and explain why this is useful for generating random states.
3.6 Consider the n-queens problem using the “efficient” incremental formulation given on
page 72. Explain why the state space has at least ∛(n!) states and estimate the largest n for
which exhaustive exploration is feasible. (Hint: Derive a lower bound on the branching factor
by considering the maximum number of squares that a queen can attack in any column.)
3.7 Consider the problem of finding the shortest path between two points on a plane that has
convex polygonal obstacles as shown in Figure 3.31. This is an idealization of the problem
that a robot has to solve to navigate in a crowded environment.
a. Suppose the state space consists of all positions (x, y) in the plane. How many states
are there? How many paths are there to the goal?
b. Explain briefly why the shortest path from one polygon vertex to any other in the scene
must consist of straight-line segments joining some of the vertices of the polygons.
Define a good state space now. How large is this state space?
c. Define the necessary functions to implement the search problem, including an ACTIONS
function that takes a vertex as input and returns a set of vectors, each of which maps the
current vertex to one of the vertices that can be reached in a straight line. (Do not forget
the neighbors on the same polygon.) Use the straight-line distance for the heuristic
function.
d. Apply one or more of the algorithms in this chapter to solve a range of problems in the
domain, and comment on their performance.
3.8 On page 68, we said that we would not consider problems with negative path costs. In
this exercise, we explore this decision in more depth.
a. Suppose that actions can have arbitrarily large negative costs; explain why this possi-
bility would force any optimal algorithm to explore the entire state space.
b. Does it help if we insist that step costs must be greater than or equal to some negative
constant c? Consider both trees and graphs.
c. Suppose that a set of actions forms a loop in the state space such that executing the set in
some order results in no net change to the state. If all of these actions have negative cost,
what does this imply about the optimal behavior for an agent in such an environment?
d. One can easily imagine actions with high negative cost, even in domains such as route
finding. For example, some stretches of road might have such beautiful scenery as to
far outweigh the normal costs in terms of time and fuel. Explain, in precise terms,
within the context of state-space search, why humans do not drive around scenic loops
indefinitely, and explain how to define the state space and actions for route finding so
that artificial agents can also avoid looping.
e. Can you think of a real domain in which step costs are such as to cause looping?
3.9 The missionaries and cannibals problem is usually stated as follows. Three mission-
aries and three cannibals are on one side of a river, along with a boat that can hold one or
two people. Find a way to get everyone to the other side without ever leaving a group of mis-
sionaries in one place outnumbered by the cannibals in that place. This problem is famous in
AI because it was the subject of the first paper that approached problem formulation from an
analytical viewpoint (Amarel, 1968).
a. Formulate the problem precisely, making only those distinctions necessary to ensure a
valid solution. Draw a diagram of the complete state space.
b. Implement and solve the problem optimally using an appropriate search algorithm. Is it
a good idea to check for repeated states?
c. Why do you think people have a hard time solving this puzzle, given that the state space
is so simple?
3.10 Define in your own words the following terms: state, state space, search tree, search
node, goal, action, transition model, and branching factor.
3.11 What’s the difference between a world state, a state description, and a search node?
Why is this distinction useful?
3.12 An action such as Go(Sibiu) really consists of a long sequence of finer-grained actions:
turn on the car, release the brake, accelerate forward, etc. Having composite actions of this
kind reduces the number of steps in a solution sequence, thereby reducing the search time.
Suppose we take this to the logical extreme, by making super-composite actions out of every
possible sequence of Go actions. Then every problem instance is solved by a single super-
composite action, such as Go(Sibiu)Go(Rimnicu Vilcea)Go(Pitesti)Go(Bucharest). Explain
how search would work in this formulation. Is this a practical approach for speeding up
problem solving?
[Piece counts shown in the figure: ×12, ×16, ×2, ×2.]
Figure 3.32 The track pieces in a wooden railway set; each is labeled with the number of
copies in the set. Note that curved pieces and “fork” pieces (“switches” or “points”) can be
flipped over so they can curve in either direction. Each curve subtends 45 degrees.
3.13 Does a finite state space always lead to a finite search tree? How about a finite state
space that is a tree? Can you be more precise about what types of state spaces always lead to
finite search trees? (Adapted from Bender, 1996.)
3.14 Prove that GRAPH-SEARCH satisfies the graph separation property illustrated in Fig-
ure 3.9. (Hint: Begin by showing that the property holds at the start, then show that if it holds
before an iteration of the algorithm, it holds afterwards.) Describe a search algorithm that
violates the property.
3.15 Which of the following are true and which are false? Explain your answers.
a. Depth-first search always expands at least as many nodes as A∗ search with an admissi-
ble heuristic.
b. h(n) = 0 is an admissible heuristic for the 8-puzzle.
c. A∗ is of no use in robotics because percepts, states, and actions are continuous.
d. Breadth-first search is complete even if zero step costs are allowed.
e. Assume that a rook can move on a chessboard any number of squares in a straight line,
vertically or horizontally, but cannot jump over other pieces. Manhattan distance is an
admissible heuristic for the problem of moving the rook from square A to square B in
the smallest number of moves.
3.16 A basic wooden railway set contains the pieces shown in Figure 3.32. The task is to
connect these pieces into a railway that has no overlapping tracks and no loose ends where a
train could run off onto the floor.
a. Suppose that the pieces fit together exactly with no slack. Give a precise formulation of
the task as a search problem.
b. Identify a suitable uninformed search algorithm for this task and explain your choice.
c. Explain why removing any one of the “fork” pieces makes the problem unsolvable.
d. Give an upper bound on the total size of the state space defined by your formulation.
(Hint: think about the maximum branching factor for the construction process and the
maximum depth, ignoring the problem of overlapping pieces and loose ends. Begin by
pretending that every piece is unique.)
3.17 Implement two versions of the RESULT(s, a) function for the 8-puzzle: one that copies
and edits the data structure for the parent node s and one that modifies the parent state di-
rectly (undoing the modifications as needed). Write versions of iterative deepening depth-first
search that use these functions and compare their performance.
3.18 On page 90, we mentioned iterative lengthening search, an iterative analog of uni-
form cost search. The idea is to use increasing limits on path cost. If a node is generated
whose path cost exceeds the current limit, it is immediately discarded. For each new itera-
tion, the limit is set to the lowest path cost of any node discarded in the previous iteration.
a. Show that this algorithm is optimal for general path costs.
b. Consider a uniform tree with branching factor b, solution depth d, and unit step costs.
How many iterations will iterative lengthening require?
c. Now consider step costs drawn from the continuous range [ε, 1], where 0 < ε < 1. How
many iterations are required in the worst case?
d. Implement the algorithm and apply it to instances of the 8-puzzle and traveling sales-
person problems. Compare the algorithm’s performance to that of uniform-cost search,
and comment on your results.
3.19 Describe a state space in which iterative deepening search performs much worse than
depth-first search (for example, O(n2) vs. O(n)).
3.20 Write a program that will take as input two Web page URLs and find a path of links
from one to the other. What is an appropriate search strategy? Is bidirectional search a good
idea? Could a search engine be used to implement a predecessor function?
3.21 Consider the vacuum-world problem defined in Figure 2.2.
a. Which of the algorithms defined in this chapter would be appropriate for this problem?
Should the algorithm use tree search or graph search?
b. Apply your chosen algorithm to compute an optimal sequence of actions for a 3 × 3
world whose initial state has dirt in the three top squares and the agent in the center.
c. Construct a search agent for the vacuum world, and evaluate its performance in a set of
3 × 3 worlds with probability 0.2 of dirt in each square. Include the search cost as well
as path cost in the performance measure, using a reasonable exchange rate.
d. Compare your best search agent with a simple randomized reflex agent that sucks if
there is dirt and otherwise moves randomly.
e. Consider what would happen if the world were enlarged to n × n. How does the per-
formance of the search agent and of the reflex agent vary with n?
3.22 Prove each of the following statements, or give a counterexample:
a. Breadth-first search is a special case of uniform-cost search.
b. Depth-first search is a special case of best-first tree search.
c. Uniform-cost search is a special case of A∗ search.
3.23 Compare the performance of A∗ and RBFS on a set of randomly generated problems
in the 8-puzzle (with Manhattan distance) and TSP (with MST—see Exercise 3.33) domains.
Discuss your results. What happens to the performance of RBFS when a small random num-
ber is added to the heuristic values in the 8-puzzle domain?
3.24 Trace the operation of A∗ search applied to the problem of getting to Bucharest from
Lugoj using the straight-line distance heuristic. That is, show the sequence of nodes that the
algorithm will consider and the f, g, and h score for each node.
3.25 Sometimes there is no good evaluation function for a problem but there is a good
comparison method: a way to tell whether one node is better than another without assigning
numerical values to either. Show that this is enough to do a best-first search. Is there an
analog of A∗ for this setting?
3.26 Devise a state space in which A∗ using GRAPH-SEARCH returns a suboptimal solution
with an h(n) function that is admissible but inconsistent.
3.27 Accurate heuristics don’t necessarily reduce search time in the worst case. Given any
depth d, define a search problem with a goal node at depth d, and write a heuristic function
such that |h(n)−h∗(n)| ≤ O(log h∗(n)) but A∗ expands all nodes of depth less than d.
3.28 The heuristic path algorithm (Pohl, 1977) is a best-first search in which the evaluation function is f(n) = (2 − w)g(n) + wh(n). For what values of w is this complete?
For what values is it optimal, assuming that h is admissible? What kind of search does this
perform for w = 0, w = 1, and w = 2?
3.29 Consider the unbounded version of the regular 2D grid shown in Figure 3.9. The start
state is at the origin, (0,0), and the goal state is at (x, y).
a. What is the branching factor b in this state space?
b. How many distinct states are there at depth k (for k > 0)?
c. What is the maximum number of nodes expanded by breadth-first tree search?
d. What is the maximum number of nodes expanded by breadth-first graph search?
e. Is h = |u − x| + |v − y| an admissible heuristic for a state at (u, v)? Explain.
f. How many nodes are expanded by A∗ graph search using h?
g. Does h remain admissible if some links are removed?
h. Does h remain admissible if some links are added between nonadjacent states?
3.30 Consider the problem of moving k knights from k starting squares s1, . . . , sk to k goal
squares g1, . . . , gk, on an unbounded chessboard, subject to the rule that no two knights can
land on the same square at the same time. Each action consists of moving up to k knights
simultaneously. We would like to complete the maneuver in the smallest number of actions.
a. What is the maximum branching factor in this state space, expressed as a function of k?
b. Suppose hi is an admissible heuristic for the problem of moving knight i to goal gi by
itself. Which of the following heuristics are admissible for the k-knight problem? Of
those, which is the best?
(i) min{h1, . . . , hk}.
(ii) max{h1, . . . , hk}.
(iii) h1 + h2 + · · · + hk.
c. Repeat (b) for the case where you are allowed to move only one knight at a time.
3.31 We saw on page 93 that the straight-line distance heuristic leads greedy best-first
search astray on the problem of going from Iasi to Fagaras. However, the heuristic is per-
fect on the opposite problem: going from Fagaras to Iasi. Are there problems for which the
heuristic is misleading in both directions?
3.32 Prove that if a heuristic is consistent, it must be admissible. Construct an admissible
heuristic that is not consistent.
3.33 The traveling salesperson problem (TSP) can be solved with the minimum-spanning-
tree (MST) heuristic, which estimates the cost of completing a tour, given that a partial tour
has already been constructed. The MST cost of a set of cities is the smallest sum of the link
costs of any tree that connects all the cities.
a. Show how this heuristic can be derived from a relaxed version of the TSP.
b. Show that the MST heuristic dominates straight-line distance.
c. Write a problem generator for instances of the TSP where cities are represented by
random points in the unit square.
d. Find an efficient algorithm in the literature for constructing the MST, and use it with A∗
graph search to solve instances of the TSP.
3.34 On page 105, we defined the relaxation of the 8-puzzle in which a tile can move from
square A to square B if B is blank. The exact solution of this problem defines Gaschnig’s
heuristic (Gaschnig, 1979). Explain why Gaschnig’s heuristic is at least as accurate as h1
(misplaced tiles), and show cases where it is more accurate than both h1 and h2 (Manhattan
distance). Explain how to calculate Gaschnig’s heuristic efficiently.
3.35 We gave two simple heuristics for the 8-puzzle: Manhattan distance and misplaced
tiles. Several heuristics in the literature purport to improve on this—see, for example, Nils-
son (1971), Mostow and Prieditis (1989), and Hansson et al. (1992). Test these claims by
implementing the heuristics and comparing the performance of the resulting algorithms.
4 BEYOND CLASSICAL SEARCH

In which we relax the simplifying assumptions of the previous chapter, thereby getting closer to the real world.

Chapter 3 addressed a single category of problems: observable, deterministic, known environments where the solution is a sequence of actions. In this chapter, we look at what happens when these assumptions are relaxed. We begin with a fairly simple case: Sections 4.1 and 4.2 cover algorithms that perform purely local search in the state space, evaluating and modifying one or more current states rather than systematically exploring paths from an initial state. These algorithms are suitable for problems in which all that matters is the solution state, not the path cost to reach it. The family of local search algorithms includes methods inspired by statistical physics (simulated annealing) and evolutionary biology (genetic algorithms).
Then, in Sections 4.3-4.4, we examine what happens when we relax the assumptions of determinism and observability. The key idea is that if an agent cannot predict exactly what percept it will receive, then it will need to consider what to do under each contingency that its percepts may reveal. With partial observability, the agent will also need to keep track of the states it might be in.
Finally, Section 4.5 investigates online search, in which the agent is faced with a state space that is initially unknown and must be explored.
4.1 LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS

The search algorithms that we have seen so far are designed to explore search spaces systematically. This systematicity is achieved by keeping one or more paths in memory and by recording which alternatives have been explored at each point along the path. When a goal is found, the path to that goal also constitutes a solution to the problem. In many problems, however, the path to the goal is irrelevant. For example, in the 8-queens problem (introduced in Chapter 3), what matters is the final configuration of queens, not the order in which they are added. The same general property holds for many important applications such as integrated-circuit design, factory floor layout, job shop scheduling, automatic programming, telecommunications network optimization, vehicle routing, and portfolio management.
If the path to the goal does not matter, we might consider a different class of algorithms, ones that do not worry about paths at all. Local search algorithms operate using a single current node (rather than multiple paths) and generally move only to neighbors of that node. Typically, the paths followed by the search are not retained. Although local search algorithms are not systematic, they have two key advantages: (1) they use very little memory, usually a constant amount; and (2) they can often find reasonable solutions in large or infinite (continuous) state spaces for which systematic algorithms are unsuitable.
In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function. Many optimization problems do not fit the "standard" search model introduced in Chapter 3. For example, nature provides an objective function, reproductive fitness, that Darwinian evolution could be seen as attempting to optimize, but there is no "goal test" and no "path cost" for this problem.
To understand local search, we find it useful to consider the state-space landscape (as in Figure 4.1). A landscape has both "location" (defined by the state) and "elevation" (defined by the value of the heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find the lowest valley, a global minimum; if elevation corresponds to an objective function, then the aim is to find the highest peak, a global maximum. (You can convert from one to the other just by inserting a minus sign.) Local search algorithms explore this landscape. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum.
[Figure 4.1 labels: current state, objective function, state space, global maximum, local maximum, "flat" local maximum, shoulder.]
Figure 4.1 A one-dimensional state-space landscape in which elevation corresponds to the objective function. The aim is to find the global maximum. Hill-climbing search modifies the current state to try to improve it, as shown by the arrow. The various topographic features are defined in the text.
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current ← MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.VALUE ≤ current.VALUE then return current.STATE
    current ← neighbor

Figure 4.2 The hill-climbing search algorithm, which is the most basic local search technique. At each step the current node is replaced by the best neighbor; in this version, that means the neighbor with the highest VALUE, but if a heuristic cost estimate h is used, we would find the neighbor with the lowest h.
4.1.1 Hill-climbing search

The hill-climbing search algorithm (steepest-ascent version) is shown in Figure 4.2. It is simply a loop that continually moves in the direction of increasing value, that is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher value. The algorithm does not maintain a search tree, so the data structure for the current node need only record the state and the value of the objective function. Hill climbing does not look ahead beyond the immediate neighbors of the current state. This resembles trying to find the top of Mount Everest in a thick fog while suffering from amnesia.
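A minimal Python rendering of the steepest-ascent loop of Figure 4.2 (a sketch, assuming a hypothetical problem object that supplies initial(), successors(), and value() methods):

import random

def hill_climbing(problem):
    """Steepest-ascent hill climbing: move to the best neighbor until none is better."""
    current = problem.initial()
    while True:
        neighbors = problem.successors(current)
        best_value = max(problem.value(s) for s in neighbors)
        if best_value <= problem.value(current):
            return current                          # a local maximum (or plateau)
        best = [s for s in neighbors if problem.value(s) == best_value]
        current = random.choice(best)               # break ties randomly, as in the text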
To illustrate hill climbing, we will use the 8-queens problem introduced in Chapter 3. Local search algorithms typically use a complete-state formulation, where each state has 8 queens on the board, one per column. The successors of a state are all possible states generated by moving a single queen to another square in the same column (so each state has 8 × 7 = 56 successors). The heuristic cost function h is the number of pairs of queens that are attacking each other, either directly or indirectly. The global minimum of this function is zero, which occurs only at perfect solutions. Figure 4.3(a) shows a state with h = 17. The figure also shows the values of all its successors, with the best successors having h = 12. Hill-climbing algorithms typically choose randomly among the set of best successors if there is more than one.
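Continuing the sketch, a hypothetical complete-state formulation of the 8-queens problem that plugs into the hill_climbing function above; value() is defined as -h, so that maximizing value minimizes the number of attacking pairs.

import random

class EightQueens:
    """state[i] = row of the queen in column i; each state has 8 x 7 = 56 successors."""

    def initial(self):
        return tuple(random.randrange(8) for _ in range(8))

    def successors(self, state):
        return [state[:col] + (row,) + state[col + 1:]
                for col in range(8)
                for row in range(8) if row != state[col]]

    def value(self, state):
        return -self.h(state)

    @staticmethod
    def h(state):
        """Number of pairs of queens attacking each other (same row or same diagonal)."""
        return sum(1 for i in range(8) for j in range(i + 1, 8)
                   if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

result = hill_climbing(EightQueens())
print(EightQueens.h(result))    # 0 for a solution; often stuck at a local minimum instead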
Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next. Although greed is considered one of the seven deadly sins, it turns out that greedy algorithms often perform quite well. Hill climbing often makes rapid progress toward a solution because it is usually quite easy to improve a bad state. For example, from the state in Figure 4.3(a), it takes just five steps to reach the state in Figure 4.3(b), which has h = 1 and is very nearly a solution. Unfortunately, hill climbing often gets stuck for the following reasons:
• Local maxima: a local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upward toward the peak but will then be stuck with nowhere else to go. Figure 4.1 illustrates the problem schematically.
Figure 4.3 (a) An 8-queens state with heuristic cost estimate h = 17, showing the value of h for each possible successor obtained by moving a queen within its column. The best moves are marked. (b) A local minimum in the 8-queens state space; the state has h = 1 but every successor has a higher cost.
More concretely, the state in Figure 4.3(b) is a local maximum (i.e., a local minimum for the cost h); every move of a single queen makes the situation worse.
• Ridges: a ridge is shown in Figure 4.4. Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.
• Plateaux: a plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible (see Figure 4.1). A hill-climbing search might get lost on the plateau.
In each case, the algorithm reaches a point at which no progress is being made. Starting from a randomly generated 8-queens state, steepest-ascent hill climbing gets stuck 86% of the time, solving only 14% of problem instances. It works quickly, taking just 4 steps on average when it succeeds and 3 when it gets stuck, not bad for a state space with 8^8 ≈ 17 million states.
The algorithm in Figure 4.2 halts if it reaches a plateau where the best successor has the same value as the current state. Might it not be a good idea to keep going, to allow a sideways move in the hope that the plateau is really a shoulder, as shown in Figure 4.1? The answer is usually yes, but we must take care. If we always allow sideways moves when there are no uphill moves, an infinite loop will occur whenever the algorithm reaches a flat local maximum that is not a shoulder. One common solution is to put a limit on the number of consecutive sideways moves allowed. For example, we could allow up to, say, 100 consecutive sideways moves in the 8-queens problem. This raises the percentage of problem instances solved by hill climbing from 14% to 94%. Success comes at a cost: the algorithm averages roughly 21 steps for each successful instance and 64 for each failure.
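As an illustration, the earlier hill-climbing sketch can be modified to tolerate a bounded number of consecutive sideways moves, using the limit of 100 mentioned above as the default:

import random

def hill_climbing_sideways(problem, max_sideways=100):
    """Steepest ascent that allows equal-valued (sideways) moves, up to a limit."""
    current = problem.initial()
    consecutive_sideways = 0
    while True:
        neighbors = problem.successors(current)
        best_value = max(problem.value(s) for s in neighbors)
        current_value = problem.value(current)
        if best_value < current_value:
            return current                      # a peak: every neighbor is worse
        if best_value == current_value:
            consecutive_sideways += 1
            if consecutive_sideways > max_sideways:
                return current                  # assume a flat local maximum and give up
        else:
            consecutive_sideways = 0
        best = [s for s in neighbors if problem.value(s) == best_value]
        current = random.choice(best)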
Figure 4.4 Illustration of why ridges cause difficulties for hill climbing. The grid of states (dark circles) is superimposed on a ridge rising from left to right, creating a sequence of local maxima that are not directly connected to each other. From each local maximum, all the available actions point downhill.
Many variants of hill climbing have been invented. Stochastic hill climbing chooses at random from among the uphill moves; the probability of selection can vary with the steepness of the uphill move. This usually converges more slowly than steepest ascent, but in some state landscapes, it finds better solutions. First-choice hill climbing implements stochastic hill climbing by generating successors randomly until one is generated that is better than the current state. This is a good strategy when a state has many (e.g., thousands of) successors.
The hill-climbing algorithms described so far are incomplete: they often fail to find a goal when one exists because they can get stuck on local maxima. Random-restart hill climbing adopts the well-known adage, "If at first you don't succeed, try, try again." It conducts a series of hill-climbing searches from randomly generated initial states,1 until a goal is found. It is trivially complete with probability approaching 1, because it will eventually generate a goal state as the initial state. If each hill-climbing search has a probability p of success, then the expected number of restarts required is 1/p. For 8-queens instances with no sideways moves allowed, p ≈ 0.14, so we need roughly 7 iterations to find a goal (6 failures and 1 success). The expected number of steps is the cost of one successful iteration plus (1−p)/p times the cost of failure, or roughly 22 steps in all. When we allow sideways moves, 1/0.94 ≈ 1.06 iterations are needed on average and (1 × 21) + (0.06/0.94) × 64 ≈ 25 steps. For 8-queens, then, random-restart hill climbing is very effective indeed. Even for three million queens, the approach can find solutions in under a minute.2
1 Generating a random state from an implicitly specified state space can be a hard problem in itself.
2 Luby et al. (1993) prove that it is best, in some cases, to restart a randomized search algorithm after a particular, fixed amount of time and that this can be much more efficient than letting each search continue indefinitely. Disallowing or limiting the number of sideways moves is an example of this idea.
The success of hill climbing depends very much on the shape of the state-space landscape: if there are few local maxima and plateaux, random-restart hill climbing will find a good solution very quickly. On the other hand, many real problems have a landscape that looks more like a widely scattered family of balding porcupines on a flat floor, with miniature porcupines living on the tip of each porcupine needle, ad infinitum. NP-hard problems typically have an exponential number of local maxima to get stuck on. Despite this, a reasonably good local maximum can often be found after a small number of restarts.
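A random-restart wrapper around the previous sketch is only a few lines; it reuses the hill_climb function defined above, and the restart limit is an arbitrary safety valve added for this illustration.

def random_restart(n=8, max_restarts=1000):
    """Repeat hill climbing from fresh random states until a solution appears."""
    for restart in range(max_restarts):
        solution = hill_climb(n)
        if solution is not None:
            return solution, restart + 1      # solution and number of tries used
    return None, max_restarts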
4.1.2 Simulated annealing
A hill-climbing algorithm that never makes "downhill" moves toward states with lower value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local maximum. In contrast, a purely random walk, that is, moving to a successor chosen uniformly at random from the set of successors, is complete but extremely inefficient. Therefore, it seems reasonable to try to combine hill climbing with a random walk in some way that yields both efficiency and completeness. Simulated annealing is such an algorithm. In metallurgy, annealing is the process used to temper or harden metals and glass by heating them to a high temperature and then gradually cooling them, thus allowing the material to reach a low-energy crystalline state. To explain simulated annealing, we switch our point of view from hill climbing to gradient descent (i.e., minimizing cost) and imagine the task of getting a ping-pong ball into the deepest crevice in a bumpy surface. If we just let the ball roll, it will come to rest at a local minimum. If we shake the surface, we can bounce the ball out of the local minimum. The trick is to shake just hard enough to bounce the ball out of local minima but not hard enough to dislodge it from the global minimum. The simulated-annealing solution is to start by shaking hard (i.e., at a high temperature) and then gradually reduce the intensity of the shaking (i.e., lower the temperature).
The innermost loop of the simulated-annealing algorithm (Figure 4.5) is quite similar to hill climbing. Instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than 1. The probability decreases exponentially with the "badness" of the move, the amount ΔE by which the evaluation is worsened. The probability also decreases as the "temperature" T goes down: "bad" moves are more likely to be allowed at the start, when T is high, and they become more unlikely as T decreases. If the schedule lowers T slowly enough, the algorithm will find a global optimum with probability approaching 1.
Simulated annealing was first used extensively to solve VLSI layout problems in the early 1980s. It has been applied widely to factory scheduling and other large-scale optimization tasks. In the exercises, you are asked to compare its performance to that of random-restart hill climbing on the 8-queens puzzle.
4.1.3 Local beam search
Keeping just one node in memory might seem to be an extreme reaction to the problem of memory limitations. The local beam search algorithm keeps track of k states rather than just one.
3 Local beam search is an adaptation of beam search, which is a path-based algorithm.
function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to "temperature"
  current ← MAKE-NODE(problem.INITIAL-STATE)
  for t = 1 to ∞ do
    T ← schedule(t)
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← next.VALUE − current.VALUE
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)

Figure 4.5 The simulated annealing algorithm, a version of stochastic hill climbing where some downhill moves are allowed. Downhill moves are accepted readily early in the annealing schedule and then less often as time goes on. The schedule input determines the value of the temperature T as a function of time.
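The pseudocode above translates almost directly into executable Python. In the sketch below, the exponential cooling schedule and the toy maximization problem in the demo are illustrative choices, not part of the algorithm itself.

import math
import random

def simulated_annealing(initial, neighbor, value, schedule, max_steps=100_000):
    """Stochastic hill climbing in which some downhill moves are accepted.
    `value` is the quantity to maximize; `schedule(t)` returns the temperature T."""
    current = initial
    for t in range(1, max_steps):
        T = schedule(t)
        if T < 1e-10:
            return current                    # temperature effectively zero: stop
        nxt = neighbor(current)
        delta_e = value(nxt) - value(current)
        # Always accept improvements; accept a bad move with probability e^(ΔE/T).
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt
    return current

if __name__ == "__main__":
    # Toy example: maximize f(x) = -x^2 + 4 cos(5x), starting far from the optimum.
    f = lambda x: -(x * x) + 4 * math.cos(5 * x)
    result = simulated_annealing(
        initial=10.0,
        neighbor=lambda x: x + random.uniform(-0.5, 0.5),
        value=f,
        schedule=lambda t: 2.0 * (0.999 ** t))    # exponential cooling
    print(result, f(result))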
It begins with k randomly generated states. At each step, all the successors of all k states are generated. If any one is a goal, the algorithm halts. Otherwise, it selects the k best successors from the complete list and repeats.
At first sight, a local beam search with k states might seem to be nothing more than running k random restarts in parallel instead of in sequence. In fact, the two algorithms are quite different. In a random-restart search, each search process runs independently of the others. In a local beam search, useful information is passed among the parallel search threads. In effect, the states that generate the best successors say to the others, "Come over here, the grass is greener!" The algorithm quickly abandons unfruitful searches and moves its resources to where the most progress is being made.
In its simplest form, local beam search can suffer from a lack of diversity among the k states: they can quickly become concentrated in a small region of the state space, making the search little more than an expensive version of hill climbing. A variant called stochastic beam search, analogous to stochastic hill climbing, helps alleviate this problem. Instead of choosing the best k from the pool of candidate successors, stochastic beam search chooses k successors at random, with the probability of choosing a given successor being an increasing function of its value. Stochastic beam search bears some resemblance to the process of natural selection, whereby the "successors" (offspring) of a "state" (organism) populate the next generation according to its "value" (fitness).
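A minimal sketch of local beam search over the same n-queens formulation used earlier; the conflicts helper and the successor generator are restated here so the sketch stands alone, and the beam width k = 10 is an arbitrary choice for illustration.

import random

def conflicts(state):
    """Number of attacking pairs of queens (lower is better, 0 is a goal)."""
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def successors(state):
    """All states obtained by moving one queen within its column."""
    n = len(state)
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]

def local_beam_search(n=8, k=10, max_iters=1000):
    """Keep the k best states; stop as soon as any of them is a goal."""
    beam = [tuple(random.randrange(n) for _ in range(n)) for _ in range(k)]
    for _ in range(max_iters):
        for s in beam:
            if conflicts(s) == 0:
                return s
        pool = {succ for s in beam for succ in successors(s)}
        beam = sorted(pool, key=conflicts)[:k]    # the k best of the combined pool
    return None

if __name__ == "__main__":
    print(local_beam_search())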
4.1.4 Genetic algorithms
A genetic algorithm (or GA) is a variant of stochastic beam search in which successor states are generated by combining two parent states rather than by modifying a single state. The analogy to natural selection is the same as in stochastic beam search, except that now we are dealing with sexual rather than asexual reproduction.
Figure 4.6 The genetic algorithm, illustrated for digit strings representing 8-queens states. The initial population in (a) is ranked by the fitness function in (b), resulting in pairs for mating in (c). They produce offspring in (d), which are subject to mutation in (e).

Figure 4.7 The 8-queens states corresponding to the first two parents in Figure 4.6(c) and the first offspring in Figure 4.6(d). The shaded columns are lost in the crossover step and the unshaded columns are retained.
Like beam searches, GAs begin with a set of k randomly generated states, called the population. Each state, or individual, is represented as a string over a finite alphabet, most commonly a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 × log2 8 = 24 bits. Alternatively, the state could be represented as 8 digits, each in the range from 1 to 8. (We demonstrate later that the two encodings behave differently.) Figure 4.6(a) shows a population of four 8-digit strings representing 8-queens states.
The production of the next generation of states is shown in Figure 4.6(b)-(e). In (b), each state is rated by the objective function, or (in GA terminology) the fitness function. A fitness function should return higher values for better states, so, for the 8-queens problem, we use the number of nonattacking pairs of queens, which has a value of 28 for a solution. The values of the four states are 24, 23, 20, and 11. In this particular variant of the genetic algorithm, the probability of being chosen for reproducing is directly proportional to the fitness score, and the percentages are shown next to the raw scores.
In (c), two pairs are selected at random for reproduction, in accordance with the probabilities in (b). Notice that one individual is selected twice and one not at all. For each pair to be mated, a crossover point is chosen randomly from the positions in the string. In Figure 4.6, the crossover points are after the third digit in the first pair and after the fifth digit in the second pair.
In (d), the offspring themselves are created by crossing over the parent strings at the crossover point. For example, the first child of the first pair gets the first three digits from the first parent and the remaining digits from the second parent, whereas the second child gets the first three digits from the second parent and the rest from the first parent. The 8-queens states involved in this reproduction step are shown in Figure 4.7. The example shows that, when two parent states are quite different, the crossover operation can produce a state that is a long way from either parent state. It is often the case that the population is quite diverse early on in the process, so crossover (like simulated annealing) frequently takes large steps in the state space early in the search process and smaller steps later on, when most individuals are quite similar.
Finally, in (e), each location is subject to random mutation with a small independent probability. One digit was mutated in the first, third, and fourth offspring. In the 8-queens problem, this corresponds to choosing a queen at random and moving it to a random square in its column. Figure 4.8 describes an algorithm that implements all these steps.
Like stochastic beam search, genetic algorithms combine an uphill tendency with random exploration and exchange of information among parallel search threads. The primary advantage, if any, of genetic algorithms comes from the crossover operation. Yet it can be shown mathematically that, if the positions of the genetic code are permuted initially in a random order, crossover conveys no advantage. Intuitively, the advantage comes from the ability of crossover to combine large blocks of letters that have evolved independently to perform useful functions, thus raising the level of granularity at which the search operates. For example, it could be that putting the first three queens in positions 2, 4, and 6 (where they do not attack each other) constitutes a useful block that can be combined with other blocks to construct a solution.
The theory of genetic algorithms explains how this works using the idea of a schema, which is a substring in which some of the positions can be left unspecified. For example, the schema 246***** describes all 8-queens states in which the first three queens are in positions 2, 4, and 6, respectively. Strings that match the schema (such as 24613578) are called instances of the schema. It can be shown that, if the average fitness of the instances of a schema is above the mean, then the number of instances of the schema within the population will grow over time. Clearly, this effect is unlikely to be significant if adjacent bits are totally unrelated to each other, because then there will be few contiguous blocks that provide a consistent benefit. Genetic algorithms work best when schemata correspond to meaningful components of a solution. For example, if the string is a representation of an antenna, then the schemata may represent components of the antenna, such as reflectors and deflectors.
4 There are many variants of this selection rule. The method of culling, in which all individuals below a given threshold are discarded, can be shown to converge faster than the random version (Baum et al., 1995).
5 It is here that the encoding matters. If a 24-bit encoding is used instead of 8 digits, then the crossover point has a 2/3 chance of being in the middle of a digit, which results in an essentially arbitrary mutation of that digit.
function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  inputs: population, a set of individuals
          FITNESS-FN, a function that measures the fitness of an individual
  repeat
    new_population ← empty set
    for i = 1 to SIZE(population) do
      x ← RANDOM-SELECTION(population, FITNESS-FN)
      y ← RANDOM-SELECTION(population, FITNESS-FN)
      child ← REPRODUCE(x, y)
      if (small random probability) then child ← MUTATE(child)
      add child to new_population
    population ← new_population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to FITNESS-FN

function REPRODUCE(x, y) returns an individual
  inputs: x, y, parent individuals
  n ← LENGTH(x); c ← random number from 1 to n
  return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c + 1, n))

Figure 4.8 A genetic algorithm. The algorithm is the same as the one diagrammed in Figure 4.6, with one variation: in this more popular version, each mating of two parents produces only one offspring, not two.
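Here is a compact Python rendering of the pseudocode in Figure 4.8, specialized to the 8-queens encoding used in the example. The population size, mutation rate, and the use of fitness-proportional selection via random.choices are illustrative choices for this sketch.

import random

N = 8                                    # board size; a state is a tuple of N digits
MAX_FITNESS = N * (N - 1) // 2           # 28 nonattacking pairs for a full solution

def fitness(state):
    """Number of nonattacking pairs of queens."""
    attacks = sum(1 for i in range(N) for j in range(i + 1, N)
                  if state[i] == state[j] or abs(state[i] - state[j]) == j - i)
    return MAX_FITNESS - attacks

def reproduce(x, y):
    """Single-point crossover: a prefix of x followed by the rest of y."""
    c = random.randrange(1, N)
    return x[:c] + y[c:]

def mutate(child):
    """Move one randomly chosen queen to a random row in its column."""
    col = random.randrange(N)
    child = list(child)
    child[col] = random.randrange(N)
    return tuple(child)

def genetic_algorithm(pop_size=100, mutation_rate=0.1, max_generations=1000):
    population = [tuple(random.randrange(N) for _ in range(N)) for _ in range(pop_size)]
    for _ in range(max_generations):
        best = max(population, key=fitness)
        if fitness(best) == MAX_FITNESS:
            return best
        weights = [fitness(s) for s in population]
        new_population = []
        for _ in range(pop_size):
            # Selection probability proportional to fitness, as in the text's variant.
            x, y = random.choices(population, weights=weights, k=2)
            child = reproduce(x, y)
            if random.random() < mutation_rate:
                child = mutate(child)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)

if __name__ == "__main__":
    print(genetic_algorithm())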
A good component is likely to be good in a variety of different designs. This suggests that successful use of genetic algorithms requires careful engineering of the representation.
In practice, genetic algorithms have had a widespread impact on optimization problems, such as circuit layout and job-shop scheduling. At present, it is not clear whether the appeal of genetic algorithms arises from their performance or from their aesthetically pleasing origins in the theory of evolution. Much work remains to be done to identify the conditions under which genetic algorithms perform well.
4.2 LOCAL SEARCH IN CONTINUOUS SPACES
In Chapter 2, we explained the distinction between discrete and continuous environments, pointing out that most real-world environments are continuous. Yet none of the algorithms we have described (except for first-choice hill climbing and simulated annealing) can handle continuous state and action spaces, because they have infinite branching factors. This section provides a very brief introduction to some local search techniques for finding optimal solutions in continuous spaces. The literature on this topic is vast; many of the basic techniques originated in the 17th century, after the development of calculus by Newton and Leibniz.
EVOLUTION AND SEARCH
The theory of evolution was developed in Charles Darwin's On the Origin of Species by Means of Natural Selection (1859) and independently by Alfred Russel Wallace (1858). The central idea is simple: variations occur in reproduction and will be preserved in successive generations approximately in proportion to their effect on reproductive fitness.
Darwin's theory was developed with no knowledge of how the traits of organisms can be inherited and modified. The probabilistic laws governing these processes were first identified by Gregor Mendel (1866), a monk who experimented with sweet peas. Much later, Watson and Crick (1953) identified the structure of the DNA molecule and its alphabet, AGTC (adenine, guanine, thymine, cytosine). In the standard model, variation occurs both by point mutations in the letter sequence and by "crossover" (in which the DNA of an offspring is generated by combining long sections of DNA from each parent).
The analogy to local search algorithms has already been described; the principal difference between stochastic beam search and evolution is the use of sexual reproduction, wherein successors are generated from multiple organisms rather than just one. The actual mechanisms of evolution are, however, far richer than most genetic algorithms allow. For example, mutations can involve reversals, duplications, and movement of large chunks of DNA; some viruses borrow DNA from one organism and insert it in another; and there are transposable genes that do nothing but copy themselves many thousands of times within the genome. There are even genes that poison cells from potential mates that do not carry the gene, thereby increasing their own chances of replication. Most important is the fact that the genes themselves encode the mechanisms whereby the genome is reproduced and translated into an organism. In genetic algorithms, those mechanisms are a separate program that is not represented within the strings being manipulated.
Darwinian evolution may appear inefficient, having generated blindly some 10^45 or so organisms without improving its search heuristics one iota. Fifty years before Darwin, however, the otherwise great French naturalist Jean Lamarck (1809) proposed a theory of evolution whereby traits acquired by adaptation during an organism's lifetime would be passed on to its offspring. Such a process would be effective but does not seem to occur in nature. Much later, James Baldwin (1896) proposed a superficially similar theory: that behavior learned during an organism's lifetime could accelerate the rate of evolution. Unlike Lamarck's, Baldwin's theory is entirely consistent with Darwinian evolution because it relies on selection pressures operating on individuals that have found local optima among the set of possible behaviors allowed by their genetic makeup. Computer simulations confirm that the "Baldwin effect" is real, once "ordinary" evolution has created organisms whose internal performance measure correlates with actual fitness.
We find uses for these techniques at several places in the book, including the chapters on learning, vision, and robotics.
We begin with an example. Suppose we want to place three new airports anywhere in Romania, such that the sum of squared distances from each city on the map (Figure 3.2) to its nearest airport is minimized. The state space is then defined by the coordinates of the airports: (x1, y1), (x2, y2), and (x3, y3). This is a six-dimensional space; we also say that states are defined by six variables. (In general, states are defined by an n-dimensional vector of variables, x.) Moving around in this space corresponds to moving one or more of the airports on the map. The objective function f(x1, y1, x2, y2, x3, y3) is relatively easy to compute for any particular state once we compute the closest cities. Let Ci be the set of cities whose closest airport (in the current state) is airport i. Then, in the neighborhood of the current state, where the Ci's remain constant, we have

    f(x1, y1, x2, y2, x3, y3) = Σ_{i=1}^{3} Σ_{c ∈ Ci} [ (xi − xc)² + (yi − yc)² ] .

This expression is correct locally, but not globally, because the sets Ci are (discontinuous) functions of the state.
One way to avoid continuous problems is simply to discretize the neighborhood of each state. For example, we can move only one airport at a time in either the x or y direction by a fixed amount ±δ. With 6 variables, this gives 12 possible successors for each state. We can then apply any of the local search algorithms described previously. We could also apply stochastic hill climbing and simulated annealing directly, without discretizing the space. These algorithms choose successors randomly, which can be done by generating random vectors of length δ.
Many methods attempt to use the gradient of the landscape to find a maximum. The gradient of the objective function is a vector ∇f that gives the magnitude and direction of the steepest slope. For our problem, we have

    ∇f = ( ∂f/∂x1, ∂f/∂y1, ∂f/∂x2, ∂f/∂y2, ∂f/∂x3, ∂f/∂y3 ) .
In some cases, we can find a maximum by solving the equation ∇f = 0. (This could be done, for example, if we were placing just one airport; the solution is the arithmetic mean of all the cities' coordinates.) In many cases, however, this equation cannot be solved in closed form. For example, with three airports, the expression for the gradient depends on what cities are closest to each airport in the current state. This means we can compute the gradient locally but not globally; for example,

    ∂f/∂x1 = 2 Σ_{c ∈ C1} (x1 − xc) .

6 A basic knowledge of multivariate calculus and vector arithmetic is useful for reading this section.
Given a locally correct expression for the gradient, we can perform steepest-ascent hill climbing by updating the current state according to the formula

    x ← x + α ∇f(x) ,

where α is a small constant often called the step size. In other cases, the objective function might not be available in a differentiable form at all; for example, the value of a particular set of airport locations might be determined by running some large-scale economic simulation package. In those cases, we can calculate a so-called empirical gradient by evaluating the response to small increments and decrements in each coordinate. Empirical gradient search is the same as steepest-ascent hill climbing in a discretized version of the state space.
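As an illustration, the following sketch performs gradient descent on the airport-placement objective; because the objective is a cost to minimize, the update subtracts α∇f rather than adding it. The city coordinates are random placeholders rather than real map data, and the fixed step size α is an arbitrary choice.

import random

# Hypothetical city coordinates (placeholders, not a real map).
CITIES = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(20)]

def closest_sets(airports):
    """Partition the cities by their nearest airport (the sets C_i in the text)."""
    groups = [[] for _ in airports]
    for cx, cy in CITIES:
        i = min(range(len(airports)),
                key=lambda k: (airports[k][0] - cx) ** 2 + (airports[k][1] - cy) ** 2)
        groups[i].append((cx, cy))
    return groups

def cost(airports):
    """f: sum of squared distances from each city to its nearest airport."""
    return sum((x - cx) ** 2 + (y - cy) ** 2
               for (x, y), group in zip(airports, closest_sets(airports))
               for cx, cy in group)

def gradient_step(airports, alpha=0.01):
    """One step of x <- x - alpha * grad f, using the locally correct gradient."""
    new = []
    for (x, y), group in zip(airports, closest_sets(airports)):
        gx = 2 * sum(x - cx for cx, _ in group)    # the partial derivative from the text
        gy = 2 * sum(y - cy for _, cy in group)
        new.append((x - alpha * gx, y - alpha * gy))
    return new

if __name__ == "__main__":
    airports = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(3)]
    for _ in range(200):
        airports = gradient_step(airports)
    print(round(cost(airports), 1), airports)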
Hidden beneath the phrase "α is a small constant" lies a huge variety of methods for adjusting α. The basic problem is that, if α is too small, too many steps are needed; if α is too large, the search could overshoot the maximum. The technique of line search tries to overcome this dilemma by extending the current gradient direction (usually by repeatedly doubling α) until f starts to decrease again. The point at which this occurs becomes the new current state. There are several schools of thought about how the new direction should be chosen at this point.
For many problems, the most effective algorithm is the venerable Newton-Raphson method. This is a general technique for finding roots of functions, that is, solving equations of the form g(x) = 0. It works by computing a new estimate for the root x according to Newton's formula

    x ← x − g(x)/g′(x) .

To find a maximum or minimum of f, we need to find x such that the gradient is zero (i.e., ∇f(x) = 0). Thus, g(x) in Newton's formula becomes ∇f(x), and the update equation can be written in matrix-vector form as

    x ← x − H_f^{-1}(x) ∇f(x) ,

where H_f(x) is the Hessian matrix of second derivatives, whose elements H_ij are given by ∂²f/∂xi∂xj. For our airport example, we can see from the expression for ∂f/∂x1 above that H_f(x) is particularly simple: the off-diagonal elements are zero and the diagonal elements for airport i are just twice the number of cities in Ci. A moment's calculation shows that one step of the update moves airport i directly to the centroid of Ci, which is the minimum of the local expression for f given earlier. For high-dimensional problems, however, computing the n² entries of the Hessian and inverting it may be expensive, so many approximate versions of the Newton-Raphson method have been developed.
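For this particular objective the Newton-Raphson update has a closed form: since the Hessian is diagonal with entries 2|C_i|, one step moves each airport to the centroid of the cities currently assigned to it. A short sketch, reusing the closest_sets helper defined in the previous block (a hypothetical helper of this illustration, not something from the text):

def newton_step(airports):
    """One Newton-Raphson step x <- x - H^-1 grad f, which for this objective
    moves each airport to the centroid of the cities currently assigned to it."""
    new = []
    for (x, y), group in zip(airports, closest_sets(airports)):
        if not group:                         # no assigned cities: leave this airport alone
            new.append((x, y))
            continue
        cx = sum(c[0] for c in group) / len(group)
        cy = sum(c[1] for c in group) / len(group)
        new.append((cx, cy))
    return new

Iterating this step until the city assignments stop changing is exactly Lloyd's k-means clustering algorithm applied to airport placement, which is one reason the update converges so quickly here.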
Local search methods suffer from local maxima, ridges, and plateaux in continuous state spaces just as much as in discrete spaces. Random restarts and simulated annealing can be used and are often helpful. High-dimensional continuous spaces are, however, big places in which it is easy to get lost.
A final topic with which a passing acquaintance is useful is constrained optimization. An optimization problem is constrained if solutions must satisfy some hard constraints on the values of the variables. For example, in our airport-siting problem, we might constrain sites to be inside Romania and on dry land (rather than in the middle of lakes).
7 In general, the Newton-Raphson update can be seen as fitting a quadratic surface to f at x and then moving directly to the minimum of that surface, which is also the minimum of f if f is quadratic.
The difficulty of constrained optimization problems depends on the nature of the constraints and the objective function. The best-known category is that of linear programming problems, in which constraints must be linear inequalities forming a convex set and the objective function is also linear. The time complexity of linear programming is polynomial in the number of variables.
Linear programming is probably the most widely studied and broadly useful class of optimization problems. It is a special case of the more general problem of convex optimization, which allows the constraint region to be any convex region and the objective to be any function that is convex within the constraint region. Under certain conditions, convex optimization problems are also polynomially solvable and may be feasible in practice with thousands of variables. Several important problems in machine learning and control theory can be formulated as convex optimization problems (see the later chapters on learning).
4.3 SEARCHING WITH NONDETERMINISTIC ACTIONS
In Chapter 3, we assumed that the environment is fully observable and deterministic and that the agent knows what the effects of each action are. Therefore, the agent can calculate exactly which state results from any sequence of actions and always knows which state it is in. Its percepts provide no new information after each action, although of course they tell the agent the initial state.
When the environment is either partially observable or nondeterministic (or both), percepts become useful. In a partially observable environment, every percept helps narrow down the set of possible states the agent might be in, thus making it easier for the agent to achieve its goals. When the environment is nondeterministic, percepts tell the agent which of the possible outcomes of its actions has actually occurred. In both cases, the future percepts cannot be determined in advance, and the agent's future actions will depend on those future percepts. So the solution to a problem is not a sequence but a contingency plan (also known as a strategy) that specifies what to do depending on what percepts are received. In this section, we examine the case of nondeterminism, deferring partial observability to Section 4.4.
4.3.1 The erratic vacuum world
As an example, we use the vacuum world, first introduced in Chapter 2 and defined as a search problem in Chapter 3. Recall that the state space has eight states, as shown in Figure 4.9. There are three actions (Left, Right, and Suck) and the goal is to clean up all the dirt (states 7 and 8). If the environment is observable, deterministic, and completely known, then the problem is trivially solvable by any of the algorithms in Chapter 3 and the solution is an action sequence. For example, if the initial state is 1, then the action sequence [Suck, Right, Suck] will reach a goal state, 8.
8 A set of points S is convex if the line joining any two points in S is also contained in S. A convex function is one for which the space "above" it forms a convex set; by definition, convex functions have no local (as opposed to global) minima.
Figure 4.9 The eight possible states of the vacuum world; states 7 and 8 are goal states.
Now suppose that we introduce nondeterminism in the form of a powerful but erratic vacuum cleaner. In the erratic vacuum world, the Suck action works as follows:
• When applied to a dirty square the action cleans the square and sometimes cleans up dirt in an adjacent square, too.
• When applied to a clean square the action sometimes deposits dirt on the carpet.
To provide a precise formulation of this problem, we need to generalize the notion of a transition model from Chapter 3. Instead of defining the transition model by a RESULT function that returns a single state, we use a RESULTS function that returns a set of possible outcome states. For example, in the erratic vacuum world, the Suck action in state 1 leads to a state in the set {5, 7}: the dirt in the right-hand square may or may not be vacuumed up.
We also need to generalize the notion of a solution to the problem. For example, if we start in state 1, there is no single sequence of actions that solves the problem. Instead, we need a contingency plan such as the following:

    [Suck, if State = 5 then [Right, Suck] else [ ]] .

Thus, solutions for nondeterministic problems can contain nested if-then-else statements; this means that they are trees rather than sequences. This allows the selection of actions based on contingencies arising during execution. Many problems in the real, physical world are contingency problems because exact prediction is impossible. For this reason, many people keep their eyes open while walking around or driving.
9 We assume that most readers face similar problems and can sympathize with our agent. We apologize to owners of modern, efficient home appliances who cannot take advantage of this pedagogical device.
4.3.2 AND-OR search trees
The next question is how to find contingent solutions to nondeterministic problems. As in Chapter 3, we begin by constructing search trees, but here the trees have a different character. In a deterministic environment, the only branching is introduced by the agent's own choices in each state. We call these nodes OR nodes. In the vacuum world, for example, at an OR node the agent chooses Left or Right or Suck. In a nondeterministic environment, branching is also introduced by the environment's choice of outcome for each action. We call these nodes AND nodes. For example, the Suck action in state 1 leads to a state in the set {5, 7}, so the agent would need to find a plan for state 5 and for state 7. These two kinds of nodes alternate, leading to an AND-OR tree as illustrated in Figure 4.10.
A solution for an AND-OR search problem is a subtree that (1) has a goal node at every leaf, (2) specifies one action at each of its OR nodes, and (3) includes every outcome branch at each of its AND nodes. The solution is shown in bold lines in the figure; it corresponds to the contingency plan given above. (The plan uses if-then-else notation to handle the AND branches, but when there are more than two branches at a node, it might be better to use a case construct.)
Figure 4.10 The first two levels of the search tree for the erratic vacuum world. State nodes are OR nodes where some action must be chosen. At the AND nodes, shown as circles, every outcome must be handled, as indicated by the arc linking the outgoing branches. The solution found is shown in bold lines.
function AND-OR-GRAPH-SEARCH(problem) returns a conditional plan, or failure
  OR-SEARCH(problem.INITIAL-STATE, problem, [ ])

function OR-SEARCH(state, problem, path) returns a conditional plan, or failure
  if problem.GOAL-TEST(state) then return the empty plan
  if state is on path then return failure
  for each action in problem.ACTIONS(state) do
    plan ← AND-SEARCH(RESULTS(state, action), problem, [state | path])
    if plan ≠ failure then return [action | plan]
  return failure

function AND-SEARCH(states, problem, path) returns a conditional plan, or failure
  for each s_i in states do
    plan_i ← OR-SEARCH(s_i, problem, path)
    if plan_i = failure then return failure
  return [if s_1 then plan_1 else if s_2 then plan_2 else ... if s_{n-1} then plan_{n-1} else plan_n]

Figure 4.11 An algorithm for searching AND-OR graphs generated by nondeterministic environments. It returns a conditional plan that reaches a goal state in all circumstances. (The notation [x | l] refers to the list formed by adding object x to the front of list l.)
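A direct Python transcription of Figure 4.11, applied to the erratic vacuum world, is shown below. The state encoding (agent location plus two dirt flags) and the results model are this sketch's own formulation of the problem described earlier, and the returned plan is a nested list/dict structure rather than pretty-printed if-then-else text.

def and_or_graph_search(initial, goal_test, actions, results):
    """Returns a conditional plan (a nested structure) or None for failure."""

    def or_search(state, path):
        if goal_test(state):
            return []                              # the empty plan
        if state in path:
            return None                            # cycle: give up on this branch
        for action in actions(state):
            plan = and_search(results(state, action), path + [state])
            if plan is not None:
                return [action] + plan
        return None

    def and_search(states, path):
        # Build a plan covering every possible outcome state (the if-then-else part).
        plans = {}
        for s in states:
            plan = or_search(s, path)
            if plan is None:
                return None
            plans[s] = plan
        return [plans]                             # a dict mapping outcome state -> subplan

    return or_search(initial, [])

if __name__ == "__main__":
    # Erratic vacuum world: state = (agent_loc, dirt_A, dirt_B), loc in {"A", "B"}.
    def goal_test(s):
        return not s[1] and not s[2]

    def actions(s):
        return ["Suck", "Right", "Left"]

    def results(s, a):
        loc, da, db = s
        if a == "Right":
            return {("B", da, db)}
        if a == "Left":
            return {("A", da, db)}
        # Erratic Suck: cleaning a dirty square may also clean the other square;
        # sucking a clean square may deposit dirt on it.
        if loc == "A":
            return {("A", False, db), ("A", False, False)} if da else \
                   {("A", False, db), ("A", True, db)}
        return {("B", da, False), ("B", False, False)} if db else \
               {("B", da, False), ("B", da, True)}

    print(and_or_graph_search(("A", True, True), goal_test, actions, results))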
Modifying the basic problem-solving agent shown in Figure 3.1 to execute contingent solutions of this kind is straightforward. One may also consider a somewhat different agent design, in which the agent can act before it has found a guaranteed plan and deals with some contingencies only as they arise during execution. This type of interleaving of search and execution is also useful for exploration problems (see Section 4.5) and for game playing (see Chapter 5).
Figure 4.11 gives a recursive, depth-first algorithm for AND-OR graph search. One key aspect of the algorithm is the way in which it deals with cycles, which often arise in nondeterministic problems (e.g., if an action sometimes has no effect or if an unintended effect can be corrected). If the current state is identical to a state on the path from the root, then it returns with failure. This doesn't mean that there is no solution from the current state; it simply means that if there is a noncyclic solution, it must be reachable from the earlier incarnation of the current state, so the new incarnation can be discarded. With this check, we ensure that the algorithm terminates in every finite state space, because every path must reach a goal, a dead end, or a repeated state. Notice that the algorithm does not check whether the current state is a repetition of a state on some other path from the root, which is important for efficiency. The exercises investigate this issue.
AND-OR graphs can also be explored by breadth-first or best-first methods. The concept of a heuristic function must be modified to estimate the cost of a contingent solution rather than a sequence, but the notion of admissibility carries over and there is an analog of the A* algorithm for finding optimal solutions. Pointers are given in the bibliographical notes at the end of the chapter.
Figure 4.12 Part of the search graph for the slippery vacuum world, where we have shown (some) cycles explicitly. All solutions for this problem are cyclic plans because there is no way to move reliably.
4.3.3 Try, try again
Consider the slippery vacuum world, which is identical to the ordinary (non-erratic) vacuum world except that movement actions sometimes fail, leaving the agent in the same location. For example, moving Right in state 1 leads to the state set {1, 2}. Figure 4.12 shows part of the search graph; clearly, there are no longer any acyclic solutions from state 1, and AND-OR-GRAPH-SEARCH would return with failure. There is, however, a cyclic solution, which is to keep trying Right until it works. We can express this solution by adding a label to denote some portion of the plan and using that label later instead of repeating the plan itself. Thus, our cyclic solution is

    [Suck, L1 : Right, if State = 5 then L1 else Suck] .

(A better syntax for the looping part of this plan would be "while State = 5 do Right.") In general, a cyclic plan may be considered a solution provided that every leaf is a goal state and that a leaf is reachable from every point in the plan. The modifications needed to AND-OR-GRAPH-SEARCH are covered in the exercises. The key realization is that a loop in the state space back to a state L translates to a loop in the plan back to the point where the subplan for state L is executed.
Given the definition of a cyclic solution, an agent executing such a solution will eventually reach the goal provided that each outcome of a nondeterministic action eventually occurs. Is this condition reasonable? It depends on the reason for the nondeterminism. If the action rolls a die, then it's reasonable to suppose that eventually a six will be rolled. If the action is to insert a hotel card key into the door lock, but it doesn't work the first time, then perhaps it will eventually work, or perhaps one has the wrong key (or the wrong room!). After seven or eight tries, most people will assume the problem is with the key and will go back to the front desk to get a new one. One way to understand this decision is to say that the initial problem formulation (observable, nondeterministic) is abandoned in favor of a different formulation (partially observable, deterministic) where the failure is attributed to an unobservable property of the key. We have more to say on this issue in a later chapter.
4.4 SEARCHING WITH PARTIAL OBSERVATIONS
We now turn to the problem of partial observability, where the agent's percepts do not suffice to pin down the exact state. As noted at the beginning of the previous section, if the agent is in one of several possible states, then an action may lead to one of several possible outcomes, even if the environment is deterministic. The key concept required for solving partially observable problems is the belief state, representing the agent's current belief about the possible physical states it might be in, given the sequence of actions and percepts up to that point. We begin with the simplest scenario for studying belief states, which is when the agent has no sensors at all; then we add in partial sensing as well as nondeterministic actions.
4.4.1 Searching with no observation
When the agent's percepts provide no information at all, we have what is called a sensorless problem or, sometimes, a conformant problem. At first, one might think the sensorless agent has no hope of solving a problem if it has no idea what state it's in; in fact, sensorless problems are quite often solvable. Moreover, sensorless agents can be surprisingly useful, primarily because they don't rely on sensors working properly. In manufacturing systems, for example, many ingenious methods have been developed for orienting parts correctly from an unknown initial position by using a sequence of actions with no sensing at all. The high cost of sensing is another reason to avoid it: for example, doctors often prescribe a broad-spectrum antibiotic rather than using the contingent plan of doing an expensive blood test, then waiting for the results to come back, and then prescribing a more specific antibiotic and perhaps hospitalization because the infection has progressed too far.
We can make a sensorless version of the vacuum world. Assume that the agent knows the geography of its world but doesn't know its location or the distribution of dirt. In that case, its initial state could be any element of the set {1, 2, 3, 4, 5, 6, 7, 8}. Now, consider what happens if it tries the action Right. This will cause it to be in one of the states {2, 4, 6, 8}; the agent now has more information! Furthermore, the action sequence [Right, Suck] will always end up in one of the states {4, 8}. Finally, the sequence [Right, Suck, Left, Suck] is guaranteed to reach the goal state 7 no matter what the start state. We say that the agent can coerce the world into state 7.
To solve sensorless problems, we search in the space of belief states rather than physical states. Notice that in belief-state space the problem is fully observable because the agent always knows its own belief state.
10 In a fully observable environment, each belief state contains one physical state. Thus, we can view the algorithms in Chapter 3 as searching in a belief-state space of singleton belief states.
Furthermore, the solution (if any) is always a sequence of actions. This is because, as in the ordinary problems of Chapter 3, the percepts received after each action are completely predictable: they're always empty! So there are no contingencies to plan for. This is true even if the environment is nondeterministic.
It is instructive to see how the belief-state search problem is constructed. Suppose the underlying physical problem P is defined by ACTIONS_P, RESULT_P, GOAL-TEST_P, and STEP-COST_P. Then we can define the corresponding sensorless problem as follows:
• Belief states: The entire belief-state space contains every possible set of physical states. If P has N states, then the sensorless problem has up to 2^N states, although many may be unreachable from the initial state.
• Initial state: Typically the set of all states in P, although in some cases the agent will have more knowledge than this.
• Actions: This is slightly tricky. Suppose the agent is in belief state b = {s1, s2}, but ACTIONS_P(s1) ≠ ACTIONS_P(s2); then the agent is unsure of which actions are legal. If we assume that illegal actions have no effect on the environment, then it is safe to take the union of all the actions in any of the physical states in the current belief state b:

    ACTIONS(b) = ∪_{s ∈ b} ACTIONS_P(s) .

  On the other hand, if an illegal action might be the end of the world, it is safer to allow only the intersection, that is, the set of actions legal in all the states. For the vacuum world, every state has the same legal actions, so both methods give the same result.
• Transition model: The agent doesn't know which state in the belief state is the right one; so, as far as it knows, it might get to any of the states resulting from applying the action to one of the physical states in the belief state. For deterministic actions, the set of states that might be reached is

    b′ = RESULT(b, a) = {s′ : s′ = RESULT_P(s, a) and s ∈ b} .

  With deterministic actions, b′ is never larger than b. With nondeterminism, we have

    b′ = RESULTS(b, a) = {s′ : s′ ∈ RESULTS_P(s, a) and s ∈ b} = ∪_{s ∈ b} RESULTS_P(s, a) ,

  which may be larger than b, as shown in Figure 4.13. The process of generating the new belief state after the action is called the prediction step; the notation b′ = PREDICT_P(b, a) will come in handy. (A small code sketch of this prediction step follows this list.)
• Goal test: The agent wants a plan that is sure to work, which means that a belief state satisfies the goal only if all the physical states in it satisfy GOAL-TEST_P. The agent may accidentally achieve the goal earlier, but it won't know that it has done so.
• Path cost: This is also tricky. If the same action can have different costs in different states, then the cost of taking an action in a given belief state could be one of several values. (This gives rise to a new class of problems, which we explore in the exercises.) For now we assume that the cost of an action is the same in all states and so can be transferred directly from the underlying physical problem.
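The construction above translates almost line for line into code. The sketch below builds the prediction step for a deterministic physical problem and runs a plain breadth-first search over belief states (represented as frozensets), using the same hypothetical (location, dirt_A, dirt_B) encoding of the vacuum world introduced in the AND-OR sketch earlier; it does not include the subset/superset pruning discussed below.

from collections import deque

def predict(belief, action, result_p):
    """b' = { RESULT_P(s, a) : s in b } for a deterministic physical problem."""
    return frozenset(result_p(s, action) for s in belief)

def sensorless_search(initial_belief, actions, result_p, goal_test_p):
    """Breadth-first search in belief-state space; returns an action sequence."""
    initial = frozenset(initial_belief)
    frontier = deque([(initial, [])])
    explored = {initial}
    while frontier:
        belief, plan = frontier.popleft()
        if all(goal_test_p(s) for s in belief):     # goal: every physical state is a goal
            return plan
        for a in actions:
            child = predict(belief, a, result_p)
            if child not in explored:
                explored.add(child)
                frontier.append((child, plan + [a]))
    return None

if __name__ == "__main__":
    # Deterministic vacuum world: state = (agent_loc, dirt_A, dirt_B).
    def result_p(s, a):
        loc, da, db = s
        if a == "Right": return ("B", da, db)
        if a == "Left":  return ("A", da, db)
        return (loc, False, db) if loc == "A" else (loc, da, False)    # Suck

    all_states = [(loc, da, db) for loc in "AB" for da in (True, False) for db in (True, False)]
    plan = sensorless_search(all_states, ["Right", "Suck", "Left"], result_p,
                             lambda s: not s[1] and not s[2])
    print(plan)   # a coercing sequence, e.g. ['Right', 'Suck', 'Left', 'Suck']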
Figure 4.13 (a) Predicting the next belief state for the sensorless vacuum world with a deterministic action, Right. (b) Prediction for the same belief state and action in the slippery version of the sensorless vacuum world.
Figure 4.14 shows the reachable belief-state space for the deterministic, sensorless vacuum world. There are only 12 reachable belief states out of 2^8 = 256 possible belief states.
The preceding definitions enable the automatic construction of the belief-state problem formulation from the definition of the underlying physical problem. Once this is done, we can apply any of the search algorithms of Chapter 3. In fact, we can do a little bit more than that. In "ordinary" graph search, newly generated states are tested to see if they are identical to existing states. This works for belief states, too; for example, in Figure 4.14, the action sequence [Suck, Left, Suck] starting at the initial state reaches the same belief state as [Right, Left, Suck], namely, {5, 7}. Now, consider the belief state reached by [Left], namely, {1, 3, 5, 7}. Obviously, this is not identical to {5, 7}, but it is a superset. It is easy to prove (see the exercises) that if an action sequence is a solution for a belief state b, it is also a solution for any subset of b. Hence, we can discard a path reaching {1, 3, 5, 7} if {5, 7} has already been generated. Conversely, if {1, 3, 5, 7} has already been generated and found to be solvable, then any subset, such as {5, 7}, is guaranteed to be solvable. This extra level of pruning may dramatically improve the efficiency of sensorless problem solving.
Even with this improvement, however, sensorless problem solving as we have described it is seldom feasible in practice. The difficulty is not so much the vastness of the belief-state space (even though it is exponentially larger than the underlying physical state space); in most cases the branching factor and solution length in the belief-state space and physical state space are not so different. The real difficulty lies with the size of each belief state. For example, the initial belief state for the 10 × 10 vacuum world contains 100 × 2^100, or around 10^32, physical states: far too many if we use the atomic representation, which is an explicit list of states.
One solution is to represent the belief state by some more compact description. In English, we could say the agent knows "Nothing" in the initial state; after moving Left, we could say, "Not in the rightmost column," and so on. Chapter 7 explains how to do this in a formal representation scheme. Another approach is to avoid the standard search algorithms, which treat belief states as black boxes, just like any other problem state. Instead, we can look
Figure 4.14 The reachable portion of the belief-state space for the deterministic, sensorless vacuum world. Each shaded box corresponds to a single belief state. At any given point, the agent is in a particular belief state but does not know which physical state it is in. The initial belief state (complete ignorance) is the top center box. Actions are represented by labeled links. Self-loops are omitted for clarity.
inside the belief states and develop incremental belief-state search algorithms that build up the solution one physical state at a time. For example, in the sensorless vacuum world, the initial belief state is {1, 2, 3, 4, 5, 6, 7, 8}, and we have to find an action sequence that works in all 8 states. We can do this by first finding a solution that works for state 1; then we check if it works for state 2; if not, go back and find a different solution for state 1, and so on. Just as an AND-OR search has to find a solution for every branch at an AND node, this algorithm has to find a solution for every state in the belief state; the difference is that AND-OR search can find a different solution for each branch, whereas an incremental belief-state search has to find one solution that works for all the states.
The main advantage of the incremental approach is that it is typically able to detect failure quickly: when a belief state is unsolvable, it is usually the case that a small subset of the belief state, consisting of the first few states examined, is also unsolvable. In some cases, this leads to a speedup proportional to the size of the belief states, which may themselves be as large as the physical state space itself.
Even the most efficient solution algorithm is not of much use when no solutions exist. Many things just cannot be done without sensing. For example, the sensorless 8-puzzle is impossible. On the other hand, a little bit of sensing can go a long way: for example, every 8-puzzle instance is solvable if just one square is visible; the solution involves moving each tile in turn into the visible square and then keeping track of its location.
4.4.2 Searching with observations
For a general partially observable problem, we have to specify how the environment generates percepts for the agent. For example, we might define the local-sensing vacuum world to be one in which the agent has a position sensor and a local dirt sensor but has no sensor capable of detecting dirt in other squares. The formal problem specification includes a PERCEPT(s) function that returns the percept received in a given state. (If sensing is nondeterministic, then we use a PERCEPTS function that returns a set of possible percepts.) For example, in the local-sensing vacuum world, the PERCEPT in state 1 is [A, Dirty]. Fully observable problems are a special case in which PERCEPT(s) = s for every state s, while sensorless problems are a special case in which PERCEPT(s) = null.
When observations are partial, it will usually be the case that several states could have produced any given percept. For example, the percept [A, Dirty] is produced by state 3 as well as by state 1. Hence, given this as the initial percept, the initial belief state for the local-sensing vacuum world will be {1, 3}. The ACTIONS, STEP-COST, and GOAL-TEST are constructed from the underlying physical problem just as for sensorless problems, but the transition model is a bit more complicated. We can think of transitions from one belief state to the next for a particular action as occurring in three stages, as shown in Figure 4.15:
• The prediction stage is the same as for sensorless problems: given the action a in belief state b, the predicted belief state is b̂ = PREDICT(b, a).
• The observation prediction stage determines the set of percepts o that could be observed in the predicted belief state:

    POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂} .

• The update stage determines, for each possible percept, the belief state that would result from the percept. The new belief state b_o is just the set of states in b̂ that could have produced the percept:

    b_o = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂} .

Notice that each updated belief state b_o can be no larger than the predicted belief state b̂; observations can only help reduce uncertainty compared to the sensorless case. Moreover, for deterministic sensing, the belief states for the different possible percepts will be disjoint, forming a partition of the original predicted belief state.
11 Here and throughout the book, the "hat" in b̂ means an estimated or predicted value for b.
Figure 4.15 Two examples of transitions in local-sensing vacuum worlds. (a) In the deterministic world, Right is applied in the initial belief state, resulting in a new belief state with two possible physical states; for those states, the possible percepts are [B, Dirty] and [B, Clean], leading to two belief states, each of which is a singleton. (b) In the slippery world, Right is applied in the initial belief state, giving a new belief state with four physical states; for those states, the possible percepts are [A, Dirty], [B, Dirty], and [B, Clean], leading to three belief states as shown.
Putting these three stages together, we obtain the possible belief states resulting from a given action and the subsequent possible percepts:

    RESULTS(b, a) = {b_o : b_o = UPDATE(PREDICT(b, a), o) and o ∈ POSSIBLE-PERCEPTS(PREDICT(b, a))} .

Again, the nondeterminism in the partially observable problem comes from the inability to predict exactly which percept will be received after acting; underlying nondeterminism in the physical environment may contribute to this inability by enlarging the belief state at the prediction stage, leading to more percepts at the observation stage.
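The three stages are short enough to write out directly. The sketch below assumes a deterministic percept function and a (possibly nondeterministic) results_p transition model, both supplied by the caller; the function and parameter names are this sketch's own, and the demo at the end reuses the hypothetical (location, dirt_A, dirt_B) state encoding from the earlier sketches.

def predict(belief, action, results_p):
    """Union of the possible outcome states over every state in the belief state."""
    return frozenset(s2 for s in belief for s2 in results_p(s, action))

def possible_percepts(belief, percept):
    """The set of percepts that could be observed in a predicted belief state."""
    return {percept(s) for s in belief}

def update(belief, o, percept):
    """Keep only the states that would have produced percept o."""
    return frozenset(s for s in belief if percept(s) == o)

def results(belief, action, results_p, percept):
    """All belief states reachable by taking `action` and then observing something."""
    b_hat = predict(belief, action, results_p)
    return {update(b_hat, o, percept) for o in possible_percepts(b_hat, percept)}

if __name__ == "__main__":
    # Local-sensing vacuum world demo: after Right from the belief state corresponding
    # to {state 1, state 3}, the percept splits the prediction into singletons.
    def results_p(s, a):
        loc, da, db = s
        if a == "Right": return {("B", da, db)}
        if a == "Left":  return {("A", da, db)}
        return {("A", False, db)} if loc == "A" else {("B", da, False)}   # Suck
    def percept(s):
        loc, da, db = s
        return (loc, "Dirty" if (da if loc == "A" else db) else "Clean")
    print(results({("A", True, True), ("A", True, False)}, "Right", results_p, percept))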
4.4.3 Solving partially observable problems
The preceding section showed how to derive the RESULTS function for a nondeterministic belief-state problem from an underlying physical problem and the PERCEPT function.

Figure 4.16 The first level of the AND-OR search tree for a problem in the local-sensing vacuum world; Suck is the first step of the solution.
Given such a formulation, the AND-OR search algorithm of Figure 4.11 can be applied directly to derive a solution. Figure 4.16 shows part of the search tree for the local-sensing vacuum world, assuming an initial percept [A, Dirty]. The solution is the conditional plan

    [Suck, Right, if Bstate = {6} then Suck else [ ]] .

Notice that, because we supplied a belief-state problem to the AND-OR search algorithm, it returned a conditional plan that tests the belief state rather than the actual state. This is as it should be: in a partially observable environment the agent won't be able to execute a solution that requires testing the actual state.
As in the case of standard search algorithms applied to sensorless problems, the AND-OR search algorithm treats belief states as black boxes, just like any other states. One can improve on this by checking for previously generated belief states that are subsets or supersets of the current state, just as for sensorless problems. One can also derive incremental search algorithms, analogous to those described for sensorless problems, that provide substantial speedups over the black-box approach.
4.4.4 An agent for partially observable environments
The design of a problem-solving agent for partially observable environments is quite similar to the simple problem-solving agent in Figure 3.1: the agent formulates a problem, calls a search algorithm (such as AND-OR-GRAPH-SEARCH) to solve it, and executes the solution. There are two main differences. First, the solution to a problem will be a conditional plan rather than a sequence; if the first step is an if-then-else expression, the agent will need to test the condition in the if-part and execute the then-part or the else-part accordingly. Second, the agent will need to maintain its belief state as it performs actions and receives percepts. This process resembles the prediction-observation-update process described above but is actually simpler because the percept is given by the environment rather than calculated by the agent.
Figure 4.17 Two prediction-update cycles of belief-state maintenance in the kindergarten vacuum world with local sensing.
Given an initial belief state b, an action a, and a percept o, the new belief state is

    b′ = UPDATE(PREDICT(b, a), o) .

Figure 4.17 shows the belief state being maintained in the kindergarten vacuum world with local sensing, wherein any square may become dirty at any time unless the agent is actively cleaning it at that moment.
In partially observable environments, which include the vast majority of real-world environments, maintaining one's belief state is a core function of any intelligent system. This function goes under various names, including monitoring, filtering, and state estimation. The update rule above is called a recursive state estimator because it computes the new belief state from the previous one rather than by examining the entire percept sequence. If the agent is not to "fall behind," the computation has to happen as fast as percepts are coming in. As the environment becomes more complex, the exact update computation becomes infeasible and the agent will have to compute an approximate belief state, perhaps focusing on the implications of the percept for the aspects of the environment that are of current interest. Most work on this problem has been done for stochastic, continuous-state environments with the tools of probability theory, as explained in Chapter 15. Here we will show an example in a discrete environment with deterministic sensors and nondeterministic actions.
The example concerns a robot with the task of localization: working out where it is, given a map of the world and a sequence of percepts and actions. Our robot is placed in the maze-like environment of Figure 4.18. The robot is equipped with four sonar sensors that tell whether there is an obstacle (the outer wall or a black square in the figure) in each of the four compass directions. We assume that the sensors give perfectly correct data and that the robot has a correct map of the environment. But unfortunately, the robot's navigational system is broken, so when it executes a Move action, it moves randomly to one of the adjacent squares. The robot's task is to determine its current location.
Suppose the robot has just been switched on, so it does not know where it is. Thus its initial belief state b consists of the set of all locations.
12 The usual apologies to those who are unfamiliar with the effect of small children on the environment.
Figure 4.18 Possible positions of the robot (a) after one observation, E1 = NSW, and (b) after a second observation, E2 = NS. When sensors are noiseless and the transition model is accurate, there are no other possible locations for the robot consistent with this sequence of two observations.
16: MEANING THERE ARE OBSTACLES TO THE NORTH WEST AND SOUTH AND DOES AN UPDATE USING THE
EQUATION bo = 50$!4%(b) YIELDING THE  LOCATIONS SHOWN IN IGURE A  9OU CAN INSPECT
THE MAZE TO SEE THAT THOSE ARE THE ONLY FOUR LOCATIONS THAT YIELD THE PERCEPT NWS
.EXT THE ROBOT EXECUTES A Move ACTION BUT THE RESULT IS NONDETERMINISTIC 4HE NEW BE
LIEF STATE ba = 02%$)#4(bo, Move) CONTAINS ALL THE LOCATIONS THAT ARE ONE STEP AWAY FROM THE
LOCATIONS IN bo 7HEN THE SECOND PERCEPT NS ARRIVES THE ROBOT DOES 50$!4%(ba, NS) AND
lNDS THAT THE BELIEF STATE HAS COLLAPSED DOWN TO THE SINGLE LOCATION SHOWN IN IGURE B 
4HATS THE ONLY LOCATION THAT COULD BE THE RESULT OF
50$!4%(02%$)#4(50$!4%(b, NSW ), Move), NS) .
7ITH NONDETERMNISTIC ACTIONS THE 02%$)#4 STEP GROWS THE BELIEF STATE BUT THE 50$!4% STEP
SHRINKS IT BACK DOWNˆAS LONG AS THE PERCEPTS PROVIDE SOME USEFUL IDENTIFYING INFORMATION
3OMETIMES THE PERCEPTS DONT HELP MUCH FOR LOCALIZATION )F THERE WERE ONE OR MORE LONG
EAST WEST CORRIDORS THEN A ROBOT COULD RECEIVE A LONG SEQUENCE OF NS PERCEPTS BUT NEVER
KNOW WHERE IN THE CORRIDORS IT WAS
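To make the prediction–update cycle concrete, here is a minimal Python sketch of UPDATE and PREDICT for a grid-localization problem of this kind. The grid layout, the percept encoding (a set of blocked compass directions), and the function names are illustrative assumptions, not the book's own code.

    # Minimal sketch of recursive state estimation for robot localization.
    # Assumptions: the world is a set of free (x, y) cells, a percept is the
    # frozenset of directions in which the sonar sees an obstacle, and Move
    # takes the robot to a random adjacent free cell.
    DIRS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

    def percept_at(cell, free_cells):
        """Directions in which the neighboring cell is blocked."""
        x, y = cell
        return frozenset(d for d, (dx, dy) in DIRS.items()
                         if (x + dx, y + dy) not in free_cells)

    def update(belief, percept, free_cells):
        """Keep only the locations consistent with the (noiseless) percept."""
        return {c for c in belief if percept_at(c, free_cells) == percept}

    def predict(belief, free_cells):
        """All cells reachable by one random Move from some cell in the belief state."""
        return {(x + dx, y + dy)
                for (x, y) in belief
                for dx, dy in DIRS.values()
                if (x + dx, y + dy) in free_cells}

    # Illustrative world: two short corridors, one longer than the other.
    free = {(0, 0), (1, 0), (2, 0), (3, 0), (0, 2), (1, 2)}
    b = set(free)                                               # could be anywhere
    b = update(b, frozenset({"N", "S", "W"}), free)             # percept NSW
    b = update(predict(b, free), frozenset({"N", "S"}), free)   # Move, then percept NS
    print(b)

Run as written, the belief state first narrows to the two cells whose sonar reading is NSW and then, after one Move and the second percept, collapses to a single cell, mirroring the sequence described above.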
ONLINE SEARCH AGENTS AND UNKNOWN ENVIRONMENTS
So far we have concentrated on agents that use offline search algorithms. They compute a complete solution before setting foot in the real world and then execute the solution. In contrast, an online search agent13 interleaves computation and action: first it takes an action, then it observes the environment and computes the next action. Online search is a good idea in dynamic or semidynamic domains—domains where there is a penalty for sitting around and computing too long. Online search is also helpful in nondeterministic domains because it allows the agent to focus its computational efforts on the contingencies that actually arise rather than those that might happen but probably won't. Of course, there is a tradeoff: the more an agent plans ahead, the less often it will find itself up the creek without a paddle.

Online search is a necessary idea for unknown environments, where the agent does not know what states exist or what its actions do. In this state of ignorance, the agent faces an exploration problem and must use its actions as experiments in order to learn enough to make deliberation worthwhile.

The canonical example of online search is a robot that is placed in a new building and must explore it to build a map that it can use for getting from A to B. Methods for escaping from labyrinths—required knowledge for aspiring heroes of antiquity—are also examples of online search algorithms. Spatial exploration is not the only form of exploration, however. Consider a newborn baby: it has many possible actions but knows the outcomes of none of them, and it has experienced only a few of the possible states that it can reach. The baby's gradual discovery of how the world works is in part an online search process.
Online search problems
An online search problem must be solved by an agent executing actions, rather than by pure computation. We assume a deterministic and fully observable environment (later chapters relax these assumptions), but we stipulate that the agent knows only the following:
• ACTIONS(s), which returns a list of actions allowed in state s;
• the step-cost function c(s, a, s′)—note that this cannot be used until the agent knows that s′ is the outcome; and
• GOAL-TEST(s).
Note in particular that the agent cannot determine RESULT(s, a) except by actually being in s and doing a. For example, in the maze problem shown in the figure below, the agent does not know that going Up from (1,1) leads to (1,2); nor, having done that, does it know that going Down will take it back to (1,1). This degree of ignorance can be reduced in some applications—for example, a robot explorer might know how its movement actions work and be ignorant only of the locations of obstacles.

13 The term "online" is commonly used in computer science to refer to algorithms that must process input data as they are received, rather than waiting for the entire input data set to become available.
Figure: A simple maze problem (a 3 × 3 grid). The agent starts at S and must reach G but knows nothing of the environment.
Figure: (a) Two state spaces that might lead an online search agent into a dead end. Any given agent will fail in at least one of these spaces. (b) A two-dimensional environment that can cause an online search agent to follow an arbitrarily inefficient route to the goal. Whichever choice the agent makes, the adversary blocks that route with another long, thin wall, so that the path followed is much longer than the best possible path.
Finally, the agent might have access to an admissible heuristic function h(s) that estimates the distance from the current state to a goal state. For example, in the maze figure above, the agent might know the location of the goal and be able to use the Manhattan-distance heuristic.

Typically, the agent's objective is to reach a goal state while minimizing cost. (Another possible objective is simply to explore the entire environment.) The cost is the total path cost of the path that the agent actually travels. It is common to compare this cost with the path cost of the path the agent would follow if it knew the search space in advance—that is, the actual shortest path (or shortest complete exploration). In the language of online algorithms, this is called the competitive ratio; we would like it to be as small as possible.

Although this sounds like a reasonable request, it is easy to see that the best achievable competitive ratio is infinite in some cases. For example, if some actions are irreversible—i.e., they lead to a state from which no action leads back to the previous state—the online search might accidentally reach a dead-end state from which no goal state is reachable. Perhaps the term "accidentally" is unconvincing—after all, there might be an algorithm that happens not to take the dead-end path as it explores. Our claim, to be more precise, is that no algorithm can avoid dead ends in all state spaces. Consider the two dead-end state spaces in part (a) of the figure just above. To an online search algorithm that has visited states S and A, the two state spaces look identical, so it must make the same decision in both. Therefore, it will fail in one of them. This is an example of an adversary argument—we can imagine an adversary constructing the state space while the agent explores it, putting the goals and dead ends wherever it chooses.

Dead ends are a real difficulty for robot exploration—staircases, ramps, cliffs, one-way streets, and all kinds of natural terrain present opportunities for irreversible actions. To make progress, we simply assume that the state space is safely explorable—that is, some goal state is reachable from every reachable state. State spaces with reversible actions, such as mazes and 8-puzzles, can be viewed as undirected graphs and are clearly safely explorable.

Even in safely explorable environments, no bounded competitive ratio can be guaranteed if there are paths of unbounded cost. This is easy to show in environments with irreversible actions, but in fact it remains true for the reversible case as well, as part (b) of the figure above shows. For this reason, it is common to describe the performance of online search algorithms in terms of the size of the entire state space rather than just the depth of the shallowest goal.
Online search agents
After each action, an online agent receives a percept telling it what state it has reached; from this information, it can augment its map of the environment. The current map is used to decide where to go next. This interleaving of planning and action means that online search algorithms are quite different from the offline search algorithms we have seen previously. For example, offline algorithms such as A* can expand a node in one part of the space and then immediately expand a node in another part of the space, because node expansion involves simulated rather than real actions. An online algorithm, on the other hand, can discover successors only for a node that it physically occupies. To avoid traveling all the way across the tree to expand the next node, it seems better to expand nodes in a local order. Depth-first search has exactly this property, because (except when backtracking) the next node expanded is a child of the previous node expanded.

An online depth-first search agent is shown in the figure below. This agent stores its map in a table, result[s, a], that records the state resulting from executing action a in state s. Whenever an action from the current state has not been explored, the agent tries that action. The difficulty comes when the agent has tried all the actions in a state. In offline depth-first search, the state is simply dropped from the queue; in an online search, the agent has to backtrack physically. In depth-first search, this means going back to the state from which the agent most recently entered the current state. To achieve that, the algorithm keeps a table that lists, for each state, the predecessor states to which the agent has not yet backtracked. If the agent has run out of states to which it can backtrack, then its search is complete.

function ONLINE-DFS-AGENT(s′) returns an action
  inputs: s′, a percept that identifies the current state
  persistent: result, a table indexed by state and action, initially empty
              untried, a table that lists, for each state, the actions not yet tried
              unbacktracked, a table that lists, for each state, the backtracks not yet tried
              s, a, the previous state and action, initially null

  if GOAL-TEST(s′) then return stop
  if s′ is a new state (not in untried) then untried[s′] ← ACTIONS(s′)
  if s is not null then
      result[s, a] ← s′
      add s to the front of unbacktracked[s′]
  if untried[s′] is empty then
      if unbacktracked[s′] is empty then return stop
      else a ← an action b such that result[s′, b] = POP(unbacktracked[s′])
  else a ← POP(untried[s′])
  s ← s′
  return a

Figure: An online search agent that uses depth-first exploration. The agent is applicable only in state spaces in which every action can be "undone" by some other action.
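For readers who prefer running code, the following Python class is a direct transcription of the pseudocode above. The environment interface (actions_fn and goal_test) and the use of hashable state identifiers are assumptions made for the sketch.

    # Sketch of ONLINE-DFS-AGENT as a stateful callable object.
    class OnlineDFSAgent:
        def __init__(self, actions_fn, goal_test):
            self.actions_fn = actions_fn      # state -> list of actions
            self.goal_test = goal_test        # state -> bool
            self.result = {}                  # (state, action) -> next state
            self.untried = {}                 # state -> actions not yet tried
            self.unbacktracked = {}           # state -> predecessors not yet backtracked to
            self.s = self.a = None            # previous state and action

        def __call__(self, s1):
            """Receive a percept identifying the current state; return an action or None to stop."""
            if self.goal_test(s1):
                return None
            if s1 not in self.untried:
                self.untried[s1] = list(self.actions_fn(s1))
            if self.s is not None:
                self.result[(self.s, self.a)] = s1
                self.unbacktracked.setdefault(s1, []).insert(0, self.s)
            if not self.untried[s1]:
                if not self.unbacktracked.get(s1):
                    return None               # nowhere left to backtrack: stop
                target = self.unbacktracked[s1].pop(0)
                # Pick an action already known to lead back to that predecessor.
                self.a = next(b for b in self.actions_fn(s1)
                              if self.result.get((s1, b)) == target)
            else:
                self.a = self.untried[s1].pop()
            self.s = s1
            return self.a

As in the pseudocode, the agent relies on action reversibility: by the time it must backtrack, every action from the current state has been tried, so some recorded result is guaranteed to lead back to the chosen predecessor.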
We recommend that the reader trace through the progress of ONLINE-DFS-AGENT when applied to the maze given in the earlier figure. It is fairly easy to see that the agent will, in the worst case, end up traversing every link in the state space exactly twice. For exploration, this is optimal; for finding a goal, on the other hand, the agent's competitive ratio could be arbitrarily bad if it goes off on a long excursion when there is a goal right next to the initial state. An online variant of iterative deepening solves this problem; for an environment that is a uniform tree, the competitive ratio of such an agent is a small constant.

Because of its method of backtracking, ONLINE-DFS-AGENT works only in state spaces where the actions are reversible. There are slightly more complex algorithms that work in general state spaces, but no such algorithm has a bounded competitive ratio.
Online local search
Like depth-first search, hill-climbing search has the property of locality in its node expansions. In fact, because it keeps just one current state in memory, hill-climbing search is already an online search algorithm. Unfortunately, it is not very useful in its simplest form, because it leaves the agent sitting at local maxima with nowhere to go. Moreover, random restarts cannot be used, because the agent cannot transport itself to a new state.

Instead of random restarts, one might consider using a random walk to explore the environment. A random walk simply selects at random one of the available actions from the current state; preference can be given to actions that have not yet been tried. It is easy to prove that a random walk will eventually find a goal or complete its exploration, provided that the space is finite.14 On the other hand, the process can be very slow. The figure below shows an environment in which a random walk will take exponentially many steps to find the goal, because, at each step, backward progress is twice as likely as forward progress. The example is contrived, of course, but there are many real-world state spaces whose topology causes these kinds of "traps" for random walks.

Figure: An environment in which a random walk will take exponentially many steps to find the goal.

14 Random walks are complete on infinite one-dimensional and two-dimensional grids. On a three-dimensional grid, the probability that the walk ever returns to the starting point is only about 0.34 (Hughes).
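The claim about exponential slowdown is easy to check empirically. The short Python simulation below uses a line of states in which each step goes backward with probability 2/3; this layout is an illustrative stand-in for the environment in the figure, not a reproduction of it.

    import random

    def steps_to_goal(n, trials=200):
        """Average number of random-walk steps to reach state n, starting at 0,
        when each step goes backward with probability 2/3 and forward with 1/3
        (a crude stand-in for the trap environment described above)."""
        total = 0
        for _ in range(trials):
            pos, steps = 0, 0
            while pos != n:
                pos = max(0, pos - 1) if random.random() < 2 / 3 else pos + 1
                steps += 1
            total += steps
        return total / trials

    for n in range(2, 9):
        print(n, round(steps_to_goal(n)))
    # The averages roughly double with each extra state, i.e. they grow like 2**n.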
Augmenting hill climbing with memory rather than randomness turns out to be a more effective approach. The basic idea is to store a "current best estimate" H(s) of the cost to reach the goal from each state that has been visited. H(s) starts out being just the heuristic estimate h(s) and is updated as the agent gains experience in the state space. The figure below shows a simple example in a one-dimensional state space. In (a), the agent seems to be stuck in a flat local minimum at the shaded state. Rather than staying where it is, the agent should follow what seems to be the best path to the goal given the current cost estimates for its neighbors. The estimated cost to reach the goal through a neighbor s′ is the cost to get to s′ plus the estimated cost to get to a goal from there—that is, c(s, a, s′) + H(s′). In the example, there are two actions, with estimated costs 1 + 9 and 1 + 2, so it seems best to move right. Now, it is clear that the cost estimate of 2 for the shaded state was overly optimistic. Since the best move cost 1 and led to a state that is at least 2 steps from a goal, the shaded state must be at least 3 steps from a goal, so its H should be updated accordingly, as shown in (b). Continuing this process, the agent will move back and forth twice more, updating H each time and "flattening out" the local minimum until it escapes to the right.

An agent implementing this scheme, which is called learning real-time A* (LRTA*), is shown in the second figure below. Like ONLINE-DFS-AGENT, it builds a map of the environment in the result table. It updates the cost estimate for the state it has just left and then chooses the "apparently best" move according to its current cost estimates. One important detail is that actions that have not yet been tried in a state s are always assumed to lead immediately to the goal with the least possible cost, namely h(s). This optimism under uncertainty encourages the agent to explore new, possibly promising paths.

An LRTA* agent is guaranteed to find a goal in any finite, safely explorable environment. Unlike A*, however, it is not complete for infinite state spaces—there are cases where it can be led infinitely astray. It can explore an environment of n states in O(n²) steps in the worst case, but often does much better.
Figure: Five iterations of LRTA* on a one-dimensional state space. Each state is labeled with H(s), the current cost estimate to reach a goal, and each link is labeled with its step cost. The shaded state marks the location of the agent, and the updated cost estimates at each iteration are circled.
function LRTA*-AGENT(s′) returns an action
  inputs: s′, a percept that identifies the current state
  persistent: result, a table indexed by state and action, initially empty
              H, a table of cost estimates indexed by state, initially empty
              s, a, the previous state and action, initially null

  if GOAL-TEST(s′) then return stop
  if s′ is a new state (not in H) then H[s′] ← h(s′)
  if s is not null then
      result[s, a] ← s′
      H[s] ← min over b in ACTIONS(s) of LRTA*-COST(s, b, result[s, b], H)
  a ← an action b in ACTIONS(s′) that minimizes LRTA*-COST(s′, b, result[s′, b], H)
  s ← s′
  return a

function LRTA*-COST(s, a, s′, H) returns a cost estimate
  if s′ is undefined then return h(s)
  else return c(s, a, s′) + H[s′]

Figure: LRTA*-AGENT selects an action according to the values of neighboring states, which are updated as the agent moves about the state space.
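The pseudocode translates almost line for line into Python. The sketch below is a minimal rendering under assumed interfaces: actions_fn, goal_test, h, and c stand for the ACTIONS, GOAL-TEST, heuristic, and step-cost functions of a particular problem.

    class LRTAStarAgent:
        """Minimal sketch of LRTA*-AGENT; the environment interface is assumed."""
        def __init__(self, actions_fn, goal_test, h, c):
            self.actions_fn, self.goal_test, self.h, self.c = actions_fn, goal_test, h, c
            self.result = {}            # (state, action) -> observed successor state
            self.H = {}                 # state -> current cost-to-goal estimate
            self.s = self.a = None      # previous state and action

        def cost(self, s, a, s1):
            # Untried actions are optimistically assumed to reach a goal at cost h(s).
            return self.h(s) if s1 is None else self.c(s, a, s1) + self.H[s1]

        def __call__(self, s1):
            """Receive a percept identifying the current state; return an action or None to stop."""
            if self.goal_test(s1):
                return None
            if s1 not in self.H:
                self.H[s1] = self.h(s1)
            if self.s is not None:
                self.result[(self.s, self.a)] = s1
                # Update the estimate for the state just left, as in the figure above.
                self.H[self.s] = min(self.cost(self.s, b, self.result.get((self.s, b)))
                                     for b in self.actions_fn(self.s))
            self.a = min(self.actions_fn(s1),
                         key=lambda b: self.cost(s1, b, self.result.get((s1, b))))
            self.s = s1
            return self.a

Calling the agent repeatedly with the state returned by the environment reproduces the back-and-forth, estimate-raising behavior illustrated in the five-iteration figure.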
The LRTA* agent is just one of a large family of online agents that one can define by specifying the action selection rule and the update rule in different ways. We discuss this family, developed originally for stochastic environments, in the chapter on reinforcement learning.
Learning in online search
The initial ignorance of online search agents provides several opportunities for learning. First, the agents learn a "map" of the environment—more precisely, the outcome of each action in each state—simply by recording each of their experiences. Notice that the assumption of deterministic environments means that one experience is enough for each action. Second, the local search agents acquire more accurate estimates of the cost of each state by using local updating rules, as in LRTA*. In the chapter on reinforcement learning we show that these updates eventually converge to exact values for every state, provided that the agent explores the state space in the right way. Once exact values are known, optimal decisions can be taken simply by moving to the lowest-cost successor—that is, pure hill climbing is then an optimal strategy.

If you followed our suggestion to trace the behavior of ONLINE-DFS-AGENT in the environment of the simple maze shown earlier, you will have noticed that the agent is not very bright. For example, after it has seen that the Up action goes from (1,1) to (1,2), the agent still has no idea that the Down action goes back to (1,1), or that the Up action also goes from (2,1) to (2,2), from (2,2) to (2,3), and so on. In general, we would like the agent to learn that Up increases the y-coordinate unless there is a wall in the way, that Down reduces it, and so on. For this to happen, we need two things. First, we need a formal and explicitly manipulable representation for these kinds of general rules; so far, we have hidden the information inside the black box called the RESULT function. Part III is devoted to this issue. Second, we need algorithms that can construct suitable general rules from the specific observations made by the agent. These are covered in the chapters on learning.
SUMMARY
This chapter has examined search algorithms for problems beyond the "classical" case of finding the shortest path to a goal in an observable, deterministic, discrete environment.
• Local search methods such as hill climbing operate on complete-state formulations, keeping only a small number of nodes in memory. Several stochastic algorithms have been developed, including simulated annealing, which returns optimal solutions when given an appropriate cooling schedule.
• Many local search methods apply also to problems in continuous spaces. Linear programming and convex optimization problems obey certain restrictions on the shape of the state space and the nature of the objective function, and admit polynomial-time algorithms that are often extremely efficient in practice.
• A genetic algorithm is a stochastic hill-climbing search in which a large population of states is maintained. New states are generated by mutation and by crossover, which combines pairs of states from the population.
• In nondeterministic environments, agents can apply AND–OR search to generate contingent plans that reach the goal regardless of which outcomes occur during execution.
• When the environment is partially observable, the belief state represents the set of possible states that the agent might be in.
• Standard search algorithms can be applied directly to belief-state space to solve sensorless problems, and belief-state AND–OR search can solve general partially observable problems. Incremental algorithms that construct solutions state by state within a belief state are often more efficient.
• Exploration problems arise when the agent has no idea about the states and actions of its environment. For safely explorable environments, online search agents can build a map and find a goal if one exists. Updating heuristic estimates from experience provides an effective method to escape from local minima.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
Local search techniques have a long history in mathematics and computer science. Indeed, the Newton–Raphson method can be seen as a very efficient local search method for continuous spaces in which gradient information is available. Brent is a classic reference for optimization algorithms that do not require such information. Beam search, which we have presented as a local search algorithm, originated as a bounded-width variant of dynamic programming for speech recognition in the HARPY system (Lowerre). A related algorithm is analyzed in depth by Pearl.

The topic of local search was reinvigorated in the early 1990s by surprisingly good results for large constraint-satisfaction problems such as n-queens (Minton et al.) and logical reasoning (Selman et al.), and by the incorporation of randomness, multiple simultaneous searches, and other improvements. This renaissance of what Christos Papadimitriou has called "New Age" algorithms also sparked increased interest among theoretical computer scientists (Koutsoupias and Papadimitriou; Aldous and Vazirani). In the field of operations research, a variant of hill climbing called tabu search has gained popularity (Glover and Laguna). This algorithm maintains a tabu list of k previously visited states that cannot be revisited; as well as improving efficiency when searching graphs, this list can allow the algorithm to escape from some local minima. Another useful improvement on hill climbing is the STAGE algorithm (Boyan and Moore). The idea is to use the local maxima found by random-restart hill climbing to get an idea of the overall shape of the landscape. The algorithm fits a smooth surface to the set of local maxima and then calculates the global maximum of that surface analytically. This becomes the new restart point. The algorithm has been shown to work in practice on hard problems. Gomes et al. showed that the run times of systematic backtracking algorithms often have a heavy-tailed distribution, which means that the probability of a very long run time is more than would be predicted if the run times were exponentially distributed. When the run-time distribution is heavy-tailed, random restarts find a solution faster, on average, than a single run to completion.
Simulated annealing was first described by Kirkpatrick et al., who borrowed directly from the Metropolis algorithm, which is used to simulate complex systems in physics (Metropolis et al.) and was supposedly invented at a Los Alamos dinner party. Simulated annealing is now a field in itself, with hundreds of papers published every year.

Finding optimal solutions in continuous spaces is the subject matter of several fields, including optimization theory, optimal control theory, and the calculus of variations. The basic techniques are explained well by Bishop; Press et al. cover a wide range of algorithms and provide working software.

As Andrew Moore points out, researchers have taken inspiration for search and optimization algorithms from a wide variety of fields of study: metallurgy (simulated annealing), biology (genetic algorithms), economics (market-based algorithms), entomology (ant colony optimization), neurology (neural networks), animal behavior (reinforcement learning), mountaineering (hill climbing), and others.

Linear programming (LP) was first studied systematically by the Russian mathematician Leonid Kantorovich. It was one of the first applications of computers; the simplex algorithm (Dantzig) is still used despite worst-case exponential complexity. Karmarkar developed the far more efficient family of interior-point methods, which was shown to have polynomial complexity for the more general class of convex optimization problems by Nesterov and Nemirovski. Excellent introductions to convex optimization are provided by Ben-Tal and Nemirovski and by Boyd and Vandenberghe.

Work by Sewall Wright on the concept of a fitness landscape was an important precursor to the development of genetic algorithms. In the 1950s, several statisticians, including Box and Friedman, used evolutionary techniques for optimization problems, but it wasn't until Rechenberg introduced evolution strategies to solve optimization problems for airfoils that the approach gained popularity. In the 1960s and 1970s, John Holland championed genetic algorithms, both as a useful tool and as a method to expand our understanding of adaptation, biological or otherwise. The artificial life movement (Langton) takes this idea one step further, viewing the products of genetic algorithms as organisms rather than solutions to problems. Work in this field by Hinton and Nowlan and by Ackley and Littman has done much to clarify the implications of the Baldwin effect. For general background on evolution, we recommend Smith and Szathmáry, Ridley, and Carroll.

Most comparisons of genetic algorithms to other approaches (especially stochastic hill climbing) have found that the genetic algorithms are slower to converge (O'Reilly and Oppacher; Mitchell et al.; Juels and Wattenberg; Baluja). Such findings are not universally popular within the GA community, but recent attempts within that community to understand population-based search as an approximate form of Bayesian learning (see the later chapters on learning) might help close the gap between the field and its critics (Pelikan et al.). The theory of quadratic dynamical systems may also explain the performance of GAs (Rabani et al.). See Lohn et al. for an example of GAs applied to antenna design, and Renner and Ekart for an application to computer-aided design.
The field of genetic programming is closely related to genetic algorithms. The principal difference is that the representations that are mutated and combined are programs rather than bit strings. The programs are represented in the form of expression trees; the expressions can be in a standard language such as Lisp or can be specially designed to represent circuits, robot controllers, and so on. Crossover involves splicing together subtrees rather than substrings. This form of mutation guarantees that the offspring are well-formed expressions, which would not be the case if programs were manipulated as strings.

Interest in genetic programming was spurred by John Koza's work, but it goes back at least to early experiments with machine code by Friedberg and with finite-state automata by Fogel et al. As with genetic algorithms, there is debate about the effectiveness of the technique. Koza et al. describe experiments in the use of genetic programming to design circuit devices.

The journals Evolutionary Computation and IEEE Transactions on Evolutionary Computation cover genetic algorithms and genetic programming; articles are also found in Complex Systems, Adaptive Behavior, and Artificial Life. The main conference is the Genetic and Evolutionary Computation Conference (GECCO). Good overview texts on genetic algorithms are given by Mitchell, by Fogel, by Langdon and Poli, and by the free online book by Poli et al.
The unpredictability and partial observability of real environments were recognized early on in robotics projects that used planning techniques, including Shakey (Fikes et al.) and FREDDY (Michie). The problems received more attention after the publication of McDermott's influential article, Planning and Acting.

The first work to make explicit use of AND–OR trees seems to have been Slagle's SAINT program for symbolic integration, mentioned in an earlier chapter. Amarel applied the idea to propositional theorem proving, a topic discussed in a later chapter, and introduced a search algorithm similar to AND-OR-GRAPH-SEARCH. The algorithm was further developed and formalized by Nilsson, who also described AO*—which, as its name suggests, finds optimal solutions given an admissible heuristic. AO* was analyzed and improved by Martelli and Montanari. AO* is a top-down algorithm; a bottom-up generalization of A* is A*LD, for A* Lightest Derivation (Felzenszwalb and McAllester). Interest in AND–OR search has undergone a revival in recent years, with new algorithms for finding cyclic solutions (Jimenez and Torras; Hansen and Zilberstein) and new techniques inspired by dynamic programming (Bonet and Geffner).

The idea of transforming partially observable problems into belief-state problems originated with Åström for the much more complex case of probabilistic uncertainty, discussed in later chapters. Erdmann and Mason studied the problem of robotic manipulation without sensors, using a continuous form of belief-state search. They showed that it was possible to orient a part on a table from an arbitrary initial position by a well-designed sequence of tilting actions. More practical methods, based on a series of precisely oriented diagonal barriers across a conveyor belt, use the same algorithmic insights (Wiegley et al.).

The belief-state approach was reinvented in the context of sensorless and partially observable search problems by Genesereth and Nourbakhsh. Additional work was done on sensorless problems in the logic-based planning community (Goldman and Boddy; Smith and Weld). This work has emphasized concise representations for belief states, as explained in the chapters on planning. Bonet and Geffner introduced the first effective heuristics
for belief-state search; these were refined by Bryce et al. The incremental approach to belief-state search, in which solutions are constructed incrementally for subsets of states within each belief state, was studied in the planning literature by Kurien et al.; several new incremental algorithms were introduced for nondeterministic, partially observable problems by Russell and Wolfe. Additional references for planning in stochastic, partially observable environments appear in later chapters.

Algorithms for exploring unknown state spaces have been of interest for many centuries. Depth-first search in a maze can be implemented by keeping one's left hand on the wall; loops can be avoided by marking each junction. Depth-first search fails with irreversible actions; the more general problem of exploring Eulerian graphs (i.e., graphs in which each node has equal numbers of incoming and outgoing edges) was solved by an algorithm due to Hierholzer. The first thorough algorithmic study of the exploration problem for arbitrary graphs was carried out by Deng and Papadimitriou, who developed a completely general algorithm but showed that no bounded competitive ratio is possible for exploring a general graph. Papadimitriou and Yannakakis examined the question of finding paths to a goal in geometric path-planning environments (where all actions are reversible). They showed that a small competitive ratio is achievable with square obstacles, but with general rectangular obstacles no bounded ratio can be achieved. (See the earlier figure of the adversarial two-dimensional environment.)

The LRTA* algorithm was developed by Korf as part of an investigation into real-time search for environments in which the agent must act after searching for only a fixed amount of time (a common situation in two-player games). LRTA* is in fact a special case of reinforcement learning algorithms for stochastic environments (Barto et al.). Its policy of optimism under uncertainty—always head for the closest unvisited state—can result in an exploration pattern that is less efficient in the uninformed case than simple depth-first search (Koenig). Dasgupta et al. show that online iterative deepening search is optimally efficient for finding a goal in a uniform tree with no heuristic information. Several informed variants on the LRTA* theme have been developed, with different methods for searching and updating within the known portion of the graph (Pemberton and Korf). As yet, there is no good understanding of how to find goals with optimal efficiency when using heuristic information.
EXERCISES
Give the name of the algorithm that results from each of the following special cases:
a. Local beam search with k = 1.
b. Local beam search with one initial state and no limit on the number of states retained.
c. Simulated annealing with T = 0 at all times (and omitting the termination test).
d. Simulated annealing with T = ∞ at all times.
e. Genetic algorithm with population size N = 1.
An earlier exercise considers the problem of building railway tracks under the assumption that pieces fit exactly with no slack. Now consider the real problem, in which pieces don't fit exactly but allow for a few degrees of rotation to either side of the "proper" alignment. Explain how to formulate the problem so it could be solved by simulated annealing.

Generate a large number of 8-puzzle and 8-queens instances and solve them (where possible) by hill climbing (steepest-ascent and first-choice variants), hill climbing with random restart, and simulated annealing. Measure the search cost and percentage of solved problems and graph these against the optimal solution cost. Comment on your results.
The AND-OR-GRAPH-SEARCH algorithm given earlier in the chapter checks for repeated states only on the path from the root to the current state. Suppose that, in addition, the algorithm were to store every visited state and check against that list. (See the BREADTH-FIRST-SEARCH algorithm in Chapter 3 for an example.) Determine the information that should be stored and how the algorithm should use that information when a repeated state is found. (Hint: You will need to distinguish at least between states for which a successful subplan was constructed previously and states for which no subplan could be found.) Explain how to use labels, as defined earlier in the chapter, to avoid having multiple copies of subplans.
Explain precisely how to modify the AND-OR-GRAPH-SEARCH algorithm to generate a cyclic plan if no acyclic plan exists. You will need to deal with three issues: labeling the plan steps so that a cyclic plan can point back to an earlier part of the plan, modifying OR-SEARCH so that it continues to look for acyclic plans after finding a cyclic plan, and augmenting the plan representation to indicate whether a plan is cyclic. Show how your algorithm works on (a) the slippery vacuum world and (b) the slippery, erratic vacuum world. You might wish to use a computer implementation to check your results.
Earlier in the chapter we introduced belief states to solve sensorless search problems. A sequence of actions solves a sensorless problem if it maps every physical state in the initial belief state b to a goal state. Suppose the agent knows h*(s), the true optimal cost of solving the physical state s in the fully observable problem, for every state s in b. Find an admissible heuristic h(b) for the sensorless problem, in terms of these costs, and prove its admissibility. Comment on the accuracy of this heuristic on the sensorless vacuum problem discussed earlier. How well does A* perform?
This exercise explores subset–superset relations between belief states in sensorless or partially observable environments.
a. Prove that if an action sequence is a solution for a belief state b, it is also a solution for any subset of b. Can anything be said about supersets of b?
b. Explain in detail how to modify graph search for sensorless problems to take advantage of your answers in (a).
c. Explain in detail how to modify AND–OR search for partially observable problems, beyond the modifications you describe in (b).
Earlier in the chapter it was assumed that a given action would have the same cost when executed in any physical state within a given belief state. (This leads to a belief-state search problem with well-defined step costs.) Now consider what happens when the assumption does not hold. Does the notion of optimality still make sense in this context, or does it require modification? Consider also various possible definitions of the "cost" of executing an action in a belief state; for example, we could use the minimum of the physical costs; or the maximum; or a cost interval with the lower bound being the minimum cost and the upper bound being the maximum; or just keep the set of all possible costs for that action. For each of these, explore whether A* (with modifications if necessary) can return optimal solutions.

Consider the sensorless version of the erratic vacuum world. Draw the belief-state space reachable from the initial belief state {1, 3, 5, 7}, and explain why the problem is unsolvable.
We can turn the navigation problem of an earlier exercise into an environment as follows:
• The percept will be a list of the positions, relative to the agent, of the visible vertices. The percept does not include the position of the robot! The robot must learn its own position from the map; for now, you can assume that each location has a different "view."
• Each action will be a vector describing a straight-line path to follow. If the path is unobstructed, the action succeeds; otherwise, the robot stops at the point where its path first intersects an obstacle. If the agent returns a zero motion vector and is at the goal (which is fixed and known), then the environment teleports the agent to a random location (not inside an obstacle).
• The performance measure charges the agent 1 point for each unit of distance traversed and awards points each time the goal is reached.
a. Implement this environment and a problem-solving agent for it. After each teleportation, the agent will need to formulate a new problem, which will involve discovering its current location.
b. Document your agent's performance (by having the agent generate suitable commentary as it moves around) and report its performance over a number of episodes.
c. Modify the environment so that, some fraction of the time, the agent ends up at an unintended destination (chosen randomly from the other visible vertices, if any; otherwise no move at all). This is a crude model of the motion errors of a real robot. Modify the agent so that, when such an error is detected, it finds out where it is and then constructs a plan to get back to where it was and resume the old plan. Remember that sometimes getting back to where it was might also fail! Show an example of the agent successfully overcoming two successive motion errors and still reaching the goal.
d. Now try two different recovery schemes after an error: (1) head for the closest vertex on the original route; and (2) replan a route to the goal from the new location. Compare the performance of the three recovery schemes. Would the inclusion of search costs affect the comparison?
e. Now suppose that there are locations from which the view is identical. (For example, suppose the world is a grid with square obstacles.) What kind of problem does the agent now face? What do solutions look like?
Suppose that an agent is in a 3 × 3 maze environment like the one shown in the earlier maze figure. The agent knows that its initial location is (1,1), that the goal is at (3,3), and that the four actions Up, Down, Left, Right have their usual effects unless blocked by a wall. The agent does not know where the internal walls are. In any given state, the agent perceives the set of legal actions; it can also tell whether the state is one it has visited before or is a new state.
a. Explain how this online search problem can be viewed as an offline search in belief-state space, where the initial belief state includes all possible environment configurations. How large is the initial belief state? How large is the space of belief states?
b. How many distinct percepts are possible in the initial state?
c. Describe the first few branches of a contingency plan for this problem. How large (roughly) is the complete plan?
Notice that this contingency plan is a solution for every possible environment fitting the given description. Therefore, interleaving of search and execution is not strictly necessary, even in unknown environments.
In this exercise, we examine hill climbing in the context of robot navigation, using the polygonal environment of the preceding navigation exercise as an example.
a. Repeat that exercise using hill climbing. Does your agent ever get stuck in a local minimum? Is it possible for it to get stuck with convex obstacles?
b. Construct a nonconvex polygonal environment in which the agent gets stuck.
c. Modify the hill-climbing algorithm so that, instead of doing a depth-1 search to decide where to go next, it does a depth-k search. It should find the best k-step path, do one step along it, and then repeat the process.
d. Is there some k for which the new algorithm is guaranteed to escape from local minima?
e. Explain how LRTA* enables the agent to escape from local minima in this case.

Relate the time complexity of LRTA* to its space complexity.
5 ADVERSARIAL SEARCH
In which we examine the problems that arise when we try to plan ahead in a world
where other agents are planning against us.
5.1 GAMES
Chapter 2 introduced multiagent environments, in which each agent needs to consider the
actions of other agents and how they affect its own welfare. The unpredictability of these
other agents can introduce contingencies into the agent’s problem-solving process, as dis-
cussed in Chapter 4. In this chapter we cover competitive environments, in which the agents’
goals are in conflict, giving rise to adversarial search problems—often known as games.
Mathematical game theory, a branch of economics, views any multiagent environment
as a game, provided that the impact of each agent on the others is “significant,” regardless
of whether the agents are cooperative or competitive.1 In AI, the most common games are
of a rather specialized kind—what game theorists call deterministic, turn-taking, two-player,
zero-sum games of perfect information (such as chess). In our terminology, this means
deterministic, fully observable environments in which two agents act alternately and in which
the utility values at the end of the game are always equal and opposite. For example, if one
player wins a game of chess, the other player necessarily loses. It is this opposition between
the agents’ utility functions that makes the situation adversarial.
Games have engaged the intellectual faculties of humans—sometimes to an alarming
degree—for as long as civilization has existed. For AI researchers, the abstract nature of
games makes them an appealing subject for study. The state of a game is easy to represent,
and agents are usually restricted to a small number of actions whose outcomes are defined by
precise rules. Physical games, such as croquet and ice hockey, have much more complicated
descriptions, a much larger range of possible actions, and rather imprecise rules defining
the legality of actions. With the exception of robot soccer, these physical games have not
attracted much interest in the AI community.
1 Environments with very many agents are often viewed as economies rather than games.
Games, unlike most of the toy problems studied in Chapter 3, are interesting because
they are too hard to solve. For example, chess has an average branching factor of about 35,
and games often go to 50 moves by each player, so the search tree has about 35^100 or 10^154
nodes (although the search graph has "only" about 10^40 distinct nodes). Games, like the real
world, therefore require the ability to make some decision even when calculating the optimal
decision is infeasible. Games also penalize inefficiency severely. Whereas an implementation
of A∗ search that is half as efficient will simply take twice as long to run to completion, a chess
program that is half as efficient in using its available time probably will be beaten into the
ground, other things being equal. Game-playing research has therefore spawned a number of
interesting ideas on how to make the best possible use of time.
We begin with a definition of the optimal move and an algorithm for finding it. We
then look at techniques for choosing a good move when time is limited. Pruning allows us
to ignore portions of the search tree that make no difference to the final choice, and heuristic
evaluation functions allow us to approximate the true utility of a state without doing a com-
plete search. Section 5.5 discusses games such as backgammon that include an element of
chance; we also discuss bridge, which includes elements of imperfect information because
not all cards are visible to each player. Finally, we look at how state-of-the-art game-playing
programs fare against human opposition and at directions for future developments.
We first consider games with two players, whom we call MAX and MIN for reasons that
will soon become obvious. MAX moves first, and then they take turns moving until the game
is over. At the end of the game, points are awarded to the winning player and penalties are
given to the loser. A game can be formally defined as a kind of search problem with the
following elements:
• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test, which is true when the game is over and false
otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective function or payoff function), which defines the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2. Some games have a
wider variety of possible outcomes; the payoffs in backgammon range from 0 to +192.
A zero-sum game is (confusingly) defined as one where the total payoff to all players
is the same for every instance of the game. Chess is zero-sum because every game has
payoff of either 0 + 1, 1 + 0, or 1/2 + 1/2. "Constant-sum" would have been a better term, but zero-sum is traditional and makes sense if you imagine each player is charged an entry fee of 1/2.
The initial state, ACTIONS function, and RESULT function define the game tree for the
game—a tree where the nodes are game states and the edges are moves. Figure 5.1 shows
part of the game tree for tic-tac-toe (noughts and crosses). From the initial state, MAX has
nine possible moves. Play alternates between MAX’s placing an X and MIN’s placing an O
until we reach leaf nodes corresponding to terminal states such that one player has three in
a row or all the squares are filled. The number on each leaf node indicates the utility value
of the terminal state from the point of view of MAX; high values are assumed to be good for
MAX and bad for MIN (which is how the players get their names).
For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal
nodes. But for chess there are over 10^40 nodes, so the game tree is best thought of as a
theoretical construct that we cannot realize in the physical world. But regardless of the size
of the game tree, it is MAX’s job to search for a good move. We use the term search tree for a
tree that is superimposed on the full game tree, and examines enough nodes to allow a player
to determine what move to make.
Figure 5.1 A (partial) game tree for the game of tic-tac-toe. The top node is the initial
state, and MAX moves first, placing an X in an empty square. We show part of the tree, giving
alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which
can be assigned utilities according to the rules of the game.
5.2 OPTIMAL DECISIONS IN GAMES
In a normal search problem, the optimal solution would be a sequence of actions leading to
a goal state—a terminal state that is a win. In adversarial search, MIN has something to say
about it. MAX therefore must find a contingent strategy, which specifies MAX’s move in
the initial state, then MAX’s moves in the states resulting from every possible response by
Figure 5.2 A two-ply game tree. The △ nodes are “MAX nodes,” in which it is MAX’s
turn to move, and the ▽ nodes are “MIN nodes.” The terminal nodes show the utility values
for MAX; the other nodes are labeled with their minimax values. MAX’s best move at the root
is a1, because it leads to the state with the highest minimax value, and MIN’s best reply is b1,
because it leads to the state with the lowest minimax value.
MIN, then MAX’s moves in the states resulting from every possible response by MIN to those
moves, and so on. This is exactly analogous to the AND–OR search algorithm (Figure 4.11)
with MAX playing the role of OR and MIN equivalent to AND. Roughly speaking, an optimal
strategy leads to outcomes at least as good as any other strategy when one is playing an
infallible opponent. We begin by showing how to find this optimal strategy.
Even a simple game like tic-tac-toe is too complex for us to draw the entire game tree
on one page, so we will switch to the trivial game in Figure 5.2. The possible moves for MAX
at the root node are labeled a1, a2, and a3. The possible replies to a1 for MIN are b1, b2,
b3, and so on. This particular game ends after one move each by MAX and MIN. (In game
parlance, we say that this tree is one move deep, consisting of two half-moves, each of which
is called a ply.) The utilities of the terminal states in this game range from 2 to 14.
Given a game tree, the optimal strategy can be determined from the minimax value
of each node, which we write as MINIMAX(n). The minimax value of a node is the utility
(for MAX) of being in the corresponding state, assuming that both players play optimally
from there to the end of the game. Obviously, the minimax value of a terminal state is just
its utility. Furthermore, given a choice, MAX prefers to move to a state of maximum value,
whereas MIN prefers a state of minimum value. So we have the following:
MINIMAX(s) =
    UTILITY(s)                                      if TERMINAL-TEST(s)
    max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if PLAYER(s) = MIN
Let us apply these definitions to the game tree in Figure 5.2. The terminal nodes on the bottom
level get their utility values from the game’s UTILITY function. The first MIN node, labeled
B, has three successor states with values 3, 12, and 8, so its minimax value is 3. Similarly,
the other two MIN nodes have minimax value 2. The root node is a MAX node; its successor
states have minimax values 3, 2, and 2; so it has a minimax value of 3. We can also identify
the minimax decision at the root: action a1 is the optimal choice for MAX because it leads to
the state with the highest minimax value.
This definition of optimal play for MAX assumes that MIN also plays optimally—it
maximizes the worst-case outcome for MAX. What if MIN does not play optimally? Then it is
easy to show (Exercise 5.7) that MAX will do even better. Other strategies against suboptimal
opponents may do better than the minimax strategy, but these strategies necessarily do worse
against optimal opponents.
5.2.1 The minimax algorithm
The minimax algorithm (Figure 5.3) computes the minimax decision from the current state.
It uses a simple recursive computation of the minimax values of each successor state, directly
implementing the defining equations. The recursion proceeds all the way down to the leaves
of the tree, and then the minimax values are backed up through the tree as the recursion
unwinds. For example, in Figure 5.2, the algorithm first recurses down to the three bottom-
left nodes and uses the UTILITY function on them to discover that their values are 3, 12, and
8, respectively. Then it takes the minimum of these values, 3, and returns it as the backed-
up value of node B. A similar process gives the backed-up values of 2 for C and 2 for D.
Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3 for the root node.
The minimax algorithm performs a complete depth-first exploration of the game tree.
If the maximum depth of the tree is m and there are b legal moves at each point, then the
time complexity of the minimax algorithm is O(b^m). The space complexity is O(bm) for an
algorithm that generates all actions at once, or O(m) for an algorithm that generates actions
one at a time (see page 87). For real games, of course, the time cost is totally impractical,
but this algorithm serves as the basis for the mathematical analysis of games and for more
practical algorithms.
5.2.2 Optimal decisions in multiplayer games
Many popular games allow more than two players. Let us examine how to extend the minimax
idea to multiplayer games. This is straightforward from the technical viewpoint, but raises
some interesting new conceptual issues.
First, we need to replace the single value for each node with a vector of values. For
example, in a three-player game with players A, B, and C, a vector ⟨vA, vB, vC⟩ is associated
with each node. For terminal states, this vector gives the utility of the state from each player’s
viewpoint. (In two-player, zero-sum games, the two-element vector can be reduced to a single
value because the values are always opposite.) The simplest way to implement this is to have
the UTILITY function return a vector of utilities.
Now we have to consider nonterminal states. Consider the node marked X in the game
tree shown in Figure 5.4. In that state, player C chooses what to do. The two choices lead
to terminal states with utility vectors ⟨vA = 1, vB = 2, vC = 6⟩ and ⟨vA = 4, vB = 2, vC = 3⟩.
Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached,
subsequent play will lead to a terminal state with utilities ⟨vA = 1, vB = 2, vC = 6⟩. Hence,
the backed-up value of X is this vector. The backed-up value of a node n is always the utility
function MINIMAX-DECISION(state) returns an action
  return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a)))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← ∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a)))
  return v
Figure 5.3 An algorithm for calculating minimax decisions. It returns the action corre-
sponding to the best possible move, that is, the move that leads to the outcome with the
best utility, under the assumption that the opponent plays to minimize utility. The functions
MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves,
to determine the backed-up value of a state. The notation argmax_{a ∈ S} f(a) computes the
element a of set S that has the maximum value of f(a).
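The same computation is easy to express in Python. The sketch below assumes a game object that bundles the elements listed above as methods: actions(s), result(s, a), terminal_test(s), utility(s, player), and to_move(s); that interface is an assumption of the sketch, not something defined by the text.

    # Minimax decision for a two-player, zero-sum game, using an assumed `game` interface.
    def minimax_decision(state, game):
        player = game.to_move(state)

        def max_value(s):
            if game.terminal_test(s):
                return game.utility(s, player)
            return max(min_value(game.result(s, a)) for a in game.actions(s))

        def min_value(s):
            if game.terminal_test(s):
                return game.utility(s, player)
            return min(max_value(game.result(s, a)) for a in game.actions(s))

        # Choose the action whose successor has the highest backed-up minimax value.
        return max(game.actions(state), key=lambda a: min_value(game.result(state, a)))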
Figure 5.4 The first three plies of a game tree with three players (A, B, C). Each node is
labeled with values from the viewpoint of each player. The best move is marked at the root.
vector of the successor state with the highest value for the player choosing at n. Anyone
who plays multiplayer games, such as Diplomacy, quickly becomes aware that much more
is going on than in two-player games. Multiplayer games usually involve alliances, whether
formal or informal, among the players. Alliances are made and broken as the game proceeds.
How are we to understand such behavior? Are alliances a natural consequence of optimal
strategies for each player in a multiplayer game? It turns out that they can be. For example,
suppose A and B are in weak positions and C is in a stronger position. Then it is often
optimal for both A and B to attack C rather than each other, lest C destroy each of them
individually. In this way, collaboration emerges from purely selfish behavior. Of course,
as soon as C weakens under the joint onslaught, the alliance loses its value, and either A
or B could violate the agreement. In some cases, explicit alliances merely make concrete
what would have happened anyway. In other cases, a social stigma attaches to breaking an
alliance, so players must balance the immediate advantage of breaking an alliance against the
long-term disadvantage of being perceived as untrustworthy. See Section 17.5 for more on
these complications.
If the game is not zero-sum, then collaboration can also occur with just two players.
Suppose, for example, that there is a terminal state with utilities ⟨vA = 1000, vB = 1000⟩ and
that 1000 is the highest possible utility for each player. Then the optimal strategy is for both
players to do everything possible to reach this state—that is, the players will automatically
cooperate to achieve a mutually desirable goal.
5.3 ALPHA–BETA PRUNING
The problem with minimax search is that the number of game states it has to examine is
exponential in the depth of the tree. Unfortunately, we can’t eliminate the exponent, but it
turns out we can effectively cut it in half. The trick is that it is possible to compute the correct
minimax decision without looking at every node in the game tree. That is, we can borrow the
idea of pruning from Chapter 3 to eliminate large parts of the tree from consideration. The
particular technique we examine is called alpha–beta pruning. When applied to a standard
minimax tree, it returns the same move as minimax would, but prunes away branches that
cannot possibly influence the final decision.
Consider again the two-ply game tree from Figure 5.2. Let’s go through the calculation
of the optimal decision once more, this time paying careful attention to what we know at
each point in the process. The steps are explained in Figure 5.5. The outcome is that we can
identify the minimax decision without ever evaluating two of the leaf nodes.
Another way to look at this is as a simplification of the formula for MINIMAX. Let the
two unevaluated successors of node C in Figure 5.5 have values x and y. Then the value of
the root node is given by
MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
= max(3, min(2, x, y), 2)
= max(3, z, 2) where z = min(2, x, y) ≤ 2
= 3.
In other words, the value of the root and hence the minimax decision are independent of the
values of the pruned leaves x and y.
Alpha–beta pruning can be applied to trees of any depth, and it is often possible to
prune entire subtrees rather than just leaves. The general principle is this: consider a node n
Figure 5.5 Stages in the calculation of the optimal decision for the game tree in Figure 5.2.
At each point, we show the range of possible values for each node. (a) The first leaf below B
has the value 3. Hence, B, which is a MIN node, has a value of at most 3. (b) The second leaf
below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3.
(c) The third leaf below B has a value of 8; we have seen all B’s successor states, so the
value of B is exactly 3. Now, we can infer that the value of the root is at least 3, because
MAX has a choice worth 3 at the root. (d) The first leaf below C has the value 2. Hence,
C, which is a MIN node, has a value of at most 2. But we know that B is worth 3, so MAX
would never choose C. Therefore, there is no point in looking at the other successor states
of C. This is an example of alpha–beta pruning. (e) The first leaf below D has the value 14,
so D is worth at most 14. This is still higher than MAX’s best alternative (i.e., 3), so we need
to keep exploring D’s successor states. Notice also that we now have bounds on all of the
successors of the root, so the root’s value is also at most 14. (f) The second successor of D
is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is
worth exactly 2. MAX’s decision at the root is to move to B, giving a value of 3.
somewhere in the tree (see Figure 5.6), such that Player has a choice of moving to that node.
If Player has a better choice m either at the parent node of n or at any choice point further up,
then n will never be reached in actual play. So once we have found out enough about n (by
examining some of its descendants) to reach this conclusion, we can prune it.
Remember that minimax search is depth-first, so at any one time we just have to con-
sider the nodes along a single path in the tree. Alpha–beta pruning gets its name from the
following two parameters that describe bounds on the backed-up values that appear anywhere
along the path:
Figure 5.6 The general case for alpha–beta pruning. If m is better than n for Player, we
will never get to n in play.
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point
along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point
along the path for MIN.
Alpha–beta search updates the values of α and β as it goes along and prunes the remaining
branches at a node (i.e., terminates the recursive call) as soon as the value of the current
node is known to be worse than the current α or β value for MAX or MIN, respectively. The
complete algorithm is given in Figure 5.7. We encourage you to trace its behavior when
applied to the tree in Figure 5.5.
5.3.1 Move ordering
The effectiveness of alpha–beta pruning is highly dependent on the order in which the states
are examined. For example, in Figure 5.5(e) and (f), we could not prune any successors of D
at all because the worst successors (from the point of view of MIN) were generated first. If
the third successor of D had been generated first, we would have been able to prune the other
two. This suggests that it might be worthwhile to try to examine first the successors that are
likely to be best.
If this can be done,2 then it turns out that alpha–beta needs to examine only O(b^(m/2))
nodes to pick the best move, instead of O(b^m) for minimax. This means that the effective
branching factor becomes √b instead of b—for chess, about 6 instead of 35. Put another
way, alpha–beta can solve a tree roughly twice as deep as minimax in the same amount of
time. If successors are examined in random order rather than best-first, the total number of
nodes examined will be roughly O(b^(3m/4)) for moderate b. For chess, a fairly simple ordering
function (such as trying captures first, then threats, then forward moves, and then backward
moves) gets you to within about a factor of 2 of the best-case O(b^(m/2)) result.
2 Obviously, it cannot be done perfectly; otherwise, the ordering function could be used to play a perfect game!
function ALPHA-BETA-SEARCH(state) returns an action
v ← MAX-VALUE(state,−∞,+∞)
return the action in ACTIONS(state) with value v
function MAX-VALUE(state,α,β) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← −∞
for each a in ACTIONS(state) do
v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
if v ≥ β then return v
α ← MAX(α, v)
return v
function MIN-VALUE(state,α,β) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← +∞
for each a in ACTIONS(state) do
v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
if v ≤ α then return v
β ← MIN(β, v)
return v
Figure 5.7 The alpha–beta search algorithm. Notice that these routines are the same as
the MINIMAX functions in Figure 5.3, except for the two lines in each of MIN-VALUE and
MAX-VALUE that maintain α and β (and the bookkeeping to pass these parameters along).
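For readers who prefer executable code, here is a minimal Python rendering of Figure 5.7 (again an illustration only; the game interface with actions, result, terminal_test, and utility is an assumption of the sketch):

import math

def alpha_beta_search(state, game):
    # Return the action whose backed-up value equals the alpha-beta value of state.
    best_action, best_value = None, -math.inf
    for a in game.actions(state):
        v = min_value(game.result(state, a), game, -math.inf, math.inf)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:
            return v          # prune: the MIN node above will never allow this branch
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:
            return v          # prune: the MAX node above already has something better
        beta = min(beta, v)
    return v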
Adding dynamic move-ordering schemes, such as trying first the moves that were found
to be best in the past, brings us quite close to the theoretical limit. The past could be the
previous move—often the same threats remain—or it could come from previous exploration
of the current move. One way to gain information from the current move is with iterative
deepening search. First, search 1 ply deep and record the best path of moves. Then search
1 ply deeper, but use the recorded path to inform move ordering. As we saw in Chapter 3,
iterative deepening on an exponential game tree adds only a constant fraction to the total
search time, which can be more than made up from better move ordering. The best moves are
often called killer moves, and trying them first is called the killer move heuristic.
In Chapter 3, we noted that repeated states in the search tree can cause an exponential
increase in search cost. In many games, repeated states occur frequently because of transpo-
sitions—different permutations of the move sequence that end up in the same position. For
example, if White has one move, a1, that can be answered by Black with b1 and an unre-
lated move a2 on the other side of the board that can be answered by b2, then the sequences
[a1, b1, a2, b2] and [a2, b2, a1, b1] both end up in the same position. It is worthwhile to store
the evaluation of the resulting position in a hash table the first time it is encountered so that
we don’t have to recompute it on subsequent occurrences. The hash table of previously seen
positions is traditionally called a transposition table; it is essentially identical to the explored
list in GRAPH-SEARCH (Section 3.3). Using a transposition table can have a dramatic effect,
sometimes as much as doubling the reachable search depth in chess. On the other hand, if we
are evaluating a million nodes per second, at some point it is not practical to keep all of them
in the transposition table. Various strategies have been used to choose which nodes to keep
and which to discard.
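The idea of caching positions can be sketched in a few lines of Python. The sketch below memoizes plain minimax values; it assumes states are hashable (real programs use Zobrist hashing) and that the game interface of the earlier sketches is available. Combining a table with alpha–beta is more delicate, because a stored value may be only a bound rather than an exact value; that refinement is omitted here.

transposition_table = {}          # position -> backed-up minimax value

def cached_max_value(state, game):
    if state in transposition_table:           # already seen via some other move order
        return transposition_table[state]
    if game.terminal_test(state):
        v = game.utility(state)
    else:
        v = max(cached_min_value(game.result(state, a), game)
                for a in game.actions(state))
    transposition_table[state] = v
    return v

def cached_min_value(state, game):
    if state in transposition_table:
        return transposition_table[state]
    if game.terminal_test(state):
        v = game.utility(state)
    else:
        v = min(cached_max_value(game.result(state, a), game)
                for a in game.actions(state))
    transposition_table[state] = v
    return v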
5.4 IMPERFECT REAL-TIME DECISIONS
The minimax algorithm generates the entire game search space, whereas the alpha–beta algo-
rithm allows us to prune large parts of it. However, alpha–beta still has to search all the way
to terminal states for at least a portion of the search space. This depth is usually not practical,
because moves must be made in a reasonable amount of time—typically a few minutes at
most. Claude Shannon’s paper Programming a Computer for Playing Chess (1950) proposed
instead that programs should cut off the search earlier and apply a heuristic evaluation func-
tion to states in the search, effectively turning nonterminal nodes into terminal leaves. In
other words, the suggestion is to alter minimax or alpha–beta in two ways: replace the utility
function by a heuristic evaluation function EVAL, which estimates the position’s utility, and
replace the terminal test by a cutoff test that decides when to apply EVAL. That gives us the
following for heuristic minimax for state s and maximum depth d:
H-MINIMAX(s, d) =
    EVAL(s)                                                  if CUTOFF-TEST(s, d)
    max_{a ∈ Actions(s)} H-MINIMAX(RESULT(s, a), d + 1)      if PLAYER(s) = MAX
    min_{a ∈ Actions(s)} H-MINIMAX(RESULT(s, a), d + 1)      if PLAYER(s) = MIN.
5.4.1 Evaluation functions
An evaluation function returns an estimate of the expected utility of the game from a given
position, just as the heuristic functions of Chapter 3 return an estimate of the distance to
the goal. The idea of an estimator was not new when Shannon proposed it. For centuries,
chess players (and aficionados of other games) have developed ways of judging the value of
a position because humans are even more limited in the amount of search they can do than
are computer programs. It should be clear that the performance of a game-playing program
depends strongly on the quality of its evaluation function. An inaccurate evaluation function
will guide an agent toward positions that turn out to be lost. How exactly do we design good
evaluation functions?
First, the evaluation function should order the terminal states in the same way as the
true utility function: states that are wins must evaluate better than draws, which in turn must
be better than losses. Otherwise, an agent using the evaluation function might err even if it
can see ahead all the way to the end of the game. Second, the computation must not take
too long! (The whole point is to search faster.) Third, for nonterminal states, the evaluation
function should be strongly correlated with the actual chances of winning.
One might well wonder about the phrase “chances of winning.” After all, chess is not a
game of chance: we know the current state with certainty, and no dice are involved. But if the
search must be cut off at nonterminal states, then the algorithm will necessarily be uncertain
about the final outcomes of those states. This type of uncertainty is induced by computational,
rather than informational, limitations. Given the limited amount of computation that the
evaluation function is allowed to do for a given state, the best it can do is make a guess about
the final outcome.
Let us make this idea more concrete. Most evaluation functions work by calculating
various features of the state—for example, in chess, we would have features for the number
of white pawns, black pawns, white queens, black queens, and so on. The features, taken
together, define various categories or equivalence classes of states: the states in each category
have the same values for all the features. For example, one category contains all two-pawn
vs. one-pawn endgames. Any given category, generally speaking, will contain some states
that lead to wins, some that lead to draws, and some that lead to losses. The evaluation
function cannot know which states are which, but it can return a single value that reflects the
proportion of states with each outcome. For example, suppose our experience suggests that
72% of the states encountered in the two-pawns vs. one-pawn category lead to a win (utility
+1); 20% to a loss (0), and 8% to a draw (1/2). Then a reasonable evaluation for states in
the category is the expected value: (0.72 × +1) + (0.20 × 0) + (0.08 × 1/2) = 0.76. In
principle, the expected value can be determined for each category, resulting in an evaluation
function that works for any state. As with terminal states, the evaluation function need not
return actual expected values as long as the ordering of the states is the same.
In practice, this kind of analysis requires too many categories and hence too much
experience to estimate all the probabilities of winning. Instead, most evaluation functions
compute separate numerical contributions from each feature and then combine them to find
the total value. For example, introductory chess books give an approximate material value
for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9.
Other features such as “good pawn structure” and “king safety” might be worth half a pawn,
say. These feature values are then simply added up to obtain the evaluation of the position.
A secure advantage equivalent to a pawn gives a substantial likelihood of winning, and
a secure advantage equivalent to three pawns should give almost certain victory, as illustrated
in Figure 5.8(a). Mathematically, this kind of evaluation function is called a weighted linear
function because it can be expressed as
EVAL(s) = w1 f1(s) + w2 f2(s) + · · · + wn fn(s) = Σ_{i=1}^{n} wi fi(s) ,
where each wi is a weight and each fi is a feature of the position. For chess, the fi could be
the numbers of each kind of piece on the board, and the wi could be the values of the pieces
(1 for pawn, 3 for bishop, etc.).
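To make the weighted linear form concrete, here is a small illustrative sketch in Python. The piece weights follow the traditional material values quoted above; the board representation (an iterable of piece symbols, uppercase for White, lowercase for Black) is an assumption made only for the example:

# Traditional material values; White pieces count positively, Black pieces negatively.
PIECE_WEIGHTS = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9,
                 'p': -1, 'n': -3, 'b': -3, 'r': -5, 'q': -9}

def material_eval(board):
    # Weighted linear evaluation: EVAL(s) = sum over i of w_i * f_i(s),
    # where each feature f_i is the count of one piece type on the board.
    return sum(PIECE_WEIGHTS.get(piece, 0) for piece in board)

# Example: White is up one pawn (kings contribute weight 0).
print(material_eval(['P', 'P', 'p', 'K', 'k']))   # -> 1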
Adding up the values of features seems like a reasonable thing to do, but in fact it
involves a strong assumption: that the contribution of each feature is independent of the
values of the other features. For example, assigning the value 3 to a bishop ignores the fact
that bishops are more powerful in the endgame, when they have a lot of space to maneuver.
Figure 5.8 Two chess positions that differ only in the position of the rook at lower right.
In (a), Black has an advantage of a knight and two pawns, which should be enough to win
the game. In (b), White will capture the queen, giving it an advantage that should be strong
enough to win.
For this reason, current programs for chess and other games also use nonlinear combinations
of features. For example, a pair of bishops might be worth slightly more than twice the value
of a single bishop, and a bishop is worth more in the endgame (that is, when the move number
feature is high or the number of remaining pieces feature is low).
The astute reader will have noticed that the features and weights are not part of the rules
of chess! They come from centuries of human chess-playing experience. In games where this
kind of experience is not available, the weights of the evaluation function can be estimated
by the machine learning techniques of Chapter 18. Reassuringly, applying these techniques
to chess has confirmed that a bishop is indeed worth about three pawns.
5.4.2 Cutting off search
The next step is to modify ALPHA-BETA-SEARCH so that it will call the heuristic EVAL
function when it is appropriate to cut off the search. We replace the two lines in Figure 5.7
that mention TERMINAL-TEST with the following line:
if CUTOFF-TEST(state, depth) then return EVAL(state)
We also must arrange for some bookkeeping so that the current depth is incremented on each
recursive call. The most straightforward approach to controlling the amount of search is to set
a fixed depth limit so that CUTOFF-TEST(state, depth) returns true for all depth greater than
some fixed depth d. (It must also return true for all terminal states, just as TERMINAL-TEST
did.) The depth d is chosen so that a move is selected within the allocated time. A more
robust approach is to apply iterative deepening. (See Chapter 3.) When time runs out, the
program returns the move selected by the deepest completed search. As a bonus, iterative
deepening also helps with move ordering.
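A depth-limited version of the alpha–beta routine follows the same pattern; this sketch (with the same assumed game interface as before, plus an eval_fn argument and a hypothetical fixed depth limit) shows where the cutoff test replaces the terminal test:

import math

def cutoff_test(state, depth, game, limit=4):
    # Fixed-depth cutoff; it must also fire at genuine terminal states.
    return depth >= limit or game.terminal_test(state)

def h_max_value(state, game, alpha, beta, depth, eval_fn):
    if cutoff_test(state, depth, game):
        return eval_fn(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, h_min_value(game.result(state, a), game, alpha, beta, depth + 1, eval_fn))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v

def h_min_value(state, game, alpha, beta, depth, eval_fn):
    if cutoff_test(state, depth, game):
        return eval_fn(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, h_max_value(game.result(state, a), game, alpha, beta, depth + 1, eval_fn))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v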
These simple approaches can lead to errors due to the approximate nature of the eval-
uation function. Consider again the simple evaluation function for chess based on material
advantage. Suppose the program searches to the depth limit, reaching the position in Fig-
ure 5.8(b), where Black is ahead by a knight and two pawns. It would report this as the
heuristic value of the state, thereby declaring that the state is a probable win by Black. But
White’s next move captures Black’s queen with no compensation. Hence, the position is
really won for White, but this can be seen only by looking ahead one more ply.
Obviously, a more sophisticated cutoff test is needed. The evaluation function should be
applied only to positions that are quiescent—that is, unlikely to exhibit wild swings in value
in the near future. In chess, for example, positions in which favorable captures can be made
are not quiescent for an evaluation function that just counts material. Nonquiescent positions
can be expanded further until quiescent positions are reached. This extra search is called a
quiescence search; sometimes it is restricted to consider only certain types of moves, such
as capture moves, that will quickly resolve the uncertainties in the position.
The horizon effect is more difficult to eliminate. It arises when the program is facing
an opponent’s move that causes serious damage and is ultimately unavoidable, but can be
temporarily avoided by delaying tactics. Consider the chess game in Figure 5.9. It is clear
that there is no way for the black bishop to escape. For example, the white rook can capture
it by moving to h1, then a1, then a2; a capture at depth 6 ply. But Black does have a sequence
of moves that pushes the capture of the bishop “over the horizon.” Suppose Black searches
to depth 8 ply. Most moves by Black will lead to the eventual capture of the bishop, and thus
will be marked as “bad” moves. But Black will consider checking the white king with the
pawn at e4. This will lead to the king capturing the pawn. Now Black will consider checking
again, with the pawn at f5, leading to another pawn capture. That takes up 4 ply, and from
there the remaining 4 ply is not enough to capture the bishop. Black thinks that the line of
play has saved the bishop at the price of two pawns, when actually all it has done is push the
inevitable capture of the bishop beyond the horizon that Black can see.
One strategy to mitigate the horizon effect is the singular extension, a move that is
“clearly better” than all other moves in a given position. Once discovered anywhere in the
tree in the course of a search, this singular move is remembered. When the search reaches the
normal depth limit, the algorithm checks to see if the singular extension is a legal move; if it
is, the algorithm allows the move to be considered. This makes the tree deeper, but because
there will be few singular extensions, it does not add many total nodes to the tree.
5.4.3 Forward pruning
So far, we have talked about cutting off search at a certain level and about doing alpha–
beta pruning that provably has no effect on the result (at least with respect to the heuristic
evaluation values). It is also possible to do forward pruning, meaning that some moves at
a given node are pruned immediately without further consideration. Clearly, most humans
playing chess consider only a few moves from each position (at least consciously). One
approach to forward pruning is beam search: on each ply, consider only a “beam” of the n
best moves (according to the evaluation function) rather than considering all possible moves.
Figure 5.9 The horizon effect. With Black to move, the black bishop is surely doomed.
But Black can forestall that event by checking the white king with its pawns, forcing the king
to capture the pawns. This pushes the inevitable loss of the bishop over the horizon, and thus
the pawn sacrifices are seen by the search algorithm as good moves rather than bad ones.
Unfortunately, this approach is rather dangerous because there is no guarantee that the best
move will not be pruned away.
The PROBCUT, or probabilistic cut, algorithm (Buro, 1995) is a forward-pruning ver-
sion of alpha–beta search that uses statistics gained from prior experience to lessen the chance
that the best move will be pruned. Alpha–beta search prunes any node that is provably out-
side the current (α, β) window. PROBCUT also prunes nodes that are probably outside the
window. It computes this probability by doing a shallow search to compute the backed-up
value v of a node and then using past experience to estimate how likely it is that a score of v
at depth d in the tree would be outside (α, β). Buro applied this technique to his Othello pro-
gram, LOGISTELLO, and found that a version of his program with PROBCUT beat the regular
version 64% of the time, even when the regular version was given twice as much time.
Combining all the techniques described here results in a program that can play cred-
itable chess (or other games). Let us assume we have implemented an evaluation function for
chess, a reasonable cutoff test with a quiescence search, and a large transposition table. Let
us also assume that, after months of tedious bit-bashing, we can generate and evaluate around
a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes
per move under standard time controls (three minutes per move). The branching factor for
chess is about 35, on average, and 35^5 is about 50 million, so if we used minimax search,
we could look ahead only about five plies. Though not incompetent, such a program can be
fooled easily by an average human chess player, who can occasionally plan six or eight plies
ahead. With alpha–beta search we get to about 10 plies, which results in an expert level of
play. Section 5.8 describes additional pruning techniques that can extend the effective search
depth to roughly 14 plies. To reach grandmaster status we would need an extensively tuned
evaluation function and a large database of optimal opening and endgame moves.
5.4.4 Search versus lookup
Somehow it seems like overkill for a chess program to start a game by considering a tree of a
billion game states, only to conclude that it will move its pawn to e4. Books describing good
play in the opening and endgame in chess have been available for about a century (Tattersall,
1911). It is not surprising, therefore, that many game-playing programs use table lookup
rather than search for the opening and ending of games.
For the openings, the computer is mostly relying on the expertise of humans. The best
advice of human experts on how to play each opening is copied from books and entered into
tables for the computer’s use. However, computers can also gather statistics from a database
of previously played games to see which opening sequences most often lead to a win. In
the early moves there are few choices, and thus much expert commentary and past games on
which to draw. Usually after ten moves we end up in a rarely seen position, and the program
must switch from table lookup to search.
Near the end of the game there are again fewer possible positions, and thus more chance
to do lookup. But here it is the computer that has the expertise: computer analysis of
endgames goes far beyond anything achieved by humans. A human can tell you the gen-
eral strategy for playing a king-and-rook-versus-king (KRK) endgame: reduce the opposing
king’s mobility by squeezing it toward one edge of the board, using your king to prevent the
opponent from escaping the squeeze. Other endings, such as king, bishop, and knight versus
king (KBNK), are difficult to master and have no succinct strategy description. A computer,
on the other hand, can completely solve the endgame by producing a policy, which is a map-
POLICY
ping from every possible state to the best move in that state. Then we can just look up the best
move rather than recompute it anew. How big will the KBNK lookup table be? It turns out
there are 462 ways that two kings can be placed on the board without being adjacent. After
the kings are placed, there are 62 empty squares for the bishop, 61 for the knight, and two
possible players to move next, so there are just 462 × 62 × 61 × 2 = 3,494,568 possible
positions. Some of these are checkmates; mark them as such in a table. Then do a retrograde
minimax search: reverse the rules of chess to do unmoves rather than moves. Any move by
White that, no matter what move Black responds with, ends up in a position marked as a win,
must also be a win. Continue this search until all 3,494,568 positions are resolved as win,
loss, or draw, and you have an infallible lookup table for all KBNK endgames.
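The position count quoted above is easy to check, and the retrograde bookkeeping itself fits in a short loop. The sketch below is only a schematic under stated assumptions: white_to_move and black_to_move partition the positions by side to move, moves(p) and the initial checkmates set are hypothetical helpers, and no actual chess rules are implemented.

# Size of the KBNK lookup table quoted above:
print(462 * 62 * 61 * 2)          # -> 3494568

def retrograde_solve(white_to_move, black_to_move, moves, checkmates):
    # Iteratively label positions as won for White, starting from the checkmates.
    won = set(checkmates)
    changed = True
    while changed:
        changed = False
        # A White-to-move position is won if SOME move reaches a won position.
        for p in white_to_move:
            if p not in won and any(q in won for q in moves(p)):
                won.add(p)
                changed = True
        # A Black-to-move position is lost if it has moves and EVERY move reaches a won position
        # (positions with no moves are stalemates, i.e., draws, and are left unmarked).
        for p in black_to_move:
            if p not in won and moves(p) and all(q in won for q in moves(p)):
                won.add(p)
                changed = True
    return won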
Using this technique and a tour de force of optimization tricks, Ken Thompson (1986,
1996) and Lewis Stiller (1992, 1996) solved all chess endgames with up to five pieces and
some with six pieces, making them available on the Internet. Stiller discovered one case
where a forced mate existed but required 262 moves; this caused some consternation because
the rules of chess require a capture or pawn move to occur within 50 moves. Later work by
Marc Bourzutschky and Yakov Konoval (Bourzutschky, 2006) solved all pawnless six-piece
and some seven-piece endgames; there is a KQNKRBN endgame that with best play requires
517 moves until a capture, which then leads to a mate.
If we could extend the chess endgame tables from 6 pieces to 32, then White would
know on the opening move whether it would be a win, loss, or draw. This has not happened
so far for chess, but it has happened for checkers, as explained in the historical notes section.
5.5 STOCHASTIC GAMES
In real life, many unpredictable external events can put us into unforeseen situations. Many
games mirror this unpredictability by including a random element, such as the throwing of
dice. We call these stochastic games. Backgammon is a typical game that combines luck
and skill. Dice are rolled at the beginning of a player’s turn to determine the legal moves. In
the backgammon position of Figure 5.10, for example, White has rolled a 6–5 and has four
possible moves.
Figure 5.10 A typical backgammon position. The goal of the game is to move all one’s
pieces off the board. White moves clockwise toward 25, and Black moves counterclockwise
toward 0. A piece can move to any position unless multiple opponent pieces are there; if there
is one opponent, it is captured and must start over. In the position shown, White has rolled
6–5 and must choose among four legal moves: (5–10,5–11), (5–11,19–24), (5–10,10–16),
and (5–11,11–16), where the notation (5–11,11–16) means move one piece from position 5
to 11, and then move a piece from 11 to 16.
Although White knows what his or her own legal moves are, White does not know what
Black is going to roll and thus does not know what Black’s legal moves will be. That means
White cannot construct a standard game tree of the sort we saw in chess and tic-tac-toe. A
game tree in backgammon must include chance nodes in addition to MAX and MIN nodes.
Chance nodes are shown as circles in Figure 5.11. The branches leading from each chance
node denote the possible dice rolls; each branch is labeled with the roll and its probability.
There are 36 ways to roll two dice, each equally likely; but because a 6–5 is the same as a 5–6,
there are only 21 distinct rolls. The six doubles (1–1 through 6–6) each have a probability of
1/36, so we say P(1–1) = 1/36. The other 15 distinct rolls each have a 1/18 probability.
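The counting argument can be verified directly; this small snippet (illustrative only) enumerates the distinct rolls and their probabilities:

from fractions import Fraction

rolls = {}
for d1 in range(1, 7):
    for d2 in range(1, 7):
        key = tuple(sorted((d1, d2)))        # 6-5 and 5-6 count as the same roll
        rolls[key] = rolls.get(key, Fraction(0)) + Fraction(1, 36)

print(len(rolls))                            # -> 21 distinct rolls
print(rolls[(1, 1)], rolls[(5, 6)])          # -> 1/36 and 1/18
print(sum(rolls.values()))                   # -> 1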
Figure 5.11 Schematic game tree for a backgammon position.
The next step is to understand how to make correct decisions. Obviously, we still want
to pick the move that leads to the best position. However, positions do not have definite
minimax values. Instead, we can only calculate the expected value of a position: the average
over all possible outcomes of the chance nodes.
This leads us to generalize the minimax value for deterministic games to an expecti-
minimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for
which the dice roll is known) work exactly the same way as before. For chance nodes we
compute the expected value, which is the sum of the value over all outcomes, weighted by
the probability of each chance action:
EXPECTIMINIMAX(s) =
    UTILITY(s)                                           if TERMINAL-TEST(s)
    max_a EXPECTIMINIMAX(RESULT(s, a))                   if PLAYER(s) = MAX
    min_a EXPECTIMINIMAX(RESULT(s, a))                   if PLAYER(s) = MIN
    Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))                if PLAYER(s) = CHANCE
where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same
state as s, with the additional fact that the result of the dice roll is r.
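The recursion above transcribes directly into Python. The game interface here (to_move, chance_outcomes, probability, result, terminal_test, utility) is assumed for the sake of the example and is not defined in the text:

def expectiminimax(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    player = game.to_move(state)
    if player == 'MAX':
        return max(expectiminimax(game.result(state, a), game)
                   for a in game.actions(state))
    if player == 'MIN':
        return min(expectiminimax(game.result(state, a), game)
                   for a in game.actions(state))
    # Chance node: probability-weighted average over the possible dice rolls.
    return sum(game.probability(r) * expectiminimax(game.result(state, r), game)
               for r in game.chance_outcomes(state))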
5.5.1 Evaluation functions for games of chance
As with minimax, the obvious approximation to make with expectiminimax is to cut the
search off at some point and apply an evaluation function to each leaf. One might think that
evaluation functions for games such as backgammon should be just like evaluation functions
for chess—they just need to give higher scores to better positions. But in fact, the presence of
chance nodes means that one has to be more careful about what the evaluation values mean.
Figure 5.12 shows what happens: with an evaluation function that assigns the values [1, 2,
3, 4] to the leaves, move a1 is best; with values [1, 20, 30, 400], move a2 is best. Hence,
the program behaves totally differently if we make a change in the scale of some evaluation
values! It turns out that to avoid this sensitivity, the evaluation function must be a positive
linear transformation of the probability of winning from a position (or, more generally, of the
expected utility of the position). This is an important and general property of situations in
which uncertainty is involved, and we discuss it further in Chapter 16.
Figure 5.12 An order-preserving transformation on leaf values changes the best move.
If the program knew in advance all the dice rolls that would occur for the rest of the
game, solving a game with dice would be just like solving a game without dice, which minimax
does in O(b^m) time, where b is the branching factor and m is the maximum depth of the
game tree. Because expectiminimax is also considering all the possible dice-roll sequences,
it will take O(b^m n^m), where n is the number of distinct rolls.
Even if the search depth is limited to some small depth d, the extra cost compared with
that of minimax makes it unrealistic to consider looking ahead very far in most games of
chance. In backgammon n is 21 and b is usually around 20, but in some situations can be as
high as 4000 for dice rolls that are doubles. Three plies is probably all we could manage.
Another way to think about the problem is this: the advantage of alpha–beta is that
it ignores future developments that just are not going to happen, given best play. Thus, it
concentrates on likely occurrences. In games with dice, there are no likely sequences of
moves, because for those moves to take place, the dice would first have to come out the right
way to make them legal. This is a general problem whenever uncertainty enters the picture:
the possibilities are multiplied enormously, and forming detailed plans of action becomes
pointless because the world probably will not play along.
It may have occurred to you that something like alpha–beta pruning could be applied
to game trees with chance nodes. It turns out that it can. The analysis for MIN and MAX
nodes is unchanged, but we can also prune chance nodes, using a bit of ingenuity. Consider
the chance node C in Figure 5.11 and what happens to its value as we examine and evaluate
its children. Is it possible to find an upper bound on the value of C before we have looked
at all its children? (Recall that this is what alpha–beta needs in order to prune a node and its
subtree.) At first sight, it might seem impossible because the value of C is the average of its
children’s values, and in order to compute the average of a set of numbers, we must look at
all the numbers. But if we put bounds on the possible values of the utility function, then we
can arrive at bounds for the average without looking at every number. For example, say that
all utility values are between −2 and +2; then the value of leaf nodes is bounded, and in turn
we can place an upper bound on the value of a chance node without looking at all its children.
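The bound on a partially evaluated chance node can be written down directly: if utilities lie in [lo, hi] and some children have been examined, the remaining probability mass can contribute at most hi and at least lo. The helper below is an illustration of this arithmetic, not part of the original text:

def chance_node_bounds(seen, lo=-2.0, hi=+2.0):
    # seen: list of (probability, value) pairs for the children examined so far.
    # Returns (lower, upper) bounds on the chance node's expected value, assuming
    # every unexamined child's value lies in [lo, hi].
    seen_mass = sum(p for p, _ in seen)
    partial = sum(p * v for p, v in seen)
    remaining = 1.0 - seen_mass
    return partial + remaining * lo, partial + remaining * hi

# Example: after seeing one child with probability 1/2 and value 2,
# with all utilities assumed to lie in [-2, +2]:
print(chance_node_bounds([(0.5, 2.0)]))      # -> (0.0, 2.0)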
An alternative is to do Monte Carlo simulation to evaluate a position. Start with
an alpha–beta (or other) search algorithm. From a start position, have the algorithm play
thousands of games against itself, using random dice rolls. In the case of backgammon, the
resulting win percentage has been shown to be a good approximation of the value of the
position, even if the algorithm has an imperfect heuristic and is searching only a few plies
(Tesauro, 1995). For games with dice, this type of simulation is called a rollout.
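A rollout evaluation can be sketched in a few lines. In this illustrative version the policy argument (anything from a shallow alpha–beta player to a uniformly random player) and the game interface are assumptions of the example, and chance events such as dice rolls are assumed to be sampled inside game.result:

import random

def random_policy(game):
    # A trivial rollout policy: choose a legal move uniformly at random.
    return lambda s: random.choice(game.actions(s))

def rollout_value(state, game, policy, n_games=1000):
    # Estimate the value of state by self-play: play n_games games to the end
    # with the given policy on both sides and average the final utilities.
    total = 0.0
    for _ in range(n_games):
        s = state
        while not game.terminal_test(s):
            s = game.result(s, policy(s))
        total += game.utility(s)
    return total / n_games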
5.6 PARTIALLY OBSERVABLE GAMES
Chess has often been described as war in miniature, but it lacks at least one major charac-
teristic of real wars, namely, partial observability. In the “fog of war,” the existence and
disposition of enemy units is often unknown until revealed by direct contact. As a result,
warfare includes the use of scouts and spies to gather information and the use of concealment
and bluff to confuse the enemy. Partially observable games share these characteristics and
are thus qualitatively different from the games described in the preceding sections.
5.6.1 Kriegspiel: Partially observable chess
In deterministic partially observable games, uncertainty about the state of the board arises en-
tirely from lack of access to the choices made by the opponent. This class includes children’s
games such as Battleships (where each player’s ships are placed in locations hidden from the
opponent but do not move) and Stratego (where piece locations are known but piece types are
hidden). We will examine the game of Kriegspiel, a partially observable variant of chess in
which pieces can move but are completely invisible to the opponent.
The rules of Kriegspiel are as follows: White and Black each see a board containing
only their own pieces. A referee, who can see all the pieces, adjudicates the game and period-
ically makes announcements that are heard by both players. On his turn, White proposes to
the referee any move that would be legal if there were no black pieces. If the move is in fact
not legal (because of the black pieces), the referee announces “illegal.” In this case, White
may keep proposing moves until a legal one is found—and learns more about the location of
Black’s pieces in the process. Once a legal move is proposed, the referee announces one or
more of the following: “Capture on square X” if there is a capture, and “Check by D” if the
black king is in check, where D is the direction of the check, and can be one of “Knight,”
“Rank,” “File,” “Long diagonal,” or “Short diagonal.” (In case of discovered check, the ref-
eree may make two “Check” announcements.) If Black is checkmated or stalemated, the
referee says so; otherwise, it is Black’s turn to move.
Kriegspiel may seem terrifyingly impossible, but humans manage it quite well and com-
puter programs are beginning to catch up. It helps to recall the notion of a belief state as
defined in Section 4.4 and illustrated in Figure 4.14—the set of all logically possible board
states given the complete history of percepts to date. Initially, White’s belief state is a sin-
gleton because Black’s pieces haven’t moved yet. After White makes a move and Black re-
sponds, White’s belief state contains 20 positions because Black has 20 replies to any White
move. Keeping track of the belief state as the game progresses is exactly the problem of state
estimation, for which the update step is given in Equation (4.6). We can map Kriegspiel
state estimation directly onto the partially observable, nondeterministic framework of Sec-
tion 4.4 if we consider the opponent as the source of nondeterminism; that is, the RESULTS
of White’s move are composed from the (predictable) outcome of White’s own move and the
unpredictable outcome given by Black’s reply.3
Given a current belief state, White may ask, “Can I win the game?” For a partially
observable game, the notion of a strategy is altered; instead of specifying a move to make
for each possible move the opponent might make, we need a move for every possible percept
sequence that might be received. For Kriegspiel, a winning strategy, or guaranteed check-
mate, is one that, for each possible percept sequence, leads to an actual checkmate for every
possible board state in the current belief state, regardless of how the opponent moves. With
this definition, the opponent’s belief state is irrelevant—the strategy has to work even if the
opponent can see all the pieces. This greatly simplifies the computation. Figure 5.13 shows
part of a guaranteed checkmate for the KRK (king and rook against king) endgame. In this
case, Black has just one piece (the king), so a belief state for White can be shown in a single
board by marking each possible position of the Black king.
The general AND-OR search algorithm can be applied to the belief-state space to find
guaranteed checkmates, just as in Section 4.4. The incremental belief-state algorithm men-
tioned in that section often finds midgame checkmates up to depth 9—probably well beyond
the abilities of human players.
In addition to guaranteed checkmates, Kriegspiel admits an entirely new concept that
makes no sense in fully observable games: probabilistic checkmate. Such checkmates are
still required to work in every board state in the belief state; they are probabilistic with respect
to randomization of the winning player’s moves. To get the basic idea, consider the problem
of finding a lone black king using just the white king. Simply by moving randomly, the
white king will eventually bump into the black king even if the latter tries to avoid this fate,
since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of
probability theory, detection occurs with probability 1. The KBNK endgame—king, bishop
3 Sometimes, the belief state will become too large to represent just as a list of board states, but we will ignore
this issue for now; Chapters 7 and 8 suggest methods for compactly representing very large belief states.
Figure 5.13 Part of a guaranteed checkmate in the KRK endgame, shown on a reduced
board. In the initial belief state, Black’s king is in one of three possible locations. By a
combination of probing moves, the strategy narrows this down to one. Completion of the
checkmate is left as an exercise.
and knight against king—is won in this sense; White presents Black with an infinite random
sequence of choices, for one of which Black will guess incorrectly and reveal his position,
leading to checkmate. The KBBK endgame, on the other hand, is won with probability 1− ǫ.
White can force a win only by leaving one of his bishops unprotected for one move. If
Black happens to be in the right place and captures the bishop (a move that would lose if the
bishops are protected), the game is drawn. White can choose to make the risky move at some
randomly chosen point in the middle of a very long sequence, thus reducing ε to an arbitrarily
small constant, but cannot reduce ε to zero.
It is quite rare that a guaranteed or probabilistic checkmate can be found within any
reasonable depth, except in the endgame. Sometimes a checkmate strategy works for some of
the board states in the current belief state but not others. Trying such a strategy may succeed,
leading to an accidental checkmate—accidental in the sense that White could not know that
it would be checkmate—if Black’s pieces happen to be in the right places. (Most checkmates
in games between humans are of this accidental nature.) This idea leads naturally to the
question of how likely it is that a given strategy will win, which leads in turn to the question
of how likely it is that each board state in the current belief state is the true board state.
One’s first inclination might be to propose that all board states in the current belief state
are equally likely—but this can’t be right. Consider, for example, White’s belief state after
Black’s first move of the game. By definition (assuming that Black plays optimally), Black
must have played an optimal move, so all board states resulting from suboptimal moves ought
to be assigned zero probability. This argument is not quite right either, because each player’s
goal is not just to move pieces to the right squares but also to minimize the information that
the opponent has about their location. Playing any predictable “optimal” strategy provides
the opponent with information. Hence, optimal play in partially observable games requires
a willingness to play somewhat randomly. (This is why restaurant hygiene inspectors do
random inspection visits.) This means occasionally selecting moves that may seem “intrinsi-
cally” weak—but they gain strength from their very unpredictability, because the opponent is
unlikely to have prepared any defense against them.
From these considerations, it seems that the probabilities associated with the board
states in the current belief state can only be calculated given an optimal randomized strat-
egy; in turn, computing that strategy seems to require knowing the probabilities of the var-
ious states the board might be in. This conundrum can be resolved by adopting the game-
theoretic notion of an equilibrium solution, which we pursue further in Chapter 17. An
equilibrium specifies an optimal randomized strategy for each player. Computing equilib-
ria is prohibitively expensive, however, even for small games, and is out of the question for
Kriegspiel. At present, the design of effective algorithms for general Kriegspiel play is an
open research topic. Most systems perform bounded-depth lookahead in their own belief-
state space, ignoring the opponent’s belief state. Evaluation functions resemble those for the
observable game but include a component for the size of the belief state—smaller is better!
5.6.2 Card games
Card games provide many examples of stochastic partial observability, where the missing
information is generated randomly. For example, in many games, cards are dealt randomly at
the beginning of the game, with each player receiving a hand that is not visible to the other
players. Such games include bridge, whist, hearts, and some forms of poker.
At first sight, it might seem that these card games are just like dice games: the cards are
dealt randomly and determine the moves available to each player, but all the “dice” are rolled
at the beginning! Even though this analogy turns out to be incorrect, it suggests an effective
algorithm: consider all possible deals of the invisible cards; solve each one as if it were a
fully observable game; and then choose the move that has the best outcome averaged over all
the deals. Suppose that each deal s occurs with probability P(s); then the move we want is
argmax_a Σ_s P(s) MINIMAX(RESULT(s, a)) .    (5.1)
Here, we run exact MINIMAX if computationally feasible; otherwise, we run H-MINIMAX.
Now, in most card games, the number of possible deals is rather large. For example,
in bridge play, each player sees just two of the four hands; there are two unseen hands of 13
cards each, so the number of deals is C(26, 13) = 10,400,600. Solving even one deal is quite
difficult, so solving ten million is out of the question. Instead, we resort to a Monte Carlo
approximation: instead of adding up all the deals, we take a random sample of N deals,
where the probability of deal s appearing in the sample is proportional to P(s):
argmax_a (1/N) Σ_{i=1}^{N} MINIMAX(RESULT(s_i, a)) .    (5.2)
(Notice that P(s) does not appear explicitly in the summation, because the samples are al-
ready drawn according to P(s).) As N grows large, the sum over the random sample tends
to the exact value, but even for fairly small N—say, 100 to 1,000—the method gives a good
approximation. It can also be applied to deterministic games such as Kriegspiel, given some
reasonable estimate of P(s).
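Equation (5.2) translates almost literally into code. In the sketch below, sample_deal (which draws a deal according to P(s)) and minimax_value (which solves one fully observable deal for a given first action) are assumed helpers, named only for illustration:

def monte_carlo_move(actions, sample_deal, minimax_value, n_samples=100):
    # Choose the action with the best value averaged over sampled deals, as in Equation (5.2).
    deals = [sample_deal() for _ in range(n_samples)]
    def average_value(a):
        return sum(minimax_value(deal, a) for deal in deals) / len(deals)
    return max(actions, key=average_value)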
For games like whist and hearts, where there is no bidding or betting phase before play
commences, each deal will be equally likely and so the values of P(s) are all equal. For
bridge, play is preceded by a bidding phase in which each team indicates how many tricks it
expects to win. Since players bid based on the cards they hold, the other players learn more
about the probability of each deal. Taking this into account in deciding how to play the hand
is tricky, for the reasons mentioned in our description of Kriegspiel: players may bid in such
a way as to minimize the information conveyed to their opponents. Even so, the approach is
quite effective for bridge, as we show in Section 5.7.
The strategy described in Equations 5.1 and 5.2 is sometimes called averaging over
clairvoyance because it assumes that the game will become observable to both players im-
mediately after the first move. Despite its intuitive appeal, the strategy can lead one astray.
Consider the following story:
Day 1: Road A leads to a heap of gold; Road B leads to a fork. Take the left fork and
you’ll find a bigger heap of gold, but take the right fork and you’ll be run over by a bus.
Day 2: Road A leads to a heap of gold; Road B leads to a fork. Take the right fork and
you’ll find a bigger heap of gold, but take the left fork and you’ll be run over by a bus.
Day 3: Road A leads to a heap of gold; Road B leads to a fork. One branch of the
fork leads to a bigger heap of gold, but take the wrong fork and you’ll be hit by a bus.
Unfortunately you don’t know which fork is which.
Averaging over clairvoyance leads to the following reasoning: on Day 1, B is the right choice;
on Day 2, B is the right choice; on Day 3, the situation is the same as either Day 1 or Day 2,
so B must still be the right choice.
Now we can see how averaging over clairvoyance fails: it does not consider the belief
state that the agent will be in after acting. A belief state of total ignorance is not desirable, es-
pecially when one possibility is certain death. Because it assumes that every future state will
automatically be one of perfect knowledge, the approach never selects actions that gather in-
formation (like the first move in Figure 5.13); nor will it choose actions that hide information
from the opponent or provide information to a partner because it assumes that they already
know the information; and it will never bluff in poker,4 because it assumes the opponent can
see its cards. In Chapter 17, we show how to construct algorithms that do all these things by
virtue of solving the true partially observable decision problem.
4 Bluffing—betting as if one’s hand is good, even when it’s not—is a core part of poker strategy.
5.7 STATE-OF-THE-ART GAME PROGRAMS
In 1965, the Russian mathematician Alexander Kronrod called chess “the Drosophila of ar-
tificial intelligence.” John McCarthy disagrees: whereas geneticists use fruit flies to make
discoveries that apply to biology more broadly, AI has used chess to do the equivalent of
breeding very fast fruit flies. Perhaps a better analogy is that chess is to AI as Grand Prix
motor racing is to the car industry: state-of-the-art game programs are blindingly fast, highly
optimized machines that incorporate the latest engineering advances, but they aren’t much
use for doing the shopping or driving off-road. Nonetheless, racing and game-playing gen-
erate excitement and a steady stream of innovations that have been adopted by the wider
community. In this section we look at what it takes to come out on top in various games.
Chess: IBM’s DEEP BLUE chess program, now retired, is well known for defeating world
champion Garry Kasparov in a widely publicized exhibition match. Deep Blue ran on a par-
allel computer with 30 IBM RS/6000 processors doing alpha–beta search. The unique part
was a configuration of 480 custom VLSI chess processors that performed move generation
and move ordering for the last few levels of the tree, and evaluated the leaf nodes. Deep Blue
searched up to 30 billion positions per move, reaching depth 14 routinely. The key to its
success seems to have been its ability to generate singular extensions beyond the depth limit
for sufficiently interesting lines of forcing/forced moves. In some cases the search reached a
depth of 40 plies. The evaluation function had over 8000 features, many of them describing
highly specific patterns of pieces. An “opening book” of about 4000 positions was used, as
well as a database of 700,000 grandmaster games from which consensus recommendations
could be extracted. The system also used a large endgame database of solved positions con-
taining all positions with five pieces and many with six pieces. This database had the effect
of substantially extending the effective search depth, allowing Deep Blue to play perfectly in
some cases even when it was many moves away from checkmate.
The success of DEEP BLUE reinforced the widely held belief that progress in computer
game-playing has come primarily from ever-more-powerful hardware—a view encouraged
by IBM. But algorithmic improvements have allowed programs running on standard PCs
to win World Computer Chess Championships. A variety of pruning heuristics are used to
reduce the effective branching factor to less than 3 (compared with the actual branching factor
of about 35). The most important of these is the null move heuristic, which generates a good
lower bound on the value of a position, using a shallow search in which the opponent gets
to move twice at the beginning. This lower bound often allows alpha–beta pruning without
the expense of a full-depth search. Also important is futility pruning, which helps decide in
advance which moves will cause a beta cutoff in the successor nodes.
HYDRA can be seen as the successor to DEEP BLUE. HYDRA runs on a 64-processor
cluster with 1 gigabyte per processor and with custom hardware in the form of FPGA (Field
Programmable Gate Array) chips. HYDRA reaches 200 million evaluations per second, about
the same as Deep Blue, but HYDRA reaches 18 plies deep rather than just 14 because of
aggressive use of the null move heuristic and forward pruning.
RYBKA, winner of the 2008 and 2009 World Computer Chess Championships, is con-
sidered the strongest current computer player. It uses an off-the-shelf 8-core 3.2 GHz Intel
Xeon processor, but little is known about the design of the program. RYBKA’s main ad-
vantage appears to be its evaluation function, which has been tuned by its main developer,
International Master Vasik Rajlich, and at least three other grandmasters.
The most recent matches suggest that the top computer chess programs have pulled
ahead of all human contenders. (See the historical notes for details.)
Checkers: Jonathan Schaeffer and colleagues developed CHINOOK, which runs on regular
PCs and uses alpha–beta search. Chinook defeated the long-running human champion in an
abbreviated match in 1990, and since 2007 CHINOOK has been able to play perfectly by using
alpha–beta search combined with a database of 39 trillion endgame positions.
Othello, also called Reversi, is probably more popular as a computer game than as a board
game. It has a smaller search space than chess, usually 5 to 15 legal moves, but evaluation
expertise had to be developed from scratch. In 1997, the LOGISTELLO program (Buro, 2002)
defeated the human world champion, Takeshi Murakami, by six games to none. It is generally
acknowledged that humans are no match for computers at Othello.
Backgammon: Section 5.5 explained why the inclusion of uncertainty from dice rolls makes
deep search an expensive luxury. Most work on backgammon has gone into improving the
evaluation function. Gerry Tesauro (1992) combined reinforcement learning with neural
networks to develop a remarkably accurate evaluator that is used with a search to depth 2
or 3. After playing more than a million training games against itself, Tesauro’s program,
TD-GAMMON, is competitive with top human players. The program’s opinions on the open-
ing moves of the game have in some cases radically altered the received wisdom.
Go is the most popular board game in Asia. Because the board is 19 × 19 and moves are
allowed into (almost) every empty square, the branching factor starts at 361, which is too
daunting for regular alpha–beta search methods. In addition, it is difficult to write an eval-
uation function because control of territory is often very unpredictable until the endgame.
Therefore the top programs, such as MOGO, avoid alpha–beta search and instead use Monte
Carlo rollouts. The trick is to decide what moves to make in the course of the rollout. There is
no aggressive pruning; all moves are possible. The UCT (upper confidence bounds on trees)
method works by making random moves in the first few iterations, and over time guiding
the sampling process to prefer moves that have led to wins in previous samples. Some tricks
are added, including knowledge-based rules that suggest particular moves whenever a given
pattern is detected and limited local search to decide tactical questions. Some programs also
include special techniques from combinatorial game theory to analyze endgames. These
techniques decompose a position into sub-positions that can be analyzed separately and then
combined (Berlekamp and Wolfe, 1994; Müller, 2003). The optimal solutions obtained in
this way have surprised many professional Go players, who thought they had been playing
optimally all along. Current Go programs play at the master level on a reduced 9 × 9 board,
but are still at advanced amateur level on a full board.
Bridge is a card game of imperfect information: a player’s cards are hidden from the other
players. Bridge is also a multiplayer game with four players instead of two, although the
players are paired into two teams. As in Section 5.6, optimal play in partially observable
games like bridge can include elements of information gathering, communication, and careful
weighing of probabilities. Many of these techniques are used in the Bridge Baron program
(Smith et al., 1998), which won the 1997 computer bridge championship. While it does
not play optimally, Bridge Baron is one of the few successful game-playing systems to use
complex, hierarchical plans (see Chapter 11) involving high-level ideas, such as finessing and
squeezing, that are familiar to bridge players.
The GIB program (Ginsberg, 1999) won the 2000 computer bridge championship quite
decisively using the Monte Carlo method. Since then, other winning programs have followed
GIB’s lead. GIB’s major innovation is using explanation-based generalization to compute
and cache general rules for optimal play in various standard classes of situations rather than
evaluating each situation individually. For example, in a situation where one player has the
cards A-K-Q-J-4-3-2 of one suit and another player has 10-9-8-7-6-5, there are 7 × 6 = 42
ways that the first player can lead from that suit and the second player can follow. But GIB
treats these situations as just two: the first player can lead either a high card or a low card;
the exact cards played don’t matter. With this optimization (and a few others), GIB can solve
a 52-card, fully observable deal exactly in about a second. GIB’s tactical accuracy makes up
for its inability to reason about information. It finished 12th in a field of 35 in the par contest
(involving just play of the hand, not bidding) at the 1998 human world championship, far
exceeding the expectations of many human experts.
There are several reasons why GIB plays at expert level with Monte Carlo simulation,
whereas Kriegspiel programs do not. First, GIB’s evaluation of the fully observable version
of the game is exact, searching the full game tree, while Kriegspiel programs rely on inexact
heuristics. But far more important is the fact that in bridge, most of the uncertainty in the
partially observable information comes from the randomness of the deal, not from the adver-
sarial play of the opponent. Monte Carlo simulation handles randomness well, but does not
always handle strategy well, especially when the strategy involves the value of information.
Scrabble: Most people think the hard part about Scrabble is coming up with good words, but
given the official dictionary, it turns out to be rather easy to program a move generator to find
the highest-scoring move (Gordon, 1994). That doesn’t mean the game is solved, however:
merely taking the top-scoring move each turn results in a good but not expert player. The
problem is that Scrabble is both partially observable and stochastic: you don’t know what
letters the other player has or what letters you will draw next. So playing Scrabble well
combines the difficulties of backgammon and bridge. Nevertheless, in 2006, the QUACKLE
program defeated the former world champion, David Boys, 3–2.
5.8 ALTERNATIVE APPROACHES
Because calculating optimal decisions in games is intractable in most cases, all algorithms
must make some assumptions and approximations. The standard approach, based on mini-
max, evaluation functions, and alpha–beta, is just one way to do this. Probably because it has
[Figure: a MAX root over two MIN nodes; the left MIN node has leaf evaluations 99, 1000,
1000, 1000 (backed-up value 99) and the right MIN node has leaf evaluations 100, 101, 102,
100 (backed-up value 100).]
Figure 5.14 A two-ply game tree for which heuristic minimax may make an error.
been worked on for so long, the standard approach dominates other methods in tournament
play. Some believe that this has caused game playing to become divorced from the main-
stream of AI research: the standard approach no longer provides much room for new insight
into general questions of decision making. In this section, we look at the alternatives.
First, let us consider heuristic minimax. It selects an optimal move in a given search
tree provided that the leaf node evaluations are exactly correct. In reality, evaluations are
usually crude estimates of the value of a position and can be considered to have large errors
associated with them. Figure 5.14 shows a two-ply game tree for which minimax suggests
taking the right-hand branch because 100 > 99. That is the correct move if the evaluations
are all correct. But of course the evaluation function is only approximate. Suppose that
the evaluation of each node has an error that is independent of other nodes and is randomly
distributed with mean zero and standard deviation of σ. Then when σ = 5, the left-hand
branch is actually better 71% of the time, and 58% of the time when σ = 2. The intuition
behind this is that the right-hand branch has four nodes that are close to 99; if an error in
the evaluation of any one of the four makes the right-hand branch slip below 99, then the
left-hand branch is better.
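These percentages can be checked with a small Monte Carlo simulation. The sketch below (not from the text) assumes, as stated, that each displayed leaf evaluation equals the true value plus independent zero-mean Gaussian noise; it should roughly reproduce the 71% and 58% figures.

```python
import random

LEFT_EVALS  = [99, 1000, 1000, 1000]    # leaf evaluations under the left MIN node
RIGHT_EVALS = [100, 101, 102, 100]      # leaf evaluations under the right MIN node

def left_branch_better_rate(sigma, trials=200_000):
    """Fraction of trials in which the left branch's true minimax value exceeds
    the right branch's, when each displayed evaluation is the true value plus
    independent Gaussian noise with standard deviation sigma."""
    wins = 0
    for _ in range(trials):
        true_left = min(v - random.gauss(0, sigma) for v in LEFT_EVALS)
        true_right = min(v - random.gauss(0, sigma) for v in RIGHT_EVALS)
        if true_left > true_right:
            wins += 1
    return wins / trials

if __name__ == "__main__":
    for sigma in (2, 5):
        print(f"sigma = {sigma}: left branch truly better in "
              f"{left_branch_better_rate(sigma):.0%} of trials")
```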
In reality, circumstances are actually worse than this because the error in the evaluation
function is not independent. If we get one node wrong, the chances are high that nearby nodes
in the tree will also be wrong. The fact that the node labeled 99 has siblings labeled 1000
suggests that in fact it might have a higher true value. We can use an evaluation function
that returns a probability distribution over possible values, but it is difficult to combine these
distributions properly, because we won’t have a good model of the very strong dependencies
that exist between the values of sibling nodes.
Next, we consider the search algorithm that generates the tree. The aim of an algorithm
designer is to specify a computation that runs quickly and yields a good move. The alpha–beta
algorithm is designed not just to select a good move but also to calculate bounds on the values
of all the legal moves. To see why this extra information is unnecessary, consider a position
in which there is only one legal move. Alpha–beta search still will generate and evaluate a
large search tree, telling us that the only move is the best move and assigning it a value. But
since we have to make the move anyway, knowing the move’s value is useless. Similarly, if
there is one obviously good move and several moves that are legal but lead to a quick loss, we
would not want alpha–beta to waste time determining a precise value for the lone good move.
Better to just make the move quickly and save the time for later. This leads to the idea of the
utility of a node expansion. A good search algorithm should select node expansions of high
utility—that is, ones that are likely to lead to the discovery of a significantly better move. If
there are no node expansions whose utility is higher than their cost (in terms of time), then
the algorithm should stop searching and make a move. Notice that this works not only for
clear-favorite situations but also for the case of symmetrical moves, for which no amount of
search will show that one move is better than another.
This kind of reasoning about what computations to do is called metareasoning (reasoning
about reasoning). It applies not just to game playing but to any kind of reasoning
at all. All computations are done in the service of trying to reach better decisions, all have
costs, and all have some likelihood of resulting in a certain improvement in decision quality.
Alpha–beta incorporates the simplest kind of metareasoning, namely, a theorem to the effect
that certain branches of the tree can be ignored without loss. It is possible to do much better.
In Chapter 16, we see how these ideas can be made precise and implementable.
Finally, let us reexamine the nature of search itself. Algorithms for heuristic search
and for game playing generate sequences of concrete states, starting from the initial state
and then applying an evaluation function. Clearly, this is not how humans play games. In
chess, one often has a particular goal in mind—for example, trapping the opponent’s queen—
and can use this goal to selectively generate plausible plans for achieving it. This kind of
goal-directed reasoning or planning sometimes eliminates combinatorial search altogether.
David Wilkins’ (1980) PARADISE is the only program to have used goal-directed reasoning
successfully in chess: it was capable of solving some chess problems requiring an 18-move
combination. As yet there is no good understanding of how to combine the two kinds of
algorithms into a robust and efficient system, although Bridge Baron might be a step in the
right direction. A fully integrated system would be a significant achievement not just for
game-playing research but also for AI research in general, because it would be a good basis
for a general intelligent agent.
5.9 SUMMARY
We have looked at a variety of games to understand what optimal play means and to under-
stand how to play well in practice. The most important ideas are as follows:
• A game can be defined by the initial state (how the board is set up), the legal actions
in each state, the result of each action, a terminal test (which says when the game is
over), and a utility function that applies to terminal states.
• In two-player zero-sum games with perfect information, the minimax algorithm can
select optimal moves by a depth-first enumeration of the game tree.
• The alpha–beta search algorithm computes the same optimal move as minimax, but
achieves much greater efficiency by eliminating subtrees that are provably irrelevant.
• Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we
need to cut the search off at some point and apply a heuristic evaluation function that
estimates the utility of a state.
• Many game programs precompute tables of best moves in the opening and endgame so
that they can look up a move rather than search.
• Games of chance can be handled by an extension to the minimax algorithm that eval-
uates a chance node by taking the average utility of all its children, weighted by the
probability of each child (see the sketch after this list).
• Optimal play in games of imperfect information, such as Kriegspiel and bridge, re-
quires reasoning about the current and future belief states of each player. A simple
approximation can be obtained by averaging the value of an action over each possible
configuration of missing information.
• Programs have bested even champion human players at games such as chess, checkers,
and Othello. Humans retain the edge in several games of imperfect information, such
as poker, bridge, and Kriegspiel, and in games with very large branching factors and
little good heuristic knowledge, such as Go.
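To make the chance-node rule concrete, here is a minimal expectiminimax sketch. The game object and its methods (terminal, utility, player, actions, result, outcomes) are hypothetical names used only for illustration, not an interface defined in the text.

```python
def expectiminimax(state, game):
    """Back up values as in minimax at MAX and MIN nodes; at a chance node,
    return the probability-weighted average of the children's values.
    The `game` interface (terminal, utility, player, actions, result, outcomes)
    is a hypothetical sketch, not an API defined in the text."""
    if game.terminal(state):
        return game.utility(state)
    player = game.player(state)
    if player == "MAX":
        return max(expectiminimax(game.result(state, a), game)
                   for a in game.actions(state))
    if player == "MIN":
        return min(expectiminimax(game.result(state, a), game)
                   for a in game.actions(state))
    # Chance node: game.outcomes(state) yields (outcome, probability) pairs.
    return sum(p * expectiminimax(game.result(state, outcome), game)
               for outcome, p in game.outcomes(state))
```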
BIBLIOGRAPHICAL AND HISTORICAL NOTES
The early history of mechanical game playing was marred by numerous frauds. The most
notorious of these was Baron Wolfgang von Kempelen’s (1734–1804) “The Turk,” a supposed
chess-playing automaton that defeated Napoleon before being exposed as a magician’s trick
cabinet housing a human chess expert (see Levitt, 2000). It played from 1769 to 1854. In
1846, Charles Babbage (who had been fascinated by the Turk) appears to have contributed
the first serious discussion of the feasibility of computer chess and checkers (Morrison and
Morrison, 1961). He did not understand the exponential complexity of search trees, claiming
“the combinations involved in the Analytical Engine enormously surpassed any required,
even by the game of chess.” Babbage also designed, but did not build, a special-purpose
machine for playing tic-tac-toe. The first true game-playing machine was built around 1890
by the Spanish engineer Leonardo Torres y Quevedo. It specialized in the “KRK” (king and
rook vs. king) chess endgame, guaranteeing a win with king and rook from any position.
The minimax algorithm is traced to a 1912 paper by Ernst Zermelo, the developer of
modern set theory. The paper unfortunately contained several errors and did not describe min-
imax correctly. On the other hand, it did lay out the ideas of retrograde analysis and proposed
(but did not prove) what became known as Zermelo’s theorem: that chess is determined—
White can force a win or Black can or it is a draw; we just don’t know which. Zermelo says
that should we eventually know, “Chess would of course lose the character of a game at all.”
A solid foundation for game theory was developed in the seminal work Theory of Games
and Economic Behavior (von Neumann and Morgenstern, 1944), which included an analysis
showing that some games require strategies that are randomized (or otherwise unpredictable).
See Chapter 17 for more information.
John McCarthy conceived the idea of alpha–beta search in 1956, although he did not
publish it. The NSS chess program (Newell et al., 1958) used a simplified version of alpha–
beta; it was the first chess program to do so. Alpha–beta pruning was described by Hart and
Edwards (1961) and Hart et al. (1972). Alpha–beta was used by the “Kotok–McCarthy” chess
program written by a student of John McCarthy (Kotok, 1962). Knuth and Moore (1975)
proved the correctness of alpha–beta and analysed its time complexity. Pearl (1982b) shows
alpha–beta to be asymptotically optimal among all fixed-depth game-tree search algorithms.
Several attempts have been made to overcome the problems with the “standard ap-
proach” that were outlined in Section 5.8. The first nonexhaustive heuristic search algorithm
with some theoretical grounding was probably B∗ (Berliner, 1979), which attempts to main-
tain interval bounds on the possible value of a node in the game tree rather than giving it
a single point-valued estimate. Leaf nodes are selected for expansion in an attempt to re-
fine the top-level bounds until one move is “clearly best.” Palay (1985) extends the B∗ idea
using probability distributions on values in place of intervals. David McAllester’s (1988)
conspiracy number search expands leaf nodes that, by changing their values, could cause
the program to prefer a new move at the root. MGSS∗ (Russell and Wefald, 1989) uses the
decision-theoretic techniques of Chapter 16 to estimate the value of expanding each leaf in
terms of the expected improvement in decision quality at the root. It outplayed an alpha–
beta algorithm at Othello despite searching an order of magnitude fewer nodes. The MGSS∗
approach is, in principle, applicable to the control of any form of deliberation.
Alpha–beta search is in many ways the two-player analog of depth-first branch-and-
bound, which is dominated by A∗ in the single-agent case. The SSS∗ algorithm (Stockman,
1979) can be viewed as a two-player A∗ and never expands more nodes than alpha–beta to
reach the same decision. The memory requirements and computational overhead of the queue
make SSS∗ in its original form impractical, but a linear-space version has been developed
from the RBFS algorithm (Korf and Chickering, 1996). Plaat et al. (1996) developed a new
view of SSS∗ as a combination of alpha–beta and transposition tables, showing how to over-
come the drawbacks of the original algorithm and developing a new variant called MTD(f)
that has been adopted by a number of top programs.
D. F. Beal (1980) and Dana Nau (1980, 1983) studied the weaknesses of minimax ap-
plied to approximate evaluations. They showed that under certain assumptions about the dis-
tribution of leaf values in the tree, minimaxing can yield values at the root that are actually less
reliable than the direct use of the evaluation function itself. Pearl’s book Heuristics (1984)
partially explains this apparent paradox and analyzes many game-playing algorithms. Baum
and Smith (1997) propose a probability-based replacement for minimax, showing that it re-
sults in better choices in certain games. The expectiminimax algorithm was proposed by
Donald Michie (1966). Bruce Ballard (1983) extended alpha–beta pruning to cover trees
with chance nodes and Hauk (2004) reexamines this work and provides empirical results.
Koller and Pfeffer (1997) describe a system for completely solving partially observ-
able games. The system is quite general, handling games whose optimal strategy requires
randomized moves and games that are more complex than those handled by any previous
system. Still, it can’t handle games as complex as poker, bridge, and Kriegspiel. Frank
et al. (1998) describe several variants of Monte Carlo search, including one where MIN has
complete information but MAX does not. Among deterministic, partially observable games,
Kriegspiel has received the most attention. Ferguson demonstrated hand-derived random-
ized strategies for winning Kriegspiel with a bishop and knight (1992) or two bishops (1995)
against a king. The first Kriegspiel programs concentrated on finding endgame checkmates
and performed AND–OR search in belief-state space (Sakuta and Iida, 2002; Bolognesi and
Ciancarini, 2003). Incremental belief-state algorithms enabled much more complex midgame
checkmates to be found (Russell and Wolfe, 2005; Wolfe and Russell, 2007), but efficient
state estimation remains the primary obstacle to effective general play (Parker et al., 2005).
Chess was one of the first tasks undertaken in AI, with early efforts by many of the pio-
neers of computing, including Konrad Zuse in 1945, Norbert Wiener in his book Cybernetics
(1948), and Alan Turing in 1950 (see Turing et al., 1953). But it was Claude Shannon’s
article Programming a Computer for Playing Chess (1950) that had the most complete set
of ideas, describing a representation for board positions, an evaluation function, quiescence
search, and some ideas for selective (nonexhaustive) game-tree search. Slater (1950) and the
commentators on his article also explored the possibilities for computer chess play.
D. G. Prinz (1952) completed a program that solved chess endgame problems but did
not play a full game. Stan Ulam and a group at the Los Alamos National Lab produced a
program that played chess on a 6 × 6 board with no bishops (Kister et al., 1957). It could
search 4 plies deep in about 12 minutes. Alex Bernstein wrote the first documented program
to play a full game of standard chess (Bernstein and Roberts, 1958).5
The first computer chess match featured the Kotok–McCarthy program from MIT (Ko-
tok, 1962) and the ITEP program written in the mid-1960s at Moscow’s Institute of Theo-
retical and Experimental Physics (Adelson-Velsky et al., 1970). This intercontinental match
was played by telegraph. It ended with a 3–1 victory for the ITEP program in 1967. The first
chess program to compete successfully with humans was MIT’s MACHACK-6 (Greenblatt
et al., 1967). Its Elo rating of approximately 1400 was well above the novice level of 1000.
The Fredkin Prize, established in 1980, offered awards for progressive milestones in
chess play. The $5,000 prize for the first program to achieve a master rating went to BELLE
(Condon and Thompson, 1982), which achieved a rating of 2250. The $10,000 prize for the
first program to achieve a USCF (United States Chess Federation) rating of 2500 (near the
grandmaster level) was awarded to DEEP THOUGHT (Hsu et al., 1990) in 1989. The grand
prize, $100,000, went to DEEP BLUE (Campbell et al., 2002; Hsu, 2004) for its landmark
victory over world champion Garry Kasparov in a 1997 exhibition match. Kasparov wrote:
The decisive game of the match was Game 2, which left a scar in my memory . . . we saw
something that went well beyond our wildest expectations of how well a computer would
be able to foresee the long-term positional consequences of its decisions. The machine
refused to move to a position that had a decisive short-term advantage—showing a very
human sense of danger. (Kasparov, 1997)
Probably the most complete description of a modern chess program is provided by Ernst
Heinz (2000), whose DARKTHOUGHT program was the highest-ranked noncommercial PC
program at the 1999 world championships.
5 A Russian program, BESM, may have predated Bernstein’s program.
Figure 5.15 Pioneers in computer chess: (a) Herbert Simon and Allen Newell, developers
of the NSS program (1958); (b) John McCarthy and the Kotok–McCarthy program on an
IBM 7090 (1967).
In recent years, chess programs are pulling ahead of even the world’s best humans.
In 2004–2005 HYDRA defeated grand master Evgeny Vladimirov 3.5–0.5, world champion
Ruslan Ponomariov 2–0, and seventh-ranked Michael Adams 5.5–0.5. In 2006, DEEP FRITZ
beat world champion Vladimir Kramnik 4–2, and in 2007 RYBKA defeated several grand
masters in games in which it gave odds (such as a pawn) to the human players. As of 2009,
the highest Elo rating ever recorded was Kasparov’s 2851. HYDRA (Donninger and Lorenz,
2004) is rated somewhere between 2850 and 3000, based mostly on its trouncing of Michael
Adams. The RYBKA program is rated between 2900 and 3100, but this is based on a small
number of games and is not considered reliable. Ross (2004) shows how human players have
learned to exploit some of the weaknesses of the computer programs.
Checkers was the first of the classic games fully played by a computer. Christopher
Strachey (1952) wrote the first working program for checkers. Beginning in 1952, Arthur
Samuel of IBM, working in his spare time, developed a checkers program that learned its
own evaluation function by playing itself thousands of times (Samuel, 1959, 1967). We
describe this idea in more detail in Chapter 21. Samuel’s program began as a novice but
after only a few days’ self-play had improved itself beyond Samuel’s own level. In 1962 it
defeated Robert Nealey, a champion at “blind checkers,” through an error on his part. When
one considers that Samuel’s computing equipment (an IBM 704) had 10,000 words of main
memory, magnetic tape for long-term storage, and a .000001 GHz processor, the win remains
a great accomplishment.
The challenge started by Samuel was taken up by Jonathan Schaeffer of the University
of Alberta. His CHINOOK program came in second in the 1990 U.S. Open and earned the
right to challenge for the world championship. It then ran up against a problem, in the form
of Marion Tinsley. Dr. Tinsley had been world champion for over 40 years, losing only
three games in all that time. In the first match against CHINOOK, Tinsley suffered his fourth
and fifth losses, but won the match 20.5–18.5. A rematch at the 1994 world championship
ended prematurely when Tinsley had to withdraw for health reasons. CHINOOK became the
official world champion. Schaeffer kept on building on his database of endgames, and in
2007 “solved” checkers (Schaeffer et al., 2007; Schaeffer, 2008). This had been predicted by
Richard Bellman (1965). In the paper that introduced the dynamic programming approach
to retrograde analysis, he wrote, “In checkers, the number of possible moves in any given
situation is so small that we can confidently expect a complete digital computer solution to
the problem of optimal play in this game.” Bellman did not, however, fully appreciate the
size of the checkers game tree. There are about 500 quadrillion positions. After 18 years
of computation on a cluster of 50 or more machines, Jonathan Schaeffer’s team completed
an endgame table for all checkers positions with 10 or fewer pieces: over 39 trillion entries.
From there, they were able to do forward alpha–beta search to derive a policy that proves
that checkers is in fact a draw with best play by both sides. Note that this is an application
of bidirectional search (Section 3.4.6). Building an endgame table for all of checkers would
be impractical: it would require a billion gigabytes of storage. Searching without any table
would also be impractical: the search tree has about 8^47 positions, and would take thousands
of years to search with today’s technology. Only a combination of clever search, endgame
data, and a drop in the price of processors and memory could solve checkers. Thus, checkers
joins Qubic (Patashnik, 1980), Connect Four (Allis, 1988), and Nine-Men’s Morris (Gasser,
1998) as games that have been solved by computer analysis.
Backgammon, a game of chance, was analyzed mathematically by Gerolamo Cardano
(1663), but only taken up for computer play in the late 1970s, first with the BKG pro-
gram (Berliner, 1980b); it used a complex, manually constructed evaluation function and
searched only to depth 1. It was the first program to defeat a human world champion at a ma-
jor classic game (Berliner, 1980a). Berliner readily acknowledged that BKG was very lucky
with the dice. Gerry Tesauro’s (1995) TD-GAMMON played consistently at world champion
level. The BGBLITZ program was the winner of the 2008 Computer Olympiad.
Go is a deterministic game, but the large branching factor makes it challenging. The key
issues and early literature in computer Go are summarized by Bouzy and Cazenave (2001) and
Müller (2002). Up to 1997 there were no competent Go programs. Now the best programs
play most of their moves at the master level; the only problem is that over the course of a
game they usually make at least one serious blunder that allows a strong opponent to win.
Whereas alpha–beta search reigns in most games, many recent Go programs have adopted
Monte Carlo methods based on the UCT (upper confidence bounds on trees) scheme (Kocsis
and Szepesvari, 2006). The strongest Go program as of 2009 is Gelly and Silver’s MOGO
(Wang and Gelly, 2007; Gelly and Silver, 2008). In August 2008, MOGO scored a surprising
win against top professional Myungwan Kim, albeit with MOGO receiving a handicap of
nine stones (about the equivalent of a queen handicap in chess). Kim estimated MOGO’s
strength at 2–3 dan, the low end of advanced amateur. For this match, MOGO was run on
an 800-processor 15 teraflop supercomputer (1000 times Deep Blue). A few weeks later,
MOGO, with only a five-stone handicap, won against a 6-dan professional. In the 9 × 9 form
of Go, MOGO is at approximately the 1-dan professional level. Rapid advances are likely
as experimentation continues with new forms of Monte Carlo search. The Computer Go
Newsletter, published by the Computer Go Association, describes current developments.
Bridge: Smith et al. (1998) report on how their planning-based program won the 1998
computer bridge championship, and Ginsberg (2001) describes how his GIB program, based
on Monte Carlo simulation, won the following computer championship and did surprisingly
well against human players and standard book problem sets. From 2001–2007, the computer
bridge championship was won five times by JACK and twice by WBRIDGE5. Neither has
had academic articles explaining their structure, but both are rumored to use the Monte Carlo
technique, which was first proposed for bridge by Levy (1989).
Scrabble: A good description of a top program, MAVEN, is given by its creator, Brian
Sheppard (2002). Generating the highest-scoring move is described by Gordon (1994), and
modeling opponents is covered by Richards and Amir (2007).
Soccer (Kitano et al., 1997b; Visser et al., 2008) and billiards (Lam and Greenspan,
2008; Archibald et al., 2009) and other stochastic games with a continuous space of actions
are beginning to attract attention in AI, both in simulation and with physical robot players.
Computer game competitions occur annually, and papers appear in a variety of venues.
The rather misleadingly named conference proceedings Heuristic Programming in Artificial
Intelligence report on the Computer Olympiads, which include a wide variety of games. The
General Game Competition (Love et al., 2006) tests programs that must learn to play an un-
known game given only a logical description of the rules of the game. There are also several
edited collections of important papers on game-playing research (Levy, 1988a, 1988b; Mars-
land and Schaeffer, 1990). The International Computer Chess Association (ICCA), founded
in 1977, publishes the ICGA Journal (formerly the ICCA Journal). Important papers have
been published in the serial anthology Advances in Computer Chess, starting with Clarke
(1977). Volume 134 of the journal Artificial Intelligence (2002) contains descriptions of
state-of-the-art programs for chess, Othello, Hex, shogi, Go, backgammon, poker, Scrabble,
and other games. Since 1998, a biennial Computers and Games conference has been held.
EXERCISES
5.1 Suppose you have an oracle, OM(s), that correctly predicts the opponent’s move in
any state. Using this, formulate the definition of a game as a (single-agent) search problem.
Describe an algorithm for finding the optimal move.
5.2 Consider the problem of solving two 8-puzzles.
a. Give a complete problem formulation in the style of Chapter 3.
b. How large is the reachable state space? Give an exact numerical expression.
c. Suppose we make the problem adversarial as follows: the two players take turns mov-
ing; a coin is flipped to determine the puzzle on which to make a move in that turn; and
the winner is the first to solve one puzzle. Which algorithm can be used to choose a
move in this setting?
d. Give an informal proof that someone will eventually win if both play perfectly.
[Figure: a six-node map (nodes a–f) and a partial game tree rooted at the position bd.]
Figure 5.16 (a) A map where the cost of every edge is 1. Initially the pursuer P is at node
b and the evader E is at node d. (b) A partial game tree for this map. Each node is labeled
with the P, E positions. P moves first. Branches marked “?” have yet to be explored.
5.3 Imagine that, in Exercise 3.4, one of the friends wants to avoid the other. The problem
then becomes a two-player pursuit–evasion game. We assume now that the players take
turns moving. The game ends only when the players are on the same node; the terminal
payoff to the pursuer is minus the total time taken. (The evader “wins” by never losing.) An
example is shown in Figure 5.16.
a. Copy the game tree and mark the values of the terminal nodes.
b. Next to each internal node, write the strongest fact you can infer about its value (a
number, one or more inequalities such as “≥ 14”, or a “?”).
c. Beneath each question mark, write the name of the node reached by that branch.
d. Explain how a bound on the value of the nodes in (c) can be derived from consideration
of shortest-path lengths on the map, and derive such bounds for these nodes. Remember
the cost to get to each leaf as well as the cost to solve it.
e. Now suppose that the tree as given, with the leaf bounds from (d), is evaluated from left
to right. Circle those “?” nodes that would not need to be expanded further, given the
bounds from part (d), and cross out those that need not be considered at all.
f. Can you prove anything in general about who wins the game on a map that is a tree?
5.4 Describe and implement state descriptions, move generators, terminal tests, utility func-
tions, and evaluation functions for one or more of the following stochastic games: Monopoly,
Scrabble, bridge play with a given contract, or Texas hold’em poker.
5.5 Describe and implement a real-time, multiplayer game-playing environment, where
time is part of the environment state and players are given fixed time allocations.
5.6 Discuss how well the standard approach to game playing would apply to games such as
tennis, pool, and croquet, which take place in a continuous physical state space.
5.7 Prove the following assertion: For every game tree, the utility obtained by MAX using
minimax decisions against a suboptimal MIN will never be lower than the utility obtained
playing against an optimal MIN. Can you come up with a game tree in which MAX can do
still better using a suboptimal strategy against a suboptimal MIN?
[Figure: a row of four spaces numbered 1 to 4, with player A’s token on space 1 and player
B’s token on space 4.]
Figure 5.17 The starting position of a simple game. Player A moves first. The two players
take turns moving, and each player must move his token to an open adjacent space in either
direction. If the opponent occupies an adjacent space, then a player may jump over the
opponent to the next open space if any. (For example, if A is on 3 and B is on 2, then A may
move back to 1.) The game ends when one player reaches the opposite end of the board. If
player A reaches space 4 first, then the value of the game to A is +1; if player B reaches
space 1 first, then the value of the game to A is −1.
5.8 Consider the two-player game described in Figure 5.17.
a. Draw the complete game tree, using the following conventions:
• Write each state as (sA, sB), where sA and sB denote the token locations.
• Put each terminal state in a square box and write its game value in a circle.
• Put loop states (states that already appear on the path to the root) in double square
boxes. Since their value is unclear, annotate each with a “?” in a circle.
b. Now mark each node with its backed-up minimax value (also in a circle). Explain how
you handled the “?” values and why.
c. Explain why the standard minimax algorithm would fail on this game tree and briefly
sketch how you might fix it, drawing on your answer to (b). Does your modified algo-
rithm give optimal decisions for all games with loops?
d. This 4-square game can be generalized to n squares for any n > 2. Prove that A wins
if n is even and loses if n is odd.
5.9 This problem exercises the basic concepts of game playing, using tic-tac-toe (noughts
and crosses) as an example. We define Xn as the number of rows, columns, or diagonals
with exactly n X’s and no O’s. Similarly, On is the number of rows, columns, or diagonals
with just n O’s. The utility function assigns +1 to any position with X3 = 1 and −1 to any
position with O3 = 1. All other terminal positions have utility 0. For nonterminal positions,
we use a linear evaluation function defined as Eval(s) = 3X2(s)+X1(s)−(3O2(s)+O1(s)).
a. Approximately how many possible games of tic-tac-toe are there?
b. Show the whole game tree starting from an empty board down to depth 2 (i.e., one X
and one O on the board), taking symmetry into account.
c. Mark on your tree the evaluations of all the positions at depth 2.
d. Using the minimax algorithm, mark on your tree the backed-up values for the positions
at depths 1 and 0, and use those values to choose the best starting move.
e. Circle the nodes at depth 2 that would not be evaluated if alpha–beta pruning were
applied, assuming the nodes are generated in the optimal order for alpha–beta pruning.
5.10 Consider the family of generalized tic-tac-toe games, defined as follows. Each partic-
ular game is specified by a set S of squares and a collection W of winning positions. Each
winning position is a subset of S. For example, in standard tic-tac-toe, S is a set of 9 squares
and W is a collection of 8 subsets of S: the three rows, the three columns, and the two diag-
onals. In other respects, the game is identical to standard tic-tac-toe. Starting from an empty
board, players alternate placing their marks on an empty square. A player who marks every
square in a winning position wins the game. It is a tie if all squares are marked and neither
player has won.
a. Let N = |S|, the number of squares. Give an upper bound on the number of nodes in
the complete game tree for generalized tic-tac-toe as a function of N.
b. Give a lower bound on the size of the game tree for the worst case, where W = { }.
c. Propose a plausible evaluation function that can be used for any instance of generalized
tic-tac-toe. The function may depend on S and W.
d. Assume that it is possible to generate a new board and check whether it is a winning
position in 100N machine instructions and assume a 2 gigahertz processor. Ignore
memory limitations. Using your estimate in (a), roughly how large a game tree can be
completely solved by alpha–beta in a second of CPU time? a minute? an hour?
5.11 Develop a general game-playing program, capable of playing a variety of games.
a. Implement move generators and evaluation functions for one or more of the following
games: Kalah, Othello, checkers, and chess.
b. Construct a general alpha–beta game-playing agent.
c. Compare the effect of increasing search depth, improving move ordering, and improv-
ing the evaluation function. How close does your effective branching factor come to the
ideal case of perfect move ordering?
d. Implement a selective search algorithm, such as B* (Berliner, 1979), conspiracy number
search (McAllester, 1988), or MGSS* (Russell and Wefald, 1989) and compare its
performance to A*.
[Figure: a game tree in which node n1 has child n2, with node nj a deeper descendant on the
path under consideration.]
Figure 5.18 Situation when considering whether to prune node nj.
5.12 Describe how the minimax and alpha–beta algorithms change for two-player, non-
zero-sum games in which each player has a distinct utility function and both utility functions
are known to both players. If there are no constraints on the two terminal utilities, is it possible
for any node to be pruned by alpha–beta? What if the players’ utility functions on any state
sum to a number between constants −k and k, making the game almost zero-sum?
5.13 Develop a formal proof of correctness for alpha–beta pruning. To do this, consider the
situation shown in Figure 5.18. The question is whether to prune node nj, which is a max-
node and a descendant of node n1. The basic idea is to prune it if and only if the minimax
value of n1 can be shown to be independent of the value of nj.
a. Node n1 takes on the minimum value among its children: n1 = min(n2, n21, . . . , n2b2).
Find a similar expression for n2 and hence an expression for n1 in terms of nj.
b. Let li be the minimum (or maximum) value of the nodes to the left of node ni at depth i,
whose minimax value is already known. Similarly, let ri be the minimum (or maximum)
value of the unexplored nodes to the right of ni at depth i. Rewrite your expression for
n1 in terms of the li and ri values.
c. Now reformulate the expression to show that in order to affect n1, nj must not exceed
a certain bound derived from the li values.
d. Repeat the process for the case where nj is a min-node.
5.14 Prove that alpha–beta pruning takes time O(2^(m/2)) with optimal move ordering, where
m is the maximum depth of the game tree.
5.15 Suppose you have a chess program that can evaluate 5 million nodes per second. De-
cide on a compact representation of a game state for storage in a transposition table. About
how many entries can you fit in a 1-gigabyte in-memory table? Will that be enough for the
[Figure: a MAX root over two chance nodes; each chance branch has probability 0.5; the
eight leaf values, from left to right, are 2, 2, 1, 2, 0, 2, −1, 0.]
Figure 5.19 The complete game tree for a trivial game with chance nodes.
three minutes of search allocated for one move? How many table lookups can you do in the
time it would take to do one evaluation? Now suppose the transposition table is stored on
disk. About how many evaluations could you do in the time it takes to do one disk seek with
standard disk hardware?
5.16 This question considers pruning in games with chance nodes. Figure 5.19 shows the
complete game tree for a trivial game. Assume that the leaf nodes are to be evaluated in left-
to-right order, and that before a leaf node is evaluated, we know nothing about its value—the
range of possible values is −∞ to ∞.
a. Copy the figure, mark the value of all the internal nodes, and indicate the best move at
the root with an arrow.
b. Given the values of the first six leaves, do we need to evaluate the seventh and eighth
leaves? Given the values of the first seven leaves, do we need to evaluate the eighth
leaf? Explain your answers.
c. Suppose the leaf node values are known to lie between –2 and 2 inclusive. After the
first two leaves are evaluated, what is the value range for the left-hand chance node?
d. Circle all the leaves that need not be evaluated under the assumption in (c).
5.17 Implement the expectiminimax algorithm and the *-alpha–beta algorithm, which is
described by Ballard (1983), for pruning game trees with chance nodes. Try them on a game
such as backgammon and measure the pruning effectiveness of *-alpha–beta.
5.18 Prove that with a positive linear transformation of leaf values (i.e., transforming a
value x to ax + b where a > 0), the choice of move remains unchanged in a game tree, even
when there are chance nodes.
5.19 Consider the following procedure for choosing moves in games with chance nodes:
• Generate some dice-roll sequences (say, 50) down to a suitable depth (say, 8).
• With known dice rolls, the game tree becomes deterministic. For each dice-roll se-
quence, solve the resulting deterministic game tree using alpha–beta.
• Use the results to estimate the value of each move and to choose the best.
Will this procedure work well? Why (or why not)?
5.20 In the following, a “max” tree consists only of max nodes, whereas an “expectimax”
tree consists of a max node at the root with alternating layers of chance and max nodes. At
chance nodes, all outcome probabilities are nonzero. The goal is to find the value of the root
with a bounded-depth search.
a. Assuming that leaf values are finite but unbounded, is pruning (as in alpha–beta) ever
possible in a max tree? Give an example, or explain why not.
b. Is pruning ever possible in an expectimax tree under the same conditions? Give an
example, or explain why not.
c. If leaf values are constrained to be in the range [0, 1], is pruning ever possible in a max
tree? Give an example, or explain why not.
d. If leaf values are constrained to be in the range [0, 1], is pruning ever possible in an
expectimax tree? Give an example (qualitatively different from your example in (c), if
any), or explain why not.
e. If leaf values are constrained to be nonnegative, is pruning ever possible in a max tree?
Give an example, or explain why not.
f. If leaf values are constrained to be nonnegative, is pruning ever possible in an expecti-
max tree? Give an example, or explain why not.
g. Consider the outcomes of a chance node in an expectimax tree. Which of the following
evaluation orders is most likely to yield pruning opportunities: (i) Lowest probability
first; (ii) Highest probability first; (iii) Doesn’t make any difference?
5.21 Which of the following are true and which are false? Give brief explanations.
a. In a fully observable, turn-taking, zero-sum game between two perfectly rational play-
ers, it does not help the first player to know what strategy the second player is using—
that is, what move the second player will make, given the first player’s move.
b. In a partially observable, turn-taking, zero-sum game between two perfectly rational
players, it does not help the first player to know what move the second player will
make, given the first player’s move.
c. A perfectly rational backgammon agent never loses.
5.22 Consider carefully the interplay of chance events and partial information in each of the
games in Exercise 5.4.
a. For which is the standard expectiminimax model appropriate? Implement the algorithm
and run it in your game-playing agent, with appropriate modifications to the game-
playing environment.
b. For which would the scheme described in Exercise 5.19 be appropriate?
c. Discuss how you might deal with the fact that in some of the games, the players do not
have the same knowledge of the current state.
6 CONSTRAINT SATISFACTION PROBLEMS
In which we see how treating states as more than just little black boxes leads to the
invention of a range of powerful new search methods and a deeper understanding
of problem structure and complexity.
Chapters 3 and 4 explored the idea that problems can be solved by searching in a space of
states. These states can be evaluated by domain-specific heuristics and tested to see whether
they are goal states. From the point of view of the search algorithm, however, each state is
atomic, or indivisible—a black box with no internal structure.
This chapter describes a way to solve a wide variety of problems more efficiently. We
use a factored representation for each state: a set of variables, each of which has a value.
A problem is solved when each variable has a value that satisfies all the constraints on the
variable. A problem described this way is called a constraint satisfaction problem, or CSP.
CSP search algorithms take advantage of the structure of states and use general-purpose
rather than problem-specific heuristics to enable the solution of complex problems. The main
idea is to eliminate large portions of the search space all at once by identifying variable/value
combinations that violate the constraints.
6.1 DEFINING CONSTRAINT SATISFACTION PROBLEMS
A constraint satisfaction problem consists of three components, X, D, and C:
X is a set of variables, {X1, . . . , Xn}.
D is a set of domains, {D1, . . . , Dn}, one for each variable.
C is a set of constraints that specify allowable combinations of values.
Each domain Di consists of a set of allowable values, {v1, . . . , vk} for variable Xi. Each
constraint Ci consists of a pair ⟨scope, rel⟩, where scope is a tuple of variables that participate
in the constraint and rel is a relation that defines the values that those variables can take on. A
relation can be represented as an explicit list of all tuples of values that satisfy the constraint,
or as an abstract relation that supports two operations: testing if a tuple is a member of the
relation and enumerating the members of the relation. For example, if X1 and X2 both have
the domain {A,B}, then the constraint saying the two variables must have different values
can be written as ⟨(X1, X2), [(A, B), (B, A)]⟩ or as ⟨(X1, X2), X1 ≠ X2⟩.
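One way to make this definition concrete (an illustrative sketch, not the book's code) is to store a CSP as its three components and treat each constraint as a (scope, relation) pair in which the relation is a predicate:

```python
class CSP:
    """A constraint satisfaction problem (X, D, C); each constraint is a
    (scope, relation) pair, with relation given as a predicate over values."""
    def __init__(self, variables, domains, constraints):
        self.variables = variables      # e.g. ["X1", "X2"]
        self.domains = domains          # e.g. {"X1": {"A", "B"}, "X2": {"A", "B"}}
        self.constraints = constraints  # e.g. [(("X1", "X2"), lambda a, b: a != b)]

    def consistent(self, assignment):
        """True if no constraint whose scope is fully assigned is violated."""
        for scope, relation in self.constraints:
            if all(var in assignment for var in scope):
                if not relation(*(assignment[var] for var in scope)):
                    return False
        return True

# The two-variable example above: X1 and X2 must take different values.
csp = CSP(["X1", "X2"],
          {"X1": {"A", "B"}, "X2": {"A", "B"}},
          [(("X1", "X2"), lambda a, b: a != b)])
assert csp.consistent({"X1": "A", "X2": "B"})
assert not csp.consistent({"X1": "A", "X2": "A"})
```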
To solve a CSP, we need to define a state space and the notion of a solution. Each
state in a CSP is defined by an assignment of values to some or all of the variables, {Xi =
vi, Xj = vj, . . .}. An assignment that does not violate any constraints is called a consistent
or legal assignment. A complete assignment is one in which every variable is assigned, and
a solution to a CSP is a consistent, complete assignment. A partial assignment is one that
assigns values to only some of the variables.
6.1.1 Example problem: Map coloring
Suppose that, having tired of Romania, we are looking at a map of Australia showing each
of its states and territories (Figure 6.1(a)). We are given the task of coloring each region
either red, green, or blue in such a way that no neighboring regions have the same color. To
formulate this as a CSP, we define the variables to be the regions
X = {WA, NT, Q, NSW, V, SA, T}.
The domain of each variable is the set Di = {red, green, blue}. The constraints require
neighboring regions to have distinct colors. Since there are nine places where regions border,
there are nine constraints:
C = {SA ≠ WA, SA ≠ NT, SA ≠ Q, SA ≠ NSW, SA ≠ V,
WA ≠ NT, NT ≠ Q, Q ≠ NSW, NSW ≠ V}.
Here we are using abbreviations; SA ≠ WA is a shortcut for ⟨(SA, WA), SA ≠ WA⟩, where
SA ≠ WA can be fully enumerated in turn as
{(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)} .
There are many possible solutions to this problem, such as
{WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red }.
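Because the problem is so small, the formulation above can be checked by brute force; the sketch below (illustrative only, not a CSP solver) enumerates all 3^7 complete assignments and keeps the consistent ones:

```python
from itertools import product

VARIABLES = ["WA", "NT", "Q", "NSW", "V", "SA", "T"]
COLORS = ["red", "green", "blue"]
NEIGHBORS = [("SA", "WA"), ("SA", "NT"), ("SA", "Q"), ("SA", "NSW"), ("SA", "V"),
             ("WA", "NT"), ("NT", "Q"), ("Q", "NSW"), ("NSW", "V")]

def map_coloring_solutions():
    """Enumerate every complete assignment of colors to regions and keep
    those satisfying all nine inequality constraints."""
    solutions = []
    for values in product(COLORS, repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if all(assignment[a] != assignment[b] for a, b in NEIGHBORS):
            solutions.append(assignment)
    return solutions

solutions = map_coloring_solutions()
print(len(solutions))      # 18 complete solutions, including the one listed above
```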
It can be helpful to visualize a CSP as a constraint graph, as shown in Figure 6.1(b). The
nodes of the graph correspond to variables of the problem, and a link connects any two vari-
ables that participate in a constraint.
Why formulate a problem as a CSP? One reason is that CSPs yield a natural rep-
resentation for a wide variety of problems; if you already have a CSP-solving system, it is
often easier to solve a problem using it than to design a custom solution using another search
technique. In addition, CSP solvers can be faster than state-space searchers because the CSP
solver can quickly eliminate large swaths of the search space. For example, once we have
chosen {SA = blue} in the Australia problem, we can conclude that none of the five neighbor-
ing variables can take on the value blue. Without taking advantage of constraint propagation,
a search procedure would have to consider 3^5 = 243 assignments for the five neighboring
variables; with constraint propagation we never have to consider blue as a value, so we have
only 2^5 = 32 assignments to look at, a reduction of 87%.
In regular state-space search we can only ask: is this specific state a goal? No? What
about this one? With CSPs, once we find out that a partial assignment is not a solution, we can
Figure 6.1 (a) The principal states and territories of Australia. Coloring this map can
be viewed as a constraint satisfaction problem (CSP). The goal is to assign colors to each
region so that no neighboring regions have the same color. (b) The map-coloring problem
represented as a constraint graph.
immediately discard further refinements of the partial assignment. Furthermore, we can see
why the assignment is not a solution—we see which variables violate a constraint—so we can
focus attention on the variables that matter. As a result, many problems that are intractable
for regular state-space search can be solved quickly when formulated as a CSP.
6.1.2 Example problem: Job-shop scheduling
Factories have the problem of scheduling a day’s worth of jobs, subject to various constraints.
In practice, many of these problems are solved with CSP techniques. Consider the problem of
scheduling the assembly of a car. The whole job is composed of tasks, and we can model each
task as a variable, where the value of each variable is the time that the task starts, expressed
as an integer number of minutes. Constraints can assert that one task must occur before
another—for example, a wheel must be installed before the hubcap is put on—and that only
so many tasks can go on at once. Constraints can also specify that a task takes a certain
amount of time to complete.
We consider a small part of the car assembly, consisting of 15 tasks: install axles (front
and back), affix all four wheels (right and left, front and back), tighten nuts for each wheel,
affix hubcaps, and inspect the final assembly. We can represent the tasks with 15 variables:
X = {AxleF , AxleB, WheelRF , WheelLF , WheelRB, WheelLB, NutsRF ,
NutsLF , NutsRB, NutsLB, CapRF , CapLF , CapRB, CapLB, Inspect} .
The value of each variable is the time that the task starts. Next we represent precedence
constraints between individual tasks. Whenever a task T1 must occur before task T2, and
task T1 takes duration d1 to complete, we add an arithmetic constraint of the form
T1 + d1 ≤ T2 .
In our example, the axles have to be in place before the wheels are put on, and it takes 10
minutes to install an axle, so we write
AxleF + 10 ≤ WheelRF ; AxleF + 10 ≤ WheelLF ;
AxleB + 10 ≤ WheelRB; AxleB + 10 ≤ WheelLB .
Next we say that, for each wheel, we must affix the wheel (which takes 1 minute), then tighten
the nuts (2 minutes), and finally attach the hubcap (1 minute, but not represented yet):
WheelRF + 1 ≤ NutsRF ; NutsRF + 2 ≤ CapRF ;
WheelLF + 1 ≤ NutsLF ; NutsLF + 2 ≤ CapLF ;
WheelRB + 1 ≤ NutsRB; NutsRB + 2 ≤ CapRB;
WheelLB + 1 ≤ NutsLB; NutsLB + 2 ≤ CapLB .
Suppose we have four workers to install wheels, but they have to share one tool that helps put
the axle in place. We need a disjunctive constraint to say that AxleF and AxleB must not
overlap in time; either one comes first or the other does:
(AxleF + 10 ≤ AxleB) or (AxleB + 10 ≤ AxleF ) .
This looks like a more complicated constraint, combining arithmetic and logic. But it still
reduces to a set of pairs of values that AxleF and AxleB can take on.
We also need to assert that the inspection comes last and takes 3 minutes. For every
variable except Inspect we add a constraint of the form X + dX ≤ Inspect. Finally, suppose
there is a requirement to get the whole assembly done in 30 minutes. We can achieve that by
limiting the domain of all variables:
Di = {1, 2, 3, . . . , 27} .
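Written out in code (an illustrative sketch with made-up helper names such as consistent, not a scheduling system), the precedence constraints become inequalities over start times, the shared tool becomes the disjunction above, and the 30-minute deadline becomes the domain bound just given:

```python
TASKS = ["AxleF", "AxleB",
         "WheelRF", "WheelLF", "WheelRB", "WheelLB",
         "NutsRF", "NutsLF", "NutsRB", "NutsLB",
         "CapRF", "CapLF", "CapRB", "CapLB", "Inspect"]
DOMAIN = range(1, 28)                 # allowed start times, 1..27

DURATION = {"AxleF": 10, "AxleB": 10, "Inspect": 3}
DURATION.update({t: 1 for t in TASKS if t.startswith("Wheel")})
DURATION.update({t: 2 for t in TASKS if t.startswith("Nuts")})
DURATION.update({t: 1 for t in TASKS if t.startswith("Cap")})

# (before, after) pairs, each meaning: start[before] + duration[before] <= start[after].
PRECEDENCE = [("AxleF", "WheelRF"), ("AxleF", "WheelLF"),
              ("AxleB", "WheelRB"), ("AxleB", "WheelLB")]
for side in ("RF", "LF", "RB", "LB"):
    PRECEDENCE += [("Wheel" + side, "Nuts" + side), ("Nuts" + side, "Cap" + side)]
PRECEDENCE += [(t, "Inspect") for t in TASKS if t != "Inspect"]

def consistent(schedule):
    """Check a complete assignment of start times against all the constraints."""
    if any(schedule[t] not in DOMAIN for t in TASKS):
        return False
    if not all(schedule[a] + DURATION[a] <= schedule[b] for a, b in PRECEDENCE):
        return False
    # Disjunctive constraint: the two axle installations must not overlap.
    return (schedule["AxleF"] + 10 <= schedule["AxleB"] or
            schedule["AxleB"] + 10 <= schedule["AxleF"])

# One hand-built schedule that satisfies every constraint within the deadline.
example = {"AxleF": 1, "AxleB": 11,
           "WheelRF": 11, "WheelLF": 11, "WheelRB": 21, "WheelLB": 21,
           "NutsRF": 12, "NutsLF": 12, "NutsRB": 22, "NutsLB": 22,
           "CapRF": 14, "CapLF": 14, "CapRB": 24, "CapLB": 24, "Inspect": 25}
print(consistent(example))            # True
```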
This particular problem is trivial to solve, but CSPs have been applied to job-shop schedul-
ing problems like this with thousands of variables. In some cases, there are complicated
constraints that are difficult to specify in the CSP formalism, and more advanced planning
techniques are used, as discussed in Chapter 11.
6.1.3 Variations on the CSP formalism
The simplest kind of CSP involves variables that have discrete, finite domains. Map-
coloring problems and scheduling with time limits are both of this kind. The 8-queens prob-
lem described in Chapter 3 can also be viewed as a finite-domain CSP, where the variables
Q1, . . . , Q8 are the positions of each queen in columns 1, . . . , 8 and each variable has the
domain Di = {1, 2, 3, 4, 5, 6, 7, 8}.
A discrete domain can be infinite, such as the set of integers or strings. (If we didn’t put
a deadline on the job-scheduling problem, there would be an infinite number of start times
for each variable.) With infinite domains, it is no longer possible to describe constraints by
enumerating all allowed combinations of values. Instead, a constraint language must be
used that understands constraints such as T1 + d1 ≤ T2 directly, without enumerating the
set of pairs of allowable values for (T1, T2). Special solution algorithms (which we do not
discuss here) exist for linear constraints on integer variables—that is, constraints, such as
the one just given, in which each variable appears only in linear form. It can be shown that
no algorithm exists for solving general nonlinear constraints on integer variables.
Constraint satisfaction problems with continuous domains are common in the real
world and are widely studied in the field of operations research. For example, the scheduling
of experiments on the Hubble Space Telescope requires very precise timing of observations;
the start and finish of each observation and maneuver are continuous-valued variables that
must obey a variety of astronomical, precedence, and power constraints. The best-known
category of continuous-domain CSPs is that of linear programming problems, where con-
straints must be linear equalities or inequalities. Linear programming problems can be solved
in time polynomial in the number of variables. Problems with different types of constraints
and objective functions have also been studied—quadratic programming, second-order conic
programming, and so on.
In addition to examining the types of variables that can appear in CSPs, it is useful to
look at the types of constraints. The simplest type is the unary constraint, which restricts
the value of a single variable. For example, in the map-coloring problem it could be the case
that South Australians won’t tolerate the color green; we can express that with the unary
constraint ⟨(SA), SA ≠ green⟩.
A binary constraint relates two variables. For example, SA ≠ NSW is a binary
constraint. A binary CSP is one with only binary constraints; it can be represented as a
constraint graph, as in Figure 6.1(b).
We can also describe higher-order constraints, such as asserting that the value of Y is
between X and Z, with the ternary constraint Between(X, Y, Z).
A constraint involving an arbitrary number of variables is called a global constraint.
(The name is traditional but confusing because it need not involve all the variables in a prob-
lem). One of the most common global constraints is Alldiff , which says that all of the
variables involved in the constraint must have different values. In Sudoku problems (see
Section 6.2.6), all variables in a row or column must satisfy an Alldiff constraint. An-
other example is provided by cryptarithmetic puzzles. (See Figure 6.2(a).) Each letter in a
cryptarithmetic puzzle represents a different digit. For the case in Figure 6.2(a), this would
be represented as the global constraint Alldiff (F, T, U, W, R, O). The addition constraints
on the four columns of the puzzle can be written as the following n-ary constraints:
O + O = R + 10 · C10
C10 + W + W = U + 10 · C100
C100 + T + T = O + 10 · C1000
C1000 = F ,
where C10, C100, and C1000 are auxiliary variables representing the digit carried over into the
tens, hundreds, or thousands column. These constraints can be represented in a constraint
hypergraph, such as the one shown in Figure 6.2(b). A hypergraph consists of ordinary nodes
(the circles in the figure) and hypernodes (the squares), which represent n-ary constraints.
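Because only six letters are involved, these constraints can be checked by brute force; the sketch below (illustrative only) tests the column sums directly rather than introducing the carry variables:

```python
from itertools import permutations

def two_plus_two_solutions():
    """All solutions of T W O + T W O = F O U R with distinct digits per letter
    and no leading zeros (F != 0, T != 0); the sum is checked directly rather
    than through the carry variables C10, C100, C1000."""
    solutions = []
    for F, T, U, W, R, O in permutations(range(10), 6):
        if F == 0 or T == 0:
            continue
        two = 100 * T + 10 * W + O
        four = 1000 * F + 100 * O + 10 * U + R
        if two + two == four:
            solutions.append({"F": F, "T": T, "U": U, "W": W, "R": R, "O": O})
    return solutions

print(two_plus_two_solutions())       # 765 + 765 = 1530 is one of the solutions
```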
Alternatively, as Exercise 6.5 asks you to prove, every finite-domain constraint can be
reduced to a set of binary constraints if enough auxiliary variables are introduced, so we could
transform any CSP into one with only binary constraints; this makes the algorithms simpler.
Another way to convert an n-ary CSP to a binary one is the dual graph transformation: create
a new graph in which there will be one variable for each constraint in the original graph, and
Figure 6.2 (a) A cryptarithmetic problem. Each letter stands for a distinct digit; the aim is
to find a substitution of digits for letters such that the resulting sum is arithmetically correct,
with the added restriction that no leading zeroes are allowed. (b) The constraint hypergraph
for the cryptarithmetic problem, showing the Alldiff constraint (square box at the top) as
well as the column addition constraints (four square boxes in the middle). The variables C1,
C2, and C3 represent the carry digits for the three columns.
one binary constraint for each pair of constraints in the original graph that share variables. For
example, if the original graph has variables {X, Y, Z} and constraints ⟨(X, Y, Z), C1⟩ and
⟨(X, Y), C2⟩ then the dual graph would have variables {C1, C2} with the binary constraint
⟨(X, Y), R1⟩, where (X, Y) are the shared variables and R1 is a new relation that defines the
constraint between the shared variables, as specified by the original C1 and C2.
There are, however, two reasons why we might prefer a global constraint such as Alldiff
rather than a set of binary constraints. First, it is easier and less error-prone to write the
problem description using Alldiff . Second, it is possible to design special-purpose inference
algorithms for global constraints that are not available for a set of more primitive constraints.
We describe these inference algorithms in Section 6.2.5.
The constraints we have described so far have all been absolute constraints, violation of
which rules out a potential solution. Many real-world CSPs include preference constraints
indicating which solutions are preferred. For example, in a university class-scheduling prob-
lem there are absolute constraints that no professor can teach two classes at the same time.
But we also may allow preference constraints: Prof. R might prefer teaching in the morning,
whereas Prof. N prefers teaching in the afternoon. A schedule that has Prof. R teaching at
2 p.m. would still be an allowable solution (unless Prof. R happens to be the department chair)
but would not be an optimal one. Preference constraints can often be encoded as costs on in-
dividual variable assignments—for example, assigning an afternoon slot for Prof. R costs
2 points against the overall objective function, whereas a morning slot costs 1. With this
formulation, CSPs with preferences can be solved with optimization search methods, either
path-based or local. We call such a problem a constraint optimization problem, or COP.
Linear programming problems do this kind of optimization.
6.2 CONSTRAINT PROPAGATION: INFERENCE IN CSPS
In regular state-space search, an algorithm can do only one thing: search. In CSPs there is a
choice: an algorithm can search (choose a new variable assignment from several possibilities)
or do a specific type of inference called constraint propagation: using the constraints to
reduce the number of legal values for a variable, which in turn can reduce the legal values
for another variable, and so on. Constraint propagation may be intertwined with search, or it
may be done as a preprocessing step, before search starts. Sometimes this preprocessing can
solve the whole problem, so no search is required at all.
The key idea is local consistency. If we treat each variable as a node in a graph (see
Figure 6.1(b)) and each binary constraint as an arc, then the process of enforcing local con-
sistency in each part of the graph causes inconsistent values to be eliminated throughout the
graph. There are different types of local consistency, which we now cover in turn.
6.2.1 Node consistency
A single variable (corresponding to a node in the CSP network) is node-consistent if all
the values in the variable’s domain satisfy the variable’s unary constraints. For example,
in the variant of the Australia map-coloring problem (Figure 6.1) where South Australians
dislike green, the variable SA starts with domain {red, green, blue}, and we can make it
node consistent by eliminating green, leaving SA with the reduced domain {red, blue}. We
say that a network is node-consistent if every variable in the network is node-consistent.
It is always possible to eliminate all the unary constraints in a CSP by running node
consistency. It is also possible to transform all n-ary constraints into binary ones (see Ex-
ercise 6.5). Because of this, it is common to define CSP solvers that work with only binary
constraints; we make that assumption for the rest of this chapter, except where noted.
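A minimal sketch (illustrative Python, not the book's code) of enforcing node consistency: each variable's domain is filtered through whatever unary predicates apply to it.

# Enforce node consistency: keep only domain values that satisfy each unary constraint.
# domains maps a variable to a set of values; unary maps a variable to a predicate.
def make_node_consistent(domains, unary):
    return {var: {v for v in dom if unary.get(var, lambda x: True)(v)}
            for var, dom in domains.items()}

domains = {"SA": {"red", "green", "blue"}, "WA": {"red", "green", "blue"}}
unary = {"SA": lambda color: color != "green"}       # South Australians reject green
print(make_node_consistent(domains, unary))          # SA is left with {'red', 'blue'}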
6.2.2 Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s
binary constraints. More formally, Xi is arc-consistent with respect to another variable Xj if
for every value in the current domain Di there is some value in the domain Dj that satisfies
the binary constraint on the arc (Xi, Xj). A network is arc-consistent if every variable is arc
consistent with every other variable. For example, consider the constraint Y = X^2 where the
domain of both X and Y is the set of digits. We can write this constraint explicitly as
⟨(X, Y), {(0, 0), (1, 1), (2, 4), (3, 9)}⟩.
To make X arc-consistent with respect to Y , we reduce X’s domain to {0, 1, 2, 3}. If we
also make Y arc-consistent with respect to X, then Y ’s domain becomes {0, 1, 4, 9} and the
whole CSP is arc-consistent.
On the other hand, arc consistency can do nothing for the Australia map-coloring prob-
lem. Consider the following inequality constraint on (SA, WA):
{(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)} .
function AC-3(csp) returns false if an inconsistency is found and true otherwise
inputs: csp, a binary CSP with components (X, D, C)
local variables: queue, a queue of arcs, initially all the arcs in csp
while queue is not empty do
(Xi, Xj) ← REMOVE-FIRST(queue)
if REVISE(csp, Xi, Xj) then
if size of Di = 0 then return false
for each Xk in Xi.NEIGHBORS - {Xj} do
add (Xk, Xi) to queue
return true
function REVISE(csp, Xi, Xj) returns true iff we revise the domain of Xi
revised ← false
for each x in Di do
if no value y in Dj allows (x,y) to satisfy the constraint between Xi and Xj then
delete x from Di
revised ← true
return revised
Figure 6.3 The arc-consistency algorithm AC-3. After applying AC-3, either every arc
is arc-consistent, or some variable has an empty domain, indicating that the CSP cannot be
solved. The name “AC-3” was used by the algorithm’s inventor (Mackworth, 1977) because
it’s the third version developed in the paper.
No matter what value you choose for SA (or for WA), there is a valid value for the other
variable. So applying arc consistency has no effect on the domains of either variable.
The most popular algorithm for arc consistency is called AC-3 (see Figure 6.3). To
make every variable arc-consistent, the AC-3 algorithm maintains a queue of arcs to consider.
(Actually, the order of consideration is not important, so the data structure is really a set, but
tradition calls it a queue.) Initially, the queue contains all the arcs in the CSP. AC-3 then pops
off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-consistent with respect to Xj.
If this leaves Di unchanged, the algorithm just moves on to the next arc. But if this revises
Di (makes the domain smaller), then we add to the queue all arcs (Xk, Xi) where Xk is a
neighbor of Xi. We need to do that because the change in Di might enable further reductions
in the domains of Dk, even if we have previously considered Xk. If Di is revised down to
nothing, then we know the whole CSP has no consistent solution, and AC-3 can immediately
return failure. Otherwise, we keep checking, trying to remove values from the domains of
variables until no more arcs are in the queue. At that point, we are left with a CSP that is
equivalent to the original CSP—they both have the same solutions—but the arc-consistent
CSP will in most cases be faster to search because its variables have smaller domains.
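The pseudocode of Figure 6.3 translates almost line for line into Python. The sketch below is an illustration rather than the authors' code; it represents a binary CSP by a domain map, a dictionary of directed constraint predicates, and a neighbor map.

from collections import deque

# domains: var -> set of values.  constraints: (Xi, Xj) -> predicate over (xi, xj);
# both directions of each constraint are assumed to be present.
def revise(domains, constraints, Xi, Xj):
    removed = False
    for x in set(domains[Xi]):
        if not any(constraints[(Xi, Xj)](x, y) for y in domains[Xj]):
            domains[Xi].discard(x)          # x has no support in Dj
            removed = True
    return removed

def ac3(domains, constraints, neighbors):
    queue = deque(constraints)              # initially all arcs in the CSP
    while queue:
        Xi, Xj = queue.popleft()
        if revise(domains, constraints, Xi, Xj):
            if not domains[Xi]:
                return False                # empty domain: the CSP has no solution
            for Xk in neighbors[Xi] - {Xj}:
                queue.append((Xk, Xi))
    return True

# The Y = X^2 example from the text, over the digits 0..9.
domains = {"X": set(range(10)), "Y": set(range(10))}
constraints = {("X", "Y"): lambda x, y: y == x * x,
               ("Y", "X"): lambda y, x: y == x * x}
neighbors = {"X": {"Y"}, "Y": {"X"}}
ac3(domains, constraints, neighbors)
print(domains)                              # X -> {0, 1, 2, 3}, Y -> {0, 1, 4, 9}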
The complexity of AC-3 can be analyzed as follows. Assume a CSP with n variables,
each with domain size at most d, and with c binary constraints (arcs). Each arc (Xk, Xi) can
be inserted in the queue only d times because Xi has at most d values to delete. Checking
consistency of an arc can be done in O(d^2) time, so we get O(cd^3) total worst-case time.1
It is possible to extend the notion of arc consistency to handle n-ary rather than just
binary constraints; this is called generalized arc consistency or sometimes hyperarc consis-
tency, depending on the author. A variable Xi is generalized arc consistent with respect to
an n-ary constraint if for every value v in the domain of Xi there exists a tuple of values that
is a member of the constraint, has all its values taken from the domains of the corresponding
variables, and has its Xi component equal to v. For example, if all variables have the do-
main {0, 1, 2, 3}, then to make the variable X consistent with the constraint X < Y < Z,
we would have to eliminate 2 and 3 from the domain of X because the constraint cannot be
satisfied when X is 2 or 3.
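A sketch of this filtering step (illustrative only): make one variable generalized-arc-consistent with an n-ary constraint by brute-force enumeration of supporting tuples, which is fine for the small domains assumed here.

from itertools import product

# Make variable X generalized-arc-consistent with one n-ary constraint.
# scope is a tuple of variable names; pred takes one value per scope variable.
def gac_filter(domains, scope, pred, X):
    i = scope.index(X)
    supported = {tup[i] for tup in product(*(domains[v] for v in scope)) if pred(*tup)}
    domains[X] = domains[X] & supported

domains = {v: {0, 1, 2, 3} for v in "XYZ"}
gac_filter(domains, ("X", "Y", "Z"), lambda x, y, z: x < y < z, "X")
print(domains["X"])     # {0, 1}: the constraint cannot be satisfied when X is 2 or 3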
6.2.3 Path consistency
Arc consistency can go a long way toward reducing the domains of variables, sometimes
finding a solution (by reducing every domain to size 1) and sometimes finding that the CSP
cannot be solved (by reducing some domain to size 0). But for other networks, arc consistency
fails to make enough inferences. Consider the map-coloring problem on Australia, but with
only two colors allowed, red and blue. Arc consistency can do nothing because every variable
is already arc consistent: each can be red with blue at the other end of the arc (or vice versa).
But clearly there is no solution to the problem: because Western Australia, Northern Territory
and South Australia all touch each other, we need at least three colors for them alone.
Arc consistency tightens down the domains (unary constraints) using the arcs (binary
constraints). To make progress on problems like map coloring, we need a stronger notion of
consistency. Path consistency tightens the binary constraints by using implicit constraints
that are inferred by looking at triples of variables.
A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if,
for every assignment {Xi = a, Xj = b} consistent with the constraints on {Xi, Xj}, there is
an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}. This is called
path consistency because one can think of it as looking at a path from Xi to Xj with Xm in
the middle.
Let’s see how path consistency fares in coloring the Australia map with two colors. We
will make the set {WA, SA} path consistent with respect to NT. We start by enumerating the
consistent assignments to the set. In this case, there are only two: {WA = red, SA = blue}
and {WA = blue, SA = red}. We can see that with both of these assignments NT can be
neither red nor blue (because it would conflict with either WA or SA). Because there is no
valid choice for NT, we eliminate both assignments, and we end up with no valid assignments
for {WA, SA}. Therefore, we know that there can be no solution to this problem. The PC-2
algorithm (Mackworth, 1977) achieves path consistency in much the same way that AC-3
achieves arc consistency. Because it is so similar, we do not show it here.
1 The AC-4 algorithm (Mohr and Henderson, 1986) runs in O(cd^2) worst-case time but can be slower than AC-3 on average cases. See Exercise 6.12.
6.2.4 K-consistency
Stronger forms of propagation can be defined with the notion of k-consistency. A CSP is
k-consistent if, for any set of k − 1 variables and for any consistent assignment to those
variables, a consistent value can always be assigned to any kth variable. 1-consistency says
that, given the empty set, we can make any set of one variable consistent: this is what we
called node consistency. 2-consistency is the same as arc consistency. For binary constraint
networks, 3-consistency is the same as path consistency.
A CSP is strongly k-consistent if it is k-consistent and is also (k − 1)-consistent,
(k − 2)-consistent, . . . all the way down to 1-consistent. Now suppose we have a CSP with
n nodes and make it strongly n-consistent (i.e., strongly k-consistent for k = n). We can
then solve the problem as follows: First, we choose a consistent value for X1. We are then
guaranteed to be able to choose a value for X2 because the graph is 2-consistent, for X3
because it is 3-consistent, and so on. For each variable Xi, we need only search through the d
values in the domain to find a value consistent with X1, . . . , Xi−1. We are guaranteed to find
a solution in time O(n^2 d). Of course, there is no free lunch: any algorithm for establishing
n-consistency must take time exponential in n in the worst case. Worse, n-consistency also
requires space that is exponential in n. The memory problem is even more severe than the time problem.
In practice, determining the appropriate level of consistency checking is mostly an empirical
science; computing 2-consistency is common, and 3-consistency less so.
6.2.5 Global constraints
Remember that a global constraint is one involving an arbitrary number of variables (but not
necessarily all variables). Global constraints occur frequently in real problems and can be
handled by special-purpose algorithms that are more efficient than the general-purpose meth-
ods described so far. For example, the Alldiff constraint says that all the variables involved
must have distinct values (as in the cryptarithmetic problem above and Sudoku puzzles be-
low). One simple form of inconsistency detection for Alldiff constraints works as follows:
if m variables are involved in the constraint, and if they have n possible distinct values alto-
gether, and m > n, then the constraint cannot be satisfied.
This leads to the following simple algorithm: First, remove any variable in the con-
straint that has a singleton domain, and delete that variable’s value from the domains of the
remaining variables. Repeat as long as there are singleton variables. If at any point an empty
domain is produced or there are more variables than domain values left, then an inconsistency
has been detected.
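A sketch of this Alldiff check in Python (an illustration, not the book's code): singleton variables are stripped out one at a time, and the counting test is applied to whatever remains.

# Simple inconsistency detection for one Alldiff constraint.
def alldiff_consistent(domains):
    domains = {v: set(d) for v, d in domains.items()}    # work on a copy
    while True:
        singletons = [v for v, d in domains.items() if len(d) == 1]
        if not singletons:
            break
        (val,) = domains.pop(singletons[0])               # remove the singleton variable
        for d in domains.values():
            d.discard(val)                                 # and its value from the others
            if not d:
                return False                               # empty domain produced
    values = set().union(*domains.values()) if domains else set()
    return len(domains) <= len(values)                     # more variables than values?

# SA, NT, and Q reduced to {green, blue}: three variables, only two values.
print(alldiff_consistent({"SA": {"green", "blue"},
                          "NT": {"green", "blue"},
                          "Q":  {"green", "blue"}}))       # False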
This method can detect the inconsistency in the assignment {WA = red, NSW = red}
for Figure 6.1. Notice that the variables SA, NT, and Q are effectively connected by an
Alldiff constraint because each pair must have two different colors. After applying AC-3
with the partial assignment, the domain of each variable is reduced to {green, blue}. That
is, we have three variables and only two colors, so the Alldiff constraint is violated. Thus,
a simple consistency procedure for a higher-order constraint is sometimes more effective
than applying arc consistency to an equivalent set of binary constraints. There are more
complex inference algorithms for Alldiff (see van Hoeve and Katriel, 2006) that propagate
more constraints but are more computationally expensive to run.
Another important higher-order constraint is the resource constraint, sometimes called
the atmost constraint. For example, in a scheduling problem, let P1, . . . , P4 denote the
numbers of personnel assigned to each of four tasks. The constraint that no more than 10
personnel are assigned in total is written as Atmost(10, P1, P2, P3, P4). We can detect an
inconsistency simply by checking the sum of the minimum values of the current domains;
for example, if each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be
satisfied. We can also enforce consistency by deleting the maximum value of any domain if it
is not consistent with the minimum values of the other domains. Thus, if each variable in our
example has the domain {2, 3, 4, 5, 6}, the values 5 and 6 can be deleted from each domain.
For large resource-limited problems with integer values—such as logistical problems
involving moving thousands of people in hundreds of vehicles—it is usually not possible to
represent the domain of each variable as a large set of integers and gradually reduce that set by
consistency-checking methods. Instead, domains are represented by upper and lower bounds
and are managed by bounds propagation. For example, in an airline-scheduling problem,
let’s suppose there are two flights, F1 and F2, for which the planes have capacities 165 and
385, respectively. The initial domains for the numbers of passengers on each flight are then
D1 = [0, 165] and D2 = [0, 385] .
Now suppose we have the additional constraint that the two flights together must carry 420
people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to
D1 = [35, 165] and D2 = [255, 385] .
We say that a CSP is bounds consistent if for every variable X, and for both the lower-
bound and upper-bound values of X, there exists some value of Y that satisfies the constraint
between X and Y for every variable Y . This kind of bounds propagation is widely used in
practical constraint problems.
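For the flight example, bounds propagation on the single constraint F1 + F2 = 420 can be sketched as follows (illustrative code, with each domain represented as a (lo, hi) pair):

# Propagate the constraint X + Y = total over interval domains (lo, hi).
def propagate_sum(dx, dy, total):
    (xlo, xhi), (ylo, yhi) = dx, dy
    xlo, xhi = max(xlo, total - yhi), min(xhi, total - ylo)   # tighten X using Y's bounds
    ylo, yhi = max(ylo, total - xhi), min(yhi, total - xlo)   # then Y using X's new bounds
    return (xlo, xhi), (ylo, yhi)

D1, D2 = (0, 165), (0, 385)
print(propagate_sum(D1, D2, 420))    # ((35, 165), (255, 385)), as in the text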
6.2.6 Sudoku example
The popular Sudoku puzzle has introduced millions of people to constraint satisfaction prob-
lems, although they may not recognize it. A Sudoku board consists of 81 squares, some of
which are initially filled with digits from 1 to 9. The puzzle is to fill in all the remaining
squares such that no digit appears twice in any row, column, or 3 × 3 box (see Figure 6.4). A
row, column, or box is called a unit.
The Sudoku puzzles that are printed in newspapers and puzzle books have the property
that there is exactly one solution. Although some can be tricky to solve by hand, taking tens
of minutes, even the hardest Sudoku problems yield to a CSP solver in less than 0.1 second.
A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We
use the variable names A1 through A9 for the top row (left to right), down to I1 through I9
for the bottom row. The empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre-
filled squares have a domain consisting of a single value. In addition, there are 27 different
Figure 6.4 (a) A Sudoku puzzle and (b) its solution.
Alldiff constraints: one for each row, column, and box of 9 squares.
Alldiff (A1, A2, A3, A4, A5, A6, A7, A8, A9)
Alldiff (B1, B2, B3, B4, B5, B6, B7, B8, B9)
· · ·
Alldiff (A1, B1, C1, D1, E1, F1, G1, H1, I1)
Alldiff (A2, B2, C2, D2, E2, F2, G2, H2, I2)
· · ·
Alldiff (A1, A2, A3, B1, B2, B3, C1, C2, C3)
Alldiff (A4, A5, A6, B4, B5, B6, C4, C5, C6)
· · ·
Let us see how far arc consistency can take us. Assume that the Alldiff constraints have been
expanded into binary constraints (such as A1 ≠ A2) so that we can apply the AC-3 algorithm
directly. Consider variable E6 from Figure 6.4(a)—the empty square between the 2 and the
8 in the middle box. From the constraints in the box, we can remove not only 2 and 8 but also
1 and 7 from E6’s domain. From the constraints in its column, we can eliminate 5, 6, 2, 8,
9, and 3. That leaves E6 with a domain of {4}; in other words, we know the answer for E6.
Now consider variable I6—the square in the bottom middle box surrounded by 1, 3, and 3.
Applying arc consistency in its column, we eliminate 5, 6, 2, 4 (since we now know E6 must
be 4), 8, 9, and 3. We eliminate 1 by arc consistency with I5, and we are left with only the
value 7 in the domain of I6. Now there are 8 known values in column 6, so arc consistency
can infer that A6 must be 1. Inference continues along these lines, and eventually, AC-3 can
solve the entire puzzle—all the variables have their domains reduced to a single value, as
shown in Figure 6.4(b).
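The 27 Alldiff constraints can be generated mechanically from the variable names. A sketch (assuming the A1 through I9 naming convention described above):

# Enumerate the 27 Sudoku units (rows, columns, and 3 x 3 boxes) as lists of variables.
rows, cols = "ABCDEFGHI", "123456789"
units  = [[r + c for c in cols] for r in rows]                 # 9 row constraints
units += [[r + c for r in rows] for c in cols]                 # 9 column constraints
units += [[r + c for r in rs for c in cs]                      # 9 box constraints
          for rs in ("ABC", "DEF", "GHI") for cs in ("123", "456", "789")]
print(len(units), units[0][:3])                                # 27 ['A1', 'A2', 'A3']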
Of course, Sudoku would soon lose its appeal if every puzzle could be solved by a
mechanical application of AC-3, and indeed AC-3 works only for the easiest Sudoku puzzles.
Slightly harder ones can be solved by PC-2, but at a greater computational cost: there are
255,960 different path constraints to consider in a Sudoku puzzle. To solve the hardest puzzles
and to make efficient progress, we will have to be more clever.
Indeed, the appeal of Sudoku puzzles for the human solver is the need to be resourceful
in applying more complex inference strategies. Aficionados give them colorful names, such
as “naked triples.” That strategy works as follows: in any unit (row, column or box), find
three squares that each have a domain that contains the same three numbers or a subset of
those numbers. For example, the three domains might be {1, 8}, {3, 8}, and {1, 3, 8}. We
don't know which square contains 1, 3, or 8, but we do know that the three numbers
must be distributed among the three squares. Therefore we can remove 1, 3, and 8 from the
domains of every other square in the unit.
It is interesting to note how far we can go without saying much that is specific to Su-
doku. We do of course have to say that there are 81 variables, that their domains are the digits
1 to 9, and that there are 27 Alldiff constraints. But beyond that, all the strategies—arc con-
sistency, path consistency, etc.—apply generally to all CSPs, not just to Sudoku problems.
Even naked triples is really a strategy for enforcing consistency of Alldiff constraints and
has nothing to do with Sudoku per se. This is the power of the CSP formalism: for each new
problem area, we only need to define the problem in terms of constraints; then the general
constraint-solving mechanisms can take over.
6.3 BACKTRACKING SEARCH FOR CSPS
Sudoku problems are designed to be solved by inference over constraints. But many other
CSPs cannot be solved by inference alone; there comes a time when we must search for a
solution. In this section we look at backtracking search algorithms that work on partial as-
signments; in the next section we look at local search algorithms over complete assignments.
We could apply a standard depth-limited search (from Chapter 3). A state would be a
partial assignment, and an action would be adding var = value to the assignment. But for a
CSP with n variables of domain size d, we quickly notice something terrible: the branching
factor at the top level is nd because any of d values can be assigned to any of n variables. At
the next level, the branching factor is (n − 1)d, and so on for n levels. We generate a tree
with n! · d^n leaves, even though there are only d^n possible complete assignments!
Our seemingly reasonable but naive formulation ignores a crucial property common to
all CSPs: commutativity. A problem is commutative if the order of application of any given
set of actions has no effect on the outcome. CSPs are commutative because when assigning
values to variables, we reach the same partial assignment regardless of order. Therefore, we
need only consider a single variable at each node in the search tree. For example, at the root
node of a search tree for coloring the map of Australia, we might make a choice between
SA = red, SA = green, and SA = blue, but we would never choose between SA = red and
WA = blue. With this restriction, the number of leaves is d^n, as we would hope.
function BACKTRACKING-SEARCH(csp) returns a solution, or failure
return BACKTRACK({ },csp)
function BACKTRACK(assignment,csp) returns a solution, or failure
if assignment is complete then return assignment
var ← SELECT-UNASSIGNED-VARIABLE(csp)
for each value in ORDER-DOMAIN-VALUES(var,assignment,csp) do
if value is consistent with assignment then
add {var = value} to assignment
inferences ← INFERENCE(csp,var,value)
if inferences ≠ failure then
add inferences to assignment
result ← BACKTRACK(assignment,csp)
if result ≠ failure then
return result
remove {var = value} and inferences from assignment
return failure
Figure 6.5 A simple backtracking algorithm for constraint satisfaction problems. The al-
gorithm is modeled on the recursive depth-first search of Chapter 3. By varying the functions
SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES, we can implement the
general-purpose heuristics discussed in the text. The function INFERENCE can optionally be
used to impose arc-, path-, or k-consistency, as desired. If a value choice leads to failure
(noticed either by INFERENCE or by BACKTRACK), then value assignments (including those
made by INFERENCE) are removed from the current assignment and a new value is tried.
The term backtracking search is used for a depth-first search that chooses values for
one variable at a time and backtracks when a variable has no legal values left to assign. The
algorithm is shown in Figure 6.5. It repeatedly chooses an unassigned variable, and then tries
all values in the domain of that variable in turn, trying to find a solution. If an inconsistency is
detected, then BACKTRACK returns failure, causing the previous call to try another value. Part
of the search tree for the Australia problem is shown in Figure 6.6, where we have assigned
variables in the order WA, NT, Q, . . .. Because the representation of CSPs is standardized,
there is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state,
action function, transition model, or goal test.
Notice that BACKTRACKING-SEARCH keeps only a single representation of a state and
alters that representation rather than creating new ones, as described on page 87.
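A minimal Python rendering of Figure 6.5 (an illustration, not the authors' code) with a static variable order, no inference, and a simple consistency check; the heuristics discussed below can be slotted in where the variable and the values are chosen:

# Plain backtracking search for a binary CSP.
# domains: var -> list of values.  constraints: (X, Y) -> predicate over (x, y).
def consistent(var, value, assignment, constraints):
    for (a, b), pred in constraints.items():
        if a == var and b in assignment and not pred(value, assignment[b]):
            return False
        if b == var and a in assignment and not pred(assignment[a], value):
            return False
    return True

def backtrack(assignment, domains, constraints):
    if len(assignment) == len(domains):
        return assignment                                     # complete assignment
    var = next(v for v in domains if v not in assignment)     # static variable order
    for value in domains[var]:
        if consistent(var, value, assignment, constraints):
            assignment[var] = value
            result = backtrack(assignment, domains, constraints)
            if result is not None:
                return result
            del assignment[var]                               # undo and try the next value
    return None                                               # failure

# Map coloring of Australia, as in Figure 6.1.
adjacent = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"), ("SA", "Q"),
            ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"), ("NSW", "V")]
domains = {v: ["red", "green", "blue"] for v in ["WA", "NT", "SA", "Q", "NSW", "V", "T"]}
constraints = {pair: (lambda x, y: x != y) for pair in adjacent}
print(backtrack({}, domains, constraints))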
In Chapter 3 we improved the poor performance of uninformed search algorithms by
supplying them with domain-specific heuristic functions derived from our knowledge of the
problem. It turns out that we can solve CSPs efficiently without such domain-specific knowl-
edge. Instead, we can add some sophistication to the unspecified functions in Figure 6.5,
using them to address the following questions:
1. Which variable should be assigned next (SELECT-UNASSIGNED-VARIABLE), and in
what order should its values be tried (ORDER-DOMAIN-VALUES)?
Figure 6.6 Part of the search tree for the map-coloring problem in Figure 6.1.
2. What inferences should be performed at each step in the search (INFERENCE)?
3. When the search arrives at an assignment that violates a constraint, can the search avoid
repeating this failure?
The subsections that follow answer each of these questions in turn.
6.3.1 Variable and value ordering
The backtracking algorithm contains the line
var ← SELECT-UNASSIGNED-VARIABLE(csp) .
The simplest strategy for SELECT-UNASSIGNED-VARIABLE is to choose the next unassigned
variable in order, {X1, X2, . . .}. This static variable ordering seldom results in the most effi-
cient search. For example, after the assignments for WA = red and NT = green in Figure 6.6,
there is only one possible value for SA, so it makes sense to assign SA = blue next rather than
assigning Q. In fact, after SA is assigned, the choices for Q, NSW , and V are all forced. This
intuitive idea—choosing the variable with the fewest “legal” values—is called the minimum-
remaining-values (MRV) heuristic. It also has been called the “most constrained variable” or
“fail-first” heuristic, the latter because it picks a variable that is most likely to cause a failure
soon, thereby pruning the search tree. If some variable X has no legal values left, the MRV
heuristic will select X and failure will be detected immediately—avoiding pointless searches
through other variables. The MRV heuristic usually performs better than a random or static
ordering, sometimes by a factor of 1,000 or more, although the results vary widely depending
on the problem.
The MRV heuristic doesn’t help at all in choosing the first region to color in Australia,
because initially every region has three legal colors. In this case, the degree heuristic comes
in handy. It attempts to reduce the branching factor on future choices by selecting the vari-
able that is involved in the largest number of constraints on other unassigned variables. In
Figure 6.1, SA is the variable with highest degree, 5; the other variables have degree 2 or 3,
except for T, which has degree 0. In fact, once SA is chosen, applying the degree heuris-
tic solves the problem without any false steps—you can choose any consistent color at each
choice point and still arrive at a solution with no backtracking. The minimum-remaining-
values heuristic is usually a more powerful guide, but the degree heuristic can be useful as a
tie-breaker.
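A sketch of SELECT-UNASSIGNED-VARIABLE combining MRV with the degree heuristic as a tie-breaker; current_domains and neighbors are assumed to be maintained by the caller:

# Minimum-remaining-values, breaking ties by degree among unassigned neighbors.
def select_unassigned_variable(current_domains, neighbors, assignment):
    unassigned = [v for v in current_domains if v not in assignment]
    return min(unassigned,
               key=lambda v: (len(current_domains[v]),                           # MRV first
                              -sum(n not in assignment for n in neighbors[v])))  # then degree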
Once a variable has been selected, the algorithm must decide on the order in which to
examine its values. For this, the least-constraining-value heuristic can be effective in some
cases. It prefers the value that rules out the fewest choices for the neighboring variables in
the constraint graph. For example, suppose that in Figure 6.1 we have generated the partial
assignment with WA = red and NT = green and that our next choice is for Q. Blue would
be a bad choice because it eliminates the last legal value left for Q’s neighbor, SA. The
least-constraining-value heuristic therefore prefers red to blue. In general, the heuristic is
trying to leave the maximum flexibility for subsequent variable assignments. Of course, if we
are trying to find all the solutions to a problem, not just the first one, then the ordering does
not matter because we have to consider every value anyway. The same holds if there are no
solutions to the problem.
Why should variable selection be fail-first, but value selection be fail-last? It turns out
that, for a wide variety of problems, a variable ordering that chooses a variable with the
minimum number of remaining values helps minimize the number of nodes in the search tree
by pruning larger parts of the tree earlier. For value ordering, the trick is that we only need
one solution; therefore it makes sense to look for the most likely values first. If we wanted to
enumerate all solutions rather than just find one, then value ordering would be irrelevant.
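A companion sketch of ORDER-DOMAIN-VALUES using the least-constraining-value heuristic; conflicts(var, value, other, w) is an assumed helper that reports whether the pair of assignments violates a constraint:

# Order var's values so that those ruling out the fewest neighbor values come first.
def order_domain_values(var, current_domains, neighbors, assignment, conflicts):
    def ruled_out(value):
        return sum(1 for n in neighbors[var] if n not in assignment
                     for w in current_domains[n] if conflicts(var, value, n, w))
    return sorted(current_domains[var], key=ruled_out)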
6.3.2 Interleaving search and inference
So far we have seen how AC-3 and other algorithms can infer reductions in the domain of
variables before we begin the search. But inference can be even more powerful in the course
of a search: every time we make a choice of a value for a variable, we have a brand-new
opportunity to infer new domain reductions on the neighboring variables.
One of the simplest forms of inference is called forward checking. Whenever a vari-
able X is assigned, the forward-checking process establishes arc consistency for it: for each
unassigned variable Y that is connected to X by a constraint, delete from Y ’s domain any
value that is inconsistent with the value chosen for X. Because forward checking only does
arc consistency inferences, there is no reason to do forward checking if we have already done
arc consistency as a preprocessing step.
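A sketch of forward checking as an INFERENCE step (illustrative only): after assigning var = value, prune inconsistent values from each unassigned neighbor and record the removals so they can be undone on backtracking.

# Forward checking: returns a list of (variable, value) removals, or None on a wipeout.
def forward_check(var, value, current_domains, neighbors, assignment, conflicts):
    removals = []
    for n in neighbors[var]:
        if n in assignment:
            continue
        for w in list(current_domains[n]):
            if conflicts(var, value, n, w):
                current_domains[n].remove(w)
                removals.append((n, w))
        if not current_domains[n]:
            return None                       # some neighbor has no legal values left
    return removals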
Figure 6.7 shows the progress of backtracking search on the Australia CSP with for-
ward checking. There are two important points to notice about this example. First, notice
that after WA = red and Q = green are assigned, the domains of NT and SA are reduced
to a single value; we have eliminated branching on these variables altogether by propagat-
ing information from WA and Q. A second point to notice is that after V = blue, the do-
main of SA is empty. Hence, forward checking has detected that the partial assignment
{WA = red, Q = green, V = blue} is inconsistent with the constraints of the problem, and
the algorithm will therefore backtrack immediately.
For many problems the search will be more effective if we combine the MRV heuris-
tic with forward checking. Consider Figure 6.7 after assigning {WA = red}. Intuitively, it
seems that that assignment constrains its neighbors, NT and SA, so we should handle those
Initial domains
After WA=red
After Q=green
After V=blue
Figure 6.7 The progress of a map-coloring search with forward checking. WA = red
is assigned first; then forward checking deletes red from the domains of the neighboring
variables NT and SA. After Q = green is assigned, green is deleted from the domains of
NT, SA, and NSW . After V = blue is assigned, blue is deleted from the domains of NSW
and SA, leaving SA with no legal values.
variables next, and then all the other variables will fall into place. That’s exactly what hap-
pens with MRV: NT and SA have two values, so one of them is chosen first, then the other,
then Q, NSW , and V in order. Finally T still has three values, and any one of them works.
We can view forward checking as an efficient way to incrementally compute the information
that the MRV heuristic needs to do its job.
Although forward checking detects many inconsistencies, it does not detect all of them.
The problem is that it makes the current variable arc-consistent, but doesn’t look ahead and
make all the other variables arc-consistent. For example, consider the third row of Figure 6.7.
It shows that when WA is red and Q is green, both NT and SA are forced to be blue. Forward
checking does not look far enough ahead to notice that this is an inconsistency: NT and SA
are adjacent and so cannot have the same value.
The algorithm called MAC (for Maintaining Arc Consistency) detects this
inconsistency. After a variable Xi is assigned a value, the INFERENCE procedure calls AC-3,
but instead of a queue of all arcs in the CSP, we start with only the arcs (Xj, Xi) for all
Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint
propagation in the usual way, and if any variable has its domain reduced to the empty set, the
call to AC-3 fails and we know to backtrack immediately. We can see that MAC is strictly
more powerful than forward checking because forward checking does the same thing as MAC
on the initial arcs in MAC’s queue; but unlike MAC, forward checking does not recursively
propagate constraints when changes are made to the domains of variables.
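In code, MAC is just AC-3 started from a restricted queue. A sketch, assuming an AC-3 routine like the earlier one but extended to accept an initial queue of arcs:

# Maintaining Arc Consistency: after assigning Xi, run AC-3 seeded only with the
# arcs (Xj, Xi) for the unassigned neighbors Xj of Xi.
def mac(Xi, domains, constraints, neighbors, assignment):
    seed = [(Xj, Xi) for Xj in neighbors[Xi] if Xj not in assignment]
    return ac3(domains, constraints, neighbors, queue=seed)   # assumed AC-3 variant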
6.3.3 Intelligent backtracking: Looking backward
The BACKTRACKING-SEARCH algorithm in Figure 6.5 has a very simple policy for what to
do when a branch of the search fails: back up to the preceding variable and try a different
value for it. This is called chronological backtracking because the most recent decision
point is revisited. In this subsection, we consider better possibilities.
Consider what happens when we apply simple backtracking in Figure 6.1 with a fixed
variable ordering Q, NSW , V , T, SA, WA, NT. Suppose we have generated the partial
assignment {Q = red, NSW = green, V = blue, T = red}. When we try the next variable,
SA, we see that every value violates a constraint. We back up to T and try a new color for
Tasmania! Obviously this is silly—recoloring Tasmania cannot possibly resolve the problem
with South Australia.
A more intelligent approach to backtracking is to backtrack to a variable that might fix
the problem—a variable that was responsible for making one of the possible values of SA
impossible. To do this, we will keep track of a set of assignments that are in conflict with
some value for SA. The set (in this case {Q = red, NSW = green, V = blue}) is called the
conflict set for SA. The backjumping method backtracks to the most recent assignment in
the conflict set; in this case, backjumping would jump over Tasmania and try a new value
for V . This method is easily implemented by a modification to BACKTRACK such that it
accumulates the conflict set while checking for a legal value to assign. If no legal value is
found, the algorithm should return the most recent element of the conflict set along with the
failure indicator.
The sharp-eyed reader will have noticed that forward checking can supply the conflict
set with no extra work: whenever forward checking based on an assignment X = x deletes a
value from Y ’s domain, it should add X = x to Y ’s conflict set. If the last value is deleted
from Y ’s domain, then the assignments in the conflict set of Y are added to the conflict set
of X. Then, when we get to Y , we know immediately where to backtrack if needed.
The eagle-eyed reader will have noticed something odd: backjumping occurs when
every value in a domain is in conflict with the current assignment; but forward checking
detects this event and prevents the search from ever reaching such a node! In fact, it can be
shown that every branch pruned by backjumping is also pruned by forward checking. Hence,
simple backjumping is redundant in a forward-checking search or, indeed, in a search that
uses stronger consistency checking, such as MAC.
Despite the observations of the preceding paragraph, the idea behind backjumping re-
mains a good one: to backtrack based on the reasons for failure. Backjumping notices failure
when a variable’s domain becomes empty, but in many cases a branch is doomed long before
this occurs. Consider again the partial assignment {WA = red, NSW = red} (which, from
our earlier discussion, is inconsistent). Suppose we try T = red next and then assign NT, Q,
V , SA. We know that no assignment can work for these last four variables, so eventually we
run out of values to try at NT. Now, the question is, where to backtrack? Backjumping cannot
work, because NT does have values consistent with the preceding assigned variables—NT
doesn’t have a complete conflict set of preceding variables that caused it to fail. We know,
however, that the four variables NT, Q, V , and SA, taken together, failed because of a set of
preceding variables, which must be those variables that directly conflict with the four. This
leads to a deeper notion of the conflict set for a variable such as NT: it is that set of preced-
ing variables that caused NT, together with any subsequent variables, to have no consistent
solution. In this case, the set is WA and NSW , so the algorithm should backtrack to NSW
and skip over Tasmania. A backjumping algorithm that uses conflict sets defined in this way
is called conflict-directed backjumping.
We must now explain how these new conflict sets are computed. The method is in
fact quite simple. The “terminal” failure of a branch of the search always occurs because a
variable’s domain becomes empty; that variable has a standard conflict set. In our example,
SA fails, and its conflict set is (say) {WA, NT, Q}. We backjump to Q, and Q absorbs
the conflict set from SA (minus Q itself, of course) into its own direct conflict set, which is
{NT, NSW }; the new conflict set is {WA, NT, NSW }. That is, there is no solution from
Q onward, given the preceding assignment to {WA, NT, NSW }. Therefore, we backtrack
to NT, the most recent of these. NT absorbs {WA, NT, NSW } − {NT} into its own
direct conflict set {WA}, giving {WA, NSW } (as stated in the previous paragraph). Now
the algorithm backjumps to NSW , as we would hope. To summarize: let Xj be the current
variable, and let conf (Xj) be its conflict set. If every possible value for Xj fails, backjump
to the most recent variable Xi in conf (Xj), and set
conf (Xi) ← conf (Xi) ∪ conf (Xj) − {Xi} .
When we reach a contradiction, backjumping can tell us how far to back up, so we don’t
waste time changing variables that won’t fix the problem. But we would also like to avoid
running into the same problem again. When the search arrives at a contradiction, we know
that some subset of the conflict set is responsible for the problem. Constraint learning is the
idea of finding a minimum set of variables from the conflict set that causes the problem. This
set of variables, along with their corresponding values, is called a no-good. We then record
the no-good, either by adding a new constraint to the CSP or by keeping a separate cache of
no-goods.
For example, consider the state {WA = red, NT = green, Q = blue} in the bottom
row of Figure 6.6. Forward checking can tell us this state is a no-good because there is no
valid assignment to SA. In this particular case, recording the no-good would not help, because
once we prune this branch from the search tree, we will never encounter this combination
again. But suppose that the search tree in Figure 6.6 were actually part of a larger search tree
that started by first assigning values for V and T. Then it would be worthwhile to record
{WA = red, NT = green, Q = blue} as a no-good because we are going to run into the
same problem again for each possible set of assignments to V and T.
No-goods can be effectively used by forward checking or by backjumping. Constraint
learning is one of the most important techniques used by modern CSP solvers to achieve
efficiency on complex problems.
6.4 LOCAL SEARCH FOR CSPS
Local search algorithms (see Section 4.1) turn out to be effective in solving many CSPs. They
use a complete-state formulation: the initial state assigns a value to every variable, and the
search changes the value of one variable at a time. For example, in the 8-queens problem (see
Figure 4.3), the initial state might be a random configuration of 8 queens in 8 columns, and
each step moves a single queen to a new position in its column. Typically, the initial guess
violates several constraints. The point of local search is to eliminate the violated constraints.2
In choosing a new value for a variable, the most obvious heuristic is to select the value
that results in the minimum number of conflicts with other variables—the min-conflicts
2 Local search can easily be extended to constraint optimization problems (COPs). In that case, all the techniques
for hill climbing and simulated annealing can be applied to optimize the objective function.
function MIN-CONFLICTS(csp,max steps) returns a solution or failure
inputs: csp, a constraint satisfaction problem
max steps, the number of steps allowed before giving up
current ← an initial complete assignment for csp
for i = 1 to max steps do
if current is a solution for csp then return current
var ← a randomly chosen conflicted variable from csp.VARIABLES
value ← the value v for var that minimizes CONFLICTS(var,v,current,csp)
set var = value in current
return failure
Figure 6.8 The MIN-CONFLICTS algorithm for solving CSPs by local search. The initial
state may be chosen randomly or by a greedy assignment process that chooses a minimal-
conflict value for each variable in turn. The CONFLICTS function counts the number of
constraints violated by a particular value, given the rest of the current assignment.
Figure 6.9 A two-step solution using min-conflicts for an 8-queens problem. At each
stage, a queen is chosen for reassignment in its column. The number of conflicts (in this
case, the number of attacking queens) is shown in each square. The algorithm moves the
queen to the min-conflicts square, breaking ties randomly.
heuristic. The algorithm is shown in Figure 6.8 and its application to an 8-queens problem is
diagrammed in Figure 6.9.
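A compact Python version of Figure 6.8 (a sketch, not the authors' code); conflicts(var, value, current) is an assumed helper that counts the constraints violated if var were given value in the otherwise unchanged assignment, and ties are broken arbitrarily here rather than randomly as in Figure 6.9.

import random

def min_conflicts(variables, domains, conflicts, max_steps=10000):
    current = {v: random.choice(list(domains[v])) for v in variables}   # random initial state
    for _ in range(max_steps):
        conflicted = [v for v in variables if conflicts(v, current[v], current) > 0]
        if not conflicted:
            return current                                              # solution found
        var = random.choice(conflicted)
        current[var] = min(domains[var],                                # min-conflicts value
                           key=lambda val: conflicts(var, val, current))
    return None                                                         # give up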
Min-conflicts is surprisingly effective for many CSPs. Amazingly, on the n-queens
problem, if you don’t count the initial placement of queens, the run time of min-conflicts is
roughly independent of problem size. It solves even the million-queens problem in an aver-
age of 50 steps (after the initial assignment). This remarkable observation was the stimulus
leading to a great deal of research in the 1990s on local search and the distinction between
easy and hard problems, which we take up in Chapter 7. Roughly speaking, n-queens is
easy for local search because solutions are densely distributed throughout the state space.
Min-conflicts also works well for hard problems. For example, it has been used to schedule
observations for the Hubble Space Telescope, reducing the time taken to schedule a week of
observations from three weeks (!) to around 10 minutes.
All the local search techniques from Section 4.1 are candidates for application to CSPs,
and some of those have proved especially effective. The landscape of a CSP under the min-
conflicts heuristic usually has a series of plateaux. There may be millions of variable as-
signments that are only one conflict away from a solution. Plateau search—allowing side-
ways moves to another state with the same score—can help local search find its way off this
plateau. This wandering on the plateau can be directed with tabu search: keeping a small
list of recently visited states and forbidding the algorithm to return to those states. Simulated
annealing can also be used to escape from plateaux.
Another technique, called constraint weighting, can help concentrate the search on the
important constraints. Each constraint is given a numeric weight, Wi, initially all 1. At each
step of the search, the algorithm chooses a variable/value pair to change that will result in the
lowest total weight of all violated constraints. The weights are then adjusted by incrementing
the weight of each constraint that is violated by the current assignment. This has two benefits:
it adds topography to plateaux, making sure that it is possible to improve from the current
state, and it also, over time, adds weight to the constraints that are proving difficult to solve.
Another advantage of local search is that it can be used in an online setting when the
problem changes. This is particularly important in scheduling problems. A week’s airline
schedule may involve thousands of flights and tens of thousands of personnel assignments,
but bad weather at one airport can render the schedule infeasible. We would like to repair the
schedule with a minimum number of changes. This can be easily done with a local search
algorithm starting from the current schedule. A backtracking search with the new set of
constraints usually requires much more time and might find a solution with many changes
from the current schedule.
6.5 THE STRUCTURE OF PROBLEMS
In this section, we examine ways in which the structure of the problem, as represented by
the constraint graph, can be used to find solutions quickly. Most of the approaches here also
apply to other problems besides CSPs, such as probabilistic reasoning. After all, the only way
we can possibly hope to deal with the real world is to decompose it into many subproblems.
Looking again at the constraint graph for Australia (Figure 6.1(b), repeated as Figure 6.12(a)),
one fact stands out: Tasmania is not connected to the mainland.3 Intuitively, it is obvious that
coloring Tasmania and coloring the mainland are independent subproblems—any solution
for the mainland combined with any solution for Tasmania yields a solution for the whole
map. Independence can be ascertained simply by finding connected components of the
constraint graph. Each component corresponds to a subproblem CSPi. If assignment Si is
a solution of CSPi, then ⋃i Si is a solution of ⋃i CSPi. Why is this important? Consider
the following: suppose each CSPi has c variables from the total of n variables, where c is
a constant. Then there are n/c subproblems, each of which takes at most d^c work to solve,
3 A careful cartographer or patriotic Tasmanian might object that Tasmania should not be colored the same as
its nearest mainland neighbor, to avoid the impression that it might be part of that state.
where d is the size of the domain. Hence, the total work is O(d^c n/c), which is linear in n;
without the decomposition, the total work is O(d^n), which is exponential in n. Let's make
this more concrete: dividing a Boolean CSP with 80 variables into four subproblems reduces
the worst-case solution time from the lifetime of the universe down to less than a second.
Completely independent subproblems are delicious, then, but rare. Fortunately, some
other graph structures are also easy to solve. For example, a constraint graph is a tree when
any two variables are connected by only one path. We show that any tree-structured CSP can
be solved in time linear in the number of variables.4 The key is a new notion of consistency,
called directed arc consistency or DAC. A CSP is defined to be directed arc-consistent under
an ordering of variables X1, X2, . . . , Xn if and only if every Xi is arc-consistent with each
Xj for j > i.
To solve a tree-structured CSP, first pick any variable to be the root of the tree, and
choose an ordering of the variables such that each variable appears after its parent in the tree.
Such an ordering is called a topological sort. Figure 6.10(a) shows a sample tree and (b)
shows one possible ordering. Any tree with n nodes has n−1 arcs, so we can make this graph
directed arc-consistent in O(n) steps, each of which must compare up to d possible domain
values for two variables, for a total time of O(nd^2). Once we have a directed arc-consistent
graph, we can just march down the list of variables and choose any remaining value. Since
each link from a parent to its child is arc consistent, we know that for any value we choose for
the parent, there will be a valid value left to choose for the child. That means we won’t have
to backtrack; we can move linearly through the variables. The complete algorithm is shown
in Figure 6.11.
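A Python sketch of this procedure (illustrative only), assuming a topological ordering, a parent map, and symmetric binary predicates pred(parent_value, child_value):

# Solve a tree-structured binary CSP given a topological order and a parent map.
def tree_csp_solver(order, parents, domains, pred):
    domains = {v: set(d) for v, d in domains.items()}
    for child in reversed(order[1:]):                 # make each parent consistent with its child
        p = parents[child]
        domains[p] = {x for x in domains[p] if any(pred(x, y) for y in domains[child])}
        if not domains[p]:
            return None                               # contradiction detected
    assignment = {}
    for v in order:                                   # assign root first; no backtracking needed
        choices = (domains[v] if v == order[0]
                   else {y for y in domains[v] if pred(assignment[parents[v]], y)})
        if not choices:
            return None
        assignment[v] = next(iter(choices))
    return assignment

# Two-coloring a small hypothetical tree with "adjacent variables must differ".
order = ["A", "B", "C", "D", "E", "F"]
parents = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "D"}
print(tree_csp_solver(order, parents, {v: {"red", "green"} for v in order},
                      lambda x, y: x != y))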
Figure 6.10 (a) The constraint graph of a tree-structured CSP. (b) A linear ordering of the
variables consistent with the tree with A as the root. This is known as a topological sort of
the variables.
Now that we have an efficient algorithm for trees, we can consider whether more general
constraint graphs can be reduced to trees somehow. There are two primary ways to do this,
one based on removing nodes and one based on collapsing nodes together.
The first approach involves assigning values to some variables so that the remaining
variables form a tree. Consider the constraint graph for Australia, shown again in Fig-
ure 6.12(a). If we could delete South Australia, the graph would become a tree, as in (b).
Fortunately, we can do this (in the graph, not the continent) by fixing a value for SA and
4 Sadly, very few regions of the world have tree-structured maps, although Sulawesi comes close.
function TREE-CSP-SOLVER(csp) returns a solution, or failure
inputs: csp, a CSP with components X, D, C
n ← number of variables in X
assignment ← an empty assignment
root ← any variable in X
X ← TOPOLOGICALSORT(X ,root)
for j = n down to 2 do
MAKE-ARC-CONSISTENT(PARENT(Xj),Xj)
if it cannot be made consistent then return failure
for i = 1 to n do
assignment[Xi] ← any consistent value from Di
if there is no consistent value then return failure
return assignment
Figure 6.11 The TREE-CSP-SOLVER algorithm for solving tree-structured CSPs. If the
CSP has a solution, we will find it in linear time; if not, we will detect a contradiction.
Figure 6.12 (a) The original constraint graph from Figure 6.1. (b) The constraint graph
after the removal of SA.
deleting from the domains of the other variables any values that are inconsistent with the
value chosen for SA.
Now, any solution for the CSP after SA and its constraints are removed will be con-
sistent with the value chosen for SA. (This works for binary CSPs; the situation is more
complicated with higher-order constraints.) Therefore, we can solve the remaining tree with
the algorithm given above and thus solve the whole problem. Of course, in the general case
(as opposed to map coloring), the value chosen for SA could be the wrong one, so we would
need to try each possible value. The general algorithm is as follows:
1. Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree
after removal of S. S is called a cycle cutset.
2. For each possible assignment to the variables in S that satisfies all constraints on S,
(a) remove from the domains of the remaining variables any values that are inconsis-
tent with the assignment for S, and
(b) if the remaining CSP has a solution, return it together with the assignment for S.
If the cycle cutset has size c, then the total run time is O(d^c · (n − c)d^2): we have to try each
of the d^c combinations of values for the variables in S, and for each combination we must
solve a tree problem of size n − c. If the graph is “nearly a tree,” then c will be small and the
savings over straight backtracking will be huge. In the worst case, however, c can be as large
as (n − 2). Finding the smallest cycle cutset is NP-hard, but several efficient approximation
algorithms are known. The overall algorithmic approach is called cutset conditioning; it
comes up again in Chapter 14, where it is used for reasoning about probabilities.
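A sketch of cutset conditioning (illustrative only); solve_tree is an assumed tree solver over the pruned residual domains, such as the one sketched earlier, and conflicts(a, va, b, vb) reports whether two assignments clash:

from itertools import product

# Enumerate assignments to the cutset, prune the other domains, solve the residual tree.
def cutset_conditioning(cutset, domains, conflicts, solve_tree):
    others = [v for v in domains if v not in cutset]
    for values in product(*(domains[v] for v in cutset)):
        cut = dict(zip(cutset, values))
        if any(conflicts(a, cut[a], b, cut[b]) for a in cutset for b in cutset if a != b):
            continue                                   # cutset assignment itself inconsistent
        pruned = {v: {w for w in domains[v]
                      if not any(conflicts(c, cut[c], v, w) for c in cutset)}
                  for v in others}
        if all(pruned.values()):
            rest = solve_tree(pruned)
            if rest is not None:
                return {**cut, **rest}
    return None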
The second approach is based on constructing a tree decomposition of the constraint
graph into a set of connected subproblems. Each subproblem is solved independently, and the
resulting solutions are then combined. Like most divide-and-conquer algorithms, this works
well if no subproblem is too large. Figure 6.13 shows a tree decomposition of the map-
coloring problem into five subproblems. A tree decomposition must satisfy the following
three requirements:
• Every variable in the original problem appears in at least one of the subproblems.
• If two variables are connected by a constraint in the original problem, they must appear
together (along with the constraint) in at least one of the subproblems.
• If a variable appears in two subproblems in the tree, it must appear in every subproblem
along the path connecting those subproblems.
The first two conditions ensure that all the variables and constraints are represented in the
decomposition. The third condition seems rather technical, but simply reflects the constraint
that any given variable must have the same value in every subproblem in which it appears;
the links joining subproblems in the tree enforce this constraint. For example, SA appears in
all four of the connected subproblems in Figure 6.13. You can verify from Figure 6.12 that
this decomposition makes sense.
We solve each subproblem independently; if any one has no solution, we know the en-
tire problem has no solution. If we can solve all the subproblems, then we attempt to construct
a global solution as follows. First, we view each subproblem as a “mega-variable” whose do-
main is the set of all solutions for the subproblem. For example, the leftmost subproblem in
Figure 6.13 is a map-coloring problem with three variables and hence has six solutions—one
is {WA = red, SA = blue, NT = green}. Then, we solve the constraints connecting the
subproblems, using the efficient algorithm for trees given earlier. The constraints between
subproblems simply insist that the subproblem solutions agree on their shared variables. For
example, given the solution {WA = red, SA = blue, NT = green} for the first subproblem,
the only consistent solution for the next subproblem is {SA = blue, NT = green, Q = red}.
A given constraint graph admits many tree decompositions; in choosing a decompo-
sition, the aim is to make the subproblems as small as possible. The tree width of a tree
Figure 6.13 A tree decomposition of the constraint graph in Figure 6.12(a).
decomposition of a graph is one less than the size of the largest subproblem; the tree width
of the graph itself is defined to be the minimum tree width among all its tree decompositions.
If a graph has tree width w and we are given the corresponding tree decomposition, then the
problem can be solved in O(ndw+1) time. Hence, CSPs with constraint graphs of bounded
tree width are solvable in polynomial time. Unfortunately, finding the decomposition with
minimal tree width is NP-hard, but there are heuristic methods that work well in practice.
So far, we have looked at the structure of the constraint graph. There can be important
structure in the values of variables as well. Consider the map-coloring problem with n colors.
For every consistent solution, there is actually a set of n! solutions formed by permuting the
color names. For example, on the Australia map we know that WA, NT, and SA must all have
different colors, but there are 3! = 6 ways to assign the three colors to these three regions.
This is called value symmetry. We would like to reduce the search space by a factor of
n! by breaking the symmetry. We do this by introducing a symmetry-breaking constraint.
For our example, we might impose an arbitrary ordering constraint, NT < SA < WA, that
requires the three values to be in alphabetical order. This constraint ensures that only one of
the n! solutions is possible: {NT = blue, SA = green, WA = red}.
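A quick way to see the effect of this constraint is to filter the 3! candidate colorings, as in the short, purely illustrative Python check below:

from itertools import permutations

colors = ('blue', 'green', 'red')

# All 3! = 6 ways to color {NT, SA, WA} with three distinct colors ...
all_solutions = [dict(zip(('NT', 'SA', 'WA'), p)) for p in permutations(colors)]

# ... but the symmetry-breaking constraint NT < SA < WA (alphabetical order
# on the value names) keeps exactly one of them.
canonical = [s for s in all_solutions if s['NT'] < s['SA'] < s['WA']]
print(canonical)   # [{'NT': 'blue', 'SA': 'green', 'WA': 'red'}]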
For map coloring, it was easy to find a constraint that eliminates the symmetry, and
in general it is possible to find constraints that eliminate all but one symmetric solution in
polynomial time, but it is NP-hard to eliminate all symmetry among intermediate sets of
values during search. In practice, breaking value symmetry has proved to be important and
effective on a wide range of problems.
6.6 SUMMARY
• Constraint satisfaction problems (CSPs) represent a state with a set of variable/value
pairs and represent the conditions for a solution by a set of constraints on the variables.
Many important real-world problems can be described as CSPs.
• A number of inference techniques use the constraints to infer which variable/value pairs
are consistent and which are not. These include node, arc, path, and k-consistency.
• Backtracking search, a form of depth-first search, is commonly used for solving CSPs.
Inference can be interwoven with search.
• The minimum-remaining-values and degree heuristics are domain-independent meth-
ods for deciding which variable to choose next in a backtracking search. The least-
constraining-value heuristic helps in deciding which value to try first for a given
variable. Backtracking occurs when no legal assignment can be found for a variable.
Conflict-directed backjumping backtracks directly to the source of the problem.
• Local search using the min-conflicts heuristic has also been applied to constraint satis-
faction problems with great success.
• The complexity of solving a CSP is strongly related to the structure of its constraint
graph. Tree-structured problems can be solved in linear time. Cutset conditioning can
reduce a general CSP to a tree-structured one and is quite efficient if a small cutset can
be found. Tree decomposition techniques transform the CSP into a tree of subproblems
and are efficient if the tree width of the constraint graph is small.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
The earliest work related to constraint satisfaction dealt largely with numerical constraints.
Equational constraints with integer domains were studied by the Indian mathematician Brah-
magupta in the seventh century; they are often called Diophantine equations, after the Greek
mathematician Diophantus (c. 200–284), who actually considered the domain of positive ra-
tionals. Systematic methods for solving linear equations by variable elimination were studied
by Gauss (1829); the solution of linear inequality constraints goes back to Fourier (1827).
Finite-domain constraint satisfaction problems also have a long history. For example,
graph coloring (of which map coloring is a special case) is an old problem in mathematics.
The four-color conjecture (that every planar graph can be colored with four or fewer colors)
was first made by Francis Guthrie, a student of De Morgan, in 1852. It resisted solution—
despite several published claims to the contrary—until a proof was devised by Appel and
Haken (1977) (see the book Four Colors Suffice (Wilson, 2004)). Purists were disappointed
that part of the proof relied on a computer, so Georges Gonthier (2008), using the COQ
theorem prover, derived a formal proof that Appel and Haken’s proof was correct.
Specific classes of constraint satisfaction problems occur throughout the history of
computer science. One of the most influential early examples was the SKETCHPAD sys-
tem (Sutherland, 1963), which solved geometric constraints in diagrams and was the fore-
runner of modern drawing programs and CAD tools. The identification of CSPs as a general
class is due to Ugo Montanari (1974). The reduction of higher-order CSPs to purely binary
CSPs with auxiliary variables (see Exercise 6.5) is due originally to the 19th-century logician
Charles Sanders Peirce. It was introduced into the CSP literature by Dechter (1990b) and
was elaborated by Bacchus and van Beek (1998). CSPs with preferences among solutions are
studied widely in the optimization literature; see Bistarelli et al. (1997) for a generalization
of the CSP framework to allow for preferences. The bucket-elimination algorithm (Dechter,
1999) can also be applied to optimization problems.
Constraint propagation methods were popularized by Waltz’s (1975) success on poly-
hedral line-labeling problems for computer vision. Waltz showed that, in many problems,
propagation completely eliminates the need for backtracking. Montanari (1974) introduced
the notion of constraint networks and propagation by path consistency. Alan Mackworth
(1977) proposed the AC-3 algorithm for enforcing arc consistency as well as the general idea
of combining backtracking with some degree of consistency enforcement. AC-4, a more
efficient arc-consistency algorithm, was developed by Mohr and Henderson (1986). Soon af-
ter Mackworth’s paper appeared, researchers began experimenting with the tradeoff between
the cost of consistency enforcement and the benefits in terms of search reduction. Haralick
and Elliot (1980) favored the minimal forward-checking algorithm described by McGregor
(1979), whereas Gaschnig (1979) suggested full arc-consistency checking after each vari-
able assignment—an algorithm later called MAC by Sabin and Freuder (1994). The latter
paper provides somewhat convincing evidence that, on harder CSPs, full arc-consistency
checking pays off. Freuder (1978, 1982) investigated the notion of k-consistency and its
relationship to the complexity of solving CSPs. Apt (1999) describes a generic algorithmic
framework within which consistency propagation algorithms can be analyzed, and Bessière
(2006) presents a current survey.
Special methods for handling higher-order or global constraints were developed first
within the context of constraint logic programming. Marriott and Stuckey (1998) provide
excellent coverage of research in this area. The Alldiff constraint was studied by Regin
(1994), Stergiou and Walsh (1999), and van Hoeve (2001). Bounds constraints were incorpo-
rated into constraint logic programming by Van Hentenryck et al. (1998). A survey of global
constraints is provided by van Hoeve and Katriel (2006).
Sudoku has become the most widely known CSP and was described as such by Simonis
(2005). Agerbeck and Hansen (2008) describe some of the strategies and show that Sudoku
on an n^2 × n^2 board is in the class of NP-hard problems. Reeson et al. (2007) show an
interactive solver based on CSP techniques.
The idea of backtracking search goes back to Golomb and Baumert (1965), and its
application to constraint satisfaction is due to Bitner and Reingold (1975), although they trace
the basic algorithm back to the 19th century. Bitner and Reingold also introduced the MRV
heuristic, which they called the most-constrained-variable heuristic. Brelaz (1979) used the
degree heuristic as a tiebreaker after applying the MRV heuristic. The resulting algorithm,
despite its simplicity, is still the best method for k-coloring arbitrary graphs. Haralick and
Elliot (1980) proposed the least-constraining-value heuristic.
The basic backjumping method is due to John Gaschnig (1977, 1979). Kondrak and
van Beek (1997) showed that this algorithm is essentially subsumed by forward checking.
Conflict-directed backjumping was devised by Prosser (1993). The most general and pow-
erful form of intelligent backtracking was actually developed very early on by Stallman and
Sussman (1977). Their technique of dependency-directed backtracking led to the develop-
ment of truth maintenance systems (Doyle, 1979), which we discuss in Section 12.6.2. The
connection between the two areas is analyzed by de Kleer (1989).
The work of Stallman and Sussman also introduced the idea of constraint learning,
in which partial results obtained by search can be saved and reused later in the search. The
idea was formalized by Dechter (1990a). Backmarking (Gaschnig, 1979) is a particularly sim-
ple method in which consistent and inconsistent pairwise assignments are saved and used
to avoid rechecking constraints. Backmarking can be combined with conflict-directed back-
jumping; Kondrak and van Beek (1997) present a hybrid algorithm that provably subsumes
either method taken separately. The method of dynamic backtracking (Ginsberg, 1993) re-
tains successful partial assignments from later subsets of variables when backtracking over
an earlier choice that does not invalidate the later success.
Empirical studies of several randomized backtracking methods were done by Gomes
et al. (2000) and Gomes and Selman (2001). Van Beek (2006) surveys backtracking.
Local search in constraint satisfaction problems was popularized by the work of Kirk-
patrick et al. (1983) on simulated annealing (see Chapter 4), which is widely used for schedul-
ing problems. The min-conflicts heuristic was first proposed by Gu (1989) and was developed
independently by Minton et al. (1992). Sosic and Gu (1994) showed how it could be applied
to solve the 3,000,000 queens problem in less than a minute. The astounding success of
local search using min-conflicts on the n-queens problem led to a reappraisal of the nature
and prevalence of “easy” and “hard” problems. Peter Cheeseman et al. (1991) explored the
difficulty of randomly generated CSPs and discovered that almost all such problems either
are trivially easy or have no solutions. Only if the parameters of the problem generator are
set in a certain narrow range, within which roughly half of the problems are solvable, do we
find “hard” problem instances. We discuss this phenomenon further in Chapter 7. Konolige
(1994) showed that local search is inferior to backtracking search on problems with a certain
degree of local structure; this led to work that combined local search and inference, such as
that by Pinkas and Dechter (1995). Hoos and Tsang (2006) survey local search techniques.
Work relating the structure and complexity of CSPs originates with Freuder (1985), who
showed that search on arc consistent trees works without any backtracking. A similar result,
with extensions to acyclic hypergraphs, was developed in the database community (Beeri
et al., 1983). Bayardo and Miranker (1994) present an algorithm for tree-structured CSPs
that runs in linear time without any preprocessing.
Since those papers were published, there has been a great deal of progress in developing
more general results relating the complexity of solving a CSP to the structure of its constraint
graph. The notion of tree width was introduced by the graph theorists Robertson and Seymour
(1986). Dechter and Pearl (1987, 1989), building on the work of Freuder, applied a related
notion (which they called induced width) to constraint satisfaction problems and developed
the tree decomposition approach sketched in Section 6.5. Drawing on this work and on results
from database theory, Gottlob et al. (1999a, 1999b) developed a notion, hypertree width, that
is based on the characterization of the CSP as a hypergraph. In addition to showing that any
CSP with hypertree width w can be solved in time O(n^(w+1) log n), they also showed that
hypertree width subsumes all previously defined measures of “width” in the sense that there
are cases where the hypertree width is bounded and the other measures are unbounded.
Interest in look-back approaches to backtracking was rekindled by the work of Bayardo
and Schrag (1997), whose RELSAT algorithm combined constraint learning and backjumping
and was shown to outperform many other algorithms of the time. This led to AND/OR
search algorithms applicable to both CSPs and probabilistic reasoning (Dechter and Ma-
teescu, 2007). Brown et al. (1988) introduce the idea of symmetry breaking in CSPs, and
Gent et al. (2006) give a recent survey.
The field of distributed constraint satisfaction looks at solving CSPs when there is a
collection of agents, each of which controls a subset of the constraint variables. There have
been annual workshops on this problem since 2000, and good coverage elsewhere (Collin
et al., 1999; Pearce et al., 2008; Shoham and Leyton-Brown, 2009).
Comparing CSP algorithms is mostly an empirical science: few theoretical results show
that one algorithm dominates another on all problems; instead, we need to run experiments
to see which algorithms perform better on typical instances of problems. As Hooker (1995)
points out, we need to be careful to distinguish between competitive testing—as occurs in
competitions among algorithms based on run time—and scientific testing, whose goal is to
identify the properties of an algorithm that determine its efficacy on a class of problems.
The recent textbooks by Apt (2003) and Dechter (2003), and the collection by Rossi
et al. (2006) are excellent resources on constraint processing. There are several good earlier
surveys, including those by Kumar (1992), Dechter and Frost (2002), and Bartak (2001); and
the encyclopedia articles by Dechter (1992) and Mackworth (1992). Pearson and Jeavons
(1997) survey tractable classes of CSPs, covering both structural decomposition methods
and methods that rely on properties of the domains or constraints themselves. Kondrak and
van Beek (1997) give an analytical survey of backtracking search algorithms, and Bacchus
and van Run (1995) give a more empirical survey. Constraint programming is covered in the
books by Apt (2003) and Fruhwirth and Abdennadher (2003). Several interesting applications
are described in the collection edited by Freuder and Mackworth (1994). Papers on constraint
satisfaction appear regularly in Artificial Intelligence and in the specialist journal Constraints.
The primary conference venue is the International Conference on Principles and Practice of
Constraint Programming, often called CP.
EXERCISES
6.1 How many solutions are there for the map-coloring problem in Figure 6.1? How many
solutions if four colors are allowed? Two colors?
6.2 Consider the problem of constructing (not solving) crossword puzzles:5 fitting words
into a rectangular grid. The grid, which is given as part of the problem, specifies which
squares are blank and which are shaded. Assume that a list of words (i.e., a dictionary)
is provided and that the task is to fill in the blank squares by using any subset of the list.
Formulate this problem precisely in two ways:
a. As a general search problem. Choose an appropriate search algorithm and specify a
heuristic function. Is it better to fill in blanks one letter at a time or one word at a time?
b. As a constraint satisfaction problem. Should the variables be words or letters?
Which formulation do you think will be better? Why?
6.3 Give precise formulations for each of the following as constraint satisfaction problems:
a. Rectilinear floor-planning: find non-overlapping places in a large rectangle for a number
of smaller rectangles.
b. Class scheduling: There is a fixed number of professors and classrooms, a list of classes
to be offered, and a list of possible time slots for classes. Each professor has a set of
classes that he or she can teach.
c. Hamiltonian tour: given a network of cities connected by roads, choose an order to visit
all cities in a country without repeating any.
6.4 Solve the cryptarithmetic problem in Figure 6.2 by hand, using the strategy of back-
tracking with forward checking and the MRV and least-constraining-value heuristics.
6.5 Show how a single ternary constraint such as “A + B = C” can be turned into three
binary constraints by using an auxiliary variable. You may assume finite domains. (Hint:
Consider a new variable that takes on values that are pairs of other values, and consider
constraints such as “X is the first element of the pair Y .”) Next, show how constraints with
more than three variables can be treated similarly. Finally, show how unary constraints can be
eliminated by altering the domains of variables. This completes the demonstration that any
CSP can be transformed into a CSP with only binary constraints.
6.6 Consider the following logic puzzle: In five houses, each with a different color, live five
persons of different nationalities, each of whom prefers a different brand of candy, a different
drink, and a different pet. Given the following facts, the questions to answer are “Where does
the zebra live, and in which house do they drink water?”
The Englishman lives in the red house.
The Spaniard owns the dog.
The Norwegian lives in the first house on the left.
The green house is immediately to the right of the ivory house.
The man who eats Hershey bars lives in the house next to the man with the fox.
Kit Kats are eaten in the yellow house.
The Norwegian lives next to the blue house.
5 Ginsberg et al. (1990) discuss several methods for constructing crossword puzzles. Littman et al. (1999) tackle
the harder problem of solving them.
The Smarties eater owns snails.
The Snickers eater drinks orange juice.
The Ukrainian drinks tea.
The Japanese eats Milky Ways.
Kit Kats are eaten in a house next to the house where the horse is kept.
Coffee is drunk in the green house.
Milk is drunk in the middle house.
Discuss different representations of this problem as a CSP. Why would one prefer one repre-
sentation over another?
6.7 Consider the graph with 8 nodes A1, A2, A3, A4, H, T, F1, F2. Ai is connected to
Ai+1 for all i, each Ai is connected to H, H is connected to T, and T is connected to each
Fi. Find a 3-coloring of this graph by hand using the following strategy: backtracking with
conflict-directed backjumping, the variable order A1, H, A4, F1, A2, F2, A3, T, and the
value order R, G, B.
6.8 Explain why it is a good heuristic to choose the variable that is most constrained but the
value that is least constraining in a CSP search.
6.9 Generate random instances of map-coloring problems as follows: scatter n points on
the unit square; select a point X at random, connect X by a straight line to the nearest point
Y such that X is not already connected to Y and the line crosses no other line; repeat the
previous step until no more connections are possible. The points represent regions on the
map and the lines connect neighbors. Now try to find k-colorings of each map, for both
k = 3 and k = 4, using min-conflicts, backtracking, backtracking with forward checking, and
backtracking with MAC. Construct a table of average run times for each algorithm for values
of n up to the largest you can manage. Comment on your results.
6.10 Use the AC-3 algorithm to show that arc consistency can detect the inconsistency of
the partial assignment {WA = red, V = blue} for the problem shown in Figure 6.1.
6.11 What is the worst-case complexity of running AC-3 on a tree-structured CSP?
6.12 AC-3 puts back on the queue every arc (Xk, Xi) whenever any value is deleted from
the domain of Xi, even if each value of Xk is consistent with several remaining values of Xi.
Suppose that, for every arc (Xk, Xi), we keep track of the number of remaining values of Xi
that are consistent with each value of Xk. Explain how to update these numbers efficiently
and hence show that arc consistency can be enforced in total time O(n^2 d^2).
6.13 The TREE-CSP-SOLVER (Figure 6.10) makes arcs consistent starting at the leaves and
working backwards towards the root. Why does it do that? What would happen if it went in
the opposite direction?
6.14 We introduced Sudoku as a CSP to be solved by search over partial assignments be-
cause that is the way people generally undertake solving Sudoku problems. It is also possible,
of course, to attack these problems with local search over complete assignments. How well
would a local solver using the min-conflicts heuristic do on Sudoku problems?
6.15 Define in your own words the terms constraint, commutativity, arc consistency, back-
jumping, min-conflicts, and cycle cutset.
6.16 Suppose that a graph is known to have a cycle cutset of no more than k nodes. Describe
a simple algorithm for finding a minimal cycle cutset whose run time is not much more than
O(n^k) for a CSP with n variables. Search the literature for methods for finding approximately
minimal cycle cutsets in time that is polynomial in the size of the cutset. Does the existence
of such algorithms make the cycle cutset method practical?
6.17 Consider the problem of tiling a surface (completely and exactly covering it) with n
dominoes (2 × 1 rectangles). The surface is an arbitrary edge-connected (i.e., adjacent along
an edge, not just a corner) collection of 2n 1×1 squares (e.g., a checkerboard, a checkerboard
with some squares missing, a 10 × 1 row of squares, etc.).
a. Formulate this problem precisely as a CSP where the dominoes are the variables.
b. Formulate this problem precisely as a CSP where the squares are the variables, keeping
the state space as small as possible. (Hint: does it matter which particular domino goes
on a given pair of squares?)
c. Construct a surface consisting of 6 squares such that your CSP formulation from part
(b) has a tree-structured constraint graph.
d. Describe exactly the set of solvable instances that have a tree-structured constraint
graph.
7 LOGICAL AGENTS
In which we design agents that can form representations of a complex world, use a
process of inference to derive new representations about the world, and use these
new representations to deduce what to do.
Humans, it seems, know things; and what they know helps them do things. These are
not empty statements. They make strong claims about how the intelligence of humans is
achieved—not by purely reflex mechanisms but by processes of reasoning that operate on
internal representations of knowledge. In AI, this approach to intelligence is embodied in
knowledge-based agents.
The problem-solving agents of Chapters 3 and 4 know things, but only in a very limited,
inflexible sense. For example, the transition model for the 8-puzzle—knowledge of what the
actions do—is hidden inside the domain-specific code of the RESULT function. It can be
used to predict the outcome of actions but not to deduce that two tiles cannot occupy the
same space or that states with odd parity cannot be reached from states with even parity. The
atomic representations used by problem-solving agents are also very limiting. In a partially
observable environment, an agent’s only choice for representing what it knows about the
current state is to list all possible concrete states—a hopeless prospect in large environments.
Chapter 6 introduced the idea of representing states as assignments of values to vari-
ables; this is a step in the right direction, enabling some parts of the agent to work in a
domain-independent way and allowing for more efficient algorithms. In this chapter and
those that follow, we take this step to its logical conclusion, so to speak—we develop logic
as a general class of representations to support knowledge-based agents. Such agents can
combine and recombine information to suit myriad purposes. Often, this process can be quite
far removed from the needs of the moment—as when a mathematician proves a theorem or
an astronomer calculates the earth’s life expectancy. Knowledge-based agents can accept new
tasks in the form of explicitly described goals; they can achieve competence quickly by being
told or learning new knowledge about the environment; and they can adapt to changes in the
environment by updating the relevant knowledge.
We begin in Section 7.1 with the overall agent design. Section 7.2 introduces a sim-
ple new environment, the wumpus world, and illustrates the operation of a knowledge-based
agent without going into any technical detail. Then we explain the general principles of logic
in Section 7.3 and the specifics of propositional logic in Section 7.4. While less expressive
than first-order logic (Chapter 8), propositional logic illustrates all the basic concepts of
logic; it also comes with well-developed inference technologies, which we describe in sec-
tions 7.5 and 7.6. Finally, Section 7.7 combines the concept of knowledge-based agents with
the technology of propositional logic to build some simple agents for the wumpus world.
7.1 KNOWLEDGE-BASED AGENTS
The central component of a knowledge-based agent is its knowledge base, or KB. A knowl-
edge base is a set of sentences. (Here “sentence” is used as a technical term. It is related
but not identical to the sentences of English and other natural languages.) Each sentence is
expressed in a language called a knowledge representation language and represents some
assertion about the world. Sometimes we dignify a sentence with the name axiom, when the
sentence is taken as given without being derived from other sentences.
There must be a way to add new sentences to the knowledge base and a way to query
what is known. The standard names for these operations are TELL and ASK, respectively.
Both operations may involve inference—that is, deriving new sentences from old. Inference
must obey the requirement that when one ASKs a question of the knowledge base, the answer
should follow from what has been told (or TELLed) to the knowledge base previously. Later
in this chapter, we will be more precise about the crucial word “follow.” For now, take it to
mean that the inference process should not make things up as it goes along.
Figure 7.1 shows the outline of a knowledge-based agent program. Like all our agents,
it takes a percept as input and returns an action. The agent maintains a knowledge base, KB,
which may initially contain some background knowledge.
Each time the agent program is called, it does three things. First, it TELLs the knowl-
edge base what it perceives. Second, it ASKs the knowledge base what action it should
perform. In the process of answering this query, extensive reasoning may be done about
the current state of the world, about the outcomes of possible action sequences, and so on.
Third, the agent program TELLs the knowledge base which action was chosen, and the agent
executes the action.
The details of the representation language are hidden inside three functions that imple-
ment the interface between the sensors and actuators on one side and the core representation
and reasoning system on the other. MAKE-PERCEPT-SENTENCE constructs a sentence as-
serting that the agent perceived the given percept at the given time. MAKE-ACTION-QUERY
constructs a sentence that asks what action should be done at the current time. Finally,
MAKE-ACTION-SENTENCE constructs a sentence asserting that the chosen action was ex-
ecuted. The details of the inference mechanisms are hidden inside TELL and ASK. Later
sections will reveal these details.
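For concreteness, here is a minimal Python transliteration of the agent program of Figure 7.1 below. The knowledge-base interface (tell/ask) and the sentence strings built by the three helpers are assumptions made for illustration; a real implementation would build sentences in a proper representation language.

def make_percept_sentence(percept, t):
    # Illustrative only: encode "the agent perceived `percept` at time t".
    return f"Percept({percept}, {t})"

def make_action_query(t):
    return f"BestAction?({t})"

def make_action_sentence(action, t):
    return f"Executed({action}, {t})"

class KBAgent:
    """A sketch of the KB-AGENT program of Figure 7.1; the kb object is
    assumed to provide tell(sentence) and ask(query)."""
    def __init__(self, kb):
        self.kb = kb
        self.t = 0            # time counter, as in the figure

    def __call__(self, percept):
        self.kb.tell(make_percept_sentence(percept, self.t))
        action = self.kb.ask(make_action_query(self.t))
        self.kb.tell(make_action_sentence(action, self.t))
        self.t += 1
        return action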
The agent in Figure 7.1 appears quite similar to the agents with internal state described
in Chapter 2. Because of the definitions of TELL and ASK, however, the knowledge-based
agent is not an arbitrary program for calculating actions. It is amenable to a description at
function KB-AGENT(percept) returns an action
persistent: KB, a knowledge base
t, a counter, initially 0, indicating time
TELL(KB, MAKE-PERCEPT-SENTENCE(percept,t))
action ← ASK(KB, MAKE-ACTION-QUERY(t))
TELL(KB, MAKE-ACTION-SENTENCE(action,t))
t ← t + 1
return action
Figure 7.1 A generic knowledge-based agent. Given a percept, the agent adds the percept
to its knowledge base, asks the knowledge base for the best action, and tells the knowledge
base that it has in fact taken that action.
the knowledge level, where we need specify only what the agent knows and what its goals
are, in order to fix its behavior. For example, an automated taxi might have the goal of
taking a passenger from San Francisco to Marin County and might know that the Golden
Gate Bridge is the only link between the two locations. Then we can expect it to cross the
Golden Gate Bridge because it knows that that will achieve its goal. Notice that this analysis
is independent of how the taxi works at the implementation level. It doesn’t matter whether
its geographical knowledge is implemented as linked lists or pixel maps, or whether it reasons
by manipulating strings of symbols stored in registers or by propagating noisy signals in a
network of neurons.
A knowledge-based agent can be built simply by TELLing it what it needs to know.
Starting with an empty knowledge base, the agent designer can TELL sentences one by one
until the agent knows how to operate in its environment. This is called the declarative ap-
proach to system building. In contrast, the procedural approach encodes desired behaviors
directly as program code. In the 1970s and 1980s, advocates of the two approaches engaged
in heated debates. We now understand that a successful agent often combines both declarative
and procedural elements in its design, and that declarative knowledge can often be compiled
into more efficient procedural code.
We can also provide a knowledge-based agent with mechanisms that allow it to learn
for itself. These mechanisms, which are discussed in Chapter 18, create general knowledge
about the environment from a series of percepts. A learning agent can be fully autonomous.
7.2 THE WUMPUS WORLD
In this section we describe an environment in which knowledge-based agents can show their
worth. The wumpus world is a cave consisting of rooms connected by passageways. Lurking
somewhere in the cave is the terrible wumpus, a beast that eats anyone who enters its room.
The wumpus can be shot by an agent, but the agent has only one arrow. Some rooms contain
bottomless pits that will trap anyone who wanders into these rooms (except for the wumpus,
which is too big to fall in). The only mitigating feature of this bleak environment is the
possibility of finding a heap of gold. Although the wumpus world is rather tame by modern
computer game standards, it illustrates some important points about intelligence.
A sample wumpus world is shown in Figure 7.2. The precise definition of the task
environment is given, as suggested in Section 2.3, by the PEAS description:
• Performance measure: +1000 for climbing out of the cave with the gold, –1000 for
falling into a pit or being eaten by the wumpus, –1 for each action taken and –10 for
using up the arrow. The game ends either when the agent dies or when the agent climbs
out of the cave.
• Environment: A 4 × 4 grid of rooms. The agent always starts in the square labeled
[1,1], facing to the right. The locations of the gold and the wumpus are chosen ran-
domly, with a uniform distribution, from the squares other than the start square. In
addition, each square other than the start can be a pit, with probability 0.2.
• Actuators: The agent can move Forward, TurnLeft by 90◦, or TurnRight by 90◦. The
agent dies a miserable death if it enters a square containing a pit or a live wumpus. (It
is safe, albeit smelly, to enter a square with a dead wumpus.) If an agent tries to move
forward and bumps into a wall, then the agent does not move. The action Grab can be
used to pick up the gold if it is in the same square as the agent. The action Shoot can
be used to fire an arrow in a straight line in the direction the agent is facing. The arrow
continues until it either hits (and hence kills) the wumpus or hits a wall. The agent has
only one arrow, so only the first Shoot action has any effect. Finally, the action Climb
can be used to climb out of the cave, but only from square [1,1].
• Sensors: The agent has five sensors, each of which gives a single bit of information:
– In the square containing the wumpus and in the directly (not diagonally) adjacent
squares, the agent will perceive a Stench.
– In the squares directly adjacent to a pit, the agent will perceive a Breeze.
– In the square where the gold is, the agent will perceive a Glitter.
– When an agent walks into a wall, it will perceive a Bump.
– When the wumpus is killed, it emits a woeful Scream that can be perceived any-
where in the cave.
The percepts will be given to the agent program in the form of a list of five symbols;
for example, if there is a stench and a breeze, but no glitter, bump, or scream, the agent
program will get [Stench, Breeze, None, None, None].
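One possible way to encode this five-element percept in code (an illustrative assumption, not a prescribed format) is a simple named tuple:

from collections import namedtuple

# The field names here are assumptions made for illustration.
Percept = namedtuple('Percept', ['stench', 'breeze', 'glitter', 'bump', 'scream'])

# "Stench and breeze, but no glitter, bump, or scream":
p = Percept('Stench', 'Breeze', None, None, None)
print(list(p))   # ['Stench', 'Breeze', None, None, None]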
We can characterize the wumpus environment along the various dimensions given in Chap-
ter 2. Clearly, it is discrete, static, and single-agent. (The wumpus doesn’t move, fortunately.)
It is sequential, because rewards may come only after many actions are taken. It is partially
observable, because some aspects of the state are not directly perceivable: the agent’s lo-
cation, the wumpus’s state of health, and the availability of an arrow. As for the locations
of the pits and the wumpus: we could treat them as unobserved parts of the state that hap-
pen to be immutable—in which case, the transition model for the environment is completely
known; or we could say that the transition model itself is unknown because the agent doesn’t
know which Forward actions are fatal—in which case, discovering the locations of pits and
wumpus completes the agent’s knowledge of the transition model.

[Figure 7.2: A typical wumpus world. The agent is in the bottom left corner, facing right.]
For an agent in the environment, the main challenge is its initial ignorance of the con-
figuration of the environment; overcoming this ignorance seems to require logical reasoning.
In most instances of the wumpus world, it is possible for the agent to retrieve the gold safely.
Occasionally, the agent must choose between going home empty-handed and risking death to
find the gold. About 21% of the environments are utterly unfair, because the gold is in a pit
or surrounded by pits.
Let us watch a knowledge-based wumpus agent exploring the environment shown in
Figure 7.2. We use an informal knowledge representation language consisting of writing
down symbols in a grid (as in Figures 7.3 and 7.4).
The agent’s initial knowledge base contains the rules of the environment, as described
previously; in particular, it knows that it is in [1,1] and that [1,1] is a safe square; we denote
that with an “A” and “OK,” respectively, in square [1,1].
The first percept is [None, None, None, None, None], from which the agent can con-
clude that its neighboring squares, [1,2] and [2,1], are free of dangers—they are OK. Fig-
ure 7.3(a) shows the agent’s state of knowledge at this point.
A cautious agent will move only into a square that it knows to be OK. Let us suppose
the agent decides to move forward to [2,1]. The agent perceives a breeze (denoted by “B”) in
[2,1], so there must be a pit in a neighboring square. The pit cannot be in [1,1], by the rules of
the game, so there must be a pit in [2,2] or [3,1] or both. The notation “P?” in Figure 7.3(b)
indicates a possible pit in those squares. At this point, there is only one known square that is
OK and that has not yet been visited. So the prudent agent will turn around, go back to [1,1],
and then proceed to [1,2].
[Figure 7.3: The first step taken by the agent in the wumpus world. (a) The initial situation, after percept [None, None, None, None, None]. (b) After one move, with percept [None, Breeze, None, None, None]. Legend: A = Agent, B = Breeze, G = Glitter/Gold, P = Pit, S = Stench, W = Wumpus, OK = safe square, V = visited.]

[Figure 7.4: Two later stages in the progress of the agent. (a) After the third move, with percept [Stench, None, None, None, None]. (b) After the fifth move, with percept [Stench, Breeze, Glitter, None, None].]

The agent perceives a stench in [1,2], resulting in the state of knowledge shown in
Figure 7.4(a). The stench in [1,2] means that there must be a wumpus nearby. But the
wumpus cannot be in [1,1], by the rules of the game, and it cannot be in [2,2] (or the agent
would have detected a stench when it was in [2,1]). Therefore, the agent can infer that the
wumpus is in [1,3]. The notation W! indicates this inference. Moreover, the lack of a breeze
in [1,2] implies that there is no pit in [2,2]. Yet the agent has already inferred that there must
be a pit in either [2,2] or [3,1], so this means it must be in [3,1]. This is a fairly difficult
inference, because it combines knowledge gained at different times in different places and
relies on the lack of a percept to make one crucial step.
The agent has now proved to itself that there is neither a pit nor a wumpus in [2,2], so it
is OK to move there. We do not show the agent’s state of knowledge at [2,2]; we just assume
that the agent turns and moves to [2,3], giving us Figure 7.4(b). In [2,3], the agent detects a
glitter, so it should grab the gold and then return home.
Note that in each case for which the agent draws a conclusion from the available in-
formation, that conclusion is guaranteed to be correct if the available information is correct.
This is a fundamental property of logical reasoning. In the rest of this chapter, we describe
how to build logical agents that can represent information and draw conclusions such as those
described in the preceding paragraphs.
7.3 LOGIC
This section summarizes the fundamental concepts of logical representation and reasoning.
These beautiful ideas are independent of any of logic’s particular forms. We therefore post-
pone the technical details of those forms until the next section, using instead the familiar
example of ordinary arithmetic.
In Section 7.1, we said that knowledge bases consist of sentences. These sentences
are expressed according to the syntax of the representation language, which specifies all the
sentences that are well formed. The notion of syntax is clear enough in ordinary arithmetic:
“x + y = 4” is a well-formed sentence, whereas “x4y+ =” is not.
A logic must also define the semantics or meaning of sentences. The semantics defines
the truth of each sentence with respect to each possible world. For example, the semantics
for arithmetic specifies that the sentence “x + y = 4” is true in a world where x is 2 and y
is 2, but false in a world where x is 1 and y is 1. In standard logics, every sentence must be
either true or false in each possible world—there is no “in between.”1
When we need to be precise, we use the term model in place of “possible world.”
Whereas possible worlds might be thought of as (potentially) real environments that the agent
might or might not be in, models are mathematical abstractions, each of which simply fixes
the truth or falsehood of every relevant sentence. Informally, we may think of a possible world
as, for example, having x men and y women sitting at a table playing bridge, and the sentence
x + y = 4 is true when there are four people in total. Formally, the possible models are just
all possible assignments of real numbers to the variables x and y. Each such assignment fixes
the truth of any sentence of arithmetic whose variables are x and y. If a sentence α is true in
model m, we say that m satisfies α or sometimes m is a model of α. We use the notation
M(α) to mean the set of all models of α.
Now that we have a notion of truth, we are ready to talk about logical reasoning. This
involves the relation of logical entailment between sentences—the idea that a sentence fol-
lows logically from another sentence. In mathematical notation, we write
α |= β
to mean that the sentence α entails the sentence β.

1 Fuzzy logic, discussed in Chapter 14, allows for degrees of truth.

[Figure 7.5: Possible models for the presence of pits in squares [1,2], [2,2], and [3,1]. The KB corresponding to the observations of nothing in [1,1] and a breeze in [2,1] is shown by the solid line. (a) Dotted line shows models of α1 (no pit in [1,2]). (b) Dotted line shows models of α2 (no pit in [2,2]).]

The formal definition of entailment is this:
α |= β if and only if, in every model in which α is true, β is also true. Using the notation just
introduced, we can write
α |= β if and only if M(α) ⊆ M(β) .
(Note the direction of the ⊆ here: if α |= β, then α is a stronger assertion than β: it rules out
more possible worlds.) The relation of entailment is familiar from arithmetic; we are happy
with the idea that the sentence x = 0 entails the sentence xy = 0. Obviously, in any model
where x is zero, it is the case that xy is zero (regardless of the value of y).
We can apply the same kind of analysis to the wumpus-world reasoning example given
in the preceding section. Consider the situation in Figure 7.3(b): the agent has detected
nothing in [1,1] and a breeze in [2,1]. These percepts, combined with the agent’s knowledge
of the rules of the wumpus world, constitute the KB. The agent is interested (among other
things) in whether the adjacent squares [1,2], [2,2], and [3,1] contain pits. Each of the three
squares might or might not contain a pit, so (for the purposes of this example) there are 2^3 = 8
possible models. These eight models are shown in Figure 7.5.
The KB can be thought of as a set of sentences or as a single sentence that asserts all
the individual sentences. The KB is false in models that contradict what the agent knows—
for example, the KB is false in any model in which [1,2] contains a pit, because there is
no breeze in [1,1]. There are in fact just three models in which the KB is true, and these are
2 Although the figure shows the models as partial wumpus worlds, they are really nothing more than assignments
of true and false to the sentences “there is a pit in [1,2]” etc. Models, in the mathematical sense, do not need to
have ’orrible ’airy wumpuses in them.
shown surrounded by a solid line in Figure 7.5. Now let us consider two possible conclusions:
α1 = “There is no pit in [1,2].”
α2 = “There is no pit in [2,2].”
We have surrounded the models of α1 and α2 with dotted lines in Figures 7.5(a) and 7.5(b),
respectively. By inspection, we see the following:
in every model in which KB is true, α1 is also true.
Hence, KB |= α1: there is no pit in [1,2]. We can also see that
in some models in which KB is true, α2 is false.
Hence, KB ⊭ α2: the agent cannot conclude that there is no pit in [2,2]. (Nor can it conclude
that there is a pit in [2,2].)3
The preceding example not only illustrates entailment but also shows how the definition
of entailment can be applied to derive conclusions—that is, to carry out logical inference.
The inference algorithm illustrated in Figure 7.5 is called model checking, because it enu-
merates all possible models to check that α is true in all models in which KB is true, that is,
that M(KB) ⊆ M(α).
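The enumeration in Figure 7.5 is easy to reproduce in a few lines of Python. The following is only a sketch: the KB is hand-translated into a Boolean test over the three pit symbols rather than built from the rules, and the symbol names are assumptions for illustration.

from itertools import product

def entails(kb, alpha, symbols):
    """Model checking: KB |= alpha iff alpha is true in every model where KB is true."""
    models = (dict(zip(symbols, values))
              for values in product((True, False), repeat=len(symbols)))
    return all(alpha(m) for m in models if kb(m))

# The three unknown squares of Figure 7.5 as proposition symbols.
symbols = ('P12', 'P22', 'P31')

# Hand-translated KB: no breeze in [1,1] rules out a pit in [1,2];
# the breeze in [2,1] requires a pit in [2,2] or [3,1].
kb = lambda m: (not m['P12']) and (m['P22'] or m['P31'])

alpha1 = lambda m: not m['P12']    # "there is no pit in [1,2]"
alpha2 = lambda m: not m['P22']    # "there is no pit in [2,2]"

print(entails(kb, alpha1, symbols))   # True:  KB |= alpha1
print(entails(kb, alpha2, symbols))   # False: KB does not entail alpha2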
In understanding entailment and inference, it might help to think of the set of all conse-
quences of KB as a haystack and of α as a needle. Entailment is like the needle being in the
haystack; inference is like finding it. This distinction is embodied in some formal notation: if
an inference algorithm i can derive α from KB, we write
KB ⊢i α ,
which is pronounced “α is derived from KB by i” or “i derives α from KB.”
An inference algorithm that derives only entailed sentences is called sound or truth-
preserving. Soundness is a highly desirable property. An unsound inference procedure es-
sentially makes things up as it goes along—it announces the discovery of nonexistent needles.
It is easy to see that model checking, when it is applicable,4 is a sound procedure.
The property of completeness is also desirable: an inference algorithm is complete if
it can derive any sentence that is entailed. For real haystacks, which are finite in extent,
it seems obvious that a systematic examination can always decide whether the needle is in
the haystack. For many knowledge bases, however, the haystack of consequences is infinite,
and completeness becomes an important issue.5 Fortunately, there are complete inference
procedures for logics that are sufficiently expressive to handle many knowledge bases.
We have described a reasoning process whose conclusions are guaranteed to be true
in any world in which the premises are true; in particular, if KB is true in the real world,
then any sentence α derived from KB by a sound inference procedure is also true in the real
world. So, while an inference process operates on “syntax”—internal physical configurations
such as bits in registers or patterns of electrical blips in brains—the process corresponds
3 The agent can calculate the probability that there is a pit in [2,2]; Chapter 13 shows how.
4 Model checking works if the space of models is finite—for example, in wumpus worlds of fixed size. For
arithmetic, on the other hand, the space of models is infinite: even if we restrict ourselves to the integers, there
are infinitely many pairs of values for x and y in the sentence x + y = 4.
5 Compare with the case of infinite search spaces in Chapter 3, where depth-first search is not complete.
to the real-world relationship whereby some aspect of the real world is the case6 by virtue
of other aspects of the real world being the case. This correspondence between world and
representation is illustrated in Figure 7.6.

[Figure 7.6: Sentences are physical configurations of the agent, and reasoning is a process of constructing new physical configurations from old ones. Logical reasoning should ensure that the new configurations represent aspects of the world that actually follow from the aspects that the old configurations represent.]
The final issue to consider is grounding—the connection between logical reasoning
processes and the real environment in which the agent exists. In particular, how do we know
that KB is true in the real world? (After all, KB is just “syntax” inside the agent’s head.)
This is a philosophical question about which many, many books have been written. (See
Chapter 26.) A simple answer is that the agent’s sensors create the connection. For example,
our wumpus-world agent has a smell sensor. The agent program creates a suitable sentence
whenever there is a smell. Then, whenever that sentence is in the knowledge base, it is
true in the real world. Thus, the meaning and truth of percept sentences are defined by the
processes of sensing and sentence construction that produce them. What about the rest of the
agent’s knowledge, such as its belief that wumpuses cause smells in adjacent squares? This
is not a direct representation of a single percept, but a general rule—derived, perhaps, from
perceptual experience but not identical to a statement of that experience. General rules like
this are produced by a sentence construction process called learning, which is the subject
of Part V. Learning is fallible. It could be the case that wumpuses cause smells except on
February 29 in leap years, which is when they take their baths. Thus, KB may not be true in
the real world, but with good learning procedures, there is reason for optimism.
7.4 PROPOSITIONAL LOGIC: A VERY SIMPLE LOGIC
We now present a simple but powerful logic called propositional logic. We cover the syntax
of propositional logic and its semantics—the way in which the truth of sentences is deter-
mined. Then we look at entailment—the relation between a sentence and another sentence
that follows from it—and see how this leads to a simple algorithm for logical inference. Ev-
erything takes place, of course, in the wumpus world.
6 As Wittgenstein (1922) put it in his famous Tractatus: “The world is everything that is the case.”
7.4.1 Syntax
The syntax of propositional logic defines the allowable sentences. The atomic sentences
consist of a single proposition symbol. Each such symbol stands for a proposition that can
be true or false. We use symbols that start with an uppercase letter and may contain other
letters or subscripts, for example: P, Q, R, W1,3 and North. The names are arbitrary but
are often chosen to have some mnemonic value—we use W1,3 to stand for the proposition
that the wumpus is in [1,3]. (Remember that symbols such as W1,3 are atomic, i.e., W, 1,
and 3 are not meaningful parts of the symbol.) There are two proposition symbols with fixed
meanings: True is the always-true proposition and False is the always-false proposition.
Complex sentences are constructed from simpler sentences, using parentheses and logical
connectives. There are five connectives in common use:
¬ (not). A sentence such as ¬W1,3 is called the negation of W1,3. A literal is either an
atomic sentence (a positive literal) or a negated atomic sentence (a negative literal).
∧ (and). A sentence whose main connective is ∧, such as W1,3 ∧ P3,1, is called a con-
junction; its parts are the conjuncts. (The ∧ looks like an “A” for “And.”)
∨ (or). A sentence using ∨, such as (W1,3 ∧P3,1)∨W2,2, is a disjunction of the disjuncts
(W1,3 ∧ P3,1) and W2,2. (Historically, the ∨ comes from the Latin “vel,” which means
“or.” For most people, it is easier to remember ∨ as an upside-down ∧.)
⇒ (implies). A sentence such as (W1,3 ∧P3,1) ⇒ ¬W2,2 is called an implication (or con-
ditional). Its premise or antecedent is (W1,3 ∧P3,1), and its conclusion or consequent
is ¬W2,2. Implications are also known as rules or if–then statements. The implication
symbol is sometimes written in other books as ⊃ or →.
⇔ (if and only if). The sentence W1,3 ⇔ ¬W2,2 is a biconditional. Some other books
write this as ≡.
Sentence → AtomicSentence | ComplexSentence
AtomicSentence → True | False | P | Q | R | . . .
ComplexSentence → ( Sentence ) | [ Sentence ]
| ¬ Sentence
| Sentence ∧ Sentence
| Sentence ∨ Sentence
| Sentence ⇒ Sentence
| Sentence ⇔ Sentence
OPERATOR PRECEDENCE : ¬, ∧, ∨, ⇒, ⇔
Figure 7.7 A BNF (Backus–Naur Form) grammar of sentences in propositional logic,
along with operator precedences, from highest to lowest.
Figure 7.7 gives a formal grammar of propositional logic; see page 1066 if you are not
familiar with the BNF notation. The BNF grammar by itself is ambiguous; a sentence with
several operators can be parsed by the grammar in multiple ways. To eliminate the ambiguity
we define a precedence for each operator. The “not” operator (¬) has the highest precedence,
which means that in the sentence ¬A ∧ B the ¬ binds most tightly, giving us the equivalent
of (¬A)∧B rather than ¬(A∧B). (The notation for ordinary arithmetic is the same: −2+4
is 2, not –6.) When in doubt, use parentheses to make sure of the right interpretation. Square
brackets mean the same thing as parentheses; the choice of square brackets or parentheses is
solely to make it easier for a human to read a sentence.
7.4.2 Semantics
Having specified the syntax of propositional logic, we now specify its semantics. The se-
mantics defines the rules for determining the truth of a sentence with respect to a particular
model. In propositional logic, a model simply fixes the truth value—true or false—for ev-
ery proposition symbol. For example, if the sentences in the knowledge base make use of the
proposition symbols P1,2, P2,2, and P3,1, then one possible model is
m1 = {P1,2 = false, P2,2 = false, P3,1 = true} .
With three proposition symbols, there are 2^3 = 8 possible models—exactly those depicted
in Figure 7.5. Notice, however, that the models are purely mathematical objects with no
necessary connection to wumpus worlds. P1,2 is just a symbol; it might mean “there is a pit
in [1,2]” or “I’m in Paris today and tomorrow.”
The semantics for propositional logic must specify how to compute the truth value of
any sentence, given a model. This is done recursively. All sentences are constructed from
atomic sentences and the five connectives; therefore, we need to specify how to compute the
truth of atomic sentences and how to compute the truth of sentences formed with each of the
five connectives. Atomic sentences are easy:
• True is true in every model and False is false in every model.
• The truth value of every other proposition symbol must be specified directly in the
model. For example, in the model m1 given earlier, P1,2 is false.
For complex sentences, we have five rules, which hold for any subsentences P and Q in any
model m (here “iff” means “if and only if”):
• ¬P is true iff P is false in m.
• P ∧ Q is true iff both P and Q are true in m.
• P ∨ Q is true iff either P or Q is true in m.
• P ⇒ Q is true unless P is true and Q is false in m.
• P ⇔ Q is true iff P and Q are both true or both false in m.
The rules can also be expressed with truth tables that specify the truth value of a complex
sentence for each possible assignment of truth values to its components. Truth tables for the
five connectives are given in Figure 7.8. From these tables, the truth value of any sentence s
can be computed with respect to any model m by a simple recursive evaluation. For example,
P       Q       ¬P      P ∧ Q   P ∨ Q   P ⇒ Q   P ⇔ Q
false   false   true    false   false   true    true
false   true    true    false   true    true    false
true    false   false   false   true    false   false
true    true    false   true    true    true    true
Figure 7.8 Truth tables for the five logical connectives. To use the table to compute, for
example, the value of P ∨ Q when P is true and Q is false, first look on the left for the row
where P is true and Q is false (the third row). Then look in that row under the P ∨Q column
to see the result: true.
the sentence ¬P1,2 ∧ (P2,2 ∨ P3,1), evaluated in m1, gives true ∧ (false ∨ true) = true ∧
true = true. Exercise 7.3 asks you to write the algorithm PL-TRUE?(s, m), which computes
the truth value of a propositional logic sentence s in a model m.
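Since the evaluation is a simple recursion over the structure of the sentence, it is easy to sketch in code. The following Python fragment is our own illustration, not the book's: sentences are encoded as nested tuples whose first element names the connective, atoms are strings, and a model is a dictionary from symbol names to truth values.

def pl_true(s, model):
    """Recursively evaluate a propositional sentence s in a model (a sketch)."""
    if s is True or s is False:              # the constants True and False
        return s
    if isinstance(s, str):                   # a proposition symbol
        return model[s]
    op, *args = s
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return all(pl_true(a, model) for a in args)
    if op == "or":
        return any(pl_true(a, model) for a in args)
    if op == "implies":
        return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == "iff":
        return pl_true(args[0], model) == pl_true(args[1], model)
    raise ValueError("unknown connective: " + str(op))

# The evaluation from the text: ¬P1,2 ∧ (P2,2 ∨ P3,1) in model m1.
m1 = {"P12": False, "P22": False, "P31": True}
assert pl_true(("and", ("not", "P12"), ("or", "P22", "P31")), m1)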
The truth tables for “and,” “or,” and “not” are in close accord with our intuitions about
the English words. The main point of possible confusion is that P ∨ Q is true when P is true
or Q is true or both. A different connective, called “exclusive or” (“xor” for short), yields
false when both disjuncts are true.7 There is no consensus on the symbol for exclusive or;
some choices are ∨̇ or ≠ or ⊕.
The truth table for ⇒ may not quite fit one’s intuitive understanding of “P implies Q”
or “if P then Q.” For one thing, propositional logic does not require any relation of causation
or relevance between P and Q. The sentence “5 is odd implies Tokyo is the capital of Japan”
is a true sentence of propositional logic (under the normal interpretation), even though it is
a decidedly odd sentence of English. Another point of confusion is that any implication is
true whenever its antecedent is false. For example, “5 is even implies Sam is smart” is true,
regardless of whether Sam is smart. This seems bizarre, but it makes sense if you think of
“P ⇒ Q” as saying, “If P is true, then I am claiming that Q is true. Otherwise I am making
no claim.” The only way for this sentence to be false is if P is true but Q is false.
The biconditional, P ⇔ Q, is true whenever both P ⇒ Q and Q ⇒ P are true. In
English, this is often written as “P if and only if Q.” Many of the rules of the wumpus world
are best written using ⇔. For example, a square is breezy if a neighboring square has a pit,
and a square is breezy only if a neighboring square has a pit. So we need a biconditional,
B1,1 ⇔ (P1,2 ∨ P2,1) ,
where B1,1 means that there is a breeze in [1,1].
7.4.3 A simple knowledge base
Now that we have defined the semantics for propositional logic, we can construct a knowledge
base for the wumpus world. We focus first on the immutable aspects of the wumpus world,
leaving the mutable aspects for a later section. For now, we need the following symbols for
each [x, y] location:
7 Latin has a separate word, aut, for exclusive or.
Px,y is true if there is a pit in [x, y].
Wx,y is true if there is a wumpus in [x, y], dead or alive.
Bx,y is true if the agent perceives a breeze in [x, y].
Sx,y is true if the agent perceives a stench in [x, y].
The sentences we write will suffice to derive ¬P1,2 (there is no pit in [1,2]), as was done
informally in Section 7.3. We label each sentence Ri so that we can refer to them:
• There is no pit in [1,1]:
R1 : ¬P1,1 .
• A square is breezy if and only if there is a pit in a neighboring square. This has to be
stated for each square; for now, we include just the relevant squares:
R2 : B1,1 ⇔ (P1,2 ∨ P2,1) .
R3 : B2,1 ⇔ (P1,1 ∨ P2,2 ∨ P3,1) .
• The preceding sentences are true in all wumpus worlds. Now we include the breeze
percepts for the first two squares visited in the specific world the agent is in, leading up
to the situation in Figure 7.3(b).
R4 : ¬B1,1 .
R5 : B2,1 .
7.4.4 A simple inference procedure
Our goal now is to decide whether KB |= α for some sentence α. For example, is ¬P1,2
entailed by our KB? Our first algorithm for inference is a model-checking approach that is a
direct implementation of the definition of entailment: enumerate the models, and check that
α is true in every model in which KB is true. Models are assignments of true or false to
every proposition symbol. Returning to our wumpus-world example, the relevant proposi-
tion symbols are B1,1, B2,1, P1,1, P1,2, P2,1, P2,2, and P3,1. With seven symbols, there are
2⁷ = 128 possible models; in three of these, KB is true (Figure 7.9). In those three models,
¬P1,2 is true, hence there is no pit in [1,2]. On the other hand, P2,2 is true in two of the three
models and false in one, so we cannot yet tell whether there is a pit in [2,2].
Figure 7.9 reproduces in a more precise form the reasoning illustrated in Figure 7.5. A
general algorithm for deciding entailment in propositional logic is shown in Figure 7.10. Like
the BACKTRACKING-SEARCH algorithm on page 215, TT-ENTAILS? performs a recursive
enumeration of a finite space of assignments to symbols. The algorithm is sound because it
implements directly the definition of entailment, and complete because it works for any KB
and α and always terminates—there are only finitely many models to examine.
Of course, “finitely many” is not always the same as “few.” If KB and α contain n
symbols in all, then there are 2ⁿ models. Thus, the time complexity of the algorithm is
O(2ⁿ). (The space complexity is only O(n) because the enumeration is depth-first.) Later in
this chapter we show algorithms that are much more efficient in many cases. Unfortunately,
propositional entailment is co-NP-complete (i.e., probably no easier than NP-complete—see
Appendix A), so every known inference algorithm for propositional logic has a worst-case
complexity that is exponential in the size of the input.
B1,1 B2,1 P1,1 P1,2 P2,1 P2,2 P3,1 R1 R2 R3 R4 R5 KB
false false false false false false false true true true true false false
false false false false false false true true true false true false false
⋮
false true false false false false false true true false true true false
false true false false false false true true true true true true true
false true false false false true false true true true true true true
false true false false false true true true true true true true true
false true false false true false false true false false true true false
⋮
true true true true true true true false true true false true false
Figure 7.9 A truth table constructed for the knowledge base given in the text. KB is true
if R1 through R5 are true, which occurs in just 3 of the 128 rows (the ones underlined in the
right-hand column). In all 3 rows, P1,2 is false, so there is no pit in [1,2]. On the other hand,
there might (or might not) be a pit in [2,2].
function TT-ENTAILS?(KB,α) returns true or false
inputs: KB, the knowledge base, a sentence in propositional logic
α, the query, a sentence in propositional logic
symbols ← a list of the proposition symbols in KB and α
return TT-CHECK-ALL(KB,α,symbols,{ })
function TT-CHECK-ALL(KB,α,symbols,model) returns true or false
if EMPTY?(symbols) then
if PL-TRUE?(KB,model) then return PL-TRUE?(α,model)
else return true // when KB is false, always return true
else do
P ← FIRST(symbols)
rest ← REST(symbols)
return (TT-CHECK-ALL(KB,α,rest,model ∪ {P = true})
and
TT-CHECK-ALL(KB,α,rest,model ∪ {P = false }))
Figure 7.10 A truth-table enumeration algorithm for deciding propositional entailment.
(TT stands for truth table.) PL-TRUE? returns true if a sentence holds within a model. The
variable model represents a partial model—an assignment to some of the symbols. The key-
word “and” is used here as a logical operation on its two arguments, returning true or false.
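As a rough Python rendering of TT-ENTAILS? (our own sketch, not the book's code; the tuple encoding of sentences and names such as tt_entails are assumptions), the enumeration of all 2ⁿ models can be written directly. The two assertions at the end reproduce the conclusions drawn from Figure 7.9: the wumpus knowledge base entails ¬P1,2 but does not settle P2,2.

from itertools import product

def pl_true(s, model):
    if isinstance(s, str):
        return model[s]
    op, *args = s
    if op == "not":     return not pl_true(args[0], model)
    if op == "and":     return all(pl_true(a, model) for a in args)
    if op == "or":      return any(pl_true(a, model) for a in args)
    if op == "implies": return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == "iff":     return pl_true(args[0], model) == pl_true(args[1], model)

def symbols_in(s, acc=None):
    acc = set() if acc is None else acc
    if isinstance(s, str):
        acc.add(s)
    else:
        for a in s[1:]:
            symbols_in(a, acc)
    return acc

def tt_entails(kb, alpha):
    syms = sorted(symbols_in(("and", kb, alpha)))
    for values in product([True, False], repeat=len(syms)):    # all 2^n models
        model = dict(zip(syms, values))
        if pl_true(kb, model) and not pl_true(alpha, model):
            return False        # a model of KB in which alpha is false
    return True

# R1..R5 from Section 7.4.3.
R1 = ("not", "P11")
R2 = ("iff", "B11", ("or", "P12", "P21"))
R3 = ("iff", "B21", ("or", "P11", "P22", "P31"))
R4 = ("not", "B11")
R5 = "B21"
KB = ("and", R1, R2, R3, R4, R5)
assert tt_entails(KB, ("not", "P12"))     # no pit in [1,2]
assert not tt_entails(KB, "P22")          # [2,2] is not settled either way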
(α ∧ β) ≡ (β ∧ α) commutativity of ∧
(α ∨ β) ≡ (β ∨ α) commutativity of ∨
((α ∧ β) ∧ γ) ≡ (α ∧ (β ∧ γ)) associativity of ∧
((α ∨ β) ∨ γ) ≡ (α ∨ (β ∨ γ)) associativity of ∨
¬(¬α) ≡ α double-negation elimination
(α ⇒ β) ≡ (¬β ⇒ ¬α) contraposition
(α ⇒ β) ≡ (¬α ∨ β) implication elimination
(α ⇔ β) ≡ ((α ⇒ β) ∧ (β ⇒ α)) biconditional elimination
¬(α ∧ β) ≡ (¬α ∨ ¬β) De Morgan
¬(α ∨ β) ≡ (¬α ∧ ¬β) De Morgan
(α ∧ (β ∨ γ)) ≡ ((α ∧ β) ∨ (α ∧ γ)) distributivity of ∧ over ∨
(α ∨ (β ∧ γ)) ≡ ((α ∨ β) ∧ (α ∨ γ)) distributivity of ∨ over ∧
Figure 7.11 Standard logical equivalences. The symbols α, β, and γ stand for arbitrary
sentences of propositional logic.
7.5 PROPOSITIONAL THEOREM PROVING
So far, we have shown how to determine entailment by model checking: enumerating models
and showing that the sentence must hold in all models. In this section, we show how entail-
ment can be done by theorem proving—applying rules of inference directly to the sentences
in our knowledge base to construct a proof of the desired sentence without consulting models.
If the number of models is large but the length of the proof is short, then theorem proving can
be more efficient than model checking.
Before we plunge into the details of theorem-proving algorithms, we will need some
additional concepts related to entailment. The first concept is logical equivalence: two sen-
tences α and β are logically equivalent if they are true in the same set of models. We write
this as α ≡ β. For example, we can easily show (using truth tables) that P ∧ Q and Q ∧ P
are logically equivalent; other equivalences are shown in Figure 7.11. These equivalences
play much the same role in logic as arithmetic identities do in ordinary mathematics. An
alternative definition of equivalence is as follows: any two sentences α and β are equivalent
if and only if each of them entails the other:
α ≡ β if and only if α |= β and β |= α .
The second concept we will need is validity. A sentence is valid if it is true in all models. For
example, the sentence P ∨ ¬P is valid. Valid sentences are also known as tautologies—they
are necessarily true. Because the sentence True is true in all models, every valid sentence
is logically equivalent to True. What good are valid sentences? From our definition of
entailment, we can derive the deduction theorem, which was known to the ancient Greeks:
For any sentences α and β, α |= β if and only if the sentence (α ⇒ β) is valid.
(Exercise 7.5 asks for a proof.) Hence, we can decide if α |= β by checking that (α ⇒ β) is
true in every model—which is essentially what the inference algorithm in Figure 7.10 does—
or by proving that (α ⇒ β) is equivalent to True. Conversely, the deduction theorem states
that every valid implication sentence describes a legitimate inference.
The final concept we will need is satisfiability. A sentence is satisfiable if it is true
in, or satisfied by, some model. For example, the knowledge base given earlier, (R1 ∧ R2 ∧
R3 ∧ R4 ∧ R5), is satisfiable because there are three models in which it is true, as shown
in Figure 7.9. Satisfiability can be checked by enumerating the possible models until one is
found that satisfies the sentence. The problem of determining the satisfiability of sentences
in propositional logic—the SAT problem—was the first problem proved to be NP-complete.
Many problems in computer science are really satisfiability problems. For example, all the
constraint satisfaction problems in Chapter 6 ask whether the constraints are satisfiable by
some assignment.
Validity and satisfiability are of course connected: α is valid iff ¬α is unsatisfiable;
contrapositively, α is satisfiable iff ¬α is not valid. We also have the following useful result:
α |= β if and only if the sentence (α ∧ ¬β) is unsatisfiable.
Proving β from α by checking the unsatisfiability of (α ∧ ¬β) corresponds exactly to the
standard mathematical proof technique of reductio ad absurdum (literally, “reduction to an
absurd thing”). It is also called proof by refutation or proof by contradiction. One assumes a
sentence β to be false and shows that this leads to a contradiction with known axioms α. This
contradiction is exactly what is meant by saying that the sentence (α ∧ ¬β) is unsatisfiable.
7.5.1 Inference and proofs
This section covers inference rules that can be applied to derive a proof—a chain of conclu-
sions that leads to the desired goal. The best-known rule is called Modus Ponens (Latin for
mode that affirms) and is written

        α ⇒ β,    α
        ─────────────
              β .

The notation means that, whenever any sentences of the form α ⇒ β and α are given, then
the sentence β can be inferred. For example, if (WumpusAhead ∧WumpusAlive) ⇒ Shoot
and (WumpusAhead ∧ WumpusAlive) are given, then Shoot can be inferred.
Another useful inference rule is And-Elimination, which says that, from a conjunction,
any of the conjuncts can be inferred:

        α ∧ β
        ─────
          α .

For example, from (WumpusAhead ∧ WumpusAlive), WumpusAlive can be inferred.
By considering the possible truth values of α and β, one can show easily that Modus
Ponens and And-Elimination are sound once and for all. These rules can then be used in
any particular instances where they apply, generating sound inferences without the need for
enumerating models.
All of the logical equivalences in Figure 7.11 can be used as inference rules. For exam-
ple, the equivalence for biconditional elimination yields the two inference rules
        α ⇔ β                                   (α ⇒ β) ∧ (β ⇒ α)
        ─────────────────────      and      ─────────────────────
        (α ⇒ β) ∧ (β ⇒ α)                           α ⇔ β .

Not all inference rules work in both directions like this. For example, we cannot run Modus
Ponens in the opposite direction to obtain α ⇒ β and α from β.
Let us see how these inference rules and equivalences can be used in the wumpus world.
We start with the knowledge base containing R1 through R5 and show how to prove ¬P1,2,
that is, there is no pit in [1,2]. First, we apply biconditional elimination to R2 to obtain
R6 : (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1) .
Then we apply And-Elimination to R6 to obtain
R7 : ((P1,2 ∨ P2,1) ⇒ B1,1) .
Logical equivalence for contrapositives gives
R8 : (¬B1,1 ⇒ ¬(P1,2 ∨ P2,1)) .
Now we can apply Modus Ponens with R8 and the percept R4 (i.e., ¬B1,1), to obtain
R9 : ¬(P1,2 ∨ P2,1) .
Finally, we apply De Morgan’s rule, giving the conclusion
R10 : ¬P1,2 ∧ ¬P2,1 .
That is, neither [1,2] nor [2,1] contains a pit.
We found this proof by hand, but we can apply any of the search algorithms in Chapter 3
to find a sequence of steps that constitutes a proof. We just need to define a proof problem as
follows:
• INITIAL STATE: the initial knowledge base.
• ACTIONS: the set of actions consists of all the inference rules applied to all the sen-
tences that match the top half of the inference rule.
• RESULT: the result of an action is to add the sentence in the bottom half of the inference
rule.
• GOAL: the goal is a state that contains the sentence we are trying to prove.
Thus, searching for proofs is an alternative to enumerating models. In many practical cases
finding a proof can be more efficient because the proof can ignore irrelevant propositions, no
matter how many of them there are. For example, the proof given earlier leading to ¬P1,2 ∧
¬P2,1 does not mention the propositions B2,1, P1,1, P2,2, or P3,1. They can be ignored
because the goal proposition, P1,2, appears only in sentence R2; the other propositions in R2
appear only in R4 and R2; so R1, R3, and R5 have no bearing on the proof. The same would
hold even if we added a million more sentences to the knowledge base; the simple truth-table
algorithm, on the other hand, would be overwhelmed by the exponential explosion of models.
One final property of logical systems is monotonicity, which says that the set of en-
tailed sentences can only increase as information is added to the knowledge base.8 For any
sentences α and β,
if KB |= α then KB ∧ β |= α .
8 Nonmonotonic logics, which violate the monotonicity property, capture a common property of human rea-
soning: changing one’s mind. They are discussed in Section 12.6.
For example, suppose the knowledge base contains the additional assertion β stating that there
are exactly eight pits in the world. This knowledge might help the agent draw additional con-
clusions, but it cannot invalidate any conclusion α already inferred—such as the conclusion
that there is no pit in [1,2]. Monotonicity means that inference rules can be applied whenever
suitable premises are found in the knowledge base—the conclusion of the rule must follow
regardless of what else is in the knowledge base.
7.5.2 Proof by resolution
We have argued that the inference rules covered so far are sound, but we have not discussed
the question of completeness for the inference algorithms that use them. Search algorithms
such as iterative deepening search (page 89) are complete in the sense that they will find
any reachable goal, but if the available inference rules are inadequate, then the goal is not
reachable—no proof exists that uses only those inference rules. For example, if we removed
the biconditional elimination rule, the proof in the preceding section would not go through.
The current section introduces a single inference rule, resolution, that yields a complete
inference algorithm when coupled with any complete search algorithm.
We begin by using a simple version of the resolution rule in the wumpus world. Let us
consider the steps leading up to Figure 7.4(a): the agent returns from [2,1] to [1,1] and then
goes to [1,2], where it perceives a stench, but no breeze. We add the following facts to the
knowledge base:
R11 : ¬B1,2 .
R12 : B1,2 ⇔ (P1,1 ∨ P2,2 ∨ P1,3) .
By the same process that led to R10 earlier, we can now derive the absence of pits in [2,2]
and [1,3] (remember that [1,1] is already known to be pitless):
R13 : ¬P2,2 .
R14 : ¬P1,3 .
We can also apply biconditional elimination to R3, followed by Modus Ponens with R5, to
obtain the fact that there is a pit in [1,1], [2,2], or [3,1]:
R15 : P1,1 ∨ P2,2 ∨ P3,1 .
Now comes the first application of the resolution rule: the literal ¬P2,2 in R13 resolves with
the literal P2,2 in R15 to give the resolvent
R16 : P1,1 ∨ P3,1 .
In English: if there’s a pit in one of [1,1], [2,2], and [3,1] and it’s not in [2,2], then it’s in [1,1]
or [3,1]. Similarly, the literal ¬P1,1 in R1 resolves with the literal P1,1 in R16 to give
R17 : P3,1 .
In English: if there’s a pit in [1,1] or [3,1] and it’s not in [1,1], then it’s in [3,1]. These last
two inference steps are examples of the unit resolution inference rule,

        ℓ1 ∨ · · · ∨ ℓk,    m
        ─────────────────────────────────────
        ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk ,

where each ℓ is a literal and ℓi and m are complementary literals (i.e., one is the negation
of the other). Thus, the unit resolution rule takes a clause—a disjunction of literals—and a
literal and produces a new clause. Note that a single literal can be viewed as a disjunction of
one literal, also known as a unit clause.
The unit resolution rule can be generalized to the full resolution rule,

        ℓ1 ∨ · · · ∨ ℓk,    m1 ∨ · · · ∨ mn
        ──────────────────────────────────────────────────────────────────────
        ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk ∨ m1 ∨ · · · ∨ mj−1 ∨ mj+1 ∨ · · · ∨ mn ,

where ℓi and mj are complementary literals. This says that resolution takes two clauses and
produces a new clause containing all the literals of the two original clauses except the two
complementary literals. For example, we have

        P1,1 ∨ P3,1,    ¬P1,1 ∨ ¬P2,2
        ──────────────────────────────
              P3,1 ∨ ¬P2,2 .

There is one more technical aspect of the resolution rule: the resulting clause should contain
only one copy of each literal.9 The removal of multiple copies of literals is called factoring.
For example, if we resolve (A ∨ B) with (A ∨ ¬B), we obtain (A ∨ A), which is reduced to
just A.
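A single resolution step, including this factoring, is easy to express in code. The sketch below is our own, not the book's: a clause is a frozenset of literal strings such as "P31" or "~P31", so duplicate literals are merged automatically.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    """All clauses obtainable by resolving c1 with c2 on one complementary pair."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

# The example from the text: resolving P1,1 ∨ P3,1 with ¬P1,1 ∨ ¬P2,2.
print(resolvents(frozenset({"P11", "P31"}), frozenset({"~P11", "~P22"})))
# -> {frozenset({'P31', '~P22'})}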
The soundness of the resolution rule can be seen easily by considering the literal ℓi that
is complementary to literal mj in the other clause. If ℓi is true, then mj is false, and hence
m1 ∨ · · · ∨ mj−1 ∨ mj+1 ∨ · · · ∨ mn must be true, because m1 ∨ · · · ∨ mn is given. If ℓi is
false, then ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk must be true because ℓ1 ∨ · · · ∨ ℓk is given. Now
ℓi is either true or false, so one or other of these conclusions holds—exactly as the resolution
rule states.
What is more surprising about the resolution rule is that it forms the basis for a family
of complete inference procedures. A resolution-based theorem prover can, for any sentences
α and β in propositional logic, decide whether α |= β. The next two subsections explain
how resolution accomplishes this.
Conjunctive normal form
The resolution rule applies only to clauses (that is, disjunctions of literals), so it would seem
to be relevant only to knowledge bases and queries consisting of clauses. How, then, can
it lead to a complete inference procedure for all of propositional logic? The answer is that
every sentence of propositional logic is logically equivalent to a conjunction of clauses. A
sentence expressed as a conjunction of clauses is said to be in conjunctive normal form or
CNF (see Figure 7.14). We now describe a procedure for converting to CNF. We illustrate
the procedure by converting the sentence B1,1 ⇔ (P1,2 ∨ P2,1) into CNF. The steps are as
follows:
1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α).
(B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1) .
2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1) .
9 If a clause is viewed as a set of literals, then this restriction is automatically respected. Using set notation for
clauses makes the resolution rule much cleaner, at the cost of introducing additional notation.
3. CNF requires ¬ to appear only in literals, so we “move ¬ inwards” by repeated appli-
cation of the following equivalences from Figure 7.11:
¬(¬α) ≡ α (double-negation elimination)
¬(α ∧ β) ≡ (¬α ∨ ¬β) (De Morgan)
¬(α ∨ β) ≡ (¬α ∧ ¬β) (De Morgan)
In the example, we require just one application of the last rule:
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1) .
4. Now we have a sentence containing nested ∧ and ∨ operators applied to literals. We
apply the distributivity law from Figure 7.11, distributing ∨ over ∧ wherever possible.
(¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1) .
The original sentence is now in CNF, as a conjunction of three clauses. It is much harder to
read, but it can be used as input to a resolution procedure.
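The four steps can be mechanized directly. The following Python sketch is our own illustration under assumed conventions (the nested-tuple encoding of sentences used in the earlier sketches and function names of our choosing); it leaves conjunctions nested rather than flattening them, but the result is logically in CNF.

def eliminate_implications(s):
    """Steps 1 and 2: rewrite ⇔ and ⇒ in terms of ∧, ∨, ¬."""
    if isinstance(s, str):
        return s
    op, *args = s
    args = [eliminate_implications(a) for a in args]
    if op == "iff":
        a, b = args
        return ("and", ("or", ("not", a), b), ("or", ("not", b), a))
    if op == "implies":
        a, b = args
        return ("or", ("not", a), b)
    return (op, *args)

def move_not_inwards(s):
    """Step 3: push ¬ down to literals (De Morgan, double-negation elimination)."""
    if isinstance(s, str):
        return s
    op, *args = s
    if op == "not":
        a = args[0]
        if isinstance(a, str):
            return s                                  # already a literal
        if a[0] == "not":
            return move_not_inwards(a[1])             # ¬¬α ≡ α
        if a[0] == "and":
            return ("or",) + tuple(move_not_inwards(("not", x)) for x in a[1:])
        if a[0] == "or":
            return ("and",) + tuple(move_not_inwards(("not", x)) for x in a[1:])
    return (op,) + tuple(move_not_inwards(a) for a in args)

def distribute_or_over_and(s):
    """Step 4: repeatedly apply (α ∨ (β ∧ γ)) ≡ ((α ∨ β) ∧ (α ∨ γ))."""
    if isinstance(s, str) or s[0] == "not":
        return s
    op, *args = s
    args = [distribute_or_over_and(a) for a in args]
    if op == "or":
        for i, a in enumerate(args):
            if not isinstance(a, str) and a[0] == "and":
                rest = args[:i] + args[i + 1:]
                return distribute_or_over_and(
                    ("and",) + tuple(("or", c, *rest) for c in a[1:]))
        return ("or", *args)
    return ("and", *args)

def to_cnf(s):
    return distribute_or_over_and(move_not_inwards(eliminate_implications(s)))

# The running example: B1,1 ⇔ (P1,2 ∨ P2,1).
print(to_cnf(("iff", "B11", ("or", "P12", "P21"))))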
A resolution algorithm
Inference procedures based on resolution work by using the principle of proof by contradic-
tion introduced on page 250. That is, to show that KB |= α, we show that (KB ∧ ¬α) is
unsatisfiable. We do this by proving a contradiction.
A resolution algorithm is shown in Figure 7.12. First, (KB ∧ ¬α) is converted into
CNF. Then, the resolution rule is applied to the resulting clauses. Each pair that contains
complementary literals is resolved to produce a new clause, which is added to the set if it is
not already present. The process continues until one of two things happens:
• there are no new clauses that can be added, in which case KB does not entail α; or,
• two clauses resolve to yield the empty clause, in which case KB entails α.
The empty clause—a disjunction of no disjuncts—is equivalent to False because a disjunction
is true only if at least one of its disjuncts is true. Another way to see that an empty clause
represents a contradiction is to observe that it arises only from resolving two complementary
unit clauses such as P and ¬P.
We can apply the resolution procedure to a very simple inference in the wumpus world.
When the agent is in [1,1], there is no breeze, so there can be no pits in neighboring squares.
The relevant knowledge base is
KB = R2 ∧ R4 = (B1,1 ⇔ (P1,2 ∨ P2,1)) ∧ ¬B1,1
and we wish to prove α which is, say, ¬P1,2. When we convert (KB ∧ ¬α) into CNF, we
obtain the clauses shown at the top of Figure 7.13. The second row of the figure shows
clauses obtained by resolving pairs in the first row. Then, when P1,2 is resolved with ¬P1,2,
we obtain the empty clause, shown as a small square. Inspection of Figure 7.13 reveals that
many resolution steps are pointless. For example, the clause B1,1 ∨¬B1,1 ∨P1,2 is equivalent
to True ∨ P1,2 which is equivalent to True. Deducing that True is true is not very helpful.
Therefore, any clause in which two complementary literals appear can be discarded.
function PL-RESOLUTION(KB,α) returns true or false
inputs: KB, the knowledge base, a sentence in propositional logic
α, the query, a sentence in propositional logic
clauses ← the set of clauses in the CNF representation of KB ∧ ¬α
new ← { }
loop do
for each pair of clauses Ci, Cj in clauses do
resolvents ← PL-RESOLVE(Ci,Cj)
if resolvents contains the empty clause then return true
new ← new ∪ resolvents
if new ⊆ clauses then return false
clauses ← clauses ∪ new
Figure 7.12 A simple resolution algorithm for propositional logic. The function
PL-RESOLVE returns the set of all possible clauses obtained by resolving its two inputs.
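A direct, unoptimized transcription into Python might look as follows; this is our own sketch, and the clause representation (frozensets of literal strings) and the example clause set, which encodes KB ∧ ¬α in CNF for the query α = ¬P1,2, are assumptions of ours rather than anything fixed by the book.

from itertools import combinations

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out

def pl_resolution(clauses):
    """Return True iff the clause set is unsatisfiable (empty clause derivable)."""
    clauses = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            res = resolve(c1, c2)
            if frozenset() in res:       # empty clause: contradiction found
                return True
            new |= res
        if new <= clauses:               # no new clauses: satisfiable
            return False
        clauses |= new

# CNF of (R2 ∧ R4) ∧ ¬α, where α = ¬P1,2; the KB entails α, so this is unsatisfiable.
clauses = [frozenset({"~B11", "P12", "P21"}),
           frozenset({"~P12", "B11"}),
           frozenset({"~P21", "B11"}),
           frozenset({"~B11"}),
           frozenset({"P12"})]           # ¬α, i.e. ¬¬P1,2
assert pl_resolution(clauses)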
Figure 7.13 Partial application of PL-RESOLUTION to a simple inference in the wumpus
world. ¬P1,2 is shown to follow from the first four clauses in the top row.
Completeness of resolution
To conclude our discussion of resolution, we now show why PL-RESOLUTION is complete.
To do this, we introduce the resolution closure RC (S) of a set of clauses S, which is the set
of all clauses derivable by repeated application of the resolution rule to clauses in S or their
derivatives. The resolution closure is what PL-RESOLUTION computes as the final value of
the variable clauses. It is easy to see that RC (S) must be finite, because there are only finitely
many distinct clauses that can be constructed out of the symbols P1, . . . , Pk that appear in S.
(Notice that this would not be true without the factoring step that removes multiple copies of
literals.) Hence, PL-RESOLUTION always terminates.
The completeness theorem for resolution in propositional logic is called the ground
resolution theorem:
If a set of clauses is unsatisfiable, then the resolution closure of those clauses
contains the empty clause.
This theorem is proved by demonstrating its contrapositive: if the closure RC(S) does not
contain the empty clause, then S is satisfiable. In fact, we can construct a model for S with
suitable truth values for P1, . . . , Pk. The construction procedure is as follows:
For i from 1 to k,
– If a clause in RC(S) contains the literal ¬Pi and all its other literals are false under
the assignment chosen for P1, . . . , Pi−1, then assign false to Pi.
– Otherwise, assign true to Pi.
This assignment to P1, . . . , Pk is a model of S. To see this, assume the opposite—that, at
some stage i in the sequence, assigning symbol Pi causes some clause C to become false.
For this to happen, it must be the case that all the other literals in C must already have been
falsified by assignments to P1, . . . , Pi−1. Thus, C must now look like either (false ∨ false ∨
· · · false∨Pi) or like (false∨false∨· · · false∨¬Pi). If just one of these two is in RC(S), then
the algorithm will assign the appropriate truth value to Pi to make C true, so C can only be
falsified if both of these clauses are in RC(S). Now, since RC(S) is closed under resolution,
it will contain the resolvent of these two clauses, and that resolvent will have all of its literals
already falsified by the assignments to P1, . . . , Pi−1. This contradicts our assumption that
the first falsified clause appears at stage i. Hence, we have proved that the construction never
falsifies a clause in RC(S); that is, it produces a model of RC(S) and thus a model of S
itself (since S is contained in RC(S)).
7.5.3 Horn clauses and definite clauses
The completeness of resolution makes it a very important inference method. In many practical
situations, however, the full power of resolution is not needed. Some real-world knowledge
bases satisfy certain restrictions on the form of sentences they contain, which enables them
to use a more restricted and efficient inference algorithm.
One such restricted form is the definite clause, which is a disjunction of literals of
which exactly one is positive. For example, the clause (¬L1,1 ∨ ¬Breeze ∨ B1,1) is a definite
clause, whereas (¬B1,1 ∨ P1,2 ∨ P2,1) is not.
Slightly more general is the Horn clause, which is a disjunction of literals of which at
most one is positive. So all definite clauses are Horn clauses, as are clauses with no positive
literals; these are called goal clauses. Horn clauses are closed under resolution: if you resolve
two Horn clauses, you get back a Horn clause.
Knowledge bases containing only definite clauses are interesting for three reasons:
1. Every definite clause can be written as an implication whose premise is a conjunction
of positive literals and whose conclusion is a single positive literal. (See Exercise 7.13.)
For example, the definite clause (¬L1,1 ∨ ¬Breeze ∨ B1,1) can be written as the im-
plication (L1,1 ∧ Breeze) ⇒ B1,1. In the implication form, the sentence is easier to
understand: it says that if the agent is in [1,1] and there is a breeze, then [1,1] is breezy.
In Horn form, the premise is called the body and the conclusion is called the head. A
sentence consisting of a single positive literal, such as L1,1, is called a fact. It too can
be written in implication form as True ⇒ L1,1, but it is simpler to write just L1,1.
CNFSentence → Clause1 ∧ · · · ∧ Clausen
Clause → Literal1 ∨ · · · ∨ Literalm
Literal → Symbol | ¬Symbol
Symbol → P | Q | R | . . .
HornClauseForm → DefiniteClauseForm | GoalClauseForm
DefiniteClauseForm → (Symbol1 ∧ · · · ∧ Symboll) ⇒ Symbol
GoalClauseForm → (Symbol1 ∧ · · · ∧ Symboll) ⇒ False
Figure 7.14 A grammar for conjunctive normal form, Horn clauses, and definite clauses.
A clause such as A ∧ B ⇒ C is still a definite clause when it is written as ¬A ∨ ¬B ∨ C,
but only the former is considered the canonical form for definite clauses. One more class is
the k-CNF sentence, which is a CNF sentence where each clause has at most k literals.
2. Inference with Horn clauses can be done through the forward-chaining and backward-
chaining algorithms, which we explain next. Both of these algorithms are natural,
in that the inference steps are obvious and easy for humans to follow. This type of
inference is the basis for logic programming, which is discussed in Chapter 9.
3. Deciding entailment with Horn clauses can be done in time that is linear in the size of
the knowledge base—a pleasant surprise.
7.5.4 Forward and backward chaining
The forward-chaining algorithm PL-FC-ENTAILS?(KB, q) determines if a single proposi-
tion symbol q—the query—is entailed by a knowledge base of definite clauses. It begins
from known facts (positive literals) in the knowledge base. If all the premises of an implica-
tion are known, then its conclusion is added to the set of known facts. For example, if L1,1
and Breeze are known and (L1,1 ∧ Breeze) ⇒ B1,1 is in the knowledge base, then B1,1 can
be added. This process continues until the query q is added or until no further inferences can
be made. The detailed algorithm is shown in Figure 7.15; the main point to remember is that
it runs in linear time.
The best way to understand the algorithm is through an example and a picture. Fig-
ure 7.16(a) shows a simple knowledge base of Horn clauses with A and B as known facts.
Figure 7.16(b) shows the same knowledge base drawn as an AND–OR graph (see Chap-
ter 4). In AND–OR graphs, multiple links joined by an arc indicate a conjunction—every
link must be proved—while multiple links without an arc indicate a disjunction—any link
can be proved. It is easy to see how forward chaining works in the graph. The known leaves
(here, A and B) are set, and inference propagates up the graph as far as possible. Wher-
ever a conjunction appears, the propagation waits until all the conjuncts are known before
proceeding. The reader is encouraged to work through the example in detail.
function PL-FC-ENTAILS?(KB,q) returns true or false
inputs: KB, the knowledge base, a set of propositional definite clauses
q, the query, a proposition symbol
count ← a table, where count[c] is the number of symbols in c’s premise
inferred ← a table, where inferred[s] is initially false for all symbols
agenda ← a queue of symbols, initially symbols known to be true in KB
while agenda is not empty do
p ← POP(agenda)
if p = q then return true
if inferred[p] = false then
inferred[p] ← true
for each clause c in KB where p is in c.PREMISE do
decrement count[c]
if count[c] = 0 then add c.CONCLUSION to agenda
return false
Figure 7.15 The forward-chaining algorithm for propositional logic. The agenda keeps
track of symbols known to be true but not yet “processed.” The count table keeps track of
how many premises of each implication are as yet unknown. Whenever a new symbol p from
the agenda is processed, the count is reduced by one for each implication in whose premise
p appears (easily identified in constant time with appropriate indexing.) If a count reaches
zero, all the premises of the implication are known, so its conclusion can be added to the
agenda. Finally, we need to keep track of which symbols have been processed; a symbol that
is already in the set of inferred symbols need not be added to the agenda again. This avoids
redundant work and prevents loops caused by implications such as P ⇒ Q and Q ⇒ P.
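A compact Python version of the same algorithm, run on the knowledge base of Figure 7.16(a), might be sketched as follows; the (premises, conclusion) clause representation and the name pl_fc_entails are our own, not the book's.

from collections import deque

def pl_fc_entails(clauses, facts, q):
    count = {i: len(prem) for i, (prem, _) in enumerate(clauses)}
    inferred = set()
    agenda = deque(facts)
    while agenda:
        p = agenda.popleft()
        if p == q:
            return True
        if p not in inferred:
            inferred.add(p)
            for i, (prem, concl) in enumerate(clauses):
                if p in prem:
                    count[i] -= 1
                    if count[i] == 0:            # all premises now known
                        agenda.append(concl)
    return False

clauses = [({"P"}, "Q"),
           ({"L", "M"}, "P"),
           ({"B", "L"}, "M"),
           ({"A", "P"}, "L"),
           ({"A", "B"}, "L")]
assert pl_fc_entails(clauses, ["A", "B"], "Q")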
It is easy to see that forward chaining is sound: every inference is essentially an appli-
cation of Modus Ponens. Forward chaining is also complete: every entailed atomic sentence
will be derived. The easiest way to see this is to consider the final state of the inferred table
(after the algorithm reaches a fixed point where no new inferences are possible). The table
contains true for each symbol inferred during the process, and false for all other symbols.
We can view the table as a logical model; moreover, every definite clause in the original KB is
true in this model. To see this, assume the opposite, namely that some clause a1∧. . .∧ak ⇒ b
is false in the model. Then a1 ∧ . . . ∧ ak must be true in the model and b must be false in
the model. But this contradicts our assumption that the algorithm has reached a fixed point!
We can conclude, therefore, that the set of atomic sentences inferred at the fixed point defines
a model of the original KB. Furthermore, any atomic sentence q that is entailed by the KB
must be true in all its models and in this model in particular. Hence, every entailed atomic
sentence q must be inferred by the algorithm.
Forward chaining is an example of the general concept of data-driven reasoning—that
is, reasoning in which the focus of attention starts with the known data. It can be used within
an agent to derive conclusions from incoming percepts, often without a specific query in
mind. For example, the wumpus agent might TELL its percepts to the knowledge base using
P ⇒ Q
L ∧ M ⇒ P
B ∧ L ⇒ M
A ∧ P ⇒ L
A ∧ B ⇒ L
A
B
Figure 7.16 (a) A set of Horn clauses. (b) The corresponding AND–OR graph.
an incremental forward-chaining algorithm in which new facts can be added to the agenda to
initiate new inferences. In humans, a certain amount of data-driven reasoning occurs as new
information arrives. For example, if I am indoors and hear rain starting to fall, it might occur
to me that the picnic will be canceled. Yet it will probably not occur to me that the seventeenth
petal on the largest rose in my neighbor’s garden will get wet; humans keep forward chaining
under careful control, lest they be swamped with irrelevant consequences.
The backward-chaining algorithm, as its name suggests, works backward from the
query. If the query q is known to be true, then no work is needed. Otherwise, the algorithm
finds those implications in the knowledge base whose conclusion is q. If all the premises of
one of those implications can be proved true (by backward chaining), then q is true. When
applied to the query Q in Figure 7.16, it works back down the graph until it reaches a set of
known facts, A and B, that forms the basis for a proof. The algorithm is essentially identical
to the AND-OR-GRAPH-SEARCH algorithm in Figure 4.11. As with forward chaining, an
efficient implementation runs in linear time.
Backward chaining is a form of goal-directed reasoning. It is useful for answering
specific questions such as “What shall I do now?” and “Where are my keys?” Often, the cost
of backward chaining is much less than linear in the size of the knowledge base, because the
process touches only relevant facts.
7.6 EFFECTIVE PROPOSITIONAL MODEL CHECKING
In this section, we describe two families of efficient algorithms for general propositional
inference based on model checking: One approach based on backtracking search, and one
on local hill-climbing search. These algorithms are part of the “technology” of propositional
logic. This section can be skimmed on a first reading of the chapter.
The algorithms we describe are for checking satisfiability: the SAT problem. (As noted
earlier, testing entailment, α |= β, can be done by testing unsatisfiability of α ∧ ¬β.) We
have already noted the connection between finding a satisfying model for a logical sentence
and finding a solution for a constraint satisfaction problem, so it is perhaps not surprising that
the two families of algorithms closely resemble the backtracking algorithms of Section 6.3
and the local search algorithms of Section 6.4. They are, however, extremely important in
their own right because so many combinatorial problems in computer science can be reduced
to checking the satisfiability of a propositional sentence. Any improvement in satisfiability
algorithms has huge consequences for our ability to handle complexity in general.
7.6.1 A complete backtracking algorithm
The first algorithm we consider is often called the Davis–Putnam algorithm, after the sem-
inal paper by Martin Davis and Hilary Putnam (1960). The algorithm is in fact the version
described by Davis, Logemann, and Loveland (1962), so we will call it DPLL after the ini-
tials of all four authors. DPLL takes as input a sentence in conjunctive normal form—a set
of clauses. Like BACKTRACKING-SEARCH and TT-ENTAILS?, it is essentially a recursive,
depth-first enumeration of possible models. It embodies three improvements over the simple
scheme of TT-ENTAILS?:
• Early termination: The algorithm detects whether the sentence must be true or false,
even with a partially completed model. A clause is true if any literal is true, even if
the other literals do not yet have truth values; hence, the sentence as a whole could be
judged true even before the model is complete. For example, the sentence (A ∨ B) ∧
(A ∨ C) is true if A is true, regardless of the values of B and C. Similarly, a sentence
is false if any clause is false, which occurs when each of its literals is false. Again, this
can occur long before the model is complete. Early termination avoids examination of
entire subtrees in the search space.
• Pure symbol heuristic: A pure symbol is a symbol that always appears with the same
“sign” in all clauses. For example, in the three clauses (A ∨ ¬B), (¬B ∨ ¬C), and
(C ∨ A), the symbol A is pure because only the positive literal appears, B is pure
because only the negative literal appears, and C is impure. It is easy to see that if
a sentence has a model, then it has a model with the pure symbols assigned so as to
make their literals true, because doing so can never make a clause false. Note that, in
determining the purity of a symbol, the algorithm can ignore clauses that are already
known to be true in the model constructed so far. For example, if the model contains
B = false, then the clause (¬B ∨ ¬C) is already true, and in the remaining clauses C
appears only as a positive literal; therefore C becomes pure.
• Unit clause heuristic: A unit clause was defined earlier as a clause with just one lit-
eral. In the context of DPLL, it also means clauses in which all literals but one are
already assigned false by the model. For example, if the model contains B = true,
then (¬B ∨ ¬C) simplifies to ¬C, which is a unit clause. Obviously, for this clause
to be true, C must be set to false. The unit clause heuristic assigns all such symbols
before branching on the remainder. One important consequence of the heuristic is that
function DPLL-SATISFIABLE?(s) returns true or false
inputs: s, a sentence in propositional logic
clauses ← the set of clauses in the CNF representation of s
symbols ← a list of the proposition symbols in s
return DPLL(clauses,symbols,{ })
function DPLL(clauses,symbols,model) returns true or false
if every clause in clauses is true in model then return true
if some clause in clauses is false in model then return false
P,value ← FIND-PURE-SYMBOL(symbols,clauses,model)
if P is non-null then return DPLL(clauses,symbols – P,model ∪ {P=value})
P,value ← FIND-UNIT-CLAUSE(clauses,model)
if P is non-null then return DPLL(clauses,symbols – P,model ∪ {P=value})
P ← FIRST(symbols); rest ← REST(symbols)
return DPLL(clauses,rest,model ∪ {P=true}) or
DPLL(clauses,rest,model ∪ {P=false}))
Figure 7.17 The DPLL algorithm for checking satisfiability of a sentence in propositional
logic. The ideas behind FIND-PURE-SYMBOL and FIND-UNIT-CLAUSE are described in
the text; each returns a symbol (or null) and the truth value to assign to that symbol. Like
TT-ENTAILS?, DPLL operates over partial models.
any attempt to prove (by refutation) a literal that is already in the knowledge base will
succeed immediately (Exercise 7.22). Notice also that assigning one unit clause can
create another unit clause—for example, when C is set to false, (C ∨ A) becomes a
unit clause, causing true to be assigned to A. This “cascade” of forced assignments
is called unit propagation. It resembles the process of forward chaining with definite
clauses, and indeed, if the CNF expression contains only definite clauses then DPLL
essentially replicates forward chaining. (See Exercise 7.23.)
The DPLL algorithm is shown in Figure 7.17, which gives the essential skeleton of the
search process.
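For concreteness, here is one possible Python rendering of that skeleton; it is a sketch of ours, not the book's implementation. Clauses are frozensets of (symbol, sign) pairs, and the pure-symbol and unit-clause steps follow the descriptions above rather than any optimized solver.

def dpll_satisfiable(clauses):
    symbols = {s for c in clauses for s, _ in c}
    return dpll(clauses, symbols, {})

def clause_value(clause, model):
    """True/False if the clause is decided under the partial model, else None."""
    unknown = False
    for sym, sign in clause:
        if sym not in model:
            unknown = True
        elif model[sym] == sign:
            return True
    return None if unknown else False

def dpll(clauses, symbols, model):
    values = [clause_value(c, model) for c in clauses]
    if all(v is True for v in values):
        return model                        # early termination: all clauses satisfied
    if any(v is False for v in values):
        return None                         # early termination: a clause is false
    for s in symbols:                       # pure symbol heuristic
        signs = {sign for c in clauses for sym, sign in c
                 if sym == s and clause_value(c, model) is not True}
        if len(signs) == 1:
            return dpll(clauses, symbols - {s}, {**model, s: signs.pop()})
    for c in clauses:                       # unit clause heuristic
        if clause_value(c, model) is None:
            undecided = [(sym, sign) for sym, sign in c if sym not in model]
            if len(undecided) == 1:
                sym, sign = undecided[0]
                return dpll(clauses, symbols - {sym}, {**model, sym: sign})
    s = next(iter(symbols))                 # otherwise branch on some symbol
    return (dpll(clauses, symbols - {s}, {**model, s: True}) or
            dpll(clauses, symbols - {s}, {**model, s: False}))

# (A ∨ ¬B) ∧ (¬B ∨ ¬C) ∧ (C ∨ A) from the pure-symbol discussion above.
cnf = [frozenset({("A", True), ("B", False)}),
       frozenset({("B", False), ("C", False)}),
       frozenset({("C", True), ("A", True)})]
print(dpll_satisfiable(cnf))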
What Figure 7.17 does not show are the tricks that enable SAT solvers to scale up to
large problems. It is interesting that most of these tricks are in fact rather general, and we
have seen them before in other guises:
1. Component analysis (as seen with Tasmania in CSPs): As DPLL assigns truth values
to variables, the set of clauses may become separated into disjoint subsets, called com-
ponents, that share no unassigned variables. Given an efficient way to detect when this
occurs, a solver can gain considerable speed by working on each component separately.
2. Variable and value ordering (as seen in Section 6.3.1 for CSPs): Our simple imple-
mentation of DPLL uses an arbitrary variable ordering and always tries the value true
before false. The degree heuristic (see page 216) suggests choosing the variable that
appears most frequently over all remaining clauses.
3. Intelligent backtracking (as seen in Section 6.3 for CSPs): Many problems that can-
not be solved in hours of run time with chronological backtracking can be solved in
seconds with intelligent backtracking that backs up all the way to the relevant point of
conflict. All SAT solvers that do intelligent backtracking use some form of conflict
clause learning to record conflicts so that they won’t be repeated later in the search.
Usually a limited-size set of conflicts is kept, and rarely used ones are dropped.
4. Random restarts (as seen on page 124 for hill-climbing): Sometimes a run appears not
to be making progress. In this case, we can start over from the top of the search tree,
rather than trying to continue. After restarting, different random choices (in variable
and value selection) are made. Clauses that are learned in the first run are retained after
the restart and can help prune the search space. Restarting does not guarantee that a
solution will be found faster, but it does reduce the variance on the time to solution.
5. Clever indexing (as seen in many algorithms): The speedup methods used in DPLL
itself, as well as the tricks used in modern solvers, require fast indexing of such things
as “the set of clauses in which variable Xi appears as a positive literal.” This task is
complicated by the fact that the algorithms are interested only in the clauses that have
not yet been satisfied by previous assignments to variables, so the indexing structures
must be updated dynamically as the computation proceeds.
With these enhancements, modern solvers can handle problems with tens of millions of vari-
ables. They have revolutionized areas such as hardware verification and security protocol
verification, which previously required laborious, hand-guided proofs.
7.6.2 Local search algorithms
We have seen several local search algorithms so far in this book, including HILL-CLIMBING
(page 122) and SIMULATED-ANNEALING (page 126). These algorithms can be applied di-
rectly to satisfiability problems, provided that we choose the right evaluation function. Be-
cause the goal is to find an assignment that satisfies every clause, an evaluation function that
counts the number of unsatisfied clauses will do the job. In fact, this is exactly the measure
used by the MIN-CONFLICTS algorithm for CSPs (page 221). All these algorithms take steps
in the space of complete assignments, flipping the truth value of one symbol at a time. The
space usually contains many local minima, to escape from which various forms of random-
ness are required. In recent years, there has been a great deal of experimentation to find a
good balance between greediness and randomness.
One of the simplest and most effective algorithms to emerge from all this work is called
WALKSAT (Figure 7.18). On every iteration, the algorithm picks an unsatisfied clause and
picks a symbol in the clause to flip. It chooses randomly between two ways to pick which
symbol to flip: (1) a “min-conflicts” step that minimizes the number of unsatisfied clauses in
the new state and (2) a “random walk” step that picks the symbol randomly.
When WALKSAT returns a model, the input sentence is indeed satisfiable, but when
it returns failure, there are two possible causes: either the sentence is unsatisfiable or we
need to give the algorithm more time. If we set max flips = ∞ and p > 0, WALKSAT will
eventually return a model (if one exists), because the random-walk steps will eventually hit
function WALKSAT(clauses,p,max flips) returns a satisfying model or failure
inputs: clauses, a set of clauses in propositional logic
p, the probability of choosing to do a “random walk” move, typically around 0.5
max flips, number of flips allowed before giving up
model ← a random assignment of true/false to the symbols in clauses
for i = 1 to max flips do
if model satisfies clauses then return model
clause ← a randomly selected clause from clauses that is false in model
with probability p flip the value in model of a randomly selected symbol from clause
else flip whichever symbol in clause maximizes the number of satisfied clauses
return failure
Figure 7.18 The WALKSAT algorithm for checking satisfiability by randomly flipping
the values of variables. Many versions of the algorithm exist.
upon the solution. Alas, if max flips is infinity and the sentence is unsatisfiable, then the
algorithm never terminates!
For this reason, WALKSAT is most useful when we expect a solution to exist—for ex-
ample, the problems discussed in Chapters 3 and 6 usually have solutions. On the other hand,
WALKSAT cannot always detect unsatisfiability, which is required for deciding entailment.
For example, an agent cannot reliably use WALKSAT to prove that a square is safe in the
wumpus world. Instead, it can say, “I thought about it for an hour and couldn’t come up with
a possible world in which the square isn’t safe.” This may be a good empirical indicator that
the square is safe, but it’s certainly not a proof.
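A minimal WALKSAT sketch, using the same (symbol, sign) clause encoding as the DPLL sketch earlier, might look like this; the helper names and the small test sentence are our own choices.

import random

def walksat(clauses, p=0.5, max_flips=10000):
    symbols = {s for c in clauses for s, _ in c}
    model = {s: random.choice([True, False]) for s in symbols}
    for _ in range(max_flips):
        unsatisfied = [c for c in clauses
                       if not any(model[s] == sign for s, sign in c)]
        if not unsatisfied:
            return model
        clause = random.choice(unsatisfied)
        if random.random() < p:                          # random walk step
            sym = random.choice([s for s, _ in clause])
        else:                                            # min-conflicts step
            def satisfied_after_flip(s):
                flipped = {**model, s: not model[s]}
                return sum(any(flipped[x] == sign for x, sign in c)
                           for c in clauses)
            sym = max((s for s, _ in clause), key=satisfied_after_flip)
        model[sym] = not model[sym]
    return None                                          # "failure": give up

# (A ∨ ¬B) ∧ (¬B ∨ ¬C) ∧ (C ∨ A) is satisfiable, so a model should turn up quickly.
cnf = [frozenset({("A", True), ("B", False)}),
       frozenset({("B", False), ("C", False)}),
       frozenset({("C", True), ("A", True)})]
print(walksat(cnf))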
7.6.3 The landscape of random SAT problems
Some SAT problems are harder than others. Easy problems can be solved by any old algo-
rithm, but because we know that SAT is NP-complete, at least some problem instances must
require exponential run time. In Chapter 6, we saw some surprising discoveries about certain
kinds of problems. For example, the n-queens problem—thought to be quite tricky for back-
tracking search algorithms—turned out to be trivially easy for local search methods, such as
min-conflicts. This is because solutions are very densely distributed in the space of assign-
ments, and any initial assignment is guaranteed to have a solution nearby. Thus, n-queens is
easy because it is underconstrained.
When we look at satisfiability problems in conjunctive normal form, an undercon-
strained problem is one with relatively few clauses constraining the variables. For example,
here is a randomly generated 3-CNF sentence with five symbols and five clauses:
(¬D ∨ ¬B ∨ C) ∧ (B ∨ ¬A ∨ ¬C) ∧ (¬C ∨ ¬B ∨ E)
∧ (E ∨ ¬D ∨ B) ∧ (B ∨ E ∨ ¬C) .
Sixteen of the 32 possible assignments are models of this sentence, so, on average, it would
take just two random guesses to find a model. This is an easy satisfiability problem, as are
most such underconstrained problems. On the other hand, an overconstrained problem has
many clauses relative to the number of variables and is likely to have no solutions.
To go beyond these basic intuitions, we must define exactly how random sentences
are generated. The notation CNFk(m, n) denotes a k-CNF sentence with m clauses and n
symbols, where the clauses are chosen uniformly, independently, and without replacement
from among all clauses with k different literals, which are positive or negative at random. (A
symbol may not appear twice in a clause, nor may a clause appear twice in a sentence.)
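Under exactly these conventions, a CNFk(m, n) generator can be sketched as follows (our own code; it enumerates every candidate clause and then samples without replacement, which is simple but practical only for small k and n).

import itertools, random

def random_cnf(k, m, n):
    symbols = [f"X{i}" for i in range(1, n + 1)]
    # Every clause with k distinct symbols and every combination of signs.
    all_clauses = [frozenset(zip(combo, signs))
                   for combo in itertools.combinations(symbols, k)
                   for signs in itertools.product([True, False], repeat=k)]
    return random.sample(all_clauses, m)      # no clause appears twice

sentence = random_cnf(3, 215, 50)             # m/n ≈ 4.3, near the hard region
print(len(sentence), "clauses over 50 symbols")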
Given a source of random sentences, we can measure the probability of satisfiability.
Figure 7.19(a) plots the probability for CNF3(m, 50), that is, sentences with 50 variables
and 3 literals per clause, as a function of the clause/symbol ratio, m/n. As we expect, for
small m/n the probability of satisfiability is close to 1, and at large m/n the probability
is close to 0. The probability drops fairly sharply around m/n = 4.3. Empirically, we find
that the “cliff” stays in roughly the same place (for k = 3) and gets sharper and sharper as n
increases. Theoretically, the satisfiability threshold conjecture says that for every k ≥ 3,
there is a threshold ratio rk such that, as n goes to infinity, the probability that CNFk(n, rn)
is satisfiable becomes 1 for all values of r below the threshold, and 0 for all values above.
The conjecture remains unproven.
[Figure 7.19 plots panel (a), P(satisfiable), and panel (b), the run times of DPLL and WalkSAT, against the clause/symbol ratio m/n.]
Figure 7.19 (a) Graph showing the probability that a random 3-CNF sentence with n = 50
symbols is satisfiable, as a function of the clause/symbol ratio m/n. (b) Graph of the median
run time (measured in number of recursive calls to DPLL, a good proxy) on random 3-CNF
sentences. The most difficult problems have a clause/symbol ratio of about 4.3.
Now that we have a good idea where the satisfiable and unsatisfiable problems are, the
next question is, where are the hard problems? It turns out that they are also often at the
threshold value. Figure 7.19(b) shows that 50-symbol problems at the threshold value of 4.3
are about 20 times more difficult to solve than those at a ratio of 3.3. The underconstrained
problems are easiest to solve (because it is so easy to guess a solution); the overconstrained
problems are not as easy as the underconstrained, but still are much easier than the ones right
at the threshold.
7.7 AGENTS BASED ON PROPOSITIONAL LOGIC
In this section, we bring together what we have learned so far in order to construct wumpus
world agents that use propositional logic. The first step is to enable the agent to deduce, to the
extent possible, the state of the world given its percept history. This requires writing down a
complete logical model of the effects of actions. We also show how the agent can keep track of
the world efficiently without going back into the percept history for each inference. Finally,
we show how the agent can use logical inference to construct plans that are guaranteed to
achieve its goals.
7.7.1 The current state of the world
As stated at the beginning of the chapter, a logical agent operates by deducing what to do
from a knowledge base of sentences about the world. The knowledge base is composed of
axioms—general knowledge about how the world works—and percept sentences obtained
from the agent’s experience in a particular world. In this section, we focus on the problem of
deducing the current state of the wumpus world—where am I, is that square safe, and so on.
We began collecting axioms in Section 7.4.3. The agent knows that the starting square
contains no pit (¬P1,1) and no wumpus (¬W1,1). Furthermore, for each square, it knows that
the square is breezy if and only if a neighboring square has a pit; and a square is smelly if and
only if a neighboring square has a wumpus. Thus, we include a large collection of sentences
of the following form:
B1,1 ⇔ (P1,2 ∨ P2,1)
S1,1 ⇔ (W1,2 ∨ W2,1)
· · ·
The agent also knows that there is exactly one wumpus. This is expressed in two parts. First,
we have to say that there is at least one wumpus:
W1,1 ∨ W1,2 ∨ · · · ∨ W4,3 ∨ W4,4 .
Then, we have to say that there is at most one wumpus. For each pair of locations, we add a
sentence saying that at least one of them must be wumpus-free:
¬W1,1 ∨ ¬W1,2
¬W1,1 ∨ ¬W1,3
· · ·
¬W4,3 ∨ ¬W4,4 .
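Sentences of this kind are tedious to write by hand but trivial to generate mechanically. The short sketch below is our own; symbol names such as W11 are an assumed flat encoding of Wx,y for the 4×4 world.

from itertools import combinations

squares = [(x, y) for x in range(1, 5) for y in range(1, 5)]

# At least one wumpus: a single 16-way disjunction.
at_least_one = " ∨ ".join(f"W{x}{y}" for x, y in squares)

# At most one wumpus: one two-literal clause per pair of squares.
at_most_one = [f"¬W{x1}{y1} ∨ ¬W{x2}{y2}"
               for (x1, y1), (x2, y2) in combinations(squares, 2)]

print(at_least_one)
print(len(at_most_one), "pairwise clauses")   # C(16, 2) = 120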
So far, so good. Now let’s consider the agent’s percepts. If there is currently a stench, one
might suppose that a proposition Stench should be added to the knowledge base. This is not
quite right, however: if there was no stench at the previous time step, then ¬Stench would al-
ready be asserted, and the new assertion would simply result in a contradiction. The problem
is solved when we realize that a percept asserts something only about the current time. Thus,
if the time step (as supplied to MAKE-PERCEPT-SENTENCE in Figure 7.1) is 4, then we add
Stench^4 to the knowledge base, rather than Stench—neatly avoiding any contradiction with
¬Stench^3. The same goes for the breeze, bump, glitter, and scream percepts.
The idea of associating propositions with time steps extends to any aspect of the world
that changes over time. For example, the initial knowledge base includes L^0_{1,1}—the agent is
in square [1, 1] at time 0—as well as FacingEast^0, HaveArrow^0, and WumpusAlive^0. We use
the word fluent (from the Latin fluens, flowing) to refer to an aspect of the world that changes.
“Fluent” is a synonym for “state variable,” in the sense described in the discussion of factored
representations in Section 2.4.7 on page 57. Symbols associated with permanent aspects of
the world do not need a time superscript and are sometimes called atemporal variables.
We can connect stench and breeze percepts directly to the properties of the squares
where they are experienced through the location fluent as follows.10 For any time step t and
any square [x, y], we assert
L^t_{x,y} ⇒ (Breeze^t ⇔ Bx,y)
L^t_{x,y} ⇒ (Stench^t ⇔ Sx,y) .
Now, of course, we need axioms that allow the agent to keep track of fluents such as L^t_{x,y}.
These fluents change as the result of actions taken by the agent, so, in the terminology of
These fluents change as the result of actions taken by the agent, so, in the terminology of
Chapter 3, we need to write down the transition model of the wumpus world as a set of
logical sentences.
First, we need proposition symbols for the occurrences of actions. As with percepts,
these symbols are indexed by time; thus, Forward^0
means that the agent executes the Forward
action at time 0. By convention, the percept for a given time step happens first, followed by
the action for that time step, followed by a transition to the next time step.
To describe how the world changes, we can try writing effect axioms that specify the
outcome of an action at the next time step. For example, if the agent is at location [1, 1] facing
east at time 0 and goes Forward, the result is that the agent is in square [2, 1] and no longer
is in [1, 1]:
L0
1,1 ∧ FacingEast0
∧ Forward0
⇒ (L1
2,1 ∧ ¬L1
1,1) . (7.1)
We would need one such sentence for each possible time step, for each of the 16 squares,
and each of the four orientations. We would also need similar sentences for the other actions:
Grab, Shoot, Climb, TurnLeft, and TurnRight.
Let us suppose that the agent does decide to move Forward at time 0 and asserts this
fact into its knowledge base. Given the effect axiom in Equation (7.1), combined with the
initial assertions about the state at time 0, the agent can now deduce that it is in [2, 1]. That
is, ASK(KB, L^1_{2,1}) = true. So far, so good. Unfortunately, the news elsewhere is less good:
if we ASK(KB, HaveArrow^1), the answer is false, that is, the agent cannot prove it still
has the arrow; nor can it prove it doesn’t have it! The information has been lost because the
effect axiom fails to state what remains unchanged as the result of an action. The need to do
this gives rise to the frame problem.11 One possible solution to the frame problem would
10 Section 7.4.3 conveniently glossed over this requirement.
11 The name “frame problem” comes from “frame of reference” in physics—the assumed stationary background
with respect to which motion is measured. It also has an analogy to the frames of a movie, in which normally
most of the background stays constant while changes occur in the foreground.
be to add frame axioms explicitly asserting all the propositions that remain the same. For
example, for each time t we would have
Forward^t ⇒ (HaveArrow^t ⇔ HaveArrow^{t+1})
Forward^t ⇒ (WumpusAlive^t ⇔ WumpusAlive^{t+1})
· · ·
where we explicitly mention every proposition that stays unchanged from time t to time
t + 1 under the action Forward. Although the agent now knows that it still has the arrow
after moving forward and that the wumpus hasn’t died or come back to life, the proliferation
of frame axioms seems remarkably inefficient. In a world with m different actions and n
fluents, the set of frame axioms will be of size O(mn). This specific manifestation of the
frame problem is sometimes called the representational frame problem. Historically, the
problem was a significant one for AI researchers; we explore it further in the notes at the end
of the chapter.
The representational frame problem is significant because the real world has very many
fluents, to put it mildly. Fortunately for us humans, each action typically changes no more
than some small number k of those fluents—the world exhibits locality. Solving the repre-
sentational frame problem requires defining the transition model with a set of axioms of size
O(mk) rather than size O(mn). There is also an inferential frame problem: the problem
of projecting forward the results of a t-step plan of action in time O(kt) rather than O(nt).
The solution to the problem involves changing one’s focus from writing axioms about
actions to writing axioms about fluents. Thus, for each fluent F, we will have an axiom that
defines the truth value of F^{t+1} in terms of fluents (including F itself) at time t and the actions
that may have occurred at time t. Now, the truth value of F^{t+1} can be set in one of two ways:
either the action at time t causes F to be true at t + 1, or F was already true at time t and the
action at time t does not cause it to be false. An axiom of this form is called a successor-state
axiom and has this schema:
F^{t+1} ⇔ ActionCausesF^t ∨ (F^t ∧ ¬ActionCausesNotF^t) .
One of the simplest successor-state axioms is the one for HaveArrow. Because there is no
action for reloading, the ActionCausesF^t part goes away and we are left with
HaveArrow^{t+1} ⇔ (HaveArrow^t ∧ ¬Shoot^t) .   (7.2)
For the agent’s location, the successor-state axioms are more elaborate. For example, L^{t+1}_{1,1}
is true if either (a) the agent moved Forward from [1, 2] when facing south, or from [2, 1]
when facing west; or (b) L^t_{1,1} was already true and the action did not cause movement (either
because the action was not Forward or because the action bumped into a wall). Written out
in propositional logic, this becomes
L^{t+1}_{1,1} ⇔ (L^t_{1,1} ∧ (¬Forward^t ∨ Bump^{t+1}))
        ∨ (L^t_{1,2} ∧ (South^t ∧ Forward^t))   (7.3)
        ∨ (L^t_{2,1} ∧ (West^t ∧ Forward^t)) .
Exercise 7.26 asks you to write out axioms for the remaining wumpus world fluents.
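Because the location axioms all instantiate the pattern of Equation (7.3), they too can be generated by a short loop. The sketch below is illustrative Python under our own assumptions about symbol naming and the facing-direction-to-neighbor mapping; it is not the book's code:

def location_successor_axiom(x, y, t, size=4):
    """Successor-state axiom for L^{t+1}_{x,y}, in the style of Equation (7.3)."""
    # Case (b): the agent was already in [x, y] and did not move away.
    stay = f"(L{t}_{x},{y} ∧ (¬Forward{t} ∨ Bump{t + 1}))"
    # Case (a): the agent moved Forward into [x, y] from an adjacent square.
    arrivals = []
    for (nx, ny), facing in [((x - 1, y), "East"), ((x + 1, y), "West"),
                             ((x, y - 1), "North"), ((x, y + 1), "South")]:
        if 1 <= nx <= size and 1 <= ny <= size:
            arrivals.append(f"(L{t}_{nx},{ny} ∧ ({facing}{t} ∧ Forward{t}))")
    right_hand_side = " ∨ ".join([stay] + arrivals)
    return f"L{t + 1}_{x},{y} ⇔ {right_hand_side}"

print(location_successor_axiom(1, 1, 0))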
Given a complete set of successor-state axioms and the other axioms listed at the begin-
ning of this section, the agent will be able to ASK and answer any answerable question about
the current state of the world. For example, in Section 7.2 the initial sequence of percepts and
actions is
¬Stench^0 ∧ ¬Breeze^0 ∧ ¬Glitter^0 ∧ ¬Bump^0 ∧ ¬Scream^0 ;  Forward^0
¬Stench^1 ∧ Breeze^1 ∧ ¬Glitter^1 ∧ ¬Bump^1 ∧ ¬Scream^1 ;  TurnRight^1
¬Stench^2 ∧ Breeze^2 ∧ ¬Glitter^2 ∧ ¬Bump^2 ∧ ¬Scream^2 ;  TurnRight^2
¬Stench^3 ∧ Breeze^3 ∧ ¬Glitter^3 ∧ ¬Bump^3 ∧ ¬Scream^3 ;  Forward^3
¬Stench^4 ∧ ¬Breeze^4 ∧ ¬Glitter^4 ∧ ¬Bump^4 ∧ ¬Scream^4 ;  TurnRight^4
¬Stench^5 ∧ ¬Breeze^5 ∧ ¬Glitter^5 ∧ ¬Bump^5 ∧ ¬Scream^5 ;  Forward^5
Stench^6 ∧ ¬Breeze^6 ∧ ¬Glitter^6 ∧ ¬Bump^6 ∧ ¬Scream^6
At this point, we have ASK(KB, L^6_{1,2}) = true, so the agent knows where it is. Moreover,
ASK(KB, W1,3) = true and ASK(KB, P3,1) = true, so the agent has found the wumpus and
one of the pits. The most important question for the agent is whether a square is OK to move
into, that is, whether the square contains neither a pit nor a live wumpus. It’s convenient to
add axioms for this, having the form
OK^t_{x,y} ⇔ ¬Px,y ∧ ¬(Wx,y ∧ WumpusAlive^t) .
Finally, ASK(KB, OK^6_{2,2}) = true, so the square [2, 2] is OK to move into. In fact, given a
sound and complete inference algorithm such as DPLL, the agent can answer any answerable
question about which squares are OK—and can do so in just a few milliseconds for small-to-
medium wumpus worlds.
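As with the other axiom schemas, one OK axiom is needed for every square and time step, so they would be generated mechanically; a minimal sketch, assuming the same string-based symbol naming as the earlier examples:

def ok_axiom(x, y, t):
    """OK^t_{x,y}: square [x, y] contains neither a pit nor a live wumpus at time t."""
    return f"OK{t}_{x},{y} ⇔ ¬P{x},{y} ∧ ¬(W{x},{y} ∧ WumpusAlive{t})"

# One axiom per square per time step; for example, for square [2, 2] at time 6:
print(ok_axiom(2, 2, 6))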
Solving the representational and inferential frame problems is a big step forward, but
a pernicious problem remains: we need to confirm that all the necessary preconditions of an
action hold for it to have its intended effect. We said that the Forward action moves the agent
ahead unless there is a wall in the way, but there are many other unusual exceptions that could
cause the action to fail: the agent might trip and fall, be stricken with a heart attack, be carried
away by giant bats, etc. Specifying all these exceptions is called the qualification problem.
There is no complete solution within logic; system designers have to use good judgment in
deciding how detailed they want to be in specifying their model, and what details they want
to leave out. We will see in Chapter 13 that probability theory allows us to summarize all the
exceptions without explicitly naming them.
7.7.2 A hybrid agent
The ability to deduce various aspects of the state of the world can be combined fairly straight-
forwardly with condition–action rules and with problem-solving algorithms from Chapters 3
and 4 to produce a hybrid agent for the wumpus world. Figure 7.20 shows one possible way
to do this. The agent program maintains and updates a knowledge base as well as a current
plan. The initial knowledge base contains the atemporal axioms—those that don’t depend
on t, such as the axiom relating the breeziness of squares to the presence of pits. At each
time step, the new percept sentence is added along with all the axioms that depend on t, such
as the successor-state axioms. (The next section explains why the agent doesn’t need axioms
for future time steps.) Then, the agent uses logical inference, by ASKing questions of the
knowledge base, to work out which squares are safe and which have yet to be visited.
The main body of the agent program constructs a plan based on a decreasing priority of
goals. First, if there is a glitter, the program constructs a plan to grab the gold, follow a route
back to the initial location, and climb out of the cave. Otherwise, if there is no current plan,
the program plans a route to the closest safe square that it has not visited yet, making sure
the route goes through only safe squares. Route planning is done with A∗ search, not with
ASK. If there are no safe squares to explore, the next step—if the agent still has an arrow—is
to try to make a safe square by shooting at one of the possible wumpus locations. These are
determined by asking where ASK(KB, ¬Wx,y) is false—that is, where it is not known that
there is not a wumpus. The function PLAN-SHOT (not shown) uses PLAN-ROUTE to plan a
sequence of actions that will line up this shot. If this fails, the program looks for a square to
explore that is not provably unsafe—that is, a square for which ASK(KB, ¬OKt
x,y) returns
false. If there is no such square, then the mission is impossible and the agent retreats to [1, 1]
and climbs out of the cave.
7.7.3 Logical state estimation
The agent program in Figure 7.20 works quite well, but it has one major weakness: as time
goes by, the computational expense involved in the calls to ASK goes up and up. This happens
mainly because the required inferences have to go back further and further in time and involve
more and more proposition symbols. Obviously, this is unsustainable—we cannot have an
agent whose time to process each percept grows in proportion to the length of its life! What
we really need is a constant update time—that is, independent of t. The obvious answer is to
save, or cache, the results of inference, so that the inference process at the next time step can
build on the results of earlier steps instead of having to start again from scratch.
As we saw in Section 4.4, the past history of percepts and all their ramifications can
be replaced by the belief state—that is, some representation of the set of all possible current
states of the world.12 The process of updating the belief state as new percepts arrive is called
state estimation. Whereas in Section 4.4 the belief state was an explicit list of states, here
we can use a logical sentence involving the proposition symbols associated with the current
time step, as well as the atemporal symbols. For example, the logical sentence
WumpusAlive^1 ∧ L^1_{2,1} ∧ B2,1 ∧ (P3,1 ∨ P2,2)   (7.4)
represents the set of all states at time 1 in which the wumpus is alive, the agent is at [2, 1],
that square is breezy, and there is a pit in [3, 1] or [2, 2] or both.
Maintaining an exact belief state as a logical formula turns out not to be easy. If there
are n fluent symbols for time t, then there are 2^n possible states—that is, assignments of truth
values to those symbols. Now, the set of belief states is the powerset (set of all subsets) of the
set of physical states. There are 2^n physical states, hence 2^{2^n} belief states. Even if we used
the most compact possible encoding of logical formulas, with each belief state represented
12 We can think of the percept history itself as a representation of the belief state, but one that makes inference
increasingly expensive as the history gets longer.
function HYBRID-WUMPUS-AGENT(percept) returns an action
  inputs: percept, a list, [stench, breeze, glitter, bump, scream]
  persistent: KB, a knowledge base, initially the atemporal “wumpus physics”
              t, a counter, initially 0, indicating time
              plan, an action sequence, initially empty

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  TELL the KB the temporal “physics” sentences for time t
  safe ← {[x, y] : ASK(KB, OK^t_{x,y}) = true}
  if ASK(KB, Glitter^t) = true then
    plan ← [Grab] + PLAN-ROUTE(current, {[1,1]}, safe) + [Climb]
  if plan is empty then
    unvisited ← {[x, y] : ASK(KB, L^{t'}_{x,y}) = false for all t' ≤ t}
    plan ← PLAN-ROUTE(current, unvisited ∩ safe, safe)
  if plan is empty and ASK(KB, HaveArrow^t) = true then
    possible_wumpus ← {[x, y] : ASK(KB, ¬Wx,y) = false}
    plan ← PLAN-SHOT(current, possible_wumpus, safe)
  if plan is empty then   // no choice but to take a risk
    not_unsafe ← {[x, y] : ASK(KB, ¬OK^t_{x,y}) = false}
    plan ← PLAN-ROUTE(current, unvisited ∩ not_unsafe, safe)
  if plan is empty then
    plan ← PLAN-ROUTE(current, {[1, 1]}, safe) + [Climb]
  action ← POP(plan)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

function PLAN-ROUTE(current, goals, allowed) returns an action sequence
  inputs: current, the agent’s current position
          goals, a set of squares; try to plan a route to one of them
          allowed, a set of squares that can form part of the route

  problem ← ROUTE-PROBLEM(current, goals, allowed)
  return A*-GRAPH-SEARCH(problem)
Figure 7.20 A hybrid agent program for the wumpus world. It uses a propositional knowl-
edge base to infer the state of the world, and a combination of problem-solving search and
domain-specific code to decide what actions to take.
by a unique binary number, we would need numbers with log2(2^{2^n}) = 2^n bits to label the
current belief state. That is, exact state estimation may require logical formulas whose size is
exponential in the number of symbols.
One very common and natural scheme for approximate state estimation is to represent
belief states as conjunctions of literals, that is, 1-CNF formulas. To do this, the agent program
simply tries to prove X^t and ¬X^t for each symbol X^t (as well as each atemporal symbol
whose truth value is not yet known), given the belief state at t − 1. The conjunction of
provable literals becomes the new belief state, and the previous belief state is discarded.

Figure 7.21 Depiction of a 1-CNF belief state (bold outline) as a simply representable,
conservative approximation to the exact (wiggly) belief state (shaded region with dashed
outline). Each possible world is shown as a circle; the shaded ones are consistent with all the
percepts.
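In outline, the 1-CNF update is a loop over candidate literals, with the entailment test supplied by whatever complete inference procedure the agent uses (DPLL in this chapter). The following is a hypothetical Python rendering; entails, the axiom lists, and the symbol list are placeholders to be supplied by the surrounding agent, not functions defined in the book:

def one_cnf_update(prev_belief, percept_sentence, temporal_axioms, symbols, entails):
    """One step of 1-CNF (conjunction-of-literals) state estimation.

    prev_belief      -- list of literals believed after the previous time step
    percept_sentence -- the time-indexed percept sentence for the current step
    temporal_axioms  -- successor-state and percept axioms linking the two steps
    symbols          -- fluent symbols (and still-unknown atemporal symbols) to test
    entails          -- a complete entailment test, e.g. one built on DPLL
    """
    kb = list(prev_belief) + [percept_sentence] + list(temporal_axioms)
    new_belief = []
    for s in symbols:
        if entails(kb, s):              # provably true: keep the positive literal
            new_belief.append(s)
        elif entails(kb, f"¬{s}"):      # provably false: keep the negative literal
            new_belief.append(f"¬{s}")
        # otherwise the symbol is unknown and is simply dropped from the belief state
    return new_belief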
It is important to understand that this scheme may lose some information as time goes
along. For example, if the sentence in Equation (7.4) were the true belief state, then neither
P3,1 nor P2,2 would be provable individually and neither would appear in the 1-CNF belief
state. (Exercise 7.27 explores one possible solution to this problem.) On the other hand,
because every literal in the 1-CNF belief state is proved from the previous belief state, and
the initial belief state is a true assertion, we know that the entire 1-CNF belief state must be
true. Thus, the set of possible states represented by the 1-CNF belief state includes all states
that are in fact possible given the full percept history. As illustrated in Figure 7.21, the 1-
CNF belief state acts as a simple outer envelope, or conservative approximation, around the
exact belief state. We see this idea of conservative approximations to complicated sets as a
recurring theme in many areas of AI.
7.7.4 Making plans by propositional inference
The agent in Figure 7.20 uses logical inference to determine which squares are safe, but uses
A∗ search to make plans. In this section, we show how to make plans by logical inference.
The basic idea is very simple:
1. Construct a sentence that includes
(a) Init^0, a collection of assertions about the initial state;
(b) Transition^1, . . . , Transition^t, the successor-state axioms for all possible actions
    at each time up to some maximum time t;
(c) the assertion that the goal is achieved at time t: HaveGold^t ∧ ClimbedOut^t.
2. Present the whole sentence to a SAT solver. If the solver finds a satisfying model, then
the goal is achievable; if the sentence is unsatisfiable, then the planning problem is
impossible.
3. Assuming a model is found, extract from the model those variables that represent ac-
tions and are assigned true. Together they represent a plan to achieve the goals.
A propositional planning procedure, SATPLAN, is shown in Figure 7.22. It implements the
basic idea just given, with one twist. Because the agent does not know how many steps it
will take to reach the goal, the algorithm tries each possible number of steps t, up to some
maximum conceivable plan length Tmax. In this way, it is guaranteed to find the shortest plan
if one exists. Because of the way SATPLAN searches for a solution, this approach cannot
be used in a partially observable environment; SATPLAN would just set the unobservable
variables to the values it needs to create a solution.
function SATPLAN(init, transition, goal, Tmax) returns solution or failure
  inputs: init, transition, goal, constitute a description of the problem
          Tmax, an upper limit for plan length

  for t = 0 to Tmax do
    cnf ← TRANSLATE-TO-SAT(init, transition, goal, t)
    model ← SAT-SOLVER(cnf)
    if model is not null then
      return EXTRACT-SOLUTION(model)
  return failure
Figure 7.22 The SATPLAN algorithm. The planning problem is translated into a CNF
sentence in which the goal is asserted to hold at a fixed time step t and axioms are included
for each time step up to t. If the satisfiability algorithm finds a model, then a plan is extracted
by looking at those proposition symbols that refer to actions and are assigned true in the
model. If no model exists, then the process is repeated with the goal moved one step later.
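A rough Python rendering of the same control loop follows; translate_to_sat and sat_solve stand in for a problem-specific CNF encoder and an off-the-shelf SAT solver, so both signatures are assumptions rather than code given in the chapter:

def satplan(init, transition, goal, t_max, translate_to_sat, sat_solve):
    """SATPLAN skeleton: try plan lengths t = 0 .. t_max, shortest first.

    translate_to_sat(init, transition, goal, t) is assumed to return a CNF
    sentence together with a dict mapping propositional variables to
    (time, action) pairs; sat_solve(cnf) is assumed to return a satisfying
    assignment (a dict from variables to booleans) or None if unsatisfiable.
    """
    for t in range(t_max + 1):
        cnf, action_vars = translate_to_sat(init, transition, goal, t)
        model = sat_solve(cnf)
        if model is not None:
            # Extract the action symbols assigned true, ordered by time step.
            chosen = sorted((step, act) for var, (step, act) in action_vars.items()
                            if model.get(var))
            return [act for step, act in chosen]
    return None  # the goal cannot be achieved within t_max steps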
The key step in using SATPLAN is the construction of the knowledge base. It might
seem, on casual inspection, that the wumpus world axioms in Section 7.7.1 suffice for steps
1(a) and 1(b) above. There is, however, a significant difference between the requirements for
entailment (as tested by ASK) and those for satisfiability. Consider, for example, the agent’s
location, initially [1, 1], and suppose the agent’s unambitious goal is to be in [2, 1] at time 1.
The initial knowledge base contains L^0_{1,1} and the goal is L^1_{2,1}. Using ASK, we can prove L^1_{2,1}
if Forward^0 is asserted, and, reassuringly, we cannot prove L^1_{2,1} if, say, Shoot^0 is asserted
instead. Now, SATPLAN will find the plan [Forward^0]; so far, so good. Unfortunately,
SATPLAN also finds the plan [Shoot^0]. How could this be? To find out, we inspect the model
that SATPLAN constructs: it includes the assignment L^0_{2,1}, that is, the agent can be in [2, 1]
at time 1 by being there at time 0 and shooting. One might ask, “Didn’t we say the agent is in
[1, 1] at time 0?” Yes, we did, but we didn’t tell the agent that it can’t be in two places at once!
For entailment, L^0_{2,1} is unknown and cannot, therefore, be used in a proof; for satisfiability,
on the other hand, L^0_{2,1} is unknown and can, therefore, be set to whatever value helps to
make the goal true. For this reason, SATPLAN is a good debugging tool for knowledge bases
because it reveals places where knowledge is missing. In this particular case, we can fix the
knowledge base by asserting that, at each time step, the agent is in exactly one location, using
a collection of sentences similar to those used to assert the existence of exactly one wumpus.
Alternatively, we can assert ¬L^0_{x,y} for all locations other than [1, 1]; the successor-state axiom
for location takes care of subsequent time steps.
for location takes care of subsequent time steps. The same fixes also work to make sure the
agent has only one orientation.
SATPLAN has more surprises in store, however. The first is that it finds models with
impossible actions, such as shooting with no arrow. To understand why, we need to look more
carefully at what the successor-state axioms (such as Equation (7.3)) say about actions whose
preconditions are not satisfied. The axioms do predict correctly that nothing will happen when
such an action is executed (see Exercise 10.14), but they do not say that the action cannot be
executed! To avoid generating plans with illegal actions, we must add precondition axioms
stating that an action occurrence requires the preconditions to be satisfied.13 For example, we
need to say, for each time t, that
Shoot^t ⇒ HaveArrow^t .
This ensures that if a plan selects the Shoot action at any time, it must be the case that the
agent has an arrow at that time.
SATPLAN’s second surprise is the creation of plans with multiple simultaneous actions.
For example, it may come up with a model in which both Forward^0 and Shoot^0 are true,
which is not allowed. To eliminate this problem, we introduce action exclusion axioms: for
every pair of actions A^t_i and A^t_j we add the axiom
¬A^t_i ∨ ¬A^t_j .
It might be pointed out that walking forward and shooting at the same time is not so hard to
do, whereas, say, shooting and grabbing at the same time is rather impractical. By imposing
action exclusion axioms only on pairs of actions that really do interfere with each other, we
can allow for plans that include multiple simultaneous actions—and because SATPLAN finds
the shortest legal plan, we can be sure that it will take advantage of this capability.
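Both kinds of axioms are again generated schematically. The sketch below is illustrative Python with our own string conventions; it emits the Shoot precondition axiom and the pairwise exclusion clauses for one time step, with an optional list of action pairs allowed to co-occur:

from itertools import combinations

def shoot_precondition_axiom(t):
    """Shooting at time t requires having the arrow at time t."""
    return f"Shoot{t} ⇒ HaveArrow{t}"

def action_exclusion_axioms(actions, t, compatible=()):
    """Forbid every pair of distinct actions at time t, except pairs explicitly
    listed as compatible (e.g. ('Forward', 'Shoot') if simultaneous walking
    and shooting is to be allowed)."""
    allowed = [set(pair) for pair in compatible]
    return [f"¬{a}{t} ∨ ¬{b}{t}"
            for a, b in combinations(actions, 2) if {a, b} not in allowed]

actions = ["Forward", "TurnLeft", "TurnRight", "Grab", "Shoot", "Climb"]
print(shoot_precondition_axiom(0))
print(len(action_exclusion_axioms(actions, 0)))   # C(6, 2) = 15 exclusion clauses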
To summarize, SATPLAN finds models for a sentence containing the initial state, the
goal, the successor-state axioms, the precondition axioms, and the action exclusion axioms.
It can be shown that this collection of axioms is sufficient, in the sense that there are no
longer any spurious “solutions.” Any model satisfying the propositional sentence will be a
valid plan for the original problem. Modern SAT-solving technology makes the approach
quite practical. For example, a DPLL-style solver has no difficulty in generating the 11-step
solution for the wumpus world instance shown in Figure 7.2.
This section has described a declarative approach to agent construction: the agent works
by a combination of asserting sentences in the knowledge base and performing logical infer-
ence. This approach has some weaknesses hidden in phrases such as “for each time t” and
13 Notice that the addition of precondition axioms means that we need not include preconditions for actions in
the successor-state axioms.
“for each square [x, y].” For any practical agent, these phrases have to be implemented by
code that generates instances of the general sentence schema automatically for insertion into
the knowledge base. For a wumpus world of reasonable size—one comparable to a smallish
computer game—we might need a 100 × 100 board and 1000 time steps, leading to knowl-
edge bases with tens or hundreds of millions of sentences. Not only does this become rather
impractical, but it also illustrates a deeper problem: we know something about the wum-
pus world—namely, that the “physics” works the same way across all squares and all time
steps—that we cannot express directly in the language of propositional logic. To solve this
problem, we need a more expressive language, one in which phrases like “for each time t”
and “for each square [x, y]” can be written in a natural way. First-order logic, described in
Chapter 8, is such a language; in first-order logic a wumpus world of any size and duration
can be described in about ten sentences rather than ten million or ten trillion.
7.8 SUMMARY
We have introduced knowledge-based agents and have shown how to define a logic with
which such agents can reason about the world. The main points are as follows:
• Intelligent agents need knowledge about the world in order to reach good decisions.
• Knowledge is contained in agents in the form of sentences in a knowledge represen-
tation language that are stored in a knowledge base.
• A knowledge-based agent is composed of a knowledge base and an inference mecha-
nism. It operates by storing sentences about the world in its knowledge base, using the
inference mechanism to infer new sentences, and using these sentences to decide what
action to take.
• A representation language is defined by its syntax, which specifies the structure of
sentences, and its semantics, which defines the truth of each sentence in each possible
world or model.
• The relationship of entailment between sentences is crucial to our understanding of
reasoning. A sentence α entails another sentence β if β is true in all worlds where
α is true. Equivalent definitions include the validity of the sentence α ⇒ β and the
unsatisfiability of the sentence α ∧ ¬β.
• Inference is the process of deriving new sentences from old ones. Sound inference algo-
rithms derive only sentences that are entailed; complete algorithms derive all sentences
that are entailed.
• Propositional logic is a simple language consisting of proposition symbols and logical
connectives. It can handle propositions that are known true, known false, or completely
unknown.
• The set of possible models, given a fixed propositional vocabulary, is finite, so en-
tailment can be checked by enumerating models. Efficient model-checking inference
algorithms for propositional logic include backtracking and local search methods and
can often solve large problems quickly.
• Inference rules are patterns of sound inference that can be used to find proofs. The
resolution rule yields a complete inference algorithm for knowledge bases that are
expressed in conjunctive normal form. Forward chaining and backward chaining
are very natural reasoning algorithms for knowledge bases in Horn form.
• Local search methods such as WALKSAT can be used to find solutions. Such algo-
rithms are sound but not complete.
• Logical state estimation involves maintaining a logical sentence that describes the set
of possible states consistent with the observation history. Each update step requires
inference using the transition model of the environment, which is built from successor-
state axioms that specify how each fluent changes.
• Decisions within a logical agent can be made by SAT solving: finding possible models
specifying future action sequences that reach the goal. This approach works only for
fully observable or sensorless environments.
• Propositional logic does not scale to environments of unbounded size because it lacks
the expressive power to deal concisely with time, space, and universal patterns of rela-
tionships among objects.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
John McCarthy’s paper “Programs with Common Sense” (McCarthy, 1958, 1968) promul-
gated the notion of agents that use logical reasoning to mediate between percepts and actions.
It also raised the flag of declarativism, pointing out that telling an agent what it needs to know
is an elegant way to build software. Allen Newell’s (1982) article “The Knowledge Level”
makes the case that rational agents can be described and analyzed at an abstract level defined
by the knowledge they possess rather than the programs they run. The declarative and proce-
dural approaches to AI are analyzed in depth by Boden (1977). The debate was revived by,
among others, Brooks (1991) and Nilsson (1991), and continues to this day (Shaparau et al.,
2008). Meanwhile, the declarative approach has spread into other areas of computer science
such as networking (Loo et al., 2006).
Logic itself had its origins in ancient Greek philosophy and mathematics. Various log-
ical principles—principles connecting the syntactic structure of sentences with their truth
and falsity, with their meaning, or with the validity of arguments in which they figure—are
scattered in the works of Plato. The first known systematic study of logic was carried out
by Aristotle, whose work was assembled by his students after his death in 322 B.C. as a
treatise called the Organon. Aristotle’s syllogisms were what we would now call inference
rules. Although the syllogisms included elements of both propositional and first-order logic,
the system as a whole lacked the compositional properties required to handle sentences of
arbitrary complexity.
The closely related Megarian and Stoic schools (originating in the fifth century B.C.
and continuing for several centuries thereafter) began the systematic study of the basic logical
connectives. The use of truth tables for defining connectives is due to Philo of Megara. The
Stoics took five basic inference rules as valid without proof, including the rule we now call
Modus Ponens. They derived a number of other rules from these five, using, among other
principles, the deduction theorem (page 249) and were much clearer about the notion of
proof than was Aristotle. A good account of the history of Megarian and Stoic logic is given
by Benson Mates (1953).
The idea of reducing logical inference to a purely mechanical process applied to a for-
mal language is due to Wilhelm Leibniz (1646–1716), although he had limited success in im-
plementing the ideas. George Boole (1847) introduced the first comprehensive and workable
system of formal logic in his book The Mathematical Analysis of Logic. Boole’s logic was
closely modeled on the ordinary algebra of real numbers and used substitution of logically
equivalent expressions as its primary inference method. Although Boole’s system still fell
short of full propositional logic, it was close enough that other mathematicians could quickly
fill in the gaps. Schröder (1877) described conjunctive normal form, while Horn form was
introduced much later by Alfred Horn (1951). The first comprehensive exposition of modern
propositional logic (and first-order logic) is found in Gottlob Frege’s (1879) Begriffschrift
(“Concept Writing” or “Conceptual Notation”).
The first mechanical device to carry out logical inferences was constructed by the third
Earl of Stanhope (1753–1816). The Stanhope Demonstrator could handle syllogisms and
certain inferences in the theory of probability. William Stanley Jevons, one of those who
improved upon and extended Boole’s work, constructed his “logical piano” in 1869 to per-
form inferences in Boolean logic. An entertaining and instructive history of these and other
early mechanical devices for reasoning is given by Martin Gardner (1968). The first pub-
lished computer program for logical inference was the Logic Theorist of Newell, Shaw,
and Simon (1957). This program was intended to model human thought processes. Mar-
tin Davis (1957) had actually designed a program that came up with a proof in 1954, but the
Logic Theorist’s results were published slightly earlier.
Truth tables as a method of testing validity or unsatisfiability in propositional logic were
introduced independently by Emil Post (1921) and Ludwig Wittgenstein (1922). In the 1930s,
a great deal of progress was made on inference methods for first-order logic. In particular,
Gödel (1930) showed that a complete procedure for inference in first-order logic could be
obtained via a reduction to propositional logic, using Herbrand’s theorem (Herbrand, 1930).
We take up this history again in Chapter 9; the important point here is that the development
of efficient propositional algorithms in the 1960s was motivated largely by the interest of
mathematicians in an effective theorem prover for first-order logic. The Davis–Putnam algo-
rithm (Davis and Putnam, 1960) was the first effective algorithm for propositional resolution
but was in most cases much less efficient than the DPLL backtracking algorithm introduced
two years later (1962). The full resolution rule and a proof of its completeness appeared in a
seminal paper by J. A. Robinson (1965), which also showed how to do first-order reasoning
without resort to propositional techniques.
Stephen Cook (1971) showed that deciding satisfiability of a sentence in propositional
logic (the SAT problem) is NP-complete. Since deciding entailment is equivalent to decid-
ing unsatisfiability, it is co-NP-complete. Many subsets of propositional logic are known for
which the satisfiability problem is polynomially solvable; Horn clauses are one such subset.
The linear-time forward-chaining algorithm for Horn clauses is due to Dowling and Gallier
(1984), who describe their algorithm as a dataflow process similar to the propagation of sig-
nals in a circuit.
Early theoretical investigations showed that DPLL has polynomial average-case com-
plexity for certain natural distributions of problems. This potentially exciting fact became
less exciting when Franco and Paull (1983) showed that the same problems could be solved
in constant time simply by guessing random assignments. The random-generation method
described in the chapter produces much harder problems. Motivated by the empirical success
of local search on these problems, Koutsoupias and Papadimitriou (1992) showed that a sim-
ple hill-climbing algorithm can solve almost all satisfiability problem instances very quickly,
suggesting that hard problems are rare. Moreover, Schöning (1999) exhibited a randomized
hill-climbing algorithm whose worst-case expected run time on 3-SAT problems (that is, sat-
isfiability of 3-CNF sentences) is O(1.333^n)—still exponential, but substantially faster than
previous worst-case bounds. The current record is O(1.324^n) (Iwama and Tamaki, 2004).
Achlioptas et al. (2004) and Alekhnovich et al. (2005) exhibit families of 3-SAT instances
for which all known DPLL-like algorithms require exponential running time.
On the practical side, efficiency gains in propositional solvers have been marked. Given
ten minutes of computing time, the original DPLL algorithm in 1962 could only solve prob-
lems with no more than 10 or 15 variables. By 1995 the SATZ solver (Li and Anbulagan,
1997) could handle 1,000 variables, thanks to optimized data structures for indexing vari-
ables. Two crucial contributions were the watched literal indexing technique of Zhang and
Stickel (1996), which makes unit propagation very efficient, and the introduction of clause
(i.e., constraint) learning techniques from the CSP community by Bayardo and Schrag (1997).
Using these ideas, and spurred by the prospect of solving industrial-scale circuit verification
problems, Moskewicz et al. (2001) developed the CHAFF solver, which could handle prob-
lems with millions of variables. Beginning in 2002, SAT competitions have been held reg-
ularly; most of the winning entries have either been descendants of CHAFF or have used the
same general approach. RSAT (Pipatsrisawat and Darwiche, 2007), the 2007 winner, falls in
the latter category. Also noteworthy is MINISAT (Een and Sörensson, 2003), an open-source
implementation available at http://minisat.se that is designed to be easily modified
and improved. The current landscape of solvers is surveyed by Gomes et al. (2008).
Local search algorithms for satisfiability were tried by various authors throughout the
1980s; all of the algorithms were based on the idea of minimizing the number of unsatisfied
clauses (Hansen and Jaumard, 1990). A particularly effective algorithm was developed by
Gu (1989) and independently by Selman et al. (1992), who called it GSAT and showed that
it was capable of solving a wide range of very hard problems very quickly. The WALKSAT
algorithm described in the chapter is due to Selman et al. (1996).
The “phase transition” in satisfiability of random k-SAT problems was first observed
by Simon and Dubois (1989) and has given rise to a great deal of theoretical and empirical
research—due, in part, to the obvious connection to phase transition phenomena in statistical
physics. Cheeseman et al. (1991) observed phase transitions in several CSPs and conjecture
that all NP-hard problems have a phase transition. Crawford and Auton (1993) located the
3-SAT transition at a clause/variable ratio of around 4.26, noting that this coincides with a
sharp peak in the run time of their SAT solver. Cook and Mitchell (1997) provide an excellent
summary of the early literature on the problem.
The current state of theoretical understanding is summarized by Achlioptas (2009).
The satisfiability threshold conjecture states that, for each k, there is a sharp satisfiability
threshold rk, such that as the number of variables n → ∞, instances below the threshold are
satisfiable with probability 1, while those above the threshold are unsatisfiable with proba-
bility 1. The conjecture was not quite proved by Friedgut (1999): a sharp threshold exists but
its location might depend on n even as n → ∞. Despite significant progress in asymptotic
analysis of the threshold location for large k (Achlioptas and Peres, 2004; Achlioptas et al.,
2007), all that can be proved for k = 3 is that it lies in the range [3.52,4.51]. Current theory
suggests that a peak in the run time of a SAT solver is not necessarily related to the satisfia-
bility threshold, but instead to a phase transition in the solution distribution and structure of
SAT instances. Empirical results due to Coarfa et al. (2003) support this view. In fact, al-
gorithms such as survey propagation (Parisi and Zecchina, 2002; Maneva et al., 2007) take
advantage of special properties of random SAT instances near the satisfiability threshold and
greatly outperform general SAT solvers on such instances.
The best sources for information on satisfiability, both theoretical and practical, are the
Handbook of Satisfiability (Biere et al., 2009) and the regular International Conferences on
Theory and Applications of Satisfiability Testing, known as SAT.
The idea of building agents with propositional logic can be traced back to the seminal
paper of McCulloch and Pitts (1943), which initiated the field of neural networks. Con-
trary to popular supposition, the paper was concerned with the implementation of a Boolean
circuit-based agent design in the brain. Circuit-based agents, which perform computation by
propagating signals in hardware circuits rather than running algorithms in general-purpose
computers, have received little attention in AI, however. The most notable exception is the
work of Stan Rosenschein (Rosenschein, 1985; Kaelbling and Rosenschein, 1990), who de-
veloped ways to compile circuit-based agents from declarative descriptions of the task envi-
ronment. (Rosenschein’s approach is described at some length in the second edition of this
book.) The work of Rod Brooks (1986, 1989) demonstrates the effectiveness of circuit-based
designs for controlling robots—a topic we take up in Chapter 25. Brooks (1991) argues
that circuit-based designs are all that is needed for AI—that representation and reasoning
are cumbersome, expensive, and unnecessary. In our view, neither approach is sufficient by
itself. Williams et al. (2003) show how a hybrid agent design not too different from our
wumpus agent has been used to control NASA spacecraft, planning sequences of actions and
diagnosing and recovering from faults.
The general problem of keeping track of a partially observable environment was intro-
duced for state-based representations in Chapter 4. Its instantiation for propositional repre-
sentations was studied by Amir and Russell (2003), who identified several classes of envi-
ronments that admit efficient state-estimation algorithms and showed that for several other
classes the problem is intractable. The temporal-projection problem, which involves deter-
mining what propositions hold true after an action sequence is executed, can be seen as a
special case of state estimation with empty percepts. Many authors have studied this problem
because of its importance in planning; some important hardness results were established by
Liberatore (1997). The idea of representing a belief state with propositions can be traced to
Wittgenstein (1922).
Logical state estimation, of course, requires a logical representation of the effects of
actions—a key problem in AI since the late 1950s. The dominant proposal has been the sit-
uation calculus formalism (McCarthy, 1963), which is couched within first-order logic. We
discuss situation calculus, and various extensions and alternatives, in Chapters 10 and 12. The
approach taken in this chapter—using temporal indices on propositional variables—is more
restrictive but has the benefit of simplicity. The general approach embodied in the SATPLAN
algorithm was proposed by Kautz and Selman (1992). Later generations of SATPLAN were
able to take advantage of the advances in SAT solvers, described earlier, and remain among
the most effective ways of solving difficult problems (Kautz, 2006).
The frame problem was first recognized by McCarthy and Hayes (1969). Many re-
searchers considered the problem unsolvable within first-order logic, and it spurred a great
deal of research into nonmonotonic logics. Philosophers from Dreyfus (1972) to Crockett
(1994) have cited the frame problem as one symptom of the inevitable failure of the entire
AI enterprise. The solution of the frame problem with successor-state axioms is due to Ray
Reiter (1991). Thielscher (1999) identifies the inferential frame problem as a separate idea
and provides a solution. In retrospect, one can see that Rosenschein’s (1985) agents were
using circuits that implemented successor-state axioms, but Rosenschein did not notice that
the frame problem was thereby largely solved. Foo (2001) explains why the discrete-event
control theory models typically used by engineers do not have to explicitly deal with the
frame problem: because they are dealing with prediction and control, not with explanation
and reasoning about counterfactual situations.
Modern propositional solvers have wide applicability in industrial applications. The ap-
plication of propositional inference in the synthesis of computer hardware is now a standard
technique having many large-scale deployments (Nowick et al., 1993). The SATMC satisfi-
ability checker was used to detect a previously unknown vulnerability in a Web browser user
sign-on protocol (Armando et al., 2008).
The wumpus world was invented by Gregory Yob (1975). Ironically, Yob developed it
because he was bored with games played on a rectangular grid: the topology of his original
wumpus world was a dodecahedron, and we put it back in the boring old grid. Michael
Genesereth was the first to suggest that the wumpus world be used as an agent testbed.
EXERCISES
7.1 Suppose the agent has progressed to the point shown in Figure 7.4(a), page 239, having
perceived nothing in [1,1], a breeze in [2,1], and a stench in [1,2], and is now concerned with
the contents of [1,3], [2,2], and [3,1]. Each of these can contain a pit, and at most one can
contain a wumpus. Following the example of Figure 7.5, construct the set of possible worlds.
(You should find 32 of them.) Mark the worlds in which the KB is true and those in which
each of the following sentences is true:
α2 = “There is no pit in [2,2].”
α3 = “There is a wumpus in [1,3].”
Hence show that KB |= α2 and KB |= α3.
7.2 (Adapted from Barwise and Etchemendy (1993).) Given the following, can you prove
that the unicorn is mythical? How about magical? Horned?
If the unicorn is mythical, then it is immortal, but if it is not mythical, then it is a
mortal mammal. If the unicorn is either immortal or a mammal, then it is horned.
The unicorn is magical if it is horned.
7.3 Consider the problem of deciding whether a propositional logic sentence is true in a
given model.
a. Write a recursive algorithm PL-TRUE?(s, m) that returns true if and only if the sen-
tence s is true in the model m (where m assigns a truth value for every symbol in s).
The algorithm should run in time linear in the size of the sentence. (Alternatively, use a
version of this function from the online code repository.)
b. Give three examples of sentences that can be determined to be true or false in a partial
model that does not specify a truth value for some of the symbols.
c. Show that the truth value (if any) of a sentence in a partial model cannot be determined
efficiently in general.
d. Modify your PL-TRUE? algorithm so that it can sometimes judge truth from partial
models, while retaining its recursive structure and linear run time. Give three examples
of sentences whose truth in a partial model is not detected by your algorithm.
e. Investigate whether the modified algorithm makes TT-ENTAILS? more efficient.
7.4 Which of the following are correct?
a. False |= True.
b. True |= False.
c. (A ∧ B) |= (A ⇔ B).
d. A ⇔ B |= A ∨ B.
e. A ⇔ B |= ¬A ∨ B.
f. (A ∨ B) ∧ (¬C ∨ ¬D ∨ E) |= (A ∨ B ∨ C) ∧ (B ∧ C ∧ D ⇒ E).
g. (A ∨ B) ∧ (¬C ∨ ¬D ∨ E) |= (A ∨ B) ∧ (¬D ∨ E).
h. (A ∨ B) ∧ ¬(A ⇒ B) is satisfiable.
i. (A ∧ B) ⇒ C |= (A ⇒ C) ∨ (B ⇒ C).
j. (C ∨ (¬A ∧ ¬B)) ≡ ((A ⇒ C) ∧ (B ⇒ C)).
k. (A ⇔ B) ∧ (¬A ∨ B) is satisfiable.
l. (A ⇔ B) ⇔ C has the same number of models as (A ⇔ B) for any fixed set of
proposition symbols that includes A, B, C.
7.5 Prove each of the following assertions:
a. α is valid if and only if True |= α.
b. For any α, False |= α.
c. α |= β if and only if the sentence (α ⇒ β) is valid.
d. α ≡ β if and only if the sentence (α ⇔ β) is valid.
e. α |= β if and only if the sentence (α ∧ ¬β) is unsatisfiable.
7.6 Prove, or find a counterexample to, each of the following assertions:
a. If α |= γ or β |= γ (or both) then (α ∧ β) |= γ
b. If (α ∧ β) |= γ then α |= γ or β |= γ (or both).
c. If α |= (β ∨ γ) then α |= β or α |= γ (or both).
7.7 Consider a vocabulary with only four propositions, A, B, C, and D. How many models
are there for the following sentences?
a. B ∨ C.
b. ¬A ∨ ¬B ∨ ¬C ∨ ¬D.
c. (A ⇒ B) ∧ A ∧ ¬B ∧ C ∧ D.
7.8 We have defined four binary logical connectives.
a. Are there any others that might be useful?
b. How many binary connectives can there be?
c. Why are some of them not very useful?
7.9 Using a method of your choice, verify each of the equivalences in Figure 7.11 (page 249).
7.10 Decide whether each of the following sentences is valid, unsatisfiable, or neither. Ver-
ify your decisions using truth tables or the equivalence rules of Figure 7.11 (page 249).
a. Smoke ⇒ Smoke
b. Smoke ⇒ Fire
c. (Smoke ⇒ Fire) ⇒ (¬Smoke ⇒ ¬Fire)
d. Smoke ∨ Fire ∨ ¬Fire
e. ((Smoke ∧ Heat) ⇒ Fire) ⇔ ((Smoke ⇒ Fire) ∨ (Heat ⇒ Fire))
f. Big ∨ Dumb ∨ (Big ⇒ Dumb)
g. (Big ∧ Dumb) ∨ ¬Dumb
7.11 Any propositional logic sentence is logically equivalent to the assertion that each pos-
sible world in which it would be false is not the case. From this observation, prove that any
sentence can be written in CNF.
7.12 Use resolution to prove the sentence ¬A ∧ ¬B from the clauses in Exercise 7.19.
7.13 This exercise looks into the relationship between clauses and implication sentences.
a. Show that the clause (¬P1 ∨ · · · ∨ ¬Pm ∨ Q) is logically equivalent to the implication
sentence (P1 ∧ · · · ∧ Pm) ⇒ Q.
b. Show that every clause (regardless of the number of positive literals) can be written in
the form (P1 ∧ · · · ∧ Pm) ⇒ (Q1 ∨ · · · ∨ Qn), where the Ps and Qs are proposition
symbols. A knowledge base consisting of such sentences is in implicative normal
form or Kowalski form (Kowalski, 1979).
c. Write down the full resolution rule for sentences in implicative normal form.
7.14 According to some political pundits, a person who is radical (R) is electable (E) if
he/she is conservative (C), but otherwise is not electable.
a. Which of the following are correct representations of this assertion?
(i) (R ∧ E) ⇐⇒ C
(ii) R ⇒ (E ⇐⇒ C)
(iii) R ⇒ ((C ⇒ E) ∨ ¬E)
b. Which of the sentences in (a) can be expressed in Horn form?
7.15 This question considers representing satisfiability (SAT) problems as CSPs.
a. Draw the constraint graph corresponding to the SAT problem
(¬X1 ∨ X2) ∧ (¬X2 ∨ X3) ∧ . . . ∧ (¬Xn−1 ∨ Xn)
for the particular case n = 4.
b. How many solutions are there for this general SAT problem as a function of n?
c. Suppose we apply BACKTRACKING-SEARCH (page 215) to find all solutions to a SAT
CSP of the type given in (a). (To find all solutions to a CSP, we simply modify the
basic algorithm so it continues searching after each solution is found.) Assume that
variables are ordered X1, . . . , Xn and false is ordered before true. How much time
will the algorithm take to terminate? (Write an O(·) expression as a function of n.)
d. We know that SAT problems in Horn form can be solved in linear time by forward
chaining (unit propagation). We also know that every tree-structured binary CSP with
discrete, finite domains can be solved in time linear in the number of variables (Sec-
tion 6.5). Are these two facts connected? Discuss.
7.16 Prove each of the following assertions:
a. Every pair of propositional clauses either has no resolvents, or all their resolvents are
logically equivalent.
b. There is no clause that, when resolved with itself, yields (after factoring) the clause
(¬P ∨ ¬Q).
c. If a propositional clause C can be resolved with a copy of itself, it must be logically
equivalent to True.
7.17 Consider the following sentence:
[(Food ⇒ Party) ∨ (Drinks ⇒ Party)] ⇒ [(Food ∧ Drinks) ⇒ Party] .
a. Determine, using enumeration, whether this sentence is valid, satisfiable (but not valid),
or unsatisfiable.
b. Convert the left-hand and right-hand sides of the main implication into CNF, showing
each step, and explain how the results confirm your answer to (a).
c. Prove your answer to (a) using resolution.
7.18 A sentence is in disjunctive normal form (DNF) if it is the disjunction of conjunctions
of literals. For example, the sentence (A ∧ B ∧ ¬C) ∨ (¬A ∧ C) ∨ (B ∧ ¬C) is in DNF.
a. Any propositional logic sentence is logically equivalent to the assertion that some pos-
sible world in which it would be true is in fact the case. From this observation, prove
that any sentence can be written in DNF.
b. Construct an algorithm that converts any sentence in propositional logic into DNF.
(Hint: The algorithm is similar to the algorithm for conversion to CNF given in Sec-
tion 7.5.2.)
c. Construct a simple algorithm that takes as input a sentence in DNF and returns a satis-
fying assignment if one exists, or reports that no satisfying assignment exists.
d. Apply the algorithms in (b) and (c) to the following set of sentences:
A ⇒ B
B ⇒ C
C ⇒ ¬A .
e. Since the algorithm in (b) is very similar to the algorithm for conversion to CNF, and
since the algorithm in (c) is much simpler than any algorithm for solving a set of sen-
tences in CNF, why is this technique not used in automated reasoning?
7.19 Convert the following set of sentences to clausal form.
S1: A ⇔ (C ∨ E).
S2: E ⇒ D.
S3: B ∧ F ⇒ ¬C.
S4: E ⇒ C.
S5: C ⇒ F.
S6: C ⇒ B
Give a trace of the execution of DPLL on the conjunction of these clauses.
7.20 Is a randomly generated 4-CNF sentence with n symbols and m clauses more or less
likely to be solvable than a randomly generated 3-CNF sentence with n symbols and m
clauses? Explain.
7.21 Minesweeper, the well-known computer game, is closely related to the wumpus world.
A minesweeper world is a rectangular grid of N squares with M invisible mines scattered
among them. Any square may be probed by the agent; instant death follows if a mine is
probed. Minesweeper indicates the presence of mines by revealing, in each probed square,
the number of mines that are directly or diagonally adjacent. The goal is to probe every
unmined square.
a. Let Xi,j be true iff square [i, j] contains a mine. Write down the assertion that exactly
two mines are adjacent to [1,1] as a sentence involving some logical combination of
Xi,j propositions.
b. Generalize your assertion from (a) by explaining how to construct a CNF sentence
asserting that k of n neighbors contain mines.
c. Explain precisely how an agent can use DPLL to prove that a given square does (or
does not) contain a mine, ignoring the global constraint that there are exactly M mines
in all.
d. Suppose that the global constraint is constructed from your method from part (b). How
does the number of clauses depend on M and N? Suggest a way to modify DPLL so
that the global constraint does not need to be represented explicitly.
e. Are any conclusions derived by the method in part (c) invalidated when the global
constraint is taken into account?
f. Give examples of configurations of probe values that induce long-range dependencies
such that the contents of a given unprobed square would give information about the
contents of a far-distant square. (Hint: consider an N × 1 board.)
7.22 How long does it take to prove KB |= α using DPLL when α is a literal already
contained in KB? Explain.
7.23 Trace the behavior of DPLL on the knowledge base in Figure 7.16 when trying to
prove Q, and compare this behavior with that of the forward-chaining algorithm.
7.24 Discuss what is meant by optimal behavior in the wumpus world. Show that the
HYBRID-WUMPUS-AGENT is not optimal, and suggest ways to improve it.
7.25 Suppose an agent inhabits a world with two states, S and ¬S, and can do exactly one
of two actions, a and b. Action a does nothing and action b flips from one state to the other.
Let St be the proposition that the agent is in state S at time t, and let at be the proposition
that the agent does action a at time t (similarly for bt).
a. Write a successor-state axiom for St+1.
b. Convert the sentence in (a) into CNF.
c. Show a resolution refutation proof that if the agent is in ¬S at time t and does a, it will
still be in ¬S at time t + 1.
7.26 Section 7.7.1 provides some of the successor-state axioms required for the wumpus
world. Write down axioms for all remaining fluent symbols.
7.27 Modify the HYBRID-WUMPUS-AGENT to use the 1-CNF logical state estimation
method described on page 271. We noted on that page that such an agent will not be able
to acquire, maintain, and use more complex beliefs such as the disjunction P3,1 ∨ P2,2. Sug-
gest a method for overcoming this problem by defining additional proposition symbols, and
try it out in the wumpus world. Does it improve the performance of the agent?
8 FIRST-ORDER LOGIC
In which we notice that the world is blessed with many objects, some of which are
related to other objects, and in which we endeavor to reason about them.
In Chapter 7, we showed how a knowledge-based agent could represent the world in which it
operates and deduce what actions to take. We used propositional logic as our representation
language because it sufficed to illustrate the basic concepts of logic and knowledge-based
agents. Unfortunately, propositional logic is too puny a language to represent knowledge
of complex environments in a concise way. In this chapter, we examine first-order logic,1
which is sufficiently expressive to represent a good deal of our commonsense knowledge.
It also either subsumes or forms the foundation of many other representation languages and
has been studied intensively for many decades. We begin in Section 8.1 with a discussion of
representation languages in general; Section 8.2 covers the syntax and semantics of first-order
logic; Sections 8.3 and 8.4 illustrate the use of first-order logic for simple representations.
8.1 REPRESENTATION REVISITED
In this section, we discuss the nature of representation languages. Our discussion motivates
the development of first-order logic, a much more expressive language than the propositional
logic introduced in Chapter 7. We look at propositional logic and at other kinds of languages
to understand what works and what fails. Our discussion will be cursory, compressing cen-
turies of thought, trial, and error into a few paragraphs.
Programming languages (such as C++ or Java or Lisp) are by far the largest class of
formal languages in common use. Programs themselves represent, in a direct sense, only
computational processes. Data structures within programs can represent facts; for example,
a program could use a 4 × 4 array to represent the contents of the wumpus world. Thus, the
programming language statement World[2,2] ← Pit is a fairly natural way to assert that there
is a pit in square [2,2]. (Such representations might be considered ad hoc; database systems
were developed precisely to provide a more general, domain-independent way to store and
1 Also called first-order predicate calculus, sometimes abbreviated as FOL or FOPC.
retrieve facts.) What programming languages lack is any general mechanism for deriving
facts from other facts; each update to a data structure is done by a domain-specific procedure
whose details are derived by the programmer from his or her own knowledge of the domain.
This procedural approach can be contrasted with the declarative nature of propositional logic,
in which knowledge and inference are separate, and inference is entirely domain independent.
A second drawback of data structures in programs (and of databases, for that matter)
is the lack of any easy way to say, for example, “There is a pit in [2,2] or [3,1]” or “If the
wumpus is in [1,1] then he is not in [2,2].” Programs can store a single value for each variable,
and some systems allow the value to be “unknown,” but they lack the expressiveness required
to handle partial information.
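For instance, a direct data-structure encoding of the wumpus-world contents might look like the following minimal Python sketch (the variable names are illustrative); each cell holds exactly one definite value, so the disjunctive fact “there is a pit in [2,2] or [3,1]” simply cannot be stored:
# A direct data-structure representation of the wumpus-world contents.
world = [[None for col in range(4)] for row in range(4)]   # a 4 x 4 array; None means "unknown"
world[2][2] = "Pit"                                        # the analogue of World[2,2] <- Pit
# Each cell stores a single definite value, so there is no way to record the
# partial information "there is a pit in [2,2] or in [3,1]".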
Propositional logic is a declarative language because its semantics is based on a truth
relation between sentences and possible worlds. It also has sufficient expressive power to
deal with partial information, using disjunction and negation. Propositional logic has a third
property that is desirable in representation languages, namely, compositionality. In a com-
positional language, the meaning of a sentence is a function of the meaning of its parts. For
example, the meaning of “S1,4 ∧ S1,2” is related to the meanings of “S1,4” and “S1,2.” It
would be very strange if “S1,4” meant that there is a stench in square [1,4] and “S1,2” meant
that there is a stench in square [1,2], but “S1,4 ∧S1,2” meant that France and Poland drew 1–1
in last week’s ice hockey qualifying match. Clearly, noncompositionality makes life much
more difficult for the reasoning system.
As we saw in Chapter 7, however, propositional logic lacks the expressive power to
concisely describe an environment with many objects. For example, we were forced to write
a separate rule about breezes and pits for each square, such as
B1,1 ⇔ (P1,2 ∨ P2,1) .
In English, on the other hand, it seems easy enough to say, once and for all, “Squares adjacent
to pits are breezy.” The syntax and semantics of English somehow make it possible to describe
the environment concisely.
8.1.1 The language of thought
Natural languages (such as English or Spanish) are very expressive indeed. We managed to
write almost this whole book in natural language, with only occasional lapses into other lan-
guages (including logic, mathematics, and the language of diagrams). There is a long tradi-
tion in linguistics and the philosophy of language that views natural language as a declarative
knowledge representation language. If we could uncover the rules for natural language, we
could use it in representation and reasoning systems and gain the benefit of the billions of
pages that have been written in natural language.
The modern view of natural language is that it serves as a medium for communication
rather than pure representation. When a speaker points and says, “Look!” the listener comes
to know that, say, Superman has finally appeared over the rooftops. Yet we would not want
to say that the sentence “Look!” represents that fact. Rather, the meaning of the sentence
depends both on the sentence itself and on the context in which the sentence was spoken.
Clearly, one could not store a sentence such as “Look!” in a knowledge base and expect to
recover its meaning without also storing a representation of the context—which raises the
question of how the context itself can be represented. Natural languages also suffer from
ambiguity, a problem for a representation language. As Pinker (1995) puts it: “When people
think about spring, surely they are not confused as to whether they are thinking about a season
or something that goes boing—and if one word can correspond to two thoughts, thoughts
can’t be words.”
The famous Sapir–Whorf hypothesis claims that our understanding of the world is
strongly influenced by the language we speak. Whorf (1956) wrote “We cut nature up, orga-
nize it into concepts, and ascribe significances as we do, largely because we are parties to an
agreement to organize it this way—an agreement that holds throughout our speech commu-
nity and is codified in the patterns of our language.” It is certainly true that different speech
communities divide up the world differently. The French have two words “chaise” and “fau-
teuil,” for a concept that English speakers cover with one: “chair.” But English speakers
can easily recognize the category fauteuil and give it a name—roughly “open-arm chair”—so
does language really make a difference? Whorf relied mainly on intuition and speculation,
but in the intervening years we actually have real data from anthropological, psychological
and neurological studies.
For example, can you remember which of the following two phrases formed the opening
of Section 8.1?
“In this section, we discuss the nature of representation languages . . .”
“This section covers the topic of knowledge representation languages . . .”
Wanner (1974) did a similar experiment and found that subjects made the right choice at
chance level—about 50% of the time—but remembered the content of what they read with
better than 90% accuracy. This suggests that people process the words to form some kind of
nonverbal representation.
More interesting is the case in which a concept is completely absent in a language.
Speakers of the Australian aboriginal language Guugu Yimithirr have no words for relative
directions, such as front, back, right, or left. Instead they use absolute directions, saying,
for example, the equivalent of “I have a pain in my north arm.” This difference in language
makes a difference in behavior: Guugu Yimithirr speakers are better at navigating in open
terrain, while English speakers are better at placing the fork to the right of the plate.
Language also seems to influence thought through seemingly arbitrary grammatical
features such as the gender of nouns. For example, “bridge” is masculine in Spanish and
feminine in German. Boroditsky (2003) asked subjects to choose English adjectives to de-
scribe a photograph of a particular bridge. Spanish speakers chose big, dangerous, strong,
and towering, whereas German speakers chose beautiful, elegant, fragile, and slender. Words
can serve as anchor points that affect how we perceive the world. Loftus and Palmer (1974)
showed experimental subjects a movie of an auto accident. Subjects who were asked “How
fast were the cars going when they contacted each other?” reported an average of 32 mph,
while subjects who were asked the question with the word “smashed” instead of “contacted”
reported 41 mph for the same cars in the same movie.
In a first-order logic reasoning system that uses CNF, we can see that the linguistic forms
“¬(A ∨ B)” and “¬A ∧ ¬B” are the same because we can look inside the system and see
that the two sentences are stored as the same canonical CNF form. Can we do that with the
human brain? Until recently the answer was “no,” but now it is “maybe.” Mitchell et al.
(2008) put subjects in an fMRI (functional magnetic resonance imaging) machine, showed
them words such as “celery,” and imaged their brains. The researchers were then able to train
a computer program to predict, from a brain image, what word the subject had been presented
with. Given two choices (e.g., “celery” or “airplane”), the system predicts correctly 77% of
the time. The system can even predict at above-chance levels for words it has never seen
an fMRI image of before (by considering the images of related words) and for people it has
never seen before (proving that fMRI reveals some level of common representation across
people). This type of work is still in its infancy, but fMRI (and other imaging technology
such as intracranial electrophysiology (Sahin et al., 2009)) promises to give us much more
concrete ideas of what human knowledge representations are like.
From the viewpoint of formal logic, representing the same knowledge in two different
ways makes absolutely no difference; the same facts will be derivable from either represen-
tation. In practice, however, one representation might require fewer steps to derive a conclu-
sion, meaning that a reasoner with limited resources could get to the conclusion using one
representation but not the other. For nondeductive tasks such as learning from experience,
outcomes are necessarily dependent on the form of the representations used. We show in
Chapter 18 that when a learning program considers two possible theories of the world, both
of which are consistent with all the data, the most common way of breaking the tie is to choose
the most succinct theory—and that depends on the language used to represent theories. Thus,
the influence of language on thought is unavoidable for any agent that does learning.
8.1.2 Combining the best of formal and natural languages
We can adopt the foundation of propositional logic—a declarative, compositional semantics
that is context-independent and unambiguous—and build a more expressive logic on that
foundation, borrowing representational ideas from natural language while avoiding its draw-
backs. When we look at the syntax of natural language, the most obvious elements are nouns
and noun phrases that refer to objects (squares, pits, wumpuses) and verbs and verb phrases
that refer to relations among objects (is breezy, is adjacent to, shoots). Some of these rela-
tions are functions—relations in which there is only one “value” for a given “input.” It is
easy to start listing examples of objects, relations, and functions:
• Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games,
wars, centuries . . .
• Relations: these can be unary relations or properties such as red, round, bogus, prime,
multistoried . . ., or more general n-ary relations such as brother of, bigger than, inside,
part of, has color, occurred after, owns, comes between, . . .
• Functions: father of, best friend, third inning of, one more than, beginning of . . .
Indeed, almost any assertion can be thought of as referring to objects and properties or rela-
tions. Some examples follow:
• “One plus two equals three.”
Objects: one, two, three, one plus two; Relation: equals; Function: plus. (“One plus
two” is a name for the object that is obtained by applying the function “plus” to the
objects “one” and “two.” “Three” is another name for this object.)
• “Squares neighboring the wumpus are smelly.”
Objects: wumpus, squares; Property: smelly; Relation: neighboring.
• “Evil King John ruled England in 1200.”
Objects: John, England, 1200; Relation: ruled; Properties: evil, king.
The language of first-order logic, whose syntax and semantics we define in the next section,
is built around objects and relations. It has been so important to mathematics, philosophy, and
artificial intelligence precisely because those fields—and indeed, much of everyday human
existence—can be usefully thought of as dealing with objects and the relations among them.
First-order logic can also express facts about some or all of the objects in the universe. This
enables one to represent general laws or rules, such as the statement “Squares neighboring
the wumpus are smelly.”
The primary difference between propositional and first-order logic lies in the ontologi-
cal commitment made by each language—that is, what it assumes about the nature of reality.
Mathematically, this commitment is expressed through the nature of the formal models with
respect to which the truth of sentences is defined. For example, propositional logic assumes
that there are facts that either hold or do not hold in the world. Each fact can be in one
of two states: true or false, and each model assigns true or false to each proposition sym-
bol (see Section 7.4.2).2 First-order logic assumes more; namely, that the world consists of
objects with certain relations among them that do or do not hold. The formal models are
correspondingly more complicated than those for propositional logic. Special-purpose logics
make still further ontological commitments; for example, temporal logic assumes that facts
hold at particular times and that those times (which may be points or intervals) are ordered.
Thus, special-purpose logics give certain kinds of objects (and the axioms about them) “first
class” status within the logic, rather than simply defining them within the knowledge base.
Higher-order logic views the relations and functions referred to by first-order logic as ob-
jects in themselves. This allows one to make assertions about all relations—for example, one
could wish to define what it means for a relation to be transitive. Unlike most special-purpose
logics, higher-order logic is strictly more expressive than first-order logic, in the sense that
some sentences of higher-order logic cannot be expressed by any finite number of first-order
logic sentences.
A logic can also be characterized by its epistemological commitments—the possible
states of knowledge that it allows with respect to each fact. In both propositional and first-
order logic, a sentence represents a fact and the agent either believes the sentence to be true,
believes it to be false, or has no opinion. These logics therefore have three possible states
of knowledge regarding any sentence. Systems using probability theory, on the other hand,
2 In contrast, facts in fuzzy logic have a degree of truth between 0 and 1. For example, the sentence “Vienna is
a large city” might be true in our world only to degree 0.6 in fuzzy logic.
can have any degree of belief, ranging from 0 (total disbelief) to 1 (total belief).3 For ex-
ample, a probabilistic wumpus-world agent might believe that the wumpus is in [1,3] with
probability 0.75. The ontological and epistemological commitments of five different logics
are summarized in Figure 8.1.
Language               Ontological Commitment                 Epistemological Commitment
                       (What exists in the world)             (What an agent believes about facts)
Propositional logic    facts                                  true/false/unknown
First-order logic      facts, objects, relations              true/false/unknown
Temporal logic         facts, objects, relations, times       true/false/unknown
Probability theory     facts                                  degree of belief ∈ [0, 1]
Fuzzy logic            facts with degree of truth ∈ [0, 1]    known interval value
Figure 8.1 Formal languages and their ontological and epistemological commitments.
In the next section, we will launch into the details of first-order logic. Just as a student of
physics requires some familiarity with mathematics, a student of AI must develop a talent for
working with logical notation. On the other hand, it is also important not to get too concerned
with the specifics of logical notation—after all, there are dozens of different versions. The
main things to keep hold of are how the language facilitates concise representations and how
its semantics leads to sound reasoning procedures.
8.2 SYNTAX AND SEMANTICS OF FIRST-ORDER LOGIC
We begin this section by specifying more precisely the way in which the possible worlds
of first-order logic reflect the ontological commitment to objects and relations. Then we
introduce the various elements of the language, explaining their semantics as we go along.
8.2.1 Models for first-order logic
Recall from Chapter 7 that the models of a logical language are the formal structures that
constitute the possible worlds under consideration. Each model links the vocabulary of the
logical sentences to elements of the possible world, so that the truth of any sentence can
be determined. Thus, models for propositional logic link proposition symbols to predefined
truth values. Models for first-order logic are much more interesting. First, they have objects
in them! The domain of a model is the set of objects or domain elements it contains. The
domain is required to be nonempty—every possible world must contain at least one object. (See
Exercise 8.7 for a discussion of empty worlds.) Mathematically speaking, it doesn’t matter
what these objects are—all that matters is how many there are in each particular model—but
for pedagogical purposes we’ll use a concrete example. Figure 8.2 shows a model with five
3 It is important not to confuse the degree of belief in probability theory with the degree of truth in fuzzy logic.
Indeed, some fuzzy systems allow uncertainty (degree of belief) about degrees of truth.
objects: Richard the Lionheart, King of England from 1189 to 1199; his younger brother, the
evil King John, who ruled from 1199 to 1215; the left legs of Richard and John; and a crown.
The objects in the model may be related in various ways. In the figure, Richard and
John are brothers. Formally speaking, a relation is just the set of tuples of objects that are
related. (A tuple is a collection of objects arranged in a fixed order and is written with angle
brackets surrounding the objects.) Thus, the brotherhood relation in this model is the set
{ ⟨Richard the Lionheart, King John⟩, ⟨King John, Richard the Lionheart⟩ } .   (8.1)
(Here we have named the objects in English, but you may, if you wish, mentally substitute the
pictures for the names.) The crown is on King John’s head, so the “on head” relation contains
just one tuple, ⟨the crown, King John⟩. The “brother” and “on head” relations are binary
relations—that is, they relate pairs of objects. The model also contains unary relations, or
properties: the “person” property is true of both Richard and John; the “king” property is true
only of John (presumably because Richard is dead at this point); and the “crown” property is
true only of the crown.
Certain kinds of relationships are best considered as functions, in that a given object
must be related to exactly one object in this way. For example, each person has one left leg,
so the model has a unary “left leg” function that includes the following mappings:
⟨Richard the Lionheart⟩ → Richard’s left leg
⟨King John⟩ → John’s left leg .   (8.2)
Strictly speaking, models in first-order logic require total functions, that is, there must be a
value for every input tuple. Thus, the crown must have a left leg and so must each of the left
legs. There is a technical solution to this awkward problem involving an additional “invisible”
Figure 8.2 A model containing five objects, two binary relations, three unary relations
(indicated by labels on the objects), and one unary function, left-leg.
object that is the left leg of everything that has no left leg, including itself. Fortunately, as
long as one makes no assertions about the left legs of things that have no left legs, these
technicalities are of no import.
So far, we have described the elements that populate models for first-order logic. The
other essential part of a model is the link between those elements and the vocabulary of the
logical sentences, which we explain next.
8.2.2 Symbols and interpretations
We turn now to the syntax of first-order logic. The impatient reader can obtain a complete
description from the formal grammar in Figure 8.3.
The basic syntactic elements of first-order logic are the symbols that stand for objects,
relations, and functions. The symbols, therefore, come in three kinds: constant symbols,
which stand for objects; predicate symbols, which stand for relations; and function sym-
bols, which stand for functions. We adopt the convention that these symbols will begin with
uppercase letters. For example, we might use the constant symbols Richard and John; the
predicate symbols Brother, OnHead, Person, King, and Crown; and the function symbol
LeftLeg. As with proposition symbols, the choice of names is entirely up to the user. Each
predicate and function symbol comes with an arity that fixes the number of arguments.
As in propositional logic, every model must provide the information required to deter-
mine if any given sentence is true or false. Thus, in addition to its objects, relations, and
functions, each model includes an interpretation that specifies exactly which objects, rela-
tions and functions are referred to by the constant, predicate, and function symbols. One
possible interpretation for our example—which a logician would call the intended interpre-
tation—is as follows:
• Richard refers to Richard the Lionheart and John refers to the evil King John.
• Brother refers to the brotherhood relation, that is, the set of tuples of objects given in
Equation (8.1); OnHead refers to the “on head” relation that holds between the crown
and King John; Person, King, and Crown refer to the sets of objects that are persons,
kings, and crowns.
• LeftLeg refers to the “left leg” function, that is, the mapping given in Equation (8.2).
There are many other possible interpretations, of course. For example, one interpretation
maps Richard to the crown and John to King John’s left leg. There are five objects in
the model, so there are 25 possible interpretations just for the constant symbols Richard
and John. Notice that not all the objects need have a name—for example, the intended
interpretation does not name the crown or the legs. It is also possible for an object to have
several names; there is an interpretation under which both Richard and John refer to the
crown.4 If you find this possibility confusing, remember that, in propositional logic, it is
perfectly possible to have a model in which Cloudy and Sunny are both true; it is the job of
the knowledge base to rule out models that are inconsistent with our knowledge.
4 Later, in Section 8.2.8, we examine a semantics in which every object has exactly one name.
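To make the counting concrete, the following minimal Python sketch (with illustrative object names standing in for the five objects of Figure 8.2) enumerates the possible interpretations of the two constant symbols:
import itertools

# Illustrative names for the five domain objects of Figure 8.2.
domain = ["Richard the Lionheart", "King John",
          "Richard's left leg", "John's left leg", "the crown"]
constants = ["Richard", "John"]                     # the two constant symbols
# Restricted to the constants, an interpretation assigns each symbol some object.
interpretations = list(itertools.product(domain, repeat=len(constants)))
print(len(interpretations))                         # 5 * 5 = 25
print(interpretations[0])                           # e.g., both symbols could name Richard the Lionheart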
Sentence → AtomicSentence | ComplexSentence
AtomicSentence → Predicate | Predicate(Term, . . .) | Term = Term
ComplexSentence → ( Sentence ) | [ Sentence ]
| ¬ Sentence
| Sentence ∧ Sentence
| Sentence ∨ Sentence
| Sentence ⇒ Sentence
| Sentence ⇔ Sentence
| Quantifier Variable, . . . Sentence
Term → Function(Term, . . .)
| Constant
| Variable
Quantifier → ∀ | ∃
Constant → A | X1 | John | · · ·
Variable → a | x | s | · · ·
Predicate → True | False | After | Loves | Raining | · · ·
Function → Mother | LeftLeg | · · ·
OPERATOR PRECEDENCE : ¬, =, ∧, ∨, ⇒, ⇔
Figure 8.3 The syntax of first-order logic with equality, specified in Backus–Naur form
(see page 1066 if you are not familiar with this notation). Operator precedences are specified,
from highest to lowest. The precedence of quantifiers is such that a quantifier holds over
everything to the right of it.
Figure 8.4 Some members of the set of all models for a language with two constant sym-
bols, R and J, and one binary relation symbol. The interpretation of each constant symbol is
shown by a gray arrow. Within each model, the related objects are connected by arrows.
In summary, a model in first-order logic consists of a set of objects and an interpretation
that maps constant symbols to objects, predicate symbols to relations on those objects, and
function symbols to functions on those objects. Just as with propositional logic, entailment,
validity, and so on are defined in terms of all possible models. To get an idea of what the
set of all possible models looks like, see Figure 8.4. It shows that models vary in how many
objects they contain—from one up to infinity—and in the way the constant symbols map
to objects. If there are two constant symbols and one object, then both symbols must refer
to the same object; but this can still happen even with more objects. When there are more
objects than constant symbols, some of the objects will have no names. Because the number
of possible models is unbounded, checking entailment by the enumeration of all possible
models is not feasible for first-order logic (unlike propositional logic). Even if the number of
objects is restricted, the number of combinations can be very large. (See Exercise 8.5.) For
the example in Figure 8.4, there are 137,506,194,466 models with six or fewer objects.
8.2.3 Terms
A term is a logical expression that refers to an object. Constant symbols are therefore terms,
but it is not always convenient to have a distinct symbol to name every object. For example,
in English we might use the expression “King John’s left leg” rather than giving a name
to his leg. This is what function symbols are for: instead of using a constant symbol, we
use LeftLeg(John). In the general case, a complex term is formed by a function symbol
followed by a parenthesized list of terms as arguments to the function symbol. It is important
to remember that a complex term is just a complicated kind of name. It is not a “subroutine
call” that “returns a value.” There is no LeftLeg subroutine that takes a person as input and
returns a leg. We can reason about left legs (e.g., stating the general rule that everyone has one
and then deducing that John must have one) without ever providing a definition of LeftLeg.
This is something that cannot be done with subroutines in programming languages.5
The formal semantics of terms is straightforward. Consider a term f(t1, . . . , tn). The
function symbol f refers to some function in the model (call it F); the argument terms refer
to objects in the domain (call them d1, . . . , dn); and the term as a whole refers to the object
that is the value of the function F applied to d1, . . . , dn. For example, suppose the LeftLeg
function symbol refers to the function shown in Equation (8.2) and John refers to King John,
then LeftLeg(John) refers to King John’s left leg. In this way, the interpretation fixes the
referent of every term.
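A minimal Python sketch of this evaluation, assuming constant symbols are represented as strings and complex terms as tuples of the form (function symbol, argument terms), might look as follows; the two dictionaries play the role of the interpretation, and the names follow Figure 8.2:
# A sketch of term evaluation under a given interpretation.
constants = {"Richard": "Richard the Lionheart", "John": "King John"}
functions = {"LeftLeg": {("Richard the Lionheart",): "Richard's left leg",
                         ("King John",): "John's left leg"}}

def referent(term):
    """Return the domain object that a term refers to under this interpretation."""
    if isinstance(term, str):                     # a constant symbol
        return constants[term]
    f, *args = term                               # a complex term f(t1, ..., tn)
    return functions[f][tuple(referent(a) for a in args)]

print(referent(("LeftLeg", "John")))              # -> John's left leg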
8.2.4 Atomic sentences
Now that we have both terms for referring to objects and predicate symbols for referring to
relations, we can put them together to make atomic sentences that state facts. An atomic
5 λ-expressions provide a useful notation in which new function symbols are constructed “on the fly.” For
example, the function that squares its argument can be written as (λx x × x) and can be applied to arguments
just like any other function symbol. A λ-expression can also be defined and used as a predicate symbol. (See
Chapter 22.) The lambda operator in Lisp plays exactly the same role. Notice that the use of λ in this way does
not increase the formal expressive power of first-order logic, because any sentence that includes a λ-expression
can be rewritten by “plugging in” its arguments to yield an equivalent sentence.
sentence (or atom for short) is formed from a predicate symbol optionally followed by a
parenthesized list of terms, such as
Brother(Richard, John).
This states, under the intended interpretation given earlier, that Richard the Lionheart is the
brother of King John.6 Atomic sentences can have complex terms as arguments. Thus,
Married(Father(Richard), Mother(John))
states that Richard the Lionheart’s father is married to King John’s mother (again, under a
suitable interpretation).
An atomic sentence is true in a given model if the relation referred to by the predicate
symbol holds among the objects referred to by the arguments.
8.2.5 Complex sentences
We can use logical connectives to construct more complex sentences, with the same syntax
and semantics as in propositional calculus. Here are four sentences that are true in the model
of Figure 8.2 under our intended interpretation:
¬Brother(LeftLeg(Richard), John)
Brother(Richard, John) ∧ Brother(John, Richard)
King(Richard) ∨ King(John)
¬King(Richard) ⇒ King(John) .
8.2.6 Quantifiers
Once we have a logic that allows objects, it is only natural to want to express properties of
entire collections of objects, instead of enumerating the objects by name. Quantifiers let us
do this. First-order logic contains two standard quantifiers, called universal and existential.
Universal quantification (∀)
Recall the difficulty we had in Chapter 7 with the expression of general rules in proposi-
tional logic. Rules such as “Squares neighboring the wumpus are smelly” and “All kings
are persons” are the bread and butter of first-order logic. We deal with the first of these in
Section 8.3. The second rule, “All kings are persons,” is written in first-order logic as
∀ x King(x) ⇒ Person(x) .
∀ is usually pronounced “For all . . .”. (Remember that the upside-down A stands for “all.”)
Thus, the sentence says, “For all x, if x is a king, then x is a person.” The symbol x is called
a variable. By convention, variables are lowercase letters. A variable is a term all by itself,
and as such can also serve as the argument of a function—for example, LeftLeg(x). A term
with no variables is called a ground term.
Intuitively, the sentence ∀ x P, where P is any logical expression, says that P is true
for every object x. More precisely, ∀ x P is true in a given model if P is true in all possible
extended interpretations constructed from the interpretation given in the model, where each
6 We usually follow the argument-ordering convention that P(x, y) is read as “x is a P of y.”
extended interpretation specifies a domain element to which x refers.
This sounds complicated, but it is really just a careful way of stating the intuitive mean-
ing of universal quantification. Consider the model shown in Figure 8.2 and the intended
interpretation that goes with it. We can extend the interpretation in five ways:
x → Richard the Lionheart,
x → King John,
x → Richard’s left leg,
x → John’s left leg,
x → the crown.
The universally quantified sentence ∀ x King(x) ⇒ Person(x) is true in the original model
if the sentence King(x) ⇒ Person(x) is true under each of the five extended interpreta-
tions. That is, the universally quantified sentence is equivalent to asserting the following five
sentences:
Richard the Lionheart is a king ⇒ Richard the Lionheart is a person.
King John is a king ⇒ King John is a person.
Richard’s left leg is a king ⇒ Richard’s left leg is a person.
John’s left leg is a king ⇒ John’s left leg is a person.
The crown is a king ⇒ the crown is a person.
Let us look carefully at this set of assertions. Since, in our model, King John is the only
king, the second sentence asserts that he is a person, as we would hope. But what about
the other four sentences, which appear to make claims about legs and crowns? Is that part
of the meaning of “All kings are persons”? In fact, the other four assertions are true in the
model, but make no claim whatsoever about the personhood qualifications of legs, crowns,
or indeed Richard. This is because none of these objects is a king. Looking at the truth table
for ⇒ (Figure 7.8 on page 246), we see that the implication is true whenever its premise is
false—regardless of the truth of the conclusion. Thus, by asserting the universally quantified
sentence, which is equivalent to asserting a whole list of individual implications, we end
up asserting the conclusion of the rule just for those objects for whom the premise is true
and saying nothing at all about those individuals for whom the premise is false. Thus, the
truth-table definition of ⇒ turns out to be perfect for writing general rules with universal
quantifiers.
A common mistake, made frequently even by diligent readers who have read this para-
graph several times, is to use conjunction instead of implication. The sentence
∀ x King(x) ∧ Person(x)
would be equivalent to asserting
Richard the Lionheart is a king ∧ Richard the Lionheart is a person,
King John is a king ∧ King John is a person,
Richard’s left leg is a king ∧ Richard’s left leg is a person,
and so on. Obviously, this does not capture what we want.
Existential quantification (∃)
Universal quantification makes statements about every object. Similarly, we can make a state-
ment about some object in the universe without naming it, by using an existential quantifier.
To say, for example, that King John has a crown on his head, we write
∃ x Crown(x) ∧ OnHead(x, John) .
∃x is pronounced “There exists an x such that . . .” or “For some x . . .”.
Intuitively, the sentence ∃ x P says that P is true for at least one object x. More
precisely, ∃ x P is true in a given model if P is true in at least one extended interpretation
that assigns x to a domain element. That is, at least one of the following is true:
Richard the Lionheart is a crown ∧ Richard the Lionheart is on John’s head;
King John is a crown ∧ King John is on John’s head;
Richard’s left leg is a crown ∧ Richard’s left leg is on John’s head;
John’s left leg is a crown ∧ John’s left leg is on John’s head;
The crown is a crown ∧ the crown is on John’s head.
The fifth assertion is true in the model, so the original existentially quantified sentence is
true in the model. Notice that, by our definition, the sentence would also be true in a model
in which King John was wearing two crowns. This is entirely consistent with the original
sentence “King John has a crown on his head.” 7
Just as ⇒ appears to be the natural connective to use with ∀, ∧ is the natural connective
to use with ∃. Using ∧ as the main connective with ∀ led to an overly strong statement in
the example in the previous section; using ⇒ with ∃ usually leads to a very weak statement,
indeed. Consider the following sentence:
∃ x Crown(x) ⇒ OnHead(x, John) .
On the surface, this might look like a reasonable rendition of our sentence. Applying the
semantics, we see that the sentence says that at least one of the following assertions is true:
Richard the Lionheart is a crown ⇒ Richard the Lionheart is on John’s head;
King John is a crown ⇒ King John is on John’s head;
Richard’s left leg is a crown ⇒ Richard’s left leg is on John’s head;
and so on. Now an implication is true if both premise and conclusion are true, or if its premise
is false. So if Richard the Lionheart is not a crown, then the first assertion is true and the
existential is satisfied. So, an existentially quantified implication sentence is true whenever
any object fails to satisfy the premise; hence such sentences really do not say much at all.
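For a finite model, these observations can be checked by brute force: evaluate the quantified sentence under every extended interpretation. The following Python sketch does exactly that for the four sentences discussed above, with illustrative object and relation names based on Figure 8.2:
# Brute-force model checking of quantified sentences over a five-object model.
domain = ["Richard", "John", "Richard's leg", "John's leg", "crown"]
King, Person, Crown = {"John"}, {"Richard", "John"}, {"crown"}
OnHead = {("crown", "John")}                     # the crown is on King John's head

def forall(p): return all(p(x) for x in domain)  # try every extended interpretation for x
def exists(p): return any(p(x) for x in domain)

print(forall(lambda x: x not in King or x in Person))             # ∀x King(x) ⇒ Person(x): True
print(forall(lambda x: x in King and x in Person))                # ∀x King(x) ∧ Person(x): False
print(exists(lambda x: x in Crown and (x, "John") in OnHead))     # ∃x Crown(x) ∧ OnHead(x, John): True
print(exists(lambda x: x not in Crown or (x, "John") in OnHead))  # ∃x Crown(x) ⇒ OnHead(x, John): True, but only
                                                                  # because, e.g., Richard fails the premise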
Nested quantifiers
We will often want to express more complex sentences using multiple quantifiers. The sim-
plest case is where the quantifiers are of the same type. For example, “Brothers are siblings”
can be written as
∀ x ∀ y Brother(x, y) ⇒ Sibling(x, y) .
7 There is a variant of the existential quantifier, usually written ∃¹ or ∃!, that means “There exists exactly one.”
The same meaning can be expressed using equality statements.
Consecutive quantifiers of the same type can be written as one quantifier with several vari-
ables. For example, to say that siblinghood is a symmetric relationship, we can write
∀ x, y Sibling(x, y) ⇔ Sibling(y, x) .
In other cases we will have mixtures. “Everybody loves somebody” means that for every
person, there is someone that person loves:
∀ x ∃ y Loves(x, y) .
On the other hand, to say “There is someone who is loved by everyone,” we write
∃ y ∀ x Loves(x, y) .
The order of quantification is therefore very important. It becomes clearer if we insert paren-
theses. ∀ x (∃ y Loves(x, y)) says that everyone has a particular property, namely, the prop-
erty that they love someone. On the other hand, ∃ y (∀ x Loves(x, y)) says that someone in
the world has a particular property, namely the property of being loved by everybody.
Some confusion can arise when two quantifiers are used with the same variable name.
Consider the sentence
∀ x (Crown(x) ∨ (∃ x Brother(Richard, x))) .
Here the x in Brother(Richard, x) is existentially quantified. The rule is that the variable
belongs to the innermost quantifier that mentions it; then it will not be subject to any other
quantification. Another way to think of it is this: ∃ x Brother(Richard, x) is a sentence
about Richard (that he has a brother), not about x; so putting a ∀ x outside it has no effect. It
could equally well have been written ∃ z Brother(Richard, z). Because this can be a source
of confusion, we will always use different variable names with nested quantifiers.
Connections between ∀ and ∃
The two quantifiers are actually intimately connected with each other, through negation. As-
serting that everyone dislikes parsnips is the same as asserting there does not exist someone
who likes them, and vice versa:
∀ x ¬Likes(x, Parsnips) is equivalent to ¬∃ x Likes(x, Parsnips) .
We can go one step further: “Everyone likes ice cream” means that there is no one who does
not like ice cream:
∀ x Likes(x, IceCream) is equivalent to ¬∃ x ¬Likes(x, IceCream) .
Because ∀ is really a conjunction over the universe of objects and ∃ is a disjunction, it should
not be surprising that they obey De Morgan’s rules. The De Morgan rules for quantified and
unquantified sentences are as follows:
∀ x ¬P ≡ ¬∃ x P ¬(P ∨ Q) ≡ ¬P ∧ ¬Q
¬∀ x P ≡ ∃ x ¬P ¬(P ∧ Q) ≡ ¬P ∨ ¬Q
∀ x P ≡ ¬∃ x ¬P P ∧ Q ≡ ¬(¬P ∨ ¬Q)
∃ x P ≡ ¬∀ x ¬P P ∨ Q ≡ ¬(¬P ∧ ¬Q) .
Thus, we do not really need both ∀ and ∃, just as we do not really need both ∧ and ∨. Still,
readability is more important than parsimony, so we will keep both of the quantifiers.
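Over a finite domain, the quantifier versions of De Morgan’s rules can be confirmed mechanically by trying every possible unary relation, as in this small Python sketch:
from itertools import chain, combinations

domain = [1, 2, 3]
# Every possible unary relation P over the domain (i.e., every subset of it).
all_relations = chain.from_iterable(combinations(domain, r) for r in range(len(domain) + 1))
for P in map(set, all_relations):
    forall_not = all(x not in P for x in domain)      # forall x, not P(x)
    not_exists = not any(x in P for x in domain)      # not exists x, P(x)
    assert forall_not == not_exists
print("forall-not and not-exists agree for all 8 unary relations on a 3-element domain")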
8.2.7 Equality
First-order logic includes one more way to make atomic sentences, other than using a predi-
cate and terms as described earlier. We can use the equality symbol to signify that two terms
refer to the same object. For example,
Father(John) = Henry
says that the object referred to by Father(John) and the object referred to by Henry are the
same. Because an interpretation fixes the referent of any term, determining the truth of an
equality sentence is simply a matter of seeing that the referents of the two terms are the same
object.
The equality symbol can be used to state facts about a given function, as we just did for
the Father symbol. It can also be used with negation to insist that two terms are not the same
object. To say that Richard has at least two brothers, we would write
∃ x, y Brother(x, Richard) ∧ Brother(y, Richard) ∧ ¬(x = y) .
The sentence
∃ x, y Brother(x, Richard) ∧ Brother(y, Richard)
does not have the intended meaning. In particular, it is true in the model of Figure 8.2, where
Richard has only one brother. To see this, consider the extended interpretation in which both
x and y are assigned to King John. The addition of ¬(x = y) rules out such models. The
notation x ≠ y is sometimes used as an abbreviation for ¬(x = y).
8.2.8 An alternative semantics?
Continuing the example from the previous section, suppose that we believe that Richard has
two brothers, John and Geoffrey.8 Can we capture this state of affairs by asserting
Brother(John, Richard) ∧ Brother(Geoffrey, Richard) ? (8.3)
Not quite. First, this assertion is true in a model where Richard has only one brother—
we need to add John ≠ Geoffrey. Second, the sentence doesn’t rule out models in which
Richard has many more brothers besides John and Geoffrey. Thus, the correct translation of
“Richard’s brothers are John and Geoffrey” is as follows:
Brother(John, Richard) ∧ Brother(Geoffrey, Richard) ∧ John ≠ Geoffrey
∧ ∀ x Brother(x, Richard) ⇒ (x = John ∨ x = Geoffrey) .
For many purposes, this seems much more cumbersome than the corresponding natural-
language expression. As a consequence, humans may make mistakes in translating their
knowledge into first-order logic, resulting in unintuitive behaviors from logical reasoning
systems that use the knowledge. Can we devise a semantics that allows a more straightfor-
ward logical expression?
One proposal that is very popular in database systems works as follows. First, we insist
that every constant symbol refer to a distinct object—the so-called unique-names assump-
tion. Second, we assume that atomic sentences not known to be true are in fact false—the
closed-world assumption. Finally, we invoke domain closure, meaning that each model
8 Actually he had four, the others being William and Henry.
Figure 8.5 Some members of the set of all models for a language with two constant sym-
bols, R and J, and one binary relation symbol, under database semantics. The interpretation
of the constant symbols is fixed, and there is a distinct object for each constant symbol.
contains no more domain elements than those named by the constant symbols. Under the
resulting semantics, which we call database semantics to distinguish it from the standard
semantics of first-order logic, the sentence Equation (8.3) does indeed state that Richard’s
two brothers are John and Geoffrey. Database semantics is also used in logic programming
systems, as explained in Section 9.4.5.
It is instructive to consider the set of all possible models under database semantics for
the same case as shown in Figure 8.4. Figure 8.5 shows some of the models, ranging from
the model with no tuples satisfying the relation to the model with all tuples satisfying the
relation. With two objects, there are four possible two-element tuples, so there are 2⁴ = 16
different subsets of tuples that can satisfy the relation. Thus, there are 16 possible models in
all—a lot fewer than the infinitely many models for the standard first-order semantics. On the
other hand, the database semantics requires definite knowledge of what the world contains.
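The count of 16 can be reproduced directly: under database semantics, the only remaining choice is which of the four ordered pairs satisfy the relation, as this short Python sketch shows (with R and J as the two objects):
from itertools import chain, combinations, product

objects = ["R", "J"]                          # unique names: R and J denote distinct objects
pairs = list(product(objects, repeat=2))      # the four possible ordered pairs
# Under database semantics, a model is just a choice of which pairs satisfy the relation.
models = list(chain.from_iterable(combinations(pairs, r) for r in range(len(pairs) + 1)))
print(len(models))                            # 2**4 = 16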
This example brings up an important point: there is no one “correct” semantics for
logic. The usefulness of any proposed semantics depends on how concise and intuitive it
makes the expression of the kinds of knowledge we want to write down, and on how easy
and natural it is to develop the corresponding rules of inference. Database semantics is most
useful when we are certain about the identity of all the objects described in the knowledge
base and when we have all the facts at hand; in other cases, it is quite awkward. For the rest
of this chapter, we assume the standard semantics while noting instances in which this choice
leads to cumbersome expressions.
8.3 USING FIRST-ORDER LOGIC
Now that we have defined an expressive logical language, it is time to learn how to use it. The
best way to do this is through examples. We have seen some simple sentences illustrating the
various aspects of logical syntax; in this section, we provide more systematic representations
of some simple domains. In knowledge representation, a domain is just some part of the
world about which we wish to express some knowledge.
We begin with a brief description of the TELL/ASK interface for first-order knowledge
bases. Then we look at the domains of family relationships, numbers, sets, and lists, and at
the wumpus world. The next section contains a more substantial example (electronic circuits)
and Chapter 12 covers everything in the universe.
8.3.1 Assertions and queries in first-order logic
Sentences are added to a knowledge base using TELL, exactly as in propositional logic. Such
sentences are called assertions. For example, we can assert that John is a king, Richard is a
person, and all kings are persons:
TELL(KB, King(John)) .
TELL(KB, Person(Richard)) .
TELL(KB, ∀ x King(x) ⇒ Person(x)) .
We can ask questions of the knowledge base using ASK. For example,
ASK(KB, King(John))
returns true. Questions asked with ASK are called queries or goals. Generally speaking, any
query that is logically entailed by the knowledge base should be answered affirmatively. For
example, given the two preceding assertions, the query
ASK(KB, Person(John))
should also return true. We can ask quantified queries, such as
ASK(KB, ∃ x Person(x)) .
The answer is true, but this is perhaps not as helpful as we would like. It is rather like
answering “Can you tell me the time?” with “Yes.” If we want to know what value of x
makes the sentence true, we will need a different function, ASKVARS, which we call with
ASKVARS(KB, Person(x))
and which yields a stream of answers. In this case there will be two answers: {x/John} and
{x/Richard}. Such an answer is called a substitution or binding list. ASKVARS is usually
reserved for knowledge bases consisting solely of Horn clauses, because in such knowledge
bases every way of making the query true will bind the variables to specific values. That is
not the case with first-order logic; if KB has been told King(John) ∨ King(Richard), then
there is no binding to x for the query ∃ x King(x), even though the query is true.
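The following toy Python sketch shows only the shape of this interface: the knowledge base holds nothing but ground atomic facts (represented as tuples, with variables written as strings beginning with “?”) and performs no inference, whereas a real ASK answers any query entailed by the knowledge base:
class KB:
    def __init__(self):
        self.facts = []                # ground atomic facts such as ("King", "John")

def TELL(kb, fact):
    kb.facts.append(fact)

def ASK(kb, query):                    # lookup only; no deduction in this sketch
    return query in kb.facts

def ASKVARS(kb, query):                # yield one substitution per matching fact
    for fact in kb.facts:
        if len(fact) == len(query) and all(
                q == f or q.startswith("?") for q, f in zip(query, fact)):
            yield {q: f for q, f in zip(query, fact) if q.startswith("?")}

kb = KB()
TELL(kb, ("King", "John"))
TELL(kb, ("Person", "Richard"))
TELL(kb, ("Person", "John"))
print(ASK(kb, ("King", "John")))                 # True
print(list(ASKVARS(kb, ("Person", "?x"))))       # [{'?x': 'Richard'}, {'?x': 'John'}]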
8.3.2 The kinship domain
The first example we consider is the domain of family relationships, or kinship. This domain
includes facts such as “Elizabeth is the mother of Charles” and “Charles is the father of
William” and rules such as “One’s grandmother is the mother of one’s parent.”
Clearly, the objects in our domain are people. We have two unary predicates, Male and
Female. Kinship relations—parenthood, brotherhood, marriage, and so on—are represented
by binary predicates: Parent, Sibling, Brother, Sister, Child, Daughter, Son, Spouse,
Wife, Husband, Grandparent, Grandchild, Cousin, Aunt, and Uncle. We use functions
for Mother and Father, because every person has exactly one of each of these (at least
according to nature’s design).
We can go through each function and predicate, writing down what we know in terms
of the other symbols. For example, one’s mother is one’s female parent:
∀ m, c Mother(c) = m ⇔ Female(m) ∧ Parent(m, c) .
One’s husband is one’s male spouse:
∀ w, h Husband(h, w) ⇔ Male(h) ∧ Spouse(h, w) .
Male and female are disjoint categories:
∀ x Male(x) ⇔ ¬Female(x) .
Parent and child are inverse relations:
∀ p, c Parent(p, c) ⇔ Child(c, p) .
A grandparent is a parent of one’s parent:
∀ g, c Grandparent(g, c) ⇔ ∃ p Parent(g, p) ∧ Parent(p, c) .
A sibling is another child of one’s parents:
∀ x, y Sibling(x, y) ⇔ x ≠ y ∧ ∃ p Parent(p, x) ∧ Parent(p, y) .
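Over a small, fully specified family, definitions like these translate directly into executable predicates. A Python sketch (the base facts are made up for illustration):
# Kinship predicates defined over a tiny set of Parent facts.
parents = {("Elizabeth", "Charles"), ("Philip", "Charles"),
           ("Charles", "William"), ("Charles", "Harry")}
people = {p for pair in parents for p in pair}

def parent(p, c):      return (p, c) in parents
def grandparent(g, c): return any(parent(g, p) and parent(p, c) for p in people)
def sibling(x, y):     return x != y and any(parent(p, x) and parent(p, y) for p in people)

print(grandparent("Elizabeth", "William"))   # True
print(sibling("William", "Harry"))           # True
print(sibling("William", "William"))         # False: the x != y condition matters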
We could go on for several more pages like this, and Exercise 8.15 asks you to do just that.
Each of these sentences can be viewed as an axiom of the kinship domain, as explained
in Section 7.1. Axioms are commonly associated with purely mathematical domains—we
will see some axioms for numbers shortly—but they are needed in all domains. They provide
the basic factual information from which useful conclusions can be derived. Our kinship
axioms are also definitions; they have the form ∀ x, y P(x, y) ⇔ . . .. The axioms define
the Mother function and the Husband, Male, Parent, Grandparent, and Sibling predicates
in terms of other predicates. Our definitions “bottom out” at a basic set of predicates (Child,
Spouse, and Female) in terms of which the others are ultimately defined. This is a natural
way in which to build up the representation of a domain, and it is analogous to the way in
which software packages are built up by successive definitions of subroutines from primitive
library functions. Notice that there is not necessarily a unique set of primitive predicates;
we could equally well have used Parent, Spouse, and Male. In some domains, as we show,
there is no clearly identifiable basic set.
Not all logical sentences about a domain are axioms. Some are theorems—that is, they
are entailed by the axioms. For example, consider the assertion that siblinghood is symmetric:
∀ x, y Sibling(x, y) ⇔ Sibling(y, x) .
Is this an axiom or a theorem? In fact, it is a theorem that follows logically from the axiom
that defines siblinghood. If we ASK the knowledge base this sentence, it should return true.
From a purely logical point of view, a knowledge base need contain only axioms and
no theorems, because the theorems do not increase the set of conclusions that follow from
the knowledge base. From a practical point of view, theorems are essential to reduce the
computational cost of deriving new sentences. Without them, a reasoning system has to start
from first principles every time, rather like a physicist having to rederive the rules of calculus
for every new problem.
Not all axioms are definitions. Some provide more general information about certain
predicates without constituting a definition. Indeed, some predicates have no complete defi-
nition because we do not know enough to characterize them fully. For example, there is no
obvious definitive way to complete the sentence
∀ x Person(x) ⇔ . . .
Fortunately, first-order logic allows us to make use of the Person predicate without com-
pletely defining it. Instead, we can write partial specifications of properties that every person
has and properties that make something a person:
∀ x Person(x) ⇒ . . .
∀ x . . . ⇒ Person(x) .
Axioms can also be “just plain facts,” such as Male(Jim) and Spouse(Jim, Laura).
Such facts form the descriptions of specific problem instances, enabling specific questions
to be answered. The answers to these questions will then be theorems that follow from
the axioms. Often, one finds that the expected answers are not forthcoming—for example,
from Spouse(Jim, Laura) one expects (under the laws of many countries) to be able to infer
¬Spouse(George, Laura); but this does not follow from the axioms given earlier—even after
we add Jim ≠ George as suggested in Section 8.2.8. This is a sign that an axiom is missing.
Exercise 8.8 asks the reader to supply it.
8.3.3 Numbers, sets, and lists
Numbers are perhaps the most vivid example of how a large theory can be built up from
a tiny kernel of axioms. We describe here the theory of natural numbers or non-negative
integers. We need a predicate NatNum that will be true of natural numbers; we need one
constant symbol, 0; and we need one function symbol, S (successor). The Peano axioms
define natural numbers and addition.9 Natural numbers are defined recursively:
NatNum(0) .
∀ n NatNum(n) ⇒ NatNum(S(n)) .
That is, 0 is a natural number, and for every object n, if n is a natural number, then S(n) is
a natural number. So the natural numbers are 0, S(0), S(S(0)), and so on. (After reading
Section 8.2.8, you will notice that these axioms allow for other natural numbers besides the
usual ones; see Exercise 8.13.) We also need axioms to constrain the successor function:
∀ n 0 ≠ S(n) .
∀ m, n m ≠ n ⇒ S(m) ≠ S(n) .
Now we can define addition in terms of the successor function:
∀ m NatNum(m) ⇒ +(0, m) = m .
∀ m, n NatNum(m) ∧ NatNum(n) ⇒ +(S(m), n) = S(+(m, n)) .
The first of these axioms says that adding 0 to any natural number m gives m itself. Notice
the use of the binary function symbol “+” in the term +(0, m); in ordinary mathematics, the
term would be written 0 + m using infix notation. (The notation we have used for first-order
9 The Peano axioms also include the principle of induction, which is a sentence of second-order logic rather
than of first-order logic. The importance of this distinction is explained in Chapter 9.
logic is called prefix.) To make our sentences about numbers easier to read, we allow the use
of infix notation. We can also write S(n) as n + 1, so the second axiom becomes
∀ m, n NatNum(m) ∧ NatNum(n) ⇒ (m + 1) + n = (m + n) + 1 .
This axiom reduces addition to repeated application of the successor function.
The use of infix notation is an example of syntactic sugar, that is, an extension to or
abbreviation of the standard syntax that does not change the semantics. Any sentence that
uses sugar can be “desugared” to produce an equivalent sentence in ordinary first-order logic.
Once we have addition, it is straightforward to define multiplication as repeated addi-
tion, exponentiation as repeated multiplication, integer division and remainders, prime num-
bers, and so on. Thus, the whole of number theory (including cryptography) can be built up
from one constant, one function, one predicate and four axioms.
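The recursive character of this construction is easy to see in code. The following Python sketch represents 0 as the string "0" and S(n) as the tuple ("S", n), and implements addition exactly as the two axioms dictate:
def S(n):
    return ("S", n)

def plus(m, n):
    if m == "0":                 # +(0, n) = n
        return n
    _, m1 = m                    # otherwise m has the form S(m1)
    return S(plus(m1, n))        # +(S(m1), n) = S(+(m1, n))

two, three = S(S("0")), S(S(S("0")))
print(plus(two, three) == S(S(three)))   # True: 2 + 3 = 5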
The domain of sets is also fundamental to mathematics as well as to commonsense
reasoning. (In fact, it is possible to define number theory in terms of set theory.) We want to
be able to represent individual sets, including the empty set. We need a way to build up sets
by adding an element to a set or taking the union or intersection of two sets. We will want
to know whether an element is a member of a set and we will want to distinguish sets from
objects that are not sets.
We will use the normal vocabulary of set theory as syntactic sugar. The empty set is a
constant written as { }. There is one unary predicate, Set, which is true of sets. The binary
predicates are x ∈ s (x is a member of set s) and s1 ⊆ s2 (set s1 is a subset, not necessarily
proper, of set s2). The binary functions are s1 ∩ s2 (the intersection of two sets), s1 ∪ s2
(the union of two sets), and {x|s} (the set resulting from adjoining element x to set s). One
possible set of axioms is as follows:
1. The only sets are the empty set and those made by adjoining something to a set:
∀ s Set(s) ⇔ (s = { }) ∨ (∃ x, s2 Set(s2) ∧ s = {x|s2}) .
2. The empty set has no elements adjoined into it. In other words, there is no way to
decompose { } into a smaller set and an element:
¬∃ x, s {x|s} = { } .
3. Adjoining an element already in the set has no effect:
∀ x, s x ∈ s ⇔ s = {x|s} .
4. The only members of a set are the elements that were adjoined into it. We express
this recursively, saying that x is a member of s if and only if s is equal to some set s2
adjoined with some element y, where either y is the same as x or x is a member of s2:
∀ x, s x ∈ s ⇔ ∃ y, s2 (s = {y|s2} ∧ (x = y ∨ x ∈ s2)) .
5. A set is a subset of another set if and only if all of the first set’s members are members
of the second set:
∀ s1, s2 s1 ⊆ s2 ⇔ (∀ x x ∈ s1 ⇒ x ∈ s2) .
6. Two sets are equal if and only if each is a subset of the other:
∀ s1, s2 (s1 = s2) ⇔ (s1 ⊆ s2 ∧ s2 ⊆ s1) .
7. An object is in the intersection of two sets if and only if it is a member of both sets:
∀ x, s1, s2 x ∈ (s1 ∩ s2) ⇔ (x ∈ s1 ∧ x ∈ s2) .
8. An object is in the union of two sets if and only if it is a member of either set:
∀ x, s1, s2 x ∈ (s1 ∪ s2) ⇔ (x ∈ s1 ∨ x ∈ s2) .
Lists are similar to sets. The differences are that lists are ordered and the same element can
appear more than once in a list. We can use the vocabulary of Lisp for lists: Nil is the constant
list with no elements; Cons, Append, First, and Rest are functions; and Find is the pred-
icate that does for lists what Member does for sets. List? is a predicate that is true only of
lists. As with sets, it is common to use syntactic sugar in logical sentences involving lists. The
empty list is [ ]. The term Cons(x, y), where y is a nonempty list, is written [x|y]. The term
Cons(x, Nil) (i.e., the list containing the element x) is written as [x]. A list of several ele-
ments, such as [A, B, C], corresponds to the nested term Cons(A, Cons(B, Cons(C, Nil))).
Exercise 8.17 asks you to write out the axioms for lists.
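The Lisp-style list vocabulary can be mirrored with ordered pairs. In the Python sketch below, Nil is None and Cons(x, y) is the pair (x, y):
Nil = None
def Cons(x, y): return (x, y)
def First(l):   return l[0]
def Rest(l):    return l[1]

def Find(x, l):                  # does x occur in the list l? (what Member does for sets)
    return l is not Nil and (First(l) == x or Find(x, Rest(l)))

def Append(a, b):                # the elements of a followed by the elements of b
    return b if a is Nil else Cons(First(a), Append(Rest(a), b))

abc = Cons("A", Cons("B", Cons("C", Nil)))     # the list [A, B, C]
print(Find("B", abc))                          # True
print(Append(abc, Cons("D", Nil)))             # ('A', ('B', ('C', ('D', None))))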
8.3.4 The wumpus world
Some propositional logic axioms for the wumpus world were given in Chapter 7. The first-
order axioms in this section are much more concise, capturing in a natural way exactly what
we want to say.
Recall that the wumpus agent receives a percept vector with five elements. The corre-
sponding first-order sentence stored in the knowledge base must include both the percept and
the time at which it occurred; otherwise, the agent will get confused about when it saw what.
We use integers for time steps. A typical percept sentence would be
Percept([Stench, Breeze, Glitter, None, None], 5) .
Here, Percept is a binary predicate, and Stench and so on are constants placed in a list. The
actions in the wumpus world can be represented by logical terms:
Turn(Right), Turn(Left), Forward, Shoot, Grab, Climb .
To determine which is best, the agent program executes the query
ASKVARS(∃ a BestAction(a, 5)) ,
which returns a binding list such as {a/Grab}. The agent program can then return Grab as
the action to take. The raw percept data implies certain facts about the current state. For
example:
∀ t, s, g, m, c Percept([s, Breeze, g, m, c], t) ⇒ Breeze(t) ,
∀ t, s, b, m, c Percept([s, b, Glitter, m, c], t) ⇒ Glitter(t) ,
and so on. These rules exhibit a trivial form of the reasoning process called perception, which
we study in depth in Chapter 24. Notice the quantification over time t. In propositional logic,
we would need copies of each sentence for each time step.
Simple “reflex” behavior can also be implemented by quantified implication sentences.
For example, we have
∀ t Glitter(t) ⇒ BestAction(Grab, t) .
Given the percept and rules from the preceding paragraphs, this would yield the desired con-
clusion BestAction(Grab, 5)—that is, Grab is the right thing to do.
We have represented the agent’s inputs and outputs; now it is time to represent the
environment itself. Let us begin with objects. Obvious candidates are squares, pits, and the
wumpus. We could name each square—Square1,2 and so on—but then the fact that Square1,2
and Square1,3 are adjacent would have to be an “extra” fact, and we would need one such
fact for each pair of squares. It is better to use a complex term in which the row and column
appear as integers; for example, we can simply use the list term [1, 2]. Adjacency of any two
squares can be defined as
∀ x, y, a, b Adjacent([x, y], [a, b]) ⇔
(x = a ∧ (y = b − 1 ∨ y = b + 1)) ∨ (y = b ∧ (x = a − 1 ∨ x = a + 1)) .
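A quick way to sanity-check this definition is a small Python sketch (illustrative only), with squares written as pairs of integers:

```python
def adjacent(square1, square2):
    """Mirrors the Adjacent axiom: same column and rows differ by one,
    or same row and columns differ by one."""
    (x, y), (a, b) = square1, square2
    return (x == a and y in (b - 1, b + 1)) or (y == b and x in (a - 1, a + 1))

print(adjacent((1, 2), (1, 3)), adjacent((1, 2), (2, 3)))   # True False
```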
We could name each pit, but this would be inappropriate for a different reason: there is no
reason to distinguish among pits.10 It is simpler to use a unary predicate Pit that is true of
squares containing pits. Finally, since there is exactly one wumpus, a constant Wumpus is
just as good as a unary predicate (and perhaps more dignified from the wumpus’s viewpoint).
The agent’s location changes over time, so we write At(Agent, s, t) to mean that the
agent is at square s at time t. We can fix the wumpus’s location with ∀t At(Wumpus, [2, 2], t).
We can then say that objects can only be at one location at a time:
∀ x, s1, s2, t At(x, s1, t) ∧ At(x, s2, t) ⇒ s1 = s2 .
Given its current location, the agent can infer properties of the square from properties of its
current percept. For example, if the agent is at a square and perceives a breeze, then that
square is breezy:
∀ s, t At(Agent, s, t) ∧ Breeze(t) ⇒ Breezy(s) .
It is useful to know that a square is breezy because we know that the pits cannot move about.
Notice that Breezy has no time argument.
Having discovered which places are breezy (or smelly) and, very important, not breezy
(or not smelly), the agent can deduce where the pits are (and where the wumpus is). Whereas
propositional logic necessitates a separate axiom for each square (see R2 and R3 on page 247)
and would need a different set of axioms for each geographical layout of the world, first-order
logic just needs one axiom:
∀ s Breezy(s) ⇔ ∃ r Adjacent(r, s) ∧ Pit(r) . (8.4)
Similarly, in first-order logic we can quantify over time, so we need just one successor-state
axiom for each predicate, rather than a different copy for each time step. For example, the
axiom for the arrow (Equation (7.2) on page 267) becomes
∀ t HaveArrow(t + 1) ⇔ (HaveArrow(t) ∧ ¬Action(Shoot, t)) .
From these two example sentences, we can see that the first-order logic formulation is no
less concise than the original English-language description given in Chapter 7. The reader
is invited to construct analogous axioms for the agent’s location and orientation; in these
cases, the axioms quantify over both space and time. As in the case of propositional state
estimation, an agent can use logical inference with axioms of this kind to keep track of aspects
of the world that are not directly observed. Chapter 10 goes into more depth on the subject of
first-order successor-state axioms and their uses for constructing plans.
10 Similarly, most of us do not name each bird that flies overhead as it migrates to warmer regions in winter. An
ornithologist wishing to study migration patterns, survival rates, and so on does name each bird, by means of a
ring on its leg, because individual birds must be tracked.
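For instance, the successor-state axiom for the arrow can be simulated with a few lines of Python (a sketch under my own naming, not the book's code; shot_times is an assumed input listing the time steps at which Shoot was executed):

```python
def have_arrow(t, shot_times):
    """HaveArrow(t+1) ⇔ HaveArrow(t) ∧ ¬Action(Shoot, t); HaveArrow(0) is given."""
    if t == 0:
        return True                      # the agent starts with the arrow
    return have_arrow(t - 1, shot_times) and (t - 1) not in shot_times

print([have_arrow(t, {3}) for t in range(6)])
# [True, True, True, True, False, False]  -- the arrow is gone from time 4 onward
```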
8.4 KNOWLEDGE ENGINEERING IN FIRST-ORDER LOGIC
The preceding section illustrated the use of first-order logic to represent knowledge in three
simple domains. This section describes the general process of knowledge-base construction—
a process called knowledge engineering. A knowledge engineer is someone who investigates
a particular domain, learns what concepts are important in that domain, and creates a formal
representation of the objects and relations in the domain. We illustrate the knowledge engi-
neering process in an electronic circuit domain that should already be fairly familiar, so that
we can concentrate on the representational issues involved. The approach we take is suitable
for developing special-purpose knowledge bases whose domain is carefully circumscribed
and whose range of queries is known in advance. General-purpose knowledge bases, which
cover a broad range of human knowledge and are intended to support tasks such as natural
language understanding, are discussed in Chapter 12.
8.4.1 The knowledge-engineering process
Knowledge engineering projects vary widely in content, scope, and difficulty, but all such
projects include the following steps:
1. Identify the task. The knowledge engineer must delineate the range of questions that
the knowledge base will support and the kinds of facts that will be available for each
specific problem instance. For example, does the wumpus knowledge base need to be
able to choose actions or is it required to answer questions only about the contents
of the environment? Will the sensor facts include the current location? The task will
determine what knowledge must be represented in order to connect problem instances to
answers. This step is analogous to the PEAS process for designing agents in Chapter 2.
2. Assemble the relevant knowledge. The knowledge engineer might already be an expert
in the domain, or might need to work with real experts to extract what they know—a
process called knowledge acquisition. At this stage, the knowledge is not represented
formally. The idea is to understand the scope of the knowledge base, as determined by
the task, and to understand how the domain actually works.
For the wumpus world, which is defined by an artificial set of rules, the relevant
knowledge is easy to identify. (Notice, however, that the definition of adjacency was
not supplied explicitly in the wumpus-world rules.) For real domains, the issue of
relevance can be quite difficult—for example, a system for simulating VLSI designs
might or might not need to take into account stray capacitances and skin effects.
3. Decide on a vocabulary of predicates, functions, and constants. That is, translate the
important domain-level concepts into logic-level names. This involves many questions
of knowledge-engineering style. Like programming style, this can have a significant
impact on the eventual success of the project. For example, should pits be represented
by objects or by a unary predicate on squares? Should the agent’s orientation be a
function or a predicate? Should the wumpus’s location depend on time? Once the
choices have been made, the result is a vocabulary that is known as the ontology of
the domain. The word ontology means a particular theory of the nature of being or
existence. The ontology determines what kinds of things exist, but does not determine
their specific properties and interrelationships.
4. Encode general knowledge about the domain. The knowledge engineer writes down
the axioms for all the vocabulary terms. This pins down (to the extent possible) the
meaning of the terms, enabling the expert to check the content. Often, this step reveals
misconceptions or gaps in the vocabulary that must be fixed by returning to step 3 and
iterating through the process.
5. Encode a description of the specific problem instance. If the ontology is well thought
out, this step will be easy. It will involve writing simple atomic sentences about in-
stances of concepts that are already part of the ontology. For a logical agent, problem
instances are supplied by the sensors, whereas a “disembodied” knowledge base is sup-
plied with additional sentences in the same way that traditional programs are supplied
with input data.
6. Pose queries to the inference procedure and get answers. This is where the reward is:
we can let the inference procedure operate on the axioms and problem-specific facts to
derive the facts we are interested in knowing. Thus, we avoid the need for writing an
application-specific solution algorithm.
7. Debug the knowledge base. Alas, the answers to queries will seldom be correct on
the first try. More precisely, the answers will be correct for the knowledge base as
written, assuming that the inference procedure is sound, but they will not be the ones
that the user is expecting. For example, if an axiom is missing, some queries will not be
answerable from the knowledge base. A considerable debugging process could ensue.
Missing axioms or axioms that are too weak can be easily identified by noticing places
where the chain of reasoning stops unexpectedly. For example, if the knowledge base
includes a diagnostic rule (see Exercise 8.14) for finding the wumpus,
∀ s Smelly(s) ⇒ Adjacent(Home(Wumpus), s) ,
instead of the biconditional, then the agent will never be able to prove the absence of
wumpuses. Incorrect axioms can be identified because they are false statements about
the world. For example, the sentence
∀ x NumOfLegs(x, 4) ⇒ Mammal(x)
is false for reptiles, amphibians, and, more importantly, tables. The falsehood of this
sentence can be determined independently of the rest of the knowledge base. In contrast,
a typical error in a program looks like this:
offset = position + 1 .
It is impossible to tell whether this statement is correct without looking at the rest of the
program to see whether, for example, offset is used to refer to the current position,
or to one beyond the current position, or whether the value of position is changed
by another statement and so offset should also be changed again.
To understand this seven-step process better, we now apply it to an extended example—the
domain of electronic circuits.
8.4.2 The electronic circuits domain
We will develop an ontology and knowledge base that allow us to reason about digital circuits
of the kind shown in Figure 8.6. We follow the seven-step process for knowledge engineering.
Identify the task
There are many reasoning tasks associated with digital circuits. At the highest level, one
analyzes the circuit’s functionality. For example, does the circuit in Figure 8.6 actually add
properly? If all the inputs are high, what is the output of gate A2? Questions about the
circuit’s structure are also interesting. For example, what are all the gates connected to the
first input terminal? Does the circuit contain feedback loops? These will be our tasks in this
section. There are more detailed levels of analysis, including those related to timing delays,
circuit area, power consumption, production cost, and so on. Each of these levels would
require additional knowledge.
Assemble the relevant knowledge
What do we know about digital circuits? For our purposes, they are composed of wires and
gates. Signals flow along wires to the input terminals of gates, and each gate produces a
Figure 8.6 A digital circuit C1, purporting to be a one-bit full adder. The first two inputs
are the two bits to be added, and the third input is a carry bit. The first output is the sum, and
the second output is a carry bit for the next adder. The circuit contains two XOR gates, two
AND gates, and one OR gate.
signal on the output terminal that flows along another wire. To determine what these signals
will be, we need to know how the gates transform their input signals. There are four types
of gates: AND, OR, and XOR gates have two input terminals, and NOT gates have one. All
gates have one output terminal. Circuits, like gates, have input and output terminals.
To reason about functionality and connectivity, we do not need to talk about the wires
themselves, the paths they take, or the junctions where they come together. All that matters
is the connections between terminals—we can say that one output terminal is connected to
another input terminal without having to say what actually connects them. Other factors such
as the size, shape, color, or cost of the various components are irrelevant to our analysis.
If our purpose were something other than verifying designs at the gate level, the ontol-
ogy would be different. For example, if we were interested in debugging faulty circuits, then
it would probably be a good idea to include the wires in the ontology, because a faulty wire
can corrupt the signal flowing along it. For resolving timing faults, we would need to include
gate delays. If we were interested in designing a product that would be profitable, then the
cost of the circuit and its speed relative to other products on the market would be important.
Decide on a vocabulary
We now know that we want to talk about circuits, terminals, signals, and gates. The next step
is to choose functions, predicates, and constants to represent them. First, we need to be able
to distinguish gates from each other and from other objects. Each gate is represented as an
object named by a constant, about which we assert that it is a gate with, say, Gate(X1). The
behavior of each gate is determined by its type: one of the constants AND, OR, XOR, or
NOT. Because a gate has exactly one type, a function is appropriate: Type(X1) = XOR.
Circuits, like gates, are identified by a predicate: Circuit(C1).
Next we consider terminals, which are identified by the predicate Terminal(x). A gate
or circuit can have one or more input terminals and one or more output terminals. We use the
function In(1, X1) to denote the first input terminal for gate X1. A similar function Out is
used for output terminals. The function Arity(c, i, j) says that circuit c has i input and j out-
put terminals. The connectivity between gates can be represented by a predicate, Connected,
which takes two terminals as arguments, as in Connected(Out(1, X1), In(1, X2)).
Finally, we need to know whether a signal is on or off. One possibility is to use a unary
predicate, On(t), which is true when the signal at a terminal is on. This makes it a little
difficult, however, to pose questions such as “What are all the possible values of the signals
at the output terminals of circuit C1 ?” We therefore introduce as objects two signal values, 1
and 0, and a function Signal(t) that denotes the signal value for the terminal t.
Encode general knowledge of the domain
One sign that we have a good ontology is that we require only a few general rules, which can
be stated clearly and concisely. These are all the axioms we will need:
1. If two terminals are connected, then they have the same signal:
∀ t1, t2 Terminal(t1) ∧ Terminal(t2) ∧ Connected(t1, t2) ⇒
Signal(t1) = Signal(t2) .
2. The signal at every terminal is either 1 or 0:
∀ t Terminal(t) ⇒ Signal(t) = 1 ∨ Signal(t) = 0 .
3. Connected is commutative:
∀ t1, t2 Connected(t1, t2) ⇔ Connected(t2, t1) .
4. There are four types of gates:
∀ g Gate(g) ∧ k = Type(g) ⇒ k = AND ∨ k = OR ∨ k = XOR ∨ k = NOT .
5. An AND gate’s output is 0 if and only if any of its inputs is 0:
∀ g Gate(g) ∧ Type(g) = AND ⇒
Signal(Out(1, g)) = 0 ⇔ ∃ n Signal(In(n, g)) = 0 .
6. An OR gate’s output is 1 if and only if any of its inputs is 1:
∀ g Gate(g) ∧ Type(g) = OR ⇒
Signal(Out(1, g)) = 1 ⇔ ∃ n Signal(In(n, g)) = 1 .
7. An XOR gate’s output is 1 if and only if its inputs are different:
∀ g Gate(g) ∧ Type(g) = XOR ⇒
Signal(Out(1, g)) = 1 ⇔ Signal(In(1, g)) ≠ Signal(In(2, g)) .
8. A NOT gate’s output is different from its input:
∀ g Gate(g) ∧ (Type(g) = NOT) ⇒
Signal(Out(1, g)) ≠ Signal(In(1, g)) .
9. The gates (except for NOT) have two inputs and one output.
∀ g Gate(g) ∧ Type(g) = NOT ⇒ Arity(g, 1, 1) .
∀ g Gate(g) ∧ k = Type(g) ∧ (k = AND ∨ k = OR ∨ k = XOR) ⇒
Arity(g, 2, 1)
10. A circuit has terminals, up to its input and output arity, and nothing beyond its arity:
∀ c, i, j Circuit(c) ∧ Arity(c, i, j) ⇒
∀ n (n ≤ i ⇒ Terminal(In(c, n))) ∧ (n > i ⇒ In(c, n) = Nothing) ∧
∀ n (n ≤ j ⇒ Terminal(Out(c, n))) ∧ (n > j ⇒ Out(c, n) = Nothing)
11. Gates, terminals, signals, gate types, and Nothing are all distinct.
∀ g, t Gate(g) ∧ Terminal(t) ⇒
g ≠ t ≠ 1 ≠ 0 ≠ OR ≠ AND ≠ XOR ≠ NOT ≠ Nothing .
12. Gates are circuits.
∀ g Gate(g) ⇒ Circuit(g)
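To see how axioms 5–8 pin down gate behavior, here is a minimal Python sketch (my own illustration, not part of the knowledge base) that evaluates a single gate on 0/1 signal values:

```python
# Sketch: the signal semantics of axioms 5-8 as a Python function on 0/1 values.
def gate_output(gate_type, inputs):
    if gate_type == "AND":   # output is 0 iff some input is 0
        return 0 if 0 in inputs else 1
    if gate_type == "OR":    # output is 1 iff some input is 1
        return 1 if 1 in inputs else 0
    if gate_type == "XOR":   # output is 1 iff the two inputs differ
        return 1 if inputs[0] != inputs[1] else 0
    if gate_type == "NOT":   # output differs from the single input
        return 1 - inputs[0]
    raise ValueError("unknown gate type")

print(gate_output("XOR", (1, 0)), gate_output("AND", (1, 1)))  # 1 1
```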
Encode the specific problem instance
The circuit shown in Figure 8.6 is encoded as circuit C1 with the following description. First,
we categorize the circuit and its component gates:
Circuit(C1) ∧ Arity(C1, 3, 2)
Gate(X1) ∧ Type(X1) = XOR
Gate(X2) ∧ Type(X2) = XOR
Gate(A1) ∧ Type(A1) = AND
Gate(A2) ∧ Type(A2) = AND
Gate(O1) ∧ Type(O1) = OR .
Then, we show the connections between them:
Connected(Out(1, X1), In(1, X2)) Connected(In(1, C1), In(1, X1))
Connected(Out(1, X1), In(2, A2)) Connected(In(1, C1), In(1, A1))
Connected(Out(1, A2), In(1, O1)) Connected(In(2, C1), In(2, X1))
Connected(Out(1, A1), In(2, O1)) Connected(In(2, C1), In(2, A1))
Connected(Out(1, X2), Out(1, C1)) Connected(In(3, C1), In(2, X2))
Connected(Out(1, O1), Out(2, C1)) Connected(In(3, C1), In(1, A2)) .
Pose queries to the inference procedure
What combinations of inputs would cause the first output of C1 (the sum bit) to be 0 and the
second output of C1 (the carry bit) to be 1?
∃ i1, i2, i3 Signal(In(1, C1)) = i1 ∧ Signal(In(2, C1)) = i2 ∧ Signal(In(3, C1)) = i3
∧ Signal(Out(1, C1)) = 0 ∧ Signal(Out(2, C1)) = 1 .
The answers are substitutions for the variables i1, i2, and i3 such that the resulting sentence
is entailed by the knowledge base. ASKVARS will give us three such substitutions:
{i1/1, i2/1, i3/0} {i1/1, i2/0, i3/1} {i1/0, i2/1, i3/1} .
What are the possible sets of values of all the terminals for the adder circuit?
∃ i1, i2, i3, o1, o2 Signal(In(1, C1)) = i1 ∧ Signal(In(2, C1)) = i2
∧ Signal(In(3, C1)) = i3 ∧ Signal(Out(1, C1)) = o1 ∧ Signal(Out(2, C1)) = o2 .
This final query will return a complete input–output table for the device, which can be used
to check that it does in fact add its inputs correctly. This is a simple example of circuit
verification. We can also use the definition of the circuit to build larger digital systems, for
which the same kind of verification procedure can be carried out. (See Exercise 8.28.) Many
domains are amenable to the same kind of structured knowledge-base development, in which
more complex concepts are defined on top of simpler concepts.
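As an informal cross-check of these queries, the circuit of Figure 8.6 can be simulated by brute force in a few lines of Python (a self-contained sketch with my own variable names; the wiring follows the connection facts listed earlier):

```python
from itertools import product

def full_adder(i1, i2, i3):
    """One-bit full adder wired as in Figure 8.6: two XORs, two ANDs, one OR."""
    x1 = i1 ^ i2
    sum_bit = x1 ^ i3                # Out(1, C1): the sum bit from X2
    carry = (i1 & i2) | (x1 & i3)    # Out(2, C1): A1 and A2 feeding O1
    return sum_bit, carry

# First query: which input combinations make the sum bit 0 and the carry bit 1?
print([ins for ins in product((0, 1), repeat=3) if full_adder(*ins) == (0, 1)])
# [(0, 1, 1), (1, 0, 1), (1, 1, 0)] -- the same three substitutions as above

# Second query: the complete input-output table for the device.
for ins in product((0, 1), repeat=3):
    print(ins, "->", full_adder(*ins))
```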
Debug the knowledge base
We can perturb the knowledge base in various ways to see what kinds of erroneous behaviors
emerge. For example, suppose we fail to read Section 8.2.8 and hence forget to assert that
1 ≠ 0. Suddenly, the system will be unable to prove any outputs for the circuit, except for
the input cases 000 and 110. We can pinpoint the problem by asking for the outputs of each
gate. For example, we can ask
∃ i1, i2, o Signal(In(1, C1)) = i1 ∧ Signal(In(2, C1)) = i2 ∧ Signal(Out(1, X1)) = o ,
which reveals that no outputs are known at X1 for the input cases 10 and 01. Then, we look
at the axiom for XOR gates, as applied to X1:
Signal(Out(1, X1)) = 1 ⇔ Signal(In(1, X1)) ≠ Signal(In(2, X1)) .
If the inputs are known to be, say, 1 and 0, then this reduces to
Signal(Out(1, X1)) = 1 ⇔ 1 ≠ 0 .
Now the problem is apparent: the system is unable to infer that Signal(Out(1, X1)) = 1, so
we need to tell it that 1 ≠ 0.
8.5 SUMMARY
This chapter has introduced first-order logic, a representation language that is far more pow-
erful than propositional logic. The important points are as follows:
• Knowledge representation languages should be declarative, compositional, expressive,
context independent, and unambiguous.
• Logics differ in their ontological commitments and epistemological commitments.
While propositional logic commits only to the existence of facts, first-order logic com-
mits to the existence of objects and relations and thereby gains expressive power.
• The syntax of first-order logic builds on that of propositional logic. It adds terms to
represent objects, and has universal and existential quantifiers to construct assertions
about all or some of the possible values of the quantified variables.
• A possible world, or model, for first-order logic includes a set of objects and an inter-
pretation that maps constant symbols to objects, predicate symbols to relations among
objects, and function symbols to functions on objects.
• An atomic sentence is true just when the relation named by the predicate holds between
the objects named by the terms. Extended interpretations, which map quantifier vari-
ables to objects in the model, define the truth of quantified sentences.
• Developing a knowledge base in first-order logic requires a careful process of analyzing
the domain, choosing a vocabulary, and encoding the axioms required to support the
desired inferences.
BIBLIOGRAPHICAL AND HISTORICAL NOTES
Although Aristotle’s logic deals with generalizations over objects, it fell far short of the ex-
pressive power of first-order logic. A major barrier to its further development was its concen-
tration on one-place predicates to the exclusion of many-place relational predicates. The first
systematic treatment of relations was given by Augustus De Morgan (1864), who cited the
following example to show the sorts of inferences that Aristotle’s logic could not handle: “All
horses are animals; therefore, the head of a horse is the head of an animal.” This inference
is inaccessible to Aristotle because any valid rule that can support this inference must first
analyze the sentence using the two-place predicate “x is the head of y.” The logic of relations
was studied in depth by Charles Sanders Peirce (1870, 2004).
True first-order logic dates from the introduction of quantifiers in Gottlob Frege’s (1879)
Begriffsschrift (“Concept Writing” or “Conceptual Notation”). Peirce (1883) also developed
first-order logic independently of Frege, although slightly later. Frege’s ability to nest quan-
tifiers was a big step forward, but he used an awkward notation. The present notation for
first-order logic is due substantially to Giuseppe Peano (1889), but the semantics is virtually
identical to Frege’s. Oddly enough, Peano’s axioms were due in large measure to Grassmann
(1861) and Dedekind (1888).
Leopold Löwenheim (1915) gave a systematic treatment of model theory for first-order
logic, including the first proper treatment of the equality symbol. Löwenheim’s results were
further extended by Thoralf Skolem (1920). Alfred Tarski (1935, 1956) gave an explicit
definition of truth and model-theoretic satisfaction in first-order logic, using set theory.
McCarthy (1958) was primarily responsible for the introduction of first-order logic as a
tool for building AI systems. The prospects for logic-based AI were advanced significantly by
Robinson’s (1965) development of resolution, a complete procedure for first-order inference
described in Chapter 9. The logicist approach took root at Stanford University. Cordell Green
(1969a, 1969b) developed a first-order reasoning system, QA3, leading to the first attempts to
build a logical robot at SRI (Fikes and Nilsson, 1971). First-order logic was applied by Zohar
Manna and Richard Waldinger (1971) for reasoning about programs and later by Michael
Genesereth (1984) for reasoning about circuits. In Europe, logic programming (a restricted
form of first-order reasoning) was developed for linguistic analysis (Colmerauer et al., 1973)
and for general declarative systems (Kowalski, 1974). Computational logic was also well
entrenched at Edinburgh through the LCF (Logic for Computable Functions) project (Gordon
et al., 1979). These developments are chronicled further in Chapters 9 and 12.
Practical applications built with first-order logic include a system for evaluating the
manufacturing requirements for electronic products (Mannion, 2002), a system for reasoning
about policies for file access and digital rights management (Halpern and Weissman, 2008),
and a system for the automated composition of Web services (McIlraith and Zeng, 2001).
Reactions to the Whorf hypothesis (Whorf, 1956) and the problem of language and
thought in general, appear in several recent books (Gumperz and Levinson, 1996; Bowerman
and Levinson, 2001; Pinker, 2003; Gentner and Goldin-Meadow, 2003). The “theory” theory
(Gopnik and Glymour, 2002; Tenenbaum et al., 2007) views children’s learning about the
world as analogous to the construction of scientific theories. Just as the predictions of a
machine learning algorithm depend strongly on the vocabulary supplied to it, so will the
child’s formulation of theories depend on the linguistic environment in which learning occurs.
There are a number of good introductory texts on first-order logic, including some by
leading figures in the history of logic: Alfred Tarski (1941), Alonzo Church (1956), and
W.V. Quine (1982) (which is one of the most readable). Enderton (1972) gives a more math-
ematically oriented perspective. A highly formal treatment of first-order logic, along with
many more advanced topics in logic, is provided by Bell and Machover (1977). Manna and
Waldinger (1985) give a readable introduction to logic from a computer science perspec-
tive, as do Huth and Ryan (2004), who concentrate on program verification. Barwise and
Etchemendy (2002) take an approach similar to the one used here. Smullyan (1995) presents
results concisely, using the tableau format. Gallier (1986) provides an extremely rigorous
mathematical exposition of first-order logic, along with a great deal of material on its use in
automated reasoning. Logical Foundations of Artificial Intelligence (Genesereth and Nilsson,
1987) is both a solid introduction to logic and the first systematic treatment of logical agents
with percepts and actions, and there are two good handbooks: van Benthem and ter Meulen
(1997) and Robinson and Voronkov (2001). The journal of record for the field of pure math-
ematical logic is the Journal of Symbolic Logic, whereas the Journal of Applied Logic deals
with concerns closer to those of artificial intelligence.
EXERCISES
8.1 A logical knowledge base represents the world using a set of sentences with no explicit
structure. An analogical representation, on the other hand, has physical structure that corre-
sponds directly to the structure of the thing represented. Consider a road map of your country
as an analogical representation of facts about the country—it represents facts with a map lan-
guage. The two-dimensional structure of the map corresponds to the two-dimensional surface
of the area.
a. Give five examples of symbols in the map language.
b. An explicit sentence is a sentence that the creator of the representation actually writes
down. An implicit sentence is a sentence that results from explicit sentences because
of properties of the analogical representation. Give three examples each of implicit and
explicit sentences in the map language.
c. Give three examples of facts about the physical structure of your country that cannot be
represented in the map language.
d. Give two examples of facts that are much easier to express in the map language than in
first-order logic.
e. Give two other examples of useful analogical representations. What are the advantages
and disadvantages of each of these languages?
8.2 Consider a knowledge base containing just two sentences: P(a) and P(b). Does this
knowledge base entail ∀ x P(x)? Explain your answer in terms of models.
8.3 Is the sentence ∃ x, y x = y valid? Explain.
8.4 Write down a logical sentence such that every world in which it is true contains exactly
two objects.
8.5 Consider a symbol vocabulary that contains c constant symbols, pk predicate symbols
of each arity k, and fk function symbols of each arity k, where 1 ≤ k ≤ A. Let the domain
size be fixed at D. For any given model, each predicate or function symbol is mapped onto a
relation or function, respectively, of the same arity. You may assume that the functions in the
model allow some input tuples to have no value for the function (i.e., the value is the invisible
object). Derive a formula for the number of possible models for a domain with D elements.
Don’t worry about eliminating redundant combinations.
8.6 Which of the following are valid (necessarily true) sentences?
a. (∃x x = x) ⇒ (∀ y ∃z y = z).
b. ∀ x P(x) ∨ ¬P(x).
c. ∀ x Smart(x) ∨ (x = x).
8.7 Consider a version of the semantics for first-order logic in which models with empty
domains are allowed. Give at least two examples of sentences that are valid according to the
standard semantics but not according to the new semantics. Discuss which outcome makes
more intuitive sense for your examples.
8.8 Does the fact ¬Spouse(George, Laura) follow from the facts Jim ≠ George and
Spouse(Jim, Laura)? If so, give a proof; if not, supply additional axioms as needed. What
happens if we use Spouse as a unary function symbol instead of a binary predicate?
8.9 Consider a vocabulary with the following symbols:
Occupation(p, o): Predicate. Person p has occupation o.
Customer(p1, p2): Predicate. Person p1 is a customer of person p2.
Boss(p1, p2): Predicate. Person p1 is a boss of person p2.
Doctor, Surgeon, Lawyer, Actor: Constants denoting occupations.
Emily, Joe: Constants denoting people.
Use these symbols to write the following assertions in first-order logic:
a. Emily is either a surgeon or a lawyer.
b. Joe is an actor, but he also holds another job.
c. All surgeons are doctors.
d. Joe does not have a lawyer (i.e., is not a customer of any lawyer).
e. Emily has a boss who is a lawyer.
f. There exists a lawyer all of whose customers are doctors.
g. Every surgeon has a lawyer.
8.10 In each of the following we give an English sentence and a number of candidate logical
expressions. For each of the logical expressions, state whether it (1) correctly expresses the
English sentence; (2) is syntactically invalid and therefore meaningless; or (3) is syntactically
valid but does not express the meaning of the English sentence.
a. Every cat loves its mother or father.
(i) ∀ x Cat(x) ⇒ Loves(x, Mother(x) ∨ Father(x)).
(ii) ∀ x ¬Cat(x) ∨ Loves(x, Mother(x)) ∨ Loves(x, Father(x)).
(iii) ∀ x Cat(x) ∧ (Loves(x, Mother(x)) ∨ Loves(x, Father(x))).
b. Every dog who loves one of its brothers is happy.
(i) ∀ x Dog(x) ∧ (∃y Brother(y, x) ∧ Loves(x, y)) ⇒ Happy(x).
(ii) ∀ x, y Dog(x) ∧ Brother(y, x) ∧ Loves(x, y) ⇒ Happy(x).
(iii) ∀ x Dog(x) ∧ [∀ y Brother(y, x) ⇔ Loves(x, y)] ⇒ Happy(x).
c. No dog bites a child of its owner.
(i) ∀ x Dog(x) ⇒ ¬Bites(x, Child(Owner(x))).
(ii) ¬∃ x, y Dog(x) ∧ Child(y, Owner(x)) ∧ Bites(x, y).
(iii) ∀ x Dog(x) ⇒ (∀ y Child(y, Owner(x)) ⇒ ¬Bites(x, y)).
(iv) ¬∃ x Dog(x) ⇒ (∃ y Child(y, Owner(x)) ∧ Bites(x, y)).
d. Everyone’s zip code within a state has the same first digit.
(i) ∀ x, s, z1 [State(s) ∧ LivesIn(x, s) ∧ Zip(x) = z1] ⇒
[∀ y, z2 LivesIn(y, s) ∧ Zip(y) = z2 ⇒ Digit(1, z1) = Digit(1, z2)].
(ii) ∀ x, s [State(s) ∧ LivesIn(x, s) ∧ ∃ z1 Zip(x) = z1] ⇒
[∀ y, z2 LivesIn(y, s) ∧ Zip(y) = z2 ∧ Digit(1, z1) = Digit(1, z2)].
(iii) ∀ x, y, s State(s)∧LivesIn(x, s)∧LivesIn(y, s) ⇒ Digit(1, Zip(x) = Zip(y)).
(iv) ∀ x, y, s State(s) ∧ LivesIn(x, s) ∧ LivesIn(y, s) ⇒
Digit(1, Zip(x)) = Digit(1, Zip(y)).
8.11 Complete the following exercises about logical sentences:
a. Translate into good, natural English (no xs or ys!):
∀ x, y, l SpeaksLanguage(x, l) ∧ SpeaksLanguage(y, l)
⇒ Understands(x, y) ∧ Understands(y, x).
b. Explain why this sentence is entailed by the sentence
∀ x, y, l SpeaksLanguage(x, l) ∧ SpeaksLanguage(y, l)
⇒ Understands(x, y).
c. Translate into first-order logic the following sentences:
(i) Understanding leads to friendship.
(ii) Friendship is transitive.
Remember to define all predicates, functions, and constants you use.
8.12 True or false? Explain.
a. ∃ x x = Rumpelstiltskin is a valid (necessarily true) sentence of first-order logic.
b. Every existentially quantified sentence in first-order logic is true in any model that con-
tains exactly one object.
c. ∀ x, y x = y is satisfiable.
8.13 Rewrite the first two Peano axioms in Section 8.3.3 as a single axiom that defines
NatNum(x) so as to exclude the possibility of natural numbers except for those generated by
the successor function.
8.14 Equation (8.4) on page 306 defines the conditions under which a square is breezy. Here
we consider two other ways to describe this aspect of the wumpus world.
a. We can write diagnostic rules leading from observed effects to hidden causes. For find-
ing pits, the obvious diagnostic rules say that if a square is breezy, some adjacent square
must contain a pit; and if a square is not breezy, then no adjacent square contains a pit.
Write these two rules in first-order logic and show that their conjunction is logically
equivalent to Equation (8.4).
b. We can write causal rules leading from cause to effect. One obvious causal rule is that
a pit causes all adjacent squares to be breezy. Write this rule in first-order logic, explain
why it is incomplete compared to Equation (8.4), and supply the missing axiom.
Figure 8.7 A typical family tree. The symbol “⊲⊳” connects spouses and arrows point to
children. (The tree shows George, Mum, Spencer, Kydd, Elizabeth, Philip, Margaret, Diana,
Charles, Anne, Mark, Andrew, Sarah, Edward, Sophie, William, Harry, Peter, Zara, Beatrice,
Eugenie, Louise, and James.)
8.15 Write axioms describing the predicates Grandchild, Greatgrandparent , Ancestor,
Brother, Sister, Daughter, Son, FirstCousin, BrotherInLaw, SisterInLaw, Aunt, and
Uncle. Find out the proper definition of mth cousin n times removed, and write the def-
inition in first-order logic. Now write down the basic facts depicted in the family tree in
Figure 8.7. Using a suitable logical reasoning system, TELL it all the sentences you have
written down, and ASK it who are Elizabeth’s grandchildren, Diana’s brothers-in-law, Zara’s
great-grandparents, and Eugenie’s ancestors.
8.16 Write down a sentence asserting that + is a commutative function. Does your sentence
follow from the Peano axioms? If so, explain why; if not, give a model in which the axioms
are true and your sentence is false.
8.17 Using the set axioms as examples, write axioms for the list domain, including all the
constants, functions, and predicates mentioned in the chapter.
8.18 Explain what is wrong with the following proposed definition of adjacent squares in
the wumpus world:
∀ x, y Adjacent([x, y], [x + 1, y]) ∧ Adjacent([x, y], [x, y + 1]) .
8.19 Write out the axioms required for reasoning about the wumpus’s location, using a
constant symbol Wumpus and a binary predicate At(Wumpus, Location). Remember that
there is only one wumpus.
8.20 Assuming predicates Parent(p, q) and Female(p) and constants Joan and Kevin,
with the obvious meanings, express each of the following sentences in first-order logic. (You
may use the abbreviation ∃1 to mean “there exists exactly one.”)
a. Joan has a daughter (possibly more than one, and possibly sons as well).
b. Joan has exactly one daughter (but may have sons as well).
c. Joan has exactly one child, a daughter.
d. Joan and Kevin have exactly one child together.
e. Joan has at least one child with Kevin, and no children with anyone else.
8.21 Arithmetic assertions can be written in first-order logic with the predicate symbol <,
the function symbols + and ×, and the constant symbols 0 and 1. Additional predicates can
also be defined with biconditionals.
a. Represent the property “x is an even number.”
b. Represent the property “x is prime.”
c. Goldbach’s conjecture is the conjecture (unproven as yet) that every even number is
equal to the sum of two primes. Represent this conjecture as a logical sentence.
8.22 In Chapter 6, we used equality to indicate the relation between a variable and its value.
For instance, we wrote WA = red to mean that Western Australia is colored red. Repre-
senting this in first-order logic, we must write more verbosely ColorOf (WA) = red. What
incorrect inference could be drawn if we wrote sentences such as WA = red directly as logical
assertions?
8.23 Write in first-order logic the assertion that every key and at least one of every pair of
socks will eventually be lost forever, using only the following vocabulary: Key(x), x is a key;
Sock(x), x is a sock; Pair(x, y), x and y are a pair; Now, the current time; Before(t1, t2),
time t1 comes before time t2; Lost(x, t), object x is lost at time t.
8.24 Translate into first-order logic the sentence “Everyone’s DNA is unique and is derived
from their parents’ DNA.” You must specify the precise intended meaning of your vocabulary
terms. (Hint: Do not use the predicate Unique(x), since uniqueness is not really a property
of an object in itself!)
8.25 For each of the following sentences in English, decide if the accompanying first-order
logic sentence is a good translation. If not, explain why not and correct it.
a. Any apartment in London has lower rent than some apartments in Paris.
∀ x [Apt(x) ∧ In(x, London)] ⇒ ∃ y ([Apt(y) ∧ In(y, Paris)] ⇒
(Rent(x) < Rent(y))) .
b. There is exactly one apartment in Paris with rent below $1000.
∃ x Apt(x) ∧ In(x, Paris) ∧
∀ y [Apt(y) ∧ In(y, Paris) ∧ (Rent(y) < Dollars(1000))] ⇒ (y = x).
c. If an apartment is more expensive than all apartments in London, it must be in Moscow.
∀ x Apt(x) ∧ [∀ y Apt(y) ∧ In(y, London) ∧ (Rent(x) > Rent(y))] ⇒
In(x, Moscow).
8.26 Represent the following sentences in first-order logic, using a consistent vocabulary
(which you must define):
a. Some students took French in spring 2009.
b. Every student who takes French passes it.
c. Only one student took Greek in spring 2009.
d. The best score in Greek is always lower than the best score in French.
Figure 8.8 A four-bit adder. Each Adi is a one-bit adder, as in Figure 8.6 on page 309. (The
adder takes input bits X0–X3 and Y0–Y3 and produces output bits Z0–Z4.)
e. Every person who buys a policy is smart.
f. There is an agent who sells policies only to people who are not insured.
g. There is a barber who shaves all men in town who do not shave themselves.
h. A person born in the UK, each of whose parents is a UK citizen or a UK resident, is a
UK citizen by birth.
i. A person born outside the UK, one of whose parents is a UK citizen by birth, is a UK
citizen by descent.
j. Politicians can fool some of the people all of the time, and they can fool all of the people
some of the time, but they can’t fool all of the people all of the time.
k. All Greeks speak the same language. (Use Speaks(x, l) to mean that person x speaks
language l.)
8.27 Write a general set of facts and axioms to represent the assertion “Wellington heard
about Napoleon’s death” and to correctly answer the question “Did Napoleon hear about
Wellington’s death?”
8.28 Extend the vocabulary from Section 8.4 to define addition for n-bit binary numbers.
Then encode the description of the four-bit adder in Figure 8.8, and pose the queries needed
to verify that it is in fact correct.
8.29 The circuit representation in the chapter is more detailed than necessary if we care
only about circuit functionality. A simpler formulation describes any m-input, n-output gate
or circuit using a predicate with m+n arguments, such that the predicate is true exactly when
the inputs and outputs are consistent. For example, NOT gates are described by the binary
predicate NOT(i, o), for which NOT(0, 1) and NOT(1, 0) are known. Compositions of
gates are defined by conjunctions of gate predicates in which shared variables indicate direct
connections. For example, a NAND circuit can be composed from ANDs and NOTs:
∀ i1, i2, oa, o AND(i1, i2, oa) ∧ NOT(oa, o) ⇒ NAND(i1, i2, o) .
Using this representation, define the one-bit adder in Figure 8.6 and the four-bit adder in
Figure 8.8, and explain what queries you would use to verify the designs. What kinds of
queries are not supported by this representation that are supported by the representation in
Section 8.4?
8.30 Obtain a passport application for your country, identify the rules determining eligi-
bility for a passport, and translate them into first-order logic, following the steps outlined in
Section 8.4.
8.31 Consider a first-order logical knowledge base that describes worlds containing people,
songs, albums (e.g., “Meet the Beatles”) and disks (i.e., particular physical instances of CDs).
The vocabulary contains the following symbols:
CopyOf (d, a): Predicate. Disk d is a copy of album a.
Owns(p, d): Predicate. Person p owns disk d.
Sings(p, s, a): Album a includes a recording of song s sung by person p.
Wrote(p, s): Person p wrote song s.
McCartney, Gershwin, BHoliday, Joe, EleanorRigby, TheManILove, Revolver:
Constants with the obvious meanings.
Express the following statements in first-order logic:
a. Gershwin wrote “The Man I Love.”
b. Gershwin did not write “Eleanor Rigby.”
c. Either Gershwin or McCartney wrote “The Man I Love.”
d. Joe has written at least one song.
e. Joe owns a copy of Revolver.
f. Every song that McCartney sings on Revolver was written by McCartney.
g. Gershwin did not write any of the songs on Revolver.
h. Every song that Gershwin wrote has been recorded on some album. (Possibly different
songs are recorded on different albums.)
i. There is a single album that contains every song that Joe has written.
j. Joe owns a copy of an album that has Billie Holiday singing “The Man I Love.”
k. Joe owns a copy of every album that has a song sung by McCartney. (Of course, each
different album is instantiated in a different physical CD.)
l. Joe owns a copy of every album on which all the songs are sung by Billie Holiday.
9 INFERENCE IN
FIRST-ORDER LOGIC
In which we define effective procedures for answering questions posed in first-
order logic.
Chapter 7 showed how sound and complete inference can be achieved for propositional logic.
In this chapter, we extend those results to obtain algorithms that can answer any answer-
able question stated in first-order logic. Section 9.1 introduces inference rules for quantifiers
and shows how to reduce first-order inference to propositional inference, albeit at potentially
great expense. Section 9.2 describes the idea of unification, showing how it can be used
to construct inference rules that work directly with first-order sentences. We then discuss
three major families of first-order inference algorithms. Forward chaining and its applica-
tions to deductive databases and production systems are covered in Section 9.3; backward
chaining and logic programming systems are developed in Section 9.4. Forward and back-
ward chaining can be very efficient, but are applicable only to knowledge bases that can
be expressed as sets of Horn clauses. General first-order sentences require resolution-based
theorem proving, which is described in Section 9.5.
9.1 PROPOSITIONAL VS. FIRST-ORDER INFERENCE
This section and the next introduce the ideas underlying modern logical inference systems.
We begin with some simple inference rules that can be applied to sentences with quantifiers
to obtain sentences without quantifiers. These rules lead naturally to the idea that first-order
inference can be done by converting the knowledge base to propositional logic and using
propositional inference, which we already know how to do. The next section points out an
obvious shortcut, leading to inference methods that manipulate first-order sentences directly.
9.1.1 Inference rules for quantifiers
Let us begin with universal quantifiers. Suppose our knowledge base contains the standard
folkloric axiom stating that all greedy kings are evil:
∀ x King(x) ∧ Greedy(x) ⇒ Evil(x) .
Then it seems quite permissible to infer any of the following sentences:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John)) .
.
.
.
The rule of Universal Instantiation (UI for short) says that we can infer any sentence ob-
tained by substituting a ground term (a term without variables) for the variable.1 To write
out the inference rule formally, we use the notion of substitutions introduced in Section 8.3.
Let SUBST(θ, α) denote the result of applying the substitution θ to the sentence α. Then the
rule is written

    ∀ v α
    ----------------
    SUBST({v/g}, α)

for any variable v and ground term g. For example, the three sentences given earlier are
obtained with the substitutions {x/John}, {x/Richard}, and {x/Father(John)}.
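A minimal Python sketch of SUBST (my own illustration; the nested-tuple sentence representation is an assumption, not the book's notation) shows how such a substitution is applied:

```python
def subst(theta, sentence):
    """Apply substitution theta (a dict from variable names to terms) to a
    sentence represented as nested tuples, e.g. ("King", "x")."""
    if isinstance(sentence, tuple):
        return tuple(subst(theta, part) for part in sentence)
    return theta.get(sentence, sentence)

rule = ("=>", ("and", ("King", "x"), ("Greedy", "x")), ("Evil", "x"))
print(subst({"x": "John"}, rule))
# ('=>', ('and', ('King', 'John'), ('Greedy', 'John')), ('Evil', 'John'))
```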
In the rule for Existential Instantiation, the variable is replaced by a single new constant
symbol. The formal statement is as follows: for any sentence α, variable v, and constant
symbol k that does not appear elsewhere in the knowledge base,

    ∃ v α
    ----------------
    SUBST({v/k}, α) .
For example, from the sentence
∃ x Crown(x) ∧ OnHead(x, John)
we can infer the sentence
Crown(C1) ∧ OnHead(C1, John)
as long as C1 does not appear elsewhere in the knowledge base. Basically, the existential
sentence says there is some object satisfying a condition, and applying the existential instan-
tiation rule just gives a name to that object. Of course, that name must not already belong
to another object. Mathematics provides a nice example: suppose we discover that there is a
number that is a little bigger than 2.71828 and that satisfies the equation d(xy)/dy = xy for x.
We can give this number a name, such as e, but it would be a mistake to give it the name of
an existing object, such as π. In logic, the new name is called a Skolem constant. Existen-
tial Instantiation is a special case of a more general process called skolemization, which we
cover in Section 9.5.
Whereas Universal Instantiation can be applied many times to produce many different
consequences, Existential Instantiation can be applied once, and then the existentially quan-
tified sentence can be discarded. For example, we no longer need ∃ x Kill(x, Victim) once
we have added the sentence Kill(Murderer, Victim). Strictly speaking, the new knowledge
base is not logically equivalent to the old, but it can be shown to be inferentially equivalent
in the sense that it is satisfiable exactly when the original knowledge base is satisfiable.
1 Do not confuse these substitutions with the extended interpretations used to define the semantics of quantifiers.
The substitution replaces a variable with a term (a piece of syntax) to produce a new sentence, whereas an
interpretation maps a variable to an object in the domain.
9.1.2 Reduction to propositional inference
Once we have rules for inferring nonquantified sentences from quantified sentences, it be-
comes possible to reduce first-order inference to propositional inference. In this section we
give the main ideas; the details are given in Section 9.5.
The first idea is that, just as an existentially quantified sentence can be replaced by
one instantiation, a universally quantified sentence can be replaced by the set of all possible
instantiations. For example, suppose our knowledge base contains just the sentences
∀ x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(John)
Brother(Richard, John) .                                        (9.1)
Then we apply UI to the first sentence using all possible ground-term substitutions from the
vocabulary of the knowledge base—in this case, {x/John} and {x/Richard}. We obtain
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard) ,
and we discard the universally quantified sentence. Now, the knowledge base is essentially
propositional if we view the ground atomic sentences—King(John), Greedy(John), and
so on—as proposition symbols. Therefore, we can apply any of the complete propositional
algorithms in Chapter 7 to obtain conclusions such as Evil(John).
This technique of propositionalization can be made completely general, as we show
in Section 9.5; that is, every first-order knowledge base and query can be propositionalized
in such a way that entailment is preserved. Thus, we have a complete decision procedure
for entailment . . . or perhaps not. There is a problem: when the knowledge base includes
a function symbol, the set of possible ground-term substitutions is infinite! For example, if
the knowledge base mentions the Father symbol, then infinitely many nested terms such as
Father(Father(Father(John))) can be constructed. Our propositional algorithms will have
difficulty with an infinitely large set of sentences.
Fortunately, there is a famous theorem due to Jacques Herbrand (1930) to the effect
that if a sentence is entailed by the original, first-order knowledge base, then there is a proof
involving just a finite subset of the propositionalized knowledge base. Since any such subset
has a maximum depth of nesting among its ground terms, we can find the subset by first
generating all the instantiations with constant symbols (Richard and John), then all terms of
depth 1 (Father(Richard) and Father(John)), then all terms of depth 2, and so on, until we
are able to construct a propositional proof of the entailed sentence.
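The depth-bounded generation of ground terms can be sketched in a few lines of Python (illustrative only, not the book's code; the sketch assumes only unary function symbols, as with Father):

```python
def ground_terms(constants, functions, depth):
    """All ground terms built from the given constants and unary function
    symbols, nested at most `depth` deep (depth 0 = the constants alone)."""
    terms = list(constants)
    frontier = list(constants)
    for _ in range(depth):
        frontier = [(f, t) for f in functions for t in frontier]
        terms.extend(frontier)
    return terms

print(ground_terms(["Richard", "John"], ["Father"], 2))
# ['Richard', 'John', ('Father', 'Richard'), ('Father', 'John'),
#  ('Father', ('Father', 'Richard')), ('Father', ('Father', 'John'))]
```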
We have sketched an approach to first-order inference via propositionalization that is
complete—that is, any entailed sentence can be proved. This is a major achievement, given
that the space of possible models is infinite. On the other hand, we do not know until the
proof is done that the sentence is entailed! What happens when the sentence is not entailed?
Can we tell? Well, for first-order logic, it turns out that we cannot. Our proof procedure can
go on and on, generating more and more deeply nested terms, but we will not know whether
it is stuck in a hopeless loop or whether the proof is just about to pop out. This is very much
like the halting problem for Turing machines. Alan Turing (1936) and Alonzo Church (1936)
both proved, in rather different ways, the inevitability of this state of affairs. The question of
entailment for first-order logic is semidecidable—that is, algorithms exist that say yes to every
entailed sentence, but no algorithm exists that also says no to every nonentailed sentence.
9.2 UNIFICATION AND LIFTING
The preceding section described the understanding of first-order inference that existed up
to the early 1960s. The sharp-eyed reader (and certainly the computational logicians of the
early 1960s) will have noticed that the propositionalization approach is rather inefficient. For
example, given the query Evil(x) and the knowledge base in Equation (9.1), it seems per-
verse to generate sentences such as King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard).
Indeed, the inference of Evil(John) from the sentences
∀ x King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(John)
seems completely obvious to a human being. We now show how to make it completely
obvious to a computer.
9.2.1 A first-order inference rule
The inference that John is evil—that is, that {x/John} solves the query Evil(x)—works like
this: to use the rule that greedy kings are evil, find some x such that x is a king and x is
greedy, and then infer that this x is evil. More generally, if there is some substitution θ that
makes each of the conjuncts of the premise of the implication identical to sentences already
in the knowledge base, then we can assert the conclusion of the implication, after applying θ.
In this case, the substitution θ = {x/John} achieves that aim.
We can actually make the inference step do even more work. Suppose that instead of
knowing Greedy(John), we know that everyone is greedy:
∀ y Greedy(y) . (9.2)
Then we would still like to be able to conclude that Evil(John), because we know that
John is a king (given) and John is greedy (because everyone is greedy). What we need for
this to work is to find a substitution both for the variables in the implication sentence and
for the variables in the sentences that are in the knowledge base. In this case, applying the
substitution {x/John, y/John} to the implication premises King(x) and Greedy(x) and the
knowledge-base sentences King(John) and Greedy(y) will make them identical. Thus, we
can infer the conclusion of the implication.
This inference process can be captured as a single inference rule that we call Gener-
alized Modus Ponens:2 For atomic sentences pi, pi′, and q, where there is a substitution θ
such that SUBST(θ, pi′) = SUBST(θ, pi), for all i,

    p1′, p2′, . . . , pn′, (p1 ∧ p2 ∧ . . . ∧ pn ⇒ q)
    -------------------------------------------------
    SUBST(θ, q) .

There are n + 1 premises to this rule: the n atomic sentences pi′ and the one implication. The
conclusion is the result of applying the substitution θ to the consequent q. For our example:

    p1′ is King(John)              p1 is King(x)
    p2′ is Greedy(y)               p2 is Greedy(x)
    θ is {x/John, y/John}          q is Evil(x)
    SUBST(θ, q) is Evil(John) .
It is easy to show that Generalized Modus Ponens is a sound inference rule. First, we observe
that, for any sentence p (whose variables are assumed to be universally quantified) and for
any substitution θ,
p |= SUBST(θ, p)
holds by Universal Instantiation. It holds in particular for a θ that satisfies the conditions of
the Generalized Modus Ponens rule. Thus, from p1′, . . . , pn′ we can infer
    SUBST(θ, p1′) ∧ . . . ∧ SUBST(θ, pn′)
and from the implication p1 ∧ . . . ∧ pn ⇒ q we can infer
    SUBST(θ, p1) ∧ . . . ∧ SUBST(θ, pn) ⇒ SUBST(θ, q) .
Now, θ in Generalized Modus Ponens is defined so that SUBST(θ, pi′) = SUBST(θ, pi), for
all i; therefore the first of these two sentences matches the premise of the second exactly.
Hence, SUBST(θ, q) follows by Modus Ponens.
Generalized Modus Ponens is a lifted version of Modus Ponens—it raises Modus Po-
nens from ground (variable-free) propositional logic to first-order logic. We will see in the
rest of this chapter that we can develop lifted versions of the forward chaining, backward
chaining, and resolution algorithms introduced in Chapter 7. The key advantage of lifted
inference rules over propositionalization is that they make only those substitutions that are
required to allow particular inferences to proceed.
9.2.2 Unification
Lifted inference rules require finding substitutions that make different logical expressions
look identical. This process is called unification and is a key component of all first-order
inference algorithms. The UNIFY algorithm takes two sentences and returns a unifier for
them if one exists:
UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q) .
Let us look at some examples of how UNIFY should behave. Suppose we have a query
AskVars(Knows(John, x)): whom does John know? Answers to this query can be found
by finding all sentences in the knowledge base that unify with Knows(John, x). Here are the
results of unification with four different sentences that might be in the knowledge base:
UNIFY(Knows(John, x), Knows(John, Jane)) = {x/Jane}
UNIFY(Knows(John, x), Knows(y, Bill)) = {x/Bill, y/John}
UNIFY(Knows(John, x), Knows(y, Mother(y))) = {y/John, x/Mother(John)}
UNIFY(Knows(John, x), Knows(x, Elizabeth)) = fail .
2 Generalized Modus Ponens is more general than Modus Ponens (page 249) in the sense that the known facts
and the premise of the implication need match only up to a substitution, rather than exactly. On the other hand,
Modus Ponens allows any sentence α as the premise, rather than just a conjunction of atomic sentences.
The last unification fails because x cannot take on the values John and Elizabeth at the
same time. Now, remember that Knows(x, Elizabeth) means “Everyone knows Elizabeth,”
so we should be able to infer that John knows Elizabeth. The problem arises only because
the two sentences happen to use the same variable name, x. The problem can be avoided
by standardizing apart one of the two sentences being unified, which means renaming its
variables to avoid name clashes. For example, we can rename x in Knows(x, Elizabeth) to
x17 (a new variable name) without changing its meaning. Now the unification will work:
UNIFY(Knows(John, x), Knows(x17, Elizabeth)) = {x/Elizabeth, x17/John} .
Exercise 9.13 delves further into the need for standardizing apart.
There is one more complication: we said that UNIFY should return a substitution
that makes the two arguments look the same. But there could be more than one such uni-
fier. For example, UNIFY(Knows(John, x), Knows(y, z)) could return {y/John, x/z} or
{y/John, x/John, z/John}. The first unifier gives Knows(John, z) as the result of unifi-
cation, whereas the second gives Knows(John, John). The second result could be obtained
from the first by an additional substitution {z/John}; we say that the first unifier is more
general than the second, because it places fewer restrictions on the values of the variables. It
turns out that, for every unifiable pair of expressions, there is a single most general unifier (or
MGU) that is unique up to renaming and substitution of variables. (For example, {x/John}
and {y/John} are considered equivalent, as are {x/John, y/John} and {x/John, y/x}.) In
this case it is {y/John, x/z}.
An algorithm for computing most general unifiers is shown in Figure 9.1. The process
is simple: recursively explore the two expressions simultaneously “side by side,” building up
a unifier along the way, but failing if two corresponding points in the structures do not match.
There is one expensive step: when matching a variable against a complex term, one must
check whether the variable itself occurs inside the term; if it does, the match fails because no
consistent unifier can be constructed. For example, S(x) can’t unify with S(S(x)). This so-
called occur check makes the complexity of the entire algorithm quadratic in the size of the
expressions being unified. Some systems, including all logic programming systems, simply
omit the occur check and sometimes make unsound inferences as a result; other systems use
more complex algorithms with linear-time complexity.
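The algorithm of Figure 9.1 translates almost line for line into executable code. The following Python sketch is one possible rendering (ours, not the book's implementation); by our own convention, a variable is any string beginning with '?', a constant is any other string, and a compound expression such as Knows(John, x) is the tuple ('Knows', 'John', '?x').

def is_variable(x):
    # Convention for this sketch: a variable is any string beginning with '?'.
    return isinstance(x, str) and x.startswith('?')

def unify(x, y):
    """Return a most general unifier of x and y as a dict, or None if they do not unify."""
    return _unify(x, y, {})

def _unify(x, y, theta):
    if theta is None:                      # failure propagated from an earlier mismatch
        return None
    elif x == y:
        return theta
    elif is_variable(x):
        return _unify_var(x, y, theta)
    elif is_variable(y):
        return _unify_var(y, x, theta)
    elif isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        # Compare operator and arguments element by element, threading theta through.
        return _unify(x[1:], y[1:], _unify(x[0], y[0], theta))
    else:
        return None

def _unify_var(var, x, theta):
    if var in theta:
        return _unify(theta[var], x, theta)
    elif is_variable(x) and x in theta:
        return _unify(var, theta[x], theta)
    elif _occurs(var, x, theta):           # the occur check: S(x) cannot unify with S(S(x))
        return None
    else:
        return {**theta, var: x}

def _occurs(var, x, theta):
    if var == x:
        return True
    if is_variable(x) and x in theta:
        return _occurs(var, theta[x], theta)
    return isinstance(x, tuple) and any(_occurs(var, xi, theta) for xi in x)

print(unify(('Knows', 'John', '?x'), ('Knows', 'John', 'Jane')))      # {'?x': 'Jane'}
print(unify(('Knows', 'John', '?x'), ('Knows', '?y', 'Bill')))        # {'?y': 'John', '?x': 'Bill'}
print(unify(('Knows', 'John', '?x'), ('Knows', '?x', 'Elizabeth')))   # None: standardize apart first

Note that the returned bindings may chain through intermediate variables (for instance '?x' bound to '?z' and '?z' bound to 'John'), so reading out an answer generally means applying the whole substitution rather than doing a single dictionary lookup.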
9.2.3 Storage and retrieval
Underlying the TELL and ASK functions used to inform and interrogate a knowledge base
are the more primitive STORE and FETCH functions. STORE(s) stores a sentence s into the
knowledge base and FETCH(q) returns all unifiers such that the query q unifies with some
function UNIFY(x,y,θ) returns a substitution to make x and y identical
inputs: x, a variable, constant, list, or compound expression
y, a variable, constant, list, or compound expression
θ, the substitution built up so far (optional, defaults to empty)
if θ = failure then return failure
else if x = y then return θ
else if VARIABLE?(x) then return UNIFY-VAR(x,y,θ)
else if VARIABLE?(y) then return UNIFY-VAR(y,x,θ)
else if COMPOUND?(x) and COMPOUND?(y) then
return UNIFY(x.ARGS,y.ARGS, UNIFY(x.OP,y.OP,θ))
else if LIST?(x) and LIST?(y) then
return UNIFY(x.REST,y.REST, UNIFY(x.FIRST,y.FIRST,θ))
else return failure
function UNIFY-VAR(var,x,θ) returns a substitution
if {var/val} ∈ θ then return UNIFY(val,x,θ)
else if {x/val} ∈ θ then return UNIFY(var,val,θ)
else if OCCUR-CHECK?(var,x) then return failure
else return add {var/x} to θ
Figure 9.1 The unification algorithm. The algorithm works by comparing the structures
of the inputs, element by element. The substitution θ that is the argument to UNIFY is built
up along the way and is used to make sure that later comparisons are consistent with bindings
that were established earlier. In a compound expression such as F(A, B), the OP field picks
out the function symbol F and the ARGS field picks out the argument list (A, B).
sentence in the knowledge base. The problem we used to illustrate unification—finding all
facts that unify with Knows(John, x)—is an instance of FETCHing.
The simplest way to implement STORE and FETCH is to keep all the facts in one long
list and unify each query against every element of the list. Such a process is inefficient, but
it works, and it’s all you need to understand the rest of the chapter. The remainder of this
section outlines ways to make retrieval more efficient; it can be skipped on first reading.
We can make FETCH more efficient by ensuring that unifications are attempted only
with sentences that have some chance of unifying. For example, there is no point in trying
to unify Knows(John, x) with Brother(Richard, John). We can avoid such unifications by
indexing the facts in the knowledge base. A simple scheme called predicate indexing puts
all the Knows facts in one bucket and all the Brother facts in another. The buckets can be
stored in a hash table for efficient access.
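Concretely, predicate indexing is nothing more than a dictionary from predicate symbols to buckets of stored facts. The toy sketch below is our own illustration (not the book's code); it reuses the tuple representation and the unify helper from the sketch in Section 9.2.2.

from collections import defaultdict

class IndexedKB:
    """A toy fact store with predicate indexing: one bucket per predicate symbol."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def store(self, fact):
        self.buckets[fact[0]].append(fact)          # fact[0] is the predicate symbol

    def fetch(self, query):
        # Only facts in the query's own bucket are candidates for unification.
        for fact in self.buckets[query[0]]:
            theta = unify(query, fact)              # unify from the earlier sketch
            if theta is not None:
                yield theta

kb = IndexedKB()
kb.store(('Knows', 'John', 'Jane'))
kb.store(('Brother', 'Richard', 'John'))
print(list(kb.fetch(('Knows', 'John', '?x'))))      # [{'?x': 'Jane'}]; the Brother bucket is never examined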
Predicate indexing is useful when there are many predicate symbols but only a few
clauses for each symbol. Sometimes, however, a predicate has many clauses. For example,
suppose that the tax authorities want to keep track of who employs whom, using a predi-
cate Employs(x, y). This would be a very large bucket with perhaps millions of employers
Figure 9.2 (a) The subsumption lattice whose lowest node is Employs(IBM , Richard); its
other nodes are Employs(x, Richard), Employs(IBM , y), and Employs(x, y). (b) The
subsumption lattice for the sentence Employs(John, John), whose nodes are
Employs(John, John), Employs(x, John), Employs(John, y), Employs(x, x), and Employs(x, y).
and tens of millions of employees. Answering a query such as Employs(x, Richard) with
predicate indexing would require scanning the entire bucket.
For this particular query, it would help if facts were indexed both by predicate and by
second argument, perhaps using a combined hash table key. Then we could simply construct
the key from the query and retrieve exactly those facts that unify with the query. For other
queries, such as Employs(IBM , y), we would need to have indexed the facts by combining
the predicate with the first argument. Therefore, facts can be stored under multiple index
keys, rendering them instantly accessible to various queries that they might unify with.
Given a sentence to be stored, it is possible to construct indices for all possible queries
that unify with it. For the fact Employs(IBM , Richard), the queries are
Employs(IBM , Richard) Does IBM employ Richard?
Employs(x, Richard) Who employs Richard?
Employs(IBM , y) Whom does IBM employ?
Employs(x, y) Who employs whom?
These queries form a subsumption lattice, as shown in Figure 9.2(a). The lattice has some
interesting properties. For example, the child of any node in the lattice is obtained from its
parent by a single substitution; and the “highest” common descendant of any two nodes is
the result of applying their most general unifier. The portion of the lattice above any ground
fact can be constructed systematically (Exercise 9.5). A sentence with repeated constants has
a slightly different lattice, as shown in Figure 9.2(b). Function symbols and variables in the
sentences to be stored introduce still more interesting lattice structures.
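To see where the lattice's index keys come from, here is a small sketch (ours, not the book's) that generates every query pattern obtained from a ground fact by replacing some subset of its argument positions with variables. For Employs(IBM, Richard) it produces, up to variable names, exactly the four queries listed above.

from itertools import combinations

def index_keys(fact):
    """Generate all 2**n query patterns for a ground fact with n arguments."""
    pred, args = fact[0], fact[1:]
    n = len(args)
    for k in range(n + 1):
        for positions in combinations(range(n), k):
            pattern = tuple('?v%d' % i if i in positions else args[i] for i in range(n))
            yield (pred,) + pattern

for key in index_keys(('Employs', 'IBM', 'Richard')):
    print(key)
# ('Employs', 'IBM', 'Richard')
# ('Employs', '?v0', 'Richard')
# ('Employs', 'IBM', '?v1')
# ('Employs', '?v0', '?v1')

A sentence with repeated constants, such as Employs(John, John), would also call for keys in which the repeated occurrences are replaced by the same variable, which is why its lattice in Figure 9.2(b) contains the additional node Employs(x, x).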
The scheme we have described works very well whenever the lattice contains a small
number of nodes. For a predicate with n arguments, however, the lattice contains O(2^n)
nodes. If function symbols are allowed, the number of nodes is also exponential in the size
of the terms in the sentence to be stored. This can lead to a huge number of indices. At some
point, the benefits of indexing are outweighed by the costs of storing and maintaining all
the indices. We can respond by adopting a fixed policy, such as maintaining indices only on
keys composed of a predicate plus each argument, or by using an adaptive policy that creates
indices to meet the demands of the kinds of queries being asked. For most AI systems, the
number of facts to be stored is small enough that efficient indexing is considered a solved
problem. For commercial databases, where facts number in the billions, the problem has
been the subject of intensive study and technology development.
9.3 FORWARD CHAINING
A forward-chaining algorithm for propositional definite clauses was given in Section 7.5.
The idea is simple: start with the atomic sentences in the knowledge base and apply Modus
Ponens in the forward direction, adding new atomic sentences, until no further inferences
can be made. Here, we explain how the algorithm is applied to first-order definite clauses.
Definite clauses such as Situation ⇒ Response are especially useful for systems that make
inferences in response to newly arrived information. Many systems can be defined this way,
and forward chaining can be implemented very efficiently.
9.3.1 First-order definite clauses
First-order definite clauses closely resemble propositional definite clauses (page 256): they
are disjunctions of literals of which exactly one is positive. A definite clause either is atomic
or is an implication whose antecedent is a conjunction of positive literals and whose conse-
quent is a single positive literal. The following are first-order definite clauses:
King(x) ∧ Greedy(x) ⇒ Evil(x) .
King(John) .
Greedy(y) .
Unlike propositional literals, first-order literals can include variables, in which case those
variables are assumed to be universally quantified. (Typically, we omit universal quantifiers
when writing definite clauses.) Not every knowledge base can be converted into a set of
definite clauses because of the single-positive-literal restriction, but many can. Consider the
following problem:
The law says that it is a crime for an American to sell weapons to hostile nations. The
country Nono, an enemy of America, has some missiles, and all of its missiles were sold
to it by Colonel West, who is American.
We will prove that West is a criminal. First, we will represent these facts as first-order definite
clauses. The next section shows how the forward-chaining algorithm solves the problem.
“. . . it is a crime for an American to sell weapons to hostile nations”:
American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x) . (9.3)
“Nono . . . has some missiles.” The sentence ∃ x Owns(Nono, x)∧Missile(x) is transformed
into two definite clauses by Existential Instantiation, introducing a new constant M1:
Owns(Nono, M1) (9.4)
Missile(M1) . (9.5)
“All of its missiles were sold to it by Colonel West”:
Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono) . (9.6)
We will also need to know that missiles are weapons:
Missile(x) ⇒ Weapon(x) (9.7)
and we must know that an enemy of America counts as “hostile”:
Enemy(x, America) ⇒ Hostile(x) . (9.8)
“West, who is American . . .”:
American(West) . (9.9)
“The country Nono, an enemy of America . . .”:
Enemy(Nono, America) . (9.10)
This knowledge base contains no function symbols and is therefore an instance of the class
of Datalog knowledge bases. Datalog is a language that is restricted to first-order definite
clauses with no function symbols. Datalog gets its name because it can represent the type of
statements typically made in relational databases. We will see that the absence of function
symbols makes inference much easier.
9.3.2 A simple forward-chaining algorithm
The first forward-chaining algorithm we consider is a simple one, shown in Figure 9.3. Start-
ing from the known facts, it triggers all the rules whose premises are satisfied, adding their
conclusions to the known facts. The process repeats until the query is answered (assuming
that just one answer is required) or no new facts are added. Notice that a fact is not “new”
if it is just a renaming of a known fact. One sentence is a renaming of another if they
are identical except for the names of the variables. For example, Likes(x, IceCream) and
Likes(y, IceCream) are renamings of each other because they differ only in the choice of x
or y; their meanings are identical: everyone likes ice cream.
We use our crime problem to illustrate how FOL-FC-ASK works. The implication
sentences are (9.3), (9.6), (9.7), and (9.8). Two iterations are required:
• On the first iteration, rule (9.3) has unsatisfied premises.
Rule (9.6) is satisfied with {x/M1}, and Sells(West, M1, Nono) is added.
Rule (9.7) is satisfied with {x/M1}, and Weapon(M1) is added.
Rule (9.8) is satisfied with {x/Nono}, and Hostile(Nono) is added.
• On the second iteration, rule (9.3) is satisfied with {x/West, y/M1, z/Nono}, and
Criminal(West) is added.
Figure 9.4 shows the proof tree that is generated. Notice that no new inferences are possible
at this point because every sentence that could be concluded by forward chaining is already
contained explicitly in the KB. Such a knowledge base is called a fixed point of the inference
process. Fixed points reached by forward chaining with first-order definite clauses are similar
to those for propositional forward chaining (page 258); the principal difference is that a first-
order fixed point can include universally quantified atomic sentences.
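The two-iteration behavior just described is easy to reproduce in ordinary code. The following minimal Python sketch of the Figure 9.3 algorithm is ours, not the book's implementation; it assumes all facts are ground atoms, reuses is_variable and unify from the sketch in Section 9.2.2, and omits STANDARDIZE-VARIABLES, which is a safe simplification here because each rule is matched independently against ground facts.

def subst(theta, x):
    """Apply substitution theta to a term or atom in the tuple representation."""
    if is_variable(x):
        return subst(theta, theta[x]) if x in theta else x
    if isinstance(x, tuple):
        return tuple(subst(theta, xi) for xi in x)
    return x

def match_premises(premises, facts, theta):
    """Yield every substitution extending theta under which all premises are known facts."""
    if not premises:
        yield theta
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        delta = unify(subst(theta, first), fact)
        if delta is not None:
            yield from match_premises(rest, facts, {**theta, **delta})

def fol_fc_ask(rules, facts, query):
    """Naive forward chaining: rules are (premises, conclusion) pairs, facts are ground atoms."""
    facts = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            for theta in match_premises(premises, facts, {}):
                q = subst(theta, conclusion)
                if q not in facts and q not in new:
                    new.add(q)
                    answer = unify(query, q)
                    if answer is not None:
                        return answer
        if not new:
            return False                     # like the pseudocode, report failure with false
        facts |= new

crime_rules = [
    ([('American', '?x'), ('Weapon', '?y'), ('Sells', '?x', '?y', '?z'), ('Hostile', '?z')],
     ('Criminal', '?x')),                                                           # (9.3)
    ([('Missile', '?x'), ('Owns', 'Nono', '?x')], ('Sells', 'West', '?x', 'Nono')), # (9.6)
    ([('Missile', '?x')], ('Weapon', '?x')),                                        # (9.7)
    ([('Enemy', '?x', 'America')], ('Hostile', '?x')),                              # (9.8)
]
crime_facts = [('Owns', 'Nono', 'M1'), ('Missile', 'M1'),
               ('American', 'West'), ('Enemy', 'Nono', 'America')]
print(fol_fc_ask(crime_rules, crime_facts, ('Criminal', '?x')))                     # {'?x': 'West'}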
FOL-FC-ASK is easy to analyze. First, it is sound, because every inference is just an
application of Generalized Modus Ponens, which is sound. Second, it is complete for definite
clause knowledge bases; that is, it answers every query whose answers are entailed by any
knowledge base of definite clauses. For Datalog knowledge bases, which contain no function
symbols, the proof of completeness is fairly easy. We begin by counting the number of
function FOL-FC-ASK(KB,α) returns a substitution or false
inputs: KB, the knowledge base, a set of first-order definite clauses
α, the query, an atomic sentence
local variables: new, the new sentences inferred on each iteration
repeat until new is empty
new ← { }
for each rule in KB do
(p1 ∧ . . . ∧ pn ⇒ q) ← STANDARDIZE-VARIABLES(rule)
for each θ such that SUBST(θ, p1 ∧ . . . ∧ pn) = SUBST(θ, p1′ ∧ . . . ∧ pn′)
for some p1′, . . . , pn′ in KB
q′ ← SUBST(θ, q)
if q′ does not unify with some sentence already in KB or new then
add q′ to new
φ ← UNIFY(q′, α)
if φ is not fail then return φ
add new to KB
return false
Figure 9.3 A conceptually straightforward, but very inefficient, forward-chaining algo-
rithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one
step from the implication sentences and the atomic sentences already in KB. The function
STANDARDIZE-VARIABLES replaces all variables in its arguments with new ones that have
not been used before.
[Proof tree: Criminal(West) at the top; Weapon(M1), Sells(West, M1, Nono), and Hostile(Nono)
in the middle; American(West), Missile(M1), Owns(Nono, M1), and Enemy(Nono, America) at
the bottom.]
Figure 9.4 The proof tree generated by forward chaining on the crime example. The initial
facts appear at the bottom level, facts inferred on the first iteration in the middle level, and
facts inferred on the second iteration at the top level.
possible facts that can be added, which determines the maximum number of iterations. Let k
be the maximum arity (number of arguments) of any predicate, p be the number of predicates,
and n be the number of constant symbols. Clearly, there can be no more than pn^k distinct
ground facts, so after this many iterations the algorithm must have reached a fixed point. Then
we can make an argument very similar to the proof of completeness for propositional forward
chaining. (See page 258.) The details of how to make the transition from propositional to
first-order completeness are given for the resolution algorithm in Section 9.5.
For general definite clauses with function symbols, FOL-FC-ASK can generate in-
finitely many new facts, so we need to be more careful. For the case in which an answer to
the query sentence q is entailed, we must appeal to Herbrand’s theorem to establish that the
algorithm will find a proof. (See Section 9.5 for the resolution case.) If the query has no
answer, the algorithm could fail to terminate in some cases. For example, if the knowledge
base includes the Peano axioms
NatNum(0)
∀ n NatNum(n) ⇒ NatNum(S(n)) ,
then forward chaining adds NatNum(S(0)), NatNum(S(S(0))), NatNum(S(S(S(0)))),
and so on. This problem is unavoidable in general. As with general first-order logic, entail-
ment with definite clauses is semidecidable.
9.3.3 Efficient forward chaining
The forward-chaining algorithm in Figure 9.3 is designed for ease of understanding rather
than for efficiency of operation. There are three possible sources of inefficiency. First, the
“inner loop” of the algorithm involves finding all possible unifiers such that the premise of
a rule unifies with a suitable set of facts in the knowledge base. This is often called pattern
matching and can be very expensive. Second, the algorithm rechecks every rule on every
iteration to see whether its premises are satisfied, even if very few additions are made to the
knowledge base on each iteration. Finally, the algorithm might generate many facts that are
irrelevant to the goal. We address each of these issues in turn.
Matching rules against known facts
The problem of matching the premise of a rule against the facts in the knowledge base might
seem simple enough. For example, suppose we want to apply the rule
Missile(x) ⇒ Weapon(x) .
Then we need to find all the facts that unify with Missile(x); in a suitably indexed knowledge
base, this can be done in constant time per fact. Now consider a rule such as
Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono) .
Again, we can find all the objects owned by Nono in constant time per object; then, for each
object, we could check whether it is a missile. If the knowledge base contains many objects
owned by Nono and very few missiles, however, it would be better to find all the missiles first
and then check whether they are owned by Nono. This is the conjunct ordering problem:
find an ordering to solve the conjuncts of the rule premise so that the total cost is minimized.
It turns out that finding the optimal ordering is NP-hard, but good heuristics are available.
For example, the minimum-remaining-values (MRV) heuristic used for CSPs in Chapter 6
would suggest ordering the conjuncts to look for missiles first if fewer missiles than objects
are owned by Nono.
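A crude version of that heuristic simply counts, for each conjunct, how many current facts it unifies with, and tries the most constrained conjunct first. The sketch below is our own illustration, reusing unify from Section 9.2.2, with a couple of invented Owns facts for the sake of the example.

def order_conjuncts(premises, facts):
    """Sort rule premises so the conjunct with the fewest matching facts is tried first."""
    def candidates(p):
        return sum(1 for f in facts if unify(p, f) is not None)
    return sorted(premises, key=candidates)

facts = [('Missile', 'M1'), ('Owns', 'Nono', 'M1'),
         ('Owns', 'Nono', 'B747'), ('Owns', 'Nono', 'Palace')]
print(order_conjuncts([('Owns', 'Nono', '?x'), ('Missile', '?x')], facts))
# [('Missile', '?x'), ('Owns', 'Nono', '?x')]  -- look for missiles first

A fuller implementation would re-estimate the counts after each conjunct is bound, since earlier bindings shrink the candidate sets of the later conjuncts.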
[Panel (a): the constraint graph for the Australia map, with nodes WA, NT, SA, Q, NSW, V, and T.]
Diff (wa, nt) ∧ Diff (wa, sa) ∧ Diff (nt, q) ∧ Diff (nt, sa) ∧ Diff (q, nsw) ∧ Diff (q, sa) ∧
Diff (nsw, v) ∧ Diff (nsw, sa) ∧ Diff (v, sa) ⇒ Colorable()
Diff (Red, Blue)    Diff (Red, Green)
Diff (Green, Red)   Diff (Green, Blue)
Diff (Blue, Red)    Diff (Blue, Green)
Figure 9.5 (a) Constraint graph for coloring the map of Australia. (b) The map-coloring
CSP expressed as a single definite clause. Each map region is represented as a variable whose
value can be one of the constants Red, Green or Blue.
The connection between pattern matching and constraint satisfaction is actually very
close. We can view each conjunct as a constraint on the variables that it contains—for ex-
ample, Missile(x) is a unary constraint on x. Extending this idea, we can express every
finite-domain CSP as a single definite clause together with some associated ground facts.
Consider the map-coloring problem from Figure 6.1, shown again in Figure 9.5(a). An equiv-
alent formulation as a single definite clause is given in Figure 9.5(b). Clearly, the conclusion
Colorable() can be inferred only if the CSP has a solution. Because CSPs in general include
3-SAT problems as special cases, we can conclude that matching a definite clause against a
set of facts is NP-hard.
It might seem rather depressing that forward chaining has an NP-hard matching problem
in its inner loop. There are three ways to cheer ourselves up:
• We can remind ourselves that most rules in real-world knowledge bases are small and
simple (like the rules in our crime example) rather than large and complex (like the
CSP formulation in Figure 9.5). It is common in the database world to assume that
both the sizes of rules and the arities of predicates are bounded by a constant and to
worry only about data complexity—that is, the complexity of inference as a function
of the number of ground facts in the knowledge base. It is easy to show that the data
complexity of forward chaining is polynomial.
• We can consider subclasses of rules for which matching is efficient. Essentially every
Datalog clause can be viewed as defining a CSP, so matching will be tractable just
when the corresponding CSP is tractable. Chapter 6 describes several tractable families
of CSPs. For example, if the constraint graph (the graph whose nodes are variables
and whose links are constraints) forms a tree, then the CSP can be solved in linear
time. Exactly the same result holds for rule matching. For instance, if we remove South
Australia from the map in Figure 9.5, the resulting clause is
Diff (wa, nt) ∧ Diff (nt, q) ∧ Diff (q, nsw) ∧ Diff (nsw, v) ⇒ Colorable()
which corresponds to the reduced CSP shown in Figure 6.12 on page 224. Algorithms
for solving tree-structured CSPs can be applied directly to the problem of rule matching.
• We can try to eliminate redundant rule-matching attempts in the forward-chaining
algorithm, as described next.
Incremental forward chaining
When we showed how forward chaining works on the crime example, we cheated; in partic-
ular, we omitted some of the rule matching done by the algorithm shown in Figure 9.3. For
example, on the second iteration, the rule
Missile(x) ⇒ Weapon(x)
matches against Missile(M1) (again), and of course the conclusion Weapon(M1) is already
known so nothing happens. Such redundant rule matching can be avoided if we make the
following observation: Every new fact inferred on iteration t must be derived from at least
one new fact inferred on iteration t − 1. This is true because any inference that does not
require a new fact from iteration t − 1 could have been done at iteration t − 1 already.
This observation leads naturally to an incremental forward-chaining algorithm where,
at iteration t, we check a rule only if its premise includes a conjunct pi that unifies with a fact
pi′ newly inferred at iteration t − 1. The rule-matching step then fixes pi to match with pi′, but
allows the other conjuncts of the rule to match with facts from any previous iteration. This
algorithm generates exactly the same facts at each iteration as the algorithm in Figure 9.3, but
is much more efficient.
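In code, the change from Figure 9.3 is small: keep the set of facts inferred on the previous iteration (the delta) and require the triggering conjunct to unify with a member of it. The sketch below is ours and reuses unify, subst, and match_premises from the earlier forward-chaining sketch.

def fol_fc_ask_incremental(rules, facts, query):
    """Incremental forward chaining: a rule fires only via a premise that matches a new fact."""
    facts = set(facts)
    delta = set(facts)                       # on the first iteration every known fact is "new"
    while delta:
        new = set()
        for premises, conclusion in rules:
            for i, p_i in enumerate(premises):
                for f in delta:              # the chosen conjunct must match a newly inferred fact
                    theta0 = unify(p_i, f)
                    if theta0 is None:
                        continue
                    rest = premises[:i] + premises[i + 1:]
                    for theta in match_premises(rest, facts, theta0):
                        q = subst(theta, conclusion)
                        if q not in facts and q not in new:
                            new.add(q)
                            answer = unify(query, q)
                            if answer is not None:
                                return answer
        facts |= new
        delta = new
    return False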
With suitable indexing, it is easy to identify all the rules that can be triggered by any
given fact, and indeed many real systems operate in an “update” mode wherein forward chain-
ing occurs in response to each new fact that is TELLed to the system. Inferences cascade
through the set of rules until the fixed point is reached, and then the process begins again for
the next new fact.
Typically, only a small fraction of the rules in the knowledge base are actually triggered
by the addition of a given fact. This means that a great deal of redundant work is done in
repeatedly constructing partial matches that have some unsatisfied premises. Our crime ex-
ample is rather too small to show this effectively, but notice that a partial match is constructed
on the first iteration between the rule
American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)
and the fact American(West). This partial match is then discarded and rebuilt on the second
iteration (when the rule succeeds). It would be better to retain and gradually complete the
partial matches as new facts arrive, rather than discarding them.
The rete algorithm3 was the first to address this problem. The algorithm preprocesses
the set of rules in the knowledge base to construct a sort of dataflow network in which each
3 Rete is Latin for net. The English pronunciation rhymes with treaty.
node is a literal from a rule premise. Variable bindings flow through the network and are
filtered out when they fail to match a literal. If two literals in a rule share a variable—for
example, Sells(x, y, z) ∧ Hostile(z) in the crime example—then the bindings from each
literal are filtered through an equality node. A variable binding reaching a node for an n-
ary literal such as Sells(x, y, z) might have to wait for bindings for the other variables to be
established before the process can continue. At any given point, the state of a rete network
captures all the partial matches of the rules, avoiding a great deal of recomputation.
Rete networks, and various improvements thereon, have been a key component of so-
called production systems, which were among the earliest forward-chaining systems in
widespread use.4 The XCON system (originally called R1; McDermott, 1982) was built
with a production-system architecture. XCON contained several thousand rules for designing
configurations of computer components for customers of the Digital Equipment Corporation.
It was one of the first clear commercial successes in the emerging field of expert systems.
Many other similar systems have been built with the same underlying technology, which has
been implemented in the general-purpose language OPS-5.
Production systems are also popular in cognitive architectures—that is, models of human
reasoning—such as ACT (Anderson, 1983) and SOAR (Laird et al., 1987). In such sys-
tems, the “working memory” of the system models human short-term memory, and the pro-
ductions are part of long-term memory. On each cycle of operation, productions are matched
against the working memory of facts. A production whose conditions are satisfied can add or
delete facts in working memory. In contrast to the typical situation in databases, production
systems often have many rules and relatively few facts. With suitably optimized matching
technology, some modern systems can operate in real time with tens of millions of rules.
Irrelevant facts
The final source of inefficiency in forward chaining appears to be intrinsic to the approach
and also arises in the propositional context. Forward chaining makes all allowable inferences
based on the known facts, even if they are irrelevant to the goal at hand. In our crime example,
there were no rules capable of drawing irrelevant conclusions, so the lack of directedness was
not a problem. In other cases (e.g., if many rules describe the eating habits of Americans and
the prices of missiles), FOL-FC-ASK will generate many irrelevant conclusions.
One way to avoid drawing irrelevant conclusions is to use backward chaining, as de-
scribed in Section 9.4. Another solution is to restrict forward chaining to a selected subset of
rules, as in PL-FC-ENTAILS? (page 258). A third approach has emerged in the field of de-
ductive databases, which are large-scale databases, like relational databases, but which use
forward chaining as the standard inference tool rather than SQL queries. The idea is to rewrite
the rule set, using information from the goal, so that only relevant variable bindings—those
belonging to a so-called magic set—are considered during forward inference. For example, if
the goal is Criminal(West), the rule that concludes Criminal(x) will be rewritten to include
an extra conjunct that constrains the value of x:
Magic(x) ∧ American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x) .
4 The word production in production systems denotes a condition–action rule.
The fact Magic(West) is also added to the KB. In this way, even if the knowledge base
contains facts about millions of Americans, only Colonel West will be considered during the
forward inference process. The complete process for defining magic sets and rewriting the
knowledge base is too complex to go into here, but the basic idea is to perform a sort of
“generic” backward inference from the goal in order to work out which variable bindings
need to be constrained. The magic sets approach can therefore be thought of as a kind of
hybrid between forward inference and backward preprocessing.
9.4 BACKWARD CHAINING
The second major family of logical inference algorithms uses the backward chaining ap-
proach introduced in Section 7.5 for definite clauses. These algorithms work backward from
the goal, chaining through rules to find known facts that support the proof. We describe
the basic algorithm, and then we describe how it is used in logic programming, which is the
most widely used form of automated reasoning. We also see that backward chaining has some
disadvantages compared with forward chaining, and we look at ways to overcome them. Fi-
nally, we look at the close connection between logic programming and constraint satisfaction
problems.
9.4.1 A backward-chaining algorithm
Figure 9.6 shows a backward-chaining algorithm for definite clauses. FOL-BC-ASK(KB,
goal) will be proved if the knowledge base contains a clause of the form lhs ⇒ goal, where
lhs (left-hand side) is a list of conjuncts. An atomic fact like American(West) is considered
as a clause whose lhs is the empty list. Now a query that contains variables might be proved
in multiple ways. For example, the query Person(x) could be proved with the substitution
{x/John} as well as with {x/Richard}. So we implement FOL-BC-ASK as a generator—
a function that returns multiple times, each time giving one possible result.
Backward chaining is a kind of AND/OR search—the OR part because the goal query
can be proved by any rule in the knowledge base, and the AND part because all the conjuncts
in the lhs of a clause must be proved. FOL-BC-OR works by fetching all clauses that might
unify with the goal, standardizing the variables in the clause to be brand-new variables, and
then, if the rhs of the clause does indeed unify with the goal, proving every conjunct in the
lhs, using FOL-BC-AND. That function in turn works by proving each of the conjuncts in
turn, keeping track of the accumulated substitution as we go. Figure 9.7 is the proof tree for
deriving Criminal(West) from sentences (9.3) through (9.10).
Backward chaining, as we have written it, is clearly a depth-first search algorithm.
This means that its space requirements are linear in the size of the proof (neglecting, for
now, the space required to accumulate the solutions). It also means that backward chaining
(unlike forward chaining) suffers from problems with repeated states and incompleteness. We
will discuss these problems and some potential solutions, but first we show how backward
chaining is used in logic programming systems.
function FOL-BC-ASK(KB,query) returns a generator of substitutions
return FOL-BC-OR(KB,query,{ })
generator FOL-BC-OR(KB,goal,θ) yields a substitution
for each rule (lhs ⇒ rhs) in FETCH-RULES-FOR-GOAL(KB, goal) do
(lhs, rhs) ← STANDARDIZE-VARIABLES((lhs, rhs))
for each θ′ in FOL-BC-AND(KB, lhs, UNIFY(rhs, goal, θ)) do
yield θ′
generator FOL-BC-AND(KB,goals,θ) yields a substitution
if θ = failure then return
else if LENGTH(goals) = 0 then yield θ
else do
first,rest ← FIRST(goals), REST(goals)
for each θ′ in FOL-BC-OR(KB, SUBST(θ, first), θ) do
for each θ′′ in FOL-BC-AND(KB, rest, θ′) do
yield θ′′
Figure 9.6 A simple backward-chaining algorithm for first-order knowledge bases.
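Python's generators map directly onto the yield-based pseudocode in Figure 9.6. The sketch below is ours, not the book's code; it stores each definite clause as a (lhs, rhs) pair, with an empty lhs for atomic facts, and reuses is_variable, unify, and subst from the earlier sketches.

import itertools

_fresh = itertools.count()

def standardize(clause):
    """Rename every variable in a clause to a brand-new variable."""
    mapping = {}
    def rename(x):
        if is_variable(x):
            if x not in mapping:
                mapping[x] = '?v%d' % next(_fresh)
            return mapping[x]
        if isinstance(x, tuple):
            return tuple(rename(xi) for xi in x)
        return x
    lhs, rhs = clause
    return [rename(p) for p in lhs], rename(rhs)

def fol_bc_ask(kb, query):
    yield from fol_bc_or(kb, query, {})

def fol_bc_or(kb, goal, theta):
    for clause in kb:
        lhs, rhs = standardize(clause)
        delta = unify(rhs, subst(theta, goal))
        if delta is not None:
            yield from fol_bc_and(kb, lhs, {**theta, **delta})

def fol_bc_and(kb, goals, theta):
    if not goals:
        yield theta
    else:
        first, rest = goals[0], goals[1:]
        for theta1 in fol_bc_or(kb, subst(theta, first), theta):
            yield from fol_bc_and(kb, rest, theta1)

# With the crime KB of Section 9.3 encoded as (lhs, rhs) pairs (facts get an empty lhs):
#   theta = next(fol_bc_ask(crime_kb, ('Criminal', '?x')))
#   subst(theta, '?x')   ->   'West'

Because clauses are tried in the order they appear and the generators are consumed lazily, the search is depth first and left to right, so this sketch inherits the Prolog-style incompleteness discussed in Section 9.4.4.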
[Proof tree: Criminal(West) at the root, with subgoals American(West), Weapon(y),
Sells(West, M1, z), and Hostile(z) below it; Weapon(y) is proved via Missile(y) with {y/M1},
Sells(West, M1, z) via Missile(M1) and Owns(Nono, M1) with {z/Nono}, and Hostile(Nono)
via Enemy(Nono, America).]
Figure 9.7 Proof tree constructed by backward chaining to prove that West is a criminal.
The tree should be read depth first, left to right. To prove Criminal(West), we have to prove
the four conjuncts below it. Some of these are in the knowledge base, and others require
further backward chaining. Bindings for each successful unification are shown next to the
corresponding subgoal. Note that once one subgoal in a conjunction succeeds, its substitution
is applied to subsequent subgoals. Thus, by the time FOL-BC-ASK gets to the last conjunct,
originally Hostile(z), z is already bound to Nono.
9.4.2 Logic programming
Logic programming is a technology that comes fairly close to embodying the declarative
ideal described in Chapter 7: that systems should be constructed by expressing knowledge in
a formal language and that problems should be solved by running inference processes on that
knowledge. The ideal is summed up in Robert Kowalski’s equation,
Algorithm = Logic + Control .
Prolog is the most widely used logic programming language. It is used primarily as a rapid-
prototyping language and for symbol-manipulation tasks such as writing compilers (Van Roy,
1990) and parsing natural language (Pereira and Warren, 1980). Many expert systems have
been written in Prolog for legal, medical, financial, and other domains.
Prolog programs are sets of definite clauses written in a notation somewhat different
from standard first-order logic. Prolog uses uppercase letters for variables and lowercase for
constants—the opposite of our convention for logic. Commas separate conjuncts in a clause,
and the clause is written “backwards” from what we are used to; instead of A ∧ B ⇒ C in
Prolog we have C :- A, B. Here is a typical example:
criminal(X) :- american(X), weapon(Y), sells(X,Y,Z), hostile(Z).
The notation [E|L] denotes a list whose first element is E and whose rest is L. Here is a
Prolog program for append(X,Y,Z), which succeeds if list Z is the result of appending
lists X and Y:
append([],Y,Y).
append([A|X],Y,[A|Z]) :- append(X,Y,Z).
In English, we can read these clauses as (1) appending an empty list with a list Y produces
the same list Y and (2) [A|Z] is the result of appending [A|X] onto Y, provided that Z is
the result of appending X onto Y. In most high-level languages we can write a similar recur-
sive function that describes how to append two lists. The Prolog definition is actually much
more powerful, however, because it describes a relation that holds among three arguments,
rather than a function computed from two arguments. For example, we can ask the query
append(X,Y,[1,2]): what two lists can be appended to give [1,2]? We get back the
solutions
X=[] Y=[1,2];
X=[1] Y=[2];
X=[1,2] Y=[]
The execution of Prolog programs is done through depth-first backward chaining, where
clauses are tried in the order in which they are written in the knowledge base. Some aspects
of Prolog fall outside standard logical inference:
• Prolog uses the database semantics of Section 8.2.8 rather than first-order semantics,
and this is apparent in its treatment of equality and negation (see Section 9.4.5).
• There is a set of built-in functions for arithmetic. Literals using these function symbols
are “proved” by executing code rather than doing further inference. For example, the
goal “X is 4+3” succeeds with X bound to 7. On the other hand, the goal “5 is X+Y”
fails, because the built-in functions do not do arbitrary equation solving.5
• There are built-in predicates that have side effects when executed. These include input–
output predicates and the assert/retract predicates for modifying the knowledge
base. Such predicates have no counterpart in logic and can produce confusing results—
for example, if facts are asserted in a branch of the proof tree that eventually fails.
• The occur check is omitted from Prolog’s unification algorithm. This means that some
unsound inferences can be made; these are almost never a problem in practice.
• Prolog uses depth-first backward-chaining search with no checks for infinite recursion.
This makes it very fast when given the right set of axioms, but incomplete when given
the wrong ones.
Prolog’s design represents a compromise between declarativeness and execution efficiency—
inasmuch as efficiency was understood at the time Prolog was designed.
9.4.3 Efficient implementation of logic programs
The execution of a Prolog program can happen in two modes: interpreted and compiled.
Interpretation essentially amounts to running the FOL-BC-ASK algorithm from Figure 9.6,
with the program as the knowledge base. We say “essentially” because Prolog interpreters
contain a variety of improvements designed to maximize speed. Here we consider only two.
First, our implementation had to explicitly manage the iteration over possible results
generated by each of the subfunctions. Prolog interpreters have a global data structure,
a stack of choice points, to keep track of the multiple possibilities that we considered in
FOL-BC-OR. This global stack is more efficient, and it makes debugging easier, because
the debugger can move up and down the stack.
Second, our simple implementation of FOL-BC-ASK spends a good deal of time gener-
ating substitutions. Instead of explicitly constructing substitutions, Prolog has logic variables
that remember their current binding. At any point in time, every variable in the program ei-
ther is unbound or is bound to some value. Together, these variables and values implicitly
define the substitution for the current branch of the proof. Extending the path can only add
new variable bindings, because an attempt to add a different binding for an already bound
variable results in a failure of unification. When a path in the search fails, Prolog will back
up to a previous choice point, and then it might have to unbind some variables. This is done
by keeping track of all the variables that have been bound in a stack called the trail. As each
new variable is bound by UNIFY-VAR, the variable is pushed onto the trail. When a goal fails
and it is time to back up to a previous choice point, each of the variables is unbound as it is
removed from the trail.
Even the most efficient Prolog interpreters require several thousand machine instruc-
tions per inference step because of the cost of index lookup, unification, and building the
recursive call stack. In effect, the interpreter always behaves as if it has never seen the pro-
gram before; for example, it has to find clauses that match the goal. A compiled Prolog
5 Note that if the Peano axioms are provided, such goals can be solved by inference within a Prolog program.
procedure APPEND(ax,y,az,continuation)
trail ← GLOBAL-TRAIL-POINTER()
if ax = [ ] and UNIFY(y,az) then CALL(continuation)
RESET-TRAIL(trail)
a, x, z ← NEW-VARIABLE(), NEW-VARIABLE(), NEW-VARIABLE()
if UNIFY(ax,[a | x]) and UNIFY(az,[a | z]) then APPEND(x,y,z,continuation)
Figure 9.8 Pseudocode representing the result of compiling the Append predicate. The
function NEW-VARIABLE returns a new variable, distinct from all other variables used so far.
The procedure CALL(continuation) continues execution with the specified continuation.
program, on the other hand, is an inference procedure for a specific set of clauses, so it knows
what clauses match the goal. Prolog basically generates a miniature theorem prover for each
different predicate, thereby eliminating much of the overhead of interpretation. It is also pos-
sible to open-code the unification routine for each different call, thereby avoiding explicit
analysis of term structure. (For details of open-coded unification, see Warren et al. (1977).)
The instruction sets of today’s computers give a poor match with Prolog’s semantics,
so most Prolog compilers compile into an intermediate language rather than directly into ma-
chine language. The most popular intermediate language is the Warren Abstract Machine,
or WAM, named after David H. D. Warren, one of the implementers of the first Prolog com-
piler. The WAM is an abstract instruction set that is suitable for Prolog and can be either
interpreted or translated into machine language. Other compilers translate Prolog into a high-
level language such as Lisp or C and then use that language’s compiler to translate to machine
language. For example, the definition of the Append predicate can be compiled into the code
shown in Figure 9.8. Several points are worth mentioning:
• Rather than having to search the knowledge base for Append clauses, the clauses be-
come a procedure and the inferences are carried out simply by calling the procedure.
• As described earlier, the current variable bindings are kept on a trail. The first step of the
procedure saves the current state of the trail, so that it can be restored by RESET-TRAIL
if the first clause fails. This will undo any bindings generated by the first call to UNIFY.
• The trickiest part is the use of continuations to implement choice points. You can think
of a continuation as packaging up a procedure and a list of arguments that together
define what should be done next whenever the current goal succeeds. It would not
do just to return from a procedure like APPEND when the goal succeeds, because it
could succeed in several ways, and each of them has to be explored. The continuation
argument solves this problem because it can be called each time the goal succeeds. In
the APPEND code, if the first argument is empty and the second argument unifies with
the third, then the APPEND predicate has succeeded. We then CALL the continuation,
with the appropriate bindings on the trail, to do whatever should be done next. For
example, if the call to APPEND were at the top level, the continuation would print the
bindings of the variables.
Before Warren’s work on the compilation of inference in Prolog, logic programming was
too slow for general use. Compilers by Warren and others allowed Prolog code to achieve
speeds that are competitive with C on a variety of standard benchmarks (Van Roy, 1990).
Of course, the fact that one can write a planner or natural language parser in a few dozen
lines of Prolog makes it somewhat more desirable than C for prototyping most small-scale AI
research projects.
Parallelization can also provide substantial speedup. There are two principal sources of
parallelism. The first, called OR-parallelism, comes from the possibility of a goal unifying
with many different clauses in the knowledge base. Each gives rise to an independent branch
in the search space that can lead to a potential solution, and all such branches can be solved
in parallel. The second, called AND-parallelism, comes from the possibility of solving
each conjunct in the body of an implication in parallel. AND-parallelism is more difficult to
achieve, because solutions for the whole conjunction require consistent bindings for all the
variables. Each conjunctive branch must communicate with the other branches to ensure a
global solution.
9.4.4 Redundant inference and infinite loops
We now turn to the Achilles heel of Prolog: the mismatch between depth-first search and
search trees that include repeated states and infinite paths. Consider the following logic pro-
gram that decides if a path exists between two points on a directed graph:
path(X,Z) :- link(X,Z).
path(X,Z) :- path(X,Y), link(Y,Z).
A simple three-node graph, described by the facts link(a,b) and link(b,c), is shown
in Figure 9.9(a). With this program, the query path(a,c) generates the proof tree shown
in Figure 9.10(a). On the other hand, if we put the two clauses in the order
path(X,Z) :- path(X,Y), link(Y,Z).
path(X,Z) :- link(X,Z).
then Prolog follows the infinite path shown in Figure 9.10(b). Prolog is therefore incomplete
as a theorem prover for definite clauses—even for Datalog programs, as this example shows—
because, for some knowledge bases, it fails to prove sentences that are entailed. Notice that
forward chaining does not suffer from this problem: once path(a,b), path(b,c), and
path(a,c) are inferred, forward chaining halts.
Depth-first backward chaining also has problems with redundant computations. For
example, when finding a path from A1 to J4 in Figure 9.9(b), Prolog performs 877 inferences,
most of which involve finding all possible paths to nodes from which the goal is unreachable.
This is similar to the repeated-state problem discussed in Chapter 3. The total amount of
inference can be exponential in the number of ground facts that are generated. If we apply
forward chaining instead, at most n^2 path(X,Y) facts can be generated linking n nodes.
For the problem in Figure 9.9(b), only 62 inferences are needed.
Forward chaining on graph search problems is an example of dynamic programming,
in which the solutions to subproblems are constructed incrementally from those of smaller
Figure 9.9 (a) Finding a path from A to C can lead Prolog into an infinite loop. (b) A
graph in which each node is connected to two random successors in the next layer. Finding a
path from A1 to J4 requires 877 inferences.
[Panel (a): the finite proof tree for path(a,c), which expands path(a,Y) to link(a,Y) with {Y/b}
and then proves link(b,c); the branch link(a,c) fails. Panel (b): the infinite tree in which
path(a,c) expands to path(a,Y) ∧ link(Y,c), path(a,Y) expands to path(a,Y′) ∧ link(Y′,Y), and so on.]
Figure 9.10 (a) Proof that a path exists from A to C. (b) Infinite proof tree generated
when the clauses are in the “wrong” order.
subproblems and are cached to avoid recomputation. We can obtain a similar effect in a
backward chaining system using memoization—that is, caching solutions to subgoals as
they are found and then reusing those solutions when the subgoal recurs, rather than repeat-
ing the previous computation. This is the approach taken by tabled logic programming sys-
tems, which use efficient storage and retrieval mechanisms to perform memoization. Tabled
logic programming combines the goal-directedness of backward chaining with the dynamic-
programming efficiency of forward chaining. It is also complete for Datalog knowledge
bases, which means that the programmer need worry less about infinite loops. (It is still pos-
sible to get an infinite loop with predicates like father(X,Y) that refer to a potentially
unbounded number of objects.)
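The effect of memoizing path answers can be seen in a few lines of ordinary code. The sketch below is our own illustration of the caching idea (not of SLG resolution or of any particular tabling system): it computes each node's reachable set at most once and then answers later path queries from the table.

def make_path_checker(links):
    """links: iterable of (x, z) pairs meaning link(x, z).  Returns a path(x, z) test."""
    succ = {}
    for a, b in links:
        succ.setdefault(a, set()).add(b)
    table = {}                                   # memo: start node -> set of reachable nodes

    def reachable(x):
        if x not in table:                       # each node's reachable set is computed once
            seen, frontier = set(), [x]
            while frontier:
                node = frontier.pop()
                for y in succ.get(node, ()):
                    if y not in seen:
                        seen.add(y)
                        frontier.append(y)
            table[x] = seen
        return table[x]

    return lambda x, z: z in reachable(x)

path = make_path_checker([('a', 'b'), ('b', 'c')])
print(path('a', 'c'), path('c', 'a'))            # True False

Like forward chaining, this does work proportional to the number of links rather than to the number of distinct proofs, which is the dynamic-programming effect described above.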
9.4.5 Database semantics of Prolog
Prolog uses database semantics, as discussed in Section 8.2.8. The unique names assumption
says that every Prolog constant and every ground term refers to a distinct object, and the
closed world assumption says that the only sentences that are true are those that are entailed
by the knowledge base. There is no way to assert that a sentence is false in Prolog. This makes
Prolog less expressive than first-order logic, but it is part of what makes Prolog more efficient
and more concise. Consider the following Prolog assertions about some course offerings:
Course(CS, 101), Course(CS, 102), Course(CS, 106), Course(EE, 101). (9.11)
Under the unique names assumption, CS and EE are different (as are 101, 102, and 106),
so this means that there are four distinct courses. Under the closed-world assumption there
are no other courses, so there are exactly four courses. But if these were assertions in FOL
rather than in Prolog, then all we could say is that there are somewhere between one and
infinity courses. That’s because the assertions (in FOL) do not deny the possibility that other
unmentioned courses are also offered, nor do they say that the courses mentioned are different
from each other. If we wanted to translate Equation (9.11) into FOL, we would get this:
Course(d, n) ⇔ (d = CS ∧ n = 101) ∨ (d = CS ∧ n = 102)
∨ (d = CS ∧ n = 106) ∨ (d = EE ∧ n = 101) . (9.12)
This is called the completion of Equation (9.11). It expresses in FOL the idea that there are
at most four courses. To express in FOL the idea that there are at least four courses, we need
to write the completion of the equality predicate:
x = y ⇔ (x = CS ∧ y = CS) ∨ (x = EE ∧ y = EE) ∨ (x = 101 ∧ y = 101)
∨ (x = 102 ∧ y = 102) ∨ (x = 106 ∧ y = 106) .
The completion is useful for understanding database semantics, but for practical purposes, if
your problem can be described with database semantics, it is more efficient to reason with
Prolog or some other database semantics system, rather than translating into FOL and rea-
soning with a full FOL theorem prover.
9.4.6 Constraint logic programming
In our discussion of forward chaining (Section 9.3), we showed how constraint satisfaction
problems (CSPs) can be encoded as definite clauses. Standard Prolog solves such problems
in exactly the same way as the backtracking algorithm given in Figure 6.5.
Because backtracking enumerates the domains of the variables, it works only for finite-
domain CSPs. In Prolog terms, there must be a finite number of solutions for any goal
with unbound variables. (For example, the goal diff(Q,SA), which says that Queensland
and South Australia must be different colors, has six solutions if three colors are allowed.)
Infinite-domain CSPs—for example, with integer or real-valued variables—require quite dif-
ferent algorithms, such as bounds propagation or linear programming.
Consider the following example. We define triangle(X,Y,Z) as a predicate that
holds if the three arguments are numbers that satisfy the triangle inequality:
triangle(X,Y,Z) :-
X>=0, Y>=0, Z>=0, X+Y>=Z, Y+Z>=X, X+Z>=Y.
If we ask Prolog the query triangle(3,4,5), it succeeds. On the other hand, if we
ask triangle(3,4,Z), no solution will be found, because the subgoal Z>=0 cannot be
handled by Prolog; we can’t compare an unbound value to 0.
Constraint logic programming (CLP) allows variables to be constrained rather than
bound. A CLP solution is the most specific set of constraints on the query variables that can
be derived from the knowledge base. For example, the solution to the triangle(3,4,Z)
query is the constraint 7 ≥ Z ≥ 1. Standard logic programs are just a special case of
CLP in which the solution constraints must be equality constraints—that is, bindings.
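A one-step illustration of the reasoning involved: with X and Y fixed, each triangle constraint gives a bound on Z, and intersecting those bounds yields the interval answer. The helper below is our own toy bounds computation, not a CLP system.

def triangle_bounds(x, y):
    """Bounds on the third side z given fixed sides x and y, from
    z >= 0, x + y >= z, y + z >= x, and x + z >= y."""
    lo = max(0, x - y, y - x)      # lower bounds from z >= 0, y + z >= x, x + z >= y
    hi = x + y                     # upper bound from x + y >= z
    return lo, hi

print(triangle_bounds(3, 4))       # (1, 7): the constraint 7 >= Z >= 1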
CLP systems incorporate various constraint-solving algorithms for the constraints al-
lowed in the language. For example, a system that allows linear inequalities on real-valued
variables might include a linear programming algorithm for solving those constraints. CLP
systems also adopt a much more flexible approach to solving standard logic programming
queries. For example, instead of depth-first, left-to-right backtracking, they might use any of
the more efficient algorithms discussed in Chapter 6, including heuristic conjunct ordering,
backjumping, cutset conditioning, and so on. CLP systems therefore combine elements of
constraint satisfaction algorithms, logic programming, and deductive databases.
Several systems that allow the programmer more control over the search order for in-
ference have been defined. The MRS language (Genesereth and Smith, 1981; Russell, 1985)
allows the programmer to write metarules to determine which conjuncts are tried first. The
user could write a rule saying that the goal with the fewest variables should be tried first or
could write domain-specific rules for particular predicates.
9.5 RESOLUTION
The last of our three families of logical systems is based on resolution. We saw on page 250
that propositional resolution using refutation is a complete inference procedure for proposi-
tional logic. In this section, we describe how to extend resolution to first-order logic.
9.5.1 Conjunctive normal form for first-order logic
As in the propositional case, first-order resolution requires that sentences be in conjunctive
normal form (CNF)—that is, a conjunction of clauses, where each clause is a disjunction of
literals.6 Literals can contain variables, which are assumed to be universally quantified. For
example, the sentence
∀ x American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)
becomes, in CNF,
¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x) .
Every sentence of first-order logic can be converted into an inferentially equivalent CNF
sentence. In particular, the CNF sentence will be unsatisfiable just when the original sentence
is unsatisfiable, so we have a basis for doing proofs by contradiction on the CNF sentences.
6 A clause can also be represented as an implication with a conjunction of atoms in the premise and a disjunction
of atoms in the conclusion (Exercise 7.13). This is called implicative normal form or Kowalski form (especially
when written with a right-to-left implication symbol (Kowalski, 1979)) and is often much easier to read.
The procedure for conversion to CNF is similar to the propositional case, which we saw
on page 253. The principal difference arises from the need to eliminate existential quantifiers.
We illustrate the procedure by translating the sentence “Everyone who loves all animals is
loved by someone,” or
∀ x [∀ y Animal(y) ⇒ Loves(x, y)] ⇒ [∃ y Loves(y, x)] .
The steps are as follows:
• Eliminate implications:
∀ x [¬∀ y ¬Animal(y) ∨ Loves(x, y)] ∨ [∃ y Loves(y, x)] .
• Move ¬ inwards: In addition to the usual rules for negated connectives, we need rules
for negated quantifiers. Thus, we have
¬∀ x p becomes ∃ x ¬p
¬∃ x p becomes ∀ x ¬p .
Our sentence goes through the following transformations:
∀ x [∃ y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃ y Loves(y, x)] .
∀ x [∃ y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃ y Loves(y, x)] .
∀ x [∃ y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃ y Loves(y, x)] .
Notice how a universal quantifier (∀ y) in the premise of the implication has become
an existential quantifier. The sentence now reads “Either there is some animal that x
doesn’t love, or (if this is not the case) someone loves x.” Clearly, the meaning of the
original sentence has been preserved.
• Standardize variables: For sentences like (∃ x P(x))∨(∃ x Q(x)) which use the same
variable name twice, change the name of one of the variables. This avoids confusion
later when we drop the quantifiers. Thus, we have
∀ x [∃ y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃ z Loves(z, x)] .
• Skolemize: Skolemization is the process of removing existential quantifiers by elimi-
nation. In the simple case, it is just like the Existential Instantiation rule of Section 9.1:
translate ∃ x P(x) into P(A), where A is a new constant. However, we can’t apply Ex-
istential Instantiation to our sentence above because it doesn’t match the pattern ∃ v α;
only parts of the sentence match the pattern. If we blindly apply the rule to the two
matching parts we get
∀ x [Animal(A) ∧ ¬Loves(x, A)] ∨ Loves(B, x) ,
which has the wrong meaning entirely: it says that everyone either fails to love a par-
ticular animal A or is loved by some particular entity B. In fact, our original sentence
allows each person to fail to love a different animal or to be loved by a different person.
Thus, we want the Skolem entities to depend on x:
∀ x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x) .
Here F and G are Skolem functions. The general rule is that the arguments of the
Skolem function are all the universally quantified variables in whose scope the exis-
tential quantifier appears. As with Existential Instantiation, the Skolemized sentence is
satisfiable exactly when the original sentence is satisfiable.
• Drop universal quantifiers: At this point, all remaining variables must be universally
quantified. Moreover, the sentence is equivalent to one in which all the universal quan-
tifiers have been moved to the left. We can therefore drop the universal quantifiers:
[Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x) .
• Distribute ∨ over ∧:
[Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)] .
This step may also require flattening out nested conjunctions and disjunctions.
The sentence is now in CNF and consists of two clauses. It is quite unreadable. (It may
help to explain that the Skolem function F(x) refers to the animal potentially unloved by x,
whereas G(x) refers to someone who might love x.) Fortunately, humans seldom need look
at CNF sentences—the translation process is easily automated.
9.5.2 The resolution inference rule
The resolution rule for first-order clauses is simply a lifted version of the propositional reso-
lution rule given on page 253. Two clauses, which are assumed to be standardized apart so
that they share no variables, can be resolved if they contain complementary literals. Propo-
sitional literals are complementary if one is the negation of the other; first-order literals are
complementary if one unifies with the negation of the other. Thus, we have
ℓ1 ∨ · · · ∨ ℓk, m1 ∨ · · · ∨ mn
SUBST(θ, ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk ∨ m1 ∨ · · · ∨ mj−1 ∨ mj+1 ∨ · · · ∨ mn)
where UNIFY(ℓi, ¬mj) = θ. For example, we can resolve the two clauses
[Animal(F(x)) ∨ Loves(G(x), x)] and [¬Loves(u, v) ∨ ¬Kills(u, v)]
by eliminating the complementary literals Loves(G(x), x) and ¬Loves(u, v), with unifier
θ = {u/G(x), v/x}, to produce the resolvent clause
[Animal(F(x)) ∨ ¬Kills(G(x), x)] .
This rule is called the binary resolution rule because it resolves exactly two literals. The
binary resolution rule by itself does not yield a complete inference procedure. The full reso-
lution rule resolves subsets of literals in each clause that are unifiable. An alternative approach
is to extend factoring—the removal of redundant literals—to the first-order case. Proposi-
tional factoring reduces two literals to one if they are identical; first-order factoring reduces
two literals to one if they are unifiable. The unifier must be applied to the entire clause. The
combination of binary resolution and factoring is complete.
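Given the unify and subst helpers from the sketch in Section 9.2.2, binary resolution itself is a short function. In the sketch below (ours, not the book's code) a clause is a list of literals, a negative literal is written ('not', atom), and the two input clauses are assumed to be standardized apart already.

def negate(literal):
    """Return the complement of a literal in the ('not', atom) convention."""
    return literal[1] if literal[0] == 'not' else ('not', literal)

def binary_resolvents(clause1, clause2):
    """Yield every clause obtained by resolving one literal of clause1 against a
    complementary (unifiable) literal of clause2."""
    for i, li in enumerate(clause1):
        for j, mj in enumerate(clause2):
            theta = unify(li, negate(mj))
            if theta is not None:
                rest = clause1[:i] + clause1[i + 1:] + clause2[:j] + clause2[j + 1:]
                yield [subst(theta, lit) for lit in rest]

c1 = [('Animal', ('F', '?x')), ('Loves', ('G', '?x'), '?x')]           # Animal(F(x)) ∨ Loves(G(x), x)
c2 = [('not', ('Loves', '?u', '?v')), ('not', ('Kills', '?u', '?v'))]  # ¬Loves(u, v) ∨ ¬Kills(u, v)
for r in binary_resolvents(c1, c2):
    print(r)
# [('Animal', ('F', '?v')), ('not', ('Kills', ('G', '?v'), '?v'))]
# i.e. Animal(F(x)) ∨ ¬Kills(G(x), x), up to renaming of the variable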
9.5.3 Example proofs
Resolution proves that KB |= α by proving KB ∧ ¬α unsatisfiable, that is, by deriving the
empty clause. The algorithmic approach is identical to the propositional case, described in
Figure 7.12, so we need not repeat it here.
Figure 9.11 A resolution proof that West is a criminal. At each step, the literals that unify are in bold.
Instead, we give two example proofs. The first is
the crime example from Section 9.3. The sentences in CNF are
¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x)
¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sells(West, x, Nono)
¬Enemy(x, America) ∨ Hostile(x)
¬Missile(x) ∨ Weapon(x)
Owns(Nono, M1)
Missile(M1)
American(West)
Enemy(Nono, America) .
We also include the negated goal ¬Criminal(West). The resolution proof is shown in Fig-
ure 9.11. Notice the structure: single “spine” beginning with the goal clause, resolving against
clauses from the knowledge base until the empty clause is generated. This is characteristic
of resolution on Horn clause knowledge bases. In fact, the clauses along the main spine
correspond exactly to the consecutive values of the goals variable in the backward-chaining
algorithm of Figure 9.6. This is because we always choose to resolve with a clause whose
positive literal unified with the leftmost literal of the “current” clause on the spine; this is
exactly what happens in backward chaining. Thus, backward chaining is just a special case
of resolution with a particular control strategy to decide which resolution to perform next.
Our second example makes use of Skolemization and involves clauses that are not def-
inite clauses. This results in a somewhat more complex proof structure. In English, the
problem is as follows:
Everyone who loves all animals is loved by someone.
Anyone who kills an animal is loved by no one.
Jack loves all animals.
Either Jack or Curiosity killed the cat, who is named Tuna.
Did Curiosity kill the cat?
First, we express the original sentences, some background knowledge, and the negated goal
G in first-order logic:
A. ∀ x [∀ y Animal(y) ⇒ Loves(x, y)] ⇒ [∃ y Loves(y, x)]
B. ∀ x [∃ z Animal(z) ∧ Kills(x, z)] ⇒ [∀ y ¬Loves(y, x)]
C. ∀ x Animal(x) ⇒ Loves(Jack, x)
D. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
E. Cat(Tuna)
F. ∀ x Cat(x) ⇒ Animal(x)
¬G. ¬Kills(Curiosity, Tuna)
Now we apply the conversion procedure to convert each sentence to CNF:
A1. Animal(F(x)) ∨ Loves(G(x), x)
A2. ¬Loves(x, F(x)) ∨ Loves(G(x), x)
B. ¬Loves(y, x) ∨ ¬Animal(z) ∨ ¬Kills(x, z)
C. ¬Animal(x) ∨ Loves(Jack, x)
D. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
E. Cat(Tuna)
F. ¬Cat(x) ∨ Animal(x)
¬G. ¬Kills(Curiosity, Tuna)
The resolution proof that Curiosity killed the cat is given in Figure 9.12. In English, the proof
could be paraphrased as follows:
Suppose Curiosity did not kill Tuna. We know that either Jack or Curiosity did; thus
Jack must have. Now, Tuna is a cat and cats are animals, so Tuna is an animal. Because
anyone who kills an animal is loved by no one, we know that no one loves Jack. On the
other hand, Jack loves all animals, so someone loves him; so we have a contradiction.
Therefore, Curiosity killed the cat.
Figure 9.12 A resolution proof that Curiosity killed the cat. Notice the use of factoring
in the derivation of the clause Loves(G(Jack), Jack). Notice also in the upper right, the
unification of Loves(x, F(x)) and Loves(Jack, x) can only succeed after the variables have
been standardized apart.
The proof answers the question “Did Curiosity kill the cat?” but often we want to pose more
general questions, such as “Who killed the cat?” Resolution can do this, but it takes a little
more work to obtain the answer. The goal is ∃ w Kills(w, Tuna), which, when negated,
becomes ¬Kills(w, Tuna) in CNF. Repeating the proof in Figure 9.12 with the new negated
goal, we obtain a similar proof tree, but with the substitution {w/Curiosity} in one of the
steps. So, in this case, finding out who killed the cat is just a matter of keeping track of the
bindings for the query variables in the proof.
Unfortunately, resolution can produce nonconstructive proofs for existential goals.
For example, ¬Kills(w, Tuna) resolves with Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
to give Kills(Jack, Tuna), which resolves again with ¬Kills(w, Tuna) to yield the empty
clause. Notice that w has two different bindings in this proof; resolution is telling us that,
yes, someone killed Tuna—either Jack or Curiosity. This is no great surprise! One so-
lution is to restrict the allowed resolution steps so that the query variables can be bound
only once in a given proof; then we need to be able to backtrack over the possible bind-
ings. Another solution is to add a special answer literal to the negated goal, which be-
comes ¬Kills(w, Tuna) ∨ Answer(w). Now, the resolution process generates an answer
whenever a clause is generated containing just a single answer literal. For the proof in Fig-
ure 9.12, this is Answer(Curiosity). The nonconstructive proof would generate the clause
Answer(Curiosity) ∨ Answer(Jack), which does not constitute an answer.
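Using the resolve helper and the clause representation sketched in Section 9.5.2 above (this fragment is meant as a continuation of that sketch, with Answer treated as an ordinary predicate), the answer-literal bookkeeping can be illustrated as follows:

# Negated goal with an answer literal attached.
goal = frozenset({('not', ('Kills', 'w', 'Tuna')), ('Answer', 'w')})

# In the constructive proof, the rest of the derivation has already produced the unit
# clause Kills(Curiosity, Tuna); resolving it against the goal leaves a clause containing
# just a single answer literal, which constitutes an answer.
unit = frozenset({('Kills', 'Curiosity', 'Tuna')})
for clause in resolve(goal, unit):
    print(clause)            # frozenset({('Answer', 'Curiosity')})

# The nonconstructive route instead resolves the goal directly against the disjunction
# Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna); each resolvent keeps one Kills literal
# alongside an instantiated answer literal, and resolving the goal against such a
# resolvent yields Answer(Jack) ∨ Answer(Curiosity), which contains two answer literals
# and therefore does not count as an answer.
disjunction = frozenset({('Kills', 'Jack', 'Tuna'), ('Kills', 'Curiosity', 'Tuna')})
for clause in resolve(goal, disjunction):
    print(clause)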
9.5.4 Completeness of resolution
This section gives a completeness proof of resolution. It can be safely skipped by those who
are willing to take it on faith.
We show that resolution is refutation-complete, which means that if a set of sentences
is unsatisfiable, then resolution will always be able to derive a contradiction. Resolution
cannot be used to generate all logical consequences of a set of sentences, but it can be used
to establish that a given sentence is entailed by the set of sentences. Hence, it can be used to
find all answers to a given question, Q(x), by proving that KB ∧ ¬Q(x) is unsatisfiable.
We take it as given that any sentence in first-order logic (without equality) can be rewrit-
ten as a set of clauses in CNF. This can be proved by induction on the form of the sentence,
using atomic sentences as the base case (Davis and Putnam, 1960). Our goal therefore is to
prove the following: if S is an unsatisfiable set of clauses, then the application of a finite
number of resolution steps to S will yield a contradiction.
Our proof sketch follows Robinson’s original proof with some simplifications from
Genesereth and Nilsson (1987). The basic structure of the proof (Figure 9.13) is as follows:
1. First, we observe that if S is unsatisfiable, then there exists a particular set of ground
instances of the clauses of S such that this set is also unsatisfiable (Herbrand’s theorem).
2. We then appeal to the ground resolution theorem given in Chapter 7, which states that
propositional resolution is complete for ground sentences.
3. We then use a lifting lemma to show that, for any propositional resolution proof using
the set of ground sentences, there is a corresponding first-order resolution proof using
the first-order sentences from which the ground sentences were obtained.
Figure 9.13 Structure of a completeness proof for resolution.
To carry out the first step, we need three new concepts:
• Herbrand universe
Ad

Recently uploaded (20)

PPTX
MYSQL Presentation for SQL database connectivity
PPTX
A Presentation on Artificial Intelligence
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PPTX
Big Data Technologies - Introduction.pptx
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PPTX
Spectroscopy.pptx food analysis technology
PPTX
SOPHOS-XG Firewall Administrator PPT.pptx
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PPTX
Tartificialntelligence_presentation.pptx
PPTX
Machine Learning_overview_presentation.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Encapsulation theory and applications.pdf
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Machine learning based COVID-19 study performance prediction
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Accuracy of neural networks in brain wave diagnosis of schizophrenia
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PPTX
1. Introduction to Computer Programming.pptx
MYSQL Presentation for SQL database connectivity
A Presentation on Artificial Intelligence
20250228 LYD VKU AI Blended-Learning.pptx
Big Data Technologies - Introduction.pptx
Diabetes mellitus diagnosis method based random forest with bat algorithm
Spectroscopy.pptx food analysis technology
SOPHOS-XG Firewall Administrator PPT.pptx
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Tartificialntelligence_presentation.pptx
Machine Learning_overview_presentation.pptx
Unlocking AI with Model Context Protocol (MCP)
Encapsulation theory and applications.pdf
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Machine learning based COVID-19 study performance prediction
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Accuracy of neural networks in brain wave diagnosis of schizophrenia
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
1. Introduction to Computer Programming.pptx

Artificial_Intelligence__A_Modern_Approach_Nho Vĩnh Share.pdf

Preface

Artificial Intelligence (AI) is a big field, and this is a big book. We have tried to explore the full breadth of the field, which encompasses logic, probability, and continuous mathematics; perception, reasoning, learning, and action; and everything from microelectronic devices to robotic planetary explorers. The book is also big because we go into some depth.

The subtitle of this book is “A Modern Approach.” The intended meaning of this rather empty phrase is that we have tried to synthesize what is now known into a common framework, rather than trying to explain each subfield of AI in its own historical context. We apologize to those whose subfields are, as a result, less recognizable.

New to this edition

This edition captures the changes in AI that have taken place since the last edition in 2003. There have been important applications of AI technology, such as the widespread deployment of practical speech recognition, machine translation, autonomous vehicles, and household robotics. There have been algorithmic landmarks, such as the solution of the game of checkers. And there has been a great deal of theoretical progress, particularly in areas such as probabilistic reasoning, machine learning, and computer vision. Most important from our point of view is the continued evolution in how we think about the field, and thus how we organize the book. The major changes are as follows:
• We place more emphasis on partially observable and nondeterministic environments, especially in the nonprobabilistic settings of search and planning. The concepts of belief state (a set of possible worlds) and state estimation (maintaining the belief state) are introduced in these settings; later in the book, we add probabilities.
• In addition to discussing the types of environments and types of agents, we now cover in more depth the types of representations that an agent can use. We distinguish among atomic representations (in which each state of the world is treated as a black box), factored representations (in which a state is a set of attribute/value pairs), and structured representations (in which the world consists of objects and relations between them). A brief illustration of the three kinds follows this list.
• Our coverage of planning goes into more depth on contingent planning in partially observable environments and includes a new approach to hierarchical planning.
• We have added new material on first-order probabilistic models, including open-universe models for cases where there is uncertainty as to what objects exist.
• We have completely rewritten the introductory machine-learning chapter, stressing a wider variety of more modern learning algorithms and placing them on a firmer theoretical footing.
• We have expanded coverage of Web search and information extraction, and of techniques for learning from very large data sets.
• 20% of the citations in this edition are to works published after 2003.
• We estimate that about 20% of the material is brand new. The remaining 80% reflects older work but has been largely rewritten to present a more unified picture of the field.
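The distinction among atomic, factored, and structured representations can be made concrete with a minimal illustration. The Python snippet below is not code from the book or its repository; the particular state contents are invented purely to show the three shapes a state can take.

# Atomic: the state is an opaque label; an algorithm can only compare states for equality.
state_atomic = "S42"

# Factored: the state is a set of attribute/value pairs.
state_factored = {"location": "kitchen", "battery_level": 0.8, "door_open": True}

# Structured: the world is described as objects and the relations that hold among them.
state_structured = {
    ("On", "mug", "table"),
    ("In", "table", "kitchen"),
}

Roughly speaking, search algorithms work with atomic states, constraint satisfaction and propositional logic with factored states, and first-order logic with structured states, which is the progression the book follows.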
Overview of the book

The main unifying theme is the idea of an intelligent agent. We define AI as the study of agents that receive percepts from the environment and perform actions. Each such agent implements a function that maps percept sequences to actions, and we cover different ways to represent these functions, such as reactive agents, real-time planners, and decision-theoretic systems. We explain the role of learning as extending the reach of the designer into unknown environments, and we show how that role constrains agent design, favoring explicit knowledge representation and reasoning. We treat robotics and vision not as independently defined problems, but as occurring in the service of achieving goals. We stress the importance of the task environment in determining the appropriate agent design.

Our primary aim is to convey the ideas that have emerged over the past fifty years of AI research and the past two millennia of related work. We have tried to avoid excessive formality in the presentation of these ideas while retaining precision. We have included pseudocode algorithms to make the key ideas concrete; our pseudocode is described in Appendix B.

This book is primarily intended for use in an undergraduate course or course sequence. The book has 27 chapters, each requiring about a week’s worth of lectures, so working through the whole book requires a two-semester sequence. A one-semester course can use selected chapters to suit the interests of the instructor and students. The book can also be used in a graduate-level course (perhaps with the addition of some of the primary sources suggested in the bibliographical notes). Sample syllabi are available at the book’s Web site, aima.cs.berkeley.edu. The only prerequisite is familiarity with basic concepts of computer science (algorithms, data structures, complexity) at a sophomore level. Freshman calculus and linear algebra are useful for some of the topics; the required mathematical background is supplied in Appendix A.

Exercises are given at the end of each chapter. Exercises requiring significant programming are marked with a keyboard icon. These exercises can best be solved by taking advantage of the code repository at aima.cs.berkeley.edu. Some of them are large enough to be considered term projects. A number of exercises require some investigation of the literature; these are marked with a book icon.

Throughout the book, important points are marked with a pointing icon. We have included an extensive index of around 6,000 items to make it easy to find things in the book. Wherever a new term is first defined, it is also marked in the margin.

About the Web site

aima.cs.berkeley.edu, the Web site for the book, contains
• implementations of the algorithms in the book in several programming languages,
• a list of over 1000 schools that have used the book, many with links to online course materials and syllabi,
• an annotated list of over 800 links to sites around the Web with useful AI content,
• a chapter-by-chapter list of supplementary material and links,
• instructions on how to join a discussion group for the book,
• instructions on how to contact the authors with questions or comments,
• instructions on how to report errors in the book, in the likely event that some exist, and
• slides and other materials for instructors.

Pearson offers many different products around the world to facilitate learning. In countries outside the United States, some products and services related to this textbook may not be available due to copyright and/or permissions restrictions. If you have questions, you can contact your local office by visiting www.pearsonhighered.com/international or you can contact your local Pearson representative.

About the cover

The cover depicts the final position from the decisive game 6 of the 1997 match between chess champion Garry Kasparov and program DEEP BLUE. Kasparov, playing Black, was forced to resign, making this the first time a computer had beaten a world champion in a chess match. Kasparov is shown at the top. To his left is the Asimo humanoid robot and to his right is Thomas Bayes (1702–1761), whose ideas about probability as a measure of belief underlie much of modern AI technology. Below that we see a Mars Exploration Rover, a robot that landed on Mars in 2004 and has been exploring the planet ever since. To the right is Alan Turing (1912–1954), whose fundamental work defined the fields of computer science in general and artificial intelligence in particular. At the bottom is Shakey (1966–1972), the first robot to combine perception, world-modeling, planning, and learning. With Shakey is project leader Charles Rosen (1917–2002). At the bottom right is Aristotle (384 B.C.–322 B.C.), who pioneered the study of logic; his work was state of the art until the 19th century (copy of a bust by Lysippos). At the bottom left, lightly screened behind the authors’ names, is a planning algorithm by Aristotle from De Motu Animalium in the original Greek. Behind the title is a portion of the CPCS Bayesian network for medical diagnosis (Pradhan et al., 1994). Behind the chess board is part of a Bayesian logic model for detecting nuclear explosions from seismic signals. Credits: Stan Honda/Getty (Kasparov), Library of Congress (Bayes), NASA (Mars rover), National Museum of Rome (Aristotle), Peter Norvig (book), Ian Parker (Berkeley skyline), Shutterstock (Asimo, Chess pieces), Time Life/Getty (Shakey, Turing).

Acknowledgments

This book would not have been possible without the many contributors whose names did not make it to the cover. Jitendra Malik and David Forsyth wrote Chapter 24 (computer vision) and Sebastian Thrun wrote Chapter 25 (robotics). Vibhu Mittal wrote part of Chapter 22 (natural language). Nick Hay, Mehran Sahami, and Ernest Davis wrote some of the exercises. Zoran Duric (George Mason), Thomas C. Henderson (Utah), Leon Reznik (RIT), Michael Gourley (Central Oklahoma) and Ernest Davis (NYU) reviewed the manuscript and made helpful suggestions. We thank Ernie Davis in particular for his tireless ability to read multiple drafts and help improve the book. Nick Hay whipped the bibliography into shape and on deadline stayed up to 5:30 AM writing code to make the book better. Jon Barron formatted and improved the diagrams in this edition, while Tim Huang, Mark Paskin, and Cynthia
  • 11. x Preface Bruyns helped with diagrams and algorithms in previous editions. Ravi Mohan and Ciaran O’Reilly wrote and maintain the Java code examples on the Web site. John Canny wrote the robotics chapter for the first edition and Douglas Edwards researched the historical notes. Tracy Dunkelberger, Allison Michael, Scott Disanno, and Jane Bonnell at Pearson tried their best to keep us on schedule and made many helpful suggestions. Most helpful of all has been Julie Sussman, P.P.A., who read every chapter and provided extensive improvements. In previous editions we had proofreaders who would tell us when we left out a comma and said which when we meant that; Julie told us when we left out a minus sign and said xi when we meant xj. For every typo or confusing explanation that remains in the book, rest assured that Julie has fixed at least five. She persevered even when a power failure forced her to work by lantern light rather than LCD glow. Stuart would like to thank his parents for their support and encouragement and his wife, Loy Sheflott, for her endless patience and boundless wisdom. He hopes that Gordon, Lucy, George, and Isaac will soon be reading this book after they have forgiven him for working so long on it. RUGS (Russell’s Unusual Group of Students) have been unusually helpful, as always. Peter would like to thank his parents (Torsten and Gerda) for getting him started, and his wife (Kris), children (Bella and Juliet), colleagues, and friends for encouraging and tolerating him through the long hours of writing and longer hours of rewriting. We both thank the librarians at Berkeley, Stanford, and NASA and the developers of CiteSeer, Wikipedia, and Google, who have revolutionized the way we do research. We can’t acknowledge all the people who have used the book and made suggestions, but we would like to note the especially helpful comments of Gagan Aggarwal, Eyal Amir, Ion Androutsopou- los, Krzysztof Apt, Warren Haley Armstrong, Ellery Aziel, Jeff Van Baalen, Darius Bacon, Brian Baker, Shumeet Baluja, Don Barker, Tony Barrett, James Newton Bass, Don Beal, Howard Beck, Wolfgang Bibel, John Binder, Larry Bookman, David R. 
Boxall, Ronen Braf- man, John Bresina, Gerhard Brewka, Selmer Bringsjord, Carla Brodley, Chris Brown, Emma Brunskill, Wilhelm Burger, Lauren Burka, Carlos Bustamante, Joao Cachopo, Murray Camp- bell, Norman Carver, Emmanuel Castro, Anil Chakravarthy, Dan Chisarick, Berthe Choueiry, Roberto Cipolla, David Cohen, James Coleman, Julie Ann Comparini, Corinna Cortes, Gary Cottrell, Ernest Davis, Tom Dean, Rina Dechter, Tom Dietterich, Peter Drake, Chuck Dyer, Doug Edwards, Robert Egginton, Asma’a El-Budrawy, Barbara Engelhardt, Kutluhan Erol, Oren Etzioni, Hana Filip, Douglas Fisher, Jeffrey Forbes, Ken Ford, Eric Fosler-Lussier, John Fosler, Jeremy Frank, Alex Franz, Bob Futrelle, Marek Galecki, Stefan Gerberding, Stuart Gill, Sabine Glesner, Seth Golub, Gosta Grahne, Russ Greiner, Eric Grimson, Bar- bara Grosz, Larry Hall, Steve Hanks, Othar Hansson, Ernst Heinz, Jim Hendler, Christoph Herrmann, Paul Hilfinger, Robert Holte, Vasant Honavar, Tim Huang, Seth Hutchinson, Joost Jacob, Mark Jelasity, Magnus Johansson, Istvan Jonyer, Dan Jurafsky, Leslie Kaelbling, Keiji Kanazawa, Surekha Kasibhatla, Simon Kasif, Henry Kautz, Gernot Kerschbaumer, Max Khesin, Richard Kirby, Dan Klein, Kevin Knight, Roland Koenig, Sven Koenig, Daphne Koller, Rich Korf, Benjamin Kuipers, James Kurien, John Lafferty, John Laird, Gus Lars- son, John Lazzaro, Jon LeBlanc, Jason Leatherman, Frank Lee, Jon Lehto, Edward Lim, Phil Long, Pierre Louveaux, Don Loveland, Sridhar Mahadevan, Tony Mancill, Jim Martin,
  • 12. Preface xi Andy Mayer, John McCarthy, David McGrane, Jay Mendelsohn, Risto Miikkulanien, Brian Milch, Steve Minton, Vibhu Mittal, Mehryar Mohri, Leora Morgenstern, Stephen Muggleton, Kevin Murphy, Ron Musick, Sung Myaeng, Eric Nadeau, Lee Naish, Pandu Nayak, Bernhard Nebel, Stuart Nelson, XuanLong Nguyen, Nils Nilsson, Illah Nourbakhsh, Ali Nouri, Arthur Nunes-Harwitt, Steve Omohundro, David Page, David Palmer, David Parkes, Ron Parr, Mark Paskin, Tony Passera, Amit Patel, Michael Pazzani, Fernando Pereira, Joseph Perla, Wim Pi- jls, Ira Pohl, Martha Pollack, David Poole, Bruce Porter, Malcolm Pradhan, Bill Pringle, Lor- raine Prior, Greg Provan, William Rapaport, Deepak Ravichandran, Ioannis Refanidis, Philip Resnik, Francesca Rossi, Sam Roweis, Richard Russell, Jonathan Schaeffer, Richard Scherl, Hinrich Schuetze, Lars Schuster, Bart Selman, Soheil Shams, Stuart Shapiro, Jude Shav- lik, Yoram Singer, Satinder Singh, Daniel Sleator, David Smith, Bryan So, Robert Sproull, Lynn Stein, Larry Stephens, Andreas Stolcke, Paul Stradling, Devika Subramanian, Marek Suchenek, Rich Sutton, Jonathan Tash, Austin Tate, Bas Terwijn, Olivier Teytaud, Michael Thielscher, William Thompson, Sebastian Thrun, Eric Tiedemann, Mark Torrance, Randall Upham, Paul Utgoff, Peter van Beek, Hal Varian, Paulina Varshavskaya, Sunil Vemuri, Vandi Verma, Ubbo Visser, Jim Waldo, Toby Walsh, Bonnie Webber, Dan Weld, Michael Wellman, Kamin Whitehouse, Michael Dean White, Brian Williams, David Wolfe, Jason Wolfe, Bill Woods, Alden Wright, Jay Yagnik, Mark Yasuda, Richard Yen, Eliezer Yudkowsky, Weixiong Zhang, Ming Zhao, Shlomo Zilberstein, and our esteemed colleague Anonymous Reviewer.
About the Authors

Stuart Russell was born in 1962 in Portsmouth, England. He received his B.A. with first-class honours in physics from Oxford University in 1982, and his Ph.D. in computer science from Stanford in 1986. He then joined the faculty of the University of California at Berkeley, where he is a professor of computer science, director of the Center for Intelligent Systems, and holder of the Smith–Zadeh Chair in Engineering. In 1990, he received the Presidential Young Investigator Award of the National Science Foundation, and in 1995 he was cowinner of the Computers and Thought Award. He was a 1996 Miller Professor of the University of California and was appointed to a Chancellor’s Professorship in 2000. In 1998, he gave the Forsythe Memorial Lectures at Stanford University. He is a Fellow and former Executive Council member of the American Association for Artificial Intelligence. He has published over 100 papers on a wide range of topics in artificial intelligence. His other books include The Use of Knowledge in Analogy and Induction and (with Eric Wefald) Do the Right Thing: Studies in Limited Rationality.

Peter Norvig is currently Director of Research at Google, Inc., and was the director responsible for the core Web search algorithms from 2002 to 2005. He is a Fellow of the American Association for Artificial Intelligence and the Association for Computing Machinery. Previously, he was head of the Computational Sciences Division at NASA Ames Research Center, where he oversaw NASA’s research and development in artificial intelligence and robotics, and chief scientist at Junglee, where he helped develop one of the first Internet information extraction services. He received a B.S. in applied mathematics from Brown University and a Ph.D. in computer science from the University of California at Berkeley. He received the Distinguished Alumni and Engineering Innovation awards from Berkeley and the Exceptional Achievement Medal from NASA. He has been a professor at the University of Southern California and a research faculty member at Berkeley. His other books are Paradigms of AI Programming: Case Studies in Common Lisp and Verbmobil: A Translation System for Face-to-Face Dialog and Intelligent Help Systems for UNIX.
  • 14. Contents I Artificial Intelligence 1 Introduction 1 1.1 What Is AI? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 The Foundations of Artificial Intelligence . . . . . . . . . . . . . . . . . . 5 1.3 The History of Artificial Intelligence . . . . . . . . . . . . . . . . . . . . 16 1.4 The State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 1.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 29 2 Intelligent Agents 34 2.1 Agents and Environments . . . . . . . . . . . . . . . . . . . . . . . . . . 34 2.2 Good Behavior: The Concept of Rationality . . . . . . . . . . . . . . . . 36 2.3 The Nature of Environments . . . . . . . . . . . . . . . . . . . . . . . . . 40 2.4 The Structure of Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 2.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 59 II Problem-solving 3 Solving Problems by Searching 64 3.1 Problem-Solving Agents . . . . . . . . . . . . . . . . . . . . . . . . . . . 64 3.2 Example Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 3.3 Searching for Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 3.4 Uninformed Search Strategies . . . . . . . . . . . . . . . . . . . . . . . . 81 3.5 Informed (Heuristic) Search Strategies . . . . . . . . . . . . . . . . . . . 92 3.6 Heuristic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102 3.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 108 4 Beyond Classical Search 120 4.1 Local Search Algorithms and Optimization Problems . . . . . . . . . . . 120 4.2 Local Search in Continuous Spaces . . . . . . . . . . . . . . . . . . . . . 129 4.3 Searching with Nondeterministic Actions . . . . . . . . . . . . . . . . . . 133 4.4 Searching with Partial Observations . . . . . . . . . . . . . . . . . . . . . 138 4.5 Online Search Agents and Unknown Environments . . . . . . . . . . . . 147 4.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 153 5 Adversarial Search 161 5.1 Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161 5.2 Optimal Decisions in Games . . . . . . . . . . . . . . . . . . . . . . . . 163 5.3 Alpha–Beta Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 5.4 Imperfect Real-Time Decisions . . . . . . . . . . . . . . . . . . . . . . . 171 5.5 Stochastic Games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 xiii
  • 15. xiv Contents 5.6 Partially Observable Games . . . . . . . . . . . . . . . . . . . . . . . . . 180 5.7 State-of-the-Art Game Programs . . . . . . . . . . . . . . . . . . . . . . 185 5.8 Alternative Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 5.9 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 189 6 Constraint Satisfaction Problems 202 6.1 Defining Constraint Satisfaction Problems . . . . . . . . . . . . . . . . . 202 6.2 Constraint Propagation: Inference in CSPs . . . . . . . . . . . . . . . . . 208 6.3 Backtracking Search for CSPs . . . . . . . . . . . . . . . . . . . . . . . . 214 6.4 Local Search for CSPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 6.5 The Structure of Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 222 6.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 227 III Knowledge, reasoning, and planning 7 Logical Agents 234 7.1 Knowledge-Based Agents . . . . . . . . . . . . . . . . . . . . . . . . . . 235 7.2 The Wumpus World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 7.3 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240 7.4 Propositional Logic: A Very Simple Logic . . . . . . . . . . . . . . . . . 243 7.5 Propositional Theorem Proving . . . . . . . . . . . . . . . . . . . . . . . 249 7.6 Effective Propositional Model Checking . . . . . . . . . . . . . . . . . . 259 7.7 Agents Based on Propositional Logic . . . . . . . . . . . . . . . . . . . . 265 7.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 274 8 First-Order Logic 285 8.1 Representation Revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 285 8.2 Syntax and Semantics of First-Order Logic . . . . . . . . . . . . . . . . . 290 8.3 Using First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . 300 8.4 Knowledge Engineering in First-Order Logic . . . . . . . . . . . . . . . . 307 8.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 313 9 Inference in First-Order Logic 322 9.1 Propositional vs. First-Order Inference . . . . . . . . . . . . . . . . . . . 322 9.2 Unification and Lifting . . . . . . . . . . . . . . . . . . . . . . . . . . . 325 9.3 Forward Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330 9.4 Backward Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337 9.5 Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345 9.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 357 10 Classical Planning 366 10.1 Definition of Classical Planning . . . . . . . . . . . . . . . . . . . . . . . 366 10.2 Algorithms for Planning as State-Space Search . . . . . . . . . . . . . . . 373 10.3 Planning Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
  • 16. Contents xv 10.4 Other Classical Planning Approaches . . . . . . . . . . . . . . . . . . . . 387 10.5 Analysis of Planning Approaches . . . . . . . . . . . . . . . . . . . . . . 392 10.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 393 11 Planning and Acting in the Real World 401 11.1 Time, Schedules, and Resources . . . . . . . . . . . . . . . . . . . . . . . 401 11.2 Hierarchical Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 406 11.3 Planning and Acting in Nondeterministic Domains . . . . . . . . . . . . . 415 11.4 Multiagent Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425 11.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 430 12 Knowledge Representation 437 12.1 Ontological Engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . 437 12.2 Categories and Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . 440 12.3 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446 12.4 Mental Events and Mental Objects . . . . . . . . . . . . . . . . . . . . . 450 12.5 Reasoning Systems for Categories . . . . . . . . . . . . . . . . . . . . . 453 12.6 Reasoning with Default Information . . . . . . . . . . . . . . . . . . . . 458 12.7 The Internet Shopping World . . . . . . . . . . . . . . . . . . . . . . . . 462 12.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 467 IV Uncertain knowledge and reasoning 13 Quantifying Uncertainty 480 13.1 Acting under Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . 480 13.2 Basic Probability Notation . . . . . . . . . . . . . . . . . . . . . . . . . . 483 13.3 Inference Using Full Joint Distributions . . . . . . . . . . . . . . . . . . . 490 13.4 Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494 13.5 Bayes’ Rule and Its Use . . . . . . . . . . . . . . . . . . . . . . . . . . . 495 13.6 The Wumpus World Revisited . . . . . . . . . . . . . . . . . . . . . . . . 499 13.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 503 14 Probabilistic Reasoning 510 14.1 Representing Knowledge in an Uncertain Domain . . . . . . . . . . . . . 510 14.2 The Semantics of Bayesian Networks . . . . . . . . . . . . . . . . . . . . 513 14.3 Efficient Representation of Conditional Distributions . . . . . . . . . . . . 518 14.4 Exact Inference in Bayesian Networks . . . . . . . . . . . . . . . . . . . 522 14.5 Approximate Inference in Bayesian Networks . . . . . . . . . . . . . . . 530 14.6 Relational and First-Order Probability Models . . . . . . . . . . . . . . . 539 14.7 Other Approaches to Uncertain Reasoning . . . . . . . . . . . . . . . . . 546 14.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 551 15 Probabilistic Reasoning over Time 566 15.1 Time and Uncertainty . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
  • 17. xvi Contents 15.2 Inference in Temporal Models . . . . . . . . . . . . . . . . . . . . . . . . 570 15.3 Hidden Markov Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 578 15.4 Kalman Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584 15.5 Dynamic Bayesian Networks . . . . . . . . . . . . . . . . . . . . . . . . 590 15.6 Keeping Track of Many Objects . . . . . . . . . . . . . . . . . . . . . . . 599 15.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 603 16 Making Simple Decisions 610 16.1 Combining Beliefs and Desires under Uncertainty . . . . . . . . . . . . . 610 16.2 The Basis of Utility Theory . . . . . . . . . . . . . . . . . . . . . . . . . 611 16.3 Utility Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615 16.4 Multiattribute Utility Functions . . . . . . . . . . . . . . . . . . . . . . . 622 16.5 Decision Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626 16.6 The Value of Information . . . . . . . . . . . . . . . . . . . . . . . . . . 628 16.7 Decision-Theoretic Expert Systems . . . . . . . . . . . . . . . . . . . . . 633 16.8 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 636 17 Making Complex Decisions 645 17.1 Sequential Decision Problems . . . . . . . . . . . . . . . . . . . . . . . . 645 17.2 Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652 17.3 Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656 17.4 Partially Observable MDPs . . . . . . . . . . . . . . . . . . . . . . . . . 658 17.5 Decisions with Multiple Agents: Game Theory . . . . . . . . . . . . . . . 666 17.6 Mechanism Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679 17.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 684 V Learning 18 Learning from Examples 693 18.1 Forms of Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693 18.2 Supervised Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695 18.3 Learning Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . 697 18.4 Evaluating and Choosing the Best Hypothesis . . . . . . . . . . . . . . . 708 18.5 The Theory of Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 713 18.6 Regression and Classification with Linear Models . . . . . . . . . . . . . 717 18.7 Artificial Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . 727 18.8 Nonparametric Models . . . . . . . . . . . . . . . . . . . . . . . . . . . 737 18.9 Support Vector Machines . . . . . . . . . . . . . . . . . . . . . . . . . . 744 18.10 Ensemble Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748 18.11 Practical Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . 753 18.12 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 757 19 Knowledge in Learning 768 19.1 A Logical Formulation of Learning . . . . . . . . . . . . . . . . . . . . . 768
  • 18. Contents xvii 19.2 Knowledge in Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 777 19.3 Explanation-Based Learning . . . . . . . . . . . . . . . . . . . . . . . . 780 19.4 Learning Using Relevance Information . . . . . . . . . . . . . . . . . . . 784 19.5 Inductive Logic Programming . . . . . . . . . . . . . . . . . . . . . . . . 788 19.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 797 20 Learning Probabilistic Models 802 20.1 Statistical Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 802 20.2 Learning with Complete Data . . . . . . . . . . . . . . . . . . . . . . . . 806 20.3 Learning with Hidden Variables: The EM Algorithm . . . . . . . . . . . . 816 20.4 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 825 21 Reinforcement Learning 830 21.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 830 21.2 Passive Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . 832 21.3 Active Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . 839 21.4 Generalization in Reinforcement Learning . . . . . . . . . . . . . . . . . 845 21.5 Policy Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848 21.6 Applications of Reinforcement Learning . . . . . . . . . . . . . . . . . . 850 21.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 853 VI Communicating, perceiving, and acting 22 Natural Language Processing 860 22.1 Language Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860 22.2 Text Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 865 22.3 Information Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . 867 22.4 Information Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 873 22.5 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 882 23 Natural Language for Communication 888 23.1 Phrase Structure Grammars . . . . . . . . . . . . . . . . . . . . . . . . . 888 23.2 Syntactic Analysis (Parsing) . . . . . . . . . . . . . . . . . . . . . . . . . 892 23.3 Augmented Grammars and Semantic Interpretation . . . . . . . . . . . . 897 23.4 Machine Translation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 907 23.5 Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 912 23.6 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 918 24 Perception 928 24.1 Image Formation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 929 24.2 Early Image-Processing Operations . . . . . . . . . . . . . . . . . . . . . 935 24.3 Object Recognition by Appearance . . . . . . . . . . . . . . . . . . . . . 942 24.4 Reconstructing the 3D World . . . . . . . . . . . . . . . . . . . . . . . . 947 24.5 Object Recognition from Structural Information . . . . . . . . . . . . . . 957
  • 19. xviii Contents 24.6 Using Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 961 24.7 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 965 25 Robotics 971 25.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971 25.2 Robot Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 973 25.3 Robotic Perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978 25.4 Planning to Move . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986 25.5 Planning Uncertain Movements . . . . . . . . . . . . . . . . . . . . . . . 993 25.6 Moving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997 25.7 Robotic Software Architectures . . . . . . . . . . . . . . . . . . . . . . . 1003 25.8 Application Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1006 25.9 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 1010 VII Conclusions 26 Philosophical Foundations 1020 26.1 Weak AI: Can Machines Act Intelligently? . . . . . . . . . . . . . . . . . 1020 26.2 Strong AI: Can Machines Really Think? . . . . . . . . . . . . . . . . . . 1026 26.3 The Ethics and Risks of Developing Artificial Intelligence . . . . . . . . . 1034 26.4 Summary, Bibliographical and Historical Notes, Exercises . . . . . . . . . 1040 27 AI: The Present and Future 1044 27.1 Agent Components . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1044 27.2 Agent Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1047 27.3 Are We Going in the Right Direction? . . . . . . . . . . . . . . . . . . . 1049 27.4 What If AI Does Succeed? . . . . . . . . . . . . . . . . . . . . . . . . . 1051 A Mathematical background 1053 A.1 Complexity Analysis and O() Notation . . . . . . . . . . . . . . . . . . . 1053 A.2 Vectors, Matrices, and Linear Algebra . . . . . . . . . . . . . . . . . . . 1055 A.3 Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057 B Notes on Languages and Algorithms 1060 B.1 Defining Languages with Backus–Naur Form (BNF) . . . . . . . . . . . . 1060 B.2 Describing Algorithms with Pseudocode . . . . . . . . . . . . . . . . . . 1061 B.3 Online Help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062 Bibliography 1063 Index 1095
1 INTRODUCTION

In which we try to explain why we consider artificial intelligence to be a subject most worthy of study, and in which we try to decide what exactly it is, this being a good thing to decide before embarking.

We call ourselves Homo sapiens—man the wise—because our intelligence is so important to us. For thousands of years, we have tried to understand how we think; that is, how a mere handful of matter can perceive, understand, predict, and manipulate a world far larger and more complicated than itself. The field of artificial intelligence, or AI, goes further still: it attempts not just to understand but also to build intelligent entities.

AI is one of the newest fields in science and engineering. Work started in earnest soon after World War II, and the name itself was coined in 1956. Along with molecular biology, AI is regularly cited as the “field I would most like to be in” by scientists in other disciplines. A student in physics might reasonably feel that all the good ideas have already been taken by Galileo, Newton, Einstein, and the rest. AI, on the other hand, still has openings for several full-time Einsteins and Edisons.

AI currently encompasses a huge variety of subfields, ranging from the general (learning and perception) to the specific, such as playing chess, proving mathematical theorems, writing poetry, driving a car on a crowded street, and diagnosing diseases. AI is relevant to any intellectual task; it is truly a universal field.

1.1 WHAT IS AI?

We have claimed that AI is exciting, but we have not said what it is. In Figure 1.1 we see eight definitions of AI, laid out along two dimensions. The definitions on top are concerned with thought processes and reasoning, whereas the ones on the bottom address behavior. The definitions on the left measure success in terms of fidelity to human performance, whereas the ones on the right measure against an ideal performance measure, called rationality. A system is rational if it does the “right thing,” given what it knows.

Figure 1.1 Some definitions of artificial intelligence, organized into four categories.
Thinking Humanly: “The exciting new effort to make computers think . . . machines with minds, in the full and literal sense.” (Haugeland, 1985); “[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning . . .” (Bellman, 1978)
Thinking Rationally: “The study of mental faculties through the use of computational models.” (Charniak and McDermott, 1985); “The study of the computations that make it possible to perceive, reason, and act.” (Winston, 1992)
Acting Humanly: “The art of creating machines that perform functions that require intelligence when performed by people.” (Kurzweil, 1990); “The study of how to make computers do things at which, at the moment, people are better.” (Rich and Knight, 1991)
Acting Rationally: “Computational Intelligence is the study of the design of intelligent agents.” (Poole et al., 1998); “AI . . . is concerned with intelligent behavior in artifacts.” (Nilsson, 1998)

Historically, all four approaches to AI have been followed, each by different people with different methods. A human-centered approach must be in part an empirical science, involving observations and hypotheses about human behavior. A rationalist[1] approach involves a combination of mathematics and engineering. The various groups have both disparaged and helped each other. Let us look at the four approaches in more detail.

1.1.1 Acting humanly: The Turing Test approach

The Turing Test, proposed by Alan Turing (1950), was designed to provide a satisfactory operational definition of intelligence. A computer passes the test if a human interrogator, after posing some written questions, cannot tell whether the written responses come from a person or from a computer. Chapter 26 discusses the details of the test and whether a computer would really be intelligent if it passed. For now, we note that programming a computer to pass a rigorously applied test provides plenty to work on. The computer would need to possess the following capabilities:
• natural language processing to enable it to communicate successfully in English;
• knowledge representation to store what it knows or hears;
• automated reasoning to use the stored information to answer questions and to draw new conclusions;
• machine learning to adapt to new circumstances and to detect and extrapolate patterns.

[1] By distinguishing between human and rational behavior, we are not suggesting that humans are necessarily “irrational” in the sense of “emotionally unstable” or “insane.” One merely need note that we are not perfect: not all chess players are grandmasters; and, unfortunately, not everyone gets an A on the exam. Some systematic errors in human reasoning are cataloged by Kahneman et al. (1982).
Turing’s test deliberately avoided direct physical interaction between the interrogator and the computer, because physical simulation of a person is unnecessary for intelligence. However, the so-called total Turing Test includes a video signal so that the interrogator can test the subject’s perceptual abilities, as well as the opportunity for the interrogator to pass physical objects “through the hatch.” To pass the total Turing Test, the computer will need
• computer vision to perceive objects, and
• robotics to manipulate objects and move about.

These six disciplines compose most of AI, and Turing deserves credit for designing a test that remains relevant 60 years later. Yet AI researchers have devoted little effort to passing the Turing Test, believing that it is more important to study the underlying principles of intelligence than to duplicate an exemplar. The quest for “artificial flight” succeeded when the Wright brothers and others stopped imitating birds and started using wind tunnels and learning about aerodynamics. Aeronautical engineering texts do not define the goal of their field as making “machines that fly so exactly like pigeons that they can fool even other pigeons.”

1.1.2 Thinking humanly: The cognitive modeling approach

If we are going to say that a given program thinks like a human, we must have some way of determining how humans think. We need to get inside the actual workings of human minds. There are three ways to do this: through introspection—trying to catch our own thoughts as they go by; through psychological experiments—observing a person in action; and through brain imaging—observing the brain in action. Once we have a sufficiently precise theory of the mind, it becomes possible to express the theory as a computer program. If the program’s input–output behavior matches corresponding human behavior, that is evidence that some of the program’s mechanisms could also be operating in humans. For example, Allen Newell and Herbert Simon, who developed GPS, the “General Problem Solver” (Newell and Simon, 1961), were not content merely to have their program solve problems correctly. They were more concerned with comparing the trace of its reasoning steps to traces of human subjects solving the same problems. The interdisciplinary field of cognitive science brings together computer models from AI and experimental techniques from psychology to construct precise and testable theories of the human mind.

Cognitive science is a fascinating field in itself, worthy of several textbooks and at least one encyclopedia (Wilson and Keil, 1999). We will occasionally comment on similarities or differences between AI techniques and human cognition. Real cognitive science, however, is necessarily based on experimental investigation of actual humans or animals. We will leave that for other books, as we assume the reader has only a computer for experimentation.

In the early days of AI there was often confusion between the approaches: an author would argue that an algorithm performs well on a task and that it is therefore a good model of human performance, or vice versa. Modern authors separate the two kinds of claims; this distinction has allowed both AI and cognitive science to develop more rapidly. The two fields continue to fertilize each other, most notably in computer vision, which incorporates neurophysiological evidence into computational models.
1.1.3 Thinking rationally: The “laws of thought” approach

The Greek philosopher Aristotle was one of the first to attempt to codify “right thinking,” that is, irrefutable reasoning processes. His syllogisms provided patterns for argument structures that always yielded correct conclusions when given correct premises—for example, “Socrates is a man; all men are mortal; therefore, Socrates is mortal.” These laws of thought were supposed to govern the operation of the mind; their study initiated the field called logic.

Logicians in the 19th century developed a precise notation for statements about all kinds of objects in the world and the relations among them. (Contrast this with ordinary arithmetic notation, which provides only for statements about numbers.) By 1965, programs existed that could, in principle, solve any solvable problem described in logical notation. (Although if no solution exists, the program might loop forever.) The so-called logicist tradition within artificial intelligence hopes to build on such programs to create intelligent systems.

There are two main obstacles to this approach. First, it is not easy to take informal knowledge and state it in the formal terms required by logical notation, particularly when the knowledge is less than 100% certain. Second, there is a big difference between solving a problem “in principle” and solving it in practice. Even problems with just a few hundred facts can exhaust the computational resources of any computer unless it has some guidance as to which reasoning steps to try first. Although both of these obstacles apply to any attempt to build computational reasoning systems, they appeared first in the logicist tradition.

1.1.4 Acting rationally: The rational agent approach

An agent is just something that acts (agent comes from the Latin agere, to do). Of course, all computer programs do something, but computer agents are expected to do more: operate autonomously, perceive their environment, persist over a prolonged time period, adapt to change, and create and pursue goals. A rational agent is one that acts so as to achieve the best outcome or, when there is uncertainty, the best expected outcome. (A schematic sketch of this idea appears below, after Section 1.2.1.)

In the “laws of thought” approach to AI, the emphasis was on correct inferences. Making correct inferences is sometimes part of being a rational agent, because one way to act rationally is to reason logically to the conclusion that a given action will achieve one’s goals and then to act on that conclusion. On the other hand, correct inference is not all of rationality; in some situations, there is no provably correct thing to do, but something must still be done. There are also ways of acting rationally that cannot be said to involve inference. For example, recoiling from a hot stove is a reflex action that is usually more successful than a slower action taken after careful deliberation.

All the skills needed for the Turing Test also allow an agent to act rationally. Knowledge representation and reasoning enable agents to reach good decisions. We need to be able to generate comprehensible sentences in natural language to get by in a complex society. We need learning not only for erudition, but also because it improves our ability to generate effective behavior.

The rational-agent approach has two advantages over the other approaches. First, it is more general than the “laws of thought” approach because correct inference is just one of several possible mechanisms for achieving rationality. Second, it is more amenable to scientific development than are approaches based on human behavior or human thought. The standard of rationality is mathematically well defined and completely general, and can be “unpacked” to generate agent designs that provably achieve it. Human behavior, on the other hand, is well adapted for one specific environment and is defined by, well, the sum total of all the things that humans do. This book therefore concentrates on general principles of rational agents and on components for constructing them. We will see that despite the apparent simplicity with which the problem can be stated, an enormous variety of issues come up when we try to solve it. Chapter 2 outlines some of these issues in more detail.

One important point to keep in mind: We will see before too long that achieving perfect rationality—always doing the right thing—is not feasible in complicated environments. The computational demands are just too high. For most of the book, however, we will adopt the working hypothesis that perfect rationality is a good starting point for analysis. It simplifies the problem and provides the appropriate setting for most of the foundational material in the field. Chapters 5 and 17 deal explicitly with the issue of limited rationality—acting appropriately when there is not enough time to do all the computations one might like.

1.2 THE FOUNDATIONS OF ARTIFICIAL INTELLIGENCE

In this section, we provide a brief history of the disciplines that contributed ideas, viewpoints, and techniques to AI. Like any history, this one is forced to concentrate on a small number of people, events, and ideas and to ignore others that also were important. We organize the history around a series of questions. We certainly would not wish to give the impression that these questions are the only ones the disciplines address or that the disciplines have all been working toward AI as their ultimate fruition.

1.2.1 Philosophy

• Can formal rules be used to draw valid conclusions?
• How does the mind arise from a physical brain?
• Where does knowledge come from?
• How does knowledge lead to action?

Aristotle (384–322 B.C.), whose bust appears on the front cover of this book, was the first to formulate a precise set of laws governing the rational part of the mind. He developed an informal system of syllogisms for proper reasoning, which in principle allowed one to generate conclusions mechanically, given initial premises. Much later, Ramon Lull (d. 1315) had the idea that useful reasoning could actually be carried out by a mechanical artifact. Thomas Hobbes (1588–1679) proposed that reasoning was like numerical computation, that “we add and subtract in our silent thoughts.” The automation of computation itself was already well under way. Around 1500, Leonardo da Vinci (1452–1519) designed but did not build a mechanical calculator; recent reconstructions have shown the design to be functional. The first known calculating machine was constructed around 1623 by the German scientist Wilhelm Schickard (1592–1635), although the Pascaline, built in 1642 by Blaise Pascal (1623–1662), is more famous. Pascal wrote that “the arithmetical machine produces effects which appear nearer to thought than all the actions of animals.” Gottfried Wilhelm Leibniz (1646–1716) built a mechanical device intended to carry out operations on concepts rather than numbers, but its scope was rather limited. Leibniz did surpass Pascal by building a calculator that could add, subtract, multiply, and take roots, whereas the Pascaline could only add and subtract. Some speculated that machines might not just do calculations but actually be able to think and act on their own. In his 1651 book Leviathan, Thomas Hobbes suggested the idea of an “artificial animal,” arguing “For what is the heart but a spring; and the nerves, but so many strings; and the joints, but so many wheels.”

It’s one thing to say that the mind operates, at least in part, according to logical rules, and to build physical systems that emulate some of those rules; it’s another to say that the mind itself is such a physical system. René Descartes (1596–1650) gave the first clear discussion of the distinction between mind and matter and of the problems that arise. One problem with a purely physical conception of the mind is that it seems to leave little room for free will: if the mind is governed entirely by physical laws, then it has no more free will than a rock “deciding” to fall toward the center of the earth. Descartes was a strong advocate of the power of reasoning in understanding the world, a philosophy now called rationalism, and one that counts Aristotle and Leibniz as members. But Descartes was also a proponent of dualism. He held that there is a part of the human mind (or soul or spirit) that is outside of nature, exempt from physical laws. Animals, on the other hand, did not possess this dual quality; they could be treated as machines. An alternative to dualism is materialism, which holds that the brain’s operation according to the laws of physics constitutes the mind. Free will is simply the way that the perception of available choices appears to the choosing entity.

Given a physical mind that manipulates knowledge, the next problem is to establish the source of knowledge. The empiricism movement, starting with Francis Bacon’s (1561–1626) Novum Organum,[2] is characterized by a dictum of John Locke (1632–1704): “Nothing is in the understanding, which was not first in the senses.” David Hume’s (1711–1776) A Treatise of Human Nature (Hume, 1739) proposed what is now known as the principle of induction: that general rules are acquired by exposure to repeated associations between their elements. Building on the work of Ludwig Wittgenstein (1889–1951) and Bertrand Russell (1872–1970), the famous Vienna Circle, led by Rudolf Carnap (1891–1970), developed the doctrine of logical positivism. This doctrine holds that all knowledge can be characterized by logical theories connected, ultimately, to observation sentences that correspond to sensory inputs; thus logical positivism combines rationalism and empiricism.[3] The confirmation theory of Carnap and Carl Hempel (1905–1997) attempted to analyze the acquisition of knowledge from experience. Carnap’s book The Logical Structure of the World (1928) defined an explicit computational procedure for extracting knowledge from elementary experiences. It was probably the first theory of mind as a computational process.

[2] The Novum Organum is an update of Aristotle’s Organon, or instrument of thought. Thus Aristotle can be seen as both an empiricist and a rationalist.
[3] In this picture, all meaningful statements can be verified or falsified either by experimentation or by analysis of the meaning of the words. Because this rules out most of metaphysics, as was the intention, logical positivism was unpopular in some circles.
  • 26. Section 1.2. The Foundations of Artificial Intelligence 7 The final element in the philosophical picture of the mind is the connection between knowledge and action. This question is vital to AI because intelligence requires action as well as reasoning. Moreover, only by understanding how actions are justified can we understand how to build an agent whose actions are justifiable (or rational). Aristotle argued (in De Motu Animalium) that actions are justified by a logical connection between goals and knowledge of the action’s outcome (the last part of this extract also appears on the front cover of this book, in the original Greek): But how does it happen that thinking is sometimes accompanied by action and sometimes not, sometimes by motion, and sometimes not? It looks as if almost the same thing happens as in the case of reasoning and making inferences about unchanging objects. But in that case the end is a speculative proposition . . . whereas here the conclusion which results from the two premises is an action. . . . I need covering; a cloak is a covering. I need a cloak. What I need, I have to make; I need a cloak. I have to make a cloak. And the conclusion, the “I have to make a cloak,” is an action. In the Nicomachean Ethics (Book III. 3, 1112b), Aristotle further elaborates on this topic, suggesting an algorithm: We deliberate not about ends, but about means. For a doctor does not deliberate whether he shall heal, nor an orator whether he shall persuade, . . . They assume the end and consider how and by what means it is attained, and if it seems easily and best produced thereby; while if it is achieved by one means only they consider how it will be achieved by this and by what means this will be achieved, till they come to the first cause, . . . and what is last in the order of analysis seems to be first in the order of becoming. And if we come on an impossibility, we give up the search, e.g., if we need money and this cannot be got; but if a thing appears possible we try to do it. Aristotle’s algorithm was implemented 2300 years later by Newell and Simon in their GPS program. We would now call it a regression planning system (see Chapter 10). Goal-based analysis is useful, but does not say what to do when several actions will achieve the goal or when no action will achieve it completely. Antoine Arnauld (1612–1694) correctly described a quantitative formula for deciding what action to take in cases like this (see Chapter 16). John Stuart Mill’s (1806–1873) book Utilitarianism (Mill, 1863) promoted the idea of rational decision criteria in all spheres of human activity. The more formal theory of decisions is discussed in the following section. 1.2.2 Mathematics • What are the formal rules to draw valid conclusions? • What can be computed? • How do we reason with uncertain information? Philosophers staked out some of the fundamental ideas of AI, but the leap to a formal science required a level of mathematical formalization in three fundamental areas: logic, computa- tion, and probability. The idea of formal logic can be traced back to the philosophers of ancient Greece, but its mathematical development really began with the work of George Boole (1815–1864), who
  • 27. 8 Chapter 1. Introduction worked out the details of propositional, or Boolean, logic (Boole, 1847). In 1879, Gottlob Frege (1848–1925) extended Boole’s logic to include objects and relations, creating the first- order logic that is used today.4 Alfred Tarski (1902–1983) introduced a theory of reference that shows how to relate the objects in a logic to objects in the real world. The next step was to determine the limits of what could be done with logic and com- putation. The first nontrivial algorithm is thought to be Euclid’s algorithm for computing ALGORITHM greatest common divisors. The word algorithm (and the idea of studying them) comes from al-Khowarazmi, a Persian mathematician of the 9th century, whose writings also introduced Arabic numerals and algebra to Europe. Boole and others discussed algorithms for logical deduction, and, by the late 19th century, efforts were under way to formalize general mathe- matical reasoning as logical deduction. In 1930, Kurt Gödel (1906–1978) showed that there exists an effective procedure to prove any true statement in the first-order logic of Frege and Russell, but that first-order logic could not capture the principle of mathematical induction needed to characterize the natural numbers. In 1931, Gödel showed that limits on deduc- tion do exist. His incompleteness theorem showed that in any formal theory as strong as INCOMPLETENESS THEOREM Peano arithmetic (the elementary theory of natural numbers), there are true statements that are undecidable in the sense that they have no proof within the theory. This fundamental result can also be interpreted as showing that some functions on the integers cannot be represented by an algorithm—that is, they cannot be computed. This motivated Alan Turing (1912–1954) to try to characterize exactly which functions are com- putable—capable of being computed. This notion is actually slightly problematic because COMPUTABLE the notion of a computation or effective procedure really cannot be given a formal definition. However, the Church–Turing thesis, which states that the Turing machine (Turing, 1936) is capable of computing any computable function, is generally accepted as providing a sufficient definition. Turing also showed that there were some functions that no Turing machine can compute. For example, no machine can tell in general whether a given program will return an answer on a given input or run forever. Although decidability and computability are important to an understanding of computa- tion, the notion of tractability has had an even greater impact. Roughly speaking, a problem TRACTABILITY is called intractable if the time required to solve instances of the problem grows exponentially with the size of the instances. The distinction between polynomial and exponential growth in complexity was first emphasized in the mid-1960s (Cobham, 1964; Edmonds, 1965). It is important because exponential growth means that even moderately large instances cannot be solved in any reasonable time. Therefore, one should strive to divide the overall problem of generating intelligent behavior into tractable subproblems rather than intractable ones. How can one recognize an intractable problem? The theory of NP-completeness, pio- NP-COMPLETENESS neered by Steven Cook (1971) and Richard Karp (1972), provides a method. Cook and Karp showed the existence of large classes of canonical combinatorial search and reasoning prob- lems that are NP-complete. 
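To make the exponential blow-up concrete, here is a minimal and deliberately naive satisfiability checker; this is our illustration, not anything from the text. It decides a propositional formula in conjunctive normal form by trying every truth assignment, so its running time grows as 2^n in the number of variables, which is exactly why brute force fails on even moderately large combinatorial problems.

```python
from itertools import product

def brute_force_satisfiable(clauses, n_vars):
    """Decide satisfiability of a CNF formula by exhaustive search.

    Each clause is a list of nonzero integers: k means "variable k is true",
    -k means "variable k is false" (a DIMACS-style convention).  The loop
    below examines up to 2**n_vars assignments -- the exponential growth
    that makes such problems intractable in practice.
    """
    for bits in product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# (x1 or x2) and (not x1 or x2) and (x1 or not x2) is satisfiable with x1 = x2 = true.
print(brute_force_satisfiable([[1, 2], [-1, 2], [1, -2]], n_vars=2))  # True
```

Ten variables mean about a thousand assignments; a hundred variables mean more than 10^30, which no amount of hardware makes feasible.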
Any problem class to which the class of NP-complete problems can be reduced is likely to be intractable. (Although it has not been proved that NP-complete 4 Frege’s proposed notation for first-order logic—an arcane combination of textual and geometric features— never became popular.
  • 28. Section 1.2. The Foundations of Artificial Intelligence 9 problems are necessarily intractable, most theoreticians believe it.) These results contrast with the optimism with which the popular press greeted the first computers—“Electronic Super-Brains” that were “Faster than Einstein!” Despite the increasing speed of computers, careful use of resources will characterize intelligent systems. Put crudely, the world is an extremely large problem instance! Work in AI has helped explain why some instances of NP-complete problems are hard, yet others are easy (Cheeseman et al., 1991). Besides logic and computation, the third great contribution of mathematics to AI is the theory of probability. The Italian Gerolamo Cardano (1501–1576) first framed the idea of PROBABILITY probability, describing it in terms of the possible outcomes of gambling events. In 1654, Blaise Pascal (1623–1662), in a letter to Pierre Fermat (1601–1665), showed how to pre- dict the future of an unfinished gambling game and assign average payoffs to the gamblers. Probability quickly became an invaluable part of all the quantitative sciences, helping to deal with uncertain measurements and incomplete theories. James Bernoulli (1654–1705), Pierre Laplace (1749–1827), and others advanced the theory and introduced new statistical meth- ods. Thomas Bayes (1702–1761), who appears on the front cover of this book, proposed a rule for updating probabilities in the light of new evidence. Bayes’ rule underlies most modern approaches to uncertain reasoning in AI systems. 1.2.3 Economics • How should we make decisions so as to maximize payoff? • How should we do this when others may not go along? • How should we do this when the payoff may be far in the future? The science of economics got its start in 1776, when Scottish philosopher Adam Smith (1723–1790) published An Inquiry into the Nature and Causes of the Wealth of Nations. While the ancient Greeks and others had made contributions to economic thought, Smith was the first to treat it as a science, using the idea that economies can be thought of as consist- ing of individual agents maximizing their own economic well-being. Most people think of economics as being about money, but economists will say that they are really studying how people make choices that lead to preferred outcomes. When McDonald’s offers a hamburger for a dollar, they are asserting that they would prefer the dollar and hoping that customers will prefer the hamburger. The mathematical treatment of “preferred outcomes” or utility was UTILITY first formalized by Léon Walras (pronounced “Valrasse”) (1834-1910) and was improved by Frank Ramsey (1931) and later by John von Neumann and Oskar Morgenstern in their book The Theory of Games and Economic Behavior (1944). Decision theory, which combines probability theory with utility theory, provides a for- DECISION THEORY mal and complete framework for decisions (economic or otherwise) made under uncertainty— that is, in cases where probabilistic descriptions appropriately capture the decision maker’s environment. This is suitable for “large” economies where each agent need pay no attention to the actions of other agents as individuals. For “small” economies, the situation is much more like a game: the actions of one player can significantly affect the utility of another (either positively or negatively). Von Neumann and Morgenstern’s development of game theory (see also Luce and Raiffa, 1957) included the surprising result that, for some games, GAME THEORY
a rational agent should adopt policies that are (or at least appear to be) randomized. Unlike decision theory, game theory does not offer an unambiguous prescription for selecting actions.

For the most part, economists did not address the third question listed above, namely, how to make rational decisions when payoffs from actions are not immediate but instead result from several actions taken in sequence. This topic was pursued in the field of operations research, which emerged in World War II from efforts in Britain to optimize radar installations, and later found civilian applications in complex management decisions. The work of Richard Bellman (1957) formalized a class of sequential decision problems called Markov decision processes, which we study in Chapters 17 and 21.

Work in economics and operations research has contributed much to our notion of rational agents, yet for many years AI research developed along entirely separate paths. One reason was the apparent complexity of making rational decisions. The pioneering AI researcher Herbert Simon (1916–2001) won the Nobel Prize in economics in 1978 for his early work showing that models based on satisficing—making decisions that are “good enough,” rather than laboriously calculating an optimal decision—gave a better description of actual human behavior (Simon, 1947). Since the 1990s, there has been a resurgence of interest in decision-theoretic techniques for agent systems (Wellman, 1995).

1.2.4 Neuroscience

• How do brains process information?

Neuroscience is the study of the nervous system, particularly the brain. Although the exact way in which the brain enables thought is one of the great mysteries of science, the fact that it does enable thought has been appreciated for thousands of years because of the evidence that strong blows to the head can lead to mental incapacitation. It has also long been known that human brains are somehow different; in about 335 B.C. Aristotle wrote, “Of all the animals, man has the largest brain in proportion to his size.”5 Still, it was not until the middle of the 18th century that the brain was widely recognized as the seat of consciousness. Before then, candidate locations included the heart and the spleen.

Paul Broca’s (1824–1880) study of aphasia (speech deficit) in brain-damaged patients in 1861 demonstrated the existence of localized areas of the brain responsible for specific cognitive functions. In particular, he showed that speech production was localized to the portion of the left hemisphere now called Broca’s area.6 By that time, it was known that the brain consisted of nerve cells, or neurons, but it was not until 1873 that Camillo Golgi (1843–1926) developed a staining technique allowing the observation of individual neurons in the brain (see Figure 1.2). This technique was used by Santiago Ramon y Cajal (1852–1934) in his pioneering studies of the brain’s neuronal structures.7 Nicolas Rashevsky (1936, 1938) was the first to apply mathematical models to the study of the nervous system.

5 Since then, it has been discovered that the tree shrew (Scandentia) has a higher ratio of brain to body mass.
6 Many cite Alexander Hood (1824) as a possible prior source.
7 Golgi persisted in his belief that the brain’s functions were carried out primarily in a continuous medium in which neurons were embedded, whereas Cajal propounded the “neuronal doctrine.” The two shared the Nobel prize in 1906 but gave mutually antagonistic acceptance speeches.
[Figure 1.2: a labeled drawing of a neuron, showing the axon, cell body or soma, nucleus, dendrites, synapses, axonal arborization, and an axon arriving from another cell.]

Figure 1.2 The parts of a nerve cell or neuron. Each neuron consists of a cell body, or soma, that contains a cell nucleus. Branching out from the cell body are a number of fibers called dendrites and a single long fiber called the axon. The axon stretches out for a long distance, much longer than the scale in this diagram indicates. Typically, an axon is 1 cm long (100 times the diameter of the cell body), but can reach up to 1 meter. A neuron makes connections with 10 to 100,000 other neurons at junctions called synapses. Signals are propagated from neuron to neuron by a complicated electrochemical reaction. The signals control brain activity in the short term and also enable long-term changes in the connectivity of neurons. These mechanisms are thought to form the basis for learning in the brain. Most information processing goes on in the cerebral cortex, the outer layer of the brain. The basic organizational unit appears to be a column of tissue about 0.5 mm in diameter, containing about 20,000 neurons and extending the full depth of the cortex (about 4 mm in humans).

We now have some data on the mapping between areas of the brain and the parts of the body that they control or from which they receive sensory input. Such mappings are able to change radically over the course of a few weeks, and some animals seem to have multiple maps. Moreover, we do not fully understand how other areas can take over functions when one area is damaged. There is almost no theory on how an individual memory is stored.

The measurement of intact brain activity began in 1929 with the invention by Hans Berger of the electroencephalograph (EEG). The recent development of functional magnetic resonance imaging (fMRI) (Ogawa et al., 1990; Cabeza and Nyberg, 2001) is giving neuroscientists unprecedentedly detailed images of brain activity, enabling measurements that correspond in interesting ways to ongoing cognitive processes. These are augmented by advances in single-cell recording of neuron activity. Individual neurons can be stimulated electrically, chemically, or even optically (Han and Boyden, 2007), allowing neuronal input–output relationships to be mapped. Despite these advances, we are still a long way from understanding how cognitive processes actually work. The truly amazing conclusion is that a collection of simple cells can lead to thought, action, and consciousness or, in the pithy words of John Searle (1992), brains cause minds.
                      Supercomputer                   Personal Computer          Human Brain
Computational units   10^4 CPUs, 10^12 transistors    4 CPUs, 10^9 transistors   10^11 neurons
Storage units         10^14 bits RAM                  10^11 bits RAM             10^11 neurons
                      10^15 bits disk                 10^13 bits disk            10^14 synapses
Cycle time            10^−9 sec                       10^−9 sec                  10^−3 sec
Operations/sec        10^15                           10^10                      10^17
Memory updates/sec    10^14                           10^10                      10^14

Figure 1.3 A crude comparison of the raw computational resources available to the IBM BLUE GENE supercomputer, a typical personal computer of 2008, and the human brain. The brain’s numbers are essentially fixed, whereas the supercomputer’s numbers have been increasing by a factor of 10 every 5 years or so, allowing it to achieve rough parity with the brain. The personal computer lags behind on all metrics except cycle time.

The only real alternative theory is mysticism: that minds operate in some mystical realm that is beyond physical science. Brains and digital computers have somewhat different properties. Figure 1.3 shows that computers have a cycle time that is a million times faster than a brain. The brain makes up for that with far more storage and interconnection than even a high-end personal computer, although the largest supercomputers have a capacity that is similar to the brain’s. (It should be noted, however, that the brain does not seem to use all of its neurons simultaneously.) Futurists make much of these numbers, pointing to an approaching singularity at which computers reach a superhuman level of performance (Vinge, 1993; Kurzweil, 2005), but the raw comparisons are not especially informative. Even with a computer of virtually unlimited capacity, we still would not know how to achieve the brain’s level of intelligence.

1.2.5 Psychology

• How do humans and animals think and act?

The origins of scientific psychology are usually traced to the work of the German physicist Hermann von Helmholtz (1821–1894) and his student Wilhelm Wundt (1832–1920). Helmholtz applied the scientific method to the study of human vision, and his Handbook of Physiological Optics is even now described as “the single most important treatise on the physics and physiology of human vision” (Nalwa, 1993, p.15). In 1879, Wundt opened the first laboratory of experimental psychology, at the University of Leipzig. Wundt insisted on carefully controlled experiments in which his workers would perform a perceptual or associative task while introspecting on their thought processes. The careful controls went a long way toward making psychology a science, but the subjective nature of the data made it unlikely that an experimenter would ever disconfirm his or her own theories. Biologists studying animal behavior, on the other hand, lacked introspective data and developed an objective methodology, as described by H. S. Jennings (1906) in his influential work Behavior of the Lower Organisms. Applying this viewpoint to humans, the behaviorism movement, led by John Watson (1878–1958), rejected any theory involving mental processes on the grounds
  • 32. Section 1.2. The Foundations of Artificial Intelligence 13 that introspection could not provide reliable evidence. Behaviorists insisted on studying only objective measures of the percepts (or stimulus) given to an animal and its resulting actions (or response). Behaviorism discovered a lot about rats and pigeons but had less success at understanding humans. Cognitive psychology, which views the brain as an information-processing device, COGNITIVE PSYCHOLOGY can be traced back at least to the works of William James (1842–1910). Helmholtz also insisted that perception involved a form of unconscious logical inference. The cognitive viewpoint was largely eclipsed by behaviorism in the United States, but at Cambridge’s Ap- plied Psychology Unit, directed by Frederic Bartlett (1886–1969), cognitive modeling was able to flourish. The Nature of Explanation, by Bartlett’s student and successor Kenneth Craik (1943), forcefully reestablished the legitimacy of such “mental” terms as beliefs and goals, arguing that they are just as scientific as, say, using pressure and temperature to talk about gases, despite their being made of molecules that have neither. Craik specified the three key steps of a knowledge-based agent: (1) the stimulus must be translated into an inter- nal representation, (2) the representation is manipulated by cognitive processes to derive new internal representations, and (3) these are in turn retranslated back into action. He clearly explained why this was a good design for an agent: If the organism carries a “small-scale model” of external reality and of its own possible actions within its head, it is able to try out various alternatives, conclude which is the best of them, react to future situations before they arise, utilize the knowledge of past events in dealing with the present and future, and in every way to react in a much fuller, safer, and more competent manner to the emergencies which face it. (Craik, 1943) After Craik’s death in a bicycle accident in 1945, his work was continued by Donald Broad- bent, whose book Perception and Communication (1958) was one of the first works to model psychological phenomena as information processing. Meanwhile, in the United States, the development of computer modeling led to the creation of the field of cognitive science. The field can be said to have started at a workshop in September 1956 at MIT. (We shall see that this is just two months after the conference at which AI itself was “born.”) At the workshop, George Miller presented The Magic Number Seven, Noam Chomsky presented Three Models of Language, and Allen Newell and Herbert Simon presented The Logic Theory Machine. These three influential papers showed how computer models could be used to address the psychology of memory, language, and logical thinking, respectively. It is now a common (although far from universal) view among psychologists that “a cognitive theory should be like a computer program” (Anderson, 1980); that is, it should describe a detailed information- processing mechanism whereby some cognitive function might be implemented. 1.2.6 Computer engineering • How can we build an efficient computer? For artificial intelligence to succeed, we need two things: intelligence and an artifact. The computer has been the artifact of choice. The modern digital electronic computer was in- vented independently and almost simultaneously by scientists in three countries embattled in
  • 33. 14 Chapter 1. Introduction World War II. The first operational computer was the electromechanical Heath Robinson,8 built in 1940 by Alan Turing’s team for a single purpose: deciphering German messages. In 1943, the same group developed the Colossus, a powerful general-purpose machine based on vacuum tubes.9 The first operational programmable computer was the Z-3, the inven- tion of Konrad Zuse in Germany in 1941. Zuse also invented floating-point numbers and the first high-level programming language, Plankalkül. The first electronic computer, the ABC, was assembled by John Atanasoff and his student Clifford Berry between 1940 and 1942 at Iowa State University. Atanasoff’s research received little support or recognition; it was the ENIAC, developed as part of a secret military project at the University of Pennsylvania by a team including John Mauchly and John Eckert, that proved to be the most influential forerunner of modern computers. Since that time, each generation of computer hardware has brought an increase in speed and capacity and a decrease in price. Performance doubled every 18 months or so until around 2005, when power dissipation problems led manufacturers to start multiplying the number of CPU cores rather than the clock speed. Current expectations are that future increases in power will come from massive parallelism—a curious convergence with the properties of the brain. Of course, there were calculating devices before the electronic computer. The earliest automated machines, dating from the 17th century, were discussed on page 6. The first pro- grammable machine was a loom, devised in 1805 by Joseph Marie Jacquard (1752–1834), that used punched cards to store instructions for the pattern to be woven. In the mid-19th century, Charles Babbage (1792–1871) designed two machines, neither of which he com- pleted. The Difference Engine was intended to compute mathematical tables for engineering and scientific projects. It was finally built and shown to work in 1991 at the Science Museum in London (Swade, 2000). Babbage’s Analytical Engine was far more ambitious: it included addressable memory, stored programs, and conditional jumps and was the first artifact capa- ble of universal computation. Babbage’s colleague Ada Lovelace, daughter of the poet Lord Byron, was perhaps the world’s first programmer. (The programming language Ada is named after her.) She wrote programs for the unfinished Analytical Engine and even speculated that the machine could play chess or compose music. AI also owes a debt to the software side of computer science, which has supplied the operating systems, programming languages, and tools needed to write modern programs (and papers about them). But this is one area where the debt has been repaid: work in AI has pio- neered many ideas that have made their way back to mainstream computer science, including time sharing, interactive interpreters, personal computers with windows and mice, rapid de- velopment environments, the linked list data type, automatic storage management, and key concepts of symbolic, functional, declarative, and object-oriented programming. 8 Heath Robinson was a cartoonist famous for his depictions of whimsical and absurdly complicated contrap- tions for everyday tasks such as buttering toast. 9 In the postwar period, Turing wanted to use these computers for AI research—for example, one of the first chess programs (Turing et al., 1953). His efforts were blocked by the British government.
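(A back-of-the-envelope check, ours rather than the book’s: doubling every 18 months compounds to roughly a factor of ten every five years, the same rate quoted for the supercomputer in the caption of Figure 1.3.)

$$2^{60/18} = 2^{10/3} \approx 10.1$$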
  • 34. Section 1.2. The Foundations of Artificial Intelligence 15 1.2.7 Control theory and cybernetics • How can artifacts operate under their own control? Ktesibios of Alexandria (c. 250 B.C.) built the first self-controlling machine: a water clock with a regulator that maintained a constant flow rate. This invention changed the definition of what an artifact could do. Previously, only living things could modify their behavior in response to changes in the environment. Other examples of self-regulating feedback control systems include the steam engine governor, created by James Watt (1736–1819), and the thermostat, invented by Cornelis Drebbel (1572–1633), who also invented the submarine. The mathematical theory of stable feedback systems was developed in the 19th century. The central figure in the creation of what is now called control theory was Norbert CONTROL THEORY Wiener (1894–1964). Wiener was a brilliant mathematician who worked with Bertrand Rus- sell, among others, before developing an interest in biological and mechanical control systems and their connection to cognition. Like Craik (who also used control systems as psychological models), Wiener and his colleagues Arturo Rosenblueth and Julian Bigelow challenged the behaviorist orthodoxy (Rosenblueth et al., 1943). They viewed purposive behavior as aris- ing from a regulatory mechanism trying to minimize “error”—the difference between current state and goal state. In the late 1940s, Wiener, along with Warren McCulloch, Walter Pitts, and John von Neumann, organized a series of influential conferences that explored the new mathematical and computational models of cognition. Wiener’s book Cybernetics (1948) be- CYBERNETICS came a bestseller and awoke the public to the possibility of artificially intelligent machines. Meanwhile, in Britain, W. Ross Ashby (Ashby, 1940) pioneered similar ideas. Ashby, Alan Turing, Grey Walter, and others formed the Ratio Club for “those who had Wiener’s ideas before Wiener’s book appeared.” Ashby’s Design for a Brain (1948, 1952) elaborated on his idea that intelligence could be created by the use of homeostatic devices containing appro- HOMEOSTATIC priate feedback loops to achieve stable adaptive behavior. Modern control theory, especially the branch known as stochastic optimal control, has as its goal the design of systems that maximize an objective function over time. This roughly OBJECTIVE FUNCTION matches our view of AI: designing systems that behave optimally. Why, then, are AI and control theory two different fields, despite the close connections among their founders? The answer lies in the close coupling between the mathematical techniques that were familiar to the participants and the corresponding sets of problems that were encompassed in each world view. Calculus and matrix algebra, the tools of control theory, lend themselves to systems that are describable by fixed sets of continuous variables, whereas AI was founded in part as a way to escape from the these perceived limitations. The tools of logical inference and computation allowed AI researchers to consider problems such as language, vision, and planning that fell completely outside the control theorist’s purview. 1.2.8 Linguistics • How does language relate to thought? In 1957, B. F. Skinner published Verbal Behavior. This was a comprehensive, detailed ac- count of the behaviorist approach to language learning, written by the foremost expert in
  • 35. 16 Chapter 1. Introduction the field. But curiously, a review of the book became as well known as the book itself, and served to almost kill off interest in behaviorism. The author of the review was the linguist Noam Chomsky, who had just published a book on his own theory, Syntactic Structures. Chomsky pointed out that the behaviorist theory did not address the notion of creativity in language—it did not explain how a child could understand and make up sentences that he or she had never heard before. Chomsky’s theory—based on syntactic models going back to the Indian linguist Panini (c. 350 B.C.)—could explain this, and unlike previous theories, it was formal enough that it could in principle be programmed. Modern linguistics and AI, then, were “born” at about the same time, and grew up together, intersecting in a hybrid field called computational linguistics or natural language COMPUTATIONAL LINGUISTICS processing. The problem of understanding language soon turned out to be considerably more complex than it seemed in 1957. Understanding language requires an understanding of the subject matter and context, not just an understanding of the structure of sentences. This might seem obvious, but it was not widely appreciated until the 1960s. Much of the early work in knowledge representation (the study of how to put knowledge into a form that a computer can reason with) was tied to language and informed by research in linguistics, which was connected in turn to decades of work on the philosophical analysis of language. 1.3 THE HISTORY OF ARTIFICIAL INTELLIGENCE With the background material behind us, we are ready to cover the development of AI itself. 1.3.1 The gestation of artificial intelligence (1943–1955) The first work that is now generally recognized as AI was done by Warren McCulloch and Walter Pitts (1943). They drew on three sources: knowledge of the basic physiology and function of neurons in the brain; a formal analysis of propositional logic due to Russell and Whitehead; and Turing’s theory of computation. They proposed a model of artificial neurons in which each neuron is characterized as being “on” or “off,” with a switch to “on” occurring in response to stimulation by a sufficient number of neighboring neurons. The state of a neuron was conceived of as “factually equivalent to a proposition which proposed its adequate stimulus.” They showed, for example, that any computable function could be computed by some network of connected neurons, and that all the logical connectives (and, or, not, etc.) could be implemented by simple net structures. McCulloch and Pitts also suggested that suitably defined networks could learn. Donald Hebb (1949) demonstrated a simple updating rule for modifying the connection strengths between neurons. His rule, now called Hebbian learning, remains an influential model to this day. HEBBIAN LEARNING Two undergraduate students at Harvard, Marvin Minsky and Dean Edmonds, built the first neural network computer in 1950. The SNARC, as it was called, used 3000 vacuum tubes and a surplus automatic pilot mechanism from a B-24 bomber to simulate a network of 40 neurons. Later, at Princeton, Minsky studied universal computation in neural networks. His Ph.D. committee was skeptical about whether this kind of work should be considered
  • 36. Section 1.3. The History of Artificial Intelligence 17 mathematics, but von Neumann reportedly said, “If it isn’t now, it will be someday.” Minsky was later to prove influential theorems showing the limitations of neural network research. There were a number of early examples of work that can be characterized as AI, but Alan Turing’s vision was perhaps the most influential. He gave lectures on the topic as early as 1947 at the London Mathematical Society and articulated a persuasive agenda in his 1950 article “Computing Machinery and Intelligence.” Therein, he introduced the Turing Test, machine learning, genetic algorithms, and reinforcement learning. He proposed the Child Programme idea, explaining “Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulated the child’s?” 1.3.2 The birth of artificial intelligence (1956) Princeton was home to another influential figure in AI, John McCarthy. After receiving his PhD there in 1951 and working for two years as an instructor, McCarthy moved to Stan- ford and then to Dartmouth College, which was to become the official birthplace of the field. McCarthy convinced Minsky, Claude Shannon, and Nathaniel Rochester to help him bring together U.S. researchers interested in automata theory, neural nets, and the study of intel- ligence. They organized a two-month workshop at Dartmouth in the summer of 1956. The proposal states:10 We propose that a 2 month, 10 man study of artificial intelligence be carried out during the summer of 1956 at Dartmouth College in Hanover, New Hamp- shire. The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely de- scribed that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves. We think that a significant advance can be made in one or more of these problems if a carefully selected group of scientists work on it together for a summer. There were 10 attendees in all, including Trenchard More from Princeton, Arthur Samuel from IBM, and Ray Solomonoff and Oliver Selfridge from MIT. Two researchers from Carnegie Tech,11 Allen Newell and Herbert Simon, rather stole the show. Although the others had ideas and in some cases programs for particular appli- cations such as checkers, Newell and Simon already had a reasoning program, the Logic Theorist (LT), about which Simon claimed, “We have invented a computer program capable of thinking non-numerically, and thereby solved the venerable mind–body problem.”12 Soon after the workshop, the program was able to prove most of the theorems in Chapter 2 of Rus- 10 This was the first official usage of McCarthy’s term artificial intelligence. Perhaps “computational rationality” would have been more precise and less threatening, but “AI” has stuck. At the 50th anniversary of the Dartmouth conference, McCarthy stated that he resisted the terms “computer” or “computational” in deference to Norbert Weiner, who was promoting analog cybernetic devices rather than digital computers. 11 Now Carnegie Mellon University (CMU). 12 Newell and Simon also invented a list-processing language, IPL, to write LT. They had no compiler and translated it into machine code by hand. 
To avoid errors, they worked in parallel, calling out binary numbers to each other as they wrote each instruction to make sure they agreed.
  • 37. 18 Chapter 1. Introduction sell and Whitehead’s Principia Mathematica. Russell was reportedly delighted when Simon showed him that the program had come up with a proof for one theorem that was shorter than the one in Principia. The editors of the Journal of Symbolic Logic were less impressed; they rejected a paper coauthored by Newell, Simon, and Logic Theorist. The Dartmouth workshop did not lead to any new breakthroughs, but it did introduce all the major figures to each other. For the next 20 years, the field would be dominated by these people and their students and colleagues at MIT, CMU, Stanford, and IBM. Looking at the proposal for the Dartmouth workshop (McCarthy et al., 1955), we can see why it was necessary for AI to become a separate field. Why couldn’t all the work done in AI have taken place under the name of control theory or operations research or decision theory, which, after all, have objectives similar to those of AI? Or why isn’t AI a branch of mathematics? The first answer is that AI from the start embraced the idea of duplicating human faculties such as creativity, self-improvement, and language use. None of the other fields were addressing these issues. The second answer is methodology. AI is the only one of these fields that is clearly a branch of computer science (although operations research does share an emphasis on computer simulations), and AI is the only field to attempt to build machines that will function autonomously in complex, changing environments. 1.3.3 Early enthusiasm, great expectations (1952–1969) The early years of AI were full of successes—in a limited way. Given the primitive comput- ers and programming tools of the time and the fact that only a few years earlier computers were seen as things that could do arithmetic and no more, it was astonishing whenever a com- puter did anything remotely clever. The intellectual establishment, by and large, preferred to believe that “a machine can never do X.” (See Chapter 26 for a long list of X’s gathered by Turing.) AI researchers naturally responded by demonstrating one X after another. John McCarthy referred to this period as the “Look, Ma, no hands!” era. Newell and Simon’s early success was followed up with the General Problem Solver, or GPS. Unlike Logic Theorist, this program was designed from the start to imitate human problem-solving protocols. Within the limited class of puzzles it could handle, it turned out that the order in which the program considered subgoals and possible actions was similar to that in which humans approached the same problems. Thus, GPS was probably the first pro- gram to embody the “thinking humanly” approach. The success of GPS and subsequent pro- grams as models of cognition led Newell and Simon (1976) to formulate the famous physical symbol system hypothesis, which states that “a physical symbol system has the necessary and PHYSICAL SYMBOL SYSTEM sufficient means for general intelligent action.” What they meant is that any system (human or machine) exhibiting intelligence must operate by manipulating data structures composed of symbols. We will see later that this hypothesis has been challenged from many directions. At IBM, Nathaniel Rochester and his colleagues produced some of the first AI pro- grams. Herbert Gelernter (1959) constructed the Geometry Theorem Prover, which was able to prove theorems that many students of mathematics would find quite tricky. 
Starting in 1952, Arthur Samuel wrote a series of programs for checkers (draughts) that eventually learned to play at a strong amateur level. Along the way, he disproved the idea that comput-
  • 38. Section 1.3. The History of Artificial Intelligence 19 ers can do only what they are told to: his program quickly learned to play a better game than its creator. The program was demonstrated on television in February 1956, creating a strong impression. Like Turing, Samuel had trouble finding computer time. Working at night, he used machines that were still on the testing floor at IBM’s manufacturing plant. Chapter 5 covers game playing, and Chapter 21 explains the learning techniques used by Samuel. John McCarthy moved from Dartmouth to MIT and there made three crucial contribu- tions in one historic year: 1958. In MIT AI Lab Memo No. 1, McCarthy defined the high-level language Lisp, which was to become the dominant AI programming language for the next 30 LISP years. With Lisp, McCarthy had the tool he needed, but access to scarce and expensive com- puting resources was also a serious problem. In response, he and others at MIT invented time sharing. Also in 1958, McCarthy published a paper entitled Programs with Common Sense, in which he described the Advice Taker, a hypothetical program that can be seen as the first complete AI system. Like the Logic Theorist and Geometry Theorem Prover, McCarthy’s program was designed to use knowledge to search for solutions to problems. But unlike the others, it was to embody general knowledge of the world. For example, he showed how some simple axioms would enable the program to generate a plan to drive to the airport. The program was also designed to accept new axioms in the normal course of operation, thereby allowing it to achieve competence in new areas without being reprogrammed. The Advice Taker thus embodied the central principles of knowledge representation and reasoning: that it is useful to have a formal, explicit representation of the world and its workings and to be able to manipulate that representation with deductive processes. It is remarkable how much of the 1958 paper remains relevant today. 1958 also marked the year that Marvin Minsky moved to MIT. His initial collaboration with McCarthy did not last, however. McCarthy stressed representation and reasoning in for- mal logic, whereas Minsky was more interested in getting programs to work and eventually developed an anti-logic outlook. In 1963, McCarthy started the AI lab at Stanford. His plan to use logic to build the ultimate Advice Taker was advanced by J. A. Robinson’s discov- ery in 1965 of the resolution method (a complete theorem-proving algorithm for first-order logic; see Chapter 9). Work at Stanford emphasized general-purpose methods for logical reasoning. Applications of logic included Cordell Green’s question-answering and planning systems (Green, 1969b) and the Shakey robotics project at the Stanford Research Institute (SRI). The latter project, discussed further in Chapter 25, was the first to demonstrate the complete integration of logical reasoning and physical activity. Minsky supervised a series of students who chose limited problems that appeared to require intelligence to solve. These limited domains became known as microworlds. James MICROWORLD Slagle’s SAINT program (1963) was able to solve closed-form calculus integration problems typical of first-year college courses. Tom Evans’s ANALOGY program (1968) solved geo- metric analogy problems that appear in IQ tests. 
Daniel Bobrow’s STUDENT program (1967) solved algebra story problems, such as the following: If the number of customers Tom gets is twice the square of 20 percent of the number of advertisements he runs, and the number of advertisements he runs is 45, what is the number of customers Tom gets?
[Figure 1.4: a scene from the blocks world, showing blocks labeled Red, Green, and Blue.]

Figure 1.4 A scene from the blocks world. SHRDLU (Winograd, 1972) has just completed the command “Find a block which is taller than the one you are holding and put it in the box.”

The most famous microworld was the blocks world, which consists of a set of solid blocks placed on a tabletop (or more often, a simulation of a tabletop), as shown in Figure 1.4. A typical task in this world is to rearrange the blocks in a certain way, using a robot hand that can pick up one block at a time. The blocks world was home to the vision project of David Huffman (1971), the vision and constraint-propagation work of David Waltz (1975), the learning theory of Patrick Winston (1970), the natural-language-understanding program of Terry Winograd (1972), and the planner of Scott Fahlman (1974).

Early work building on the neural networks of McCulloch and Pitts also flourished. The work of Winograd and Cowan (1963) showed how a large number of elements could collectively represent an individual concept, with a corresponding increase in robustness and parallelism. Hebb’s learning methods were enhanced by Bernie Widrow (Widrow and Hoff, 1960; Widrow, 1962), who called his networks adalines, and by Frank Rosenblatt (1962) with his perceptrons. The perceptron convergence theorem (Block et al., 1962) says that the learning algorithm can adjust the connection strengths of a perceptron to match any input data, provided such a match exists. These topics are covered in Chapter 20.

1.3.4 A dose of reality (1966–1973)

From the beginning, AI researchers were not shy about making predictions of their coming successes. The following statement by Herbert Simon in 1957 is often quoted:

    It is not my aim to surprise or shock you—but the simplest way I can summarize is to say that there are now in the world machines that think, that learn and that create. Moreover,
  • 40. Section 1.3. The History of Artificial Intelligence 21 their ability to do these things is going to increase rapidly until—in a visible future—the range of problems they can handle will be coextensive with the range to which the human mind has been applied. Terms such as “visible future” can be interpreted in various ways, but Simon also made more concrete predictions: that within 10 years a computer would be chess champion, and a significant mathematical theorem would be proved by machine. These predictions came true (or approximately true) within 40 years rather than 10. Simon’s overconfidence was due to the promising performance of early AI systems on simple examples. In almost all cases, however, these early systems turned out to fail miserably when tried out on wider selections of problems and on more difficult problems. The first kind of difficulty arose because most early programs knew nothing of their subject matter; they succeeded by means of simple syntactic manipulations. A typical story occurred in early machine translation efforts, which were generously funded by the U.S. Na- tional Research Council in an attempt to speed up the translation of Russian scientific papers in the wake of the Sputnik launch in 1957. It was thought initially that simple syntactic trans- formations based on the grammars of Russian and English, and word replacement from an electronic dictionary, would suffice to preserve the exact meanings of sentences. The fact is that accurate translation requires background knowledge in order to resolve ambiguity and establish the content of the sentence. The famous retranslation of “the spirit is willing but the flesh is weak” as “the vodka is good but the meat is rotten” illustrates the difficulties en- countered. In 1966, a report by an advisory committee found that “there has been no machine translation of general scientific text, and none is in immediate prospect.” All U.S. government funding for academic translation projects was canceled. Today, machine translation is an im- perfect but widely used tool for technical, commercial, government, and Internet documents. The second kind of difficulty was the intractability of many of the problems that AI was attempting to solve. Most of the early AI programs solved problems by trying out different combinations of steps until the solution was found. This strategy worked initially because microworlds contained very few objects and hence very few possible actions and very short solution sequences. Before the theory of computational complexity was developed, it was widely thought that “scaling up” to larger problems was simply a matter of faster hardware and larger memories. The optimism that accompanied the development of resolution theorem proving, for example, was soon dampened when researchers failed to prove theorems involv- ing more than a few dozen facts. The fact that a program can find a solution in principle does not mean that the program contains any of the mechanisms needed to find it in practice. The illusion of unlimited computational power was not confined to problem-solving programs. Early experiments in machine evolution (now called genetic algorithms) (Fried- MACHINE EVOLUTION GENETIC ALGORITHM berg, 1958; Friedberg et al., 1959) were based on the undoubtedly correct belief that by making an appropriate series of small mutations to a machine-code program, one can gen- erate a program with good performance for any particular task. 
The idea, then, was to try random mutations with a selection process to preserve mutations that seemed useful. Despite thousands of hours of CPU time, almost no progress was demonstrated. Modern genetic algorithms use better representations and have shown more success.
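The mutate-and-select loop just described is easy to sketch. The following is our toy illustration, not a reconstruction of the 1958-era experiments (which mutated machine-code programs rather than bit strings, with task performance rather than a stand-in fitness function):

```python
import random

def evolve(fitness, genome_length=20, generations=200, seed=0):
    """Random-mutation hill climbing: keep a mutation unless it makes things worse.

    `fitness` maps a list of 0/1 genes to a score to be maximized.  This mirrors
    the early "machine evolution" idea of trying random mutations and preserving
    those that seem useful.
    """
    rng = random.Random(seed)
    genome = [rng.randint(0, 1) for _ in range(genome_length)]
    best = fitness(genome)
    for _ in range(generations):
        i = rng.randrange(genome_length)
        genome[i] ^= 1                     # flip one randomly chosen gene
        new_score = fitness(genome)
        if new_score >= best:              # keep mutations that seem useful
            best = new_score
        else:
            genome[i] ^= 1                 # otherwise undo the mutation
    return genome, best

# Toy fitness: number of 1-bits ("one-max"); the loop drives the string toward all ones.
print(evolve(fitness=sum))
```

With no structure in the representation and only single-bit changes, progress on anything harder than this toy problem is painfully slow, which is essentially what the early experiments found; later genetic algorithms added crossover and better encodings.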
Failure to come to grips with the “combinatorial explosion” was one of the main criticisms of AI contained in the Lighthill report (Lighthill, 1973), which formed the basis for the decision by the British government to end support for AI research in all but two universities. (Oral tradition paints a somewhat different and more colorful picture, with political ambitions and personal animosities whose description is beside the point.)

A third difficulty arose because of some fundamental limitations on the basic structures being used to generate intelligent behavior. For example, Minsky and Papert’s book Perceptrons (1969) proved that, although perceptrons (a simple form of neural network) could be shown to learn anything they were capable of representing, they could represent very little. In particular, a two-input perceptron (restricted to be simpler than the form Rosenblatt originally studied) could not be trained to recognize when its two inputs were different. Although their results did not apply to more complex, multilayer networks, research funding for neural-net research soon dwindled to almost nothing. Ironically, the new back-propagation learning algorithms for multilayer networks that were to cause an enormous resurgence in neural-net research in the late 1980s were actually discovered first in 1969 (Bryson and Ho, 1969).

1.3.5 Knowledge-based systems: The key to power? (1969–1979)

The picture of problem solving that had arisen during the first decade of AI research was of a general-purpose search mechanism trying to string together elementary reasoning steps to find complete solutions. Such approaches have been called weak methods because, although general, they do not scale up to large or difficult problem instances. The alternative to weak methods is to use more powerful, domain-specific knowledge that allows larger reasoning steps and can more easily handle typically occurring cases in narrow areas of expertise. One might say that to solve a hard problem, you have to almost know the answer already.

The DENDRAL program (Buchanan et al., 1969) was an early example of this approach. It was developed at Stanford, where Ed Feigenbaum (a former student of Herbert Simon), Bruce Buchanan (a philosopher turned computer scientist), and Joshua Lederberg (a Nobel laureate geneticist) teamed up to solve the problem of inferring molecular structure from the information provided by a mass spectrometer. The input to the program consists of the elementary formula of the molecule (e.g., C6H13NO2) and the mass spectrum giving the masses of the various fragments of the molecule generated when it is bombarded by an electron beam. For example, the mass spectrum might contain a peak at m = 15, corresponding to the mass of a methyl (CH3) fragment.

The naive version of the program generated all possible structures consistent with the formula, and then predicted what mass spectrum would be observed for each, comparing this with the actual spectrum. As one might expect, this is intractable for even moderate-sized molecules. The DENDRAL researchers consulted analytical chemists and found that they worked by looking for well-known patterns of peaks in the spectrum that suggested common substructures in the molecule. For example, the following rule is used to recognize a ketone (C=O) subgroup (which weighs 28):

    if there are two peaks at x1 and x2 such that
      (a) x1 + x2 = M + 28 (M is the mass of the whole molecule);
      (b) x1 − 28 is a high peak;
      (c) x2 − 28 is a high peak;
      (d) At least one of x1 and x2 is high.
    then there is a ketone subgroup

Recognizing that the molecule contains a particular substructure reduces the number of possible candidates enormously. DENDRAL was powerful because

    All the relevant theoretical knowledge to solve these problems has been mapped over from its general form in the [spectrum prediction component] (“first principles”) to efficient special forms (“cookbook recipes”). (Feigenbaum et al., 1971)

The significance of DENDRAL was that it was the first successful knowledge-intensive system: its expertise derived from large numbers of special-purpose rules. Later systems also incorporated the main theme of McCarthy’s Advice Taker approach—the clean separation of the knowledge (in the form of rules) from the reasoning component.

With this lesson in mind, Feigenbaum and others at Stanford began the Heuristic Programming Project (HPP) to investigate the extent to which the new methodology of expert systems could be applied to other areas of human expertise. The next major effort was in the area of medical diagnosis. Feigenbaum, Buchanan, and Dr. Edward Shortliffe developed MYCIN to diagnose blood infections. With about 450 rules, MYCIN was able to perform as well as some experts, and considerably better than junior doctors. It also contained two major differences from DENDRAL. First, unlike the DENDRAL rules, no general theoretical model existed from which the MYCIN rules could be deduced. They had to be acquired from extensive interviewing of experts, who in turn acquired them from textbooks, other experts, and direct experience of cases. Second, the rules had to reflect the uncertainty associated with medical knowledge. MYCIN incorporated a calculus of uncertainty called certainty factors (see Chapter 14), which seemed (at the time) to fit well with how doctors assessed the impact of evidence on the diagnosis.

The importance of domain knowledge was also apparent in the area of understanding natural language. Although Winograd’s SHRDLU system for understanding natural language had engendered a good deal of excitement, its dependence on syntactic analysis caused some of the same problems as occurred in the early machine translation work. It was able to overcome ambiguity and understand pronoun references, but this was mainly because it was designed specifically for one area—the blocks world. Several researchers, including Eugene Charniak, a fellow graduate student of Winograd’s at MIT, suggested that robust language understanding would require general knowledge about the world and a general method for using that knowledge.

At Yale, linguist-turned-AI-researcher Roger Schank emphasized this point, claiming, “There is no such thing as syntax,” which upset a lot of linguists but did serve to start a useful discussion. Schank and his students built a series of programs (Schank and Abelson, 1977; Wilensky, 1978; Schank and Riesbeck, 1981; Dyer, 1983) that all had the task of understanding natural language. The emphasis, however, was less on language per se and more on the problems of representing and reasoning with the knowledge required for language understanding. The problems included representing stereotypical situations (Cullingford, 1981),
  • 43. 24 Chapter 1. Introduction describing human memory organization (Rieger, 1976; Kolodner, 1983), and understanding plans and goals (Wilensky, 1983). The widespread growth of applications to real-world problems caused a concurrent in- crease in the demands for workable knowledge representation schemes. A large number of different representation and reasoning languages were developed. Some were based on logic—for example, the Prolog language became popular in Europe, and the PLANNER fam- ily in the United States. Others, following Minsky’s idea of frames (1975), adopted a more FRAMES structured approach, assembling facts about particular object and event types and arranging the types into a large taxonomic hierarchy analogous to a biological taxonomy. 1.3.6 AI becomes an industry (1980–present) The first successful commercial expert system, R1, began operation at the Digital Equipment Corporation (McDermott, 1982). The program helped configure orders for new computer systems; by 1986, it was saving the company an estimated $40 million a year. By 1988, DEC’s AI group had 40 expert systems deployed, with more on the way. DuPont had 100 in use and 500 in development, saving an estimated $10 million a year. Nearly every major U.S. corporation had its own AI group and was either using or investigating expert systems. In 1981, the Japanese announced the “Fifth Generation” project, a 10-year plan to build intelligent computers running Prolog. In response, the United States formed the Microelec- tronics and Computer Technology Corporation (MCC) as a research consortium designed to assure national competitiveness. In both cases, AI was part of a broad effort, including chip design and human-interface research. In Britain, the Alvey report reinstated the funding that was cut by the Lighthill report.13 In all three countries, however, the projects never met their ambitious goals. Overall, the AI industry boomed from a few million dollars in 1980 to billions of dollars in 1988, including hundreds of companies building expert systems, vision systems, robots, and software and hardware specialized for these purposes. Soon after that came a period called the “AI Winter,” in which many companies fell by the wayside as they failed to deliver on extravagant promises. 1.3.7 The return of neural networks (1986–present) In the mid-1980s at least four different groups reinvented the back-propagation learning BACK-PROPAGATION algorithm first found in 1969 by Bryson and Ho. The algorithm was applied to many learn- ing problems in computer science and psychology, and the widespread dissemination of the results in the collection Parallel Distributed Processing (Rumelhart and McClelland, 1986) caused great excitement. These so-called connectionist models of intelligent systems were seen by some as di- CONNECTIONIST rect competitors both to the symbolic models promoted by Newell and Simon and to the logicist approach of McCarthy and others (Smolensky, 1988). It might seem obvious that at some level humans manipulate symbols—in fact, Terrence Deacon’s book The Symbolic 13 To save embarrassment, a new field called IKBS (Intelligent Knowledge-Based Systems) was invented because Artificial Intelligence had been officially canceled.
  • 44. Section 1.3. The History of Artificial Intelligence 25 Species (1997) suggests that this is the defining characteristic of humans—but the most ar- dent connectionists questioned whether symbol manipulation had any real explanatory role in detailed models of cognition. This question remains unanswered, but the current view is that connectionist and symbolic approaches are complementary, not competing. As occurred with the separation of AI and cognitive science, modern neural network research has bifurcated into two fields, one concerned with creating effective network architectures and algorithms and understanding their mathematical properties, the other concerned with careful modeling of the empirical properties of actual neurons and ensembles of neurons. 1.3.8 AI adopts the scientific method (1987–present) Recent years have seen a revolution in both the content and the methodology of work in artificial intelligence.14 It is now more common to build on existing theories than to propose brand-new ones, to base claims on rigorous theorems or hard experimental evidence rather than on intuition, and to show relevance to real-world applications rather than toy examples. AI was founded in part as a rebellion against the limitations of existing fields like control theory and statistics, but now it is embracing those fields. As David McAllester (1998) put it: In the early period of AI it seemed plausible that new forms of symbolic computation, e.g., frames and semantic networks, made much of classical theory obsolete. This led to a form of isolationism in which AI became largely separated from the rest of computer science. This isolationism is currently being abandoned. There is a recognition that machine learning should not be isolated from information theory, that uncertain reasoning should not be isolated from stochastic modeling, that search should not be isolated from classical optimization and control, and that automated reasoning should not be isolated from formal methods and static analysis. In terms of methodology, AI has finally come firmly under the scientific method. To be ac- cepted, hypotheses must be subjected to rigorous empirical experiments, and the results must be analyzed statistically for their importance (Cohen, 1995). It is now possible to replicate experiments by using shared repositories of test data and code. The field of speech recognition illustrates the pattern. In the 1970s, a wide variety of different architectures and approaches were tried. Many of these were rather ad hoc and fragile, and were demonstrated on only a few specially selected examples. In recent years, approaches based on hidden Markov models (HMMs) have come to dominate the area. Two HIDDEN MARKOV MODELS aspects of HMMs are relevant. First, they are based on a rigorous mathematical theory. This has allowed speech researchers to build on several decades of mathematical results developed in other fields. Second, they are generated by a process of training on a large corpus of real speech data. This ensures that the performance is robust, and in rigorous blind tests the HMMs have been improving their scores steadily. 
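Since hidden Markov models recur in the probabilistic chapters later in the book, a small sketch of the core computation may help fix ideas: the forward algorithm sums over hidden state sequences to obtain the probability of an observation sequence given the model's transition and emission probabilities. The two-state model and its numbers below are purely illustrative assumptions, not a real acoustic model.

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm: P(observation sequence | HMM parameters)."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: emit_p[s][obs] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())

# Toy two-state model (purely illustrative numbers).
states = ["vowel", "consonant"]
start_p = {"vowel": 0.5, "consonant": 0.5}
trans_p = {"vowel": {"vowel": 0.3, "consonant": 0.7},
           "consonant": {"vowel": 0.6, "consonant": 0.4}}
emit_p = {"vowel": {"low": 0.7, "high": 0.3},
          "consonant": {"low": 0.2, "high": 0.8}}
print(forward(["low", "high", "high"], states, start_p, trans_p, emit_p))
```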
Speech technology and the related field of handwritten character recognition are already making the transition to widespread industrial 14 Some have characterized this change as a victory of the neats—those who think that AI theories should be grounded in mathematical rigor—over the scruffies—those who would rather try out lots of ideas, write some programs, and then assess what seems to be working. Both approaches are important. A shift toward neatness implies that the field has reached a level of stability and maturity. Whether that stability will be disrupted by a new scruffy idea is another question.
  • 45. 26 Chapter 1. Introduction and consumer applications. Note that there is no scientific claim that humans use HMMs to recognize speech; rather, HMMs provide a mathematical framework for understanding the problem and support the engineering claim that they work well in practice. Machine translation follows the same course as speech recognition. In the 1950s there was initial enthusiasm for an approach based on sequences of words, with models learned according to the principles of information theory. That approach fell out of favor in the 1960s, but returned in the late 1990s and now dominates the field. Neural networks also fit this trend. Much of the work on neural nets in the 1980s was done in an attempt to scope out what could be done and to learn how neural nets differ from “traditional” techniques. Using improved methodology and theoretical frameworks, the field arrived at an understanding in which neural nets can now be compared with corresponding techniques from statistics, pattern recognition, and machine learning, and the most promising technique can be applied to each application. As a result of these developments, so-called data mining technology has spawned a vigorous new industry. DATA MINING Judea Pearl’s (1988) Probabilistic Reasoning in Intelligent Systems led to a new accep- tance of probability and decision theory in AI, following a resurgence of interest epitomized by Peter Cheeseman’s (1985) article “In Defense of Probability.” The Bayesian network BAYESIAN NETWORK formalism was invented to allow efficient representation of, and rigorous reasoning with, uncertain knowledge. This approach largely overcomes many problems of the probabilistic reasoning systems of the 1960s and 1970s; it now dominates AI research on uncertain reason- ing and expert systems. The approach allows for learning from experience, and it combines the best of classical AI and neural nets. Work by Judea Pearl (1982a) and by Eric Horvitz and David Heckerman (Horvitz and Heckerman, 1986; Horvitz et al., 1986) promoted the idea of normative expert systems: ones that act rationally according to the laws of decision theory and do not try to imitate the thought steps of human experts. The WindowsTM operating sys- tem includes several normative diagnostic expert systems for correcting problems. Chapters 13 to 16 cover this area. Similar gentle revolutions have occurred in robotics, computer vision, and knowledge representation. A better understanding of the problems and their complexity properties, com- bined with increased mathematical sophistication, has led to workable research agendas and robust methods. Although increased formalization and specialization led fields such as vision and robotics to become somewhat isolated from “mainstream” AI in the 1990s, this trend has reversed in recent years as tools from machine learning in particular have proved effective for many problems. The process of reintegration is already yielding significant benefits 1.3.9 The emergence of intelligent agents (1995–present) Perhaps encouraged by the progress in solving the subproblems of AI, researchers have also started to look at the “whole agent” problem again. The work of Allen Newell, John Laird, and Paul Rosenbloom on SOAR (Newell, 1990; Laird et al., 1987) is the best-known example of a complete agent architecture. One of the most important environments for intelligent agents is the Internet. AI systems have become so common in Web-based applications that the “-bot” suffix has entered everyday language. 
Moreover, AI technologies underlie many
  • 46. Section 1.3. The History of Artificial Intelligence 27 Internet tools, such as search engines, recommender systems, and Web site aggregators. One consequence of trying to build complete agents is the realization that the previously isolated subfields of AI might need to be reorganized somewhat when their results are to be tied together. In particular, it is now widely appreciated that sensory systems (vision, sonar, speech recognition, etc.) cannot deliver perfectly reliable information about the environment. Hence, reasoning and planning systems must be able to handle uncertainty. A second major consequence of the agent perspective is that AI has been drawn into much closer contact with other fields, such as control theory and economics, that also deal with agents. Recent progress in the control of robotic cars has derived from a mixture of approaches ranging from better sensors, control-theoretic integration of sensing, localization and mapping, as well as a degree of high-level planning. Despite these successes, some influential founders of AI, including John McCarthy (2007), Marvin Minsky (2007), Nils Nilsson (1995, 2005) and Patrick Winston (Beal and Winston, 2009), have expressed discontent with the progress of AI. They think that AI should put less emphasis on creating ever-improved versions of applications that are good at a spe- cific task, such as driving a car, playing chess, or recognizing speech. Instead, they believe AI should return to its roots of striving for, in Simon’s words, “machines that think, that learn and that create.” They call the effort human-level AI or HLAI; their first symposium was in HUMAN-LEVEL AI 2004 (Minsky et al., 2004). The effort will require very large knowledge bases; Hendler et al. (1995) discuss where these knowledge bases might come from. A related idea is the subfield of Artificial General Intelligence or AGI (Goertzel and ARTIFICIAL GENERAL INTELLIGENCE Pennachin, 2007), which held its first conference and organized the Journal of Artificial Gen- eral Intelligence in 2008. AGI looks for a universal algorithm for learning and acting in any environment, and has its roots in the work of Ray Solomonoff (1964), one of the atten- dees of the original 1956 Dartmouth conference. Guaranteeing that what we create is really Friendly AI is also a concern (Yudkowsky, 2008; Omohundro, 2008), one we will return to FRIENDLY AI in Chapter 26. 1.3.10 The availability of very large data sets (2001–present) Throughout the 60-year history of computer science, the emphasis has been on the algorithm as the main subject of study. But some recent work in AI suggests that for many problems, it makes more sense to worry about the data and be less picky about what algorithm to apply. This is true because of the increasing availability of very large data sources: for example, trillions of words of English and billions of images from the Web (Kilgarriff and Grefenstette, 2006); or billions of base pairs of genomic sequences (Collins et al., 2003). One influential paper in this line was Yarowsky’s (1995) work on word-sense disam- biguation: given the use of the word “plant” in a sentence, does that refer to flora or factory? Previous approaches to the problem had relied on human-labeled examples combined with machine learning algorithms. Yarowsky showed that the task can be done, with accuracy above 96%, with no labeled examples at all. 
Instead, given a very large corpus of unannotated text and just the dictionary definitions of the two senses—“works, industrial plant” and “flora, plant life”—one can label examples in the corpus, and from there bootstrap to learn
new patterns that help label new examples. Banko and Brill (2001) show that techniques like this perform even better as the amount of available text goes from a million words to a billion and that the increase in performance from using more data exceeds any difference in algorithm choice; a mediocre algorithm with 100 million words of unlabeled training data outperforms the best known algorithm with 1 million words. As another example, Hays and Efros (2007) discuss the problem of filling in holes in a photograph. Suppose you use Photoshop to mask out an ex-friend from a group photo, but now you need to fill in the masked area with something that matches the background. Hays and Efros defined an algorithm that searches through a collection of photos to find something that will match. They found the performance of their algorithm was poor when they used a collection of only ten thousand photos, but crossed a threshold into excellent performance when they grew the collection to two million photos.

Work like this suggests that the “knowledge bottleneck” in AI—the problem of how to express all the knowledge that a system needs—may be solved in many applications by learning methods rather than hand-coded knowledge engineering, provided the learning algorithms have enough data to go on (Halevy et al., 2009). Reporters have noticed the surge of new applications and have written that “AI Winter” may be yielding to a new Spring (Havenstein, 2005). As Kurzweil (2005) writes, “today, many thousands of AI applications are deeply embedded in the infrastructure of every industry.”

1.4 THE STATE OF THE ART

What can AI do today? A concise answer is difficult because there are so many activities in so many subfields. Here we sample a few applications; others appear throughout the book.

Robotic vehicles: A driverless robotic car named STANLEY sped through the rough terrain of the Mojave desert at 22 mph, finishing the 132-mile course first to win the 2005 DARPA Grand Challenge. STANLEY is a Volkswagen Touareg outfitted with cameras, radar, and laser rangefinders to sense the environment and onboard software to command the steering, braking, and acceleration (Thrun, 2006). The following year CMU’s BOSS won the Urban Challenge, safely driving in traffic through the streets of a closed Air Force base, obeying traffic rules and avoiding pedestrians and other vehicles.

Speech recognition: A traveler calling United Airlines to book a flight can have the entire conversation guided by an automated speech recognition and dialog management system.

Autonomous planning and scheduling: A hundred million miles from Earth, NASA’s Remote Agent program became the first on-board autonomous planning program to control the scheduling of operations for a spacecraft (Jonsson et al., 2000). REMOTE AGENT generated plans from high-level goals specified from the ground and monitored the execution of those plans—detecting, diagnosing, and recovering from problems as they occurred. Successor program MAPGEN (Al-Chang et al., 2004) plans the daily operations for NASA’s Mars Exploration Rovers, and MEXAR2 (Cesta et al., 2007) did mission planning—both logistics and science planning—for the European Space Agency’s Mars Express mission in 2008.
  • 48. Section 1.5. Summary 29 Game playing: IBM’s DEEP BLUE became the first computer program to defeat the world champion in a chess match when it bested Garry Kasparov by a score of 3.5 to 2.5 in an exhibition match (Goodman and Keene, 1997). Kasparov said that he felt a “new kind of intelligence” across the board from him. Newsweek magazine described the match as “The brain’s last stand.” The value of IBM’s stock increased by $18 billion. Human champions studied Kasparov’s loss and were able to draw a few matches in subsequent years, but the most recent human-computer matches have been won convincingly by the computer. Spam fighting: Each day, learning algorithms classify over a billion messages as spam, saving the recipient from having to waste time deleting what, for many users, could comprise 80% or 90% of all messages, if not classified away by algorithms. Because the spammers are continually updating their tactics, it is difficult for a static programmed approach to keep up, and learning algorithms work best (Sahami et al., 1998; Goodman and Heckerman, 2004). Logistics planning: During the Persian Gulf crisis of 1991, U.S. forces deployed a Dynamic Analysis and Replanning Tool, DART (Cross and Walker, 1994), to do automated logistics planning and scheduling for transportation. This involved up to 50,000 vehicles, cargo, and people at a time, and had to account for starting points, destinations, routes, and conflict resolution among all parameters. The AI planning techniques generated in hours a plan that would have taken weeks with older methods. The Defense Advanced Research Project Agency (DARPA) stated that this single application more than paid back DARPA’s 30-year investment in AI. Robotics: The iRobot Corporation has sold over two million Roomba robotic vacuum cleaners for home use. The company also deploys the more rugged PackBot to Iraq and Afghanistan, where it is used to handle hazardous materials, clear explosives, and identify the location of snipers. Machine Translation: A computer program automatically translates from Arabic to English, allowing an English speaker to see the headline “Ardogan Confirms That Turkey Would Not Accept Any Pressure, Urging Them to Recognize Cyprus.” The program uses a statistical model built from examples of Arabic-to-English translations and from examples of English text totaling two trillion words (Brants et al., 2007). None of the computer scientists on the team speak Arabic, but they do understand statistics and machine learning algorithms. These are just a few examples of artificial intelligence systems that exist today. Not magic or science fiction—but rather science, engineering, and mathematics, to which this book provides an introduction. 1.5 SUMMARY This chapter defines AI and establishes the cultural background against which it has devel- oped. Some of the important points are as follows: • Different people approach AI with different goals in mind. Two important questions to ask are: Are you concerned with thinking or behavior? Do you want to model humans or work from an ideal standard?
  • 49. 30 Chapter 1. Introduction • In this book, we adopt the view that intelligence is concerned mainly with rational action. Ideally, an intelligent agent takes the best possible action in a situation. We study the problem of building agents that are intelligent in this sense. • Philosophers (going back to 400 B.C.) made AI conceivable by considering the ideas that the mind is in some ways like a machine, that it operates on knowledge encoded in some internal language, and that thought can be used to choose what actions to take. • Mathematicians provided the tools to manipulate statements of logical certainty as well as uncertain, probabilistic statements. They also set the groundwork for understanding computation and reasoning about algorithms. • Economists formalized the problem of making decisions that maximize the expected outcome to the decision maker. • Neuroscientists discovered some facts about how the brain works and the ways in which it is similar to and different from computers. • Psychologists adopted the idea that humans and animals can be considered information- processing machines. Linguists showed that language use fits into this model. • Computer engineers provided the ever-more-powerful machines that make AI applica- tions possible. • Control theory deals with designing devices that act optimally on the basis of feedback from the environment. Initially, the mathematical tools of control theory were quite different from AI, but the fields are coming closer together. • The history of AI has had cycles of success, misplaced optimism, and resulting cutbacks in enthusiasm and funding. There have also been cycles of introducing new creative approaches and systematically refining the best ones. • AI has advanced more rapidly in the past decade because of greater use of the scientific method in experimenting with and comparing approaches. • Recent progress in understanding the theoretical basis for intelligence has gone hand in hand with improvements in the capabilities of real systems. The subfields of AI have become more integrated, and AI has found common ground with other disciplines. BIBLIOGRAPHICAL AND HISTORICAL NOTES The methodological status of artificial intelligence is investigated in The Sciences of the Artifi- cial, by Herb Simon (1981), which discusses research areas concerned with complex artifacts. It explains how AI can be viewed as both science and mathematics. Cohen (1995) gives an overview of experimental methodology within AI. The Turing Test (Turing, 1950) is discussed by Shieber (1994), who severely criticizes the usefulness of its instantiation in the Loebner Prize competition, and by Ford and Hayes (1995), who argue that the test itself is not helpful for AI. Bringsjord (2008) gives advice for a Turing Test judge. Shieber (2004) and Epstein et al. (2008) collect a number of essays on the Turing Test. Artificial Intelligence: The Very Idea, by John Haugeland (1985), gives a
  • 50. Exercises 31 readable account of the philosophical and practical problems of AI. Significant early papers in AI are anthologized in the collections by Webber and Nilsson (1981) and by Luger (1995). The Encyclopedia of AI (Shapiro, 1992) contains survey articles on almost every topic in AI, as does Wikipedia. These articles usually provide a good entry point into the research literature on each topic. An insightful and comprehensive history of AI is given by Nils Nillson (2009), one of the early pioneers of the field. The most recent work appears in the proceedings of the major AI conferences: the bi- ennial International Joint Conference on AI (IJCAI), the annual European Conference on AI (ECAI), and the National Conference on AI, more often known as AAAI, after its sponsoring organization. The major journals for general AI are Artificial Intelligence, Computational Intelligence, the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE In- telligent Systems, and the electronic Journal of Artificial Intelligence Research. There are also many conferences and journals devoted to specific areas, which we cover in the appropriate chapters. The main professional societies for AI are the American Association for Artificial Intelligence (AAAI), the ACM Special Interest Group in Artificial Intelligence (SIGART), and the Society for Artificial Intelligence and Simulation of Behaviour (AISB). AAAI’s AI Magazine contains many topical and tutorial articles, and its Web site, aaai.org, contains news, tutorials, and background information. EXERCISES These exercises are intended to stimulate discussion, and some might be set as term projects. Alternatively, preliminary attempts can be made now, and these attempts can be reviewed after the completion of the book. 1.1 Define in your own words: (a) intelligence, (b) artificial intelligence, (c) agent, (d) rationality, (e) logical reasoning. 1.2 Every year the Loebner Prize is awarded to the program that comes closest to passing a version of the Turing Test. Research and report on the latest winner of the Loebner prize. What techniques does it use? How does it advance the state of the art in AI? 1.3 Are reflex actions (such as flinching from a hot stove) rational? Are they intelligent? 1.4 There are well-known classes of problems that are intractably difficult for computers, and other classes that are provably undecidable. Does this mean that AI is impossible? 1.5 The neural structure of the sea slug Aplysia has been widely studied (first by Nobel Laureate Eric Kandel) because it has only about 20,000 neurons, most of them large and easily manipulated. Assuming that the cycle time for an Aplysia neuron is roughly the same as for a human neuron, how does the computational power, in terms of memory updates per second, compare with the high-end computer described in Figure 1.3? 1.6 How could introspection—reporting on one’s inner thoughts—be inaccurate? Could I be wrong about what I’m thinking? Discuss.
  • 51. 32 Chapter 1. Introduction 1.7 To what extent are the following computer systems instances of artificial intelligence: • Supermarket bar code scanners. • Voice-activated telephone menus. • Spelling and grammar correction features in Microsoft Word. • Internet routing algorithms that respond dynamically to the state of the network. 1.8 Many of the computational models of cognitive activities that have been proposed in- volve quite complex mathematical operations, such as convolving an image with a Gaussian or finding a minimum of the entropy function. Most humans (and certainly all animals) never learn this kind of mathematics at all, almost no one learns it before college, and almost no one can compute the convolution of a function with a Gaussian in their head. What sense does it make to say that the “vision system” is doing this kind of mathematics, whereas the actual person has no idea how to do it? 1.9 Some authors have claimed that perception and motor skills are the most important part of intelligence, and that “higher level” capacities are necessarily parasitic—simple add-ons to these underlying facilities. Certainly, most of evolution and a large part of the brain have been devoted to perception and motor skills, whereas AI has found tasks such as game playing and logical inference to be easier, in many ways, than perceiving and acting in the real world. Do you think that AI’s traditional focus on higher-level cognitive abilities is misplaced? 1.10 Is AI a science, or is it engineering? Or neither or both? Explain. 1.11 “Surely computers cannot be intelligent—they can do only what their programmers tell them.” Is the latter statement true, and does it imply the former? 1.12 “Surely animals cannot be intelligent—they can do only what their genes tell them.” Is the latter statement true, and does it imply the former? 1.13 “Surely animals, humans, and computers cannot be intelligent—they can do only what their constituent atoms are told to do by the laws of physics.” Is the latter statement true, and does it imply the former? 1.14 Examine the AI literature to discover whether the following tasks can currently be solved by computers: a. Playing a decent game of table tennis (Ping-Pong). b. Driving in the center of Cairo, Egypt. c. Driving in Victorville, California. d. Buying a week’s worth of groceries at the market. e. Buying a week’s worth of groceries on the Web. f. Playing a decent game of bridge at a competitive level. g. Discovering and proving new mathematical theorems. h. Writing an intentionally funny story. i. Giving competent legal advice in a specialized area of law.
j. Translating spoken English into spoken Swedish in real time. k. Performing a complex surgical operation. For the currently infeasible tasks, try to find out what the difficulties are and predict when, if ever, they will be overcome. 1.15 Various subfields of AI have held contests by defining a standard task and inviting researchers to do their best. Examples include the DARPA Grand Challenge for robotic cars, the International Planning Competition, the RoboCup robotic soccer league, the TREC information retrieval event, and contests in machine translation and speech recognition. Investigate five of these contests, and describe the progress made over the years. To what degree have the contests advanced the state of the art in AI? To what degree do they hurt the field by drawing energy away from new ideas?
  • 53. 2 INTELLIGENT AGENTS In which we discuss the nature of agents, perfect or otherwise, the diversity of environments, and the resulting menagerie of agent types. Chapter 1 identified the concept of rational agents as central to our approach to artificial intelligence. In this chapter, we make this notion more concrete. We will see that the concept of rationality can be applied to a wide variety of agents operating in any imaginable environ- ment. Our plan in this book is to use this concept to develop a small set of design principles for building successful agents—systems that can reasonably be called intelligent. We begin by examining agents, environments, and the coupling between them. The observation that some agents behave better than others leads naturally to the idea of a rational agent—one that behaves as well as possible. How well an agent can behave depends on the nature of the environment; some environments are more difficult than others. We give a crude categorization of environments and show how properties of an environment influence the design of suitable agents for that environment. We describe a number of basic “skeleton” agent designs, which we flesh out in the rest of the book. 2.1 AGENTS AND ENVIRONMENTS An agent is anything that can be viewed as perceiving its environment through sensors and ENVIRONMENT SENSOR acting upon that environment through actuators. This simple idea is illustrated in Figure 2.1. ACTUATOR A human agent has eyes, ears, and other organs for sensors and hands, legs, vocal tract, and so on for actuators. A robotic agent might have cameras and infrared range finders for sensors and various motors for actuators. A software agent receives keystrokes, file contents, and network packets as sensory inputs and acts on the environment by displaying on the screen, writing files, and sending network packets. We use the term percept to refer to the agent’s perceptual inputs at any given instant. An PERCEPT agent’s percept sequence is the complete history of everything the agent has ever perceived. PERCEPT SEQUENCE In general, an agent’s choice of action at any given instant can depend on the entire percept sequence observed to date, but not on anything it hasn’t perceived. By specifying the agent’s choice of action for every possible percept sequence, we have said more or less everything 34
  • 54. Section 2.1. Agents and Environments 35 Agent Sensors Actuators Environment Percepts Actions ? Figure 2.1 Agents interact with environments through sensors and actuators. there is to say about the agent. Mathematically speaking, we say that an agent’s behavior is described by the agent function that maps any given percept sequence to an action. AGENT FUNCTION We can imagine tabulating the agent function that describes any given agent; for most agents, this would be a very large table—infinite, in fact, unless we place a bound on the length of percept sequences we want to consider. Given an agent to experiment with, we can, in principle, construct this table by trying out all possible percept sequences and recording which actions the agent does in response.1 The table is, of course, an external characterization of the agent. Internally, the agent function for an artificial agent will be implemented by an agent program. It is important to keep these two ideas distinct. The agent function is an AGENT PROGRAM abstract mathematical description; the agent program is a concrete implementation, running within some physical system. To illustrate these ideas, we use a very simple example—the vacuum-cleaner world shown in Figure 2.2. This world is so simple that we can describe everything that happens; it’s also a made-up world, so we can invent many variations. This particular world has just two locations: squares A and B. The vacuum agent perceives which square it is in and whether there is dirt in the square. It can choose to move left, move right, suck up the dirt, or do nothing. One very simple agent function is the following: if the current square is dirty, then suck; otherwise, move to the other square. A partial tabulation of this agent function is shown in Figure 2.3 and an agent program that implements it appears in Figure 2.8 on page 48. Looking at Figure 2.3, we see that various vacuum-world agents can be defined simply by filling in the right-hand column in various ways. The obvious question, then, is this: What is the right way to fill out the table? In other words, what makes an agent good or bad, intelligent or stupid? We answer these questions in the next section. 1 If the agent uses some randomization to choose its actions, then we would have to try each sequence many times to identify the probability of each action. One might imagine that acting randomly is rather silly, but we show later in this chapter that it can be very intelligent.
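The agent program corresponding to this agent function appears in the book as Figure 2.8; as a preview, a minimal sketch of the same behavior—suck if the current square is dirty, otherwise move to the other square—might look like the following. The percept format [location, status] follows Figure 2.3; the function name is our own.

```python
def reflex_vacuum_agent(percept):
    """Agent program for the simple vacuum agent function of Figure 2.3."""
    location, status = percept          # percept = [location, status], e.g. ['A', 'Dirty']
    if status == "Dirty":
        return "Suck"
    elif location == "A":
        return "Right"
    else:
        return "Left"

print(reflex_vacuum_agent(["A", "Dirty"]))   # -> Suck
print(reflex_vacuum_agent(["B", "Clean"]))   # -> Left
```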
  • 55. 36 Chapter 2. Intelligent Agents A B Figure 2.2 A vacuum-cleaner world with just two locations. Percept sequence Action [A, Clean] Right [A, Dirty] Suck [B, Clean] Left [B, Dirty] Suck [A, Clean], [A, Clean] Right [A, Clean], [A, Dirty] Suck . . . . . . [A, Clean], [A, Clean], [A, Clean] Right [A, Clean], [A, Clean], [A, Dirty] Suck . . . . . . Figure 2.3 Partial tabulation of a simple agent function for the vacuum-cleaner world shown in Figure 2.2. Before closing this section, we should emphasize that the notion of an agent is meant to be a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents. One could view a hand-held calculator as an agent that chooses the action of displaying “4” when given the percept sequence “2 + 2 =,” but such an analysis would hardly aid our understanding of the calculator. In a sense, all areas of engineering can be seen as designing artifacts that interact with the world; AI operates at (what the authors consider to be) the most interesting end of the spectrum, where the artifacts have significant computational resources and the task environment requires nontrivial decision making. 2.2 GOOD BEHAVIOR: THE CONCEPT OF RATIONALITY A rational agent is one that does the right thing—conceptually speaking, every entry in the RATIONAL AGENT table for the agent function is filled out correctly. Obviously, doing the right thing is better than doing the wrong thing, but what does it mean to do the right thing?
  • 56. Section 2.2. Good Behavior: The Concept of Rationality 37 We answer this age-old question in an age-old way: by considering the consequences of the agent’s behavior. When an agent is plunked down in an environment, it generates a sequence of actions according to the percepts it receives. This sequence of actions causes the environment to go through a sequence of states. If the sequence is desirable, then the agent has performed well. This notion of desirability is captured by a performance measure that PERFORMANCE MEASURE evaluates any given sequence of environment states. Notice that we said environment states, not agent states. If we define success in terms of agent’s opinion of its own performance, an agent could achieve perfect rationality simply by deluding itself that its performance was perfect. Human agents in particular are notorious for “sour grapes”—believing they did not really want something (e.g., a Nobel Prize) after not getting it. Obviously, there is not one fixed performance measure for all tasks and agents; typically, a designer will devise one appropriate to the circumstances. This is not as easy as it sounds. Consider, for example, the vacuum-cleaner agent from the preceding section. We might propose to measure performance by the amount of dirt cleaned up in a single eight-hour shift. With a rational agent, of course, what you ask for is what you get. A rational agent can maximize this performance measure by cleaning up the dirt, then dumping it all on the floor, then cleaning it up again, and so on. A more suitable performance measure would reward the agent for having a clean floor. For example, one point could be awarded for each clean square at each time step (perhaps with a penalty for electricity consumed and noise generated). As a general rule, it is better to design performance measures according to what one actually wants in the environment, rather than according to how one thinks the agent should behave. Even when the obvious pitfalls are avoided, there remain some knotty issues to untangle. For example, the notion of “clean floor” in the preceding paragraph is based on average cleanliness over time. Yet the same average cleanliness can be achieved by two different agents, one of which does a mediocre job all the time while the other cleans energetically but takes long breaks. Which is preferable might seem to be a fine point of janitorial science, but in fact it is a deep philosophical question with far-reaching implications. Which is better— a reckless life of highs and lows, or a safe but humdrum existence? Which is better—an economy where everyone lives in moderate poverty, or one in which some live in plenty while others are very poor? We leave these questions as an exercise for the diligent reader. 2.2.1 Rationality What is rational at any given time depends on four things: • The performance measure that defines the criterion of success. • The agent’s prior knowledge of the environment. • The actions that the agent can perform. • The agent’s percept sequence to date. This leads to a definition of a rational agent: DEFINITION OF A RATIONAL AGENT For each possible percept sequence, a rational agent should select an action that is ex- pected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has.
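Read operationally, the definition says: over the available actions, pick the one whose expected performance is highest given the percept sequence and the agent's built-in knowledge. A minimal sketch, with the expected-performance estimate supplied as a function—this formulation and the toy estimate for the vacuum world are our own, not notation from the book.

```python
def rational_action(percept_sequence, actions, expected_performance):
    """Choose the action that maximizes expected performance.

    `expected_performance(action, percept_sequence)` is assumed to combine
    the agent's built-in knowledge with the evidence of its percepts.
    """
    return max(actions, key=lambda a: expected_performance(a, percept_sequence))

# Illustrative estimate for the vacuum world: sucking in a dirty square pays off.
def estimate(action, percepts):
    location, status = percepts[-1]
    if status == "Dirty":
        return 1.0 if action == "Suck" else 0.0
    return 0.5 if action in ("Left", "Right") else 0.0

print(rational_action([("A", "Dirty")], ["Left", "Right", "Suck"], estimate))  # -> Suck
```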
  • 57. 38 Chapter 2. Intelligent Agents Consider the simple vacuum-cleaner agent that cleans a square if it is dirty and moves to the other square if not; this is the agent function tabulated in Figure 2.3. Is this a rational agent? That depends! First, we need to say what the performance measure is, what is known about the environment, and what sensors and actuators the agent has. Let us assume the following: • The performance measure awards one point for each clean square at each time step, over a “lifetime” of 1000 time steps. • The “geography” of the environment is known a priori (Figure 2.2) but the dirt distri- bution and the initial location of the agent are not. Clean squares stay clean and sucking cleans the current square. The Left and Right actions move the agent left and right except when this would take the agent outside the environment, in which case the agent remains where it is. • The only available actions are Left, Right, and Suck. • The agent correctly perceives its location and whether that location contains dirt. We claim that under these circumstances the agent is indeed rational; its expected perfor- mance is at least as high as any other agent’s. Exercise 2.1 asks you to prove this. One can see easily that the same agent would be irrational under different circum- stances. For example, once all the dirt is cleaned up, the agent will oscillate needlessly back and forth; if the performance measure includes a penalty of one point for each movement left or right, the agent will fare poorly. A better agent for this case would do nothing once it is sure that all the squares are clean. If clean squares can become dirty again, the agent should occasionally check and re-clean them if needed. If the geography of the environment is un- known, the agent will need to explore it rather than stick to squares A and B. Exercise 2.1 asks you to design agents for these cases. 2.2.2 Omniscience, learning, and autonomy We need to be careful to distinguish between rationality and omniscience. An omniscient OMNISCIENCE agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality. Consider the following example: I am walking along the Champs Elysées one day and I see an old friend across the street. There is no traffic nearby and I’m not otherwise engaged, so, being rational, I start to cross the street. Meanwhile, at 33,000 feet, a cargo door falls off a passing airliner,2 and before I make it to the other side of the street I am flattened. Was I irrational to cross the street? It is unlikely that my obituary would read “Idiot attempts to cross street.” This example shows that rationality is not the same as perfection. Rationality max- imizes expected performance, while perfection maximizes actual performance. Retreating from a requirement of perfection is not just a question of being fair to agents. The point is that if we expect an agent to do what turns out to be the best action after the fact, it will be impossible to design an agent to fulfill this specification—unless we improve the performance of crystal balls or time machines. 2 See N. Henderson, “New door latches urged for Boeing 747 jumbo jets,” Washington Post, August 24, 1989.
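The rationality claim for the vacuum agent can also be checked empirically. Below is a minimal simulation sketch of the two-square environment under the performance measure described above (one point per clean square at each time step, over a lifetime of 1000 steps). It reuses the reflex_vacuum_agent sketch from Section 2.1; the environment encoding is our own assumption, not code from the book.

```python
def run_vacuum_world(agent, start="A", steps=1000):
    """Simulate the two-square vacuum world and return the performance score."""
    dirty = {"A": True, "B": True}          # both squares start dirty
    location, score = start, 0
    for _ in range(steps):
        percept = [location, "Dirty" if dirty[location] else "Clean"]
        action = agent(percept)
        if action == "Suck":
            dirty[location] = False
        elif action == "Right":
            location = "B"
        elif action == "Left":
            location = "A"
        score += sum(1 for square in dirty if not dirty[square])   # one point per clean square
    return score

print(run_vacuum_world(reflex_vacuum_agent))   # close to the 2000-point ceiling
```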
  • 58. Section 2.2. Good Behavior: The Concept of Rationality 39 Our definition of rationality does not require omniscience, then, because the rational choice depends only on the percept sequence to date. We must also ensure that we haven’t inadvertently allowed the agent to engage in decidedly underintelligent activities. For exam- ple, if an agent does not look both ways before crossing a busy road, then its percept sequence will not tell it that there is a large truck approaching at high speed. Does our definition of rationality say that it’s now OK to cross the road? Far from it! First, it would not be rational to cross the road given this uninformative percept sequence: the risk of accident from cross- ing without looking is too great. Second, a rational agent should choose the “looking” action before stepping into the street, because looking helps maximize the expected performance. Doing actions in order to modify future percepts—sometimes called information gather- ing—is an important part of rationality and is covered in depth in Chapter 16. A second INFORMATION GATHERING example of information gathering is provided by the exploration that must be undertaken by EXPLORATION a vacuum-cleaning agent in an initially unknown environment. Our definition requires a rational agent not only to gather information but also to learn LEARNING as much as possible from what it perceives. The agent’s initial configuration could reflect some prior knowledge of the environment, but as the agent gains experience this may be modified and augmented. There are extreme cases in which the environment is completely known a priori. In such cases, the agent need not perceive or learn; it simply acts correctly. Of course, such agents are fragile. Consider the lowly dung beetle. After digging its nest and laying its eggs, it fetches a ball of dung from a nearby heap to plug the entrance. If the ball of dung is removed from its grasp en route, the beetle continues its task and pantomimes plug- ging the nest with the nonexistent dung ball, never noticing that it is missing. Evolution has built an assumption into the beetle’s behavior, and when it is violated, unsuccessful behavior results. Slightly more intelligent is the sphex wasp. The female sphex will dig a burrow, go out and sting a caterpillar and drag it to the burrow, enter the burrow again to check all is well, drag the caterpillar inside, and lay its eggs. The caterpillar serves as a food source when the eggs hatch. So far so good, but if an entomologist moves the caterpillar a few inches away while the sphex is doing the check, it will revert to the “drag” step of its plan and will continue the plan without modification, even after dozens of caterpillar-moving interventions. The sphex is unable to learn that its innate plan is failing, and thus will not change it. To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts, we say that the agent lacks autonomy. A rational agent should be AUTONOMY autonomous—it should learn what it can to compensate for partial or incorrect prior knowl- edge. For example, a vacuum-cleaning agent that learns to foresee where and when additional dirt will appear will do better than one that does not. As a practical matter, one seldom re- quires complete autonomy from the start: when the agent has had little or no experience, it would have to act randomly unless the designer gave some assistance. 
So, just as evolution provides animals with enough built-in reflexes to survive long enough to learn for themselves, it would be reasonable to provide an artificial intelligent agent with some initial knowledge as well as an ability to learn. After sufficient experience of its environment, the behavior of a rational agent can become effectively independent of its prior knowledge. Hence, the incorporation of learning allows one to design a single rational agent that will succeed in a vast variety of environments.
  • 59. 40 Chapter 2. Intelligent Agents 2.3 THE NATURE OF ENVIRONMENTS Now that we have a definition of rationality, we are almost ready to think about building rational agents. First, however, we must think about task environments, which are essen- TASK ENVIRONMENT tially the “problems” to which rational agents are the “solutions.” We begin by showing how to specify a task environment, illustrating the process with a number of examples. We then show that task environments come in a variety of flavors. The flavor of the task environment directly affects the appropriate design for the agent program. 2.3.1 Specifying the task environment In our discussion of the rationality of the simple vacuum-cleaner agent, we had to specify the performance measure, the environment, and the agent’s actuators and sensors. We group all these under the heading of the task environment. For the acronymically minded, we call this the PEAS (Performance, Environment, Actuators, Sensors) description. In designing an PEAS agent, the first step must always be to specify the task environment as fully as possible. The vacuum world was a simple example; let us consider a more complex problem: an automated taxi driver. We should point out, before the reader becomes alarmed, that a fully automated taxi is currently somewhat beyond the capabilities of existing technology. (page 28 describes an existing driving robot.) The full driving task is extremely open-ended. There is no limit to the novel combinations of circumstances that can arise—another reason we chose it as a focus for discussion. Figure 2.4 summarizes the PEAS description for the taxi’s task environment. We discuss each element in more detail in the following paragraphs. Agent Type Performance Measure Environment Actuators Sensors Taxi driver Safe, fast, legal, comfortable trip, maximize profits Roads, other traffic, pedestrians, customers Steering, accelerator, brake, signal, horn, display Cameras, sonar, speedometer, GPS, odometer, accelerometer, engine sensors, keyboard Figure 2.4 PEAS description of the task environment for an automated taxi. First, what is the performance measure to which we would like our automated driver to aspire? Desirable qualities include getting to the correct destination; minimizing fuel con- sumption and wear and tear; minimizing the trip time or cost; minimizing violations of traffic laws and disturbances to other drivers; maximizing safety and passenger comfort; maximiz- ing profits. Obviously, some of these goals conflict, so tradeoffs will be required. Next, what is the driving environment that the taxi will face? Any taxi driver must deal with a variety of roads, ranging from rural lanes and urban alleys to 12-lane freeways. The roads contain other traffic, pedestrians, stray animals, road works, police cars, puddles,
  • 60. Section 2.3. The Nature of Environments 41 and potholes. The taxi must also interact with potential and actual passengers. There are also some optional choices. The taxi might need to operate in Southern California, where snow is seldom a problem, or in Alaska, where it seldom is not. It could always be driving on the right, or we might want it to be flexible enough to drive on the left when in Britain or Japan. Obviously, the more restricted the environment, the easier the design problem. The actuators for an automated taxi include those available to a human driver: control over the engine through the accelerator and control over steering and braking. In addition, it will need output to a display screen or voice synthesizer to talk back to the passengers, and perhaps some way to communicate with other vehicles, politely or otherwise. The basic sensors for the taxi will include one or more controllable video cameras so that it can see the road; it might augment these with infrared or sonar sensors to detect dis- tances to other cars and obstacles. To avoid speeding tickets, the taxi should have a speedome- ter, and to control the vehicle properly, especially on curves, it should have an accelerometer. To determine the mechanical state of the vehicle, it will need the usual array of engine, fuel, and electrical system sensors. Like many human drivers, it might want a global positioning system (GPS) so that it doesn’t get lost. Finally, it will need a keyboard or microphone for the passenger to request a destination. In Figure 2.5, we have sketched the basic PEAS elements for a number of additional agent types. Further examples appear in Exercise 2.4. It may come as a surprise to some read- ers that our list of agent types includes some programs that operate in the entirely artificial environment defined by keyboard input and character output on a screen. “Surely,” one might say, “this is not a real environment, is it?” In fact, what matters is not the distinction between “real” and “artificial” environments, but the complexity of the relationship among the behav- ior of the agent, the percept sequence generated by the environment, and the performance measure. Some “real” environments are actually quite simple. For example, a robot designed to inspect parts as they come by on a conveyor belt can make use of a number of simplifying assumptions: that the lighting is always just so, that the only thing on the conveyor belt will be parts of a kind that it knows about, and that only two actions (accept or reject) are possible. In contrast, some software agents (or software robots or softbots) exist in rich, unlim- SOFTWARE AGENT SOFTBOT ited domains. Imagine a softbot Web site operator designed to scan Internet news sources and show the interesting items to its users, while selling advertising space to generate revenue. To do well, that operator will need some natural language processing abilities, it will need to learn what each user and advertiser is interested in, and it will need to change its plans dynamically—for example, when the connection for one news source goes down or when a new one comes online. The Internet is an environment whose complexity rivals that of the physical world and whose inhabitants include many artificial and human agents. 2.3.2 Properties of task environments The range of task environments that might arise in AI is obviously vast. We can, however, identify a fairly small number of dimensions along which task environments can be catego- rized. 
These dimensions determine, to a large extent, the appropriate agent design and the applicability of each of the principal families of techniques for agent implementation. First,
  • 61. 42 Chapter 2. Intelligent Agents Agent Type Performance Measure Environment Actuators Sensors Medical diagnosis system Healthy patient, reduced costs Patient, hospital, staff Display of questions, tests, diagnoses, treatments, referrals Keyboard entry of symptoms, findings, patient’s answers Satellite image analysis system Correct image categorization Downlink from orbiting satellite Display of scene categorization Color pixel arrays Part-picking robot Percentage of parts in correct bins Conveyor belt with parts; bins Jointed arm and hand Camera, joint angle sensors Refinery controller Purity, yield, safety Refinery, operators Valves, pumps, heaters, displays Temperature, pressure, chemical sensors Interactive English tutor Student’s score on test Set of students, testing agency Display of exercises, suggestions, corrections Keyboard entry Figure 2.5 Examples of agent types and their PEAS descriptions. we list the dimensions, then we analyze several task environments to illustrate the ideas. The definitions here are informal; later chapters provide more precise statements and examples of each kind of environment. Fully observable vs. partially observable: If an agent’s sensors give it access to the FULLY OBSERVABLE PARTIALLY OBSERVABLE complete state of the environment at each point in time, then we say that the task environ- ment is fully observable. A task environment is effectively fully observable if the sensors detect all aspects that are relevant to the choice of action; relevance, in turn, depends on the performance measure. Fully observable environments are convenient because the agent need not maintain any internal state to keep track of the world. An environment might be partially observable because of noisy and inaccurate sensors or because parts of the state are simply missing from the sensor data—for example, a vacuum agent with only a local dirt sensor cannot tell whether there is dirt in other squares, and an automated taxi cannot see what other drivers are thinking. If the agent has no sensors at all then the environment is unobserv- able. One might think that in such cases the agent’s plight is hopeless, but, as we discuss in UNOBSERVABLE Chapter 4, the agent’s goals may still be achievable, sometimes with certainty. Single agent vs. multiagent: The distinction between single-agent and multiagent en- SINGLE AGENT MULTIAGENT
  • 62. Section 2.3. The Nature of Environments 43 vironments may seem simple enough. For example, an agent solving a crossword puzzle by itself is clearly in a single-agent environment, whereas an agent playing chess is in a two- agent environment. There are, however, some subtle issues. First, we have described how an entity may be viewed as an agent, but we have not explained which entities must be viewed as agents. Does an agent A (the taxi driver for example) have to treat an object B (another vehicle) as an agent, or can it be treated merely as an object behaving according to the laws of physics, analogous to waves at the beach or leaves blowing in the wind? The key distinction is whether B’s behavior is best described as maximizing a performance measure whose value depends on agent A’s behavior. For example, in chess, the opponent entity B is trying to maximize its performance measure, which, by the rules of chess, minimizes agent A’s per- formance measure. Thus, chess is a competitive multiagent environment. In the taxi-driving COMPETITIVE environment, on the other hand, avoiding collisions maximizes the performance measure of all agents, so it is a partially cooperative multiagent environment. It is also partially com- COOPERATIVE petitive because, for example, only one car can occupy a parking space. The agent-design problems in multiagent environments are often quite different from those in single-agent en- vironments; for example, communication often emerges as a rational behavior in multiagent environments; in some competitive environments, randomized behavior is rational because it avoids the pitfalls of predictability. Deterministic vs. stochastic. If the next state of the environment is completely deter- DETERMINISTIC STOCHASTIC mined by the current state and the action executed by the agent, then we say the environment is deterministic; otherwise, it is stochastic. In principle, an agent need not worry about uncer- tainty in a fully observable, deterministic environment. (In our definition, we ignore uncer- tainty that arises purely from the actions of other agents in a multiagent environment; thus, a game can be deterministic even though each agent may be unable to predict the actions of the others.) If the environment is partially observable, however, then it could appear to be stochastic. Most real situations are so complex that it is impossible to keep track of all the unobserved aspects; for practical purposes, they must be treated as stochastic. Taxi driving is clearly stochastic in this sense, because one can never predict the behavior of traffic exactly; moreover, one’s tires blow out and one’s engine seizes up without warning. The vacuum world as we described it is deterministic, but variations can include stochastic elements such as randomly appearing dirt and an unreliable suction mechanism (Exercise 2.13). We say an environment is uncertain if it is not fully observable or not deterministic. One final note: UNCERTAIN our use of the word “stochastic” generally implies that uncertainty about outcomes is quan- tified in terms of probabilities; a nondeterministic environment is one in which actions are NONDETERMINISTIC characterized by their possible outcomes, but no probabilities are attached to them. Nonde- terministic environment descriptions are usually associated with performance measures that require the agent to succeed for all possible outcomes of its actions. Episodic vs. 
sequential: In an episodic task environment, the agent’s experience is EPISODIC SEQUENTIAL divided into atomic episodes. In each episode the agent receives a percept and then performs a single action. Crucially, the next episode does not depend on the actions taken in previous episodes. Many classification tasks are episodic. For example, an agent that has to spot defective parts on an assembly line bases each decision on the current part, regardless of previous decisions; moreover, the current decision doesn’t affect whether the next part is
  • 63. 44 Chapter 2. Intelligent Agents defective. In sequential environments, on the other hand, the current decision could affect all future decisions.3 Chess and taxi driving are sequential: in both cases, short-term actions can have long-term consequences. Episodic environments are much simpler than sequential environments because the agent does not need to think ahead. Static vs. dynamic: If the environment can change while an agent is deliberating, then STATIC DYNAMIC we say the environment is dynamic for that agent; otherwise, it is static. Static environments are easy to deal with because the agent need not keep looking at the world while it is deciding on an action, nor need it worry about the passage of time. Dynamic environments, on the other hand, are continuously asking the agent what it wants to do; if it hasn’t decided yet, that counts as deciding to do nothing. If the environment itself does not change with the passage of time but the agent’s performance score does, then we say the environment is semidynamic. Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving SEMIDYNAMIC while the driving algorithm dithers about what to do next. Chess, when played with a clock, is semidynamic. Crossword puzzles are static. Discrete vs. continuous: The discrete/continuous distinction applies to the state of the DISCRETE CONTINUOUS environment, to the way time is handled, and to the percepts and actions of the agent. For example, the chess environment has a finite number of distinct states (excluding the clock). Chess also has a discrete set of percepts and actions. Taxi driving is a continuous-state and continuous-time problem: the speed and location of the taxi and of the other vehicles sweep through a range of continuous values and do so smoothly over time. Taxi-driving actions are also continuous (steering angles, etc.). Input from digital cameras is discrete, strictly speak- ing, but is typically treated as representing continuously varying intensities and locations. Known vs. unknown: Strictly speaking, this distinction refers not to the environment KNOWN UNKNOWN itself but to the agent’s (or designer’s) state of knowledge about the “laws of physics” of the environment. In a known environment, the outcomes (or outcome probabilities if the environment is stochastic) for all actions are given. Obviously, if the environment is unknown, the agent will have to learn how it works in order to make good decisions. Note that the distinction between known and unknown environments is not the same as the one between fully and partially observable environments. It is quite possible for a known environment to be partially observable—for example, in solitaire card games, I know the rules but am still unable to see the cards that have not yet been turned over. Conversely, an unknown environment can be fully observable—in a new video game, the screen may show the entire game state but I still don’t know what the buttons do until I try them. As one might expect, the hardest case is partially observable, multiagent, stochastic, sequential, dynamic, continuous, and unknown. Taxi driving is hard in all these senses, except that for the most part the driver’s environment is known. Driving a rented car in a new country with unfamiliar geography and traffic laws is a lot more exciting. Figure 2.6 lists the properties of a number of familiar environments. Note that the answers are not always cut and dried. 
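One way to keep the dimensions straight is to treat each one as a small enumerated type, so that a task environment is simply an assignment of a value to every dimension. The sketch below is ours, not the repository's; the type names are invented, and the two entries transcribe rows of Figure 2.6.

from dataclasses import dataclass
from enum import Enum

# One small enumerated type per dimension of Section 2.3.2 (type names are ours).
Observable = Enum("Observable", "FULLY PARTIALLY UNOBSERVABLE")
Agents     = Enum("Agents", "SINGLE MULTI")
Outcome    = Enum("Outcome", "DETERMINISTIC STOCHASTIC NONDETERMINISTIC")
Episodes   = Enum("Episodes", "EPISODIC SEQUENTIAL")
Change     = Enum("Change", "STATIC SEMIDYNAMIC DYNAMIC")
StateSpace = Enum("StateSpace", "DISCRETE CONTINUOUS")

@dataclass
class TaskEnvironmentProperties:
    observable: Observable
    agents: Agents
    outcome: Outcome
    episodes: Episodes
    change: Change
    states: StateSpace

# Two rows of Figure 2.6, transcribed as data.
crossword = TaskEnvironmentProperties(
    Observable.FULLY, Agents.SINGLE, Outcome.DETERMINISTIC,
    Episodes.SEQUENTIAL, Change.STATIC, StateSpace.DISCRETE)

taxi_driving = TaskEnvironmentProperties(
    Observable.PARTIALLY, Agents.MULTI, Outcome.STOCHASTIC,
    Episodes.SEQUENTIAL, Change.DYNAMIC, StateSpace.CONTINUOUS)

Even written out this explicitly, the classifications remain judgment calls rather than hard facts, as the next example illustrates.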
For example, we describe the part-picking robot as episodic, because it normally considers each part in isolation. But if one day there is a large
3 The word “sequential” is also used in computer science as the antonym of “parallel.” The two meanings are largely unrelated.
  • 64. Section 2.3. The Nature of Environments 45 Task Environment Observable Agents Deterministic Episodic Static Discrete Crossword puzzle Fully Single Deterministic Sequential Static Discrete Chess with a clock Fully Multi Deterministic Sequential Semi Discrete Poker Partially Multi Stochastic Sequential Static Discrete Backgammon Fully Multi Stochastic Sequential Static Discrete Taxi driving Partially Multi Stochastic Sequential Dynamic Continuous Medical diagnosis Partially Single Stochastic Sequential Dynamic Continuous Image analysis Fully Single Deterministic Episodic Semi Continuous Part-picking robot Partially Single Stochastic Episodic Dynamic Continuous Refinery controller Partially Single Stochastic Sequential Dynamic Continuous Interactive English tutor Partially Multi Stochastic Sequential Dynamic Discrete Figure 2.6 Examples of task environments and their characteristics. batch of defective parts, the robot should learn from several observations that the distribution of defects has changed, and should modify its behavior for subsequent parts. We have not included a “known/unknown” column because, as explained earlier, this is not strictly a prop- erty of the environment. For some environments, such as chess and poker, it is quite easy to supply the agent with full knowledge of the rules, but it is nonetheless interesting to consider how an agent might learn to play these games without such knowledge. Several of the answers in the table depend on how the task environment is defined. We have listed the medical-diagnosis task as single-agent because the disease process in a patient is not profitably modeled as an agent; but a medical-diagnosis system might also have to deal with recalcitrant patients and skeptical staff, so the environment could have a multiagent aspect. Furthermore, medical diagnosis is episodic if one conceives of the task as selecting a diagnosis given a list of symptoms; the problem is sequential if the task can include proposing a series of tests, evaluating progress over the course of treatment, and so on. Also, many environments are episodic at higher levels than the agent’s individual actions. For example, a chess tournament consists of a sequence of games; each game is an episode because (by and large) the contribution of the moves in one game to the agent’s overall performance is not affected by the moves in its previous game. On the other hand, decision making within a single game is certainly sequential. The code repository associated with this book (aima.cs.berkeley.edu) includes imple- mentations of a number of environments, together with a general-purpose environment simu- lator that places one or more agents in a simulated environment, observes their behavior over time, and evaluates them according to a given performance measure. Such experiments are often carried out not for a single environment but for many environments drawn from an en- vironment class. For example, to evaluate a taxi driver in simulated traffic, we would want to ENVIRONMENT CLASS run many simulations with different traffic, lighting, and weather conditions. If we designed the agent for a single scenario, we might be able to take advantage of specific properties of the particular case but might not identify a good design for driving in general. For this
  • 65. 46 Chapter 2. Intelligent Agents reason, the code repository also includes an environment generator for each environment ENVIRONMENT GENERATOR class that selects particular environments (with certain likelihoods) in which to run the agent. For example, the vacuum environment generator initializes the dirt pattern and agent location randomly. We are then interested in the agent’s average performance over the environment class. A rational agent for a given environment class maximizes this average performance. Exercises 2.9 to 2.13 take you through the process of developing an environment class and evaluating various agents therein. 2.4 THE STRUCTURE OF AGENTS So far we have talked about agents by describing behavior—the action that is performed after any given sequence of percepts. Now we must bite the bullet and talk about how the insides work. The job of AI is to design an agent program that implements the agent function— AGENT PROGRAM the mapping from percepts to actions. We assume this program will run on some sort of computing device with physical sensors and actuators—we call this the architecture: ARCHITECTURE agent = architecture + program . Obviously, the program we choose has to be one that is appropriate for the architecture. If the program is going to recommend actions like Walk, the architecture had better have legs. The architecture might be just an ordinary PC, or it might be a robotic car with several onboard computers, cameras, and other sensors. In general, the architecture makes the percepts from the sensors available to the program, runs the program, and feeds the program’s action choices to the actuators as they are generated. Most of this book is about designing agent programs, although Chapters 24 and 25 deal directly with the sensors and actuators. 2.4.1 Agent programs The agent programs that we design in this book all have the same skeleton: they take the current percept as input from the sensors and return an action to the actuators.4 Notice the difference between the agent program, which takes the current percept as input, and the agent function, which takes the entire percept history. The agent program takes just the current percept as input because nothing more is available from the environment; if the agent’s actions need to depend on the entire percept sequence, the agent will have to remember the percepts. We describe the agent programs in the simple pseudocode language that is defined in Appendix B. (The online code repository contains implementations in real programming languages.) For example, Figure 2.7 shows a rather trivial agent program that keeps track of the percept sequence and then uses it to index into a table of actions to decide what to do. The table—an example of which is given for the vacuum world in Figure 2.3—represents explicitly the agent function that the agent program embodies. To build a rational agent in 4 There are other choices for the agent program skeleton; for example, we could have the agent programs be coroutines that run asynchronously with the environment. Each such coroutine has an input and output port and consists of a loop that reads the input port for percepts and writes actions to the output port.
function TABLE-DRIVEN-AGENT(percept) returns an action
  persistent: percepts, a sequence, initially empty
              table, a table of actions, indexed by percept sequences, initially fully specified
  append percept to the end of percepts
  action ← LOOKUP(percepts, table)
  return action

Figure 2.7 The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an action each time. It retains the complete percept sequence in memory.

this way, we as designers must construct a table that contains the appropriate action for every possible percept sequence.
It is instructive to consider why the table-driven approach to agent construction is doomed to failure. Let P be the set of possible percepts and let T be the lifetime of the agent (the total number of percepts it will receive). The lookup table will contain \sum_{t=1}^{T} |P|^t entries. Consider the automated taxi: the visual input from a single camera comes in at the rate of roughly 27 megabytes per second (30 frames per second, 640 × 480 pixels with 24 bits of color information). This gives a lookup table with over 10^{250,000,000,000} entries for an hour's driving. Even the lookup table for chess—a tiny, well-behaved fragment of the real world—would have at least 10^{150} entries. The daunting size of these tables (the number of atoms in the observable universe is less than 10^{80}) means that (a) no physical agent in this universe will have the space to store the table, (b) the designer would not have time to create the table, (c) no agent could ever learn all the right table entries from its experience, and (d) even if the environment is simple enough to yield a feasible table size, the designer still has no guidance about how to fill in the table entries.
Despite all this, TABLE-DRIVEN-AGENT does do what we want: it implements the desired agent function. The key challenge for AI is to find out how to write programs that, to the extent possible, produce rational behavior from a smallish program rather than from a vast table. We have many examples showing that this can be done successfully in other areas: for example, the huge tables of square roots used by engineers and schoolchildren prior to the 1970s have now been replaced by a five-line program for Newton's method running on electronic calculators. The question is, can AI do for general intelligent behavior what Newton did for square roots? We believe the answer is yes.
In the remainder of this section, we outline four basic kinds of agent programs that embody the principles underlying almost all intelligent systems:
• Simple reflex agents;
• Model-based reflex agents;
• Goal-based agents; and
• Utility-based agents.
Each kind of agent program combines particular components in particular ways to generate actions. Section 2.4.6 explains in general terms how to convert all these agents into learning
function REFLEX-VACUUM-AGENT([location, status]) returns an action
  if status = Dirty then return Suck
  else if location = A then return Right
  else if location = B then return Left

Figure 2.8 The agent program for a simple reflex agent in the two-state vacuum environment. This program implements the agent function tabulated in Figure 2.3.

agents that can improve the performance of their components so as to generate better actions. Finally, Section 2.4.7 describes the variety of ways in which the components themselves can be represented within the agent. This variety provides a major organizing principle for the field and for the book itself.
2.4.2 Simple reflex agents
The simplest kind of agent is the simple reflex agent. These agents select actions on the basis of the current percept, ignoring the rest of the percept history. For example, the vacuum agent whose agent function is tabulated in Figure 2.3 is a simple reflex agent, because its decision is based only on the current location and on whether that location contains dirt. An agent program for this agent is shown in Figure 2.8.
Notice that the vacuum agent program is very small indeed compared to the corresponding table. The most obvious reduction comes from ignoring the percept history, which cuts down the number of possibilities from 4^T to just 4. A further, small reduction comes from the fact that when the current square is dirty, the action does not depend on the location.
Simple reflex behaviors occur even in more complex environments. Imagine yourself as the driver of the automated taxi. If the car in front brakes and its brake lights come on, then you should notice this and initiate braking. In other words, some processing is done on the visual input to establish the condition we call “The car in front is braking.” Then, this triggers some established connection in the agent program to the action “initiate braking.” We call such a connection a condition–action rule,5 written as
if car-in-front-is-braking then initiate-braking.
Humans also have many such connections, some of which are learned responses (as for driving) and some of which are innate reflexes (such as blinking when something approaches the eye). In the course of the book, we show several different ways in which such connections can be learned and implemented.
The program in Figure 2.8 is specific to one particular vacuum environment. A more general and flexible approach is first to build a general-purpose interpreter for condition–action rules and then to create rule sets for specific task environments. Figure 2.9 gives the structure of this general program in schematic form, showing how the condition–action rules allow the agent to make the connection from percept to action. (Do not worry if this seems
5 Also called situation–action rules, productions, or if–then rules.
  • 68. Section 2.4. The Structure of Agents 49 Agent Environment Sensors What action I should do now Condition-action rules Actuators What the world is like now Figure 2.9 Schematic diagram of a simple reflex agent. function SIMPLE-REFLEX-AGENT(percept) returns an action persistent: rules, a set of condition–action rules state ← INTERPRET-INPUT(percept) rule ← RULE-MATCH(state,rules) action ← rule.ACTION return action Figure 2.10 A simple reflex agent. It acts according to a rule whose condition matches the current state, as defined by the percept. trivial; it gets more interesting shortly.) We use rectangles to denote the current internal state of the agent’s decision process, and ovals to represent the background information used in the process. The agent program, which is also very simple, is shown in Figure 2.10. The INTERPRET-INPUT function generates an abstracted description of the current state from the percept, and the RULE-MATCH function returns the first rule in the set of rules that matches the given state description. Note that the description in terms of “rules” and “matching” is purely conceptual; actual implementations can be as simple as a collection of logic gates implementing a Boolean circuit. Simple reflex agents have the admirable property of being simple, but they turn out to be of limited intelligence. The agent in Figure 2.10 will work only if the correct decision can be made on the basis of only the current percept—that is, only if the environment is fully observ- able. Even a little bit of unobservability can cause serious trouble. For example, the braking rule given earlier assumes that the condition car-in-front-is-braking can be determined from the current percept—a single frame of video. This works if the car in front has a centrally mounted brake light. Unfortunately, older models have different configurations of taillights,
  • 69. 50 Chapter 2. Intelligent Agents brake lights, and turn-signal lights, and it is not always possible to tell from a single image whether the car is braking. A simple reflex agent driving behind such a car would either brake continuously and unnecessarily, or, worse, never brake at all. We can see a similar problem arising in the vacuum world. Suppose that a simple reflex vacuum agent is deprived of its location sensor and has only a dirt sensor. Such an agent has just two possible percepts: [Dirty] and [Clean]. It can Suck in response to [Dirty]; what should it do in response to [Clean]? Moving Left fails (forever) if it happens to start in square A, and moving Right fails (forever) if it happens to start in square B. Infinite loops are often unavoidable for simple reflex agents operating in partially observable environments. Escape from infinite loops is possible if the agent can randomize its actions. For ex- RANDOMIZATION ample, if the vacuum agent perceives [Clean], it might flip a coin to choose between Left and Right. It is easy to show that the agent will reach the other square in an average of two steps. Then, if that square is dirty, the agent will clean it and the task will be complete. Hence, a randomized simple reflex agent might outperform a deterministic simple reflex agent. We mentioned in Section 2.3 that randomized behavior of the right kind can be rational in some multiagent environments. In single-agent environments, randomization is usually not rational. It is a useful trick that helps a simple reflex agent in some situations, but in most cases we can do much better with more sophisticated deterministic agents. 2.4.3 Model-based reflex agents The most effective way to handle partial observability is for the agent to keep track of the part of the world it can’t see now. That is, the agent should maintain some sort of internal state that depends on the percept history and thereby reflects at least some of the unobserved INTERNAL STATE aspects of the current state. For the braking problem, the internal state is not too extensive— just the previous frame from the camera, allowing the agent to detect when two red lights at the edge of the vehicle go on or off simultaneously. For other driving tasks such as changing lanes, the agent needs to keep track of where the other cars are if it can’t see them all at once. And for any driving to be possible at all, the agent needs to keep track of where its keys are. Updating this internal state information as time goes by requires two kinds of knowl- edge to be encoded in the agent program. First, we need some information about how the world evolves independently of the agent—for example, that an overtaking car generally will be closer behind than it was a moment ago. Second, we need some information about how the agent’s own actions affect the world—for example, that when the agent turns the steering wheel clockwise, the car turns to the right, or that after driving for five minutes northbound on the freeway, one is usually about five miles north of where one was five minutes ago. This knowledge about “how the world works”—whether implemented in simple Boolean circuits or in complete scientific theories—is called a model of the world. An agent that uses such a model is called a model-based agent. 
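The two reflex programs just discussed—the location-aware agent of Figure 2.8 and the location-blind, randomizing variant—are small enough to write out in full. The sketch below is illustrative rather than the repository's implementation; the function names are ours, and the string names for percepts and actions follow Figure 2.3.

import random

def reflex_vacuum_agent(percept):
    """Simple reflex agent for the two-square vacuum world (Figure 2.8)."""
    location, status = percept          # percept is [location, status], e.g. ["A", "Dirty"]
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

def randomized_reflex_vacuum_agent(percept):
    """Variant with only a dirt sensor: the percept is just [Dirty] or [Clean]."""
    status = percept[0]
    if status == "Dirty":
        return "Suck"
    # Coin flip: reaches the other square in two steps on average, escaping the infinite loop.
    return random.choice(["Left", "Right"])

The randomized version escapes the infinite loop only at the cost of wasted moves; the model-based designs described next (Figures 2.11 and 2.12) attack the same problem by remembering what has already been observed.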
Figure 2.11 gives the structure of the model-based reflex agent with internal state, showing how the current percept is combined with the old internal state to generate the updated description of the current state, based on the agent's model of how the world works. The agent program is shown in Figure 2.12. The interesting part is the function UPDATE-STATE, which
  • 70. Section 2.4. The Structure of Agents 51 Agent Environment Sensors State How the world evolves What my actions do Condition-action rules Actuators What the world is like now What action I should do now Figure 2.11 A model-based reflex agent. function MODEL-BASED-REFLEX-AGENT(percept) returns an action persistent: state, the agent’s current conception of the world state model, a description of how the next state depends on current state and action rules, a set of condition–action rules action, the most recent action, initially none state ← UPDATE-STATE(state,action,percept,model) rule ← RULE-MATCH(state,rules) action ← rule.ACTION return action Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world, using an internal model. It then chooses an action in the same way as the reflex agent. is responsible for creating the new internal state description. The details of how models and states are represented vary widely depending on the type of environment and the particular technology used in the agent design. Detailed examples of models and updating algorithms appear in Chapters 4, 12, 11, 15, 17, and 25. Regardless of the kind of representation used, it is seldom possible for the agent to determine the current state of a partially observable environment exactly. Instead, the box labeled “what the world is like now” (Figure 2.11) represents the agent’s “best guess” (or sometimes best guesses). For example, an automated taxi may not be able to see around the large truck that has stopped in front of it and can only guess about what may be causing the hold-up. Thus, uncertainty about the current state may be unavoidable, but the agent still has to make a decision. A perhaps less obvious point about the internal “state” maintained by a model-based agent is that it does not have to describe “what the world is like now” in a literal sense. For
  • 71. 52 Chapter 2. Intelligent Agents Agent Environment Sensors What action I should do now State How the world evolves What my actions do Actuators What the world is like now What it will be like if I do action A Goals Figure 2.13 A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its goals. example, the taxi may be driving back home, and it may have a rule telling it to fill up with gas on the way home unless it has at least half a tank. Although “driving back home” may seem to an aspect of the world state, the fact of the taxi’s destination is actually an aspect of the agent’s internal state. If you find this puzzling, consider that the taxi could be in exactly the same place at the same time, but intending to reach a different destination. 2.4.4 Goal-based agents Knowing something about the current state of the environment is not always enough to decide what to do. For example, at a road junction, the taxi can turn left, turn right, or go straight on. The correct decision depends on where the taxi is trying to get to. In other words, as well as a current state description, the agent needs some sort of goal information that describes GOAL situations that are desirable—for example, being at the passenger’s destination. The agent program can combine this with the model (the same information as was used in the model- based reflex agent) to choose actions that achieve the goal. Figure 2.13 shows the goal-based agent’s structure. Sometimes goal-based action selection is straightforward—for example, when goal sat- isfaction results immediately from a single action. Sometimes it will be more tricky—for example, when the agent has to consider long sequences of twists and turns in order to find a way to achieve the goal. Search (Chapters 3 to 5) and planning (Chapters 10 and 11) are the subfields of AI devoted to finding action sequences that achieve the agent’s goals. Notice that decision making of this kind is fundamentally different from the condition– action rules described earlier, in that it involves consideration of the future—both “What will happen if I do such-and-such?” and “Will that make me happy?” In the reflex agent designs, this information is not explicitly represented, because the built-in rules map directly from
  • 72. Section 2.4. The Structure of Agents 53 percepts to actions. The reflex agent brakes when it sees brake lights. A goal-based agent, in principle, could reason that if the car in front has its brake lights on, it will slow down. Given the way the world usually evolves, the only action that will achieve the goal of not hitting other cars is to brake. Although the goal-based agent appears less efficient, it is more flexible because the knowledge that supports its decisions is represented explicitly and can be modified. If it starts to rain, the agent can update its knowledge of how effectively its brakes will operate; this will automatically cause all of the relevant behaviors to be altered to suit the new conditions. For the reflex agent, on the other hand, we would have to rewrite many condition–action rules. The goal-based agent’s behavior can easily be changed to go to a different destination, simply by specifying that destination as the goal. The reflex agent’s rules for when to turn and when to go straight will work only for a single destination; they must all be replaced to go somewhere new. 2.4.5 Utility-based agents Goals alone are not enough to generate high-quality behavior in most environments. For example, many action sequences will get the taxi to its destination (thereby achieving the goal) but some are quicker, safer, more reliable, or cheaper than others. Goals just provide a crude binary distinction between “happy” and “unhappy” states. A more general performance measure should allow a comparison of different world states according to exactly how happy they would make the agent. Because “happy” does not sound very scientific, economists and computer scientists use the term utility instead.6 UTILITY We have already seen that a performance measure assigns a score to any given sequence of environment states, so it can easily distinguish between more and less desirable ways of getting to the taxi’s destination. An agent’s utility function is essentially an internalization UTILITY FUNCTION of the performance measure. If the internal utility function and the external performance measure are in agreement, then an agent that chooses actions to maximize its utility will be rational according to the external performance measure. Let us emphasize again that this is not the only way to be rational—we have already seen a rational agent program for the vacuum world (Figure 2.8) that has no idea what its utility function is—but, like goal-based agents, a utility-based agent has many advantages in terms of flexibility and learning. Furthermore, in two kinds of cases, goals are inadequate but a utility-based agent can still make rational decisions. First, when there are conflicting goals, only some of which can be achieved (for example, speed and safety), the utility function specifies the appropriate tradeoff. Second, when there are several goals that the agent can aim for, none of which can be achieved with certainty, utility provides a way in which the likelihood of success can be weighed against the importance of the goals. Partial observability and stochasticity are ubiquitous in the real world, and so, therefore, is decision making under uncertainty. 
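The paragraph that follows states the decision rule precisely: choose the action with the highest expected utility. As a rough sketch of the shape of that rule—ours, not the book's—assume the agent's model can supply a probability distribution over outcome states for each action; the helper names (outcome_distribution, utility, goal_test) are all assumptions.

def expected_utility(action, state, outcome_distribution, utility):
    """Average utility of an action's possible outcomes, weighted by their probabilities.
    outcome_distribution(state, action) is assumed to return {outcome_state: probability}."""
    return sum(p * utility(outcome)
               for outcome, p in outcome_distribution(state, action).items())

def utility_based_choice(state, actions, outcome_distribution, utility):
    """Pick the action with the highest expected utility."""
    return max(actions,
               key=lambda a: expected_utility(a, state, outcome_distribution, utility))

def goal_based_choice(state, actions, outcome_distribution, goal_test):
    """A goal-based agent as a special case: utility 1 for goal states, 0 otherwise,
    so the agent simply maximizes the probability of achieving the goal."""
    return utility_based_choice(state, actions, outcome_distribution,
                                utility=lambda s: 1 if goal_test(s) else 0)

The hard work is hidden in the arguments—the state estimate, the outcome model, and a utility function that agrees with the external performance measure.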
Technically speaking, a rational utility-based agent chooses the action that maximizes the expected utility of the action outcomes—that is, the utility the agent expects to derive, on average, given the probabilities and utilities of each
6 The word “utility” here refers to “the quality of being useful,” not to the electric company or waterworks.
  • 73. 54 Chapter 2. Intelligent Agents Agent Environment Sensors How happy I will be in such a state State How the world evolves What my actions do Utility Actuators What action I should do now What it will be like if I do action A What the world is like now Figure 2.14 A model-based, utility-based agent. It uses a model of the world, along with a utility function that measures its preferences among states of the world. Then it chooses the action that leads to the best expected utility, where expected utility is computed by averaging over all possible outcome states, weighted by the probability of the outcome. outcome. (Appendix A defines expectation more precisely.) In Chapter 16, we show that any rational agent must behave as if it possesses a utility function whose expected value it tries to maximize. An agent that possesses an explicit utility function can make rational decisions with a general-purpose algorithm that does not depend on the specific utility function being maximized. In this way, the “global” definition of rationality—designating as rational those agent functions that have the highest performance—is turned into a “local” constraint on rational-agent designs that can be expressed in a simple program. The utility-based agent structure appears in Figure 2.14. Utility-based agent programs appear in Part IV, where we design decision-making agents that must handle the uncertainty inherent in stochastic or partially observable environments. At this point, the reader may be wondering, “Is it that simple? We just build agents that maximize expected utility, and we’re done?” It’s true that such agents would be intelligent, but it’s not simple. A utility-based agent has to model and keep track of its environment, tasks that have involved a great deal of research on perception, representation, reasoning, and learning. The results of this research fill many of the chapters of this book. Choosing the utility-maximizing course of action is also a difficult task, requiring ingenious algorithms that fill several more chapters. Even with these algorithms, perfect rationality is usually unachievable in practice because of computational complexity, as we noted in Chapter 1. 2.4.6 Learning agents We have described agent programs with various methods for selecting actions. We have not, so far, explained how the agent programs come into being. In his famous early paper, Turing (1950) considers the idea of actually programming his intelligent machines by hand.
  • 74. Section 2.4. The Structure of Agents 55 Performance standard Agent Environment Sensors Performance element changes knowledge learning goals Problem generator feedback Learning element Critic Actuators Figure 2.15 A general learning agent. He estimates how much work this might take and concludes “Some more expeditious method seems desirable.” The method he proposes is to build learning machines and then to teach them. In many areas of AI, this is now the preferred method for creating state-of-the-art systems. Learning has another advantage, as we noted earlier: it allows the agent to operate in initially unknown environments and to become more competent than its initial knowledge alone might allow. In this section, we briefly introduce the main ideas of learning agents. Throughout the book, we comment on opportunities and methods for learning in particular kinds of agents. Part V goes into much more depth on the learning algorithms themselves. A learning agent can be divided into four conceptual components, as shown in Fig- ure 2.15. The most important distinction is between the learning element, which is re- LEARNING ELEMENT sponsible for making improvements, and the performance element, which is responsible for PERFORMANCE ELEMENT selecting external actions. The performance element is what we have previously considered to be the entire agent: it takes in percepts and decides on actions. The learning element uses feedback from the critic on how the agent is doing and determines how the performance CRITIC element should be modified to do better in the future. The design of the learning element depends very much on the design of the performance element. When trying to design an agent that learns a certain capability, the first question is not “How am I going to get it to learn this?” but “What kind of performance element will my agent need to do this once it has learned how?” Given an agent design, learning mechanisms can be constructed to improve every part of the agent. The critic tells the learning element how well the agent is doing with respect to a fixed performance standard. The critic is necessary because the percepts themselves provide no indication of the agent’s success. For example, a chess program could receive a percept indicating that it has checkmated its opponent, but it needs a performance standard to know that this is a good thing; the percept itself does not say so. It is important that the performance
  • 75. 56 Chapter 2. Intelligent Agents standard be fixed. Conceptually, one should think of it as being outside the agent altogether because the agent must not modify it to fit its own behavior. The last component of the learning agent is the problem generator. It is responsible PROBLEM GENERATOR for suggesting actions that will lead to new and informative experiences. The point is that if the performance element had its way, it would keep doing the actions that are best, given what it knows. But if the agent is willing to explore a little and do some perhaps suboptimal actions in the short run, it might discover much better actions for the long run. The problem generator’s job is to suggest these exploratory actions. This is what scientists do when they carry out experiments. Galileo did not think that dropping rocks from the top of a tower in Pisa was valuable in itself. He was not trying to break the rocks or to modify the brains of unfortunate passers-by. His aim was to modify his own brain by identifying a better theory of the motion of objects. To make the overall design more concrete, let us return to the automated taxi example. The performance element consists of whatever collection of knowledge and procedures the taxi has for selecting its driving actions. The taxi goes out on the road and drives, using this performance element. The critic observes the world and passes information along to the learning element. For example, after the taxi makes a quick left turn across three lanes of traf- fic, the critic observes the shocking language used by other drivers. From this experience, the learning element is able to formulate a rule saying this was a bad action, and the performance element is modified by installation of the new rule. The problem generator might identify certain areas of behavior in need of improvement and suggest experiments, such as trying out the brakes on different road surfaces under different conditions. The learning element can make changes to any of the “knowledge” components shown in the agent diagrams (Figures 2.9, 2.11, 2.13, and 2.14). The simplest cases involve learning directly from the percept sequence. Observation of pairs of successive states of the environ- ment can allow the agent to learn “How the world evolves,” and observation of the results of its actions can allow the agent to learn “What my actions do.” For example, if the taxi exerts a certain braking pressure when driving on a wet road, then it will soon find out how much deceleration is actually achieved. Clearly, these two learning tasks are more difficult if the environment is only partially observable. The forms of learning in the preceding paragraph do not need to access the external performance standard—in a sense, the standard is the universal one of making predictions that agree with experiment. The situation is slightly more complex for a utility-based agent that wishes to learn utility information. For example, suppose the taxi-driving agent receives no tips from passengers who have been thoroughly shaken up during the trip. The external performance standard must inform the agent that the loss of tips is a negative contribution to its overall performance; then the agent might be able to learn that violent maneuvers do not contribute to its own utility. In a sense, the performance standard distinguishes part of the incoming percept as a reward (or penalty) that provides direct feedback on the quality of the agent’s behavior. 
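Figure 2.15's four components can be arranged as a single control loop. The sketch below is deliberately schematic and is ours rather than the repository's; every name is an assumed callable supplied by the designer, not an existing API.

def run_learning_agent(env_step, percept, agent, critic, learn, explore, n_steps=1000):
    """Schematic learning-agent loop in the spirit of Figure 2.15.
    agent    -- the performance element: a percept -> action function
    critic   -- judges each percept against a FIXED performance standard, yielding feedback
    learn    -- the learning element: returns an improved performance element
    explore  -- the problem generator: suggests an informative action, or None
    env_step -- applies an action to the environment and returns the next percept"""
    for _ in range(n_steps):
        feedback = critic(percept)
        agent = learn(agent, feedback)          # improve the performance element
        action = explore(percept) or agent(percept)
        percept = env_step(action)              # act and receive the next percept
    return agent

The critic is the only component that consults the fixed performance standard; it is what turns part of the incoming percept stream into a reward or penalty.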
Hard-wired performance standards such as pain and hunger in animals can be understood in this way. This issue is discussed further in Chapter 21. In summary, agents have a variety of components, and those components can be repre- sented in many ways within the agent program, so there appears to be great variety among
  • 76. Section 2.4. The Structure of Agents 57 learning methods. There is, however, a single unifying theme. Learning in intelligent agents can be summarized as a process of modification of each component of the agent to bring the components into closer agreement with the available feedback information, thereby improv- ing the overall performance of the agent. 2.4.7 How the components of agent programs work We have described agent programs (in very high-level terms) as consisting of various compo- nents, whose function it is to answer questions such as: “What is the world like now?” “What action should I do now?” “What do my actions do?” The next question for a student of AI is, “How on earth do these components work?” It takes about a thousand pages to begin to answer that question properly, but here we want to draw the reader’s attention to some basic distinctions among the various ways that the components can represent the environment that the agent inhabits. Roughly speaking, we can place the representations along an axis of increasing com- plexity and expressive power—atomic, factored, and structured. To illustrate these ideas, it helps to consider a particular agent component, such as the one that deals with “What my actions do.” This component describes the changes that might occur in the environment as the result of taking an action, and Figure 2.16 provides schematic depictions of how those transitions might be represented. B C (a) Atomic (b) Factored (b) Structured B C Figure 2.16 Three ways to represent states and the transitions between them. (a) Atomic representation: a state (such as B or C) is a black box with no internal structure; (b) Factored representation: a state consists of a vector of attribute values; values can be Boolean, real- valued, or one of a fixed set of symbols. (c) Structured representation: a state includes objects, each of which may have attributes of its own as well as relationships to other objects. In an atomic representation each state of the world is indivisible—it has no internal ATOMIC REPRESENTATION structure. Consider the problem of finding a driving route from one end of a country to the other via some sequence of cities (we address this problem in Figure 3.2 on page 68). For the purposes of solving this problem, it may suffice to reduce the state of world to just the name of the city we are in—a single atom of knowledge; a “black box” whose only discernible property is that of being identical to or different from another black box. The algorithms
  • 77. 58 Chapter 2. Intelligent Agents underlying search and game-playing (Chapters 3–5), Hidden Markov models (Chapter 15), and Markov decision processes (Chapter 17) all work with atomic representations—or, at least, they treat representations as if they were atomic. Now consider a higher-fidelity description for the same problem, where we need to be concerned with more than just atomic location in one city or another; we might need to pay attention to how much gas is in the tank, our current GPS coordinates, whether or not the oil warning light is working, how much spare change we have for toll crossings, what station is on the radio, and so on. A factored representation splits up each state into a fixed set of FACTORED REPRESENTATION variables or attributes, each of which can have a value. While two different atomic states VARIABLE ATTRIBUTE VALUE have nothing in common—they are just different black boxes—two different factored states can share some attributes (such as being at some particular GPS location) and not others (such as having lots of gas or having no gas); this makes it much easier to work out how to turn one state into another. With factored representations, we can also represent uncertainty—for example, ignorance about the amount of gas in the tank can be represented by leaving that attribute blank. Many important areas of AI are based on factored representations, including constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning (Chapters 10 and 11), Bayesian networks (Chapters 13–16), and the machine learning al- gorithms in Chapters 18, 20, and 21. For many purposes, we need to understand the world as having things in it that are related to each other, not just variables with values. For example, we might notice that a large truck ahead of us is reversing into the driveway of a dairy farm but a cow has got loose and is blocking the truck’s path. A factored representation is unlikely to be pre-equipped with the attribute TruckAheadBackingIntoDairyFarmDrivewayBlockedByLooseCow with value true or false. Instead, we would need a structured representation, in which ob- STRUCTURED REPRESENTATION jects such as cows and trucks and their various and varying relationships can be described explicitly. (See Figure 2.16(c).) Structured representations underlie relational databases and first-order logic (Chapters 8, 9, and 12), first-order probability models (Chapter 14), knowledge-based learning (Chapter 19) and much of natural language understanding (Chapters 22 and 23). In fact, almost everything that humans express in natural language concerns objects and their relationships. As we mentioned earlier, the axis along which atomic, factored, and structured repre- sentations lie is the axis of increasing expressiveness. Roughly speaking, a more expressive EXPRESSIVENESS representation can capture, at least as concisely, everything a less expressive one can capture, plus some more. Often, the more expressive language is much more concise; for example, the rules of chess can be written in a page or two of a structured-representation language such as first-order logic but require thousands of pages when written in a factored-representation language such as propositional logic. On the other hand, reasoning and learning become more complex as the expressive power of the representation increases. 
To gain the benefits of expressive representations while avoiding their drawbacks, intelligent systems for the real world may need to operate at all points along the axis simultaneously.
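To make the three styles concrete, here is one and the same driving state written in each of them. The sketch is ours: the attribute and relation names are invented for illustration, and the city name is borrowed from the route-finding example of Chapter 3.

# Atomic: the state is an opaque name; all we can do is compare states for equality.
atomic_state = "Arad"

# Factored: a fixed set of attributes, each with a value; unknowns can be left as None.
factored_state = {
    "city": "Arad",
    "fuel_litres": 23.0,
    "oil_warning_light_working": True,
    "toll_change_euros": 4.50,
    "radio_station": None,          # unobserved attribute
}

# Structured: objects and the relationships between them, here as (relation, x, y) triples.
structured_state = {
    ("is_a", "truck1", "Truck"),
    ("is_a", "cow1", "Cow"),
    ("ahead_of", "truck1", "agent_car"),
    ("backing_into", "truck1", "dairy_farm_driveway"),
    ("blocking", "cow1", "truck1"),
}

The factored form already supports partial knowledge and similarity between states; the structured form can express the loose-cow situation that would otherwise require one monolithic Boolean attribute.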
  • 78. Section 2.5. Summary 59 2.5 SUMMARY This chapter has been something of a whirlwind tour of AI, which we have conceived of as the science of agent design. The major points to recall are as follows: • An agent is something that perceives and acts in an environment. The agent function for an agent specifies the action taken by the agent in response to any percept sequence. • The performance measure evaluates the behavior of the agent in an environment. A rational agent acts so as to maximize the expected value of the performance measure, given the percept sequence it has seen so far. • A task environment specification includes the performance measure, the external en- vironment, the actuators, and the sensors. In designing an agent, the first step must always be to specify the task environment as fully as possible. • Task environments vary along several significant dimensions. They can be fully or partially observable, single-agent or multiagent, deterministic or stochastic, episodic or sequential, static or dynamic, discrete or continuous, and known or unknown. • The agent program implements the agent function. There exists a variety of basic agent-program designs reflecting the kind of information made explicit and used in the decision process. The designs vary in efficiency, compactness, and flexibility. The appropriate design of the agent program depends on the nature of the environment. • Simple reflex agents respond directly to percepts, whereas model-based reflex agents maintain internal state to track aspects of the world that are not evident in the current percept. Goal-based agents act to achieve their goals, and utility-based agents try to maximize their own expected “happiness.” • All agents can improve their performance through learning. BIBLIOGRAPHICAL AND HISTORICAL NOTES The central role of action in intelligence—the notion of practical reasoning—goes back at least as far as Aristotle’s Nicomachean Ethics. Practical reasoning was also the subject of McCarthy’s (1958) influential paper “Programs with Common Sense.” The fields of robotics and control theory are, by their very nature, concerned principally with physical agents. The concept of a controller in control theory is identical to that of an agent in AI. Perhaps sur- CONTROLLER prisingly, AI has concentrated for most of its history on isolated components of agents— question-answering systems, theorem-provers, vision systems, and so on—rather than on whole agents. The discussion of agents in the text by Genesereth and Nilsson (1987) was an influential exception. The whole-agent view is now widely accepted and is a central theme in recent texts (Poole et al., 1998; Nilsson, 1998; Padgham and Winikoff, 2004; Jones, 2007). Chapter 1 traced the roots of the concept of rationality in philosophy and economics. In AI, the concept was of peripheral interest until the mid-1980s, when it began to suffuse many
  • 79. 60 Chapter 2. Intelligent Agents discussions about the proper technical foundations of the field. A paper by Jon Doyle (1983) predicted that rational agent design would come to be seen as the core mission of AI, while other popular topics would spin off to form new disciplines. Careful attention to the properties of the environment and their consequences for ra- tional agent design is most apparent in the control theory tradition—for example, classical control systems (Dorf and Bishop, 2004; Kirk, 2004) handle fully observable, deterministic environments; stochastic optimal control (Kumar and Varaiya, 1986; Bertsekas and Shreve, 2007) handles partially observable, stochastic environments; and hybrid control (Henzinger and Sastry, 1998; Cassandras and Lygeros, 2006) deals with environments containing both discrete and continuous elements. The distinction between fully and partially observable en- vironments is also central in the dynamic programming literature developed in the field of operations research (Puterman, 1994), which we discuss in Chapter 17. Reflex agents were the primary model for psychological behaviorists such as Skinner (1953), who attempted to reduce the psychology of organisms strictly to input/output or stim- ulus/response mappings. The advance from behaviorism to functionalism in psychology, which was at least partly driven by the application of the computer metaphor to agents (Put- nam, 1960; Lewis, 1966), introduced the internal state of the agent into the picture. Most work in AI views the idea of pure reflex agents with state as too simple to provide much leverage, but work by Rosenschein (1985) and Brooks (1986) questioned this assumption (see Chapter 25). In recent years, a great deal of work has gone into finding efficient algo- rithms for keeping track of complex environments (Hamscher et al., 1992; Simon, 2006). The Remote Agent program (described on page 28) that controlled the Deep Space One spacecraft is a particularly impressive example (Muscettola et al., 1998; Jonsson et al., 2000). Goal-based agents are presupposed in everything from Aristotle’s view of practical rea- soning to McCarthy’s early papers on logical AI. Shakey the Robot (Fikes and Nilsson, 1971; Nilsson, 1984) was the first robotic embodiment of a logical, goal-based agent. A full logical analysis of goal-based agents appeared in Genesereth and Nilsson (1987), and a goal-based programming methodology called agent-oriented programming was developed by Shoham (1993). The agent-based approach is now extremely popular in software engineer- ing (Ciancarini and Wooldridge, 2001). It has also infiltrated the area of operating systems, where autonomic computing refers to computer systems and networks that monitor and con- AUTONOMIC COMPUTING trol themselves with a perceive–act loop and machine learning methods (Kephart and Chess, 2003). Noting that a collection of agent programs designed to work well together in a true multiagent environment necessarily exhibits modularity—the programs share no internal state and communicate with each other only through the environment—it is common within the field of multiagent systems to design the agent program of a single agent as a collection of MULTIAGENT SYSTEMS autonomous sub-agents. In some cases, one can even prove that the resulting system gives the same optimal solutions as a monolithic design. 
The goal-based view of agents also dominates the cognitive psychology tradition in the area of problem solving, beginning with the enormously influential Human Problem Solv- ing (Newell and Simon, 1972) and running through all of Newell’s later work (Newell, 1990). Goals, further analyzed as desires (general) and intentions (currently pursued), are central to the theory of agents developed by Bratman (1987). This theory has been influential both in
  • 80. Exercises 61 natural language understanding and multiagent systems. Horvitz et al. (1988) specifically suggest the use of rationality conceived as the maxi- mization of expected utility as a basis for AI. The text by Pearl (1988) was the first in AI to cover probability and utility theory in depth; its exposition of practical methods for reasoning and decision making under uncertainty was probably the single biggest factor in the rapid shift towards utility-based agents in the 1990s (see Part IV). The general design for learning agents portrayed in Figure 2.15 is classic in the machine learning literature (Buchanan et al., 1978; Mitchell, 1997). Examples of the design, as em- bodied in programs, go back at least as far as Arthur Samuel’s (1959, 1967) learning program for playing checkers. Learning agents are discussed in depth in Part V. Interest in agents and in agent design has risen rapidly in recent years, partly because of the growth of the Internet and the perceived need for automated and mobile softbot (Etzioni and Weld, 1994). Relevant papers are collected in Readings in Agents (Huhns and Singh, 1998) and Foundations of Rational Agency (Wooldridge and Rao, 1999). Texts on multiagent systems usually provide a good introduction to many aspects of agent design (Weiss, 2000a; Wooldridge, 2002). Several conference series devoted to agents began in the 1990s, including the International Workshop on Agent Theories, Architectures, and Languages (ATAL), the International Conference on Autonomous Agents (AGENTS), and the International Confer- ence on Multi-Agent Systems (ICMAS). In 2002, these three merged to form the International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS). The journal Autonomous Agents and Multi-Agent Systems was founded in 1998. Finally, Dung Beetle Ecology (Hanski and Cambefort, 1991) provides a wealth of interesting information on the behavior of dung beetles. YouTube features inspiring video recordings of their activities. EXERCISES 2.1 Let us examine the rationality of various vacuum-cleaner agent functions. a. Show that the simple vacuum-cleaner agent function described in Figure 2.3 is indeed rational under the assumptions listed on page 38. b. Describe a rational agent function for the case in which each movement costs one point. Does the corresponding agent program require internal state? c. Discuss possible agent designs for the cases in which clean squares can become dirty and the geography of the environment is unknown. Does it make sense for the agent to learn from its experience in these cases? If so, what should it learn? If not, why not? 2.2 Write an essay on the relationship between evolution and one or more of autonomy, intelligence, and learning. 2.3 For each of the following assertions, say whether it is true or false and support your answer with examples or counterexamples where appropriate. a. An agent that senses only partial information about the state cannot be perfectly rational.
  • 81. 62 Chapter 2. Intelligent Agents b. There exist task environments in which no pure reflex agent can behave rationally. c. There exists a task environment in which every agent is rational. d. The input to an agent program is the same as the input to the agent function. e. Every agent function is implementable by some program/machine combination. f. Suppose an agent selects its action uniformly at random from the set of possible actions. There exists a deterministic task environment in which this agent is rational. g. It is possible for a given agent to be perfectly rational in two distinct task environments. h. Every agent is rational in an unobservable environment. i. A perfectly rational poker-playing agent never loses. 2.4 For each of the following activities, give a PEAS description of the task environment and characterize it in terms of the properties listed in Section 2.3.2. • Performing a gymnastics floor routine. • Exploring the subsurface oceans of Titan. • Playing soccer. • Shopping for used AI books on the Internet. • Practicing tennis against a wall. • Performing a high jump. • Bidding on an item at an auction. 2.5 Define in your own words the following terms: agent, agent function, agent program, rationality, autonomy, reflex agent, model-based agent, goal-based agent, utility-based agent, learning agent. 2.6 This exercise explores the differences between agent functions and agent programs. a. Can there be more than one agent program that implements a given agent function? Give an example, or show why one is not possible. b. Are there agent functions that cannot be implemented by any agent program? c. Given a fixed machine architecture, does each agent program implement exactly one agent function? d. Given an architecture with n bits of storage, how many different possible agent pro- grams are there? e. Suppose we keep the agent program fixed but speed up the machine by a factor of two. Does that change the agent function? 2.7 Write pseudocode agent programs for the goal-based and utility-based agents. 2.8 Consider a simple thermostat that turns on a furnace when the temperature is at least 3 degrees below the setting, and turns off a furnace when the temperature is at least 3 degrees above the setting. Is a thermostat an instance of a simple reflex agent, a model-based reflex agent, or a goal-based agent?
  • 82. Exercises 63 The following exercises all concern the implementation of environments and agents for the vacuum-cleaner world. 2.9 Implement a performance-measuring environment simulator for the vacuum-cleaner world depicted in Figure 2.2 and specified on page 38. Your implementation should be modu- lar so that the sensors, actuators, and environment characteristics (size, shape, dirt placement, etc.) can be changed easily. (Note: for some choices of programming language and operating system there are already implementations in the online code repository.) 2.10 Consider a modified version of the vacuum environment in Exercise 2.9, in which the agent is penalized one point for each movement. a. Can a simple reflex agent be perfectly rational for this environment? Explain. b. What about a reflex agent with state? Design such an agent. c. How do your answers to a and b change if the agent’s percepts give it the clean/dirty status of every square in the environment? 2.11 Consider a modified version of the vacuum environment in Exercise 2.9, in which the geography of the environment—its extent, boundaries, and obstacles—is unknown, as is the initial dirt configuration. (The agent can go Up and Down as well as Left and Right.) a. Can a simple reflex agent be perfectly rational for this environment? Explain. b. Can a simple reflex agent with a randomized agent function outperform a simple reflex agent? Design such an agent and measure its performance on several environments. c. Can you design an environment in which your randomized agent will perform poorly? Show your results. d. Can a reflex agent with state outperform a simple reflex agent? Design such an agent and measure its performance on several environments. Can you design a rational agent of this type? 2.12 Repeat Exercise 2.11 for the case in which the location sensor is replaced with a “bump” sensor that detects the agent’s attempts to move into an obstacle or to cross the boundaries of the environment. Suppose the bump sensor stops working; how should the agent behave? 2.13 The vacuum environments in the preceding exercises have all been deterministic. Dis- cuss possible agent programs for each of the following stochastic versions: a. Murphy’s law: twenty-five percent of the time, the Suck action fails to clean the floor if it is dirty and deposits dirt onto the floor if the floor is clean. How is your agent program affected if the dirt sensor gives the wrong answer 10% of the time? b. Small children: At each time step, each clean square has a 10% chance of becoming dirty. Can you come up with a rational agent design for this case?
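For readers attempting Exercises 2.9–2.13, the following Python sketch shows one possible shape for such a simulator. It is not the implementation from the online code repository mentioned in Exercise 2.9, and the names (VacuumEnvironment, simple_reflex_agent, run) are our own. The performance measure awards one point for each clean square at each time step, as assumed on page 38; sensors, actuators, and geography can be varied by editing or subclassing the environment class.

import random

class VacuumEnvironment:
    """A two-square vacuum world (locations A and B)."""
    def __init__(self, status=None, agent_location='A'):
        self.status = status if status is not None else {
            'A': random.choice(['Clean', 'Dirty']),
            'B': random.choice(['Clean', 'Dirty'])}
        self.agent_location = agent_location
        self.score = 0

    def percept(self):
        """The agent senses its location and whether that square is dirty."""
        return (self.agent_location, self.status[self.agent_location])

    def execute(self, action):
        """Apply the agent's action, then award one point per clean square."""
        if action == 'Suck':
            self.status[self.agent_location] = 'Clean'
        elif action == 'Right':
            self.agent_location = 'B'
        elif action == 'Left':
            self.agent_location = 'A'
        self.score += sum(1 for s in self.status.values() if s == 'Clean')

def simple_reflex_agent(percept):
    """The vacuum agent function of Figure 2.3: suck if dirty, otherwise move."""
    location, status = percept
    if status == 'Dirty':
        return 'Suck'
    return 'Right' if location == 'A' else 'Left'

def run(environment, agent, steps=1000):
    for _ in range(steps):
        environment.execute(agent(environment.percept()))
    return environment.score

print(run(VacuumEnvironment(), simple_reflex_agent))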
  • 83. 3 SOLVING PROBLEMS BY SEARCHING In which we see how an agent can find a sequence of actions that achieves its goals when no single action will do. The simplest agents discussed in Chapter 2 were the reflex agents, which base their actions on a direct mapping from states to actions. Such agents cannot operate well in environments for which this mapping would be too large to store and would take too long to learn. Goal-based agents, on the other hand, consider future actions and the desirability of their outcomes. This chapter describes one kind of goal-based agent called a problem-solving agent. PROBLEM-SOLVING AGENT Problem-solving agents use atomic representations, as described in Section 2.4.7—that is, states of the world are considered as wholes, with no internal structure visible to the problem- solving algorithms. Goal-based agents that use more advanced factored or structured rep- resentations are usually called planning agents and are discussed in Chapters 7 and 10. Our discussion of problem solving begins with precise definitions of problems and their solutions and give several examples to illustrate these definitions. We then describe several general-purpose search algorithms that can be used to solve these problems. We will see several uninformed search algorithms—algorithms that are given no information about the problem other than its definition. Although some of these algorithms can solve any solvable problem, none of them can do so efficiently. Informed search algorithms, on the other hand, can do quite well given some guidance on where to look for solutions. In this chapter, we limit ourselves to the simplest kind of task environment, for which the solution to a problem is always a fixed sequence of actions. The more general case—where the agent’s future actions may vary depending on future percepts—is handled in Chapter 4. This chapter uses the concepts of asymptotic complexity (that is, O() notation) and NP-completeness. Readers unfamiliar with these concepts should consult Appendix A. 3.1 PROBLEM-SOLVING AGENTS Intelligent agents are supposed to maximize their performance measure. As we mentioned in Chapter 2, achieving this is sometimes simplified if the agent can adopt a goal and aim at satisfying it. Let us first look at why and how an agent might do this. 64
  • 84. Section 3.1. Problem-Solving Agents 65 Imagine an agent in the city of Arad, Romania, enjoying a touring holiday. The agent’s performance measure contains many factors: it wants to improve its suntan, improve its Ro- manian, take in the sights, enjoy the nightlife (such as it is), avoid hangovers, and so on. The decision problem is a complex one involving many tradeoffs and careful reading of guide- books. Now, suppose the agent has a nonrefundable ticket to fly out of Bucharest the follow- ing day. In that case, it makes sense for the agent to adopt the goal of getting to Bucharest. Courses of action that don’t reach Bucharest on time can be rejected without further consid- eration and the agent’s decision problem is greatly simplified. Goals help organize behavior by limiting the objectives that the agent is trying to achieve and hence the actions it needs to consider. Goal formulation, based on the current situation and the agent’s performance GOAL FORMULATION measure, is the first step in problem solving. We will consider a goal to be a set of world states—exactly those states in which the goal is satisfied. The agent’s task is to find out how to act, now and in the future, so that it reaches a goal state. Before it can do this, it needs to decide (or we need to decide on its behalf) what sorts of actions and states it should consider. If it were to consider actions at the level of “move the left foot forward an inch” or “turn the steering wheel one degree left,” the agent would probably never find its way out of the parking lot, let alone to Bucharest, because at that level of detail there is too much uncertainty in the world and there would be too many steps in a solution. Problem formulation is the process of deciding what actions PROBLEM FORMULATION and states to consider, given a goal. We discuss this process in more detail later. For now, let us assume that the agent will consider actions at the level of driving from one major town to another. Each state therefore corresponds to being in a particular town. Our agent has now adopted the goal of driving to Bucharest and is considering where to go from Arad. Three roads lead out of Arad, one toward Sibiu, one to Timisoara, and one to Zerind. None of these achieves the goal, so unless the agent is familiar with the geography of Romania, it will not know which road to follow.1 In other words, the agent will not know which of its possible actions is best, because it does not yet know enough about the state that results from taking each action. If the agent has no additional information—i.e., if the environment is unknown in the sense defined in Section 2.3—then it is has no choice but to try one of the actions at random. This sad situation is discussed in Chapter 4. But suppose the agent has a map of Romania. The point of a map is to provide the agent with information about the states it might get itself into and the actions it can take. The agent can use this information to consider subsequent stages of a hypothetical journey via each of the three towns, trying to find a journey that eventually gets to Bucharest. Once it has found a path on the map from Arad to Bucharest, it can achieve its goal by carrying out the driving actions that correspond to the legs of the journey. In general, an agent with several immediate options of unknown value can decide what to do by first examining future actions that eventually lead to states of known value. 
To be more specific about what we mean by “examining future actions,” we have to be more specific about properties of the environment, as defined in Section 2.3. For now, 1 We are assuming that most readers are in the same position and can easily imagine themselves to be as clueless as our agent. We apologize to Romanian readers who are unable to take advantage of this pedagogical device.
  • 85. 66 Chapter 3. Solving Problems by Searching we assume that the environment is observable, so the agent always knows the current state. For the agent driving in Romania, it’s reasonable to suppose that each city on the map has a sign indicating its presence to arriving drivers. We also assume the environment is discrete, so at any given state there are only finitely many actions to choose from. This is true for navigating in Romania because each city is connected to a small number of other cities. We will assume the environment is known, so the agent knows which states are reached by each action. (Having an accurate map suffices to meet this condition for navigation problems.) Finally, we assume that the environment is deterministic, so each action has exactly one outcome. Under ideal conditions, this is true for the agent in Romania—it means that if it chooses to drive from Arad to Sibiu, it does end up in Sibiu. Of course, conditions are not always ideal, as we show in Chapter 4. Under these assumptions, the solution to any problem is a fixed sequence of actions. “Of course!” one might say, “What else could it be?” Well, in general it could be a branching strategy that recommends different actions in the future depending on what percepts arrive. For example, under less than ideal conditions, the agent might plan to drive from Arad to Sibiu and then to Rimnicu Vilcea but may also need to have a contingency plan in case it arrives by accident in Zerind instead of Sibiu. Fortunately, if the agent knows the initial state and the environment is known and deterministic, it knows exactly where it will be after the first action and what it will perceive. Since only one percept is possible after the first action, the solution can specify only one possible second action, and so on. The process of looking for a sequence of actions that reaches the goal is called search. SEARCH A search algorithm takes a problem as input and returns a solution in the form of an action SOLUTION sequence. Once a solution is found, the actions it recommends can be carried out. This is called the execution phase. Thus, we have a simple “formulate, search, execute” design EXECUTION for the agent, as shown in Figure 3.1. After formulating a goal and a problem to solve, the agent calls a search procedure to solve it. It then uses the solution to guide its actions, doing whatever the solution recommends as the next thing to do—typically, the first action of the sequence—and then removing that step from the sequence. Once the solution has been executed, the agent will formulate a new goal. Notice that while the agent is executing the solution sequence it ignores its percepts when choosing an action because it knows in advance what they will be. An agent that carries out its plans with its eyes closed, so to speak, must be quite certain of what is going on. Control theorists call this an open-loop system, because ignoring the percepts breaks the OPEN-LOOP loop between agent and environment. We first describe the process of problem formulation, and then devote the bulk of the chapter to various algorithms for the SEARCH function. We do not discuss the workings of the UPDATE-STATE and FORMULATE-GOAL functions further in this chapter. 3.1.1 Well-defined problems and solutions A problem can be defined formally by five components: PROBLEM • The initial state that the agent starts in. For example, the initial state for our agent in INITIAL STATE Romania might be described as In(Arad).
  • 86. Section 3.1. Problem-Solving Agents 67 function SIMPLE-PROBLEM-SOLVING-AGENT(percept) returns an action persistent: seq, an action sequence, initially empty state, some description of the current world state goal, a goal, initially null problem, a problem formulation state ← UPDATE-STATE(state,percept) if seq is empty then goal ← FORMULATE-GOAL(state) problem ← FORMULATE-PROBLEM(state,goal) seq ← SEARCH(problem) if seq = failure then return a null action action ← FIRST(seq) seq ← REST(seq) return action Figure 3.1 A simple problem-solving agent. It first formulates a goal and a problem, searches for a sequence of actions that would solve the problem, and then executes the actions one at a time. When this is complete, it formulates another goal and starts over. • A description of the possible actions available to the agent. Given a particular state s, ACTIONS ACTIONS(s) returns the set of actions that can be executed in s. We say that each of these actions is applicable in s. For example, from the state In(Arad), the applicable APPLICABLE actions are {Go(Sibiu), Go(Timisoara), Go(Zerind)}. • A description of what each action does; the formal name for this is the transition model, specified by a function RESULT(s, a) that returns the state that results from TRANSITION MODEL doing action a in state s. We also use the term successor to refer to any state reachable SUCCESSOR from a given state by a single action.2 For example, we have RESULT(In(Arad), Go(Zerind)) = In(Zerind) . Together, the initial state, actions, and transition model implicitly define the state space STATE SPACE of the problem—the set of all states reachable from the initial state by any sequence of actions. The state space forms a directed network or graph in which the nodes GRAPH are states and the links between nodes are actions. (The map of Romania shown in Figure 3.2 can be interpreted as a state-space graph if we view each road as standing for two driving actions, one in each direction.) A path in the state space is a sequence PATH of states connected by a sequence of actions. • The goal test, which determines whether a given state is a goal state. Sometimes there GOAL TEST is an explicit set of possible goal states, and the test simply checks whether the given state is one of them. The agent’s goal in Romania is the singleton set {In(Bucharest)}. 2 Many treatments of problem solving, including previous editions of this book, use a successor function, which returns the set of all successors, instead of separate ACTIONS and RESULT functions. The successor function makes it difficult to describe an agent that knows what actions it can try but not what they achieve. Also, note some author use RESULT(a, s) instead of RESULT(s, a), and some use DO instead of RESULT.
  • 87. 68 Chapter 3. Solving Problems by Searching Giurgiu Urziceni Hirsova Eforie Neamt Oradea Zerind Arad Timisoara Lugoj Mehadia Drobeta Craiova Sibiu Fagaras Pitesti Vaslui Iasi Rimnicu Vilcea Bucharest 71 75 118 111 70 75 120 151 140 99 80 97 101 211 138 146 85 90 98 142 92 87 86 Figure 3.2 A simplified road map of part of Romania. Sometimes the goal is specified by an abstract property rather than an explicitly enumer- ated set of states. For example, in chess, the goal is to reach a state called “checkmate,” where the opponent’s king is under attack and can’t escape. • A path cost function that assigns a numeric cost to each path. The problem-solving PATH COST agent chooses a cost function that reflects its own performance measure. For the agent trying to get to Bucharest, time is of the essence, so the cost of a path might be its length in kilometers. In this chapter, we assume that the cost of a path can be described as the sum of the costs of the individual actions along the path.3 The step cost of taking action STEP COST a in state s to reach state s′ is denoted by c(s, a, s′). The step costs for Romania are shown in Figure 3.2 as route distances. We assume that step costs are nonnegative.4 The preceding elements define a problem and can be gathered into a single data structure that is given as input to a problem-solving algorithm. A solution to a problem is an action sequence that leads from the initial state to a goal state. Solution quality is measured by the path cost function, and an optimal solution has the lowest path cost among all solutions. OPTIMAL SOLUTION 3.1.2 Formulating problems In the preceding section we proposed a formulation of the problem of getting to Bucharest in terms of the initial state, actions, transition model, goal test, and path cost. This formulation seems reasonable, but it is still a model—an abstract mathematical description—and not the 3 This assumption is algorithmically convenient but also theoretically justifiable—see page 652 in Chapter 17. 4 The implications of negative costs are explored in Exercise 3.8.
  • 88. Section 3.2. Example Problems 69 real thing. Compare the simple state description we have chosen, In(Arad), to an actual cross- country trip, where the state of the world includes so many things: the traveling companions, the current radio program, the scenery out of the window, the proximity of law enforcement officers, the distance to the next rest stop, the condition of the road, the weather, and so on. All these considerations are left out of our state descriptions because they are irrelevant to the problem of finding a route to Bucharest. The process of removing detail from a representation is called abstraction. ABSTRACTION In addition to abstracting the state description, we must abstract the actions themselves. A driving action has many effects. Besides changing the location of the vehicle and its oc- cupants, it takes up time, consumes fuel, generates pollution, and changes the agent (as they say, travel is broadening). Our formulation takes into account only the change in location. Also, there are many actions that we omit altogether: turning on the radio, looking out of the window, slowing down for law enforcement officers, and so on. And of course, we don’t specify actions at the level of “turn steering wheel to the left by one degree.” Can we be more precise about defining the appropriate level of abstraction? Think of the abstract states and actions we have chosen as corresponding to large sets of detailed world states and detailed action sequences. Now consider a solution to the abstract problem: for example, the path from Arad to Sibiu to Rimnicu Vilcea to Pitesti to Bucharest. This abstract solution corresponds to a large number of more detailed paths. For example, we could drive with the radio on between Sibiu and Rimnicu Vilcea, and then switch it off for the rest of the trip. The abstraction is valid if we can expand any abstract solution into a solution in the more detailed world; a sufficient condition is that for every detailed state that is “in Arad,” there is a detailed path to some state that is “in Sibiu,” and so on.5 The abstraction is useful if carrying out each of the actions in the solution is easier than the original problem; in this case they are easy enough that they can be carried out without further search or planning by an average driving agent. The choice of a good abstraction thus involves removing as much detail as possible while retaining validity and ensuring that the abstract actions are easy to carry out. Were it not for the ability to construct useful abstractions, intelligent agents would be completely swamped by the real world. 3.2 EXAMPLE PROBLEMS The problem-solving approach has been applied to a vast array of task environments. We list some of the best known here, distinguishing between toy and real-world problems. A toy problem is intended to illustrate or exercise various problem-solving methods. It can be TOY PROBLEM given a concise, exact description and hence is usable by different researchers to compare the performance of algorithms. A real-world problem is one whose solutions people actually REAL-WORLD PROBLEM care about. Such problems tend not to have a single agreed-upon description, but we can give the general flavor of their formulations. 5 See Section 11.2 for a more complete set of definitions and algorithms.
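Before turning to the examples, it may help to see the five components written down in code. The sketch below is ours, not the book's online code repository, which has its own problem classes; the class and attribute names are assumptions, and only a handful of the road distances from Figure 3.2 are included.

class Problem:
    """A search problem, defined by the five components of Section 3.1.1."""
    def __init__(self, initial, goal):
        self.initial = initial            # the initial state
        self.goal = goal
    def actions(self, state):
        raise NotImplementedError
    def result(self, state, action):      # the transition model
        raise NotImplementedError
    def goal_test(self, state):
        return state == self.goal
    def step_cost(self, state, action, result):
        return 1                          # default: every action costs 1

# A fragment of the road map of Figure 3.2 (distances in kilometers).
ROADS = {('Arad', 'Sibiu'): 140, ('Arad', 'Timisoara'): 118, ('Arad', 'Zerind'): 75,
         ('Sibiu', 'Fagaras'): 99, ('Sibiu', 'Rimnicu Vilcea'): 80,
         ('Rimnicu Vilcea', 'Pitesti'): 97, ('Fagaras', 'Bucharest'): 211,
         ('Pitesti', 'Bucharest'): 101}
ROADS.update({(b, a): d for (a, b), d in list(ROADS.items())})   # roads run both ways

class RouteProblem(Problem):
    """Getting from one Romanian city to another; an action names the destination."""
    def actions(self, state):
        return [b for (a, b) in ROADS if a == state]
    def result(self, state, action):
        return action
    def step_cost(self, state, action, result):
        return ROADS[(state, result)]

romania = RouteProblem('Arad', 'Bucharest')
print(romania.actions('Arad'))   # ['Sibiu', 'Timisoara', 'Zerind']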
  • 89. 70 Chapter 3. Solving Problems by Searching R L S S S S R L R L R L S S S S L L L L R R R R Figure 3.3 The state space for the vacuum world. Links denote actions: L = Left, R = Right, S = Suck. 3.2.1 Toy problems The first example we examine is the vacuum world first introduced in Chapter 2. (See Figure 2.2.) This can be formulated as a problem as follows: • States: The state is determined by both the agent location and the dirt locations. The agent is in one of two locations, each of which might or might not contain dirt. Thus, there are 2 × 22 = 8 possible world states. A larger environment with n locations has n · 2n states. • Initial state: Any state can be designated as the initial state. • Actions: In this simple environment, each state has just three actions: Left, Right, and Suck. Larger environments might also include Up and Down. • Transition model: The actions have their expected effects, except that moving Left in the leftmost square, moving Right in the rightmost square, and Sucking in a clean square have no effect. The complete state space is shown in Figure 3.3. • Goal test: This checks whether all the squares are clean. • Path cost: Each step costs 1, so the path cost is the number of steps in the path. Compared with the real world, this toy problem has discrete locations, discrete dirt, reliable cleaning, and it never gets any dirtier. Chapter 4 relaxes some of these assumptions. The 8-puzzle, an instance of which is shown in Figure 3.4, consists of a 3×3 board with 8-PUZZLE eight numbered tiles and a blank space. A tile adjacent to the blank space can slide into the space. The object is to reach a specified goal state, such as the one shown on the right of the figure. The standard formulation is as follows:
  • 90. Section 3.2. Example Problems 71 2 Start State Goal State 1 3 4 6 7 5 1 2 3 4 6 7 8 5 8 Figure 3.4 A typical instance of the 8-puzzle. • States: A state description specifies the location of each of the eight tiles and the blank in one of the nine squares. • Initial state: Any state can be designated as the initial state. Note that any given goal can be reached from exactly half of the possible initial states (Exercise 3.5). • Actions: The simplest formulation defines the actions as movements of the blank space Left, Right, Up, or Down. Different subsets of these are possible depending on where the blank is. • Transition model: Given a state and action, this returns the resulting state; for example, if we apply Left to the start state in Figure 3.4, the resulting state has the 5 and the blank switched. • Goal test: This checks whether the state matches the goal configuration shown in Fig- ure 3.4. (Other goal configurations are possible.) • Path cost: Each step costs 1, so the path cost is the number of steps in the path. What abstractions have we included here? The actions are abstracted to their beginning and final states, ignoring the intermediate locations where the block is sliding. We have abstracted away actions such as shaking the board when pieces get stuck and ruled out extracting the pieces with a knife and putting them back again. We are left with a description of the rules of the puzzle, avoiding all the details of physical manipulations. The 8-puzzle belongs to the family of sliding-block puzzles, which are often used as SLIDING-BLOCK PUZZLES test problems for new search algorithms in AI. This family is known to be NP-complete, so one does not expect to find methods significantly better in the worst case than the search algorithms described in this chapter and the next. The 8-puzzle has 9!/2 = 181, 440 reachable states and is easily solved. The 15-puzzle (on a 4×4 board) has around 1.3 trillion states, and random instances can be solved optimally in a few milliseconds by the best search algorithms. The 24-puzzle (on a 5 × 5 board) has around 1025 states, and random instances take several hours to solve optimally. The goal of the 8-queens problem is to place eight queens on a chessboard such that 8-QUEENS PROBLEM no queen attacks any other. (A queen attacks any piece in the same row, column or diago- nal.) Figure 3.5 shows an attempted solution that fails: the queen in the rightmost column is attacked by the queen at the top left.
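Returning to the 8-puzzle formulation above: it translates almost directly into code. In this sketch (our own representation, not the book's), a state is a tuple of nine numbers read row by row, with 0 standing for the blank, and the actions are movements of the blank as in the text.

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)     # one possible goal configuration

def actions(state):
    """The blank may move Left, Right, Up, or Down, depending on where it is."""
    i = state.index(0)                  # position of the blank, 0..8
    moves = []
    if i % 3 > 0:  moves.append('Left')
    if i % 3 < 2:  moves.append('Right')
    if i // 3 > 0: moves.append('Up')
    if i // 3 < 2: moves.append('Down')
    return moves

def result(state, action):
    """Swap the blank with the adjacent tile in the given direction."""
    i = state.index(0)
    j = i + {'Left': -1, 'Right': 1, 'Up': -3, 'Down': 3}[action]
    board = list(state)
    board[i], board[j] = board[j], board[i]
    return tuple(board)

def goal_test(state):
    return state == GOAL

# Applying Left when the blank is in the middle swaps it with the tile on its left.
print(result((1, 2, 3, 4, 0, 5, 6, 7, 8), 'Left'))   # (1, 2, 3, 0, 4, 5, 6, 7, 8)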
  • 91. 72 Chapter 3. Solving Problems by Searching Figure 3.5 Almost a solution to the 8-queens problem. (Solution is left as an exercise.) Although efficient special-purpose algorithms exist for this problem and for the whole n-queens family, it remains a useful test problem for search algorithms. There are two main kinds of formulation. An incremental formulation involves operators that augment the state INCREMENTAL FORMULATION description, starting with an empty state; for the 8-queens problem, this means that each action adds a queen to the state. A complete-state formulation starts with all 8 queens on COMPLETE-STATE FORMULATION the board and moves them around. In either case, the path cost is of no interest because only the final state counts. The first incremental formulation one might try is the following: • States: Any arrangement of 0 to 8 queens on the board is a state. • Initial state: No queens on the board. • Actions: Add a queen to any empty square. • Transition model: Returns the board with a queen added to the specified square. • Goal test: 8 queens are on the board, none attacked. In this formulation, we have 64 · 63 · · · 57 ≈ 1.8 × 1014 possible sequences to investigate. A better formulation would prohibit placing a queen in any square that is already attacked: • States: All possible arrangements of n queens (0 ≤ n ≤ 8), one per column in the leftmost n columns, with no queen attacking another. • Actions: Add a queen to any square in the leftmost empty column such that it is not attacked by any other queen. This formulation reduces the 8-queens state space from 1.8 × 1014 to just 2,057, and solutions are easy to find. On the other hand, for 100 queens the reduction is from roughly 10400 states to about 1052 states (Exercise 3.6)—a big improvement, but not enough to make the problem tractable. Section 4.1 describes the complete-state formulation, and Chapter 6 gives a simple algorithm that solves even the million-queens problem with ease.
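The second, pruned formulation of the 8-queens problem is also easy to write down. In this sketch (ours, with assumed names), a state is a tuple giving the row of the queen in each of the leftmost occupied columns; enumerating the reachable states reproduces the figure of 2,057 quoted above.

N = 8

def attacks(col1, row1, col2, row2):
    """Queens attack along rows and diagonals (columns cannot clash here)."""
    return row1 == row2 or abs(row1 - row2) == abs(col1 - col2)

def actions(state):
    """Safe rows for a new queen in the leftmost empty column."""
    col = len(state)
    if col == N:
        return []
    return [row for row in range(N)
            if not any(attacks(c, r, col, row) for c, r in enumerate(state))]

def result(state, row):
    return state + (row,)

def goal_test(state):
    return len(state) == N        # no attacks are possible by construction

def count_states(state=()):
    """Count the empty board plus every reachable partial arrangement."""
    return 1 + sum(count_states(result(state, row)) for row in actions(state))

print(count_states())             # 2057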
Section 3.2. Example Problems 73

Our final toy problem was devised by Donald Knuth (1964) and illustrates how infinite state spaces can arise. Knuth conjectured that, starting with the number 4, a sequence of factorial, square root, and floor operations will reach any desired positive integer. For example, we can reach 5 from 4 as follows:

⌊√√√√√(4!)!⌋ = 5

(five nested square roots of (4!)!, then the floor). The problem definition is very simple:
• States: Positive numbers.
• Initial state: 4.
• Actions: Apply factorial, square root, or floor operation (factorial for integers only).
• Transition model: As given by the mathematical definitions of the operations.
• Goal test: State is the desired positive integer.
To our knowledge there is no bound on how large a number might be constructed in the process of reaching a given target—for example, the number 620,448,401,733,239,439,360,000 is generated in the expression for 5—so the state space for this problem is infinite. Such state spaces arise frequently in tasks involving the generation of mathematical expressions, circuits, proofs, programs, and other recursively defined objects.

3.2.2 Real-world problems

We have already seen how the route-finding problem is defined in terms of specified locations and transitions along links between them. Route-finding algorithms are used in a variety of applications. Some, such as Web sites and in-car systems that provide driving directions, are relatively straightforward extensions of the Romania example. Others, such as routing video streams in computer networks, military operations planning, and airline travel-planning systems, involve much more complex specifications. Consider the airline travel problems that must be solved by a travel-planning Web site:
• States: Each state obviously includes a location (e.g., an airport) and the current time. Furthermore, because the cost of an action (a flight segment) may depend on previous segments, their fare bases, and their status as domestic or international, the state must record extra information about these "historical" aspects.
• Initial state: This is specified by the user's query.
• Actions: Take any flight from the current location, in any seat class, leaving after the current time, leaving enough time for within-airport transfer if needed.
• Transition model: The state resulting from taking a flight will have the flight's destination as the current location and the flight's arrival time as the current time.
• Goal test: Are we at the final destination specified by the user?
• Path cost: This depends on monetary cost, waiting time, flight time, customs and immigration procedures, seat quality, time of day, type of airplane, frequent-flyer mileage awards, and so on.
  • 93. 74 Chapter 3. Solving Problems by Searching Commercial travel advice systems use a problem formulation of this kind, with many addi- tional complications to handle the byzantine fare structures that airlines impose. Any sea- soned traveler knows, however, that not all air travel goes according to plan. A really good system should include contingency plans—such as backup reservations on alternate flights— to the extent that these are justified by the cost and likelihood of failure of the original plan. Touring problems are closely related to route-finding problems, but with an impor- TOURING PROBLEM tant difference. Consider, for example, the problem “Visit every city in Figure 3.2 at least once, starting and ending in Bucharest.” As with route finding, the actions correspond to trips between adjacent cities. The state space, however, is quite different. Each state must include not just the current location but also the set of cities the agent has visited. So the initial state would be In(Bucharest), Visited({Bucharest}), a typical intermedi- ate state would be In(Vaslui), Visited({Bucharest, Urziceni, Vaslui}), and the goal test would check whether the agent is in Bucharest and all 20 cities have been visited. The traveling salesperson problem (TSP) is a touring problem in which each city TRAVELING SALESPERSON PROBLEM must be visited exactly once. The aim is to find the shortest tour. The problem is known to be NP-hard, but an enormous amount of effort has been expended to improve the capabilities of TSP algorithms. In addition to planning trips for traveling salespersons, these algorithms have been used for tasks such as planning movements of automatic circuit-board drills and of stocking machines on shop floors. A VLSI layout problem requires positioning millions of components and connections VLSI LAYOUT on a chip to minimize area, minimize circuit delays, minimize stray capacitances, and max- imize manufacturing yield. The layout problem comes after the logical design phase and is usually split into two parts: cell layout and channel routing. In cell layout, the primitive components of the circuit are grouped into cells, each of which performs some recognized function. Each cell has a fixed footprint (size and shape) and requires a certain number of connections to each of the other cells. The aim is to place the cells on the chip so that they do not overlap and so that there is room for the connecting wires to be placed between the cells. Channel routing finds a specific route for each wire through the gaps between the cells. These search problems are extremely complex, but definitely worth solving. Later in this chapter, we present some algorithms capable of solving them. Robot navigation is a generalization of the route-finding problem described earlier. ROBOT NAVIGATION Rather than following a discrete set of routes, a robot can move in a continuous space with (in principle) an infinite set of possible actions and states. For a circular robot moving on a flat surface, the space is essentially two-dimensional. When the robot has arms and legs or wheels that must also be controlled, the search space becomes many-dimensional. Advanced techniques are required just to make the search space finite. We examine some of these methods in Chapter 25. In addition to the complexity of the problem, real robots must also deal with errors in their sensor readings and motor controls. 
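For the touring problem described above, the essential point is that a state couples the current location with the set of cities visited so far. One way to encode that (our own choice of representation, with assumed names) is shown below; using a frozenset keeps states hashable, which matters once repeated states are detected in Section 3.3.

# A touring-problem state: (current city, frozenset of cities visited so far).
initial = ('Bucharest', frozenset({'Bucharest'}))

def result(state, destination):
    """Drive to an adjacent city and record the visit."""
    city, visited = state
    return (destination, visited | {destination})

def goal_test(state, all_cities):
    city, visited = state
    return city == 'Bucharest' and visited == all_cities

print(result(initial, 'Urziceni'))
# ('Urziceni', frozenset({'Bucharest', 'Urziceni'}))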
Automatic assembly sequencing of complex objects by a robot was first demonstrated AUTOMATIC ASSEMBLY SEQUENCING by FREDDY (Michie, 1972). Progress since then has been slow but sure, to the point where the assembly of intricate objects such as electric motors is economically feasible. In assembly problems, the aim is to find an order in which to assemble the parts of some object. If the wrong order is chosen, there will be no way to add some part later in the sequence without
  • 94. Section 3.3. Searching for Solutions 75 undoing some of the work already done. Checking a step in the sequence for feasibility is a difficult geometrical search problem closely related to robot navigation. Thus, the generation of legal actions is the expensive part of assembly sequencing. Any practical algorithm must avoid exploring all but a tiny fraction of the state space. Another important assembly problem is protein design, in which the goal is to find a sequence of amino acids that will fold into a PROTEIN DESIGN three-dimensional protein with the right properties to cure some disease. 3.3 SEARCHING FOR SOLUTIONS Having formulated some problems, we now need to solve them. A solution is an action sequence, so search algorithms work by considering various possible action sequences. The possible action sequences starting at the initial state form a search tree with the initial state SEARCH TREE at the root; the branches are actions and the nodes correspond to states in the state space of NODE the problem. Figure 3.6 shows the first few steps in growing the search tree for finding a route from Arad to Bucharest. The root node of the tree corresponds to the initial state, In(Arad). The first step is to test whether this is a goal state. (Clearly it is not, but it is important to check so that we can solve trick problems like “starting in Arad, get to Arad.”) Then we need to consider taking various actions. We do this by expanding the current state; that is, EXPANDING applying each legal action to the current state, thereby generating a new set of states. In GENERATING this case, we add three branches from the parent node In(Arad) leading to three new child PARENT NODE nodes: In(Sibiu), In(Timisoara), and In(Zerind). Now we must choose which of these three CHILD NODE possibilities to consider further. This is the essence of search—following up one option now and putting the others aside for later, in case the first choice does not lead to a solution. Suppose we choose Sibiu first. We check to see whether it is a goal state (it is not) and then expand it to get In(Arad), In(Fagaras), In(Oradea), and In(RimnicuVilcea). We can then choose any of these four or go back and choose Timisoara or Zerind. Each of these six nodes is a leaf node, that is, a node LEAF NODE with no children in the tree. The set of all leaf nodes available for expansion at any given point is called the frontier. (Many authors call it the open list, which is both geographically FRONTIER OPEN LIST less evocative and less accurate, because other data structures are better suited than a list.) In Figure 3.6, the frontier of each tree consists of those nodes with bold outlines. The process of expanding nodes on the frontier continues until either a solution is found or there are no more states to expand. The general TREE-SEARCH algorithm is shown infor- mally in Figure 3.7. Search algorithms all share this basic structure; they vary primarily according to how they choose which state to expand next—the so-called search strategy. SEARCH STRATEGY The eagle-eyed reader will notice one peculiar thing about the search tree shown in Fig- ure 3.6: it includes the path from Arad to Sibiu and back to Arad again! We say that In(Arad) is a repeated state in the search tree, generated in this case by a loopy path. Considering REPEATED STATE LOOPY PATH such loopy paths means that the complete search tree for Romania is infinite because there is no limit to how often one can traverse a loop. 
On the other hand, the state space—the map shown in Figure 3.2—has only 20 states. As we discuss in Section 3.4, loops can cause
  • 95. 76 Chapter 3. Solving Problems by Searching certain algorithms to fail, making otherwise solvable problems unsolvable. Fortunately, there is no need to consider loopy paths. We can rely on more than intuition for this: because path costs are additive and step costs are nonnegative, a loopy path to any given state is never better than the same path with the loop removed. Loopy paths are a special case of the more general concept of redundant paths, which REDUNDANT PATH exist whenever there is more than one way to get from one state to another. Consider the paths Arad–Sibiu (140 km long) and Arad–Zerind–Oradea–Sibiu (297 km long). Obviously, the second path is redundant—it’s just a worse way to get to the same state. If you are concerned about reaching the goal, there’s never any reason to keep more than one path to any given state, because any goal state that is reachable by extending one path is also reachable by extending the other. In some cases, it is possible to define the problem itself so as to eliminate redundant paths. For example, if we formulate the 8-queens problem (page 71) so that a queen can be placed in any column, then each state with n queens can be reached by n! different paths; but if we reformulate the problem so that each new queen is placed in the leftmost empty column, then each state can be reached only through one path. (a) The initial state (b) After expanding Arad (c) After expanding Sibiu Rimnicu Vilcea Lugoj Arad Fagaras Oradea Arad Arad Oradea Rimnicu Vilcea Lugoj Zerind Sibiu Arad Fagaras Oradea Timisoara Arad Arad Oradea Lugoj Arad Arad Oradea Zerind Arad Sibiu Timisoara Arad Rimnicu Vilcea Zerind Arad Sibiu Arad Fagaras Oradea Timisoara Figure 3.6 Partial search trees for finding a route from Arad to Bucharest. Nodes that have been expanded are shaded; nodes that have been generated but not yet expanded are outlined in bold; nodes that have not yet been generated are shown in faint dashed lines.
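The redundant-path comparison above follows directly from the additivity of path cost. A small sketch of ours (the road fragment below lists only the distances needed for this example):

ROADS_FRAGMENT = {('Arad', 'Sibiu'): 140, ('Arad', 'Zerind'): 75,
                  ('Zerind', 'Oradea'): 71, ('Oradea', 'Sibiu'): 151}

def path_cost(path, roads=ROADS_FRAGMENT):
    """Additive path cost: the sum of the step costs along the path."""
    return sum(roads[(a, b)] for a, b in zip(path, path[1:]))

print(path_cost(['Arad', 'Sibiu']))                      # 140
print(path_cost(['Arad', 'Zerind', 'Oradea', 'Sibiu']))  # 297, the redundant path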
  • 96. Section 3.3. Searching for Solutions 77 function TREE-SEARCH(problem) returns a solution, or failure initialize the frontier using the initial state of problem loop do if the frontier is empty then return failure choose a leaf node and remove it from the frontier if the node contains a goal state then return the corresponding solution expand the chosen node, adding the resulting nodes to the frontier function GRAPH-SEARCH(problem) returns a solution, or failure initialize the frontier using the initial state of problem initialize the explored set to be empty loop do if the frontier is empty then return failure choose a leaf node and remove it from the frontier if the node contains a goal state then return the corresponding solution add the node to the explored set expand the chosen node, adding the resulting nodes to the frontier only if not in the frontier or explored set Figure 3.7 An informal description of the general tree-search and graph-search algo- rithms. The parts of GRAPH-SEARCH marked in bold italic are the additions needed to handle repeated states. In other cases, redundant paths are unavoidable. This includes all problems where the actions are reversible, such as route-finding problems and sliding-block puzzles. Route- finding on a rectangular grid (like the one used later for Figure 3.9) is a particularly impor- RECTANGULAR GRID tant example in computer games. In such a grid, each state has four successors, so a search tree of depth d that includes repeated states has 4d leaves; but there are only about 2d2 distinct states within d steps of any given state. For d = 20, this means about a trillion nodes but only about 800 distinct states. Thus, following redundant paths can cause a tractable problem to become intractable. This is true even for algorithms that know how to avoid infinite loops. As the saying goes, algorithms that forget their history are doomed to repeat it. The way to avoid exploring redundant paths is to remember where one has been. To do this, we augment the TREE-SEARCH algorithm with a data structure called the explored set (also EXPLORED SET known as the closed list), which remembers every expanded node. Newly generated nodes CLOSED LIST that match previously generated nodes—ones in the explored set or the frontier—can be dis- carded instead of being added to the frontier. The new algorithm, called GRAPH-SEARCH, is shown informally in Figure 3.7. The specific algorithms in this chapter draw on this general design. Clearly, the search tree constructed by the GRAPH-SEARCH algorithm contains at most one copy of each state, so we can think of it as growing a tree directly on the state-space graph, as shown in Figure 3.8. The algorithm has another nice property: the frontier separates the SEPARATOR state-space graph into the explored region and the unexplored region, so that every path from
  • 97. 78 Chapter 3. Solving Problems by Searching Figure 3.8 A sequence of search trees generated by a graph search on the Romania prob- lem of Figure 3.2. At each stage, we have extended each path by one step. Notice that at the third stage, the northernmost city (Oradea) has become a dead end: both of its successors are already explored via other paths. (c) (b) (a) Figure 3.9 The separation property of GRAPH-SEARCH, illustrated on a rectangular-grid problem. The frontier (white nodes) always separates the explored region of the state space (black nodes) from the unexplored region (gray nodes). In (a), just the root has been ex- panded. In (b), one leaf node has been expanded. In (c), the remaining successors of the root have been expanded in clockwise order. the initial state to an unexplored state has to pass through a state in the frontier. (If this seems completely obvious, try Exercise 3.14 now.) This property is illustrated in Figure 3.9. As every step moves a state from the frontier into the explored region while moving some states from the unexplored region into the frontier, we see that the algorithm is systematically examining the states in the state space, one by one, until it finds a solution. 3.3.1 Infrastructure for search algorithms Search algorithms require a data structure to keep track of the search tree that is being con- structed. For each node n of the tree, we have a structure that contains four components: • n.STATE: the state in the state space to which the node corresponds; • n.PARENT: the node in the search tree that generated this node; • n.ACTION: the action that was applied to the parent to generate the node; • n.PATH-COST: the cost, traditionally denoted by g(n), of the path from the initial state to the node, as indicated by the parent pointers.
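In Python, a node with these four components, together with a helper that follows parent pointers to recover the action sequence (the SOLUTION function of the text), might look like the following sketch; the field names are our own, chosen to mirror the text.

class Node:
    """A search-tree node: state, parent, action, and path cost g(n)."""
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state = state
        self.parent = parent
        self.action = action
        self.path_cost = path_cost

def solution(node):
    """Follow PARENT pointers back to the root and return the action sequence."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))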
  • 98. Section 3.3. Searching for Solutions 79 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 Node STATE PARENT ACTION = Right PATH-COST = 6 Figure 3.10 Nodes are the data structures from which the search tree is constructed. Each has a parent, a state, and various bookkeeping fields. Arrows point from child to parent. Given the components for a parent node, it is easy to see how to compute the necessary components for a child node. The function CHILD-NODE takes a parent node and an action and returns the resulting child node: function CHILD-NODE(problem,parent,action) returns a node return a node with STATE = problem.RESULT(parent.STATE,action), PARENT = parent, ACTION = action, PATH-COST = parent.PATH-COST + problem.STEP-COST(parent.STATE,action) The node data structure is depicted in Figure 3.10. Notice how the PARENT pointers string the nodes together into a tree structure. These pointers also allow the solution path to be extracted when a goal node is found; we use the SOLUTION function to return the sequence of actions obtained by following parent pointers back to the root. Up to now, we have not been very careful to distinguish between nodes and states, but in writing detailed algorithms it’s important to make that distinction. A node is a bookkeeping data structure used to represent the search tree. A state corresponds to a configuration of the world. Thus, nodes are on particular paths, as defined by PARENT pointers, whereas states are not. Furthermore, two different nodes can contain the same world state if that state is generated via two different search paths. Now that we have nodes, we need somewhere to put them. The frontier needs to be stored in such a way that the search algorithm can easily choose the next node to expand according to its preferred strategy. The appropriate data structure for this is a queue. The QUEUE operations on a queue are as follows: • EMPTY?(queue) returns true only if there are no more elements in the queue. • POP(queue) removes the first element of the queue and returns it. • INSERT(element, queue) inserts an element and returns the resulting queue.
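The CHILD-NODE function and the three queue operations have equally direct Python counterparts. This sketch of ours assumes the Node class and the problem interface from the earlier sketches; collections.deque serves as a FIFO queue.

from collections import deque

def child_node(problem, parent, action):
    """Transcription of CHILD-NODE: build the child from the parent and the action."""
    state = problem.result(parent.state, action)
    cost = parent.path_cost + problem.step_cost(parent.state, action, state)
    return Node(state, parent, action, cost)     # Node from the earlier sketch

# The three queue operations, on a FIFO queue:
frontier = deque()
frontier.append('a node')         # INSERT(element, queue)
element = frontier.popleft()      # POP(queue) removes and returns the first element
print(len(frontier) == 0)         # EMPTY?(queue), here True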
  • 99. 80 Chapter 3. Solving Problems by Searching Queues are characterized by the order in which they store the inserted nodes. Three common variants are the first-in, first-out or FIFO queue, which pops the oldest element of the queue; FIFO QUEUE the last-in, first-out or LIFO queue (also known as a stack), which pops the newest element LIFO QUEUE of the queue; and the priority queue, which pops the element of the queue with the highest PRIORITY QUEUE priority according to some ordering function. The explored set can be implemented with a hash table to allow efficient checking for repeated states. With a good implementation, insertion and lookup can be done in roughly constant time no matter how many states are stored. One must take care to implement the hash table with the right notion of equality between states. For example, in the traveling salesperson problem (page 74), the hash table needs to know that the set of visited cities {Bucharest,Urziceni,Vaslui} is the same as {Urziceni,Vaslui,Bucharest}. Sometimes this can be achieved most easily by insisting that the data structures for states be in some canonical form; that is, logically equivalent states should map to the same data structure. In the case CANONICAL FORM of states described by sets, for example, a bit-vector representation or a sorted list without repetition would be canonical, whereas an unsorted list would not. 3.3.2 Measuring problem-solving performance Before we get into the design of specific search algorithms, we need to consider the criteria that might be used to choose among them. We can evaluate an algorithm’s performance in four ways: • Completeness: Is the algorithm guaranteed to find a solution when there is one? COMPLETENESS • Optimality: Does the strategy find the optimal solution, as defined on page 68? OPTIMALITY • Time complexity: How long does it take to find a solution? TIME COMPLEXITY • Space complexity: How much memory is needed to perform the search? SPACE COMPLEXITY Time and space complexity are always considered with respect to some measure of the prob- lem difficulty. In theoretical computer science, the typical measure is the size of the state space graph, |V | + |E|, where V is the set of vertices (nodes) of the graph and E is the set of edges (links). This is appropriate when the graph is an explicit data structure that is input to the search program. (The map of Romania is an example of this.) In AI, the graph is often represented implicitly by the initial state, actions, and transition model and is frequently infi- nite. For these reasons, complexity is expressed in terms of three quantities: b, the branching factor or maximum number of successors of any node; d, the depth of the shallowest goal BRANCHING FACTOR DEPTH node (i.e., the number of steps along the path from the root); and m, the maximum length of any path in the state space. Time is often measured in terms of the number of nodes generated during the search, and space in terms of the maximum number of nodes stored in memory. For the most part, we describe time and space complexity for search on a tree; for a graph, the answer depends on how “redundant” the paths in the state space are. To assess the effectiveness of a search algorithm, we can consider just the search cost— SEARCH COST which typically depends on the time complexity but can also include a term for memory usage—or we can use the total cost, which combines the search cost and the path cost of the TOTAL COST solution found. 
For the problem of finding a route from Arad to Bucharest, the search cost is the amount of time taken by the search and the solution cost is the total length of the path
  • 100. Section 3.4. Uninformed Search Strategies 81 in kilometers. Thus, to compute the total cost, we have to add milliseconds and kilometers. There is no “official exchange rate” between the two, but it might be reasonable in this case to convert kilometers into milliseconds by using an estimate of the car’s average speed (because time is what the agent cares about). This enables the agent to find an optimal tradeoff point at which further computation to find a shorter path becomes counterproductive. The more general problem of tradeoffs between different goods is taken up in Chapter 16. 3.4 UNINFORMED SEARCH STRATEGIES This section covers several search strategies that come under the heading of uninformed search (also called blind search). The term means that the strategies have no additional UNINFORMED SEARCH BLIND SEARCH information about states beyond that provided in the problem definition. All they can do is generate successors and distinguish a goal state from a non-goal state. All search strategies are distinguished by the order in which nodes are expanded. Strategies that know whether one non-goal state is “more promising” than another are called informed search or heuristic INFORMED SEARCH search strategies; they are covered in Section 3.5. HEURISTIC SEARCH 3.4.1 Breadth-first search Breadth-first search is a simple strategy in which the root node is expanded first, then all the BREADTH-FIRST SEARCH successors of the root node are expanded next, then their successors, and so on. In general, all the nodes are expanded at a given depth in the search tree before any nodes at the next level are expanded. Breadth-first search is an instance of the general graph-search algorithm (Figure 3.7) in which the shallowest unexpanded node is chosen for expansion. This is achieved very simply by using a FIFO queue for the frontier. Thus, new nodes (which are always deeper than their parents) go to the back of the queue, and old nodes, which are shallower than the new nodes, get expanded first. There is one slight tweak on the general graph-search algorithm, which is that the goal test is applied to each node when it is generated rather than when it is selected for expansion. This decision is explained below, where we discuss time complexity. Note also that the algorithm, following the general template for graph search, discards any new path to a state already in the frontier or explored set; it is easy to see that any such path must be at least as deep as the one already found. Thus, breadth-first search always has the shallowest path to every node on the frontier. Pseudocode is given in Figure 3.11. Figure 3.12 shows the progress of the search on a simple binary tree. How does breadth-first search rate according to the four criteria from the previous sec- tion? We can easily see that it is complete—if the shallowest goal node is at some finite depth d, breadth-first search will eventually find it after generating all shallower nodes (provided the branching factor b is finite). Note that as soon as a goal node is generated, we know it is the shallowest goal node because all shallower nodes must have been generated already and failed the goal test. Now, the shallowest goal node is not necessarily the optimal one;
  • 101. 82 Chapter 3. Solving Problems by Searching function BREADTH-FIRST-SEARCH(problem) returns a solution, or failure node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0 if problem.GOAL-TEST(node.STATE) then return SOLUTION(node) frontier ← a FIFO queue with node as the only element explored ← an empty set loop do if EMPTY?(frontier) then return failure node ← POP(frontier) /* chooses the shallowest node in frontier */ add node.STATE to explored for each action in problem.ACTIONS(node.STATE) do child ← CHILD-NODE(problem,node,action) if child.STATE is not in explored or frontier then if problem.GOAL-TEST(child.STATE) then return SOLUTION(child) frontier ← INSERT(child,frontier) Figure 3.11 Breadth-first search on a graph. technically, breadth-first search is optimal if the path cost is a nondecreasing function of the depth of the node. The most common such scenario is that all actions have the same cost. So far, the news about breadth-first search has been good. The news about time and space is not so good. Imagine searching a uniform tree where every state has b successors. The root of the search tree generates b nodes at the first level, each of which generates b more nodes, for a total of b2 at the second level. Each of these generates b more nodes, yielding b3 nodes at the third level, and so on. Now suppose that the solution is at depth d. In the worst case, it is the last node generated at that level. Then the total number of nodes generated is b + b2 + b3 + · · · + bd = O(bd ) . (If the algorithm were to apply the goal test to nodes when selected for expansion, rather than when generated, the whole layer of nodes at depth d would be expanded before the goal was detected and the time complexity would be O(bd+1).) As for space complexity: for any kind of graph search, which stores every expanded node in the explored set, the space complexity is always within a factor of b of the time complexity. For breadth-first graph search in particular, every node generated remains in memory. There will be O(bd−1) nodes in the explored set and O(bd) nodes in the frontier, A B C E F G D A B D E F G C A C D E F G B B C D E F G A Figure 3.12 Breadth-first search on a simple binary tree. At each stage, the node to be expanded next is indicated by a marker.
so the space complexity is O(b^d), i.e., it is dominated by the size of the frontier. Switching to a tree search would not save much space, and in a state space with many redundant paths, switching could cost a great deal of time.

An exponential complexity bound such as O(b^d) is scary. Figure 3.13 shows why. It lists, for various values of the solution depth d, the time and memory required for a breadth-first search with branching factor b = 10. The table assumes that 1 million nodes can be generated per second and that a node requires 1000 bytes of storage. Many search problems fit roughly within these assumptions (give or take a factor of 100) when run on a modern personal computer.

Depth   Nodes     Time               Memory
2       110       .11 milliseconds   107 kilobytes
4       11,110    11 milliseconds    10.6 megabytes
6       10^6      1.1 seconds        1 gigabyte
8       10^8      2 minutes          103 gigabytes
10      10^10     3 hours            10 terabytes
12      10^12     13 days            1 petabyte
14      10^14     3.5 years          99 petabytes
16      10^16     350 years          10 exabytes

Figure 3.13 Time and memory requirements for breadth-first search. The numbers shown assume branching factor b = 10; 1 million nodes/second; 1000 bytes/node.

Two lessons can be learned from Figure 3.13. First, the memory requirements are a bigger problem for breadth-first search than is the execution time. One might wait 13 days for the solution to an important problem with search depth 12, but no personal computer has the petabyte of memory it would take. Fortunately, other strategies require less memory. The second lesson is that time is still a major factor. If your problem has a solution at depth 16, then (given our assumptions) it will take about 350 years for breadth-first search (or indeed any uninformed search) to find it. In general, exponential-complexity search problems cannot be solved by uninformed methods for any but the smallest instances.

3.4.2 Uniform-cost search

When all step costs are equal, breadth-first search is optimal because it always expands the shallowest unexpanded node. By a simple extension, we can find an algorithm that is optimal with any step-cost function. Instead of expanding the shallowest node, uniform-cost search expands the node n with the lowest path cost g(n). This is done by storing the frontier as a priority queue ordered by g. The algorithm is shown in Figure 3.14. In addition to the ordering of the queue by path cost, there are two other significant differences from breadth-first search. The first is that the goal test is applied to a node when it is selected for expansion (as in the generic graph-search algorithm shown in Figure 3.7) rather than when it is first generated. The reason is that the first goal node that is generated
  • 103. 84 Chapter 3. Solving Problems by Searching function UNIFORM-COST-SEARCH(problem) returns a solution, or failure node ← a node with STATE = problem.INITIAL-STATE, PATH-COST = 0 frontier ← a priority queue ordered by PATH-COST, with node as the only element explored ← an empty set loop do if EMPTY?(frontier) then return failure node ← POP(frontier) /* chooses the lowest-cost node in frontier */ if problem.GOAL-TEST(node.STATE) then return SOLUTION(node) add node.STATE to explored for each action in problem.ACTIONS(node.STATE) do child ← CHILD-NODE(problem,node,action) if child.STATE is not in explored or frontier then frontier ← INSERT(child,frontier) else if child.STATE is in frontier with higher PATH-COST then replace that frontier node with child Figure 3.14 Uniform-cost search on a graph. The algorithm is identical to the general graph search algorithm in Figure 3.7, except for the use of a priority queue and the addition of an extra check in case a shorter path to a frontier state is discovered. The data structure for frontier needs to support efficient membership testing, so it should combine the capabilities of a priority queue and a hash table. Sibiu Fagaras Pitesti Rimnicu Vilcea Bucharest 99 80 97 101 211 Figure 3.15 Part of the Romania state space, selected to illustrate uniform-cost search. may be on a suboptimal path. The second difference is that a test is added in case a better path is found to a node currently on the frontier. Both of these modifications come into play in the example shown in Figure 3.15, where the problem is to get from Sibiu to Bucharest. The successors of Sibiu are Rimnicu Vilcea and Fagaras, with costs 80 and 99, respectively. The least-cost node, Rimnicu Vilcea, is expanded next, adding Pitesti with cost 80 + 97 = 177. The least-cost node is now Fagaras, so it is expanded, adding Bucharest with cost 99 + 211 = 310. Now a goal node has been generated, but uniform-cost search keeps going, choosing Pitesti for expansion and adding a second path
  • 104. Section 3.4. Uninformed Search Strategies 85 to Bucharest with cost 80+97+101 = 278. Now the algorithm checks to see if this new path is better than the old one; it is, so the old one is discarded. Bucharest, now with g-cost 278, is selected for expansion and the solution is returned. It is easy to see that uniform-cost search is optimal in general. First, we observe that whenever uniform-cost search selects a node n for expansion, the optimal path to that node has been found. (Were this not the case, there would have to be another frontier node n′ on the optimal path from the start node to n, by the graph separation property of Figure 3.9; by definition, n′ would have lower g-cost than n and would have been selected first.) Then, because step costs are nonnegative, paths never get shorter as nodes are added. These two facts together imply that uniform-cost search expands nodes in order of their optimal path cost. Hence, the first goal node selected for expansion must be the optimal solution. Uniform-cost search does not care about the number of steps a path has, but only about their total cost. Therefore, it will get stuck in an infinite loop if there is a path with an infinite sequence of zero-cost actions—for example, a sequence of NoOp actions.6 Completeness is guaranteed provided the cost of every step exceeds some small positive constant ǫ. Uniform-cost search is guided by path costs rather than depths, so its complexity is not easily characterized in terms of b and d. Instead, let C∗ be the cost of the optimal solution,7 and assume that every action costs at least ǫ. Then the algorithm’s worst-case time and space complexity is O(b1+⌊C∗/ǫ⌋), which can be much greater than bd. This is because uniform- cost search can explore large trees of small steps before exploring paths involving large and perhaps useful steps. When all step costs are equal, b1+⌊C∗/ǫ⌋ is just bd+1. When all step costs are the same, uniform-cost search is similar to breadth-first search, except that the latter stops as soon as it generates a goal, whereas uniform-cost search examines all the nodes at the goal’s depth to see if one has a lower cost; thus uniform-cost search does strictly more work by expanding nodes at depth d unnecessarily. 3.4.3 Depth-first search Depth-first search always expands the deepest node in the current frontier of the search tree. DEPTH-FIRST SEARCH The progress of the search is illustrated in Figure 3.16. The search proceeds immediately to the deepest level of the search tree, where the nodes have no successors. As those nodes are expanded, they are dropped from the frontier, so then the search “backs up” to the next deepest node that still has unexplored successors. The depth-first search algorithm is an instance of the graph-search algorithm in Fig- ure 3.7; whereas breadth-first-search uses a FIFO queue, depth-first search uses a LIFO queue. A LIFO queue means that the most recently generated node is chosen for expansion. This must be the deepest unexpanded node because it is one deeper than its parent—which, in turn, was the deepest unexpanded node when it was selected. As an alternative to the GRAPH-SEARCH-style implementation, it is common to im- plement depth-first search with a recursive function that calls itself on each of its children in turn. (A recursive depth-first algorithm incorporating a depth limit is shown in Figure 3.17.) 6 NoOp, or “no operation,” is the name of an assembly language instruction that does nothing. 
7 Here, and throughout the book, the “star” in C∗ means an optimal value for C.
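A Python sketch of the uniform-cost search of Figure 3.14 may be useful. It assumes a problem object with initial_state, actions(s), result(s, a), step_cost(s, a, s2), and goal_test(s); these names simply mirror the pseudocode's problem components and are not an interface defined in the text. Rather than replacing a frontier entry when a cheaper path is found, the heap keeps the old entry and discards stale entries when they are popped, which has the same effect.

import heapq, itertools

def uniform_cost_search(problem):
    # Sketch of Figure 3.14: expand the frontier node with the lowest path cost g.
    counter = itertools.count()            # tie-breaker so states need not be comparable
    start = problem.initial_state
    frontier = [(0, next(counter), start, [start])]
    best_g = {start: 0}                    # cheapest known cost to reach each state
    explored = set()
    while frontier:
        g, _, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue                       # stale entry for a path we have since beaten
        if problem.goal_test(state):       # goal test on expansion, not on generation
            return path, g
        explored.add(state)
        for action in problem.actions(state):
            child = problem.result(state, action)
            child_g = g + problem.step_cost(state, action, child)
            if child not in explored and child_g < best_g.get(child, float("inf")):
                best_g[child] = child_g
                heapq.heappush(frontier, (child_g, next(counter), child, path + [child]))
    return None, float("inf")              # failure

As a check, a toy problem built from the step costs of Figure 3.15 (the RouteProblem class and the direction of the edges are our own construction) reproduces the trace above, returning the Pitesti route with cost 80 + 97 + 101 = 278:

class RouteProblem:
    # Toy route-finding problem over an explicit map {city: {neighbor: cost}}.
    def __init__(self, graph, start, goal):
        self.graph, self.initial_state, self.goal = graph, start, goal
    def actions(self, s): return list(self.graph[s])
    def result(self, s, a): return a
    def step_cost(self, s, a, s2): return self.graph[s][a]
    def goal_test(self, s): return s == self.goal

figure_3_15 = {"Sibiu": {"Fagaras": 99, "Rimnicu Vilcea": 80},
               "Fagaras": {"Bucharest": 211},
               "Rimnicu Vilcea": {"Pitesti": 97},
               "Pitesti": {"Bucharest": 101},
               "Bucharest": {}}
print(uniform_cost_search(RouteProblem(figure_3_15, "Sibiu", "Bucharest")))
# (['Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'], 278)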
  • 105. 86 Chapter 3. Solving Problems by Searching A C F G M N O A C F G L M N O A C F G L M N O C F G L M N O A B C E F G K L M N O A C E F G J K L M N O A C E F G J K L M N O A B C D E F G I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O Figure 3.16 Depth-first search on a binary tree. The unexplored region is shown in light gray. Explored nodes with no descendants in the frontier are removed from memory. Nodes at depth 3 have no successors and M is the only goal node. The properties of depth-first search depend strongly on whether the graph-search or tree-search version is used. The graph-search version, which avoids repeated states and re- dundant paths, is complete in finite state spaces because it will eventually expand every node. The tree-search version, on the other hand, is not complete—for example, in Figure 3.6 the algorithm will follow the Arad–Sibiu–Arad–Sibiu loop forever. Depth-first tree search can be modified at no extra memory cost so that it checks new states against those on the path from the root to the current node; this avoids infinite loops in finite state spaces but does not avoid the proliferation of redundant paths. In infinite state spaces, both versions fail if an infinite non-goal path is encountered. For example, in Knuth’s 4 problem, depth-first search would keep applying the factorial operator forever. For similar reasons, both versions are nonoptimal. For example, in Figure 3.16, depth- first search will explore the entire left subtree even if node C is a goal node. If node J were also a goal node, then depth-first search would return it as a solution instead of C, which would be a better solution; hence, depth-first search is not optimal.
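The tree-search version with the cheap loop check just described (testing new states against those on the path back to the root) fits in a few lines of Python; as before, the problem interface is our assumed one, and this sketch inherits depth-first search's incompleteness on infinite paths.

def depth_first_tree_search(problem, state=None, path=None):
    # Recursive depth-first tree search; skips a successor whose state already
    # appears on the path from the root, avoiding loops but not redundant paths.
    if state is None:
        state, path = problem.initial_state, [problem.initial_state]
    if problem.goal_test(state):
        return path
    for action in problem.actions(state):
        child = problem.result(state, action)
        if child in path:
            continue
        result = depth_first_tree_search(problem, child, path + [child])
        if result is not None:
            return result
    return None                # no solution below this node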
  • 106. Section 3.4. Uninformed Search Strategies 87 The time complexity of depth-first graph search is bounded by the size of the state space (which may be infinite, of course). A depth-first tree search, on the other hand, may generate all of the O(bm) nodes in the search tree, where m is the maximum depth of any node; this can be much greater than the size of the state space. Note that m itself can be much larger than d (the depth of the shallowest solution) and is infinite if the tree is unbounded. So far, depth-first search seems to have no clear advantage over breadth-first search, so why do we include it? The reason is the space complexity. For a graph search, there is no advantage, but a depth-first tree search needs to store only a single path from the root to a leaf node, along with the remaining unexpanded sibling nodes for each node on the path. Once a node has been expanded, it can be removed from memory as soon as all its descendants have been fully explored. (See Figure 3.16.) For a state space with branching factor b and maximum depth m, depth-first search requires storage of only O(bm) nodes. Using the same assumptions as for Figure 3.13 and assuming that nodes at the same depth as the goal node have no successors, we find that depth-first search would require 156 kilobytes instead of 10 exabytes at depth d = 16, a factor of 7 trillion times less space. This has led to the adoption of depth-first tree search as the basic workhorse of many areas of AI, including constraint satisfaction (Chapter 6), propositional satisfiability (Chapter 7), and logic programming (Chapter 9). For the remainder of this section, we focus primarily on the tree- search version of depth-first search. A variant of depth-first search called backtracking search uses still less memory. (See BACKTRACKING SEARCH Chapter 6 for more details.) In backtracking, only one successor is generated at a time rather than all successors; each partially expanded node remembers which successor to generate next. In this way, only O(m) memory is needed rather than O(bm). Backtracking search facilitates yet another memory-saving (and time-saving) trick: the idea of generating a suc- cessor by modifying the current state description directly rather than copying it first. This reduces the memory requirements to just one state description and O(m) actions. For this to work, we must be able to undo each modification when we go back to generate the next suc- cessor. For problems with large state descriptions, such as robotic assembly, these techniques are critical to success. 3.4.4 Depth-limited search The embarrassing failure of depth-first search in infinite state spaces can be alleviated by supplying depth-first search with a predetermined depth limit ℓ. That is, nodes at depth ℓ are treated as if they have no successors. This approach is called depth-limited search. The DEPTH-LIMITED SEARCH depth limit solves the infinite-path problem. Unfortunately, it also introduces an additional source of incompleteness if we choose ℓ < d, that is, the shallowest goal is beyond the depth limit. (This is likely when d is unknown.) Depth-limited search will also be nonoptimal if we choose ℓ > d. Its time complexity is O(bℓ) and its space complexity is O(bℓ). Depth-first search can be viewed as a special case of depth-limited search with ℓ = ∞. Sometimes, depth limits can be based on knowledge of the problem. For example, on the map of Romania there are 20 cities. 
Therefore, we know that if there is a solution, it must be of length 19 at the longest, so ℓ = 19 is a possible choice. But in fact if we studied the
  • 107. 88 Chapter 3. Solving Problems by Searching function DEPTH-LIMITED-SEARCH(problem,limit) returns a solution, or failure/cutoff return RECURSIVE-DLS(MAKE-NODE(problem.INITIAL-STATE),problem,limit) function RECURSIVE-DLS(node,problem,limit) returns a solution, or failure/cutoff if problem.GOAL-TEST(node.STATE) then return SOLUTION(node) else if limit = 0 then return cutoff else cutoff occurred? ← false for each action in problem.ACTIONS(node.STATE) do child ← CHILD-NODE(problem,node,action) result ← RECURSIVE-DLS(child,problem,limit − 1) if result = cutoff then cutoff occurred? ← true else if result 6= failure then return result if cutoff occurred? then return cutoff else return failure Figure 3.17 A recursive implementation of depth-limited tree search. map carefully, we would discover that any city can be reached from any other city in at most 9 steps. This number, known as the diameter of the state space, gives us a better depth limit, DIAMETER which leads to a more efficient depth-limited search. For most problems, however, we will not know a good depth limit until we have solved the problem. Depth-limited search can be implemented as a simple modification to the general tree- or graph-search algorithm. Alternatively, it can be implemented as a simple recursive al- gorithm as shown in Figure 3.17. Notice that depth-limited search can terminate with two kinds of failure: the standard failure value indicates no solution; the cutoff value indicates no solution within the depth limit. 3.4.5 Iterative deepening depth-first search Iterative deepening search (or iterative deepening depth-first search) is a general strategy, ITERATIVE DEEPENING SEARCH often used in combination with depth-first tree search, that finds the best depth limit. It does this by gradually increasing the limit—first 0, then 1, then 2, and so on—until a goal is found. This will occur when the depth limit reaches d, the depth of the shallowest goal node. The algorithm is shown in Figure 3.18. Iterative deepening combines the benefits of depth-first and breadth-first search. Like depth-first search, its memory requirements are modest: O(bd) to be precise. Like breadth-first search, it is complete when the branching factor is finite and optimal when the path cost is a nondecreasing function of the depth of the node. Figure 3.19 shows four iterations of ITERATIVE-DEEPENING-SEARCH on a binary search tree, where the solution is found on the fourth iteration. Iterative deepening search may seem wasteful because states are generated multiple times. It turns out this is not too costly. The reason is that in a search tree with the same (or nearly the same) branching factor at each level, most of the nodes are in the bottom level, so it does not matter much that the upper levels are generated multiple times. In an iterative deepening search, the nodes on the bottom level (depth d) are generated once, those on the
  • 108. Section 3.4. Uninformed Search Strategies 89 function ITERATIVE-DEEPENING-SEARCH(problem) returns a solution, or failure for depth = 0 to ∞ do result ← DEPTH-LIMITED-SEARCH(problem,depth) if result 6= cutoff then return result Figure 3.18 The iterative deepening search algorithm, which repeatedly applies depth- limited search with increasing limits. It terminates when a solution is found or if the depth- limited search returns failure, meaning that no solution exists. Limit = 3 Limit = 2 Limit = 1 Limit = 0 A A A B C A B C A B C A B C A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H I J K L M N O A B C D E F G H J K L M N O I A B C D E F G H I J K L M N O Figure 3.19 Four iterations of iterative deepening search on a binary tree.
  • 109. 90 Chapter 3. Solving Problems by Searching next-to-bottom level are generated twice, and so on, up to the children of the root, which are generated d times. So the total number of nodes generated in the worst case is N(IDS) = (d)b + (d − 1)b2 + · · · + (1)bd , which gives a time complexity of O(bd)—asymptotically the same as breadth-first search. There is some extra cost for generating the upper levels multiple times, but it is not large. For example, if b = 10 and d = 5, the numbers are N(IDS) = 50 + 400 + 3, 000 + 20, 000 + 100, 000 = 123, 450 N(BFS) = 10 + 100 + 1, 000 + 10, 000 + 100, 000 = 111, 110 . If you are really concerned about repeating the repetition, you can use a hybrid approach that runs breadth-first search until almost all the available memory is consumed, and then runs iterative deepening from all the nodes in the frontier. In general, iterative deepening is the preferred uninformed search method when the search space is large and the depth of the solution is not known. Iterative deepening search is analogous to breadth-first search in that it explores a com- plete layer of new nodes at each iteration before going on to the next layer. It would seem worthwhile to develop an iterative analog to uniform-cost search, inheriting the latter algo- rithm’s optimality guarantees while avoiding its memory requirements. The idea is to use increasing path-cost limits instead of increasing depth limits. The resulting algorithm, called iterative lengthening search, is explored in Exercise 3.18. It turns out, unfortunately, that ITERATIVE LENGTHENING SEARCH iterative lengthening incurs substantial overhead compared to uniform-cost search. 3.4.6 Bidirectional search The idea behind bidirectional search is to run two simultaneous searches—one forward from the initial state and the other backward from the goal—hoping that the two searches meet in the middle (Figure 3.20). The motivation is that bd/2 + bd/2 is much less than bd, or in the figure, the area of the two small circles is less than the area of one big circle centered on the start and reaching to the goal. Bidirectional search is implemented by replacing the goal test with a check to see whether the frontiers of the two searches intersect; if they do, a solution has been found. (It is important to realize that the first such solution found may not be optimal, even if the two searches are both breadth-first; some additional search is required to make sure there isn’t another short-cut across the gap.) The check can be done when each node is generated or selected for expansion and, with a hash table, will take constant time. For example, if a problem has solution depth d = 6, and each direction runs breadth-first search one node at a time, then in the worst case the two searches meet when they have generated all of the nodes at depth 3. For b = 10, this means a total of 2,220 node generations, compared with 1,111,110 for a standard breadth-first search. Thus, the time complexity of bidirectional search using breadth-first searches in both directions is O(bd/2). The space complexity is also O(bd/2). We can reduce this by roughly half if one of the two searches is done by iterative deepening, but at least one of the frontiers must be kept in memory so that the intersection check can be done. This space requirement is the most significant weakness of bidirectional search.
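The node counts quoted in this section are easy to reproduce; the two helper functions below are ours, written only to check the arithmetic.

def bfs_nodes(b, d):
    # Nodes generated by breadth-first search: b + b^2 + ... + b^d.
    return sum(b ** i for i in range(1, d + 1))

def ids_nodes(b, d):
    # Nodes generated by iterative deepening: (d)b + (d-1)b^2 + ... + (1)b^d.
    return sum((d - i + 1) * b ** i for i in range(1, d + 1))

print(ids_nodes(10, 5), bfs_nodes(10, 5))        # 123450 111110
print(2 * bfs_nodes(10, 3), bfs_nodes(10, 6))    # 2220 1111110 (bidirectional vs one-sided, d = 6)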
Figure 3.20 A schematic view of a bidirectional search that is about to succeed when a branch from the start node meets a branch from the goal node.

The reduction in time complexity makes bidirectional search attractive, but how do we search backward? This is not as easy as it sounds. Let the predecessors of a state x be all those states that have x as a successor. Bidirectional search requires a method for computing predecessors. When all the actions in the state space are reversible, the predecessors of x are just its successors. Other cases may require substantial ingenuity.

Consider the question of what we mean by "the goal" in searching "backward from the goal." For the 8-puzzle and for finding a route in Romania, there is just one goal state, so the backward search is very much like the forward search. If there are several explicitly listed goal states—for example, the two dirt-free goal states in Figure 3.3—then we can construct a new dummy goal state whose immediate predecessors are all the actual goal states. But if the goal is an abstract description, such as the goal that "no queen attacks another queen" in the n-queens problem, then bidirectional search is difficult to use.

3.4.7 Comparing uninformed search strategies

Figure 3.21 compares search strategies in terms of the four evaluation criteria set forth in Section 3.3.2. This comparison is for tree-search versions. For graph searches, the main differences are that depth-first search is complete for finite state spaces and that the space and time complexities are bounded by the size of the state space.

Criterion    Breadth-First   Uniform-Cost          Depth-First   Depth-Limited   Iterative Deepening   Bidirectional (if applicable)
Complete?    Yes^a           Yes^a,b               No            No              Yes^a                 Yes^a,d
Time         O(b^d)          O(b^(1+⌊C*/ǫ⌋))       O(b^m)        O(b^ℓ)          O(b^d)                O(b^(d/2))
Space        O(b^d)          O(b^(1+⌊C*/ǫ⌋))       O(b^m)        O(b^ℓ)          O(b^d)                O(b^(d/2))
Optimal?     Yes^c           Yes                   No            No              Yes^c                 Yes^c,d

Figure 3.21 Evaluation of tree-search strategies. b is the branching factor; d is the depth of the shallowest solution; m is the maximum depth of the search tree; ℓ is the depth limit. Superscript caveats are as follows: ^a complete if b is finite; ^b complete if step costs ≥ ǫ for positive ǫ; ^c optimal if step costs are all identical; ^d if both directions use breadth-first search.
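The text gives no pseudocode for bidirectional search, so the following Python sketch is ours. It makes the two simplifying assumptions discussed above: a single explicit goal state, and reversible actions, so that predecessors can be computed as successors. Path reconstruction is omitted, and, as noted above, the first meeting of the frontiers is not guaranteed to lie on a shortest path without a little extra checking.

from collections import deque

def bidirectional_bfs(problem, goal_state):
    # Breadth-first from both ends; returns the length of a connecting path
    # found when the two frontiers first meet, or None if they never do.
    if problem.initial_state == goal_state:
        return 0
    dist_f = {problem.initial_state: 0}     # depths reached by the forward search
    dist_b = {goal_state: 0}                # depths reached by the backward search
    frontier_f = deque([problem.initial_state])
    frontier_b = deque([goal_state])

    def expand_layer(frontier, dist, other_dist):
        # Expand one full depth layer; report the best meeting found, if any.
        best = None
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for action in problem.actions(state):
                child = problem.result(state, action)
                if child not in dist:
                    dist[child] = dist[state] + 1
                    frontier.append(child)
                    if child in other_dist:
                        total = dist[child] + other_dist[child]
                        best = total if best is None else min(best, total)
        return best

    while frontier_f and frontier_b:
        if len(frontier_f) <= len(frontier_b):    # grow the smaller side first
            met = expand_layer(frontier_f, dist_f, dist_b)
        else:
            met = expand_layer(frontier_b, dist_b, dist_f)
        if met is not None:
            return met
    return None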
  • 111. 92 Chapter 3. Solving Problems by Searching 3.5 INFORMED (HEURISTIC) SEARCH STRATEGIES This section shows how an informed search strategy—one that uses problem-specific knowl- INFORMED SEARCH edge beyond the definition of the problem itself—can find solutions more efficiently than can an uninformed strategy. The general approach we consider is called best-first search. Best-first search is an BEST-FIRST SEARCH instance of the general TREE-SEARCH or GRAPH-SEARCH algorithm in which a node is selected for expansion based on an evaluation function, f(n). The evaluation function is EVALUATION FUNCTION construed as a cost estimate, so the node with the lowest evaluation is expanded first. The implementation of best-first graph search is identical to that for uniform-cost search (Fig- ure 3.14), except for the use of f instead of g to order the priority queue. The choice of f determines the search strategy. (For example, as Exercise 3.22 shows, best-first tree search includes depth-first search as a special case.) Most best-first algorithms include as a component of f a heuristic function, denoted h(n): HEURISTIC FUNCTION h(n) = estimated cost of the cheapest path from the state at node n to a goal state. (Notice that h(n) takes a node as input, but, unlike g(n), it depends only on the state at that node.) For example, in Romania, one might estimate the cost of the cheapest path from Arad to Bucharest via the straight-line distance from Arad to Bucharest. Heuristic functions are the most common form in which additional knowledge of the problem is imparted to the search algorithm. We study heuristics in more depth in Section 3.6. For now, we consider them to be arbitrary, nonnegative, problem-specific functions, with one constraint: if n is a goal node, then h(n) = 0. The remainder of this section covers two ways to use heuristic information to guide search. 3.5.1 Greedy best-first search Greedy best-first search8 tries to expand the node that is closest to the goal, on the grounds GREEDY BEST-FIRST SEARCH that this is likely to lead to a solution quickly. Thus, it evaluates nodes by using just the heuristic function; that is, f(n) = h(n). Let us see how this works for route-finding problems in Romania; we use the straight- line distance heuristic, which we will call hSLD . If the goal is Bucharest, we need to STRAIGHT-LINE DISTANCE know the straight-line distances to Bucharest, which are shown in Figure 3.22. For exam- ple, hSLD (In(Arad)) = 366. Notice that the values of hSLD cannot be computed from the problem description itself. Moreover, it takes a certain amount of experience to know that hSLD is correlated with actual road distances and is, therefore, a useful heuristic. Figure 3.23 shows the progress of a greedy best-first search using hSLD to find a path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu because it is closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras because it is closest. Fagaras in turn generates Bucharest, which is the goal. For this particular problem, greedy best-first search using hSLD finds a solution without ever 8 Our first edition called this greedy search; other authors have called it best-first search. Our more general usage of the latter term follows Pearl (1984).
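Because best-first search differs from uniform-cost search only in the ordering function, a single routine parameterized by f covers greedy search now and A∗ in the next subsection. The sketch below reuses the same assumed problem interface as the uniform-cost sketch earlier in the chapter; the heuristic h is assumed to take a state.

import heapq, itertools

def best_first_search(problem, f):
    # Generic best-first graph search: always expand the frontier entry with
    # the lowest f(state, g), where g is the path cost accumulated so far.
    counter = itertools.count()
    start = problem.initial_state
    frontier = [(f(start, 0), next(counter), 0, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, path = heapq.heappop(frontier)
        state = path[-1]
        if g > best_g.get(state, float("inf")):
            continue                        # stale entry for a dearer path
        if problem.goal_test(state):
            return path, g
        for action in problem.actions(state):
            child = problem.result(state, action)
            child_g = g + problem.step_cost(state, action, child)
            if child_g < best_g.get(child, float("inf")):
                best_g[child] = child_g
                heapq.heappush(frontier,
                               (f(child, child_g), next(counter), child_g, path + [child]))
    return None, float("inf")

def greedy_best_first_search(problem, h):
    # Greedy best-first search: f(n) = h(n).
    return best_first_search(problem, lambda state, g: h(state))

Uniform-cost search is the special case f(state, g) = g.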
  • 112. Section 3.5. Informed (Heuristic) Search Strategies 93 Urziceni Neamt Oradea Zerind Timisoara Mehadia Sibiu Pitesti Rimnicu Vilcea Vaslui Bucharest Giurgiu Hirsova Eforie Arad Lugoj Drobeta Craiova Fagaras Iasi 0 160 242 161 77 151 366 244 226 176 241 253 329 80 199 380 234 374 100 193 Figure 3.22 Values of hSLD —straight-line distances to Bucharest. expanding a node that is not on the solution path; hence, its search cost is minimal. It is not optimal, however: the path via Sibiu and Fagaras to Bucharest is 32 kilometers longer than the path through Rimnicu Vilcea and Pitesti. This shows why the algorithm is called “greedy”—at each step it tries to get as close to the goal as it can. Greedy best-first tree search is also incomplete even in a finite state space, much like depth-first search. Consider the problem of getting from Iasi to Fagaras. The heuristic sug- gests that Neamt be expanded first because it is closest to Fagaras, but it is a dead end. The solution is to go first to Vaslui—a step that is actually farther from the goal according to the heuristic—and then to continue to Urziceni, Bucharest, and Fagaras. The algorithm will never find this solution, however, because expanding Neamt puts Iasi back into the frontier, Iasi is closer to Fagaras than Vaslui is, and so Iasi will be expanded again, leading to an infi- nite loop. (The graph search version is complete in finite spaces, but not in infinite ones.) The worst-case time and space complexity for the tree version is O(bm), where m is the maximum depth of the search space. With a good heuristic function, however, the complexity can be reduced substantially. The amount of the reduction depends on the particular problem and on the quality of the heuristic. 3.5.2 A* search: Minimizing the total estimated solution cost The most widely known form of best-first search is called A∗ search (pronounced “A-star A ∗ SEARCH search”). It evaluates nodes by combining g(n), the cost to reach the node, and h(n), the cost to get from the node to the goal: f(n) = g(n) + h(n) . Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the cheapest path from n to the goal, we have f(n) = estimated cost of the cheapest solution through n . Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the node with the lowest value of g(n) + h(n). It turns out that this strategy is more than just reasonable: provided that the heuristic function h(n) satisfies certain conditions, A∗ search is both complete and optimal. The algorithm is identical to UNIFORM-COST-SEARCH except that A∗ uses g + h instead of g.
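With the best_first_search sketch above, that one change is a one-line specialization (the function name is ours):

def astar_search(problem, h):
    # A* search: f(n) = g(n) + h(n); optimal under the conditions discussed below.
    return best_first_search(problem, lambda state, g: g + h(state))

For the Romania route problem, h would be a lookup into a table of straight-line distances such as the one in Figure 3.22.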
  • 113. 94 Chapter 3. Solving Problems by Searching Rimnicu Vilcea Zerind Arad Sibiu Arad Fagaras Oradea Timisoara Sibiu Bucharest 329 374 366 380 193 253 0 Rimnicu Vilcea Arad Sibiu Arad Fagaras Oradea Timisoara 329 Zerind 374 366 176 380 193 Zerind Arad Sibiu Timisoara 253 329 374 Arad 366 (a) The initial state (b) After expanding Arad (c) After expanding Sibiu (d) After expanding Fagaras Figure 3.23 Stages in a greedy best-first tree search for Bucharest with the straight-line distance heuristic hSLD . Nodes are labeled with their h-values. Conditions for optimality: Admissibility and consistency The first condition we require for optimality is that h(n) be an admissible heuristic. An ADMISSIBLE HEURISTIC admissible heuristic is one that never overestimates the cost to reach the goal. Because g(n) is the actual cost to reach n along the current path, and f(n) = g(n) + h(n), we have as an immediate consequence that f(n) never overestimates the true cost of a solution along the current path through n. Admissible heuristics are by nature optimistic because they think the cost of solving the problem is less than it actually is. An obvious example of an admissible heuristic is the straight-line distance hSLD that we used in getting to Bucharest. Straight-line distance is admissible because the shortest path between any two points is a straight line, so the straight
  • 114. Section 3.5. Informed (Heuristic) Search Strategies 95 line cannot be an overestimate. In Figure 3.24, we show the progress of an A∗ tree search for Bucharest. The values of g are computed from the step costs in Figure 3.2, and the values of hSLD are given in Figure 3.22. Notice in particular that Bucharest first appears on the frontier at step (e), but it is not selected for expansion because its f-cost (450) is higher than that of Pitesti (417). Another way to say this is that there might be a solution through Pitesti whose cost is as low as 417, so the algorithm will not settle for a solution that costs 450. A second, slightly stronger condition called consistency (or sometimes monotonicity) CONSISTENCY MONOTONICITY is required only for applications of A∗ to graph search.9 A heuristic h(n) is consistent if, for every node n and every successor n′ of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n′ plus the estimated cost of reaching the goal from n′: h(n) ≤ c(n, a, n′ ) + h(n′ ) . This is a form of the general triangle inequality, which stipulates that each side of a triangle TRIANGLE INEQUALITY cannot be longer than the sum of the other two sides. Here, the triangle is formed by n, n′, and the goal Gn closest to n. For an admissible heuristic, the inequality makes perfect sense: if there were a route from n to Gn via n′ that was cheaper than h(n), that would violate the property that h(n) is a lower bound on the cost to reach Gn. It is fairly easy to show (Exercise 3.32) that every consistent heuristic is also admissible. Consistency is therefore a stricter requirement than admissibility, but one has to work quite hard to concoct heuristics that are admissible but not consistent. All the admissible heuristics we discuss in this chapter are also consistent. Consider, for example, hSLD . We know that the general triangle inequality is satisfied when each side is measured by the straight-line distance and that the straight-line distance between n and n′ is no greater than c(n, a, n′). Hence, hSLD is a consistent heuristic. Optimality of A* As we mentioned earlier, A∗ has the following properties: the tree-search version of A∗ is optimal if h(n) is admissible, while the graph-search version is optimal if h(n) is consistent. We show the second of these two claims since it is more useful. The argument es- sentially mirrors the argument for the optimality of uniform-cost search, with g replaced by f—just as in the A∗ algorithm itself. The first step is to establish the following: if h(n) is consistent, then the values of f(n) along any path are nondecreasing. The proof follows directly from the definition of consistency. Suppose n′ is a successor of n; then g(n′) = g(n) + c(n, a, n′) for some action a, and we have f(n′ ) = g(n′ ) + h(n′ ) = g(n) + c(n, a, n′ ) + h(n′ ) ≥ g(n) + h(n) = f(n) . The next step is to prove that whenever A∗ selects a node n for expansion, the optimal path to that node has been found. Were this not the case, there would have to be another frontier node n′ on the optimal path from the start node to n, by the graph separation property of 9 With an admissible but inconsistent heuristic, A∗ requires some extra bookkeeping to ensure optimality.
  • 115. 96 Chapter 3. Solving Problems by Searching (a) The initial state (b) After expanding Arad (c) After expanding Sibiu Arad Sibiu Timisoara 447=118+329 Zerind 449=75+374 393=140+253 Arad 366=0+366 (d) After expanding Rimnicu Vilcea (e) After expanding Fagaras (f) After expanding Pitesti Zerind Arad Sibiu Arad Timisoara Rimnicu Vilcea Fagaras Oradea 447=118+329 449=75+374 646=280+366 413=220+193 415=239+176 671=291+380 Zerind Arad Sibiu Timisoara 447=118+329 449=75+374 Rimnicu Vilcea Craiova Pitesti Sibiu 526=366+160 553=300+253 417=317+100 Zerind Arad Sibiu Arad Timisoara Sibiu Bucharest Fagaras Oradea Craiova Pitesti Sibiu 447=118+329 449=75+374 646=280+366 591=338+253 450=450+0 526=366+160 553=300+253 417=317+100 671=291+380 Zerind Arad Sibiu Arad Timisoara Sibiu Bucharest Oradea Craiova Pitesti Sibiu Bucharest Craiova Rimnicu Vilcea 418=418+0 447=118+329 449=75+374 646=280+366 591=338+253 450=450+0 526=366+160 553=300+253 615=455+160 607=414+193 671=291+380 Rimnicu Vilcea Fagaras Rimnicu Vilcea Arad Fagaras Oradea 646=280+366 415=239+176 671=291+380 Figure 3.24 Stages in an A∗ search for Bucharest. Nodes are labeled with f = g + h. The h values are the straight-line distances to Bucharest taken from Figure 3.22.
  • 116. Section 3.5. Informed (Heuristic) Search Strategies 97 O Z A T L M D C R F P G B U H E V I N 380 400 420 S Figure 3.25 Map of Romania showing contours at f = 380, f = 400, and f = 420, with Arad as the start state. Nodes inside a given contour have f-costs less than or equal to the contour value. Figure 3.9; because f is nondecreasing along any path, n′ would have lower f-cost than n and would have been selected first. From the two preceding observations, it follows that the sequence of nodes expanded by A∗ using GRAPH-SEARCH is in nondecreasing order of f(n). Hence, the first goal node selected for expansion must be an optimal solution because f is the true cost for goal nodes (which have h = 0) and all later goal nodes will be at least as expensive. The fact that f-costs are nondecreasing along any path also means that we can draw contours in the state space, just like the contours in a topographic map. Figure 3.25 shows CONTOUR an example. Inside the contour labeled 400, all nodes have f(n) less than or equal to 400, and so on. Then, because A∗ expands the frontier node of lowest f-cost, we can see that an A∗ search fans out from the start node, adding nodes in concentric bands of increasing f-cost. With uniform-cost search (A∗ search using h(n) = 0), the bands will be “circular” around the start state. With more accurate heuristics, the bands will stretch toward the goal state and become more narrowly focused around the optimal path. If C∗ is the cost of the optimal solution path, then we can say the following: • A∗ expands all nodes with f(n) < C∗. • A∗ might then expand some of the nodes right on the “goal contour” (where f(n) = C∗) before selecting a goal node. Completeness requires that there be only finitely many nodes with cost less than or equal to C∗, a condition that is true if all step costs exceed some finite ǫ and if b is finite. Notice that A∗ expands no nodes with f(n) > C∗—for example, Timisoara is not expanded in Figure 3.24 even though it is a child of the root. We say that the subtree below
  • 117. 98 Chapter 3. Solving Problems by Searching Timisoara is pruned; because hSLD is admissible, the algorithm can safely ignore this subtree PRUNING while still guaranteeing optimality. The concept of pruning—eliminating possibilities from consideration without having to examine them—is important for many areas of AI. One final observation is that among optimal algorithms of this type—algorithms that extend search paths from the root and use the same heuristic information—A∗ is optimally efficient for any given consistent heuristic. That is, no other optimal algorithm is guaran- OPTIMALLY EFFICIENT teed to expand fewer nodes than A∗ (except possibly through tie-breaking among nodes with f(n) = C∗). This is because any algorithm that does not expand all nodes with f(n) < C∗ runs the risk of missing the optimal solution. That A∗ search is complete, optimal, and optimally efficient among all such algorithms is rather satisfying. Unfortunately, it does not mean that A∗ is the answer to all our searching needs. The catch is that, for most problems, the number of states within the goal contour search space is still exponential in the length of the solution. The details of the analysis are beyond the scope of this book, but the basic results are as follows. For problems with constant step costs, the growth in run time as a function of the optimal solution depth d is analyzed in terms of the the absolute error or the relative error of the heuristic. The absolute error is ABSOLUTE ERROR RELATIVE ERROR defined as ∆ ≡ h∗ − h, where h∗ is the actual cost of getting from the root to the goal, and the relative error is defined as ǫ ≡ (h∗ − h)/h∗. The complexity results depend very strongly on the assumptions made about the state space. The simplest model studied is a state space that has a single goal and is essentially a tree with reversible actions. (The 8-puzzle satisfies the first and third of these assumptions.) In this case, the time complexity of A∗ is exponential in the maximum absolute error, that is, O(b∆). For constant step costs, we can write this as O(bǫd), where d is the solution depth. For almost all heuristics in practical use, the absolute error is at least proportional to the path cost h∗, so ǫ is constant or growing and the time complexity is exponential in d. We can also see the effect of a more accurate heuristic: O(bǫd) = O((bǫ)d), so the effective branching factor (defined more formally in the next section) is bǫ. When the state space has many goal states—particularly near-optimal goal states—the search process can be led astray from the optimal path and there is an extra cost proportional to the number of goals whose cost is within a factor ǫ of the optimal cost. Finally, in the general case of a graph, the situation is even worse. There can be exponentially many states with f(n) < C∗ even if the absolute error is bounded by a constant. For example, consider a version of the vacuum world where the agent can clean up any square for unit cost without even having to visit it: in that case, squares can be cleaned in any order. With N initially dirty squares, there are 2N states where some subset has been cleaned and all of them are on an optimal solution path—and hence satisfy f(n) < C∗—even if the heuristic has an error of 1. The complexity of A∗ often makes it impractical to insist on finding an optimal solution. One can use variants of A∗ that find suboptimal solutions quickly, or one can sometimes design heuristics that are more accurate but not strictly admissible. 
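One widely used example of such a variant, not covered in this chapter, is weighted A∗, which inflates the heuristic by a factor w > 1; with an admissible h, the solution it returns costs at most w times the optimal cost. In terms of the earlier best_first_search sketch:

def weighted_astar_search(problem, h, w=1.5):
    # Weighted A*: f(n) = g(n) + w*h(n) with w > 1; trades optimality for speed.
    return best_first_search(problem, lambda state, g: g + w * h(state))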
In any case, the use of a good heuristic still provides enormous savings compared to the use of an uninformed search. In Section 3.6, we look at the question of designing good heuristics. Computation time is not, however, A∗’s main drawback. Because it keeps all generated nodes in memory (as do all GRAPH-SEARCH algorithms), A∗ usually runs out of space long
  • 118. Section 3.5. Informed (Heuristic) Search Strategies 99 function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure return RBFS(problem, MAKE-NODE(problem.INITIAL-STATE),∞) function RBFS(problem,node,f limit) returns a solution, or failure and a new f-cost limit if problem.GOAL-TEST(node.STATE) then return SOLUTION(node) successors ← [ ] for each action in problem.ACTIONS(node.STATE) do add CHILD-NODE(problem,node,action) into successors if successors is empty then return failure, ∞ for each s in successors do /* update f with value from previous search, if any */ s.f ← max(s.g + s.h, node.f )) loop do best ← the lowest f-value node in successors if best.f > f limit then return failure, best.f alternative ← the second-lowest f-value among successors result,best.f ← RBFS(problem,best,min(f limit, alternative)) if result 6= failure then return result Figure 3.26 The algorithm for recursive best-first search. before it runs out of time. For this reason, A∗ is not practical for many large-scale prob- lems. There are, however, algorithms that overcome the space problem without sacrificing optimality or completeness, at a small cost in execution time. We discuss these next. 3.5.3 Memory-bounded heuristic search The simplest way to reduce memory requirements for A∗ is to adapt the idea of iterative deepening to the heuristic search context, resulting in the iterative-deepening A∗ (IDA∗) al- ITERATIVE- DEEPENING A ∗ gorithm. The main difference between IDA∗ and standard iterative deepening is that the cutoff used is the f-cost (g +h) rather than the depth; at each iteration, the cutoff value is the small- est f-cost of any node that exceeded the cutoff on the previous iteration. IDA∗ is practical for many problems with unit step costs and avoids the substantial overhead associated with keeping a sorted queue of nodes. Unfortunately, it suffers from the same difficulties with real- valued costs as does the iterative version of uniform-cost search described in Exercise 3.18. This section briefly examines two other memory-bounded algorithms, called RBFS and MA∗. Recursive best-first search (RBFS) is a simple recursive algorithm that attempts to RECURSIVE BEST-FIRST SEARCH mimic the operation of standard best-first search, but using only linear space. The algorithm is shown in Figure 3.26. Its structure is similar to that of a recursive depth-first search, but rather than continuing indefinitely down the current path, it uses the f limit variable to keep track of the f-value of the best alternative path available from any ancestor of the current node. If the current node exceeds this limit, the recursion unwinds back to the alternative path. As the recursion unwinds, RBFS replaces the f-value of each node along the path with a backed-up value—the best f-value of its children. In this way, RBFS remembers the BACKED-UP VALUE f-value of the best leaf in the forgotten subtree and can therefore decide whether it’s worth
  • 119. 100 Chapter 3. Solving Problems by Searching Zerind Arad Sibiu Arad Fagaras Oradea Craiova Sibiu Bucharest Craiova Rimnicu Vilcea Zerind Arad Sibiu Arad Sibiu Bucharest Rimnicu Vilcea Oradea Zerind Arad Sibiu Arad Timisoara Timisoara Timisoara Fagaras Oradea Rimnicu Vilcea Craiova Pitesti Sibiu 646 415 671 526 553 646 671 450 591 646 671 526 553 418 615 607 447 449 447 447 449 449 366 393 366 393 413 413 417 415 366 393 415 450 417 Rimnicu Vilcea Fagaras 447 415 447 447 417 (a) After expanding Arad, Sibiu, and Rimnicu Vilcea (c) After switching back to Rimnicu Vilcea and expanding Pitesti (b) After unwinding back to Sibiu and expanding Fagaras 447 447 ∞ ∞ ∞ 417 417 Pitesti Figure 3.27 Stages in an RBFS search for the shortest route to Bucharest. The f-limit value for each recursive call is shown on top of each current node, and every node is labeled with its f-cost. (a) The path via Rimnicu Vilcea is followed until the current best leaf (Pitesti) has a value that is worse than the best alternative path (Fagaras). (b) The recursion unwinds and the best leaf value of the forgotten subtree (417) is backed up to Rimnicu Vilcea; then Fagaras is expanded, revealing a best leaf value of 450. (c) The recursion unwinds and the best leaf value of the forgotten subtree (450) is backed up to Fagaras; then Rimnicu Vilcea is expanded. This time, because the best alternative path (through Timisoara) costs at least 447, the expansion continues to Bucharest. reexpanding the subtree at some later time. Figure 3.27 shows how RBFS reaches Bucharest. RBFS is somewhat more efficient than IDA∗, but still suffers from excessive node re- generation. In the example in Figure 3.27, RBFS follows the path via Rimnicu Vilcea, then
  • 120. Section 3.5. Informed (Heuristic) Search Strategies 101 “changes its mind” and tries Fagaras, and then changes its mind back again. These mind changes occur because every time the current best path is extended, its f-value is likely to increase—h is usually less optimistic for nodes closer to the goal. When this happens, the second-best path might become the best path, so the search has to backtrack to follow it. Each mind change corresponds to an iteration of IDA∗ and could require many reexpansions of forgotten nodes to recreate the best path and extend it one more node. Like A∗ tree search, RBFS is an optimal algorithm if the heuristic function h(n) is admissible. Its space complexity is linear in the depth of the deepest optimal solution, but its time complexity is rather difficult to characterize: it depends both on the accuracy of the heuristic function and on how often the best path changes as nodes are expanded. IDA∗ and RBFS suffer from using too little memory. Between iterations, IDA∗ retains only a single number: the current f-cost limit. RBFS retains more information in memory, but it uses only linear space: even if more memory were available, RBFS has no way to make use of it. Because they forget most of what they have done, both algorithms may end up reex- panding the same states many times over. Furthermore, they suffer the potentially exponential increase in complexity associated with redundant paths in graphs (see Section 3.3). It seems sensible, therefore, to use all available memory. Two algorithms that do this are MA∗ (memory-bounded A∗) and SMA∗ (simplified MA∗). SMA∗ is—well—simpler, so MA* SMA* we will describe it. SMA∗ proceeds just like A∗, expanding the best leaf until memory is full. At this point, it cannot add a new node to the search tree without dropping an old one. SMA∗ always drops the worst leaf node—the one with the highest f-value. Like RBFS, SMA∗ then backs up the value of the forgotten node to its parent. In this way, the ancestor of a forgotten subtree knows the quality of the best path in that subtree. With this information, SMA∗ regenerates the subtree only when all other paths have been shown to look worse than the path it has forgotten. Another way of saying this is that, if all the descendants of a node n are forgotten, then we will not know which way to go from n, but we will still have an idea of how worthwhile it is to go anywhere from n. The complete algorithm is too complicated to reproduce here,10 but there is one subtlety worth mentioning. We said that SMA∗ expands the best leaf and deletes the worst leaf. What if all the leaf nodes have the same f-value? To avoid selecting the same node for deletion and expansion, SMA∗ expands the newest best leaf and deletes the oldest worst leaf. These coincide when there is only one leaf, but in that case, the current search tree must be a single path from root to leaf that fills all of memory. If the leaf is not a goal node, then even if it is on an optimal solution path, that solution is not reachable with the available memory. Therefore, the node can be discarded exactly as if it had no successors. SMA∗ is complete if there is any reachable solution—that is, if d, the depth of the shallowest goal node, is less than the memory size (expressed in nodes). It is optimal if any optimal solution is reachable; otherwise, it returns the best reachable solution. 
In practical terms, SMA∗ is a fairly robust choice for finding optimal solutions, particularly when the state space is a graph, step costs are not uniform, and node generation is expensive compared to the overhead of maintaining the frontier and the explored set. 10 A rough sketch appeared in the first edition of this book.
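Of the memory-bounded algorithms in this subsection, IDA∗ is the only one described without pseudocode; a minimal recursive sketch, using the assumed problem interface and a state-based heuristic h, is given below. Each iteration is a depth-first contour search bounded by an f-cost cutoff, and the cutoff is then raised to the smallest f-cost that exceeded it.

def ida_star_search(problem, h):
    # Iterative-deepening A*: the cutoff is the f-cost g + h rather than the depth.
    def contour_search(path, g, bound):
        state = path[-1]
        f = g + h(state)
        if f > bound:
            return None, f                   # report the f-cost that broke the cutoff
        if problem.goal_test(state):
            return path, f
        next_bound = float("inf")
        for action in problem.actions(state):
            child = problem.result(state, action)
            if child in path:                # cheap cycle check along the current path
                continue
            result, t = contour_search(path + [child],
                                       g + problem.step_cost(state, action, child), bound)
            if result is not None:
                return result, t
            next_bound = min(next_bound, t)
        return None, next_bound

    bound = h(problem.initial_state)
    while True:
        result, new_bound = contour_search([problem.initial_state], 0, bound)
        if result is not None:
            return result
        if new_bound == float("inf"):
            return None                      # no solution
        bound = new_bound                    # smallest f-cost that exceeded the old cutoff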
  • 121. 102 Chapter 3. Solving Problems by Searching On very hard problems, however, it will often be the case that SMA∗ is forced to switch back and forth continually among many candidate solution paths, only a small subset of which can fit in memory. (This resembles the problem of thrashing in disk paging systems.) Then THRASHING the extra time required for repeated regeneration of the same nodes means that problems that would be practically solvable by A∗, given unlimited memory, become intractable for SMA∗. That is to say, memory limitations can make a problem intractable from the point of view of computation time. Although no current theory explains the tradeoff between time and memory, it seems that this is an inescapable problem. The only way out is to drop the optimality requirement. 3.5.4 Learning to search better We have presented several fixed strategies—breadth-first, greedy best-first, and so on—that have been designed by computer scientists. Could an agent learn how to search better? The answer is yes, and the method rests on an important concept called the metalevel state space. METALEVEL STATE SPACE Each state in a metalevel state space captures the internal (computational) state of a program that is searching in an object-level state space such as Romania. For example, the internal OBJECT-LEVEL STATE SPACE state of the A∗ algorithm consists of the current search tree. Each action in the metalevel state space is a computation step that alters the internal state; for example, each computation step in A∗ expands a leaf node and adds its successors to the tree. Thus, Figure 3.24, which shows a sequence of larger and larger search trees, can be seen as depicting a path in the metalevel state space where each state on the path is an object-level search tree. Now, the path in Figure 3.24 has five steps, including one step, the expansion of Fagaras, that is not especially helpful. For harder problems, there will be many such missteps, and a metalevel learning algorithm can learn from these experiences to avoid exploring unpromis- METALEVEL LEARNING ing subtrees. The techniques used for this kind of learning are described in Chapter 21. The goal of learning is to minimize the total cost of problem solving, trading off computational expense and path cost. 3.6 HEURISTIC FUNCTIONS In this section, we look at heuristics for the 8-puzzle, in order to shed light on the nature of heuristics in general. The 8-puzzle was one of the earliest heuristic search problems. As mentioned in Sec- tion 3.2, the object of the puzzle is to slide the tiles horizontally or vertically into the empty space until the configuration matches the goal configuration (Figure 3.28). The average solution cost for a randomly generated 8-puzzle instance is about 22 steps. The branching factor is about 3. (When the empty tile is in the middle, four moves are possible; when it is in a corner, two; and when it is along an edge, three.) This means that an exhaustive tree search to depth 22 would look at about 322 ≈ 3.1 × 1010 states. A graph search would cut this down by a factor of about 170,000 because only 9!/2 = 181, 440 distinct states are reachable. (See Exercise 3.5.) This is a manageable number, but
  • 122. Section 3.6. Heuristic Functions 103 2 Start State Goal State 1 3 4 6 7 5 1 2 3 4 6 7 8 5 8 Figure 3.28 A typical instance of the 8-puzzle. The solution is 26 steps long. the corresponding number for the 15-puzzle is roughly 1013, so the next order of business is to find a good heuristic function. If we want to find the shortest solutions by using A∗, we need a heuristic function that never overestimates the number of steps to the goal. There is a long history of such heuristics for the 15-puzzle; here are two commonly used candidates: • h1 = the number of misplaced tiles. For Figure 3.28, all of the eight tiles are out of position, so the start state would have h1 = 8. h1 is an admissible heuristic because it is clear that any tile that is out of place must be moved at least once. • h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot move along diagonals, the distance we will count is the sum of the horizontal and vertical distances. This is sometimes called the city block distance or Manhattan distance. h2 is also admissible because all any move can do is move one tile one step MANHATTAN DISTANCE closer to the goal. Tiles 1 to 8 in the start state give a Manhattan distance of h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18 . As expected, neither of these overestimates the true solution cost, which is 26. 3.6.1 The effect of heuristic accuracy on performance One way to characterize the quality of a heuristic is the effective branching factor b∗. If the EFFECTIVE BRANCHING FACTOR total number of nodes generated by A∗ for a particular problem is N and the solution depth is d, then b∗ is the branching factor that a uniform tree of depth d would have to have in order to contain N + 1 nodes. Thus, N + 1 = 1 + b∗ + (b∗ )2 + · · · + (b∗ )d . For example, if A∗ finds a solution at depth 5 using 52 nodes, then the effective branching factor is 1.92. The effective branching factor can vary across problem instances, but usually it is fairly constant for sufficiently hard problems. (The existence of an effective branching factor follows from the result, mentioned earlier, that the number of nodes expanded by A∗ grows exponentially with solution depth.) Therefore, experimental measurements of b∗ on a small set of problems can provide a good guide to the heuristic’s overall usefulness. A well- designed heuristic would have a value of b∗ close to 1, allowing fairly large problems to be solved at reasonable computational cost.
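Both heuristics are a few lines of Python. In the sketch below, a state is assumed to be a tuple of nine entries listing, row by row, which tile occupies each square (0 for the blank); the encoding of the Figure 3.28 instance shown at the end is ours, chosen so that it reproduces the h1 = 8 and h2 = 18 values quoted above.

def misplaced_tiles(state, goal):
    # h1: number of tiles (ignoring the blank, 0) that are out of position.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan_distance(state, goal):
    # h2: sum over tiles of horizontal plus vertical distance to the goal square.
    goal_position = {tile: (i // 3, i % 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:
            row, col = i // 3, i % 3
            goal_row, goal_col = goal_position[tile]
            total += abs(row - goal_row) + abs(col - goal_col)
    return total

start = (7, 2, 4, 5, 0, 6, 8, 3, 1)       # start state of Figure 3.28, 0 = blank
goal  = (0, 1, 2, 3, 4, 5, 6, 7, 8)       # goal state of Figure 3.28
print(misplaced_tiles(start, goal))        # 8
print(manhattan_distance(start, goal))     # 18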
To test the heuristic functions h1 and h2, we generated 1200 random problems with solution lengths from 2 to 24 (100 for each even number) and solved them with iterative deepening search and with A∗ tree search using both h1 and h2. Figure 3.29 gives the average number of nodes generated by each strategy and the effective branching factor. The results suggest that h2 is better than h1, and is far better than using iterative deepening search. Even for small problems with d = 12, A∗ with h2 is 50,000 times more efficient than uninformed iterative deepening search.

        Search Cost (nodes generated)        Effective Branching Factor
  d     IDS        A∗(h1)     A∗(h2)         IDS      A∗(h1)    A∗(h2)
  2     10         6          6              2.45     1.79      1.79
  4     112        13         12             2.87     1.48      1.45
  6     680        20         18             2.73     1.34      1.30
  8     6384       39         25             2.80     1.33      1.24
  10    47127      93         39             2.79     1.38      1.22
  12    3644035    227        73             2.78     1.42      1.24
  14    –          539        113            –        1.44      1.23
  16    –          1301       211            –        1.45      1.25
  18    –          3056       363            –        1.46      1.26
  20    –          7276       676            –        1.47      1.27
  22    –          18094      1219           –        1.48      1.28
  24    –          39135      1641           –        1.48      1.26

Figure 3.29 Comparison of the search costs and effective branching factors for the ITERATIVE-DEEPENING-SEARCH and A∗ algorithms with h1, h2. Data are averaged over 100 instances of the 8-puzzle for each of various solution lengths d.

One might ask whether h2 is always better than h1. The answer is "Essentially, yes." It is easy to see from the definitions of the two heuristics that, for any node n, h2(n) ≥ h1(n). We thus say that h2 dominates h1. Domination translates directly into efficiency: A∗ using h2 will never expand more nodes than A∗ using h1 (except possibly for some nodes with f(n) = C∗). The argument is simple. Recall the observation on page 97 that every node with f(n) < C∗ will surely be expanded. This is the same as saying that every node with h(n) < C∗ − g(n) will surely be expanded. But because h2 is at least as big as h1 for all nodes, every node that is surely expanded by A∗ search with h2 will also surely be expanded with h1, and h1 might cause other nodes to be expanded as well. Hence, it is generally better to use a heuristic function with higher values, provided it is consistent and that the computation time for the heuristic is not too long.

3.6.2 Generating admissible heuristics from relaxed problems

We have seen that both h1 (misplaced tiles) and h2 (Manhattan distance) are fairly good heuristics for the 8-puzzle and that h2 is better. How might one have come up with h2? Is it possible for a computer to invent such a heuristic mechanically?

h1 and h2 are estimates of the remaining path length for the 8-puzzle, but they are also perfectly accurate path lengths for simplified versions of the puzzle. If the rules of the puzzle
were changed so that a tile could move anywhere instead of just to the adjacent empty square, then h1 would give the exact number of steps in the shortest solution. Similarly, if a tile could move one square in any direction, even onto an occupied square, then h2 would give the exact number of steps in the shortest solution. A problem with fewer restrictions on the actions is called a relaxed problem. The state-space graph of the relaxed problem is a supergraph of the original state space because the removal of restrictions creates added edges in the graph.

Because the relaxed problem adds edges to the state space, any optimal solution in the original problem is, by definition, also a solution in the relaxed problem; but the relaxed problem may have better solutions if the added edges provide short cuts. Hence, the cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem. Furthermore, because the derived heuristic is an exact cost for the relaxed problem, it must obey the triangle inequality and is therefore consistent (see page 95).

If a problem definition is written down in a formal language, it is possible to construct relaxed problems automatically.11 For example, if the 8-puzzle actions are described as

A tile can move from square A to square B if A is horizontally or vertically adjacent to B and B is blank,

we can generate three relaxed problems by removing one or both of the conditions:

(a) A tile can move from square A to square B if A is adjacent to B.
(b) A tile can move from square A to square B if B is blank.
(c) A tile can move from square A to square B.

From (a), we can derive h2 (Manhattan distance). The reasoning is that h2 would be the proper score if we moved each tile in turn to its destination. The heuristic derived from (b) is discussed in Exercise 3.34. From (c), we can derive h1 (misplaced tiles) because it would be the proper score if tiles could move to their intended destination in one step. Notice that it is crucial that the relaxed problems generated by this technique can be solved essentially without search, because the relaxed rules allow the problem to be decomposed into eight independent subproblems. If the relaxed problem is hard to solve, then the values of the corresponding heuristic will be expensive to obtain.12

A program called ABSOLVER can generate heuristics automatically from problem definitions, using the "relaxed problem" method and various other techniques (Prieditis, 1993). ABSOLVER generated a new heuristic for the 8-puzzle that was better than any preexisting heuristic and found the first useful heuristic for the famous Rubik's Cube puzzle.

One problem with generating new heuristic functions is that one often fails to get a single "clearly best" heuristic. If a collection of admissible heuristics h1 . . . hm is available for a problem and none of them dominates any of the others, which should we choose? As it turns out, we need not make a choice. We can have the best of all worlds, by defining

h(n) = max{h1(n), . . . , hm(n)} .

11 In Chapters 8 and 10, we describe formal languages suitable for this task; with formal descriptions that can be manipulated, the construction of relaxed problems can be automated. For now, we use English.
12 Note that a perfect heuristic can be obtained simply by allowing h to run a full breadth-first search "on the sly." Thus, there is a tradeoff between accuracy and computation time for heuristic functions.
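As a small illustration of this composite heuristic (not code from the book; the helper name and the reuse of the h1 and h2 sketches above are our own assumptions):

def h_max(state, heuristics):
    # Composite heuristic: use whichever component gives the largest
    # (most accurate) admissible estimate for this particular state.
    return max(h(state) for h in heuristics)

# For example, combining the misplaced-tiles and Manhattan-distance sketches:
# h = lambda s: h_max(s, [h1, h2])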
Figure 3.30 A subproblem of the 8-puzzle instance given in Figure 3.28. The task is to get tiles 1, 2, 3, and 4 into their correct positions, without worrying about what happens to the other tiles.

This composite heuristic uses whichever function is most accurate on the node in question. Because the component heuristics are admissible, h is admissible; it is also easy to prove that h is consistent. Furthermore, h dominates all of its component heuristics.

3.6.3 Generating admissible heuristics from subproblems: Pattern databases

Admissible heuristics can also be derived from the solution cost of a subproblem of a given problem. For example, Figure 3.30 shows a subproblem of the 8-puzzle instance in Figure 3.28. The subproblem involves getting tiles 1, 2, 3, 4 into their correct positions. Clearly, the cost of the optimal solution of this subproblem is a lower bound on the cost of the complete problem. It turns out to be more accurate than Manhattan distance in some cases.

The idea behind pattern databases is to store these exact solution costs for every possible subproblem instance—in our example, every possible configuration of the four tiles and the blank. (The locations of the other four tiles are irrelevant for the purposes of solving the subproblem, but moves of those tiles do count toward the cost.) Then we compute an admissible heuristic hDB for each complete state encountered during a search simply by looking up the corresponding subproblem configuration in the database. The database itself is constructed by searching back13 from the goal and recording the cost of each new pattern encountered; the expense of this search is amortized over many subsequent problem instances.

The choice of 1-2-3-4 is fairly arbitrary; we could also construct databases for 5-6-7-8, for 2-4-6-8, and so on. Each database yields an admissible heuristic, and these heuristics can be combined, as explained earlier, by taking the maximum value. A combined heuristic of this kind is much more accurate than the Manhattan distance; the number of nodes generated when solving random 15-puzzles can be reduced by a factor of 1000.

One might wonder whether the heuristics obtained from the 1-2-3-4 database and the 5-6-7-8 could be added, since the two subproblems seem not to overlap. Would this still give an admissible heuristic? The answer is no, because the solutions of the 1-2-3-4 subproblem and the 5-6-7-8 subproblem for a given state will almost certainly share some moves—it is unlikely that 1-2-3-4 can be moved into place without touching 5-6-7-8, and vice versa.

13 By working backward from the goal, the exact solution cost of every instance encountered is immediately available. This is an example of dynamic programming, which we discuss further in Chapter 17.
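The backward construction described above can be sketched in a few lines of Python for the 8-puzzle. This is an illustration only, not the book's code: the state encoding (a length-9 tuple in row-major order, 0 for the blank, -1 for a "don't care" tile), the assumed goal layout, and the choice of tracked tiles 1-2-3-4 are all our own assumptions.

from collections import deque

TRACKED = (1, 2, 3, 4)                 # assumed pattern tiles
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)     # assumed goal layout, 0 = blank

def abstract(state, tracked=TRACKED):
    # Keep only the blank and the tracked tiles; every other tile becomes -1.
    return tuple(t if t == 0 or t in tracked else -1 for t in state)

def neighbors(pattern):
    # All patterns reachable by one slide: the blank swaps with an adjacent square.
    b = pattern.index(0)
    r, c = divmod(b, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            n = nr * 3 + nc
            p = list(pattern)
            p[b], p[n] = p[n], p[b]
            yield tuple(p)

def build_pattern_database(goal=GOAL, tracked=TRACKED):
    # Breadth-first search backward from the goal pattern, recording how many
    # moves are needed to reach each pattern (every move counts, as in the text).
    start = abstract(goal, tracked)
    dist = {start: 0}
    frontier = deque([start])
    while frontier:
        p = frontier.popleft()
        for q in neighbors(p):
            if q not in dist:
                dist[q] = dist[p] + 1
                frontier.append(q)
    return dist

def h_db(state, db, tracked=TRACKED):
    # Admissible heuristic: exact cost of the tracked-tile subproblem, looked up.
    return db[abstract(state, tracked)]

During search, h_db(state, db) returns the exact cost of the 1-2-3-4 subproblem for that state, which is a lower bound (and hence admissible) for the full puzzle; databases for other tile groups can be combined with the max rule sketched earlier.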
But what if we don't count those moves? That is, we record not the total cost of solving the 1-2-3-4 subproblem, but just the number of moves involving 1-2-3-4. Then it is easy to see that the sum of the two costs is still a lower bound on the cost of solving the entire problem. This is the idea behind disjoint pattern databases. With such databases, it is possible to solve random 15-puzzles in a few milliseconds—the number of nodes generated is reduced by a factor of 10,000 compared with the use of Manhattan distance. For 24-puzzles, a speedup of roughly a factor of a million can be obtained.

Disjoint pattern databases work for sliding-tile puzzles because the problem can be divided up in such a way that each move affects only one subproblem—because only one tile is moved at a time. For a problem such as Rubik's Cube, this kind of subdivision is difficult because each move affects 8 or 9 of the 26 cubies. More general ways of defining additive, admissible heuristics have been proposed that do apply to Rubik's cube (Yang et al., 2008), but they have not yielded a heuristic better than the best nonadditive heuristic for the problem.

3.6.4 Learning heuristics from experience

A heuristic function h(n) is supposed to estimate the cost of a solution beginning from the state at node n. How could an agent construct such a function? One solution was given in the preceding sections—namely, to devise relaxed problems for which an optimal solution can be found easily. Another solution is to learn from experience. "Experience" here means solving lots of 8-puzzles, for instance. Each optimal solution to an 8-puzzle problem provides examples from which h(n) can be learned. Each example consists of a state from the solution path and the actual cost of the solution from that point. From these examples, a learning algorithm can be used to construct a function h(n) that can (with luck) predict solution costs for other states that arise during search. Techniques for doing just this using neural nets, decision trees, and other methods are demonstrated in Chapter 18. (The reinforcement learning methods described in Chapter 21 are also applicable.)

Inductive learning methods work best when supplied with features of a state that are relevant to predicting the state's value, rather than with just the raw state description. For example, the feature "number of misplaced tiles" might be helpful in predicting the actual distance of a state from the goal. Let's call this feature x1(n). We could take 100 randomly generated 8-puzzle configurations and gather statistics on their actual solution costs. We might find that when x1(n) is 5, the average solution cost is around 14, and so on. Given these data, the value of x1 can be used to predict h(n). Of course, we can use several features. A second feature x2(n) might be "number of pairs of adjacent tiles that are not adjacent in the goal state." How should x1(n) and x2(n) be combined to predict h(n)? A common approach is to use a linear combination:

h(n) = c1 x1(n) + c2 x2(n) .

The constants c1 and c2 are adjusted to give the best fit to the actual data on solution costs. One expects both c1 and c2 to be positive because misplaced tiles and incorrect adjacent pairs make the problem harder to solve.
Notice that this heuristic does satisfy the condition that h(n) = 0 for goal states, but it is not necessarily admissible or consistent.
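One way to carry out this fit is ordinary least squares with no intercept term, so that the learned h remains 0 when both features are 0. The following is a minimal sketch under our own assumptions (the feature values and optimal costs are presumed to have been collected elsewhere from already-solved training instances; it is not code from the book):

import numpy as np

def fit_linear_heuristic(features, costs):
    # features: (m, 2) array with columns x1(n) and x2(n) for m training states.
    # costs: length-m array of the true optimal solution costs from those states.
    # Least-squares fit of h(n) = c1*x1(n) + c2*x2(n); no intercept, so h = 0
    # whenever both features are 0 (e.g., at a goal state).
    X = np.asarray(features, dtype=float)
    y = np.asarray(costs, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs                     # array [c1, c2]

def h_learned(x1, x2, coeffs):
    # Predicted cost-to-go for a state with feature values x1 and x2.
    return coeffs[0] * x1 + coeffs[1] * x2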
  • 127. 108 Chapter 3. Solving Problems by Searching 3.7 SUMMARY This chapter has introduced methods that an agent can use to select actions in environments that are deterministic, observable, static, and completely known. In such cases, the agent can construct sequences of actions that achieve its goals; this process is called search. • Before an agent can start searching for solutions, a goal must be identified and a well- defined problem must be formulated. • A problem consists of five parts: the initial state, a set of actions, a transition model describing the results of those actions, a goal test function, and a path cost function. The environment of the problem is represented by a state space. A path through the state space from the initial state to a goal state is a solution. • Search algorithms treat states and actions as atomic: they do not consider any internal structure they might possess. • A general TREE-SEARCH algorithm considers all possible paths to find a solution, whereas a GRAPH-SEARCH algorithm avoids consideration of redundant paths. • Search algorithms are judged on the basis of completeness, optimality, time complex- ity, and space complexity. Complexity depends on b, the branching factor in the state space, and d, the depth of the shallowest solution. • Uninformed search methods have access only to the problem definition. The basic algorithms are as follows: – Breadth-first search expands the shallowest nodes first; it is complete, optimal for unit step costs, but has exponential space complexity. – Uniform-cost search expands the node with lowest path cost, g(n), and is optimal for general step costs. – Depth-first search expands the deepest unexpanded node first. It is neither com- plete nor optimal, but has linear space complexity. Depth-limited search adds a depth bound. – Iterative deepening search calls depth-first search with increasing depth limits until a goal is found. It is complete, optimal for unit step costs, has time complexity comparable to breadth-first search, and has linear space complexity. – Bidirectional search can enormously reduce time complexity, but it is not always applicable and may require too much space. • Informed search methods may have access to a heuristic function h(n) that estimates the cost of a solution from n. – The generic best-first search algorithm selects a node for expansion according to an evaluation function. – Greedy best-first search expands nodes with minimal h(n). It is not optimal but is often efficient.
  • 128. Bibliographical and Historical Notes 109 – A∗ search expands nodes with minimal f(n) = g(n) + h(n). A∗ is complete and optimal, provided that h(n) is admissible (for TREE-SEARCH) or consistent (for GRAPH-SEARCH). The space complexity of A∗ is still prohibitive. – RBFS (recursive best-first search) and SMA∗ (simplified memory-bounded A∗) are robust, optimal search algorithms that use limited amounts of memory; given enough time, they can solve problems that A∗ cannot solve because it runs out of memory. • The performance of heuristic search algorithms depends on the quality of the heuristic function. One can sometimes construct good heuristics by relaxing the problem defi- nition, by storing precomputed solution costs for subproblems in a pattern database, or by learning from experience with the problem class. BIBLIOGRAPHICAL AND HISTORICAL NOTES The topic of state-space search originated in more or less its current form in the early years of AI. Newell and Simon’s work on the Logic Theorist (1957) and GPS (1961) led to the estab- lishment of search algorithms as the primary weapons in the armory of 1960s AI researchers and to the establishment of problem solving as the canonical AI task. Work in operations research by Richard Bellman (1957) showed the importance of additive path costs in sim- plifying optimization algorithms. The text on Automated Problem Solving by Nils Nilsson (1971) established the area on a solid theoretical footing. Most of the state-space search problems analyzed in this chapter have a long history in the literature and are less trivial than they might seem. The missionaries and cannibals problem used in Exercise 3.9 was analyzed in detail by Amarel (1968). It had been consid- ered earlier—in AI by Simon and Newell (1961) and in operations research by Bellman and Dreyfus (1962). The 8-puzzle is a smaller cousin of the 15-puzzle, whose history is recounted at length by Slocum and Sonneveld (2006). It was widely believed to have been invented by the fa- mous American game designer Sam Loyd, based on his claims to that effect from 1891 on- ward (Loyd, 1959). Actually it was invented by Noyes Chapman, a postmaster in Canastota, New York, in the mid-1870s. (Chapman was unable to patent his invention, as a generic patent covering sliding blocks with letters, numbers, or pictures was granted to Ernest Kinsey in 1878.) It quickly attracted the attention of the public and of mathematicians (Johnson and Story, 1879; Tait, 1880). The editors of the American Journal of Mathematics stated, “The ‘15’ puzzle for the last few weeks has been prominently before the American public, and may safely be said to have engaged the attention of nine out of ten persons of both sexes and all ages and conditions of the community.” Ratner and Warmuth (1986) showed that the general n × n version of the 15-puzzle belongs to the class of NP-complete problems. The 8-queens problem was first published anonymously in the German chess maga- zine Schach in 1848; it was later attributed to one Max Bezzel. It was republished in 1850 and at that time drew the attention of the eminent mathematician Carl Friedrich Gauss, who
  • 129. 110 Chapter 3. Solving Problems by Searching attempted to enumerate all possible solutions; initially he found only 72, but eventually he found the correct answer of 92, although Nauck published all 92 solutions first, in 1850. Netto (1901) generalized the problem to n queens, and Abramson and Yung (1989) found an O(n) algorithm. Each of the real-world search problems listed in the chapter has been the subject of a good deal of research effort. Methods for selecting optimal airline flights remain proprietary for the most part, but Carl de Marcken (personal communication) has shown that airline ticket pricing and restrictions have become so convoluted that the problem of selecting an optimal flight is formally undecidable. The traveling-salesperson problem is a standard combinato- rial problem in theoretical computer science (Lawler et al., 1992). Karp (1972) proved the TSP to be NP-hard, but effective heuristic approximation methods were developed (Lin and Kernighan, 1973). Arora (1998) devised a fully polynomial approximation scheme for Eu- clidean TSPs. VLSI layout methods are surveyed by Shahookar and Mazumder (1991), and many layout optimization papers appear in VLSI journals. Robotic navigation and assembly problems are discussed in Chapter 25. Uninformed search algorithms for problem solving are a central topic of classical com- puter science (Horowitz and Sahni, 1978) and operations research (Dreyfus, 1969). Breadth- first search was formulated for solving mazes by Moore (1959). The method of dynamic programming (Bellman, 1957; Bellman and Dreyfus, 1962), which systematically records solutions for all subproblems of increasing lengths, can be seen as a form of breadth-first search on graphs. The two-point shortest-path algorithm of Dijkstra (1959) is the origin of uniform-cost search. These works also introduced the idea of explored and frontier sets (closed and open lists). A version of iterative deepening designed to make efficient use of the chess clock was first used by Slate and Atkin (1977) in the CHESS 4.5 game-playing program. Martelli’s algorithm B (1977) includes an iterative deepening aspect and also dominates A∗’s worst-case performance with admissible but inconsistent heuristics. The iterative deepening technique came to the fore in work by Korf (1985a). Bidirectional search, which was introduced by Pohl (1971), can also be effective in some cases. The use of heuristic information in problem solving appears in an early paper by Simon and Newell (1958), but the phrase “heuristic search” and the use of heuristic functions that estimate the distance to the goal came somewhat later (Newell and Ernst, 1965; Lin, 1965). Doran and Michie (1966) conducted extensive experimental studies of heuristic search. Al- though they analyzed path length and “penetrance” (the ratio of path length to the total num- ber of nodes examined so far), they appear to have ignored the information provided by the path cost g(n). The A∗ algorithm, incorporating the current path cost into heuristic search, was developed by Hart, Nilsson, and Raphael (1968), with some later corrections (Hart et al., 1972). Dechter and Pearl (1985) demonstrated the optimal efficiency of A∗. The original A∗ paper introduced the consistency condition on heuristic functions. The monotone condition was introduced by Pohl (1977) as a simpler replacement, but Pearl (1984) showed that the two were equivalent. 
Pohl (1977) pioneered the study of the relationship between the error in heuristic func- tions and the time complexity of A∗. Basic results were obtained for tree search with unit step
  • 130. Bibliographical and Historical Notes 111 costs and a single goal node (Pohl, 1977; Gaschnig, 1979; Huyn et al., 1980; Pearl, 1984) and with multiple goal nodes (Dinh et al., 2007). The “effective branching factor” was proposed by Nilsson (1971) as an empirical measure of the efficiency; it is equivalent to assuming a time cost of O((b∗)d). For tree search applied to a graph, Korf et al. (2001) argue that the time cost is better modeled as O(bd−k), where k depends on the heuristic accuracy; this analysis has elicited some controversy, however. For graph search, Helmert and Röger (2008) noted that several well-known problems contained exponentially many nodes on optimal solution paths, implying exponential time complexity for A∗ even with constant absolute error in h. There are many variations on the A∗ algorithm. Pohl (1973) proposed the use of dynamic weighting, which uses a weighted sum fw(n) = wgg(n) + whh(n) of the current path length and the heuristic function as an evaluation function, rather than the simple sum f(n) = g(n)+ h(n) used in A∗. The weights wg and wh are adjusted dynamically as the search progresses. Pohl’s algorithm can be shown to be ǫ-admissible—that is, guaranteed to find solutions within a factor 1 + ǫ of the optimal solution, where ǫ is a parameter supplied to the algorithm. The same property is exhibited by the A∗ ǫ algorithm (Pearl, 1984), which can select any node from the frontier provided its f-cost is within a factor 1+ ǫ of the lowest-f-cost frontier node. The selection can be done so as to minimize search cost. Bidirectional versions of A∗ have been investigated; a combination of bidirectional A∗ and known landmarks was used to efficiently find driving routes for Microsoft’s online map service (Goldberg et al., 2006). After caching a set of paths between landmarks, the algorithm can find an optimal path between any pair of points in a 24 million point graph of the United States, searching less than 0.1% of the graph. Others approaches to bidirectional search include a breadth-first search backward from the goal up to a fixed depth, followed by a forward IDA∗ search (Dillenburg and Nelson, 1994; Manzini, 1995). A∗ and other state-space search algorithms are closely related to the branch-and-bound techniques that are widely used in operations research (Lawler and Wood, 1966). The relationships between state-space search and branch-and-bound have been investigated in depth (Kumar and Kanal, 1983; Nau et al., 1984; Kumar et al., 1988). Martelli and Monta- nari (1978) demonstrate a connection between dynamic programming (see Chapter 17) and certain types of state-space search. Kumar and Kanal (1988) attempt a “grand unification” of heuristic search, dynamic programming, and branch-and-bound techniques under the name of CDP—the “composite decision process.” Because computers in the late 1950s and early 1960s had at most a few thousand words of main memory, memory-bounded heuristic search was an early research topic. The Graph Traverser (Doran and Michie, 1966), one of the earliest search programs, commits to an operator after searching best-first up to the memory limit. IDA∗ (Korf, 1985a, 1985b) was the first widely used optimal, memory-bounded heuristic search algorithm, and a large number of variants have been developed. An analysis of the efficiency of IDA∗ and of its difficulties with real-valued heuristics appears in Patrick et al. (1992). 
RBFS (Korf, 1993) is actually somewhat more complicated than the algorithm shown in Figure 3.26, which is closer to an independently developed algorithm called iterative ex- pansion (Russell, 1992). RBFS uses a lower bound as well as the upper bound; the two al- ITERATIVE EXPANSION gorithms behave identically with admissible heuristics, but RBFS expands nodes in best-first
  • 131. 112 Chapter 3. Solving Problems by Searching order even with an inadmissible heuristic. The idea of keeping track of the best alternative path appeared earlier in Bratko’s (1986) elegant Prolog implementation of A∗ and in the DTA∗ algorithm (Russell and Wefald, 1991). The latter work also discusses metalevel state spaces and metalevel learning. The MA∗ algorithm appeared in Chakrabarti et al. (1989). SMA∗, or Simplified MA∗, emerged from an attempt to implement MA∗ as a comparison algorithm for IE (Russell, 1992). Kaindl and Khorsand (1994) have applied SMA∗ to produce a bidirectional search algorithm that is substantially faster than previous algorithms. Korf and Zhang (2000) describe a divide- and-conquer approach, and Zhou and Hansen (2002) introduce memory-bounded A∗ graph search and a strategy for switching to breadth-first search to increase memory-efficiency (Zhou and Hansen, 2006). Korf (1995) surveys memory-bounded search techniques. The idea that admissible heuristics can be derived by problem relaxation appears in the seminal paper by Held and Karp (1970), who used the minimum-spanning-tree heuristic to solve the TSP. (See Exercise 3.33.) The automation of the relaxation process was implemented successfully by Priedi- tis (1993), building on earlier work with Mostow (Mostow and Prieditis, 1989). Holte and Hernadvolgyi (2001) describe more recent steps towards automating the process. The use of pattern databases to derive admissible heuristics is due to Gasser (1995) and Culberson and Schaeffer (1996, 1998); disjoint pattern databases are described by Korf and Felner (2002); a similar method using symbolic patterns is due to Edelkamp (2009). Felner et al. (2007) show how to compress pattern databases to save space. The probabilistic interpretation of heuristics was investigated in depth by Pearl (1984) and Hansson and Mayer (1989). By far the most comprehensive source on heuristics and heuristic search algorithms is Pearl’s (1984) Heuristics text. This book provides especially good coverage of the wide variety of offshoots and variations of A∗, including rigorous proofs of their formal properties. Kanal and Kumar (1988) present an anthology of important articles on heuristic search, and Rayward-Smith et al. (1996) cover approaches from Operations Research. Papers about new search algorithms—which, remarkably, continue to be discovered—appear in journals such as Artificial Intelligence and Journal of the ACM. The topic of parallel search algorithms was not covered in the chapter, partly because PARALLEL SEARCH it requires a lengthy discussion of parallel computer architectures. Parallel search became a popular topic in the 1990s in both AI and theoretical computer science (Mahanti and Daniels, 1993; Grama and Kumar, 1995; Crauser et al., 1998) and is making a comeback in the era of new multicore and cluster architectures (Ralphs et al., 2004; Korf and Schultze, 2005). Also of increasing importance are search algorithms for very large graphs that require disk storage (Korf, 2008). EXERCISES 3.1 Explain why problem formulation must follow goal formulation. 3.2 Give a complete problem formulation for each of the following problems. Choose a
  • 132. Exercises 113 formulation that is precise enough to be implemented. a. There are six glass boxes in a row, each with a lock. Each of the first five boxes holds a key unlocking the next box in line; the last box holds a banana. You have the key to the first box, and you want the banana. b. You start with the sequence ABABAECCEC, or in general any sequence made from A, B, C, and E. You can transform this sequence using the following equalities: AC = E, AB = BC, BB = E, and Ex = x for any x. For example, ABBC can be transformed into AEC, and then AC, and then E. Your goal is to produce the sequence E. c. There is an n × n grid of squares, each square initially being either unpainted floor or a bottomless pit. You start standing on an unpainted floor square, and can either paint the square under you or move onto an adjacent unpainted floor square. You want the whole floor painted. d. A container ship is in port, loaded high with containers. There 13 rows of containers, each 13 containers wide and 5 containers tall. You control a crane that can move to any location above the ship, pick up the container under it, and move it onto the dock. You want the ship unloaded. 3.3 You have a 9 × 9 grid of squares, each of which can be colored red or blue. The grid is initially colored all blue, but you can change the color of any square any number of times. Imagining the grid divided into nine 3 × 3 sub-squares, you want each sub-square to be all one color but neighboring sub-squares to be different colors. a. Formulate this problem in the straightforward way. Compute the size of the state space. b. You need color a square only once. Reformulate, and compute the size of the state space. Would breadth-first graph search perform faster on this problem than on the one in (a)? How about iterative deepening tree search? c. Given the goal, we need consider only colorings where each sub-square is uniformly colored. Reformulate the problem and compute the size of the state space. d. How many solutions does this problem have? e. Parts (b) and (c) successively abstracted the original problem (a). Can you give a trans- lation from solutions in problem (c) into solutions in problem (b), and from solutions in problem (b) into solutions for problem (a)? 3.4 Suppose two friends live in different cities on a map, such as the Romania map shown in Figure 3.2. On every turn, we can simultaneously move each friend to a neighboring city on the map. The amount of time needed to move from city i to neighbor j is equal to the road distance d(i, j) between the cities, but on each turn the friend that arrives first must wait until the other one arrives (and calls the first on his/her cell phone) before the next turn can begin. We want the two friends to meet as quickly as possible. a. Write a detailed formulation for this search problem. (You will find it helpful to define some formal notation here.) b. Let D(i, j) be the straight-line distance between cities i and j. Which of the following heuristic functions are admissible? (i) D(i, j); (ii) 2 · D(i, j); (iii) D(i, j)/2.
  • 133. 114 Chapter 3. Solving Problems by Searching S G Figure 3.31 A scene with polygonal obstacles. S and G are the start and goal states. c. Are there completely connected maps for which no solution exists? d. Are there maps in which all solutions require one friend to visit the same city twice? 3.5 Show that the 8-puzzle states are divided into two disjoint sets, such that any state is reachable from any other state in the same set, while no state is reachable from any state in the other set. (Hint: See Berlekamp et al. (1982).) Devise a procedure to decide which set a given state is in, and explain why this is useful for generating random states. 3.6 Consider the n-queens problem using the “efficient” incremental formulation given on page 72. Explain why the state space has at least 3 √ n! states and estimate the largest n for which exhaustive exploration is feasible. (Hint: Derive a lower bound on the branching factor by considering the maximum number of squares that a queen can attack in any column.) 3.7 Consider the problem of finding the shortest path between two points on a plane that has convex polygonal obstacles as shown in Figure 3.31. This is an idealization of the problem that a robot has to solve to navigate in a crowded environment. a. Suppose the state space consists of all positions (x, y) in the plane. How many states are there? How many paths are there to the goal? b. Explain briefly why the shortest path from one polygon vertex to any other in the scene must consist of straight-line segments joining some of the vertices of the polygons. Define a good state space now. How large is this state space? c. Define the necessary functions to implement the search problem, including an ACTIONS function that takes a vertex as input and returns a set of vectors, each of which maps the current vertex to one of the vertices that can be reached in a straight line. (Do not forget the neighbors on the same polygon.) Use the straight-line distance for the heuristic function. d. Apply one or more of the algorithms in this chapter to solve a range of problems in the domain, and comment on their performance.
  • 134. Exercises 115 3.8 On page 68, we said that we would not consider problems with negative path costs. In this exercise, we explore this decision in more depth. a. Suppose that actions can have arbitrarily large negative costs; explain why this possi- bility would force any optimal algorithm to explore the entire state space. b. Does it help if we insist that step costs must be greater than or equal to some negative constant c? Consider both trees and graphs. c. Suppose that a set of actions forms a loop in the state space such that executing the set in some order results in no net change to the state. If all of these actions have negative cost, what does this imply about the optimal behavior for an agent in such an environment? d. One can easily imagine actions with high negative cost, even in domains such as route finding. For example, some stretches of road might have such beautiful scenery as to far outweigh the normal costs in terms of time and fuel. Explain, in precise terms, within the context of state-space search, why humans do not drive around scenic loops indefinitely, and explain how to define the state space and actions for route finding so that artificial agents can also avoid looping. e. Can you think of a real domain in which step costs are such as to cause looping? 3.9 The missionaries and cannibals problem is usually stated as follows. Three mission- aries and three cannibals are on one side of a river, along with a boat that can hold one or two people. Find a way to get everyone to the other side without ever leaving a group of mis- sionaries in one place outnumbered by the cannibals in that place. This problem is famous in AI because it was the subject of the first paper that approached problem formulation from an analytical viewpoint (Amarel, 1968). a. Formulate the problem precisely, making only those distinctions necessary to ensure a valid solution. Draw a diagram of the complete state space. b. Implement and solve the problem optimally using an appropriate search algorithm. Is it a good idea to check for repeated states? c. Why do you think people have a hard time solving this puzzle, given that the state space is so simple? 3.10 Define in your own words the following terms: state, state space, search tree, search node, goal, action, transition model, and branching factor. 3.11 What’s the difference between a world state, a state description, and a search node? Why is this distinction useful? 3.12 An action such as Go(Sibiu) really consists of a long sequence of finer-grained actions: turn on the car, release the brake, accelerate forward, etc. Having composite actions of this kind reduces the number of steps in a solution sequence, thereby reducing the search time. Suppose we take this to the logical extreme, by making super-composite actions out of every possible sequence of Go actions. Then every problem instance is solved by a single super- composite action, such as Go(Sibiu)Go(Rimnicu Vilcea)Go(Pitesti)Go(Bucharest). Explain how search would work in this formulation. Is this a practical approach for speeding up problem solving?
  • 135. 116 Chapter 3. Solving Problems by Searching x 12 x 16 x 2 x 2 Figure 3.32 The track pieces in a wooden railway set; each is labeled with the number of copies in the set. Note that curved pieces and “fork” pieces (“switches” or “points”) can be flipped over so they can curve in either direction. Each curve subtends 45 degrees. 3.13 Does a finite state space always lead to a finite search tree? How about a finite state space that is a tree? Can you be more precise about what types of state spaces always lead to finite search trees? (Adapted from Bender, 1996.) 3.14 Prove that GRAPH-SEARCH satisfies the graph separation property illustrated in Fig- ure 3.9. (Hint: Begin by showing that the property holds at the start, then show that if it holds before an iteration of the algorithm, it holds afterwards.) Describe a search algorithm that violates the property. 3.15 Which of the following are true and which are false? Explain your answers. a. Depth-first search always expands at least as many nodes as A∗ search with an admissi- ble heuristic. b. h(n) = 0 is an admissible heuristic for the 8-puzzle. c. A∗ is of no use in robotics because percepts, states, and actions are continuous. d. Breadth-first search is complete even if zero step costs are allowed. e. Assume that a rook can move on a chessboard any number of squares in a straight line, vertically or horizontally, but cannot jump over other pieces. Manhattan distance is an admissible heuristic for the problem of moving the rook from square A to square B in the smallest number of moves. 3.16 A basic wooden railway set contains the pieces shown in Figure 3.32. The task is to connect these pieces into a railway that has no overlapping tracks and no loose ends where a train could run off onto the floor. a. Suppose that the pieces fit together exactly with no slack. Give a precise formulation of the task as a search problem. b. Identify a suitable uninformed search algorithm for this task and explain your choice. c. Explain why removing any one of the “fork” pieces makes the problem unsolvable. d. Give an upper bound on the total size of the state space defined by your formulation. (Hint: think about the maximum branching factor for the construction process and the maximum depth, ignoring the problem of overlapping pieces and loose ends. Begin by pretending that every piece is unique.) 3.17 Implement two versions of the RESULT(s, a) function for the 8-puzzle: one that copies
  • 136. Exercises 117 and edits the data structure for the parent node s and one that modifies the parent state di- rectly (undoing the modifications as needed). Write versions of iterative deepening depth-first search that use these functions and compare their performance. 3.18 On page 90, we mentioned iterative lengthening search, an iterative analog of uni- form cost search. The idea is to use increasing limits on path cost. If a node is generated whose path cost exceeds the current limit, it is immediately discarded. For each new itera- tion, the limit is set to the lowest path cost of any node discarded in the previous iteration. a. Show that this algorithm is optimal for general path costs. b. Consider a uniform tree with branching factor b, solution depth d, and unit step costs. How many iterations will iterative lengthening require? c. Now consider step costs drawn from the continuous range [ǫ, 1], where 0 < ǫ < 1. How many iterations are required in the worst case? d. Implement the algorithm and apply it to instances of the 8-puzzle and traveling sales- person problems. Compare the algorithm’s performance to that of uniform-cost search, and comment on your results. 3.19 Describe a state space in which iterative deepening search performs much worse than depth-first search (for example, O(n2) vs. O(n)). 3.20 Write a program that will take as input two Web page URLs and find a path of links from one to the other. What is an appropriate search strategy? Is bidirectional search a good idea? Could a search engine be used to implement a predecessor function? 3.21 Consider the vacuum-world problem defined in Figure 2.2. a. Which of the algorithms defined in this chapter would be appropriate for this problem? Should the algorithm use tree search or graph search? b. Apply your chosen algorithm to compute an optimal sequence of actions for a 3 × 3 world whose initial state has dirt in the three top squares and the agent in the center. c. Construct a search agent for the vacuum world, and evaluate its performance in a set of 3 × 3 worlds with probability 0.2 of dirt in each square. Include the search cost as well as path cost in the performance measure, using a reasonable exchange rate. d. Compare your best search agent with a simple randomized reflex agent that sucks if there is dirt and otherwise moves randomly. e. Consider what would happen if the world were enlarged to n × n. How does the per- formance of the search agent and of the reflex agent vary with n? 3.22 Prove each of the following statements, or give a counterexample: a. Breadth-first search is a special case of uniform-cost search. b. Depth-first search is a special case of best-first tree search. c. Uniform-cost search is a special case of A∗ search. 3.23 Compare the performance of A∗ and RBFS on a set of randomly generated problems
  • 137. 118 Chapter 3. Solving Problems by Searching in the 8-puzzle (with Manhattan distance) and TSP (with MST—see Exercise 3.33) domains. Discuss your results. What happens to the performance of RBFS when a small random num- ber is added to the heuristic values in the 8-puzzle domain? 3.24 Trace the operation of A∗ search applied to the problem of getting to Bucharest from Lugoj using the straight-line distance heuristic. That is, show the sequence of nodes that the algorithm will consider and the f, g, and h score for each node. 3.25 Sometimes there is no good evaluation function for a problem but there is a good comparison method: a way to tell whether one node is better than another without assigning numerical values to either. Show that this is enough to do a best-first search. Is there an analog of A∗ for this setting? 3.26 Devise a state space in which A∗ using GRAPH-SEARCH returns a suboptimal solution with an h(n) function that is admissible but inconsistent. 3.27 Accurate heuristics don’t necessarily reduce search time in the worst case. Given any depth d, define a search problem with a goal node at depth d, and write a heuristic function such that |h(n)−h∗(n)| ≤ O(log h∗(n)) but A∗ expands all nodes of depth less than d. 3.28 The heuristic path algorithm (Pohl, 1977) is a best-first search in which the evalu- HEURISTIC PATH ALGORITHM ation function is f(n) = (2 − w)g(n) + wh(n). For what values of w is this complete? For what values is it optimal, assuming that h is admissible? What kind of search does this perform for w = 0, w = 1, and w = 2? 3.29 Consider the unbounded version of the regular 2D grid shown in Figure 3.9. The start state is at the origin, (0,0), and the goal state is at (x, y). a. What is the branching factor b in this state space? b. How many distinct states are there at depth k (for k > 0)? c. What is the maximum number of nodes expanded by breadth-first tree search? d. What is the maximum number of nodes expanded by breadth-first graph search? e. Is h = |u − x| + |v − y| an admissible heuristic for a state at (u, v)? Explain. f. How many nodes are expanded by A∗ graph search using h? g. Does h remain admissible if some links are removed? h. Does h remain admissible if some links are added between nonadjacent states? 3.30 Consider the problem of moving k knights from k starting squares s1, . . . , sk to k goal squares g1, . . . , gk, on an unbounded chessboard, subject to the rule that no two knights can land on the same square at the same time. Each action consists of moving up to k knights simultaneously. We would like to complete the maneuver in the smallest number of actions. a. What is the maximum branching factor in this state space, expressed as a function of k? b. Suppose hi is an admissible heuristic for the problem of moving knight i to goal gi by itself. Which of the following heuristics are admissible for the k-knight problem? Of those, which is the best? (i) min{h1, . . . , hk}.
  • 138. Exercises 119 (ii) max{h1, . . . , hk}. (iii) Pk i = 1 hi. c. Repeat (b) for the case where you are allowed to move only one knight at a time. 3.31 We saw on page 93 that the straight-line distance heuristic leads greedy best-first search astray on the problem of going from Iasi to Fagaras. However, the heuristic is per- fect on the opposite problem: going from Fagaras to Iasi. Are there problems for which the heuristic is misleading in both directions? 3.32 Prove that if a heuristic is consistent, it must be admissible. Construct an admissible heuristic that is not consistent. 3.33 The traveling salesperson problem (TSP) can be solved with the minimum-spanning- tree (MST) heuristic, which estimates the cost of completing a tour, given that a partial tour has already been constructed. The MST cost of a set of cities is the smallest sum of the link costs of any tree that connects all the cities. a. Show how this heuristic can be derived from a relaxed version of the TSP. b. Show that the MST heuristic dominates straight-line distance. c. Write a problem generator for instances of the TSP where cities are represented by random points in the unit square. d. Find an efficient algorithm in the literature for constructing the MST, and use it with A∗ graph search to solve instances of the TSP. 3.34 On page 105, we defined the relaxation of the 8-puzzle in which a tile can move from square A to square B if B is blank. The exact solution of this problem defines Gaschnig’s heuristic (Gaschnig, 1979). Explain why Gaschnig’s heuristic is at least as accurate as h1 (misplaced tiles), and show cases where it is more accurate than both h1 and h2 (Manhattan distance). Explain how to calculate Gaschnig’s heuristic efficiently. 3.35 We gave two simple heuristics for the 8-puzzle: Manhattan distance and misplaced tiles. Several heuristics in the literature purport to improve on this—see, for example, Nils- son (1971), Mostow and Prieditis (1989), and Hansson et al. (1992). Test these claims by implementing the heuristics and comparing the performance of the resulting algorithms.
4 BEYOND CLASSICAL SEARCH

In which we relax the simplifying assumptions of the previous chapter, thereby getting closer to the real world.

Chapter 3 addressed a single category of problems: observable, deterministic, known environments where the solution is a sequence of actions. In this chapter, we look at what happens when these assumptions are relaxed. We begin with a fairly simple case: Sections 4.1 and 4.2 cover algorithms that perform purely local search in the state space, evaluating and modifying one or more current states rather than systematically exploring paths from an initial state. These algorithms are suitable for problems in which all that matters is the solution state, not the path cost to reach it. The family of local search algorithms includes methods inspired by statistical physics (simulated annealing) and evolutionary biology (genetic algorithms). Then, in Sections 4.3–4.4, we examine what happens when we relax the assumptions of determinism and observability. The key idea is that if an agent cannot predict exactly what percept it will receive, then it will need to consider what to do under each contingency that its percepts may reveal. With partial observability, the agent will also need to keep track of the states it might be in. Finally, Section 4.5 investigates online search, in which the agent is faced with a state space that is initially unknown and must be explored.

4.1 LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS

The search algorithms that we have seen so far are designed to explore search spaces systematically. This systematicity is achieved by keeping one or more paths in memory and by recording which alternatives have been explored at each point along the path. When a goal is found, the path to that goal also constitutes a solution to the problem. In many problems, however, the path to the goal is irrelevant. For example, in the 8-queens problem (see Section 3.2), what matters is the final configuration of queens, not the order in which they are added. The same general property holds for many important applications such as integrated-circuit design, factory-floor layout, job-shop scheduling, automatic programming, telecommunications network optimization, vehicle routing, and portfolio management.
If the path to the goal does not matter, we might consider a different class of algorithms, ones that do not worry about paths at all. Local search algorithms operate using a single current node (rather than multiple paths) and generally move only to neighbors of that node. Typically, the paths followed by the search are not retained. Although local search algorithms are not systematic, they have two key advantages: (1) they use very little memory—usually a constant amount; and (2) they can often find reasonable solutions in large or infinite (continuous) state spaces for which systematic algorithms are unsuitable.

In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function. Many optimization problems do not fit the "standard" search model introduced in Chapter 3. For example, nature provides an objective function—reproductive fitness—that Darwinian evolution could be seen as attempting to optimize, but there is no "goal test" and no "path cost" for this problem.

To understand local search, we find it useful to consider the state-space landscape (as in Figure 4.1). A landscape has both "location" (defined by the state) and "elevation" (defined by the value of the heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find the lowest valley—a global minimum; if elevation corresponds to an objective function, then the aim is to find the highest peak—a global maximum. (You can convert from one to the other just by inserting a minus sign.) Local search algorithms explore this landscape. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum.

Figure 4.1 A one-dimensional state-space landscape in which elevation corresponds to the objective function. The aim is to find the global maximum. Hill-climbing search modifies the current state to try to improve it, as shown by the arrow. The various topographic features are defined in the text.
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current ← MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.VALUE ≤ current.VALUE then return current.STATE
    current ← neighbor

Figure 4.2 The hill-climbing search algorithm, which is the most basic local search technique. At each step the current node is replaced by the best neighbor; in this version, that means the neighbor with the highest VALUE, but if a heuristic cost estimate h is used, we would find the neighbor with the lowest h.

4.1.1 Hill-climbing search

The hill-climbing search algorithm (steepest-ascent version) is shown in Figure 4.2. It is simply a loop that continually moves in the direction of increasing value—that is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher value. The algorithm does not maintain a search tree, so the data structure for the current node need only record the state and the value of the objective function. Hill climbing does not look ahead beyond the immediate neighbors of the current state. This resembles trying to find the top of Mount Everest in a thick fog while suffering from amnesia.

To illustrate hill climbing, we will use the 8-queens problem introduced in Section 3.2. Local search algorithms typically use a complete-state formulation, where each state has 8 queens on the board, one per column. The successors of a state are all possible states generated by moving a single queen to another square in the same column (so each state has 8 × 7 = 56 successors). The heuristic cost function h is the number of pairs of queens that are attacking each other, either directly or indirectly. The global minimum of this function is zero, which occurs only at perfect solutions. Figure 4.3(a) shows a state with h = 17. The figure also shows the values of all its successors, with the best successors having h = 12. Hill-climbing algorithms typically choose randomly among the set of best successors if there is more than one.

Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next. Although greed is considered one of the seven deadly sins, it turns out that greedy algorithms often perform quite well. Hill climbing often makes rapid progress toward a solution because it is usually quite easy to improve a bad state. For example, from the state in Figure 4.3(a), it takes just five steps to reach the state in Figure 4.3(b), which has h = 1 and is very nearly a solution. Unfortunately, hill climbing often gets stuck for the following reasons:

• Local maxima: a local maximum is a peak that is higher than each of its neighboring states but lower than the global maximum. Hill-climbing algorithms that reach the vicinity of a local maximum will be drawn upward toward the peak but will then be stuck with nowhere else to go. Figure 4.1 illustrates the problem schematically.
Figure 4.3 (a) An 8-queens state with heuristic cost estimate h = 17, showing the value of h for each possible successor obtained by moving a queen within its column. The best moves are marked. (b) A local minimum in the 8-queens state space; the state has h = 1 but every successor has a higher cost.

More concretely, the state in Figure 4.3(b) is a local maximum (i.e., a local minimum for the cost h); every move of a single queen makes the situation worse.

• Ridges: a ridge is shown in Figure 4.4. Ridges result in a sequence of local maxima that is very difficult for greedy algorithms to navigate.

• Plateaux: a plateau is a flat area of the state-space landscape. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which progress is possible. (See Figure 4.1.) A hill-climbing search might get lost on the plateau.

In each case, the algorithm reaches a point at which no progress is being made. Starting from a randomly generated 8-queens state, steepest-ascent hill climbing gets stuck 86% of the time, solving only 14% of problem instances. It works quickly, taking just 4 steps on average when it succeeds and 3 when it gets stuck—not bad for a state space with 8^8 ≈ 17 million states.

The algorithm in Figure 4.2 halts if it reaches a plateau where the best successor has the same value as the current state. Might it not be a good idea to keep going, to allow a sideways move in the hope that the plateau is really a shoulder, as shown in Figure 4.1? The answer is usually yes, but we must take care. If we always allow sideways moves when there are no uphill moves, an infinite loop will occur whenever the algorithm reaches a flat local maximum that is not a shoulder. One common solution is to put a limit on the number of consecutive sideways moves allowed. For example, we could allow up to, say, 100 consecutive sideways moves in the 8-queens problem. This raises the percentage of problem instances solved by hill climbing from 14% to 94%. Success comes at a cost: the algorithm averages roughly 21 steps for each successful instance and 64 for each failure.
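To make the complete-state formulation concrete, here is a minimal Python sketch (an illustration under our own representation choices, not the book's code): a state is a tuple giving the row of the queen in each column, h counts attacking pairs, and the loop descends on h with random tie-breaking and an optional sideways-move limit, mirroring the variant discussed above.

import random

def conflicts(state):
    # h: number of pairs of queens that attack each other (same row or same
    # diagonal).  state[c] is the row of the queen in column c.
    n = len(state)
    return sum(1 for a in range(n) for b in range(a + 1, n)
               if state[a] == state[b] or abs(state[a] - state[b]) == b - a)

def best_successors(state):
    # All lowest-h neighbors obtained by moving one queen within its column.
    n = len(state)
    succs = []
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                s = list(state)
                s[col] = row
                succs.append(tuple(s))
    best_h = min(conflicts(s) for s in succs)
    return [s for s in succs if conflicts(s) == best_h], best_h

def hill_climbing(state, max_sideways=100):
    # Steepest descent on the cost h (equivalently, steepest ascent on -h),
    # breaking ties randomly and allowing up to max_sideways consecutive
    # sideways moves, as discussed in the text.
    h = conflicts(state)
    sideways = 0
    while True:
        best, best_h = best_successors(state)
        if best_h > h or (best_h == h and sideways >= max_sideways):
            return state, h            # stuck: local minimum or plateau limit reached
        sideways = sideways + 1 if best_h == h else 0
        state, h = random.choice(best), best_h

# One run from a random complete state; a cost of 0 means all eight queens are safe.
start = tuple(random.randrange(8) for _ in range(8))
print(hill_climbing(start))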
  • 143. #HAPTER EYOND #LASSICAL 3EARCH )LJXUH )LLUSTRATION OF WHY RIDGES CAUSE DIFlCULTIES FOR HILL CLIMBING 4HE GRID OF STATES DARK CIRCLES IS SUPERIMPOSED ON A RIDGE RISING FROM LEFT TO RIGHT CREATING A SEQUENCE OF LOCAL MAXIMA THAT ARE NOT DIRECTLY CONNECTED TO EACH OTHER ROM EACH LOCAL MAXIMUM ALL THE AVAILABLE ACTIONS POINT DOWNHILL -ANY VARIANTS OF HILL CLIMBING HAVE BEEN INVENTED 6WRFKDVWLF KLOO FOLPELQJ CHOOSES AT STOCHASTIC HILL CLIMBING RANDOM FROM AMONG THE UPHILL MOVES THE PROBABILITY OF SELECTION CAN VARY WITH THE STEEPNESS OF THE UPHILL MOVE 4HIS USUALLY CONVERGES MORE SLOWLY THAN STEEPEST ASCENT BUT IN SOME STATE LANDSCAPES IT lNDS BETTER SOLUTIONS )LUVWFKRLFH KLOO FOLPELQJ IMPLEMENTS STOCHASTIC FIRST-CHOICE HILL CLIMBING HILL CLIMBING BY GENERATING SUCCESSORS RANDOMLY UNTIL ONE IS GENERATED THAT IS BETTER THAN THE CURRENT STATE 4HIS IS A GOOD STRATEGY WHEN A STATE HAS MANY EG THOUSANDS OF SUCCESSORS 4HE HILL CLIMBING ALGORITHMS DESCRIBED SO FAR ARE INCOMPLETEˆTHEY OFTEN FAIL TO lND A GOAL WHEN ONE EXISTS BECAUSE THEY CAN GET STUCK ON LOCAL MAXIMA 5DQGRPUHVWDUW KLOO FOLPELQJ ADOPTS THE WELL KNOWN ADAGE h)F AT lRST YOU DONT SUCCEED TRY TRY AGAINv )T CON RANDOM-RESTART HILL CLIMBING DUCTS A SERIES OF HILL CLIMBING SEARCHES FROM RANDOMLY GENERATED INITIAL STATES UNTIL A GOAL IS FOUND )T IS TRIVIALLY COMPLETE WITH PROBABILITY APPROACHING BECAUSE IT WILL EVENTUALLY GENERATE A GOAL STATE AS THE INITIAL STATE )F EACH HILL CLIMBING SEARCH HAS A PROBABILITY p OF SUCCESS THEN THE EXPECTED NUMBER OF RESTARTS REQUIRED IS 1/p OR QUEENS INSTANCES WITH NO SIDEWAYS MOVES ALLOWED p ≈ 0.14 SO WE NEED ROUGHLY ITERATIONS TO lND A GOAL FAIL URES AND SUCCESS 4HE EXPECTED NUMBER OF STEPS IS THE COST OF ONE SUCCESSFUL ITERATION PLUS (1−p)/p TIMES THE COST OF FAILURE OR ROUGHLY STEPS IN ALL 7HEN WE ALLOW SIDEWAYS MOVES 1/0.94 ≈ 1.06 ITERATIONS ARE NEEDED ON AVERAGE AND (1 × 21)+ (0.06/0.94) × 64 ≈ 25 STEPS OR QUEENS THEN RANDOM RESTART HILL CLIMBING IS VERY EFFECTIVE INDEED %VEN FOR THREE MIL LION QUEENS THE APPROACH CAN lND SOLUTIONS IN UNDER A MINUTE 1 'ENERATING A UDQGRP STATE FROM AN IMPLICITLY SPECIlED STATE SPACE CAN BE A HARD PROBLEM IN ITSELF 2 ,UBY HW DO PROVE THAT IT IS BEST IN SOME CASES TO RESTART A RANDOMIZED SEARCH ALGORITHM AFTER A PARTICULAR lXED AMOUNT OF TIME AND THAT THIS CAN BE PXFK MORE EFlCIENT THAN LETTING EACH SEARCH CONTINUE INDElNITELY $ISALLOWING OR LIMITING THE NUMBER OF SIDEWAYS MOVES IS AN EXAMPLE OF THIS IDEA
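Random-restart hill climbing can be layered on top of the previous sketch in a few lines. This assumes the hill_climbing and attacking_pairs functions defined above and is only an illustration; with a per-restart success probability p, roughly 1/p restarts are needed on average.

```python
import random

def random_restart_hill_climbing(n=8, max_sideways=0):
    """Restart hill climbing from random states until a solution (h = 0) is found."""
    restarts = 0
    while True:
        start = tuple(random.randrange(n) for _ in range(n))
        result = hill_climbing(start, max_sideways)
        if attacking_pairs(result) == 0:
            return result, restarts
        restarts += 1
```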
  • 144. 3ECTION ,OCAL 3EARCH !LGORITHMS AND /PTIMIZATION 0ROBLEMS 4HE SUCCESS OF HILL CLIMBING DEPENDS VERY MUCH ON THE SHAPE OF THE STATE SPACE LAND SCAPE IF THERE ARE FEW LOCAL MAXIMA AND PLATEAUX RANDOM RESTART HILL CLIMBING WILL lND A GOOD SOLUTION VERY QUICKLY /N THE OTHER HAND MANY REAL PROBLEMS HAVE A LANDSCAPE THAT LOOKS MORE LIKE A WIDELY SCATTERED FAMILY OF BALDING PORCUPINES ON A mAT mOOR WITH MINIATURE PORCUPINES LIVING ON THE TIP OF EACH PORCUPINE NEEDLE DG LQ¿QLWXP .0 HARD PROBLEMS TYPI CALLY HAVE AN EXPONENTIAL NUMBER OF LOCAL MAXIMA TO GET STUCK ON $ESPITE THIS A REASONABLY GOOD LOCAL MAXIMUM CAN OFTEN BE FOUND AFTER A SMALL NUMBER OF RESTARTS 6LPXODWHG DQQHDOLQJ ! HILL CLIMBING ALGORITHM THAT QHYHU MAKES hDOWNHILLv MOVES TOWARD STATES WITH LOWER VALUE OR HIGHER COST IS GUARANTEED TO BE INCOMPLETE BECAUSE IT CAN GET STUCK ON A LOCAL MAXI MUM )N CONTRAST A PURELY RANDOM WALKˆTHAT IS MOVING TO A SUCCESSOR CHOSEN UNIFORMLY AT RANDOM FROM THE SET OF SUCCESSORSˆIS COMPLETE BUT EXTREMELY INEFlCIENT 4HEREFORE IT SEEMS REASONABLE TO TRY TO COMBINE HILL CLIMBING WITH A RANDOM WALK IN SOME WAY THAT YIELDS BOTH EFlCIENCY AND COMPLETENESS 6LPXODWHG DQQHDOLQJ IS SUCH AN ALGORITHM )N METALLURGY SIMULATED ANNEALING DQQHDOLQJ IS THE PROCESS USED TO TEMPER OR HARDEN METALS AND GLASS BY HEATING THEM TO A HIGH TEMPERATURE AND THEN GRADUALLY COOLING THEM THUS ALLOWING THE MATERIAL TO REACH A LOW ENERGY CRYSTALLINE STATE 4O EXPLAIN SIMULATED ANNEALING WE SWITCH OUR POINT OF VIEW FROM HILL CLIMBING TO JUDGLHQW GHVFHQW IE MINIMIZING COST AND IMAGINE THE TASK OF GETTING A GRADIENT DESCENT PING PONG BALL INTO THE DEEPEST CREVICE IN A BUMPY SURFACE )F WE JUST LET THE BALL ROLL IT WILL COME TO REST AT A LOCAL MINIMUM )F WE SHAKE THE SURFACE WE CAN BOUNCE THE BALL OUT OF THE LOCAL MINIMUM 4HE TRICK IS TO SHAKE JUST HARD ENOUGH TO BOUNCE THE BALL OUT OF LOCAL MIN IMA BUT NOT HARD ENOUGH TO DISLODGE IT FROM THE GLOBAL MINIMUM 4HE SIMULATED ANNEALING SOLUTION IS TO START BY SHAKING HARD IE AT A HIGH TEMPERATURE AND THEN GRADUALLY REDUCE THE INTENSITY OF THE SHAKING IE LOWER THE TEMPERATURE 4HE INNERMOST LOOP OF THE SIMULATED ANNEALING ALGORITHM IGURE IS QUITE SIMILAR TO HILL CLIMBING )NSTEAD OF PICKING THE EHVW MOVE HOWEVER IT PICKS A UDQGRP MOVE )F THE MOVE IMPROVES THE SITUATION IT IS ALWAYS ACCEPTED /THERWISE THE ALGORITHM ACCEPTS THE MOVE WITH SOME PROBABILITY LESS THAN 4HE PROBABILITY DECREASES EXPONENTIALLY WITH THE hBADNESSv OF THE MOVEˆTHE AMOUNT ΔE BY WHICH THE EVALUATION IS WORSENED 4HE PROBABILITY ALSO DE CREASES AS THE hTEMPERATUREv T GOES DOWN hBADv MOVES ARE MORE LIKELY TO BE ALLOWED AT THE START WHEN T IS HIGH AND THEY BECOME MORE UNLIKELY AS T DECREASES )F THE schedule LOWERS T SLOWLY ENOUGH THE ALGORITHM WILL lND A GLOBAL OPTIMUM WITH PROBABILITY APPROACHING 3IMULATED ANNEALING WAS lRST USED EXTENSIVELY TO SOLVE 6,3) LAYOUT PROBLEMS IN THE EARLY S )T HAS BEEN APPLIED WIDELY TO FACTORY SCHEDULING AND OTHER LARGE SCALE OPTIMIZA TION TASKS )N %XERCISE YOU ARE ASKED TO COMPARE ITS PERFORMANCE TO THAT OF RANDOM RESTART HILL CLIMBING ON THE QUEENS PUZZLE /RFDO EHDP VHDUFK +EEPING JUST ONE NODE IN MEMORY MIGHT SEEM TO BE AN EXTREME REACTION TO THE PROBLEM OF MEMORY LIMITATIONS 4HE ORFDO EHDP VHDUFK ALGORITHM KEEPS TRACK OF k STATES RATHER THAN LOCAL BEAM SEARCH 3 ,OCAL BEAM SEARCH IS AN ADAPTATION OF EHDP VHDUFK WHICH IS A PATH BASED ALGORITHM
  • 145. #HAPTER EYOND #LASSICAL 3EARCH IXQFWLRQ 3)-5,!4%$ !..%!,).'problem schedule UHWXUQV A SOLUTION STATE LQSXWV problem A PROBLEM schedule A MAPPING FROM TIME TO hTEMPERATUREv current ← -!+% ./$%problem).)4)!, 34!4% IRU t WR ∞ GR T ← schedulet LI T WKHQ UHWXUQ current next ← A RANDOMLY SELECTED SUCCESSOR OF current ΔE ← next6!,5% n current6!,5% LI ΔE WKHQ current ← next HOVH current ← next ONLY WITH PROBABILITY eΔE/T )LJXUH 4HE SIMULATED ANNEALING ALGORITHM A VERSION OF STOCHASTIC HILL CLIMBING WHERE SOME DOWNHILL MOVES ARE ALLOWED $OWNHILL MOVES ARE ACCEPTED READILY EARLY IN THE ANNEAL ING SCHEDULE AND THEN LESS OFTEN AS TIME GOES ON 4HE schedule INPUT DETERMINES THE VALUE OF THE TEMPERATURE T AS A FUNCTION OF TIME JUST ONE )T BEGINS WITH k RANDOMLY GENERATED STATES !T EACH STEP ALL THE SUCCESSORS OF ALL k STATES ARE GENERATED )F ANY ONE IS A GOAL THE ALGORITHM HALTS /THERWISE IT SELECTS THE k BEST SUCCESSORS FROM THE COMPLETE LIST AND REPEATS !T lRST SIGHT A LOCAL BEAM SEARCH WITH k STATES MIGHT SEEM TO BE NOTHING MORE THAN RUNNING k RANDOM RESTARTS IN PARALLEL INSTEAD OF IN SEQUENCE )N FACT THE TWO ALGORITHMS ARE QUITE DIFFERENT )N A RANDOM RESTART SEARCH EACH SEARCH PROCESS RUNS INDEPENDENTLY OF THE OTHERS ,Q D ORFDO EHDP VHDUFK XVHIXO LQIRUPDWLRQ LV SDVVHG DPRQJ WKH SDUDOOHO VHDUFK WKUHDGV )N EFFECT THE STATES THAT GENERATE THE BEST SUCCESSORS SAY TO THE OTHERS h#OME OVER HERE THE GRASS IS GREENERv 4HE ALGORITHM QUICKLY ABANDONS UNFRUITFUL SEARCHES AND MOVES ITS RESOURCES TO WHERE THE MOST PROGRESS IS BEING MADE )N ITS SIMPLEST FORM LOCAL BEAM SEARCH CAN SUFFER FROM A LACK OF DIVERSITY AMONG THE k STATESˆTHEY CAN QUICKLY BECOME CONCENTRATED IN A SMALL REGION OF THE STATE SPACE MAKING THE SEARCH LITTLE MORE THAN AN EXPENSIVE VERSION OF HILL CLIMBING ! VARIANT CALLED VWRFKDVWLF EHDP VHDUFK ANALOGOUS TO STOCHASTIC HILL CLIMBING HELPS ALLEVIATE THIS PROBLEM )NSTEAD STOCHASTIC BEAM SEARCH OF CHOOSING THE BEST k FROM THE THE POOL OF CANDIDATE SUCCESSORS STOCHASTIC BEAM SEARCH CHOOSES k SUCCESSORS AT RANDOM WITH THE PROBABILITY OF CHOOSING A GIVEN SUCCESSOR BEING AN INCREASING FUNCTION OF ITS VALUE 3TOCHASTIC BEAM SEARCH BEARS SOME RESEMBLANCE TO THE PROCESS OF NATURAL SELECTION WHEREBY THE hSUCCESSORSv OFFSPRING OF A hSTATEv ORGANISM POPULATE THE NEXT GENERATION ACCORDING TO ITS hVALUEv lTNESS *HQHWLF DOJRULWKPV ! JHQHWLF DOJRULWKP OR *$ IS A VARIANT OF STOCHASTIC BEAM SEARCH IN WHICH SUCCESSOR STATES GENETIC ALGORITHM ARE GENERATED BY COMBINING WZR PARENT STATES RATHER THAN BY MODIFYING A SINGLE STATE 4HE ANALOGY TO NATURAL SELECTION IS THE SAME AS IN STOCHASTIC BEAM SEARCH EXCEPT THAT NOW WE ARE DEALING WITH SEXUAL RATHER THAN ASEXUAL REPRODUCTION
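A direct Python rendering of the SIMULATED-ANNEALING pseudocode might look as follows. The problem is supplied as an initial state, a successor generator, and a value function to maximize; the exponential-decay cooling schedule shown here is a hypothetical choice for illustration, not one prescribed by the text.

```python
import math
import random

def simulated_annealing(initial, successors, value, schedule):
    """Simulated annealing: 'bad' moves (delta_e < 0) are accepted with
    probability exp(delta_e / T), which shrinks as the temperature T falls."""
    current = initial
    t = 0
    while True:
        t += 1
        T = schedule(t)
        if T == 0:
            return current
        nxt = random.choice(successors(current))
        delta_e = value(nxt) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt

def make_schedule(T0=1.0, decay=0.005, max_t=10_000):
    """A hypothetical cooling schedule: exponential decay, cut to zero after max_t."""
    return lambda t: T0 * math.exp(-decay * t) if t < max_t else 0

# Example use on 8-queens, reusing neighbors() and attacking_pairs() from above:
# solution = simulated_annealing(start, neighbors,
#                                lambda s: -attacking_pairs(s), make_schedule())
```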
  • 146. 3ECTION ,OCAL 3EARCH !LGORITHMS AND /PTIMIZATION 0ROBLEMS A )NITIAL 0OPULATION B ITNESS UNCTION C 3ELECTION D #ROSSOVER E -UTATION 24 23 20 11 29% 31% 26% 14% 32752411 24748552 32752411 24415124 32748552 24752411 32752124 24415411 32252124 24752411 32748152 24415417 24748552 32752411 24415124 32543213 )LJXUH 4HE GENETIC ALGORITHM ILLUSTRATED FOR DIGIT STRINGS REPRESENTING QUEENS STATES 4HE INITIAL POPULATION IN A IS RANKED BY THE lTNESS FUNCTION IN B RESULTING IN PAIRS FOR MATING IN C 4HEY PRODUCE OFFSPRING IN D WHICH ARE SUBJECT TO MUTATION IN E )LJXUH 4HE QUEENS STATES CORRESPONDING TO THE lRST TWO PARENTS IN IGURE C AND THE lRST OFFSPRING IN IGURE D 4HE SHADED COLUMNS ARE LOST IN THE CROSSOVER STEP AND THE UNSHADED COLUMNS ARE RETAINED ,IKE BEAM SEARCHES '!S BEGIN WITH A SET OF k RANDOMLY GENERATED STATES CALLED THE SRSXODWLRQ %ACH STATE OR LQGLYLGXDO IS REPRESENTED AS A STRING OVER A lNITE ALPHABETˆMOST POPULATION INDIVIDUAL COMMONLY A STRING OF S AND S OR EXAMPLE AN QUEENS STATE MUST SPECIFY THE POSITIONS OF QUEENS EACH IN A COLUMN OF SQUARES AND SO REQUIRES 8 × log2 8 = 24 BITS !LTERNATIVELY THE STATE COULD BE REPRESENTED AS DIGITS EACH IN THE RANGE FROM TO 7E DEMONSTRATE LATER THAT THE TWO ENCODINGS BEHAVE DIFFERENTLY IGURE A SHOWS A POPULATION OF FOUR DIGIT STRINGS REPRESENTING QUEENS STATES 4HE PRODUCTION OF THE NEXT GENERATION OF STATES IS SHOWN IN IGURE B nE )N B EACH STATE IS RATED BY THE OBJECTIVE FUNCTION OR IN '! TERMINOLOGY THE ¿WQHVV IXQFWLRQ ! FITNESS FUNCTION lTNESS FUNCTION SHOULD RETURN HIGHER VALUES FOR BETTER STATES SO FOR THE QUEENS PROBLEM WE USE THE NUMBER OF QRQDWWDFNLQJ PAIRS OF QUEENS WHICH HAS A VALUE OF FOR A SOLUTION 4HE VALUES OF THE FOUR STATES ARE AND )N THIS PARTICULAR VARIANT OF THE GENETIC ALGORITHM THE PROBABILITY OF BEING CHOSEN FOR REPRODUCING IS DIRECTLY PROPORTIONAL TO THE lTNESS SCORE AND THE PERCENTAGES ARE SHOWN NEXT TO THE RAW SCORES )N C TWO PAIRS ARE SELECTED AT RANDOM FOR REPRODUCTION IN ACCORDANCE WITH THE PROB
  • 147. #HAPTER EYOND #LASSICAL 3EARCH ABILITIES IN B .OTICE THAT ONE INDIVIDUAL IS SELECTED TWICE AND ONE NOT AT ALL OR EACH PAIR TO BE MATED A FURVVRYHU POINT IS CHOSEN RANDOMLY FROM THE POSITIONS IN THE STRING )N CROSSOVER IGURE THE CROSSOVER POINTS ARE AFTER THE THIRD DIGIT IN THE lRST PAIR AND AFTER THE lFTH DIGIT IN THE SECOND PAIR )N D THE OFFSPRING THEMSELVES ARE CREATED BY CROSSING OVER THE PARENT STRINGS AT THE CROSSOVER POINT OR EXAMPLE THE lRST CHILD OF THE lRST PAIR GETS THE lRST THREE DIGITS FROM THE lRST PARENT AND THE REMAINING DIGITS FROM THE SECOND PARENT WHEREAS THE SECOND CHILD GETS THE lRST THREE DIGITS FROM THE SECOND PARENT AND THE REST FROM THE lRST PARENT 4HE QUEENS STATES INVOLVED IN THIS REPRODUCTION STEP ARE SHOWN IN IGURE 4HE EXAMPLE SHOWS THAT WHEN TWO PARENT STATES ARE QUITE DIFFERENT THE CROSSOVER OPERATION CAN PRODUCE A STATE THAT IS A LONG WAY FROM EITHER PARENT STATE )T IS OFTEN THE CASE THAT THE POPULATION IS QUITE DIVERSE EARLY ON IN THE PROCESS SO CROSSOVER LIKE SIMULATED ANNEALING FREQUENTLY TAKES LARGE STEPS IN THE STATE SPACE EARLY IN THE SEARCH PROCESS AND SMALLER STEPS LATER ON WHEN MOST INDIVIDUALS ARE QUITE SIMILAR INALLY IN E EACH LOCATION IS SUBJECT TO RANDOM PXWDWLRQ WITH A SMALL INDEPENDENT MUTATION PROBABILITY /NE DIGIT WAS MUTATED IN THE lRST THIRD AND FOURTH OFFSPRING )N THE QUEENS PROBLEM THIS CORRESPONDS TO CHOOSING A QUEEN AT RANDOM AND MOVING IT TO A RANDOM SQUARE IN ITS COLUMN IGURE DESCRIBES AN ALGORITHM THAT IMPLEMENTS ALL THESE STEPS ,IKE STOCHASTIC BEAM SEARCH GENETIC ALGORITHMS COMBINE AN UPHILL TENDENCY WITH RAN DOM EXPLORATION AND EXCHANGE OF INFORMATION AMONG PARALLEL SEARCH THREADS 4HE PRIMARY ADVANTAGE IF ANY OF GENETIC ALGORITHMS COMES FROM THE CROSSOVER OPERATION 9ET IT CAN BE SHOWN MATHEMATICALLY THAT IF THE POSITIONS OF THE GENETIC CODE ARE PERMUTED INITIALLY IN A RANDOM ORDER CROSSOVER CONVEYS NO ADVANTAGE )NTUITIVELY THE ADVANTAGE COMES FROM THE ABILITY OF CROSSOVER TO COMBINE LARGE BLOCKS OF LETTERS THAT HAVE EVOLVED INDEPENDENTLY TO PER FORM USEFUL FUNCTIONS THUS RAISING THE LEVEL OF GRANULARITY AT WHICH THE SEARCH OPERATES OR EXAMPLE IT COULD BE THAT PUTTING THE lRST THREE QUEENS IN POSITIONS AND WHERE THEY DO NOT ATTACK EACH OTHER CONSTITUTES A USEFUL BLOCK THAT CAN BE COMBINED WITH OTHER BLOCKS TO CONSTRUCT A SOLUTION 4HE THEORY OF GENETIC ALGORITHMS EXPLAINS HOW THIS WORKS USING THE IDEA OF A VFKHPD SCHEMA WHICH IS A SUBSTRING IN WHICH SOME OF THE POSITIONS CAN BE LEFT UNSPECIlED OR EXAMPLE THE SCHEMA DESCRIBES ALL QUEENS STATES IN WHICH THE lRST THREE QUEENS ARE IN POSITIONS AND RESPECTIVELY 3TRINGS THAT MATCH THE SCHEMA SUCH AS ARE CALLED LQVWDQFHV OF THE SCHEMA )T CAN BE SHOWN THAT IF THE AVERAGE lTNESS OF THE INSTANCES OF INSTANCE A SCHEMA IS ABOVE THE MEAN THEN THE NUMBER OF INSTANCES OF THE SCHEMA WITHIN THE POPULATION WILL GROW OVER TIME #LEARLY THIS EFFECT IS UNLIKELY TO BE SIGNIlCANT IF ADJACENT BITS ARE TOTALLY UNRELATED TO EACH OTHER BECAUSE THEN THERE WILL BE FEW CONTIGUOUS BLOCKS THAT PROVIDE A CONSISTENT BENElT 'ENETIC ALGORITHMS WORK BEST WHEN SCHEMATA CORRESPOND TO MEANINGFUL COMPONENTS OF A SOLUTION OR EXAMPLE IF THE STRING IS A REPRESENTATION OF AN ANTENNA THEN THE SCHEMATA MAY REPRESENT COMPONENTS OF THE ANTENNA SUCH AS REmECTORS AND DEmECTORS ! 
good
4 There are many variants of this selection rule. The method of culling, in which all individuals below a given threshold are discarded, can be shown to converge faster than the random version (Baum et al.).
5 It is here that the encoding matters. If a bit encoding is used instead of digits, then the crossover point has a chance of being in the middle of a digit, which results in an essentially arbitrary mutation of that digit.
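Putting the fitness, selection, crossover, and mutation steps illustrated above together with the GENETIC-ALGORITHM pseudocode that follows, a minimal Python sketch for the 8-queens encoding could look like this. States are tuples of row indices 0-7 (rather than the digits 1-8 used in the figures), and the function names, mutation rate, and population size are illustrative assumptions.

```python
import random

def ga_fitness(state):
    """Number of non-attacking pairs of queens (28 for a solved 8-queens state)."""
    n = len(state)
    attacking = sum(1 for i in range(n) for j in range(i + 1, n)
                    if state[i] == state[j] or abs(state[i] - state[j]) == j - i)
    return n * (n - 1) // 2 - attacking

def select(population):
    """Fitness-proportionate selection, as in this variant of the algorithm."""
    return random.choices(population, weights=[ga_fitness(s) for s in population])[0]

def reproduce(x, y):
    """Single-point crossover: the child takes x[:c] and y[c:] for a random cut c."""
    c = random.randrange(1, len(x))
    return x[:c] + y[c:]

def mutate(child, rate=0.1):
    """With small probability, move one queen to a random row in its column."""
    if random.random() < rate:
        col = random.randrange(len(child))
        child = child[:col] + (random.randrange(len(child)),) + child[col + 1:]
    return child

def genetic_algorithm(population, generations=1000):
    """Each mating of two parents produces one offspring, as in the pseudocode."""
    for _ in range(generations):
        population = [mutate(reproduce(select(population), select(population)))
                      for _ in range(len(population))]
        if max(map(ga_fitness, population)) == 28:   # fit enough: 8-queens solved
            break
    return max(population, key=ga_fitness)

population = [tuple(random.randrange(8) for _ in range(8)) for _ in range(20)]
best = genetic_algorithm(population)
```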
  • 148. 3ECTION ,OCAL 3EARCH IN #ONTINUOUS 3PACES IXQFWLRQ '%.%4)# !,'/2)4(-population )4.%33 . UHWXUQV AN INDIVIDUAL LQSXWV population A SET OF INDIVIDUALS )4.%33 . A FUNCTION THAT MEASURES THE lTNESS OF AN INDIVIDUAL UHSHDW new population ← EMPTY SET IRU i WR 3):%population GR x ← 2!.$/- 3%,%#4)/.population )4.%33 . y ← 2!.$/- 3%,%#4)/.population )4.%33 . child ← 2%02/$5#%x y LI SMALL RANDOM PROBABILITY WKHQ child ← -54!4%child ADD child TO new population population ← new population XQWLO SOME INDIVIDUAL IS lT ENOUGH OR ENOUGH TIME HAS ELAPSED UHWXUQ THE BEST INDIVIDUAL IN population ACCORDING TO )4.%33 . IXQFWLRQ 2%02/$5#%x y UHWXUQV AN INDIVIDUAL LQSXWV x y PARENT INDIVIDUALS n ← ,%.'4(x c ← RANDOM NUMBER FROM TO n UHWXUQ !00%.$35342).'x c 35342).'y c + 1 n )LJXUH ! GENETIC ALGORITHM 4HE ALGORITHM IS THE SAME AS THE ONE DIAGRAMMED IN IGURE WITH ONE VARIATION IN THIS MORE POPULAR VERSION EACH MATING OF TWO PARENTS PRODUCES ONLY ONE OFFSPRING NOT TWO COMPONENT IS LIKELY TO BE GOOD IN A VARIETY OF DIFFERENT DESIGNS 4HIS SUGGESTS THAT SUCCESSFUL USE OF GENETIC ALGORITHMS REQUIRES CAREFUL ENGINEERING OF THE REPRESENTATION )N PRACTICE GENETIC ALGORITHMS HAVE HAD A WIDESPREAD IMPACT ON OPTIMIZATION PROBLEMS SUCH AS CIRCUIT LAYOUT AND JOB SHOP SCHEDULING !T PRESENT IT IS NOT CLEAR WHETHER THE APPEAL OF GENETIC ALGORITHMS ARISES FROM THEIR PERFORMANCE OR FROM THEIR STHETICALLY PLEASING ORIGINS IN THE THEORY OF EVOLUTION -UCH WORK REMAINS TO BE DONE TO IDENTIFY THE CONDITIONS UNDER WHICH GENETIC ALGORITHMS PERFORM WELL ,/#!, 3%!2#( ). #/.4).5/53 30!#%3 )N #HAPTER WE EXPLAINED THE DISTINCTION BETWEEN DISCRETE AND CONTINUOUS ENVIRONMENTS POINTING OUT THAT MOST REAL WORLD ENVIRONMENTS ARE CONTINUOUS 9ET NONE OF THE ALGORITHMS WE HAVE DESCRIBED EXCEPT FOR lRST CHOICE HILL CLIMBING AND SIMULATED ANNEALING CAN HANDLE CONTINUOUS STATE AND ACTION SPACES BECAUSE THEY HAVE INlNITE BRANCHING FACTORS 4HIS SECTION PROVIDES A YHU EULHI INTRODUCTION TO SOME LOCAL SEARCH TECHNIQUES FOR lNDING OPTIMAL SOLU TIONS IN CONTINUOUS SPACES 4HE LITERATURE ON THIS TOPIC IS VAST MANY OF THE BASIC TECHNIQUES
  • 149. #HAPTER EYOND #LASSICAL 3EARCH %6/,54)/. !.$ 3%!2#( 4HE THEORY OF HYROXWLRQ WAS DEVELOPED IN #HARLES $ARWINS 2Q WKH 2ULJLQ RI 6SHFLHV E 0HDQV RI 1DWXUDO 6HOHFWLRQ AND INDEPENDENTLY BY !LFRED 2USSEL 7ALLACE 4HE CENTRAL IDEA IS SIMPLE VARIATIONS OCCUR IN REPRODUCTION AND WILL BE PRESERVED IN SUCCESSIVE GENERATIONS APPROXIMATELY IN PROPORTION TO THEIR EFFECT ON REPRODUCTIVE lTNESS $ARWINS THEORY WAS DEVELOPED WITH NO KNOWLEDGE OF HOW THE TRAITS OF ORGAN ISMS CAN BE INHERITED AND MODIlED 4HE PROBABILISTIC LAWS GOVERNING THESE PRO CESSES WERE lRST IDENTIlED BY 'REGOR -ENDEL A MONK WHO EXPERIMENTED WITH SWEET PEAS -UCH LATER 7ATSON AND #RICK IDENTIlED THE STRUCTURE OF THE $.! MOLECULE AND ITS ALPHABET !'4# ADENINE GUANINE THYMINE CYTOSINE )N THE STANDARD MODEL VARIATION OCCURS BOTH BY POINT MUTATIONS IN THE LETTER SEQUENCE AND BY hCROSSOVERv IN WHICH THE $.! OF AN OFFSPRING IS GENERATED BY COMBINING LONG SECTIONS OF $.! FROM EACH PARENT 4HE ANALOGY TO LOCAL SEARCH ALGORITHMS HAS ALREADY BEEN DESCRIBED THE PRINCI PAL DIFFERENCE BETWEEN STOCHASTIC BEAM SEARCH AND EVOLUTION IS THE USE OF VH[XDO RE PRODUCTION WHEREIN SUCCESSORS ARE GENERATED FROM PXOWLSOH ORGANISMS RATHER THAN JUST ONE 4HE ACTUAL MECHANISMS OF EVOLUTION ARE HOWEVER FAR RICHER THAN MOST GENETIC ALGORITHMS ALLOW OR EXAMPLE MUTATIONS CAN INVOLVE REVERSALS DUPLICA TIONS AND MOVEMENT OF LARGE CHUNKS OF $.! SOME VIRUSES BORROW $.! FROM ONE ORGANISM AND INSERT IT IN ANOTHER AND THERE ARE TRANSPOSABLE GENES THAT DO NOTHING BUT COPY THEMSELVES MANY THOUSANDS OF TIMES WITHIN THE GENOME 4HERE ARE EVEN GENES THAT POISON CELLS FROM POTENTIAL MATES THAT DO NOT CARRY THE GENE THEREBY IN CREASING THEIR OWN CHANCES OF REPLICATION -OST IMPORTANT IS THE FACT THAT THE JHQHV WKHPVHOYHV HQFRGH WKH PHFKDQLVPV WHEREBY THE GENOME IS REPRODUCED AND TRANS LATED INTO AN ORGANISM )N GENETIC ALGORITHMS THOSE MECHANISMS ARE A SEPARATE PROGRAM THAT IS NOT REPRESENTED WITHIN THE STRINGS BEING MANIPULATED $ARWINIAN EVOLUTION MAY APPEAR INEFlCIENT HAVING GENERATED BLINDLY SOME 1045 OR SO ORGANISMS WITHOUT IMPROVING ITS SEARCH HEURISTICS ONE IOTA IFTY YEARS BEFORE $ARWIN HOWEVER THE OTHERWISE GREAT RENCH NATURALIST *EAN ,AMARCK PROPOSED A THEORY OF EVOLUTION WHEREBY TRAITS DFTXLUHG E DGDSWDWLRQ GXU LQJ DQ RUJDQLVP¶V OLIHWLPH WOULD BE PASSED ON TO ITS OFFSPRING 3UCH A PROCESS WOULD BE EFFECTIVE BUT DOES NOT SEEM TO OCCUR IN NATURE -UCH LATER *AMES ALD WIN PROPOSED A SUPERlCIALLY SIMILAR THEORY THAT BEHAVIOR LEARNED DURING AN ORGANISMS LIFETIME COULD ACCELERATE THE RATE OF EVOLUTION 5NLIKE ,AMARCKS ALD WINS THEORY IS ENTIRELY CONSISTENT WITH $ARWINIAN EVOLUTION BECAUSE IT RELIES ON SE LECTION PRESSURES OPERATING ON INDIVIDUALS THAT HAVE FOUND LOCAL OPTIMA AMONG THE SET OF POSSIBLE BEHAVIORS ALLOWED BY THEIR GENETIC MAKEUP #OMPUTER SIMULATIONS CONlRM THAT THE hALDWIN EFFECTv IS REAL ONCE hORDINARYv EVOLUTION HAS CREATED ORGANISMS WHOSE INTERNAL PERFORMANCE MEASURE CORRELATES WITH ACTUAL lTNESS
  • 150. 3ECTION ,OCAL 3EARCH IN #ONTINUOUS 3PACES ORIGINATED IN THE TH CENTURY AFTER THE DEVELOPMENT OF CALCULUS BY .EWTON AND ,EIBNIZ 7E lND USES FOR THESE TECHNIQUES AT SEVERAL PLACES IN THE BOOK INCLUDING THE CHAPTERS ON LEARNING VISION AND ROBOTICS 7E BEGIN WITH AN EXAMPLE 3UPPOSE WE WANT TO PLACE THREE NEW AIRPORTS ANYWHERE IN 2OMANIA SUCH THAT THE SUM OF SQUARED DISTANCES FROM EACH CITY ON THE MAP IGURE TO ITS NEAREST AIRPORT IS MINIMIZED 4HE STATE SPACE IS THEN DElNED BY THE COORDINATES OF THE AIRPORTS (x1, y1) (x2, y2) AND (x3, y3) 4HIS IS A VL[GLPHQVLRQDO SPACE WE ALSO SAY THAT STATES ARE DElNED BY SIX YDULDEOHV )N GENERAL STATES ARE DElNED BY AN n DIMENSIONAL VARIABLE VECTOR OF VARIABLES [ -OVING AROUND IN THIS SPACE CORRESPONDS TO MOVING ONE OR MORE OF THE AIRPORTS ON THE MAP 4HE OBJECTIVE FUNCTION f(x1, y1, x2, y2, x3, y3) IS RELATIVELY EASY TO COMPUTE FOR ANY PARTICULAR STATE ONCE WE COMPUTE THE CLOSEST CITIES ,ET Ci BE THE SET OF CITIES WHOSE CLOSEST AIRPORT IN THE CURRENT STATE IS AIRPORT i 4HEN LQ WKH QHLJKERUKRRG RI WKH FXUUHQW VWDWH WHERE THE CiS REMAIN CONSTANT WE HAVE f(x1, y1, x2, y2, x3, y3) = 3 i = 1 c∈Ci (xi − xc)2 + (yi − yc)2 . 4HIS EXPRESSION IS CORRECT ORFDOO BUT NOT GLOBALLY BECAUSE THE SETS Ci ARE DISCONTINUOUS FUNCTIONS OF THE STATE /NE WAY TO AVOID CONTINUOUS PROBLEMS IS SIMPLY TO GLVFUHWL]H THE NEIGHBORHOOD OF EACH DISCRETIZATION STATE OR EXAMPLE WE CAN MOVE ONLY ONE AIRPORT AT A TIME IN EITHER THE x OR y DIRECTION BY A lXED AMOUNT ±δ 7ITH VARIABLES THIS GIVES POSSIBLE SUCCESSORS FOR EACH STATE 7E CAN THEN APPLY ANY OF THE LOCAL SEARCH ALGORITHMS DESCRIBED PREVIOUSLY 7E COULD ALSO AP PLY STOCHASTIC HILL CLIMBING AND SIMULATED ANNEALING DIRECTLY WITHOUT DISCRETIZING THE SPACE 4HESE ALGORITHMS CHOOSE SUCCESSORS RANDOMLY WHICH CAN BE DONE BY GENERATING RANDOM VEC TORS OF LENGTH δ -ANY METHODS ATTEMPT TO USE THE JUDGLHQW OF THE LANDSCAPE TO lND A MAXIMUM 4HE GRADIENT GRADIENT OF THE OBJECTIVE FUNCTION IS A VECTOR ∇f THAT GIVES THE MAGNITUDE AND DIRECTION OF THE STEEPEST SLOPE OR OUR PROBLEM WE HAVE ∇f = ∂f ∂x1 , ∂f ∂y1 , ∂f ∂x2 , ∂f ∂y2 , ∂f ∂x3 , ∂f ∂y3 . )N SOME CASES WE CAN lND A MAXIMUM BY SOLVING THE EQUATION ∇f = 0 4HIS COULD BE DONE FOR EXAMPLE IF WE WERE PLACING JUST ONE AIRPORT THE SOLUTION IS THE ARITHMETIC MEAN OF ALL THE CITIES COORDINATES )N MANY CASES HOWEVER THIS EQUATION CANNOT BE SOLVED IN CLOSED FORM OR EXAMPLE WITH THREE AIRPORTS THE EXPRESSION FOR THE GRADIENT DEPENDS ON WHAT CITIES ARE CLOSEST TO EACH AIRPORT IN THE CURRENT STATE 4HIS MEANS WE CAN COMPUTE THE GRADIENT ORFDOO BUT NOT JOREDOO FOR EXAMPLE ∂f ∂x1 = 2 c∈C1 (xi − xc) . 'IVEN A LOCALLY CORRECT EXPRESSION FOR THE GRADIENT WE CAN PERFORM STEEPEST ASCENT HILL CLIMB 6 ! BASIC KNOWLEDGE OF MULTIVARIATE CALCULUS AND VECTOR ARITHMETIC IS USEFUL FOR READING THIS SECTION
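For the three-airport example, the objective and its locally correct gradient can be written down directly. This sketch assumes airports and cities are given as lists of (x, y) pairs; the helper names are ours.

```python
def closest_sets(airports, cities):
    """C_i: the cities whose closest airport, in the current state, is airport i."""
    C = [[] for _ in airports]
    for (cx, cy) in cities:
        i = min(range(len(airports)),
                key=lambda k: (airports[k][0] - cx) ** 2 + (airports[k][1] - cy) ** 2)
        C[i].append((cx, cy))
    return C

def f(airports, cities):
    """Sum over airports i and cities c in C_i of (x_i - x_c)^2 + (y_i - y_c)^2."""
    return sum((x - cx) ** 2 + (y - cy) ** 2
               for (x, y), Ci in zip(airports, closest_sets(airports, cities))
               for (cx, cy) in Ci)

def gradient(airports, cities):
    """Locally correct gradient: df/dx_i = 2 * sum_{c in C_i} (x_i - x_c), same for y.
    Valid only while the sets C_i stay fixed (they are discontinuous in the state)."""
    return [(2 * sum(x - cx for cx, _ in Ci), 2 * sum(y - cy for _, cy in Ci))
            for (x, y), Ci in zip(airports, closest_sets(airports, cities))]
```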
  • 151. #HAPTER EYOND #LASSICAL 3EARCH ING BY UPDATING THE CURRENT STATE ACCORDING TO THE FORMULA [ ← [ + α∇f([) , WHERE α IS A SMALL CONSTANT OFTEN CALLED THE VWHS VL]H )N OTHER CASES THE OBJECTIVE FUNCTION STEP SIZE MIGHT NOT BE AVAILABLE IN A DIFFERENTIABLE FORM AT ALLˆFOR EXAMPLE THE VALUE OF A PARTICULAR SET OF AIRPORT LOCATIONS MIGHT BE DETERMINED BY RUNNING SOME LARGE SCALE ECONOMIC SIMULATION PACKAGE )N THOSE CASES WE CAN CALCULATE A SO CALLED HPSLULFDO JUDGLHQW BY EVALUATING THE EMPIRICAL GRADIENT RESPONSE TO SMALL INCREMENTS AND DECREMENTS IN EACH COORDINATE %MPIRICAL GRADIENT SEARCH IS THE SAME AS STEEPEST ASCENT HILL CLIMBING IN A DISCRETIZED VERSION OF THE STATE SPACE (IDDEN BENEATH THE PHRASE hα IS A SMALL CONSTANTv LIES A HUGE VARIETY OF METHODS FOR ADJUSTING α 4HE BASIC PROBLEM IS THAT IF α IS TOO SMALL TOO MANY STEPS ARE NEEDED IF α IS TOO LARGE THE SEARCH COULD OVERSHOOT THE MAXIMUM 4HE TECHNIQUE OF OLQH VHDUFK TRIES TO LINE SEARCH OVERCOME THIS DILEMMA BY EXTENDING THE CURRENT GRADIENT DIRECTIONˆUSUALLY BY REPEATEDLY DOUBLING αˆUNTIL f STARTS TO DECREASE AGAIN 4HE POINT AT WHICH THIS OCCURS BECOMES THE NEW CURRENT STATE 4HERE ARE SEVERAL SCHOOLS OF THOUGHT ABOUT HOW THE NEW DIRECTION SHOULD BE CHOSEN AT THIS POINT OR MANY PROBLEMS THE MOST EFFECTIVE ALGORITHM IS THE VENERABLE 1HZWRQ±5DSKVRQ NEWTON–RAPHSON METHOD 4HIS IS A GENERAL TECHNIQUE FOR lNDING ROOTS OF FUNCTIONSˆTHAT IS SOLVING EQUATIONS OF THE FORM g(x) = 0 )T WORKS BY COMPUTING A NEW ESTIMATE FOR THE ROOT x ACCORDING TO .EWTONS FORMULA x ← x − g(x)/g (x) . 4O lND A MAXIMUM OR MINIMUM OF f WE NEED TO lND [ SUCH THAT THE JUDGLHQW IS ZERO IE ∇f([) = 4HUS g(x) IN .EWTONS FORMULA BECOMES ∇f([) AND THE UPDATE EQUATION CAN BE WRITTEN IN MATRIXnVECTOR FORM AS [ ← [ − +−1 f ([)∇f([) , WHERE +f ([) IS THE +HVVLDQ MATRIX OF SECOND DERIVATIVES WHOSE ELEMENTS Hij ARE GIVEN HESSIAN BY ∂2f/∂xi∂xj OR OUR AIRPORT EXAMPLE WE CAN SEE FROM %QUATION THAT +f ([) IS PARTICULARLY SIMPLE THE OFF DIAGONAL ELEMENTS ARE ZERO AND THE DIAGONAL ELEMENTS FOR AIRPORT i ARE JUST TWICE THE NUMBER OF CITIES IN Ci ! MOMENTS CALCULATION SHOWS THAT ONE STEP OF THE UPDATE MOVES AIRPORT i DIRECTLY TO THE CENTROID OF Ci WHICH IS THE MINIMUM OF THE LOCAL EXPRESSION FOR f FROM %QUATION OR HIGH DIMENSIONAL PROBLEMS HOWEVER COMPUTING THE n2 ENTRIES OF THE (ESSIAN AND INVERTING IT MAY BE EXPENSIVE SO MANY APPROXIMATE VERSIONS OF THE .EWTONn2APHSON METHOD HAVE BEEN DEVELOPED ,OCAL SEARCH METHODS SUFFER FROM LOCAL MAXIMA RIDGES AND PLATEAUX IN CONTINUOUS STATE SPACES JUST AS MUCH AS IN DISCRETE SPACES 2ANDOM RESTARTS AND SIMULATED ANNEALING CAN BE USED AND ARE OFTEN HELPFUL (IGH DIMENSIONAL CONTINUOUS SPACES ARE HOWEVER BIG PLACES IN WHICH IT IS EASY TO GET LOST ! lNAL TOPIC WITH WHICH A PASSING ACQUAINTANCE IS USEFUL IS FRQVWUDLQHG RSWLPL]DWLRQ CONSTRAINED OPTIMIZATION !N OPTIMIZATION PROBLEM IS CONSTRAINED IF SOLUTIONS MUST SATISFY SOME HARD CONSTRAINTS ON THE VALUES OF THE VARIABLES OR EXAMPLE IN OUR AIRPORT SITING PROBLEM WE MIGHT CONSTRAIN SITES 7 )N GENERAL THE .EWTONn2APHSON UPDATE CAN BE SEEN AS lTTING A QUADRATIC SURFACE TO f AT [ AND THEN MOVING DIRECTLY TO THE MINIMUM OF THAT SURFACEˆWHICH IS ALSO THE MINIMUM OF f IF f IS QUADRATIC
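Continuing the airport example, a single steepest-descent step and a single Newton-Raphson step can be sketched as follows, reusing closest_sets and gradient from the sketch above. The text gives the ascent form of the update; since f here is a cost to be minimized, this sketch subtracts the gradient. For this objective the Hessian is diagonal with entries 2|C_i|, so one Newton step moves each airport to the centroid of its current city set, as noted in the text.

```python
def gradient_descent_step(airports, cities, alpha=0.01):
    """One step x <- x - alpha * grad f(x), with alpha the step size."""
    return [(x - alpha * gx, y - alpha * gy)
            for (x, y), (gx, gy) in zip(airports, gradient(airports, cities))]

def newton_raphson_step(airports, cities):
    """Newton-Raphson x <- x - H^-1 grad f with the diagonal Hessian H_ii = 2*|C_i|:
    each airport jumps straight to the centroid of its assigned cities."""
    new_airports = []
    for (x, y), Ci in zip(airports, closest_sets(airports, cities)):
        if Ci:
            new_airports.append((sum(cx for cx, _ in Ci) / len(Ci),
                                 sum(cy for _, cy in Ci) / len(Ci)))
        else:
            new_airports.append((x, y))   # no assigned cities: leave it in place
    return new_airports
```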
  • 152. 3ECTION 3EARCHING WITH .ONDETERMINISTIC !CTIONS TO BE INSIDE 2OMANIA AND ON DRY LAND RATHER THAN IN THE MIDDLE OF LAKES 4HE DIFlCULTY OF CONSTRAINED OPTIMIZATION PROBLEMS DEPENDS ON THE NATURE OF THE CONSTRAINTS AND THE OBJECTIVE FUNCTION 4HE BEST KNOWN CATEGORY IS THAT OF OLQHDU SURJUDPPLQJ PROBLEMS IN WHICH CON LINEAR PROGRAMMING STRAINTS MUST BE LINEAR INEQUALITIES FORMING A FRQYH[ VHW AND THE OBJECTIVE FUNCTION IS ALSO CONVEX SET LINEAR 4HE TIME COMPLEXITY OF LINEAR PROGRAMMING IS POLYNOMIAL IN THE NUMBER OF VARIABLES ,INEAR PROGRAMMING IS PROBABLY THE MOST WIDELY STUDIED AND BROADLY USEFUL CLASS OF OPTIMIZATION PROBLEMS )T IS A SPECIAL CASE OF THE MORE GENERAL PROBLEM OF FRQYH[ RSWL PL]DWLRQ WHICH ALLOWS THE CONSTRAINT REGION TO BE ANY CONVEX REGION AND THE OBJECTIVE TO CONVEX OPTIMIZATION BE ANY FUNCTION THAT IS CONVEX WITHIN THE CONSTRAINT REGION 5NDER CERTAIN CONDITIONS CONVEX OPTIMIZATION PROBLEMS ARE ALSO POLYNOMIALLY SOLVABLE AND MAY BE FEASIBLE IN PRACTICE WITH THOUSANDS OF VARIABLES 3EVERAL IMPORTANT PROBLEMS IN MACHINE LEARNING AND CONTROL THEORY CAN BE FORMULATED AS CONVEX OPTIMIZATION PROBLEMS SEE #HAPTER 3%!2#().' 7)4( ./.$%4%2-).)34)# !#4)/.3 )N #HAPTER WE ASSUMED THAT THE ENVIRONMENT IS FULLY OBSERVABLE AND DETERMINISTIC AND THAT THE AGENT KNOWS WHAT THE EFFECTS OF EACH ACTION ARE 4HEREFORE THE AGENT CAN CALCULATE EXACTLY WHICH STATE RESULTS FROM ANY SEQUENCE OF ACTIONS AND ALWAYS KNOWS WHICH STATE IT IS IN )TS PERCEPTS PROVIDE NO NEW INFORMATION AFTER EACH ACTION ALTHOUGH OF COURSE THEY TELL THE AGENT THE INITIAL STATE 7HEN THE ENVIRONMENT IS EITHER PARTIALLY OBSERVABLE OR NONDETERMINISTIC OR BOTH PER CEPTS BECOME USEFUL )N A PARTIALLY OBSERVABLE ENVIRONMENT EVERY PERCEPT HELPS NARROW DOWN THE SET OF POSSIBLE STATES THE AGENT MIGHT BE IN THUS MAKING IT EASIER FOR THE AGENT TO ACHIEVE ITS GOALS 7HEN THE ENVIRONMENT IS NONDETERMINISTIC PERCEPTS TELL THE AGENT WHICH OF THE POS SIBLE OUTCOMES OF ITS ACTIONS HAS ACTUALLY OCCURRED )N BOTH CASES THE FUTURE PERCEPTS CANNOT BE DETERMINED IN ADVANCE AND THE AGENTS FUTURE ACTIONS WILL DEPEND ON THOSE FUTURE PERCEPTS 3O THE SOLUTION TO A PROBLEM IS NOT A SEQUENCE BUT A FRQWLQJHQF SODQ ALSO KNOWN AS A VWUDW CONTINGENCY PLAN HJ THAT SPECIlES WHAT TO DO DEPENDING ON WHAT PERCEPTS ARE RECEIVED )N THIS SECTION WE STRATEGY EXAMINE THE CASE OF NONDETERMINISM DEFERRING PARTIAL OBSERVABILITY TO 3ECTION 7KH HUUDWLF YDFXXP ZRUOG !S AN EXAMPLE WE USE THE VACUUM WORLD lRST INTRODUCED IN #HAPTER AND DElNED AS A SEARCH PROBLEM IN 3ECTION 2ECALL THAT THE STATE SPACE HAS EIGHT STATES AS SHOWN IN IGURE 4HERE ARE THREE ACTIONSˆ/HIW 5LJKW AND 6XFNˆAND THE GOAL IS TO CLEAN UP ALL THE DIRT STATES AND )F THE ENVIRONMENT IS OBSERVABLE DETERMINISTIC AND COMPLETELY KNOWN THEN THE PROBLEM IS TRIVIALLY SOLVABLE BY ANY OF THE ALGORITHMS IN #HAPTER AND THE SOLUTION IS AN ACTION SEQUENCE OR EXAMPLE IF THE INITIAL STATE IS THEN THE ACTION SEQUENCE ;6XFN 5LJKW 6XFN= WILL REACH A GOAL STATE 8 ! SET OF POINTS S IS CONVEX IF THE LINE JOINING ANY TWO POINTS IN S IS ALSO CONTAINED IN S ! FRQYH[ IXQFWLRQ IS ONE FOR WHICH THE SPACE hABOVEv IT FORMS A CONVEX SET BY DElNITION CONVEX FUNCTIONS HAVE NO LOCAL AS OPPOSED TO GLOBAL MINIMA
  • 153. #HAPTER EYOND #LASSICAL 3EARCH )LJXUH 4HE EIGHT POSSIBLE STATES OF THE VACUUM WORLD STATES AND ARE GOAL STATES .OW SUPPOSE THAT WE INTRODUCE NONDETERMINISM IN THE FORM OF A POWERFUL BUT ERRATIC VACUUM CLEANER )N THE HUUDWLF YDFXXP ZRUOG THE 6XFN ACTION WORKS AS FOLLOWS ERRATIC VACUUM WORLD • 7HEN APPLIED TO A DIRTY SQUARE THE ACTION CLEANS THE SQUARE AND SOMETIMES CLEANS UP DIRT IN AN ADJACENT SQUARE TOO • 7HEN APPLIED TO A CLEAN SQUARE THE ACTION SOMETIMES DEPOSITS DIRT ON THE CARPET 4O PROVIDE A PRECISE FORMULATION OF THIS PROBLEM WE NEED TO GENERALIZE THE NOTION OF A WUDQ VLWLRQ PRGHO FROM #HAPTER )NSTEAD OF DElNING THE TRANSITION MODEL BY A 2%35,4 FUNCTION THAT RETURNS A SINGLE STATE WE USE A 2%35,43 FUNCTION THAT RETURNS A VHW OF POSSIBLE OUTCOME STATES OR EXAMPLE IN THE ERRATIC VACUUM WORLD THE 6XFN ACTION IN STATE LEADS TO A STATE IN THE SET {5, 7}ˆTHE DIRT IN THE RIGHT HAND SQUARE MAY OR MAY NOT BE VACUUMED UP 7E ALSO NEED TO GENERALIZE THE NOTION OF A VROXWLRQ TO THE PROBLEM OR EXAMPLE IF WE START IN STATE THERE IS NO SINGLE VHTXHQFH OF ACTIONS THAT SOLVES THE PROBLEM )NSTEAD WE NEED A CONTINGENCY PLAN SUCH AS THE FOLLOWING [6XFN, LI State = 5 WKHQ [5LJKW, 6XFN] HOVH [ ]] . 4HUS SOLUTIONS FOR NONDETERMINISTIC PROBLEMS CAN CONTAIN NESTED LInWKHQnHOVH STATEMENTS THIS MEANS THAT THEY ARE WUHHV RATHER THAN SEQUENCES 4HIS ALLOWS THE SELECTION OF ACTIONS BASED ON CONTINGENCIES ARISING DURING EXECUTION -ANY PROBLEMS IN THE REAL PHYSICAL WORLD ARE CONTINGENCY PROBLEMS BECAUSE EXACT PREDICTION IS IMPOSSIBLE OR THIS REASON MANY PEOPLE KEEP THEIR EYES OPEN WHILE WALKING AROUND OR DRIVING 9 7E ASSUME THAT MOST READERS FACE SIMILAR PROBLEMS AND CAN SYMPATHIZE WITH OUR AGENT 7E APOLOGIZE TO OWNERS OF MODERN EFlCIENT HOME APPLIANCES WHO CANNOT TAKE ADVANTAGE OF THIS PEDAGOGICAL DEVICE
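A RESULTS function for the erratic vacuum world can be written to return a set of possible outcome states. The state encoding below, (agent location, frozenset of dirty squares), is an illustrative alternative to the numbered states 1-8 in the figure.

```python
def erratic_results(state, action):
    """RESULTS(s, a) for the erratic vacuum world: a *set* of possible outcomes."""
    loc, dirt = state
    if action == 'Left':
        return {('A', dirt)}
    if action == 'Right':
        return {('B', dirt)}
    if action == 'Suck':
        other = 'B' if loc == 'A' else 'A'
        if loc in dirt:
            # Cleans the current square; may also clean the adjacent one.
            return {(loc, dirt - {loc}), (loc, dirt - {loc, other})}
        # On a clean square the action may deposit dirt.
        return {(loc, dirt), (loc, dirt | {loc})}
    raise ValueError(action)

# e.g. Suck when at A with both squares dirty may or may not also clean B:
# erratic_results(('A', frozenset({'A', 'B'})), 'Suck')
#   -> {('A', frozenset({'B'})), ('A', frozenset())}
```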
  • 154. 3ECTION 3EARCHING WITH .ONDETERMINISTIC !CTIONS !.$n/2 VHDUFK WUHHV 4HE NEXT QUESTION IS HOW TO lND CONTINGENT SOLUTIONS TO NONDETERMINISTIC PROBLEMS !S IN #HAPTER WE BEGIN BY CONSTRUCTING SEARCH TREES BUT HERE THE TREES HAVE A DIFFERENT CHARACTER )N A DETERMINISTIC ENVIRONMENT THE ONLY BRANCHING IS INTRODUCED BY THE AGENTS OWN CHOICES IN EACH STATE 7E CALL THESE NODES 25 QRGHV )N THE VACUUM WORLD FOR EXAMPLE AT AN /2 OR NODE NODE THE AGENT CHOOSES /HIW RU 5LJKW RU 6XFN )N A NONDETERMINISTIC ENVIRONMENT BRANCHING IS ALSO INTRODUCED BY THE HQYLURQPHQW¶V CHOICE OF OUTCOME FOR EACH ACTION 7E CALL THESE NODES $1' QRGHV OR EXAMPLE THE 6XFN ACTION IN STATE LEADS TO A STATE IN THE SET {5, 7} AND NODE SO THE AGENT WOULD NEED TO lND A PLAN FOR STATE DQG FOR STATE 4HESE TWO KINDS OF NODES ALTERNATE LEADING TO AN !.$n/2 WUHH AS ILLUSTRATED IN IGURE AND–OR TREE ! SOLUTION FOR AN !.$n/2 SEARCH PROBLEM IS A SUBTREE THAT HAS A GOAL NODE AT EVERY LEAF SPECIlES ONE ACTION AT EACH OF ITS /2 NODES AND INCLUDES EVERY OUTCOME BRANCH AT EACH OF ITS !.$ NODES 4HE SOLUTION IS SHOWN IN BOLD LINES IN THE lGURE IT CORRESPONDS TO THE PLAN GIVEN IN %QUATION 4HE PLAN USES IFnTHENnELSE NOTATION TO HANDLE THE !.$ BRANCHES BUT WHEN THERE ARE MORE THAN TWO BRANCHES AT A NODE IT MIGHT BE BETTER TO USE A FDVH /HIW 6XFN 5LJKW 6XFN 5LJKW 6XFN 6 *2$/ 8 *2$/ 7 1 2 5 1 /223 5 /223 5 /223 /HIW 6XFN 1 /223 *2$/ 8 4 )LJXUH 4HE lRST TWO LEVELS OF THE SEARCH TREE FOR THE ERRATIC VACUUM WORLD 3TATE NODES ARE /2 NODES WHERE SOME ACTION MUST BE CHOSEN !T THE !.$ NODES SHOWN AS CIRCLES EVERY OUTCOME MUST BE HANDLED AS INDICATED BY THE ARC LINKING THE OUTGOING BRANCHES 4HE SOLUTION FOUND IS SHOWN IN BOLD LINES
  • 155. #HAPTER EYOND #LASSICAL 3EARCH IXQFWLRQ !.$ /2 '2!0( 3%!2#(problem UHWXUQV a conditional plan, or failure /2 3%!2#(problem).)4)!, 34!4% problem [ ] IXQFWLRQ /2 3%!2#(state problem path UHWXUQV a conditional plan, or failure LI problem'/!, 4%34state WKHQ UHWXUQ THE EMPTY PLAN LI state IS ON path WKHQ UHWXUQ failure IRU HDFK action LQ problem!#4)/.3state GR plan ← !.$ 3%!2#(2%35,43state action problem [state | path] LI plan = failure WKHQ UHWXUQ [action | plan] UHWXUQ failure IXQFWLRQ !.$ 3%!2#(states problem path UHWXUQV a conditional plan, or failure IRU HDFK si LQ states GR plani ← /2 3%!2#(si problem path LI plani failure WKHQ UHWXUQ failure UHWXUQ [LI s1 WKHQ plan1 HOVH LI s2 WKHQ plan2 HOVH . . . LI sn−1 WKHQ plann−1 HOVH plann] )LJXUH !N ALGORITHM FOR SEARCHING !.$n/2 GRAPHS GENERATED BY NONDETERMINISTIC ENVIRONMENTS )T RETURNS A CONDITIONAL PLAN THAT REACHES A GOAL STATE IN ALL CIRCUMSTANCES 4HE NOTATION [x | l] REFERS TO THE LIST FORMED BY ADDING OBJECT x TO THE FRONT OF LIST l CONSTRUCT -ODIFYING THE BASIC PROBLEM SOLVING AGENT SHOWN IN IGURE TO EXECUTE CON TINGENT SOLUTIONS OF THIS KIND IS STRAIGHTFORWARD /NE MAY ALSO CONSIDER A SOMEWHAT DIFFERENT AGENT DESIGN IN WHICH THE AGENT CAN ACT EHIRUH IT HAS FOUND A GUARANTEED PLAN AND DEALS WITH SOME CONTINGENCIES ONLY AS THEY ARISE DURING EXECUTION 4HIS TYPE OF LQWHUOHDYLQJ OF SEARCH INTERLEAVING AND EXECUTION IS ALSO USEFUL FOR EXPLORATION PROBLEMS SEE 3ECTION AND FOR GAME PLAYING SEE #HAPTER IGURE GIVES A RECURSIVE DEPTH lRST ALGORITHM FOR !.$n/2 GRAPH SEARCH /NE KEY ASPECT OF THE ALGORITHM IS THE WAY IN WHICH IT DEALS WITH CYCLES WHICH OFTEN ARISE IN NONDETERMINISTIC PROBLEMS EG IF AN ACTION SOMETIMES HAS NO EFFECT OR IF AN UNINTENDED EFFECT CAN BE CORRECTED )F THE CURRENT STATE IS IDENTICAL TO A STATE ON THE PATH FROM THE ROOT THEN IT RETURNS WITH FAILURE 4HIS DOESNT MEAN THAT THERE IS QR SOLUTION FROM THE CURRENT STATE IT SIMPLY MEANS THAT IF THERE LV A NONCYCLIC SOLUTION IT MUST BE REACHABLE FROM THE EARLIER INCARNATION OF THE CURRENT STATE SO THE NEW INCARNATION CAN BE DISCARDED 7ITH THIS CHECK WE ENSURE THAT THE ALGORITHM TERMINATES IN EVERY lNITE STATE SPACE BECAUSE EVERY PATH MUST REACH A GOAL A DEAD END OR A REPEATED STATE .OTICE THAT THE ALGORITHM DOES NOT CHECK WHETHER THE CURRENT STATE IS A REPETITION OF A STATE ON SOME RWKHU PATH FROM THE ROOT WHICH IS IMPORTANT FOR EFlCIENCY %XERCISE INVESTIGATES THIS ISSUE !.$n/2 GRAPHS CAN ALSO BE EXPLORED BY BREADTH lRST OR BEST lRST METHODS 4HE CONCEPT OF A HEURISTIC FUNCTION MUST BE MODIlED TO ESTIMATE THE COST OF A CONTINGENT SOLUTION RATHER THAN A SEQUENCE BUT THE NOTION OF ADMISSIBILITY CARRIES OVER AND THERE IS AN ANALOG OF THE !∗ ALGORITHM FOR lNDING OPTIMAL SOLUTIONS 0OINTERS ARE GIVEN IN THE BIBLIOGRAPHICAL NOTES AT THE END OF THE CHAPTER
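The AND-OR-GRAPH-SEARCH pseudocode above translates fairly directly into Python. In this sketch failure is represented by None, a plan is a list of actions, and an AND node's contingency is recorded as an ('if', {outcome-state: subplan}) entry; the problem object is assumed to expose initial, actions, results (returning a set of outcome states), and goal_test.

```python
def and_or_graph_search(problem):
    """Depth-first AND-OR search; returns a conditional plan, or None for failure."""
    return or_search(problem.initial, problem, [])

def or_search(state, problem, path):
    if problem.goal_test(state):
        return []                                  # the empty plan
    if state in path:
        return None                                # cycle: give up on this path
    for action in problem.actions(state):
        plan = and_search(problem.results(state, action), problem, [state] + path)
        if plan is not None:
            return [action] + plan
    return None

def and_search(states, problem, path):
    plans = {}
    for s in states:
        plan = or_search(s, problem, path)
        if plan is None:
            return None                            # every outcome must be handled
        plans[s] = plan
    # A conditional step: "if the outcome is s, then follow plans[s]".
    return [('if', plans)]
```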
  • 156. 3ECTION 3EARCHING WITH .ONDETERMINISTIC !CTIONS 6XFN 5LJKW 6 1 2 5 5LJKW )LJXUH 0ART OF THE SEARCH GRAPH FOR THE SLIPPERY VACUUM WORLD WHERE WE HAVE SHOWN SOME CYCLES EXPLICITLY !LL SOLUTIONS FOR THIS PROBLEM ARE CYCLIC PLANS BECAUSE THERE IS NO WAY TO MOVE RELIABLY 7U WU DJDLQ #ONSIDER THE SLIPPERY VACUUM WORLD WHICH IS IDENTICAL TO THE ORDINARY NON ERRATIC VAC UUM WORLD EXCEPT THAT MOVEMENT ACTIONS SOMETIMES FAIL LEAVING THE AGENT IN THE SAME LOCA TION OR EXAMPLE MOVING 5LJKW IN STATE LEADS TO THE STATE SET {1, 2} IGURE SHOWS PART OF THE SEARCH GRAPH CLEARLY THERE ARE NO LONGER ANY ACYCLIC SOLUTIONS FROM STATE AND !.$ /2 '2!0( 3%!2#( WOULD RETURN WITH FAILURE 4HERE IS HOWEVER A FFOLF VROXWLRQ CYCLIC SOLUTION WHICH IS TO KEEP TRYING Right UNTIL IT WORKS 7E CAN EXPRESS THIS SOLUTION BY ADDING A ODEHO TO LABEL DENOTE SOME PORTION OF THE PLAN AND USING THAT LABEL LATER INSTEAD OF REPEATING THE PLAN ITSELF 4HUS OUR CYCLIC SOLUTION IS [6XFN, L1 : Right, LI State = 5 WKHQ L1 HOVH Suck] . ! BETTER SYNTAX FOR THE LOOPING PART OF THIS PLAN WOULD BE hZKLOH State = 5 GR Rightv )N GENERAL A CYCLIC PLAN MAY BE CONSIDERED A SOLUTION PROVIDED THAT EVERY LEAF IS A GOAL STATE AND THAT A LEAF IS REACHABLE FROM EVERY POINT IN THE PLAN 4HE MODIlCATIONS NEEDED TO !.$ /2 '2!0( 3%!2#( ARE COVERED IN %XERCISE 4HE KEY REALIZATION IS THAT A LOOP IN THE STATE SPACE BACK TO A STATE L TRANSLATES TO A LOOP IN THE PLAN BACK TO THE POINT WHERE THE SUBPLAN FOR STATE L IS EXECUTED 'IVEN THE DElNITION OF A CYCLIC SOLUTION AN AGENT EXECUTING SUCH A SOLUTION WILL EVENTU ALLY REACH THE GOAL SURYLGHG WKDW HDFK RXWFRPH RI D QRQGHWHUPLQLVWLF DFWLRQ HYHQWXDOO RFFXUV )S THIS CONDITION REASONABLE )T DEPENDS ON THE REASON FOR THE NONDETERMINISM )F THE ACTION ROLLS A DIE THEN ITS REASONABLE TO SUPPOSE THAT EVENTUALLY A SIX WILL BE ROLLED )F THE ACTION IS TO INSERT A HOTEL CARD KEY INTO THE DOOR LOCK BUT IT DOESNT WORK THE lRST TIME THEN PERHAPS IT WILL EVENTUALLY WORK OR PERHAPS ONE HAS THE WRONG KEY OR THE WRONG ROOM !FTER SEVEN OR
  • 157. #HAPTER EYOND #LASSICAL 3EARCH EIGHT TRIES MOST PEOPLE WILL ASSUME THE PROBLEM IS WITH THE KEY AND WILL GO BACK TO THE FRONT DESK TO GET A NEW ONE /NE WAY TO UNDERSTAND THIS DECISION IS TO SAY THAT THE INITIAL PROBLEM FORMULATION OBSERVABLE NONDETERMINISTIC IS ABANDONED IN FAVOR OF A DIFFERENT FORMULATION PARTIALLY OBSERVABLE DETERMINISTIC WHERE THE FAILURE IS ATTRIBUTED TO AN UNOBSERVABLE PROP ERTY OF THE KEY 7E HAVE MORE TO SAY ON THIS ISSUE IN #HAPTER 3%!2#().' 7)4( 0!24)!, /3%26!4)/.3 7E NOW TURN TO THE PROBLEM OF PARTIAL OBSERVABILITY WHERE THE AGENTS PERCEPTS DO NOT SUF lCE TO PIN DOWN THE EXACT STATE !S NOTED AT THE BEGINNING OF THE PREVIOUS SECTION IF THE AGENT IS IN ONE OF SEVERAL POSSIBLE STATES THEN AN ACTION MAY LEAD TO ONE OF SEVERAL POSSIBLE OUTCOMESˆHYHQ LI WKH HQYLURQPHQW LV GHWHUPLQLVWLF 4HE KEY CONCEPT REQUIRED FOR SOLVING PARTIALLY OBSERVABLE PROBLEMS IS THE EHOLHI VWDWH REPRESENTING THE AGENTS CURRENT BELIEF ABOUT BELIEF STATE THE POSSIBLE PHYSICAL STATES IT MIGHT BE IN GIVEN THE SEQUENCE OF ACTIONS AND PERCEPTS UP TO THAT POINT 7E BEGIN WITH THE SIMPLEST SCENARIO FOR STUDYING BELIEF STATES WHICH IS WHEN THE AGENT HAS NO SENSORS AT ALL THEN WE ADD IN PARTIAL SENSING AS WELL AS NONDETERMINISTIC ACTIONS 6HDUFKLQJ ZLWK QR REVHUYDWLRQ 7HEN THE AGENTS PERCEPTS PROVIDE QR LQIRUPDWLRQ DW DOO WE HAVE WHAT IS CALLED A VHQVRU OHVV PROBLEM OR SOMETIMES A FRQIRUPDQW PROBLEM !T lRST ONE MIGHT THINK THE SENSORLESS SENSORLESS CONFORMANT AGENT HAS NO HOPE OF SOLVING A PROBLEM IF IT HAS NO IDEA WHAT STATE ITS IN IN FACT SENSORLESS PROBLEMS ARE QUITE OFTEN SOLVABLE -OREOVER SENSORLESS AGENTS CAN BE SURPRISINGLY USEFUL PRIMARILY BECAUSE THEY GRQ¶W RELY ON SENSORS WORKING PROPERLY )N MANUFACTURING SYSTEMS FOR EXAMPLE MANY INGENIOUS METHODS HAVE BEEN DEVELOPED FOR ORIENTING PARTS CORRECTLY FROM AN UNKNOWN INITIAL POSITION BY USING A SEQUENCE OF ACTIONS WITH NO SENSING AT ALL 4HE HIGH COST OF SENSING IS ANOTHER REASON TO AVOID IT FOR EXAMPLE DOCTORS OFTEN PRESCRIBE A BROAD SPECTRUM ANTIBIOTIC RATHER THAN USING THE CONTINGENT PLAN OF DOING AN EXPENSIVE BLOOD TEST THEN WAITING FOR THE RESULTS TO COME BACK AND THEN PRESCRIBING A MORE SPECIlC ANTIBIOTIC AND PERHAPS HOSPITALIZATION BECAUSE THE INFECTION HAS PROGRESSED TOO FAR 7E CAN MAKE A SENSORLESS VERSION OF THE VACUUM WORLD !SSUME THAT THE AGENT KNOWS THE GEOGRAPHY OF ITS WORLD BUT DOESNT KNOW ITS LOCATION OR THE DISTRIBUTION OF DIRT )N THAT CASE ITS INITIAL STATE COULD BE ANY ELEMENT OF THE SET {1, 2, 3, 4, 5, 6, 7, 8} .OW CONSIDER WHAT HAPPENS IF IT TRIES THE ACTION 5LJKW 4HIS WILL CAUSE IT TO BE IN ONE OF THE STATES {2, 4, 6, 8}ˆTHE AGENT NOW HAS MORE INFORMATION URTHERMORE THE ACTION SEQUENCE ;5LJKW 6XFN= WILL ALWAYS END UP IN ONE OF THE STATES {4, 8} INALLY THE SEQUENCE ;5LJKW 6XFN /HIW 6XFN= IS GUARANTEED TO REACH THE GOAL STATE NO MATTER WHAT THE START STATE 7E SAY THAT THE AGENT CAN FRHUFH THE COERCION WORLD INTO STATE 4O SOLVE SENSORLESS PROBLEMS WE SEARCH IN THE SPACE OF BELIEF STATES RATHER THAN PHYSICAL STATES .OTICE THAT IN BELIEF STATE SPACE THE PROBLEM IS IXOO REVHUYDEOH BECAUSE THE AGENT 10 )N A FULLY OBSERVABLE ENVIRONMENT EACH BELIEF STATE CONTAINS ONE PHYSICAL STATE 4HUS WE CAN VIEW THE ALGO RITHMS IN #HAPTER AS SEARCHING IN A BELIEF STATE SPACE OF SINGLETON BELIEF STATES
  • 158. 3ECTION 3EARCHING WITH 0ARTIAL /BSERVATIONS ALWAYS KNOWS ITS OWN BELIEF STATE URTHERMORE THE SOLUTION IF ANY IS ALWAYS A SEQUENCE OF ACTIONS 4HIS IS BECAUSE AS IN THE ORDINARY PROBLEMS OF #HAPTER THE PERCEPTS RECEIVED AFTER EACH ACTION ARE COMPLETELY PREDICTABLEˆTHEYRE ALWAYS EMPTY 3O THERE ARE NO CONTINGENCIES TO PLAN FOR 4HIS IS TRUE HYHQ LI WKH HQYLURQPHQW LV QRQGHWHUPLQVWLF )T IS INSTRUCTIVE TO SEE HOW THE BELIEF STATE SEARCH PROBLEM IS CONSTRUCTED 3UPPOSE THE UNDERLYING PHYSICAL PROBLEM P IS DElNED BY !#4)/.3P 2%35,4P '/!, 4%34P AND 34%0 #/34P 4HEN WE CAN DElNE THE CORRESPONDING SENSORLESS PROBLEM AS FOLLOWS • %HOLHI VWDWHV 4HE ENTIRE BELIEF STATE SPACE CONTAINS EVERY POSSIBLE SET OF PHYSICAL STATES )F P HAS N STATES THEN THE SENSORLESS PROBLEM HAS UP TO 2N STATES ALTHOUGH MANY MAY BE UNREACHABLE FROM THE INITIAL STATE • ,QLWLDO VWDWH 4YPICALLY THE SET OF ALL STATES IN P ALTHOUGH IN SOME CASES THE AGENT WILL HAVE MORE KNOWLEDGE THAN THIS • $FWLRQV 4HIS IS SLIGHTLY TRICKY 3UPPOSE THE AGENT IS IN BELIEF STATE b = {s1, s2} BUT !#4)/.3P (s1) = !#4)/.3P (s2) THEN THE AGENT IS UNSURE OF WHICH ACTIONS ARE LEGAL )F WE ASSUME THAT ILLEGAL ACTIONS HAVE NO EFFECT ON THE ENVIRONMENT THEN IT IS SAFE TO TAKE THE XQLRQ OF ALL THE ACTIONS IN ANY OF THE PHYSICAL STATES IN THE CURRENT BELIEF STATE b !#4)/.3(b) = s∈b !#4)/.3P (s) . /N THE OTHER HAND IF AN ILLEGAL ACTION MIGHT BE THE END OF THE WORLD IT IS SAFER TO ALLOW ONLY THE LQWHUVHFWLRQ THAT IS THE SET OF ACTIONS LEGAL IN DOO THE STATES OR THE VACUUM WORLD EVERY STATE HAS THE SAME LEGAL ACTIONS SO BOTH METHODS GIVE THE SAME RESULT • 7UDQVLWLRQ PRGHO 4HE AGENT DOESNT KNOW WHICH STATE IN THE BELIEF STATE IS THE RIGHT ONE SO AS FAR AS IT KNOWS IT MIGHT GET TO ANY OF THE STATES RESULTING FROM APPLYING THE ACTION TO ONE OF THE PHYSICAL STATES IN THE BELIEF STATE OR DETERMINISTIC ACTIONS THE SET OF STATES THAT MIGHT BE REACHED IS b = 2%35,4(b, a) = {s : s = 2%35,4P (s, a) AND s ∈ b} . 7ITH DETERMINISTIC ACTIONS b IS NEVER LARGER THAN b 7ITH NONDETERMINISM WE HAVE b = 2%35,4(b, a) = {s : s ∈ 2%35,43P (s, a) AND s ∈ b} = s∈b 2%35,43P (s, a) , WHICH MAY BE LARGER THAN b AS SHOWN IN IGURE 4HE PROCESS OF GENERATING THE NEW BELIEF STATE AFTER THE ACTION IS CALLED THE SUHGLFWLRQ STEP THE NOTATION b = PREDICTION 02%$)#4P (b, a) WILL COME IN HANDY • *RDO WHVW 4HE AGENT WANTS A PLAN THAT IS SURE TO WORK WHICH MEANS THAT A BELIEF STATE SATISlES THE GOAL ONLY IF DOO THE PHYSICAL STATES IN IT SATISFY '/!, 4%34P 4HE AGENT MAY DFFLGHQWDOO ACHIEVE THE GOAL EARLIER BUT IT WONT NQRZ THAT IT HAS DONE SO • 3DWK FRVW 4HIS IS ALSO TRICKY )F THE SAME ACTION CAN HAVE DIFFERENT COSTS IN DIFFERENT STATES THEN THE COST OF TAKING AN ACTION IN A GIVEN BELIEF STATE COULD BE ONE OF SEVERAL VALUES 4HIS GIVES RISE TO A NEW CLASS OF PROBLEMS WHICH WE EXPLORE IN %XERCISE OR NOW WE ASSUME THAT THE COST OF AN ACTION IS THE SAME IN ALL STATES AND SO CAN BE TRANSFERRED DIRECTLY FROM THE UNDERLYING PHYSICAL PROBLEM
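The sensorless belief-state formulation above can be sketched in Python as a few helpers over an underlying physical problem; the parameter names (actions_p, results_p, goal_test_p) stand for the physical problem's ACTIONS, RESULTS, and GOAL-TEST and are illustrative. For a deterministic problem, results_p can simply return a one-element set.

```python
def belief_actions(b, actions_p):
    """Union of the actions legal in any physical state of belief state b
    (safe when illegal actions have no effect on the environment)."""
    return set().union(*(actions_p(s) for s in b))

def predict(b, a, results_p):
    """PREDICT(b, a): the union over s in b of RESULTS_P(s, a)."""
    return frozenset().union(*(results_p(s, a) for s in b))

def belief_goal_test(b, goal_test_p):
    """A belief state satisfies the goal only if *all* its physical states do."""
    return all(goal_test_p(s) for s in b)
```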
  • 159. #HAPTER EYOND #LASSICAL 3EARCH 2 4 1 3 2 4 1 3 1 3 B A )LJXUH A 0REDICTING THE NEXT BELIEF STATE FOR THE SENSORLESS VACUUM WORLD WITH A DETERMINISTIC ACTION Right B 0REDICTION FOR THE SAME BELIEF STATE AND ACTION IN THE SLIPPERY VERSION OF THE SENSORLESS VACUUM WORLD IGURE SHOWS THE REACHABLE BELIEF STATE SPACE FOR THE DETERMINISTIC SENSORLESS VACUUM WORLD 4HERE ARE ONLY REACHABLE BELIEF STATES OUT OF 28 = 256 POSSIBLE BELIEF STATES 4HE PRECEDING DElNITIONS ENABLE THE AUTOMATIC CONSTRUCTION OF THE BELIEF STATE PROBLEM FORMULATION FROM THE DElNITION OF THE UNDERLYING PHYSICAL PROBLEM /NCE THIS IS DONE WE CAN APPLY ANY OF THE SEARCH ALGORITHMS OF #HAPTER )N FACT WE CAN DO A LITTLE BIT MORE THAN THAT )N hORDINARYv GRAPH SEARCH NEWLY GENERATED STATES ARE TESTED TO SEE IF THEY ARE IDENTICAL TO EXISTING STATES 4HIS WORKS FOR BELIEF STATES TOO FOR EXAMPLE IN IGURE THE ACTION SEQUENCE ;6XFN /HIW 6XFN= STARTING AT THE INITIAL STATE REACHES THE SAME BELIEF STATE AS ;5LJKW /HIW 6XFN= NAMELY {5, 7} .OW CONSIDER THE BELIEF STATE REACHED BY ;/HIW= NAMELY {1, 3, 5, 7} /BVIOUSLY THIS IS NOT IDENTICAL TO {5, 7} BUT IT IS A VXSHUVHW )T IS EASY TO PROVE %XERCISE THAT IF AN ACTION SEQUENCE IS A SOLUTION FOR A BELIEF STATE b IT IS ALSO A SOLUTION FOR ANY SUBSET OF b (ENCE WE CAN DISCARD A PATH REACHING {1, 3, 5, 7} IF {5, 7} HAS ALREADY BEEN GENERATED #ONVERSELY IF {1, 3, 5, 7} HAS ALREADY BEEN GENERATED AND FOUND TO BE SOLVABLE THEN ANY VXEVHW SUCH AS {5, 7} IS GUARANTEED TO BE SOLVABLE 4HIS EXTRA LEVEL OF PRUNING MAY DRAMATICALLY IMPROVE THE EFlCIENCY OF SENSORLESS PROBLEM SOLVING %VEN WITH THIS IMPROVEMENT HOWEVER SENSORLESS PROBLEM SOLVING AS WE HAVE DESCRIBED IT IS SELDOM FEASIBLE IN PRACTICE 4HE DIFlCULTY IS NOT SO MUCH THE VASTNESS OF THE BELIEF STATE SPACEˆEVEN THOUGH IT IS EXPONENTIALLY LARGER THAN THE UNDERLYING PHYSICAL STATE SPACE IN MOST CASES THE BRANCHING FACTOR AND SOLUTION LENGTH IN THE BELIEF STATE SPACE AND PHYSICAL STATE SPACE ARE NOT SO DIFFERENT 4HE REAL DIFlCULTY LIES WITH THE SIZE OF EACH BELIEF STATE OR EXAMPLE THE INITIAL BELIEF STATE FOR THE 10 × 10 VACUUM WORLD CONTAINS 100 × 2100 OR AROUND 1032 PHYSICAL STATESˆFAR TOO MANY IF WE USE THE ATOMIC REPRESENTATION WHICH IS AN EXPLICIT LIST OF STATES /NE SOLUTION IS TO REPRESENT THE BELIEF STATE BY SOME MORE COMPACT DESCRIPTION )N %NGLISH WE COULD SAY THE AGENT KNOWS h.OTHINGv IN THE INITIAL STATE AFTER MOVING /HIW WE COULD SAY h.OT IN THE RIGHTMOST COLUMN v AND SO ON #HAPTER EXPLAINS HOW TO DO THIS IN A FORMAL REPRESENTATION SCHEME !NOTHER APPROACH IS TO AVOID THE STANDARD SEARCH ALGORITHMS WHICH TREAT BELIEF STATES AS BLACK BOXES JUST LIKE ANY OTHER PROBLEM STATE )NSTEAD WE CAN LOOK
  • 160. 3ECTION 3EARCHING WITH 0ARTIAL /BSERVATIONS , 2 3 , 2 3 , 2 3 , 2 3 , 2 3 , 2 3 , 2 3 1 1 3 5 7 2 4 6 8 2 3 4 5 6 7 8 4 5 7 8 5 3 7 6 4 8 4 8 5 7 6 8 8 7 3 7 )LJXUH 4HE REACHABLE PORTION OF THE BELIEF STATE SPACE FOR THE DETERMINISTIC SENSOR LESS VACUUM WORLD %ACH SHADED BOX CORRESPONDS TO A SINGLE BELIEF STATE !T ANY GIVEN POINT THE AGENT IS IN A PARTICULAR BELIEF STATE BUT DOES NOT KNOW WHICH PHYSICAL STATE IT IS IN 4HE INITIAL BELIEF STATE COMPLETE IGNORANCE IS THE TOP CENTER BOX !CTIONS ARE REPRESENTED BY LABELED LINKS 3ELF LOOPS ARE OMITTED FOR CLARITY LQVLGH THE BELIEF STATES AND DEVELOP LQFUHPHQWDO EHOLHIVWDWH VHDUFK ALGORITHMS THAT BUILD UP INCREMENTAL BELIEF-STATE SEARCH THE SOLUTION ONE PHYSICAL STATE AT A TIME OR EXAMPLE IN THE SENSORLESS VACUUM WORLD THE INITIAL BELIEF STATE IS {1, 2, 3, 4, 5, 6, 7, 8} AND WE HAVE TO lND AN ACTION SEQUENCE THAT WORKS IN ALL STATES 7E CAN DO THIS BY lRST lNDING A SOLUTION THAT WORKS FOR STATE THEN WE CHECK IF IT WORKS FOR STATE IF NOT GO BACK AND lND A DIFFERENT SOLUTION FOR STATE AND SO ON *UST AS AN !.$n/2 SEARCH HAS TO lND A SOLUTION FOR EVERY BRANCH AT AN !.$ NODE THIS ALGORITHM HAS TO lND A SOLUTION FOR EVERY STATE IN THE BELIEF STATE THE DIFFERENCE IS THAT !.$n/2 SEARCH CAN lND A DIFFERENT SOLUTION FOR EACH BRANCH WHEREAS AN INCREMENTAL BELIEF STATE SEARCH HAS TO lND RQH SOLUTION THAT WORKS FOR DOO THE STATES 4HE MAIN ADVANTAGE OF THE INCREMENTAL APPROACH IS THAT IT IS TYPICALLY ABLE TO DETECT FAILURE QUICKLYˆWHEN A BELIEF STATE IS UNSOLVABLE IT IS USUALLY THE CASE THAT A SMALL SUBSET OF THE BELIEF STATE CONSISTING OF THE lRST FEW STATES EXAMINED IS ALSO UNSOLVABLE )N SOME CASES
  • 161. #HAPTER EYOND #LASSICAL 3EARCH THIS LEADS TO A SPEEDUP PROPORTIONAL TO THE SIZE OF THE BELIEF STATES WHICH MAY THEMSELVES BE AS LARGE AS THE PHYSICAL STATE SPACE ITSELF %VEN THE MOST EFlCIENT SOLUTION ALGORITHM IS NOT OF MUCH USE WHEN NO SOLUTIONS EXIST -ANY THINGS JUST CANNOT BE DONE WITHOUT SENSING OR EXAMPLE THE SENSORLESS PUZZLE IS IMPOSSIBLE /N THE OTHER HAND A LITTLE BIT OF SENSING CAN GO A LONG WAY OR EXAMPLE EVERY PUZZLE INSTANCE IS SOLVABLE IF JUST ONE SQUARE IS VISIBLEˆTHE SOLUTION INVOLVES MOVING EACH TILE IN TURN INTO THE VISIBLE SQUARE AND THEN KEEPING TRACK OF ITS LOCATION 6HDUFKLQJ ZLWK REVHUYDWLRQV OR A GENERAL PARTIALLY OBSERVABLE PROBLEM WE HAVE TO SPECIFY HOW THE ENVIRONMENT GENERATES PERCEPTS FOR THE AGENT OR EXAMPLE WE MIGHT DElNE THE LOCAL SENSING VACUUM WORLD TO BE ONE IN WHICH THE AGENT HAS A POSITION SENSOR AND A LOCAL DIRT SENSOR BUT HAS NO SENSOR CAPABLE OF DETECTING DIRT IN OTHER SQUARES 4HE FORMAL PROBLEM SPECIlCATION INCLUDES A 0%2#%04(s) FUNCTION THAT RETURNS THE PERCEPT RECEIVED IN A GIVEN STATE )F SENSING IS NONDETERMINISTIC THEN WE USE A 0%2#%043 FUNCTION THAT RETURNS A SET OF POSSIBLE PERCEPTS OR EXAMPLE IN THE LOCAL SENSING VACUUM WORLD THE 0%2#%04 IN STATE IS [A, Dirty] ULLY OBSERVABLE PROBLEMS ARE A SPECIAL CASE IN WHICH 0%2#%04(s) = s FOR EVERY STATE s WHILE SENSORLESS PROBLEMS ARE A SPECIAL CASE IN WHICH 0%2#%04(s) = null 7HEN OBSERVATIONS ARE PARTIAL IT WILL USUALLY BE THE CASE THAT SEVERAL STATES COULD HAVE PRODUCED ANY GIVEN PERCEPT OR EXAMPLE THE PERCEPT [A, Dirty] IS PRODUCED BY STATE AS WELL AS BY STATE (ENCE GIVEN THIS AS THE INITIAL PERCEPT THE INITIAL BELIEF STATE FOR THE LOCAL SENSING VACUUM WORLD WILL BE {1, 3} 4HE !#4)/.3 34%0 #/34 AND '/!, 4%34 ARE CONSTRUCTED FROM THE UNDERLYING PHYSICAL PROBLEM JUST AS FOR SENSORLESS PROBLEMS BUT THE TRANSITION MODEL IS A BIT MORE COMPLICATED 7E CAN THINK OF TRANSITIONS FROM ONE BELIEF STATE TO THE NEXT FOR A PARTICULAR ACTION AS OCCURRING IN THREE STAGES AS SHOWN IN IGURE • 4HE SUHGLFWLRQ STAGE IS THE SAME AS FOR SENSORLESS PROBLEMS GIVEN THE ACTION a IN BELIEF STATE b THE PREDICTED BELIEF STATE IS b̂ = 02%$)#4(b, a) • 4HE REVHUYDWLRQ SUHGLFWLRQ STAGE DETERMINES THE SET OF PERCEPTS o THAT COULD BE OB SERVED IN THE PREDICTED BELIEF STATE 0/33),% 0%2#%043(b̂) = {o : o = 0%2#%04(s) AND s ∈ b̂} . • 4HE XSGDWH STAGE DETERMINES FOR EACH POSSIBLE PERCEPT THE BELIEF STATE THAT WOULD RESULT FROM THE PERCEPT 4HE NEW BELIEF STATE bo IS JUST THE SET OF STATES IN b̂ THAT COULD HAVE PRODUCED THE PERCEPT bo = 50$!4%(b̂, o) = {s : o = 0%2#%04(s) AND s ∈ b̂} . .OTICE THAT EACH UPDATED BELIEF STATE bo CAN BE NO LARGER THAN THE PREDICTED BELIEF STATE b̂ OBSERVATIONS CAN ONLY HELP REDUCE UNCERTAINTY COMPARED TO THE SENSORLESS CASE -ORE OVER FOR DETERMINISTIC SENSING THE BELIEF STATES FOR THE DIFFERENT POSSIBLE PERCEPTS WILL BE DISJOINT FORMING A SDUWLWLRQ OF THE ORIGINAL PREDICTED BELIEF STATE 11 (ERE AND THROUGHOUT THE BOOK THE hHATv IN b̂ MEANS AN ESTIMATED OR PREDICTED VALUE FOR b
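The prediction, observation-prediction, and update stages, and their combination into RESULTS(b, a), can be sketched as follows, reusing predict from the sensorless sketch. Here percept_fn stands for the problem's PERCEPT function, and percepts are assumed to be hashable values such as ('B', 'Dirty').

```python
def possible_percepts(b_hat, percept_fn):
    """The percepts that could be observed in the predicted belief state b_hat."""
    return {percept_fn(s) for s in b_hat}

def update(b_hat, o, percept_fn):
    """UPDATE(b_hat, o): the states in b_hat that could have produced percept o."""
    return frozenset(s for s in b_hat if percept_fn(s) == o)

def results_bs(b, a, results_p, percept_fn):
    """RESULTS(b, a) for a partially observable problem: one successor belief
    state per possible percept, combining the three stages above."""
    b_hat = predict(b, a, results_p)
    return {update(b_hat, o, percept_fn) for o in possible_percepts(b_hat, percept_fn)}
```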
  • 162. 3ECTION 3EARCHING WITH 0ARTIAL /BSERVATIONS 2 4 4 1 2 4 1 3 2 1 3 3 B A 4 2 1 3 Right [A,Dirty] [B,Dirty] [B,Clean] Right [B,Dirty] [B,Clean] )LJXUH 4WO EXAMPLE OF TRANSITIONS IN LOCAL SENSING VACUUM WORLDS A )N THE DE TERMINISTIC WORLD 5LJKW IS APPLIED IN THE INITIAL BELIEF STATE RESULTING IN A NEW BELIEF STATE WITH TWO POSSIBLE PHYSICAL STATES FOR THOSE STATES THE POSSIBLE PERCEPTS ARE [B, Dirty] AND [B, Clean] LEADING TO TWO BELIEF STATES EACH OF WHICH IS A SINGLETON B )N THE SLIPPERY WORLD 5LJKW IS APPLIED IN THE INITIAL BELIEF STATE GIVING A NEW BELIEF STATE WITH FOUR PHYSI CAL STATES FOR THOSE STATES THE POSSIBLE PERCEPTS ARE [A, Dirty] [B, Dirty] AND [B, Clean] LEADING TO THREE BELIEF STATES AS SHOWN 0UTTING THESE THREE STAGES TOGETHER WE OBTAIN THE POSSIBLE BELIEF STATES RESULTING FROM A GIVEN ACTION AND THE SUBSEQUENT POSSIBLE PERCEPTS 2%35,43(b, a) = {bo : bo = 50$!4%(02%$)#4(b, a), o) AND o ∈ 0/33),% 0%2#%043(02%$)#4(b, a))} . !GAIN THE NONDETERMINISM IN THE PARTIALLY OBSERVABLE PROBLEM COMES FROM THE INABILITY TO PREDICT EXACTLY WHICH PERCEPT WILL BE RECEIVED AFTER ACTING UNDERLYING NONDETERMINISM IN THE PHYSICAL ENVIRONMENT MAY FRQWULEXWH TO THIS INABILITY BY ENLARGING THE BELIEF STATE AT THE PREDICTION STAGE LEADING TO MORE PERCEPTS AT THE OBSERVATION STAGE 6ROYLQJ SDUWLDOO REVHUYDEOH SUREOHPV 4HE PRECEDING SECTION SHOWED HOW TO DERIVE THE 2%35,43 FUNCTION FOR A NONDETERMINISTIC BELIEF STATE PROBLEM FROM AN UNDERLYING PHYSICAL PROBLEM AND THE 0%2#%04 FUNCTION 'IVEN
  • 163. #HAPTER EYOND #LASSICAL 3EARCH 7 5 1 3 4 2 Suck [B,Dirty] [B,Clean] Right [A,Clean] )LJXUH 4HE lRST LEVEL OF THE !.$n/2 SEARCH TREE FOR A PROBLEM IN THE LOCAL SENSING VACUUM WORLD Suck IS THE lRST STEP OF THE SOLUTION SUCH A FORMULATION THE !.$n/2 SEARCH ALGORITHM OF IGURE CAN BE APPLIED DIRECTLY TO DERIVE A SOLUTION IGURE SHOWS PART OF THE SEARCH TREE FOR THE LOCAL SENSING VACUUM WORLD ASSUMING AN INITIAL PERCEPT [A, Dirty] 4HE SOLUTION IS THE CONDITIONAL PLAN [6XFN, 5LJKW, LI Bstate = {6} WKHQ 6XFN HOVH [ ]] . .OTICE THAT BECAUSE WE SUPPLIED A BELIEF STATE PROBLEM TO THE !.$n/2 SEARCH ALGORITHM IT RETURNED A CONDITIONAL PLAN THAT TESTS THE BELIEF STATE RATHER THAN THE ACTUAL STATE 4HIS IS AS IT SHOULD BE IN A PARTIALLY OBSERVABLE ENVIRONMENT THE AGENT WONT BE ABLE TO EXECUTE A SOLUTION THAT REQUIRES TESTING THE ACTUAL STATE !S IN THE CASE OF STANDARD SEARCH ALGORITHMS APPLIED TO SENSORLESS PROBLEMS THE !.$n /2 SEARCH ALGORITHM TREATS BELIEF STATES AS BLACK BOXES JUST LIKE ANY OTHER STATES /NE CAN IMPROVE ON THIS BY CHECKING FOR PREVIOUSLY GENERATED BELIEF STATES THAT ARE SUBSETS OR SUPERSETS OF THE CURRENT STATE JUST AS FOR SENSORLESS PROBLEMS /NE CAN ALSO DERIVE INCREMENTAL SEARCH ALGORITHMS ANALOGOUS TO THOSE DESCRIBED FOR SENSORLESS PROBLEMS THAT PROVIDE SUBSTANTIAL SPEEDUPS OVER THE BLACK BOX APPROACH $Q DJHQW IRU SDUWLDOO REVHUYDEOH HQYLURQPHQWV 4HE DESIGN OF A PROBLEM SOLVING AGENT FOR PARTIALLY OBSERVABLE ENVIRONMENTS IS QUITE SIMILAR TO THE SIMPLE PROBLEM SOLVING AGENT IN IGURE THE AGENT FORMULATES A PROBLEM CALLS A SEARCH ALGORITHM SUCH AS !.$ /2 '2!0( 3%!2#( TO SOLVE IT AND EXECUTES THE SOLUTION 4HERE ARE TWO MAIN DIFFERENCES IRST THE SOLUTION TO A PROBLEM WILL BE A CONDITIONAL PLAN RATHER THAN A SEQUENCE IF THE lRST STEP IS AN IFnTHENnELSE EXPRESSION THE AGENT WILL NEED TO TEST THE CONDITION IN THE IF PART AND EXECUTE THE THEN PART OR THE ELSE PART ACCORDINGLY 3ECOND THE AGENT WILL NEED TO MAINTAIN ITS BELIEF STATE AS IT PERFORMS ACTIONS AND RECEIVES PERCEPTS 4HIS PROCESS RESEMBLES THE PREDICTIONnOBSERVATIONnUPDATE PROCESS IN %QUATION BUT IS ACTUALLY SIMPLER BECAUSE THE PERCEPT IS GIVEN BY THE ENVIRONMENT RATHER THAN CALCULATED BY THE
  • 164. 3ECTION 3EARCHING WITH 0ARTIAL /BSERVATIONS 7 5 6 2 1 3 6 4 8 2 [B,Dirty] Right [A,Clean] 7 5 Suck )LJXUH 4WO PREDICTIONnUPDATE CYCLES OF BELIEF STATE MAINTENANCE IN THE KINDERGARTEN VACUUM WORLD WITH LOCAL SENSING AGENT 'IVEN AN INITIAL BELIEF STATE b AN ACTION a AND A PERCEPT o THE NEW BELIEF STATE IS b = 50$!4%(02%$)#4(b, a), o) . IGURE SHOWS THE BELIEF STATE BEING MAINTAINED IN THE NLQGHUJDUWHQ VACUUM WORLD WITH LOCAL SENSING WHEREIN ANY SQUARE MAY BECOME DIRTY AT ANY TIME UNLESS THE AGENT IS ACTIVELY CLEANING IT AT THAT MOMENT )N PARTIALLY OBSERVABLE ENVIRONMENTSˆWHICH INCLUDE THE VAST MAJORITY OF REAL WORLD ENVIRONMENTSˆMAINTAINING ONES BELIEF STATE IS A CORE FUNCTION OF ANY INTELLIGENT SYSTEM 4HIS FUNCTION GOES UNDER VARIOUS NAMES INCLUDING PRQLWRULQJ ¿OWHULQJ AND VWDWH HVWLPD MONITORING FILTERING WLRQ %QUATION IS CALLED A UHFXUVLYH STATE ESTIMATOR BECAUSE IT COMPUTES THE NEW BELIEF STATE ESTIMATION RECURSIVE STATE FROM THE PREVIOUS ONE RATHER THAN BY EXAMINING THE ENTIRE PERCEPT SEQUENCE )F THE AGENT IS NOT TO hFALL BEHIND v THE COMPUTATION HAS TO HAPPEN AS FAST AS PERCEPTS ARE COMING IN !S THE ENVIRONMENT BECOMES MORE COMPLEX THE EXACT UPDATE COMPUTATION BECOMES INFEASIBLE AND THE AGENT WILL HAVE TO COMPUTE AN APPROXIMATE BELIEF STATE PERHAPS FOCUSING ON THE IM PLICATIONS OF THE PERCEPT FOR THE ASPECTS OF THE ENVIRONMENT THAT ARE OF CURRENT INTEREST -OST WORK ON THIS PROBLEM HAS BEEN DONE FOR STOCHASTIC CONTINUOUS STATE ENVIRONMENTS WITH THE TOOLS OF PROBABILITY THEORY AS EXPLAINED IN #HAPTER (ERE WE WILL SHOW AN EXAMPLE IN A DISCRETE ENVIRONMENT WITH DETRMINISTIC SENSORS AND NONDETERMINISTIC ACTIONS 4HE EXAMPLE CONCERNS A ROBOT WITH THE TASK OF ORFDOL]DWLRQ WORKING OUT WHERE IT IS LOCALIZATION GIVEN A MAP OF THE WORLD AND A SEQUENCE OF PERCEPTS AND ACTIONS /UR ROBOT IS PLACED IN THE MAZE LIKE ENVIRONMENT OF IGURE 4HE ROBOT IS EQUIPPED WITH FOUR SONAR SENSORS THAT TELL WHETHER THERE IS AN OBSTACLEˆTHE OUTER WALL OR A BLACK SQUARE IN THE lGUREˆIN EACH OF THE FOUR COMPASS DIRECTIONS 7E ASSUME THAT THE SENSORS GIVE PERFECTLY CORRECT DATA AND THAT THE ROBOT HAS A CORRECT MAP OF THE ENVIORNMENT UT UNFORTUNATELY THE ROBOTS NAVIGATIONAL SYSTEM IS BROKEN SO WHEN IT EXECUTES A Move ACTION IT MOVES RANDOMLY TO ONE OF THE ADJACENT SQUARES 4HE ROBOTS TASK IS TO DETERMINE ITS CURRENT LOCATION 3UPPOSE THE ROBOT HAS JUST BEEN SWITCHED ON SO IT DOES NOT KNOW WHERE IT IS 4HUS ITS INITIAL BELIEF STATE b CONSISTS OF THE SET OF ALL LOCATIONS 4HE THE ROBOT RECEIVES THE PERCEPT 12 4HE USUAL APOLOGIES TO THOSE WHO ARE UNFAMILIAR WITH THE EFFECT OF SMALL CHILDREN ON THE ENVIRONMENT
NSW, meaning there are obstacles to the north, west, and south, and does an update using the equation bo = UPDATE(b, NSW), yielding the locations shown in (a) of the figure below. You can inspect the maze to see that those are the only four locations that yield the percept NSW.
Next the robot executes a Move action, but the result is nondeterministic. The new belief state, ba = PREDICT(bo, Move), contains all the locations that are one step away from the locations in bo. When the second percept, NS, arrives, the robot does UPDATE(ba, NS) and finds that the belief state has collapsed down to the single location shown in (b) of the figure below. That's the only location that could be the result of

    UPDATE(PREDICT(UPDATE(b, NSW), Move), NS).

Figure: Possible positions of the robot, (a) after one observation, E1 = NSW, and (b) after a second observation, E2 = NS. When sensors are noiseless and the transition model is accurate, there are no other possible locations for the robot consistent with this sequence of two observations.

With nondeterministic actions the PREDICT step grows the belief state, but the UPDATE step shrinks it back down—as long as the percepts provide some useful identifying information. Sometimes the percepts don't help much for localization: if there were one or more long east–west corridors, then a robot could receive a long sequence of NS percepts, but never know where in the corridors it was.
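To make the prediction–update cycle concrete, here is a small Python sketch of belief-state localization in a grid maze of the kind just described. It is an illustrative reconstruction rather than code from the text: the map representation (a set of free squares), the helper names, and the assumption that a Move action goes to a uniformly chosen adjacent free square are all ours.

    # Belief-state localization sketch: b' = UPDATE(PREDICT(b, Move), o).
    # A map is a set of free (x, y) squares; a percept is a 4-tuple of booleans,
    # one per compass direction, True if an obstacle is adjacent in that direction.

    DIRS = {'N': (0, 1), 'S': (0, -1), 'E': (1, 0), 'W': (-1, 0)}

    def percept_at(square, free_squares):
        # The (noiseless) sonar percept the robot would receive at this square.
        x, y = square
        return tuple((x + dx, y + dy) not in free_squares for dx, dy in DIRS.values())

    def predict(belief, free_squares):
        # PREDICT(b, Move): every square reachable by one random move from b.
        new_belief = set()
        for (x, y) in belief:
            for dx, dy in DIRS.values():
                nxt = (x + dx, y + dy)
                if nxt in free_squares:
                    new_belief.add(nxt)
        return new_belief

    def update(belief, observation, free_squares):
        # UPDATE(b, o): keep only the squares consistent with the observed percept.
        return {s for s in belief if percept_at(s, free_squares) == observation}

    def localize(percepts, free_squares):
        # Recursive state estimation, starting from complete ignorance
        # (the set of all free squares), with a Move between consecutive percepts.
        belief = update(set(free_squares), percepts[0], free_squares)
        for o in percepts[1:]:
            belief = update(predict(belief, free_squares), o, free_squares)
        return belief

Given the maze's set of free squares, calling localize with the two-percept sequence from the example should reproduce the collapse from four candidate locations to one.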
ONLINE SEARCH AGENTS AND UNKNOWN ENVIRONMENTS

So far we have concentrated on agents that use offline search algorithms. They compute a complete solution before setting foot in the real world and then execute the solution. In contrast, an online search agent interleaves computation and action: first it takes an action, then it observes the environment and computes the next action. Online search is a good idea in dynamic or semidynamic domains—domains where there is a penalty for sitting around and computing too long. Online search is also helpful in nondeterministic domains because it allows the agent to focus its computational efforts on the contingencies that actually arise rather than those that might happen but probably won't. Of course, there is a tradeoff: the more an agent plans ahead, the less often it will find itself up the creek without a paddle.
Online search is a necessary idea for unknown environments, where the agent does not know what states exist or what its actions do. In this state of ignorance, the agent faces an exploration problem and must use its actions as experiments in order to learn enough to make deliberation worthwhile.
The canonical example of online search is a robot that is placed in a new building and must explore it to build a map that it can use for getting from A to B. Methods for escaping from labyrinths—required knowledge for aspiring heroes of antiquity—are also examples of online search algorithms. Spatial exploration is not the only form of exploration, however. Consider a newborn baby: it has many possible actions but knows the outcomes of none of them, and it has experienced only a few of the possible states that it can reach. The baby's gradual discovery of how the world works is in part an online search process.

Online search problems

An online search problem must be solved by an agent executing actions, rather than by pure computation. We assume a deterministic and fully observable environment (a later chapter relaxes these assumptions), but we stipulate that the agent knows only the following:
• ACTIONS(s), which returns a list of actions allowed in state s;
• The step-cost function c(s, a, s′)—note that this cannot be used until the agent knows that s′ is the outcome; and
• GOAL-TEST(s).
Note in particular that the agent cannot determine RESULT(s, a) except by actually being in s and doing a. For example, in the maze problem shown in the figure below, the agent does not know that going Up from its start square leads to the square above it; nor, having done that, does it know that going Down will take it back. This degree of ignorance can be reduced in some applications—for example, a robot explorer might know how its movement actions work and be ignorant only of the locations of obstacles.
13 The term "online" is commonly used in computer science to refer to algorithms that must process input data as they are received rather than waiting for the entire input data set to become available.
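The three items above can be read as the interface the agent is given. The sketch below is one hypothetical way to package them in code; the class name OnlineMazeProblem and the idea of a step method that reveals the outcome state and step cost only after the action is executed are our own illustration, not anything defined in the text.

    class OnlineMazeProblem:
        """A hypothetical online maze problem. The agent may call actions() and
        goal_test(), but it learns RESULT(s, a) and the step cost c(s, a, s')
        only by executing step(a) in the real environment."""
        MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}

        def __init__(self, free_squares, start, goal):
            self.free, self.state, self.goal = set(free_squares), start, goal

        def actions(self, s):                     # ACTIONS(s)
            return [a for a, (dx, dy) in self.MOVES.items()
                    if (s[0] + dx, s[1] + dy) in self.free]

        def goal_test(self, s):                   # GOAL-TEST(s)
            return s == self.goal

        def step(self, a):
            # Execute action a; only now does the agent observe the outcome.
            dx, dy = self.MOVES[a]
            self.state = (self.state[0] + dx, self.state[1] + dy)
            return self.state, 1                  # percept s' and step cost c(s, a, s')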
Figure: A simple maze problem. The agent starts at S and must reach G but knows nothing of the environment.

Figure: (a) Two state spaces that might lead an online search agent into a dead end. Any given agent will fail in at least one of these spaces. (b) A two-dimensional environment that can cause an online search agent to follow an arbitrarily inefficient route to the goal. Whichever choice the agent makes, the adversary blocks that route with another long, thin wall, so that the path followed is much longer than the best possible path.

Finally, the agent might have access to an admissible heuristic function h(s) that estimates the distance from the current state to a goal state. For example, in the maze of the first figure above, the agent might know the location of the goal and be able to use the Manhattan-distance heuristic.
Typically, the agent's objective is to reach a goal state while minimizing cost. (Another possible objective is simply to explore the entire environment.) The cost is the total path cost of the path that the agent actually travels. It is common to compare this cost with the path cost of the path the agent would follow if it knew the search space in advance—that is, the actual shortest path (or shortest complete exploration). In the language of online algorithms, this is called the competitive ratio; we would like it to be as small as possible.
Although this sounds like a reasonable request, it is easy to see that the best achievable competitive ratio is infinite in some cases. For example, if some actions are irreversible—i.e., they lead to a state from which no action leads back to the previous state—the online search might accidentally reach a dead-end state from which no goal state is reachable. Perhaps the term "accidentally" is unconvincing—after all, there might be an algorithm that happens not to take the dead-end path as it explores. Our claim, to be more precise, is that no algorithm can avoid dead ends in all state spaces. Consider the two dead-end state spaces in (a) of the figure above. To an online search algorithm that has visited states S and A, the two state spaces look identical, so it must make the same decision in both. Therefore, it will fail in one of them. This is an example of an adversary argument—we can imagine an adversary constructing the state space while the agent explores it and putting the goals and dead ends wherever it chooses.
Dead ends are a real difficulty for robot exploration—staircases, ramps, cliffs, one-way streets, and all kinds of natural terrain present opportunities for irreversible actions. To make progress, we simply assume that the state space is safely explorable—that is, some goal state is reachable from every reachable state. State spaces with reversible actions, such as mazes and puzzles, can be viewed as undirected graphs and are clearly safely explorable.
Even in safely explorable environments, no bounded competitive ratio can be guaranteed if there are paths of unbounded cost. This is easy to show in environments with irreversible actions, but in fact it remains true for the reversible case as well, as (b) of the figure above shows. For this reason, it is common to describe the performance of online search algorithms in terms of the size of the entire state space rather than just the depth of the shallowest goal.

Online search agents

After each action, an online agent receives a percept telling it what state it has reached; from this information, it can augment its map of the environment. The current map is used to decide where to go next. This interleaving of planning and action means that online search algorithms are quite different from the offline search algorithms we have seen previously. For example, offline algorithms such as A∗ can expand a node in one part of the space and then immediately expand a node in another part of the space, because node expansion involves simulated rather than real actions. An online algorithm, on the other hand, can discover successors only for a node that it physically occupies. To avoid traveling all the way across the tree to expand the next node, it seems better to expand nodes in a local order. Depth-first search has exactly this property, because (except when backtracking) the next node expanded is a child of the previous node expanded.
An online depth-first search agent is shown in the figure below. This agent stores its map in a table, RESULT[s, a], that records the state resulting from executing action a in state s. Whenever an action from the current state has not been explored, the agent tries that action. The difficulty comes when the agent has tried all the actions in a state. In offline depth-first search, the state is simply dropped from the queue; in an online search, the agent has to backtrack physically. In depth-first search, this means going back to the state from which the agent most recently entered the current state.
To achieve that, the algorithm keeps a table that lists, for each state, the predecessor states to which the agent has not yet backtracked. If the agent has run out of states to which it can backtrack, then its search is complete.
    function ONLINE-DFS-AGENT(s′) returns an action
      inputs: s′, a percept that identifies the current state
      persistent: result, a table indexed by state and action, initially empty
                  untried, a table that lists, for each state, the actions not yet tried
                  unbacktracked, a table that lists, for each state, the backtracks not yet tried
                  s, a, the previous state and action, initially null

      if GOAL-TEST(s′) then return stop
      if s′ is a new state (not in untried) then untried[s′] ← ACTIONS(s′)
      if s is not null then
          result[s, a] ← s′
          add s to the front of unbacktracked[s′]
      if untried[s′] is empty then
          if unbacktracked[s′] is empty then return stop
          else a ← an action b such that result[s′, b] = POP(unbacktracked[s′])
      else a ← POP(untried[s′])
      s ← s′
      return a

Figure: An online search agent that uses depth-first exploration. The agent is applicable only in state spaces in which every action can be "undone" by some other action.

We recommend that the reader trace through the progress of ONLINE-DFS-AGENT when applied to the maze given in the earlier figure. It is fairly easy to see that the agent will, in the worst case, end up traversing every link in the state space exactly twice. For exploration, this is optimal; for finding a goal, on the other hand, the agent's competitive ratio could be arbitrarily bad if it goes off on a long excursion when there is a goal right next to the initial state. An online variant of iterative deepening solves this problem; for an environment that is a uniform tree, the competitive ratio of such an agent is a small constant.
Because of its method of backtracking, ONLINE-DFS-AGENT works only in state spaces where the actions are reversible. There are slightly more complex algorithms that work in general state spaces, but no such algorithm has a bounded competitive ratio.
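A Python rendering of this agent might look as follows. It is a sketch, not the book's code, and it departs from the pseudocode in one place: rather than searching the result table for an action whose recorded outcome is the predecessor, it assumes a reverse map giving the inverse of each action (Up/Down, Left/Right), which is one way of making the "every action can be undone" assumption explicit.

    class OnlineDFSAgent:
        """Online depth-first exploration with physical backtracking (a sketch)."""
        def __init__(self, actions_fn, goal_test, reverse):
            self.actions_fn, self.goal_test, self.reverse = actions_fn, goal_test, reverse
            self.result = {}           # map learned so far: result[(s, a)] = s'
            self.untried = {}          # untried[s]: actions not yet tried in s
            self.unbacktracked = {}    # unbacktracked[s]: (predecessor, action that led here)
            self.s, self.a = None, None

        def __call__(self, s1):        # s1: percept identifying the current state
            if self.goal_test(s1):
                return None            # stop
            if s1 not in self.untried:
                self.untried[s1] = list(self.actions_fn(s1))
            if self.s is not None:
                self.result[(self.s, self.a)] = s1
                self.unbacktracked.setdefault(s1, []).insert(0, (self.s, self.a))
            if not self.untried[s1]:
                if not self.unbacktracked.get(s1):
                    return None        # nowhere left to backtrack to; exploration done
                _, a_in = self.unbacktracked[s1].pop(0)
                self.a = self.reverse[a_in]   # physically walk back to the predecessor
            else:
                self.a = self.untried[s1].pop()
            self.s = s1
            return self.a

To drive it, repeatedly call the agent with the state observed after the previous action, execute whatever action it returns, and stop when it returns None.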
Online local search

Like depth-first search, hill-climbing search has the property of locality in its node expansions. In fact, because it keeps just one current state in memory, hill-climbing search is already an online search algorithm! Unfortunately, it is not very useful in its simplest form because it leaves the agent sitting at local maxima with nowhere to go. Moreover, random restarts cannot be used, because the agent cannot transport itself to a new state.
Instead of random restarts, one might consider using a random walk to explore the environment. A random walk simply selects at random one of the available actions from the current state; preference can be given to actions that have not yet been tried. It is easy to prove that a random walk will eventually find a goal or complete its exploration, provided that the space is finite. On the other hand, the process can be very slow. The figure below shows an environment in which a random walk will take exponentially many steps to find the goal, because, at each step, backward progress is twice as likely as forward progress. The example is contrived, of course, but there are many real-world state spaces whose topology causes these kinds of "traps" for random walks.

Figure: An environment in which a random walk will take exponentially many steps to find the goal.

Augmenting hill climbing with memory rather than randomness turns out to be a more effective approach. The basic idea is to store a "current best estimate" H(s) of the cost to reach the goal from each state that has been visited. H(s) starts out being just the heuristic estimate h(s) and is updated as the agent gains experience in the state space. The next figure shows a simple example in a one-dimensional state space. In (a), the agent seems to be stuck in a flat local minimum at the shaded state. Rather than staying where it is, the agent should follow what seems to be the best path to the goal given the current cost estimates for its neighbors. The estimated cost to reach the goal through a neighbor s′ is the cost to get to s′ plus the estimated cost to get to a goal from there—that is, c(s, a, s′) + H(s′). In the example, there are two actions, with estimated costs 1 + 9 and 1 + 2, so it seems best to move right. Now it is clear that the cost estimate for the shaded state was overly optimistic: since the best move cost 1 and led to a state that is at least 2 steps from a goal, the shaded state must be at least 3 steps from a goal, so its H should be updated accordingly, as shown in (b) of the next figure. Continuing this process, the agent will move back and forth twice more, updating H each time and "flattening out" the local minimum until it escapes to the right.
An agent implementing this scheme, which is called learning real-time A∗ (LRTA∗), is shown in the second figure below. Like ONLINE-DFS-AGENT, it builds a map of the environment in the result table. It updates the cost estimate for the state it has just left and then chooses the "apparently best" move according to its current cost estimates. One important detail is that actions that have not yet been tried in a state s are always assumed to lead immediately to the goal with the least possible cost, namely h(s). This optimism under uncertainty encourages the agent to explore new, possibly promising paths.
An LRTA∗ agent is guaranteed to find a goal in any finite, safely explorable environment. Unlike A∗, however, it is not complete for infinite state spaces—there are cases where it can be led infinitely astray. It can explore an environment of n states in O(n²) steps in the worst case, but often does much better.
14 Random walks are complete on infinite one-dimensional and two-dimensional grids. On a three-dimensional grid, the probability that the walk ever returns to the starting point is strictly less than one (Hughes).
Figure: Five iterations of LRTA∗ on a one-dimensional state space. Each state is labeled with H(s), the current cost estimate to reach a goal, and each link is labeled with its step cost. The shaded state marks the location of the agent, and the updated cost estimates at each iteration are circled.

    function LRTA*-AGENT(s′) returns an action
      inputs: s′, a percept that identifies the current state
      persistent: result, a table indexed by state and action, initially empty
                  H, a table of cost estimates indexed by state, initially empty
                  s, a, the previous state and action, initially null

      if GOAL-TEST(s′) then return stop
      if s′ is a new state (not in H) then H[s′] ← h(s′)
      if s is not null then
          result[s, a] ← s′
          H[s] ← min over b ∈ ACTIONS(s) of LRTA*-COST(s, b, result[s, b], H)
      a ← an action b in ACTIONS(s′) that minimizes LRTA*-COST(s′, b, result[s′, b], H)
      s ← s′
      return a

    function LRTA*-COST(s, a, s′, H) returns a cost estimate
      if s′ is undefined then return h(s)
      else return c(s, a, s′) + H[s′]

Figure: LRTA*-AGENT selects an action according to the values of neighboring states, which are updated as the agent moves about the state space.
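The same agent can be sketched in Python as follows. As with the earlier sketches, this is an illustrative reconstruction under our own naming, assuming callables for ACTIONS, GOAL-TEST, the heuristic h, and the step cost c, and a percept that identifies the current state after each move.

    class LRTAStarAgent:
        """Learning real-time A* (a sketch): keep a cost-to-goal estimate H[s] and
        always move to the neighbor with the lowest estimated total cost, assuming
        untried actions lead straight to a goal (optimism under uncertainty)."""
        def __init__(self, actions_fn, goal_test, h, c):
            self.actions_fn, self.goal_test, self.h, self.c = actions_fn, goal_test, h, c
            self.result = {}                 # result[(s, a)] = s', as learned by acting
            self.H = {}                      # current cost estimates
            self.s, self.a = None, None

        def cost(self, s, a, s1):
            # LRTA*-COST: optimistic h(s) if the outcome of a is still unknown.
            if s1 is None:
                return self.h(s)
            return self.c(s, a, s1) + self.H[s1]

        def __call__(self, s1):
            if self.goal_test(s1):
                return None                  # stop
            if s1 not in self.H:
                self.H[s1] = self.h(s1)
            if self.s is not None:
                self.result[(self.s, self.a)] = s1
                # update the estimate for the state we just left
                self.H[self.s] = min(self.cost(self.s, b, self.result.get((self.s, b)))
                                     for b in self.actions_fn(self.s))
            # choose the apparently best action from the current state
            self.a = min(self.actions_fn(s1),
                         key=lambda b: self.cost(s1, b, self.result.get((s1, b))))
            self.s = s1
            return self.a

Because untried actions are costed at the optimistic h(s), the agent is drawn toward unexplored territory, which is exactly the optimism under uncertainty described above.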
The LRTA∗ agent is just one of a large family of online agents that one can define by specifying the action selection rule and the update rule in different ways. We discuss this family, developed originally for stochastic environments, in a later chapter.

Learning in online search

The initial ignorance of online search agents provides several opportunities for learning. First, the agents learn a "map" of the environment—more precisely, the outcome of each action in each state—simply by recording each of their experiences. Notice that the assumption of deterministic environments means that one experience is enough for each action. Second, the local search agents acquire more accurate estimates of the cost of each state by using local updating rules, as in LRTA∗. In a later chapter we show that these updates eventually converge to exact values for every state, provided that the agent explores the state space in the right way. Once exact values are known, optimal decisions can be taken simply by moving to the lowest-cost successor—that is, pure hill climbing is then an optimal strategy.
If you followed our suggestion to trace the behavior of ONLINE-DFS-AGENT in the environment of the earlier maze figure, you will have noticed that the agent is not very bright. For example, after it has seen that the Up action goes from one square to the square above it, the agent still has no idea that the Down action goes back down, or that the Up action also goes up from the other squares. In general, we would like the agent to learn that Up increases the y-coordinate unless there is a wall in the way, that Down reduces it, and so on. For this to happen, we need two things. First, we need a formal and explicitly manipulable representation for these kinds of general rules; so far, we have hidden the information inside the black box called the RESULT function. Part III is devoted to this issue. Second, we need algorithms that can construct suitable general rules from the specific observations made by the agent. These are covered in later chapters.

SUMMARY

This chapter has examined search algorithms for problems beyond the "classical" case of finding the shortest path to a goal in an observable, deterministic, discrete environment.
• Local search methods such as hill climbing operate on complete-state formulations, keeping only a small number of nodes in memory. Several stochastic algorithms have been developed, including simulated annealing, which returns optimal solutions when given an appropriate cooling schedule.
• Many local search methods apply also to problems in continuous spaces. Linear programming and convex optimization problems obey certain restrictions on the shape of the state space and the nature of the objective function, and admit polynomial-time algorithms that are often extremely efficient in practice.
• A genetic algorithm is a stochastic hill-climbing search in which a large population of states is maintained. New states are generated by mutation and by crossover, which combines pairs of states from the population.
• In nondeterministic environments, agents can apply AND–OR search to generate contingent plans that reach the goal regardless of which outcomes occur during execution.
• When the environment is partially observable, the belief state represents the set of possible states that the agent might be in.
• Standard search algorithms can be applied directly to belief-state space to solve sensorless problems, and belief-state AND–OR search can solve general partially observable problems. Incremental algorithms that construct solutions state by state within a belief state are often more efficient.
• Exploration problems arise when the agent has no idea about the states and actions of its environment. For safely explorable environments, online search agents can build a map and find a goal if one exists. Updating heuristic estimates from experience provides an effective method to escape from local minima.

BIBLIOGRAPHICAL AND HISTORICAL NOTES

Local search techniques have a long history in mathematics and computer science. Indeed, the Newton–Raphson method (Newton; Raphson) can be seen as a very efficient local search method for continuous spaces in which gradient information is available. Brent is a classic reference for optimization algorithms that do not require such information. Beam search, which we have presented as a local search algorithm, originated as a bounded-width variant of dynamic programming for speech recognition in the HARPY system (Lowerre). A related algorithm is analyzed in depth by Pearl.
The topic of local search was reinvigorated in the early 1990s by surprisingly good results for large constraint-satisfaction problems such as n-queens (Minton et al.) and logical reasoning (Selman et al.), and by the incorporation of randomness, multiple simultaneous searches, and other improvements. This renaissance of what Christos Papadimitriou has called "New Age" algorithms also sparked increased interest among theoretical computer scientists (Koutsoupias and Papadimitriou; Aldous and Vazirani). In the field of operations research, a variant of hill climbing called tabu search has gained popularity (Glover and Laguna). This algorithm maintains a tabu list of k previously visited states that cannot be revisited; as well as improving efficiency when searching graphs, this list can allow the algorithm to escape from some local minima. Another useful improvement on hill climbing is the STAGE algorithm (Boyan and Moore). The idea is to use the local maxima found by random-restart hill climbing to get an idea of the overall shape of the landscape. The algorithm fits a smooth surface to the set of local maxima and then calculates the global maximum of that surface analytically. This becomes the new restart point. The algorithm has been shown to work in practice on hard problems. Gomes et al. showed that the run times of systematic backtracking algorithms often have a heavy-tailed distribution, which means that the probability of a very long run time is more than would be predicted if the run times were exponentially distributed. When the run time distribution is heavy-tailed, random restarts find a solution faster, on average, than a single run to completion.
Simulated annealing was first described by Kirkpatrick et al., who borrowed directly from the Metropolis algorithm (which is used to simulate complex systems in physics (Metropolis et al.) and was supposedly invented at a Los Alamos dinner party). Simulated annealing is now a field in itself, with hundreds of papers published every year.
Finding optimal solutions in continuous spaces is the subject matter of several fields, including optimization theory, optimal control theory, and the calculus of variations. The basic techniques are explained well by Bishop; Press et al. cover a wide range of algorithms and provide working software.
As Andrew Moore points out, researchers have taken inspiration for search and optimization algorithms from a wide variety of fields of study: metallurgy (simulated annealing), biology (genetic algorithms), economics (market-based algorithms), entomology (ant colony optimization), neurology (neural networks), animal behavior (reinforcement learning), mountaineering (hill climbing), and others.
Linear programming (LP) was first studied systematically by the Russian mathematician Leonid Kantorovich. It was one of the first applications of computers; the simplex algorithm (Dantzig) is still used despite worst-case exponential complexity. Karmarkar developed the far more efficient family of interior-point methods, which was shown to have polynomial complexity for the more general class of convex optimization problems by Nesterov and Nemirovski. Excellent introductions to convex optimization are provided by Ben-Tal and Nemirovski and by Boyd and Vandenberghe.
Work by Sewall Wright on the concept of a fitness landscape was an important precursor to the development of genetic algorithms. In the 1950s, several statisticians, including Box and Friedman, used evolutionary techniques for optimization problems, but it wasn't until Rechenberg introduced evolution strategies to solve optimization problems for airfoils that the approach gained popularity. In the 1960s and 1970s, John Holland championed genetic algorithms, both as a useful tool and as a method to expand our understanding of adaptation, biological or otherwise (Holland). The artificial life movement (Langton) takes this idea one step further, viewing the products of genetic algorithms as organisms rather than solutions to problems. Work in this field by Hinton and Nowlan and by Ackley and Littman has done much to clarify the implications of the Baldwin effect. For general background on evolution, we recommend Smith and Szathmáry, Ridley, and Carroll.
Most comparisons of genetic algorithms to other approaches (especially stochastic hill climbing) have found that the genetic algorithms are slower to converge (O'Reilly and Oppacher; Mitchell et al.; Juels and Wattenberg; Baluja). Such findings are not universally popular within the GA community, but recent attempts within that community to understand population-based search as an approximate form of Bayesian learning might help close the gap between the field and its critics (Pelikan et al.). The theory of quadratic dynamical systems may also explain the performance of GAs (Rabani et al.). See Lohn et al. for an example of GAs applied to antenna design and Renner and Ekárt for an application to computer-aided design.
The field of genetic programming is closely related to genetic algorithms. The principal difference is that the representations that are mutated and combined are programs rather
than bit strings. The programs are represented in the form of expression trees; the expressions can be in a standard language such as Lisp or can be specially designed to represent circuits, robot controllers, and so on. Crossover involves splicing together subtrees rather than substrings. This form of mutation guarantees that the offspring are well-formed expressions, which would not be the case if programs were manipulated as strings.
Interest in genetic programming was spurred by John Koza's work (Koza), but it goes back at least to early experiments with machine code by Friedberg and with finite-state automata by Fogel et al. As with genetic algorithms, there is debate about the effectiveness of the technique. Koza et al. describe experiments in the use of genetic programming to design circuit devices.
The journals Evolutionary Computation and IEEE Transactions on Evolutionary Computation cover genetic algorithms and genetic programming; articles are also found in Complex Systems, Adaptive Behavior, and Artificial Life. The main conference is the Genetic and Evolutionary Computation Conference (GECCO). Good overview texts on genetic algorithms are given by Mitchell, by Fogel, and by Langdon and Poli, and by the free online book by Poli et al.
The unpredictability and partial observability of real environments were recognized early on in robotics projects that used planning techniques, including Shakey (Fikes et al.) and FREDDY (Michie). The problems received more attention after the publication of McDermott's influential article, Planning and Acting.
The first work to make explicit use of AND–OR trees seems to have been Slagle's SAINT program for symbolic integration, mentioned in an earlier chapter. Amarel applied the idea to propositional theorem proving, a topic discussed in a later chapter, and introduced a search algorithm similar to AND-OR-GRAPH-SEARCH. The algorithm was further developed and formalized by Nilsson, who also described AO∗—which, as its name suggests, finds optimal solutions given an admissible heuristic. AO∗ was analyzed and improved by Martelli and Montanari. AO∗ is a top-down algorithm; a bottom-up generalization of A∗ is A∗LD, for A∗ Lightest Derivation (Felzenszwalb and McAllester). Interest in AND–OR search has undergone a revival in recent years, with new algorithms for finding cyclic solutions (Jiménez and Torras; Hansen and Zilberstein) and new techniques inspired by dynamic programming (Bonet and Geffner).
The idea of transforming partially observable problems into belief-state problems originated with Åström for the much more complex case of probabilistic uncertainty (see a later chapter). Erdmann and Mason studied the problem of robotic manipulation without sensors, using a continuous form of belief-state search. They showed that it was possible to orient a part on a table from an arbitrary initial position by a well-designed sequence of tilting actions. More practical methods, based on a series of precisely oriented diagonal barriers across a conveyor belt, use the same algorithmic insights (Wiegley et al.).
The belief-state approach was reinvented in the context of sensorless and partially observable search problems by Genesereth and Nourbakhsh. Additional work was done on sensorless problems in the logic-based planning community (Goldman and Boddy; Smith and Weld). This work has emphasized concise representations for belief states, as explained in a later chapter. Bonet and Geffner introduced the first effective heuristics
for belief-state search; these were refined by Bryce et al. The incremental approach to belief-state search, in which solutions are constructed incrementally for subsets of states within each belief state, was studied in the planning literature by Kurien et al.; several new incremental algorithms were introduced for nondeterministic, partially observable problems by Russell and Wolfe. Additional references for planning in stochastic, partially observable environments appear in a later chapter.
Algorithms for exploring unknown state spaces have been of interest for many centuries. Depth-first search in a maze can be implemented by keeping one's left hand on the wall; loops can be avoided by marking each junction. Depth-first search fails with irreversible actions; the more general problem of exploring Eulerian graphs (i.e., graphs in which each node has equal numbers of incoming and outgoing edges) was solved by an algorithm due to Hierholzer. The first thorough algorithmic study of the exploration problem for arbitrary graphs was carried out by Deng and Papadimitriou, who developed a completely general algorithm but showed that no bounded competitive ratio is possible for exploring a general graph. Papadimitriou and Yannakakis examined the question of finding paths to a goal in geometric path-planning environments (where all actions are reversible). They showed that a small competitive ratio is achievable with square obstacles, but with general rectangular obstacles no bounded ratio can be achieved. (See the earlier figure of the adversarially constructed two-dimensional environment.)
The LRTA∗ algorithm was developed by Korf as part of an investigation into real-time search for environments in which the agent must act after searching for only a fixed amount of time (a common situation in two-player games). LRTA∗ is in fact a special case of reinforcement learning algorithms for stochastic environments (Barto et al.). Its policy of optimism under uncertainty—always head for the closest unvisited state—can result in an exploration pattern that is less efficient in the uninformed case than simple depth-first search (Koenig). Dasgupta et al. show that online iterative deepening search is optimally efficient for finding a goal in a uniform tree with no heuristic information. Several informed variants on the LRTA∗ theme have been developed with different methods for searching and updating within the known portion of the graph (Pemberton and Korf). As yet, there is no good understanding of how to find goals with optimal efficiency when using heuristic information.

EXERCISES

Give the name of the algorithm that results from each of the following special cases:
a. Local beam search with k = 1.
b. Local beam search with one initial state and no limit on the number of states retained.
c. Simulated annealing with T = 0 at all times (and omitting the termination test).
d. Simulated annealing with T = ∞ at all times.
e. Genetic algorithm with population size N = 1.
An earlier exercise considers the problem of building railway tracks under the assumption that pieces fit exactly with no slack. Now consider the real problem, in which pieces don't fit exactly but allow for a small amount of rotation to either side of the "proper" alignment. Explain how to formulate the problem so it could be solved by simulated annealing.

Generate a large number of 8-puzzle and 8-queens instances and solve them (where possible) by hill climbing (steepest-ascent and first-choice variants), hill climbing with random restart, and simulated annealing. Measure the search cost and percentage of solved problems and graph these against the optimal solution cost. Comment on your results.

The AND-OR-GRAPH-SEARCH algorithm in Figure 4.11 checks for repeated states only on the path from the root to the current state. Suppose that, in addition, the algorithm were to store every visited state and check against that list. (See BREADTH-FIRST-SEARCH for an example.) Determine the information that should be stored and how the algorithm should use that information when a repeated state is found. (Hint: You will need to distinguish at least between states for which a successful subplan was constructed previously and states for which no subplan could be found.) Explain how to use labels, as defined earlier in the chapter, to avoid having multiple copies of subplans.

Explain precisely how to modify the AND-OR-GRAPH-SEARCH algorithm to generate a cyclic plan if no acyclic plan exists. You will need to deal with three issues: labeling the plan steps so that a cyclic plan can point back to an earlier part of the plan; modifying OR-SEARCH so that it continues to look for acyclic plans after finding a cyclic plan; and augmenting the plan representation to indicate whether a plan is cyclic. Show how your algorithm works on (a) the slippery vacuum world, and (b) the slippery, erratic vacuum world. You might wish to use a computer implementation to check your results.

Earlier in the chapter we introduced belief states to solve sensorless search problems. A sequence of actions solves a sensorless problem if it maps every physical state in the initial belief state b to a goal state. Suppose the agent knows h∗(s), the true optimal cost of solving the physical state s in the fully observable problem, for every state s in b. Find an admissible heuristic h(b) for the sensorless problem in terms of these costs, and prove its admissibility. Comment on the accuracy of this heuristic on the sensorless vacuum problem shown earlier in the chapter. How well does A∗ perform?

This exercise explores subset–superset relations between belief states in sensorless or partially observable environments.
a. Prove that if an action sequence is a solution for a belief state b, it is also a solution for any subset of b. Can anything be said about supersets of b?
b. Explain in detail how to modify graph search for sensorless problems to take advantage of your answers in (a).
c. Explain in detail how to modify AND–OR search for partially observable problems, beyond the modifications you describe in (b).

Earlier it was assumed that a given action would have the same cost when executed in any physical state within a given belief state. (This leads to a belief-state search
problem with well-defined step costs.) Now consider what happens when the assumption does not hold. Does the notion of optimality still make sense in this context, or does it require modification? Consider also various possible definitions of the "cost" of executing an action in a belief state; for example, we could use the minimum of the physical costs; or the maximum; or a cost interval with the lower bound being the minimum cost and the upper bound being the maximum; or just keep the set of all possible costs for that action. For each of these, explore whether A∗ (with modifications if necessary) can return optimal solutions.

Consider the sensorless version of the erratic vacuum world. Draw the belief-state space reachable from the initial belief state {1, 3, 5, 7}, and explain why the problem is unsolvable.

We can turn the navigation problem of an earlier exercise into an environment as follows:
• The percept will be a list of the positions, relative to the agent, of the visible vertices. The percept does not include the position of the robot! The robot must learn its own position from the map; for now, you can assume that each location has a different "view."
• Each action will be a vector describing a straight-line path to follow. If the path is unobstructed, the action succeeds; otherwise, the robot stops at the point where its path first intersects an obstacle. If the agent returns a zero motion vector and is at the goal (which is fixed and known), then the environment teleports the agent to a random location (not inside an obstacle).
• The performance measure charges the agent one point for each unit of distance traversed and awards points each time the goal is reached.
a. Implement this environment and a problem-solving agent for it. After each teleportation, the agent will need to formulate a new problem, which will involve discovering its current location.
b. Document your agent's performance (by having the agent generate suitable commentary as it moves around) and report its performance over a number of episodes.
c. Modify the environment so that some fraction of the time the agent ends up at an unintended destination (chosen randomly from the other visible vertices, if any; otherwise, no move at all). This is a crude model of the motion errors of a real robot. Modify the agent so that when such an error is detected, it finds out where it is and then constructs a plan to get back to where it was and resume the old plan. Remember that sometimes getting back to where it was might also fail! Show an example of the agent successfully overcoming two successive motion errors and still reaching the goal.
d. Now try two different recovery schemes after an error: (1) head for the closest vertex on the original route; and (2) replan a route to the goal from the new location. Compare the performance of the three recovery schemes. Would the inclusion of search costs affect the comparison?
e. Now suppose that there are locations from which the view is identical. (For example, suppose the world is a grid with square obstacles.) What kind of problem does the agent now face? What do solutions look like?
Suppose that an agent is in a 3 × 3 maze environment like the one shown in the earlier maze figure. The agent knows that its initial location is the square marked S, that the goal is at the square marked G, and that the four actions Up, Down, Left, Right have their usual effects unless blocked by a wall. The agent does not know where the internal walls are. In any given state, the agent perceives the set of legal actions; it can also tell whether the state is one it has visited before or is a new state.
a. Explain how this online search problem can be viewed as an offline search in belief-state space, where the initial belief state includes all possible environment configurations. How large is the initial belief state? How large is the space of belief states?
b. How many distinct percepts are possible in the initial state?
c. Describe the first few branches of a contingency plan for this problem. How large (roughly) is the complete plan?
Notice that this contingency plan is a solution for every possible environment fitting the given description. Therefore, interleaving of search and execution is not strictly necessary even in unknown environments.

In this exercise, we examine hill climbing in the context of robot navigation, using the environment of the earlier navigation exercise as an example.
a. Repeat that exercise using hill climbing. Does your agent ever get stuck in a local minimum? Is it possible for it to get stuck with convex obstacles?
b. Construct a nonconvex polygonal environment in which the agent gets stuck.
c. Modify the hill-climbing algorithm so that, instead of doing a depth-1 search to decide where to go next, it does a depth-k search. It should find the best k-step path and do one step along it, and then repeat the process.
d. Is there some k for which the new algorithm is guaranteed to escape from local minima?
e. Explain how LRTA∗ enables the agent to escape from local minima in this case. Relate the time complexity of LRTA∗ to its space complexity.
5 ADVERSARIAL SEARCH

In which we examine the problems that arise when we try to plan ahead in a world where other agents are planning against us.

5.1 GAMES

Chapter 2 introduced multiagent environments, in which each agent needs to consider the actions of other agents and how they affect its own welfare. The unpredictability of these other agents can introduce contingencies into the agent's problem-solving process, as discussed in Chapter 4. In this chapter we cover competitive environments, in which the agents' goals are in conflict, giving rise to adversarial search problems—often known as games.
Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is "significant," regardless of whether the agents are cooperative or competitive.1 In AI, the most common games are of a rather specialized kind—what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information (such as chess). In our terminology, this means deterministic, fully observable environments in which two agents act alternately and in which the utility values at the end of the game are always equal and opposite. For example, if one player wins a game of chess, the other player necessarily loses. It is this opposition between the agents' utility functions that makes the situation adversarial.
Games have engaged the intellectual faculties of humans—sometimes to an alarming degree—for as long as civilization has existed. For AI researchers, the abstract nature of games makes them an appealing subject for study. The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules. Physical games, such as croquet and ice hockey, have much more complicated descriptions, a much larger range of possible actions, and rather imprecise rules defining the legality of actions. With the exception of robot soccer, these physical games have not attracted much interest in the AI community.
1 Environments with very many agents are often viewed as economies rather than games.
Games, unlike most of the toy problems studied in Chapter 3, are interesting because they are too hard to solve. For example, chess has an average branching factor of about 35, and games often go to 50 moves by each player, so the search tree has about 35^100 or 10^154 nodes (although the search graph has "only" about 10^40 distinct nodes). Games, like the real world, therefore require the ability to make some decision even when calculating the optimal decision is infeasible. Games also penalize inefficiency severely. Whereas an implementation of A∗ search that is half as efficient will simply take twice as long to run to completion, a chess program that is half as efficient in using its available time probably will be beaten into the ground, other things being equal. Game-playing research has therefore spawned a number of interesting ideas on how to make the best possible use of time.
We begin with a definition of the optimal move and an algorithm for finding it. We then look at techniques for choosing a good move when time is limited. Pruning allows us to ignore portions of the search tree that make no difference to the final choice, and heuristic evaluation functions allow us to approximate the true utility of a state without doing a complete search. Section 5.5 discusses games such as backgammon that include an element of chance; we also discuss bridge, which includes elements of imperfect information because not all cards are visible to each player. Finally, we look at how state-of-the-art game-playing programs fare against human opposition and at directions for future developments.
We first consider games with two players, whom we call MAX and MIN for reasons that will soon become obvious. MAX moves first, and then they take turns moving until the game is over. At the end of the game, points are awarded to the winning player and penalties are given to the loser. A game can be formally defined as a kind of search problem with the following elements (a small code sketch of this interface appears below):
• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test, which is true when the game is over and false otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective function or payoff function), which defines the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome is a win, loss, or draw, with values +1, 0, or 1/2. Some games have a wider variety of possible outcomes; the payoffs in backgammon range from 0 to +192.
A zero-sum game is (confusingly) defined as one where the total payoff to all players is the same for every instance of the game. Chess is zero-sum because every game has payoff of either 0 + 1, 1 + 0, or 1/2 + 1/2. "Constant-sum" would have been a better term, but zero-sum is traditional and makes sense if you imagine each player is charged an entry fee of 1/2.
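The elements above can be read off directly as an interface. The sketch below is an illustrative rendering for tic-tac-toe, not code from the book: the function names mirror the list (S0, PLAYER, ACTIONS, RESULT, TERMINAL-TEST, UTILITY), but the data representation and the _winner helper are our own assumptions.

    from typing import List, Optional, Tuple

    Board = Tuple[Optional[str], ...]      # 9 squares, each 'X', 'O', or None
    State = Tuple[Board, str]              # (board, player to move)

    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
             (0, 4, 8), (2, 4, 6)]                 # diagonals

    def initial_state() -> State:                  # S0
        return (None,) * 9, 'X'                    # MAX (X) moves first

    def player(s: State) -> str:                   # PLAYER(s)
        return s[1]

    def actions(s: State) -> List[int]:            # ACTIONS(s): the empty squares
        board, _ = s
        return [i for i in range(9) if board[i] is None]

    def result(s: State, a: int) -> State:         # RESULT(s, a): transition model
        board, to_move = s
        new_board = board[:a] + (to_move,) + board[a + 1:]
        return new_board, ('O' if to_move == 'X' else 'X')

    def _winner(board: Board) -> Optional[str]:
        for i, j, k in LINES:
            if board[i] is not None and board[i] == board[j] == board[k]:
                return board[i]
        return None

    def terminal_test(s: State) -> bool:           # TERMINAL-TEST(s)
        board, _ = s
        return _winner(board) is not None or all(c is not None for c in board)

    def utility(s: State, p: str) -> float:        # UTILITY(s, p): win 1, loss 0, draw 1/2
        w = _winner(s[0])
        if w is None:
            return 0.5
        return 1.0 if w == p else 0.0

Because result returns a new state rather than mutating the board, the same functions serve both for actual play and for the simulated look-ahead used by the search algorithms in this chapter.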
The initial state, ACTIONS function, and RESULT function define the game tree for the game—a tree where the nodes are game states and the edges are moves. Figure 5.1 shows part of the game tree for tic-tac-toe (noughts and crosses). From the initial state, MAX has nine possible moves. Play alternates between MAX's placing an X and MIN's placing an O
until we reach leaf nodes corresponding to terminal states such that one player has three in a row or all the squares are filled. The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX; high values are assumed to be good for MAX and bad for MIN (which is how the players get their names).
For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal nodes. But for chess there are over 10^40 nodes, so the game tree is best thought of as a theoretical construct that we cannot realize in the physical world. But regardless of the size of the game tree, it is MAX's job to search for a good move. We use the term search tree for a tree that is superimposed on the full game tree, and examines enough nodes to allow a player to determine what move to make.

Figure 5.1 A (partial) game tree for the game of tic-tac-toe. The top node is the initial state, and MAX moves first, placing an X in an empty square. We show part of the tree, giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.

5.2 OPTIMAL DECISIONS IN GAMES

In a normal search problem, the optimal solution would be a sequence of actions leading to a goal state—a terminal state that is a win. In adversarial search, MIN has something to say about it. MAX therefore must find a contingent strategy, which specifies MAX's move in the initial state, then MAX's moves in the states resulting from every possible response by
MIN, then MAX's moves in the states resulting from every possible response by MIN to those moves, and so on. This is exactly analogous to the AND–OR search algorithm (Figure 4.11) with MAX playing the role of OR and MIN equivalent to AND. Roughly speaking, an optimal strategy leads to outcomes at least as good as any other strategy when one is playing an infallible opponent. We begin by showing how to find this optimal strategy.

Figure 5.2 A two-ply game tree. The △ nodes are "MAX nodes," in which it is MAX's turn to move, and the ▽ nodes are "MIN nodes." The terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values. MAX's best move at the root is a1, because it leads to the state with the highest minimax value, and MIN's best reply is b1, because it leads to the state with the lowest minimax value.

Even a simple game like tic-tac-toe is too complex for us to draw the entire game tree on one page, so we will switch to the trivial game in Figure 5.2. The possible moves for MAX at the root node are labeled a1, a2, and a3. The possible replies to a1 for MIN are b1, b2, b3, and so on. This particular game ends after one move each by MAX and MIN. (In game parlance, we say that this tree is one move deep, consisting of two half-moves, each of which is called a ply.) The utilities of the terminal states in this game range from 2 to 14.
Given a game tree, the optimal strategy can be determined from the minimax value of each node, which we write as MINIMAX(n). The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of minimum value. So we have the following:

    MINIMAX(s) =
      UTILITY(s)                                     if TERMINAL-TEST(s)
      max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))     if PLAYER(s) = MAX
      min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))     if PLAYER(s) = MIN

Let us apply these definitions to the game tree in Figure 5.2. The terminal nodes on the bottom level get their utility values from the game's UTILITY function. The first MIN node, labeled B, has three successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2. The root node is a MAX node; its successor states have minimax values 3, 2, and 2; so it has a minimax value of 3. We can also identify
the minimax decision at the root: action a1 is the optimal choice for MAX because it leads to the state with the highest minimax value.
This definition of optimal play for MAX assumes that MIN also plays optimally—it maximizes the worst-case outcome for MAX. What if MIN does not play optimally? Then it is easy to show (Exercise 5.7) that MAX will do even better. Other strategies against suboptimal opponents may do better than the minimax strategy, but these strategies necessarily do worse against optimal opponents.

5.2.1 The minimax algorithm

The minimax algorithm (Figure 5.3) computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds. For example, in Figure 5.2, the algorithm first recurses down to the three bottom-left nodes and uses the UTILITY function on them to discover that their values are 3, 12, and 8, respectively. Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B. A similar process gives the backed-up values of 2 for C and 2 for D. Finally, we take the maximum of 3, 2, and 2 to get the backed-up value of 3 for the root node.
The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m and there are b legal moves at each point, then the time complexity of the minimax algorithm is O(b^m). The space complexity is O(bm) for an algorithm that generates all actions at once, or O(m) for an algorithm that generates actions one at a time (see page 87). For real games, of course, the time cost is totally impractical, but this algorithm serves as the basis for the mathematical analysis of games and for more practical algorithms.

5.2.2 Optimal decisions in multiplayer games

Many popular games allow more than two players. Let us examine how to extend the minimax idea to multiplayer games. This is straightforward from the technical viewpoint, but raises some interesting new conceptual issues.
First, we need to replace the single value for each node with a vector of values. For example, in a three-player game with players A, B, and C, a vector ⟨vA, vB, vC⟩ is associated with each node. For terminal states, this vector gives the utility of the state from each player's viewpoint. (In two-player, zero-sum games, the two-element vector can be reduced to a single value because the values are always opposite.) The simplest way to implement this is to have the UTILITY function return a vector of utilities.
Now we have to consider nonterminal states. Consider the node marked X in the game tree shown in Figure 5.4. In that state, player C chooses what to do. The two choices lead to terminal states with utility vectors ⟨vA = 1, vB = 2, vC = 6⟩ and ⟨vA = 4, vB = 2, vC = 3⟩. Since 6 is bigger than 3, C should choose the first move. This means that if state X is reached, subsequent play will lead to a terminal state with utilities ⟨vA = 1, vB = 2, vC = 6⟩. Hence, the backed-up value of X is this vector. The backed-up value of a node n is always the utility
vector of the successor state with the highest value for the player choosing at n.

    function MINIMAX-DECISION(state) returns an action
      return argmax_{a ∈ ACTIONS(state)} MIN-VALUE(RESULT(state, a))

    function MAX-VALUE(state) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← −∞
      for each a in ACTIONS(state) do
        v ← MAX(v, MIN-VALUE(RESULT(state, a)))
      return v

    function MIN-VALUE(state) returns a utility value
      if TERMINAL-TEST(state) then return UTILITY(state)
      v ← ∞
      for each a in ACTIONS(state) do
        v ← MIN(v, MAX-VALUE(RESULT(state, a)))
      return v

Figure 5.3 An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state. The notation argmax_{a ∈ S} f(a) computes the element a of set S that has the maximum value of f(a).

Figure 5.4 The first three plies of a game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.

Anyone who plays multiplayer games, such as Diplomacy, quickly becomes aware that much more is going on than in two-player games. Multiplayer games usually involve alliances, whether formal or informal, among the players. Alliances are made and broken as the game proceeds. How are we to understand such behavior? Are alliances a natural consequence of optimal strategies for each player in a multiplayer game? It turns out that they can be. For example,
suppose A and B are in weak positions and C is in a stronger position. Then it is often optimal for both A and B to attack C rather than each other, lest C destroy each of them individually. In this way, collaboration emerges from purely selfish behavior. Of course, as soon as C weakens under the joint onslaught, the alliance loses its value, and either A or B could violate the agreement. In some cases, explicit alliances merely make concrete what would have happened anyway. In other cases, a social stigma attaches to breaking an alliance, so players must balance the immediate advantage of breaking an alliance against the long-term disadvantage of being perceived as untrustworthy. See Section 17.5 for more on these complications.
If the game is not zero-sum, then collaboration can also occur with just two players. Suppose, for example, that there is a terminal state with utilities ⟨vA = 1000, vB = 1000⟩ and that 1000 is the highest possible utility for each player. Then the optimal strategy is for both players to do everything possible to reach this state—that is, the players will automatically cooperate to achieve a mutually desirable goal.

5.3 ALPHA–BETA PRUNING

The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree. Unfortunately, we can't eliminate the exponent, but it turns out we can effectively cut it in half. The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree. That is, we can borrow the idea of pruning from Chapter 3 to eliminate large parts of the tree from consideration. The particular technique we examine is called alpha–beta pruning. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.
Consider again the two-ply game tree from Figure 5.2. Let's go through the calculation of the optimal decision once more, this time paying careful attention to what we know at each point in the process. The steps are explained in Figure 5.5. The outcome is that we can identify the minimax decision without ever evaluating two of the leaf nodes.
Another way to look at this is as a simplification of the formula for MINIMAX. Let the two unevaluated successors of node C in Figure 5.5 have values x and y. Then the value of the root node is given by

    MINIMAX(root) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
                  = max(3, min(2, x, y), 2)
                  = max(3, z, 2)   where z = min(2, x, y) ≤ 2
                  = 3.

In other words, the value of the root and hence the minimax decision are independent of the values of the pruned leaves x and y.
Alpha–beta pruning can be applied to trees of any depth, and it is often possible to prune entire subtrees rather than just leaves. The general principle is this: consider a node n
[Figure 5.5 (diagram, panels (a)–(f)): the tree of Figure 5.2, annotated at each stage with value intervals such as [−∞, +∞], [3, +∞], [−∞, 2], and [3, 3] at the root and at nodes A, B, C, and D.]

Figure 5.5 Stages in the calculation of the optimal decision for the game tree in Figure 5.2. At each point, we show the range of possible values for each node. (a) The first leaf below B has the value 3. Hence, B, which is a MIN node, has a value of at most 3. (b) The second leaf below B has a value of 12; MIN would avoid this move, so the value of B is still at most 3. (c) The third leaf below B has a value of 8; we have seen all B's successor states, so the value of B is exactly 3. Now, we can infer that the value of the root is at least 3, because MAX has a choice worth 3 at the root. (d) The first leaf below C has the value 2. Hence, C, which is a MIN node, has a value of at most 2. But we know that B is worth 3, so MAX would never choose C. Therefore, there is no point in looking at the other successor states of C. This is an example of alpha–beta pruning. (e) The first leaf below D has the value 14, so D is worth at most 14. This is still higher than MAX's best alternative (i.e., 3), so we need to keep exploring D's successor states. Notice also that we now have bounds on all of the successors of the root, so the root's value is also at most 14. (f) The second successor of D is worth 5, so again we need to keep exploring. The third successor is worth 2, so now D is worth exactly 2. MAX's decision at the root is to move to B, giving a value of 3.

somewhere in the tree (see Figure 5.6), such that Player has a choice of moving to that node. If Player has a better choice m either at the parent node of n or at any choice point further up, then n will never be reached in actual play. So once we have found out enough about n (by examining some of its descendants) to reach this conclusion, we can prune it. Remember that minimax search is depth-first, so at any one time we just have to consider the nodes along a single path in the tree. Alpha–beta pruning gets its name from the following two parameters that describe bounds on the backed-up values that appear anywhere along the path:
[Figure 5.6 (diagram): a search path alternating Player and Opponent levels, with a choice m available higher up the path and a node n deeper down.]

Figure 5.6 The general case for alpha–beta pruning. If m is better than n for Player, we will never get to n in play.

  α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.
  β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.

Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be worse than the current α or β value for MAX or MIN, respectively. The complete algorithm is given in Figure 5.7. We encourage you to trace its behavior when applied to the tree in Figure 5.5.

5.3.1 Move ordering

The effectiveness of alpha–beta pruning is highly dependent on the order in which the states are examined. For example, in Figure 5.5(e) and (f), we could not prune any successors of D at all because the worst successors (from the point of view of MIN) were generated first. If the third successor of D had been generated first, we would have been able to prune the other two. This suggests that it might be worthwhile to try to examine first the successors that are likely to be best. If this can be done,2 then it turns out that alpha–beta needs to examine only O(b^(m/2)) nodes to pick the best move, instead of O(b^m) for minimax. This means that the effective branching factor becomes √b instead of b—for chess, about 6 instead of 35. Put another way, alpha–beta can solve a tree roughly twice as deep as minimax in the same amount of time. If successors are examined in random order rather than best-first, the total number of nodes examined will be roughly O(b^(3m/4)) for moderate b. For chess, a fairly simple ordering function (such as trying captures first, then threats, then forward moves, and then backward moves) gets you to within about a factor of 2 of the best-case O(b^(m/2)) result.

2 Obviously, it cannot be done perfectly; otherwise, the ordering function could be used to play a perfect game!
function ALPHA-BETA-SEARCH(state) returns an action
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for each a in ACTIONS(state) do
    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for each a in ACTIONS(state) do
    v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v

Figure 5.7 The alpha–beta search algorithm. Notice that these routines are the same as the MINIMAX functions in Figure 5.3, except for the two lines in each of MIN-VALUE and MAX-VALUE that maintain α and β (and the bookkeeping to pass these parameters along).

Adding dynamic move-ordering schemes, such as trying first the moves that were found to be best in the past, brings us quite close to the theoretical limit. The past could be the previous move—often the same threats remain—or it could come from previous exploration of the current move. One way to gain information from the current move is with iterative deepening search. First, search 1 ply deep and record the best path of moves. Then search 1 ply deeper, but use the recorded path to inform move ordering. As we saw in Chapter 3, iterative deepening on an exponential game tree adds only a constant fraction to the total search time, which can be more than made up for by better move ordering. The best moves are often called killer moves, and trying them first is called the killer move heuristic.

In Chapter 3, we noted that repeated states in the search tree can cause an exponential increase in search cost. In many games, repeated states occur frequently because of transpositions—different permutations of the move sequence that end up in the same position. For example, if White has one move, a1, that can be answered by Black with b1, and an unrelated move a2 on the other side of the board that can be answered by b2, then the sequences [a1, b1, a2, b2] and [a2, b2, a1, b1] both end up in the same position. It is worthwhile to store the evaluation of the resulting position in a hash table the first time it is encountered so that we don't have to recompute it on subsequent occurrences. The hash table of previously seen positions is traditionally called a transposition table; it is essentially identical to the explored list in GRAPH-SEARCH (Section 3.3).
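Before returning to transposition tables, here is a minimal Python sketch of the search in Figure 5.7 (and, with the α/β bookkeeping removed, Figure 5.3). The game object with actions, result, terminal_test, and utility methods is an assumed interface for illustration, not something defined in the text; utility is taken from MAX's point of view.

import math

def alpha_beta_search(game, state):
    # Return the action with the highest backed-up value (cf. Figure 5.7).
    best_action, best_value = None, -math.inf
    for a in game.actions(state):
        v = min_value(game, game.result(state, a), -math.inf, math.inf)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game, game.result(state, a), alpha, beta))
        if v >= beta:            # MIN already has a better choice higher up: prune
            return v
        alpha = max(alpha, v)
    return v

def min_value(game, state, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game, game.result(state, a), alpha, beta))
        if v <= alpha:           # MAX already has a better choice higher up: prune
            return v
        beta = min(beta, v)
    return v

A transposition table could be layered on top of this sketch by keying a dictionary on a hash of the position and consulting it before recursing, at the cost of deciding which entries to keep when memory runs out.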
Using a transposition table can have a dramatic effect, sometimes as much as doubling the reachable search depth in chess. On the other hand, if we are evaluating a million nodes per second, at some point it is not practical to keep all of them in the transposition table. Various strategies have been used to choose which nodes to keep and which to discard.

5.4 IMPERFECT REAL-TIME DECISIONS

The minimax algorithm generates the entire game search space, whereas the alpha–beta algorithm allows us to prune large parts of it. However, alpha–beta still has to search all the way to terminal states for at least a portion of the search space. This depth is usually not practical, because moves must be made in a reasonable amount of time—typically a few minutes at most. Claude Shannon's paper Programming a Computer for Playing Chess (1950) proposed instead that programs should cut off the search earlier and apply a heuristic evaluation function to states in the search, effectively turning nonterminal nodes into terminal leaves. In other words, the suggestion is to alter minimax or alpha–beta in two ways: replace the utility function by a heuristic evaluation function EVAL, which estimates the position's utility, and replace the terminal test by a cutoff test that decides when to apply EVAL. That gives us the following for heuristic minimax for state s and maximum depth d:

  H-MINIMAX(s, d) =
    EVAL(s)                                               if CUTOFF-TEST(s, d)
    max_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d + 1)   if PLAYER(s) = MAX
    min_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d + 1)   if PLAYER(s) = MIN.

5.4.1 Evaluation functions

An evaluation function returns an estimate of the expected utility of the game from a given position, just as the heuristic functions of Chapter 3 return an estimate of the distance to the goal. The idea of an estimator was not new when Shannon proposed it. For centuries, chess players (and aficionados of other games) have developed ways of judging the value of a position because humans are even more limited in the amount of search they can do than are computer programs. It should be clear that the performance of a game-playing program depends strongly on the quality of its evaluation function. An inaccurate evaluation function will guide an agent toward positions that turn out to be lost. How exactly do we design good evaluation functions?

First, the evaluation function should order the terminal states in the same way as the true utility function: states that are wins must evaluate better than draws, which in turn must be better than losses. Otherwise, an agent using the evaluation function might err even if it can see ahead all the way to the end of the game. Second, the computation must not take too long! (The whole point is to search faster.) Third, for nonterminal states, the evaluation function should be strongly correlated with the actual chances of winning.
One might well wonder about the phrase "chances of winning." After all, chess is not a game of chance: we know the current state with certainty, and no dice are involved. But if the search must be cut off at nonterminal states, then the algorithm will necessarily be uncertain about the final outcomes of those states. This type of uncertainty is induced by computational, rather than informational, limitations. Given the limited amount of computation that the evaluation function is allowed to do for a given state, the best it can do is make a guess about the final outcome.

Let us make this idea more concrete. Most evaluation functions work by calculating various features of the state—for example, in chess, we would have features for the number of white pawns, black pawns, white queens, black queens, and so on. The features, taken together, define various categories or equivalence classes of states: the states in each category have the same values for all the features. For example, one category contains all two-pawn vs. one-pawn endgames. Any given category, generally speaking, will contain some states that lead to wins, some that lead to draws, and some that lead to losses. The evaluation function cannot know which states are which, but it can return a single value that reflects the proportion of states with each outcome. For example, suppose our experience suggests that 72% of the states encountered in the two-pawns vs. one-pawn category lead to a win (utility +1); 20% to a loss (0), and 8% to a draw (1/2). Then a reasonable evaluation for states in the category is the expected value: (0.72 × +1) + (0.20 × 0) + (0.08 × 1/2) = 0.76. In principle, the expected value can be determined for each category, resulting in an evaluation function that works for any state. As with terminal states, the evaluation function need not return actual expected values as long as the ordering of the states is the same.

In practice, this kind of analysis requires too many categories and hence too much experience to estimate all the probabilities of winning. Instead, most evaluation functions compute separate numerical contributions from each feature and then combine them to find the total value. For example, introductory chess books give an approximate material value for each piece: each pawn is worth 1, a knight or bishop is worth 3, a rook 5, and the queen 9. Other features such as "good pawn structure" and "king safety" might be worth half a pawn, say. These feature values are then simply added up to obtain the evaluation of the position. A secure advantage equivalent to a pawn gives a substantial likelihood of winning, and a secure advantage equivalent to three pawns should give almost certain victory, as illustrated in Figure 5.8(a). Mathematically, this kind of evaluation function is called a weighted linear function because it can be expressed as

  EVAL(s) = w1 f1(s) + w2 f2(s) + · · · + wn fn(s) = Σ_{i=1}^{n} wi fi(s) ,

where each wi is a weight and each fi is a feature of the position. For chess, the fi could be the numbers of each kind of piece on the board, and the wi could be the values of the pieces (1 for pawn, 3 for bishop, etc.). Adding up the values of features seems like a reasonable thing to do, but in fact it involves a strong assumption: that the contribution of each feature is independent of the values of the other features.
For example, assigning the value 3 to a bishop ignores the fact that bishops are more powerful in the endgame, when they have a lot of space to maneuver.
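As an illustration of a weighted linear evaluation function, here is a minimal Python sketch that scores a position by material count using the introductory-book weights quoted above. The single-character board encoding (uppercase for White, lowercase for Black) is a hypothetical stand-in for whatever representation a real program would use.

# Hypothetical weights: pawn 1, knight/bishop 3, rook 5, queen 9.
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_features(board):
    # f_i(s): count of each White piece minus the count of the matching Black piece.
    # `board` is assumed to be an iterable of codes such as 'P' (White pawn), 'q' (Black queen).
    counts = {piece: 0 for piece in PIECE_VALUES}
    for square in board:
        if square.upper() in counts:
            counts[square.upper()] += 1 if square.isupper() else -1
    return counts

def eval_material(board):
    # EVAL(s) = sum_i w_i * f_i(s), the weighted linear function described above.
    features = material_features(board)
    return sum(PIECE_VALUES[p] * features[p] for p in PIECE_VALUES)

With this convention, a value of +1 corresponds to a one-pawn material advantage for White, and a nonlinear refinement (bishop pairs, endgame bonuses) would simply replace the final sum with a richer combination of the same features.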
  • 192. Section 5.4. Imperfect Real-Time Decisions 173 (b) White to move (a) White to move Figure 5.8 Two chess positions that differ only in the position of the rook at lower right. In (a), Black has an advantage of a knight and two pawns, which should be enough to win the game. In (b), White will capture the queen, giving it an advantage that should be strong enough to win. For this reason, current programs for chess and other games also use nonlinear combinations of features. For example, a pair of bishops might be worth slightly more than twice the value of a single bishop, and a bishop is worth more in the endgame (that is, when the move number feature is high or the number of remaining pieces feature is low). The astute reader will have noticed that the features and weights are not part of the rules of chess! They come from centuries of human chess-playing experience. In games where this kind of experience is not available, the weights of the evaluation function can be estimated by the machine learning techniques of Chapter 18. Reassuringly, applying these techniques to chess has confirmed that a bishop is indeed worth about three pawns. 5.4.2 Cutting off search The next step is to modify ALPHA-BETA-SEARCH so that it will call the heuristic EVAL function when it is appropriate to cut off the search. We replace the two lines in Figure 5.7 that mention TERMINAL-TEST with the following line: if CUTOFF-TEST(state, depth) then return EVAL(state) We also must arrange for some bookkeeping so that the current depth is incremented on each recursive call. The most straightforward approach to controlling the amount of search is to set a fixed depth limit so that CUTOFF-TEST(state, depth) returns true for all depth greater than some fixed depth d. (It must also return true for all terminal states, just as TERMINAL-TEST did.) The depth d is chosen so that a move is selected within the allocated time. A more robust approach is to apply iterative deepening. (See Chapter 3.) When time runs out, the program returns the move selected by the deepest completed search. As a bonus, iterative deepening also helps with move ordering.
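A minimal sketch of that iterative-deepening wrapper: it repeatedly calls a depth-limited search with an increasing limit and keeps the move chosen by the deepest search that completed before the deadline. The depth-limited decision procedure is an assumed helper (for example, an alpha–beta search whose cutoff test fires at the given depth).

import time

def iterative_deepening_decision(game, state, depth_limited_decision, time_budget=3.0):
    # Search 1 ply, then 2, and so on; return the move from the last completed depth.
    # A real program would also abort the in-progress search when time expires
    # and fall back to the previous depth's move.
    deadline = time.monotonic() + time_budget
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        best_move = depth_limited_decision(game, state, depth)   # assumed helper
        depth += 1
    return best_move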
  • 193. 174 Chapter 5. Adversarial Search These simple approaches can lead to errors due to the approximate nature of the eval- uation function. Consider again the simple evaluation function for chess based on material advantage. Suppose the program searches to the depth limit, reaching the position in Fig- ure 5.8(b), where Black is ahead by a knight and two pawns. It would report this as the heuristic value of the state, thereby declaring that the state is a probable win by Black. But White’s next move captures Black’s queen with no compensation. Hence, the position is really won for White, but this can be seen only by looking ahead one more ply. Obviously, a more sophisticated cutoff test is needed. The evaluation function should be applied only to positions that are quiescent—that is, unlikely to exhibit wild swings in value QUIESCENCE in the near future. In chess, for example, positions in which favorable captures can be made are not quiescent for an evaluation function that just counts material. Nonquiescent positions can be expanded further until quiescent positions are reached. This extra search is called a quiescence search; sometimes it is restricted to consider only certain types of moves, such QUIESCENCE SEARCH as capture moves, that will quickly resolve the uncertainties in the position. The horizon effect is more difficult to eliminate. It arises when the program is facing HORIZON EFFECT an opponent’s move that causes serious damage and is ultimately unavoidable, but can be temporarily avoided by delaying tactics. Consider the chess game in Figure 5.9. It is clear that there is no way for the black bishop to escape. For example, the white rook can capture it by moving to h1, then a1, then a2; a capture at depth 6 ply. But Black does have a sequence of moves that pushes the capture of the bishop “over the horizon.” Suppose Black searches to depth 8 ply. Most moves by Black will lead to the eventual capture of the bishop, and thus will be marked as “bad” moves. But Black will consider checking the white king with the pawn at e4. This will lead to the king capturing the pawn. Now Black will consider checking again, with the pawn at f5, leading to another pawn capture. That takes up 4 ply, and from there the remaining 4 ply is not enough to capture the bishop. Black thinks that the line of play has saved the bishop at the price of two pawns, when actually all it has done is push the inevitable capture of the bishop beyond the horizon that Black can see. One strategy to mitigate the horizon effect is the singular extension, a move that is SINGULAR EXTENSION “clearly better” than all other moves in a given position. Once discovered anywhere in the tree in the course of a search, this singular move is remembered. When the search reaches the normal depth limit, the algorithm checks to see if the singular extension is a legal move; if it is, the algorithm allows the move to be considered. This makes the tree deeper, but because there will be few singular extensions, it does not add many total nodes to the tree. 5.4.3 Forward pruning So far, we have talked about cutting off search at a certain level and about doing alpha– beta pruning that provably has no effect on the result (at least with respect to the heuristic evaluation values). It is also possible to do forward pruning, meaning that some moves at FORWARD PRUNING a given node are pruned immediately without further consideration. 
Clearly, most humans playing chess consider only a few moves from each position (at least consciously). One approach to forward pruning is beam search: on each ply, consider only a "beam" of the n best moves (according to the evaluation function) rather than considering all possible moves.
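A sketch of that idea, under the same assumed game interface and evaluation function as the earlier sketches: only the n moves that look best after a one-step evaluation are searched at all, and the ranking would be reversed at MIN nodes.

def beam_moves(game, state, eval_fn, n=5):
    # Keep only a "beam" of the n most promising moves, ranked by a one-step evaluation.
    scored = [(eval_fn(game.result(state, a)), a) for a in game.actions(state)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored[:n]]

Plugging beam_moves in place of game.actions inside MAX-VALUE and MIN-VALUE gives a forward-pruned search, with the caveat discussed next.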
[Figure 5.9 (chess diagram on an 8 × 8 board, files a–h and ranks 1–8).]

Figure 5.9 The horizon effect. With Black to move, the black bishop is surely doomed. But Black can forestall that event by checking the white king with its pawns, forcing the king to capture the pawns. This pushes the inevitable loss of the bishop over the horizon, and thus the pawn sacrifices are seen by the search algorithm as good moves rather than bad ones.

Unfortunately, this approach is rather dangerous because there is no guarantee that the best move will not be pruned away. The PROBCUT, or probabilistic cut, algorithm (Buro, 1995) is a forward-pruning version of alpha–beta search that uses statistics gained from prior experience to lessen the chance that the best move will be pruned. Alpha–beta search prunes any node that is provably outside the current (α, β) window. PROBCUT also prunes nodes that are probably outside the window. It computes this probability by doing a shallow search to compute the backed-up value v of a node and then using past experience to estimate how likely it is that a score of v at depth d in the tree would be outside (α, β). Buro applied this technique to his Othello program, LOGISTELLO, and found that a version of his program with PROBCUT beat the regular version 64% of the time, even when the regular version was given twice as much time.

Combining all the techniques described here results in a program that can play creditable chess (or other games). Let us assume we have implemented an evaluation function for chess, a reasonable cutoff test with a quiescence search, and a large transposition table. Let us also assume that, after months of tedious bit-bashing, we can generate and evaluate around a million nodes per second on the latest PC, allowing us to search roughly 200 million nodes per move under standard time controls (three minutes per move). The branching factor for chess is about 35, on average, and 35^5 is about 50 million, so if we used minimax search, we could look ahead only about five plies. Though not incompetent, such a program can be fooled easily by an average human chess player, who can occasionally plan six or eight plies ahead. With alpha–beta search we get to about 10 plies, which results in an expert level of play. Section 5.8 describes additional pruning techniques that can extend the effective search depth to roughly 14 plies. To reach grandmaster status we would need an extensively tuned evaluation function and a large database of optimal opening and endgame moves.
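The back-of-the-envelope numbers in that paragraph can be checked directly; this snippet just reproduces the arithmetic for a budget of 200 million nodes per move and a branching factor of 35.

import math

budget = 200_000_000      # nodes searchable per move under standard time controls
b = 35                    # average branching factor for chess

minimax_plies = math.log(budget) / math.log(b)    # solve b**m = budget for m
alphabeta_plies = 2 * minimax_plies               # best case examines about b**(m/2) nodes

print(35 ** 5)                      # 52521875, roughly 50 million
print(round(minimax_plies, 1))      # about 5.4 plies for plain minimax
print(round(alphabeta_plies, 1))    # about 10.7 plies for well-ordered alpha-beta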
  • 195. 176 Chapter 5. Adversarial Search 5.4.4 Search versus lookup Somehow it seems like overkill for a chess program to start a game by considering a tree of a billion game states, only to conclude that it will move its pawn to e4. Books describing good play in the opening and endgame in chess have been available for about a century (Tattersall, 1911). It is not surprising, therefore, that many game-playing programs use table lookup rather than search for the opening and ending of games. For the openings, the computer is mostly relying on the expertise of humans. The best advice of human experts on how to play each opening is copied from books and entered into tables for the computer’s use. However, computers can also gather statistics from a database of previously played games to see which opening sequences most often lead to a win. In the early moves there are few choices, and thus much expert commentary and past games on which to draw. Usually after ten moves we end up in a rarely seen position, and the program must switch from table lookup to search. Near the end of the game there are again fewer possible positions, and thus more chance to do lookup. But here it is the computer that has the expertise: computer analysis of endgames goes far beyond anything achieved by humans. A human can tell you the gen- eral strategy for playing a king-and-rook-versus-king (KRK) endgame: reduce the opposing king’s mobility by squeezing it toward one edge of the board, using your king to prevent the opponent from escaping the squeeze. Other endings, such as king, bishop, and knight versus king (KBNK), are difficult to master and have no succinct strategy description. A computer, on the other hand, can completely solve the endgame by producing a policy, which is a map- POLICY ping from every possible state to the best move in that state. Then we can just look up the best move rather than recompute it anew. How big will the KBNK lookup table be? It turns out there are 462 ways that two kings can be placed on the board without being adjacent. After the kings are placed, there are 62 empty squares for the bishop, 61 for the knight, and two possible players to move next, so there are just 462 × 62 × 61 × 2 = 3, 494, 568 possible positions. Some of these are checkmates; mark them as such in a table. Then do a retrograde RETROGRADE minimax search: reverse the rules of chess to do unmoves rather than moves. Any move by White that, no matter what move Black responds with, ends up in a position marked as a win, must also be a win. Continue this search until all 3,494,568 positions are resolved as win, loss, or draw, and you have an infallible lookup table for all KBNK endgames. Using this technique and a tour de force of optimization tricks, Ken Thompson (1986, 1996) and Lewis Stiller (1992, 1996) solved all chess endgames with up to five pieces and some with six pieces, making them available on the Internet. Stiller discovered one case where a forced mate existed but required 262 moves; this caused some consternation because the rules of chess require a capture or pawn move to occur within 50 moves. Later work by Marc Bourzutschky and Yakov Konoval (Bourzutschky, 2006) solved all pawnless six-piece and some seven-piece endgames; there is a KQNKRBN endgame that with best play requires 517 moves until a capture, which then leads to a mate. If we could extend the chess endgame tables from 6 pieces to 32, then White would know on the opening move whether it would be a win, loss, or draw. 
This has not happened so far for chess, but it has happened for checkers, as explained in the historical notes section.
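The retrograde construction described above can be sketched abstractly. This is a naive fixed-point version of the idea: instead of generating "unmoves", it repeatedly sweeps over all positions, which is simpler to state though far less efficient than a real tablebase generator. The position set and the helper functions are assumptions for illustration.

def solve_endgame(positions, is_checkmate_for_white, successors, to_move):
    # Label every position from which White can force checkmate.
    # `positions` is an iterable of all legal positions (e.g., the 3,494,568 KBNK positions);
    # `successors(p)` returns the positions reachable by one legal move from p.
    win = {p for p in positions if is_checkmate_for_white(p)}
    changed = True
    while changed:
        changed = False
        for p in positions:
            if p in win:
                continue
            succ = successors(p)
            if to_move(p) == 'White':
                forced = any(s in win for s in succ)            # one winning move suffices
            else:
                forced = bool(succ) and all(s in win for s in succ)  # every Black reply loses
            if forced:
                win.add(p)
                changed = True
    return win   # positions outside `win` are drawn or lost for White

Turning the result into a policy is then a matter of storing, for each winning White-to-move position, one successor that stays inside the winning set.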
  • 196. Section 5.5. Stochastic Games 177 5.5 STOCHASTIC GAMES In real life, many unpredictable external events can put us into unforeseen situations. Many games mirror this unpredictability by including a random element, such as the throwing of dice. We call these stochastic games. Backgammon is a typical game that combines luck STOCHASTIC GAMES and skill. Dice are rolled at the beginning of a player’s turn to determine the legal moves. In the backgammon position of Figure 5.10, for example, White has rolled a 6–5 and has four possible moves. 1 2 3 4 5 6 7 8 9 10 11 12 24 23 22 21 20 19 18 17 16 15 14 13 0 25 Figure 5.10 A typical backgammon position. The goal of the game is to move all one’s pieces off the board. White moves clockwise toward 25, and Black moves counterclockwise toward 0. A piece can move to any position unless multiple opponent pieces are there; if there is one opponent, it is captured and must start over. In the position shown, White has rolled 6–5 and must choose among four legal moves: (5–10,5–11), (5–11,19–24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) means move one piece from position 5 to 11, and then move a piece from 11 to 16. Although White knows what his or her own legal moves are, White does not know what Black is going to roll and thus does not know what Black’s legal moves will be. That means White cannot construct a standard game tree of the sort we saw in chess and tic-tac-toe. A game tree in backgammon must include chance nodes in addition to MAX and MIN nodes. CHANCE NODES Chance nodes are shown as circles in Figure 5.11. The branches leading from each chance node denote the possible dice rolls; each branch is labeled with the roll and its probability. There are 36 ways to roll two dice, each equally likely; but because a 6–5 is the same as a 5–6, there are only 21 distinct rolls. The six doubles (1–1 through 6–6) each have a probability of 1/36, so we say P(1–1) = 1/36. The other 15 distinct rolls each have a 1/18 probability.
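The roll distribution just described is easy to generate; the snippet below enumerates the 21 distinct rolls of two dice and their probabilities, which is exactly the table a chance node sums over.

from fractions import Fraction

def dice_rolls():
    # Return the 21 distinct rolls of two dice with their probabilities.
    rolls = {}
    for d1 in range(1, 7):
        for d2 in range(1, 7):
            key = (min(d1, d2), max(d1, d2))     # 6-5 is the same roll as 5-6
            rolls[key] = rolls.get(key, Fraction(0)) + Fraction(1, 36)
    return rolls

rolls = dice_rolls()
print(len(rolls))             # 21 distinct rolls
print(rolls[(1, 1)])          # doubles have probability 1/36
print(rolls[(5, 6)])          # the other 15 rolls have probability 1/18
print(sum(rolls.values()))    # probabilities sum to 1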
[Figure 5.11 (diagram): a game tree whose MAX and MIN layers are separated by CHANCE layers and end in TERMINAL nodes with values such as 2, −1, and 1; each chance branch is labeled with a dice roll such as 1–1 (probability 1/36) or 6–5 (probability 1/18).]

Figure 5.11 Schematic game tree for a backgammon position.

The next step is to understand how to make correct decisions. Obviously, we still want to pick the move that leads to the best position. However, positions do not have definite minimax values. Instead, we can only calculate the expected value of a position: the average over all possible outcomes of the chance nodes.

This leads us to generalize the minimax value for deterministic games to an expectiminimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before. For chance nodes we compute the expected value, which is the sum of the value over all outcomes, weighted by the probability of each chance action:

  EXPECTIMINIMAX(s) =
    UTILITY(s)                                   if TERMINAL-TEST(s)
    max_a EXPECTIMINIMAX(RESULT(s, a))           if PLAYER(s) = MAX
    min_a EXPECTIMINIMAX(RESULT(s, a))           if PLAYER(s) = MIN
    Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))        if PLAYER(s) = CHANCE

where r represents a possible dice roll (or other chance event) and RESULT(s, r) is the same state as s, with the additional fact that the result of the dice roll is r.

5.5.1 Evaluation functions for games of chance

As with minimax, the obvious approximation to make with expectiminimax is to cut the search off at some point and apply an evaluation function to each leaf. One might think that evaluation functions for games such as backgammon should be just like evaluation functions
for chess—they just need to give higher scores to better positions. But in fact, the presence of chance nodes means that one has to be more careful about what the evaluation values mean. Figure 5.12 shows what happens: with an evaluation function that assigns the values [1, 2, 3, 4] to the leaves, move a1 is best; with values [1, 20, 30, 400], move a2 is best. Hence, the program behaves totally differently if we make a change in the scale of some evaluation values! It turns out that to avoid this sensitivity, the evaluation function must be a positive linear transformation of the probability of winning from a position (or, more generally, of the expected utility of the position). This is an important and general property of situations in which uncertainty is involved, and we discuss it further in Chapter 16.

[Figure 5.12 (diagram): two MAX–CHANCE–MIN trees with moves a1 and a2 and chance branches of probability .9 and .1. With leaf values [1, 2, 3, 4] the backed-up chance values are 2.1 and 1.3, so a1 is best; with leaf values [1, 20, 30, 400] they are 21 and 40.9, so a2 is best.]

Figure 5.12 An order-preserving transformation on leaf values changes the best move.

If the program knew in advance all the dice rolls that would occur for the rest of the game, solving a game with dice would be just like solving a game without dice, which minimax does in O(b^m) time, where b is the branching factor and m is the maximum depth of the game tree. Because expectiminimax is also considering all the possible dice-roll sequences, it will take O(b^m n^m), where n is the number of distinct rolls. Even if the search depth is limited to some small depth d, the extra cost compared with that of minimax makes it unrealistic to consider looking ahead very far in most games of chance. In backgammon n is 21 and b is usually around 20, but in some situations can be as high as 4000 for dice rolls that are doubles. Three plies is probably all we could manage.

Another way to think about the problem is this: the advantage of alpha–beta is that it ignores future developments that just are not going to happen, given best play. Thus, it concentrates on likely occurrences. In games with dice, there are no likely sequences of moves, because for those moves to take place, the dice would first have to come out the right way to make them legal. This is a general problem whenever uncertainty enters the picture: the possibilities are multiplied enormously, and forming detailed plans of action becomes pointless because the world probably will not play along.

It may have occurred to you that something like alpha–beta pruning could be applied
  • 199. 180 Chapter 5. Adversarial Search to game trees with chance nodes. It turns out that it can. The analysis for MIN and MAX nodes is unchanged, but we can also prune chance nodes, using a bit of ingenuity. Consider the chance node C in Figure 5.11 and what happens to its value as we examine and evaluate its children. Is it possible to find an upper bound on the value of C before we have looked at all its children? (Recall that this is what alpha–beta needs in order to prune a node and its subtree.) At first sight, it might seem impossible because the value of C is the average of its children’s values, and in order to compute the average of a set of numbers, we must look at all the numbers. But if we put bounds on the possible values of the utility function, then we can arrive at bounds for the average without looking at every number. For example, say that all utility values are between −2 and +2; then the value of leaf nodes is bounded, and in turn we can place an upper bound on the value of a chance node without looking at all its children. An alternative is to do Monte Carlo simulation to evaluate a position. Start with MONTE CARLO SIMULATION an alpha–beta (or other) search algorithm. From a start position, have the algorithm play thousands of games against itself, using random dice rolls. In the case of backgammon, the resulting win percentage has been shown to be a good approximation of the value of the position, even if the algorithm has an imperfect heuristic and is searching only a few plies (Tesauro, 1995). For games with dice, this type of simulation is called a rollout. ROLLOUT 5.6 PARTIALLY OBSERVABLE GAMES Chess has often been described as war in miniature, but it lacks at least one major charac- teristic of real wars, namely, partial observability. In the “fog of war,” the existence and disposition of enemy units is often unknown until revealed by direct contact. As a result, warfare includes the use of scouts and spies to gather information and the use of concealment and bluff to confuse the enemy. Partially observable games share these characteristics and are thus qualitatively different from the games described in the preceding sections. 5.6.1 Kriegspiel: Partially observable chess In deterministic partially observable games, uncertainty about the state of the board arises en- tirely from lack of access to the choices made by the opponent. This class includes children’s games such as Battleships (where each player’s ships are placed in locations hidden from the opponent but do not move) and Stratego (where piece locations are known but piece types are hidden). We will examine the game of Kriegspiel, a partially observable variant of chess in KRIEGSPIEL which pieces can move but are completely invisible to the opponent. The rules of Kriegspiel are as follows: White and Black each see a board containing only their own pieces. A referee, who can see all the pieces, adjudicates the game and period- ically makes announcements that are heard by both players. On his turn, White proposes to the referee any move that would be legal if there were no black pieces. If the move is in fact not legal (because of the black pieces), the referee announces “illegal.” In this case, White may keep proposing moves until a legal one is found—and learns more about the location of Black’s pieces in the process. Once a legal move is proposed, the referee announces one or
  • 200. Section 5.6. Partially Observable Games 181 more of the following: “Capture on square X” if there is a capture, and “Check by D” if the black king is in check, where D is the direction of the check, and can be one of “Knight,” “Rank,” “File,” “Long diagonal,” or “Short diagonal.” (In case of discovered check, the ref- eree may make two “Check” announcements.) If Black is checkmated or stalemated, the referee says so; otherwise, it is Black’s turn to move. Kriegspiel may seem terrifyingly impossible, but humans manage it quite well and com- puter programs are beginning to catch up. It helps to recall the notion of a belief state as defined in Section 4.4 and illustrated in Figure 4.14—the set of all logically possible board states given the complete history of percepts to date. Initially, White’s belief state is a sin- gleton because Black’s pieces haven’t moved yet. After White makes a move and Black re- sponds, White’s belief state contains 20 positions because Black has 20 replies to any White move. Keeping track of the belief state as the game progresses is exactly the problem of state estimation, for which the update step is given in Equation (4.6). We can map Kriegspiel state estimation directly onto the partially observable, nondeterministic framework of Sec- tion 4.4 if we consider the opponent as the source of nondeterminism; that is, the RESULTS of White’s move are composed from the (predictable) outcome of White’s own move and the unpredictable outcome given by Black’s reply.3 Given a current belief state, White may ask, “Can I win the game?” For a partially observable game, the notion of a strategy is altered; instead of specifying a move to make for each possible move the opponent might make, we need a move for every possible percept sequence that might be received. For Kriegspiel, a winning strategy, or guaranteed check- mate, is one that, for each possible percept sequence, leads to an actual checkmate for every GUARANTEED CHECKMATE possible board state in the current belief state, regardless of how the opponent moves. With this definition, the opponent’s belief state is irrelevant—the strategy has to work even if the opponent can see all the pieces. This greatly simplifies the computation. Figure 5.13 shows part of a guaranteed checkmate for the KRK (king and rook against king) endgame. In this case, Black has just one piece (the king), so a belief state for White can be shown in a single board by marking each possible position of the Black king. The general AND-OR search algorithm can be applied to the belief-state space to find guaranteed checkmates, just as in Section 4.4. The incremental belief-state algorithm men- tioned in that section often finds midgame checkmates up to depth 9—probably well beyond the abilities of human players. In addition to guaranteed checkmates, Kriegspiel admits an entirely new concept that makes no sense in fully observable games: probabilistic checkmate. Such checkmates are PROBABILISTIC CHECKMATE still required to work in every board state in the belief state; they are probabilistic with respect to randomization of the winning player’s moves. To get the basic idea, consider the problem of finding a lone black king using just the white king. Simply by moving randomly, the white king will eventually bump into the black king even if the latter tries to avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of probability theory, detection occurs with probability 1. 
3 Sometimes, the belief state will become too large to represent just as a list of board states, but we will ignore this issue for now; Chapters 7 and 8 suggest methods for compactly representing very large belief states.

The KBNK endgame—king, bishop
  • 201. 182 Chapter 5. Adversarial Search a 1 2 3 4 d b c Kc3 ? “Illegal” “OK” Rc3 ? “OK” “Check” Figure 5.13 Part of a guaranteed checkmate in the KRK endgame, shown on a reduced board. In the initial belief state, Black’s king is in one of three possible locations. By a combination of probing moves, the strategy narrows this down to one. Completion of the checkmate is left as an exercise. and knight against king—is won in this sense; White presents Black with an infinite random sequence of choices, for one of which Black will guess incorrectly and reveal his position, leading to checkmate. The KBBK endgame, on the other hand, is won with probability 1− ǫ. White can force a win only by leaving one of his bishops unprotected for one move. If Black happens to be in the right place and captures the bishop (a move that would lose if the bishops are protected), the game is drawn. White can choose to make the risky move at some randomly chosen point in the middle of a very long sequence, thus reducing ǫ to an arbitrarily small constant, but cannot reduce ǫ to zero. It is quite rare that a guaranteed or probabilistic checkmate can be found within any reasonable depth, except in the endgame. Sometimes a checkmate strategy works for some of the board states in the current belief state but not others. Trying such a strategy may succeed, leading to an accidental checkmate—accidental in the sense that White could not know that ACCIDENTAL CHECKMATE it would be checkmate—if Black’s pieces happen to be in the right places. (Most checkmates in games between humans are of this accidental nature.) This idea leads naturally to the question of how likely it is that a given strategy will win, which leads in turn to the question of how likely it is that each board state in the current belief state is the true board state.
One's first inclination might be to propose that all board states in the current belief state are equally likely—but this can't be right. Consider, for example, White's belief state after Black's first move of the game. By definition (assuming that Black plays optimally), Black must have played an optimal move, so all board states resulting from suboptimal moves ought to be assigned zero probability. This argument is not quite right either, because each player's goal is not just to move pieces to the right squares but also to minimize the information that the opponent has about their location. Playing any predictable "optimal" strategy provides the opponent with information. Hence, optimal play in partially observable games requires a willingness to play somewhat randomly. (This is why restaurant hygiene inspectors do random inspection visits.) This means occasionally selecting moves that may seem "intrinsically" weak—but they gain strength from their very unpredictability, because the opponent is unlikely to have prepared any defense against them.

From these considerations, it seems that the probabilities associated with the board states in the current belief state can only be calculated given an optimal randomized strategy; in turn, computing that strategy seems to require knowing the probabilities of the various states the board might be in. This conundrum can be resolved by adopting the game-theoretic notion of an equilibrium solution, which we pursue further in Chapter 17. An equilibrium specifies an optimal randomized strategy for each player. Computing equilibria is prohibitively expensive, however, even for small games, and is out of the question for Kriegspiel. At present, the design of effective algorithms for general Kriegspiel play is an open research topic. Most systems perform bounded-depth lookahead in their own belief-state space, ignoring the opponent's belief state. Evaluation functions resemble those for the observable game but include a component for the size of the belief state—smaller is better!

5.6.2 Card games

Card games provide many examples of stochastic partial observability, where the missing information is generated randomly. For example, in many games, cards are dealt randomly at the beginning of the game, with each player receiving a hand that is not visible to the other players. Such games include bridge, whist, hearts, and some forms of poker. At first sight, it might seem that these card games are just like dice games: the cards are dealt randomly and determine the moves available to each player, but all the "dice" are rolled at the beginning! Even though this analogy turns out to be incorrect, it suggests an effective algorithm: consider all possible deals of the invisible cards; solve each one as if it were a fully observable game; and then choose the move that has the best outcome averaged over all the deals. Suppose that each deal s occurs with probability P(s); then the move we want is

  argmax_a Σ_s P(s) MINIMAX(RESULT(s, a)) .     (5.1)

Here, we run exact MINIMAX if computationally feasible; otherwise, we run H-MINIMAX. Now, in most card games, the number of possible deals is rather large. For example, in bridge play, each player sees just two of the four hands; there are two unseen hands of 13 cards each, so the number of deals is C(26, 13) = 10,400,600. Solving even one deal is quite difficult, so solving ten million is out of the question. Instead, we resort to a Monte Carlo
approximation: instead of adding up all the deals, we take a random sample of N deals, where the probability of deal s appearing in the sample is proportional to P(s):

  argmax_a (1/N) Σ_{i=1}^{N} MINIMAX(RESULT(s_i, a)) .     (5.2)

(Notice that P(s) does not appear explicitly in the summation, because the samples are already drawn according to P(s).) As N grows large, the sum over the random sample tends to the exact value, but even for fairly small N—say, 100 to 1,000—the method gives a good approximation. It can also be applied to deterministic games such as Kriegspiel, given some reasonable estimate of P(s). For games like whist and hearts, where there is no bidding or betting phase before play commences, each deal will be equally likely and so the values of P(s) are all equal. For bridge, play is preceded by a bidding phase in which each team indicates how many tricks it expects to win. Since players bid based on the cards they hold, the other players learn more about the probability of each deal. Taking this into account in deciding how to play the hand is tricky, for the reasons mentioned in our description of Kriegspiel: players may bid in such a way as to minimize the information conveyed to their opponents. Even so, the approach is quite effective for bridge, as we show in Section 5.7.

The strategy described in Equations 5.1 and 5.2 is sometimes called averaging over clairvoyance because it assumes that the game will become observable to both players immediately after the first move. Despite its intuitive appeal, the strategy can lead one astray. Consider the following story:

  Day 1: Road A leads to a heap of gold; Road B leads to a fork. Take the left fork and you'll find a bigger heap of gold, but take the right fork and you'll be run over by a bus.
  Day 2: Road A leads to a heap of gold; Road B leads to a fork. Take the right fork and you'll find a bigger heap of gold, but take the left fork and you'll be run over by a bus.
  Day 3: Road A leads to a heap of gold; Road B leads to a fork. One branch of the fork leads to a bigger heap of gold, but take the wrong fork and you'll be hit by a bus. Unfortunately you don't know which fork is which.

Averaging over clairvoyance leads to the following reasoning: on Day 1, B is the right choice; on Day 2, B is the right choice; on Day 3, the situation is the same as either Day 1 or Day 2, so B must still be the right choice. Now we can see how averaging over clairvoyance fails: it does not consider the belief state that the agent will be in after acting. A belief state of total ignorance is not desirable, especially when one possibility is certain death. Because it assumes that every future state will automatically be one of perfect knowledge, the approach never selects actions that gather information (like the first move in Figure 5.13); nor will it choose actions that hide information from the opponent or provide information to a partner because it assumes that they already know the information; and it will never bluff in poker,4 because it assumes the opponent can see its cards. In Chapter 17, we show how to construct algorithms that do all these things by virtue of solving the true partially observable decision problem.

4 Bluffing—betting as if one's hand is good, even when it's not—is a core part of poker strategy.
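A minimal sketch of the sampling approximation in Equation (5.2). It assumes two helpers that are not defined in the text: a deal sampler that draws complete deals consistent with what the player can see (with probability proportional to P(s)), and a solver for the fully observable game (exact MINIMAX or H-MINIMAX).

from collections import defaultdict

def monte_carlo_decision(my_actions, sample_deal, solve_deal, n_samples=100):
    # Equation (5.2): average each action's value over N sampled deals.
    # sample_deal() -> a fully specified deal drawn according to P(s)   (assumed helper)
    # solve_deal(deal, action) -> minimax value of `action` in that deal (assumed helper)
    totals = defaultdict(float)
    for _ in range(n_samples):
        deal = sample_deal()
        for a in my_actions:
            totals[a] += solve_deal(deal, a)
    return max(my_actions, key=lambda a: totals[a] / n_samples)

Nothing here is specific to bridge; the same loop applies to any game of stochastic partial observability for which sampling deals and solving the observable game are both feasible.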
  • 204. Section 5.7. State-of-the-Art Game Programs 185 5.7 STATE-OF-THE-ART GAME PROGRAMS In 1965, the Russian mathematician Alexander Kronrod called chess “the Drosophila of ar- tificial intelligence.” John McCarthy disagrees: whereas geneticists use fruit flies to make discoveries that apply to biology more broadly, AI has used chess to do the equivalent of breeding very fast fruit flies. Perhaps a better analogy is that chess is to AI as Grand Prix motor racing is to the car industry: state-of-the-art game programs are blindingly fast, highly optimized machines that incorporate the latest engineering advances, but they aren’t much use for doing the shopping or driving off-road. Nonetheless, racing and game-playing gen- erate excitement and a steady stream of innovations that have been adopted by the wider community. In this section we look at what it takes to come out on top in various games. Chess: IBM’s DEEP BLUE chess program, now retired, is well known for defeating world CHESS champion Garry Kasparov in a widely publicized exhibition match. Deep Blue ran on a par- allel computer with 30 IBM RS/6000 processors doing alpha–beta search. The unique part was a configuration of 480 custom VLSI chess processors that performed move generation and move ordering for the last few levels of the tree, and evaluated the leaf nodes. Deep Blue searched up to 30 billion positions per move, reaching depth 14 routinely. The key to its success seems to have been its ability to generate singular extensions beyond the depth limit for sufficiently interesting lines of forcing/forced moves. In some cases the search reached a depth of 40 plies. The evaluation function had over 8000 features, many of them describing highly specific patterns of pieces. An “opening book” of about 4000 positions was used, as well as a database of 700,000 grandmaster games from which consensus recommendations could be extracted. The system also used a large endgame database of solved positions con- taining all positions with five pieces and many with six pieces. This database had the effect of substantially extending the effective search depth, allowing Deep Blue to play perfectly in some cases even when it was many moves away from checkmate. The success of DEEP BLUE reinforced the widely held belief that progress in computer game-playing has come primarily from ever-more-powerful hardware—a view encouraged by IBM. But algorithmic improvements have allowed programs running on standard PCs to win World Computer Chess Championships. A variety of pruning heuristics are used to reduce the effective branching factor to less than 3 (compared with the actual branching factor of about 35). The most important of these is the null move heuristic, which generates a good NULL MOVE lower bound on the value of a position, using a shallow search in which the opponent gets to move twice at the beginning. This lower bound often allows alpha–beta pruning without the expense of a full-depth search. Also important is futility pruning, which helps decide in FUTILITY PRUNING advance which moves will cause a beta cutoff in the successor nodes. HYDRA can be seen as the successor to DEEP BLUE. HYDRA runs on a 64-processor cluster with 1 gigabyte per processor and with custom hardware in the form of FPGA (Field Programmable Gate Array) chips. HYDRA reaches 200 million evaluations per second, about the same as Deep Blue, but HYDRA reaches 18 plies deep rather than just 14 because of aggressive use of the null move heuristic and forward pruning.
  • 205. 186 Chapter 5. Adversarial Search RYBKA, winner of the 2008 and 2009 World Computer Chess Championships, is con- sidered the strongest current computer player. It uses an off-the-shelf 8-core 3.2 GHz Intel Xeon processor, but little is known about the design of the program. RYBKA’s main ad- vantage appears to be its evaluation function, which has been tuned by its main developer, International Master Vasik Rajlich, and at least three other grandmasters. The most recent matches suggest that the top computer chess programs have pulled ahead of all human contenders. (See the historical notes for details.) Checkers: Jonathan Schaeffer and colleagues developed CHINOOK, which runs on regular CHECKERS PCs and uses alpha–beta search. Chinook defeated the long-running human champion in an abbreviated match in 1990, and since 2007 CHINOOK has been able to play perfectly by using alpha–beta search combined with a database of 39 trillion endgame positions. Othello, also called Reversi, is probably more popular as a computer game than as a board OTHELLO game. It has a smaller search space than chess, usually 5 to 15 legal moves, but evaluation expertise had to be developed from scratch. In 1997, the LOGISTELLO program (Buro, 2002) defeated the human world champion, Takeshi Murakami, by six games to none. It is generally acknowledged that humans are no match for computers at Othello. Backgammon: Section 5.5 explained why the inclusion of uncertainty from dice rolls makes BACKGAMMON deep search an expensive luxury. Most work on backgammon has gone into improving the evaluation function. Gerry Tesauro (1992) combined reinforcement learning with neural networks to develop a remarkably accurate evaluator that is used with a search to depth 2 or 3. After playing more than a million training games against itself, Tesauro’s program, TD-GAMMON, is competitive with top human players. The program’s opinions on the open- ing moves of the game have in some cases radically altered the received wisdom. Go is the most popular board game in Asia. Because the board is 19 × 19 and moves are GO allowed into (almost) every empty square, the branching factor starts at 361, which is too daunting for regular alpha–beta search methods. In addition, it is difficult to write an eval- uation function because control of territory is often very unpredictable until the endgame. Therefore the top programs, such as MOGO, avoid alpha–beta search and instead use Monte Carlo rollouts. The trick is to decide what moves to make in the course of the rollout. There is no aggressive pruning; all moves are possible. The UCT (upper confidence bounds on trees) method works by making random moves in the first few iterations, and over time guiding the sampling process to prefer moves that have led to wins in previous samples. Some tricks are added, including knowledge-based rules that suggest particular moves whenever a given pattern is detected and limited local search to decide tactical questions. Some programs also include special techniques from combinatorial game theory to analyze endgames. These COMBINATORIAL GAME THEORY techniques decompose a position into sub-positions that can be analyzed separately and then combined (Berlekamp and Wolfe, 1994; Müller, 2003). The optimal solutions obtained in this way have surprised many professional Go players, who thought they had been playing optimally all along. Current Go programs play at the master level on a reduced 9 × 9 board, but are still at advanced amateur level on a full board. 
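The rollout idea used by Go programs such as MOGO can be sketched in a few lines: estimate a move's value by playing many random games to completion from the resulting position and counting wins. The game interface is the same hypothetical one used in the earlier sketches; real programs add the UCT rule to bias later samples toward moves that have been winning, rather than sampling uniformly as done here.

import random

def rollout_value(game, state, player, n_playouts=200):
    # Estimate the win probability for `player` by pure random self-play.
    wins = 0
    for _ in range(n_playouts):
        s = state
        while not game.terminal_test(s):
            s = game.result(s, random.choice(game.actions(s)))
        if game.utility(s, player) > 0:   # assumes utility(s, player) > 0 means a win
            wins += 1
    return wins / n_playouts

def monte_carlo_move(game, state, player, n_playouts=200):
    # Pick the move whose resulting position has the best estimated win rate.
    return max(game.actions(state),
               key=lambda a: rollout_value(game, game.result(state, a), player, n_playouts))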
Bridge is a card game of imperfect information: a player's cards are hidden from the other players. Bridge is also a multiplayer game with four players instead of two, although the
  • 206. Section 5.8. Alternative Approaches 187 players are paired into two teams. As in Section 5.6, optimal play in partially observable games like bridge can include elements of information gathering, communication, and careful weighing of probabilities. Many of these techniques are used in the Bridge Baron program (Smith et al., 1998), which won the 1997 computer bridge championship. While it does not play optimally, Bridge Baron is one of the few successful game-playing systems to use complex, hierarchical plans (see Chapter 11) involving high-level ideas, such as finessing and squeezing, that are familiar to bridge players. The GIB program (Ginsberg, 1999) won the 2000 computer bridge championship quite decisively using the Monte Carlo method. Since then, other winning programs have followed GIB’s lead. GIB’s major innovation is using explanation-based generalization to compute EXPLANATION- BASED GENERALIZATION and cache general rules for optimal play in various standard classes of situations rather than evaluating each situation individually. For example, in a situation where one player has the cards A-K-Q-J-4-3-2 of one suit and another player has 10-9-8-7-6-5, there are 7 × 6 = 42 ways that the first player can lead from that suit and the second player can follow. But GIB treats these situations as just two: the first player can lead either a high card or a low card; the exact cards played don’t matter. With this optimization (and a few others), GIB can solve a 52-card, fully observable deal exactly in about a second. GIB’s tactical accuracy makes up for its inability to reason about information. It finished 12th in a field of 35 in the par contest (involving just play of the hand, not bidding) at the 1998 human world championship, far exceeding the expectations of many human experts. There are several reasons why GIB plays at expert level with Monte Carlo simulation, whereas Kriegspiel programs do not. First, GIB’s evaluation of the fully observable version of the game is exact, searching the full game tree, while Kriegspiel programs rely on inexact heuristics. But far more important is the fact that in bridge, most of the uncertainty in the partially observable information comes from the randomness of the deal, not from the adver- sarial play of the opponent. Monte Carlo simulation handles randomness well, but does not always handle strategy well, especially when the strategy involves the value of information. Scrabble: Most people think the hard part about Scrabble is coming up with good words, but SCRABBLE given the official dictionary, it turns out to be rather easy to program a move generator to find the highest-scoring move (Gordon, 1994). That doesn’t mean the game is solved, however: merely taking the top-scoring move each turn results in a good but not expert player. The problem is that Scrabble is both partially observable and stochastic: you don’t know what letters the other player has or what letters you will draw next. So playing Scrabble well combines the difficulties of backgammon and bridge. Nevertheless, in 2006, the QUACKLE program defeated the former world champion, David Boys, 3–2. 5.8 ALTERNATIVE APPROACHES Because calculating optimal decisions in games is intractable in most cases, all algorithms must make some assumptions and approximations. The standard approach, based on mini- max, evaluation functions, and alpha–beta, is just one way to do this. Probably because it has
  • 207. 188 Chapter 5. Adversarial Search Figure 5.14 A two-ply game tree for which heuristic minimax may make an error. (Leaf values: 99, 1000, 1000, 1000 under the left MIN node; 100, 101, 102, 100 under the right.) been worked on for so long, the standard approach dominates other methods in tournament play. Some believe that this has caused game playing to become divorced from the mainstream of AI research: the standard approach no longer provides much room for new insight into general questions of decision making. In this section, we look at the alternatives. First, let us consider heuristic minimax. It selects an optimal move in a given search tree provided that the leaf node evaluations are exactly correct. In reality, evaluations are usually crude estimates of the value of a position and can be considered to have large errors associated with them. Figure 5.14 shows a two-ply game tree for which minimax suggests taking the right-hand branch because 100 > 99. That is the correct move if the evaluations are all correct. But of course the evaluation function is only approximate. Suppose that the evaluation of each node has an error that is independent of other nodes and is randomly distributed with mean zero and standard deviation of σ. Then when σ = 5, the left-hand branch is actually better 71% of the time, and 58% of the time when σ = 2. The intuition behind this is that the right-hand branch has four nodes that are close to 99; if an error in the evaluation of any one of the four makes the right-hand branch slip below 99, then the left-hand branch is better. In reality, circumstances are actually worse than this because the error in the evaluation function is not independent. If we get one node wrong, the chances are high that nearby nodes in the tree will also be wrong. The fact that the node labeled 99 has siblings labeled 1000 suggests that in fact it might have a higher true value. We can use an evaluation function that returns a probability distribution over possible values, but it is difficult to combine these distributions properly, because we won’t have a good model of the very strong dependencies that exist between the values of sibling nodes. Next, we consider the search algorithm that generates the tree. The aim of an algorithm designer is to specify a computation that runs quickly and yields a good move. The alpha–beta algorithm is designed not just to select a good move but also to calculate bounds on the values of all the legal moves. To see why this extra information is unnecessary, consider a position in which there is only one legal move. Alpha–beta search will still generate and evaluate a large search tree, telling us that the only move is the best move and assigning it a value. But since we have to make the move anyway, knowing the move’s value is useless. Similarly, if there is one obviously good move and several moves that are legal but lead to a quick loss, we
  • 208. Section 5.9. Summary 189 would not want alpha–beta to waste time determining a precise value for the lone good move. Better to just make the move quickly and save the time for later. This leads to the idea of the utility of a node expansion. A good search algorithm should select node expansions of high utility—that is, ones that are likely to lead to the discovery of a significantly better move. If there are no node expansions whose utility is higher than their cost (in terms of time), then the algorithm should stop searching and make a move. Notice that this works not only for clear-favorite situations but also for the case of symmetrical moves, for which no amount of search will show that one move is better than another. This kind of reasoning about what computations to do is called metareasoning (rea- METAREASONING soning about reasoning). It applies not just to game playing but to any kind of reasoning at all. All computations are done in the service of trying to reach better decisions, all have costs, and all have some likelihood of resulting in a certain improvement in decision quality. Alpha–beta incorporates the simplest kind of metareasoning, namely, a theorem to the effect that certain branches of the tree can be ignored without loss. It is possible to do much better. In Chapter 16, we see how these ideas can be made precise and implementable. Finally, let us reexamine the nature of search itself. Algorithms for heuristic search and for game playing generate sequences of concrete states, starting from the initial state and then applying an evaluation function. Clearly, this is not how humans play games. In chess, one often has a particular goal in mind—for example, trapping the opponent’s queen— and can use this goal to selectively generate plausible plans for achieving it. This kind of goal-directed reasoning or planning sometimes eliminates combinatorial search altogether. David Wilkins’ (1980) PARADISE is the only program to have used goal-directed reasoning successfully in chess: it was capable of solving some chess problems requiring an 18-move combination. As yet there is no good understanding of how to combine the two kinds of algorithms into a robust and efficient system, although Bridge Baron might be a step in the right direction. A fully integrated system would be a significant achievement not just for game-playing research but also for AI research in general, because it would be a good basis for a general intelligent agent. 5.9 SUMMARY We have looked at a variety of games to understand what optimal play means and to under- stand how to play well in practice. The most important ideas are as follows: • A game can be defined by the initial state (how the board is set up), the legal actions in each state, the result of each action, a terminal test (which says when the game is over), and a utility function that applies to terminal states. • In two-player zero-sum games with perfect information, the minimax algorithm can select optimal moves by a depth-first enumeration of the game tree. • The alpha–beta search algorithm computes the same optimal move as minimax, but achieves much greater efficiency by eliminating subtrees that are provably irrelevant. • Usually, it is not feasible to consider the whole game tree (even with alpha–beta), so we
  • 209. 190 Chapter 5. Adversarial Search need to cut the search off at some point and apply a heuristic evaluation function that estimates the utility of a state. • Many game programs precompute tables of best moves in the opening and endgame so that they can look up a move rather than search. • Games of chance can be handled by an extension to the minimax algorithm that eval- uates a chance node by taking the average utility of all its children, weighted by the probability of each child. • Optimal play in games of imperfect information, such as Kriegspiel and bridge, re- quires reasoning about the current and future belief states of each player. A simple approximation can be obtained by averaging the value of an action over each possible configuration of missing information. • Programs have bested even champion human players at games such as chess, checkers, and Othello. Humans retain the edge in several games of imperfect information, such as poker, bridge, and Kriegspiel, and in games with very large branching factors and little good heuristic knowledge, such as Go. BIBLIOGRAPHICAL AND HISTORICAL NOTES The early history of mechanical game playing was marred by numerous frauds. The most notorious of these was Baron Wolfgang von Kempelen’s (1734–1804) “The Turk,” a supposed chess-playing automaton that defeated Napoleon before being exposed as a magician’s trick cabinet housing a human chess expert (see Levitt, 2000). It played from 1769 to 1854. In 1846, Charles Babbage (who had been fascinated by the Turk) appears to have contributed the first serious discussion of the feasibility of computer chess and checkers (Morrison and Morrison, 1961). He did not understand the exponential complexity of search trees, claiming “the combinations involved in the Analytical Engine enormously surpassed any required, even by the game of chess.” Babbage also designed, but did not build, a special-purpose machine for playing tic-tac-toe. The first true game-playing machine was built around 1890 by the Spanish engineer Leonardo Torres y Quevedo. It specialized in the “KRK” (king and rook vs. king) chess endgame, guaranteeing a win with king and rook from any position. The minimax algorithm is traced to a 1912 paper by Ernst Zermelo, the developer of modern set theory. The paper unfortunately contained several errors and did not describe min- imax correctly. On the other hand, it did lay out the ideas of retrograde analysis and proposed (but did not prove) what became known as Zermelo’s theorem: that chess is determined— White can force a win or Black can or it is a draw; we just don’t know which. Zermelo says that should we eventually know, “Chess would of course lose the character of a game at all.” A solid foundation for game theory was developed in the seminal work Theory of Games and Economic Behavior (von Neumann and Morgenstern, 1944), which included an analysis showing that some games require strategies that are randomized (or otherwise unpredictable). See Chapter 17 for more information.
  • 210. Bibliographical and Historical Notes 191 John McCarthy conceived the idea of alpha–beta search in 1956, although he did not publish it. The NSS chess program (Newell et al., 1958) used a simplified version of alpha– beta; it was the first chess program to do so. Alpha–beta pruning was described by Hart and Edwards (1961) and Hart et al. (1972). Alpha–beta was used by the “Kotok–McCarthy” chess program written by a student of John McCarthy (Kotok, 1962). Knuth and Moore (1975) proved the correctness of alpha–beta and analysed its time complexity. Pearl (1982b) shows alpha–beta to be asymptotically optimal among all fixed-depth game-tree search algorithms. Several attempts have been made to overcome the problems with the “standard ap- proach” that were outlined in Section 5.8. The first nonexhaustive heuristic search algorithm with some theoretical grounding was probably B∗ (Berliner, 1979), which attempts to main- tain interval bounds on the possible value of a node in the game tree rather than giving it a single point-valued estimate. Leaf nodes are selected for expansion in an attempt to re- fine the top-level bounds until one move is “clearly best.” Palay (1985) extends the B∗ idea using probability distributions on values in place of intervals. David McAllester’s (1988) conspiracy number search expands leaf nodes that, by changing their values, could cause the program to prefer a new move at the root. MGSS∗ (Russell and Wefald, 1989) uses the decision-theoretic techniques of Chapter 16 to estimate the value of expanding each leaf in terms of the expected improvement in decision quality at the root. It outplayed an alpha– beta algorithm at Othello despite searching an order of magnitude fewer nodes. The MGSS∗ approach is, in principle, applicable to the control of any form of deliberation. Alpha–beta search is in many ways the two-player analog of depth-first branch-and- bound, which is dominated by A∗ in the single-agent case. The SSS∗ algorithm (Stockman, 1979) can be viewed as a two-player A∗ and never expands more nodes than alpha–beta to reach the same decision. The memory requirements and computational overhead of the queue make SSS∗ in its original form impractical, but a linear-space version has been developed from the RBFS algorithm (Korf and Chickering, 1996). Plaat et al. (1996) developed a new view of SSS∗ as a combination of alpha–beta and transposition tables, showing how to over- come the drawbacks of the original algorithm and developing a new variant called MTD(f) that has been adopted by a number of top programs. D. F. Beal (1980) and Dana Nau (1980, 1983) studied the weaknesses of minimax ap- plied to approximate evaluations. They showed that under certain assumptions about the dis- tribution of leaf values in the tree, minimaxing can yield values at the root that are actually less reliable than the direct use of the evaluation function itself. Pearl’s book Heuristics (1984) partially explains this apparent paradox and analyzes many game-playing algorithms. Baum and Smith (1997) propose a probability-based replacement for minimax, showing that it re- sults in better choices in certain games. The expectiminimax algorithm was proposed by Donald Michie (1966). Bruce Ballard (1983) extended alpha–beta pruning to cover trees with chance nodes and Hauk (2004) reexamines this work and provides empirical results. Koller and Pfeffer (1997) describe a system for completely solving partially observ- able games. 
The system is quite general, handling games whose optimal strategy requires randomized moves and games that are more complex than those handled by any previous system. Still, it can’t handle games as complex as poker, bridge, and Kriegspiel. Frank et al. (1998) describe several variants of Monte Carlo search, including one where MIN has
  • 211. 192 Chapter 5. Adversarial Search complete information but MAX does not. Among deterministic, partially observable games, Kriegspiel has received the most attention. Ferguson demonstrated hand-derived random- ized strategies for winning Kriegspiel with a bishop and knight (1992) or two bishops (1995) against a king. The first Kriegspiel programs concentrated on finding endgame checkmates and performed AND–OR search in belief-state space (Sakuta and Iida, 2002; Bolognesi and Ciancarini, 2003). Incremental belief-state algorithms enabled much more complex midgame checkmates to be found (Russell and Wolfe, 2005; Wolfe and Russell, 2007), but efficient state estimation remains the primary obstacle to effective general play (Parker et al., 2005). Chess was one of the first tasks undertaken in AI, with early efforts by many of the pio- neers of computing, including Konrad Zuse in 1945, Norbert Wiener in his book Cybernetics (1948), and Alan Turing in 1950 (see Turing et al., 1953). But it was Claude Shannon’s article Programming a Computer for Playing Chess (1950) that had the most complete set of ideas, describing a representation for board positions, an evaluation function, quiescence search, and some ideas for selective (nonexhaustive) game-tree search. Slater (1950) and the commentators on his article also explored the possibilities for computer chess play. D. G. Prinz (1952) completed a program that solved chess endgame problems but did not play a full game. Stan Ulam and a group at the Los Alamos National Lab produced a program that played chess on a 6 × 6 board with no bishops (Kister et al., 1957). It could search 4 plies deep in about 12 minutes. Alex Bernstein wrote the first documented program to play a full game of standard chess (Bernstein and Roberts, 1958).5 The first computer chess match featured the Kotok–McCarthy program from MIT (Ko- tok, 1962) and the ITEP program written in the mid-1960s at Moscow’s Institute of Theo- retical and Experimental Physics (Adelson-Velsky et al., 1970). This intercontinental match was played by telegraph. It ended with a 3–1 victory for the ITEP program in 1967. The first chess program to compete successfully with humans was MIT’s MACHACK-6 (Greenblatt et al., 1967). Its Elo rating of approximately 1400 was well above the novice level of 1000. The Fredkin Prize, established in 1980, offered awards for progressive milestones in chess play. The $5,000 prize for the first program to achieve a master rating went to BELLE (Condon and Thompson, 1982), which achieved a rating of 2250. The $10,000 prize for the first program to achieve a USCF (United States Chess Federation) rating of 2500 (near the grandmaster level) was awarded to DEEP THOUGHT (Hsu et al., 1990) in 1989. The grand prize, $100,000, went to DEEP BLUE (Campbell et al., 2002; Hsu, 2004) for its landmark victory over world champion Garry Kasparov in a 1997 exhibition match. Kasparov wrote: The decisive game of the match was Game 2, which left a scar in my memory . . . we saw something that went well beyond our wildest expectations of how well a computer would be able to foresee the long-term positional consequences of its decisions. The machine refused to move to a position that had a decisive short-term advantage—showing a very human sense of danger. (Kasparov, 1997) Probably the most complete description of a modern chess program is provided by Ernst Heinz (2000), whose DARKTHOUGHT program was the highest-ranked noncommercial PC program at the 1999 world championships. 
5 A Russian program, BESM, may have predated Bernstein’s program.
  • 212. Bibliographical and Historical Notes 193 (a) (b) Figure 5.15 Pioneers in computer chess: (a) Herbert Simon and Allen Newell, developers of the NSS program (1958); (b) John McCarthy and the Kotok–McCarthy program on an IBM 7090 (1967). In recent years, chess programs are pulling ahead of even the world’s best humans. In 2004–2005 HYDRA defeated grand master Evgeny Vladimirov 3.5–0.5, world champion Ruslan Ponomariov 2–0, and seventh-ranked Michael Adams 5.5–0.5. In 2006, DEEP FRITZ beat world champion Vladimir Kramnik 4–2, and in 2007 RYBKA defeated several grand masters in games in which it gave odds (such as a pawn) to the human players. As of 2009, the highest Elo rating ever recorded was Kasparov’s 2851. HYDRA (Donninger and Lorenz, 2004) is rated somewhere between 2850 and 3000, based mostly on its trouncing of Michael Adams. The RYBKA program is rated between 2900 and 3100, but this is based on a small number of games and is not considered reliable. Ross (2004) shows how human players have learned to exploit some of the weaknesses of the computer programs. Checkers was the first of the classic games fully played by a computer. Christopher Strachey (1952) wrote the first working program for checkers. Beginning in 1952, Arthur Samuel of IBM, working in his spare time, developed a checkers program that learned its own evaluation function by playing itself thousands of times (Samuel, 1959, 1967). We describe this idea in more detail in Chapter 21. Samuel’s program began as a novice but after only a few days’ self-play had improved itself beyond Samuel’s own level. In 1962 it defeated Robert Nealy, a champion at “blind checkers,” through an error on his part. When one considers that Samuel’s computing equipment (an IBM 704) had 10,000 words of main memory, magnetic tape for long-term storage, and a .000001 GHz processor, the win remains a great accomplishment. The challenge started by Samuel was taken up by Jonathan Schaeffer of the University of Alberta. His CHINOOK program came in second in the 1990 U.S. Open and earned the right to challenge for the world championship. It then ran up against a problem, in the form of Marion Tinsley. Dr. Tinsley had been world champion for over 40 years, losing only three games in all that time. In the first match against CHINOOK, Tinsley suffered his fourth
  • 213. 194 Chapter 5. Adversarial Search and fifth losses, but won the match 20.5–18.5. A rematch at the 1994 world championship ended prematurely when Tinsley had to withdraw for health reasons. CHINOOK became the official world champion. Schaeffer kept on building on his database of endgames, and in 2007 “solved” checkers (Schaeffer et al., 2007; Schaeffer, 2008). This had been predicted by Richard Bellman (1965). In the paper that introduced the dynamic programming approach to retrograde analysis, he wrote, “In checkers, the number of possible moves in any given situation is so small that we can confidently expect a complete digital computer solution to the problem of optimal play in this game.” Bellman did not, however, fully appreciate the size of the checkers game tree. There are about 500 quadrillion positions. After 18 years of computation on a cluster of 50 or more machines, Jonathan Schaeffer’s team completed an endgame table for all checkers positions with 10 or fewer pieces: over 39 trillion entries. From there, they were able to do forward alpha–beta search to derive a policy that proves that checkers is in fact a draw with best play by both sides. Note that this is an application of bidirectional search (Section 3.4.6). Building an endgame table for all of checkers would be impractical: it would require a billion gigabytes of storage. Searching without any table would also be impractical: the search tree has about 8^47 positions, and would take thousands of years to search with today’s technology. Only a combination of clever search, endgame data, and a drop in the price of processors and memory could solve checkers. Thus, checkers joins Qubic (Patashnik, 1980), Connect Four (Allis, 1988), and Nine-Men’s Morris (Gasser, 1998) as games that have been solved by computer analysis. Backgammon, a game of chance, was analyzed mathematically by Gerolamo Cardano (1663), but only taken up for computer play in the late 1970s, first with the BKG program (Berliner, 1980b); it used a complex, manually constructed evaluation function and searched only to depth 1. It was the first program to defeat a human world champion at a major classic game (Berliner, 1980a). Berliner readily acknowledged that BKG was very lucky with the dice. Gerry Tesauro’s (1995) TD-GAMMON played consistently at world champion level. The BGBLITZ program was the winner of the 2008 Computer Olympiad. Go is a deterministic game, but the large branching factor makes it challenging. The key issues and early literature in computer Go are summarized by Bouzy and Cazenave (2001) and Müller (2002). Up to 1997 there were no competent Go programs. Now the best programs play most of their moves at the master level; the only problem is that over the course of a game they usually make at least one serious blunder that allows a strong opponent to win. Whereas alpha–beta search reigns in most games, many recent Go programs have adopted Monte Carlo methods based on the UCT (upper confidence bounds on trees) scheme (Kocsis and Szepesvari, 2006). The strongest Go program as of 2009 is Gelly and Silver’s MOGO (Wang and Gelly, 2007; Gelly and Silver, 2008). In August 2008, MOGO scored a surprising win against top professional Myungwan Kim, albeit with MOGO receiving a handicap of nine stones (about the equivalent of a queen handicap in chess). Kim estimated MOGO’s strength at 2–3 dan, the low end of advanced amateur. For this match, MOGO was run on an 800-processor 15 teraflop supercomputer (1000 times Deep Blue). 
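The UCT scheme mentioned above selects which move to sample next by treating each choice as a bandit problem and maximizing an upper confidence bound on its estimated value. The following Python fragment is only an illustrative sketch of that selection rule (the UCB1 formula); the statistics format and the exploration constant are our assumptions, not details of MOGO or any other program cited here.

```python
import math

def ucb1_select(node_stats, total_visits, c=1.4):
    """Pick the move maximizing average value plus an exploration bonus
    that shrinks as that move is sampled more often.
    node_stats maps each move to a (total_value, visit_count) pair."""
    best_move, best_score = None, float("-inf")
    for move, (value_sum, visits) in node_stats.items():
        if visits == 0:
            return move  # always try unvisited moves first
        score = value_sum / visits + c * math.sqrt(math.log(total_visits) / visits)
        if score > best_score:
            best_move, best_score = move, score
    return best_move

# Example: three candidate moves with simulated win totals and visit counts.
stats = {"a": (6.0, 10), "b": (3.0, 4), "c": (0.0, 1)}
print(ucb1_select(stats, total_visits=15))  # the rarely tried move "c" is explored next
```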
A few weeks later, MOGO, with only a five-stone handicap, won against a 6-dan professional. In the 9 × 9 form of Go, MOGO is at approximately the 1-dan professional level. Rapid advances are likely as experimentation continues with new forms of Monte Carlo search. The Computer Go
  • 214. Exercises 195 Newsletter, published by the Computer Go Association, describes current developments. Bridge: Smith et al. (1998) report on how their planning-based program won the 1998 computer bridge championship, and (Ginsberg, 2001) describes how his GIB program, based on Monte Carlo simulation, won the following computer championship and did surprisingly well against human players and standard book problem sets. From 2001–2007, the computer bridge championship was won five times by JACK and twice by WBRIDGE5. Neither has had academic articles explaining their structure, but both are rumored to use the Monte Carlo technique, which was first proposed for bridge by Levy (1989). Scrabble: A good description of a top program, MAVEN, is given by its creator, Brian Sheppard (2002). Generating the highest-scoring move is described by Gordon (1994), and modeling opponents is covered by Richards and Amir (2007). Soccer (Kitano et al., 1997b; Visser et al., 2008) and billiards (Lam and Greenspan, 2008; Archibald et al., 2009) and other stochastic games with a continuous space of actions are beginning to attract attention in AI, both in simulation and with physical robot players. Computer game competitions occur annually, and papers appear in a variety of venues. The rather misleadingly named conference proceedings Heuristic Programming in Artificial Intelligence report on the Computer Olympiads, which include a wide variety of games. The General Game Competition (Love et al., 2006) tests programs that must learn to play an un- known game given only a logical description of the rules of the game. There are also several edited collections of important papers on game-playing research (Levy, 1988a, 1988b; Mars- land and Schaeffer, 1990). The International Computer Chess Association (ICCA), founded in 1977, publishes the ICGA Journal (formerly the ICCA Journal). Important papers have been published in the serial anthology Advances in Computer Chess, starting with Clarke (1977). Volume 134 of the journal Artificial Intelligence (2002) contains descriptions of state-of-the-art programs for chess, Othello, Hex, shogi, Go, backgammon, poker, Scrabble, and other games. Since 1998, a biennial Computers and Games conference has been held. EXERCISES 5.1 Suppose you have an oracle, OM(s), that correctly predicts the opponent’s move in any state. Using this, formulate the definition of a game as a (single-agent) search problem. Describe an algorithm for finding the optimal move. 5.2 Consider the problem of solving two 8-puzzles. a. Give a complete problem formulation in the style of Chapter 3. b. How large is the reachable state space? Give an exact numerical expression. c. Suppose we make the problem adversarial as follows: the two players take turns mov- ing; a coin is flipped to determine the puzzle on which to make a move in that turn; and the winner is the first to solve one puzzle. Which algorithm can be used to choose a move in this setting? d. Give an informal proof that someone will eventually win if both play perfectly.
  • 215. 196 Chapter 5. Adversarial Search (b) (a) a f e d c b bd cd ad ce cf cc ae af ac de df dd dd ? ? ? ? ? P E Figure 5.16 (a) A map where the cost of every edge is 1. Initially the pursuer P is at node b and the evader E is at node d. (b) A partial game tree for this map. Each node is labeled with the P, E positions. P moves first. Branches marked “?” have yet to be explored. 5.3 Imagine that, in Exercise 3.4, one of the friends wants to avoid the other. The problem then becomes a two-player pursuit–evasion game. We assume now that the players take PURSUIT–EVASION turns moving. The game ends only when the players are on the same node; the terminal payoff to the pursuer is minus the total time taken. (The evader “wins” by never losing.) An example is shown in Figure 5.16. a. Copy the game tree and mark the values of the terminal nodes. b. Next to each internal node, write the strongest fact you can infer about its value (a number, one or more inequalities such as “≥ 14”, or a “?”). c. Beneath each question mark, write the name of the node reached by that branch. d. Explain how a bound on the value of the nodes in (c) can be derived from consideration of shortest-path lengths on the map, and derive such bounds for these nodes. Remember the cost to get to each leaf as well as the cost to solve it. e. Now suppose that the tree as given, with the leaf bounds from (d), is evaluated from left to right. Circle those “?” nodes that would not need to be expanded further, given the bounds from part (d), and cross out those that need not be considered at all. f. Can you prove anything in general about who wins the game on a map that is a tree?
  • 216. Exercises 197 5.4 Describe and implement state descriptions, move generators, terminal tests, utility func- tions, and evaluation functions for one or more of the following stochastic games: Monopoly, Scrabble, bridge play with a given contract, or Texas hold’em poker. 5.5 Describe and implement a real-time, multiplayer game-playing environment, where time is part of the environment state and players are given fixed time allocations. 5.6 Discuss how well the standard approach to game playing would apply to games such as tennis, pool, and croquet, which take place in a continuous physical state space. 5.7 Prove the following assertion: For every game tree, the utility obtained by MAX using minimax decisions against a suboptimal MIN will be never be lower than the utility obtained playing against an optimal MIN. Can you come up with a game tree in which MAX can do still better using a suboptimal strategy against a suboptimal MIN? A B 1 4 3 2 Figure 5.17 The starting position of a simple game. Player A moves first. The two players take turns moving, and each player must move his token to an open adjacent space in either direction. If the opponent occupies an adjacent space, then a player may jump over the opponent to the next open space if any. (For example, if A is on 3 and B is on 2, then A may move back to 1.) The game ends when one player reaches the opposite end of the board. If player A reaches space 4 first, then the value of the game to A is +1; if player B reaches space 1 first, then the value of the game to A is −1. 5.8 Consider the two-player game described in Figure 5.17. a. Draw the complete game tree, using the following conventions: • Write each state as (sA, sB), where sA and sB denote the token locations. • Put each terminal state in a square box and write its game value in a circle. • Put loop states (states that already appear on the path to the root) in double square boxes. Since their value is unclear, annotate each with a “?” in a circle. b. Now mark each node with its backed-up minimax value (also in a circle). Explain how you handled the “?” values and why. c. Explain why the standard minimax algorithm would fail on this game tree and briefly sketch how you might fix it, drawing on your answer to (b). Does your modified algo- rithm give optimal decisions for all games with loops? d. This 4-square game can be generalized to n squares for any n 2. Prove that A wins if n is even and loses if n is odd. 5.9 This problem exercises the basic concepts of game playing, using tic-tac-toe (noughts and crosses) as an example. We define Xn as the number of rows, columns, or diagonals
  • 217. 198 Chapter 5. Adversarial Search with exactly n X’s and no O’s. Similarly, On is the number of rows, columns, or diagonals with just n O’s. The utility function assigns +1 to any position with X3 = 1 and −1 to any position with O3 = 1. All other terminal positions have utility 0. For nonterminal positions, we use a linear evaluation function defined as Eval(s) = 3X2(s)+X1(s)−(3O2(s)+O1(s)). a. Approximately how many possible games of tic-tac-toe are there? b. Show the whole game tree starting from an empty board down to depth 2 (i.e., one X and one O on the board), taking symmetry into account. c. Mark on your tree the evaluations of all the positions at depth 2. d. Using the minimax algorithm, mark on your tree the backed-up values for the positions at depths 1 and 0, and use those values to choose the best starting move. e. Circle the nodes at depth 2 that would not be evaluated if alpha–beta pruning were applied, assuming the nodes are generated in the optimal order for alpha–beta pruning. 5.10 Consider the family of generalized tic-tac-toe games, defined as follows. Each partic- ular game is specified by a set S of squares and a collection W of winning positions. Each winning position is a subset of S. For example, in standard tic-tac-toe, S is a set of 9 squares and W is a collection of 8 subsets of W: the three rows, the three columns, and the two diag- onals. In other respects, the game is identical to standard tic-tac-toe. Starting from an empty board, players alternate placing their marks on an empty square. A player who marks every square in a winning position wins the game. It is a tie if all squares are marked and neither player has won. a. Let N = |S|, the number of squares. Give an upper bound on the number of nodes in the complete game tree for generalized tic-tac-toe as a function of N. b. Give a lower bound on the size of the game tree for the worst case, where W = { }. c. Propose a plausible evaluation function that can be used for any instance of generalized tic-tac-toe. The function may depend on S and W. d. Assume that it is possible to generate a new board and check whether it is a winning position in 100N machine instructions and assume a 2 gigahertz processor. Ignore memory limitations. Using your estimate in (a), roughly how large a game tree can be completely solved by alpha–beta in a second of CPU time? a minute? an hour? 5.11 Develop a general game-playing program, capable of playing a variety of games. a. Implement move generators and evaluation functions for one or more of the following games: Kalah, Othello, checkers, and chess. b. Construct a general alpha–beta game-playing agent. c. Compare the effect of increasing search depth, improving move ordering, and improv- ing the evaluation function. How close does your effective branching factor come to the ideal case of perfect move ordering? d. Implement a selective search algorithm, such as B* (Berliner, 1979), conspiracy number search (McAllester, 1988), or MGSS* (Russell and Wefald, 1989) and compare its performance to A*.
  • 218. Exercises 199 Figure 5.18 Situation when considering whether to prune node nj. 5.12 Describe how the minimax and alpha–beta algorithms change for two-player, non-zero-sum games in which each player has a distinct utility function and both utility functions are known to both players. If there are no constraints on the two terminal utilities, is it possible for any node to be pruned by alpha–beta? What if the players’ utility functions on any state sum to a number between constants −k and k, making the game almost zero-sum? 5.13 Develop a formal proof of correctness for alpha–beta pruning. To do this, consider the situation shown in Figure 5.18. The question is whether to prune node nj, which is a max-node and a descendant of node n1. The basic idea is to prune it if and only if the minimax value of n1 can be shown to be independent of the value of nj. a. Node n1 takes on the minimum value among its children: n1 = min(n2, n21, . . . , n2b2 ). Find a similar expression for n2 and hence an expression for n1 in terms of nj. b. Let li be the minimum (or maximum) value of the nodes to the left of node ni at depth i, whose minimax value is already known. Similarly, let ri be the minimum (or maximum) value of the unexplored nodes to the right of ni at depth i. Rewrite your expression for n1 in terms of the li and ri values. c. Now reformulate the expression to show that in order to affect n1, nj must not exceed a certain bound derived from the li values. d. Repeat the process for the case where nj is a min-node. 5.14 Prove that alpha–beta pruning takes time O(2^{m/2}) with optimal move ordering, where m is the maximum depth of the game tree. 5.15 Suppose you have a chess program that can evaluate 5 million nodes per second. Decide on a compact representation of a game state for storage in a transposition table. About how many entries can you fit in a 1-gigabyte in-memory table? Will that be enough for the
  • 219. 200 Chapter 5. Adversarial Search 0.5 0.5 0.5 0.5 2 2 1 2 0 2 -1 0 Figure 5.19 The complete game tree for a trivial game with chance nodes. three minutes of search allocated for one move? How many table lookups can you do in the time it would take to do one evaluation? Now suppose the transposition table is stored on disk. About how many evaluations could you do in the time it takes to do one disk seek with standard disk hardware? 5.16 This question considers pruning in games with chance nodes. Figure 5.19 shows the complete game tree for a trivial game. Assume that the leaf nodes are to be evaluated in left- to-right order, and that before a leaf node is evaluated, we know nothing about its value—the range of possible values is −∞ to ∞. a. Copy the figure, mark the value of all the internal nodes, and indicate the best move at the root with an arrow. b. Given the values of the first six leaves, do we need to evaluate the seventh and eighth leaves? Given the values of the first seven leaves, do we need to evaluate the eighth leaf? Explain your answers. c. Suppose the leaf node values are known to lie between –2 and 2 inclusive. After the first two leaves are evaluated, what is the value range for the left-hand chance node? d. Circle all the leaves that need not be evaluated under the assumption in (c). 5.17 Implement the expectiminimax algorithm and the *-alpha–beta algorithm, which is described by Ballard (1983), for pruning game trees with chance nodes. Try them on a game such as backgammon and measure the pruning effectiveness of *-alpha–beta. 5.18 Prove that with a positive linear transformation of leaf values (i.e., transforming a value x to ax + b where a 0), the choice of move remains unchanged in a game tree, even when there are chance nodes. 5.19 Consider the following procedure for choosing moves in games with chance nodes: • Generate some dice-roll sequences (say, 50) down to a suitable depth (say, 8). • With known dice rolls, the game tree becomes deterministic. For each dice-roll se- quence, solve the resulting deterministic game tree using alpha–beta.
  • 220. Exercises 201 • Use the results to estimate the value of each move and to choose the best. Will this procedure work well? Why (or why not)? 5.20 In the following, a “max” tree consists only of max nodes, whereas an “expectimax” tree consists of a max node at the root with alternating layers of chance and max nodes. At chance nodes, all outcome probabilities are nonzero. The goal is to find the value of the root with a bounded-depth search. a. Assuming that leaf values are finite but unbounded, is pruning (as in alpha–beta) ever possible in a max tree? Give an example, or explain why not. b. Is pruning ever possible in an expectimax tree under the same conditions? Give an example, or explain why not. c. If leaf values are constrained to be in the range [0, 1], is pruning ever possible in a max tree? Give an example, or explain why not. d. If leaf values are constrained to be in the range [0, 1], is pruning ever possible in an expectimax tree? Give an example (qualitatively different from your example in (e), if any), or explain why not. e. If leaf values are constrained to be nonnegative, is pruning ever possible in a max tree? Give an example, or explain why not. f. If leaf values are constrained to be nonnegative, is pruning ever possible in an expecti- max tree? Give an example, or explain why not. g. Consider the outcomes of a chance node in an expectimax tree. Which of the following evaluation orders is most likely to yield pruning opportunities: (i) Lowest probability first; (ii) Highest probability first; (iii) Doesn’t make any difference? 5.21 Which of the following are true and which are false? Give brief explanations. a. In a fully observable, turn-taking, zero-sum game between two perfectly rational play- ers, it does not help the first player to know what strategy the second player is using— that is, what move the second player will make, given the first player’s move. b. In a partially observable, turn-taking, zero-sum game between two perfectly rational players, it does not help the first player to know what move the second player will make, given the first player’s move. c. A perfectly rational backgammon agent never loses. 5.22 Consider carefully the interplay of chance events and partial information in each of the games in Exercise 5.4. a. For which is the standard expectiminimax model appropriate? Implement the algorithm and run it in your game-playing agent, with appropriate modifications to the game- playing environment. b. For which would the scheme described in Exercise 5.19 be appropriate? c. Discuss how you might deal with the fact that in some of the games, the players do not have the same knowledge of the current state.
  • 221. 6 CONSTRAINT SATISFACTION PROBLEMS In which we see how treating states as more than just little black boxes leads to the invention of a range of powerful new search methods and a deeper understanding of problem structure and complexity. Chapters 3 and 4 explored the idea that problems can be solved by searching in a space of states. These states can be evaluated by domain-specific heuristics and tested to see whether they are goal states. From the point of view of the search algorithm, however, each state is atomic, or indivisible—a black box with no internal structure. This chapter describes a way to solve a wide variety of problems more efficiently. We use a factored representation for each state: a set of variables, each of which has a value. A problem is solved when each variable has a value that satisfies all the constraints on the variable. A problem described this way is called a constraint satisfaction problem, or CSP. CSP search algorithms take advantage of the structure of states and use general-purpose rather than problem-specific heuristics to enable the solution of complex problems. The main idea is to eliminate large portions of the search space all at once by identifying variable/value combinations that violate the constraints. 6.1 DEFINING CONSTRAINT SATISFACTION PROBLEMS A constraint satisfaction problem consists of three components, X, D, and C: X is a set of variables, {X1, . . . , Xn}. D is a set of domains, {D1, . . . , Dn}, one for each variable. C is a set of constraints that specify allowable combinations of values. Each domain Di consists of a set of allowable values, {v1, . . . , vk}, for variable Xi. Each constraint Ci consists of a pair ⟨scope, rel⟩, where scope is a tuple of variables that participate in the constraint and rel is a relation that defines the values that those variables can take on. A relation can be represented as an explicit list of all tuples of values that satisfy the constraint, or as an abstract relation that supports two operations: testing if a tuple is a member of the relation and enumerating the members of the relation. For example, if X1 and X2 both have
  • 222. Section 6.1. Defining Constraint Satisfaction Problems 203 the domain {A,B}, then the constraint saying the two variables must have different values can be written as ⟨(X1, X2), [(A, B), (B, A)]⟩ or as ⟨(X1, X2), X1 ≠ X2⟩. To solve a CSP, we need to define a state space and the notion of a solution. Each state in a CSP is defined by an assignment of values to some or all of the variables, {Xi = vi, Xj = vj, . . .}. An assignment that does not violate any constraints is called a consistent or legal assignment. A complete assignment is one in which every variable is assigned, and a solution to a CSP is a consistent, complete assignment. A partial assignment is one that assigns values to only some of the variables. 6.1.1 Example problem: Map coloring Suppose that, having tired of Romania, we are looking at a map of Australia showing each of its states and territories (Figure 6.1(a)). We are given the task of coloring each region either red, green, or blue in such a way that no neighboring regions have the same color. To formulate this as a CSP, we define the variables to be the regions X = {WA, NT, Q, NSW , V, SA, T} . The domain of each variable is the set Di = {red, green, blue}. The constraints require neighboring regions to have distinct colors. Since there are nine places where regions border, there are nine constraints: C = {SA ≠ WA, SA ≠ NT, SA ≠ Q, SA ≠ NSW , SA ≠ V, WA ≠ NT, NT ≠ Q, Q ≠ NSW , NSW ≠ V } . Here we are using abbreviations; SA ≠ WA is a shortcut for ⟨(SA, WA), SA ≠ WA⟩, where SA ≠ WA can be fully enumerated in turn as {(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)} . There are many possible solutions to this problem, such as {WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red }. It can be helpful to visualize a CSP as a constraint graph, as shown in Figure 6.1(b). The nodes of the graph correspond to variables of the problem, and a link connects any two variables that participate in a constraint. Why formulate a problem as a CSP? One reason is that the CSPs yield a natural representation for a wide variety of problems; if you already have a CSP-solving system, it is often easier to solve a problem using it than to design a custom solution using another search technique. In addition, CSP solvers can be faster than state-space searchers because the CSP solver can quickly eliminate large swatches of the search space. For example, once we have chosen {SA = blue} in the Australia problem, we can conclude that none of the five neighboring variables can take on the value blue. Without taking advantage of constraint propagation, a search procedure would have to consider 3^5 = 243 assignments for the five neighboring variables; with constraint propagation we never have to consider blue as a value, so we have only 2^5 = 32 assignments to look at, a reduction of 87%. In regular state-space search we can only ask: is this specific state a goal? No? What about this one? With CSPs, once we find out that a partial assignment is not a solution, we can
  • 223. 204 Chapter 6. Constraint Satisfaction Problems Western Australia Northern Territory South Australia Queensland New South Wales Victoria Tasmania WA NT SA Q NSW V T (a) (b) Figure 6.1 (a) The principal states and territories of Australia. Coloring this map can be viewed as a constraint satisfaction problem (CSP). The goal is to assign colors to each region so that no neighboring regions have the same color. (b) The map-coloring problem represented as a constraint graph. immediately discard further refinements of the partial assignment. Furthermore, we can see why the assignment is not a solution—we see which variables violate a constraint—so we can focus attention on the variables that matter. As a result, many problems that are intractable for regular state-space search can be solved quickly when formulated as a CSP. 6.1.2 Example problem: Job-shop scheduling Factories have the problem of scheduling a day’s worth of jobs, subject to various constraints. In practice, many of these problems are solved with CSP techniques. Consider the problem of scheduling the assembly of a car. The whole job is composed of tasks, and we can model each task as a variable, where the value of each variable is the time that the task starts, expressed as an integer number of minutes. Constraints can assert that one task must occur before another—for example, a wheel must be installed before the hubcap is put on—and that only so many tasks can go on at once. Constraints can also specify that a task takes a certain amount of time to complete. We consider a small part of the car assembly, consisting of 15 tasks: install axles (front and back), affix all four wheels (right and left, front and back), tighten nuts for each wheel, affix hubcaps, and inspect the final assembly. We can represent the tasks with 15 variables: X = {AxleF , AxleB, WheelRF , WheelLF , WheelRB, WheelLB, NutsRF , NutsLF , NutsRB, NutsLB, CapRF , CapLF , CapRB, CapLB, Inspect} . The value of each variable is the time that the task starts. Next we represent precedence constraints between individual tasks. Whenever a task T1 must occur before task T2, and PRECEDENCE CONSTRAINTS task T1 takes duration d1 to complete, we add an arithmetic constraint of the form T1 + d1 ≤ T2 .
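Before spelling out the constraints for the example, it may help to see how such an arithmetic precedence constraint can be written down as a ⟨scope, rel⟩ pair in code. The following minimal Python sketch is ours, not part of the book's code; the task names anticipate the axle-and-wheel constraints written out next, and the relation is stored as a predicate rather than an enumerated set of value pairs.

```python
def precedence(d1):
    """Binary constraint asserting T1 + d1 <= T2: the first task, which
    takes d1 minutes, must finish before the second task starts."""
    return lambda t1, t2: t1 + d1 <= t2

# A constraint is a (scope, relation) pair, as in the definition in Section 6.1.
constraints = [
    (("AxleF", "WheelRF"), precedence(10)),  # front axle (10 min) before right front wheel
    (("AxleF", "WheelLF"), precedence(10)),  # front axle (10 min) before left front wheel
]

def consistent(assignment, constraints):
    """True if no constraint whose scope is fully assigned is violated."""
    for scope, rel in constraints:
        if all(var in assignment for var in scope):
            if not rel(*(assignment[var] for var in scope)):
                return False
    return True

print(consistent({"AxleF": 1, "WheelRF": 11}, constraints))  # True: 1 + 10 <= 11
print(consistent({"AxleF": 1, "WheelRF": 5}, constraints))   # False: wheel starts too early
```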
  • 224. Section 6.1. Defining Constraint Satisfaction Problems 205 In our example, the axles have to be in place before the wheels are put on, and it takes 10 minutes to install an axle, so we write AxleF + 10 ≤ WheelRF ; AxleF + 10 ≤ WheelLF ; AxleB + 10 ≤ WheelRB; AxleB + 10 ≤ WheelLB . Next we say that, for each wheel, we must affix the wheel (which takes 1 minute), then tighten the nuts (2 minutes), and finally attach the hubcap (1 minute, but not represented yet): WheelRF + 1 ≤ NutsRF ; NutsRF + 2 ≤ CapRF ; WheelLF + 1 ≤ NutsLF ; NutsLF + 2 ≤ CapLF ; WheelRB + 1 ≤ NutsRB; NutsRB + 2 ≤ CapRB; WheelLB + 1 ≤ NutsLB; NutsLB + 2 ≤ CapLB . Suppose we have four workers to install wheels, but they have to share one tool that helps put the axle in place. We need a disjunctive constraint to say that AxleF and AxleB must not DISJUNCTIVE CONSTRAINT overlap in time; either one comes first or the other does: (AxleF + 10 ≤ AxleB) or (AxleB + 10 ≤ AxleF ) . This looks like a more complicated constraint, combining arithmetic and logic. But it still reduces to a set of pairs of values that AxleF and AxleF can take on. We also need to assert that the inspection comes last and takes 3 minutes. For every variable except Inspect we add a constraint of the form X + dX ≤ Inspect. Finally, suppose there is a requirement to get the whole assembly done in 30 minutes. We can achieve that by limiting the domain of all variables: Di = {1, 2, 3, . . . , 27} . This particular problem is trivial to solve, but CSPs have been applied to job-shop schedul- ing problems like this with thousands of variables. In some cases, there are complicated constraints that are difficult to specify in the CSP formalism, and more advanced planning techniques are used, as discussed in Chapter 11. 6.1.3 Variations on the CSP formalism The simplest kind of CSP involves variables that have discrete, finite domains. Map- DISCRETE DOMAIN FINITE DOMAIN coloring problems and scheduling with time limits are both of this kind. The 8-queens prob- lem described in Chapter 3 can also be viewed as a finite-domain CSP, where the variables Q1, . . . , Q8 are the positions of each queen in columns 1, . . . , 8 and each variable has the domain Di = {1, 2, 3, 4, 5, 6, 7, 8}. A discrete domain can be infinite, such as the set of integers or strings. (If we didn’t put INFINITE a deadline on the job-scheduling problem, there would be an infinite number of start times for each variable.) With infinite domains, it is no longer possible to describe constraints by enumerating all allowed combinations of values. Instead, a constraint language must be CONSTRAINT LANGUAGE used that understands constraints such as T1 + d1 ≤ T2 directly, without enumerating the set of pairs of allowable values for (T1, T2). Special solution algorithms (which we do not discuss here) exist for linear constraints on integer variables—that is, constraints, such as LINEAR CONSTRAINTS the one just given, in which each variable appears only in linear form. It can be shown that no algorithm exists for solving general nonlinear constraints on integer variables. NONLINEAR CONSTRAINTS
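Returning to the finite-domain case for a moment, here is a minimal Python sketch (ours, not the book's code) of the 8-queens formulation mentioned above: variable Qi is the row of the queen in column i, every domain is {1, . . . , 8}, and there is one binary no-attack constraint per pair of columns. The naive depth-first solver at the end is only for illustration; the rest of this chapter develops much better methods.

```python
from itertools import combinations

# Variables: one per column; domains: rows 1..8.
variables = list(range(8))
domains = {i: range(1, 9) for i in variables}

def no_attack(i, j):
    """Binary constraint between columns i and j: the two queens must be in
    different rows and on different diagonals."""
    return lambda ri, rj: ri != rj and abs(ri - rj) != abs(i - j)

constraints = {(i, j): no_attack(i, j) for i, j in combinations(variables, 2)}

def solve(assignment=None, col=0):
    """Naive depth-first assignment, column by column (illustration only)."""
    assignment = assignment or {}
    if col == len(variables):
        return assignment
    for row in domains[col]:
        if all(constraints[(i, col)](assignment[i], row) for i in range(col)):
            result = solve({**assignment, col: row}, col + 1)
            if result is not None:
                return result
    return None

print(solve())  # one consistent, complete assignment of rows to columns
```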
  • 225. 206 Chapter 6. Constraint Satisfaction Problems Constraint satisfaction problems with continuous domains are common in the real world and are widely studied in the field of operations research. For example, the scheduling of experiments on the Hubble Space Telescope requires very precise timing of observations; the start and finish of each observation and maneuver are continuous-valued variables that must obey a variety of astronomical, precedence, and power constraints. The best-known category of continuous-domain CSPs is that of linear programming problems, where constraints must be linear equalities or inequalities. Linear programming problems can be solved in time polynomial in the number of variables. Problems with different types of constraints and objective functions have also been studied—quadratic programming, second-order conic programming, and so on. In addition to examining the types of variables that can appear in CSPs, it is useful to look at the types of constraints. The simplest type is the unary constraint, which restricts the value of a single variable. For example, in the map-coloring problem it could be the case that South Australians won’t tolerate the color green; we can express that with the unary constraint ⟨(SA), SA ≠ green⟩. A binary constraint relates two variables. For example, SA ≠ NSW is a binary constraint. A binary CSP is one with only binary constraints; it can be represented as a constraint graph, as in Figure 6.1(b). We can also describe higher-order constraints, such as asserting that the value of Y is between X and Z, with the ternary constraint Between(X, Y, Z). A constraint involving an arbitrary number of variables is called a global constraint. (The name is traditional but confusing because it need not involve all the variables in a problem). One of the most common global constraints is Alldiff , which says that all of the variables involved in the constraint must have different values. In Sudoku problems (see Section 6.2.6), all variables in a row or column must satisfy an Alldiff constraint. Another example is provided by cryptarithmetic puzzles. (See Figure 6.2(a).) Each letter in a cryptarithmetic puzzle represents a different digit. For the case in Figure 6.2(a), this would be represented as the global constraint Alldiff (F, T, U, W, R, O). The addition constraints on the four columns of the puzzle can be written as the following n-ary constraints:
O + O = R + 10 · C10
C10 + W + W = U + 10 · C100
C100 + T + T = O + 10 · C1000
C1000 = F ,
where C10, C100, and C1000 are auxiliary variables representing the digit carried over into the tens, hundreds, or thousands column. These constraints can be represented in a constraint hypergraph, such as the one shown in Figure 6.2(b). A hypergraph consists of ordinary nodes (the circles in the figure) and hypernodes (the squares), which represent n-ary constraints. Alternatively, as Exercise 6.5 asks you to prove, every finite-domain constraint can be reduced to a set of binary constraints if enough auxiliary variables are introduced, so we could transform any CSP into one with only binary constraints; this makes the algorithms simpler. Another way to convert an n-ary CSP to a binary one is the dual graph transformation: create a new graph in which there will be one variable for each constraint in the original graph, and
  • 226. Section 6.1. Defining Constraint Satisfaction Problems 207 (a) O W T F U R (b) + F T T O W W U O O R C3 C1 C2 Figure 6.2 (a) A cryptarithmetic problem. Each letter stands for a distinct digit; the aim is to find a substitution of digits for letters such that the resulting sum is arithmetically correct, with the added restriction that no leading zeroes are allowed. (b) The constraint hypergraph for the cryptarithmetic problem, showing the Alldiff constraint (square box at the top) as well as the column addition constraints (four square boxes in the middle). The variables C1, C2, and C3 represent the carry digits for the three columns. one binary constraint for each pair of constraints in the original graph that share variables. For example, if the original graph has variables {X, Y, Z} and constraints h(X, Y, Z), C1i and h(X, Y ), C2i then the dual graph would have variables {C1, C2} with the binary constraint h(X, Y ), R1i, where (X, Y ) are the shared variables and R1 is a new relation that defines the constraint between the shared variables, as specified by the original C1 and C2. There are however two reasons why we might prefer a global constraint such as Alldiff rather than a set of binary constraints. First, it is easier and less error-prone to write the problem description using Alldiff . Second, it is possible to design special-purpose inference algorithms for global constraints that are not available for a set of more primitive constraints. We describe these inference algorithms in Section 6.2.5. The constraints we have described so far have all been absolute constraints, violation of which rules out a potential solution. Many real-world CSPs include preference constraints PREFERENCE CONSTRAINTS indicating which solutions are preferred. For example, in a university class-scheduling prob- lem there are absolute constraints that no professor can teach two classes at the same time. But we also may allow preference constraints: Prof. R might prefer teaching in the morning, whereas Prof. N prefers teaching in the afternoon. A schedule that has Prof. R teaching at 2 p.m. would still be an allowable solution (unless Prof. R happens to be the department chair) but would not be an optimal one. Preference constraints can often be encoded as costs on in- dividual variable assignments—for example, assigning an afternoon slot for Prof. R costs 2 points against the overall objective function, whereas a morning slot costs 1. With this formulation, CSPs with preferences can be solved with optimization search methods, either path-based or local. We call such a problem a constraint optimization problem, or COP. CONSTRAINT OPTIMIZATION PROBLEM Linear programming problems do this kind of optimization.
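To tie the pieces of this section together, here is a brute-force Python sketch (ours, and deliberately naive) that solves the cryptarithmetic puzzle of Figure 6.2 by enforcing the Alldiff constraint and the no-leading-zero rule and then checking the arithmetic of the sum directly, instead of introducing the carry variables C10, C100, and C1000. A real CSP solver would instead propagate the column constraints, as described in Section 6.2.

```python
from itertools import permutations

def solve_two_two_four():
    """Brute force over digit assignments for TWO + TWO = FOUR.
    permutations() guarantees Alldiff(F, T, U, W, R, O) automatically."""
    letters = "FTUWRO"
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["T"] == 0 or a["F"] == 0:          # no leading zeroes
            continue
        two = 100 * a["T"] + 10 * a["W"] + a["O"]
        four = 1000 * a["F"] + 100 * a["O"] + 10 * a["U"] + a["R"]
        if two + two == four:
            return a
    return None

print(solve_two_two_four())  # one satisfying assignment, e.g. 734 + 734 = 1468
```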
6.2 CONSTRAINT PROPAGATION: INFERENCE IN CSPS

In regular state-space search, an algorithm can do only one thing: search. In CSPs there is a choice: an algorithm can search (choose a new variable assignment from several possibilities) or do a specific type of inference called constraint propagation: using the constraints to reduce the number of legal values for a variable, which in turn can reduce the legal values for another variable, and so on. Constraint propagation may be intertwined with search, or it may be done as a preprocessing step, before search starts. Sometimes this preprocessing can solve the whole problem, so no search is required at all.

The key idea is local consistency. If we treat each variable as a node in a graph (see Figure 6.1(b)) and each binary constraint as an arc, then the process of enforcing local consistency in each part of the graph causes inconsistent values to be eliminated throughout the graph. There are different types of local consistency, which we now cover in turn.

6.2.1 Node consistency

A single variable (corresponding to a node in the CSP network) is node-consistent if all the values in the variable's domain satisfy the variable's unary constraints. For example, in the variant of the Australia map-coloring problem (Figure 6.1) where South Australians dislike green, the variable SA starts with domain {red, green, blue}, and we can make it node-consistent by eliminating green, leaving SA with the reduced domain {red, blue}. We say that a network is node-consistent if every variable in the network is node-consistent.

It is always possible to eliminate all the unary constraints in a CSP by running node consistency. It is also possible to transform all n-ary constraints into binary ones (see Exercise 6.5). Because of this, it is common to define CSP solvers that work with only binary constraints; we make that assumption for the rest of this chapter, except where noted.

6.2.2 Arc consistency

A variable in a CSP is arc-consistent if every value in its domain satisfies the variable's binary constraints. More formally, Xi is arc-consistent with respect to another variable Xj if for every value in the current domain Di there is some value in the domain Dj that satisfies the binary constraint on the arc (Xi, Xj). A network is arc-consistent if every variable is arc-consistent with respect to every other variable. For example, consider the constraint Y = X^2 where the domain of both X and Y is the set of digits. We can write this constraint explicitly as

  ⟨(X, Y), {(0, 0), (1, 1), (2, 4), (3, 9)}⟩.

To make X arc-consistent with respect to Y, we reduce X's domain to {0, 1, 2, 3}. If we also make Y arc-consistent with respect to X, then Y's domain becomes {0, 1, 4, 9} and the whole CSP is arc-consistent.

On the other hand, arc consistency can do nothing for the Australia map-coloring problem. Consider the following inequality constraint on (SA, WA):

  {(red, green), (red, blue), (green, red), (green, blue), (blue, red), (blue, green)}.
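As an illustrative aside (not from the text), the domain filtering just performed for Y = X^2 can be sketched in a few lines of Python; the revise function below is essentially the REVISE step of the AC-3 algorithm shown next, and the set-based representation is our own choice.

def revise(domains, xi, xj, constraint):
    """Remove from domains[xi] any value with no supporting value in domains[xj].
    constraint(x, y) tests whether the pair (x, y) satisfies the arc (xi, xj)."""
    revised = False
    for x in set(domains[xi]):
        if not any(constraint(x, y) for y in domains[xj]):
            domains[xi].discard(x)
            revised = True
    return revised

domains = {"X": set(range(10)), "Y": set(range(10))}
revise(domains, "X", "Y", lambda x, y: y == x * x)   # X becomes {0, 1, 2, 3}
revise(domains, "Y", "X", lambda y, x: y == x * x)   # Y becomes {0, 1, 4, 9}
print(domains)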
function AC-3(csp) returns false if an inconsistency is found and true otherwise
  inputs: csp, a binary CSP with components (X, D, C)
  local variables: queue, a queue of arcs, initially all the arcs in csp
  while queue is not empty do
    (Xi, Xj) ← REMOVE-FIRST(queue)
    if REVISE(csp, Xi, Xj) then
      if size of Di = 0 then return false
      for each Xk in Xi.NEIGHBORS − {Xj} do
        add (Xk, Xi) to queue
  return true

function REVISE(csp, Xi, Xj) returns true iff we revise the domain of Xi
  revised ← false
  for each x in Di do
    if no value y in Dj allows (x, y) to satisfy the constraint between Xi and Xj then
      delete x from Di
      revised ← true
  return revised

Figure 6.3 The arc-consistency algorithm AC-3. After applying AC-3, either every arc is arc-consistent, or some variable has an empty domain, indicating that the CSP cannot be solved. The name "AC-3" was used by the algorithm's inventor (Mackworth, 1977) because it's the third version developed in the paper.

No matter what value you choose for SA (or for WA), there is a valid value for the other variable. So applying arc consistency has no effect on the domains of either variable.

The most popular algorithm for arc consistency is called AC-3 (see Figure 6.3). To make every variable arc-consistent, the AC-3 algorithm maintains a queue of arcs to consider. (Actually, the order of consideration is not important, so the data structure is really a set, but tradition calls it a queue.) Initially, the queue contains all the arcs in the CSP. AC-3 then pops off an arbitrary arc (Xi, Xj) from the queue and makes Xi arc-consistent with respect to Xj. If this leaves Di unchanged, the algorithm just moves on to the next arc. But if this revises Di (makes the domain smaller), then we add to the queue all arcs (Xk, Xi) where Xk is a neighbor of Xi. We need to do that because the change in Di might enable further reductions in the domains of Dk, even if we have previously considered Xk. If Di is revised down to nothing, then we know the whole CSP has no consistent solution, and AC-3 can immediately return failure. Otherwise, we keep checking, trying to remove values from the domains of variables until no more arcs are in the queue. At that point, we are left with a CSP that is equivalent to the original CSP—they both have the same solutions—but the arc-consistent CSP will in most cases be faster to search because its variables have smaller domains.

The complexity of AC-3 can be analyzed as follows. Assume a CSP with n variables, each with domain size at most d, and with c binary constraints (arcs). Each arc (Xk, Xi) can be inserted in the queue only d times because Xi has at most d values to delete. Checking consistency of an arc can be done in O(d^2) time, so we get O(cd^3) total worst-case time.¹

It is possible to extend the notion of arc consistency to handle n-ary rather than just binary constraints; this is called generalized arc consistency or sometimes hyperarc consistency, depending on the author. A variable Xi is generalized arc consistent with respect to an n-ary constraint if for every value v in the domain of Xi there exists a tuple of values that is a member of the constraint, has all its values taken from the domains of the corresponding variables, and has its Xi component equal to v. For example, if all variables have the domain {0, 1, 2, 3}, then to make the variable X consistent with the constraint X < Y < Z, we would have to eliminate 2 and 3 from the domain of X because the constraint cannot be satisfied when X is 2 or 3.

6.2.3 Path consistency

Arc consistency can go a long way toward reducing the domains of variables, sometimes finding a solution (by reducing every domain to size 1) and sometimes finding that the CSP cannot be solved (by reducing some domain to size 0). But for other networks, arc consistency fails to make enough inferences. Consider the map-coloring problem on Australia, but with only two colors allowed, red and blue. Arc consistency can do nothing because every variable is already arc-consistent: each can be red with blue at the other end of the arc (or vice versa). But clearly there is no solution to the problem: because Western Australia, Northern Territory, and South Australia all touch each other, we need at least three colors for them alone.

Arc consistency tightens down the domains (unary constraints) using the arcs (binary constraints). To make progress on problems like map coloring, we need a stronger notion of consistency. Path consistency tightens the binary constraints by using implicit constraints that are inferred by looking at triples of variables.

A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for every assignment {Xi = a, Xj = b} consistent with the constraints on {Xi, Xj}, there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}. This is called path consistency because one can think of it as looking at a path from Xi to Xj with Xm in the middle.

Let's see how path consistency fares in coloring the Australia map with two colors. We will make the set {WA, SA} path-consistent with respect to NT. We start by enumerating the consistent assignments to the set. In this case, there are only two: {WA = red, SA = blue} and {WA = blue, SA = red}. We can see that with both of these assignments NT can be neither red nor blue (because it would conflict with either WA or SA). Because there is no valid choice for NT, we eliminate both assignments, and we end up with no valid assignments for {WA, SA}. Therefore, we know that there can be no solution to this problem. The PC-2 algorithm (Mackworth, 1977) achieves path consistency in much the same way that AC-3 achieves arc consistency. Because it is so similar, we do not show it here.

¹ The AC-4 algorithm (Mohr and Henderson, 1986) runs in O(cd^2) worst-case time but can be slower than AC-3 on average cases. See Exercise 6.12.
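For concreteness, here is a minimal Python sketch of AC-3 along the lines of Figure 6.3. The dictionary-based CSP representation is our own illustrative choice, not the book's. Run on the two-color version of the mutually adjacent regions {WA, NT, SA}, it returns true even though the problem has no solution, which is exactly the limitation of arc consistency that path consistency addresses.

from collections import deque

def ac3(domains, constraints, neighbors):
    """AC-3 in the style of Figure 6.3. domains maps variables to sets of values;
    constraints[(Xi, Xj)] is a predicate over a pair of values; neighbors[Xi] is
    the set of variables that share a constraint with Xi."""
    queue = deque(constraints)                         # initially all the arcs
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints, xi, xj):
            if not domains[xi]:
                return False                           # an empty domain: no solution
            for xk in neighbors[xi] - {xj}:
                queue.append((xk, xi))
    return True

def revise(domains, constraints, xi, xj):
    """Delete values of Xi that have no supporting value in the domain of Xj."""
    revised = False
    for x in set(domains[xi]):
        if not any(constraints[(xi, xj)](x, y) for y in domains[xj]):
            domains[xi].discard(x)
            revised = True
    return revised

# Two-color version of the mutually adjacent regions WA, NT, SA
variables = ["WA", "NT", "SA"]
domains = {v: {"red", "blue"} for v in variables}
constraints = {(a, b): (lambda x, y: x != y) for a in variables for b in variables if a != b}
neighbors = {v: set(variables) - {v} for v in variables}
print(ac3(domains, constraints, neighbors))   # True: every arc is consistent, yet no solution exists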
6.2.4 K-consistency

Stronger forms of propagation can be defined with the notion of k-consistency. A CSP is k-consistent if, for any set of k − 1 variables and for any consistent assignment to those variables, a consistent value can always be assigned to any kth variable. 1-consistency says that, given the empty set, we can make any set of one variable consistent: this is what we called node consistency. 2-consistency is the same as arc consistency. For binary constraint networks, 3-consistency is the same as path consistency.

A CSP is strongly k-consistent if it is k-consistent and is also (k − 1)-consistent, (k − 2)-consistent, ... all the way down to 1-consistent. Now suppose we have a CSP with n nodes and make it strongly n-consistent (i.e., strongly k-consistent for k = n). We can then solve the problem as follows: First, we choose a consistent value for X1. We are then guaranteed to be able to choose a value for X2 because the graph is 2-consistent, for X3 because it is 3-consistent, and so on. For each variable Xi, we need only search through the d values in the domain to find a value consistent with X1, ..., Xi−1. We are guaranteed to find a solution in time O(n^2 d). Of course, there is no free lunch: any algorithm for establishing n-consistency must take time exponential in n in the worst case. Worse, n-consistency also requires space that is exponential in n. The memory issue is even more severe than the time. In practice, determining the appropriate level of consistency checking is mostly an empirical science: practitioners commonly compute 2-consistency, and less commonly 3-consistency.

6.2.5 Global constraints

Remember that a global constraint is one involving an arbitrary number of variables (but not necessarily all variables). Global constraints occur frequently in real problems and can be handled by special-purpose algorithms that are more efficient than the general-purpose methods described so far. For example, the Alldiff constraint says that all the variables involved must have distinct values (as in the cryptarithmetic problem above and Sudoku puzzles below). One simple form of inconsistency detection for Alldiff constraints works as follows: if m variables are involved in the constraint, and if they have n possible distinct values altogether, and m > n, then the constraint cannot be satisfied.

This leads to the following simple algorithm: First, remove any variable in the constraint that has a singleton domain, and delete that variable's value from the domains of the remaining variables. Repeat as long as there are singleton variables. If at any point an empty domain is produced or there are more variables than domain values left, then an inconsistency has been detected.

This method can detect the inconsistency in the assignment {WA = red, NSW = red} for Figure 6.1. Notice that the variables SA, NT, and Q are effectively connected by an Alldiff constraint because each pair must have two different colors. After applying AC-3 with the partial assignment, the domain of each variable is reduced to {green, blue}. That is, we have three variables and only two colors, so the Alldiff constraint is violated. Thus, a simple consistency procedure for a higher-order constraint is sometimes more effective than applying arc consistency to an equivalent set of binary constraints. There are more
  • 231. 212 Chapter 6. Constraint Satisfaction Problems complex inference algorithms for Alldiff (see van Hoeve and Katriel, 2006) that propagate more constraints but are more computationally expensive to run. Another important higher-order constraint is the resource constraint, sometimes called RESOURCE CONSTRAINT the atmost constraint. For example, in a scheduling problem, let P1, . . . , P4 denote the numbers of personnel assigned to each of four tasks. The constraint that no more than 10 personnel are assigned in total is written as Atmost(10, P1, P2, P3, P4). We can detect an inconsistency simply by checking the sum of the minimum values of the current domains; for example, if each variable has the domain {3, 4, 5, 6}, the Atmost constraint cannot be satisfied. We can also enforce consistency by deleting the maximum value of any domain if it is not consistent with the minimum values of the other domains. Thus, if each variable in our example has the domain {2, 3, 4, 5, 6}, the values 5 and 6 can be deleted from each domain. For large resource-limited problems with integer values—such as logistical problems involving moving thousands of people in hundreds of vehicles—it is usually not possible to represent the domain of each variable as a large set of integers and gradually reduce that set by consistency-checking methods. Instead, domains are represented by upper and lower bounds and are managed by bounds propagation. For example, in an airline-scheduling problem, BOUNDS PROPAGATION let’s suppose there are two flights, F1 and F2, for which the planes have capacities 165 and 385, respectively. The initial domains for the numbers of passengers on each flight are then D1 = [0, 165] and D2 = [0, 385] . Now suppose we have the additional constraint that the two flights together must carry 420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to D1 = [35, 165] and D2 = [255, 385] . We say that a CSP is bounds consistent if for every variable X, and for both the lower- BOUNDS CONSISTENT bound and upper-bound values of X, there exists some value of Y that satisfies the constraint between X and Y for every variable Y . This kind of bounds propagation is widely used in practical constraint problems. 6.2.6 Sudoku example The popular Sudoku puzzle has introduced millions of people to constraint satisfaction prob- SUDOKU lems, although they may not recognize it. A Sudoku board consists of 81 squares, some of which are initially filled with digits from 1 to 9. The puzzle is to fill in all the remaining squares such that no digit appears twice in any row, column, or 3 × 3 box (see Figure 6.4). A row, column, or box is called a unit. The Sudoku puzzles that are printed in newspapers and puzzle books have the property that there is exactly one solution. Although some can be tricky to solve by hand, taking tens of minutes, even the hardest Sudoku problems yield to a CSP solver in less than 0.1 second. A Sudoku puzzle can be considered a CSP with 81 variables, one for each square. We use the variable names A1 through A9 for the top row (left to right), down to I1 through I9 for the bottom row. The empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre- filled squares have a domain consisting of a single value. In addition, there are 27 different
Alldiff constraints: one for each row, column, and box of 9 squares:

  Alldiff(A1, A2, A3, A4, A5, A6, A7, A8, A9)
  Alldiff(B1, B2, B3, B4, B5, B6, B7, B8, B9)
  · · ·
  Alldiff(A1, B1, C1, D1, E1, F1, G1, H1, I1)
  Alldiff(A2, B2, C2, D2, E2, F2, G2, H2, I2)
  · · ·
  Alldiff(A1, A2, A3, B1, B2, B3, C1, C2, C3)
  Alldiff(A4, A5, A6, B4, B5, B6, C4, C5, C6)
  · · ·

Figure 6.4 (a) A Sudoku puzzle and (b) its solution.

Let us see how far arc consistency can take us. Assume that the Alldiff constraints have been expanded into binary constraints (such as A1 ≠ A2) so that we can apply the AC-3 algorithm directly. Consider variable E6 from Figure 6.4(a)—the empty square between the 2 and the 8 in the middle box. From the constraints in the box, we can remove not only 2 and 8 but also 1 and 7 from E6's domain. From the constraints in its column, we can eliminate 5, 6, 2, 8, 9, and 3. That leaves E6 with a domain of {4}; in other words, we know the answer for E6. Now consider variable I6—the square in the bottom middle box surrounded by 1, 3, and 3. Applying arc consistency in its column, we eliminate 5, 6, 2, 4 (since we now know E6 must be 4), 8, 9, and 3. We eliminate 1 by arc consistency with I5, and we are left with only the value 7 in the domain of I6. Now there are 8 known values in column 6, so arc consistency can infer that A6 must be 1. Inference continues along these lines, and eventually, AC-3 can solve the entire puzzle—all the variables have their domains reduced to a single value, as shown in Figure 6.4(b).

Of course, Sudoku would soon lose its appeal if every puzzle could be solved by a
  • 233. 214 Chapter 6. Constraint Satisfaction Problems mechanical application of AC-3, and indeed AC-3 works only for the easiest Sudoku puzzles. Slightly harder ones can be solved by PC-2, but at a greater computational cost: there are 255,960 different path constraints to consider in a Sudoku puzzle. To solve the hardest puzzles and to make efficient progress, we will have to be more clever. Indeed, the appeal of Sudoku puzzles for the human solver is the need to be resourceful in applying more complex inference strategies. Aficionados give them colorful names, such as “naked triples.” That strategy works as follows: in any unit (row, column or box), find three squares that each have a domain that contains the same three numbers or a subset of those numbers. For example, the three domains might be {1, 8}, {3, 8}, and {1, 3, 8}. From that we don’t know which square contains 1, 3, or 8, but we do know that the three numbers must be distributed among the three squares. Therefore we can remove 1, 3, and 8 from the domains of every other square in the unit. It is interesting to note how far we can go without saying much that is specific to Su- doku. We do of course have to say that there are 81 variables, that their domains are the digits 1 to 9, and that there are 27 Alldiff constraints. But beyond that, all the strategies—arc con- sistency, path consistency, etc.—apply generally to all CSPs, not just to Sudoku problems. Even naked triples is really a strategy for enforcing consistency of Alldiff constraints and has nothing to do with Sudoku per se. This is the power of the CSP formalism: for each new problem area, we only need to define the problem in terms of constraints; then the general constraint-solving mechanisms can take over. 6.3 BACKTRACKING SEARCH FOR CSPS Sudoku problems are designed to be solved by inference over constraints. But many other CSPs cannot be solved by inference alone; there comes a time when we must search for a solution. In this section we look at backtracking search algorithms that work on partial as- signments; in the next section we look at local search algorithms over complete assignments. We could apply a standard depth-limited search (from Chapter 3). A state would be a partial assignment, and an action would be adding var = value to the assignment. But for a CSP with n variables of domain size d, we quickly notice something terrible: the branching factor at the top level is nd because any of d values can be assigned to any of n variables. At the next level, the branching factor is (n − 1)d, and so on for n levels. We generate a tree with n! · dn leaves, even though there are only dn possible complete assignments! Our seemingly reasonable but naive formulation ignores crucial property common to all CSPs: commutativity. A problem is commutative if the order of application of any given COMMUTATIVITY set of actions has no effect on the outcome. CSPs are commutative because when assigning values to variables, we reach the same partial assignment regardless of order. Therefore, we need only consider a single variable at each node in the search tree. For example, at the root node of a search tree for coloring the map of Australia, we might make a choice between SA = red, SA = green, and SA = blue, but we would never choose between SA = red and WA = blue. With this restriction, the number of leaves is dn, as we would hope.
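The following Python sketch (illustrative only; the full algorithm with its heuristic hooks is given as pseudocode in Figure 6.5) shows this basic scheme: a depth-first search over partial assignments that branches on a single variable at each node, so the tree has at most d^n leaves. The representation of constraints as predicates over value pairs is our own choice.

def backtracking_search(variables, domains, constraints):
    """Depth-first search over partial assignments, assigning one variable per node.
    constraints[(Xi, Xj)] is a predicate over a pair of values for Xi and Xj."""
    def consistent(var, value, assignment):
        return all(constraints[(var, other)](value, other_value)
                   for other, other_value in assignment.items()
                   if (var, other) in constraints)

    def backtrack(assignment):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)   # static order; see the MRV heuristic in Section 6.3.1
        for value in domains[var]:
            if consistent(var, value, assignment):
                assignment[var] = value
                result = backtrack(assignment)
                if result is not None:
                    return result
                del assignment[var]                               # undo and try the next value
        return None                                               # no legal value: backtrack

    return backtrack({})

# Map coloring for the mainland regions of Australia (Figure 6.1)
variables = ["WA", "NT", "SA", "Q", "NSW", "V"]
adjacent = [("WA", "NT"), ("WA", "SA"), ("NT", "SA"), ("NT", "Q"),
            ("SA", "Q"), ("SA", "NSW"), ("SA", "V"), ("Q", "NSW"), ("NSW", "V")]
domains = {v: ["red", "green", "blue"] for v in variables}
constraints = {}
for a, b in adjacent:
    constraints[(a, b)] = constraints[(b, a)] = lambda x, y: x != y
print(backtracking_search(variables, domains, constraints))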
  • 234. Section 6.3. Backtracking Search for CSPs 215 function BACKTRACKING-SEARCH(csp) returns a solution, or failure return BACKTRACK({ },csp) function BACKTRACK(assignment,csp) returns a solution, or failure if assignment is complete then return assignment var ← SELECT-UNASSIGNED-VARIABLE(csp) for each value in ORDER-DOMAIN-VALUES(var,assignment,csp) do if value is consistent with assignment then add {var = value} to assignment inferences ← INFERENCE(csp,var,value) if inferences 6= failure then add inferences to assignment result ← BACKTRACK(assignment,csp) if result 6= failure then return result remove {var = value} and inferences from assignment return failure Figure 6.5 A simple backtracking algorithm for constraint satisfaction problems. The al- gorithm is modeled on the recursive depth-first search of Chapter 3. By varying the functions SELECT-UNASSIGNED-VARIABLE and ORDER-DOMAIN-VALUES, we can implement the general-purpose heuristics discussed in the text. The function INFERENCE can optionally be used to impose arc-, path-, or k-consistency, as desired. If a value choice leads to failure (noticed either by INFERENCE or by BACKTRACK), then value assignments (including those made by INFERENCE) are removed from the current assignment and a new value is tried. The term backtracking search is used for a depth-first search that chooses values for BACKTRACKING SEARCH one variable at a time and backtracks when a variable has no legal values left to assign. The algorithm is shown in Figure 6.5. It repeatedly chooses an unassigned variable, and then tries all values in the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then BACKTRACK returns failure, causing the previous call to try another value. Part of the search tree for the Australia problem is shown in Figure 6.6, where we have assigned variables in the order WA, NT, Q, . . .. Because the representation of CSPs is standardized, there is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state, action function, transition model, or goal test. Notice that BACKTRACKING-SEARCH keeps only a single representation of a state and alters that representation rather than creating new ones, as described on page 87. In Chapter 3 we improved the poor performance of uninformed search algorithms by supplying them with domain-specific heuristic functions derived from our knowledge of the problem. It turns out that we can solve CSPs efficiently without such domain-specific knowl- edge. Instead, we can add some sophistication to the unspecified functions in Figure 6.5, using them to address the following questions: 1. Which variable should be assigned next (SELECT-UNASSIGNED-VARIABLE), and in what order should its values be tried (ORDER-DOMAIN-VALUES)?
  • 235. 216 Chapter 6. Constraint Satisfaction Problems WA=red WA=blue WA=green WA=red NT=blue WA=red NT=green WA=red NT=green Q=red WA=red NT=green Q=blue Figure 6.6 Part of the search tree for the map-coloring problem in Figure 6.1. 2. What inferences should be performed at each step in the search (INFERENCE)? 3. When the search arrives at an assignment that violates a constraint, can the search avoid repeating this failure? The subsections that follow answer each of these questions in turn. 6.3.1 Variable and value ordering The backtracking algorithm contains the line var ← SELECT-UNASSIGNED-VARIABLE(csp) . The simplest strategy for SELECT-UNASSIGNED-VARIABLE is to choose the next unassigned variable in order, {X1, X2, . . .}. This static variable ordering seldom results in the most effi- cient search. For example, after the assignments for WA = red and NT = green in Figure 6.6, there is only one possible value for SA, so it makes sense to assign SA = blue next rather than assigning Q. In fact, after SA is assigned, the choices for Q, NSW , and V are all forced. This intuitive idea—choosing the variable with the fewest “legal” values—is called the minimum- remaining-values (MRV) heuristic. It also has been called the “most constrained variable” or MINIMUM- REMAINING-VALUES “fail-first” heuristic, the latter because it picks a variable that is most likely to cause a failure soon, thereby pruning the search tree. If some variable X has no legal values left, the MRV heuristic will select X and failure will be detected immediately—avoiding pointless searches through other variables. The MRV heuristic usually performs better than a random or static ordering, sometimes by a factor of 1,000 or more, although the results vary widely depending on the problem. The MRV heuristic doesn’t help at all in choosing the first region to color in Australia, because initially every region has three legal colors. In this case, the degree heuristic comes DEGREE HEURISTIC in handy. It attempts to reduce the branching factor on future choices by selecting the vari- able that is involved in the largest number of constraints on other unassigned variables. In Figure 6.1, SA is the variable with highest degree, 5; the other variables have degree 2 or 3, except for T, which has degree 0. In fact, once SA is chosen, applying the degree heuris- tic solves the problem without any false steps—you can choose any consistent color at each choice point and still arrive at a solution with no backtracking. The minimum-remaining-
  • 236. Section 6.3. Backtracking Search for CSPs 217 values heuristic is usually a more powerful guide, but the degree heuristic can be useful as a tie-breaker. Once a variable has been selected, the algorithm must decide on the order in which to examine its values. For this, the least-constraining-value heuristic can be effective in some LEAST- CONSTRAINING- VALUE cases. It prefers the value that rules out the fewest choices for the neighboring variables in the constraint graph. For example, suppose that in Figure 6.1 we have generated the partial assignment with WA = red and NT = green and that our next choice is for Q. Blue would be a bad choice because it eliminates the last legal value left for Q’s neighbor, SA. The least-constraining-value heuristic therefore prefers red to blue. In general, the heuristic is trying to leave the maximum flexibility for subsequent variable assignments. Of course, if we are trying to find all the solutions to a problem, not just the first one, then the ordering does not matter because we have to consider every value anyway. The same holds if there are no solutions to the problem. Why should variable selection be fail-first, but value selection be fail-last? It turns out that, for a wide variety of problems, a variable ordering that chooses a variable with the minimum number of remaining values helps minimize the number of nodes in the search tree by pruning larger parts of the tree earlier. For value ordering, the trick is that we only need one solution; therefore it makes sense to look for the most likely values first. If we wanted to enumerate all solutions rather than just find one, then value ordering would be irrelevant. 6.3.2 Interleaving search and inference So far we have seen how AC-3 and other algorithms can infer reductions in the domain of variables before we begin the search. But inference can be even more powerful in the course of a search: every time we make a choice of a value for a variable, we have a brand-new opportunity to infer new domain reductions on the neighboring variables. One of the simplest forms of inference is called forward checking. Whenever a vari- FORWARD CHECKING able X is assigned, the forward-checking process establishes arc consistency for it: for each unassigned variable Y that is connected to X by a constraint, delete from Y ’s domain any value that is inconsistent with the value chosen for X. Because forward checking only does arc consistency inferences, there is no reason to do forward checking if we have already done arc consistency as a preprocessing step. Figure 6.7 shows the progress of backtracking search on the Australia CSP with for- ward checking. There are two important points to notice about this example. First, notice that after WA = red and Q = green are assigned, the domains of NT and SA are reduced to a single value; we have eliminated branching on these variables altogether by propagat- ing information from WA and Q. A second point to notice is that after V = blue, the do- main of SA is empty. Hence, forward checking has detected that the partial assignment {WA = red, Q = green, V = blue} is inconsistent with the constraints of the problem, and the algorithm will therefore backtrack immediately. For many problems the search will be more effective if we combine the MRV heuris- tic with forward checking. Consider Figure 6.7 after assigning {WA = red}. Intuitively, it seems that that assignment constrains its neighbors, NT and SA, so we should handle those
variables next, and then all the other variables will fall into place. That's exactly what happens with MRV: NT and SA have two values, so one of them is chosen first, then the other, then Q, NSW, and V in order. Finally T still has three values, and any one of them works. We can view forward checking as an efficient way to incrementally compute the information that the MRV heuristic needs to do its job.

Figure 6.7 The progress of a map-coloring search with forward checking. WA = red is assigned first; then forward checking deletes red from the domains of the neighboring variables NT and SA. After Q = green is assigned, green is deleted from the domains of NT, SA, and NSW. After V = blue is assigned, blue is deleted from the domains of NSW and SA, leaving SA with no legal values.

Although forward checking detects many inconsistencies, it does not detect all of them. The problem is that it makes the current variable arc-consistent, but doesn't look ahead and make all the other variables arc-consistent. For example, consider the third row of Figure 6.7. It shows that when WA is red and Q is green, both NT and SA are forced to be blue. Forward checking does not look far enough ahead to notice that this is an inconsistency: NT and SA are adjacent and so cannot have the same value.

The algorithm called MAC (for Maintaining Arc Consistency) detects this inconsistency. After a variable Xi is assigned a value, the INFERENCE procedure calls AC-3, but instead of a queue of all arcs in the CSP, we start with only the arcs (Xj, Xi) for all Xj that are unassigned variables that are neighbors of Xi. From there, AC-3 does constraint propagation in the usual way, and if any variable has its domain reduced to the empty set, the call to AC-3 fails and we know to backtrack immediately. We can see that MAC is strictly more powerful than forward checking because forward checking does the same thing as MAC on the initial arcs in MAC's queue; but unlike MAC, forward checking does not recursively propagate constraints when changes are made to the domains of variables.

6.3.3 Intelligent backtracking: Looking backward

The BACKTRACKING-SEARCH algorithm in Figure 6.5 has a very simple policy for what to do when a branch of the search fails: back up to the preceding variable and try a different value for it. This is called chronological backtracking because the most recent decision point is revisited. In this subsection, we consider better possibilities.

Consider what happens when we apply simple backtracking in Figure 6.1 with a fixed variable ordering Q, NSW, V, T, SA, WA, NT. Suppose we have generated the partial assignment {Q = red, NSW = green, V = blue, T = red}. When we try the next variable, SA, we see that every value violates a constraint. We back up to T and try a new color for
  • 238. Section 6.3. Backtracking Search for CSPs 219 Tasmania! Obviously this is silly—recoloring Tasmania cannot possibly resolve the problem with South Australia. A more intelligent approach to backtracking is to backtrack to a variable that might fix the problem—a variable that was responsible for making one of the possible values of SA impossible. To do this, we will keep track of a set of assignments that are in conflict with some value for SA. The set (in this case {Q = red, NSW = green, V = blue, }), is called the conflict set for SA. The backjumping method backtracks to the most recent assignment in CONFLICT SET BACKJUMPING the conflict set; in this case, backjumping would jump over Tasmania and try a new value for V . This method is easily implemented by a modification to BACKTRACK such that it accumulates the conflict set while checking for a legal value to assign. If no legal value is found, the algorithm should return the most recent element of the conflict set along with the failure indicator. The sharp-eyed reader will have noticed that forward checking can supply the conflict set with no extra work: whenever forward checking based on an assignment X = x deletes a value from Y ’s domain, it should add X = x to Y ’s conflict set. If the last value is deleted from Y ’s domain, then the assignments in the conflict set of Y are added to the conflict set of X. Then, when we get to Y , we know immediately where to backtrack if needed. The eagle-eyed reader will have noticed something odd: backjumping occurs when every value in a domain is in conflict with the current assignment; but forward checking detects this event and prevents the search from ever reaching such a node! In fact, it can be shown that every branch pruned by backjumping is also pruned by forward checking. Hence, simple backjumping is redundant in a forward-checking search or, indeed, in a search that uses stronger consistency checking, such as MAC. Despite the observations of the preceding paragraph, the idea behind backjumping re- mains a good one: to backtrack based on the reasons for failure. Backjumping notices failure when a variable’s domain becomes empty, but in many cases a branch is doomed long before this occurs. Consider again the partial assignment {WA = red, NSW = red} (which, from our earlier discussion, is inconsistent). Suppose we try T = red next and then assign NT, Q, V , SA. We know that no assignment can work for these last four variables, so eventually we run out of values to try at NT. Now, the question is, where to backtrack? Backjumping cannot work, because NT does have values consistent with the preceding assigned variables—NT doesn’t have a complete conflict set of preceding variables that caused it to fail. We know, however, that the four variables NT, Q, V , and SA, taken together, failed because of a set of preceding variables, which must be those variables that directly conflict with the four. This leads to a deeper notion of the conflict set for a variable such as NT: it is that set of preced- ing variables that caused NT, together with any subsequent variables, to have no consistent solution. In this case, the set is WA and NSW , so the algorithm should backtrack to NSW and skip over Tasmania. A backjumping algorithm that uses conflict sets defined in this way is called conflict-directed backjumping. CONFLICT-DIRECTED BACKJUMPING We must now explain how these new conflict sets are computed. The method is in fact quite simple. 
The “terminal” failure of a branch of the search always occurs because a variable’s domain becomes empty; that variable has a standard conflict set. In our example, SA fails, and its conflict set is (say) {WA, NT, Q}. We backjump to Q, and Q absorbs
  • 239. 220 Chapter 6. Constraint Satisfaction Problems the conflict set from SA (minus Q itself, of course) into its own direct conflict set, which is {NT, NSW }; the new conflict set is {WA, NT, NSW }. That is, there is no solution from Q onward, given the preceding assignment to {WA, NT, NSW }. Therefore, we backtrack to NT, the most recent of these. NT absorbs {WA, NT, NSW } − {NT} into its own direct conflict set {WA}, giving {WA, NSW } (as stated in the previous paragraph). Now the algorithm backjumps to NSW , as we would hope. To summarize: let Xj be the current variable, and let conf (Xj) be its conflict set. If every possible value for Xj fails, backjump to the most recent variable Xi in conf (Xj), and set conf (Xi) ← conf (Xi) ∪ conf (Xj) − {Xi} . When we reach a contradiction, backjumping can tell us how far to back up, so we don’t waste time changing variables that won’t fix the problem. But we would also like to avoid running into the same problem again. When the search arrives at a contradiction, we know that some subset of the conflict set is responsible for the problem. Constraint learning is the CONSTRAINT LEARNING idea of finding a minimum set of variables from the conflict set that causes the problem. This set of variables, along with their corresponding values, is called a no-good. We then record NO-GOOD the no-good, either by adding a new constraint to the CSP or by keeping a separate cache of no-goods. For example, consider the state {WA = red, NT = green, Q = blue} in the bottom row of Figure 6.6. Forward checking can tell us this state is a no-good because there is no valid assignment to SA. In this particular case, recording the no-good would not help, because once we prune this branch from the search tree, we will never encounter this combination again. But suppose that the search tree in Figure 6.6 were actually part of a larger search tree that started by first assigning values for V and T. Then it would be worthwhile to record {WA = red, NT = green, Q = blue} as a no-good because we are going to run into the same problem again for each possible set of assignments to V and T. No-goods can be effectively used by forward checking or by backjumping. Constraint learning is one of the most important techniques used by modern CSP solvers to achieve efficiency on complex problems. 6.4 LOCAL SEARCH FOR CSPS Local search algorithms (see Section 4.1) turn out to be effective in solving many CSPs. They use a complete-state formulation: the initial state assigns a value to every variable, and the search changes the value of one variable at a time. For example, in the 8-queens problem (see Figure 4.3), the initial state might be a random configuration of 8 queens in 8 columns, and each step moves a single queen to a new position in its column. Typically, the initial guess violates several constraints. The point of local search is to eliminate the violated constraints.2 In choosing a new value for a variable, the most obvious heuristic is to select the value that results in the minimum number of conflicts with other variables—the min-conflicts MIN-CONFLICTS 2 Local search can easily be extended to constraint optimization problems (COPs). In that case, all the techniques for hill climbing and simulated annealing can be applied to optimize the objective function.
heuristic. The algorithm is shown in Figure 6.8 and its application to an 8-queens problem is diagrammed in Figure 6.9.

function MIN-CONFLICTS(csp, max_steps) returns a solution or failure
  inputs: csp, a constraint satisfaction problem
          max_steps, the number of steps allowed before giving up
  current ← an initial complete assignment for csp
  for i = 1 to max_steps do
    if current is a solution for csp then return current
    var ← a randomly chosen conflicted variable from csp.VARIABLES
    value ← the value v for var that minimizes CONFLICTS(var, v, current, csp)
    set var = value in current
  return failure

Figure 6.8 The MIN-CONFLICTS algorithm for solving CSPs by local search. The initial state may be chosen randomly or by a greedy assignment process that chooses a minimal-conflict value for each variable in turn. The CONFLICTS function counts the number of constraints violated by a particular value, given the rest of the current assignment.

Figure 6.9 A two-step solution using min-conflicts for an 8-queens problem. At each stage, a queen is chosen for reassignment in its column. The number of conflicts (in this case, the number of attacking queens) is shown in each square. The algorithm moves the queen to the min-conflicts square, breaking ties randomly.

Min-conflicts is surprisingly effective for many CSPs. Amazingly, on the n-queens problem, if you don't count the initial placement of queens, the run time of min-conflicts is roughly independent of problem size. It solves even the million-queens problem in an average of 50 steps (after the initial assignment). This remarkable observation was the stimulus leading to a great deal of research in the 1990s on local search and the distinction between easy and hard problems, which we take up in Chapter 7. Roughly speaking, n-queens is easy for local search because solutions are densely distributed throughout the state space. Min-conflicts also works well for hard problems. For example, it has been used to schedule observations for the Hubble Space Telescope, reducing the time taken to schedule a week of observations from three weeks (!) to around 10 minutes.
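A minimal Python sketch of min-conflicts for n-queens follows, under the usual one-queen-per-column representation. The function name, the step limit, and the deterministic (rather than random) tie-breaking are our own simplifications, not part of the pseudocode in Figure 6.8.

import random

def min_conflicts_queens(n, max_steps=100_000):
    """MIN-CONFLICTS specialized to n-queens: rows[c] is the row of the queen
    in column c, so two queens can never share a column by construction."""
    def conflicts(rows, col, row):
        # number of other queens attacking the square (col, row)
        return sum(1 for c in range(n)
                   if c != col and (rows[c] == row or abs(rows[c] - row) == abs(c - col)))

    rows = [random.randrange(n) for _ in range(n)]     # random initial complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows                                # current is a solution
        col = random.choice(conflicted)                # a randomly chosen conflicted variable
        rows[col] = min(range(n), key=lambda r: conflicts(rows, col, r))
    return None                                        # give up after max_steps

print(min_conflicts_queens(8))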
  • 241. 222 Chapter 6. Constraint Satisfaction Problems All the local search techniques from Section 4.1 are candidates for application to CSPs, and some of those have proved especially effective. The landscape of a CSP under the min- conflicts heuristic usually has a series of plateaux. There may be millions of variable as- signments that are only one conflict away from a solution. Plateau search—allowing side- ways moves to another state with the same score—can help local search find its way off this plateau. This wandering on the plateau can be directed with tabu search: keeping a small list of recently visited states and forbidding the algorithm to return to those states. Simulated annealing can also be used to escape from plateaux. Another technique, called constraint weighting, can help concentrate the search on the CONSTRAINT WEIGHTING important constraints. Each constraint is given a numeric weight, Wi, initially all 1. At each step of the search, the algorithm chooses a variable/value pair to change that will result in the lowest total weight of all violated constraints. The weights are then adjusted by incrementing the weight of each constraint that is violated by the current assignment. This has two benefits: it adds topography to plateaux, making sure that it is possible to improve from the current state, and it also, over time, adds weight to the constraints that are proving difficult to solve. Another advantage of local search is that it can be used in an online setting when the problem changes. This is particularly important in scheduling problems. A week’s airline schedule may involve thousands of flights and tens of thousands of personnel assignments, but bad weather at one airport can render the schedule infeasible. We would like to repair the schedule with a minimum number of changes. This can be easily done with a local search algorithm starting from the current schedule. A backtracking search with the new set of constraints usually requires much more time and might find a solution with many changes from the current schedule. 6.5 THE STRUCTURE OF PROBLEMS In this section, we examine ways in which the structure of the problem, as represented by the constraint graph, can be used to find solutions quickly. Most of the approaches here also apply to other problems besides CSPs, such as probabilistic reasoning. After all, the only way we can possibly hope to deal with the real world is to decompose it into many subproblems. Looking again at the constraint graph for Australia (Figure 6.1(b), repeated as Figure 6.12(a)), one fact stands out: Tasmania is not connected to the mainland.3 Intuitively, it is obvious that coloring Tasmania and coloring the mainland are independent subproblems—any solution INDEPENDENT SUBPROBLEMS for the mainland combined with any solution for Tasmania yields a solution for the whole map. Independence can be ascertained simply by finding connected components of the CONNECTED COMPONENT constraint graph. Each component corresponds to a subproblem CSPi. If assignment Si is a solution of CSPi, then S i Si is a solution of S i CSPi. Why is this important? Consider the following: suppose each CSPi has c variables from the total of n variables, where c is a constant. Then there are n/c subproblems, each of which takes at most dc work to solve, 3 A careful cartographer or patriotic Tasmanian might object that Tasmania should not be colored the same as its nearest mainland neighbor, to avoid the impression that it might be part of that state.
where d is the size of the domain. Hence, the total work is O(d^c n/c), which is linear in n; without the decomposition, the total work is O(d^n), which is exponential in n. Let's make this more concrete: dividing a Boolean CSP with 80 variables into four subproblems reduces the worst-case solution time from the lifetime of the universe down to less than a second.

Completely independent subproblems are delicious, then, but rare. Fortunately, some other graph structures are also easy to solve. For example, a constraint graph is a tree when any two variables are connected by only one path. We show that any tree-structured CSP can be solved in time linear in the number of variables.⁴ The key is a new notion of consistency, called directed arc consistency or DAC. A CSP is defined to be directed arc-consistent under an ordering of variables X1, X2, ..., Xn if and only if every Xi is arc-consistent with each Xj for j > i.

To solve a tree-structured CSP, first pick any variable to be the root of the tree, and choose an ordering of the variables such that each variable appears after its parent in the tree. Such an ordering is called a topological sort. Figure 6.10(a) shows a sample tree and (b) shows one possible ordering. Any tree with n nodes has n − 1 arcs, so we can make this graph directed arc-consistent in O(n) steps, each of which must compare up to d possible domain values for two variables, for a total time of O(nd^2). Once we have a directed arc-consistent graph, we can just march down the list of variables and choose any remaining value. Since each link from a parent to its child is arc-consistent, we know that for any value we choose for the parent, there will be a valid value left to choose for the child. That means we won't have to backtrack; we can move linearly through the variables. The complete algorithm is shown in Figure 6.11.

Figure 6.10 (a) The constraint graph of a tree-structured CSP. (b) A linear ordering of the variables consistent with the tree with A as the root. This is known as a topological sort of the variables.

Now that we have an efficient algorithm for trees, we can consider whether more general constraint graphs can be reduced to trees somehow. There are two primary ways to do this, one based on removing nodes and one based on collapsing nodes together.

The first approach involves assigning values to some variables so that the remaining variables form a tree. Consider the constraint graph for Australia, shown again in Figure 6.12(a). If we could delete South Australia, the graph would become a tree, as in (b). Fortunately, we can do this (in the graph, not the continent) by fixing a value for SA and

⁴ Sadly, very few regions of the world have tree-structured maps, although Sulawesi comes close.
  • 243. 224 Chapter 6. Constraint Satisfaction Problems function TREE-CSP-SOLVER(csp) returns a solution, or failure inputs: csp, a CSP with components X, D, C n ← number of variables in X assignment ← an empty assignment root ← any variable in X X ← TOPOLOGICALSORT(X ,root) for j = n down to 2 do MAKE-ARC-CONSISTENT(PARENT(Xj),Xj) if it cannot be made consistent then return failure for i = 1 to n do assignment[Xi] ← any consistent value from Di if there is no consistent value then return failure return assignment Figure 6.11 The TREE-CSP-SOLVER algorithm for solving tree-structured CSPs. If the CSP has a solution, we will find it in linear time; if not, we will detect a contradiction. WA NT SA Q NSW V T WA NT Q NSW V T (a) (b) Figure 6.12 (a) The original constraint graph from Figure 6.1. (b) The constraint graph after the removal of SA. deleting from the domains of the other variables any values that are inconsistent with the value chosen for SA. Now, any solution for the CSP after SA and its constraints are removed will be con- sistent with the value chosen for SA. (This works for binary CSPs; the situation is more complicated with higher-order constraints.) Therefore, we can solve the remaining tree with the algorithm given above and thus solve the whole problem. Of course, in the general case (as opposed to map coloring), the value chosen for SA could be the wrong one, so we would need to try each possible value. The general algorithm is as follows:
  • 244. Section 6.5. The Structure of Problems 225 1. Choose a subset S of the CSP’s variables such that the constraint graph becomes a tree after removal of S. S is called a cycle cutset. CYCLE CUTSET 2. For each possible assignment to the variables in S that satisfies all constraints on S, (a) remove from the domains of the remaining variables any values that are inconsis- tent with the assignment for S, and (b) If the remaining CSP has a solution, return it together with the assignment for S. If the cycle cutset has size c, then the total run time is O(dc · (n − c)d2): we have to try each of the dc combinations of values for the variables in S, and for each combination we must solve a tree problem of size n − c. If the graph is “nearly a tree,” then c will be small and the savings over straight backtracking will be huge. In the worst case, however, c can be as large as (n − 2). Finding the smallest cycle cutset is NP-hard, but several efficient approximation algorithms are known. The overall algorithmic approach is called cutset conditioning; it CUTSET CONDITIONING comes up again in Chapter 14, where it is used for reasoning about probabilities. The second approach is based on constructing a tree decomposition of the constraint TREE DECOMPOSITION graph into a set of connected subproblems. Each subproblem is solved independently, and the resulting solutions are then combined. Like most divide-and-conquer algorithms, this works well if no subproblem is too large. Figure 6.13 shows a tree decomposition of the map- coloring problem into five subproblems. A tree decomposition must satisfy the following three requirements: • Every variable in the original problem appears in at least one of the subproblems. • If two variables are connected by a constraint in the original problem, they must appear together (along with the constraint) in at least one of the subproblems. • If a variable appears in two subproblems in the tree, it must appear in every subproblem along the path connecting those subproblems. The first two conditions ensure that all the variables and constraints are represented in the decomposition. The third condition seems rather technical, but simply reflects the constraint that any given variable must have the same value in every subproblem in which it appears; the links joining subproblems in the tree enforce this constraint. For example, SA appears in all four of the connected subproblems in Figure 6.13. You can verify from Figure 6.12 that this decomposition makes sense. We solve each subproblem independently; if any one has no solution, we know the en- tire problem has no solution. If we can solve all the subproblems, then we attempt to construct a global solution as follows. First, we view each subproblem as a “mega-variable” whose do- main is the set of all solutions for the subproblem. For example, the leftmost subproblem in Figure 6.13 is a map-coloring problem with three variables and hence has six solutions—one is {WA = red, SA = blue, NT = green}. Then, we solve the constraints connecting the subproblems, using the efficient algorithm for trees given earlier. The constraints between subproblems simply insist that the subproblem solutions agree on their shared variables. For example, given the solution {WA = red, SA = blue, NT = green} for the first subproblem, the only consistent solution for the next subproblem is {SA = blue, NT = green, Q = red}. 
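A Python sketch of the tree-solving algorithm of Figure 6.11, which is also the subroutine used to combine subproblem solutions in a tree decomposition, is given below. The parent-dictionary representation and the small example (a tree in the spirit of Figure 6.10 with a "different values" constraint on every edge) are our own illustration.

def tree_csp_solver(order, parent, domains, constraint):
    """TREE-CSP-SOLVER (Figure 6.11) for a binary tree-structured CSP.
    order is a topological ordering (each variable after its parent);
    parent[x] is x's parent (None for the root);
    constraint(u, v) tests the arc between a parent value u and a child value v."""
    # Make each (parent, child) arc consistent, working from the leaves upward.
    for x in reversed(order[1:]):
        p = parent[x]
        domains[p] = {u for u in domains[p] if any(constraint(u, v) for v in domains[x])}
        if not domains[p]:
            return None                                # contradiction detected
    # Assign values top-down; directed arc consistency guarantees no backtracking.
    assignment = {}
    for x in order:
        p = parent[x]
        supported = (v for v in domains[x] if p is None or constraint(assignment[p], v))
        value = next(supported, None)
        if value is None:
            return None
        assignment[x] = value
    return assignment

# A small tree-structured example: A is the root, B and D are internal nodes.
parent = {"A": None, "B": "A", "C": "B", "D": "B", "E": "D", "F": "D"}
order = ["A", "B", "C", "D", "E", "F"]
domains = {v: {"red", "blue"} for v in order}
print(tree_csp_solver(order, parent, domains, lambda u, v: u != v))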
A given constraint graph admits many tree decompositions; in choosing a decompo- sition, the aim is to make the subproblems as small as possible. The tree width of a tree TREE WIDTH
decomposition of a graph is one less than the size of the largest subproblem; the tree width of the graph itself is defined to be the minimum tree width among all its tree decompositions. If a graph has tree width w and we are given the corresponding tree decomposition, then the problem can be solved in O(nd^(w+1)) time. Hence, CSPs with constraint graphs of bounded tree width are solvable in polynomial time. Unfortunately, finding the decomposition with minimal tree width is NP-hard, but there are heuristic methods that work well in practice.

Figure 6.13 A tree decomposition of the constraint graph in Figure 6.12(a).

So far, we have looked at the structure of the constraint graph. There can be important structure in the values of variables as well. Consider the map-coloring problem with n colors. For every consistent solution, there is actually a set of n! solutions formed by permuting the color names. For example, on the Australia map we know that WA, NT, and SA must all have different colors, but there are 3! = 6 ways to assign the three colors to these three regions. This is called value symmetry. We would like to reduce the search space by a factor of n! by breaking the symmetry. We do this by introducing a symmetry-breaking constraint. For our example, we might impose an arbitrary ordering constraint, NT < SA < WA, that requires the three values to be in alphabetical order. This constraint ensures that only one of the n! solutions is possible: {NT = blue, SA = green, WA = red}.

For map coloring, it was easy to find a constraint that eliminates the symmetry, and in general it is possible to find constraints that eliminate all but one symmetric solution in polynomial time, but it is NP-hard to eliminate all symmetry among intermediate sets of values during search. In practice, breaking value symmetry has proved to be important and effective on a wide range of problems.
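As a small illustration (ours, not the text's), a symmetry-breaking constraint of this kind can be checked by comparing color names under an arbitrary fixed order, here the alphabetical order blue < green < red:

def respects_symmetry_breaking(assignment):
    """Keep only the representative solution satisfying NT < SA < WA."""
    return assignment["NT"] < assignment["SA"] < assignment["WA"]

print(respects_symmetry_breaking({"NT": "blue", "SA": "green", "WA": "red"}))   # True
print(respects_symmetry_breaking({"NT": "green", "SA": "red", "WA": "blue"}))   # False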
  • 246. Section 6.6. Summary 227 6.6 SUMMARY • Constraint satisfaction problems (CSPs) represent a state with a set of variable/value pairs and represent the conditions for a solution by a set of constraints on the variables. Many important real-world problems can be described as CSPs. • A number of inference techniques use the constraints to infer which variable/value pairs are consistent and which are not. These include node, arc, path, and k-consistency. • Backtracking search, a form of depth-first search, is commonly used for solving CSPs. Inference can be interwoven with search. • The minimum-remaining-values and degree heuristics are domain-independent meth- ods for deciding which variable to choose next in a backtracking search. The least- constraining-value heuristic helps in deciding which value to try first for a given variable. Backtracking occurs when no legal assignment can be found for a variable. Conflict-directed backjumping backtracks directly to the source of the problem. • Local search using the min-conflicts heuristic has also been applied to constraint satis- faction problems with great success. • The complexity of solving a CSP is strongly related to the structure of its constraint graph. Tree-structured problems can be solved in linear time. Cutset conditioning can reduce a general CSP to a tree-structured one and is quite efficient if a small cutset can be found. Tree decomposition techniques transform the CSP into a tree of subproblems and are efficient if the tree width of the constraint graph is small. BIBLIOGRAPHICAL AND HISTORICAL NOTES The earliest work related to constraint satisfaction dealt largely with numerical constraints. Equational constraints with integer domains were studied by the Indian mathematician Brah- magupta in the seventh century; they are often called Diophantine equations, after the Greek DIOPHANTINE EQUATIONS mathematician Diophantus (c. 200–284), who actually considered the domain of positive ra- tionals. Systematic methods for solving linear equations by variable elimination were studied by Gauss (1829); the solution of linear inequality constraints goes back to Fourier (1827). Finite-domain constraint satisfaction problems also have a long history. For example, graph coloring (of which map coloring is a special case) is an old problem in mathematics. GRAPH COLORING The four-color conjecture (that every planar graph can be colored with four or fewer colors) was first made by Francis Guthrie, a student of De Morgan, in 1852. It resisted solution— despite several published claims to the contrary—until a proof was devised by Appel and Haken (1977) (see the book Four Colors Suffice (Wilson, 2004)). Purists were disappointed that part of the proof relied on a computer, so Georges Gonthier (2008), using the COQ theorem prover, derived a formal proof that Appel and Haken’s proof was correct. Specific classes of constraint satisfaction problems occur throughout the history of computer science. One of the most influential early examples was the SKETCHPAD sys-
  • 247. 228 Chapter 6. Constraint Satisfaction Problems tem (Sutherland, 1963), which solved geometric constraints in diagrams and was the fore- runner of modern drawing programs and CAD tools. The identification of CSPs as a general class is due to Ugo Montanari (1974). The reduction of higher-order CSPs to purely binary CSPs with auxiliary variables (see Exercise 6.5) is due originally to the 19th-century logician Charles Sanders Peirce. It was introduced into the CSP literature by Dechter (1990b) and was elaborated by Bacchus and van Beek (1998). CSPs with preferences among solutions are studied widely in the optimization literature; see Bistarelli et al. (1997) for a generalization of the CSP framework to allow for preferences. The bucket-elimination algorithm (Dechter, 1999) can also be applied to optimization problems. Constraint propagation methods were popularized by Waltz’s (1975) success on poly- hedral line-labeling problems for computer vision. Waltz showed that, in many problems, propagation completely eliminates the need for backtracking. Montanari (1974) introduced the notion of constraint networks and propagation by path consistency. Alan Mackworth (1977) proposed the AC-3 algorithm for enforcing arc consistency as well as the general idea of combining backtracking with some degree of consistency enforcement. AC-4, a more efficient arc-consistency algorithm, was developed by Mohr and Henderson (1986). Soon af- ter Mackworth’s paper appeared, researchers began experimenting with the tradeoff between the cost of consistency enforcement and the benefits in terms of search reduction. Haralick and Elliot (1980) favored the minimal forward-checking algorithm described by McGregor (1979), whereas Gaschnig (1979) suggested full arc-consistency checking after each vari- able assignment—an algorithm later called MAC by Sabin and Freuder (1994). The latter paper provides somewhat convincing evidence that, on harder CSPs, full arc-consistency checking pays off. Freuder (1978, 1982) investigated the notion of k-consistency and its relationship to the complexity of solving CSPs. Apt (1999) describes a generic algorithmic framework within which consistency propagation algorithms can be analyzed, and Bessière (2006) presents a current survey. Special methods for handling higher-order or global constraints were developed first within the context of constraint logic programming. Marriott and Stuckey (1998) provide excellent coverage of research in this area. The Alldiff constraint was studied by Regin (1994), Stergiou and Walsh (1999), and van Hoeve (2001). Bounds constraints were incorpo- rated into constraint logic programming by Van Hentenryck et al. (1998). A survey of global constraints is provided by van Hoeve and Katriel (2006). Sudoku has become the most widely known CSP and was described as such by Simonis (2005). Agerbeck and Hansen (2008) describe some of the strategies and show that Sudoku on an n2 × n2 board is in the class of NP-hard problems. Reeson et al. (2007) show an interactive solver based on CSP techniques. The idea of backtracking search goes back to Golomb and Baumert (1965), and its application to constraint satisfaction is due to Bitner and Reingold (1975), although they trace the basic algorithm back to the 19th century. Bitner and Reingold also introduced the MRV heuristic, which they called the most-constrained-variable heuristic. Brelaz (1979) used the degree heuristic as a tiebreaker after applying the MRV heuristic. 
The resulting algorithm, despite its simplicity, is still the best method for k-coloring arbitrary graphs. Haralick and Elliot (1980) proposed the least-constraining-value heuristic.
  • 248. Bibliographical and Historical Notes 229 The basic backjumping method is due to John Gaschnig (1977, 1979). Kondrak and van Beek (1997) showed that this algorithm is essentially subsumed by forward checking. Conflict-directed backjumping was devised by Prosser (1993). The most general and pow- erful form of intelligent backtracking was actually developed very early on by Stallman and Sussman (1977). Their technique of dependency-directed backtracking led to the develop- DEPENDENCY- DIRECTED BACKTRACKING ment of truth maintenance systems (Doyle, 1979), which we discuss in Section 12.6.2. The connection between the two areas is analyzed by de Kleer (1989). The work of Stallman and Sussman also introduced the idea of constraint learning, in which partial results obtained by search can be saved and reused later in the search. The idea was formalized Dechter (1990a). Backmarking (Gaschnig, 1979) is a particularly sim- BACKMARKING ple method in which consistent and inconsistent pairwise assignments are saved and used to avoid rechecking constraints. Backmarking can be combined with conflict-directed back- jumping; Kondrak and van Beek (1997) present a hybrid algorithm that provably subsumes either method taken separately. The method of dynamic backtracking (Ginsberg, 1993) re- DYNAMIC BACKTRACKING tains successful partial assignments from later subsets of variables when backtracking over an earlier choice that does not invalidate the later success. Empirical studies of several randomized backtracking methods were done by Gomes et al. (2000) and Gomes and Selman (2001). Van Beek (2006) surveys backtracking. Local search in constraint satisfaction problems was popularized by the work of Kirk- patrick et al. (1983) on simulated annealing (see Chapter 4), which is widely used for schedul- ing problems. The min-conflicts heuristic was first proposed by Gu (1989) and was developed independently by Minton et al. (1992). Sosic and Gu (1994) showed how it could be applied to solve the 3,000,000 queens problem in less than a minute. The astounding success of local search using min-conflicts on the n-queens problem led to a reappraisal of the nature and prevalence of “easy” and “hard” problems. Peter Cheeseman et al. (1991) explored the difficulty of randomly generated CSPs and discovered that almost all such problems either are trivially easy or have no solutions. Only if the parameters of the problem generator are set in a certain narrow range, within which roughly half of the problems are solvable, do we find “hard” problem instances. We discuss this phenomenon further in Chapter 7. Konolige (1994) showed that local search is inferior to backtracking search on problems with a certain degree of local structure; this led to work that combined local search and inference, such as that by Pinkas and Dechter (1995). Hoos and Tsang (2006) survey local search techniques. Work relating the structure and complexity of CSPs originates with Freuder (1985), who showed that search on arc consistent trees works without any backtracking. A similar result, with extensions to acyclic hypergraphs, was developed in the database community (Beeri et al., 1983). Bayardo and Miranker (1994) present an algorithm for tree-structured CSPs that runs in linear time without any preprocessing. Since those papers were published, there has been a great deal of progress in developing more general results relating the complexity of solving a CSP to the structure of its constraint graph. 
The notion of tree width was introduced by the graph theorists Robertson and Seymour (1986). Dechter and Pearl (1987, 1989), building on the work of Freuder, applied a related notion (which they called induced width) to constraint satisfaction problems and developed the tree decomposition approach sketched in Section 6.5. Drawing on this work and on results
  • 249. 230 Chapter 6. Constraint Satisfaction Problems from database theory, Gottlob et al. (1999a, 1999b) developed a notion, hypertree width, that is based on the characterization of the CSP as a hypergraph. In addition to showing that any CSP with hypertree width w can be solved in time O(nw+1 log n), they also showed that hypertree width subsumes all previously defined measures of “width” in the sense that there are cases where the hypertree width is bounded and the other measures are unbounded. Interest in look-back approaches to backtracking was rekindled by the work of Bayardo and Schrag (1997), whose RELSAT algorithm combined constraint learning and backjumping and was shown to outperform many other algorithms of the time. This led to AND/OR search algorithms applicable to both CSPs and probabilistic reasoning (Dechter and Ma- teescu, 2007). Brown et al. (1988) introduce the idea of symmetry breaking in CSPs, and Gent et al. (2006) give a recent survey. The field of distributed constraint satisfaction looks at solving CSPs when there is a DISTRIBUTED CONSTRAINT SATISFACTION collection of agents, each of which controls a subset of the constraint variables. There have been annual workshops on this problem since 2000, and good coverage elsewhere (Collin et al., 1999; Pearce et al., 2008; Shoham and Leyton-Brown, 2009). Comparing CSP algorithms is mostly an empirical science: few theoretical results show that one algorithm dominates another on all problems; instead, we need to run experiments to see which algorithms perform better on typical instances of problems. As Hooker (1995) points out, we need to be careful to distinguish between competitive testing—as occurs in competitions among algorithms based on run time—and scientific testing, whose goal is to identify the properties of an algorithm that determine its efficacy on a class of problems. The recent textbooks by Apt (2003) and Dechter (2003), and the collection by Rossi et al. (2006) are excellent resources on constraint processing. There are several good earlier surveys, including those by Kumar (1992), Dechter and Frost (2002), and Bartak (2001); and the encyclopedia articles by Dechter (1992) and Mackworth (1992). Pearson and Jeavons (1997) survey tractable classes of CSPs, covering both structural decomposition methods and methods that rely on properties of the domains or constraints themselves. Kondrak and van Beek (1997) give an analytical survey of backtracking search algorithms, and Bacchus and van Run (1995) give a more empirical survey. Constraint programming is covered in the books by Apt (2003) and Fruhwirth and Abdennadher (2003). Several interesting applications are described in the collection edited by Freuder and Mackworth (1994). Papers on constraint satisfaction appear regularly in Artificial Intelligence and in the specialist journal Constraints. The primary conference venue is the International Conference on Principles and Practice of Constraint Programming, often called CP. EXERCISES 6.1 How many solutions are there for the map-coloring problem in Figure 6.1? How many solutions if four colors are allowed? Two colors?
  • 250. Exercises 231 6.2 Consider the problem of constructing (not solving) crossword puzzles:5 fitting words into a rectangular grid. The grid, which is given as part of the problem, specifies which squares are blank and which are shaded. Assume that a list of words (i.e., a dictionary) is provided and that the task is to fill in the blank squares by using any subset of the list. Formulate this problem precisely in two ways: a. As a general search problem. Choose an appropriate search algorithm and specify a heuristic function. Is it better to fill in blanks one letter at a time or one word at a time? b. As a constraint satisfaction problem. Should the variables be words or letters? Which formulation do you think will be better? Why? 6.3 Give precise formulations for each of the following as constraint satisfaction problems: a. Rectilinear floor-planning: find non-overlapping places in a large rectangle for a number of smaller rectangles. b. Class scheduling: There is a fixed number of professors and classrooms, a list of classes to be offered, and a list of possible time slots for classes. Each professor has a set of classes that he or she can teach. c. Hamiltonian tour: given a network of cities connected by roads, choose an order to visit all cities in a country without repeating any. 6.4 Solve the cryptarithmetic problem in Figure 6.2 by hand, using the strategy of back- tracking with forward checking and the MRV and least-constraining-value heuristics. 6.5 Show how a single ternary constraint such as “A + B = C” can be turned into three binary constraints by using an auxiliary variable. You may assume finite domains. (Hint: Consider a new variable that takes on values that are pairs of other values, and consider constraints such as “X is the first element of the pair Y .”) Next, show how constraints with more than three variables can be treated similarly. Finally, show how unary constraints can be eliminated by altering the domains of variables. This completes the demonstration that any CSP can be transformed into a CSP with only binary constraints. 6.6 Consider the following logic puzzle: In five houses, each with a different color, live five persons of different nationalities, each of whom prefers a different brand of candy, a different drink, and a different pet. Given the following facts, the questions to answer are “Where does the zebra live, and in which house do they drink water?” The Englishman lives in the red house. The Spaniard owns the dog. The Norwegian lives in the first house on the left. The green house is immediately to the right of the ivory house. The man who eats Hershey bars lives in the house next to the man with the fox. Kit Kats are eaten in the yellow house. The Norwegian lives next to the blue house. 5 Ginsberg et al. (1990) discuss several methods for constructing crossword puzzles. Littman et al. (1999) tackle the harder problem of solving them.
  • 251. 232 Chapter 6. Constraint Satisfaction Problems The Smarties eater owns snails. The Snickers eater drinks orange juice. The Ukrainian drinks tea. The Japanese eats Milky Ways. Kit Kats are eaten in a house next to the house where the horse is kept. Coffee is drunk in the green house. Milk is drunk in the middle house. Discuss different representations of this problem as a CSP. Why would one prefer one repre- sentation over another? 6.7 Consider the graph with 8 nodes A1, A2, A3, A4, H, T, F1, F2. Ai is connected to Ai+1 for all i, each Ai is connected to H, H is connected to T, and T is connected to each Fi. Find a 3-coloring of this graph by hand using the following strategy: backtracking with conflict-directed backjumping, the variable order A1, H, A4, F1, A2, F2, A3, T, and the value order R, G, B. 6.8 Explain why it is a good heuristic to choose the variable that is most constrained but the value that is least constraining in a CSP search. 6.9 Generate random instances of map-coloring problems as follows: scatter n points on the unit square; select a point X at random, connect X by a straight line to the nearest point Y such that X is not already connected to Y and the line crosses no other line; repeat the previous step until no more connections are possible. The points represent regions on the map and the lines connect neighbors. Now try to find k-colorings of each map, for both k = 3 and k = 4, using min-conflicts, backtracking, backtracking with forward checking, and backtracking with MAC. Construct a table of average run times for each algorithm for values of n up to the largest you can manage. Comment on your results. 6.10 Use the AC-3 algorithm to show that arc consistency can detect the inconsistency of the partial assignment {WA = red, V = blue} for the problem shown in Figure 6.1. 6.11 What is the worst-case complexity of running AC-3 on a tree-structured CSP? 6.12 AC-3 puts back on the queue every arc (Xk, Xi) whenever any value is deleted from the domain of Xi, even if each value of Xk is consistent with several remaining values of Xi. Suppose that, for every arc (Xk, Xi), we keep track of the number of remaining values of Xi that are consistent with each value of Xk. Explain how to update these numbers efficiently and hence show that arc consistency can be enforced in total time O(n2d2). 6.13 The TREE-CSP-SOLVER (Figure 6.10) makes arcs consistent starting at the leaves and working backwards towards the root. Why does it do that? What would happen if it went in the opposite direction? 6.14 We introduced Sudoku as a CSP to be solved by search over partial assignments be- cause that is the way people generally undertake solving Sudoku problems. It is also possible, of course, to attack these problems with local search over complete assignments. How well would a local solver using the min-conflicts heuristic do on Sudoku problems?
  • 252. Exercises 233 6.15 Define in your own words the terms constraint, commutativity, arc consistency, back- jumping, min-conflicts, and cycle cutset. 6.16 Suppose that a graph is known to have a cycle cutset of no more than k nodes. Describe a simple algorithm for finding a minimal cycle cutset whose run time is not much more than O(nk) for a CSP with n variables. Search the literature for methods for finding approximately minimal cycle cutsets in time that is polynomial in the size of the cutset. Does the existence of such algorithms make the cycle cutset method practical? 6.17 Consider the problem of tiling a surface (completely and exactly covering it) with n dominoes (2 × 1 rectangles). The surface is an arbitrary edge-connected (i.e., adjacent along an edge, not just a corner) collection of 2n 1×1 squares (e.g., a checkerboard, a checkerboard with some squares missing, a 10 × 1 row of squares, etc.). a. Formulate this problem precisely as a CSP where the dominoes are the variables. b. Formulate this problem precisely as a CSP where the squares are the variables, keeping the state space as small as possible. (Hint: does it matter which particular domino goes on a given pair of squares?) c. Construct a surface consisting of 6 squares such that your CSP formulation from part (b) has a tree-structured constraint graph. d. Describe exactly the set of solvable instances that have a tree-structured constraint graph.
  • 253. 7 LOGICAL AGENTS In which we design agents that can form representations of a complex world, use a process of inference to derive new representations about the world, and use these new representations to deduce what to do. Humans, it seems, know things; and what they know helps them do things. These are not empty statements. They make strong claims about how the intelligence of humans is achieved—not by purely reflex mechanisms but by processes of reasoning that operate on REASONING internal representations of knowledge. In AI, this approach to intelligence is embodied in REPRESENTATION knowledge-based agents. KNOWLEDGE-BASED AGENTS The problem-solving agents of Chapters 3 and 4 know things, but only in a very limited, inflexible sense. For example, the transition model for the 8-puzzle—knowledge of what the actions do—is hidden inside the domain-specific code of the RESULT function. It can be used to predict the outcome of actions but not to deduce that two tiles cannot occupy the same space or that states with odd parity cannot be reached from states with even parity. The atomic representations used by problem-solving agents are also very limiting. In a partially observable environment, an agent’s only choice for representing what it knows about the current state is to list all possible concrete states—a hopeless prospect in large environments. Chapter 6 introduced the idea of representing states as assignments of values to vari- ables; this is a step in the right direction, enabling some parts of the agent to work in a domain-independent way and allowing for more efficient algorithms. In this chapter and those that follow, we take this step to its logical conclusion, so to speak—we develop logic LOGIC as a general class of representations to support knowledge-based agents. Such agents can combine and recombine information to suit myriad purposes. Often, this process can be quite far removed from the needs of the moment—as when a mathematician proves a theorem or an astronomer calculates the earth’s life expectancy. Knowledge-based agents can accept new tasks in the form of explicitly described goals; they can achieve competence quickly by being told or learning new knowledge about the environment; and they can adapt to changes in the environment by updating the relevant knowledge. We begin in Section 7.1 with the overall agent design. Section 7.2 introduces a sim- ple new environment, the wumpus world, and illustrates the operation of a knowledge-based agent without going into any technical detail. Then we explain the general principles of logic 234
  • 254. Section 7.1. Knowledge-Based Agents 235 in Section 7.3 and the specifics of propositional logic in Section 7.4. While less expressive than first-order logic (Chapter 8), propositional logic illustrates all the basic concepts of logic; it also comes with well-developed inference technologies, which we describe in sec- tions 7.5 and 7.6. Finally, Section 7.7 combines the concept of knowledge-based agents with the technology of propositional logic to build some simple agents for the wumpus world. 7.1 KNOWLEDGE-BASED AGENTS The central component of a knowledge-based agent is its knowledge base, or KB. A knowl- KNOWLEDGE BASE edge base is a set of sentences. (Here “sentence” is used as a technical term. It is related SENTENCE but not identical to the sentences of English and other natural languages.) Each sentence is expressed in a language called a knowledge representation language and represents some KNOWLEDGE REPRESENTATION LANGUAGE assertion about the world. Sometimes we dignify a sentence with the name axiom, when the AXIOM sentence is taken as given without being derived from other sentences. There must be a way to add new sentences to the knowledge base and a way to query what is known. The standard names for these operations are TELL and ASK, respectively. Both operations may involve inference—that is, deriving new sentences from old. Inference INFERENCE must obey the requirement that when one ASKs a question of the knowledge base, the answer should follow from what has been told (or TELLed) to the knowledge base previously. Later in this chapter, we will be more precise about the crucial word “follow.” For now, take it to mean that the inference process should not make things up as it goes along. Figure 7.1 shows the outline of a knowledge-based agent program. Like all our agents, it takes a percept as input and returns an action. The agent maintains a knowledge base, KB, which may initially contain some background knowledge. BACKGROUND KNOWLEDGE Each time the agent program is called, it does three things. First, it TELLs the knowl- edge base what it perceives. Second, it ASKs the knowledge base what action it should perform. In the process of answering this query, extensive reasoning may be done about the current state of the world, about the outcomes of possible action sequences, and so on. Third, the agent program TELLs the knowledge base which action was chosen, and the agent executes the action. The details of the representation language are hidden inside three functions that imple- ment the interface between the sensors and actuators on one side and the core representation and reasoning system on the other. MAKE-PERCEPT-SENTENCE constructs a sentence as- serting that the agent perceived the given percept at the given time. MAKE-ACTION-QUERY constructs a sentence that asks what action should be done at the current time. Finally, MAKE-ACTION-SENTENCE constructs a sentence asserting that the chosen action was ex- ecuted. The details of the inference mechanisms are hidden inside TELL and ASK. Later sections will reveal these details. The agent in Figure 7.1 appears quite similar to the agents with internal state described in Chapter 2. Because of the definitions of TELL and ASK, however, the knowledge-based agent is not an arbitrary program for calculating actions. It is amenable to a description at
the knowledge level, where we need specify only what the agent knows and what its goals are, in order to fix its behavior. For example, an automated taxi might have the goal of taking a passenger from San Francisco to Marin County and might know that the Golden Gate Bridge is the only link between the two locations. Then we can expect it to cross the Golden Gate Bridge because it knows that that will achieve its goal. Notice that this analysis is independent of how the taxi works at the implementation level. It doesn't matter whether its geographical knowledge is implemented as linked lists or pixel maps, or whether it reasons by manipulating strings of symbols stored in registers or by propagating noisy signals in a network of neurons.

function KB-AGENT(percept) returns an action
  persistent: KB, a knowledge base
              t, a counter, initially 0, indicating time

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  action ← ASK(KB, MAKE-ACTION-QUERY(t))
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

Figure 7.1 A generic knowledge-based agent. Given a percept, the agent adds the percept to its knowledge base, asks the knowledge base for the best action, and tells the knowledge base that it has in fact taken that action.

A knowledge-based agent can be built simply by TELLing it what it needs to know. Starting with an empty knowledge base, the agent designer can TELL sentences one by one until the agent knows how to operate in its environment. This is called the declarative approach to system building. In contrast, the procedural approach encodes desired behaviors directly as program code. In the 1970s and 1980s, advocates of the two approaches engaged in heated debates. We now understand that a successful agent often combines both declarative and procedural elements in its design, and that declarative knowledge can often be compiled into more efficient procedural code.

We can also provide a knowledge-based agent with mechanisms that allow it to learn for itself. These mechanisms, which are discussed in Chapter 18, create general knowledge about the environment from a series of percepts. A learning agent can be fully autonomous.

7.2 THE WUMPUS WORLD

In this section we describe an environment in which knowledge-based agents can show their worth. The wumpus world is a cave consisting of rooms connected by passageways. Lurking somewhere in the cave is the terrible wumpus, a beast that eats anyone who enters its room. The wumpus can be shot by an agent, but the agent has only one arrow. Some rooms contain
  • 256. Section 7.2. The Wumpus World 237 bottomless pits that will trap anyone who wanders into these rooms (except for the wumpus, which is too big to fall in). The only mitigating feature of this bleak environment is the possibility of finding a heap of gold. Although the wumpus world is rather tame by modern computer game standards, it illustrates some important points about intelligence. A sample wumpus world is shown in Figure 7.2. The precise definition of the task environment is given, as suggested in Section 2.3, by the PEAS description: • Performance measure: +1000 for climbing out of the cave with the gold, –1000 for falling into a pit or being eaten by the wumpus, –1 for each action taken and –10 for using up the arrow. The game ends either when the agent dies or when the agent climbs out of the cave. • Environment: A 4 × 4 grid of rooms. The agent always starts in the square labeled [1,1], facing to the right. The locations of the gold and the wumpus are chosen ran- domly, with a uniform distribution, from the squares other than the start square. In addition, each square other than the start can be a pit, with probability 0.2. • Actuators: The agent can move Forward, TurnLeft by 90◦, or TurnRight by 90◦. The agent dies a miserable death if it enters a square containing a pit or a live wumpus. (It is safe, albeit smelly, to enter a square with a dead wumpus.) If an agent tries to move forward and bumps into a wall, then the agent does not move. The action Grab can be used to pick up the gold if it is in the same square as the agent. The action Shoot can be used to fire an arrow in a straight line in the direction the agent is facing. The arrow continues until it either hits (and hence kills) the wumpus or hits a wall. The agent has only one arrow, so only the first Shoot action has any effect. Finally, the action Climb can be used to climb out of the cave, but only from square [1,1]. • Sensors: The agent has five sensors, each of which gives a single bit of information: – In the square containing the wumpus and in the directly (not diagonally) adjacent squares, the agent will perceive a Stench. – In the squares directly adjacent to a pit, the agent will perceive a Breeze. – In the square where the gold is, the agent will perceive a Glitter. – When an agent walks into a wall, it will perceive a Bump. – When the wumpus is killed, it emits a woeful Scream that can be perceived any- where in the cave. The percepts will be given to the agent program in the form of a list of five symbols; for example, if there is a stench and a breeze, but no glitter, bump, or scream, the agent program will get [Stench, Breeze, None, None, None]. We can characterize the wumpus environment along the various dimensions given in Chap- ter 2. Clearly, it is discrete, static, and single-agent. (The wumpus doesn’t move, fortunately.) It is sequential, because rewards may come only after many actions are taken. It is partially observable, because some aspects of the state are not directly perceivable: the agent’s lo- cation, the wumpus’s state of health, and the availability of an arrow. As for the locations of the pits and the wumpus: we could treat them as unobserved parts of the state that hap- pen to be immutable—in which case, the transition model for the environment is completely
known; or we could say that the transition model itself is unknown because the agent doesn't know which Forward actions are fatal—in which case, discovering the locations of pits and wumpus completes the agent's knowledge of the transition model.

For an agent in the environment, the main challenge is its initial ignorance of the configuration of the environment; overcoming this ignorance seems to require logical reasoning. In most instances of the wumpus world, it is possible for the agent to retrieve the gold safely. Occasionally, the agent must choose between going home empty-handed and risking death to find the gold. About 21% of the environments are utterly unfair, because the gold is in a pit or surrounded by pits.

Figure 7.2 A typical wumpus world. The agent is in the bottom left corner, facing right.

Let us watch a knowledge-based wumpus agent exploring the environment shown in Figure 7.2. We use an informal knowledge representation language consisting of writing down symbols in a grid (as in Figures 7.3 and 7.4). The agent's initial knowledge base contains the rules of the environment, as described previously; in particular, it knows that it is in [1,1] and that [1,1] is a safe square; we denote that with an "A" and "OK," respectively, in square [1,1].

The first percept is [None, None, None, None, None], from which the agent can conclude that its neighboring squares, [1,2] and [2,1], are free of dangers—they are OK. Figure 7.3(a) shows the agent's state of knowledge at this point.

A cautious agent will move only into a square that it knows to be OK. Let us suppose the agent decides to move forward to [2,1]. The agent perceives a breeze (denoted by "B") in [2,1], so there must be a pit in a neighboring square. The pit cannot be in [1,1], by the rules of the game, so there must be a pit in [2,2] or [3,1] or both. The notation "P?" in Figure 7.3(b) indicates a possible pit in those squares. At this point, there is only one known square that is OK and that has not yet been visited. So the prudent agent will turn around, go back to [1,1], and then proceed to [1,2].

The agent perceives a stench in [1,2], resulting in the state of knowledge shown in Figure 7.4(a). The stench in [1,2] means that there must be a wumpus nearby. But the
wumpus cannot be in [1,1], by the rules of the game, and it cannot be in [2,2] (or the agent would have detected a stench when it was in [2,1]). Therefore, the agent can infer that the wumpus is in [1,3]. The notation W! indicates this inference. Moreover, the lack of a breeze in [1,2] implies that there is no pit in [2,2]. Yet the agent has already inferred that there must be a pit in either [2,2] or [3,1], so this means it must be in [3,1]. This is a fairly difficult inference, because it combines knowledge gained at different times in different places and relies on the lack of a percept to make one crucial step.

Figure 7.3 The first step taken by the agent in the wumpus world. (a) The initial situation, after percept [None, None, None, None, None]. (b) After one move, with percept [None, Breeze, None, None, None].

Figure 7.4 Two later stages in the progress of the agent. (a) After the third move, with percept [Stench, None, None, None, None]. (b) After the fifth move, with percept [Stench, Breeze, Glitter, None, None].

(In both figures, A = Agent, B = Breeze, G = Glitter/Gold, P = Pit, S = Stench, W = Wumpus, OK = safe square, V = visited.)
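Each step of this walkthrough starts from a percept, and every percept is fully determined by the configuration of the cave. The sketch below is an illustration only (not code from the book); it hard-codes pit, wumpus, and gold locations assumed to match the example world of Figure 7.2 and computes the five-element percept list for a given square.

# Assumed layout matching the example world of Figure 7.2 (squares are (column, row)).
PITS = {(3, 1), (3, 3), (4, 4)}
WUMPUS = (1, 3)
GOLD = (2, 3)

def adjacent(square):
    """Squares directly (not diagonally) adjacent and inside the 4 x 4 grid."""
    x, y = square
    return {(x + dx, y + dy)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 1 <= x + dx <= 4 and 1 <= y + dy <= 4}

def percept(square, bump=False, scream=False):
    """Return the [Stench, Breeze, Glitter, Bump, Scream] list for a square."""
    stench = "Stench" if square == WUMPUS or WUMPUS in adjacent(square) else None
    breeze = "Breeze" if PITS & adjacent(square) else None
    glitter = "Glitter" if square == GOLD else None
    return [stench, breeze, glitter,
            "Bump" if bump else None,
            "Scream" if scream else None]

print(percept((1, 1)))  # [None, None, None, None, None]
print(percept((2, 1)))  # [None, 'Breeze', None, None, None]
print(percept((1, 2)))  # ['Stench', None, None, None, None]

Under these assumptions it reproduces the empty percept in [1,1], the breeze in [2,1], and the stench in [1,2] used in the reasoning above.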
  • 259. 240 Chapter 7. Logical Agents The agent has now proved to itself that there is neither a pit nor a wumpus in [2,2], so it is OK to move there. We do not show the agent’s state of knowledge at [2,2]; we just assume that the agent turns and moves to [2,3], giving us Figure 7.4(b). In [2,3], the agent detects a glitter, so it should grab the gold and then return home. Note that in each case for which the agent draws a conclusion from the available in- formation, that conclusion is guaranteed to be correct if the available information is correct. This is a fundamental property of logical reasoning. In the rest of this chapter, we describe how to build logical agents that can represent information and draw conclusions such as those described in the preceding paragraphs. 7.3 LOGIC This section summarizes the fundamental concepts of logical representation and reasoning. These beautiful ideas are independent of any of logic’s particular forms. We therefore post- pone the technical details of those forms until the next section, using instead the familiar example of ordinary arithmetic. In Section 7.1, we said that knowledge bases consist of sentences. These sentences are expressed according to the syntax of the representation language, which specifies all the SYNTAX sentences that are well formed. The notion of syntax is clear enough in ordinary arithmetic: “x + y = 4” is a well-formed sentence, whereas “x4y+ =” is not. A logic must also define the semantics or meaning of sentences. The semantics defines SEMANTICS the truth of each sentence with respect to each possible world. For example, the semantics TRUTH POSSIBLE WORLD for arithmetic specifies that the sentence “x + y = 4” is true in a world where x is 2 and y is 2, but false in a world where x is 1 and y is 1. In standard logics, every sentence must be either true or false in each possible world—there is no “in between.”1 When we need to be precise, we use the term model in place of “possible world.” MODEL Whereas possible worlds might be thought of as (potentially) real environments that the agent might or might not be in, models are mathematical abstractions, each of which simply fixes the truth or falsehood of every relevant sentence. Informally, we may think of a possible world as, for example, having x men and y women sitting at a table playing bridge, and the sentence x + y = 4 is true when there are four people in total. Formally, the possible models are just all possible assignments of real numbers to the variables x and y. Each such assignment fixes the truth of any sentence of arithmetic whose variables are x and y. If a sentence α is true in model m, we say that m satisfies α or sometimes m is a model of α. We use the notation SATISFACTION M(α) to mean the set of all models of α. Now that we have a notion of truth, we are ready to talk about logical reasoning. This involves the relation of logical entailment between sentences—the idea that a sentence fol- ENTAILMENT lows logically from another sentence. In mathematical notation, we write α |= β 1 Fuzzy logic, discussed in Chapter 14, allows for degrees of truth.
to mean that the sentence α entails the sentence β. The formal definition of entailment is this: α |= β if and only if, in every model in which α is true, β is also true. Using the notation just introduced, we can write

α |= β if and only if M(α) ⊆ M(β) .

(Note the direction of the ⊆ here: if α |= β, then α is a stronger assertion than β: it rules out more possible worlds.) The relation of entailment is familiar from arithmetic; we are happy with the idea that the sentence x = 0 entails the sentence xy = 0. Obviously, in any model where x is zero, it is the case that xy is zero (regardless of the value of y).

We can apply the same kind of analysis to the wumpus-world reasoning example given in the preceding section. Consider the situation in Figure 7.3(b): the agent has detected nothing in [1,1] and a breeze in [2,1]. These percepts, combined with the agent's knowledge of the rules of the wumpus world, constitute the KB. The agent is interested (among other things) in whether the adjacent squares [1,2], [2,2], and [3,1] contain pits. Each of the three squares might or might not contain a pit, so (for the purposes of this example) there are 2^3 = 8 possible models. These eight models are shown in Figure 7.5.²

Figure 7.5 Possible models for the presence of pits in squares [1,2], [2,2], and [3,1]. The KB corresponding to the observations of nothing in [1,1] and a breeze in [2,1] is shown by the solid line. (a) Dotted line shows models of α1 (no pit in [1,2]). (b) Dotted line shows models of α2 (no pit in [2,2]).

The KB can be thought of as a set of sentences or as a single sentence that asserts all the individual sentences. The KB is false in models that contradict what the agent knows—for example, the KB is false in any model in which [1,2] contains a pit, because there is no breeze in [1,1]. There are in fact just three models in which the KB is true, and these are

² Although the figure shows the models as partial wumpus worlds, they are really nothing more than assignments of true and false to the sentences "there is a pit in [1,2]" etc. Models, in the mathematical sense, do not need to have 'orrible 'airy wumpuses in them.
shown surrounded by a solid line in Figure 7.5. Now let us consider two possible conclusions:

α1 = "There is no pit in [1,2]."
α2 = "There is no pit in [2,2]."

We have surrounded the models of α1 and α2 with dotted lines in Figures 7.5(a) and 7.5(b), respectively. By inspection, we see the following: in every model in which KB is true, α1 is also true. Hence, KB |= α1: there is no pit in [1,2]. We can also see that in some models in which KB is true, α2 is false. Hence, KB ⊭ α2: the agent cannot conclude that there is no pit in [2,2]. (Nor can it conclude that there is a pit in [2,2].)³

The preceding example not only illustrates entailment but also shows how the definition of entailment can be applied to derive conclusions—that is, to carry out logical inference. The inference algorithm illustrated in Figure 7.5 is called model checking, because it enumerates all possible models to check that α is true in all models in which KB is true, that is, that M(KB) ⊆ M(α).

In understanding entailment and inference, it might help to think of the set of all consequences of KB as a haystack and of α as a needle. Entailment is like the needle being in the haystack; inference is like finding it. This distinction is embodied in some formal notation: if an inference algorithm i can derive α from KB, we write

KB ⊢i α ,

which is pronounced "α is derived from KB by i" or "i derives α from KB."

An inference algorithm that derives only entailed sentences is called sound or truth-preserving. Soundness is a highly desirable property. An unsound inference procedure essentially makes things up as it goes along—it announces the discovery of nonexistent needles. It is easy to see that model checking, when it is applicable,⁴ is a sound procedure.

The property of completeness is also desirable: an inference algorithm is complete if it can derive any sentence that is entailed. For real haystacks, which are finite in extent, it seems obvious that a systematic examination can always decide whether the needle is in the haystack. For many knowledge bases, however, the haystack of consequences is infinite, and completeness becomes an important issue.⁵ Fortunately, there are complete inference procedures for logics that are sufficiently expressive to handle many knowledge bases.

We have described a reasoning process whose conclusions are guaranteed to be true in any world in which the premises are true; in particular, if KB is true in the real world, then any sentence α derived from KB by a sound inference procedure is also true in the real world. So, while an inference process operates on "syntax"—internal physical configurations such as bits in registers or patterns of electrical blips in brains—the process corresponds

³ The agent can calculate the probability that there is a pit in [2,2]; Chapter 13 shows how.
⁴ Model checking works if the space of models is finite—for example, in wumpus worlds of fixed size. For arithmetic, on the other hand, the space of models is infinite: even if we restrict ourselves to the integers, there are infinitely many pairs of values for x and y in the sentence x + y = 4.
⁵ Compare with the case of infinite search spaces in Chapter 3, where depth-first search is not complete.
  • 262. Section 7.4. Propositional Logic: A Very Simple Logic 243 Follows Sentences Sentence Entails Semantics Semantics Representation World Aspects of the real world Aspect of the real world Figure 7.6 Sentences are physical configurations of the agent, and reasoning is a process of constructing new physical configurations from old ones. Logical reasoning should en- sure that the new configurations represent aspects of the world that actually follow from the aspects that the old configurations represent. to the real-world relationship whereby some aspect of the real world is the case6 by virtue of other aspects of the real world being the case. This correspondence between world and representation is illustrated in Figure 7.6. The final issue to consider is grounding—the connection between logical reasoning GROUNDING processes and the real environment in which the agent exists. In particular, how do we know that KB is true in the real world? (After all, KB is just “syntax” inside the agent’s head.) This is a philosophical question about which many, many books have been written. (See Chapter 26.) A simple answer is that the agent’s sensors create the connection. For example, our wumpus-world agent has a smell sensor. The agent program creates a suitable sentence whenever there is a smell. Then, whenever that sentence is in the knowledge base, it is true in the real world. Thus, the meaning and truth of percept sentences are defined by the processes of sensing and sentence construction that produce them. What about the rest of the agent’s knowledge, such as its belief that wumpuses cause smells in adjacent squares? This is not a direct representation of a single percept, but a general rule—derived, perhaps, from perceptual experience but not identical to a statement of that experience. General rules like this are produced by a sentence construction process called learning, which is the subject of Part V. Learning is fallible. It could be the case that wumpuses cause smells except on February 29 in leap years, which is when they take their baths. Thus, KB may not be true in the real world, but with good learning procedures, there is reason for optimism. 7.4 PROPOSITIONAL LOGIC: A VERY SIMPLE LOGIC We now present a simple but powerful logic called propositional logic. We cover the syntax PROPOSITIONAL LOGIC of propositional logic and its semantics—the way in which the truth of sentences is deter- mined. Then we look at entailment—the relation between a sentence and another sentence that follows from it—and see how this leads to a simple algorithm for logical inference. Ev- erything takes place, of course, in the wumpus world. 6 As Wittgenstein (1922) put it in his famous Tractatus: “The world is everything that is the case.”
  • 263. 244 Chapter 7. Logical Agents 7.4.1 Syntax The syntax of propositional logic defines the allowable sentences. The atomic sentences ATOMIC SENTENCES consist of a single proposition symbol. Each such symbol stands for a proposition that can PROPOSITION SYMBOL be true or false. We use symbols that start with an uppercase letter and may contain other letters or subscripts, for example: P, Q, R, W1,3 and North. The names are arbitrary but are often chosen to have some mnemonic value—we use W1,3 to stand for the proposition that the wumpus is in [1,3]. (Remember that symbols such as W1,3 are atomic, i.e., W, 1, and 3 are not meaningful parts of the symbol.) There are two proposition symbols with fixed meanings: True is the always-true proposition and False is the always-false proposition. Complex sentences are constructed from simpler sentences, using parentheses and logical COMPLEX SENTENCES connectives. There are five connectives in common use: LOGICAL CONNECTIVES ¬ (not). A sentence such as ¬W1,3 is called the negation of W1,3. A literal is either an NEGATION LITERAL atomic sentence (a positive literal) or a negated atomic sentence (a negative literal). ∧ (and). A sentence whose main connective is ∧, such as W1,3 ∧ P3,1, is called a con- junction; its parts are the conjuncts. (The ∧ looks like an “A” for “And.”) CONJUNCTION ∨ (or). A sentence using ∨, such as (W1,3 ∧P3,1)∨W2,2, is a disjunction of the disjuncts DISJUNCTION (W1,3 ∧ P3,1) and W2,2. (Historically, the ∨ comes from the Latin “vel,” which means “or.” For most people, it is easier to remember ∨ as an upside-down ∧.) ⇒ (implies). A sentence such as (W1,3 ∧P3,1) ⇒ ¬W2,2 is called an implication (or con- IMPLICATION ditional). Its premise or antecedent is (W1,3 ∧P3,1), and its conclusion or consequent PREMISE CONCLUSION is ¬W2,2. Implications are also known as rules or if–then statements. The implication RULES symbol is sometimes written in other books as ⊃ or →. ⇔ (if and only if). The sentence W1,3 ⇔ ¬W2,2 is a biconditional. Some other books BICONDITIONAL write this as ≡. Sentence → AtomicSentence | ComplexSentence AtomicSentence → True | False | P | Q | R | . . . ComplexSentence → ( Sentence ) | [ Sentence ] | ¬ Sentence | Sentence ∧ Sentence | Sentence ∨ Sentence | Sentence ⇒ Sentence | Sentence ⇔ Sentence OPERATOR PRECEDENCE : ¬, ∧, ∨, ⇒, ⇔ Figure 7.7 A BNF (Backus–Naur Form) grammar of sentences in propositional logic, along with operator precedences, from highest to lowest.
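The grammar of Figure 7.7 maps directly onto a simple data structure. The following sketch shows one possible encoding in Python (an assumption for illustration, not the book's code): proposition symbols are strings, and complex sentences are tuples whose first element is the connective.

# Proposition symbols are strings; complex sentences are tuples
# (connective, operand, ...) mirroring the grammar of Figure 7.7.
NOT, AND, OR, IMPLIES, IFF = "¬", "∧", "∨", "⇒", "⇔"

# (W1,3 ∧ P3,1) ⇒ ¬W2,2
sentence = (IMPLIES, (AND, "W1,3", "P3,1"), (NOT, "W2,2"))

def symbols_in(s):
    """Collect the proposition symbols appearing in sentence s."""
    if isinstance(s, str):
        return {s}
    return set().union(*(symbols_in(operand) for operand in s[1:]))

print(symbols_in(sentence))  # {'W1,3', 'P3,1', 'W2,2'} (set order may vary)

Because the nesting of tuples records the structure explicitly, no operator precedence is needed once a sentence has been built; precedence only matters when parsing the written form.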
  • 264. Section 7.4. Propositional Logic: A Very Simple Logic 245 Figure 7.7 gives a formal grammar of propositional logic; see page 1066 if you are not familiar with the BNF notation. The BNF grammar by itself is ambiguous; a sentence with several operators can be parsed by the grammar in multiple ways. To eliminate the ambiguity we define a precedence for each operator. The “not” operator (¬) has the highest precedence, which means that in the sentence ¬A ∧ B the ¬ binds most tightly, giving us the equivalent of (¬A)∧B rather than ¬(A∧B). (The notation for ordinary arithmetic is the same: −2+4 is 2, not –6.) When in doubt, use parentheses to make sure of the right interpretation. Square brackets mean the same thing as parentheses; the choice of square brackets or parentheses is solely to make it easier for a human to read a sentence. 7.4.2 Semantics Having specified the syntax of propositional logic, we now specify its semantics. The se- mantics defines the rules for determining the truth of a sentence with respect to a particular model. In propositional logic, a model simply fixes the truth value—true or false—for ev- TRUTH VALUE ery proposition symbol. For example, if the sentences in the knowledge base make use of the proposition symbols P1,2, P2,2, and P3,1, then one possible model is m1 = {P1,2 = false, P2,2 = false, P3,1 = true} . With three proposition symbols, there are 23 = 8 possible models—exactly those depicted in Figure 7.5. Notice, however, that the models are purely mathematical objects with no necessary connection to wumpus worlds. P1,2 is just a symbol; it might mean “there is a pit in [1,2]” or “I’m in Paris today and tomorrow.” The semantics for propositional logic must specify how to compute the truth value of any sentence, given a model. This is done recursively. All sentences are constructed from atomic sentences and the five connectives; therefore, we need to specify how to compute the truth of atomic sentences and how to compute the truth of sentences formed with each of the five connectives. Atomic sentences are easy: • True is true in every model and False is false in every model. • The truth value of every other proposition symbol must be specified directly in the model. For example, in the model m1 given earlier, P1,2 is false. For complex sentences, we have five rules, which hold for any subsentences P and Q in any model m (here “iff” means “if and only if”): • ¬P is true iff P is false in m. • P ∧ Q is true iff both P and Q are true in m. • P ∨ Q is true iff either P or Q is true in m. • P ⇒ Q is true unless P is true and Q is false in m. • P ⇔ Q is true iff P and Q are both true or both false in m. The rules can also be expressed with truth tables that specify the truth value of a complex TRUTH TABLE sentence for each possible assignment of truth values to its components. Truth tables for the five connectives are given in Figure 7.8. From these tables, the truth value of any sentence s can be computed with respect to any model m by a simple recursive evaluation. For example,
the sentence ¬P1,2 ∧ (P2,2 ∨ P3,1), evaluated in m1, gives true ∧ (false ∨ true) = true ∧ true = true. Exercise 7.3 asks you to write the algorithm PL-TRUE?(s, m), which computes the truth value of a propositional logic sentence s in a model m.

P      Q      ¬P     P ∧ Q   P ∨ Q   P ⇒ Q   P ⇔ Q
false  false  true   false   false   true    true
false  true   true   false   true    true    false
true   false  false  false   true    false   false
true   true   false  true    true    true    true

Figure 7.8 Truth tables for the five logical connectives. To use the table to compute, for example, the value of P ∨ Q when P is true and Q is false, first look on the left for the row where P is true and Q is false (the third row). Then look in that row under the P ∨ Q column to see the result: true.

The truth tables for "and," "or," and "not" are in close accord with our intuitions about the English words. The main point of possible confusion is that P ∨ Q is true when P is true or Q is true or both. A different connective, called "exclusive or" ("xor" for short), yields false when both disjuncts are true.⁷ There is no consensus on the symbol for exclusive or; some choices are a dotted ∨, ≠, or ⊕.

⁷ Latin has a separate word, aut, for exclusive or.

The truth table for ⇒ may not quite fit one's intuitive understanding of "P implies Q" or "if P then Q." For one thing, propositional logic does not require any relation of causation or relevance between P and Q. The sentence "5 is odd implies Tokyo is the capital of Japan" is a true sentence of propositional logic (under the normal interpretation), even though it is a decidedly odd sentence of English. Another point of confusion is that any implication is true whenever its antecedent is false. For example, "5 is even implies Sam is smart" is true, regardless of whether Sam is smart. This seems bizarre, but it makes sense if you think of "P ⇒ Q" as saying, "If P is true, then I am claiming that Q is true. Otherwise I am making no claim." The only way for this sentence to be false is if P is true but Q is false.

The biconditional, P ⇔ Q, is true whenever both P ⇒ Q and Q ⇒ P are true. In English, this is often written as "P if and only if Q." Many of the rules of the wumpus world are best written using ⇔. For example, a square is breezy if a neighboring square has a pit, and a square is breezy only if a neighboring square has a pit. So we need a biconditional,

B1,1 ⇔ (P1,2 ∨ P2,1) ,

where B1,1 means that there is a breeze in [1,1].

7.4.3 A simple knowledge base

Now that we have defined the semantics for propositional logic, we can construct a knowledge base for the wumpus world. We focus first on the immutable aspects of the wumpus world, leaving the mutable aspects for a later section. For now, we need the following symbols for each [x, y] location:
Px,y is true if there is a pit in [x, y].
Wx,y is true if there is a wumpus in [x, y], dead or alive.
Bx,y is true if the agent perceives a breeze in [x, y].
Sx,y is true if the agent perceives a stench in [x, y].

The sentences we write will suffice to derive ¬P1,2 (there is no pit in [1,2]), as was done informally in Section 7.3. We label each sentence Ri so that we can refer to them:

• There is no pit in [1,1]:
  R1 : ¬P1,1 .
• A square is breezy if and only if there is a pit in a neighboring square. This has to be stated for each square; for now, we include just the relevant squares:
  R2 : B1,1 ⇔ (P1,2 ∨ P2,1) .
  R3 : B2,1 ⇔ (P1,1 ∨ P2,2 ∨ P3,1) .
• The preceding sentences are true in all wumpus worlds. Now we include the breeze percepts for the first two squares visited in the specific world the agent is in, leading up to the situation in Figure 7.3(b).
  R4 : ¬B1,1 .
  R5 : B2,1 .

7.4.4 A simple inference procedure

Our goal now is to decide whether KB |= α for some sentence α. For example, is ¬P1,2 entailed by our KB? Our first algorithm for inference is a model-checking approach that is a direct implementation of the definition of entailment: enumerate the models, and check that α is true in every model in which KB is true. Models are assignments of true or false to every proposition symbol. Returning to our wumpus-world example, the relevant proposition symbols are B1,1, B2,1, P1,1, P1,2, P2,1, P2,2, and P3,1. With seven symbols, there are 2^7 = 128 possible models; in three of these, KB is true (Figure 7.9). In those three models, ¬P1,2 is true, hence there is no pit in [1,2]. On the other hand, P2,2 is true in two of the three models and false in one, so we cannot yet tell whether there is a pit in [2,2].

Figure 7.9 reproduces in a more precise form the reasoning illustrated in Figure 7.5. A general algorithm for deciding entailment in propositional logic is shown in Figure 7.10. Like the BACKTRACKING-SEARCH algorithm on page 215, TT-ENTAILS? performs a recursive enumeration of a finite space of assignments to symbols. The algorithm is sound because it implements directly the definition of entailment, and complete because it works for any KB and α and always terminates—there are only finitely many models to examine.

Of course, "finitely many" is not always the same as "few." If KB and α contain n symbols in all, then there are 2^n models. Thus, the time complexity of the algorithm is O(2^n). (The space complexity is only O(n) because the enumeration is depth-first.) Later in this chapter we show algorithms that are much more efficient in many cases. Unfortunately, propositional entailment is co-NP-complete (i.e., probably no easier than NP-complete—see Appendix A), so every known inference algorithm for propositional logic has a worst-case complexity that is exponential in the size of the input.
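The counts quoted above are easy to verify by brute force. The sketch below (illustrative Python, not the book's code) enumerates all 2^7 = 128 assignments to the seven symbols, keeps those satisfying R1 through R5, and checks the status of P1,2 and P2,2 in the surviving models.

from itertools import product

SYMBOLS = ["B1,1", "B2,1", "P1,1", "P1,2", "P2,1", "P2,2", "P3,1"]

def kb_holds(m):
    r1 = not m["P1,1"]                                        # R1: ¬P1,1
    r2 = m["B1,1"] == (m["P1,2"] or m["P2,1"])                # R2
    r3 = m["B2,1"] == (m["P1,1"] or m["P2,2"] or m["P3,1"])   # R3
    r4 = not m["B1,1"]                                        # R4: ¬B1,1
    r5 = m["B2,1"]                                            # R5: B2,1
    return r1 and r2 and r3 and r4 and r5

models = [dict(zip(SYMBOLS, values))
          for values in product([True, False], repeat=len(SYMBOLS))]
kb_models = [m for m in models if kb_holds(m)]

print(len(models))                            # 128
print(len(kb_models))                         # 3
print(all(not m["P1,2"] for m in kb_models))  # True: KB |= ¬P1,2
print({m["P2,2"] for m in kb_models})         # {True, False}: P2,2 undetermined

It reports 128 models, 3 of which satisfy the KB; ¬P1,2 holds in all 3, while P2,2 is true in some and false in others, exactly as claimed.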
B1,1   B2,1   P1,1   P1,2   P2,1   P2,2   P3,1   R1     R2     R3     R4     R5     KB
false  false  false  false  false  false  false  true   true   true   true   false  false
false  false  false  false  false  false  true   true   true   false  true   false  false
...
false  true   false  false  false  false  false  true   true   false  true   true   false
false  true   false  false  false  false  true   true   true   true   true   true   true
false  true   false  false  false  true   false  true   true   true   true   true   true
false  true   false  false  false  true   true   true   true   true   true   true   true
false  true   false  false  true   false  false  true   false  false  true   true   false
...
true   true   true   true   true   true   true   false  true   true   false  true   false

Figure 7.9 A truth table constructed for the knowledge base given in the text. KB is true if R1 through R5 are true, which occurs in just 3 of the 128 rows (the ones with true in the final column). In all 3 rows, P1,2 is false, so there is no pit in [1,2]. On the other hand, there might (or might not) be a pit in [2,2].

function TT-ENTAILS?(KB, α) returns true or false
  inputs: KB, the knowledge base, a sentence in propositional logic
          α, the query, a sentence in propositional logic
  symbols ← a list of the proposition symbols in KB and α
  return TT-CHECK-ALL(KB, α, symbols, { })

function TT-CHECK-ALL(KB, α, symbols, model) returns true or false
  if EMPTY?(symbols) then
    if PL-TRUE?(KB, model) then return PL-TRUE?(α, model)
    else return true    // when KB is false, always return true
  else do
    P ← FIRST(symbols); rest ← REST(symbols)
    return (TT-CHECK-ALL(KB, α, rest, model ∪ {P = true})
            and TT-CHECK-ALL(KB, α, rest, model ∪ {P = false}))

Figure 7.10 A truth-table enumeration algorithm for deciding propositional entailment. (TT stands for truth table.) PL-TRUE? returns true if a sentence holds within a model. The variable model represents a partial model—an assignment to some of the symbols. The keyword "and" is used here as a logical operation on its two arguments, returning true or false.
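Figure 7.10 translates almost line for line into Python. The sketch below is ours, not the book's; it reuses pl_true and the tuple-sentence convention from the earlier sketch, flattens the recursion of TT-CHECK-ALL into an explicit enumeration with itertools.product, and writes out R1 through R5 by hand.

import itertools

def symbols_in(s, acc=None):
    """Collect the proposition symbols that appear in a sentence."""
    acc = set() if acc is None else acc
    if isinstance(s, str):
        acc.add(s)
    else:
        for arg in s[1:]:
            symbols_in(arg, acc)
    return acc

def tt_entails(kb, alpha):
    """KB |= alpha iff alpha is true in every model in which KB is true."""
    syms = sorted(symbols_in(kb) | symbols_in(alpha))
    for values in itertools.product([True, False], repeat=len(syms)):
        model = dict(zip(syms, values))
        if pl_true(kb, model) and not pl_true(alpha, model):
            return False            # found a model of KB in which alpha fails
    return True

# The wumpus knowledge base R1 ∧ ... ∧ R5 from the previous page:
KB = ('and', ('not', 'P11'),                                # R1
             ('iff', 'B11', ('or', 'P12', 'P21')),          # R2
             ('iff', 'B21', ('or', 'P11', 'P22', 'P31')),   # R3
             ('not', 'B11'),                                 # R4
             'B21')                                          # R5
print(tt_entails(KB, ('not', 'P12')))                        # True: no pit in [1,2]
print(tt_entails(KB, 'P22'), tt_entails(KB, ('not', 'P22'))) # False False: still unknown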
  • 268. Section 7.5. Propositional Theorem Proving 249 (α ∧ β) ≡ (β ∧ α) commutativity of ∧ (α ∨ β) ≡ (β ∨ α) commutativity of ∨ ((α ∧ β) ∧ γ) ≡ (α ∧ (β ∧ γ)) associativity of ∧ ((α ∨ β) ∨ γ) ≡ (α ∨ (β ∨ γ)) associativity of ∨ ¬(¬α) ≡ α double-negation elimination (α ⇒ β) ≡ (¬β ⇒ ¬α) contraposition (α ⇒ β) ≡ (¬α ∨ β) implication elimination (α ⇔ β) ≡ ((α ⇒ β) ∧ (β ⇒ α)) biconditional elimination ¬(α ∧ β) ≡ (¬α ∨ ¬β) De Morgan ¬(α ∨ β) ≡ (¬α ∧ ¬β) De Morgan (α ∧ (β ∨ γ)) ≡ ((α ∧ β) ∨ (α ∧ γ)) distributivity of ∧ over ∨ (α ∨ (β ∧ γ)) ≡ ((α ∨ β) ∧ (α ∨ γ)) distributivity of ∨ over ∧ Figure 7.11 Standard logical equivalences. The symbols α, β, and γ stand for arbitrary sentences of propositional logic. 7.5 PROPOSITIONAL THEOREM PROVING So far, we have shown how to determine entailment by model checking: enumerating models and showing that the sentence must hold in all models. In this section, we show how entail- ment can be done by theorem proving—applying rules of inference directly to the sentences THEOREM PROVING in our knowledge base to construct a proof of the desired sentence without consulting models. If the number of models is large but the length of the proof is short, then theorem proving can be more efficient than model checking. Before we plunge into the details of theorem-proving algorithms, we will need some additional concepts related to entailment. The first concept is logical equivalence: two sen- LOGICAL EQUIVALENCE tences α and β are logically equivalent if they are true in the same set of models. We write this as α ≡ β. For example, we can easily show (using truth tables) that P ∧ Q and Q ∧ P are logically equivalent; other equivalences are shown in Figure 7.11. These equivalences play much the same role in logic as arithmetic identities do in ordinary mathematics. An alternative definition of equivalence is as follows: any two sentences α and β are equivalent only if each of them entails the other: α ≡ β if and only if α |= β and β |= α . The second concept we will need is validity. A sentence is valid if it is true in all models. For VALIDITY example, the sentence P ∨ ¬P is valid. Valid sentences are also known as tautologies—they TAUTOLOGY are necessarily true. Because the sentence True is true in all models, every valid sentence is logically equivalent to True. What good are valid sentences? From our definition of entailment, we can derive the deduction theorem, which was known to the ancient Greeks: DEDUCTION THEOREM For any sentences α and β, α |= β if and only if the sentence (α ⇒ β) is valid. (Exercise 7.5 asks for a proof.) Hence, we can decide if α |= β by checking that (α ⇒ β) is true in every model—which is essentially what the inference algorithm in Figure 7.10 does—
  • 269. 250 Chapter 7. Logical Agents or by proving that (α ⇒ β) is equivalent to True. Conversely, the deduction theorem states that every valid implication sentence describes a legitimate inference. The final concept we will need is satisfiability. A sentence is satisfiable if it is true SATISFIABILITY in, or satisfied by, some model. For example, the knowledge base given earlier, (R1 ∧ R2 ∧ R3 ∧ R4 ∧ R5), is satisfiable because there are three models in which it is true, as shown in Figure 7.9. Satisfiability can be checked by enumerating the possible models until one is found that satisfies the sentence. The problem of determining the satisfiability of sentences in propositional logic—the SAT problem—was the first problem proved to be NP-complete. SAT Many problems in computer science are really satisfiability problems. For example, all the constraint satisfaction problems in Chapter 6 ask whether the constraints are satisfiable by some assignment. Validity and satisfiability are of course connected: α is valid iff ¬α is unsatisfiable; contrapositively, α is satisfiable iff ¬α is not valid. We also have the following useful result: α |= β if and only if the sentence (α ∧ ¬β) is unsatisfiable. Proving β from α by checking the unsatisfiability of (α ∧ ¬β) corresponds exactly to the standard mathematical proof technique of reductio ad absurdum (literally, “reduction to an REDUCTIO AD ABSURDUM absurd thing”). It is also called proof by refutation or proof by contradiction. One assumes a REFUTATION CONTRADICTION sentence β to be false and shows that this leads to a contradiction with known axioms α. This contradiction is exactly what is meant by saying that the sentence (α ∧ ¬β) is unsatisfiable. 7.5.1 Inference and proofs This section covers inference rules that can be applied to derive a proof—a chain of conclu- INFERENCE RULES PROOF sions that leads to the desired goal. The best-known rule is called Modus Ponens (Latin for MODUS PONENS mode that affirms) and is written α ⇒ β, α β . The notation means that, whenever any sentences of the form α ⇒ β and α are given, then the sentence β can be inferred. For example, if (WumpusAhead ∧WumpusAlive) ⇒ Shoot and (WumpusAhead ∧ WumpusAlive) are given, then Shoot can be inferred. Another useful inference rule is And-Elimination, which says that, from a conjunction, AND-ELIMINATION any of the conjuncts can be inferred: α ∧ β α . For example, from (WumpusAhead ∧ WumpusAlive), WumpusAlive can be inferred. By considering the possible truth values of α and β, one can show easily that Modus Ponens and And-Elimination are sound once and for all. These rules can then be used in any particular instances where they apply, generating sound inferences without the need for enumerating models. All of the logical equivalences in Figure 7.11 can be used as inference rules. For exam- ple, the equivalence for biconditional elimination yields the two inference rules α ⇔ β (α ⇒ β) ∧ (β ⇒ α) and (α ⇒ β) ∧ (β ⇒ α) α ⇔ β .
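These definitions can all be checked mechanically with the same brute-force enumeration used by TT-ENTAILS?. The sketch below is ours (it reuses pl_true, symbols_in, and itertools from the earlier sketches) and simply restates the connections in code: satisfiability, validity, entailment by refutation, and a check that the implications corresponding to Modus Ponens and And-Elimination are valid.

def is_satisfiable(s):
    """True iff some model satisfies s (exponential enumeration, as with TT-ENTAILS?)."""
    syms = sorted(symbols_in(s))
    return any(pl_true(s, dict(zip(syms, values)))
               for values in itertools.product([True, False], repeat=len(syms)))

def is_valid(s):
    """s is valid (a tautology) iff ¬s is unsatisfiable."""
    return not is_satisfiable(('not', s))

def entails(alpha, beta):
    """Deduction theorem / refutation: alpha |= beta iff alpha ∧ ¬beta is unsatisfiable."""
    return not is_satisfiable(('and', alpha, ('not', beta)))

assert is_valid(('or', 'P', ('not', 'P')))                               # P ∨ ¬P
assert is_valid(('implies', ('and', ('implies', 'P', 'Q'), 'P'), 'Q'))   # Modus Ponens is sound
assert is_valid(('implies', ('and', 'P', 'Q'), 'P'))                     # And-Elimination is sound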
  • 270. Section 7.5. Propositional Theorem Proving 251 Not all inference rules work in both directions like this. For example, we cannot run Modus Ponens in the opposite direction to obtain α ⇒ β and α from β. Let us see how these inference rules and equivalences can be used in the wumpus world. We start with the knowledge base containing R1 through R5 and show how to prove ¬P1,2, that is, there is no pit in [1,2]. First, we apply biconditional elimination to R2 to obtain R6 : (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1) . Then we apply And-Elimination to R6 to obtain R7 : ((P1,2 ∨ P2,1) ⇒ B1,1) . Logical equivalence for contrapositives gives R8 : (¬B1,1 ⇒ ¬(P1,2 ∨ P2,1)) . Now we can apply Modus Ponens with R8 and the percept R4 (i.e., ¬B1,1), to obtain R9 : ¬(P1,2 ∨ P2,1) . Finally, we apply De Morgan’s rule, giving the conclusion R10 : ¬P1,2 ∧ ¬P2,1 . That is, neither [1,2] nor [2,1] contains a pit. We found this proof by hand, but we can apply any of the search algorithms in Chapter 3 to find a sequence of steps that constitutes a proof. We just need to define a proof problem as follows: • INITIAL STATE: the initial knowledge base. • ACTIONS: the set of actions consists of all the inference rules applied to all the sen- tences that match the top half of the inference rule. • RESULT: the result of an action is to add the sentence in the bottom half of the inference rule. • GOAL: the goal is a state that contains the sentence we are trying to prove. Thus, searching for proofs is an alternative to enumerating models. In many practical cases finding a proof can be more efficient because the proof can ignore irrelevant propositions, no matter how many of them there are. For example, the proof given earlier leading to ¬P1,2 ∧ ¬P2,1 does not mention the propositions B2,1, P1,1, P2,2, or P3,1. They can be ignored because the goal proposition, P1,2, appears only in sentence R2; the other propositions in R2 appear only in R4 and R2; so R1, R3, and R5 have no bearing on the proof. The same would hold even if we added a million more sentences to the knowledge base; the simple truth-table algorithm, on the other hand, would be overwhelmed by the exponential explosion of models. One final property of logical systems is monotonicity, which says that the set of en- MONOTONICITY tailed sentences can only increase as information is added to the knowledge base.8 For any sentences α and β, if KB |= α then KB ∧ β |= α . 8 Nonmonotonic logics, which violate the monotonicity property, capture a common property of human rea- soning: changing one’s mind. They are discussed in Section 12.6.
  • 271. 252 Chapter 7. Logical Agents For example, suppose the knowledge base contains the additional assertion β stating that there are exactly eight pits in the world. This knowledge might help the agent draw additional con- clusions, but it cannot invalidate any conclusion α already inferred—such as the conclusion that there is no pit in [1,2]. Monotonicity means that inference rules can be applied whenever suitable premises are found in the knowledge base—the conclusion of the rule must follow regardless of what else is in the knowledge base. 7.5.2 Proof by resolution We have argued that the inference rules covered so far are sound, but we have not discussed the question of completeness for the inference algorithms that use them. Search algorithms such as iterative deepening search (page 89) are complete in the sense that they will find any reachable goal, but if the available inference rules are inadequate, then the goal is not reachable—no proof exists that uses only those inference rules. For example, if we removed the biconditional elimination rule, the proof in the preceding section would not go through. The current section introduces a single inference rule, resolution, that yields a complete inference algorithm when coupled with any complete search algorithm. We begin by using a simple version of the resolution rule in the wumpus world. Let us consider the steps leading up to Figure 7.4(a): the agent returns from [2,1] to [1,1] and then goes to [1,2], where it perceives a stench, but no breeze. We add the following facts to the knowledge base: R11 : ¬B1,2 . R12 : B1,2 ⇔ (P1,1 ∨ P2,2 ∨ P1,3) . By the same process that led to R10 earlier, we can now derive the absence of pits in [2,2] and [1,3] (remember that [1,1] is already known to be pitless): R13 : ¬P2,2 . R14 : ¬P1,3 . We can also apply biconditional elimination to R3, followed by Modus Ponens with R5, to obtain the fact that there is a pit in [1,1], [2,2], or [3,1]: R15 : P1,1 ∨ P2,2 ∨ P3,1 . Now comes the first application of the resolution rule: the literal ¬P2,2 in R13 resolves with the literal P2,2 in R15 to give the resolvent RESOLVENT R16 : P1,1 ∨ P3,1 . In English; if there’s a pit in one of [1,1], [2,2], and [3,1] and it’s not in [2,2], then it’s in [1,1] or [3,1]. Similarly, the literal ¬P1,1 in R1 resolves with the literal P1,1 in R16 to give R17 : P3,1 . In English: if there’s a pit in [1,1] or [3,1] and it’s not in [1,1], then it’s in [3,1]. These last two inference steps are examples of the unit resolution inference rule, UNIT RESOLUTION ℓ1 ∨ · · · ∨ ℓk, m ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk , where each ℓ is a literal and ℓi and m are complementary literals (i.e., one is the negation COMPLEMENTARY LITERALS
  • 272. Section 7.5. Propositional Theorem Proving 253 of the other). Thus, the unit resolution rule takes a clause—a disjunction of literals—and a CLAUSE literal and produces a new clause. Note that a single literal can be viewed as a disjunction of one literal, also known as a unit clause. UNIT CLAUSE The unit resolution rule can be generalized to the full resolution rule, RESOLUTION ℓ1 ∨ · · · ∨ ℓk, m1 ∨ · · · ∨ mn ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk ∨ m1 ∨ · · · ∨ mj−1 ∨ mj+1 ∨ · · · ∨ mn , where ℓi and mj are complementary literals. This says that resolution takes two clauses and produces a new clause containing all the literals of the two original clauses except the two complementary literals. For example, we have P1,1 ∨ P3,1, ¬P1,1 ∨ ¬P2,2 P3,1 ∨ ¬P2,2 . There is one more technical aspect of the resolution rule: the resulting clause should contain only one copy of each literal.9 The removal of multiple copies of literals is called factoring. FACTORING For example, if we resolve (A ∨ B) with (A ∨ ¬B), we obtain (A ∨ A), which is reduced to just A. The soundness of the resolution rule can be seen easily by considering the literal ℓi that is complementary to literal mj in the other clause. If ℓi is true, then mj is false, and hence m1 ∨ · · · ∨ mj−1 ∨ mj+1 ∨ · · · ∨ mn must be true, because m1 ∨ · · · ∨ mn is given. If ℓi is false, then ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk must be true because ℓ1 ∨ · · · ∨ ℓk is given. Now ℓi is either true or false, so one or other of these conclusions holds—exactly as the resolution rule states. What is more surprising about the resolution rule is that it forms the basis for a family of complete inference procedures. A resolution-based theorem prover can, for any sentences α and β in propositional logic, decide whether α |= β. The next two subsections explain how resolution accomplishes this. Conjunctive normal form The resolution rule applies only to clauses (that is, disjunctions of literals), so it would seem to be relevant only to knowledge bases and queries consisting of clauses. How, then, can it lead to a complete inference procedure for all of propositional logic? The answer is that every sentence of propositional logic is logically equivalent to a conjunction of clauses. A sentence expressed as a conjunction of clauses is said to be in conjunctive normal form or CONJUNCTIVE NORMAL FORM CNF (see Figure 7.14). We now describe a procedure for converting to CNF. We illustrate the procedure by converting the sentence B1,1 ⇔ (P1,2 ∨ P2,1) into CNF. The steps are as follows: 1. Eliminate ⇔, replacing α ⇔ β with (α ⇒ β) ∧ (β ⇒ α). (B1,1 ⇒ (P1,2 ∨ P2,1)) ∧ ((P1,2 ∨ P2,1) ⇒ B1,1) . 2. Eliminate ⇒, replacing α ⇒ β with ¬α ∨ β: (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬(P1,2 ∨ P2,1) ∨ B1,1) . 9 If a clause is viewed as a set of literals, then this restriction is automatically respected. Using set notation for clauses makes the resolution rule much cleaner, at the cost of introducing additional notation.
  • 273. 254 Chapter 7. Logical Agents 3. CNF requires ¬ to appear only in literals, so we “move ¬ inwards” by repeated appli- cation of the following equivalences from Figure 7.11: ¬(¬α) ≡ α (double-negation elimination) ¬(α ∧ β) ≡ (¬α ∨ ¬β) (De Morgan) ¬(α ∨ β) ≡ (¬α ∧ ¬β) (De Morgan) In the example, we require just one application of the last rule: (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ ((¬P1,2 ∧ ¬P2,1) ∨ B1,1) . 4. Now we have a sentence containing nested ∧ and ∨ operators applied to literals. We apply the distributivity law from Figure 7.11, distributing ∨ over ∧ wherever possible. (¬B1,1 ∨ P1,2 ∨ P2,1) ∧ (¬P1,2 ∨ B1,1) ∧ (¬P2,1 ∨ B1,1) . The original sentence is now in CNF, as a conjunction of three clauses. It is much harder to read, but it can be used as input to a resolution procedure. A resolution algorithm Inference procedures based on resolution work by using the principle of proof by contradic- tion introduced on page 250. That is, to show that KB |= α, we show that (KB ∧ ¬α) is unsatisfiable. We do this by proving a contradiction. A resolution algorithm is shown in Figure 7.12. First, (KB ∧ ¬α) is converted into CNF. Then, the resolution rule is applied to the resulting clauses. Each pair that contains complementary literals is resolved to produce a new clause, which is added to the set if it is not already present. The process continues until one of two things happens: • there are no new clauses that can be added, in which case KB does not entail α; or, • two clauses resolve to yield the empty clause, in which case KB entails α. The empty clause—a disjunction of no disjuncts—is equivalent to False because a disjunction is true only if at least one of its disjuncts is true. Another way to see that an empty clause represents a contradiction is to observe that it arises only from resolving two complementary unit clauses such as P and ¬P. We can apply the resolution procedure to a very simple inference in the wumpus world. When the agent is in [1,1], there is no breeze, so there can be no pits in neighboring squares. The relevant knowledge base is KB = R2 ∧ R4 = (B1,1 ⇔ (P1,2 ∨ P2,1)) ∧ ¬B1,1 and we wish to prove α which is, say, ¬P1,2. When we convert (KB ∧ ¬α) into CNF, we obtain the clauses shown at the top of Figure 7.13. The second row of the figure shows clauses obtained by resolving pairs in the first row. Then, when P1,2 is resolved with ¬P1,2, we obtain the empty clause, shown as a small square. Inspection of Figure 7.13 reveals that many resolution steps are pointless. For example, the clause B1,1 ∨¬B1,1 ∨P1,2 is equivalent to True ∨ P1,2 which is equivalent to True. Deducing that True is true is not very helpful. Therefore, any clause in which two complementary literals appear can be discarded.
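As a concrete illustration (a sketch of ours, not the book's code; Figure 7.12 on the next page gives the authoritative pseudocode), clauses can be represented as frozensets of literal strings, with a leading '~' marking negation. That convention is ours, but it makes resolution, factoring, and the whole refutation loop fit in a few lines.

import itertools

def negate(lit):
    """'~P' for 'P' and 'P' for '~P'; literals are plain strings."""
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(ci, cj):
    """All resolvents of two clauses (frozensets of literals). Using sets means
    factoring (removal of duplicate literals) happens automatically."""
    out = set()
    for lit in ci:
        if negate(lit) in cj:
            out.add(frozenset((ci - {lit}) | (cj - {negate(lit)})))
    return out

def pl_resolution(clauses):
    """Given the CNF clauses of KB ∧ ¬alpha, return True iff the empty clause is derivable."""
    clauses = set(clauses)
    while True:
        new = set()
        for ci, cj in itertools.combinations(clauses, 2):
            for r in resolve(ci, cj):
                if not r:                  # the empty clause: contradiction, so KB |= alpha
                    return True
                new.add(r)
        if new <= clauses:                 # nothing new can be added: KB does not entail alpha
            return False
        clauses |= new

# KB = R2 ∧ R4 in CNF, plus ¬alpha where alpha = ¬P1,2 (so we add the clause P1,2):
clauses = [frozenset({'~B11', 'P12', 'P21'}), frozenset({'~P12', 'B11'}),
           frozenset({'~P21', 'B11'}), frozenset({'~B11'}), frozenset({'P12'})]
print(pl_resolution(clauses))              # True: the empty clause is derived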
  • 274. Section 7.5. Propositional Theorem Proving 255 function PL-RESOLUTION(KB,α) returns true or false inputs: KB, the knowledge base, a sentence in propositional logic α, the query, a sentence in propositional logic clauses ← the set of clauses in the CNF representation of KB ∧ ¬α new ← { } loop do for each pair of clauses Ci, Cj in clauses do resolvents ← PL-RESOLVE(Ci,Cj) if resolvents contains the empty clause then return true new ← new ∪ resolvents if new ⊆ clauses then return false clauses ← clauses ∪ new Figure 7.12 A simple resolution algorithm for propositional logic. The function PL-RESOLVE returns the set of all possible clauses obtained by resolving its two inputs. ¬P2,1 B1,1 ¬B1,1 P1,2 P2,1 ¬P1,2 B1,1 ¬B1,1 P1,2 ¬P2,1 ¬P1,2 P1,2 P2,1 ¬P2,1 ¬B1,1 P2,1 B1,1 P1,2 P2,1 ¬P1,2 ¬B1,1 P1,2 B1,1 ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ ^ Figure 7.13 Partial application of PL-RESOLUTION to a simple inference in the wumpus world. ¬P1,2 is shown to follow from the first four clauses in the top row. Completeness of resolution To conclude our discussion of resolution, we now show why PL-RESOLUTION is complete. To do this, we introduce the resolution closure RC (S) of a set of clauses S, which is the set RESOLUTION CLOSURE of all clauses derivable by repeated application of the resolution rule to clauses in S or their derivatives. The resolution closure is what PL-RESOLUTION computes as the final value of the variable clauses. It is easy to see that RC (S) must be finite, because there are only finitely many distinct clauses that can be constructed out of the symbols P1, . . . , Pk that appear in S. (Notice that this would not be true without the factoring step that removes multiple copies of literals.) Hence, PL-RESOLUTION always terminates. The completeness theorem for resolution in propositional logic is called the ground resolution theorem: GROUND RESOLUTION THEOREM If a set of clauses is unsatisfiable, then the resolution closure of those clauses contains the empty clause. This theorem is proved by demonstrating its contrapositive: if the closure RC(S) does not
  • 275. 256 Chapter 7. Logical Agents contain the empty clause, then S is satisfiable. In fact, we can construct a model for S with suitable truth values for P1, . . . , Pk. The construction procedure is as follows: For i from 1 to k, – If a clause in RC(S) contains the literal ¬Pi and all its other literals are false under the assignment chosen for P1, . . . , Pi−1, then assign false to Pi. – Otherwise, assign true to Pi. This assignment to P1, . . . , Pk is a model of S. To see this, assume the opposite—that, at some stage i in the sequence, assigning symbol Pi causes some clause C to become false. For this to happen, it must be the case that all the other literals in C must already have been falsified by assignments to P1, . . . , Pi−1. Thus, C must now look like either (false ∨ false ∨ · · · false∨Pi) or like (false∨false∨· · · false∨¬Pi). If just one of these two is in RC(S), then the algorithm will assign the appropriate truth value to Pi to make C true, so C can only be falsified if both of these clauses are in RC(S). Now, since RC(S) is closed under resolution, it will contain the resolvent of these two clauses, and that resolvent will have all of its literals already falsified by the assignments to P1, . . . , Pi−1. This contradicts our assumption that the first falsified clause appears at stage i. Hence, we have proved that the construction never falsifies a clause in RC(S); that is, it produces a model of RC(S) and thus a model of S itself (since S is contained in RC(S)). 7.5.3 Horn clauses and definite clauses The completeness of resolution makes it a very important inference method. In many practical situations, however, the full power of resolution is not needed. Some real-world knowledge bases satisfy certain restrictions on the form of sentences they contain, which enables them to use a more restricted and efficient inference algorithm. One such restricted form is the definite clause, which is a disjunction of literals of DEFINITE CLAUSE which exactly one is positive. For example, the clause (¬L1,1 ∨ ¬Breeze ∨ B1,1) is a definite clause, whereas (¬B1,1 ∨ P1,2 ∨ P2,1) is not. Slightly more general is the Horn clause, which is a disjunction of literals of which at HORN CLAUSE most one is positive. So all definite clauses are Horn clauses, as are clauses with no positive literals; these are called goal clauses. Horn clauses are closed under resolution: if you resolve GOAL CLAUSES two Horn clauses, you get back a Horn clause. Knowledge bases containing only definite clauses are interesting for three reasons: 1. Every definite clause can be written as an implication whose premise is a conjunction of positive literals and whose conclusion is a single positive literal. (See Exercise 7.13.) For example, the definite clause (¬L1,1 ∨ ¬Breeze ∨ B1,1) can be written as the im- plication (L1,1 ∧ Breeze) ⇒ B1,1. In the implication form, the sentence is easier to understand: it says that if the agent is in [1,1] and there is a breeze, then [1,1] is breezy. In Horn form, the premise is called the body and the conclusion is called the head. A BODY HEAD sentence consisting of a single positive literal, such as L1,1, is called a fact. It too can FACT be written in implication form as True ⇒ L1,1, but it is simpler to write just L1,1.
  • 276. Section 7.5. Propositional Theorem Proving 257 CNFSentence → Clause1 ∧ · · · ∧ Clausen Clause → Literal1 ∨ · · · ∨ Literalm Literal → Symbol | ¬Symbol Symbol → P | Q | R | . . . HornClauseForm → DefiniteClauseForm | GoalClauseForm DefiniteClauseForm → (Symbol1 ∧ · · · ∧ Symboll) ⇒ Symbol GoalClauseForm → (Symbol1 ∧ · · · ∧ Symboll) ⇒ False Figure 7.14 A grammar for conjunctive normal form, Horn clauses, and definite clauses. A clause such as A ∧ B ⇒ C is still a definite clause when it is written as ¬A ∨ ¬B ∨ C, but only the former is considered the canonical form for definite clauses. One more class is the k-CNF sentence, which is a CNF sentence where each clause has at most k literals. 2. Inference with Horn clauses can be done through the forward-chaining and backward- FORWARD-CHAINING chaining algorithms, which we explain next. Both of these algorithms are natural, BACKWARD- CHAINING in that the inference steps are obvious and easy for humans to follow. This type of inference is the basis for logic programming, which is discussed in Chapter 9. 3. Deciding entailment with Horn clauses can be done in time that is linear in the size of the knowledge base—a pleasant surprise. 7.5.4 Forward and backward chaining The forward-chaining algorithm PL-FC-ENTAILS?(KB, q) determines if a single proposi- tion symbol q—the query—is entailed by a knowledge base of definite clauses. It begins from known facts (positive literals) in the knowledge base. If all the premises of an implica- tion are known, then its conclusion is added to the set of known facts. For example, if L1,1 and Breeze are known and (L1,1 ∧ Breeze) ⇒ B1,1 is in the knowledge base, then B1,1 can be added. This process continues until the query q is added or until no further inferences can be made. The detailed algorithm is shown in Figure 7.15; the main point to remember is that it runs in linear time. The best way to understand the algorithm is through an example and a picture. Fig- ure 7.16(a) shows a simple knowledge base of Horn clauses with A and B as known facts. Figure 7.16(b) shows the same knowledge base drawn as an AND–OR graph (see Chap- ter 4). In AND–OR graphs, multiple links joined by an arc indicate a conjunction—every link must be proved—while multiple links without an arc indicate a disjunction—any link can be proved. It is easy to see how forward chaining works in the graph. The known leaves (here, A and B) are set, and inference propagates up the graph as far as possible. Wher- ever a conjunction appears, the propagation waits until all the conjuncts are known before proceeding. The reader is encouraged to work through the example in detail.
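Before the detailed pseudocode of Figure 7.15 on the next page, here is a compact Python sketch of the same idea. It is ours, not the book's, and it assumes each definite clause is stored as a (premises, conclusion) pair of symbol names.

from collections import deque

def pl_fc_entails(definite_clauses, facts, q):
    """Forward chaining: keep adding conclusions whose premises are all known,
    until the query q appears or nothing new can be inferred."""
    count = {i: len(premises) for i, (premises, _) in enumerate(definite_clauses)}
    inferred = set()
    agenda = deque(facts)
    while agenda:
        p = agenda.popleft()
        if p == q:
            return True
        if p not in inferred:
            inferred.add(p)
            for i, (premises, conclusion) in enumerate(definite_clauses):
                if p in premises:
                    count[i] -= 1
                    if count[i] == 0:             # all premises of clause i are now known
                        agenda.append(conclusion)
    return False

# The knowledge base of Figure 7.16(a), with A and B as the known facts:
kb = [(['P'], 'Q'), (['L', 'M'], 'P'), (['B', 'L'], 'M'),
      (['A', 'P'], 'L'), (['A', 'B'], 'L')]
print(pl_fc_entails(kb, ['A', 'B'], 'Q'))          # True

Each symbol is processed at most once and each clause's counter is decremented at most once per premise occurrence, which is where the linear-time bound mentioned above comes from.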
  • 277. 258 Chapter 7. Logical Agents function PL-FC-ENTAILS?(KB,q) returns true or false inputs: KB, the knowledge base, a set of propositional definite clauses q, the query, a proposition symbol count ← a table, where count[c] is the number of symbols in c’s premise inferred ← a table, where inferred[s] is initially false for all symbols agenda ← a queue of symbols, initially symbols known to be true in KB while agenda is not empty do p ← POP(agenda) if p = q then return true if inferred[p] = false then inferred[p] ← true for each clause c in KB where p is in c.PREMISE do decrement count[c] if count[c] = 0 then add c.CONCLUSION to agenda return false Figure 7.15 The forward-chaining algorithm for propositional logic. The agenda keeps track of symbols known to be true but not yet “processed.” The count table keeps track of how many premises of each implication are as yet unknown. Whenever a new symbol p from the agenda is processed, the count is reduced by one for each implication in whose premise p appears (easily identified in constant time with appropriate indexing.) If a count reaches zero, all the premises of the implication are known, so its conclusion can be added to the agenda. Finally, we need to keep track of which symbols have been processed; a symbol that is already in the set of inferred symbols need not be added to the agenda again. This avoids redundant work and prevents loops caused by implications such as P ⇒ Q and Q ⇒ P. It is easy to see that forward chaining is sound: every inference is essentially an appli- cation of Modus Ponens. Forward chaining is also complete: every entailed atomic sentence will be derived. The easiest way to see this is to consider the final state of the inferred table (after the algorithm reaches a fixed point where no new inferences are possible). The table FIXED POINT contains true for each symbol inferred during the process, and false for all other symbols. We can view the table as a logical model; moreover, every definite clause in the original KB is true in this model. To see this, assume the opposite, namely that some clause a1∧. . .∧ak ⇒ b is false in the model. Then a1 ∧ . . . ∧ ak must be true in the model and b must be false in the model. But this contradicts our assumption that the algorithm has reached a fixed point! We can conclude, therefore, that the set of atomic sentences inferred at the fixed point defines a model of the original KB. Furthermore, any atomic sentence q that is entailed by the KB must be true in all its models and in this model in particular. Hence, every entailed atomic sentence q must be inferred by the algorithm. Forward chaining is an example of the general concept of data-driven reasoning—that DATA-DRIVEN is, reasoning in which the focus of attention starts with the known data. It can be used within an agent to derive conclusions from incoming percepts, often without a specific query in mind. For example, the wumpus agent might TELL its percepts to the knowledge base using
  • 278. Section 7.6. Effective Propositional Model Checking 259 P ⇒ Q L ∧ M ⇒ P B ∧ L ⇒ M A ∧ P ⇒ L A ∧ B ⇒ L A B Q P M L B A (a) (b) Figure 7.16 (a) A set of Horn clauses. (b) The corresponding AND–OR graph. an incremental forward-chaining algorithm in which new facts can be added to the agenda to initiate new inferences. In humans, a certain amount of data-driven reasoning occurs as new information arrives. For example, if I am indoors and hear rain starting to fall, it might occur to me that the picnic will be canceled. Yet it will probably not occur to me that the seventeenth petal on the largest rose in my neighbor’s garden will get wet; humans keep forward chaining under careful control, lest they be swamped with irrelevant consequences. The backward-chaining algorithm, as its name suggests, works backward from the query. If the query q is known to be true, then no work is needed. Otherwise, the algorithm finds those implications in the knowledge base whose conclusion is q. If all the premises of one of those implications can be proved true (by backward chaining), then q is true. When applied to the query Q in Figure 7.16, it works back down the graph until it reaches a set of known facts, A and B, that forms the basis for a proof. The algorithm is essentially identical to the AND-OR-GRAPH-SEARCH algorithm in Figure 4.11. As with forward chaining, an efficient implementation runs in linear time. Backward chaining is a form of goal-directed reasoning. It is useful for answering GOAL-DIRECTED REASONING specific questions such as “What shall I do now?” and “Where are my keys?” Often, the cost of backward chaining is much less than linear in the size of the knowledge base, because the process touches only relevant facts. 7.6 EFFECTIVE PROPOSITIONAL MODEL CHECKING In this section, we describe two families of efficient algorithms for general propositional inference based on model checking: One approach based on backtracking search, and one on local hill-climbing search. These algorithms are part of the “technology” of propositional logic. This section can be skimmed on a first reading of the chapter.
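Before turning to those model-checking algorithms, the backward-chaining procedure just described can be sketched in the same (premises, conclusion) style as the forward-chaining example. Again this is our illustration rather than the book's code, and the simple visited-set guard against circular proofs is our own choice; memoizing proved and failed subgoals would avoid repeated work but is left out for brevity.

def pl_bc_entails(definite_clauses, facts, q, visited=frozenset()):
    """Backward chaining: q is entailed if it is a known fact, or if every premise of
    some clause concluding q can itself be proved. 'visited' blocks circular proofs."""
    if q in facts:
        return True
    if q in visited:
        return False                       # already trying to prove q higher up the stack
    visited = visited | {q}
    return any(all(pl_bc_entails(definite_clauses, facts, p, visited) for p in premises)
               for premises, conclusion in definite_clauses if conclusion == q)

# The same knowledge base as in the forward-chaining sketch (Figure 7.16(a)):
kb = [(['P'], 'Q'), (['L', 'M'], 'P'), (['B', 'L'], 'M'),
      (['A', 'P'], 'L'), (['A', 'B'], 'L')]
print(pl_bc_entails(kb, {'A', 'B'}, 'Q'))   # True, working back from the query to A and B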
  • 279. 260 Chapter 7. Logical Agents The algorithms we describe are for checking satisfiability: the SAT problem. (As noted earlier, testing entailment, α |= β, can be done by testing unsatisfiability of α ∧ ¬β.) We have already noted the connection between finding a satisfying model for a logical sentence and finding a solution for a constraint satisfaction problem, so it is perhaps not surprising that the two families of algorithms closely resemble the backtracking algorithms of Section 6.3 and the local search algorithms of Section 6.4. They are, however, extremely important in their own right because so many combinatorial problems in computer science can be reduced to checking the satisfiability of a propositional sentence. Any improvement in satisfiability algorithms has huge consequences for our ability to handle complexity in general. 7.6.1 A complete backtracking algorithm The first algorithm we consider is often called the Davis–Putnam algorithm, after the sem- DAVIS–PUTNAM ALGORITHM inal paper by Martin Davis and Hilary Putnam (1960). The algorithm is in fact the version described by Davis, Logemann, and Loveland (1962), so we will call it DPLL after the ini- tials of all four authors. DPLL takes as input a sentence in conjunctive normal form—a set of clauses. Like BACKTRACKING-SEARCH and TT-ENTAILS?, it is essentially a recursive, depth-first enumeration of possible models. It embodies three improvements over the simple scheme of TT-ENTAILS?: • Early termination: The algorithm detects whether the sentence must be true or false, even with a partially completed model. A clause is true if any literal is true, even if the other literals do not yet have truth values; hence, the sentence as a whole could be judged true even before the model is complete. For example, the sentence (A ∨ B) ∧ (A ∨ C) is true if A is true, regardless of the values of B and C. Similarly, a sentence is false if any clause is false, which occurs when each of its literals is false. Again, this can occur long before the model is complete. Early termination avoids examination of entire subtrees in the search space. • Pure symbol heuristic: A pure symbol is a symbol that always appears with the same PURE SYMBOL “sign” in all clauses. For example, in the three clauses (A ∨ ¬B), (¬B ∨ ¬C), and (C ∨ A), the symbol A is pure because only the positive literal appears, B is pure because only the negative literal appears, and C is impure. It is easy to see that if a sentence has a model, then it has a model with the pure symbols assigned so as to make their literals true, because doing so can never make a clause false. Note that, in determining the purity of a symbol, the algorithm can ignore clauses that are already known to be true in the model constructed so far. For example, if the model contains B = false, then the clause (¬B ∨ ¬C) is already true, and in the remaining clauses C appears only as a positive literal; therefore C becomes pure. • Unit clause heuristic: A unit clause was defined earlier as a clause with just one lit- eral. In the context of DPLL, it also means clauses in which all literals but one are already assigned false by the model. For example, if the model contains B = true, then (¬B ∨ ¬C) simplifies to ¬C, which is a unit clause. Obviously, for this clause to be true, C must be set to false. The unit clause heuristic assigns all such symbols before branching on the remainder. One important consequence of the heuristic is that
  • 280. Section 7.6. Effective Propositional Model Checking 261 function DPLL-SATISFIABLE?(s) returns true or false inputs: s, a sentence in propositional logic clauses ← the set of clauses in the CNF representation of s symbols ← a list of the proposition symbols in s return DPLL(clauses,symbols,{ }) function DPLL(clauses,symbols,model) returns true or false if every clause in clauses is true in model then return true if some clause in clauses is false in model then return false P,value ← FIND-PURE-SYMBOL(symbols,clauses,model) if P is non-null then return DPLL(clauses,symbols – P,model ∪ {P=value}) P,value ← FIND-UNIT-CLAUSE(clauses,model) if P is non-null then return DPLL(clauses,symbols – P,model ∪ {P=value}) P ← FIRST(symbols); rest ← REST(symbols) return DPLL(clauses,rest,model ∪ {P=true}) or DPLL(clauses,rest,model ∪ {P=false})) Figure 7.17 The DPLL algorithm for checking satisfiability of a sentence in propositional logic. The ideas behind FIND-PURE-SYMBOL and FIND-UNIT-CLAUSE are described in the text; each returns a symbol (or null) and the truth value to assign to that symbol. Like TT-ENTAILS?, DPLL operates over partial models. any attempt to prove (by refutation) a literal that is already in the knowledge base will succeed immediately (Exercise 7.22). Notice also that assigning one unit clause can create another unit clause—for example, when C is set to false, (C ∨ A) becomes a unit clause, causing true to be assigned to A. This “cascade” of forced assignments is called unit propagation. It resembles the process of forward chaining with definite UNIT PROPAGATION clauses, and indeed, if the CNF expression contains only definite clauses then DPLL essentially replicates forward chaining. (See Exercise 7.23.) The DPLL algorithm is shown in Figure 7.17, which gives the the essential skeleton of the search process. What Figure 7.17 does not show are the tricks that enable SAT solvers to scale up to large problems. It is interesting that most of these tricks are in fact rather general, and we have seen them before in other guises: 1. Component analysis (as seen with Tasmania in CSPs): As DPLL assigns truth values to variables, the set of clauses may become separated into disjoint subsets, called com- ponents, that share no unassigned variables. Given an efficient way to detect when this occurs, a solver can gain considerable speed by working on each component separately. 2. Variable and value ordering (as seen in Section 6.3.1 for CSPs): Our simple imple- mentation of DPLL uses an arbitrary variable ordering and always tries the value true before false. The degree heuristic (see page 216) suggests choosing the variable that appears most frequently over all remaining clauses.
  • 281. 262 Chapter 7. Logical Agents 3. Intelligent backtracking (as seen in Section 6.3 for CSPs): Many problems that can- not be solved in hours of run time with chronological backtracking can be solved in seconds with intelligent backtracking that backs up all the way to the relevant point of conflict. All SAT solvers that do intelligent backtracking use some form of conflict clause learning to record conflicts so that they won’t be repeated later in the search. Usually a limited-size set of conflicts is kept, and rarely used ones are dropped. 4. Random restarts (as seen on page 124 for hill-climbing): Sometimes a run appears not to be making progress. In this case, we can start over from the top of the search tree, rather than trying to continue. After restarting, different random choices (in variable and value selection) are made. Clauses that are learned in the first run are retained after the restart and can help prune the search space. Restarting does not guarantee that a solution will be found faster, but it does reduce the variance on the time to solution. 5. Clever indexing (as seen in many algorithms): The speedup methods used in DPLL itself, as well as the tricks used in modern solvers, require fast indexing of such things as “the set of clauses in which variable Xi appears as a positive literal.” This task is complicated by the fact that the algorithms are interested only in the clauses that have not yet been satisfied by previous assignments to variables, so the indexing structures must be updated dynamically as the computation proceeds. With these enhancements, modern solvers can handle problems with tens of millions of vari- ables. They have revolutionized areas such as hardware verification and security protocol verification, which previously required laborious, hand-guided proofs. 7.6.2 Local search algorithms We have seen several local search algorithms so far in this book, including HILL-CLIMBING (page 122) and SIMULATED-ANNEALING (page 126). These algorithms can be applied di- rectly to satisfiability problems, provided that we choose the right evaluation function. Be- cause the goal is to find an assignment that satisfies every clause, an evaluation function that counts the number of unsatisfied clauses will do the job. In fact, this is exactly the measure used by the MIN-CONFLICTS algorithm for CSPs (page 221). All these algorithms take steps in the space of complete assignments, flipping the truth value of one symbol at a time. The space usually contains many local minima, to escape from which various forms of random- ness are required. In recent years, there has been a great deal of experimentation to find a good balance between greediness and randomness. One of the simplest and most effective algorithms to emerge from all this work is called WALKSAT (Figure 7.18). On every iteration, the algorithm picks an unsatisfied clause and picks a symbol in the clause to flip. It chooses randomly between two ways to pick which symbol to flip: (1) a “min-conflicts” step that minimizes the number of unsatisfied clauses in the new state and (2) a “random walk” step that picks the symbol randomly. When WALKSAT returns a model, the input sentence is indeed satisfiable, but when it returns failure, there are two possible causes: either the sentence is unsatisfiable or we need to give the algorithm more time. If we set max flips = ∞ and p 0, WALKSAT will eventually return a model (if one exists), because the random-walk steps will eventually hit
  • 282. Section 7.6. Effective Propositional Model Checking 263 function WALKSAT(clauses,p,max flips) returns a satisfying model or failure inputs: clauses, a set of clauses in propositional logic p, the probability of choosing to do a “random walk” move, typically around 0.5 max flips, number of flips allowed before giving up model ← a random assignment of true/false to the symbols in clauses for i = 1 to max flips do if model satisfies clauses then return model clause ← a randomly selected clause from clauses that is false in model with probability p flip the value in model of a randomly selected symbol from clause else flip whichever symbol in clause maximizes the number of satisfied clauses return failure Figure 7.18 The WALKSAT algorithm for checking satisfiability by randomly flipping the values of variables. Many versions of the algorithm exist. upon the solution. Alas, if max flips is infinity and the sentence is unsatisfiable, then the algorithm never terminates! For this reason, WALKSAT is most useful when we expect a solution to exist—for ex- ample, the problems discussed in Chapters 3 and 6 usually have solutions. On the other hand, WALKSAT cannot always detect unsatisfiability, which is required for deciding entailment. For example, an agent cannot reliably use WALKSAT to prove that a square is safe in the wumpus world. Instead, it can say, “I thought about it for an hour and couldn’t come up with a possible world in which the square isn’t safe.” This may be a good empirical indicator that the square is safe, but it’s certainly not a proof. 7.6.3 The landscape of random SAT problems Some SAT problems are harder than others. Easy problems can be solved by any old algo- rithm, but because we know that SAT is NP-complete, at least some problem instances must require exponential run time. In Chapter 6, we saw some surprising discoveries about certain kinds of problems. For example, the n-queens problem—thought to be quite tricky for back- tracking search algorithms—turned out to be trivially easy for local search methods, such as min-conflicts. This is because solutions are very densely distributed in the space of assign- ments, and any initial assignment is guaranteed to have a solution nearby. Thus, n-queens is easy because it is underconstrained. UNDERCONSTRAINED When we look at satisfiability problems in conjunctive normal form, an undercon- strained problem is one with relatively few clauses constraining the variables. For example, here is a randomly generated 3-CNF sentence with five symbols and five clauses: (¬D ∨ ¬B ∨ C) ∧ (B ∨ ¬A ∨ ¬C) ∧ (¬C ∨ ¬B ∨ E) ∧ (E ∨ ¬D ∨ B) ∧ (B ∨ E ∨ ¬C) . Sixteen of the 32 possible assignments are models of this sentence, so, on average, it would take just two random guesses to find a model. This is an easy satisfiability problem, as are
most such underconstrained problems. On the other hand, an overconstrained problem has many clauses relative to the number of variables and is likely to have no solutions.

To go beyond these basic intuitions, we must define exactly how random sentences are generated. The notation CNFk(m, n) denotes a k-CNF sentence with m clauses and n symbols, where the clauses are chosen uniformly, independently, and without replacement from among all clauses with k different literals, which are positive or negative at random. (A symbol may not appear twice in a clause, nor may a clause appear twice in a sentence.)

Given a source of random sentences, we can measure the probability of satisfiability. Figure 7.19(a) plots the probability for CNF3(m, 50), that is, sentences with 50 variables and 3 literals per clause, as a function of the clause/symbol ratio, m/n. As we expect, for small m/n the probability of satisfiability is close to 1, and at large m/n the probability is close to 0. The probability drops fairly sharply around m/n = 4.3. Empirically, we find that the "cliff" stays in roughly the same place (for k = 3) and gets sharper and sharper as n increases. Theoretically, the satisfiability threshold conjecture says that for every k ≥ 3, there is a threshold ratio rk such that, as n goes to infinity, the probability that CNFk(n, rn) is satisfiable becomes 1 for all values of r below the threshold, and 0 for all values above. The conjecture remains unproven.

Figure 7.19 (a) Graph showing the probability that a random 3-CNF sentence with n = 50 symbols is satisfiable, as a function of the clause/symbol ratio m/n. (b) Graph of the median run time (measured in number of recursive calls to DPLL, a good proxy) on random 3-CNF sentences. The most difficult problems have a clause/symbol ratio of about 4.3.

Now that we have a good idea where the satisfiable and unsatisfiable problems are, the next question is, where are the hard problems? It turns out that they are also often at the threshold value. Figure 7.19(b) shows that 50-symbol problems at the threshold value of 4.3 are about 20 times more difficult to solve than those at a ratio of 3.3. The underconstrained problems are easiest to solve (because it is so easy to guess a solution); the overconstrained problems are not as easy as the underconstrained, but still are much easier than the ones right at the threshold.
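The Figure 7.19(a) experiment is easy to reproduce at a small scale. The sketch below is ours: clauses are frozensets of signed integers (a DIMACS-like convention, not the book's notation), the generator follows the CNF3(m, n) definition above, and the solver is a bare-bones DPLL with only early termination and unit propagation, leaving out the pure-symbol and ordering heuristics of Figure 7.17.

import random

def random_3cnf(m, n, rng=random):
    """CNF3(m, n): m distinct clauses over symbols 1..n, each with three distinct
    symbols, each symbol negated with probability 1/2 (negation = negative integer)."""
    clauses = set()
    while len(clauses) < m:
        syms = rng.sample(range(1, n + 1), 3)
        clauses.add(frozenset(s if rng.random() < 0.5 else -s for s in syms))
    return list(clauses)

def dpll(clauses):
    """Bare-bones DPLL: early termination plus unit propagation, then split on a literal."""
    if not clauses:
        return True                          # every clause satisfied
    if frozenset() in clauses:
        return False                         # some clause falsified
    unit = next((next(iter(c)) for c in clauses if len(c) == 1), None)
    lit = unit if unit is not None else next(iter(clauses[0]))
    for value in ([unit] if unit is not None else [lit, -lit]):
        # assign 'value' true: drop satisfied clauses, remove the falsified literal elsewhere
        simplified = [c - {-value} for c in clauses if value not in c]
        if dpll(simplified):
            return True
    return False

# A small-scale version of the Figure 7.19(a) curve (n = 20 rather than 50):
n, trials = 20, 100
for ratio in (2.0, 3.0, 4.0, 4.3, 5.0, 6.0):
    sat = sum(dpll(random_3cnf(int(ratio * n), n)) for _ in range(trials))
    print('m/n = %.1f   fraction satisfiable = %.2f' % (ratio, sat / trials))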
  • 284. Section 7.7. Agents Based on Propositional Logic 265 7.7 AGENTS BASED ON PROPOSITIONAL LOGIC In this section, we bring together what we have learned so far in order to construct wumpus world agents that use propositional logic. The first step is to enable the agent to deduce, to the extent possible, the state of the world given its percept history. This requires writing down a complete logical model of the effects of actions. We also show how the agent can keep track of the world efficiently without going back into the percept history for each inference. Finally, we show how the agent can use logical inference to construct plans that are guaranteed to achieve its goals. 7.7.1 The current state of the world As stated at the beginning of the chapter, a logical agent operates by deducing what to do from a knowledge base of sentences about the world. The knowledge base is composed of axioms—general knowledge about how the world works—and percept sentences obtained from the agent’s experience in a particular world. In this section, we focus on the problem of deducing the current state of the wumpus world—where am I, is that square safe, and so on. We began collecting axioms in Section 7.4.3. The agent knows that the starting square contains no pit (¬P1,1) and no wumpus (¬W1,1). Furthermore, for each square, it knows that the square is breezy if and only if a neighboring square has a pit; and a square is smelly if and only if a neighboring square has a wumpus. Thus, we include a large collection of sentences of the following form: B1,1 ⇔ (P1,2 ∨ P2,1) S1,1 ⇔ (W1,2 ∨ W2,1) · · · The agent also knows that there is exactly one wumpus. This is expressed in two parts. First, we have to say that there is at least one wumpus: W1,1 ∨ W1,2 ∨ · · · ∨ W4,3 ∨ W4,4 . Then, we have to say that there is at most one wumpus. For each pair of locations, we add a sentence saying that at least one of them must be wumpus-free: ¬W1,1 ∨ ¬W1,2 ¬W1,1 ∨ ¬W1,3 · · · ¬W4,3 ∨ ¬W4,4 . So far, so good. Now let’s consider the agent’s percepts. If there is currently a stench, one might suppose that a proposition Stench should be added to the knowledge base. This is not quite right, however: if there was no stench at the previous time step, then ¬Stench would al- ready be asserted, and the new assertion would simply result in a contradiction. The problem is solved when we realize that a percept asserts something only about the current time. Thus, if the time step (as supplied to MAKE-PERCEPT-SENTENCE in Figure 7.1) is 4, then we add
  • 285. 266 Chapter 7. Logical Agents Stench4 to the knowledge base, rather than Stench—neatly avoiding any contradiction with ¬Stench3 . The same goes for the breeze, bump, glitter, and scream percepts. The idea of associating propositions with time steps extends to any aspect of the world that changes over time. For example, the initial knowledge base includes L0 1,1—the agent is in square [1, 1] at time 0—as well as FacingEast0 , HaveArrow0 , and WumpusAlive0 . We use the word fluent (from the Latin fluens, flowing) to refer an aspect of the world that changes. FLUENT “Fluent” is a synonym for “state variable,” in the sense described in the discussion of factored representations in Section 2.4.7 on page 57. Symbols associated with permanent aspects of the world do not need a time superscript and are sometimes called atemporal variables. ATEMPORAL VARIABLE We can connect stench and breeze percepts directly to the properties of the squares where they are experienced through the location fluent as follows.10 For any time step t and any square [x, y], we assert Lt x,y ⇒ (Breezet ⇔ Bx,y) Lt x,y ⇒ (Stencht ⇔ Sx,y) . Now, of course, we need axioms that allow the agent to keep track of fluents such as Lt x,y. These fluents change as the result of actions taken by the agent, so, in the terminology of Chapter 3, we need to write down the transition model of the wumpus world as a set of logical sentences. First, we need proposition symbols for the occurrences of actions. As with percepts, these symbols are indexed by time; thus, Forward0 means that the agent executes the Forward action at time 0. By convention, the percept for a given time step happens first, followed by the action for that time step, followed by a transition to the next time step. To describe how the world changes, we can try writing effect axioms that specify the EFFECT AXIOM outcome of an action at the next time step. For example, if the agent is at location [1, 1] facing east at time 0 and goes Forward, the result is that the agent is in square [2, 1] and no longer is in [1, 1]: L0 1,1 ∧ FacingEast0 ∧ Forward0 ⇒ (L1 2,1 ∧ ¬L1 1,1) . (7.1) We would need one such sentence for each possible time step, for each of the 16 squares, and each of the four orientations. We would also need similar sentences for the other actions: Grab, Shoot, Climb, TurnLeft, and TurnRight. Let us suppose that the agent does decide to move Forward at time 0 and asserts this fact into its knowledge base. Given the effect axiom in Equation (7.1), combined with the initial assertions about the state at time 0, the agent can now deduce that it is in [2, 1]. That is, ASK(KB, L1 2,1) = true. So far, so good. Unfortunately, the news elsewhere is less good: if we ASK(KB, HaveArrow1 ), the answer is false, that is, the agent cannot prove it still has the arrow; nor can it prove it doesn’t have it! The information has been lost because the effect axiom fails to state what remains unchanged as the result of an action. The need to do this gives rise to the frame problem.11 One possible solution to the frame problem would FRAME PROBLEM 10 Section 7.4.3 conveniently glossed over this requirement. 11 The name “frame problem” comes from “frame of reference” in physics—the assumed stationary background with respect to which motion is measured. It also has an analogy to the frames of a movie, in which normally most of the background stays constant while changes occur in the foreground.
  • 286. Section 7.7. Agents Based on Propositional Logic 267 be to add frame axioms explicitly asserting all the propositions that remain the same. For FRAME AXIOM example, for each time t we would have Forwardt ⇒ (HaveArrowt ⇔ HaveArrowt+1 ) Forwardt ⇒ (WumpusAlivet ⇔ WumpusAlivet+1 ) · · · where we explicitly mention every proposition that stays unchanged from time t to time t + 1 under the action Forward. Although the agent now knows that it still has the arrow after moving forward and that the wumpus hasn’t died or come back to life, the proliferation of frame axioms seems remarkably inefficient. In a world with m different actions and n fluents, the set of frame axioms will be of size O(mn). This specific manifestation of the frame problem is sometimes called the representational frame problem. Historically, the REPRESENTATIONAL FRAME PROBLEM problem was a significant one for AI researchers; we explore it further in the notes at the end of the chapter. The representational frame problem is significant because the real world has very many fluents, to put it mildly. Fortunately for us humans, each action typically changes no more than some small number k of those fluents—the world exhibits locality. Solving the repre- LOCALITY sentational frame problem requires defining the transition model with a set of axioms of size O(mk) rather than size O(mn). There is also an inferential frame problem: the problem INFERENTIAL FRAME PROBLEM of projecting forward the results of a t step plan of action in time O(kt) rather than O(nt). The solution to the problem involves changing one’s focus from writing axioms about actions to writing axioms about fluents. Thus, for each fluent F, we will have an axiom that defines the truth value of Ft+1 in terms of fluents (including F itself) at time t and the actions that may have occurred at time t. Now, the truth value of Ft+1 can be set in one of two ways: either the action at time t causes F to be true at t + 1, or F was already true at time t and the action at time t does not cause it to be false. An axiom of this form is called a successor-state axiom and has this schema: SUCCESSOR-STATE AXIOM Ft+1 ⇔ ActionCausesFt ∨ (Ft ∧ ¬ActionCausesNotFt ) . One of the simplest successor-state axioms is the one for HaveArrow. Because there is no action for reloading, the ActionCausesF t part goes away and we are left with HaveArrowt+1 ⇔ (HaveArrowt ∧ ¬Shoott ) . (7.2) For the agent’s location, the successor-state axioms are more elaborate. For example, Lt+1 1,1 is true if either (a) the agent moved Forward from [1, 2] when facing south, or from [2, 1] when facing west; or (b) Lt 1,1 was already true and the action did not cause movement (either because the action was not Forward or because the action bumped into a wall). Written out in propositional logic, this becomes Lt+1 1,1 ⇔ (Lt 1,1 ∧ (¬Forwardt ∨ Bumpt+1 )) ∨ (Lt 1,2 ∧ (Southt ∧ Forwardt )) (7.3) ∨ (Lt 2,1 ∧ (Westt ∧ Forwardt )) . Exercise 7.26 asks you to write out axioms for the remaining wumpus world fluents.
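Because one such axiom is needed for every fluent and every time step, these sentences are best generated programmatically. Here is a small sketch of ours, reusing the tuple notation from the earlier sketches and encoding time superscripts and location subscripts in the symbol names (a convention of ours, not the book's).

def have_arrow_axiom(t):
    """Successor-state axiom (7.2) for one step: HaveArrow(t+1) ⇔ (HaveArrow(t) ∧ ¬Shoot(t))."""
    return ('iff', 'HaveArrow%d' % (t + 1),
            ('and', 'HaveArrow%d' % t, ('not', 'Shoot%d' % t)))

def location_axiom_1_1(t):
    """Successor-state axiom (7.3) for the location fluent L1,1, written the same way."""
    return ('iff', 'L11_%d' % (t + 1),
            ('or', ('and', 'L11_%d' % t,
                           ('or', ('not', 'Forward%d' % t), 'Bump%d' % (t + 1))),
                   ('and', 'L12_%d' % t, ('and', 'South%d' % t, 'Forward%d' % t)),
                   ('and', 'L21_%d' % t, ('and', 'West%d' % t, 'Forward%d' % t))))

# One axiom per fluent per time step, up to some planning horizon:
transition_axioms = [axiom(t) for t in range(50)
                     for axiom in (have_arrow_axiom, location_axiom_1_1)]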
  • 287. 268 Chapter 7. Logical Agents Given a complete set of successor-state axioms and the other axioms listed at the begin- ning of this section, the agent will be able to ASK and answer any answerable question about the current state of the world. For example, in Section 7.2 the initial sequence of percepts and actions is ¬Stench0 ∧ ¬Breeze0 ∧ ¬Glitter0 ∧ ¬Bump0 ∧ ¬Scream0 ; Forward0 ¬Stench1 ∧ Breeze1 ∧ ¬Glitter1 ∧ ¬Bump1 ∧ ¬Scream1 ; TurnRight1 ¬Stench2 ∧ Breeze2 ∧ ¬Glitter2 ∧ ¬Bump2 ∧ ¬Scream2 ; TurnRight2 ¬Stench3 ∧ Breeze3 ∧ ¬Glitter3 ∧ ¬Bump3 ∧ ¬Scream3 ; Forward3 ¬Stench4 ∧ ¬Breeze4 ∧ ¬Glitter4 ∧ ¬Bump4 ∧ ¬Scream4 ; TurnRight4 ¬Stench5 ∧ ¬Breeze5 ∧ ¬Glitter5 ∧ ¬Bump5 ∧ ¬Scream5 ; Forward5 Stench6 ∧ ¬Breeze6 ∧ ¬Glitter6 ∧ ¬Bump6 ∧ ¬Scream6 At this point, we have ASK(KB, L6 1,2) = true, so the agent knows where it is. Moreover, ASK(KB, W1,3) = true and ASK(KB, P3,1) = true, so the agent has found the wumpus and one of the pits. The most important question for the agent is whether a square is OK to move into, that is, the square contains no pit nor live wumpus. It’s convenient to add axioms for this, having the form OKt x,y ⇔ ¬Px,y ∧ ¬(Wx,y ∧ WumpusAlivet ) . Finally, ASK(KB, OK6 2,2) = true, so the square [2, 2] is OK to move into. In fact, given a sound and complete inference algorithm such as DPLL, the agent can answer any answerable question about which squares are OK—and can do so in just a few milliseconds for small-to- medium wumpus worlds. Solving the representational and inferential frame problems is a big step forward, but a pernicious problem remains: we need to confirm that all the necessary preconditions of an action hold for it to have its intended effect. We said that the Forward action moves the agent ahead unless there is a wall in the way, but there are many other unusual exceptions that could cause the action to fail: the agent might trip and fall, be stricken with a heart attack, be carried away by giant bats, etc. Specifying all these exceptions is called the qualification problem. QUALIFICATION PROBLEM There is no complete solution within logic; system designers have to use good judgment in deciding how detailed they want to be in specifying their model, and what details they want to leave out. We will see in Chapter 13 that probability theory allows us to summarize all the exceptions without explicitly naming them. 7.7.2 A hybrid agent The ability to deduce various aspects of the state of the world can be combined fairly straight- forwardly with condition–action rules and with problem-solving algorithms from Chapters 3 and 4 to produce a hybrid agent for the wumpus world. Figure 7.20 shows one possible way HYBRID AGENT to do this. The agent program maintains and updates a knowledge base as well as a current plan. The initial knowledge base contains the atemporal axioms—those that don’t depend on t, such as the axiom relating the breeziness of squares to the presence of pits. At each time step, the new percept sentence is added along with all the axioms that depend on t, such
(The next section explains why the agent doesn't need axioms for future time steps.) Then, the agent uses logical inference, by ASKing questions of the knowledge base, to work out which squares are safe and which have yet to be visited.

The main body of the agent program constructs a plan based on a decreasing priority of goals. First, if there is a glitter, the program constructs a plan to grab the gold, follow a route back to the initial location, and climb out of the cave. Otherwise, if there is no current plan, the program plans a route to the closest safe square that it has not visited yet, making sure the route goes through only safe squares. Route planning is done with A* search, not with ASK. If there are no safe squares to explore, the next step—if the agent still has an arrow—is to try to make a safe square by shooting at one of the possible wumpus locations. These are determined by asking where ASK(KB, $\lnot W_{x,y}$) is false—that is, where it is not known that there is not a wumpus. The function PLAN-SHOT (not shown) uses PLAN-ROUTE to plan a sequence of actions that will line up this shot. If this fails, the program looks for a square to explore that is not provably unsafe—that is, a square for which ASK(KB, $\lnot OK^t_{x,y}$) returns false. If there is no such square, then the mission is impossible and the agent retreats to [1, 1] and climbs out of the cave.

7.7.3 Logical state estimation

The agent program in Figure 7.20 works quite well, but it has one major weakness: as time goes by, the computational expense involved in the calls to ASK goes up and up. This happens mainly because the required inferences have to go back further and further in time and involve more and more proposition symbols. Obviously, this is unsustainable—we cannot have an agent whose time to process each percept grows in proportion to the length of its life! What we really need is a constant update time—that is, independent of t. The obvious answer is to save, or cache, the results of inference, so that the inference process at the next time step can build on the results of earlier steps instead of having to start again from scratch.

As we saw in Section 4.4, the past history of percepts and all their ramifications can be replaced by the belief state—that is, some representation of the set of all possible current states of the world.(12) The process of updating the belief state as new percepts arrive is called state estimation. Whereas in Section 4.4 the belief state was an explicit list of states, here we can use a logical sentence involving the proposition symbols associated with the current time step, as well as the atemporal symbols. For example, the logical sentence

$WumpusAlive^1 \land L^1_{2,1} \land B_{2,1} \land (P_{3,1} \lor P_{2,2})$    (7.4)

represents the set of all states at time 1 in which the wumpus is alive, the agent is at [2, 1], that square is breezy, and there is a pit in [3, 1] or [2, 2] or both.

Maintaining an exact belief state as a logical formula turns out not to be easy. If there are n fluent symbols for time t, then there are $2^n$ possible states—that is, assignments of truth values to those symbols. Now, the set of belief states is the powerset (set of all subsets) of the set of physical states. There are $2^n$ physical states, hence $2^{2^n}$ belief states.
Even if we used the most compact possible encoding of logical formulas, with each belief state represented by a unique binary number, we would need numbers with $\log_2(2^{2^n}) = 2^n$ bits to label the current belief state. That is, exact state estimation may require logical formulas whose size is exponential in the number of symbols.

(12) We can think of the percept history itself as a representation of the belief state, but one that makes inference increasingly expensive as the history gets longer.
function HYBRID-WUMPUS-AGENT(percept) returns an action
  inputs: percept, a list, [stench, breeze, glitter, bump, scream]
  persistent: KB, a knowledge base, initially the atemporal "wumpus physics"
              t, a counter, initially 0, indicating time
              plan, an action sequence, initially empty

  TELL(KB, MAKE-PERCEPT-SENTENCE(percept, t))
  TELL the KB the temporal "physics" sentences for time t
  safe ← {[x, y] : ASK(KB, OK^t_{x,y}) = true}
  if ASK(KB, Glitter^t) = true then
      plan ← [Grab] + PLAN-ROUTE(current, {[1,1]}, safe) + [Climb]
  if plan is empty then
      unvisited ← {[x, y] : ASK(KB, L^{t'}_{x,y}) = false for all t' ≤ t}
      plan ← PLAN-ROUTE(current, unvisited ∩ safe, safe)
  if plan is empty and ASK(KB, HaveArrow^t) = true then
      possible_wumpus ← {[x, y] : ASK(KB, ¬W_{x,y}) = false}
      plan ← PLAN-SHOT(current, possible_wumpus, safe)
  if plan is empty then   // no choice but to take a risk
      not_unsafe ← {[x, y] : ASK(KB, ¬OK^t_{x,y}) = false}
      plan ← PLAN-ROUTE(current, unvisited ∩ not_unsafe, safe)
  if plan is empty then
      plan ← PLAN-ROUTE(current, {[1,1]}, safe) + [Climb]
  action ← POP(plan)
  TELL(KB, MAKE-ACTION-SENTENCE(action, t))
  t ← t + 1
  return action

function PLAN-ROUTE(current, goals, allowed) returns an action sequence
  inputs: current, the agent's current position
          goals, a set of squares; try to plan a route to one of them
          allowed, a set of squares that can form part of the route

  problem ← ROUTE-PROBLEM(current, goals, allowed)
  return A*-GRAPH-SEARCH(problem)

Figure 7.20 A hybrid agent program for the wumpus world. It uses a propositional knowledge base to infer the state of the world, and a combination of problem-solving search and domain-specific code to decide what actions to take.

One very common and natural scheme for approximate state estimation is to represent belief states as conjunctions of literals, that is, 1-CNF formulas. To do this, the agent program simply tries to prove $X^t$ and $\lnot X^t$ for each symbol $X^t$ (as well as each atemporal symbol whose truth value is not yet known), given the belief state at t − 1. The conjunction of provable literals becomes the new belief state, and the previous belief state is discarded.
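The 1-CNF update can be outlined as follows. This is an illustrative sketch under stated assumptions, not the book's implementation: `entails` stands for any sound and complete propositional entailment test (for instance, DPLL applied to the knowledge base plus the negated query), and the sentence representation is left abstract.

    def one_cnf_update(prev_belief, step_axioms, percept, candidates, entails):
        """Compute the 1-CNF (conjunction-of-literals) belief state for the next step.

        prev_belief  -- list of literals proved at the previous step (1-CNF)
        step_axioms  -- successor-state and other axioms for this time step
        percept      -- the new percept sentence
        candidates   -- symbols for the new time step, plus atemporal symbols
                        whose values are still unknown
        entails      -- entails(sentences, literal) -> bool, a sound and
                        complete propositional entailment test
        """
        kb = list(prev_belief) + list(step_axioms) + [percept]
        new_belief = []
        for x in candidates:
            if entails(kb, x):                 # X is provable: keep the literal X
                new_belief.append(x)
            elif entails(kb, ("not", x)):      # ~X is provable: keep the literal ~X
                new_belief.append(("not", x))
            # otherwise X is unknown; dropping it is where information may be lost
        return new_belief                      # the previous belief state is discarded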
Figure 7.21 Depiction of a 1-CNF belief state (bold outline) as a simply representable, conservative approximation to the exact (wiggly) belief state (shaded region with dashed outline). Each possible world is shown as a circle; the shaded ones are consistent with all the percepts.

It is important to understand that this scheme may lose some information as time goes along. For example, if the sentence in Equation (7.4) were the true belief state, then neither $P_{3,1}$ nor $P_{2,2}$ would be provable individually and neither would appear in the 1-CNF belief state. (Exercise 7.27 explores one possible solution to this problem.) On the other hand, because every literal in the 1-CNF belief state is proved from the previous belief state, and the initial belief state is a true assertion, we know that the entire 1-CNF belief state must be true. Thus, the set of possible states represented by the 1-CNF belief state includes all states that are in fact possible given the full percept history. As illustrated in Figure 7.21, the 1-CNF belief state acts as a simple outer envelope, or conservative approximation, around the exact belief state. We see this idea of conservative approximations to complicated sets as a recurring theme in many areas of AI.

7.7.4 Making plans by propositional inference

The agent in Figure 7.20 uses logical inference to determine which squares are safe, but uses A* search to make plans. In this section, we show how to make plans by logical inference. The basic idea is very simple:

1. Construct a sentence that includes
   (a) $Init^0$, a collection of assertions about the initial state;
   (b) $Transition^1, \ldots, Transition^t$, the successor-state axioms for all possible actions at each time up to some maximum time t;
   (c) the assertion that the goal is achieved at time t: $HaveGold^t \land ClimbedOut^t$.
2. Present the whole sentence to a SAT solver. If the solver finds a satisfying model, then the goal is achievable; if the sentence is unsatisfiable, then the planning problem is impossible.

3. Assuming a model is found, extract from the model those variables that represent actions and are assigned true. Together they represent a plan to achieve the goals.

A propositional planning procedure, SATPLAN, is shown in Figure 7.22. It implements the basic idea just given, with one twist. Because the agent does not know how many steps it will take to reach the goal, the algorithm tries each possible number of steps t, up to some maximum conceivable plan length T_max. In this way, it is guaranteed to find the shortest plan if one exists. Because of the way SATPLAN searches for a solution, this approach cannot be used in a partially observable environment; SATPLAN would just set the unobservable variables to the values it needs to create a solution.

function SATPLAN(init, transition, goal, T_max) returns solution or failure
  inputs: init, transition, goal, constitute a description of the problem
          T_max, an upper limit for plan length

  for t = 0 to T_max do
      cnf ← TRANSLATE-TO-SAT(init, transition, goal, t)
      model ← SAT-SOLVER(cnf)
      if model is not null then
          return EXTRACT-SOLUTION(model)
  return failure

Figure 7.22 The SATPLAN algorithm. The planning problem is translated into a CNF sentence in which the goal is asserted to hold at a fixed time step t and axioms are included for each time step up to t. If the satisfiability algorithm finds a model, then a plan is extracted by looking at those proposition symbols that refer to actions and are assigned true in the model. If no model exists, then the process is repeated with the goal moved one step later.

The key step in using SATPLAN is the construction of the knowledge base. It might seem, on casual inspection, that the wumpus world axioms in Section 7.7.1 suffice for steps 1(a) and 1(b) above. There is, however, a significant difference between the requirements for entailment (as tested by ASK) and those for satisfiability. Consider, for example, the agent's location, initially [1, 1], and suppose the agent's unambitious goal is to be in [2, 1] at time 1. The initial knowledge base contains $L^0_{1,1}$ and the goal is $L^1_{2,1}$. Using ASK, we can prove $L^1_{2,1}$ if $Forward^0$ is asserted, and, reassuringly, we cannot prove $L^1_{2,1}$ if, say, $Shoot^0$ is asserted instead. Now, SATPLAN will find the plan [$Forward^0$]; so far, so good. Unfortunately, SATPLAN also finds the plan [$Shoot^0$]. How could this be? To find out, we inspect the model that SATPLAN constructs: it includes the assignment $L^0_{2,1}$; that is, the agent can be in [2, 1] at time 1 by being there at time 0 and shooting. One might ask, "Didn't we say the agent is in [1, 1] at time 0?" Yes, we did, but we didn't tell the agent that it can't be in two places at once! For entailment, $L^0_{2,1}$ is unknown and cannot, therefore, be used in a proof; for satisfiability, on the other hand, $L^0_{2,1}$ is unknown and can, therefore, be set to whatever value helps to make the goal true.
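A compact rendering of the loop in Figure 7.22 might look like the following sketch. It again assumes the third-party python-sat package, and `translate_to_sat` and `extract_solution` are hypothetical helpers standing in for TRANSLATE-TO-SAT and EXTRACT-SOLUTION: the first must encode the initial state, the temporal axioms up to time t, and the goal at time t as CNF clauses, along with a map from action symbols to SAT variables.

    from pysat.solvers import Glucose3   # assumed third-party SAT solver binding

    def satplan(translate_to_sat, extract_solution, t_max):
        """Try plan lengths t = 0..t_max; return the first (hence shortest) plan found."""
        for t in range(t_max + 1):
            clauses, action_vars = translate_to_sat(t)   # CNF for init, transitions, goal at t
            solver = Glucose3(bootstrap_with=clauses)
            try:
                if solver.solve():
                    model = set(solver.get_model())      # literals: positive = true, negative = false
                    return extract_solution(model, action_vars)
            finally:
                solver.delete()
        return None                                      # failure: no plan of length <= t_max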
For this reason, SATPLAN is a good debugging tool for knowledge bases because it reveals places where knowledge is missing. In this particular case, we can fix the knowledge base by asserting that, at each time step, the agent is in exactly one location, using a collection of sentences similar to those used to assert the existence of exactly one wumpus. Alternatively, we can assert $\lnot L^0_{x,y}$ for all locations other than [1, 1]; the successor-state axiom for location takes care of subsequent time steps. The same fixes also work to make sure the agent has only one orientation.

SATPLAN has more surprises in store, however. The first is that it finds models with impossible actions, such as shooting with no arrow. To understand why, we need to look more carefully at what the successor-state axioms (such as Equation (7.3)) say about actions whose preconditions are not satisfied. The axioms do predict correctly that nothing will happen when such an action is executed (see Exercise 10.14), but they do not say that the action cannot be executed! To avoid generating plans with illegal actions, we must add precondition axioms stating that an action occurrence requires the preconditions to be satisfied.(13) For example, we need to say, for each time t, that

$Shoot^t \Rightarrow HaveArrow^t$ .

This ensures that if a plan selects the Shoot action at any time, it must be the case that the agent has an arrow at that time.

SATPLAN's second surprise is the creation of plans with multiple simultaneous actions. For example, it may come up with a model in which both $Forward^0$ and $Shoot^0$ are true, which is not allowed. To eliminate this problem, we introduce action exclusion axioms: for every pair of actions $A^t_i$ and $A^t_j$ we add the axiom

$\lnot A^t_i \lor \lnot A^t_j$ .

It might be pointed out that walking forward and shooting at the same time is not so hard to do, whereas, say, shooting and grabbing at the same time is rather impractical. By imposing action exclusion axioms only on pairs of actions that really do interfere with each other, we can allow for plans that include multiple simultaneous actions—and because SATPLAN finds the shortest legal plan, we can be sure that it will take advantage of this capability.

To summarize, SATPLAN finds models for a sentence containing the initial state, the goal, the successor-state axioms, the precondition axioms, and the action exclusion axioms. It can be shown that this collection of axioms is sufficient, in the sense that there are no longer any spurious "solutions." Any model satisfying the propositional sentence will be a valid plan for the original problem. Modern SAT-solving technology makes the approach quite practical. For example, a DPLL-style solver has no difficulty in generating the 11-step solution for the wumpus world instance shown in Figure 7.2.

This section has described a declarative approach to agent construction: the agent works by a combination of asserting sentences in the knowledge base and performing logical inference. This approach has some weaknesses hidden in phrases such as "for each time t" and "for each square [x, y]." For any practical agent, these phrases have to be implemented by code that generates instances of the general sentence schema automatically for insertion into the knowledge base. For a wumpus world of reasonable size—one comparable to a smallish computer game—we might need a 100 × 100 board and 1000 time steps, leading to knowledge bases with tens or hundreds of millions of sentences. Not only does this become rather impractical, but it also illustrates a deeper problem: we know something about the wumpus world—namely, that the "physics" works the same way across all squares and all time steps—that we cannot express directly in the language of propositional logic. To solve this problem, we need a more expressive language, one in which phrases like "for each time t" and "for each square [x, y]" can be written in a natural way. First-order logic, described in Chapter 8, is such a language; in first-order logic a wumpus world of any size and duration can be described in about ten sentences rather than ten million or ten trillion.

(13) Notice that the addition of precondition axioms means that we need not include preconditions for actions in the successor-state axioms.
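To make the schema-instantiation point concrete, here is a small illustrative sketch (with assumed action names and preconditions, not the book's code) of generating precondition and action exclusion axioms for every time step up to a plan horizon.

    from itertools import combinations

    # Illustrative wumpus-world action vocabulary and preconditions.
    ACTIONS = ["Forward", "TurnLeft", "TurnRight", "Grab", "Shoot", "Climb"]
    PRECONDITIONS = {"Shoot": ["HaveArrow"]}          # e.g., Shoot^t ==> HaveArrow^t

    def precondition_axioms(t):
        return [f"{a}{t} ==> {p}{t}"
                for a, pre in PRECONDITIONS.items() for p in pre]

    def action_exclusion_axioms(t):
        # At most one action per time step: ~A_i^t | ~A_j^t for every pair of actions.
        return [f"~{a}{t} | ~{b}{t}" for a, b in combinations(ACTIONS, 2)]

    def temporal_axioms(t_max):
        # Instantiate both schemas for each time step 0 .. t_max - 1.
        sentences = []
        for t in range(t_max):
            sentences += precondition_axioms(t) + action_exclusion_axioms(t)
        return sentences

A first-order language removes the need for this kind of per-step, per-square generation, which is exactly the motivation for Chapter 8.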
7.8 SUMMARY

We have introduced knowledge-based agents and have shown how to define a logic with which such agents can reason about the world. The main points are as follows:

• Intelligent agents need knowledge about the world in order to reach good decisions.
• Knowledge is contained in agents in the form of sentences in a knowledge representation language that are stored in a knowledge base.
• A knowledge-based agent is composed of a knowledge base and an inference mechanism. It operates by storing sentences about the world in its knowledge base, using the inference mechanism to infer new sentences, and using these sentences to decide what action to take.
• A representation language is defined by its syntax, which specifies the structure of sentences, and its semantics, which defines the truth of each sentence in each possible world or model.
• The relationship of entailment between sentences is crucial to our understanding of reasoning. A sentence α entails another sentence β if β is true in all worlds where α is true. Equivalent definitions include the validity of the sentence α ⇒ β and the unsatisfiability of the sentence α ∧ ¬β.
• Inference is the process of deriving new sentences from old ones. Sound inference algorithms derive only sentences that are entailed; complete algorithms derive all sentences that are entailed.
• Propositional logic is a simple language consisting of proposition symbols and logical connectives. It can handle propositions that are known true, known false, or completely unknown.
• The set of possible models, given a fixed propositional vocabulary, is finite, so entailment can be checked by enumerating models. Efficient model-checking inference algorithms for propositional logic include backtracking and local search methods and can often solve large problems quickly.
  • 294. Bibliographical and Historical Notes 275 • Inference rules are patterns of sound inference that can be used to find proofs. The resolution rule yields a complete inference algorithm for knowledge bases that are expressed in conjunctive normal form. Forward chaining and backward chaining are very natural reasoning algorithms for knowledge bases in Horn form. • Local search methods such as WALKSAT can be used to find solutions. Such algo- rithms are sound but not complete. • Logical state estimation involves maintaining a logical sentence that describes the set of possible states consistent with the observation history. Each update step requires inference using the transition model of the environment, which is built from successor- state axioms that specify how each fluent changes. • Decisions within a logical agent can be made by SAT solving: finding possible models specifying future action sequences that reach the goal. This approach works only for fully observable or sensorless environments. • Propositional logic does not scale to environments of unbounded size because it lacks the expressive power to deal concisely with time, space, and universal patterns of rela- tionships among objects. BIBLIOGRAPHICAL AND HISTORICAL NOTES John McCarthy’s paper “Programs with Common Sense” (McCarthy, 1958, 1968) promul- gated the notion of agents that use logical reasoning to mediate between percepts and actions. It also raised the flag of declarativism, pointing out that telling an agent what it needs to know is an elegant way to build software. Allen Newell’s (1982) article “The Knowledge Level” makes the case that rational agents can be described and analyzed at an abstract level defined by the knowledge they possess rather than the programs they run. The declarative and proce- dural approaches to AI are analyzed in depth by Boden (1977). The debate was revived by, among others, Brooks (1991) and Nilsson (1991), and continues to this day (Shaparau et al., 2008). Meanwhile, the declarative approach has spread into other areas of computer science such as networking (Loo et al., 2006). Logic itself had its origins in ancient Greek philosophy and mathematics. Various log- ical principles—principles connecting the syntactic structure of sentences with their truth and falsity, with their meaning, or with the validity of arguments in which they figure—are scattered in the works of Plato. The first known systematic study of logic was carried out by Aristotle, whose work was assembled by his students after his death in 322 B.C. as a treatise called the Organon. Aristotle’s syllogisms were what we would now call inference SYLLOGISM rules. Although the syllogisms included elements of both propositional and first-order logic, the system as a whole lacked the compositional properties required to handle sentences of arbitrary complexity. The closely related Megarian and Stoic schools (originating in the fifth century B.C. and continuing for several centuries thereafter) began the systematic study of the basic logical connectives. The use of truth tables for defining connectives is due to Philo of Megara. The
  • 295. 276 Chapter 7. Logical Agents Stoics took five basic inference rules as valid without proof, including the rule we now call Modus Ponens. They derived a number of other rules from these five, using, among other principles, the deduction theorem (page 249) and were much clearer about the notion of proof than was Aristotle. A good account of the history of Megarian and Stoic logic is given by Benson Mates (1953). The idea of reducing logical inference to a purely mechanical process applied to a for- mal language is due to Wilhelm Leibniz (1646–1716), although he had limited success in im- plementing the ideas. George Boole (1847) introduced the first comprehensive and workable system of formal logic in his book The Mathematical Analysis of Logic. Boole’s logic was closely modeled on the ordinary algebra of real numbers and used substitution of logically equivalent expressions as its primary inference method. Although Boole’s system still fell short of full propositional logic, it was close enough that other mathematicians could quickly fill in the gaps. Schröder (1877) described conjunctive normal form, while Horn form was introduced much later by Alfred Horn (1951). The first comprehensive exposition of modern propositional logic (and first-order logic) is found in Gottlob Frege’s (1879) Begriffschrift (“Concept Writing” or “Conceptual Notation”). The first mechanical device to carry out logical inferences was constructed by the third Earl of Stanhope (1753–1816). The Stanhope Demonstrator could handle syllogisms and certain inferences in the theory of probability. William Stanley Jevons, one of those who improved upon and extended Boole’s work, constructed his “logical piano” in 1869 to per- form inferences in Boolean logic. An entertaining and instructive history of these and other early mechanical devices for reasoning is given by Martin Gardner (1968). The first pub- lished computer program for logical inference was the Logic Theorist of Newell, Shaw, and Simon (1957). This program was intended to model human thought processes. Mar- tin Davis (1957) had actually designed a program that came up with a proof in 1954, but the Logic Theorist’s results were published slightly earlier. Truth tables as a method of testing validity or unsatisfiability in propositional logic were introduced independently by Emil Post (1921) and Ludwig Wittgenstein (1922). In the 1930s, a great deal of progress was made on inference methods for first-order logic. In particular, Gödel (1930) showed that a complete procedure for inference in first-order logic could be obtained via a reduction to propositional logic, using Herbrand’s theorem (Herbrand, 1930). We take up this history again in Chapter 9; the important point here is that the development of efficient propositional algorithms in the 1960s was motivated largely by the interest of mathematicians in an effective theorem prover for first-order logic. The Davis–Putnam algo- rithm (Davis and Putnam, 1960) was the first effective algorithm for propositional resolution but was in most cases much less efficient than the DPLL backtracking algorithm introduced two years later (1962). The full resolution rule and a proof of its completeness appeared in a seminal paper by J. A. Robinson (1965), which also showed how to do first-order reasoning without resort to propositional techniques. Stephen Cook (1971) showed that deciding satisfiability of a sentence in propositional logic (the SAT problem) is NP-complete. 
Since deciding entailment is equivalent to deciding unsatisfiability, it is co-NP-complete. Many subsets of propositional logic are known for which the satisfiability problem is polynomially solvable; Horn clauses are one such subset.
  • 296. Bibliographical and Historical Notes 277 The linear-time forward-chaining algorithm for Horn clauses is due to Dowling and Gallier (1984), who describe their algorithm as a dataflow process similar to the propagation of sig- nals in a circuit. Early theoretical investigations showed that DPLL has polynomial average-case com- plexity for certain natural distributions of problems. This potentially exciting fact became less exciting when Franco and Paull (1983) showed that the same problems could be solved in constant time simply by guessing random assignments. The random-generation method described in the chapter produces much harder problems. Motivated by the empirical success of local search on these problems, Koutsoupias and Papadimitriou (1992) showed that a sim- ple hill-climbing algorithm can solve almost all satisfiability problem instances very quickly, suggesting that hard problems are rare. Moreover, Schöning (1999) exhibited a randomized hill-climbing algorithm whose worst-case expected run time on 3-SAT problems (that is, sat- isfiability of 3-CNF sentences) is O(1.333n)—still exponential, but substantially faster than previous worst-case bounds. The current record is O(1.324n) (Iwama and Tamaki, 2004). Achlioptas et al. (2004) and Alekhnovich et al. (2005) exhibit families of 3-SAT instances for which all known DPLL-like algorithms require exponential running time. On the practical side, efficiency gains in propositional solvers have been marked. Given ten minutes of computing time, the original DPLL algorithm in 1962 could only solve prob- lems with no more than 10 or 15 variables. By 1995 the SATZ solver (Li and Anbulagan, 1997) could handle 1,000 variables, thanks to optimized data structures for indexing vari- ables. Two crucial contributions were the watched literal indexing technique of Zhang and Stickel (1996), which makes unit propagation very efficient, and the introduction of clause (i.e., constraint) learning techniques from the CSP community by Bayardo and Schrag (1997). Using these ideas, and spurred by the prospect of solving industrial-scale circuit verification problems, Moskewicz et al. (2001) developed the CHAFF solver, which could handle prob- lems with millions of variables. Beginning in 2002, SAT competitions have been held reg- ularly; most of the winning entries have either been descendants of CHAFF or have used the same general approach. RSAT (Pipatsrisawat and Darwiche, 2007), the 2007 winner, falls in the latter category. Also noteworthy is MINISAT (Een and Sörensson, 2003), an open-source implementation available at http://minisat.se that is designed to be easily modified and improved. The current landscape of solvers is surveyed by Gomes et al. (2008). Local search algorithms for satisfiability were tried by various authors throughout the 1980s; all of the algorithms were based on the idea of minimizing the number of unsatisfied clauses (Hansen and Jaumard, 1990). A particularly effective algorithm was developed by Gu (1989) and independently by Selman et al. (1992), who called it GSAT and showed that it was capable of solving a wide range of very hard problems very quickly. The WALKSAT algorithm described in the chapter is due to Selman et al. (1996). The “phase transition” in satisfiability of random k-SAT problems was first observed by Simon and Dubois (1989) and has given rise to a great deal of theoretical and empirical research—due, in part, to the obvious connection to phase transition phenomena in statistical physics. 
Cheeseman et al. (1991) observed phase transitions in several CSPs and conjectured that all NP-hard problems have a phase transition. Crawford and Auton (1993) located the 3-SAT transition at a clause/variable ratio of around 4.26, noting that this coincides with a
  • 297. 278 Chapter 7. Logical Agents sharp peak in the run time of their SAT solver. Cook and Mitchell (1997) provide an excellent summary of the early literature on the problem. The current state of theoretical understanding is summarized by Achlioptas (2009). The satisfiability threshold conjecture states that, for each k, there is a sharp satisfiability SATISFIABILITY THRESHOLD CONJECTURE threshold rk, such that as the number of variables n → ∞, instances below the threshold are satisfiable with probability 1, while those above the threshold are unsatisfiable with proba- bility 1. The conjecture was not quite proved by Friedgut (1999): a sharp threshold exists but its location might depend on n even as n → ∞. Despite significant progress in asymptotic analysis of the threshold location for large k (Achlioptas and Peres, 2004; Achlioptas et al., 2007), all that can be proved for k = 3 is that it lies in the range [3.52,4.51]. Current theory suggests that a peak in the run time of a SAT solver is not necessarily related to the satisfia- bility threshold, but instead to a phase transition in the solution distribution and structure of SAT instances. Empirical results due to Coarfa et al. (2003) support this view. In fact, al- gorithms such as survey propagation (Parisi and Zecchina, 2002; Maneva et al., 2007) take SURVEY PROPAGATION advantage of special properties of random SAT instances near the satisfiability threshold and greatly outperform general SAT solvers on such instances. The best sources for information on satisfiability, both theoretical and practical, are the Handbook of Satisfiability (Biere et al., 2009) and the regular International Conferences on Theory and Applications of Satisfiability Testing, known as SAT. The idea of building agents with propositional logic can be traced back to the seminal paper of McCulloch and Pitts (1943), which initiated the field of neural networks. Con- trary to popular supposition, the paper was concerned with the implementation of a Boolean circuit-based agent design in the brain. Circuit-based agents, which perform computation by propagating signals in hardware circuits rather than running algorithms in general-purpose computers, have received little attention in AI, however. The most notable exception is the work of Stan Rosenschein (Rosenschein, 1985; Kaelbling and Rosenschein, 1990), who de- veloped ways to compile circuit-based agents from declarative descriptions of the task envi- ronment. (Rosenschein’s approach is described at some length in the second edition of this book.) The work of Rod Brooks (1986, 1989) demonstrates the effectiveness of circuit-based designs for controlling robots—a topic we take up in Chapter 25. Brooks (1991) argues that circuit-based designs are all that is needed for AI—that representation and reasoning are cumbersome, expensive, and unnecessary. In our view, neither approach is sufficient by itself. Williams et al. (2003) show how a hybrid agent design not too different from our wumpus agent has been used to control NASA spacecraft, planning sequences of actions and diagnosing and recovering from faults. The general problem of keeping track of a partially observable environment was intro- duced for state-based representations in Chapter 4. Its instantiation for propositional repre- sentations was studied by Amir and Russell (2003), who identified several classes of envi- ronments that admit efficient state-estimation algorithms and showed that for several other classes the problem is intractable. 
The temporal-projection problem, which involves determining what propositions hold true after an action sequence is executed, can be seen as a special case of state estimation with empty percepts. Many authors have studied this problem because of its importance in planning; some important hardness results were established by
  • 298. Exercises 279 Liberatore (1997). The idea of representing a belief state with propositions can be traced to Wittgenstein (1922). Logical state estimation, of course, requires a logical representation of the effects of actions—a key problem in AI since the late 1950s. The dominant proposal has been the sit- uation calculus formalism (McCarthy, 1963), which is couched within first-order logic. We discuss situation calculus, and various extensions and alternatives, in Chapters 10 and 12. The approach taken in this chapter—using temporal indices on propositional variables—is more restrictive but has the benefit of simplicity. The general approach embodied in the SATPLAN algorithm was proposed by Kautz and Selman (1992). Later generations of SATPLAN were able to take advantage of the advances in SAT solvers, described earlier, and remain among the most effective ways of solving difficult problems (Kautz, 2006). The frame problem was first recognized by McCarthy and Hayes (1969). Many re- searchers considered the problem unsolvable within first-order logic, and it spurred a great deal of research into nonmonotonic logics. Philosophers from Dreyfus (1972) to Crockett (1994) have cited the frame problem as one symptom of the inevitable failure of the entire AI enterprise. The solution of the frame problem with successor-state axioms is due to Ray Reiter (1991). Thielscher (1999) identifies the inferential frame problem as a separate idea and provides a solution. In retrospect, one can see that Rosenschein’s (1985) agents were using circuits that implemented successor-state axioms, but Rosenschein did not notice that the frame problem was thereby largely solved. Foo (2001) explains why the discrete-event control theory models typically used by engineers do not have to explicitly deal with the frame problem: because they are dealing with prediction and control, not with explanation and reasoning about counterfactual situations. Modern propositional solvers have wide applicability in industrial applications. The ap- plication of propositional inference in the synthesis of computer hardware is now a standard technique having many large-scale deployments (Nowick et al., 1993). The SATMC satisfi- ability checker was used to detect a previously unknown vulnerability in a Web browser user sign-on protocol (Armando et al., 2008). The wumpus world was invented by Gregory Yob (1975). Ironically, Yob developed it because he was bored with games played on a rectangular grid: the topology of his original wumpus world was a dodecahedron, and we put it back in the boring old grid. Michael Genesereth was the first to suggest that the wumpus world be used as an agent testbed. EXERCISES 7.1 Suppose the agent has progressed to the point shown in Figure 7.4(a), page 239, having perceived nothing in [1,1], a breeze in [2,1], and a stench in [1,2], and is now concerned with the contents of [1,3], [2,2], and [3,1]. Each of these can contain a pit, and at most one can contain a wumpus. Following the example of Figure 7.5, construct the set of possible worlds. (You should find 32 of them.) Mark the worlds in which the KB is true and those in which
  • 299. 280 Chapter 7. Logical Agents each of the following sentences is true: α2 = “There is no pit in [2,2].” α3 = “There is a wumpus in [1,3].” Hence show that KB |= α2 and KB |= α3. 7.2 (Adapted from Barwise and Etchemendy (1993).) Given the following, can you prove that the unicorn is mythical? How about magical? Horned? If the unicorn is mythical, then it is immortal, but if it is not mythical, then it is a mortal mammal. If the unicorn is either immortal or a mammal, then it is horned. The unicorn is magical if it is horned. 7.3 Consider the problem of deciding whether a propositional logic sentence is true in a given model. a. Write a recursive algorithm PL-TRUE?(s, m) that returns true if and only if the sen- tence s is true in the model m (where m assigns a truth value for every symbol in s). The algorithm should run in time linear in the size of the sentence. (Alternatively, use a version of this function from the online code repository.) b. Give three examples of sentences that can be determined to be true or false in a partial model that does not specify a truth value for some of the symbols. c. Show that the truth value (if any) of a sentence in a partial model cannot be determined efficiently in general. d. Modify your PL-TRUE? algorithm so that it can sometimes judge truth from partial models, while retaining its recursive structure and linear run time. Give three examples of sentences whose truth in a partial model is not detected by your algorithm. e. Investigate whether the modified algorithm makes TT-ENTAILS? more efficient. 7.4 Which of the following are correct? a. False |= True. b. True |= False. c. (A ∧ B) |= (A ⇔ B). d. A ⇔ B |= A ∨ B. e. A ⇔ B |= ¬A ∨ B. f. (A ∨ B) ∧ (¬C ∨ ¬D ∨ E) |= (A ∨ B ∨ C) ∧ (B ∧ C ∧ D ⇒ E). g. (A ∨ B) ∧ (¬C ∨ ¬D ∨ E) |= (A ∨ B) ∧ (¬D ∨ E). h. (A ∨ B) ∧ ¬(A ⇒ B) is satisfiable. i. (A ∧ B) ⇒ C |= (A ⇒ C) ∨ (B ⇒ C). j. (C ∨ (¬A ∧ ¬B)) ≡ ((A ⇒ C) ∧ (B ⇒ C)). k. (A ⇔ B) ∧ (¬A ∨ B) is satisfiable. l. (A ⇔ B) ⇔ C has the same number of models as (A ⇔ B) for any fixed set of proposition symbols that includes A, B, C.
  • 300. Exercises 281 7.5 Prove each of the following assertions: a. α is valid if and only if True |= α. b. For any α, False |= α. c. α |= β if and only if the sentence (α ⇒ β) is valid. d. α ≡ β if and only if the sentence (α ⇔ β) is valid. e. α |= β if and only if the sentence (α ∧ ¬β) is unsatisfiable. 7.6 Prove, or find a counterexample to, each of the following assertions: a. If α |= γ or β |= γ (or both) then (α ∧ β) |= γ b. If (α ∧ β) |= γ then α |= γ or β |= γ (or both). c. If α |= (β ∨ γ) then α |= β or α |= γ (or both). 7.7 Consider a vocabulary with only four propositions, A, B, C, and D. How many models are there for the following sentences? a. B ∨ C. b. ¬A ∨ ¬B ∨ ¬C ∨ ¬D. c. (A ⇒ B) ∧ A ∧ ¬B ∧ C ∧ D. 7.8 We have defined four binary logical connectives. a. Are there any others that might be useful? b. How many binary connectives can there be? c. Why are some of them not very useful? 7.9 Using a method of your choice, verify each of the equivalences in Figure 7.11 (page 249). 7.10 Decide whether each of the following sentences is valid, unsatisfiable, or neither. Ver- ify your decisions using truth tables or the equivalence rules of Figure 7.11 (page 249). a. Smoke ⇒ Smoke b. Smoke ⇒ Fire c. (Smoke ⇒ Fire) ⇒ (¬Smoke ⇒ ¬Fire) d. Smoke ∨ Fire ∨ ¬Fire e. ((Smoke ∧ Heat) ⇒ Fire) ⇔ ((Smoke ⇒ Fire) ∨ (Heat ⇒ Fire)) f. Big ∨ Dumb ∨ (Big ⇒ Dumb) g. (Big ∧ Dumb) ∨ ¬Dumb 7.11 Any propositional logic sentence is logically equivalent to the assertion that each pos- sible world in which it would be false is not the case. From this observation, prove that any sentence can be written in CNF. 7.12 Use resolution to prove the sentence ¬A∧¬B from the clauses in Exercise 7.19. 7.13 This exercise looks into the relationship between clauses and implication sentences.
  • 301. 282 Chapter 7. Logical Agents a. Show that the clause (¬P1 ∨ · · · ∨ ¬Pm ∨ Q) is logically equivalent to the implication sentence (P1 ∧ · · · ∧ Pm) ⇒ Q. b. Show that every clause (regardless of the number of positive literals) can be written in the form (P1 ∧ · · · ∧ Pm) ⇒ (Q1 ∨ · · · ∨ Qn), where the Ps and Qs are proposition symbols. A knowledge base consisting of such sentences is in implicative normal form or Kowalski form (Kowalski, 1979). IMPLICATIVE NORMAL FORM c. Write down the full resolution rule for sentences in implicative normal form. 7.14 According to some political pundits, a person who is radical (R) is electable (E) if he/she is conservative (C), but otherwise is not electable. a. Which of the following are correct representations of this assertion? (i) (R ∧ E) ⇐⇒ C (ii) R ⇒ (E ⇐⇒ C) (iii) R ⇒ ((C ⇒ E) ∨ ¬E) b. Which of the sentences in (a) can be expressed in Horn form? 7.15 This question considers representing satisfiability (SAT) problems as CSPs. a. Draw the constraint graph corresponding to the SAT problem (¬X1 ∨ X2) ∧ (¬X2 ∨ X3) ∧ . . . ∧ (¬Xn−1 ∨ Xn) for the particular case n = 4. b. How many solutions are there for this general SAT problem as a function of n? c. Suppose we apply BACKTRACKING-SEARCH (page 215) to find all solutions to a SAT CSP of the type given in (a). (To find all solutions to a CSP, we simply modify the basic algorithm so it continues searching after each solution is found.) Assume that variables are ordered X1, . . . , Xn and false is ordered before true. How much time will the algorithm take to terminate? (Write an O(·) expression as a function of n.) d. We know that SAT problems in Horn form can be solved in linear time by forward chaining (unit propagation). We also know that every tree-structured binary CSP with discrete, finite domains can be solved in time linear in the number of variables (Sec- tion 6.5). Are these two facts connected? Discuss. 7.16 Prove each of the following assertions: a. Every pair of propositional clauses either has no resolvents, or all their resolvents are logically equivalent. b. There is no clause that, when resolved with itself, yields (after factoring) the clause (¬P ∨ ¬Q). c. If a propositional clause C can be resolved with a copy of itself, it must be logically equivalent to True. 7.17 Consider the following sentence: [(Food ⇒ Party) ∨ (Drinks ⇒ Party)] ⇒ [(Food ∧ Drinks) ⇒ Party] .
  • 302. Exercises 283 a. Determine, using enumeration, whether this sentence is valid, satisfiable (but not valid), or unsatisfiable. b. Convert the left-hand and right-hand sides of the main implication into CNF, showing each step, and explain how the results confirm your answer to (a). c. Prove your answer to (a) using resolution. 7.18 A sentence is in disjunctive normal form (DNF) if it is the disjunction of conjunctions DISJUNCTIVE NORMAL FORM of literals. For example, the sentence (A ∧ B ∧ ¬C) ∨ (¬A ∧ C) ∨ (B ∧ ¬C) is in DNF. a. Any propositional logic sentence is logically equivalent to the assertion that some pos- sible world in which it would be true is in fact the case. From this observation, prove that any sentence can be written in DNF. b. Construct an algorithm that converts any sentence in propositional logic into DNF. (Hint: The algorithm is similar to the algorithm for conversion to CNF given in Sec- tion 7.5.2.) c. Construct a simple algorithm that takes as input a sentence in DNF and returns a satis- fying assignment if one exists, or reports that no satisfying assignment exists. d. Apply the algorithms in (b) and (c) to the following set of sentences: A ⇒ B B ⇒ C C ⇒ ¬A . e. Since the algorithm in (b) is very similar to the algorithm for conversion to CNF, and since the algorithm in (c) is much simpler than any algorithm for solving a set of sen- tences in CNF, why is this technique not used in automated reasoning? 7.19 Convert the following set of sentences to clausal form. S1: A ⇔ (C ∨ E). S2: E ⇒ D. S3: B ∧ F ⇒ ¬C. S4: E ⇒ C. S5: C ⇒ F. S6: C ⇒ B Give a trace of the execution of DPLL on the conjunction of these clauses. 7.20 Is a randomly generated 4-CNF sentence with n symbols and m clauses more or less likely to be solvable than a randomly generated 3-CNF sentence with n symbols and m clauses? Explain. 7.21 Minesweeper, the well-known computer game, is closely related to the wumpus world. A minesweeper world is a rectangular grid of N squares with M invisible mines scattered among them. Any square may be probed by the agent; instant death follows if a mine is probed. Minesweeper indicates the presence of mines by revealing, in each probed square, the number of mines that are directly or diagonally adjacent. The goal is to probe every unmined square.
  • 303. 284 Chapter 7. Logical Agents a. Let Xi,j be true iff square [i, j] contains a mine. Write down the assertion that exactly two mines are adjacent to [1,1] as a sentence involving some logical combination of Xi,j propositions. b. Generalize your assertion from (a) by explaining how to construct a CNF sentence asserting that k of n neighbors contain mines. c. Explain precisely how an agent can use DPLL to prove that a given square does (or does not) contain a mine, ignoring the global constraint that there are exactly M mines in all. d. Suppose that the global constraint is constructed from your method from part (b). How does the number of clauses depend on M and N? Suggest a way to modify DPLL so that the global constraint does not need to be represented explicitly. e. Are any conclusions derived by the method in part (c) invalidated when the global constraint is taken into account? f. Give examples of configurations of probe values that induce long-range dependencies such that the contents of a given unprobed square would give information about the contents of a far-distant square. (Hint: consider an N × 1 board.) 7.22 How long does it take to prove KB |= α using DPLL when α is a literal already contained in KB? Explain. 7.23 Trace the behavior of DPLL on the knowledge base in Figure 7.16 when trying to prove Q, and compare this behavior with that of the forward-chaining algorithm. 7.24 Discuss what is meant by optimal behavior in the wumpus world. Show that the HYBRID-WUMPUS-AGENT is not optimal, and suggest ways to improve it. 7.25 Suppose an agent inhabits a world with two states, S and ¬S, and can do exactly one of two actions, a and b. Action a does nothing and action b flips from one state to the other. Let St be the proposition that the agent is in state S at time t, and let at be the proposition that the agent does action a at time t (similarly for bt). a. Write a successor-state axiom for St+1. b. Convert the sentence in (a) into CNF. c. Show a resolution refutation proof that if the agent is in ¬S at time t and does a, it will still be in ¬S at time t + 1. 7.26 Section 7.7.1 provides some of the successor-state axioms required for the wumpus world. Write down axioms for all remaining fluent symbols. 7.27 Modify the HYBRID-WUMPUS-AGENT to use the 1-CNF logical state estimation method described on page 271. We noted on that page that such an agent will not be able to acquire, maintain, and use more complex beliefs such as the disjunction P3,1 ∨ P2,2. Sug- gest a method for overcoming this problem by defining additional proposition symbols, and try it out in the wumpus world. Does it improve the performance of the agent?
  • 304. 8 FIRST-ORDER LOGIC In which we notice that the world is blessed with many objects, some of which are related to other objects, and in which we endeavor to reason about them. In Chapter 7, we showed how a knowledge-based agent could represent the world in which it operates and deduce what actions to take. We used propositional logic as our representation language because it sufficed to illustrate the basic concepts of logic and knowledge-based agents. Unfortunately, propositional logic is too puny a language to represent knowledge of complex environments in a concise way. In this chapter, we examine first-order logic,1 FIRST-ORDER LOGIC which is sufficiently expressive to represent a good deal of our commonsense knowledge. It also either subsumes or forms the foundation of many other representation languages and has been studied intensively for many decades. We begin in Section 8.1 with a discussion of representation languages in general; Section 8.2 covers the syntax and semantics of first-order logic; Sections 8.3 and 8.4 illustrate the use of first-order logic for simple representations. 8.1 REPRESENTATION REVISITED In this section, we discuss the nature of representation languages. Our discussion motivates the development of first-order logic, a much more expressive language than the propositional logic introduced in Chapter 7. We look at propositional logic and at other kinds of languages to understand what works and what fails. Our discussion will be cursory, compressing cen- turies of thought, trial, and error into a few paragraphs. Programming languages (such as C++ or Java or Lisp) are by far the largest class of formal languages in common use. Programs themselves represent, in a direct sense, only computational processes. Data structures within programs can represent facts; for example, a program could use a 4 × 4 array to represent the contents of the wumpus world. Thus, the programming language statement World[2,2] ← Pit is a fairly natural way to assert that there is a pit in square [2,2]. (Such representations might be considered ad hoc; database systems were developed precisely to provide a more general, domain-independent way to store and 1 Also called first-order predicate calculus, sometimes abbreviated as FOL or FOPC. 285
  • 305. 286 Chapter 8. First-Order Logic retrieve facts.) What programming languages lack is any general mechanism for deriving facts from other facts; each update to a data structure is done by a domain-specific procedure whose details are derived by the programmer from his or her own knowledge of the domain. This procedural approach can be contrasted with the declarative nature of propositional logic, in which knowledge and inference are separate, and inference is entirely domain independent. A second drawback of data structures in programs (and of databases, for that matter) is the lack of any easy way to say, for example, “There is a pit in [2,2] or [3,1]” or “If the wumpus is in [1,1] then he is not in [2,2].” Programs can store a single value for each variable, and some systems allow the value to be “unknown,” but they lack the expressiveness required to handle partial information. Propositional logic is a declarative language because its semantics is based on a truth relation between sentences and possible worlds. It also has sufficient expressive power to deal with partial information, using disjunction and negation. Propositional logic has a third property that is desirable in representation languages, namely, compositionality. In a com- COMPOSITIONALITY positional language, the meaning of a sentence is a function of the meaning of its parts. For example, the meaning of “S1,4 ∧ S1,2” is related to the meanings of “S1,4” and “S1,2.” It would be very strange if “S1,4” meant that there is a stench in square [1,4] and “S1,2” meant that there is a stench in square [1,2], but “S1,4 ∧S1,2” meant that France and Poland drew 1–1 in last week’s ice hockey qualifying match. Clearly, noncompositionality makes life much more difficult for the reasoning system. As we saw in Chapter 7, however, propositional logic lacks the expressive power to concisely describe an environment with many objects. For example, we were forced to write a separate rule about breezes and pits for each square, such as B1,1 ⇔ (P1,2 ∨ P2,1) . In English, on the other hand, it seems easy enough to say, once and for all, “Squares adjacent to pits are breezy.” The syntax and semantics of English somehow make it possible to describe the environment concisely. 8.1.1 The language of thought Natural languages (such as English or Spanish) are very expressive indeed. We managed to write almost this whole book in natural language, with only occasional lapses into other lan- guages (including logic, mathematics, and the language of diagrams). There is a long tradi- tion in linguistics and the philosophy of language that views natural language as a declarative knowledge representation language. If we could uncover the rules for natural language, we could use it in representation and reasoning systems and gain the benefit of the billions of pages that have been written in natural language. The modern view of natural language is that it serves a as a medium for communication rather than pure representation. When a speaker points and says, “Look!” the listener comes to know that, say, Superman has finally appeared over the rooftops. Yet we would not want to say that the sentence “Look!” represents that fact. Rather, the meaning of the sentence depends both on the sentence itself and on the context in which the sentence was spoken. Clearly, one could not store a sentence such as “Look!” in a knowledge base and expect to
  • 306. Section 8.1. Representation Revisited 287 recover its meaning without also storing a representation of the context—which raises the question of how the context itself can be represented. Natural languages also suffer from ambiguity, a problem for a representation language. As Pinker (1995) puts it: “When people AMBIGUITY think about spring, surely they are not confused as to whether they are thinking about a season or something that goes boing—and if one word can correspond to two thoughts, thoughts can’t be words.” The famous Sapir–Whorf hypothesis claims that our understanding of the world is strongly influenced by the language we speak. Whorf (1956) wrote “We cut nature up, orga- nize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement to organize it this way—an agreement that holds throughout our speech commu- nity and is codified in the patterns of our language.” It is certainly true that different speech communities divide up the world differently. The French have two words “chaise” and “fau- teuil,” for a concept that English speakers cover with one: “chair.” But English speakers can easily recognize the category fauteuil and give it a name—roughly “open-arm chair”—so does language really make a difference? Whorf relied mainly on intuition and speculation, but in the intervening years we actually have real data from anthropological, psychological and neurological studies. For example, can you remember which of the following two phrases formed the opening of Section 8.1? “In this section, we discuss the nature of representation languages . . .” “This section covers the topic of knowledge representation languages . . .” Wanner (1974) did a similar experiment and found that subjects made the right choice at chance level—about 50% of the time—but remembered the content of what they read with better than 90% accuracy. This suggests that people process the words to form some kind of nonverbal representation. More interesting is the case in which a concept is completely absent in a language. Speakers of the Australian aboriginal language Guugu Yimithirr have no words for relative directions, such as front, back, right, or left. Instead they use absolute directions, saying, for example, the equivalent of “I have a pain in my north arm.” This difference in language makes a difference in behavior: Guugu Yimithirr speakers are better at navigating in open terrain, while English speakers are better at placing the fork to the right of the plate. Language also seems to influence thought through seemingly arbitrary grammatical features such as the gender of nouns. For example, “bridge” is masculine in Spanish and feminine in German. Boroditsky (2003) asked subjects to choose English adjectives to de- scribe a photograph of a particular bridge. Spanish speakers chose big, dangerous, strong, and towering, whereas German speakers chose beautiful, elegant, fragile, and slender. Words can serve as anchor points that affect how we perceive the world. Loftus and Palmer (1974) showed experimental subjects a movie of an auto accident. Subjects who were asked “How fast were the cars going when they contacted each other?” reported an average of 32 mph, while subjects who were asked the question with the word “smashed” instead of “contacted” reported 41mph for the same cars in the same movie.
  • 307. 288 Chapter 8. First-Order Logic In a first-order logic reasoning system that uses CNF, we can see that the linguistic form “¬(A ∨ B)” and “¬A ∧ ¬B” are the same because we can look inside the system and see that the two sentences are stored as the same canonical CNF form. Can we do that with the human brain? Until recently the answer was “no,” but now it is “maybe.” Mitchell et al. (2008) put subjects in an fMRI (functional magnetic resonance imaging) machine, showed them words such as “celery,” and imaged their brains. The researchers were then able to train a computer program to predict, from a brain image, what word the subject had been presented with. Given two choices (e.g., “celery” or “airplane”), the system predicts correctly 77% of the time. The system can even predict at above-chance levels for words it has never seen an fMRI image of before (by considering the images of related words) and for people it has never seen before (proving that fMRI reveals some level of common representation across people). This type of work is still in its infancy, but fMRI (and other imaging technology such as intracranial electrophysiology (Sahin et al., 2009)) promises to give us much more concrete ideas of what human knowledge representations are like. From the viewpoint of formal logic, representing the same knowledge in two different ways makes absolutely no difference; the same facts will be derivable from either represen- tation. In practice, however, one representation might require fewer steps to derive a conclu- sion, meaning that a reasoner with limited resources could get to the conclusion using one representation but not the other. For nondeductive tasks such as learning from experience, outcomes are necessarily dependent on the form of the representations used. We show in Chapter 18 that when a learning program considers two possible theories of the world, both of which are consistent with all the data, the most common way of breaking the tie is to choose the most succinct theory—and that depends on the language used to represent theories. Thus, the influence of language on thought is unavoidable for any agent that does learning. 8.1.2 Combining the best of formal and natural languages We can adopt the foundation of propositional logic—a declarative, compositional semantics that is context-independent and unambiguous—and build a more expressive logic on that foundation, borrowing representational ideas from natural language while avoiding its draw- backs. When we look at the syntax of natural language, the most obvious elements are nouns and noun phrases that refer to objects (squares, pits, wumpuses) and verbs and verb phrases OBJECT that refer to relations among objects (is breezy, is adjacent to, shoots). Some of these rela- RELATION tions are functions—relations in which there is only one “value” for a given “input.” It is FUNCTION easy to start listing examples of objects, relations, and functions: • Objects: people, houses, numbers, theories, Ronald McDonald, colors, baseball games, wars, centuries . . . • Relations: these can be unary relations or properties such as red, round, bogus, prime, PROPERTY multistoried . . ., or more general n-ary relations such as brother of, bigger than, inside, part of, has color, occurred after, owns, comes between, . . . • Functions: father of, best friend, third inning of, one more than, beginning of . . . Indeed, almost any assertion can be thought of as referring to objects and properties or rela- tions. Some examples follow:
  • 308. Section 8.1. Representation Revisited 289 • “One plus two equals three.” Objects: one, two, three, one plus two; Relation: equals; Function: plus. (“One plus two” is a name for the object that is obtained by applying the function “plus” to the objects “one” and “two.” “Three” is another name for this object.) • “Squares neighboring the wumpus are smelly.” Objects: wumpus, squares; Property: smelly; Relation: neighboring. • “Evil King John ruled England in 1200.” Objects: John, England, 1200; Relation: ruled; Properties: evil, king. The language of first-order logic, whose syntax and semantics we define in the next section, is built around objects and relations. It has been so important to mathematics, philosophy, and artificial intelligence precisely because those fields—and indeed, much of everyday human existence—can be usefully thought of as dealing with objects and the relations among them. First-order logic can also express facts about some or all of the objects in the universe. This enables one to represent general laws or rules, such as the statement “Squares neighboring the wumpus are smelly.” The primary difference between propositional and first-order logic lies in the ontologi- cal commitment made by each language—that is, what it assumes about the nature of reality. ONTOLOGICAL COMMITMENT Mathematically, this commitment is expressed through the nature of the formal models with respect to which the truth of sentences is defined. For example, propositional logic assumes that there are facts that either hold or do not hold in the world. Each fact can be in one of two states: true or false, and each model assigns true or false to each proposition sym- bol (see Section 7.4.2).2 First-order logic assumes more; namely, that the world consists of objects with certain relations among them that do or do not hold. The formal models are correspondingly more complicated than those for propositional logic. Special-purpose logics make still further ontological commitments; for example, temporal logic assumes that facts TEMPORAL LOGIC hold at particular times and that those times (which may be points or intervals) are ordered. Thus, special-purpose logics give certain kinds of objects (and the axioms about them) “first class” status within the logic, rather than simply defining them within the knowledge base. Higher-order logic views the relations and functions referred to by first-order logic as ob- HIGHER-ORDER LOGIC jects in themselves. This allows one to make assertions about all relations—for example, one could wish to define what it means for a relation to be transitive. Unlike most special-purpose logics, higher-order logic is strictly more expressive than first-order logic, in the sense that some sentences of higher-order logic cannot be expressed by any finite number of first-order logic sentences. A logic can also be characterized by its epistemological commitments—the possible EPISTEMOLOGICAL COMMITMENT states of knowledge that it allows with respect to each fact. In both propositional and first- order logic, a sentence represents a fact and the agent either believes the sentence to be true, believes it to be false, or has no opinion. These logics therefore have three possible states of knowledge regarding any sentence. Systems using probability theory, on the other hand, 2 In contrast, facts in fuzzy logic have a degree of truth between 0 and 1. For example, the sentence “Vienna is a large city” might be true in our world only to degree 0.6 in fuzzy logic.
  • 309. 290 Chapter 8. First-Order Logic can have any degree of belief, ranging from 0 (total disbelief) to 1 (total belief).3 For example, a probabilistic wumpus-world agent might believe that the wumpus is in [1,3] with probability 0.75. The ontological and epistemological commitments of five different logics are summarized in Figure 8.1.

Language              Ontological Commitment                 Epistemological Commitment
                      (What exists in the world)             (What an agent believes about facts)
Propositional logic   facts                                  true/false/unknown
First-order logic     facts, objects, relations              true/false/unknown
Temporal logic        facts, objects, relations, times       true/false/unknown
Probability theory    facts                                  degree of belief ∈ [0, 1]
Fuzzy logic           facts with degree of truth ∈ [0, 1]    known interval value

Figure 8.1 Formal languages and their ontological and epistemological commitments.

In the next section, we will launch into the details of first-order logic. Just as a student of physics requires some familiarity with mathematics, a student of AI must develop a talent for working with logical notation. On the other hand, it is also important not to get too concerned with the specifics of logical notation—after all, there are dozens of different versions. The main things to keep hold of are how the language facilitates concise representations and how its semantics leads to sound reasoning procedures. 8.2 SYNTAX AND SEMANTICS OF FIRST-ORDER LOGIC We begin this section by specifying more precisely the way in which the possible worlds of first-order logic reflect the ontological commitment to objects and relations. Then we introduce the various elements of the language, explaining their semantics as we go along. 8.2.1 Models for first-order logic Recall from Chapter 7 that the models of a logical language are the formal structures that constitute the possible worlds under consideration. Each model links the vocabulary of the logical sentences to elements of the possible world, so that the truth of any sentence can be determined. Thus, models for propositional logic link proposition symbols to predefined truth values. Models for first-order logic are much more interesting. First, they have objects in them! The domain of a model is the set of objects or domain elements it contains. DOMAIN DOMAIN ELEMENTS The domain is required to be nonempty—every possible world must contain at least one object. (See Exercise 8.7 for a discussion of empty worlds.) Mathematically speaking, it doesn’t matter what these objects are—all that matters is how many there are in each particular model—but for pedagogical purposes we’ll use a concrete example. Figure 8.2 shows a model with five 3 It is important not to confuse the degree of belief in probability theory with the degree of truth in fuzzy logic. Indeed, some fuzzy systems allow uncertainty (degree of belief) about degrees of truth.
  • 310. Section 8.2. Syntax and Semantics of First-Order Logic 291 objects: Richard the Lionheart, King of England from 1189 to 1199; his younger brother, the evil King John, who ruled from 1199 to 1215; the left legs of Richard and John; and a crown. The objects in the model may be related in various ways. In the figure, Richard and John are brothers. Formally speaking, a relation is just the set of tuples of objects that are TUPLE related. (A tuple is a collection of objects arranged in a fixed order and is written with angle brackets surrounding the objects.) Thus, the brotherhood relation in this model is the set { ⟨Richard the Lionheart, King John⟩, ⟨King John, Richard the Lionheart⟩ } . (8.1) (Here we have named the objects in English, but you may, if you wish, mentally substitute the pictures for the names.) The crown is on King John’s head, so the “on head” relation contains just one tuple, ⟨the crown, King John⟩. The “brother” and “on head” relations are binary relations—that is, they relate pairs of objects. The model also contains unary relations, or properties: the “person” property is true of both Richard and John; the “king” property is true only of John (presumably because Richard is dead at this point); and the “crown” property is true only of the crown. Certain kinds of relationships are best considered as functions, in that a given object must be related to exactly one object in this way. For example, each person has one left leg, so the model has a unary “left leg” function that includes the following mappings: ⟨Richard the Lionheart⟩ → Richard’s left leg ⟨King John⟩ → John’s left leg . (8.2) Strictly speaking, models in first-order logic require total functions, that is, there must be a TOTAL FUNCTIONS value for every input tuple. Thus, the crown must have a left leg and so must each of the left legs. There is a technical solution to this awkward problem involving an additional “invisible” Figure 8.2 A model containing five objects, two binary relations, three unary relations (indicated by labels on the objects), and one unary function, left-leg.
  • 311. 292 Chapter 8. First-Order Logic object that is the left leg of everything that has no left leg, including itself. Fortunately, as long as one makes no assertions about the left legs of things that have no left legs, these technicalities are of no import. So far, we have described the elements that populate models for first-order logic. The other essential part of a model is the link between those elements and the vocabulary of the logical sentences, which we explain next. 8.2.2 Symbols and interpretations We turn now to the syntax of first-order logic. The impatient reader can obtain a complete description from the formal grammar in Figure 8.3. The basic syntactic elements of first-order logic are the symbols that stand for objects, relations, and functions. The symbols, therefore, come in three kinds: constant symbols, CONSTANT SYMBOL which stand for objects; predicate symbols, which stand for relations; and function sym- PREDICATE SYMBOL bols, which stand for functions. We adopt the convention that these symbols will begin with FUNCTION SYMBOL uppercase letters. For example, we might use the constant symbols Richard and John; the predicate symbols Brother, OnHead, Person, King, and Crown; and the function symbol LeftLeg. As with proposition symbols, the choice of names is entirely up to the user. Each predicate and function symbol comes with an arity that fixes the number of arguments. ARITY As in propositional logic, every model must provide the information required to deter- mine if any given sentence is true or false. Thus, in addition to its objects, relations, and functions, each model includes an interpretation that specifies exactly which objects, rela- INTERPRETATION tions and functions are referred to by the constant, predicate, and function symbols. One possible interpretation for our example—which a logician would call the intended interpre- tation—is as follows: INTENDED INTERPRETATION • Richard refers to Richard the Lionheart and John refers to the evil King John. • Brother refers to the brotherhood relation, that is, the set of tuples of objects given in Equation (8.1); OnHead refers to the “on head” relation that holds between the crown and King John; Person, King, and Crown refer to the sets of objects that are persons, kings, and crowns. • LeftLeg refers to the “left leg” function, that is, the mapping given in Equation (8.2). There are many other possible interpretations, of course. For example, one interpretation maps Richard to the crown and John to King John’s left leg. There are five objects in the model, so there are 25 possible interpretations just for the constant symbols Richard and John. Notice that not all the objects need have a name—for example, the intended interpretation does not name the crown or the legs. It is also possible for an object to have several names; there is an interpretation under which both Richard and John refer to the crown.4 If you find this possibility confusing, remember that, in propositional logic, it is perfectly possible to have a model in which Cloudy and Sunny are both true; it is the job of the knowledge base to rule out models that are inconsistent with our knowledge. 4 Later, in Section 8.2.8, we examine a semantics in which every object has exactly one name.
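To make these definitions concrete, the model of Figure 8.2 and its intended interpretation can be written down as plain data. The sketch below is our own illustration (not code from the book): objects are strings, relations are sets of tuples, functions are dictionaries, and the interpretation maps constant symbols to objects.

```python
# A first-order model as plain data: domain, relations, functions.
# Objects are represented by strings; any hashable values would do.
domain = {"Richard", "John", "RichardsLeftLeg", "JohnsLeftLeg", "Crown"}

relations = {
    "Brother": {("Richard", "John"), ("John", "Richard")},   # Equation (8.1)
    "OnHead":  {("Crown", "John")},
    "Person":  {("Richard",), ("John",)},                    # unary relations
    "King":    {("John",)},
    "Crown":   {("Crown",)},
}

functions = {
    "LeftLeg": {("Richard",): "RichardsLeftLeg",             # Equation (8.2)
                ("John",):    "JohnsLeftLeg"},
}

# The intended interpretation maps constant symbols to domain elements;
# predicate and function symbols are interpreted via the tables above.
constants = {"Richard": "Richard", "John": "John"}

def holds(predicate, *args):
    """True iff the tuple of objects named by args is in the relation."""
    return tuple(args) in relations[predicate]

if __name__ == "__main__":
    print(holds("Brother", "Richard", "John"))       # True
    print(holds("OnHead", "Crown", "Richard"))        # False
    print(holds("King", constants["John"]))           # True
    print(functions["LeftLeg"][("John",)])             # JohnsLeftLeg
```

A different interpretation would simply change the constants dictionary (for example, mapping both Richard and John to "Crown") while leaving the model itself untouched.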
  • 312. Section 8.2. Syntax and Semantics of First-Order Logic 293
Sentence → AtomicSentence | ComplexSentence
AtomicSentence → Predicate | Predicate(Term, . . .) | Term = Term
ComplexSentence → ( Sentence ) | [ Sentence ]
               | ¬ Sentence
               | Sentence ∧ Sentence
               | Sentence ∨ Sentence
               | Sentence ⇒ Sentence
               | Sentence ⇔ Sentence
               | Quantifier Variable, . . . Sentence
Term → Function(Term, . . .) | Constant | Variable
Quantifier → ∀ | ∃
Constant → A | X1 | John | · · ·
Variable → a | x | s | · · ·
Predicate → True | False | After | Loves | Raining | · · ·
Function → Mother | LeftLeg | · · ·
OPERATOR PRECEDENCE: ¬, =, ∧, ∨, ⇒, ⇔
Figure 8.3 The syntax of first-order logic with equality, specified in Backus–Naur form (see page 1066 if you are not familiar with this notation). Operator precedences are specified, from highest to lowest. The precedence of quantifiers is such that a quantifier holds over everything to the right of it.
Figure 8.4 Some members of the set of all models for a language with two constant symbols, R and J, and one binary relation symbol. The interpretation of each constant symbol is shown by a gray arrow. Within each model, the related objects are connected by arrows.
  • 313. 294 Chapter 8. First-Order Logic In summary, a model in first-order logic consists of a set of objects and an interpretation that maps constant symbols to objects, predicate symbols to relations on those objects, and function symbols to functions on those objects. Just as with propositional logic, entailment, validity, and so on are defined in terms of all possible models. To get an idea of what the set of all possible models looks like, see Figure 8.4. It shows that models vary in how many objects they contain—from one up to infinity—and in the way the constant symbols map to objects. If there are two constant symbols and one object, then both symbols must refer to the same object; but this can still happen even with more objects. When there are more objects than constant symbols, some of the objects will have no names. Because the number of possible models is unbounded, checking entailment by the enumeration of all possible models is not feasible for first-order logic (unlike propositional logic). Even if the number of objects is restricted, the number of combinations can be very large. (See Exercise 8.5.) For the example in Figure 8.4, there are 137,506,194,466 models with six or fewer objects. 8.2.3 Terms A term is a logical expression that refers to an object. Constant symbols are therefore terms, TERM but it is not always convenient to have a distinct symbol to name every object. For example, in English we might use the expression “King John’s left leg” rather than giving a name to his leg. This is what function symbols are for: instead of using a constant symbol, we use LeftLeg(John). In the general case, a complex term is formed by a function symbol followed by a parenthesized list of terms as arguments to the function symbol. It is important to remember that a complex term is just a complicated kind of name. It is not a “subroutine call” that “returns a value.” There is no LeftLeg subroutine that takes a person as input and returns a leg. We can reason about left legs (e.g., stating the general rule that everyone has one and then deducing that John must have one) without ever providing a definition of LeftLeg. This is something that cannot be done with subroutines in programming languages.5 The formal semantics of terms is straightforward. Consider a term f(t1, . . . , tn). The function symbol f refers to some function in the model (call it F); the argument terms refer to objects in the domain (call them d1, . . . , dn); and the term as a whole refers to the object that is the value of the function F applied to d1, . . . , dn. For example, suppose the LeftLeg function symbol refers to the function shown in Equation (8.2) and John refers to King John, then LeftLeg(John) refers to King John’s left leg. In this way, the interpretation fixes the referent of every term. 8.2.4 Atomic sentences Now that we have both terms for referring to objects and predicate symbols for referring to relations, we can put them together to make atomic sentences that state facts. An atomic 5 λ-expressions provide a useful notation in which new function symbols are constructed “on the fly.” For example, the function that squares its argument can be written as (λx x × x) and can be applied to arguments just like any other function symbol. A λ-expression can also be defined and used as a predicate symbol. (See Chapter 22.) The lambda operator in Lisp plays exactly the same role. 
Notice that the use of λ in this way does not increase the formal expressive power of first-order logic, because any sentence that includes a λ-expression can be rewritten by “plugging in” its arguments to yield an equivalent sentence.
  • 314. Section 8.2. Syntax and Semantics of First-Order Logic 295 sentence (or atom for short) is formed from a predicate symbol optionally followed by a ATOMIC SENTENCE ATOM parenthesized list of terms, such as Brother(Richard, John). This states, under the intended interpretation given earlier, that Richard the Lionheart is the brother of King John.6 Atomic sentences can have complex terms as arguments. Thus, Married(Father(Richard), Mother(John)) states that Richard the Lionheart’s father is married to King John’s mother (again, under a suitable interpretation). An atomic sentence is true in a given model if the relation referred to by the predicate symbol holds among the objects referred to by the arguments. 8.2.5 Complex sentences We can use logical connectives to construct more complex sentences, with the same syntax and semantics as in propositional calculus. Here are four sentences that are true in the model of Figure 8.2 under our intended interpretation: ¬Brother(LeftLeg(Richard), John) Brother(Richard, John) ∧ Brother(John, Richard) King(Richard) ∨ King(John) ¬King(Richard) ⇒ King(John) . 8.2.6 Quantifiers Once we have a logic that allows objects, it is only natural to want to express properties of entire collections of objects, instead of enumerating the objects by name. Quantifiers let us QUANTIFIER do this. First-order logic contains two standard quantifiers, called universal and existential. Universal quantification (∀) Recall the difficulty we had in Chapter 7 with the expression of general rules in proposi- tional logic. Rules such as “Squares neighboring the wumpus are smelly” and “All kings are persons” are the bread and butter of first-order logic. We deal with the first of these in Section 8.3. The second rule, “All kings are persons,” is written in first-order logic as ∀ x King(x) ⇒ Person(x) . ∀ is usually pronounced “For all . . .”. (Remember that the upside-down A stands for “all.”) Thus, the sentence says, “For all x, if x is a king, then x is a person.” The symbol x is called a variable. By convention, variables are lowercase letters. A variable is a term all by itself, VARIABLE and as such can also serve as the argument of a function—for example, LeftLeg(x). A term with no variables is called a ground term. GROUND TERM Intuitively, the sentence ∀ x P, where P is any logical expression, says that P is true for every object x. More precisely, ∀ x P is true in a given model if P is true in all possible extended interpretations constructed from the interpretation given in the model, where each EXTENDED INTERPRETATION 6 We usually follow the argument-ordering convention that P(x, y) is read as “x is a P of y.”
  • 315. 296 Chapter 8. First-Order Logic extended interpretation specifies a domain element to which x refers. This sounds complicated, but it is really just a careful way of stating the intuitive mean- ing of universal quantification. Consider the model shown in Figure 8.2 and the intended interpretation that goes with it. We can extend the interpretation in five ways: x → Richard the Lionheart, x → King John, x → Richard’s left leg, x → John’s left leg, x → the crown. The universally quantified sentence ∀ x King(x) ⇒ Person(x) is true in the original model if the sentence King(x) ⇒ Person(x) is true under each of the five extended interpreta- tions. That is, the universally quantified sentence is equivalent to asserting the following five sentences: Richard the Lionheart is a king ⇒ Richard the Lionheart is a person. King John is a king ⇒ King John is a person. Richard’s left leg is a king ⇒ Richard’s left leg is a person. John’s left leg is a king ⇒ John’s left leg is a person. The crown is a king ⇒ the crown is a person. Let us look carefully at this set of assertions. Since, in our model, King John is the only king, the second sentence asserts that he is a person, as we would hope. But what about the other four sentences, which appear to make claims about legs and crowns? Is that part of the meaning of “All kings are persons”? In fact, the other four assertions are true in the model, but make no claim whatsoever about the personhood qualifications of legs, crowns, or indeed Richard. This is because none of these objects is a king. Looking at the truth table for ⇒ (Figure 7.8 on page 246), we see that the implication is true whenever its premise is false—regardless of the truth of the conclusion. Thus, by asserting the universally quantified sentence, which is equivalent to asserting a whole list of individual implications, we end up asserting the conclusion of the rule just for those objects for whom the premise is true and saying nothing at all about those individuals for whom the premise is false. Thus, the truth-table definition of ⇒ turns out to be perfect for writing general rules with universal quantifiers. A common mistake, made frequently even by diligent readers who have read this para- graph several times, is to use conjunction instead of implication. The sentence ∀ x King(x) ∧ Person(x) would be equivalent to asserting Richard the Lionheart is a king ∧ Richard the Lionheart is a person, King John is a king ∧ King John is a person, Richard’s left leg is a king ∧ Richard’s left leg is a person, and so on. Obviously, this does not capture what we want.
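The procedure just described—checking the sentence under every extended interpretation—can be carried out mechanically on a finite domain. Here is a minimal sketch (our own illustration, with made-up Python names) that evaluates ∀ x King(x) ⇒ Person(x) over the five-object model and contrasts it with the mistaken conjunctive version.

```python
# Evaluate universal quantification by trying every extended interpretation,
# i.e., every way of assigning the variable x to a domain element.
domain = {"Richard", "John", "RichardsLeftLeg", "JohnsLeftLeg", "Crown"}
king = {"John"}
person = {"Richard", "John"}

def implies(p, q):
    return (not p) or q          # truth table for =>

# forall x: King(x) => Person(x)   -- true in this model
forall_implication = all(implies(x in king, x in person) for x in domain)

# forall x: King(x) and Person(x)  -- the common mistake; false here
forall_conjunction = all((x in king) and (x in person) for x in domain)

print(forall_implication)   # True: the rule holds vacuously for non-kings
print(forall_conjunction)   # False: the crown is not a king, let alone a person
```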
  • 316. Section 8.2. Syntax and Semantics of First-Order Logic 297 Existential quantification (∃) Universal quantification makes statements about every object. Similarly, we can make a state- ment about some object in the universe without naming it, by using an existential quantifier. To say, for example, that King John has a crown on his head, we write ∃ x Crown(x) ∧ OnHead(x, John) . ∃x is pronounced “There exists an x such that . . .” or “For some x . . .”. Intuitively, the sentence ∃ x P says that P is true for at least one object x. More precisely, ∃ x P is true in a given model if P is true in at least one extended interpretation that assigns x to a domain element. That is, at least one of the following is true: Richard the Lionheart is a crown ∧ Richard the Lionheart is on John’s head; King John is a crown ∧ King John is on John’s head; Richard’s left leg is a crown ∧ Richard’s left leg is on John’s head; John’s left leg is a crown ∧ John’s left leg is on John’s head; The crown is a crown ∧ the crown is on John’s head. The fifth assertion is true in the model, so the original existentially quantified sentence is true in the model. Notice that, by our definition, the sentence would also be true in a model in which King John was wearing two crowns. This is entirely consistent with the original sentence “King John has a crown on his head.” 7 Just as ⇒ appears to be the natural connective to use with ∀, ∧ is the natural connective to use with ∃. Using ∧ as the main connective with ∀ led to an overly strong statement in the example in the previous section; using ⇒ with ∃ usually leads to a very weak statement, indeed. Consider the following sentence: ∃ x Crown(x) ⇒ OnHead(x, John) . On the surface, this might look like a reasonable rendition of our sentence. Applying the semantics, we see that the sentence says that at least one of the following assertions is true: Richard the Lionheart is a crown ⇒ Richard the Lionheart is on John’s head; King John is a crown ⇒ King John is on John’s head; Richard’s left leg is a crown ⇒ Richard’s left leg is on John’s head; and so on. Now an implication is true if both premise and conclusion are true, or if its premise is false. So if Richard the Lionheart is not a crown, then the first assertion is true and the existential is satisfied. So, an existentially quantified implication sentence is true whenever any object fails to satisfy the premise; hence such sentences really do not say much at all. Nested quantifiers We will often want to express more complex sentences using multiple quantifiers. The sim- plest case is where the quantifiers are of the same type. For example, “Brothers are siblings” can be written as ∀ x ∀ y Brother(x, y) ⇒ Sibling(x, y) . 7 There is a variant of the existential quantifier, usually written ∃1 or ∃!, that means “There exists exactly one.” The same meaning can be expressed using equality statements.
  • 317. 298 Chapter 8. First-Order Logic Consecutive quantifiers of the same type can be written as one quantifier with several vari- ables. For example, to say that siblinghood is a symmetric relationship, we can write ∀ x, y Sibling(x, y) ⇔ Sibling(y, x) . In other cases we will have mixtures. “Everybody loves somebody” means that for every person, there is someone that person loves: ∀ x ∃ y Loves(x, y) . On the other hand, to say “There is someone who is loved by everyone,” we write ∃ y ∀ x Loves(x, y) . The order of quantification is therefore very important. It becomes clearer if we insert paren- theses. ∀ x (∃ y Loves(x, y)) says that everyone has a particular property, namely, the prop- erty that they love someone. On the other hand, ∃ y (∀ x Loves(x, y)) says that someone in the world has a particular property, namely the property of being loved by everybody. Some confusion can arise when two quantifiers are used with the same variable name. Consider the sentence ∀ x (Crown(x) ∨ (∃ x Brother(Richard, x))) . Here the x in Brother(Richard, x) is existentially quantified. The rule is that the variable belongs to the innermost quantifier that mentions it; then it will not be subject to any other quantification. Another way to think of it is this: ∃ x Brother(Richard, x) is a sentence about Richard (that he has a brother), not about x; so putting a ∀ x outside it has no effect. It could equally well have been written ∃ z Brother(Richard, z). Because this can be a source of confusion, we will always use different variable names with nested quantifiers. Connections between ∀ and ∃ The two quantifiers are actually intimately connected with each other, through negation. As- serting that everyone dislikes parsnips is the same as asserting there does not exist someone who likes them, and vice versa: ∀ x ¬Likes(x, Parsnips) is equivalent to ¬∃ x Likes(x, Parsnips) . We can go one step further: “Everyone likes ice cream” means that there is no one who does not like ice cream: ∀ x Likes(x, IceCream) is equivalent to ¬∃ x ¬Likes(x, IceCream) . Because ∀ is really a conjunction over the universe of objects and ∃ is a disjunction, it should not be surprising that they obey De Morgan’s rules. The De Morgan rules for quantified and unquantified sentences are as follows: ∀ x ¬P ≡ ¬∃ x P ¬(P ∨ Q) ≡ ¬P ∧ ¬Q ¬∀ x P ≡ ∃ x ¬P ¬(P ∧ Q) ≡ ¬P ∨ ¬Q ∀ x P ≡ ¬∃ x ¬P P ∧ Q ≡ ¬(¬P ∨ ¬Q) ∃ x P ≡ ¬∀ x ¬P P ∨ Q ≡ ¬(¬P ∧ ¬Q) . Thus, we do not really need both ∀ and ∃, just as we do not really need both ∧ and ∨. Still, readability is more important than parsimony, so we will keep both of the quantifiers.
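Since ∀ amounts to a conjunction over the domain and ∃ to a disjunction, the quantifier versions of De Morgan's rules can be spot-checked on any finite model. A small sketch, assuming an arbitrary toy domain and relation (the names are illustrative only):

```python
# Spot-check the quantifier De Morgan rules on a finite domain.
domain = {"Richard", "John", "Crown"}
likes_parsnips = {"John"}            # an arbitrary unary relation

def P(x):
    return x in likes_parsnips

# forall x ~P(x)  is equivalent to  ~exists x P(x)
lhs = all(not P(x) for x in domain)
rhs = not any(P(x) for x in domain)
assert lhs == rhs

# ~forall x P(x)  is equivalent to  exists x ~P(x)
lhs = not all(P(x) for x in domain)
rhs = any(not P(x) for x in domain)
assert lhs == rhs

print("Both dualities agree on this domain.")
```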
  • 318. Section 8.2. Syntax and Semantics of First-Order Logic 299 8.2.7 Equality First-order logic includes one more way to make atomic sentences, other than using a predi- cate and terms as described earlier. We can use the equality symbol to signify that two terms EQUALITY SYMBOL refer to the same object. For example, Father(John) = Henry says that the object referred to by Father(John) and the object referred to by Henry are the same. Because an interpretation fixes the referent of any term, determining the truth of an equality sentence is simply a matter of seeing that the referents of the two terms are the same object. The equality symbol can be used to state facts about a given function, as we just did for the Father symbol. It can also be used with negation to insist that two terms are not the same object. To say that Richard has at least two brothers, we would write ∃ x, y Brother(x, Richard) ∧ Brother(y, Richard) ∧ ¬(x = y) . The sentence ∃ x, y Brother(x, Richard) ∧ Brother(y, Richard) does not have the intended meaning. In particular, it is true in the model of Figure 8.2, where Richard has only one brother. To see this, consider the extended interpretation in which both x and y are assigned to King John. The addition of ¬(x = y) rules out such models. The notation x 6= y is sometimes used as an abbreviation for ¬(x = y). 8.2.8 An alternative semantics? Continuing the example from the previous section, suppose that we believe that Richard has two brothers, John and Geoffrey.8 Can we capture this state of affairs by asserting Brother(John, Richard) ∧ Brother(Geoffrey, Richard) ? (8.3) Not quite. First, this assertion is true in a model where Richard has only one brother— we need to add John 6= Geoffrey. Second, the sentence doesn’t rule out models in which Richard has many more brothers besides John and Geoffrey. Thus, the correct translation of “Richard’s brothers are John and Geoffrey” is as follows: Brother(John, Richard) ∧ Brother(Geoffrey, Richard) ∧ John 6= Geoffrey ∧ ∀ x Brother(x, Richard) ⇒ (x = John ∨ x = Geoffrey) . For many purposes, this seems much more cumbersome than the corresponding natural- language expression. As a consequence, humans may make mistakes in translating their knowledge into first-order logic, resulting in unintuitive behaviors from logical reasoning systems that use the knowledge. Can we devise a semantics that allows a more straightfor- ward logical expression? One proposal that is very popular in database systems works as follows. First, we insist that every constant symbol refer to a distinct object—the so-called unique-names assump- tion. Second, we assume that atomic sentences not known to be true are in fact false—the UNIQUE-NAMES ASSUMPTION closed-world assumption. Finally, we invoke domain closure, meaning that each model CLOSED-WORLD ASSUMPTION DOMAIN CLOSURE 8 Actually he had four, the others being William and Henry.
  • 319. 300 Chapter 8. First-Order Logic Figure 8.5 Some members of the set of all models for a language with two constant symbols, R and J, and one binary relation symbol, under database semantics. The interpretation of the constant symbols is fixed, and there is a distinct object for each constant symbol. contains no more domain elements than those named by the constant symbols. Under the resulting semantics, which we call database semantics to distinguish it from the standard DATABASE SEMANTICS semantics of first-order logic, the sentence Equation (8.3) does indeed state that Richard’s two brothers are John and Geoffrey. Database semantics is also used in logic programming systems, as explained in Section 9.4.5. It is instructive to consider the set of all possible models under database semantics for the same case as shown in Figure 8.4. Figure 8.5 shows some of the models, ranging from the model with no tuples satisfying the relation to the model with all tuples satisfying the relation. With two objects, there are four possible two-element tuples, so there are 2⁴ = 16 different subsets of tuples that can satisfy the relation. Thus, there are 16 possible models in all—a lot fewer than the infinitely many models for the standard first-order semantics. On the other hand, the database semantics requires definite knowledge of what the world contains. This example brings up an important point: there is no one “correct” semantics for logic. The usefulness of any proposed semantics depends on how concise and intuitive it makes the expression of the kinds of knowledge we want to write down, and on how easy and natural it is to develop the corresponding rules of inference. Database semantics is most useful when we are certain about the identity of all the objects described in the knowledge base and when we have all the facts at hand; in other cases, it is quite awkward. For the rest of this chapter, we assume the standard semantics while noting instances in which this choice leads to cumbersome expressions. 8.3 USING FIRST-ORDER LOGIC Now that we have defined an expressive logical language, it is time to learn how to use it. The best way to do this is through examples. We have seen some simple sentences illustrating the various aspects of logical syntax; in this section, we provide more systematic representations of some simple domains. In knowledge representation, a domain is just some part of the DOMAIN world about which we wish to express some knowledge. We begin with a brief description of the TELL/ASK interface for first-order knowledge bases. Then we look at the domains of family relationships, numbers, sets, and lists, and at
  • 320. Section 8.3. Using First-Order Logic 301 the wumpus world. The next section contains a more substantial example (electronic circuits) and Chapter 12 covers everything in the universe. 8.3.1 Assertions and queries in first-order logic Sentences are added to a knowledge base using TELL, exactly as in propositional logic. Such sentences are called assertions. For example, we can assert that John is a king, Richard is a ASSERTION person, and all kings are persons: TELL(KB, King(John)) . TELL(KB, Person(Richard)) . TELL(KB, ∀ x King(x) ⇒ Person(x)) . We can ask questions of the knowledge base using ASK. For example, ASK(KB, King(John)) returns true. Questions asked with ASK are called queries or goals. Generally speaking, any QUERY GOAL query that is logically entailed by the knowledge base should be answered affirmatively. For example, given the two preceding assertions, the query ASK(KB, Person(John)) should also return true. We can ask quantified queries, such as ASK(KB, ∃ x Person(x)) . The answer is true, but this is perhaps not as helpful as we would like. It is rather like answering “Can you tell me the time?” with “Yes.” If we want to know what value of x makes the sentence true, we will need a different function, ASKVARS, which we call with ASKVARS(KB, Person(x)) and which yields a stream of answers. In this case there will be two answers: {x/John} and {x/Richard}. Such an answer is called a substitution or binding list. ASKVARS is usually SUBSTITUTION BINDING LIST reserved for knowledge bases consisting solely of Horn clauses, because in such knowledge bases every way of making the query true will bind the variables to specific values. That is not the case with first-order logic; if KB has been told King(John) ∨ King(Richard), then there is no binding to x for the query ∃ x King(x), even though the query is true. 8.3.2 The kinship domain The first example we consider is the domain of family relationships, or kinship. This domain includes facts such as “Elizabeth is the mother of Charles” and “Charles is the father of William” and rules such as “One’s grandmother is the mother of one’s parent.” Clearly, the objects in our domain are people. We have two unary predicates, Male and Female. Kinship relations—parenthood, brotherhood, marriage, and so on—are represented by binary predicates: Parent, Sibling, Brother, Sister, Child, Daughter, Son, Spouse, Wife, Husband, Grandparent, Grandchild, Cousin, Aunt, and Uncle. We use functions for Mother and Father, because every person has exactly one of each of these (at least according to nature’s design).
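Before filling in the kinship axioms, here is a minimal sketch of the TELL/ASK interface described above. It is our own illustration, not the book's code: it handles only ground unary facts and rules of the form ∀ x P(x) ⇒ Q(x), and answers queries by naive forward chaining; full first-order inference is the subject of Chapter 9.

```python
# A toy TELL/ASK knowledge base: ground unary facts plus rules of the
# restricted form "forall x, P(x) => Q(x)".  Illustrative only.
class KB:
    def __init__(self):
        self.facts = set()          # e.g. ("King", "John")
        self.rules = []             # e.g. ("King", "Person")

    def tell_fact(self, pred, obj):
        self.facts.add((pred, obj))

    def tell_rule(self, premise_pred, conclusion_pred):
        self.rules.append((premise_pred, conclusion_pred))

    def ask(self, pred, obj):
        # Naive forward chaining to a fixed point, then a lookup.
        derived = set(self.facts)
        changed = True
        while changed:
            changed = False
            for p, q in self.rules:
                for (fp, fo) in list(derived):
                    if fp == p and (q, fo) not in derived:
                        derived.add((q, fo))
                        changed = True
        return (pred, obj) in derived

kb = KB()
kb.tell_fact("King", "John")
kb.tell_fact("Person", "Richard")
kb.tell_rule("King", "Person")          # all kings are persons
print(kb.ask("King", "John"))           # True
print(kb.ask("Person", "John"))         # True, derived via the rule
```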
  • 321. 302 Chapter 8. First-Order Logic We can go through each function and predicate, writing down what we know in terms of the other symbols. For example, one’s mother is one’s female parent: ∀ m, c Mother(c) = m ⇔ Female(m) ∧ Parent(m, c) . One’s husband is one’s male spouse: ∀ w, h Husband(h, w) ⇔ Male(h) ∧ Spouse(h, w) . Male and female are disjoint categories: ∀ x Male(x) ⇔ ¬Female(x) . Parent and child are inverse relations: ∀ p, c Parent(p, c) ⇔ Child(c, p) . A grandparent is a parent of one’s parent: ∀ g, c Grandparent(g, c) ⇔ ∃ p Parent(g, p) ∧ Parent(p, c) . A sibling is another child of one’s parents: ∀ x, y Sibling(x, y) ⇔ x 6= y ∧ ∃ p Parent(p, x) ∧ Parent(p, y) . We could go on for several more pages like this, and Exercise 8.15 asks you to do just that. Each of these sentences can be viewed as an axiom of the kinship domain, as explained in Section 7.1. Axioms are commonly associated with purely mathematical domains—we will see some axioms for numbers shortly—but they are needed in all domains. They provide the basic factual information from which useful conclusions can be derived. Our kinship axioms are also definitions; they have the form ∀ x, y P(x, y) ⇔ . . .. The axioms define DEFINITION the Mother function and the Husband, Male, Parent, Grandparent, and Sibling predicates in terms of other predicates. Our definitions “bottom out” at a basic set of predicates (Child, Spouse, and Female) in terms of which the others are ultimately defined. This is a natural way in which to build up the representation of a domain, and it is analogous to the way in which software packages are built up by successive definitions of subroutines from primitive library functions. Notice that there is not necessarily a unique set of primitive predicates; we could equally well have used Parent, Spouse, and Male. In some domains, as we show, there is no clearly identifiable basic set. Not all logical sentences about a domain are axioms. Some are theorems—that is, they THEOREM are entailed by the axioms. For example, consider the assertion that siblinghood is symmetric: ∀ x, y Sibling(x, y) ⇔ Sibling(y, x) . Is this an axiom or a theorem? In fact, it is a theorem that follows logically from the axiom that defines siblinghood. If we ASK the knowledge base this sentence, it should return true. From a purely logical point of view, a knowledge base need contain only axioms and no theorems, because the theorems do not increase the set of conclusions that follow from the knowledge base. From a practical point of view, theorems are essential to reduce the computational cost of deriving new sentences. Without them, a reasoning system has to start from first principles every time, rather like a physicist having to rederive the rules of calculus for every new problem.
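The definitional character of these axioms can be checked directly against a small, invented family (the names and data below are purely illustrative). Once Parent and Female are fixed, Mother, Sibling, and Grandparent follow from their definitions, and the symmetry of Sibling emerges as a theorem rather than a separate assertion:

```python
# A tiny, invented family; (p, c) in parent means p is a parent of c.
parent = {("Elizabeth", "Charles"), ("Philip", "Charles"),
          ("Elizabeth", "Anne"),    ("Philip", "Anne")}
female = {"Elizabeth", "Anne"}
people = {p for pair in parent for p in pair}

def mother(m, c):
    # Mother(c) = m  <=>  Female(m) and Parent(m, c)
    return m in female and (m, c) in parent

def sibling(x, y):
    # Definition: x != y and some p is a parent of both.
    return x != y and any((p, x) in parent and (p, y) in parent for p in people)

def grandparent(g, c):
    return any((g, p) in parent and (p, c) in parent for p in people)

# Sibling symmetry is a theorem of the definition, not an extra axiom:
assert all(sibling(x, y) == sibling(y, x) for x in people for y in people)
print(mother("Elizabeth", "Charles"))     # True
print(sibling("Charles", "Anne"))         # True
print(grandparent("Elizabeth", "Anne"))   # False in this small family
```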
  • 322. Section 8.3. Using First-Order Logic 303 Not all axioms are definitions. Some provide more general information about certain predicates without constituting a definition. Indeed, some predicates have no complete defi- nition because we do not know enough to characterize them fully. For example, there is no obvious definitive way to complete the sentence ∀ x Person(x) ⇔ . . . Fortunately, first-order logic allows us to make use of the Person predicate without com- pletely defining it. Instead, we can write partial specifications of properties that every person has and properties that make something a person: ∀ x Person(x) ⇒ . . . ∀ x . . . ⇒ Person(x) . Axioms can also be “just plain facts,” such as Male(Jim) and Spouse(Jim, Laura). Such facts form the descriptions of specific problem instances, enabling specific questions to be answered. The answers to these questions will then be theorems that follow from the axioms. Often, one finds that the expected answers are not forthcoming—for example, from Spouse(Jim, Laura) one expects (under the laws of many countries) to be able to infer ¬Spouse(George, Laura); but this does not follow from the axioms given earlier—even after we add Jim 6= George as suggested in Section 8.2.8. This is a sign that an axiom is missing. Exercise 8.8 asks the reader to supply it. 8.3.3 Numbers, sets, and lists Numbers are perhaps the most vivid example of how a large theory can be built up from a tiny kernel of axioms. We describe here the theory of natural numbers or non-negative NATURAL NUMBERS integers. We need a predicate NatNum that will be true of natural numbers; we need one constant symbol, 0; and we need one function symbol, S (successor). The Peano axioms PEANO AXIOMS define natural numbers and addition.9 Natural numbers are defined recursively: NatNum(0) . ∀ n NatNum(n) ⇒ NatNum(S(n)) . That is, 0 is a natural number, and for every object n, if n is a natural number, then S(n) is a natural number. So the natural numbers are 0, S(0), S(S(0)), and so on. (After reading Section 8.2.8, you will notice that these axioms allow for other natural numbers besides the usual ones; see Exercise 8.13.) We also need axioms to constrain the successor function: ∀ n 0 6= S(n) . ∀ m, n m 6= n ⇒ S(m) 6= S(n) . Now we can define addition in terms of the successor function: ∀ m NatNum(m) ⇒ + (0, m) = m . ∀ m, n NatNum(m) ∧ NatNum(n) ⇒ + (S(m), n) = S(+(m, n)) . The first of these axioms says that adding 0 to any natural number m gives m itself. Notice the use of the binary function symbol “+” in the term +(m, 0); in ordinary mathematics, the term would be written m + 0 using infix notation. (The notation we have used for first-order INFIX 9 The Peano axioms also include the principle of induction, which is a sentence of second-order logic rather than of first-order logic. The importance of this distinction is explained in Chapter 9.
  • 323. 304 Chapter 8. First-Order Logic logic is called prefix.) To make our sentences about numbers easier to read, we allow the use PREFIX of infix notation. We can also write S(n) as n + 1, so the second axiom becomes ∀ m, n NatNum(m) ∧ NatNum(n) ⇒ (m + 1) + n = (m + n) + 1 . This axiom reduces addition to repeated application of the successor function. The use of infix notation is an example of syntactic sugar, that is, an extension to or SYNTACTIC SUGAR abbreviation of the standard syntax that does not change the semantics. Any sentence that uses sugar can be “desugared” to produce an equivalent sentence in ordinary first-order logic. Once we have addition, it is straightforward to define multiplication as repeated addi- tion, exponentiation as repeated multiplication, integer division and remainders, prime num- bers, and so on. Thus, the whole of number theory (including cryptography) can be built up from one constant, one function, one predicate and four axioms. The domain of sets is also fundamental to mathematics as well as to commonsense SET reasoning. (In fact, it is possible to define number theory in terms of set theory.) We want to be able to represent individual sets, including the empty set. We need a way to build up sets by adding an element to a set or taking the union or intersection of two sets. We will want to know whether an element is a member of a set and we will want to distinguish sets from objects that are not sets. We will use the normal vocabulary of set theory as syntactic sugar. The empty set is a constant written as { }. There is one unary predicate, Set, which is true of sets. The binary predicates are x ∈ s (x is a member of set s) and s1 ⊆ s2 (set s1 is a subset, not necessarily proper, of set s2). The binary functions are s1 ∩ s2 (the intersection of two sets), s1 ∪ s2 (the union of two sets), and {x|s} (the set resulting from adjoining element x to set s). One possible set of axioms is as follows: 1. The only sets are the empty set and those made by adjoining something to a set: ∀ s Set(s) ⇔ (s = { }) ∨ (∃ x, s2 Set(s2) ∧ s = {x|s2}) . 2. The empty set has no elements adjoined into it. In other words, there is no way to decompose { } into a smaller set and an element: ¬∃ x, s {x|s} = { } . 3. Adjoining an element already in the set has no effect: ∀ x, s x ∈ s ⇔ s = {x|s} . 4. The only members of a set are the elements that were adjoined into it. We express this recursively, saying that x is a member of s if and only if s is equal to some set s2 adjoined with some element y, where either y is the same as x or x is a member of s2: ∀ x, s x ∈ s ⇔ ∃ y, s2 (s = {y|s2} ∧ (x = y ∨ x ∈ s2)) . 5. A set is a subset of another set if and only if all of the first set’s members are members of the second set: ∀ s1, s2 s1 ⊆ s2 ⇔ (∀ x x ∈ s1 ⇒ x ∈ s2) . 6. Two sets are equal if and only if each is a subset of the other: ∀ s1, s2 (s1 = s2) ⇔ (s1 ⊆ s2 ∧ s2 ⊆ s1) .
  • 324. Section 8.3. Using First-Order Logic 305 7. An object is in the intersection of two sets if and only if it is a member of both sets: ∀ x, s1, s2 x ∈ (s1 ∩ s2) ⇔ (x ∈ s1 ∧ x ∈ s2) . 8. An object is in the union of two sets if and only if it is a member of either set: ∀ x, s1, s2 x ∈ (s1 ∪ s2) ⇔ (x ∈ s1 ∨ x ∈ s2) . Lists are similar to sets. The differences are that lists are ordered and the same element can LIST appear more than once in a list. We can use the vocabulary of Lisp for lists: Nil is the constant list with no elements; Cons, Append, First, and Rest are functions; and Find is the pred- icate that does for lists what Member does for sets. List? is a predicate that is true only of lists. As with sets, it is common to use syntactic sugar in logical sentences involving lists. The empty list is [ ]. The term Cons(x, y), where y is a nonempty list, is written [x|y]. The term Cons(x, Nil) (i.e., the list containing the element x) is written as [x]. A list of several ele- ments, such as [A, B, C], corresponds to the nested term Cons(A, Cons(B, Cons(C, Nil))). Exercise 8.17 asks you to write out the axioms for lists. 8.3.4 The wumpus world Some propositional logic axioms for the wumpus world were given in Chapter 7. The first- order axioms in this section are much more concise, capturing in a natural way exactly what we want to say. Recall that the wumpus agent receives a percept vector with five elements. The corre- sponding first-order sentence stored in the knowledge base must include both the percept and the time at which it occurred; otherwise, the agent will get confused about when it saw what. We use integers for time steps. A typical percept sentence would be Percept([Stench, Breeze, Glitter, None, None], 5) . Here, Percept is a binary predicate, and Stench and so on are constants placed in a list. The actions in the wumpus world can be represented by logical terms: Turn(Right), Turn(Left), Forward, Shoot, Grab, Climb . To determine which is best, the agent program executes the query ASKVARS(∃ a BestAction(a, 5)) , which returns a binding list such as {a/Grab}. The agent program can then return Grab as the action to take. The raw percept data implies certain facts about the current state. For example: ∀ t, s, g, m, c Percept([s, Breeze, g, m, c], t) ⇒ Breeze(t) , ∀ t, s, b, m, c Percept([s, b, Glitter, m, c], t) ⇒ Glitter(t) , and so on. These rules exhibit a trivial form of the reasoning process called perception, which we study in depth in Chapter 24. Notice the quantification over time t. In propositional logic, we would need copies of each sentence for each time step. Simple “reflex” behavior can also be implemented by quantified implication sentences. For example, we have ∀ t Glitter(t) ⇒ BestAction(Grab, t) .
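Before continuing with the wumpus world, it is worth noting that the recursive structure of the Peano axioms and of the set axioms above can be mirrored directly in code. The sketch below is our own encoding, not the book's: a natural number is 0 or a successor term S(n), a set is the empty set or an adjoin term {x|s}, and addition, membership, and subset are exactly the recursive definitions the axioms give.

```python
# Natural numbers as successor terms: 0 is (), S(n) is ("S", n).
ZERO = ()
def S(n):
    return ("S", n)

def plus(m, n):
    # +(0, m) = m   and   +(S(m), n) = S(+(m, n))
    return n if m == ZERO else S(plus(m[1], n))

def to_int(n):
    return 0 if n == ZERO else 1 + to_int(n[1])

# Sets as adjoin terms: EMPTY, or ("adjoin", x, s) for {x|s}.
EMPTY = ("empty",)
def adjoin(x, s):
    return ("adjoin", x, s)

def member(x, s):
    # x in s  iff  s = {y|s2} and (x = y or x in s2)
    return s != EMPTY and (s[1] == x or member(x, s[2]))

def subset(s1, s2, universe):
    # s1 is a subset of s2 iff every member of s1 is a member of s2.
    return all(not member(x, s1) or member(x, s2) for x in universe)

two, three = S(S(ZERO)), S(S(S(ZERO)))
print(to_int(plus(two, three)))                          # 5
s = adjoin("a", adjoin("b", EMPTY))
print(member("a", s), member("c", s))                    # True False
print(subset(adjoin("b", EMPTY), s, {"a", "b", "c"}))    # True
```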
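The percept and reflex sentences just given can also be exercised procedurally. The following sketch is an illustration of ours (the predicate names follow the text, but the code is not the book's agent): percepts arrive as time-stamped five-element lists, a perception rule extracts Glitter(t), and the reflex rule then yields Grab as the best action at that time.

```python
# Time-stamped percepts: percepts[t] = [stench, breeze, glitter, bump, scream]
percepts = {5: ["Stench", "Breeze", "Glitter", "None", "None"]}

def glitter(t):
    # Perception rule: Percept([s, b, Glitter, m, c], t) => Glitter(t)
    return t in percepts and percepts[t][2] == "Glitter"

def best_action(t):
    # Reflex rule: Glitter(t) => BestAction(Grab, t); otherwise undecided here.
    return "Grab" if glitter(t) else None

print(best_action(5))   # Grab
print(best_action(4))   # None: no percept recorded for t = 4
```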
  • 325. 306 Chapter 8. First-Order Logic Given the percept and rules from the preceding paragraphs, this would yield the desired con- clusion BestAction(Grab, 5)—that is, Grab is the right thing to do. We have represented the agent’s inputs and outputs; now it is time to represent the environment itself. Let us begin with objects. Obvious candidates are squares, pits, and the wumpus. We could name each square—Square1,2 and so on—but then the fact that Square1,2 and Square1,3 are adjacent would have to be an “extra” fact, and we would need one such fact for each pair of squares. It is better to use a complex term in which the row and column appear as integers; for example, we can simply use the list term [1, 2]. Adjacency of any two squares can be defined as ∀ x, y, a, b Adjacent([x, y], [a, b]) ⇔ (x = a ∧ (y = b − 1 ∨ y = b + 1)) ∨ (y = b ∧ (x = a − 1 ∨ x = a + 1)) . We could name each pit, but this would be inappropriate for a different reason: there is no reason to distinguish among pits.10 It is simpler to use a unary predicate Pit that is true of squares containing pits. Finally, since there is exactly one wumpus, a constant Wumpus is just as good as a unary predicate (and perhaps more dignified from the wumpus’s viewpoint). The agent’s location changes over time, so we write At(Agent, s, t) to mean that the agent is at square s at time t. We can fix the wumpus’s location with ∀t At(Wumpus, [2, 2], t). We can then say that objects can only be at one location at a time: ∀ x, s1, s2, t At(x, s1, t) ∧ At(x, s2, t) ⇒ s1 = s2 . Given its current location, the agent can infer properties of the square from properties of its current percept. For example, if the agent is at a square and perceives a breeze, then that square is breezy: ∀ s, t At(Agent, s, t) ∧ Breeze(t) ⇒ Breezy(s) . It is useful to know that a square is breezy because we know that the pits cannot move about. Notice that Breezy has no time argument. Having discovered which places are breezy (or smelly) and, very important, not breezy (or not smelly), the agent can deduce where the pits are (and where the wumpus is). Whereas propositional logic necessitates a separate axiom for each square (see R2 and R3 on page 247) and would need a different set of axioms for each geographical layout of the world, first-order logic just needs one axiom: ∀ s Breezy(s) ⇔ ∃ r Adjacent(r, s) ∧ Pit(r) . (8.4) Similarly, in first-order logic we can quantify over time, so we need just one successor-state axiom for each predicate, rather than a different copy for each time step. For example, the axiom for the arrow (Equation (7.2) on page 267) becomes ∀ t HaveArrow(t + 1) ⇔ (HaveArrow(t) ∧ ¬Action(Shoot, t)) . From these two example sentences, we can see that the first-order logic formulation is no less concise than the original English-language description given in Chapter 7. The reader 10 Similarly, most of us do not name each bird that flies overhead as it migrates to warmer regions in winter. An ornithologist wishing to study migration patterns, survival rates, and so on does name each bird, by means of a ring on its leg, because individual birds must be tracked.
  • 326. Section 8.4. Knowledge Engineering in First-Order Logic 307 is invited to construct analogous axioms for the agent’s location and orientation; in these cases, the axioms quantify over both space and time. As in the case of propositional state estimation, an agent can use logical inference with axioms of this kind to keep track of aspects of the world that are not directly observed. Chapter 10 goes into more depth on the subject of first-order successor-state axioms and their uses for constructing plans. 8.4 KNOWLEDGE ENGINEERING IN FIRST-ORDER LOGIC The preceding section illustrated the use of first-order logic to represent knowledge in three simple domains. This section describes the general process of knowledge-base construction— a process called knowledge engineering. A knowledge engineer is someone who investigates KNOWLEDGE ENGINEERING a particular domain, learns what concepts are important in that domain, and creates a formal representation of the objects and relations in the domain. We illustrate the knowledge engi- neering process in an electronic circuit domain that should already be fairly familiar, so that we can concentrate on the representational issues involved. The approach we take is suitable for developing special-purpose knowledge bases whose domain is carefully circumscribed and whose range of queries is known in advance. General-purpose knowledge bases, which cover a broad range of human knowledge and are intended to support tasks such as natural language understanding, are discussed in Chapter 12. 8.4.1 The knowledge-engineering process Knowledge engineering projects vary widely in content, scope, and difficulty, but all such projects include the following steps: 1. Identify the task. The knowledge engineer must delineate the range of questions that the knowledge base will support and the kinds of facts that will be available for each specific problem instance. For example, does the wumpus knowledge base need to be able to choose actions or is it required to answer questions only about the contents of the environment? Will the sensor facts include the current location? The task will determine what knowledge must be represented in order to connect problem instances to answers. This step is analogous to the PEAS process for designing agents in Chapter 2. 2. Assemble the relevant knowledge. The knowledge engineer might already be an expert in the domain, or might need to work with real experts to extract what they know—a process called knowledge acquisition. At this stage, the knowledge is not represented KNOWLEDGE ACQUISITION formally. The idea is to understand the scope of the knowledge base, as determined by the task, and to understand how the domain actually works. For the wumpus world, which is defined by an artificial set of rules, the relevant knowledge is easy to identify. (Notice, however, that the definition of adjacency was not supplied explicitly in the wumpus-world rules.) For real domains, the issue of relevance can be quite difficult—for example, a system for simulating VLSI designs might or might not need to take into account stray capacitances and skin effects.
  • 327. 308 Chapter 8. First-Order Logic 3. Decide on a vocabulary of predicates, functions, and constants. That is, translate the important domain-level concepts into logic-level names. This involves many questions of knowledge-engineering style. Like programming style, this can have a significant impact on the eventual success of the project. For example, should pits be represented by objects or by a unary predicate on squares? Should the agent’s orientation be a function or a predicate? Should the wumpus’s location depend on time? Once the choices have been made, the result is a vocabulary that is known as the ontology of ONTOLOGY the domain. The word ontology means a particular theory of the nature of being or existence. The ontology determines what kinds of things exist, but does not determine their specific properties and interrelationships. 4. Encode general knowledge about the domain. The knowledge engineer writes down the axioms for all the vocabulary terms. This pins down (to the extent possible) the meaning of the terms, enabling the expert to check the content. Often, this step reveals misconceptions or gaps in the vocabulary that must be fixed by returning to step 3 and iterating through the process. 5. Encode a description of the specific problem instance. If the ontology is well thought out, this step will be easy. It will involve writing simple atomic sentences about in- stances of concepts that are already part of the ontology. For a logical agent, problem instances are supplied by the sensors, whereas a “disembodied” knowledge base is sup- plied with additional sentences in the same way that traditional programs are supplied with input data. 6. Pose queries to the inference procedure and get answers. This is where the reward is: we can let the inference procedure operate on the axioms and problem-specific facts to derive the facts we are interested in knowing. Thus, we avoid the need for writing an application-specific solution algorithm. 7. Debug the knowledge base. Alas, the answers to queries will seldom be correct on the first try. More precisely, the answers will be correct for the knowledge base as written, assuming that the inference procedure is sound, but they will not be the ones that the user is expecting. For example, if an axiom is missing, some queries will not be answerable from the knowledge base. A considerable debugging process could ensue. Missing axioms or axioms that are too weak can be easily identified by noticing places where the chain of reasoning stops unexpectedly. For example, if the knowledge base includes a diagnostic rule (see Exercise 8.14) for finding the wumpus, ∀ s Smelly(s) ⇒ Adjacent(Home(Wumpus), s) , instead of the biconditional, then the agent will never be able to prove the absence of wumpuses. Incorrect axioms can be identified because they are false statements about the world. For example, the sentence ∀ x NumOfLegs(x, 4) ⇒ Mammal(x) is false for reptiles, amphibians, and, more importantly, tables. The falsehood of this sentence can be determined independently of the rest of the knowledge base. In contrast,
  • 328. Section 8.4. Knowledge Engineering in First-Order Logic 309 a typical error in a program looks like this: offset = position + 1 . It is impossible to tell whether this statement is correct without looking at the rest of the program to see whether, for example, offset is used to refer to the current position, or to one beyond the current position, or whether the value of position is changed by another statement and so offset should also be changed again. To understand this seven-step process better, we now apply it to an extended example—the domain of electronic circuits. 8.4.2 The electronic circuits domain We will develop an ontology and knowledge base that allow us to reason about digital circuits of the kind shown in Figure 8.6. We follow the seven-step process for knowledge engineering. Identify the task There are many reasoning tasks associated with digital circuits. At the highest level, one analyzes the circuit’s functionality. For example, does the circuit in Figure 8.6 actually add properly? If all the inputs are high, what is the output of gate A2? Questions about the circuit’s structure are also interesting. For example, what are all the gates connected to the first input terminal? Does the circuit contain feedback loops? These will be our tasks in this section. There are more detailed levels of analysis, including those related to timing delays, circuit area, power consumption, production cost, and so on. Each of these levels would require additional knowledge. Assemble the relevant knowledge What do we know about digital circuits? For our purposes, they are composed of wires and gates. Signals flow along wires to the input terminals of gates, and each gate produces a 1 2 3 1 2 X1 X2 A1 A2 O1 C1 Figure 8.6 A digital circuit C1, purporting to be a one-bit full adder. The first two inputs are the two bits to be added, and the third input is a carry bit. The first output is the sum, and the second output is a carry bit for the next adder. The circuit contains two XOR gates, two AND gates, and one OR gate.
  • 329. 310 Chapter 8. First-Order Logic signal on the output terminal that flows along another wire. To determine what these signals will be, we need to know how the gates transform their input signals. There are four types of gates: AND, OR, and XOR gates have two input terminals, and NOT gates have one. All gates have one output terminal. Circuits, like gates, have input and output terminals. To reason about functionality and connectivity, we do not need to talk about the wires themselves, the paths they take, or the junctions where they come together. All that matters is the connections between terminals—we can say that one output terminal is connected to another input terminal without having to say what actually connects them. Other factors such as the size, shape, color, or cost of the various components are irrelevant to our analysis. If our purpose were something other than verifying designs at the gate level, the ontol- ogy would be different. For example, if we were interested in debugging faulty circuits, then it would probably be a good idea to include the wires in the ontology, because a faulty wire can corrupt the signal flowing along it. For resolving timing faults, we would need to include gate delays. If we were interested in designing a product that would be profitable, then the cost of the circuit and its speed relative to other products on the market would be important. Decide on a vocabulary We now know that we want to talk about circuits, terminals, signals, and gates. The next step is to choose functions, predicates, and constants to represent them. First, we need to be able to distinguish gates from each other and from other objects. Each gate is represented as an object named by a constant, about which we assert that it is a gate with, say, Gate(X1). The behavior of each gate is determined by its type: one of the constants AND, OR, XOR, or NOT. Because a gate has exactly one type, a function is appropriate: Type(X1) = XOR. Circuits, like gates, are identified by a predicate: Circuit(C1). Next we consider terminals, which are identified by the predicate Terminal(x). A gate or circuit can have one or more input terminals and one or more output terminals. We use the function In(1, X1) to denote the first input terminal for gate X1. A similar function Out is used for output terminals. The function Arity(c, i, j) says that circuit c has i input and j out- put terminals. The connectivity between gates can be represented by a predicate, Connected, which takes two terminals as arguments, as in Connected(Out(1, X1), In(1, X2)). Finally, we need to know whether a signal is on or off. One possibility is to use a unary predicate, On(t), which is true when the signal at a terminal is on. This makes it a little difficult, however, to pose questions such as “What are all the possible values of the signals at the output terminals of circuit C1 ?” We therefore introduce as objects two signal values, 1 and 0, and a function Signal(t) that denotes the signal value for the terminal t. Encode general knowledge of the domain One sign that we have a good ontology is that we require only a few general rules, which can be stated clearly and concisely. These are all the axioms we will need: 1. If two terminals are connected, then they have the same signal: ∀ t1, t2 Terminal(t1) ∧ Terminal(t2) ∧ Connected(t1, t2) ⇒ Signal(t1) = Signal(t2) .
2. The signal at every terminal is either 1 or 0:
∀ t Terminal(t) ⇒ Signal(t) = 1 ∨ Signal(t) = 0 .
3. Connected is commutative:
∀ t1, t2 Connected(t1, t2) ⇔ Connected(t2, t1) .
4. There are four types of gates:
∀ g Gate(g) ∧ k = Type(g) ⇒ k = AND ∨ k = OR ∨ k = XOR ∨ k = NOT .
5. An AND gate's output is 0 if and only if any of its inputs is 0:
∀ g Gate(g) ∧ Type(g) = AND ⇒ Signal(Out(1, g)) = 0 ⇔ ∃ n Signal(In(n, g)) = 0 .
6. An OR gate's output is 1 if and only if any of its inputs is 1:
∀ g Gate(g) ∧ Type(g) = OR ⇒ Signal(Out(1, g)) = 1 ⇔ ∃ n Signal(In(n, g)) = 1 .
7. An XOR gate's output is 1 if and only if its inputs are different:
∀ g Gate(g) ∧ Type(g) = XOR ⇒ Signal(Out(1, g)) = 1 ⇔ Signal(In(1, g)) ≠ Signal(In(2, g)) .
8. A NOT gate's output is different from its input:
∀ g Gate(g) ∧ (Type(g) = NOT) ⇒ Signal(Out(1, g)) ≠ Signal(In(1, g)) .
9. The gates (except for NOT) have two inputs and one output:
∀ g Gate(g) ∧ Type(g) = NOT ⇒ Arity(g, 1, 1) .
∀ g Gate(g) ∧ k = Type(g) ∧ (k = AND ∨ k = OR ∨ k = XOR) ⇒ Arity(g, 2, 1) .
10. A circuit has terminals, up to its input and output arity, and nothing beyond its arity:
∀ c, i, j Circuit(c) ∧ Arity(c, i, j) ⇒
∀ n (n ≤ i ⇒ Terminal(In(c, n))) ∧ (n > i ⇒ In(c, n) = Nothing) ∧
∀ n (n ≤ j ⇒ Terminal(Out(c, n))) ∧ (n > j ⇒ Out(c, n) = Nothing) .
11. Gates, terminals, signals, gate types, and Nothing are all distinct:
∀ g, t Gate(g) ∧ Terminal(t) ⇒ g ≠ t ≠ 1 ≠ 0 ≠ OR ≠ AND ≠ XOR ≠ NOT ≠ Nothing .
12. Gates are circuits:
∀ g Gate(g) ⇒ Circuit(g) .
Encode the specific problem instance
The circuit shown in Figure 8.6 is encoded as circuit C1 with the following description. First, we categorize the circuit and its component gates:
Circuit(C1) ∧ Arity(C1, 3, 2)
Gate(X1) ∧ Type(X1) = XOR
Gate(X2) ∧ Type(X2) = XOR
Gate(A1) ∧ Type(A1) = AND
Gate(A2) ∧ Type(A2) = AND
Gate(O1) ∧ Type(O1) = OR .
Then, we show the connections between them:
Connected(Out(1, X1), In(1, X2)) Connected(In(1, C1), In(1, X1))
Connected(Out(1, X1), In(2, A2)) Connected(In(1, C1), In(1, A1))
Connected(Out(1, A2), In(1, O1)) Connected(In(2, C1), In(2, X1))
Connected(Out(1, A1), In(2, O1)) Connected(In(2, C1), In(2, A1))
Connected(Out(1, X2), Out(1, C1)) Connected(In(3, C1), In(2, X2))
Connected(Out(1, O1), Out(2, C1)) Connected(In(3, C1), In(1, A2)) .
Pose queries to the inference procedure
What combinations of inputs would cause the first output of C1 (the sum bit) to be 0 and the second output of C1 (the carry bit) to be 1?
∃ i1, i2, i3 Signal(In(1, C1)) = i1 ∧ Signal(In(2, C1)) = i2 ∧ Signal(In(3, C1)) = i3 ∧ Signal(Out(1, C1)) = 0 ∧ Signal(Out(2, C1)) = 1 .
The answers are substitutions for the variables i1, i2, and i3 such that the resulting sentence is entailed by the knowledge base. ASKVARS will give us three such substitutions:
{i1/1, i2/1, i3/0} {i1/1, i2/0, i3/1} {i1/0, i2/1, i3/1} .
What are the possible sets of values of all the terminals for the adder circuit?
∃ i1, i2, i3, o1, o2 Signal(In(1, C1)) = i1 ∧ Signal(In(2, C1)) = i2 ∧ Signal(In(3, C1)) = i3 ∧ Signal(Out(1, C1)) = o1 ∧ Signal(Out(2, C1)) = o2 .
This final query will return a complete input–output table for the device, which can be used to check that it does in fact add its inputs correctly. This is a simple example of circuit verification. We can also use the definition of the circuit to build larger digital systems, for which the same kind of verification procedure can be carried out. (See Exercise 8.28.) Many domains are amenable to the same kind of structured knowledge-base development, in which more complex concepts are defined on top of simpler concepts.
Debug the knowledge base
We can perturb the knowledge base in various ways to see what kinds of erroneous behaviors emerge. For example, suppose we fail to read Section 8.2.8 and hence forget to assert that 1 ≠ 0. Suddenly, the system will be unable to prove any outputs for the circuit, except for the input cases 000 and 110. We can pinpoint the problem by asking for the outputs of each gate. For example, we can ask
∃ i1, i2, o Signal(In(1, C1)) = i1 ∧ Signal(In(2, C1)) = i2 ∧ Signal(Out(1, X1)) = o ,
which reveals that no outputs are known at X1 for the input cases 10 and 01. Then, we look at the axiom for XOR gates, as applied to X1:
Signal(Out(1, X1)) = 1 ⇔ Signal(In(1, X1)) ≠ Signal(In(2, X1)) .
If the inputs are known to be, say, 1 and 0, then this reduces to
Signal(Out(1, X1)) = 1 ⇔ 1 ≠ 0 .
Now the problem is apparent: the system is unable to infer that Signal(Out(1, X1)) = 1, so we need to tell it that 1 ≠ 0.
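The same two queries can also be checked at the ground level with a short simulation. The sketch below is illustrative only and is not the first-order inference procedure described in this section: it hard-codes the Connected facts and gate types of C1 in Python data structures of our own choosing (GATE_TYPE, CONNECTED, signals) and enumerates the eight input combinations by hand. A sound reasoner given the axioms above should return the same three substitutions and the same input–output table.

    # A minimal ground-level sketch, assuming our own encoding of the C1 instance;
    # terminals are tuples ("In"/"Out", index, device).
    from itertools import product

    GATE_TYPE = {"X1": "XOR", "X2": "XOR", "A1": "AND", "A2": "AND", "O1": "OR"}

    # The Connected facts from the problem-instance encoding above.
    CONNECTED = [
        (("Out", 1, "X1"), ("In", 1, "X2")), (("In", 1, "C1"), ("In", 1, "X1")),
        (("Out", 1, "X1"), ("In", 2, "A2")), (("In", 1, "C1"), ("In", 1, "A1")),
        (("Out", 1, "A2"), ("In", 1, "O1")), (("In", 2, "C1"), ("In", 2, "X1")),
        (("Out", 1, "A1"), ("In", 2, "O1")), (("In", 2, "C1"), ("In", 2, "A1")),
        (("Out", 1, "X2"), ("Out", 1, "C1")), (("In", 3, "C1"), ("In", 2, "X2")),
        (("Out", 1, "O1"), ("Out", 2, "C1")), (("In", 3, "C1"), ("In", 1, "A2")),
    ]

    GATE_FN = {"AND": lambda a, b: a & b, "OR": lambda a, b: a | b, "XOR": lambda a, b: a ^ b}

    def signals(i1, i2, i3):
        """Propagate signals from the circuit inputs; return the value at every terminal."""
        val = {("In", 1, "C1"): i1, ("In", 2, "C1"): i2, ("In", 3, "C1"): i3}
        changed = True
        while changed:                       # iterate to a fixed point
            changed = False
            for a, b in CONNECTED:           # axioms 1 and 3: connected terminals share a signal
                for src, dst in ((a, b), (b, a)):
                    if src in val and dst not in val:
                        val[dst] = val[src]; changed = True
            for g, ty in GATE_TYPE.items():  # axioms 5-7: gate behavior
                ins = (("In", 1, g), ("In", 2, g))
                if all(t in val for t in ins) and ("Out", 1, g) not in val:
                    val[("Out", 1, g)] = GATE_FN[ty](val[ins[0]], val[ins[1]]); changed = True
        return val

    # The second query: the complete input-output table of C1 (circuit verification).
    for i1, i2, i3 in product((0, 1), repeat=3):
        v = signals(i1, i2, i3)
        o1, o2 = v[("Out", 1, "C1")], v[("Out", 2, "C1")]
        assert o2 * 2 + o1 == i1 + i2 + i3   # C1 really adds its inputs
        if (o1, o2) == (0, 1):               # the first query: sum bit 0, carry bit 1
            print("i1=%d i2=%d i3=%d" % (i1, i2, i3))

Running the loop prints exactly the three input combinations listed above, so the simulation and the logical encoding agree.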
  • 332. Section 8.5. Summary 313 8.5 SUMMARY This chapter has introduced first-order logic, a representation language that is far more pow- erful than propositional logic. The important points are as follows: • Knowledge representation languages should be declarative, compositional, expressive, context independent, and unambiguous. • Logics differ in their ontological commitments and epistemological commitments. While propositional logic commits only to the existence of facts, first-order logic com- mits to the existence of objects and relations and thereby gains expressive power. • The syntax of first-order logic builds on that of propositional logic. It adds terms to represent objects, and has universal and existential quantifiers to construct assertions about all or some of the possible values of the quantified variables. • A possible world, or model, for first-order logic includes a set of objects and an inter- pretation that maps constant symbols to objects, predicate symbols to relations among objects, and function symbols to functions on objects. • An atomic sentence is true just when the relation named by the predicate holds between the objects named by the terms. Extended interpretations, which map quantifier vari- ables to objects in the model, define the truth of quantified sentences. • Developing a knowledge base in first-order logic requires a careful process of analyzing the domain, choosing a vocabulary, and encoding the axioms required to support the desired inferences. BIBLIOGRAPHICAL AND HISTORICAL NOTES Although Aristotle’s logic deals with generalizations over objects, it fell far short of the ex- pressive power of first-order logic. A major barrier to its further development was its concen- tration on one-place predicates to the exclusion of many-place relational predicates. The first systematic treatment of relations was given by Augustus De Morgan (1864), who cited the following example to show the sorts of inferences that Aristotle’s logic could not handle: “All horses are animals; therefore, the head of a horse is the head of an animal.” This inference is inaccessible to Aristotle because any valid rule that can support this inference must first analyze the sentence using the two-place predicate “x is the head of y.” The logic of relations was studied in depth by Charles Sanders Peirce (1870, 2004). True first-order logic dates from the introduction of quantifiers in Gottlob Frege’s (1879) Begriffschrift (“Concept Writing” or “Conceptual Notation”). Peirce (1883) also developed first-order logic independently of Frege, although slightly later. Frege’s ability to nest quan- tifiers was a big step forward, but he used an awkward notation. The present notation for first-order logic is due substantially to Giuseppe Peano (1889), but the semantics is virtually identical to Frege’s. Oddly enough, Peano’s axioms were due in large measure to Grassmann (1861) and Dedekind (1888).
  • 333. 314 Chapter 8. First-Order Logic Leopold Löwenheim (1915) gave a systematic treatment of model theory for first-order logic, including the first proper treatment of the equality symbol. Löwenheim’s results were further extended by Thoralf Skolem (1920). Alfred Tarski (1935, 1956) gave an explicit definition of truth and model-theoretic satisfaction in first-order logic, using set theory. McCarthy (1958) was primarily responsible for the introduction of first-order logic as a tool for building AI systems. The prospects for logic-based AI were advanced significantly by Robinson’s (1965) development of resolution, a complete procedure for first-order inference described in Chapter 9. The logicist approach took root at Stanford University. Cordell Green (1969a, 1969b) developed a first-order reasoning system, QA3, leading to the first attempts to build a logical robot at SRI (Fikes and Nilsson, 1971). First-order logic was applied by Zohar Manna and Richard Waldinger (1971) for reasoning about programs and later by Michael Genesereth (1984) for reasoning about circuits. In Europe, logic programming (a restricted form of first-order reasoning) was developed for linguistic analysis (Colmerauer et al., 1973) and for general declarative systems (Kowalski, 1974). Computational logic was also well entrenched at Edinburgh through the LCF (Logic for Computable Functions) project (Gordon et al., 1979). These developments are chronicled further in Chapters 9 and 12. Practical applications built with first-order logic include a system for evaluating the manufacturing requirements for electronic products (Mannion, 2002), a system for reasoning about policies for file access and digital rights management (Halpern and Weissman, 2008), and a system for the automated composition of Web services (McIlraith and Zeng, 2001). Reactions to the Whorf hypothesis (Whorf, 1956) and the problem of language and thought in general, appear in several recent books (Gumperz and Levinson, 1996; Bowerman and Levinson, 2001; Pinker, 2003; Gentner and Goldin-Meadow, 2003). The “theory” theory (Gopnik and Glymour, 2002; Tenenbaum et al., 2007) views children’s learning about the world as analogous to the construction of scientific theories. Just as the predictions of a machine learning algorithm depend strongly on the vocabulary supplied to it, so will the child’s formulation of theories depend on the linguistic environment in which learning occurs. There are a number of good introductory texts on first-order logic, including some by leading figures in the history of logic: Alfred Tarski (1941), Alonzo Church (1956), and W.V. Quine (1982) (which is one of the most readable). Enderton (1972) gives a more math- ematically oriented perspective. A highly formal treatment of first-order logic, along with many more advanced topics in logic, is provided by Bell and Machover (1977). Manna and Waldinger (1985) give a readable introduction to logic from a computer science perspec- tive, as do Huth and Ryan (2004), who concentrate on program verification. Barwise and Etchemendy (2002) take an approach similar to the one used here. Smullyan (1995) presents results concisely, using the tableau format. Gallier (1986) provides an extremely rigorous mathematical exposition of first-order logic, along with a great deal of material on its use in automated reasoning. 
Logical Foundations of Artificial Intelligence (Genesereth and Nilsson, 1987) is both a solid introduction to logic and the first systematic treatment of logical agents with percepts and actions. There are two good handbooks: van Benthem and ter Meulen (1997) and Robinson and Voronkov (2001). The journal of record for the field of pure mathematical logic is the Journal of Symbolic Logic, whereas the Journal of Applied Logic deals with concerns closer to those of artificial intelligence.
  • 334. Exercises 315 EXERCISES 8.1 A logical knowledge base represents the world using a set of sentences with no explicit structure. An analogical representation, on the other hand, has physical structure that corre- sponds directly to the structure of the thing represented. Consider a road map of your country as an analogical representation of facts about the country—it represents facts with a map lan- guage. The two-dimensional structure of the map corresponds to the two-dimensional surface of the area. a. Give five examples of symbols in the map language. b. An explicit sentence is a sentence that the creator of the representation actually writes down. An implicit sentence is a sentence that results from explicit sentences because of properties of the analogical representation. Give three examples each of implicit and explicit sentences in the map language. c. Give three examples of facts about the physical structure of your country that cannot be represented in the map language. d. Give two examples of facts that are much easier to express in the map language than in first-order logic. e. Give two other examples of useful analogical representations. What are the advantages and disadvantages of each of these languages? 8.2 Consider a knowledge base containing just two sentences: P(a) and P(b). Does this knowledge base entail ∀ x P(x)? Explain your answer in terms of models. 8.3 Is the sentence ∃ x, y x = y valid? Explain. 8.4 Write down a logical sentence such that every world in which it is true contains exactly two objects. 8.5 Consider a symbol vocabulary that contains c constant symbols, pk predicate symbols of each arity k, and fk function symbols of each arity k, where 1 ≤ k ≤ A. Let the domain size be fixed at D. For any given model, each predicate or function symbol is mapped onto a relation or function, respectively, of the same arity. You may assume that the functions in the model allow some input tuples to have no value for the function (i.e., the value is the invisible object). Derive a formula for the number of possible models for a domain with D elements. Don’t worry about eliminating redundant combinations. 8.6 Which of the following are valid (necessarily true) sentences? a. (∃x x = x) ⇒ (∀ y ∃z y = z). b. ∀ x P(x) ∨ ¬P(x). c. ∀ x Smart(x) ∨ (x = x). 8.7 Consider a version of the semantics for first-order logic in which models with empty domains are allowed. Give at least two examples of sentences that are valid according to the
  • 335. 316 Chapter 8. First-Order Logic standard semantics but not according to the new semantics. Discuss which outcome makes more intuitive sense for your examples. 8.8 Does the fact ¬Spouse(George, Laura) follow from the facts Jim 6= George and Spouse(Jim, Laura)? If so, give a proof; if not, supply additional axioms as needed. What happens if we use Spouse as a unary function symbol instead of a binary predicate? 8.9 Consider a vocabulary with the following symbols: Occupation(p, o): Predicate. Person p has occupation o. Customer(p1, p2): Predicate. Person p1 is a customer of person p2. Boss(p1, p2): Predicate. Person p1 is a boss of person p2. Doctor, Surgeon, Lawyer, Actor: Constants denoting occupations. Emily, Joe: Constants denoting people. Use these symbols to write the following assertions in first-order logic: a. Emily is either a surgeon or a lawyer. b. Joe is an actor, but he also holds another job. c. All surgeons are doctors. d. Joe does not have a lawyer (i.e., is not a customer of any lawyer). e. Emily has a boss who is a lawyer. f. There exists a lawyer all of whose customers are doctors. g. Every surgeon has a lawyer. 8.10 In each of the following we give an English sentence and a number of candidate logical expressions. For each of the logical expressions, state whether it (1) correctly expresses the English sentence; (2) is syntactically invalid and therefore meaningless; or (3) is syntactically valid but does not express the meaning of the English sentence. a. Every cat loves its mother or father. (i) ∀ x Cat(x) ⇒ Loves(x, Mother(x) ∨ Father(x)). (ii) ∀ x ¬Cat(x) ∨ Loves(x, Mother(x)) ∨ Loves(x, Father(x)). (iii) ∀ x Cat(x) ∧ (Loves(x, Mother(x)) ∨ Loves(x, Father(x))). b. Every dog who loves one of its brothers is happy. (i) ∀ x Dog(x) ∧ (∃y Brother(y, x) ∧ Loves(x, y)) ⇒ Happy(x). (ii) ∀ x, y Dog(x) ∧ Brother(y, x) ∧ Loves(x, y) ⇒ Happy(x). (iii) ∀ x Dog(x) ∧ [∀ y Brother(y, x) ⇔ Loves(x, y)] ⇒ Happy(x). c. No dog bites a child of its owner. (i) ∀ x Dog(x) ⇒ ¬Bites(x, Child(Owner(x))). (ii) ¬∃ x, y Dog(x) ∧ Child(y, Owner(x)) ∧ Bites(x, y). (iii) ∀ x Dog(x) ⇒ (∀ y Child(y, Owner(x)) ⇒ ¬Bites(x, y)). (iv) ¬∃ x Dog(x) ⇒ (∃ y Child(y, Owner(x)) ∧ Bites(x, y)). d. Everyone’s zip code within a state has the same first digit.
  • 336. Exercises 317 (i) ∀ x, s, z1 [State(s) ∧ LivesIn(x, s) ∧ Zip(x) = z1] ⇒ [∀ y, z2 LivesIn(y, s) ∧ Zip(y) = z2 ⇒ Digit(1, z1) = Digit(1, z2)]. (ii) ∀ x, s [State(s) ∧ LivesIn(x, s) ∧ ∃ z1 Zip(x) = z1] ⇒ [∀ y, z2 LivesIn(y, s) ∧ Zip(y) = z2 ∧ Digit(1, z1) = Digit(1, z2)]. (iii) ∀ x, y, s State(s)∧LivesIn(x, s)∧LivesIn(y, s) ⇒ Digit(1, Zip(x) = Zip(y)). (iv) ∀ x, y, s State(s) ∧ LivesIn(x, s) ∧ LivesIn(y, s) ⇒ Digit(1, Zip(x)) = Digit(1, Zip(y)). 8.11 Complete the following exercises about logical senntences: a. Translate into good, natural English (no xs or ys!): ∀ x, y, l SpeaksLanguage(x, l) ∧ SpeaksLanguage(y, l) ⇒ Understands(x, y) ∧ Understands(y, x). b. Explain why this sentence is entailed by the sentence ∀ x, y, l SpeaksLanguage(x, l) ∧ SpeaksLanguage(y, l) ⇒ Understands(x, y). c. Translate into first-order logic the following sentences: (i) Understanding leads to friendship. (ii) Friendship is transitive. Remember to define all predicates, functions, and constants you use. 8.12 True or false? Explain. a. ∃ x x = Rumpelstiltskin is a valid (necessarily true) sentence of first-order logic. b. Every existentially quantified sentence in first-order logic is true in any model that con- tains exactly one object. c. ∀ x, y x = y is satisfiable. 8.13 Rewrite the first two Peano axioms in Section 8.3.3 as a single axiom that defines NatNum(x) so as to exclude the possibility of natural numbers except for those generated by the successor function. 8.14 Equation (8.4) on page 306 defines the conditions under which a square is breezy. Here we consider two other ways to describe this aspect of the wumpus world. a. We can write diagnostic rules leading from observed effects to hidden causes. For find- DIAGNOSTIC RULE ing pits, the obvious diagnostic rules say that if a square is breezy, some adjacent square must contain a pit; and if a square is not breezy, then no adjacent square contains a pit. Write these two rules in first-order logic and show that their conjunction is logically equivalent to Equation (8.4). b. We can write causal rules leading from cause to effect. One obvious causal rule is that CAUSAL RULE a pit causes all adjacent squares to be breezy. Write this rule in first-order logic, explain why it is incomplete compared to Equation (8.4), and supply the missing axiom.
  • 337. 318 Chapter 8. First-Order Logic Beatrice Andrew Eugenie William Harry Charles Diana Mum George Philip Elizabeth Margaret Kydd Spencer Peter Mark Zara Anne Sarah Edward Sophie Louise James Figure 8.7 A typical family tree. The symbol “⊲⊳” connects spouses and arrows point to children. 8.15 Write axioms describing the predicates Grandchild, Greatgrandparent , Ancestor, Brother, Sister, Daughter, Son, FirstCousin, BrotherInLaw, SisterInLaw, Aunt, and Uncle. Find out the proper definition of mth cousin n times removed, and write the def- inition in first-order logic. Now write down the basic facts depicted in the family tree in Figure 8.7. Using a suitable logical reasoning system, TELL it all the sentences you have written down, and ASK it who are Elizabeth’s grandchildren, Diana’s brothers-in-law, Zara’s great-grandparents, and Eugenie’s ancestors. 8.16 Write down a sentence asserting that + is a commutative function. Does your sentence follow from the Peano axioms? If so, explain why; if not, give a model in which the axioms are true and your sentence is false. 8.17 Using the set axioms as examples, write axioms for the list domain, including all the constants, functions, and predicates mentioned in the chapter. 8.18 Explain what is wrong with the following proposed definition of adjacent squares in the wumpus world: ∀ x, y Adjacent([x, y], [x + 1, y]) ∧ Adjacent([x, y], [x, y + 1]) . 8.19 Write out the axioms required for reasoning about the wumpus’s location, using a constant symbol Wumpus and a binary predicate At(Wumpus, Location). Remember that there is only one wumpus. 8.20 Assuming predicates Parent(p, q) and Female(p) and constants Joan and Kevin, with the obvious meanings, express each of the following sentences in first-order logic. (You may use the abbreviation ∃1 to mean “there exists exactly one.”) a. Joan has a daughter (possibly more than one, and possibly sons as well). b. Joan has exactly one daughter (but may have sons as well). c. Joan has exactly one child, a daughter. d. Joan and Kevin have exactly one child together. e. Joan has at least one child with Kevin, and no children with anyone else.
8.21 Arithmetic assertions can be written in first-order logic with the predicate symbol <, the function symbols + and ×, and the constant symbols 0 and 1. Additional predicates can also be defined with biconditionals.
a. Represent the property "x is an even number."
b. Represent the property "x is prime."
c. Goldbach's conjecture is the conjecture (unproven as yet) that every even number is equal to the sum of two primes. Represent this conjecture as a logical sentence.
8.22 In Chapter 6, we used equality to indicate the relation between a variable and its value. For instance, we wrote WA = red to mean that Western Australia is colored red. Representing this in first-order logic, we must write more verbosely ColorOf(WA) = red. What incorrect inference could be drawn if we wrote sentences such as WA = red directly as logical assertions?
8.23 Write in first-order logic the assertion that every key and at least one of every pair of socks will eventually be lost forever, using only the following vocabulary: Key(x), x is a key; Sock(x), x is a sock; Pair(x, y), x and y are a pair; Now, the current time; Before(t1, t2), time t1 comes before time t2; Lost(x, t), object x is lost at time t.
8.24 Translate into first-order logic the sentence "Everyone's DNA is unique and is derived from their parents' DNA." You must specify the precise intended meaning of your vocabulary terms. (Hint: Do not use the predicate Unique(x), since uniqueness is not really a property of an object in itself!)
8.25 For each of the following sentences in English, decide if the accompanying first-order logic sentence is a good translation. If not, explain why not and correct it.
a. Any apartment in London has lower rent than some apartments in Paris.
∀ x [Apt(x) ∧ In(x, London)] ⇒ ∃ y ([Apt(y) ∧ In(y, Paris)] ⇒ (Rent(x) < Rent(y))) .
b. There is exactly one apartment in Paris with rent below $1000.
∃ x Apt(x) ∧ In(x, Paris) ∧ ∀ y [Apt(y) ∧ In(y, Paris) ∧ (Rent(y) < Dollars(1000))] ⇒ (y = x) .
c. If an apartment is more expensive than all apartments in London, it must be in Moscow.
∀ x Apt(x) ∧ [∀ y Apt(y) ∧ In(y, London) ∧ (Rent(x) > Rent(y))] ⇒ In(x, Moscow) .
8.26 Represent the following sentences in first-order logic, using a consistent vocabulary (which you must define):
a. Some students took French in spring 2009.
b. Every student who takes French passes it.
c. Only one student took Greek in spring 2009.
d. The best score in Greek is always lower than the best score in French.
  • 339. 320 Chapter 8. First-Order Logic Z0 Z1 Z2 Z3 Z4 X0 Y0 X1 Y1 X2 Y2 X3 Y3 Ad0 Ad1 Ad2 Ad3 X0 X1 X2 X3 Z0 Z1 Z2 Z3 Z4 Y0 Y1 Y2 Y3 + Figure 8.8 A four-bit adder. Each Adi is a one-bit adder, as in Figure 8.6 on page 309. e. Every person who buys a policy is smart. f. There is an agent who sells policies only to people who are not insured. g. There is a barber who shaves all men in town who do not shave themselves. h. A person born in the UK, each of whose parents is a UK citizen or a UK resident, is a UK citizen by birth. i. A person born outside the UK, one of whose parents is a UK citizen by birth, is a UK citizen by descent. j. Politicians can fool some of the people all of the time, and they can fool all of the people some of the time, but they can’t fool all of the people all of the time. k. All Greeks speak the same language. (Use Speaks(x, l) to mean that person x speaks language l.) 8.27 Write a general set of facts and axioms to represent the assertion “Wellington heard about Napoleon’s death” and to correctly answer the question “Did Napoleon hear about Wellington’s death?” 8.28 Extend the vocabulary from Section 8.4 to define addition for n-bit binary numbers. Then encode the description of the four-bit adder in Figure 8.8, and pose the queries needed to verify that it is in fact correct. 8.29 The circuit representation in the chapter is more detailed than necessary if we care only about circuit functionality. A simpler formulation describes any m-input, n-output gate or circuit using a predicate with m+n arguments, such that the predicate is true exactly when the inputs and outputs are consistent. For example, NOT gates are described by the binary predicate NOT(i, o), for which NOT(0, 1) and NOT(1, 0) are known. Compositions of gates are defined by conjunctions of gate predicates in which shared variables indicate direct connections. For example, a NAND circuit can be composed from ANDs and NOTs: ∀ i1, i2, oa, o AND(i1, i2, oa) ∧ NOT(oa, o) ⇒ NAND(i1, i2, o) .
  • 340. Exercises 321 Using this representation, define the one-bit adder in Figure 8.6 and the four-bit adder in Figure 8.8, and explain what queries you would use to verify the designs. What kinds of queries are not supported by this representation that are supported by the representation in Section 8.4? 8.30 Obtain a passport application for your country, identify the rules determining eligi- bility for a passport, and translate them into first-order logic, following the steps outlined in Section 8.4. 8.31 Consider a first-order logical knowledge base that describes worlds containing people, songs, albums (e.g., “Meet the Beatles”) and disks (i.e., particular physical instances of CDs). The vocabulary contains the following symbols: CopyOf (d, a): Predicate. Disk d is a copy of album a. Owns(p, d): Predicate. Person p owns disk d. Sings(p, s, a): Album a includes a recording of song s sung by person p. Wrote(p, s): Person p wrote song s. McCartney, Gershwin, BHoliday, Joe, EleanorRigby, TheManILove, Revolver: Constants with the obvious meanings. Express the following statements in first-order logic: a. Gershwin wrote “The Man I Love.” b. Gershwin did not write “Eleanor Rigby.” c. Either Gershwin or McCartney wrote “The Man I Love.” d. Joe has written at least one song. e. Joe owns a copy of Revolver. f. Every song that McCartney sings on Revolver was written by McCartney. g. Gershwin did not write any of the songs on Revolver. h. Every song that Gershwin wrote has been recorded on some album. (Possibly different songs are recorded on different albums.) i. There is a single album that contains every song that Joe has written. j. Joe owns a copy of an album that has Billie Holiday singing “The Man I Love.” k. Joe owns a copy of every album that has a song sung by McCartney. (Of course, each different album is instantiated in a different physical CD.) l. Joe owns a copy of every album on which all the songs are sung by Billie Holiday.
  • 341. 9 INFERENCE IN FIRST-ORDER LOGIC In which we define effective procedures for answering questions posed in first- order logic. Chapter 7 showed how sound and complete inference can be achieved for propositional logic. In this chapter, we extend those results to obtain algorithms that can answer any answer- able question stated in first-order logic. Section 9.1 introduces inference rules for quantifiers and shows how to reduce first-order inference to propositional inference, albeit at potentially great expense. Section 9.2 describes the idea of unification, showing how it can be used to construct inference rules that work directly with first-order sentences. We then discuss three major families of first-order inference algorithms. Forward chaining and its applica- tions to deductive databases and production systems are covered in Section 9.3; backward chaining and logic programming systems are developed in Section 9.4. Forward and back- ward chaining can be very efficient, but are applicable only to knowledge bases that can be expressed as sets of Horn clauses. General first-order sentences require resolution-based theorem proving, which is described in Section 9.5. 9.1 PROPOSITIONAL VS. FIRST-ORDER INFERENCE This section and the next introduce the ideas underlying modern logical inference systems. We begin with some simple inference rules that can be applied to sentences with quantifiers to obtain sentences without quantifiers. These rules lead naturally to the idea that first-order inference can be done by converting the knowledge base to propositional logic and using propositional inference, which we already know how to do. The next section points out an obvious shortcut, leading to inference methods that manipulate first-order sentences directly. 9.1.1 Inference rules for quantifiers Let us begin with universal quantifiers. Suppose our knowledge base contains the standard folkloric axiom stating that all greedy kings are evil: ∀ x King(x) ∧ Greedy(x) ⇒ Evil(x) . 322
Then it seems quite permissible to infer any of the following sentences:
King(John) ∧ Greedy(John) ⇒ Evil(John)
King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard)
King(Father(John)) ∧ Greedy(Father(John)) ⇒ Evil(Father(John))
...
The rule of Universal Instantiation (UI for short) says that we can infer any sentence obtained by substituting a ground term (a term without variables) for the variable.1 To write out the inference rule formally, we use the notion of substitutions introduced in Section 8.3. Let SUBST(θ, α) denote the result of applying the substitution θ to the sentence α. Then the rule is written: from ∀ v α, infer SUBST({v/g}, α), for any variable v and ground term g. For example, the three sentences given earlier are obtained with the substitutions {x/John}, {x/Richard}, and {x/Father(John)}.
In the rule for Existential Instantiation, the variable is replaced by a single new constant symbol. The formal statement is as follows: for any sentence α, variable v, and constant symbol k that does not appear elsewhere in the knowledge base, from ∃ v α, infer SUBST({v/k}, α). For example, from the sentence ∃ x Crown(x) ∧ OnHead(x, John) we can infer the sentence Crown(C1) ∧ OnHead(C1, John) as long as C1 does not appear elsewhere in the knowledge base. Basically, the existential sentence says there is some object satisfying a condition, and applying the existential instantiation rule just gives a name to that object. Of course, that name must not already belong to another object. Mathematics provides a nice example: suppose we discover that there is a number that is a little bigger than 2.71828 and that satisfies the equation d(x^y)/dy = x^y for x. We can give this number a name, such as e, but it would be a mistake to give it the name of an existing object, such as π. In logic, the new name is called a Skolem constant. Existential Instantiation is a special case of a more general process called skolemization, which we cover in Section 9.5.
Whereas Universal Instantiation can be applied many times to produce many different consequences, Existential Instantiation can be applied once, and then the existentially quantified sentence can be discarded. For example, we no longer need ∃ x Kill(x, Victim) once we have added the sentence Kill(Murderer, Victim). Strictly speaking, the new knowledge base is not logically equivalent to the old, but it can be shown to be inferentially equivalent in the sense that it is satisfiable exactly when the original knowledge base is satisfiable.
1 Do not confuse these substitutions with the extended interpretations used to define the semantics of quantifiers. The substitution replaces a variable with a term (a piece of syntax) to produce a new sentence, whereas an interpretation maps a variable to an object in the domain.
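The mechanics of substitution and Universal Instantiation are easy to prototype. The fragment below is a minimal sketch under a term representation of our own devising (sentences and terms as nested tuples, variables as bare strings); it is illustrative only, not a full inference system, and simply generates the three ground instances of the greedy-kings axiom from three substitutions.

    # A minimal sketch, assuming a home-grown representation: compound terms and
    # sentences are tuples whose first element is the operator or symbol name.
    def subst(theta, x):
        """SUBST(theta, x): apply substitution theta to sentence or term x."""
        if isinstance(x, tuple):
            return tuple(subst(theta, xi) for xi in x)
        return theta.get(x, x)        # a variable maps to its binding, if any

    # forall x  King(x) AND Greedy(x) => Evil(x), with the quantifier left implicit
    axiom = ("=>", ("and", ("King", "x"), ("Greedy", "x")), ("Evil", "x"))

    # Universal Instantiation: substitute any ground term for the universal variable.
    for g in ["John", "Richard", ("Father", "John")]:
        print(subst({"x": g}, axiom))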
  • 343. 324 Chapter 9. Inference in First-Order Logic 9.1.2 Reduction to propositional inference Once we have rules for inferring nonquantified sentences from quantified sentences, it be- comes possible to reduce first-order inference to propositional inference. In this section we give the main ideas; the details are given in Section 9.5. The first idea is that, just as an existentially quantified sentence can be replaced by one instantiation, a universally quantified sentence can be replaced by the set of all possible instantiations. For example, suppose our knowledge base contains just the sentences ∀ x King(x) ∧ Greedy(x) ⇒ Evil(x) King(John) Greedy(John) Brother(Richard, John) . (9.1) Then we apply UI to the first sentence using all possible ground-term substitutions from the vocabulary of the knowledge base—in this case, {x/John} and {x/Richard}. We obtain King(John) ∧ Greedy(John) ⇒ Evil(John) King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard) , and we discard the universally quantified sentence. Now, the knowledge base is essentially propositional if we view the ground atomic sentences—King(John), Greedy(John), and so on—as proposition symbols. Therefore, we can apply any of the complete propositional algorithms in Chapter 7 to obtain conclusions such as Evil(John). This technique of propositionalization can be made completely general, as we show in Section 9.5; that is, every first-order knowledge base and query can be propositionalized in such a way that entailment is preserved. Thus, we have a complete decision procedure for entailment . . . or perhaps not. There is a problem: when the knowledge base includes a function symbol, the set of possible ground-term substitutions is infinite! For example, if the knowledge base mentions the Father symbol, then infinitely many nested terms such as Father(Father(Father(John))) can be constructed. Our propositional algorithms will have difficulty with an infinitely large set of sentences. Fortunately, there is a famous theorem due to Jacques Herbrand (1930) to the effect that if a sentence is entailed by the original, first-order knowledge base, then there is a proof involving just a finite subset of the propositionalized knowledge base. Since any such subset has a maximum depth of nesting among its ground terms, we can find the subset by first generating all the instantiations with constant symbols (Richard and John), then all terms of depth 1 (Father(Richard) and Father(John)), then all terms of depth 2, and so on, until we are able to construct a propositional proof of the entailed sentence. We have sketched an approach to first-order inference via propositionalization that is complete—that is, any entailed sentence can be proved. This is a major achievement, given that the space of possible models is infinite. On the other hand, we do not know until the proof is done that the sentence is entailed! What happens when the sentence is not entailed? Can we tell? Well, for first-order logic, it turns out that we cannot. Our proof procedure can go on and on, generating more and more deeply nested terms, but we will not know whether it is stuck in a hopeless loop or whether the proof is just about to pop out. This is very much
  • 344. Section 9.2. Unification and Lifting 325 like the halting problem for Turing machines. Alan Turing (1936) and Alonzo Church (1936) both proved, in rather different ways, the inevitability of this state of affairs. The question of entailment for first-order logic is semidecidable—that is, algorithms exist that say yes to every entailed sentence, but no algorithm exists that also says no to every nonentailed sentence. 9.2 UNIFICATION AND LIFTING The preceding section described the understanding of first-order inference that existed up to the early 1960s. The sharp-eyed reader (and certainly the computational logicians of the early 1960s) will have noticed that the propositionalization approach is rather inefficient. For example, given the query Evil(x) and the knowledge base in Equation (9.1), it seems per- verse to generate sentences such as King(Richard) ∧ Greedy(Richard) ⇒ Evil(Richard). Indeed, the inference of Evil(John) from the sentences ∀ x King(x) ∧ Greedy(x) ⇒ Evil(x) King(John) Greedy(John) seems completely obvious to a human being. We now show how to make it completely obvious to a computer. 9.2.1 A first-order inference rule The inference that John is evil—that is, that {x/John} solves the query Evil(x)—works like this: to use the rule that greedy kings are evil, find some x such that x is a king and x is greedy, and then infer that this x is evil. More generally, if there is some substitution θ that makes each of the conjuncts of the premise of the implication identical to sentences already in the knowledge base, then we can assert the conclusion of the implication, after applying θ. In this case, the substitution θ = {x/John} achieves that aim. We can actually make the inference step do even more work. Suppose that instead of knowing Greedy(John), we know that everyone is greedy: ∀ y Greedy(y) . (9.2) Then we would still like to be able to conclude that Evil(John), because we know that John is a king (given) and John is greedy (because everyone is greedy). What we need for this to work is to find a substitution both for the variables in the implication sentence and for the variables in the sentences that are in the knowledge base. In this case, applying the substitution {x/John, y/John} to the implication premises King(x) and Greedy(x) and the knowledge-base sentences King(John) and Greedy(y) will make them identical. Thus, we can infer the conclusion of the implication. This inference process can be captured as a single inference rule that we call Gener- alized Modus Ponens:2 For atomic sentences pi, pi ′, and q, where there is a substitution θ GENERALIZED MODUS PONENS
  • 345. 326 Chapter 9. Inference in First-Order Logic such that SUBST(θ, pi ′) = SUBST(θ, pi), for all i, p1 ′, p2 ′, . . . , pn ′, (p1 ∧ p2 ∧ . . . ∧ pn ⇒ q) SUBST(θ, q) . There are n+1 premises to this rule: the n atomic sentences pi ′ and the one implication. The conclusion is the result of applying the substitution θ to the consequent q. For our example: p1 ′ is King(John) p1 is King(x) p2 ′ is Greedy(y) p2 is Greedy(x) θ is {x/John, y/John} q is Evil(x) SUBST(θ, q) is Evil(John) . It is easy to show that Generalized Modus Ponens is a sound inference rule. First, we observe that, for any sentence p (whose variables are assumed to be universally quantified) and for any substitution θ, p |= SUBST(θ, p) holds by Universal Instantiation. It holds in particular for a θ that satisfies the conditions of the Generalized Modus Ponens rule. Thus, from p1 ′, . . . , pn ′ we can infer SUBST(θ, p1 ′ ) ∧ . . . ∧ SUBST(θ, pn ′ ) and from the implication p1 ∧ . . . ∧ pn ⇒ q we can infer SUBST(θ, p1) ∧ . . . ∧ SUBST(θ, pn) ⇒ SUBST(θ, q) . Now, θ in Generalized Modus Ponens is defined so that SUBST(θ, pi ′) = SUBST(θ, pi), for all i; therefore the first of these two sentences matches the premise of the second exactly. Hence, SUBST(θ, q) follows by Modus Ponens. Generalized Modus Ponens is a lifted version of Modus Ponens—it raises Modus Po- LIFTING nens from ground (variable-free) propositional logic to first-order logic. We will see in the rest of this chapter that we can develop lifted versions of the forward chaining, backward chaining, and resolution algorithms introduced in Chapter 7. The key advantage of lifted inference rules over propositionalization is that they make only those substitutions that are required to allow particular inferences to proceed. 9.2.2 Unification Lifted inference rules require finding substitutions that make different logical expressions look identical. This process is called unification and is a key component of all first-order UNIFICATION inference algorithms. The UNIFY algorithm takes two sentences and returns a unifier for UNIFIER them if one exists: UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q) . Let us look at some examples of how UNIFY should behave. Suppose we have a query AskVars(Knows(John, x)): whom does John know? Answers to this query can be found 2 Generalized Modus Ponens is more general than Modus Ponens (page 249) in the sense that the known facts and the premise of the implication need match only up to a substitution, rather than exactly. On the other hand, Modus Ponens allows any sentence α as the premise, rather than just a conjunction of atomic sentences.
  • 346. Section 9.2. Unification and Lifting 327 by finding all sentences in the knowledge base that unify with Knows(John, x). Here are the results of unification with four different sentences that might be in the knowledge base: UNIFY(Knows(John, x), Knows(John, Jane)) = {x/Jane} UNIFY(Knows(John, x), Knows(y, Bill)) = {x/Bill, y/John} UNIFY(Knows(John, x), Knows(y, Mother(y))) = {y/John, x/Mother(John)} UNIFY(Knows(John, x), Knows(x, Elizabeth)) = fail . The last unification fails because x cannot take on the values John and Elizabeth at the same time. Now, remember that Knows(x, Elizabeth) means “Everyone knows Elizabeth,” so we should be able to infer that John knows Elizabeth. The problem arises only because the two sentences happen to use the same variable name, x. The problem can be avoided by standardizing apart one of the two sentences being unified, which means renaming its STANDARDIZING APART variables to avoid name clashes. For example, we can rename x in Knows(x, Elizabeth) to x17 (a new variable name) without changing its meaning. Now the unification will work: UNIFY(Knows(John, x), Knows(x17, Elizabeth)) = {x/Elizabeth, x17/John} . Exercise 9.13 delves further into the need for standardizing apart. There is one more complication: we said that UNIFY should return a substitution that makes the two arguments look the same. But there could be more than one such uni- fier. For example, UNIFY(Knows(John, x), Knows(y, z)) could return {y/John, x/z} or {y/John, x/John, z/John}. The first unifier gives Knows(John, z) as the result of unifi- cation, whereas the second gives Knows(John, John). The second result could be obtained from the first by an additional substitution {z/John}; we say that the first unifier is more general than the second, because it places fewer restrictions on the values of the variables. It turns out that, for every unifiable pair of expressions, there is a single most general unifier (or MOST GENERAL UNIFIER MGU) that is unique up to renaming and substitution of variables. (For example, {x/John} and {y/John} are considered equivalent, as are {x/John, y/John} and {x/John, y/x}.) In this case it is {y/John, x/z}. An algorithm for computing most general unifiers is shown in Figure 9.1. The process is simple: recursively explore the two expressions simultaneously “side by side,” building up a unifier along the way, but failing if two corresponding points in the structures do not match. There is one expensive step: when matching a variable against a complex term, one must check whether the variable itself occurs inside the term; if it does, the match fails because no consistent unifier can be constructed. For example, S(x) can’t unify with S(S(x)). This so- called occur check makes the complexity of the entire algorithm quadratic in the size of the OCCUR CHECK expressions being unified. Some systems, including all logic programming systems, simply omit the occur check and sometimes make unsound inferences as a result; other systems use more complex algorithms with linear-time complexity. 9.2.3 Storage and retrieval Underlying the TELL and ASK functions used to inform and interrogate a knowledge base are the more primitive STORE and FETCH functions. STORE(s) stores a sentence s into the knowledge base and FETCH(q) returns all unifiers such that the query q unifies with some
  • 347. 328 Chapter 9. Inference in First-Order Logic function UNIFY(x,y,θ) returns a substitution to make x and y identical inputs: x, a variable, constant, list, or compound expression y, a variable, constant, list, or compound expression θ, the substitution built up so far (optional, defaults to empty) if θ = failure then return failure else if x = y then return θ else if VARIABLE?(x) then return UNIFY-VAR(x,y,θ) else if VARIABLE?(y) then return UNIFY-VAR(y,x,θ) else if COMPOUND?(x) and COMPOUND?(y) then return UNIFY(x.ARGS,y.ARGS, UNIFY(x.OP,y.OP,θ)) else if LIST?(x) and LIST?(y) then return UNIFY(x.REST,y.REST, UNIFY(x.FIRST,y.FIRST,θ)) else return failure function UNIFY-VAR(var,x,θ) returns a substitution if {var/val} ∈ θ then return UNIFY(val,x,θ) else if {x/val} ∈ θ then return UNIFY(var,val,θ) else if OCCUR-CHECK?(var,x) then return failure else return add {var/x} to θ Figure 9.1 The unification algorithm. The algorithm works by comparing the structures of the inputs, element by element. The substitution θ that is the argument to UNIFY is built up along the way and is used to make sure that later comparisons are consistent with bindings that were established earlier. In a compound expression such as F(A, B), the OP field picks out the function symbol F and the ARGS field picks out the argument list (A, B). sentence in the knowledge base. The problem we used to illustrate unification—finding all facts that unify with Knows(John, x)—is an instance of FETCHing. The simplest way to implement STORE and FETCH is to keep all the facts in one long list and unify each query against every element of the list. Such a process is inefficient, but it works, and it’s all you need to understand the rest of the chapter. The remainder of this section outlines ways to make retrieval more efficient; it can be skipped on first reading. We can make FETCH more efficient by ensuring that unifications are attempted only with sentences that have some chance of unifying. For example, there is no point in trying to unify Knows(John, x) with Brother(Richard, John). We can avoid such unifications by indexing the facts in the knowledge base. A simple scheme called predicate indexing puts INDEXING PREDICATE INDEXING all the Knows facts in one bucket and all the Brother facts in another. The buckets can be stored in a hash table for efficient access. Predicate indexing is useful when there are many predicate symbols but only a few clauses for each symbol. Sometimes, however, a predicate has many clauses. For example, suppose that the tax authorities want to keep track of who employs whom, using a predi- cate Employs(x, y). This would be a very large bucket with perhaps millions of employers
  • 348. Section 9.2. Unification and Lifting 329 Employs(x,y) Employs(x,Richard) Employs(IBM,y) Employs(IBM,Richard) Employs(x,y) Employs(John,John) Employs(x,x) Employs(x,John) Employs(John,y) (a) (b) Figure 9.2 (a) The subsumption lattice whose lowest node is Employs(IBM , Richard). (b) The subsumption lattice for the sentence Employs(John, John). and tens of millions of employees. Answering a query such as Employs(x, Richard) with predicate indexing would require scanning the entire bucket. For this particular query, it would help if facts were indexed both by predicate and by second argument, perhaps using a combined hash table key. Then we could simply construct the key from the query and retrieve exactly those facts that unify with the query. For other queries, such as Employs(IBM , y), we would need to have indexed the facts by combining the predicate with the first argument. Therefore, facts can be stored under multiple index keys, rendering them instantly accessible to various queries that they might unify with. Given a sentence to be stored, it is possible to construct indices for all possible queries that unify with it. For the fact Employs(IBM , Richard), the queries are Employs(IBM , Richard) Does IBM employ Richard? Employs(x, Richard) Who employs Richard? Employs(IBM , y) Whom does IBM employ? Employs(x, y) Who employs whom? These queries form a subsumption lattice, as shown in Figure 9.2(a). The lattice has some SUBSUMPTION LATTICE interesting properties. For example, the child of any node in the lattice is obtained from its parent by a single substitution; and the “highest” common descendant of any two nodes is the result of applying their most general unifier. The portion of the lattice above any ground fact can be constructed systematically (Exercise 9.5). A sentence with repeated constants has a slightly different lattice, as shown in Figure 9.2(b). Function symbols and variables in the sentences to be stored introduce still more interesting lattice structures. The scheme we have described works very well whenever the lattice contains a small number of nodes. For a predicate with n arguments, however, the lattice contains O(2n) nodes. If function symbols are allowed, the number of nodes is also exponential in the size of the terms in the sentence to be stored. This can lead to a huge number of indices. At some point, the benefits of indexing are outweighed by the costs of storing and maintaining all the indices. We can respond by adopting a fixed policy, such as maintaining indices only on keys composed of a predicate plus each argument, or by using an adaptive policy that creates indices to meet the demands of the kinds of queries being asked. For most AI systems, the number of facts to be stored is small enough that efficient indexing is considered a solved problem. For commercial databases, where facts number in the billions, the problem has been the subject of intensive study and technology development..
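For readers who want to experiment, the UNIFY algorithm of Figure 9.1 can be transcribed into a few dozen lines of Python. The sketch below makes assumptions of our own: variables are strings beginning with "?", compound expressions are flat tuples (so the OP and ARGS fields of the figure are not separated), and the occur check is included. It reproduces the Knows(John, x) examples from Section 9.2.2.

    # A rough transcription of Figure 9.1 under the assumptions stated above.
    def is_var(x):
        return isinstance(x, str) and x.startswith("?")

    def occurs(var, x, theta):
        """Occur check: does var appear anywhere inside x, following bindings in theta?"""
        if var == x:
            return True
        if is_var(x) and x in theta:
            return occurs(var, theta[x], theta)
        if isinstance(x, tuple):
            return any(occurs(var, xi, theta) for xi in x)
        return False

    def unify_var(var, x, theta):
        if var in theta:
            return unify(theta[var], x, theta)
        if is_var(x) and x in theta:
            return unify(var, theta[x], theta)
        if occurs(var, x, theta):
            return None                        # failure
        return {**theta, var: x}               # extend the substitution

    def unify(x, y, theta={}):
        """Return a most general unifier of x and y extending theta, or None on failure."""
        if x == y:
            return theta
        if is_var(x):
            return unify_var(x, y, theta)
        if is_var(y):
            return unify_var(y, x, theta)
        if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
            for xi, yi in zip(x, y):
                theta = unify(xi, yi, theta)
                if theta is None:
                    return None
            return theta
        return None

    # The four examples from Section 9.2.2:
    print(unify(("Knows", "John", "?x"), ("Knows", "John", "Jane")))      # {'?x': 'Jane'}
    print(unify(("Knows", "John", "?x"), ("Knows", "?y", "Bill")))        # {'?y': 'John', '?x': 'Bill'}
    print(unify(("Knows", "John", "?x"), ("Knows", "?y", ("Mother", "?y"))))
    print(unify(("Knows", "John", "?x"), ("Knows", "?x", "Elizabeth")))   # None: needs standardizing apart

The third call returns the chained bindings {'?y': 'John', '?x': ('Mother', '?y')}; composing them gives {y/John, x/Mother(John)}, the most general unifier quoted in the text, and the fourth call fails for the same reason discussed there.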
  • 349. 330 Chapter 9. Inference in First-Order Logic 9.3 FORWARD CHAINING A forward-chaining algorithm for propositional definite clauses was given in Section 7.5. The idea is simple: start with the atomic sentences in the knowledge base and apply Modus Ponens in the forward direction, adding new atomic sentences, until no further inferences can be made. Here, we explain how the algorithm is applied to first-order definite clauses. Definite clauses such as Situation ⇒ Response are especially useful for systems that make inferences in response to newly arrived information. Many systems can be defined this way, and forward chaining can be implemented very efficiently. 9.3.1 First-order definite clauses First-order definite clauses closely resemble propositional definite clauses (page 256): they are disjunctions of literals of which exactly one is positive. A definite clause either is atomic or is an implication whose antecedent is a conjunction of positive literals and whose conse- quent is a single positive literal. The following are first-order definite clauses: King(x) ∧ Greedy(x) ⇒ Evil(x) . King(John) . Greedy(y) . Unlike propositional literals, first-order literals can include variables, in which case those variables are assumed to be universally quantified. (Typically, we omit universal quantifiers when writing definite clauses.) Not every knowledge base can be converted into a set of definite clauses because of the single-positive-literal restriction, but many can. Consider the following problem: The law says that it is a crime for an American to sell weapons to hostile nations. The country Nono, an enemy of America, has some missiles, and all of its missiles were sold to it by Colonel West, who is American. We will prove that West is a criminal. First, we will represent these facts as first-order definite clauses. The next section shows how the forward-chaining algorithm solves the problem. “. . . it is a crime for an American to sell weapons to hostile nations”: American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x) . (9.3) “Nono . . . has some missiles.” The sentence ∃ x Owns(Nono, x)∧Missile(x) is transformed into two definite clauses by Existential Instantiation, introducing a new constant M1: Owns(Nono, M1) (9.4) Missile(M1) . (9.5) “All of its missiles were sold to it by Colonel West”: Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono) . (9.6) We will also need to know that missiles are weapons: Missile(x) ⇒ Weapon(x) (9.7)
  • 350. Section 9.3. Forward Chaining 331 and we must know that an enemy of America counts as “hostile”: Enemy(x, America) ⇒ Hostile(x) . (9.8) “West, who is American . . .”: American(West) . (9.9) “The country Nono, an enemy of America . . .”: Enemy(Nono, America) . (9.10) This knowledge base contains no function symbols and is therefore an instance of the class of Datalog knowledge bases. Datalog is a language that is restricted to first-order definite DATALOG clauses with no function symbols. Datalog gets its name because it can represent the type of statements typically made in relational databases. We will see that the absence of function symbols makes inference much easier. 9.3.2 A simple forward-chaining algorithm The first forward-chaining algorithm we consider is a simple one, shown in Figure 9.3. Start- ing from the known facts, it triggers all the rules whose premises are satisfied, adding their conclusions to the known facts. The process repeats until the query is answered (assuming that just one answer is required) or no new facts are added. Notice that a fact is not “new” if it is just a renaming of a known fact. One sentence is a renaming of another if they RENAMING are identical except for the names of the variables. For example, Likes(x, IceCream) and Likes(y, IceCream) are renamings of each other because they differ only in the choice of x or y; their meanings are identical: everyone likes ice cream. We use our crime problem to illustrate how FOL-FC-ASK works. The implication sentences are (9.3), (9.6), (9.7), and (9.8). Two iterations are required: • On the first iteration, rule (9.3) has unsatisfied premises. Rule (9.6) is satisfied with {x/M1}, and Sells(West, M1, Nono) is added. Rule (9.7) is satisfied with {x/M1}, and Weapon(M1) is added. Rule (9.8) is satisfied with {x/Nono}, and Hostile(Nono) is added. • On the second iteration, rule (9.3) is satisfied with {x/West, y/M1, z/Nono}, and Criminal(West) is added. Figure 9.4 shows the proof tree that is generated. Notice that no new inferences are possible at this point because every sentence that could be concluded by forward chaining is already contained explicitly in the KB. Such a knowledge base is called a fixed point of the inference process. Fixed points reached by forward chaining with first-order definite clauses are similar to those for propositional forward chaining (page 258); the principal difference is that a first- order fixed point can include universally quantified atomic sentences. FOL-FC-ASK is easy to analyze. First, it is sound, because every inference is just an application of Generalized Modus Ponens, which is sound. Second, it is complete for definite clause knowledge bases; that is, it answers every query whose answers are entailed by any knowledge base of definite clauses. For Datalog knowledge bases, which contain no function symbols, the proof of completeness is fairly easy. We begin by counting the number of
function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence
  local variables: new, the new sentences inferred on each iteration

  repeat until new is empty
    new ← { }
    for each rule in KB do
      (p1 ∧ ... ∧ pn ⇒ q) ← STANDARDIZE-VARIABLES(rule)
      for each θ such that SUBST(θ, p1 ∧ ... ∧ pn) = SUBST(θ, p′1 ∧ ... ∧ p′n)
                  for some p′1, ..., p′n in KB
        q′ ← SUBST(θ, q)
        if q′ does not unify with some sentence already in KB or new then
          add q′ to new
          φ ← UNIFY(q′, α)
          if φ is not fail then return φ
    add new to KB
  return false

Figure 9.3  A conceptually straightforward, but very inefficient, forward-chaining algorithm. On each iteration, it adds to KB all the atomic sentences that can be inferred in one step from the implication sentences and the atomic sentences already in KB. The function STANDARDIZE-VARIABLES replaces all variables in its arguments with new ones that have not been used before.

Figure 9.4  The proof tree generated by forward chaining on the crime example. The initial facts appear at the bottom level, facts inferred on the first iteration in the middle level, and facts inferred on the second iteration at the top level. [The tree runs from the initial facts American(West), Missile(M1), Owns(Nono, M1), and Enemy(Nono, America), through Weapon(M1), Sells(West, M1, Nono), and Hostile(Nono), up to Criminal(West).]

possible facts that can be added, which determines the maximum number of iterations. Let k be the maximum arity (number of arguments) of any predicate, p be the number of predicates, and n be the number of constant symbols. Clearly, there can be no more than pn^k distinct ground facts, so after this many iterations the algorithm must have reached a fixed point. Then we can make an argument very similar to the proof of completeness for propositional forward
  • 352. Section 9.3. Forward Chaining 333 chaining. (See page 258.) The details of how to make the transition from propositional to first-order completeness are given for the resolution algorithm in Section 9.5. For general definite clauses with function symbols, FOL-FC-ASK can generate in- finitely many new facts, so we need to be more careful. For the case in which an answer to the query sentence q is entailed, we must appeal to Herbrand’s theorem to establish that the algorithm will find a proof. (See Section 9.5 for the resolution case.) If the query has no answer, the algorithm could fail to terminate in some cases. For example, if the knowledge base includes the Peano axioms NatNum(0) ∀ n NatNum(n) ⇒ NatNum(S(n)) , then forward chaining adds NatNum(S(0)), NatNum(S(S(0))), NatNum(S(S(S(0)))), and so on. This problem is unavoidable in general. As with general first-order logic, entail- ment with definite clauses is semidecidable. 9.3.3 Efficient forward chaining The forward-chaining algorithm in Figure 9.3 is designed for ease of understanding rather than for efficiency of operation. There are three possible sources of inefficiency. First, the “inner loop” of the algorithm involves finding all possible unifiers such that the premise of a rule unifies with a suitable set of facts in the knowledge base. This is often called pattern matching and can be very expensive. Second, the algorithm rechecks every rule on every PATTERN MATCHING iteration to see whether its premises are satisfied, even if very few additions are made to the knowledge base on each iteration. Finally, the algorithm might generate many facts that are irrelevant to the goal. We address each of these issues in turn. Matching rules against known facts The problem of matching the premise of a rule against the facts in the knowledge base might seem simple enough. For example, suppose we want to apply the rule Missile(x) ⇒ Weapon(x) . Then we need to find all the facts that unify with Missile(x); in a suitably indexed knowledge base, this can be done in constant time per fact. Now consider a rule such as Missile(x) ∧ Owns(Nono, x) ⇒ Sells(West, x, Nono) . Again, we can find all the objects owned by Nono in constant time per object; then, for each object, we could check whether it is a missile. If the knowledge base contains many objects owned by Nono and very few missiles, however, it would be better to find all the missiles first and then check whether they are owned by Nono. This is the conjunct ordering problem: CONJUNCT ORDERING find an ordering to solve the conjuncts of the rule premise so that the total cost is minimized. It turns out that finding the optimal ordering is NP-hard, but good heuristics are available. For example, the minimum-remaining-values (MRV) heuristic used for CSPs in Chapter 6 would suggest ordering the conjuncts to look for missiles first if fewer missiles than objects are owned by Nono.
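To make the control flow of Figure 9.3 and the conjunct-ordering idea above concrete, the following is a minimal sketch in Python of naive forward chaining over the Datalog version of the crime knowledge base. The representation (atoms as tuples, variables marked with a leading "?") and all helper names are assumptions made for illustration; this is not the book's FOL-FC-ASK, and the matcher handles only ground facts, which suffices for Datalog. The premise ordering by number of candidate facts is a rough stand-in for the MRV-style heuristic just mentioned.

```python
# A minimal sketch of naive forward chaining for Datalog-style definite clauses.
# Atoms are tuples; strings starting with "?" are variables (an assumption of this
# sketch, not the book's notation).

FACTS = {("American", "West"), ("Missile", "M1"),
         ("Owns", "Nono", "M1"), ("Enemy", "Nono", "America")}

RULES = [  # (premises, conclusion)
    ([("American", "?x"), ("Weapon", "?y"), ("Sells", "?x", "?y", "?z"),
      ("Hostile", "?z")], ("Criminal", "?x")),
    ([("Missile", "?x"), ("Owns", "Nono", "?x")], ("Sells", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
    ([("Enemy", "?x", "America")], ("Hostile", "?x")),
]

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(pattern, fact, theta):
    """Extend substitution theta so pattern matches the ground fact, or return None."""
    if len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def substitutions(premises, facts, theta=None):
    """Yield every substitution that satisfies all premises against the fact set."""
    theta = theta or {}
    if not premises:
        yield theta
        return
    # Cheap conjunct ordering: try the conjunct with the fewest candidate facts first.
    premises = sorted(premises,
                      key=lambda p: sum(match(p, f, theta) is not None for f in facts))
    first, rest = premises[0], premises[1:]
    for fact in facts:
        t = match(first, fact, theta)
        if t is not None:
            yield from substitutions(rest, facts, t)

def subst(theta, atom):
    return tuple(theta.get(t, t) for t in atom)

def forward_chain(facts, rules):
    """Repeat until no new facts: fire every rule whose premises are satisfied."""
    facts = set(facts)
    while True:
        new = {subst(t, concl)
               for prem, concl in rules
               for t in substitutions(prem, facts)} - facts
        if not new:
            return facts
        facts |= new

print(("Criminal", "West") in forward_chain(FACTS, RULES))   # True, after two iterations
```

Running it reproduces the two-iteration trace of Section 9.3.2: Sells(West, M1, Nono), Weapon(M1), and Hostile(Nono) appear on the first pass, and Criminal(West) on the second.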
  • 353. 334 Chapter 9. Inference in First-Order Logic WA NT SA Q NSW V T Diff (wa, nt) ∧ Diff (wa, sa) ∧ Diff (nt, q) ∧ Diff (nt, sa) ∧ Diff (q, nsw) ∧ Diff (q, sa) ∧ Diff (nsw, v) ∧ Diff (nsw, sa) ∧ Diff (v, sa) ⇒ Colorable() Diff (Red, Blue) Diff (Red, Green) Diff (Green, Red) Diff (Green, Blue) Diff (Blue, Red) Diff (Blue, Green) (a) (b) Figure 9.5 (a) Constraint graph for coloring the map of Australia. (b) The map-coloring CSP expressed as a single definite clause. Each map region is represented as a variable whose value can be one of the constants Red, Green or Blue. The connection between pattern matching and constraint satisfaction is actually very close. We can view each conjunct as a constraint on the variables that it contains—for ex- ample, Missile(x) is a unary constraint on x. Extending this idea, we can express every finite-domain CSP as a single definite clause together with some associated ground facts. Consider the map-coloring problem from Figure 6.1, shown again in Figure 9.5(a). An equiv- alent formulation as a single definite clause is given in Figure 9.5(b). Clearly, the conclusion Colorable() can be inferred only if the CSP has a solution. Because CSPs in general include 3-SAT problems as special cases, we can conclude that matching a definite clause against a set of facts is NP-hard. It might seem rather depressing that forward chaining has an NP-hard matching problem in its inner loop. There are three ways to cheer ourselves up: • We can remind ourselves that most rules in real-world knowledge bases are small and simple (like the rules in our crime example) rather than large and complex (like the CSP formulation in Figure 9.5). It is common in the database world to assume that both the sizes of rules and the arities of predicates are bounded by a constant and to worry only about data complexity—that is, the complexity of inference as a function DATA COMPLEXITY of the number of ground facts in the knowledge base. It is easy to show that the data complexity of forward chaining is polynomial. • We can consider subclasses of rules for which matching is efficient. Essentially every Datalog clause can be viewed as defining a CSP, so matching will be tractable just when the corresponding CSP is tractable. Chapter 6 describes several tractable families of CSPs. For example, if the constraint graph (the graph whose nodes are variables and whose links are constraints) forms a tree, then the CSP can be solved in linear time. Exactly the same result holds for rule matching. For instance, if we remove South
  • 354. Section 9.3. Forward Chaining 335 Australia from the map in Figure 9.5, the resulting clause is Diff (wa, nt) ∧ Diff (nt, q) ∧ Diff (q, nsw) ∧ Diff (nsw, v) ⇒ Colorable() which corresponds to the reduced CSP shown in Figure 6.12 on page 224. Algorithms for solving tree-structured CSPs can be applied directly to the problem of rule matching. • We can try to to eliminate redundant rule-matching attempts in the forward-chaining algorithm, as described next. Incremental forward chaining When we showed how forward chaining works on the crime example, we cheated; in partic- ular, we omitted some of the rule matching done by the algorithm shown in Figure 9.3. For example, on the second iteration, the rule Missile(x) ⇒ Weapon(x) matches against Missile(M1) (again), and of course the conclusion Weapon(M1) is already known so nothing happens. Such redundant rule matching can be avoided if we make the following observation: Every new fact inferred on iteration t must be derived from at least one new fact inferred on iteration t − 1. This is true because any inference that does not require a new fact from iteration t − 1 could have been done at iteration t − 1 already. This observation leads naturally to an incremental forward-chaining algorithm where, at iteration t, we check a rule only if its premise includes a conjunct pi that unifies with a fact p′ i newly inferred at iteration t − 1. The rule-matching step then fixes pi to match with p′ i, but allows the other conjuncts of the rule to match with facts from any previous iteration. This algorithm generates exactly the same facts at each iteration as the algorithm in Figure 9.3, but is much more efficient. With suitable indexing, it is easy to identify all the rules that can be triggered by any given fact, and indeed many real systems operate in an “update” mode wherein forward chain- ing occurs in response to each new fact that is TELLed to the system. Inferences cascade through the set of rules until the fixed point is reached, and then the process begins again for the next new fact. Typically, only a small fraction of the rules in the knowledge base are actually triggered by the addition of a given fact. This means that a great deal of redundant work is done in repeatedly constructing partial matches that have some unsatisfied premises. Our crime ex- ample is rather too small to show this effectively, but notice that a partial match is constructed on the first iteration between the rule American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x) and the fact American(West). This partial match is then discarded and rebuilt on the second iteration (when the rule succeeds). It would be better to retain and gradually complete the partial matches as new facts arrive, rather than discarding them. The rete algorithm3 was the first to address this problem. The algorithm preprocesses RETE the set of rules in the knowledge base to construct a sort of dataflow network in which each 3 Rete is Latin for net. The English pronunciation rhymes with treaty.
  • 355. 336 Chapter 9. Inference in First-Order Logic node is a literal from a rule premise. Variable bindings flow through the network and are filtered out when they fail to match a literal. If two literals in a rule share a variable—for example, Sells(x, y, z) ∧ Hostile(z) in the crime example—then the bindings from each literal are filtered through an equality node. A variable binding reaching a node for an n- ary literal such as Sells(x, y, z) might have to wait for bindings for the other variables to be established before the process can continue. At any given point, the state of a rete network captures all the partial matches of the rules, avoiding a great deal of recomputation. Rete networks, and various improvements thereon, have been a key component of so- called production systems, which were among the earliest forward-chaining systems in PRODUCTION SYSTEM widespread use.4 The XCON system (originally called R1; McDermott, 1982) was built with a production-system architecture. XCON contained several thousand rules for designing configurations of computer components for customers of the Digital Equipment Corporation. It was one of the first clear commercial successes in the emerging field of expert systems. Many other similar systems have been built with the same underlying technology, which has been implemented in the general-purpose language OPS-5. Production systems are also popular in cognitive architectures—that is, models of hu- COGNITIVE ARCHITECTURES man reasoning—such as ACT (Anderson, 1983) and SOAR (Laird et al., 1987). In such sys- tems, the “working memory” of the system models human short-term memory, and the pro- ductions are part of long-term memory. On each cycle of operation, productions are matched against the working memory of facts. A production whose conditions are satisfied can add or delete facts in working memory. In contrast to the typical situation in databases, production systems often have many rules and relatively few facts. With suitably optimized matching technology, some modern systems can operate in real time with tens of millions of rules. Irrelevant facts The final source of inefficiency in forward chaining appears to be intrinsic to the approach and also arises in the propositional context. Forward chaining makes all allowable inferences based on the known facts, even if they are irrelevant to the goal at hand. In our crime example, there were no rules capable of drawing irrelevant conclusions, so the lack of directedness was not a problem. In other cases (e.g., if many rules describe the eating habits of Americans and the prices of missiles), FOL-FC-ASK will generate many irrelevant conclusions. One way to avoid drawing irrelevant conclusions is to use backward chaining, as de- scribed in Section 9.4. Another solution is to restrict forward chaining to a selected subset of rules, as in PL-FC-ENTAILS? (page 258). A third approach has emerged in the field of de- ductive databases, which are large-scale databases, like relational databases, but which use DEDUCTIVE DATABASES forward chaining as the standard inference tool rather than SQL queries. The idea is to rewrite the rule set, using information from the goal, so that only relevant variable bindings—those belonging to a so-called magic set—are considered during forward inference. 
For example, if MAGIC SET the goal is Criminal(West), the rule that concludes Criminal(x) will be rewritten to include an extra conjunct that constrains the value of x: Magic(x) ∧ American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x) . 4 The word production in production systems denotes a condition–action rule.
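The incremental observation of Section 9.3.3, that every fact inferred at iteration t must use at least one fact newly inferred at iteration t − 1, is easy to retrofit onto such a naive forward chainer. Below is a hedged Python sketch of that delta-driven variant; it does not implement rete networks or magic sets, and the representation and helper names are the same illustrative assumptions as in the earlier forward-chaining sketch.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(pattern, fact, theta):
    """Extend theta so that pattern matches the ground fact, or return None."""
    if len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if theta.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return theta

def satisfy(premises, facts, theta):
    """Yield every extension of theta that satisfies all premises against facts."""
    if not premises:
        yield theta
        return
    for fact in facts:
        t = match(premises[0], fact, theta)
        if t is not None:
            yield from satisfy(premises[1:], facts, t)

def incremental_forward_chain(facts, rules):
    facts, delta = set(facts), set(facts)    # delta: facts new on the previous iteration
    while delta:
        new = set()
        for premises, conclusion in rules:
            for i, p in enumerate(premises):
                for d in delta:              # pin conjunct p to a *newly* inferred fact...
                    theta = match(p, d, {})
                    if theta is None:
                        continue
                    rest = premises[:i] + premises[i + 1:]
                    for t in satisfy(rest, facts, theta):   # ...others match any fact
                        new.add(tuple(t.get(x, x) for x in conclusion))
        delta = new - facts
        facts |= delta
    return facts

# With FACTS and RULES as in the earlier forward-chaining sketch,
# incremental_forward_chain(FACTS, RULES) reaches the same fixed point.
```

The point of the restructuring is that a rule is re-examined only through substitutions that touch a fact derived on the previous iteration, so quiet parts of the knowledge base cost nothing on later iterations.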
  • 356. Section 9.4. Backward Chaining 337 The fact Magic(West) is also added to the KB. In this way, even if the knowledge base contains facts about millions of Americans, only Colonel West will be considered during the forward inference process. The complete process for defining magic sets and rewriting the knowledge base is too complex to go into here, but the basic idea is to perform a sort of “generic” backward inference from the goal in order to work out which variable bindings need to be constrained. The magic sets approach can therefore be thought of as a kind of hybrid between forward inference and backward preprocessing. 9.4 BACKWARD CHAINING The second major family of logical inference algorithms uses the backward chaining ap- proach introduced in Section 7.5 for definite clauses. These algorithms work backward from the goal, chaining through rules to find known facts that support the proof. We describe the basic algorithm, and then we describe how it is used in logic programming, which is the most widely used form of automated reasoning. We also see that backward chaining has some disadvantages compared with forward chaining, and we look at ways to overcome them. Fi- nally, we look at the close connection between logic programming and constraint satisfaction problems. 9.4.1 A backward-chaining algorithm Figure 9.6 shows a backward-chaining algorithm for definite clauses. FOL-BC-ASK(KB, goal) will be proved if the knowledge base contains a clause of the form lhs ⇒ goal, where lhs (left-hand side) is a list of conjuncts. An atomic fact like American(West) is considered as a clause whose lhs is the empty list. Now a query that contains variables might be proved in multiple ways. For example, the query Person(x) could be proved with the substitution {x/John} as well as with {x/Richard}. So we implement FOL-BC-ASK as a generator— GENERATOR a function that returns multiple times, each time giving one possible result. Backward chaining is a kind of AND/OR search—the OR part because the goal query can be proved by any rule in the knowledge base, and the AND part because all the conjuncts in the lhs of a clause must be proved. FOL-BC-OR works by fetching all clauses that might unify with the goal, standardizing the variables in the clause to be brand-new variables, and then, if the rhs of the clause does indeed unify with the goal, proving every conjunct in the lhs, using FOL-BC-AND. That function in turn works by proving each of the conjuncts in turn, keeping track of the accumulated substitution as we go. Figure 9.7 is the proof tree for deriving Criminal(West) from sentences (9.3) through (9.10). Backward chaining, as we have written it, is clearly a depth-first search algorithm. This means that its space requirements are linear in the size of the proof (neglecting, for now, the space required to accumulate the solutions). It also means that backward chaining (unlike forward chaining) suffers from problems with repeated states and incompleteness. We will discuss these problems and some potential solutions, but first we show how backward chaining is used in logic programming systems.
  • 357. 338 Chapter 9. Inference in First-Order Logic function FOL-BC-ASK(KB,query) returns a generator of substitutions return FOL-BC-OR(KB,query,{ }) generator FOL-BC-OR(KB,goal,θ) yields a substitution for each rule (lhs ⇒ rhs) in FETCH-RULES-FOR-GOAL(KB, goal) do (lhs, rhs) ← STANDARDIZE-VARIABLES((lhs, rhs)) for each θ′ in FOL-BC-AND(KB,lhs, UNIFY(rhs, goal, θ)) do yield θ′ generator FOL-BC-AND(KB,goals,θ) yields a substitution if θ = failure then return else if LENGTH(goals) = 0 then yield θ else do first,rest ← FIRST(goals), REST(goals) for each θ′ in FOL-BC-OR(KB, SUBST(θ, first), θ) do for each θ′′ in FOL-BC-AND(KB,rest,θ′ ) do yield θ′′ Figure 9.6 A simple backward-chaining algorithm for first-order knowledge bases. Hostile(Nono) Enemy(Nono,America) Owns(Nono,M1) Missile(M1) Criminal(West) Missile(y) Weapon(y) Sells(West,M1,z) American(West) {y/M1} { } { } { } {z/Nono} { } Figure 9.7 Proof tree constructed by backward chaining to prove that West is a criminal. The tree should be read depth first, left to right. To prove Criminal(West), we have to prove the four conjuncts below it. Some of these are in the knowledge base, and others require further backward chaining. Bindings for each successful unification are shown next to the corresponding subgoal. Note that once one subgoal in a conjunction succeeds, its substitution is applied to subsequent subgoals. Thus, by the time FOL-BC-ASK gets to the last conjunct, originally Hostile(z), z is already bound to Nono.
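Because FOL-BC-ASK is described as a generator, it maps directly onto Python generator functions. The sketch below mirrors FOL-BC-OR and FOL-BC-AND from Figure 9.6 under assumptions of my own: atoms are tuples, strings beginning with "?" are variables, clause variables are standardized apart by appending a fresh suffix, and unification omits the occur check (as Prolog does; see Section 9.4.5). It is an illustration, not the book's code.

```python
import itertools

# Knowledge base: (premises, conclusion) pairs; facts have an empty premise list.
KB = [
    ([("American", "?x"), ("Weapon", "?y"), ("Sells", "?x", "?y", "?z"),
      ("Hostile", "?z")], ("Criminal", "?x")),
    ([("Missile", "?x"), ("Owns", "Nono", "?x")], ("Sells", "West", "?x", "Nono")),
    ([("Missile", "?x")], ("Weapon", "?x")),
    ([("Enemy", "?x", "America")], ("Hostile", "?x")),
    ([], ("Owns", "Nono", "M1")), ([], ("Missile", "M1")),
    ([], ("American", "West")), ([], ("Enemy", "Nono", "America")),
]

counter = itertools.count()

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, theta):
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def unify(a, b, theta):
    """Return an extended substitution, or None on failure. No occur check."""
    if theta is None:
        return None
    a, b = walk(a, theta), walk(b, theta)
    if a == b:
        return theta
    if is_var(a):
        return {**theta, a: b}
    if is_var(b):
        return {**theta, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            theta = unify(x, y, theta)
        return theta
    return None

def standardize(atom, suffix):
    return tuple(t + suffix if is_var(t) else t for t in atom)

def bc_or(goal, theta):                       # the OR part: try every clause
    for premises, conclusion in KB:
        suffix = "_%d" % next(counter)        # rename clause variables apart
        theta2 = unify(standardize(conclusion, suffix), goal, theta)
        if theta2 is not None:
            yield from bc_and([standardize(p, suffix) for p in premises], theta2)

def bc_and(goals, theta):                     # the AND part: prove every conjunct
    if not goals:
        yield theta
        return
    first, rest = goals[0], goals[1:]
    first = tuple(walk(t, theta) for t in first)
    for theta2 in bc_or(first, theta):
        yield from bc_and(rest, theta2)

theta = next(bc_or(("Criminal", "?who"), {}))
print(walk("?who", theta))                    # West
```

The query at the bottom asks Criminal(?who) and, as in Figure 9.7, the four conjuncts of the Criminal clause are proved left to right with the accumulated substitution carried along.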
  • 358. Section 9.4. Backward Chaining 339 9.4.2 Logic programming Logic programming is a technology that comes fairly close to embodying the declarative ideal described in Chapter 7: that systems should be constructed by expressing knowledge in a formal language and that problems should be solved by running inference processes on that knowledge. The ideal is summed up in Robert Kowalski’s equation, Algorithm = Logic + Control . Prolog is the most widely used logic programming language. It is used primarily as a rapid- PROLOG prototyping language and for symbol-manipulation tasks such as writing compilers (Van Roy, 1990) and parsing natural language (Pereira and Warren, 1980). Many expert systems have been written in Prolog for legal, medical, financial, and other domains. Prolog programs are sets of definite clauses written in a notation somewhat different from standard first-order logic. Prolog uses uppercase letters for variables and lowercase for constants—the opposite of our convention for logic. Commas separate conjuncts in a clause, and the clause is written “backwards” from what we are used to; instead of A ∧ B ⇒ C in Prolog we have C :- A, B. Here is a typical example: criminal(X) :- american(X), weapon(Y), sells(X,Y,Z), hostile(Z). The notation [E|L] denotes a list whose first element is E and whose rest is L. Here is a Prolog program for append(X,Y,Z), which succeeds if list Z is the result of appending lists X and Y: append([],Y,Y). append([A|X],Y,[A|Z]) :- append(X,Y,Z). In English, we can read these clauses as (1) appending an empty list with a list Y produces the same list Y and (2) [A|Z] is the result of appending [A|X] onto Y, provided that Z is the result of appending X onto Y. In most high-level languages we can write a similar recur- sive function that describes how to append two lists. The Prolog definition is actually much more powerful, however, because it describes a relation that holds among three arguments, rather than a function computed from two arguments. For example, we can ask the query append(X,Y,[1,2]): what two lists can be appended to give [1,2]? We get back the solutions X=[] Y=[1,2]; X=[1] Y=[2]; X=[1,2] Y=[] The execution of Prolog programs is done through depth-first backward chaining, where clauses are tried in the order in which they are written in the knowledge base. Some aspects of Prolog fall outside standard logical inference: • Prolog uses the database semantics of Section 8.2.8 rather than first-order semantics, and this is apparent in its treatment of equality and negation (see Section 9.4.5). • There is a set of built-in functions for arithmetic. Literals using these function symbols are “proved” by executing code rather than doing further inference. For example, the
  • 359. 340 Chapter 9. Inference in First-Order Logic goal “X is 4+3” succeeds with X bound to 7. On the other hand, the goal “5 is X+Y” fails, because the built-in functions do not do arbitrary equation solving.5 • There are built-in predicates that have side effects when executed. These include input– output predicates and the assert/retract predicates for modifying the knowledge base. Such predicates have no counterpart in logic and can produce confusing results— for example, if facts are asserted in a branch of the proof tree that eventually fails. • The occur check is omitted from Prolog’s unification algorithm. This means that some unsound inferences can be made; these are almost never a problem in practice. • Prolog uses depth-first backward-chaining search with no checks for infinite recursion. This makes it very fast when given the right set of axioms, but incomplete when given the wrong ones. Prolog’s design represents a compromise between declarativeness and execution efficiency— inasmuch as efficiency was understood at the time Prolog was designed. 9.4.3 Efficient implementation of logic programs The execution of a Prolog program can happen in two modes: interpreted and compiled. Interpretation essentially amounts to running the FOL-BC-ASK algorithm from Figure 9.6, with the program as the knowledge base. We say “essentially” because Prolog interpreters contain a variety of improvements designed to maximize speed. Here we consider only two. First, our implementation had to explicitly manage the iteration over possible results generated by each of the subfunctions. Prolog interpreters have a global data structure, a stack of choice points, to keep track of the multiple possibilities that we considered in CHOICE POINT FOL-BC-OR. This global stack is more efficient, and it makes debugging easier, because the debugger can move up and down the stack. Second, our simple implementation of FOL-BC-ASK spends a good deal of time gener- ating substitutions. Instead of explicitly constructing substitutions, Prolog has logic variables that remember their current binding. At any point in time, every variable in the program ei- ther is unbound or is bound to some value. Together, these variables and values implicitly define the substitution for the current branch of the proof. Extending the path can only add new variable bindings, because an attempt to add a different binding for an already bound variable results in a failure of unification. When a path in the search fails, Prolog will back up to a previous choice point, and then it might have to unbind some variables. This is done by keeping track of all the variables that have been bound in a stack called the trail. As each TRAIL new variable is bound by UNIFY-VAR, the variable is pushed onto the trail. When a goal fails and it is time to back up to a previous choice point, each of the variables is unbound as it is removed from the trail. Even the most efficient Prolog interpreters require several thousand machine instruc- tions per inference step because of the cost of index lookup, unification, and building the recursive call stack. In effect, the interpreter always behaves as if it has never seen the pro- gram before; for example, it has to find clauses that match the goal. A compiled Prolog 5 Note that if the Peano axioms are provided, such goals can be solved by inference within a Prolog program.
  • 360. Section 9.4. Backward Chaining 341 procedure APPEND(ax,y,az,continuation) trail ← GLOBAL-TRAIL-POINTER() if ax = [ ] and UNIFY(y,az) then CALL(continuation) RESET-TRAIL(trail) a, x, z ← NEW-VARIABLE(), NEW-VARIABLE(), NEW-VARIABLE() if UNIFY(ax,[a | x]) and UNIFY(az,[a | z]) then APPEND(x,y,z,continuation) Figure 9.8 Pseudocode representing the result of compiling the Append predicate. The function NEW-VARIABLE returns a new variable, distinct from all other variables used so far. The procedure CALL(continuation) continues execution with the specified continuation. program, on the other hand, is an inference procedure for a specific set of clauses, so it knows what clauses match the goal. Prolog basically generates a miniature theorem prover for each different predicate, thereby eliminating much of the overhead of interpretation. It is also pos- sible to open-code the unification routine for each different call, thereby avoiding explicit OPEN-CODE analysis of term structure. (For details of open-coded unification, see Warren et al. (1977).) The instruction sets of today’s computers give a poor match with Prolog’s semantics, so most Prolog compilers compile into an intermediate language rather than directly into ma- chine language. The most popular intermediate language is the Warren Abstract Machine, or WAM, named after David H. D. Warren, one of the implementers of the first Prolog com- piler. The WAM is an abstract instruction set that is suitable for Prolog and can be either interpreted or translated into machine language. Other compilers translate Prolog into a high- level language such as Lisp or C and then use that language’s compiler to translate to machine language. For example, the definition of the Append predicate can be compiled into the code shown in Figure 9.8. Several points are worth mentioning: • Rather than having to search the knowledge base for Append clauses, the clauses be- come a procedure and the inferences are carried out simply by calling the procedure. • As described earlier, the current variable bindings are kept on a trail. The first step of the procedure saves the current state of the trail, so that it can be restored by RESET-TRAIL if the first clause fails. This will undo any bindings generated by the first call to UNIFY. • The trickiest part is the use of continuations to implement choice points. You can think CONTINUATION of a continuation as packaging up a procedure and a list of arguments that together define what should be done next whenever the current goal succeeds. It would not do just to return from a procedure like APPEND when the goal succeeds, because it could succeed in several ways, and each of them has to be explored. The continuation argument solves this problem because it can be called each time the goal succeeds. In the APPEND code, if the first argument is empty and the second argument unifies with the third, then the APPEND predicate has succeeded. We then CALL the continuation, with the appropriate bindings on the trail, to do whatever should be done next. For example, if the call to APPEND were at the top level, the continuation would print the bindings of the variables.
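The trail-and-continuation machinery that Figure 9.8 compiles to can be imitated directly in an ordinary language. Here is a hedged Python sketch of the same Append procedure, with mutable logic variables, a global trail for undoing bindings, and a continuation called once per solution; the 'cons'/'nil' list encoding and every name in it are assumptions for illustration, not the output of a Prolog compiler.

```python
class Var:
    """A logic variable: unbound (binding is None) or bound to a term."""
    __slots__ = ("binding",)
    def __init__(self):
        self.binding = None

trail = []                       # variables bound since the start, newest last

def walk(t):
    while isinstance(t, Var) and t.binding is not None:
        t = t.binding
    return t

def unify(a, b):
    """Destructive unification, recording bindings on the trail. No occur check."""
    a, b = walk(a), walk(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.binding = b; trail.append(a); return True
    if isinstance(b, Var):
        b.binding = a; trail.append(b); return True
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        return all(unify(x, y) for x, y in zip(a, b))
    return a == b

def undo_to(mark):               # RESET-TRAIL: unbind everything bound after mark
    while len(trail) > mark:
        trail.pop().binding = None

def append(ax, y, az, continuation):
    """CPS version of append/3: call continuation() once per successful proof."""
    mark = len(trail)
    if unify(ax, "nil") and unify(y, az):        # clause: append([],Y,Y).
        continuation()
    undo_to(mark)
    a, x, z = Var(), Var(), Var()                # clause: append([A|X],Y,[A|Z]) :- append(X,Y,Z).
    if unify(ax, ("cons", a, x)) and unify(az, ("cons", a, z)):
        append(x, y, z, continuation)
    undo_to(mark)

def from_list(items):
    term = "nil"
    for item in reversed(items):
        term = ("cons", item, term)
    return term

def to_list(term):
    out, term = [], walk(term)
    while term != "nil":
        _, head, term = term
        out.append(walk(head)); term = walk(term)
    return out

x, y = Var(), Var()
append(x, y, from_list([1, 2]),
       lambda: print(to_list(x), to_list(y)))
# prints:  [] [1, 2]   then   [1] [2]   then   [1, 2] []
```

The query at the bottom corresponds to append(X,Y,[1,2]); the continuation fires three times, once for each way of splitting the list, just as in the Prolog example above, and the trail marks undo the bindings between alternatives.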
  • 361. 342 Chapter 9. Inference in First-Order Logic Before Warren’s work on the compilation of inference in Prolog, logic programming was too slow for general use. Compilers by Warren and others allowed Prolog code to achieve speeds that are competitive with C on a variety of standard benchmarks (Van Roy, 1990). Of course, the fact that one can write a planner or natural language parser in a few dozen lines of Prolog makes it somewhat more desirable than C for prototyping most small-scale AI research projects. Parallelization can also provide substantial speedup. There are two principal sources of parallelism. The first, called OR-parallelism, comes from the possibility of a goal unifying OR-PARALLELISM with many different clauses in the knowledge base. Each gives rise to an independent branch in the search space that can lead to a potential solution, and all such branches can be solved in parallel. The second, called AND-parallelism, comes from the possibility of solving AND-PARALLELISM each conjunct in the body of an implication in parallel. AND-parallelism is more difficult to achieve, because solutions for the whole conjunction require consistent bindings for all the variables. Each conjunctive branch must communicate with the other branches to ensure a global solution. 9.4.4 Redundant inference and infinite loops We now turn to the Achilles heel of Prolog: the mismatch between depth-first search and search trees that include repeated states and infinite paths. Consider the following logic pro- gram that decides if a path exists between two points on a directed graph: path(X,Z) :- link(X,Z). path(X,Z) :- path(X,Y), link(Y,Z). A simple three-node graph, described by the facts link(a,b) and link(b,c), is shown in Figure 9.9(a). With this program, the query path(a,c) generates the proof tree shown in Figure 9.10(a). On the other hand, if we put the two clauses in the order path(X,Z) :- path(X,Y), link(Y,Z). path(X,Z) :- link(X,Z). then Prolog follows the infinite path shown in Figure 9.10(b). Prolog is therefore incomplete as a theorem prover for definite clauses—even for Datalog programs, as this example shows— because, for some knowledge bases, it fails to prove sentences that are entailed. Notice that forward chaining does not suffer from this problem: once path(a,b), path(b,c), and path(a,c) are inferred, forward chaining halts. Depth-first backward chaining also has problems with redundant computations. For example, when finding a path from A1 to J4 in Figure 9.9(b), Prolog performs 877 inferences, most of which involve finding all possible paths to nodes from which the goal is unreachable. This is similar to the repeated-state problem discussed in Chapter 3. The total amount of inference can be exponential in the number of ground facts that are generated. If we apply forward chaining instead, at most n2 path(X,Y) facts can be generated linking n nodes. For the problem in Figure 9.9(b), only 62 inferences are needed. Forward chaining on graph search problems is an example of dynamic programming, DYNAMIC PROGRAMMING in which the solutions to subproblems are constructed incrementally from those of smaller
  • 362. Section 9.4. Backward Chaining 343 (a) (b) A B C A1 J4 Figure 9.9 (a) Finding a path from A to C can lead Prolog into an infinite loop. (b) A graph in which each node is connected to two random successors in the next layer. Finding a path from A1 to J4 requires 877 inferences. path(a,c) fail { } / Y b { } link(a,c) path(a,Y) link(a,Y) link(b,c) path(a,c) path(a,Y) link(Y,c) path(a,Y’) link(Y’,Y) (a) (b) Figure 9.10 (a) Proof that a path exists from A to C. (b) Infinite proof tree generated when the clauses are in the “wrong” order. subproblems and are cached to avoid recomputation. We can obtain a similar effect in a backward chaining system using memoization—that is, caching solutions to subgoals as they are found and then reusing those solutions when the subgoal recurs, rather than repeat- ing the previous computation. This is the approach taken by tabled logic programming sys- TABLED LOGIC PROGRAMMING tems, which use efficient storage and retrieval mechanisms to perform memoization. Tabled logic programming combines the goal-directedness of backward chaining with the dynamic- programming efficiency of forward chaining. It is also complete for Datalog knowledge bases, which means that the programmer need worry less about infinite loops. (It is still pos- sible to get an infinite loop with predicates like father(X,Y) that refer to a potentially unbounded number of objects.) 9.4.5 Database semantics of Prolog Prolog uses database semantics, as discussed in Section 8.2.8. The unique names assumption says that every Prolog constant and every ground term refers to a distinct object, and the closed world assumption says that the only sentences that are true are those that are entailed
by the knowledge base. There is no way to assert that a sentence is false in Prolog. This makes Prolog less expressive than first-order logic, but it is part of what makes Prolog more efficient and more concise. Consider the following Prolog assertions about some course offerings:

Course(CS, 101), Course(CS, 102), Course(CS, 106), Course(EE, 101). (9.11)

Under the unique names assumption, CS and EE are different (as are 101, 102, and 106), so this means that there are four distinct courses. Under the closed-world assumption there are no other courses, so there are exactly four courses. But if these were assertions in FOL rather than in Prolog, then all we could say is that there are somewhere between one and infinity courses. That's because the assertions (in FOL) do not deny the possibility that other unmentioned courses are also offered, nor do they say that the courses mentioned are different from each other. If we wanted to translate Equation (9.11) into FOL, we would get this:

Course(d, n) ⇔ (d = CS ∧ n = 101) ∨ (d = CS ∧ n = 102) ∨ (d = CS ∧ n = 106) ∨ (d = EE ∧ n = 101) . (9.12)

This is called the completion of Equation (9.11). It expresses in FOL the idea that there are at most four courses. To express in FOL the idea that there are at least four courses, we need to write the completion of the equality predicate:

x = y ⇔ (x = CS ∧ y = CS) ∨ (x = EE ∧ y = EE) ∨ (x = 101 ∧ y = 101) ∨ (x = 102 ∧ y = 102) ∨ (x = 106 ∧ y = 106) .

The completion is useful for understanding database semantics, but for practical purposes, if your problem can be described with database semantics, it is more efficient to reason with Prolog or some other database semantics system, rather than translating into FOL and reasoning with a full FOL theorem prover.

9.4.6 Constraint logic programming

In our discussion of forward chaining (Section 9.3), we showed how constraint satisfaction problems (CSPs) can be encoded as definite clauses. Standard Prolog solves such problems in exactly the same way as the backtracking algorithm given in Figure 6.5. Because backtracking enumerates the domains of the variables, it works only for finite-domain CSPs. In Prolog terms, there must be a finite number of solutions for any goal with unbound variables. (For example, the goal diff(Q,SA), which says that Queensland and South Australia must be different colors, has six solutions if three colors are allowed.) Infinite-domain CSPs—for example, with integer or real-valued variables—require quite different algorithms, such as bounds propagation or linear programming.

Consider the following example. We define triangle(X,Y,Z) as a predicate that holds if the three arguments are numbers that satisfy the triangle inequality:

triangle(X,Y,Z) :- X>=0, Y>=0, Z>=0, X+Y>=Z, Y+Z>=X, X+Z>=Y.

If we ask Prolog the query triangle(3,4,5), it succeeds. On the other hand, if we ask triangle(3,4,Z), no solution will be found, because the subgoal Z>=0 cannot be handled by Prolog; we can't compare an unbound value to 0.
Constraint logic programming (CLP) allows variables to be constrained rather than bound. A CLP solution is the most specific set of constraints on the query variables that can be derived from the knowledge base. For example, the solution to the triangle(3,4,Z) query is the constraint 7 >= Z >= 1. Standard logic programs are just a special case of CLP in which the solution constraints must be equality constraints—that is, bindings.

CLP systems incorporate various constraint-solving algorithms for the constraints allowed in the language. For example, a system that allows linear inequalities on real-valued variables might include a linear programming algorithm for solving those constraints. CLP systems also adopt a much more flexible approach to solving standard logic programming queries. For example, instead of depth-first, left-to-right backtracking, they might use any of the more efficient algorithms discussed in Chapter 6, including heuristic conjunct ordering, backjumping, cutset conditioning, and so on. CLP systems therefore combine elements of constraint satisfaction algorithms, logic programming, and deductive databases.

Several systems that allow the programmer more control over the search order for inference have been defined. The MRS language (Genesereth and Smith, 1981; Russell, 1985) allows the programmer to write metarules to determine which conjuncts are tried first. The user could write a rule saying that the goal with the fewest variables should be tried first or could write domain-specific rules for particular predicates.

9.5 RESOLUTION

The last of our three families of logical systems is based on resolution. We saw on page 250 that propositional resolution using refutation is a complete inference procedure for propositional logic. In this section, we describe how to extend resolution to first-order logic.

9.5.1 Conjunctive normal form for first-order logic

As in the propositional case, first-order resolution requires that sentences be in conjunctive normal form (CNF)—that is, a conjunction of clauses, where each clause is a disjunction of literals.6 Literals can contain variables, which are assumed to be universally quantified. For example, the sentence

∀ x American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)

becomes, in CNF,

¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x) .

Every sentence of first-order logic can be converted into an inferentially equivalent CNF sentence. In particular, the CNF sentence will be unsatisfiable just when the original sentence is unsatisfiable, so we have a basis for doing proofs by contradiction on the CNF sentences.

6 A clause can also be represented as an implication with a conjunction of atoms in the premise and a disjunction of atoms in the conclusion (Exercise 7.13). This is called implicative normal form or Kowalski form (especially when written with a right-to-left implication symbol (Kowalski, 1979)) and is often much easier to read.
  • 365. 346 Chapter 9. Inference in First-Order Logic The procedure for conversion to CNF is similar to the propositional case, which we saw on page 253. The principal difference arises from the need to eliminate existential quantifiers. We illustrate the procedure by translating the sentence “Everyone who loves all animals is loved by someone,” or ∀ x [∀ y Animal(y) ⇒ Loves(x, y)] ⇒ [∃ y Loves(y, x)] . The steps are as follows: • Eliminate implications: ∀ x [¬∀ y ¬Animal(y) ∨ Loves(x, y)] ∨ [∃ y Loves(y, x)] . • Move ¬ inwards: In addition to the usual rules for negated connectives, we need rules for negated quantifiers. Thus, we have ¬∀ x p becomes ∃ x ¬p ¬∃ x p becomes ∀ x ¬p . Our sentence goes through the following transformations: ∀ x [∃ y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃ y Loves(y, x)] . ∀ x [∃ y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃ y Loves(y, x)] . ∀ x [∃ y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃ y Loves(y, x)] . Notice how a universal quantifier (∀ y) in the premise of the implication has become an existential quantifier. The sentence now reads “Either there is some animal that x doesn’t love, or (if this is not the case) someone loves x.” Clearly, the meaning of the original sentence has been preserved. • Standardize variables: For sentences like (∃ x P(x))∨(∃ x Q(x)) which use the same variable name twice, change the name of one of the variables. This avoids confusion later when we drop the quantifiers. Thus, we have ∀ x [∃ y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃ z Loves(z, x)] . • Skolemize: Skolemization is the process of removing existential quantifiers by elimi- SKOLEMIZATION nation. In the simple case, it is just like the Existential Instantiation rule of Section 9.1: translate ∃ x P(x) into P(A), where A is a new constant. However, we can’t apply Ex- istential Instantiation to our sentence above because it doesn’t match the pattern ∃ v α; only parts of the sentence match the pattern. If we blindly apply the rule to the two matching parts we get ∀ x [Animal(A) ∧ ¬Loves(x, A)] ∨ Loves(B, x) , which has the wrong meaning entirely: it says that everyone either fails to love a par- ticular animal A or is loved by some particular entity B. In fact, our original sentence allows each person to fail to love a different animal or to be loved by a different person. Thus, we want the Skolem entities to depend on x and z: ∀ x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(z), x) . Here F and G are Skolem functions. The general rule is that the arguments of the SKOLEM FUNCTION Skolem function are all the universally quantified variables in whose scope the exis- tential quantifier appears. As with Existential Instantiation, the Skolemized sentence is satisfiable exactly when the original sentence is satisfiable.
  • 366. Section 9.5. Resolution 347 • Drop universal quantifiers: At this point, all remaining variables must be universally quantified. Moreover, the sentence is equivalent to one in which all the universal quan- tifiers have been moved to the left. We can therefore drop the universal quantifiers: [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(z), x) . • Distribute ∨ over ∧: [Animal(F(x)) ∨ Loves(G(z), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(z), x)] . This step may also require flattening out nested conjunctions and disjunctions. The sentence is now in CNF and consists of two clauses. It is quite unreadable. (It may help to explain that the Skolem function F(x) refers to the animal potentially unloved by x, whereas G(z) refers to someone who might love x.) Fortunately, humans seldom need look at CNF sentences—the translation process is easily automated. 9.5.2 The resolution inference rule The resolution rule for first-order clauses is simply a lifted version of the propositional reso- lution rule given on page 253. Two clauses, which are assumed to be standardized apart so that they share no variables, can be resolved if they contain complementary literals. Propo- sitional literals are complementary if one is the negation of the other; first-order literals are complementary if one unifies with the negation of the other. Thus, we have ℓ1 ∨ · · · ∨ ℓk, m1 ∨ · · · ∨ mn SUBST(θ, ℓ1 ∨ · · · ∨ ℓi−1 ∨ ℓi+1 ∨ · · · ∨ ℓk ∨ m1 ∨ · · · ∨ mj−1 ∨ mj+1 ∨ · · · ∨ mn) where UNIFY(ℓi, ¬mj) = θ. For example, we can resolve the two clauses [Animal(F(x)) ∨ Loves(G(x), x)] and [¬Loves(u, v) ∨ ¬Kills(u, v)] by eliminating the complementary literals Loves(G(x), x) and ¬Loves(u, v), with unifier θ = {u/G(x), v/x}, to produce the resolvent clause [Animal(F(x)) ∨ ¬Kills(G(x), x)] . This rule is called the binary resolution rule because it resolves exactly two literals. The BINARY RESOLUTION binary resolution rule by itself does not yield a complete inference procedure. The full reso- lution rule resolves subsets of literals in each clause that are unifiable. An alternative approach is to extend factoring—the removal of redundant literals—to the first-order case. Proposi- tional factoring reduces two literals to one if they are identical; first-order factoring reduces two literals to one if they are unifiable. The unifier must be applied to the entire clause. The combination of binary resolution and factoring is complete. 9.5.3 Example proofs Resolution proves that KB |= α by proving KB ∧ ¬α unsatisfiable, that is, by deriving the empty clause. The algorithmic approach is identical to the propositional case, described in
Figure 7.12, so we need not repeat it here. Instead, we give two example proofs. The first is the crime example from Section 9.3. The sentences in CNF are

¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x)
¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sells(West, x, Nono)
¬Enemy(x, America) ∨ Hostile(x)
¬Missile(x) ∨ Weapon(x)
Owns(Nono, M1)
Missile(M1)
American(West)
Enemy(Nono, America) .

We also include the negated goal ¬Criminal(West). The resolution proof is shown in Figure 9.11.

Figure 9.11  A resolution proof that West is a criminal. At each step, the literals that unify are in bold. [The proof starts from the negated goal ¬Criminal(West), resolves it against the Criminal rule, and then resolves away the remaining literals one at a time using the clauses above until the empty clause is produced.]

Notice the structure: a single "spine" beginning with the goal clause, resolving against clauses from the knowledge base until the empty clause is generated. This is characteristic of resolution on Horn clause knowledge bases. In fact, the clauses along the main spine correspond exactly to the consecutive values of the goals variable in the backward-chaining algorithm of Figure 9.6. This is because we always choose to resolve with a clause whose positive literal unified with the leftmost literal of the "current" clause on the spine; this is exactly what happens in backward chaining. Thus, backward chaining is just a special case of resolution with a particular control strategy to decide which resolution to perform next.

Our second example makes use of Skolemization and involves clauses that are not definite clauses. This results in a somewhat more complex proof structure. In English, the problem is as follows:

Everyone who loves all animals is loved by someone. Anyone who kills an animal is loved by no one. Jack loves all animals. Either Jack or Curiosity killed the cat, who is named Tuna. Did Curiosity kill the cat?
  • 368. Section 9.5. Resolution 349 First, we express the original sentences, some background knowledge, and the negated goal G in first-order logic: A. ∀ x [∀ y Animal(y) ⇒ Loves(x, y)] ⇒ [∃ y Loves(y, x)] B. ∀ x [∃ z Animal(z) ∧ Kills(x, z)] ⇒ [∀ y ¬Loves(y, x)] C. ∀ x Animal(x) ⇒ Loves(Jack, x) D. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna) E. Cat(Tuna) F. ∀ x Cat(x) ⇒ Animal(x) ¬G. ¬Kills(Curiosity, Tuna) Now we apply the conversion procedure to convert each sentence to CNF: A1. Animal(F(x)) ∨ Loves(G(x), x) A2. ¬Loves(x, F(x)) ∨ Loves(G(x), x) B. ¬Loves(y, x) ∨ ¬Animal(z) ∨ ¬Kills(x, z) C. ¬Animal(x) ∨ Loves(Jack, x) D. Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna) E. Cat(Tuna) F. ¬Cat(x) ∨ Animal(x) ¬G. ¬Kills(Curiosity, Tuna) The resolution proof that Curiosity killed the cat is given in Figure 9.12. In English, the proof could be paraphrased as follows: Suppose Curiosity did not kill Tuna. We know that either Jack or Curiosity did; thus Jack must have. Now, Tuna is a cat and cats are animals, so Tuna is an animal. Because anyone who kills an animal is loved by no one, we know that no one loves Jack. On the other hand, Jack loves all animals, so someone loves him; so we have a contradiction. Therefore, Curiosity killed the cat. ¬Loves(y, Jack) Loves(G(Jack), Jack) ¬Kills(Curiosity, Tuna) Kills(Jack, Tuna) Kills(Curiosity, Tuna) ¬Cat(x) Animal(x) Cat(Tuna) ¬Animal(F(Jack)) Loves(G(Jack), Jack) Animal(F(x)) Loves(G(x), x) ¬Loves(y, x) ¬Kills(x, Tuna) Kills(Jack, Tuna) ¬Loves(y, x) ¬Animal(z) ¬Kills(x, z) Animal(Tuna) ¬Loves(x,F(x)) Loves(G(x), x) ¬Animal(x) Loves(Jack, x) ^ ^ ^ ^ ^ ^ ^ ^ ^ Figure 9.12 A resolution proof that Curiosity killed the cat. Notice the use of factoring in the derivation of the clause Loves(G(Jack), Jack). Notice also in the upper right, the unification of Loves(x, F(x)) and Loves(Jack, x) can only succeed after the variables have been standardized apart.
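To make the binary resolution rule of Section 9.5.2 and the factoring step used in Figure 9.12 concrete, here is a minimal Python sketch under representation assumptions of my own: a clause is a list of (sign, atom) literals, atoms and compound terms are nested tuples, strings starting with "?" are variables, and the caller is responsible for standardizing clauses apart. It illustrates the two inference rules; it is not a complete resolution prover.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, theta):
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def unify(a, b, theta):
    """Unify two terms (constants, variables, nested tuples); None on failure."""
    if theta is None:
        return None
    a, b = walk(a, theta), walk(b, theta)
    if a == b:
        return theta
    if is_var(a):
        return {**theta, a: b}
    if is_var(b):
        return {**theta, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            theta = unify(x, y, theta)
        return theta
    return None

def subst(theta, t):
    t = walk(t, theta)
    return tuple(subst(theta, x) for x in t) if isinstance(t, tuple) else t

def resolve(clause1, clause2):
    """Binary resolution: yield a resolvent for each complementary, unifiable literal pair."""
    for i, (sign1, atom1) in enumerate(clause1):
        for j, (sign2, atom2) in enumerate(clause2):
            if sign1 != sign2:
                theta = unify(atom1, atom2, {})
                if theta is not None:
                    rest = clause1[:i] + clause1[i+1:] + clause2[:j] + clause2[j+1:]
                    yield [(s, subst(theta, a)) for s, a in rest]

def factor(clause):
    """First-order factoring: collapse two same-sign literals whenever they unify."""
    for i, (sign1, atom1) in enumerate(clause):
        for j, (sign2, atom2) in enumerate(clause[i+1:], i+1):
            if sign1 == sign2:
                theta = unify(atom1, atom2, {})
                if theta is not None:
                    rest = clause[:j] + clause[j+1:]
                    yield [(s, subst(theta, a)) for s, a in rest]

# First step of the spine in Figure 9.11: resolve the negated goal with the Criminal clause.
criminal_rule = [(False, ("American", "?x")), (False, ("Weapon", "?y")),
                 (False, ("Sells", "?x", "?y", "?z")), (False, ("Hostile", "?z")),
                 (True, ("Criminal", "?x"))]
negated_goal = [(False, ("Criminal", "West"))]
print(next(resolve(negated_goal, criminal_rule)))
# [(False, ('American', 'West')), (False, ('Weapon', '?y')),
#  (False, ('Sells', 'West', '?y', '?z')), (False, ('Hostile', '?z'))]
```

The example at the bottom reproduces the first step of the spine in Figure 9.11; the factoring noted in the caption of Figure 9.12 is the factor generator's job, collapsing two unifiable literals of the same sign into one.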
  • 369. 350 Chapter 9. Inference in First-Order Logic The proof answers the question “Did Curiosity kill the cat?” but often we want to pose more general questions, such as “Who killed the cat?” Resolution can do this, but it takes a little more work to obtain the answer. The goal is ∃ w Kills(w, Tuna), which, when negated, becomes ¬Kills(w, Tuna) in CNF. Repeating the proof in Figure 9.12 with the new negated goal, we obtain a similar proof tree, but with the substitution {w/Curiosity} in one of the steps. So, in this case, finding out who killed the cat is just a matter of keeping track of the bindings for the query variables in the proof. Unfortunately, resolution can produce nonconstructive proofs for existential goals. NONCONSTRUCTIVE PROOF For example, ¬Kills(w, Tuna) resolves with Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna) to give Kills(Jack, Tuna), which resolves again with ¬Kills(w, Tuna) to yield the empty clause. Notice that w has two different bindings in this proof; resolution is telling us that, yes, someone killed Tuna—either Jack or Curiosity. This is no great surprise! One so- lution is to restrict the allowed resolution steps so that the query variables can be bound only once in a given proof; then we need to be able to backtrack over the possible bind- ings. Another solution is to add a special answer literal to the negated goal, which be- ANSWER LITERAL comes ¬Kills(w, Tuna) ∨ Answer(w). Now, the resolution process generates an answer whenever a clause is generated containing just a single answer literal. For the proof in Fig- ure 9.12, this is Answer(Curiosity). The nonconstructive proof would generate the clause Answer(Curiosity) ∨ Answer(Jack), which does not constitute an answer. 9.5.4 Completeness of resolution This section gives a completeness proof of resolution. It can be safely skipped by those who are willing to take it on faith. We show that resolution is refutation-complete, which means that if a set of sentences REFUTATION COMPLETENESS is unsatisfiable, then resolution will always be able to derive a contradiction. Resolution cannot be used to generate all logical consequences of a set of sentences, but it can be used to establish that a given sentence is entailed by the set of sentences. Hence, it can be used to find all answers to a given question, Q(x), by proving that KB ∧ ¬Q(x) is unsatisfiable. We take it as given that any sentence in first-order logic (without equality) can be rewrit- ten as a set of clauses in CNF. This can be proved by induction on the form of the sentence, using atomic sentences as the base case (Davis and Putnam, 1960). Our goal therefore is to prove the following: if S is an unsatisfiable set of clauses, then the application of a finite number of resolution steps to S will yield a contradiction. Our proof sketch follows Robinson’s original proof with some simplifications from Genesereth and Nilsson (1987). The basic structure of the proof (Figure 9.13) is as follows: 1. First, we observe that if S is unsatisfiable, then there exists a particular set of ground instances of the clauses of S such that this set is also unsatisfiable (Herbrand’s theorem). 2. We then appeal to the ground resolution theorem given in Chapter 7, which states that propositional resolution is complete for ground sentences. 3. 
We then use a lifting lemma to show that, for any propositional resolution proof using the set of ground sentences, there is a corresponding first-order resolution proof using the first-order sentences from which the ground sentences were obtained.
  • 370. Section 9.5. Resolution 351 Resolution can find a contradiction in S' There is a resolution proof for the contradiction in S' Herbrand’s theorem Some set S' of ground instances is unsatisfiable Any set of sentences S is representable in clausal form Assume S is unsatisfiable, and in clausal form Lifting lemma Ground resolution theorem Figure 9.13 Structure of a completeness proof for resolution. To carry out the first step, we need three new concepts: • Herbrand unive