Designing with the
Mind in Mind
Simple Guide to Understanding
User Interface Design Guidelines
Second Edition
Jeff Johnson
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier
Acquiring Editor: Meg Dunkerley
Editorial Project Manager: Heather Scherer
Project Manager: Priya Kumaraguruparan
Designer: Matthew Limbert
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA, 02451, USA
Copyright © 2014, 2010 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in
any form or by any means, electronic
or mechanical, including photocopying, recording, or any
information storage and retrieval system,
without permission in writing from the publisher. Details on
how to seek permission, further
information about the Publisher’s permissions policies and our
arrangements with organizations such
as the Copyright Clearance Center and the Copyright Licensing
Agency, can be found at our website:
www.elsevier.com/permissions.
This book and the individual contributions contained in it are
protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly
changing. As new research and experience
broaden our understanding, changes in research methods or professional practices may become
necessary. Practitioners and researchers must always rely on
their own experience and knowledge in
evaluating and using any information or methods described
herein. In using such information or
methods they should be mindful of their own safety and the
safety of others, including parties for
whom they have a professional responsibility. To the fullest
extent of the law, neither the Publisher nor
the authors, contributors, or editors, assume any liability for
any injury and/or damage to persons or
property as a matter of products liability, negligence or otherwise, or from any use or operation of any
methods, products, instructions, or ideas contained in the
material herein.
Library of Congress Cataloging-in-Publication Data
Application submitted
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British
Library
ISBN: 978-0-12-407914-4
Printed in China
14 15 16 17 10 9 8 7 6 5 4 3 2 1
For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com
Contents

Acknowledgments
Foreword
Introduction
CHAPTER 1 Our Perception is Biased
CHAPTER 2 Our Vision is Optimized to See Structure
CHAPTER 3 We Seek and Use Visual Structure
CHAPTER 4 Our Color Vision is Limited
CHAPTER 5 Our Peripheral Vision is Poor
CHAPTER 6 Reading is Unnatural
CHAPTER 7 Our Attention is Limited; Our Memory is Imperfect
CHAPTER 8 Limits on Attention Shape Our Thought and Action
CHAPTER 9 Recognition is Easy; Recall is Hard
CHAPTER 10 Learning from Experience and Performing Learned Actions are Easy; Novel Actions, Problem Solving, and Calculation are Hard
CHAPTER 11 Many Factors Affect Learning
CHAPTER 12 Human Decision Making is Rarely Rational
CHAPTER 13 Our Hand–Eye Coordination Follows Laws
CHAPTER 14 We Have Time Requirements
Epilogue
Appendix
Bibliography
Index
Acknowledgments
I could not have written this book without a lot of help and the support of many people.

First are the students of the human–computer interaction course I taught as an Erskine Fellow at the University of Canterbury in New Zealand in 2006. It was for them that I developed a lecture providing a brief background in perceptual and cognitive psychology—just enough to enable them to understand and apply user-interface design guidelines. That lecture expanded into a professional development course, then into the first edition of this book. My need to prepare more comprehensive psychological background for an upper-level course in human–computer interaction that I taught at the University of Canterbury in 2013 provided motivation for expanding the topics covered and improving the explanations in this second edition.

Second, I thank my colleagues at the University of Canterbury who provided ideas, feedback on my ideas, and illustrations for the second edition’s new chapter on Fitts’ law: Professor Andy Cockburn, Dr. Sylvain Malacria, and Mathieu Nancel. I also thank my colleague and friend Professor Tim Bell for sharing user-interface examples and for other help while I was at the university working on the second edition.

Third, I thank the reviewers of the first edition—Susan Fowler, Robin Jeffries, Tim McCoy, and Jon Meads—and of the second edition—Susan Fowler, Robin Jeffries, and James Hartman. They made many helpful comments and suggestions that allowed me to greatly improve the book.

Fourth, I am grateful to four cognitive science researchers who directed me to important references, shared useful illustrations with me, or allowed me to bounce ideas off of them:

• Professor Edward Adelson, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology.
• Professor Dan Osherson, Department of Psychology, Princeton University.
• Dr. Dan Bullock, Department of Cognitive and Neural Systems, Boston University.
• Dr. Amy L. Milton, Department of Psychology and Downing College, University of Cambridge.

The book also was helped greatly by the care, oversight, logistical support, and nurturing provided by the staff at Elsevier, especially Meg Dunkerley, Heather Scherer, Lindsay Lawrence, and Priya Kumaraguruparan.

Last but not least, I thank my wife and friend Karen Ande for her love and support while I was researching and writing this book.
Foreword
It is gratifying to see this book go into a second edition because of the endorsement that implies for maturing the field of human–computer interaction beyond pure empirical methods.

Human–computer interaction (HCI) as a topic is basically simple. There is a person of some sort who wants to do some task, like write an essay or pilot an airplane. What makes the activity HCI is inserting a mediating computer. In principle, our person could have done the task without the computer. She could have used a quill pen and ink, for example, or flown an airplane that uses hydraulic tubes to work the controls. These are not quite HCI. They do use intermediary tools or machines, and the process of their design and the facts of their use bear resemblance to those of HCI. In fact, they fit into HCI’s uncle discipline of human factors. But it is the computer, and the process of contingent interaction the computer renders possible, that makes HCI distinctive.
The computer can transform a task’s representation and needed skills. It can change the linear writing process into something more like sculpturing, the writer roughing out the whole, then adding or subtracting bits to refine the text. It can change the piloting process into a kind of supervision, letting the computer, with inputs of speed, altitude, and location and outputs of throttle, flap, and rudder, do the actual flying. And if instead of one person we have a small group or a mass crowd, or if instead of a single computer we have a network of communicating mobile or embedded computers, or if instead of a simple task we have impinging cultural or coordination considerations, then we get the many variants of computer mediation that form the broad spectrum of HCI.
The components of a discipline of HCI would also seem simple. There is an artifact that must be engineered and implemented. There is the process of design for the interaction itself and the objects, virtual or physical, with which to interact. Then there are all the principles, abstractions, theories, facts, and phenomena surrounding HCI to know about. Let’s call the first interaction engineering (e.g., using Harel statecharts to guide implementation), the second, interaction design (e.g., the design of the workflow for a smartphone to record diet), and the third, perhaps a little overly grandly, interaction science (e.g., the use of Fitts’ law to design button sizes in an application). The hard bit for HCI is that fitting these three together is not easy. Besides work in HCI itself, each has its own literature not friendly to outsiders. The present book was written to bridge the gap between the relevant science that has been built up from the psychological literature and HCI design problems where the science could be of use.
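(To unpack that last example for readers who want it now—Chapter 13 treats it at length: in its common “Shannon” formulation, Fitts’ law predicts the time T to point at a target from the target’s distance D and its width W along the line of motion,

    T = a + b \log_2\!\left(\frac{D}{W} + 1\right)

where a and b are constants fitted empirically to the device and the user population. Nearer and bigger targets are hit faster, which is why the law can inform button sizing.)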
Actually, the importance of linking engineering, design, and science together in HCI goes deeper. HCI is a technology. As Brian Arthur in his book The Nature of Technology tells us, technologies largely derive from other technologies, not science. The flat panel displays now common are a substitute for CRT devices of yore, and these go back to modified radar screens on the Whirlwind computer. Furthermore, technologies are composed of parts that are themselves technologies. A laptop computer has a display for output, a keyboard and a touchpad for input, several storage systems, and so on, each with its own technologies. But eventually all these technologies ground out in some phenomenon of nature that is not a technology, and here is a place where science plays a role. Some keyboard input devices use the natural phenomenon of electrical capacitance to sense keystrokes. Pressing a key brings two D-shaped pads close to a printed circuit board that is covered by an insulating film, thereby changing the pattern of capacitance. That is to say, this keyboard harnesses the natural phenomenon of capacitance in a reliable way that can be exploited to provide the HCI function of signaling an intended interaction to the computer.
Many natural phenomena are easy to understand and exploit by simple observation or modest tinkering. No science needed. But some, like capacitance, are much less obvious, and then you really need science to understand them. In some cases, the HCI system that is built generates its own phenomena, and you need science to understand the unexpected, emergent properties of seemingly obvious things. People sometimes believe that because they can intuitively understand the easy cases (e.g., with usability testing), they can understand all the cases. But this is not necessarily true. The natural phenomena to be exploited in HCI range from abstractions of computer science, such as the notion of the working set, to psychological theories of human cognition, perception, and movement, such as the nature of vision. Psychology, the area addressed by this book, is an area with an especially messy and at times contradictory literature, but it is also especially rich in phenomena that can be exploited for HCI technology.

I think it is underappreciated how important it is for the future development of HCI as a discipline that the field develops a supporting science base, as illustrated by the current book for the field of psychology. It also involves HCI growing some of its own science bits.
Why is this important? There are at least three reasons. First, having some sort of theory enables explanatory evaluation. The use of A-B testing is limited if you don’t know why there was a difference. On the other hand, if you have a theory that lets you interpret the difference, then you can fix it. You will never understand the problems of why a windows-based user interface can take excessive time to use by doing usability testing, for example, if you don’t have the theoretical concept of the window working set. Second, it enables generative design. It allows a shift in representation of the design space. Once it is realized that a very important property of pointing devices is the bandwidth of the human muscle group to which a transducer is going to be applied, then the problem gets reformulated in terms of how to connect those muscles and the consequence for the rest of the design. Third, it supports the codification of knowledge. Only by having theories and abstractions can we concisely cumulate our results and develop a field with sufficient power and depth.
Why isn’t there wider use of science or theory in HCI? There are obvious reasons, like the fact that it isn’t easy to get the relevant science linkages or results in the first place, that it’s hard to make the connection with science in almost any engineering field, and that often the connection is made, but invisibly packaged, in a way that nonspecialists never need to see it. The poet tosses capacitance with his finger, but only knows he writes a poem. He thinks he writes with love, because someone understood electricity.

But, mainly, I think there isn’t wider use of science or theory in HCI because it is difficult to put that knowledge into a form that is easily useful at the time of design need. Jeff Johnson in this book is careful to connect theory with design choice, and to do it in a practical way. He has accumulated grounded design rules that reach across the component parts of HCI, making it easier for designers to keep them in mind as they design.
Stuart K. Card
Introduction
USER-INTERFACE DESIGN RULES: WHERE DO THEY COME FROM AND HOW CAN THEY BE USED EFFECTIVELY?

For as long as people have been designing interactive computer systems, some have attempted to promote good design by publishing user-interface design guidelines (also called design rules). Early ones included:

• Cheriton (1976) proposed user-interface design guidelines for early interactive (time-shared) computer systems.
• Norman (1983a, 1983b) presented design rules for software user interfaces based on human cognition, including cognitive errors.
• Smith and Mosier (1986) wrote perhaps the most comprehensive set of user-interface design guidelines.
• Shneiderman (1987) included “Eight Golden Rules of Interface Design” in the first edition of his book Designing the User Interface and in all later editions.
• Brown (1988) wrote a book of design guidelines, appropriately titled Human–Computer Interface Design Guidelines.
• Nielsen and Molich (1990) offered a set of design rules for use in heuristic evaluation of user interfaces, and Nielsen and Mack (1994) updated them.
• Marcus (1992) presented guidelines for graphic design in online documents and user interfaces.

In the twenty-first century, additional user-interface design guidelines have been offered by Stone et al. (2005); Koyani et al. (2006); Johnson (2007); and Shneiderman and Plaisant (2009). Microsoft, Apple Computer, and Oracle publish guidelines for designing software for their platforms (Apple Computer, 2009; Microsoft Corporation, 2009; Oracle Corporation/Sun Microsystems, 2001).

How valuable are user-interface design guidelines? That depends on who applies them to design problems.
USER-INTERFACE DESIGN AND EVALUATION REQUIRES UNDERSTANDING AND EXPERIENCE
Following user-interface design guidelines is not as straightforward as following cooking recipes. Design rules often describe goals rather than actions. They are purposefully very general to make them broadly applicable, but that means that their exact meaning and applicability to specific design situations is open to interpretation.

Complicating matters further, more than one rule will often seem applicable to a given design situation. In such cases, the applicable design rules often conflict—that is, they suggest different designs. This requires designers to determine which competing design rule is more applicable to the given situation and should take precedence.
Design problems, even without competing design guidelines, often have multiple conflicting goals. For example:
• Bright screen and long battery life
• Lightweight and sturdy
• Multifunctional and easy to learn
• Powerful and simple
• High resolution and fast loading
• WYSIWYG (what you see is what you get) and usable by blind people
Satisfying all the design goals for a computer-based product or service usually requires tradeoffs—lots and lots of tradeoffs. Finding the right balance point between competing design rules requires further tradeoffs.
Given all of these complications, user-interface design rules and guidelines must be applied thoughtfully, not mindlessly, by people who are skilled in the art of user-interface design and/or evaluation. User-interface design rules and guidelines are more like laws than like rote recipes. Just as a set of laws is best applied and interpreted by lawyers and judges who are well versed in the laws, a set of user-interface design guidelines is best applied and interpreted by people who understand the basis for the guidelines and have learned from experience in applying them. Unfortunately, with a few exceptions (e.g., Norman, 1983a), user-interface design guidelines are provided as simple lists of design edicts with little or no rationale or background.
Furthermore, although many early members of the user-interface design and usability profession had backgrounds in cognitive psychology, most newcomers to the field do not. That makes it difficult for them to apply user-interface design guidelines sensibly. Providing that rationale and background education is the focus of this book.
COMPARING USER-INTERFACE DESIGN GUIDELINES
Table I.1 places the two best-known user-interface guideline lists side by side to show the types of rules they contain and how they compare to each other (see the Appendix for additional guidelines lists). For example, both lists start with a rule calling for consistency in design. Both lists include a rule about preventing errors. The Nielsen–Molich rule to “help users recognize, diagnose, and recover from errors” corresponds closely to the Shneiderman–Plaisant rule to “permit easy reversal of actions.” “User control and freedom” corresponds to “make users feel they are in control.” There is a reason for this similarity, and it isn’t just that later authors were influenced by earlier ones.
WHERE DO DESIGN GUIDELINES COME FROM?
For present purposes, the detailed design rules in each set of guidelines, such as those in Table I.1, are less important than what they have in common: their basis and origin. Where did these design rules come from? Were their authors—like clothing fashion designers—simply trying to impose their own personal design tastes on the computer and software industries?

If that were so, the different sets of design rules would be very different from each other, as the various authors sought to differentiate themselves from the others. In fact, all of these sets of user-interface design guidelines are quite similar if we ignore differences in wording, emphasis, and the state of computer technology when each set was written. Why?
The answer is that all of the design rules are based on human psychology: how people perceive, learn, reason, remember, and convert intentions into action. Many authors of design guidelines had at least some background in psychology that they applied to computer system design.
For example, Don Norman was a professor, researcher, and prolific author in the field of cognitive psychology long before he began writing about human–computer interaction. Norman’s early human–computer design guidelines were based on research—his own and others’—on human cognition. He was especially interested in cognitive errors that people often make and how computer systems can be designed to lessen or eliminate the impact of those errors.
Table I.1 Two Best-Known Lists of User-Interface Design Guidelines

Shneiderman (1987); Shneiderman and Plaisant (2009):
• Strive for consistency
• Cater to universal usability
• Offer informative feedback
• Design task flows to yield closure
• Prevent errors
• Permit easy reversal of actions
• Make users feel they are in control
• Minimize short-term memory load

Nielsen and Molich (1990):
• Consistency and standards
• Visibility of system status
• Match between system and real world
• User control and freedom
• Error prevention
• Recognition rather than recall
• Flexibility and efficiency of use
• Aesthetic and minimalist design
• Help users recognize, diagnose, and recover from errors
• Provide online documentation and help
Similarly, other authors of user-interface design guidelines—for example, Brown, Shneiderman, Nielsen, and Molich—used knowledge of perceptual and cognitive psychology to try to improve the design of usable and useful interactive systems.

Bottom line: user-interface design guidelines are based on human psychology.

By reading this book, you will learn the most important aspects of the psychology underlying user-interface and usability design guidelines.
INTENDED AUDIENCE OF THIS BOOK
This book is intended mainly for software design and development professionals who have to apply user-interface and interaction design guidelines. This includes interaction designers, user-interface designers, user-experience designers, graphic designers, and hardware product designers. It also includes usability testers and evaluators, who often refer to design heuristics when reviewing software or analyzing observed usage problems.

A second intended audience is students of interaction design and human–computer interaction. A third intended audience is software development managers who want enough of a background in the psychological basis of user-interface design rules to understand and evaluate the work of the people they manage.
CHAPTER 1
Our Perception is Biased
Our perception of the world around us is not a true depiction of what is actually there. Our perceptions are heavily biased by at least three factors:

• The past: our experience
• The present: the current context
• The future: our goals
PERCEPTION BIASED BY EXPERIENCE
Experience—your past perceptions—can bias your current perception in several different ways.
Perceptual priming
Imagine that you own a large insurance company. You are meeting with a real estate manager, discussing plans for a new campus of company buildings. The campus consists of a row of five buildings, the last two with T-shaped courtyards providing light for the cafeteria and fitness center. If the real estate manager showed you the map in Figure 1.1, you would see five black shapes representing the buildings.

Now imagine that instead of a real estate manager, you are meeting with an advertising manager. You are discussing a new billboard ad to be placed in certain markets around the country. The advertising manager shows you the same image, but in this scenario the image is a sketch of the ad, consisting of a single word: LIFE. In this scenario, you see a word, clearly and unambiguously.
When your perceptual system has been primed to see building shapes, you see building shapes, and the white areas between the buildings barely register in your perception. When your perceptual system has been primed to see text, you see text, and the black areas between the letters barely register.
A relatively famous example of how priming the mind can affect perception is an image, supposedly by R. C. James,1 that initially looks to most people like a random splattering of paint (see Fig. 1.2), similar to the work of the painter Jackson Pollock. Before reading further, look at the image.

Only after you are told that it is a Dalmatian dog sniffing the ground near a tree can your visual system organize the image into a coherent picture. Moreover, once you’ve seen the dog, it is hard to go back to seeing just a random collection of spots.
1 Published in Lindsay and Norman (1972), Figure 3-17, p. 146.
FIGURE 1.1
Building map or word? What you see depends on what you were told to see.
FIGURE 1.2
Image showing the effect of mental priming of the visual system. What do you see?
These priming examples are visual, but priming can also bias other types of perception, such as sentence comprehension. For example, the headline “New Vaccine Contains Rabies” would probably be understood differently by people who had recently heard stories about contaminated vaccines than by people who had recently heard stories about successful uses of vaccines to fight diseases.
Familiar perceptual patterns or frames
Much of our lives are spent in familiar situations: the rooms in our homes, our yards, our routes to and from school or work, our offices, neighborhood parks, stores, restaurants, etc. Repeated exposure to each type of situation builds a pattern in our minds of what to expect to see there. These perceptual patterns, which some researchers call frames, include the objects or events that are usually encountered in that situation.

For example, you know most rooms in your home well enough that you need not constantly scrutinize every detail. You know how they are laid out and where most objects are located. You can probably navigate much of your home in total darkness. But your experience with homes is broader than your specific home. In addition to having a pattern for your home, your brain has one for homes in general. It biases your perception of all homes, familiar and new. In a kitchen, you expect to see a stove and a sink. In a bathroom, you expect to see a toilet, a sink, and a shower or a bathtub (or both).

Mental frames for situations bias our perception to see the objects and events expected in each situation. They are a mental shortcut: by eliminating the need for us to constantly scrutinize every detail of our environment, they help us get around in our world. However, mental frames also make us see things that aren’t really there.

For example, if you visit a house in which there is no stove in the kitchen, you might nonetheless later recall seeing one, because your mental frame for kitchens has a strong stove component. Similarly, part of the frame for eating at a restaurant is paying the bill, so you might recall paying for your dinner even if you absentmindedly walked out without paying. Your brain also has frames for back yards, schools, city streets, business offices, supermarkets, dentist visits, taxis, air travel, and other familiar situations.

Anyone who uses computers, websites, or smartphones has frames for the desktop and files, web browsers, websites, and various types of applications and online services. For example, when they visit a new Web site, experienced Web users expect to see a site name and logo, a navigation bar, some other links, and maybe a search box. When they book a flight online, they expect to specify trip details, examine search results, make a choice, and make a purchase.

Because of the perceptual frames users of computer software and websites have, they often click buttons or links without looking carefully at them. Their perception of the display is based more on what their frame for the situation leads them to expect than on what is actually on the screen. This sometimes confounds software designers, who expect users to see what is on the screen—but that isn’t how human vision works.
For example, if the positions of the “Next” and “Back” buttons on the last page of a multistep dialog box2 switched, many people would not immediately notice the switch (see Fig. 1.3). Their visual system would have been lulled into inattention by the consistent placement of the buttons on the prior several pages. Even after unintentionally going backward a few times, they might continue to perceive the buttons in their standard locations. This is why consistent placement of controls is a common user-interface guideline, to ensure that reality matches the user’s frame for the situation.

2 Multistep dialog boxes are called wizards in user-interface designer jargon.

FIGURE 1.3
The “Next” button is perceived to be in a consistent location, even when it isn’t.
Similarly, if we are trying to find something but it is in a different place or looks different from usual, we might miss it even though it is in plain view, because our mental frames tune us to look for expected features in expected locations. For example, if the “Submit” button on one form in a Web site is shaped differently or is a different color from those on other forms on the site, users might not find it. This expectation-induced blindness is discussed more later in this chapter in the “Perception Biased by Goals” section.
Habituation
A third way in which experience biases perception is called habituation. Repeated exposure to the same (or highly similar) perceptions dulls our perceptual system’s sensitivity to them. Habituation is a very low-level phenomenon of our nervous system: it occurs at a neuronal level. Even primitive animals like flatworms and amoebas, with very simple nervous systems, habituate to repeated stimuli (e.g., mild electric shocks or light flashes). People, with our complex nervous systems, habituate to a range of events, from low-level ones like a continually beeping tone, to medium-level ones like a blinking ad on a Web site, to high-level ones like a person who tells the same jokes at every party or a politician giving a long, repetitious speech.

We experience habituation in computer usage when the same error messages or “Are you sure?” confirmation messages appear again and again. People initially notice them and perhaps respond, but eventually click them closed reflexively without bothering to read them.
Habituation is also a factor in a recent phenomenon variously labeled “social media burnout” (Nichols, 2013), “social media fatigue,” or “Facebook vacations” (Rainie et al., 2013): newcomers to social media sites and tweeting are initially excited by the novelty of microblogging about their experiences, but sooner or later get tired of wasting time reading tweets about every little thing that their “friends” do or see—for example, “Man! Was that ever a great salmon salad I had for lunch today.”
Attentional blink
Another low-level biasing of perception by past experience occurs just after we spot or hear something important. For a very brief period following the recognition—between 0.15 and 0.45 second—we are nearly deaf and blind to other visual stimuli, even though our ears and eyes stay functional. Researchers call this the attentional blink (Raymond et al., 1992; Stafford and Webb, 2005).3 It is thought to be caused by the brain’s perceptual and attention mechanisms being briefly fully occupied with processing the first recognition.

3 Chapter 14 discusses the attentional blink interval in the context of other perceptual intervals.
A classic example: you are in a subway car as it enters a station, planning to meet two friends at that station. As the train arrives, your car passes one of your friends, and you spot him briefly through your window. In the next split second, your window passes your other friend, but you fail to notice her because her image hit your retina during the attentional blink that resulted from your recognition of your first friend.
When people use computer-based systems and online services, attentional blink can cause them to miss information or events if things appear in rapid succession. A popular modern technique for making documentary videos is to present a series of still photographs in rapid succession.4 This technique is highly prone to attentional blink effects: if an image really captures your attention (e.g., it has a strong meaning for you), you will probably miss one or more of the immediately following images. In contrast, a captivating image in an auto-running slideshow (e.g., on a Web site or an information kiosk) is unlikely to cause attentional blink (i.e., missing the next image), because each image typically remains displayed for several seconds.
PERCEPTION BIASED BY CURRENT CONTEXT
When we try to understand how our visual perception works, it is tempting to think of it as a bottom-up process, combining basic features such as edges, lines, angles, curves, and patterns into figures and ultimately into meaningful objects. To take reading as an example, you might assume that our visual system first recognizes shapes as letters and then combines letters into words, words into sentences, and so on.

But visual perception—reading in particular—is not strictly a bottom-up process. It includes top-down influences too. For example, the word in which a character appears may affect how we identify the character (see Fig. 1.4).
Similarly, our overall comprehension of a sentence or a paragraph can even influence what words we see in it. For example, the same letter sequence can be read as different words depending on the meaning of the surrounding paragraph (see Fig. 1.5).
Contextual biasing of vision need not involve reading. The Müller–Lyer illusion is a famous example (see Fig. 1.6): the two horizontal lines are the same length, but the outward-pointing “fins” cause our visual system to see the top line as longer than the line with inward-pointing “fins.” This and other optical illusions (see Fig. 1.7) trick us because our visual system does not use accurate, optimal methods to perceive the world. It developed through evolution, a semi-random process that layers jury-rigged—often incomplete and inaccurate—solutions on top of each other. It works fine most of the time, but it includes a lot of approximations, kludges, hacks, and outright “bugs” that cause it to fail in certain cases.

4 For an example, search YouTube for “history of the world in two minutes.”

FIGURE 1.4
The same character is perceived as H or A depending on the surrounding letters.
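The Müller–Lyer figure is easy to reproduce if you want to test the illusion on yourself. Here is a minimal sketch (my construction, in Python’s standard tkinter toolkit, which the book does not prescribe) that draws the two equal-length lines of Figure 1.6:

```python
import tkinter as tk

root = tk.Tk()
canvas = tk.Canvas(root, width=300, height=160, bg="white")
canvas.pack()

def line_with_fins(y, fins_out):
    """Draw a 160-pixel horizontal line at height y, with fins at both ends."""
    x0, x1, fin = 70, 230, 15
    canvas.create_line(x0, y, x1, y, width=2)
    d = fin if fins_out else -fin  # fin direction is all that differs
    for x, sign in ((x0, -1), (x1, 1)):
        canvas.create_line(x, y, x + sign * d, y - fin, width=2)
        canvas.create_line(x, y, x + sign * d, y + fin, width=2)

line_with_fins(50, fins_out=True)    # looks longer
line_with_fins(110, fins_out=False)  # looks shorter, yet same length
root.mainloop()
```

Both calls draw a line of exactly 160 pixels; only the fins differ, yet most viewers see the top line as longer.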
The examples in Figures 1.6 and 1.7 show vision being biased by visual context. However, biasing of perception by the current context works between different senses too. Perceptions in any of our five senses may affect simultaneous perceptions in any of our other senses. What we feel with our tactile sense can be biased by what we hear, see, or smell. What we see can be biased by what we hear, and what we hear can be biased by what we see. The following are two examples of visual perception affecting what we hear:
• McGurk effect. If you watch a video of someone saying “bah, bah, bah,” then “dah, dah, dah,” then “vah, vah, vah,” but the audio is “bah, bah, bah” throughout, you will hear the syllable indicated by the speaker’s lip movement rather than the syllable actually in the audio track.5 Only by closing or averting your eyes do you hear the syllable as it really is. I’ll bet you didn’t know you could read lips, and in fact do so many times a day.

5 Go to YouTube, search for “McGurk effect,” and view (and hear) some of the resulting videos.

FIGURE 1.5
The same phrase (“Fold napkins. Polish silverware. Wash dishes.” versus “French napkins. Polish silverware. German dishes.”) is perceived differently depending on the list it appears in.
FIGURE 1.6
Müller–Lyer illusion: equal-length horizontal lines appear to have different lengths.
• Ventriloquism. Ventriloquists don’t throw their voice; they just learn to talk without moving their mouths much. Viewers’ brains perceive the talking as coming from the nearest moving mouth: that of the ventriloquist’s puppet (Eagleman, 2012).

An example of the opposite—hearing biasing vision—is the illusory flash effect. When a spot is flashed once briefly on a display but is accompanied by two quick beeps, it appears to flash twice. Similarly, the perceived rate of a blinking light can be adjusted by the frequency of a repeating click (Eagleman, 2012).
Later chapters explain how visual perception, reading, and recognition function in the human brain. For now, I will simply say that the pattern of neural activity that corresponds to recognizing a letter, a word, a face, or any object includes input from neural activity stimulated by the context. This context includes other nearby perceived objects and events, and even reactivated memories of previously perceived objects and events.

FIGURE 1.7
(A) The checkerboard does not bulge in the middle; (B) the triangle sides are not bent; and (C) the red vertical lines are parallel.
Context biases perception not only in people but also in lower animals. A friend of mine often brought her dog with her in her car when running errands. One day, as she drove into her driveway, a cat was in the front yard. The dog saw it and began barking. My friend opened the car door, and the dog jumped out and ran after the cat, which turned and jumped through a bush to escape. The dog dove into the bush but missed the cat. The dog remained agitated for some time afterward.

Thereafter, for as long as my friend lived in that house, whenever she arrived at home with her dog in the car, he would get excited, bark, jump out of the car as soon as the door was opened, dash across the yard, and leap into the bush. There was no cat, but that didn’t matter. Returning home in the car was enough to make the dog see one—perhaps even smell one. However, walking home, as the dog did after being taken for his daily walk, did not evoke the “cat mirage.”
PERCEPTION BIASED BY GOALS
In addition to being biased by our past experience and the present context, our perception is influenced by our goals and plans for the future. Specifically, our goals:

• Guide our perceptual apparatus, so we sample what we need from the world around us.
• Filter our perceptions: things unrelated to our goals tend to be filtered out preconsciously, never registering in our conscious minds.
For example, when people navigate through software or a Web site, seeking information or a specific function, they don’t read carefully. They scan screens quickly and superficially for items that seem related to their goal. They don’t simply ignore items unrelated to their goals; they often don’t even notice them.

To see this, glance at Figure 1.8 and look for scissors, and then immediately flip back to this page. Try it now.

Did you spot the scissors? Now, without looking back at the toolbox, can you say whether there is a screwdriver in the toolbox too?
Our goals filter our perceptions in other perceptual senses as well as in vision. A familiar example is the “cocktail party” effect. If you are conversing with someone at a crowded party, you can focus your attention to hear mainly what he or she is saying, even though many other people are talking near you. The more interested you are in the conversation, the more strongly your brain filters out surrounding chatter. If you are bored by what your conversational partner is saying, you will probably hear much more of the conversations around you.
The effect was first documented in studies of air-traffic controllers, who were able to carry on a conversation with the pilots of their assigned aircraft even though many different conversations were occurring simultaneously on the same radio frequency, coming out of the same speaker in the control room (Arons, 1992). Research suggests that our ability to focus on one conversation among several simultaneous ones depends not only on our interest level in the conversation but also on objective factors, such as the similarity of voices in the cacophony, the amount of general “noise” (e.g., clattering dishes or loud music), and the predictability of what your conversational partner is saying (Arons, 1992).
This filtering of perception by our goals is particularly true for adults, who tend to be more focused on goals than children are. Children are more stimulus-driven: their perception is less filtered by their goals. This characteristic makes them more distractible than adults, but it also makes them less biased as observers.
A parlor game demonstrates this age difference in perceptual filtering. It is similar to the Figure 1.8 exercise. Most households have a catch-all drawer for kitchen implements or tools. From your living room, send a visitor to the room where the catch-all drawer is, with instructions to fetch you a specific tool, such as measuring spoons or a pipe wrench. When the person returns with the tool, ask whether another specific tool was in the drawer. Most adults will not know what else was in the drawer. Children—if they can complete the task without being distracted by all the cool stuff in the drawer—will often be able to tell you more about what else was there.
Perceptual filtering can also be seen in how people navigate websites. Suppose I put you on the homepage of New Zealand’s University of Canterbury (see Fig. 1.9) and asked you to find information about financial support for postgraduate students in the computer science department. You would scan the page and probably quickly click one of the links that share words with the goal that I gave you: Departments (top left), Scholarships (middle), then Postgraduate Students (bottom left) or Postgraduate (right). If you’re a “search” person, you might instead go right to the Search box (top right), type words related to the goal, and click “Go.”

Whether you browse or search, it is likely that you would leave the homepage without noticing that you were randomly chosen to win $100 (bottom right). Why? Because that was not related to your goal.
FIGURE 1.8
Toolbox: Are there scissors here?
What is the mechanism by which our current goals bias our perception? There are two:

• Influencing where we look. Perception is active, not passive. Think of your perceptual senses not as simply filtering what comes to you, but rather as reaching out into the world and pulling in what you need to perceive. Your hands, your primary touch sensors, literally do this, but the rest of your senses do it too. You constantly move your eyes, ears, hands, feet, body, and attention so as to sample exactly the things in your environment that are most relevant to what you are doing or about to do (Ware, 2008). If you are looking on a Web site for a campus map, your eyes and pointer-controlling hand are attracted to anything that might lead you to that goal. You more or less ignore anything unrelated to your goal.

• Sensitizing our perceptual system to certain features. When you are looking for something, your brain can prime your perception to be especially sensitive to features of what you are looking for (Ware, 2008). For example, when you are looking for a red car in a large parking lot, red cars will seem to pop out as you scan the lot, and cars of other colors will barely register in your consciousness, even though you do in some sense see them. Similarly, when you are trying to find your spouse in a dark, crowded room, your brain “programs” your auditory system to be especially sensitive to the combination of frequencies that make up his or her voice.

FIGURE 1.9
University of Canterbury Web site: navigating sites requires perceptual filtering.
TAKING BIASED PERCEPTION INTO ACCOUNT WHEN DESIGNING

All these sources of perceptual bias of course have implications for user-interface design. Here are three.
Avoid ambiguity
Avoid ambiguous information displays, and test your design to verify that all users interpret the display in the same way. Where ambiguity is unavoidable, either rely on standards or conventions to resolve it, or prime users to resolve the ambiguity in the intended way.

For example, computer displays often shade buttons and text fields to make them look raised in relation to the background surface (see Fig. 1.10). This appearance relies on a convention, familiar to most experienced computer users, that the light source is at the top left of the screen. If an object were depicted as lit by a light source in a different location, users would not see the object as raised.
Be consistent
Place information and controls in consistent locations. Controls and data displays that serve the same function on different pages should be placed in the same position on each page on which they appear. They should also have the same color, text fonts, shading, and so on. This consistency allows users to spot and recognize them quickly.
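One way to make such consistency hard to break is to centralize it in code. The sketch below (my illustration, in Python’s tkinter; the book prescribes no particular toolkit) builds every wizard page’s “Back” and “Next” buttons through one shared helper, so the buttons cannot drift between pages—the concern raised with Figure 1.3:

```python
import tkinter as tk

def add_wizard_buttons(page, on_back, on_next, next_label="Next"):
    """Give every wizard page the same Back/Next footer.

    Routing all pages through this one helper keeps the buttons in a
    consistent place, so users' frame for "wizard page" stays valid.
    """
    footer = tk.Frame(page)
    footer.pack(side=tk.BOTTOM, fill=tk.X, padx=8, pady=8)
    # Pack order: "Next" first so it is always the rightmost control,
    # with "Back" immediately to its left -- on every page, including
    # the last one, where only the label changes.
    tk.Button(footer, text=next_label, command=on_next).pack(side=tk.RIGHT)
    tk.Button(footer, text="Back", command=on_back).pack(side=tk.RIGHT)

root = tk.Tk()
page = tk.Frame(root, width=320, height=180)
page.pack(fill=tk.BOTH, expand=True)
add_wizard_buttons(page, on_back=root.quit, on_next=root.quit,
                   next_label="Finish")  # same position, different label
root.mainloop()
```

Because even the final page calls the same helper, “Finish” appears exactly where “Next” has been all along.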
Understand the goals
Users come to a system with goals they want to achieve. Designers should understand those goals. Realize that users’ goals may vary, and that their goals strongly influence what they perceive. Ensure that at every point in an interaction, the information users need is available, prominent, and maps clearly to a possible user goal, so users will notice and use the information.
FIGURE 1.10
Buttons on computer screens, like this “Search” button, are often shaded to make them look three dimensional, but the convention works only if the light source is assumed to be on the top left.
CHAPTER 2
Our Vision is Optimized to See Structure
Early in the twentieth century, a group of German psychologists sought to explain how human visual perception works. They observed and catalogued many important visual phenomena. One of their basic findings was that human vision is holistic: our visual system automatically imposes structure on visual input and is wired to perceive whole shapes, figures, and objects rather than disconnected edges, lines, and areas. The German word for “shape” or “figure” is Gestalt, so these theories became known as the Gestalt principles of visual perception.

Today’s perceptual and cognitive psychologists regard the Gestalt theory of perception as more of a descriptive framework than an explanatory and predictive theory. Today’s theories of visual perception tend to be based heavily on the neurophysiology of the eyes, optic nerve, and brain (see Chapters 4–7).

Not surprisingly, the findings of neurophysiological researchers support the observations of the Gestalt psychologists. We really are—along with other animals—“wired” to perceive our surroundings in terms of whole objects (Stafford and Webb, 2005; Ware, 2008). Consequently, the Gestalt principles are still valid—if not as a fundamental explanation of visual perception, at least as a framework for describing it. They also provide a useful basis for guidelines for graphic design and user-interface design (Soegaard, 2007).

For present purposes, the most important Gestalt principles are Proximity, Similarity, Continuity, Closure, Symmetry, Figure/Ground, and Common Fate. The following sections describe each principle and provide examples from both static graphic design and user-interface design.
GESTALT PRINCIPLE: PROXIMITY
The Gestalt principle of Proximity is that the relative distance between objects in a display affects our perception of whether and how the objects are organized into subgroups. Objects that are near each other (relative to other objects) appear grouped, while those that are farther apart do not.

In Figure 2.1A, the stars are closer together horizontally than they are vertically, so we see three rows of stars, while the stars in Figure 2.1B are closer together vertically than they are horizontally, so we see three columns.
FIGURE 2.1
Proximity: items that are closer appear grouped as rows (A) and columns (B).
FIGURE 2.2
In Outlook’s Distribution List Membership dialog box, list buttons are in a group box, separate from the control buttons.
The Proximity principle has obvious relevance to the layout of control panels or data forms in software, Web sites, and electronic appliances. Designers often separate groups of on-screen controls and data displays by enclosing them in group boxes or by placing separator lines between groups (see Fig. 2.2).
However, according to the Proximity principle, items on a display can be visually grouped simply by spacing them closer to each other than to other controls, without group boxes or visible borders (see Fig. 2.3). Many graphic design experts recommend this approach to reduce visual clutter and code size in a user interface (Mullet and Sano, 1994).
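As a concrete sketch of this guideline (my example, in Python’s tkinter, with hypothetical field names), the form below creates two visible groups purely with whitespace—no group boxes or separator lines:

```python
import tkinter as tk

GROUPS = [
    ["Name", "Street", "City"],    # address group
    ["Home phone", "Work phone"],  # phone group
]

root = tk.Tk()
for g, group in enumerate(GROUPS):
    for i, label in enumerate(group):
        # Rows within a group sit 2 pixels apart; 16 pixels of empty
        # space above the first row of each later group is the only
        # thing that makes the two groups visible.
        top_pad = 16 if (g > 0 and i == 0) else 2
        row = tk.Frame(root)
        row.pack(anchor="w", padx=12, pady=(top_pad, 0))
        tk.Label(row, text=label, width=12, anchor="w").pack(side=tk.LEFT)
        tk.Entry(row, width=24).pack(side=tk.LEFT)
root.mainloop()
```

Delete the extra padding and the two groups collapse into one undifferentiated column, which is exactly the failure Figure 2.4 illustrates.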
FIGURE 2.3
In Mozilla Thunderbird’s Subscribe Folders dialog box, controls are grouped using the Proximity principle.

FIGURE 2.4
In Discreet’s Software Installer, poorly spaced radio buttons look grouped in vertical columns.
Conversely, if controls are poorly spaced (e.g., if connected controls are too far apart), people will have trouble perceiving them as related, making the software harder to learn and remember. For example, the Discreet Software Installer displays six horizontal pairs of radio buttons, each representing a two-way choice, but their spacing, due to the Proximity principle, makes them appear to be two vertical sets of radio buttons, each representing a six-way choice, at least until users try them and learn how they operate (see Fig. 2.4).
GESTALT PRINCIPLE: SIMILARITY
Another factor that affects our perception of grouping is expressed in the Gestalt principle of Similarity: objects that look similar appear grouped, all other things being equal. In Figure 2.5, the slightly larger, “hollow” stars are perceived as a group.
The Page Setup dialog box in Mac OS applications uses the Similarity and Proximity principles to convey groupings (see Fig. 2.6). The three very similar and tightly spaced Orientation settings are clearly intended to appear grouped. The three menus are not so tightly spaced, but they look similar enough that they appear related even though that probably wasn’t intended.
Similarly, the text fields in a form at book publisher Elsevier’s Web site are organized into an upper group of eight for the name and address, a group of three split fields for phone numbers, and two single text fields. The four menus, in addition to being data fields, help separate the text field groups (see Fig. 2.7). By contrast, the labels are too far from their fields to seem connected to them.
FIGURE 2.5
Similarity: items appear grouped if they look more similar to each other than to other objects.
FIGURE 2.6
Mac OS Page Setup dialog box. The Similarity and Proximity principles are used to group the Orientation settings.

FIGURE 2.7
Similarity makes the text fields appear grouped in this online form at Elsevier.com.
GESTALT PRINCIPLE: CONTINUITY
In addition to the two Gestalt principles concerning our tendency to organize objects into groups, several Gestalt principles describe our visual system’s tendency to resolve ambiguity or fill in missing data in such a way as to perceive whole objects. The first such principle, the principle of Continuity, states that our visual perception is biased to perceive continuous forms rather than disconnected segments.

For example, in Figure 2.8A, we automatically see two crossing lines—one blue and one orange. We don’t see two separate orange segments and two separate blue ones, and we don’t see a blue-and-orange V on top of an upside-down orange-and-blue V. In Figure 2.8B, we see a sea monster in water, not three pieces of one.
A well-known example of the use of the Continuity principle in graphic design is the IBM® logo. It consists of disconnected blue patches, and yet it is not at all ambiguous; it is easily seen as three bold letters, perhaps viewed through something like venetian blinds (see Fig. 2.9).
FIGURE 2.8
Continuity: human vision is biased to see continuous forms, even adding missing data if necessary.

FIGURE 2.9
The IBM company logo uses the Continuity principle to form letters from disconnected patches.
Slider controls are a user-interface example of the Continuity principle. We see a slider as depicting a single range controlled by a handle that appears somewhere on the slider, not as two separate ranges separated by the handle (see Fig. 2.10A). Even displaying different colors on each side of a slider’s handle doesn’t completely “break” our perception of a slider as one continuous object, although ComponentOne’s choice of strongly contrasting colors (gray vs. red) certainly strains that perception a bit (see Fig. 2.10B).
GESTALT PRINCIPLE: CLOSURE
Related to Continuity is the Gestalt principle of Closure, which states that our visual system automatically tries to close open figures so that they are perceived as whole objects rather than separate pieces. Thus, we perceive the disconnected arcs in Figure 2.11A as a circle.

Our visual system is so strongly biased to see objects that it can even interpret a totally blank area as an object. We see the combination of shapes in Figure 2.11B as a white triangle overlapping another triangle and three black circles, even though the figure really only contains three V shapes and three black pac-men.
The Closure principle is often applied in graphical user interfaces (GUIs). For example, GUIs often represent collections of objects (e.g., documents or messages) as stacks (see Fig. 2.12). Just showing one whole object and the edges of others “behind” it is enough to make users perceive a stack of objects, all whole.
FIGURE 2.10
Continuity: we see a slider as a single slot with a handle somewhere on it, not as two slots separated by a handle: (A) Mac OS and (B) ComponentOne.
GESTALT PRINCIPLE: SYMMETRY
A third fact about our tendency to see objects is captured in the Gestalt principle of Symmetry. It states that we tend to parse complex scenes in a way that reduces the complexity. The data in our visual field usually has more than one possible interpretation, but our vision automatically organizes and interprets the data so as to simplify it and give it symmetry.

For example, we see the complex shape on the far left of Figure 2.13 as two overlapping diamonds, not as two touching corner bricks or a pinch-waist octahedron with a square in its center. A pair of overlapping diamonds is simpler than the other two interpretations shown on the right—it has fewer sides and more symmetry.
In printed graphics and on computer screens, our visual system’s reliance on the Symmetry principle can be exploited to represent three-dimensional objects on a two-dimensional display. This can be seen in a cover illustration for Paul Thagard’s book Coherence in Thought and Action (Thagard, 2002; see Fig. 2.14) and in a three-dimensional depiction of a cityscape (see Fig. 2.15).
FIGURE 2.12
Icons depicting stacks of objects exhibit the Closure principle: partially visible objects are perceived as whole.

FIGURE 2.11
Closure: human vision is biased to see whole objects, even when they are incomplete.
GESTALT PRINCIPLE: FIGURE/GROUND
The next Gestalt principle that describes how our visual system
structures the data
it receives is Figure/Ground. This principle states that our mind
separates the visual
field into the figure (the foreground) and ground (the
background). The foreground
consists of the elements of a scene that are the object of our
primary attention, and
the background is everything else.
The Figure/Ground principle also specifies that the visual
system’s parsing of
scenes into figure and ground is influenced by characteristics of
the scene. For example, when a small object or color patch overlaps a larger one, we tend to perceive the smaller object as the figure and the larger object as the ground (see Fig. 2.16).
not= or
FIGURE 2.13
Symmetry: the human visual system tries to resolve complex
scenes into combinations of simple,
symmetrical shapes.
FIGURE 2.14
The cover of the book Coherence in Thought and Action
(Thagard, 2002) uses the symmetry,
Closure, and Continuity principles to depict a cube.
However, our perception of figure versus ground is not
completely determined
by scene characteristics. It also depends on the viewer’s focus
of attention. Dutch
artist M. C. Escher exploited this phenomenon to produce
ambiguous images in
which figure and ground switch roles as our attention shifts (see
Fig. 2.17).
In user-interface and Web design, the Figure/Ground principle
is often used to
place an impression-inducing background “behind” the primary
displayed content
FIGURE 2.16
Figure/Ground: when objects overlap, we see the smaller as the
figure and the larger as the
ground.
FIGURE 2.15
Symmetry: the human visual system parses very complex two-dimensional images into three-dimensional scenes.
(see Fig. 2.18). The background can convey information (e.g.,
the user’s current location), or it can suggest a theme, brand, or mood for interpretation of the content.
Figure/Ground is also often used to pop up information over
other content. Content that was formerly the figure—the focus of the users’ attention—temporarily becomes the background for new information, which appears briefly as the new
FIGURE 2.17
M. C. Escher exploited figure/ground ambiguity in his art.
FIGURE 2.18
Figure/Ground is used at AndePhotos.com to display a thematic
watermark “behind” the content.
figure (see Fig. 2.19). This approach is usually better than
temporarily replacing the
old information with the new information, because it provides
context that helps
keep people oriented regarding their place in the interaction.
GESTALT PRINCIPLE: COMMON FATE
The previous six Gestalt principles concerned perception of
static (unmoving) figures
and objects. One final Gestalt principle—Common Fate—
concerns moving objects.
The Common Fate principle is related to the Proximity and Similarity principles—like them, it affects whether we perceive objects as grouped. The Common Fate principle states that objects that move together are perceived as grouped or related.
For example, in a display showing dozens of pentagons, if seven
of them wiggled
in synchrony, people would see them as a related group, even if
the wiggling pentagons were separated from each other and looked no different from all the other pentagons (see Fig. 2.20).
Common motion—implying common fates—is used in some
animations to show
relationships between entities. For example, Google’s
GapMinder graphs animate dots
representing nations to show changes over time in various
factors of economic development. Countries that move together share development histories (see Fig. 2.21).
FIGURE 2.19
Figure/Ground is used at PBS.org’s mobile Web site to pop up a
call-to-action “over” the page
content.
GESTALT PRINCIPLES: COMBINED
Of course, in real-world visual scenes, the Gestalt principles work in concert, not in isolation. For example, a typical Mac OS desktop usually exemplifies six of the seven principles described here (excluding Common Fate): Proximity, Similarity, Continuity, Closure, Symmetry, and Figure/Ground (see Fig. 2.22). On a typical desktop, Common Fate is used (along with Similarity) when a user selects several files or folders and drags them as a group to a new location (see Fig. 2.23).
FIGURE 2.20
Common Fate: items appear grouped or related if they move
together.
FIGURE 2.21
Common Fate: GapMinder animates dots to show which nations have similar development histories (for details, animations, and videos, visit GapMinder.org).
FIGURE 2.22
All of the Gestalt principles except Common Fate play a role in
this portion of a Mac OS desktop.
FIGURE 2.23
Similarity and Common Fate: when users drag folders that they
have selected, common highlighting and motion make the selected folders appear grouped.
With all these Gestalt principles operating at once, unintended visual relationships can be implied by a design. A recommended practice, after designing a display, is to view it with each of the Gestalt principles in mind—Proximity, Similarity, Continuity, Closure, Symmetry, Figure/Ground, and Common Fate—to see if the design suggests any relationships between elements that you do not intend.
CHAPTER 3
We Seek and Use Visual Structure
Chapter 2 used the Gestalt principles of visual perception to
show how our visual
system is optimized to perceive structure. Perceiving structure
in our environment
helps us make sense of objects and events quickly. Chapter 2
also mentioned that
when people are navigating through software or Web sites, they
don’t scrutinize
screens carefully and read every word. They scan quickly for
relevant information.
This chapter presents examples to show that when information
is presented in a
terse, structured way, it is easier for people to scan and
understand.
Consider two presentations of the same information about an
airline flight reservation. The first presentation is unstructured prose text; the second is structured text in outline form (see Fig. 3.1). The structured presentation of the reservation can be scanned and understood much more quickly than the prose presentation.
The more structured and terse the presentation of information,
the more quickly
and easily people can scan and comprehend it. Look at the
Contents page from the
California Department of Motor Vehicles (see Fig. 3.2). The
wordy, repetitive links
slow users down and “bury” the important words they need to
see.
Unstructured:
You are booked on United flight 237, which departs from
Auckland at 14:30 on Tuesday 15 Oct and arrives at San
Francisco at 11:40 on Tuesday 15 Oct.
Structured:
Flight: United 237, Auckland → San Francisco
Depart: 14:30 Tue 15 Oct
Arrive: 11:40 Tue 15 Oct
FIGURE 3.1
Structured presentation of airline reservation information is
easier to scan and understand.
Compare that with a terser, more structured hypothetical design
that factors out
needless repetition and marks as links only the words that
represent options
(see Fig. 3.3). All options presented in the actual Contents page
are available in the
revision, yet it consumes less screen space and is easier to scan.
Displaying search results is another situation in which
structuring data and avoiding repetitive “noise” can improve people’s ability to scan quickly and find what they seek. In 2006, search results at HP.com included so much repeated navigation data and metadata for each retrieved item that they were useless. By 2009, HP had eliminated the repetition and structured the results, making them easier to scan and more useful (see Fig. 3.4).
Of course, for information displays to be easy to scan, it is not
enough merely to
make them terse, structured, and nonrepetitious. They must also
conform to the
rules of graphic design, some of which were presented in
Chapter 2.
For example, a prerelease version of a mortgage calculator on a
real estate Web
site presented its results in a table that violated at least two
important rules of
graphic design (see Fig. 3.5A). First, people usually read
(online or offline) from top
to bottom, but the labels for calculated amounts were below
their corresponding
values. Second, the labels were just as close to the value below
as to their own
FIGURE 3.2
Contents page at the California Department of Motor Vehicles
(DMV) Web site buries the
important information in repetitive prose.
Licenses & ID Cards: Renewals, Duplicates, Changes
• Renew license: in person by mail by Internet
• Renew: instruction permit
• Apply for duplicate: license ID card
• Change of: name address
• Register as: organ donor
FIGURE 3.3
California DMV Web site Contents page with repetition
eliminated and better visual structure.
value, so proximity (see Chapter 2) could not be used to
perceive that labels were
grouped with their values. To understand this mortgage results
table, users had to
scrutinize it carefully and slowly figure out which labels went
with which
numbers.
(A) (B)
FIGURE 3.4
In 2006, HP.com’s site search produced repetitious, “noisy”
results (A), but by 2009 was
improved (B).
[Figure content: Mortgage Summary — Monthly Payment $1,840.59; Number of Payments 360; Total of Payments $662,611.22; Interest Total $318,861.22; Tax Total $93,750.00; PMI Total $0.00; Pay off Date Sep 2037.]
(A) (B)
FIGURE 3.5
(A) Mortgage summary presented by a software mortgage
calculator; (B) an improved design.
The revised design, in contrast, allows users to perceive the
correspondence
between labels and values without conscious thought (see Fig.
3.5B).
STRUCTURE ENHANCES PEOPLE’S ABILITY TO SCAN LONG NUMBERS
Even small amounts of information can be made easier to scan
if they are structured.
Two examples are telephone numbers and credit card numbers
(see Fig. 3.6). Traditionally, such numbers were broken into parts to make them easier to scan and remember.
A long number can be broken up in two ways: either the user
interface breaks it
up explicitly by providing a separate field for each part of the
number, or the interface provides a single number field but lets users break the number into parts with spaces or punctuation (see Fig. 3.7A). However, many of today’s computer presentations of phone and credit card numbers do not segment the numbers and do not
Easy: (415) 123 4567
Hard: 4151234567
Easy: 1234 5678 9012 3456
Hard: 1234567890123456
FIGURE 3.6
Telephone and credit card numbers are easier to scan and
understand when segmented.
(A)
(B)
FIGURE 3.7
(A) At Democrats.org, credit card numbers can include spaces.
(B) At StuffIt.com, they cannot,
making them harder to scan and verify.
allow users to include spaces or other punctuation (see Fig. 3.7B). This limitation makes it harder for people to scan a number or verify that they typed it correctly, and so is considered a user-interface design blooper (Johnson, 2007). Forms presented in software and Web sites should accept credit card numbers, social security numbers, phone numbers, and so on in a variety of different formats and parse them into the internal format.
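As a concrete illustration of that guideline, here is a minimal sketch—with hypothetical function names, not code from any site shown in the figures—of accepting a card number in several formats and normalizing it to an internal, digits-only form that can be redisplayed segmented:

// Hypothetical helpers illustrating the guideline; not from the book.

// Accept digits with any mix of spaces, dots, or dashes; store digits only.
function normalizeCardNumber(input: string): string | null {
  const digits = input.replace(/[\s.-]/g, ""); // strip common separators
  return /^\d{13,19}$/.test(digits) ? digits : null; // card numbers are 13-19 digits
}

// Redisplay the internal form segmented in groups of four for easy scanning.
function formatCardNumber(digits: string): string {
  return digits.replace(/(\d{4})(?=\d)/g, "$1 ");
}

console.log(normalizeCardNumber("1234-5678 9012.3456")); // "1234567890123456"
console.log(formatCardNumber("1234567890123456"));       // "1234 5678 9012 3456"

The design point is that the user may type in whatever segmentation helps them scan and verify, while the software keeps a single canonical internal value.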
Segmenting data fields can provide useful visual structure even
when the data to
be entered is not, strictly speaking, a number. Dates are an
example of a case in
which segmented fields can improve readability and help
prevent data entry errors,
as shown by a date field at Bank of America’s Web site (see
Fig. 3.8).
DATA-SPECIFIC CONTROLS PROVIDE EVEN MORE STRUCTURE
A step up in structure from segmented data fields is data-specific controls. Instead of using simple text fields—whether segmented or not—designers can use controls that are designed specifically to display (and accept as input) a value of a specific type. For example, dates can be presented (and accepted) in the form of menus combined with pop-up calendar controls (see Fig. 3.9).
It is also possible to provide visual structure by mixing segmented text fields with data-specific controls, as demonstrated by an email address field at Southwest Airlines’ Web site (see Fig. 3.10).
FIGURE 3.8
At BankOfAmerica.com, segmented data fields provide useful
structure.
FIGURE 3.10
At SWA.com email addresses are entered into fields structured
to accept parts of the address.
FIGURE 3.9
At NWA.com, dates are displayed and entered using a control
that is specifically designed for dates.
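On the web, data-specific date controls of this kind are available natively. A minimal sketch (assuming a browser page; the field itself is hypothetical) that attaches the built-in calendar control:

// Sketch: attach a native date picker; the browser renders the calendar
// control and returns an unambiguous ISO 8601 value (e.g., "2014-10-15").
const dateField = document.createElement("input");
dateField.type = "date";            // data-specific control, not a free text field
dateField.addEventListener("change", () => {
  console.log(dateField.value);     // internal format, no parsing ambiguity
});
document.body.appendChild(dateField);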
VISUAL HIERARCHY LETS PEOPLE FOCUS ON THE RELEVANT INFORMATION
One of the most important goals in structuring information presentations is to provide a visual hierarchy—an arrangement that:
• Breaks the information into distinct sections, and breaks large sections into subsections.
• Labels each section and subsection prominently and in such a way as to clearly identify its content.
• Presents the sections and subsections as a hierarchy, with higher-level sections presented more strongly than lower-level ones.
A visual hierarchy allows people, when scanning information, to instantly separate what is relevant to their goals from what is irrelevant, and to focus their attention on the relevant information. They find what they are looking for more quickly because they can easily skip everything else.
Try it for yourself. Look at the two information displays in
Figure 3.11 and find the
information about prominence. How much longer does it take
you to find it in the
nonhierarchical presentation?
Create a Clear Visual Hierarchy
Organize and prioritize the contents of a page by
using size, prominence, and content relationships.
Let’s look at these relationships more closely:
• Size. The more important a headline is, the larger
its font size should be. Big bold headlines help to
grab the user’s attention as they scan the Web
page.
• Content Relationships. Group similar content
types by displaying the content in a similar visual
style, or in a clearly defined area.
• Prominence. The more important the headline or
content, the higher up the page it should be placed.
The most important or popular content should
always be positioned prominently near the top of
the page, so users can view it without having to
scroll too far.
Create a Clear Visual Hierarchy
Organize and prioritize the contents
of a page by using size, prominence,
and content relationships. Let’s look
at these relationships more closely.
The more important a headline is,
the larger its font size should be.
Big bold headlines help to grab the
user’s attention as they scan the
Web page. The more important the
headline or content, the higher up
the page it should be placed. The
most important or popular content
should always be positioned
prominently near the top of the page,
so users can view it without having to
scroll too far. Group similar content
types by displaying the content in a
similar visual style, or in a clearly
defined area.
(A) (B)
FIGURE 3.11
Find the advice about prominence in each of these displays.
Prose text format (A) makes
people read everything. Visual hierarchy (B) lets people ignore
information irrelevant to their
goals.
The examples in Figure 3.11 show the value of visual hierarchy
in a textual,
read-only information display. Visual hierarchy is equally
important in interactive
control panels and forms—perhaps even more so. Compare
dialog boxes from
two different music software products (see Fig. 3.12). The
Reharmonize dialog
box of Band-in-a-Box has poor visual hierarchy, making it hard
for users to find
things quickly. In contrast, GarageBand’s Audio/MIDI control
panel has good
visual hierarchy, so users can quickly find the settings they are
interested in.
(A)
(B)
FIGURE 3.12
Visual hierarchy in interactive control panels and forms lets
users find settings quickly: (A)
Band-in-a-Box (bad) and (B) GarageBand (good).
CHAPTER 4
Our Color Vision is Limited
Human color perception has both strengths and limitations, many of which are relevant to user-interface design. For example:
• Our vision is optimized to detect contrasts (edges), not absolute brightness.
• Our ability to distinguish colors depends on how colors are presented.
• Some people have color-blindness.
• The user’s display and viewing conditions affect color perception.
To understand these qualities of human color vision, let’s start
with a brief
description of how the human visual system processes color
information from the
environment.
HOW COLOR VISION WORKS
If you took introductory psychology or neurophysiology in
college, you probably
learned that the retina at the back of the human eye—the surface
onto which the eye
focuses images—has two types of light receptor cells: rods and
cones. You probably
also learned that the rods detect light levels but not colors,
while the cones detect
colors. Finally, you probably learned that there are three types
of cones—sensitive to
red, green, and blue light—suggesting that our color vision is
similar to video cam-
eras and computer displays, which detect or project a wide
variety of colors through
combinations of red, green, and blue pixels.
What you learned in college is only partly right. People with
normal vision do in
fact have rods and three types of cones1 in their retinas. The
rods are sensitive to
overall brightness while the three types of cones are sensitive to
different
1 People with color-blindness may have fewer than three, and
some women have four, cone types (Eagleman,
2012).
frequencies of light. But that is where the truth departs from
what most people
learned in college, until recently.
First, those of us who live in industrialized societies hardly use our rods at all. They function only at low levels of light. They are for getting around in poorly lighted environments—the environments our ancestors lived in until the nineteenth century. Today, we use our rods only when we are having dinner by candlelight, feeling our way around our dark house at night, camping outside after dark, etc.
(see Chapter 5). In
bright daylight and modern artificially lighted environments—
where we spend most of
our time—our rods are completely maxed out, providing no
useful information. Most of
the time, our vision is based entirely on input from our cones
(Ware, 2008).
So how do our cones work? Are the three types of cones
sensitive to red, green,
and blue light, respectively? In fact, each type of cone is
sensitive to a wider range of
light frequencies than you might expect, and the sensitivity
ranges of the three types
overlap considerably. In addition, the overall sensitivity of the
three types of cones
differs greatly (see Fig. 4.1A):
• Low frequency. These cones are sensitive to light over almost the entire range of visible light, but are most sensitive to the middle (yellow) and low (red) frequencies.
• Medium frequency. These cones respond to light ranging from the high-frequency blues through the lower middle-frequency yellows and oranges. Overall, they are less sensitive than the low-frequency cones.
• High frequency. These cones are most sensitive to light at the upper end of the visible light spectrum—violets and blues—but they also respond weakly to middle frequencies, such as green. These cones are much less sensitive overall than the other two types of cones, and also less numerous. One result is that our eyes are much less sensitive to blues and violets than to other colors.
Compare a graph of the light sensitivity of our retinal cone cells
(Fig. 4.1A) to
what the graph might look like if electrical engineers had
designed our retinas as a
mosaic of receptors sensitive to red, green, and blue, like a
camera (Fig. 4.1B).
[Two-panel plot, (A) and (B): x-axis, Wavelength (nanometers), 400–700; y-axis, Relative absorbance, 0–1.0; panel (A) shows curves labeled L, M, and H.]
FIGURE 4.1
Sensitivity of the three types of retinal cones (A) versus
artificial red, green, and blue receptors (B).
Given the odd relationships among the sensitivities of our three
types of retinal
cone cells, one might wonder how the brain combines the
signals from the cones to
allow us to see a broad range of colors.
The answer is by subtraction. Neurons in the visual cortex at the back of our brain subtract the signals coming over the optic nerves from the medium- and low-frequency cones, producing a red–green difference signal channel. Other neurons in the visual cortex subtract the signals from the high- and low-frequency cones, yielding a yellow–blue difference signal channel. A third group of neurons in the visual cortex adds the signals coming from the low- and medium-frequency cones to produce an overall luminance (or black–white) signal channel.2 These three channels are called color-opponent channels.
The brain then applies additional subtractive processes to all three color-opponent channels: signals coming from a given area of the retina are effectively subtracted from similar signals coming from nearby areas of the retina.
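The subtraction is simple enough to sketch in code. The following toy model uses illustrative signal names, unit weights, and sign conventions—assumptions, not measured physiology—to show how the three cone signals could combine into the three opponent channels:

// Toy model of color-opponent coding; names and sign conventions are
// illustrative assumptions, not physiological constants.
interface ConeResponse {
  low: number;    // low-frequency (L) cones
  medium: number; // medium-frequency (M) cones
  high: number;   // high-frequency (H) cones
}

function opponentChannels(c: ConeResponse) {
  return {
    redGreen: c.low - c.medium,  // difference of low- and medium-frequency signals
    yellowBlue: c.low - c.high,  // difference of low- and high-frequency signals
    luminance: c.low + c.medium, // brightness sum omits the insensitive H cones
  };
}

console.log(opponentChannels({ low: 0.8, medium: 0.6, high: 0.2 }));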
VISION IS OPTIMIZED FOR CONTRAST, NOT BRIGHTNESS
All this subtraction makes our visual system much more
sensitive to differences in
color and brightness—that is, to contrasting colors and edges—
than to absolute
brightness levels.
To see this, look at the inner bar in Figure 4.2. The inner bar looks darker on the right, but in fact it is one solid shade of gray. To our contrast-sensitive visual system, it looks lighter on the left and darker on the right because the outer rectangle is darker on the left and lighter on the right.
The sensitivity of our visual system to contrast rather than to
absolute brightness
is an advantage: it helped our distant ancestors recognize a
leopard in the nearby
bushes as the same dangerous animal whether they saw it in
bright noon sunlight or
in the early morning hours of a cloudy day. Similarly, being
sensitive to color
2 The overall brightness sum omits the signal from the high-frequency (blue–violet) cones. Those cones are so insensitive that their contribution to the total would be negligible, so omitting them makes little difference.
FIGURE 4.2
The inner gray bar looks darker on the right, but in fact is all
one shade of gray.
contrasts rather than to absolute colors allows us to see a rose
as the same red
whether it is in the sun or the shade.
Brain researcher Edward H. Adelson at the Massachusetts
Institute of Technology
developed an outstanding illustration of our visual system’s
insensitivity to absolute
brightness and its sensitivity to contrast (see Fig. 4.3). As
difficult as it may be to
believe, square A on the checkerboard is exactly the same shade
as square B. Square
B only appears white because it is depicted as being in the
cylinder’s shadow.
THE ABILITY TO DISCRIMINATE COLORS DEPENDS ON HOW COLORS ARE PRESENTED
Even our ability to detect differences between colors is limited. Because of how our visual system works, three presentation factors affect our ability to distinguish colors from each other:
• Paleness. The paler (less saturated) two colors are, the harder it is to tell them apart (see Fig. 4.4A).
• Color patch size. The smaller or thinner objects are, the harder it is to distinguish their colors (see Fig. 4.4B). Text is often thin, so the exact color of text is often hard to determine.
• Separation. The more separated color patches are, the more difficult it is to distinguish their colors, especially if the separation is great enough to require eye motion between patches (see Fig. 4.4C).
Several years ago, the online travel website ITN.net used two
pale colors—white
and pale yellow—to indicate which step of the reservation
process the user was on
(see Fig. 4.5). Some site visitors couldn’t see which step they
were on.
FIGURE 4.3
The squares marked A and B are the same gray. We see B as white because it is depicted as being in the cylinder’s shadow.
(A) (B) (C)
FIGURE 4.4
Factors affecting ability to distinguish colors: (A) paleness, (B)
size, and (C) separation.
FIGURE 4.5
The pale color marking the current step makes it hard for users
to see which step in the airline
reservation process they are on in ITN.net’s 2003 website.
[Line chart whose legend uses tiny color patches for series S1, S6, S11, and S16.]
FIGURE 4.6
Tiny color patches in this chart legend are hard to distinguish.
[Bar chart with a legend of large color patches: Beverages, Condiments, Confections, Dairy products, Grains/Cereals, Meat/Poultry, Produce, Seafood; x-axis 0–800.]
FIGURE 4.7
Large color patches make it easier to distinguish the colors.
FIGURE 4.8
The difference in color between visited and unvisited links is
too subtle in MinneapolisFed.org’s
website.
Small color patches are often seen in data charts and plots. Many business graphics packages produce legends on charts and plots, but make the color patches in the legend very small (see Fig. 4.6). Color patches in chart legends should be large to help people distinguish the colors (see Fig. 4.7).
On websites, a common use of color is to distinguish
unfollowed links from
already followed ones. On some sites, the “followed” and
“unfollowed” colors are too
similar. The website of the Federal Reserve Bank of
Minneapolis (see Fig. 4.8) has
this problem. Furthermore, the two colors are shades of blue,
the color range in
which our eyes are least sensitive. Can you spot the two
followed links?3
3 Already followed links in Figure 4.8: Housing Units
Authorized and House Price Index.
COLOR-BLINDNESS
A fourth factor of color presentation that affects design principles for interactive systems is whether the colors can be distinguished by people who have common types of color-blindness. Having color-blindness doesn’t mean an inability to see colors. It just means that one or more of the color subtraction channels (see the “How Color Vision Works” section) don’t function normally, making it difficult to distinguish certain pairs of colors.
Approximately 8% of men and slightly under 0.5% of women have a color perception deficit: difficulty discriminating certain pairs of colors (Wolfmaier, 1999). The most common type of color-blindness is red–green; other types are much rarer. Figure 4.9 shows color pairs that people with red–green color-blindness have trouble distinguishing.
(A)
(C)
(B)
FIGURE 4.9
Red–green color-blind people can’t distinguish (A) dark red from black, (B) blue from purple, and (C) light green from white.
FIGURE 4.10
MoneyDance’s graph uses colors some users can’t distinguish.
FIGURE 4.11
MoneyDance’s graph rendered in grayscale.
(A) (B)
FIGURE 4.12
Google logo: (A) normal and (B) after red–green color-blindness filter.
The home finance application MoneyDance provides a graphical breakdown of household expenses, using color to indicate the various expense categories (see Fig. 4.10). Unfortunately, many of the colors are hues that color-blind people cannot tell apart. For example, people with red–green color-blindness cannot distinguish the blue from the purple or the green from the khaki. If you are not color-blind, you can get an idea of which colors in an image will be hard to distinguish by converting the image to grayscale (see Fig. 4.11), but, as described in the “Guidelines for Using Color” section later in this chapter, it is best to run the image through a color-blindness filter or simulator (see Fig. 4.12).
EXTERNAL FACTORS THAT INFLUENCE THE ABILITY TO DISTINGUISH COLORS
Factors concerning the external environment also impact people’s ability to distinguish colors. For example:
• Variation among color displays. Computer displays vary in how they display colors, depending on their technologies, driver software, or color settings. Even monitors of the same model with the same settings may display colors slightly differently. Something that looks yellow on one display may look beige on another. Colors that are clearly different on one may look the same on another.
• Grayscale displays. Although most displays these days are color, there are devices, especially small handheld ones, with grayscale displays. For instance, Figure 4.11 shows that a grayscale display can make areas of different colors look the same.
• Display angle. Some computer displays, particularly LCD ones, work much better when viewed straight on than at an angle. When LCD displays are viewed at an angle, colors—and color differences—often are altered.
• Ambient illumination. Strong light on a display washes out colors before it washes out light and dark areas, reducing color displays to grayscale ones, as anyone who has tried to use a bank ATM in direct sunlight knows. In offices, glare and venetian blind shadows can mask color differences.
These four external factors are usually out of the software designer’s control. Designers should, therefore, keep in mind that they don’t have full control of users’ color-viewing experience. Colors that seem highly distinguishable in the development facility, on the development team’s computer displays and under normal office lighting conditions, may not be as distinguishable in some of the environments where the software is used.
GUIDELINES FOR USING COLOR
In interactive software systems that rely on color to convey information, follow these five guidelines to ensure that users receive the information:
1. Distinguish colors by saturation and brightness, as well as hue. Avoid subtle color differences. Make sure the contrast between colors is high (but see guideline 5). One way to test whether colors are different enough is to view them in grayscale (a minimal sketch of such a check follows this list). If you can’t distinguish the colors when they are rendered in grays, they aren’t different enough.
2. Use distinctive colors. Recall that our visual system combines the signals from retinal cone cells to produce three color-opponent channels: red–green, yellow–blue, and black–white (luminance). The colors that people can distinguish most easily are those that cause a strong signal (positive or negative) on one of the three color-perception channels, and neutral signals on the other two channels. Not surprisingly, those colors are red, green, yellow, blue, black, and white (see Fig. 4.13). All other colors cause signals on more than one color channel, and so our visual system cannot distinguish them from other colors as quickly and easily as it can distinguish those six colors (Ware, 2008).
3. Avoid color pairs that color-blind people cannot distinguish. Such pairs include dark red versus black, dark red versus dark green, blue versus purple, and light green versus white. Don’t use dark reds, blues, or violets against any dark colors. Instead, use dark reds, blues, and violets against light yellows and greens. Use an online color-blindness simulator4 to check web pages and images to see how people with various color-vision deficiencies would see them.
4. Use color redundantly with other cues. Don’t rely on color alone. If you use color to mark something, mark it another way as well. Apple’s iPhoto uses both color and a symbol to distinguish “smart” photo albums from regular albums (see Fig. 4.14).
5. Separate strong opponent colors. Placing opponent colors right next to or on top of each other causes a disturbing shimmering sensation, and so it should be avoided (see Fig. 4.15).
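As promised under guideline 1, here is a minimal sketch of a grayscale check, using the standard Rec. 601 luma weights; the similarity threshold is an arbitrary assumption:

// Convert a color to grayscale luminance using Rec. 601 luma weights.
function luminance([r, g, b]: [number, number, number]): number {
  return 0.299 * r + 0.587 * g + 0.114 * b; // 0-255 scale
}

// Flag color pairs that would look alike in grayscale; threshold assumed.
function tooSimilarInGray(a: [number, number, number],
                          b: [number, number, number],
                          threshold = 30): boolean {
  return Math.abs(luminance(a) - luminance(b)) < threshold;
}

console.log(tooSimilarInGray([255, 0, 0], [0, 128, 0]));   // red vs. green: true
console.log(tooSimilarInGray([255, 255, 0], [0, 0, 255])); // yellow vs. blue: false

Note that pure red and medium green fail the check even though their hues differ strongly—exactly the kind of pair guideline 1 warns about.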
As shown in Figure 4.5, ITN.net used only pale yellow to mark customers’ current step in making a reservation, which is too subtle. A simple way to strengthen the marking would be to make the current step bold and increase the saturation of the
4 Search the Web for “color-blindness filter” or “color-blindness simulator.”
FIGURE 4.13
The most distinctive colors: black, white, red, green, yellow,
blue. Each color causes a strong
signal on only one color-opponent channel.
FIGURE 4.14
Apple’s iPhoto uses color plus a symbol to distinguish two
types of albums.
FIGURE 4.15
Opponent colors, placed on or directly next to each other, clash.
yellow (see Fig. 4.16A). But ITN.net opted for a totally new design, which also uses color redundantly with shape (see Fig. 4.16B).
A graph from the Federal Reserve Bank uses shades of gray (see Fig. 4.17). This is a well-designed graph: any sighted person could read it.
(A)
(B)
FIGURE 4.16
ITN.net’s current step is highlighted in two ways: with color
and shape.
FIGURE 4.17
MinneapolisFed.org’s graph uses shade differences visible to all
sighted people, on any display.
CHAPTER 5
Our Peripheral Vision is Poor
Chapter 4 explained that the human visual system differs from a
digital camera in
the way it detects and processes color. Our visual system also
differs from a camera
in its resolution. On a digital camera’s photo sensor,
photoreceptive elements are
spread uniformly in a tight matrix, so the spatial resolution is
constant across the
entire image frame. The human visual system is not like that.
This chapter explains why
• Stationary items in muted colors presented in the periphery of people’s visual field often will not be noticed.
• Motion in the periphery is usually noticed.
RESOLUTION OF THE FOVEA COMPARED TO THE PERIPHERY
The spatial resolution of the human visual field drops greatly
from the center to the
edges. There are three reasons for this:
• Pixel density. Each eye has 6 to 7 million retinal cone cells. They are packed much more tightly in the center of our visual field—a small region called the fovea—than they are at the edges of the retina (see Fig. 5.1). The fovea has about 158,000 cone cells in each square millimeter. The rest of the retina has only 9,000 cone cells per square millimeter.
• Data compression. Cone cells in the fovea connect 1:1 to the ganglial neuron cells that begin the processing and transmission of visual data, while elsewhere on the retina, multiple photoreceptor cells (cones and rods) connect to each ganglion cell. In technical terms, information from the visual periphery is compressed (with data loss) before transmission to the brain, while information from the fovea is not.
• Processing resources. The fovea is only about 1% of the retina, but the brain’s visual cortex devotes about 50% of its area to input from the fovea. The other half of the visual cortex processes data from the remaining 99% of the retina.
The result is that our vision has much, much greater resolution in the center of our visual field than elsewhere (Lindsay and Norman, 1972; Waloszek, 2005). Said in developer jargon: in the center 1% of your visual field (i.e., the fovea), you have a high-resolution TIFF; everywhere else, you have only a low-resolution JPEG. That is nothing like a digital camera.
To visualize how small the fovea is compared to your entire
visual field, hold your
arm straight out and look at your thumb. Your thumbnail,
viewed at arm’s length,
corresponds approximately to the fovea (Ware, 2008). While
you have your eyes
focused on the thumbnail, everything else in your visual field
falls outside of your
fovea on your retina.
In the fovea, people with normal vision have very high
resolution: they can
resolve several thousand dots within that region—better
resolution than many of
today’s pocket digital cameras. Just outside of the fovea, the
resolution is already
down to a few dozen dots per inch viewed at arm’s length. At
the edges of our
vision, the “pixels” of our visual system are as large as a melon
(or human head) at
arm’s length (see Fig. 5.2).
Even though our eyes have more rods than cones—125 million versus 6–7 million—peripheral vision has much lower resolution than foveal vision. This is because while most of our cone cells are densely packed in the fovea (1% of the retina’s area), the rods are spread out over the rest of the retina (99% of the retina’s area). In people with normal vision, peripheral vision is about 20/200, which in the United States is considered
[Plot: number of receptors per square millimeter (0–180,000) versus angle from the fovea (70° on one side to 80° on the other), showing rod and cone densities across the retina and the gap at the blind spot.]
FIGURE 5.1
Distribution of photoreceptor cells (cones and rods) across the
retina. From Lindsay and Norman
(1972).
legally blind. Think about that: in the periphery of your visual
field, you are legally
blind. Here is how brain researcher David Eagleman (2012;
page 23) describes it:
The resolution in your peripheral vision is roughly equivalent to
looking through a frosted
shower door, and yet you enjoy the illusion of seeing the
periphery clearly. … Wherever
you cast your eyes appears to be in sharp focus, and therefore
you assume the whole
visual world is in focus.
If our peripheral vision has such low resolution, one might
wonder why we don’t see
the world in a kind of tunnel vision where everything is out of
focus except what we
are directly looking at now. Instead, we seem to see our
surroundings sharply and
clearly all around us. We experience this illusion because our
eyes move rapidly and
constantly about three times per second even when we don’t
realize it, focusing our
fovea on selected pieces of our environment. Our brain fills in
the rest in a gross,
impressionistic way based on what we know and expect.1 Our
brain does not have to
maintain a high-resolution mental model of our environment
because it can order the
eyes to sample and resample details in the environment as
needed (Clark, 1998).
For example, as you read this page, your eyes dart around,
scanning and reading.
No matter where on the page your eyes are focused, you have
the impression of
viewing a complete page of text, because, of course, you are.
1 Our brains also fill in perceptual gaps that occur during rapid (saccadic) eye movements, when vision is suppressed (see Chapter 14).
(A) (B)
FIGURE 5.2
The resolution of our visual field is high in the center but much
lower at the edges. Right image
from Vision Research, Vol. 14 (1974), Elsevier.
But now, imagine that you are viewing this page on a computer
screen, and the
computer is tracking your eye movements and knows where
your fovea is on the
page. Imagine that wherever you look, the right text for that
spot on the page is
shown clearly in the small area corresponding to your fovea, but
everywhere else on
the page, the computer shows random, meaningless text. As your fovea flits around the page, the computer quickly updates each area where your fovea stops to show the correct text there, while the last position of your fovea returns to textual noise. Amazingly, experiments have shown that people rarely notice this: not only can they read, they believe that they are viewing a full page of meaningful text (Clark, 1998). However, it does slow people’s reading, even if they don’t realize it (Larson, 2004).
The fact that retinal cone cells are distributed tightly in and near the fovea, and sparsely in the periphery of the retina, affects not only spatial resolution but also color resolution. We can discriminate colors better in the center of our visual field than at the edges.
Another interesting fact about our visual field is that it has a
gap—a small area (blind
spot) in which we see nothing. The gap corresponds to the spot
on our retina where the
optic nerve and blood vessels exit the back of the eye (see Fig.
5.1). There are no retinal
rod or cone cells at that spot, so when the image of an object in
our visual field happens
to fall on that part of the retina, we don’t see it. We usually
don’t notice this hole in our
vision because our brain fills it in with the surrounding content,
like a graphic artist
using Photoshop to fill in a blemish on a photograph by copying
nearby background
pixels.
People sometimes experience the blind spot when they gaze at
stars. As you look
at one star, a nearby star may disappear briefly into the blind
spot until you shift your
gaze. You can also observe the gap by trying the exercise in
Figure 5.3. Some people
have other gaps resulting from imperfections on the retina,
retinal damage, or brain
strokes that affect the visual cortex,2 but the optic nerve gap is
an imperfection
everyone shares.
IS THE VISUAL PERIPHERY GOOD FOR ANYTHING?
It seems that the fovea is better than the periphery at just about
everything. One might
wonder why we have peripheral vision. What is it good for? Our
peripheral vision
serves three important functions: it guides the fovea, detects motion, and lets us see better in the dark.
2 See VisionSimulations.com.
FIGURE 5.3
To “see” the retinal gap, cover your left eye, hold this book
near your face, and focus your right
eye on the +. Move the book slowly away from you, staying
focused on the +. The @ will disappear
at some point.
Function 1: Guides the fovea
First, peripheral vision provides low-resolution cues to guide
our eye movements so that
our fovea visits all the interesting and crucial parts of our visual
field. Our eyes don’t scan
our environment randomly. They move so as to focus our fovea
on important things, the
most important ones (usually) first. The fuzzy cues on the
outskirts of our visual field
provide the data that helps our brain plan where to move our
eyes, and in what order.
For example, when we scan a medicine label for a “use by”
date, a fuzzy blob in
the periphery with the vague form of a date is enough to cause
an eye movement that
lands the fovea there to allow us to check it. If we are browsing
a produce market
looking for strawberries, a blurry reddish patch at the edge of
our visual field draws
our eyes and our attention, even though sometimes it may turn
out to be radishes
instead of strawberries. If we hear an animal growl nearby, a
fuzzy animal-like shape
in the corner of our eye will be enough to zip our eyes in that
direction, especially if
the shape is moving toward us (see Fig. 5.4).
How peripheral vision guides and augments central, foveal
vision is discussed
more in the “Visual Search Is Linear Unless Targets ‘Pop’ in
the Periphery” section
later in this chapter.
Function 2: Detects motion
A related guiding function of peripheral vision is that it is good
at detecting motion.
Anything that moves in our visual periphery, even slightly, is
likely to draw our
attention—and hence our fovea—toward it. The reason for this
phenomenon is that
our ancestors—including prehuman ones—were selected for
their ability to spot
food and avoid predators. As a result, even though we can move
our eyes under
conscious, intentional control, some of the mechanisms that
control where they look
are preconscious, involuntary, and very fast.
FIGURE 5.4
A moving shape at the edge of our vision draws our eye: it
could be food, or it might consider us food.
What if we have no reason to expect that there might be
anything interesting in
a certain spot in the periphery,3 and nothing in that spot attracts
our attention? Our
eyes may never move our fovea to that spot, so we may never
see what is there.
Function 3: Lets us see better in the dark
A third function of peripheral vision is to allow us to see in
low-light conditions—for
example, on starlit nights, in caves, around campfires, etc.
These were conditions under
which vision evolved, and in which people—like the animals
that preceded them on
Earth—spent much of their time until the invention of the
electric light bulb in the 1800s.
Just as the rods are overloaded in well-lighted conditions (see Chapter 4), the cones don’t function very well in low light, so our rods take over. Low-light, rods-only vision is called scotopic vision. An interesting fact is that because there are no rods in the fovea, you can see objects better in low-light conditions (e.g., faint stars) if you don’t look directly at them.
EXAMPLES FROM COMPUTER USER INTERFACES
The low acuity of our peripheral vision explains why software and website users fail to notice error messages in some applications and websites. When someone clicks a button or a link, that is usually where his or her fovea is positioned. Everything on the screen that is not within 1–2 centimeters of the click location (assuming normal computer viewing distance) is in peripheral vision, where resolution is low. If, after the click, an error message appears in the periphery, it should not be surprising that the person might not notice it.
For example, at InformaWorld.com, the online publications
website of Informa
Healthcare, if a user enters an incorrect username or password
and clicks “Sign In,”
an error message appears in a “message bar” far away from
where the user’s eyes are
most likely focused (see Fig. 5.5). The red word “Error” might
appear in the user’s
peripheral vision as a small reddish blob, which would help
draw the eyes in that
direction. However, the red blob could fall into a gap in the
viewer’s visual field, and
so not be noticed at all.
Consider the sequence of events from a user’s point of view.
The user enters a
username and password and then clicks “Sign In.” The page
redisplays with blank
fields. The user thinks “Huh? I gave it my login information and
hit ‘Sign In,’ didn’t I?
Did I hit the wrong button?” The user reenters the username and
password, and
clicks “Sign In” again. The page redisplays with empty fields
again. Now the user is
really confused. The user sighs (or curses), sits back in his chair
and lets his eyes scan
the screen. Suddenly noticing the error message, the user says
“A-ha! Has that error
message been there all along?”
3 See Chapter 1 on how expectations bias our perceptions.
Even when an error message is placed nearer to the center of the viewer’s visual field than in the preceding example, other factors can diminish its visibility. For example, until recently the website of Airborne.com signaled a login failure by displaying an error message in red just above the Login ID field (see Fig. 5.6). This error message is entirely in red and fairly near the “Login” button where the user’s eyes are probably focused. Nonetheless, some users would not notice this error message when it first appeared. Can you think of any reasons people might not initially see this error message?
One reason is that even though the error message is much closer to where users will be looking when they click the “Login” button, it is still in the periphery, not in the fovea. The fovea is small: just a centimeter or two on a computer screen, assuming the user is the usual distance from the screen. A second reason is that the error message is not the only thing near the top of the page that is red. The page title is also red. Resolution in the periphery is low, so when the error message appears, the user’s visual system may not register any change: there was something red up there before, and there still is (see Fig. 5.7).
If the page title were black or any other color besides red, the
red error message
would be more likely to be noticed, even though it appears in
the periphery of the
users’ visual field.
FIGURE 5.5
This error message for a faulty sign-in appears in peripheral
vision, where it will probably be
missed.
COMMON METHODS OF MAKING MESSAGES VISIBLE
There are several common and well-known methods of ensuring that an error message will be seen:
• Put it where users are looking. People focus in predictable places when interacting with graphical user interfaces (GUIs). In Western societies, people tend to traverse forms and control panels from upper left to lower right. While moving the screen pointer, people usually look either at where it is or where they are moving it to. When people click a button or link, they can usually be assumed to be looking directly at it, at least for a few moments afterward. Designers can use this predictability to position error messages near where they expect users to be looking.
FIGURE 5.6
This error message for a faulty login is missed by some users
even though it is not far from the
“Login” button.
FIGURE 5.7
Simulation of a user’s visual field while the fovea is fixed on
the “Login” button.
FIGURE 5.8
This error message for faulty sign-in is displayed more
prominently, near where users will be
looking.
• Mark the error. Somehow mark the error prominently to indicate clearly that something is wrong. Often this can be done by simply placing the error message near what it refers to, unless that would place the message too far from where users are likely to be looking.
• Use an error symbol. Make errors or error messages more visible by marking them with an error symbol.
• Reserve red for errors. By convention, in interactive computer systems the color red connotes alert, danger, problem, error, etc. Using red for any other information on a computer display invites misinterpretation. But suppose you are designing a website for Stanford University, which has red as its school color. Or suppose you are designing for a Chinese market, where red is considered an auspicious, positive color. What do you do? Use another color for errors, mark them with error symbols, or use stronger methods (see the next section).
An improved version of the InformaWorld sign-in error screen uses several of these techniques (see Fig. 5.8).
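A minimal sketch of the first two methods (the field name and validation rule here are hypothetical): place the message next to the offending control, where the user’s fovea already is, and mark it with a symbol:

// Sketch: show an error message right next to the field it refers to,
// marked with a symbol, instead of in a distant message bar.
function showInlineError(field: HTMLInputElement, message: string): void {
  const note = document.createElement("span");
  note.className = "field-error";          // style this red, with high contrast
  note.textContent = "\u26A0 " + message;  // warning symbol adds a redundant cue
  field.insertAdjacentElement("afterend", note); // lands near the user's fovea
}

const username = document.querySelector<HTMLInputElement>("#username");
if (username && username.value.trim() === "") {
  showInlineError(username, "Please enter your username.");
}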
At America Online’s website, the form for registering for a new email account follows the guidelines pretty well (see Fig. 5.9). Data fields with errors are marked with red error symbols. Error messages are displayed in red and are near the error. Furthermore, most of the error messages appear as soon as an erroneous entry is made, when the user is still focused on that part of the form, rather than only after the user submits the form. It is unlikely that AOL users will miss seeing these error messages.
HEAVY ARTILLERY FOR MAKING USERS NOTICE MESSAGES
If the common, conventional methods of making users notice messages are not enough, three stronger methods are available to user-interface designers: pop-up messages in error dialog boxes, sound (e.g., beeps), and brief wiggling or blinking. However, these methods, while very effective, have significant negative effects, so they should be used sparingly and with great care.
Method 1: Pop-up message in error dialog box
Displaying an error message in a dialog box sticks it right in the user’s face, making it hard to miss. Error dialog boxes interrupt the user’s work and demand immediate attention. That is good if the error message signals a critical condition, but it can annoy people if such an approach is used for a minor message, such as confirming the execution of a user-requested action.
The annoyance of pop-up messages rises with the degree of modality. Nonmodal pop-ups allow users to ignore them and continue working. Application-modal pop-ups block any further work in the application that displayed the error, but allow users to interact with other software on their computer. System-modal pop-ups block any user action until the dialog has been dismissed.
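Browsers expose part of this distinction directly through the standard HTMLDialogElement API; a minimal sketch—show() for a nonmodal message, showModal() for a modal one:

// Sketch of the modality spectrum using the standard <dialog> element.
const dialog = document.createElement("dialog");
dialog.textContent = "Unsaved changes may be lost.";
document.body.appendChild(dialog);

dialog.show();      // nonmodal: users can ignore it and keep working
dialog.close();

dialog.showModal(); // modal: blocks interaction with the page until dismissed

(There is no true system-modal dialog on the web; showModal() blocks only the page that opened it.)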
Application-modal pop-ups should be used sparingly—for example, only when application data may be lost if the user doesn’t attend to the error. System-modal
FIGURE 5.9
New member registration at AOL.com displays error messages
prominently, near each error.
pop-ups should be used extremely rarely—basically only when the system is about to crash and take hours of work with it, or if people will die if the user misses the error message.
On the Web, an additional reason to avoid pop-up error dialog
boxes is that some
people set their browsers to block all pop-up windows. If your
website relies on
pop-up error messages, some users may never see them.
REI.com has an example of a pop-up dialog being used to display an error message. The message is displayed when someone who is registering as a new customer omits required fields in the form (see Fig. 5.10). Is this an appropriate use of a pop-up dialog? AOL.com (see Fig. 5.9) shows that missing-data errors can be signaled quite well without pop-up dialogs, so REI.com’s use of them seems a bit heavy-handed.
Examples of more appropriate use of error dialog boxes come
from Microsoft
Excel (see Fig. 5.11A) and Adobe InDesign (see Fig. 5.11B). In
both cases, loss of data
is at stake.
Method 2: Use sound (e.g., beep)
When a computer beeps, that tells its user something has
happened that requires
attention. The person’s eyes reflexively begin scanning the
screen for whatever caused
the beep. This can allow the user to notice an error message that
is someplace other
than where the user was just looking, such as in a standard error
message box on the
display. That is the value of beeping.
However, imagine many people in a cubicle work environment or a classroom, all using an application that signals all errors and warnings by beeping. Such a workplace would be very annoying, to say the least. Worse, people wouldn’t be able to tell whether their own computer or someone else’s was beeping.
FIGURE 5.10
REI’s pop-up dialog box signals required data that was omitted.
It is hard to miss, but perhaps
overkill.
The opposite situation is noisy work environments (e.g.,
factories or computer
server rooms), where auditory signals emitted by an application
might be masked by
ambient noise. Even in non-noisy environments, some computer
users simply prefer
quiet, and mute the sound on their computers or turn it way
down.
For these reasons, signaling errors and other conditions with sound is a remedy that can be used only in very special, controlled situations.
Computer games often use sound to signal events and conditions. In games, sound isn’t annoying; it is expected. Its use in games is widespread, even in game arcades, where dozens of machines are all banging, roaring, buzzing, clanging, beeping, and playing music at once. (Well, it is annoying to parents who have to go into the arcades and endure all the screeching and booming to retrieve their kids, but the games aren’t designed for parents.)
Method 3: Wiggle or blink briefly
As described earlier in this chapter, our peripheral vision is
good at detecting motion,
and motion in the periphery causes reflexive eye movements
that bring the motion
into the fovea. User-interface designers can make use of this by
wiggling or flashing
messages briefly when they want to ensure that users see them.
It doesn’t take much motion to trigger an eye movement in that direction; just a tiny bit is enough to make a viewer’s eyes zip over. Millions of years of evolution have had quite an effect.
As an example of using motion to attract users’ attention, Apple’s iCloud online service briefly shakes the entire dialog box horizontally when a user enters an invalid username or password (see Fig. 5.12). In addition to clearly indicating “No” (like a person shaking his head), this attracts the user’s eyeballs, guaranteed.
(Because, after all, the motion in the corner of your eye might
be a leopard.)
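A brief shake of this kind is easy to sketch with the standard Web Animations API; the offsets are arbitrary, and the 300-millisecond duration is an assumption chosen to respect the quarter- to half-second limit discussed later in this section:

// Sketch: briefly shake an element horizontally to draw peripheral
// vision toward it, as iCloud's login dialog does.
function shake(element: HTMLElement): void {
  element.animate(
    [
      { transform: "translateX(0)" },
      { transform: "translateX(-8px)" },
      { transform: "translateX(8px)" },
      { transform: "translateX(-4px)" },
      { transform: "translateX(0)" },
    ],
    { duration: 300 } // brief: well under the half-second annoyance threshold
  );
}

const loginBox = document.querySelector<HTMLElement>("#login-dialog"); // hypothetical id
if (loginBox) shake(loginBox); // e.g., after a failed password check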
The most common use of blinking in computer user interfaces (other than advertisements) is in menu bars. When an action (e.g., Edit or Copy) is selected from a
FIGURE 5.11
Appropriate pop-up error dialogs: (A) Microsoft Excel and (B)
Adobe InDesign.
menu, it usually blinks once before the menu closes to confirm
that the system “got”
the command—that is, that the user didn’t miss the menu item.
This use of blinking
is very common. It is so quick that most computer users aren’t
even aware of it, but
if menu items didn’t blink once, we would have less confidence
that we actually
selected them.
Motion and blinking, like pop-up dialog boxes and beeping,
must be used spar-
ingly. Most experienced computer users consider wiggling,
blinking objects on
screen to be annoying. Most of us have learned to ignore
displays that blink because
many such displays are advertisements. Conversely, a few
computer users have atten-
tional impairments that make it difficult for them to ignore
something that is blink-
ing or wiggling.
Therefore, if wiggling or blinking is used, it should be brief—it
should last about
a quarter- to a half-second, no longer. Otherwise, it quickly
goes from an uncon-
scious attention-grabber to a conscious annoyance.
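Here is a sketch of such a brief wiggle in TypeScript, using the browser's Web Animations API. The 300-millisecond duration keeps the motion within the quarter- to half-second guideline, and checking the "prefers-reduced-motion" setting accommodates users who find motion disruptive; both specifics are illustrative choices rather than requirements.

function shakeBriefly(el: HTMLElement): void {
  // Users with attentional or vestibular issues can opt out of motion.
  if (window.matchMedia("(prefers-reduced-motion: reduce)").matches) return;
  el.animate(
    [
      { transform: "translateX(0)" },
      { transform: "translateX(-8px)" },
      { transform: "translateX(8px)" },
      { transform: "translateX(-4px)" },
      { transform: "translateX(0)" },
    ],
    { duration: 300 } // brief: an unconscious attention-grabber, not an annoyance
  );
}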
Use heavy-artillery methods sparingly to avoid habituating your
users
There is one final reason to use the preceding heavy-artillery
methods sparingly (i.e.,
only for critical messages): to avoid habituating your users.
When pop-ups, sound,
motion, and blinking are used too often to attract users’
attention, a psychological
phenomenon called habituation sets in (see Chapter 1). Our
brain pays less and less
attention to any stimulus that occurs frequently.
It is like the old fable of the boy who cried “Wolf!” too often:
eventually, the vil-
lagers learned to ignore his cries, so when a wolf actually did
come, his cries went
unheeded. Overuse of strong attention-getting methods can
cause important mes-
sages to be blocked by habituation.
FIGURE 5.12
Apple’s iCloud shakes the dialog box briefly on login errors to
attract a user’s fovea toward it.
VISUAL SEARCH IS LINEAR UNLESS TARGETS
“POP” IN THE PERIPHERY
As explained earlier, one function of peripheral vision is to
drive our eyes to focus the
fovea on important things—things we are seeking or that might
be a threat. Objects
moving in our peripheral vision fairly reliably “yank” our eyes
in that direction.
When we are looking for an object, our entire visual system,
including the periph-
ery, primes itself to detect that object. In fact, the periphery is a
crucial component in
visual search, despite its low spatial and color resolution.
However, just how helpful
the periphery is in aiding visual search depends strongly on
what we are looking for.
Look quickly at Figure 5.13 and find the Z.
To find the Z, you had to scan carefully through the characters
until your fovea
landed on it. In the lingo of vision researchers, the time to find
the Z is linear: it
depends approximately linearly on the number of distracting
characters and the
position of the Z among them.
Now look quickly at Figure 5.14 and find the bold character.
That was much easier (i.e., faster), wasn’t it? You didn’t have
to scan your fovea
carefully through the distracting characters. Your periphery
quickly detected the
boldness and determined its location, and because that is what
you were seeking,
FIGURE 5.13
Finding the Z requires scanning carefully through the
characters.
FIGURE 5.14
Finding the bold letter does not require scanning through
everything.
your visual system moved your fovea there. Your periphery
could not determine
exactly what was bold—that is beyond its resolution and
abilities—but it did locate the
boldness. In vision-researcher lingo, the periphery was primed
to look for boldness in
parallel over its entire area, and boldness is a distinctive feature
of the target, so search-
ing for a bold target is nonlinear. In designer lingo, we simply
say that boldness “pops
out” (“pops” for short) in the periphery, assuming that only the
target is bold.
Color “pops” even more strongly. Compare counting the L’s in
Figure 5.15 with
counting the blue characters in Figure 5.16.
What else makes things “pop” in the periphery? As described
earlier, the periph-
ery easily detects motion, so motion “pops.” Generalizing from
boldness, we also
can say that font weight “pops,” because if all but one of the
characters on a display
were bold, the nonbold character would stand out. Basically, a
visual target will pop
out in your periphery if it differs from surrounding objects in
features the periphery
can detect. The more distinctive features a target has, the more strongly it “pops,” assuming the periphery can detect those features.
Using peripheral “pop” in design
Designers use peripheral “pop” to focus the attention of a
product’s users, as well as
to allow users to find information faster. Chapter 3 described
how visual hierarchy—
titles, headings, boldness, bullets, and indenting—can make it
easier for users to spot
FIGURE 5.15
Counting L’s is hard; character shape doesn’t “pop” among
characters.
FIGURE 5.16
Counting blue characters is easy because color “pops.”
and extract from text the information they need. Glance back at
Figure 3.11 in Chap-
ter 3 and see how the headings and bullets make the topics and
subtopics “pop” so
readers can go right to them.
Many interactive systems use color to indicate status, usually
reserving red for
problems. Online maps and some vehicle GPS devices mark
traffic jams with red so
they stand out (see Fig. 5.17). Systems for controlling air traffic
mark potential colli-
sions in red (see Fig. 5.18). Applications for monitoring servers
and networks use
color to show the health status of assets or groups of them (see
Fig. 5.19).
These are all uses of peripheral “pop” to make important
information stand out
and visual search nonlinear.
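A small TypeScript sketch of this convention follows; the status names and color values are illustrative. Because color vision is limited (see Chapter 4), the sketch also sets a text fallback rather than relying on color alone.

type Status = "ok" | "warning" | "critical";

const statusColor: Record<Status, string> = {
  ok: "#2e7d32",       // green: no action needed
  warning: "#f9a825",  // amber: attention soon
  critical: "#c62828", // red: reserved for real problems, so it stays distinctive
};

function renderStatus(el: HTMLElement, status: Status): void {
  el.style.backgroundColor = statusColor[status];
  el.title = status; // don't rely on color alone
}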
When there are many possible targets
Sometimes in displays of many items, any of them could be
what the user wants.
Examples include command menus (see Fig. 5.20A) and object palettes (see Fig. 5.20B). Let’s assume that the application cannot anticipate which item or items a user is likely to want, and so cannot highlight them in advance. That is a fair assumption for today’s applications.4 Are
users doomed to have to search linearly through such displays
for the item they want?
That depends. Designers can try to make each item so
distinctive that when a
specific one is the user’s target, the user’s peripheral vision will
be able to spot it among
4 But in the not-too-distant future it might not be.
FIGURE 5.17
Google Maps uses color to show traffic conditions. Red
indicates traffic jams.
FIGURE 5.18
Air traffic control systems often use red to make potential
collisions stand out.
FIGURE 5.19
Paessler’s monitoring tool uses color to show the health of
network components.
all the other items. Designing distinctive sets of icons is hard—
especially when the
set is large—but it can be done (see Johnson et al., 1989).
Designing sets of icons that
are so distinctive that they can be distinguished in peripheral
vision is very hard, but
not impossible. For example, if a user goes to the Mac OS
application palette to open
his or her calendar, a white rectangular blob in the periphery
with something black
in the middle is more likely to attract the user’s eye than a blue
circular blob (see Fig.
5.20B). The trick is not to get too fancy and detailed with the
icons—give each one
a distinctive color and gross shape.
On the other hand, if the potential targets are all words, as in
command menus
(see Fig. 5.20A), visual distinctiveness is not an option. In textual
menus and lists,
visual search will be linear, at least at first. With practice, users
learn the positions of
frequently used items in menus, lists, and palettes, so searching
for particular items is
no longer linear.
That is why applications should never move items around in
menus, lists, or
palettes. Doing that prevents users from learning item positions,
thereby dooming
them to search linearly forever. Therefore, “dynamic menus” is
considered a major
user-interface design blooper (Johnson, 2007).
FIGURE 5.20
(A) Microsoft Word Tools menu, and (B) Mac OS application palette.
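The stable-ordering rule is easy to honor in code. In the minimal TypeScript sketch below (the menu items and the highlight threshold are hypothetical), usage data may drive what gets emphasized, but never the order in which items appear.

const MENU_ITEMS = ["Spelling", "Thesaurus", "Word Count", "Macros"] as const;

function renderMenu(
  usageCount: Map<string, number>
): { label: string; highlight: boolean }[] {
  // Anti-pattern (the "dynamic menus" blooper): sorting items by recent
  // use re-arranges the menu and prevents users from learning positions.
  return MENU_ITEMS.map((label) => ({
    label,
    // Frequency may drive emphasis, but never position:
    highlight: (usageCount.get(label) ?? 0) > 10,
  }));
}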
CHAPTER 6
Reading is Unnatural
Most people in industrialized nations grow up in households and
school districts that
promote education and reading. They learn to read as young
children and become good
readers by adolescence. As adults, most of our activities during
a normal day involve
reading. The process of reading—deciphering words into their
meaning—is for most
educated adults automatic, leaving our conscious minds free to
ponder the meaning
and implications of what we are reading. Because of this
background, it is common for
good readers to consider reading to be as “natural” a human
activity as speaking is.
WE’RE WIRED FOR LANGUAGE, BUT NOT FOR READING
Speaking and understanding spoken language is a natural human
ability, but reading is
not. Over hundreds of thousands—perhaps millions—of years,
the human brain
evolved the neural structures necessary to support spoken
language. As a result, normal
humans are born with an innate ability to learn as toddlers, with
no systematic training,
whatever language they are exposed to. After early childhood,
this ability decreases
significantly. By adolescence, learning a new language is the
same as learning any other
skill: it requires instruction and practice, and the learning and
processing are handled
by different brain areas from those that handled it in early
childhood (Sousa, 2005).
In contrast, writing and reading did not exist until a few
thousand years BCE and
did not become common until only four or five centuries ago—
long after the human
brain had evolved into its modern state. At no time during
childhood do our brains
show any special innate ability to learn to read. Instead, reading
is an artificial skill
that we learn by systematic instruction and practice, like
playing a violin, juggling,
or reading music (Sousa, 2005).
Many people never learn to read well, or at all
Because people are not innately “wired” to learn to read,
children who either lack
caregivers who read to them or who receive inadequate reading
instruction in
school may never learn to read. There are a great many such
people, especially in
the developing world. By comparison, very few people never
learn a spoken
language.
For a variety of reasons, some people who learn to read never
become good at it.
Perhaps their parents did not value and promote reading.
Perhaps they attended
substandard schools or didn’t attend school at all. Perhaps they
learned a second
language but never learned to read well in that language. People
who have cognitive
or perceptual impairments such as dyslexia may never read
easily.
A person’s ability to read is specific to a language and a script
(a system of writ-
ing). To see what text looks like to someone who cannot read,
just look at a para-
graph printed in a language and script that you do not know (see
Fig. 6.1).
Alternatively, you can approximate the feeling of illiteracy by
taking a page writ-
ten in a familiar script and language—such as a page of this
book—and turning it
upside down. Turn this book upside down and try reading the
next few paragraphs.
This exercise only approximates the feeling of illiteracy. You
will discover that the
inverted text appears foreign and illegible at first, but after a
minute you will be able
to read it, albeit slowly and laboriously.
Learning to read = training our visual system
Learning to read involves training our visual system to
recognize patterns—the
patterns exhibited by text. These patterns run a gamut from low
level to high
level:
• Lines, contours, and shapes are basic visual features that our
brain recognizes
innately. We don’t have to learn to recognize them.
• Basic features combine to form patterns that we learn to
identify as charac-
ters—letters, numeric digits, and other standard symbols. In
ideographic
scripts, such as Chinese, symbols represent entire words or
concepts.
FIGURE 6.1
To see how it feels to be illiterate, look at text printed in a
foreign script: (A) Amharic and
(B) Tibetan.
• In alphabetic scripts, patterns of characters form morphemes,
which we learn
to recognize as packets of meaning—for example, “farm,”
“tax,” “-ed,” and “-ing”
are morphemes in English.
• Morphemes combine to form patterns that we recognize as
words—for example,
“farm,” “tax,” “-ed,” and “-ing” can be combined to form the
words “farm,”
“farmed,” “farming,” “tax,” “taxed,” and “taxing.” Even
ideographic scripts
include symbols that serve as morphemes or modifiers of
meaning rather than as
words or concepts.
• Words combine to form patterns that we learn to recognize as
phrases, idiom-
atic expressions, and sentences.
• Sentences combine to form paragraphs.
Actually, only part of our visual system is trained to recognize
textual patterns
involved in reading: the fovea and a small area immediately
surrounding it (known as the
perifovea), and the downstream neural networks running
through the optic nerve to
the visual cortex and into various parts of our brain. The neural
networks starting else-
where in our retinas do not get trained to read. More about this
is explained later in the
chapter.
Learning to read also involves training the brain’s systems that
control eye move-
ment to move our eyes in a specific way over text. The main
direction of eye move-
ment depends on the direction in which the language we are
reading is written:
European-language scripts are read left to right, many Middle Eastern language
scripts are read right to left, and some language scripts are read
top to bottom.
Beyond that, the precise eye movements differ depending on
whether we are read-
ing, skimming for overall meaning, or scanning for specific
words.
How we read
Assuming our visual system and brain have successfully been
trained, reading
becomes semi-automatic or fully automatic—both the eye
movement and the
processing.
As explained earlier, the center of our visual field—the fovea
and perifovea—is
the only part of our visual field that is trained to read. All text
that we read enters our
visual system after being scanned by the central area, which
means that reading
requires a lot of eye movement.
As explained in Chapter 5 on the discussion of peripheral
vision, our eyes con-
stantly jump around, several times a second. Each of these
movements, called sac-
cades, lasts about 0.1 second. Saccades are ballistic, like firing
a shell from a cannon:
their endpoint is determined when they are triggered, and once
triggered, they
always execute to completion. As described in earlier chapters,
the destinations of
saccadic eye movements are programmed by the brain from a
combination of our
goals, events in the visual periphery, events detected and
localized by other percep-
tual senses, and past history including training.
When we read, we may feel that our eyes scan smoothly across
the lines of text,
but that feeling is incorrect. In reality, our eyes continue with
saccades during read-
ing, but the movements generally follow the line of text. They
fix our fovea on a
word, pause there for a fraction of a second to allow basic
patterns to be captured
and transmitted to the brain for further analysis, then jump to
the next important
word (Larson, 2004). Eye fixations while reading always land
on words, usually near
the center, never on word boundaries (see Fig. 6.2). Very
common small connector
and function words like “a,” “and,” “the,” “or,” “is,” and “but”
are usually skipped
over, their presence either detected in perifoveal vision or
simply assumed. Most of
the saccades during reading are in the text’s normal reading
direction, but a few—
about 10%—jump backwards to previous words. At the end of
each line of text, our
eyes jump to where our brain guesses the next line begins.1
How much can we take in during each eye fixation during
reading? For reading
European-language scripts at normal reading distances and text-
font sizes, the fovea
clearly sees 3–4 characters on either side of the fixation point.
The perifovea sees out
about 15–20 characters from the fixation point, but not very
clearly (see Fig. 6.3).
According to reading researcher Kevin Larson (2004), the
reading area in and around
the fovea consists of three distinct zones (for European-
language scripts):
Closest to the fixation point is where word recognition takes
place. This zone is usu-
ally large enough to capture the word being fixated, and often
includes smaller function
words directly to the right of the fixated word. The next zone
extends a few letters past
the word recognition zone, and readers gather preliminary
information about the next let-
ters in this zone. The final zone extends out to 15 letters past
the fixation point. Informa-
tion gathered out this far is used to identify the length of
upcoming words and to identify
the best location for the next fixation point.
1 Later we will see that centered text disrupts the brain’s guess
about where the next line starts.
FIGURE 6.2
Saccadic eye movements during reading jump between
important words.
FIGURE 6.3
Visibility of words in a line of text, with fovea fixed on the
word “years.”
Because our visual system has been trained to read, perception
around the fixation
point is asymmetrical: it is more sensitive to characters in the
reading direction than
in the other direction. For European-language scripts, this is
toward the right. That
makes sense because characters to the left of the fixation point
have usually already
been read.
IS READING FEATURE-DRIVEN OR CONTEXT-DRIVEN?
As explained earlier, reading involves recognizing features and
patterns. Pattern rec-
ognition, and therefore reading, can be either a bottom-up,
feature-driven process,
or a top-down, context-driven process.
In feature-driven reading, the visual system starts by identifying
simple features—
line segments in a certain orientation or curves of a certain
radius—on a page or display,
and then combines them into more complex features, such as
angles, multiple curves,
shapes, and patterns. Then the brain recognizes certain shapes
as characters or symbols
representing letters, numbers, or, for ideographic scripts, words.
In alphabetic scripts,
groups of letters are perceived as morphemes and words. In all
types of scripts, sequences
of words are parsed into phrases, sentences, and paragraphs that
have meaning.
Feature-driven reading is sometimes referred to as “bottom-up”
or “context-free.”
The brain’s ability to recognize basic features—lines, edges,
angles, etc.—is built in
and therefore automatic from birth. In contrast, recognition of
morphemes, words,
and phrases has to be learned. It starts out as a nonautomatic,
conscious process
requiring conscious analysis of letters, morphemes, and words,
but with enough
practice it becomes automatic (Sousa, 2005). Obviously, the
more common a mor-
pheme, word, or phrase, the more likely that recognition of it
will become auto-
matic. With ideographic scripts such as Chinese, which have
many times more
symbols than alphabetic scripts do, people typically take many
years longer to
become skilled readers.
Context-driven or top-down reading operates in parallel with
feature-driven read-
ing but it works the opposite way: from whole sentences or the
gist of a paragraph
down to the words and characters. The visual system starts by
recognizing high-
level patterns like words, phrases, and sentences, or by knowing
the text’s meaning
in advance. It then uses that knowledge to figure out—or
guess—what the compo-
nents of the high-level pattern must be (Boulton, 2009).
Context-driven reading is
less likely to become fully automatic because most phrase-level
and sentence-level
patterns and contexts don’t occur frequently enough to allow
their recognition to
become burned into neural firing patterns. But there are
exceptions, such as idiom-
atic expressions.
To experience context-driven reading, glance quickly at Figure
6.4, then immedi-
ately direct your eyes back here and finish reading this
paragraph. Try it now. What
did the text say?
Now look at the same sentence again more carefully. Do you
read it the same way
now?
Also, based on what we have already read and our knowledge of
the world, our
brains can sometimes predict text that the fovea has not yet read
(or its meaning),
allowing us to skip reading it. For example, if at the end of a
page we read “It was a
dark and stormy,” we would expect the first word on the next
page to be “night.” We
would be surprised if it was some other word (e.g., “cow”).
Feature-driven, bottom-up reading dominates; context assists
It has been known for decades that reading involves both
feature-driven (bottom-up)
processing and context-driven (top-down) processing. In
addition to being able to
figure out the meaning of a sentence by analyzing the letters
and words in it, people
can determine the words of a sentence by knowing the meaning
of the sentence, or
the letters in a word by knowing what word it is (see Fig. 6.5).
The question is: Is
skilled reading primarily bottom-up or top-down, or is neither
mode dominant?
Early scientific studies of reading—from the late 1800s through
about 1980—
seemed to show that people recognize words first and from that
determine what
letters are present. The theory of reading that emerged from
those findings was that
our visual system recognizes words primarily from their overall
shape. This theory
failed to account for certain experimental results and so was
controversial among
reading researchers, but it nonetheless gained wide acceptance
among nonresearch-
ers, especially in the graphic design field (Larson, 2004;
Herrmann, 2011).
(A) Mray had a ltilte lmab, its feclee was withe as sown. And ervey wehre taht Mray wnet, the lmab was srue to go.
(B) Twinkle twinkle little star how I wonder what you are
FIGURE 6.5
Top-down reading: most readers, especially those who know the
songs from which these text
passages are taken, can read these passages even though the
words (A) have all but their first
and last letters scrambled and (B) are mostly obscured.
The rain in Spain falls
manly in the the plain
FIGURE 6.4
Top-down recognition of the expression can inhibit seeing the
actual text.
Similarly, educational researchers in the 1970s applied
information theory to
reading, and assumed that because of redundancies in written
language, top-down,
context-driven reading would be faster than bottom-up, feature-
driven reading. This
assumption led them to hypothesize that reading for highly
skilled (fast) readers
would be dominated by context-driven (top-down) processing.
This theory was
probably responsible for many speed-reading methods of the
1970s and 1980s, which
supposedly trained people to read fast by taking in whole
phrases and sentences at
a time.
However, empirical studies of readers conducted since then
have demonstrated
conclusively that those early theories were false. Summing up
the research are state-
ments from reading researchers Kevin Larson (2004) and Keith
Stanovich (Boulton,
2009), respectively:
Word shape is no longer a viable model of word recognition.
The bulk of scientific evi-
dence says that we recognize a word’s component letters, then
use that visual informa-
tion to recognize a word.
Context [is] important, but it’s a more important aid for the
poorer reader who doesn’t
have automatic context-free recognition instantiated.
In other words, reading consists mainly of context-free, bottom-
up, feature-driven
processes. In skilled readers, these processes are well learned to
the point of being
automatic. Context-driven reading today is considered mainly a
backup method that,
although it operates in parallel with feature-based reading, is
only relevant when
feature-driven reading is difficult or insufficiently automatic.
Skilled readers may resort to context-based reading when
feature-based read-
ing is disrupted by poor presentation of information (see
examples later in this
chapter). Also, in the race between context-based and feature-
based reading to
decipher the text we see, contextual cues sometimes win out
over features. As an
example of context-based reading, Americans visiting England
sometimes mis-
read “to let” signs as “toilet,” because in the United States they
see the word “toi-
let” often, but they almost never see the phrase “to let”—
Americans use “for
rent” instead.
In less skilled readers, feature-based reading is not automatic; it
is conscious and
laborious. Therefore, more of their reading is context-based.
Their involuntary use
of context-based reading and nonautomatic feature-based
reading consumes short-
term cognitive capacity, leaving little for comprehension.2 They
have to focus on
2 Chapter 10 describes the differences between automatic and
controlled cognitive processing. Here, we will
simply say that controlled processes burden working memory,
while automatic processes do not.
deciphering the stream of words, leaving no capacity for
constructing the meaning
of sentences and paragraphs. That is why poor readers can read
a passage aloud but
afterward have no idea what they just read.
Why is context-free (bottom-up) reading not automatic in some
adults? Some peo-
ple didn’t get enough experience reading as young children for
the feature-driven
recognition processes to become automatic. As they grow up,
they find reading men-
tally laborious and taxing, so they avoid reading, which
perpetuates and compounds
their deficit (Boulton, 2009).
SKILLED AND UNSKILLED READING USE DIFFERENT
PARTS OF THE BRAIN
Before the 1980s, researchers who wanted to understand which
parts of the brain
are involved in language and reading were limited mainly to
studying people who
had suffered brain injuries. For example, in the mid-19th
century, doctors found that
people with brain damage near the left temple—an area now
called Broca’s area
after the doctor who discovered it—can understand speech but
have trouble speak-
ing, and that people with brain damage near the left ear—now
called Wernicke’s area—
cannot understand speech (Sousa, 2005) (see Fig. 6.6).
In recent decades, new methods of observing the operation of
functioning brains
in living people have been developed: electroencephalography
(EEG), functional
magnetic resonance imaging (fMRI), and functional magnetic
resonance spectros-
copy (fMRS). These methods allow researchers to watch the
response in different
areas of a person’s brain—including the sequence in which they
respond—as the
person perceives various stimuli or performs specific tasks
(Minnery and Fine,
2009).
Broca’s area
Wernicke’s area
FIGURE 6.6
The human brain, showing Broca’s area and Wernicke’s area.
Using these methods, researchers have discovered that the
neural pathways
involved in reading differ for novice versus skilled readers. Of
course, the first area
to respond during reading is the occipital (or visual) cortex at
the back of the brain.
That is the same regardless of a person’s reading skill. After
that, the pathways
diverge (Sousa, 2005):
• Novice. First an area of the brain just above and behind
Wernicke’s area
becomes active. Researchers have come to view this as the area
where, at least
with alphabetic scripts such as English and German, words are
“sounded out”
and assembled—that is, letters are analyzed and matched with
their correspond-
ing sounds. The word-analysis area then communicates with
Broca’s area and
the frontal lobe, where morphemes and words—units of
meaning—are recog-
nized and overall meaning is extracted. For ideographic
languages, where sym-
bols represent whole words and often have a graphical
correspondence to their
meaning, sounding out of words is not part of reading.
• Advanced. The word-analysis area is skipped. Instead the
occipitotemporal area
(behind the ear, not far from the visual cortex) becomes active.
The prevailing
view is that this area recognizes words as a whole without
sounding them out,
and then that activity activates pathways toward the front of the
brain that cor-
respond to the word’s meaning and mental image. Broca’s area
is only slightly
involved.
Findings from brain scan methods of course don’t indicate
exactly what pro-
cesses are being used, but they support the theory that advanced
readers use differ-
ent processes from those novice readers use.
POOR INFORMATION DESIGN CAN DISRUPT READING
Careless writing or presentation of text can reduce skilled
readers’ automatic, con-
text-free reading to conscious, context-based reading, burdening
working memory,
thereby decreasing speed and comprehension. In unskilled
readers, poor text pre-
sentation can block reading altogether.
Uncommon or unfamiliar vocabulary
One way software often disrupts reading is by using unfamiliar
vocabulary—words
the intended readers don’t know very well or at all.
One type of unfamiliar terminology is computer jargon,
sometimes known as
“geek speak.” For example, an intranet application displayed
the following error mes-
sage if a user tried to use the application after more than 15
minutes of letting it sit
idle:
Your session has expired. Please reauthenticate.
The application was for finding resources—rooms, equipment,
etc.—within
the company. Its users included receptionists, accountants, and
managers, as
well as engineers. Most nontechnical users would not
understand the word
“reauthenticate,” so they would drop out of automatic reading
mode into con-
scious wondering about the message’s meaning. To avoid
disrupting reading, the
application’s developers could have used the more familiar
instruction, “Login
again.” For a discussion of how “geek speak” in computer-based
systems affects
learning, see Chapter 11.
Reading can also be disrupted by uncommon terms even if they
are not computer
technology terms. Here are some rare English words, including
many that appear
mainly in contracts, privacy statements, or other legal
documents:
• Aforementioned: mentioned previously
• Bailiwick: the region in which a sheriff has legal powers; more generally: domain of control
• Disclaim: renounce any claim to or connection with; disown; repudiate
• Heretofore: up to the present time; before now
• Jurisprudence: the principles and theories on which a legal system is based
• Obfuscate: make something difficult to perceive or understand
• Penultimate: next to the last, as in “the next to the last chapter of a book”
When readers—even skilled ones—encounter such a word, their
automatic read-
ing processes probably won’t recognize it. Instead, their brain
uses less automatic
processes, such as sounding out the word’s parts and using them
to figure out its
meaning, figuring out the meaning from the context in which
the word appears, or
looking the word up in a dictionary.
Difficult scripts and typefaces
Even when the vocabulary is familiar, reading can be disrupted
by typefaces with
unfamiliar or hard-to-distinguish shapes. Context-free,
automatic reading is based
on recognizing letters and words bottom-up from their lower-
level visual features.
Our visual system is quite literally a neural network that must
be trained to recog-
nize certain combinations of shapes as characters. Therefore, a
typeface with dif-
ficult-to-recognize features and shapes will be hard to read. For
example, try to
read Abraham Lincoln’s Gettysburg Address in an outline
typeface in ALL CAPS
(see Fig. 6.7).
Comparison studies show that skilled readers read uppercase
text 10–15% more
slowly than lowercase text. Current-day researchers attribute
that difference mainly
to a lack of practice reading uppercase text, not to an inherent
lower recognizability
of uppercase text (Larson, 2004). Nonetheless, it is important
for designers to be
aware of the practice effect (Herrmann, 2011).
Tiny fonts
Another way to make text hard to read in software applications,
websites, and elec-
tronic appliances is to use fonts that are too small for their
intended readers’ visual
system to resolve. For example, try to read the first paragraph
of the U.S. Constitu-
tion in a seven-point font (see Fig. 6.8).
Developers sometimes use tiny fonts because they have a lot of
text to display in
a small amount of space. But if the intended users of the system
cannot read the text,
or can read it only laboriously, the text might as well not be
there.
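Designers who control the implementation can catch this problem mechanically. Below is a rough TypeScript sketch that flags on-screen text rendered smaller than a minimum size; the 11-pixel default is an assumption for illustration, since the right minimum depends on the intended users and their viewing conditions.

function flagTinyText(root: HTMLElement, minPx = 11): HTMLElement[] {
  const offenders: HTMLElement[] = [];
  for (const el of Array.from(root.querySelectorAll<HTMLElement>("*"))) {
    const size = parseFloat(getComputedStyle(el).fontSize);
    // Only flag elements that actually contain text.
    if (el.textContent?.trim() && size < minPx) offenders.push(el);
  }
  return offenders;
}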
Text on noisy background
Visual noise in and around text can disrupt recognition of
features, characters, and
words, and therefore drop reading out of automatic feature-
based mode into a
more conscious and context-based mode. In software user
interfaces and websites,
visual noise often results from designers’ placing text over a
patterned background
or displaying text in colors that contrast poorly with the
background, as an exam-
ple from Arvanitakis.com shows (see Fig. 6.9).
FIGURE 6.7
Text in ALL CAPS is harder to read because we are not
practiced at doing it. Outline typefaces
complicate feature recognition. This example demonstrates
both.
FIGURE 6.8
The opening paragraph of the U.S. Constitution, presented in a
seven-point font.
There are situations in which designers intend to make text hard
to read. For
example, a common security measure on the Web is to ask site
users to identify dis-
torted words, as proof that they are live human beings and not Internet “’bots.”
This relies on the fact that most people can read text that
Internet ’bots cannot cur-
rently read. Text displayed as a challenge to test a registrant’s
humanity is called a
captcha3 (see Fig. 6.10).
Of course, most text displayed in a user interface should be easy
to read. A
patterned background need not be especially strong to disrupt
people’s ability to
read text placed over it. For example, the Federal Reserve
Bank’s collection of
websites formerly provided a mortgage calculator that was
decorated with a
repeating pastel background with a home and neighborhood
theme. Although
well-intentioned, the decorated background made the calculator
hard to read
(see Fig. 6.11).
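Poor text/background contrast can be quantified. The sketch below implements the WCAG 2.x relative-luminance and contrast-ratio formulas in TypeScript; WCAG recommends a ratio of at least 4.5:1 for normal body text.

function relativeLuminance([r, g, b]: [number, number, number]): number {
  const lin = (v: number) => {
    const c = v / 255; // convert a 0–255 sRGB channel to linear light
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number]
): number {
  const [hi, lo] = [relativeLuminance(fg), relativeLuminance(bg)].sort(
    (a, b) => b - a
  );
  return (hi + 0.05) / (lo + 0.05);
}

// Example: dark gray text on white:
//   contrastRatio([68, 68, 68], [255, 255, 255])  // ≈ 9.7:1, comfortably readable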
Information buried in repetition
Visual noise can also come from the text itself. If successive
lines of text contain
a lot of repetition, readers receive poor feedback about what
line they are focused
on, and it is hard to pick out the important information. Recall, for example, the California Department of Motor Vehicles web site in Chapter 3 (see Fig. 3.2).
3 The term originally comes from the word “capture,” but it is
also said to be an acronym for “Completely
Automated Public Turing test to tell Computers and Humans
Apart.”
FIGURE 6.10
Text that is intentionally displayed with noise so that Web-
crawling software cannot read it is
called a captcha.
FIGURE 6.9
Arvanitakis.com uses text on a noisy background and poor color
contrast.
Another example of repetition that creates noise is the computer
store on
Apple.com. The pages for ordering a laptop computer list
different keyboard
options for a computer in a very repetitive way, making it hard
to see that the
essential difference between the keyboards is the language that
they support
(see Fig. 6.12).
Centered text
One aspect of reading that is highly automatic in most skilled
readers is eye move-
ment. In automatic (fast) reading, our eyes are trained to go
back to the same hori-
zontal position and down one line. If text is centered or right-
aligned, each line of
text starts in a different horizontal position. Automatic eye
movements, therefore,
take our eyes back to the wrong place, so we must consciously
adjust our gaze to the
actual start of each line. This drops us out of automatic mode
and slows us down
greatly. With poetry and wedding invitations, that is probably
okay, but with any
FIGURE 6.11
The Federal Reserve Bank’s online mortgage calculator
displayed text on a patterned
background.
FIGURE 6.12
Apple.com’s “Buy Computer” page lists options in which the
important information (keyboard
language compatibility) is buried in repetition.
http://Apple.com
other type of text, it is a disadvantage. An example of centered
prose text is provided
by the web site of FargoHomes, a real estate company (see Fig.
6.13). Try reading the
text quickly to demonstrate to yourself how your eyes move.
The same site also centers numbered lists, really messing up
readers’ automatic
eye movement (see Fig. 6.14). Try scanning the list quickly.
FIGURE 6.13
FargoHomes.com centers text, thwarting automatic eye
movement patterns.
FIGURE 6.14
FargoHomes.com centers numbered items, really thwarting
automatic eye movement patterns.
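Centered body text is also easy to detect mechanically. Below is a rough TypeScript sketch that flags paragraphs taller than about one line that are set centered or right-aligned; the height heuristic is an illustrative assumption.

function flagCenteredProse(root: HTMLElement): HTMLElement[] {
  const offenders: HTMLElement[] = [];
  for (const p of Array.from(root.querySelectorAll<HTMLElement>("p, li"))) {
    const style = getComputedStyle(p);
    // "normal" line-height parses as NaN, so fall back to ~1.2 × font size.
    const lineH =
      parseFloat(style.lineHeight) || parseFloat(style.fontSize) * 1.2;
    const multiLine = p.getBoundingClientRect().height > 1.5 * lineH;
    if (multiLine && (style.textAlign === "center" || style.textAlign === "right"))
      offenders.push(p);
  }
  return offenders;
}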
Design implications: Don’t disrupt reading; support it!
Obviously, a designer’s goal should be to support reading, not
disrupt it. Skilled (fast)
reading is mostly automatic and mostly based on feature,
character, and word recog-
nition. The easier the recognition, the easier and faster the
reading. Less skilled read-
ing, by contrast, is greatly assisted by contextual cues.
Designers of interactive systems can support both reading
methods by following
these guidelines:
1) Ensure that text in user interfaces allows the feature-based
automatic
processes to function effectively by avoiding the disruptive
flaws
described earlier: difficult or tiny fonts, patterned backgrounds,
centering,
etc.
2) Use restricted, highly consistent vocabularies—sometimes
referred to in the
industry as plain language4 or simplified language (Redish,
2007).
3) Format text to create a visual hierarchy (see Chapter 3) to
facilitate easy scan-
ning: use headings, bulleted lists, tables, and visually
emphasized words (see
Fig. 6.15).
Experienced information architects, content editors, and graphic
designers can
be very useful in ensuring that text is presented to support easy
scanning and
reading.
4 For more information on plain language, see the U.S.
government website, www.plainlanguage.gov.
FIGURE 6.15
Microsoft Word’s “Help” homepage is easy to scan and read.
MUCH OF THE READING REQUIRED BY SOFTWARE
IS UNNECESSARY
In addition to committing design mistakes that disrupt reading,
many software user
interfaces simply present too much text, requiring users to read
more than is neces-
sary. Consider how much unnecessary text there is in a dialog
box for setting text
entry properties in the SmartDraw application (see Fig. 6.16).
Software designers often justify lengthy instructions by arguing:
“We need all
that text to explain clearly to users what to do.” However,
instructions can often be
shortened with no loss of clarity. Let’s examine how the Jeep
company, between
2002 and 2007, shortened its instructions for finding a local
Jeep dealer (see Fig.
6.17):
1) 2002: The “Find a Dealer” page displayed a large paragraph
of prose text, with
numbered instructions buried in it, and a form asking for more
information than
needed to find a dealer near the user.
FIGURE 6.16
SmartDraw’s “Text Entry Properties” dialog box displays too
much text for its simple functionality.
FIGURE 6.17
Between 2002 and 2007, Jeep.com drastically reduced the
reading required by “Find a Dealer.”
2) 2003: The instructions on the “Find a Dealer” page had
been boiled down to
three bullet points, and the form required less information.
3) 2007: “Find a Dealer” had been cut to one field (zip code)
and a “Go” button on
the homepage.
Even when text describes products rather than explaining
instructions, it is
counterproductive to put all a vendor wants to say about a
product into a lengthy
prose description that people have to read from start to end.
Most potential custom-
ers cannot or will not read it. Compare Costco.com’s
descriptions of laptop comput-
ers in 2007 with those in 2009 (see Fig. 6.18).
Design implications: Minimize the need for reading
Too much text in a user interface loses poor readers, who
unfortunately are a signifi-
cant percentage of the population. Too much text even alienates
good readers: it
turns using an interactive system into an intimidating amount of
work.
FIGURE 6.18
Between 2007 and 2009, Costco.com drastically reduced the
text in product descriptions.
Minimize the amount of prose text in a user interface; don’t
present users with
long blocks of prose text to read. In instructions, use the least
amount of text that
gets most users to their intended goals. In a product description,
provide a brief
overview of the product and let users request more detail if they
want it. Technical
writers and content editors can assist greatly in doing this. For
additional advice on
how to eliminate unnecessary text, see Krug (2005) and Redish
(2007).
TEST ON REAL USERS
Finally, designers should test their designs on the intended user
population to be
confident that users can read all essential text quickly and
effortlessly. Some testing
can be done early, using prototypes and partial
implementations, but it should also
be done just before release. Fortunately, last-minute changes to
text font sizes and
formats are usually easy to make.
CHAPTER 7
Our Attention is Limited; Our Memory is Imperfect
Just as the human visual system has strengths and weaknesses,
so do human attention
and memory. This chapter describes some of those strengths and
weaknesses as back-
ground for understanding how we can design interactive systems
to support and aug-
ment attention and memory rather than burdening or confusing
them. We will start
with an overview of how memory works, and how it is related to
attention.
SHORT- VERSUS LONG-TERM MEMORY
Psychologists historically have distinguished short-term
memory from long-term
memory. Short-term memory covers situations in which
information is retained for
intervals ranging from a fraction of a second to a few minutes.
Long-term memory
covers situations in which information is retained over longer
periods (e.g., hours,
days, years, even lifetimes).
It is tempting to think of short- and long-term memory as
separate memory stores.
Indeed, some theories of memory have considered them
separate. After all, in a digi-
tal computer, the short-term memory stores (central processing
unit [CPU] data reg-
isters) are separate from the long-term memory stores (random
access memory
[RAM], hard disk, flash memory, CD-ROM, etc.). More direct
evidence comes from
findings that damage to certain parts of the human brain results
in short-term mem-
ory deficits but not long-term ones, or vice versa. Finally, the
speed with which infor-
mation or plans can disappear from our immediate awareness
contrasts sharply with
the seeming permanence of our memory of important events in
our lives, faces of
significant people, activities we have practiced, and information
we have studied.
These phenomena led many researchers to theorize that short-
term memory is a
separate store in the brain where information is held temporarily
after entering
through our perceptual senses (e.g., visual or auditory), or after
being retrieved from
long-term memory (see Fig. 7.1).
A MODERN VIEW OF MEMORY
Recent research on memory and brain function indicates that
short- and long-term
memory are functions of a single memory system—one that is
more closely linked
with perception than previously thought (Jonides et al., 2008).
Long-term memory
Perceptions enter through the visual, auditory, olfactory,
gustatory, or tactile sensory
systems and trigger responses starting in areas of the brain
dedicated to each sense
(e.g., visual cortex, auditory cortex), then spread into other
areas of the brain that are
not specific to any particular sensory modality. The sensory
modality–specific areas
of the brain detect only simple features of the data, such as a
dark–light edge, diago-
nal line, high-pitched tone, sour taste, red color, or rightward
motion. Downstream
areas of the brain combine low-level features to detect higher-
level features of the
input, such as animal, the word “duck,” Uncle Kevin, minor
key, threat, or fairness.
As described in Chapter 1, the set of neurons activated by a
perceived stimulus
depends on both the features and context of the stimulus. The
context is as impor-
tant as the features of the stimulus in determining what neural
patterns are acti-
vated. For example, a dog barking near you when you are
walking in your
neighborhood activates a different pattern of neural activity in
your brain than the
same sound heard when you are safely inside your car. The
more similar two percep-
tual stimuli are—that is, the more features and contextual
elements they share—the
more overlap there is between the sets of neurons that fire in
response to them.
The initial strength of a perception depends on how much it is
amplified or damp-
ened by other brain activity. All perceptions create some kind of
trace, but some are
so weak that they can be considered as not registered: the
pattern was activated
once but never again.
Memory formation consists of changes in the neurons involved
in a neural activ-
ity pattern, which make the pattern easier to reactivate in the
future.1 Some such
changes result from chemicals released near neural endings that
boost or inhibit
their sensitivity to stimulation. These changes last only until the
chemicals dissipate
1 There is evidence that the long-term neural changes associated
with learning occur mainly during sleep, sug-
gesting that separating learning sessions by periods of sleep
may facilitate learning (Stafford and Webb, 2005).
FIGURE 7.1
Traditional (antiquated) view of short-term versus long-term
memory.
or are neutralized by other chemicals. More permanent changes
occur when neu-
rons grow and branch, forming new connections with others.
Activating a memory consists of reactivating the same pattern
of neural activity
that occurred when the memory was formed. Somehow the brain
distinguishes ini-
tial activations of neural patterns from reactivations—perhaps
by measuring the rela-
tive ease with which the pattern was reactivated. New
perceptions very similar to
the original ones reactivate the same patterns of neurons,
resulting in recognition if
the reactivated perception reaches awareness. In the absence of
a similar percep-
tion, stimulation from activity in other parts of the brain can
also reactivate a pattern
of neural activity, which if it reaches awareness results in
recall.
The more often a neural memory pattern is reactivated, the
stronger it becomes—
that is, the easier it is to reactivate—which in turn means that
the perception it cor-
responds to is easier to recognize and recall. Neural memory
patterns can also be
strengthened or weakened by excitatory or inhibitory signals
from other parts of the
brain.
A particular memory is not located in any specific spot in the
brain. The neural
activity pattern comprising a memory involves a network of
millions of neurons
extending over a wide area. Activity patterns for different
memories overlap,
depending on which features they share. Removing, damaging,
or inhibiting neu-
rons in a particular part of the brain typically does not
completely wipe out mem-
ories that involve those neurons, but rather just reduces the
detail or accuracy of
the memory by deleting features.2 However, some areas in a
neural activity pat-
tern may be critical pathways, so that removing, damaging, or
inhibiting them
may prevent most of the pattern from activating, thereby
effectively eliminating
the corresponding memory.
For example, researchers have long known that the
hippocampus, twin seahorse-
shaped neural clusters near the base of the brain, plays an
important role in storing
long-term memories. The modern view is that the hippocampus
is a controlling
mechanism that directs neural rewiring so as to “burn”
memories into the brain’s
wiring. The amygdala, two jellybean-shaped clusters on the
frontal tips of the hip-
pocampus, has a similar role, but it specializes in storing
memories of emotionally
intense, threatening situations (Eagleman, 2012).
Cognitive psychologists view human long-term memory as
consisting of several
distinct functions:
• Semantic long-term memory stores facts and relationships.
• Episodic long-term memory records past events.
• Procedural long-term memory remembers action sequences.
These distinctions, while important and interesting, are beyond
the scope of this
book.
2 This is similar to the effect of cutting pieces out of a
holographic image: it reduces the overall resolution of
the image, rather than removing areas of it, as with an ordinary
photograph.
Short-term memory
The processes just discussed are about long-term memory. What
about short-term
memory? What psychologists call short-term memory is actually
a combination of
phenomena involving perception, attention, and retrieval from
long-term memory.
One component of short-term memory is perceptual. Each of our
perceptual
senses has its own very brief short-term “memory” that is the
result of residual neural
activity after a perceptual stimulus ceases, like a bell that rings
briefly after it is
struck. Until they fade away, these residual perceptions are
available as possible
input to our brain’s attention and memory-storage mechanisms,
which integrate
input from our various perceptual systems, focus our awareness
on some of that
input, and store some of it in long-term memory. These sensory-
specific residual
perceptions together comprise a minor component of short-term
memory. Here, we
are only interested in them as potential inputs to working
memory.
Also available as potential input to working memory are long-
term memories
reactivated through recognition or recall. As explained earlier,
each long-term mem-
ory corresponds to a specific pattern of neural activity
distributed across our brain.
While activated, a memory pattern is a candidate for our
attention and therefore
potential input for working memory.
The human brain has multiple attention mechanisms, some
voluntary and some
involuntary. They focus our awareness on a very small subset of
the perceptions and
activated long-term memories while ignoring everything else.
That tiny subset of all
the available information from our perceptual systems and our
long-term memories
that we are aware of right now is the main component of our
short-term memory,
the part that cognitive scientists often call working memory. It
integrates informa-
tion from all of our sensory modalities and our long-term
memory. Henceforth, we
will restrict our discussion of short-term memory to working
memory.
So what is working memory? First, here is what it is not: it is
not a store—it is not
a place in the brain where memories and perceptions go to be
worked on. And it is
nothing like accumulators or fast random-access memory in
digital computers.
Instead, working memory is our combined focus of attention:
everything that we
are conscious of at a given time. More precisely, it is a few
perceptions and long-term
memories that are activated enough that we remain aware of
them over a short
period. Psychologists also view working memory as including
an executive func-
tion—based mainly in the frontal cerebral cortex—that
manipulates items we are
attending to and, if needed, refreshes their activation so they
remain in our aware-
ness (Baddeley, 2012).
A useful—if oversimplified—analogy for memory is a huge,
dark, musty ware-
house. The warehouse is full of long-term memories, piled
haphazardly (not stacked
neatly), intermingled and tangled, and mostly covered with dust
and cobwebs. Doors
along the walls represent our perceptual senses: sight, hearing,
smell, taste, touch.
They open briefly to let perceptions in. As perceptions enter,
they are briefly illumi-
nated by light coming in from outside, but they quickly are
pushed (by more enter-
ing perceptions) into the dark tangled piles of old memories.
In the ceiling of the warehouse are a small fixed number of
searchlights, con-
trolled by the attention mechanism’s executive function
(Baddeley, 2012). They
swing around and focus on items in the memory piles,
illuminating them for a while
until they swing away to focus elsewhere. Sometimes one or
two searchlights focus
on new items after they enter through the doors. When a
searchlight moves to focus
on something new, whatever it had been focusing on is plunged
into darkness.
The small fixed number of searchlights represents the limited
capacity of work-
ing memory. What is illuminated by them (and briefly through
the open doors) rep-
resents the contents of working memory: out of the vast
warehouse’s entire contents,
the few items we are attending to at any moment. See Figure 7.2
for a visual.
The warehouse analogy is too simple and should not be taken
too seriously. As
Chapter 1 explained, our senses are not just passive doorways
into our brains,
through which our environment “pushes” perceptions. Rather,
our brain actively
and continually seeks out important events and features in our
environment and
“pulls” perceptions in as needed (Ware, 2008). Furthermore, the
brain is buzzing
with activity most of the time and its internal activity is only
modulated—not deter-
mined—by sensory input (Eagleman, 2012). Also, as described
earlier, memories are
embodied as networks of neurons distributed around the brain,
not as objects in a
specific location. Finally, activating a memory in the brain can
activate related ones;
our warehouse-with-searchlights analogy doesn’t represent that.
Nonetheless, the analogy—especially the part about the
searchlights—illustrates
that working memory is a combination of several foci of
attention—the currently
FIGURE 7.2
Modern view of memory: a dark warehouse full of stuff (long-
term memory), with searchlights
focused on a few items (short-term memory).
activated neural patterns of which we are aware—and that the
capacity of working
memory is extremely limited, and the content at any given
moment is very volatile.
What about the earlier finding that damage to some parts of the
brain causes
short-term memory deficits, while other types of brain damage
cause long-term
memory deficits? The current interpretation is that some types
of damage decrease
or eliminate the brain’s ability to focus attention on specific
objects and events,
while other types of damage harm the brain’s ability to store or
retrieve long-term
memories.
CHARACTERISTICS OF ATTENTION AND WORKING MEMORY
As noted, working memory is equal to the focus of our
attention. Whatever is in that
focus is what we are conscious of at any moment. But what
determines what we
attend to and how much we can attend to at any given time?
Attention is highly focused and selective
Most of what is going on around you at this moment you are unaware of. Your perceptual system and brain sample very selectively from your surroundings, because they don’t have the capacity to process everything.
Right now you are conscious of the last few words and ideas you’ve read, but probably not the color of the wall in front of you. But now that I’ve shifted your attention, you are conscious of the wall’s color, and may have forgotten some of the ideas you read on the previous page.
Chapter 1 described how our perception is filtered and biased by our goals. For example, if you are looking for your friend in a crowded shopping mall, your visual system “primes” itself to notice people who look like your friend (including how he or she is dressed), and barely notice everything else. Simultaneously, your auditory system primes itself to notice voices that sound like your friend’s voice, and even footsteps that sound like those of your friend. Human-shaped blobs in your peripheral vision and sounds localized by your auditory system that match your friend snap your eyes and head toward them. While you look, anyone looking or sounding similar to your friend attracts your attention, and you won’t notice other people or events that would normally have interested you.
Besides focusing on objects and events related to our current
goals, our attention
is drawn to:
• Movement, especially movement near or toward us. For example, something jumps at you while you walk on a street, or something swings toward your head in a haunted house ride at an amusement park, or a car in an adjacent lane suddenly swerves toward your lane (see the discussion of the flinch reflex in Chapter 14).
• Threats. Anything that signals or portends danger to us or people in our care.
• Faces of other people. We are primed from birth to notice faces more than other objects in our environment.
• Sex and food. Even if we are happily married and well fed, these things attract our attention. Even the mere words probably quickly got your attention.
These things, along with our current goals, draw our attention involuntarily. We don’t become aware of something in our environment and then orient ourselves toward it. It’s the other way around: our perceptual system detects something attention-worthy and orients us toward it preconsciously, and only afterwards do we become aware of it.3
Capacity of attention (a.k.a. working memory)
The primary characteristics of working memory are its low
capacity and volatility.
But what is the capacity? In terms of the warehouse analogy
presented earlier, what
is the small fixed number of searchlights?
Many college-educated people have read about “the magical
number seven, plus or
minus two,” proposed by cognitive psychologist George Miller
in 1956 as the limit on
the number of simultaneous unrelated items in human working
memory (Miller, 1956).
Miller’s characterization of the working memory limit naturally
raises several
questions:
• What are the items in working memory? They are current perceptions and retrieved memories. They are goals, numbers, words, names, sounds, images, odors—anything one can be aware of. In the brain, they are patterns of neural activity.
• Why must items be unrelated? Because if two items are related, they correspond to one big neural activity pattern—one set of features—and hence one item, not two.
• Why the fudge-factor of plus or minus two? Because researchers cannot measure with perfect accuracy how much people can keep track of, and because of differences between individuals in working memory capacity.
Later research in the 1960s and 1970s found Miller’s estimate to be too high. In the experiments Miller considered, some of the items presented to people to remember could be “chunked” (i.e., considered related), making it appear that people’s working memory was holding more items than it actually was. Furthermore, all the subjects in Miller’s experiments were college students. Working memory capacity varies in the general population. When the experiments were revised to disallow unintended chunking and include noncollege students as subjects, the average capacity of working memory was shown to be more like four plus or minus one—that is, three to five items (Broadbent, 1975; Mastin, 2010). Thus, in our warehouse analogy, there would be only four searchlights.
3 Exactly how long afterwards is discussed in Chapter 14.
More recent research has cast doubt on the idea that the
capacity of working
memory should be measured in whole items or “chunks.” It
turns out that in early
working memory experiments, people were asked to briefly
remember items (e.g.,
words or images) that were quite different from each other—
that is, they had very
few features in common. In such a situation, people don’t have
to remember every
feature of an item to recall it a few seconds later; remembering
some of its features
is enough. So people appeared to recall items as a whole, and
therefore working
memory capacity seemed measurable in whole items.
Recent experiments have given people items to remember that are similar—that is, they share many features. In that situation, to recall an item and not confuse it with other items, people must remember more of its features. In these experiments, researchers found that people remember more details (i.e., features) of some items than of others, and the items they remember in greater detail are the ones they paid more attention to (Bays and Husain, 2008). This suggests that the unit of attention—and therefore the capacity of working memory—is best measured in item features rather than whole items or “chunks” (Cowan et al., 2004). This jibes with the modern view of the brain as a feature-recognition device, but it is controversial among memory researchers: some argue that the basic capacity of human working memory is three to five whole items, and that this capacity is reduced when people attend to many details (i.e., features) of those items (Alvarez and Cavanagh, 2004).
Bottom line: The true capacity of human working memory is
still a research topic.
The second important characteristic of working memory is its volatility. Cognitive psychologists used to say that new items arriving in working memory often bump old ones out, but that way of describing the volatility is based on the view of working memory as a temporary storage place for information. The modern view of working memory as the current focus of attention makes the volatility even clearer: focusing attention on new information turns attention away from some of what it was focusing on. That is why the searchlight analogy is useful.
However we describe it, information can easily be lost from working memory. If items in working memory don’t get combined or rehearsed, they are at risk of having the focus shifted away from them. This volatility applies to goals as well as to the details of objects. Losing items from working memory corresponds to forgetting or losing track of something you were doing. We have all had such experiences, for example:
• Going to another room for something, but once there we can’t remember why we came.
• Taking a phone call, and afterward not remembering what we were doing before the call.
• Something yanks our attention away from a conversation, and then we can’t remember what we were talking about.
• In the middle of adding a long list of numbers, something distracts us, so we have to start over.
WORKING MEMORY TEST
To test your working memory, get a pen or pencil and two blank sheets of paper and follow these instructions:
1. Place one blank sheet of paper after this page in the book and use it to cover the next page.
2. Flip to the next page for three seconds, pull the paper cover down and read the black numbers at the top, and flip back to this page. Don’t peek at other numbers on that page unless you want to ruin the test.
3. Say your phone number backward, out loud.
4. Now write down the black numbers from memory. … Did you get all of them?
5. Flip back to the next page for three seconds, read the red numbers (under the black ones), and flip back.
6. Write down the numbers from memory. These would be easier to recall than the first ones if you noticed that they are the first seven digits of π (3.141592), because then they would be only one number, not seven.
7. Flip back to the next page for three seconds, read the green numbers, and flip back.
8. Write down the numbers from memory. If you noticed that they are odd numbers from 1 to 13, they would be easier to recall, because they would be three chunks (“odd, 1, 13” or “odd, seven from 1”), not seven.
9. Flip back to the next page for three seconds, read the orange words, and flip back.
10. Write down the words from memory. … Could you recall them all?
11. Flip back to the next page for three seconds, read the blue words, and flip back.
12. Write down the words from memory. … It was certainly a lot easier to recall them all because they form a sentence, so they could be memorized as one sentence rather than seven words.

The test stimuli (printed on the page following the test in the book):
Black: 3 8 4 7 5 3 9
Red: 3 1 4 1 5 9 2
Green: 1 3 5 7 9 11 13
Orange: town river corn string car shovel
Blue: what is the meaning of life
IMPLICATIONS OF WORKING MEMORY CHARACTERISTICS FOR USER-INTERFACE DESIGN
The capacity and volatility of working memory have many implications for the design of interactive computer systems. The basic implication is that user interfaces should help people remember essential information from one moment to the next. Don’t require people to remember system status or what they have done, because their attention is focused on their primary goal and progress toward it. Specific examples follow.
Modes
The limited capacity and volatility of working memory are one reason why user-interface design guidelines often say either to avoid designs that have modes or to provide adequate mode feedback. In a moded user interface, some user actions have different effects depending on what mode the system is in. For example:
• In a car, pressing the accelerator pedal can move the car either forwards, backwards, or not at all, depending on whether the transmission is in drive, reverse, or neutral. The transmission sets a mode in the car’s user interface.
• In many digital cameras, pressing the shutter button can either snap a photo or start a video recording, depending on which mode is selected.
• In a drawing program, clicking and dragging normally selects one or more graphic objects on the drawing, but when the software is in “draw rectangle” mode, clicking and dragging adds a rectangle to the drawing and stretches it to the desired size.
Moded user interfaces have advantages; that is why many
interactive systems
have them. Modes allow a device to have more functions than
controls: the same
control provides different functions in different modes. Modes
allow an interactive
system to assign different meanings to the same gestures to
reduce the number of
gestures users must learn.
However, one well-known disadvantage of modes is that people
often make mode
errors: they forget what mode the system is in and do the wrong
thing by mistake
(Johnson, 1990). This is especially true in systems that give
poor feedback about what
the current mode is. Because of the problem of mode errors,
many user-interface design
guidelines say to either avoid modes or provide strong feedback
about which mode the
system is in. Human working memory is too unreliable for
designers to assume that
users can, without clear, continuous feedback, keep track of
what mode the system is
in, even when the users are the ones changing the system from
one mode to another.
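To make the guideline concrete, here is a minimal TypeScript sketch of mode feedback. It is not from the book; the Mode type and the setMode and renderModeIndicator names are invented for illustration. The point is that the mode is changed in exactly one place, which also refreshes an always-visible indicator, so users read the mode from the screen rather than holding it in working memory.

// Minimal sketch of a moded drawing tool with continuous mode feedback.
// All names here are illustrative assumptions, not an API from the book.
type Mode = "select" | "drawRectangle";

let currentMode: Mode = "select";

// Single entry point for mode changes, so the on-screen indicator
// can never drift out of sync with the actual mode.
function setMode(next: Mode): void {
  currentMode = next;
  renderModeIndicator(currentMode);
}

// Continuously visible feedback: in a real UI this would update a
// status bar or change the cursor, not just log to the console.
function renderModeIndicator(mode: Mode): void {
  console.log(mode === "select" ? "Mode: Select" : "Mode: Draw rectangle");
}

// The same gesture (click-drag) has different effects per mode.
function onClickDrag(): void {
  if (currentMode === "select") {
    console.log("Selecting objects");
  } else {
    console.log("Drawing a rectangle");
  }
}

setMode("drawRectangle"); // indicator immediately shows the new mode
onClickDrag();            // "Drawing a rectangle"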
Search results
When people use a search function on a computer to find information, they enter the search terms, start the search, and then review the results. Evaluating the results often requires knowing what the search terms were. If working memory were less limited, people would always remember, when browsing the results, what they had entered as search terms just a few seconds earlier. But as we have seen, working memory is very limited. When the results appear, a person’s attention naturally turns away from what he or she entered and toward the results. Therefore, it should be no surprise that people viewing search results often do not remember the search terms they just typed.
Unfortunately, some designers of online search functions don’t understand that. Search results sometimes don’t show the search terms that generated the results. For example, in 2007, the search results page at Slate.com provided search fields so users could search again, but didn’t show what a user had searched for (see Fig. 7.3A). A recent version of the site shows the user’s search terms (see Fig. 7.3B), reducing the burden on users’ working memory.
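A sketch of the fix follows; it is illustrative only (the SearchResult shape and renderResults name are assumptions, not Slate.com’s code). The results page simply echoes the query back at the top, so users never have to recall what they typed.

// Sketch: echo the user's search terms on the results page so that
// working memory does not have to retain them. Illustrative names.
interface SearchResult {
  title: string;
  url: string;
}

function renderResults(query: string, results: SearchResult[]): string {
  const lines: string[] = [];
  // The echoed query is the crucial line for working memory relief.
  lines.push(`Results for "${query}" (${results.length} found):`);
  for (const r of results) {
    lines.push(`- ${r.title} (${r.url})`);
  }
  return lines.join("\n");
}

console.log(renderResults("working memory", [
  { title: "Working memory", url: "https://example.com/wm" },
]));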
Calls to action
A well-known “netiquette” guideline for writing email
messages, especially messages
that require responses or ask the recipients to do something, is
to restrict each message
to one topic. If a message contains multiple topics or requests,
its recipients may focus
on one of them (usually the first one), get engrossed in
responding to that, and forget to
respond to the rest of the email. The guideline to put different
topics or requests into
separate emails is a direct result of the limited capacity of
human attention.
Web designers are familiar with a similar guideline: avoid putting competing calls to action on a page. Each page should have only one dominant call to action—or one for each possible user goal—so as not to overwhelm users’ attention capacity and send them down paths that don’t achieve their (or the site owner’s) goals.
FIGURE 7.3
Slate.com search results: (A) in 2007, users’ search terms were not shown, but (B) in 2013, search terms are shown.
A related guideline: Once users have specified their goal, don’t
distract them from
accomplishing it by displaying extraneous links and calls to
action. Instead, guide
them to the goal by using a design pattern called the process
funnel (van Duyne
et al., 2002; see also Johnson, 2007).
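As a sketch of how a process funnel might be enforced in code (an illustration under assumed step names, not an implementation from the cited books), global navigation and other competing calls to action are suppressed until the user’s chosen flow completes:

// Sketch of a process funnel: once a user commits to a goal, hide
// extraneous links until the flow ends. Step names are invented.
type FunnelStep = "cart" | "shipping" | "payment" | "confirmation";

function showGlobalNavigation(step: FunnelStep): boolean {
  // Inside the funnel only the current task (and a way back) is shown;
  // unrelated links and promotions are suppressed until the goal is met.
  return step === "confirmation";
}

console.log(showGlobalNavigation("payment"));      // false: keep the user focused
console.log(showGlobalNavigation("confirmation")); // true: normal navigation returns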
Instructions
If you asked a friend for a recipe or for directions to her home, and she gave you a long sequence of steps, you probably would not try to remember it all. You would know that you could not reliably keep all of the instructions in your working memory, so you would write them down or ask your friend to send them to you by email. Later, while following the instructions, you would put them where you could refer to them until you reached the goal.
Similarly, interactive systems that display instructions for multistep operations should allow people to refer to the instructions while executing them until completing all the steps. Most interactive systems do this (see Fig. 7.4), but some do not (see Fig. 7.5).
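A minimal sketch of the good behavior, with invented step text (not the Windows Help implementation): the whole instruction list stays on screen while the user works, and the current step is merely highlighted.

// Sketch: keep multistep instructions visible during execution.
// Step wording and function names are illustrative assumptions.
const steps: string[] = [
  "Open the network settings panel",
  "Choose your wireless network",
  "Enter the network password",
];

function renderSteps(current: number): string {
  // Every step remains displayed; users refer to the list instead of
  // holding the remaining steps in working memory.
  return steps
    .map((step, i) => `${i === current ? ">" : " "} ${i + 1}. ${step}`)
    .join("\n");
}

console.log(renderSteps(1));
//   1. Open the network settings panel
// > 2. Choose your wireless network
//   3. Enter the network password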
Navigation depth
Using a software product, digital device, phone menu system, or Web site often involves navigating to the user’s desired information or goal. It is well established that navigation hierarchies that are broad and shallow are easier for most people—especially those who are nontechnical—to find their way around in than narrow, deep hierarchies (Cooper, 1999). This applies to hierarchies of application windows and dialog boxes, as well as to menu hierarchies (Johnson, 2007).
FIGURE 7.4
Instructions in Windows Help files remain displayed while users
follow them.
A related guideline: In hierarchies deeper than two levels,
provide navigation
“breadcrumb” paths to constantly remind users where they are
(Nielsen, 1999; van
Duyne et al., 2002).
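A breadcrumb trail is simple to generate from the user’s position in the hierarchy. The sketch below is illustrative (the Crumb type and the markup are invented), not code from the cited sources:

// Sketch: render a breadcrumb trail so deep hierarchies constantly
// remind users where they are. Names and markup are assumptions.
interface Crumb {
  label: string;
  href: string;
}

function renderBreadcrumbs(trail: Crumb[]): string {
  // Ancestors are links; the current page is plain text.
  return trail
    .map((c, i) => (i < trail.length - 1 ? `<a href="${c.href}">${c.label}</a>` : c.label))
    .join(" › ");
}

console.log(renderBreadcrumbs([
  { label: "Home", href: "/" },
  { label: "Products", href: "/products" },
  { label: "Cameras", href: "/products/cameras" },
]));
// <a href="/">Home</a> › <a href="/products">Products</a> › Cameras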
These guidelines, like the others mentioned earlier, are based on the limited capacity of human working memory. Requiring users to drill down through eight levels of dialog boxes, web pages, menus, or tables—especially with no visible reminders of their location—will probably exceed their working memory capacity, causing them to forget where they came from or what their overall goals were.
CHARACTERISTICS OF LONG-TERM MEMORY
Long-term memory differs from working memory in many
respects. Unlike working
memory, it actually is a memory store.
However, specific memories are not stored in any one neuron or location in the brain. As described earlier, memories, like perceptions, consist of patterns of activation of large sets of neurons. Related memories correspond to overlapping patterns of activated neurons. This means that every memory is stored in a distributed fashion, spread among many parts of the brain. In this way, long-term memory in the brain is similar to holographic light images.
Long-term memory evolved to serve our ancestors and us very well in getting around in our world. However, it has many weaknesses: it is error-prone, impressionistic, free-associative, idiosyncratic, retroactively alterable, and easily biased by a variety of factors at the time of recording or retrieval. Let’s examine some of these weaknesses.
Error-prone
Nearly everything we’ve ever experienced is stored in our long-term memory. Unlike working memory, the capacity of human long-term memory seems almost unlimited. Adult human brains each contain about 86 billion neurons (Herculano-Houzel, 2009). As described earlier, individual neurons do not store memories; memories are encoded by networks of neurons acting together. Even if only some of the brain’s neurons are involved in memory, the large number of neurons allows for a great many different combinations of them, each capable of representing a different memory. Still, no one has yet measured or even estimated the maximum information capacity of the human brain.4 Whatever the capacity is, it’s a lot.

4 The closest researchers have come is Landauer’s (1986) use of the average human learning rate to calculate the amount of information a person can learn in a lifetime: 10⁹ bits, or a few hundred megabytes.

FIGURE 7.5
Instructions for Windows XP wireless setup start by telling users to close the instructions.
However, what is in long-term memory is not an accurate, high-resolution recording of our experiences. In terms familiar to computer engineers, one could characterize long-term memory as using heavy compression methods that drop a great deal of information. Images, concepts, events, sensations, actions—all are reduced to combinations of abstract features. Different memories are stored at different levels of detail—that is, with more or fewer features.
For example, the face of a man you met briefly who is not
important to you might
be stored simply as an average Caucasian male face with a
beard, with no other
details—a whole face reduced to three features. If you were
asked later to describe
the man in his absence, the most you could honestly say was
that he was a “white
guy with a beard.” You would not be able to pick him out of a
police lineup of other
Caucasian men with beards. In contrast, your memory of your
best friend’s face
includes many more features, allowing you to give a more
detailed description and
pick your friend out of any police lineup. Nonetheless, it is still
a set of features, not
anything like a bitmap image.
As another example, I have a vivid childhood memory of being
run over by a plow
and badly cut, but my father says it happened to my brother.
One of us is wrong.
In the realm of human–computer interaction, a Microsoft Word user may remember that there is a command to insert a page number, but may not remember which menu the command is in. That specific feature may not have been recorded when the user learned how to insert page numbers. Alternatively, perhaps the menu-location feature was recorded, but just does not reactivate with the rest of the memory pattern when the user tries to recall how to insert a page number.
Weighted by emotions
Chapter 1 described a dog that remembered seeing a cat in his
front yard every time
he returned home in the family car. The dog was excited when
he first saw the cat,
so his memory of it was strong and vivid.
A comparable human example: an adult could easily have strong memories of her first day at nursery school, but probably not of her tenth. On the first day, she was probably upset about being left at the school by her parents, whereas by the tenth day, being left there was nothing unusual.
Retroactively alterable
Suppose that while you are on an ocean cruise with your family, you see a whale shark. Years later, when you and your family are discussing the trip, you might remember seeing a whale, and one of your relatives might recall seeing a shark. For both of you, some details in long-term memory were dropped because they did not fit a common concept.
A true example comes from 1983, when the late President Ronald Reagan was speaking with Jewish leaders during his first term as president. He spoke about being in Europe during World War II and helping to liberate Jews from the Nazi concentration camps. The trouble was, he was never in Europe during World War II. When he was an actor, he was in a movie about World War II, made entirely in Hollywood. That important detail was missing from his memory.
IMPLICATIONS OF LONG-TERM MEMORY CHARACTERISTICS FOR USER-INTERFACE DESIGN
The main thing that the characteristics of long-term memory
imply is that people
need tools to augment it. Since prehistoric times, people have
invented technologies
to help them remember things over long periods: notched sticks,
knotted ropes,
mnemonics, verbal stories and histories retold around
campfires, writing, scrolls,
books, number systems, shopping lists, checklists, phone
directories, datebooks,
accounting ledgers, oven timers, computers, portable digital
assistants (PDAs),
online shared calendars, etc.
Given that humankind has a need for technologies that augment memory, it seems clear that software designers should try to provide software that fulfills that need.
A LONG-TERM MEMORY TEST
Test your long-term memory by answering the following questions:
1. Was there a roll of tape in the toolbox in Chapter 1?
2. What was your previous phone number?
3. Which of these words were not in the list presented in the working memory test earlier in this chapter: city, stream, corn, auto, twine, spade?
4. What was your first-grade teacher’s name? Second grade? Third grade? …
5. What Web site was presented earlier that does not show search terms when it displays search results?
Regarding question 3: When words are memorized, often what is retained is the concept, rather than the exact word that was presented. For example, one could hear the word “town” and later recall it as “city.”
At the very least, designers should avoid developing systems that burden long-term memory. Yet that is exactly what many interactive systems do.
Authentication is one functional area in which many software systems place burdensome demands on users’ long-term memory. For example, a web application developed a few years ago told users to change their personal identification number (PIN) “to a number that is easy to remember,” but then imposed restrictions that made it impossible to do so (see Fig. 7.6). Whoever wrote those instructions seems to have realized that the PIN requirements were unreasonable, because the instructions end by advising users to write down their PIN! Never mind that writing a PIN down creates a security risk and adds yet another memory task: users must remember where they hid their written-down PIN.
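The underlying design lesson is that PIN rules should block only genuinely weak choices rather than forbidding every memorable one. The validator below is a hypothetical policy sketched for illustration; it is not the policy shown in Figure 7.6.

// Sketch: reject only trivially guessable PINs, leaving room for
// choices users can actually remember. The policy is an assumption.
function isReasonablePin(pin: string): boolean {
  if (!/^\d{4,6}$/.test(pin)) return false;     // must be 4-6 digits
  if (/^(\d)\1+$/.test(pin)) return false;      // all one digit, e.g. "1111"
  if ("0123456789".includes(pin)) return false; // ascending run, e.g. "2345"
  if ("9876543210".includes(pin)) return false; // descending run, e.g. "8765"
  return true;
}

console.log(isReasonablePin("2580")); // true: memorable keypad column
console.log(isReasonablePin("1234")); // false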
Another example of burdening people’s long-term memory for the sake of security comes from Intuit.com. To purchase software, visitors must register. The site requires users to select a security question from a menu (see Fig. 7.7). What if you can’t answer any of the questions? What if you don’t recall your first pet’s name, your high school mascot, or any of the answers to the other questions?
But that isn’t where the memory burden ends. Some questions
could have sev-
eral possible answers. Many people had several elementary
schools, childhood
friends, or heroes. To register, they must choose a question and
then remember
which answer they gave to Intuit.com. How? Probably by
writing it down some-
where. Then, when Intuit.com asks them the security question,
they have to
remember where they put the answer. Why burden people’s
memory, when it
would be easy to let users make up a security question for
which they can easily
recall the one possible answer?
Such unreasonable demands on people’s long-term memory counteract the security and productivity that computer-based applications supposedly provide (Schrage, 2005), as users:
• Place sticky notes on or near computers or “hide” them in desk drawers.
• Contact customer support to recover passwords they cannot recall.
• Use passwords that are easy for others to guess.
• Set up systems with no login requirements at all, or with one shared login and password.

FIGURE 7.6
Instructions tell users to create an easy-to-remember PIN, but the restrictions make that impossible.
The registration form at NetworkSolutions.com represents a small step toward more usable security. Like Intuit.com, it offers a choice of security questions, but it also allows users to create their own security question—one for which they can more easily remember the answer (see Fig. 7.8).
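A registration form can support this with a few lines of logic. The sketch below is illustrative (the field and function names are invented, not NetworkSolutions.com’s code): the user either picks a stock question or supplies a custom one, and the custom one wins.

// Sketch: accept either a stock security question or a user-written
// one, which the user can more easily answer later. Names invented.
interface SecurityQA {
  question: string;
  answer: string;
}

const STOCK_QUESTIONS: string[] = [
  "What was your first pet's name?",
  "What was your high school mascot?",
];

function buildSecurityQA(
  customQuestion: string | null,
  stockIndex: number,
  answer: string
): SecurityQA {
  // A self-authored question has one answer the user reliably recalls.
  const question = customQuestion ?? STOCK_QUESTIONS[stockIndex];
  if (!question || answer.trim() === "") {
    throw new Error("A question and a non-empty answer are required.");
  }
  return { question, answer };
}

const qa = buildSecurityQA("What street did my grandmother live on?", 0, "Maple");
console.log(qa.question);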
Another implication of long-term memory characteristics for
interactive systems
is that learning and long-term retention are enhanced by user-
interface consistency.
FIGURE 7.7
Intuit.com’s registration burdens long-term memory: users may
have no unique, memorable
answer for any of the questions.
FIGURE 7.8
NetworkSolutions.com allows users to create their own security question.
Designing with the  Mind in MindSimple Guide to Unde.docx

More Related Content

PDF
Distributed User Interfaces Usability And Collaboration 1st Edition Pedro G V...
PDF
Universal Ux Design Building Multicultural User Experience Alberto Ferreira
PDF
Designing for the User Experience in Learning Systems Evangelos Kapros
PDF
Engineering a Compiler 2nd Edition Keith Cooper
PDF
Designing for the User Experience in Learning Systems Evangelos Kapros all ch...
PDF
Projects As Sociotechnical Systems In Engineering Education Meyer
PDF
Human computer interaction 3rd Edition Alan Dix
PDF
Leonardo s Laptop Human Needs and the New Computing Technologies 1st Edition ...
Distributed User Interfaces Usability And Collaboration 1st Edition Pedro G V...
Universal Ux Design Building Multicultural User Experience Alberto Ferreira
Designing for the User Experience in Learning Systems Evangelos Kapros
Engineering a Compiler 2nd Edition Keith Cooper
Designing for the User Experience in Learning Systems Evangelos Kapros all ch...
Projects As Sociotechnical Systems In Engineering Education Meyer
Human computer interaction 3rd Edition Alan Dix
Leonardo s Laptop Human Needs and the New Computing Technologies 1st Edition ...

Similar to Designing with the Mind in MindSimple Guide to Unde.docx (20)

PDF
Humantech Ethical And Scientific Foundations 1st Edition Kim Vicente
PDF
Designing Around People CWUAAT 2016 1st Edition Pat Langdon
PDF
Leonardo s Laptop Human Needs and the New Computing Technologies 1st Edition ...
PDF
Human computer interaction 3rd Edition Alan Dix
PDF
Project
PDF
Human factors and web development 2nd Edition Julie Ratner (Editor)
PDF
Cultural Differences in Human Computer Interaction 1st Edition Rüdiger Heimgä...
PDF
Internet Computing Principles Of Distributed Systems And Emerging Internetbas...
PDF
Ethical Assessments Of Emerging Technologies Appraising The Moral Plausibilit...
PDF
Thriving Systems Theory And Metaphordriven Modeling 1st Edition Leslie J Wagu...
PDF
Human factors and web development 2nd Edition Julie Ratner (Editor)
PDF
Cultural Differences In Humancomputer Interaction 1st Edition Rdiger Heimgrtner
PDF
Ecoop 2014 Objectoriented Programming 28th European Conference Uppsala Sweden...
PDF
Research methods in human-computer interaction 2 ed Edition Feng - eBook PDF
PDF
Human factors and web development 2nd Edition Julie Ratner (Editor)
PDF
Co-creating Digital Public Services for an Ageing Society: Evidence for User-...
PDF
Openaccess Multimodality And Writing Center Studies 1st Edition Elisabeth H B...
PDF
Kumar_Akshat
PDF
Thesis Shaw
PDF
Humancomputer Interaction 3rd Edition Alan Dix
Humantech Ethical And Scientific Foundations 1st Edition Kim Vicente
Designing Around People CWUAAT 2016 1st Edition Pat Langdon
Leonardo s Laptop Human Needs and the New Computing Technologies 1st Edition ...
Human computer interaction 3rd Edition Alan Dix
Project
Human factors and web development 2nd Edition Julie Ratner (Editor)
Cultural Differences in Human Computer Interaction 1st Edition Rüdiger Heimgä...
Internet Computing Principles Of Distributed Systems And Emerging Internetbas...
Ethical Assessments Of Emerging Technologies Appraising The Moral Plausibilit...
Thriving Systems Theory And Metaphordriven Modeling 1st Edition Leslie J Wagu...
Human factors and web development 2nd Edition Julie Ratner (Editor)
Cultural Differences In Humancomputer Interaction 1st Edition Rdiger Heimgrtner
Ecoop 2014 Objectoriented Programming 28th European Conference Uppsala Sweden...
Research methods in human-computer interaction 2 ed Edition Feng - eBook PDF
Human factors and web development 2nd Edition Julie Ratner (Editor)
Co-creating Digital Public Services for an Ageing Society: Evidence for User-...
Openaccess Multimodality And Writing Center Studies 1st Edition Elisabeth H B...
Kumar_Akshat
Thesis Shaw
Humancomputer Interaction 3rd Edition Alan Dix
Ad

More from simonithomas47935 (20)

DOCX
Hours, A. (2014). Reading Fairy Tales and Playing A Way of Treati.docx
DOCX
How are authentication and authorization alike and how are the.docx
DOCX
How are self-esteem and self-concept different What is the or.docx
DOCX
How are morality and religion similar and how are they different.docx
DOCX
How are financial statements used to evaluate business activities.docx
DOCX
How are Japanese and Chinese Americans similar How are they differe.docx
DOCX
Hot Spot PolicingPlace can be an important aspect of crime and.docx
DOCX
HOSP3075 Brand Analysis Paper 1This is the first of three assignme.docx
DOCX
Hou, J., Li, Y., Yu, J. & Shi, W. (2020). A Survey on Digital Fo.docx
DOCX
How (Not) to be Secular by James K.A. SmithSecular (1)—the ea.docx
DOCX
Hopefully, you enjoyed this class on Digital Media and Society.Q.docx
DOCX
hoose (1) one childhood experience from the list provided below..docx
DOCX
honesty, hard work, caring, excellence HIS 1110 Dr. .docx
DOCX
hoose one of the four following visualsImage courtesy o.docx
DOCX
HomeworkChoose a site used by the public such as a supermark.docx
DOCX
Homework 2 Please answer the following questions in small paragraph.docx
DOCX
HomeNotificationsMy CommunityBBA 2010-16J-5A21-S1, Introductio.docx
DOCX
HomeAnnouncementsSyllabusDiscussionsQuizzesGra.docx
DOCX
Homeless The Motel Kids of Orange CountyWrite a 1-2 page pa.docx
DOCX
Home work 8 Date 042220201. what are the different between.docx
Hours, A. (2014). Reading Fairy Tales and Playing A Way of Treati.docx
How are authentication and authorization alike and how are the.docx
How are self-esteem and self-concept different What is the or.docx
How are morality and religion similar and how are they different.docx
How are financial statements used to evaluate business activities.docx
How are Japanese and Chinese Americans similar How are they differe.docx
Hot Spot PolicingPlace can be an important aspect of crime and.docx
HOSP3075 Brand Analysis Paper 1This is the first of three assignme.docx
Hou, J., Li, Y., Yu, J. & Shi, W. (2020). A Survey on Digital Fo.docx
How (Not) to be Secular by James K.A. SmithSecular (1)—the ea.docx
Hopefully, you enjoyed this class on Digital Media and Society.Q.docx
hoose (1) one childhood experience from the list provided below..docx
honesty, hard work, caring, excellence HIS 1110 Dr. .docx
hoose one of the four following visualsImage courtesy o.docx
HomeworkChoose a site used by the public such as a supermark.docx
Homework 2 Please answer the following questions in small paragraph.docx
HomeNotificationsMy CommunityBBA 2010-16J-5A21-S1, Introductio.docx
HomeAnnouncementsSyllabusDiscussionsQuizzesGra.docx
Homeless The Motel Kids of Orange CountyWrite a 1-2 page pa.docx
Home work 8 Date 042220201. what are the different between.docx
Ad

Recently uploaded (20)

PDF
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
PPTX
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
PDF
Chinmaya Tiranga quiz Grand Finale.pdf
PDF
What if we spent less time fighting change, and more time building what’s rig...
PDF
Complications of Minimal Access Surgery at WLH
PPTX
History, Philosophy and sociology of education (1).pptx
PDF
Empowerment Technology for Senior High School Guide
PPTX
Digestion and Absorption of Carbohydrates, Proteina and Fats
PDF
Practical Manual AGRO-233 Principles and Practices of Natural Farming
PPTX
Orientation - ARALprogram of Deped to the Parents.pptx
DOC
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
PDF
advance database management system book.pdf
PPTX
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
PPTX
Unit 4 Skeletal System.ppt.pptxopresentatiom
PDF
1_English_Language_Set_2.pdf probationary
PPTX
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
PPTX
A powerpoint presentation on the Revised K-10 Science Shaping Paper
PDF
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
PDF
LDMMIA Reiki Yoga Finals Review Spring Summer
PDF
Trump Administration's workforce development strategy
A GUIDE TO GENETICS FOR UNDERGRADUATE MEDICAL STUDENTS
CHAPTER IV. MAN AND BIOSPHERE AND ITS TOTALITY.pptx
Chinmaya Tiranga quiz Grand Finale.pdf
What if we spent less time fighting change, and more time building what’s rig...
Complications of Minimal Access Surgery at WLH
History, Philosophy and sociology of education (1).pptx
Empowerment Technology for Senior High School Guide
Digestion and Absorption of Carbohydrates, Proteina and Fats
Practical Manual AGRO-233 Principles and Practices of Natural Farming
Orientation - ARALprogram of Deped to the Parents.pptx
Soft-furnishing-By-Architect-A.F.M.Mohiuddin-Akhand.doc
advance database management system book.pdf
Onco Emergencies - Spinal cord compression Superior vena cava syndrome Febr...
Unit 4 Skeletal System.ppt.pptxopresentatiom
1_English_Language_Set_2.pdf probationary
UV-Visible spectroscopy..pptx UV-Visible Spectroscopy – Electronic Transition...
A powerpoint presentation on the Revised K-10 Science Shaping Paper
medical_surgical_nursing_10th_edition_ignatavicius_TEST_BANK_pdf.pdf
LDMMIA Reiki Yoga Finals Review Spring Summer
Trump Administration's workforce development strategy

Designing with the Mind in MindSimple Guide to Unde.docx

  • 1. Designing with the Mind in Mind Simple Guide to Understanding User Interface Design Guidelines Second Edition This page intentionally left blank Designing with the Mind in Mind Simple Guide to Understanding User Interface Design Guidelines Second Edition Jeff Johnson AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Morgan Kaufmann is an imprint of Elsevier
  • 2. Acquiring Editor: Meg Dunkerley Editorial Project Manager: Heather Scherer Project Manager: Priya Kumaraguruparan Designer: Matthew Limbert Morgan Kaufmann is an imprint of Elsevier 225 Wyman Street, Waltham, MA, 02451, USA Copyright © 2014, 2010 Elsevier Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions. This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein). Notices Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in
  • 3. evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability,negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein. Library of Congress Cataloging-in-Publication Data Application submitted British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library ISBN: 978-0-12-407914-4 Printed in China 14 15 16 17 10 9 8 7 6 5 4 3 2 1 For information on all Morgan Kaufmann publications, visit our Web site at www.mkp.com http://www.elsevier.com/permissions http://www.mkp.com v Contents
  • 4. Acknowledgments ............................................................................................... .......vii Foreword ............................................................................................... ..................... ix Introduction ............................................................................................... .............. xiii CHAPTER 1 Our Perception is Biased .............................................. 1 CHAPTER 2 Our Vision is Optimized to See Structure ...................... 13 CHAPTER 3 We Seek and Use Visual Structure .............................. 29 CHAPTER 4 Our Color Vision is Limited .......................................... 37 CHAPTER 5 Our Peripheral Vision is Poor ...................................... 49 CHAPTER 6 Reading is Unnatural .................................................. 67 CHAPTER 7 Our Attention is Limited; Our Memory is Imperfect ........ 87 CHAPTER 8 Limits on Attention Shape Our Thought and Action ...... 107 CHAPTER 9 Recognition is Easy; Recall is Hard ........................... 121 CHAPTER 10 Learning from Experience and Performing Learned Actions are Easy; Novel Actions, Problem Solving, and Calculation are Hard .......................................... 131 CHAPTER 11 Many Factors Affect Learning .................................... 149
  • 5. CHAPTER 12 Human Decision Making is Rarely Rational ................ 169 CHAPTER 13 Our Hand–Eye Coordination Follows Laws .................. 187 CHAPTER 14 We Have Time Requirements ..................................... 195 Epilogue ............................................................................................... ................... 217 Appendix ............................................................................................... .................. 219 Bibliography ............................................................................................... ............. 223 Index ............................................................................................... ........................ 229 This page intentionally left blank vii Acknowledgments I could not have written this book without a lot of help and the support of many people. First are the students of the human–computer interaction course
  • 6. I taught as an Erskine Fellow at the University of Canterbury in New Zealand in 2006. It was for them that I developed a lecture providing a brief background in perceptual and cognitive psychology—just enough to enable them to understand and apply user-inter- face design guidelines. That lecture expanded into a professional development course, then into the first edition of this book. My need to prepare more comprehensive psy- chological background for an upper-level course in human– computer interaction that I taught at the University of Canterbury in 2013 provided motivation for expanding the topics covered and improving the explanations in this second edition. Second, I thank my colleagues at the University of Canterbury who provided ideas, feedback on my ideas, and illustrations for the second edition’s new chapter on Fitts’ law: Professor Andy Cockburn, Dr. Sylvain Malacria, and Mathieu Nancel. I also thank my colleague and friend Professor Tim Bell for sharing user-interface exam- ples and for other help while I was at the university working on the second edition. Third, I thank the reviewers of the first edition—Susan Fowler, Robin Jeffries, Tim McCoy, and Jon Meads—and of the second edition—Susan Fowler, Robin Jef- fries, and James Hartman. They made many helpful comments and suggestions that allowed me to greatly improve the book.
  • 7. Fourth, I am grateful to four cognitive science researchers who directed me to important references, shared useful illustrations with me, or allowed me to bounce ideas off of them: • Professor Edward Adelson, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology. • Professor Dan Osherson, Department of Psychology, Princeton University. • Dr. Dan Bullock, Department of Cognitive and Neural Systems, Boston University. • Dr. Amy L. Milton, Department of Psychology and Downing College, University of Cambridge. The book also was helped greatly by the care, oversight, logistical support, and nurturing provided by the staff at Elsevier, especially Meg Dunkerley, Heather Scherer, Lindsay Lawrence, and Priya Kumaraguruparan. Last but not least, I thank my wife and friend Karen Ande for her love and support while I was researching and writing this book. This page intentionally left blank
  • 8. ix Foreword It is gratifying to see this book go into a second edition because of the endorsement that implies for maturing the field of human–computer interaction beyond pure empirical methods. Human–computer interaction (HCI) as a topic is basically simple. There is a per- son of some sort who wants to do some task like write an essay or pilot an airplane. What makes the activity HCI is inserting a mediating computer. In principle, our person could have done the task without the computer. She could have used a quill pen and ink, for example, or flown an airplane that uses hydraulic tubes to work the controls. These are not quite HCI. They do use intermediary tools or machines, and the process of their design and the facts of their use bear resemblance to those of HCI. In fact, they fit into HCI’s uncle discipline of human factors. But it is the com- puter, and the process of contingent interaction the computer renders possible, that makes HCI distinctive. The computer can transform a task’s representation and needed skills. It can change the linear writing process into something more like
  • 9. sculpturing, the writer roughing out the whole, then adding or subtracting bits to refine the text. It can change the piloting process into a kind of supervision, letting the computer with inputs of speed, altitude, and location and outputs of throttle, flap, and rudder, do the actual flying. And if instead of one person we have a small group or a mass crowd, or if instead of a single computer we have a network of communicating mobile or embedded computers, or if instead of a simple task we have impinging cultural or coordination considerations, then we get the many variants of computer mediation that form the broad spectrum of HCI. The components of a discipline of HCI would also seem simple. There is an arti- fact that must be engineered and implemented. There is the process of design for the interaction itself and the objects, virtual or physical, with which to interact. Then there are all the principles, abstractions, theories, facts, and phenomena surround- ing HCI to know about. Let’s call the first interaction engineering (e.g., using Harel statecharts to guide implementation), the second, interaction design (e.g., the design of the workflow for a smartphone to record diet), and the third, perhaps a little overly grandly, interaction science (e.g., the use of Fitts’ law to design button sizes in an application). The hard bit for HCI is that fitting these three together is not easy. Beside work in HCI itself, each has its own literature not
  • 10. friendly to outsiders. The present book was written to bridge the gap between the relevant science that has been built up from the psychological literature and HCI design problems where the science could be of use. Actually, the importance of linking engineering, design, and science together in HCI goes deeper. HCI is a technology. As Brian Arthur in his book The Nature of Forewordx Technology tells us, technologies largely derive from other technologies, not sci- ence. The flat panel displays now common are a substitute for CRT devices of yore, and these go back to modified radar screens on the Whirlwind computer. Further- more, technologies are composed of parts that are themselves technologies. A laptop computer has a display for output and a key and a touchpad for input and several storage systems, and so on, each with its own technologies. But eventually all these technologies ground out in some phenomenon of nature that is not a technology, and here is a place where science plays a role. Some keyboard input devices use the natural phenomenon of electrical capacitance to sense keystrokes. Pressing a key brings two D-shaped pads close to a printed circuit board that is covered by an insu-
  • 11. lating film, thereby changing the pattern of capacitance. That is to say, this keyboard harnesses the natural phenomenon of capacitance in a reliable way that can be exploited to provide the HCI function of signaling an intended interaction to the computer. Many natural phenomena are easy to understand and exploit by simple observa- tion or modest tinkering. No science needed. But some, like capacitance, are much less obvious, and then you really need science to understand them. In some cases, the HCI system that is built generates its own phenomena, and you need science to understand the unexpected, emergent properties of seemingly obvious things. Peo- ple sometimes believe that because they can intuitively understand the easy cases (e.g., with usability testing), they can understand all the cases. But this is not neces- sarily true. The natural phenomena to be exploited in HCI range from abstractions of computer science, such as the notion of the working set, to psychological theories of human cognition, perception, and movement, such as the nature of vision. Psy- chology, the area addressed by this book, is an area with an especially messy and at times contradictory literature, but it is also especially rich in phenomena that can be exploited for HCI technology. I think it is underappreciated how important it is for the future development of
  • 12. HCI as a discipline that the field develops a supporting science base as illustrated by the current book for the field of psychology. It also involves HCI growing some of its own science bits. Why is this important? There are at least three reasons. First, having some sort of theory enables explanatory evaluation. The use of A-B testing is limited if you don’t know why there was a difference. On the other hand, if you have a theory that lets you interpret the difference, then you can fix it. You will never understand the prob- lems of why a windows-based user interface can take excessive time to use by doing usability testing, for example, if you don’t have the theoretical concept of the win- dow working set. Second, it enables generative design. It allows a shift in represen- tation of the design space. Once it is realized that a very important property of pointing devices is the bandwidth of the human motor group to which a transducer is going to be applied, then the problem gets reformulated to terms of how to con- nect those muscles and the consequence for the rest of the design. Third, it supports the codification of knowledge. Only by having theories and abstractions can we concisely cumulate our results and develop a field with sufficient power and depth. xiForeword
  • 13. Why isn’t there wider use of science or theory in HCI? There are obvious reasons, like the fact that it isn’t easy to get the relevant science linkages or results in the first place, that it’s hard to make the connection with science in almost any engineering field, and that often the connection is made, but invisibly packaged, in a way that nonspecialists never need to see it. The poet tosses capacitance with his finger, but only knows he writes a poem. He thinks he writes with love, because someone understood electricity. But, mainly, I think there isn’t wider use of science or theory in HCI because it is difficult to put that knowledge into a form that is easily useful at the time of design need. Jeff Johnson in this book is careful to connect theory with design choice, and to do it in a practical way. He has accumulated grounded design rules that reach across the component parts of HCI, making it easier for designers as they design to keep them in mind. Stuart K. Card This page intentionally left blank
  • 14. xiii Introduction USER-INTERFACE DESIGN RULES: WHERE DO THEY COME FROM AND HOW CAN THEY BE USED EFFECTIVELY? For as long as people have been designing interactive computer systems, some have attempted to promote good design by publishing user-interface design guidelines (also called design rules). Early ones included: • Cheriton (1976) proposed user-interface design guidelines for early interactive (time-shared) computer systems. • Norman (1983a, 1983b) presented design rules for software user interfaces based on human cognition, including cognitive errors. • Smith and Mosier (1986) wrote perhaps the most comprehensive set of user- interface design guidelines. • Shneiderman (1987) included “Eight Golden Rules of Interface Design” in the first edition of his book Designing the User Interface and in all later editions. • Brown (1988) wrote a book of design guidelines, appropriately titled Human– Computer Interface Design Guidelines. • Nielsen and Molich (1990) offered a set of design rules for use in heuristic
  • 15. evaluation of user interfaces, and Nielsen and Mack (1994) updated them. • Marcus (1992) presented guidelines for graphic design in online documents and user interfaces. In the twenty-first century, additional user-interface design guidelines have been offered by Stone et al. (2005); Koyani et al. (2006); Johnson (2007); and Shneiderman and Plaisant (2009). Microsoft, Apple Computer, and Oracle publish guidelines for designing software for their platforms (Apple Computer, 2009; Microsoft Corporation, 2009; Oracle Corporation/Sun Microsystems, 2001). How valuable are user-interface design guidelines? That depends on who applies them to design problems. USER-INTERFACE DESIGN AND EVALUATION REQUIRES UNDERSTANDING AND EXPERIENCE Following user-interface design guidelines is not as straightforward as following cooking recipes. Design rules often describe goals rather than actions. They are purposefully very general to make them broadly applicable, but that means that Introductionxiv their exact meaning and applicability to specific design situations is open to
  • 16. interpretation. Complicating matters further, more than one rule will often seem applicable to a given design situation. In such cases, the applicable design rules often conflict— that is, they suggest different designs. This requires designers to determine which competing design rule is more applicable to the given situation and should take precedence. Design problems, even without competing design guidelines, often have multiple conflicting goals. For example: • Bright screen and long battery life • Lightweight and sturdy • Multifunctional and easy to learn • Powerful and simple • High resolution and fast loading • WYSIWYG (what you see is what you get) and usable by blind people Satisfying all the design goals for a computer-based product or service usually requires tradeoffs—lots and lots of tradeoffs. Finding the right balance point between competing design rules requires further tradeoffs. Given all of these complications, user-interface design rules and
  • 17. guidelines must be applied thoughtfully, not mindlessly, by people who are skilled in the art of user- interface design and/or evaluation. User-interface design rules and guidelines are more like laws than like rote recipes. Just as a set of laws is best applied and inter- preted by lawyers and judges who are well versed in the laws, a set of user-interface design guidelines is best applied and interpreted by people who understand the basis for the guidelines and have learned from experience in applying them. Unfortunately, with a few exceptions (e.g., Norman, 1983a), user-interface design guidelines are provided as simple lists of design edicts with little or no rationale or background. Furthermore, although many early members of the user-interface design and usability profession had backgrounds in cognitive psychology, most newcomers to the field do not. That makes it difficult for them to apply user- interface design guide- lines sensibly. Providing that rationale and background education is the focus of this book. COMPARING USER-INTERFACE DESIGN GUIDELINES Table I.1 places the two best-known user-interface guideline lists side by side to show the types of rules they contain and how they compare to each other (see the Appendix for additional guidelines lists). For example, both
  • 18. lists start with a rule call- ing for consistency in design. Both lists include a rule about preventing errors. The xvIntroduction Nielsen–Molich rule to “help users recognize, diagnose, and recover from errors” corresponds closely to the Shneiderman–Plaisant rule to “permit easy reversal of actions.” “User control and freedom” corresponds to “make users feel they are in control.” There is a reason for this similarity, and it isn’t just that later authors were influenced by earlier ones. WHERE DO DESIGN GUIDELINES COME FROM? For present purposes, the detailed design rules in each set of guidelines, such as those in Table I.1, are less important than what they have in common: their basis and origin. Where did these design rules come from? Were their authors—like clothing fashion designers—simply trying to impose their own personal design tastes on the computer and software industries? If that were so, the different sets of design rules would be very different from each other, as the various authors sought to differentiate themselves from the others. In fact, all of these sets of user-interface design guidelines are quite similar if we ignore differences in wording, emphasis, and the state of
computer technology when each set was written. Why?

The answer is that all of the design rules are based on human psychology: how people perceive, learn, reason, remember, and convert intentions into action. Many authors of design guidelines had at least some background in psychology that they applied to computer system design. For example, Don Norman was a professor, researcher, and prolific author in the field of cognitive psychology long before he began writing about human–computer interaction. Norman’s early human–computer design guidelines were based on research—his own and others’—on human cognition. He was especially interested in cognitive errors that people often make and how computer systems can be designed to lessen or eliminate the impact of those errors.

Table I.1 Two Best-Known Lists of User-Interface Design Guidelines

Shneiderman (1987); Shneiderman and Plaisant (2009):
• Strive for consistency
• Cater to universal usability
• Offer informative feedback
• Design task flows to yield closure
• Prevent errors
• Permit easy reversal of actions
• Make users feel they are in control
• Minimize short-term memory load

Nielsen and Molich (1990):
• Consistency and standards
• Visibility of system status
• Match between system and real world
• User control and freedom
• Error prevention
• Recognition rather than recall
• Flexibility and efficiency of use
• Aesthetic and minimalist design
• Help users recognize, diagnose, and recover from errors
• Provide online documentation and help

Similarly, other authors of user-interface design guidelines—for example, Brown, Shneiderman, Nielsen, and Molich—used knowledge of perceptual and cognitive psychology to try to improve the design of usable and useful interactive systems. Bottom line: User-interface design guidelines are based on human psychology. By reading this book, you will learn the most important aspects of the psychology underlying user-interface and usability design guidelines.

INTENDED AUDIENCE OF THIS BOOK

This book is intended mainly for software design and development professionals
who have to apply user-interface and interaction design guidelines. This includes interaction designers, user-interface designers, user-experience designers, graphic designers, and hardware product designers. It also includes usability testers and evaluators, who often refer to design heuristics when reviewing software or analyzing observed usage problems. A second intended audience is students of interaction design and human–computer interaction. A third intended audience is software development managers who want enough of a background in the psychological basis of user-interface design rules to understand and evaluate the work of the people they manage.

CHAPTER 1 Our Perception is Biased

Our perception of the world around us is not a true depiction of what is actually there. Our perceptions are heavily biased by at least three factors:
• The past: our experience
• The present: the current context
• The future: our goals

PERCEPTION BIASED BY EXPERIENCE

Experience—your past perceptions—can bias your current perception in several different ways.

Perceptual priming

Imagine that you own a large insurance company. You are meeting with a real estate manager, discussing plans for a new campus of company buildings. The campus consists of a row of five buildings, the last two with T-shaped courtyards providing light for the cafeteria and fitness center. If the real estate manager showed you the map in Figure 1.1, you would see five black shapes representing the buildings.

Now imagine that instead of a real estate manager, you are meeting with an advertising manager. You are discussing a new billboard ad to be placed in certain markets around the country. The advertising manager shows you the same image, but in this scenario the image is a sketch of the ad, consisting of a single word: LIFE. In this scenario, you see a word, clearly and unambiguously.

When your perceptual system has been primed to see building shapes, you see building shapes, and the white areas between the buildings barely register in your perception. When your perceptual system has been primed to
  • 23. see text, you see text, and the black areas between the letters barely register. 1 CHAPTER 1 Our Perception is Biased2 A relatively famous example of how priming the mind can affect perception is an image, supposedly by R. C. James,1 that initially looks to most people like a random splattering of paint (see Fig. 1.2) similar to the work of the painter Jackson Pollack. Before reading further, look at the image. Only after you are told that it is a Dalmatian dog sniffing the ground near a tree can your visual system organize the image into a coherent picture. Moreover, once you’ve seen the dog, it is hard to go back to seeing just a random collection of spots. 1 Published in Lindsay and Norman (1972), Figure 3-17, p. 146. FIGURE 1.1 Building map or word? What you see depends on what you were told to see. FIGURE 1.2 Image showing the effect of mental priming of the visual system. What do you see?
These priming examples are visual, but priming can also bias other types of perception, such as sentence comprehension. For example, the headline “New Vaccine Contains Rabies” would probably be understood differently by people who had recently heard stories about contaminated vaccines than by people who had recently heard stories about successful uses of vaccines to fight diseases.

Familiar perceptual patterns or frames

Much of our lives are spent in familiar situations: the rooms in our homes, our yards, our routes to and from school or work, our offices, neighborhood parks, stores, restaurants, etc. Repeated exposure to each type of situation builds a pattern in our minds of what to expect to see there. These perceptual patterns, which some researchers call frames, include the objects or events that are usually encountered in that situation.

For example, you know most rooms in your home well enough that you need not constantly scrutinize every detail. You know how they are laid out and where most objects are located. You can probably navigate much of your home in total darkness. But your experience with homes is broader than your specific home. In addition to having a pattern for your home, your brain has one for homes in general. It biases your perception of all homes, familiar and new. In a kitchen, you expect to see a stove and a sink. In a bathroom, you expect to see a toilet, a sink, and a shower or a bathtub (or both).

Mental frames for situations bias our perception to see the objects and events expected in each situation. They are a mental shortcut: by eliminating the need for us to constantly scrutinize every detail of our environment, they help us get around in our world. However, mental frames also make us see things that aren’t really there. For example, if you visit a house in which there is no stove in the kitchen, you might nonetheless later recall seeing one, because your mental frame for kitchens has a strong stove component. Similarly, part of the frame for eating at a restaurant is paying the bill, so you might recall paying for your dinner even if you absentmindedly walked out without paying. Your brain also has frames for back yards, schools, city streets, business offices, supermarkets, dentist visits, taxis, air travel, and other familiar situations.

Anyone who uses computers, websites, or smartphones has frames for the desktop and files, web browsers, websites, and various types of applications and online services. For example, when they visit a new Web site, experienced Web users
expect to see a site name and logo, a navigation bar, some other links, and maybe a search box. When they book a flight online, they expect to specify trip details, examine search results, make a choice, and make a purchase. Because of the perceptual frames users of computer software and websites have, they often click buttons or links without looking carefully at them. Their perception of the display is based more on what their frame for the situation leads them to expect than on what is actually on the screen. This sometimes confounds software designers, who expect users to see what is on the screen—but that isn’t how human vision works.

For example, if the positions of the “Next” and “Back” buttons on the last page of a multistep dialog box2 switched, many people would not immediately notice the switch (see Fig. 1.3). Their visual system would have been lulled into inattention by the consistent placement of the buttons on the prior several pages. Even after unintentionally going backward a few times, they might continue to perceive the buttons in their standard locations. This is why consistent placement of controls is a common user-interface guideline, to ensure that reality matches the user’s frame for the situation.

2 Multistep dialog boxes are called wizards in user-interface designer jargon.

FIGURE 1.3 The “Next” button is perceived to be in a consistent location, even when it isn’t.

Similarly, if we are trying to find something but it is in a different place or looks different from usual, we might miss it even though it is in plain view, because our mental frames tune us to look for expected features in expected locations. For example, if the “Submit” button on one form in a Web site is shaped differently or is a different color from those on other forms on the site, users might not find it. This expectation-induced blindness is discussed more later in this chapter in the “Perception Biased by Goals” section.

Habituation

A third way in which experience biases perception is called habituation. Repeated exposure to the same (or highly similar) perceptions dulls our perceptual system’s sensitivity to them. Habituation is a very low-level phenomenon of our nervous
system: it occurs at a neuronal level. Even primitive animals like flatworms and amoebas, with very simple nervous systems, habituate to repeated stimuli (e.g., mild electric shocks or light flashes). People, with our complex nervous systems, habituate to a range of events, from low-level ones like a continually beeping tone, to medium-level ones like a blinking ad on a Web site, to high-level ones like a person who tells the same jokes at every party or a politician giving a long, repetitious speech.

We experience habituation in computer usage when the same error messages or “Are you sure?” confirmation messages appear again and again. People initially notice them and perhaps respond, but eventually click them closed reflexively without bothering to read them. Habituation is also a factor in a recent phenomenon variously labeled “social media burnout” (Nichols, 2013), “social media fatigue,” or “Facebook vacations” (Rainie et al., 2013): newcomers to social media sites and tweeting are initially excited by the novelty of microblogging about their experiences, but sooner or later get tired of wasting time reading tweets about every little thing that their “friends” do or see—for example, “Man! Was that ever a great salmon salad I had for lunch today.”

Attentional blink

Another low-level biasing of perception by past experience occurs just after we spot or hear something important. For a very brief period following the recognition—between 0.15 and 0.45 second—we are nearly deaf and blind to other visual stimuli, even though our ears and eyes stay functional. Researchers call this the attentional blink (Raymond et al., 1992; Stafford and Webb, 2005).3 It is thought to be caused by the brain’s perceptual and attention mechanisms being briefly fully occupied with processing the first recognition.

3 Chapter 14 discusses the attentional blink interval in the context of other perceptual intervals.

A classic example: You are in a subway car as it enters a station, planning to meet two friends at that station. As the train arrives, your car passes one of your friends, and you spot him briefly through your window. In the next split second, your window passes your other friend, but you fail to notice her because her image hit your retina during the attentional blink that resulted from your recognition of your first friend.

When people use computer-based systems and online services, attentional blink can cause them to miss information or events if things appear in rapid succession.
A popular modern technique for making documentary videos is to present a series of still photographs in rapid succession.4 This technique is highly prone to attentional blink effects: if an image really captures your attention (e.g., it has a strong meaning for you), you will probably miss one or more of the immediately following images. In contrast, a captivating image in an auto-running slideshow (e.g., on a Web site or an information kiosk) is unlikely to cause attentional blink (i.e., missing the next image) because each image typically remains displayed for several seconds.

4 For an example, search YouTube for “history of the world in two minutes.”

PERCEPTION BIASED BY CURRENT CONTEXT

When we try to understand how our visual perception works, it is tempting to think of it as a bottom-up process, combining basic features such as edges, lines, angles, curves, and patterns into figures and ultimately into meaningful objects. To take reading as an example, you might assume that our visual system first recognizes shapes as letters and then combines letters into words, words into sentences, and so on. But visual perception—reading in particular—is not strictly a bottom-up process. It includes top-down influences too. For example, the word in which a character appears may affect how we identify the character (see Fig. 1.4). Similarly, our overall comprehension of a sentence or a paragraph can even influence what words we see in it. For example, the same letter sequence can be read as different words depending on the meaning of the surrounding paragraph (see Fig. 1.5).

FIGURE 1.4 The same character is perceived as H or A depending on the surrounding letters.
FIGURE 1.5 The same phrase is perceived differently depending on the list it appears in:
Fold napkins. Polish silverware. Wash dishes.
French napkins. Polish silverware. German dishes.

Contextual biasing of vision need not involve reading. The Müller–Lyer illusion is a famous example (see Fig. 1.6): the two horizontal lines are the same length, but the outward-pointing “fins” cause our visual system to see the top line as longer than the line with inward-pointing “fins.” This and other optical illusions (see Fig. 1.7) trick us because our visual system does not use accurate, optimal methods to perceive the world. It developed through evolution, a semi-random process that layers jury-rigged—often incomplete and inaccurate—solutions on top of each other. It works fine most of the time, but it includes a lot of approximations, kludges, hacks, and outright “bugs” that cause it to fail in certain cases.

FIGURE 1.6 Müller–Lyer illusion: equal-length horizontal lines appear to have different lengths.

The examples in Figures 1.6 and 1.7 show vision being biased by visual context. However, biasing of perception by the current context works between different senses too. Perceptions in any of our five senses may affect simultaneous perceptions in any of our other senses. What we feel with our tactile sense can be biased by what we hear, see, or smell. What we see can be biased by what we hear, and what we hear can be biased by what we see. Here are two examples of visual perception affecting what we hear:

• McGurk effect. If you watch a video of someone saying “bah, bah, bah,” then “dah, dah, dah,” then “vah, vah, vah,” but the audio is “bah, bah, bah” throughout, you will hear the syllable indicated by the speaker’s lip movement rather than the syllable actually in the audio track.5 Only by closing or averting your eyes do you hear the syllable as it really is. I’ll bet you didn’t know you could read lips, and in fact do so many times a day.

5 Go to YouTube, search for “McGurk effect,” and view (and hear) some of the resulting videos.

• Ventriloquism. Ventriloquists don’t throw their voice; they just learn to talk without moving their mouths much. Viewers’ brains perceive the talking as coming from the nearest moving mouth: that of the ventriloquist’s puppet (Eagleman, 2012).

An example of the opposite—hearing biasing vision—is the illusory flash effect. When a spot is flashed once briefly on a display but is accompanied by two quick beeps, it appears to flash twice. Similarly, the perceived rate of a blinking light can be adjusted by the frequency of a repeating click (Eagleman, 2012).

Later chapters explain how visual perception, reading, and recognition function in the human brain. For now, I will simply say that the pattern of neural activity that corresponds to recognizing a letter, a word, a face, or any object includes input from
neural activity stimulated by the context. This context includes other nearby perceived objects and events, and even reactivated memories of previously perceived objects and events.

FIGURE 1.7 (A) The checkerboard does not bulge in the middle; (B) the triangle sides are not bent; and (C) the red vertical lines are parallel.

Context biases perception not only in people but also in lower animals. A friend of mine often brought her dog with her in her car when running errands. One day, as she drove into her driveway, a cat was in the front yard. The dog saw it and began barking. My friend opened the car door and the dog jumped out and ran after the cat, which turned and jumped through a bush to escape. The dog dove into the bush but missed the cat. The dog remained agitated for some time afterward. Thereafter, for as long as my friend lived in that house, whenever she arrived at home with her dog in the car, he would get excited, bark, jump out of the car as soon as the door was opened, dash across the yard, and leap into the bush. There was no cat, but that didn’t matter. Returning home in the car was enough to make the dog see one—perhaps even smell one. However, walking home, as the dog did after being taken for his daily walk, did not evoke the “cat mirage.”

PERCEPTION BIASED BY GOALS

In addition to being biased by our past experience and the present context, our perception is influenced by our goals and plans for the future. Specifically, our goals:
• Guide our perceptual apparatus, so we sample what we need from the world around us.
• Filter our perceptions: things unrelated to our goals tend to be filtered out preconsciously, never registering in our conscious minds.

For example, when people navigate through software or a Web site, seeking information or a specific function, they don’t read carefully. They scan screens quickly and superficially for items that seem related to their goal. They don’t simply ignore items unrelated to their goals; they often don’t even notice them. To see this, glance at Figure 1.8 and look for scissors, and then immediately flip back to this page. Try it now. Did you spot the scissors? Now, without looking back at the
toolbox, can you say whether there is a screwdriver in the toolbox too?

Our goals filter our perceptions in other perceptual senses as well as in vision. A familiar example is the “cocktail party” effect. If you are conversing with someone at a crowded party, you can focus your attention to hear mainly what he or she is saying even though many other people are talking near you. The more interested you are in the conversation, the more strongly your brain filters out surrounding chatter. If you are bored by what your conversational partner is saying, you will probably hear much more of the conversations around you.

The effect was first documented in studies of air-traffic controllers, who were able to carry on a conversation with the pilots of their assigned aircraft even though many different conversations were occurring simultaneously on the same radio frequency, coming out of the same speaker in the control room (Arons, 1992). Research suggests that our ability to focus on one conversation among several simultaneous ones depends not only on our interest level in the conversation, but also on objective factors, such as the similarity of voices in the cacophony, the amount of general “noise” (e.g., clattering dishes or loud music), and the predictability of what your conversational partner is saying (Arons, 1992).

This filtering of perception by our goals is particularly true for adults, who tend to be more focused on goals than children are. Children are more stimulus-driven: their perception is less filtered by their goals. This characteristic makes them more distractible than adults, but it also makes them less biased as observers. A parlor game demonstrates this age difference in perceptual filtering. It is similar to the Figure 1.8 exercise. Most households have a catch-all drawer for kitchen implements or tools. From your living room, send a visitor to the room where the catch-all drawer is, with instructions to fetch you a specific tool, such as measuring spoons or a pipe wrench. When the person returns with the tool, ask whether another specific tool was in the drawer. Most adults will not know what else was in the drawer. Children—if they can complete the task without being distracted by all the cool stuff in the drawer—will often be able to tell you more about what else was there.

Perceptual filtering can also be seen in how people navigate websites. Suppose I put you on the homepage of New Zealand’s University of Canterbury (see Fig. 1.9) and asked you to find information about financial support for postgraduate students
  • 39. too. You constantly move your eyes, ears, hands, feet, body, and attention so as to sample exactly the things in your environment that are most relevant to what you are doing or about to do (Ware, 2008). If you are looking on a Web site for a campus map, your eyes and pointer-controlling hand are attracted to anything that might lead you to that goal. You more or less ignore anything unrelated to your goal. l Sensitizing our perceptual system to certain features. When you are look- ing for something, your brain can prime your perception to be especially sensi- tive to features of what you are looking for (Ware, 2008). For example, when you are looking for a red car in a large parking lot, red cars will seem to pop out as you scan the lot, and cars of other colors will barely register in your con- sciousness, even though you do in some sense see them. Similarly, when you are FIGURE 1.9 University of Canterbury Web site: navigating sites requires perceptual filtering. CHAPTER 1 Our Perception is Biased12 trying to find your spouse in a dark, crowded room, your brain “programs” your
  • 40. auditory system to be especially sensitive to the combination of frequencies that make up his or her voice. TAKING BIASED PERCEPTION INTO ACCOUNT WHEN DESIGNING All these sources of perceptual bias of course have implications for user-interface design. Here are three. Avoid ambiguity Avoid ambiguous information displays, and test your design to verify that all users interpret the display in the same way. Where ambiguity is unavoidable, either rely on standards or conventions to resolve it, or prime users to resolve the ambiguity in the intended way. For example, computer displays often shade buttons and text fields to make them look raised in relation to the background surface (see Fig. 1.10). This appearance relies on a convention, familiar to most experienced computer users, that the light source is at the top left of the screen. If an object were depicted as lit by a light source in a different location, users would not see the object as raised. Be consistent Place information and controls in consistent locations. Controls and data displays that serve the same function on different pages should be placed in the same position on each page on which they appear. They should also have the
  • 41. same color, text fonts, shading, and so on. This consistency allows users to spot and recognize them quickly. Understand the goals Users come to a system with goals they want to achieve. Designers should under- stand those goals. Realize that users’ goals may vary, and that their goals strongly influence what they perceive. Ensure that at every point in an interaction, the infor- mation users need is available, prominent, and maps clearly to a possible user goal, so users will notice and use the information. Search FIGURE 1.10 Buttons on computer screens are often shaded to make them look three dimensional, but the convention works only if the light source is assumed to be on the top left. Designing with the Mind in Mind. http://dx.doi.org/10.1016/B978-0-12-407914-4.00002-6 © 2014 Elsevier Inc. All rights reserved. CHAPTER 13 Our Vision is Optimized to See Structure 2
Early in the twentieth century, a group of German psychologists sought to explain how human visual perception works. They observed and catalogued many important visual phenomena. One of their basic findings was that human vision is holistic: our visual system automatically imposes structure on visual input and is wired to perceive whole shapes, figures, and objects rather than disconnected edges, lines, and areas. The German word for “shape” or “figure” is Gestalt, so these theories became known as the Gestalt principles of visual perception.

Today’s perceptual and cognitive psychologists regard the Gestalt theory of perception as more of a descriptive framework than an explanatory and predictive theory. Today’s theories of visual perception tend to be based heavily on the neurophysiology of the eyes, optic nerve, and brain (see Chapters 4–7). Not surprisingly, the findings of neurophysiological researchers support the observations of the Gestalt psychologists. We really are—along with other animals—“wired” to perceive our surroundings in terms of whole objects (Stafford and Webb, 2005; Ware, 2008). Consequently, the Gestalt principles are still valid—if not as a fundamental explanation of visual perception, at least as a framework for describing it. They also provide a useful basis for guidelines for graphic design and user-interface design (Soegaard, 2007).

For present purposes, the most important Gestalt principles are Proximity, Similarity, Continuity, Closure, Symmetry, Figure/Ground, and Common Fate. The following sections describe each principle and provide examples from both static graphic design and user-interface design.

GESTALT PRINCIPLE: PROXIMITY

The Gestalt principle of Proximity is that the relative distance between objects in a display affects our perception of whether and how the objects are organized into
  • 44. In Outlook’s Distribution List Membership dialog box, list buttons are in a group box, separate from the control buttons. 15Gestalt principle: proximity The Proximity principle has obvious relevance to the layout of control panels or data forms in software, Web sites, and electronic appliances. Designers often sepa- rate groups of on-screen controls and data displays by enclosing them in group boxes or by placing separator lines between groups (see Fig. 2.2). However, according to the Proximity principle, items on a display can be visually grouped simply by spacing them closer to each other than to other controls, without group boxes or visible borders (see Fig. 2.3). Many graphic design experts recom- mend this approach to reduce visual clutter and code size in a user interface (Mullet and Sano, 1994). FIGURE 2.3 In Mozilla Thunderbird’s Subscribe Folders dialog box, controls are grouped using the Proximity principle. FIGURE 2.4 In Discreet’s Software Installer, poorly spaced radio buttons
  • 45. look grouped in vertical columns. CHAPTER 2 Our Vision is Optimized to See Structure16 Conversely, if controls are poorly spaced (e.g., if connected controls are too far apart) people will have trouble perceiving them as related, making the software harder to learn and remember. For example, the Discreet Software Installer displays six horizontal pairs of radio buttons, each representing a two- way choice, but their spacing, due to the Proximity principle, makes them appear to be two vertical sets of radio buttons, each representing a six-way choice, at least until users try them and learn how they operate (see Fig. 2.4). GESTALT PRINCIPLE: SIMILARITY Another factor that affects our perception of grouping is expressed in the Gestalt principle of Similarity, where objects that look similar appear grouped, all other things being equal. In Figure 2.5, the slightly larger, “hollow” stars are perceived as a group. The Page Setup dialog box in Mac OS applications uses the Similarity and Proxim- ity principles to convey groupings (see Fig. 2.6). The three very similar and tightly spaced Orientation settings are clearly intended to appear grouped. The three menus are not so tightly spaced but look similar enough that they
  • 46. appear related even though that probably wasn’t intended. Similarly, the text fields in a form at book publisher Elsevier’s Web site are orga- nized into an upper group of eight for the name and address, a group of three split fields for phone numbers, and two single text fields. The four menus, in addition to being data fields, help separate the text field groups (see Fig. 2.7). By contrast, the labels are too far from their fields to seem connected to them. FIGURE 2.5 Similarity: items appear grouped if they look more similar to each other than to other objects. 17Gestalt principle: similarity FIGURE 2.6 Mac OS Page Setup dialog box. The Similarity and Proximity principles are used to group the Orientation settings. FIGURE 2.7 Similarity makes the text fields appear grouped in this online form at Elsevier.com. http://Elsevier.com
  • 47. CHAPTER 2 Our Vision is Optimized to See Structure18 GESTALT PRINCIPLE: CONTINUITY In addition to the two Gestalt principles concerning our tendency to organize objects into groups, several Gestalt principles describe our visual system’s tendency to resolve ambiguity or fill in missing data in such a way as to perceive whole objects. The first such principle, the principle of Continuity, states that our visual perception is biased to perceive continuous forms rather than disconnected segments. For example, in Figure 2.8A, we automatically see two crossing lines—one blue and one orange. We don’t see two separate orange segments and two separate blue ones, and we don’t see a blue-and-orange V on top of an upside- down orange-and- blue V. In Figure 2.8B, we see a sea monster in water, not three pieces of one. A well-known example of the use of the continuity principle in graphic design is the IBM® logo. It consists of disconnected blue patches, and yet it is not at all ambig- uous; it is easily seen as three bold letters, perhaps viewed through something like venetian blinds (see Fig. 2.9). (A) (B) FIGURE 2.8 Continuity: Human vision is biased to see continuous forms,
  • 48. even adding missing data if necessary. FIGURE 2.9 The IBM company logo uses the Continuity principle to form letters from disconnected patches. 19Gestalt principle: closure Slider controls are a user-interface example of the Continuity principle. We see a slider as depicting a single range controlled by a handle that appears somewhere on the slider, not as two separate ranges separated by the handle (see Fig. 2.10A). Even displaying different colors on each side of a slider’s handle doesn’t completely “break” our perception of a slider as one continuous object, although Componen- tOne’s choice of strongly contrasting colors (gray vs. red) certainly strains that per- ception a bit (see Fig. 2.10B). GESTALT PRINCIPLE: CLOSURE Related to Continuity is the Gestalt principle of Closure, which states that our visual system automatically tries to close open figures so that they are perceived as whole objects rather than separate pieces. Thus, we perceive the disconnected arcs in Fig- ure 2.11A as a circle. Our visual system is so strongly biased to see objects that it can
  • 49. even interpret a totally blank area as an object. We see the combination of shapes in Figure 2.11B as a white triangle overlapping another triangle and three black circles, even though the figure really only contains three V shapes and three black pac-men. The Closure principle is often applied in graphical user interfaces (GUIs). For example, GUIs often represent collections of objects (e.g., documents or messages) as stacks (see Fig. 2.12). Just showing one whole object and the edges of others “behind” it is enough to make users perceive a stack of objects, all whole. (A) (B) FIGURE 2.10 Continuity: we see a slider as a single slot with a handle somewhere on it, not as two slots separated by a handle: (A) Mac OS and (B) ComponentOne. CHAPTER 2 Our Vision is Optimized to See Structure20 GESTALT PRINCIPLE: SYMMETRY A third fact about our tendency to see objects is captured in the Gestalt principle of Symmetry. It states that we tend to parse complex scenes in a way that reduces the
  • 50. complexity. The data in our visual field usually has more than one possible interpre- tation, but our vision automatically organizes and interprets the data so as to simplify it and give it symmetry. For example, we see the complex shape on the far left of Figure 2.13 as two over- lapping diamonds, not as two touching corner bricks or a pinch- waist octahedron with a square in its center. A pair of overlapping diamonds is simpler than the other two interpretations shown on the right—it has fewer sides and more symmetry than the other two interpretations. In printed graphics and on computer screens, our visual system’s reliance on the symmetry principle can be exploited to represent three- dimensional objects on a two-dimensional display. This can be seen in a cover illustration for Paul Thagard’s book Coherence in Thought and Action (Thagard, 2002; see Fig. 2.14) and in a three-dimensional depiction of a cityscape (see Fig. 2.15). FIGURE 2.12 Icons depicting stacks of objects exhibit the Closure principle: partially visible objects are perceived as whole. (A) (B) FIGURE 2.11
  • 51. Closure: Human vision is biased to see whole objects, even when they are incomplete. 21Gestalt principle: figure/ground GESTALT PRINCIPLE: FIGURE/GROUND The next Gestalt principle that describes how our visual system structures the data it receives is Figure/Ground. This principle states that our mind separates the visual field into the figure (the foreground) and ground (the background). The foreground consists of the elements of a scene that are the object of our primary attention, and the background is everything else. The Figure/Ground principle also specifies that the visual system’s parsing of scenes into figure and ground is influenced by characteristics of the scene. For exam- ple, when a small object or color patch overlaps a larger one, we tend to perceive the smaller object as the figure and the larger object as the ground (see Fig. 2.16). not= or FIGURE 2.13 Symmetry: the human visual system tries to resolve complex scenes into combinations of simple, symmetrical shapes. FIGURE 2.14
  • 52. The cover of the book Coherence in Thought and Action (Thagard, 2002) uses the symmetry, Closure, and Continuity principles to depict a cube. CHAPTER 2 Our Vision is Optimized to See Structure22 However, our perception of figure versus ground is not completely determined by scene characteristics. It also depends on the viewer’s focus of attention. Dutch artist M. C. Escher exploited this phenomenon to produce ambiguous images in which figure and ground switch roles as our attention shifts (see Fig. 2.17). In user-interface and Web design, the Figure/Ground principle is often used to place an impression-inducing background “behind” the primary displayed content FIGURE 2.16 Figure/Ground: when objects overlap, we see the smaller as the figure and the larger as the ground. FIGURE 2.15 Symmetry: the human visual system parses very complex two- dimensional images into three- dimensional scenes.
  • 53. 23Gestalt principle: figure/ground (see Fig. 2.18). The background can convey information (e.g., the user’s current loca- tion), or it can suggest a theme, brand, or mood for interpretation of the content. Figure/Ground is also often used to pop up information over other content. Con- tent that was formerly the figure—the focus of the users’ attention—temporarily becomes the background for new information, which appears briefly as the new FIGURE 2.17 M. C. Escher exploited figure/ground ambiguity in his art. FIGURE 2.18 Figure/Ground is used at AndePhotos.com to display a thematic watermark “behind” the content. http://AndePhotos.com CHAPTER 2 Our Vision is Optimized to See Structure24 figure (see Fig. 2.19). This approach is usually better than temporarily replacing the old information with the new information, because it provides context that helps keep people oriented regarding their place in the interaction. GESTALT PRINCIPLE: COMMON FATE
  • 54. The previous six Gestalt principles concerned perception of static (unmoving) figures and objects. One final Gestalt principle—Common Fate— concerns moving objects. The cCommon Fate principle is related to the Proximity and Similarity principles— like them, it affects whether we perceive objects as grouped. The Common Fate prin- ciple states that objects that move together are perceived as grouped or related. For example, in a display showing dozens of pentagons, if seven of them wiggled in synchrony, people would see them as a related group, even if the wiggling penta- gons were separated from each other and looked no different from all the other pentagons (see Fig. 2.20). Common motion—implying common fates—is used in some animations to show relationships between entities. For example, Google’s GapMinder graphs animate dots representing nations to show changes over time in various factors of economic devel- opment. Countries that move together share development histories (see Fig. 2.21). FIGURE 2.19 Figure/Ground is used at PBS.org’s mobile Web site to pop up a call-to-action “over” the page content.
  • 55. 25Gestalt principles: combined GESTALT PRINCIPLES: COMBINED Of course, in real-world visual scenes, the Gestalt principles work in concert, not in isolation. For example, a typical Mac OS desktop usually exemplifies six of the seven principles described here, excluding Common Fate): Proximity, Similarity, Continu- ity, Closure, Symmetry, and Figure/Ground (see Fig. 2.22). On a typical desktop, Common Fate is used (along with similarity) when a user selects several files or fold- ers and drags them as a group to a new location (see Fig. 2.23). FIGURE 2.20 Common Fate: items appear grouped or related if they move together. FIGURE 2.21 Common fate: GapMinder animates dots to show which nations have similar development histories (for details, animations, and videos, visit GapMinder.org). http://GapMinder.org CHAPTER 2 Our Vision is Optimized to See Structure26 FIGURE 2.22 All of the Gestalt principles except Common Fate play a role in this portion of a Mac OS desktop.
  • 56. FIGURE 2.23 Similarity and Common Fate: when users drag folders that they have selected, common highlight- ing and motion make the selected folders appear grouped. 27Gestalt principles: combined With all these Gestalt principles operating at once, unintended visual relation- ships can be implied by a design. A recommended practice, after designing a display, is to view it with each of the Gestalt principles in mind— Proximity, Similarity, Con- tinuity, Closure, Symmetry, Figure/Ground, and Common Fate—to see if the design suggests any relationships between elements that you do not intend. This page intentionally left blank Designing with the Mind in Mind. http://dx.doi.org/10.1016/B978-0-12-407914-4.00003-8 © 2014 Elsevier Inc. All rights reserved. CHAPTER 29
  • 57. We Seek and Use Visual Structure Chapter 2 used the Gestalt principles of visual perception to show how our visual system is optimized to perceive structure. Perceiving structure in our environment helps us make sense of objects and events quickly. Chapter 2 also mentioned that when people are navigating through software or Web sites, they don’t scrutinize screens carefully and read every word. They scan quickly for relevant information. This chapter presents examples to show that when information is presented in a terse, structured way, it is easier for people to scan and understand. Consider two presentations of the same information about an airline flight reser- vation. The first presentation is unstructured prose text; the second is structured text in outline form (see Fig. 3.1). The structured presentation of the reservation can be scanned and understood much more quickly than the prose presentation. The more structured and terse the presentation of information, the more quickly and easily people can scan and comprehend it. Look at the Contents page from the California Department of Motor Vehicles (see Fig. 3.2). The wordy, repetitive links slow users down and “bury” the important words they need to see.
  • 58. 3 Unstructured: You are booked on United flight 237, which departs from Auckland at 14:30 on Tuesday 15 Oct and arrives at San Francisco at 11:40 on Tuesday 15 Oct. Structured: Flight: United 237, Auckland San Francisco Depart: 14:30 Tue 15 Oct Arrive: 11:40 Tue 15 Oct FIGURE 3.1 Structured presentation of airline reservation information is easier to scan and understand. CHAPTER 3 We Seek and Use Visual Structure30 Compare that with a terser, more structured hypothetical design that factors out needless repetition and marks as links only the words that represent options (see Fig. 3.3). All options presented in the actual Contents page are available in the revision, yet it consumes less screen space and is easier to scan. Displaying search results is another situation in which
  • 59. structuring data and avoid- ing repetitive “noise” can improve people’s ability to scan quickly and find what they seek. In 2006, search results at HP.com included so much repeated navigation data and metadata for each retrieved item that they were useless. By 2009, HP had elimi- nated the repetition and structured the results, making them easier to scan and more useful (see Fig. 3.4). Of course, for information displays to be easy to scan, it is not enough merely to make them terse, structured, and nonrepetitious. They must also conform to the rules of graphic design, some of which were presented in Chapter 2. For example, a prerelease version of a mortgage calculator on a real estate Web site presented its results in a table that violated at least two important rules of graphic design (see Fig. 3.5A). First, people usually read (online or offline) from top to bottom, but the labels for calculated amounts were below their corresponding values. Second, the labels were just as close to the value below as to their own FIGURE 3.2 Contents page at the California Department of Motor Vehicles (DMV) Web site buries the important information in repetitive prose. Licenses & ID Cards: Renewals, Duplicates, Changes
  • 60. • Renew license: in person by mail by Internet • Renew: instruction permit • Apply for duplicate: license ID card • Change of: name address • Register as: organ donor FIGURE 3.3 California DMV Web site Contents page with repetition eliminated and better visual structure. http://HP.com 31CHAPTER 3 We Seek and Use Visual Structure value, so proximity (see Chapter 2) could not be used to perceive that labels were grouped with their values. To understand this mortgage results table, users had to scrutinize it carefully and slowly figure out which labels went with which numbers. (A) (B) FIGURE 3.4 In 2006, HP.com’s site search produced repetitious, “noisy” results (A), but by 2009 was improved (B). 360 0.00
  • 61. Mortgage Summary Monthly Payment $ 1,840.59 Number of Payments Total of Payments $ 662,611.22 Interest Total $ 318,861.22 Tax Total $ 93,750.00 PMI Total $ Pay off Date Sep 2037 (A) (B) FIGURE 3.5 (A) Mortgage summary presented by a software mortgage calculator; (B) an improved design. CHAPTER 3 We Seek and Use Visual Structure32 The revised design, in contrast, allows users to perceive the correspondence between labels and values without conscious thought (see Fig. 3.5B). STRUCTURE ENHANCES PEOPLE’S ABILITY TO SCAN LONG NUMBERS
  • 62. Even small amounts of information can be made easier to scan if they are structured. Two examples are telephone numbers and credit card numbers (see Fig. 3.6). Tradi- tionally, such numbers were broken into parts to make them easier to scan and remember. A long number can be broken up in two ways: either the user interface breaks it up explicitly by providing a separate field for each part of the number, or the inter- face provides a single number field but lets users break the number into parts with spaces or punctuation (see Fig. 3.7A). However, many of today’s computer presenta- tions of phone and credit card numbers do not segment the numbers and do not Easy: (415) 123 4567 Hard: 4151234567 Easy: 1234 5678 9012 3456 Hard: 1234567890123456 FIGURE 3.6 Telephone and credit card numbers are easier to scan and understand when segmented. (A) (B) FIGURE 3.7
  • 63. (A) At Democrats.org, credit card numbers can include spaces. (B) At StuffIt.com, they cannot, making them harder to scan and verify. http://Democrats.org http://StuffIt.com 33Data-specific controls provide even more structure allow users to include spaces or other punctuation (see Fig. 3.7B). This limitation makes it harder for people to scan a number or verify that they typed it correctly, and so is considered a user-interface design blooper ( Johnson, 2007). Forms presented in software and Web sites should accept credit card numbers, social security numbers, phone numbers, and so on in a variety of different formats and parse them into the internal format. Segmenting data fields can provide useful visual structure even when the data to be entered is not, strictly speaking, a number. Dates are an example of a case in which segmented fields can improve readability and help prevent data entry errors, as shown by a date field at Bank of America’s Web site (see Fig. 3.8). DATA-SPECIFIC CONTROLS PROVIDE EVEN MORE STRUCTURE A step up in structure from segmented data fields are data- specific controls. Instead of using simple text fields—whether segmented or not—
  • 64. designers can use controls that are designed specifically to display (and accept as input) a value of a specific type. For example, dates can be presented (and accepted) in the form of menus com- bined with pop-up calendar controls (see Fig. 3.9). It is also possible to provide visual structure by mixing segmented text fields with data-specific controls, as demonstrated by an email address field at Southwest Air- lines’ Web site (see Fig. 3.10). FIGURE 3.8 At BankOfAmerica.com, segmented data fields provide useful structure. FIGURE 3.10 At SWA.com email addresses are entered into fields structured to accept parts of the address. FIGURE 3.9 At NWA.com, dates are displayed and entered using a control that is specifically designed for dates. http://BankOfAmerica.com http://SWA.com http://NWA.com CHAPTER 3 We Seek and Use Visual Structure34 VISUAL HIERARCHY LETS PEOPLE FOCUS ON THE
  • 65. RELEVANT INFORMATION One of the most important goals in structuring information presentations is to pro- vide a visual hierarchy—an arrangement that: l Breaks the information into distinct sections, and breaks large sections into subsections. l Labels each section and subsection prominently and in such a way as to clearly identify its content. l Presents the sections and subsections as a hierarchy, with higher-level sections presented more strongly than lower-level ones. A visual hierarchy allows people, when scanning information, to instantly sepa- rate what is relevant to their goals from what is irrelevant, and to focus their atten- tion on the relevant information. They find what they are looking for more quickly because they can easily skip everything else. Try it for yourself. Look at the two information displays in Figure 3.11 and find the information about prominence. How much longer does it take you to find it in the nonhierarchical presentation? Create a Clear Visual Hierarchy Organize and prioritize the contents of a page by
  • 66. using size, prominence, and content relationships. Let’s look at these relationships more closely: • Size. The more important a headline is, the larger its font size should be. Big bold headlines help to grab the user’s attention as they scan the Web page. • Content Relationships. Group similar content types by displaying the content in a similar visual style, or in a clearly defined area. • Prominence. The more important the headline or content, the higher up the page it should be placed. The most important or popular content should always be positioned prominently near the top of the page, so users can view it without having to scroll too far. Create a Clear Visual Hierarchy Organize and prioritize the contents of a page by using size, prominence, and content relationships. Let’s look at these relationships more closely. The more important a headline is, the larger its font size should be. Big bold headlines help to grab the user’s attention as they scan the Web page. The more important the headline or content, the higher up the page it should be placed. The most important or popular content should always be positioned prominently near the top of the page, so users can view it without having to
  • 67. scroll too far. Group similar content types by displaying the content in a similar visual style, or in a clearly defined area. (A) (B) FIGURE 3.11 Find the advice about prominence in each of these displays. Prose text format (A) makes people read everything. Visual hierarchy (B) lets people ignore information irrelevant to their goals. 35Visual hierarchy lets people focus on the relevant information The examples in Figure 3.11 show the value of visual hierarchy in a textual, read-only information display. Visual hierarchy is equally important in interactive control panels and forms—perhaps even more so. Compare dialog boxes from two different music software products (see Fig. 3.12). The Reharmonize dialog box of Band-in-a-Box has poor visual hierarchy, making it hard for users to find things quickly. In contrast, GarageBand’s Audio/MIDI control panel has good visual hierarchy, so users can quickly find the settings they are interested in. (A)
  • 68. (B) FIGURE 3.12 Visual hierarchy in interactive control panels and forms lets users find settings quickly: (A) Band-in-a-Box (bad) and (B) GarageBand (good). CHAPTER 3 We Seek and Use Visual Structure36 Used by permission, www.OK/Cancel.com. http://www.OK/Cancel.com Designing with the Mind in Mind. http://dx.doi.org/10.1016/B978-0-12-407914-4.00004-X © 2014 Elsevier Inc. All rights reserved. CHAPTER 37 Our Color Vision is Limited Human color perception has both strengths and limitations, many of which are rel- evant to user-interface design. For example: l Our vision is optimized to detect contrasts (edges), not absolute brightness. l Our ability to distinguish colors depends on how colors are presented.
  • 69. l Some people have color-blindness. l The user’s display and viewing conditions affect color perception. To understand these qualities of human color vision, let’s start with a brief description of how the human visual system processes color information from the environment. HOW COLOR VISION WORKS If you took introductory psychology or neurophysiology in college, you probably learned that the retina at the back of the human eye—the surface onto which the eye focuses images—has two types of light receptor cells: rods and cones. You probably also learned that the rods detect light levels but not colors, while the cones detect colors. Finally, you probably learned that there are three types of cones—sensitive to red, green, and blue light—suggesting that our color vision is similar to video cam- eras and computer displays, which detect or project a wide variety of colors through combinations of red, green, and blue pixels. What you learned in college is only partly right. People with normal vision do in fact have rods and three types of cones1 in their retinas. The rods are sensitive to overall brightness while the three types of cones are sensitive to different 1 People with color-blindness may have fewer than three, and
  • 70. some women have four, cone types (Eagleman, 2012). 4 CHAPTER 4 Our Color Vision is Limited38 frequencies of light. But that is where the truth departs from what most people learned in college, until recently. First, those of us who live in industrialized societies hardly use our rods at all. They function only at low levels of light. They are for getting around in poorly lighted envi- ronments—the environments our ancestors lived in until the nineteenth century. Today, we use our rods only when we are having dinner by candlelight, feeling our way around our dark house at night, camping outside after dark, etc. (see Chapter 5). In bright daylight and modern artificially lighted environments— where we spend most of our time—our rods are completely maxed out, providing no useful information. Most of the time, our vision is based entirely on input from our cones (Ware, 2008). So how do our cones work? Are the three types of cones sensitive to red, green, and blue light, respectively? In fact, each type of cone is sensitive to a wider range of light frequencies than you might expect, and the sensitivity ranges of the three types
  • 71. overlap considerably. In addition, the overall sensitivity of the three types of cones differs greatly (see Fig. 4.1A): l Low frequency. These cones are sensitive to light over almost the entire range of visible light, but are most sensitive to the middle (yellow) and low (red) frequencies. l Medium frequency. These cones respond to light ranging from the high-fre- quency blues through the lower middle-frequency yellows and oranges. Over- all, they are less sensitive than the low-frequency cones. l High frequency. These cones are most sensitive to light at the upper end of the visible light spectrum—violets and blues—but they also respond weakly to middle frequencies, such as green. These cones are much less sensitive overall than the other two types of cones, and also less numerous. One result is that our eyes are much less sensitive to blues and violets than to other colors. Compare a graph of the light sensitivity of our retinal cone cells (Fig. 4.1A) to what the graph might look like if electrical engineers had designed our retinas as a mosaic of receptors sensitive to red, green, and blue, like a camera (Fig. 4.1B). 1.0
  • 72. 0.8 0.6 0.4 0.2 (B)(A) 400 500 600 700 Wavelength (nanometers) L M H 400 500 600 700 Wavelength (nanometers) 0.2 0.4 0.6 0.8 1.0 R e
  • 73. at ve a bs or ba nc e FIGURE 4.1 Sensitivity of the three types of retinal cones (A) versus artificial red, green, and blue receptors (B). 39Vision is optimized for contrast, not brightness Given the odd relationships among the sensitivities of our three types of retinal cone cells, one might wonder how the brain combines the signals from the cones to allow us to see a broad range of colors. The answer is by subtraction. Neurons in the visual cortex at the back of our brain subtract the signals coming over the optic nerves from the medium- and low- frequency cones, producing a red–green difference signal channel. Other neurons in the visual cortex subtract the signals from the high- and low- frequency cones, yield-
VISION IS OPTIMIZED FOR CONTRAST, NOT BRIGHTNESS

All this subtraction makes our visual system much more sensitive to differences in color and brightness—that is, to contrasting colors and edges—than to absolute brightness levels. To see this, look at the inner bar in Figure 4.2. The inner bar looks darker on the right, but in fact is one solid shade of gray. To our contrast-sensitive visual system, it looks lighter on the left and darker on the right because the outer rectangle is darker on the left and lighter on the right.

FIGURE 4.2
The inner gray bar looks darker on the right, but in fact is all one shade of gray.

The sensitivity of our visual system to contrast rather than to absolute brightness is an advantage: it helped our distant ancestors recognize a leopard in the nearby bushes as the same dangerous animal whether they saw it in bright noon sunlight or in the early morning hours of a cloudy day. Similarly, being sensitive to color contrasts rather than to absolute colors allows us to see a rose as the same red whether it is in the sun or the shade.

Brain researcher Edward H. Adelson at the Massachusetts Institute of Technology developed an outstanding illustration of our visual system's insensitivity to absolute brightness and its sensitivity to contrast (see Fig. 4.3). As difficult as it may be to believe, square A on the checkerboard is exactly the same shade as square B. Square B only appears white because it is depicted as being in the cylinder's shadow.

FIGURE 4.3
The squares marked A and B are the same gray. We see B as white because it is depicted as being in the cylinder's shadow.

THE ABILITY TO DISCRIMINATE COLORS DEPENDS ON HOW COLORS ARE PRESENTED

Even our ability to detect differences between colors is limited. Because of how our visual system works, three presentation factors affect our ability to distinguish colors from each other:

• Paleness. The paler (less saturated) two colors are, the harder it is to tell them apart (see Fig. 4.4A).
• Color patch size. The smaller or thinner objects are, the harder it is to distinguish their colors (see Fig. 4.4B). Text is often thin, so the exact color of text is often hard to determine.
• Separation. The more separated color patches are, the more difficult it is to distinguish their colors, especially if the separation is great enough to require eye motion between patches (see Fig. 4.4C).

Several years ago, the online travel website ITN.net used two pale colors—white and pale yellow—to indicate which step of the reservation process the user was on (see Fig. 4.5). Some site visitors couldn't see which step they were on.
FIGURE 4.4
Factors affecting ability to distinguish colors: (A) paleness, (B) size, and (C) separation.

FIGURE 4.5
The pale color marking the current step makes it hard for users to see which step in the airline reservation process they are on in ITN.net's 2003 website.

FIGURE 4.6
Tiny color patches in this chart legend are hard to distinguish. [The original shows a line chart of series S1, S6, S11, and S16 with very small legend swatches.]

Small color patches are often seen in data charts and plots. Many business graphics packages produce legends on charts and plots, but make the color patches in the legend very small (see Fig. 4.6). Color patches in chart legends should be large to help people distinguish the colors (see Fig. 4.7).

FIGURE 4.7
Large color patches make it easier to distinguish the colors. [The original shows a bar chart of food categories: Beverages, Condiments, Confections, Dairy products, Grains/Cereals, Meat/Poultry, Produce, Seafood.]

On websites, a common use of color is to distinguish unfollowed links from already followed ones. On some sites, the "followed" and "unfollowed" colors are too similar. The website of the Federal Reserve Bank of Minneapolis (see Fig. 4.8) has this problem. Furthermore, the two colors are shades of blue, the color range in which our eyes are least sensitive. Can you spot the two followed links?

3 Already followed links in Figure 4.8: Housing Units Authorized and House Price Index.

FIGURE 4.8
The difference in color between visited and unvisited links is too subtle in MinneapolisFed.org's website.

COLOR-BLINDNESS

A fourth factor of color presentation that affects design principles for interactive systems is whether the colors can be distinguished by people who have common types of color-blindness. Having color-blindness doesn't mean an inability to see colors. It just means that one or more of the color subtraction channels (see the "How Color Vision Works" section) don't function normally, making it difficult to distinguish certain pairs of colors. Approximately 8% of men and slightly under 0.5% of women have a color perception deficit: difficulty discriminating certain pairs of colors (Wolfmaier, 1999). The most common type of color-blindness is red–green; other types are much rarer. Figure 4.9 shows color pairs that people with red–green color-blindness have trouble distinguishing.
  • 81. mon type of color-blindness is red–green; other types are much rarer. Figure 4.9 shows color pairs that people with red–green color-blindness have trouble distinguishing. (A) (C) (B) FIGURE 4.9 Red–green color-blind people can’t distinguish (A) dark red from black, (B) blue from purple, and (C) light green from white. FIGURE 4.10 MoneyDance’s graph uses colors some users can’t distinguish. CHAPTER 4 Our Color Vision is Limited44 FIGURE 4.11 MoneyDance’s graph rendered in grayscale. (A) (B) FIGURE 4.12 Google logo: (A) normal and (B) after red–green color- blindness filter.
  • 82. The home finance application MoneyDance provides a graphical breakdown of household expenses, using color to indicate the various expense categories (see Fig. 4.10). Unfortunately, many of the colors are hues that color- blind people cannot tell apart. For example, people with red–green color-blindness cannot distinguish the blue from the purple or the green from the khaki. If you are not color-blind, you can get an idea of which colors in an image will be hard to distinguish by converting the image to grayscale (see Fig. 4.11), but, as described in the “Guidelines for Using Color” section later in this chapter, it is best to run the image through a color-blind- ness filter or simulator (see Fig. 4.12). EXTERNAL FACTORS THAT INFLUENCE THE ABILITY TO DISTINGUISH COLORS Factors concerning the external environment also impact people’s ability to distin- guish colors. For example: l Variation among color displays. Computer displays vary in how they dis- play colors, depending on their technologies, driver software, or color settings. 45Guidelines for using color Even monitors of the same model with the same settings may display colors slightly differently. Something that looks yellow on one display
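For designers who want to automate the grayscale check, the sketch below converts an image to grayscale in the browser. It is a minimal sketch, not a color-blindness simulator: it uses the standard 2D canvas API with the common Rec. 601 luminance weights, which, fittingly, give the blue channel very little weight.

// Sketch: render an image in grayscale to preview which colors may be hard
// to distinguish. Uses the standard 2D canvas API with the common Rec. 601
// luminance weights; note how little weight blue carries.
function toGrayscale(canvas: HTMLCanvasElement): void {
  const ctx = canvas.getContext("2d");
  if (!ctx) return;
  const image = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const px = image.data; // RGBA bytes, 4 per pixel
  for (let i = 0; i < px.length; i += 4) {
    const y = 0.299 * px[i] + 0.587 * px[i + 1] + 0.114 * px[i + 2];
    px[i] = px[i + 1] = px[i + 2] = y; // replace R, G, B with luminance
  }
  ctx.putImageData(image, 0, 0);
}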
EXTERNAL FACTORS THAT INFLUENCE THE ABILITY TO DISTINGUISH COLORS

Factors concerning the external environment also impact people's ability to distinguish colors. For example:

• Variation among color displays. Computer displays vary in how they display colors, depending on their technologies, driver software, or color settings. Even monitors of the same model with the same settings may display colors slightly differently. Something that looks yellow on one display may look beige on another. Colors that are clearly different on one may look the same on another.
• Grayscale displays. Although most displays these days are color, there are devices, especially small handheld ones, with grayscale displays. For instance, Figure 4.11 shows that a grayscale display can make areas of different colors look the same.
• Display angle. Some computer displays, particularly LCD ones, work much better when viewed straight on than at an angle. When LCD displays are viewed at an angle, colors—and color differences—often are altered.
• Ambient illumination. Strong light on a display washes out colors before it washes out light and dark areas, reducing color displays to grayscale ones, as anyone who has tried to use a bank ATM in direct sunlight knows. In offices, glare and venetian blind shadows can mask color differences.

These four external factors are usually out of the software designer's control. Designers should, therefore, keep in mind that they don't have full control of users' color viewing experience. Colors that seem highly distinguishable in the development facility, on the development team's computer displays and under normal office lighting conditions, may not be as distinguishable in some of the environments where the software is used.
GUIDELINES FOR USING COLOR

In interactive software systems that rely on color to convey information, follow these five guidelines to assure that the users of the software receive the information:

1. Distinguish colors by saturation and brightness, as well as hue. Avoid subtle color differences. Make sure the contrast between colors is high (but see guideline 5). One way to test whether colors are different enough is to view them in grayscale. If you can't distinguish the colors when they are rendered in grays, they aren't different enough.
2. Use distinctive colors. Recall that our visual system combines the signals from retinal cone cells to produce three color-opponent channels: red–green, yellow–blue, and black–white (luminance). The colors that people can distinguish most easily are those that cause a strong signal (positive or negative) on one of the three color-perception channels, and neutral signals on the other two channels. Not surprisingly, those colors are red, green, yellow, blue, black, and white (see Fig. 4.13). All other colors cause signals on more than one color channel, and so our visual system cannot distinguish them from other colors as quickly and easily as it can distinguish those six colors (Ware, 2008).
3. Avoid color pairs that color-blind people cannot distinguish. Such pairs include dark red versus black, dark red versus dark green, blue versus purple, and light green versus white. Don't use dark reds, blues, or violets against any dark colors. Instead, use dark reds, blues, and violets against light yellows and greens. Use an online color-blindness simulator to check web pages and images to see how people with various color-vision deficiencies would see them.
4. Use color redundantly with other cues. Don't rely on color alone. If you use color to mark something, mark it another way as well. Apple's iPhoto uses both color and a symbol to distinguish "smart" photo albums from regular albums (see Fig. 4.14).
5. Separate strong opponent colors. Placing opponent colors right next to or on top of each other causes a disturbing shimmering sensation, and so it should be avoided (see Fig. 4.15).

4 Search the Web for "color-blindness filter" or "color-blindness simulator."
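Guideline 1's grayscale test can be applied to individual color pairs as well as to whole images. Here is a minimal sketch, assuming the same Rec. 601 luminance weights as in the earlier grayscale example; the pass/fail threshold is an arbitrary illustrative value, not an established standard.

// Sketch: guideline 1's grayscale test for a single color pair. The 0.2
// threshold is an arbitrary illustrative value.
type RGB = [number, number, number]; // 0-255 each

function luminance([r, g, b]: RGB): number {
  return (0.299 * r + 0.587 * g + 0.114 * b) / 255; // 0 (black) to 1 (white)
}

function distinguishableInGray(a: RGB, b: RGB, threshold = 0.2): boolean {
  return Math.abs(luminance(a) - luminance(b)) >= threshold;
}

// White versus pale yellow, as in the ITN.net example: fails the test.
distinguishableInGray([255, 255, 255], [255, 255, 200]); // -> false

Applied to white and a pale yellow, the luminance difference is only about 0.02, which is one way to quantify why ITN.net's step marking was so easy to miss.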
FIGURE 4.13
The most distinctive colors: black, white, red, green, yellow, blue. Each color causes a strong signal on only one color-opponent channel.

FIGURE 4.14
Apple's iPhoto uses color plus a symbol to distinguish two types of albums.

FIGURE 4.15
Opponent colors, placed on or directly next to each other, clash.

As shown in Figure 4.5, ITN.net used only pale yellow to mark customers' current step in making a reservation, which is too subtle. A simple way to strengthen the marking would be to make the current step bold and increase the saturation of the yellow (see Fig. 4.16A). But ITN.net opted for a totally new design, which also uses color redundantly with shape (see Fig. 4.16B).

FIGURE 4.16
ITN.net's current step is highlighted in two ways: with color and shape.

A graph from the Federal Reserve Bank uses shades of gray (see Fig. 4.17). This is a well-designed graph. Any sighted person could read it.

FIGURE 4.17
MinneapolisFed.org's graph uses shade differences visible to all sighted people, on any display.

CHAPTER 5
Our Peripheral Vision is Poor

Chapter 4 explained that the human visual system differs from a digital camera in the way it detects and processes color. Our visual system also
differs from a camera in its resolution. On a digital camera's photo sensor, photoreceptive elements are spread uniformly in a tight matrix, so the spatial resolution is constant across the entire image frame. The human visual system is not like that. This chapter explains why:

• Stationary items in muted colors presented in the periphery of people's visual field often will not be noticed.
• Motion in the periphery is usually noticed.

RESOLUTION OF THE FOVEA COMPARED TO THE PERIPHERY

The spatial resolution of the human visual field drops greatly from the center to the edges. There are three reasons for this:

• Pixel density. Each eye has 6 to 7 million retinal cone cells. They are packed much more tightly in the center of our visual field—a small region called the fovea—than they are at the edges of the retina (see Fig. 5.1). The fovea has about 158,000 cone cells in each square millimeter. The rest of the retina has only 9,000 cone cells per square millimeter.
• Data compression. Cone cells in the fovea connect 1:1 to the ganglion cells that begin the processing and transmission of visual data, while elsewhere on the retina, multiple photoreceptor cells (cones and rods) connect to each ganglion cell. In technical terms, information from the visual periphery is compressed (with data loss) before transmission to the brain, while information from the fovea is not.
• Processing resources. The fovea is only about 1% of the retina, but the brain's visual cortex devotes about 50% of its area to input from the fovea. The other half of the visual cortex processes data from the remaining 99% of the retina.

The result is that our vision has much, much greater resolution in the center of our visual field than elsewhere (Lindsay and Norman, 1972; Waloszek, 2005). Said in developer jargon: in the center 1% of your visual field (i.e., the fovea), you have a high-resolution TIFF, and everywhere else, you have only a low-resolution JPEG. That is nothing like a digital camera.

To visualize how small the fovea is compared to your entire visual field, hold your arm straight out and look at your thumb. Your thumbnail, viewed at arm's length, corresponds approximately to the fovea (Ware, 2008). While you have your eyes focused on the thumbnail, everything else in your visual field falls outside of your fovea on your retina.

In the fovea, people with normal vision have very high resolution: they can resolve several thousand dots within that region—better resolution than many of today's pocket digital cameras. Just outside of the fovea, the resolution is already down to a few dozen dots per inch viewed at arm's length. At the edges of our vision, the "pixels" of our visual system are as large as a melon (or human head) at arm's length (see Fig. 5.2).

Even though our eyes have more rods than cones—125 million versus 6–7 million—peripheral vision has much lower resolution than foveal vision. This is because while most of our cone cells are densely packed in the fovea (1% of the retina's area), the rods are spread out over the rest of the retina (99% of the retina's area). In people with normal vision, peripheral vision is about 20/200, which in the United States is considered legally blind. Think about that: in the periphery of your visual field, you are legally blind.

FIGURE 5.1
Distribution of photoreceptor cells (cones and rods) across the retina. From Lindsay and Norman (1972). [Plot: number of receptors per square millimeter against angle from the fovea in degrees; the blind spot appears as a gap in the curves.]

Here is how brain researcher David Eagleman (2012; page 23) describes it:

The resolution in your peripheral vision is roughly equivalent to looking through a frosted shower door, and yet you enjoy the illusion of seeing the periphery clearly. … Wherever you cast your eyes appears to be in sharp focus, and therefore you assume the whole visual world is in focus.

If our peripheral vision has such low resolution, one might wonder why we don't see the world in a kind of tunnel vision where everything is out of focus except what we are directly looking at now. Instead, we seem to see our surroundings sharply and clearly all around us. We experience this illusion because our eyes move rapidly and constantly about three times per second even when we don't realize it, focusing our fovea on selected pieces of our environment. Our brain fills in the rest in a gross, impressionistic way based on what we know and expect. Our brain does not have to maintain a high-resolution mental model of our environment because it can order the eyes to sample and resample details in the environment as needed (Clark, 1998).

For example, as you read this page, your eyes dart around, scanning and reading. No matter where on the page your eyes are focused, you have the impression of viewing a complete page of text, because, of course, you are.

1 Our brains also fill in perceptual gaps that occur during rapid (saccadic) eye movements, when vision is suppressed (see Chapter 14).

FIGURE 5.2
The resolution of our visual field is high in the center but much lower at the edges. Right image from Vision Research, Vol. 14 (1974), Elsevier.
But now, imagine that you are viewing this page on a computer screen, and the computer is tracking your eye movements and knows where your fovea is on the page. Imagine that wherever you look, the right text for that spot on the page is shown clearly in the small area corresponding to your fovea, but everywhere else on the page, the computer shows random, meaningless text. As your fovea flits around the page, the computer quickly updates each area where your fovea stops to show the correct text there, while the last position of your fovea returns to textual noise. Amazingly, experiments have shown that people rarely notice this: not only can they read, they believe that they are viewing a full page of meaningful text (Clark, 1998). However, it does slow people's reading, even if they don't realize it (Larson, 2004).

The fact that retinal cone cells are distributed tightly in and near the fovea, and sparsely in the periphery of the retina, affects not only spatial resolution but color resolution. We can discriminate colors better in the center of our visual field than at the edges.

Another interesting fact about our visual field is that it has a gap—a small area (blind spot) in which we see nothing. The gap corresponds to the spot on our retina where the optic nerve and blood vessels exit the back of the eye (see Fig. 5.1). There are no retinal rod or cone cells at that spot, so when the image of an object in our visual field happens to fall on that part of the retina, we don't see it. We usually don't notice this hole in our vision because our brain fills it in with the surrounding content, like a graphic artist using Photoshop to fill in a blemish on a photograph by copying nearby background pixels.

People sometimes experience the blind spot when they gaze at stars. As you look at one star, a nearby star may disappear briefly into the blind spot until you shift your gaze. You can also observe the gap by trying the exercise in Figure 5.3. Some people have other gaps resulting from imperfections on the retina, retinal damage, or brain strokes that affect the visual cortex, but the optic nerve gap is an imperfection everyone shares.

2 See VisionSimulations.com.

FIGURE 5.3
To "see" the retinal gap, cover your left eye, hold this book near your face, and focus your right eye on the +. Move the book slowly away from you, staying focused on the +. The @ will disappear at some point.

IS THE VISUAL PERIPHERY GOOD FOR ANYTHING?

It seems that the fovea is better than the periphery at just about everything. One might wonder why we have peripheral vision. What is it good for? Our peripheral vision serves three important functions: it guides the fovea, detects motion, and lets us see better in the dark.

Function 1: Guides fovea

First, peripheral vision provides low-resolution cues to guide our eye movements so that our fovea visits all the interesting and crucial parts of our visual field. Our eyes don't scan our environment randomly. They move so as to focus our fovea on important things, the most important ones (usually) first. The fuzzy cues on the outskirts of our visual field provide the data that helps our brain plan where to move our eyes, and in what order.

For example, when we scan a medicine label for a "use by" date, a fuzzy blob in the periphery with the vague form of a date is enough to cause an eye movement that lands the fovea there to allow us to check it. If we are browsing a produce market looking for strawberries, a blurry reddish patch at the edge of our visual field draws our eyes and our attention, even though sometimes it may turn out to be radishes instead of strawberries. If we hear an animal growl nearby, a fuzzy animal-like shape in the corner of our eye will be enough to zip our eyes in that direction, especially if the shape is moving toward us (see Fig. 5.4).

How peripheral vision guides and augments central, foveal vision is discussed more in the "Visual Search Is Linear Unless Targets 'Pop' in the Periphery" section later in this chapter.

FIGURE 5.4
A moving shape at the edge of our vision draws our eye: it could be food, or it might consider us food.

Function 2: Detects motion

A related guiding function of peripheral vision is that it is good at detecting motion. Anything that moves in our visual periphery, even slightly, is likely to draw our attention—and hence our fovea—toward it. The reason for this phenomenon is that our ancestors—including prehuman ones—were selected for their ability to spot food and avoid predators. As a result, even though we can move our eyes under conscious, intentional control, some of the mechanisms that control where they look are preconscious, involuntary, and very fast.

What if we have no reason to expect that there might be anything interesting in a certain spot in the periphery, and nothing in that spot attracts our attention? Our
eyes may never move our fovea to that spot, so we may never see what is there.

3 See Chapter 1 on how expectations bias our perceptions.

Function 3: Lets us see better in the dark

A third function of peripheral vision is to allow us to see in low-light conditions—for example, on starlit nights, in caves, around campfires, etc. These were conditions under which vision evolved, and in which people—like the animals that preceded them on Earth—spent much of their time until the invention of the electric light bulb in the 1800s. Just as the rods are overloaded in well-lighted conditions (see Chapter 4), the cones don't function very well in low light, so our rods take over. Low-light, rods-only vision is called scotopic vision. An interesting fact is that because there are no rods in the fovea, you can see objects better in low-light conditions (e.g., faint stars) if you don't look directly at them.

EXAMPLES FROM COMPUTER USER INTERFACES

The low acuity of our peripheral vision explains why software and website users fail to notice error messages in some applications and websites. When someone clicks a button or a link, that is usually where his or her fovea is positioned. Everything on the screen that is not within 1–2 centimeters of the click location (assuming normal computer viewing distance) is in peripheral vision, where resolution is low. If, after the click, an error message appears in the periphery, it should not be surprising that the person might not notice it.

For example, at InformaWorld.com, the online publications website of Informa Healthcare, if a user enters an incorrect username or password and clicks "Sign In," an error message appears in a "message bar" far away from where the user's eyes are most likely focused (see Fig. 5.5). The red word "Error" might appear in the user's peripheral vision as a small reddish blob, which would help draw the eyes in that direction. However, the red blob could fall into a gap in the viewer's visual field, and so not be noticed at all.

Consider the sequence of events from a user's point of view. The user enters a username and password and then clicks "Sign In." The page redisplays with blank fields. The user thinks "Huh? I gave it my login information and hit 'Sign In,' didn't I? Did I hit the wrong button?" The user reenters the username and password, and clicks "Sign In" again. The page redisplays with empty fields again. Now the user is really confused. The user sighs (or curses), sits back in his chair and lets his eyes scan the screen. Suddenly noticing the error message, the user says "A-ha! Has that error message been there all along?"

Even when an error message is placed nearer to the center of the viewer's visual field than in the preceding example, other factors can diminish its visibility. For example, until recently the website of Airborne.com signaled a login failure by displaying an error message in red just above the Login ID field (see Fig. 5.6). This error message is entirely in red and fairly near the "Login" button where the user's eyes are probably focused. Nonetheless, some users would not notice this error message when it first appeared.

Can you think of any reasons people might not initially see this error message? One reason is that even though the error message is much closer to where users will be looking when they click the "Login" button, it is still in the periphery, not in the fovea. The fovea is small: just a centimeter or two on a computer screen, assuming the user is the usual distance from the screen. A second reason is that the error message is not the only thing near the top of the page that is red. The page title is also red. Resolution in the periphery is low, so when the error message appears, the user's visual system may not register any change: there was something red up there before, and there still is (see Fig. 5.7).
If the page title were black or any other color besides red, the red error message would be more likely to be noticed, even though it appears in the periphery of the users' visual field.

FIGURE 5.5
This error message for a faulty sign-in appears in peripheral vision, where it will probably be missed. [The figure marks the positions of the error message and the user's fovea.]

FIGURE 5.6
This error message for a faulty login is missed by some users even though it is not far from the "Login" button.

FIGURE 5.7
Simulation of a user's visual field while the fovea is fixed on the "Login" button.

COMMON METHODS OF MAKING MESSAGES VISIBLE

There are several common and well-known methods of ensuring that an error message will be seen:

• Put it where users are looking. People focus in predictable places when interacting with graphical user interfaces (GUIs). In Western societies, people tend to traverse forms and control panels from upper left to lower right. While moving the screen pointer, people usually look either at where it is or where they are moving it to. When people click a button or link, they can usually be assumed to be looking directly at it, at least for a few moments afterward. Designers can use this predictability to position error messages near where they expect users to be looking.
• Mark the error. Somehow mark the error prominently to indicate clearly that something is wrong. Often this can be done by simply placing the error message near what it refers to, unless that would place the message too far from where users are likely to be looking.
• Use an error symbol. Make errors or error messages more visible by marking them with an error symbol.
• Reserve red for errors. By convention, in interactive computer systems the color red connotes alert, danger, problem, error, etc. Using red for any other information on a computer display invites misinterpretation. But suppose you are designing a website for Stanford University, which has red as its school color. Or suppose you are designing for a Chinese market, where red is considered an auspicious, positive color. What do you do? Use another color for errors, mark them with error symbols, or use stronger methods (see the next section).
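A minimal sketch combining the first three methods: the message appears right next to the offending field, in red, marked with a symbol. The element lookup, class name, and the warning-sign character standing in for an error icon are assumptions about the host page, not a prescribed implementation.

// Sketch: combine "put it where users are looking," "mark the error," and
// "use an error symbol." The element id and class name are assumptions.
function showFieldError(field: HTMLElement, text: string): void {
  const msg = document.createElement("span");
  msg.className = "field-error";
  msg.style.color = "red";            // reserve red for errors
  msg.textContent = "\u26A0 " + text; // symbol marks the message as an error
  // Place the message right after the field, in or near the user's fovea.
  field.insertAdjacentElement("afterend", msg);
}

const passwordField = document.getElementById("password"); // assumed id
if (passwordField) {
  showFieldError(passwordField, "Invalid username or password.");
}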
  • 104. the user is still focused on that part of the form, rather than only after the user submits the form. It is unlikely that AOL users will miss seeing these error messages. HEAVY ARTILLERY FOR MAKING USERS NOTICE MESSAGES If the common, conventional methods of making users notice messages are not enough, three stronger methods are available to user-interface designers: pop-up mes- sage in error dialog box, use of sound (e.g., beep), and wiggle or blink briefly. How- ever, these methods, while very effective, have significant negative effects, so they should be used sparingly and with great care. Method 1: Pop-up message in error dialog box Displaying an error message in a dialog box sticks it right in the user’s face, making it hard to miss. Error dialog boxes interrupt the user’s work and demand immediate attention. That is good if the error message signals a critical condition, but it can annoy people if such an approach is used for a minor message, such as confirming the execu- tion of a user-requested action. The annoyance of pop-up messages rises with the degree of modality. Nonmodal pop-ups allow users to ignore them and continue working. Application-modal pop- ups block any further work in the application that displayed the error, but allow users to interact with other software on their computer. System- modal pop-ups
Application-modal pop-ups should be used sparingly—for example, only when application data may be lost if the user doesn't attend to the error. System-modal pop-ups should be used extremely rarely—basically only when the system is about to crash and take hours of work with it, or if people will die if the user misses the error message.

On the Web, an additional reason to avoid pop-up error dialog boxes is that some people set their browsers to block all pop-up windows. If your website relies on pop-up error messages, some users may never see them.

REI.com has an example of a pop-up dialog being used to display an error message. The message is displayed when someone who is registering as a new customer omits required fields in the form (see Fig. 5.10). Is this an appropriate use of a pop-up dialog? AOL.com (see Fig. 5.9) shows that missing data errors can be signaled quite well without pop-up dialogs, so REI.com's use of them seems a bit heavy-handed.

FIGURE 5.10
REI's pop-up dialog box signals required data that was omitted. It is hard to miss, but perhaps overkill.

Examples of more appropriate use of error dialog boxes come from Microsoft Excel (see Fig. 5.11A) and Adobe InDesign (see Fig. 5.11B). In both cases, loss of data is at stake.

FIGURE 5.11
Appropriate pop-up error dialogs: (A) Microsoft Excel and (B) Adobe InDesign.

Method 2: Use sound (e.g., beep)

When a computer beeps, that tells its user something has happened that requires attention. The person's eyes reflexively begin scanning the screen for whatever caused the beep. This can allow the user to notice an error message that is someplace other than where the user was just looking, such as in a standard error message box on the display. That is the value of beeping.

However, imagine many people in a cubicle work environment or a classroom, all using an application that signals all errors and warnings by beeping. Such a workplace would be very annoying, to say the least. Worse, people wouldn't be able to tell whether their own computer or someone else's was beeping.

The opposite situation is noisy work environments (e.g., factories or computer server rooms), where auditory signals emitted by an application might be masked by ambient noise. Even in non-noisy environments, some computer users simply prefer quiet, and mute the sound on their computers or turn it way down. For these reasons, signaling errors and other conditions with sound is a remedy that can be used only in very special, controlled situations.

Computer games often use sound to signal events and conditions. In games, sound isn't annoying; it is expected. Its use in games is widespread, even in game arcades, where dozens of machines are all banging, roaring, buzzing, clanging, beeping, and playing music at once. (Well, it is annoying to parents who have to go into the arcades and endure all the screeching and booming to retrieve their kids, but the games aren't designed for parents.)

Method 3: Wiggle or blink briefly

As described earlier in this chapter, our peripheral vision is good at detecting motion, and motion in the periphery causes reflexive eye movements that bring the motion into the fovea. User-interface designers can make use of this by wiggling or flashing
messages briefly when they want to ensure that users see them. It doesn't take much motion to trigger eye movement toward the motion. Just a tiny bit of motion is enough to make a viewer's eyes zip over in that direction. Millions of years of evolution have had quite an effect.

As an example of using motion to attract users' eye attention, Apple's iCloud online service briefly shakes the entire dialog box horizontally when a user enters an invalid username or password (see Fig. 5.12). In addition to clearly indicating "No" (like a person shaking his head), this attracts the users' eyeballs, guaranteed. (Because, after all, the motion in the corner of your eye might be a leopard.)

FIGURE 5.12
Apple's iCloud shakes the dialog box briefly on login errors to attract a user's fovea toward it.
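A shake like iCloud's can be approximated with the Web Animations API. The 10-pixel amplitude and 300-millisecond duration below are assumptions, chosen to stay within the "brief" guidance given later in this section.

// Sketch: briefly shake an element horizontally, in the spirit of the
// iCloud example, using the Web Animations API. Amplitude and duration
// are assumptions; keep such motion brief.
function shake(el: HTMLElement): void {
  el.animate(
    [
      { transform: "translateX(0)" },
      { transform: "translateX(-10px)" },
      { transform: "translateX(10px)" },
      { transform: "translateX(-6px)" },
      { transform: "translateX(6px)" },
      { transform: "translateX(0)" },
    ],
    { duration: 300, easing: "ease-out" }
  );
}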
The most common use of blinking in computer user interfaces (other than advertisements) is in menu bars. When an action (e.g., Edit or Copy) is selected from a menu, it usually blinks once before the menu closes to confirm that the system "got" the command—that is, that the user didn't miss the menu item. This use of blinking is very common. It is so quick that most computer users aren't even aware of it, but if menu items didn't blink once, we would have less confidence that we actually selected them.

Motion and blinking, like pop-up dialog boxes and beeping, must be used sparingly. Most experienced computer users consider wiggling, blinking objects on screen to be annoying. Most of us have learned to ignore displays that blink because many such displays are advertisements. Conversely, a few computer users have attentional impairments that make it difficult for them to ignore something that is blinking or wiggling. Therefore, if wiggling or blinking is used, it should be brief—it should last about a quarter- to a half-second, no longer. Otherwise, it quickly goes from an unconscious attention-grabber to a conscious annoyance.

Use heavy-artillery methods sparingly to avoid habituating your users

There is one final reason to use the preceding heavy-artillery methods sparingly (i.e., only for critical messages): to avoid habituating your users. When pop-ups, sound, motion, and blinking are used too often to attract users' attention, a psychological phenomenon called habituation sets in (see Chapter 1). Our brain pays less and less attention to any stimulus that occurs frequently.

It is like the old fable of the boy who cried "Wolf!" too often: eventually, the villagers learned to ignore his cries, so when a wolf actually did come, his cries went unheeded. Overuse of strong attention-getting methods can cause important messages to be blocked by habituation.

VISUAL SEARCH IS LINEAR UNLESS TARGETS "POP" IN THE PERIPHERY

As explained earlier, one function of peripheral vision is to drive our eyes to focus the fovea on important things—things we are seeking or that might be a threat. Objects moving in our peripheral vision fairly reliably "yank" our eyes in that direction. When we are looking for an object, our entire visual system, including the periphery, primes itself to detect that object. In fact, the periphery is a crucial component in visual search, despite its low spatial and color resolution. However, just how helpful the periphery is in aiding visual search depends strongly on what we are looking for.

Look quickly at Figure 5.13 and find the Z.

FIGURE 5.13
Finding the Z requires scanning carefully through the characters.

To find the Z, you had to scan carefully through the characters until your fovea landed on it. In the lingo of vision researchers, the time to find the Z is linear: it depends approximately linearly on the number of distracting characters and the position of the Z among them.

Now look quickly at Figure 5.14 and find the bold character.

FIGURE 5.14
Finding the bold letter does not require scanning through everything.

That was much easier (i.e., faster), wasn't it? You didn't have to scan your fovea carefully through the distracting characters. Your periphery quickly detected the boldness and determined its location, and because that is what you were seeking,
  • 111. However, just how helpful the periphery is in aiding visual search depends strongly on what we are looking for. Look quickly at Figure 5.13 and find the Z. To find the Z, you had to scan carefully through the characters until your fovea landed on it. In the lingo of vision researchers, the time to find the Z is linear: it depends approximately linearly on the number of distracting characters and the position of the Z among them. Now look quickly at Figure 5.14 and find the bold character. That was much easier (i.e., faster), wasn’t it? You didn’t have to scan your fovea carefully through the distracting characters. Your periphery quickly detected the boldness and determined its location, and because that is what you were seeking, FIGURE 5.13 Finding the Z requires scanning carefully through the characters. FIGURE 5.14 Finding the bold letter does not require scanning through everything. 63Visual search is linear unless targets “pop” in the periphery
  • 112. your visual system moved your fovea there. Your periphery could not determine exactly what was bold—that is beyond its resolution and abilities—but it did locate the boldness. In vision-researcher lingo, the periphery was primed to look for boldness in parallel over its entire area, and boldness is a distinctive feature of the target, so search- ing for a bold target is nonlinear. In designer lingo, we simply say that boldness “pops out” (“pops” for short) in the periphery, assuming that only the target is bold. Color “pops” even more strongly. Compare counting the L’s in Figure 5.15 with counting the blue characters in Figure 5.16. What else makes things “pop” in the periphery? As described earlier, the periph- ery easily detects motion, so motion “pops.” Generalizing from boldness, we also can say that font weight “pops,” because if all but one of the characters on a display were bold, the nonbold character would stand out. Basically, a visual target will pop out in your periphery if it differs from surrounding objects in features the periphery can detect. The more distinctive features of the target, the more it “pops,” assuming the periphery can detect those features. Using peripheral “pop” in design Designers use peripheral “pop” to focus the attention of a product’s users, as well as to allow users to find information faster. Chapter 3 described
  • 113. how visual hierarchy— titles, headings, boldness, bullets, and indenting—can make it easier for users to spot FIGURE 5.15 Counting L’s is hard; character shape doesn’t “pop” among characters. FIGURE 5.16 Counting blue characters is easy because color “pops.” CHAPTER 5 Our Peripheral Vision is Poor64 and extract from text the information they need. Glance back at Figure 3.11 in Chap- ter 3 and see how the headings and bullets make the topics and subtopics “pop” so readers can go right to them. Many interactive systems use color to indicate status, usually reserving red for problems. Online maps and some vehicle GPS devices mark traffic jams with red so they stand out (see Fig. 5.17). Systems for controlling air traffic mark potential colli- sions in red (see Fig. 5.18). Applications for monitoring servers and networks use color to show the health status of assets or groups of them (see Fig. 5.19). These are all uses of peripheral “pop” to make important information stand out
  • 114. and visual search nonlinear. When there are many possible targets Sometimes in displays of many items, any of them could be what the user wants. Examples include command menus (see Fig. 5.20A) and object pallets (see Fig. 5.20B). Let’s assume that the application cannot anticipate which item or items a user is likely to want, and highlight those. That is a fair assumption for today’s applications.4 Are users doomed to have to search linearly through such displays for the item they want? That depends. Designers can try to make each item so distinctive that when a specific one is the user’s target, the user’s peripheral vision will be able to spot it among 4 But in the not-too-distant future it might not be. FIGURE 5.17 Google Maps uses color to show traffic conditions. Red indicates traffic jams. 65Visual search is linear unless targets “pop” in the periphery FIGURE 5.18 Air traffic control systems often use red to make potential collisions stand out. FIGURE 5.19
  • 115. Paessler’s monitoring tool uses color to show the health of network components. CHAPTER 5 Our Peripheral Vision is Poor66 all the other items. Designing distinctive sets of icons is hard— especially when the set is large—but it can be done (see Johnson et. al, 1989). Designing sets of icons that are so distinctive that they can be distinguished in peripheral vision is very hard, but not impossible. For example, if a user goes to the Mac OS application pallet to open his or her calendar, a white rectangular blob in the periphery with something black in the middle is more likely to attract the user’s eye than a blue circular blob (see Fig. 5.20B). The trick is not to get too fancy and detailed with the icons—give each one a distinctive color and gross shape. On the other hand, if the potential targets are all words, as in command menus (see Fig. 20A), visual distinctiveness is not an option. In textual menus and lists, visual search will be linear, at least at first. With practice, users learn the positions of frequently used items in menus, lists, and pallets, so searching for particular items is no longer linear. That is why applications should never move items around in menus, lists, or
  • 116. pallets. Doing that prevents users from learning item positions, thereby dooming them to search linearly forever. Therefore, “dynamic menus” is considered a major user-interface design blooper ( Johnson, 2007). FIGURE 5.20 (A) Microsoft Word Tools menu, and (B) MacOS application pallet. Designing with the Mind in Mind. http://dx.doi.org/10.1016/B978-0-12-407914-4.00006-3 © 2014 Elsevier Inc. All rights reserved. CHAPTER 67 Reading is Unnatural 6 Most people in industrialized nations grow up in households and school districts that promote education and reading. They learn to read as young children and become good readers by adolescence. As adults, most of our activities during a normal day involve reading. The process of reading—deciphering words into their meaning—is for most educated adults automatic, leaving our conscious minds free to ponder the meaning and implications of what we are reading. Because of this background, it is common for
good readers to consider reading to be as "natural" a human activity as speaking is.

WE'RE WIRED FOR LANGUAGE, BUT NOT FOR READING

Speaking and understanding spoken language is a natural human ability, but reading is not. Over hundreds of thousands—perhaps millions—of years, the human brain evolved the neural structures necessary to support spoken language. As a result, normal humans are born with an innate ability to learn as toddlers, with no systematic training, whatever language they are exposed to. After early childhood, this ability decreases significantly. By adolescence, learning a new language is the same as learning any other skill: it requires instruction and practice, and the learning and processing are handled by different brain areas from those that handled it in early childhood (Sousa, 2005).

In contrast, writing and reading did not exist until a few thousand years BCE and did not become common until only four or five centuries ago—long after the human brain had evolved into its modern state. At no time during childhood do our brains show any special innate ability to learn to read. Instead, reading is an artificial skill that we learn by systematic instruction and practice, like playing a violin, juggling, or reading music (Sousa, 2005).

Many people never learn to read well, or at all

Because people are not innately "wired" to learn to read, children who either lack caregivers who read to them or who receive inadequate reading instruction in school may never learn to read. There are a great many such people, especially in the developing world. By comparison, very few people never learn a spoken language.

For a variety of reasons, some people who learn to read never become good at it. Perhaps their parents did not value and promote reading. Perhaps they attended substandard schools or didn't attend school at all. Perhaps they learned a second language but never learned to read well in that language. People who have cognitive or perceptual impairments such as dyslexia may never read easily.

A person's ability to read is specific to a language and a script (a system of writing). To see what text looks like to someone who cannot read, just look at a paragraph printed in a language and script that you do not know (see Fig. 6.1).

FIGURE 6.1
To see how it feels to be illiterate, look at text printed in a foreign script: (A) Amharic and (B) Tibetan.

Alternatively, you can approximate the feeling of illiteracy by taking a page written in a familiar script and language—such as a page of this book—and turning it upside down. Turn this book upside down and try reading the next few paragraphs. This exercise only approximates the feeling of illiteracy. You will discover that the inverted text appears foreign and illegible at first, but after a minute you will be able to read it, albeit slowly and laboriously.

Learning to read = training our visual system

Learning to read involves training our visual system to recognize patterns—the patterns exhibited by text. These patterns run a gamut from low level to high level:

• Lines, contours, and shapes are basic visual features that our brain recognizes innately. We don't have to learn to recognize them.
• Basic features combine to form patterns that we learn to identify as characters—letters, numeric digits, and other standard symbols. In ideographic scripts, such as Chinese, symbols represent entire words or concepts.
• In alphabetic scripts, patterns of characters form morphemes, which we learn to recognize as packets of meaning—for example, "farm," "tax," "-ed," and "-ing" are morphemes in English.
• Morphemes combine to form patterns that we recognize as words—for example, "farm," "tax," "-ed," and "-ing" can be combined to form the words "farm," "farmed," "farming," "tax," "taxed," and "taxing." Even ideographic scripts include symbols that serve as morphemes or modifiers of meaning rather than as words or concepts.
• Words combine to form patterns that we learn to recognize as phrases, idiomatic expressions, and sentences.
• Sentences combine to form paragraphs.

Actually, only part of our visual system is trained to recognize the textual patterns involved in reading: the fovea and a small area immediately surrounding it (known as the perifovea), and the downstream neural networks running through the optic nerve to the visual cortex and into various parts of our brain. The neural networks starting elsewhere in our retinas do not get trained to read. More about this is explained later in the chapter.

Learning to read also involves training the brain's systems that control eye movement to move our eyes in a specific way over text. The main direction of eye movement depends on the direction in which the language we are reading is written: European language scripts are read left to right, many Middle Eastern language scripts are read right to left, and some language scripts are read top to bottom. Beyond that, the precise eye movements differ depending on whether we are reading, skimming for overall meaning, or scanning for specific words.

How we read

Assuming our visual system and brain have successfully been trained, reading becomes semi-automatic or fully automatic—both the eye movement and the processing. As explained earlier, the center of our visual field—the fovea and perifovea—is the only part of our visual field that is trained to read. All text that we read enters our visual system after being scanned by the central area, which means that reading requires a lot of eye movement.

As explained in the discussion of peripheral vision in Chapter 5, our eyes constantly jump around, several times a second. Each of these movements, called saccades, lasts about 0.1 second. Saccades are ballistic, like firing a shell from a cannon: their endpoint is determined when they are triggered, and once triggered, they always execute to completion. As described in earlier chapters, the destinations of saccadic eye movements are programmed by the brain from a combination of our goals, events in the visual periphery, events detected and localized by other perceptual senses, and past history including training.
  • 122. the destinations of saccadic eye movements are programmed by the brain from a combination of our goals, events in the visual periphery, events detected and localized by other percep- tual senses, and past history including training. CHAPTER 6 Reading is Unnatural70 When we read, we may feel that our eyes scan smoothly across the lines of text, but that feeling is incorrect. In reality, our eyes continue with saccades during read- ing, but the movements generally follow the line of text. They fix our fovea on a word, pause there for a fraction of a second to allow basic patterns to be captured and transmitted to the brain for further analysis, then jump to the next important word (Larson, 2004). Eye fixations while reading always land on words, usually near the center, never on word boundaries (see Fig. 6.2). Very common small connector and function words like “a,” “and,” “the,” “or,” “is,” and “but” are usually skipped over, their presence either detected in perifoveal vision or simply assumed. Most of the saccades during reading are in the text’s normal reading direction, but a few— about 10%—jump backwards to previous words. At the end of each line of text, our eyes jump to where our brain guesses the next line begins.1 How much can we take in during each eye fixation during
  • 123. reading? For reading European-language scripts at normal reading distances and text- font sizes, the fovea clearly sees 3–4 characters on either side of the fixation point. The perifovea sees out about 15–20 characters from the fixation point, but not very clearly (see Fig. 6.3). According to reading researcher Kevin Larson (2004), the reading area in and around the fovea consists of three distinct zones (for European- language scripts): Closest to the fixation point is where word recognition takes place. This zone is usu- ally large enough to capture the word being fixated, and often includes smaller function words directly to the right of the fixated word. The next zone extends a few letters past the word recognition zone, and readers gather preliminary information about the next let- ters in this zone. The final zone extends out to 15 letters past the fixation point. Informa- tion gathered out this far is used to identify the length of upcoming words and to identify the best location for the next fixation point. 1 Later we will see that centered text disrupts the brain’s guess about where the next line starts. FIGURE 6.2 Saccadic eye movements during reading jump between important words. FIGURE 6.3
  • 124. Visibility of words in a line of text, with fovea fixed on the word “years.” 71Is reading feature-driven or context-driven? Because our visual system has been trained to read, perception around the fixation point is asymmetrical: it is more sensitive to characters in the reading direction than in the other direction. For European-language scripts, this is toward the right. That makes sense because characters to the left of the fixation point have usually already been read. IS READING FEATURE-DRIVEN OR CONTEXT-DRIVEN? As explained earlier, reading involves recognizing features and patterns. Pattern rec- ognition, and therefore reading, can be either a bottom-up, feature-driven process, or a top-down, context-driven process. In feature-driven reading, the visual system starts by identifying simple features— line segments in a certain orientation or curves of a certain radius—on a page or display, and then combines them into more complex features, such as angles, multiple curves, shapes, and patterns. Then the brain recognizes certain shapes as characters or symbols representing letters, numbers, or, for ideographic scripts, words. In alphabetic scripts, groups of letters are perceived as morphemes and words. In all types of scripts, sequences
  • 125. of words are parsed into phrases, sentences, and paragraphs that have meaning. Feature-driven reading is sometimes referred to as “bottom-up” or “context-free.” The brain’s ability to recognize basic features—lines, edges, angles, etc.—is built in and therefore automatic from birth. In contrast, recognition of morphemes, words, and phrases has to be learned. It starts out as a nonautomatic, conscious process requiring conscious analysis of letters, morphemes, and words, but with enough practice it becomes automatic (Sousa, 2005). Obviously, the more common a mor- pheme, word, or phrase, the more likely that recognition of it will become auto- matic. With ideographic scripts such as Chinese, which have many times more symbols than alphabetic scripts do, people typically take many years longer to become skilled readers. Context-driven or top-down reading operates in parallel with feature-driven read- ing but it works the opposite way: from whole sentences or the gist of a paragraph down to the words and characters. The visual system starts by recognizing high- level patterns like words, phrases, and sentences, or by knowing the text’s meaning in advance. It then uses that knowledge to figure out—or guess—what the compo- nents of the high-level pattern must be (Boulton, 2009). Context-driven reading is less likely to become fully automatic because most phrase-level
  • 126. and sentence-level patterns and contexts don’t occur frequently enough to allow their recognition to become burned into neural firing patterns. But there are exceptions, such as idiom- atic expressions. To experience context-driven reading, glance quickly at Figure 6.4, then immedi- ately direct your eyes back here and finish reading this paragraph. Try it now. What did the text say? Now look at the same sentence again more carefully. Do you read it the same way now? CHAPTER 6 Reading is Unnatural72 Also, based on what we have already read and our knowledge of the world, our brains can sometimes predict text that the fovea has not yet read (or its meaning), allowing us to skip reading it. For example, if at the end of a page we read “It was a dark and stormy,” we would expect the first word on the next page to be “night.” We would be surprised if it was some other word (e.g., “cow”). Feature-driven, bottom-up reading dominates; context assists It has been known for decades that reading involves both feature-driven (bottom-up) processing and context-driven (top-down) processing. In addition to being able to
figure out the meaning of a sentence by analyzing the letters and words in it, people can determine the words of a sentence by knowing the meaning of the sentence, or the letters in a word by knowing what word it is (see Fig. 6.5). The question is: Is skilled reading primarily bottom-up or top-down, or is neither mode dominant?

Early scientific studies of reading—from the late 1800s through about 1980—seemed to show that people recognize words first and from that determine what letters are present. The theory of reading that emerged from those findings was that our visual system recognizes words primarily from their overall shape. This theory failed to account for certain experimental results and so was controversial among reading researchers, but it nonetheless gained wide acceptance among nonresearchers, especially in the graphic design field (Larson, 2004; Herrmann, 2011).

(A) Mray had a ltilte lmab, its feclee was withe as sown. And ervey wehre taht Mray wnet, the lmab was srue to go.
(B) Twinkle twinkle little star how I wonder what you are

FIGURE 6.5 Top-down reading: most readers, especially those who know the
songs from which these text passages are taken, can read these passages even though the words (A) have all but their first and last letters scrambled and (B) are mostly obscured.

The rain in Spain falls manly in the the plain

FIGURE 6.4 Top-down recognition of the expression can inhibit seeing the actual text.

Similarly, educational researchers in the 1970s applied information theory to reading, and assumed that because of redundancies in written language, top-down, context-driven reading would be faster than bottom-up, feature-driven reading. This assumption led them to hypothesize that reading for highly skilled (fast) readers would be dominated by context-driven (top-down) processing. This theory was probably responsible for many speed-reading methods of the 1970s and 1980s, which supposedly trained people to read fast by taking in whole phrases and sentences at a time. However, empirical studies of readers conducted since then have demonstrated conclusively that those early theories were false. Summing up
the research are statements from reading researchers Kevin Larson (2004) and Keith Stanovich (Boulton, 2009), respectively:

“Word shape is no longer a viable model of word recognition. The bulk of scientific evidence says that we recognize a word’s component letters, then use that visual information to recognize a word.”

“Context [is] important, but it’s a more important aid for the poorer reader who doesn’t have automatic context-free recognition instantiated.”

In other words, reading consists mainly of context-free, bottom-up, feature-driven processes. In skilled readers, these processes are well learned to the point of being automatic. Context-driven reading today is considered mainly a backup method that, although it operates in parallel with feature-based reading, is only relevant when feature-driven reading is difficult or insufficiently automatic.

Skilled readers may resort to context-based reading when feature-based reading is disrupted by poor presentation of information (see examples later in this chapter). Also, in the race between context-based and feature-based reading to decipher the text we see, contextual cues sometimes win out over features. As an example of context-based reading, Americans visiting England sometimes misread “to let” signs as “toilet,” because in the United States they
see the word “toilet” often, but they almost never see the phrase “to let”—Americans use “for rent” instead.

In less skilled readers, feature-based reading is not automatic; it is conscious and laborious. Therefore, more of their reading is context-based. Their involuntary use of context-based reading and nonautomatic feature-based reading consumes short-term cognitive capacity, leaving little for comprehension.² They have to focus on deciphering the stream of words, leaving no capacity for constructing the meaning of sentences and paragraphs. That is why poor readers can read a passage aloud but afterward have no idea what they just read.

² Chapter 10 describes the differences between automatic and controlled cognitive processing. Here, we will simply say that controlled processes burden working memory, while automatic processes do not.

Why is context-free (bottom-up) reading not automatic in some adults? Some people didn’t get enough experience reading as young children for the feature-driven recognition processes to become automatic. As they grow up, they find reading mentally laborious and taxing, so they avoid reading, which
perpetuates and compounds their deficit (Boulton, 2009).

SKILLED AND UNSKILLED READING USE DIFFERENT PARTS OF THE BRAIN

Before the 1980s, researchers who wanted to understand which parts of the brain are involved in language and reading were limited mainly to studying people who had suffered brain injuries. For example, in the mid-19th century, doctors found that people with brain damage near the left temple—an area now called Broca’s area after the doctor who discovered it—can understand speech but have trouble speaking, and that people with brain damage near the left ear—now called Wernicke’s area—cannot understand speech (Sousa, 2005) (see Fig. 6.6).

In recent decades, new methods of observing the operation of functioning brains in living people have been developed: electroencephalography (EEG), functional magnetic resonance imaging (fMRI), and functional magnetic resonance spectroscopy (fMRS). These methods allow researchers to watch the response in different areas of a person’s brain—including the sequence in which they respond—as the person perceives various stimuli or performs specific tasks (Minnery and Fine, 2009).
FIGURE 6.6 The human brain, showing Broca’s area and Wernicke’s area.

Using these methods, researchers have discovered that the neural pathways involved in reading differ for novice versus skilled readers. Of course, the first area to respond during reading is the occipital (or visual) cortex at the back of the brain. That is the same regardless of a person’s reading skill. After that, the pathways diverge (Sousa, 2005):

• Novice. First an area of the brain just above and behind Wernicke’s area becomes active. Researchers have come to view this as the area where, at least with alphabetic scripts such as English and German, words are “sounded out” and assembled—that is, letters are analyzed and matched with their corresponding sounds. The word-analysis area then communicates with Broca’s area and the frontal lobe, where morphemes and words—units of meaning—are recognized and overall meaning is extracted. For ideographic languages, where symbols represent whole words and often have a graphical correspondence to their meaning, sounding out of words is not part of reading.
• Advanced. The word-analysis area is skipped. Instead, the occipitotemporal area (behind the ear, not far from the visual cortex) becomes active. The prevailing view is that this area recognizes words as a whole without sounding them out, and its activity in turn activates pathways toward the front of the brain that correspond to the word’s meaning and mental image. Broca’s area is only slightly involved.

Findings from brain-scan methods of course don’t indicate exactly what processes are being used, but they support the theory that advanced readers use different processes from those novice readers use.

POOR INFORMATION DESIGN CAN DISRUPT READING

Careless writing or presentation of text can reduce skilled readers’ automatic, context-free reading to conscious, context-based reading, burdening working memory and thereby decreasing speed and comprehension. In unskilled readers, poor text presentation can block reading altogether.

Uncommon or unfamiliar vocabulary

One way software often disrupts reading is by using unfamiliar vocabulary—words the intended readers don’t know very well or at all. One type of unfamiliar terminology is computer jargon, sometimes known as “geek speak.” For example, an intranet application displayed
the following error message if a user tried to use the application after more than 15 minutes of letting it sit idle:

Your session has expired. Please reauthenticate.

The application was for finding resources—rooms, equipment, etc.—within the company. Its users included receptionists, accountants, and managers, as well as engineers. Most nontechnical users would not understand the word “reauthenticate,” so they would drop out of automatic reading mode into conscious wondering about the message’s meaning. To avoid disrupting reading, the application’s developers could have used the more familiar instruction, “Login again.” For a discussion of how “geek speak” in computer-based systems affects learning, see Chapter 11.
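One lightweight way to keep such jargon out of shipped user interfaces is to centralize user-facing strings so that a content editor can review and replace them wholesale. The TypeScript sketch below is purely illustrative: the catalog structure, the message names, and the notify callback are assumptions, not anything from the application discussed above.

    // Hypothetical sketch: keep user-facing strings in one reviewable catalog,
    // and prefer the plain-language variant over the jargon one.
    const messages = {
      sessionExpired: {
        jargon: "Your session has expired. Please reauthenticate.",
        plain: "You were signed out because this page sat idle. Please log in again.",
      },
    } as const;

    function showSessionExpired(notify: (text: string) => void): void {
      // "notify" is an assumed UI callback (e.g., a toast or dialog).
      notify(messages.sessionExpired.plain);
    }

With the strings gathered in one place, terms like “reauthenticate” can be audited and reworded without touching application logic.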
Reading can also be disrupted by uncommon terms even if they are not computer technology terms. Here are some rare English words, including many that appear mainly in contracts, privacy statements, or other legal documents:

• Aforementioned: mentioned previously
• Bailiwick: the region in which a sheriff has legal powers; more generally, domain of control
• Disclaim: renounce any claim to or connection with; disown; repudiate
• Heretofore: up to the present time; before now
• Jurisprudence: the principles and theories on which a legal system is based
• Obfuscate: make something difficult to perceive or understand
• Penultimate: next to the last, as in “the next to the last chapter of a book”

When readers—even skilled ones—encounter such a word, their automatic reading processes probably won’t recognize it. Instead, their brain uses less automatic processes, such as sounding out the word’s parts and using them to figure out its meaning, figuring out the meaning from the context in which the word appears, or looking the word up in a dictionary.

Difficult scripts and typefaces

Even when the vocabulary is familiar, reading can be disrupted by typefaces with unfamiliar or hard-to-distinguish shapes. Context-free, automatic reading is based on recognizing letters and words bottom-up from their lower-level visual features. Our visual system is quite literally a neural network that must be trained to
recognize certain combinations of shapes as characters. Therefore, a typeface with difficult-to-recognize features and shapes will be hard to read. For example, try to read Abraham Lincoln’s Gettysburg Address in an outline typeface in ALL CAPS (see Fig. 6.7).

Comparison studies show that skilled readers read uppercase text 10–15% more slowly than lowercase text. Current-day researchers attribute that difference mainly to a lack of practice reading uppercase text, not to an inherently lower recognizability of uppercase text (Larson, 2004). Nonetheless, it is important for designers to be aware of the practice effect (Herrmann, 2011).

Tiny fonts

Another way to make text hard to read in software applications, websites, and electronic appliances is to use fonts that are too small for their intended readers’ visual system to resolve. For example, try to read the first paragraph of the U.S. Constitution in a seven-point font (see Fig. 6.8).

Developers sometimes use tiny fonts because they have a lot of text to display in a small amount of space. But if the intended users of the system cannot read the text, or can read it only laboriously, the text might as well not be there.
Text on noisy background

Visual noise in and around text can disrupt recognition of features, characters, and words, and therefore drop reading out of automatic feature-based mode into a more conscious and context-based mode. In software user interfaces and websites, visual noise often results from designers placing text over a patterned background or displaying text in colors that contrast poorly with the background, as an example from Arvanitakis.com shows (see Fig. 6.9).

FIGURE 6.7 Text in ALL CAPS is harder to read because we are not practiced at doing it. Outline typefaces complicate feature recognition. This example demonstrates both.

We the people of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defense, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.

FIGURE 6.8 The opening paragraph of the U.S. Constitution, presented in a seven-point font.
There are situations in which designers intend to make text hard to read. For example, a common security measure on the Web is to ask site users to identify distorted words, as proof that they are live human beings and not Internet ’bots. This relies on the fact that most people can read text that Internet ’bots cannot currently read. Text displayed as a challenge to test a registrant’s humanity is called a captcha³ (see Fig. 6.10).

Of course, most text displayed in a user interface should be easy to read. A patterned background need not be especially strong to disrupt people’s ability to read text placed over it. For example, the Federal Reserve Bank’s collection of websites formerly provided a mortgage calculator that was decorated with a repeating pastel background with a home and neighborhood theme. Although well intentioned, the decorated background made the calculator hard to read (see Fig. 6.11).

Information buried in repetition

Visual noise can also come from the text itself. If successive lines of text contain a lot of repetition, readers receive poor feedback about what line they are focused
on, plus it is hard to pick out the important information. For example, recall the example from the California Department of Motor Vehicles web site in Chapter 3 (see Fig. 3.2).

³ The term originally comes from the word “capture,” but it is also said to be an acronym for “Completely Automated Public Turing test to tell Computers and Humans Apart.”

FIGURE 6.10 Text that is intentionally displayed with noise so that Web-crawling software cannot read it is called a captcha.

FIGURE 6.9 Arvanitakis.com uses text on a noisy background and poor color contrast.

Another example of repetition that creates noise is the computer store on Apple.com. The pages for ordering a laptop computer list different keyboard options for a computer in a very repetitive way, making it hard to see that the essential difference between the keyboards is the language that they support (see Fig. 6.12).
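A content fix for this kind of repetition is often mechanical: strip the repeated boilerplate and surface only what varies. The TypeScript sketch below is a hypothetical illustration; the option strings and the parsing rule are invented, not Apple’s actual data.

    // Hypothetical option labels that repeat everything except the language.
    const options = [
      "Backlit Keyboard (English) & User's Guide (English)",
      "Backlit Keyboard (French) & User's Guide (French)",
      "Backlit Keyboard (German) & User's Guide (German)",
    ];

    // Keep only the differentiator: here, the first parenthesized term.
    function distinguishers(labels: string[]): string[] {
      return labels.map((label) => {
        const match = label.match(/\(([^)]+)\)/);
        return match ? match[1] : label;
      });
    }

    console.log(distinguishers(options)); // ["English", "French", "German"]

Listing “English, French, German” and stating the shared description once makes the essential difference visible at a glance.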
Centered text

One aspect of reading that is highly automatic in most skilled readers is eye movement. In automatic (fast) reading, our eyes are trained to go back to the same horizontal position and down one line. If text is centered or right-aligned, each line of text starts in a different horizontal position. Automatic eye movements therefore take our eyes back to the wrong place, so we must consciously adjust our gaze to the actual start of each line. This drops us out of automatic mode and slows us down greatly. With poetry and wedding invitations, that is probably okay, but with any
  • 141. 6.13). Try reading the text quickly to demonstrate to yourself how your eyes move. The same site also centers numbered lists, really messing up readers’ automatic eye movement (see Fig. 6.14). Try scanning the list quickly. Exclusive Buyer Agency Offer (No Cost) Service to Home Buyers! Dan and Lida want to work for you if: ....................................................................................... Would you like to avoid sellers agents who are pushing, selling, and trying to make sales quotas? Do you want your agent to be on your side and not the sellers side? Do you expect your agent to be responsible and professional....? If you don’t like to have your time wasted, Dan and Lida want to work for you.... If you understand that everything we say and do, is to save you time, money, and keep you out of trouble.... -and if you understand that some agents income and allegiances are in direct competition with your best interests.... -and if you understand that we take risks, give you 24/7 access, and put aside other paying business for you... -and if you understand that we have a vested interest in helping you learn to make all the right choices... - then, call us now, because Dan and Lida want to work for you!! FIGURE 6.13
FIGURE 6.13 FargoHomes.com centers text, thwarting automatic eye movement patterns.

FIGURE 6.14 FargoHomes.com centers numbered items, really thwarting automatic eye movement patterns.

Design implications: Don’t disrupt reading; support it!

Obviously, a designer’s goal should be to support reading, not disrupt it. Skilled (fast) reading is mostly automatic and mostly based on feature, character, and word recognition. The easier the recognition, the easier and faster the reading. Less skilled reading, by contrast, is greatly assisted by contextual cues. Designers of interactive systems can support both reading methods by following these guidelines (a small code sketch of guideline 1 follows the list):

1) Ensure that text in user interfaces allows the feature-based automatic processes to function effectively by avoiding the disruptive flaws described earlier: difficult or tiny fonts, patterned backgrounds, centering, etc.
2) Use restricted, highly consistent vocabularies—sometimes referred to in the industry as plain language⁴ or simplified language (Redish, 2007).
3) Format text to create a visual hierarchy (see Chapter 3) to facilitate easy scanning: use headings, bulleted lists, tables, and visually emphasized words (see Fig. 6.15).

Experienced information architects, content editors, and graphic designers can be very useful in ensuring that text is presented to support easy scanning and reading.

⁴ For more information on plain language, see the U.S. government website, www.plainlanguage.gov.

FIGURE 6.15 Microsoft Word’s “Help” homepage is easy to scan and read.
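As a concrete, hedged illustration of guideline 1, the TypeScript sketch below bundles this chapter’s anti-disruption rules into one reusable style. The specific property values are illustrative assumptions, not prescriptions from the book.

    // A minimal sketch of reading-friendly text presentation.
    const readableText: Partial<CSSStyleDeclaration> = {
      textAlign: "left",      // avoid centered body text (Figs. 6.13, 6.14)
      fontSize: "16px",       // avoid tiny fonts (Fig. 6.8)
      background: "#ffffff",  // avoid patterned backgrounds (Figs. 6.9, 6.11)
      color: "#1a1a1a",       // strong text-background contrast
      textTransform: "none",  // avoid ALL CAPS body text (Fig. 6.7)
    };

    function applyReadableText(element: HTMLElement): void {
      Object.assign(element.style, readableText);
    }

Defining such a style once and applying it everywhere also enforces consistency, which itself supports automatic reading.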
MUCH OF THE READING REQUIRED BY SOFTWARE IS UNNECESSARY

In addition to committing design mistakes that disrupt reading, many software user interfaces simply present too much text, requiring users to read more than is necessary. Consider how much unnecessary text there is in a dialog box for setting text entry properties in the SmartDraw application (see Fig. 6.16).

FIGURE 6.16 SmartDraw’s “Text Entry Properties” dialog box displays too much text for its simple functionality.

Software designers often justify lengthy instructions by arguing: “We need all that text to explain clearly to users what to do.” However, instructions can often be shortened with no loss of clarity. Let’s examine how the Jeep company, between 2002 and 2007, shortened its instructions for finding a local Jeep dealer (see Fig. 6.17; a sketch of the 2007 version follows the list):

• 2002: The “Find a Dealer” page displayed a large paragraph of prose text, with numbered instructions buried in it, and a form asking for more information than needed to find a dealer near the user.

FIGURE 6.17 Between 2002 and 2007, Jeep.com drastically reduced the reading required by “Find a Dealer.”
• 2003: The instructions on the “Find a Dealer” page had been boiled down to three bullet points, and the form required less information.
• 2007: “Find a Dealer” had been cut to one field (zip code) and a “Go” button on the homepage.
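The 2007 endpoint is simple enough to express in a few lines. The TypeScript sketch below is a hypothetical reconstruction of such a one-field form; the markup, names, and URL are assumptions, not Jeep’s actual code.

    // Hypothetical one-field dealer finder: one input, one button,
    // and no instructional prose needed.
    function dealerFinderHtml(): string {
      return `
        <form action="/find-dealer" method="get">
          <label for="zip">Zip code</label>
          <input id="zip" name="zip" inputmode="numeric" maxlength="5" />
          <button type="submit">Go</button>
        </form>`;
    }

When the interface itself makes the goal obvious, most instructional text becomes unnecessary.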
Even when text describes products rather than explaining instructions, it is counterproductive to put all a vendor wants to say about a product into a lengthy prose description that people have to read from start to end. Most potential customers cannot or will not read it. Compare Costco.com’s descriptions of laptop computers in 2007 with those in 2009 (see Fig. 6.18).

FIGURE 6.18 Between 2007 and 2009, Costco.com drastically reduced the text in product descriptions.

Design implications: Minimize the need for reading

Too much text in a user interface loses poor readers, who unfortunately are a significant percentage of the population. Too much text even alienates good readers: it turns using an interactive system into an intimidating amount of work.

Minimize the amount of prose text in a user interface; don’t present users with long blocks of prose text to read. In instructions, use the least amount of text that gets most users to their intended goals. In a product description, provide a brief overview of the product and let users request more detail if they want it. Technical writers and content editors can assist greatly in doing this. For additional advice on how to eliminate unnecessary text, see Krug (2005) and Redish (2007).

TEST ON REAL USERS

Finally, designers should test their designs on the intended user population to be confident that users can read all essential text quickly and effortlessly. Some testing can be done early, using prototypes and partial implementations, but it should also be done just before release. Fortunately, last-minute changes to text font sizes and formats are usually easy to make.
CHAPTER 7
Our Attention is Limited; Our Memory is Imperfect

Just as the human visual system has strengths and weaknesses, so do human attention and memory. This chapter describes some of those strengths and weaknesses as background for understanding how we can design interactive systems to support and augment attention and memory rather than burdening or confusing them. We will start with an overview of how memory works, and how it is related to attention.

SHORT- VERSUS LONG-TERM MEMORY

Psychologists historically have distinguished short-term memory from long-term memory. Short-term memory covers situations in which information is retained for intervals ranging from a fraction of a second to a few minutes. Long-term memory covers situations in which information is retained over longer periods (e.g., hours, days, years, even lifetimes).

It is tempting to think of short- and long-term memory as separate memory stores. Indeed, some theories of memory have considered them separate. After all, in a digital computer, the short-term memory stores (central processing unit [CPU] data registers) are separate from the long-term memory stores (random
access memory [RAM], hard disk, flash memory, CD-ROM, etc.). More direct evidence comes from findings that damage to certain parts of the human brain results in short-term memory deficits but not long-term ones, or vice versa. Finally, the speed with which information or plans can disappear from our immediate awareness contrasts sharply with the seeming permanence of our memory of important events in our lives, faces of significant people, activities we have practiced, and information we have studied. These phenomena led many researchers to theorize that short-term memory is a separate store in the brain where information is held temporarily after entering through our perceptual senses (e.g., visual or auditory), or after being retrieved from long-term memory (see Fig. 7.1).

A MODERN VIEW OF MEMORY

Recent research on memory and brain function indicates that short- and long-term memory are functions of a single memory system—one that is more closely linked with perception than previously thought (Jonides et al., 2008).

Long-term memory
Perceptions enter through the visual, auditory, olfactory, gustatory, or tactile sensory systems and trigger responses starting in areas of the brain dedicated to each sense (e.g., visual cortex, auditory cortex), then spread into other areas of the brain that are not specific to any particular sensory modality. The sensory modality–specific areas of the brain detect only simple features of the data, such as a dark–light edge, diagonal line, high-pitched tone, sour taste, red color, or rightward motion. Downstream areas of the brain combine low-level features to detect higher-level features of the input, such as animal, the word “duck,” Uncle Kevin, minor key, threat, or fairness.

As described in Chapter 1, the set of neurons activated by a perceived stimulus depends on both the features and context of the stimulus. The context is as important as the features of the stimulus in determining what neural patterns are activated. For example, a dog barking near you when you are walking in your neighborhood activates a different pattern of neural activity in your brain than the same sound heard when you are safely inside your car. The more similar two perceptual stimuli are—that is, the more features and contextual elements they share—the more overlap there is between the sets of neurons that fire in response to them.

The initial strength of a perception depends on how much it is amplified or
dampened by other brain activity. All perceptions create some kind of trace, but some are so weak that they can be considered as not registered: the pattern was activated once but never again.

Memory formation consists of changes in the neurons involved in a neural activity pattern, which make the pattern easier to reactivate in the future.¹ Some such changes result from chemicals released near neural endings that boost or inhibit their sensitivity to stimulation. These changes last only until the chemicals dissipate or are neutralized by other chemicals.

¹ There is evidence that the long-term neural changes associated with learning occur mainly during sleep, suggesting that separating learning sessions by periods of sleep may facilitate learning (Stafford and Webb, 2005).

FIGURE 7.1 Traditional (antiquated) view of short-term versus long-term memory.
More permanent changes occur when neurons grow and branch, forming new connections with others.

Activating a memory consists of reactivating the same pattern of neural activity that occurred when the memory was formed. Somehow the brain distinguishes initial activations of neural patterns from reactivations—perhaps by measuring the relative ease with which the pattern was reactivated. New perceptions very similar to the original ones reactivate the same patterns of neurons, resulting in recognition if the reactivated perception reaches awareness. In the absence of a similar perception, stimulation from activity in other parts of the brain can also reactivate a pattern of neural activity, which if it reaches awareness results in recall.

The more often a neural memory pattern is reactivated, the stronger it becomes—that is, the easier it is to reactivate—which in turn means that the perception it corresponds to is easier to recognize and recall. Neural memory patterns can also be strengthened or weakened by excitatory or inhibitory signals from other parts of the brain.

A particular memory is not located in any specific spot in the brain. The neural activity pattern comprising a memory involves a network of millions of neurons
extending over a wide area. Activity patterns for different memories overlap, depending on which features they share. Removing, damaging, or inhibiting neurons in a particular part of the brain typically does not completely wipe out memories that involve those neurons, but rather just reduces the detail or accuracy of the memory by deleting features.² However, some areas in a neural activity pattern may be critical pathways, so that removing, damaging, or inhibiting them may prevent most of the pattern from activating, thereby effectively eliminating the corresponding memory.

For example, researchers have long known that the hippocampus, twin seahorse-shaped neural clusters near the base of the brain, plays an important role in storing long-term memories. The modern view is that the hippocampus is a controlling mechanism that directs neural rewiring so as to “burn” memories into the brain’s wiring. The amygdala, two jellybean-shaped clusters on the frontal tips of the hippocampus, has a similar role, but it specializes in storing memories of emotionally intense, threatening situations (Eagleman, 2012).

Cognitive psychologists view human long-term memory as consisting of several distinct functions:

• Semantic long-term memory stores facts and relationships.
• Episodic long-term memory records past events.
• Procedural long-term memory remembers action sequences.

These distinctions, while important and interesting, are beyond the scope of this book.

² This is similar to the effect of cutting pieces out of a holographic image: it reduces the overall resolution of the image, rather than removing areas of it, as with an ordinary photograph.

Short-term memory

The processes just discussed are about long-term memory. What about short-term memory? What psychologists call short-term memory is actually a combination of phenomena involving perception, attention, and retrieval from long-term memory.

One component of short-term memory is perceptual. Each of our perceptual senses has its own very brief short-term “memory” that is the result of residual neural activity after a perceptual stimulus ceases, like a bell that rings briefly after it is struck. Until they fade away, these residual perceptions are available as possible input to our brain’s attention and memory-storage mechanisms, which integrate
input from our various perceptual systems, focus our awareness on some of that input, and store some of it in long-term memory. These sensory-specific residual perceptions together comprise a minor component of short-term memory. Here, we are only interested in them as potential inputs to working memory.

Also available as potential input to working memory are long-term memories reactivated through recognition or recall. As explained earlier, each long-term memory corresponds to a specific pattern of neural activity distributed across our brain. While activated, a memory pattern is a candidate for our attention and therefore potential input for working memory.

The human brain has multiple attention mechanisms, some voluntary and some involuntary. They focus our awareness on a very small subset of the perceptions and activated long-term memories while ignoring everything else. That tiny subset of all the available information from our perceptual systems and our long-term memories that we are aware of right now is the main component of our short-term memory, the part that cognitive scientists often call working memory. It integrates information from all of our sensory modalities and our long-term memory. Henceforth, we will restrict our discussion of short-term memory to working memory.
So what is working memory? First, here is what it is not: it is not a store—it is not a place in the brain where memories and perceptions go to be worked on. And it is nothing like accumulators or fast random-access memory in digital computers. Instead, working memory is our combined focus of attention: everything that we are conscious of at a given time. More precisely, it is a few perceptions and long-term memories that are activated enough that we remain aware of them over a short period. Psychologists also view working memory as including an executive function—based mainly in the frontal cerebral cortex—that manipulates items we are attending to and, if needed, refreshes their activation so they remain in our awareness (Baddeley, 2012).

A useful—if oversimplified—analogy for memory is a huge, dark, musty warehouse. The warehouse is full of long-term memories, piled haphazardly (not stacked neatly), intermingled and tangled, and mostly covered with dust and cobwebs. Doors along the walls represent our perceptual senses: sight, hearing, smell, taste, touch. They open briefly to let perceptions in. As perceptions enter, they are briefly illuminated by light coming in from outside, but they quickly are pushed (by more entering perceptions) into the dark tangled piles of old memories.
In the ceiling of the warehouse are a small fixed number of searchlights, controlled by the attention mechanism’s executive function (Baddeley, 2012). They swing around and focus on items in the memory piles, illuminating them for a while until they swing away to focus elsewhere. Sometimes one or two searchlights focus on new items after they enter through the doors. When a searchlight moves to focus on something new, whatever it had been focusing on is plunged into darkness.

The small fixed number of searchlights represents the limited capacity of working memory. What is illuminated by them (and briefly through the open doors) represents the contents of working memory: out of the vast warehouse’s entire contents, the few items we are attending to at any moment. See Figure 7.2 for a visual.

The warehouse analogy is too simple and should not be taken too seriously. As Chapter 1 explained, our senses are not just passive doorways into our brains, through which our environment “pushes” perceptions. Rather, our brain actively and continually seeks out important events and features in our environment and “pulls” perceptions in as needed (Ware, 2008). Furthermore, the brain is buzzing with activity most of the time and its internal activity is only modulated—not
determined—by sensory input (Eagleman, 2012). Also, as described earlier, memories are embodied as networks of neurons distributed around the brain, not as objects in a specific location. Finally, activating a memory in the brain can activate related ones; our warehouse-with-searchlights analogy doesn’t represent that.

FIGURE 7.2 Modern view of memory: a dark warehouse full of stuff (long-term memory), with searchlights focused on a few items (short-term memory).

Nonetheless, the analogy—especially the part about the searchlights—illustrates that working memory is a combination of several foci of attention—the currently
activated neural patterns of which we are aware—and that the capacity of working memory is extremely limited, and the content at any given moment is very volatile.

What about the earlier finding that damage to some parts of the brain causes short-term memory deficits, while other types of brain damage cause long-term memory deficits? The current interpretation is that some types of damage decrease or eliminate the brain’s ability to focus attention on specific objects and events, while other types of damage harm the brain’s ability to store or retrieve long-term memories.

CHARACTERISTICS OF ATTENTION AND WORKING MEMORY

As noted, working memory is equal to the focus of our attention. Whatever is in that focus is what we are conscious of at any moment. But what determines what we attend to and how much we can attend to at any given time?

Attention is highly focused and selective

Most of what is going on around you at this moment you are unaware of. Your perceptual system and brain sample very selectively from your surroundings, because they don’t have the capacity to process everything.

Right now you are conscious of the last few words and ideas you’ve read, but probably not the color of the wall in front of you. But now that I’ve shifted your attention, you are conscious of the wall’s color, and may have forgotten some of the ideas you read on the previous page.

Chapter 1 described how our perception is filtered and biased by our goals. For
example, if you are looking for your friend in a crowded shopping mall, your visual system “primes” itself to notice people who look like your friend (including how he or she is dressed), and barely notice everything else. Simultaneously, your auditory system primes itself to notice voices that sound like your friend’s voice, and even footsteps that sound like those of your friend. Human-shaped blobs in your peripheral vision and sounds localized by your auditory system that match your friend snap your eyes and head toward them. While you look, anyone looking or sounding similar to your friend attracts your attention, and you won’t notice other people or events that would normally have interested you.

Besides focusing on objects and events related to our current goals, our attention is drawn to:

• Movement, especially movement near or toward us. For example, something jumps at you while you walk on a street, or something swings toward your head in a haunted house ride at an amusement park, or a car in an adjacent lane suddenly swerves toward your lane (see the discussion of the flinch reflex in Chapter 14).
• Threats. Anything that signals or portends danger to us or people in our care.
• Faces of other people. We are primed from birth to notice faces more than other objects in our environment.
• Sex and food. Even if we are happily married and well fed, these things attract our attention. Even the mere words probably quickly got your attention.

These things, along with our current goals, draw our attention involuntarily. We don’t become aware of something in our environment and then orient ourselves toward it. It’s the other way around: our perceptual system detects something attention-worthy and orients us toward it preconsciously, and only afterwards do we become aware of it.³

³ Exactly how long afterwards is discussed in Chapter 14.

Capacity of attention (a.k.a. working memory)

The primary characteristics of working memory are its low capacity and volatility. But what is the capacity? In terms of the warehouse analogy presented earlier, what is the small fixed number of searchlights?

Many college-educated people have read about “the magical number seven, plus or minus two,” proposed by cognitive psychologist George Miller in 1956 as the limit on the number of simultaneous unrelated items in human working memory (Miller, 1956).
Miller’s characterization of the working memory limit naturally raises several questions:

• What are the items in working memory? They are current perceptions and retrieved memories. They are goals, numbers, words, names, sounds, images, odors—anything one can be aware of. In the brain, they are patterns of neural activity.
• Why must items be unrelated? Because if two items are related, they correspond to one big neural activity pattern—one set of features—and hence one item, not two.
• Why the fudge factor of plus or minus two? Because researchers cannot measure with perfect accuracy how much people can keep track of, and because of differences between individuals in working memory capacity.

Later research in the 1960s and 1970s found Miller’s estimate to be too high. In the experiments Miller considered, some of the items presented to people to remember could be “chunked” (i.e., considered related), making it appear that people’s working memory was holding more items than it actually was. Furthermore, all the subjects in Miller’s experiments were college students. Working memory capacity varies in the general population. When the experiments were revised to disallow unintended
chunking and include noncollege students as subjects, the average capacity of working memory was shown to be more like four plus or minus one—that is, three to five items (Broadbent, 1975; Mastin, 2010). Thus, in our warehouse analogy, there would be only four searchlights.

More recent research has cast doubt on the idea that the capacity of working memory should be measured in whole items or “chunks.” It turns out that in early working memory experiments, people were asked to briefly remember items (e.g., words or images) that were quite different from each other—that is, they had very few features in common. In such a situation, people don’t have to remember every feature of an item to recall it a few seconds later; remembering some of its features is enough. So people appeared to recall items as a whole, and therefore working memory capacity seemed measurable in whole items.

Recent experiments have given people items to remember that are similar—that is, they share many features. In that situation, to recall an item and not confuse it with other items, people must remember more of its features. In
these experiments, researchers found that people remember more details (i.e., features) of some items than of others, and the items they remember in greater detail are the ones they paid more attention to (Bays and Husain, 2008). This suggests that the unit of attention—and therefore the capacity of working memory—is best measured in item features rather than whole items or “chunks” (Cowan et al., 2004). This jibes with the modern view of the brain as a feature-recognition device, but it is controversial among memory researchers, some of whom argue that the basic capacity of human working memory is three to five whole items, but that is reduced if people attend to a large number of details (i.e., features) of the items (Alvarez and Cavanagh, 2004). Bottom line: the true capacity of human working memory is still a research topic.

The second important characteristic of working memory is how volatile it is. Cognitive psychologists used to say that new items arriving in working memory often bump old ones out, but that way of describing the volatility is based on the view of working memory as a temporary storage place for information. The modern view of working memory as the current focus of attention makes it even clearer: focusing attention on new information turns it away from some of what it was focusing on. That is why the searchlight analogy is useful.
However we describe it, information can easily be lost from working memory. If items in working memory don’t get combined or rehearsed, they are at risk of having the focus shifted away from them. This volatility applies to goals as well as to the details of objects. Losing items from working memory corresponds to forgetting or losing track of something you were doing. We have all had such experiences, for example:

• Going to another room for something, but once there we can’t remember why we came.
• Taking a phone call, and afterward not remembering what we were doing before the call.
• Something yanks our attention away from a conversation, and then we can’t remember what we were talking about.
• In the middle of adding a long list of numbers, something distracts us, so we have to start over.
  • 165. of paper and follow these instructions: 1. Place one blank sheet of paper after this page in the book and use it to cover the next page. 2. Flip to the next page for three seconds, pull the paper cover down and read the black numbers at the top, and flip back to this page. Don’t peek at other numbers on that page unless you want to ruin the test. 3. Say your phone number backward, out loud. 4. Now write down the black numbers from memory. … Did you get all of them? 5. Flip back to the next page for three seconds, read the red numbers (under the black ones), and flip back. 6. Write down the numbers from memory. These would be easier to recall than the first ones if you noticed that they are the first seven digits of π (3.141592), because then they would be only one number, not seven. 7. Flip back to the next page for 3 seconds, read the green numbers, and flip back. 8. Write down the numbers from memory. If you noticed that they are odd numbers from 1 to 13, they would be easier to recall, because they would be three chunks (“odd, 1, 13” or “odd, seven from 1”), not seven.
9. Flip back to the next page for three seconds, read the orange words, and flip back.
10. Write down the words from memory. … Could you recall them all?
11. Flip back to the next page for three seconds, read the blue words, and flip back.
12. Write down the words from memory. … It was certainly a lot easier to recall them all because they form a sentence, so they could be memorized as one sentence rather than seven words.

The test stimuli referred to above (from the following page):

Black numbers: 3 8 4 7 5 3 9
Red numbers: 3 1 4 1 5 9 2
Green numbers: 1 3 5 7 9 11 13
Orange words: town river corn string car shovel
Blue words: what is the meaning of life
IMPLICATIONS OF WORKING MEMORY CHARACTERISTICS FOR USER-INTERFACE DESIGN

The capacity and volatility of working memory have many implications for the design of interactive computer systems. The basic implication is that user interfaces should help people remember essential information from one moment to the next. Don’t require people to remember system status or what they have done, because their attention is focused on their primary goal and progress toward it. Specific examples follow.

Modes

The limited capacity and volatility of working memory is one reason why user-interface design guidelines often say to either avoid designs that have modes or provide adequate mode feedback. In a moded user interface, some user actions have different effects depending on what mode the system is in. For example:

• In a car, pressing the accelerator pedal can move the car either forwards, backwards, or not at all, depending on whether the transmission is in drive, reverse, or neutral. The transmission sets a mode in the car’s user interface.
• In many digital cameras, pressing the shutter button can either snap a photo or start a video recording, depending on which mode is selected.
• In a drawing program, clicking and dragging normally selects one or more graphic objects on the drawing, but when the software is in “draw rectangle” mode, clicking and dragging adds a rectangle to the drawing and stretches it to the desired size.

Moded user interfaces have advantages; that is why many interactive systems have them. Modes allow a device to have more functions than controls: the same control provides different functions in different modes. Modes allow an interactive system to assign different meanings to the same gestures to reduce the number of gestures users must learn. However, one well-known disadvantage of modes is that people often make mode errors: they forget what mode the system is in and do the wrong thing by mistake (Johnson, 1990). This is especially true in systems that give poor feedback about what the current mode is.

Because of the problem of mode errors, many user-interface design guidelines say to either avoid modes or provide strong feedback about which mode the system is in. Human working memory is too unreliable for designers to assume that users can, without clear, continuous feedback, keep track of what mode the system is in, even when the users are the ones changing the system from one mode to another.
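Here is a minimal TypeScript sketch of the “strong mode feedback” guideline; the editor class, mode names, and status-bar element are illustrative assumptions, not any particular product’s design.

    // Explicit mode state plus always-visible feedback, so users don't have
    // to hold the current mode in working memory.
    enum Mode {
      Select = "Select",
      DrawRectangle = "Draw rectangle",
    }

    class DrawingEditor {
      private mode: Mode = Mode.Select;

      constructor(private statusBar: { textContent: string | null }) {
        this.showMode();
      }

      setMode(mode: Mode): void {
        this.mode = mode;
        this.showMode(); // refresh the feedback on every mode change
      }

      private showMode(): void {
        // A continuous, visible reminder of the active mode.
        this.statusBar.textContent = `Mode: ${this.mode}`;
      }

      handleDrag(): void {
        if (this.mode === Mode.DrawRectangle) {
          // add a rectangle and stretch it to the dragged size
        } else {
          // select the objects under the drag
        }
      }
    }

The key design choice is that the mode indicator is updated in exactly one place, so the display can never drift out of sync with the actual mode.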
Search results

When people use a search function on a computer to find information, they enter the search terms, start the search, and then review the results. Evaluating the results often requires knowing what the search terms were. If working memory were less limited, people would always remember, when browsing the results, what they had entered as search terms just a few seconds earlier. But as we have seen, working memory is very limited. When the results appear, a person’s attention naturally turns away from what he or she entered and toward the results. Therefore, it should be no surprise that people viewing search results often do not remember the search terms they just typed.

Unfortunately, some designers of online search functions don’t understand that. Search results sometimes don’t show the search terms that generated the results. For example, in 2007, the search results page at Slate.com provided search fields so users could search again, but didn’t show what a user had searched for (see Fig. 7.3A). A recent version of the site shows the user’s search terms (see Fig. 7.3B), reducing the burden on users’ working memory.
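The fix is cheap to implement: carry the query along with the results and restate it in the results header. A hedged TypeScript sketch, with an assumed data shape:

    // Echo the user's search terms with the results (as in Fig. 7.3B),
    // so users need not recall what they typed.
    interface SearchResults {
      query: string;
      hits: string[];
    }

    function renderResults(results: SearchResults): string {
      const header = `Results for "${results.query}" (${results.hits.length} found)`;
      const lines = results.hits.map((hit, i) => `${i + 1}. ${hit}`);
      return [header, ...lines].join("\n");
    }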
Calls to action

A well-known “netiquette” guideline for writing email messages, especially messages that require responses or ask the recipients to do something, is to restrict each message to one topic. If a message contains multiple topics or requests, its recipients may focus on one of them (usually the first one), get engrossed in responding to that, and forget to respond to the rest of the email. The guideline to put different topics or requests into separate emails is a direct result of the limited capacity of human attention.

Web designers are familiar with a similar guideline: Avoid putting competing calls to action on a page. Each page should have only one dominant call to action—or one for each possible user goal—so as not to overwhelm users’ attention capacity and cause them to go down paths that don’t achieve their (or the site owner’s) goals.

FIGURE 7.3 Slate.com search results: (A) in 2007, users’ search terms were not shown, but (B) in 2013, search terms are shown.

A related guideline: Once users have specified their goal, don’t
distract them from accomplishing it by displaying extraneous links and calls to action. Instead, guide them to the goal by using a design pattern called the process funnel (van Duyne et al., 2002; see also Johnson, 2007).

Instructions

If you asked a friend for a recipe or for directions to her home, and she gave you a long sequence of steps, you probably would not try to remember it all. You would know that you could not reliably keep all of the instructions in your working memory, so you would write them down or ask your friend to send them to you by email. Later, while following the instructions, you would put them where you could refer to them until you reached the goal.

Similarly, interactive systems that display instructions for multistep operations should allow people to refer to the instructions while executing them until completing all the steps. Most interactive systems do this (see Fig. 7.4), but some do not (see Fig. 7.5).

FIGURE 7.4 Instructions in Windows Help files remain displayed while users follow them.

FIGURE 7.5 Instructions for Windows XP wireless setup start by telling users to close the instructions.
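One simple way to implement “keep instructions visible” is a persistent checklist pane that marks progress instead of closing. The TypeScript sketch below uses assumed names, and the steps shown are invented for illustration, not the actual Windows XP wizard text.

    // A persistent, progress-marking checklist that stays on screen
    // while the user works through the steps.
    const steps: string[] = [
      "Open the network settings panel",
      "Choose your wireless network",
      "Enter the network password",
    ];

    function renderChecklist(completed: Set<number>): string {
      return steps
        .map((step, i) => `${completed.has(i) ? "[x]" : "[ ]"} ${i + 1}. ${step}`)
        .join("\n");
    }

    // Re-render after each completed step; the instructions never disappear.
    console.log(renderChecklist(new Set([0])));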
Navigation depth

Using a software product, digital device, phone menu system, or Web site often involves navigating to the user’s desired information or goal. It is well established that navigation hierarchies that are broad and shallow are easier for most people—especially those who are nontechnical—to find their way around in than narrow, deep hierarchies (Cooper, 1999). This applies to hierarchies of application windows and dialog boxes, as well as to menu hierarchies (Johnson, 2007).
A related guideline: In hierarchies deeper than two levels, provide navigation “breadcrumb” paths to constantly remind users where they are (Nielsen, 1999; van Duyne et al., 2002).

These guidelines, like the others mentioned earlier, are based on the limited capacity of human working memory. Requiring a user to drill down through eight levels of dialog boxes, web pages, menus, or tables—especially with no visible reminders of their location—will probably exceed the user’s working memory capacity, thereby causing him or her to forget where he or she came from or what his or her overall goals were.
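Rendering such a breadcrumb is nearly a one-liner once the navigation path is tracked. A small TypeScript sketch with an assumed data shape:

    // Show users where they are in the hierarchy, e.g.,
    // "Home > Products > Laptops".
    type Crumb = { label: string; url: string };

    function breadcrumbTrail(path: Crumb[]): string {
      return path.map((crumb) => crumb.label).join(" > ");
    }

    console.log(
      breadcrumbTrail([
        { label: "Home", url: "/" },
        { label: "Products", url: "/products" },
        { label: "Laptops", url: "/products/laptops" },
      ])
    ); // "Home > Products > Laptops"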
CHARACTERISTICS OF LONG-TERM MEMORY

Long-term memory differs from working memory in many respects. Unlike working memory, it actually is a memory store. However, specific memories are not stored in any one neuron or location in the brain. As described earlier, memories, like perceptions, consist of patterns of activation of large sets of neurons. Related memories correspond to overlapping patterns of activated neurons. This means that every memory is stored in a distributed fashion, spread among many parts of the brain. In this way, long-term memory in the brain is similar to holographic light images.

Long-term memory evolved to serve our ancestors and us very well in getting around in our world. However, it has many weaknesses: it is error-prone, impressionist, free-associative, idiosyncratic, retroactively alterable, and easily biased by a variety of factors at the time of recording or retrieval. Let’s examine some of these weaknesses.

Error-prone

Nearly everything we’ve ever experienced is stored in our long-term memory. Unlike working memory, the capacity of human long-term memory seems almost unlimited. Adult human brains each contain about 86 billion neurons (Herculano-Houzel, 2009). As described earlier, individual neurons do not store memories; memories are encoded by networks of neurons acting together. Even if only some of the brain’s neurons are involved in memory, the large number of neurons allows for a great many different combinations of them, each capable of representing a different memory. Still, no one has yet measured or even estimated the maximum information capacity of the human brain.⁴ Whatever the capacity is, it’s a lot.

However, what is in long-term memory is not an accurate, high-resolution recording of our experiences. In terms familiar to computer engineers, one could characterize long-term memory as using heavy compression methods that drop a great deal of information. Images, concepts, events, sensations, actions—all are reduced to combinations of abstract features.

Different memories are stored at different levels of detail—that is, with more or fewer features. For example, the face of a man you met briefly who is not important to you might be stored simply as an average Caucasian male face with a beard, with no other details—a whole face reduced to three features. If you were asked later to describe the man in his absence, the most you could honestly say was that he was a “white guy with a beard.” You would not be able to pick him out of a police lineup of other Caucasian men with beards. In contrast, your memory of your best friend’s face includes many more features, allowing you to give a more detailed description and pick your friend out of any police lineup. Nonetheless, it is still a set of features, not anything like a bitmap image.

As another example, I have a vivid childhood memory of being run over by a plow and badly cut, but my father says it happened to my brother. One of us is wrong.

In the realm of human–computer interaction, a Microsoft Word user may remember that there is a command to insert a page number, but may not remember which menu the command is in. That specific feature may not have been recorded when the user learned how to insert page numbers. Alternatively, perhaps the menu-location feature was recorded, but just does not reactivate with the rest of the memory pattern when the user tries to recall how to insert a page number.

Weighted by emotions

Chapter 1 described a dog that remembered seeing a cat in his front yard every time he returned home in the family car. The dog was excited when he first saw the cat,
  • 175. asked later to describe the man in his absence, the most you could honestly say was that he was a “white guy with a beard.” You would not be able to pick him out of a police lineup of other Caucasian men with beards. In contrast, your memory of your best friend’s face includes many more features, allowing you to give a more detailed description and pick your friend out of any police lineup. Nonetheless, it is still a set of features, not anything like a bitmap image. As another example, I have a vivid childhood memory of being run over by a plow and badly cut, but my father says it happened to my brother. One of us is wrong. In the realm of human–computer interaction, a Microsoft Word user may remem- ber that there is a command to insert a page number, but may not remember which menu the command is in. That specific feature may not have been recorded when the user learned how to insert page numbers. Alternatively, perhaps the menu-loca- tion feature was recorded, but just does not reactivate with the rest of the memory pattern when the user tries to recall how to insert a page number. Weighted by emotions Chapter 1 described a dog that remembered seeing a cat in his front yard every time he returned home in the family car. The dog was excited when he first saw the cat,
Weighted by emotions

Chapter 1 described a dog that remembered seeing a cat in his front yard every time he returned home in the family car. The dog was excited when he first saw the cat, so his memory of it was strong and vivid. A comparable human example: an adult could easily have strong memories of her first day at nursery school, but probably not of her tenth. On the first day, she was probably upset about being left at the school by her parents, whereas by the tenth day, being left there was nothing unusual.

Retroactively alterable

Suppose that while you are on an ocean cruise with your family, you see a whale-shark. Years later, when you and your family are discussing the trip, you might remember seeing a whale, and one of your relatives might recall seeing a shark. For both of you, some details in long-term memory were dropped because they did not fit a common concept.
A true example comes from 1983, when the late President Ronald Reagan was speaking with Jewish leaders during his first term as president. He spoke about being in Europe during World War II and helping to liberate Jews from the Nazi concentration camps. The trouble was, he was never in Europe during World War II. When he was an actor, he was in a movie about World War II, made entirely in Hollywood. That important detail was missing from his memory.

IMPLICATIONS OF LONG-TERM MEMORY CHARACTERISTICS FOR USER-INTERFACE DESIGN

The main thing that the characteristics of long-term memory imply is that people need tools to augment it. Since prehistoric times, people have invented technologies to help them remember things over long periods: notched sticks, knotted ropes, mnemonics, verbal stories and histories retold around campfires, writing, scrolls, books, number systems, shopping lists, checklists, phone directories, datebooks, accounting ledgers, oven timers, computers, personal digital assistants (PDAs), online shared calendars, etc.

Given that humankind has a need for technologies that augment memory, it seems clear that software designers should try to provide software that fulfills that need.

A LONG-TERM MEMORY TEST

Test your long-term memory by answering the following questions:

1. Was there a roll of tape in the toolbox in Chapter 1?
2. What was your previous phone number?
3. Which of these words were not in the list presented in the working memory test earlier in this chapter: city, stream, corn, auto, twine, spade?
4. What was your first-grade teacher’s name? Second grade? Third grade? …
5. What Web site was presented earlier that does not show search terms when it displays search results?

Regarding question 3: When words are memorized, often what is retained is the concept rather than the exact word that was presented. For example, one could hear the word “town” and later recall it as “city.”

At the very least, designers should avoid developing systems that burden long-term memory. Yet that is exactly what many interactive systems do.

Authentication is one functional area in which many software systems place burdensome demands on users’ long-term memory. For example, a web application developed a few years ago told users to change their personal identification number (PIN) “to a number that is easy to remember,” but then imposed restrictions that made it impossible to do so (see Fig. 7.6). Whoever wrote those instructions seems to have realized that the PIN requirements were unreasonable, because the instructions end by advising users to write down their PIN! Never mind that writing a PIN down creates a security risk and adds yet another memory task: users must remember where they hid their written-down PIN.

FIGURE 7.6 Instructions tell users to create an easy-to-remember PIN, but the restrictions make that impossible.
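Figure 7.6’s actual restrictions are not reproduced in the text, so the sketch below invents rules of the same flavor to show the underlying problem: each added restriction tends to strike out exactly the PINs people can remember, leaving only arbitrary ones.

    from itertools import product

    def allowed(pin, birth_year="1962"):
        # Hypothetical restrictions in the spirit of Fig. 7.6 (not the real ones):
        if len(set(pin)) == 1:
            return False                # no repeated digits, e.g., 7777
        if pin in "0123456789" or pin in "9876543210":
            return False                # no ascending/descending runs, e.g., 1234
        if pin == birth_year:
            return False                # no birth years
        if pin.endswith("00"):
            return False                # no "round" numbers
        return True

    pins = ["".join(digits) for digits in product("0123456789", repeat=4)]
    survivors = [p for p in pins if allowed(p)]
    print(len(survivors), "of", len(pins), "four-digit PINs remain")
    # The PINs struck out are precisely the memorable ones; what remains is
    # arbitrary, which is why users end up writing their PIN down.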
A contrasting example of burdening people’s long-term memory for the sake of security comes from Intuit.com. To purchase software, visitors must register. The site requires users to select a security question from a menu (see Fig. 7.7). What if you can’t answer any of the questions? What if you don’t recall your first pet’s name, your high school mascot, or any of the answers to the other questions?

FIGURE 7.7 Intuit.com’s registration burdens long-term memory: users may have no unique, memorable answer for any of the questions.

But that isn’t where the memory burden ends. Some questions could have several possible answers. Many people had several elementary schools, childhood friends, or heroes. To register, they must choose a question and then remember which answer they gave to Intuit.com. How? Probably by writing it down somewhere. Then, when Intuit.com asks them the security question, they have to remember where they put the answer. Why burden people’s memory, when it would be easy to let users make up a security question for which they can easily recall the one possible answer?
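One way a registration flow could honor that question is sketched below. The field names and the canned-question list are invented for illustration; the point is simply that the design accepts either a stock question or a question the user writes, whose one obvious answer only the user knows.

    from dataclasses import dataclass

    CANNED_QUESTIONS = [                # illustrative menu, not Intuit's real one
        "What was your first pet's name?",
        "What was your high school mascot?",
    ]

    @dataclass
    class SecurityQuestion:
        question: str
        answer: str

    def make_security_question(answer, canned_index=None, custom_question=None):
        # Let users pick from the menu OR write a question of their own.
        if custom_question is not None:
            return SecurityQuestion(custom_question, answer)
        return SecurityQuestion(CANNED_QUESTIONS[canned_index], answer)

    # A user with no memorable answer to any canned question supplies one
    # whose single obvious answer only she knows:
    q = make_security_question("Rosebud",
                               custom_question="What did I name my first sled?")

As discussed next, NetworkSolutions.com’s registration form takes essentially this approach.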
Such unreasonable demands on people’s long-term memory counteract the security and productivity that computer-based applications supposedly provide (Schrage, 2005), as users:

• Place sticky notes on or near computers or “hide” them in desk drawers.
• Contact customer support to recover passwords they cannot recall.
• Use passwords that are easy for others to guess.
• Set up systems with no login requirements at all, or with one shared login and password.

The registration form at NetworkSolutions.com represents a small step toward more usable security. Like Intuit.com, it offers a choice of security questions, but it also allows users to create their own security question, one for which they can more easily remember the answer (see Fig. 7.8).

FIGURE 7.8 NetworkSolutions.com’s registration form lets users create their own security question.
Another implication of long-term memory characteristics for interactive systems is that learning and long-term retention are enhanced by user-interface consistency.