Image Processing 2ed

Digital Image
Processing
Second Edition

Rafael C. Gonzalez
University of Tennessee

Richard E. Woods
MedData Interactive

Prentice Hall
Upper Saddle River, New Jersey 07458

Library of Congress Cataloging-in-Pubblication Data
Gonzalez, Rafael C.
Digital Image Processing / Richard E. Woods
p. cm.
Includes bibliographical references
ISBN 0-201-18075-8
1. Digital Imaging. 2. Digital Techniques. I. Title.

TA1632.G66 2001
621.3—dc21 2001035846
CIP
Vice-President and Editorial Director, ECS: Marcia J. Horton
Publisher: Tom Robbins
Associate Editor: Alice Dworkin
Editorial Assistant: Jody McDonnell
Vice President and Director of Production and Manufacturing, ESM: David W. Riccardi
Executive Managing Editor: Vince O’Brien
Managing Editor: David A. George
Production Editor: Rose Kernan
Composition: Prepare, Inc.
Director of Creative Services: Paul Belfanti
Creative Director: Carole Anson
Art Director and Cover Designer: Heather Scott
Art Editor: Greg Dulles
Manufacturing Manager: Trudy Pisciotti
Manufacturing Buyer: Lisa McDowell
Senior Marketing Manager: Jennie Burger

© 2002 by Prentice-Hall, Inc.
Upper Saddle River, New Jersey 07458

All rights reserved. No part of this book may be
reproduced, in any form or by any means,
without permission in writing from the publisher.

The author and publisher of this book have used their best efforts in preparing this book. These efforts
include the development, research, and testing of the theories and programs to determine their
effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to
these programs or the documentation contained in this book. The author and publisher shall not be liable in
any event for incidental or consequential damages in connection with, or arising out of, the furnishing,
performance, or use of these programs.

Printed in the United States of America
10 9 8 7 6 5 4 3 2 1

ISBN: 0-201-18075-8

Pearson Education Ltd., London
Pearson Education Australia Pty., Limited, Sydney
Pearson Education Singapore, Pte. Ltd.
Pearson Education North Asia Ltd., Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Education de Mexico, S.A. de C.V.
Pearson Education—Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.
Pearson Education, Upper Saddle River, New Jersey

Preface

When something can be read without effort,
great effort has gone into its writing.
Enrique Jardiel Poncela

This edition is the most comprehensive revision of Digital Image Processing
since the book first appeared in 1977.As the 1977 and 1987 editions by Gonzalez
and Wintz, and the 1992 edition by Gonzalez and Woods, the present edition was
prepared with students and instructors in mind.Thus, the principal objectives of
the book continue to be to provide an introduction to basic concepts and
methodologies for digital image processing, and to develop a foundation that can
be used as the basis for further study and research in this field. To achieve these
objectives, we again focused on material that we believe is fundamental and
has a scope of application that is not limited to the solution of specialized prob-
lems. The mathematical complexity of the book remains at a level well within
the grasp of college seniors and first-year graduate students who have intro-
ductory preparation in mathematical analysis, vectors, matrices, probability, sta-
tistics, and rudimentary computer programming.
The present edition was influenced significantly by a recent market survey
conducted by Prentice Hall. The major findings of this survey were:
1. A need for more motivation in the introductory chapter regarding the spec-
trum of applications of digital image processing.
2. A simplification and shortening of material in the early chapters in order
to “get to the subject matter” as quickly as possible.
3. A more intuitive presentation in some areas, such as image transforms and
image restoration.
4. Individual chapter coverage of color image processing, wavelets, and image
morphology.
5. An increase in the breadth of problems at the end of each chapter.
The reorganization that resulted in this edition is our attempt at providing a
reasonable degree of balance between rigor in the presentation, the findings of
the market survey, and suggestions made by students, readers, and colleagues
since the last edition of the book. The major changes made in the book are as
follows.
Chapter 1 was rewritten completely.The main focus of the current treatment
is on examples of areas that use digital image processing. While far from ex-
haustive, the examples shown will leave little doubt in the reader’s mind re-
garding the breadth of application of digital image processing methodologies.
Chapter 2 is totally new also. The focus of the presentation in this chapter is on
how digital images are generated, and on the closely related concepts of

xv

xvi I Preface

sampling, aliasing, Moiré patterns, and image zooming and shrinking. The new
material and the manner in which these two chapters were reorganized address
directly the first two findings in the market survey mentioned above.
Chapters 3 though 6 in the current edition cover the same concepts as Chap-
ters 3 through 5 in the previous edition, but the scope is expanded and the pre-
sentation is totally different. In the previous edition, Chapter 3 was devoted
exclusively to image transforms. One of the major changes in the book is that
image transforms are now introduced when they are needed.This allowed us to
begin discussion of image processing techniques much earlier than before, fur-
ther addressing the second finding of the market survey. Chapters 3 and 4 in the
current edition deal with image enhancement, as opposed to a single chapter
(Chapter 4) in the previous edition. The new organization of this material does
not imply that image enhancement is more important than other areas. Rather,
we used it as an avenue to introduce spatial methods for image processing
(Chapter 3), as well as the Fourier transform, the frequency domain, and image
filtering (Chapter 4). Our purpose for introducing these concepts in the context
of image enhancement (a subject particularly appealing to beginners) was to in-
crease the level of intuitiveness in the presentation, thus addressing partially
the third major finding in the marketing survey. This organization also gives in-
structors flexibility in the amount of frequency-domain material they wish to
cover.
Chapter 5 also was rewritten completely in a more intuitive manner. The
coverage of this topic in earlier editions of the book was based on matrix theory.
Although unified and elegant, this type of presentation is difficult to follow,
particularly by undergraduates. The new presentation covers essentially the
same ground, but the discussion does not rely on matrix theory and is much
easier to understand, due in part to numerous new examples. The price paid for
this newly gained simplicity is the loss of a unified approach, in the sense that
in the earlier treatment a number of restoration results could be derived from
one basic formulation. On balance, however, we believe that readers (especial-
ly beginners) will find the new treatment much more appealing and easier to fol-
low. Also, as indicated below, the old material is stored in the book Web site for
easy access by individuals preferring to follow a matrix-theory formulation.
Chapter 6 dealing with color image processing is new. Interest in this area has
increased significantly in the past few years as a result of growth in the use of
digital images for Internet applications. Our treatment of this topic represents
a significant expansion of the material from previous editions. Similarly Chap-
ter 7, dealing with wavelets, is new. In addition to a number of signal process-
ing applications, interest in this area is motivated by the need for more
sophisticated methods for image compression, a topic that in turn is motivated
by a increase in the number of images transmitted over the Internet or stored
in Web servers. Chapter 8 dealing with image compression was updated to in-
clude new compression methods and standards, but its fundamental structure
remains the same as in the previous edition. Several image transforms, previously
covered in Chapter 3 and whose principal use is compression, were moved to
this chapter.

I Preface xvii

Chapter 9, dealing with image morphology, is new. It is based on a signifi-
cant expansion of the material previously included as a section in the chapter
on image representation and description. Chapter 10, dealing with image seg-
mentation, has the same basic structure as before, but numerous new examples
were included and a new section on segmentation by morphological watersheds
was added. Chapter 11, dealing with image representation and description, was
shortened slightly by the removal of the material now included in Chapter 9.
New examples were added and the Hotelling transform (description by princi-
pal components), previously included in Chapter 3, was moved to this chapter.
Chapter 12 dealing with object recognition was shortened by the removal of
topics dealing with knowledge-based image analysis, a topic now covered in
considerable detail in a number of books which we reference in Chapters 1 and
12. Experience since the last edition of Digital Image Processing indicates that
the new, shortened coverage of object recognition is a logical place at which to
conclude the book.
Although the book is totally self-contained, we have established a compan-
ion web site (see inside front cover) designed to provide support to users of the
book. For students following a formal course of study or individuals embarked
on a program of self study, the site contains a number of tutorial reviews on
background material such as probability, statistics, vectors, and matrices, pre-
pared at a basic level and written using the same notation as in the book.
Detailed solutions to many of the exercises in the book also are provided. For
instruction, the site contains suggested teaching outlines, classroom presentation
materials, laboratory experiments, and various image databases (including most
images from the book). In addition, part of the material removed from the pre-
vious edition is stored in the Web site for easy download and classroom use, at
the discretion of the instructor.A downloadable instructor’s manual containing
sample curricula, solutions to sample laboratory experiments, and solutions to
all problems in the book is available to instructors who have adopted the book
for classroom use.
This edition of Digital Image Processing is a reflection of the significant
progress that has been made in this field in just the past decade. As is usual in
a project such as this, progress continues after work on the manuscript stops. One
of the reasons earlier versions of this book have been so well accepted through-
out the world is their emphasis on fundamental concepts, an approach that,
among other things, attempts to provide a measure of constancy in a rapidly-
evolving body of knowledge. We have tried to observe that same principle in
preparing this edition of the book.

R.C.G.
R.E.W.

Contents

Preface xv
Acknowledgements xviii
About the Authors xix

1
1.1
Introduction 15
What Is Digital Image Processing? 15
1.2 The Origins of Digital Image Processing 17
1.3 Examples of Fields that Use Digital Image Processing 21
1.3.1 Gamma-Ray Imaging 22
1.3.2 X-ray Imaging 23
1.3.3 Imaging in the Ultraviolet Band 25
1.3.4 Imaging in the Visible and Infrared Bands 26
1.3.5 Imaging in the Microwave Band 32
1.3.6 Imaging in the Radio Band 34
1.3.7 Examples in which Other Imaging Modalities Are Used 34
1.4 Fundamental Steps in Digital Image Processing 39
1.5 Components of an Image Processing System 42
Summary 44
References and Further Reading 45

2
2.1
Digital Image Fundamentals
Elements of Visual Perception 34
34
2.1.1 Structure of the Human Eye 35
2.1.2 Image Formation in the Eye 37
2.1.3 Brightness Adaptation and Discrimination 38
2.2 Light and the Electromagnetic Spectrum 42
2.3 Image Sensing and Acquisition 45
2.3.1 Image Acquisition Using a Single Sensor 47
2.3.2 Image Acquisition Using Sensor Strips 48
2.3.3 Image Acquisition Using Sensor Arrays 49
2.3.4 A Simple Image Formation Model 50
2.4 Image Sampling and Quantization 52
2.4.1 Basic Concepts in Sampling and Quantization 52
2.4.2 Representing Digital Images 54
2.4.3 Spatial and Gray-Level Resolution 57
2.4.4 Aliasing and Moiré Patterns 62
2.4.5 Zooming and Shrinking Digital Images 64

vii

viii I Contents

2.5 Some Basic Relationships Between Pixels 66
2.5.1 Neighbors of a Pixel 66
2.5.2 Adjacency, Connectivity, Regions, and Boundaries 66
2.5.3 Distance Measures 68
2.5.4 Image Operations on a Pixel Basis 69
2.6 Linear and Nonlinear Operations 70
Summary 70
Problems 71

3
3.1
Image Enhancement in the Spatial Domain
Background 76
75
3.2 Some Basic Gray Level Transformations 78
3.2.1 Image Negatives 78
3.2.2 Log Transformations 79
3.2.3 Power-Law Transformations 80
3.2.4 Piecewise-Linear Transformation Functions 85
3.3 Histogram Processing 88
3.3.1 Histogram Equalization 91
3.3.2 Histogram Matching (Specification) 94
3.3.3 Local Enhancement 103
3.3.4 Use of Histogram Statistics for Image Enhancement 103
3.4 Enhancement Using Arithmetic/Logic Operations 108
3.4.1 Image Subtraction 110
3.4.2 Image Averaging 112
3.5 Basics of Spatial Filtering 116
3.6 Smoothing Spatial Filters 119
3.6.1 Smoothing Linear Filters 119
3.6.2 Order-Statistics Filters 123
3.7 Sharpening Spatial Filters 125
3.7.1 Foundation 125
3.7.2 Use of Second Derivatives for Enhancement–
The Laplacian 128
3.7.3 Use of First Derivatives for Enhancement—The Gradient 134
3.8 Combining Spatial Enhancement Methods 137
Summary 141
Problems 142

4 Image Enhancement in the Frequency
Domain 147
4.1 Background 148

I Contents ix

4.2 Introduction to the Fourier Transform and the Frequency
Domain 149
4.2.1 The One-Dimensional Fourier Transform and its Inverse 150
4.2.2 The Two-Dimensional DFT and Its Inverse 154
4.2.3 Filtering in the Frequency Domain 156
4.2.4 Correspondence between Filtering in the Spatial
and Frequency Domains 161
4.3 Smoothing Frequency-Domain Filters 167
4.3.1 Ideal Lowpass Filters 167
4.3.2 Butterworth Lowpass Filters 173
4.3.3 Gaussian Lowpass Filters 175
4.3.4 Additional Examples of Lowpass Filtering 178
4.4 Sharpening Frequency Domain Filters 180
4.4.1 Ideal Highpass Filters 182
4.4.2 Butterworth Highpass Filters 183
4.4.3 Gaussian Highpass Filters 184
4.4.4 The Laplacian in the Frequency Domain 185
4.4.5 Unsharp Masking, High-Boost Filtering,
and High-Frequency Emphasis Filtering 187
4.5 Homomorphic Filtering 191
4.6 Implementation 194
4.6.1 Some Additional Properties of the 2-D Fourier Transform 194
4.6.2 Computing the Inverse Fourier Transform Using a Forward
Transform Algorithm 198
4.6.3 More on Periodicity: the Need for Padding 199
4.6.4 The Convolution and Correlation Theorems 205
4.6.5 Summary of Properties of the 2-D Fourier Transform 208
4.6.6 The Fast Fourier Transform 208
4.6.7 Some Comments on Filter Design 213
Summary 214
References 214
Problems 215

5
5.1
Image Restoration 220
A Model of the Image Degradation/Restoration Process 221
5.2 Noise Models 222
5.2.1 Spatial and Frequency Properties of Noise 222
5.2.2 Some Important Noise Probability Density Functions 222
5.2.3 Periodic Noise 227
5.2.4 Estimation of Noise Parameters 227
5.3 Restoration in the Presence of Noise Only–Spatial Filtering 230
5.3.1 Mean Filters 231
5.3.2 Order-Statistics Filters 233
5.3.3 Adaptive Filters 237

x I Contents

5.4 Periodic Noise Reduction by Frequency Domain Filtering 243
5.4.1 Bandreject Filters 244
5.4.2 Bandpass Filters 245
5.4.3 Notch Filters 246
5.4.4 Optimum Notch Filtering 248
5.5 Linear, Position-Invariant Degradations 254
5.6 Estimating the Degradation Function 256
5.6.1 Estimation by Image Observation 256
5.6.2 Estimation by Experimentation 257
5.6.3 Estimation by Modeling 258
5.7 Inverse Filtering 261
5.8 Minimum Mean Square Error (Wiener) Filtering 262
5.9 Constrained Least Squares Filtering 266
5.10 Geometric Mean Filter 270
5.11 Geometric Transformations 270
5.11.1 Spatial Transformations 271
5.11.2 Gray-Level Interpolation 272
Summary 276
Problems 278

6
6.1
Color Image Processing
Color Fundamentals 283
282
6.2 Color Models 289
6.2.1 The RGB Color Model 290
6.2.2 The CMY and CMYK Color Models 294
6.2.3 The HSI Color Model 295
6.3 Pseudocolor Image Processing 302
6.3.1 Intensity Slicing 303
6.3.2 Gray Level to Color Transformations 308
6.4 Basics of Full-Color Image Processing 313
6.5 Color Transformations 315
6.5.1 Formulation 315
6.5.2 Color Complements 318
6.5.3 Color Slicing 320
6.5.4 Tone and Color Corrections 322
6.5.5 Histogram Processing 326
6.6 Smoothing and Sharpening 327
6.6.1 Color Image Smoothing 328
6.6.2 Color Image Sharpening 330
6.7 Color Segmentation 331
6.7.1 Segmentation in HSI Color Space 331
6.7.2 Segmentation in RGB Vector Space 333
6.7.3 Color Edge Detection 335

I Contents xi

6.8 Noise in Color Images 339
6.9 Color Image Compression 342
Summary 343
Problems 344

7
7.1
Wavelets and Multiresolution Processing
Background 350
349
7.1.1 Image Pyramids 351
7.1.2 Subband Coding 354
7.1.3 The Haar Transform 360
7.2 Multiresolution Expansions 363
7.2.1 Series Expansions 364
7.2.2 Scaling Functions 365
7.2.3 Wavelet Functions 369
7.3 Wavelet Transforms in One Dimension 372
7.3.1 The Wavelet Series Expansions 372
7.3.2 The Discrete Wavelet Transform 375
7.3.3 The Continuous Wavelet Transform 376
7.4 The Fast Wavelet Transform 379
7.5 Wavelet Transforms in Two Dimensions 386
7.6 Wavelet Packets 394
Summary 402
Problems 404

8
8.1
Image Compression
Fundamentals 411
409
8.1.1 Coding Redundancy 412
8.1.2 Interpixel Redundancy 414
8.1.3 Psychovisual Redundancy 417
8.1.4 Fidelity Criteria 419
8.2 Image Compression Models 421
8.2.1 The Source Encoder and Decoder 421
8.2.2 The Channel Encoder and Decoder 423
8.3 Elements of Information Theory 424
8.3.1 Measuring Information 424
8.3.2 The Information Channel 425
8.3.3 Fundamental Coding Theorems 430
8.3.4 Using Information Theory 437
8.4 Error-Free Compression 440
8.4.1 Variable-Length Coding 440

xii I Contents

8.4.2 LZW Coding 446
8.4.3 Bit-Plane Coding 448
8.4.4 Lossless Predictive Coding 456
8.5 Lossy Compression 459
8.5.1 Lossy Predictive Coding 459
8.5.2 Transform Coding 467
8.5.3 Wavelet Coding 486
8.6 Image Compression Standards 492
8.6.1 Binary Image Compression Standards 493
8.6.2 Continuous Tone Still Image Compression Standards 498
8.6.3 Video Compression Standards 510
Summary 513
Problems 514

9
9.1
Morphological Image Processing
Preliminaries 520
519
9.1.1 Some Basic Concepts from Set Theory 520
9.1.2 Logic Operations Involving Binary Images 522
9.2 Dilation and Erosion 523
9.2.1 Dilation 523
9.2.2 Erosion 525
9.3 Opening and Closing 528
9.4 The Hit-or-Miss Transformation 532
9.5 Some Basic Morphological Algorithms 534
9.5.1 Boundary Extraction 534
9.5.2 Region Filling 535
9.5.3 Extraction of Connected Components 536
9.5.4 Convex Hull 539
9.5.5 Thinning 541
9.5.6 Thickening 541
9.5.7 Skeletons 543
9.5.8 Pruning 545
9.5.9 Summary of Morphological Operations on Binary Images 547
9.6 Extensions to Gray-Scale Images 550
9.6.1 Dilation 550
9.6.2 Erosion 552
9.6.3 Opening and Closing 554
9.6.4 Some Applications of Gray-Scale Morphology 556
Summary 560
Problems 560

I Contents xiii

10 Image Segmentation 567
10.1 Detection of Discontinuities 568
10.1.1 Point Detection 569
10.1.2 Line Detection 570
10.1.3 Edge Detection 572
10.2 Edge Linking and Boundary Detection 585
10.2.1 Local Processing 585
10.2.2 Global Processing via the Hough Transform 587
10.2.3 Global Processing via Graph-Theoretic Techniques 591
10.3 Thresholding 595
10.3.1 Foundation 595
10.3.2 The Role of Illumination 596
10.3.3 Basic Global Thresholding 598
10.3.4 Basic Adaptive Thresholding 600
10.3.5 Optimal Global and Adaptive Thresholding 602
10.3.6 Use of Boundary Characteristics for Histogram Improvement
and Local Thresholding 608
10.3.7 Thresholds Based on Several Variables 611
10.4 Region-Based Segmentation 612
10.4.1 Basic Formulation 612
10.4.2 Region Growing 613
10.4.3 Region Splitting and Merging 615
10.5 Segmentation by Morphological Watersheds 617
10.5.1 Basic Concepts 617
10.5.2 Dam Construction 620
10.5.3 Watershed Segmentation Algorithm 622
10.5.4 The Use of Markers 624
10.6 The Use of Motion in Segmentation 626
10.6.1 Spatial Techniques 626
10.6.2 Frequency Domain Techniques 630
Summary 634
Problems 636

11 Representation and Description
11.1 Representation 644
643
11.1.1 Chain Codes 644
11.1.2 Polygonal Approximations 646
11.1.3 Signatures 648
11.1.4 Boundary Segments 649
11.1.5 Skeletons 650

xiv I Contents

11.2 Boundary Descriptors 653
11.2.1 Some Simple Descriptors 653
11.2.2 Shape Numbers 654
11.2.3 Fourier Descriptors 655
11.2.4 Statistical Moments 659
11.3 Regional Descriptors 660
11.3.1 Some Simple Descriptors 661
11.3.2 Topological Descriptors 661
11.3.3 Texture 665
11.3.4 Moments of Two-Dimensional Functions 672
11.4 Use of Principal Components for Description 675
11.5 Relational Descriptors 683
Summary 687
Problems 689

12 Object Recognition 693
12.1 Patterns and Pattern Classes 693
12.2 Recognition Based on Decision-Theoretic Methods 698
12.2.1 Matching 698
12.2.2 Optimum Statistical Classifiers 704
12.2.3 Neural Networks 712
12.3 Structural Methods 732
12.3.1 Matching Shape Numbers 732
12.3.2 String Matching 734
12.3.3 Syntactic Recognition of Strings 735
12.3.4 Syntactic Recognition of Trees 740
Summary 750
Problems 750

Bibliography 755
Index 779

1 Introduction

One picture is worth more than ten thousand words.
Anonymous

Preview
Interest in digital image processing methods stems from two principal applica-
tion areas: improvement of pictorial information for human interpretation; and
processing of image data for storage, transmission, and representation for au-
tonomous machine perception.This chapter has several objectives: (1) to define
the scope of the field that we call image processing; (2) to give a historical per-
spective of the origins of this field; (3) to give an idea of the state of the art in
image processing by examining some of the principal areas in which it is ap-
plied; (4) to discuss briefly the principal approaches used in digital image pro-
cessing; (5) to give an overview of the components contained in a typical,
general-purpose image processing system; and (6) to provide direction to the
books and other literature where image processing work normally is reported.

1.1 What Is Digital Image Processing?
An image may be defined as a two-dimensional function, f(x, y), where x and
y are spatial (plane) coordinates, and the amplitude of f at any pair of coordi-
nates (x, y) is called the intensity or gray level of the image at that point. When
x, y, and the amplitude values of f are all finite, discrete quantities, we call the
image a digital image. The field of digital image processing refers to processing
digital images by means of a digital computer. Note that a digital image is com-
posed of a finite number of elements, each of which has a particular location and

1

2 Chapter 1 I Introduction

value. These elements are referred to as picture elements, image elements, pels,
and pixels. Pixel is the term most widely used to denote the elements of a digi-
tal image. We consider these definitions in more formal terms in Chapter 2.
Vision is the most advanced of our senses, so it is not surprising that images
play the single most important role in human perception. However, unlike
humans, who are limited to the visual band of the electromagnetic (EM) spec-
trum, imaging machines cover almost the entire EM spectrum, ranging from
gamma to radio waves. They can operate on images generated by sources that
humans are not accustomed to associating with images. These include ultra-
sound, electron microscopy, and computer-generated images.Thus, digital image
processing encompasses a wide and varied field of applications.
There is no general agreement among authors regarding where image pro-
cessing stops and other related areas, such as image analysis and computer vi-
sion, start. Sometimes a distinction is made by defining image processing as a
discipline in which both the input and output of a process are images.We believe
this to be a limiting and somewhat artificial boundary. For example, under this
definition, even the trivial task of computing the average intensity of an image
(which yields a single number) would not be considered an image processing op-
eration. On the other hand, there are fields such as computer vision whose ul-
timate goal is to use computers to emulate human vision, including learning
and being able to make inferences and take actions based on visual inputs. This
area itself is a branch of artificial intelligence (AI) whose objective is to emu-
late human intelligence.The field of AI is in its earliest stages of infancy in terms
of development, with progress having been much slower than originally antic-
ipated. The area of image analysis (also called image understanding) is in be-
tween image processing and computer vision.
There are no clear-cut boundaries in the continuum from image processing
at one end to computer vision at the other. However, one useful paradigm is
to consider three types of computerized processes in this continuum: low-,
mid-, and high-level processes. Low-level processes involve primitive opera-
tions such as image preprocessing to reduce noise, contrast enhancement, and
image sharpening. A low-level process is characterized by the fact that both
its inputs and outputs are images. Mid-level processing on images involves
tasks such as segmentation (partitioning an image into regions or objects),
description of those objects to reduce them to a form suitable for computer
processing, and classification (recognition) of individual objects. A mid-level
process is characterized by the fact that its inputs generally are images, but its
outputs are attributes extracted from those images (e.g., edges, contours, and
the identity of individual objects). Finally, higher-level processing involves
“making sense” of an ensemble of recognized objects, as in image analysis,
and, at the far end of the continuum, performing the cognitive functions nor-
mally associated with vision.
Based on the preceding comments, we see that a logical place of overlap be-
tween image processing and image analysis is the area of recognition of indi-
vidual regions or objects in an image. Thus, what we call in this book digital
image processing encompasses processes whose inputs and outputs are images

1.2 I The Origins of Digital Image Processing 3

and, in addition, encompasses processes that extract attributes from images, up
to and including the recognition of individual objects. As a simple illustration
to clarify these concepts, consider the area of automated analysis of text. The
processes of acquiring an image of the area containing the text, preprocessing
that image, extracting (segmenting) the individual characters, describing the
characters in a form suitable for computer processing, and recognizing those
individual characters are in the scope of what we call digital image processing
in this book. Making sense of the content of the page may be viewed as being
in the domain of image analysis and even computer vision, depending on the
level of complexity implied by the statement “making sense.” As will become
evident shortly, digital image processing, as we have defined it, is used success-
fully in a broad range of areas of exceptional social and economic value.The con-
cepts developed in the following chapters are the foundation for the methods
used in those application areas.

1.2 The Origins of Digital Image Processing
One of the first applications of digital images was in the newspaper industry,
when pictures were first sent by submarine cable between London and New
York. Introduction of the Bartlane cable picture transmission system in the
early 1920s reduced the time required to transport a picture across the Atlantic
from more than a week to less than three hours. Specialized printing equipment
coded pictures for cable transmission and then reconstructed them at the re-
ceiving end. Figure 1.1 was transmitted in this way and reproduced on a tele-
graph printer fitted with typefaces simulating a halftone pattern.
Some of the initial problems in improving the visual quality of these early dig-
ital pictures were related to the selection of printing procedures and the distri-
bution of intensity levels. The printing method used to obtain Fig. 1.1 was
abandoned toward the end of 1921 in favor of a technique based on photo-
graphic reproduction made from tapes perforated at the telegraph receiving
terminal. Figure 1.2 shows an image obtained using this method. The improve-
ments over Fig. 1.1 are evident, both in tonal quality and in resolution.

FIGURE 1.1 A
digital picture
produced in 1921
from a coded tape
by a telegraph
printer with
special type faces.
(McFarlane.†)

†
References in the Bibliography at the end of the book are listed in alphabetical order by authors’ last
names.


FIGURE 1.2 A
digital picture
made in 1922
from a tape
punched after the
signals had
crossed the
Atlantic twice.
Some errors are
visible.
(McFarlane.)

The early Bartlane systems were capable of coding images in five distinct
levels of gray. This capability was increased to 15 levels in 1929. Figure 1.3 is
typical of the type of images that could be obtained using the 15-tone equipment.
During this period, introduction of a system for developing a film plate via light
beams that were modulated by the coded picture tape improved the reproduc-
tion process considerably.
Although the examples just cited involve digital images, they are not con-
sidered digital image processing results in the context of our definition because
computers were not involved in their creation.Thus, the history of digital image
processing is intimately tied to the development of the digital computer. In fact,
digital images require so much storage and computational power that progress
in the field of digital image processing has been dependent on the development
of digital computers and of supporting technologies that include data storage,
display, and transmission.
The idea of a computer goes back to the invention of the abacus in Asia
Minor, more than 5000 years ago. More recently, there were developments in the
past two centuries that are the foundation of what we call a computer today.
However, the basis for what we call a modern digital computer dates back to only
the 1940s with the introduction by John von Neumann of two key concepts:
(1) a memory to hold a stored program and data, and (2) conditional branch-
ing. These two ideas are the foundation of a central processing unit (CPU),
which is at the heart of computers today. Starting with von Neumann, there were

FIGURE 1.3
Unretouched
cable picture of
Generals Pershing
and Foch,
transmitted in
1929 from
London to New
York by 15-tone
equipment.
(McFarlane.)

1.2 I The Origins of Digital Image Processing 5

a series of key advances that led to computers powerful enough to be used for
digital image processing. Briefly, these advances may be summarized as follows:
(1) the invention of the transistor by Bell Laboratories in 1948; (2) the devel-
opment in the 1950s and 1960s of the high-level programming languages
COBOL (Common Business-Oriented Language) and FORTRAN (Formula
Translator); (3) the invention of the integrated circuit (IC) at Texas Instruments
in 1958; (4) the development of operating systems in the early 1960s; (5) the de-
velopment of the microprocessor (a single chip consisting of the central pro-
cessing unit, memory, and input and output controls) by Intel in the early 1970s;
(6) introduction by IBM of the personal computer in 1981; and (7) progressive
miniaturization of components, starting with large scale integration (LI) in the
late 1970s, then very large scale integration (VLSI) in the 1980s, to the present
use of ultra large scale integration (ULSI). Concurrent with these advances
were developments in the areas of mass storage and display systems, both of
which are fundamental requirements for digital image processing.
The first computers powerful enough to carry out meaningful image pro-
cessing tasks appeared in the early 1960s.The birth of what we call digital image
processing today can be traced to the availability of those machines and the
onset of the space program during that period. It took the combination of those
two developments to bring into focus the potential of digital image processing
concepts. Work on using computer techniques for improving images from a
space probe began at the Jet Propulsion Laboratory (Pasadena, California) in
1964 when pictures of the moon transmitted by Ranger 7 were processed by a
computer to correct various types of image distortion inherent in the on-board
television camera. Figure 1.4 shows the first image of the moon taken by
Ranger 7 on July 31, 1964 at 9 : 09 A.M. Eastern Daylight Time (EDT), about 17
minutes before impacting the lunar surface (the markers, called reseau marks,
are used for geometric corrections, as discussed in Chapter 5). This also is the
first image of the moon taken by a U.S. spacecraft. The imaging lessons learned
with Ranger 7 served as the basis for improved methods used to enhance and
restore images from the Surveyor missions to the moon, the Mariner series of
flyby missions to Mars, the Apollo manned flights to the moon, and others.

FIGURE 1.4 The
first picture of the
moon by a U.S.
spacecraft.
Ranger 7 took this
image on July 31,
1964 at 9 : 09 A.M.
EDT, about 17
minutes before
impacting the
lunar surface.
(Courtesy of
NASA.)


In parallel with space applications, digital image processing techniques began in
the late 1960s and early 1970s to be used in medical imaging, remote Earth re-
sources observations, and astronomy. The invention in the early 1970s of comput-
erized axial tomography (CAT), also called computerized tomography (CT) for
short, is one of the most important events in the application of image processing in
medical diagnosis. Computerized axial tomography is a process in which a ring of
detectors encircles an object (or patient) and an X-ray source, concentric with the
detector ring, rotates about the object.The X-rays pass through the object and are
collected at the opposite end by the corresponding detectors in the ring. As the
source rotates, this procedure is repeated. Tomography consists of algorithms that
use the sensed data to construct an image that represents a “slice” through the ob-
ject. Motion of the object in a direction perpendicular to the ring of detectors pro-
duces a set of such slices, which constitute a three-dimensional (3-D) rendition of
the inside of the object. Tomography was invented independently by Sir Godfrey
N. Hounsfield and Professor Allan M. Cormack, who shared the 1979 Nobel Prize
in Medicine for their invention. It is interesting to note that X-rays were discov-
ered in 1895 by Wilhelm Conrad Roentgen, for which he received the 1901 Nobel
Prize for Physics. These two inventions, nearly 100 years apart, led to some of the
most active application areas of image processing today.
From the 1960s until the present, the field of image processing has grown vig-
orously. In addition to applications in medicine and the space program, digital
image processing techniques now are used in a broad range of applications. Com-
puter procedures are used to enhance the contrast or code the intensity levels into
color for easier interpretation of X-rays and other images used in industry, medi-
cine, and the biological sciences. Geographers use the same or similar techniques
to study pollution patterns from aerial and satellite imagery. Image enhancement
and restoration procedures are used to process degraded images of unrecoverable
objects or experimental results too expensive to duplicate. In archeology, image
processing methods have successfully restored blurred pictures that were the only
available records of rare artifacts lost or damaged after being photographed. In
physics and related fields, computer techniques routinely enhance images of ex-
periments in areas such as high-energy plasmas and electron microscopy. Similar-
ly successful applications of image processing concepts can be found in astronomy,
biology, nuclear medicine, law enforcement, defense, and industrial applications.
These examples illustrate processing results intended for human interpreta-
tion.The second major area of application of digital image processing techniques
mentioned at the beginning of this chapter is in solving problems dealing with
machine perception. In this case, interest focuses on procedures for extracting
from an image information in a form suitable for computer processing. Often,
this information bears little resemblance to visual features that humans use in
interpreting the content of an image. Examples of the type of information used
in machine perception are statistical moments, Fourier transform coefficients, and
multidimensional distance measures. Typical problems in machine perception
that routinely utilize image processing techniques are automatic character recog-
nition, industrial machine vision for product assembly and inspection, military
recognizance, automatic processing of fingerprints, screening of X-rays and blood
samples, and machine processing of aerial and satellite imagery for weather

1.3 I Examples of Fields that Use Digital Image Processing 7

prediction and environmental assessment.The continuing decline in the ratio of
computer price to performance and the expansion of networking and commu-
nication bandwidth via the World Wide Web and the Internet have created un-
precedented opportunities for continued growth of digital image processing.
Some of these application areas are illustrated in the following section.

1.3 Examples of Fields that Use Digital Image Processing
Today, there is almost no area of technical endeavor that is not impacted in
some way by digital image processing. We can cover only a few of these appli-
cations in the context and space of the current discussion. However, limited as
it is, the material presented in this section will leave no doubt in the reader’s
mind regarding the breadth and importance of digital image processing. We
show in this section numerous areas of application, each of which routinely uti-
lizes the digital image processing techniques developed in the following chap-
ters. Many of the images shown in this section are used later in one or more of
the examples given in the book. All images shown are digital.
The areas of application of digital image processing are so varied that some
form of organization is desirable in attempting to capture the breadth of this
field. One of the simplest ways to develop a basic understanding of the extent of
image processing applications is to categorize images according to their source
(e.g., visual, X-ray, and so on).The principal energy source for images in use today
is the electromagnetic energy spectrum. Other important sources of energy in-
clude acoustic, ultrasonic, and electronic (in the form of electron beams used in
electron microscopy). Synthetic images, used for modeling and visualization, are
generated by computer. In this section we discuss briefly how images are gener-
ated in these various categories and the areas in which they are applied. Meth-
ods for converting images into digital form are discussed in the next chapter.
Images based on radiation from the EM spectrum are the most familiar, es-
pecially images in the X-ray and visual bands of the spectrum. Electromagnet-
ic waves can be conceptualized as propagating sinusoidal waves of varying
wavelengths, or they can be thought of as a stream of massless particles, each
traveling in a wavelike pattern and moving at the speed of light. Each massless
particle contains a certain amount (or bundle) of energy. Each bundle of ener-
gy is called a photon. If spectral bands are grouped according to energy per
photon, we obtain the spectrum shown in Fig. 1.5, ranging from gamma rays
(highest energy) at one end to radio waves (lowest energy) at the other. The
bands are shown shaded to convey the fact that bands of the EM spectrum are
not distinct but rather transition smoothly from one to the other.

Energy of one photon (electron volts)
106 105 104 103 102 101 10–1 10–1 10–2 10–3 10–4 10–5 10–6 10–7 10–8 10–9

Gamma rays X-rays Ultraviolet Visible Infrared Microwaves Radio waves

FIGURE 1.5 The electromagnetic spectrum arranged according to energy per photon.


1.3.1 Gamma-Ray Imaging
Major uses of imaging based on gamma rays include nuclear medicine and as-
tronomical observations. In nuclear medicine, the approach is to inject a pa-
tient with a radioactive isotope that emits gamma rays as it decays. Images are
produced from the emissions collected by gamma ray detectors. Figure 1.6(a)
shows an image of a complete bone scan obtained by using gamma-ray imag-
ing. Images of this sort are used to locate sites of bone pathology, such as in-
fections or tumors. Figure 1.6(b) shows another major modality of nuclear
imaging called positron emission tomography (PET). The principle is the same

a b
c d
FIGURE 1.6
Examples of
gamma-ray
imaging. (a) Bone
scan. (b) PET
image. (c) Cygnus
Loop. (d) Gamma
radiation (bright
spot) from a
reactor valve.
(Images courtesy
of (a) G.E.
Medical Systems,
(b) Dr. Michael
E. Casey, CTI
PET Systems,
(c) NASA,
(d) Professors
Zhong He and
David K. Wehe,
University of
Michigan.)


as with X-ray tomography, mentioned briefly in Section 1.2. However, instead
of using an external source of X-ray energy, the patient is given a radioactive iso-
tope that emits positrons as it decays. When a positron meets an electron, both
are annihilated and two gamma rays are given off. These are detected and a to-
mographic image is created using the basic principles of tomography.The image
shown in Fig. 1.6(b) is one sample of a sequence that constitutes a 3-D rendi-
tion of the patient. This image shows a tumor in the brain and one in the lung,
easily visible as small white masses.
A star in the constellation of Cygnus exploded about 15,000 years ago, gen-
erating a superheated stationary gas cloud (known as the Cygnus Loop) that
glows in a spectacular array of colors. Figure 1.6(c) shows the Cygnus Loop im-
aged in the gamma-ray band. Unlike the two examples shown in Figs. 1.6(a)
and (b), this image was obtained using the natural radiation of the object being
imaged. Finally, Fig. 1.6(d) shows an image of gamma radiation from a valve in
a nuclear reactor. An area of strong radiation is seen in the lower, left side of
the image.

1.3.2 X-ray Imaging
X-rays are among the oldest sources of EM radiation used for imaging. The
best known use of X-rays is medical diagnostics, but they also are used exten-
sively in industry and other areas, like astronomy. X-rays for medical and in-
dustrial imaging are generated using an X-ray tube, which is a vacuum tube
with a cathode and anode. The cathode is heated, causing free electrons to be
released. These electrons flow at high speed to the positively charged anode.
When the electrons strike a nucleus, energy is released in the form of X-ray ra-
diation. The energy (penetrating power) of the X-rays is controlled by a volt-
age applied across the anode, and the number of X-rays is controlled by a current
applied to the filament in the cathode. Figure 1.7(a) shows a familiar chest X-ray
generated simply by placing the patient between an X-ray source and a film
sensitive to X-ray energy. The intensity of the X-rays is modified by absorption
as they pass through the patient, and the resulting energy falling on the film de-
velops it, much in the same way that light develops photographic film. In digi-
tal radiography, digital images are obtained by one of two methods: (1) by
digitizing X-ray films; or (2) by having the X-rays that pass through the patient
fall directly onto devices (such as a phosphor screen) that convert X-rays to
light.The light signal in turn is captured by a light-sensitive digitizing system.We
discuss digitization in detail in Chapter 2.
Angiography is another major application in an area called contrast-
enhancement radiography. This procedure is used to obtain images (called
angiograms) of blood vessels. A catheter (a small, flexible, hollow tube) is in-
serted, for example, into an artery or vein in the groin. The catheter is thread-
ed into the blood vessel and guided to the area to be studied.When the catheter
reaches the site under investigation, an X-ray contrast medium is injected
through the catheter. This enhances contrast of the blood vessels and enables
the radiologist to see any irregularities or blockages. Figure 1.7(b) shows an ex-
ample of an aortic angiogram. The catheter can be seen being inserted into the
large blood vessel on the lower left of the picture. Note the high contrast of the


a FIGURE 1.7 Examples of X-ray imaging. (a) Chest X-ray. (b) Aortic angiogram. (c) Head
d CT. (d) Circuit boards. (e) Cygnus Loop. (Images courtesy of (a) and (c) Dr. David
b
c e R. Pickens, Dept. of Radiology & Radiological Sciences, Vanderbilt University Medical
Center, (b) Dr. Thomas R. Gest, Division of Anatomical Sciences, University of Michi-
gan Medical School, (d) Mr. Joseph E. Pascente, Lixi, Inc., and (e) NASA.)


large vessel as the contrast medium flows up in the direction of the kidneys,
which are also visible in the image. As discussed in Chapter 3, angiography is a
major area of digital image processing, where image subtraction is used to en-
hance further the blood vessels being studied.
Perhaps the best known of all uses of X-rays in medical imaging is comput-
erized axial tomography. Due to their resolution and 3-D capabilities, CAT
scans revolutionized medicine from the moment they first became available in
the early 1970s. As noted in Section 1.2, each CAT image is a “slice” taken per-
pendicularly through the patient. Numerous slices are generated as the patient
is moved in a longitudinal direction. The ensemble of such images constitutes a
3-D rendition of the inside of the patient, with the longitudinal resolution being
proportional to the number of slice images taken. Figure 1.7(c) shows a typical
head CAT slice image.
Techniques similar to the ones just discussed, but generally involving higher-
energy X-rays, are applicable in industrial processes. Figure 1.7(d) shows an
X-ray image of an electronic circuit board. Such images, representative of lit-
erally hundreds of industrial applications of X-rays, are used to examine circuit
boards for flaws in manufacturing, such as missing components or broken traces.
Industrial CAT scans are useful when the parts can be penetrated by X-rays,
such as in plastic assemblies, and even large bodies, like solid-propellant rock-
et motors. Figure 1.7(e) shows an example of X-ray imaging in astronomy. This
image is the Cygnus Loop of Fig. 1.6(c), but imaged this time in the X-ray band.

1.3.3 Imaging in the Ultraviolet Band
Applications of ultraviolet “light” are varied. They include lithography, indus-
trial inspection, microscopy, lasers, biological imaging, and astronomical obser-
vations. We illustrate imaging in this band with examples from microscopy and
astronomy.
Ultraviolet light is used in fluorescence microscopy, one of the fastest grow-
ing areas of microscopy. Fluorescence is a phenomenon discovered in the mid-
dle of the nineteenth century, when it was first observed that the mineral
fluorspar fluoresces when ultraviolet light is directed upon it. The ultraviolet
light itself is not visible, but when a photon of ultraviolet radiation collides with
an electron in an atom of a fluorescent material, it elevates the electron to a
higher energy level. Subsequently, the excited electron relaxes to a lower level
and emits light in the form of a lower-energy photon in the visible (red) light re-
gion. The basic task of the fluorescence microscope is to use an excitation light
to irradiate a prepared specimen and then to separate the much weaker radi-
ating fluorescent light from the brighter excitation light.Thus, only the emission
light reaches the eye or other detector. The resulting fluorescing areas shine
against a dark background with sufficient contrast to permit detection. The
darker the background of the nonfluorescing material, the more efficient the
instrument.
Fluorescence microscopy is an excellent method for studying materials that
can be made to fluoresce, either in their natural form (primary fluorescence) or
when treated with chemicals capable of fluorescing (secondary fluorescence).
Figures 1.8(a) and (b) show results typical of the capability of fluorescence


a b
c
FIGURE 1.8
Examples of
ultraviolet
imaging.
(a) Normal corn.
(b) Smut corn.
(c) Cygnus Loop.
(Images courtesy
of (a) and
(b) Dr. Michael
W. Davidson,
Florida State
University,
(c) NASA.)

microscopy. Figure 1.8(a) shows a fluorescence microscope image of normal
corn, and Fig. 1.8(b) shows corn infected by “smut,” a disease of cereals, corn,
grasses, onions, and sorghum that can be caused by any of more than 700 species
of parasitic fungi. Corn smut is particularly harmful because corn is one of the
principal food sources in the world. As another illustration, Fig. 1.8(c) shows
the Cygnus Loop imaged in the high-energy region of the ultraviolet band.

1.3.4 Imaging in the Visible and Infrared Bands
Considering that the visual band of the electromagnetic spectrum is the most
familiar in all our activities, it is not surprising that imaging in this band out-
weighs by far all the others in terms of scope of application. The infrared band


often is used in conjunction with visual imaging, so we have grouped the visi-
ble and infrared bands in this section for the purpose of illustration.We consider
in the following discussion applications in light microscopy, astronomy, remote
sensing, industry, and law enforcement.
Figure 1.9 shows several examples of images obtained with a light microscope.
The examples range from pharmaceuticals and microinspection to materials
characterization. Even in just microscopy, the application areas are too numer-
ous to detail here. It is not difficult to conceptualize the types of processes one
might apply to these images, ranging from enhancement to measurements.

a b c
d e f
FIGURE 1.9 Examples of light microscopy images. (a) Taxol (anticancer agent), magnified
250 µ. (b) Cholesterol—40 µ. (c) Microprocessor—60 µ. (d) Nickel oxide thin film—600
µ. (e) Surface of audio CD—1750 µ. (f) Organic superconductor—450 µ. (Images cour-
tesy of Dr. Michael W. Davidson, Florida State University.)


TABLE 1.1
Band No. Name Wavelength ( m) Characteristics and Uses
Thematic bands
in NASA’s 1 Visible blue 0.45–0.52 Maximum water
LANDSAT penetration
satellite. 2 Visible green 0.52–0.60 Good for measuring plant
vigor
3 Visible red 0.63–0.69 Vegetation discrimination
4 Near infrared 0.76–0.90 Biomass and shoreline
mapping
5 Middle infrared 1.55–1.75 Moisture content of soil
and vegetation
6 Thermal infrared 10.4–12.5 Soil moisture; thermal
mapping
7 Middle infrared 2.08–2.35 Mineral mapping

Another major area of visual processing is remote sensing, which usually
includes several bands in the visual and infrared regions of the spectrum.
Table 1.1 shows the so-called thematic bands in NASA’s LANDSAT satel-
lite. The primary function of LANDSAT is to obtain and transmit images of
the Earth from space, for purposes of monitoring environmental conditions
on the planet. The bands are expressed in terms of wavelength, with 1 m
being equal to 10–6 m (we discuss the wavelength regions of the electromag-
netic spectrum in more detail in Chapter 2). Note the characteristics and uses
of each band.
In order to develop a basic appreciation for the power of this type of multi-
spectral imaging, consider Fig. 1.10, which shows one image for each of the spec-
1 2 3

4 5 6 7
FIGURE 1.10 LANDSAT satellite images of the Washington, D.C. area. The numbers refer to the thematic
bands in Table 1.1. (Images courtesy of NASA.)


FIGURE 1.11
Multispectral
image of
Hurricane
Andrew taken by
NOAA GEOS
(Geostationary
Environmental
Operational
Satellite) sensors.
(Courtesy of
NOAA.)

tral bands in Table 1.1.The area imaged is Washington D.C., which includes fea-
tures such as buildings, roads, vegetation, and a major river (the Potomac) going
though the city. Images of population centers are used routinely (over time) to
assess population growth and shift patterns, pollution, and other factors harm-
ful to the environment. The differences between visual and infrared image fea-
tures are quite noticeable in these images. Observe, for example, how well
defined the river is from its surroundings in Bands 4 and 5.
Weather observation and prediction also are major applications of multi-
spectral imaging from satellites. For example, Fig. 1.11 is an image of a hurricane
taken by a National Oceanographic and Atmospheric Administration (NOAA)
satellite using sensors in the visible and infrared bands.The eye of the hurricane
is clearly visible in this image.
Figures 1.12 and 1.13 show an application of infrared imaging. These images
are part of the Nighttime Lights of the World data set, which provides a glob-
al inventory of human settlements. The images were generated by the infrared
imaging system mounted on a NOAA DMSP (Defense Meteorological Satel-
lite Program) satellite. The infrared imaging system operates in the band 10.0
to 13.4 m, and has the unique capability to observe faint sources of visible-
near infrared emissions present on the Earth’s surface, including cities, towns,
villages, gas flares, and fires. Even without formal training in image process-
ing, it is not difficult to imagine writing a computer program that would use
these images to estimate the percent of total electrical energy used by various
regions of the world.


FIGURE 1.12
Infrared satellite
images of the
Americas. The
small gray map is
provided for
reference.
(Courtesy of
NOAA.)

A major area of imaging in the visual spectrum is in automated visual inspec-
tion of manufactured goods. Figure 1.14 shows some examples. Figure 1.14(a) is
a controller board for a CD-ROM drive. A typical image processing task with
products like this is to inspect them for missing parts (the black square on the top,
right quadrant of the image is an example of a missing component). Figure 1.14(b)
is an imaged pill container.The objective here is to have a machine look for miss-
ing pills. Figure 1.14(c) shows an application in which image processing is used to
look for bottles that are not filled up to an acceptable level. Figure 1.14(d) shows


FIGURE 1.13
Infrared satellite
images of the
remaining
populated part of
the world. The
small gray map is
provided for
reference.
(Courtesy of
NOAA.)

a clear-plastic part with an unacceptable number of air pockets in it. Detecting
anomalies like these is a major theme of industrial inspection that includes other
products such as wood and cloth. Figure 1.14(e) shows a batch of cereal during in-
spection for color and the presence of anomalies such as burned flakes. Finally,
Fig. 1.14(f) shows an image of an intraocular implant (replacement lens for the
human eye).A “structured light” illumination technique was used to highlight for
easier detection flat lens deformations toward the center of the lens.The markings
at 1 o’clock and 5 o’clock are tweezer damage. Most of the other small speckle de-
tail is debris. The objective in this type of inspection is to find damaged or incor-
rectly manufactured implants automatically, prior to packaging.
As a final illustration of image processing in the visual spectrum, consider
Fig. 1.15. Figure 1.15(a) shows a thumb print. Images of fingerprints are routinely
processed by computer, either to enhance them or to find features that aid in
the automated search of a database for potential matches. Figure 1.15(b) shows
an image of paper currency.Applications of digital image processing in this area
include automated counting and, in law enforcement, the reading of the serial
number for the purpose of tracking and identifying bills.The two vehicle images
shown in Figs. 1.15 (c) and (d) are examples of automated license plate reading.


a b
c d
e f
FIGURE 1.14
Some examples of
manufactured
goods often
checked using
digital image
processing. (a) A
circuit board
controller.
(b) Packaged pills.
(c) Bottles.
(d) Bubbles in
clear-plastic
product.
(e) Cereal.
(f) Image of
intraocular
implant.
(Fig. (f) courtesy
of Mr. Pete Sites,
Perceptics
Corporation.)

The light rectangles indicate the area in which the imaging system detected the
plate. The black rectangles show the results of automated reading of the plate
content by the system. License plate and other applications of character recog-
nition are used extensively for traffic monitoring and surveillance.

1.3.5 Imaging in the Microwave Band
The dominant application of imaging in the microwave band is radar.The unique
feature of imaging radar is its ability to collect data over virtually any region at
any time, regardless of weather or ambient lighting conditions. Some radar


a b
c
d
FIGURE 1.15
Some additional
examples of
imaging in the
visual spectrum.
(a) Thumb print.
(b) Paper
currency. (c) and
(d). Automated
license plate
reading. (Figure
(a) courtesy of the
National Institute
of Standards and
Technology.
Figures (c) and
(d) courtesy of
Dr. Juan Herrera,
Perceptics
Corporation.)

waves can penetrate clouds, and under certain conditions can also see through
vegetation, ice, and extremely dry sand. In many cases, radar is the only way to
explore inaccessible regions of the Earth’s surface.An imaging radar works like
a flash camera in that it provides its own illumination (microwave pulses) to il-
luminate an area on the ground and take a snapshot image. Instead of a cam-
era lens, a radar uses an antenna and digital computer processing to record its
images. In a radar image, one can see only the microwave energy that was re-
flected back toward the radar antenna.
Figure 1.16 shows a spaceborne radar image covering a rugged mountain-
ous area of southeast Tibet, about 90 km east of the city of Lhasa. In the lower
right corner is a wide valley of the Lhasa River, which is populated by Tibetan
farmers and yak herders and includes the village of Menba. Mountains in this
area reach about 5800 m (19,000 ft) above sea level, while the valley floors lie
about 4300 m (14,000 ft) above sea level. Note the clarity and detail of the image,
unencumbered by clouds or other atmospheric conditions that normally inter-
fere with images in the visual band.


FIGURE 1.16
Spaceborne radar
image of
mountains in
southeast Tibet.
(Courtesy of
NASA.)

1.3.6 Imaging in the Radio Band
As in the case of imaging at the other end of the spectrum (gamma rays), the
major applications of imaging in the radio band are in medicine and astrono-
my. In medicine radio waves are used in magnetic resonance imaging (MRI).
This technique places a patient in a powerful magnet and passes radio waves
through his or her body in short pulses. Each pulse causes a responding pulse
of radio waves to be emitted by the patient’s tissues. The location from which
these signals originate and their strength are determined by a computer, which
produces a two-dimensional picture of a section of the patient. MRI can produce
pictures in any plane. Figure 1.17 shows MRI images of a human knee and spine.
The last image to the right in Fig. 1.18 shows an image of the Crab Pulsar in
the radio band. Also shown for an interesting comparison are images of the
same region but taken in most of the bands discussed earlier. Note that each
image gives a totally different “view” of the Pulsar.

1.3.7 Examples in which Other Imaging Modalities Are Used
Although imaging in the electromagnetic spectrum is dominant by far, there
are a number of other imaging modalities that also are important. Specifically,
we discuss in this section acoustic imaging, electron microscopy, and synthetic
(computer-generated) imaging.
Imaging using “sound” finds application in geological exploration, industry,
and medicine. Geological applications use sound in the low end of the sound spec-
trum (hundreds of Hertz) while imaging in other areas use ultrasound (millions
of Hertz). The most important commercial applications of image processing in
geology are in mineral and oil exploration. For image acquisition over land, one
of the main approaches is to use a large truck and a large flat steel plate.The plate
is pressed on the ground by the truck, and the truck is vibrated through a fre-


a b
FIGURE 1.17 MRI images of a human (a) knee, and (b) spine. (Image (a) courtesy of
Dr. Thomas R. Gest, Division of Anatomical Sciences, University of Michigan Medical
School, and (b) Dr. David R. Pickens, Department of Radiology and Radiological Sci-
ences, Vanderbilt University Medical Center.)

quency spectrum up to 100 Hz. The strength and speed of the returning sound
waves are determined by the composition of the earth below the surface. These
are analyzed by computer, and images are generated from the resulting analysis.
For marine acquisition, the energy source consists usually of two air guns
towed behind a ship. Returning sound waves are detected by hydrophones
placed in cables that are either towed behind the ship, laid on the bottom of
the ocean, or hung from buoys (vertical cables).The two air guns are alternately
pressurized to ~ 2000 psi and then set off. The constant motion of the ship pro-
vides a transversal direction of motion that, together with the returning sound
waves, is used to generate a 3-D map of the composition of the Earth below
the bottom of the ocean.
Figure 1.19 shows a cross-sectional image of a well-known 3-D model against
which the performance of seismic imaging algorithms is tested.The arrow points
to a hydrocarbon (oil and/or gas) trap. This target is brighter than the sur-
rounding layers because of the change in density in the target region is larger.

Gamma X-ray Optical Infrared Radio
FIGURE 1.18 Images of the Crab Pulsar (in the center of images) covering the electromagnetic spectrum.
(Courtesy of NASA.)


FIGURE 1.19
Cross-sectional
image of a seismic
model. The arrow
points to a
hydrocarbon (oil
and/or gas) trap.
(Courtesy of
Dr. Curtis Ober,
Sandia National
Laboratories.)

Seismic interpreters look for these “bright spots” to find oil and gas. The layers
above also are bright, but their brightness does not vary as strongly across the
layers. Many seismic reconstruction algorithms have difficulty imaging this tar-
get because of the faults above it.
Although ultrasound imaging is used routinely in manufacturing, the best
known applications of this technique are in medicine, especially in obstetrics,
where unborn babies are imaged to determine the health of their development.
A byproduct of this examination is determining the sex of the baby. Ultrasound
images are generated using the following basic procedure:
1. The ultrasound system (a computer, ultrasound probe consisting of a source
and receiver, and a display) transmits high-frequency (1 to 5 MHz) sound
pulses into the body.
2. The sound waves travel into the body and hit a boundary between tissues
(e.g., between fluid and soft tissue, soft tissue and bone). Some of the sound
waves are reflected back to the probe, while some travel on further until
they reach another boundary and get reflected.
3. The reflected waves are picked up by the probe and relayed to the
computer.
4. The machine calculates the distance from the probe to the tissue or organ
boundaries using the speed of sound in tissue (1540 m s) and the time of
the each echo’s return.
5. The system displays the distances and intensities of the echoes on the screen,
forming a two-dimensional image.
In a typical ultrasound image, millions of pulses and echoes are sent and re-
ceived each second. The probe can be moved along the surface of the body and
angled to obtain various views. Figure 1.20 shows several examples.
We continue the discussion on imaging modalities with some examples of
electron microscopy. Electron microscopes function as their optical counter-
parts, except that they use a focused beam of electrons instead of light to image
a specimen. The operation of electron microscopes involves the following basic
steps: A stream of electrons is produced by an electron source and accelerated
toward the specimen using a positive electrical potential. This stream is con-


a b
c d
FIGURE 1.20
Examples of
ultrasound
imaging. (a) Baby.
(2) Another view
of baby.
(c) Thyroids.
(d) Muscle layers
showing lesion.
(Courtesy of
Siemens Medical
Systems, Inc.,
Ultrasound
Group.)

fined and focused using metal apertures and magnetic lenses into a thin, fo-
cused, monochromatic beam.This beam is focused onto the sample using a mag-
netic lens. Interactions occur inside the irradiated sample, affecting the electron
beam. These interactions and effects are detected and transformed into an
image, much in the same way that light is reflected from, or absorbed by, objects
in a scene. These basic steps are carried out in all electron microscopes, re-
gardless of type.
A transmission electron microscope (TEM) works much like a slide projec-
tor. A projector shines (transmits) a beam of light through the slide; as the light
passes through the slide, it is affected by the contents of the slide. This trans-
mitted beam is then projected onto the viewing screen, forming an enlarged
image of the slide. TEMs work the same way, except that they shine a beam of
electrons through a specimen (analogous to the slide).The fraction of the beam
transmitted through the specimen is projected onto a phosphor screen. The in-
teraction of the electrons with the phosphor produces light and, therefore, a
viewable image. A scanning electron microscope (SEM), on the other hand, ac-
tually scans the electron beam and records the interaction of beam and sample
at each location.This produces one dot on a phosphor screen.A complete image
is formed by a raster scan of the bean through the sample, much like a TV cam-
era. The electrons interact with a phosphor screen and produce light. SEMs are
suitable for “bulky” samples, while TEMs require very thin samples.
Electron microscopes are capable of very high magnification.While light mi-
croscopy is limited to magnifications on the order 1000 *, electron microscopes


a b
FIGURE 1.21 (a) 250 * SEM image of a tungsten filament following thermal failure.
(b) 2500 * SEM image of damaged integrated circuit. The white fibers are oxides re-
sulting from thermal destruction. (Figure (a) courtesy of Mr. Michael Shaffer, Depart-
ment of Geological Sciences, University of Oregon, Eugene; (b) courtesy of Dr.
J. M. Hudak, McMaster University, Hamilton, Ontario, Canada.)

can achieve magnification of 10,000 * or more. Figure 1.21 shows two SEM im-
ages of specimen failures due to thermal overload.
We conclude the discussion of imaging modalities by looking briefly at im-
ages that are not obtained from physical objects. Instead, they are generated
by computer. Fractals are striking examples of computer-generated images (Lu
[1997]). Basically, a fractal is nothing more than an iterative reproduction of a
basic pattern according to some mathematical rules. For instance, tiling is one
of the simplest ways to generate a fractal image.A square can be subdivided into
four square subregions, each of which can be further subdivided into four small-
er square regions, and so on. Depending on the complexity of the rules for fill-
ing each subsquare, some beautiful tile images can be generated using this
method. Of course, the geometry can be arbitrary. For instance, the fractal image
could be grown radially out of a center point. Figure 1.22(a) shows a fractal
grown in this way.The reader will recognize this image as the theme image used
in the beginning page of each chapter in this book, selected because of its artis-
tic simplicity and abstract analogy to a human eye. Figure 1.22(b) shows an-
other fractal (a “moonscape”) that provides an interesting analogy to the images
of space used as illustrations in some of the preceding sections.
Fractal images tend toward artistic, mathematical formulations of “growth”
of subimage elements according to some rules. They are useful sometimes as
random textures. A more structured approach to image generation by comput-
er lies in 3-D modeling. This is an area that provides an important intersection
between image processing and computer graphics and is the basis for many 3-D
visualization systems (e.g., flight simulators). Figures 1.22(c) and (d) show ex-
amples of computer-generated images. Since the original object is created in
3-D, images can be generated in any perspective from plane projections of
the 3-D volume. Images of this type can be used for medical training and for a
host of other applications, such as criminal forensics and special effects.

1.4 I Fundamental Steps in Digital Image Processing 25

a b
c d
FIGURE 1.22
(a) and (b) Fractal
images. (c) and
(d) Images
generated from
3-D computer
models of the
objects shown.
(Figures (a) and
(b) courtesy of
Ms. Melissa
D. Binde,
Swarthmore
College, (c) and
(d) courtesy of
NASA.)

1.4 Fundamental Steps in Digital Image Processing
It is helpful to divide the material covered in the following chapters into the
two broad categories defined in Section 1.1: methods whose input and output
are images, and methods whose inputs may be images, but whose outputs are at-
tributes extracted from those images. This organization is summarized in
Fig. 1.23. The diagram does not imply that every process is applied to an image.
Rather, the intention is to convey an idea of all the methodologies that can be
applied to images for different purposes and possibly with different objectives.
The discussion in this section may be viewed as a brief overview of the mater-
ial in the remainder of the book.
Image acquisition is the first process shown in Fig. 1.23. The discussion in
Section 1.3 gave some hints regarding the origin of digital images. This topic is
considered in much more detail in Chapter 2, where we also introduce a num-
ber of basic digital image concepts that are used throughout the book. Note
that acquisition could be as simple as being given an image that is already in dig-
ital form. Generally, the image acquisition stage involves preprocessing, such
as scaling.
Image enhancement is among the simplest and most appealing areas of dig-
ital image processing. Basically, the idea behind enhancement techniques is to
bring out detail that is obscured, or simply to highlight certain features of interest
in an image. A familiar example of enhancement is when we increase the con-
trast of an image because “it looks better.” It is important to keep in mind that


FIGURE 1.23 Outputs of these processes generally are images
Fundamental
steps in digital

Outputs of these processes generally are image attributes
image processing. CHAPTER 6 CHAPTER 7 CHAPTER 8 CHAPTER 9

Color image Wavelets and Morphological
multiresolution Compression
processing processing
processing

CHAPTER 5 CHAPTER 10

Image
Segmentation
restoration

CHAPTERS 3 & 4 CHAPTER 11
Knowledge base Representation
Image
enhancement & description

CHAPTER 2 CHAPTER 12

Image Object
Problem recognition
acquisition
domain

enhancement is a very subjective area of image processing.Two chapters are de-
voted to enhancement, not because it is more important than the other topics
covered in the book but because we use enhancement as an avenue to introduce
the reader to techniques that are used in other chapters as well. Thus, rather
than having a chapter dedicated to mathematical preliminaries, we introduce a
number of needed mathematical concepts by showing how they apply to en-
hancement. This approach allows the reader to gain familiarity with these con-
cepts in the context of image processing. A good example of this is the Fourier
transform, which is introduced in Chapter 4 but is used also in several of the
other chapters.
Image restoration is an area that also deals with improving the appearance
of an image. However, unlike enhancement, which is subjective, image restora-
tion is objective, in the sense that restoration techniques tend to be based on
mathematical or probabilistic models of image degradation. Enhancement, on
the other hand, is based on human subjective preferences regarding what con-
stitutes a “good” enhancement result.
Color image processing is an area that has been gaining in importance be-
cause of the significant increase in the use of digital images over the Internet.
Chapter 5 covers a number of fundamental concepts in color models and basic
color processing in a digital domain. Color is used also in later chapters as the
basis for extracting features of interest in an image.
Wavelets are the foundation for representing images in various degrees of
resolution. In particular, this material is used in this book for image data com-
pression and for pyramidal representation, in which images are subdivided suc-
cessively into smaller regions.

1.4 I Fundamental Steps in Digital Image Processing 27

Compression, as the name implies, deals with techniques for reducing the
storage required to save an image, or the bandwidth required to transmit it. Al-
though storage technology has improved significantly over the past decade, the
same cannot be said for transmission capacity. This is true particularly in uses
of the Internet, which are characterized by significant pictorial content. Image
compression is familiar (perhaps inadvertently) to most users of computers in
the form of image file extensions, such as the jpg file extension used in the JPEG
(Joint Photographic Experts Group) image compression standard.
Morphological processing deals with tools for extracting image components
that are useful in the representation and description of shape. The material in
this chapter begins a transition from processes that output images to processes
that output image attributes, as indicated in Section 1.1.
Segmentation procedures partition an image into its constituent parts or ob-
jects. In general, autonomous segmentation is one of the most difficult tasks in
digital image processing. A rugged segmentation procedure brings the process
a long way toward successful solution of imaging problems that require objects
to be identified individually. On the other hand, weak or erratic segmentation
algorithms almost always guarantee eventual failure. In general, the more ac-
curate the segmentation, the more likely recognition is to succeed.
Representation and description almost always follow the output of a seg-
mentation stage, which usually is raw pixel data, constituting either the bound-
ary of a region (i.e., the set of pixels separating one image region from another)
or all the points in the region itself. In either case, converting the data to a form
suitable for computer processing is necessary. The first decision that must be
made is whether the data should be represented as a boundary or as a com-
plete region. Boundary representation is appropriate when the focus is on ex-
ternal shape characteristics, such as corners and inflections. Regional
representation is appropriate when the focus is on internal properties, such as
texture or skeletal shape. In some applications, these representations comple-
ment each other. Choosing a representation is only part of the solution for trans-
forming raw data into a form suitable for subsequent computer processing. A
method must also be specified for describing the data so that features of inter-
est are highlighted. Description, also called feature selection, deals with extract-
ing attributes that result in some quantitative information of interest or are
basic for differentiating one class of objects from another.
Recognition is the process that assigns a label (e.g., “vehicle”) to an object
based on its descriptors. As detailed in Section 1.1, we conclude our coverage
of digital image processing with the development of methods for recognition of
individual objects.
So far we have said nothing about the need for prior knowledge or about
the interaction between the knowledge base and the processing modules in
Fig. 1.23. Knowledge about a problem domain is coded into an image process-
ing system in the form of a knowledge database.This knowledge may be as sim-
ple as detailing regions of an image where the information of interest is known
to be located, thus limiting the search that has to be conducted in seeking that
information. The knowledge base also can be quite complex, such as an inter-
related list of all major possible defects in a materials inspection problem or an


image database containing high-resolution satellite images of a region in con-
nection with change-detection applications. In addition to guiding the operation
of each processing module, the knowledge base also controls the interaction
between modules. This distinction is made in Fig. 1.23 by the use of double-
headed arrows between the processing modules and the knowledge base, as op-
posed to single-headed arrows linking the processing modules.
Although we do not discuss image display explicitly at this point, it is impor-
tant to keep in mind that viewing the results of image processing can take place
at the output of any stage in Fig. 1.23. We also note that not all image processing
applications require the complexity of interactions implied by Fig. 1.23. In fact, not
even all those modules are needed in some cases. For example, image enhance-
ment for human visual interpretation seldom requires use of any of the other
stages in Fig. 1.23. In general, however, as the complexity of an image processing
task increases, so does the number of processes required to solve the problem.

1.5 Components of an Image Processing System
As recently as the mid-1980s, numerous models of image processing systems
being sold throughout the world were rather substantial peripheral devices that
attached to equally substantial host computers. Late in the 1980s and early in
the 1990s, the market shifted to image processing hardware in the form of sin-
gle boards designed to be compatible with industry standard buses and to fit into
engineering workstation cabinets and personal computers. In addition to low-
ering costs, this market shift also served as a catalyst for a significant number of
new companies whose specialty is the development of software written specif-
ically for image processing.
Although large-scale image processing systems still are being sold for mas-
sive imaging applications, such as processing of satellite images, the trend con-
tinues toward miniaturizing and blending of general-purpose small computers
with specialized image processing hardware. Figure 1.24 shows the basic com-
ponents comprising a typical general-purpose system used for digital image pro-
cessing.The function of each component is discussed in the following paragraphs,
starting with image sensing.
With reference to sensing, two elements are required to acquire digital im-
ages. The first is a physical device that is sensitive to the energy radiated by the
object we wish to image. The second, called a digitizer, is a device for convert-
ing the output of the physical sensing device into digital form. For instance, in
a digital video camera, the sensors produce an electrical output proportional
to light intensity. The digitizer converts these outputs to digital data. These top-
ics are covered in some detail in Chapter 2.
Specialized image processing hardware usually consists of the digitizer just
mentioned, plus hardware that performs other primitive operations, such as an
arithmetic logic unit (ALU), which performs arithmetic and logical operations
in parallel on entire images. One example of how an ALU is used is in averag-
ing images as quickly as they are digitized, for the purpose of noise reduction.
This type of hardware sometimes is called a front-end subsystem, and its most

1.5 I Components of an Image Processing System 29

Network FIGURE 1.24
Components of a
general-purpose
image processing
system.

Image displays Computer Mass storage

Specialized
Image processing
Hardcopy image processing
software
hardware

Image sensors

Problem
domain

distinguishing characteristic is speed. In other words, this unit performs functions
that require fast data throughputs (e.g., digitizing and averaging video images
at 30 frames s) that the typical main computer cannot handle.
The computer in an image processing system is a general-purpose computer
and can range from a PC to a supercomputer. In dedicated applications, some-
times specially designed computers are used to achieve a required level of per-
formance, but our interest here is on general-purpose image processing systems.
In these systems, almost any well-equipped PC-type machine is suitable for off-
line image processing tasks.
Software for image processing consists of specialized modules that perform
specific tasks. A well-designed package also includes the capability for the user
to write code that, as a minimum, utilizes the specialized modules. More so-
phisticated software packages allow the integration of those modules and gen-
eral-purpose software commands from at least one computer language.
Mass storage capability is a must in image processing applications. An image
of size 1024*1024 pixels, in which the intensity of each pixel is an 8-bit quan-
tity, requires one megabyte of storage space if the image is not compressed.
When dealing with thousands, or even millions, of images, providing adequate
storage in an image processing system can be a challenge. Digital storage for


image processing applications falls into three principal categories: (1) short-
term storage for use during processing, (2) on-line storage for relatively fast re-
call, and (3) archival storage, characterized by infrequent access. Storage is
measured in bytes (eight bits), Kbytes (one thousand bytes), Mbytes (one mil-
lion bytes), Gbytes (meaning giga, or one billion, bytes), and Tbytes (meaning
tera, or one trillion, bytes).
One method of providing short-term storage is computer memory. Another
is by specialized boards, called frame buffers, that store one or more images and
can be accessed rapidly, usually at video rates (e.g., at 30 complete images per
second). The latter method allows virtually instantaneous image zoom, as well
as scroll (vertical shifts) and pan (horizontal shifts). Frame buffers usually are
housed in the specialized image processing hardware unit shown in Fig. 1.24. On-
line storage generally takes the form of magnetic disks or optical-media stor-
age.The key factor characterizing on-line storage is frequent access to the stored
data. Finally, archival storage is characterized by massive storage requirements
but infrequent need for access. Magnetic tapes and optical disks housed in
“jukeboxes” are the usual media for archival applications.
Image displays in use today are mainly color (preferably flat screen) TV mon-
itors. Monitors are driven by the outputs of image and graphics display cards that
are an integral part of the computer system. Seldom are there requirements for
image display applications that cannot be met by display cards available com-
mercially as part of the computer system. In some cases, it is necessary to have
stereo displays, and these are implemented in the form of headgear containing
two small displays embedded in goggles worn by the user.
Hardcopy devices for recording images include laser printers, film cam-
eras, heat-sensitive devices, inkjet units, and digital units, such as optical and
CD-ROM disks. Film provides the highest possible resolution, but paper is the
obvious medium of choice for written material. For presentations, images are dis-
played on film transparencies or in a digital medium if image projection equip-
ment is used.The latter approach is gaining acceptance as the standard for image
presentations.
Networking is almost a default function in any computer system in use today.
Because of the large amount of data inherent in image processing applications,
the key consideration in image transmission is bandwidth. In dedicated net-
works, this typically is not a problem, but communications with remote sites via
the Internet are not always as efficient. Fortunately, this situation is improving
quickly as a result of optical fiber and other broadband technologies.

Summary
The main purpose of the material presented in this chapter is to provide a sense of per-
spective about the origins of digital image processing and, more important, about cur-
rent and future areas of application of this technology. Although the coverage of these
topics in this chapter was necessarily incomplete due to space limitations, it should have
left the reader with a clear impression of the breadth and practical scope of digital image
processing.As we proceed in the following chapters with the development of image pro-
cessing theory and applications, numerous examples are provided to keep a clear focus

I References and Further Reading 31

on the utility and promise of these techniques. Upon concluding the study of the final
chapter, the reader of this book will have arrived at a level of understanding that is the
foundation for most of the work currently underway in this field.

References and Further Reading
References at the end of later chapters address specific topics discussed in those chap-
ters, and are keyed to the Bibliography at the end of the book. However, in this chapter
we follow a different format in order to summarize in one place a body of journals that
publish material on image processing and related topics. We also provide a list of books
from which the reader can readily develop a historical and current perspective of activ-
ities in this field.Thus, the reference material cited in this chapter is intended as a general-
purpose, easily accessible guide to the published literature on image processing.
Major refereed journals that publish articles on image processing and related topics
include: IEEE Transactions on Image Processing; IEEE Transactions on Pattern Analy-
sis and Machine Intelligence; Computer Vision, Graphics, and Image Processing (prior
to 1991); Computer Vision and Image Understanding; IEEE Transactions on Systems,
Man and Cybernetics; Artificial Intelligence; Pattern Recognition; Pattern Recognition
Letters; Journal of the Optical Society of America (prior to 1984); Journal of the Optical
Society of America—A: Optics, Image Science and Vision; Optical Engineering; Applied
Optics—Information Processing; IEEE Transactions on Medical Imaging; Journal of
Electronic Imaging; IEEE Transactions on Information Theory; IEEE Transactions on
Communications; IEEE Transactions on Acoustics, Speech and Signal Processing; Pro-
ceedings of the IEEE; and issues of the IEEE Transactions on Computers prior to 1980.
Publications of the International Society for Optical Engineering (SPIE) also are of
interest.
The following books, listed in reverse chronological order (with the number of books
being biased toward more recent publications), contain material that complements our
treatment of digital image processing. These books represent an easily accessible
overview of the area for the past 30 years and were selected to provide a variety of treat-
ments.They range from textbooks, which cover foundation material; to handbooks, which
give an overview of techniques; and finally to edited books, which contain material rep-
resentative of current research in the field.
Duda, R. O., Hart, P. E., and Stork, D. G. [2001]. Pattern Classification, 2nd ed., John Wiley
& Sons, NY.
Ritter, G. X. and Wilson, J. N. [2001]. Handbook of Computer Vision Algorithms in Image
Algebra, CRC Press, Boca Raton, FL.
Shapiro, L. G. and Stockman, G. C. [2001]. Computer Vision, Prentice Hall, Upper Sad-
dle River, NJ.
Dougherty, E. R. (ed.) [2000]. Random Processes for Image and Signal Processing, IEEE
Press, NY.
Etienne, E. K. and Nachtegael, M. (eds.). [2000]. Fuzzy Techniques in Image Processing,
Springer-Verlag, NY.
Goutsias, J, Vincent, L., and Bloomberg, D. S. (eds.). [2000]. Mathematical Morphology
and Its Applications to Image and Signal Processing, Kluwer Academic Publishers,
Boston, MA.
Mallot, A. H. [2000]. Computational Vision, The MIT Press, Cambridge, MA.
Marchand-Maillet, S. and Sharaiha, Y. M. [2000]. Binary Digital Image Processing: A
Discrete Approach, Academic Press, NY.


Mitra, S. K. and Sicuranza, G. L. (eds.) [2000]. Nonlinear Image Processing, Academic
Press, NY.
Edelman, S. [1999]. Representation and Recognition in Vision,The MIT Press, Cambridge,
MA.
Lillesand, T. M. and Kiefer, R. W. [1999]. Remote Sensing and Image Interpretation, John
Wiley & Sons, NY.
Mather, P. M. [1999]. Computer Processing of Remotely Sensed Images: An Introduction,
John Wiley & Sons, NY.
Petrou, M. and Bosdogianni, P. [1999]. Image Processing: The Fundamentals, John Wiley
& Sons, UK.
Russ, J. C. [1999]. The Image Processing Handbook, 3rd ed., CRC Press, Boca Raton,
FL.
Smirnov, A. [1999]. Processing of Multidimensional Signals, Springer-Verlag, NY.
Sonka, M., Hlavac, V., and Boyle, R. [1999]. Image Processing, Analysis, and Computer
Vision, PWS Publishing, NY.
Umbaugh, S. E. [1998]. Computer Vision and Image Processing: A Practical Approach
Using CVIPtools, Prentice Hall, Upper Saddle River, NJ.
Haskell, B. G. and Netravali, A. N. [1997]. Digital Pictures: Representation, Compression,
and Standards, Perseus Publishing, NY.
Jahne, B. [1997]. Digital Image Processing: Concepts, Algorithms, and Scientific Applica-
tions, Springer-Verlag, NY.
Castleman, K. R. [1996]. Digital Image Processing, 2nd ed., Prentice Hall, Upper Saddle
River, NJ.
Geladi, P. and Grahn, H. [1996]. Multivariate Image Analysis, John Wiley & Sons, NY.
Bracewell, R. N. [1995]. Two-Dimensional Imaging, Prentice Hall, Upper Saddle
River, NJ.
Sid-Ahmed, M. A. [1995]. Image Processing: Theory, Algorithms, and Architectures,
McGraw-Hill, NY.
Jain, R., Rangachar, K., and Schunk, B. [1995]. Computer Vision, McGraw-Hill, NY.
Mitiche, A. [1994]. Computational Analysis of Visual Motion, Perseus Publishing, NY.
Baxes, G. A. [1994]. Digital Image Processing: Principles and Applications, John Wiley &
Sons, NY.
Gonzalez, R. C. and Woods, R. E. [1992]. Digital Image Processing, Addison-Wesley,
Reading, MA.
Haralick, R. M. and Shapiro, L. G. [1992]. Computer and Robot Vision, vols. 1 & 2,
Addison-Wesley, Reading, MA.
Pratt, W. K. [1991] Digital Image Processing, 2nd ed., Wiley-Interscience, NY.
Lim, J. S. [1990]. Two-Dimensional Signal and Image Processing, Prentice Hall, Upper
Saddle River, NJ.
Jain,A. K. [1989]. Fundamentals of Digital Image Processing, Prentice Hall, Upper Saddle
River, NJ.
Schalkoff, R. J. [1989]. Digital Image Processing and Computer Vision, John Wiley &
Sons, NY.
Giardina, C. R. and Dougherty, E. R. [1988]. Morphological Methods in Image and Sig-
nal Processing, Prentice Hall, Upper Saddle River, NJ.

I References and Further Reading 33

Levine, M. D. [1985]. Vision in Man and Machine, McGraw-Hill, NY.
Serra, J. [1982]. Image Analysis and Mathematical Morphology, Academic Press, NY.
Ballard, D. H. and Brown, C. M. [1982]. Computer Vision, Prentice Hall, Upper Saddle
River, NJ.
Fu, K. S. [1982]. Syntactic Pattern Recognition and Applications, Prentice Hall, Upper
Saddle River, NJ.
Nevatia, R. [1982]. Machine Perception, Prentice Hall, Upper Saddle River, NJ.
Pavlidis, T. [1982]. Algorithms for Graphics and Image Processing, Computer Science
Press, Rockville, MD.
Rosenfeld, R. and Kak, A. C. [1982]. Digital Picture Processing, 2nd ed., vols. 1 & 2,
Academic Press, NY.
Hall, E. L. [1979]. Computer Image Processing and Recognition, Academic Press, NY.
Gonzalez, R. C. and Thomason, M. G. [1978]. Syntactic Pattern Recognition: An Intro-
duction, Addison-Wesley, Reading, MA.
Andrews, H. C. and Hunt, B. R. [1977]. Digital Image Restoration, Prentice Hall, Upper
Saddle River, NJ.
Pavlidis, T. [1977]. Structural Pattern Recognition, Springer-Verlag, NY, 1977.
Tou, J. T. and Gonzalez, R. C. [1974]. Pattern Recognition Principles, Addison-Wesley,
Reading, MA, 1974.
Andrews, H. C. [1970]. Computer Techniques in Image Processing, Academic Press, NY.

2 Digital Image
Fundamentals
Those who wish to succeed must ask the right preliminary questions.
Aristotle

Preview
The purpose of this chapter is to introduce several concepts related to digital im-
ages and some of the notation used throughout the book. Section 2.1 briefly
summarizes the mechanics of the human visual system, including image for-
mation in the eye and its capabilities for brightness adaptation and discrimina-
tion. Section 2.2 discusses light, other components of the electromagnetic
spectrum, and their imaging characteristics. Section 2.3 discusses imaging sen-
sors and how they are used to generate digital images. Section 2.4 introduces the
concepts of uniform image sampling and gray-level quantization. Additional
topics discussed in that section include digital image representation, the effects
of varying the number of samples and gray levels in an image, some important
phenomena associated with sampling, and techniques for image zooming and
shrinking. Section 2.5 deals with some basic relationships between pixels that are
used throughout the book. Finally, Section 2.6 defines the conditions for linear
operations. As noted in that section, linear operators play a central role in the
development of image processing techniques.

2.1 Elements of Visual Perception
Although the digital image processing field is built on a foundation of mathe-
matical and probabilistic formulations, human intuition and analysis play a cen-
tral role in the choice of one technique versus another, and this choice often is

34

2.1 I Elements of Visual Perception 35

made based on subjective, visual judgments. Hence, developing a basic under-
standing of human visual perception as a first step in our journey through this
book is appropriate. Given the complexity and breadth of this topic, we can
only aspire to cover the most rudimentary aspects of human vision. In particu-
lar, our interest lies in the mechanics and parameters related to how images are
formed in the eye. We are interested in learning the physical limitations of
human vision in terms of factors that also are used in our work with digital im-
ages.Thus, factors such as how human and electronic imaging compare in terms
of resolution and ability to adapt to changes in illumination are not only inter-
esting, they also are important from a practical point of view.

2.1.1 Structure of the Human Eye
Figure 2.1 shows a simplified horizontal cross section of the human eye. The
eye is nearly a sphere, with an average diameter of approximately 20 mm.Three
membranes enclose the eye: the cornea and sclera outer cover; the choroid; and
the retina. The cornea is a tough, transparent tissue that covers the anterior

Cornea FIGURE 2.1
Simplified
Iris
diagram of a cross
section of the
human eye.
Anterior chamber Ciliary muscle
dy
bo
y
ar
ili
C

Lens

Ciliary fibers

Visual axis

Vitreous humor

Retina

Blind spot
Fovea
Sclera

Choroid

Ner
ve &
she
ath

36 Chapter 2 I Digital Image Fundamentals

surface of the eye. Continuous with the cornea, the sclera is an opaque mem-
brane that encloses the remainder of the optic globe.
The choroid lies directly below the sclera. This membrane contains a net-
work of blood vessels that serve as the major source of nutrition to the eye.
Even superficial injury to the choroid, often not deemed serious, can lead to se-
vere eye damage as a result of inflammation that restricts blood flow. The
choroid coat is heavily pigmented and hence helps to reduce the amount of ex-
traneous light entering the eye and the backscatter within the optical globe. At
its anterior extreme, the choroid is divided into the ciliary body and the iris
diaphragm. The latter contracts or expands to control the amount of light that
enters the eye. The central opening of the iris (the pupil) varies in diameter
from approximately 2 to 8 mm. The front of the iris contains the visible pig-
ment of the eye, whereas the back contains a black pigment.
The lens is made up of concentric layers of fibrous cells and is suspended by
fibers that attach to the ciliary body. It contains 60 to 70% water, about 6% fat, and
more protein than any other tissue in the eye.The lens is colored by a slightly yel-
low pigmentation that increases with age. In extreme cases, excessive clouding of
the lens, caused by the affliction commonly referred to as cataracts, can lead to
poor color discrimination and loss of clear vision. The lens absorbs approximate-
ly 8% of the visible light spectrum, with relatively higher absorption at shorter
wavelengths. Both infrared and ultraviolet light are absorbed appreciably by pro-
teins within the lens structure and, in excessive amounts, can damage the eye.
The innermost membrane of the eye is the retina, which lines the inside of the
wall’s entire posterior portion. When the eye is properly focused, light from an
object outside the eye is imaged on the retina. Pattern vision is afforded by the
distribution of discrete light receptors over the surface of the retina.There are two
classes of receptors: cones and rods. The cones in each eye number between 6
and 7 million. They are located primarily in the central portion of the retina,
called the fovea, and are highly sensitive to color. Humans can resolve fine de-
tails with these cones largely because each one is connected to its own nerve end.
Muscles controlling the eye rotate the eyeball until the image of an object of in-
terest falls on the fovea. Cone vision is called photopic or bright-light vision.
The number of rods is much larger: Some 75 to 150 million are distributed
over the retinal surface. The larger area of distribution and the fact that sever-
al rods are connected to a single nerve end reduce the amount of detail dis-
cernible by these receptors. Rods serve to give a general, overall picture of the
field of view. They are not involved in color vision and are sensitive to low lev-
els of illumination. For example, objects that appear brightly colored in day-
light when seen by moonlight appear as colorless forms because only the rods
are stimulated. This phenomenon is known as scotopic or dim-light vision.
Figure 2.2 shows the density of rods and cones for a cross section of the right
eye passing through the region of emergence of the optic nerve from the eye.
The absence of receptors in this area results in the so-called blind spot (see
Fig. 2.1). Except for this region, the distribution of receptors is radially sym-
metric about the fovea. Receptor density is measured in degrees from the fovea
(that is, in degrees off axis, as measured by the angle formed by the visual axis
and a line passing through the center of the lens and intersecting the retina).


180,000 FIGURE 2.2
Blind spot Distribution of
Cones rods and cones in
Rods the retina.
No. of rods or cones per mm2

135,000

90,000

45,000

80° 60° 40° 20° 0° 20° 40° 60° 80°
Degrees from visual axis (center of fovea)

Note in Fig. 2.2 that cones are most dense in the center of the retina (in the cen-
ter area of the fovea). Note also that rods increase in density from the center
out to approximately 20° off axis and then decrease in density out to the extreme
periphery of the retina.
The fovea itself is a circular indentation in the retina of about 1.5 mm in di-
ameter. However, in terms of future discussions, talking about square or rec-
tangular arrays of sensing elements is more useful. Thus, by taking some liberty
in interpretation, we can view the fovea as a square sensor array of size
1.5 mm*1.5 mm. The density of cones in that area of the retina is approxi-
mately 150,000 elements per mm2. Based on these approximations, the number
of cones in the region of highest acuity in the eye is about 337,000 elements.
Just in terms of raw resolving power, a charge-coupled device (CCD) imaging
chip of medium resolution can have this number of elements in a receptor array
no larger than 5 mm*5 mm. While the ability of humans to integrate intelli-
gence and experience with vision makes this type of comparison dangerous.
Keep in mind for future discussions that the basic ability of the eye to resolve
detail is certainly within the realm of current electronic imaging sensors.

2.1.2 Image Formation in the Eye
The principal difference between the lens of the eye and an ordinary optical
lens is that the former is flexible. As illustrated in Fig. 2.1, the radius of curva-
ture of the anterior surface of the lens is greater than the radius of its posteri-
or surface. The shape of the lens is controlled by tension in the fibers of the
ciliary body. To focus on distant objects, the controlling muscles cause the lens
to be relatively flattened. Similarly, these muscles allow the lens to become
thicker in order to focus on objects near the eye.
The distance between the center of the lens and the retina (called the focal
length) varies from approximately 17 mm to about 14 mm, as the refractive
power of the lens increases from its minimum to its maximum. When the eye


FIGURE 2.3
Graphical
representation of C
the eye looking at 15 m
a palm tree. Point
C is the optical
center of the lens.

100 m 17 mm

focuses on an object farther away than about 3 m, the lens exhibits its lowest re-
fractive power.When the eye focuses on a nearby object, the lens is most strong-
ly refractive. This information makes it easy to calculate the size of the retinal
image of any object. In Fig. 2.3, for example, the observer is looking at a tree
15 m high at a distance of 100 m. If h is the height in mm of that object in the
retinal image, the geometry of Fig. 2.3 yields 15/100=h/17 or h=2.55 mm.As
indicated in Section 2.1.1, the retinal image is reflected primarily in the area of
the fovea. Perception then takes place by the relative excitation of light recep-
tors, which transform radiant energy into electrical impulses that are ultimate-
ly decoded by the brain.

2.1.3 Brightness Adaptation and Discrimination
Because digital images are displayed as a discrete set of intensities, the eye’s
ability to discriminate between different intensity levels is an important con-
sideration in presenting image-processing results.The range of light intensity lev-
els to which the human visual system can adapt is enormous—on the order of
1010—from the scotopic threshold to the glare limit. Experimental evidence in-
dicates that subjective brightness (intensity as perceived by the human visual
system) is a logarithmic function of the light intensity incident on the eye. Fig-
ure 2.4, a plot of light intensity versus subjective brightness, illustrates this char-

FIGURE 2.4
Glare limit
Range of
subjective
brightness
sensations
Subjective brightness

showing a
Adaptation range

particular
Ba
adaptation level.
Bb

Scotopic

Scotopic Photopic
threshold
–6 –4 –2 0 2 4
Log of intensity (mL)


acteristic. The long solid curve represents the range of intensities to which the
visual system can adapt. In photopic vision alone, the range is about 106. The
transition from scotopic to photopic vision is gradual over the approximate
range from 0.001 to 0.1 millilambert (–3 to –1 mL in the log scale), as the dou-
ble branches of the adaptation curve in this range show.
The essential point in interpreting the impressive dynamic range depicted
in Fig. 2.4 is that the visual system cannot operate over such a range simultane-
ously. Rather, it accomplishes this large variation by changes in its overall sen-
sitivity, a phenomenon known as brightness adaptation. The total range of
distinct intensity levels it can discriminate simultaneously is rather small when
compared with the total adaptation range. For any given set of conditions, the
current sensitivity level of the visual system is called the brightness adaptation
level, which may correspond, for example, to brightness Ba in Fig. 2.4. The short
intersecting curve represents the range of subjective brightness that the eye can
perceive when adapted to this level. This range is rather restricted, having a
level Bb at and below which all stimuli are perceived as indistinguishable blacks.
The upper (dashed) portion of the curve is not actually restricted but, if ex-
tended too far, loses its meaning because much higher intensities would simply
raise the adaptation level higher than Ba .
The ability of the eye to discriminate between changes in light intensity at any
specific adaptation level is also of considerable interest. A classic experiment
used to determine the capability of the human visual system for brightness dis-
crimination consists of having a subject look at a flat, uniformly illuminated
area large enough to occupy the entire field of view. This area typically is a dif-
fuser, such as opaque glass, that is illuminated from behind by a light source
whose intensity, I, can be varied. To this field is added an increment of illumi-
nation, I, in the form of a short-duration flash that appears as a circle in the
center of the uniformly illuminated field, as Fig. 2.5 shows.
If I is not bright enough, the subject says “no,” indicating no perceivable
change.As I gets stronger, the subject may give a positive response of “yes,” in-
dicating a perceived change. Finally, when I is strong enough, the subject will
give a response of “yes” all the time. The quantity ¢Ic I, where ¢Ic is the incre-
ment of illumination discriminable 50% of the time with background illumina-
tion I, is called the Weber ratio. A small value of ¢Ic I, means that a small
percentage change in intensity is discriminable.This represents “good” brightness
discrimination. Conversely, a large value of ¢Ic I, means that a large percentage
change in intensity is required.This represents “poor” brightness discrimination.

FIGURE 2.5 Basic
experimental
I+¢I setup used to
characterize
brightness
discrimination.

I


FIGURE 2.6 1.0
Typical Weber
ratio as a function 0.5
of intensity.
0

log ¢Ic /I
– 0.5

–1.0

–1.5

–2.0
–4 –3 –2 –1 0 1 2 3 4
log I

A plot of log ¢Ic I, as a function of log I has the general shape shown in
Fig. 2.6.This curve shows that brightness discrimination is poor (the Weber ratio
is large) at low levels of illumination, and it improves significantly (the Weber
ratio decreases) as background illumination increases. The two branches in the
curve reflect the fact that at low levels of illumination vision is carried out by
activity of the rods, whereas at high levels (showing better discrimination) vi-
sion is the function of cones.
If the background illumination is held constant and the intensity of the
other source, instead of flashing, is now allowed to vary incrementally from
never being perceived to always being perceived, the typical observer can dis-
cern a total of one to two dozen different intensity changes. Roughly, this re-
sult is related to the number of different intensities a person can see at any one
point in a monochrome image. This result does not mean that an image can be
represented by such a small number of intensity values because, as the eye
roams about the image, the average background changes, thus allowing a
different set of incremental changes to be detected at each new adaptation
level. The net consequence is that the eye is capable of a much broader range
of overall intensity discrimination. In fact, we show in Section 2.4.3 that the eye
is capable of detecting objectionable contouring effects in monochrome im-
ages whose overall intensity is represented by fewer than approximately two
dozen levels.
Two phenomena clearly demonstrate that perceived brightness is not a sim-
ple function of intensity. The first is based on the fact that the visual system
tends to undershoot or overshoot around the boundary of regions of different
intensities. Figure 2.7(a) shows a striking example of this phenomenon. Al-
though the intensity of the stripes is constant, we actually perceive a brightness
pattern that is strongly scalloped, especially near the boundaries [Fig. 2.7(b)].
These seemingly scalloped bands are called Mach bands after Ernst Mach, who
first described the phenomenon in 1865.
The second phenomenon, called simultaneous contrast, is related to the fact
that a region’s perceived brightness does not depend simply on its intensity, as
Fig. 2.8 demonstrates. All the center squares have exactly the same intensity.


a
b
FIGURE 2.7
(a) An example
showing that
perceived
brightness is not a
simple function of
intensity. The
relative vertical
positions between
the two profiles in
(b) have no
special
significance; they
were chosen for
clarity.

Perceived brightness

Actual illumination

However, they appear to the eye to become darker as the background gets
lighter.A more familiar example is a piece of paper that seems white when lying
on a desk, but can appear totally black when used to shield the eyes while look-
ing directly at a bright sky.

a b c
FIGURE 2.8 Examples of simultaneous contrast. All the inner squares have the same in-
tensity, but they appear progressively darker as the background becomes lighter.


a b
c d
FIGURE 2.9 Some
well-known
optical illusions.

Other examples of human perception phenomena are optical illusions, in
which the eye fills in nonexisting information or wrongly perceives geometrical
properties of objects. Some examples are shown in Fig. 2.9. In Fig. 2.9(a), the out-
line of a square is seen clearly, in spite of the fact that no lines defining such a
figure are part of the image. The same effect, this time with a circle, can be seen
in Fig. 2.9(b); note how just a few lines are sufficient to give the illusion of a
complete circle. The two horizontal line segments in Fig. 2.9(c) are of the same
length, but one appears shorter than the other. Finally, all lines in Fig. 2.9(d)
that are oriented at 45° are equidistant and parallel. Yet the crosshatching cre-
ates the illusion that those lines are far from being parallel. Optical illusions
are a characteristic of the human visual system that is not fully understood.

2.2 Light and the Electromagnetic Spectrum
The electromagnetic spectrum was introduced in Section 1.3. We now consider
this topic in more detail. In 1666, Sir Isaac Newton discovered that when a beam
of sunlight is passed through a glass prism, the emerging beam of light is not

2.2 I Light and the Electromagnetic Spectrum 43

Energy of one photon (electron volts)
106 105 104 103 102 101 1 10–1 10–2 10–3 10–4 10–5 10–6 10–7 10–8 10–9

Frequency (Hz)
1021 1020 1019 1018 1017 1016 1015 1014 1013 1012 1011 1010 109 108 107 106 105

Wavelength (meters)
10–12 10–11 10–10 10–9 10–8 10–7 10–6 10–5 10–4 10–3 10–2 10–1 1 101 102 103

Hard X-rays Ultraviolet Infrared Radio waves

Gamma rays Soft X-rays Visible spectrum Microwaves

0.4*10 –6 0.5*10 –6 0.6*10 –6 0.7*10 –6
Ultraviolet Violet Blue Green Yellow Orange Red Infrared
FIGURE 2.10 The electromagnetic spectrum. The visible spectrum is shown zoomed to facilitate explanation,
but note that the visible spectrum is a rather narrow portion of the EM spectrum.

white but consists instead of a continuous spectrum of colors ranging from vio-
let at one end to red at the other. As shown in Fig. 2.10, the range of colors we
perceive in visible light represents a very small portion of the electromagnetic
spectrum. On one end of the spectrum are radio waves with wavelengths billions
of times longer than those of visible light. On the other end of the spectrum are
gamma rays with wavelengths millions of times smaller than those of visible light.
The electromagnetic spectrum can be expressed in terms of wavelength, fre-
quency, or energy.Wavelength (l) and frequency (n) are related by the expression
c
l = (2.2-1)
n
where c is the speed of light (2.998*108 m s). The energy of the various com-
ponents of the electromagnetic spectrum is given by the expression
E=hn (2.2-2)
where h is Planck’s constant.The units of wavelength are meters, with the terms
microns (denoted m and equal to 10–6 m) and nanometers (10–9 m) being used
just as frequently. Frequency is measured in Hertz (Hz), with one Hertz being
equal to one cycle of a sinusoidal wave per second.A commonly used unit of en-
ergy is the electron-volt.


FIGURE 2.11 l
Graphical
representation of
one wavelength.

Electromagnetic waves can be visualized as propagating sinusoidal waves with
wavelength l (Fig. 2.11), or they can be thought of as a stream of massless parti-
cles, each traveling in a wavelike pattern and moving at the speed of light. Each
massless particle contains a certain amount (or bundle) of energy. Each bundle
of energy is called a photon. We see from Eq. (2.2-2) that energy is proportional
to frequency, so the higher-frequency (shorter wavelength) electromagnetic phe-
nomena carry more energy per photon.Thus, radio waves have photons with low
energies, microwaves have more energy than radio waves, infrared still more, then
visible, ultraviolet, X-rays, and finally gamma rays, the most energetic of all. This
is the reason that gamma rays are so dangerous to living organisms.
Light is a particular type of electromagnetic radiation that can be seen and
sensed by the human eye. The visible (color) spectrum is shown expanded in
Fig. 2.10 for the purpose of discussion (we consider color in much more detail in
Chapter 6).The visible band of the electromagnetic spectrum spans the range from
approximately 0.43 m (violet) to about 0.79 m (red). For convenience, the color
spectrum is divided into six broad regions: violet, blue, green, yellow, orange, and
red. No color (or other component of the electromagnetic spectrum) ends abrupt-
ly, but rather each range blends smoothly into the next, as shown in Fig. 2.10.
The colors that humans perceive in an object are determined by the nature of
the light reflected from the object. A body that reflects light and is relatively bal-
anced in all visible wavelengths appears white to the observer. However, a body that
favors reflectance in a limited range of the visible spectrum exhibits some shades
of color. For example, green objects reflect light with wavelengths primarily in the
500 to 570 nm range while absorbing most of the energy at other wavelengths.
Light that is void of color is called achromatic or monochromatic light. The
only attribute of such light is its intensity, or amount. The term gray level gen-
erally is used to describe monochromatic intensity because it ranges from black,
to grays, and finally to white. Chromatic light spans the electromagnetic ener-
gy spectrum from approximately 0.43 to 0.79 m, as noted previously. Three
basic quantities are used to describe the quality of a chromatic light source: ra-
diance; luminance; and brightness. Radiance is the total amount of energy that
flows from the light source, and it is usually measured in watts (W). Luminance,
measured in lumens (lm), gives a measure of the amount of energy an observ-
er perceives from a light source. For example, light emitted from a source op-
erating in the far infrared region of the spectrum could have significant energy
(radiance), but an observer would hardly perceive it; its luminance would be
almost zero. Finally, as discussed in Section 2.1, brightness is a subjective de-
scriptor of light perception that is practically impossible to measure. It embod-

2.3 I Image Sensing and Acquisition 45

ies the achromatic notion of intensity and is one of the key factors in describ-
ing color sensation.
Continuing with the discussion of Fig. 2.10, we note that at the short-wave-
length end of the electromagnetic spectrum, we have gamma rays and hard
X-rays. As discussed in Section 1.3.1, gamma radiation is important for medical
and astronomical imaging, and for imaging radiation in nuclear environments.
Hard (high-energy) X-rays are used in industrial applications. Chest X-rays are
in the high end (shorter wavelength) of the soft X-rays region and dental X-rays
are in the lower energy end of that band. The soft X-ray band transitions into
the far ultraviolet light region, which in turn blends with the visible spectrum at
longer wavelengths. Moving still higher in wavelength, we encounter the in-
frared band, which radiates heat, a fact that makes it useful in imaging applica-
tions that rely on “heat signatures.” The part of the infrared band close to the
visible spectrum is called the near-infrared region.The opposite end of this band
is called the far-infrared region. This latter region blends with the microwave
band. This band is well known as the source of energy in microwave ovens, but
it has many other uses, including communication and radar. Finally, the radio
wave band encompasses television as well as AM and FM radio. In the higher
energies, radio signals emanating from certain stellar bodies are useful in as-
tronomical observations. Examples of images in most of the bands just discussed
are given in Section 1.3.
In principle, if a sensor can be developed that is capable of detecting energy
radiated by a band of the electromagnetic spectrum, we can image events of in-
terest in that band. It is important to note, however, that the wavelength of an
electromagnetic wave required to “see” an object must be of the same size as
or smaller than the object. For example, a water molecule has a diameter on
the order of 10–10 m. Thus, to study molecules, we would need a source capable
of emitting in the far ultraviolet or soft X-ray region. This limitation, along with
the physical properties of the sensor material, establishes the fundamental lim-
its on the capability of imaging sensors, such as visible, infrared, and other sen-
sors in use today.
Although imaging is based predominantly on energy radiated by electro-
magnetic waves, this is not the only method for image generation. For example,
as discussed in Section 1.3.7, sound reflected from objects can be used to form
ultrasonic images. Other major sources of digital images are electron beams for
electron microscopy and synthetic images used in graphics and visualization.

2.3 Image Sensing and Acquisition
The types of images in which we are interested are generated by the combina-
tion of an “illumination” source and the reflection or absorption of energy from
that source by the elements of the “scene” being imaged. We enclose illumina-
tion and scene in quotes to emphasize the fact that they are considerably more
general than the familiar situation in which a visible light source illuminates a
common everyday 3-D (three-dimensional) scene. For example, the illumination
may originate from a source of electromagnetic energy such as radar, infrared,


or X-ray energy. But, as noted earlier, it could originate from less traditional
sources, such as ultrasound or even a computer-generated illumination pattern.
Similarly, the scene elements could be familiar objects, but they can just as eas-
ily be molecules, buried rock formations, or a human brain.We could even image
a source, such as acquiring images of the sun. Depending on the nature of the
source, illumination energy is reflected from, or transmitted through, objects.An
example in the first category is light reflected from a planar surface. An exam-
ple in the second category is when X-rays pass through a patient’s body for the
purpose of generating a diagnostic X-ray film. In some applications, the re-
flected or transmitted energy is focused onto a photoconverter (e.g., a phos-
phor screen), which converts the energy into visible light. Electron microscopy
and some applications of gamma imaging use this approach.
Figure 2.12 shows the three principal sensor arrangements used to transform
illumination energy into digital images. The idea is simple: Incoming energy is

a Energy
b
c Filter
FIGURE 2.12
(a) Single imaging
sensor. Sensing material
Power in
(b) Line sensor.
(c) Array sensor.
Voltage waveform out
Housing


transformed into a voltage by the combination of input electrical power and
sensor material that is responsive to the particular type of energy being de-
tected. The output voltage waveform is the response of the sensor(s), and a dig-
ital quantity is obtained from each sensor by digitizing its response. In this
section, we look at the principal modalities for image sensing and generation.
Image digitizing is discussed in Section 2.4.

2.3.1 Image Acquisition Using a Single Sensor
Figure 2.12(a) shows the components of a single sensor. Perhaps the most fa-
miliar sensor of this type is the photodiode, which is constructed of silicon ma-
terials and whose output voltage waveform is proportional to light. The use of
a filter in front of a sensor improves selectivity. For example, a green (pass) fil-
ter in front of a light sensor favors light in the green band of the color spec-
trum. As a consequence, the sensor output will be stronger for green light than
for other components in the visible spectrum.
In order to generate a 2-D image using a single sensor, there has to be rela-
tive displacements in both the x- and y-directions between the sensor and the
area to be imaged. Figure 2.13 shows an arrangement used in high-precision
scanning, where a film negative is mounted onto a drum whose mechanical ro-
tation provides displacement in one dimension.The single sensor is mounted on
a lead screw that provides motion in the perpendicular direction. Since me-
chanical motion can be controlled with high precision, this method is an inex-
pensive (but slow) way to obtain high-resolution images. Other similar
mechanical arrangements use a flat bed, with the sensor moving in two linear
directions. These types of mechanical digitizers sometimes are referred to as
microdensitometers.
Another example of imaging with a single sensor places a laser source coin-
cident with the sensor. Moving mirrors are used to control the outgoing beam
in a scanning pattern and to direct the reflected laser signal onto the sensor.
This arrangement also can be used to acquire images using strip and array sen-
sors, which are discussed in the following two sections.

Film

Sensor Rotation

Linear motion
One image line out
per increment of rotation
and full linear displacement
of sensor from left to right.
FIGURE 2.13 Combining a single sensor with motion to generate a 2-D image.


2.3.2 Image Acquisition Using Sensor Strips
A geometry that is used much more frequently than single sensors consists of
an in-line arrangement of sensors in the form of a sensor strip, as Fig. 2.12(b)
shows. The strip provides imaging elements in one direction. Motion perpen-
dicular to the strip provides imaging in the other direction, as shown in
Fig. 2.14(a).This is the type of arrangement used in most flat bed scanners. Sens-
ing devices with 4000 or more in-line sensors are possible. In-line sensors are
used routinely in airborne imaging applications, in which the imaging system is
mounted on an aircraft that flies at a constant altitude and speed over the ge-
ographical area to be imaged. One-dimensional imaging sensor strips that re-
spond to various bands of the electromagnetic spectrum are mounted
perpendicular to the direction of flight. The imaging strip gives one line of an
image at a time, and the motion of the strip completes the other dimension of
a two-dimensional image. Lenses or other focusing schemes are used to pro-
ject the area to be scanned onto the sensors.
Sensor strips mounted in a ring configuration are used in medical and in-
dustrial imaging to obtain cross-sectional (“slice”) images of 3-D objects, as
Fig. 2.14(b) shows. A rotating X-ray source provides illumination and the por-

One image line out per
increment of linear motion

Imaged area
Image
reconstruction
Cross-sectional images
Linear motion of 3-D object

Sensor strip

3-D object

X-ray source

tion
ar mo
L ine
Sensor ring

a b
FIGURE 2.14 (a) Image acquisition using a linear sensor strip. (b) Image acquisition using a circular sensor strip.


tion of the sensors opposite the source collect the X-ray energy that pass through
the object (the sensors obviously have to be sensitive to X-ray energy). This is
the basis for medical and industrial computerized axial tomography (CAT)
imaging as indicated in Sections 1.2 and 1.3.2. It is important to note that the out-
put of the sensors must be processed by reconstruction algorithms whose ob-
jective is to transform the sensed data into meaningful cross-sectional images.
In other words, images are not obtained directly from the sensors by motion
alone; they require extensive processing. A 3-D digital volume consisting of
stacked images is generated as the object is moved in a direction perpendicu-
lar to the sensor ring. Other modalities of imaging based on the CAT principle
include magnetic resonance imaging (MRI) and positron emission tomography
(PET). The illumination sources, sensors, and types of images are different, but
conceptually they are very similar to the basic imaging approach shown in
Fig. 2.14(b).

2.3.3 Image Acquisition Using Sensor Arrays
Figure 2.12(c) shows individual sensors arranged in the form of a 2-D array.
Numerous electromagnetic and some ultrasonic sensing devices frequently are
arranged in an array format. This is also the predominant arrangement found
in digital cameras. A typical sensor for these cameras is a CCD array, which can
be manufactured with a broad range of sensing properties and can be packaged
in rugged arrays of 4000 * 4000 elements or more. CCD sensors are used wide-
ly in digital cameras and other light sensing instruments. The response of each
sensor is proportional to the integral of the light energy projected onto the sur-
face of the sensor, a property that is used in astronomical and other applica-
tions requiring low noise images. Noise reduction is achieved by letting the
sensor integrate the input light signal over minutes or even hours (we discuss
noise reduction by integration in Chapter 3). Since the sensor array shown in
Fig. 2.15(c) is two dimensional, its key advantage is that a complete image can
be obtained by focusing the energy pattern onto the surface of the array. Mo-
tion obviously is not necessary, as is the case with the sensor arrangements dis-
cussed in the preceding two sections.
The principal manner in which array sensors are used is shown in Fig. 2.15.
This figure shows the energy from an illumination source being reflected from
a scene element, but, as mentioned at the beginning of this section, the energy
also could be transmitted through the scene elements. The first function per-
formed by the imaging system shown in Fig. 2.15(c) is to collect the incoming
energy and focus it onto an image plane. If the illumination is light, the front end
of the imaging system is a lens, which projects the viewed scene onto the lens
focal plane, as Fig. 2.15(d) shows. The sensor array, which is coincident with the
focal plane, produces outputs proportional to the integral of the light received
at each sensor. Digital and analog circuitry sweep these outputs and convert
them to a video signal, which is then digitized by another section of the imag-
ing system. The output is a digital image, as shown diagrammatically in
Fig. 2.15(e). Conversion of an image into digital form is the topic of Section 2.4.


Illumination (energy)
source

Output (digitized) image

Imaging system

(Internal) image plane

Scene element

a
c d e
b
FIGURE 2.15 An example of the digital image acquisition process. (a) Energy (“illumination”) source. (b) An el-
ement of a scene. (c) Imaging system. (d) Projection of the scene onto the image plane. (e) Digitized image.

2.3.4 A Simple Image Formation Model
As introduced in Section 1.1, we shall denote images by two-dimensional func-
tions of the form f(x, y). The value or amplitude of f at spatial coordinates
(x, y) is a positive scalar quantity whose physical meaning is determined by
the source of the image. Most of the images in which we are interested in this
book are monochromatic images, whose values are said to span the gray scale,
as discussed in Section 2.2. When an image is generated from a physical
process, its values are proportional to energy radiated by a physical source
(e.g., electromagnetic waves). As a consequence, f(x, y) must be nonzero and
finite; that is,

0<f(x, y)<q. (2.3-1)

The function f(x, y) may be characterized by two components: (1) the
amount of source illumination incident on the scene being viewed, and (2) the
amount of illumination reflected by the objects in the scene. Appropriately,
these are called the illumination and reflectance components and are denoted
by i(x, y) and r(x, y), respectively. The two functions combine as a product to
form f(x, y):


f(x, y)=i(x, y)r(x, y) (2.3-2)

where

0<i(x, y)<q (2.3-3)

and

0<r(x, y)<1. (2.3-4)

Equation (2.3-4) indicates that reflectance is bounded by 0 (total absorption)
and 1 (total reflectance).The nature of i(x, y) is determined by the illumination
source, and r(x, y) is determined by the characteristics of the imaged objects.
It is noted that these expressions also are applicable to images formed via trans-
mission of the illumination through a medium, such as a chest X-ray. In this
case, we would deal with a transmissivity instead of a reflectivity function, but the
limits would be the same as in Eq. (2.3-4), and the image function formed would
be modeled as the product in Eq. (2.3-2).

I The values given in Eqs. (2.3-3) and (2.3-4) are theoretical bounds. The fol- EXAMPLE 2.1:
lowing average numerical figures illustrate some typical ranges of i(x, y) for Some typical
visible light. On a clear day, the sun may produce in excess of 90,000 lm m2 of values of
illumination and
illumination on the surface of the Earth. This figure decreases to less than reflectance.
10,000 lm m2 on a cloudy day. On a clear evening, a full moon yields about
0.1 lm m2 of illumination. The typical illumination level in a commercial office
is about 1000 lm m2. Similarly, the following are some typical values of r(x, y):
0.01 for black velvet, 0.65 for stainless steel, 0.80 for flat-white wall paint, 0.90
for silver-plated metal, and 0.93 for snow. I

As noted in Section 2.2, we call the intensity of a monochrome image at any
coordinates Ax0 , y0 B the gray level (/) of the image at that point. That is,

/ = fAx0 , y0 B (2.3-5)

From Eqs. (2.3-2) through (2.3-4), it is evident that / lies in the range

Lmin / Lmax (2.3-6)

In theory, the only requirement on Lmin is that it be positive, and on Lmax that it
be finite. In practice, Lmin=imin rmin and Lmax=imax rmax . Using the preceding av-
erage office illumination and range of reflectance values as guidelines, we may
expect Lmin≠10 and Lmax≠1000 to be typical limits for indoor values in the
absence of additional illumination.
The interval CLmin , Lmax D is called the gray scale. Common practice is to shift
this interval numerically to the interval [0, L-1], where /=0 is considered
black and /=L-1 is considered white on the gray scale. All intermediate
values are shades of gray varying from black to white.


2.4 Image Sampling and Quantization
From the discussion in the preceding section, we see that there are numerous
ways to acquire images, but our objective in all is the same: to generate digital
images from sensed data. The output of most sensors is a continuous voltage
waveform whose amplitude and spatial behavior are related to the physical
phenomenon being sensed. To create a digital image, we need to convert the
continuous sensed data into digital form. This involves two processes: sampling
and quantization.

2.4.1 Basic Concepts in Sampling and Quantization
The basic idea behind sampling and quantization is illustrated in Fig. 2.16. Fig-
ure 2.16(a) shows a continuous image, f(x, y), that we want to convert to digi-
tal form. An image may be continuous with respect to the x- and y-coordinates,
and also in amplitude. To convert it to digital form, we have to sample the func-
tion in both coordinates and in amplitude. Digitizing the coordinate values is
called sampling. Digitizing the amplitude values is called quantization.
The one-dimensional function shown in Fig. 2.16(b) is a plot of amplitude
(gray level) values of the continuous image along the line segment AB in
Fig. 2.16(a). The random variations are due to image noise. To sample this func-
tion, we take equally spaced samples along line AB, as shown in Fig. 2.16(c).The
location of each sample is given by a vertical tick mark in the bottom part of the
figure.The samples are shown as small white squares superimposed on the func-
tion.The set of these discrete locations gives the sampled function. However, the
values of the samples still span (vertically) a continuous range of gray-level val-
ues. In order to form a digital function, the gray-level values also must be con-
verted (quantized) into discrete quantities. The right side of Fig. 2.16(c) shows
the gray-level scale divided into eight discrete levels, ranging from black to
white. The vertical tick marks indicate the specific value assigned to each of the
eight gray levels. The continuous gray levels are quantized simply by assigning
one of the eight discrete gray levels to each sample. The assignment is made
depending on the vertical proximity of a sample to a vertical tick mark. The
digital samples resulting from both sampling and quantization are shown in
Fig. 2.16(d). Starting at the top of the image and carrying out this procedure
line by line produces a two-dimensional digital image.
Sampling in the manner just described assumes that we have a continuous
image in both coordinate directions as well as in amplitude. In practice, the
method of sampling is determined by the sensor arrangement used to generate
the image. When an image is generated by a single sensing element combined
with mechanical motion, as in Fig. 2.13, the output of the sensor is quantized in
the manner described above. However, sampling is accomplished by selecting
the number of individual mechanical increments at which we activate the sen-
sor to collect data. Mechanical motion can be made very exact so, in principle,
there is almost no limit as to how fine we can sample an image. However, prac-
tical limits are established by imperfections in the optics used to focus on the

2.4 I Image Sampling and Quantization 53

A B

A B

A B A B
Quantization

Sampling

a b
c d
FIGURE 2.16 Generating a digital image. (a) Continuous image. (b) A scan line from A to B in the continuous image,
used to illustrate the concepts of sampling and quantization. (c) Sampling and quantization. (d) Digital scan line.

sensor an illumination spot that is inconsistent with the fine resolution achiev-
able with mechanical displacements.
When a sensing strip is used for image acquisition, the number of sensors in
the strip establishes the sampling limitations in one image direction. Mechanical
motion in the other direction can be controlled more accurately, but it makes
little sense to try to achieve sampling density in one direction that exceeds the


a b
FIGURE 2.17 (a) Continuos image projected onto a sensor array. (b) Result of image
sampling and quantization.

sampling limits established by the number of sensors in the other. Quantiza-
tion of the sensor outputs completes the process of generating a digital image.
When a sensing array is used for image acquisition, there is no motion and
the number of sensors in the array establishes the limits of sampling in both di-
rections. Quantization of the sensor outputs is as before. Figure 2.17 illustrates
this concept. Figure 2.17(a) shows a continuous image projected onto the plane
of an array sensor. Figure 2.17(b) shows the image after sampling and quanti-
zation. Clearly, the quality of a digital image is determined to a large degree by
the number of samples and discrete gray levels used in sampling and quantiza-
tion. However, as shown in Section 2.4.3, image content is an important con-
sideration in choosing these parameters.

2.4.2 Representing Digital Images
The result of sampling and quantization is a matrix of real numbers.We will use
two principal ways in this book to represent digital images.Assume that an image
f(x, y) is sampled so that the resulting digital image has M rows and N columns.
The values of the coordinates (x, y) now become discrete quantities. For nota-
tional clarity and convenience, we shall use integer values for these discrete co-
ordinates. Thus, the values of the coordinates at the origin are (x, y)=(0, 0).
The next coordinate values along the first row of the image are represented as
(x, y)=(0, 1). It is important to keep in mind that the notation (0, 1) is used
to signify the second sample along the first row. It does not mean that these are
the actual values of physical coordinates when the image was sampled. Figure
2.18 shows the coordinate convention used throughout this book.


Origin FIGURE 2.18
0 1 2 3. . . . . . N-1 Coordinate
0 y convention used
1 in this book to
2 represent digital
images.
3
.
.
.

.
.
.
M-1
One pixel f (x, y)
x

The notation introduced in the preceding paragraph allows us to write the
complete M*N digital image in the following compact matrix form:

f(x, y) = D T.
f(0, 0) f(0, 1) p f(0, N - 1)
f(1, 0) f(1, 1) p f(1, N - 1)
(2.4-1)
o o o
f(M - 1, 0) f(M - 1, 1) p f(M - 1, N - 1)
The right side of this equation is by definition a digital image. Each element of
this matrix array is called an image element, picture element, pixel, or pel. The
terms image and pixel will be used throughout the rest of our discussions to de-
note a digital image and its elements.
In some discussions, it is advantageous to use a more traditional matrix no-
tation to denote a digital image and its elements:

A = D T.
a0, 0 a0, 1 p a0, N - 1
a1, 0 a1, 1 p a1, N - 1
(2.4-2)
o o o
aM - 1, 0 aM - 1, 1 p aM - 1, N - 1

Clearly, aij=f(x=i, y=j)=f(i, j), so Eqs. (2.4-1) and (2.4-2) are identical
matrices.
Expressing sampling and quantization in more formal mathematical terms
can be useful at times. Let Z and R denote the set of real integers and the set
of real numbers, respectively. The sampling process may be viewed as parti-
tioning the xy plane into a grid, with the coordinates of the center of each grid
being a pair of elements from the Cartesian product Z2, which is the set of all
ordered pairs of elements Azi , zj B, with zi and zj being integers from Z. Hence,
f(x, y) is a digital image if (x, y) are integers from Z2 and f is a function that
assigns a gray-level value (that is, a real number from the set of real numbers,
R) to each distinct pair of coordinates (x, y). This functional assignment


obviously is the quantization process described earlier. If the gray levels also are
integers (as usually is the case in this and subsequent chapters), Z replaces R,
and a digital image then becomes a 2-D function whose coordinates and am-
plitude values are integers.
This digitization process requires decisions about values for M, N, and for the
number, L, of discrete gray levels allowed for each pixel. There are no require-
ments on M and N, other than that they have to be positive integers. However,
due to processing, storage, and sampling hardware considerations, the number
of gray levels typically is an integer power of 2:

L = 2k. (2.4-3)

We assume that the discrete levels are equally spaced and that they are integers
in the interval [0, L-1]. Sometimes the range of values spanned by the gray
scale is called the dynamic range of an image, and we refer to images whose gray
levels span a significant portion of the gray scale as having a high dynamic range.
When an appreciable number of pixels exhibit this property, the image will have
high contrast. Conversely, an image with low dynamic range tends to have a dull,
washed out gray look. This is discussed in much more detail in Section 3.3.
The number, b, of bits required to store a digitized image is

b=M*N*k. (2.4-4)

When M=N, this equation becomes

b = N 2k. (2.4-5)

Table 2.1 shows the number of bits required to store square images with vari-
ous values of N and k. The number of gray levels corresponding to each value
of k is shown in parentheses. When an image can have 2k gray levels, it is com-
mon practice to refer to the image as a “k-bit image.” For example, an image with
256 possible gray-level values is called an 8-bit image. Note that storage re-
quirements for 8-bit images of size 1024*1024 and higher are not insignificant.

TABLE 2.1
Number of storage bits for various values of N and k.

N/k 1 (L 2) 2 (L 4) 3 (L 8) 4 (L 16) 5 (L 32) 6 (L 64) 7 (L 128) 8 (L 256)

32 1,024 2,048 3,072 4,096 5,120 6,144 7,168 8,192
64 4,096 8,192 12,288 16,384 20,480 24,576 28,672 32,768
128 16,384 32,768 49,152 65,536 81,920 98,304 114,688 131,072
256 65,536 131,072 196,608 262,144 327,680 393,216 458,752 524,288
512 262,144 524,288 786,432 1,048,576 1,310,720 1,572,864 1,835,008 2,097,152
1024 1,048,576 2,097,152 3,145,728 4,194,304 5,242,880 6,291,456 7,340,032 8,388,608
2048 4,194,304 8,388,608 12,582,912 16,777,216 20,971,520 25,165,824 29,369,128 33,554,432
4096 16,777,216 33,554,432 50,331,648 67,108,864 83,886,080 100,663,296 117,440,512 134,217,728
8192 67,108,864 134,217,728 201,326,592 268,435,456 335,544,320 402,653,184 469,762,048 536,870,912


2.4.3 Spatial and Gray-Level Resolution
Sampling is the principal factor determining the spatial resolution of an image. Ba-
sically, spatial resolution is the smallest discernible detail in an image. Suppose that
we construct a chart with vertical lines of width W, with the space between the lines
also having width W. A line pair consists of one such line and its adjacent space.
Thus, the width of a line pair is 2W, and there are 1/2W line pairs per unit distance.
A widely used definition of resolution is simply the smallest number of discernible
line pairs per unit distance; for example, 100 line pairs per millimeter.
Gray-level resolution similarly refers to the smallest discernible change in
gray level, but, as noted in Section 2.1.3, measuring discernible changes in gray
level is a highly subjective process. We have considerable discretion regarding
the number of samples used to generate a digital image, but this is not true for
the number of gray levels. Due to hardware considerations, the number of gray
levels is usually an integer power of 2, as mentioned in the previous section.
The most common number is 8 bits, with 16 bits being used in some applica-
tions where enhancement of specific gray-level ranges is necessary. Sometimes
we find systems that can digitize the gray levels of an image with 10 or 12 bits
of accuracy, but these are the exception rather than the rule.
When an actual measure of physical resolution relating pixels and the level
of detail they resolve in the original scene are not necessary, it is not uncommon
to refer to an L-level digital image of size M*N as having a spatial resolution
of M*N pixels and a gray-level resolution of L levels. We will use this termi-
nology from time to time in subsequent discussions, making a reference to ac- EXAMPLE 2.2:
tual resolvable detail only when necessary for clarity. Typical effects of
varying the
number of
I Figure 2.19 shows an image of size 1024*1024 pixels whose gray levels are samples in a
represented by 8 bits. The other images shown in Fig. 2.19 are the results of digital image.

32
64
128

256

512

1024
FIGURE 2.19 A 1024*1024, 8-bit image subsampled down to size 32*32 pixels. The number of allowable
gray levels was kept at 256.


subsampling the 1024*1024 image. The subsampling was accomplished by
deleting the appropriate number of rows and columns from the original image.
For example, the 512*512 image was obtained by deleting every other row and
column from the 1024*1024 image. The 256*256 image was generated by
deleting every other row and column in the 512*512 image, and so on. The
number of allowed gray levels was kept at 256.
These images show the dimensional proportions between various sampling
densities, but their size differences make it difficult to see the effects resulting
from a reduction in the number of samples. The simplest way to compare these
effects is to bring all the subsampled images up to size 1024*1024 by row and
column pixel replication. The results are shown in Figs. 2.20(b) through (f). Fig-
ure 2.20(a) is the same 1024*1024, 256-level image shown in Fig. 2.19; it is re-
peated to facilitate comparisons.
Compare Fig. 2.20(a) with the 512*512 image in Fig. 2.20(b) and note that
it is virtually impossible to tell these two images apart. The level of detail lost
is simply too fine to be seen on the printed page at the scale in which these im-

a b c
d e f
FIGURE 2.20 (a) 1024*1024, 8-bit image. (b) 512*512 image resampled into 1024*1024 pixels by row and
column duplication. (c) through (f) 256*256, 128*128, 64*64, and 32*32 images resampled into
1024*1024 pixels.


ages are shown. Next, the 256*256 image in Fig. 2.20(c) shows a very slight fine
checkerboard pattern in the borders between flower petals and the black back-
ground. A slightly more pronounced graininess throughout the image also is
beginning to appear.These effects are much more visible in the 128*128 image
in Fig. 2.20(d), and they become pronounced in the 64*64 and 32*32 images
in Figs. 2.20(e) and (f), respectively. I
EXAMPLE 2.3:
I In this example, we keep the number of samples constant and reduce the num- Typical effects of
varying the
ber of gray levels from 256 to 2, in integer powers of 2. Figure 2.21(a) is a 452*374 number of gray
CAT projection image, displayed with k=8 (256 gray levels). Images such as this levels in a digital
are obtained by fixing the X-ray source in one position, thus producing a 2-D image image.

a b
c d
FIGURE 2.21
(a) 452*374,
256-level image.
(b)–(d) Image
displayed in 128,
64, and 32 gray
levels, while
keeping the
spatial resolution
constant.


in any desired direction. Projection images are used as guides to set up the para-
meters for a CAT scanner, including tilt, number of slices, and range.
Figures 2.21(b) through (h) were obtained by reducing the number of bits
from k=7 to k=1 while keeping the spatial resolution constant at 452*374
pixels. The 256-, 128-, and 64-level images are visually identical for all practical
purposes. The 32-level image shown in Fig. 2.21(d), however, has an almost im-
perceptible set of very fine ridgelike structures in areas of smooth gray levels
(particularly in the skull). This effect, caused by the use of an insufficient num-
ber of gray levels in smooth areas of a digital image, is called false contouring,
so called because the ridges resemble topographic contours in a map. False con-
touring generally is quite visible in images displayed using 16 or less uniform-
ly spaced gray levels, as the images in Figs. 2.21(e) through (h) show.

e f
g h
FIGURE 2.21
(Continued)
(e)–(g) Image
displayed in 16, 8,
4, and 2 gray
levels. (Original
courtesy of
Dr. David
R. Pickens,
Department of
Radiology &
Radiological
Sciences,
Vanderbilt
University
Medical Center.)


As a very rough rule of thumb, and assuming powers of 2 for convenience,
images of size 256*256 pixels and 64 gray levels are about the smallest images
that can be expected to be reasonably free of objectionable sampling checker-
boards and false contouring. I

The results in Examples 2.2 and 2.3 illustrate the effects produced on image
quality by varying N and k independently. However, these results only partially
answer the question of how varying N and k affect images because we have not
considered yet any relationships that might exist between these two parameters.
An early study by Huang [1965] attempted to quantify experimentally the ef-
fects on image quality produced by varying N and k simultaneously. The exper-
iment consisted of a set of subjective tests. Images similar to those shown in
Fig. 2.22 were used.The woman’s face is representative of an image with relatively
little detail; the picture of the cameraman contains an intermediate amount of
detail; and the crowd picture contains, by comparison, a large amount of detail.
Sets of these three types of images were generated by varying N and k, and
observers were then asked to rank them according to their subjective quality.
Results were summarized in the form of so-called isopreference curves in the
Nk-plane (Fig. 2.23 shows average isopreference curves representative of curves
corresponding to the images shown in Fig. 2.22). Each point in the Nk-plane rep-
resents an image having values of N and k equal to the coordinates of that point.
Points lying on an isopreference curve correspond to images of equal subjective
quality. It was found in the course of the experiments that the isopreference
curves tended to shift right and upward, but their shapes in each of the three
image categories were similar to those shown in Fig. 2.23. This is not unexpect-
ed, since a shift up and right in the curves simply means larger values for N and
k, which implies better picture quality.
The key point of interest in the context of the present discussion is that iso-
preference curves tend to become more vertical as the detail in the image in-
creases. This result suggests that for images with a large amount of detail only

a b c
FIGURE 2.22 (a) Image with a low level of detail. (b) Image with a medium level of detail. (c) Image with a rel-
atively large amount of detail. (Image (b) courtesy of the Massachusetts Institute of Technology.)


FIGURE 2.23
Representative
isopreference
curves for the 5
three types of
images in
Fig. 2.22.
Face
k
Cameraman

Crowd
4

32 64 128 256
N

a few gray levels may be needed. For example, the isopreference curve in
Fig. 2.23 corresponding to the crowd is nearly vertical. This indicates that, for a
fixed value of N, the perceived quality for this type of image is nearly indepen-
dent of the number of gray levels used (for the range of gray levels shown in
Fig. 2.23). It is also of interest to note that perceived quality in the other two
image categories remained the same in some intervals in which the spatial res-
olution was increased, but the number of gray levels actually decreased. The
most likely reason for this result is that a decrease in k tends to increase the ap-
parent contrast of an image, a visual effect that humans often perceive as im-
proved quality in an image.

2.4.4 Aliasing and Moiré Patterns
As discussed in more detail in Chapter 4, functions whose area under the curve
is finite can be represented in terms of sines and cosines of various frequencies.
The sine/cosine component with the highest frequency determines the highest
“frequency content” of the function. Suppose that this highest frequency is fi-
nite and that the function is of unlimited duration (these functions are called
band-limited functions).Then, the Shannon sampling theorem [Bracewell (1995)]
tells us that, if the function is sampled at a rate equal to or greater than twice
its highest frequency, it is possible to recover completely the original function
from its samples. If the function is undersampled, then a phenomenon called
aliasing corrupts the sampled image. The corruption is in the form of addition-
al frequency components being introduced into the sampled function. These
are called aliased frequencies. Note that the sampling rate in images is the num-
ber of samples taken (in both spatial directions) per unit distance.
As it turns out, except for a special case discussed in the following paragraph,
it is impossible to satisfy the sampling theorem in practice.We can only work with
sampled data that are finite in duration. We can model the process of convert-


FIGURE 2.24 Illustration of the Moiré pattern effect.

ing a function of unlimited duration into a function of finite duration simply by
multiplying the unlimited function by a “gating function” that is valued 1 for
some interval and 0 elsewhere. Unfortunately, this function itself has frequen-
cy components that extend to infinity.Thus, the very act of limiting the duration
of a band-limited function causes it to cease being band limited, which causes
it to violate the key condition of the sampling theorem. The principal approach
for reducing the aliasing effects on an image is to reduce its high-frequency com-
ponents by blurring the image (we discuss blurring in detail in Chapter 4) prior
to sampling. However, aliasing is always present in a sampled image. The effect
of aliased frequencies can be seen under the right conditions in the form of so-
called Moiré patterns†, as discussed next.
There is one special case of significant importance in which a function of in-
finite duration can be sampled over a finite interval without violating the sam-
pling theorem. When a function is periodic, it may be sampled at a rate equal
to or exceeding twice its highest frequency, and it is possible to recover the func-
tion from its samples provided that the sampling captures exactly an integer
number of periods of the function. This special case allows us to illustrate vivid-
ly the Moiré effect. Figure 2.24 shows two identical periodic patterns of equal-
ly spaced vertical bars, rotated in opposite directions and then superimposed on
each other by multiplying the two images. A Moiré pattern, caused by a break-
up of the periodicity, is seen in Fig. 2.24 as a 2-D sinusoidal (aliased) waveform
(which looks like a corrugated tin roof) running in a vertical direction. A simi-
lar pattern can appear when images are digitized (e.g., scanned) from a print-
ed page, which consists of periodic ink dots.

†
The word Moiré appears to have originated with weavers and comes from the word mohair, a cloth made
from Angora goat hairs.


2.4.5 Zooming and Shrinking Digital Images
We conclude the treatment of sampling and quantization with a brief discussion
on how to zoom and shrink a digital image. This topic is related to image sam-
pling and quantization because zooming may be viewed as oversampling, while
shrinking may be viewed as undersampling. The key difference between these
two operations and sampling and quantizing an original continuous image is
that zooming and shrinking are applied to a digital image.
Zooming requires two steps: the creation of new pixel locations, and the
assignment of gray levels to those new locations. Let us start with a simple ex-
ample. Suppose that we have an image of size 500*500 pixels and we want
to enlarge it 1.5 times to 750*750 pixels. Conceptually, one of the easiest
ways to visualize zooming is laying an imaginary 750*750 grid over the orig-
inal image. Obviously, the spacing in the grid would be less than one pixel be-
cause we are fitting it over a smaller image. In order to perform gray-level
assignment for any point in the overlay, we look for the closest pixel in the
original image and assign its gray level to the new pixel in the grid. When we
are done with all points in the overlay grid, we simply expand it to the origi-
nal specified size to obtain the zoomed image. This method of gray-level as-
signment is called nearest neighbor interpolation. (Pixel neighborhoods are
discussed in the next section.)
Pixel replication, the method used to generate Figs. 2.20(b) through (f), is a
special case of nearest neighbor interpolation. Pixel replication is applicable
when we want to increase the size of an image an integer number of times. For
instance, to double the size of an image, we can duplicate each column. This
doubles the image size in the horizontal direction. Then, we duplicate each row
of the enlarged image to double the size in the vertical direction. The same pro-
cedure is used to enlarge the image by any integer number of times (triple,
quadruple, and so on). Duplication is just done the required number of times to
achieve the desired size. The gray-level assignment of each pixel is predeter-
mined by the fact that new locations are exact duplicates of old locations.
Although nearest neighbor interpolation is fast, it has the undesirable feature
that it produces a checkerboard effect that is particularly objectionable at high
factors of magnification. Figures 2.20(e) and (f) are good examples of this. A
slightly more sophisticated way of accomplishing gray-level assignments is
bilinear interpolation using the four nearest neighbors of a point. Let (x¿, y¿)
denote the coordinates of a point in the zoomed image (think of it as a point on
the grid described previously), and let v(x¿, y¿) denote the gray level assigned
to it. For bilinear interpolation, the assigned gray level is given by
v(x¿, y¿) = ax¿ + by¿ + cx¿y¿ + d (2.4-6)
where the four coefficients are determined from the four equations in four un-
knowns that can be written using the four nearest neighbors of point (x¿, y¿).
Image shrinking is done in a similar manner as just described for zooming. The
equivalent process of pixel replication is row-column deletion. For example, to shrink
an image by one-half, we delete every other row and column.We can use the zoom-
ing grid analogy to visualize the concept of shrinking by a noninteger factor, except


that we now expand the grid to fit over the original image, do gray-level nearest
neighbor or bilinear interpolation, and then shrink the grid back to its original spec-
ified size.To reduce possible aliasing effects, it is a good idea to blur an image slight-
ly before shrinking it. Blurring of digital images is discussed in Chapters 3 and 4.
It is possible to use more neighbors for interpolation. Using more neighbors
implies fitting the points with a more complex surface, which generally gives
smoother results.This is an exceptionally important consideration in image gen-
eration for 3-D graphics [Watt (1993)] and in medical image processing
[Lehmann et al. (1999)], but the extra computational burden seldom is justifi-
able for general-purpose digital image zooming and shrinking, where bilinear
interpolation generally is the method of choice.

I Figures 2.20(d) through (f) are shown again in the top row of Fig. 2.25. As EXAMPLE 2.4:
noted earlier, these images were zoomed from 128*128, 64*64, and 32*32 Image zooming
to 1024*1024 pixels using nearest neighbor interpolation. The equivalent re- using bilinear
interpolation.
sults using bilinear interpolation are shown in the second row of Fig. 2.25. The
improvements in overall appearance are clear, especially in the 128*128 and

a b c
d e f
FIGURE 2.25 Top row: images zoomed from 128*128, 64*64, and 32*32 pixels to 1024*1024 pixels,
using nearest neighbor gray-level interpolation. Bottom row: same sequence, but using bilinear interpolation.


64*64 cases.The 32*32 to 1024*1024 image is blurry, but keep in mind that
this image was zoomed by a factor of 32. In spite of this, the result of bilinear
interpolation shown in Fig. 2.25(f) is a reasonably good rendition of the origi-
nal image shape, something that is lost in Fig. 2.25(c). I

2.5 Some Basic Relationships Between Pixels
In this section, we consider several important relationships between pixels in a
digital image.As mentioned before, an image is denoted by f(x, y).When refer-
ring in this section to a particular pixel, we use lowercase letters, such as p and q.

2.5.1 Neighbors of a Pixel
A pixel p at coordinates (x, y) has four horizontal and vertical neighbors whose
coordinates are given by
(x+1, y), (x-1, y), (x, y+1), (x, y-1)
This set of pixels, called the 4-neighbors of p, is denoted by N4(p). Each pixel
is a unit distance from (x, y), and some of the neighbors of p lie outside the
digital image if (x, y) is on the border of the image.
The four diagonal neighbors of p have coordinates
(x+1, y+1), (x+1, y-1), (x-1, y+1), (x-1, y-1)
and are denoted by ND(p). These points, together with the 4-neighbors, are
called the 8-neighbors of p, denoted by N8(p). As before, some of the points in
ND(p) and N8(p) fall outside the image if (x, y) is on the border of the image.

2.5.2 Adjacency, Connectivity, Regions, and Boundaries
Connectivity between pixels is a fundamental concept that simplifies the defini-
tion of numerous digital image concepts, such as regions and boundaries. To es-
tablish if two pixels are connected, it must be determined if they are neighbors and
if their gray levels satisfy a specified criterion of similarity (say, if their gray lev-
els are equal). For instance, in a binary image with values 0 and 1, two pixels may
be 4-neighbors, but they are said to be connected only if they have the same value.
Let V be the set of gray-level values used to define adjacency. In a binary
image, V={1} if we are referring to adjacency of pixels with value 1. In a gray-
scale image, the idea is the same, but set V typically contains more elements. For
example, in the adjacency of pixels with a range of possible gray-level values 0
to 255, set V could be any subset of these 256 values. We consider three types
of adjacency:
(a) 4-adjacency. Two pixels p and q with values from V are 4-adjacent if q is
in the set N4(p).
(b) 8-adjacency. Two pixels p and q with values from V are 8-adjacent if q is
in the set N8(p).

2.5 I Some Basic Relationships Between Pixels 67

(c) m-adjacency (mixed adjacency). Two pixels p and q with values from V are
m-adjacent if
(i) q is in N4(p), or
(ii) q is in ND(p) and the set N4(p) ¨ N4(q) has no pixels whose values
are from V.
Mixed adjacency is a modification of 8-adjacency. It is introduced to eliminate
the ambiguities that often arise when 8-adjacency is used. For example, consid-
er the pixel arrangement shown in Fig. 2.26(a) for V={1}. The three pixels at
the top of Fig. 2.26(b) show multiple (ambiguous) 8-adjacency, as indicated by
the dashed lines. This ambiguity is removed by using m-adjacency, as shown in
Fig. 2.26(c). Two image subsets S1 and S2 are adjacent if some pixel in S1 is ad-
jacent to some pixel in S2. It is understood here and in the following definitions
that adjacent means 4-, 8-, or m-adjacent.
A (digital) path (or curve) from pixel p with coordinates (x, y) to pixel q
with coordinates (s, t) is a sequence of distinct pixels with coordinates
Ax0 , y0 B, Ax1 , y1 B, p , Axn , yn B
where Ax0 , y0 B = (x, y), Axn , yn B = (s, t), and pixels Axi , yi B and Axi - 1 , yi - 1 B are
adjacent for 1 i n. In this case, n is the length of the path. If
Ax0 , y0 B = (xn , yn), the path is a closed path.We can define 4-, 8-, or m-paths de-
pending on the type of adjacency specified. For example, the paths shown in
Fig. 2.26(b) between the northeast and southeast points are 8-paths, and the
path in Fig. 2.26(c) is an m-path. Note the absence of ambiguity in the m-path.
Let S represent a subset of pixels in an image. Two pixels p and q are said to
be connected in S if there exists a path between them consisting entirely of pix-
els in S. For any pixel p in S, the set of pixels that are connected to it in S is
called a connected component of S. If it only has one connected component,
then set S is called a connected set.
Let R be a subset of pixels in an image. We call R a region of the image if R
is a connected set. The boundary (also called border or contour) of a region R
is the set of pixels in the region that have one or more neighbors that are not
in R. If R happens to be an entire image (which we recall is a rectangular set of
pixels), then its boundary is defined as the set of pixels in the first and last rows
and columns of the image.This extra definition is required because an image has
no neighbors beyond its border. Normally, when we refer to a region, we are

0 1 1 0 1 1 0 1 1

0 1 0 0 1 0 0 1 0

0 0 1 0 0 1 0 0 1

a b c
FIGURE 2.26 (a) Arrangement of pixels; (b) pixels that are 8-adjacent (shown dashed)
to the center pixel; (c) m-adjacency.


referring to a subset of an image, and any pixels in the boundary of the region
that happen to coincide with the border of the image are included implicitly as
part of the region boundary.
The concept of an edge is found frequently in discussions dealing with re-
gions and boundaries. There is a key difference between these concepts, how-
ever. The boundary of a finite region forms a closed path (Problem 2.14) and is
thus a “global” concept. As discussed in detail in Chapter 10, edges are formed
from pixels with derivative values that exceed a preset threshold. Thus, the idea
of an edge is a “local” concept that is based on a measure of gray-level discon-
tinuity at a point. It is possible to link edge points into edge segments, and some-
times these segments are linked in such a way that correspond to boundaries,
but this is not always the case.The one exception in which edges and boundaries
correspond is in binary images. Depending on the type of connectivity and edge
operators used (we discuss these in Chapter 10), the edge extracted from a bi-
nary region will be the same as the region boundary. This is intuitive. Concep-
tually, until we arrive at Chapter 10, it is helpful to think of edges as intensity
discontinuities and boundaries as closed paths.

2.5.3 Distance Measures
For pixels p, q, and z, with coordinates (x, y), (s, t), and (v, w), respectively, D
is a distance function or metric if
(a) D(p, q) 0 AD(p, q)=0 iff p=qB,
(b) D(p, q)=D(q, p), and
(c) D(p, z) D(p, q)+D(q, z).
The Euclidean distance between p and q is defined as
De(p, q) = C(x - s)2 + (y - t)2 D 2 .
1
(2.5-1)

For this distance measure, the pixels having a distance less than or equal to some
value r from (x, y) are the points contained in a disk of radius r centered at (x, y).
The D4 distance (also called city-block distance) between p and q is defined as
D4(p, q) = ∑x - s∑ + ∑y - t∑. (2.5-2)

In this case, the pixels having a D4 distance from (x, y) less than or equal to
some value r form a diamond centered at (x, y). For example, the pixels with
D4 distance 2 from (x, y) (the center point) form the following contours of
constant distance:
2
2 1 2
2 1 0 1 2
2 1 2
2

The pixels with D4=1 are the 4-neighbors of (x, y).

2.5 I Some Basic Relationships Between Pixels 69

The D8 distance (also called chessboard distance) between p and q is defined as
D8(p, q) = max A∑x - s∑, ∑y - t∑B. (2.5-3)
In this case, the pixels with D8 distance from (x, y) less than or equal to some value
r form a square centered at (x, y). For example, the pixels with D8 distance 2
from (x, y) (the center point) form the following contours of constant distance:
2 2 2 2 2
2 1 1 1 2
2 1 0 1 2
2 1 1 1 2
2 2 2 2 2
The pixels with D8=1 are the 8-neighbors of (x, y).
Note that the D4 and D8 distances between p and q are independent of any
paths that might exist between the points because these distances involve only
the coordinates of the points. If we elect to consider m-adjacency, however, the
Dm distance between two points is defined as the shortest m-path between the
points. In this case, the distance between two pixels will depend on the values
of the pixels along the path, as well as the values of their neighbors. For in-
stance, consider the following arrangement of pixels and assume that p, p2 , and
p4 have value 1 and that p1 and p3 can have a value of 0 or 1:
p3 p4
p1 p2
p
Suppose that we consider adjacency of pixels valued 1 (i.e., V={1}). If p1 and
p3 are 0, the length of the shortest m-path (the Dm distance) between p and p4
is 2. If p1 is 1, then p2 and p will no longer be m-adjacent (see the definition of
m-adjacency) and the length of the shortest m-path becomes 3 (the path goes
through the points pp1 p2 p4). Similar comments apply if p3 is 1 (and p1 is 0); in
this case, the length of the shortest m-path also is 3. Finally, if both p1 and p3 are
1 the length of the shortest m-path between p and p4 is 4. In this case, the path
goes through the sequence of points pp1 p2 p3 p4 .

2.5.4 Image Operations on a Pixel Basis
Numerous references are made in the following chapters to operations between
images, such as dividing one image by another. In Eq. (2.4-2), images were rep-
resented in the form of matrices. As we know, matrix division is not defined.
However, when we refer to an operation like “dividing one image by another,”
we mean specifically that the division is carried out between corresponding pix-
els in the two images. Thus, for example, if f and g are images, the first element
of the image formed by “dividing” f by g is simply the first pixel in f divided
by the first pixel in g; of course, the assumption is that none of the pixels in g
have value 0. Other arithmetic and logic operations are similarly defined be-
tween corresponding pixels in the images involved.


2.6 Linear and Nonlinear Operations
Let H be an operator whose input and output are images. H is said to be a linear
operator if, for any two images f and g and any two scalars a and b,
H(af + bg) = aH(f) + bH(g). (2.6-1)
In other words, the result of applying a linear operator to the sum of two images
(that have been multiplied by the constants shown) is identical to applying the
operator to the images individually, multiplying the results by the appropriate
constants, and then adding those results. For example, an operator whose func-
tion is to compute the sum of K images is a linear operator. An operator that
computes the absolute value of the difference of two images is not. An opera-
tor that fails the test of Eq. (2.6-1) is by definition nonlinear.
Linear operations are exceptionally important in image processing because
they are based on a significant body of well-understood theoretical and practi-
cal results.Although nonlinear operations sometimes offer better performance,
they are not always predictable, and for the most part are not well understood
theoretically.

Summary
The material in this chapter is primarily background information for subsequent dis-
cussions. Our treatment of the human visual system, although brief, provides a basic idea
of the capabilities of the eye in perceiving pictorial information. The discussion of light
and the electromagnetic spectrum is fundamental in understanding the origin of the
many images we use in this book. Similarly, the image model developed in Section 2.3.4
is used in the Chapter 4 as the basis for an image enhancement technique called homo-
morphic filtering, and again in Chapter 10 to explain the effect of illumination on the
shape of image histograms.
The sampling ideas introduced in Section 2.4 are the foundation for many of the dig-
itizing phenomena likely to be encountered in practice. These ideas can be expanded
further once a basic understanding of frequency content is mastered. A detailed discus-
sion of the frequency domain is given in Chapter 4. The concepts of sampling and alias-
ing effects also are of importance in the context of image acquisition.
The concepts introduced in Section 2.5 are the basic building blocks for processing
techniques based on pixel neighborhoods. As shown in the following chapter and in
Chapter 5, neighborhood processing methods are at the core of many image enhance-
ment and restoration procedures. When applicable, neighborhood processing is favored
in commercial applications of image processing due to their operational speed and sim-
plicity of implementation in hardware and/or firmware. Finally, the concept of a linear
operator and the theoretical and conceptual power associated with it will be used ex-
tensively in the following three chapters.

Additional reading for the material in Section 2.1 regarding the structure of the human
eye may be found in Atchison and Smith [2000], and Oyster [1999]. For additional read-
ing on visual perception, see Regan [2000] and Gordon [1997].The book by Hubel [1988]
and the now classic book by Cornsweet [1970] also are of interest. Born and Wolf [1999]

I Problems 71

is a basic reference that discusses light in terms of electromagnetic theory. Electromag-
netic energy propagation is covered in some detail by Felsen and Marcuvitz [1994].
The area of image sensing is quite broad and very fast moving. An excellent source
of information on optical and other imaging sensors is the International Society for Op-
tical Engineering (SPIE). The following are representative publications by the SPIE in
this area: Blouke et al. [2001], Hoover and Doty [1996], and Freeman [1987].
The image model presented in Section 2.3.4 is from Oppenheim, Schafer, and Stock-
ham [1968]. A reference for the illumination and reflectance values used in that section
is the IES Lighting Handbook [2000]. For additional reading on image sampling and
some of its effects, such as aliasing, see Bracewell [1995]. The early experiments men-
tioned in Section 2.4.3 on perceived image quality as a function of sampling and quati-
zation were reported by Huang [1965].The issue of reducing the number of samples and
gray levels in an image while minimizing the ensuing degradation is still of current in-
terest, as exemplified by Papamarkos and Atsalakis [2000]. For further reading on image
shrinking and zooming, see Sid-Ahmed [1995], Unser et al. [1995], Umbaugh [1998], and
Lehmann et al. [1999]. For further reading on the topics covered in Section 2.5, see
Rosenfeld and Kak [1982], Marchand-Maillet and Sharaiha [2000], and Ritter and Wil-
son [2001]. Additional reading on linear systems in the context of image processing may
be found in Castleman [1996].

Problems
# 2.1 Using the background information provided in Section 2.1, and thinking purely
in geometric terms, estimate the diameter of the smallest printed dot that the eye
can discern if the page on which the dot is printed is 0.2 m away from the eyes.
Assume for simplicity that the visual system ceases to detect the dot when the
image of the dot on the fovea becomes smaller than the diameter of one recep- See inside front cover

tor (cone) in that area of the retina. Assume further that the fovea can be mod- Detailed solutions to the
problems marked with a
eled as a square array of dimensions 1.5 mm*1.5 mm, and that the cones and star can be found in the
spaces between the cones are distributed uniformly throughout this array. book web site. The site
also contains suggested
2.2 When you enter a dark theater on a bright day, it takes an appreciable interval projects based on the ma-
of time before you can see well enough to find an empty seat. Which of the visu- terial in this chapter.
al processes explained in Section 2.1 is at play in this situation?
# 2.3 Although it is not shown in Fig. 2.10, alternating current certainly is part of the
electromagnetic spectrum. Commercial alternating current in the United States
has a frequency of 60 Hz.What is the wavelength in kilometers of this component
of the spectrum?
2.4 You are hired to design the front end of an imaging system for studying the bound-
ary shapes of cells, bacteria, viruses, and protein.The front end consists, in this case,
of the illumination source(s) and corresponding imaging camera(s). The diame-
ters of circles required to enclose individual specimens in each of these categories
are 50, 1, 0.1, and 0.01 m, respectively.
(a) Can you solve the imaging aspects of this problem with a single sensor and
camera? If your answer is yes, specify the illumination wavelength band and
the type of camera needed. Identify the camera as being a color camera, far-
infrared camera, or whatever appropriate name corresponds to the illumi-
nation source.
(b) If your answer in (a) is no, what type of illumination sources and corre-
sponding imaging sensors would you recommend? Specify the light sources


and cameras as requested in part (a). Use the minimum number of illumina-
tion sources and cameras needed to solve the problem.
2.5 A CCD camera chip of dimensions 7*7 mm, and having 1024*1024 elements,
is focused on a square, flat area, located 0.5 m away. How many line pairs per mm
will this camera be able to resolve? The camera is equipped with a 35-mm lens.
(Hint: Model the imaging process as in Fig. 2.3, with the focal length of the cam-
era lens substituting for the focal length of the eye.)
# 2.6 An automobile manufacturer is automating the placement of certain components
on the bumpers of a limited-edition line of sports cars. The components are color
coordinated, so the robots need to know the color of each car in order to select the
appropriate bumper component. Models come in only four colors: blue, green, red,
and white. You are hired to propose a solution based on imaging. How would you
solve the problem of automatically determining the color of each car, keeping in
mind that cost is the most important consideration in your choice of components?
2.7 Suppose that a flat area with center at Ax0 , y0 B is illuminated by a light source with
intensity distribution

i(x, y) = Ke-CAx - x0B + Ay - y0B D
2 2
.

Assume for simplicity that the reflectance of the area is constant and equal to
1.0, and let K=255. If the resulting image is digitized with k bits of intensity res-
olution, and the eye can detect an abrupt change of eight shades of intensity be-
tween adjacent pixels, what value of k will cause visible false contouring?
2.8 Sketch the image in Problem 2.7 for k=2.
# 2.9 A common measure of transmission for digital data is the baud rate, defined as
the number of bits transmitted per second. Generally, transmission is accom-
plished in packets consisting of a start bit, a byte (8 bits) of information, and a stop
bit. Using these facts, answer the following:
(a) How many minutes would it take to transmit a 1024*1024 image with 256
gray levels using a 56K baud modem?
(b) What would the time be at 750K baud, a representative speed of a phone
DSL (digital subscriber line) connection?
2.10 High-definition television (HDTV) generates images with a resolution of 1125
horizontal TV lines interlaced (where every other line is painted on the tube face
in each of two fields, each field being 1 60th of a second in duration). The width-
to-height aspect ratio of the images is 16 : 9. The fact that the horizontal lines are
distinct fixes the vertical resolution of the images. A company has designed an
image capture system that generates digital images from HDTV images. The res-
olution of each TV (horizontal) line in their system is in proportion to vertical res-
olution, with the proportion being the width-to-height ratio of the images. Each
pixel in the color image has 24 bits of intensity resolution, 8 pixels each for a red,
a green, and a blue image.These three “primary” images form a color image. How
many bits would it take to store a 2-hour HDTV program?
# 2.11 Consider the two image subsets, S1 and S2, shown in the following figure. For
V={1}, determine whether these two subsets are (a) 4-adjacent, (b) 8-adjacent,
or (c) m-adjacent.

I Problems 73

S1 S2
0 0 0 0 0 0 0 1 1 0
1 0 0 1 0 0 1 0 0 1
1 0 0 1 0 1 1 0 0 0
0 0 1 1 1 0 0 0 0 0
0 0 1 1 1 0 0 1 1 1

# 2.12 Develop an algorithm for converting a one-pixel-thick 8-path to a 4-path.
2.13 Develop an algorithm for converting a one-pixel-thick m-path to a 4-path.
2.14 Show that the boundary of the region, as defined in Section 2.5.2, is a closed path.
# 2.15 Consider the image segment shown.
(a) Let V={0, 1} and compute the lengths of the shortest 4-, 8-, and m-path be-
tween p and q. If a particular path does not exist between these two points,
explain why.
(b) Repeat for V={1, 2}.
3 1 2 1 (q)

2 2 0 2

1 2 1 1

(p) 1 0 1 2

# 2.16 (a) Give the condition(s) under which the D4 distance between two points p and
q is equal to the shortest 4-path between these points.
(b) Is this path unique?
2.17 Repeat Problem 2.16 for the D8 distance.
# 2.18 In the following chapter, we will deal with operators whose function is to com-
pute the sum of pixel values in a small subimage area, S. Show that these are lin-
ear operators.
2.19 The median, z, of a set of numbers is such that half the values in the set are below
z and the other half are above it. For example, the median of the set of values
{2, 3, 8, 20, 21, 25, 31} is 20. Show that an operator that computes the median of
a subimage area, S, is nonlinear.
2.20 A plant produces a line of translucent miniature polymer squares. Stringent qual-
ity requirements dictate 100% visual inspection, and the plant manager finds the use
of human inspectors increasingly expensive. Inspection is semiautomated.At each
inspection station, a robotic mechanism places each polymer square over a light
located under an optical system that produces a magnified image of the square.
The image completely fills a viewing screen measuring 80*80 mm. Defects appear
as dark circular blobs, and the inspector’s job is to look at the screen and reject any
sample that has one or more such dark blobs with a diameter of 0.8 mm or larger,
as measured on the scale of the screen. The manager believes that, if she can find
a way to automate the process completely, she will increase profits by 50%. She
also believes that success in this project will aid her climb up the corporate ladder.
After much investigation, the manager decides that the way to solve the problem
is to view each inspection screen with a CCD TV camera and feed the output of the


camera into an image processing system capable of detecting the blobs, measuring
their diameter, and activating the accept/reject buttons previously operated by an
inspector. She is able to find a system that can do the job, as long as the smallest de-
fect occupies an area of at least 2*2 pixels in the digital image.The manager hires
you to help her specify the camera and lens system, but requires that you use off-
the-shelf components. For the lenses, assume that this constraint means any integer
multiple of 25 mm or 35 mm, up to 200 mm. For the cameras, it means resolutions
of 512*512, 1024*1024, or 2048*2048 pixels.The individual imaging elements
in these cameras are squares measuring 8*8 m, and the spaces between imag-
ing elements are 2 m. For this application, the cameras cost much more than the
lenses, so the problem should be solved with the lowest-resolution camera possible,
based on the choice of lenses. As a consultant, you are to provide a written recom-
mendation, showing in reasonable detail the analysis that led to your conclusion.
Use the same imaging geometry suggested in Problem 2.5.

3 Image Enhancement
in the Spatial Domain
It makes all the difference whether one sees darkness
through the light or brightness through the shadows.
David Lindsay

Preview
The principal objective of enhancement is to process an image so that the re-
sult is more suitable than the original image for a specific application.The word
specific is important, because it establishes at the outset that the techniques dis-
cussed in this chapter are very much problem oriented. Thus, for example, a
method that is quite useful for enhancing X-ray images may not necessarily be
the best approach for enhancing pictures of Mars transmitted by a space probe.
Regardless of the method used, however, image enhancement is one of the most
interesting and visually appealing areas of image processing.
Image enhancement approaches fall into two broad categories: spatial domain
methods and frequency domain methods. The term spatial domain refers to the
image plane itself, and approaches in this category are based on direct manipu-
lation of pixels in an image. Frequency domain processing techniques are based
on modifying the Fourier transform of an image. Spatial methods are covered in
this chapter, and frequency domain enhancement is discussed in Chapter 4. En-
hancement techniques based on various combinations of methods from these
two categories are not unusual.We note also that many of the fundamental tech-
niques introduced in this chapter in the context of enhancement are used in
subsequent chapters for a variety of other image processing applications.
There is no general theory of image enhancement. When an image is
processed for visual interpretation, the viewer is the ultimate judge of how well

75

76 Chapter 3 I Image Enhancement in the Spatial Domain

a particular method works. Visual evaluation of image quality is a highly sub-
jective process, thus making the definition of a “good image” an elusive standard
by which to compare algorithm performance. When the problem is one of pro-
cessing images for machine perception, the evaluation task is somewhat easier.
For example, in dealing with a character recognition application, and leaving
aside other issues such as computational requirements, the best image process-
ing method would be the one yielding the best machine recognition results.
However, even in situations when a clear-cut criterion of performance can be
imposed on the problem, a certain amount of trial and error usually is required
before a particular image enhancement approach is selected.

3.1 Background
As indicated previously, the term spatial domain refers to the aggregate of
pixels composing an image. Spatial domain methods are procedures that op-
erate directly on these pixels. Spatial domain processes will be denoted by the
expression
g(x, y) = TCf(x, y)D (3.1-1)
where f(x, y) is the input image, g(x, y) is the processed image, and T is an
operator on f, defined over some neighborhood of (x, y). In addition, T can op-
erate on a set of input images, such as performing the pixel-by-pixel sum of K
images for noise reduction, as discussed in Section 3.4.2.
The principal approach in defining a neighborhood about a point (x, y) is to
use a square or rectangular subimage area centered at (x, y), as Fig. 3.1 shows.
The center of the subimage is moved from pixel to pixel starting, say, at the top
left corner. The operator T is applied at each location (x, y) to yield the output,
g, at that location. The process utilizes only the pixels in the area of the image
spanned by the neighborhood.Although other neighborhood shapes, such as ap-

FIGURE 3.1 A Origin
3*3 y
neighborhood
about a point (x, y)
(x, y) in an image.

Image f(x, y)

x

3.1 I Background 77

proximations to a circle, sometimes are used, square and rectangular arrays are
by far the most predominant because of their ease of implementation.
The simplest form of T is when the neighborhood is of size 1*1 (that is, a
single pixel). In this case, g depends only on the value of f at (x, y), and T be-
comes a gray-level (also called an intensity or mapping) transformation func-
tion of the form

s = T(r) (3.1-2)

where, for simplicity in notation, r and s are variables denoting, respectively,
the gray level of f(x, y) and g(x, y) at any point (x, y). For example, if T(r) has
the form shown in Fig. 3.2(a), the effect of this transformation would be to pro-
duce an image of higher contrast than the original by darkening the levels below
m and brightening the levels above m in the original image. In this technique,
known as contrast stretching, the values of r below m are compressed by the
transformation function into a narrow range of s, toward black.The opposite ef-
fect takes place for values of r above m. In the limiting case shown in Fig. 3.2(b),
T(r) produces a two-level (binary) image. A mapping of this form is called a
thresholding function. Some fairly simple, yet powerful, processing approaches
can be formulated with gray-level transformations. Because enhancement at
any point in an image depends only on the gray level at that point, techniques
in this category often are referred to as point processing.
Larger neighborhoods allow considerably more flexibility. The general ap-
proach is to use a function of the values of f in a predefined neighborhood of
(x, y) to determine the value of g at (x, y). One of the principal approaches in
this formulation is based on the use of so-called masks (also referred to as filters,
kernels, templates, or windows). Basically, a mask is a small (say, 3*3) 2-D
array, such as the one shown in Fig. 3.1, in which the values of the mask coeffi-
cients determine the nature of the process, such as image sharpening. En-
hancement techniques based on this type of approach often are referred to as
mask processing or filtering. These concepts are discussed in Section 3.5.

s=T(r) s=T(r) a b
FIGURE 3.2 Gray-
level
transformation
Light

Light

functions for
T(r) T(r) contrast
enhancement.
Dark

Dark

r r
m m
Dark Light Dark Light


3.2 Some Basic Gray Level Transformations
We begin the study of image enhancement techniques by discussing gray-level
transformation functions.These are among the simplest of all image enhancement
techniques. The values of pixels, before and after processing, will be denoted by r
and s, respectively. As indicated in the previous section, these values are related
by an expression of the form s=T(r), where T is a transformation that maps a
pixel value r into a pixel value s. Since we are dealing with digital quantities, val-
ues of the transformation function typically are stored in a one-dimensional array
and the mappings from r to s are implemented via table lookups. For an 8-bit en-
vironment, a lookup table containing the values of T will have 256 entries.
As an introduction to gray-level transformations, consider Fig. 3.3, which
shows three basic types of functions used frequently for image enhancement: lin-
ear (negative and identity transformations), logarithmic (log and inverse-log
transformations), and power-law (nth power and nth root transformations).The
identity function is the trivial case in which output intensities are identical to
input intensities. It is included in the graph only for completeness.

3.2.1 Image Negatives
The negative of an image with gray levels in the range [0, L-1] is obtained by using
the negative transformation shown in Fig. 3.3, which is given by the expression
s = L - 1 - r. (3.2-1)

FIGURE 3.3 Some L-1
basic gray-level
transformation
functions used for Negative
image
enhancement. nth root
3L/4

Log
Output gray level, s

nth power
L/2

L/4

Identity Inverse log

0
0 L/4 L/2 3L/4 L-1
Input gray level, r

3.2 I Some Basic Gray Level Transformations 79

a b
FIGURE 3.4
(a) Original
digital
mammogram.
(b) Negative
image obtained
using the negative
transformation in
Eq. (3.2-1).
(Courtesy of G.E.
Medical Systems.)

Reversing the intensity levels of an image in this manner produces the equiva-
lent of a photographic negative. This type of processing is particularly suited
for enhancing white or gray detail embedded in dark regions of an image, es-
pecially when the black areas are dominant in size. An example is shown in
Fig. 3.4. The original image is a digital mammogram showing a small lesion. In
spite of the fact that the visual content is the same in both images, note how
much easier it is to analyze the breast tissue in the negative image in this par-
ticular case.

3.2.2 Log Transformations
The general form of the log transformation shown in Fig. 3.3 is
s = c log (1 + r) (3.2-2)
where c is a constant, and it is assumed that r 0. The shape of the log curve
in Fig. 3.3 shows that this transformation maps a narrow range of low gray-level
values in the input image into a wider range of output levels.The opposite is true
of higher values of input levels. We would use a transformation of this type to
expand the values of dark pixels in an image while compressing the higher-level
values. The opposite is true of the inverse log transformation.
Any curve having the general shape of the log functions shown in Fig. 3.3
would accomplish this spreading/compressing of gray levels in an image. In fact,
the power-law transformations discussed in the next section are much more
versatile for this purpose than the log transformation. However, the log func-
tion has the important characteristic that it compresses the dynamic range of im-
ages with large variations in pixel values. A classic illustration of an application
in which pixel values have a large dynamic range is the Fourier spectrum, which
will be discussed in Chapter 4. At the moment, we are concerned only with the
image characteristics of spectra. It is not unusual to encounter spectrum values


a b
FIGURE 3.5
(a) Fourier
spectrum.
(b) Result of
applying the log
transformation
given in
Eq. (3.2-2) with
c=1.

that range from 0 to 106 or higher. While processing numbers such as these pre-
sents no problems for a computer, image display systems generally will not be
able to reproduce faithfully such a wide range of intensity values. The net effect
is that a significant degree of detail will be lost in the display of a typical Fouri-
er spectrum.
As an illustration of log transformations, Fig. 3.5(a) shows a Fourier spectrum
with values in the range 0 to 1.5*106.When these values are scaled linearly for
display in an 8-bit system, the brightest pixels will dominate the display, at the ex-
pense of lower (and just as important) values of the spectrum. The effect of this
dominance is illustrated vividly by the relatively small area of the image in
Fig. 3.5(a) that is not perceived as black. If, instead of displaying the values in this
manner, we first apply Eq. (3.2-2) (with c=1 in this case) to the spectrum val-
ues, then the range of values of the result become 0 to 6.2, a more manageable
number. Figure 3.5(b) shows the result of scaling this new range linearly and dis-
playing the spectrum in the same 8-bit display.The wealth of detail visible in this
image as compared to a straight display of the spectrum is evident from these pic-
tures. Most of the Fourier spectra seen in image processing publications have
been scaled in just this manner.

3.2.3 Power-Law Transformations
Power-law transformations have the basic form
s = crg (3.2-3)
where c and g are positive constants. Sometimes Eq. (3.2-3) is written as
s = c(r + e)g to account for an offset (that is, a measurable output when the
input is zero). However, offsets typically are an issue of display calibration and
as a result they are normally ignored in Eq. (3.2-3). Plots of s versus r for vari-
ous values of g are shown in Fig. 3.6. As in the case of the log transformation,
power-law curves with fractional values of g map a narrow range of dark input
values into a wider range of output values, with the opposite being true for high-


L-1 FIGURE 3.6 Plots
of the equation
s=crg for
g=0.04
various values of
g=0.10 g (c=1 in all
cases).
3L/4 g=0.20

g=0.40
Output gray level, s

g=0.67
L/2 g=1

g=1.5

g=2.5

L/4
g=5.0

g=10.0

g=25.0

0
0 L/4 L/2 3L/4 L-1
Input gray level, r

er values of input levels. Unlike the log function, however, we notice here a
family of possible transformation curves obtained simply by varying g. As ex-
pected, we see in Fig. 3.6 that curves generated with values of g>1 have ex-
actly the opposite effect as those generated with values of g<1. Finally, we
note that Eq. (3.2-3) reduces to the identity transformation when c=g=1.
A variety of devices used for image capture, printing, and display respond ac-
cording to a power law. By convention, the exponent in the power-law equation
is referred to as gamma [hence our use of this symbol in Eq. (3.2-3)].The process
used to correct this power-law response phenomena is called gamma correc-
tion. For example, cathode ray tube (CRT) devices have an intensity-to-volt-
age response that is a power function, with exponents varying from
approximately 1.8 to 2.5. With reference to the curve for g=2.5 in Fig. 3.6, we
see that such display systems would tend to produce images that are darker
than intended. This effect is illustrated in Fig. 3.7. Figure 3.7(a) shows a simple
gray-scale linear wedge input into a CRT monitor. As expected, the output of
the monitor appears darker than the input, as shown in Fig. 3.7(b). Gamma cor-
rection in this case is straightforward. All we need to do is preprocess the input
image before inputting it into the monitor by performing the transformation
s = r1 2.5 = r0.4. The result is shown in Fig. 3.7(c). When input into the same
monitor, this gamma-corrected input produces an output that is close in ap-
pearance to the original image, as shown in Fig. 3.7(d). A similar analysis would


Image as viewed on monitor
a b
c d
FIGURE 3.7
(a) Linear-wedge
gray-scale image. Monitor
(b) Response of
monitor to linear
wedge.
(c) Gamma-
corrected wedge.
(d) Output of
monitor.
Gamma
correction

Image as viewed on monitor

Monitor

apply to other imaging devices such as scanners and printers. The only differ-
ence would be the device-dependent value of gamma (Poynton [1996]).
Gamma correction is important if displaying an image accurately on a com-
puter screen is of concern. Images that are not corrected properly can look ei-
ther bleached out, or, what is more likely, too dark. Trying to reproduce colors
accurately also requires some knowledge of gamma correction because varying
the value of gamma correction changes not only the brightness, but also the ra-
tios of red to green to blue. Gamma correction has become increasingly im-
portant in the past few years, as use of digital images for commercial purposes
over the Internet has increased. It is not unusual that images created for a pop-
ular Web site will be viewed by millions of people, the majority of whom will
have different monitors and/or monitor settings. Some computer systems even
have partial gamma correction built in. Also, current image standards do not
contain the value of gamma with which an image was created, thus complicat-
ing the issue further. Given these constraints, a reasonable approach when stor-
ing images in a Web site is to preprocess the images with a gamma that
represents an “average” of the types of monitors and computer systems that
one expects in the open market at any given point in time.
EXAMPLE 3.1:
Contrast
enhancement I In addition to gamma correction, power-law transformations are useful for
using power-law general-purpose contrast manipulation. Figure 3.8(a) shows a magnetic reso-
transformations. nance (MR) image of an upper thoracic human spine with a fracture dislocation


a b
c d
FIGURE 3.8
(a) Magnetic
resonance (MR)
image of a
fractured human
spine.
(b)–(d) Results of
applying the
transformation in
Eq. (3.2-3) with
c=1 and
g=0.6, 0.4, and
0.3, respectively.
(Original image
for this example
courtesy of Dr.
David R. Pickens,
Department of
Radiology and
Radiological
Sciences,
Vanderbilt
University
Medical Center.)

and spinal cord impingement. The fracture is visible near the vertical center of
the spine, approximately one-fourth of the way down from the top of the pic-
ture. Since the given image is predominantly dark, an expansion of gray levels
are desirable. This can be accomplished with a power-law transformation with
a fractional exponent. The other images shown in the Figure were obtained by
processing Fig. 3.8(a) with the power-law transformation function of Eq. (3.2-3).
The values of gamma corresponding to images (b) through (d) are 0.6, 0.4, and
0.3, respectively (the value of c was 1 in all cases). We note that, as gamma de-
creased from 0.6 to 0.4, more detail became visible.A further decrease of gamma


to 0.3 enhanced a little more detail in the background, but began to reduce con-
trast to the point where the image started to have a very slight “washed-out”
look, especially in the background. By comparing all results, we see that the
best enhancement in terms of contrast and discernable detail was obtained with
g=0.4.A value of g=0.3 is an approximate limit below which contrast in this
particular image would be reduced to an unacceptable level. I

EXAMPLE 3.2: I Figure 3.9(a) shows the opposite problem of Fig. 3.8(a). The image to be en-
Another hanced now has a washed-out appearance, indicating that a compression of gray
illustration of levels is desirable. This can be accomplished with Eq. (3.2-3) using values of g
power-law
transformations. greater than 1. The results of processing Fig. 3.9(a) with g=3.0, 4.0, and 5.0
are shown in Figs. 3.9(b) through (d). Suitable results were obtained with gamma
values of 3.0 and 4.0, the latter having a slightly more appealing appearance be-
cause it has higher contrast. The result obtained with g=5.0 has areas that are
too dark, in which some detail is lost.The dark region to the left of the main road
in the upper left quadrant is an example of such an area. I

a b
c d
FIGURE 3.9
(a) Aerial image.
(b)–(d) Results of
applying the
transformation in
Eq. (3.2-3) with
c=1 and
g=3.0, 4.0, and
5.0, respectively.
(Original image
for this example
courtesy of
NASA.)


3.2.4 Piecewise-Linear Transformation Functions
A complementary approach to the methods discussed in the previous three sec-
tions is to use piecewise linear functions. The principal advantage of piecewise
linear functions over the types of functions we have discussed thus far is that the
form of piecewise functions can be arbitrarily complex. In fact, as we will see
shortly, a practical implementation of some important transformations can be
formulated only as piecewise functions. The principal disadvantage of piece-
wise functions is that their specification requires considerably more user input.

Contrast stretching
One of the simplest piecewise linear functions is a contrast-stretching trans-
formation. Low-contrast images can result from poor illumination, lack of dy-
namic range in the imaging sensor, or even wrong setting of a lens aperture
during image acquisition. The idea behind contrast stretching is to increase the
dynamic range of the gray levels in the image being processed.
Figure 3.10(a) shows a typical transformation used for contrast stretching.
The locations of points Ar1 , s1 B and Ar2 , s2 B control the shape of the transformation

a b
L-1 c d
(r2, s2) FIGURE 3.10
3L/4 Contrast
Ouput gray level, s

stretching.
(a) Form of
L/2 T(r) transformation
function. (b) A
low-contrast
L/4
image. (c) Result
of contrast
(r1, s1) stretching.
(d) Result of
0
0 L/4 L/2 3L/4 L-1 thresholding.
(Original image
Input gray level, r
courtesy of
Dr. Roger Heady,
Research School
of Biological
Sciences,
Australian
National
University,
Canberra,
Australia.)


function. If r1=s1 and r2=s2 , the transformation is a linear function that pro-
duces no changes in gray levels. If r1=r2 , s1=0 and s2=L-1, the transfor-
mation becomes a thresholding function that creates a binary image, as illustrated
in Fig. 3.2(b). Intermediate values of Ar1 , s1 B and Ar2 , s2 B produce various degrees
of spread in the gray levels of the output image, thus affecting its contrast. In
general, r1 r2 and s1 s2 is assumed so that the function is single valued and
monotonically increasing. This condition preserves the order of gray levels, thus
preventing the creation of intensity artifacts in the processed image.
Figure 3.10(b) shows an 8-bit image with low contrast. Fig. 3.10(c) shows the
result of contrast stretching, obtained by setting Ar1 , s1 B= Armin , 0 B and
Ar2 , s2 B=Armax , L-1B where rmin and rmax denote the minimum and maximum
gray levels in the image, respectively.Thus, the transformation function stretched
the levels linearly from their original range to the full range [0, L-1]. Final-
ly, Fig. 3.10(d) shows the result of using the thresholding function defined pre-
viously, with r1=r2=m, the mean gray level in the image. The original image
on which these results are based is a scanning electron microscope image of
pollen, magnified approximately 700 times.

Gray-level slicing
Highlighting a specific range of gray levels in an image often is desired. Appli-
cations include enhancing features such as masses of water in satellite imagery
and enhancing flaws in X-ray images. There are several ways of doing level slic-
ing, but most of them are variations of two basic themes. One approach is to dis-
play a high value for all gray levels in the range of interest and a low value for
all other gray levels.This transformation, shown in Fig. 3.11(a), produces a binary
image.The second approach, based on the transformation shown in Fig. 3.11(b),
brightens the desired range of gray levels but preserves the background and
gray-level tonalities in the image. Figure 3.11(c) shows a gray-scale image, and
Fig. 3.11(d) shows the result of using the transformation in Fig. 3.11(a).Variations
of the two transformations shown in Fig. 3.11 are easy to formulate.

Bit-plane slicing
Instead of highlighting gray-level ranges, highlighting the contribution made to
total image appearance by specific bits might be desired. Suppose that each
pixel in an image is represented by 8 bits. Imagine that the image is composed
of eight 1-bit planes, ranging from bit-plane 0 for the least significant bit to bit-
plane 7 for the most significant bit. In terms of 8-bit bytes, plane 0 contains all
the lowest order bits in the bytes comprising the pixels in the image and plane
7 contains all the high-order bits. Figure 3.12 illustrates these ideas, and Fig. 3.14
shows the various bit planes for the image shown in Fig. 3.13. Note that the
higher-order bits (especially the top four) contain the majority of the visually sig-
nificant data.The other bit planes contribute to more subtle details in the image.
Separating a digital image into its bit planes is useful for analyzing the relative
importance played by each bit of the image, a process that aids in determining
the adequacy of the number of bits used to quantize each pixel. Also, this type
of decomposition is useful for image compression, as discussed in Chapter 8.


L-1 L-1 a b
c d
FIGURE 3.11
(a) This
transformation
highlights range
s s T(r) [A, B] of gray
T(r) levels and reduces
all others to a
constant level.
(b) This
transformation
highlights range
r r [A, B] but
0 A B L-1 0 A B L-1 preserves all
other levels.
(c) An image.
(d) Result of
using the
transformation
in (a).

In terms of bit-plane extraction for an 8-bit image, it is not difficult to show
that the (binary) image for bit-plane 7 can be obtained by processing the input
image with a thresholding gray-level transformation function that (1) maps all
levels in the image between 0 and 127 to one level (for example, 0); and (2) maps
all levels between 129 and 255 to another (for example, 255). The binary image
for bit-plane 7 in Fig. 3.14 was obtained in just this manner. It is left as an exer-
cise (Problem 3.3) to obtain the gray-level transformation functions that would
yield the other bit planes.

One 8-bit byte FIGURE 3.12
Bit-plane 7
Bit-plane
(most significant)
representation of
an 8-bit image.

Bit-plane 0
(least significant)


FIGURE 3.13 An 8-bit fractal image. (A fractal is an image generated from mathematical
expressions). (Courtesy of Ms. Melissa D. Binde, Swarthmore College, Swarthmore, PA.)

3.3 Histogram Processing
The histogram of a digital image with gray levels in the range [0, L-1] is a dis-
crete function hArk B=nk , where rk is the kth gray level and nk is the number
See inside front cover
of pixels in the image having gray level rk . It is common practice to normalize
Consult the book web site
a histogram by dividing each of its values by the total number of pixels in the
image, denoted by n. Thus, a normalized histogram is given by pArk B=nk n,
for a review of basic prob-
ability theory.

for k=0, 1, p , L-1. Loosely speaking, pArk B gives an estimate of the prob-
ability of occurrence of gray level rk . Note that the sum of all components of a
normalized histogram is equal to 1.
Histograms are the basis for numerous spatial domain processing techniques.
Histogram manipulation can be used effectively for image enhancement, as
shown in this section. In addition to providing useful image statistics, we shall
see in subsequent chapters that the information inherent in histograms also is
quite useful in other image processing applications, such as image compression
and segmentation. Histograms are simple to calculate in software and also lend
themselves to economic hardware implementations, thus making them a pop-
ular tool for real-time image processing.
As an introduction to the role of histogram processing in image enhance-
ment, consider Fig. 3.15, which is the pollen image of Fig. 3.10 shown in four
basic gray-level characteristics: dark, light, low contrast, and high contrast. The
right side of the figure shows the histograms corresponding to these images.
The horizontal axis of each histogram plot corresponds to gray level values, rk .
The vertical axis corresponds to values of hArk B=nk or pArk B=nk n if the
values are normalized. Thus, as indicated previously, these histogram plots are
simply plots of hArk B=nk versus rk or pArk B=nk n versus rk .

3.3 I Histogram Processing 89

FIGURE 3.14 The eight bit planes of the image in Fig. 3.13. The number at the bottom,
right of each image identifies the bit plane.

We note in the dark image that the components of the histogram are con-
centrated on the low (dark) side of the gray scale. Similarly, the components of
the histogram of the bright image are biased toward the high side of the gray
scale. An image with low contrast has a histogram that will be narrow and will
be centered toward the middle of the gray scale. For a monochrome image this
implies a dull, washed-out gray look. Finally, we see that the components of the
histogram in the high-contrast image cover a broad range of the gray scale and,
further, that the distribution of pixels is not too far from uniform, with very few
vertical lines being much higher than the others. Intuitively, it is reasonable to
conclude that an image whose pixels tend to occupy the entire range of possi-
ble gray levels and, in addition, tend to be distributed uniformly, will have an ap-
pearance of high contrast and will exhibit a large variety of gray tones. The net
effect will be an image that shows a great deal of gray-level detail and has high
dynamic range. It will be shown shortly that it is possible to develop a trans-
formation function that can automatically achieve this effect, based only on
information available in the histogram of the input image.


Dark image

Bright image

Low-contrast image

High-contrast image

a b
FIGURE 3.15 Four basic image types: dark, light, low contrast, high contrast, and their cor-
responding histograms. (Original image courtesy of Dr. Roger Heady, Research School
of Biological Sciences, Australian National University, Canberra, Australia.)


3.3.1 Histogram Equalization
Consider for a moment continuous functions, and let the variable r represent the
gray levels of the image to be enhanced. In the initial part of our discussion we
assume that r has been normalized to the interval [0, 1], with r=0 represent-
ing black and r=1 representing white. Later, we consider a discrete formula-
tion and allow pixel values to be in the interval [0, L-1].
For any r satisfying the aforementioned conditions, we focus attention on
transformations of the form

s=T(r) 0 r 1 (3.3-1)

that produce a level s for every pixel value r in the original image. For reasons
that will become obvious shortly, we assume that the transformation function
T(r) satisfies the following conditions:
(a) T(r) is single-valued and monotonically increasing in the interval
0 r 1; and
(b) 0 T(r) 1 for 0 r 1.
The requirement in (a) that T(r) be single valued is needed to guarantee that the
inverse transformation will exist, and the monotonicity condition preserves
the increasing order from black to white in the output image. A transformation
function that is not monotonically increasing could result in at least a section
of the intensity range being inverted, thus producing some inverted gray levels
in the output image. While this may be a desirable effect in some cases, that is
not what we are after in the present discussion. Finally, condition (b) guarantees
that the output gray levels will be in the same range as the input levels. Fig-
ure 3.16 gives an example of a transformation function that satisfies these two
conditions. The inverse transformation from s back to r is denoted

r = T-1(s) 0 s 1. (3.3-2)

It can be shown by example (Problem 3.8) that even if T(r) satisfies conditions
(a) and (b), it is possible that the corresponding inverse T-1(s) may fail to be sin-
gle valued.

s FIGURE 3.16 A
gray-level
transformation
t function that is
both single valued
and
sk=T(rk)
monotonically
T(r) increasing.

r
0 rk 1


The gray levels in an image may be viewed as random variables in the in-
terval [0, 1]. One of the most fundamental descriptors of a random variable is
its probability density function (PDF). Let pr(r) and ps(s) denote the probability
density functions of random variables r and s, respectively, where the subscripts
on p are used to denote that pr and ps are different functions. A basic result
from an elementary probability theory is that, if pr(r) and T(r) are known and
T-1(s) satisfies condition (a), then the probability density function ps(s) of the
transformed variable s can be obtained using a rather simple formula:

ps(s) = pr(r) 2 2.
dr
(3.3-3)
ds
Thus, the probability density function of the transformed variable, s, is deter-
mined by the gray-level PDF of the input image and by the chosen transfor-
mation function.
A transformation function of particular importance in image processing
has the form
r
s = T(r) = pr(w) dw (3.3-4)
30

where w is a dummy variable of integration. The right side of Eq. (3.3-4) is rec-
ognized as the cumulative distribution function (CDF) of random variable r.
Since probability density functions are always positive, and recalling that the in-
tegral of a function is the area under the function, it follows that this transfor-
mation function is single valued and monotonically increasing, and, therefore,
satisfies condition (a). Similarly, the integral of a probability density function for
variables in the range [0, 1] also is in the range [0, 1], so condition (b) is satis-
fied as well.
Given transformation function T(r), we find ps(s) by applying Eq. (3.3-3).We
know from basic calculus (Leibniz’s rule) that the derivative of a definite inte-
gral with respect to its upper limit is simply the integrand evaluated at that limit.
In other words,
ds dT(r)
=
dr dr
r
c p (w) dw d
d
= (3.3-5)
dr 3 r 0

= pr(r).
Substituting this result for dr ds into Eq. (3.3-3), and keeping in mind that all
probability values are positive, yields

ps(s) = pr(r) 2 2
dr

= pr(r) 2 2
ds
1
(3.3-6)
pr(r)
= 1 0 s 1.


Because ps(s) is a probability density function, it follows that it must be zero out-
side the interval [0, 1] in this case because its integral over all values of s must
equal 1. We recognize the form of ps(s) given in Eq. (3.3-6) as a uniform prob-
ability density function. Simply stated, we have demonstrated that performing
the transformation function given in Eq. (3.3-4) yields a random variable s char-
acterized by a uniform probability density function. It is important to note from
Eq. (3.3-4) that T(r) depends on pr(r), but, as indicated by Eq. (3.3-6), the re-
sulting ps(s) always is uniform, independent of the form of pr(r).
For discrete values we deal with probabilities and summations instead of
probability density functions and integrals. The probability of occurrence of
gray level rk in an image is approximated by
nk
pr(rk) = k = 0, 1, 2, p , L - 1 (3.3-7)
n
where, as noted at the beginning of this section, n is the total number of pixels
in the image, nk is the number of pixels that have gray level rk , and L is the total
number of possible gray levels in the image. The discrete version of the trans-
formation function given in Eq. (3.3-4) is
k
sk = TArk B = a pr Arj B (3.3-8)
j=0
k n
j
= a k = 0, 1, 2, p , L - 1.
j=0 n
Thus, a processed (output) image is obtained by mapping each pixel with level
rk in the input image into a corresponding pixel with level sk in the output image
via Eq. (3.3-8). As indicated earlier, a plot of pr Ark B versus rk is called a his-
togram. The transformation (mapping) given in Eq. (3.3-8) is called histogram
equalization or histogram linearization. It is not difficult to show (Problem 3.9)
that the transformation in Eq. (3.3-8) satisfies conditions (a) and (b) stated pre-
viously in this section.
Unlike its continuos counterpart, it cannot be proved in general that this dis-
crete transformation will produce the discrete equivalent of a uniform proba-
bility density function, which would be a uniform histogram. However, as will
be seen shortly, use of Eq. (3.3-8) does have the general tendency of spreading
the histogram of the input image so that the levels of the histogram-equalized
image will span a fuller range of the gray scale.
We discussed earlier in this section the many advantages of having gray-level
values that cover the entire gray scale. In addition to producing gray levels that
have this tendency, the method just derived has the additional advantage that
it is fully “automatic.” In other words, given an image, the process of histogram
equalization consists simply of implementing Eq. (3.3-8), which is based on in-
formation that can be extracted directly from the given image, without the need
for further parameter specifications. We note also the simplicity of the compu-
tations that would be required to implement the technique.
The inverse transformation from s back to r is denoted by
rk = T-1 Ask B k = 0, 1, 2, p , L - 1 (3.3-9)


It can be shown (Problem 3.9) that the inverse transformation in Eq. (3.3-9)
satisfies conditions (a) and (b) stated previously in this section only if none of
the levels, rk , k=0, 1, 2, p , L-1, are missing from the input image. Although
the inverse transformation is not used in histogram equalization, it plays a cen-
tral role in the histogram-matching scheme developed in the next section. We
also discuss in that section details of how to implement histogram processing
techniques.

EXAMPLE 3.3: I Figure 3.17(a) shows the four images from Fig. 3.15, and Fig. 3.17(b) shows
Histogram the result of performing histogram equalization on each of these images.The first
equalization. three results (top to bottom) show significant improvement. As expected, his-
togram equalization did not produce a significant visual difference in the fourth
image because the histogram of this image already spans the full spectrum of
the gray scale. The transformation functions used to generate the images in
Fig. 3.17(b) are shown in Fig. 3.18. These functions were generated from the
histograms of the original images [see Fig. 3.15(b)] using Eq. (3.3-8). Note that
transformation (4) has a basic linear shape, again indicating that the gray lev-
els in the fourth input image are nearly uniformly distributed.As was just noted,
we would expect histogram equalization in this case to have negligible effect on
the appearance of the image.
The histograms of the equalized images are shown in Fig. 3.17(c). It is of in-
terest to note that, while all these histograms are different, the histogram-
equalized images themselves are visually very similar. This is not unexpected
because the difference between the images in the left column is simply one of
contrast, not of content. In other words, since the images have the same content,
the increase in contrast resulting from histogram equalization was enough to
render any gray-level differences in the resulting images visually indistinguish-
able. Given the significant contrast differences of the images in the left column,
this example illustrates the power of histogram equalization as an adaptive en-
hancement tool. I

3.3.2 Histogram Matching (Specification)
As indicated in the preceding discussion, histogram equalization automatical-
ly determines a transformation function that seeks to produce an output image
that has a uniform histogram. When automatic enhancement is desired, this is
a good approach because the results from this technique are predictable and the
method is simple to implement. We show in this section that there are applica-
tions in which attempting to base enhancement on a uniform histogram is not
the best approach. In particular, it is useful sometimes to be able to specify the
shape of the histogram that we wish the processed image to have. The method
used to generate a processed image that has a specified histogram is called
histogram matching or histogram specification.

Development of the method
Let us return for a moment to continuous gray levels r and z (considered
continuous random variables), and let pr(r) and pz(z) denote their corre-
sponding continuos probability density functions. In this notation, r and z denote


a b c
FIGURE 3.17 (a) Images from Fig. 3.15. (b) Results of histogram equalization. (c) Cor-
responding histograms.


FIGURE 3.18 1.00
Transformation
functions (1)
through (4) were
obtained from the
histograms of the 0.75
images in
Fig.3.17(a), using
Eq. (3.3-8). (4)
(1)

0.50
(2)

(3)
0.25

0
0 64 128 192 255

the gray levels of the input and output (processed) images, respectively. We can
estimate pr(r) from the given input image, while pz(z) is the specified probability
density function that we wish the output image to have.
Let s be a random variable with the property
r
s = T(r) = pr(w) dw (3.3-10)
30

where w is a dummy variable of integration.We recognize this expression as the
continuos version of histogram equalization given in Eq. (3.3-4). Suppose next
that we define a random variable z with the property
z
G(z) = pz(t) dt = s (3.3-11)
30

where t is a dummy variable of integration. It then follows from these two equa-
tions that G(z)=T(r) and, therefore, that z must satisfy the condition
z = G -1(s) = G -1 CT(r)D. (3.3-12)
The transformation T(r) can be obtained from Eq. (3.3-10) once pr(r) has been
estimated from the input image. Similarly, the transformation function G(z)
can be obtained using Eq. (3.3-11) because pz(z) is given.
Assuming that G–1 exists and that it satisfies conditions (a) and (b) in the
previous section, Eqs. (3.3-10) through (3.3-12) show that an image with a spec-
ified probability density function can be obtained from an input image by using
the following procedure: (1) Obtain the transformation function T(r) using
Eq. (3.3-10). (2) Use Eq. (3.3-11) to obtain the transformation function G(z).
(3) Obtain the inverse transformation function G–1. (4) Obtain the output image


by applying Eq. (3.3-12) to all the pixels in the input image.The result of this pro-
cedure will be an image whose gray levels, z, have the specified probability den-
sity function pz(z).
Although the procedure just described is straightforward in principle, it is
seldom possible in practice to obtain analytical expressions for T(r) and for
G–1. Fortunately, this problem is simplified considerably in the case of discrete
values.The price we pay is the same as in histogram equalization, where only an
approximation to the desired histogram is achievable. In spite of this, however,
some very useful results can be obtained even with crude approximations.
The discrete formulation of Eq. (3.3-10) is given by Eq. (3.3-8), which we re-
peat here for convenience:
k
sk = TArk B = a pr Arj B
j=0
(3.3-13)
k nj
= a k = 0, 1, 2, p , L - 1
j=0 n

where n is the total number of pixels in the image, nj is the number of pixels with
gray level rj , and L is the number of discrete gray levels. Similarly, the discrete
formulation of Eq. (3.3-11) is obtained from the given histogram pz Azi B, i=0,
1, 2, p , L-1, and has the form
k
vk = GAzk B = a pz Azi B = sk k = 0, 1, 2, p , L - 1. (3.3-14)
i=0

As in the continuos case, we are seeking values of z that satisfy this equation.
The variable vk was added here for clarity in the discussion that follows. Final-
ly, the discrete version of Eq. (3.3-12) is given by
zk = G -1 CTArk B D k = 0, 1, 2, p , L - 1 (3.3-15)
or, from Eq. (3.3-13),
zk = G -1 Ask B k = 0, 1, 2, p , L - 1. (3.3-16)
Equations (3.3-13) through (3.3-16) are the foundation for implementing
histogram matching for digital images. Equation (3.3-13) is a mapping from the
levels in the original image into corresponding levels sk based on the histogram
of the original image, which we compute from the pixels in the image. Equation
(3.3-14) computes a transformation function G from the given histogram pz(z).
Finally, Eq. (3.3-15) or its equivalent, Eq. (3.3-16), gives us (an approximation
of) the desired levels of the image with that histogram. The first two equations
can be implemented easily because all the quantities are known. Implementa-
tion of Eq. (3.3-16) is straightforward, but requires additional explanation.

Implementation
We start by noting the following: (1) Each set of gray levels Erj F, Esj F, and Ezj F,
j=0, 1, 2, p , L-1, is a one-dimensional array of dimension L*1. (2) All
mappings from r to s and from s to z are simple table lookups between a given


pixel value and these arrays. (3) Each of the elements of these arrays, for ex-
ample, sk , contains two important pieces of information: The subscript k de-
notes the location of the element in the array, and s denotes the value at that
location. (4) We need to be concerned only with integer pixel values. For ex-
ample, in the case of an 8-bit image, L=256 and the elements of each of the
arrays just mentioned are integers between 0 and 255. This implies that we now
work with gray level values in the interval [0, L-1] instead of the normalized
interval [0, 1] that we used before to simplify the development of histogram
processing techniques.
In order to see how histogram matching actually can be implemented, con-
sider Fig. 3.19(a), ignoring for a moment the connection shown between this
figure and Fig. 3.19(c). Figure 3.19(a) shows a hypothetical discrete transfor-
mation function s=T(r) obtained from a given image. The first gray level in
the image, r1 , maps to s1 ; the second gray level, r2 , maps to s2 ; the kth level rk
maps to sk ; and so on (the important point here is the ordered correspondence
between these values). Each value sj in the array is precomputed using
Eq. (3.3-13), so the process of mapping simply uses the actual value of a pixel
as an index in an array to determine the corresponding value of s. This process
is particularly easy because we are dealing with integers. For example, the s
mapping for an 8-bit pixel with value 127 would be found in the 128th position
in array Esj F (recall that we start at 0) out of the possible 256 positions. If we
stopped here and mapped the value of each pixel of an input image by the

a b s v
c
FIGURE 3.19 1 1
(a) Graphical
sk
interpretation of G(z)
mapping from rk vq
to sk via T(r). T(r)
(b) Mapping of zq
to its
corresponding
value vq via G(z).
(c) Inverse 0 r 0 z
mapping from sk 0 rk L-1 0 zq L-1
to its
corresponding v
value of zk .
1

sk
G(z)

0 z
0 zk L-1


method just described, the output would be a histogram-equalized image, ac-
cording to Eq. (3.3-8).
In order to implement histogram matching we have to go one step further.
Figure 3.19(b) is a hypothetical transformation function G obtained from a
given histogram pz(z) by using Eq. (3.3-14). For any zq , this transformation
function yields a corresponding value vq . This mapping is shown by the arrows
in Fig. 3.19(b). Conversely, given any value vq , we would find the correspond-
ing value zq from G–1. In terms of the figure, all this means graphically is that we
would reverse the direction of the arrows to map vq into its corresponding zq .
However, we know from the definition in Eq. (3.3-14) that v=s for corre-
sponding subscripts, so we can use exactly this process to find the zk corre-
sponding to any value sk that we computed previously from the equation
sk=TArk B. This idea is shown in Fig. 3.19(c).
Since we really do not have the z’s (recall that finding these values is pre-
cisely the objective of histogram matching), we must resort to some sort of iter-
ative scheme to find z from s. The fact that we are dealing with integers makes
this a particularly simple process. Basically, because vk=sk , we have from
Eq. (3.3-14) that the z’s for which we are looking must satisfy the equation
GAzk B=sk , or AGAzk B-sk B=0. Thus, all we have to do to find the value of zk
corresponding to sk is to iterate on values of z such that this equation is satisfied
for k=0, 1, 2, p , L-1. This is the same thing as Eq. (3.3-16), except that we
do not have to find the inverse of G because we are going to iterate on z. Since
we are dealing with integers, the closest we can get to satisfying the equation
AGAzk B-sk B=0 is to let zk=z for each value of k, where z is the smallest
ˆ ˆ
integer in the interval [0, L-1] such that
AG(z) - sk B
ˆ 0 k = 0, 1, 2, p , L - 1. (3.3-17)
Given a value sk , all this means conceptually in terms of Fig. 3.19(c) is that we
ˆ
would start with z = 0 and increase it in integer steps until Eq. (3.3-17) is sat-
ˆ
isfied, at which point we let zk = z. Repeating this process for all values of k
would yield all the required mappings from s to z, which constitutes the im-
ˆ
plementation of Eq. (3.3-16). In practice, we would not have to start with z = 0
each time because the values of sk are known to increase monotonically. Thus,
ˆ
for k=k+1, we would start with z = zk and increment in integer values
from there.
The procedure we have just developed for histogram matching may be sum-
marized as follows:
1. Obtain the histogram of the given image.
2. Use Eq. (3.3-13) to precompute a mapped level sk for each level rk .
3. Obtain the transformation function G from the given pz(z) using
Eq. (3.3-14).
4. Precompute zk for each value of sk using the iterative scheme defined in con-
nection with Eq. (3.3-17).
5. For each pixel in the original image, if the value of that pixel is rk , map this
value to its corresponding level sk ; then map level sk into the final level zk .
Use the precomputed values from Steps (2) and (4) for these mappings.


Note that Step (5) implements two mappings for each pixel in the image being
processed. The first mapping is nothing more than histogram equalization. If
the histogram-equalized image is not required, it obviously would be beneficial
to combine both transformations into one in order to save an intermediate step.
Finally, we note that, even in the discrete case, we need to be concerned about
G–1 satisfying conditions (a) and (b) of the previous section. It is not difficult to
show (Problem 3.9) that the only way to guarantee that G–1 be single valued and
monotonic is to require that G be strictly monotonic (i.e., always increasing),
which means simply that none of the values of the specified histogram pz Azi B in
Eq. (3.3-14) can be zero.

EXAMPLE 3.4: I Figure 3.20(a) shows an image of the Mars moon, Phobos, taken by NASA’s
Comparison Mars Global Surveyor. Figure 3.20(b) shows the histogram of Fig. 3.20(a). The
between image is dominated by large, dark areas, resulting in a histogram characterized
histogram
equalization and by a large concentration of pixels in the dark end of the gray scale. At first
histogram glance, one might conclude that histogram equalization would be a good ap-
matching. proach to enhance this image, so that details in the dark areas become more
visible. It is demonstrated in the following discussion that this is not so.
Figure 3.21(a) shows the histogram equalization transformation [Eq. (3.3-8)
or (3.3-13)] obtained from the histogram shown in Fig. 3.20(b). The most rele-
vant characteristic of this transformation function is how fast it rises from gray
level 0 to a level near 190. This is caused by the large concentration of pixels in
the input histogram having levels very near 0. When this transformation is ap-
plied to the levels of the input image to obtain a histogram-equalized result,
the net effect is to map a very narrow interval of dark pixels into the upper end
of the gray scale of the output image. Because numerous pixels in the input
image have levels precisely in this interval, we would expect the result to be an

7.00
Number of pixels ( 104)

5.25
*

3.50

1.75

0
0 64 128 192 255
Gray level

a b
FIGURE 3.20 (a) Image of the Mars moon Photos taken by NASA’s Mars Global
Surveyor. (b) Histogram. (Original image courtesy of NASA.)


255 a b
c
Output gray levels 192 FIGURE 3.21
(a) Transformation
function for
128 histogram
equalization.
(b) Histogram-
64
equalized image
(note the washed-
0 out appearance).
0 64 128 192 255 (c) Histogram
Input gray levels of (b).
7.00
*104)

5.25
Number of pixels (

3.50

1.75

0
0 64 128 192 255
Gray level

image with a light, washed-out appearance. As shown in Fig. 3.21(b), this is in-
deed the case. The histogram of this image is shown in Fig. 3.21(c). Note how all
the gray levels are biased toward the upper one-half of the gray scale.
Since the problem with the transformation function in Fig. 3.21(a) was caused
by a large concentration of pixels in the original image with levels near 0, a rea-
sonable approach is to modify the histogram of that image so that it does not
have this property. Figure 3.22(a) shows a manually specified function that pre-
serves the general shape of the original histogram, but has a smoother transition
of levels in the dark region of the gray scale. Sampling this function into 256
equally spaced discrete values produced the desired specified histogram. The
transformation function G(z) obtained from this histogram using Eq. (3.3-14) is
labeled transformation (1) in Fig. 3.22(b). Similarly, the inverse transformation
G–1(s) from Eq. (3.3-16) [obtained using the iterative technique discussed in
connection with Eq. (3.3-17)] is labeled transformation (2) in Fig. 3.22(b).The en-
hanced image in Fig. 3.22(c) was obtained by applying transformation (2) to the
pixels of the histogram-equalized image in Fig. 3.21(b).The improvement of the
histogram-specified image over the result obtained by histogram equalization is
evident by comparing these two images. It is of interest to note that a rather
modest change in the original histogram was all that was required to obtain a sig-
nificant improvement in enhancement.The histogram of Fig. 3.22(c) is shown in
Fig. 3.22(d). The most distinguishing feature of this histogram is how its low end
has shifted right toward the lighter region of the gray scale, as desired. I


a c 7.00
b

Number of pixels (*104)
d
5.25
FIGURE 3.22
(a) Specified
histogram. 3.50
(b) Curve (1) is
from Eq. (3.3-14),
1.75
using the
histogram in (a);
curve (2) was 0
obtained using 0 64 128 192 255
the iterative Gray level
procedure in
255
Eq. (3.3-17).
(c) Enhanced
image using 192
Output gray levels

mappings from (1)
curve (2).
(d) Histogram 128
of (c). (2)
64

0
0 64 128 192 255
Input gray levels

7.00
Number of pixels ( 104)

5.25
*

3.50

1.75

0
0 64 128 192 255
Gray level

Although it probably is obvious by now, we emphasize before leaving this sec-
tion that histogram specification is, for the most part, a trial-and-error process.
One can use guidelines learned from the problem at hand, just as we did in the
preceding example. At times, there may be cases in which it is possible to for-
mulate what an “average” histogram should look like and use that as the spec-
ified histogram. In cases such as these, histogram specification becomes a
straightforward process. In general, however, there are no rules for specifying
histograms, and one must resort to analysis on a case-by-case basis for any given
enhancement task.


3.3.3 Local Enhancement
The histogram processing methods discussed in the previous two sections are
global, in the sense that pixels are modified by a transformation function based
on the gray-level content of an entire image. Although this global approach is
suitable for overall enhancement, there are cases in which it is necessary to en-
hance details over small areas in an image. The number of pixels in these areas
may have negligible influence on the computation of a global transformation
whose shape does not necessarily guarantee the desired local enhancement.
The solution is to devise transformation functions based on the gray-level dis-
tribution—or other properties—in the neighborhood of every pixel in the image.
Although processing methods based on neighborhoods are the topic of Section
3.5, we discuss local histogram processing here for the sake of clarity and con-
tinuity. The reader will have no difficulty in following the discussion.
The histogram processing techniques previously described are easily adapt-
able to local enhancement. The procedure is to define a square or rectangular
neighborhood and move the center of this area from pixel to pixel. At each lo-
cation, the histogram of the points in the neighborhood is computed and either
a histogram equalization or histogram specification transformation function is
obtained. This function is finally used to map the gray level of the pixel cen-
tered in the neighborhood.The center of the neighborhood region is then moved
to an adjacent pixel location and the procedure is repeated. Since only one new
row or column of the neighborhood changes during a pixel-to-pixel translation
of the region, updating the histogram obtained in the previous location with
the new data introduced at each motion step is possible (Problem 3.11).This ap-
proach has obvious advantages over repeatedly computing the histogram over
all pixels in the neighborhood region each time the region is moved one pixel
location.Another approach used some times to reduce computation is to utilize
nonoverlapping regions, but this method usually produces an undesirable
checkerboard effect.

I Figure 3.23(a) shows an image that has been slightly blurred to reduce its EXAMPLE 3.5:
noise content (see Section 3.6.1 regarding blurring). Figure 3.23(b) shows the re- Enhancement
sult of global histogram equalization. As is often the case when this technique using local
histograms.
is applied to smooth, noisy areas, Fig. 3.23(b) shows considerable enhancement
of the noise, with a slight increase in contrast. Note that no new structural de-
tails were brought out by this method. However, local histogram equalization
using a 7*7 neighborhood revealed the presence of small squares inside the
larger dark squares. The small squares were too close in gray level to the larg-
er ones, and their sizes were too small to influence global histogram equaliza-
tion significantly. Note also the finer noise texture in Fig. 3.23(c), a result of
local processing using relatively small neighborhoods. I

3.3.4 Use of Histogram Statistics for Image Enhancement
Instead of using the image histogram directly for enhancement, we can use in-
stead some statistical parameters obtainable directly from the histogram. Let r
denote a discrete random variable representing discrete gray-levels in the range


a b c
FIGURE 3.23 (a) Original image. (b) Result of global histogram equalization. (c) Result of local histogram
equalization using a 7*7 neighborhood about each pixel.

[0, L-1], and let pAri B denote the normalized histogram component corre-
sponding to the ith value of r. As indicated previously in this section, we may
view pAri B as an estimate of the probability of occurrence of gray level ri . The
nth moment of r about its mean is defined as
L-1
mn(r) = a Ari - mB pAri B
n
(3.3-18)
i=0

where m is the mean value of r (its average gray level):
L-1
m = a ri pAri B. (3.3-19)
i=0

It follows from Eqs. (3.3-18) and (3.3-19) that m0=1 and m1=0. The second
moment is given by
L-1
m2(r) = a Ari - mB pAri B.
2
(3.3-20)
i=0

We recognize this expression as the variance of r, which is denoted conven-
tionally by s2(r). The standard deviation is defined simply as the square root of
the variance. We will revisit moments in Chapter 11 in connection with image
description. In terms of enhancement, however, we are interested primarily in
the mean, which is a measure of average gray level in an image, and the variance
(or standard deviation), which is a measure of average contrast.
We consider two uses of the mean and variance for enhancement purposes.
The global mean and variance are measured over an entire image and are use-
ful primarily for gross adjustments of overall intensity and contrast. A much
more powerful use of these two measures is in local enhancement, where the
local mean and variance are used as the basis for making changes that depend
on image characteristics in a predefined region about each pixel in the image.


Let (x, y) be the coordinates of a pixel in an image, and let Sxy denote a
neighborhood (subimage) of specified size, centered at (x, y). From Eq. (3.3-19)
the mean value mSxy of the pixels in Sxy can be computed using the expression

mSxy = a rs, t pArs, t B (3.3-21)
(s, t)HSxy

where rs, t is the gray level at coordinates (s, t) in the neighborhood, and pArs, t B
is the neighborhood normalized histogram component corresponding to that
value of gray level. Similarly, from Eq. (3.3-20), the gray-level variance of the pix-
els in region Sxy is given by

a Crs, t - mSxy D pArs, t B.
2
s2 xy =
S (3.3-22)
(s, t)HSxy

The local mean is a measure of average gray level in neighborhood Sxy , and the
variance (or standard deviation) is a measure of contrast in that neighborhood.
An important aspect of image processing using the local mean and variance
is the flexibility they afford in developing simple, yet powerful enhancement
techniques based on statistical measures that have a close, predictable corre-
spondence with image appearance. We illustrate these characteristics by means
of an example.

I Figure 3.24 shows an SEM (scanning electron microscope) image of a tung- EXAMPLE 3.6:
sten filament wrapped around a support. The filament in the center of the Enhancement
image and its support are quite clear and easy to study. There is another fila- based on local
statistics.
ment structure on the right side of the image, but it is much darker and its size
and other features are not as easily discernable. Local enhancement by contrast
manipulation is an ideal approach to try on problems such as this, where part
of the image is acceptable, but other parts may contain hidden features of in-
terest.
In this particular case, the problem is to enhance dark areas while leaving the
light area as unchanged as possible since it does note require enhancement. We
can use the concepts presented in this section to formulate an enhancement
method that can tell the difference between dark and light and, at the same
time, is capable of enhancing only the dark areas.A measure of whether an area
is relatively light or dark at a point (x, y) is to compare the local average gray
level mSxy to the average image gray level, called the global mean and denoted
MG . This latter quantity is obtained by letting S encompass the entire image.
Thus, we have the first element of our enhancement scheme: We will consider
the pixel at a point (x, y) as a candidate for processing if mSxy k0 MG , where
k0 is a positive constant with value less than 1.0. Since we are interested in en-
hancing areas that have low contrast, we also need a measure to determine
whether the contrast of an area makes it a candidate for enhancement.Thus, we
will consider the pixel at a point (x, y) as a candidate for enhancement if
sSxy k2 DG , where DG is the global standard deviation and k2 is a positive con-
stant. The value of this constant will be greater than 1.0 if we are interested in
enhancing light areas and less than 1.0 for dark areas. Finally, we need to restrict


the lowest values of contrast we are willing to accept, otherwise the procedure
would attempt to enhance even constant areas, whose standard deviation is
zero. Thus, we also set a lower limit on the local standard deviation by requir-
ing that k1 DG sSxy , with k<k2. A pixel at (x, y) that meets all the condi-
tions for local enhancement is processed simply by multiplying it by a specified
constant, E, to increase (or decrease) the value of its gray level relative to the
rest of the image. The values of pixels that do not meet the enhancement con-
ditions are left unchanged.
A summary of the enhancement method is as follows. Let f(x, y) represent
the value of an image pixel at any image coordinates (x, y), and let g(x, y) rep-
resent the corresponding enhanced pixel at those coordinates. Then

g(x, y) = b
E f(x, y) if mSxy k0 MG AND k1 DG sSxy k2 DG
f(x, y) otherwise

where, as indicated previously, E, k0 , k1 , and k2 are specified parameters; MG is
the global mean of the input image; and DG is its global standard deviation.
Normally, making a successful selection of parameters requires a bit of ex-
perimentation to gain familiarity with a given image or class of images. In this
case, the following values were selected: E=4.0, k0=0.4, k1=0.02, and
k2=0.4. The relatively low value of 4.0 for E was chosen so that, when it was
multiplied by the levels in the areas being enhanced (which are dark), the re-
sult would still tend toward the dark end of the scale, and thus preserve the gen-
eral visual balance of the image. The value of k0 was chosen as somewhat less
than half the global mean since it is obvious by looking at the image that the
areas that require enhancement definitely are dark enough to be below half
the global mean. A similar analysis led to the choice of values for k1 and k2 .
Choosing these constants is not a difficult task in general, but their choice

FIGURE 3.24 SEM
image of a
tungsten filament
and support,
magnified
approximately
130 *. (Original
image courtesy of
Mr. Michael
Shaffer,
Department of
Geological
Sciences,
University of
Oregon, Eugene).


a b c
FIGURE 3.25 (a) Image formed from all local means obtained from Fig. 3.24 using Eq. (3.3-21). (b) Image
formed from all local standard deviations obtained from Fig. 3.24 using Eq. (3.3-22). (c) Image formed from
all multiplication constants used to produce the enhanced image shown in Fig. 3.26.

definitely must be guided by a logical analysis of the enhancement problem at
hand. Finally, the choice of size for the local area should be as small as possible
in order to preserve detail and keep the computational burden as low as possi-
ble. We chose a small (3*3) local region.
Figure 3.25(a) shows the values of mSxy for all values of (x, y). Since the value
of mSxy for each (x, y) is the average of the neighboring pixels in a 3*3 area
centered at (x, y), we expect the result to be similar to the original image, but

FIGURE 3.26
Enhanced SEM
image. Compare
with Fig. 3.24. Note
in particular the
enhanced area on
the right, bottom
side of the image.


slightly blurred. This indeed is the case in Fig. 3.25(a). Figure 3.25(b) shows in
image formed using all the values of sSxy . Similarly, we can construct an image
out the values that multiply f(x, y) at each coordinate pair (x, y) to form g(x, y).
Since the values are either 1 or E, the image is binary, as shown in Fig. 3.25(c).
The dark areas correspond to 1 and the light areas to E. Thus, any light point in
Fig. 3.25(c) signifies a coordinate pair (x, y) at which the enhancement proce-
dure multiplied f(x, y) by E to produce an enhanced pixel. The dark points
represent coordinates at which the procedure did not to modify the pixel values.
The enhanced image obtained with the method just described is shown in
Fig. 3.26. In comparing this image with the original in Fig. 3.24, we note the obvious
detail that has been brought out on the right side of the enhanced image. It is worth-
while to point out that the unenhanced portions of the image (the light areas) were
left intact for the most part. We do note the appearance of some small bright dots
in the shadow areas where the coil meets the support stem, and around some of the
borders between the filament and the background.These are undesirable artifacts
created by the enhancement technique. In other words, the points appearing as light
dots met the criteria for enhancement and their values were amplified by factor E.
Introduction of artifacts is a definite drawback of a method such as the one just de-
scribed because of the nonlinear way in which they process an image.The key point
here, however, is that the image was enhanced in a most satisfactory way as far as
bringing out the desired detail. I

It is not difficult to imagine the numerous ways in which the example just
given could be adapted or extended to other situations in which local en-
hancement is applicable.

3.4 Enhancement Using Arithmetic/Logic Operations
Arithmetic/logic operations involving images are performed on a pixel-by-pixel
basis between two or more images (this excludes the logic operation NOT, which
is performed on a single image). As an example, subtraction of two images re-
sults in a new image whose pixel at coordinates (x, y) is the difference between
the pixels in that same location in the two images being subtracted. Depending
on the hardware and/or software being used, the actual mechanics of imple-
menting arithmetic/logic operations can be done sequentially, one pixel at a
time, or in parallel, where all operations are performed simultaneously.
Logic operations similarly operate on a pixel-by-pixel basis†. We need only
be concerned with the ability to implement the AND, OR, and NOT logic op-
erators because these three operators are functionally complete. In other words,
any other logic operator can be implemented by using only these three basic
functions.When dealing with logic operations on gray-scale images, pixel values
are processed as strings of binary numbers. For example, performing the NOT
operation on a black, 8-bit pixel (a string of eight 0’s) produces a white pixel

†
Recall that, for two binary variables a and b: aANDb yields 1 only when both a and b are 1; otherwise
the result is 0. Similarly, aORb is 0 when both variables are 0; otherwise the result is 1. Finally, if a is 1,
NOT (a) is 0, and vice versa.

3.4 I Enhancement Using Arithmetic/Logic Operations 109

a b c
d e f
FIGURE 3.27
(a) Original
image. (b) AND
image mask.
(c) Result of the
AND operation
on images (a) and
(b). (d) Original
image. (e) OR
image mask.
(f) Result of
operation OR on
images (d) and
(e).

(a string of eight 1’s). Intermediate values are processed the same way, chang-
ing all 1’s to 0’s and vice versa.Thus, the NOT logic operator performs the same
function as the negative transformation of Eq. (3.2-1). The AND and OR op-
erations are used for masking; that is, for selecting subimages in an image, as il-
lustrated in Fig. 3.27. In the AND and OR image masks, light represents a binary
1 and dark represents a binary 0. Masking sometimes is referred to as region of
interest (ROI) processing. In terms of enhancement, masking is used primarily
to isolate an area for processing. This is done to highlight that area and differ-
entiate it from the rest of the image. Logic operations also are used frequently
in conjunction with morphological operations, as discussed in Chapter 9.
Of the four arithmetic operations, subtraction and addition (in that order) are
the most useful for image enhancement. We consider division of two images
simply as multiplication of one image by the reciprocal of the other.Aside from
the obvious operation of multiplying an image by a constant to increase its av-
erage gray level, image multiplication finds use in enhancement primarily as a
masking operation that is more general than the logical masks discussed in the
previous paragraph. In other words, multiplication of one image by another can
be used to implement gray-level, rather than binary, masks. We give an exam-
ple in Section 3.8 of how such a masking operation can be a useful tool. In the
remainder of this section, we develop and illustrate methods based on subtrac-
tion and addition for image enhancement. Other uses of image multiplication
are discussed in Chapter 5, in the context of image restoration.


3.4.1 Image Subtraction
The difference between two images f(x, y) and h(x, y), expressed as
g(x, y) = f(x, y) - h(x, y), (3.4-1)
is obtained by computing the difference between all pairs of corresponding pix-
els from f and h. The key usefulness of subtraction is the enhancement of dif-
ferences between images. We illustrate this concept by returning briefly to the
discussion in Section 3.2.4, where we showed that the higher-order bit planes of
an image carry a significant amount of visually relevant detail, while the lower
planes contribute more to fine (often imperceptible) detail. Figure 3.28(a) shows
the fractal image used earlier to illustrate the concept of bit planes. Figure 3.28(b)
shows the result of discarding (setting to zero) the four least significant bit planes
of the original image. The images are nearly identical visually, with the excep-
tion of a very slight drop in overall contrast due to less variability of the gray-
level values in the image of Fig. 3.28(b). The pixel-by-pixel difference between
these two images is shown in Fig. 3.28(c). The differences in pixel values are so
small that the difference image appears nearly black when displayed on an 8-bit

a b
c d
FIGURE 3.28
(a) Original
fractal image.
(b) Result of
setting the four
lower-order bit
planes to zero.
(c) Difference
between (a) and
(b).
(d) Histogram-
equalized
difference image.
(Original image
courtesy of Ms.
Melissa D. Binde,
Swarthmore
College,
Swarthmore, PA).


display. In order to bring out more detail, we can perform a contrast stretching
transformation, such as those discussed in Sections 3.2 or 3.3. We chose his-
togram equalization, but an appropriate power-law transformation would have
done the job also. The result is shown in Fig. 3.28(d). This is a very useful image
for evaluating the effect of setting to zero the lower-order planes.

I One of the most commercially successful and beneficial uses of image sub- EXAMPLE 3.7:
traction is in the area of medical imaging called mask mode radiography. In this Use of image
case h(x, y), the mask, is an X-ray image of a region of a patient’s body captured subtraction in
mask mode
by an intensified TV camera (instead of traditional X-ray film) located oppo- radiography.
site an X-ray source.The procedure consists of injecting a contrast medium into
the patient’s bloodstream, taking a series of images of the same anatomical re-
gion as h(x, y), and subtracting this mask from the series of incoming images
after injection of the contrast medium. The net effect of subtracting the mask
from each sample in the incoming stream of TV images is that the areas that are
different between f(x, y) and h(x, y) appear in the output image as enhanced
detail. Because images can be captured at TV rates, this procedure in essence
gives a movie showing how the contrast medium propagates through the vari-
ous arteries in the area being observed.
Figure 3.29(a) shows an X-ray image of the top of a patient’s head prior to
injection of an iodine medium into the bloodstream. The camera yielding this
image was positioned above the patient’s head, looking down. As a reference
point, the bright spot in the lower one-third of the image is the core of the spinal
column. Figure 3.29(b) shows the difference between the mask (Fig. 3.29a) and
an image taken some time after the medium was introduced into the blood-
stream. The bright arterial paths carrying the medium are unmistakably en-
hanced in Fig. 3.29(b). These arteries appear quite bright because they are not
subtracted out (that is, they are not part of the mask image). The overall back-
ground is much darker than that in Fig. 3.29(a) because differences between
areas of little change yield low values, which in turn appear as dark shades of gray
in the difference image. Note, for instance, that the spinal cord, which is bright
in Fig. 3.29(a), appears quite dark in Fig. 3.29(b) as a result of subtraction. I

a b
FIGURE 3.29
Enhancement by
image subtraction.
(a) Mask image.
(b) An image
(taken after
injection of a
contrast medium
into the
bloodstream) with
mask subtracted
out.


A few comments on implementation are an order before we leave this sec-
tion. In practice, most images are displayed using 8 bits (even 24-bit color im-
ages consists of three separate 8-bit channels). Thus, we expect image values
not to be outside the range from 0 to 255. The values in a difference image can
range from a minimum of –255 to a maximum of 255, so some sort of scaling is
required to display the results.There are two principal ways to scale a difference
image. One method is to add 255 to every pixel and then divide by 2. It is not
guaranteed that the values will cover the entire 8-bit range from 0 to 255, but
all pixel values definitely will be within this range. This method is fast and sim-
ple to implement, but it has the limitations that the full range of the display
may not be utilized and, potentially more serious, the truncation inherent in the
division by 2 will generally cause loss in accuracy.
If more accuracy and full coverage of the 8-bit range are desired, then we can
resort to another approach. First, the value of the minimum difference is ob-
tained and its negative added to all the pixels in the difference image (this will
create a modified difference image whose minimum values is 0). Then, all the
pixels in the image are scaled to the interval [0, 255] by multiplying each pixel
by the quantity 255 Max, where Max is the maximum pixel value in the modi-
fied difference image. It is evident that this approach is considerably more com-
plex and difficult to implement.
Before leaving this section we note also that change detection via image sub-
traction finds another major application in the area of segmentation, which is
the topic of Chapter 10. Basically, segmentation techniques attempt to subdivide
an image into regions based on a specified criterion. Image subtraction for seg-
mentation is used when the criterion is “changes.” For instance, in tracking (seg-
menting) moving vehicles in a sequence of images, subtraction is used to remove
all stationary components in an image. What is left should be the moving ele-
ments in the image, plus noise.

3.4.2 Image Averaging
Consider a noisy image g(x, y) formed by the addition of noise h(x, y) to an
original image f(x, y); that is,
g(x, y) = f(x, y) + h(x, y) (3.4-2)
where the assumption is that at every pair of coordinates (x, y) the noise is un-
correlated† and has zero average value.The objective of the following procedure
is to reduce the noise content by adding a set of noisy images, Egi(x, y)F.
If the noise satisfies the constraints just stated, it can be shown (Problem
–
3.15) that if an image g(x, y) is formed by averaging K different noisy images,

– 1 K
K ia
g(x, y) = gi(x, y) (3.4-3)
=1

†
Recall that the variance of a random variable x with mean m is defined as EC(x-m)2 D, where EE F is
the expected value of the argument. The covariance of two random variables xi and xj is defined as
EC Axi-mi B Axj-mj B D. If the variables are uncorrelated, their covariance is 0.


then it follows that
–
EEg(x, y)F = f(x, y) (3.4-4)

and
1 2
s2– (x, y) =
g s (3.4-5)
K h(x, y)
– –
where EEg(x, y)F is the expected value of g, and s2– (x, y) and s2– (x, y) are the
g h
– and h, all at coordinates (x, y). The standard deviation at any

1K
variances of g
point in the average image is
1
sg(x, y) =
– sh(x, y) . (3.4-6)

As K increases, Eqs. (3.4-5) and (3.4-6) indicate that the variability (noise) of
–
the pixel values at each location (x, y) decreases. Because EEg(x, y)F = f(x, y),
this means that g– (x, y) approaches f(x, y) as the number of noisy images used
in the averaging process increases. In practice, the images gi(x, y) must be reg-
istered (aligned) in order to avoid the introduction of blurring and other arti-
facts in the output image.

I An important application of image averaging is in the field of astronomy, EXAMPLE 3.8:
where imaging with very low light levels is routine, causing sensor noise fre- Noise reduction
quently to render single images virtually useless for analysis. Figure 3.30(a) by image
averaging.
shows an image of a galaxy pair called NGC 3314, taken by NASA’s Hubble
Space Telescope with a wide field planetary camera. NGC 3314 lies about 140
million light-years from Earth, in the direction of the southern-hemisphere con-
stellation Hydra. The bright stars forming a pinwheel shape near the center of
the front galaxy have formed recently from interstellar gas and dust. Fig-
ure 3.30(b) shows the same image, but corrupted by uncorrelated Gaussian
noise with zero mean and a standard deviation of 64 gray levels. This image is
useless for all practical purposes. Figures 3.30(c) through (f) show the results of
averaging 8, 16, 64, and 128 images, respectively. We see that the result obtained
with K=128 is reasonably close to the original in visual appearance.
We can get a better appreciation from Fig. 3.31 for how reduction in the vi-
sual appearance of noise takes place as a function of increasing K. This figure
shows the difference images between the original [Fig. 3.30(a)] and each of the
averaged images in Figs. 3.30(c) through (f). The histograms corresponding to
the difference images are also shown in the figure. As usual, the vertical scale
in the histograms represents number of pixels and is in the range C0, 2.6*104 D.
The horizontal scale represents gray level and is in the range [0, 255]. Notice in
the histograms that the mean and standard deviation of the difference images
decrease as K increases.This is as expected because, according to Eqs. (3.4-3) and
(3.4-4), the average image should approach the original as K increases. We can
also see the effect of a decreasing mean in the difference images on the left col-
umn of Fig. 3.31, which become darker as the K increases.


a b
c d
e f
FIGURE 3.30 (a) Image of Galaxy Pair NGC 3314. (b) Image corrupted by additive Gauss-
ian noise with zero mean and a standard deviation of 64 gray levels. (c)–(f) Results of av-
eraging K=8, 16, 64, and 128 noisy images. (Original image courtesy of NASA.)

Addition is the discrete formulation of continuous integration. In astronomical
observations, a process equivalent to the method just described is to use the inte-
grating capabilities of CCD or similar sensors for noise reduction by observing the
same scene over long periods of time. The net effect, however, is analogous to the
procedure just discussed. Cooling the sensor further reduces its noise level. I


a b
FIGURE 3.31
(a) From top to
bottom:
Difference images
between
Fig. 3.30(a) and
the four images in
Figs. 3.30(c)
through (f),
respectively.
(b) Corresponding
histograms.

As in the case of image subtraction, adding two or more 8-bit images requires
special care when it comes to displaying the result on an 8-bit display.The values in
the sum of K, 8-bit images can range from 0 to 255*K. Scaling back to 8 bits in
this case consists simply of dividing the result by K. Naturally, some accuracy will
be lost in the process, but this is unavoidable if the display has to be limited to 8 bits.


It is possible in some implementations of image averaging to have negative
values when noise is added to an image. In fact, in the example just given, this
was precisely the case because Gaussian random variables with zero mean and
nonzero variance have negative as well as positive values. The images in the ex-
ample were scaled using the second scaling method discussed at the end of the
previous section. That is, the minimum value in a given average image was ob-
tained and its negative was added to the image. Then all the pixels in the mod-
ified image were scaled to the range [0, 255] by multiplying each pixel in the
modified image by the quantity 255 Max, where Max was the maximum pixel
value in that image.

3.5 Basics of Spatial Filtering
As mentioned in Section 3.1, some neighborhood operations work with the val-
ues of the image pixels in the neighborhood and the corresponding values of a
subimage that has the same dimensions as the neighborhood. The subimage is
called a filter, mask, kernel, template, or window, with the first three terms being
the most prevalent terminology. The values in a filter subimage are referred to
as coefficients, rather than pixels.
The concept of filtering has its roots in the use of the Fourier transform for
signal processing in the so-called frequency domain. This topic is discussed in
more detail in Chapter 4. In the present chapter, we are interested in filtering
operations that are performed directly on the pixels of an image. We use the
term spatial filtering to differentiate this type of process from the more tradi-
tional frequency domain filtering.
The mechanics of spatial filtering are illustrated in Fig. 3.32. The process con-
sists simply of moving the filter mask from point to point in an image. At each
point (x, y), the response of the filter at that point is calculated using a prede-
fined relationship. For linear spatial filtering (see Section 2.6 regarding linear-
ity), the response is given by a sum of products of the filter coefficients and the
corresponding image pixels in the area spanned by the filter mask. For the 3*3
mask shown in Fig. 3.32, the result (or response), R, of linear filtering with the
filter mask at a point (x, y) in the image is
R = w(-1, -1)f(x - 1, y - 1) + w(-1, 0)f(x - 1, y) + p
+ w(0, 0)f(x, y) + p + w(1, 0)f(x + 1, y) + w(1, 1)f(x + 1, y + 1),
which we see is the sum of products of the mask coefficients with the corre-
sponding pixels directly under the mask. Note in particular that the coefficient
w(0, 0) coincides with image value f(x, y), indicating that the mask is centered
at (x, y) when the computation of the sum of products takes place. For a mask
of size m*n, we assume that m=2a+1 and n=2b+1, where a and b are
nonnegative integers. All this says is that our focus in the following discussion
will be on masks of odd sizes, with the smallest meaningful size being 3*3 (we
exclude from our discussion the trivial case of a 1*1 mask).

3.5 I Basics of Spatial Filtering 117

Image origin FIGURE 3.32 The
mechanics of
y spatial filtering.
The magnified
drawing shows a
3*3 mask and
the image section
directly under it;
Mask the image section
is shown
displaced out
from under the
mask for ease of
readability.

w(–1, –1) w(–1, 0) w(–1, 1)

Image f(x, y)

w(0, –1) w(0, 0) w(0, 1)

x

w(1, –1) w(1, 0) w(1, 1)

f(x-1, y-1) f(x-1, y) f(x-1, y+1) Mask coefficients, showing
coordinate arrangement

f(x, y-1) f(x, y) f(x, y+1)

f(x+1, y-1) f(x+1, y) f(x+1, y+1)

Pixels of image
section under mask

In general, linear filtering of an image f of size M*N with a filter mask of
size m*n is given by the expression:
a b
g(x, y) = a a w(s, t)f(x + s, y + t) (3.5-1)
s = -a t = -b

where, from the previous paragraph, a=(m-1) 2 and b=(n-1) 2. To
generate a complete filtered image this equation must be applied for x=0, 1,
2, p , M-1 and y=0, 1, 2, p , N-1. In this way, we are assured that the


mask processes all pixels in the image. It is easily verified when m=n=3 that
this expression reduces to the example given in the previous paragraph.
As discussed in Chapter 4, the process of linear filtering given in Eq. (3.5-1)
is similar to a frequency domain concept called convolution. For this reason,
linear spatial filtering often is referred to as “convolving a mask with an image.”
Similarly, filter masks are sometimes called convolution masks. The term con-
volution kernel also is in common use.
When interest lies on the response, R, of an m*n mask at any point (x, y),
and not on the mechanics of implementing mask convolution, it is common
practice to simplify the notation by using the following expression:
R = w1 z1 + w2 z2 + p + wmn zmn (3.5-2)
mn
= a wi zi
i=1

where the w’s are mask coefficients, the z’s are the values of the image gray
levels corresponding to those coefficients, and mn is the total number of coef-
ficients in the mask. For the 3*3 general mask shown in Fig. 3.33 the response
at any point (x, y) in the image is given by
R = w1 z1 + w2 z2 + p w9 z9 (3.5-3)
9
= a wi zi .
i=1

We make special mention of this simple formula because it is seen frequently
in the published literature on image processing.
Nonlinear spatial filters also operate on neighborhoods, and the mechanics
of sliding a mask past an image are the same as was just outlined. In general,
however, the filtering operation is based conditionally on the values of the pix-
els in the neighborhood under consideration, and they do not explicitly use co-
efficients in the sum-of-products manner described in Eqs. (3.5-1) and (3.5-2).
As shown in Section 3.6.2, for example, noise reduction can be achieved effec-
tively with a nonlinear filter whose basic function is to compute the median
gray-level value in the neighborhood in which the filter is located. Computation
of the median is a nonlinear operation, as is computation of the variance, which
we used in Section 3.3.4.

FIGURE 3.33
Another w1 w2 w3
representation of
a general 3*3
spatial filter mask.
w4 w5 w6

w7 w8 w9

3.6 I Smoothing Spatial Filters 119

An important consideration in implementing neighborhood operations for
spatial filtering is the issue of what happens when the center of the filter ap-
proaches the border of the image. Consider for simplicity a square mask of size
n*n. At least one edge of such a mask will coincide with the border of the
image when the center of the mask is at a distance of (n-1) 2 pixels away
from the border of the image. If the center of the mask moves any closer to the
border, one or more rows or columns of the mask will be located outside the
image plane. There are several ways to handle this situation. The simplest is to
limit the excursions of the center of the mask to be at a distance no less than
(n-1) 2 pixels from the border. The resulting filtered image will be smaller
than the original, but all the pixels in the filtered imaged will have been
processed with the full mask. If the result is required to be the same size as the
original, then the approach typically employed is to filter all pixels only with the
section of the mask that is fully contained in the image. With this approach,
there will be bands of pixels near the border that will have been processed with
a partial filter mask. Other approaches include “padding” the image by adding
rows and columns of 0’s (or other constant gray level), or padding by replicat-
ing rows or columns. The padding is then stripped off at the end of the process.
This keeps the size of the filtered image the same as the original, but the values
of the padding will have an effect near the edges that becomes more prevalent
as the size of the mask increases. The only way to obtain a perfectly filtered re-
sult is to accept a somewhat smaller filtered image by limiting the excursions of
the center of the filter mask to a distance no less than (n-1) 2 pixels from the
border of the original image.

3.6 Smoothing Spatial Filters
Smoothing filters are used for blurring and for noise reduction. Blurring is used
in preprocessing steps, such as removal of small details from an image prior to
(large) object extraction, and bridging of small gaps in lines or curves. Noise
reduction can be accomplished by blurring with a linear filter and also by non-
linear filtering.

3.6.1 Smoothing Linear Filters
The output (response) of a smoothing, linear spatial filter is simply the average
of the pixels contained in the neighborhood of the filter mask. These filters
sometimes are called averaging filters. For reasons explained in Chapter 4, they
also are referred to a lowpass filters.
The idea behind smoothing filters is straightforward. By replacing the value
of every pixel in an image by the average of the gray levels in the neighbor-
hood defined by the filter mask, this process results in an image with reduced
“sharp” transitions in gray levels. Because random noise typically consists of
sharp transitions in gray levels, the most obvious application of smoothing is
noise reduction. However, edges (which almost always are desirable features of
an image) also are characterized by sharp transitions in gray levels, so averag-
ing filters have the undesirable side effect that they blur edges. Another appli-
cation of this type of process includes the smoothing of false contours that result


a b
FIGURE 3.34 Two 1 1 1 1 2 1
3*3 smoothing
(averaging) filter
1 1
masks. The –* 1 1 1 ––* 2 4 2
9 16
constant multipli
er in front of each
mask is equal to 1 1 1 1 2 1
the sum of the
values of its
coefficients, as is
required to
compute an from using an insufficient number of gray levels, as discussed in Section 2.4.3.
average. A major use of averaging filters is in the reduction of “irrelevant” detail in an
image. By “irrelevant” we mean pixel regions that are small with respect to the
size of the filter mask. This latter application is illustrated later in this section.
Figure 3.34 shows two 3*3 smoothing filters. Use of the first filter yields the
standard average of the pixels under the mask. This can best be seen by substi-
tuting the coefficients of the mask into Eq. (3.5-3):
1 9
9 ia i
R = z,
=1

which is the average of the gray levels of the pixels in the 3*3 neighborhood
defined by the mask. Note that, instead of being 1 9, the coefficients of the fil-
ter are all 1’s. The idea here is that it is computationally more efficient to have
coefficients valued 1. At the end of the filtering process the entire image is di-
vided by 9. An m*n mask would have a normalizing constant equal to 1 mn.
A spatial averaging filter in which all coefficients are equal is sometimes called
a box filter.
The second mask shown in Fig. 3.34 is a little more interesting. This mask
yields a so-called weighted average, terminology used to indicate that pixels are
multiplied by different coefficients, thus giving more importance (weight) to
some pixels at the expense of others. In the mask shown in Fig. 3.34(b) the pixel

nal neighbors (by a factor of 12) and, thus, are weighed less than these imme-
at the center of the mask is multiplied by a higher value than any other, thus giv-
ing this pixel more importance in the calculation of the average. The other pix-
els are inversely weighted as a function of their distance from the center of the
mask. The diagonal terms are further away from the center than the orthogo-

diate neighbors of the center pixel.The basic strategy behind weighing the center
point the highest and then reducing the value of the coefficients as a function
of increasing distance from the origin is simply an attempt to reduce blurring
in the smoothing process.We could have picked other weights to accomplish the
same general objective. However, the sum of all the coefficients in the mask of
Fig. 3.34(b) is equal to 16, an attractive feature for computer implementation be-
cause it has an integer power of 2. In practice, it is difficult in general to see dif-
ferences between images smoothed by using either of the masks in Fig. 3.34, or
similar arrangements, because the area these masks span at any one location in
an image is so small.


With reference to Eq. (3.5-1), the general implementation for filtering an
M*N image with a weighted averaging filter of size m*n (m and n odd) is
given by the expression
a b

a a w(s, t)f(x + s, y + t)
s = -a t = -b
g(x, y) = a b
(3.6-1)
a a w(s, t)
s = -a t = -b

The parameters in this equation are as defined in Eq. (3.5-1). As before, it is
understood that the complete filtered image is obtained by applying Eq. (3.6-1)
for x=0, 1, 2, p , M-1 and y=0, 1, 2, p , N-1. The denominator in
Eq. (3.6-1) is simply the sum of the mask coefficients and, therefore, it is a con-
stant that needs to be computed only once. Typically, this scale factor is applied
to all the pixels of the output image after the filtering process is completed.

I The effects of smoothing as a function of filter size are illustrated in Fig. 3.35, EXAMPLE 3.9:
which shows an original image and the corresponding smoothed results obtained Image smoothing
using square averaging filters of sizes n=3, 5, 9, 15, and 35 pixels, respectively. with masks of
various sizes.
The principal features of these results are as follows: For n=3, we note a gen-
eral slight blurring throughout the entire image but, as expected, details that
are of approximately the same size as the filter mask are affected considerably
more. For example, the 3*3 and 5*5 squares, the small letter “a,” and the fine
grain noise show significant blurring when compared to the rest of the image.A
positive result is that the noise is less pronounced. Note that the jagged borders
of the characters and gray circles have been pleasingly smoothed.
The result for n=5 is somewhat similar, with a slight further increase in
blurring. For n=9 we see considerably more blurring, and the 20% black cir-
cle is not nearly as distinct from the background as in the previous three images,
illustrating the blending effect that blurring has on objects whose gray level
content is close to that of its neighboring pixels. Note the significant further
smoothing of the noisy rectangles. The results for n=15 and 35 are extreme
with respect to the sizes of the objects in the image. This type of excessive blur-
ring is generally used to eliminate small objects from an image. For instance, the
three small squares, two of the circles, and most of the noisy rectangle areas
have been blended into the background of the image in Fig. 3.35(f). Note also
in this figure the pronounced black border. This is a result of padding the bor-
der of the original image with 0’s (black) and then trimming off the padded
area. Some of the black was blended into all filtered images, but became truly
objectionable for the images smoothed with the larger filters. I

As mentioned earlier, an important application of spatial averaging is to blur
an image for the purpose getting a gross representation of objects of interest,
such that the intensity of smaller objects blends with the background and larg-
er objects become “bloblike” and easy to detect. The size of the mask estab-
lishes the relative size of the objects that will be blended with the background.
As an illustration, consider Fig. 3.36(a), which is an image from the Hubble tele-
scope in orbit around the Earth. Figure 3.36(b) shows the result of applying a


a b FIGURE 3.35 (a) Original image, of size 500*500 pixels. (b)–(f) Results of smoothing
c d with square averaging filter masks of sizes n=3, 5, 9, 15, and 35, respectively. The black
e f squares at the top are of sizes 3, 5, 9, 15, 25, 35, 45, and 55 pixels, respectively; their bor-
ders are 25 pixels apart. The letters at the bottom range in size from 10 to 24 points, in
increments of 2 points; the large letter at the top is 60 points. The vertical bars are 5 pix-
els wide and 100 pixels high; their separation is 20 pixels. The diameter of the circles is
25 pixels, and their borders are 15 pixels apart; their gray levels range from 0% to 100%
black in increments of 20%. The background of the image is 10% black. The noisy rec-
tangles are of size 50*120 pixels.


a b c
FIGURE 3.36 (a) Image from the Hubble Space Telescope. (b) Image processed by a 15*15 averaging mask.
(c) Result of thresholding (b). (Original image courtesy of NASA.)

15*15 averaging mask to this image. We see that a number of objects have ei-
ther blended with the background or their intensity has diminished considerably.
It is typical to follow an operation like this with thresholding to eliminate ob-
jects based on their intensity. The result of using the thresholding function of
Fig. 3.2(b) with a threshold value equal to 25% of the highest intensity in the
blurred image is shown in Fig. 3.36(c). Comparing this result with the original
image, we see that it is a reasonable representation of what we would consider
to be the largest, brightest objects in that image.

3.6.2 Order-Statistics Filters
Order-statistics filters are nonlinear spatial filters whose response is based on
ordering (ranking) the pixels contained in the image area encompassed by
the filter, and then replacing the value of the center pixel with the value de-
termined by the ranking result. The best-known example in this category is the
median filter, which, as its name implies, replaces the value of a pixel by the
median of the gray levels in the neighborhood of that pixel (the original value
of the pixel is included in the computation of the median). Median filters are
quite popular because, for certain types of random noise, they provide excel-
lent noise-reduction capabilities, with considerably less blurring than linear
smoothing filters of similar size. Median filters are particularly effective in
the presence of impulse noise, also called salt-and-pepper noise because of its
appearance as white and black dots superimposed on an image.
The median, j, of a set of values is such that half the values in the set are less
than or equal to j, and half are greater than or equal to j. In order to perform
median filtering at a point in an image, we first sort the values of the pixel in
question and its neighbors, determine their median, and assign this value to that
pixel. For example, in a 3*3 neighborhood the median is the 5th largest value,
in a 5*5 neighborhood the 13th largest value, and so on. When several values


in a neighborhood are the same, all equal values are grouped. For example, sup-
pose that a 3*3 neighborhood has values (10, 20, 20, 20, 15, 20, 20, 25, 100).
These values are sorted as (10, 15, 20, 20, 20, 20, 20, 25, 100), which results in a
median of 20. Thus, the principal function of median filters is to force points
with distinct gray levels to be more like their neighbors. In fact, isolated clusters
of pixels that are light or dark with respect to their neighbors, and whose area
is less than n2 2 (one-half the filter area), are eliminated by an n*n median
filter. In this case “eliminated” means forced to the median intensity of the
neighbors. Larger clusters are affected considerably less.
Although the median filter is by far the most useful order-statistics filter in
image processing, it is by no means the only one. The median represents the
50th percentile of a ranked set of numbers, but the reader will recall from basic
statistics that ranking lends itself to many other possibilities. For example, using
the 100th percentile results in the so-called max filter, which is useful in finding
the brightest points in an image. The response of a 3*3 max filter is given by
R=max Ezk | k=1, 2, p , 9F. The 0th percentile filter is the min filter, used for
the opposite purpose. Median, max, and mean filters are considered in more
detail in Chapter 5.

I Figure 3.37(a) shows an X-ray image of a circuit board heavily corrupted by
EXAMPLE 3.10: salt-and-pepper noise.To illustrate the point about the superiority of median fil-
Use of median tering over average filtering in situations such as this, we show in Fig. 3.37(b) the
filtering for noise result of processing the noisy image with a 3*3 neighborhood averaging mask,
reduction.
and in Fig. 3.37(c) the result of using a 3*3 median filter.The image processed
with the averaging filter has less visible noise, but the price paid is significant
blurring. The superiority in all respects of median over average filtering in this
case is quite evident. In general, median filtering is much better suited than av-
eraging for the removal of additive salt-and-pepper noise. I

a b c
FIGURE 3.37 (a) X-ray image of circuit board corrupted by salt-and-pepper noise. (b) Noise reduction with a
3*3 averaging mask. (c) Noise reduction with a 3*3 median filter. (Original image courtesy of Mr. Joseph
E. Pascente, Lixi, Inc.)

3.7 I Sharpening Spatial Filters 125

3.7 Sharpening Spatial Filters
The principal objective of sharpening is to highlight fine detail in an image or
to enhance detail that has been blurred, either in error or as a natural effect of
a particular method of image acquisition. Uses of image sharpening vary and in-
clude applications ranging from electronic printing and medical imaging to in-
dustrial inspection and autonomous guidance in military systems.
In the last section, we saw that image blurring could be accomplished in the
spatial domain by pixel averaging in a neighborhood. Since averaging is analo-
gous to integration, it is logical to conclude that sharpening could be accom-
plished by spatial differentiation. This, in fact, is the case, and the discussion in
this section deals with various ways of defining and implementing operators for
sharpening by digital differentiation. Fundamentally, the strength of the re-
sponse of a derivative operator is proportional to the degree of discontinuity of
the image at the point at which the operator is applied. Thus, image differenti-
ation enhances edges and other discontinuities (such as noise) and deempha-
sizes areas with slowly varying gray-level values.

3.7.1 Foundation
In the two sections that follow, we consider in some detail sharpening filters that
are based on first- and second-order derivatives, respectively. Before proceeding
with that discussion, however, we stop to look at some of the fundamental prop-
erties of these derivatives in a digital context. To simplify the explanation, we
focus attention on one-dimensional derivatives. In particular, we are interested
in the behavior of these derivatives in areas of constant gray level (flat segments),
at the onset and end of discontinuities (step and ramp discontinuities), and along
gray-level ramps.These types of discontinuities can be used to model noise points,
lines, and edges in an image. The behavior of derivatives during transitions into
and out of these image features also is of interest.
The derivatives of a digital function are defined in terms of differences.There
are various ways to define these differences. However, we require that any de-
finition we use for a first derivative (1) must be zero in flat segments (areas of
constant gray-level values); (2) must be nonzero at the onset of a gray-level
step or ramp; and (3) must be nonzero along ramps. Similarly, any definition of
a second derivative (1) must be zero in flat areas; (2) must be nonzero at the
onset and end of a gray-level step or ramp; and (3) must be zero along ramps
of constant slope. Since we are dealing with digital quantities whose values are
finite, the maximum possible gray-level change also is finite, and the shortest dis-
tance over which that change can occur is between adjacent pixels.
A basic definition of the first-order derivative of a one-dimensional func-
tion f(x) is the difference
0f
= f(x + 1) - f(x).
0x
We used a partial derivative here in order to keep the notation the same as
when we consider an image function of two variables, f(x, y), at which time we


will be dealing with partial derivatives along the two spatial axes. Use of a par-
tial derivative in the present discussion does not affect in any way the nature of
what we are trying to accomplish.
Similarly, we define a second-order derivative as the difference
0 2f
= f(x + 1) + f(x - 1) - 2f(x).
0x2
It is easily verified that these two definitions satisfy the conditions stated pre-
viously regarding derivatives of the first and second order. To see this, and also
to highlight the fundamental similarities and differences between first- and sec-
ond-order derivatives in the context of image processing, consider the example
shown in Fig. 3.38.
Figure 3.38(a) shows a simple image that contains various solid objects, a
line, and a single noise point. Figure 3.38(b) shows a horizontal gray-level pro-
file (scan line) of the image along the center and including the noise point. This
profile is the one-dimensional function we will use for illustrations regarding this
figure. Figure 3.38(c) shows a simplification of the profile, with just enough num-

a b
c
FIGURE 3.38
(a) A simple
image. (b) 1-D
horizontal gray-
level profile along
the center of the
image and
including the
isolated noise
point.
(c) Simplified
profile (the points
are joined by
dashed lines to
simplify
interpretation).
Gray level profile

7
Isolated point
6
5
4 Ramp Step
Thin line
3 Flat segment
2
1
0
Image strip 5 5 4 3 2 1 0 0 0 6 0 0 0 0 1 3 1 0 0 0 0 7 7 7 7

First Derivative –1 –1 –1 –1 –1 0 0 6 –6 0 0 0 1 2 –2 –1 0 0 0 7 0 0 0

Second Derivative –1 0 0 0 0 1 0 6 –12 6 0 0 1 1 –4 1 1 0 0 7 –7 0 0


bers to make it possible for us to analyze how the first- and second-order de-
rivatives behave as they encounter a noise point, a line, and then the edge of an
object. In our simplified diagram the transition in the ramp spans four pixels, the
noise point is a single pixel, the line is three pixels thick, and the transition into
the gray-level step takes place between adjacent pixels.The number of gray lev-
els was simplified to only eight levels.
Let us consider the properties of the first and second derivatives as we tra-
verse the profile from left to right. First, we note that the first-order derivative
is nonzero along the entire ramp, while the second-order derivative is nonzero
only at the onset and end of the ramp. Because edges in an image resemble this
type of transition, we conclude that first-order derivatives produce “thick” edges
and second-order derivatives, much finer ones. Next we encounter the isolated
noise point. Here, the response at and around the point is much stronger for
the second- than for the first-order derivative. Of course, this is not unexpect-
ed. A second-order derivative is much more aggressive than a first-order de-
rivative in enhancing sharp changes. Thus, we can expect a second-order
derivative to enhance fine detail (including noise) much more than a first-order
derivative. The thin line is a fine detail, and we see essentially the same differ-
ence between the two derivatives. If the maximum gray level of the line had
been the same as the isolated point, the response of the second derivative would
have been stronger for the latter. Finally, in this case, the response of the two de-
rivatives is the same at the gray-level step (in most cases when the transition into
a step is not from zero, the second derivative will be weaker). We also note that
the second derivative has a transition from positive back to negative. In an
image, this shows as a thin double line.This “double-edge” effect is an issue that
will be important in Chapter 10, where we use derivatives for edge detection.
It is of interest also to note that if the gray level of the thin line had been the
same as the step, the response of the second derivative would have been stronger
for the line than for the step.
In summary, comparing the response between first- and second-order de-
rivatives, we arrive at the following conclusions. (1) First-order derivatives gen-
erally produce thicker edges in an image. (2) Second-order derivatives have a
stronger response to fine detail, such as thin lines and isolated points. (3) First-
order derivatives generally have a stronger response to a gray-level step. (4) Sec-
ond-order derivatives produce a double response at step changes in gray level.
We also note of second-order derivatives that, for similar changes in gray-level
values in an image, their response is stronger to a line than to a step, and to a
point than to a line.
In most applications, the second derivative is better suited than the first de-
rivative for image enhancement because of the ability of the former to enhance
fine detail. For this, and for reasons of simpler implementation and extensions,
we will focus attention initially on uses of the second derivative for enhance-
ment. First-order derivatives are discussed in Section 3.7.3. Although the prin-
ciple of use of first derivatives in image processing is for edge extraction, they
do have important uses in image enhancement. In fact, we show in Section 3.8
that they can be used in conjunction with the second derivative to obtain some
impressive enhancement results.


3.7.2 Use of Second Derivatives for Enhancement–The Laplacian
In this section we consider in some detail the use of two-dimensional, second-
order derivatives for image enhancement.The approach basically consists of defin-
ing a discrete formulation of the second-order derivative and then constructing
a filter mask based on that formulation. We are interested in isotropic filters,
whose response is independent of the direction of the discontinuities in the image
to which the filter is applied. In other words, isotropic filters are rotation invari-
ant, in the sense that rotating the image and then applying the filter gives the
same result as applying the filter to the image first and then rotating the result.

Development of the method
It can be shown (Rosenfeld and Kak [1982]) that the simplest isotropic deriv-
ative operator is the Laplacian, which, for a function (image) f(x, y) of two
variables, is defined as
0 2f 0 2f
§ 2f = + . (3.7-1)
0x 2 0y2
Because derivatives of any order are linear operations, the Laplacian is a lin-
ear operator.
In order to be useful for digital image processing, this equation needs to be
expressed in discrete form. There are several ways to define a digital Laplacian
using neighborhoods. Whatever the definition, however, it has to satisfy the
properties of a second derivative outlined in Section 3.7.1. The definition of the
digital second derivative given in that section is one of the most used.Taking into
account that we now have two variables, we use the following notation for the
partial second-order derivative in the x-direction:
0 2f
= f(x + 1, y) + f(x - 1, y) - 2f(x, y) (3.7-2)
0 2x2
and, similarly in the y-direction, as
0 2f
= f(x, y + 1) + f(x, y - 1) - 2f(x, y) (3.7-3)
0 2y2
The digital implementation of the two-dimensional Laplacian in Eq. (3.7-1) is
obtained by summing these two components:
§ 2f = Cf(x + 1, y) + f(x - 1, y) + f(x, y + 1) + f(x, y - 1)D
- 4f(x, y). (3.7-4)
This equation can be implemented using the mask shown in Fig. 3.39(a), which
gives an isotropic result for rotations in increments of 90°. The mechanics of
implementation are given in Eq. (3.5-1) and are illustrated in Section 3.6.1 for
the linear smoothing filters. We simply are using different coefficients here.
The diagonal directions can be incorporated in the definition of the digital
Laplacian by adding two more terms to Eq. (3.7-4), one for each of the two
diagonal directions. The form of each new term is the same as either Eq. (3.7-2)


a b
0 1 0 1 1 1 c d
FIGURE 3.39
(a) Filter mask
used to
1 –4 1 1 –8 1
implement the
digital Laplacian,
as defined in
0 1 0 1 1 1 Eq. (3.7-4).
(b) Mask used to
implement an
extension of this
0 –1 0 –1 –1 –1 equation that
includes the
diagonal
neighbors. (c) and
–1 4 –1 –1 8 –1 (d) Two other
implementations
of the Laplacian.
0 –1 0 –1 –1 –1

or (3.7-3), but the coordinates are along the diagonals. Since each diagonal term
also contains a –2f(x, y) term, the total subtracted from the difference terms
now would be –8f(x, y). The mask used to implement this new definition is
shown in Fig. 3.39(b). This mask yields isotropic results for increments of 45°.
The other two masks shown in Fig. 3.39 also are used frequently in practice.
They are based on a definition of the Laplacian that is the negative of the one
we used here. As such, they yield equivalent results, but the difference in sign
must be kept in mind when combining (by addition or subtraction) a Lapla-
cian-filtered image with another image.
Because the Laplacian is a derivative operator, its use highlights gray-level
discontinuities in an image and deemphasizes regions with slowly varying gray
levels. This will tend to produce images that have grayish edge lines and other
discontinuities, all superimposed on a dark, featureless background. Background
features can be “recovered” while still preserving the sharpening effect of the
Laplacian operation simply by adding the original and Laplacian images. As
noted in the previous paragraph, it is important to keep in mind which defini-
tion of the Laplacian is used. If the definition used has a negative center coef-
ficient, then we subtract, rather than add, the Laplacian image to obtain a
sharpened result. Thus, the basic way in which we use the Laplacian for image
enhancement is as follows:

g(x, y) = d
f(x, y) - § 2f(x, y) if the center coefficient of the
Laplacian mask is negative
(3.7-5)
f(x, y) + § 2f(x, y) if the center coefficient of the
Laplacian mask is positive.

Use of this equation is illustrated next.


EXAMPLE 3.11: I Figure 3.40(a) shows an image of the North Pole of the moon. Figure 3.40(b)
Imaging shows the result of filtering this image with the Laplacian mask in Fig. 3.39(b).
sharpening with Since the Laplacian image contains both positive and negative values, a typical
the Laplacian.
way to scale it is to use the approach discussed at the end of Section 3.4.1. Some-
times one encounters the absolute value being used for this purpose, but this re-
ally is not correct because it produces double lines of nearly equal magnitude,
which can be confusing.
The image shown in Fig. 3.40(c) was scaled in the manner just described for
display purposes. Note that the dominant features of the image are edges and
sharp gray-level discontinuities of various gray-level values. The background,
previously near black, is now gray due to the scaling. This grayish appearance
is typical of Laplacian images that have been scaled properly. Finally, Fig. 3.40(d)

a b
c d
FIGURE 3.40
(a) Image of the
North Pole of the
moon.
(b) Laplacian-
filtered image.
(c) Laplacian
image scaled for
display purposes.
(d) Image
enhanced by
using Eq. (3.7-5).
(Original image
courtesy of
NASA.)


shows the result obtained using Eq. (3.7-5). The detail in this image is unmis-
takably clearer and sharper than in the original image. Adding the image to
the Laplacian restored the overall gray level variations in the image, with the
Laplacian increasing the contrast at the locations of gray-level discontinuities.
The net result is an image in which small details were enhanced and the back-
ground tonality was perfectly preserved. Results like these have made
Laplacian-based enhancement a fundamental tool used frequently for sharp-
ening digital images. I

Simplifications
In the previous example, we implemented Eq. (3.7-5) by first computing the
Laplacian-filtered image and then subtracting it from the original image. This
was done for instructional purposes to illustrate each step in the procedure. In
practice, Eq. (3.7-5) is usually implemented with one pass of a single mask. The
coefficients of the single mask are easily obtained by substituting Eq. (3.7-4)
for § 2f(x, y) in the first line of Eq. (3.7-5):
g(x, y) = f(x, y) - Cf(x + 1, y) + f(x - 1, y)
+ f(x, y + 1) + f(x, y - 1)D + 4f(x, y)
= 5f(x, y) - Cf(x + 1, y) + f(x - 1, y)
(3.7-6)

+ f(x, y + 1) + f(x, y - 1)D.
This equation can be implemented using the mask shown in Fig. 3.41(a). The
mask shown in Fig. 3.41(b) would be used if the diagonal neighbors also were
included in the calculation of the Laplacian. Identical masks would have re-
sulted if we had substituted the negative of Eq. (3.7-4) into the second line of
Eq. (3.7-5).

I The results obtainable with the mask containing the diagonal terms usually EXAMPLE 3.12:
are a little sharper than those obtained with the more basic mask of Fig. 3.41(a). Image
This property is illustrated by the Laplacian-filtered images shown in enhancement
using a composite
Figs. 3.41(d) and (e), which were obtained by using the masks in Figs. 3.41(a) and Laplacian mask.
(b), respectively. By comparing the filtered images with the original image shown
in Fig. 3.41(c), we note that both masks produced effective enhancement, but the
result using the mask in Fig. 3.41(b) is visibly sharper. Figure 3.41(c) is a scan-
ning electron microscope (SEM) image of a tungsten filament following ther-
mal failure; the magnification is approximately 250 *.) I

Because the Laplacian is a linear operator, we could have arrived at the same
composite masks in Figs. 3.41(a) and (b) by noting that Eq. (3.7-5) is the dif-
ference between (sum of) two linear processes. That is, f(x, y) be may viewed
as itself processed with a mask that has a unit coefficient in the center and zeros
elsewhere. The second term in the equation is the same image processed with
one of the Laplacian masks of Fig. 3.39. Due to linearity, the result obtained in
Eq. (3.7-5) with the unit-center mask and one of those Laplacian masks would
be the same as the result obtained with a single mask formed by subtracting
(adding) the Laplacian mask from (to) the unity-center mask.


0 –1 0

–1 5 –1
–1 –1 –1
0 –1 0
–1 9 –1

–1 –1 –1

a b c FIGURE 3.41 (a) Composite Laplacian mask. (b) A second composite mask. (c) Scanning
d e electron microscope image. (d) and (e) Results of filtering with the masks in (a) and (b),
respectively. Note how much sharper (e) is than (d). (Original image courtesy of Mr. Michael
Shaffer, Department of Geological Sciences, University of Oregon, Eugene.)

Unsharp masking and high-boost filtering
A process used for many years in the publishing industry to sharpen images
consists of subtracting a blurred version of an image from the image itself. This
process, called unsharp masking, is expressed as
–
fs(x, y) = f(x, y) - f (x, y) (3.7-7)
where fs(x, y) denotes the sharpened image obtained by unsharp masking, and
–
f (x, y) is a blurred version of f(x, y).The origin of unsharp masking is in dark-
room photography, where it consists of clamping together a blurred negative to
a corresponding positive film and then developing this combination to produce
a sharper image.
A slight further generalization of unsharp masking is called high-boost
filtering. A high-boost filtered image, fhb , is defined at any point (x, y) as
–
fhb(x, y) = Af(x, y) - f (x, y) (3.7-8)


a b
0 –1 0 –1 –1 –1 FIGURE 3.42 The
high-boost filtering
technique can be
implemented with
–1 A+4 –1 –1 A+8 –1 either one of these
masks, with A 1.

0 –1 0 –1 –1 –1

–
where A 1 and, as before, f is a blurred version of f. This equation may
be written as
–
fhb(x, y) = (A - 1)f(x, y) + f(x, y) - f (x, y). (3.7-9)

By using Eq. (3.7-7), we obtain

fhb(x, y) = (A - 1)f(x, y) + fs(x, y) (3.7-10)

as the expression for computing a high-boost-filtered image.
Equation (3.7-10) is applicable in general and does not state explicitly how
the sharp image is obtained. If we elect to use the Laplacian, then we know that
fs(x, y) can be obtained using Eq. (3.7-5). In this case, Eq. (3.7-10) becomes

fhb = d
Af(x, y) - § 2f(x, y) if the center coefficient of the
Laplacian mask is negative
(3.7-11)
Af(x, y) + § 2f(x, y) if the center coefficient of the
Laplacian mask is positive.

High-boost filtering can be implemented with one pass using either of the two
masks shown in Fig. 3.42. Note that, when A=1, high-boost filtering becomes
“standard” Laplacian sharpening. As the value of A increases past 1, the con-
tribution of the sharpening process becomes less and less important. Eventual-
ly, if A is large enough, the high-boost image will be approximately equal to the
original image multiplied by a constant.

I One of the principal applications of boost filtering is when the input image is EXAMPLE 3.13:
darker than desired. By varying the boost coefficient, it generally is possible to Image
obtain an overall increase in average gray level of the image, thus helping to bright- enhancement with
a high-boost filter.
en the final result. Figure 3.43 shows such an application. Part (a) of this figure is
a darker version of the image in Fig. 3.41(c). Figure 3.43(b) shows the Laplacian
computed using the mask in Fig. 3.42(b), with A=0. Figure 3.43(c) was obtained
using the mask in Fig. 3.42(b) with A=1. As expected, the image has been
sharpened, but it is still as dark as the original. Finally, Fig. 3.43(d) shows the re-
sult of using A=1.7.This is a much more acceptable result, in which the average
gray level has increased, thus making the image lighter and more natural. I


a b
c d
FIGURE 3.43
(a) Same as
Fig. 3.41(c), but
darker.
(a) Laplacian of
(a) computed with
the mask in
Fig. 3.42(b) using
A=0.
(c) Laplacian
enhanced image
using the mask in
Fig. 3.42(b) with
A=1. (d) Same
as (c), but using
A=1.7.

3.7.3 Use of First Derivatives for Enhancement—The Gradient
First derivatives in image processing are implemented using the magnitude of
the gradient. For a function f(x, y), the gradient of f at coordinates (x, y) is de-
fined as the two-dimensional column vector

§f = B x R = D T .
0f
G 0x
(3.7-12)
Gy 0f
0y
The magnitude of this vector is given by
§f = mag (§f)
= CG 2 + G 2 D
1 2
(3.7-13)

= Ba b + a b R .
x y

0f 2 0f 2 1 2
0x 0y
The components of the gradient vector itself are linear operators, but the mag-
nitude of this vector obviously is not because of the squaring and square root


operations. On the other hand, the partial derivatives in Eq. (3.7-12) are not ro-
tation invariant (isotropic), but the magnitude of the gradient vector is. Al-
though it is not strictly correct, the magnitude of the gradient vector often is
referred to as the gradient. In keeping with tradition, we will use this term in the
following discussions, explicitly referring to the vector or its magnitude only in
cases where confusion is likely.
The computational burden of implementing Eq. (3.7-13) over an entire image
is not trivial, and it is common practice to approximate the magnitude of the gra-
dient by using absolute values instead of squares and square roots:
§f L @Gx @ + @Gy @. (3.7-14)
This equation is simpler to compute and it still preserves relative changes in
gray levels, but the isotropic feature property is lost in general. However, as in
the case of the Laplacian, the isotropic properties of the digital gradient de-
fined in the following paragraph are preserved only for a limited number of ro-
tational increments that depend on the masks used to approximate the
derivatives. As it turns out, the most popular masks used to approximate the
gradient give the same result only for vertical and horizontal edges and thus
the isotropic properties of the gradient are preserved only for multiples of 90°.
These results are independent of whether Eq. (3.7-13) or (3.7-14) is used, so
nothing of significance is lost in using the simpler of the two equations.
As in the case of the Laplacian, we now define digital approximations to the
preceding equations, and from there formulate the appropriate filter masks. In
order to simplify the discussion that follows, we will use the notation in
Fig. 3.44(a) to denote image points in a 3*3 region. For example, the center
point, z5 , denotes f(x, y), z1 denotes f(x-1, y-1), and so on. As indicated
in Section 3.7.1, the simplest approximations to a first-order derivative that sat-
isfy the conditions stated in that section are Gx=Az8-z5 B and Gy=Az6-z5 B.
Two other definitions proposed by Roberts [1965] in the early development of
digital image processing use cross differences:
Gx = Az9 - z5 B and Gy = Az8 - z6 B. (3.7-15)
If we elect to use Eq. (3.7-13), then we compute the gradient as
§f = C Az9 - z5 B + Az8 - z6 B D
2 2 1 2
(3.7-16)
If we use absolute values, then substituting the quantities in Eq. (3.7-15) into
Eq. (3.7-14) gives us the following approximation to the gradient:
§f L @z9 - z5 @ + @z8 - z6 @. (3.7-17)
This equation can be implemented with the two masks shown in Figs. 3.44(b) and
(c). These masks are referred to as the Roberts cross-gradient operators.
Masks of even size are awkward to implement. The smallest filter mask in
which we are interested is of size 3*3. An approximation using absolute val-
ues, still at point z5 , but using a 3*3 mask, is
§f L @ Az7 + 2z8 + z9 B - Az1 + 2z2 + z3 B @
+ @ Az3 + 2z6 + z9 B - Az1 + 2z4 + z7 B @. (3.7-18)


a
b c z1 z2 z3
d e
FIGURE 3.44
A 3*3 region of
z4 z5 z6
an image (the z’s
are gray-level
values) and masks
used to compute z7 z8 z9
the gradient at
point labeled z5 .
All masks
coefficients sum –1 0 0 –1
to zero, as
expected of a
derivative
0 1 1 0
operator.

–1 –2 –1 –1 0 1

0 0 0 –2 0 2

1 2 1 –1 0 1

The difference between the third and first rows of the 3*3 image region ap-
proximates the derivative in the x-direction, and the difference between the
third and first columns approximates the derivative in the y-direction.The masks
shown in Figs. 3.44(d) and (e), called the Sobel operators, can be used to im-
plement Eq. (3.7-18) via the mechanics given in Eq. (3.5-1). The idea behind
using a weight value of 2 is to achieve some smoothing by giving more impor-
tance to the center point (we discuss this in more detail in Chapter 10). Note that
the coefficients in all the masks shown in Fig. 3.44 sum to 0, indicating that they
would give a response of 0 in an area of constant gray level, as expected of a de-
rivative operator.

EXAMPLE 3.14: I The gradient is used frequently in industrial inspection, either to aid hu-
Use of the mans in the detection of defects or, what is more common, as a preprocessing
gradient for edge step in automated inspection. We will have more to say about this in Chapters
enhancement.
10 and 11. However, it will be instructive at this point to consider a simple ex-
ample to show how the gradient can be used to enhance defects and eliminate
slowly changing background features. In this particular example, the enhance-
ment is used as a preprocessing step for automated inspection, rather than for
human analysis.
Figure 3.45(a) shows an optical image of a contact lens, illuminated by a light-
ing arrangement designed to highlight imperfections, such as the two edge

3.8 I Combining Spatial Enhancement Methods 137

a b
FIGURE 3.45
Optical image of
contact lens (note
defects on the
boundary at 4 and
5 o’clock).
(b) Sobel
gradient.
(Original image
courtesy of
Mr. Pete Sites,
Perceptics
Corporation.)

defects in the lens boundary seen at 4 and 5 o’clock. Figure 3.45(b) shows the
gradient obtained using Eq. (3.7-14) with the two Sobel masks in Figs. 3.44(d)
and (e). The edge defects also are quite visible in this image, but with the added
advantage that constant or slowly varying shades of gray have been eliminat-
ed, thus simplifying considerably the computational task required for auto-
mated inspection. Note also that the gradient process highlighted small specs
that are not readily visible in the gray-scale image (specs like these can be for-
eign matter, air pockets in a supporting solution, or miniscule imperfections in
the lens). The ability to enhance small discontinuities in an otherwise flat gray
field is another important feature of the gradient. I

3.8 Combining Spatial Enhancement Methods
With a few exceptions, like combining blurring with thresholding in Section 3.6.1,
we have focused attention thus far on individual enhancement approaches. Fre-
quently, a given enhancement task will require application of several comple-
mentary enhancement techniques in order to achieve an acceptable result. In
this section we illustrate by means of an example how to combine several of the
approaches developed in this chapter to address a difficult enhancement task.
The image shown in Fig. 3.46(a) is a nuclear whole body bone scan, used to
detect diseases such as bone infection and tumors. Our objective is to enhance
this image by sharpening it and by bringing out more of the skeletal detail. The
narrow dynamic range of the gray levels and high noise content make this image
difficult to enhance. The strategy we will follow is to utilize the Laplacian to
highlight fine detail, and the gradient to enhance prominent edges. For reasons
that will be explained shortly, a smoothed version of the gradient image will be
used to mask the Laplacian image (see Section 3.4 regarding masking). Final-
ly, we will attempt to increase the dynamic range of the gray levels by using a
gray-level transformation.
Figure 3.46 (b) shows the Laplacian of the original image, obtained using
the mask in Fig. 3.39(d). This image was scaled (for display only) using the
same technique as in Fig. 3.40. We can obtain a sharpened image at this point


a b
c d
FIGURE 3.46
(a) Image of
whole body bone
scan.
(b) Laplacian of
(a). (c) Sharpened
image obtained
by adding (a) and
(b). (d) Sobel of
(a).

3.8 I Combining Spatial Enhancement Methods 139

e f
g h
FIGURE 3.46
(Continued)
(e) Sobel image
smoothed with a
5*5 averaging
filter. (f) Mask
image formed by
the product of (c)
and (e).
(g) Sharpened
image obtained
by the sum of (a)
and (f). (h) Final
result obtained by
applying a
power-law
transformation to
(g). Compare (g)
and (h) with (a).
(Original image
courtesy of G.E.
Medical Systems.)


simply by adding Figs. 3.46(a) and (b), which are an implementation of the
second line in Eq. (3.7-5) (we used a mask with a positive center coefficient).
Just by looking at the noise level in (b), we would expect a rather noisy sharp-
ened image if we added Figs. 3.46(a) and (b), a fact that is confirmed by the
result shown in Fig. 3.46(c). One way that comes immediately to mind to re-
duce the noise is to use a median filter. However, median filtering is a non-
linear process capable of removing image features. This is unacceptable in
medical image processing.
An alternate approach is to use a mask formed from a smoothed version of
the gradient of the original image. The motivation behind this is straightfor-
ward and is based on the properties of first- and second-order derivatives ex-
plained in Section 3.7.1.The Laplacian, being a second-order derivative operator,
has the definite advantage that it is superior in enhancing fine detail. Howev-
er, this causes it to produce noisier results than the gradient. This noise is most
objectionable in smooth areas, where it tends to be more visible. The gradient
has a stronger response in areas of significant gray-level transitions (gray-level
ramps and steps) than does the Laplacian.The response of the gradient to noise
and fine detail is lower than the Laplacian’s and can be lowered further by
smoothing the gradient with an averaging filter. The idea, then, is to smooth the
gradient and multiply it by the Laplacian image. In this context, we may view
the smoothed gradient as a mask image. The product will preserve details in
the strong areas while reducing noise in the relatively flat areas.This process can
be viewed roughly as combining the best features of the Laplacian and the gra-
dient. The result is added to the original to obtain a final sharpened image, and
could even be used in boost filtering.
Figure 3.46(d) shows the Sobel gradient of the original image, computed
using Eq. (3.7-14). Components Gx and Gy were obtained using the masks in
Figs. 3.44(d) and (e), respectively. As expected from our discussion in Section
3.7.1, edges are much more dominant in this image than in the Laplacian image.
The smoothed gradient image shown in Fig. 3.46(e) was obtained by using an
averaging filter of size 5*5. The two gradient images were scaled for display
in the same manner as the two Laplacian images. Because the smallest possible
value of a gradient image is 0, the background is black in the scaled gradient im-
ages, rather than gray as in the scaled Laplacian. The fact that Figs. 3.46(d) and
(e) are much brighter than Fig. 3.46(b) is again evidence that the gradient of an
image with significant edge content has values that are higher in general than
in a Laplacian image.
The product of the Laplacian and smoothed-gradient image is shown in
Fig. 3.46(f). Note the dominance of the strong edges and the relative lack of vis-
ible noise, which is the key objective behind masking the Laplacian with a
smoothed gradient image. Adding the product image to the original resulted in
the sharpened image shown in Fig. 3.46(g). The significant increase in sharp-
ness of detail in this image over the original is evident in most parts of the image,
including the ribs, spinal chord, pelvis, and skull.This type of improvement would
not have been possible by using the Laplacian or gradient alone.
The sharpening procedure just discussed does not affect in an appreciable
way the dynamic range of the gray levels in an image. Thus, the final step in our

I Summary 141

enhancement task is to increase the dynamic range of the sharpened image. As
we discussed in some detail in Sections 3.2 and 3.3, there are a number of gray-
level transformation functions that can accomplish this objective. We do know
from the results in Section 3.3.2 that histogram equalization is not likely to work
well on images that have dark gray-level distributions like our images have
here. Histogram specification could be a solution, but the dark characteristics
of the images with which we are dealing lend themselves much better to a power-
law transformation. Since we wish to spread the gray levels, the value of g in
Eq. (3.2-3) has to be less than 1. After a few trials with this equation we arrived
at the result shown in Fig. 3.46(h), obtained with g=0.5 and c=1. Compar-
ing this image with Fig. 3.46(g), we see that significant new detail is visible in
Fig. 3.46(h). The areas around the wrists, hands, ankles, and feet are good ex-
amples of this. The skeletal bone structure also is much more pronounced, in-
cluding the arm and leg bones. Note also the faint definition of the outline of
the body, and of body tissue. Bringing out detail of this nature by expanding
the dynamic range of the gray levels also enhanced noise, but Fig. 3.46(h) rep-
resents a significant visual improvement over the original image.
The approach just discussed is representative of the types of processes that
can be linked in order to achieve results that are not possible with a single
technique. The way in which the results are used depends on the application.
The final user of the type of images shown in this section is likely to be a ra-
diologist. For a number of reasons that are beyond the scope of our discussion,
physicians are unlikely to rely on enhanced results to arrive at a diagnosis.
However, enhanced images are quite useful in highlighting details that can
serve as clues for further analysis in the original image or sequence of images.
In other areas, the enhanced result may indeed be the final product. Examples
are found in the printing industry, in image-based product inspection, in foren-
sics, in microscopy, in surveillance, and in a host of other areas where the prin-
cipal objective of enhancement is to obtain an image with a higher content of
visual detail.

Summary
The material presented in this chapter is representative of spatial domain techniques
commonly used in practice for image enhancement. This area of image processing is
a dynamic field, and new techniques and applications are reported routinely in pro-
fessional literature and in new product announcements. For this reason, the topics in-
cluded in this chapter were selected for their value as fundamental material that
would serve as a foundation for understanding the state of the art in enhancement
techniques, as well as for further study in this field. In addition to enhancement, this
chapter served the purpose of introducing a number of concepts, such as filtering with
spatial masks, that will be used in numerous occasions throughout the remainder of
the book. In the following chapter, we deal with enhancement from a complemen-
tary viewpoint in the frequency domain. Between these two chapters, the reader will
have developed a solid foundation for the terminology and some of the most funda-
mental tools used in image processing. The fact that these tools were introduced in the
context of image enhancement is likely to aid in the understanding of how they op-
erate on digital images.


The material in Section 3.1 is from Gonzalez [1986]. Additional reading for the materi-
al in Section 3.2 may be found in Schowengerdt [1983], Poyton [1996], and Russ [1999].
See also the paper by Tsujii et al. [1998] regarding the optimization of image displays.
Early references on histogram processing are Hummel [1974], Gonzalez and Fittes [1977],
and Woods and Gonzalez [1981]. Stark [2000] gives some interesting generalizations of
histogram equalization for adaptive contrast enhancement. Other approaches for con-
trast enhancement are exemplified by Centeno and Haertel [1997] and Cheng and Xu
[2000]. For enhancement based on an ideal image model, see Highnam and Brady [1997].
For extensions of the local histogram equalization method, see Caselles et al. [1999], and
Zhu et al. [1999]. See Narendra and Fitch [1981] on the use and implementation of local
statistics for image enhancement. Kim et al. [1997] present an interesting approach com-
bining the gradient with local statistics for image enhancement.
Image subtraction (Section 3.4.1) is a generic image processing tool widely used for
change detection. As noted in that section, one of the principal applications of digital
image subtraction is in mask mode radiography, where patient motion is a problem be-
cause motion smears the results. The problem of motion during image subtraction has
received significant attention over the years, as exemplified in the survey article by Mei-
jering et al. [1999].The method of noise reduction by image averaging (Section 3.4.2) was
first proposed by Kohler and Howell [1963]. See Peebles [1993] regarding the expected
value of the mean and variance of a sum of random variables.
For additional reading on linear spatial filters and their implementation, see Um-
baugh [1998], Jain [1989], and Rosenfeld and Kak [1982]. Rank-order filters are dis-
cussed in these references as well.Wilburn [1998] discusses generalizations of rank-order
filters. The book by Pitas and Venetsanopoulos [1990] also deals with median and other
nonlinear spatial filters.A special issue of IEEE Transactions in Image Processing [1996]
is dedicated to the topic of nonlinear image processing. The material on high-boost fil-
tering is from Schowengerdt [1983]. We will encounter again many of the spatial filters
introduced in this chapter in discussions dealing with image restoration (Chapter 5) and
edge detection (Chapter 10).

Problems
2
3.1 Exponentials of the form e-ar , with a a positive constant, are useful for con-
structing smooth gray-level transformation functions. Start with this basic func-
tion and construct transformation functions having the general shapes shown in
the following figures. The constants shown are input parameters, and your pro-
See inside front cover posed transformations must include them in their specification. (For simplicity in
Detailed solutions to the your answers, L0 is not a required parameter in the third curve.)
problems marked with a
star can be found in the
book web site. The site s=T(r) s=T(r) s=T(r)
also contains suggested
projects based on the ma- A B D
terial in this chapter.

A/2 B/2

C
r r r
L0 L0 0

(a) (b) (c)

I Problems 143

3.2 # (a) Give a continuous function for implementing the contrast stretching trans-
formation shown in Fig. 3.2(a). In addition to m, your function must include a
parameter, E, for controlling the slope of the function as it transitions from low
to high gray-level values. Your function should be normalized so that its min-
imum and maximum values are 0 and 1, respectively.
(b) Sketch a family of transformations as a function of parameter E, for a fixed
value m=L 2, where L is the number of gray levels in the image.
(c) What is the smallest value of s that will make your function effectively per-
form as the function in Fig. 3.2(b)? In other words, your function does not
have to be identical to Fig. 3.2(b). It just has to yield the same result of pro-
ducing a binary image. Assume that you are working with 8-bit images, and
let m=128. Also, let C be the smallest positive number representable in the
computer you are using.
3.3 Propose a set of gray-level-slicing transformations capable of producing all the in-
dividual bit planes of an 8-bit monochrome image. (For example, a transforma-
tion function with the property T(r)=0 for r in the range [0, 127], and
T(r)=255 for r in the range [128, 255] produces an image of the 7th bit plane
in an 8-bit image.)
3.4 # (a) What effect would setting to zero the lower-order bit planes have on the his-
togram of an image in general?
(b) What would be the effect on the histogram if we set to zero the higher-
order bit planes instead?
# 3.5 Explain why the discrete histogram equalization technique does not, in general,
yield a flat histogram.
3.6 Suppose that a digital image is subjected to histogram equalization. Show that a
second pass of histogram equalization will produce exactly the same result as the
first pass.

12ps
3.7 In some applications it is useful to model the histogram of input images as Gauss-
ian probability density functions of the form
2
(r - m)
1 -
pr(r) = e 2s2

where m and s are the mean and standard deviation of the Gaussian PDF. The
approach is to let m and s be measures of average gray level and contrast of a
given image. What is the transformation function you would use for histogram
equalization?
# 3.8 Assuming continuous values, show by example that it is possible to have a case
in which the transformation function given in Eq. (3.3-4) satisfies Conditions (a)
and (b) in Section 3.3.1, but its inverse may fail to be single valued.
3.9 (a) Show that the discrete transformation function given in Eq. (3.3-8) for his-
togram equalization satisfies conditions (a) and (b) in Section 3.3.1.
(b) Show by example that this does not hold in general for the inverse discrete
transformation function given in Eq. (3.3-9).
# (c) Show that the inverse discrete transformation in Eq. (3.3-9) satisfies Conditions
(a) and (b) in Section 3.3.1 if none of the gray levels rk , k=0, 1, p , L-1,
are missing.


3.10 An image has the gray level PDF pr(r) shown in the following diagram. It is de-
sired to transform the gray levels of this image so that they will have the speci-
fied pz(z) shown. Assume continuous quantities and find the transformation (in
terms of r and z) that will accomplish this.
pr(r) pz(z)

2 2

r z
1 1

# 3.11 Propose a method for updating the local histogram for use in the local enhance-
ment technique discussed in Section 3.3.3.
3.12 Two images, f(x, y) and g(x, y), have histograms hf and hg . Give the conditions
under which you can determine the histograms of
(a) f(x, y)+g(x, y)
(b) f(x, y)-g(x, y)
(c) f(x, y)*g(x, y)
(d) f(x, y) , g(x, y)
in terms of hf and hg. Explain how to obtain the histogram in each case.
3.13 Consider two 8-bit images whose gray levels span the full range from 0 to 255.
(a) Discuss the limiting effect of repeatedly subtracting image (b) from image (a).
(b) Would reversing the order of the images yield a different result?
# 3.14 Image subtraction is used often in industrial applications for detecting missing
components in product assembly. The approach is to store a “golden” image
that corresponds to a correct assembly; this image is then subtracted from in-
coming images of the same product. Ideally, the differences would be zero if
the new products are assembled correctly. Difference images for products with
missing components would be nonzero in the area where they differ from the
golden image. What conditions do you think have to be met in practice for this
method to work?
3.15 Prove the validity of Eqs. (3.4-4) and (3.4-5).
3.16 In an industrial application, X-ray imaging is to be used to inspect the inside of
certain composite castings.The objective is to look for voids in the castings, which
typically appear as small blobs in the image. However, due to properties in of the
casting material and X-ray energy used, high noise content often makes inspec-
tion difficult, so the decision is made to use image averaging to reduce the noise
and thus improve visible contrast. In computing the average, it is important to
keep the number of images as small as possible to reduce the time the parts have
to remain stationary during imaging. After numerous experiments, it is conclud-
ed that decreasing the noise variance by a factor of 10 is sufficient. If the imag-
ing device can produce 30 frames s, how long would the castings have to remain
stationary during imaging to achieve the desired decrease in variance? Assume
that the noise is uncorrelated and has zero mean.

I Problems 145

3.17 The implementation of linear spatial filters requires moving the center of a mask
throughout an image and, at each location, computing the sum of products of the
mask coefficients with the corresponding pixels at that location (see Section 3.5).
In the case of lowpass filtering, all coefficients are 1, allowing use of a so-called
box-filter or moving-average algorithm, which consists of updating only the part
of the computation that changes from one location to the next.
# (a) Formulate such an algorithm for an n*n filter, showing the nature of the
computations involved and the scanning sequence used for moving the mask
around the image.
(b) The ratio of the number of computations performed by a brute-force imple-
mentation to the number of computations performed by the box-filter algo-
rithm is called the computational advantage. Obtain the computational
advantage in this case and plot it as a function of n for n>1. The 1 n2 scaling
factor is common to both approaches, so you need not consider it in obtaining
the computational advantage. Assume that the image has an outer border of
zeros that is thick enough to allow you to ignore border effects in your analysis.
3.18 Discuss the limiting effect of repeatedly applying a 3*3 lowpass spatial filter to
a digital image. You may ignore border effects.
3.19 # (a) It was stated in Section 3.6.2 that isolated clusters of dark or light (with respect
to the background) pixels whose area is less than one-half the area of a medi-
an filter are eliminated (forced to the median value of the neighbors) by the fil-
ter. Assume a filter of size n*n, with n odd, and explain why this is so.
(b) Consider an image having various sets of pixel clusters.Assume that all points
in a cluster are lighter or darker than the background (but not both simulta-
neously in the same cluster), and that the area of each cluster is less than or
equal to n2 2. In terms of n, under what condition would one or more of these
clusters cease to be isolated in the sense described in part (a)?
# 3.20 (a) Develop a procedure for computing the median of an n*n neighborhood.
(b) Propose a technique for updating the median as the center of the neighbor-
hood is moved from pixel to pixel.
3.21 (a) In a character recognition application, text pages are reduced to binary form
using a thresholding transformation function of the form shown in Fig. 3.2(b).
This is followed by a procedure that thins the characters until they become
strings of binary 1’s on a background of 0’s. Due to noise, the binarization
and thinning processes result in broken strings of characters with gaps rang-
ing from 1 to 3 pixels. One way to “repair” the gaps is to run an averaging
mask over the binary image to blur it, and thus create bridges of nonzero pix-
els between gaps. Give the (odd) size of the smallest averaging mask capable
of performing this task.
(b) After bridging the gaps, it is desired to threshold the image in order to con-
vert it back to binary form. For your answer in (a), what is the minimum value
of the threshold required to accomplish this, without causing the segments to
break up again?
# 3.22 The three images shown were blurred using square averaging masks of sizes
n=23, 25, and 45, respectively. The vertical bars on the left lower part of (a) and
(c) are blurred, but a clear separation exists between them. However, the bars


have merged in image (b), in spite of the fact that the mask that produced this
image is significantly smaller than the mask that produced image (c). Explain this.

(a) (b) (c)
3.23 Consider an application such as the one shown in Fig. 3.36, in which it is desired
to eliminate objects smaller than those enclosed in a square of size q*q pixels.
Suppose that we want to reduce the average gray level of those objects to one-tenth
of their original average gray level. In this way, those objects will be closer to the
gray level of the background and they can then be eliminated by thresholding.
Give the (odd) size of the smallest averaging mask that will accomplish the desired
reduction in average gray level in only one pass of the mask over the image.
3.24 In a given application an averaging mask is applied to input images to reduce
noise, and then a Laplacian mask is applied to enhance small details. Would the
result be the same if the order of these operations were reversed?
# 3.25 Show that the Laplacian operation defined in Eq. (3.7-1) is isotropic (invariant to
rotation).You will need the following equations relating coordinates after axis ro-
tation by an angle u:
x=x¿ cos u-y¿ sin u
y=x¿ sin u+y¿ cos u
where (x, y) are the unrotated and (x¿, y¿) are the rotated coordinates.
3.26 Give a 3*3 mask for performing unsharp masking in a single pass through an
image.
# 3.27 Show that subtracting the Laplacian from an image is proportional to unsharp
masking. Use the definition for the Laplacian given in Eq. (3.7-4).
3.28 (a) Show that the magnitude of the gradient given in Eq. (3.7-13) is an isotrop-
ic operation. (See Problem 3.25.)
(b) Show that the isotropic property is lost in general if the gradient is comput-
ed using Eq. (3.7-14).
3.29 A CCD TV camera is used to perform a long-term study by observing the same area
24 hours a day, for 30 days. Digital images are captured and transmitted to a cen-
tral location every 5 minutes. The illumination of the scene changes from natural
daylight to artificial lighting.At no time is the scene without illumination, so it is al-
ways possible to obtain an image. Because the range of illumination is such that it
is always in the linear operating range of the camera, it is decided not to employ any
compensating mechanisms on the camera itself. Rather, it is decided to use digital
techniques to postprocess, and thus normalize, the images to the equivalent of con-
stant illumination. Propose a method to do this.You are at liberty to use any method
you wish, but state clearly all the assumptions you made in arriving at your design.

Image Processing 2ed

More Related Content

Similar to Image Processing 2ed (20)

Image Processing 2ed