Algorithms for Image Processing and Computer Vision, Second Edition
Published by
Wiley Publishing, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2011 by J.R. Parker
Published by Wiley Publishing, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-0-470-64385-3
ISBN: 978-1-118-02188-0 (ebk)
ISBN: 978-1-118-02189-7 (ebk)
ISBN: 978-1-118-01962-7 (ebk)
Manufactured in the United States of America
10 9 8 7 6 5 4 3 2 1
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means,
electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108
of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization
through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA
01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions
Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at
http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with
respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including
without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or
promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work
is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional
services. If professional assistance is required, the services of a competent professional person should be sought. Neither
the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is
referred to in this work as a citation and/or a potential source of further information does not mean that the author or the
publisher endorses the information the organization or website may provide or recommendations it may make. Further,
readers should be aware that Internet websites listed in this work may have changed or disappeared between when this
work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the
United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available
in electronic books.
Library of Congress Control Number: 2010939957
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its
affiliates, in the United States and other countries, and may not be used without written permission. All other trademarks
are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned
in this book.
‘‘Sin lies only in hurting other people unnecessarily.
All other ‘sins’ are invented nonsense.
(Hurting yourself is not a sin — just stupid.)’’
— Robert A. Heinlein
Thanks, Bob.
Credits
Executive Editor
Carol Long
Project Editor
John Sleeva
Technical Editor
Kostas Terzidis
Production Editor
Daniel Scribner
Copy Editor
Christopher Jones
Editorial Director
Robyn B. Siesky
Editorial Manager
Mary Beth Wakefield
Freelancer Editorial
Manager
Rosemarie Graham
Marketing Manager
Ashley Zurcher
Production Manager
Tim Tate
Vice President and Executive
Group Publisher
Richard Swadley
Vice President and Executive
Publisher
Barry Pruett
Associate Publisher
Jim Minatel
Project Coordinator, Cover
Lynsey Stanford
Proofreaders
Nancy Hanger, Paul Sagan
Indexer
Ron Strauss
Cover Image
GYRO PHOTOGRAPHY/
amanaimagesRB/Getty Images
Cover Designer
Ryan Sneed
About the Author
J.R. Parker is a computer expert and teacher, with special interests in image
processing and vision, video game technologies, and computer simulations.
With a Ph.D. in Informatics from the State University of Gent, Dr. Parker
has taught computer science, art, and drama at the University of Calgary in
Canada, where he is a full professor. He has more than 150 technical papers
and four books to his credit, as well as video games such as the Booze Cruise,
a simulation of impaired driving designed to demonstrate its folly, and a
number of educational games. Jim lives on a small ranch near Cochrane,
Alberta, Canada with family and a host of legged and winged creatures.
About the Technical Editor
Kostas Terzidis is an Associate Professor at the Harvard Graduate School of
Design. He holds a Ph.D. in Architecture from the University of Michigan
(1994), a Masters of Architecture from Ohio State University (1989), and a
Diploma of Engineering from the Aristotle University of Thessaloniki (1986).
His most recent work is in the development of theories and techniques for
the use of algorithms in architecture. His book Expressive Form: A Concep-
tual Approach to Computational Design, published by London-based Spon Press
(2003), offers a unique perspective on the use of computation as it relates to aes-
thetics, specifically in architecture and design. His book Algorithmic Architecture
(Architectural Press/Elsevier, 2006) provides an ontological investigation into
the terms, concepts, and processes of algorithmic architecture and provides
a theoretical framework for design implementations. His latest book, Algo-
rithms for Visual Design (Wiley, 2009), provides students, programmers, and
researchers the technical, theoretical, and design means to develop computer
code that will allow them to experiment with design problems.
Acknowledgments
Thanks this time to Sonny Chan, for the inspiration for the parallel computing
chapter, to Jeff Boyd, for introducing me repeatedly to OpenCV, and to Ralph
Huntsinger and Ghislain C. Vansteenkiste, for getting me into and successfully
out of my Ph.D. program.
Almost all the images used in this book were created by me, using an IBM
PC with a frame grabber and a Sony CCD camera, an HP scanner, and a Sony
Eyetoy as a webcam. Credits for the few images that were not acquired in this
way are as follows:
Corel Corporation made available the color image of the grasshopper on
a leaf shown in Figure 3.33, and also was the origin of the example search
images in Figure 10.5.
The sample images in Figure 10.1 were a part of the ALOI data set, use of
which was allowed by J. M. Geusebroek.
Thanks to Big Hill Veterinary Clinic in Cochrane, Alberta, Canada, for the
X-ray image shown in Figure 3.10e.
Finally, thanks to Dr. N. Wardlaw, of the University of Calgary Department
of Geology, for the geological micropore image of Figure 3.16.
Most importantly, I need to thank my family: my wife, Katrin, and children,
Bailey and Max. They sacrificed time and energy so that this work could be
completed. I appreciate it and hope that the effort has been worthwhile.
Contents at a Glance
Preface xxi
Chapter 1 Practical Aspects of a Vision System — Image Display,
Input/Output, and Library Calls 1
Chapter 2 Edge-Detection Techniques 21
Chapter 3 Digital Morphology 85
Chapter 4 Grey-Level Segmentation 137
Chapter 5 Texture and Color 177
Chapter 6 Thinning 209
Chapter 7 Image Restoration 251
Chapter 8 Classification 285
Chapter 9 Symbol Recognition 321
Chapter 10 Content-Based Search — Finding Images by Example 395
Chapter 11 High-Performance Computing for Vision and Image
Processing 425
Index 465
Contents
Preface xxi
Chapter 1 Practical Aspects of a Vision System — Image Display,
Input/Output, and Library Calls 1
OpenCV 2
The Basic OpenCV Code 2
The IplImage Data Structure 3
Reading and Writing Images 6
Image Display 7
An Example 7
Image Capture 10
Interfacing with the AIPCV Library 14
Website Files 18
References 18
Chapter 2 Edge-Detection Techniques 21
The Purpose of Edge Detection 21
Traditional Approaches and Theory 23
Models of Edges 24
Noise 26
Derivative Operators 30
Template-Based Edge Detection 36
Edge Models: The Marr-Hildreth Edge Detector 39
The Canny Edge Detector 42
The Shen-Castan (ISEF) Edge Detector 48
A Comparison of Two Optimal Edge Detectors 51
Color Edges 53
Source Code for the Marr-Hildreth Edge Detector 58
Source Code for the Canny Edge Detector 62
Source Code for the Shen-Castan Edge Detector 70
Website Files 80
References 82
Chapter 3 Digital Morphology 85
Morphology Defined 85
Connectedness 86
Elements of Digital Morphology — Binary Operations 87
Binary Dilation 88
Implementing Binary Dilation 92
Binary Erosion 94
Implementation of Binary Erosion 100
Opening and Closing 101
MAX — A High-Level Programming Language for
Morphology 107
The ‘‘Hit-and-Miss’’ Transform 113
Identifying Region Boundaries 116
Conditional Dilation 116
Counting Regions 119
Grey-Level Morphology 121
Opening and Closing 123
Smoothing 126
Gradient 128
Segmentation of Textures 129
Size Distribution of Objects 130
Color Morphology 131
Website Files 132
References 135
Chapter 4 Grey-Level Segmentation 137
Basics of Grey-Level Segmentation 137
Using Edge Pixels 139
Iterative Selection 140
The Method of Grey-Level Histograms 141
Using Entropy 142
Fuzzy Sets 146
Minimum Error Thresholding 148
Sample Results From Single Threshold Selection 149
The Use of Regional Thresholds 151
Chow and Kaneko 152
Modeling Illumination Using Edges 156
Implementation and Results 159
Comparisons 160
Relaxation Methods 161
Moving Averages 167
Cluster-Based Thresholds 170
Multiple Thresholds 171
Website Files 172
References 173
Chapter 5 Texture and Color 177
Texture and Segmentation 177
A Simple Analysis of Texture in Grey-Level Images 179
Grey-Level Co-Occurrence 182
Maximum Probability 185
Moments 185
Contrast 185
Homogeneity 185
Entropy 186
Results from the GLCM Descriptors 186
Speeding Up the Texture Operators 186
Edges and Texture 188
Energy and Texture 191
Surfaces and Texture 193
Vector Dispersion 193
Surface Curvature 195
Fractal Dimension 198
Color Segmentation 201
Color Textures 205
Website Files 205
References 206
Chapter 6 Thinning 209
What Is a Skeleton? 209
The Medial Axis Transform 210
Iterative Morphological Methods 212
The Use of Contours 221
Choi/Lam/Siu Algorithm 224
Treating the Object as a Polygon 226
Triangulation Methods 227
Force-Based Thinning 228
Definitions 229
Use of a Force Field 230
Subpixel Skeletons 234
Source Code for Zhang-Suen/Stentiford/Holt Combined
Algorithm 235
Website Files 246
References 247
Chapter 7 Image Restoration 251
Image Degradations — The Real World 251
The Frequency Domain 253
The Fourier Transform 254
The Fast Fourier Transform 256
The Inverse Fourier Transform 260
Two-Dimensional Fourier Transforms 260
Fourier Transforms in OpenCV 262
Creating Artificial Blur 264
The Inverse Filter 270
The Wiener Filter 271
Structured Noise 273
Motion Blur — A Special Case 276
The Homomorphic Filter — Illumination 277
Frequency Filters in General 278
Isolating Illumination Effects 280
Website Files 281
References 283
Chapter 8 Classification 285
Objects, Patterns, and Statistics 285
Features and Regions 288
Training and Testing 292
Variation: In-Class and Out-Class 295
Minimum Distance Classifiers 299
Distance Metrics 300
Distances Between Features 302
Cross Validation 304
Support Vector Machines 306
Multiple Classifiers — Ensembles 309
Merging Multiple Methods 309
Merging Type 1 Responses 310
Evaluation 311
Converting Between Response Types 312
Merging Type 2 Responses 313
Merging Type 3 Responses 315
Bagging and Boosting 315
Bagging 315
Boosting 316
Website Files 317
References 318
Chapter 9 Symbol Recognition 321
The Problem 321
OCR on Simple Perfect Images 322
OCR on Scanned Images — Segmentation 326
Noise 327
Isolating Individual Glyphs 329
Matching Templates 333
Statistical Recognition 337
OCR on Fax Images — Printed Characters 339
Orientation — Skew Detection 340
The Use of Edges 345
Handprinted Characters 348
Properties of the Character Outline 349
Convex Deficiencies 353
Vector Templates 357
Neural Nets 363
A Simple Neural Net 364
A Backpropagation Net for Digit Recognition 368
The Use of Multiple Classifiers 372
Merging Multiple Methods 372
Results From the Multiple Classifier 375
Printed Music Recognition — A Study 375
Staff Lines 376
Segmentation 378
Music Symbol Recognition 381
Source Code for Neural Net Recognition System 383
Website Files 390
References 392
Chapter 10 Content-Based Search — Finding Images by Example 395
Searching Images 395
Maintaining Collections of Images 396
Features for Query by Example 399
Color Image Features 399
Mean Color 400
Color Quad Tree 400
Hue and Intensity Histograms 401
Comparing Histograms 402
Requantization 403
Results from Simple Color Features 404
Other Color-Based Methods 407
Grey-Level Image Features 408
Grey Histograms 409
Grey Sigma — Moments 409
Edge Density — Boundaries Between Objects 409
Edge Direction 410
Boolean Edge Density 410
Spatial Considerations 411
Overall Regions 411
Rectangular Regions 412
Angular Regions 412
Circular Regions 414
Hybrid Regions 414
Test of Spatial Sampling 414
Additional Considerations 417
Texture 418
Objects, Contours, Boundaries 418
Data Sets 418
Website Files 419
References 420
Systems 424
Chapter 11 High-Performance Computing for Vision and Image
Processing 425
Paradigms for Multiple-Processor Computation 426
Shared Memory 426
Message Passing 427
Execution Timing 427
Using clock() 428
Using QueryPerformanceCounter 430
The Message-Passing Interface System 432
Installing MPI 432
Using MPI 433
Inter-Process Communication 434
Running MPI Programs 436
Real Image Computations 437
Using a Computer Network — Cluster Computing 440
A Shared Memory System — Using the PC Graphics
Processor 444
GLSL 444
OpenGL Fundamentals 445
Practical Textures in OpenGL 448
Shader Programming Basics 451
Vertex and Fragment Shaders 452
Required GLSL Initializations 453
Reading and Converting the Image 454
Passing Parameters to Shader Programs 456
Putting It All Together 457
Speedup Using the GPU 459
Developing and Testing Shader Code 459
Finding the Needed Software 460
Website Files 461
References 461
Index 465
Preface
Humans still obtain the vast majority of their sensory input through their vi-
sual system, and an enormous effort has been made to artificially enhance this
sense. Eyeglasses, binoculars, telescopes, radar, infrared sensors, and photo-
multipliers all function to improve our view of the world and the universe.
We even have telescopes in orbit (eyes outside the atmosphere) and many of
those ‘‘see’’ in other spectra: infrared, ultraviolet, X-rays. These give us views
that we could not have imagined only a few years ago, and in colors that we’ll
never see with the naked eye. The computer has been essential for creating the
incredible images we’ve all seen from these devices.
When the first edition of this book was written, the Hubble Space Telescope
was in orbit and producing images at a great rate. It and the European
Hipparcos telescope were the only optical instruments above the atmosphere.
Now there are COROT, Kepler, MOST (Canada’s space telescope), and the Swift
Gamma Ray Burst Explorer. In addition, there are Spitzer (infrared),
Chandra (X-ray), GALEX (ultraviolet), and a score of others. The first edition
was written on a 450-MHz Pentium III with 256 MB of memory. In 1999, the
first major digital SLR camera was placed on the market: the Nikon D1. It
had only 2.74 million pixels and cost just under $6,000. A typical PC disk
drive held 100–200 MB. Webcams existed in 1997, but they were expensive
and low-resolution. Persons using computer images needed to have a special
image acquisition card and a relatively expensive camera to conduct their
work, generally amounting to $1–2,000 worth of equipment. The technology
of personal computers and image acquisition has changed a lot since then.
The 1997 first edition was inspired by my numerous scans through the
Internet news groups related to image processing and computer vision. I
noted that some requests appeared over and over again, sometimes answered
and sometimes not, and wondered if it would be possible to answer the more
frequently asked questions in book form, which would allow the development
of some of the background necessary for a complete explanation. However,
since I had just completed a book (Practical Computer Vision Using C), I was in
no mood to pursue the issue. I continued to collect information from the Net,
hoping to one day collate it into a sensible form. I did that, and the first edition
was very well received. (Thanks!)
Fifteen years later, given the changes in technology, I’m surprised at how
little has changed in the field of vision and image processing, at least at
the accessible level. Yes, the theory has become more sophisticated and
three-dimensional vision methods have certainly improved. Some robot vision
systems have accomplished rather interesting things, and face recognition has
been taken to a new level. However, cheap character recognition is still, well,
cheap, and is still not up to a level where it can be used reliably in most cases.
Unlike other kinds of software, vision systems are not ubiquitous features of
daily life. Why not? Possibly because the vision problem is really a hard one.
Perhaps there is room for a revision of the original book?
My goal has changed somewhat. I am now also interested in ‘‘democratiza-
tion’’ of this technology — that is, in allowing it to be used by anyone, at home,
in their business, or at schools. Of course, you need to be able to program a
computer, but that skill is more common than it was. All the software needed
to build the programs in this edition is freely available on the Internet. I
have used a free compiler (Microsoft Visual Studio Express), and OpenCV is
also a free download. The only impediment to the development of your own
image-analysis systems is your own programming ability.
Some of the original material has not changed very much. Edge detec-
tion, thinning, thresholding, and morphology have not been hot areas of
research, and the chapters in this edition are quite similar to those in the
original. The software has been updated to use Intel’s OpenCV system, which
makes image IO and display much easier for programmers. It is even a simple
matter to capture images from a webcam in real time and use them as input
to the programs. Chapter 1 contains a discussion of the basics of OpenCV use,
and all software in this book uses OpenCV as a basis.
Much of the mathematics in this book is still necessary for the detailed under-
standing of the algorithms described. Advanced methods in image processing
and vision require the motivation and justification that only mathematics can
provide. In some cases, I have only scratched the surface, and have left a
more detailed study for those willing to follow the references given at the
ends of chapters. I have tried to select references that provide a range of
approaches, from detailed and complex mathematical analyses to clear and
concise exposition. However, in some cases there are very few clear descrip-
tions in the literature, and none that do not require at least a university-level
math course. Here I have attempted to describe the situation in an intuitive
manner, sacrificing rigor (which can be found almost anywhere else) for as
clear a description as possible. The software that accompanies the descriptions
is certainly an alternative to the math, and gives a step-by-step description of
the algorithms.
I have deleted some material completely from the first edition. There is no
longer a chapter on wavelets, nor is there a chapter on genetic algorithms.
On the other hand, there is a new chapter on classifiers, which I think was
an obvious omission in the first edition. A key inclusion here is the chapter
on the use of parallel programming for solving image-processing problems,
including the use of graphics cards (GPUs) to accelerate calculations by factors
up to 200. There’s also a completely new chapter on content-based searches,
which is the use of image information to retrieve other images. It’s like saying,
‘‘Find me another image that looks like this.’’ Content-based search will be an
essential technology over the next two decades. It will enable the effective use
of modern large-capacity disk drives; and with the proliferation of inexpensive
high-resolution digital cameras, it makes sense that people will be searching
through large numbers of big images (huge numbers of pixels) more and more
often.
Most of the algorithms discussed in this edition can be found in source
code form on the accompanying web page. The chapter on thresholding alone
provides 17 programs, each implementing a different thresholding algorithm.
Thinning programs, edge detection, and morphology are all now available on
the Internet.
The chapter on image restoration is still one of the few sources of practical
information on that subject. The symbol recognition chapter has been updated;
however, as many methods are commercial, they cannot be described and
software can’t be provided due to patent and copyright concerns. Still, the
basics are there, and have been connected with the material on classifiers.
The chapter on parallel programming for vision is, I think, a unique feature
of this book. Again using downloadable tools, this chapter shows how to link
all the computers on your network into a large image-processing cluster. Of
course, it also shows how to use all the CPUs on your multi-core machine and, most
importantly, gives an introductory and very practical look at how to program
the GPU to do image processing and vision tasks, rather than just graphics.
Finally, I have provided a chapter giving a selection of methods for use
in searching through images. These methods have code showing their imple-
mentation and, combined with other code in the book, will allow for many
hours of experimenting with your own ideas and algorithms for organizing
and searching image data sets.
Readers can download all the source code and sample images mentioned in
this book from the book’s web page — www.wiley.com/go/jrparker. You can
also link to my own page, through which I will add new code, new images,
and perhaps even new written material to supplement and update the printed
matter. Comments and mistakes (how likely is that?) can be communicated
through that web page, and errata will be posted, as will reader contributions
to the software collection and new ideas for ways to use the code, methods for
compiling on other systems and with other compilers.
I invite you to make suggestions through the website for subjects for new
chapters that you would like to read. It is my intention to select a popular
request and to post a new chapter on that subject on the site at a future date.
A book, even one primarily released on paper, need not be a completely static
thing!
Jim Parker
Cochrane, Alberta, Canada
October 2010
C H A P T E R
1
Practical Aspects of a Vision
System—Image Display,
Input/Output, and Library Calls
When experimenting with vision- and image-analysis systems or implement-
ing one for a practical purpose, a basic software infrastructure is essential.
Images consist of pixels, and in a typical image from a digital camera there
will be 4–6 million pixels, each representing the color at a point in the
image. This large amount of data is stored as a file in a format (such as GIF
or JPEG) suitable for manipulation by commercial software packages, such
as Photoshop and Paint. Developing new image-analysis software means
first being able to read these files into an internal form that allows access to
the pixel values. There is nothing exciting about code that does this, and it
does not involve any actual image processing, but it is an essential first step.
Similarly, image-analysis software will need to display images on the screen
and save them in standard formats. It’s probably useful to have a facility for
image capture available, too. None of these operations modify an image but
simply move it about in useful ways.
These bookkeeping tasks can require most of the code involved in an
imaging program. The procedure for changing all red pixels to yellow, for
example, can contain as few as 10 lines of code; yet, the program needed to
read the image, display it, and output the result may require an additional
2,000 lines of code, or even more.
Of course, this infrastructure code (which can be thought of as an application
programming interface, or API) can be used for all applications; so, once it is
developed, the API can be used without change until updates are required.
Changes in the operating system, in underlying libraries, or in additional
functionalities can require new versions of the API. If properly done, these
new versions will require little or no modification to the vision programs that
depend on it. Such an API is the OpenCV system.
1.1 OpenCV
OpenCV was originally developed by Intel. At the time of this writing,
version 2.0 is current and can be downloaded from
http://sourceforge.net/projects/opencvlibrary/.
However, Version 2.0 is relatively new and does not yet install and compile
with all of the major systems and compilers. All the examples in this book use
Version 1.1 from http://sourceforge.net/projects/opencvlibrary/files
/opencv-win/1.1pre1/OpenCV_1.1pre1a.exe/download, and they compile with the
Microsoft Visual C++ 2008 Express Edition, which can be downloaded from
www.microsoft.com/express/Downloads/#2008-Visual-CPP.
The Algorithms for Image Processing and Computer Vision website
(www.wiley.com/go/jrparker) will maintain current links to new versions of
these tools. The website shows how to install both the compiler and OpenCV.
The advantage of using this combination of tools is that they are still pretty
current, they work, and they are free.
1.2 The Basic OpenCV Code
OpenCV is a library of C functions that implement both infrastructure oper-
ations and image-processing and vision functions. Developers can, of course,
add their own functions into the mix. Thus, any of the code described here
can be invoked from a program that uses the OpenCV paradigm, meaning
that the methods of this book are available in addition to those of OpenCV.
One simply needs to know how to call the library, and what the basic data
structures of OpenCV are.
OpenCV is a large and complex library. To assist everyone in starting to use
it, the following is a basic program that can be modified to do almost anything
that anyone would want:
// basic.c : A 'wrapper' for basic vision programs.
#include "stdafx.h"
#include "cv.h"
#include "highgui.h"

int main (int argc, char* argv[])
{
    IplImage *image = 0;

    image = cvLoadImage ("C:/AIPCV/image1.jpg", 1);
    if( image )
    {
        cvNamedWindow( "Input Image", 1 );
        cvShowImage( "Input Image", image );
        printf( "Press a key to exit\n" );
        cvWaitKey(0);
        cvDestroyWindow( "Input Image" );
    }
    else
        fprintf( stderr, "Error reading image\n" );
    return 0;
}
This is similar to many example programs on the Internet. It reads in an
image (C:/AIPCV/image1.jpg is a string giving the path name of the image)
and displays it in a window on the screen. When the user presses a key, the
program terminates after destroying the display window.
Before anyone can modify this code in a knowledgeable way, the data
structures and functions need to be explained.
1.2.1 The IplImage Data Structure
The IplImage structure is the in-memory data organization for an image.
Images in IplImage form can be converted into arrays of pixels, but IplImage
also contains a lot of structural information about the image data, which can
have many forms. For example, an image read from a GIF file could be 256
grey levels with an 8-bit pixel size, or a JPEG file could be read into a 24-bit
per pixel color image. Both files can be represented as an IplImage.
An IplImage is much like other internal image representations in its basic
organization. The essential fields are as follows:
width An integer holding the width of the image in pixels
height An integer holding the height of the image in pixels
imageData A pointer to an array of characters, each one an actual pixel or color value
If each pixel is one byte, this is really all we need. However, there are many
data types for an image within OpenCV; they can be bytes, ints, floats, or
doubles in type, for instance. They can be greys (1 byte) or 3-byte color (RGB),
4 bytes, and so on. Finally, some image formats may have the origin at the
upper left (most do, in fact) and some use the lower left (only Microsoft).
Other useful fields to know about include the following:
nChannels An integer specifying the number of colors per pixel (1–4).
depth An integer specifying the number of bits per pixel.
origin The origin of the coordinate system. An integer: 0=upper
left, 1=lower left.
widthStep An integer specifying, in bytes, the size of one row of the
image.
imageSize An integer specifying, in bytes, the size of the image
( = widthStep * height).
imageDataOrigin A pointer to the origin (root, base) of the image.
roi A pointer to a structure that defines a region of interest
within this image that is being processed.
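As a quick sanity check, a fragment such as the following (a sketch only,
assuming nothing beyond the fields described above) prints several of these
values for an image already loaded into the pointer img:

// Print the basic layout facts about an image (img is an IplImage*).
printf ("%d x %d pixels, %d channel(s), %d bits per pixel\n",
        img->width, img->height, img->nChannels, img->depth);
printf ("row size %d bytes, origin at the %s\n",
        img->widthStep, img->origin == 0 ? "upper left" : "lower left");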
When an image is created or read in from a file, an instance of an IplImage
is created for it, and the appropriate fields are given values. Consider the
following definition:
IplImage* img = 0;
As will be described later in more detail, an image can be read from a file by
the following code:
img = cvLoadImage(filename);
where the variable filename is a string holding the name of the image file. If
this succeeds, then
img->imageData
points to the block of memory where the pixels can be found. Figure 1.1 shows
a JPEG image named marchA062.jpg that can be used as an example.
Reading this image creates a specific type of internal representation common
to basic RGB images and will be the most likely variant of the IplImage
structure to be encountered in real situations. This representation has each
pixel represented as three bytes: one for red, one for green, and one for
blue. They appear in the order b, g, r, starting at the first row of the image
and stepping through columns, and then rows. Thus, the data pointed to by
img->imageData is stored in the following order:
b(0,0) g(0,0) r(0,0) b(0,1) g(0,1) r(0,1) b(0,2) g(0,2) r(0,2) . . .
This means that the RGB values of the pixels in the first row (row 0) appear
in reverse order (b, g, r) for all pixels in that row. Then comes the next row,
starting over at column 0, and so on, until the final row.
Figure 1.1: Sample digital image for use in this chapter. It is an image of a tree in Chico,
CA, and was acquired using an HP Photosmart M637 camera. This is typical of a modern,
medium-quality camera.
How can an individual pixel be accessed? The field widthStep is the size of
a row, so the start of image row i would be found at
img->imageData + i*img->widthStep
Column j is j pixels along from this location; if pixels are bytes, then that’s
img->imageData + i*img->widthStep + j
If pixels are RGB values, as in the JPEG image read in above, then each pixel
is 3 bytes long and pixel j starts at location
img->imageData + i*img->widthStep + j*3
The value of the field nChannels is essentially the number of bytes per pixel,
so the pixel location can be generalized as:
(img->imageData + i*img->widthStep)[j*img->nChannels]
Finally, the color components are in the order blue, green, and red. Thus,
the blue value for pixel [i,j] is found at
(img->imageData + i*img->widthStep)[j*img->nChannels + 0]
and green and red at the following, respectively:
(img->imageData + i*img->widthStep)[j*img->nChannels + 1]
(img->imageData + i*img->widthStep)[j*img->nChannels + 2]
The data type for a pixel will be unsigned character (or uchar).
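The address arithmetic above can be collected into a small helper function.
The following is a sketch for illustration only (it is not part of OpenCV or
the AIPCV library), and it assumes the 8-bit, multi-channel layout just
described:

// Return channel c (0=blue, 1=green, 2=red) of the pixel at row i,
// column j. Assumes 8 bits per channel, stored in the b,g,r order
// shown above.
unsigned char get_channel (IplImage *img, int i, int j, int c)
{
    return (unsigned char)
        (img->imageData + i*img->widthStep)[j*img->nChannels + c];
}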
There is a generic way to access pixels in an image that automatically uses
what is known about the image and its format and returns or modifies a
specified pixel. This is quite handy, because pixels can be bytes, RGB, float, or
double in type. The function cvGet2D does this; getting the pixel value at i,j
for the image above is simply
p = cvGet2D (img, i, j);
The variable p is of type CvScalar, which is
struct CvScalar
{
double val[4];
}
If the pixel has only a single value (i.e., grey), then p.val[0] is that value. If it
is RGB, then the color components of the pixel are as follows:
Blue is p.val[0]
Green is p.val[1]
Red is p.val[2]
Modifying the pixel value is done as follows:
p.val[0] = 0; // Blue
p.val[1] = 255; // Green
p.val[2] = 255; // Red
cvSet2D(img,i,j,p); // Set the (i,j) pixel to yellow
This is referred to as indirect access in OpenCV documentation and is slower
than other means of accessing pixels. It is, on the other hand, clean and clear.
1.2.2 Reading and Writing Images
The basic function for image input has already been seen; cvLoadImage reads
an image from a file, given a path name to that file. It can read images in JPEG,
BMP, PNM, PNG, and TIF formats, and does so automatically, without the
need to specify the file type. This is determined from the data on the file itself.
Once read, a pointer to an IplImage structure is returned that will by default
be forced into a 3-channel RGB form, such as has been described previously.
So, the call
img = cvLoadImage (filename);
returns an IplImage* value that is an RGB image, unless the file name indicated
by the string variable filename can’t be read, in which case the function returns
0 (null). A second parameter can be used to change the default return image.
The call
img = cvLoadImage (filename, f);
returns a 1 channel (1 byte per pixel) grey-level image if f=0, and returns the
actual image type that is found in the file if f<0.
Writing an image to a file can be simple or complex, depending on what the
user wants to accomplish. Writing grey-level or RGB color images is simple,
using the code:
k = cvSaveImage( filename, img );
The filename is, as usual, a string indicating the name of the file to be saved,
and the img variable is the image to be written to that file. The file type will
correspond to the suffix on the file, so if the filename is file.jpg, then the file
format will be JPEG. If the file cannot be written, then the function returns 0.
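Putting the two calls together, a minimal load/save round trip looks like the
following sketch (the file names here are placeholders):

IplImage *img = cvLoadImage ("C:/AIPCV/image1.jpg", 1);
if (img)
{
    if ( !cvSaveImage ("C:/AIPCV/copy.png", img) )   // Format from suffix
        fprintf (stderr, "Could not write the file\n");
    cvReleaseImage (&img);                           // Free the image storage
}
else
    fprintf (stderr, "Could not read the file\n");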
1.2.3 Image Display
If the basic C/C++ compiler is used alone, then displaying an image is quite
involved. One of the big advantages in using OpenCV is that it provides easy
ways to call functions that open a window and display images within it. This
does not require the use of other systems, such as Tcl/Tk or Java, and asks
the programmer to have only a basic knowledge of the underlying system for
managing windows on their computer.
The user interface functions of OpenCV are collected into a library named
highgui, and are documented on the Internet and in books. The basics are as
follows: a window is created using the cvNamedWindow function, which specifies
a name for the window. All windows are referred to by their name and not
through pointers. When created, the window can be given the autosize
property or not. Following this, the function cvShowImage can be used to
display an image (as specified by an IplImage pointer) in an existing window.
For windows with the autosize property, the window will change size to fit
the image; otherwise, the image will be scaled to fit the window.
Whenever cvShowImage is called, the image passed as a parameter is dis-
played in the given window. In this way, consecutive parts of the processing
of an image can be displayed, and simple animations can be created and
displayed. After a window has been created, it can be moved to any position
on the screen using cvMoveWindow (name, x, y). It can also be moved using
the mouse, just like any other window.
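Collecting these calls, a minimal display sequence is sketched below; the
window name and the screen position are arbitrary choices:

cvNamedWindow ("view", CV_WINDOW_AUTOSIZE); // Create a named window
cvMoveWindow ("view", 100, 100);            // Position it on the screen
cvShowImage ("view", img);                  // Draw the image in it
cvWaitKey (0);                              // Wait for a key press
cvDestroyWindow ("view");                   // Remove the window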
1.2.4 An Example
It is now possible to write a simple OpenCV program that will read, process,
and display an image. The input image will be that of Figure 1.1, and the goal
will be to threshold it.
First, add the needed include files, declare an image, and read it from a
file.
// Threshold a color image.
#include "stdafx.h"
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <cv.h>
#include <highgui.h>

int main (int argc, char* argv[])
{
    IplImage *image = 0;
    int i,j,k;
    int mean=0, count=0;
    char c;

    image = cvLoadImage("C:/AIPCV/marchA062.jpg");
At this point, there should be image data pointed to by image. If so (if the
image is not null), display it in a window, as before.
if( image )
{
    printf ("Height %d X width %d\n", image->height, image->width);
    cvNamedWindow( "mainWin", CV_WINDOW_AUTOSIZE);
    cvShowImage( "mainWin", image );
    printf ("Display of image is done.\n");
    cvWaitKey(0); // wait for a key
Now perform the thresholding operation. But this is a color image, so
convert it to grey first using the average of the three color components.
for (i=0; i<image->height; i++)
for (j=0; j<image->width; j++)
{
k=( (image->imageData+i*image->widthStep)[j*image->nChannels+0]
+(image->imageData+i*image->widthStep)[j*image->nChannels+1]
+(image->imageData+i*image->widthStep)[j*image->nChannels+2])/3;
(image->imageData+i*image->widthStep)[j*image->nChannels+0]
= (UCHAR) k;
(image->imageData+i*image->widthStep)[j*image->nChannels+1]
= (UCHAR) k;
(image->imageData+i*image->widthStep)[j*image->nChannels+2]
= (UCHAR) k;
At this point in the loop, count and sum the pixel values so that the mean
can be determined later.
mean += k;
count++;
}
Make a new window and display the grey image in it.
cvNamedWindow( "grey", CV_WINDOW_AUTOSIZE);
cvShowImage( "grey", image );
cvWaitKey(0); // wait for a key
Finally, compute the mean level for use as a threshold and pass through the
image again, setting pixels less than the mean to 0 and those greater to 255:
mean = mean/count;
for (i=0; i<image->height; i++)
for (j=0; j<image->width; j++)
{
k=(image->imageData+i*image->widthStep)
[j * image->nChannels + 0];
if (k < mean) k = 0;
else k = 255;
(image->imageData+i*image->widthStep)[j*image->nChannels+0]
= (UCHAR) k;
(image->imageData+i*image->widthStep)[j*image->nChannels+1]
= (UCHAR) k;
(image->imageData+i*image->widthStep)[j*image->nChannels+2]
= (UCHAR) k;
}
One final window is created, and the final thresholded image is displayed
and saved.
cvNamedWindow( "thresh");
cvShowImage( "thresh", image );
cvSaveImage( "thresholded.jpg", image );
Wait for the user to type a key before destroying all the windows and
exiting.
cvWaitKey(0); // wait for a key
cvDestroyWindow("mainWin");
cvDestroyWindow("grey");
cvDestroyWindow("thresh");
}
else
fprintf( stderr, "Error reading image\n" );
return 0;
}
Figure 1.2 shows a screen shot of this program.
Figure 1.2: The three image windows created by the thresholding program.
1.3 Image Capture
The processing of still photos or scientific images can be done quite effectively
using scanned images or data from digital cameras. The availability of digital
image data has increased many-fold over the past decade, and it is no longer
unusual to find a digital camera, a scanner, and a video camera in a typical
household or small college laboratory. Other kinds of data and other devices
can be quite valuable sources of images for a vision system, key among
these the webcam. These are digital cameras, almost always USB powered,
having image sizes of 640x480 or larger. They acquire color images at video
rates, making such cameras ideal for certain vision applications: surveillance,
robotics, games, biometrics, and places where computers are easily available
and very high quality is not essential.
There are a great many types of webcam, and the details of how they work
are not relevant to this discussion. If a webcam is properly installed, then
OpenCV should be able to detect it, and the capture functions should be able
to acquire images from it. The scheme used by OpenCV is to first declare and
initialize a camera, using a handle created by the system. Assuming that this
is successful, images can be captured through the handle.
Initializing a camera uses the cvCaptureFromCAM function:
CvCapture *camera = 0;
camera = cvCaptureFromCAM( CV_CAP_ANY );
if( !camera ) error ...
The type CvCapture is internal, and represents the handle used to capture
images. The function cvCaptureFromCam initializes capturing a video from a
camera, which is specified using the single parameter. CV_CAP_ANY will allow
any connected camera to be used, but the system will choose which one. If
0 is returned, then no camera was seen, and image capture is not possible;
otherwise, the camera’s handle is returned and is needed to grab images.
A frame (image) can be captured using the cvQueryFrame function:
IplImage *frame = 0;
frame = cvQueryFrame( camera );
The image returned is an IplImage pointer, which can be used immediately.
When the program is complete, it is always a good idea to free any resources
allocated. In this case, that means releasing the camera, as follows:
cvReleaseCapture( &camera );
It is now possible to write a program that drives the webcam. Let’s have
the images displayed in a window so that the live video can be seen. When a
key is pressed, the program will save the current image in a JPEG file named
VideoFramexx.jpg, where xx is a number that increases each time.
// Capture.c - image capture from a webcam
#include "stdafx.h"
#include "stdio.h"
#include "string.h"
#include "cv.h"
#include "highgui.h"
int main(int argc, char ** argv)
{
CvCapture *camera = 0;
IplImage *frame = 0;
int i, n=0;
char filename[256];
char c;
Initialize the camera and check to make sure that it is working.
camera = cvCaptureFromCAM( CV_CAP_ANY );
if( !camera ) // Get a camera?
{
fprintf(stderr, "Can't initialize camera\n");
return -1;
}
Open a window for image display.
cvNamedWindow("video", CV_WINDOW_AUTOSIZE);
cvMoveWindow ("video", 150, 200);
This program will capture 600 frames. At video rates of 30 FPS, this would
be 20 seconds, although cameras do vary on this.
for(i=0; i<600; i++)
{
frame = cvQueryFrame( camera ); // Get one frame.
if( !frame )
{
fprintf(stderr, "Capture failed.\n");
}
The following creates a short pause between frames. Without it, the images
come in too fast, and in many cases nothing is displayed. cvWaitKey waits for
a key press or for the time specified — in this case, 100 milliseconds.
c = cvWaitKey(100);
Display the image we just captured in the window.
// Display the current frame.
cvShowImage("video", frame);
If cvWaitKey actually caught a key press, this means that the image is to be
saved. If so, the character returned will be >0. Save it as a file in the AIPCV
directory.
if (c>0)
{
    sprintf(filename, "C:/AIPCV/VideoFrame%d.jpg", n++);
    if( !cvSaveImage(filename, frame) )
    {
        fprintf(stderr, "Failed to save frame as '%s'\n", filename);
    } else
        fprintf (stderr, "Saved frame as 'VideoFrame%d.jpg'\n", n-1);
}
}
Free the camera to avoid possible problems later.
cvReleaseCapture( &camera );
// Wait for terminating keypress.
cvWaitKey(0);
return 0;
}
The data from the camera will be displayed at a rate of 10 frames/second,
because the delay between frames (as specified by cvWaitKey) is 100 milli-
seconds, or 100/1000 = 0.1 seconds. This means that the frame rate can be
altered by changing this parameter, without exceeding the camera's natural
maximum. Increasing this parameter decreases the frame rate. An example of
how this program appears on the screen while running is given as Figure 1.3.
Figure 1.3: How the camera capture program looks on the screen. The image seems
static, but it is really live video.
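As an aside (a sketch, not part of the program above), moving toward the
camera's usual 30 frames/second maximum is just a matter of shortening the
wait:

c = cvWaitKey (33); // 33 ms between frames is about 30 frames/second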
1.4 Interfacing with the AIPCV Library
This book discusses many algorithms, almost all of which are provided in
source code form at the book’s corresponding website. To access the examples
and images on a PC, copy the directory AIPCV to the C: directory. Within
that directory are many C source files that implement the methods discussed
here. These programs are intended to be explanatory rather than efficient, and
represent another way, a very precise way, to explain an algorithm. These
programs comprise a library that uses a specific internal form for storing
image data that was intended for use with grey-level images. It is not directly
compatible with OpenCV, and so a conversion tool is needed.
OpenCV is not only exceptionally valuable for providing infrastructure to a
vision system, but it also provides a variety of image-processing and computer
vision functions. Many of these will be discussed in upcoming chapters (Canny
and Sobel edge detection, for example), but many of the algorithms described
here and provided in code form in the AIPCV library do not come with
OpenCV. How can the two systems be used together?
The key detail when using OpenCV is knowledge of how the image structure
is implemented. Thus, connecting OpenCV with the AIPCV library is largely a
matter of providing a way to convert between the image structures of the two
systems. This turns out to be quite simple for grey-level, one-channel images,
and more complex for color images.
The basic image structure in the AIPCV library consists of two structures: a
header and an image. The image structure, named simply image, consists of
two pointers: one to a header and one to an array of pixel data:
struct image
{
struct header *info; // Pointer to header
unsigned char **data; // Pointer to pixels
};
The pixel data is stored in the same way as for single-channel byte images
in OpenCV: as a block of bytes addressed in row major order. It is set up to
be indexed as a 2D array, however, so data is an array of pointers to rows.
The variable data[0] is a pointer to the beginning of the entire array, and so
is equivalent to IplImage.imageData.
The header is quite simple:
struct header
{
int nr, nc;
int oi, oj;
};
The field nr is the number of rows in the image, and nc is the number
of columns. These are equivalent to IplImage.height and IplImage.width,
respectively. The oi and oj fields specify the origin of the image, and are used
only for a very few cases (e.g., restoration). There are no corresponding fields
in OpenCV.
The way to convert an AIPCV image into an OpenCV image is now clear,
and is needed so that images can be displayed in windows and saved in
JPEG and other formats.
IplImage *toOpenCV (IMAGE x)
{
IplImage *img;
int i=0, j=0;
CvScalar s;
img=cvCreateImage(cvSize(x->info->nc,x->info->nr),8, 1);
for (i=0; i<x->info->nr; i++)
{
for (j=0; j<x->info->nc; j++)
{
s.val[0] = x->data[i][j];
cvSet2D (img, i,j,s);
}
}
return img;
}
This function copies the pixel values into a new IplImage. It is also possible
to use the original data array in the IplImage directly. There is some danger in
this, in that OpenCV may decide to free the storage, for instance, making both
versions inaccessible.
Converting from IplImage to AIPCV is more complicated, because OpenCV
images might be in color. If so, how is it converted into grey? We’ll not
dwell on this except to say that one color image can be converted into three
monochrome images (one each for red, green, and blue), or a color map could
be constructed using a one-byte index that could be used as the pixel value.
The solution presented here is to convert a 3-channel color image into grey by
averaging the RGB values, leaving the other solutions for future consideration.
IMAGE fromOpenCV (IplImage *x)
{
IMAGE img;
int color=0, i=0;
int k=0, j=0;
CvScalar s;
if ((x->depth==IPL_DEPTH_8U) && (x->nChannels==1)) // Grey image
    img = newimage (x->height, x->width);
else if ((x->depth==IPL_DEPTH_8U) && (x->nChannels==3)) // Color
{
color = 1;
img = newimage (x->height, x->width);
}
else return 0;
for (i=0; i<x->height; i++)
{
for (j=0; j<x->width; j++)
{
s = cvGet2D (x, i, j);
if (color)
k= (unsigned char) ((s.val[0]+s.val[1]+s.val[2])/3);
else k = (unsigned char)(s.val[0]);
img->data[i][j] = k;
}
}
return img;
}
The two functions toOpenCV and fromOpenCV do the job of allowing the
image-processing routines developed here to be used with OpenCV. As a
demonstration, here is the main routine only for a program that thresholds
an image using the method of grey-level histograms devised by Otsu and
presented in Chapter 4. It is very much like the program for thresholding
written earlier in Section 1.2.4, but instead uses the AIPCV library function
thr_glh to find the threshold and apply it.
int main(int argc, char *argv[])
{
IplImage* img=0;
IplImage* img2=0;
IMAGE x;
int height,width,step,channels;
uchar *data;
int mean=0,count=0;
if(argc<2){
    printf("Usage: main <image-file-name>\n");
    exit(0);
}
// load an image
img=cvLoadImage(argv[1]);
if(!img)
{
printf("Could not load image file: %s\n",argv[1]);
exit(0);
}
// get the image data
height = img->height;
width = img->width;
step = img->widthStep;
channels = img->nChannels;
data = (uchar *)img->imageData;
printf("Processing a %dx%d image with %d channels\n",
height,width,channels);
// create a window
cvNamedWindow("win1", CV_WINDOW_AUTOSIZE);
cvMoveWindow("win1", 100, 100);
// show the image
cvShowImage("win1", img );
// Convert to AIPCV IMAGE type
x = fromOpenCV (img);
if (x)
{
thr_glh (x);
img2 = toOpenCV (x); // Convert to OpenCV to display
cvNamedWindow( "thresh");
cvShowImage( "thresh", img2 );
cvSaveImage( "thresholded.jpg", img2 );
}
// wait for a key
cvWaitKey(0);
// release the image
cvReleaseImage(&img);
return 0;
}
In the remainder of this book, we will assume that OpenCV can be used for
image display and I/O and that the native processing functions of OpenCV
can be added to what has already been presented.
For convenience, the AIPCV library contains the following functions for
IO and display of its images directly to OpenCV:
display_image (IMAGE x)            Displays the specified image on the screen
save_image (IMAGE x, char *name)   Saves the image in a file with the given name
IMAGE get_image (char *name)       Reads the image in the named file and returns a pointer to it
IMAGE grab_image ()                Captures an image from an attached webcam and returns a pointer to it
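As a quick sketch of how these wrappers chain together (using the signatures
listed above; the file name is a placeholder):

IMAGE x = get_image ("C:/AIPCV/marchA062.jpg");
if (x)
{
    thr_glh (x);                        // Threshold, as in Section 1.4
    display_image (x);                  // Show the result in a window
    save_image (x, "thresholded.jpg");  // Write it to a file
}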
1.5 Website Files
The website associated with this book contains code and data associated with
each chapter, in addition to new information, errata, and other comments.
Readers should create a directory for this information on their PC called
C:\AIPCV. Within that, directories for each chapter can be named CH1, CH2, and
so on.
The following material created for this chapter will appear in C:\AIPCV\CH1:
capture.c Gets an image from a webcam
lib0.c A collection of OpenCV input/output/display functions
thr_glh.c Thresholds an image
1.6 References
Agam, Gady. ‘‘Introduction to Programming With OpenCV,’’ www.cs.iit.edu/
~agam/cs512/lect-notes/opencv-intro/opencv-intro.html (accessed January
27, 2006).
Bradski, Gary and Kaehler, Adrian. Learning OpenCV: Computer Vision with
the OpenCV Library. Sebastopol: O’Reilly Media Inc., 2008.
‘‘CV Reference Manual,’’ http://cognotics.com/opencv/docs/1.0/ref/
opencvref_cv.htm (accessed March 16, 2010).
‘‘cvCam Reference Manual,’’ www.cognotics.com/opencv/docs/1.0/cvcam.pdf
(accessed March 16, 2010).
‘‘CXCORE Reference Manual,’’ http://cognotics.com/opencv/docs/1.0/ref/
opencvref_cxcore.htm (accessed March 16, 2010).
‘‘Experimental and Obsolete Functionality Reference,’’ http://cognotics.com/
opencv/docs/1.0/ref/opencvref_cvaux.htm (accessed March 16, 2010).
‘‘HighGUI Reference Manual,’’ www.cognotics.com/opencv/docs/1.0/ref/
opencvref_highgui.htm (accessed March 16, 2010).
‘‘OpenCV Wiki-Pages,’’ http://opencv.willowgarage.com/wiki.
Otsu, N. ‘‘A Threshold Selection Method from Grey-Level Histograms,’’ IEEE
Transactions on Systems, Man, and Cybernetics 9, no. 1 (1979): 62–66.
Parker, J. R. Practical Computer Vision Using C. New York: John Wiley & Sons,
Inc., 1994.
C H A P T E R
2
Edge-Detection Techniques
2.1 The Purpose of Edge Detection
Edge detection is one of the most commonly used operations in image analysis,
and there are probably more algorithms in the literature for enhancing and
detecting edges than any other single subject. The reason for this is that edges
form the outline of an object, in the generic sense. Objects are subjects of interest
in image analysis and vision systems. An edge is the boundary between an
object and the background, and indicates the boundary between overlapping
objects. This means that if the edges in an image can be identified
accurately, all the objects can be located, and basic properties such as area,
perimeter, and shape can be measured. Since computer vision involves the
identification and classification of objects in an image, edge detection is an
essential tool.
Figure 2.1 illustrates a straightforward example of edge detection. The
original picture (a) shows three blobs, two of which overlap, on a uniform
grey background; the edge-enhanced version of the same image (b) has dark
lines outlining the three objects. Note that there is no way to tell which parts
of the image are background and which are object; only the boundaries between
the regions are identified. However, given that the blobs in the image are the
regions of interest, it can be determined that the blob numbered "3" covers up
a part of blob "2" and is therefore closer to the camera.
Edge detection is part of a process called segmentation — the identification
of regions within an image. The regions that may be objects in Figure 2.1 have
been isolated, and further processing may determine what kind of object each
region represents. While in this example edge detection is merely a step in the
segmentation process, it is sometimes all that is needed, especially when the
objects in an image are lines.
Figure 2.1: Example of edge detection. (a) Synthetic image with blobs on a grey
background. (b) Edge-enhanced image showing only the outlines of the objects.
Consider the image in Figure 2.2, which is a photograph of a cross-section
of a tree. The growth rings are the objects of interest in this image. Each ring
represents a year of the tree’s life, and the number of rings is therefore the same
as the age of the tree. Enhancing the rings using an edge detector, as shown
in Figure 2.2b, is all that is needed to segment the image into foreground
(objects = rings) and background (everything else).
Figure 2.2: A cross-section of a tree. (a) Original grey-level image. (b) Ideal edge-
enhanced image, showing the growth rings. (c) The edge enhancement that one might
expect using a real algorithm.
Technically, edge detection is the process of locating the edge pixels, and edge
enhancement is the process of increasing the contrast between the edges and
the background so that the edges become more visible. In practice, however, the
terms are used interchangeably, since most edge-detection programs also set
the edge pixel values to a specific grey level or color so that they can be easily
seen. In addition, edge tracing is the process of following the edges, usually
collecting the edge pixels into a list. This is done in a consistent direction,
either clockwise or counter-clockwise around the objects. Chain coding is one
example of a specific algorithm for edge tracing. The result is a non-raster
representation of the objects that can be used to compute shape measures or
otherwise identify or classify the object.
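For example, an 8-connected chain code records each step from one boundary
pixel to the next as a direction code between 0 and 7. The following sketch
shows one common convention (illustrative only; not necessarily the encoding
used later in this book):

/* One common 8-connected chain-code convention: code 0 moves one
   pixel east, and the codes increase counter-clockwise. A traced
   boundary is stored as a start pixel plus a list of codes.      */
#include <stdio.h>

static const int dx[8] = { 1,  1,  0, -1, -1, -1,  0,  1 };
static const int dy[8] = { 0, -1, -1, -1,  0,  1,  1,  1 };

int main (void)
{
    /* Hypothetical chain code for a small square boundary,
       starting at pixel (row 5, column 5).                  */
    int code[8] = { 0, 0, 6, 6, 4, 4, 2, 2 };
    int r = 5, c = 5, k;

    for (k = 0; k < 8; k++)
    {
        r += dy[code[k]];
        c += dx[code[k]];
        printf ("code %d -> (%d,%d)\n", code[k], r, c);
    }
    return 0;
}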
The remainder of this chapter discusses the theory of edge detection,
including a collection of traditional methods. This includes the Canny edge
detector and the Shen-Castan, or ISEF, edge detector. Both are based solidly on
theoretical considerations, and both claim a degree of optimality; that is, both
claim to be the best that can be done under certain specified circumstances.
These claims will be examined, both in theory and in practice.
2.2 Traditional Approaches and Theory
Most good algorithms begin with a clear statement of the problem to be solved,
and a cogent analysis of the possible methods of solution and the conditions
under which the methods will operate correctly. Using this paradigm, to define
an edge-detection algorithm means first defining what an edge is, and then
using this definition to suggest methods of enhancement and identification.
As usual, there are a number of possible definitions of an edge, each being
applicable in various specific circumstances. One of the most common and
most general definitions is the ideal step edge, illustrated in Figure 2.3.
In this one-dimensional example, the edge is simply a change in grey level
occurring at one specific location. The greater the change in level, the easier
the edge is to detect (although in the ideal case, any level change can be seen
quite easily).
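In one dimension, such an edge can be modeled as E(x) = b + h·u(x − x0), where
b is the background grey level, h is the height of the step, x0 is the position
of the edge, and u is the unit step function (0 for negative arguments and 1
otherwise).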
The first complication occurs because of digitization. It is unlikely that the
image will be sampled in such a way that all the edges happen to correspond
exactly with a pixel boundary. Indeed, the change in level may extend across
some number of pixels (Figures 2.3b–d). The actual position of the edge is
considered to be the center of the ramp connecting the low grey level to the
high one. This is a ramp in the mathematical world only, since after the image
has been made digital (sampled), the ramp has the jagged appearance of a
staircase.
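A small illustrative sketch makes this concrete, building a ramp edge like that
of Figure 2.3b and then sampling it to integer grey levels (the sizes and
levels here are chosen to match the figure, not taken from the book's code):

/* Illustrative sketch: a ramp edge before and after sampling.
   The ramp rises from level 0 to level 10 across pixels 8-12,
   centered at the edge position (pixel 10); quantizing it to
   integer grey levels produces the staircase appearance.      */
#include <stdio.h>

#define N 20

int main (void)
{
    int i;
    float ramp;

    for (i = 0; i < N; i++)
    {
        if (i <= 8)       ramp = 0.0f;
        else if (i >= 12) ramp = 10.0f;
        else              ramp = (i - 8) * 2.5f;
        printf ("%2d: ramp=%5.2f sampled=%2d\n", i, ramp, (int)(ramp + 0.5f));
    }
    return 0;
}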
The second complication is the ubiquitous problem of noise. Due to a
great many factors such as light intensity, type of camera and lens, motion,
temperature, atmospheric effects, dust, and others, it is very unlikely that two
pixels that correspond to precisely the same grey level in the scene will have
the same level in the image. Noise is a random effect and can be characterized
only statistically. The result of noise on the image is to produce a random
variation in level from pixel to pixel, and so the smooth lines and ramps of the
ideal edges are never encountered in real images.
[Figure 2.3 consists of four panels, (a) through (d), each plotting Grey Level
(0 to 10) against Position (0 to 20), with the edge position marked on each plot.]
Figure 2.3: Step edges. (a) The change in level occurs exactly at pixel 10. (b) The same
level change as before, but over 4 pixels centered at pixel 10. This is a ramp edge.
(c) Same level change but over 10 pixels, centered at 10. (d) A smaller change over 10
pixels. The insert shows the way the image would appear, and the dotted line shows
where the image was sliced to give the illustrated cross-section.
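Because noise can be characterized only statistically, its effect is easy to
simulate: perturb each pixel of an ideal profile by a small random amount. A
minimal sketch (using the C library's rand() as a crude stand-in for a proper
noise model):

/* Sketch: perturb an ideal step edge with uniform random noise to
   show why the smooth profiles of Figure 2.3 are never seen in
   real images. rand() is only a stand-in for a real noise model.  */
#include <stdio.h>
#include <stdlib.h>

#define N 20

int main (void)
{
    int i;
    float v, noise;

    srand (12345);                   /* fixed seed so output is repeatable */
    for (i = 0; i < N; i++)
    {
        v = (i < 10) ? 0.0f : 10.0f; /* ideal step edge at pixel 10 */
        noise = ((float) rand () / RAND_MAX - 0.5f) * 2.0f; /* in [-1, 1] */
        printf ("%2d: %6.2f\n", i, v + noise);
    }
    return 0;
}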
2.2.1 Models of Edges
The step edge of Figure 2.3a is ideal because it is easy to detect: In the absence
of noise, any significant change in grey level would indicate an edge. A step
edge never really occurs in an image because: a) objects rarely have such a
sharp outline; b) a scene is never sampled so that edges occur exactly at the
margin of a pixel; and c) noise, as mentioned previously, prevents a perfectly
clean change in level.