Innovative Research in Attention Modeling and Computer Vision Applications
Rajarshi Pal
Institute for Development and Research in Banking Technology, India
A volume in the Advances in Computational
Intelligence and Robotics (ACIR) Book Series
Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue
Hershey PA, USA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com
Copyright © 2016 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in
any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.
Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or
companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.
Library of Congress Cataloging-in-Publication Data
Innovative research in attention modeling and computer vision applications / Rajarshi Pal, editor.
pages cm
Includes bibliographical references and index.
ISBN 978-1-4666-8723-3 (hardcover) -- ISBN 978-1-4666-8724-0 (ebook) 1. Computer vision. 2. Robot vision. 3. Tracking (Engineering) I. Pal, Rajarshi, 1982-
TA1634.I5456 2015
006.3'7--dc23
2015019731
British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.
All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.
For electronic access to this publication, please contact: eresources@igi-global.com.
This book is published in the IGI Global book series Advances in Computational Intelligence and Robotics (ACIR) (ISSN:
2327-0411; eISSN: 2327-042X)
The Advances in Computational Intelligence and Robotics (ACIR) Book Series (ISSN 2327-0411) is published by IGI Global, 701 E.
Chocolate Avenue, Hershey, PA 17033-1240, USA, www.igi-global.com. This series is composed of titles available for purchase individually;
each title is edited to be contextually exclusive from any other title within the series. For pricing and ordering information please visit http://
www.igi-global.com/book-series/advances-computational-intelligence-robotics/73674. Postmaster: Send all address changes to above address.
Copyright © 2016 IGI Global. All rights, including translation in other languages reserved by the publisher. No part of this series may be
reproduced or used in any form or by any means – graphics, electronic, or mechanical, including photocopying, recording, taping, or informa-
tion and retrieval systems – without written permission from the publisher, except for non commercial, educational use, including classroom
teaching purposes. The views expressed in this series are those of the authors, but not necessarily of IGI Global.
IGI Global is currently accepting manuscripts for publication within this series. To submit a proposal for a volume in this series, please contact our Acquisition Editors at Acquisitions@igi-global.com or visit: http://www.igi-global.com/publish/.
Advances in Computational Intelligence and Robotics (ACIR) Book Series
ISSN: 2327-0411
EISSN: 2327-042X
Mission
While intelligence is traditionally a term applied to humans and human cognition, technology has progressed in such a way to allow for the development of intelligent systems able to simulate many human traits. With this new era of simulated and artificial intelligence, much research is needed in order to continue to advance the field and also to evaluate the ethical and societal concerns of the existence of artificial life and machine learning.
The Advances in Computational Intelligence and Robotics (ACIR) Book Series encourages scholarly discourse on all topics pertaining to evolutionary computing, artificial life, computational intelligence, machine learning, and robotics. ACIR presents the latest research being conducted on diverse topics in intelligence technologies with the goal of advancing knowledge and applications in this rapidly evolving field.
Coverage
• Evolutionary computing
• Artificial life
• Cognitive Informatics
• Natural Language Processing
• Intelligent control
• Artificial Intelligence
• Computational Intelligence
• Fuzzy Systems
• Computational Logic
Titles in this Series
For a list of additional titles in this series, please visit: www.igi-global.com
Handbook of Research on Emerging Perspectives in Intelligent Pattern Recognition, Analysis, and Image Processing
Narendra Kumar Kamila (C.V. Raman College of Engineering, India)
Information Science Reference • copyright 2016 • 484pp • H/C (ISBN: 9781466686540) • US $255.00 (our price)
Research Advances in the Integration of Big Data and Smart Computing
Pradeep Kumar Mallick (Institute for Research and Development, India)
Information Science Reference • copyright 2016 • 324pp • H/C (ISBN: 9781466687370) • US $210.00 (our price)
Handbook of Research on Swarm Intelligence in Engineering
Siddhartha Bhattacharyya (RCC Institute of Information Technology, India) and Paramartha Dutta (Visva-Bharati
University, India)
Engineering Science Reference • copyright 2015 • 744pp • H/C (ISBN: 9781466682917) • US $335.00 (our price)
Handbook of Research on Advancements in Robotics and Mechatronics
Maki K. Habib (The American University in Cairo, Egypt)
Engineering Science Reference • copyright 2015 • 993pp • H/C (ISBN: 9781466673878) • US $515.00 (our price)
Handbook of Research on Advanced Intelligent Control Engineering and Automation
Ahmad Taher Azar (Benha University, Egypt) and Sundarapandian Vaidyanathan (Vel Tech University, India)
Engineering Science Reference • copyright 2015 • 795pp • H/C (ISBN: 9781466672482) • US $335.00 (our price)
Handbook of Research on Artificial Intelligence Techniques and Algorithms
Pandian Vasant (Universiti Teknologi Petronas, Malaysia)
Information Science Reference • copyright 2015 • 873pp • H/C (ISBN: 9781466672581) • US $495.00 (our price)
Handbook of Research on Synthesizing Human Emotion in Intelligent Systems and Robotics
Jordi Vallverdú (Universitat Autònoma de Barcelona, Spain)
Information Science Reference • copyright 2015 • 469pp • H/C (ISBN: 9781466672789) • US $245.00 (our price)
Recent Advances in Ambient Intelligence and Context-Aware Computing
Kevin Curran (University of Ulster, UK)
Information Science Reference • copyright 2015 • 376pp • H/C (ISBN: 9781466672840) • US $225.00 (our price)
Recent Advances in Intelligent Technologies and Information Systems
Vijayan Sugumaran (Oakland University, USA & Sogang University, Seoul, Korea)
Information Science Reference • copyright 2015 • 309pp • H/C (ISBN: 9781466666399) • US $200.00 (our price)
701 E. Chocolate Ave., Hershey, PA 17033
Order online at www.igi-global.com or call 717-533-8845 x100
To place a standing order for titles released in this series, contact: cust@igi-global.com
Mon-Fri 8:00 am - 5:00 pm (est) or fax 24 hours a day 717-533-8661


Table of Contents

Preface .................................................................................................................................................. xv
Section 1
Visual Attention Modeling and Applications
Chapter 1
2D and 3D Visual Attention for Computer Vision: Concepts, Measurement, and Modeling................. 1
Vincent Ricordel, University of Nantes, France
Junle Wang, University of Nantes, France
Matthieu Perreira Da Silva, University of Nantes, France
Patrick Le Callet, University of Nantes, France
Chapter 2
Applications of Visual Attention in Image Processing, Computer Vision, and Graphics..................... 45
Rajarshi Pal, Institute for Development and Research in Banking Technology (IDRBT)
Hyderabad, India
Chapter 3
Biologically-Inspired Models for Attentive Robot Vision: A Review................................................... 69
Amirhossein Jamalian, Technical University of Chemnitz, Germany
Fred H. Hamker, Technical University of Chemnitz, Germany
Chapter 4
Visual Attention Guided Object Detection and Tracking...................................................................... 99
Debi Prosad Dogra, Indian Institute of Technology Bhubaneswar, India
Chapter 5
Content-Aware Image Retargeting: A Survey ..................................................................................... 115
Rajarshi Pal, Institute for Development and Research in Banking Technology, India
Prasun Chandra Tripathi, Institute for Development and Research in Banking Technology,
India & University of Hyderabad, India

Chapter 6
Video Saliency Detection for Visual Cryptography-Based Watermarking......................................... 132
Adrita Barari, Defence Institute of Advanced Technology, India
Sunita V. Dhavale, Defence Institute of Advanced Technology, India
Chapter 7
Study of Loss of Alertness and Driver Fatigue Using Visibility Graph Synchronization................... 171
Anwesha Sengupta, Indian Institute of Technology Kharagpur, India
Sibsambhu Kar, Samsung India Software Operations, India
Aurobinda Routray, Indian Institute of Technology Kharagpur, India
Section 2
Other Computer Vision Applications
Chapter 8
A Generic Design for Implementing Intersection between Triangles in Computer Vision and
Spatial Reasoning ................................................................................................................................ 195
Chaman L. Sabharwal, Missouri University of Science and Technology, USA
Jennifer L. Leopold, Missouri University of Science and Technology, USA
Chapter 9
Multiple Object Tracking by Scale Space Representation of Objects, Method of Linear
Assignment, and Kalman Filter........................................................................................................... 236
Kumar S. Ray, Indian Statistical Institute, India
Soma Ghosh, Indian Statistical Institute, India
Kingshuk Chatterjee, Indian Statistical Institute, India
Debayan Ganguly, Indian Statistical Institute, India
Chapter 10
Digital Forensics: State-of-the-Art and Open Problems...................................................................... 260
Ruchira Naskar, National Institute of Technology Rourkela, India
Pankaj Malviya, National Institute of Technology Rourkela, India
Rajat Subhra Chakraborty, Indian Institute of Technology Kharagpur, India
Chapter 11
Passive Video Tampering Detection Using Noise Features ................................................................ 279
Ramesh Chand Pandey, Indian Institute of Technology (BHU), Varanasi, India
Sanjay Kumar Singh, Indian Institute of Technology (BHU), Varanasi, India
K. K. Shukla, Indian Institute of Technology (BHU), Varanasi, India
Chapter 12
A Survey on Palmprint-Based Biometric Recognition System........................................................... 304
Y. L. Malathi Latha, Swami Vivekananda Institute of Technology (SVIT), India
Munaga V. N. K. Prasad, Institute for Development and Research in Banking Technology,
India

Chapter 13
Emotion Recognition Using Facial Expression................................................................................... 327
Santosh Kumar, Indian Institute of Technology (BHU), Varanasi, India
Shubam Jaiswal, Indian Institute of Technology (BHU), Varanasi, India
Rahul Kumar, Indian Institute of Technology (BHU), Varanasi, India
Sanjay Kumar Singh, Indian Institute of Technology (BHU), Varanasi, India
Chapter 14
Facial Expression Analysis Using 3D Range Images.......................................................................... 346
Parama Bagchi, RCC Institute of Information Technology, India
Debotosh Bhattacharjee, Jadavpur University, India
Mita Nasipuri, Jadavpur University, India
Chapter 15
Scalable Video Watermarking: A Survey............................................................................................ 365
Nilkanta Sahu, Indian Institute of Technology Guwahati, India
Arijit Sur, Indian Institute of Technology Guwahati, India
Chapter 16
Digital Image Watermarking Based on Fractal Image Coding............................................................ 388
Channapragada R. S. G. Rao, CMR Institute of Technology, India
Munaga V. N. K. Prasad, Institute for Development and Research in Banking Technology,
India
Compilation of References................................................................................................................ 400
About the Contributors..................................................................................................................... 446
Index.................................................................................................................................................... 454

Detailed Table of Contents

Preface .................................................................................................................................................. xv
Section 1
Visual Attention Modeling and Applications
Chapter 1
2D and 3D Visual Attention for Computer Vision: Concepts, Measurement, and Modeling................. 1
Vincent Ricordel, University of Nantes, France
Junle Wang, University of Nantes, France
Matthieu Perreira Da Silva, University of Nantes, France
Patrick Le Callet, University of Nantes, France
Visual attention is one of the most important mechanisms deployed in the human visual system (HVS)
to reduce the amount of information that our brain needs to process. An increasing amount of effort has been dedicated to the study of visual attention, and this chapter proposes to clarify the advances achieved in computational modeling of visual attention. First the concepts of visual attention, including the links between visual salience and visual importance, are detailed. The main characteristics of the HVS involved in the process of visual perception are also explained. Next we focus on eye-tracking, because of its role in the evaluation of the performance of the models. A complete state of the art in computational modeling of visual attention is then presented. The research works that extend some visual attention models to 3D by taking into account the impact of depth perception are finally explained and compared.
Chapter 2
Applications of Visual Attention in Image Processing, Computer Vision, and Graphics..................... 45
Rajarshi Pal, Institute for Development and Research in Banking Technology (IDRBT)
Hyderabad, India
Selective visual attention is an amazing capability of the primate visual system to restrict the focus to a few interesting objects (or portions) in a scene. Thus, primates are able to pay attention to the required visual content amidst myriads of other visual information. It enables them to interact with the external environment in real time through reduction of the computational load in their brain. This inspires image and computer vision scientists to derive computational models of visual attention and to use them in a variety of real-life applications, mainly to speed up processing by reducing the computational burden that often characterizes image processing and vision tasks. This chapter discusses a wide variety of such applications of visual attention models in image processing, computer vision, and graphics.

Chapter 3
Biologically-Inspired Models for Attentive Robot Vision: A Review................................................... 69
Amirhossein Jamalian, Technical University of Chemnitz, Germany
Fred H. Hamker, Technical University of Chemnitz, Germany
A rich stream of visual data enters the cameras of a typical artificial vision system (e.g., a robot) and, considering the fact that processing this volume of data in real-time is almost impossible, a clever mechanism is required to reduce the amount of trivial visual data. Visual Attention might be the solution. The idea is to control the information flow and thus to improve vision by focusing the resources merely on some special aspects instead of the whole visual scene. However, does attention only speed up processing or
can the understanding of human visual attention provide additional guidance for robot vision research? In
this chapter, first, some basic concepts of the primate visual system and visual attention are introduced.
Afterward, a new taxonomy of biologically-inspired models of attention, particularly those that are used
in robotics applications (e.g., in object detection and recognition) is given and finally, future research
trends in modelling of visual attention and its applications are highlighted.
Chapter 4
Visual Attention Guided Object Detection and Tracking...................................................................... 99
Debi Prosad Dogra, Indian Institute of Technology Bhubaneswar, India
Scene understanding and object recognition heavily depend on the success of visual attention guided salient region detection in images and videos. Therefore, summarizing computer vision techniques that take the help of visual attention models to accomplish video object recognition and tracking can be helpful to researchers in the computer vision community. This chapter aims to present a philosophical overview of the possible applications of visual attention models in the context of object recognition and tracking. At the beginning of the chapter, a brief introduction to various visual saliency models suitable for object recognition is presented, followed by discussions on possible applications of attention models to video object tracking. The chapter also provides a commentary on the existing techniques in this domain and discusses some of their possible extensions. Prospective readers will benefit, since the chapter comprehensively guides a reader to understand the pros and cons of this particular topic.
Chapter 5
Content-Aware Image Retargeting: A Survey ..................................................................................... 115
Rajarshi Pal, Institute for Development and Research in Banking Technology, India
Prasun Chandra Tripathi, Institute for Development and Research in Banking Technology,
India & University of Hyderabad, India
Displaying a large image on the small screen of a handheld gadget is a challenging task. Simple down-scaling of the image may render some objects too small to be perceptible. This gives rise to content-aware retargeting of the image. Important contents are allotted more screen space as compared to relatively less important contents of the image. Various types of content-aware image retargeting approaches have been proposed in a span of just over a decade. Another challenging area is to estimate the importance of the contents. A lot of research has been carried out in this direction, too, to identify the important contents in the context of image retargeting. An equally important aspect is the evaluation of these retargeting methods. This article contains a brief survey of related research in all of these aspects.

Chapter 6
Video Saliency Detection for Visual Cryptography-Based Watermarking......................................... 132
Adrita Barari, Defence Institute of Advanced Technology, India
Sunita V. Dhavale, Defence Institute of Advanced Technology, India
The aim of this chapter is to review the application of the technique of Visual Cryptography in non-intrusive video watermarking. The power of saliency feature extraction is also highlighted in the context of Visual Cryptography based watermarking systems for videos. All schemes in the literature related to Visual Cryptography based video watermarking have been brought together, with special attention on the role of saliency feature extraction in each of these schemes. Further, a novel approach for VC-based video watermarking using motion vectors (MVP Algorithm) as a salient feature is suggested. Experimental results show the robustness of the proposed MVP Algorithm against various video processing attacks. Also, compression scale invariance is achieved.
Chapter 7
Study of Loss of Alertness and Driver Fatigue Using Visibility Graph Synchronization................... 171
Anwesha Sengupta, Indian Institute of Technology Kharagpur, India
Sibsambhu Kar, Samsung India Software Operations, India
Aurobinda Routray, Indian Institute of Technology Kharagpur, India
Electroencephalogram (EEG) is widely used to predict performance degradation of human subjects
due to mental or physical fatigue. Lack of sleep or insufficient quality or quantity of sleep is one of the
major reasons of fatigue. Analysis of fatigue due to sleep deprivation using EEG synchronization is a
promising field of research. The present chapter analyses advancing levels of fatigue in human drivers in
a sleep-deprivation experiment by studying the synchronization between EEG data. A Visibility Graph
Similarity-based method has been employed to quantify the synchronization, which has been formulated
in terms of a complex network. The change in the parameters of the network has been analyzed to find
the variation of connectivity between brain areas and hence to trace the increase in fatigue levels of the
subjects. The parameters of the brain network have been compared with those of a complex network with
a random degree of connectivity to establish the small-world nature of the brain network.
Section 2
Other Computer Vision Applications
Chapter 8
A Generic Design for Implementing Intersection between Triangles in Computer Vision and
Spatial Reasoning ................................................................................................................................ 195
Chaman L. Sabharwal, Missouri University of Science and Technology, USA
Jennifer L. Leopold, Missouri University of Science and Technology, USA
The intersection between 3D objects plays a prominent role in spatial reasoning and computer vision. Detection of intersection between objects can be based on the triangulated boundaries of the objects, leading to computing triangle-triangle intersection. Traditionally there are separate algorithms for cross and coplanar intersection. For qualitative reasoning, intersection detection is sufficient, and the actual intersection is not necessary; in contrast, the precise intersection is required for geometric modeling. Herein we present a complete design and implementation of a single integrated algorithm independent of the type of intersection. Additionally, this algorithm first detects, then intersects and classifies the intersections using barycentric coordinates. This work is directly applicable to: (1) VRCC-3D+, which uses intersection detection between 3D objects as well as their 2D projections, essential for occlusion detection; and (2) CAD/CAM geometric modeling, where curves of intersection between a pair of surfaces are required for numerical control machines. Experimental results are provided.
Chapter 9
Multiple Object Tracking by Scale Space Representation of Objects, Method of Linear
Assignment, and Kalman Filter........................................................................................................... 236
Kumar S. Ray, Indian Statistical Institute, India
Soma Ghosh, Indian Statistical Institute, India
Kingshuk Chatterjee, Indian Statistical Institute, India
Debayan Ganguly, Indian Statistical Institute, India
This chapter presents a multi-object tracking system using scale space representation of objects, the method of linear assignment, and the Kalman filter. Two very prominent problems of multi-object tracking are resolved in this chapter: (i) tracking all the moving objects simultaneously, irrespective of the size of the objects, and (ii) tracking of objects under partial and/or complete occlusion. The primary task of tracking multiple objects is performed by the method of linear assignment, for which a few cost parameters are computed depending upon the extracted features of moving objects in the video scene. In the feature extraction phase, a scale space representation of objects has been used. Tracking of occluded objects is performed by the Kalman filter.
Chapter 10
Digital Forensics: State-of-the-Art and Open Problems...................................................................... 260
Ruchira Naskar, National Institute of Technology Rourkela, India
Pankaj Malviya, National Institute of Technology Rourkela, India
Rajat Subhra Chakraborty, Indian Institute of Technology Kharagpur, India
Digital forensics deals with cyber crime detection from digital multimedia data. In the present day, multimedia data such as images and videos are major sources of evidence in the courts of law worldwide. However, the immense proliferation and easy availability of low-cost or free, user-friendly and powerful image and video processing software poses the largest threat to today's digital world as well as the legal industry. This is due to the fact that such software allows efficient image and video editing, manipulation, and synthesis, with a few mouse clicks even by a novice user. Such software also enables the creation of realistic computer-generated images. In this chapter, we discuss different types of digital image forgeries and state-of-the-art digital forensic techniques to detect them. Through these discussions, we also give an idea of the challenges and open problems in the field of digital forensics.
Chapter 11
Passive Video Tampering Detection Using Noise Features ................................................................ 279
Ramesh Chand Pandey, Indian Institute of Technology (BHU), Varanasi, India
Sanjay Kumar Singh, Indian Institute of Technology (BHU), Varanasi, India
K. K. Shukla, Indian Institute of Technology (BHU), Varanasi, India
With the increasing availability of low-cost video editing software and tools, the authenticity of digital video can no longer be trusted. Active video tampering detection techniques utilize a digital signature or digital watermark for video tampering detection, but when a video does not include such a signature, it is very challenging to detect tampering in it. To detect tampering in such videos, passive video tampering detection techniques are required. In this chapter we explain passive video tampering detection using noise features. When a video is captured with a camera, it passes through a camera processing pipeline, and this introduces noise in the video. Noise changes abruptly from authentic to forged frame blocks and provides a clue for video tampering detection. For extracting the noise we have considered different techniques such as denoising algorithms, wavelet-based denoising filters, and neighbor prediction.
Chapter 12
A Survey on Palmprint-Based Biometric Recognition System........................................................... 304
Y. L. Malathi Latha, Swami Vivekananda Institute of Technology (SVIT), India
Munaga V. N. K. Prasad, Institute for Development and Research in Banking Technology,
India
The automatic use of physiological or behavioral characteristics to determine or verify the identity of individuals is regarded as biometrics. Fingerprints, iris, face, and palmprints are considered physiological biometrics, whereas voice and signature are behavioral biometrics. Palmprint recognition is one of the popular methods that have been investigated over the last fifteen years. Palmprints have a very large internal surface and contain several unique, stable characteristic features used to identify individuals. Several palmprint recognition methods have been extensively studied. This chapter is an attempt to review current palmprint research, describing image acquisition, preprocessing, palmprint feature extraction and matching, palmprint-related fusion, and techniques used for real-time palmprint identification in large databases. Various palmprint recognition methods are compared.
Chapter 13
Emotion Recognition Using Facial Expression................................................................................... 327
Santosh Kumar, Indian Institute of Technology (BHU), Varanasi, India
Shubam Jaiswal, Indian Institute of Technology (BHU), Varanasi, India
Rahul Kumar, Indian Institute of Technology (BHU), Varanasi, India
Sanjay Kumar Singh, Indian Institute of Technology (BHU), Varanasi, India
Recognition of facial expressions is a challenging problem for machines in comparison to humans, and it has encouraged numerous advanced machine learning algorithms. It is one of the methods for emotion recognition, as the emotion of a particular person can be found out by studying his or her facial expressions. In this chapter, we propose a generic algorithm for recognition of emotions and illustrate the fundamental steps of four algorithms, namely Eigenfaces (Principal Component Analysis [PCA]), Fisherfaces, Local Binary Pattern Histogram (LBPH), and SURF with FLANN, over two databases, the Cohn-Kanade database and the IIT BHU student face images, as benchmark databases. The objective of this book chapter is to recognize the emotions from facial images of individuals and compare the performances of holistic algorithms like Eigenfaces and Fisherfaces, the texture-based recognition algorithm LBPH, and the hybrid algorithm SURF with FLANN. Matching efficiencies of individual emotions from the facial expression databases are labeled for the training and testing phases. The set of features is extracted from the labeled dataset for training purposes, and test images are matched with a discriminative set of feature points. Based on that comparison, we conclude that Eigenfaces and Fisherfaces yield better recognition accuracy on the benchmark databases than the others, and that the efficiency of the SURF with FLANN algorithm can be enhanced significantly by changing its parameters.

Chapter 14
Facial Expression Analysis Using 3D Range Images.......................................................................... 346
Parama Bagchi, RCC Institute of Information Technology, India
Debotosh Bhattacharjee, Jadavpur University, India
Mita Nasipuri, Jadavpur University, India
This proposed work deals with the uses and techniques of 3D range images for facial expression recognition. A 3D range image is basically a depth image (also called a 2.5D image), which contains depth information at each (x, y) pixel of the image. In the future, computer vision will become a part of our everyday life because of all of its extensive applications. Hence, the interactions between users and computers need to be more natural, emphasizing as well as enumerating human-to-human communication to a larger extent. That is the reason why facial expressions find importance. Facial expression is an important factor of communication, and expressions reveal unknown facts about a person's feelings and emotions. From there comes the need for a real facial expression detection system. Also, changes in expression are of great importance for the interpretation of human facial behavior as well as face recognition.
Chapter 15
Scalable Video Watermarking: A Survey............................................................................................ 365
Nilkanta Sahu, Indian Institute of Technology Guwahati, India
Arijit Sur, Indian Institute of Technology Guwahati, India
In recent times, enormous advancements in communication as well as hardware technologies have made video communication very popular. With the increasing diversity among end-user media players and their associated network bandwidths, the requirements of video streams with respect to quality, resolution, and frame rate become more heterogeneous. This increasing heterogeneity makes the scalable adaptation of the video stream at the receiver end a real problem. Scalable video coding (SVC) has been introduced as a countermeasure to this practical problem, where the main video stream is designed in such a hierarchical fashion that a set of independent bit streams can be produced as per the requirements of different end-user devices. SVC has become very popular in recent times and, consequently, efficient and secure transmission of scalable video streams has become a requirement. Watermarking has been considered an efficient DRM tool for almost a decade. Although video watermarking is regarded as a well-focused research domain, very little attention has been paid to scalable watermarking in recent times. In this book chapter, a comprehensive survey of scalable video watermarking is presented. The main objective of this survey work is to analyse the robustness of different existing video watermarking schemes against scalable video adaptation and to try to define the research problems for the same. Firstly, a few existing scalable image watermarking schemes are discussed to understand the advantages and limitations of the direct extension of such schemes to frame-by-frame video watermarking. Similarly, a few video watermarking and some recent scalable video watermarking schemes are also described by specifying their pros and cons. Finally, a summary of this survey is presented by pointing out possible countermeasures to the existing problems.

Chapter 16
Digital Image Watermarking Based on Fractal Image Coding............................................................ 388
Channapragada R. S. G. Rao, CMR Institute of Technology, India
Munaga V. N. K. Prasad, Institute for Development and Research in Banking Technology,
India
This chapter proposes a watermarking technique using Ridgelet and Discrete Wavelet Transform (DWT) techniques. A wavelet transform is a representation of a signal in terms of wavelet functions. A wavelet is a mathematical function which divides a continuous-time signal into different scale components, where each scale component is assigned a frequency range. Wavelets represent objects with point singularities, while ridgelets represent objects with line singularities. The ridgelet transform technique is a multi-scale representation for functions on continuous spaces that are smooth away from discontinuities along lines. The proposed technique applies the ridgelet transform on the cover image to obtain ridgelet coefficients. These coefficients are transformed by using 2-level DWT to get low frequency sub-bands, LL1 and LL2. The mutual similarities between the LL1 and LL2 sub-bands are considered for embedding the watermark. The obtained watermarked image has better quality when compared to a few existing methods.
Compilation of References................................................................................................................ 400
About the Contributors..................................................................................................................... 446
Index.................................................................................................................................................... 454


Preface

The last few decades have witnessed a tremendous growth of image processing and computer vision. Improvement in sensor technologies, coupled with efficient and effective algorithm development for image processing and computer vision, has led to widespread usage of images and videos in several areas, ranging from secret sharing through visual cryptography to surveillance and robotics. These latter two require real-time intelligence to analyze surrounding images. Even image-based technologies have been in use in sensitive domains like banking. For example, image-based cheque clearing in the banking system has emerged as a new phenomenon, so the integrity of images has become a concern. In fact, it is not possible to codify all possible aspects of image processing and computer vision theories and their applications within a single book.
This edited book is an effort to bring together a few of the application areas of image processing and computer vision. Many of these applications benefit from theories of a psycho-visual phenomenon called selective visual attention. Visual attention is a psycho-visual mechanism through which primates select only a few of the incoming sensory inputs for deeper analysis and recognition in the brain. Though primate brains are very efficient, they cannot process the entire spectrum of sensory information in real time. In order to carry out real-time interaction with the external surroundings, primates pay attention to only selected stimuli, discarding myriads of other inputs. Which portions of a visual scene they focus on has become a research problem. It has been observed that primate vision is guided by two kinds of attention mechanisms – bottom-up and top-down. The bottom-up mechanism is purely stimulus-driven. For example, a bright color tends to attract our vision. The top-down mechanism is a task-driven approach. It is guided by the task which an individual is performing.
Computer vision tasks also feel the burden of processing bulk amounts of image or video data. The capability of primates in efficiently dealing with dynamic surroundings through the selective attention mechanism motivates computer vision researchers to develop efficient computer vision systems. They emulate this attention mechanism of primate vision to find portions of importance in the visual scene. Processing based on only these selected portions of the scene increases speed as well as the quality of the outcome.
Besides these attention-guided applications, many other interesting research areas in computer vision and image processing have also been presented in this book. Thus, this book is a compilation of research efforts in a wide spectrum of areas. Alongside highlighting the usefulness of visual attention models in image processing and computer vision, this book sets up a platform for cross-disciplinary exchange of knowledge. Hopefully, these discussions will further open up new research directions.
This book contains a few survey articles which give a good overview of the research that has been, or is still being, carried out in some of these fields. Putting a good amount of existing research outcomes under one umbrella, these chapters are able to spot the lacunae which are still present there. A few new techniques have also been proposed in some of the chapters of this book to bring forward recent innovations.
This book has been divided into two sections. The first section, which consists of seven chapters
(Chapter 1 to Chapter 7), speaks about visual attention models and how various applications in computer
vision and image processing benefit from these visual attention models. The second section of this book
(Chapter 8 to Chapter 16) discusses numerous other techniques in computer vision and image processing.
Chapter 1 discusses the conceptual theories related to the visual attention mechanism. Then, it discusses various visual attention models. Both 2D and 3D models of visual attention are presented. Chapter 2 highlights the usefulness of visual attention in image processing, computer vision, and graphics through a thorough listing of applications which benefit from theories of visual attention. Chapter 3 to Chapter 8, then, discuss a few of these application areas in detail. Chapter 3 sketches how attention is an important topic for robotic vision. A review of biologically-inspired models of attentive robot vision is presented in this chapter. Visual attention guided object detection and tracking is discussed in Chapter 4. Chapter 5 is a compilation of phenomenal research efforts in the area of content-aware (based on identification of attentive regions) image retargeting. Image retargeting is useful in fitting an image of sufficiently high resolution into relatively small displays. Chapter 6 showcases the power of saliency for visual cryptography based watermarking. It proposes a novel visual cryptography based watermarking scheme using motion vectors as a salient feature. Chapter 7 reports an electroencephalogram (EEG) based study of loss of alertness and fatigue of a driver using a visibility graph synchronization approach.
The second section (Chapter 8 to Chapter 16) of this book suggests that there is a lot more in image processing and computer vision that does not intersect with visual attention theories. Chapter 8 describes a generic implementation for triangle-triangle intersection and its applications. Chapter 9 explains how some of the important problems in multi-object tracking can be tackled using scale-space representation of the objects, the method of linear assignment, and the Kalman filter. Chapter 10 discusses state-of-the-art research in digital image forensics and highlights a few of the research challenges which must draw attention from the research community. Chapter 11 discusses how noise features can be used for video tampering detection. Chapter 12 provides an overview of existing palmprint-based biometric recognition systems. Chapter 13 discusses techniques of emotion recognition from facial expressions. A novel technique for recognizing facial expressions using 3D range images is proposed in Chapter 14. Chapter 15 provides a survey of research in the area of scalable video watermarking. Chapter 16 proposes a technique of image watermarking based on fractal image coding.
Section 1
Visual Attention Modeling and Applications
Chapter 1
2D and 3D Visual Attention for Computer Vision: Concepts, Measurement, and Modeling
Vincent Ricordel, University of Nantes, France
Junle Wang, University of Nantes, France
Matthieu Perreira Da Silva, University of Nantes, France
Patrick Le Callet, University of Nantes, France
DOI: 10.4018/978-1-4666-8723-3.ch001
Copyright © 2016, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
ABSTRACT
Visual attention is one of the most important mechanisms deployed in the human visual system (HVS) to reduce the amount of information that our brain needs to process. An increasing amount of effort has been dedicated to the study of visual attention, and this chapter proposes to clarify the advances achieved in computational modeling of visual attention. First the concepts of visual attention, including the links between visual salience and visual importance, are detailed. The main characteristics of the HVS involved in the process of visual perception are also explained. Next we focus on eye-tracking, because of its role in the evaluation of the performance of the models. A complete state of the art in computational modeling of visual attention is then presented. The research works that extend some visual attention models to 3D by taking into account the impact of depth perception are finally explained and compared.
INTRODUCTION
In everyday life, we are constantly receiving an abundant amount of information through various senses. Among the senses, sight is considered to be the most dominant one (Wandell, 1995). However, our sensory system for vision, the human visual system (HVS), continually receives a really large amount of visual data, and it is beyond our brain's capability to process all of it (Borji & Itti, 2013). To cope with this large amount of information, visual attention is one of the most important mechanisms deployed in the HVS to reduce the complexity of the analysis of the visual scene (Wolfe, 2000). Driven by visual attention, viewers can selectively focus their attention on specific areas of interest in the scene.
In the last decades, extensive efforts have been dedicated to the study of visual attention. Neurologists,
psychologists, vision scientists, and computer scientists have taken part in, and contributed to various
aspects of visual attention. These efforts from different disciplines made the research on visual attention
become a highly interdisciplinary field; different relevant disciplines deal with the research on visual
attention from different points of view, and profit from each other.
In recent years, the use of visual attention mechanisms in image processing systems has found increasing interest among computer scientists. Taking visual attention information into account has become an effective way of improving various existing algorithms in image processing. A variety of areas, including compression (Parkhurst, Law, & Niebur, 2002), retargeting (D. Wang, Li, Jia, & Luo, 2011), image retrieval (Vu, Hua, & Tavanapong, 2003), and quality assessment (H. Liu & Heynderickx, 2011), have been benefiting from being provided information about the locations that attract viewers' attention in the visual scene.
When visual attention is taken into account by the signal-processing community, the two terms, "salience" and "importance", have traditionally been considered synonymous. It is true that both visual salience and visual importance denote the most visually "relevant" parts of the scene. However, from the vision scientist's point of view, they are two different concepts, since they come from two different mechanisms of visual attention: bottom-up and top-down. The two mechanisms are driven by different types of stimuli, and are formed in different visual pathways that go through different areas of the brain. Therefore, it is worth distinguishing between the two terms in the context of image processing.
In recent years, another problem faced by researchers in the field of visual attention is the impact of 3D. During the viewing of 3D content, depth perception of the scene is enhanced. This change of depth perception also largely changes human viewing behavior (Hakkinen, Kawai, Takatalo, Mitsuya, & Nyman, 2010; Huynh-Thu, Barkowsky, Le Callet, & others, 2011). Because of the emergence of 3D content and the recent availability of 3D-capable display equipment, studies related to 3D visual attention have been gaining an increasing amount of attention in the last few years.
In this chapter we propose to clarify the latest advancements in computational modeling of visual attention.
The first section details the concepts of visual attention including the latest research results linking
visual salience and visual importance. The main characteristics of the human visual system involved
in the process of visual perception are explained in section 2. Because of their substantial role in the
evaluation of the performances of the models, eye-tracking systems are presented in section 3. A com-
plete state of the art in computational modeling of visual attention is then presented in section 4. The
research works that extend some visual attention models to 3D by taking into account the impact of depth perception are finally explained in section 5.
1. VISUAL ATTENTION
It would be difficult to go directly into specific studies without a general introduction of some background
knowledge on visual attention. So in this section, we first introduce concepts of visual attention as well
as various mechanisms of attention. Secondly, we present a brief introduction of the HVS and of dif-
ferent types of eye movements as well as the technique for measuring eye movements, i.e. eye-tracking.
Finally, we introduce some typical state-of-the-art computational models of visual attention.
The oldest and most famous definition of attention, which is provided by the psychologist William
James (James, Burkhardt, & Skrupskelis, 1980), dates back to the year 1890: "Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several
simultaneously possible objects or trains of thought. Focalization, concentration, consciousness are of
its essence. It implies withdrawal from some things in order to deal effectively with others”.
In the HVS, attention plays an important role in visual processing by keeping only the essential vi-
sual information. Tsotsos et al. (1995) proposed that visual attention is a mechanism having at least the
following basic components:
1. The selection of a region of interest in the visual field;
2. The selection of feature dimensions and values of interest;
3. The control of information flow through the network of neurons that constitutes the visual system;
and
4. The shifting from one selected region to the next in time.
Driven by visual attention, viewers can therefore selectively focus their attention on specific areas of interest in the scene.
1.1 Overt Attention and Covert Attention
There are two types of attention, namely overt attention and covert attention. These two types of atten-
tion are differentiated based on their relation with eye movements.
Overt attention is usually associated with eye movements. This type of attention is easy to observe:
when we focus our attention to an object, our eyes move to fixate this object. One of the earliest studies
of overt attention came from Yarbus (1967). In particular, he studied the correlation between visual at-
tention and eye movements during the viewing of human faces.
In addition to overt attention, William James et al. (1980) found that humans are able to attend to peripheral locations of interest without moving the eyes; this type of attention is named covert attention. An advantage of covert attention is its independence from motor commands (Frintrop, 2006). Since the eyes do not need to be moved to focus attention on a certain region, covert attention is much faster as compared to overt attention. An example of covert attention is driving, where a driver keeps his eyes on the road while simultaneously covertly monitoring the status of signs and lights (Borji & Itti, 2013).
Overt attention and covert attention are not independent. Humans cannot attend to one location while moving their eyes to a different location (Deubel & Schneider, 1996). The covert shift of attention to a location is linked to eye movement by setting up a saccade to that location (Peterson, Kramer, & Irwin, 2004).
Most current studies, especially the studies of computational modeling of visual attention, deal with overt attention, since overt attention can be measured in a straightforward way by using eye-tracking. However, it is difficult to measure covert attention. A computational framework for covert attention is also still lacking.
1.2 Bottom-Up Attention and Top-Down Attention
A shift of attention can be caused by two categories of cues: one is referred to as bottom-up cue, and
the other one is referred to as top-down cue.
Bottom-up attention is driven by the characteristics of a visual scene, i.e. the bottom-up cues. Bottom-
up attention is hence also referred to as stimulus-driven attention or exogenous attention. Bottom-up
attention is fast, involuntary, and most likely feed-forward (Borji & Itti, 2013). Since bottom-up attention
is usually driven by low-level features (e.g. intensity, color, and orientation), in order to attract human’s
bottom-up attention, an area must be sufficiently distinctive compared to the surrounding area with
respect to these low-level visual features.
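To make this notion of local distinctiveness concrete, here is a minimal sketch (our own illustration under simplifying assumptions, not any particular published model; the name center_surround is hypothetical) that scores each location of a single low-level feature map, such as intensity, by how much a small central neighbourhood differs from its larger surround:

import numpy as np
from scipy.ndimage import uniform_filter

def center_surround(feature_map, center=3, surround=21):
    # Compare a small "center" average with a larger "surround" average;
    # locations that differ strongly from their neighbourhood are conspicuous.
    f = feature_map.astype(float)
    conspicuity = np.abs(uniform_filter(f, size=center) - uniform_filter(f, size=surround))
    rng = conspicuity.max() - conspicuity.min()
    return (conspicuity - conspicuity.min()) / rng if rng > 0 else conspicuity

# Toy example: a bright patch on a dark background "pops out".
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
saliency = center_surround(img)
print(np.unravel_index(saliency.argmax(), saliency.shape))  # near the patch centre

In full computational models this kind of contrast is typically computed at several scales and for several feature channels before the resulting maps are combined, in the spirit of the Feature Integration Theory discussed below.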
On the other hand, top-down attention is based on "higher level" information, such as knowledge, expectations and current goals (Desimone & Duncan, 1995). Top-down attention is thus also referred to as concept-driven attention, goal-driven or endogenous attention. As compared to bottom-up attention, top-down attention is slow, voluntary and driven by the task demands. A famous illustration of top-down attention comes from Yarbus's work in 1967 (Yarbus, 1967). He demonstrated how eye movements varied depending on the question asked during the observation of the same scene (see Figure 1).
1.3 The Feature Integration Theory
One of the best-known and most accepted theories of visual attention is the “Feature Integration Theory”,
which was proposed by Treisman and Gelade (1980). This theory has been the basis of many computa-
tional models of visual attention.
Figure 1. An example showing that eye movements depend on the observer's viewing task (Yarbus, 1967). (Image from Lucs-kho at en.wikipedia [Public domain] via Wikimedia Commons.)
Treisman and Gelade (1980) claimed that “different features are registered early, automatically and
in parallel across the visual field, while objects are identified separately and only at a later stage, which
requires focused attention”. According to the Feature Integration Theory (FIT), the different features of
stimuli are firstly encoded in partially independent areas. In addition, our hierarchical cortical structures are organized in order to make the detection of these features relatively independent of their positions in the visual scene.
The FIT introduced a concept of “feature maps”, which are topographical maps that highlight salience
according to the respective feature. Information from the feature maps is then collected in a "master map
of location”. This map indicates the location of the objects, but does not provide information about what
the objects are.
Finally, to construct a coherent representation of the scene, selective attention is used. The scene is
scanned by an attentional beam of variable size (see Figure 2). This beam blocks the information that
is not located within its radius. It is thus possible to match all the features found in this area in order to
build a coherent representation. By moving the beam over time, our brain gradually constructs a global
perception of the scene.
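As a rough sketch of how these ideas translate into computation (our own simplified example under stated assumptions, not the implementation of any model discussed here), the per-feature maps can be normalized and fused into a master map, and the attentional beam can be imitated by repeatedly selecting the most conspicuous location and suppressing it, in the spirit of winner-take-all with inhibition of return:

import numpy as np

def normalize(m):
    # Rescale a feature map to [0, 1] so that different features are comparable.
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def master_map(feature_maps):
    # Collect the normalized feature maps into a single "master map of location".
    return sum(normalize(m) for m in feature_maps) / len(feature_maps)

def scan_with_beam(master, n_fixations=3, radius=5):
    # Visit the most conspicuous location, then block a disc around it
    # (a crude inhibition of return) before selecting the next one.
    m = master.copy()
    ys, xs = np.mgrid[0:m.shape[0], 0:m.shape[1]]
    path = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(m), m.shape)
        path.append((int(y), int(x)))
        m[(ys - y) ** 2 + (xs - x) ** 2 <= radius ** 2] = 0.0
    return path

# Toy example with two hypothetical feature maps (e.g. intensity and colour conspicuity).
gen = np.random.default_rng(0)
print(scan_with_beam(master_map([gen.random((32, 32)), gen.random((32, 32))])))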
1.4 Linking Visual Salience and Visual Importance
Visual salience (Koch & Ullman, 1987; Itti, Koch, & Niebur, 1998) and visual importance (Osberger
& Maeder, 1998; Maeder, 1995; Etz & Luo, 2000; Kadiyala, Pinneli, Larson, & Chandler, 2008) come
Figure 2. Illustration of the Feature Integration Theory
(Perreira Da Silva, 2010).
from the two different mechanisms of visual attention, the bottom-up mechanism and the top-down
mechanism, respectively. Both visual salience and visual importance can provide important insights
into how the human visual system addresses the image-analysis problem. Both of them are also believed
to denote the most visually “relevant” parts of the scene. However, despite the differences in the way
(bottom-up) visual salience and (top-down) visual importance are determined in terms of human visual
processing, both salience and importance have traditionally been considered synonymous in the signal-
processing community.
Experimental Study
A study measuring the similarities and differences between visual salience and visual importance has
been conducted. We present the results of these two psychophysical experiments and the associated
computational analyses designed to quantify the relationship (and its evolution over time) between visual
salience and visual importance:
• A first experiment was performed to obtain visual importance maps for a large database of im-
ages. A visual importance map is an object-level map that specifies the visual importance of each
object in an image relative to the other objects in the image (including what would normally be
considered as the background). The object(s) that receive(s) the greatest visual importance is/are
traditionally considered as the image’s main subject. By using images from the Berkeley Image
Segmentation Dataset, importance ratings were collected for each object in the 300 database im-
ages. Such importance ratings are generally believed to result from top-down visual processing,
since the decisions used to rate each object typically involve scene interpretation, object recogni-
tion, and often consideration of artistic intent.
• In a second experiment, visual gaze patterns were measured for 80 of the images from the same
Berkeley Image Segmentation Dataset. Using an eye-tracker, visual gaze locations were recorded
under task-free viewing. Whereas importance maps are driven primarily by top-down processing,
visual gaze patterns are generally believed to be driven by bottom-up, signal-based attributes, at
least for early gaze locations. Bottom-up saliency (Koch & Ullman, 1987) is one particular signal-
based attribute, which has been shown to correlate well with early gaze locations. An image re-
gion is considered visually salient if it “stands out” from its background in terms of one or more
attributes (e.g., contrast, color, orientation). When visual gaze patterns are measured in task-free
viewing, one can consider the locations to denote the salient regions in the image. Thus, from the
gaze patterns, one can construct an experimental saliency map.
Results and Analysis
Qualitative Observations of Importance Maps and Saliency Maps
A qualitative comparison of the saliency maps and importance maps reveals some distinct similarities
and differences between the two. Figure 3 depicts some representative examples.
First of all, one can notice that both importance maps and saliency maps are centrally biased, because
source content is itself centrally biased by the photographer (the region of interest typically tends to be
in the center of the photos that people take).
Tatler (2007) proposed a method to correct the data in very specific cases by using a centrally biased
sampling distribution (built from the fixations of the same observers on other images) when sampling
non-fixated points for ROC computation. This helps to evaluate the real performance of a saliency
prediction model without the effect of central bias. In our case, this method is not applicable, since we do
not evaluate the performance of a visual attention prediction model; instead, we compare, in a fair way,
visual salience (namely here, salience and central bias) and visual importance maps.
The importance maps suggest that object category plays a bigger role than most other factors in de-
termining subjective importance. In general, we found that observers tended to rate objects containing
human faces and/or animals (for instance, the second image in the left column of Figure 3, or the first
image in the right column) to be of greatest importance. Background objects such as sky and grass were
generally rated to be of least importance (for instance, the fifth image in the left column of Figure 3,
or the fourth image in the right column). Occlusion, i.e. whether an object is in the foreground or the
background (for instance, the seventh image in the right column of Figure 3), also seems to be an
important factor for perceived importance.
Figure 3. Representative results from the experiments
The saliency maps generally suggest that regions possessing a distinctive shape, color, contrast,
or other local spatial features attract attention. Nonetheless, observers always gazed upon the image’s main
subject(s): gaze position samples tended to occur on animal faces, human faces, or other subjects
representing the region of interest in the image. The background, such as sky and ground, always
attracted the least attention.
Yet, despite these similarities, the saliency maps and importance maps do not always agree. Although
we employed a relatively long viewing period, the saliency maps never yielded an object-level segregation
that is enforced in the importance maps. For example, whenever a face occurred in an image, whether
an animal face or a human face, the observers’ gaze positions always occurred on the face.
Predicting the Main Subject, Secondary Subject, and the Background
The results of the qualitative analysis suggest a relationship between saliency maps and importance maps.
One way to quantify this relationship is to attempt to predict the importance maps from the saliency
maps using the object-level segmentations as side-information. To predict the importance maps from
the saliency maps (given the segmentations), the following two approaches were tested:
1. Mean Saliency: For each object, we summed those values of the saliency map that occurred within
the object, and then we divided this value by the total number of pixels in the object. For each im-
age, the resulting set of per-object saliency values was then normalized to span the range [0, 1].
2. Coverage Saliency: For each object, we summed those values of the saliency map which occurred
within the object, and then we divided this value by the number of pixels in the object that were
gazed upon (specifically, the number of pixels that were covered by the fovea). For each image,
the resulting set of per-object coverage saliency values was then normalized to span the range [0,
1].
To facilitate the prediction, each importance map was quantized into three classes based on the
importance values:
• Main Subject: Consisted of objects that received an importance value ranging from 2/3 to 1;
• Secondary Subject: Consisted of objects that received an importance value ranging from 1/3 to
2/3;
• Background: Consisted of objects that received an importance value ranging from 0 to 1/3.
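To make the two prediction approaches and the quantization concrete, the following is an illustrative sketch, not the authors' code; the array layouts, function names, and the handling of empty denominators are assumptions, while the normalization to [0, 1] and the 1/3 and 2/3 class boundaries come from the description above.

```python
import numpy as np

def object_saliency(saliency, segmentation, gazed_mask=None):
    """saliency: HxW float saliency map; segmentation: HxW integer object labels;
    gazed_mask: optional HxW bool map of pixels covered by the fovea.
    Without gazed_mask this computes Mean Saliency; with it, Coverage Saliency."""
    scores = {}
    for lab in np.unique(segmentation):
        obj = segmentation == lab
        total = saliency[obj].sum()
        if gazed_mask is None:
            denom = obj.sum()                              # Mean Saliency: all object pixels
        else:
            denom = np.logical_and(obj, gazed_mask).sum()  # Coverage Saliency: gazed pixels only
        scores[lab] = total / denom if denom > 0 else 0.0
    # Normalize the per-object values to span [0, 1] within this image
    vals = np.array(list(scores.values()), dtype=float)
    lo, hi = vals.min(), vals.max()
    norm = (vals - lo) / (hi - lo) if hi > lo else np.zeros_like(vals)
    return dict(zip(scores.keys(), norm))

def importance_class(value):
    """Quantize a normalized importance (or predicted saliency) value into three classes."""
    if value >= 2.0 / 3.0:
        return "main subject"
    elif value >= 1.0 / 3.0:
        return "secondary subject"
    return "background"
```

Comparing the predicted class of each object with the class derived from its rated importance yields confusion matrices such as those in Tables 1 and 2.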
The results of the prediction by the two approaches, namely Mean Saliency and Coverage Saliency
are provided in Table 1 and Table 2, respectively. The prediction accuracy is presented in the form of
confusion matrices. Each row of each matrix represents the actual (importance) class, and each column
represents the predicted class. An ideal prediction would yield a diagonal matrix with 100% values, which
means that all the objects belonging to each actual class are successfully predicted. As shown in Table
1, Mean Saliency can successfully predict the main subject 80.5% of the time. In other words, 80.5% of
the objects predicted as main subject are really the main subject of the scene; so 19.5% of the objects
predicted as main subject actually belong to secondary subject or background. Similarly, the background
is successfully predicted approximately 47% of the time. We also found that Coverage Saliency (shown in
Table 2) yields lower prediction accuracy for main subject, but slightly higher accuracy for background.
Temporal Analysis
During normal viewing, because visual attention shifts from one object to another, the number of gaze
position samples that occur on each subject varies over time. For each of the three levels of importance
(main subject, secondary subject, background), we analyzed this time dependence. Specifically, we
computed the number of gaze position samples per importance class which occurred within each 100-
ms interval during the 15-second viewing time. The resulting three time curves, summed across all
observers, are shown in Figure 4.
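A minimal sketch of this temporal binning is given below; the (timestamp, class) sample format and the pooling over observers are assumptions, while the 100-ms bins and 15-second viewing time come from the text.

```python
import numpy as np

def gaze_time_curves(samples, duration_ms=15000, bin_ms=100):
    """samples: iterable of (t_ms, importance_class) pairs pooled over all observers.
    Returns one curve per importance class counting gaze samples in each 100-ms bin."""
    classes = ("main subject", "secondary subject", "background")
    n_bins = duration_ms // bin_ms
    curves = {c: np.zeros(n_bins, dtype=int) for c in classes}
    for t_ms, cls in samples:
        b = int(t_ms // bin_ms)
        if 0 <= b < n_bins and cls in curves:
            curves[cls][b] += 1
    return curves   # three 150-point curves, as plotted in Figure 4
```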
Table 1. Confusion matrix for predicting each subject's importance from gaze data, using the Mean Saliency approach

                                     Predicted
Actual                  Main Subject    Secondary Subject    Background
Main Subject            80.5%           29.8%                12.6%
Secondary Subject       12.5%           42.6%                40.7%
Background              7.1%            27.6%                46.7%
Table 2. Confusion matrix for predicting each subject's importance from gaze data, using the Coverage Saliency approach

                                     Predicted
Actual                  Main Subject    Secondary Subject    Background
Main Subject            56.5%           38.6%                8.2%
Secondary Subject       13.0%           40.4%                24.7%
Background              30.5%           21.1%                67.1%
Figure 4. Total number of gaze position samples in (a) main subjects, (b) secondary subjects, and (c)
background computed in each 100-ms interval of the 15-second viewing time; note that the scale for the
vertical axis in the first graph is 10x that of the other two graphs.
The plots in Figure 4 clearly indicate that, on average, subjects from different importance classes at-
tract considerably different amounts of visual attention. Specifically, throughout the 15-second viewing
time, the main subject always received the greatest number of gaze position samples, approximately 7-8
times greater than the number of samples for secondary subject and background.
Within 0-500 ms, the number of gaze position samples for the main subject (see Figure 4 (a)) was
already 4-6 times greater than the number of samples for secondary subject (see Figure 4 (b)) and back-
ground (see Figure 4 (c)). This observation suggests bottom-up mechanisms can be effective at locating
the main subject in these images; this might result from the fact that photographers tend to increase
the saliency of the main subject via retouching, selective focusing, or other photographic techniques.
Between 500-2000 ms, there was a pronounced increase in the number of gaze position samples for
the main subject (see Figure 4 (a)), while the number for the other two importance classes decreased
in this period (see Figures 4 (b) and (c)). These changes potentially indicate the influence of top-down
mechanisms that might force observers to attend to the main subject. After this process, the number of
gaze position samples for the main subject slightly decreased, and those for the other two classes slightly
increased. This latter change may imply that the observers attempt to explore the whole image, but their
attention is still held by the main subject.
These three time curves suggest that the relationship between visual salience and visual importance
may be time dependent. In particular, the fact that the main subject attracts the most attention within
0-2000ms suggests that these early gaze position samples might be a better predictor of visual impor-
tance for the main subject than previously achieved using all samples. Accordingly, we predicted the
importance maps by using the samples taken from only the first 0-2000ms. Table 3 lists the resulting
confusion matrix computed (using Mean Saliency approach) based on gaze data of the first 2 seconds.
Figure 5 depicts representative importance maps predicted from the data taken from all 15 seconds (the
maps in the 3rd column of Figure 5) and from only the first two seconds (the maps in the 4th column).
By using only these early gaze data, better prediction is achieved for the main subject.
This section presented the results of two psychophysical experiments and an associated computational
analysis designed to quantify the relationship between visual salience (namely here, visual salience and
central bias) and visual importance. We found that saliency maps and importance maps are related, but
perhaps less than one might expect. The saliency maps were shown to be effective at predicting the
main subjects. However, the saliency maps were less effective at predicting the secondary subjects and
the background. We also found that the vast majority of early gaze position samples
(0-2000 ms) were made on the main subject. This suggests that a possible strategy of the human visual
system is to quickly locate the main subject in the scene.
Table 3. Confusion matrix (using the Mean Saliency approach) for predicting importance from the first 2 seconds of gaze samples

                                     Predicted
Actual                  Main Subject    Secondary Subject    Background
Main Subject            89.0%           43.5%                12.4%
Secondary Subject       3.3%            43.5%                27.2%
Background              7.7%            13.0%                60.5%
The implications of the findings presented in this chapter are quite important for image processing.
Several algorithms have been published which can successfully predict gaze patterns, e.g. (Itti et al., 1998;
Le Meur, Le Callet, Barba, & Thoreau, 2006). These results suggest that these predicted patterns can
be used to predict importance maps when coupled with a segmentation scheme. In turn, the importance
maps can then be used to perform importance-based processing such as auto-cropping, enhancement,
compression, unequal error protection, and quality assessment.
Figure 5. Representative results of using all gaze samples vs. only those from the first two seconds to
predict the importance maps
Removing the central bias in the analysis of the relationship between visual salience and visual impor-
tance is not a simple task, and it was not done in the initial work (J. Wang, Chandler, & Le Callet, 2010)
on which this part of the chapter is based. This issue is left for further investigation.
2. VISUAL ATTENTION AND THE HUMAN VISUAL SYSTEM
Since visual attention is a mechanism involved in visual perception, it is also important to introduce
how visual information is processed in the human visual system (HVS). While far from an exhaustive
explanation of the HVS and the mechanisms involved in the processing of visual information, this section
briefly introduces the retina and the different areas of the visual cortex (Figure 6) that determine the
main characteristics of the HVS.
2.1 The Retina
The retina is a light-sensitive surface, which has over 100 million photoreceptor cells (Mather, 2009).
The photoreceptor cells are responsible for transducing light energy into neural signals. Note that the
retina is not of uniform spatial resolution. The density of photoreceptor cells is higher at the center,
which enables vision to be more accurate at the center (i.e. the fovea) than at the periphery. There are
two types of photoreceptor cells: rods and cones, which are sensitive to light and color, respectively.
Cone photoreceptors can be divided into three classes based on their spectral sensitivity: “Blue” or short
wavelength (S); “Green” or medium wavelength (M); and “Red” or long wavelength (L).
The photoreceptor cells are connected to ganglion cells, which provide the output signal from the
retina. The receptive field of a ganglion cell is circular and separated into two areas: a center area and a
surround area. Two types of ganglion cells exist: on-center cells, which respond excitatorily to light
at the center, and off-center cells, which respond inhibitorily to light at the center (Frintrop, 2006). The
center area and the surround area always have opposite characteristics. This is why center-surround
filtering is used as a mechanism for processing visual information in many computational models of
visual attention.
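A minimal sketch of such a center-surround operation, implemented as a difference of Gaussians, is shown below; the sigma values are illustrative assumptions, not values from the chapter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def on_center_response(image, sigma_center=1.0, sigma_surround=4.0):
    """image: 2D float array (e.g. intensity). Returns a positive response where the
    center is brighter than its surround; an OFF-center cell can be modeled by negation."""
    center = gaussian_filter(image, sigma_center)
    surround = gaussian_filter(image, sigma_surround)
    return np.maximum(center - surround, 0.0)   # half-wave rectified ON-center response
```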
Figure 6. The human visual system: from the retina to different areas of the visual cortex
Adapted from Kandel et al. (2000).
2.2 Visual Pathway
From the retina the optic nerve transmits visual information to the optic chiasm. From the optic chiasm,
two visual pathways go to different areas of brain (see Figure 6). In primates, the majority (approximately
90%) of the visual information is transmitted by the retino-geniculate pathway to the Lateral Geniculate
Nuclei (LGN); the remaining 10% goes to the superior colliculi. LGN cell fibers then transmit visual
information to the cortical receiving area for vision, known as primary visual cortex or V1, which is
located at the very back of the brain. From the primary visual cortex, the visual information is sent to
higher brain areas, namely extrastriate cortex. The extrastriate cortex includes: V2, V3, V4, the infero-
temporal cortex, the middle temporal area and the posterior-parietal cortex (Frintrop, 2006).
There is evidence that the connections between extrastriate areas segregate into two pathways after
area V2: a ventral pathway and a dorsal pathway. The dorsal pathway, which processes motion and
depth information, runs via V3 to the middle temporal area (MT), then to the medial superior temporal
area (MST) and the parieto-occipital area (PO), and finally to the posterior-parietal cortex (PP). The dor-
sal pathway is also known as the “where” pathway, since it mainly deals with the question of where
something is in a scene. The ventral pathway, which processes color and form information, runs to V4
and finally to the infero-temporal cortex (IT). Since area IT is involved in the recognition of objects, this
pathway is also known as the “what” pathway.
2.3 Attentional Mechanisms in the Brain
So far, it is believed that visual attention is not guided by any single brain area. Several areas have been
found to be involved in the attentional process, but the exact task and behavior of each area, as well
as the interplay among these areas, still remain open questions (Frintrop, 2006). Nevertheless, several
findings have been claimed. It was proposed that the posterior-parietal cortex is responsible for disengaging
the focus of attention from its present location (inhibition of return), while the superior colliculus (SC) is re-
sponsible for shifting the attention to a new location (Posner & Petersen, 1990). The Frontal Eye Field
area of the prefrontal cortex is found to be involved in guiding the eye movements. Additionally, this
area is also the place where a kind of saliency map is located, which is affected by both bottom-up and
top-down information (Bichot, 2001).
3. EYE MOVEMENTS AND EYE-TRACKING
Eye tracking is a technique which records the eye movements so that researchers can obtain precise
information about
1. Where an observer is looking at any given time, and
2. The sequence in which his eyes are shifting from one location to another.
Eye tracking plays a substantial role in the research of psychology, biology, computer vision, and
especially the computational modeling of visual attention. Given the strong link between overt visual
attention and eye movements (Itti & Koch, 2001; Wolfe & Horowitz, 2004), eye movement data col-
lected by means of eye-tracking experiments are used as the ground truth to evaluate the performance of
computational models.
3.1 Measures of Eye-Movements
Just and Carpenter (1976) assumed that what a person is looking at indicates what is at the “top of the
stack” in cognitive processes. This “eye-mind” hypothesis implies that the eye movements provide a trace
about where a person’s (overt) attention is being directed. There exist various types of eye movements.
Two basic ones are “fixation” and “saccade”. From these two basic eye movements, another measure-
ment, the “scanpath”, is derived. Moreover, pupil size and blink rate are two other types of eye movement
usually studied. Each type of eye movement, as well as the metrics based on these basic types, is
introduced below.
Fixations
A fixation means that the visual gaze is approximately stationary on a single location. Fixations last for
218 milliseconds on average, with a range of 66 to 416 milliseconds (Poole  Ball, 2006). Based on
fixations, several metrics can be derived:
•	 Fixations per Area of Interest: Experiments show that more fixations on a particular area indi-
cate a greater interest in or importance of a target (J. Wang, Chandler, & Le Callet, 2010). It may
also mean that the target is complex in some way and difficult to encode (Just & Carpenter, 1976).
Jacob and Karn (2003) suggest that, in a search task, a higher number of fixations often means a
greater uncertainty in recognizing a target item.
•	 Fixation Duration: A longer fixation can be interpreted in two ways: it may imply that informa-
tion is difficult to extract, or that the object is more engaging in some way (Just & Carpenter, 1976).
•	 Fixation Spatial Distribution: Cowen et al. (2002) suggested that highly concentrated fixations
in a small area indicate focused and efficient searching, while evenly spread fixations indicate wide-
spread and inefficient searching. It was also found that if an object contains an area with highly
concentrated fixations, the object tends to be considered of high importance (J. Wang et al.,
2010).
• Repeat Fixations or “Post-Target Fixations”: A higher number of off-target fixations after the
target has been fixated (i.e., a lower number of repeat fixations) means that the target lacks mean-
ingfulness or visibility (Goldberg & Kotval, 1999).
• Time to First Fixation On-Target: A shorter time to first-fixation on an object or area indi-
cates that the object or area has better attention-getting properties (Byrne, Anderson, Douglass,
& Matessa, 1999).
Note that in the studies of computational modeling of visual attention, fixation spatial density is the
metric mostly used, by means of computing a so-called “fixation density map”.
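A minimal sketch of how such a fixation density map is usually built from fixation locations is given below; the pixel coordinate convention and the Gaussian width (approximating the foveal extent) are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, height, width, sigma=30.0):
    """fixations: iterable of (x, y) pixel coordinates pooled over observers.
    Returns a density map normalized to sum to 1."""
    fdm = np.zeros((height, width), dtype=float)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            fdm[yi, xi] += 1.0
    fdm = gaussian_filter(fdm, sigma)          # spread each fixation over a foveal-sized area
    return fdm / fdm.sum() if fdm.sum() > 0 else fdm
```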
Saccades
Saccades are quick, simultaneous movements of both eyes in the same direction (Cassin, Solomon,
& Rubin, 1990). They are the fast eye movements occurring between fixations. It is generally believed
that vision is suppressed during saccades, so no encoding takes place in the human visual system, and
it is difficult to get any clues about the complexity or salience of an object from the saccades.
However, information about visual perception can be still extracted from several saccade metrics:
•	 Number of Saccades: A larger number of saccades indicates that more searching takes place dur-
ing the observation (Goldberg & Kotval, 1999).
•	 Saccade Amplitude: Saccade amplitude is computed by measuring the distance between one
saccade’s start point (a fixation) and its end point (another fixation). Larger amplitude indicates
the existence of more meaningful cues, since the attention is drawn from a distance (Goldberg,
Stimson, Lewenstein, Scott, & Wichansky, 2002).
Scanpaths
A scanpath is a metric derived from the measurement of both fixations and saccades. It is a complete
saccade-fixate-saccade sequence. The area covered by a scanpath indicates the area observed, and a
longer scanpath means less efficient searching (Goldberg et al., 2002). Additionally, we can compare
the time spent searching (saccades) to the time spent processing (fixations) in a scanpath: a higher
saccade/fixation ratio means more searching or less processing.
Blink Rate and Pupil Size
The blinking of the eyes and the variation of pupil size are two eye movements that can also be recorded
during eye-tracking experiments. They can be considered as cues that indicate cognitive workload. A
lower blink rate is assumed to indicate a higher cognitive workload (Bruneau, Sasse, & McCarthy, 2002),
and a higher blink rate may indicate visual fatigue (Brookings, Wilson, & Swain, 1996). The changing
of pupil size also indicates some kind of cognitive effort (Marshall, 2000). However, blink rate
and pupil size can be easily affected by many factors during the observation, e.g. the luminance of the
environment. For this reason, blink rate and pupil size are not widely used in research related
to visual attention.
3.2 Eye-Tracking
As mentioned above, eye tracking records eye movements so that researchers can obtain precise
information about where an observer is looking at any given time, and the sequence in which the eyes
shift from one location to another.
Eye tracking has thus been deployed in a variety of disciplines to capture and analyze overt visual
attention of human observers, including neuroscience, psychology, medicine, human factors, marketing,
and computer science (Duchowski, 2002).
The common goal amongst all these disciplines is to capture human viewing behavior when performing
specific visual tasks in a given context. For instance, in marketing research it is of interest to determine
what products customers attend to in order to maximize profit (Wedel & Pieters, 2007). In medical
research it is of interest to identify the search patterns of radiologists when investigating mammograms
for improved breast cancer detection (Wedel & Pieters, 2007). In image and video quality assessment,
taking into account the attention of viewers to artifacts may lead to enhanced quality prediction models
(Engelke, Kaprykowsky, Zepernick, & Ndjiki-Nya, 2011). In the context of computational modeling
of visual attention, eye-tracking results are usually post-processed into scanpaths or so-called fixation
density maps (FDM), which are considered to be a reliable ground truth for developing computational
models of visual attention.
Background of Eye Tracking Technique
The technology of eye tracking appeared more than 100 years ago in reading research (Rayner, Pollatsek,
Ashby, & Clifton, 2012). Since then, different techniques have been applied to eye tracking. For instance,
the “electro-oculographic” technique requires placing electrodes on the skin around the eye so that eye
movements can be detected by measuring differences in electric potential. Some other methods rely
on wearing large contact lenses. The lens covers the cornea (the transparent front part of the eye) and
sclera (the white part of the eye), and a metal coil embedded around the lens moves along with
the eye. The eye movements can thus be measured by fluctuations in an electromagnetic field when the
eye is moving (Duchowski, 2007). However, these historical methods affect observers’ eye movements
and are inconvenient to implement.
Video-based techniques are used by modern eye-trackers to determine where a person is looking
(i.e., the so-called “gaze point” or “point-of-regard”). These eye-trackers detect the point-of-regard
based on eye features extracted from video images of the eye, such as corneal reflections (i.e. Purkinje
images), the iris-sclera boundary, and the apparent pupil shape (Poole & Ball, 2006; Duchowski, 2007).
Most state-of-the-art commercial eye trackers use the “corneal-reflection/pupil-centre” method to
measure the point-of-regard. The corneal reflection is also known as the (first) Purkinje image. During the
eye-tracking recording, a camera focuses on one or both eyes to capture images. Contrast is then used to
locate the pupil, and infrared light is used to create a corneal reflection. By measuring the movement
of the corneal reflection relative to the pupil, it is then possible to determine the head movement, the
eye rotation, the direction of gaze and, consequently, the point-of-regard.
Algorithms for Eye-Movement Identification
Given the information about eye-movement type (e.g. fixations, saccades) and their characteristics (e.g.
duration, spatial distribution), various subsequent analyses can then be performed depending on the
particular context and application of the research. However, the raw eye-movement data output from
eye-tracking experiments are usually presented as a stream of sampled gaze points. Post-processing
needs to be performed to identify the different types of eye movements from the gaze points.
Fixation detection algorithms extract and label fixations and saccades from raw eye-tracking data
(i.e. sample points). These algorithms can identify the fixations, the saccades taking place between two
successive fixations, and those smaller eye movements occurring during fixations, such as tremors, drifts,
and flicks (Salvucci & Goldberg, 2000). Note that fixation identification is a critical aspect of eye-
movement data analysis, since its result can significantly affect later analyses. Evidence has shown
that different identification algorithms can produce different interpretations even when analyzing the
same eye-tracking data (Karsh & Breitenbach, 1983).
Salvucci and Goldberg (2000) suggested that most fixation identification algorithms took advantage
of the following spatial or temporal features:
•	 Velocity: Some algorithms take advantage of the fact that fixation points have much lower veloci-
ties than saccade points. Generally, the sampling rate of an eye-tracker is constant, so the
velocity is proportional to the distance between successive sample points.
•	 Dispersion: Some algorithms emphasize the spread distance (i.e. dispersion) of fixation points,
assuming that the sample points belonging to a fixation generally occur near one another, whereas
saccade points are spread far apart.
• Duration Information: This criterion is based on the fact that fixations are rarely less than 100
ms and usually in the range of 200-400 ms.
• Local Adaptivity: This criterion means that the interpretation of a given point is influenced by the
interpretation of temporally adjacent points.
Based on the different features selected, various fixation identification algorithms have been proposed.
The two main types of fixation identification algorithms are introduced below.
Velocity-Based Algorithms
The velocity information of eye movements shows two distributions of velocities: low velocities for fixa-
tions, and high velocities for saccades. This velocity-based discrimination is straightforward and robust.
Among various velocity-based algorithms, Velocity-Threshold Identification (I-VT) is the simplest
one to implement (Salvucci & Goldberg, 2000). I-VT first calculates point-to-point velocities for each
point; each velocity is computed as the distance between the current point and the next (or previous)
point. Each point is then classified as a saccade point or a fixation point based on a velocity threshold: if
the velocity is higher than the threshold, the point is labeled a saccade point; otherwise it is labeled a
fixation point. Finally, I-VT translates each group of consecutive fixation points into a ⟨x, y, t, d⟩
representation, where ⟨x, y⟩ is the centroid of the points, and t and d are the time of the first point and
the duration of the group, respectively.
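The following is an illustrative I-VT sketch based on the steps just described; the (x, y, t) sample layout and the velocity threshold value are assumptions to be adapted to the eye-tracker at hand.

```python
import numpy as np

def ivt(samples, velocity_threshold):
    """samples: (N, 3) array of (x, y, t). Returns fixations as (x, y, t, d) tuples:
    centroid, onset time, and duration of each group of consecutive fixation points."""
    samples = np.asarray(samples, dtype=float)
    dist = np.linalg.norm(np.diff(samples[:, :2], axis=0), axis=1)
    dt = np.diff(samples[:, 2])
    velocity = dist / np.where(dt > 0, dt, np.inf)         # point-to-point velocities
    is_fixation = np.append(velocity, 0.0) < velocity_threshold

    def close_group(group):
        return (group[:, 0].mean(), group[:, 1].mean(),
                group[0, 2], group[-1, 2] - group[0, 2])

    fixations, start = [], None
    for i, fix in enumerate(is_fixation):
        if fix and start is None:
            start = i                                       # a fixation group begins
        elif not fix and start is not None:
            fixations.append(close_group(samples[start:i]))
            start = None
    if start is not None:
        fixations.append(close_group(samples[start:]))
    return fixations
```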
A more sophisticated type of velocity-based algorithm is Hidden Markov Model fixation Identification
(I-HMM) (Salvucci, 1999; Salvucci & Anderson, 1998). I-HMM applies a two-state HMM in which the
two states represent the velocity distributions for saccade and fixation points, respectively. Generally,
I-HMM can perform more robust identification than fixed-threshold methods (e.g. I-VT) (Salvucci &
Goldberg, 2000).
Dispersion-Based Algorithms
Dispersion-based algorithms use the fact that fixation points tend to cluster closely together because of
their low velocity. Dispersion-Threshold Identification (I-DT) is a typical dispersion-based
algorithm. I-DT identifies fixations as groups of consecutive points within a particular dispersion. A
dispersion threshold is thus essential for I-DT algorithms. Moreover, a minimum duration threshold is
also required, which is used to help alleviate equipment variability. The minimum duration threshold
normally ranges from 100 ms to 200 ms (Widdel, 1984).
An implementation of the I-DT algorithm was proposed by Widdel (1984). It uses a moving win-
dow to cover consecutive data points. The moving window begins at the start of the protocol. It initially
contains a minimum number of points, which is determined by a given duration threshold. I-DT then
computes the dispersion of the points in the window by summing the differences between the points’
maximum and minimum x and y coordinates: D = [max(x) − min(x)] + [max(y) − min(y)].
If the dispersion is above the dispersion threshold, the window moves to the following point. If the dis-
persion is below the threshold, the window represents a fixation and is expanded until its
dispersion exceeds the threshold. The final window is marked as a fixation centered at the centroid
of the points, with a given onset time and duration.
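An illustrative sketch of this moving-window procedure is given below; the threshold values and the assumption of a fixed sampling period are mine, while the dispersion measure D is the one defined above.

```python
import numpy as np

def _dispersion(win):
    # D = [max(x) - min(x)] + [max(y) - min(y)]
    return (win[:, 0].max() - win[:, 0].min()) + (win[:, 1].max() - win[:, 1].min())

def idt(samples, dispersion_threshold, duration_threshold_ms, sample_period_ms):
    """samples: (N, 2) array of (x, y) gaze points at a fixed sampling period.
    Returns fixations as (cx, cy, onset_index, n_points)."""
    samples = np.asarray(samples, dtype=float)
    min_pts = max(int(duration_threshold_ms / sample_period_ms), 1)
    fixations, i = [], 0
    while i + min_pts <= len(samples):
        j = i + min_pts                                   # window covering the minimum duration
        if _dispersion(samples[i:j]) > dispersion_threshold:
            i += 1                                        # too dispersed: slide the window forward
            continue
        while j < len(samples) and _dispersion(samples[i:j + 1]) <= dispersion_threshold:
            j += 1                                        # expand while the window stays compact
        win = samples[i:j]
        fixations.append((win[:, 0].mean(), win[:, 1].mean(), i, j - i))
        i = j                                             # restart after the detected fixation
    return fixations
```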
4. COMPUTATIONAL MODELING OF VISUAL ATTENTION
Eye-tracking experiments can be considered a reliable way to acquire the distribution of human
attention over a specific scene. However, conducting eye-tracking experiments is usually cumbersome,
time consuming, and hence expensive. In order to automatically predict the distribution of human
attention, extensive research efforts have been dedicated to the computational modeling of visual attention.
In our study, we particularly focus on the models that compute saliency maps. The results of this type
of model, the saliency maps, indicate where the most visually interesting regions are located.
In the past years, a body of models using various mathematical tools has been proposed. According
to the taxonomy introduced by Le Meur and Le Callet (2009), most computational models can be
grouped into three main categories: hierarchical models, statistical models, and Bayesian models.
4.1 Main Computational Models
Hierarchical Models
This kind of model is characterized by the use of a hierarchical decomposition, whether it involves a
Gaussian, a Fourier-based or wavelet decomposition. Various feature maps are then computed. Different
strategies are then used to integrate information across sub-bands to create a final saliency map.
The Model of Itti
One of the most famous models of this category is the one proposed by Itti et al. (1998). It is
the first computational and biologically plausible model of bottom-up visual attention, and it serves as
a basis in many studies. The architecture of this model (see Figure 7) is based on the following principal
steps. The original image is firstly decomposed into three different perceptual channels: intensity, color
and orientation. A multi-scale representation is constructed from the various channels. A set of linear
center-surround operations akin to visual receptive fields is performed to obtain feature maps. These
feature maps are
1.	 Normalized by an operator N, which enhances the feature maps containing a small number of
saliency peaks, and
2.	 Then summed to create the conspicuity maps (intensity, color, and orientation).
These three conspicuity maps are also normalized and summed to obtain a unique saliency map. In
this model, the selection of focuses of attention (FOA) is achieved by a “winner-take-all” network, which
selects the most salient area in the saliency map and includes an “inhibition of return” mechanism to
temporarily prevent the FOA from returning immediately to areas already visited.
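The sketch below illustrates only the normalization and summation stages described above; it is a simplified approximation of Itti's N() operator (the local-maximum window size and the way local maxima are found are assumptions), and it presumes the feature maps have already been computed at a common resolution.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def normalize_N(feature_map, local_size=31):
    """Crude version of the N() operator: scale the map to [0, 1], then multiply by
    (M - m_bar)^2, where M is the global maximum and m_bar the mean of the other local maxima,
    so that maps with a single strong peak are promoted."""
    fm = feature_map - feature_map.min()
    if fm.max() > 0:
        fm = fm / fm.max()
    local_max = (fm == maximum_filter(fm, size=local_size)) & (fm > 0)
    peaks = fm[local_max]
    m_bar = peaks[peaks < 1.0].mean() if np.any(peaks < 1.0) else 0.0
    return fm * (1.0 - m_bar) ** 2

def saliency_from_features(intensity_maps, color_maps, orientation_maps):
    """Each argument is a list of feature maps of identical shape; returns a saliency map."""
    conspicuity = [sum(normalize_N(m) for m in maps)
                   for maps in (intensity_maps, color_maps, orientation_maps)]
    return sum(normalize_N(c) for c in conspicuity) / 3.0
```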
Figure 7. Architecture of Itti’s model
Adapted from Itti et al. (1998).
The Model of Le Meur
Another representative model comes from Le Meur et al. (2006). It is also a bottom-up model based
on Treisman’s Feature Integration Theory (Treisman & Gelade, 1980) and the biologically plausible
architecture proposed by Koch and Ullman (1987). This model was first described by Le Meur et al.
(2006) and then modified in (Le Meur, Le Callet, & Barba, 2007), in order to take into account motion.
We introduce here the original version of the model.
Le Meur’s model (see Figure 8) builds on a coherent psychovisual space. Three aspects of the vision
process are tackled: visibility, perception, and perceptual grouping. The “visibility” process simulates
the limited sensitivity of the human visual system. For an input image, the RGB luminance is first trans-
formed into Krauskopf’s color space (A, Cr1 and Cr2), which simulates the three channels used by the
retina for visual information encoding. The first channel, A, conveys the achromatic perceptual signal;
the second channel, Cr1, conveys the chromatic perceptual signal of the red-green opponent colors;
and the third channel, Cr2, conveys the chromatic perceptual signal of the blue-yellow opponent colors.
A contrast sensitivity function is then applied to each of the three channels. These contrast
Figure 8. Architecture of Le Meur’s model
(Le Meur et al., 2006).
sensitivity functions show how the sensitivity of the human eye varies as a function of spatial frequency and
orientation. A hierarchical decomposition is then applied to each of the three channels. The decomposi-
tion consists of splitting the 2D spatial frequency domain both in spatial radial frequency and in orienta-
tion. In this model, each channel is considered as the feature map corresponding to a specific set of
neurons. A “perception” process is then applied, in which a center-surround mechanism is performed
to simulate the way the HVS selects relevant areas and reduces the redundant incoming visual information.
The third process of the model is “perceptual grouping”. It refers to the human visual ability to
group and bind visual features into a meaningful higher-level structure. Finally, this computa-
tional model sums the output of the different channels to obtain a two-dimensional spatial saliency map.
Note that Le Meur et al. (2007) proposed a computational model for video, in which motion is considered
as an additional visual channel. The result of the motion channel is a temporal saliency map, which is
finally combined with the spatial saliency map to get a final saliency map.
Statistical Models
This kind of model utilizes probabilistic methods to compute the saliency. The probabilistic framework
is deduced from the content of the current image. The measure of saliency of each location is based on
various features, and is defined as the deviation of these features between the current location and its
surrounding region. Note that
1.	 The features used in statistical models refer not only to low-level visual features (e.g. color,
intensity or orientation) but also to features derived by Independent Component Analysis (ICA)
or Principal Component Analysis (PCA) algorithms; and
2.	 Even the whole image might be considered as the “surrounding region” in these models.
The Model of Bruce and Tsotsos
Bruce and Tsotsos (2009) proposed a model of saliency computation based on the premise that localized
saliency computation serves to maximize information sampled from one’s environment. The framework
of this model is depicted in Figure 9.
The first step of this model is independent feature extraction. For each location (i, j) in the image,
the responses of various learned filters that simulate V1 cortical cells are computed. This operation is
considered as measuring the response of various cortical cells coding for content at each individual
spatial location. Gabor-like cells that respond to orientation structure within a specific spatial frequency
band and cells that respond to color opponency are taken into account. This step yields a group of coef-
ficients C_{i,j} for each local neighborhood of the scene.
The second stage is density estimation. The content of each local neighborhood C_{i,j} of the image is
characterized by several coefficients a_k, which correspond to the various basis filters coding for that
location. At one spatial location and in the surrounding regions of that location, there is a set of
coefficients for a same filter type. Based on a non-parametric or histogram density estimate, the
coefficients in the surround form a distribution that can be used to predict the likelihood of the coeffi-
cients of C_{i,j}. Any given coefficient can then be converted to a probability by looking up its likelihood
from the probability distribution derived from the surround. Based on these probabilities, the joint likelihood
of each location can be computed, which is then translated into Shannon’s measure of self-information.
The resulting information map serves as the output of the model: the spatial saliency map.
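A hedged sketch of this self-information step is shown below. It assumes the filter-response stage has already produced the coefficients, uses a simple histogram over the whole image as the "surround" density estimate, and sums the per-coefficient self-information (which corresponds to the joint likelihood only under an independence assumption); bin counts and the smoothing constant are arbitrary choices.

```python
import numpy as np

def self_information_map(coeffs, n_bins=64):
    """coeffs: (H, W, K) array of K basis-filter responses per location.
    Returns an information (saliency) map: higher value = rarer content = more salient."""
    H, W, K = coeffs.shape
    info = np.zeros((H, W), dtype=float)
    for k in range(K):
        c = coeffs[:, :, k]
        hist, edges = np.histogram(c, bins=n_bins, density=True)
        p = hist * (edges[1] - edges[0])                   # probability mass per bin
        idx = np.clip(np.digitize(c, edges) - 1, 0, n_bins - 1)
        info += -np.log(p[idx] + 1e-12)                    # Shannon self-information of each coefficient
    return info
```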
The Model of Gao et al.
The model proposed by Gao et al. (2008) computes a so-called “discriminant center-surround saliency”
by combining
1. The classical assumption that bottom-up saliency is a center-surround process, and
2. A discriminant saliency hypothesis.
The computation of saliency is formulated as a binary classification problem. For each location l in
the input image, the saliency is defined with respect to two classes of stimuli: stimuli of interest and null
hypothesis. Stimuli of interest refer to the observations within a neighborhood of l, (i.e. Wl
1
, which is
referred to as the center); null hypothesis refers to the observations within a surrounding window, (i.e.
Wl
0
, which is referred to as the surround). The saliency of each location is thus equal to the discriminant
power, which is quantified by mutual information, for the classification of the observed features that
comes from the center area and the surround.
Bayesian Models
In addition to information from the current image, the Bayesian framework is applied in this type of
models to also take into account prior knowledge. This prior knowledge concerns, for instance, the sta-
Figure 9. Architecture of the model of Bruce and Tsotsos
Adapted from Bruce & Tsotsos (2009).
tistics of visual features in natural scenes, including their distribution or spectral signature. Since prior
knowledge from the perceptual learning process helps the human visual system to understand the
visual environment, the integration of prior knowledge into computational models can be compared
to a visual priming effect that facilitates scene perception (Le Meur & Le Callet, 2009).
The Model of Zhang et al.
The model proposed by Zhang et al. (2008) is based on the assumption that one goal of human visual
system is to find potential targets by estimating the probability of a target at every location given the
visual features. The proposed model relies on a Bayesian probabilistic framework, in which bottom-up
saliency is regarded as the self-information of visual features; when searching for a target, the overall
saliency is considered as the point-wise mutual information between the features and the target. By
1. Letting the binary variable C denote whether a point belongs to a target class,
2.	 Letting the random variable L denote the location,
3. Letting the random variable F denote the visual features, the computation of saliency of specific
location z (e.g. a pixel) is formulated by:
S_z = p(C = 1 | F = f_z, L = l_z),

where f_z represents the feature observed at z, and l_z represents the location (i.e. pixel coordinates) of z.
Compared to other bottom-up saliency measures, which are defined solely in terms of the image cur-
rently being viewed, this model is defined based on natural statistics collected from a set of images of
natural scenes; this is why it is named SUN (Saliency Using Natural statistics). Moreover, compared to
the others, it involves only local computation on images, without calculation of global image statistics,
saliency normalization, or winner-take-all competition.
Due to the assumptions that
1. Features and location are independent and conditionally independent given C = 1, and
2. The distribution of a feature does not change with location, the formulation is given by:
log S_z = −log p(F = f_z) + log p(F = f_z | C = 1) + log p(C = 1 | L = l_z).

The first term on the right side of this equation, −log p(F = f_z), is the self-information: the rarer
the visual features are, the more informative they are. The second term, log p(F = f_z | C = 1), is a log-
likelihood term which favors feature values consistent with our knowledge of the target; it corresponds
to the top-down effect when searching for a known target. The third term, log p(C = 1 | L = l_z), is
independent of visual features and represents any prior knowledge of where the target is likely to appear.
In the free-viewing condition, both the location prior and the log-likelihood term are unknown, so the
bottom-up saliency is equal to the self-information, −log p(F = f_z).
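A hedged sketch of this free-viewing case is given below: bottom-up saliency is the self-information −log p(F = f_z), where the feature distribution is estimated from a separate pool of natural images rather than from the current image. The histogram-based density estimate, bin count, and argument names are assumptions.

```python
import numpy as np

def sun_bottom_up(feature_map, natural_feature_samples, n_bins=64):
    """feature_map: (H, W) responses of one feature on the current image;
    natural_feature_samples: 1D array of the same feature pooled over natural images."""
    hist, edges = np.histogram(natural_feature_samples, bins=n_bins, density=True)
    p = hist * (edges[1] - edges[0])                        # probability mass per bin
    idx = np.clip(np.digitize(feature_map, edges) - 1, 0, n_bins - 1)
    return -np.log(p[idx] + 1e-12)                          # rarer features are more salient
```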
4.2 Features for Visual Saliency Detection
The selection of visual features is of great importance in the computational modeling of visual attention.
According to the feature integration theory (Treisman & Gelade, 1980), three features have been widely
used in existing computational models of visual attention: intensity, color and orientation (Borji & Itti,
2013). Intensity is usually processed by a center-surround process, which is inspired by neural responses
in the lateral geniculate nucleus (LGN) and V1 cortex. To extract this feature, two types of filter are used to
simulate the response of visual cells that have a center ON (resp. OFF) and a surround OFF (resp. ON).
Color is usually taken into account by means of the red/green and the blue/yellow color pairs, which is
inspired by color-opponent neurons in V1 cortex. Orientation is usually implemented as a convolution
with oriented Gabor filters or by the application of oriented masks. Motion is also used in models for
video; in the primate brain, motion is derived by neurons in the MT and MST regions, which are selective
to the direction of motion (Borji & Itti, 2013). In addition to the basic visual features introduced previously,
some other specific features that direct human attention have been used in the modeling of visual atten-
tion (Borji & Itti, 2013), including: faces (Cerf, Harel, Einhäuser, & Koch, 2008), horizontal lines (Oliva
& Torralba, 2001), wavelets (Li, Tian, Huang, & Gao, 2010), gist (Torralba, 2003), center-bias (Tatler,
2007), spatial resolution (Hamker, 2005), optical flow (Vijayakumar, Conradt, Shibata, & Schaal, 2001),
flicker (Itti, Dhavale, & Pighin, 2004), crosses or corners (Privitera & Stark, 2000), entropy (Kadir &
Brady, 2001), ellipses (Lee, Buxton, & Feng, 2005), symmetry (Kootstra, Nederveen, & De Boer, 2008),
texture contrast (Parkhurst et al., 2002), depth (Maki, Nordlund, & Eklundh, 2000), and components derived
by ICA or PCA algorithms (L. Zhang et al., 2008; Bruce & Tsotsos, 2009).
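The sketch below illustrates the extraction of two of these classical features: orientation via a small Gabor filter bank, and red/green and blue/yellow color opponency. The filter parameters and the opponency formulas are generic illustrative choices, not those of any specific model.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(theta, size=15, sigma=3.0, wavelength=6.0):
    """Build a real-valued Gabor kernel at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + yr**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)

def orientation_maps(intensity, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """One rectified orientation feature map per Gabor orientation."""
    return [np.abs(convolve(intensity, gabor_kernel(t))) for t in thetas]

def color_opponency(rgb):
    """rgb: (H, W, 3) float image in [0, 1]; returns red/green and blue/yellow opponency maps."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    rg = r - g
    by = b - (r + g) / 2.0
    return rg, by
```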
5. EXTENSION TO THE COMPUTATIONAL MODELING
OF STEREOSCOPIC 3D VISUAL ATTENTION
After introducing the studies regarding the ground truth of 3D visual attention and the impact of depth
perception on visual attention, we particularly focus on the modeling of 3D visual attention. A new 3D
visual attention model relying on both 2D visual features and features extracted from depth information
is proposed and evaluated in this section.
We first introduce and summarize previous works on 3D visual attention, and a taxonomy of computational
models of 3D visual attention is proposed. After that, we introduce a depth-saliency-based model of 3D
visual attention. To benefit from psychophysical studies, we propose to apply Bayes’s theorem to
the results of an eye-tracking experiment using synthetic stimuli in order to model the correlation between
depth features and the level of depth saliency. We also introduce and discuss two ways of combining the
depth saliency map with the 2D saliency map.
During the viewing of stereoscopic 3D content, disparity information is used by the brain to retrieve
the 3D layout of the environment, which leads to a stereoscopic perception of depth. This change of depth
perception also largely modifies human visual attention when watching stereoscopic 3D images and videos
(Hakkinen, Kawai, Takatalo, Mitsuya, & Nyman, 2010b; Quan Huynh-Thu, Barkowsky, & Le Callet,
2011). Predicting the salient areas of a 3D scene thus becomes a challenging task due to the additional
depth information.
The challenges, importance, and new applications of visual attention for 3D content viewing were
introduced by Huynh-Thu et al. (2011). They described the conflicts that the human visual system has to
deal with when watching 3D-TV. They also explained how these conflicts might be limited, and how
visual comfort could be improved by knowing how visual attention is deployed. Several new applica-
tion areas that can benefit from knowing the location (including depth) of salient areas were also
introduced. These candidate applications exist at different steps of a typical 3D-TV delivery chain,
e.g. 3D video capture, 2D-to-3D conversion, reframing and depth adaptation, and subtitling in 3D movies.
The increasing demand for visual-attention-based applications for 3D content highlights the impor-
tance of computationally modeling 3D visual attention. However, two questions need to be addressed
when developing a 3D visual attention model:
•	 The influence of 2D visual features (e.g. color, intensity, orientation, and center-bias) in the 3D
viewing condition.
•	 The influence of depth on the deployment of visual attention in the 3D viewing condition. For instance,
it is necessary to determine how the bias of fixations according to depth (i.e. the depth-bias) and the
visual features based on depth information (e.g., surface orientation, depth contrast)
affect the deployment of human visual attention.
In the last decade, a large number of 2D visual attention models have been created. Therefore, the
first question concerns the possibility of adapting this large amount of existing 2D models into the 3D
case. On the other hand, the second question concerns the means by which the additional information,
depth, can be taken into account.
The research on 3D visual attention modeling also faces another problem: the lack of published
eye-tracking databases of 3D images. In addition to the lack of quantitative evaluation of performance,
another consequence of the lack of ground truth is that most of the existing 3D visual attention models
only take into account, in a qualitative way, the results of psychophysical experiments about depth’s
influence or the variation of 2D features’ effects. A model that quantitatively integrates experimental
observation results is still missing. Moreover, there is still no strong conclusion on the means by
which depth information should be used in 3D visual attention modeling: whether depth should be
used to weight the 2D saliency map, or alternatively be considered as an additional visual dimen-
sion from which to extract depth features and create a depth saliency map.
5.1 How 3D Visual Attention Is Affected by Various Visual Features
Based on observations from psychophysical experiments, several studies have started to examine both
qualitatively and quantitatively how visual attention may be influenced by the 2D visual features and
additional binocular depth cues.
One of the early works was done by Jansen et al. (2009) who investigated the influence of disparity
on viewing behavior in the observation of 2D and 3D still images. They conducted a free-viewing task
on the 2D and 3D versions of the same set of images. They found that additional depth information led
to an increased number of fixations, shorter and faster saccades, and increased spatial extent of explora-
tion. However, no significant difference was found between the viewing of 2D and 3D stimuli in terms
of saliency of several 2D visual features including mean luminance, luminance contrast, and texture
contrast. This consistency of the influence of 2D low-level visual features implied:
1. The importance of 2D visual feature detection in the design of 3D visual attention model, and
2. The possibility of adapting existing 2D visual attention models in the modeling of 3D visual
attention.
Liu, Cormack and Bovik (2010) examined visual features at fixated positions for stereo images with
natural content. They focused on comparing visual features extracted from fixations and random loca-
tions in the viewing of 3D still images. On one hand, they demonstrated that some 2D visual features
including luminance contrast and luminance gradient were generally higher at fixated areas. On the
other hand, their results also indicate that the disparity contrast and disparity gradient at fixated locations
are lower than at randomly selected locations. This result is inconsistent with the result from Jansen et al.
(2009), who found that observers consistently look more at depth discontinuities (high disparity contrast
areas) than at planar surfaces. One limitation of Liu et al.’s study might lie in the quality of the ground
truth disparity maps. The disparity maps they used came from a simple correspondence algorithm rather
than from any depth range sensing system or sophisticated depth estimation algorithm. The final results
might thus be affected by a considerable amount of noise in the estimated disparity maps.
Hakkinen et al. (2010) examined the difference in eye movement patterns between the viewing of 2D
and 3D versions of the same video content. They found that eye movements are more widely distributed
for 3D content. Compared to the viewing of 2D content, viewers did not only look at the main actors but
also looked at some other targets in typical movie content. Their result shows that depth information
from the binocular depth cue provides viewers with additional information, and thus creates new salient areas
in a scene. This result suggests the existence of a saliency map from depth, and a potential “summa-
tion” operation during the integration of 2D and depth saliency information. In contrast, Ramasamy et
al.’s study (Ramasamy, House, Duchowski, & Daugherty, 2009), which is related to stereo filmmaking,
showed that observers’ gaze points could be more concentrated when viewing the 3D version of some
content (e.g. scenes containing a long, deep hallway).
In terms of the depth plane where fixations tend to be located, Wang, Le Callet, Ricordel and Tour-
ancheau (2011) examined a so-called “depth-bias” in task-free viewing of still stereoscopic synthetic
stimuli. They found that objects closest to the observer always attract most fixations. The number of
fixations on each object decreases as the depth order of the object increases, except that the furthest
object receives a few more fixations than the one or two objects in front of it. The number of fixations
on objects at different depth planes was also found to be time dependent. This result is consistent with
the result of Jansen et al. (2009). Considering the influence of center-bias in 2D visual attention, these
results indicate the existence of a location prior according to depth in the viewing of 3D content. This
location prior indicates the possibility of integrating depth information by means of a weighting.
Wismeijer, Erkelens, van Ee and Wexler (2010) examined whether saccades were aligned with individual
depth cues or with a combination of depth cues by presenting stimuli in which monocular perspective
cues and binocular disparity cues conflicted. Their results indicate a weighted linear combination of
cues when the conflicts are small, and a cue dominance when the conflicts are large. They also found
that vergence is dominated only by binocular disparity. Their result implies that the interocular distance
recorded by binocular eye-tracking experiments for 3D content should be compensated by taking into
account the local disparity value.
5.2 Previous Works on 3D Visual Attention Modeling
As introduced previously, great efforts have been put into the study of the viewing behavior of 3D content.
However, only a few computational models of 3D visual attention have been proposed, as compared to
the large body of 2D visual attention models. Experimental results have demonstrated strong influences of
2D visual features in the viewing of 3D content. However, due to the addition of new depth cues, depth
features, and their combination or conflicts with other monocular cues (Hoffman, Girshick, Akeley, &
Banks, 2008; Okada et al., 2006), a direct use of 2D visual attention models for 3D content is neither
biologically plausible nor effective.
Furthermore, the disparity between the two views raises serious challenges for collecting 3D gaze
points and creating fixation density maps, which are used as ground truth, since the gaze data need to be
extrapolated or processed to provide a notion of depth in relation to gaze direction or location (Quan
Huynh-Thu et al., 2011).
In the literature, a few computational models of 3D visual attention have been proposed. All of these
models contain a stage in which 2D visual features are extracted and used to compute 2D saliency maps.
According to the ways they use depth information, these models can be classified into three different
categories: depth-weighting model, depth-saliency model, and stereovision model.
Depth-Weighting Models
This type of model (Maki, Nordlund, & Eklundh, 1996; Y. Zhang, Jiang, Yu, & Chen, 2010; Chamaret,
Godeffroy, Lopez, & Le Meur, 2010) does not contain any depth-map-based feature-extraction process.
Apart from detecting salient areas by means of 2D visual features, these models share a common step in which depth
information is used as a weighting factor for the 2D saliency. The saliency of each location (e.g. pixel,
target or depth plane) in the scene is directly related to its depth. Both the 2D scene and the depth map are
taken as input. Note that the depth maps used in these models can be ground-truth depth maps provided by
depth sensing equipment, or can come from depth estimation algorithms that use two or multiple views.
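As an illustration of this shared weighting step, the following minimal sketch modulates a 2D saliency map pixel-wise by a weight derived from the normalized depth map, so that pixels closer to the observer receive higher saliency. It is our illustration of the general idea, not a reimplementation of any of the cited models; the function and parameter names are ours.

import numpy as np

def depth_weighted_saliency(saliency_2d, depth_map, closer_is_salient=True):
    # Hypothetical depth-weighting step: modulate a 2D saliency map by depth.
    # saliency_2d: 2D array from any 2D attention model.
    # depth_map: 2D array; larger values are assumed to mean farther from the observer.
    d = depth_map.astype(np.float64)
    d = (d - d.min()) / (d.max() - d.min() + 1e-9)      # normalize depth to [0, 1]
    weight = 1.0 - d if closer_is_salient else d        # closer pixels get larger weights
    s = saliency_2d * weight                             # pixel-wise modulation
    return s / (s.max() + 1e-9)                          # renormalize to [0, 1]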
Depth-Saliency Models
The models in this category (Ouerhani & Hugli, 2000; Potapova, Zillich, & Vincze, 2011) treat depth
saliency as additional information. This type of model relies on the existence of “depth saliency maps”.
Depth features are first extracted from the depth map to create additional feature maps, which are used to
generate the depth saliency maps. These depth saliency maps are finally combined with 2D saliency maps
(e.g. from 2D visual attention models using color, orientation or intensity) by means of a saliency-map pooling
strategy to obtain a final 3D saliency map. This type of model takes the 2D scene and a depth map as input.
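Assuming a depth saliency map has already been derived from such depth features, the pooling step that distinguishes this category from depth weighting can be sketched as follows. This is our illustration only; the weights and function names are not taken from any cited model.

import numpy as np

def normalize_map(m):
    # Rescale a saliency map to [0, 1] before pooling.
    m = m.astype(np.float64)
    return (m - m.min()) / (m.max() - m.min() + 1e-9)

def pool_saliency_maps(saliency_2d, depth_saliency, w_2d=0.5, w_depth=0.5):
    # Saliency-map pooling: weighted sum of the 2D and depth saliency channels.
    s = w_2d * normalize_map(saliency_2d) + w_depth * normalize_map(depth_saliency)
    return normalize_map(s)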
Stereovision Models
Instead of directly using a depth map, this type of model takes into account the mechanisms of stereoscopic
perception in the human visual system. Bruce and Tsotsos (2005) extend the 2D models that use a visual
pyramid processing architecture by adding neuronal units for modeling stereo vision. Images from both
views are taken as input, from which 2D visual features can be extracted. In addition, the model takes
into account the conflicts between the two eyes resulting from occlusions or large disparities.
Summary of the Previous Studies
Table 4 introduces the main properties of the models belonging to each of the three categories. So far,
most of the existing computational models of 3D visual attention belong to the first or the second cat-
egory. Figure 10 summarizes the two different ways by which depth information is taken into account
in these two types of model.
5.3 Recent Works: Depth-Saliency-Based
Computational Model of 3D Visual Attention
Depth features have demonstrated their contribution to predicting the saliency map of 3D images. Several
depth features (e.g. surface curvature, depth gradient, relative surface orientation) have been proposed
and used in previous 3D models. In this section, the way of creating a depth map from which depth features
can be extracted is first introduced. In a second step, we introduce a new method for generating
a so-called “depth saliency map”. The depth saliency map is computed based only on depth features (i.e.
depth contrast) by using a Bayesian framework.
Depth Map Creation
We propose that a depth map providing information about a scene’s perceived depth needs to be computed
as the first step of modeling 3D visual attention. In the literature, the disparity map is usually directly
adopted as depth information (Chamaret et al., 2010).
Table 4. Main features of computational models of 3D visual attention

Depth-weighting models:
Maki, Nordlund and Eklundh (2000). Depth information: relative depth. Operation: assigns the target closest to the observer the highest priority. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
Zhang et al. (2010). Depth information: perceived depth, pop-out effect. Operation: irregular space conversion; pixels closer to the observer and in front of the screen are considered more salient. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
Chamaret et al. (2010). Depth information: relative depth. Operation: weights each pixel in the 2D saliency map by its depth value. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.

Depth-saliency models:
Ouerhani and Hugli (2000). Depth information: absolute depth (distance), surface curvature, depth gradient. Operation: extracts depth features from the depth map, computes additional conspicuity maps based on them, and pools all the conspicuity maps (from 2D and depth features). Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
Potapova et al. (2011). Depth information: surface height, relative surface orientation, occluded edges. Operation: computes one saliency map for each (2D and depth) feature, then sums all the saliency maps. Validation: qualitative assessment and quantitative comparison to labeled ROIs.

Stereovision models:
Bruce and Tsotsos (2005). Depth information: disparity. Operation: takes the two views as input and adds interpretive neuronal units for stereo-vision modeling into a 2D computational model that uses a visual pyramid processing architecture. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
We propose that a transformation from the disparity map to a depth map representing perceived depth in units
of length should be added to the chain of 3D visual attention modeling, since the same disparity value
corresponds to different perceived depths depending on the viewing conditions.
From the viewpoint of the display system, disparity is measured in pixels. The relationship between
disparity (in pixels) and perceived depth can be modeled by the following equation (see also Figure 11):
D = V / (1 + (I · Rx) / (P · W)), (1)
Figure 10. Two different ways of using depth in the depth-weighting models (left), and the depth-saliency
models (right); note that the main difference between these two types of model is the existence of a stage
for extracting depth features and creating depth saliency map.
Figure 11. The relationship between disparity and perceived depth (top view)
where D represents the perceived depth, V the viewing distance between the observer and the screen plane,
I the interocular distance, P the disparity in pixels, and W and Rx, respectively, the width (in cm) and the
horizontal resolution of the screen.
According to Equation 1, perceived depth is not only a function of disparity but is also influenced by
the viewing conditions (the viewing distance and the properties of the display). For instance, an 8-pixel
negative disparity creates a perceived depth of about 3.5 cm behind the screen when it is presented
on a 24-inch full-HD stereoscopic display viewed at 1 meter (three times the screen height).
However, the same disparity corresponds to an infinite perceived depth on a 2K cinema screen with an
8-meter screen height and an 8-meter viewing distance. When the viewing condition varies, the change
of perceived depth resulting from the same disparity value might even make some areas of a 3D scene impossible
to fuse. Consequently, the saliency distribution can differ. In this study, we adopt Equation 1 to
compute the depth map for each image; the interocular distance is set to 6.3 cm, while the screen
parameters are set according to the setup of the eye-tracking experiment conducted for building the
Bayesian-based model presented below.
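For illustration, the sketch below is a direct transcription of Equation 1, with the 24-inch full-HD setup discussed above as a usage example. The 53.1 cm screen width is our assumption for a 16:9 24-inch display, and the sign convention follows the text (negative disparity, negative result, point behind the screen).

def perceived_depth_cm(P, V, W, Rx, I=6.3):
    # Equation 1: perceived depth (in cm) from disparity in pixels.
    # P: disparity in pixels, V: viewing distance (cm), W: screen width (cm),
    # Rx: horizontal resolution (pixels), I: interocular distance (cm).
    return V / (1.0 + (I * Rx) / (P * W))

# 24-inch full-HD display (assumed width ~53.1 cm) viewed at 1 m:
print(perceived_depth_cm(P=-8, V=100.0, W=53.1, Rx=1920))
# prints roughly -3.6; its magnitude matches the ~3.5 cm behind the screen mentioned above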
A Bayesian Approach of Depth Saliency Map Generation
In the area of saliency map creation, Bayes’ theorem has been widely applied in various ways (L. Zhang,
Tong, Marks, Shan, & Cottrell, 2008; Chikkerur, Serre, Tan, & Poggio, 2010; Pinneli & Chandler, 2008).
In this chapter, we propose a new application of Bayes’ theorem for computing a depth saliency map based
on features extracted from a depth map. The proposed approach correlates depth features with the level
of depth saliency, using data from a psychophysical experiment.
We first introduce the proposed definition of depth saliency: the depth saliency (S) of each location
(a pixel) equals the probability of this point being gazed at, given the depth features observed at this point
and the spatial location of this point:

S = P(C = 1 | f_dep, l_z), (2)
where C is a binary random variable denoting whether or not a point is gazed at, the random vector
f_dep denotes the depth features observed at this point, and l_z denotes its location in depth. Note that the
term ‘features’, f_dep, stands not only for local visual features such as relative depth (i.e. disparity)
and absolute depth (i.e. distance to the observer), but also for some higher-order features that take into
account information from the neighborhood, such as the result of applying a Difference-of-Gaussians (DoG)
kernel to the feature maps.
Regarding the right side of Equation 2, P(C = 1 | f_dep, l_z), we make two assumptions: 1) the
depth features of each point are independent of its distance to the viewer, and 2) P(C = 1) is simply a
constant.
By using Bayes’ rule, this probability can be thus transformed to:
S P C f P C l const
dep z
= =
( )⋅ =
( )⋅
1 1
| | . (3)
The first term in Equation 3, P(C = 1 | f_dep), represents the probability of a point being gazed at,
given only the features extracted from the depth information at this point. By computing this probability,
the saliency map from the depth channel can be obtained. The second term in this equation, P(C = 1 | l_z),
represents the probability of a point being gazed at given its distance to the viewer. This probability
reflects the observers’ viewing strategy, the bias of eye position, or prior knowledge about the
distance at which potential targets are likely to appear. Compared to the well-known ‘center-bias’ location
prior in the viewing of 2D images (Tatler, 2007; Tseng, Carmi, Cameron, Munoz, & Itti, 2009),
relatively little is known about this preference of observation in depth. Recently, this preference
was quantified and named the ‘depth-bias’ by Wang et al. (2011). Therefore, based on the proposed
model of depth saliency, the saliency value of each point in a three-dimensional scene can be considered
as a combination of visual saliency from depth features and a depth prior. However, studying the depth-bias
is not in the scope of this chapter. In the following part, we focus on modeling
P(C = 1 | f_dep), omitting the depth prior part.
By using Bayes’ rule, we can get:
P(C = 1 | f_dep) = α · P(f_dep | C = 1) / P(f_dep), (4)
where α is a constant representing the probability P(C = 1). The function P(C = 1 | f_dep) represents
how the depth features observed at a point influence the probability that the human visual system
decides to fixate this point. This probability is proportional to the feature distribution
at gaze points, normalized by the rarity of the features in the context (see Equation 4). Note that the use of
the likelihood P(f_dep | C = 1) in the proposed approach differs from the way it is usually used
by many models in the literature that also apply Bayes’ theory. We are not performing any binary classification
to decide whether a point is a fixation or not. Instead, we define the result (i.e. the depth saliency map)
as a distribution of the probability of points being gazed at, as a function of depth features.
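A minimal sketch of how this probability could be estimated in practice follows. It is our histogram-based illustration of the likelihood ratio in Equation 4 (the constant α is omitted; the function name, binning, and the assumption that a fixation mask from an eye-tracking experiment is available are ours), not the authors' implementation.

import numpy as np

def depth_saliency_from_feature(feature_map, fixation_mask, n_bins=64):
    # Illustration of Equation 4: P(C=1 | f) proportional to P(f | C=1) / P(f).
    # feature_map: 2D array of a depth feature (e.g. depth contrast).
    # fixation_mask: boolean 2D array of the same shape, True where observers fixated.
    f = feature_map.ravel().astype(np.float64)
    edges = np.histogram_bin_edges(f, bins=n_bins)
    p_f, _ = np.histogram(f, bins=edges, density=True)                # P(f)
    p_f_c1, _ = np.histogram(feature_map[fixation_mask], bins=edges,
                             density=True)                            # P(f | C=1)
    ratio = p_f_c1 / (p_f + 1e-12)            # likelihood ratio per feature bin
    idx = np.clip(np.digitize(f, edges) - 1, 0, n_bins - 1)
    saliency = ratio[idx].reshape(feature_map.shape)
    return saliency / (saliency.max() + 1e-12)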
To achieve the computation of the depth saliency map, the proposed approach consists of two stages:
1. Depth feature extraction, and
2. Probability distribution modeling.
Depth Feature Extraction
The proposed model uses depth contrast as the feature for depth saliency map prediction. In most situations,
depth contrast can be an efficient indicator of interesting targets. For example, the HVS might consider
a region protruding above a flat plane as a potential target (Potapova et al., 2011), or might consider a
hole as a place where a potential target might exist.
Based on that comparison, we conclude that Eigenfaces and Fisherfaces yield better recognition accuracy on the benchmark database than the others, and that the efficiency of the SURF with FLANN algorithm can be enhanced significantly by changing the parameters.
  • 18.  Chapter 14 Facial Expression Analysis Using 3D Range Images.......................................................................... 346 Parama Bagchi, RCC Institute of Information Technology, India Debotosh Bhattacharjee, Jadavpur University, India Mita Nasipuri, Jadavpur University, India Thisproposedworkdealswiththeusesandtechniquesof3Drangeimagesforfacialexpressionrecognition. A 3D range image is basically a depth image (also called a 2.5D image), which contains depth information at each (x, y) pixel of the image. In the future, computer vision will become a part of our everyday life because of all of its extensive applications. Hence, the interactions between users and computers need to be more natural, and emphasizing as well as enumerating human-to-human communication to a larger extent. That is the reason why facial expressions find importance. Facial expression is an important factor of communication, and they reveal unknown facts about a person’s feelings and emotions. There comes the need of a real facial expression detection system. Also, changes in expression are of great importance for the interpretation of human facial behavior as well as face recognition. Chapter 15 Scalable Video Watermarking: A Survey............................................................................................ 365 Nilkanta Sahu, Indian Institute of Technology Guwahati, India Arijit Sur, Indian Institute of Technology Guwahati, India In recent times, enormous advancement in communication as well as hardware technologies makes the video communication very popular. With the increasing diversity among the end using media players and its associated network bandwidth, the requirement of video streams with respect to quality, resolution, frame rate becomes more heterogeneous. This increasing heterogeneity make the scalable adaptation of the video stream in the receiver end, a real problem. Scalable video coding (SVC) has been introduced as a countermeasure of this practical problem where the main video stream is designed in such a hierarchical fashion that a set of independent bit streams can be produced as per requirement of different end using devices. SVC becomes very popular in recent time and consequently, efficient and secure transmission of scalable video stream becomes a requirement. Watermarking is being considered as an efficient DRM tool for almost a decade. Although video watermarking is regarded as a well focused research domain, a very less attention has been paid on the scalable watermarking in recent times. In this book chapter, a comprehensive survey on the scalable video watermarking has been done. The main objective of this survey work is to analyse the robustness of the different existing video watermarking scheme against scalablevideoadaptationandtrytodefinetheresearchproblemsforthesame.Firstly,fewexistingscalable image watermarking schemes are discussed to understand the advantages and limitations of the direct extension of such scheme for frame by frame video watermarking. Similarly few video watermarking and some recent scalable video watermarking are also narrated by specifying their pros and cons. Finally, a summaryofthissurveyispresentedbypointingoutthepossiblecountermeasureoftheexistingproblems.
  • 19.  Chapter 16 Digital Image Watermarking Based on Fractal Image Coding............................................................ 388 Channapragada R. S. G. Rao, CMR Institute of Technology, India Munaga V. N. K. Prasad, Institute for Development and Research in Banking Technology, India This chapter proposes a watermarking technique using Ridgelet and Discrete Wavelet Transform (DWT) techniques. A wavelet transform is the wavelet function representation. A wavelet is a mathematical function which divides a continuous time signal into different scale components, where each scale components is assigned with a frequency range. Wavelets represent objects with point singularities, while ridgelets represents objects with line singularities. The Ridgelet transform Technique is a multi-scale representation for functions on continuous spaces that are smooth away from discontinuities along lines. The proposed technique applies Ridgelet transform on the cover image to obtain ridgelet coefficients. These coefficients are transformed by using 2-level DWT to get low frequency sub-bands – LL1 and LL2. The mutual similarities between LL1 and LL2 sub-bands are considered for embedding watermark. The obtained watermarked image has better quality when compared to a few exiting methods. Compilation of References................................................................................................................ 400 About the Contributors..................................................................................................................... 446 Index.................................................................................................................................................... 454
  • 20.   Preface  Last few decades have witnessed a tremendous growth of image processing and computer vision. Im- provement in sensor technologies coupled with efficient and effective algorithm development for image processing and computer vision has led to wide-spread usage of images and videos in several areas. It ranges from secret sharing through visual cryptography to surveillance and robotics. These later two require real-time intelligence to analyze surrounding images. Even image based technologies have been in use in sensitive domain like banking. For example, image based cheque clearing in banking system has emerged as a new phenomenon. So integrity of images has become a concern. In fact, it is not possible to codify all possible aspects of image processing and computer vision theories and their applications within a single book. This edited book puts an effort to hold few of the application areas of image processing and computer vision. Many of these applications are benefited by theories from a psycho-visual phenomenon, called selective visual attention. Visual attention is a psycho-visual mechanism through which primates select only a few of the incoming sensory inputs for deeper analysis and recognition in brain. Though primate brains are very efficient, they cannot process the entire spectrum of sensory information in real-time. In order to carry out real-time interaction with external surrounding, primates pay attention to only selected stimuli, discarding myriads of other inputs. Which portions in a visual scene they focus has become a research problem. It has been observed that primate vision is guided by two kinds of attention mechanism – bottom-up and top-down. Bottom-up mechanism is purely stimulus driven. For example, a bright color tends to attract our vision. Top-down mechanism is a task-driven approach. This is guided by the task which an individual is performing. Computer vision tasks also feel the burden of processing a bulk amount of image or video data. The capability of primates in efficiently dealing with dynamic surrounding through selective attention mechanism motivates computer vision researchers to develop efficient computer vision systems. They emulate this attention mechanism of primate vision to find out portions of importance in the visual scene. Processing based on only these selected portions in the scene increases speed as well as quality of outcome. Beside these attention guided applications, many other interesting research areas in computer vision and image processing have also been presented in this book. Thus, this book is a compilation of research efforts in a wide spectrum of areas. Alongside highlighting usefulness of visual attention models in image processing and computer vision, this book sets up a platform of cross-disciplinary exchange of knowledge. Hopefully, these discussions will further open up new research directions. xv
  • 21. Preface This book contains few survey articles which give a good overview of the research that have been or still being carried out in some of these fields. Putting a good amount of existing research outcomes under one umbrella, these chapters are able to spot the lacunas which are still present there. Few new techniques have also been proposed in some of the chapters of this book to bring forward recent innovations. This book has been divided into two sections. The first section, which consists of seven chapters (Chapter 1 to Chapter 7), speaks about visual attention models and how various applications in computer vision and image processing benefit from these visual attention models. The second section of this book (Chapter 8 to Chapter 16) discusses numerous other techniques in computer vision and image processing. Chapter 1 discusses the conceptual theories related to visual attention mechanism. Then, it discusses various visual attention models. Both 2D and 3D models of visual attention have been presented. Chapter 2 highlights the usefulness of visual attention in image processing, computer vision and graphics through a thorough listing of applications which are benefited from theories of visual attention. Chapter 3 to Chapter 8, then, discusses few of these application areas in details. Chapter 3 sketches how attention is an important topic for robotic vision. A review on biologically-inspired models of attentive robot vision is presented in this chapter. Visual attention guided object detection and tracking has been discussed in Chapter 4. Chapter 5 is a compilation of phenomenal research efforts in the area of content-aware (based on identification of attentive regions) image retargeting. Image retargeting is useful in fitting an image of sufficiently high resolution in relatively small displays. Chapter 6 showcases the power of saliency for visual cryptography based watermarking. It proposes a novel visual cryptography based watermark- ing scheme using motion vector as a salient feature. Chapter 7 reports an electroencephalogram (EEG) based study of loss of alertness and fatigue of a driver using a visibility graph synchronization approach. The second section (Chapter 8 to Chapter 16) of this book suggests that there are lot more in image processing and computer vision which do not intersect with visual attention theories. Chapter 8 describes a generic implementation for triangle-triangle intersection and its applications. Chapter 9 explains how some of the important problems in multi-object tracking can be tackled using scale-space representation of the objects, the method of linear assignment and Kalman filter. Chapter 10 discusses state-of-the-art research in digital image forensics and highlights few of the research challenges which must draw impor- tance from research community. Chapter 11 discusses how noise feature can be used for video tampering detection. Chapter 12 provides an overview of existing palmprint based biometric recognition systems. Chapter 13 discusses techniques of emotion recognition from facial expressions. A novel technique of recognizing expression faces is proposed in Chapter 14 using 3D range images. Chapter 15 provides a survey of research in the area of scalable video watermarking. Chapter 16 proposes a technique of image watermarking based on fractal image coding. xvi
Section 1 Visual Attention Modeling and Applications
Copyright © 2016, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Chapter 1
2D and 3D Visual Attention for Computer Vision: Concepts, Measurement, and Modeling
DOI: 10.4018/978-1-4666-8723-3.ch001

Vincent Ricordel, University of Nantes, France
Junle Wang, University of Nantes, France
Matthieu Perreira Da Silva, University of Nantes, France
Patrick Le Callet, University of Nantes, France

ABSTRACT

Visual attention is one of the most important mechanisms deployed in the human visual system (HVS) to reduce the amount of information that our brain needs to process. An increasing amount of effort has been dedicated to the study of visual attention, and this chapter proposes to clarify the advances achieved in computational modeling of visual attention. First, the concepts of visual attention, including the links between visual salience and visual importance, are detailed. The main characteristics of the HVS involved in the process of visual perception are also explained. Next we focus on eye-tracking, because of its role in the evaluation of the performance of the models. A complete state of the art in computational modeling of visual attention is then presented. The research works that extend some visual attention models to 3D by taking into account the impact of depth perception are finally explained and compared.

INTRODUCTION

In everyday life, we are constantly receiving an abundant amount of information through various senses. Among the senses, sight is considered to be the most dominant one (Wandell, 1995). However, our sensory system for vision, the human visual system (HVS), continually receives a very large amount of visual data, and it is beyond our brain's capability to process all of it (Borji & Itti, 2013). To cope with this large amount of information, visual attention is one of the most important mechanisms deployed in the HVS to reduce the complexity of the analysis of the visual scene (Wolfe, 2000). Driven by visual attention, viewers can selectively focus their attention on specific areas of interest in the scene.
  • 25. 2 2D and 3D Visual Attention for Computer Vision  In the last decades, extensive efforts have been dedicated to the study of visual attention. Neurologists, psychologists, vision scientists, and computer scientists have taken part in, and contributed to various aspects of visual attention. These efforts from different disciplines made the research on visual attention become a highly interdisciplinary field; different relevant disciplines deal with the research on visual attention from different points of view, and profit from each other. In recent years, the use of visual attention mechanisms in image processing systems has found increas- ing interest by computer scientists. Taking into account visual attention information becomes an effective way for improving various existing algorithms in image processing. A variety of areas, including com- pression (Parkhurst, Law, Niebur, 2002), retargeting (D. Wang, Li, Jia, Luo, 2011), image retrieval (Vu, Hua, Tavanapong, 2003), quality assessment (H. Liu Heynderickx, 2011), have been benefiting of being provided information about the locations that attracts viewer’s attention in the visual scene. When visual attention is taken into account by the signal-processing community, the two terms, “sa- lience” and “importance”, have traditionally been considered synonymous. It is true that both of visual salience and visual importance denote the most visually “relevant” parts of the scene. However, from the vision scientist’s point of view, they are two different concepts, since they come from two different mechanisms of visual attention: bottom-up and top-down. The two mechanisms are driven by different types of stimuli, and are formed in different visual pathways that go through different areas of the brain. Therefore, it would be worth identifying the two terms in the context of image processing. In recent years, another problem faced by researchers in the field of visual attention is the impact of 3D. During the viewing of 3D content, depth perception of the scene is enhanced. This change of depth perception also largely changes human viewing behavior (Hakkinen, Kawai, Takatalo, Mitsuya, Nyman, 2010; Huynh-Thu, Barkowsky, Le Callet, others, 2011). Because of the emergence of 3D content and recent availability of 3D-capable display equipments, studies related to 3D visual attention have been gaining an increasing amount of attention in the last few years. In this chapter we propose to clarify the last advancements in computational modeling of visual at- tention. The first section details the concepts of visual attention including the latest research results linking visual salience and visual importance. The main characteristics of the human visual system involved in the process of visual perception are explained in section 2. Because of their substantial role in the evaluation of the performances of the models, eye-tracking systems are presented in section 3. A com- plete state of the art in computational modeling of visual attention is then presented in section 4. The research works that extend some visual attention models to 3D by taking into account of the impact of depth perception are finally explained in section 5. 1. VISUAL ATTENTION It would be difficult to go directly into specific studies without a general introduction of some background knowledge on visual attention. So in this section, we first introduce concepts of visual attention as well as various mechanisms of attention. 
Secondly, we present a brief introduction of the HVS and of the different types of eye movements, as well as the technique for measuring eye movements, i.e. eye-tracking. Finally, we introduce some typical state-of-the-art computational models of visual attention. The oldest and most famous definition of attention, provided by the psychologist William James (James, Burkhardt, & Skrupskelis, 1980), dates back to the year 1890: “Everyone knows what atten-
  • 26. 3 2D and 3D Visual Attention for Computer Vision  tion is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others”. In the HVS, attention plays an important role in visual processing by keeping only the essential vi- sual information. Tsotsos et al. (1995) proposed that visual attention is a mechanism having at least the following basic components: 1. The selection of a region of interest in the visual field; 2. The selection of feature dimensions and values of interest; 3. The control of information flow through the network of neurons that constitutes the visual system; and 4. The shifting from one selected region to the next in time. Driven by visual attention, viewers can therefore selectively focus their attention on specific areas of interest in the scene. 1.1 Overt Attention and Covert Attention There are two types of attention, namely overt attention and covert attention. These two types of atten- tion are differentiated based on their relation with eye movements. Overt attention is usually associated with eye movements. This type of attention is easy to observe: when we focus our attention to an object, our eyes move to fixate this object. One of the earliest studies of overt attention came from Yarbus (1967). In particular, he studied the correlation between visual at- tention and eye movements during the viewing of human faces. In addition to overt attention, William James et al. (1980) found that human are able to attend to peripheral locations of interest without moving the eyes; this type of attention is named as covert atten- tion. An advantage of covert attention is its independence to motor commands (Frintrop, 2006). Since the eyes do not need to be moved to focus attention on a certain region, covert attention is much faster as compared to overt attention. An example of covert attention is driving, where a driver keeps his eyes on the road while simultaneously covertly monitoring the status of signs and lights (Borji Itti, 2013). Overt attention and covert attention are not independent. Humans cannot attend to one location while moving their eyes to a different location (Deubel Schneider, 1996). The covert shift of attention to a location is linked to eye movement by setting up a saccade to that location (Peterson, Kramer, Irwin, 2004). Most of current studies, especially the studies of computational modeling of visual attention, are with respect to overt attention, since overt attention can be measured in a straightforward way by using eye-tracking. However, it is difficult to measure covert attention. A computational framework for covert attention is also still lacking. 1.2 Bottom-Up Attention and Top-Down Attention A shift of attention can be caused by two categories of cues: one is referred to as bottom-up cue, and the other one is referred to as top-down cue. Bottom-up attention is driven by the characteristics of a visual scene, i.e. the bottom-up cues. Bottom- up attention is hence also referred to as stimulus-driven attention or exogenous attention. Bottom-up attention is fast, involuntary, and most likely feed-forward (Borji Itti, 2013). Since bottom-up attention
is usually driven by low-level features (e.g. intensity, color, and orientation), in order to attract humans' bottom-up attention, an area must be sufficiently distinctive compared to the surrounding area with respect to these low-level visual features. On the other hand, top-down attention is based on “higher level” information, such as knowledge, expectations and current goals (Desimone & Duncan, 1995). Top-down attention is thus also referred to as concept-driven, goal-driven or endogenous attention. As compared to bottom-up attention, top-down attention is slow, voluntary and driven by task demands. A famous illustration of top-down attention comes from Yarbus's work in 1967 (Yarbus, 1967). He demonstrated how eye movements varied depending on the question asked during the observation of the same scene (see Figure 1).

1.3 The Feature Integration Theory

One of the best-known and most accepted theories of visual attention is the “Feature Integration Theory”, which was proposed by Treisman and Gelade (1980). This theory has been the basis of many computational models of visual attention.

Figure 1. An example showing that eye movements depend on the observer's viewing task (Yarbus, 1967). (Image from Lucs-kho at en.wikipedia [Public domain] via Wikimedia Commons.)
  • 28. 5 2D and 3D Visual Attention for Computer Vision  Treisman and Gelade (1980) claimed that “different features are registered early, automatically and in parallel across the visual field, while objects are identified separately and only at a later stage, which requires focused attention”. According to the Feature Integration Theory (FIT), the different features of stimuli are firstly encoded in areas partially independent. In addition, our hierarchical cortical structures are organized in order to make the detection of these features relatively independently of their positions in the visual scene. The FIT introduced a concept of “feature maps”, which are topographical maps that highlight salience according to the respective feature. Information of the feature maps is then collected in a “master map of location”. This map indicates the location of the objects, but does not provide information about what the objects are. Finally, to construct a coherent representation of the scene, selective attention is used. The scene is scanned by an attentional beam of variable size (see Figure 2). This beam blocks the information that is not located within its radius. It is thus possible to match all the features found in this area in order to build a coherent representation. By moving the beam over time, our brain gradually constructs a global perception of the scene. 1.4 Linking Visual Salience and Visual Importance Visual salience (Koch Ullman, 1987; Itti, Koch, Niebur, 1998) and visual importance (Osberger Maeder, 1998; Maeder, 1995; Etz Luo, 2000; Kadiyala, Pinneli, Larson, Chandler, 2008) come Figure 2. Illustration of the Feature Integration Theory (Perreira Da Silva, 2010).
  • 29. 6 2D and 3D Visual Attention for Computer Vision  from the two different mechanisms of visual attention, the bottom-up mechanism and the top-down mechanism, respectively. Both visual salience and visual importance can provide important insights into how the human visual system addresses the image-analysis problem. Both of them are also believed to denote the most visually “relevant” parts of the scene. However, despite the differences in the way (bottom-up) visual salience and (top-down) visual importance are determined in terms of human visual processing, both salience and importance have traditionally been considered synonymous in the signal- processing community. Experimental Study A study measuring the similarities and differences between visual salience and visual importance has been conducted. We present the results of these two psychophysical experiments and the associated computational analyses designed to quantify the relationship (and its evolution over time) between visual salience and visual importance: • A first experiment was performed to obtain visual importance maps for a large database of im- ages. A visual importance map is an object-level map that specifies the visual importance of each object in an image relative to the other objects in the image (including what would normally be considered as the background). The object(s) that receive(s) the greatest visual importance is/are traditionally considered as the image’s main subject. By using images from the Berkeley Image Segmentation Dataset, importance ratings were collected for each object in the 300 database im- ages. Such importance ratings are generally believed to result from top-down visual processing, since the decisions used to rate each object typically involve scene interpretation, object recogni- tion, and often consideration of artistic intent. • In a second experiment, visual gaze patterns were measured for 80 of the images from the same Berkeley Image Segmentation Dataset. Using an eye-tracker, visual gaze locations were recorded under task-free viewing. Whereas importance maps are driven primarily by top-down processing, visual gaze patterns are generally believed to be driven by bottom-up, signal-based attributes, at least for early gaze locations. Bottom-up saliency (Koch Ullman, 1987) is one particular signal- based attribute, which has been shown to correlate well with early gaze locations. An image re- gion is considered visually salient if it “stands out” from its background in terms of one or more attributes (e.g., contrast, color, orientation). When visual gaze patterns are measured in task-free viewing, one can consider the locations to denote the salient regions in the image. Thus, from the gaze patterns, one can construct an experimental saliency map. Results and Analysis Qualitative Observations of Importance Maps and Saliency Maps A qualitative comparison of the saliency maps and importance maps reveals some distinct similarities and differences between the two. Figure 3 depicts some representative examples. First of all, one can notice that both importance maps and saliency maps are centrally biased, because source content is itself centrally biased by the photographer (the region of interest typically tends to be in the center of the photos that people take).
  • 30. 7 2D and 3D Visual Attention for Computer Vision  Tatler (2007) has proposed a method to correct data in very specific cases by using a centrally biased sampling distribution (from fixations from same observers of other images) when sampling non-fixated points for ROC computation. This helps evaluating the real values of a saliency prediction model without the effect of central bias. In our case, this method in not applicable since we don’t evaluate the perfor- mance of a visual attention prediction model, but we compare using a fair approach, visual salience (namely here, salience and central bias) and visual importance maps. The importance maps suggest that object category plays a bigger role than most other factors in de- termining subjective importance. In general, we found that observers tended to rate objects containing human faces and/or animals (for instance, consider in Figure 3, the image at the left second line, or the one at the right first line) to be of greatest importance. Background objects such as sky and grass were generally rated to be of least importance (for instance, consider in Figure 3, the image at the left 5th line, or the one at the right 4th line). Occlusion, whether an object is in the foreground vs. the background (for instance, consider in Figure 3, the image at the right 7th line), also seems to be an important factor for perceived importance. Figure 3. Representative results from the experiments
  • 31. 8 2D and 3D Visual Attention for Computer Vision  The saliency maps generally suggest that regions that possess a distinguished shape, color, contrast, or other local spatial features attract attention. However, observers always gazed upon the image’s main subject(s): Gaze position samples tended to occur on objects which belong to animal faces, human faces, or other subjects, which represent the region of interest in the image. The background, such as sky and ground, always attracted the least attention. Yet, despite these similarities, the saliency maps and importance maps do not always agree. Although weemployedarelativelylongviewingperiod,thesaliencymapsneveryieldedanobject-levelsegregation that is enforced in the importance maps. For example, whenever a face occurred in an image, whether an animal face or a human face, the observers’ gaze positions always occurred on the face. Predicting the Main Subject, Secondary Subject, and the Background The results of the qualitative analysis suggest a relationship between saliency maps and importance maps. One way to quantify this relationship is to attempt to predict the importance maps from the saliency maps using the object-level segmentations as side-information. To predict the importance maps from the saliency maps (given the segmentations), the following two approaches were tested: 1. Mean Saliency: For each object, we summed those values of the saliency map that occurred within the object, and then we divided this value by the total number of pixels in the object. For each im- age, the resulting set of per-object saliency values was then normalized to span the range [0, 1]. 2. Coverage Saliency: For each object, we summed those values of the saliency map which occurred within the object, and then we divided this value by the number of pixels in the object that were gazed upon (specifically, the number of pixels that were covered by the fovea). For each image, the resulting set of per-object coverage saliency values was then normalized to span the range [0, 1]. To facilitate the prediction, each importance map was quantized into three classes based on the importance values: • Main Subject: Consisted of objects that received an importance value ranging from 2/3 to 1; • Secondary Subject: Consisted of objects that received an importance value ranging from 1/3 to 2/3; • Background: Consisted of objects that received an importance value ranging from 0 to 1/3. The results of the prediction by the two approaches, namely Mean Saliency and Coverage Saliency are provided in Table 1 and Table 2, respectively. The prediction accuracy is presented in the form of confusion matrices. Each row of each matrix represents the actual (importance) class, and each column represents the predicted class. An ideal prediction would yield a diagonal matrix with 100% values, which means that all the objects belonging to each actual class are successfully predicted. As shown in Table 1, Mean Saliency can successfully predict the main subject 80.5% of the time. In other words, 80.5% of the objects predicted as main subject are really the main subject of the scene; so 19.5% of the objects predicted as main subject actually belong to secondary subject or background. Similarly, the background issuccessfullypredictedapproximately47%ofthetime.Wealsofoundthat,CoverageSaliency(shownin Table 2) yields lower prediction accuracy for main subject, but slightly higher accuracy for background.
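To make the two approaches concrete, the following sketch (Python with NumPy; illustrative, not code from the original study, and the names `saliency_map`, `segmentation`, and `predict_importance` are assumptions) shows how per-object Mean Saliency could be computed from an experimental saliency map and an object-level segmentation, normalized per image, and quantized into the three importance classes defined above. Coverage Saliency would follow the same pattern, dividing by the number of gazed-upon (foveated) pixels in the object rather than by its total area.

```python
import numpy as np

def predict_importance(saliency_map, segmentation):
    """Illustrative 'Mean Saliency' prediction: average saliency per object,
    normalize the per-object scores to [0, 1], and quantize into the three
    importance classes (main subject, secondary subject, background)."""
    scores = {}
    for label in np.unique(segmentation):
        mask = segmentation == label
        # Mean Saliency: summed saliency inside the object / object area
        scores[label] = saliency_map[mask].sum() / mask.sum()

    # Normalize the per-object scores to span [0, 1] for this image
    vals = np.array(list(scores.values()), dtype=float)
    lo, hi = vals.min(), vals.max()
    norm = {k: (v - lo) / (hi - lo + 1e-12) for k, v in scores.items()}

    def to_class(v):
        # Quantization used in the experiment: thirds of the [0, 1] range
        if v >= 2.0 / 3.0:
            return "main subject"
        if v >= 1.0 / 3.0:
            return "secondary subject"
        return "background"

    return {k: (v, to_class(v)) for k, v in norm.items()}
```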
Temporal Analysis

During normal viewing, because visual attention shifts from one object to another, the number of gaze position samples that occur on each subject varies over time. For each of the three levels of importance (main subject, secondary subject, background), we analyzed this time dependence. Specifically, we computed the number of gaze position samples per importance class which occurred within each 100-ms interval during the 15-second viewing time. The resulting three time curves, summed across all observers, are shown in Figure 4.

Table 1. Confusion matrices for predicting each subject's importance from gaze data, and using the Mean Saliency approach

                              Predicted
                    Main Subject   Secondary Subject   Background
Actual
Main Subject             80.5%           29.8%            12.6%
Secondary Subject        12.5%           42.6%            40.7%
Background                7.1%           27.6%            46.7%

Table 2. Confusion matrices for predicting each subject's importance from gaze data, and using the Coverage Saliency approach

                              Predicted
                    Main Subject   Secondary Subject   Background
Actual
Main Subject             56.5%           38.6%             8.2%
Secondary Subject        13.0%           40.4%            24.7%
Background               30.5%           21.1%            67.1%

Figure 4. Total number of gaze position samples in (a) main subjects, (b) secondary subjects, and (c) background, computed in each 100-ms interval of the 15-second viewing time; note that the scale for the vertical axis in the first graph is 10x that of the other two graphs.
  • 33. 10 2D and 3D Visual Attention for Computer Vision  The plots in Figure 4 clearly indicate that, on average, subjects from different importance classes at- tract considerably different amounts of visual attention. Specifically, throughout the 15-second viewing time, the main subject always received the greatest number of gaze position samples, approximately 7-8 times greater than the number of samples for secondary subject and background. Within 0-500 ms, the number of gaze position samples for the main subject (see Figure 4 (a)) was already 4-6 times greater than the number of samples for secondary subject (see Figure 4 (b)) and back- ground (see Figure 4 (c)). This observation suggests bottom-up mechanisms can be effective at locating the main subject in these images; this might result from the fact that photographers tend to increase the saliency of the main subject via retouching, selective focusing, or other photographic techniques. Between 500-2000 ms, there was a pronounced increase in the number of gaze position samples for the main subject (see Figure 4 (a)), while the number for the other two importance classes decreased in this period (see Figures 4 (b) and (c)). These changes potentially indicate the influence of top-down mechanisms that might force observers to attend to the main subject. After this process, the number of gaze position samples for the main subject slightly decreased, and those for the other two classes slightly increased. This latter change may imply that the observers attempt to explore the whole image, but their attention is still held by the main subject. These three time curves suggest that the relationship between visual salience and visual importance may be time dependent. In particular, the fact that the main subject attracts the most attention within 0-2000ms suggests that these early gaze position samples might be a better predictor of visual impor- tance for the main subject than previously achieved using all samples. Accordingly, we predicted the importance maps by using the samples taken from only the first 0-2000ms. Table 3 lists the resulting confusion matrix computed (using Mean Saliency approach) based on gaze data of the first 2 seconds. Figure 5 depicts representative importance maps predicted from the data taken from all 15 seconds (see in Figure 5 the maps of the 3rd column) and from only the first two seconds (see in Figure 5 the maps of the 4th column). By using only these early gaze data, better prediction is achieved for the main subject. Thissectionpresentedtheresultsoftwopsychophysicalexperimentsandanassociatedcomputational analysis designed to quantify the relationship between visual salience (namely here, visual salience and central bias) and visual importance. We found that saliency maps and importance maps are related, but perhaps less than one might expect. The saliency maps were shown to be effective at predicting the main subjects. However, the saliency maps were less effective at predicting the subject of secondary importance and the unimportant one. We also found that the vast majority of early gaze position samples (0-2000 ms) were made on the main subject. This suggests that a possible strategy of the human visual system is to quickly locate the main subject in the scene. Table 3. 
Confusion matrix (using Mean Saliency approach) for predicting importance from the first 2 seconds of gaze samples

                              Predicted
                    Main Subject   Secondary Subject   Background
Actual
Main Subject             89.0%           43.5%            12.4%
Secondary Subject         3.3%           43.5%            27.2%
Background                7.7%           13.0%            60.5%
The implications of the findings presented in this chapter are quite important for image processing. Several algorithms have been published which can successfully predict gaze patterns, e.g. (Itti et al., 1998; Le Meur, Le Callet, Barba, & Thoreau, 2006). These results suggest that these predicted patterns can be used to predict importance maps when coupled with a segmentation scheme. In turn, the importance maps can then be used to perform importance-based processing such as auto-cropping, enhancement, compression, unequal error protection, and quality assessment.

Figure 5. Representative results of using all gaze samples vs. only those from the first two seconds to predict the importance maps
  • 35. 12 2D and 3D Visual Attention for Computer Vision  Removing the central bias in the analysis of the relationship between visual salience and visual impor- tance is not a simple task, and it was not done in this initial work (J. Wang, Chandler, Le Callet, 2010) on which this part of this chapter is based. This issue is for the moment left for further investigations. 2. VISUAL ATTENTION AND THE HUMAN VISUAL SYSTEM Since visual attention is a mechanism involved in the process of visual perception, it is of importance to introduce also the knowledge regarding how visual information is processed in the human visual system. While being far from an exhaustive explanation of the HVS and the mechanisms involved in the processing of visual information, we briefly present in this section an introduction of the retina and different areas of the visual cortex (Figure 6) that allow determining the main characteristics of the HVS. 2.1 The Retina The retina is a light-sensitive surface, which has over 100 million photoreceptor cells (Mather, 2009). The photoreceptor cells are responsible for transducing light energy into neural signals. Note that the retina is not of uniform spatial resolution. The density of photoreceptor cells is higher at the center, which enables vision to be more accurate at the center (i.e. the fovea) than at the periphery. There are two types of photoreceptor cells: rods and cones, which are sensitive to light and color, respectively. Cone photoreceptors can be divided into three classes based on their spectral sensitivity: “Blue” or short wavelength (S); “Green” or medium wavelength (M); and “Red” or long wavelength (L). The photoreceptor cells are connected to Ganglion cells, which provide the output signal from the retina. The receptive field of ganglion cell is circular and separated into two areas: a center area and a surround area. Two types of ganglion cells exist: the on-center cells, which respond excitatorily to light at the center and off-center cells, which respond inhibitorily to light at the center (Frintrop, 2006). The center area and the surround area always have opposite characteristics. This is why center-surround filtering is used as a mechanism for processing visual information in many computational models of visual attention. Figure 6. The human visual system: from the retina to different areas of the visual cortex Adapted from Kandel et al. (2000).
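As a rough, stand-alone illustration of this center-surround principle (a sketch, not code from the chapter; the function name and sigma values are arbitrary assumptions), an on-center/off-surround response can be approximated by a difference of Gaussians applied to a feature channel:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround(feature, sigma_center=1.0, sigma_surround=8.0):
    """Difference-of-Gaussians approximation of center-surround filtering:
    a finely blurred (center) and a coarsely blurred (surround) copy of a
    feature channel are subtracted, so locations that differ from their
    neighborhood produce a strong response."""
    f = feature.astype(float)
    center = gaussian_filter(f, sigma_center)
    surround = gaussian_filter(f, sigma_surround)
    on_center = np.clip(center - surround, 0.0, None)   # bright center, dark surround
    off_center = np.clip(surround - center, 0.0, None)  # dark center, bright surround
    return on_center, off_center
```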
  • 36. 13 2D and 3D Visual Attention for Computer Vision  2.2 Visual Pathway From the retina the optic nerve transmits visual information to the optic chiasm. From the optic chiasm, two visual pathways go to different areas of brain (see Figure 6). In primates, the majority (approximately 90%) of the visual information is transmitted by the retino-geniculate pathway to the Lateral Geniculate Nuclei (LGN); the remaining 10% goes to the superior colliculi. LGN cell fibers then transmit visual information to the cortical receiving area for vision, known as primary visual cortex or V1, which is located at the very back of the brain. From the primary visual cortex, the visual information is sent to higher brain areas, namely extrastriate cortex. The extrastriate cortex includes: V2, V3, V4, the infero- temporal cortex, the middle temporal area and the posterior-parietal cortex (Frintrop, 2006). There is evidence that the connections between extrastriate areas segregate into two pathways after area V2: a ventral path way and a dorsal pathway. The dorsal pathway, which concerns the motion and depth information, runs via V3 to the middle temporal area (MT), then to the medial superior temporal area (MST) and the parieto occipale area (PO) and finally to the posterior-parietal cortex (PP). The dor- sal pathway is also named as the “where pathway”, since it mainly deals with the question of “where” something is in a scene. The ventral pathway, which processes color and form information, runs to V4 and finally in infero-temporal cortex (IT). Since the area IT responds to the recognition of objects, this pathway is also named as the “what pathway”. 2.3 Attentional Mechanisms in the Brain So far, it is believed that visual attention is not guided by any single brain area. Several areas have been found to be involved in the attentional process, but the accurate task and behavior of each area, as well as the interplay among these areas, still remain an open questions (Frintrop, 2006). Nevertheless, several findings have been claimed. It was proposed that the posterior-parietal cortex responds to disengaging the focus of attention from its present location (inhibition of return); the superior colliculus (SC) is re- sponsible for shifting the attention to a new location (Posner Petersen, 1990). The Frontal Eye Field area of the prefrontal cortex is found to be involved in guiding the eye movements. Additionally, this area is also the place where a kind of saliency map is located, which is affected by both bottom-up and top-down information (Bichot, 2001). 3. EYE MOVEMENTS AND EYE-TRACKING Eye tracking is a technique which records the eye movements so that researchers can obtain precise information about 1. Where an observer is looking at any given time, and 2. The sequence in which his eyes are shifting from one location to another. Eye tracking plays a substantial role in the research of psychology, biology, computer vision, and especially the computational modeling of visual attention. Given the strong link between overt visual
  • 37. 14 2D and 3D Visual Attention for Computer Vision  attention and eye movements (Itti Koch, 2001; Wolfe Horowitz, 2004), eye movements data col- lected by means of eye tracking experiment are used as the ground truth to evaluate the performance of computational models. 3.1 Measures of Eye-Movements Just and Carpenter (1976) assumed that what a person is looking at indicates what is at the “top of the stack” in cognitive processes. This “eye-mind” hypothesis implies that the eye movements provide a trace about where a person’s (overt) attention is being directed. There exist various types of eye movements. Two basic ones are “fixation” and “saccade”. From these two basic eye movements, another measure- ment, “scanpath”, is stemmed. Moreover, pupil size and blink rate are also two types of eye movements usually studied. Introduction of each type of eye movement as well as metrics based on these basic types of eye movement are presented below. Fixations A fixation means that the visual gaze is approximately stationary on a single location. Fixations last for 218 milliseconds on average, with a range of 66 to 416 milliseconds (Poole Ball, 2006). Based on fixations, several metrics can be derived: • Fixations per Area of Interest: Experiments show that more fixations on a particular area indi- cate a greater interest or importance of a target (J. Wang, Chandler, Le Callet, 2010). And it may also mean that the target is complex in some way and difficult to encode (Just Carpenter, 1976). Jacob and Karn (2003) suggest that, in a search task, a higher number of fixations often means a greater uncertainty in recognizing a target item. • Fixations Duration: A longer fixation can be interpreted in two ways, it may imply that informa- tion is difficult to extract, or the object is more engaging in some way (Just Carpenter, 1976). • Fixation Spatial Distribution: Cowen et al. (2002) suggested that highly concentrated fixations in a small area mean a focused and efficient searching, and evenly spread fixations indicate a wide- spread and inefficient searching. It was also found that if an object contains an area with highly concentrated fixations, the object tends to be considered as of high importance (J. Wang et al., 2010). • Repeat Fixations or “Post-Target Fixations”: A higher number of off-target fixations after the target has been fixated (i.e., a lower number of repeat fixations) means that the target lacks mean- ingfulness or visibility (Goldberg Kotval, 1999). • Time to First Fixation On-Target: A shorter time to first-fixation on an object or area indi- cates that the object or area has better attention-getting properties (Byrne, Anderson, Douglass, Matessa, 1999). Note that in the studies of computational modeling of visual attention, fixation spatial density is the metric mostly used, by means of computing a so-called “fixation density map”.
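As a minimal sketch of how such a fixation density map might be computed from eye-tracking output (Python with NumPy/SciPy; not code from the chapter), the example below assumes fixations are given as (x, y, duration) tuples in pixel coordinates; weighting each fixation by its duration and the Gaussian width chosen to approximate the foveal extent are assumptions that vary between studies.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, height, width, sigma=25.0):
    """Accumulate fixation locations into an image-sized array and blur it
    with a Gaussian to obtain a fixation density map, normalized to [0, 1]."""
    fdm = np.zeros((height, width), dtype=float)
    for x, y, duration in fixations:       # (x, y) in pixels, duration in ms
        col, row = int(round(x)), int(round(y))
        if 0 <= row < height and 0 <= col < width:
            fdm[row, col] += duration      # assumption: weight by duration
    fdm = gaussian_filter(fdm, sigma)      # sigma ~ foveal extent in pixels
    return fdm / fdm.max() if fdm.max() > 0 else fdm
```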
  • 38. 15 2D and 3D Visual Attention for Computer Vision  Saccades Saccades are those quick, simultaneous movements of both eyes in the same direction (Cassin, Solomon, Rubin, 1990). They are fast movements of eyes occurring between fixations. It is generally believed that no encoding takes place in the human visual system during saccades, so vision is suppressed and it is difficult for us to get any clues about the complexity or salience of an object from the saccades. However, information about visual perception can be still extracted from several saccade metrics: • Number of Saccades: A larger number of saccades indicates that more searching takes place dur- ing the observation (Goldberg Kotval, 1999). • Saccade Amplitude: Saccade amplitude is computed by measuring the distance between one saccade’s start point (a fixation) and its end point (another fixation). Larger amplitude indicates the existence of more meaningful cues, since the attention is drawn from a distance (Goldberg, Stimson, Lewenstein, Scott, Wichansky, 2002). Scanpaths Scanpath is a metric derived from the measurement of both fixations and saccades. A scanpath means a complete saccade-fixate-saccade sequence. The area covered by scanpath indicates the area observed. A longer scanpath means a less efficient searching (Goldberg et al., 2002). Additionally, we can compare the time spent for searching (saccades) to the time spent for processing (fixation) in a scanpath. A higher saccade/fixation ratio means more searching or less processing. Blink Rate and Pupil Size The blinking of eyes and the variation of pupil size are two eye movements that could also be recorded during eye tracking experiments. They can be considered as a cue that indicates cognitive workload. A lower blink rate is assumed to indicate a higher cognitive workload (Bruneau, Sasse, McCarthy, 2002), and a higher blink rate may indicate visual fatigue (Brookings, Wilson, Swain, 1996). The changing of pupil size also indicates some kinds of cognitive effort (Marshall, 2000). However, the blink rate and the pupil size can be easily affected by many factors during the observation, e.g. the luminance of environment. Due to this reason, blink rate and pupil size are not widely used in the researches related to visual attention. 3.2 Eye-Tracking Eye tracking is a technique which records eye movements so that the researchers can obtain precise information about 1. Where an observer is looking at any given time, and 2. The sequence in which his eyes are shifting from one location to another.
  • 39. 16 2D and 3D Visual Attention for Computer Vision  Eye tracking has thus been deployed in a variety of disciplines to capture and analyze overt visual attention of human observers, including neuroscience, psychology, medicine, human factors, marketing, and computer science (Duchowski, 2002). Thecommongoalamongstallthesedisciplinesistocapturehumanviewingbehaviorwhenperforming specific visual tasks in a given context. For instance, in marketing research it is of interest to determine what products customers attend to in order to maximize profit (Wedel Pieters, 2007). In medical research it is of interest to identify the search patterns of radiologists when investigating mammograms for improved breast cancer detection (Wedel Pieters, 2007). In image and video quality assessment, taking into account the attention of viewers to artifacts may lead to enhanced quality prediction models (Engelke, Kaprykowsky, Zepernick, Ndjiki-Nya, 2011). In the context of computational modeling of visual attention, eye-tracking results are usually post-processed into scanpaths or so-called fixation density maps (FDM), which are considered to be a reliable ground truth for developing computational models of visual attention. Background of Eye Tracking Technique The technology of eye tracking appeared more than 100 years ago in reading research (Rayner, Pollatsek, Ashby Clifton, 2012). Since, different techniques have been applied to eye tracking. For instance, the “electro-oculographic techniques” needs to put electrodes on the skin around the eye so that eye movements can be detected by measuring the differences in electric potential. Some other methods rely on wearing large contact lenses. The lenses cover the cornea (the transparent front part of the eye) and sclera (the white part of the eye), while a metal coil is embedded around the lens so it moves along with the eye. The eye movements can thus be measured by fluctuations in an electromagnetic field when the eye is moving (Duchowski, 2007). However, these historical methods affect observers’ eye-movement and are inconvenient to implement. Video-based techniques are used by modern eye-trackers to determine where a person is looking at (i.e., the so-called “gaze point” or “point-of-regard”). These eye-trackers achieve the detection of point-of-regard based on the eye’s features extracted from video images of the eye, such as corneal re- flections (i.e. Purkinje images), iris-sclera boundary, and the apparent pupil shape (Poole Ball, 2006; Duchowski, 2007). Most state-of-the-art commercial eye trackers use the “corneal-reflection/pupil-centre” method to measure the point-of-regard. The corneal reflection is also known as (first) Purkinje image. During the eye-tracking recording, a camera focuses on one or both eyes to get images. Contrast is then used to get the location of the pupil, and infrared light is used to create a corneal reflection. By measuring the movements of corneal reflection relative to the pupil, it is then possible to know the head movement, eye rotation, the direction of gaze and consequently the point-of-regard. Algorithms for Eye-Movement Identification Given the information about eye-movement type (e.g. fixations, saccades) and their characteristics (e.g. duration, spatial distribution), various subsequent analyses can then be performed depending on the particular context and application of the research. However, the raw eye-movement data output from
eye-tracking experiments are usually presented as a stream of sampled gaze points. Post-processing needs to be performed to identify the different types of eye movements from these gaze points. Fixation detection algorithms extract and label fixations and saccades from raw eye-tracking data (i.e. sample points). These algorithms can identify the fixations, the saccades taking place between two successive fixations, and the smaller eye movements occurring during fixations, such as tremors, drifts, and flicks (Salvucci & Goldberg, 2000). Note that fixation identification is a critical aspect of eye-movement data analysis, since its result can significantly affect later analyses. Evidence has shown that different identification algorithms can produce different interpretations even when analyzing the same eye-tracking data (Karsh & Breitenbach, 1983). Salvucci and Goldberg (2000) suggested that most fixation identification algorithms take advantage of the following spatial or temporal features:

• Velocity: Some algorithms take advantage of the fact that fixation points have much lower velocities than saccades. Since the sampling rate of an eye-tracker is generally constant, the velocity is proportional to the distance between successive sample points.
• Dispersion: Some algorithms emphasize the spread distance (i.e. dispersion) of fixation points. They assume that the sample points belonging to a fixation generally occur near one another, whereas saccade points lie far from the others.
• Duration Information: This criterion is based on the fact that fixations are rarely shorter than 100 ms and are usually in the range of 200-400 ms.
• Local Adaptivity: This criterion means that the interpretation of a given point is influenced by the interpretation of temporally adjacent points.

Based on the different features selected, various fixation identification algorithms have been proposed. The two main types of fixation identification algorithms are introduced below.

Velocity-Based Algorithms

The velocity information of eye movements shows two distributions of velocities: low velocities for fixations, and high velocities for saccades. This velocity-based discrimination is straightforward and robust. Among the various velocity-based algorithms, Velocity-Threshold Identification (I-VT) is the simplest one to implement (Salvucci & Goldberg, 2000). I-VT first calculates point-to-point velocities for each point; each velocity is computed as the distance between the current point and the next (or previous) point. Each point is then classified as a saccade point or a fixation point based on a velocity threshold: if the velocity is above the threshold the point becomes a saccade point, otherwise it becomes a fixation point. Finally, I-VT translates each group of consecutive fixation points into an <x, y, t, d> representation, where x, y are the centroid of the points, t is the time of the first point, and d is the duration of the group (see the sketch at the end of this subsection).

A more sophisticated type of velocity-based algorithm is Hidden Markov Model fixation Identification (I-HMM) (Salvucci, 1999; Salvucci & Anderson, 1998). I-HMM applies a two-state HMM in which the two states represent the velocity distributions for saccade and fixation points, respectively. Generally, I-HMM can perform more robust identification than fixed-threshold methods such as I-VT (Salvucci & Goldberg, 2000).
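To make the velocity-based family more concrete, the following sketch outlines I-VT in Python with NumPy. It is an illustrative reading of the algorithm described above, not a reference implementation: the function name, the argument layout, and the use of a plain Euclidean point-to-point velocity are assumptions, and the velocity threshold has to be chosen according to the eye-tracker's sampling rate and to whether coordinates are expressed in pixels or degrees of visual angle.

```python
import numpy as np

def ivt_fixations(x, y, t, velocity_threshold):
    """Minimal I-VT sketch: label samples as fixation or saccade points by
    point-to-point velocity, then collapse each run of consecutive fixation
    samples into an <x, y, t, d> tuple (centroid, onset time, duration)."""
    x, y, t = map(np.asarray, (x, y, t))
    # Point-to-point velocity between consecutive gaze samples.
    velocity = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
    # A sample is a fixation point if the velocity to the next sample is low;
    # the last sample is labelled as non-fixation so that any open run closes.
    is_fixation = np.append(velocity < velocity_threshold, False)

    fixations, start = [], None
    for i, fix in enumerate(is_fixation):
        if fix and start is None:
            start = i                      # a new fixation run begins
        elif not fix and start is not None:
            grp = slice(start, i)          # the run covers samples [start, i)
            fixations.append((x[grp].mean(), y[grp].mean(),   # centroid
                              t[start],                       # onset time
                              t[i - 1] - t[start]))           # duration
            start = None
    return fixations
```

Grouping consecutive sub-threshold samples and reducing each group to its centroid, onset and duration reproduces the <x, y, t, d> representation described above.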
Dispersion-Based Algorithms

Dispersion-based algorithms use the fact that fixation points tend to cluster closely together because of their low velocity. Dispersion-Threshold Identification (I-DT) is a typical dispersion-based algorithm. I-DT identifies fixations as groups of consecutive points lying within a particular dispersion. A dispersion threshold is thus essential for I-DT algorithms. Moreover, a minimum duration threshold is also required, which helps alleviate equipment variability; it normally ranges from 100 ms to 200 ms (Widdel, 1984).

An implementation of the I-DT algorithm is proposed by Widdel (1984). It uses a moving window to cover consecutive data points. The moving window begins at the start of the protocol and initially contains a minimum number of points determined by the duration threshold. I-DT then computes the dispersion of the points in the window by summing the differences between the points' maximum and minimum x and y coordinates:

D = [max(x) - min(x)] + [max(y) - min(y)].

If the dispersion is above the dispersion threshold, the window moves to the following point. If the dispersion is below the threshold, the window represents a fixation and is expanded until its dispersion exceeds the threshold. The final window is marked as a fixation centered at the centroid of its points, with a given onset time and duration.
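The dispersion-based counterpart can be sketched in the same style. The listing below is a simplified, assumed rendering of the moving-window procedure attributed to Widdel (1984): the window first grows to cover the minimum duration, its dispersion D = [max(x) - min(x)] + [max(y) - min(y)] is tested against the threshold, and a sub-threshold window is expanded until the dispersion criterion fails. Parameter names and the handling of the trailing samples are illustrative choices.

```python
import numpy as np

def idt_fixations(x, y, t, dispersion_threshold, duration_threshold):
    """Minimal I-DT sketch: report every maximal window whose dispersion stays
    below the threshold for at least the minimum duration as one fixation."""
    x, y, t = map(np.asarray, (x, y, t))
    fixations = []
    i, n = 0, len(t)
    while i < n:
        # Grow the window until it spans at least the minimum duration.
        j = i + 1
        while j < n and t[j - 1] - t[i] < duration_threshold:
            j += 1
        if t[j - 1] - t[i] < duration_threshold:
            break                                # too few samples left
        if np.ptp(x[i:j]) + np.ptp(y[i:j]) <= dispersion_threshold:
            # Expand the window while its dispersion stays below the threshold.
            while j < n and np.ptp(x[i:j + 1]) + np.ptp(y[i:j + 1]) <= dispersion_threshold:
                j += 1
            fixations.append((x[i:j].mean(), y[i:j].mean(),  # centroid
                              t[i], t[j - 1] - t[i]))        # onset, duration
            i = j                                # continue after the fixation
        else:
            i += 1                               # slide the window by one point
    return fixations
```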
4. COMPUTATIONAL MODELING OF VISUAL ATTENTION

Eye-tracking experiments can be considered a reliable way to acquire the distribution of human attention over a specific scene. However, conducting eye-tracking experiments is usually cumbersome, time consuming, and hence expensive. In order to automatically predict the distribution of human attention, extensive research efforts have been dedicated to the computational modeling of visual attention. In our study, we particularly focus on the models that compute saliency maps. The results of this type of model, the saliency maps, indicate where the most visually interesting regions are located. In the past years, a body of models using various mathematical tools has been proposed. According to the taxonomy introduced by Le Meur and Le Callet (2009), most of the computational models can be grouped into three main categories: hierarchical models, statistical models, and Bayesian models.

4.1 Main Computational Models

Hierarchical Models

This kind of model is characterized by the use of a hierarchical decomposition, whether it involves a Gaussian, a Fourier-based, or a wavelet decomposition. Various feature maps are computed, and different strategies are then used to integrate information across sub-bands to create a final saliency map.

The Model of Itti

One of the most famous models of this category is the model proposed by Itti et al. (1998). It is the first computational and biologically plausible model of bottom-up visual attention, and it serves as a basis in many studies. The architecture of this model (see Figure 7) is based on the following principal steps. The original image is first decomposed into three different perceptual channels: intensity, color and orientation. A multi-scale representation is constructed from the various channels. A set of linear center-surround operations akin to visual receptive fields is performed to obtain feature maps. These feature maps are

1. Normalized by an operator N, which enhances the feature maps containing a small number of peaks of saliency, and
2. Then summed to create the conspicuity maps (intensity, color, and orientation).

These three conspicuity maps are also normalized and summed to obtain a unique saliency map. In this model, the selection of focuses of attention (FOA) is achieved by a "winner-take-all" network, which selects the most salient area in the saliency map and contains an "inhibition of return" mechanism to temporarily prevent the FOA from returning immediately to areas already visited.

Figure 7. Architecture of Itti's model. Adapted from Itti et al. (1998).
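As an illustration of the center-surround and normalization stages just described, the following sketch computes an intensity conspicuity map in the spirit of Itti's model. It is deliberately simplified and rests on several assumptions: Gaussian blurs at different sigmas stand in for the dyadic multi-scale pyramid, only the intensity channel is treated, and the map-promotion weight is a crude surrogate for the N(.) operator rather than Itti's iterative normalization.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_conspicuity(image_gray, center_sigmas=(1, 2), surround_ratios=(3, 4)):
    """Illustrative center-surround stage for the intensity channel only."""
    image_gray = np.asarray(image_gray, dtype=float)
    feature_maps = []
    for sigma_c in center_sigmas:
        center = gaussian_filter(image_gray, sigma_c)
        for ratio in surround_ratios:
            surround = gaussian_filter(image_gray, sigma_c * ratio)
            # Center-surround difference, akin to a DoG receptive field.
            feature_maps.append(np.abs(center - surround))
    conspicuity = np.zeros_like(image_gray)
    for fmap in feature_maps:
        fmap = (fmap - fmap.min()) / (np.ptp(fmap) + 1e-12)   # rescale to [0, 1]
        # Crude stand-in for Itti's N(.): maps with a few strong peaks
        # (large gap between global max and mean activity) get a larger weight.
        weight = (fmap.max() - fmap.mean()) ** 2
        conspicuity += weight * fmap
    return conspicuity
```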
The Model of Le Meur

Another representative model comes from Le Meur et al. (2006). It is also a bottom-up model, based on Treisman's Feature Integration Theory (Treisman & Gelade, 1980) and the biologically plausible architecture proposed by Koch and Ullman (1987). The model was first described by Le Meur et al. (2006) and then modified in (Le Meur, Le Callet, & Barba, 2007) in order to take motion into account. We introduce here the original version of the model.

Le Meur's model (see Figure 8) builds on a coherent psychovisual space. Three aspects of the vision process are tackled: visibility, perception, and perceptual grouping. The "visibility" process simulates the limited sensitivity of the human visual system. For an input image, the RGB signal is first transformed into Krauskopf's color space (A, Cr1 and Cr2), which simulates the three channels used by the retina for encoding visual information. The first channel, A, carries the achromatic perceptual signal; the second channel, Cr1, carries the chromatic signal of the red-green opponent colors; and the third channel, Cr2, carries the chromatic signal of the blue-yellow opponent colors. A contrast sensitivity function is then applied to each of the three channels.

Figure 8. Architecture of Le Meur's model (Le Meur et al., 2006).
These contrast sensitivity functions show how the sensitivity of the human eye varies as a function of spatial frequency and orientation. A hierarchical decomposition is then applied to each of the three channels. The decomposition consists in splitting the 2D spatial frequency domain both in spatial radial frequency and in orientation. In this model, each sub-band resulting from the decomposition is considered as the feature map corresponding to a specific population of neurons. A "perception" process is then applied, in which a center-surround mechanism is performed to simulate the way the HVS selects relevant areas and reduces the redundant incoming visual information. The third process of the model is "perceptual grouping". It refers to the human visual ability to group and bind visual features into a meaningful higher-level structure. Finally, the model sums the outputs of the different channels to obtain a two-dimensional spatial saliency map.

Note that Le Meur et al. (2007) proposed a computational model for video, in which motion is considered as an additional visual channel. The result of the motion channel is a temporal saliency map, which is finally combined with the spatial saliency map to get a final saliency map.

Statistical Models

This kind of model utilizes probabilistic methods to compute saliency. The probabilistic framework is deduced from the content of the current image. The saliency of each location is based on various features, and is defined as the deviation of these features between the current location and its surrounding region. Note that

1. The features used in statistical models refer not only to low-level visual features (e.g. color, intensity or orientation) but also to features derived by Independent Component Analysis (ICA) or Principal Component Analysis (PCA) algorithms; and
2. Even the whole image may be considered as the "surrounding region" in these models.

The Model of Bruce and Tsotsos

Bruce and Tsotsos (2009) proposed a model of saliency computation based on the premise that localized saliency computation serves to maximize the information sampled from one's environment. The framework of this model is depicted in Figure 9. The first step of this model is independent feature extraction. For each location (i, j) in the image, the responses of various learned filters that simulate V1 cortical cells are computed. This operation is considered as measuring the response of various cortical cells coding for content at each individual spatial location. Gabor-like cells that respond to orientation structure within a specific spatial frequency band, and cells that respond to color opponency, are taken into account. This step yields a group of coefficients for each local neighborhood C_{i,j} of the scene.

The second stage is density estimation. The content of each local neighborhood C_{i,j} of the image is characterized by several coefficients a_k, which correspond to the various basis filters coding for that location. At one spatial location and in the surrounding regions of that location, there is a set of coefficients for a same filter type. Based on a non-parametric or histogram density estimate, the coefficients in the surround form a distribution that can be used to predict the likelihood of the coefficients of C_{i,j}. Any given coefficient can then be converted to a probability by looking up its likelihood from the probability distribution derived from the surround. Based on these probabilities, the joint likelihood of each location can be computed, which is then translated into Shannon's measure of self-information. The resulting information map serves as the output of the model, the spatial saliency map.

Figure 9. Architecture of the model of Bruce and Tsotsos. Adapted from Bruce & Tsotsos (2009).
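The core of this pipeline, converting filter coefficients into Shannon self-information, can be sketched as follows. The listing assumes the per-pixel coefficients have already been computed (e.g. by ICA-learned or Gabor-like filters), treats the whole image as the surround, and uses a plain histogram as the density estimate; the function and parameter names are illustrative rather than taken from the original model.

```python
import numpy as np

def self_information_map(coefficients, n_bins=64):
    """Illustrative self-information stage of an AIM-like model.
    `coefficients` is an (H, W, K) array of filter responses per pixel.
    Assuming independent coefficients, the joint log-likelihood is the sum of
    per-channel log-likelihoods; saliency is its negation (self-information),
    so rare feature combinations stand out."""
    h, w, k = coefficients.shape
    info = np.zeros((h, w))
    for c in range(k):
        a = coefficients[..., c]
        hist, edges = np.histogram(a, bins=n_bins)
        p = hist / hist.sum()                          # histogram density estimate
        idx = np.clip(np.digitize(a, edges[1:-1]), 0, n_bins - 1)
        info += -np.log(p[idx] + 1e-12)                # -log likelihood per channel
    return info
```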
The Model of Gao et al.

The model proposed by Gao et al. (2008) computes a so-called "discriminant center-surround saliency" by combining

1. The classical assumption that bottom-up saliency is a center-surround process, and
2. A discriminant saliency hypothesis.

The computation of saliency is formulated as a binary classification problem. For each location l in the input image, saliency is defined with respect to two classes of stimuli: stimuli of interest and the null hypothesis. Stimuli of interest refer to the observations within a neighborhood of l (i.e. W_l^1, which is referred to as the center); the null hypothesis refers to the observations within a surrounding window (i.e. W_l^0, which is referred to as the surround). The saliency of each location is thus equal to the discriminant power, quantified by the mutual information, for classifying the observed features as coming from the center area or from the surround.

Bayesian Models

In addition to information from the current image, the Bayesian framework is applied in this type of model to also take into account prior knowledge.
This prior knowledge concerns, for instance, the statistics of visual features in natural scenes, including their distribution or their spectral signature. Since prior knowledge gained through perceptual learning helps the human visual system to understand the visual environment, the integration of prior knowledge into computational models can be compared to a visual priming effect that facilitates scene perception (Le Meur & Le Callet, 2009).

The Model of Zhang et al.

The model proposed by Zhang et al. (2008) is based on the assumption that one goal of the human visual system is to find potential targets by estimating the probability of a target at every location given the visual features. The model relies on a Bayesian probabilistic framework, in which bottom-up saliency is regarded as the self-information of visual features; when searching for a target, the overall saliency is considered as the point-wise mutual information between the features and the target. By

1. Letting the binary variable C denote whether a point belongs to a target class,
2. Letting the random variable L denote the location, and
3. Letting the random variable F denote the visual features,

the saliency of a specific location z (e.g. a pixel) is formulated as:

S_z = p(C = 1 | F = f_z, L = l_z),

where f_z represents the feature observed at z, and l_z represents the location (i.e. pixel coordinates) of z. Compared to other bottom-up saliency measures, which are defined solely in terms of the image currently being viewed, this model is defined on natural statistics collected from a set of images of natural scenes; this is the reason why it is named SUN (Saliency Using Natural statistics). Besides, compared to the others, it involves only local computation on images, without calculation of global image statistics, saliency normalization, or winner-take-all competition. Under the assumptions that

1. Features and location are independent and conditionally independent given C = 1, and
2. The distribution of a feature does not change with location,

the formulation becomes:

log S_z = -log p(F = f_z) + log p(F = f_z | C = 1) + log p(C = 1 | L = l_z).

The first term on the right side of this equation, -log p(F = f_z), is the self-information: the rarer the visual features are, the more informative they are. The second term, log p(F = f_z | C = 1), is a log-likelihood term which favors feature values consistent with our knowledge of the target; it corresponds to the top-down effect when searching for a known target. The third term, log p(C = 1 | L = l_z), is independent of visual features and represents any prior knowledge of where the target is likely to appear. In the free-viewing condition, both the location prior and the log-likelihood term are unknown, so the bottom-up saliency is equal to the self-information, -log p(F = f_z).
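A minimal sketch of the free-viewing case is given below for a single feature channel: the feature distribution p(F) is estimated from responses pooled over a separate collection of natural images, and the saliency of the current image is the self-information of its feature values under that distribution. The histogram density estimate and the parameter names are assumptions made for illustration; the original model instead fits parametric distributions to filter responses learned from natural images.

```python
import numpy as np

def sun_bottom_up_saliency(feature_map, natural_feature_samples, n_bins=128):
    """Free-viewing SUN sketch: saliency = -log p(F = f_z), with p estimated
    from feature responses collected on natural scenes, not the current image."""
    hist, edges = np.histogram(natural_feature_samples, bins=n_bins)
    p = hist / hist.sum()                                     # natural-scene statistics
    idx = np.clip(np.digitize(feature_map, edges[1:-1]), 0, n_bins - 1)
    return -np.log(p[idx] + 1e-12)                            # self-information map
```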
4.2 Features for Visual Saliency Detection

The selection of visual features is of great importance in the computational modeling of visual attention. According to the feature integration theory (Treisman & Gelade, 1980), three features have been widely used in existing computational models of visual attention: intensity, color and orientation (Borji & Itti, 2013). Intensity is usually processed by a center-surround process, inspired by neural responses in the lateral geniculate nucleus (LGN) and V1 cortex. To extract this feature, two types of filter are used to simulate the response of visual cells that have a center ON (resp. OFF) and a surround OFF (resp. ON). Color is usually taken into account by means of the red/green and blue/yellow color pairs, inspired by the color-opponent neurons in V1 cortex. Orientation is usually implemented as a convolution with oriented Gabor filters or by the application of oriented masks. Motion is also used in models for video; in the primate brain, motion is derived by neurons in the MT and MST regions, which are selective to the direction of motion (Borji & Itti, 2013). In addition to the basic visual features introduced previously, some other specific features that direct human attention have been used in the modeling of visual attention (Borji & Itti, 2013), including: faces (Cerf, Harel, Einhäuser, & Koch, 2008), horizontal lines (Oliva & Torralba, 2001), wavelets (Li, Tian, Huang, & Gao, 2010), gist (Torralba, 2003), center-bias (Tatler, 2007), spatial resolution (Hamker, 2005), optical flow (Vijayakumar, Conradt, Shibata, & Schaal, 2001), flicker (Itti, Dhavale, & Pighin, 2004), crosses or corners (Privitera & Stark, 2000), entropy (Kadir & Brady, 2001), ellipses (Lee, Buxton, & Feng, 2005), symmetry (Kootstra, Nederveen, & De Boer, 2008), texture contrast (Parkhurst et al., 2002), depth (Maki, Nordlund, & Eklundh, 2000), and components derived by ICA or PCA algorithms (L. Zhang et al., 2008; Bruce & Tsotsos, 2009).

5. EXTENSION TO THE COMPUTATIONAL MODELING OF STEREOSCOPIC 3D VISUAL ATTENTION

After introducing the studies regarding the ground truth of 3D visual attention and the impact of depth perception on visual attention, we particularly focus on the modeling of 3D visual attention. A new 3D visual attention model relying on both 2D visual features and features extracted from depth information is proposed and evaluated in this section.

We first introduce and summarize previous works on 3D visual attention, and a taxonomy of computational models of 3D visual attention is proposed. After that, we introduce a depth-saliency-based model of 3D visual attention. To benefit from psychophysical studies, we propose to apply Bayes's theorem to the results of an eye-tracking experiment using synthetic stimuli in order to model the correlation between depth features and the level of depth saliency. We also introduce and discuss two ways of combining the depth saliency map with the 2D saliency map.

During the viewing of stereoscopic 3D content, disparity information is used by the brain to retrieve the 3D layout of the environment, which leads to a stereoscopic perception of depth. This change of depth perception also largely modifies human visual attention when watching stereoscopic 3D images and videos (Hakkinen, Kawai, Takatalo, Mitsuya, & Nyman, 2010b; Quan Huynh-Thu, Barkowsky, & Le Callet, 2011). Predicting the salient areas of a 3D scene thus becomes a challenging task due to the additional depth information.
Several challenges, the importance, and new applications of visual attention for 3D content viewing were introduced by Huynh-Thu et al. (2011). They described the conflicts that the human visual system has to deal with when watching 3D-TV. They also explained how these conflicts might be limited, and how visual comfort could be improved by knowing how visual attention is deployed. Several new application areas that can benefit from being provided the location (including depth) of salient areas were also introduced. These candidate applications exist at the different steps of a typical 3D-TV delivery chain, e.g. 3D video capture, 2D-to-3D conversion, reframing and depth adaptation, and subtitling in 3D movies.

The increasing demand for visual-attention-based applications for 3D content highlights the importance of computationally modeling 3D visual attention. However, two questions need to be answered when developing a 3D visual attention model:

• The influence of 2D visual features (e.g. color, intensity, orientation, and center-bias) in the 3D viewing condition.
• The influence of depth on the deployment of visual attention in the 3D viewing condition. For instance, it is necessary to determine how the bias of fixations according to depth (i.e. the depth-bias) and the visual features based on depth information (e.g. the orientation of surfaces, the contrast of depth) affect the deployment of human visual attention.

In the last decade, a large number of 2D visual attention models have been created. The first question therefore concerns the possibility of adapting this large body of existing 2D models to the 3D case. The second question concerns the means by which the additional information, depth, can be taken into account.

Research on 3D visual attention modeling also faces another problem: the lack of published eye-tracking databases of 3D images. In addition to the lack of quantitative performance evaluation, another consequence of this lack of ground truth is that most existing 3D visual attention models only take into account, in a qualitative way, the results of psychophysical experiments about depth's influence or the variation of 2D features' effects. A model that quantitatively integrates experimental observation results is still missing. Moreover, there is still no strong conclusion on the means by which depth information should be used in 3D visual attention modeling: whether depth should be used to weight the 2D saliency map, or whether it should be considered as an additional visual dimension from which to extract depth features and create a depth saliency map.

5.1 How 3D Visual Attention Is Affected by Various Visual Features

Based on observations from psychophysical experiments, several studies have started to examine, both qualitatively and quantitatively, how visual attention may be influenced by 2D visual features and additional binocular depth cues.

One of the early works was done by Jansen et al. (2009), who investigated the influence of disparity on viewing behavior in the observation of 2D and 3D still images. They conducted a free-viewing task on the 2D and 3D versions of the same set of images. They found that additional depth information led to an increased number of fixations, shorter and faster saccades, and an increased spatial extent of exploration.
However, no significant difference was found between the viewing of 2D and 3D stimuli in terms of the saliency of several 2D visual features, including mean luminance, luminance contrast, and texture contrast. This consistency of the influence of 2D low-level visual features implied:
1. The importance of 2D visual feature detection in the design of a 3D visual attention model, and
2. The possibility of adapting existing 2D visual attention models to the modeling of 3D visual attention.

Liu, Cormack and Bovik (2010) examined visual features at fixated positions for stereo images with natural content. They focused on comparing visual features extracted at fixations and at random locations in the viewing of 3D still images. On the one hand, they demonstrated that some 2D visual features, including luminance contrast and luminance gradient, were generally higher at fixated areas. On the other hand, their results also indicate that the disparity contrast and disparity gradient of fixated locations are lower than those of randomly selected locations. This result is inconsistent with the result of Jansen et al. (2009), who found that observers consistently look more at depth discontinuities (high disparity contrast areas) than at planar surfaces. One limitation of Liu et al.'s study might lie in the quality of the ground-truth disparity maps. The disparity maps they used came from a simple correspondence algorithm rather than from a depth range sensing system or a sophisticated depth estimation algorithm. The final results might thus be affected by a considerable amount of noise in the estimated disparity maps.

Hakkinen et al. (2010) examined the difference in eye movement patterns between the viewing of 2D and 3D versions of the same video content. They found that eye movements are more widely distributed for 3D content. Compared to the viewing of 2D content, viewers did not only look at the main actors but also at other targets in typical movie content. Their result shows that depth information from the binocular depth cue provides viewers with additional information, and thus creates new salient areas in a scene. This result suggests the existence of a saliency map from depth, and a potential "summation" operation during the integration of 2D and depth saliency information. In contrast, Ramasamy et al.'s study (Ramasamy, House, Duchowski, & Daugherty, 2009), which is related to stereo filmmaking, showed that observers' gaze points could be more concentrated when viewing the 3D version of some content (e.g. scenes containing a long, deep hallway).

In terms of the depth plane where fixations tend to be located, Wang, Le Callet, Ricordel and Tourancheau (2011) examined a so-called "depth-bias" in task-free viewing of still stereoscopic synthetic stimuli. They found that the objects closest to the observer always attract most of the fixations. The number of fixations on each object decreases as the depth order of the object increases, except that the furthest object receives a few more fixations than the one or two objects in front of it. The number of fixations on objects at different depth planes was also found to be time dependent. This result is consistent with the result of Jansen et al. (2009). Considering the influence of center-bias in 2D visual attention, these results indicate the existence of a location prior according to depth in the viewing of 3D content. This location prior suggests the possibility of integrating depth information by means of a weighting.

Wismeijer, Erkelens, van Ee and Wexler (2010) examined whether saccades were aligned with individual depth cues or with a combination of depth cues, by presenting stimuli in which monocular perspective cues and binocular disparity cues conflicted.
Their results indicate a weighted linear combination of cues when the conflicts are small, and a cue dominance when the conflicts are large. They also found that vergence is dominated only by binocular disparity. Their result implies that the interocular distance recorded in binocular eye-tracking experiments with 3D content should be compensated by taking into account the local disparity value.
5.2 Previous Works on 3D Visual Attention Modeling

As introduced previously, great efforts have been put into the study of viewing behavior for 3D content. However, only a few computational models of 3D visual attention have been proposed compared to the body of 2D visual attention models. Experimental results have demonstrated strong influences of 2D visual features in the viewing of 3D content. However, due to the addition of new depth cues, depth features, and their combination or conflicts with other monocular cues (Hoffman, Girshick, Akeley, & Banks, 2008; Okada et al., 2006), a direct use of 2D visual attention models for 3D content is neither biologically plausible nor effective. Furthermore, the disparity between the two views raises serious challenges for collecting 3D gaze points and creating fixation density maps, which are used as ground truth, since the gaze data need to be extrapolated or processed to provide a notion of depth in relation to gaze direction or location (Quan Huynh-Thu et al., 2011).

In the literature, a few computational models of 3D visual attention have been proposed. All of these models contain a stage in which 2D visual features are extracted and used to compute 2D saliency maps. According to the way they use depth information, these models can be classified into three categories: depth-weighting models, depth-saliency models, and stereovision models.

Depth-Weighting Models

This type of model (Maki, Nordlund, & Eklundh, 1996; Y. Zhang, Jiang, Yu, & Chen, 2010; Chamaret, Godeffroy, Lopez, & Le Meur, 2010) does not contain any depth-map-based feature-extraction process. Apart from detecting salient areas by 2D visual features, these models share a common step in which depth information is used as a weighting factor applied to the 2D saliency. The saliency of each location (e.g. pixel, target or depth plane) in the scene is directly related to its depth. Both the 2D scene and a depth map are taken as input. Note that the depth maps used in these models can be ground-truth depth maps provided by depth sensing equipment, or can come from depth estimation algorithms which use two or multiple views.

Depth-Saliency Models

The models in this category (Ouerhani & Hugli, 2000; Potapova, Zillich, & Vincze, 2011) take depth saliency as additional information. This type of model relies on the existence of "depth saliency maps". Depth features are first extracted from the depth map to create additional feature maps, which are used to generate the depth saliency maps. These depth saliency maps are finally combined with 2D saliency maps (e.g. from 2D visual attention models using color, orientation or intensity) by means of a saliency map pooling strategy to obtain a final 3D saliency map. This type of model takes as input the 2D scene and a depth map.

Stereovision Models

Instead of directly using a depth map, this type of model takes into account the mechanisms of stereoscopic perception in the human visual system. Bruce and Tsotsos (2005) extend the 2D models that use a visual pyramid processing architecture by adding neuronal units for modeling stereo vision. Images from both views are taken as input, from which 2D visual features can be considered. In addition, the model takes into account the conflicts between the two eyes resulting from occlusions or large disparities.
Summary of the Previous Studies

Table 4 introduces the main properties of the models belonging to each of the three categories. So far, most of the existing computational models of 3D visual attention belong to the first or the second category. Figure 10 summarizes the two different ways in which depth information is taken into account in these two types of model.

Table 4. Main features of computational models of 3D visual attention

Depth-Weighting Models
• Maki, Nordlund and Eklundh (2000) — Depth information: relative depth. Operation: assigns the target closest to the observer the highest priority. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
• Zhang et al. (2010) — Depth information: perceived depth, pop-out effect. Operation: irregular space conversion; pixels closer to the observer and in front of the screen are considered more salient. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
• Chamaret et al. (2010) — Depth information: relative depth. Operation: weights each pixel in the 2D saliency map by its depth value. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.

Depth-Saliency Models
• Ouerhani and Hugli (2000) — Depth information: absolute depth (distance), surface curvature, depth gradient. Operation: extracts depth features from the depth map, computes additional conspicuity maps based on these features, and pools all the conspicuity maps (from 2D and depth features). Validation: qualitative assessment; no quantitative comparison to eye-tracking data.
• Potapova et al. (2011) — Depth information: surface height, relative surface orientation, occluded edges. Operation: computes one saliency map for each (2D and depth) feature, then sums all the saliency maps. Validation: qualitative assessment and quantitative comparison to labeled ROIs.

Stereovision Models
• Bruce and Tsotsos (2005) — Depth information: disparity. Operation: takes the two views as input and adds interpretive neuronal units for stereo-vision modeling into a 2D computational model using a visual pyramid processing architecture. Validation: qualitative assessment; no quantitative comparison to eye-tracking data.

5.3 Recent Works: Depth-Saliency-Based Computational Model of 3D Visual Attention

Depth features have demonstrated their contribution to predicting the saliency map of 3D images. Several depth features (e.g. surface curvature, depth gradient, relative surface orientation) have been proposed and used in previous 3D models. In this section, the way of creating the depth map from which depth features can be extracted is first introduced. In a second step, we introduce a new method for generating a so-called "depth saliency map". The depth saliency map is computed based only on depth features (i.e. depth contrast) by using a Bayesian framework.

Depth Map Creation

We propose that a depth map providing the information of a scene's perceived depth needs to be computed as the first step of modeling 3D visual attention. In the literature, the disparity map is usually directly adopted as depth information (Chamaret et al., 2010). We propose that a transformation from disparity
map to a depth map representing perceived depth in units of length should be added to the chain of 3D visual attention modeling, since the same disparity value corresponds to different perceived depths depending on the viewing conditions. From the point of view of the display system, disparity is measured in pixels.

Figure 10. Two different ways of using depth in the depth-weighting models (left) and the depth-saliency models (right); note that the main difference between these two types of model is the existence of a stage for extracting depth features and creating a depth saliency map.

Figure 11. The relationship between disparity and perceived depth (top view)

The relationship between disparity (in pixels) and perceived depth can be modeled by the following equation (see also Figure 11):

D = V / (1 + (P · W) / (I · R_x)),   (1)
where D represents the perceived depth, V the viewing distance between the observer and the screen plane, I the interocular distance, P the disparity in pixels, and W and R_x, respectively, the width (in cm) and the horizontal resolution of the screen.

According to Equation 1, perceived depth is not only a function of disparity but is also influenced by the viewing conditions (the viewing distance and the properties of the display). For instance, an 8-pixel negative disparity can create a perceived depth of about 3.5 cm behind the screen when presented on a 24-inch full-HD stereoscopic display at a 1-meter viewing distance (3 times the screen's height). However, the same disparity corresponds to an infinite perceived depth on a 2k-cinema screen with an 8-meter screen height and an 8-meter viewing distance. When the viewing condition varies, the change of perceived depth produced by even the same disparity value might make some areas of a 3D scene impossible to fuse; consequently, the saliency distribution can be different. In this study, we adopt Equation 1 to compute the depth map for each image. The interocular distance is set to 6.3 cm, while the screen property parameters are set according to the setup of the eye-tracking experiment conducted for building the Bayesian-based model presented below.
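Equation 1 translates directly into a small helper. The sketch below follows the reconstruction of Equation 1 given above, whose sign convention is inferred from the chapter's numerical example (a negative pixel disparity yielding a point perceived behind the screen); the function and parameter names are illustrative, and all lengths are in centimeters.

```python
def disparity_to_depth(p_pixels, viewing_distance_cm, screen_width_cm,
                       horizontal_resolution, interocular_cm=6.3):
    """Perceived depth D (distance from the observer, in cm) for a pixel
    disparity P, following Equation 1: D = V / (1 + (P*W)/(I*Rx)).
    Values larger than the viewing distance lie behind the screen plane."""
    screen_disparity_cm = p_pixels * screen_width_cm / horizontal_resolution
    return viewing_distance_cm / (1.0 + screen_disparity_cm / interocular_cm)

# Chapter example: -8 pixels on a 24-inch full-HD display (~53 cm wide) viewed
# at 1 m gives roughly 103.6 cm, i.e. about 3.5-3.6 cm behind the screen.
depth_cm = disparity_to_depth(-8, viewing_distance_cm=100,
                              screen_width_cm=53.1, horizontal_resolution=1920)
```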
A Bayesian Approach of Depth Saliency Map Generation

In the area of saliency map creation, Bayes's theorem has been widely applied in various ways (L. Zhang, Tong, Marks, Shan, & Cottrell, 2008; Chikkerur, Serre, Tan, & Poggio, 2010; Pinneli & Chandler, 2008). In this chapter, we propose a new application of Bayes's theorem for computing a depth saliency map based on features extracted from a depth map. The proposed approach correlates depth features with the level of depth saliency by using data from a psychophysical experiment.

We first introduce the proposed definition of depth saliency: the depth saliency (S) of each location (a pixel) equals the probability of this point being gazed at, given the depth features observed at this point and the spatial location of this point:

S = P(C = 1 | f_dep, l_z),   (2)

where C is a binary random variable denoting whether or not a point is gazed at, the random variable vector f_dep denotes the depth features observed at this point, and l_z denotes its location in depth. Note that the 'features' term, f_dep, stands not only for local visual features such as relative depth (i.e. disparity) and absolute depth (i.e. distance to the observer), but also for higher-order features taking into account information from the neighborhood, such as the result of applying a Difference of Gaussians (DoG) kernel to the feature maps. Regarding the right side of Equation 2, P(C = 1 | f_dep, l_z), we make the assumptions that 1) the depth features of each point are independent of its distance to the viewer, and 2) P(C = 1) is simply a constant. By using Bayes' rule, this probability can thus be transformed to:

S = P(C = 1 | f_dep) · P(C = 1 | l_z) · const.   (3)

The first term in Equation 3, P(C = 1 | f_dep), represents the probability of a point being gazed at, given only the features extracted from depth information at this point. By computing this probability, the saliency map of the depth channel can be obtained. The second term, P(C = 1 | l_z), represents the probability of a point being gazed at given its distance to the viewer. This probability reflects observers' viewing strategy, the bias of eye position, or prior knowledge about the distance at which potential targets are likely to appear. Compared to the well-known 'center-bias' regarding the location prior in the viewing of 2D images (Tatler, 2007; Tseng, Carmi, Cameron, Munoz, & Itti, 2009), relatively little is known about this preference of observation in depth. Recently, this preference was quantified and named the 'depth-bias' by Wang et al. (2011). Therefore, based on the proposed model of depth saliency, the saliency value of each point in a three-dimensional scene can be considered as a combination of visual saliency from depth features and a depth prior. However, studying the depth-bias is not in the scope of this chapter. In the following part, we focus on modeling P(C = 1 | f_dep), omitting the depth prior part. By using Bayes' rule, we get:

P(C = 1 | f_dep) = α · P(f_dep | C = 1) / P(f_dep),   (4)

where α is a constant value representing the probability P(C = 1). The function P(C = 1 | f_dep) represents how the depth features observed at a point influence the probability that the human visual system decides to fixate this point. This probability is proportional to the feature distribution at gaze points, normalized by the rarity of the features in the context (see Equation 4). Note that the use of the likelihood P(f_dep | C = 1) in the proposed approach differs from the way it is usually used by many models in the literature that also apply Bayes's theorem. We are not performing any binary classification to decide whether a point is a fixation or not. Instead, we define the result (i.e. the depth saliency map) as a distribution of the probability of points being gazed at as a function of depth features. To achieve the computation of the depth saliency map, the proposed approach consists of two stages:

1. Depth feature extraction, and
2. Probability distribution modeling.

Depth Feature Extraction

The proposed model uses depth contrast as the feature for depth saliency map prediction. In most situations, depth contrast can be an efficient indicator of interesting targets. For example, the HVS might consider a region protruding above a flat plane as a potential target (Potapova et al., 2011), or might consider a hole as a place where a potential target might exist.
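As a hypothetical illustration of this depth-contrast feature, the sketch below applies a Difference-of-Gaussians to the perceived-depth map produced by Equation 1; depth discontinuities and protruding or recessed regions then respond strongly, while planar surfaces respond weakly. The choice of sigmas and the use of an absolute difference are assumptions made for illustration, not parameters taken from the chapter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def depth_contrast(depth_map, center_sigma=2.0, surround_sigma=8.0):
    """Depth-contrast feature map: Difference-of-Gaussians on the depth map."""
    depth_map = np.asarray(depth_map, dtype=float)
    center = gaussian_filter(depth_map, center_sigma)      # fine-scale depth
    surround = gaussian_filter(depth_map, surround_sigma)  # coarse-scale depth
    return np.abs(center - surround)                       # strong at discontinuities
```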