IEEE Press
445 Hoes Lane
Piscataway, NJ 08854
IEEE Press Editorial Board
Lajos Hanzo, Editor in Chief
R. Abari T. Chen B. M. Hammerli
J. Anderson T. G. Croda O. Malik
S. Basu M. El-Hawary S. Nahavandi
A. Chatterjee S. Farshchi W. Reeve
Kenneth Moore, Director of IEEE Book and Information Services (BIS)
Technical Reviewer
Qinghan Xiao, Defence Research and Development Canada
IEEE-CIS Liaison to IEEE Press, Gary B. Fogel
Books in the IEEE Press Series on Computational Intelligence
Evolving Intelligent Systems: Methodology and Applications
Edited by Plamen Angelov, Dimitar Filev, and Nik Kasabov
2010 978-0470-28719-4
Biometrics: Theory, Methods, and Applications
Edited by Nikolaos V. Boulgouris, Konstantinos N. Plataniotis, and Evangelia Micheli-Tzanakou
2010 978-0470-24782-2
Clustering
Rui Xu and Donald Wunsch II
2009 978-0470-27680-0
Computational Intelligence in Bioinformatics
Edited by David B. Fogel, David W. Corne, and Yi Pan
2008 978-0470-10526-9
Introduction to Evolvable Hardware: A Practical Guide for Designing Self-Adaptive Systems
Garrison W. Greenwood and Andrew M. Tyrrell
2007 978-0471-71977-9
Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, Third Edition
David B. Fogel
2006 978-0471-66951-7
Emergent Information Technologies and Enabling Policies for Counter-Terrorism
Edited by Robert L. Popp and John Yen
2006 978-0471-77615-4
Computationally Intelligent Hybrid Systems
Edited by Seppo J. Ovaska
2005 0-471-47668-4
Handbook of Learning and Approximate Dynamic Programming
Edited by Jennie Si, Andrew G. Barto, Warren B. Powell, Donald Wunsch II
2004 0-471-66054-X
Computational Intelligence: The Experts Speak
Edited By David B. Fogel and Charles J. Robinson
2003 0-471-27454-2
Biometrics: Theory, Methods, and Applications
Edited by
Nikolaos V. Boulgouris
Konstantinos N. Plataniotis
Evangelia Micheli-Tzanakou
IEEE Computational Intelligence Society, Sponsor
IEEE Press Series on Computational Intelligence
David B. Fogel, Series Editor
Contents
Preface
Contributors
1. Discriminant Analysis for Dimensionality Reduction: An Overview of Recent Developments
Jieping Ye and Shuiwang Ji
2. A Taxonomy of Emerging Multilinear Discriminant Analysis Solutions for Biometric Signal Recognition
Haiping Lu, Konstantinos N. Plataniotis, and Anastasios N. Venetsanopoulos
3. A Comparative Survey on Biometric Identity Authentication Techniques Based on Neural Networks
Raghudeep Kannavara and Nikolaos Bourbakis
4. Designing Classifiers for Fusion-Based Biometric Verification
Krithika Venkataramani and B. V. K. Vijaya Kumar
5. Person-Specific Characteristic Feature Selection for Face Recognition
Sreekar Krishna, Vineeth Balasubramanian, John Black, and Sethuraman Panchanathan
6. Face Verification Based on Elastic Graph Matching
Anastasios Tefas and Ioannis Pitas
7. Combining Geometrical and Statistical Models for Video-Based Face Recognition
Amit K. Roy-Chowdhury and Yilei Xu
8. A Biologically Inspired Model for the Simultaneous Recognition of Identity and Expression
Donald Neth and Aleix M. Martinez
9. Multimodal Biometrics Based on Near-Infrared Face Recognition
Rui Wang, Shengcai Liao, Zhen Lei, and Stan Z. Li
10. A Novel Unobtrusive Face and Hand Geometry Authentication System Based on 2D and 3D Data
Filareti Tsalakanidou and Sotiris Malassiotis
11. Learning Facial Aging Models: A Face Recognition Perspective
Narayanan Ramanathan and Rama Chellappa
12. Super-Resolution of Face Images
Sung Won Park and Marios Savvides
13. Iris Recognition
Yung-Hui Li and Marios Savvides
14. Learning in Fingerprints
Alessandra Lumini, Loris Nanni, and Davide Maltoni
15. A Comparison of Classification- and Indexing-Based Approaches for Fingerprint Identification
Xuejun Tan, Bir Bhanu, and Rong Wang
16. Electrocardiogram (ECG) Biometric for Robust Identification and Secure Communication
Francis Minhthang Bui, Foteini Agrafioti, and Dimitrios Hatzinakos
17. The Heartbeat: The Living Biometric
Steven A. Israel, John M. Irvine, Brenda K. Wiederhold, and Mark D. Wiederhold
18. Multimodal Physiological Biometrics Authentication
Alessandro Riera, Aureli Soria-Frisch, Mario Caparrini, Ivan Cester, and Giulio Ruffini
19. A Multiresolution Analysis of the Effect of Face Familiarity on Human Event-Related Potentials
Brett DeMarco and Evangelia Micheli-Tzanakou
20. On-Line Signature-Based Authentication: Template Security Issues and Countermeasures
Patrizio Campisi, Emanuele Maiorana, and Alessandro Neri
21. Unobtrusive Biometric Identification Based on Gait
Xiaxi Huang and Nikolaos V. Boulgouris
22. Distributed Source Coding for Biometrics: A Case Study on Gait Recognition
Savvas Argyropoulos, Dimosthenis Ioannidis, Dimitrios Tzovaras, and Michael G. Strintzis
23. Measuring Information Content in Biometric Features
Richard Youmaran and Andy Adler
24. Decision-Making Support in Biometric-Based Physical Access Control Systems: Design Concept, Architecture, and Applications
Svetlana N. Yanushkevich, Vlad P. Shmerko, Oleg Boulanov, and Adrian Stoica
25. Privacy in Biometrics
Stelvio Cimato, Marco Gamassi, Vincenzo Piuri, Roberto Sassi, and Fabio Scotti
26. Biometric Encryption: The New Breed of Untraceable Biometrics
Ann Cavoukian and Alex Stoianov
Index
Preface
The objective of biometric systems is the recognition or authentication of individu-
als based on some physical or behavioral characteristics that are intrinsically unique
for each individual. Nowadays, biometric systems are fundamental components of
advanced security architectures. The applications of biometrics range from access
control, military, and surveillance to banking and multimedia copyright protection.
Recently, biometric information has started to become an essential element in gov-
ernment issued authentication and travel documents. The large-scale deployment of
biometrics sensors in a variety of electronic devices, such as mobile phones, laptops,
and personal digital assistants (PDA), has further accelerated the pace at which the
demand for biometric technologies has been growing. The immense interest in the
theory, technology, applications, and social implications of biometric systems has cre-
ated an imperative need for the systematic study of the use of biometrics in security
and surveillance infrastructures.
This edited volume provides an extensive survey of biometrics theory, methods,
and applications, making it a good source of information for researchers, security
experts, policy makers, engineers, and graduate students. The volume consists of 26
chapters which cover most aspects of biometric systems. The first few chapters address particular recognition techniques that can be used in conjunction with a variety of
biometric traits. The following chapters present technologies tailored to specific bio-
metric traits, such as face, hand geometry, fingerprints, signature, electrocardiogram,
electroencephalogram, and gait. The remaining chapters focus on both theoretical
issues as well as issues related to the emerging area of privacy-enhancing biometric
solutions.
An overview of recent developments in discriminant analysis for dimension-
ality reduction is presented in the first chapter. Specifically, a unified framework
is presented for generalized linear discriminant analysis (LDA) via a transfer func-
tion. It is shown that various LDA-based algorithms differ in their transfer functions.
This framework explains the properties of various algorithms and their relationship.
Furthermore, the theoretical properties of various algorithms and their relationship
are also presented. An emerging extension of the classical LDA is the multilinear
discriminant analysis (MLDA) for biometric signal recognition. Biometric sig-
nals are mostly multidimensional objects, known as tensors. Recently, there has been
a growing interest in MLDA solutions. In Chapter 2, the fundamentals of existing
MLDA solutions are presented and then categorized according to the multilinear
projection employed. At the same time, their connections with traditional linear solu-
tions are pointed out. The next two chapters present classification issues in biometric
identification. The problem of classification is extremely important because it
essentially sets the framework regarding the way decisions are made once feature
extraction and dimensionality reduction have taken place. A variety of classification
approaches can be taken. One of these approaches is to use neural networks (NN).
Chapter 3 is a comparative survey on biometric identity authentication tech-
niques based on neural networks. This chapter presents a survey on representative
NN-based methodologies for biometric identification. In particular, it captures the
evolution of some of the representative NN-based methods in order to provide an
outline of the application of neural nets in biometric systems. A specific, but far from
uncommon, case of classification is that involving fusion of biometrics. The main
task here is the design of classifiers for fusion-based biometric verification, which
is addressed in Chapter 4. The chapter provides guidelines for optimal ensemble
generation, where each classifier in the ensemble is a base classifier. Examples are
shown for support vector machines and correlation filters. The chapter also focuses
on decision fusion rules, and the effect of classifier output diversity on decision fusion accuracy is analyzed.
Chapters 5–20 present systems based on specific biometric modalities. Meth-
ods for face recognition/verification are presented in Chapters 5–8. One of the most
important problems in face recognition is feature selection. Chapter 5 presents a
person-specific characteristic feature selection for face recognition. In this chap-
ter, a new methodology for face recognition is introduced that detects and extracts
unique features on a person’s face and then uses those features for the purpose of
recognition. Chapter 6 presents a different approach by performing face verification
based on elastic graph matching. Using elastic graph matching, a face is represented
as a connected graph. This approach endows the recognition process with robustness
against geometric distortions of the facial image. Another challenging task in the
area of face-based biometric systems is the efficient use of video sequences for face
authentication. Chapter 7 presents a method for the combination of geometrical and
statistical models for video-based face authentication. In this chapter, it is shown
that it is possible to describe object appearance using a combination of analytically
derived geometrical models and statistical data analysis. Specifically, a framework
that is robust to large changes in facial pose and lighting conditions is presented for
face recognition from video sequences. The method can handle situations where the
pose and lighting conditions in the training and testing data are completely disjoint.
Chapter 8 is about a biologically inspired model for the simultaneous recognition
of identity and expression. This work builds upon the fact that faces can provide
a wide range of information about a person’s identity, race, sex, age and emotional
state. In most cases, humans easily derive such information by processes that ap-
pear rapid and automatic. However, upon closer inspection, one finds these processes
to be diverse and complex. This chapter examines the perception of identity and
emotion. Next, it develops a computational model that is applied for identification
based on face images with differing expression as well as for the classification of
expressions.
Chapters 9–11 present some more advanced methods for face recognition. The
first two of these chapters go beyond the conventional approach and are based on
the realization that face recognition does not have to rely on an image taken using a
conventional camera. Face recognition using infrared cameras is a very interest-
ing extension of conventional face recognition. In Chapter 9, a near-infrared (NIR)
face-based approach is presented for multimodal biometric fusion. The NIR face is
fused with the visible light (VL) face or iris modality. This approach has several ad-
vantages, including the fact that NIR face recognition overcomes problems arising
from uncontrolled illumination in VL images and achieves significantly better results
than when VL faces are used. Furthermore, the fusion of NIR face with VL face or
iris is a natural combination for multibiometric solutions. A different, multimodal
system based on the fusion of 2D and 3D face and hand geometry data is presented
in Chapter 10. This topic is of particular interest because recent advances in multi-
modal biometrics as well as the emergence of affordable 3D imaging technologies
have created great potential for techniques that involve 3D data. The main advan-
tage is the simultaneous acquisition of a pair of depth and color images of biometric
information using low-cost sensors. Although the above face-based methodologies
offer improved performance, they are not directly improving the resilience of visual
biometric systems to the change that these biometrics undergo through time. Aging
is a crucial factor for recognition applications and, in the case of face recognition,
can be dealt with by using learning facial aging models. Such facial models studied
in Chapter 11 can be used for the prediction of one’s appearance across ages and,
therefore, are of great importance for performing reliable face recognition across age
progression. Chapter 12 is about super-resolution techniques, which can be used in
conjunction with face recognition technologies.
The next three chapters are devoted to iris and fingerprint recognition. The tech-
nologies that are used in iris recognition systems are presented in Chapter 13. Iris
recognition is an extremely reliable technique for identification of individuals, and
this chapter reviews both its theoretical and practical aspects. Fingerprint recognition
is another very important technology that has been reliably used in biometric systems
for many years. Chapter 14, entitled Learning in Fingerprints, gives a short introduc-
tion of the basic concepts and terminology. Furthermore, it provides a detailed review
of the existing literature by discussing the most salient learning-based approaches
applied to feature extraction, matching, and classification of fingerprints. Chapter 15
makes a comparison of classification and indexing-based approaches for finger-
print recognition. This chapter presents a comparison of two key approaches for
fingerprint identification. These approaches are based on classification followed by
verification and indexing followed by verification. The fingerprint classification ap-
proach is based on a feature-learning algorithm, while the indexing approach is based
on features derived from triplets of minutiae.
Chapters 16 and 17 present methods using electrocardiograms (ECG). ECG is
essentially a medical diagnostic technique but more recently it has fulfilled a rather
unlikely role, as a provider of security and privacy in the form of a biometric.
Chapter 16, entitled Electrocardiogram (ECG) Biometric for Robust Identifica-
tion and Secure Communication, examines the various implications and technical
challenges of using the ECG as a biometric. Specifically, novel signal processing
techniques are surveyed and proposed that seek to not only establish the status of the
ECG as an indisputable biometric trait, but also reinforce its versatile utility, such as
in alleviating the resource consumption in certain communication networks. Chapter
17 discusses the heartbeat as a living biometric. Although previous research on the
topic focused mainly on analysis of the electrocardiogram, this chapter extends the
ECG results by applying processing methods to a larger and more diverse set of indi-
viduals, demonstrating that performance remains high for a larger and a more diverse
population. Alternative sensing methods, using blood pressure and pulse oximetry,
are presented and their corresponding performance is documented. The chapter also
discusses the phenomenology and sensing modalities for monitoring cardiovascular
function and, finally, examines the fusion of heartbeat information across the three
modalities and quantifies its performance.
Chapters 18 and 19 explore methodologies mainly based on electroencephalo-
grams (EEG). In Chapter 18, a method is proposed using physiological signals for
key features in high-security biometric systems. The experimental protocol that is
common for EEG and ECG recording is explained. EEG and ECG features as well
as the authentication algorithms are presented and their efficiency is individually as-
sessed. A fusion process carried out to achieve higher performance is also presented.
Chapter 19 presents a multiresolution analysis of the effect of face familiarity on
human event-related potentials. This method works by processing of the electroen-
cephalograms (EEGs) in response to familiar and unfamiliar face stimuli. Stimuli were
presented in successive trials and consisted of (a) multiple presentations of frontal,
gray-scale images of one person known to the subject and (b) unique unknown images
taken from multiple face databases. Coherent oscillations in phase were observed in
the lower delta activity of ERPs in response to known stimuli but not in response to
unknown stimuli.
Chapters 20 and 21 present methods and applications based on signature recogni-
tion and gait recognition. Although several approaches can be used in authentication
systems, the most commonly used authentication method in everyday transactions is
based on signature. The specific points of concern regarding online signature-based
authentication have to do more with template security issues and countermeasures.
Chapter 20 focuses on the security issues related to biometric templates, with ap-
plication to signature based authentication systems. The main privacy and security
issues are briefly summarized and some approaches that are used for the protection
of biometric templates are discussed. Data hiding techniques are used to design a
security scalable authentication system. The enrollment and the authentication pro-
cedure are detailed. In contrast to the signature, which is an established method for
authentication, gait recognition is an emerging technology that is particularly at-
tractive for biometric identification because the capturing of gait can take place in
an unobtrusive manner. Chapter 21 presents the fundamental approaches for unob-
trusive biometric identification based on gait and provides directions for future
research.
Chapters 22–26 deal with biometric applications including issues related to the
concept of biometric capacity. Chapter 22 presents a completely new framework
for biometric authentication in secure environments as well as a relevant application
based on gait recognition. The proposed framework is based on distributed source
coding for biometrics. In this new framework, the problem of biometric recognition
is formulated as the dual of data communication over noisy channels. In such a
system, the enrollment and authentication procedures are considered as the encod-
ing and decoding stages of a communication system. The above approach is highly
relevant to information theory. Further application of information theory in biomet-
rics can be found in the assessment of the information content of biometric traits.
The discriminating ability of biometric features is usually estimated by means of
experimentation. However, the information carried by biometrics, their uniqueness,
and their fusion prospects can be studied based on concepts from information theory.
This is the topic of Chapter 23, which deals with measuring information content in
biometric features. Next, in Chapter 24, a summary is presented of the theoretical
results and design experience obtained during the development of a next generation
physical access security system (PASS). The main feature of this PASS is its effi-
cient decision-making support of security personnel enhanced with the situational
awareness paradigm and intelligent tools.
Despite the increasing use of biometric features for authentication and identifica-
tion purposes in a broad variety of institutional and commercial systems, the adoption
of biometric techniques is restrained by a rising concern regarding the protection of
the biometrics templates. In fact, people are not generally keen to give out biometric
traits unless they are assured that their biometrics cannot be stolen or used without
their consent. Recent results showed that it is feasible to generate a unique identifier
by combining biometric traits. This approach makes it impossible to recover the orig-
inal biometric features and, thus, ensures the privacy of the biometrics. Chapter 25,
entitled Privacy in Biometrics, reviews the privacy issues related to the use of bio-
metrics, presents some of the most advanced techniques available to date, provides
a comparative analysis, and gives an overview of future trends. A particular system
that builds privacy into an information system is presented in the final chapter, enti-
tled Biometric Encryption. In this chapter, the emerging area of privacy-enhancing
biometric technologies, referred to as “untraceable biometrics,” makes it possible to
enhance both privacy and security in a positive-sum model.
By its nature, an edited volume covers only a limited number of works and initia-
tives in the area of biometric systems. Researchers and practitioners are introducing
new developments at a very fast pace, and it would be impossible to cover all of them
in a single volume. However, we believe that the collection of chapters presented here
covers sufficiently well the theory, methods, and applications of biometrics. Readers
who wish to further explore the fascinating area of biometrics can find additional in-
formation using the bibliographic links that are provided in each one of the chapters
of this volume.
We thank all those who have helped to make this edited volume possible, espe-
cially the contributors who spent much of their precious time and energy in preparing
their chapters. We are really grateful for their enthusiasm and devotion to this project.
We thank the contributors and other experts who served as reviewers. Special thanks
should go to Dr. Qinghan Xiao, the reviewer assigned by IEEE Press, for providing
lots of useful suggestions for the improvement of the book. Our deep feelings of
appreciation go to John Wiley & Sons for the impeccable processing of the authors’
contributions and the final production of the book. Last, but certainly not least, we
would like to thank Jeanne Audino of IEEE Press for her professionalism and her
continuous support and assistance during all stages of the preparation and publication
of the manuscript.
Nikolaos V. Boulgouris
Konstantinos N. Plataniotis
Evangelia Micheli-Tzanakou
London, United Kingdom
Toronto, Ontario, Canada
New Brunswick, New Jersey
July 2009
Contributors
ANDY ADLER, Carleton University, Ottawa, Ontario, Canada
FOTEINI AGRAFIOTI, University of Toronto, Toronto, Ontario, Canada
SAVVAS ARGYROPOULOS, Informatics and Telematics Institute, Thessaloniki,
Greece
VINEETH BALASUBRAMANIAN, Arizona State University, Tempe, Arizona
BIR BHANU, University of California, Riverside, California
JOHN BLACK, Arizona State University, Tempe, Arizona
OLEG BOULANOV, University of Calgary, Calgary, Alberta, Canada
NIKOLAOS V. BOULGOURIS, King’s College, London, United Kingdom
NIKOLAOS BOURBAKIS, Wright State University, Dayton, Ohio
FRANCIS MINHTHANG BUI, University of Toronto, Toronto, Ontario, Canada
PATRIZIO CAMPISI, University of Rome, Rome, Italy
MARCO CAPARRINI, Starlab, Barcelona, Spain
ANN CAVOUKIAN, Office of Information and Privacy Commissioner Ontario,
Toronto, Ontario, Canada
IVAN CESTER, Starlab, Barcelona, Spain
RAMA CHELLAPPA, University of Maryland, College Park, Maryland
STELVIO CIMATO, University of Milan, Milan, Italy
BRETT DEMARCO, Rutgers University, New Brunswick, New Jersey
MARCO GAMASSI, University of Milan, Milan, Italy
DIMITRIOS HATZINAKOS, University of Toronto, Toronto, Ontario, Canada
XIAXI HUANG, King’s College, London, United Kingdom
DIMOSTHENIS IOANNIDIS, Informatics and Telematics Institute, Thessaloniki,
Greece
JOHN M. IRVINE, Science Applications International Corporation (SAIC),
Arlington, Virginia
STEVEN A. ISRAEL, Science Applications International Corporation (SAIC),
Arlington, Virginia
SHUIWANG JI, Arizona State University, Tempe, Arizona
RAGHUDEEP KANNAVARA, Wright State University, Dayton, Ohio
SREEKAR KRISHNA, Arizona State University, Tempe, Arizona
B. V. K. VIJAYA KUMAR, Carnegie Mellon University, Pittsburgh, Pennsylvania
ZHEN LEI, National Laboratory of Pattern Recognition, Institute of Automation,
Chinese Academy of Sciences, Beijing, China
STAN Z. LI, National Laboratory of Pattern Recognition, Institute of Automation,
Chinese Academy of Sciences, Beijing, China
YUNG-HUI LI, Carnegie Mellon University, Pittsburgh, Pennsylvania
SHENGCAI LIAO, National Laboratory of Pattern Recognition, Institute of
Automation, Chinese Academy of Sciences, Beijing, China
HAIPING LU, University of Toronto, Toronto, Ontario, Canada
ALESSANDRA LUMINI, Biometric System Laboratory, University of Bologna,
Bologna, Italy
EMANUELE MAIORANA, University of Rome, Rome, Italy
SOTIRIS MALASSIOTIS, Informatics and Telematics Institute, Thessaloniki,
Greece
DAVIDE MALTONI, Biometric System Laboratory, University of Bologna,
Bologna, Italy
ALEIX M. MARTINEZ, Ohio State University, Columbus, Ohio
EVANGELIA MICHELI-TZANAKOU, Rutgers University, New Brunswick,
New Jersey
LORIS NANNI, Biometric System Laboratory, University of Bologna, Bologna,
Italy
ALESSANDRO NERI, University of Rome, Rome, Italy
DONALD NETH, Ohio State University, Columbus, Ohio
SETHURAMAN PANCHANATHAN, Arizona State University, Tempe, Arizona
SUNG WON PARK, Carnegie Mellon University, Pittsburgh, Pennsylvania
IOANNIS PITAS, University of Thessaloniki, Thessaloniki, Greece
VINCENZO PIURI, University of Milan, Milan, Italy
KONSTANTINOS N. PLATANIOTIS, University of Toronto, Toronto, Ontario,
Canada
NARAYANAN RAMANATHAN, University of Maryland, College Park,
Maryland
ALEJANDRO RIERA, Starlab, Barcelona, Spain
AMIT K. ROY-CHOWDHURY, University of California, Riverside, California
GIULIO RUFFINI, Starlab, Barcelona, Spain
ROBERTO SASSI, University of Milan, Milan, Italy
MARIOS SAVVIDES, Carnegie Mellon University, Pittsburgh, Pennsylvania
FABIO SCOTTI, University of Milan, Milan, Italy
VLAD P. SHMERKO, University of Calgary, Calgary, Alberta, Canada
AURELI SORIA-FRISCH, Starlab, Barcelona, Spain
ALEX STOIANOV, Office of Information and Privacy Commissioner Ontario,
Toronto, Ontario, Canada
ADRIAN STOICA, Jet Propulsion Laboratory, NASA, USA
MICHAEL G. STRINTZIS, Informatics and Telematics Institute, Thessaloniki,
Greece
XUEJUN TAN, University of California, Riverside, California
ANASTASIOS TEFAS, University of Thessaloniki, Thessaloniki, Greece
FILARETI TSALAKANIDOU, Informatics and Telematics Institute,
Thessaloniki, Greece
DIMITRIOS TZOVARAS, Informatics and Telematics Institute, Thessaloniki,
Greece
ANASTASIOS N. VENETSANOPOULOS, University of Toronto, Toronto,
Ontario, Canada
KRITHIKA VENKATARAMANI, Carnegie Mellon University, Pittsburgh,
Pennsylvania
RONG WANG, University of California, Riverside, California
RUI WANG, National Laboratory of Pattern Recognition, Institute of Automation,
Chinese Academy of Sciences, Beijing, China
BRENDA K. WIEDERHOLD, Science Applications International Corporation
(SAIC), Arlington, Virginia
MARK D. WIEDERHOLD, Science Applications International Corporation
(SAIC), Arlington, Virginia
YILEI XU, University of California, Riverside, California
SVETLANA N. YANUSHKEVICH, University of Calgary, Calgary, Alberta,
Canada
JIEPING YE, Arizona State University, Tempe, Arizona
RICHARD YOUMARAN, Carleton University, Ottawa, Ontario, Canada
1. Discriminant Analysis for Dimensionality Reduction: An Overview of Recent Developments
Jieping Ye and Shuiwang Ji
class discrimination. The optimal transformation in LDA can be readily computed
by applying an eigendecomposition on the so-called scatter matrices. It has been
used widely in many applications involving high-dimensional data [19–24]. How-
ever, classical LDA requires the so-called total scatter matrix to be nonsingular. In
many applications involving high-dimensional and low sample size data, the total
scatter matrix can be singular since the data points are from a very high-dimensional
space, and in general the sample size does not exceed this dimension. This is the
well-known singularity or undersampled problem encountered in LDA.
In recent years, many LDA extensions have been proposed to deal with the sin-
gularity problem, including PCA+LDA [19, 23], regularized LDA (RLDA) [21], null
space LDA (NLDA) [20], orthogonal centroid method (OCM) [25], uncorrelated
LDA (ULDA) [24], orthogonal LDA (OLDA) [24], and LDA/GSVD [26]. A brief
overview of these algorithms is given in Section 1.2. Different algorithms have been
applied successfully in various domains, such as PCA+LDA in face recognition [19,
23], OCM in text categorization [25], and RLDA in microarray gene expression data
analysis [21]. However, there is a lack of a systematic study to explore the common-
alities and differences of these algorithms, as well as their intrinsic relationship. This
has been a challenging task, since different algorithms apply completely different
schemes when dealing with the singularity problem.
Many of these LDA extensions involve an eigenvalue problem, which is com-
putationally expensive to solve especially when the sample size is large. LDA in
the binary-class case, called Fisher LDA, has been shown to be equivalent to linear
regression with the class label as output. Such a regression model minimizes the sum-of-squares error function, whose solution can be obtained efficiently by solving a system of linear equations. However, the equivalence relationship is limited to the binary-class case.
In this chapter, we present a unified framework for generalized LDA via a transfer
function. We show that various LDA-based algorithms differ in their transfer func-
tions. The unified framework elucidates the properties of various algorithms and their
relationship. We then discuss recent development on establishing the equivalence re-
lationship between multivariate linear regression (MLR) and LDA in the multiclass
case. In particular, we show that MLR with a particular class indicator matrix is
equivalent to LDA under a mild condition, which has been shown to hold for most
high-dimensional data. We further show how LDA can be performed in the semisu-
pervised setting, where both labeled and unlabeled data are provided, based on the
equivalence relationship between MLR and LDA. We also extend our discussion to
the kernel-induced feature space and present recent developments on multiple kernel
learning (MKL) for kernel discriminant analysis (KDA).
The rest of this chapter is organized as follows. We give an overview of classical
LDA and its generalization in Section 1.2. A unified framework for generalized LDA
as well as the theoretical properties of various algorithms and their relationship is
presented in Section 1.3. Section 1.4 discusses the least squares formulation for LDA.
We then present extensions of the discussion to semisupervised learning and kernel-
induced feature space in Sections 1.5 and 1.6, respectively. This chapter concludes in
Section 1.8.
1.2 OVERVIEW OF LINEAR DISCRIMINANT ANALYSIS
We are given a data set that consists of n samples {(x_i, y_i)}_{i=1}^n, where x_i ∈ R^d denotes the d-dimensional input, y_i ∈ {1, 2, . . . , k} denotes the corresponding class label, n is the sample size, and k is the number of classes. Let

X = [x_1, x_2, . . . , x_n] ∈ R^{d×n}

be the data matrix and let X_j ∈ R^{d×n_j} be the data matrix of the jth class, where n_j is the sample size of the jth class and \sum_{j=1}^{k} n_j = n. Classical LDA computes a linear transformation G ∈ R^{d×ℓ} that maps x_i in the d-dimensional space to a vector x_i^L in the ℓ-dimensional space as follows:

x_i ∈ R^d → x_i^L = G^T x_i ∈ R^ℓ,   ℓ < d.
In LDA, three scatter matrices, called the within-class, between-class, and total scatter
matrices are defined as follows [8]:
S_w = \frac{1}{n} \sum_{j=1}^{k} \sum_{x ∈ X_j} (x − c^{(j)})(x − c^{(j)})^T,   (1.1)

S_b = \frac{1}{n} \sum_{j=1}^{k} n_j (c^{(j)} − c)(c^{(j)} − c)^T,   (1.2)

S_t = \frac{1}{n} \sum_{i=1}^{n} (x_i − c)(x_i − c)^T,   (1.3)

where c^{(j)} is the centroid of the jth class and c is the global centroid. It can be verified from the definitions that S_t = S_b + S_w [8]. Define three matrices H_w, H_b, and H_t as follows:
H_w = \frac{1}{\sqrt{n}} [X_1 − c^{(1)}(e^{(1)})^T, . . . , X_k − c^{(k)}(e^{(k)})^T],   (1.4)

H_b = \frac{1}{\sqrt{n}} [\sqrt{n_1}(c^{(1)} − c), . . . , \sqrt{n_k}(c^{(k)} − c)],   (1.5)

H_t = \frac{1}{\sqrt{n}} (X − ce^T),   (1.6)

where e^{(j)} and e are vectors of all ones of length n_j and n, respectively. Then the three scatter matrices, defined in Eqs. (1.1)–(1.3), can be expressed as

S_w = H_w H_w^T,   S_b = H_b H_b^T,   S_t = H_t H_t^T.   (1.7)
It follows from the properties of matrix trace that
trace(S_w) = \frac{1}{n} \sum_{j=1}^{k} \sum_{x ∈ X_j} ||x − c^{(j)}||_2^2,   (1.8)

trace(S_b) = \frac{1}{n} \sum_{j=1}^{k} n_j ||c^{(j)} − c||_2^2.   (1.9)

Thus trace(S_w) measures the distance between the data points and their corresponding class centroid, and trace(S_b) captures the distance between the class centroids and the global centroid.
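As a concrete illustration of Eqs. (1.1)–(1.9), the following sketch (ours, not code from the chapter; NumPy is assumed and the function name scatter_matrices is our own) builds the factors H_w, H_b, and H_t from a data matrix and a label vector, recovers the three scatter matrices, and checks the identity S_t = S_b + S_w on random data.

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute Sw, Sb, St for a data matrix X (d x n) and labels y (length n).

    Follows Eqs. (1.1)-(1.7): the matrices are assembled from the factors
    Hw, Hb, Ht, so that Sw = Hw Hw^T, Sb = Hb Hb^T, and St = Ht Ht^T.
    """
    d, n = X.shape
    c = X.mean(axis=1, keepdims=True)              # global centroid c

    Hw_blocks, Hb_cols = [], []
    for j in np.unique(y):
        Xj = X[:, y == j]                          # d x n_j block of class j
        nj = Xj.shape[1]
        cj = Xj.mean(axis=1, keepdims=True)        # class centroid c^(j)
        Hw_blocks.append(Xj - cj)                  # X_j - c^(j) (e^(j))^T
        Hb_cols.append(np.sqrt(nj) * (cj - c))     # sqrt(n_j) (c^(j) - c)

    Hw = np.hstack(Hw_blocks) / np.sqrt(n)
    Hb = np.hstack(Hb_cols) / np.sqrt(n)
    Ht = (X - c) / np.sqrt(n)
    return Hw @ Hw.T, Hb @ Hb.T, Ht @ Ht.T

# Sanity check of St = Sb + Sw on synthetic data
X = np.random.randn(5, 30)
y = np.repeat([0, 1, 2], 10)
Sw, Sb, St = scatter_matrices(X, y)
assert np.allclose(St, Sw + Sb)
```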
In the lower-dimensional space resulting from the linear transformation G, the
scatter matrices become
S_w^L = G^T S_w G,   S_b^L = G^T S_b G,   S_t^L = G^T S_t G.   (1.10)

An optimal transformation G would maximize trace(S_b^L) and minimize trace(S_w^L) simultaneously, which is equivalent to maximizing trace(S_b^L) and minimizing trace(S_t^L) simultaneously, since S_t^L = S_w^L + S_b^L. The optimal transformation, G^LDA, of LDA is computed by solving the following optimization problem [8, 16]:

G^LDA = arg max_G trace( S_b^L (S_t^L)^{−1} ).   (1.11)

It is known that the optimal solution to the optimization problem in Eq. (1.11) can be obtained by solving the following generalized eigenvalue problem [8]:

S_b x = λ S_t x.   (1.12)

More specifically, the eigenvectors corresponding to the k − 1 largest eigenvalues form the columns of G^LDA. When S_t is nonsingular, it reduces to the following regular eigenvalue problem:

S_t^{−1} S_b x = λ x.   (1.13)
When St is singular, the classical LDA formulation discussed above cannot be applied
directly. This is known as the singularity or undersampled problem in LDA. In the
following discussion, we consider the more general case when St may be singular.
The transformation, G^LDA, then consists of the eigenvectors of S_t^+ S_b corresponding to the nonzero eigenvalues, where S_t^+ denotes the pseudo-inverse of S_t [27]. Note that when S_t is nonsingular, S_t^+ equals S_t^{−1}.
The above LDA formulation is an extension of the original Fisher linear discriminant analysis (FLDA) [7], which deals with binary-class problems, that is, k = 2. The optimal transformation, G^F, of FLDA is of rank one and is given by [15, 16]

G^F = S_t^+ (c^{(1)} − c^{(2)}).   (1.14)

Note that G^F is invariant of scaling. That is, αG^F, for any α ≠ 0, is also a solution to FLDA.
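Continuing in the same spirit, a minimal sketch (ours; the function names are assumptions) of the eigenvector computation behind G^LDA and of the rank-one FLDA direction of Eq. (1.14) is given below. The pseudo-inverse keeps both computations valid when S_t is singular, which is the undersampled case discussed above.

```python
import numpy as np
from numpy.linalg import pinv, eig

def lda_transform(Sb, St, num_components):
    """Columns of G^LDA: eigenvectors of St^+ Sb for the largest eigenvalues."""
    evals, evecs = eig(pinv(St) @ Sb)
    order = np.argsort(-evals.real)
    return evecs[:, order[:num_components]].real

def flda_direction(St, c1, c2):
    """Rank-one Fisher LDA solution G^F = St^+ (c^(1) - c^(2)) of Eq. (1.14)."""
    return pinv(St) @ (c1 - c2)

# Usage, with Sb and St from the scatter_matrices sketch above:
# G = lda_transform(Sb, St, num_components=k - 1)
```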
When the dimensionality of data is larger than the sample size, which is the case
for many high-dimensional and low sample size data, all of the three scatter matrices
are singular. In recent years, many algorithms have been proposed to deal with this
singularity problem. We first review these LDA extensions in the next subsection. To
elucidate their commonalities and differences, a general framework is presented in
Section 1.3 that unifies many of these algorithms.
1.2.1 Generalizations of LDA
A common way to deal with the singularity problem is to apply an intermediate
dimensionality reduction, such as PCA [9], to reduce the data dimensionality before
classical LDA is applied. The algorithm is known as PCA+LDA, or subspace LDA
[19, 28]. In this two-stage PCA+LDA algorithm, the discriminant stage is preceded by
a dimensionality reduction stage using PCA. The dimensionality, p, of the subspace
transformed by PCA is chosen such that the “reduced” total scatter matrix in this
subspace is nonsingular, so that classical LDA can be applied. The optimal value of
p is commonly estimated through cross-validation.
Regularization techniques can also be applied to deal with the singularity problem
of LDA. The algorithm is known as regularized LDA (RLDA) [21]. The key idea is
to add a constant μ > 0 to the diagonal elements of St as St + μId, where Id is the
identity matrix of size d. It is easy to verify that St + μId is positive definite [27],
hence nonsingular. Cross-validation is commonly applied to estimate the optimal
value of μ. Note that regularization is also the key to many other learning algorithms
including Support Vector Machines (SVM) [29].
In reference 20, the null space LDA (NLDA) was proposed, where the between-
class distance is maximized in the null space of the within-class scatter matrix. The
singularity problem is thus avoided implicitly. The efficiency of the algorithm can
be improved by first removing the null space of the total scatter matrix. It is based
on the observation that the null space of the total scatter matrix is the intersection of
the null spaces of the between-class and within-class scatter matrices. In contrast, the
orthogonal centroid method (OCM) [25] maximizes the between-class distance only
and thereby omits the within-class information. The optimal transformation of OCM
is given by the top eigenvectors of the between-class scatter matrix Sb.
In reference 24, a family of generalized discriminant analysis algorithms were
presented. Uncorrelated LDA (ULDA) and orthogonal LDA (OLDA) are two repre-
sentative algorithms from this family. The features in the reduced space of ULDA are
uncorrelated, while the transformation, G, of OLDA has orthonormal columns, that
is, GT G = I. The LDA/GSVD algorithm proposed in reference 26, which over-
comes the singularity problem via the generalized singular value decomposition
(GSVD)[27], also belongs to this family. Discriminant analysis with an orthogonal
transformation has also been studied in reference 30.
1.3 A UNIFIED FRAMEWORK FOR GENERALIZED LDA
The LDA extensions discussed in the last section employ different techniques to deal
with the singularity problem. In this section, we present a four-step general framework
for various generalized LDA algorithms. The presented framework unifies most of
the generalized LDA algorithms. The properties of various algorithms as well as their
relationships are elucidated from this framework. The unified framework consists of
four steps described below:
1. Compute the eigenvalues, {λ_i}_{i=1}^d, of S_t in Eq. (1.3) and the corresponding eigenvectors {u_i}_{i=1}^d, with λ_1 ≥ · · · ≥ λ_d. Then S_t can be expressed as S_t = \sum_{i=1}^{d} λ_i u_i u_i^T.
2. Given a transfer function Φ : R → R, let λ̃_i = Φ(λ_i), for all i. Construct the matrix S̃_t as S̃_t = \sum_{i=1}^{d} λ̃_i u_i u_i^T.
3. Compute the eigenvectors, {φ_i}_{i=1}^q, of S̃_t^+ S_b corresponding to the nonzero eigenvalues, where q = rank(S_b) and S̃_t^+ denotes the pseudo-inverse of S̃_t [27]. Construct the matrix G as G = [φ_1, . . . , φ_q].
4. Optional orthogonalization step: Compute the QR decomposition [27] of G as G = QR, where Q ∈ R^{d×q} has orthonormal columns and R ∈ R^{q×q} is upper triangular.
With this four-step procedure, the final transformation is given by either the
matrix G from step 3, if the optional orthogonalization step is not applied, or the
matrix Q from step 4 if the transformation matrix is required to be orthogonal. In
this framework, different transfer functions, Φ, in step 2 lead to different generalized
LDA algorithms, as summarized below:
• In PCA+LDA, the intermediate dimensionality reduction stage by PCA keeps the top p eigenvalues of S_t; thus it applies the following linear step function: Φ(λ_i) = λ_i, for 1 ≤ i ≤ p, and Φ(λ_i) = 0, for i > p. The optional orthogonalization step is not employed in PCA+LDA.
• In regularized LDA (RLDA), a regularization term is applied to S_t as S_t + μI_d, for some μ > 0. It corresponds to the use of the following transfer function: Φ(λ_i) = λ_i + μ, for all i. The optional orthogonalization step is not employed in RLDA.
• In uncorrelated LDA (ULDA), the optimal transformation consists of the top eigenvectors of S_t^+ S_b [24]. The corresponding transfer function is thus given by Φ(λ_i) = λ_i, for all i. The same transfer function is used in orthogonal LDA (OLDA). The difference between ULDA and OLDA is that OLDA performs the optional orthogonalization step while it is not applied in ULDA.
• In the orthogonal centroid method (OCM), the optimal transformation is given by the top eigenvectors of S_b [25]. The transfer function is thus given by Φ(λ_i) = 1, for all i. Since the eigenvectors of S_b form an orthonormal set, the optional orthogonalization step is not necessary in OCM.
It has been shown [31] that the regularization in RLDA is effective for nonzero
eigenvalues only. Thus, we can apply the following transfer function for RLDA:
Φ(λ_i) = λ_i + μ  for 1 ≤ i ≤ t,   and   Φ(λ_i) = 0  for i > t,

where t = rank(S_t). The transfer functions for different LDA extensions are summarized in Table 1.1.

Table 1.1. Transfer Functions for Different LDA Extensions

  PCA+LDA:    Φ(λ_i) = λ_i for 1 ≤ i ≤ p;  0 for i > p
  RLDA:       Φ(λ_i) = λ_i + μ for 1 ≤ i ≤ t;  0 for i > t
  ULDA/OLDA:  Φ(λ_i) = λ_i
  OCM:        Φ(λ_i) = 1
In null space LDA (NLDA) [20, 32], the data are first projected onto the null space
of Sw, which is then followed by classical LDA. It is not clear which transfer function
corresponds to the projection onto the null space of Sw. In reference 33, the equiva-
lence relationship between NLDA and OLDA was established under a mild condition
C1 : rank(St) = rank(Sb) + rank(Sw), (1.15)
which has been shown to hold for many high-dimensional data. Thus, for high-
dimensional data, we can use the following transfer function for NLDA: Φ(λ_i) = λ_i, for all i.
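To make the unified framework concrete, here is a sketch of ours (not code from the chapter; function and parameter names are assumptions) that implements the four steps with the transfer function Φ passed in as an argument. The choices listed in Table 1.1 then reproduce PCA+LDA, RLDA, ULDA/OLDA, and OCM from the same routine.

```python
import numpy as np
from numpy.linalg import eigh, eig, pinv, qr

def generalized_lda(Sb, St, transfer, orthogonalize=False):
    """Four-step unified framework for generalized LDA.

    `transfer` maps the vector of eigenvalues of St to modified eigenvalues
    (the function Phi in the text); different choices give the algorithms
    summarized in Table 1.1.
    """
    # Step 1: eigendecomposition of the (symmetric) total scatter matrix
    lam, U = eigh(St)
    lam, U = lam[::-1], U[:, ::-1]                 # sort so lambda_1 >= ... >= lambda_d

    # Step 2: modified total scatter via the transfer function
    St_tilde = (U * transfer(lam)) @ U.T

    # Step 3: eigenvectors of St_tilde^+ Sb for the nonzero eigenvalues
    q = np.linalg.matrix_rank(Sb)
    evals, evecs = eig(pinv(St_tilde) @ Sb)
    G = evecs[:, np.argsort(-evals.real)[:q]].real

    # Step 4 (optional): orthogonalize G by a QR decomposition
    if orthogonalize:
        G, _ = qr(G)
    return G

# Transfer functions from Table 1.1 (p, mu, t as defined in the text)
ulda = lambda lam: lam                             # ULDA / OLDA
ocm = lambda lam: np.ones_like(lam)                # OCM

def pca_lda(p):                                    # PCA+LDA
    return lambda lam: np.where(np.arange(len(lam)) < p, lam, 0.0)

def rlda(mu, t):                                   # RLDA
    return lambda lam: np.where(np.arange(len(lam)) < t, lam + mu, 0.0)
```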
1.3.1 Analysis
The unified framework from the last section summarizes the commonalities and dif-
ferences of various LDA-based algorithms. This unification of diverse algorithms into
a common framework sheds light on the understanding of the key features of various
algorithms as well as their relationship.
It is clear from Table 1.1 that ULDA is reduced to the OCM algorithm [25] when
St is a multiple of the identity matrix. Recent studies on the geometric representation
of high-dimensional and small sample size data show that under mild conditions,
the covariance matrix St tends to a scaled identity matrix when the data dimension d
tends to infinity with the sample size n fixed [34]. This implies that all the eigenvalues
of St are the same. In other words, the data behave as if the underlying distribution
is spherical. In this case, OCM is equivalent to ULDA. This partially explains the
effectiveness of OCM when working on high-dimensional data.
We can observe from Table 1.1 that when the reduced dimensionality, p, in the
PCA stage of PCA+LDA is chosen to be the rank of St (that is, the PCA stage keeps all the information), then the transfer functions for PCA+LDA and ULDA are identical.
That is, PCA+LDA is equivalent to ULDA in this case. It can also be observed from
Table 1.1 that the transfer function for RLDA equals the one for ULDA when μ = 0.
Thus, ULDA can be considered as a special case of both PCA+LDA and RLDA.
It follows from the above discussion that when μ = 0 in RLDA, and p = rank(St)
in PCA+LDA, they both reduce to ULDA. It has been shown that, under condition
C1 in Eq. (1.15), the transformation matrix of ULDA lies in the null space of Sw [33].
That is, GT Sw = 0. Furthermore, it was shown in reference 31 that if GT Sw = 0
30. 8 Chapter 1 Discriminant Analysis for Dimensionality Reduction
holds, then the transformation matrix G maps all data points from the same class to a
common vector. This is an extension of the result in reference 32, which assumes that
all classes in the data set have the same number of samples. Thus it follows that the
ULDA transformation maps all data points from the same class to a common vector,
provided that condition C1 is satisfied. This leads to a perfect separation between
different classes in the dimensionality-reduced space. However, it may also result in
overfitting. RLDA overcomes this limitation by choosing a nonzero regularization
value μ, while PCA+LDA overcomes this limitation by setting p < rank(St).
The above analysis shows that the regularization in RLDA and the PCA dimen-
sionality reduction in PCA+LDA are expected to alleviate the overfitting problem,
provided that appropriate values for μ and p can be estimated. Selecting an optimal
value for a parameter such as μ in RLDA and p in PCA+LDA from a given candidate
set is called model selection [17]. Existing studies have focused on the estimation
from a small candidate set, as it involves expensive matrix computations for each
candidate value. However, a large candidate set is desirable in practice to achieve
good performance. This has been one of the main reasons for their limited applica-
bility in practice. To overcome this problem, an efficient model selection algorithm
for RLDA was proposed in reference 31 and this algorithm can estimate an optimal
value for μ from a large number of candidate values efficiently.
1.4 A LEAST SQUARES FORMULATION FOR LDA
In this section, we discuss recent developments on connecting LDA to multivariate
linear regression (MLR). We first discuss the relationship between linear regression
and LDA in the binary-class case. We then present multivariate linear regression with
a specific class indicator matrix. This indicator matrix plays a key role in establishing
the equivalence relationship between MLR and LDA in the multiclass case.
1.4.1 Linear Regression versus Fisher LDA
Given a data set of two classes, {(x_i, y_i)}_{i=1}^n, with x_i ∈ R^d and y_i ∈ {−1, 1}, the linear regression model with the class label as the output has the following form:

f(x) = x^T w + b,   (1.16)

where w ∈ R^d is the weight vector and b is the bias of the linear model. A popular approach for estimating w and b is to minimize the sum-of-squares error function, called least squares, as follows:

L(w, b) = \frac{1}{2} \sum_{i=1}^{n} ||f(x_i) − y_i||^2 = \frac{1}{2} ||X^T w + be − y||^2,   (1.17)
where X = [x1, x2, . . . , xn] is the data matrix, e is the vector of all ones, and y is
the vector of class labels. Assume that both {xi} and {yi} have been centered, that is,
\sum_{i=1}^{n} x_i = 0 and \sum_{i=1}^{n} y_i = 0. It follows that

y_i ∈ {−2n_2/n, 2n_1/n},

where n_1 and n_2 denote the number of samples from the negative and positive classes, respectively. In this case, the bias term b in Eq. (1.16) becomes zero and we construct a linear model f(x) = x^T w by minimizing

L(w) = \frac{1}{2} ||X^T w − y||^2.   (1.18)

It can be shown that the optimal w minimizing the objective function in Eq. (1.18) is given by [16, 17]

w = (XX^T)^+ Xy.

Note that the data matrix X has been centered and thus XX^T = nS_t and Xy = \frac{2n_1 n_2}{n} (c^{(1)} − c^{(2)}). It follows that

w = \frac{2n_1 n_2}{n^2} S_t^+ (c^{(1)} − c^{(2)}) = \frac{2n_1 n_2}{n^2} G^F,
where GF is the optimal solution to FLDA in Eq. (1.14). Hence linear regression with
the class label as the output is equivalent to Fisher LDA, as the projection in FLDA
is invariant of scaling. More details on this equivalence relationship can be found in
references 15, 16, and 35.
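The equivalence can also be checked numerically. The short sketch below (ours; the synthetic data and variable names are assumptions) centers the inputs and labels, solves the least squares problem of Eq. (1.18), and verifies that the solution equals the FLDA direction of Eq. (1.14) up to the scale factor 2n_1n_2/n^2.

```python
import numpy as np
from numpy.linalg import pinv

rng = np.random.default_rng(0)
d, n_pos, n_neg = 10, 15, 25
X = np.hstack([rng.normal(1.0, 1.0, (d, n_pos)),     # positive class
               rng.normal(0.0, 1.0, (d, n_neg))])    # negative class
y = np.array([1.0] * n_pos + [-1.0] * n_neg)
n = n_pos + n_neg

Xc = X - X.mean(axis=1, keepdims=True)               # centered inputs
yc = y - y.mean()                                    # centered labels: {2 n_neg/n, -2 n_pos/n}

w = pinv(Xc @ Xc.T) @ (Xc @ yc)                      # least squares solution of Eq. (1.18)

St = (Xc @ Xc.T) / n
c_pos = X[:, :n_pos].mean(axis=1)
c_neg = X[:, n_pos:].mean(axis=1)
GF = pinv(St) @ (c_pos - c_neg)                      # FLDA direction, Eq. (1.14)

# The two directions agree up to the positive scale factor 2*n_pos*n_neg/n^2
assert np.allclose(w, (2 * n_pos * n_neg / n**2) * GF)
```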
1.4.2 Relationship Between Multivariate Linear
Regression and LDA
In the multiclass case, we are given a data set consisting of n samples {(x_i, y_i)}_{i=1}^n, where x_i ∈ R^d and y_i ∈ {1, 2, . . . , k} denotes the class label of the ith sample and k > 2. To apply the least squares formalism to the multiclass case, the 1-of-k binary coding scheme is usually used to associate a vector-valued class code with each data point [15, 17]. In this coding scheme, the class indicator matrix, denoted as Y_1 ∈ R^{n×k}, is defined as follows:

Y_1(ij) = 1 if y_i = j,   and   Y_1(ij) = 0 otherwise.   (1.19)

It is known that the solution to the least squares problem approximates the conditional expectation of the target values given the input [15]. One justification for using the 1-of-k scheme is that, under this coding scheme, the conditional expectation is given by the vector of posterior class probabilities. However, these probabilities are usually approximated rather poorly [15]. There are also some other class indicator matrices considered in the literature. In particular, the indicator matrix Y_2 ∈ R^{n×k}, defined as

Y_2(ij) = 1 if y_i = j,   and   Y_2(ij) = −1/(k − 1) otherwise,   (1.20)
has been introduced to extend support vector machines (SVM) for multiclass classi-
fication [36] and to generalize the kernel target alignment measure [37], originally
proposed in reference 38.
In multivariate linear regression, a k-tuple of discriminant functions

f(x) = (f_1(x), f_2(x), . . . , f_k(x))

is considered for each x ∈ R^d. Denote X̃ = [x̃_1, . . . , x̃_n] ∈ R^{d×n} and Ỹ = (Ỹ_{ij}) ∈ R^{n×k} as the centered data matrix X and the centered indicator matrix Y, respectively. That is, x̃_i = x_i − x̄ and Ỹ_{ij} = Y_{ij} − Ȳ_j, where x̄ = \frac{1}{n} \sum_{i=1}^{n} x_i and Ȳ_j = \frac{1}{n} \sum_{i=1}^{n} Y_{ij}. Then MLR computes the weight vectors, {w_j}_{j=1}^k ∈ R^d, of the k linear models, f_j(x) = x^T w_j, for j = 1, . . . , k, via the minimization of the following sum-of-squares error function:

L(W) = \frac{1}{2} ||X̃^T W − Ỹ||_F^2 = \frac{1}{2} \sum_{j=1}^{k} \sum_{i=1}^{n} ||f_j(x̃_i) − Ỹ_{ij}||^2,   (1.21)

where W = [w_1, w_2, . . . , w_k] is the weight matrix and || · ||_F denotes the Frobenius norm of a matrix [27]. The optimal W is given by [15, 17]

W = (X̃X̃^T)^+ X̃Ỹ,   (1.22)

which is dependent on the centered class indicator matrix Ỹ.
Both Y1 and Y2 defined in Eqs. (1.19) and (1.20), as well as the one in reference
39, could be used to define the centered indicator matrix Ỹ. An interesting connection
between the linear regression model using Y1 and LDA can be found in reference 17
(page 112). It can be shown that if X^L = W_1^T X̃ is the data transformed by W_1, where W_1 = (X̃X̃^T)^+ X̃Ỹ_1 is the least squares solution in Eq. (1.22) using the centered indicator matrix Ỹ_1, then LDA applied to X^L is identical to LDA applied to X̃ in the
original space. In this case, linear regression is applied as a preprocessing step before
the classification and is in general not equivalent to LDA. The second indicator matrix
Y2 has been used in SVM, and the resulting model using Y2 is also not equivalent to
LDA in general. This is also the case for the indicator matrix in reference 39. One
natural question is whether there exists a class indicator matrix Ỹ ∈ IRn×k, with which
multivariate linear regression is equivalent to LDA. If this is the case, then LDA can be
formulated as a least squares problem in the multiclass case, and the generalizations
of least squares can be readily applied to LDA.
In MLR, each x̃_i is transformed to (f_1(x̃_i), . . . , f_k(x̃_i))^T = W^T x̃_i, and the centered data matrix X̃ ∈ R^{d×n} is transformed to W^T X̃ ∈ R^{k×n}, thus achieving dimensionality reduction if k < d. Note that the transformation matrix W in MLR is dependent on the centered class indicator matrix Ỹ as in Eq. (1.22). To derive a class indicator matrix for MLR with which the transformation matrix is related to that of LDA, it is natural to apply the class discrimination criterion used in LDA. We thus
look for Ỹ, which solves the following optimization problem:
max_Ỹ  trace( (W^T S_b W)(W^T S_t W)^+ )
subject to  W = (X̃X̃^T)^+ X̃Ỹ,   (1.23)
where the pseudo-inverse is used as the matrix X̃X̃T can be singular.
In reference 40, a new class indicator matrix, called Y3, is constructed and it was
shown that Y3 solves the optimization problem in Eq. (1.23). This new class indicator
matrix Y3 = (Y3(ij))ij ∈ IRn×k is defined as follows:
Y_3(ij) = \sqrt{n/n_j} − \sqrt{n_j/n}  if y_i = j,   and   Y_3(ij) = −\sqrt{n_j/n}  otherwise,   (1.24)
where nj is the sample size of the jth class, and n is the total sample size. Note
that Y3 defined above has been centered (in terms of rows), and thus Ỹ3 = Y3. More
importantly, it was shown in reference 40 that, under condition C1 in Eq. (1.15),
multivariate linear regression with Y3 as the class indicator matrix is equivalent
to LDA. We outline the main result below and the detailed proof can be found in
reference 40.
Recall that in LDA, the optimal transformation matrix G^LDA consists of the top eigenvectors of S_t^+ S_b corresponding to the nonzero eigenvalues. On the other hand, since X̃X̃^T = nS_t and X̃Y_3 = nH_b, where S_t and H_b are defined in Eqs. (1.3) and (1.5), respectively, the optimal weight matrix W^MLR for MLR in Eq. (1.22) can be expressed as

W^MLR = (X̃X̃^T)^+ X̃Y_3 = (nS_t)^+ nH_b = S_t^+ H_b.   (1.25)

It can be shown that the transformation matrix G^LDA of LDA, which consists of the top eigenvectors of S_t^+ S_b, and the projection matrix for MLR that is given in Eq. (1.25) are related as follows [40]:

W^MLR = [G^LDA Λ, 0] Q^T,

where Λ is a diagonal matrix and Q is an orthogonal matrix.
The K-Nearest-Neighbor (K-NN) algorithm [16] based on the Euclidean distance
is commonly applied as the classifier in the dimensionality-reduced space of LDA.
If we apply WMLR for dimensionality reduction before K-NN, the matrix WMLR
is invariant of an orthogonal transformation, since any orthogonal transformation
preserves all pairwise distance. Thus WMLR is essentially equivalent to
GLDA, 0
or GLDA, as the removal of zero columns does not change the pairwise distance
either. Thus the essential difference between WMLR and GLDA is the diagonal matrix
. Interestingly, it was shown in reference 40 that the matrix is an identity matrix
under the condition C1 defined in Eq. (1.15). This implies that multivariate linear
regression with Y3 as the class indicator matrix is equivalent to LDA provided that
the condition C1 is satisfied. Thus LDA can be formulated as a least squares problem
in the multiclass case. Experimental results in reference 40 show that condition C1 is
likely to hold for high-dimensional and undersampled data.
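To make this construction concrete, the following sketch (NumPy; the function name lsqr_lda and the synthetic data are our own illustrative choices, not from reference 40) forms the centered data matrix X̃, the indicator matrix Y3 of Eq. (1.24), and the least squares solution of Eq. (1.25):

import numpy as np

def lsqr_lda(X, y):
    """Least squares LDA sketch: X is d x n (columns are samples), y holds class labels."""
    y = np.asarray(y)
    d, n = X.shape
    classes, counts = np.unique(y, return_counts=True)
    k = len(classes)

    # Center the data matrix: X_tilde = X - global mean.
    X_tilde = X - X.mean(axis=1, keepdims=True)

    # Class indicator matrix Y3 of Eq. (1.24) (n x k), already centered over its rows.
    Y3 = np.zeros((n, k))
    for j, (c, n_j) in enumerate(zip(classes, counts)):
        Y3[:, j] = -np.sqrt(n_j / n)
        Y3[y == c, j] = np.sqrt(n / n_j) - np.sqrt(n_j / n)

    # W_MLR = (X_tilde X_tilde^T)^+ X_tilde Y3, as in Eq. (1.25).
    return np.linalg.pinv(X_tilde @ X_tilde.T) @ X_tilde @ Y3

# Toy usage: 50-dimensional data, 3 classes, 12 samples (purely synthetic).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 12))
y = np.array([0] * 4 + [1] * 4 + [2] * 4)
W_mlr = lsqr_lda(X, y)                                     # d x k projection matrix
Z = W_mlr.T @ (X - X.mean(axis=1, keepdims=True))          # reduced representation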
1.5 SEMISUPERVISED LDA
Semisupervised learning, which occupies the middle ground between supervised
learning (in which all training examples are labeled) and unsupervised learning (in
which no labeled data are given), has received considerable attention recently [41–43].
The least squares LDA formulation from the last section results in Laplacian-
regularized LDA [44]. Furthermore, it naturally leads to semisupervised dimension-
ality reduction by incorporating the unlabeled data through the graph Laplacian.
1.5.1 Graph Laplacian
Given a data set {x_i}_{i=1}^n, a weighted graph can be constructed where each node in the
graph corresponds to a data point in the data set. The weight S_ij between two nodes
x_i and x_j is commonly defined as follows:

S_ij = exp( −||x_i − x_j||^2 / σ )   if x_i ∈ N(x_j) or x_j ∈ N(x_i),
S_ij = 0                              otherwise,        (1.26)

where both the neighborhood size and σ > 0 are parameters to be specified, and x_i ∈ N(x_j) implies that
x_i is among the nearest neighbors of x_j [45]. Let S be the similarity matrix whose
(i, j)th entry is S_ij. To learn an appropriate representation {z_i}_{i=1}^n which preserves the
locality structure, it is common to minimize the following objective function [45]:

Σ_{i,j} ||z_i − z_j||^2 S_ij.        (1.27)
Intuitively, if xi and xj are close to each other in the original space—that is, Sij
is large—then ||z_i − z_j|| tends to be small if the objective function in Eq. (1.27) is
minimized. Thus the locality structure in the original space is preserved.
Define the Laplacian matrix L as L = D − S, where D is a diagonal matrix
whose diagonal entries are the column sums of S. That is, D_ii = Σ_{j=1}^n S_ij. Note that
L is symmetric and positive semidefinite. It can be verified that

(1/2) Σ_{i=1}^n Σ_{j=1}^n ||z_i − z_j||^2 S_ij = trace(Z L Z^T),        (1.28)

where Z = [z_1, . . . , z_n].
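As a concrete illustration, the sketch below (NumPy; the helper name graph_laplacian and the default neighborhood size and σ are assumptions for the example) builds the similarity matrix S of Eq. (1.26), the degree matrix D, and the Laplacian L = D − S:

import numpy as np

def graph_laplacian(X, k=5, sigma=1.0):
    """X is d x n (columns are samples). Returns the graph Laplacian L = D - S."""
    n = X.shape[1]
    # Pairwise squared Euclidean distances.
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)

    # k-nearest-neighbor mask (excluding the point itself).
    order = np.argsort(sq, axis=1)
    knn = np.zeros((n, n), dtype=bool)
    for i in range(n):
        knn[i, order[i, 1:k + 1]] = True
    neighbors = knn | knn.T            # x_i in N(x_j) or x_j in N(x_i)

    S = np.where(neighbors, np.exp(-sq / sigma), 0.0)    # Eq. (1.26)
    D = np.diag(S.sum(axis=1))                            # column (= row) sums of S
    return D - S                                          # L = D - S

# Toy usage: 20-dimensional data, 30 samples.
X = np.random.default_rng(0).standard_normal((20, 30))
L = graph_laplacian(X, k=5, sigma=1.0)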
1.5.2 A Regularization Framework
for Semisupervised LDA
In semisupervised LDA, information from unlabeled data is incorporated into the for-
mulation via a regularization term defined as in Eq. (1.28). Mathematically, semisu-
pervised LDA computes an optimal weight matrix W∗, which solves the following
optimization problem:
W∗ = arg min_W { ||X̃^T W − Y3||_F^2 + γ trace(W^T X̃LX̃^T W) },        (1.29)
where γ ≥ 0 is a tuning parameter and Y3 is the class indicator matrix defined in
Eq. (1.24). Since the Laplacian regularizer in Eq. (1.29) does not depend on the label
information, the unlabeled data can be readily incorporated into the formulation. Thus
the locality structures of both labeled and unlabeled data points are captured through
the transformation W. It is clear that W∗ is given by
W∗ = ( γX̃LX̃^T + X̃X̃^T )^+ nH_b.        (1.30)
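A minimal sketch of the closed-form solution in Eq. (1.30) is given below (NumPy; it assumes the Laplacian L has been built over both labeled and unlabeled points as in Section 1.5.1, and it simply leaves the indicator rows of unlabeled samples at zero, which is our simplification for illustration rather than the exact construction of reference 44):

import numpy as np

def semisupervised_lda(X, y, L, gamma=0.1):
    """X is d x n with labeled and unlabeled columns, y holds labels with -1 for
    unlabeled samples, L is the graph Laplacian over all n points (cf. Eq. (1.30))."""
    d, n = X.shape
    X_tilde = X - X.mean(axis=1, keepdims=True)

    labeled = y >= 0
    classes, counts = np.unique(y[labeled], return_counts=True)
    k = len(classes)
    n_l = labeled.sum()

    # Indicator matrix as in Eq. (1.24); unlabeled rows stay zero (our simplification),
    # so only labeled samples enter the fidelity term of Eq. (1.29).
    Y3 = np.zeros((n, k))
    for j, (c, n_j) in enumerate(zip(classes, counts)):
        Y3[labeled, j] = -np.sqrt(n_j / n_l)
        Y3[y == c, j] = np.sqrt(n_l / n_j) - np.sqrt(n_j / n_l)

    # W* = (gamma * X L X^T + X X^T)^+  X Y3, cf. Eqs. (1.29)-(1.30).
    A = gamma * X_tilde @ L @ X_tilde.T + X_tilde @ X_tilde.T
    return np.linalg.pinv(A) @ X_tilde @ Y3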
1.6 EXTENSIONS TO KERNEL-INDUCED
FEATURE SPACE
The discussion so far focuses on linear dimensionality reduction and regression. It
has been shown that both discriminant analysis and regression can be adapted to
nonlinear models by using the kernel trick [46–48]. Mika et al. [49] extended the
Fisher discriminant analysis to its kernel version in the binary-class case. Following
the work in reference 50, Baudat and Anouar [51] proposed the generalized discrimi-
nant analysis (GDA) algorithm for multiclass problems. The equivalence relationship
between kernel discriminant analysis (KDA) and kernel regression has been studied
in reference 35 for binary-class problems. The analysis presented in this chapter can
be applied to extend this equivalence result to multiclass problems.
A symmetric function κ : X × X → R, where X denotes the input space, is
called a kernel function if it satisfies the finitely positive semidefinite property [46].
That is, for any x1, . . . , xn ∈ X, the kernel Gram matrix K ∈ IRn×n, defined by Kij =
κ(xi, xj), is positive semidefinite. Any kernel function κ implicitly maps the input set
X to a high-dimensional (possibly infinite) Hilbert space Hκ equipped with the inner
product (·, ·)Hκ through a mapping φκ from X to Hκ:
κ(x, z) = (φκ(x), φκ(z))_Hκ.
In KDA, three scatter matrices are defined in the feature space Hκ as follows:
S_w^φ = (1/n) Σ_{j=1}^k Σ_{x∈X_j} (φ(x) − c_j^φ)(φ(x) − c_j^φ)^T,        (1.31)

S_b^φ = (1/n) Σ_{j=1}^k n_j (c_j^φ − c^φ)(c_j^φ − c^φ)^T,        (1.32)

S_t^φ = (1/n) Σ_{j=1}^k Σ_{x∈X_j} (φ(x) − c^φ)(φ(x) − c^φ)^T,        (1.33)

where c_j^φ is the centroid of the jth class and c^φ is the global centroid in the feature
space. Similar to the linear case, the transformation G of KDA can be computed by
solving the following optimization problem:
G = arg max_G trace( (G^T S_t^φ G)^+ (G^T S_b^φ G) ).        (1.34)
It follows from the Representer Theorem [47] that columns of G lie in the span of the
images of training data in the feature space. That is,
G = φ(X)B, (1.35)
for some matrix B ∈ Rn×(k−1), where
φ(X) = [φ(x1), . . . , φ(xn)]
is the data matrix in the feature space. Substituting Eq. (1.35) into Eq. (1.34), we can
obtain the matrix B by solving the following optimization problem:
B = arg max_B trace( (B^T S_t^K B)^+ (B^T S_b^K B) ),        (1.36)

where S_b^K = K Y3 Y3^T K, S_t^K = K^2, and K = φ(X)^T φ(X) is the kernel matrix.
It can be verified that S_b^K and S_t^K are the between-class and total scatter matrices,
respectively, when each column in K is considered as a data point in the n-dimensional
space. It follows from Theorem 5.3 in reference 33 that the condition C1 in Eq. (1.15) is
satisfied if all the training data points are linearly independent. Therefore, if the kernel
matrix K is nonsingular (hence its columns are linearly independent), then kernel
discriminant analysis (KDA) and kernel regression using Y3 as the class indicator
matrix are essentially equivalent. This extends the equivalence result between KDA
and kernel regression in the binary-class case, originally proposed in reference 35, to
the multiclass setting.
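As an illustration of this equivalence, the following sketch fits kernel regression with Y3 as the target matrix (the Gaussian kernel, the explicit kernel centering, and the helper name kernel_ls_da are assumptions of this example, not the specific setup of reference 40); when the centered kernel matrix is nonsingular, the resulting projections of the training data are essentially equivalent to the KDA features in the sense discussed above:

import numpy as np

def kernel_ls_da(X, y, sigma=1.0):
    """Kernel regression with the Y3 indicator. X is d x n, y holds class labels."""
    y = np.asarray(y)
    n = X.shape[1]
    classes, counts = np.unique(y, return_counts=True)
    k = len(classes)

    # Gram matrix of a Gaussian kernel, then centering in the feature space.
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    K = np.exp(-sq / (2 * sigma ** 2))
    H = np.eye(n) - np.ones((n, n)) / n
    K_c = H @ K @ H

    # Centered indicator matrix Y3 of Eq. (1.24).
    Y3 = np.zeros((n, k))
    for j, (c, n_j) in enumerate(zip(classes, counts)):
        Y3[:, j] = -np.sqrt(n_j / n)
        Y3[y == c, j] = np.sqrt(n / n_j) - np.sqrt(n_j / n)

    # Least squares fit in the span of the training images: B = K_c^+ Y3.
    B = np.linalg.pinv(K_c) @ Y3
    Z = K_c @ B            # projections of the training data
    return B, Z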
To overcome the singularity problem in kernel discriminant analysis (KDA),
a number of techniques have been developed in the literature. Regularization was
employed in reference 52. The QR decomposition was employed in reference 51 to
avoid the singularity problem by removing the zero eigenvalues. Lu et al. [53, 54]
extended the direct LDA (DLDA) algorithm [55] to kernel direct LDA based on the
kernel trick. PCA+LDA was discussed in reference 56, and a complete algorithm was
proposed to derive discriminant vectors from the null space of the within-class scatter
matrix and its orthogonal complement. Recently, similar ideas were extended to the
feature space based on kernel PCA [57].
Another challenging issue in applying KDA is the selection of an appropriate
kernel function. Recall that kernel methods work by embedding the input data into
some high-dimensional feature space. The key fact underlying the success of kernel
methods is that the embedding into feature space can be determined uniquely by
specifying a kernel function that computes the dot product between data points in
the feature space. In other words, the kernel function implicitly defines the nonlinear
mapping to the feature space and expensive computations in the high-dimensional
feature space can be avoided by evaluating the kernel function. Thus one of the
central issues in kernel methods is the selection of kernels.
To automate kernel-based learning algorithms, it is desirable to integrate the
tuning of kernels into the learning process. This problem has been addressed from
different perspectives recently. Lanckriet et al. [58] pioneered the work of multiple
kernel learning (MKL) in which the optimal kernel matrix is obtained as a linear
combination of prespecified kernel matrices. It was shown [58] that the coefficients
in MKL can be determined by solving convex programs in the case of Support Vector
Machines (SVM). While most existing work focuses on learning kernels for SVM,
Fung et al. [59] proposed to learn kernels for discriminant analysis. Based on ideas
from MKL, this problem was reformulated as a semidefinite program (SDP) [60] in
reference 61 for binary-class problems.
By optimizing an alternative criterion, an SDP formulation for the KDA kernel
learning problem in the multiclass case was proposed in reference 62. To reduce
the computational cost of the SDP formulation, an approximate scheme was also
developed. Furthermore, it was shown that the regularization parameter for KDA can
also be learned automatically in this framework [62]. Although the approximate SDP
formulation in reference 62 is scalable in terms of the number of classes, interior
point algorithms [63] for solving SDP have an inherently large time complexity and
thus they cannot be applied to large-scale problems. To improve the efficiency of this
formulation, a quadratically constrained quadratic program (QCQP) [63] formulation
was proposed in reference 64 and it is more scalable than the SDP formulations.
1.7 OTHER LDA EXTENSIONS
Sparsity has recently received much attention for extending existing algorithms to
induce sparse solutions [65–67]. The L1-norm penalty has been used in regression [68],
known as the LASSO, and in SVM [69, 70] to achieve model sparsity. Sparsity often leads
to easy interpretation and good generalization ability of the resulting model. Sparse
Fisher LDA has been proposed in reference 35, for binary-class problems. Based
on the equivalence relationship between LDA and MLR, a multiclass sparse LDA
formulation was proposed in reference 71 and an entire solution path for LDA was
also obtained through the LARS algorithm [72].
The discussions in this chapter focus on supervised approaches. In the unsuper-
vised setting, LDA can be applied to find the discriminant subspace for clustering,
such as K-means clustering. In this case, an iterative algorithm can be derived al-
ternating between clustering and discriminant subspace learning via LDA [73–75].
Interestingly, it can be shown that this iterative procedure can be simplified and is
essentially equivalent to kernel K-means with a specific kernel Gram matrix [76].
When the data in question are given as high-order representations such as 2D and
3D images, it is natural to encode them using high-order tensors. Discriminant tensor
factorization, which is a two-dimensional extension of LDA, for a collection of two-
dimensional images has been studied [77]. It was further extended to higher-order
tensors in reference 78. However, the convergence of these iterative
algorithms [77, 78] is not guaranteed. Recently, a novel discriminant tensor factor-
ization procedure with a convergence guarantee was proposed [79]. Other recent
extensions on discriminant tensor factorization as well as their applications to image
analysis can be found in reference 80.
1.8 CONCLUSION
In this chapter, we provide a unified view of various LDA algorithms and discuss
recent developments on connecting LDA to multivariate linear regression. We show
that MLR with a specific class indicator matrix is equivalent to LDA under a mild
condition, which has been shown to hold for many high-dimensional and small sam-
ple size data. This implies that LDA reduces to a least squares problem under this
condition, and its solution can be obtained by solving a system of linear equations.
Based on this equivalence result, we show that LDA can be applied in the semisuper-
vised setting. We further extend the discussion to the kernel-induced feature space
and present recent developments on kernel learning. Finally, we discuss several other
recent developments on discriminant analysis, including sparse LDA, unsupervised
LDA, and tensor LDA.
REFERENCES
1. A. K. Jain, P. Flynn, and A. A. Ross, Handbook of Biometrics, Springer, New York, 2007.
2. A. K. Jain and S. Z. Li, Handbook of Face Recognition, Springer-Verlag, New York, 2005.
3. A. K. Jain, A. A. Ross, and S. Prabhakar, An introduction to biometric recognition, IEEE Trans.
Circuits Syst. Video Technol. 14(1):4–20, 2004.
4. R. E. Bellman, Adaptive Control Processes: A Guided Tour, Princeton University Press, Princeton,
NJ, 1961.
5. D. Donoho, High-dimensional data analysis: The curses and blessings of dimensionality, in American
Mathematical Society Lecture—Math Challenges of the 21st Century, August 2000.
6. S. Yan, D. Xu, B. Zhang, H.-J. Zhang, Q. Yang, and S. Lin, Graph embedding and extensions: A general
framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell. 29(1):40–51, 2007.
7. R. A. Fisher, The use of multiple measurements in taxonomic problems. Ann. Eugenics 7:179–188,
1936.
8. K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd edition, Academic Press Profes-
sional, San Diego, 1990.
9. I. T. Jolliffe, Principal Component Analysis, 2nd edition, Springer-Verlag, New York, 2002.
10. M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation,
Neural Comput. 15(6):1373–1396, 2003.
11. C. J. C. Burges, Geometric methods for feature extraction and dimensional reduction, in O. Maimon and
L. Rokach, editors, Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners
and Researchers, Springer, New York, 2005, pp. 59–92.
12. S. T. Roweis and L. K. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science
290(5500):2323–6, 2000.
13. L. K. Saul, K. Q. Weinberger, J. H. Ham, F. Sha, and D. D. Lee, Spectral methods for dimensionality
reduction, in O. Chapelle B. Schöelkopf and A. Zien, editors, Semisupervised Learning, MIT Press,
Cambridge, MA, 293–308, 2006.
14. J. B. Tenenbaum, V. d. Silva, and J. C. Langford, A global geometric framework for nonlinear dimen-
sionality reduction, Science 290(5500):279–294, 2000.
15. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, 2006.
16. R. O. Duda, P. E. Hart, and D. Stork, Pattern Classification, John Wiley &amp; Sons, New York, 2000.
17. T. Hastie, R. Tibshirani, and J.H. Friedman, The Elements of Statistical Learning: Data Mining,
Inference, and Prediction, Springer, New York, 2001.
18. A. M. Martinez and M. Zhu, Where are linear feature extraction methods applicable? IEEE Trans.
Pattern Anal. Mach. Intell. 27(12):1934–1944, 2005.
19. P. N. Belhumeour, J. P. Hespanha, and D. J. Kriegman, Eigenfaces vs. Fisherfaces: Recognition using
class specific linear projection. IEEE Trans. Pattern Anal. Mach. Intell. 19(7):711–720, 1997.
20. L. F. Chen, H. Y. M. Liao, M. T. Ko, J. C. Lin, and G. J. Yu, A new lda-based face recognition system
which can solve the small sample size problem, Pattern Recognit 33:1713–1726, 2000.
21. Y. Guo, T. Hastie, and R. Tibshirani, Regularized linear discriminant analysis and its application in
microarrays, Biostatistics 8(1):86–100, 2007.
22. A. M. Martinez and A. C. Kak, PCA versus LDA, IEEE Trans. Pattern Anal. Mach. Intell. 23(2):228–
233, 2001.
23. X. Wang and X. Tang, A unified framework for subspace face recognition. IEEE Trans. Pattern Anal.
Mach. Intell. 26(9):1222–1228, 2004.
24. J. Ye, Characterization of a family of algorithms for generalized discriminant analysis on undersampled
problems, J. Mach. Learning Res. 6:483–502, 2005.
25. H. Park, M. Jeon, and J. B. Rosen, Lower dimensional representation of text data based on centroids
and least squares, BIT 43(2):1–22, 2003.
26. P. Howland, M. Jeon, and H. Park, Structure preserving dimension reduction for clustered text data
based on the generalized singular value decomposition, SIAM J. Matrix Anal. Appl. 25(1):165–179,
2003.
27. G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edition, The Johns Hopkins University
Press, Baltimore, 1996.
28. W. Zhao, R. Chellappa, and P. Phillips, Subspace linear discriminant analysis for face recogni-
tion, Technical Report CAR-TR-914, Center for Automation Research, University of Maryland,
1999.
29. V. N. Vapnik, Statistical Learning Theory, John Wiley &amp; Sons, New York, 1998.
30. L. Duchene and S. Leclerq, An optimal transformation for discriminant and principal component
analysis, IEEE Trans. Pattern Anal. Mach. Intell. 10(6):978–983, 1988.
31. J. Ye, T. Xiong, Q. Li, R. Janardan, J. Bi, V. Cherkassky, and C. Kambhamettu, Efficient model selection
for regularized linear discriminant analysis, in Proceedings of the 15th ACM International Conference
on Information and Knowledge Management, 2006, pp. 532–539.
32. H. Cevikalp, M. Neamtu, M. Wilkes, and A. Barkana, Discriminative common vectors for face recog-
nition, IEEE Trans. Pattern Anal. Mach. Intell. 27(1):4–13, 2005.
33. J. Ye and T. Xiong, Computational and theoretical analysis of null space and orthogonal linear dis-
criminant analysis, J. Mach. Learning Res. 7:1183–1204, 2006.
34. P. Hall, J. S. Marron, and A. Neeman, Geometric representation of high dimension, low sample size
data, J. R. Stat. Soc. Ser. B 67:427–444, 2005.
35. S. Mika, Kernel Fisher Discriminants, Ph.D. thesis, University of Technology, Berlin, 2002.
36. Y. Lee, Y. Lin, and G. Wahba, Multicategory support vector machines, theory, and application to the
classification of microarray data and satellite radiance data, J. Am. Stat. Assoc. 99:67–81, 2004.
37. Y. Guermeur, A. Lifchitz, and R. Vert, A kernel for protein secondary structure prediction, in Kernel
Methods in Computational Biology, The MIT Press, Cambridge, MA, 2004, pp. 193–206.
38. N. Cristianini, J. Kandola, A. Elisseeff, and J. Shawe-Taylor, On kernel target alignment, in Advances
in Neural Information Processing Systems, The MIT Press, Cambridge, MA, 2001.
39. C. Park and H. Park, A relationship between LDA and the generalized minimum squared error solution,
SIAM J. Matrix Anal. Appl. 27(2):474–492, 2005.
40. J. Ye, Least squares linear discriminant analysis, in Proceedings of the 24th International Conference
on Machine Learning, 2007, pp. 1087–1093.
41. O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge,
MA, 2006.
42. D. Zhou, O. Bousquet, T. Lal, J. Weston, and B. Schölkopf, Learning with local and global consistency,
in Advances in Neural Information Processing Systems, 2003, pp. 321–328.
43. X. Zhu, Z. Ghahramani, and J. Lafferty, Semi-supervised learning using Gaussian fields and har-
monic functions, in Proceedings of the 20th International Conference on Machine Learning, 2003,
pp. 912–919.
44. J. Chen, J. Ye, and Q. Li, Integrating global and local structures: A least squares framework for
dimensionality reduction, in IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, 2007, pp. 1–8.
45. M. Belkin and P. Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering,
Adv. Neural Inf. Processing Sys. 15:585–591, 2001.
46. N. Cristianini and J. S. Taylor, An Introduction to Support Vector Machines and other Kernel-Based
Learning Methods, Cambridge University Press, New York, 2000.
47. B. Schölkopf and A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Opti-
mization and Beyond, MIT Press, Cambridge, MA, 2002.
48. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press,
New York, 2004.
49. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller, Fisher discriminant analysis with
kernels, in Y.-H. Hu, J. Larsen, E. Wilson, and S. Douglas, editors, Neural Networks for Signal
Processing IX, IEEE, New York, 1999, pp. 41–48.
50. B. Schölkopf, A. J. Smola, and K-R. Müller, Nonlinear component analysis as a kernel eigenvalue
problem, Neural Comput. 10(5):1299–1319, 1998.
51. G. Baudat and F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput.
12(10):2385–2404, 2000.
52. S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K.-R. Müller, Constructing descriptive and
discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces, IEEE Trans. Pattern
Anal. Mach. Intell. 25(5):623–633, 2003.
53. J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, Face recognition using kernel direct discriminant
analysis algorithms, IEEE Trans. Neural Networks 14(1):117–126, 2003.
54. J. Lu, K. N. Plataniotis, A. N. Venetsanopoulos, and J. Wang, An efficient kernel discriminant analysis
method, Pattern Recognit. 38(10):1788–1790, 2005.
55. H. Yu and J. Yang, A direct LDA algorithm for high-dimensional data with applications to face
recognition, Pattern Recognit. 34:2067–2070, 2001.
56. J. Yang and J. Yang, Why can LDA be performed in PCA transformed space? Pattern Recognit.
36(2):563–566, 2003.
57. J. Yang, A. F. Frangi, J. Yang, D. Zhang, and Z. Jin, KPCA plus LDA: A complete kernel fisher
discriminant framework for feature extraction and recognition. IEEE Trans. Pattern Anal. Mach.
Intell. 27(2):230–244, 2005.
58. G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan, Learning the kernel
matrix with semidefinite programming, J. Mach. Learning Res. 5:27–72, 2004.
59. G. Fung, M. Dundar, J. Bi, and B. Rao, A fast iterative algorithm for Fisher discriminant using hetero-
geneous kernels, in Proceedings of the Twenty-First International Conference on Machine Learning,
2004.
60. L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Rev. 38(1):49–95, 1996.
61. S.-J. Kim, A. Magnani, and S. Boyd, Optimal kernel selection in kernel Fisher discriminant analysis, in
Proceedings of the Twenty-Third International Conference on Machine Learning, 2006, pp. 465–472.
62. J. Ye, J. Chen, and S. Ji, Discriminant kernel and regularization parameter learning via semidefinite
programming, in Proceedings of the Twenty-Fourth International Conference on Machine Learning,
2007, pp. 1095–1102.
63. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, 2004.
64. J. Ye, S. Ji, and J. Chen, Learning the kernel matrix in discriminant analysis via quadratically con-
strained quadratic programming, in Proceedings of the 13th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, 2007, pp. 854–863.
65. A. d’Aspremont, L. E. Ghaoui, M. I. Jordan, and G. R. G. Lanckriet, A direct formulation for sparse
PCA using semidefinite programming, SIAM Rev. 49(3):434–448, 2007.
66. I. T. Jolliffe and M. Uddin, A modified principal component technique based on the lasso, J. Comput.
Graph. Stat. 12:531–547, 2003.
67. H. Zou, T. Hastie, and R. Tibshirani, Sparse principal component analysis, J. Comput. Graph. Stat.
15(2):265–286, 2006.
68. R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B 58(1):267–288,
1996.
69. L. Wang and X. Shen, On L1-norm multiclass support vector machines: Methodology and theory, J.
Am. Stat. Assoc. 102(478):583–594, 2007.
70. J. Zhu, S. Rosset, T. Hastie, and R. Tibshirani, 1-Norm support vector machines, in Advances in Neural
Information Processing Systems, 2003.
71. J. Ye, J. Chen, R. Janardan, and S. Kumar, Developmental stage annotation of Drosophila gene ex-
pression pattern images via an entire solution path for LDA, in ACM Transactions on Knowledge
Discovery from Data, Special Issue on Bioinformatics, 2008.
72. B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression (with discussion). Ann.
Stat., 32(2):407–499, 2004.
73. F. De la Torre and T. Kanade, Discriminative cluster analysis, in Proceedings of the Twenty-Third
International Conference on Machine Learning, 2006, pp. 241–248.
74. C. Ding and T. Li, Adaptive dimension reduction using discriminant analysis and k-means clustering,
in Proceedings of the Twenty-Fourth International Conference on Machine Learning, 521–528, 2007.
75. J. Ye, Z. Zhao, and H. Liu, Adaptive distance metric learning for clustering, in IEEE Conference on
Computer Vision and Pattern Recognition, 2007.
76. J. Ye, Z. Zhao, and M. Wu, Discriminative k-means for clustering, in Proceedings of the Annual
Conference on Advances in Neural Information Processing Systems, 2007, pp. 1649–1656.
77. J. Ye, R. Janardan, and Q. Li, Two-dimensional linear discriminant analysis, in Proceedings of the
Annual Conference on Advances in Neural Information Processing Systems, 2004, pp. 1569–1576.
78. S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H. Zhang, Discriminant analysis with tensor represen-
tation, in Proceedings of the International Conference on Computer Vision and Pattern Recogniton,
2005.
79. H. Wang, S. Yan, T. Huang, and X. Tang, A convergent solution to tensor subspace learning, in
Proceedings of the International Joint Conference on Artificial Intelligence, 2007.
80. D. Tao, X. Li, X. Wu, and S. J. Maybank, General tensor discriminant analysis and gabor features for
gait recognition. IEEE Trans. Pattern Anal. Mach. Intell. 29(10):1700–1715, 2007.
Chapter 2  A Taxonomy of Emerging Multilinear Discriminant Analysis Solutions
algorithms [10–12], as well as other gray-level biometric video sequences, can be
viewed as third-order tensors with the column, row, and time modes. Naturally,
color biometric video sequences are fourth-order tensors with the addition of a color
mode.
For illustration, Figure 2.1 shows the natural representations of three commonly
used biometric signals, a second-order face tensor with the column and row modes
in Figure 2.1a, a third-order Gabor face [2, 8, 13] tensor with the column, row, and
Gabor modes in Figure 2.1b, and a third-order gait silhouette sequence tensor [14]
with the column, row, and time modes in Figure 2.1c.
The tensor space where a typical biometric tensor object is specified is often high-
dimensional, and recognition methods operating directly on this space suffer from the
so-called curse of dimensionality [15]. On the other hand, the classes of a particular
biometric signal, such as face images, are usually highly constrained and belong to
Figure 2.1. Biometric data represented naturally as tensors: (a) A 2-D face tensor, (b) a 3-D
Gabor-face tensor, and (c) a 3-D gait (silhouette) tensor.
a subspace, a manifold of intrinsically low dimension [15, 16]. Feature extraction or
dimensionality reduction is thus an attempt to transform a high-dimensional data set
into a low-dimensional space of equivalent representation while retaining most of the
underlying structure [17]. Traditionally, feature extraction algorithms operate on one-
dimensional objects, that is, first-order tensors (vectors); any tensor object with
order greater than one, such as images and videos, has to be reshaped (vectorized) into
vectors before processing. However, it is well understood that reshaping breaks
the natural structure and correlation in the original data, removing redundancies and/or
higher-order dependencies present in the original data set and losing potentially more
compact or useful representations that can be obtained in the original form.
By recognizing the fact that tensor objects are naturally multidimensional ob-
jects instead of one-dimensional objects, multilinear feature extraction algorithms [2,
14, 18–20] operating directly on the tensorial representations rather than their vec-
torized versions are emerging, partly due to the recent development in multilinear
algebra [1, 21, 22]. The multilinear principal component analysis (MPCA) frame-
work [14]2 attempts to determine a multilinear projection that projects the original
tensor objects into a lower-dimensional tensor subspace while preserving the vari-
ation in the original data. It can be further extended through the combination with
classical approaches [14, 18, 25] and has achieved good results when applied to
the gait recognition problem. Nonetheless, MPCA is an unsupervised method and
the class information is not used in the feature extraction process. There has been
a growing interest in the development of supervised multilinear feature extraction
algorithms. A two-dimensional linear discriminant analysis (2DLDA) was proposed
in reference 26; and later a more general extension, the Discriminant Analysis with
Tensor Representation (DATER),3 was proposed in reference 2. They maximize a
tensor-based scatter ratio criterion and the application to the face recognition prob-
lem showed better recognition results than linear discriminant analysis (LDA). In
reference 19, a so-called general tensor discriminant analysis (GTDA) algorithm is
proposed by maximizing a scatter difference criterion, and it is used as a preprocess-
ing step in tensorial gait data classification [19]. All these methodologies are based on
the tensor-to-tensor projection (TTP). The so-called Tensor Rank-one Discriminant
Analysis (TR1DA) algorithm [27, 28], which uses the scatter difference criterion, ob-
tains a number of rank-one projections from the repeatedly calculated residues of the
original tensor data and it can be viewed as a tensor-to-vector projection (TVP). This
“greedy” approach is a heuristic method originally proposed in reference 29 for ten-
sor approximation. In reference 30, an uncorrelated multilinear discriminant analysis
(UMLDA) approach is proposed to extract uncorrelated features through TVP. The
extensions of linear graph-embedding algorithms were also introduced similarly in
references 31–35.
2An earlier version with a slightly different approach appears in reference 23 and a different formulation
is in reference 24.
3Here, we adopt the name that was used when the algorithm was first proposed, which is more commonly
referred to in the literature.
In this chapter, we focus primarily on the development of supervised multi-
linear methodologies, in particular the multilinear discriminant analysis (MLDA)
algorithms, the multilinear extensions of the well-known LDA algorithm. The objec-
tive is to answer the following two questions regarding MLDA so that the interested
researchers/practitioners can grasp multilinear concepts with ease and clarity for prac-
tical usage and further research/development:
1. What are the various multilinear projections and how are they related to tra-
ditional linear projection?
2. What are the relationships (similarities and differences) among the existing
MLDA variants?
First in Section 2.2, basic multilinear algebra is reviewed and the commonly used
tensor distance measure is shown to be equivalent to the Euclidean distance for
vectors. Next, Section 2.3 discusses various multilinear projections including linear
projection: from vector to vector, from tensor to tensor, and from tensor to vec-
tor, based on which the two general categories of MLDA are introduced. Com-
monly used separation criteria and initialization methods are then discussed and
the underlying connections between the LDA and the MLDA variants are re-
vealed. Subsequently, a taxonomy of the existing MLDA variants is suggested. Fi-
nally, empirical studies are presented in Section 2.4, and conclusions are drawn in
Section 2.5.
2.2 MULTILINEAR BASICS
Before discussions on the multilinear discriminant analysis solutions for biometric
signals, it is necessary to review some basic multilinear algebra, including the nota-
tions and some basic multilinear operations. For further reading on this topic, references
1, 21, 22, 29, and 36 are excellent starting points. In addition, the equivalent vector inter-
pretation of a commonly used tensor distance measure is derived.
2.2.1 Notations
The notations in this chapter follow the conventions in the multilinear algebra, pattern
recognition, and adaptive learning literature. Vectors are denoted by lowercase bold-
face letters (e.g., x), matrices by uppercase boldface (e.g., U), and tensors by script
letters (e.g., A). Their elements are denoted with indices in brackets. Indices are de-
noted by lowercase letters and span the range from 1 to the uppercase letter of the
index (e.g., n = 1, 2, . . . , N). Throughout this chapter, the discussion is restricted to
real-valued vectors, matrices, and tensors since the biometric applications that we are
interested in involve real data only, such as gray-level/color face images and binary
gait silhouette sequences.
2.2.2 Basic Multilinear Algebra
An Nth-order tensor is denoted as A ∈ RI1×I2×···×IN . It is addressed by N indices
in, n = 1, . . . , N, and each in addresses the n-mode of A. The n-mode product of a
tensor A by a matrix U ∈ RJn×In , denoted by A ×n U, is a tensor with entries:
(A ×n U)(i_1, . . . , i_{n−1}, j_n, i_{n+1}, . . . , i_N) = Σ_{i_n} A(i_1, . . . , i_N) · U(j_n, i_n).        (2.1)

The scalar product of two tensors A, B ∈ R^{I_1×I_2×···×I_N} is defined as

⟨A, B⟩ = Σ_{i_1} Σ_{i_2} · · · Σ_{i_N} A(i_1, i_2, . . . , i_N) · B(i_1, i_2, . . . , i_N),        (2.2)

and the Frobenius norm of A is defined as ||A||_F = √⟨A, A⟩. The “n-mode vectors”
of A are defined as the I_n-dimensional vectors obtained from A by varying the index
i_n while keeping all the other indices fixed. A rank-1 tensor A equals the outer
product of N vectors: A = u^(1) ◦ u^(2) ◦ · · · ◦ u^(N), which means that

A(i_1, i_2, . . . , i_N) = u^(1)(i_1) · u^(2)(i_2) · . . . · u^(N)(i_N)        (2.3)
for all values of indices. Unfolding A along the n-mode is denoted as A(n) ∈
RIn×(I1×···×In−1×In+1×···×IN ), and the column vectors of A(n) are the n-mode vectors
of A.
Figures 2.2b, 2.2c, and 2.2d give visual illustrations of the 1-mode, 2-mode,
and 3-mode vectors of the third-order tensor A in Figure 2.2a, respectively. Figure
2.3a shows the 1-mode unfolding of the tensor A in Figure 2.2a and Figure 2.3b
demonstrates how the 1-mode multiplication A ×1 B is obtained. The product A ×1 B
is computed as the inner products between the 1-mode vectors of A and the rows of B. In
the 1-mode multiplication, each 1-mode vector of A (∈ R8) is projected by B ∈ R3×8
to obtain a vector (∈ R3), as the differently shaded vectors indicate in Figure 2.3b.
Figure 2.2. Illustration of the n-mode vectors: (a) A tensor A ∈ R8×6×4, (b) the 1-mode vectors,
(c) the 2-mode vectors, and (d) the 3-mode vectors.
Figure 2.3. Visual illustration of (a) the n-mode (1-mode) unfolding and (b) the n-mode (1-mode)
multiplication.
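For readers who want to experiment with these operations, the following sketch (NumPy; unfold and mode_n_product are our own helper names, and the mode index is zero-based whereas the text counts modes from 1) implements the n-mode unfolding and the n-mode product of Eq. (2.1), and checks them on the 8 × 6 × 4 example of Figures 2.2 and 2.3. The column ordering of the unfolding is one particular convention; references differ on this point.

import numpy as np

def unfold(A, n):
    """n-mode unfolding A_(n): rows are indexed by i_n, columns by the remaining indices
    (one possible column ordering; n is zero-based here)."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_n_product(A, U, n):
    """n-mode product A x_n U, Eq. (2.1): every n-mode vector of A is premultiplied by U."""
    moved = np.moveaxis(A, n, 0)                     # bring mode n to the front
    out = np.tensordot(U, moved, axes=([1], [0]))    # sum over i_n
    return np.moveaxis(out, 0, n)

# The 1-mode example of Figure 2.3: A is 8 x 6 x 4 and B is 3 x 8.
A = np.random.default_rng(0).standard_normal((8, 6, 4))
B = np.random.default_rng(1).standard_normal((3, 8))
C = mode_n_product(A, B, 0)                          # result is 3 x 6 x 4
assert np.allclose(unfold(C, 0), B @ unfold(A, 0))   # (A x_1 B)_(1) = B A_(1)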
2.2.3 Tensor Distance Measure
To measure the distance between tensors A and B, the Frobenius norm is used in
reference 2: dist(A, B) = ||A − B||_F. Let vec(A) be the vector representation (vec-
torization) of A; then it is straightforward to show the following:
Proposition 1. dist(A, B) = ||vec(A) − vec(B)||_2.
That is, the Frobenius norm of the difference between two tensors equals the
Euclidean distance of their vectorized representations, since the Frobenius norm is a
point-based measurement as well [37] and it does not take the structure of a tensor
into account.
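Proposition 1 is easy to verify numerically; the following few lines (NumPy, with arbitrary random tensors) do so:

import numpy as np

# Numerical check of Proposition 1: ||A - B||_F equals ||vec(A) - vec(B)||_2.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6, 4))
B = rng.standard_normal((8, 6, 4))
frob = np.sqrt(((A - B) ** 2).sum())
eucl = np.linalg.norm(A.ravel() - B.ravel())
assert np.isclose(frob, eucl)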
2.3 MULTILINEAR DISCRIMINANT ANALYSIS
The linear discriminant analysis (LDA) [38] is a classical algorithm that has been
successfully applied and extended to various biometric signal recognition problems
[15, 39–42]. The recent advancement in multilinear algebra [1, 21] led to a number
of multilinear extensions of the LDA, multilinear discriminant analysis (MLDA),
being proposed for the recognition of biometric signals using their natural tensorial
representation [2, 19, 28, 30].
In general, MLDA seeks a multilinear projection that maps the input data from
one space to another (lower-dimensional, more discriminative) space. Therefore, we
need to understand what a multilinear projection is before proceeding to the MLDA
solutions. In this section, we first propose a categorization of the various multilinear
projections in terms of the input and output of the projection: the traditional vector-
to-vector projection (VVP), the tensor-to-tensor projection (TTP), and the tensor-to-
vector (TVP) projection.4 Based on the categorization of multilinear projections, we
discuss two general formulations of MLDA: the MLDA based on the tensor-to-tensor
projection (MLDA-TTP) and the MLDA based on the tensor-to-vector projection
(MLDA-TVP). Commonly used separation criteria and initialization methods are then
presented. Furthermore, the relationships between the LDA, MLDA-TTP, and MLDA-
TVP are investigated and a taxonomy of the existing MLDA variants is suggested.
2.3.1 Vector-to-Vector Projection (VVP)
Linear projection is a standard transform used widely in various applications [38, 43].
A linear projection takes a vector x ∈ RI and projects it to y ∈ RP using a projection
matrix U ∈ RI×P :
y = UT
x. (2.4)
In typical pattern recognition applications, P I. Therefore, linear projection is a
vector-to-vector projection (VVP) and it requires the vectorization of an input before
projection. Figure 2.4a illustrates the VVP of a tensor object A. The classical LDA
algorithm employs VVP.
2.3.2 Tensor-to-Tensor Projection (TTP)
Besides the traditional VVP, we can also project a tensor to another tensor (of the
same order), which is named as tensor-to-tensor projection (TTP) in this chapter. An
Nth-order tensor X resides in the tensor (multilinear) space RI1 ⊗ RI2 · · · ⊗ RIN ,
where ⊗ denotes the Kronecker product [43]. Thus the tensor (multilinear) space can
be viewed as the Kronecker product of N vector (linear) spaces RI1 , RI2 , . . . , RIN .
For the projection of a tensor X in a tensor space RI1 ⊗ RI2 · · · ⊗ RIN to another
tensor Y in a lower-dimensional tensor space R^{P_1} ⊗ R^{P_2} · · · ⊗ R^{P_N}, where P_n < I_n
for all n, N projection matrices {U^(n) ∈ R^{I_n×P_n}, n = 1, . . . , N} are used so that

Y = X ×1 U^(1)T ×2 U^(2)T · · · ×N U^(N)T.        (2.5)
Figure 2.4b demonstrates the TTP of a tensor object A to a smaller tensor of size
P1 × P2 × P3. How this multilinear projection is carried out can be understood better
by referring to the illustration on the n-mode multiplication in Figure 2.3b. Many
multilinear algorithms [2, 14, 19] have been developed through solving such a TTP.
4Multilinear projections are closely related to multilinear/tensor decompositions, which are included in
the Appendix for completeness. They share some mathematical similarities but they are from different
perspectives.
Figure 2.4. Illustration of (a) vector-to-vector projection (VVP), (b) tensor-to-tensor projection
(TTP), and (c) tensor-to-vector projection (TVP).
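As a small illustration of a TTP (cf. Figure 2.4b), the sketch below (NumPy; mode_n_product is the same illustrative helper as in Section 2.2.2, and the target dimensions 4 × 3 × 2 are arbitrary) projects an 8 × 6 × 4 tensor with one projection matrix per mode, as in Eq. (2.5):

import numpy as np

def mode_n_product(A, U, n):
    """n-mode product A x_n U (n is zero-based), as in Eq. (2.1)."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(A, n, 0), axes=([1], [0])), 0, n)

# Tensor-to-tensor projection: one projection matrix per mode.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6, 4))
U = [rng.standard_normal((I, P)) for I, P in zip((8, 6, 4), (4, 3, 2))]

Y = X
for n, U_n in enumerate(U):
    Y = mode_n_product(Y, U_n.T, n)   # Y = X x_1 U(1)T x_2 U(2)T x_3 U(3)T
print(Y.shape)                         # (4, 3, 2)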
2.3.3 Tensor-to-Vector Projection (TVP)
In our recent work [30], we introduced a multilinear projection from a tensor space
to a vector space, called the tensor-to-vector projection (TVP). The projection from
a tensor to a scalar is considered first. A tensor X ∈ RI1×I2×···×IN is projected to a
point y as
y = X ×1 u^(1)T ×2 u^(2)T · · · ×N u^(N)T,        (2.6)

which can also be written as the following inner product:

y = ⟨X, u^(1) ◦ u^(2) ◦ · · · ◦ u^(N)⟩.        (2.7)

Let U = u^(1) ◦ u^(2) ◦ · · · ◦ u^(N); then we have y = ⟨X, U⟩. Such a multilinear projec-
tion {u^(1)T, u^(2)T, . . . , u^(N)T}, named an elementary multilinear projection (EMP),
is the projection of a tensor on a single multilinear projection direction, and it consists
of one projection vector in each mode.
The projection of a tensor object X to y ∈ RP in a P-dimensional vector space
consists of P EMPs

{u_p^(1)T, u_p^(2)T, . . . , u_p^(N)T},   p = 1, . . . , P,        (2.8)

which can be written compactly as {u_p^(n)T, n = 1, . . . , N}_{p=1}^P. Thus, this TVP is
written as

y = X ×_{n=1}^N {u_p^(n)T, n = 1, . . . , N}_{p=1}^P,        (2.9)

where the pth component of y is obtained from the pth EMP as

y(p) = X ×1 u_p^(1)T ×2 u_p^(2)T · · · ×N u_p^(N)T.        (2.10)
Figure 2.4c shows the TVP of a tensor object A to a vector of size P × 1. A number of
recent multilinear algorithms [27, 28, 30, 35]5 have been proposed with the objective
of solving such a TVP.
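The sketch below (NumPy; emp, tvp, and mode_n_product are our own helper names, and the random projection vectors are arbitrary) implements an elementary multilinear projection as in Eqs. (2.6) and (2.7) and a TVP with P EMPs as in Eqs. (2.9) and (2.10):

import numpy as np

def mode_n_product(A, U, n):
    """n-mode product A x_n U (n is zero-based), as in Eq. (2.1)."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(A, n, 0), axes=([1], [0])), 0, n)

def emp(X, vectors):
    """Elementary multilinear projection, Eqs. (2.6)-(2.7): one vector per mode, scalar output."""
    Y = X
    for n, u in enumerate(vectors):
        Y = mode_n_product(Y, u.reshape(1, -1), n)
    return float(Y.squeeze())

def tvp(X, emps):
    """Tensor-to-vector projection, Eqs. (2.9)-(2.10): P EMPs give a P-dimensional vector."""
    return np.array([emp(X, vectors) for vectors in emps])

# Toy usage: project an 8 x 6 x 4 tensor to a vector of length P = 5.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6, 4))
emps = [[rng.standard_normal(I) for I in (8, 6, 4)] for _ in range(5)]
y = tvp(X, emps)       # y[p] = X x_1 u_p^(1)T x_2 u_p^(2)T x_3 u_p^(3)T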
2.3.4 MLDA-TTP
The multilinear extension of the LDA using the TTP is named MLDA-TTP hereafter.
To formulate MLDA-TTP, the following definitions are introduced first.
Definition 1. Let {Am, m = 1, . . . , M} be a set of M tensor samples in RI1 ⊗
RI2 · · · ⊗ RIN . The between-class scatter of these tensors is defined as
B_A = Σ_{c=1}^C N_c ||Ā_c − Ā||_F^2,        (2.11)

and the within-class scatter of these tensors is defined as

W_A = Σ_{m=1}^M ||A_m − Ā_{c_m}||_F^2,        (2.12)

where C is the number of classes, N_c is the number of samples for class c, c_m is the
class label for the mth sample A_m, the mean tensor is Ā = (1/M) Σ_m A_m, and the class
mean tensor is Ā_c = (1/N_c) Σ_{m, c_m=c} A_m.
Next, the n-mode scatter matrices are defined accordingly.
Definition 2. The n-mode between-class scatter matrix of these samples is defined
as
S_{B_A}^{(n)} = Σ_{c=1}^C N_c · (Ā_{c(n)} − Ā_{(n)}) (Ā_{c(n)} − Ā_{(n)})^T,        (2.13)
5TVP is referred to as the rank-one projections in some works [27, 28, 35].
and the n-mode within-class scatter matrix of these samples is defined as
S_{W_A}^{(n)} = Σ_{m=1}^M (A_{m(n)} − Ā_{c_m(n)}) (A_{m(n)} − Ā_{c_m(n)})^T,        (2.14)
where Āc(n) is the n-mode unfolded matrix of Āc.
From the definitions above, the following properties are derived:
Property 1. Since trace(AA^T) = ||A||_F^2 and ||A||_F^2 = ||A_{(n)}||_F^2, we have
trace(S_{B_A}^{(n)}) = Σ_{c=1}^C N_c ||Ā_{c(n)} − Ā_{(n)}||_F^2 = B_A and
trace(S_{W_A}^{(n)}) = Σ_{m=1}^M ||A_{m(n)} − Ā_{c_m(n)}||_F^2 = W_A for all n.
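The following sketch (NumPy; n_mode_scatters is our own helper name and the random samples are arbitrary) computes the n-mode scatter matrices of Eqs. (2.13) and (2.14) and uses Property 1 as a sanity check, since the trace of the n-mode between-class scatter matrix must equal B_A for every mode:

import numpy as np

def unfold(A, n):
    """n-mode unfolding (n zero-based); rows indexed by mode n."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def n_mode_scatters(samples, labels, n):
    """n-mode between-class and within-class scatter matrices, Eqs. (2.13)-(2.14).
    `samples` is a list of equally sized tensors, `labels` their class labels."""
    A = np.stack(samples)                               # M x I1 x ... x IN
    labels = np.asarray(labels)
    mean = A.mean(axis=0)
    classes = np.unique(labels)
    class_means = {c: A[labels == c].mean(axis=0) for c in classes}

    I_n = A.shape[n + 1]
    S_B = np.zeros((I_n, I_n))
    S_W = np.zeros((I_n, I_n))
    for c in classes:
        N_c = (labels == c).sum()
        D = unfold(class_means[c] - mean, n)
        S_B += N_c * D @ D.T
    for A_m, c in zip(samples, labels):
        D = unfold(A_m - class_means[c], n)
        S_W += D @ D.T
    return S_B, S_W

# Property 1 check: trace(S_B^{(n)}) equals the scatter B_A for every mode n.
rng = np.random.default_rng(0)
samples = [rng.standard_normal((8, 6, 4)) for _ in range(10)]
labels = np.array([0] * 5 + [1] * 5)
S_B0, _ = n_mode_scatters(samples, labels, 0)
S_B1, _ = n_mode_scatters(samples, labels, 1)
assert np.isclose(np.trace(S_B0), np.trace(S_B1))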
The formal definition of the problem to be solved in MLDA-TTP is then described
below:
A set of M training tensor objects {X1, X2, . . . , XM} is available. Each tensor
object Xm ∈ RI1×I2×···×IN assumes values in the tensor space RI1 ⊗ RI2 · · · ⊗ RIN ,
where I_n is the n-mode dimension of the tensor. The objective of MLDA-TTP is to find
a multilinear mapping {U^(n) ∈ R^{I_n×P_n}, n = 1, . . . , N} from the original tensor space
R^{I_1} ⊗ R^{I_2} · · · ⊗ R^{I_N} into a tensor subspace R^{P_1} ⊗ R^{P_2} · · · ⊗ R^{P_N} (with P_n < I_n, for
n = 1, . . . , N):

Y_m = X_m ×1 U^(1)T ×2 U^(2)T · · · ×N U^(N)T,   m = 1, . . . , M,        (2.15)
based on the optimization of a certain separation criterion, such that an enhanced
separability between different classes is achieved.
The MLDA-TTP objective is to determine the N projection matrices {U(n) ∈
RIn×Pn , n = 1, . . . , N} that maximize some class separation criterion, which is often
in terms of B_Y and W_Y. By making use of Property 1, the problem can be converted
to N subproblems in terms of S_{B_Y}^{(n)} and S_{W_Y}^{(n)}, which employs the commonly used alter-
nating projection principle [1, 2, 14]. The pseudo-code implementation of a general
MLDA-TTP algorithm is shown in Figure 2.5. In each iteration k, for mode n, the input
tensor samples are projected using the current projection matrices in all modes except
n to obtain a set of Nth-order tensor samples, whose n-mode unfolding matrices are
used to obtain S_{B_Y}^{(n)} and S_{W_Y}^{(n)}.
2.3.5 MLDA-TVP
The multilinear extension of the LDA using the TVP is named MLDA-TVP and the
formal definition of the problem to be solved in MLDA-TVP is described below:
A set of M training tensor objects {X1, X2, . . . , XM} is available. Each tensor
object Xm ∈ RI1×I2×···×IN assumes values in the tensor space RI1 ⊗ RI2 · · · ⊗ RIN ,
where In is the n-mode dimension of the tensor. The objective of MLDA-TVP is to
find a set of P EMPs {u_p^(n) ∈ R^{I_n×1}, n = 1, . . . , N}_{p=1}^P mapping from the original
Input: A set of tensor samples {X_m ∈ R^{I_1×I_2×···×I_N}, m = 1, . . . , M} with class labels c ∈ R^M, and P_n for n = 1, . . . , N.
Output: Low-dimensional representations {Y_m ∈ R^{P_1×P_2×···×P_N}, m = 1, . . . , M} of the input tensor samples maximizing a separation criterion.
Algorithm:
Step 1: Initialize U_0^(n) for n = 1, . . . , N.
Step 2 (Local optimization):
  For k = 1 : K
    – For n = 1 : N
      ∗ Calculate {Y_m = X_m ×1 U_k^(1)T · · · ×_{n−1} U_k^(n−1)T ×_{n+1} U_{k−1}^(n+1)T · · · ×N U_{k−1}^(N)T, m = 1, . . . , M}.
      ∗ Calculate S_{B_Y}^{(n)} and S_{W_Y}^{(n)}.
      ∗ Set the matrix U_k^(n) to optimize a separation criterion.
    – If k > 2 and U_k^(n) converges for all n, set U^(n) = U_k^(n) and break.
Step 3 (Projection): The feature tensor after projection is obtained as {Y_m = X_m ×1 U^(1)T ×2 U^(2)T · · · ×N U^(N)T, m = 1, . . . , M}.
Figure 2.5. The pseudo-code implementation of a general MLDA-TTP.
tensor space R^{I_1} ⊗ R^{I_2} · · · ⊗ R^{I_N} into a vector subspace R^P (with P < Π_{n=1}^N I_n):

y_m = X_m ×_{n=1}^N {u_p^(n)T, n = 1, . . . , N}_{p=1}^P,   m = 1, . . . , M,        (2.16)
based on the optimization of a certain separation criterion, such that an enhanced
separability between different classes is achieved.
The MLDA-TVP objective is to determine the P projection bases in each mode
{u_p^(n) ∈ R^{I_n×1}, n = 1, . . . , N, p = 1, . . . , P} that maximize a class separation cri-
terion. In MLDA-TVP, since the projected space is a vector space, the definition of
the scatter matrices in classical LDA can be followed. For the samples projected by the
pth EMP {y_{m_p}, m = 1, . . . , M}, where y_{m_p} is the projection of the mth sample by
the pth EMP, the between-class scatter matrix and the within-class scatter matrix are
defined as

S_{B_p}^y = Σ_{c=1}^C N_c (ȳ_{c_p} − ȳ_p)^2        (2.17)

and

S_{W_p}^y = Σ_{m=1}^M (y_{m_p} − ȳ_{c_m p})^2,        (2.18)

respectively, where ȳ_p = (1/M) Σ_m y_{m_p} and ȳ_{c_p} = (1/N_c) Σ_{m, c_m=c} y_{m_p}. Figure 2.6 is the
pseudo-code implementation of a general MLDA-TVP algorithm. To solve the prob-
lem, the alternating projection principle is again employed. In each iteration k, for
Input: A set of tensor samples {X_m ∈ R^{I_1×I_2×···×I_N}, m = 1, . . . , M} with class labels c ∈ R^M, and the projected feature dimension P.
Output: Low-dimensional representations {y_m ∈ R^P, m = 1, . . . , M} of the input tensor samples maximizing a separation criterion.
Algorithm:
Step 1 (Stepwise optimization):
  For p = 1 : P
    – For n = 1, . . . , N, initialize u_p^(n) ∈ R^{I_n}.
    – For k = 1 : K
      – For n = 1 : N
        ∗ Calculate {y_m = X_{m_p} ×1 u_{p_k}^(1)T · · · ×_{n−1} u_{p_k}^(n−1)T ×_{n+1} u_{p_{k−1}}^(n+1)T · · · ×N u_{p_{k−1}}^(N)T, m = 1, . . . , M}.
        ∗ Calculate the between-class and the within-class scatter matrices by treating {y_m} as the input vector samples, as in classical LDA.
        ∗ Compute the vector u_{p_k}^(n) that optimizes a separation criterion.
      – If k > 2 and u_{p_k}^(n) converges for all n, set u_p^(n) = u_{p_k}^(n) and break.
Step 2 (Projection): The feature vector after projection is obtained as {y_m(p) = X_m ×1 u_p^(1)T · · · ×N u_p^(N)T, p = 1, . . . , P, m = 1, . . . , M}.
Figure 2.6. The pseudo-code implementation of a general MLDA-TVP.
mode n, the input tensor samples are projected using the current projection vectors in
all modes except n to obtain a set of vector samples and the problem is then converted
to a number of classical LDA problems.
2.3.6 Separation Criteria and Initialization Methods
Both MLDA-TTP and MLDA-TVP need to specify a class separation criterion to be
optimized. One commonly used separation criterion is the ratio of the between-class
scatter B_Y or S_{B_p}^y and the within-class scatter W_Y or S_{W_p}^y: B_Y / W_Y for MLDA-TTP
or S_{B_p}^y / S_{W_p}^y for MLDA-TVP [39], hereafter named SRatio.
Another separation criterion is the (weighted) difference between the between-
class scatter B_Y or S_{B_p}^y and the within-class scatter W_Y or S_{W_p}^y: (B_Y − ζ W_Y) for
MLDA-TTP or (S_{B_p}^y − ζ · S_{W_p}^y) for MLDA-TVP [44], hereafter named SDiff, where ζ
is a parameter tuning the weight between the between-class and within-class scatters.
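For scalar features produced by an EMP, both criteria are one-line computations; the sketch below (NumPy; the helper name and the default ζ are our own choices) evaluates the scatters of Eqs. (2.17) and (2.18) and returns SRatio and SDiff:

import numpy as np

def scatter_ratio_and_diff(y_proj, labels, zeta=1.0):
    """Scalar-feature scatters of Eqs. (2.17)-(2.18) and the SRatio / SDiff criteria."""
    y_proj = np.asarray(y_proj, dtype=float)
    labels = np.asarray(labels)
    y_bar = y_proj.mean()
    # Between-class scatter: sum over classes of N_c (class mean - global mean)^2.
    S_B = sum((labels == c).sum() * (y_proj[labels == c].mean() - y_bar) ** 2
              for c in np.unique(labels))
    # Within-class scatter: sum over samples of (sample - its class mean)^2.
    S_W = sum((y - y_proj[labels == c].mean()) ** 2
              for y, c in zip(y_proj, labels))
    return S_B / S_W, S_B - zeta * S_W   # SRatio, SDiff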
Since MLDA algorithms rely on the alternating projection principle, they are
generally iterative and there is a need to choose an initialization method. Commonly
used initialization methods for MLDA-TTP are: pseudo-identity matrices (truncated
identity matrices) and random matrices. Commonly used initialization methods for
MLDA-TVP are: all ones and random vectors. There are also initialization methods
56. 1.C. The Project Gutenberg Literary Archive Foundation (“the
Foundation” or PGLAF), owns a compilation copyright in the
collection of Project Gutenberg™ electronic works. Nearly all the
individual works in the collection are in the public domain in the
United States. If an individual work is unprotected by copyright law
in the United States and you are located in the United States, we do
not claim a right to prevent you from copying, distributing,
performing, displaying or creating derivative works based on the
work as long as all references to Project Gutenberg are removed. Of
course, we hope that you will support the Project Gutenberg™
mission of promoting free access to electronic works by freely
sharing Project Gutenberg™ works in compliance with the terms of
this agreement for keeping the Project Gutenberg™ name associated
with the work. You can easily comply with the terms of this
agreement by keeping this work in the same format with its attached
full Project Gutenberg™ License when you share it without charge
with others.
1.D. The copyright laws of the place where you are located also
govern what you can do with this work. Copyright laws in most
countries are in a constant state of change. If you are outside the
United States, check the laws of your country in addition to the
terms of this agreement before downloading, copying, displaying,
performing, distributing or creating derivative works based on this
work or any other Project Gutenberg™ work. The Foundation makes
no representations concerning the copyright status of any work in
any country other than the United States.
1.E. Unless you have removed all references to Project Gutenberg:
1.E.1. The following sentence, with active links to, or other
immediate access to, the full Project Gutenberg™ License must
appear prominently whenever any copy of a Project Gutenberg™
work (any work on which the phrase “Project Gutenberg” appears,
or with which the phrase “Project Gutenberg” is associated) is
accessed, displayed, performed, viewed, copied or distributed:
57. This eBook is for the use of anyone anywhere in the United
States and most other parts of the world at no cost and with
almost no restrictions whatsoever. You may copy it, give it away
or re-use it under the terms of the Project Gutenberg License
included with this eBook or online at www.gutenberg.org. If you
are not located in the United States, you will have to check the
laws of the country where you are located before using this
eBook.
1.E.2. If an individual Project Gutenberg™ electronic work is derived
from texts not protected by U.S. copyright law (does not contain a
notice indicating that it is posted with permission of the copyright
holder), the work can be copied and distributed to anyone in the
United States without paying any fees or charges. If you are
redistributing or providing access to a work with the phrase “Project
Gutenberg” associated with or appearing on the work, you must
comply either with the requirements of paragraphs 1.E.1 through
1.E.7 or obtain permission for the use of the work and the Project
Gutenberg™ trademark as set forth in paragraphs 1.E.8 or 1.E.9.
1.E.3. If an individual Project Gutenberg™ electronic work is posted
with the permission of the copyright holder, your use and distribution
must comply with both paragraphs 1.E.1 through 1.E.7 and any
additional terms imposed by the copyright holder. Additional terms
will be linked to the Project Gutenberg™ License for all works posted
with the permission of the copyright holder found at the beginning
of this work.
1.E.4. Do not unlink or detach or remove the full Project
Gutenberg™ License terms from this work, or any files containing a
part of this work or any other work associated with Project
Gutenberg™.
1.E.5. Do not copy, display, perform, distribute or redistribute this
electronic work, or any part of this electronic work, without
prominently displaying the sentence set forth in paragraph 1.E.1
58. with active links or immediate access to the full terms of the Project
Gutenberg™ License.
1.E.6. You may convert to and distribute this work in any binary,
compressed, marked up, nonproprietary or proprietary form,
including any word processing or hypertext form. However, if you
provide access to or distribute copies of a Project Gutenberg™ work
in a format other than “Plain Vanilla ASCII” or other format used in
the official version posted on the official Project Gutenberg™ website
(www.gutenberg.org), you must, at no additional cost, fee or
expense to the user, provide a copy, a means of exporting a copy, or
a means of obtaining a copy upon request, of the work in its original
“Plain Vanilla ASCII” or other form. Any alternate format must
include the full Project Gutenberg™ License as specified in
paragraph 1.E.1.
1.E.7. Do not charge a fee for access to, viewing, displaying,
performing, copying or distributing any Project Gutenberg™ works
unless you comply with paragraph 1.E.8 or 1.E.9.
1.E.8. You may charge a reasonable fee for copies of or providing
access to or distributing Project Gutenberg™ electronic works
provided that:
• You pay a royalty fee of 20% of the gross profits you
derive from the use of Project Gutenberg™ works
calculated using the method you already use to calculate
your applicable taxes. The fee is owed to the owner of the
Project Gutenberg™ trademark, but he has agreed to
donate royalties under this paragraph to the Project
Gutenberg Literary Archive Foundation. Royalty payments
must be paid within 60 days following each date on which
you prepare (or are legally required to prepare) your
periodic tax returns. Royalty payments should be clearly
marked as such and sent to the Project Gutenberg Literary
Archive Foundation at the address specified in Section 4,
59. “Information about donations to the Project Gutenberg
Literary Archive Foundation.”
• You provide a full refund of any money paid by a user who
notifies you in writing (or by e-mail) within 30 days of
receipt that s/he does not agree to the terms of the full
Project Gutenberg™ License. You must require such a user
to return or destroy all copies of the works possessed in a
physical medium and discontinue all use of and all access to
other copies of Project Gutenberg™ works.
• You provide, in accordance with paragraph 1.F.3, a full
refund of any money paid for a work or a replacement
copy, if a defect in the electronic work is discovered and
reported to you within 90 days of receipt of the work.
• You comply with all other terms of this agreement for free
distribution of Project Gutenberg™ works.
1.E.9. If you wish to charge a fee or distribute a Project Gutenberg™
electronic work or group of works on different terms than are set
forth in this agreement, you must obtain permission in writing from
the Project Gutenberg Literary Archive Foundation, the manager of
the Project Gutenberg™ trademark. Contact the Foundation as set
forth in Section 3 below.
1.F.
1.F.1. Project Gutenberg volunteers and employees expend
considerable effort to identify, do copyright research on, transcribe
and proofread works not protected by U.S. copyright law in creating
the Project Gutenberg™ collection. Despite these efforts, Project
Gutenberg™ electronic works, and the medium on which they may
be stored, may contain “Defects,” such as, but not limited to,
incomplete, inaccurate or corrupt data, transcription errors, a
copyright or other intellectual property infringement, a defective or
60. damaged disk or other medium, a computer virus, or computer
codes that damage or cannot be read by your equipment.
1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for
the “Right of Replacement or Refund” described in paragraph 1.F.3,
the Project Gutenberg Literary Archive Foundation, the owner of the
Project Gutenberg™ trademark, and any other party distributing a
Project Gutenberg™ electronic work under this agreement, disclaim
all liability to you for damages, costs and expenses, including legal
fees. YOU AGREE THAT YOU HAVE NO REMEDIES FOR
NEGLIGENCE, STRICT LIABILITY, BREACH OF WARRANTY OR
BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN PARAGRAPH
1.F.3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK
OWNER, AND ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL
NOT BE LIABLE TO YOU FOR ACTUAL, DIRECT, INDIRECT,
CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES EVEN IF
YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you
discover a defect in this electronic work within 90 days of receiving
it, you can receive a refund of the money (if any) you paid for it by
sending a written explanation to the person you received the work
from. If you received the work on a physical medium, you must
return the medium with your written explanation. The person or
entity that provided you with the defective work may elect to provide
a replacement copy in lieu of a refund. If you received the work
electronically, the person or entity providing it to you may choose to
give you a second opportunity to receive the work electronically in
lieu of a refund. If the second copy is also defective, you may
demand a refund in writing without further opportunities to fix the
problem.
1.F.4. Except for the limited right of replacement or refund set forth
in paragraph 1.F.3, this work is provided to you ‘AS-IS’, WITH NO
OTHER WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR ANY PURPOSE.
1.F.5. Some states do not allow disclaimers of certain implied
warranties or the exclusion or limitation of certain types of damages.
If any disclaimer or limitation set forth in this agreement violates the
law of the state applicable to this agreement, the agreement shall be
interpreted to make the maximum disclaimer or limitation permitted
by the applicable state law. The invalidity or unenforceability of any
provision of this agreement shall not void the remaining provisions.
1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,
the trademark owner, any agent or employee of the Foundation,
anyone providing copies of Project Gutenberg™ electronic works in
accordance with this agreement, and any volunteers associated with
the production, promotion and distribution of Project Gutenberg™
electronic works, harmless from all liability, costs and expenses,
including legal fees, that arise directly or indirectly from any of the
following which you do or cause to occur: (a) distribution of this or
any Project Gutenberg™ work, (b) alteration, modification, or
additions or deletions to any Project Gutenberg™ work, and (c) any
Defect you cause.
Section 2. Information about the Mission
of Project Gutenberg™
Project Gutenberg™ is synonymous with the free distribution of
electronic works in formats readable by the widest variety of
computers including obsolete, old, middle-aged and new computers.
It exists because of the efforts of hundreds of volunteers and
donations from people in all walks of life.
Volunteers and financial support to provide volunteers with the
assistance they need are critical to reaching Project Gutenberg™’s
goals and ensuring that the Project Gutenberg™ collection will
remain freely available for generations to come. In 2001, the Project
Gutenberg Literary Archive Foundation was created to provide a
secure and permanent future for Project Gutenberg™ and future
generations. To learn more about the Project Gutenberg Literary
Archive Foundation and how your efforts and donations can help,
see Sections 3 and 4 and the Foundation information page at
www.gutenberg.org.
Section 3. Information about the Project
Gutenberg Literary Archive Foundation
The Project Gutenberg Literary Archive Foundation is a non-profit
501(c)(3) educational corporation organized under the laws of the
state of Mississippi and granted tax exempt status by the Internal
Revenue Service. The Foundation’s EIN or federal tax identification
number is 64-6221541. Contributions to the Project Gutenberg
Literary Archive Foundation are tax deductible to the full extent
permitted by U.S. federal laws and your state’s laws.
The Foundation’s business office is located at 809 North 1500 West,
Salt Lake City, UT 84116, (801) 596-1887. Email contact links and up
to date contact information can be found at the Foundation’s website
and official page at www.gutenberg.org/contact
Section 4. Information about Donations to
the Project Gutenberg Literary Archive
Foundation
Project Gutenberg™ depends upon and cannot survive without
widespread public support and donations to carry out its mission of
increasing the number of public domain and licensed works that can
be freely distributed in machine-readable form accessible by the
widest array of equipment including outdated equipment. Many
small donations ($1 to $5,000) are particularly important to
maintaining tax exempt status with the IRS.
The Foundation is committed to complying with the laws regulating
charities and charitable donations in all 50 states of the United
States. Compliance requirements are not uniform and it takes a
considerable effort, much paperwork and many fees to meet and
keep up with these requirements. We do not solicit donations in
locations where we have not received written confirmation of
compliance. To SEND DONATIONS or determine the status of
compliance for any particular state visit www.gutenberg.org/donate.
While we cannot and do not solicit contributions from states where
we have not met the solicitation requirements, we know of no
prohibition against accepting unsolicited donations from donors in
such states who approach us with offers to donate.
International donations are gratefully accepted, but we cannot make
any statements concerning tax treatment of donations received from
outside the United States. U.S. laws alone swamp our small staff.
Please check the Project Gutenberg web pages for current donation
methods and addresses. Donations are accepted in a number of
other ways including checks, online payments and credit card
donations. To donate, please visit: www.gutenberg.org/donate.
Section 5. General Information About
Project Gutenberg™ electronic works
Professor Michael S. Hart was the originator of the Project
Gutenberg™ concept of a library of electronic works that could be
freely shared with anyone. For forty years, he produced and
distributed Project Gutenberg™ eBooks with only a loose network of
volunteer support.
Project Gutenberg™ eBooks are often created from several printed
editions, all of which are confirmed as not protected by copyright in
the U.S. unless a copyright notice is included. Thus, we do not
necessarily keep eBooks in compliance with any particular paper
edition.
Most people start at our website which has the main PG search
facility: www.gutenberg.org.
This website includes information about Project Gutenberg™,
including how to make donations to the Project Gutenberg Literary
Archive Foundation, how to help produce our new eBooks, and how
to subscribe to our email newsletter to hear about new eBooks.
Welcome to our website – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.
More than just a book-buying platform, we strive to be a bridge
connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. In addition,
our special promotions and home delivery services save you time so you
can fully enjoy the pleasure of reading.
Join us on a journey of knowledge exploration, passion nurturing, and
personal growth every day!
ebookbell.com