Introduction to Multivariate Analysis: Linear and Nonlinear Modeling
Sadanori Konishi
Introduction to Multivariate Analysis: Linear and Nonlinear Modeling
shows how multivariate analysis is widely used for extracting useful
information and patterns from multivariate data and for understanding the
structure of random phenomena. Along with the basic concepts of various
procedures in traditional multivariate analysis, the book covers nonlinear
techniques for clarifying phenomena behind observed multivariate data. It
primarily focuses on regression modeling, classification and discrimination,
dimension reduction, and clustering.
The text thoroughly explains the concepts and derivations of the AIC, BIC,
and related criteria and includes a wide range of practical examples of
model selection and evaluation criteria. To estimate and evaluate models
with a large number of predictor variables, the author presents regularization
methods, including the L1 norm regularization that gives simultaneous
model estimation and variable selection.
Features
• Explains how to use linear and nonlinear multivariate techniques to
extract information from data and understand random phenomena
• Includes a self-contained introduction to theoretical results
• Presents many examples and figures that facilitate a deep
understanding of multivariate analysis techniques
• Covers regression, discriminant analysis, Bayesian classification,
support vector machines, principal component analysis, and
clustering
• Incorporates real data sets from engineering, pattern recognition,
medicine, and more
For advanced undergraduate and graduate students in statistical science,
this text provides a systematic description of both traditional and newer
techniques in multivariate analysis and machine learning. It also introduces
linear and nonlinear statistical modeling for researchers and practitioners in
industrial and systems engineering, information science, life science, and
other areas.
Texts in Statistical Science
Sadanori Konishi
Introduction to Multivariate Analysis
Linear and Nonlinear Modeling
CHAPMAN & HALL/CRC
Texts in Statistical Science Series
Series Editors
Francesca Dominici, Harvard School of Public Health, USA
Julian J. Faraway, University of Bath, UK
Martin Tanner, Northwestern University, USA
Jim Zidek, University of British Columbia, Canada
Statistical Theory: A Concise Introduction
F. Abramovich and Y. Ritov
Practical Multivariate Analysis, Fifth Edition
A. Afifi, S. May, and V.A. Clark
Practical Statistics for Medical Research
D.G. Altman
Interpreting Data: A First Course
in Statistics
A.J.B. Anderson
Introduction to Probability with R
K. Baclawski
Linear Algebra and Matrix Analysis for
Statistics
S. Banerjee and A. Roy
Statistical Methods for SPC and TQM
D. Bissell
Bayesian Methods for Data Analysis,
Third Edition
B.P. Carlin and T.A. Louis
Statistics in Research and Development,
Second Edition
R. Caulcutt
The Analysis of Time Series: An Introduction,
Sixth Edition
C. Chatfield
Introduction to Multivariate Analysis
C. Chatfield and A.J. Collins
Problem Solving: A Statistician’s Guide,
Second Edition
C. Chatfield
Statistics for Technology: A Course in Applied
Statistics,Third Edition
C. Chatfield
Bayesian Ideas and Data Analysis: An
Introduction for Scientists and Statisticians
R. Christensen, W. Johnson, A. Branscum,
and T.E. Hanson
Modelling Binary Data, Second Edition
D. Collett
Modelling Survival Data in Medical Research,
Second Edition
D. Collett
Introduction to Statistical Methods for
Clinical Trials
T.D. Cook and D.L. DeMets
Applied Statistics: Principles and Examples
D.R. Cox and E.J. Snell
Multivariate Survival Analysis and Competing
Risks
M. Crowder
Statistical Analysis of Reliability Data
M.J. Crowder, A.C. Kimber,
T.J. Sweeting, and R.L. Smith
An Introduction to Generalized
Linear Models,Third Edition
A.J. Dobson and A.G. Barnett
Nonlinear Time Series:Theory, Methods, and
Applications with R Examples
R. Douc, E. Moulines, and D.S. Stoffer
Introduction to Optimization Methods and
Their Applications in Statistics
B.S. Everitt
Extending the Linear Model with R:
Generalized Linear, Mixed Effects and
Nonparametric Regression Models
J.J. Faraway
A Course in Large Sample Theory
T.S. Ferguson
Multivariate Statistics: A Practical Approach
B. Flury and H. Riedwyl
Readings in Decision Analysis
S. French
Markov Chain Monte Carlo:
Stochastic Simulation for Bayesian Inference,
Second Edition
D. Gamerman and H.F. Lopes
Bayesian Data Analysis, Third Edition
A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson,
A. Vehtari, and D.B. Rubin
Multivariate Analysis of Variance and
Repeated Measures: A Practical Approach for
Behavioural Scientists
D.J. Hand and C.C.Taylor
Practical Data Analysis for Designed
Experiments
B.S. Yandell
Practical Longitudinal Data Analysis
D.J. Hand and M. Crowder
Logistic Regression Models
J.M. Hilbe
Richly Parameterized Linear Models:
Additive,Time Series, and Spatial Models
Using Random Effects
J.S. Hodges
Statistics for Epidemiology
N.P. Jewell
Stochastic Processes: An Introduction,
Second Edition
P.W. Jones and P. Smith
The Theory of Linear Models
B. Jørgensen
Principles of Uncertainty
J.B. Kadane
Graphics for Statistics and Data Analysis with R
K.J. Keen
Mathematical Statistics
K. Knight
Introduction to Multivariate Analysis:
Linear and Nonlinear Modeling
S. Konishi
Nonparametric Methods in Statistics with SAS
Applications
O. Korosteleva
Modeling and Analysis of Stochastic Systems,
Second Edition
V.G. Kulkarni
Exercises and Solutions in Biostatistical Theory
L.L. Kupper, B.H. Neelon, and S.M. O’Brien
Exercises and Solutions in Statistical Theory
L.L. Kupper, B.H. Neelon, and S.M. O’Brien
Design and Analysis of Experiments with SAS
J. Lawson
A Course in Categorical Data Analysis
T. Leonard
Statistics for Accountants
S. Letchford
Introduction to the Theory of Statistical
Inference
H. Liero and S. Zwanzig
Statistical Theory, Fourth Edition
B.W. Lindgren
Stationary Stochastic Processes:Theory and
Applications
G. Lindgren
The BUGS Book: A Practical Introduction to
Bayesian Analysis
D. Lunn, C. Jackson, N. Best, A.Thomas, and
D. Spiegelhalter
Introduction to General and Generalized
Linear Models
H. Madsen and P.Thyregod
Time Series Analysis
H. Madsen
Pólya Urn Models
H. Mahmoud
Randomization, Bootstrap and Monte Carlo
Methods in Biology,Third Edition
B.F.J. Manly
Introduction to Randomized Controlled
Clinical Trials, Second Edition
J.N.S. Matthews
Statistical Methods in Agriculture and
Experimental Biology, Second Edition
R. Mead, R.N. Curnow, and A.M. Hasted
Statistics in Engineering: A Practical Approach
A.V. Metcalfe
Beyond ANOVA: Basics of Applied Statistics
R.G. Miller, Jr.
A Primer on Linear Models
J.F. Monahan
Applied Stochastic Modelling, Second Edition
B.J.T. Morgan
Elements of Simulation
B.J.T. Morgan
Probability: Methods and Measurement
A. O’Hagan
Introduction to Statistical Limit Theory
A.M. Polansky
Applied Bayesian Forecasting and Time Series
Analysis
A. Pole, M. West, and J. Harrison
Time Series: Modeling, Computation, and
Inference
R. Prado and M. West
Introduction to Statistical Process Control
P. Qiu
Sampling Methodologies with Applications
P.S.R.S. Rao
A First Course in Linear Model Theory
N. Ravishanker and D.K. Dey
Essential Statistics, Fourth Edition
D.A.G. Rees
Stochastic Modeling and Mathematical
Statistics: A Text for Statisticians and
Quantitative Scientists
F.J. Samaniego
Statistical Methods for Spatial Data Analysis
O. Schabenberger and C.A. Gotway
Large Sample Methods in Statistics
P.K. Sen and J. da Motta Singer
Decision Analysis: A Bayesian Approach
J.Q. Smith
Analysis of Failure and Survival Data
P. J. Smith
Applied Statistics: Handbook of GENSTAT
Analyses
E.J. Snell and H. Simpson
Applied Nonparametric Statistical Methods,
Fourth Edition
P. Sprent and N.C. Smeeton
Data Driven Statistical Methods
P. Sprent
Generalized Linear Mixed Models:
Modern Concepts, Methods and Applications
W. W. Stroup
Survival Analysis Using S: Analysis of
Time-to-Event Data
M.Tableman and J.S. Kim
Applied Categorical and Count Data Analysis
W.Tang, H. He, and X.M.Tu
Elementary Applications of Probability Theory,
Second Edition
H.C.Tuckwell
Introduction to Statistical Inference and Its
Applications with R
M.W.Trosset
Understanding Advanced Statistical Methods
P.H. Westfall and K.S.S. Henning
Statistical Process Control:Theory and
Practice,Third Edition
G.B. Wetherill and D.W. Brown
Generalized Additive Models:
An Introduction with R
S. Wood
Epidemiology: Study Design and
Data Analysis,Third Edition
M. Woodward
Texts in Statistical Science
Sadanori Konishi
Chuo University
Tokyo, Japan
Introduction to Multivariate Analysis
Linear and Nonlinear Modeling
TAHENRYO KEISEKI NYUMON: SENKEI KARA HISENKEI E by Sadanori Konishi © 2010 by Sadanori Konishi
Originally published in Japanese by Iwanami Shoten, Publishers, Tokyo, 2010. This English language edition pub-
lished in 2014 by Chapman & Hall/CRC, Boca Raton, FL, U.S.A., by arrangement with the author c/o Iwanami Sho-
ten, Publishers, Tokyo.
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2014 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20140508
International Standard Book Number-13: 978-1-4665-6729-0 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted,
or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, includ-
ing photocopying, microfilming, and recording, or in any information storage or retrieval system, without written
permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety
of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
List of Figures xiii
List of Tables xxi
Preface xxiii
1 Introduction 1
1.1 Regression Modeling 1
1.1.1 Regression Models 2
1.1.2 Risk Models 4
1.1.3 Model Evaluation and Selection 5
1.2 Classification and Discrimination 7
1.2.1 Discriminant Analysis 7
1.2.2 Bayesian Classification 8
1.2.3 Support Vector Machines 9
1.3 Dimension Reduction 11
1.4 Clustering 11
1.4.1 Hierarchical Clustering Methods 12
1.4.2 Nonhierarchical Clustering Methods 12
2 Linear Regression Models 15
2.1 Relationship between Two Variables 15
2.1.1 Data and Modeling 16
2.1.2 Model Estimation by Least Squares 18
2.1.3 Model Estimation by Maximum Likelihood 19
2.2 Relationships Involving Multiple Variables 22
2.2.1 Data and Models 23
2.2.2 Model Estimation 24
2.2.3 Notes 29
2.2.4 Model Selection 31
2.2.5 Geometric Interpretation 34
2.3 Regularization 36
2.3.1 Ridge Regression 37
2.3.2 Lasso 40
2.3.3 L1 Norm Regularization 44
3 Nonlinear Regression Models 55
3.1 Modeling Phenomena 55
3.1.1 Real Data Examples 57
3.2 Modeling by Basis Functions 58
3.2.1 Splines 59
3.2.2 B-splines 63
3.2.3 Radial Basis Functions 65
3.3 Basis Expansions 67
3.3.1 Basis Function Expansions 68
3.3.2 Model Estimation 68
3.3.3 Model Evaluation and Selection 72
3.4 Regularization 76
3.4.1 Regularized Least Squares 77
3.4.2 Regularized Maximum Likelihood Method 79
3.4.3 Model Evaluation and Selection 81
4 Logistic Regression Models 87
4.1 Risk Prediction Models 87
4.1.1 Modeling for Proportional Data 87
4.1.2 Binary Response Data 91
4.2 Multiple Risk Factor Models 94
4.2.1 Model Estimation 95
4.2.2 Model Evaluation and Selection 98
4.3 Nonlinear Logistic Regression Models 98
4.3.1 Model Estimation 100
4.3.2 Model Evaluation and Selection 101
5 Model Evaluation and Selection 105
5.1 Criteria Based on Prediction Errors 105
5.1.1 Prediction Errors 106
5.1.2 Cross-Validation 108
5.1.3 Mallows’ Cp 110
5.2 Information Criteria 112
5.2.1 Kullback-Leibler Information 113
5.2.2 Information Criterion AIC 115
5.2.3 Derivation of Information Criteria 121
5.2.4 Multimodel Inference 127
5.3 Bayesian Model Evaluation Criterion 128
5.3.1 Posterior Probability and BIC 128
5.3.2 Derivation of the BIC 130
5.3.3 Bayesian Inference and Model Averaging 132
6 Discriminant Analysis 137
6.1 Fisher’s Linear Discriminant Analysis 137
6.1.1 Basic Concept 137
6.1.2 Linear Discriminant Function 141
6.1.3 Summary of Fisher’s Linear Discriminant
Analysis 144
6.1.4 Prior Probability and Loss 146
6.2 Classification Based on Mahalanobis Distance 148
6.2.1 Two-Class Classification 148
6.2.2 Multiclass Classification 149
6.2.3 Example: Diagnosis of Diabetes 151
6.3 Variable Selection 154
6.3.1 Prediction Errors 154
6.3.2 Bootstrap Estimates of Prediction Errors 156
6.3.3 The .632 Estimator 158
6.3.4 Example: Calcium Oxalate Crystals 160
6.3.5 Stepwise Procedures 162
6.4 Canonical Discriminant Analysis 164
6.4.1 Dimension Reduction by Canonical
Discriminant Analysis 164
7 Bayesian Classification 173
7.1 Bayes’ Theorem 173
7.2 Classification with Gaussian Distributions 175
7.2.1 Probability Distributions and Likelihood 175
7.2.2 Discriminant Functions 176
7.3 Logistic Regression for Classification 179
7.3.1 Linear Logistic Regression Classifier 179
7.3.2 Nonlinear Logistic Regression Classifier 183
7.3.3 Multiclass Nonlinear Logistic Regression
Classifier 187
8 Support Vector Machines 193
8.1 Separating Hyperplane 193
8.1.1 Linear Separability 193
8.1.2 Margin Maximization 196
8.1.3 Quadratic Programming and Dual Problem 198
8.2 Linearly Nonseparable Case 203
8.2.1 Soft Margins 204
8.2.2 From Primal Problem to Dual Problem 208
8.3 From Linear to Nonlinear 212
8.3.1 Mapping to Higher-Dimensional Feature Space 213
8.3.2 Kernel Methods 216
8.3.3 Nonlinear Classification 218
9 Principal Component Analysis 225
9.1 Principal Components 225
9.1.1 Basic Concept 225
9.1.2 Process of Deriving Principal Components and
Properties 230
9.1.3 Dimension Reduction and Information Loss 234
9.1.4 Examples 235
9.2 Image Compression and Decompression 239
9.3 Singular Value Decomposition 243
9.4 Kernel Principal Component Analysis 246
9.4.1 Data Centering and Eigenvalue Problem 246
9.4.2 Mapping to a Higher-Dimensional Space 249
9.4.3 Kernel Methods 252
10 Clustering 259
10.1 Hierarchical Clustering 259
10.1.1 Interobject Similarity 260
10.1.2 Intercluster Distance 261
10.1.3 Cluster Formation Process 263
10.1.4 Ward’s Method 267
10.2 Nonhierarchical Clustering 270
10.2.1 K-Means Clustering 271
10.2.2 Self-Organizing Map Clustering 273
10.3 Mixture Models for Clustering 275
10.3.1 Mixture Models 275
10.3.2 Model Estimation by EM Algorithm 277
A Bootstrap Methods 283
A.1 Bootstrap Error Estimation 283
A.2 Regression Models 285
A.3 Bootstrap Model Selection Probability 285
B Lagrange Multipliers 287
B.1 Equality-Constrained Optimization Problem 287
B.2 Inequality-Constrained Optimization Problem 288
B.3 Equality/Inequality-Constrained Optimization 289
C EM Algorithm 293
C.1 General EM Algorithm 293
C.2 EM Algorithm for Mixture Model 294
Bibliography 299
Index 309
List of Figures
1.1 The relation between falling time (x sec) and falling
distance (y m) of a body. 3
1.2 The measured impact y (in acceleration, g) on the head
of a dummy in repeated experimental crashes of a
motorcycle with a time lapse of x (msec). 4
1.3 Binary data {0, 1} expressing the presence or absence of
response in an individual on exposure to various levels
of stimulus. 5
1.4 Regression modeling; the specification of models that
approximates the structure of a phenomenon, the estima-
tion of their parameters, and the evaluation and selection
of estimated models. 6
1.5 The training data of the two classes are completely
separable by a hyperplane (left) and the overlapping data
of the two classes may not be separable by a hyperplane
(right). 10
1.6 Mapping the observed data to a high-dimensional feature
space and obtaining a hyperplane that separates the two
classes. 10
1.7 72 chemical substances with 6 attached features, clas-
sified by clustering on the basis of mutual similarity in
substance qualities. 13
2.1 Data obtained by measuring the length of a spring (y
cm) under different weights (x g). 17
2.2 The relationship between the spring length (y) and the
weight (x). 18
2.3 Linear regression and the predicted values and residuals. 20
2.4 (a) Histogram of 80 measured values obtained while re-
peatedly suspending a load of 25 g and its approximated
probability model. (b) The errors (i.e., noise) contained
in these measurements in the form of a histogram having
its origin at the mean value of the measurements and its
approximated error distribution. 21
2.5 Geometrical interpretation of the linear regression model
y = Xβ+ε. M(X) denotes the (p+1)-dimensional linear
subspace spanned by the (p + 1) n-dimensional column
vectors of the design matrix X. 35
2.6 Ridge estimate (left panel) and lasso estimate (right
panel): Ridge estimation shrinks the regression coeffi-
cients β1, β2 toward but not exactly to 0 relative to the
corresponding least squares estimates β̂, whereas lasso
estimates the regression coefficient β1 at exactly 0. 41
2.7 The profiles of estimated regression coefficients for
different values of the L1 norm Σ_{i=1}^{13} |βi(λ)|, with λ
varying from 6.78 to 0. The axis above indicates the
number of nonzero coefficients. 45
2.8 The function pλ(|βj|) (solid line) and its quadratic
approximation (dotted line) with the values of βj along
the x axis, together with the quadratic approximation for
a βj0 value of 0.15. 48
2.9 The relationship between the least squares estimator
(dotted line) and three shrinkage estimators (solid lines):
(a) hard thresholding, (b) lasso, and (c) SCAD. 50
3.1 Left panel: The plot of 104 tree data obtained by
measurement of tree trunk girth (inch) and tree weight
above ground (kg). Right panel: Fitting a polynomial
of degree 2 (solid curve) and a growth curve model
(dashed curve). 57
3.2 Motorcycle crash trial data (n = 133). 59
3.3 Fitting third-degree polynomials to the data in the
subintervals [a, t1], [t1, t2], · · ·, [tm, b] and smoothly
connecting adjacent polynomials at each knot. 60
3.4 Functions (x − ti)+ = max{0, x − ti} and (x − ti)+^3 included
in the cubic spline given by (3.10). 61
3.5 Basis functions: (a) {1, x}; linear regression, (b) polynomial
regression; {1, x, x^2, x^3}, (c) cubic splines, (d) natural
cubic splines. 62
3.6 A cubic B-spline basis function connected four different
third-order polynomials smoothly at the knots 2, 3, and
4. 63
3.7 Plots of the first-, second-, and third-order B-spline
functions. As may be seen in the subintervals bounded
by dotted lines, each subinterval is covered (piecewise)
by the polynomial order plus one basis function. 65
3.8 A third-order B-spline regression model is fitted to a set
of data, generated from u(x) = exp{−x sin(2πx)}+0.5+ε
with Gaussian noise. The fitted curve and the true
structure are, respectively, represented by the solid line
and the dotted line with cubic B-spline bases. 66
3.9 Curve fitting; a nonlinear regression model based on a
natural cubic spline basis function and a Gaussian basis
function. 70
3.10 Cubic B-spline nonlinear regression models, each with a
different number of basis functions (a) 10, (b) 20, (c) 30,
(d) 40, fitted to the motorcycle crash experiment data. 73
3.11 The cubic B-spline nonlinear regression model
y = Σ_{j=1}^{13} ŵj bj(x). The model is estimated by maximum
likelihood, with the number of basis functions selected by
the AIC. 75
3.12 The role of the penalty term: Changing the weight
in the second term by the regularization parameter γ
changes S γ(w) continuously, thus enabling continuous
adjustment of the model complexity. 78
3.13 The effect of a smoothing parameter λ: The curves
are estimated by the regularized maximum likelihood
method for various values of λ. 82
4.1 Plot of the graduated stimulus levels shown in Table 4.1
along the x axis and the response rate along the y axis. 89
4.2 Logistic functions. 90
4.3 Fitting the logistic regression model to the observed data
shown in Table 4.1 for the relation between the stimulus
level x and the response rate y. 90
4.4 The data on presence and non-presence of the crystals
are plotted along the vertical axis as y = 0 for the 44
individuals exhibiting their non-presence and y = 1 for
the 33 exhibiting their presence. The x axis takes the
values of their urine specific gravity. 92
4.5 The fitted logistic regression model for the 77 set of data
expressing observed urine specific gravity and presence
or non-presence of calcium oxalate crystals. 93
4.6 Plot of post-operative kyphosis occurrence along Y = 1
and non-occurrence along Y = 0 versus the age (x; in
months) of 83 patients. 99
4.7 Fitting the polynomial-based nonlinear logistic regres-
sion model to the kyphosis data. 103
5.1 Fitting of 3rd-, 8th-, and 12th-order polynomial models
to 15 data points. 107
5.2 Fitting a linear model (dashed line), a 2nd-order poly-
nomial model (solid line), and an 8th-order polynomial
model (dotted line) to 20 data. 119
6.1 Projecting the two-dimensional data in Table 6.1 onto
the axes y = x1, y = x2 and y = w1x1 + w2x2. 139
6.2 Three projection axes (a), (b), and (c) and the distribu-
tions of the class G1 and class G2 data when projected
on each one. 140
6.3 Fisher’s linear discriminant function. 143
6.4 Mahalanobis distance and Euclidean distance. 151
6.5 Plot of 145 training data for a normal class G1 (◦), a
chemical diabetes class G2 (), and clinical diabetes
class G3 (×). 152
6.6 Linear decision boundaries that separate the normal
class G1, the chemical diabetes class G2, and the clinical
diabetes class G3. 154
6.7 Plot of the values obtained by projecting the 145
observed data from three classes onto the first two
discriminant variables (y1, y2) in (6.92). 170
7.1 Likelihood of the data: The relative level of occur-
rence of males 178 cm in height can be determined as
f(178 | 170, 6^2). 176
7.2 The conditional probability P(x|Gi) that gives the relative
level of occurrence of data x in each class. 178
7.3 Decision boundary generated by the linear function. 184
7.4 Classification of phenomena exhibiting complex class
structures requires a nonlinear discriminant function. 185
7.5 Decision boundary that separates the two classes in
the nonlinear logistic regression model based on the
Gaussian basis functions. 187
8.1 The training data are completely separable into two
classes by a hyperplane (left panel), and in contrast,
separation into two classes cannot be obtained by any
such linear hyperplane (right panel). 194
8.2 Distance from x0 = (x01, x02)^T to the hyperplane
w1x1 + w2x2 + b = w^T x + b = 0. 196
8.3 Hyperplane (H) that separates the two classes, together
with two equidistant parallel hyperplanes (H+ and H−)
on opposite sides. 197
8.4 Separating hyperplanes with different margins. 198
8.5 Optimum separating hyperplane and support vectors
represented by the black solid dots and triangle on the
hyperplanes H+ and H−. 202
8.6 No matter where we draw the hyperplane for separation
of the two classes and the accompanying hyperplanes
for the margin, some of the data (the black solid dots
and triangles) do not satisfy the inequality constraint. 205
8.7 The class G1 data at (0, 0) and (0, 1) do not satisfy
the original constraint x1 + x2 − 1 ≥ 1. We soften this
constraint to x1 + x2 − 1 ≥ 1 − 2 for data (0, 0) and
x1 + x2 − 1 ≥ 1 − 1 for (0, 1) by subtracting 2 and 1,
respectively; each of these data can then satisfy its new
inequality constraint equation. 205
8.8 The class G2 data (1, 1) and (0, 1) are unable to
satisfy the constraint, but if the restraint is softened to
−(x1 + x2 − 1) ≥ 1 − 2 and −(x1 + x2 − 1) ≥ 1 − 1 by
subtracting 2 and 1, respectively, each of these data can
then satisfy its new inequality constraint equation. 206
8.9 A large margin tends to increase the number of data
that intrude into the other class region or into the region
between hyperplanes H+ and H−. 207
8.10 A small margin tends to decrease the number of data
that intrude into the other class region or into the region
between hyperplanes H+ and H−. 207
8.11 Support vectors in a linearly nonseparable case: Data
corresponding to the Lagrange multipliers such that
0 &lt; α̂i ≤ λ (the black solid dots and triangles). 211
8.12 Mapping the data of an input space into a higher-
dimensional feature space with a nonlinear function. 214
8.13 The separating hyperplane obtained by mapping the
two-dimensional data of the input space to the higher-
dimensional feature space yields a nonlinear discrimi-
nant function in the input space. The black solid data
indicate support vectors. 216
8.14 Nonlinear decision boundaries in the input space vary
with different values σ in the Gaussian kernel; (a)
σ = 10, (b) σ = 1, (c) σ = 0.1, and (d) σ = 0.01. 221
9.1 Projection onto three different axes, (a), (b), and (c) and
the spread of the data. 226
9.2 Eigenvalue problem and the first and second principal
components. 230
9.3 Principal components based on the sample correlation
matrix and their contributions: The contribution of the
first principal component increases with increasing
correlation between the two variables. 237
9.4 Two-dimensional view of the 21-dimensional data set,
projected onto the first (x) and second (y) principal
components. 239
9.5 Image digitization of a handwritten character. 240
9.6 The images obtained by first digitizing and compressing
the leftmost image 7 and then decompressing transmitted
data using a successively increasing number of principal
components. The number in parentheses shows the
cumulative contribution rate in each case. 242
9.7 Mapping the observed data with nonlinear structure
to a higher-dimensional feature space, where PCA
is performed with linear combinations of variables
z1, z2, z3. 250
10.1 Intercluster distances: Single linkage (minimum dis-
tance), complete linkage (maximum distance), average
linkage, centroid linkage. 262
10.2 Cluster formation process and the corresponding den-
drogram based on single linkage when starting from the
distance matrix in (10.7). 265
10.3 The dendrograms obtained for a single set of 72 six-
dimensional data using three different linkage tech-
niques: single, complete, and centroid linkages. The
circled portion of the dendrogram shows a chaining
effect. 266
10.4 Fusion-distance monotonicity (left) and fusion-distance
inversion (right). 267
10.5 Stepwise cluster formation procedure by Ward’s method
and the related dendrogram. 271
10.6 Stepwise cluster formation process by k-means. 272
10.7 The competitive layer comprises an array of m nodes.
Each node is assigned a different weight vector
wj = (wj1, wj2, ..., wjp)^T (j = 1, 2, ..., m), and the
Euclidean distance of each p-dimensional data point to the
weight vector is computed. 274
10.8 Histogram based on observed data on the speed of
recession from Earth of 82 galaxies scattered in space. 276
10.9 Recession-speed data observed for 82 galaxies are
shown on the upper left and in a histogram on the
upper right. The lower left and lower right show the
models obtained by fitting with two and three normal
distributions, respectively. 279
List of Tables
2.1 The length of a spring under different weights. 16
2.2 The n observed data. 17
2.3 Four factors: temperature (x1), pressure (x2), PH (x3),
and catalyst quantity (x4), which affect the quantity of
product (y). 23
2.4 The response y representing the results in n trials, each
with a different combination of p predictor variables x1,
x2, · · ·, xp. 23
2.5 Comparison of the sum of squared residuals divided by
the number of observations (σ̂^2), the maximum
log-likelihood ℓ(β̂), and the AIC for each combination of
predictor variables. 33
2.6 Comparison of the estimates of regression coefficients
by least squares (LS) and lasso L1. 44
4.1 Stimulus levels and the proportion of individuals that
responded. 88
5.1 Comparison of the values of RSS, CV, and AIC for
fitting the polynomial models of order 1 through 9. 119
6.1 The 23 two-dimensional observed data from the varieties
A and B. 138
6.2 Comparison of prediction error estimates for the clas-
sification rule constructed by the linear discriminant
function. 161
6.3 Variable selection via the apparent error rates (APE). 161
Preface
The aim of statistical science is to develop the methodology and the the-
ory for extracting useful information from data and for reasonable infer-
ence to elucidate phenomena with uncertainty in various fields of the nat-
ural and social sciences. The data contain information about the random
phenomenon under consideration and the objective of statistical analysis
is to express this information in an understandable form using statisti-
cal procedures. We also make inferences about the unknown aspects of
random phenomena and seek an understanding of causal relationships.
Multivariate analysis refers to techniques used to analyze data that
arise from multiple variables between which there are some relation-
ships. Multivariate analysis has been widely used for extracting useful in-
formation and patterns from multivariate data and for understanding the
structure of random phenomena. These techniques include regression,
discriminant analysis, principal component analysis, and clustering, and
are mainly based on the linearity of the observed variables.
In recent years, the wide availability of fast and inexpensive computers
has enabled us to accumulate huge amounts of data with complex
structure and/or high dimensionality. Such data accumulation is also
accelerated by the development and proliferation of electronic measure-
ment and instrumentation technologies. Such data sets arise in various
fields of science and industry, including bioinformatics, medicine, phar-
maceuticals, systems engineering, pattern recognition, earth and environ-
mental sciences, economics, and marketing. Therefore, the effective use
of these data sets requires both linear and nonlinear modeling strategies
based on the complex structure and/or high-dimensionality of the data in
order to perform extraction of useful information, knowledge discovery,
prediction, and control of nonlinear phenomena and complex systems.
The aim of this book is to present the basic concepts of various pro-
cedures in traditional multivariate analysis and also nonlinear techniques
for elucidation of phenomena behind observed multivariate data, focus-
ing primarily on regression modeling, classification and discrimination,
dimension reduction, and clustering. Each chapter includes many figures
and illustrative examples to promote a deeper understanding of various
techniques in multivariate analysis.
In practice, the need always arises to search through and evaluate
a large number of models and from among them select an appropriate
model that will work effectively for elucidation of the target phenom-
ena. This book provides comprehensive explanations of the concepts and
derivations of the AIC, BIC, and related criteria, together with a wide
range of practical examples of model selection and evaluation criteria.
In estimating and evaluating models having a large number of predictor
variables, the usual methods of separating model estimation and evalu-
ation are inefficient for the selection of factors affecting the outcome of
the phenomena. The book also reflects these aspects, providing various
regularization methods, including the L1 norm regularization that gives
simultaneous model estimation and variable selection.
The book is written in the hope that, through its fusion of knowl-
edge gained in leading-edge research in statistical multivariate analysis,
machine learning, and computer science, it may contribute to the un-
derstanding and resolution of problems and challenges in this field of
research, and to its further advancement.
This book might be useful as a text for advanced undergraduate and
graduate students in statistical sciences, providing a systematic descrip-
tion of both traditional and newer techniques in multivariate analysis and
machine learning. In addition, it introduces linear and nonlinear statisti-
cal modeling for researchers and practitioners in various scientific disci-
plines such as industrial and systems engineering, information science,
and life science. The basic prerequisites for reading this textbook are
knowledge of multivariate calculus and linear algebra, though they are
not essential as it includes a self-contained introduction to theoretical
results.
This book is basically a translation of a book published in Japanese
by Iwanami Publishing Company in 2010. I would like to thank Uichi
Yoshida and Nozomi Tsujimura of the Iwanami Publishing Company for
giving me the opportunity to translate and publish in English.
I would like to acknowledge with my sincere thanks Yasunori Fu-
jikoshi, Genshiro Kitagawa, and Nariaki Sugiura, from whom I have
learned so much about the seminal ideas of statistical modeling. I
have been greatly influenced through discussions with Tomohiro Ando,
Yuko Araki, Toru Fujii, Seiya Imoto, Mitsunori Kayano, Yoshihiko
Maesono, Hiroki Masuda, Nagatomo Nakamura, Yoshiyuki Ninomiya,
Ryuei Nishii, Heewon Park, Fumitake Sakaori, Shohei Tateishi, Takahiro
Tsuchiya, Masayuki Uchida, Takashi Yanagawa, and Nakahiro Yoshida.
I would also like to express my sincere thanks to Kei Hirose, Shuichi
Kawano, Hidetoshi Matsui, and Toshihiro Misumi for reading the
manuscript and offering helpful suggestions. David Grubbs patiently en-
couraged and supported me throughout the final preparation of this book.
I express my sincere gratitude to all of these people.
Sadanori Konishi
Tokyo, January 2014
Chapter 1
Introduction
The highly advanced computer systems and progress in electronic mea-
surements and instrumentation technologies have together facilitated the
acquisition and accumulation of data with complex structure and/or high-
dimensional data in various fields of science and industry. Data sets arise
in such areas as genome databases in life science, remote-sensing data
from earth-observing satellites, real-time recorded data of motion pro-
cess in system engineering, high-dimensional data in character recogni-
tion, speech recognition, image analysis, etc. Hence, it is desirable to re-
search and develop new statistical data analysis techniques to efficiently
extract useful information as well as elucidate patterns behind the data in
order to analyze various phenomena and to yield knowledge discovery.
Under these circumstances, linear and nonlinear multivariate techniques are
developing rapidly through the fusion of knowledge from statistical science,
machine learning, information science, and mathematical science.
The objective of this book is to present the basic concepts of vari-
ous procedures in the traditional multivariate analysis and also nonlinear
techniques for elucidation of phenomena behind the observed multivari-
ate data, using many illustrative examples and figures. In each chapter,
starting from an understanding of the traditional multivariate analysis
based on the linearity of multivariate observed data, we describe nonlin-
ear techniques, focusing primarily on regression modeling, classification
and discrimination, dimension reduction, and clustering.
1.1 Regression Modeling
Regression analysis is used to model the relationship between a response
variable and several predictor (explanatory) variables. Once a model has
been identified, various forms of inferences such as prediction, control,
information extraction, knowledge discovery, and risk evaluation can be
done within the framework of deductive argument. Thus, the key to solv-
ing various real-world problems lies in the development and construction
of suitable linear and nonlinear regression modeling.
1.1.1 Regression Models
Housing prices vary with land area and floor space, but also with proxim-
ity to stations, schools, and supermarkets. The quantity of chemical prod-
ucts is sensitive to temperature, pressure, catalysts, and other factors. In
Chapter 2, using linear regression models, which provide a method for
relating multiple factors to the outcomes of such phenomena, we de-
scribe the basic concept of regression modeling, including model spec-
ification based on data reflecting the phenomena, model estimation of
the specified model by least squares or maximum likelihood methods,
and model evaluation of the estimated model. Throughout this modeling
process, we select a suitable one among competing models.
The volume of extremely high-dimensional data that are observed
and entered into databases in biological, genomic, and many other fields
of science has grown rapidly in recent years. For such data, the usual
methods of separating model estimation and evaluation are ineffectual
for the selection of factors affecting the outcome of the phenomena, and
thus effective techniques are required to construct models with high re-
liability and prediction. This created a need for work on modeling and
has led, in particular, to the proposal of various regularization methods
with an L1 penalty term (the sum of absolute values of regression co-
efficients), in addition to the sum of squared errors and log-likelihood
functions. A distinctive feature of the proposed methods is their capabil-
ity for simultaneous model estimation and variable selection. Chapter 2
also describes various regularization methods, including ridge regression
(Hoerl and Kennard, 1970) and the least absolute shrinkage and selection
operator (lasso) proposed by Tibshirani (1996), within the framework of
linear regression models.
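As a minimal illustration of simultaneous shrinkage and variable selection, the following Python sketch contrasts least squares, ridge, and lasso estimates. The simulated data, the scikit-learn functions, and the penalty values are assumptions chosen purely for illustration, not material from the book.

# Sketch: ridge vs. lasso on synthetic data with several irrelevant predictors.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])  # only three active predictors
y = X @ beta_true + rng.normal(scale=1.0, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=5.0).fit(X, y)    # L2 penalty shrinks coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty typically sets some coefficients exactly to zero

print("OLS  :", np.round(ols.coef_, 2))
print("Ridge:", np.round(ridge.coef_, 2))
print("Lasso:", np.round(lasso.coef_, 2))

With the L1 penalty, several estimated coefficients are exactly zero, so the corresponding variables are removed from the model at the same time that the remaining coefficients are estimated.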
Figure 1.1 shows the results of an experiment performed to investi-
gate the relation between falling time (x sec) and falling distance (y m) of
a body. The figure suggests that it should be possible to model the rela-
tion using a polynomial. There are many phenomena that can be modeled
in this way, using polynomial equations, exponential functions, or other
specific nonlinear functions to relate the outcome of the phenomenon
and the factors influencing that outcome.
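As a small sketch of this kind of model specification and estimation, the fragment below fits a second-degree polynomial by least squares. The data are simulated from the familiar law y = gt^2/2 with added noise, an assumption standing in for the measurements of Figure 1.1.

# Sketch: fitting a degree-2 polynomial to (falling time, falling distance) data.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.1, 3.0, 30)                                # falling time (sec)
y = 0.5 * 9.8 * x**2 + rng.normal(scale=1.0, size=x.size)    # falling distance (m), with noise

coef = np.polyfit(x, y, deg=2)                               # least squares fit
print("estimated coefficients (x^2, x, const):", np.round(coef, 2))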
Figure 1.2, however, poses new difficulties. It shows the measured
impact y (in acceleration, g) on the head of a dummy in repeated experi-
mental crashes of a motorcycle into a wall, with a time lapse of x (msec)
as measured from the instant of collision (Härdle, 1990). For phenom-
ena with this type of apparently complex nonlinear structure, it is quite
difficult to effectively capture the structure by modeling with specific
Figure 1.1 The relation between falling time (x sec) and falling distance (y m)
of a body.
nonlinear functions such as polynomial equations and exponential func-
tions.
Chapter 3 discusses nonlinear regression modeling for extracting
useful information from data containing complex nonlinear structures. It
introduces models based on more flexible splines, B-splines, and radial
basis functions for modeling complex nonlinear structures. These models
often serve to ascertain complex nonlinear structures, but their flexibility
often prevents their effective function in the estimation of models with
the traditional least squares and maximum likelihood methods. In such
cases, these estimation methods are replaced by regularized least squares
and regularized maximum likelihood methods.
The latter two techniques, which are generally referred to as regular-
ization methods, are effectively used to reduce over-fitting of models to
data and thus prevent excessive model complexity, and are known to con-
tribute for reducing the variability of the estimated models. This chapter
also describes regularization methods within the framework of nonlinear
regression modeling.
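A minimal sketch of such a basis-expansion model is given below. It assumes Gaussian (radial) basis functions, data simulated from the test function u(x) = exp{−x sin(2πx)} + 0.5 + ε, and an arbitrarily chosen regularization parameter; it is an illustration of the idea, not the book's implementation.

# Sketch: nonlinear regression by a Gaussian basis expansion with
# ridge-type regularized least squares.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 80))
y = np.exp(-x * np.sin(2 * np.pi * x)) + 0.5 + rng.normal(scale=0.1, size=x.size)

centers = np.linspace(0, 1, 10)      # basis function centers (assumed)
h = 0.1                              # common bandwidth (assumed)
B = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * h**2))
B = np.column_stack([np.ones_like(x), B])        # add an intercept column

lam = 0.1                            # regularization parameter (assumed)
w = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
fitted = B @ w                       # fitted curve at the observed x values
print("first few fitted values:", np.round(fitted[:5], 3))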
Figure 1.2 The measured impact y (in acceleration, g) on the head of a dummy
in repeated experimental crashes of a motorcycle with a time lapse of x (msec).
1.1.2 Risk Models
In today’s society marked by complexity and uncertainty, we live in a
world exposed to various types of risk. The risk may be associated
with occurrences such as traffic accidents, natural disasters such as earth-
quakes, tsunamis, or typhoons, or development of a lifestyle disease, with
transactions such as credit card issuance, or with many other occurrences
too numerous to enumerate. It is possible to gauge the magnitude of risk
in terms of probability based on past experience and information gained
in daily life, but often with only limited accuracy.
All of this poses the question of how to probabilistically assess un-
known risks for a phenomenon using information obtained from data.
For example, in searching for the factors that induce a certain disease,
the problem is in how to construct a model for assessing the probabil-
ity of its occurrence based on observed data. The effective probabilistic
model for assessing the risk may lead to its future prevention. Through
such risk modeling, moreover, it may also be possible to identify impor-
tant disease-related factors.
Chapter 4 presents an answer to this question, in the form of model-
ing for the risk evaluation, and in particular describes the basic concept
of logistic regression modeling, together with its extension from linear
to nonlinear modeling. This includes models to assess risks based on bi-
nary data {0, 1} expressing the presence or absence of response in an
individual or object on exposure to various levels of stimulus, as shown
in Figure 1.3.
Figure 1.3 Binary data {0, 1} expressing the presence or absence of response in
an individual on exposure to various levels of stimulus.
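A minimal sketch of such a risk model is given below; the simulated stimulus-response data and the use of scikit-learn's logistic regression are assumptions made only for illustration.

# Sketch: a logistic regression risk model for binary {0, 1} responses
# at increasing stimulus levels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=(200, 1))                   # stimulus level
p_true = 1.0 / (1.0 + np.exp(-(1.2 * x[:, 0] - 6.0)))   # true response probability
y = rng.binomial(1, p_true)                             # observed presence/absence of response

model = LogisticRegression().fit(x, y)
new_levels = np.array([[2.0], [5.0], [8.0]])
print("estimated risk at new stimulus levels:",
      np.round(model.predict_proba(new_levels)[:, 1], 3))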
1.1.3 Model Evaluation and Selection
Figure 1.4 shows a process consisting essentially of the conceptualiza-
tion of regression modeling: the specification of models that approximate
the structure of a phenomenon, the estimation of their parameters,
and the evaluation and selection of estimated models.
In relation to the data shown in Figure 1.1 for a body dropped from
a high position, for example, it is quite natural to consider a polyno-
mial model for the relation between the falling time and falling distance
and to carry out polynomial model fitting. This represents the processes
of model specification and parameter estimation. For elucidation of this
Figure 1.4 Regression modeling; the specification of models that approximates
the structure of a phenomenon, the estimation of their parameters, and the eval-
uation and selection of estimated models.
physical phenomenon, however, a question may remain as to the opti-
mum degree of the polynomial model. In the prediction of housing prices
with linear regression models, moreover, a key question is what factors
to include in the model. Furthermore, in considering nonlinear regression
models, one is confronted by the availability of infinite candidate models
for complex nonlinear phenomena controlled by smoothing parameters,
and the need for selection of models that will appropriately approximate
the structures of the phenomena, which is essential for their elucidation.
In this way, the need always arises to search through and evaluate
a large number of models and from among them select one that will
work effectively for elucidation of the target phenomena, based on the
information provided by the data. This is commonly referred to as the
model evaluation and selection problem.
Chapter 5 focuses on the model evaluation and selection problems,
and presents various model selection criteria that are widely used as in-
dicators in the assessment of the goodness of a model. It begins with
a description of evaluation criteria proposed as estimators of prediction
error, and then discusses the AIC (Akaike information criterion) based
on Kullback-Leibler information and the BIC (Bayesian information cri-
terion) derived from a Bayesian view point, together with fundamental
concepts that serve as the bases for derivation of these criteria.
The AIC, proposed in 1973 by Hirotugu Akaike, is widely used in
various fields of natural and social sciences and has contributed greatly
to elucidation, prediction, and control of phenomena. The BIC was pro-
posed in 1978 by Gideon E. Schwarz and is derived based on a Bayesian
approach rather than on information theory as with the AIC, but like the
AIC it is utilized throughout the world of science and has played a cen-
tral role in the advancement of modeling. Chapters 2 to 4 of this book
show the various forms of expression of the AIC for linear, nonlinear,
logistic, and other models, and give examples for model evaluation and
selection problems based on the AIC.
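As a rough illustration of model selection by the AIC, the sketch below compares polynomial models of increasing degree under a Gaussian error assumption. The data and the particular AIC expression used (the Gaussian-error form, counting the error variance as an additional parameter) are assumptions made for illustration.

# Sketch: comparing polynomial models by AIC under a Gaussian error model.
# AIC = n*log(2*pi*RSS/n) + n + 2*(k + 1), where k is the number of
# regression coefficients and the extra parameter is the error variance.
import numpy as np

rng = np.random.default_rng(4)
n = 20
x = np.linspace(-1, 1, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.2, size=n)

for deg in range(1, 6):
    X = np.vander(x, deg + 1)                      # polynomial design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = deg + 1                                    # number of regression coefficients
    aic = n * np.log(2 * np.pi * rss / n) + n + 2 * (k + 1)
    print(f"degree {deg}: AIC = {aic:.1f}")

The model minimizing the AIC balances goodness of fit against the number of parameters, so the true quadratic structure is typically recovered.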
Model selection from among candidate models constructed on the
basis of data is essentially the selection of a single model that best ap-
proximates the data-generated probability structure. In Chapter 5, the
discussion is further extended to include the concept of multimodel in-
ference (Burnham and Anderson, 2002) in which the inferences are based
on model aggregation and utilization of the relative importance of con-
structed models in terms of their weighted values.
1.2 Classification and Discrimination
Classification and discrimination techniques are some of the most widely
used statistical tools in various fields of natural and social sciences. The
primary aim in discriminant analysis is to assign an individual to one
of two or more classes (groups) on the basis of measurements on fea-
ture variables. It is designed to construct linear and nonlinear decision
boundaries based on a set of training data.
1.2.1 Discriminant Analysis
When a preliminary diagnosis concerning the presence or absence of
a disease is made on the basis of data from blood chemistry analysis,
information contained in the blood relating to the disease is measured,
assessed, and acquired in the form of qualitative data. The diagnosis of
normality or abnormality is based on multivariate data from several test
results. In other words, it is an assessment of whether the person exam-
ined is included in a group consisting of normal individuals or a group
consisting of individuals who exhibit a disease-related abnormality.
This kind of assessment can be made only if information from test re-
sults relating to relevant groups is understood in advance. In other words,
because the patterns shown by test data for normal individuals and for
individuals with relevant abnormalities are known in advance, it is pos-
sible to judge which group a new individual belongs to. Depending on
the type of disease, the group of individuals with abnormalities may be
further divided into two or more categories, depending on factors such as
age and progression, and diagnosis may therefore involve assignment of
a new individual to three or more target groups. In the analytical method
referred to as discriminant analysis, a statistical formulation is derived
for this type of problem and statistical techniques are applied to provide
a diagnostic formula.
The objective of discriminant analysis is essentially to find an ef-
fective rule for classifying previously unassigned individuals to two or
more predetermined groups or classes based on several measurements.
The discriminant analysis has been widely applied in many fields of sci-
ence, including medicine, life sciences, earth and environmental science,
biology, agriculture, engineering, and economics, and its application to
new problems in these and other fields is currently under investigation.
The basic concept of discriminant analysis was introduced by R. A.
Fisher in the 1930s. It has taken its present form as a result of subsequent
research and refinements by P. C. Mahalanobis, C. R. Rao, and others,
centering in particular on linear discriminant analysis. Chapter 6 begins
with discussion and examples relating to the basic concept of Fisher and
Mahalanobis for two-class linear and quadratic discrimination. It next
proceeds to the basic concepts and concrete examples of multiclass dis-
crimination, and canonical discriminant analysis of multiclass data in
higher-dimensional space, which enables visualization by projection to
lower-dimensional space.
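A minimal two-class sketch in the spirit of Fisher's linear discriminant analysis is shown below; the simulated Gaussian measurements and the use of scikit-learn are assumptions made only for illustration.

# Sketch: two-class linear discriminant analysis on simulated measurements.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
G1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=60)   # class G1
G2 = rng.multivariate_normal([2, 2], [[1, 0.3], [0.3, 1]], size=60)   # class G2
X = np.vstack([G1, G2])
y = np.array([0] * 60 + [1] * 60)

lda = LinearDiscriminantAnalysis().fit(X, y)
print("training error rate:", np.round(1 - lda.score(X, y), 3))
print("assigned class for a new observation [1, 1]:", lda.predict([[1.0, 1.0]]))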
1.2.2 Bayesian Classification
The advance in measurement and instrumentation technologies has en-
abled rapid growth in the acquisition and accumulation of various types
of data in science and industry. It has been accompanied by rapidly de-
veloping research on nonlinear discriminant analysis for extraction of
information from data with complex structures.
Chapter 7 provides a bridge from linear to nonlinear classification
based on Bayes’ theorem, incorporating prior information into a mod-
eling process. It discusses the concept of a likelihood of observed data
using the probability distribution, and then describes Bayesian proce-
dures for linear and quadratic classification based on a Bayes factor. The
discussion then proceeds to construct linear and nonlinear logistic dis-
criminant procedures, which utilize the Bayesian classification and link
it to the logistic regression model.
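The Bayes classification rule itself can be sketched in a few lines: assign an observation to the class with the largest product of prior probability and class-conditional likelihood. The class means, covariance matrices, and prior probabilities below are assumptions chosen only for illustration.

# Sketch of the Bayes classification rule with Gaussian class-conditional densities.
import numpy as np
from scipy.stats import multivariate_normal

priors = {"G1": 0.6, "G2": 0.4}
densities = {
    "G1": multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.0], [0.0, 1.0]]),
    "G2": multivariate_normal(mean=[2.0, 1.0], cov=[[1.5, 0.4], [0.4, 1.0]]),
}

def classify(x):
    # posterior is proportional to prior * likelihood (Bayes' theorem)
    scores = {g: priors[g] * densities[g].pdf(x) for g in priors}
    return max(scores, key=scores.get)

print(classify([0.2, -0.1]))   # expected to fall in G1
print(classify([1.8, 1.2]))    # expected to fall in G2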
1.2.3 Support Vector Machines
Research in character recognition, speech recognition, image analysis,
and other forms of pattern recognition is advancing rapidly, through the
fusion of machine learning, statistics, and computer science, leading to
the proposal of new analytical methods and applications (e.g., Hastie,
Tibshirani and Friedman, 2009; Bishop, 2006). One of these is an ana-
lytical method employing the support vector machine described in Chap-
ter 8, which constitutes a classification method that is conceptually quite
different from the statistical methods. It has therefore come to be used
in many fields as a method that can be applied to classification problems
that are difficult to analyze effectively by previous classification meth-
ods, such as those based on high-dimensional data.
The essential feature of the classification method with the support
vector machine is, first, establishment of the basic theory in a context of
two perfectly separated classes, followed by its extension to linear and
nonlinear methods for the analysis of actual data. Figure 1.5 (left) shows
that the training data of the two classes are completely separable by a
hyperplane (in the case of two dimensions, a straight line segment). In
an actual context requiring the use of the classification method, as shown
in Figure 1.5 (right), the overlapping data of the two classes may not be
separable by a hyperplane.
With the support vector machine, mapping of the observed data to
high-dimensional space is used for the extension from linear to nonlin-
ear analysis. In diagnoses for the presence or absence of a given disease,
for example, increasing the number of test items may have the effect
of increasing the separation between the normal and abnormal groups
and thus facilitate the diagnoses. The basic concept is the utilization of
such a tendency, where possible, to map the observed data to a high-
dimensional feature space and thus obtain a hyperplane that separates the
two classes, as illustrated in Figure 1.6. An extreme increase in computa-
tional complexity in the high-dimensional space can also be surmounted
by utilization of a kernel method. Chapter 8 provides a step-by-step de-
scription of this process of extending the support vector machine from
linear to nonlinear analysis.
Figure 1.5 The training data of the two classes are completely separable by a hy-
perplane (left) and the overlapping data of the two classes may not be separable
by a hyperplane (right).
Figure 1.6 Mapping the observed data to a high-dimensional feature space and
obtaining a hyperplane that separates the two classes.
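A minimal sketch of a nonlinear support vector machine with a Gaussian (RBF) kernel is given below; the simulated data with a circular class boundary and the scikit-learn implementation are assumptions made only for illustration.

# Sketch: an RBF-kernel support vector machine on data that are not
# linearly separable in the input space.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)   # circular class boundary

clf = SVC(kernel="rbf", C=1.0, gamma=1.0).fit(X, y)   # kernel trick: no explicit feature map
print("number of support vectors per class:", clf.n_support_)
print("training accuracy:", round(clf.score(X, y), 3))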
1.3 Dimension Reduction
Multivariate analysis generally consists of ascertaining the features of
individuals as a number of variables and constructing new variables by
their linear combination. In this process the procedures in multivariate
analysis have been proposed on the basis of the different types of criteria
employed. Principal component analysis can be regarded as a procedure
for capturing the information contained in the data through its variability,
and for defining a smaller set of variables with the minimum
possible loss of information. This reduction is achieved by employing an
orthogonal transformation to convert a large number of correlated vari-
ables into a smaller number of uncorrelated variables, called principal
components. Principal components enable the extraction of information
of interest from the data.
Principal component analysis can also be used as a technique for
performing dimension compression (i.e., dimension reduction), thus re-
ducing a large number of variables to a smaller number and enabling
1D (line), 2D (plane), and 3D (space) projections amenable to intuitive
understanding and visual discernment of data structures. In fields such
as pattern recognition, image analysis, and signal processing, the tech-
nique is referred to as Karhunen-Loève expansion and is also utilized for
dimension compression.
Chapter 9 begins with a discussion of the basic concept of principal
component analysis based on linearity. It next provides an example of
the application of principal component analysis to the performance of
dimension compression in transmitted image data reproduction. It also
discusses nonlinear principal component analysis for multivariate data
with complex structure dispersed in a high-dimensional space, using a
kernel method to perform structure searching and information extraction
through dimension reduction.
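As a minimal numerical sketch of the linear case (not taken from the book), the principal components can be obtained as eigenvectors of the sample covariance matrix. The example below assumes NumPy and uses hypothetical correlated data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical three-dimensional data in which the first two variables
# are strongly correlated through a common latent variable z.
n = 300
z = rng.normal(size=(n, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(n, 1)),
               2.0 * z + 0.1 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 1))])

Xc = X - X.mean(axis=0)                  # center each variable
S = np.cov(Xc, rowvar=False)             # sample covariance matrix
eigval, eigvec = np.linalg.eigh(S)       # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]         # reorder to descending variance
eigval, eigvec = eigval[order], eigvec[:, order]

# Proportion of the total variability captured by each principal component.
print("explained variance ratio:", eigval / eigval.sum())

# Dimension reduction: project the data onto the first two components.
scores = Xc @ eigvec[:, :2]
print("reduced data shape:", scores.shape)
```

In this sketch the first component absorbs most of the variance carried by the two correlated variables, so projecting onto two components loses little information, which is the sense of "dimension compression" used above.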
1.4 Clustering
The main purpose of discriminant analysis is to construct a discriminant
rule and predict the membership of future data among multiple classes,
based on the membership of known data (i.e., “training data”). In cluster analysis, in contrast, as described in Chapter 10, the purpose is to divide data into aggregates (“data clusters”), using their similarity as the criterion, in cases involving a mixture of data of uncertain class membership.
Cluster analysis is useful, for example, for gaining an understanding of complex relations among objects, by grouping, according to a similarity criterion, objects whose attached features represent multidimensional properties. Its range of applications is extremely wide, extend-
ing from ecology, genetics, psychology, and cognitive science to pattern
recognition in document classification, speech and image processing, and
throughout the natural and social sciences. In life sciences, in particular,
it serves as a key method for elucidation of complex networks of genetic
information.
1.4.1 Hierarchical Clustering Methods
Hierarchical clustering essentially consists of linking target data that are
mutually similar, proceeding in the stepwise formation from small to
large clusters in units of the smallest cluster. It is characterized by the
generation of readily visible tree diagrams called dendrograms through-
out the clustering process. Figure 1.7 shows 72 chemical substances with
6 attached features, classified by clustering on the basis of mutual sim-
ilarity in substance qualities. The lower portions of the interconnected
tree represent higher degrees of similarity in qualities, and as may be
seen from Figure 1.7, small clusters interlink to form larger clusters pro-
ceeding up the dendrogram. The utilization of cluster analysis enables
classification of large quantities of data dispersed throughout higher-dimensional space, which is not intuitively obvious, into collections of
mutually similar objects.
Chapter 10 provides discussion and illustrative examples of represen-
tative hierarchical clustering methods, including the nearest-neighbor,
farthest-neighbor, group-average, centroid, median, and Ward’s methods,
together with the process and characteristics of their implementation and
the basic prerequisites for their application.
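As a brief practical sketch (not from the book), SciPy provides these linkage methods directly; the two simulated groups of objects with six features below are hypothetical, and "ward" can be replaced by "single", "complete", "average", "centroid", or "median" to obtain the other methods named above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(2)

# Hypothetical objects with 6 attached features, in two loose groups.
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 6)),
               rng.normal(4.0, 1.0, size=(20, 6))])

# Stepwise (agglomerative) clustering with Ward's method.
Z = linkage(X, method="ward")

# Cut the hierarchy into two clusters and report their sizes.
labels = fcluster(Z, t=2, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])

# dendrogram(Z) draws the tree diagram when a plotting backend is available.
```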
1.4.2 Nonhierarchical Clustering Methods
In contrast to hierarchical clustering, with its stepwise formation of in-
creasingly large clusters finally ending in the formation of a single large
cluster, nonhierarchical clustering methods essentially consist of divid-
ing the objects into a predetermined number of clusters. One represen-
tative method is k-means clustering, which is used in cases where large-
scale data classification and hierarchical structure elucidation are unnec-
essary. Another is the self-organizing map, which is a type of neural net-
work proposed by T. Kohonen. It is characterized by the projection of
high-dimensional objects to a two-dimensional plane and visualization
of collections of similar objects by coloring, and its application is under
investigation in many fields.
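A minimal sketch of the nonhierarchical approach (not from the book), assuming scikit-learn's KMeans and hypothetical two-dimensional data with three groups:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Hypothetical observations from three well-separated groups.
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2))
               for loc in ([0.0, 0.0], [4.0, 0.0], [2.0, 3.0])])

# Divide the objects into a predetermined number of clusters (k = 3).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print("cluster sizes:  ", np.bincount(km.labels_))
print("cluster centers:\n", km.cluster_centers_)
```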
Figure 1.7 72 chemical substances with 6 attached features, classified by clus-
tering on the basis of mutual similarity in substance qualities.
Chapter 10 describes clustering by k-means and self-organizing map
techniques, and the processes of their implementation. It also shows a
clustering method that utilizes a mixture distribution model formed by
combining several probability distributions, together with examples of
its application and model estimation.
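For the mixture-distribution approach mentioned above, the following sketch (not from the book) fits a two-component Gaussian mixture by maximum likelihood, with the EM algorithm handled internally by scikit-learn; the simulated data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Hypothetical data drawn from two overlapping Gaussian components.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(3.5, 0.7, size=(100, 2))])

# Mixture distribution model: a weighted combination of two normal
# distributions, estimated by maximum likelihood (EM algorithm).
gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(X)

print("mixing weights:  ", gmm.weights_)
print("component means:\n", gmm.means_)
print("cluster sizes:   ", np.bincount(gmm.predict(X)))
```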
The appendix contains an outline of the bootstrap methods, La-
grange’s method of undetermined multipliers, and the EM algorithm, all
of which are used in this book.
Chapter 2
Linear Regression Models
The modeling of natural and social phenomena from related data plays a
fundamental role in their explication, prediction, and control, and in new
discoveries, knowledge, and understanding of these phenomena. Models
that link the outcomes of phenomena to multiple factors that affect them
are generally referred to as regression models.
In regression modeling, the model construction essentially proceeds
through a series of processes consisting of: (1) assuming models based
on observed data thought to affect the phenomenon; (2) estimating the
parameters of the specified models; and (3) evaluating the estimated
models for selection of the optimum model. In this chapter, we consider
the fundamental concepts of modeling as embodied in linear regression
modeling, the most basic form of modeling for the explication of rela-
tionships between variables.
Regression models are usually estimated by least squares or max-
imum likelihood methods. In cases involving multicollinearity among predictor variables, ridge regression is used to prevent instability in estimated linear regression models. In estimating and evaluating models
having a large number of predictor variables, the usual methods of sepa-
rating model estimation and evaluation are inefficient for the selection of
factors affecting the outcome of the phenomena. In such cases, regular-
ization methods with L1 norm penalty, in addition to the sum of squared
errors and log-likelihood functions, provide a useful tool for effective re-
gression modeling based on high-dimensional data. This chapter also de-
scribes ridge regression, lasso, and various regularization methods with
L1 norm penalty.
2.1 Relationship between Two Variables
In this section, we describe the basic method of explicating the relation-
ship between a variable that represents the outcome of a phenomenon and
a variable suspected of affecting this outcome, based on observed data.
The relationship used in our example is actually Hooke’s well-known
Table 2.1 The length of a spring under different weights.
x (g)    5    10   15   20   25   30   35   40   45   50
y (cm)   5.4  5.7  6.9  6.4  8.2  7.7  8.4  10.1 9.9  10.5
law of elasticity, which states, essentially, that a spring changes shape
under an applied force and that within the spring’s limit of elasticity the
change is proportional to the force.
2.1.1 Data and Modeling
Table 2.1 shows ten observations obtained by measuring the length of a
spring (y cm) under different weights (x g). The data are plotted in Fig-
ure 2.1. The plot suggests a straight-line relationship between the two
variables of spring length and suspended weight. If the measurements
were completely free from error, all of the data points might actually lie
in a straight line. As shown in Figure 2.1, measurement data generally
include errors commonly referred to as noise, and modeling is therefore
required to explicate the relationship between variables. To find the re-
lationship between the two variables of spring length (y) and weight (x)
from the data including the measurement errors, let us therefore attempt
the modeling based on an initially unknown function y = u(x).
We first consider a more specific expression for the unknown func-
tion u(x) that represents the true structure of the spring phenomenon.
The data plot, as well as our a priori knowledge that the function should
be linear, suggests that the function should describe a straight line. We
therefore adopt a linear model as our specified model, so that
y = u(x) = β0 + β1x. (2.1)
We then attempt to apply this linear model in order to explicate the re-
lationship between the spring length (y) and the weight (x) as a physical
phenomenon.
If there were no errors in the data shown in Table 2.1, then all 10 data
points would lie on a straight line with an appropriately selected inter-
cept (β0) and slope (β1). Because of measurement errors, however, many
of the actual data points will depart from any straight line. To account for this departure (ε) from a straight line in the data points obtained with different weights, we therefore assume that they satisfy
Figure 2.1 Data obtained by measuring the length of a spring (y cm) under
different weights (x g).
Table 2.2 The n observed data.
No. 1 2 · · · i · · · n
Experiment points (x) x1 x2 · · · xi · · · xn
Observed data (y) y1 y2 · · · yi · · · yn
the relation
Spring length = β0 + β1 × Weight + Error. (2.2)
For the individual data points, we then have 5.4 = β0 + 5β1 + ε1, · · ·, 8.2 = β0 + 25β1 + ε5, · · ·. Figure 2.2 illustrates this relationship for the fifth data point (25, 8.2).
In general, let us assume that measurements are performed for n ex-
periment points, as in Table 2.2, and that a measurement at a given ex-
periment point xi is yi. The general model corresponding to (2.2) is then
yi = β0 + β1xi + εi, i = 1, 2, · · · , n, (2.3)
Figure 2.2 The relationship between the spring length (y) and the weight (x).
where β0 and β1 are regression coefficients, εi is the error term, and the
equation in (2.3) is called the linear regression model. The variable y,
which represents the length of the spring in the above experiment, is the
response variable and the variable x, which represents the weight in that
experiment, is the predictor variable. Variables y and x are also often
referred to as the dependent variable and the independent variable or the
explanatory variable, respectively.
This brings us to the question of how to fit a straight line to observed
data in order to obtain a model that appropriately expresses the data. It is
essentially a question of how to determine the regression coefficients β0
and β1. Various model estimation procedures can be used to determine
the appropriate parameter values. One of these is the method of least
squares.
2.1.2 Model Estimation by Least Squares
The underlying concept of the linear regression model (2.3) is that the
true value of the response variable at the i-th point xi is β0 +β1xi and that
the observed value yi includes the error εi. The method of least squares
consists essentially of finding the values of the regression coefficients β0 and β1 that minimize the sum of squared errors ε1² + ε2² + · · · + εn², which is expressed as

    S(β0, β1) ≡ Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} {yi − (β0 + β1xi)}².    (2.4)
Differentiating (2.4) with respect to the regression coefficients β0 and β1, and setting the resulting derivatives equal to zero, we have

    Σ_{i=1}^{n} yi = nβ0 + β1 Σ_{i=1}^{n} xi,
    Σ_{i=1}^{n} xiyi = β0 Σ_{i=1}^{n} xi + β1 Σ_{i=1}^{n} xi².    (2.5)
The regression coefficients that minimize the sum of squared errors can
be obtained by solving the above simultaneous equations. The solutions are called the least squares estimates and are denoted by β̂0 and β̂1. The
equation
y = β̂0 + β̂1x, (2.6)
having its coefficients determined by the least squares estimates, is the
estimated linear regression model. We can thus find the model that best
fits the data by minimizing the sum of squared errors.
The value of ŷi = β̂0 + β̂1xi at each xi (i = 1, 2, · · · , n) is called
the predicted value. The difference between this value and the observed
value yi at xi, ei = yi−ŷi, is called the residual, and the sum of the squares
of the residuals is given by Σ_{i=1}^{n} ei² (Figure 2.3).
Example 2.1 (Hooke’s law of elasticity) For the data shown in Table 2.1, the sum of squared errors in the linear regression model is

    S(β0, β1) = {5.4 − (β0 + 5β1)}² + {5.7 − (β0 + 10β1)}² + · · · + {10.5 − (β0 + 50β1)}²,

in which S(β0, β1) is a function of the regression coefficients β0 and β1. The least squares estimates that minimize this function are β̂0 = 4.65 and β̂1 = 0.12, and the estimated linear regression model is therefore y = 4.65 + 0.12x. In this way, by modeling from a set of observed data, we have derived in approximation a physical law representing the relationship between the weight and the spring length.
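The fit in Example 2.1 can be reproduced numerically. The following sketch (not part of the book) solves the normal equations (2.5) for the data of Table 2.1 with NumPy; the printed estimates can be compared with the values reported above.

```python
import numpy as np

# Spring data from Table 2.1: weight x (g) and spring length y (cm).
x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
y = np.array([5.4, 5.7, 6.9, 6.4, 8.2, 7.7, 8.4, 10.1, 9.9, 10.5])

# Least squares estimates obtained from the normal equations (2.5):
# beta1_hat = S_xy / S_xx and beta0_hat = ybar - beta1_hat * xbar.
xbar, ybar = x.mean(), y.mean()
beta1_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
beta0_hat = ybar - beta1_hat * xbar
print(f"beta0_hat = {beta0_hat:.3f}, beta1_hat = {beta1_hat:.3f}")

# Predicted values and residuals at each experiment point.
y_hat = beta0_hat + beta1_hat * x
residuals = y - y_hat
print("residual sum of squares:", np.sum(residuals ** 2))
```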
2.1.3 Model Estimation by Maximum Likelihood
In the least squares method, the regression coefficients are estimated by
minimizing the sum of squared errors. Maximum likelihood estimation
Figure 2.3 Linear regression and the predicted values and residuals.
is an alternative method for the same purpose in which the regression
coefficients are determined so as to maximize the probability of getting
the observed data, for which it is assumed that yi observed at xi emerges
in accordance with some type of probability distribution.
Figure 2.4 (a) shows a histogram of 80 measured values obtained
while repeatedly suspending a load of 25 g from one end of a spring.
Figure 2.4 (b) represents the errors (i.e., noise) contained in these mea-
surements in the form of a histogram having its origin at the mean value
of the measurements. This histogram clearly shows a region containing a
high proportion of the obtained measured values. A mathematical model
that approximates a histogram showing the probabilistic distribution of a
phenomenon is called a probability distribution model.
Of the various distributions that may be adopted in probability distri-
bution models, the most representative is the normal distribution (Gaus-
sian distribution), which is expressed in terms of the mean μ and variance σ² and denoted by N(μ, σ²). In the normal distribution model, the observed value yi at xi is regarded as a realization of the random variable Yi, and Yi is normally distributed with mean μi and variance σ²:

    f(yi | xi; μi, σ²) = (1 / √(2πσ²)) exp{ −(yi − μi)² / (2σ²) },    (2.7)
Figure 2.4 (a) Histogram of 80 measured values obtained while repeatedly sus-
pending a load of 25 g and its approximated probability model. (b) The errors
(i.e., noise) contained in these measurements in the form of a histogram hav-
ing its origin at the mean value of the measurements and its approximated error
distribution.
where μi for a given xi is the conditional mean value (true value)
E[Yi|xi] = u(xi) = μi of random variable Yi. In the normal distribution,
as may be clearly seen in Figure 2.4 (a), the proportion of measured val-
ues may be expected to decline sharply with increasing distance from the
true value.
In the linear regression model, it is assumed that the true values μ1,
μ2, · · ·, μn at the various data points lie on a straight line, and it follows
that for any given data point, μi = β0 + β1xi (i = 1, 2, · · · , n). Substitution
into (2.7) thus yields

    f(yi | xi; β0, β1, σ²) = (1 / √(2πσ²)) exp[ −{yi − (β0 + β1xi)}² / (2σ²) ].    (2.8)
This function decreases with increasing deviation of the observed value
yi from the true value β0 + β1xi. Assuming that the observed data yi around the true value β0 + β1xi at xi thus follow the probability distribution f(yi | xi; β0, β1, σ²), this density expresses the plausibility, or certainty, of the occurrence of a given value of yi and is called the likelihood of yi.
Assuming that the observed data y1, y2, · · ·, yn are mutually independent and identically distributed (i.i.d.), the likelihood for the n observations, and thus the plausibility of the n specific data, is given by the product of the likelihoods of all observed data:

    ∏_{i=1}^{n} f(yi | xi; β0, β1, σ²) = (1 / (2πσ²)^{n/2}) exp[ −(1/(2σ²)) Σ_{i=1}^{n} {yi − (β0 + β1xi)}² ]
                                       ≡ L(β0, β1, σ²).    (2.9)
Given the data {(xi, yi); i = 1, 2, · · · , n} in (2.9), the function L(β0, β1, σ²) of the parameters β0, β1, σ² is then the likelihood function. Maximum
likelihood is a method of finding the parameter values that maximize this
likelihood function, and the resulting estimates are called the maximum
likelihood estimates. For ease of calculation, the maximum likelihood
estimates are usually obtained by maximizing the log-likelihood function
    ℓ(β0, β1, σ²) ≡ log L(β0, β1, σ²) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^{n} {yi − (β0 + β1xi)}².    (2.10)
The parameter values β̂0, β̂1, σ̂² that maximize the log-likelihood function are thus obtained by solving the equations

    ∂ℓ(β0, β1, σ²)/∂β0 = 0,    ∂ℓ(β0, β1, σ²)/∂β1 = 0,    ∂ℓ(β0, β1, σ²)/∂σ² = 0.    (2.11)
Specific solutions will be given in Section 2.2.2.
The first term of the log-likelihood function defined in (2.10) does not depend on β0 and β1, and the second term is always negative since σ² > 0. Accordingly, the values of the regression coefficients β0 and β1 that maximize the log-likelihood function are those that minimize

    Σ_{i=1}^{n} {yi − (β0 + β1xi)}².    (2.12)
With the assumption of a normal distribution model for the data, the max-
imum likelihood estimates of the regression coefficients are thus equiv-
alent to the least squares estimates of the regression coefficients, that is,
the minimizer of (2.4).
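To illustrate this equivalence numerically (this sketch is not from the book), the log-likelihood (2.10) for the Table 2.1 data can be maximized directly, here with SciPy's general-purpose optimizer; parameterizing the variance as exp(log σ²) is a choice made only to keep it positive during the search.

```python
import numpy as np
from scipy.optimize import minimize

# Spring data from Table 2.1: weight x (g) and spring length y (cm).
x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
y = np.array([5.4, 5.7, 6.9, 6.4, 8.2, 7.7, 8.4, 10.1, 9.9, 10.5])
n = len(y)

def neg_log_likelihood(theta):
    # theta = (beta0, beta1, log_s2); maximizing the log-likelihood (2.10)
    # is the same as minimizing its negative.
    beta0, beta1, log_s2 = theta
    s2 = np.exp(log_s2)
    resid = y - (beta0 + beta1 * x)
    return 0.5 * n * np.log(2.0 * np.pi * s2) + 0.5 * np.sum(resid ** 2) / s2

res = minimize(neg_log_likelihood, x0=np.zeros(3), method="Nelder-Mead")
beta0_hat, beta1_hat, log_s2_hat = res.x

# The estimated regression coefficients agree (up to numerical error)
# with the least squares estimates, as the argument around (2.12) shows.
print("beta0_hat =", beta0_hat, "beta1_hat =", beta1_hat)
print("sigma^2_hat =", np.exp(log_s2_hat))
```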
2.2 Relationships Involving Multiple Variables
In the case of explicating a natural or social phenomenon that may involve a number of factors, it is necessary to construct a model that links the outcome of the phenomenon to the factors that cause it. In the case of a chem-
ical experiment, the quantity (response variable, y) of the reaction prod-
uct may be affected by temperature, pressure, pH, concentration, catalyst
quantity, and various other factors. In this section, we discuss the models
used to explicate the relationship between the response variable y and the
various predictor variables x1, x2, · · · , xp that may explain this response.
Table 2.3 Four factors: temperature (x1), pressure (x2), pH (x3), and catalyst quantity (x4), which affect the quantity of product (y).

No.   Product (g)   Temp. (°C)   Pressure   pH    Catalyst
1     28.7          34.1         2.3        6.4   0.1
2     32.4          37.8         2.5        6.8   0.3
...   ...           ...          ...        ...   ...
i     52.9          47.6         3.8        7.6   0.7
...   ...           ...          ...        ...   ...
86    65.8          52.6         4.8        7.8   1.1
Table 2.4 The response y representing the results in n trials, each with a different combination of p predictor variables x1, x2, · · ·, xp.

       Response variable   Predictor variables
No.    y                   x1     x2     · · ·   xp
1      y1                  x11    x12    · · ·   x1p
2      y2                  x21    x22    · · ·   x2p
...    ...                 ...    ...            ...
i      yi                  xi1    xi2    · · ·   xip
...    ...                 ...    ...            ...
n      yn                  xn1    xn2    · · ·   xnp
2.2.1 Data and Models
Table 2.3 is a partial list of the observed data in 86 experimental trials
with variations in four factors that affect the quantity of product formed
by a chemical reaction. Table 2.4 shows the notation corresponding to
the experimental data shown in Table 2.3, with response yi representing
the results in n trials, each with a different combination of p predictor
variables x1, x2, · · ·, xp. Thus, yi is observed as the result of the i-th
experiment point (i.e., the i-th trial) (xi1, xi2, · · · , xip).
The objective is to construct a model from the observed data that
appropriately links the product quantity y to the temperature, pressure,
concentration, pH, catalyst quantity, and other factors involved in the re-
action. From the concept of the linear regression model for two variables
Random documents with unrelated
content Scribd suggests to you:
Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi
Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi
Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi
The Project Gutenberg eBook of Grim: The
Story of a Pike
This ebook is for the use of anyone anywhere in the United States
and most other parts of the world at no cost and with almost no
restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
ebook or online at www.gutenberg.org. If you are not located in the
United States, you will have to check the laws of the country where
you are located before using this eBook.
Title: Grim: The Story of a Pike
Author: Svend Fleuron
Illustrator: Dorothy Pulis Lathrop
Translator: J. Alexander
Jessie Muir
Release date: October 2, 2012 [eBook #40921]
Most recently updated: October 23, 2024
Language: English
Credits: Produced by Roger Frank and the Online Distributed
Proofreading Team at http://www.pgdp.net
*** START OF THE PROJECT GUTENBERG EBOOK GRIM: THE
STORY OF A PIKE ***
“A wild chase was going on in the depths, and
where it passed the rushes bowed their
sheaves.”
GRIM: THE STORY OF A PIKE
Translated from the Danish of
Svend Fleuron
by J. Muir and J. Alexander
Illustrated by Dorothy P. Lathrop
New York MCMXXI
Alfred A. Knopf
COPYRIGHT, 1919
By SVEND FLEURON
COPYRIGHT, 1921
By ALFRED A. KNOPF, Inc.
Original Title: Grim
PRINTED IN THE UNITED STATES OF AMERICA
To devour others and to avoid being devoured oneself,
that is life’s end and aim.
CONTENTS
I: LIFE
II: IN THE SHELTER OF THE CREEK
III: GRIM GOES EXPLORING
IV: THE MARAUDERS
V: THE PEARLY FISH
VI: THE MAN-ROACH
VII: THE RASPER
VIII: THE ANGLER’S END
IX: THE WEDDING FESTIVAL
X: IN THE MARSH
XI: TERROR
XII: GRIM DEVELOPS
XIII: A FIGHT WITH AN OTTER
XIV: THE ANGLER FROM TOWN
XV: LUCK
ILLUSTRATIONS
A wild chase was going on in the depths, and where it passed the
rushes bowed their sheaves.
With a hiss it curves its neck and turns the foil upwards,
snapping and biting at its tormentors.
She snaps eagerly at the nearest “worm,” but it escapes her by
adroitly curling up.
The bird darts upon her from behind with outstretched claws,
and drives them with full force into her back.
I: LIFE
Clear running water filled the ditch, but the bottom was dull black,
powdery mud. It lay inches deep, layer upon layer of one tiny
particle upon another, and so loose and light that a thick, opaque,
smoke-like column ascended at the slightest touch.
A monster, with the throat and teeth of a crocodile, a flat,
treacherous forehead, and large, dull, malicious eyes, was lying close
to the bottom in the wide, sun-warmed cross-dyke that cut its way
inland from the level depths of the great lake. The entire monster
measured scarcely a finger’s length.
The upspringing water-plants veiled her body and drew waving
shadows over her round, slender tail.
When the sun was shining she liked to stay here among the
bottom vegetation and imitate a drifting piece of reed. Her reddish-
brown colour with the tiger-like transverse stripes made an excellent
disguise. She simply was a piece of reed. Even the sharp-eyed
heron, which had dropped down unnoticed about a dozen yards off,
and was now noiselessly, with slow, cautious steps, wading nearer
and nearer, took her at the first glance for a stick.
All the ditch-water life of a summer day was pulsating around the
young pike.
Water-spiders went up for air and came down with it between
their hind legs, to moor their silvery diving-bells beneath the whorls
of the water-moss. One boat-bug after another, with a shining air-
bubble on its belly to act as a swimming-bag, and for oars a pair of
long legs sticking far out at the sides, darted with great spurts
through the water, or rose and sank with the speed of a balloon. The
young pike peered upwards, and saw in the shelter of a tuft of
rushes a collection of black, boat-shaped whirligigs, showing like
dots against the shining surface. The little water-beetles lay and
dozed; but all at once a sudden storm seemed to descend upon
them and they scattered precipitately, whirling away in wider and
wider circles, only to congregate again just as suddenly, like a flock
of sheep.
The young pike disappeared from the heron’s view in a cloud of
mud, and glided off to some distance, finally coming to anchor on a
wide submerged plain in a broad creek, shadowed by a clump of
luxuriant marsh marigolds, whose yellow flowers gleamed out from
among the clusters of green, heart-shaped leaves.
There was never any peace around her. When one animal was on
its way down, another would be on its way up. And the bed of ooze
beneath her was in incessant motion. Sticks moved to right and left;
hairy balls lay and rolled over one another; there was a twisting and
turning of larvae in all directions. The active water-beetles were
dredging incessantly, releasing leaves and stalks which slowly and
weirdly rose to the surface. Air-bubbles, too, were set free, and
ascended quickly with a rotary motion.
Here two large tiger-beetles were fighting with a poor water-bug.
The flat-bodied insect stretched out its scorpion-like claws towards
its enemies, but the tiger-beetles seized it one at each end, beat off
its claws with their strong palpi, and tore its head from its body. It
must have been almost a pleasure to find oneself so neatly
despatched!
Everything tortured and killed down here, some, indeed, even
devoured themselves. To lose arms and legs and flesh from their
body was all in the order of the day; and anything resting for but a
minute was taken for carrion.
The big horse-leech had wound its rhythmically serpentine way
through the water. It was tired now, and had just stretched itself out
for a moment’s rest, when the supposed pieces of stick upon which
it lay seized it, and voracious heads with sharp jaws attacked its
flesh. It was within an ace of being made captive for ever, but at last
succeeded in making its escape and pushing off, with two of its
tormentors after it.
The young pike watched attentively the flight of the black leech.
She saw that to devour others and to avoid being devoured oneself
was the end and aim of life.
For a long time she remained quite still, only an undulating
movement of the dorsal fin and the malicious glitter of the eyes
revealing her vitality. Slowly she opened and closed her small, wide
mouth, and let the oxidizing water flow over her blood-red gills.
It was not long before she had forgotten her recent peril, and
once more became filled with the cruel passion of the hunter.
From the shadow of the marsh marigolds she darted under the
newly unfolded leaf of a water-lily. This was a very favourite lurking-
place; she could lie there with her back right up against the under
surface of the leaf, and her snout on the very border of its shadow,
ready to strike. The silvery flash of small fish twinkled around her,
and myriads of tiny shining crustaceans whisked about so close to
her nose that at any moment she could have snapped them up by
the score into her voracious mouth.
It was especially things that moved that had a magic attraction
for Grim. From the time when, but twelve to fifteen days old, she
had consumed the contents of her yolksac, and opened her large
voracious mouth, everything that flickered, twisted and moved, all
that sought to escape, aroused her irresistible desire.
In the innermost depths of her being there was an over-
mastering need, expressing itself in an insatiableness, a conviction
that she could never have enough, and a fear that others would
clear the waters of all that was eatable. An insane greed animated
her; and even when she had eaten so much that she could eat no
more, she kept swimming about with spoil in her mouth.
On the other hand, anything at rest and quiet possessed little
attraction for her; she felt no hunger at sight of it, and no desire to
possess it: that she could take at any time.
——Meanwhile, the keen-eyed heron, wading up to its breast in
the water, comes softly and silently trawling through the ditch.
Sedately it goes about its business, stalking along with slow,
measured steps. Its big, seemingly heavy body sways upon its thin,
greenish yellow legs, its short tail almost combing the surface of the
water, while its long, round neck is in constant motion, directing the
dagger-like beak like a foil into all kinds of attacking positions.
“With a hiss it curves its neck and turns the foil
upwards, snapping and biting at its tormentors.”
Sea-crows and terns scream around it, and from time to time
three or four of them unite in harrying their great rival. Just as the
heron has brought its beak close to the surface of the water, ready
to seize its prey, the gulls dash upon it from behind. With a hiss it
curves its neck and turns the foil upwards, snapping and biting at its
tormentors.
An irritating little flock of gulls may go on thus for a long time;
and when at last, screaming and mocking, they take their departure,
they have spoilt many a chance and wasted many precious minutes
of the big, silent, patient fisher’s time.
The gulls once gone, the heron applies itself with redoubled zeal
to its business. From various attacking positions its beak darts down
into the water, but often without result, and it has to go farther
afield; then at last it captures a little eel.
It is not easy, however, to swallow the wriggling captive. The eel
twists, and refuses to be swallowed; so the bird has to reduce its
liveliness by rolling up and down in its sharp-edged beak. Then it
glides down.
This time, too, fortune is disposed to favour the young pike. The
heron, coming up behind her, cautiously bends its neck over the
drifting piece of reed. It sees there is something suspicious about it,
but thinks it is mistaken, and is about to take another step forward.
When only half-way, it pauses with its foot in the air; and the next
moment the blow falls.
Grim only once moved her tail. Then she was seized, something
hard and sharp and strong held her fast, and she passed head
foremost down into a warm, narrow channel.
There was a fearful crush of fish in the channel, and much
elbowing with fins and twisting of tails. Something behind her was
pushing, but the throng in front blocked the way: she could get no
farther.
And yet she glided on! Very slowly the thick slimy water in the
channel bore the living, muddy tangle that surrounded her along;
she felt the corners of her mouth rub against the sides of the
channel; she could scarcely breathe.
In the meantime the heron was flying homewards to its young,
carrying Grim and the rest of the catch. Out on the lake lay a boat in
which a man sat fishing. Experience told the bird it was a fisherman,
but here the bird was wrong. The man had a gun in the boat, and as
the bird sailed upwards a shot was fired which compelled it to
relinquish a part of its booty in order to escape more quickly.
Grim was among the fortunate ones. Suddenly the crush in the
long, dark channel grew less, and the sluggish stream of mud that
was bearing her along changed its course. A little later the stream
gathered furious pace and carried her with it; she saw light and felt
space round her; she was able to move her fins.
Then she fell from the heron’s beak, from a height of about
twenty yards. She had time to notice how suffocatingly dry the other
world was. It seemed to draw out her entrails, and all her efforts to
right herself were in vain.
Then she regained her native element; water covered her gills,
and she could begin to swim.
II: IN THE SHELTER OF THE CREEK
Grim was a year old when her scales began to grow.
In her early youth, when she could only eat small creatures, she
had lived exclusively upon water-insects and larvae; but from now
onwards she had no respect for any flesh but that which clothed her
own ribs.
She attacked any fish that was not big enough to swallow her,
and devoured bleak and small roach with peculiar satisfaction. Now
she took her revenge on the voracious small fry that had offended
her when she was still in an embryo state.
She had not been hatched artificially, or come into the world in a
wooden box with running water passing through it. No, the whole
thing had taken place in the most natural manner.
In the flickering sunshine of a March day, her mother, surrounded
by three equally ardent wooers, had spawned, and the eggs had
dropped and attached themselves to some tufts of grass at the edge
of the lake. The very next day, however, little fish had begun to
gather about those tufts; one day more, and there were swarms of
them. Eagerly they searched the tufts and devoured all the eggs
they could find; and so thoroughly did they go about their business,
that of the thousands upon thousands of the mother’s eggs, only
two that had fallen into the heart of a grass-stalk were left.
Out of one of these Grim had come. The sun had looked after
her, hatched her out, and taught her to seize whatever came in her
way. Now she was avenging the injuries to her tribe.
She possessed a remarkable power of placing herself, and knew
how to choose her position so as to disappear, as it were, in the
water. The stalks of the reeds threw their shadows across her body
in all directions; water-grass and drifting duck-weed veiled her; the
silly roach and other restless little fish flitted about her, sometimes
so close to her mouth that she could feel the waves made by their
tail-fins. Some would almost run right into her; but when they saw
her, then how the water flashed with starry gleams, and how quickly
they all made off!
She liked best to hide where the water-lilies floated in islands of
green, for there the treacherous shadows--her best friends--fell
clearly through the water; absorbed her, as it were, and made
capture easy for her. If she found herself discovered, she would
retreat with as little haste as possible; for that sort of thing aroused
too much attention, and created widespread disturbance in the fishy
world.
If she lay on the surface, for instance, and suspected that she
was being watched from above, she became, as it were, more and
more indistinct and one with the dark water, letting herself sink
imperceptibly, at the same time beginning to work all her fins. In
ample folds they softly crept round the long stick that her body now
resembled, fringed and veiled it and bore it away.
And just as she knew how to place herself, so did she know how
to move--cautiously and discreetly.
Formerly she had measured only a finger’s length, and now she
was already about a foot long; her voraciousness had increased in a
corresponding degree. She could eat every hour of the day. She
would fill herself right up to the neck, and even have half a fish
sticking out beyond. It was quite a common sight to see a little
flapping fish-tail for which her digestive organs had not room as yet,
sticking out of her mouth like a lively tongue. She would swim about
delightedly, sucking it as a boy would suck a stick of candy.
One day she was gliding slowly through a clump of rushes, as
lifeless and dead as any stick. Her eyes seemed to be on stalks and
spied eagerly round, but her body exhibited the least possible
movement and eagerness.
She turned, but even then holding herself stiff, and playing her
new part of a drifting stick in a masterly manner. As she did so she
discovered her brother, as promising a specimen of a young pike as
herself, with all the distinguishing marks of the race.
Although cold-blooded, she was of a fiery temperament, and as
she was also hungry, she stared greedily and with cannibal feelings
at the apparition. Her appetite grew in immeasurable units of time.
The food was at hand, it stared her in the face; she forgot
relationship and resemblance, and bending in the middle so that
head and tail met, she seized her brother with a lightning
movement.
He was quite as big as she, struggled until he was unable to
move a fin; but the stroke was successful.
She began to understand things, and grew ever fiercer and more
violent and voracious. Her teeth were doubled, and as they grew
they were sharpened by the continual suction of the water through
the gills. It was as if she understood their value, too, for she would
often take up her position on the bottom and stir up grains of fine,
hard sand, thus improving the grinding process considerably.
It was mostly in the half-light that she now went hunting, in the
early dawn or at dusk. Her sharp eyes could see in the dark like
those of the owl and the cat. When the shadows lengthened, and
the red glow from the sky spread over the water, she felt how
favourable her surroundings were, and she became one with the
power in her mighty nature.
But in the daytime, she lay peacefully drowsing.
The creek in which she lived had low-lying banks.
Among the short, thick grass, orchids and marsh marigolds
bloomed side by side, and the ragged robin unfolded its frayed, deep
pink flowers upon a stiff, dark brown stalk, that always had a mass
of frothy wetness about its head.
Farther out, the muddy water and horsetails began, and beyond
them the tall, waving reeds, which stretched away in great clumps
as far as it was possible for them to reach the bottom.
Where they left off, the round-stalked olive-green bog rushes
began, wading farther and farther out, until in midstream they
gathered in low clumps and groves, inhabited by an abundant insect
life.
Beautiful butterflies danced their bridal dance out there, some
bright yellow with black borders, others with the sunset glow upon
their wings. Dragon-flies and water-nymphs by the score refracted
the sun’s rays as they turned with a flash of all the colours of the
rainbow. Black whirligigs lay in clusters and slept; and on the india
rubber-like leaves of the water-lily, flies and wasps crawled about
dry-shop, and refreshed themselves with the water.
In the still, early morning the reeds sigh and tremble. The little
yellowish grey sedge-warbler comes out suddenly from its hiding
place, seizes the largest of the butterflies by the body, and as
suddenly disappears again. A little later it begins its soft little sawing
song, which blends so well with the perpetual, monotonous
whispering of the reeds.
Grim, down among the vegetation, only faintly catches the
subdued tones; she is occupied with an event that is developing with
great rapidity.
A moth has fallen suddenly into the clear water. It tries to rise,
but cannot, so darts rapidly across the surface of the water, dragging
its tawny wings behind it. It puts forth its greatest speed, making in
a straight line for the shore.
But the whirligigs have seen the shipwreck, and dart out on their
water-ski to tear the thing to pieces. They advance with the speed of
a torpedo-boat, and in peculiar spiral windings. A wedge-shaped
furrow stands out from the bow of each little pirate, and a tiny
cascade in his wake.
The poor moth becomes wetter and wetter, and less and less of
his body remains visible as he exerts himself to reach the safety of
the reeds, where he can climb up into a horse-tail and escape, just
as a cat climbs into a tree to escape from a dog.
Unfortunately he does not succeed; he is in a sinking condition,
and one of the whirligigs fastens voraciously upon his hind quarters.
The successful captor, however, is given no peace in which to
devour his prey. He has to let it go, and seize it, and let it go again;
and now a little fish--a bleak--begins to take a part in the play.
The fluttering chase continues noiselessly across the surface of
the water, and urged on by the whirligigs above and the bleak
beneath, the moth approaches the reeds.
With muscles relaxed and dorsal fin laid flat, Grim lies motionless
at its edge, whence again and again she catches a glimpse of the
little silvery fish.
Its delicate body is fat outside and in, plump and well nourished,
and to the eyes of the fratricide is an irresistible temptation, making
her hunger creep out to the very tips of her teeth.
Her dorsal fin opens out and is cautiously raised, while her eyes
greedily watch the movements of the nimble little fish.
Flash follows flash, each bigger and brighter than the other.
Grim feels the excitement and ecstasy of the spoiler rush over
her--all that immediately precedes possession of the spoil--and
delights in the sensation. She begins to change from her stick-like
attitude, and imperceptibly to bend in the middle.
The plump little fish is too much engrossed in its moth-hunt.
Unconcernedly it lets its back display a vivid, bright green lake-hue,
while with its silvery belly it reflects all the rainbow colours of the
water.
Another couple of seconds and the prey is near.
Then Grim makes her first real leap. It is successful. Ever since
she was the length of a darning-needle, she had dreamt of this leap,
dreamt that it would be successful.
The sedge-warbler in the reedy island heard the splash, and the
closing snap of the jaws. They closed with such firmness that the
bird could feel, as it were, the helpless sigh of the victim, and the
grateful satisfaction of the promising young pirate.
She was the tiger of the water. She would take her prey by
cunning and by craft, and by treacherous attack. She was seldom
able to swim straight up to her food. How could she chase the
nimble antelopes of the lake when, timid and easily startled, they
were grazing on the plains of the deep waters; they discovered her
before she got near them and could begin her leap!
Huge herds were there for her pleasure. She had no need to
exert herself, but could choose her quarry in ease and comfort. The
larger its size, and the greater the hunger and lust for murder that
she felt within her, the more violence and energy did she put into
the leap. But just as the falcon may miss its aim, so might she, and
it made her ashamed, like any other beast of prey; she did not
repeat the leap, but only hastened away.
But when her prey was struggling in her hundred-toothed jaws
and slapping her on the mouth with its quivering tail-fin, then slowly,
and with a peculiar, lingering enjoyment, she straightened herself
out from her bent leaping posture. If she was hungry, she
immediately swallowed her captive, but if not, she was fond, like the
cat, of playing with her victim, swimming about with it in her mouth,
twisting and turning it over, and chewing it for hours before she
could make up her mind to swallow it.
She ate, she stuffed herself; and with much eating she waxed
great.
Welcome to our website – the perfect destination for book lovers and
knowledge seekers. We believe that every book holds a new world,
offering opportunities for learning, discovery, and personal growth.
That’s why we are dedicated to bringing you a diverse collection of
books, ranging from classic literature and specialized publications to
self-development guides and children's books.
More than just a book-buying platform, we strive to be a bridge
connecting you with timeless cultural and intellectual values. With an
elegant, user-friendly interface and a smart search system, you can
quickly find the books that best suit your interests. Additionally,
our special promotions and home delivery services help you save time
and fully enjoy the joy of reading.
Join us on a journey of knowledge exploration, passion nurturing, and
personal growth every day!
ebookbell.com

More Related Content

PDF
Modelling Survival Data in Medical Research Third Edition David Collett
PDF
Modelling survival data in medical research Third Edition Collett
PDF
Nonlinear Time Series Theory Methods And Applications With R Examples 1st Edi...
PDF
Modelling Survival Data in Medical Research Third Edition David Collett
PDF
Mathematical Statistics basic ideas and selected topics Volume I Second Editi...
PDF
Stochastic Processes From Applications to Theory 1st Edition Pierre Del Moral
PDF
Stochastic Processes From Applications to Theory 1st Edition Pierre Del Moral
PDF
Nonlinear Time Series Theory Methods and Applications with R Examples 1st Edi...
Modelling Survival Data in Medical Research Third Edition David Collett
Modelling survival data in medical research Third Edition Collett
Nonlinear Time Series Theory Methods And Applications With R Examples 1st Edi...
Modelling Survival Data in Medical Research Third Edition David Collett
Mathematical Statistics basic ideas and selected topics Volume I Second Editi...
Stochastic Processes From Applications to Theory 1st Edition Pierre Del Moral
Stochastic Processes From Applications to Theory 1st Edition Pierre Del Moral
Nonlinear Time Series Theory Methods and Applications with R Examples 1st Edi...

Similar to Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi (20)

PDF
Understanding Advanced Statistical Methods 1st Edition Peter Westfall
PDF
Analysis of categorical data with R 1st Edition Christopher R Bilder
PDF
Introduction To Functional Data Analysis 1st Edition Piotr Kokoszka
PDF
Introduction To Functional Data Analysis 1st Kokoszka Piotr Reimherr
PDF
Design and Analysis of Experiments with R Lawson
PDF
Modeling and analysis of stochastic systems 3rd Edition Vidyadhar G. Kulkarni
PDF
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
PDF
Design and Analysis of Experiments with R Lawson
PDF
Modeling and analysis of stochastic systems 3rd Edition Vidyadhar G. Kulkarni
PDF
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
PDF
From Finite Sample to Asymptotic Methods in Statistics 1st Edition Pranab K. Sen
PDF
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
PDF
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
PDF
Analysis Of Longitudinal Data 2nd Edition Peter Diggle Patrick Heagerty
PDF
Spatio Temporal Methods in Environmental Epidemiology 1st Edition Gavin Shadd...
PDF
Epidemiology study design and data analysis 3rd edition Edition Woodward
PDF
libro para asignatura de regresion lineal
PDF
Epidemiology study design and data analysis 3rd edition Edition Woodward
PDF
Statistical Regression And Classification From Linear Models To Machine Learn...
PDF
Epidemiology study design and data analysis 3rd edition Edition Woodward
Understanding Advanced Statistical Methods 1st Edition Peter Westfall
Analysis of categorical data with R 1st Edition Christopher R Bilder
Introduction To Functional Data Analysis 1st Edition Piotr Kokoszka
Introduction To Functional Data Analysis 1st Kokoszka Piotr Reimherr
Design and Analysis of Experiments with R Lawson
Modeling and analysis of stochastic systems 3rd Edition Vidyadhar G. Kulkarni
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
Design and Analysis of Experiments with R Lawson
Modeling and analysis of stochastic systems 3rd Edition Vidyadhar G. Kulkarni
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
From Finite Sample to Asymptotic Methods in Statistics 1st Edition Pranab K. Sen
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
Stationary Stochastic Processes Theory and Applications 1st Edition Georg Lin...
Analysis Of Longitudinal Data 2nd Edition Peter Diggle Patrick Heagerty
Spatio Temporal Methods in Environmental Epidemiology 1st Edition Gavin Shadd...
Epidemiology study design and data analysis 3rd edition Edition Woodward
libro para asignatura de regresion lineal
Epidemiology study design and data analysis 3rd edition Edition Woodward
Statistical Regression And Classification From Linear Models To Machine Learn...
Epidemiology study design and data analysis 3rd edition Edition Woodward
Ad

Recently uploaded (20)

PDF
RMMM.pdf make it easy to upload and study
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Pre independence Education in Inndia.pdf
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
PPH.pptx obstetrics and gynecology in nursing
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Sports Quiz easy sports quiz sports quiz
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PPTX
human mycosis Human fungal infections are called human mycosis..pptx
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Cell Structure & Organelles in detailed.
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PPTX
master seminar digital applications in india
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PDF
01-Introduction-to-Information-Management.pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
VCE English Exam - Section C Student Revision Booklet
RMMM.pdf make it easy to upload and study
2.FourierTransform-ShortQuestionswithAnswers.pdf
Pre independence Education in Inndia.pdf
Renaissance Architecture: A Journey from Faith to Humanism
O7-L3 Supply Chain Operations - ICLT Program
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPH.pptx obstetrics and gynecology in nursing
Module 4: Burden of Disease Tutorial Slides S2 2025
Sports Quiz easy sports quiz sports quiz
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
human mycosis Human fungal infections are called human mycosis..pptx
Abdominal Access Techniques with Prof. Dr. R K Mishra
Cell Structure & Organelles in detailed.
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
master seminar digital applications in india
O5-L3 Freight Transport Ops (International) V1.pdf
Microbial diseases, their pathogenesis and prophylaxis
01-Introduction-to-Information-Management.pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
VCE English Exam - Section C Student Revision Booklet
Ad

Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi

  • 1. Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi download https://ebookbell.com/product/introduction-to-multivariate- analysis-linear-and-nonlinear-modeling-konishi-12074818 Explore and download more ebooks at ebookbell.com
  • 2. Here are some recommended products that we believe you will be interested in. You can click the link to download. Introduction To Multivariate Analysis Linear And Nonlinear Modeling Konishi https://ebookbell.com/product/introduction-to-multivariate-analysis- linear-and-nonlinear-modeling-konishi-5066650 Introduction To Multivariate Statistical Analysis In Chemometrics 1st Edition Kurt Varmuza https://ebookbell.com/product/introduction-to-multivariate- statistical-analysis-in-chemometrics-1st-edition-kurt-varmuza-1857616 An Introduction To Multivariate Statistical Analysis Wiley Series In Probability And Statistics 3rd Edition T W Anderson https://ebookbell.com/product/an-introduction-to-multivariate- statistical-analysis-wiley-series-in-probability-and-statistics-3rd- edition-t-w-anderson-2116318 An Introduction To Applied Multivariate Analysis With R 1st Edition Brian Everitt https://ebookbell.com/product/an-introduction-to-applied-multivariate- analysis-with-r-1st-edition-brian-everitt-2216062
  • 3. An Introduction To Applied Multivariate Analysis 1st Edition Tenko Raykov https://ebookbell.com/product/an-introduction-to-applied-multivariate- analysis-1st-edition-tenko-raykov-2442672 Matrixbased Introduction To Multivariate Data Analysis 1st Edition Kohei Adachi Auth https://ebookbell.com/product/matrixbased-introduction-to- multivariate-data-analysis-1st-edition-kohei-adachi-auth-5609750 Matrixbased Introduction To Multivariate Data Analysis 2nd Edition Kohei Adachi https://ebookbell.com/product/matrixbased-introduction-to- multivariate-data-analysis-2nd-edition-kohei-adachi-11083338 An Introduction To Multivariable Analysis From Vector To Manifold 1st Edition Piotr Mikusiski https://ebookbell.com/product/an-introduction-to-multivariable- analysis-from-vector-to-manifold-1st-edition-piotr-mikusiski-4210624 A Friendly Introduction To Analysis Single And Multivariable 2nd Witold A J Kosmala https://ebookbell.com/product/a-friendly-introduction-to-analysis- single-and-multivariable-2nd-witold-a-j-kosmala-52660806
  • 5. K16322 Introduction to Multivariate Analysis: Linear and Nonlinear Modeling shows how multivariate analysis is widely used for extracting useful information and patterns from multivariate data and for understanding the structure of random phenomena. Along with the basic concepts of various procedures in traditional multivariate analysis, the book covers nonlinear techniques for clarifying phenomena behind observed multivariate data. It primarily focuses on regression modeling, classification and discrimination, dimension reduction, and clustering. The text thoroughly explains the concepts and derivations of the AIC, BIC, and related criteria and includes a wide range of practical examples of model selection and evaluation criteria. To estimate and evaluate models with a large number of predictor variables, the author presents regularization methods, including the L1 norm regularization that gives simultaneous model estimation and variable selection. Features • Explains how to use linear and nonlinear multivariate techniques to extract information from data and understand random phenomena • Includes a self-contained introduction to theoretical results • Presents many examples and figures that facilitate a deep understanding of multivariate analysis techniques • Covers regression, discriminant analysis, Bayesian classification, support vector machines, principal component analysis, and clustering • Incorporates real data sets from engineering, pattern recognition, medicine, and more For advanced undergraduate and graduate students in statistical science, this text provides a systematic description of both traditional and newer techniques in multivariate analysis and machine learning. It also introduces linear and nonlinear statistical modeling for researchers and practitioners in industrial and systems engineering, information science, life science, and other areas. Konishi Introduction to Multivariate Analysis Statistics Texts in Statistical Science Sadanori Konishi Introduction to Multivariate Analysis Linear and Nonlinear Modeling K16322_cover.indd 1 5/14/14 9:32 AM
  • 10. Texts in Statistical Science Sadanori Konishi Chuo University Tokyo, Japan Introduction to Multivariate Analysis Linear and Nonlinear Modeling
  • 11. TAHENRYO KEISEKI NYUMON: SENKEI KARA HISENKEI E by Sadanori Konishi © 2010 by Sadanori Konishi Originally published in Japanese by Iwanami Shoten, Publishers, Tokyo, 2010. This English language edition pub- lished in 2014 by Chapman & Hall/CRC, Boca Raton, FL, U.S.A., by arrangement with the author c/o Iwanami Sho- ten, Publishers, Tokyo. CRC Press Taylor & Francis Group 6000 Broken Sound Parkway NW, Suite 300 Boca Raton, FL 33487-2742 © 2014 by Taylor & Francis Group, LLC CRC Press is an imprint of Taylor & Francis Group, an Informa business No claim to original U.S. Government works Version Date: 20140508 International Standard Book Number-13: 978-1-4665-6729-0 (eBook - PDF) This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint. Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, includ- ing photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers. For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged. Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
  • 12. Contents List of Figures xiii List of Tables xxi Preface xxiii 1 Introduction 1 1.1 Regression Modeling 1 1.1.1 Regression Models 2 1.1.2 Risk Models 4 1.1.3 Model Evaluation and Selection 5 1.2 Classification and Discrimination 7 1.2.1 Discriminant Analysis 7 1.2.2 Bayesian Classification 8 1.2.3 Support Vector Machines 9 1.3 Dimension Reduction 11 1.4 Clustering 11 1.4.1 Hierarchical Clustering Methods 12 1.4.2 Nonhierarchical Clustering Methods 12 2 Linear Regression Models 15 2.1 Relationship between Two Variables 15 2.1.1 Data and Modeling 16 2.1.2 Model Estimation by Least Squares 18 2.1.3 Model Estimation by Maximum Likelihood 19 2.2 Relationships Involving Multiple Variables 22 2.2.1 Data and Models 23 2.2.2 Model Estimation 24 2.2.3 Notes 29 2.2.4 Model Selection 31 2.2.5 Geometric Interpretation 34 2.3 Regularization 36 vii
  • 13. viii 2.3.1 Ridge Regression 37 2.3.2 Lasso 40 2.3.3 L1 Norm Regularization 44 3 Nonlinear Regression Models 55 3.1 Modeling Phenomena 55 3.1.1 Real Data Examples 57 3.2 Modeling by Basis Functions 58 3.2.1 Splines 59 3.2.2 B-splines 63 3.2.3 Radial Basis Functions 65 3.3 Basis Expansions 67 3.3.1 Basis Function Expansions 68 3.3.2 Model Estimation 68 3.3.3 Model Evaluation and Selection 72 3.4 Regularization 76 3.4.1 Regularized Least Squares 77 3.4.2 Regularized Maximum Likelihood Method 79 3.4.3 Model Evaluation and Selection 81 4 Logistic Regression Models 87 4.1 Risk Prediction Models 87 4.1.1 Modeling for Proportional Data 87 4.1.2 Binary Response Data 91 4.2 Multiple Risk Factor Models 94 4.2.1 Model Estimation 95 4.2.2 Model Evaluation and Selection 98 4.3 Nonlinear Logistic Regression Models 98 4.3.1 Model Estimation 100 4.3.2 Model Evaluation and Selection 101 5 Model Evaluation and Selection 105 5.1 Criteria Based on Prediction Errors 105 5.1.1 Prediction Errors 106 5.1.2 Cross-Validation 108 5.1.3 Mallows’ Cp 110 5.2 Information Criteria 112 5.2.1 Kullback-Leibler Information 113 5.2.2 Information Criterion AIC 115 5.2.3 Derivation of Information Criteria 121 5.2.4 Multimodel Inference 127
  • 14. ix 5.3 Bayesian Model Evaluation Criterion 128 5.3.1 Posterior Probability and BIC 128 5.3.2 Derivation of the BIC 130 5.3.3 Bayesian Inference and Model Averaging 132 6 Discriminant Analysis 137 6.1 Fisher’s Linear Discriminant Analysis 137 6.1.1 Basic Concept 137 6.1.2 Linear Discriminant Function 141 6.1.3 Summary of Fisher’s Linear Discriminant Analysis 144 6.1.4 Prior Probability and Loss 146 6.2 Classification Based on Mahalanobis Distance 148 6.2.1 Two-Class Classification 148 6.2.2 Multiclass Classification 149 6.2.3 Example: Diagnosis of Diabetes 151 6.3 Variable Selection 154 6.3.1 Prediction Errors 154 6.3.2 Bootstrap Estimates of Prediction Errors 156 6.3.3 The .632 Estimator 158 6.3.4 Example: Calcium Oxalate Crystals 160 6.3.5 Stepwise Procedures 162 6.4 Canonical Discriminant Analysis 164 6.4.1 Dimension Reduction by Canonical Discriminant Analysis 164 7 Bayesian Classification 173 7.1 Bayes’ Theorem 173 7.2 Classification with Gaussian Distributions 175 7.2.1 Probability Distributions and Likelihood 175 7.2.2 Discriminant Functions 176 7.3 Logistic Regression for Classification 179 7.3.1 Linear Logistic Regression Classifier 179 7.3.2 Nonlinear Logistic Regression Classifier 183 7.3.3 Multiclass Nonlinear Logistic Regression Classifier 187 8 Support Vector Machines 193 8.1 Separating Hyperplane 193 8.1.1 Linear Separability 193 8.1.2 Margin Maximization 196
  • 15. x 8.1.3 Quadratic Programming and Dual Problem 198 8.2 Linearly Nonseparable Case 203 8.2.1 Soft Margins 204 8.2.2 From Primal Problem to Dual Problem 208 8.3 From Linear to Nonlinear 212 8.3.1 Mapping to Higher-Dimensional Feature Space 213 8.3.2 Kernel Methods 216 8.3.3 Nonlinear Classification 218 9 Principal Component Analysis 225 9.1 Principal Components 225 9.1.1 Basic Concept 225 9.1.2 Process of Deriving Principal Components and Properties 230 9.1.3 Dimension Reduction and Information Loss 234 9.1.4 Examples 235 9.2 Image Compression and Decompression 239 9.3 Singular Value Decomposition 243 9.4 Kernel Principal Component Analysis 246 9.4.1 Data Centering and Eigenvalue Problem 246 9.4.2 Mapping to a Higher-Dimensional Space 249 9.4.3 Kernel Methods 252 10 Clustering 259 10.1 Hierarchical Clustering 259 10.1.1 Interobject Similarity 260 10.1.2 Intercluster Distance 261 10.1.3 Cluster Formation Process 263 10.1.4 Ward’s Method 267 10.2 Nonhierarchical Clustering 270 10.2.1 K-Means Clustering 271 10.2.2 Self-Organizing Map Clustering 273 10.3 Mixture Models for Clustering 275 10.3.1 Mixture Models 275 10.3.2 Model Estimation by EM Algorithm 277 A Bootstrap Methods 283 A.1 Bootstrap Error Estimation 283 A.2 Regression Models 285 A.3 Bootstrap Model Selection Probability 285
  • 16. xi B Lagrange Multipliers 287 B.1 Equality-Constrained Optimization Problem 287 B.2 Inequality-Constrained Optimization Problem 288 B.3 Equality/Inequality-Constrained Optimization 289 C EM Algorithm 293 C.1 General EM Algorithm 293 C.2 EM Algorithm for Mixture Model 294 Bibliography 299 Index 309
  • 18. List of Figures 1.1 The relation between falling time (x sec) and falling distance (y m) of a body. 3 1.2 The measured impact y (in acceleration, g) on the head of a dummy in repeated experimental crashes of a motorcycle with a time lapse of x (msec). 4 1.3 Binary data {0, 1} expressing the presence or absence of response in an individual on exposure to various levels of stimulus. 5 1.4 Regression modeling; the specification of models that approximates the structure of a phenomenon, the estima- tion of their parameters, and the evaluation and selection of estimated models. 6 1.5 The training data of the two classes are completely separable by a hyperplane (left) and the overlapping data of the two classes may not be separable by a hyperplane (right). 10 1.6 Mapping the observed data to a high-dimensional feature space and obtaining a hyperplane that separates the two classes. 10 1.7 72 chemical substances with 6 attached features, clas- sified by clustering on the basis of mutual similarity in substance qualities. 13 2.1 Data obtained by measuring the length of a spring (y cm) under different weights (x g). 17 2.2 The relationship between the spring length (y) and the weight (x). 18 2.3 Linear regression and the predicted values and residuals. 20 xiii
  • 19. xiv LIST OF FIGURES 2.4 (a) Histogram of 80 measured values obtained while repeatedly suspending a load of 25 g and its approximated probability model. (b) The errors (i.e., noise) contained in these measurements in the form of a histogram having its origin at the mean value of the measurements and its approximated error distribution. 21 2.5 Geometrical interpretation of the linear regression model y = Xβ + ε. M(X) denotes the (p+1)-dimensional linear subspace spanned by the (p + 1) n-dimensional column vectors of the design matrix X. 35 2.6 Ridge estimate (left panel) and lasso estimate (right panel): Ridge estimation shrinks the regression coefficients β1, β2 toward but not exactly to 0 relative to the corresponding least squares estimates β̂, whereas lasso estimates the regression coefficient β1 at exactly 0. 41 2.7 The profiles of estimated regression coefficients for different values of the L1 norm ∑_{i=1}^{13} |βi(λ)| with λ varying from 6.78 to 0. The axis above indicates the number of nonzero coefficients. 45 2.8 The function pλ(|βj|) (solid line) and its quadratic approximation (dotted line) with the values of βj along the x axis, together with the quadratic approximation for a βj0 value of 0.15. 48 2.9 The relationship between the least squares estimator (dotted line) and three shrinkage estimators (solid lines): (a) hard thresholding, (b) lasso, and (c) SCAD. 50 3.1 Left panel: The plot of 104 tree data obtained by measurement of tree trunk girth (inch) and tree weight above ground (kg). Right panel: Fitting a polynomial of degree 2 (solid curve) and a growth curve model (dashed curve). 57 3.2 Motorcycle crash trial data (n = 133). 59 3.3 Fitting third-degree polynomials to the data in the subintervals [a, t1], [t1, t2], · · ·, [tm, b] and smoothly connecting adjacent polynomials at each knot. 60 3.4 Functions (x − ti)+ = max{0, x − ti} and (x − ti)+³ included in the cubic spline given by (3.10). 61 3.5 Basis functions: (a) {1, x}; linear regression, (b) polynomial regression; {1, x, x², x³}, (c) cubic splines, (d) natural cubic splines. 62
  • 20. LIST OF FIGURES xv 3.6 A cubic B-spline basis function connecting four different third-order polynomials smoothly at the knots 2, 3, and 4. 63 3.7 Plots of the first-, second-, and third-order B-spline functions. As may be seen in the subintervals bounded by dotted lines, each subinterval is covered (piecewise) by the polynomial order plus one basis function. 65 3.8 A third-order B-spline regression model is fitted to a set of data, generated from u(x) = exp{−x sin(2πx)} + 0.5 + ε with Gaussian noise. The fitted curve and the true structure are, respectively, represented by the solid line and the dotted line with cubic B-spline bases. 66 3.9 Curve fitting; a nonlinear regression model based on a natural cubic spline basis function and a Gaussian basis function. 70 3.10 Cubic B-spline nonlinear regression models, each with a different number of basis functions (a) 10, (b) 20, (c) 30, (d) 40, fitted to the motorcycle crash experiment data. 73 3.11 The cubic B-spline nonlinear regression model y = ∑_{j=1}^{13} ŵj bj(x). The model is estimated by maximum likelihood, and the number of basis functions is selected by AIC. 75 3.12 The role of the penalty term: Changing the weight in the second term by the regularization parameter γ changes Sγ(w) continuously, thus enabling continuous adjustment of the model complexity. 78 3.13 The effect of a smoothing parameter λ: The curves are estimated by the regularized maximum likelihood method for various values of λ. 82 4.1 Plot of the graduated stimulus levels shown in Table 4.1 along the x axis and the response rate along the y axis. 89 4.2 Logistic functions. 90 4.3 Fitting the logistic regression model to the observed data shown in Table 4.1 for the relation between the stimulus level x and the response rate y. 90 4.4 The data on presence and non-presence of the crystals are plotted along the vertical axis as y = 0 for the 44 individuals exhibiting their non-presence and y = 1 for the 33 exhibiting their presence. The x axis takes the values of their urine specific gravity. 92
  • 21. xvi LIST OF FIGURES 4.5 The fitted logistic regression model for the 77 sets of data expressing observed urine specific gravity and presence or non-presence of calcium oxalate crystals. 93 4.6 Plot of post-operative kyphosis occurrence along Y = 1 and non-occurrence along Y = 0 versus the age (x; in months) of 83 patients. 99 4.7 Fitting the polynomial-based nonlinear logistic regression model to the kyphosis data. 103 5.1 Fitting of 3rd-, 8th-, and 12th-order polynomial models to 15 data points. 107 5.2 Fitting a linear model (dashed line), a 2nd-order polynomial model (solid line), and an 8th-order polynomial model (dotted line) to 20 data. 119 6.1 Projecting the two-dimensional data in Table 6.1 onto the axes y = x1, y = x2 and y = w1x1 + w2x2. 139 6.2 Three projection axes (a), (b), and (c) and the distributions of the class G1 and class G2 data when projected on each one. 140 6.3 Fisher’s linear discriminant function. 143 6.4 Mahalanobis distance and Euclidean distance. 151 6.5 Plot of 145 training data for a normal class G1 (◦), a chemical diabetes class G2 (), and clinical diabetes class G3 (×). 152 6.6 Linear decision boundaries that separate the normal class G1, the chemical diabetes class G2, and the clinical diabetes class G3. 154 6.7 Plot of the values obtained by projecting the 145 observed data from three classes onto the first two discriminant variables (y1, y2) in (6.92). 170 7.1 Likelihood of the data: The relative level of occurrence of males 178 cm in height can be determined as f(178 | 170, 6²). 176 7.2 The conditional probability P(x|Gi) that gives the relative level of occurrence of data x in each class. 178 7.3 Decision boundary generated by the linear function. 184 7.4 Classification of phenomena exhibiting complex class structures requires a nonlinear discriminant function. 185
  • 22. LIST OF FIGURES xvii 7.5 Decision boundary that separates the two classes in the nonlinear logistic regression model based on the Gaussian basis functions. 187 8.1 The training data are completely separable into two classes by a hyperplane (left panel), and in contrast, separation into two classes cannot be obtained by any such linear hyperplane (right panel). 194 8.2 Distance from x0 = (x01, x02)T to the hyperplane w1x1 + w2x2 + b = wT x + b = 0. 196 8.3 Hyperplane (H) that separates the two classes, together with two equidistant parallel hyperplanes (H+ and H−) on opposite sides. 197 8.4 Separating hyperplanes with different margins. 198 8.5 Optimum separating hyperplane and support vectors represented by the black solid dots and triangle on the hyperplanes H+ and H−. 202 8.6 No matter where we draw the hyperplane for separation of the two classes and the accompanying hyperplanes for the margin, some of the data (the black solid dots and triangles) do not satisfy the inequality constraint. 205 8.7 The class G1 data at (0, 0) and (0, 1) do not satisfy the original constraint x1 + x2 − 1 ≥ 1. We soften this constraint to x1 + x2 − 1 ≥ 1 − ξ2 for data (0, 0) and x1 + x2 − 1 ≥ 1 − ξ1 for (0, 1) by subtracting ξ2 and ξ1, respectively; each of these data can then satisfy its new inequality constraint equation. 205 8.8 The class G2 data (1, 1) and (0, 1) are unable to satisfy the constraint, but if the constraint is softened to −(x1 + x2 − 1) ≥ 1 − ξ2 and −(x1 + x2 − 1) ≥ 1 − ξ1 by subtracting ξ2 and ξ1, respectively, each of these data can then satisfy its new inequality constraint equation. 206 8.9 A large margin tends to increase the number of data that intrude into the other class region or into the region between hyperplanes H+ and H−. 207 8.10 A small margin tends to decrease the number of data that intrude into the other class region or into the region between hyperplanes H+ and H−. 207 8.11 Support vectors in a linearly nonseparable case: Data corresponding to the Lagrange multipliers such that 0 < α̂i ≤ λ (the black solid dots and triangles). 211
  • 23. xviii LIST OF FIGURES 8.12 Mapping the data of an input space into a higher- dimensional feature space with a nonlinear function. 214 8.13 The separating hyperplane obtained by mapping the two-dimensional data of the input space to the higher- dimensional feature space yields a nonlinear discrimi- nant function in the input space. The black solid data indicate support vectors. 216 8.14 Nonlinear decision boundaries in the input space vary with different values σ in the Gaussian kernel; (a) σ = 10, (b) σ = 1, (c) σ = 0.1, and (d) σ = 0.01. 221 9.1 Projection onto three different axes, (a), (b), and (c) and the spread of the data. 226 9.2 Eigenvalue problem and the first and second principal components. 230 9.3 Principal components based on the sample correlation matrix and their contributions: The contribution of the first principal component increases with increasing correlation between the two variables. 237 9.4 Two-dimensional view of the 21-dimensional data set, projected onto the first (x) and second (y) principal components. 239 9.5 Image digitization of a handwritten character. 240 9.6 The images obtained by first digitizing and compressing the leftmost image 7 and then decompressing transmitted data using a successively increasing number of principal components. The number in parentheses shows the cumulative contribution rate in each case. 242 9.7 Mapping the observed data with nonlinear structure to a higher-dimensional feature space, where PCA is performed with linear combinations of variables z1, z2, z3. 250 10.1 Intercluster distances: Single linkage (minimum dis- tance), complete linkage (maximum distance), average linkage, centroid linkage. 262 10.2 Cluster formation process and the corresponding den- drogram based on single linkage when starting from the distance matrix in (10.7). 265
  • 24. LIST OF FIGURES xix 10.3 The dendrograms obtained for a single set of 72 six- dimensional data using three different linkage tech- niques: single, complete, and centroid linkages. The circled portion of the dendrogram shows a chaining effect. 266 10.4 Fusion-distance monotonicity (left) and fusion-distance inversion (right). 267 10.5 Stepwise cluster formation procedure by Ward’s method and the related dendrogram. 271 10.6 Stepwise cluster formation process by k-means. 272 10.7 The competitive layer comprises an array of m nodes. Each node is assigned a different weight vector wj = (wj1, wj2, · · · , wjp)T ( j = 1, 2, · · · , m), and the Euclidean distance of each p-dimensional data to the weight vector is computed. 274 10.8 Histogram based on observed data on the speed of recession from Earth of 82 galaxies scattered in space. 276 10.9 Recession-speed data observed for 82 galaxies are shown on the upper left and in a histogram on the upper right. The lower left and lower right show the models obtained by fitting with two and three normal distributions, respectively. 279
  • 26. List of Tables 2.1 The length of a spring under different weights. 16 2.2 The n observed data. 17 2.3 Four factors: temperature (x1), pressure (x2), PH (x3), and catalyst quantity (x4), which affect the quantity of product (y). 23 2.4 The response y representing the results in n trials, each with a different combination of p predictor variables x1, x2, · · ·, xp. 23 2.5 Comparison of the sum of squared residuals (σ̂2 ) divided by the number of observations, maximum log-likelihood (β̂), and AIC for each combination of predictor variables. 33 2.6 Comparison of the estimates of regression coefficients by least squares (LS) and lasso L1. 44 4.1 Stimulus levels and the proportion of individuals re- sponded. 88 5.1 Comparison of the values of RSS, CV, and AIC for fitting the polynomial models of order 1 through 9. 119 6.1 The 23 two-dimensional observed data from the varieties A and B. 138 6.2 Comparison of prediction error estimates for the clas- sification rule constructed by the linear discriminant function. 161 6.3 Variable selection via the apparent error rates (APE). 161 xxi
  • 28. Preface The aim of statistical science is to develop the methodology and the the- ory for extracting useful information from data and for reasonable infer- ence to elucidate phenomena with uncertainty in various fields of the nat- ural and social sciences. The data contain information about the random phenomenon under consideration and the objective of statistical analysis is to express this information in an understandable form using statisti- cal procedures. We also make inferences about the unknown aspects of random phenomena and seek an understanding of causal relationships. Multivariate analysis refers to techniques used to analyze data that arise from multiple variables between which there are some relation- ships. Multivariate analysis has been widely used for extracting useful in- formation and patterns from multivariate data and for understanding the structure of random phenomena. Techniques would include regression, discriminant analysis, principal component analysis, clustering, etc., and are mainly based on the linearity of observed variables. In recent years, the wide availability of fast and inexpensive com- puters enables us to accumulate a huge amount of data with complex structure and/or high-dimensional data. Such data accumulation is also accelerated by the development and proliferation of electronic measure- ment and instrumentation technologies. Such data sets arise in various fields of science and industry, including bioinformatics, medicine, phar- maceuticals, systems engineering, pattern recognition, earth and environ- mental sciences, economics, and marketing. Therefore, the effective use of these data sets requires both linear and nonlinear modeling strategies based on the complex structure and/or high-dimensionality of the data in order to perform extraction of useful information, knowledge discovery, prediction, and control of nonlinear phenomena and complex systems. The aim of this book is to present the basic concepts of various pro- cedures in traditional multivariate analysis and also nonlinear techniques for elucidation of phenomena behind observed multivariate data, focus- ing primarily on regression modeling, classification and discrimination, dimension reduction, and clustering. Each chapter includes many figures xxiii
  • 29. xxiv PREFACE and illustrative examples to promote a deeper understanding of various techniques in multivariate analysis. In practice, the need always arises to search through and evaluate a large number of models and from among them select an appropriate model that will work effectively for elucidation of the target phenom- ena. This book provides comprehensive explanations of the concepts and derivations of the AIC, BIC, and related criteria, together with a wide range of practical examples of model selection and evaluation criteria. In estimating and evaluating models having a large number of predictor variables, the usual methods of separating model estimation and evalu- ation are inefficient for the selection of factors affecting the outcome of the phenomena. The book also reflects these aspects, providing various regularization methods, including the L1 norm regularization that gives simultaneous model estimation and variable selection. The book is written in the hope that, through its fusion of knowl- edge gained in leading-edge research in statistical multivariate analysis, machine learning, and computer science, it may contribute to the un- derstanding and resolution of problems and challenges in this field of research, and to its further advancement. This book might be useful as a text for advanced undergraduate and graduate students in statistical sciences, providing a systematic descrip- tion of both traditional and newer techniques in multivariate analysis and machine learning. In addition, it introduces linear and nonlinear statisti- cal modeling for researchers and practitioners in various scientific disci- plines such as industrial and systems engineering, information science, and life science. The basic prerequisites for reading this textbook are knowledge of multivariate calculus and linear algebra, though they are not essential as it includes a self-contained introduction to theoretical results. This book is basically a translation of a book published in Japanese by Iwanami Publishing Company in 2010. I would like to thank Uichi Yoshida and Nozomi Tsujimura of the Iwanami Publishing Company for giving me the opportunity to translate and publish in English. I would like to acknowledge with my sincere thanks Yasunori Fu- jikoshi, Genshiro Kitagawa, and Nariaki Sugiura, from whom I have learned so much about the seminal ideas of statistical modeling. I have been greatly influenced through discussions with Tomohiro Ando, Yuko Araki, Toru Fujii, Seiya Imoto, Mitsunori Kayano, Yoshihiko Maesono, Hiroki Masuda, Nagatomo Nakamura, Yoshiyuki Ninomiya, Ryuei Nishii, Heewon Park, Fumitake Sakaori, Shohei Tateishi, Takahiro Tsuchiya, Masayuki Uchida, Takashi Yanagawa, and Nakahiro Yoshida.
  • 30. xxv I would also like to express my sincere thanks to Kei Hirose, Shuichi Kawano, Hidetoshi Matsui, and Toshihiro Misumi for reading the manuscript and offering helpful suggestions. David Grubbs patiently en- couraged and supported me throughout the final preparation of this book. I express my sincere gratitude to all of these people. Sadanori Konishi Tokyo, January 2014
  • 32. Chapter 1 Introduction The highly advanced computer systems and progress in electronic mea- surements and instrumentation technologies have together facilitated the acquisition and accumulation of data with complex structure and/or high- dimensional data in various fields of science and industry. Data sets arise in such areas as genome databases in life science, remote-sensing data from earth-observing satellites, real-time recorded data of motion pro- cess in system engineering, high-dimensional data in character recogni- tion, speech recognition, image analysis, etc. Hence, it is desirable to re- search and develop new statistical data analysis techniques to efficiently extract useful information as well as elucidate patterns behind the data in order to analyze various phenomena and to yield knowledge discovery. Under the circumstances linear and nonlinear multivariate techniques are rapidly developing by fusing the knowledge in statistical science, ma- chine learning, information science, and mathematical science. The objective of this book is to present the basic concepts of vari- ous procedures in the traditional multivariate analysis and also nonlinear techniques for elucidation of phenomena behind the observed multivari- ate data, using many illustrative examples and figures. In each chapter, starting from an understanding of the traditional multivariate analysis based on the linearity of multivariate observed data, we describe nonlin- ear techniques, focusing primarily on regression modeling, classification and discrimination, dimension reduction, and clustering. 1.1 Regression Modeling Regression analysis is used to model the relationship between a response variable and several predictor (explanatory) variables. Once a model has been identified, various forms of inferences such as prediction, control, information extraction, knowledge discovery, and risk evaluation can be done within the framework of deductive argument. Thus, the key to solv- ing various real-world problems lies in the development and construction of suitable linear and nonlinear regression modeling. 1
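  For illustration, the following minimal Python sketch fits a linear regression model with two predictor variables by least squares and then uses the identified model for prediction. The data and coefficient values are synthetic, arbitrary choices added here for illustration only.

  import numpy as np

  # Synthetic data: a response y driven by two predictor variables x1 and x2
  # (arbitrary illustrative values, not data from the text).
  rng = np.random.default_rng(0)
  n = 50
  x1 = rng.uniform(0.0, 10.0, n)
  x2 = rng.uniform(0.0, 5.0, n)
  y = 3.0 + 1.2 * x1 - 0.8 * x2 + rng.normal(0.0, 1.0, n)

  # Least squares fit of y = beta0 + beta1*x1 + beta2*x2 + error.
  X = np.column_stack([np.ones(n), x1, x2])
  beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
  print("estimated coefficients (beta0, beta1, beta2):", beta_hat)

  # Once the model has been identified, prediction for a new case is immediate.
  x_new = np.array([1.0, 6.5, 2.0])   # intercept term, x1, x2
  print("predicted response:", x_new @ beta_hat)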
  • 33. 2 INTRODUCTION 1.1.1 Regression Models Housing prices vary with land area and floor space, but also with proxim- ity to stations, schools, and supermarkets. The quantity of chemical prod- ucts is sensitive to temperature, pressure, catalysts, and other factors. In Chapter 2, using linear regression models, which provide a method for relating multiple factors to the outcomes of such phenomena, we de- scribe the basic concept of regression modeling, including model spec- ification based on data reflecting the phenomena, model estimation of the specified model by least squares or maximum likelihood methods, and model evaluation of the estimated model. Throughout this modeling process, we select a suitable one among competing models. The volume of extremely high-dimensional data that are observed and entered into databases in biological, genomic, and many other fields of science has grown rapidly in recent years. For such data, the usual methods of separating model estimation and evaluation are ineffectual for the selection of factors affecting the outcome of the phenomena, and thus effective techniques are required to construct models with high re- liability and prediction. This created a need for work on modeling and has led, in particular, to the proposal of various regularization methods with an L1 penalty term (the sum of absolute values of regression co- efficients), in addition to the sum of squared errors and log-likelihood functions. A distinctive feature of the proposed methods is their capabil- ity for simultaneous model estimation and variable selection. Chapter 2 also describes various regularization methods, including ridge regression (Hoerl and Kennard, 1970) and the least absolute shrinkage and selection operator (lasso) proposed by Tibshirani (1996), within the framework of linear regression models. Figure 1.1 shows the results of an experiment performed to investi- gate the relation between falling time (x sec) and falling distance (y m) of a body. The figure suggests that it should be possible to model the rela- tion using a polynomial. There are many phenomena that can be modeled in this way, using polynomial equations, exponential functions, or other specific nonlinear functions to relate the outcome of the phenomenon and the factors influencing that outcome. Figure 1.2, however, poses new difficulties. It shows the measured impact y (in acceleration, g) on the head of a dummy in repeated experi- mental crashes of a motorcycle into a wall, with a time lapse of x (msec) as measured from the instant of collision (Härdle, 1990). For phenom- ena with this type of apparently complex nonlinear structure, it is quite difficult to effectively capture the structure by modeling with specific
  • 34. REGRESSION MODELING 3 Figure 1.1 The relation between falling time (x sec) and falling distance (y m) of a body. nonlinear functions such as polynomial equations and exponential functions. Chapter 3 discusses nonlinear regression modeling for extracting useful information from data containing complex nonlinear structures. It introduces models based on more flexible splines, B-splines, and radial basis functions for modeling complex nonlinear structures. These models often serve to ascertain complex nonlinear structures, but their flexibility often prevents their effective function in the estimation of models with the traditional least squares and maximum likelihood methods. In such cases, these estimation methods are replaced by regularized least squares and regularized maximum likelihood methods. The latter two techniques, which are generally referred to as regularization methods, are effectively used to reduce over-fitting of models to data and thus prevent excessive model complexity, and are known to contribute to reducing the variability of the estimated models. This chapter also describes regularization methods within the framework of nonlinear regression modeling.
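  The flavor of these techniques can be conveyed by a minimal Python sketch. The data are synthetic, and the Gaussian basis centers, their width, and the regularization parameter gamma are arbitrary choices for illustration: a curve is fitted by regularized least squares on a Gaussian basis expansion.

  import numpy as np

  # Synthetic data with a nonlinear structure (illustrative only).
  rng = np.random.default_rng(1)
  x = np.sort(rng.uniform(0.0, 3.0, 100))
  y = np.exp(-x) * np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, x.size)

  # Gaussian (radial) basis functions centered on an equally spaced grid.
  centers = np.linspace(0.0, 3.0, 12)
  width = centers[1] - centers[0]

  def design_matrix(t):
      B = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2.0 * width ** 2))
      return np.hstack([np.ones((t.size, 1)), B])   # intercept + basis functions

  X = design_matrix(x)

  # Regularized least squares: minimize ||y - Xw||^2 + gamma * ||w||^2.
  gamma = 0.01
  w_hat = np.linalg.solve(X.T @ X + gamma * np.eye(X.shape[1]), X.T @ y)

  # Fitted curve evaluated on a fine grid.
  x_grid = np.linspace(0.0, 3.0, 200)
  y_fit = design_matrix(x_grid) @ w_hat

  A small gamma lets the fitted curve follow the data closely, while a larger gamma yields a smoother and more stable estimate; choosing such a tuning parameter is part of the model evaluation and selection problem taken up in Chapter 5.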
  • 35. 4 INTRODUCTION Figure 1.2 The measured impact y (in acceleration, g) on the head of a dummy in repeated experimental crashes of a motorcycle with a time lapse of x (msec). 1.1.2 Risk Models In today’s society marked by complexity and uncertainty, we live in a world exposed to various types of risks. The risk may be associated with occurrences such as traffic accidents, natural disasters such as earthquakes, tsunamis, or typhoons, or development of a lifestyle disease, with transactions such as credit card issuance, or with many other occurrences too numerous to enumerate. It is possible to gauge the magnitude of risk in terms of probability based on past experience and information gained in life in society, but often with only a limited accuracy. All of this poses the question of how to probabilistically assess unknown risks for a phenomenon using information obtained from data. For example, in searching for the factors that induce a certain disease, the problem is in how to construct a model for assessing the probability of its occurrence based on observed data. The effective probabilistic model for assessing the risk may lead to its future prevention. Through such risk modeling, moreover, it may also be possible to identify important disease-related factors. Chapter 4 presents an answer to this question, in the form of model-
  • 36. REGRESSION MODELING 5 ing for the risk evaluation, and in particular describes the basic concept of logistic regression modeling, together with its extension from linear to nonlinear modeling. This includes models to assess risks based on binary data {0, 1} expressing the presence or absence of response in an individual or object on exposure to various levels of stimulus, as shown in Figure 1.3. Figure 1.3 Binary data {0, 1} expressing the presence or absence of response in an individual on exposure to various levels of stimulus. 1.1.3 Model Evaluation and Selection Figure 1.4 shows a process consisting essentially of the conceptualization of regression modeling; the specification of models that approximate the structure of a phenomenon, the estimation of their parameters, and the evaluation and selection of estimated models. In relation to the data shown in Figure 1.1 for a body dropped from a high position, for example, it is quite natural to consider a polynomial model for the relation between the falling time and falling distance and to carry out polynomial model fitting. This represents the processes of model specification and parameter estimation. For elucidation of this
  • 37. 6 INTRODUCTION Figure 1.4 Regression modeling; the specification of models that approximates the structure of a phenomenon, the estimation of their parameters, and the eval- uation and selection of estimated models. physical phenomenon, however, a question may remain as to the opti- mum degree of the polynomial model. In the prediction of housing prices with linear regression models, moreover, a key question is what factors to include in the model. Furthermore, in considering nonlinear regression models, one is confronted by the availability of infinite candidate models for complex nonlinear phenomena controlled by smoothing parameters, and the need for selection of models that will appropriately approximate the structures of the phenomena, which is essential for their elucidation. In this way, the need always arises to search through and evaluate a large number of models and from among them select one that will work effectively for elucidation of the target phenomena, based on the information provided by the data. This is commonly referred to as the model evaluation and selection problem. Chapter 5 focuses on the model evaluation and selection problems, and presents various model selection criteria that are widely used as in- dicators in the assessment of the goodness of a model. It begins with a description of evaluation criteria proposed as estimators of prediction error, and then discusses the AIC (Akaike information criterion) based
  • 38. CLASSIFICATION AND DISCRIMINATION 7 on Kullback-Leibler information and the BIC (Bayesian information cri- terion) derived from a Bayesian view point, together with fundamental concepts that serve as the bases for derivation of these criteria. The AIC, proposed in 1973 by Hirotugu Akaike, is widely used in various fields of natural and social sciences and has contributed greatly to elucidation, prediction, and control of phenomena. The BIC was pro- posed in 1978 by Gideon E. Schwarz and is derived based on a Bayesian approach rather than on information theory as with the AIC, but like the AIC it is utilized throughout the world of science and has played a cen- tral role in the advancement of modeling. Chapters 2 to 4 of this book show the various forms of expression of the AIC for linear, nonlinear, logistic, and other models, and give examples for model evaluation and selection problems based on the AIC. Model selection from among candidate models constructed on the basis of data is essentially the selection of a single model that best ap- proximates the data-generated probability structure. In Chapter 5, the discussion is further extended to include the concept of multimodel in- ference (Burnham and Anderson, 2002) in which the inferences are based on model aggregation and utilization of the relative importance of con- structed models in terms of their weighted values. 1.2 Classification and Discrimination Classification and discrimination techniques are some of the most widely used statistical tools in various fields of natural and social sciences. The primary aim in discriminant analysis is to assign an individual to one of two or more classes (groups) on the basis of measurements on fea- ture variables. It is designed to construct linear and nonlinear decision boundaries based on a set of training data. 1.2.1 Discriminant Analysis When a preliminary diagnosis concerning the presence or absence of a disease is made on the basis of data from blood chemistry analysis, information contained in the blood relating to the disease is measured, assessed, and acquired in the form of qualitative data. The diagnosis of normality or abnormality is based on multivariate data from several test results. In other words, it is an assessment of whether the person exam- ined is included in a group consisting of normal individuals or a group consisting of individuals who exhibit a disease-related abnormality. This kind of assessment can be made only if information from test re-
  • 39. 8 INTRODUCTION sults relating to relevant groups is understood in advance. In other words, because the patterns shown by test data for normal individuals and for individuals with relevant abnormalities are known in advance, it is pos- sible to judge which group a new individual belongs to. Depending on the type of disease, the group of individuals with abnormalities may be further divided into two or more categories, depending on factors such as age and progression, and diagnosis may therefore involve assignment of a new individual to three or more target groups. In the analytical method referred to as discriminant analysis, a statistical formulation is derived for this type of problem and statistical techniques are applied to provide a diagnostic formula. The objective of discriminant analysis is essentially to find an ef- fective rule for classifying previously unassigned individuals to two or more predetermined groups or classes based on several measurements. The discriminant analysis has been widely applied in many fields of sci- ence, including medicine, life sciences, earth and environmental science, biology, agriculture, engineering, and economics, and its application to new problems in these and other fields is currently under investigation. The basic concept of discriminant analysis was introduced by R. A. Fisher in the 1930s. It has taken its present form as a result of subsequent research and refinements by P. C. Mahalanobis, C. R. Rao, and others, centering in particular on linear discriminant analysis. Chapter 6 begins with discussion and examples relating to the basic concept of Fisher and Mahalanobis for two-class linear and quadratic discrimination. It next proceeds to the basic concepts and concrete examples of multiclass dis- crimination, and canonical discriminant analysis of multiclass data in higher-dimensional space, which enables visualization by projection to lower-dimensional space. 1.2.2 Bayesian Classification The advance in measurement and instrumentation technologies has en- abled rapid growth in the acquisition and accumulation of various types of data in science and industry. It has been accompanied by rapidly de- veloping research on nonlinear discriminant analysis for extraction of information from data with complex structures. Chapter 7 provides a bridge from linear to nonlinear classification based on Bayes’ theorem, incorporating prior information into a mod- eling process. It discusses the concept of a likelihood of observed data using the probability distribution, and then describes Bayesian proce- dures for linear and quadratic classification based on a Bayes factor. The
  • 40. CLASSIFICATION AND DISCRIMINATION 9 discussion then proceeds to construct linear and nonlinear logistic dis- criminant procedures, which utilizes the Bayesian classification and links it to the logistic regression model. 1.2.3 Support Vector Machines Research in character recognition, speech recognition, image analysis, and other forms of pattern recognition is advancing rapidly, through the fusion of machine learning, statistics, and computer science, leading to the proposal of new analytical methods and applications (e.g., Hastie, Tibshirani and Friedman, 2009; Bishop, 2006). One of these is an ana- lytical method employing the support vector machine described in Chap- ter 8, which constitutes a classification method that is conceptually quite different from the statistical methods. It has therefore come to be used in many fields as a method that can be applied to classification problems that are difficult to analyze effectively by previous classification meth- ods, such as those based on high-dimensional data. The essential feature of the classification method with the support vector machine is, first, establishment of the basic theory in a context of two perfectly separated classes, followed by its extension to linear and nonlinear methods for the analysis of actual data. Figure 1.5 (left) shows that the training data of the two classes are completely separable by a hyperplane (in the case of two dimensions, a straight line segment). In an actual context requiring the use of the classification method, as shown in Figure 1.5 (right), the overlapping data of the two classes may not be separable by a hyperplane. With the support vector machine, mapping of the observed data to high-dimensional space is used for the extension from linear to nonlin- ear analysis. In diagnoses for the presence or absence of a given disease, for example, increasing the number of test items may have the effect of increasing the separation between the normal and abnormal groups and thus facilitate the diagnoses. The basic concept is the utilization of such a tendency, where possible, to map the observed data to a high- dimensional feature space and thus obtain a hyperplane that separates the two classes, as illustrated in Figure 1.6. An extreme increase in computa- tional complexity in the high-dimensional space can also be surmounted by utilization of a kernel method. Chapter 8 provides a step-by-step de- scription of this process of extending the support vector machine from linear to nonlinear analysis.
  • 41. 10 INTRODUCTION Figure 1.5 The training data of the two classes are completely separable by a hyperplane (left) and the overlapping data of the two classes may not be separable by a hyperplane (right). Figure 1.6 Mapping the observed data to a high-dimensional feature space and obtaining a hyperplane that separates the two classes. (The panels depict an input space and a high-dimensional feature space: data in the input space are transformed to points in a higher-dimensional feature space so that the transformed data become linearly separable.)
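  As a small illustration of these ideas, the following Python sketch fits a soft-margin support vector machine with a Gaussian (RBF) kernel, which implicitly performs the mapping to a high-dimensional feature space. The two-class data are synthetic, scikit-learn is used only as a convenient tool, and the values of C and gamma are arbitrary choices for illustration.

  import numpy as np
  from sklearn.datasets import make_moons
  from sklearn.svm import SVC

  # Synthetic, linearly nonseparable two-class data.
  X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

  # Soft-margin SVM with a Gaussian (RBF) kernel: C controls the softness of
  # the margin, gamma the width of the kernel.
  clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
  clf.fit(X, y)

  print("number of support vectors per class:", clf.n_support_)
  print("training accuracy:", clf.score(X, y))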
  • 42. DIMENSION REDUCTION 11 1.3 Dimension Reduction Multivariate analysis generally consists of ascertaining the features of individuals as a number of variables and constructing new variables by their linear combination. In this process the procedures in multivariate analysis have been proposed on the basis of the different type of crite- rion employed. Principal component analysis can be regarded as a pro- cedure of capturing the information contained in the data by the system variability, and of defining a smaller set of variables with the minimum possible loss of information. This reduction is achieved by employing an orthogonal transformation to convert a large number of correlated vari- ables into a smaller number of uncorrelated variables, called principal components. Principal components enable the extraction of information of interest from the data. Principal component analysis can also be used as a technique for performing dimension compression (i.e., dimension reduction), thus re- ducing a large number of variables to a smaller number and enabling 1D (line), 2D (plane), and 3D (space) projections amenable to intuitive understanding and visual discernment of data structures. In fields such as pattern recognition, image analysis, and signal processing, the tech- nique is referred to as Karhunen-Loève expansion and is also utilized for dimension compression. Chapter 9 begins with a discussion of the basic concept of principal component analysis based on linearity. It next provides an example of the application of principal component analysis to the performance of dimension compression in transmitted image data reproduction. It also discusses nonlinear principal component analysis for multivariate data with complex structure dispersed in a high-dimensional space, using a kernel method to perform structure searching and information extraction through dimension reduction. 1.4 Clustering The main purpose of discriminant analysis is to construct a discriminant rule and predict the membership of future data among multiple classes, based on the membership of known data (i.e., “training data”). In cluster analysis, in contrast, as described in Chapter 10, the purpose is to divide data into aggregates (“data clusters”) with their similarity as the criterion in cases involved a mixture of data of uncertain class membership. Cluster analysis is useful, for example, for gaining an understanding of complex relations among objects, through its grouping by the similar- ity criterion of objects with attached features representing multidimen-
  • 43. 12 INTRODUCTION sional properties. Its range of applications is extremely wide, extend- ing from ecology, genetics, psychology, and cognitive science to pattern recognition in document classification, speech and image processing, and throughout the natural and social sciences. In life sciences, in particular, it serves as a key method for elucidation of complex networks of genetic information. 1.4.1 Hierarchical Clustering Methods Hierarchical clustering essentially consists of linking target data that are mutually similar, proceeding in the stepwise formation from small to large clusters in units of the smallest cluster. It is characterized by the generation of readily visible tree diagrams called dendrograms through- out the clustering process. Figure 1.7 shows 72 chemical substances with 6 attached features, classified by clustering on the basis of mutual sim- ilarity in substance qualities. The lower portions of the interconnected tree represent higher degrees of similarity in qualities, and as may be seen from Figure 1.7, small clusters interlink to form larger clusters pro- ceeding up the dendrogram. The utilization of cluster analysis enables classification of large quantities of data dispersed throughout higher- dimension space, which is not intuitively obvious, into collections of mutually similar objects. Chapter 10 provides discussion and illustrative examples of represen- tative hierarchical clustering methods, including the nearest-neighbor, farthest-neighbor, group-average, centroid, median, and Ward’s methods, together with the process and characteristics of their implementation and the basic prerequisites for their application. 1.4.2 Nonhierarchical Clustering Methods In contrast to hierarchical clustering, with its stepwise formation of in- creasingly large clusters finally ending in the formation of a single large cluster, nonhierarchical clustering methods essentially consist of divid- ing the objects into a predetermined number of clusters. One represen- tative method is k-means clustering, which is used in cases where large- scale data classification and hierarchical structure elucidation are unnec- essary. Another is the self-organizing map, which is a type of neural net- work proposed by T. Kohonen. It is characterized by the projection of high-dimensional objects to a two-dimensional plane and visualization of collections of similar objects by coloring, and its application is under investigation in many fields.
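  The difference between the two approaches can be sketched in a few lines of Python. The six-dimensional data below are synthetic, SciPy and scikit-learn are used only as convenient tools, and the choice of three clusters is arbitrary: Ward’s hierarchical method builds a tree that is then cut at three clusters, while k-means divides the objects directly into a predetermined number of clusters.

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster
  from sklearn.cluster import KMeans

  # Synthetic six-dimensional objects scattered around three centers.
  rng = np.random.default_rng(2)
  X = np.vstack([rng.normal(loc=c, scale=0.5, size=(24, 6))
                 for c in (0.0, 2.0, 4.0)])

  # Hierarchical clustering (Ward's method); cutting the tree gives the labels.
  Z = linkage(X, method="ward")
  hier_labels = fcluster(Z, t=3, criterion="maxclust")
  print("Ward cluster sizes:", np.bincount(hier_labels)[1:])

  # Nonhierarchical clustering: k-means with a predetermined number of clusters.
  km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
  print("k-means cluster sizes:", np.bincount(km.labels_))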
  • 44. CLUSTERING 13 Figure 1.7 72 chemical substances with 6 attached features, classified by clus- tering on the basis of mutual similarity in substance qualities. Chapter 10 describes clustering by k-means and self-organizing map techniques, and the processes of their implementation. It also shows a clustering method that utilizes a mixture distribution model formed by combining several probability distributions, together with examples of its application and model estimation. The appendix contains an outline of the bootstrap methods, La- grange’s method of undetermined multipliers, and the EM algorithm, all of which are used in this book.
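  The mixture-model approach mentioned above can be sketched in the same illustrative spirit. The one-dimensional data below are synthetic, standing in for the galaxy recession-speed example of Chapter 10, and scikit-learn's GaussianMixture, which implements the EM algorithm, is used only as a convenient tool.

  import numpy as np
  from sklearn.mixture import GaussianMixture

  # Synthetic one-dimensional data with two apparent groups.
  rng = np.random.default_rng(3)
  speeds = np.concatenate([rng.normal(10.0, 1.0, 60),
                           rng.normal(20.0, 2.0, 25)]).reshape(-1, 1)

  # A two-component Gaussian mixture fitted by the EM algorithm; each
  # observation is then assigned to the component with the largest
  # posterior probability.
  gmm = GaussianMixture(n_components=2, random_state=0).fit(speeds)
  labels = gmm.predict(speeds)
  print("component means:", gmm.means_.ravel())
  print("mixing weights:", gmm.weights_)
  print("cluster sizes:", np.bincount(labels))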
  • 46. Chapter 2 Linear Regression Models The modeling of natural and social phenomena from related data plays a fundamental role in their explication, prediction, and control, and in new discoveries, knowledge, and understanding of these phenomena. Models that link the outcomes of phenomena to multiple factors that affect them are generally referred to as regression models. In regression modeling, the model construction essentially proceeds through a series of processes consisting of: (1) assuming models based on observed data thought to affect the phenomenon; (2) estimating the parameters of the specified models; and (3) evaluating the estimated models for selection of the optimum model. In this chapter, we consider the fundamental concepts of modeling as embodied in linear regression modeling, the most basic form of modeling for the explication of rela- tionships between variables. Regression models are usually estimated by least squares or max- imum likelihood methods. In cases involving multicollinearity among predictor variables, the ridge regression is used to prevent instability in estimated linear regression models. In estimating and evaluating models having a large number of predictor variables, the usual methods of sepa- rating model estimation and evaluation are inefficient for the selection of factors affecting the outcome of the phenomena. In such cases, regular- ization methods with L1 norm penalty, in addition to the sum of squared errors and log-likelihood functions, provide a useful tool for effective re- gression modeling based on high-dimensional data. This chapter also de- scribes ridge regression, lasso, and various regularization methods with L1 norm penalty. 2.1 Relationship between Two Variables In this section, we describe the basic method of explicating the relation- ship between a variable that represents the outcome of a phenomenon and a variable suspected of affecting this outcome, based on observed data. The relationship used in our example is actually Hooke’s well-known 15
2.1 Relationship between Two Variables

In this section, we describe the basic method of explicating the relationship between a variable that represents the outcome of a phenomenon and a variable suspected of affecting this outcome, based on observed data. The relationship used in our example is actually Hooke's well-known law of elasticity, which states, essentially, that a spring changes shape under an applied force and that within the spring's limit of elasticity the change is proportional to the force.

2.1.1 Data and Modeling

Table 2.1  The length of a spring under different weights.

    x (g)    5     10    15    20    25    30    35    40    45    50
    y (cm)   5.4   5.7   6.9   6.4   8.2   7.7   8.4   10.1  9.9   10.5

Table 2.1 shows ten observations obtained by measuring the length of a spring (y cm) under different weights (x g). The data are plotted in Figure 2.1. The plot suggests a straight-line relationship between the two variables of spring length and suspended weight. If the measurements were completely free from error, all of the data points might actually lie in a straight line. As shown in Figure 2.1, measurement data generally include errors, commonly referred to as noise, and modeling is therefore required to explicate the relationship between variables. To find the relationship between the two variables of spring length (y) and weight (x) from the data including the measurement errors, let us therefore attempt the modeling based on an initially unknown function y = u(x).

We first consider a more specific expression for the unknown function u(x) that represents the true structure of the spring phenomenon. The data plot, as well as our a priori knowledge that the function should be linear, suggests that the function should describe a straight line. We therefore adopt a linear model as our specified model, so that

    y = u(x) = β0 + β1x.    (2.1)

We then attempt to apply this linear model in order to explicate the relationship between the spring length (y) and the weight (x) as a physical phenomenon.

If there were no errors in the data shown in Table 2.1, then all 10 data points would lie on a straight line with an appropriately selected intercept (β0) and slope (β1). Because of measurement errors, however, many of the actual data points will depart from any straight line. To include consideration for this departure (ε) from a straight line by data points obtained with different weights, we therefore assume that they satisfy
the relation

    Spring length = β0 + β1 × Weight + Error.    (2.2)

For the individual data points, we then have 5.4 = β0 + 5β1 + ε1, · · ·, 8.2 = β0 + 25β1 + ε5, · · ·. Figure 2.2 illustrates the relationship considering the fifth data point (25, 8.2).

Figure 2.1  Data obtained by measuring the length of a spring (y cm) under different weights (x g).

Table 2.2  The n observed data.

    No.                      1     2     · · ·   i     · · ·   n
    Experiment points (x)    x1    x2    · · ·   xi    · · ·   xn
    Observed data (y)        y1    y2    · · ·   yi    · · ·   yn

In general, let us assume that measurements are performed for n experiment points, as in Table 2.2, and that a measurement at a given experiment point xi is yi. The general model corresponding to (2.2) is then

    yi = β0 + β1xi + εi,    i = 1, 2, · · · , n,    (2.3)
where β0 and β1 are regression coefficients, εi is the error term, and the equation in (2.3) is called the linear regression model. The variable y, which represents the length of the spring in the above experiment, is the response variable, and the variable x, which represents the weight in that experiment, is the predictor variable. Variables y and x are also often referred to as the dependent variable and the independent variable or the explanatory variable, respectively.

Figure 2.2  The relationship between the spring length (y) and the weight (x).

This brings us to the question of how to fit a straight line to observed data in order to obtain a model that appropriately expresses the data. It is essentially a question of how to determine the regression coefficients β0 and β1. Various model estimation procedures can be used to determine the appropriate parameter values. One of these is the method of least squares.

2.1.2 Model Estimation by Least Squares

The underlying concept of the linear regression model (2.3) is that the true value of the response variable at the i-th point xi is β0 + β1xi and that the observed value yi includes the error εi. The method of least squares
consists essentially of finding the values of regression coefficients β0 and β1 that minimize the sum of squared errors ε1² + ε2² + · · · + εn², which is expressed as

    S(\beta_0, \beta_1) \equiv \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2.    (2.4)

Differentiating (2.4) with respect to the regression coefficients β0 and β1, and setting the resulting derivatives equal to zero, we have

    \sum_{i=1}^{n} y_i = n\beta_0 + \beta_1 \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2.    (2.5)

The regression coefficients that minimize the sum of squared errors can be obtained by solving the above simultaneous equations. This solution is called the least squares estimates and is denoted by β̂0 and β̂1. The equation

    y = β̂0 + β̂1x,    (2.6)

having its coefficients determined by the least squares estimates, is the estimated linear regression model. We can thus find the model that best fits the data by minimizing the sum of squared errors.

The value of ŷi = β̂0 + β̂1xi at each xi (i = 1, 2, · · · , n) is called the predicted value. The difference between this value and the observed value yi at xi, ei = yi − ŷi, is called the residual, and the sum of the squares of the residuals is given by \sum_{i=1}^{n} e_i^2 (Figure 2.3).

Example 2.1 (Hooke's law of elasticity)  For the data shown in Table 2.1, the sum of squared errors in the linear regression model is

    S(\beta_0, \beta_1) = \{5.4 - (\beta_0 + 5\beta_1)\}^2 + \{5.7 - (\beta_0 + 10\beta_1)\}^2 + \cdots + \{10.5 - (\beta_0 + 50\beta_1)\}^2,

in which S(β0, β1) is the function of the regression coefficients β0, β1. The least squares estimates that minimize this function are β̂0 = 4.65 and β̂1 = 0.12, and the estimated linear regression model is therefore y = 4.65 + 0.12x. In this way, by modeling from a set of observed data, we have derived in approximation a physical law representing the relationship between the weight and the spring length.
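As a companion to Example 2.1, the following minimal sketch, which is not part of the book, solves the normal equations (2.5) numerically for the data in Table 2.1; NumPy is an assumed tool choice, and the printed estimates can be compared with the rounded values quoted above.

```python
# A minimal sketch (assumed tooling, not the book's code): least squares
# estimates for the spring data of Table 2.1 via the normal equations (2.5).
import numpy as np

x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)   # weight (g)
y = np.array([5.4, 5.7, 6.9, 6.4, 8.2, 7.7, 8.4, 10.1, 9.9, 10.5])   # spring length (cm)
n = len(x)

# Normal equations (2.5) in matrix form: A @ [b0, b1] = c
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
c = np.array([y.sum(), (x * y).sum()])
b0_hat, b1_hat = np.linalg.solve(A, c)

y_hat = b0_hat + b1_hat * x          # predicted values
residuals = y - y_hat                # e_i = y_i - predicted value
print(f"estimated model: y = {b0_hat:.2f} + {b1_hat:.3f} x")
print("residual sum of squares:", np.round((residuals ** 2).sum(), 3))
```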
Figure 2.3  Linear regression and the predicted values and residuals.

2.1.3 Model Estimation by Maximum Likelihood

In the least squares method, the regression coefficients are estimated by minimizing the sum of squared errors. Maximum likelihood estimation is an alternative method for the same purpose in which the regression coefficients are determined so as to maximize the probability of getting the observed data, for which it is assumed that yi observed at xi emerges in accordance with some type of probability distribution.

Figure 2.4 (a) shows a histogram of 80 measured values obtained while repeatedly suspending a load of 25 g from one end of a spring. Figure 2.4 (b) represents the errors (i.e., noise) contained in these measurements in the form of a histogram having its origin at the mean value of the measurements. This histogram clearly shows a region containing a high proportion of the obtained measured values. A mathematical model that approximates a histogram showing the probabilistic distribution of a phenomenon is called a probability distribution model.

Of the various distributions that may be adopted in probability distribution models, the most representative is the normal distribution (Gaussian distribution), which is expressed in terms of mean μ and variance σ² and denoted by N(μ, σ²). In the normal distribution model, the observed value yi at xi is regarded as the realization of the random variable Yi = yi, and Yi is normally distributed with mean μi and variance σ²:

    f(y_i \mid x_i; \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i - \mu_i)^2}{2\sigma^2} \right\},    (2.7)

where μi for a given xi is the conditional mean value (true value) E[Yi | xi] = u(xi) = μi of the random variable Yi.
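The following small simulation, not from the book, illustrates the normal error model behind (2.7): repeated measurements at a fixed weight scatter around the conditional mean, producing a histogram of the kind shown in Figure 2.4. All parameter values here are assumed purely for illustration.

```python
# A minimal sketch (assumed parameter values, not the book's code): repeated
# measurements at x = 25 g under the normal error model of (2.7).
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma = 4.69, 0.117, 0.4   # illustrative values, roughly matching the fitted spring model
x = 25.0
mu = beta0 + beta1 * x                   # conditional mean (true value) of the spring length at x = 25 g

y_rep = mu + rng.normal(scale=sigma, size=80)   # 80 repeated noisy measurements, as in Figure 2.4(a)
errors = y_rep - y_rep.mean()                   # errors measured from the sample mean, as in Figure 2.4(b)

counts, edges = np.histogram(y_rep, bins=8)
print("measurements cluster around mu =", round(mu, 2))
print("histogram counts:", counts)
print("error spread (SD):", round(errors.std(), 3))
```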
Figure 2.4  (a) Histogram of 80 measured values obtained while repeatedly suspending a load of 25 g, and its approximated probability model. (b) The errors (i.e., noise) contained in these measurements in the form of a histogram having its origin at the mean value of the measurements, and its approximated error distribution.

In the normal distribution, as may be clearly seen in Figure 2.4 (a), the proportion of measured values may be expected to decline sharply with increasing distance from the true value. In the linear regression model, it is assumed that the true values μ1, μ2, · · ·, μn at the various data points lie on a straight line, and it follows that for any given data point, μi = β0 + β1xi (i = 1, 2, · · · , n). Substitution into (2.7) thus yields

    f(y_i \mid x_i; \beta_0, \beta_1, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{\{ y_i - (\beta_0 + \beta_1 x_i) \}^2}{2\sigma^2} \right\}.    (2.8)

This function decreases with increasing deviation of the observed value yi from the true value β0 + β1xi. Assuming that the observed data yi around the true value β0 + β1xi at xi thus follow the probability distribution f(yi | xi; β0, β1, σ²), it is then an expression of the plausibility or certainty of the occurrence of a given value of yi, called the likelihood of yi. Assuming that the observed data y1, y2, · · ·, yn are mutually independent and identically distributed (i.i.d.), the likelihood with n data, and thus the plausibility with n specific data, is given by the product of the likelihoods of all observed data

    \prod_{i=1}^{n} f(y_i \mid x_i; \beta_0, \beta_1, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2 \right]
    \equiv L(\beta_0, \beta_1, \sigma^2).    (2.9)

Given the data {(xi, yi); i = 1, 2, · · · , n} in (2.9), the function L(β0, β1, σ²) of the parameters β0, β1, σ² is then the likelihood function. Maximum likelihood is a method of finding the parameter values that maximize this likelihood function, and the resulting estimates are called the maximum likelihood estimates. For ease of calculation, the maximum likelihood estimates are usually obtained by maximizing the log-likelihood function

    \ell(\beta_0, \beta_1, \sigma^2) \equiv \log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\{ y_i - (\beta_0 + \beta_1 x_i) \}^2.    (2.10)

The parameter values β̂0, β̂1, σ̂² that maximize the log-likelihood function are thus obtained by solving the equations

    \frac{\partial \ell(\beta_0, \beta_1, \sigma^2)}{\partial \beta_0} = 0, \quad \frac{\partial \ell(\beta_0, \beta_1, \sigma^2)}{\partial \beta_1} = 0, \quad \frac{\partial \ell(\beta_0, \beta_1, \sigma^2)}{\partial \sigma^2} = 0.    (2.11)

Specific solutions will be given in Section 2.2.2. The first term of the log-likelihood function defined in (2.10) does not depend on β0, β1, and the sign of the second term is always negative since σ² > 0. Accordingly, the values of the regression coefficients β0 and β1 that maximize the log-likelihood function are those that minimize

    \sum_{i=1}^{n} \{ y_i - (\beta_0 + \beta_1 x_i) \}^2.    (2.12)

With the assumption of a normal distribution model for the data, the maximum likelihood estimates of the regression coefficients are thus equivalent to the least squares estimates of the regression coefficients, that is, the minimizer of (2.4).
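To illustrate this equivalence numerically, the sketch below (not from the book; SciPy's general-purpose optimizer is an assumed tool) maximizes the log-likelihood (2.10) for the spring data and compares the resulting β̂0, β̂1 with the least squares estimates.

```python
# A minimal sketch (assumed tooling, not the book's code): maximizing the
# log-likelihood (2.10) numerically and comparing with least squares.
import numpy as np
from scipy.optimize import minimize

x = np.array([5, 10, 15, 20, 25, 30, 35, 40, 45, 50], dtype=float)
y = np.array([5.4, 5.7, 6.9, 6.4, 8.2, 7.7, 8.4, 10.1, 9.9, 10.5])
n = len(x)

def negative_log_likelihood(theta):
    b0, b1, log_sigma2 = theta                 # parametrize sigma^2 on the log scale for stability
    sigma2 = np.exp(log_sigma2)
    resid = y - (b0 + b1 * x)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(resid ** 2) / sigma2

result = minimize(negative_log_likelihood, x0=np.array([y.mean(), 0.0, 0.0]), method="Nelder-Mead")
b0_ml, b1_ml, sigma2_ml = result.x[0], result.x[1], np.exp(result.x[2])

# Least squares estimates for comparison; they should agree with the ML estimates of b0 and b1.
b1_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ls = y.mean() - b1_ls * x.mean()

print("maximum likelihood:", np.round([b0_ml, b1_ml], 3), " sigma^2 =", round(sigma2_ml, 4))
print("least squares:     ", np.round([b0_ls, b1_ls], 3))
```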
2.2 Relationships Involving Multiple Variables

In the case of explicating a natural or social phenomenon that may involve a number of factors, it is necessary to construct a model that links the outcome phenomena to the factors that cause it. In the case of a chemical experiment, the quantity (response variable, y) of the reaction product may be affected by temperature, pressure, pH, concentration, catalyst quantity, and various other factors. In this section, we discuss the models used to explicate the relationship between the response variable y and the various predictor variables x1, x2, · · · , xp that may explain this response.

2.2.1 Data and Models

Table 2.3  Four factors: temperature (x1), pressure (x2), pH (x3), and catalyst quantity (x4), which affect the quantity of product (y).

    No.    Product (g)    Temp.    Pressure    pH     Catalyst
    1      28.7           34.1     2.3         6.4    0.1
    2      32.4           37.8     2.5         6.8    0.3
    ...
    i      52.9           47.6     3.8         7.6    0.7
    ...
    86     65.8           52.6     4.8         7.8    1.1

Table 2.4  The response y representing the results in n trials, each with a different combination of p predictor variables x1, x2, · · ·, xp.

              Response variable    Predictor variables
    No.       y                    x1      x2      · · ·   xp
    1         y1                   x11     x12     · · ·   x1p
    2         y2                   x21     x22     · · ·   x2p
    ...
    i         yi                   xi1     xi2     · · ·   xip
    ...
    n         yn                   xn1     xn2     · · ·   xnp

Table 2.3 is a partial list of the observed data in 86 experimental trials with variations in four factors that affect the quantity of product formed by a chemical reaction. Table 2.4 shows the notation corresponding to the experimental data shown in Table 2.3, with response yi representing the results in n trials, each with a different combination of p predictor variables x1, x2, · · ·, xp. Thus, yi is observed as the result of the i-th experiment point (i.e., the i-th trial) (xi1, xi2, · · · , xip).

The objective is to construct a model from the observed data that appropriately links the product quantity y to the temperature, pressure, concentration, pH, catalyst quantity, and other factors involved in the reaction. From the concept of the linear regression model for two variables