4th International Summer School
    Achievements and Applications of Contemporary
    Informatics, Mathematics and Physics
    National University of Technology of the Ukraine
    Kiev, Ukraine, August 5-16, 2009



                               Regression Theory
                                      with
                          Additive Models and CMARS


                             Gerhard-Wilhelm Weber *
           Inci Batmaz, Gülser Köksal, Fatma Yerlikaya, Pakize Taylan **,
                      Elcin Kartal, Efsun Kürüm, Ayse Özmen

                         Institute of Applied Mathematics,
                  Middle East Technical University, Ankara, Turkey

      *  Faculty of Economics, Management and Law, University of Siegen, Germany
         Center for Research on Optimization and Control, University of Aveiro, Portugal
      ** Department of Mathematics, Dicle University, Turkey
Content

 •   Introduction, Motivation
 •   Regression
 •   Additive Models
 •   MARS
 •   PRSS for MARS
 •   CQP for MARS
 •   Tikhonov Regularization for MARS
 •   Numerical Experience and Comparison
 •   Research Extensions
 •   Conclusion
Introduction

 Learning from data has become very important
 in every field of science and technology, e.g., in

 •   financial sector,
 •   quality improvement in manufacturing,
 •   computational biology,
 •   medicine and
 •   engineering.

 Learning enables estimation and prediction.

 Regression is mainly based on the problems and methods of

 • least squares estimation,
 • maximum likelihood estimation
 • and classification.

 New tools for data analysis, based on nonparametric regression and smoothing:

 • additive (and multiplicative) models.
Introduction



               CART


                      vs.




                            MARS
Introduction

   Additive (and multiplicative) models (studied at IAM, METU):

• spline regression in additive models,
• spline regression in generalized additive models,

• MARS:
  piecewise linear (per dimension) regression in multiplicative models,

• spline regression for stochastic differential equations
   via additive and nonlinear models.
Regression:                 a Motivation


 One of the motivations of this research has been the approximation of financial data points
 (x,y), e.g., coming from

 •   the stock market,
 •   credit rating,
 •   economic factors,
 •   company properties.

 For example, to estimate the probability of default of a particular credit:
 • one of the last three kinds of data points above is used.

 There are different approaches for estimating the probability of a default.
 • Regression models (binary choice) are one of them.
 • For example, we assume that the dependent variable Y,
    with Y = 1 (“default”) or Y = 0 (“no default”), satisfies
                                     Y = F(X ) +ε ,

 X : vector of independent variable(s) (input) such as credit rating.
Regression:                 a Motivation


 •   Estimation for the default probability P,

                                  P = E [ F ( X ) + ε ] = F ( X ).

 •   Also, this estimation can be done via the following linear regression:

                                     Y = α + βΤ X + ε .
 •   An estimate for the default probability of a corporate bond can be obtained:

                                            P = α + βΤ X ;

 α and β are unknown parameters. They can be estimated via linear regression
 methods or maximum likelihood estimation. In many important cases, these just mean
 least squares estimation.
Regression

 Input vector X = ( X_1, X_2, ..., X_m )^T and output variable Y ;

 linear regression :

          Y = E( Y | X_1, ..., X_m ) + ε = β_0 + Σ_{j=1}^m X_j β_j + ε

 •    E(Y | X) is linear (...) and

 •    β = ( β_0, β_1, ..., β_m )^T minimizes

          RSS(β) := Σ_{i=1}^N ( y_i − x_i^T β )²

 or

          RSS(β) = ( y − Xβ )^T ( y − Xβ ) ,      β̂ = ( X^T X )^{-1} X^T y ,      Cov(β̂) = ( X^T X )^{-1} σ² .
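 •   As a quick illustration (a minimal sketch, not from the original slides; the data are hypothetical), the
     least-squares estimate β̂ = (XᵀX)⁻¹Xᵀy and its covariance can be computed directly:

```python
import numpy as np

# Hypothetical data: N = 50 observations, m = 3 inputs, intercept column prepended.
rng = np.random.default_rng(0)
N, m = 50, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, m))])
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=N)

# Least-squares estimate beta_hat = (X^T X)^{-1} X^T y
# (lstsq is numerically preferable to forming the inverse explicitly).
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Covariance estimate Cov(beta_hat) = (X^T X)^{-1} * sigma^2,
# with sigma^2 estimated from the residual sum of squares.
resid = y - X @ beta_hat
sigma2 = resid @ resid / (N - (m + 1))
cov_beta = np.linalg.inv(X.T @ X) * sigma2
print(beta_hat)
```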
Regression, Additive Models

 In the input space:



 •   Classical understanding:

     additive separation of variables

 •   New interpretation:

     separation of clusters and corresponding enumeration
Regression, Additive Models


 (A)        E( Y_i | x_i1, x_i2, ..., x_im ) = β_0 + Σ_{j=1}^m f_j( x_ij )

             f_j are estimated by a smoothing on a single coordinate.

             Standard convention at x_ij :    E( f_j( x_ij ) ) = 0 .


 •     Backfitting algorithm (Gauss-Seidel algorithm).
 •     This procedure depends on the partial residual against x_ij :

                      r_ij = y_i − β̂_0 − Σ_{k≠j} f_k( x_ik ) .
Regression, Additive Models


 •   Estimate each smooth function by holding all the other ones fixed.


      initialization:       β̂_0 := ave( y_i | i = 1, ..., N ),      f̂_j( x_ij ) ≡ 0   ∀ i, j

      cycle  j = 1, ..., m, 1, ..., m, 1, ...

                      r_ij = y_i − β̂_0 − Σ_{k≠j} f̂_k( x_ik ) ,     i = 1, ..., N

              f̂_j is updated by smoothing the partial residuals r_ij (i = 1, ..., N) against x_ij

      until the functions almost do not change.


 •   Convergence (condition)
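 •   A minimal sketch of this backfitting cycle (not the authors' code; the smoother S_j is a simple
     running-mean placeholder and the data are hypothetical):

```python
import numpy as np

def moving_average_smoother(x, r, window=7):
    """Placeholder smoother S_j: running mean of the partial residuals r, sorted by x."""
    order = np.argsort(x)
    smoothed = np.convolve(r[order], np.ones(window) / window, mode="same")
    out = np.empty_like(r)
    out[order] = smoothed
    return out

def backfit(X, y, n_cycles=20):
    """Backfitting (Gauss-Seidel) for the additive model y = beta0 + sum_j f_j(x_j) + eps."""
    N, m = X.shape
    beta0 = y.mean()                       # initialization: beta0 := ave(y_i)
    f = np.zeros((N, m))                   # f_j(x_ij) = 0 for all i, j
    for _ in range(n_cycles):
        for j in range(m):
            r_j = y - beta0 - f[:, [k for k in range(m) if k != j]].sum(axis=1)
            f[:, j] = moving_average_smoother(X[:, j], r_j)
            f[:, j] -= f[:, j].mean()      # convention E(f_j) = 0
    return beta0, f

# Hypothetical data
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(200, 2))
y = 1.0 + np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)
beta0, f = backfit(X, y)
```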
Regression, Additive Models


 •   Convergence of the backfitting:    T̂ f = f ,   where

          T̂_j : IR^{Nm} → IR^{Nm} ,
          ( f_1, ..., f_m )^T  ↦  ( f_1, ..., f_{j−1}, S_j( y − Σ_{k≠j} f_k ), f_{j+1}, ..., f_m )^T .

 •   Full cycle:   T̂ = T̂_m T̂_{m−1} ... T̂_1 ;  then, T̂^l corresponds to l full cycles.

 •   Backfitting always converges if all smoothers are symmetric and all eigenvalues of T̂
     are either +1 or in the interior of the unit ball: | λ | < 1 .
Regression, Generalized Additive Models

•   To extend the additive model to a wide range of distribution families:
    generalized additive models (GAM):




              G( µ(X) ) = ψ(X) = β_0 + Σ_{j=1}^m f_j( X_j ) ,        θ := ( β_0, f_1, ..., f_m )^T ,


 •   f_j are unspecified,   G : link function;

 •   f_j : elements of a finite-dimensional space consisting, e.g., of splines;


 •   spline orders (or degrees):
               suitably chosen, depending on the density and variation properties
               of the corresponding data in x and y components, respectively.

 •   The problem of specifying θ becomes a finite-dimensional parameter estimation problem.
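 •   As a small illustration of the link function G (a sketch, not from the slides): for a binary response
     such as default / no default, a logit link maps the additive predictor ψ(X) to a probability µ(X) ∈ (0, 1):

```python
import numpy as np

def logit(mu):
    """Link G(mu) = log(mu / (1 - mu)); G(mu(X)) equals the additive predictor psi(X)."""
    return np.log(mu / (1.0 - mu))

def inverse_logit(psi):
    """mu(X) = G^{-1}(psi(X)) = 1 / (1 + exp(-psi))."""
    return 1.0 / (1.0 + np.exp(-psi))

# Hypothetical additive predictor psi(X) = beta0 + f1(X1) + f2(X2)
X1, X2 = 0.3, -1.2
psi = -0.5 + np.sin(X1) + 0.25 * X2
print(inverse_logit(psi))   # estimated probability, e.g. of default
```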
Regression, Generalized Additive Models,
Splines
 •    Let x_0, ..., x_N be N + 1 distinct knots of [a, b] with  a = x_0 < x_1 < ... < x_N = b.

 •    A function g_k(x) on the interval [a, b] is a spline of degree k relative to the
      knots x_j if

      (1)   f_k |_[x_j, x_{j+1}]  ∈ IP_k   (polynomial of degree ≤ k ;  j = 0, ..., N − 1 ),

      (2)   f_k ∈ C^{k−1}[ a, b ] .

      The space of splines g_k of degree k on [a, b] relative to the N + 1 distinct knots
      is called ℘_k ;  then,   dim ℘_k = N + k .

 •    In practice, a spline is represented by a different polynomial on each subinterval, and
      for this reason there can be a discontinuity in its k-th derivative at the internal knots
      x_1, ..., x_{N−1}.
Regression, Generalized Additive Models,
Splines

•   To characterize a spline of degree k, each piece  f_{k,j} := f_k |_[x_j, x_{j+1}]  can be represented by

          f_{k,j}(x) = Σ_{i=0}^k g_ij ( x − x_j )^i ,   if x ∈ [ x_j, x_{j+1} ] ;

    (k + 1) N coefficients g_ij are to be determined.

•   To hold:    f_{k,j−1}^{(l)}( x_j ) = f_{k,j}^{(l)}( x_j )   ( j = 1, ..., N − 1;  l = 0, ..., k − 1 ),

    there are k ( N − 1 ) conditions, and the remaining number of degrees of freedom is

          (k + 1) N − k ( N − 1 ) = k + N .
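•   One way to make the count k + N concrete is the truncated power basis of ℘_k (a sketch under the
    slide's notation; the choice of basis is an assumption for illustration, not the parameterization used
    later for MARS):

```python
import numpy as np

def truncated_power_basis(x, knots, k=3):
    """Columns: 1, x, ..., x^k, (x - t_1)_+^k, ..., (x - t_{N-1})_+^k  (one term per internal knot)."""
    x = np.asarray(x, dtype=float)
    cols = [x**i for i in range(k + 1)]                       # polynomial part: k + 1 columns
    cols += [np.maximum(x - t, 0.0)**k for t in knots[1:-1]]  # N - 1 internal-knot columns
    return np.column_stack(cols)

knots = np.linspace(0.0, 1.0, 6)        # N + 1 = 6 knots, so N = 5 and dim = k + N = 8
B = truncated_power_basis(np.linspace(0, 1, 50), knots)
print(B.shape)                           # (50, 8)
```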
Clustering for Generalized Additive Models

 •      Financial markets have different kinds of trading activities.
        These activities work with

 •      short-, mid- or long-term horizons

 •      from days and weeks to months and years.

 •      These data can sometimes be problematic to use in the models,

        e.g.,
        given a longer horizon, data are sometimes recorded less frequently,
        but at other times the measurements are highly frequent.

 •          The structure of the data may have particular properties:

 i.           larger variability,
 ii.          outliers,
 iii.         some data do not have any meaning.
Clustering for Generalized Additive Models
Clustering for Generalized Additive Models

  •   data variation:




  •      for the sake of simplicity:
                                       Nj ≡ N   for each interval   Ij
Clustering for Generalized Additive Models

 •   Density:

      Given intervals I_1, ..., I_m , the density of the input data in the j-th interval is

                  D_j := ( number of points x_ij in I_j ) / ( length of I_j ) .

 •   Variation
     If over the interval I_j the data are ( x_1j, y_1j ), ..., ( x_Nj, y_Nj ) :

                  V_j := Σ_{i=1}^{N−1} | y_{i+1,j} − y_{i,j} | .

 •   If this value is big, the curvature of any approximating curve could be big at many
     data points:

     occurrence of outliers,
     instability of the model.
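 •   A small sketch of these two quantities for hypothetical data (assuming the variation is taken with
     absolute differences, as reconstructed above; the index of the next slide is simply their product):

```python
import numpy as np

def interval_density_variation(x, y, a, b):
    """Density D_j and variation V_j of the data falling into one interval I_j = [a, b)."""
    inside = (x >= a) & (x < b)
    D = inside.sum() / (b - a)                       # points per unit length
    y_in = y[inside][np.argsort(x[inside])]          # responses ordered along the interval
    V = np.abs(np.diff(y_in)).sum()                  # sum of |y_{i+1,j} - y_{i,j}|
    return D, V

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 3.0, size=120)
y = np.sin(3 * x) + rng.normal(scale=0.05, size=120)
for a, b in [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]:    # intervals I_1, I_2, I_3
    D, V = interval_density_variation(x, y, a, b)
    print((a, b), D, V, D * V)
```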
Clustering for Generalized Additive Models


 •    I_1, ..., I_p (or Q_1, ..., Q_m ) are the intervals (or cubes) according to which the data are grouped.
 •    For I_j (or cube Q_j ), the associated index of data variation is

                                   Ind_j := D_j · V_j
      or
                                   Ind_j := d_j( D_j ) · v_j( V_j ) .

 •   In fact, from both the viewpoints of data fitting and complexity (or stability),

 o   cases with a high variation distributed over a very long interval are much
     less problematic than cases with a high variation over a short interval;

 o   oscillation,
 o   curvature,
 o   up to nonsmoothness,


 o   penalty!
Regression, Additive Models

 •   The additive model can be fit to data, given observations ( y_i, x_i ) (i = 1, 2, ..., N ).


 •   Penalized residual sum of squares (PRSS):

     PRSS( β_0, f_1, ..., f_m ) :=  Σ_{i=1}^N ( y_i − β_0 − Σ_{j=1}^m f_j( x_ij ) )²  +  Σ_{j=1}^m µ_j ∫_a^b ( f_j''( t_j ) )² dt_j

 •   µj ≥ 0      (smoothing parameters, tradeoff)

 •   large values of µ j yield smoother curves,
     smaller ones result in more fluctuation.

 •   A new estimation method for the additive model, based on CQP:
Regression, Additive Models

          min_{t, β_0, f}    t ,

          subject to     Σ_{i=1}^N ( y_i − β_0 − Σ_{j=1}^m f_j( x_ij ) )²  ≤ t² ,   t ≥ 0,

                         ∫ ( f_j''( t_j ) )² dt_j  ≤ M_j     ( j = 1, 2, ..., m ).


 •   The functions f_j are splines:      f_j(x) = Σ_{l=1}^{d_j} θ_l^j h_l^j(x).

 •   Then, we get

          min_{t, β_0, θ}    t ,

          subject to     ‖ W( β_0, θ ) ‖_2²  ≤ t² ,   t ≥ 0,

                         ‖ V_j( β_0, θ ) ‖_2²  ≤ M_j     ( j = 1, ..., m ).
Regression, Additive Models




                              http://144.122.137.55/gweber/
MARS Multivariate Adaptive Regression Spline

 •   To estimate general functions of high-dimensional arguments.

 •   An adaptive procedure.

 •   A nonparametric regression procedure.

 •   No specific assumption about the underlying functional relationship
     between the dependent and independent variables.

 •   Ability to estimate the contributions of the basis functions so that both
     the additive and the interactive effects of the predictors are allowed to
     determine the response variable.

 •   Uses expansions in piecewise linear basis functions of the form

                   c⁺( x, τ ) = [ +( x − τ ) ]₊ ,     c⁻( x, τ ) = [ −( x − τ ) ]₊ ,     where  [q]₊ := max{0, q} .
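 •   A tiny sketch of such a reflected pair (hypothetical knot τ; not the authors' code):

```python
import numpy as np

def hinge(x, tau, sign=+1):
    """c^+(x, tau) = [+(x - tau)]_+ for sign=+1, c^-(x, tau) = [-(x - tau)]_+ for sign=-1."""
    return np.maximum(sign * (x - tau), 0.0)

x = np.linspace(-2.0, 2.0, 9)
tau = 0.5
print(hinge(x, tau, +1))   # zero left of tau, linear right of tau
print(hinge(x, tau, -1))   # linear left of tau, zero right of tau
```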
MARS


                             [Figure: basic elements in the regression with MARS — scattered data (x, y)
                              fitted by the reflected pair c⁻(x,τ) = [−(x−τ)]₊ and c⁺(x,τ) = [+(x−τ)]₊ at the knot τ.]


•   Let us consider     Y = f (X ) + ε,                      X = ( X 1 , X 2 ,..., X p )Τ

•   The goal is to construct reflected pairs for each input           X j ( j = 1, 2,..., p ).
MARS


•   Set of basis functions:


              ℘ := { ( X_j − τ )₊ , ( τ − X_j )₊  |  τ ∈ { x̃_1,j , x̃_2,j , ..., x̃_N,j } ,  j ∈ {1, 2, ..., p} }

•   Thus, f(X) can be represented by

                                        Y = θ_0 + Σ_{m=1}^M θ_m ψ_m(X) + ε .

•   ψ_m (m = 1, 2, ..., M) are basis functions from ℘ or products of two or more such
    functions; interaction basis functions are created by multiplying an existing basis
    function with a truncated linear function involving a new variable.

•   Provided the observations are represented by the data ( x_i, y_i ) (i = 1, 2, ..., N ) :

                                     ψ_m(x) := Π_{j=1}^{K_m} [ s_{κ_j^m} · ( x_{κ_j^m} − τ_{κ_j^m} ) ]₊ .
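•   A sketch of how such basis functions and their products could be assembled into a design matrix
    (hypothetical knots and interaction terms; this illustrates only the expansion itself, not the MARS
    forward-stepwise search):

```python
import numpy as np

def hinge(x, tau, sign=+1):
    """Truncated linear function [ sign * (x - tau) ]_+ ."""
    return np.maximum(sign * (x - tau), 0.0)

def basis_product(X, terms):
    """psi_m(x) = prod_j [ s_j * (x_{kappa_j} - tau_j) ]_+ ;  terms = [(column, tau, sign), ...]."""
    out = np.ones(X.shape[0])
    for col, tau, sign in terms:
        out *= hinge(X[:, col], tau, sign)
    return out

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 3))
# Hypothetical basis: an intercept, one main-effect hinge, and one two-way interaction.
basis_terms = [
    [],                                   # psi_0 = 1 (intercept)
    [(1, -0.2, +1)],                      # [ (X_2 + 0.2) ]_+
    [(1, -0.2, +1), (2, 0.4, -1)],        # [ (X_2 + 0.2) ]_+ * [ -(X_3 - 0.4) ]_+
]
design = np.column_stack([basis_product(X, t) for t in basis_terms])
print(design.shape)                       # (30, 3): one column per basis function
```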
MARS

•     Two subalgorithms:

(i)   Forward stepwise algorithm:

•     Search for the basis functions.
•     Minimization of some “lack of fit” criterion.
•     The process stops when a user-specified value M max is reached.

•     Overfitting.
      So a backward deletion procedure is applied
      by decreasing the complexity of the model
      without degrading the fit to the data.

(ii)  Backward stepwise algorithm:
MARS

•   At each stage, remove from the model the basis function whose removal causes the smallest
    increase in the residual squared error; this produces an optimally estimated model f̂_α
    for each number of terms α.

•   α is related to the complexity of the estimation.
•   To estimate the optimal value of α, generalized cross-validation is used:

                   Σ_{i=1}^N ( y_i − f̂_α( x_i ) )²
     GCV :=       ──────────────────────────────────          with   M(α) := u + d K ,
                        ( 1 − M(α) / N )²

                                             N := number of samples
                                             u := number of independent basis functions
                                             K := number of knots selected by the forward stepwise algorithm
                                             d := cost of optimal basis
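•   A minimal sketch of this GCV computation (values are hypothetical; the effective-parameter count
    M(α) = u + dK and the cost d are assumed to be given):

```python
import numpy as np

def gcv(y, y_hat, u, K, d=3.0):
    """GCV as on the slide: RSS / (1 - M(alpha)/N)^2 with M(alpha) = u + d*K.
    (Some references include an additional 1/N factor in front of the RSS.)"""
    y, y_hat = np.asarray(y), np.asarray(y_hat)
    N = len(y)
    rss = np.sum((y - y_hat) ** 2)
    M_alpha = u + d * K
    return rss / (1.0 - M_alpha / N) ** 2

# Hypothetical fitted values for a model with u = 4 basis functions and K = 3 knots.
rng = np.random.default_rng(4)
y = rng.normal(size=30)
y_hat = y + rng.normal(scale=0.2, size=30)
print(gcv(y, y_hat, u=4, K=3))
```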


•   Alternative:
PRSS for MARS

       PRSS := Σ_{i=1}^N ( y_i − f( x_i ) )²  +  Σ_{m=1}^{M_max} λ_m  Σ_{|α|=1,2; α=(α_1,α_2)}  Σ_{r<s; r,s ∈ V(m)}  ∫ θ_m² [ D^α_{r,s} ψ_m( t^m ) ]² d t^m ,

     where

      V(m) := { κ_j^m | j = 1, 2, ..., K_m } ,
      t^m := ( t_m1, t_m2, ..., t_mK_m )^T ,
      α = ( α_1, α_2 ) ,   |α| := α_1 + α_2 ,   α_1, α_2 ∈ {0, 1} ,
      D^α_{r,s} ψ_m( t^m ) := ∂^{|α|} ψ_m / ( ∂^{α_1} t_r^m ∂^{α_2} t_s^m ) ( t^m ) .

 •   Tradeoff between accuracy and complexity.
 •   Penalty parameters λm .
Knot Selection
Grid Selection
Grid Selection




CQP and Tikhonov Regularization for MARS

     ψ( d_i ) := ( 1, ψ_1( x_i^1 ), ..., ψ_M( x_i^M ), ψ_{M+1}( x_i^{M+1} ), ..., ψ_{M_max}( x_i^{M_max} ) )^T ,

     d_i := ( x_i^1, x_i^2, ..., x_i^M, x_i^{M+1}, x_i^{M+2}, ..., x_i^{M_max} )^T ,

     θ := ( θ_0, θ_1, ..., θ_{M_max} )^T ,

     ( σ_{κ_j} )_{j ∈ {1,2,...,K_m}} ∈ {0, 1, 2, ..., N+1}^{K_m} ,

     x̂_i^m := ( x_{l^{κ_1}_{σ_{κ_1}}, κ_1} , x_{l^{κ_2}_{σ_{κ_2}}, κ_2} , ..., x_{l^{κ_{K_m}}_{σ_{κ_{K_m}}}, κ_{K_m}} ) ,

     Δx̂_i^m := Π_{j=1}^{K_m} ( x_{l^{κ_j}_{σ_{κ_j}+1}, κ_j} − x_{l^{κ_j}_{σ_{κ_j}}, κ_j} ) ,

     L_im := [ Σ_{|α|=1,2; α=(α_1,α_2)}  Σ_{r<s; r,s ∈ V(m)}  [ D^α_{r,s} ψ_m( x̂_i^m ) ]² Δx̂_i^m ]^{1/2} ,

     ψ(d) := ( ψ( d_1 ), ..., ψ( d_N ) )^T .

      L is an ( M_max + 1 ) × ( M_max + 1 ) matrix.
CQP and Tikhonov Regularization for MARS

•   For a short representation, we can rewrite the approximate relation as

          PRSS = ‖ y − ψ(d) θ ‖_2²  +  Σ_{m=1}^{M_max} λ_m  Σ_{i=1}^{(N+1)^{K_m}}  L_im² θ_m² .


•   In case of the same penalty parameter λ = λ_m (=: φ²) for all m, this becomes

          PRSS = ‖ y − ψ(d) θ ‖_2²  +  λ ‖ L θ ‖_2² .


                                                             Tikhonov regularization
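•   For a fixed λ, the minimizer of ‖ y − ψ(d) θ ‖_2² + λ ‖ L θ ‖_2² has the usual Tikhonov (ridge-type)
    closed form; a sketch with hypothetical stand-ins for ψ(d), L and y (not the authors' code):

```python
import numpy as np

def tikhonov_solution(Psi, L, y, lam):
    """theta_lambda = argmin ||y - Psi theta||_2^2 + lam ||L theta||_2^2
                    = (Psi^T Psi + lam L^T L)^{-1} Psi^T y."""
    A = Psi.T @ Psi + lam * (L.T @ L)
    return np.linalg.solve(A, Psi.T @ y)

# Hypothetical problem data (stand-ins for psi(d), L and y of the slides).
rng = np.random.default_rng(5)
Psi = rng.normal(size=(30, 6))
L = np.diag([0.0, 1.8, 0.75, 0.94, 2.2, 0.39])   # diagonal penalty matrix, loosely modeled on the L shown later
y = rng.normal(size=30)

for lam in [0.01, 0.1, 1.0, 10.0]:
    theta = tikhonov_solution(Psi, L, y, lam)
    print(lam, np.linalg.norm(y - Psi @ theta), np.linalg.norm(L @ theta))
```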
CQP for MARS

•   Conic quadratic programming:


                    min_{t, θ}   t ,

                    subject to     ‖ ψ(d) θ − y ‖_2 ≤ t ,
                                   ‖ L θ ‖_2 ≤ M .


    In general :   min_x  c^T x ,   subject to   ‖ D_i x − d_i ‖_2 ≤ p_i^T x − q_i   ( i = 1, 2, ..., k ).
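•   This second-order cone program can be handed to a conic solver (the slides use MOSEK); below is a
    sketch using the CVXPY modelling layer with hypothetical ψ(d), y, L and bound M — an illustration, not
    the authors' MATLAB/MOSEK code, and it assumes the cvxpy package is available:

```python
import numpy as np
import cvxpy as cp

# Hypothetical problem data (stand-ins for psi(d), y, L of the slides).
rng = np.random.default_rng(6)
Psi = rng.normal(size=(30, 6))
y = rng.normal(size=30)
L = np.diag([0.0, 1.8, 0.75, 0.94, 2.2, 0.39])
M_bound = 0.5

theta = cp.Variable(6)
t = cp.Variable()
constraints = [
    cp.norm(Psi @ theta - y, 2) <= t,      # || psi(d) theta - y ||_2 <= t
    cp.norm(L @ theta, 2) <= M_bound,      # || L theta ||_2 <= M
]
prob = cp.Problem(cp.Minimize(t), constraints)
prob.solve()                               # solver="MOSEK" could be passed if installed
print(prob.status, t.value, np.linalg.norm(L @ theta.value))
```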
CQP for MARS


 •  Moreover, ( t, θ, χ, η, ω_1, ω_2 ) is a primal-dual optimal solution if and only if


              [ 0_N    ψ(d)           ] [ t ]   [ −y ]
        χ  := [                       ] [   ] + [    ] ,
              [ 1      0ᵀ_{M_max+1}   ] [ θ ]   [  0 ]

              [ 0_{M_max+1}    L              ] [ t ]   [ 0_{M_max+1} ]
        η  := [                               ] [   ] + [             ] ,
              [ 0              0ᵀ_{M_max+1}   ] [ θ ]   [ M           ]

        [ 0ᵀ_N      1             ]        [ 0ᵀ_{M_max+1}   1             ]        [ 1           ]
        [                         ] ω_1 +  [                              ] ω_2 =  [             ] ,
        [ ψ(d)ᵀ     0_{M_max+1}   ]        [ Lᵀ             0_{M_max+1}   ]        [ 0_{M_max+1} ]

        ω_1ᵀ χ = 0 ,   ω_2ᵀ η = 0 ,

        ω_1 ∈ L^{N+1} ,   ω_2 ∈ L^{M_max+2} ,

        χ ∈ L^{N+1} ,   η ∈ L^{M_max+2} .
CQP for MARS

•   CQPs belong to the class of well-structured convex problems.

•   Interior Point Methods.

•   Better complexity bounds.

•   Better practical performance.




C-MARS
Numerical Experience and Comparison

•     We had the following data:
X1   1,5554   1,5326    -0,1823   0,1627    0,5687    0,1706    0,2041    -0,1823    -0,82     -0,7234   0,4446   -0,3291   -1,5583   1,2706    1,7555

X2   0,1849   1,1538    0,7586    -1,5363   1,906     0,3761    1,3323    -0,0064    -1,7275   1,141     0,3761   0,5673    -0,1976   0,7586     0,1849

X3   1,264    1,2023    -1,0995   0,8529    1,3051    -0,3802   -0,7913    0,1336    0,2363    -1,0995   -0,0719 -0,894     -1,0995   0,9557     1,5722

X4   1,2843   1,0175    -0,9676   0,7408    1,0635    -0,506    -0,7937   -0,0564     0,0455   -0,9676 -0,2482    -0,8557    -0,9676   0,8707    1,7339

X5 -0,7109     0,1777    0,1422    0,0355   3,2699    0,3554    -0,1777    1,5283    -0,0711    0,3554 0,8886      0,4621   -0,9241   -0,9241   -0,0711

Y    0,67      0,9047   -0,197    -1,0108    0,1616    0,2984   -0,6039    0,8823 -1,6832        0,9531 -0,3208    0,0507   -0,3916   0,44       0,263

X1   0,0474   -0,8713   -0,2158   0,2179    1,5426    -1,16      0,9857    0,6752    0,5402     -1,4528 1,9349    -0,8299   -0,681    0,7304    -1,1305

X2   0,9498   -0,1976   -1,7275   -0,9626   1,3323    -0,9626    0,1849   -1,345     1,3323     -0,0064 0,1849     0,3761   -1,345    -0,7713   -0,0064

X3   0,0308   -0,6885    1,0584   0,5446    0,5446    -0,483     0,4419    1,264      0,0308   -1,3051 2,086      -0,5857   -0,2775   1,5722    -1,3051

X4   0,1543   -0,7278    1,0046   0,3752    0,3752    -0,5839    0,2613    1,2843    -0,1543   -1,0635 2,5631     -0,6578   -0,4241    1,7339   -1,0635

X5   1,1018   0,6753    -0,391    -0,2843   1,4217    0,4621    -0,8175    0,7819     0,2488    1,5283 -0,1777    -1,7771   0,4621    -1,0307   0,3554

Y    1,1477   -0,3916   -0,4624   -1,0993   2,8639    -1,0285    0,1923    -0,7631     2,05     1,0238 0,9177     -1,2055   -0,3208   -0,5862   -0,6216
Numerical Experience and Comparison

•     We constructed model functions for these data using the Salford MARS software, where we
      selected the maximum number of basis elements: M_max = 5. Then,


             Model 1 : ω = 1                   Model 2 : ω = 2
             BF1 = max{0, X 2 + 1.728};        BF1 = max{0, X 2 + 1.728};
             Y = -1.081 + 0.626 * BF1          BF2 = max{0, X 5 - 0.462}* BF1
                                               Y = -1.073 + 0.499* BF1 + 0.656 * BF2


                            best model >>> Model 3 : ω = 3
                             BF1 = max{0, X 2 + 1.728};
                             BF2 = max{0, X 5 - 0.462} * BF1
                             BF4 = max{0, X 3 + 0.586} * BF1
                             Y = -1.176 + 0.422 * BF1 + 0.597 * BF2 + 0.236 * BF4
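
•   The fitted models are easy to evaluate directly; for instance, Model 3 above translates into the following
    small function (a sketch; x is a standardized input vector with components X1, ..., X5 as in the data table):

```python
import numpy as np

def model3(x):
    """Salford MARS Model 3 (omega = 3) from the slide, evaluated at x = (X1, ..., X5)."""
    X2, X3, X5 = x[1], x[2], x[4]
    bf1 = max(0.0, X2 + 1.728)
    bf2 = max(0.0, X5 - 0.462) * bf1
    bf4 = max(0.0, X3 + 0.586) * bf1
    return -1.176 + 0.422 * bf1 + 0.597 * bf2 + 0.236 * bf4

# First observation of the data table (X1, ..., X5):
x1 = np.array([1.5554, 0.1849, 1.264, 1.2843, -0.7109])
print(model3(x1))   # compare with the observed Y = 0.67
```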
Numerical Experience and Comparison
 •      and, finally,

 Model 4 : ω = 4
 BF1 = max{0, X 2 + 1.728}
     BF2 = max{0, X 5 - 0.462} * BF1
     BF3 = max{0, 0.462 - X 5 } * BF1
     BF4 = max{0, X 3 + 0.586} * BF1
     Y = -1.242 + 0.555 * BF1 + 0.484 * BF2 - 0.093 * BF3
           + 0.226 * BF4
     Model 5 : ω = 5
     BF1 = max{0, X 2 + 1.728};
     BF2 = max{0, X 5 - 0.462} * BF1
     BF3 = max{0, 0.462 - X 5 } * BF1
     BF4 = max{0, X 3 + 0.586} * BF1
     BF5 = max{0, - 0.586 - X 3 } * BF1
     Y = -1.248 + 0.487 * BF1 + 0.486 * BF2 - 0.118 * BF3 + 0.282 * BF4 + 0.263 * BF5
Numerical Experience and Comparison

•   Then, we considered a large model with five basis functions; writing a MATLAB code, we found:


                    0    0        0        0        0         0   
                    0 1.8419       0       0        0          0 
                                                                  
                    0    0      0.7514     0        0          0 
                 L=                                               
                    0    0         0     0.9373     0          0 
                    0    0        0        0      2.1996       0 
                                                                  
                    0    0        0        0         0      0.3905


•   We constructed models using different values for     M   in the optimization problem,
    which was solved by MOSEK (CQP).

•   Our algorithm always constructs a model with 5 parameters;
    in the case of Salford MARS, there are 1, 2, 3, 4 or 5 parameters.
Numerical Experience and Comparison


                    RESULTS OF SALFORD MARS



       ω    RSS           z = √RSS      t = ‖Lθ‖₂   GCV

      1   17.6425         4.2003       1.1531     0.771
      2   11.1870         3.3447       1.0430     0.613
      3   7.7824          2.7897       1.0368     0.550

      4   6.6126          2.5715       1.1967     0.626

      5   6.2961          2.5092       1.1600     0.840
Numerical Experience and Comparison
                    RESULTS OF OUR APPROACH


        M       ω   z = √RSS   t = ‖Lθ‖₂    M       ω   z = √RSS    t = ‖Lθ‖₂

      0.05     5    5.16894     0.05     0.2940   5     4.2024     0.2940
       0.1     5   4.959342     0.1      0.2945   5     4.2006     0.2945
      0.15     5   4.755559     0.15     0.295    5     4.1988     0.2950
       0.2     5   4.557617     0.2       0.3     5    4.180557     0.3
      0.25     5   4.365811     0.25      0.35    5    4.002338     0.35
      0.265    5    4.3095     0.2650     0.4     5    3.831675     0.4
      0.275    5    4.2723     0.2750     0.45    5    3.669118     0.45
      0.285    5    4.2354     0.2850     0.5     5    3.515233     0.5
     0.2865    5    4.2299     0.2865     0.55    5    3.370588     0.55
     0.2875    5    4.2262     0.2875    0.552    5     3.3650     0.5520
     0.2885    5    4.2226     0.2885    0.555    5     3.3567     0.5550
     0.2895    5    4.2189     0.2895    0.558    5     3.3483     0.558
     0.28965   5    4.2183     0.2897    0.560    5     3.3428     0.5600
     0.28975   5    4.2180     0.2897    0.561    5     3.3401     0.5610
     0.28985   5    4.2176     0.2899    0.562    5     3.3373     0.5620
     0.28995   5    4.2172     0.2899    0.565    5     3.3291     0.5650
Numerical Experience and Comparison
        M     ω   z = √RSS   t = ‖Lθ‖₂     M     ω   z = √RSS    t = ‖Lθ‖₂

     0.575   5    3.3019     0.5750    0.96   5     2.5968      0.96
     0.585   5    3.2751     0.5850    0.97   5     2.5880      0.97
     0.595   5    3.2488     0.5950    0.98   5     2.5797      0.98
      0.6    5   3.235746     0.6      0.99   5     2.5718      0.99
      0.65    5   3.111253     0.65      1     5    2.564459      1

       0.7    5   2.997622     0.7       2     5    2.509165    1.16009
      0.75    5   2.895324     0.75     2.1    5    2.509165    1.16009
      0.8    5   2.804764     0.8      2.2    5    2.509165    1.16009
     0.805   5    2.7964     0.8050    2.3    5    2.509165    1.16007
     0.810   5    2.7881     0.8100    2.4    5    2.509165    1.16008
     0.820   5    2.7719     0.8200    2.5    5    2.509165    1.16001
     0.830   5    2.7562     0.8300    2.6    5    2.509165    1.16007
     0.840   5    2.7410     0.8400    2.7    5    2.509165    1.16007
     0.85    5   2.726261     0.85     2.8    5    2.509165    1.16009
      0.9    5   2.660023     0.9      2.9    5    2.509165    1.16009
     0.95    5    2.60612     0.95      3     5    2.509165    1.16009
                                        4     5    2.509165   1.160084
Numerical Experience and Comparison

 •   We drew L-curves:

     [Figure: two L-curve panels — ‖ψ(d)θ − y‖₂ (vertical axis, ≈ 2.5 to 5.5) plotted against
      ‖Lθ‖₂ (horizontal axis, 0 to 1.4).]


•   Conclusion: Based on the L-curve criterion and for the given data, our solution is better
    than the Salford MARS solution.
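
•   Such an L-curve can be produced directly from the (t, z) columns of the tables above; a small matplotlib
    sketch (the values below are rounded entries taken from the result tables, used purely for illustration):

```python
import matplotlib.pyplot as plt

# ( ||L theta||_2 , ||psi(d) theta - y||_2 ) pairs, rounded from the result tables above.
t_vals = [0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.16]
z_vals = [5.17, 4.96, 4.56, 4.18, 3.52, 3.00, 2.66, 2.51]

plt.plot(t_vals, z_vals, "o-")
plt.xlabel("||L theta||_2")
plt.ylabel("||psi(d) theta - y||_2")
plt.title("L-curve")
plt.show()
```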
Numerical Experience and Comparison

   •   All test data sets are also compared according to performance
       measures such as MSE, MAE, correlation coefficient, R², PRESS,
       Mallows' Cp, etc.
   •   These measures are based on the average of nine values (one for
       each fold and each replication).
Numerical Experience and Comparison

 Please find much more numerical experience and comparison in

 Yerlikaya, Fatma,

 A New Contribution to Nonlinear Robust Regression and Classification with
 MARS and Its Application to Data Mining for Quality Control in Manufacturing,

 M.Sc. thesis, Institute of Applied Mathematics, METU, Ankara, 2008.
Piecewise Linear Functions - Stock Market




                                  figures generated by
                                  Erik Kropat
Forward Stepwise Algorithm Revisited




               high complexity
Forward Stepwise Algorithm Revisited
Forward Stepwise Algorithm Revisited
Forward Stepwise Algorithm Revisited
Forward Stepwise Algorithm Revisited
Regularization & Uncertainty Robust Optimization




                                     Laurent El Ghaoui
Regularization & Uncertainty Robust Optimization
References
•   Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press,
    2004.
•   Breiman, L., Friedman, J. H., Olshen, R., and Stone, C., Classification and Regression Trees, Belmont, CA:
    Wadsworth Int. Group, 1984.
•   Craven, P., and Wahba, G., Smoothing noisy data with spline functions: estimating the correct degree of
    smoothing by the method of generalized cross-validation, Numerische Mathematik 31 (1979) 377-403.
•   Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (1991) 1-141.
•   Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion,
    SIAM, Philadelphia, 1998.
•   Hastie, T., Tibshirani, R., and Friedman, J.H., The Elements of Statistical Learning, Springer Verlag, NY, 2001.
•   MOSEK software, http://www.mosek.com/ .
•   Myers, R.H., and Montgomery, D.C., Response Surface Methodology: Process and Product
    Optimization Using Designed Experiments, New York: Wiley (2002).
•    Nemirovski, A., Lectures on Modern Convex Optimization, Israel Institute of Technology (2002),
     http://iew3.technion.ac.il/Labs/Opt/LN/Final.pdf.
•    Nesterov, Y.E., and Nemirovskii, A.S., Interior Point Methods in Convex Programming, SIAM, 1993.
•    Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and
     continuous optimization for modern applications in finance, science and technology, Optimization, 56, 5–6,
     October–December (2007) 675–698.
•    Taylan, P., Weber, G.-W., and Yerlikaya, F., Continuous optimization applied in MARS for modern
     applications in finance, science and technology, in ISI Proceedings of 20th Mini-EURO Conference
     Continuous Optimization and Knowledge-Based Technologies, Neringa, Lithuania, May 20-23, 2008.

More Related Content

PDF
Prediction of Financial Processes
PDF
Mesh Processing Course : Multiresolution
PDF
修士論文発表会
PDF
Further Advanced Methods from Mathematical Optimization
PDF
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
PDF
Prévision de consommation électrique avec adaptive GAM
PDF
Intraguild mutualism
Prediction of Financial Processes
Mesh Processing Course : Multiresolution
修士論文発表会
Further Advanced Methods from Mathematical Optimization
Low Complexity Regularization of Inverse Problems - Course #2 Recovery Guaran...
Prévision de consommation électrique avec adaptive GAM
Intraguild mutualism

What's hot (20)

PDF
11.application of matrix algebra to multivariate data using standardize scores
PDF
Application of matrix algebra to multivariate data using standardize scores
PDF
rinko2011-agh
PDF
rinko2010
KEY
Tprimal agh
PDF
Paris2012 session1
PDF
Signal Processing Course : Convex Optimization
PDF
Bouguet's MatLab Camera Calibration Toolbox for Stereo Camera
PDF
Slides euria-1
PDF
YSC 2013
PDF
Geodesic Method in Computer Vision and Graphics
PDF
Slides euria-2
PDF
Numerical solution of poisson’s equation
PDF
Tele3113 wk1wed
PDF
Signal Processing Course : Inverse Problems Regularization
PDF
Scatter diagrams and correlation and simple linear regresssion
PDF
Rouviere
PDF
E028047054
PDF
Analysis of monitoring of connection between
11.application of matrix algebra to multivariate data using standardize scores
Application of matrix algebra to multivariate data using standardize scores
rinko2011-agh
rinko2010
Tprimal agh
Paris2012 session1
Signal Processing Course : Convex Optimization
Bouguet's MatLab Camera Calibration Toolbox for Stereo Camera
Slides euria-1
YSC 2013
Geodesic Method in Computer Vision and Graphics
Slides euria-2
Numerical solution of poisson’s equation
Tele3113 wk1wed
Signal Processing Course : Inverse Problems Regularization
Scatter diagrams and correlation and simple linear regresssion
Rouviere
E028047054
Analysis of monitoring of connection between
Ad

Viewers also liked (9)

PDF
First Steps In Hypnosis
PPTX
Hypnosis in psychotherapy and hypnosis as psicotherapy
PPTX
Capitalizingon Innovation Hagiu
PDF
Clinical use of hypnosis
PDF
Master Self hypnosis Now
KEY
Hypnosis Slides
PPTX
Hypnosis theory and practice
PPTX
gastroenteritis
PPTX
Gastroenteritis ppt
First Steps In Hypnosis
Hypnosis in psychotherapy and hypnosis as psicotherapy
Capitalizingon Innovation Hagiu
Clinical use of hypnosis
Master Self hypnosis Now
Hypnosis Slides
Hypnosis theory and practice
gastroenteritis
Gastroenteritis ppt
Ad

Similar to Regression Theory (20)

PDF
Parameter Estimation in Stochastic Differential Equations by Continuous Optim...
PDF
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
PDF
Curve fitting
PDF
Classification Theory
PDF
Bayesian Methods for Machine Learning
PDF
icml2004 tutorial on bayesian methods for machine learning
PDF
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
PDF
Neural Networks
PDF
Applied numerical methods lec8
PDF
Matrix Computations in Machine Learning
PDF
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
PDF
Cross-Validation
PDF
Introduction to Machine Learning
PDF
1 - Linear Regression
PPT
fghdfh
PDF
Parameter Estimation for Semiparametric Models with CMARS and Its Applications
PDF
Image Processing
PDF
Introduction to the theory of optimization
PDF
Engr 371 final exam april 1999
PDF
ma112011id535
Parameter Estimation in Stochastic Differential Equations by Continuous Optim...
Scientific Computing with Python Webinar 9/18/2009:Curve Fitting
Curve fitting
Classification Theory
Bayesian Methods for Machine Learning
icml2004 tutorial on bayesian methods for machine learning
Kernel based models for geo- and environmental sciences- Alexei Pozdnoukhov –...
Neural Networks
Applied numerical methods lec8
Matrix Computations in Machine Learning
On Foundations of Parameter Estimation for Generalized Partial Linear Models ...
Cross-Validation
Introduction to Machine Learning
1 - Linear Regression
fghdfh
Parameter Estimation for Semiparametric Models with CMARS and Its Applications
Image Processing
Introduction to the theory of optimization
Engr 371 final exam april 1999
ma112011id535

More from SSA KPI (20)

PDF
Germany presentation
PDF
Grand challenges in energy
PDF
Engineering role in sustainability
PDF
Consensus and interaction on a long term strategy for sustainable development
PDF
Competences in sustainability in engineering education
PDF
Introducatio SD for enginers
PPT
DAAD-10.11.2011
PDF
Talking with money
PDF
'Green' startup investment
PDF
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
PDF
Dynamics of dice games
PPT
Energy Security Costs
PPT
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
PDF
Advanced energy technology for sustainable development. Part 5
PDF
Advanced energy technology for sustainable development. Part 4
PDF
Advanced energy technology for sustainable development. Part 3
PDF
Advanced energy technology for sustainable development. Part 2
PDF
Advanced energy technology for sustainable development. Part 1
PPT
Fluorescent proteins in current biology
PPTX
Neurotransmitter systems of the brain and their functions
Germany presentation
Grand challenges in energy
Engineering role in sustainability
Consensus and interaction on a long term strategy for sustainable development
Competences in sustainability in engineering education
Introducatio SD for enginers
DAAD-10.11.2011
Talking with money
'Green' startup investment
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
Dynamics of dice games
Energy Security Costs
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 1
Fluorescent proteins in current biology
Neurotransmitter systems of the brain and their functions

Recently uploaded (20)

PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
Pharma ospi slides which help in ospi learning
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Pre independence Education in Inndia.pdf
PPTX
Cell Types and Its function , kingdom of life
PDF
Classroom Observation Tools for Teachers
PDF
Basic Mud Logging Guide for educational purpose
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
RMMM.pdf make it easy to upload and study
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
Cell Structure & Organelles in detailed.
PPTX
PPH.pptx obstetrics and gynecology in nursing
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Pharma ospi slides which help in ospi learning
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
2.FourierTransform-ShortQuestionswithAnswers.pdf
Pre independence Education in Inndia.pdf
Cell Types and Its function , kingdom of life
Classroom Observation Tools for Teachers
Basic Mud Logging Guide for educational purpose
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
STATICS OF THE RIGID BODIES Hibbelers.pdf
Module 4: Burden of Disease Tutorial Slides S2 2025
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Week 4 Term 3 Study Techniques revisited.pptx
RMMM.pdf make it easy to upload and study
Microbial diseases, their pathogenesis and prophylaxis
Cell Structure & Organelles in detailed.
PPH.pptx obstetrics and gynecology in nursing
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
ANTIBIOTICS.pptx.pdf………………… xxxxxxxxxxxxx

Regression Theory

  • 1. 4th International Summer School Achievements and Applications of Contemporary Informatics, Mathematics and Physics National University of Technology of the Ukraine Kiev, Ukraine, August 5-16, 2009 Motivatio Regression Theory with Additive Models and CMARS n Gerhard- Gerhard-Wilhelm Weber * Inci Batmaz , Gülser Köksal , Fatma Yerlikaya , Pakize Taylan * *, Elcin Kartal , Efsun Kürüm , Ayse Özmen Institute of Applied Mathematics, Middle East Technical University, Ankara, Turkey * Faculty of Economics, Management and Law, University of Siegen, Germany Center for Research on Optimization and Control, University of Aveiro, Portugal * * Department of Mathematics, Dicle University, Turkey
  • 2. Content • Introduction, Motivation • Regression • Additive Models • MARS • PRSS for MARS • CQP for MARS • Tikhonov Regularization for MARS • Numerical Experience and Comparison • Research Extensions • Conclusion
  • 3. Introduction learning from data has become very important in every field of science and technology, e.g., in • financial sector, • quality improvent in manufacturing, • computational biology, • medicine and • engineering. Learning enables for doing estimation and prediction. Regression is mainly based on the problems and methods of • least squares estimation, • maximum likelihood estimation • and classification. New tools for data analysis, based on nonparametric regression and smoothing: • additive (and multiplicative) models.
  • 4. Introduction CART vs. MARS
  • 5. Introduction Additive (and multiplicative) models (studied at IAM, METU): • spline regression in additive models, • spline regression in generalized addive models, • MARS: piecewise linear (per dimension) regression in multiplicative models, • spline regression for stochastic differential equations via additive and nonlinear models.
  • 6. Regression: a Motivation One of motivations of this research has been the approximation of financial data points (x,y), e.g., coming from • the stock market, • credit rating, • economic factor, • company properties. For example, to estimate the probability of a default of a particular credit: • It is used one of the latest three data points above. There are different approaches for estimating the probability of a default. • Regression models (binary choice) are one of them. • For example, we assume that we have the dependent variable Y with Y = 1 (“default”) or Y = 0 (“no default”) satisfies Y = F(X ) +ε , X : vector of independent variable(s) (input) such as credit rating.
  • 7. Regression: a Motivation • Estimation for the default probability P, P = E [ F ( X ) + ε ] = F ( X ). • Also, this estimation can be done via following linear regression Y = α + βΤ X + ε . • An estimate for the default probability of a corparate bond can be obtained: P = α + βΤ X ; α and β are unknown parameters. They can be estimated via linear regression methods or maximum likelihood estimation. In many important cases, these just mean least squares estimation.
  • 8. Regression X = ( X1 , X 2 ,..., X m ) and output variable Y ; T Input vector linear regression : m Y = E (Y X 1 ,..., X m ) + ε = β 0 + ∑ X j β j + ε j =1 • E(Y | X) is linear (...) and β = ( β 0 , β1 ,..., β m ) which minimizes T • 2 ( ) N RSS ( β ) := ∑ yi − x β T i i =1 or RSS ( β ) = (Y − Xβ ) (Y − Xβ ) β = (XT X ) XT y, T −1 ˆ ( ) −1 Cov( β) = X T X ˆ σ2
  • 9. Regression, Additive Models In the input space: • Classical understanding: additive separation of variables • New interpretation: separation of clusters and corresponding enumeration
  • 10. Regression, Additive Models E Yi xi1 , xi 2 ,..., xi m = β 0 + ∑ f j ( xij ) ( ) m (A) j =1 f j are estimated by a smoothing on a single coordinate. Standard convention at xij : ( ) E f j ( xij ) = 0 . • Backfitting algorithm (Gauss-Seidel algorithm). • This procedure depends on the partial residual against xij : rij = yi − β 0 − ∑ f k ( xik ) . ˆ k≠ j
  • 11. Regression, Additive Models • Estimating each smooth function by holding all the other ones fixed. initialization: β 0 := ave( yi | i = 1,..., N ), ˆ f j ( xij ) ≡ 0, ˆ ∀i, j cycle j = 1,..., m,1,..., m,1,..., m rij = yi − β 0 − ∑ f k ( xik ) , ˆ ˆ i = 1,..., N k≠ j ˆ f j is updated by smoothing the partial residuals m rij = yi − β 0 − ∑ f k ( xik ) (i = 1,..., N ) against ˆ ˆ x ij k≠ j until the functions almost do not change. • Convergence (condition)
  • 12. Regression, Additive Models • Convergence of the backfitting, ˆ  f  Tf = f ,    .   .     .  ( ) I  T j : IR Nm → IR Nm ˆ a  S j −∑ k ≠ j f k     .   .     .   fm    • Full cycle: T = Tm Tm-1...T1 ; then, Tl corresponds to l full cycles. ˆ ˆ ˆ ˆ ˆ • ˆ Always converges if all smoother are symetric and all eigenvalues of T are either +1 or in the interior of the unit ball: | λ |< 1 .
  • 13. Regression, Generalized Additive Models • To extend the additive model to a wide range of distribution families: generalized additive models (GAM): G ( µ ( X ) ) = ψ ( X ) = β0 + ∑ f j ( X j ), m i =1 θ := ( β 0 , f1 ,..., f m ) , T • f j are unspecified, G : link function; • f j : elements of a finite dimensional space consisting, e.g., of splines; • spline orders (or degrees): suitably choosen, depending on the density and variation properties of the corresponding data in x and y components, respectively. • problem of specifying θ becomes a finite dimensional parameter estimation problem.
  • 14. Regression, Generalized Additive Models, Splines • x0 ,..., xN be N + 1 distinct knots of [a, b] and a = x0 < x1 < ... < xN = b • The function g k (x) on the interval [a, b] is a spline of degree k relative to the knots x j . • If (1) fk x ,x ∈ IPk (polynomial of degree ≤ k ; j = 0,..., N − 1 ),  j j +1  j+  (2) f k ∈ C k −1 [ a, b] , the space of splines g k on [a, b] is called ℘k and relative to the N + 1 distinct knots; then, dim℘k = N + k. . • In practice, a spline is represented by a different polynomial on each subinterval and for this reason there could be a discontinuity in its kth derivative at the internal knots x1 ,..., xN −1.
• 15. Regression, Generalized Additive Models, Splines
• To characterize a spline of degree $k$: each piece $f_{k,j} := f_k|_{[x_j, x_{j+1}]}$ can be represented by
$f_{k,j}(x) = \sum_{i=0}^{k} g_{ij} (x - x_j)^i$ if $x \in [x_j, x_{j+1}]$;
there are $(k + 1)N$ coefficients $g_{ij}$ to be determined.
• To hold: $f_{k,j-1}^{(l)}(x_j) = f_{k,j}^{(l)}(x_j)$ $(j = 1, \ldots, N-1;\ l = 0, \ldots, k-1)$,
there are $k(N - 1)$ conditions, and the remaining degrees of freedom are $(k + 1)N - k(N - 1) = k + N$.
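As a quick sanity check of this dimension count, the following sketch reproduces $\dim \wp_k = N + k$ via a clamped B-spline knot vector (the B-spline convention and the example values are assumptions of this illustration, not taken from the slides):

```python
import numpy as np

# dim = N + k for splines of degree k on N subintervals,
# counted via a clamped B-spline knot vector (k extra copies of each boundary knot)
a, b, N, k = 0.0, 1.0, 6, 3                   # e.g. cubic splines on 6 subintervals
knots = np.linspace(a, b, N + 1)              # x_0 < x_1 < ... < x_N
t = np.concatenate([[a] * k, knots, [b] * k]) # full (clamped) knot vector
n_basis = len(t) - (k + 1)                    # number of B-spline basis functions
assert n_basis == N + k                       # matches dim = N + k stated above
print(n_basis)                                # -> 9
```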
• 16. Clustering for Generalized Additive Models
• Financial markets have different kinds of trading activities. These activities work with
• short-, mid- or long-term horizons,
• from days and weeks to months and years.
• Such data can sometimes be problematic when used in the models: e.g., over a longer horizon the data are sometimes recorded less frequently, and at other times measured highly frequently.
• The structure of the data may have particular properties:
i. larger variability,
ii. outliers,
iii. some data points that do not carry any meaning.
  • 17. Clustering for Generalized Additive Models
• 18. Clustering for Generalized Additive Models
• Data variation.
• For the sake of simplicity: $N_j \equiv N$ for each interval $I_j$.
• 19. Clustering for Generalized Additive Models
• Density: given intervals $I_1, \ldots, I_m$, the density of the input data in the $j$-th interval is
$D_j := \dfrac{\text{number of points } x_{ij} \text{ in } I_j}{\text{length of } I_j}$.
• Variation: if over the interval $I_j$ the data are $(x_{1j}, y_{1j}), \ldots, (x_{Nj}, y_{Nj})$, then
$V_j := \sum_{i=1}^{N-1} \left| y_{i+1,j} - y_{ij} \right|$.
• If this value is big at many data points, the curvature of any approximating curve could be big: occurrence of outliers, instability of the model.
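A small sketch of how $D_j$ and $V_j$ could be computed for one interval (NumPy; the half-open interval convention and the ordering of the responses along x are assumptions of this illustration):

```python
import numpy as np

def density_and_variation(x, y, interval):
    """Ingredients of the data-variation index for one interval I_j."""
    lo, hi = interval
    mask = (x >= lo) & (x < hi)
    D = mask.sum() / (hi - lo)            # number of points in I_j / length of I_j
    y_in = y[mask][np.argsort(x[mask])]   # responses ordered along x within I_j
    V = np.abs(np.diff(y_in)).sum()       # sum of |y_{i+1,j} - y_{i,j}|
    return D, V

# index of data variation for I_j (simple product form):
# D_j, V_j = density_and_variation(x, y, (a_j, b_j)); Ind_j = D_j * V_j
```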
• 20. Clustering for Generalized Additive Models
• $I_1, \ldots, I_p$ (or $Q_1, \ldots, Q_m$): intervals (or cubes) according to which the data are grouped.
• For interval $I_j$ (cube $Q_j$), the associated index of data variation is
$\operatorname{Ind}_j := D_j \cdot V_j$ or $\operatorname{Ind}_j := d_j(D_j) \cdot v_j(V_j)$.
• In fact, from both the viewpoints of data fitting and complexity (or stability),
o cases with a high variation distributed over a very long interval are much less problematic than cases with a high variation over a short interval;
o oscillation,
o curvature,
o up to nonsmoothness,
o penalty!
• 21. Regression, Additive Models
• The additive model can be fit to data. Given observations $(y_i, x_i)$ $(i = 1, 2, \ldots, N)$:
• penalized sum of squares PRSS
$PRSS(\beta_0, f_1, \ldots, f_m) := \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{m} f_j(x_{ij}) \right)^2 + \sum_{j=1}^{m} \mu_j \int_a^b \left[ f_j''(t_j) \right]^2 dt_j$,
• $\mu_j \geq 0$ (smoothing parameters, tradeoff);
• large values of $\mu_j$ yield smoother curves, smaller ones result in more fluctuation.
• New estimation methods for the additive model with CQP:
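As an aside before the CQP reformulation on the next slide, a sketch of evaluating this PRSS objective; it assumes the fitted $f_j$ are available both at the data points and on uniform grids, and approximating the curvature integral by second differences is an assumption of the sketch:

```python
import numpy as np

def prss(y, beta0, F, mu, grids, f_on_grid):
    """
    Penalized residual sum of squares for an additive model.
    F[i, j] = f_j(x_ij); grids[j] is a uniform grid; f_on_grid[j] = f_j on that grid.
    The integral of (f_j'')^2 is approximated by second differences on the grid.
    """
    fit = ((y - beta0 - F.sum(axis=1)) ** 2).sum()
    penalty = 0.0
    for j, (t, fj) in enumerate(zip(grids, f_on_grid)):
        h = t[1] - t[0]
        f2 = np.diff(fj, 2) / h ** 2              # approximate f_j'' on the grid
        penalty += mu[j] * (f2 ** 2).sum() * h    # approximate curvature integral
    return fit + penalty
```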
• 22. Regression, Additive Models
$\min_{t,\, \beta_0,\, f} \ t$,
subject to $\sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{m} f_j(x_{ij}) \right)^2 \leq t^2$, $t \geq 0$,
$\int \left[ f_j''(t_j) \right]^2 dt_j \leq M_j$ $(j = 1, 2, \ldots, m)$.
• The functions $f_j$ are splines: $f_j(x) = \sum_{l=1}^{d_j} \theta_l^j h_l^j(x)$.
• Then we get
$\min_{t,\, \beta_0,\, \theta} \ t$,
subject to $\left\| W(\beta_0, \theta) \right\|_2^2 \leq t^2$, $t \geq 0$,
$\left\| V_j(\beta_0, \theta) \right\|_2^2 \leq M_j$ $(j = 1, \ldots, m)$.
  • 23. Regression, Additive Models http://144.122.137.55/gweber/
• 24. MARS (Multivariate Adaptive Regression Splines)
• To estimate general functions of high-dimensional arguments.
• An adaptive procedure.
• A nonparametric regression procedure.
• No specific assumption about the underlying functional relationship between the dependent and independent variables.
• Ability to estimate the contributions of the basis functions so that both the additive and the interactive effects of the predictors are allowed to determine the response variable.
• Uses expansions in piecewise linear basis functions of the form
$c^+(x, \tau) = [ +(x - \tau) ]_+$, $c^-(x, \tau) = [ -(x - \tau) ]_+$, where $[q]_+ := \max\{0, q\}$.
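The reflected pair of truncated linear functions can be coded directly; a minimal sketch (names are illustrative):

```python
import numpy as np

def c_plus(x, tau):
    """c+(x, tau) = [ +(x - tau) ]_+"""
    return np.maximum(0.0, x - tau)

def c_minus(x, tau):
    """c-(x, tau) = [ -(x - tau) ]_+"""
    return np.maximum(0.0, tau - x)
```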
• 25. MARS
[Figure: basic elements in the regression with MARS; a reflected pair $c^-(x, \tau) = [-(x - \tau)]_+$ and $c^+(x, \tau) = [+(x - \tau)]_+$ with knot $\tau$, plotted over scattered data.]
• Let us consider $Y = f(X) + \varepsilon$, $X = (X_1, X_2, \ldots, X_p)^T$.
• The goal is to construct reflected pairs for each input $X_j$ $(j = 1, 2, \ldots, p)$.
• 28. MARS
• Set of basis functions:
$\wp := \left\{ (X_j - \tau)_+,\ (\tau - X_j)_+ \ \middle|\ \tau \in \{ x_{1,j}, x_{2,j}, \ldots, x_{N,j} \},\ j \in \{1, 2, \ldots, p\} \right\}$.
• Thus, $f(X)$ can be represented by
$Y = \theta_0 + \sum_{m=1}^{M} \theta_m \psi_m(X) + \varepsilon$.
• $\psi_m$ $(m = 1, 2, \ldots, M)$ are basis functions from $\wp$ or products of two or more such functions; interaction basis functions are created by multiplying an existing basis function with a truncated linear function involving a new variable.
• Provided the observations represented by the data $(x_i, y_i)$ $(i = 1, 2, \ldots, N)$:
$\psi_m(x) := \prod_{j=1}^{K_m} \left[ s_{\kappa_j^m} \cdot \left( x_{\kappa_j^m} - \tau_{\kappa_j^m} \right) \right]_+$.
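A sketch of evaluating one product basis function $\psi_m$ from its variables $\kappa_j^m$, signs $s_{\kappa_j^m}$ and knots $\tau_{\kappa_j^m}$; the example call mirrors BF2 of the numerical section below, but the zero-based variable indices are an assumption of this illustration:

```python
def psi_m(x, variables, signs, knots):
    """
    Product basis function psi_m(x) = prod_j [ s_j * (x_{kappa_j} - tau_j) ]_+ ,
    with variables[j] = kappa_j^m, signs[j] = s_j in {+1, -1}, knots[j] = tau_j.
    """
    value = 1.0
    for kappa, s, tau in zip(variables, signs, knots):
        value *= max(0.0, s * (x[kappa] - tau))
    return value

# example: max{0, X2 + 1.728} * max{0, X5 - 0.462} with X2, X5 at indices 1 and 4
# psi_m(x, variables=[1, 4], signs=[+1, +1], knots=[-1.728, 0.462])
```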
• 29. MARS
• Two subalgorithms:
(i) Forward stepwise algorithm:
• Search for the basis functions.
• Minimization of some "lack of fit" criterion.
• The process stops when a user-specified value $M_{\max}$ is reached.
• Overfitting. So a backward deletion procedure is applied, decreasing the complexity of the model without degrading the fit to the data.
(ii) Backward stepwise algorithm:
• 30. MARS
• Remove from the model, at each stage, the basis function that contributes the smallest increase in the residual squared error, producing an optimally estimated model $\hat{f}_\alpha$ with respect to each number of terms, called $\alpha$.
• $\alpha$ is related to the complexity of the estimation.
• To estimate the optimal value of $\alpha$, generalized cross-validation is used:
$GCV := \dfrac{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{f}_\alpha(x_i) \right)^2}{\left( 1 - M(\alpha)/N \right)^2}$, with $M(\alpha) := u + dK$,
where $N$ := number of samples, $u$ := number of independent basis functions, $K$ := number of knots selected by the forward stepwise algorithm, $d$ := cost of optimal basis.
• Alternative:
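Before turning to the alternative on the next slide, a small sketch of this GCV formula; the default cost d near 3 is a common choice and an assumption here, not taken from the slides:

```python
import numpy as np

def gcv(y, y_hat, u, K, d=3.0):
    """
    GCV = (1/N) * sum (y_i - f_hat(x_i))^2 / (1 - M(alpha)/N)^2,
    with M(alpha) = u + d*K  (u: independent basis functions, K: knots, d: cost).
    """
    N = len(y)
    M_alpha = u + d * K
    return ((y - y_hat) ** 2).sum() / N / (1.0 - M_alpha / N) ** 2
```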
• 31. PRSS for MARS
$PRSS := \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \sum_{m=1}^{M_{\max}} \lambda_m \sum_{\substack{|\alpha| = 1, 2 \\ \alpha = (\alpha_1, \alpha_2)}} \ \sum_{\substack{r < s \\ r, s \in V(m)}} \int \theta_m^2 \left[ D_{r,s}^{\alpha} \psi_m(t^m) \right]^2 d t^m$,
where
$V(m) := \{ \kappa_j^m \mid j = 1, 2, \ldots, K_m \}$, $t^m := (t_{m_1}, t_{m_2}, \ldots, t_{m_{K_m}})^T$,
$|\alpha| := \alpha_1 + \alpha_2$ with $\alpha_1, \alpha_2 \in \{0, 1\}$, and
$D_{r,s}^{\alpha} \psi_m(t^m) := \dfrac{\partial^{|\alpha|} \psi_m}{\partial^{\alpha_1} t_r^m \, \partial^{\alpha_2} t_s^m}(t^m)$.
• Tradeoff between both accuracy and complexity.
• Penalty parameters $\lambda_m$.
• 34. Grid Selection
• 35. CQP and Tikhonov Regularization for MARS
$\psi(d_i) := \left( 1, \psi_1(x_i^1), \ldots, \psi_M(x_i^M), \psi_{M+1}(x_i^{M+1}), \ldots, \psi_{M_{\max}}(x_i^{M_{\max}}) \right)^T$,
$d_i := \left( x_i^1, x_i^2, \ldots, x_i^{M_{\max}} \right)^T$, $\theta := (\theta_0, \theta_1, \ldots, \theta_{M_{\max}})^T$,
$\psi(d) := \left( \psi(d_1), \ldots, \psi(d_N) \right)^T$.
• The penalty integrals are discretized on grid points $\hat{x}_i^m$ built from the input data, with $\Delta \hat{x}_i^m$ denoting the product of the grid spacings in the $K_m$ coordinates entering $\psi_m$, which gives
$L_{im} := \left[ \sum_{\substack{|\alpha| = 1, 2 \\ \alpha = (\alpha_1, \alpha_2)}} \ \sum_{\substack{r < s \\ r, s \in V(m)}} \left( D_{r,s}^{\alpha} \psi_m(\hat{x}_i^m) \right)^2 \Delta \hat{x}_i^m \right]^{1/2}$.
• $L$ is an $(M_{\max} + 1) \times (M_{\max} + 1)$ matrix.
• 36. CQP and Tikhonov Regularization for MARS
• For a short representation, we can rewrite the approximate relation as
$PRSS = \left\| y - \psi(d)\theta \right\|_2^2 + \sum_{m=1}^{M_{\max}} \lambda_m \sum_{i=1}^{(N+1)^{K_m}} \left[ L_{im} \theta_m \right]^2$.
• In case of the same penalty parameter $\lambda = \lambda_m \ (=: \varphi^2)$ for all $m$, then:
$PRSS = \left\| y - \psi(d)\theta \right\|_2^2 + \lambda \left\| L\theta \right\|_2^2$ (Tikhonov regularization).
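With equal penalty parameters the problem is an ordinary Tikhonov-regularized least-squares problem; a minimal sketch via the stacked least-squares system (illustrative only, not the authors' implementation):

```python
import numpy as np

def tikhonov(Psi, y, L, lam):
    """
    Minimize ||y - Psi @ theta||_2^2 + lam * ||L @ theta||_2^2
    through the stacked system [Psi; sqrt(lam) * L] theta ~ [y; 0].
    """
    A = np.vstack([Psi, np.sqrt(lam) * L])
    b = np.concatenate([y, np.zeros(L.shape[0])])
    theta, *_ = np.linalg.lstsq(A, b, rcond=None)
    return theta
```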
• 37. CQP for MARS
• Conic quadratic programming:
$\min_{t, \theta} \ t$, subject to $\left\| \psi(d)\theta - y \right\|_2 \leq t$, $\left\| L\theta \right\|_2 \leq M$.
• In general:
$\min_{x} \ c^T x$, subject to $\left\| D_i x - d_i \right\|_2 \leq p_i^T x - q_i$ $(i = 1, 2, \ldots, k)$.
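The slides solve this CQP with MOSEK; purely as an illustration, the same problem can be sketched with CVXPY (an assumption of this rewrite, not the authors' code), which can hand the cone program to MOSEK if it is installed:

```python
import cvxpy as cp

def cmars_cqp(Psi, y, L, M_bound):
    """Solve:  min t  s.t.  ||Psi theta - y||_2 <= t,  ||L theta||_2 <= M_bound."""
    theta = cp.Variable(Psi.shape[1])
    t = cp.Variable()
    constraints = [cp.norm(Psi @ theta - y, 2) <= t,
                   cp.norm(L @ theta, 2) <= M_bound]
    cp.Problem(cp.Minimize(t), constraints).solve()  # solver="MOSEK" if available
    return theta.value, t.value
```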
• 39. CQP for MARS
• Moreover, $(t, \theta, \chi, \eta, \omega_1, \omega_2)$ is a primal-dual optimal solution if and only if
$\chi := \begin{pmatrix} 0_N & \psi(d) \\ 1 & 0_{M_{\max}+1}^T \end{pmatrix} \begin{pmatrix} t \\ \theta \end{pmatrix} + \begin{pmatrix} -y \\ 0 \end{pmatrix}$,
$\eta := \begin{pmatrix} 0_{M_{\max}+1} & L \\ 0 & 0_{M_{\max}+1}^T \end{pmatrix} \begin{pmatrix} t \\ \theta \end{pmatrix} + \begin{pmatrix} 0_{M_{\max}+1} \\ M \end{pmatrix}$,
$\begin{pmatrix} 0_N^T & 1 \\ \psi(d)^T & 0_{M_{\max}+1} \end{pmatrix} \omega_1 + \begin{pmatrix} 0_{M_{\max}+1}^T & 0 \\ L^T & 0_{M_{\max}+1} \end{pmatrix} \omega_2 = \begin{pmatrix} 1 \\ 0_{M_{\max}+1} \end{pmatrix}$,
$\omega_1^T \chi = 0$, $\omega_2^T \eta = 0$,
$\omega_1 \in L^{N+1}$, $\omega_2 \in L^{M_{\max}+2}$, $\chi \in L^{N+1}$, $\eta \in L^{M_{\max}+2}$.
• 40. CQP for MARS
• CQPs belong to the well-structured convex problems.
• Interior point methods.
• Better complexity bounds.
• Better practical performance.
• C-MARS
• 41. Numerical Experience and Comparison
• We had the following data (observations 1-15):
X1: 1.5554, 1.5326, -0.1823, 0.1627, 0.5687, 0.1706, 0.2041, -0.1823, -0.82, -0.7234, 0.4446, -0.3291, -1.5583, 1.2706, 1.7555
X2: 0.1849, 1.1538, 0.7586, -1.5363, 1.906, 0.3761, 1.3323, -0.0064, -1.7275, 1.141, 0.3761, 0.5673, -0.1976, 0.7586, 0.1849
X3: 1.264, 1.2023, -1.0995, 0.8529, 1.3051, -0.3802, -0.7913, 0.1336, 0.2363, -1.0995, -0.0719, -0.894, -1.0995, 0.9557, 1.5722
X4: 1.2843, 1.0175, -0.9676, 0.7408, 1.0635, -0.506, -0.7937, -0.0564, 0.0455, -0.9676, -0.2482, -0.8557, -0.9676, 0.8707, 1.7339
X5: -0.7109, 0.1777, 0.1422, 0.0355, 3.2699, 0.3554, -0.1777, 1.5283, -0.0711, 0.3554, 0.8886, 0.4621, -0.9241, -0.9241, -0.0711
Y: 0.67, 0.9047, -0.197, -1.0108, 0.1616, 0.2984, -0.6039, 0.8823, -1.6832, 0.9531, -0.3208, 0.0507, -0.3916, 0.44, 0.263
• Observations 16-30:
X1: 0.0474, -0.8713, -0.2158, 0.2179, 1.5426, -1.16, 0.9857, 0.6752, 0.5402, -1.4528, 1.9349, -0.8299, -0.681, 0.7304, -1.1305
X2: 0.9498, -0.1976, -1.7275, -0.9626, 1.3323, -0.9626, 0.1849, -1.345, 1.3323, -0.0064, 0.1849, 0.3761, -1.345, -0.7713, -0.0064
X3: 0.0308, -0.6885, 1.0584, 0.5446, 0.5446, -0.483, 0.4419, 1.264, 0.0308, -1.3051, 2.086, -0.5857, -0.2775, 1.5722, -1.3051
X4: 0.1543, -0.7278, 1.0046, 0.3752, 0.3752, -0.5839, 0.2613, 1.2843, -0.1543, -1.0635, 2.5631, -0.6578, -0.4241, 1.7339, -1.0635
X5: 1.1018, 0.6753, -0.391, -0.2843, 1.4217, 0.4621, -0.8175, 0.7819, 0.2488, 1.5283, -0.1777, -1.7771, 0.4621, -1.0307, 0.3554
Y: 1.1477, -0.3916, -0.4624, -1.0993, 2.8639, -1.0285, 0.1923, -0.7631, 2.05, 1.0238, 0.9177, -1.2055, -0.3208, -0.5862, -0.6216
• 42. Numerical Experience and Comparison
• We constructed model functions for these data using the MARS software (Salford), where we selected the maximum number of basis elements $M_{\max} = 5$. Then:
Model 1 (ω = 1):
BF1 = max{0, X2 + 1.728};
Y = -1.081 + 0.626 · BF1
Model 2 (ω = 2):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} · BF1;
Y = -1.073 + 0.499 · BF1 + 0.656 · BF2
Model 3 (ω = 3, best model):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} · BF1;
BF4 = max{0, X3 + 0.586} · BF1;
Y = -1.176 + 0.422 · BF1 + 0.597 · BF2 + 0.236 · BF4
• 43. Numerical Experience and Comparison
• and, finally,
Model 4 (ω = 4):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} · BF1;
BF3 = max{0, 0.462 - X5} · BF1;
BF4 = max{0, X3 + 0.586} · BF1;
Y = -1.242 + 0.555 · BF1 + 0.484 · BF2 - 0.093 · BF3 + 0.226 · BF4
Model 5 (ω = 5):
BF1 = max{0, X2 + 1.728};
BF2 = max{0, X5 - 0.462} · BF1;
BF3 = max{0, 0.462 - X5} · BF1;
BF4 = max{0, X3 + 0.586} · BF1;
BF5 = max{0, -0.586 - X3} · BF1;
Y = -1.248 + 0.487 · BF1 + 0.486 · BF2 - 0.118 · BF3 + 0.282 · BF4 + 0.263 · BF5
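For illustration, the printed Model 3 (the best model above) can be evaluated on new standardized inputs as follows; this is a sketch assembled from the slide, not the Salford implementation:

```python
import numpy as np

def model3_predict(X2, X3, X5):
    """Prediction of Model 3 from its printed basis functions and coefficients."""
    BF1 = np.maximum(0.0, X2 + 1.728)
    BF2 = np.maximum(0.0, X5 - 0.462) * BF1
    BF4 = np.maximum(0.0, X3 + 0.586) * BF1
    return -1.176 + 0.422 * BF1 + 0.597 * BF2 + 0.236 * BF4
```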
• 44. Numerical Experience and Comparison
• Then, we considered a large model with five basis functions; we found (writing a MATLAB code):
$L = \operatorname{diag}\left( 0,\ 1.8419,\ 0.7514,\ 0.9373,\ 2.1996,\ 0.3905 \right)$.
• We constructed models using different values for $M$ in the optimization problem, which was solved by MOSEK (CQP).
• Our algorithm always constructs a model with 5 parameters; in the case of Salford MARS, there are 1, 2, 3, 4 or 5 parameters.
• 45. Numerical Experience and Comparison
RESULTS OF SALFORD MARS
ω | RSS     | z = √RSS | t = ||Lθ||₂ | GCV
1 | 17.6425 | 4.2003   | 1.1531      | 0.771
2 | 11.1870 | 3.3447   | 1.0430      | 0.613
3 |  7.7824 | 2.7897   | 1.0368      | 0.550
4 |  6.6126 | 2.5715   | 1.1967      | 0.626
5 |  6.2961 | 2.5092   | 1.1600      | 0.840
• 46. Numerical Experience and Comparison
RESULTS OF OUR APPROACH
M       | ω | z = √RSS | t = ||Lθ||₂ || M      | ω | z = √RSS | t = ||Lθ||₂
0.05    | 5 | 5.16894  | 0.05        || 0.2940 | 5 | 4.2024   | 0.2940
0.1     | 5 | 4.959342 | 0.1         || 0.2945 | 5 | 4.2006   | 0.2945
0.15    | 5 | 4.755559 | 0.15        || 0.295  | 5 | 4.1988   | 0.2950
0.2     | 5 | 4.557617 | 0.2         || 0.3    | 5 | 4.180557 | 0.3
0.25    | 5 | 4.365811 | 0.25        || 0.35   | 5 | 4.002338 | 0.35
0.265   | 5 | 4.3095   | 0.2650      || 0.4    | 5 | 3.831675 | 0.4
0.275   | 5 | 4.2723   | 0.2750      || 0.45   | 5 | 3.669118 | 0.45
0.285   | 5 | 4.2354   | 0.2850      || 0.5    | 5 | 3.515233 | 0.5
0.2865  | 5 | 4.2299   | 0.2865      || 0.55   | 5 | 3.370588 | 0.55
0.2875  | 5 | 4.2262   | 0.2875      || 0.552  | 5 | 3.3650   | 0.5520
0.2885  | 5 | 4.2226   | 0.2885      || 0.555  | 5 | 3.3567   | 0.5550
0.2895  | 5 | 4.2189   | 0.2895      || 0.558  | 5 | 3.3483   | 0.558
0.28965 | 5 | 4.2183   | 0.2897      || 0.560  | 5 | 3.3428   | 0.5600
0.28975 | 5 | 4.2180   | 0.2897      || 0.561  | 5 | 3.3401   | 0.5610
0.28985 | 5 | 4.2176   | 0.2899      || 0.562  | 5 | 3.3373   | 0.5620
0.28995 | 5 | 4.2172   | 0.2899      || 0.565  | 5 | 3.3291   | 0.5650
• 47. Numerical Experience and Comparison
M     | ω | z = √RSS | t = ||Lθ||₂ || M    | ω | z = √RSS | t = ||Lθ||₂
0.575 | 5 | 3.3019   | 0.5750      || 0.96 | 5 | 2.5968   | 0.96
0.585 | 5 | 3.2751   | 0.5850      || 0.97 | 5 | 2.5880   | 0.97
0.595 | 5 | 3.2488   | 0.5950      || 0.98 | 5 | 2.5797   | 0.98
0.6   | 5 | 3.235746 | 0.6         || 0.99 | 5 | 2.5718   | 0.99
0.65  | 5 | 3.111253 | 0.65        || 1    | 5 | 2.564459 | 1
0.7   | 5 | 2.997622 | 0.7         || 2    | 5 | 2.509165 | 1.16009
0.75  | 5 | 2.895324 | 0.75        || 2.1  | 5 | 2.509165 | 1.16009
0.8   | 5 | 2.804764 | 0.8         || 2.2  | 5 | 2.509165 | 1.16009
0.805 | 5 | 2.7964   | 0.8050      || 2.3  | 5 | 2.509165 | 1.16007
0.810 | 5 | 2.7881   | 0.8100      || 2.4  | 5 | 2.509165 | 1.16008
0.820 | 5 | 2.7719   | 0.8200      || 2.5  | 5 | 2.509165 | 1.16001
0.830 | 5 | 2.7562   | 0.8300      || 2.6  | 5 | 2.509165 | 1.16007
0.840 | 5 | 2.7410   | 0.8400      || 2.7  | 5 | 2.509165 | 1.16007
0.85  | 5 | 2.726261 | 0.85        || 2.8  | 5 | 2.509165 | 1.16009
0.9   | 5 | 2.660023 | 0.9         || 2.9  | 5 | 2.509165 | 1.16009
0.95  | 5 | 2.60612  | 0.95        || 3    | 5 | 2.509165 | 1.16009
      |   |          |             || 4    | 5 | 2.509165 | 1.160084
• 48. Numerical Experience and Comparison
• We drew L-curves: $\left\| \psi(d)\theta - y \right\|_2$ plotted against $\left\| L\theta \right\|_2$. [Figure: two L-curve panels.]
• Conclusion: Based on the L-curve criterion and for the given data, our solution is better than the Salford MARS solution.
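A sketch of how such an L-curve can be traced; here the curve is parametrized through the equivalent Tikhonov form rather than by the bound M of the CQP, which is a simplification of this illustration:

```python
import numpy as np

def l_curve(Psi, y, L, lambdas):
    """Trace the L-curve: (||L theta||_2, ||Psi theta - y||_2) over a grid of lambda."""
    points = []
    for lam in lambdas:
        A = np.vstack([Psi, np.sqrt(lam) * L])
        b = np.concatenate([y, np.zeros(L.shape[0])])
        theta, *_ = np.linalg.lstsq(A, b, rcond=None)
        points.append((np.linalg.norm(L @ theta), np.linalg.norm(Psi @ theta - y)))
    return np.array(points)

# pts = l_curve(Psi, y, L, np.logspace(-3, 3, 50))
# import matplotlib.pyplot as plt
# plt.plot(pts[:, 0], pts[:, 1], "o-"); plt.xlabel("||L theta||_2"); plt.ylabel("||Psi theta - y||_2")
```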
• 49. Numerical Experience and Comparison
• All test data sets are also compared according to performance measures such as MSE, MAE, correlation coefficient, R², PRESS, Mallows' Cp, etc.
• These measures are based on the average of nine values (one for each fold and each replication).
• 50. Numerical Experience and Comparison
Please find much more numerical experience and comparison in: Yerlikaya, Fatma, A New Contribution to Nonlinear Robust Regression and Classification with MARS and Its Application to Data Mining for Quality Control in Manufacturing, M.Sc. thesis, Institute of Applied Mathematics, METU, Ankara, 2008.
• 51. Piecewise Linear Functions - Stock Market (figures generated by Erik Kropat)
• 52. Forward Stepwise Algorithm Revisited (high complexity)
• 57. Regularization & Uncertainty, Robust Optimization (Laurent El Ghaoui)
  • 58. Regularization & Uncertainty Robust Optimization
• 59. References
• Aster, A., Borchers, B., and Thurber, C., Parameter Estimation and Inverse Problems, Academic Press, 2004.
• Breiman, L., Friedman, J.H., Olshen, R., and Stone, C., Classification and Regression Trees, Belmont, CA: Wadsworth Int. Group, 1984.
• Craven, P., and Wahba, G., Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation, Numerische Mathematik 31 (1979) 377-403.
• Friedman, J.H., Multivariate adaptive regression splines, The Annals of Statistics 19, 1 (1991) 1-141.
• Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.
• Hastie, T., Tibshirani, R., and Friedman, J.H., The Elements of Statistical Learning, Springer Verlag, NY, 2001.
• MOSEK software, http://www.mosek.com/.
• Myers, R.H., and Montgomery, D.C., Response Surface Methodology: Process and Product Optimization Using Designed Experiments, New York: Wiley, 2002.
• Nemirovski, A., Lectures on Modern Convex Optimization, Israel Institute of Technology (2002), http://iew3.technion.ac.il/Labs/Opt/LN/Final.pdf.
• Nesterov, Y.E., and Nemirovskii, A.S., Interior Point Methods in Convex Programming, SIAM, 1993.
• Taylan, P., Weber, G.-W., and Beck, A., New approaches to regression by generalized additive models and continuous optimization for modern applications in finance, science and technology, Optimization 56, 5-6 (2007) 675-698.
• Taylan, P., Weber, G.-W., and Yerlikaya, F., Continuous optimization applied in MARS for modern applications in finance, science and technology, in: ISI Proceedings of 20th Mini-EURO Conference "Continuous Optimization and Knowledge-Based Technologies", Neringa, Lithuania, May 20-23, 2008.