AIAA-SDM-PEMF-2013

Quantifying Regional Error in Surrogates by Modeling its
Relationship with Sample Density
Ali Mehmani, Souma Chowdhury, Jie Zhang, Weiyang Tong,
and Achille Messac
Syracuse University, Department of Mechanical and Aerospace Engineering
54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and
Materials Conference
April 8-11, 2013, Boston, Massachusetts

Surrogate model
• Surrogate models are commonly used to provide a tractable and inexpensive approximation of the actual system behavior in many routine engineering analysis and design activities.

Broad Research Question
[Figure: structural blade design (ANSYS, Inc.). An expensive model is replaced by an inexpensive, lower-fidelity surrogate model; how low is the fidelity?]

Broad Research Question
How can the level of surrogate accuracy be quantified? Such a measure supports:
- further improvement of the surrogate,
- domain exploration,
- assessing the reliability of the optimal design,
- quantifying the uncertainty associated with the surrogate,
- construction of a weighted surrogate model, and
- …

Research Objective
• Develop a reliable method to quantify the surrogate error.
• This method should have the following characteristics:
  - model independent
  - no additional system evaluations
  - local/global error measurement
  - quantifies the error of the actual surrogate

Regional Error Estimation of Surrogate
(REES)

Presentation Outline
• Review of surrogate model error measurement methods
• Relation of surrogate accuracy with sample density
• Regional Error Estimation of Surrogate
• Numerical examples: benchmark problems and an engineering design problem

Surrogate Model Error Measurement Methods
• Error quantification methods can be classified:
  - based on their computational expense, into methods that require additional data and methods that use existing data;
  - based on the region of interest, into:
    - global error measures (e.g., split sample, cross-validation, Akaike's information criterion, and bootstrapping);
    - local or point-wise error measures (e.g., the mean squared error for Kriging and the linear reference model (LRM)).

• Error metrics:
  - The mean squared error (MSE), or root mean squared error (RMSE)
  - The maximum absolute error (MAE)
  - The relative absolute error (RAE)
Each metric compares the actual value with the predicted value at the i-th test point.
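The slide's formulas did not survive extraction, so the following is only a minimal sketch of the three metrics under commonly used definitions. The function name `error_metrics` is hypothetical, and the point-wise form of RAE (absolute error divided by the absolute actual value) is an assumption rather than the authors' stated formula.

```python
import numpy as np

def error_metrics(y_actual, y_pred):
    """Return MSE, RMSE, maximum absolute error, and point-wise RAE."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_actual - y_pred)
    mse = float(np.mean(abs_err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.max(abs_err)),        # maximum absolute error
        # Assumed point-wise definition: |actual - predicted| / |actual|
        "rae": abs_err / np.abs(y_actual),
    }

metrics = error_metrics([2.0, 4.0, 5.0], [1.0, 4.0, 7.0])
```

Under this definition the RAE is undefined where the actual value is zero, so a practical version would need to guard against that case.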

• Error metrics:
  - The prediction sum of squares (PRESS), based on the leave-one-out cross-validation error.
  - The root mean square of PRESS ($PRESS_{RMS}$), based on k-fold cross-validation.
  - The relative absolute error of cross-validation ($RAE_{cv}$), based on the leave-one-out approach.
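As an illustration of the leave-one-out idea behind PRESS, here is a minimal sketch. It assumes a 1-D polynomial fit stands in for the surrogate; the function name `loo_press` and the test data are hypothetical. The root-mean-square value is computed here from the leave-one-out residuals, not from the k-fold variant named on the slide.

```python
import numpy as np

def loo_press(x, y, degree=2):
    """Leave-one-out PRESS and its root mean square for a 1-D polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = np.empty(len(x))
    for i in range(len(x)):
        keep = np.arange(len(x)) != i                  # hold out point i
        coeffs = np.polyfit(x[keep], y[keep], degree)  # refit without it
        residuals[i] = y[i] - np.polyval(coeffs, x[i])
    press = float(np.sum(residuals ** 2))
    return press, float(np.sqrt(press / len(x)))

x = np.linspace(0.0, 1.0, 8)
press_exact, _ = loo_press(x, 1.0 + 2.0 * x - 3.0 * x ** 2)  # quadratic data
press_rough, rms_rough = loo_press(x, np.sin(6.0 * x))       # non-polynomial data
```

The quadratic data are reproduced almost exactly, so their PRESS is near zero, while the sine data leave a clearly larger residual.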

Methodology: Concept
Model accuracy ∝ Available resources
In general, this concept applies to different methodologies:
- Surrogate modeling,
- Finite Element Analysis, and
- ...

• Finite Element Analysis (numerical methods)
[Figure: coarse mesh (4 solid brick elements), medium mesh, and fine mesh.]
Estimate the total shear force and flexural moment at vertical sections using Finite Element Analysis.
The finer the mesh, the more precise the stresses, owing to the larger number of elements.

• Surrogate (mathematical model)
[Figure: surrogates constructed with 3, 7, and 9 training points.]
Surrogate accuracy generally improves as the number of training points increases.
The location of the additional points has a strong impact on surrogate accuracy; this impact is highly problem- and model-dependent.

Methodology: REES
• The REES method formulates the variation of the error as a function of the number of training points, using intermediate surrogates.
• This formulation is used to predict the level of error in the final surrogate.

Methodology: REES
Step 1 : Generation of sample data
The entire set of sample points is represented by $\{X\}$.
Step 2 : Identification of sample points inside/outside the region of interest
$\{X_{in}\}$ : inside-region data set (samples within the user-defined region of interest)
$\{X_{out}\}$ : outside-region data set
$\{X\} = \{X_{in}\} + \{X_{out}\}$
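Steps 1 and 2 can be sketched as follows. The sketch assumes the user-defined region of interest is an axis-aligned box; the function name `split_by_region`, the random sampling, and the bounds are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def split_by_region(X, lower, upper):
    """Split samples into those inside and outside an axis-aligned box."""
    X = np.asarray(X, dtype=float)
    inside = np.all((X >= lower) & (X <= upper), axis=1)
    return X[inside], X[~inside]

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 2))   # Step 1: sample data (assumed uniform)
X_in, X_out = split_by_region(X, lower=[0.25, 0.25], upper=[0.75, 0.75])  # Step 2
```

Every sample lands in exactly one of the two sets, which matches the relation between the full set and its inside-region and outside-region subsets.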

Methodology: REES
Step 3 : Estimation of the variation of the error with sample density
[Figure: inside-region and outside-region points around the user-defined region of interest. In the first, second, and third iterations, the samples are divided into test points and training points; in the final surrogate, all samples are training points.]

Methodology: REES
• The position of the sample points selected as training points at each iteration is critical to the surrogate accuracy.
• The proposed error measure should be minimally sensitive to the location of the test points at each iteration.
• Intermediate surrogates are constructed at each iteration over a sample set comprising all samples outside the region of interest and heuristic subsets of the samples inside the region of interest.

Methodology: REES
• The number of iterations ($N^{it}$) is defined based on:
  - the dimension of the problem,
  - the number of inside-region sample points, and
  - the preference of the user.
• The number of sample combinations ($K_t$) is defined for each iteration.
• The intermediate subset for each combination at a specific iteration is defined by
$\{\beta_k\} \subset \{X_{in}\}, \quad \#\{\beta_k\} = n_t, \quad n_{t-1} < n_t, \quad k = 1, 2, \ldots, K_t$
• The intermediate training points and test points for each combination at each iteration are defined by
$\{X_{TR}\} = \{X_{out}\} + \{\beta_k\}$
$\{X_{TE}\} = \{X\} - \{X_{TR}\}$
• The intermediate surrogates $f_k$, $k = 1, 2, \ldots, K_t$, are constructed for all combinations using the intermediate training points ($\{X_{TR}\}$), and are tested over the intermediate test points ($\{X_{TE}\}$).
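A single iteration of this procedure might look like the sketch below. Several choices here are assumptions made only for illustration: subsets are drawn at random (the slides call them heuristic subsets without giving the rule), a 1-D quadratic polynomial stands in for the surrogate, RAE is taken as the absolute error divided by the absolute actual value, and the name `rees_iteration` is hypothetical. Stored responses are reused, so no additional system evaluations are made.

```python
import numpy as np

def rees_iteration(x_in, y_in, x_out, y_out, n_t, K_t, rng, degree=2):
    """Median and maximum RAE over K_t inside-region subsets of size n_t."""
    med, mx = np.empty(K_t), np.empty(K_t)
    for k in range(K_t):
        pick = rng.permutation(len(x_in))
        tr, te = pick[:n_t], pick[n_t:]              # beta_k and the test points
        x_tr = np.concatenate([x_out, x_in[tr]])     # X_TR = X_out + beta_k
        y_tr = np.concatenate([y_out, y_in[tr]])
        coeffs = np.polyfit(x_tr, y_tr, degree)      # intermediate surrogate f_k
        pred = np.polyval(coeffs, x_in[te])          # tested on X_TE = X - X_TR
        rae = np.abs(y_in[te] - pred) / np.abs(y_in[te])
        med[k], mx[k] = np.median(rae), np.max(rae)
    return med, mx

x = np.linspace(0.0, 1.0, 41)
y = 2.0 + np.sin(6.0 * x)                            # strictly positive responses
inside = (x >= 0.3) & (x <= 0.7)                     # assumed region of interest
rng = np.random.default_rng(1)
med, mx = rees_iteration(x[inside], y[inside], x[~inside], y[~inside],
                         n_t=5, K_t=50, rng=rng)
```

Repeating the call with increasing `n_t` would give the per-iteration error samples that the later steps summarize.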

Methodology: REES
• The median and the maximum errors are estimated for each combination, where
$m_t$ : the number of test points in the $t$-th iteration
$e$ : the RAE value estimated on the intermediate test points
• The median error provides overall fidelity information; the maximum error provides minimum fidelity information.
• The median is a useful measure of central tendency that is less vulnerable to outliers.
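The slide's equations were lost in extraction. One plausible way to write the two per-combination quantities, using the symbols $m_t$ and $e$ defined on the slide, is given below; the names $E_{med}^{k}$ and $E_{max}^{k}$ are assumed here purely for illustration.

```latex
E_{med}^{k} = \operatorname{median}\left(e_1, e_2, \ldots, e_{m_t}\right), \qquad
E_{max}^{k} = \max\left(e_1, e_2, \ldots, e_{m_t}\right), \qquad
k = 1, 2, \ldots, K_t
```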

Methodology: REES
• Probabilistic models are developed using a lognormal distribution to represent the median and maximum errors estimated over all $K_t$ combinations at each iteration.
• The mode of each distribution is selected to represent the errors at each iteration: the mode of the median error distribution ($Mo_{med}$) and the mode of the maximum error distribution ($Mo_{max}$).
• These values are used to relate the variation of the surrogate error to the number of training points (sample density).
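For a lognormal distribution with log-space parameters $\mu$ and $\sigma$, the mode is $\exp(\mu - \sigma^2)$. The sketch below estimates it from a set of positive error values using maximum-likelihood estimates of $\mu$ and $\sigma$; the function name `lognormal_mode` and the synthetic sample are assumptions, and the authors' fitting procedure may differ.

```python
import numpy as np

def lognormal_mode(errors):
    """Mode of a lognormal distribution fitted to positive error values."""
    logs = np.log(np.asarray(errors, dtype=float))
    mu, sigma = logs.mean(), logs.std()      # maximum-likelihood estimates
    return float(np.exp(mu - sigma ** 2))    # mode of a lognormal distribution

rng = np.random.default_rng(0)
sample = rng.lognormal(mean=-2.0, sigma=0.5, size=20000)  # synthetic errors
mode = lognormal_mode(sample)                # true mode is exp(-2 - 0.25)
```

Applied to the median errors and to the maximum errors of one iteration, this would give that iteration's $Mo_{med}$ and $Mo_{max}$ values.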

The relation of the error with sample density
• 12-D test problem (Dixon & Price, n = 12)
Number of sample points: $\#\{X\} = 500$; number of inside-region sample points: $\#\{X_{in}\} = \#\{X\}$
Number of training points at each iteration: $n_t = 5t + 50$, $t = 1, 2, \ldots, 70$
Number of sample combinations: $K_t = 500$
[Figure: estimated mode of median errors ($Mo_{med}$) and estimated mode of maximum errors ($Mo_{max}$) versus the number of training points. First iteration: $\#\{X_{TR}\} = 55$, $\#\{X_{TE}\} = 445$. Last iteration: $\#\{X_{TR}\} = 400$, $\#\{X_{TE}\} = 100$.]
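For reference, the test setup can be sketched as below. The Dixon & Price function is written in its standard form, and the training-size schedule follows the rule stated on the slide. The variable names and the check at the known minimizer are illustrative additions.

```python
import numpy as np

def dixon_price(x):
    """Dixon & Price test function for an n-dimensional point x."""
    x = np.asarray(x, dtype=float)
    i = np.arange(2, len(x) + 1)
    return float((x[0] - 1.0) ** 2 + np.sum(i * (2.0 * x[1:] ** 2 - x[:-1]) ** 2))

t = np.arange(1, 71)
n_t = 5 * t + 50          # training points per iteration: 55, 60, ..., 400

# Known global minimizer: x_i = 2 ** (-(2**i - 2) / 2**i), where the function is zero
idx = np.arange(1, 13)
x_star = 2.0 ** (-(2.0 ** idx - 2.0) / 2.0 ** idx)
f_star = dixon_price(x_star)
```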

The relation of the error with sample density
• 12-D test problem (Dixon & Price, n = 12)
[Figure: estimated mode of median errors ($Mo_{med}$) from the REES method, compared with the estimated mean of mean errors ($Mean_{mean}$) from normalized k-fold cross-validation.]

Methodology: REES
Step 4 : Prediction of regional error in the final surrogate
• The final surrogate model is constructed using the full set of training data.
• Regression models are applied to relate
  - the statistical mode of the median error distribution ($Mo_{med}$),
  - the statistical mode of the maximum error distribution ($Mo_{max}$), and
  - the absolute maximum error ($ABS_{max}$)
  at each iteration to the number of inside-region training points ($n_t$).
• These regression models are called the variation of error with sample density (VESD).
• The regression models are used to predict the level of error in the final surrogate within the region of interest.

Methodology: REES
Modeling the Variation of Regional Error with Training Point Density
• In this study, three types of regression functions are used to represent the variation of regional error with respect to the number of inside-region training points:
  - Exponential regression model
  - Multiplicative regression model
  - Linear regression model
• The choice of these functions assumes a smooth, monotonic decrease of the regional error with the training point density within that region.
• The root mean squared error metric is used to select the best-fit regression model.
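The model-selection step can be sketched as follows. The functional forms are assumptions, since the slide's equations were lost: exponential as $a_0 e^{a_1 n}$, multiplicative as a power law $a_0 n^{a_1}$, and linear as $a_0 + a_1 n$. The exponential and multiplicative fits are done by linear least squares in log space, which is a simplification, and the name `fit_vesd` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_vesd(n, err):
    """Fit three candidate error-vs-density models; return the lowest-RMSE one."""
    n, err = np.asarray(n, dtype=float), np.asarray(err, dtype=float)
    b, a = np.polyfit(n, np.log(err), 1)          # log(err) = a + b * n
    d, c = np.polyfit(np.log(n), np.log(err), 1)  # log(err) = c + d * log(n)
    m, q = np.polyfit(n, err, 1)                  # err = q + m * n
    models = {
        "exponential": lambda x: np.exp(a + b * x),
        "multiplicative": lambda x: np.exp(c) * x ** d,
        "linear": lambda x: q + m * x,
    }
    rmse = {name: float(np.sqrt(np.mean((g(n) - err) ** 2)))
            for name, g in models.items()}
    best = min(rmse, key=rmse.get)                # select by RMSE, as on the slide
    return best, models[best], rmse

n_t = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
mo_med = 0.8 * np.exp(-0.1 * n_t)                 # synthetic, exactly exponential
best, vesd, rmse = fit_vesd(n_t, mo_med)
predicted = float(vesd(40.0))                     # extrapolate to the final sample size
```

Evaluating the selected model at the number of inside-region points in the final surrogate gives the predicted regional error.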

Numerical Examples
• The effectiveness of the REES method is explored for applications with
- Kriging,
- Radial Basis Functions (RBF),
- Extended Radial Basis Functions (E-RBF), and
- Quadratic Response Surface (QRS).
• To evaluate the practical and numerical efficiency of the REES method, three benchmark problems and an engineering design problem are tested.
• The error evaluated using REES and the relative absolute error given by leave-one-out cross-validation ($RAE_{cv}$) are compared with the actual error, evaluated as the relative absolute error on additional test points ($RAE_{actual}$).

Numerical Examples
Results and Discussion
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the median error.
[Figure: median of RAEs versus the number of inside-region training points, showing the distribution of median errors, the mode of the median error distribution, the $VESD_{med}$ regression, and the predicted mode of median error in the final surrogate.]

Numerical Examples
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the median error.
[Panels: Kriging, RBF, E-RBF, and QRS, each with the type and coefficients of $VESD_{med}$.]

Numerical Examples
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of maximum ($Mo_{max}$) and the absolute maximum ($ABS_{max}$) error.
[Figure: maximum of RAEs versus the number of inside-region training points, showing the distribution of maximum errors, the mode of the maximum error distribution, the absolute maximum error, and the predicted mode of maximum error and predicted absolute maximum error in the final surrogate.]

Numerical Examples
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of maximum ($Mo_{max}$) and the absolute maximum ($ABS_{max}$) error.
[Panels: Kriging, RBF, E-RBF, and QRS, each with the type and coefficients of $VESD_{ABS}$ and $VESD_{max}$.]

Numerical Examples
Wind Farm Power Generation
Surrogates are developed using Kriging, RBF, E-RBF, and QRS to
represent the power generation of an array-like wind farm.

Numerical Examples
VESD regression models in different surrogates for the wind farm power generation problem
[Figure: errors at iterations 1 to 4 (It. 1 to It. 4) and the predicted error.]

Numerical Examples
[Figure: comparison of the predicted mode of median errors, the median of RAEs evaluated on test points, and the median of relative absolute errors of cross-validation.]
The closer the value is to one, the better the corresponding error measure.

Concluding Remarks
• We developed a new method to quantify surrogate error based on the hypothesis that:
“The accuracy of the approximation model is related to the amount of available resources”
• This relationship can be reliably quantified when the error measure is less sensitive to the sample locations or to the type of application.
• The REES method addresses this issue.
• The preliminary results on benchmark and wind farm power generation problems indicate that, in the majority of cases, the REES method is more accurate than other measures.
It is not possible using any existing methods

Future Works
• Scope for improving the method
• Implementation of the proposed error measurement in surrogate development

Acknowledgement
• I would like to acknowledge my research adviser, Prof. Achille Messac, and my co-adviser, Prof. Souma Chowdhury, for their immense help and support in this research.
• Support from the NSF Awards is also acknowledged.

Thank you
Questions and Comments

Numerical Examples
VESD regression models within the region of interest of surrogate models constructed for the Branin-Hoo function to predict the mode of the median error.
[Figure: median of RAEs versus the number of inside-region training points (distribution of median errors, mode of the median error distribution, $VESD_{med}$, and predicted mode of median error in the final surrogate), alongside the mean of mean errors ($Mean_{mean}$) from k-fold CV versus the number of inside-region training points.]
