Multiple Target Machine Learning Prediction of Capacity Curves of Reinforced Concrete Shear Walls

Journal of Soft Computing in Civil Engineering 5-4 (2021) 90-113
How to cite this article: Yang Y, Cho IH. Multiple target machine learning prediction of capacity curves of reinforced concrete
shear walls. J Soft Comput Civ Eng 2021;5(4):90–113. https://doi.org/10.22115/scce.2021.314998.1381
2588-2872/ © 2021 The Authors. Published by Pouyan Press.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Contents lists available at SCCE
Journal of Soft Computing in Civil Engineering
Journal homepage: www.jsoftcivil.com
Multiple Target Machine Learning Prediction of Capacity Curves
of Reinforced Concrete Shear Walls
Yicheng Yang1
, In Ho Cho2*
1. Ph.D. Candidate, Civil, Construction and Environmental Engineering, Iowa State University, Iowa, United States
2. Associate Professor, Civil, Construction and Environmental Engineering, Iowa State University, Iowa, United
States
Corresponding author: icho@iastate.edu
https://doi.org/10.22115/SCCE.2021.314998.1381
ARTICLE INFO ABSTRACT
Article history:
Received: 13 November 2021
Revised: 22 November 2021
Accepted: 22 November 2021
Reinforced concrete (RC) shear wall is one of the most
widely adopted earthquake-resisting structural elements.
Accurate prediction of capacity curves of RC shear walls has
been of significant importance since it can convey important
information about progressive damage states, the degree of
energy absorption, and the maximum strength. Decades-long
experimental efforts of the research community established a
systematic database of capacity curves, but it is still in its
infancy to productively utilize the accumulated data. In the
hope of adding a new dimension to earthquake engineering,
this study provides a machine learning (ML) approach to
predict capacity curves of the RC shear wall based on a
multi-target prediction model and fundamental statistics.
This paper harnesses bootstrapping for uncertainty
quantification and affirms the robustness of the proposed
method against erroneous data. Results and validations using
more than 200 rectangular RC shear walls show a promising
performance and suggest future research directions toward
data- and ML-driven earthquake engineering.
Keywords:
Machine learning for capacity
curve prediction;
Multiple-target regression
model;
Clus;
Shear wall database;
Uncertainty quantification.

Y. Yang, I.H. Cho/ Journal of Soft Computing in Civil Engineering 5-4 (2021) 90-113 91
1. Introduction
In the past decades, persistent efforts have been devoted to gaining insights into the nonlinear
behaviors of damaged rectangular RC walls [1–4]. Driven by these accomplishments, the
research community benefits from databases (e.g., ACI 445B Shear Wall Database, Peer
Structural Performance Databases, and DesignSafe Platform). Many ML-based predictions of
RC structures have been on trial [5,6]. ML gives computers the ability to learn complex data
without explicitly programmed rules. ML can be categorized into single-target prediction and
multiple-target prediction methods regarding the number of prediction targets. There exist
various applications of single-target ML methods in infrastructure engineering. The prediction of
the shear strength of a deep beam was conducted by support vector regression (SVR). The
researchers modified the SVR algorithm to optimize hyperparameters to be more suitable for
civil applications [7,8]. Valdebenito [9] estimated the in-plane shear strength of reinforced
masonry (RM) using the artificial neural network (ANN). ANN model was trained and tested by
285 RM walls from pieces of literature. The compressive strength of high-performance concrete
had been predicted using the ensemble method [10]. Furthermore, with the interest of vertical
structural elements, prediction of horizontal forces was made via support vector machine and
ANN [5,11].
However, ML-based prediction of force-displacement (F-D) capacity curves is challenging since
it involves multiple-target predictions. Two rare examples of curve prediction include predictions
of soil-water characteristic curves (SWCCs) using genetic programming (GP) [12] and ANN
[13], respectively. In Johari’s work, SWCC itself was learned and predicted by the GP, but the
final prediction is a complex mathematical expression of the curve. Sajib developed ANN
models of the SWCC fitting parameters to predict the suction-water content relationship.
In this paper, we adopt a multi-target regression model (MTRM) to predict the capacity curves of
RC shear walls. This paper is structured as follows. The second section demonstrates the
methodology of the MTRM and its extension with ensemble learning. The third section presents
complete procedures to build the capacity curve database and perform capacity curve prediction.
The fourth section summarizes predictive results, validation, and impact of the extended database
and erroneous data on the proposed method. Finally, the last section yields the conclusion and
discusses the limitations and future extensions.
2. Multi-target regression model
MTRM has been implemented in the open-source machine learning system (named Clus)
developed by Struyf [14]. Clus is a decision tree learner and rule learning system that works in
the predictive clustering trees (PCTs) [14]. Prior to the demonstration of MTRM, it is instructive
to introduce the background of ML. There are two categories of ML methods depending on
training data. The first category is “supervised” learning, in which ML trains with data consisting
of a pair of {𝒙(𝑙)
, 𝒚(𝑙)
} that stands for a vector of descriptive variables and 𝒚(𝑙)
∈ ℝ𝑘
represents a
target vector. The superscript (𝑙) indicates labeled data. Contrarily, “unsupervised” learning
trains ML with unlabeled data consisting of {𝒙(𝑢)
} where (𝑢) indicates unlabeled data. A

92 Y. Yang, I.H. Cho/ Journal of Soft Computing in Civil Engineering 5-4 (2021) 90-113
decision tree, a typical supervised learning method, is a tree-shaped graph that uses a branching
method to demonstrate every possible outcome of a decision. It is widely used in data mining to
simplify complex problems. It usually starts with a single node, which branches into all possible
outcomes.
Fig. 1. (a) Illustrative example of a PCT; (b) example of a rule ensemble.
Each of those outcomes will branch into other nodes, which represent other possibilities.
Clustering, a representative unsupervised learning method, tries to find a collection of points that
are similar to each other in terms of homogeneous values of all variables compared with points
out of the cluster. Decision tree and clustering are therefore considered as quite different
methods. Decision trees partition instances to subsets in terms of values of target attributes only,
and clustering splits instances to subclusters regarding the value of all descriptive attributes.
Noteworthy, a PCT is a decision tree whose leaves do not contain classes, and each node, as well
as each leaf, corresponds to a cluster in Fig. 1 (a) with instances in the form of {𝑥1, 𝑥2, 𝑦}.
Diversely, PCTs search for subsets with the values of both descriptive attributes and target
attributes [15]. MTRM shares the same algorithm with PCTs in the context of constructing
clusters. PCTs can be built with a standard “top-down induction of decision trees” (TDIDT)
algorithm [16]. Top-down PCTs shape in a triangle whose root is up. All instances locate at the
root at the beginning, and they are partitioned into subclusters by tests. The pseudo algorithm of
constructing PCTs is presented in Table 1 [17].
It is instructive to recap key strategies of PCTs, i.e., a splitting criterion, a stopping criterion, and
a pruning strategy, respectively. There are many splitting criteria (e.g., Shannon entropy [18] and
Gain Ratio [19]). The purpose of splitting clustering is to obtain subclusters such that intra-
cluster distance (the distance between examples belonging to different clusters) is minimized.
For regression problems, intra-cluster distance is specified as the intra-cluster variance. Given a
cluster and a test that will result in a partition of the cluster to decrease the variance, the intra-
cluster variance is defined as:
𝑣𝑎𝑟 = ∑ 𝑑(𝒙𝑖, 𝒙)2
𝑁
𝑖=1 (1)

where 𝒙 ∈ ℝ𝑛
is the mean vector of the cluster, and 𝒙𝑖 ∈ ℝ𝑛
(𝑖 = 1, ⋯ , 𝑁) is an element in the
cluster, and 𝑁 is the total number of elements in the cluster. The entity 𝑑 stands for the Euclidean
distance. Growing trees without stopping criteria will lead to an overfitting problem. Often, a test
is applied to check whether the class distribution in the sub-clusters differs significantly. Since
the regression problem uses intra-cluster variance as the heuristic for choosing the best split, then
a reasonable stopping criterion is to use an F-test to check whether variance decreased
significantly, and thus a test will be found.
Table 1
Algorithm of constructing PCTs.
1: Function 𝑷𝑪𝑻(Training instances 𝐼): 9: Function 𝑩𝑻(𝐼):
2: (𝑡∗
, 𝑝∗
) = 𝑩𝑻 (𝐼) ; 10: 𝑝 = partition induced on 𝐼 by 𝑡 ;
3: If 𝑡∗
≠ none 11: (𝑡∗
, 𝑝∗
, ℎ∗) = (None, 0.5, 0) ;
4: for each 𝐼𝑘 ∈ 𝑃∗ 12: ℎ = 𝒗𝒂𝒓(𝐼) − ∑
|𝐼𝑘|
|𝐼|
𝒗𝒂𝒓(𝐼𝑘)
𝐼𝑘 ∈𝑝 ;
5: 𝑇𝑟𝑒𝑒𝑘 = 𝑃𝐶𝑇(𝐼𝑘) ; 13: for each test
6: return node(𝑡𝑘, 𝑇𝑟𝑒𝑒𝑘) ; 14: if (ℎ > ℎ∗
)
7: else if 15: (𝑡∗
, 𝑝∗
, ℎ∗) = (𝑡, 𝑝, ℎ) ;
8: return leaf (𝐼𝑝𝑟𝑜𝑡𝑜𝑦𝑝𝑒) ; 16: return (𝑡∗
, 𝑝∗
) ;
If no acceptable test is found, the algorithm labels the leaf with the prototype instances and stops
the growth. Pruning strategy is a technique to remove trivial parts of the tree to identify
instances. Often pruning is done randomly for large data. This paper does not adopt any pruning
strategies due to our small database size. The illustration of the pseudo-algorithm of constructing
PCTs will help engineers with a comprehensive understanding of the MTRM. The PCT function
takes instances I as input to grow trees. An instance represents a row of the dataset in this paper.
The function PCT in line 1 of Table 1 is the algorithm's main function, which grows the decision
tree until stopping criteria are met. The function BT is invoked in line 2 to search for the best test
to partition training instances to hierarchical clusters. BT returns optimal 𝑡 and 𝑝, denoted as
(𝑡∗
, 𝑝∗), where 𝑡 is an action test of attribute values to induce a partition on I, 𝑝 is a partition
induced on 𝐼 by 𝑡 (e.g., In Fig. 1 (a), a test 𝑡 on root node checks whether 𝑥1 is larger than two or
not to partition 𝐼 at the root to two sub-clusters via a partition 𝑝). The superscript “*” represents
the optimal (i.e., best-so-far) quantities. With BT in line 2, PCT function is invoked recursively
to obtain trees and the corresponding nodes within the loop in lines 5 and 6. However, if the best
test is not found in line 7, then the algorithm will return a leaf labeled as the prototype instances
in line 8. Usually, the prototype instances have the lowest average distance to all other instances
in the cluster, such as the mean of the original instances.
Function BT is explained in the right column of Table 1. BT searches for the best test to partition
the cluster to minimize intra-cluster variance (i.e., maximize inter-cluster variance). In line 11,
the candidates for the best test (𝑡∗
) along with the corresponding partition (𝑝∗
) and heuristic
value (ℎ∗
) are initialized. Here, ℎ is defined in line 12, meaning a heuristic value of 𝑡. Function

var is defined in Eq. (1). Since 𝑡∗
is initially unknown, ℎ∗
is set as zero. The loop in line 13
calculates heuristic values of all possible tests to partition clusters. The best test and partition will
be chosen if a current heuristic value ℎ is larger than the initial heuristic value ℎ∗
(lines 14-15).
2.1. Ensemble method
An ensemble method has been used to boost the prediction accuracy of this study. This method
generates an ensemble of prediction models since combining a number of predictions is often
more accurate than an individual prediction model [20,21].
Table 2
Pseudo code of ensemble method.
1: Let 𝐌 = the original training data; 𝑛𝑝𝑚 = number of prediction models; 𝐗 = the test data
2: for 𝑖 = 1 𝑡𝑜 𝑛𝑝𝑚 do
3: Create an identical training set 𝐌𝐢 from 𝐌
4: Build a prediction model 𝑃𝑀𝑖 with 𝐌𝐢
5: end
6: for each test record 𝑥𝑗 ∈ 𝐗, 𝑗 = 1, … , 𝑛 do
7: 𝑃𝑀𝑓𝑖𝑛𝑎𝑙(𝑥𝑗) =
∑ 𝑃𝑀𝑖(𝑥𝑗)
𝑛𝑝𝑚
𝑖=1
𝑛𝑝𝑚
8: end
The general procedures for the ensemble method are summarized in Table 2. In line 3 of Table 2,
the main loop creates 𝑛𝑝𝑚 sets of training data M1, …, 𝐌𝐧𝐩𝐦
by the simple random sampling
method. It is a naive sampling method that generates every possible sample 𝐌𝐢 of size
𝑀
𝑛𝑝𝑚
from
the population of size M [22]. Each instance has an equal probability of being selected. Line 4
utilizes sets of training data to train 𝑛𝑝𝑚 base prediction models PM1, …, 𝑃𝑀𝑛𝑝𝑚
. Then line 7
aggregates predictions of all the models and algebraically averages these predictions as the final
output for the regression problem. Various approaches have been successfully applied to
construct ensemble learning. The popular ones are bootstrap aggregation (so-called bagging),
boosting, and random forests. Bagging, a technique to generate multiple repeated bootstrap
samples with replacement, is frequently used in classification and regression to improve stability
and accuracy [23]. Instead of generating a succession of independent bootstrap samples, boosting
trains multiple base prediction models using a weighted data set. Weights of samples are adjusted
by issuing more weights on misclassified samples [24]. In this paper, random forests are
implemented according to the research conclusion by Dragi, which indicates that multi-objective
random forests are significantly better than multi-objective bagging [25]. Random forests share
the same general procedures with other ensemble methods in Table 2. The general procedures to
build random forests are shown as follows:
1. Subsets training data 𝐌 to 𝑖 bootstrap samples 𝐌1, … , 𝐌𝑖 in line 3 of Table 2.
2. Build 𝑖 decision trees 𝐷𝑇1, … , 𝐷𝑇𝑖 with corresponding 𝐌𝑖 as suggested in line 4. At each node,
variables are selected at random out of all the features, and the best splits on these variables are
used to split the node. Each tree is growing to the largest extent without pruning.

3. Perform prediction with test data using each tree 𝐷𝑇𝑖 in line 7. The final prediction will be the
average of 𝑃𝑀1(𝑥𝑖), 𝑃𝑀2(𝑥𝑖), … 𝑃𝑀𝑛(𝑥𝑖) because it is a regression problem (𝑃𝑀𝑖(𝑥𝑖) is the
prediction from decision tree DTi).
In this paper, random forests have been employed as an ensemble learning method to cooperate
with MTRM. Random forest cooperates with MTRM mainly in terms of two aspects. Firstly
MTRM generates a collection of PCTs by bagging random forests instead of a single decision
tree. Secondly, MTRM randomly picks attributes as input for function BT in Table 1 instead of
using all attributes to find out the best test to partition the cluster.
Table 3
Algorithm of constructing rule ensembles. Note that I is training instances, and T is a collection of PCTs.
R and W represent the collection of rules generated from T and their corresponding weights.
1: 𝑮𝒆𝒏𝒆𝒓𝒂𝒕𝒆𝑺𝒆𝒕𝑶𝒇𝑷𝑪𝑻𝒔(𝐼): 5: 𝑶𝒑𝒕𝒊𝒎𝒊𝒛𝒆𝑾𝒆𝒊𝒈𝒉𝒕𝒔(𝑅, 𝐼):
2: return 𝑇; 6: If (weight of 𝑟 ∈ 𝑅 = 0)
3: 𝑪𝒐𝒏𝒗𝒆𝒓𝒕𝑷𝑪𝑻𝒔𝑻𝒐𝑹𝒖𝒍𝒆𝒔(𝑇): 7: remove 𝑟;
4: return 𝑅; 8: return (𝑅, 𝑊);
2.2. Rule ensemble for MTRM
Large ensembles of PCTs are hard to interpret. Thus, all PCTs are transcribed into a collection of
rules. Rule learning, a collection of unordered rules whose predictions are combined via
weighted voting, is an expressive and human-readable model representation. It is a conjunction
of statements along with input variables. To briefly explain how the rule ensemble interprets the
MTRM, the key algorithm to achieve rule ensembles of MTRM is summarized in Table 3 [26].
In line 1 of Table 3, function GenerateSetOfPCTs recursively calls function PCT in Table 1 to
generate bagging of PCTs, then line 2 returns a collection of PCTs. Such large ensembles of
PCTs are impossible to interpret, and thus all the trees are transcribed to sets of rules by function
ConvertPCTsToRules in line 3 [27]. Line 5 finds the optimized weight for each of those rules 𝑅
by function OptimizeWeights. During this process, it is trying to assign as many weights as
possible to zero, in the purpose of learning small and interpretable trees. A gradient-directed
optimization algorithm [26] optimizes all the weights. The physical meaning of weights indicates
the importance of each rule contributing to the final prediction. Lines 6 and 7 remove the trees if
their optimal weights are zero. Finally, line 8 returns a collection of rules whose weights are not
zero and their corresponding weights. Hence, the final prediction can be computed by the
following equation:
𝑦
̂ = 𝑤0𝑎𝑣𝑔 + ∑ 𝑤𝑖𝑟𝑖(𝑥)
𝑀
𝑖=1 (2)
where 𝑤0 is the baseline prediction, part (𝑎𝑣𝑔) is a constant vector with the average over all the
targets. The entity 𝑟𝑖 is a vector function which gives out a constant prediction shown in Fig. 1
(b) as a toy example. And 𝑤𝑖 is the corresponding weight of a rule. Note that 𝑀 indicates the
number of rules in a PCT. Fig. 1 (a) considers a population of instances with two descriptive
variables in the form of {𝑥1, 𝑥2} and a target response {𝑦}. A toy PCT is constructed on top of

founded tests, and each clustering of the PCT is represented by a conditional statement as a result
of function 𝑪𝒐𝒏𝒗𝒆𝒓𝒕𝑷𝑪𝑻𝒔𝑻𝒐𝑹𝒖𝒍𝒆𝒔 in Fig. 1 (b). A prediction of 𝑦 with {𝑥1, 𝑥2} = {5, 0.1}is
calculated as:
𝑦
̂ = 0.95(1) + 0.2 [if(𝑥1 > 4), then (1)] + 0.4 [if(𝑥1 > 3), then (3)] + 0 [if(𝑥2 < 2),
then (2)] + 0.3 [if(𝑥2 < 1), then (1)] + 0 [if(𝑥2 > 2.5), then (6)]
= 0.95 + 0.2 × 1 + 0.4 × 3 + 0 × 2 + 0.3 × 1 + 0 × 6 = 2.65 (3)
Conditions in the statements only take descriptive attributes into account because the rules will
be applied to the new unlabeled instances. In this paper, there are eight target variables, and thus
each rule will give a resultant vector of dimension eight. The adopted MTRM is PCTs employing
random forests, and the model is transcribed into a rule ensemble for better interpretation,
enabling the proposed model to predict multiple targets simultaneously.
2.3. Clus
MTRM has been implemented in the Clus, an open-source machine learning software that can be
downloaded from [28]. Clus is a decision tree and rule learning system that works in PCTs [14].
It is a Java-based platform to build both classification and regression trees by choosing different
operation settings. It has been successfully applied to plenty of tasks, including multi-target
regression and classification, structured output learning, time series prediction, etc [14]. Clus
provides many choices for operation settings. In particular, the operation settings related to the
multiple-target regression are explained. First, three input files are required: (1) a file with
training data, (2) a file with test data, and (3) a file specifying all the parameter settings. The
training and test data dictionary (i.e., files names and variable types) should be listed in these
setting files. Descriptive and target attributes in the dataset should be specified explicitly. Other
functionalities, including choices of ensemble method and rule ensemble, should be addressed
accordingly. Appendix A presents a brief example of input files. Full practical example files are
available in [29]. After training the model, an output file will be generated which contains
predictions for multiple target attributes. In addition, one can access the graphic PCTs in the
output file of which example is shown in Appendix B. One is referred to the Clus manual for
detailed instructions and additional settings.
3. Prediction of capacity curve
Although the proposed ML-based approach to capacity curve prediction can be applied to any
RC structure, this study demonstrates the potential by focusing on rectangular RC shear walls’
capacity curves. The training database is built upon a hybrid database consisting of real
experimental results and computational simulation results. A high-prediction parallel finite
element analysis platform (called VEEL, meaning virtual earthquake engineering laboratory) has
been adopted to ensure reliably simulated curves. VEEL’s general applicability and accuracy
have been well documented in [30]. VEEL is rooted in a number of microphysical mechanisms,
including a multi-directional smeared crack model, a topological information-based steel bar
model capable of capturing progressive bar buckling, a 3D interlocking-based nonlinear shear

mechanism, and a bar-concrete proximity-based general confinement model. An optimized
parallel computing algorithm is leveraged to effectively link millimeter length-scale mechanisms
to real-scale RC walls [31,32].
3.1. Transform capacity curve into multivariate targets
The size of the experiment-based database is too small for ML training. We need to enrich the
experimental database with simulated data without introducing a substantial loss of accuracy.
The original database contains global F-D responses of seven rectangular shear walls (i.e., RW1,
WSH1, WSH2, WSH3, WSH4, WSH5, and WSH6). The contrast between experimental F-Ds
from existing literature [2,33] to F-Ds simulated by VEEL is performed in Fig. 2 to emphasize
the precision of the original database. As summarized in Table 4, the variances occur in the axial
force ratio (af) in percentage, yield stress (fy) in MPa, the diameter of vertical reinforcement (db)
in millimeter, and concrete compressive strength (f’c) in MPa. It is challenging to rephrase the
continuous capacity curve into the multivariate target, which machine learning can learn and
predict. The overall procedures to extract the F-D capacity curve database are illustrated in Fig.
3. In Task 1 of Fig. 3, it is essential to extract the outermost points. Most of the outermost points
are related to the overall envelope of the capacity curve of a shear wall subjected to reverse and
cyclic loading. Although there is no strict restriction, 46 points are extracted from the shear wall
database, as visualized in Fig. 4. More points will improve the accuracy of the fitted capacity
curve, but this choice appears acceptable to capture the overall nonlinear envelops reasonably.
The extracted points on the capacity curve envelope are denoted as {𝑑𝑖, 𝐹𝑖}, 𝑖 = 1, … 46, where 𝑑𝑖
is a displacement and 𝐹𝑖 is the associated force point. We perform separate least-square fittings
on the positive and negative regimes to account for asymmetric shapes of general capacity curve
envelopes. 𝜷 ∈ ℝ𝑝
stands for parameters to be determined, and 𝜷 = [𝜷𝑃: 𝜷𝑁], 𝜷𝑃 =
{𝑃1, 𝑃2, … , 𝑃
𝑝}𝑇
and 𝜷𝑁 = {𝑁1, 𝑁2, … , 𝑁𝑝}𝑇
. Then, the optimal parameters (denoted by 𝜷
̂) for the
positive and negative regimes are obtained by
𝜷
̂𝑃 = argmin
𝜷𝑃
‖𝑭 − 𝐝𝜷𝑝‖
𝟐
, for 𝑑𝑖 ∈ ℝ+
(4)
𝜷
̂𝑁 = argmin
𝜷𝑁
‖𝑭 − 𝐝𝜷𝑛‖𝟐
, for 𝑑𝑖 ∈ ℝ−
(5)
where 𝐝 is the model matrix, 𝐝 ∈ ℝ46×4
of which ith row means 𝒅𝑖 = {𝑑𝑖, 𝑑𝑖
2
, 𝑑𝑖
3
, 𝑑𝑖
4
}. The
envelope force vector is 𝑭 = {𝐹1, 𝐹2, … , 𝐹46}. Thus, the p-parameter fitted model for the capacity
curve envelop is succinctly given by:
𝐹𝑖 = 𝐻(𝑑𝑖) ∑ 𝑃𝑙𝑑𝑖
𝑙
𝑝
𝑙=1 + 𝐻(−𝑑𝑖) ∑ 𝑁𝑙𝑑𝑖
𝑙
𝑝
𝑙=1 (6)
where 𝐻(𝑑) is the unit step function (i.e., one for 𝑑 > 0, zero otherwise); 𝑝 is the highest order of
base polynomials. This study chose 𝑝 = 4 for the polynomial bases rooted in the prior
knowledge that most capacity curves often exhibit convex or concave shapes. A higher-order
fitting may help, but our choice is justifiable since the values of R2
(the coefficient of
determination) calculated using our approach are commonly larger than 0.99. For the subsequent
multi-target machine learning, we added the optimal parameters

𝜷
̂ = [𝜷
̂𝑝: 𝜷
̂𝑛] = {𝑃
̂1, 𝑃
̂2, 𝑃
̂3, 𝑃
̂4, 𝑁
̂1, 𝑁
̂2, 𝑁
̂3, 𝑁
̂4}T
onto the existing wall database. Thus, 32
descriptive variables and eight target variables are included in the finalized database. Detailed
variable information is summarized in Appendix C. Overall, the capacity curve database
dimension is 182 × 40 (i.e., 182 instances with 40 attributes).
Table 4
Details of the original rectangular shear wall database.
RW1 WSH1 WSH2 WSH3 WSH4 WSH5 WSH6
af 0 ~ 30 0 ~ 40 0 ~ 40 0 ~ 40 0 ~ 40 0 ~ 40 0 ~ 40
fy 300 ~ 600 450 ~ 610 500 ~ 710 500 ~ 720 500 ~ 640 500 ~ 710 500 ~ 650
db 12.7 ~ 28.6 8 ~ 14 8 ~ 15 8 ~ 15 8 ~ 15 6 ~ 12 8 ~ 15
f’c 37.7 30 ~ 60 30 ~ 60 30 ~ 60 30 ~ 60 30 ~ 60 30 ~ 60
Fig. 2. (Top six panels) experimental F-D responses versus (bottom six panels) simulated F-D responses
by VEEL.
-600
-400
-200
0
200
400
600
-100 -75 -50 -25 0 25 50 75 100
Force
[kN]
Total displacement [mm]
WSH1
-600
-400
-200
0
200
400
600
-100 -75 -50 -25 0 25 50 75 100
Force
[kN]
WSH2
-600
-400
-200
0
200
400
600
-100 -75 -50 -25 0 25 50 75 100
Force
[kN]
WSH3
-600
-400
-200
0
200
400
600
-100 -75 -50 -25 0 25 50 75 100
Force
[kN]
WSH
4
-600
-400
-200
0
200
400
600
-100 -75 -50 -25 0 25 50 75 100
Force
[kN]
WSH5
-600
-400
-200
0
200
400
600
-100-80 -60 -40 -20 0 20 40 60 80 100
Force
[kN]
WSH6

Fig. 2. Flowchart of transformation of capacity curve database to multiple target database.
Fig. 3. Example of extraction of 46 outermost points from force-displacement (F-D) responses.
3.2. Multi-target prediction of capacity curve
This section explains the complete process of the multiple target ML prediction of the capacity
curves using PCTs. PCTs consider trees as a hierarchy of clusters with respect to many observed
descriptive variables to build trees to predict multiple targets simultaneously. As explained in the
previous section, our hybrid database contains 32 descriptive variables (denoted as 𝐗 ∈ ℝ𝑛×32
)
and eight target variables (𝐘 ∈ ℝ𝑛×8
). Thus, the ith row of 𝐗 is 𝒙(𝑖) = {𝑥1, … , 𝑥32}(𝑖) whereas the
ith row of 𝐘 is {𝑃
̂1, 𝑃
̂2, 𝑃
̂3, 𝑃
̂4, 𝑁
̂1, 𝑁
̂2, 𝑁
̂3, 𝑁
̂4}(𝑖)
. The prediction task is to predict 𝒚(𝑛𝑒𝑤) ∈ ℝ8
given a new query of 𝒙(𝑛𝑒𝑤) ∈ ℝ32
. Fig. 5 summarizes general procedures of initial setup,
training, prediction, and visualization. We will elaborate on each sub-task as follows.
3.2.1. Initial preparation
Task 1 in Fig. 5 summarizes the key procedure before launching multiple target ML. Ranges of
variables in the hybrid database are wide, e.g., ranging from 0.01 to 2.23×109
. To be consistent
and prevent any unit-dependent effect in PCTs, we normalized all attributes to the range of [0,
1]. We considered two normalization schemes: “min-max” and “standard deviation”
normalizations as candidates. In the min-max normalization, normalization is done by
𝑥𝑖
′
=
𝑥𝑖−𝑥𝑚𝑖𝑛
𝑥𝑚𝑎𝑥−𝑥𝑚𝑖𝑛
(7)
where 𝑥𝑚𝑖𝑛 and 𝑥𝑚𝑎𝑥 are the minimum and maximum of the ith attribute, respectively. In the
standard deviation normalization, we have
𝑥𝑖
′
=
𝑥𝑖−𝑥̅
𝑠
(8)
where 𝑥̅ and s is the mean and the standard deviation of the ith attribute, respectively. To
quantitatively compare impacts of the normalization schemes, we compare multi-target
-500
-250
0
250
500
-100 -50 0 50 100
Force
[kN]
Displacement [mm]
Task 1: Extract 46
envelope points of
capacity curves from
hybrid database
Task 2: Least square
fitting of capacity curve
envelopes using
polynomial bases
Task 3: Store the fitted
coefficients of polynomial
bases into hybrid database
as target variables

predictions of three cases: using (1) the original database without any normalization, (2) database
normalized by the min-max scheme, and (3) database normalized by the standard deviation. All
the initial settings of the MTRM model are constrained identical for three cases. From this
preliminary comparison, the “min-max normalization” appears to lead to the lowest MAE.
Fig. 4. Multi-target prediction flowchart from initial preparation, training and prediction, and
postprocessing and investigation. (DM: Mahalanobis Distance; MAEavg: the averaged mean absolute error
of the multiple-target prediction).
Hence, this study adopts the “min-max” normalization throughout the following procedures.
𝑥𝑚𝑎𝑥 and 𝑥𝑚𝑖𝑛 of each attribute must be stored for future backward mapping (i.e., from the
normalized target to actual response, Task 2 (b) of Fig. 5).
Although our hybrid dataset has more than 200 instances, it is still relatively small for reliable
ML training. The PCTs may not be stable to learn the rules around the outside borders of
multiple descriptive variables. Such an issue is the so-called “extrapolation” problem, an intrinsic
statistical model. In short, a statistical or ML model can predict well when the new instance is
similar to those inside the data space. Still, its accuracy decreases as the new instance is near the
borderlines or beyond the data space. In those ranges, prediction becomes an extrapolation since
similar cases have never been experienced [34]. Therefore, it is important to understand each
instance’s relative location in the entire data space. In addition, it is instructive to note that the
data space covered by the database is scattered and refers to space with more than one instance
experienced inside. In the hope of quantitatively determining the borderlines of scattered data
space and facilitating visualization of the relative position of new instances in the entire data
space, we adopted the Mahalanobis Distance (denoted as 𝐷𝑀). For a data point in the
multidimensional space, 𝐷𝑀 measures how many standard deviations away the point is from the
mean of the multidimensional space by
𝐷𝑀(𝒙) ≡ √(𝒙 − 𝝁)𝑇𝐒−1(𝒙 − 𝝁) (9)
where 𝒙 is an instance in the descriptive data space (here 𝒙 = {𝑥1, 𝑥2, … , 𝑥32}𝑇
), 𝝁 is a vector of
the mean of each descriptive variable (here, 𝝁 = {𝜇1, 𝜇2, … , 𝜇32}𝑇
) and S is the covariance
matrix. We calculate and record 𝐷𝑀 into the database as auxiliary information (Task 1 (b) of Fig.
5). This information determines whether new data is inside the database space or close to or
beyond the existing database. To facilitate the unbiased training of PCTs, we randomly shuffled
the database to make 70% training data and 30% test data (Task 1 (c) of Fig. 5).
Task 1: Preparation
(a) Normalize all variables
to [0, 1];
(b) Record DM
(c) Shuffle data to 70%
training and 30% test data
Task 2: Prediction
(a) Perform multi-target
ML using Clus
(b) Calculate MAEavg
Task 3: Visualization
(a) Backward mapping of
predicted coef. of
polynomial bases
(b) Reconstruct F-D curves
(c) Confidence interval

3.2.2. Training and test of the multi-target prediction model
As shown in Task 2 of Fig. 5, the next step is to train and perform the multi-target prediction.
PCTs generate two types of prediction results: original predictions and pruned predictions. A
very large PCTs is grown, which typically learns the details and noises in the training data to the
extent that it will negatively influence the performance of the model on new instances. The PCTs
are pruned by one of the pruned criteria to eliminate the negative impact. Here, only original
predictions are considered in this paper because the pruned prediction is only necessary for the
very large data set [16]. Random forests are used as an ensemble learning method. Among many
measurements of prediction accuracy in the ML domains, we adopted the mean absolute error
(MAE). Since we are predicting multiple targets, each target has its own MAE by:
𝑀𝐴𝐸𝑖 =
100
𝑛
∑ |
𝐴𝑖(𝑗)−𝑃𝑖(𝑗)
𝐴𝑖(𝑗)
|
𝑛
𝑗=1 (10)
where 𝑀𝐴𝐸𝑖 is the MAE of the ith target, 𝐴𝑖(𝑡) and 𝑃𝑖(𝑗) is the true value and predicted value of
the ith target of the jth instance, respectively. n is the total number of instances. Then, the overall
MAE of all target attributes (denoted as 𝑀𝐴𝐸𝑎𝑣𝑔) is calculated as:
𝑀𝐴𝐸𝑎𝑣𝑔 =
1
𝑞
∑ 𝑀𝐴𝐸𝑖
𝑞
𝑖=1 (11)
where q is the number of total target attributes. In this study, q = 8 (see Task 2 (b) of Fig. 5).
3.2.3. Visualization of prediction mode
Task 3 of Fig. 5 summarizes the postprocessing. Since our target is to predict curves (not a
simple scalar), we reconstruct the capacity curves using the predicted coefficients of the
polynomial bases. It starts from the backward mapping of the coefficients from [0, 1] to the
original ranges. Given the predicted matrix 𝐘𝑝𝑟𝑒𝑑 ∈ ℝ𝑛×8
with each entity ranging [0, 1], a batch
backward mapping is simply given by
𝐘𝑓𝑖𝑛𝑎𝑙 = 𝐘𝑝𝑟𝑒𝑑𝐘𝑑𝑖𝑓𝑓 + 𝐘𝑚𝑖𝑛 (12)
where 𝐘𝑝𝑟𝑒𝑑 ∈ ℝ𝑛×8
is the final predicted coefficient matrix with original ranges. 𝐘𝑑𝑖𝑓𝑓 ∈ ℝ8×8
is a diagonal matrix and 𝐘𝑚𝑖𝑛 ∈ ℝ𝑛×8
is a column-size identical matrix, which is given by
𝐘𝑑𝑖𝑓𝑓 ≡ [
(max(𝒚1) − min(𝒚1)) [𝟎]
⋱
[𝟎] (max(𝒚8) − min(𝒚8))
]
𝐘𝑚𝑖𝑛 ≡ [
(min(𝒚1)) ⋯ (min(𝒚8))
|| ⋱ ||
(min(𝒚1)) ⋯ (min(𝒚8))
]
Here 𝒚𝑖 ∈ ℝ𝑛×1
represents a vector of original ith target coefficient. Since we now have all
coefficients of the polynomial bases, we can draw the envelopes of the capacity curves by using
Eq. (6).

3.2.4. Confidence interval
As all statistical models involve uncertainty, our multiple-target prediction model naturally
exhibits uncertainty for new predictions. For a new prediction, it is crucial to provide uncertainty
that is rooted in the training process that uses randomly selected training data sets. To offer a
measurement of uncertainty behind ML-based prediction, this study harnesses a bootstrapping
[34] similar to the so-called “percentile bootstrapping.” The detailed procedure to obtain
bootstrapping sample is as follows.
[BS 0] Initial stage begins with a training data set 𝑀(𝑖=1) and a new instance 𝒙𝑛𝑒𝑤 ∈ ℝ32×1
[BS 1] Fit a multiple-target prediction model using the training data set 𝑀(𝑖) and obtain a target
response 𝒚𝑛𝑒𝑤(𝑖) = {𝑃1, 𝑃2, 𝑃3, 𝑃4, 𝑁1, 𝑁2, 𝑁3, 𝑁4}(𝑖)
T
for the given 𝒙𝑛𝑒𝑤.
[BS 2] Generate a new training data set 𝑀(𝑖+1) by resampling 70% of the database (randomly
selected with replacement).
[BS 3] Refit the multiple-target prediction model using the training dataset 𝑀(𝑖+1).
[BS 4] Repeat above steps (1-3) nbs times to generate nbs bootstrapping samples (i.e., nbs multi-
target predictions).
In our approach, sorting the nbs multi-target predictions is necessary, but it is not straightforward
as a single target bootstrapping. To derive a physically sound approach for sorting the nbs
multivariate predictions, we focused on the absorbed energy of the structure, i.e., area under the
capacity curves. In general, a peak-based sorting appears not reasonable: e.g., curve (c) has the
largest positive peak while curve (a) has the largest peak in the negative regime in Fig. 6.
However, the total absorbed energy intuitively leads to a single scalar that also holds the
mechanical meaning of the structure. Fig. 6 briefly illustrates how the capacity curves' absorbed
energy is calculated and how it can help order the three dissimilar curves of different peaks and
shapes. Since we represent the capacity curve envelopes with polynomial bases and already
obtained their real-valued coefficients in 𝒚𝑛𝑒𝑤(𝑖), 𝑖 = 1, … , 𝑛𝑏𝑠 (BS 2), it is straightforward to
calculate the absorbed energy (denoted as 𝐼(𝑖) ∈ ℝ+
) as
𝐼(𝑖) = |∫ 𝐻(𝜁(𝑖)) ∑ 𝑃𝑙𝜁(𝑖)
𝑙
𝑝
𝑙=1 + 𝐻(−𝜁(𝑖)) ∑ 𝑁𝑙𝜁(𝑖)
𝑙
𝑝
𝑙=1
𝐷max,(𝑖)
𝐷𝑚𝑖𝑛,(𝑖)
𝑑𝜁(𝑖)| (13)
where the subscript (𝑖) denotes the ith multi-target prediction; |.| returns the absolute value; 𝐻(𝑑)
is the unit step function (i.e., 1.0 for 𝑑 > 0, zero otherwise); 𝐷𝑚𝑎𝑥,(𝑖) and 𝐷𝑚𝑖𝑛,(𝑖) is the positive
maximum and negative minimum displacement of the capacity curve, respectively; 𝜁(𝑖) is the
displacement coordinate. The condition that the cumulative distribution of bootstrap samples
(denoted as 𝐺
̂) is less than or equal to a constant b is expressed as:
𝐺
̂(𝑏) = 𝐹{𝐼(𝑖) ≤ 𝑏 } , 𝑖 = 1, … , 𝑛𝑏𝑠. (14)
where F is the frequencies of 𝐼(𝑖). An instance with a specific percentile (α) is represented as:

𝒚∗(𝛼)
= 𝐺
̂−1(𝛼) (15)
Fig. 5. Illustration of calculation of the absorbed energy used for sorting in bootstrapping. Three capacity
curves (a,b,c) with different peaks and shapes are shown. (⊕ means a summation operation).
Fig. 7. 95% percentile confidence interval of wall WSH3 under 590 MPa shear strength.
where 𝐺
̂−1
is the inverse function of 𝐺
̂. Therefore, the 95% percentile confidence interval is
given by
(𝒚∗(0.025)
, 𝒚∗(0.975)
) (16)
In this paper, 𝑛𝑏𝑠 = 100 is adopted. Here, a 95% confidence interval indicates the probability of
the range covering the predicted curves regarding the total absorbed energy. For instance, Fig. 7
shows a 95% percentile confidence interval. Note that there is ample room for extension of the
proposed approach, especially regarding how to define the “order” of the bootstrapped samples.
Also, there are other methods for uncertainty quantification, such as a Jackknife method [35],
which is straightforward and does not require a random sampling.
-500
-250
0
250
500
-100 -50 0 50 100
Force
[kN]
Displacement [m]
F-D
Upper boundary
Lower boundary
(a)
d(i)
F(i)
Dmax,(i)
Dmin, (i)
⊕→ I(i) of curve (a)
(b)
(c)

4. Results
4.1. Impact of PCT types on prediction accuracy
To investigate the impact of PCT types on the performance of MTRM, we considered two types
of operational settings of PCTs. The first type, conventional PCTs, considers both descriptive
variables X and target attributes Y to partition instances into subsets during searching. On the
contrary, the second type, the so-called trial PCTs, partitions instances into subsets in terms of
only the descriptive variables X. For comparison, we used identical training and test data from
the capacity curve database (182 rows) to train the model and make predictions. As already
mentioned in Task 1 (b) in Fig. 5, DM of all instances are recorded to easily visualize each
specimen’s relative position in the multivariate space and are plotted in a radar plot (e.g., Fig.8).
The detailed values of the created database and DM of all instances are available in [29]. Four
selected walls (indexed by 4, 20, 67, and 88) and their DM values are presented in Fig. 8. Fig. 9
presents the predicted capacity curves of selected walls accordingly. The corresponding MAEs of
these four capacity curves predicted by the conventional PCTs and trial PCTs are aggregated in
Table 5.
Table 5
MAEs of prediction by conventional PCTs versus trial PCTs.
Wall Index DM MAEs (conventional) MAEs (Trial)
4 17.9 2.6% 4%
20 16.8 2.3% 18%
67 0.62 1.1% 1.2%
88 1.79 0.1% 0.1%
Another prediction of wall 175 with DM = 1.89 is plotted in Fig.10, which also supports the good
prediction of both PCTs with smaller DM. The prediction accuracy of conventional PCTs is much
stable and superior to the trial PCTs. In addition, it is observed that both conventional PCTs and
trial PCTs make a relatively accurate prediction of wall index 67 and 88, but a decent prediction
of wall index 4 and 20. To some extent, the trial PCTs is similar to the “clustering” since it
considers only the descriptive attributes. On the contrary, the conventional PCTs collaborate with
the rule ensemble to better interpret and explore complex data. In view of the high
dimensionality of our database (i.e., 32 variables), the conventional PCTs appear to slightly
outperform the trial PCTs. Based on this outcome, the conventional PCTs were utilized in all the
simulations hereafter.
4.2. Impact of the extended database on the prediction
The discussion addressed so far is inherently based on the training data. It is common sense that
PCTs will yield better predictions when a target instance resides within the boundary of the
available training data. The prediction model will perform the so-called “extrapolation” when a
new target has little similarity and falls outside the existing training data. To investigate the
influence of this extrapolation, we first trained the PCTs with 70% of sampled training data from
the capacity curve database (182 rows) and made the prediction for the 30% test data plus a new
instance (SW1-2) inclusively involved. The DM of SW1-2 along with other 182 instances are
visualized in Fig. 11 (a). The DM of SW1-2 (marked as a star) indicates exclusion of the new
instance in contrast with the existing training data space. And the predicted capacity curve of
SW1-2 is visualized in Fig.12 as the dashed curve. Secondly, we collected 33 new rectangular

shear walls from [36] and merged them into the capacity curve database, enlarging them to 214
rows. Repeat the scenario by training the PCTs with 70% of sampled training data from the
extended capacity curve database (214 rows), and predict the rest of the test data.
Fig. 8. Radar plot of 182 walls with varying DM: (a) wall 4 with DM = 17.99; (b) wall 20 with DM = 16.76;
(c) wall 67 with DM = 0.62; (d) wall 88 with DM = 1.79.
Fig. 9. Predicted capacity curves using the conventional PCTs and the trial PCTs: (a) wall 4; (b) wall 20;
(c) wall 67; (d) wall 88.
0
3
6
9
12
15
18
1
3 5 7 9 11
13
15
17
19
21
23
25
27
29
31
33
35
37
39
41
43
45
47
49
51
53
55
57
59
61
63
65
67
69
71
73
75
77
79
81
83
85
87
89
91
93
95
97
99
101
103
105
107
109
111
113
115
117
119
121
123
125
127
129
131
133
135
137
139
141
143
145
147
149
151
153
155
157
159
161
163
165
167
169
171
173175177179181
(c)
(a)
(b)
(d)
-600
-300
0
300
600
-100 -50 0 50 100
Force
[kN]
Displacement [mm]
F-D
Conventional
Trial
-300
-150
0
150
300
-100 -50 0 50 100
Force
[kN]
Displacement [mm]
F-D
Conventional
Trial
-400
-200
0
200
400
-80 -40 0 40 80
Force
[kN]
Displacement [mm]
F-D
Conventional
Trial -500
-250
0
250
500
-100 -50 0 50 100
Force
[kN]
Displacement [mm]
F-D
Conventional
Trial
(a) (b)
(c) (d)

Fig. 10. Predicted capacity curves of wall 175 using the conventional PCTs and the trial PCTs.
It is critical to note that we force SW1-2 as one of the test data for both scenarios for comparison.
The predicted capacity curve of SW1-2 is visualized as the dashed curve delimited by dots in
Fig. 12. For the first scenario, it is observed that the prediction of SW1-2 under the original
database diverges from the experimental F-D of SW1-2 in Fig. 12. And the data space of SW1-2
marked as star indicates exclusion of the new instance in contrast with the existing training data
space in Fig. 11 (a). For the second scenario, we found that the prediction of SW1-2 under the
extended database converges significantly in contrast with the prediction of the previous
scenario. The ample data space around SW1-2 in Fig. 11 (a) has been compacted asymptotically
with multiple samples around in Fig. 11 (b), which presents that the extension of 33 new shear
walls has high similarity with specimen SW1-2 in terms of DM. Analyzing the results of both
scenarios, the extension of the database, which includes instances of high similarity with SW1-2
in terms of DM, will positively influence the prediction. These similar instances will fill in
scattered data space around SW1-2 and lead to a more comprehensive model. On the contrary, an
extension of the database of low similarity with SW1-2 will rarely promote the prediction of
SW1-2.
4.3. Impact of erroneous data on prediction
Nowadays, a tremendous amount of engineering data accumulate in our domain, frequently
reported with incompleteness and erroneous values. The deficiency of data will not facilitate
training of the predictive models but leave the potential risk of generating an unstable prediction.
One possible way to minimize the negative influence of data deficiency on prediction is to
leverage imputation to handle missing values. The impacts of the existing imputation method
fractional hot deck imputation on the prediction of engineering data have been investigated by
[37]. The robustness of the MTRM against erroneous data is one of the most important criteria to
evaluate the model objectively. Note that the naive version of the capacity curve database (182
rows) has 2.3% erroneous values within the descriptive variable matrix X because of human
errors. Fortunately, the author was aware of these errors ahead of time and remedied the capacity
curve database with extreme caution. To investigate the impact of erroneous data on the
prediction, the author utilized 30% of sampled erroneous data to train the conventional PCTs
and generate predictions for wall indexed by 20 and 88, respectively. (the identical walls denoted
-500
-250
0
250
500
-100 -50 0 50 100
Force
[kN] Displacement [m]
F-D
Trial
Conventional

by (b) and (d) in Fig. 8). The predicted F-D curves of these two targets are visualized in Fig. 13
as dash curves in contrast with predicted capacity curves upon the correct database (dash curves
delimited with dots). Fig. 13 infers that the conventional PCTs are fairly robust against erroneous
data.
(a)
(b)
Fig. 11. (a) DM of 183 wall instances (b) DM of 214 wall instances. Note that DM of wall SW1-2 is marked
with a star.
0
1
2
3
4
5
1
3 5 7 9 11
13
15
17
19
21
23
25
27
29
31
33
35
37
39
41
43
45
47
49
51
53
55
57
59
61
63
65
67
69
71
73
75
77
79
81
83
85
87
89
91
93
95
97
99
101
103
105
107
109
111
113
115
117
119
121
123
125
127
129
131
133
135
137
139
141
143
145
147
149
151
153
155
157
159
161
163
165
167
169
171
173
175177179181183
0
1
2
3
4
5
1
4 7 10
13
16
19
22
25
28
31
34
37
40
43
46
49
52
55
58
61
64
67
70
73
76
79
82
85
88
91
94
97
100
103
106
109
112
115
118
121
124
127
130
133
136
139
142
145
148
151
154
157
160
163
166
169
172
175
178
181
184
187
190
193
196
199
202
205
208 211 214

Fig. 12. The predicted capacity curves of SW1-2 based on the original database and the extended
database, respectively.
Fig. 13. The predicted capacity curves based on erroneous database versus that based on the correct
database: (a) wall 20; (b) wall 88.
5. Conclusions
In the hope of providing an efficient and reliable tool that can help quickly determine the
capacity curves of F-D responses, this paper utilizes a multi-target regression model to generate
the prediction. To our best knowledge, the prediction of capacity curves had never been
attempted in infrastructure engineering. The general conclusion is that the MTRM implementing
conventional PCTs combined with ensemble methods generates fairly good predicted F-D curves
in terms of MAE and visualization. Its confidence interval and robustness against erroneous data
strengthen the reliability of the method. Compared with the traditional approach to conducting a
real experiment or simulating finite element models, the proposed method of incorporating ML
will significantly reduce expenses in terms of time and money.
The future works will focus on several interesting aspects which will promote the performance of
the method. Firstly, the university of the proposed capacity curve database is restricted to
rectangular shear walls. The extension of the method to other types of infrastructures will break
the bottleneck of the proposed approach. Secondly, the capacity curve database consists of 32
descriptive variables currently, which may cause overfitting issues. An attributes selection test
based on empirical engineering knowledge or the attributes selection algorithm [34] may
improve the precision of the prediction. Lastly, concerning the size of the proposed capacity
database, it may result in unstable and biased models. A sufficiently large database extended in
the future will help to produce more accurate and stable results.
-250
-125
0
125
250
-25 -12.5 0 12.5 25
F-D
Original database
Extended database
-500
-250
0
250
500
-100 -50 0 50 100
Force
[kN]
Displacement [mm]
F-D
Erroneous database
Correct databse
-600
-300
0
300
600
-100 -50 0 50 100
Force
[kN]
Displacement [mm]
F-D
Erroneous database
Correct database

Acknowledgments
The authors declare no conflict of interest. This research is supported by the research funding of
the Department of Civil, Construction, and Environmental Engineering of Iowa State University.
The research reported is partially supported by the HPC@ISU equipment at ISU, some of which
has been purchased through funding provided by NSF under MRI grant number CNS 1229081
and CRI grant number 1205413.
Appendix A: Example of input files
The user must provide three input files to run the Clus. The training data file should strictly
follow the format:
@RELATION “WallDB_train”
@ATTRIBUTE var1 numeric
⋮ ⋮ ⋮
@DATA
0,0,0,0,1,0.25,0,0.256666667,0,0,0.048,0,0.226190476,5.07E-08,1,1,0.155506608,0,
⋮
The Clus is format-sensitive. The exact name of the training data file should be included at the
beginning. Afterward, users have to list all attributes along with data types. Note that training
data must be listed row-wise. A comma delimits each element in a row, and each row is delimited
by starting a new line. The test data file follows an identical format. Besides, a file specifying all
the parameter settings is described as:
[Attributes] [Ensemble]
Target = 33-40 Iterations = 100
Clustering = 1-40 EnsembleMethod = RForest
Descriptive = 1-32
[Data] [Output]
File = WallDB_train.arff WritePredictions = {Train,Test}
TestSet = WallDB_test.arff
[Tree]
Heuristic = VarianceReduction
PruningMethod = M5Multi
ConvertToRules = ALLNodes
Users can control the types of PCTs in Attributes section. Data section lists the full name with the
extension of training data and test data. Tree and Ensemble sections specify additional settings
for the PCTs. For more details, Clus manual provides comprehensive explanations for each item
in these three files.

Appendix B: Graphic PCTs
This appendix explains a toy graphic PCTs after rule ensemble in Fig. 1 (b). An incomplete
realistic graphic PCTs for predictions in Figs. 9 and 10 is obtained from the output file of Clus:
We upload the output file of predictions in Figs. 9 and 10 with full graphic PCTs in [29].
Appendix C: Attributes details of the capacity curve database
Attributes Detail
I Moment of inertia
Length Length of shear wall
Thickness Thickness of shear wall
Height Height of shear wall
Number of floors Number of floors
Axial Force Ratio Axial force ratio
Cover thickness Cover thickness
Concrete_fc Concrete compressive strength
Concrete_ft Concrete yield strength
bb Width of boundary element
hb Thickness of boundary element
cb Cover thickness in boundary element
Steel_Vertical1_fy Yield strength of boundary longitudinal reinforcement

Steel_Vertical1_fu Ultimate stress of boundary longitudinal reinforcement
Steel_Vertical1_Spacing Spacing of boundary longitudinal reinforcement
Steel_Vertical1_strain at fu Ultimate strain of boundary longitudinal reinforcement
Steel_Vertical1_Diameter Diameter of boundary longitudinal reinforcement
Steel_Vertical2_fy Yielding strength of web longitudinal reinforcement
Steel_Vertical2_fu Ultimate stress of web longitudinal reinforcement
Steel_Vertical2_Diameter Diameter of web longitudinal reinforcement
Steel_Horizontal1_fy Yielding strength of boundary transverse reinforcement
Steel_Horizontal1_fu Ultimate stress of boundary transverse reinforcement
Steel_Horizontal1_strain at fu Ultimate strain of boundary transverse reinforcement
Steel_Horizontal1_Spacing Spacing of boundary transverse reinforcement
Steel_Horizontal1_Diameter Diameter of boundary transverse reinforcement
Steel_Stirrup1_fy Yielding strength of stirrups
Steel_Stirrup1_fu Ultimate stress of stirrups
Steel_Stirrup1_strain at fu Ultimate strain of stirrups
Steel_Stirrup1_spacing Spacing of stirrups
Steel_Stirrup1_Diameter Diameter of stirrups
Number of longitudinal bars at wall boundary Number of longitudinal bars at wall boundary
P1 ~ Pp and N1 ~ Np Polynomial bases parameters
References
[1] Aaleti S, Brueggen BL, Johnson B, French CE, Sritharan S. Cyclic Response of Reinforced
Concrete Walls with Different Anchorage Details: Experimental Investigation. J Struct Eng
2013;139:1181–91. https://doi.org/10.1061/(ASCE)ST.1943-541X.0000732.
[2] Dazio A, Beyer K, Bachmann H. Quasi-static cyclic tests and plastic hinge analysis of RC
structural walls. Eng Struct 2009;31:1556–71. https://doi.org/10.1016/j.engstruct.2009.02.018.
[3] Lefas ID, Kotsovos MD, Ambraseys NN. Behavior of Reinforced Concrete Structural Walls:
Strength, Deformation Characteristics, and Failure Mechanism. ACI Struct J 1990;87:23–31.
https://doi.org/10.14359/2911.
[4] Salonikios TN, Kappos AJ, Tegos IA, Penelis GG. Cyclic load behavior of low-slenderness
reinforced concrete walls: design basis and test results. ACI Struct J 1999;96:649–60.
[5] Rafiq M., Bugmann G, Easterbrook D. Neural network design for engineering applications.
Comput Struct 2001;79:1541–52. https://doi.org/10.1016/S0045-7949(01)00039-6.
[6] Reich Y. Machine learning techniques for civil engineering problems. Comput Civ Infrastruct Eng
1997;12:295–310.
[7] Chou J, Ngo N, Pham A. Shear Strength Prediction in Reinforced Concrete Deep Beams Using
Nature-Inspired Metaheuristic Support Vector Regression. J Comput Civ Eng 2016;30:04015002.
https://doi.org/10.1061/(ASCE)CP.1943-5487.0000466.
[8] Pal M, Deswal S. Support vector regression based shear strength modelling of deep beams. Comput
Struct 2011;89:1430–9. https://doi.org/10.1016/j.compstruc.2011.03.005.
[9] Aguilar V, Sandoval C, Adam JM, Garzón-Roca J, Valdebenito G. Prediction of the shear strength
of reinforced masonry walls using a large experimental database and artificial neural networks.
Struct Infrastruct Eng 2016;12:1661–74. https://doi.org/10.1080/15732479.2016.1157824.

[10] Chou J-S, Pham A-D. Enhanced artificial intelligence for ensemble approach to predicting high
performance concrete compressive strength. Constr Build Mater 2013;49:554–63.
https://doi.org/10.1016/j.conbuildmat.2013.08.078.
[11] van Gent MRA, van den Boogaard HFP. Neural network modelling of forces on vertical structures.
Coast Eng Proc 1998:2096–123. https://doi.org/http://dx.doi.org/10.9753/icce.v26.%25p.
[12] Johari A, Habibagahi G, Ghahramani A. Prediction of Soil–Water Characteristic Curve Using
Genetic Programming. J Geotech Geoenvironmental Eng 2006;132:661–5.
https://doi.org/10.1061/(ASCE)1090-0241(2006)132:5(661).
[13] Saha S, Gu F, Luo X, Lytton RL. Prediction of Soil-Water Characteristic Curve Using Artificial
Neural Network Approach. PanAm Unsaturated Soils 2017, Reston, VA: American Society of
Civil Engineers; 2018, p. 124–34. https://doi.org/10.1061/9780784481684.014.
[14] Struyf J, Zenko B, Blockeel H, Vens C. Clus: user’s manual. 2010.
[15] Ženko B. Learning predictive clustering rules. Int Work Knowl Discov Inductive Databases
2005;32:234–50.
[16] Blockeel H, De Raedt L, Ramon J. Top-down induction of clustering trees. Proc 15th Int Conf
Mach Learn 1998:55–63. https://doi.org/10.1.1.50.3353.
[17] Prajapati P, Thakkar A. Improving the performance of predictive clustering tree algorithm for
hierarchical multi-label classification. Proc. Int. Conf. Emerg. Res. Comput. Information, Commun.
Appl., 2014.
[18] Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Trans Inf Theory
1991;37:145–51.
[19] Karegowda AG, Manjunath AS, Ratio G, Evaluation CF. Comparative study of attribute selection
using gain ratio. Int J Inf Technol Knowl Knowl Manag 2010;2:271–7.
[20] Opitz D, Maclin R. Popular Ensemble Methods: An Empirical Study. J Artif Intell Res
1999;11:169–98. https://doi.org/10.1613/jair.614.
[21] Rokach L. Ensemble-based classifiers. Artif Intell Rev 2010;33:1–39.
https://doi.org/10.1007/s10462-009-9124-7.
[22] Tille Y. Statistical Analysis of Designed Experiments. vol. 53. New York: Springer-Verlag; 2002.
https://doi.org/10.1007/b98966.
[23] Bbeiman LEO, Breiman L. Bagging predictors. Mach Learn 1996;24:123–40.
[24] Quinlan JR. Bagging, boosting, and C4.5. AAAI/IAAI, 1996, p. 725–30.
[25] Kocev D, Vens C, Strufy J, Dzeroski S. Ensembles of multi-objective decision trees. Mach Learn
ECML 2007 2007:624–31.
[26] Aho T, Zenko B, Dzeroski S. Rule Ensembles for Multi-target Regression. 2009 Ninth IEEE Int.
Conf. Data Min., IEEE; 2009, p. 21–30. https://doi.org/10.1109/ICDM.2009.16.
[27] Friedman JH, Popescu BE. Predictive learning via rule ensembles. Ann Appl Stat 2008;2:916–54.
https://doi.org/10.1214/07-AOAS148.
[28] Struyf J. Clus download 2013.
[29] Yang Y, Cho IH. Shear wall database 2019.
[30] Cho IH. Virtual Earthquake Engineering Laboratory Capturing Nonlinear Shear, Localized
Damage and Progressive Buckling of Bar. Earthq Spectra 2013;29:103–26.
https://doi.org/10.1193/1.4000095.

[31] Cho IH, Hall JF. Parallelized Implicit Nonlinear FEA Program for Real Scale RC Structures under
Cyclic Loading. J Comput Civ Eng 2012;26:356–65. https://doi.org/10.1061/(ASCE)CP.1943-
5487.0000138.
[32] Cho IH, Porter KA. Multilayered grouping parallel algorithm for multiple-level multiscale
analyses. Int J Numer Methods Eng 2014;100:914–32. https://doi.org/10.1002/nme.4791.
[33] Thomsen JH, Wallace JW. Displacement-Based Design of Slender Reinforced Concrete Structural
Walls—Experimental Verification. J Struct Eng 2004;130:618–30.
https://doi.org/10.1061/(ASCE)0733-9445(2004)130:4(618).
[34] Song I, Cho I, Tessitore T, Gurcsik T, Ceylan H. Data-Driven Prediction of Runway Incursions
with Uncertainty Quantification. J Comput Civ Eng 2018;32:04018004.
https://doi.org/10.1061/(ASCE)CP.1943-5487.0000733.
[35] Efron B, Stein C. The Jackknife Estimate of Variance. Ann Stat 1981;9:586–96.
https://doi.org/10.1214/aos/1176345462.
[36] Tongji Univeristy. Database on static tests of structural members and joint assemblies. 2008.
[37] Song I, Yang Y, Im J, Tong T, Ceylan H, Cho IH. Impacts of Fractional Hot-Deck Imputation on
Learning and Prediction of Engineering Data. IEEE Trans Knowl Data Eng 2020;32:2363–73.
https://doi.org/10.1109/TKDE.2019.2922638.

Multiple Target Machine Learning Prediction of Capacity Curves of Reinforced Concrete Shear Walls

More Related Content

Similar to Multiple Target Machine Learning Prediction of Capacity Curves of Reinforced Concrete Shear Walls (20)

More from Journal of Soft Computing in Civil Engineering (20)

Recently uploaded (20)

Multiple Target Machine Learning Prediction of Capacity Curves of Reinforced Concrete Shear Walls