818 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 19, NO. 2, MAY 2004
Support Vector Machines for Transient Stability
Analysis of Large-Scale Power Systems
L. S. Moulin, A. P. Alves da Silva, Senior Member, IEEE, M. A. El-Sharkawi, Fellow, IEEE, and
R. J. Marks II, Fellow, IEEE
Abstract—The pattern recognition approach to transient
stability analysis (TSA) has been presented as a promising tool
for online application. This paper applies a recently introduced
learning-based nonlinear classifier, the support vector machine
(SVM), showing its suitability for TSA. It can be seen as a different
approach to cope with the problem of high dimensionality. The
high dimensionality of power systems has led to the development
and implementation of feature selection techniques to make the
application feasible in practice. The theoretical motivation of SVMs
is explained conceptually, and they are tested on a 2484-bus
Brazilian system. Aspects of model adequacy, training time,
classification accuracy, and dimensionality reduction are discussed
and compared to stability classifications provided by multilayer
perceptrons.
Index Terms—Feature selection, neural networks, support
vector machine, transient stability analysis.
I. INTRODUCTION
The increasing load demand in power systems without accompanying
investments in generation and transmission
has affected the analysis of stability phenomena, requiring more
reliable and faster tools. One of the most challenging problems
in real-time operation of power systems is the assessment of
transient stability. Its importance has increased due to the
reduction of operational safety margins. Analytical techniques alone
do not allow preventive or corrective actions to be taken in due time.
A possible solution to overcome this drawback is the application
of the pattern recognition approach.
Research efforts in communication and computer processing
have enabled the development of online tools for transient
stability analysis (TSA) [1]–[3]. A number of pattern recognition
methods have been reported as playing important roles in such
tools [1], [4], [5]. The integration of automatic learning/pattern
recognition techniques with analytical TSA methods can
provide more accurate monitoring, improved use of power
system resources (e.g., reduced spinning reserves), flexibility in
maintenance scheduling, etc. [2]. Besides avoiding the repetitive
burden of analyzing similar operating points, the pattern
recognition approach for online TSA [or even assessment (i.e.,
including control)] can deal with modeling uncertainties (e.g.,
dynamic load modeling [6]) and measurement errors.
Manuscript received August 28, 2003. This work was supported in part by
the Brazilian Research Council (CNPq).
L. S. Moulin is with the Electric Power Research Center (CEPEL), Ilha
da Cidade Universitária, Rio de Janeiro, RJ 21941-590 Brazil (e-mail:
moulin@cepel.br).
A. P. Alves da Silva is with the Federal University of Rio de Janeiro, Rio de
Janeiro, RJ 21945-970, Brazil (e-mail: alex@coep.ufrj.br).
M. A. El-Sharkawi and R. J. Marks II are with the Department of Electrical
Engineering, University of Washington, Seattle, WA 98195-2500 USA (e-mail:
elsharkawi@ee.washington.edu; marks@ee.washington.edu).
Digital Object Identifier 10.1109/TPWRS.2004.826018
Analytical methods alone can hardly provide all the functionalities
that control center operators would like to have, namely:
• current operating point qualitative evaluation;
• stability margins;
• visualization of security regions;
• available transfer capability;
• preventive and/or corrective controls;
• “optimum” load shedding.
The pattern recognition approach for online TSA can directly
provide the first four items on this list, and could also help in
providing the last two. In particular, ultrafast stability margin
estimation can provide a feedback variable with a system-wide
view for the controllers. So far, online centralized coordination
has not been possible for the control of fast phenomena. The
current decentralized approach, based on local measurements, does
not produce adequate pre- and postcontingency control, reducing
stability limits and increasing the need for more stabilizers.
Neural network (NN) technology has been reported as an
important contributor for reaching the goals of online TSA
[1], [2], [7]–[16]. It presents desirable characteristics: a fast
response in a simple format (stable/unstable or stability
margin), a heavy computational burden that is paid offline,
tolerance to failures in the required data, and the possibility of
better real-time control. Explanation capability can also be
introduced through the extraction of if-then rules from the
NN [17]. Recent proposals for applying NNs to online TSA
show how these properties can be put to practical use. In
general, these proposals present one of the following ideas:
a) to rank or screen the contingencies, and after that perform
detailed time-domain simulations [2], [12], [13];
b) to provide a stability evaluation during time-domain sim-
ulations, halting the cases clearly evaluated as stable [14];
c) to provide fast stability evaluations and allow border iden-
tification [11], [18].
In most of the NN proposals for online TSA, multilayer
perceptrons (MLPs) are used, whose major drawback is
the extensive training process. Like other nonlinear learning
machines, they lack simple design procedures. In estimating an NN,
one is caught between two opposing extremes: i) using a lot of
data for learning and suffering from long training, or ii) using less
data and suffering from “insufficient” learning.
Support vector machines (SVMs), a recently introduced
learning paradigm, have very interesting theoretical and practical
characteristics [19], [20]. They rely on so-called support
vectors (SVs) to identify the decision boundaries between dif-
ferent classes. The SVs are located near the separation surfaces,
which are critical to achieve correct classifications. SVMs can
map complex nonlinear input/output relationships, and they
are very well suited for TSA because the learning focus is on
the security border. SVMs are based on a linear machine in a
high dimensional feature space, nonlinearly related to the input
space, which has allowed the development of somewhat fast
training techniques, even with a large number of input variables
and big training sets [21]. Investigations of the application of
SVMs to TSA can be found in [7]–[10]. In the present work,
it is shown that SVMs cope with the demands of large power
systems’ TSA, and how they compare to MLPs.
Feature selection techniques have been previously proposed
to make the matter of high dimensionalities easier, especially in
TSA, where the power system representation leads to a large
number of input features [16]. Feature selection reduces the
input dimensionality in order to use as few variables as possible,
getting a more concise representation of the power system.
This paper presents the application of MLP and SVM
classifiers to the TSA of a real power system, the Brazilian
Priba, comprising 2484 buses, 200 generation buses, and 5720
branches. The paper shows how the large input dimensional-
ities represent a concern in stability classification. The paper
also presents a comparison between MLP and SVM models,
since the former is used in almost all previous proposals of
NN application to TSA. Aspects of model adequacy, training
time, classification accuracy, and dimensionality reduction are
discussed, whichever of the ideas a), b), or c) is pursued.
The structure of the paper is as follows. In Section II, a
summarized description of SVM classifiers is sketched con-
sidering the conceptual ideas and discussions on advantages
and disadvantages. Section III describes the power system
used in the tests and how the transient stability data have been
collected for the NNs’ training. In Section IV, the details about
the MLP and SVM training procedures are presented, including
the feature selection processing. In Section V, the results of
NN stability classifications are presented. Comparisons and
discussions about the two models are also carried out. Finally,
conclusions are drawn in Section VI.
II. SUPPORT VECTOR MACHINE CLASSIFIERS
SVMs are nonlinear models based on theoretical results from
the statistical learning theory [19]. This theoretical framework
formally generalizes the empirical risk minimization principle
that is usually applied for NN training (i.e., the minimization
of the number of training errors). In traditional NN training,
several heuristics are applied in order to estimate a classifier
with adequate complexity for the problem at hand.
An SVM classifier minimizes the generalization error by op-
timizing the tradeoff between the number of training errors and
the so-called Vapnik-Chervonenkis (VC) dimension, which is a
new concept of complexity measure.
A formal theoretical bound exists for the generalization
ability of an SVM, which depends on the number of training
errors $t$, the size of the training set $N$, the VC dimension
associated to the resulting classifier $h$, and a chosen confidence
measure for the bound itself $\eta$ [19]
$$R < \frac{t}{N} + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}} \qquad (1)$$
The risk $R$ represents the classification error expectation over
the entire population of input/output pairs, even though the
population is only partially known. This risk is a measure of the
actual generalization error and does not require prior knowledge of
the data probability density function. Statistical learning theory
derives inequality (1) to mean that the generalization ability of
an SVM is bounded by the right-hand side of (1). This upper limit
is valid with probability $1 - \eta$ $(0 < \eta < 1)$. As $h$ increases,
the first summand of the upper bound (1) decreases while the
second summand increases, so that there is a balanced compromise
between the two terms (i.e., training error and complexity,
respectively).
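The compromise between the two summands can be illustrated numerically. The following is a minimal sketch that assumes the standard Vapnik form of the bound, $R < t/N + \sqrt{(h(\ln(2N/h) + 1) - \ln(\eta/4))/N}$; the function names and the sample pairs of training errors and VC dimension are hypothetical, chosen only to show the trend.

```python
import math

def vc_confidence(h, n_train, eta):
    """Complexity term: sqrt((h (ln(2N/h) + 1) - ln(eta/4)) / N)."""
    inner = h * (math.log(2 * n_train / h) + 1) - math.log(eta / 4)
    return math.sqrt(inner / n_train)

def risk_bound(train_errors, h, n_train, eta=0.05):
    """Upper bound on the risk: training error rate plus the complexity term."""
    return train_errors / n_train + vc_confidence(h, n_train, eta)

# Hypothetical illustration: a richer classifier (larger h) makes fewer
# training errors but pays a larger complexity term.
N = 994
for errors, h in [(60, 5), (20, 20), (0, 100)]:
    print(h, round(risk_bound(errors, h, N), 3))
```

For $h < 2N$ the complexity term grows monotonically with $h$, so lowering the training error by increasing the classifier capacity does not necessarily lower the bound.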
Consider a training set $T = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where $\mathbf{x}_i$ is a
real-valued $n$-dimensional input vector (i.e., $\mathbf{x}_i \in \mathbb{R}^n$) and
$y_i \in \{+1, -1\}$ is a label that determines the class of $\mathbf{x}_i$. The SVMs
employed for two-class problems are based on hyperplanes to
separate the data, as exemplified in Fig. 1. The hyperplane
(indicated by the dotted line in Fig. 1) is determined by an orthogonal
vector $\mathbf{w}$ and a bias $b$, which identify the points that satisfy
$\mathbf{w} \cdot \mathbf{x} + b = 0$. By finding a hyperplane that maximizes the
margin of separation $\rho$, it is intuitively expected that the
classifier will have a better generalization ability. The hyperplane with
the largest margin on the training set can be completely
determined by the nearest points to the hyperplane. Two such points
are $\mathbf{x}_1$ and $\mathbf{x}_2$ in Fig. 1(b), and they are called SVs because the
hyperplane (i.e., the classifier) depends entirely on them.
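Therefore, in their simplest form, SVMs learn linear decision
rules as
$$f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b) \qquad (2)$$
so that $(\mathbf{w}, b)$ are determined to classify correctly the training
examples and to maximize $\rho$.
To show the underlying reason for doing this, consider the
fact that it is always possible to scale $\mathbf{w}$ and $b$ so that
$$\mathbf{w} \cdot \mathbf{x} + b = \pm 1 \qquad (3)$$
for the SVs, with
$$\mathbf{w} \cdot \mathbf{x} + b > +1 \quad \text{and} \quad \mathbf{w} \cdot \mathbf{x} + b < -1 \qquad (4)$$
for non-SVs. Using the SVs $\mathbf{x}_1$ and $\mathbf{x}_2$ of Fig. 1 and (3), the
margin $\rho$ can be calculated as
$$\rho = \frac{\mathbf{w}}{\|\mathbf{w}\|} \cdot (\mathbf{x}_1 - \mathbf{x}_2) = \frac{2}{\|\mathbf{w}\|} \qquad (5)$$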
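The margin calculation can be checked on a toy case. The sketch below assumes the canonical scaling in which the two support vectors satisfy $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $-1$, so that the margin equals $2/\|\mathbf{w}\|$; the two-dimensional hyperplane and the points are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def margin(w):
    """Margin of separation for a canonically scaled hyperplane: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

def projected_gap(w, x1, x2):
    """Distance between x1 and x2 measured along the unit normal w / ||w||."""
    norm = math.sqrt(dot(w, w))
    return dot(w, [a - b for a, b in zip(x1, x2)]) / norm

# Hypothetical 2-D example: w = (1, 1), b = -3.
w, b = [1.0, 1.0], -3.0
x1 = [2.0, 2.0]   # w.x1 + b = +1 (support vector on the positive side)
x2 = [1.0, 1.0]   # w.x2 + b = -1 (support vector on the negative side)

print(margin(w), projected_gap(w, x1, x2))
```

Both expressions give the same value, which is why maximizing the margin amounts to minimizing $\|\mathbf{w}\|$.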
For linearly separable data, the VC dimension of SVM
classifiers can be estimated by [19]
$$h < \min\{n,\ 4D^2/\rho^2\} + 1 = \min\{n,\ D^2\|\mathbf{w}\|^2\} + 1 \qquad (6)$$
where $D$ is the minimum radius of a ball which contains the
training points. For linearly separable data, as shown in Fig. 1,
a linear classifier can be found such that the first summand of
bound (1) is zero. Therefore, the risk (1) can be reduced by
decreasing the complexity of the SVM (i.e., by increasing the
margin of separation $\rho$, which is equivalent to decreasing $\|\mathbf{w}\|$).
Fig. 1. Maximum margin classifier.
As practical problems are not likely to be linearly separable,
the linear SVM has been extended to a nonlinear version by
mapping the training data to an expanded feature space using
a nonlinear transformation
$$\Phi(\mathbf{x}) = [\phi_1(\mathbf{x}), \ldots, \phi_m(\mathbf{x})]^T \in \mathbb{R}^m \qquad (7)$$
where $m > n$. Then, the maximum margin classifier of the data
in the new space can be determined. With this procedure, the
data that are nonseparable in the original space may become
separable in the expanded feature space. The next step is to
estimate the SVM by minimizing (i.e., maximizing $\rho$)
$$V(\mathbf{w}) = \tfrac{1}{2}\, \mathbf{w} \cdot \mathbf{w} \qquad (8)$$
subject to the constraint that all training patterns are correctly
classified, that is
$$y_i\,[\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b] \geq 1, \quad i = 1, \ldots, N \qquad (9)$$
However, depending on the type of nonlinear mapping (7), the
training points may not happen to be linearly separable, even in
the expanded feature space. In this case, it will be impossible
to find a linear classifier that fulfills all of the conditions (9).
Therefore, a new cost function is used, instead of (8)
$$V(\mathbf{w}, \boldsymbol{\xi}) = \tfrac{1}{2}\, \mathbf{w} \cdot \mathbf{w} + C \sum_{i=1}^{N} \xi_i \qquad (10)$$
where non-negative slack variables $\xi_i$ are introduced to
allow for training errors (i.e., training patterns for which
$y_i\,[\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b] \geq 1 - \xi_i$ and $\xi_i > 1$). By minimizing the
first summand of (10), the complexity of the SVM is reduced,
and by minimizing the second summand of (10), the number of
training errors is decreased. $C$ is a preselected positive penalty
factor that acts as a tradeoff between the two terms.
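The role of the slack variables and of the penalty factor can be made concrete with a small evaluation of the cost function. This is a sketch under stated assumptions: the cost is taken in the usual soft-margin form (half the squared norm of the weight vector plus the penalty factor times the sum of slacks), the mapping is replaced by the identity for simplicity, and all names and data points are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slack(w, b, x, y):
    """Smallest non-negative slack satisfying y (w.x + b) >= 1 - slack."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def soft_margin_cost(w, b, patterns, C):
    """0.5 * w.w + C * sum of slacks, for patterns given as (x, y) pairs."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, x, y) for x, y in patterns)

# Hypothetical linear example (identity mapping instead of a nonlinear one).
w, b = [1.0, 0.0], 0.0
patterns = [
    ([2.0, 0.0], +1),   # outside the margin: slack 0
    ([0.5, 0.0], +1),   # inside the margin, still correctly classified: slack 0.5
    ([1.0, 0.0], -1),   # misclassified: slack 2 (greater than 1)
]

print(soft_margin_cost(w, b, patterns, C=10.0))
```

Only a slack larger than 1 corresponds to a training error; slacks between 0 and 1 mark patterns that fall inside the margin but on the correct side.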
The minimization of the cost function (10) leads to a
quadratic optimization problem with a unique solution. In
fact, the nonlinear mapping (7) is indirectly obtained by
the so-called Mercer Kernel functions, which correspond to
inner products of data vectors in the expanded feature space,
$K(\mathbf{a}, \mathbf{b}) = \Phi(\mathbf{a}) \cdot \Phi(\mathbf{b})$, $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$. Because the SVM
formulation ends up with an inner product format (see the Appendix for
more details), the Kernel function can substitute the nonlinear
mapping (7) wherever it appears. In order for this equivalence
to be valid, a Kernel function must satisfy some requirements
called Mercer Conditions [20]. The most commonly used
functions are the RBF kernel
$$K(\mathbf{a}, \mathbf{b}) = \exp\left(-\frac{\|\mathbf{a} - \mathbf{b}\|^2}{\sigma^2}\right) \qquad (11)$$
and the polynomial kernel
$$K(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b} + 1)^p \qquad (12)$$
where the parameters $\sigma$ and $p$ in (11) and (12) must be preset.
One important advantage of using a Kernel function instead
of the nonlinear mapping (7) is that some key aspects of the SVM,
like representation, complexity, and generalization capability,
become highly dependent on a few control parameters, as
will be shown later. Another important advantage is related to
the computational complexity of the large expanded
space $\mathbb{R}^m$. For example, the polynomial kernel (12) corresponds
to a nonlinear expanded space of dimension $m = \binom{n+p}{p}$,
and the features $\phi_i(\mathbf{x})$, $i = 1, \ldots, m$, represent all of the
monomials of the original input vector up to and including
degree $p$. In power systems, where $n$ is typically large, the explicit
mapping would become computationally intractable. However, by substituting
the nonlinear mapping by the Kernel function, all calculations
are performed in the original input space dimension.
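The equivalence between a kernel evaluation and an inner product in the expanded space can be verified on a small case. The sketch below assumes the polynomial kernel in the form $(\mathbf{a} \cdot \mathbf{b} + 1)^p$ and the RBF kernel in the form $\exp(-\|\mathbf{a} - \mathbf{b}\|^2/\sigma^2)$, and it counts the expanded dimension as the number of monomials of degree up to $p$ (constant term included); the explicit map `phi_degree2` and the sample vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(a, b, p):
    return (dot(a, b) + 1.0) ** p

def rbf_kernel(a, b, sigma):
    diff = [x - y for x, y in zip(a, b)]
    return math.exp(-dot(diff, diff) / sigma ** 2)

def phi_degree2(x):
    """Explicit feature map for n = 2, p = 2 (all monomials up to degree 2)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def expanded_dimension(n, p):
    """Number of monomials of degree <= p in n variables (constant included)."""
    return math.comb(n + p, p)

a, b = [1.0, 2.0], [3.0, -1.0]
# Kernel in the 2-D input space versus inner product in the 6-D expanded space.
print(poly_kernel(a, b, 2), dot(phi_degree2(a), phi_degree2(b)))
```

Under this counting, 224 inputs with $p = 2$ already give an expanded space of 25425 dimensions, while each kernel evaluation still costs a single 224-term inner product.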
In summary, a nonlinear mapping (7) can be indirectly
defined by a Kernel function, for example (11) or (12) [i.e., there
is no need for specifying (7)]. Overfitting problems in the
expanded feature space are overcome by implicit generalization
control in the learning process. The parameters $\sigma$ and $p$ affect
how sparse and easily separable the data are in the expanded
feature space, and consequently, they affect the complexity of the
resulting SVM classifier and the training error rate. The
parameter $C$ also affects the model complexity. Currently, there is no
guidance, besides trial and error, on how to set $C$, to choose the
best Kernel function, and to set the Kernel parameters. In
practice, a range of values has to be tried for $C$ and for the Kernel
parameters, and then the performance of the SVM classifier is
estimated for each of these values (and Kernel functions).
Details on the minimization of (10) and the SVM architecture are
shown in the Appendix.
III. POWER SYSTEM DESCRIPTION AND DATA SET GENERATION
The power system used for TSA tests is a subsystem of the
Brazilian southeast grid, which is located in the region with the
largest power consumption in the country. The system is basi-
cally formed by the hydroelectric plants along Paranaíba and
Grande rivers, and by the power grid around these plants. This
so-called “Priba System” has 2484 buses, 200 generation buses,
and 5720 branches, including 26 major 750-kV, 500-kV, and
345-kV transmission lines and transformers connecting the gen-
eration plants to load centers and to other subsystems.
The transient stability studies assume that 14 of the major
branches become unavailable due to maintenance scheduling,
one at a time. For each major branch outage for maintenance
scheduling, single contingencies are assumed in nine other
major branches. The contingencies consist of three-phase
short-circuits, which are cleared by tripping the corresponding
line. Three load levels (light, medium, and heavy) have been
simulated, besides the combinations of different generation
dispatches and power exchanges between subsystems. These
base cases for TSA have been simulated in the time domain,
and each one has been classified as stable or unstable. With
this procedure, 994 training patterns and 248 test patterns
have been obtained. The TSA data set has a large percentage
of stable cases, with the ratio of stable to unstable cases of
approximately 10:1.
It is important to clarify that the data used in this work have
not been produced for estimating the proposed classifiers. In
fact, the simulations were performed during operational studies
for an electric utility, without any specific concern regarding the
classifiers’ design. These studies include the specialists’ knowl-
edge about the list of the most important contingencies, the most
important variables, typical operating conditions, the required
accuracy in power system modeling, etc., which represent the
utility’s expertise as far as offline TSA is concerned. All of
this knowledge, which has been considered and included in the
training set, makes up the necessary information to be learned
by a NN. This kind of data is usually available in electric utili-
ties, and this paper intends to show that useful classifiers can be
obtained from it without any other data requirement.
However, as power systems are planned to operate most of
the time under stable conditions, operational studies usually
generate highly unbalanced classes. This unbalance between
the stable and unstable classes can be troublesome for some
classifiers.
Based on the approach developed in [9]–[11], and [16], the
following input variables have been chosen to describe the
power system operating point and the applied fault: active
and reactive power at selected generation buses before fault
occurrence; active and reactive power flows on the 26 major
branches before fault occurrence; and a binary coding for the
nine faults under analysis. The output variable has been chosen
to be the two classes of interest: stable or unstable.
Taking into account the generation buses that yielded relevant
information (i.e., significant variation on the generated power),
224 input variables have been preselected. Topology informa-
tion is implicitly informed by the branches’ power flow vari-
ables (“no flow” means an open line).
IV. NEURAL NETWORKS TRAINING
This section presents details about the MLP and SVM
training procedures, including the feature selection processing.
A. Multilayer Perceptron Training
The MLPs have been trained by the Stuttgart NNs Simulator
[22], which is free software developed in C. Back-propagation
training with adaptive learning and momentum rates,
together with cross-validation, has been used. Cross-validation has been
performed by randomly splitting the original training set and
reserving 20% of its patterns for validation. The random splitting
was repeated every 50 epochs (one epoch is one training
cycle in which all training patterns have been presented to the
NN once), when the training and validation errors were also
calculated in order to monitor their behavior and to stop training
early (i.e., before overfitting).
Fig. 2. MLP training on TSA data set.
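The splitting schedule can be sketched as follows. This is only an outline under assumptions: `train_epoch` and `evaluate` are hypothetical callbacks standing in for one back-propagation pass and for the validation error, and the stopping rule (stop after `patience` checks without improvement) is an assumption, since the text only states that the errors were monitored to stop before overfitting.

```python
import random

def split_train_validation(patterns, validation_fraction=0.2, rng=random):
    """Randomly reserve a fraction of the patterns for validation."""
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    n_val = int(round(validation_fraction * len(shuffled)))
    return shuffled[n_val:], shuffled[:n_val]

def train_with_resplitting(patterns, train_epoch, evaluate, total_epochs=1000,
                           resplit_every=50, patience=3, seed=0):
    """Re-split every `resplit_every` epochs and stop once the validation
    error has failed to improve for `patience` consecutive checks."""
    rng = random.Random(seed)
    best, bad_checks, history = float("inf"), 0, []
    train, val = split_train_validation(patterns, rng=rng)
    for epoch in range(1, total_epochs + 1):
        train_epoch(train)
        if epoch % resplit_every == 0:
            err = evaluate(val)
            history.append((epoch, err))
            bad_checks = 0 if err < best else bad_checks + 1
            best = min(best, err)
            if bad_checks >= patience:
                break
            train, val = split_train_validation(patterns, rng=rng)
    return history
```

With 994 training patterns, a 20% reservation leaves 795 patterns for training and 199 for validation at each split.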
It has been noticed that during the entire training process, the
false dismissal rate (rate of unstable cases assigned to the stable
class) is high and the false alarm rate (rate of stable cases as-
signed to the unstable class) is close to zero, which is highly un-
desirable. The training set unbalance between the two classes of
interest damages the MLP classifier estimation, because it over-
fits the stable data. An example of this overfitting is in Fig. 2,
which presents the classification performance during the MLP
training process. When training is terminated, after 1000 epochs
(20 × 50), all of the errors are false dismissals. The error rate is
the sum of the false dismissal and false alarm rates.
To try to avoid overfitting the stable data and to decrease the
false dismissal rates, a new training procedure has been per-
formed. An augmented training set has been devised with the
ratio between stable and unstable patterns artificially modified.
A 1:1 ratio has been set by adding copies of unstable training
patterns to the original training set and the results on the test set
will be presented in Section V.
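Balancing by replication can be sketched in a few lines. The function below is a hypothetical helper, not the authors' code; it assumes patterns are stored as (input, label) pairs and that the unstable class carries the label passed as `minority_label`.

```python
def balance_by_replication(patterns, minority_label):
    """Append copies of minority-class patterns until both classes have
    the same number of patterns."""
    minority = [p for p in patterns if p[1] == minority_label]
    majority = [p for p in patterns if p[1] != minority_label]
    if not minority:
        return list(patterns)
    augmented = list(patterns)
    i = 0
    while len(minority) + i < len(majority):
        # Cycle through the minority patterns so copies are spread evenly.
        augmented.append(minority[i % len(minority)])
        i += 1
    return augmented
```

Replication adds no new information about the unstable region; it only increases the weight of the existing unstable patterns in the training error.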
B. SVM Training
The SVM classifier is based on a subset of the training
patterns, the support vectors, located at the separation region
between the two classes. The SVs define the largest possible
margin of separation. Two different kernel functions have been
used, the RBF kernel (11) and the Polynomial kernel (12).
The parameters $C$ (10), $\sigma$ (11), and $p$ (12) have been searched
heuristically, trying to achieve the best generalization capacity.
The software [21], developed in C, has been used for
training and testing the SVM models.
The SVM training process consists of a quadratic optimization
problem in which the support vectors represent the minimum
solution. The use of an augmented training set as in the
MLP training is not appropriate because of linear dependencies
in the constraints. Instead, to account for the training set
unbalance, different values for $C$ can be used. A large value
of $C$ for the unstable patterns and a small value for the stable
ones have been adopted during the training process (the
corresponding values of $C$ have been multiplied by 0.1 for the stable
training patterns). In this way, the optimization process emphasizes
the minimization of the unstable patterns' training errors.
Different values for $C$ and for the parameter $p$ have been tried.
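The effect of class-dependent penalties can be seen by evaluating the cost with a per-pattern penalty factor. The sketch below assumes a soft-margin cost in which each slack is weighted by its own penalty, with the stable class labeled +1 and a linear (identity) mapping; the label convention, the function name, and the data are assumptions made for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def class_weighted_cost(w, b, patterns, C, stable_label=+1, stable_factor=0.1):
    """0.5 * w.w + sum_i C_i * slack_i, where C_i = stable_factor * C for
    stable patterns and C_i = C for unstable ones."""
    total = 0.5 * dot(w, w)
    for x, y in patterns:
        slack = max(0.0, 1.0 - y * (dot(w, x) + b))
        c_i = C * stable_factor if y == stable_label else C
        total += c_i * slack
    return total

w, b = [1.0], 0.0
missed_stable = [([-1.0], +1)]    # stable pattern on the wrong side, slack 2
missed_unstable = [([1.0], -1)]   # unstable pattern on the wrong side, slack 2

print(class_weighted_cost(w, b, missed_stable, C=10.0))
print(class_weighted_cost(w, b, missed_unstable, C=10.0))
```

The same violation costs ten times more when it comes from an unstable pattern, which pushes the optimizer toward fewer false dismissals.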
The RBF kernel SVMs have not shown satisfactory results,
because on the test set they have the maximum false dismissal rate
and a 0% false alarm rate, regardless of the parameter values.
On the other hand, polynomial kernel SVMs have been
trained successfully, and the results of their performance on the
test set will be presented in Section V.
C. Feature Selection
Because of the high dimensionality of the input space, feature
selection techniques have also been applied to achieve a more
concise representation of the power system and overcome the
curse of dimensionality.
According to the notation introduced in Section II, a
classification task in two groups is represented by $N$ ordered pairs
$(\mathbf{x}_i, y_i)$ in the training set $T$, where $\mathbf{x}_i \in \mathbb{R}^n$ represents the
operating point and $y_i \in \{+1, -1\}$ denotes the security index of
that point. The main objective of a feature selection technique
is to generate a $d$-dimensional feature vector, where $d < n$.
The $d$ selected features represent the original data in a new
training set $T_d$.
If the feature selection is successful, a point in $T_d$ can be
assigned to one of the two classes with minimum error. Two
feature selection techniques are used in this work, as presented
in [16]: sequential search and genetic algorithms (GAs).
Reductions of the data set dimensionality from the original
$n = 224$ inputs to several smaller values of $d$ have been tested.
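A sequential search can be sketched as a greedy loop. The version below is a forward search, which is an assumption: the text does not say whether the search in [16] adds or removes features, nor which criterion it uses. The `score` callback and the relevance values are hypothetical.

```python
def sequential_forward_selection(n_features, d, score):
    """Greedily grow a feature subset up to size d. `score(subset)` must
    return a number that is larger for better subsets."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < d and remaining:
        # Try each remaining feature and keep the one that helps most.
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical criterion: each feature has an independent relevance value.
relevance = [0.1, 0.9, 0.5, 0.7]
chosen = sequential_forward_selection(4, 2, lambda s: sum(relevance[j] for j in s))
print(chosen)
```

In practice the criterion would be a class separability measure or a classifier's validation error, which makes each step considerably more expensive than in this toy case.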
V. RESULTS AND DISCUSSIONS
The following comparison between MLP and SVM models
results from an extensive search over their parameters and the
number of input variables (the original $n$ and the reduced $d$). The classifiers have been
initially trained to achieve low classification error rates. After
that, the ones having the lowest false dismissal rates have been
picked.
Taking as the NN output (stability condition estimated
by the NN), notice that the output range of the MLP model
is , whereas the output range of the SVM model is
. Operating points close to the stability border
(outputs close to zero) indicate a dangerous situation, even if
they are on the stable side, and deserve similar treatment to
the unstable points. Therefore, besides stable and unstable, it
is useful to take a third stability classification, according to
the classifiers’ outputs. A high risk range is devised near the
0 classification threshold, and if the output falls within this range, the
point is classified as “high risk” (meaning that it is considered
neither stable nor unstable). For the SVM classifier, points with output
values in the range $[-1, +1]$ are natural candidates for high risk
cases, because they are located between the support vectors of
different classes and near the classification border.
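The three-way decision can be written as a small function. The sketch assumes that positive outputs correspond to the stable class and uses $[-1, +1]$ as the default high risk range, on the grounds that the support vectors of the two classes sit at outputs of $+1$ and $-1$; both choices, and the function name, are assumptions.

```python
def classify(output, low=-1.0, high=1.0, stable_sign=+1):
    """Map a real-valued classifier output to 'stable', 'unstable',
    or 'high risk' (output inside the closed range [low, high])."""
    if low <= output <= high:
        return "high risk"
    positive = output > 0
    return "stable" if positive == (stable_sign > 0) else "unstable"

for value in (2.3, 0.4, -1.0, -1.7):
    print(value, classify(value))
```

Widening the range moves more operating points into the high risk class, which lowers the false dismissal rate at the price of more cases sent to detailed analysis.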
Taking these considerations into account, the results of the
two classifiers on the test set are presented in Table I. Column (2)
shows the classification results with the high risk range
for the SVM with 224 inputs, and . The false
dismissal cases occur when and . The false alarm
cases occur when and . The high risk cases occur
whenever . The error rate is the sum of the false
dismissal and false alarm rates. The total error, false dismissal,
and false alarm rates are calculated as a percentage of 248 test
patterns (225 stable and 23 unstable). Results with a more
conservative high risk range are shown in column (3) of
Table I for the same SVM classifier, where the performance
rates are calculated as in the case represented by column (2).
TABLE I
SVMS AND MLPS PERFORMANCES ON THE TEST SET
For the MLP classifier with 150 inputs selected by the GA,
arbitrarily large high risk ranges , and
have been defined, and the results are shown in
columns (4), (5), and (6) of Table I, respectively. The perfor-
mance rates have been calculated according to each high risk
range, just as explained for the SVM model.
The use of such high risk ranges is well known in NN
applications to TSA, where false dismissal rates must be very
close to 0%. There is a clear compromise between the high risk
ranges and the false dismissal rates, because one increases as
the other decreases. In a contingency screening application, by
taking the SVM classifier criterion of column (3) instead of
the one in column (2), one would favor reliability and would
conservatively accept more high risk cases (see Table I).
Although most of these would be stable cases, they could
start up preventive actions or detailed stability simulations,
depending on the specific approach taken. As the high risk
range is increased, the classifier becomes more reliable (i.e.,
with lower false dismissal rate), but less effective in screening
out true stable cases. The false dismissal rate would decrease
from 0.8% to 0% and both classifiers of columns (2) and (3)
have good performances. The decision about which one to take
is a project decision. The conceptual description of high risk
cases for the SVMs leads to a comprehensive definition of
high risk ranges, whose validity is confirmed in practice by the
results presented in Table I.
By taking the MLP of column (5) instead of the one of column
(4), the false dismissal rate would decrease from 3.2% to 2.4%.
However, it cannot be lowered further, even by setting a very
large upper limit for the high risk range, as it is shown in column
(6) of Table I.
A false dismissal rate of 0.8% on the test set means
that 8.7% of the unstable patterns in the test set have been
misclassified. A false dismissal rate of 2.4% means that
26.1% of the unstable patterns in the test set have been
misclassified.
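The conversion between the two ways of expressing the false dismissal rate is simple arithmetic, sketched below using the test set composition stated in the text (248 patterns, 23 of them unstable). The function name is hypothetical.

```python
def share_of_unstable_missed(false_dismissal_rate_pct, n_test=248, n_unstable=23):
    """Convert a false dismissal rate (percent of all test patterns) into the
    number of missed unstable cases and the percent of unstable patterns missed."""
    missed = round(false_dismissal_rate_pct / 100.0 * n_test)
    return missed, 100.0 * missed / n_unstable

for rate in (0.8, 2.4):
    missed, pct = share_of_unstable_missed(rate)
    print(rate, missed, round(pct, 1))
```

A rate of 0.8% corresponds to 2 missed unstable cases out of 23, and 2.4% corresponds to 6, which is why small overall rates can still hide a large share of missed unstable cases on an unbalanced test set.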
For the MLP, feature selection allows a gain in generalization
resulting from the dimensionality reduction, despite the loss of
information due to the discarded variables. However, the best
MLPs offer no competition to the best SVMs. Surprisingly, the
dimensionality reduction does not provide any improvement to
the overall performance of the SVMs, if compared to the ones
estimated with the original inputs.
Table I also shows the training times in cpu seconds and
cpu minutes, run in an 850-MHz PC. The SVM training time
is lower than the MLP training by one order of magnitude. It
is worth noting that the training set size is not large compared
to the number of MLP parameters. In fact, larger training sets
would be recommended for a better representation of the power
system behavior, though the relationship between the compu-
tational burden and the training set cardinality is inconvenient
when employing backpropagation learning.
The number of SVs (168) determines the number of free
parameters of the SVM classifier [see Fig. 4 and (15) in the
Appendix, and recall that $\alpha_i \neq 0$ only for the SVs]. That helps
to understand why a training set with only 994 patterns is
enough to estimate an SVM with good performance, despite
the large number of input variables. The results show how the
SVM models can take advantage of a stability study database
already available in a control center. Their performance could
continuously improve during the online TSA process by adding
new operating conditions to the training set.
The SVM performance stems from its capacity to generalize
well from the available training data, which is related to an “im-
plicit” feature selection ability. In order to make that clear, take
the receiver operating characteristic (ROC) curves, shown in
Fig. 3. This graph presents false dismissal rates on the hori-
zontal axis and detection rates on the vertical axis. The detection
rate is just an indirect measure of the false alarm rate, calcu-
lated by subtracting it from 100% (without considering the high
risk range, that is, when the classification threshold is zero).
Each ROC curve in Fig. 3 corresponds to SVMs trained with
the indicated input variables and different values of $C$. For each
curve, a specific value of $p$ has been chosen to yield the best possible
ROC, which is the one having the points with the lowest false
dismissal rates coupled with the highest detection rates. ROC
curves cannot be drawn for the MLPs, because there is no
parameter to control the relationship between false alarm and false
dismissal rates. The factors that affect an MLP’s performance
are numerous and interrelated.
The “best” curve in Fig. 3 is from 224 inputs. As the number
of input variables decreases, the curves get “worse,” which is
an indication of loss of discrimination capability as variables
are discarded from the original input set. On the other hand, it is
also an indication of how good the SVMs can be on the high di-
mensional original space. That means, embedded in the learning
process of an SVM there is an automatic feature selection, which
prevents it from being trapped by the curse of dimensionality on
the expanded feature space.
Fig. 3. Polynomial SVM ROC curves for different input sets.
Fig. 3 also shows that the parameters $C$ and $p$ can be used to
control the relationship between the number of false dismissals
and false alarms. For a specific value of $p$, it is possible to move
the SVM classifier to the left of the ROC curve as much as desired
to achieve lower false dismissal rates, at the expense of
larger false alarm rates. This procedure suggests
a structured way to design an SVM classifier for TSA. For a
selected Kernel function:
a) perform a fine search over the values of $C$ and of the
Kernel parameter, and draw ROC curves for them;
b) choose the curve with the “best” ROC characteristic (the
one having points with the largest ratios between detection
rates and false dismissal rates);
c) choose the point in the “best” ROC curve with the largest
ratio between detection and false dismissal rates;
d) pick the values of $C$ and of the Kernel parameter that
correspond to the point chosen in (c);
e) choose a high risk range that provides acceptable false
dismissal rate for the classifier estimated from (d).
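Steps (b) through (d) of the procedure above can be sketched as a small selection routine. This is only an illustration under stated assumptions: the `RocPoint` record, the function names, and the numeric rates are hypothetical placeholders standing in for the outcome of step (a), not results from this study, and the guard against a zero false dismissal rate is an added convenience.

```python
from typing import Dict, List, NamedTuple


class RocPoint(NamedTuple):
    C: float                     # penalty factor used to train this SVM
    kernel_param: float          # Kernel parameter (hypothetical naming)
    detection_rate: float        # measured on a validation set
    false_dismissal_rate: float  # measured on the same validation set


def ratio(point: RocPoint, eps: float = 1e-9) -> float:
    """Detection rate over false dismissal rate; eps guards a zero rate."""
    return point.detection_rate / max(point.false_dismissal_rate, eps)


def select_operating_point(curves: Dict[float, List[RocPoint]]) -> RocPoint:
    # Step (b): the "best" curve is the one containing the largest ratio.
    best_curve = max(curves.values(), key=lambda pts: max(ratio(p) for p in pts))
    # Step (c): within that curve, take the point with the largest ratio.
    return max(best_curve, key=ratio)


# Stand-in for step (a): one ROC curve per Kernel parameter value,
# one point per value of C. All rates below are made-up placeholders.
curves = {
    2.0: [RocPoint(0.1, 2.0, 0.90, 0.05), RocPoint(1.0, 2.0, 0.95, 0.02)],
    3.0: [RocPoint(0.1, 3.0, 0.88, 0.04), RocPoint(1.0, 3.0, 0.93, 0.03)],
}

best = select_operating_point(curves)
# Step (d): read off the parameters of the chosen point.
print(best.C, best.kernel_param)
```

Step (e) is left out because the choice of a high risk range depends on what false dismissal rate is considered acceptable for the application.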
VI. CONCLUSION
This paper shows that SVMs fit the TSA task for large power
systems. They provide a different strategy to tackle the curse
of dimensionality. The SVMs performed better when the com-
plete set of input variables was used, which confirms, in prac-
tice, their implicit feature selection capability and the validity
of the theoretical developments on generalization control. The
SVM learning machine allows a deep understanding of its prac-
tical implications, which can be used to devise structured design
practices for the model.
The sparsity reduction of the data has turned the training
process into an easier task for MLPs. However, the MLPs’
performance (3.2% of false dismissal rate, 11.7% of false alarm
rate, and 3.23% of high risk rate) is not as good as the SVMs’
(0.8% of false dismissal rate, 4.8% of false alarm rate, and
14.1% of high risk rate). It has been shown that stability studies
databases already available in electric utilities, containing
specialists’ knowledge, can be used in NN-based TSA as a
good starting point.
Future work will focus on dynamic features. The majority of
work about TSA, based on the pattern recognition approach,
has focused the analysis on prefault static features. With the
popularization of synchronized phasor measurements acquisi-
tion systems (phase angle monitors), loss of synchronization can
be predicted, in real time, based on postfault phasor measure-
ments (i.e., speed and acceleration at each generation bus are
calculated from the synchronized phasor measurements) [23].
Therefore, the next generation of transient stability assessment
tools will be able to move from preventive countermeasures
(contingency analysis on prefault operating points) to corrective
control.
Another promising idea for online TSA of large-scale power
systems is the hybrid approach based on direct-type methods
coupled with detailed time simulation [24]. In this approach,
NNs can be used as filters to discard stable contingencies very
quickly.
APPENDIX
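The computation of the decision boundary of an SVM,
$f(\mathbf{x}) = \mathrm{sign}(\mathbf{w} \cdot \Phi(\mathbf{x}) + b)$,
for the nonseparable case consists in solving the following
optimization problem:

$$\text{minimize} \quad \frac{1}{2}\|\mathbf{w}\|^{2} + C \sum_{i=1}^{n} \xi_i$$

$$\text{subject to} \quad y_i\left(\mathbf{w} \cdot \Phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, n. \tag{13}$$

Instead of solving (13) directly, it is much easier to solve the
dual problem (14), in terms of the Lagrange multipliers $\alpha_i$:

$$\text{minimize} \quad W(\boldsymbol{\alpha}) = -\sum_{i=1}^{n} \alpha_i + \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) \tag{14}$$

$$\text{subject to} \quad \sum_{i=1}^{n} y_i \alpha_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, n$$

which is a quadratic optimization problem. From the solution
$\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_n)$ of (14), the
decision rule $f(\mathbf{x})$ can be computed as

$$f(\mathbf{x}) = \mathrm{sign}\left(\sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\right). \tag{15}$$

The training points with $\alpha_i > 0$ are the SVs, and (15) depends
entirely on them. The threshold $b$ can be calculated using (3),
which is valid for any SV $\mathbf{x}_{SV}$ with label $y_{SV}$:

$$b = y_{SV} - \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}_{SV}). \tag{16}$$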
An SVM can be represented as in Fig. 4, where the number of
units is determined by the number of SVs.
Fig. 4. SVM architecture.
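As a minimal sketch of how the decision rule (15) and the threshold (16) are evaluated once the Lagrange multipliers are known, the following example uses a two-point, one-dimensional toy problem with a linear kernel, for which the multipliers can be worked out by hand (both equal 0.5, assuming the penalty factor is at least 0.5). The function names and the toy data are illustrative assumptions; they are not the software or the data used in this work.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Plain inner product; any Mercer kernel could be passed instead."""
    return sum(a * b for a, b in zip(u, v))


def threshold(alphas: List[float], ys: List[int], xs: List[Vector],
              kernel: Kernel, sv_index: int) -> float:
    """Eq. (16): b = y_s - sum_i alpha_i * y_i * K(x_i, x_s) for one SV s."""
    x_s, y_s = xs[sv_index], ys[sv_index]
    return y_s - sum(a * y * kernel(x, x_s)
                     for a, y, x in zip(alphas, ys, xs) if a > 0)


def decide(x: Vector, alphas: List[float], ys: List[int], xs: List[Vector],
           kernel: Kernel, b: float) -> int:
    """Eq. (15): sign of the kernel expansion over the SVs plus the threshold."""
    s = sum(a * y * kernel(xi, x)
            for a, y, xi in zip(alphas, ys, xs) if a > 0) + b
    return 1 if s >= 0 else -1


# Toy training set: one point per class, both are SVs.
xs = [(1.0,), (-1.0,)]
ys = [1, -1]
alphas = [0.5, 0.5]  # hand-derived minimizer of (14) for this toy set

b = threshold(alphas, ys, xs, linear_kernel, sv_index=0)
print(b, decide((2.0,), alphas, ys, xs, linear_kernel, b),
      decide((-0.3,), alphas, ys, xs, linear_kernel, b))
```

Only the points with nonzero multipliers enter the sums, which is why the number of units in Fig. 4 equals the number of SVs.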
ACKNOWLEDGMENT
The authors would like to thank the researchers at GESIS
Lab, UNIFEI (Federal University at Itajubá, Brazil), for making
the Transient Stability studies used in this paper available. They
would also like to thank the reviewers for their valuable com-
ments and questions, which helped to improve this work.
REFERENCES
[1] L. A. Wehenkel, Automatic Learning Techniques in Power Sys-
tems. Norwell, MA: Kluwer, 1998.
[2] J. L. Jardim, C. A. da S. Neto, A. P. A. da Silva, A. C. Zambroni de
Souza, D. M. Falcão, C. L. T. Borges, and G. N. Taranto, “A unified
online security assessment system,” in Proc. CIGRÉ, Paris, France, Aug.
2000.
[3] Y. Mansour, E. Vaahedi, A. Y. Chang, B. R. Corns, B. W. Garrett, K.
Demaree, T. Athay, and K. Cheung, “B. C. Hydro’s on-line transient
stability assessment (TSA) model development, analysis, and post-pro-
cessing,” IEEE Trans. Power Syst., vol. 10, pp. 241–250, Feb. 1995.
[4] D. J. Sobajic, Y.-H. Pao, and M. Djukanovic, “Neural networks for as-
sessing the transient stability of electric power systems,” Neural Net-
works Applications in Power Systems, pp. 255–294, 1996.
[5] R. Fischl, D. Niebur, and M. A. El-Sharkawi, “Security assessment
and enhancement,” in Artificial Neural Networks with Applications to
Power Systems, M. A. El-Sharkawi and D. Niebur, Eds., 1996, ch. 9,
pp. 104–127. IEEE Catalog no. 96TP112-0.
[6] A. P. A. da Silva, C. Ferreira, G. L. Torres, and A. C. Z. de Souza, “A new
constructive ANN and its application to electric load representation,”
IEEE Trans. Power Syst., vol. 12, pp. 1569–1575, Nov. 1997.
[7] A. E. Gavoyiannis, D. G. Vogiatzis, and N. D. Hatziargyriou, “Dynamic
security classification using support vector machines,” in Proc. IEEE
Int. Conf. Intell. Syst. Applicat. Power Syst., Budapest, Hungary, June
2001, pp. 271–275.
[8] A. E. Gavoyiannis, D. G. Vogiatzis, D. P. Georgiadis, and N. D. Hatziar-
gyriou, “Combined support vector classifiers using fuzzy clustering
for dynamic security assessment,” in Proc. Power Eng. Soc. Summer
Meeting, vol. 2, 2001, pp. 1281–1286.
[9] L. S. Moulin, A. P. A. da Silva, M. A. El-Sharkawi, and R. J. Marks
II, “Support vector and multilayer perceptron neural networks applied
to power systems transient stability analysis with input dimensionality
reduction,” in Proc. IEEE Power Eng. Soc. Summer Meeting, Chicago,
IL, July 2002.
[10] L. S. Moulin, A. P. A. da Silva, M. A. El-Sharkawi, and R. J. Marks
II, “Neural networks and support vector machines applied to power
systems transient stability analysis,” Int. J. Eng. Intell. Syst., vol. 9, no.
4, pp. 205–211, Dec. 2001.
[11] I. N. Kassabalidis, M. A. El-Sharkawi, R. J. Marks II, L. S. Moulin, and
A. P. A. da Silva, “Dynamic security border identification using en-
hanced particle swarm optimization,” IEEE Trans. Power Syst., vol. 17,
pp. 723–729, Aug. 2002.
[12] Y. Mansour, E. Vaahedi, and M. A. El-Sharkawi, “Dynamic security
contingency screening and ranking using neural networks,” IEEE Trans.
Neural Networks, vol. 8, pp. 942–950, July 1997.
[13] Y. Mansour, A. Y. Chang, J. Tamby, E. Vaahedi, and M. A. El-Sharkawi,
“Large scale dynamic security screening and ranking using neural net-
works,” IEEE Trans. Power Syst., vol. 12, pp. 954–960, May 1997.
[14] I. Kamwa, R. Grondin, and L. Loud, “Time-varying contingency
screening for dynamic security assessment using intelligent-systems
techniques,” IEEE Trans. Power Syst., vol. 16, pp. 526–536, Aug. 2001.
[15] Y. M. Park, G.-W. Kim, H.-S. Cho, and K. Y. Lee, “A new algorithm
for Kohonen layer learning with application to power system stability
analysis,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, pp. 1030–1034,
Dec. 1997.
[16] L. S. Moulin, M. A. El-Sharkawi, R. J. Marks II, and A. P. A. da Silva,
“Automatic feature extraction for neural network based power systems
dynamic security evaluation,” in Proc. IEEE Int. Conf. Intell. Syst. Ap-
plicat. to Power Syst., Budapest, Hungary, June 2001, pp. 41–46.
[17] P. J. Abrão, A. P. A. da Silva, and A. C. Z. de Souza, “Rule extraction
from artificial neural networks for voltage security analysis,” in Proc.
Int. Joint Conf. Neural Networks, Honolulu, HI, May 2002.
[18] J. D. McCalley, Q. Zhao, S. Wang, G. Zhou, R. T. Treinen, and A. D. Pa-
palexopoulos, “Security boundary visualization for systems operation,”
IEEE Trans. Power Syst., vol. 12, pp. 940–947, May 1997.
[19] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[20] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector
Machines and Other Kernel-Based Learning Methods. Cambridge,
U.K.: Cambridge Univ. Press, 2000.
[21] T. Joachims, “Making large-scale SVM learning practical,” in Advances
in Kernel Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges,
and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1998.
[22] Stuttgart Neural Network Simulator [Online]. Available:
http://www-ra.informatik.uni-tuebingen.de/SNNS/
[23] S. Rovnyak, S. Kretsinger, J. Thorp, and D. Brown, “Decision trees for
real-time transient stability prediction,” IEEE Trans. Power Syst., vol. 9,
pp. 1417–1426, Aug. 1994.
[24] D. Ernst, D. R. Vega, M. Pavella, P. M. Hirsch, and D. Sobajic, “A unified
approach to transient stability contingency filtering, ranking and assess-
ment,” IEEE Trans. Power Syst., vol. 16, pp. 435–443, Aug. 2001.
Luciano S. Moulin was born in Nanuque, Brazil, in 1972. He received the
B.Sc. and M.Sc. degrees in electrical engineering from the Federal Engineering
School at Itajubá (EFEI), Itajubá, Brazil, in 1995 and 1998, respectively. He re-
ceived the D.Sc. degree in electrical engineering from the Federal University at
Itajubá (UNIFEI) [previously EFEI] in 2002.
Currently, he is a Researcher in Electrical Engineering with the Electric Power
Research Center (CEPEL). During 2000, he was a Visiting Student in the De-
partment of Electrical Engineering at the University of Washington, Seattle.
Alexandre P. Alves da Silva (SM’00) was born in Rio de Janeiro, Brazil, in
1962. He received the B.Sc. and M.Sc. degrees in electrical engineering from the
Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, in 1984 and 1987,
respectively, and the Ph.D. degree from the University of Waterloo, Waterloo,
ON, Canada, in 1992.
Currently, he is a Professor in Electrical Engineering at the Federal University
of Rio de Janeiro, Rio de Janeiro, Brazil, where he and his group have developed
intelligent forecasting and security assessment systems that are in operation at
the control centers of Brazilian electric utilities. From 1993 to 2002, he was with
the Federal Engineering School at Itajubá, Itajubá, Brazil. He was also with the
Electric Energy Research Center (CEPEL), Rio de Janeiro, Brazil, from 1987
to 1988. During 1999, he was a Visiting Professor in the Department of Elec-
trical Engineering at the University of Washington, Seattle. He has authored and
co-authored many papers on intelligent systems application to power systems.
Dr. Alves da Silva was the Technical Program Committee Chairman of the
First Brazilian Conference on Neural Networks in 1994, and of the International
Conference on Intelligent System Applications to Power Systems in 1999.
Mohamed A. El-Sharkawi (F’95) received the B.Sc. degree in electrical engi-
neering from Cairo High Institute of Technology, Cairo, Egypt, in 1971, and the
M.Sc. and Ph.D. degrees from the University of British Columbia, Vancouver,
BC, Canada, in 1977 and 1980, respectively.
Currently, he is a Professor of Electrical Engineering at the University of
Washington, Seattle. He is the founder of the International Conference on the
Application of Neural Networks to Power Systems (ANNPS), which was later
merged with the Expert Systems Conference and renamed Intelligent Systems
Applications to Power (ISAP). He is the co-editor of the IEEE tutorial book on
the applications of neural networks to power systems. He has published many
papers and book chapters. He holds five patents: three on Adaptive Var Con-
troller for distribution systems and two on Adaptive Sequential Controller for
circuit breakers.
Robert J. Marks, II (F’94) is a Professor and Graduate Program Coordinator
with the Department of Electrical Engineering at the College of Engineering,
University of Washington, Seattle. He is the author of numerous papers and
is co-author of the book Neural Smithing: Supervised Learning in Feedfor-
ward Artificial Neural Networks. He served as the Editor-in-Chief of the IEEE
TRANSACTIONS ON NEURAL NETWORKS and as a Topical Editor for Optical
Signal Processing and Image Science for the Journal of the Optical Society of
America.
Dr. Marks is a Fellow of the Optical Society of America. He served as the
first President of the IEEE Neural Networks Council. In 1992, he was given the
honorary title of Charter President.

  • 1. 818 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 19, NO. 2, MAY 2004 Support Vector Machines for Transient Stability Analysis of Large-Scale Power Systems L. S. Moulin, A. P. Alves da Silva, Senior Member, IEEE, M. A. El-Sharkawi, Fellow, IEEE, and R. J. Marks II, Fellow, IEEE Abstract—The pattern recognition approach to transient stability analysis (TSA) has been presented as a promising tool for online application. This paper applies a recently introduced learning-based nonlinear classifier, the support vector machine (SVM), showing its suitability for TSA. It can be seen as a different approach to cope with the problem of high dimensionality. The high dimensionality of power systems has led to the development and implementation of feature selection techniques to make the application feasible in practice. SVMs’ theoretical motivation is conceptually explained and they are tested with a 2684-bus Brazilian system. Aspects of model adequacy, training time, clas- sification accuracy, and dimensionality reduction are discussed and compared to stability classifications provided by multilayer perceptrons. Index Terms—Feature selection, neural networks, support vector machine, transient stability analysis. I. INTRODUCTION THE increasing load demand in power systems without ac- companying investments in generation and transmission has affected the analysis of stability phenomena, requiring more reliable and faster tools. One of the most challenging problems in real-time operation of power systems is the assessment of transient stability. Its importance has increased due to the reduc- tion of operational safety margins. Analytical techniques alone do not allow to take preventive or corrective actions in due time. A possible solution to overcome this drawback is the application of the pattern recognition approach. Research efforts in communication and computer processing have enabled the development of online tools for transient sta- bility analysis (TSA) [1]–[3]. 
A number of pattern recognition methods have been reported as playing important roles in such tools [1], [4], [5]. The integration of automatic learning/pat- tern recognition techniques with analytical TSA methods can provide more accurate monitoring, improved use of power sys- tems resources (e.g., reduced spinning reserves), flexibility in maintenance scheduling, etc. [2]. Besides avoiding the repet- itive burden of analyzing similar operating points, the pattern Manuscript received August 28, 2003. This work was supported in part by the Brazilian Research Council (CNPq). L. S. Moulin is with the Electric Power Research Center (CEPEL), Ilha da Cidade Universitária, Rio de Janeiro, RJ 21941-590 Brazil (e-mail: moulin@cepel.br). A. P. Alves da Silva is with the Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21945-970, Brazil (e-mail: alex@coep.ufrj.br). M. A. El-Sharkawi and R. J. Marks II are with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195-2500 USA (e-mail: elsharkawi@ee.washington.edu; marks@ee.washington.edu). Digital Object Identifier 10.1109/TPWRS.2004.826018 recognition approach for online TSA [or even assessment (i.e., including control)] can deal with modeling uncertainties (e.g., dynamic load modeling [6]) and measurement errors. Analytical methods hardly provide, alone, all functionalities that control center operators would like to have, which are • current operating point qualitative evaluation; • stability margins; • visualization of security regions; • available transfer capability; • preventive and/or corrective controls; • “optimum” load shedding. The pattern recognition approach for online TSA can fulfill directly the first four operators’ desires, and could also help on providing the last two. In particular, ultrafast stability margin estimation can provide a feedback variable with a system-wide view for the controllers. 
So far, online centralized coordination has not been possible for the control of fast phenomena. The cur- rent decentralized approach, based on local measurements, does not produce adequate pre and postcontingency control, reducing stability limits and increasing the need for more stabilizers. Neural networks (NNs) technology has been reported as an important contributor for reaching the goals of online TSA [1], [2], [7]–[16]. It presents desirable characteristics, such as fast response in simple format (stable/unstable or stability margin), heavy computational burden is paid offline, failure tolerant with respect to data requirement, and it can allow better real-time control. Explanation capability can also be introduced through the extraction of if-then rules from the NN [17]. Recent proposals of NNs’ application to online TSA show how these properties can be turned into practical use. In general, these proposals present one of the following ideas: a) to rank or screen the contingencies, and after that perform detailed time-domain simulations [2], [12], [13]; b) to provide a stability evaluation during time-domain sim- ulations, halting the cases clearly evaluated as stable [14]; c) to provide fast stability evaluations and allow border iden- tification [11], [18]. In most of the NN proposals for online TSA, multilayer per- ceptrons (MLPs) are used, which present, as a major drawback, the extensive training process. Like other nonlinear learning ma- chines, they lack simple design procedures. In estimating a NN, one is found between two opposing extremes: i) to use lots of data for learning and suffer from long training, or ii) to use less data, and suffer from “insufficient” learning. Support vector machines (SVMs), a recently introduced learning paradigm, have very interesting theoretical and prac- 0885-8950/04$20.00 © 2004 IEEE
  • 2. MOULIN et al.: SUPPORT VECTOR MACHINES FOR TRANSIENT STABILITY ANALYSIS OF LARGE-SCALE POWER SYSTEMS 819 tical characteristics [19], [20]. They rely on so called support vectors (SVs) to identify the decision boundaries between dif- ferent classes. The SVs are located near the separation surfaces, which are critical to achieve correct classifications. SVMs can map complex nonlinear input/output relationships, and they are very well suited for TSA because the learning focus is on the security border. SVMs are based on a linear machine in a high dimensional feature space, nonlinearly related to the input space, which has allowed the development of somewhat fast training techniques, even with a large number of input variables and big training sets [21]. Investigations of the application of SVMs to TSA can be found in [7]–[10]. In the present work, it is shown that SVMs cope with the demands of large power systems’ TSA, and how they compare to MLPs. Feature selection techniques have been previously proposed to make the matter of high dimensionalities easier, especially in TSA, where the power system representation leads to a large number of input features [16]. Feature selection reduces the input dimensionality in order to use as few variables as possible, getting a more concise representation of the power system. This paper presents the application of MLP and SVM classifiers to the TSA of a real power system, the Brazilian Priba, comprising 2484 buses, 200 generation buses, and 5720 branches. The paper shows how the large input dimensional- ities represent a concern in stability classification. The paper also presents a comparison between MLP and SVM models, since the former is used in almost all previous proposals of NN application to TSA. Aspects of model adequacy, training time, classification accuracy, and dimensionality reduction are discussed, may any of the ideas (a), (b), or (c) be pursued. The structure of the paper is as follows. 
In Section II, a summarized description of SVM classifiers is sketched con- sidering the conceptual ideas and discussions on advantages and disadvantages. Section III describes the power system used in the tests and how the transient stability data have been collected for the NNs’ training. In Section IV, the details about the MLP and SVM training procedures are presented, including the feature selection processing. In Section V, the results of NN stability classifications are presented. Comparisons and discussions about the two models are also carried out. Finally, conclusions are drawn in Section VI. II. SUPPORT VECTOR MACHINE CLASSIFIERS SVMs are nonlinear models based on theoretical results from the statistical learning theory [19]. This theoretical framework formally generalizes the empirical risk minimization principle that is usually applied for NN training (i.e., the minimization of the number of training errors). In traditional NN training, several heuristics are applied in order to estimate a classifier with adequate complexity for the problem at hand. An SVM classifier minimizes the generalization error by op- timizing the tradeoff between the number of training errors and the so-called Vapnik-Chervonenkis (VC) dimension, which is a new concept of complexity measure. A formal theoretical bound exists for the generalization ability of an SVM, which depends on the number of training errors , the size of the training set , the VC dimension associated to the resulting classifier , and a chosen confi- dence measure for the bound itself [19] (1) The risk represents the classification error expectation over the entire population of input/output pairs, even though the pop- ulation is only partially known. This risk is a measure of the ac- tual generalization error and does not require prior knowledge of the data probability density function. 
Statistical learning theory derives inequality (1) to mean that the generalization ability of an SVM is bound by the right-hand side of (1). This upper limit is valid with probability . As increases, the first summand of the upper bound (1) decreases while the second summand increases, so that there is a balanced compro- mise between the two terms (i.e., training error and complexity), respectively. Consider a training set , where is a real- valued -dimensional input vector (i.e., and is a label that determines the class of . The SVMs employed for two-class problems are based on hyperplanes to separate the data, as exampled by Fig. 1. The hyperplane (indi- cated by the dotted line in Fig. 1) is determined by an orthogonal vector and a bias , which identifies the points that satisfy . By finding a hyperplane that maximizes the margin of separation , it is intuitively expected that the classi- fier will have a better generalization ability. The hyperplane with the largest margin on the training set can be completely deter- mined by the nearest points to the hyperplane. Two such points are and in Fig. 1(b), and they are called SVs because the hyperplane (i.e., the classifier) depends entirely on them. Therefore, in their simplest form, SVMs learn linear decision rules as (2) so that are determined to classify correctly the training examples and to maximize . To show the underlying reason for doing this, consider the fact that it is always possible to scale and so that (3) for the SVs with and (4) for non-SVs. Using the SVs and of Fig. 1 and (3), the margin can be calculated as (5) For linearly separable data, the VC dimension of SVM classi- fiers can be estimated by [19] (6) where is the minimum radius of a ball which contains the training points. For linearly separable data, as shown in Fig. 1,
  • 3. 820 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 19, NO. 2, MAY 2004 Fig. 1. Maximum margin classifier. a linear classifier can be found such that the first summand of bound (1) is zero. Therefore, the risk (1) can be reduced by decreasing the complexity of the SVM (i.e., by increasing the margin of separation , which is equivalent to decreasing ). As practical problems are not likely to be linearly separable, the linear SVM has been extended to a nonlinear version by mapping the training data to an expanded feature space using a nonlinear transformation (7) where . Then, the maximum margin classifier of the data in the new space can be determined. With this procedure, the data that are nonseparable in the original space may become separable in the expanded feature space. The next step is to es- timate the SVM by minimizing (i.e., maximizing (8) subject to the constraint that all training patterns are correctly classified, that is (9) However, depending on the type of nonlinear mapping (7), the training points may not happen to be linearly separable, even in the expanded feature space. In this case, it will be impossible to find a linear classifier that fulfills all of the conditions (9). Therefore, a new cost function is used, instead of (8) (10) where non-negative slack variables are introduced to allow for training errors (i.e., training patterns for which and ). By minimizing the first summand of (10), the complexity of the SVM is reduced, and by minimizing the second summand of (10), the number of training errors is decreased. is a preselected positive penalty factor that acts as a tradeoff between the two terms. The minimization of the cost function (10) leads to a quadratic optimization problem with a unique solution. In fact, the nonlinear mapping (7) is indirectly obtained by the so-called Mercer Kernel functions, which correspond to inner products of data vectors in the expanded feature space . 
Because the SVM formulation ends up with an inner product format (see the Appendix for more details), the Kernel function can substitute the nonlinear mapping (7) wherever it appears. For this equivalence to be valid, a Kernel function must satisfy requirements called Mercer Conditions [20]. The most commonly used functions are the RBF kernel

$$K(\mathbf{a}, \mathbf{b}) = \exp\left( -\frac{\|\mathbf{a} - \mathbf{b}\|^2}{2\sigma^2} \right) \tag{11}$$

and the polynomial kernel

$$K(\mathbf{a}, \mathbf{b}) = (\mathbf{a} \cdot \mathbf{b} + 1)^p \tag{12}$$

where the parameters $\sigma$ and $p$ in (11) and (12) must be preset.

One important advantage of using a Kernel function instead of the nonlinear mapping (7) is that key aspects such as representation, complexity, and generalization capability come to depend on a few control parameters, as will be shown later. Another important advantage is related to the computational complexity of the large expanded dimension $m$. For example, the polynomial kernel (12) corresponds to a nonlinear expanded space of dimension $m = \binom{n+p}{p}$, and the features $\Phi_j(\mathbf{x})$, $j = 1, \ldots, m$, represent all of the monomials of the original input vector up to and including degree $p$. In power systems, where $n$ is typically large, computations in a space of dimension $m$ would become intractable. However, by substituting the nonlinear mapping by the Kernel function, all calculations are performed in the original input space dimension.

In summary, a nonlinear mapping (7) can be indirectly defined by a Kernel function, for example (11) or (12) [i.e., there is no need for specifying (7)]. Overfitting problems in the expanded feature space are overcome by implicit generalization control in the learning process. The parameters $\sigma$ and $p$ affect how sparse and easily separable the data are in the expanded feature space, and consequently, they affect the complexity of the resulting SVM classifier and the training error rate. The parameter $C$ also affects the model complexity. Currently, there is no guidance, besides trial and error, on how to set $C$, how to choose the best Kernel function, and how to set the Kernel parameters.
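The kernel substitution can be illustrated with a minimal Python sketch. It assumes the common forms $\exp(-\|\mathbf{a}-\mathbf{b}\|^2 / 2\sigma^2)$ for the RBF kernel and $(\mathbf{a} \cdot \mathbf{b} + 1)^p$ for the polynomial kernel; the function names, the explicit degree-2 feature map, and the sample vectors are illustrative assumptions, not part of the original study. The sketch checks that the polynomial kernel equals an inner product in the expanded space, and counts the monomials that would have to be handled explicitly.

```python
import math


def rbf_kernel(a, b, sigma):
    """RBF kernel, assumed form exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def poly_kernel(a, b, p):
    """Polynomial kernel, assumed form (a . b + 1)^p."""
    return (sum(ai * bi for ai, bi in zip(a, b)) + 1.0) ** p


def phi_degree2(x):
    """Explicit feature map matching poly_kernel for n = 2 inputs and p = 2."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]


def expanded_dimension(n, p):
    """Number of monomials of n variables up to and including degree p."""
    return math.comb(n + p, p)


# Hypothetical two-dimensional input vectors.
a, b = [0.5, -1.0], [2.0, 0.25]

# Inner product computed explicitly in the expanded space ...
explicit = sum(u * v for u, v in zip(phi_degree2(a), phi_degree2(b)))
# ... and implicitly, working only in the original input space.
implicit = poly_kernel(a, b, 2)

print(abs(explicit - implicit) < 1e-12)
print(expanded_dimension(2, 2))
print(expanded_dimension(224, 2))
```

For the 224 preselected inputs of this study, even a degree-2 polynomial would correspond to tens of thousands of explicit features, while the kernel evaluation costs a single 224-term inner product.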
In practice, a range of values has to be tried for $C$ and for the Kernel parameters, and then the performance of the SVM classifier is estimated for each of these values (and Kernel functions). Details on the minimization of (10) and the SVM architecture are shown in the Appendix.

III. POWER SYSTEM DESCRIPTION AND DATA SET GENERATION

The power system used for TSA tests is a subsystem of the Brazilian southeast grid, which is located in the region with the largest power consumption in the country. The system is basically formed by the hydroelectric plants along the Paranaíba and Grande rivers, and by the power grid around these plants. This so-called “Priba System” has 2484 buses, 200 generation buses, and 5730 branches, including 26 major 750-kV, 500-kV, and 345-kV transmission lines and transformers connecting the generation plants to load centers and to other subsystems.

The transient stability studies assume that 14 of the major branches become unavailable due to maintenance scheduling,
one at a time. For each major branch outage for maintenance scheduling, single contingencies are assumed in nine other major branches. The contingencies consist of three-phase short-circuits, which are cleared by tripping the corresponding line. Three load levels (light, medium, and heavy) have been simulated, in addition to combinations of different generation dispatches and power exchanges between subsystems. These base cases for TSA have been simulated in the time domain, and each one has been classified as stable or unstable. With this procedure, 994 training patterns and 248 test patterns have been obtained. The TSA data set has a large percentage of stable cases, with a ratio of stable to unstable cases of approximately 10:1.

It is important to clarify that the data used in this work have not been produced for estimating the proposed classifiers. In fact, the simulations were performed during operational studies for an electric utility, without any specific concern regarding the classifiers’ design. These studies include the specialists’ knowledge about the list of the most important contingencies, the most important variables, typical operating conditions, the required accuracy in power system modeling, etc., which represent the utility’s expertise as far as offline TSA is concerned. All of this knowledge, which has been considered and included in the training set, makes up the necessary information to be learned by a NN. This kind of data is usually available in electric utilities, and this paper intends to show that useful classifiers can be obtained from it without any other data requirement. However, as power systems are planned to operate most of the time under stable conditions, operational studies usually generate highly unbalanced classes. This unbalance between the stable and unstable classes can be troublesome for some classifiers.
Based on the approach developed in [9]–[11] and [16], the following input variables have been chosen to describe the power system operating point and the applied fault: active and reactive power at selected generation buses before fault occurrence; active and reactive power flows on the 26 major branches before fault occurrence; and a binary coding for the nine faults under analysis. The output variable has been chosen to be the two classes of interest: stable or unstable.

Taking into account the generation buses that yielded relevant information (i.e., significant variation in the generated power), 224 input variables have been preselected. Topology information is implicitly conveyed by the branches’ power flow variables (“no flow” means an open line).

IV. NEURAL NETWORKS TRAINING

This section presents details about the MLP and SVM training procedures, including the feature selection processing.

A. Multilayer Perceptron Training

The MLPs have been trained with the Stuttgart NNs Simulator [22], which is free software developed in C. Back-propagation training with adaptive learning and momentum rates, and cross-validation have been used. Cross-validation has been performed by randomly splitting the original training set and reserving 20% of its patterns for validation. The random splitting was repeated every 50 epochs (one epoch is one training cycle in which all training patterns have been presented to the NN once), when the training and validation errors were also calculated in order to monitor their behavior and to stop training early (i.e., before overfitting).

Fig. 2. MLP training on TSA data set.

It has been noticed that during the entire training process, the false dismissal rate (rate of unstable cases assigned to the stable class) is high and the false alarm rate (rate of stable cases assigned to the unstable class) is close to zero, which is highly undesirable.
The training set unbalance between the two classes of interest damages the MLP classifier estimation, because it overfits the stable data. An example of this overfitting is shown in Fig. 2, which presents the classification performance during the MLP training process. When training is terminated, after 1000 epochs ($20 \times 50$), all of the errors are false dismissals. The error rate is the sum of the false dismissal and false alarm rates.

To try to avoid overfitting the stable data and to decrease the false dismissal rates, a new training procedure has been performed. An augmented training set has been devised with the ratio between stable and unstable patterns artificially modified. A 1:1 ratio has been set by adding copies of unstable training patterns to the original training set. The results on the test set will be presented in Section V.

B. SVM Training

The SVM classifier is based on a subset of the training patterns, the support vectors, located in the separation region between the two classes. The SVs define the largest possible margin of separation. Two different kernel functions have been used, the RBF kernel (11) and the polynomial kernel (12). The parameters $C$ in (10), $\sigma$ in (11), and $p$ in (12) have been searched heuristically, trying to achieve the best generalization capacity. The software [21], developed in C, has been used for training and testing the SVM models.

The SVM training process consists of a quadratic optimization problem in which the support vectors represent the minimum solution. The use of an augmented training set as in the MLP training is not appropriate because of linear dependencies in the constraints. Instead, to account for the training set unbalance, different values for $C$ can be used. A large value of $C$ for the unstable patterns and a small value for the stable ones have been adopted during the training process (the corresponding values of $C$ have been multiplied by 0.1 for the stable training patterns).
In this way, the optimization process emphasizes the minimization of the training errors on the unstable patterns. Different values for $C$ and for the kernel parameter have been tried.
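The effect of the class-dependent penalty can be sketched by evaluating a soft-margin cost of the form $\tfrac{1}{2}\|\mathbf{w}\|^2 + \sum_i C_i \xi_i$ with a different $C_i$ per class. The sketch below is illustrative only: it assumes a linear decision function, the label convention $+1$ for stable and $-1$ for unstable, and hypothetical names and sample patterns. It is not the training routine of the software [21].

```python
def slack(w, b, x, y):
    """Slack xi = max(0, 1 - y * (w . x + b)) for one training pattern."""
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def weighted_cost(w, b, patterns, c_unstable, stable_factor=0.1):
    """Evaluate 0.5 * ||w||^2 + sum_i C_i * xi_i with a class-dependent C_i.

    Assumed convention: y = +1 is stable, y = -1 is unstable. Stable patterns
    use C scaled by stable_factor (0.1, as in the text).
    """
    cost = 0.5 * sum(wi * wi for wi in w)
    for x, y in patterns:
        c_i = c_unstable * stable_factor if y == +1 else c_unstable
        cost += c_i * slack(w, b, x, y)
    return cost


# Hypothetical candidate hyperplane and four hypothetical training patterns.
w, b = [1.0, 0.0], 0.0
patterns = [
    ([2.0, 0.0], +1),   # stable, outside the margin: no slack
    ([0.5, 0.0], +1),   # stable, inside the margin: slack 0.5
    ([0.5, 0.0], -1),   # unstable, on the wrong side: slack 1.5
    ([-2.0, 0.0], -1),  # unstable, outside the margin: no slack
]

print(weighted_cost(w, b, patterns, c_unstable=10.0))
print(weighted_cost(w, b, patterns, c_unstable=10.0, stable_factor=1.0))
```

With the 0.1 factor, the slack of the unstable pattern dominates the cost, so the optimizer is pushed to move the boundary away from the unstable class first.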
The RBF kernel SVMs have not shown satisfactory results, because on the test set they have the maximum false dismissal rate and a 0% false alarm rate, regardless of the parameter values. On the other hand, polynomial kernel SVMs have been trained successfully, and their performance on the test set will be presented in Section V.

C. Feature Selection

Because of the high dimensionality of the input space, feature selection techniques have also been applied to achieve a more concise representation of the power system and overcome the curse of dimensionality.

According to the notation introduced in Section II, a classification task in two groups is represented by ordered pairs $(\mathbf{x}_i, y_i)$ in the training set $T$, where $\mathbf{x}_i \in \mathbb{R}^n$ represents the operating point and $y_i$ denotes the security index of that point. The main objective of a feature selection technique is to generate a $d$-dimensional feature vector $\mathbf{z}_i$, where $d < n$. The $d$ selected features represent the original data in a new training set $T' = \{(\mathbf{z}_i, y_i)\}$. If the feature selection is successful, a point in $T'$ can be assigned to one of the two classes with minimum error. Two feature selection techniques are used in this work, as presented in [16]: sequential search and genetic algorithms (GAs). Reductions of the data set dimensionality from $n = 224$ to several smaller values of $d$ have been tested.

V. RESULTS AND DISCUSSIONS

The following comparison between MLP and SVM models results from an extensive search over their parameters and the number of input variables ($n$ and $d$). The classifiers have been initially trained to achieve low classification error rates. After that, the ones having the lowest false dismissal rates have been picked.

Taking $\hat{y}$ as the NN output (stability condition estimated by the NN), notice that the output range of the MLP model is $[-1, +1]$, whereas the output range of the SVM model is $(-\infty, +\infty)$.
Operating points close to the stability border (outputs close to zero) indicate a dangerous situation, even if they are on the stable side, and deserve similar treatment to the unstable points. Therefore, besides stable and unstable, it is useful to adopt a third stability classification, according to the classifiers’ outputs $\hat{y}$. A high risk range is devised around the 0 classification threshold, and if $\hat{y}$ falls within this range, the point is classified as “high risk” (meaning that it is considered neither stable nor unstable). For the SVM classifier, points with output values in the range $[-1, +1]$ are natural candidates for high risk cases, because they are located between the support vectors of different classes and near the classification border.

Taking these considerations into account, the results of the two classifiers on the test set are presented in Table I. Column (2) shows the classification results with the high risk range for the SVM with 224 inputs. The false dismissal cases occur when an unstable pattern receives an output on the stable side of the threshold and outside the high risk range. The false alarm cases occur when a stable pattern receives an output on the unstable side of the threshold and outside the high risk range. The high risk cases occur whenever $\hat{y}$ falls within the high risk range. The error rate is the sum of the false dismissal and false alarm rates. The total error, false dismissal, and false alarm rates are calculated as a percentage of the 248 test patterns (225 stable and 23 unstable). Results with a more conservative high risk range are shown in column (3) of Table I for the same SVM classifier, where the performance rates are calculated as for column (2). For the MLP classifier with 150 inputs selected by the GA, three arbitrarily large high risk ranges have been defined, and the results are shown in columns (4), (5), and (6) of Table I, respectively. The performance rates have been calculated according to each high risk range, just as explained for the SVM model.

TABLE I SVMS AND MLPS PERFORMANCES ON THE TEST SET

The use of such high risk ranges is well known in NN applications to TSA, where false dismissal rates must be very close to 0%.
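The three-way classification and the rate definitions can be sketched in Python as follows. This is an illustration under stated assumptions: positive outputs are taken to mean stable, the high risk range is a closed interval containing zero, and the outputs, labels, and range limits are hypothetical values, not results from Table I.

```python
def classify(output, low, high):
    """Map a classifier output to 'high risk', 'stable', or 'unstable'.

    Assumes positive outputs mean stable and that [low, high] contains 0.
    """
    if low <= output <= high:
        return "high risk"
    return "stable" if output > 0 else "unstable"


def rates(outputs, labels, low, high):
    """Return (false dismissal, false alarm, high risk) rates.

    All three rates are percentages of the total number of patterns.
    """
    n = len(outputs)
    fd = fa = hr = 0
    for out, label in zip(outputs, labels):
        decision = classify(out, low, high)
        if decision == "high risk":
            hr += 1
        elif decision == "stable" and label == "unstable":
            fd += 1  # unstable case assigned to the stable class
        elif decision == "unstable" and label == "stable":
            fa += 1  # stable case assigned to the unstable class
    return tuple(100.0 * k / n for k in (fd, fa, hr))


# Hypothetical classifier outputs and true classes for eight test patterns.
outputs = [2.3, 0.4, -1.7, 1.2, -0.2, -3.0, 1.6, -1.1]
labels = ["stable", "stable", "stable", "unstable",
          "unstable", "unstable", "stable", "unstable"]

print(rates(outputs, labels, -1.0, 1.0))
print(rates(outputs, labels, -1.5, 1.5))
```

In this toy data, widening the range from $[-1, 1]$ to $[-1.5, 1.5]$ removes the only false dismissal but doubles the number of high risk cases.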
There is a clear compromise between the high risk ranges and the false dismissal rates, because one increases as the other decreases. In a contingency screening application, by taking the SVM classifier criterion of column (3) instead of the one in column (2), one would favor reliability and would conservatively accept more high risk cases. Although most of these would be stable cases, they could trigger preventive actions or detailed stability simulations, depending on the specific approach taken. As the high risk range is increased, the classifier becomes more reliable (i.e., with a lower false dismissal rate), but less effective in screening out truly stable cases. The false dismissal rate would decrease from 0.8% to 0%, and both classifiers of columns (2) and (3) perform well. The choice between them is a design decision. The conceptual description of high risk cases for the SVMs leads to a well-founded definition of high risk ranges, whose validity is confirmed in practice by the results presented in Table I.

By taking the MLP of column (5) instead of the one of column (4), the false dismissal rate would decrease from 3.2% to 2.4%. However, it cannot be lowered further, even by setting a very large upper limit for the high risk range, as shown in column (6) of Table I.
A false dismissal rate of 0.8% on the test set means that 8.7% of the unstable patterns in the test set (2 of 23) have been misclassified. A false dismissal rate of 2.4% means that 26.1% of the unstable patterns in the test set (6 of 23) have been misclassified.

For the MLP, feature selection allows a gain in generalization resulting from the dimensionality reduction, despite the loss of information due to the discarded variables. However, the best MLPs offer no competition to the best SVMs. Surprisingly, the dimensionality reduction does not provide any improvement to the overall performance of the SVMs, compared to the ones estimated with the original inputs.

Table I also shows the training times in cpu seconds and cpu minutes, measured on an 850-MHz PC. The SVM training time is lower than the MLP training time by one order of magnitude. It is worth noting that the training set size is not large compared to the number of MLP parameters. In fact, larger training sets would be recommended for a better representation of the power system behavior, though the relationship between the computational burden and the training set cardinality is inconvenient when employing backpropagation learning.

The number of SVs (168) determines the number of free parameters of the SVM classifier [see Fig. 4 and (15) in the Appendix, and recall that $\alpha_i > 0$ only for the SVs]. That helps to understand why a training set with only 994 patterns is enough to estimate an SVM with good performance, despite the large number of input variables.

The results show how the SVM models can take advantage of a stability study database already available in a control center. Their performance could continuously improve during the online TSA process by adding new operating conditions to the training set.
The SVM performance stems from its capacity to generalize well from the available training data, which is related to an “implicit” feature selection ability. To make that clear, consider the receiver operating characteristic (ROC) curves shown in Fig. 3. This graph presents false dismissal rates on the horizontal axis and detection rates on the vertical axis. The detection rate is just an indirect measure of the false alarm rate, calculated by subtracting it from 100% (without considering the high risk range, that is, when the classification threshold is zero). Each ROC curve in Fig. 3 corresponds to SVMs trained with the indicated input variables and different values of $C$. For each curve, a specific value of $p$ has been chosen to obtain the best possible ROC, which is the one having the points with the lowest false dismissal rates coupled with the highest detection rates. ROC curves cannot be drawn for the MLPs, because there is no parameter to control the relationship between false alarm and false dismissal rates. The number of factors that affect an MLP’s performance is large, and the factors are inter-related.

The “best” curve in Fig. 3 is from 224 inputs. As the number of input variables decreases, the curves get “worse,” which is an indication of loss of discrimination capability as variables are discarded from the original input set. On the other hand, it is also an indication of how good the SVMs can be on the high dimensional original space. That is, embedded in the learning process of an SVM there is an automatic feature selection, which prevents it from being trapped by the curse of dimensionality on the expanded feature space.

Fig. 3. Polynomial SVM ROC curves for different input sets.

Fig. 3 also shows that the parameters $C$ and $p$ can be used to control the relationship between the number of false dismissals and false alarms.
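As a minimal sketch of how the points of such a curve can be assembled, the Python fragment below converts pairs of false dismissal and false alarm rates into ROC points as defined above (detection rate equal to 100% minus the false alarm rate) and keeps the points that are not beaten on both axes. The rate values, the dictionary keyed by penalty value, and the function names are hypothetical and do not come from Fig. 3.

```python
def roc_point(false_dismissal, false_alarm):
    """Return (false dismissal rate, detection rate), both in percent."""
    return (false_dismissal, 100.0 - false_alarm)


def non_dominated(points):
    """Keep the points not beaten on both axes by another point.

    A point is beaten when another one has a false dismissal rate that is
    no higher and a detection rate that is no lower, with at least one of
    the two strictly better.
    """
    kept = []
    for fd, det in points:
        beaten = any(
            (o_fd <= fd and o_det >= det) and (o_fd < fd or o_det > det)
            for o_fd, o_det in points
        )
        if not beaten:
            kept.append((fd, det))
    return sorted(kept)


# Hypothetical (false dismissal %, false alarm %) results, one per penalty value.
results_by_penalty = {
    0.01: (4.0, 2.0),
    0.1: (2.4, 3.5),
    1.0: (0.8, 6.0),
    10.0: (1.2, 9.0),
}

points = [roc_point(fd, fa) for fd, fa in results_by_penalty.values()]
print(non_dominated(points))
```

In this toy data, the point obtained with the largest penalty is discarded because another point has both a lower false dismissal rate and a higher detection rate.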
For a specific value of $p$, it is possible to move the SVM classifier to the left along the ROC curve as much as desired to achieve lower false dismissal rates, at the expense of larger false alarm rates.

This procedure suggests a structured way to design an SVM classifier for TSA. For a selected Kernel function:
a) perform a fine search over the values of $C$ and of the Kernel parameter, and draw ROC curves for them;
b) choose the curve with the “best” receiver operating characteristic (the one having points with the largest ratios between detection rates and false dismissal rates);
c) choose the point in the “best” ROC curve with the largest ratio between detection and false dismissal rates;
d) pick the values of $C$ and of the Kernel parameter that correspond to the point chosen in (c);
e) choose a high risk range that provides an acceptable false dismissal rate for the classifier estimated from (d).

VI. CONCLUSION

This paper shows that SVMs fit the TSA task for large power systems. They provide a different strategy to tackle the curse of dimensionality. The SVMs performed better when the complete set of input variables was used, which confirms, in practice, their implicit feature selection capability and the validity of the theoretical developments on generalization control. The SVM learning machine allows a deep understanding of its practical implications, which can be used to devise structured design practices for the model.

The sparsity reduction of the data has turned the training process into an easier task for MLPs. However, the MLPs’ performance (3.2% false dismissal rate, 11.7% false alarm rate, and 3.23% high risk rate) is not as good as that of the SVMs (0.8% false dismissal rate, 4.8% false alarm rate, and 14.1% high risk rate). It has been shown that stability studies databases already available in electric utilities, containing specialists’ knowledge, can be used in NN-based TSA as a good starting point.
Future work will focus on dynamic features. The majority of work about TSA, based on the pattern recognition approach,
has focused the analysis on prefault static features. With the popularization of synchronized phasor measurement acquisition systems (phase angle monitors), loss of synchronization can be predicted, in real time, based on postfault phasor measurements (i.e., speed and acceleration at each generation bus are calculated from the synchronized phasor measurements) [23]. Therefore, the next generation of transient stability assessment tools will be able to move from preventive countermeasures (contingency analysis on prefault operating points) to corrective control. Another promising idea for online TSA of large-scale power systems is the hybrid approach based on direct-type methods coupled with detailed time simulation [24]. In this approach, NNs can be used as filters to discard stable contingencies very quickly.

APPENDIX

The computation of the decision boundary of an SVM for the nonseparable case consists in solving the following optimization problem:

$$\begin{aligned} \text{minimize} \quad & \tfrac{1}{2}\, \mathbf{w} \cdot \mathbf{w} + C \sum_{i=1}^{l} \xi_i \\ \text{subject to} \quad & y_i \left( \mathbf{w} \cdot \Phi(\mathbf{x}_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l. \end{aligned} \tag{13}$$

Instead of solving (13) directly, it is much easier to solve the dual problem (14), in terms of the Lagrange multipliers $\alpha_i$

$$\begin{aligned} \text{minimize} \quad & -\sum_{i=1}^{l} \alpha_i + \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) \\ \text{subject to} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, l \end{aligned} \tag{14}$$

which is a quadratic optimization problem. From the solution $\alpha_i^*$, $i = 1, \ldots, l$, of (14), the decision rule can be computed as

$$f(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i^* y_i K(\mathbf{x}_i, \mathbf{x}) + b \right). \tag{15}$$

The training points with $\alpha_i^* > 0$ are the SVs, and (15) depends entirely on them. The threshold $b$ can be calculated using (3), which is valid for any SV $\mathbf{x}_{SV}$ with label $y_{SV}$

$$b = y_{SV} - \sum_{i=1}^{l} \alpha_i^* y_i K(\mathbf{x}_i, \mathbf{x}_{SV}). \tag{16}$$

An SVM can be represented as in Fig. 4, where the number of units is determined by the number of SVs.

Fig. 4. SVM architecture.

ACKNOWLEDGMENT

The authors would like to thank the researchers at GESIS Lab, UNIFEI (Federal University at Itajubá, Brazil), for making the Transient Stability studies used in this paper available. They would also like to thank the reviewers for their valuable comments and questions, which helped to improve this work.

REFERENCES

[1] L. A.
Wehenkel, Automatic Learning Techniques in Power Systems. Norwell, MA: Kluwer, 1998.
[2] J. L. Jardim, C. A. da S. Neto, A. P. A. da Silva, A. C. Zambroni de Souza, D. M. Falcão, C. L. T. Borges, and G. N. Taranto, “A unified online security assessment system,” in Proc. CIGRÉ, Paris, France, Aug. 2000.
[3] Y. Mansour, E. Vaahedi, A. Y. Chang, B. R. Corns, B. W. Garrett, K. Demaree, T. Athay, and K. Cheung, “B. C. Hydro’s on-line transient stability assessment (TSA) model development, analysis, and post-processing,” IEEE Trans. Power Syst., vol. 10, pp. 241–250, Feb. 1995.
[4] D. J. Sobajic, Y.-H. Pao, and M. Djukanovic, “Neural networks for assessing the transient stability of electric power systems,” Neural Networks Applications in Power Systems, pp. 255–294, 1996.
[5] R. Fischl, D. Niebur, and M. A. El-Sharkawi, “Security assessment and enhancement,” in Artificial Neural Networks with Applications to Power Systems, M. A. El-Sharkawi and D. Niebur, Eds., 1996, ch. 9, pp. 104–127. IEEE Catalog no. 96TP112-0.
[6] A. P. A. da Silva, C. Ferreira, G. L. Torres, and A. C. Z. de Souza, “A new constructive ANN and its application to electric load representation,” IEEE Trans. Power Syst., vol. 12, pp. 1569–1575, Nov. 1997.
[7] A. E. Gavoyiannis, D. G. Vogiatzis, and N. D. Hatziargyriou, “Dynamic security classification using support vector machines,” in Proc. IEEE Int. Conf. Intell. Syst. Applicat. Power Syst., Budapest, Hungary, June 2001, pp. 271–275.
[8] A. E. Gavoyiannis, D. G. Vogiatzis, D. P. Georgiadis, and N. D. Hatziargyriou, “Combined support vector classifiers using fuzzy clustering for dynamic security assessment,” in Proc. Power Eng. Soc. Summer Meeting, vol. 2, 2001, pp. 1281–1286.
[9] L. S. Moulin, A. P. A. da Silva, M. A. El-Sharkawi, and R. J. Marks II, “Support vector and multilayer perceptron neural networks applied to power systems transient stability analysis with input dimensionality reduction,” in Proc. IEEE Power Eng. Soc.
Summer Meeting, Chicago, IL, July 2002.
[10] L. S. Moulin, A. P. A. da Silva, M. A. El-Sharkawi, and R. J. Marks II, “Neural networks and support vector machines applied to power systems transient stability analysis,” Int. J. Eng. Intell. Syst., vol. 9, no. 4, pp. 205–211, Dec. 2001.
[11] I. N. Kassabalidis, M. A. El-Sharkawi, R. J. Marks II, L. S. Moulin, and A. P. A. da Silva, “Dynamic security border identification using enhanced particle swarm optimization,” IEEE Trans. Power Syst., vol. 17, pp. 723–729, Aug. 2002.
[12] Y. Mansour, E. Vaahedi, and M. A. El-Sharkawi, “Dynamic security contingency screening and ranking using neural networks,” IEEE Trans. Neural Networks, vol. 8, pp. 942–950, July 1997.
[13] Y. Mansour, A. Y. Chang, J. Tamby, E. Vaahedi, and M. A. El-Sharkawi, “Large scale dynamic security screening and ranking using neural networks,” IEEE Trans. Power Syst., vol. 12, pp. 954–960, May 1997.
[14] I. Kamwa, R. Grondin, and L. Loud, “Time-varying contingency screening for dynamic security assessment using intelligent-systems techniques,” IEEE Trans. Power Syst., vol. 16, pp. 526–536, Aug. 2001.
[15] Y. M. Park, G.-W. Kim, H.-S. Cho, and K. Y. Lee, “A new algorithm for Kohonen layer learning with application to power system stability analysis,” IEEE Trans. Syst., Man, Cybern. B, vol. 27, pp. 1030–1034, Dec. 1997.
[16] L. S. Moulin, M. A. El-Sharkawi, R. J. Marks II, and A. P. A. da Silva, “Automatic feature extraction for neural network based power systems dynamic security evaluation,” in Proc. IEEE Int. Conf. Intell. Syst. Applicat. to Power Syst., Budapest, Hungary, June 2001, pp. 41–46.
[17] P. J. Abrão, A. P. A. da Silva, and A. C. Z. de Souza, “Rule extraction from artificial neural networks for voltage security analysis,” in Proc. Int. Joint Conf. Neural Networks, Honolulu, HI, May 2002.
[18] J. D. McCalley, Q. Zhao, S. Wang, G. Zhou, R. T. Treinen, and A. D. Papalexopoulos, “Security boundary visualization for systems operation,” IEEE Trans. Power Syst., vol. 12, pp. 940–947, May 1997.
[19] V. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.
[20] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge, U.K.: Cambridge Univ. Press, 2000.
[21] T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1998.
[22] Stuttgart Neural Network Simulator [Online]. Available: http://www-ra.informatik.uni-tuebingen.de/SNNS/
[23] S. Rovnyak, S. Kretsinger, J. Thorp, and D. Brown, “Decision trees for real-time transient stability prediction,” IEEE Trans. Power Syst., vol. 9, pp. 1417–1426, Aug. 1994.
[24] D. Ernst, D. R. Vega, M.
Pavella, P. M. Hirsch, and D. Sobajic, “A unified approach to transient stability contingency filtering, ranking and assessment,” IEEE Trans. Power Syst., vol. 16, pp. 435–443, Aug. 2001.

Luciano S. Moulin was born in Nanuque, Brazil, in 1972. He received the B.Sc. and M.Sc. degrees in electrical engineering from the Federal Engineering School at Itajubá (EFEI), Itajubá, Brazil, in 1995 and 1998, respectively. He received the D.Sc. degree in electrical engineering from the Federal University at Itajubá (UNIFEI) [previously EFEI] in 2002. Currently, he is a Researcher in Electrical Engineering with the Electric Power Research Center (CEPEL). During 2000, he was a Visiting Student in the Department of Electrical Engineering at the University of Washington, Seattle.

Alexandre P. Alves da Silva (SM’00) was born in Rio de Janeiro, Brazil, in 1962. He received the B.Sc. and M.Sc. degrees in electrical engineering from the Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil, in 1984 and 1987, respectively, and the Ph.D. degree from the University of Waterloo, Waterloo, ON, Canada, in 1992. Currently, he is a Professor in Electrical Engineering at the Federal University of Rio de Janeiro, Rio de Janeiro, Brazil, where he and his group have developed intelligent forecasting and security assessment systems that are in operation at the control centers of Brazilian electric utilities. From 1993 to 2002, he was with the Federal Engineering School at Itajuba, Itajuba, Brazil. He was also with the Electric Energy Research Center (CEPEL), Rio de Janeiro, Brazil, from 1987 to 1988. During 1999, he was a Visiting Professor in the Department of Electrical Engineering at the University of Washington, Seattle. He has authored and co-authored many papers on intelligent systems application to power systems. Dr.
Alves da Silva was the Technical Program Committee Chairman of the First Brazilian Conference on Neural Networks in 1994, and of the International Conference on Intelligent System Applications to Power Systems in 1999.

Mohamed A. El-Sharkawi (F’95) received the B.Sc. degree in electrical engineering from Cairo High Institute of Technology, Cairo, Egypt, in 1971, and the M.Sc. and Ph.D. degrees from the University of British Columbia, Vancouver, BC, Canada, in 1977 and 1980, respectively. Currently, he is a Professor of Electrical Engineering at the University of Washington, Seattle. He is the founder of the International Conference on the Application of Neural Networks to Power Systems (ANNPS), which was later merged with the Expert Systems Conference and renamed Intelligent Systems Applications to Power (ISAP). He is the co-editor of the IEEE tutorial book on the applications of neural networks to power systems. He has published many papers and book chapters. He holds five patents: three on Adaptive Var Controller for distribution systems and two on Adaptive Sequential Controller for circuit breakers.

Robert J. Marks, II (F’94) is a Professor and Graduate Program Coordinator with the Department of Electrical Engineering at the College of Engineering, University of Washington, Seattle. He is the author of numerous papers and is co-author of the book Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. He served as the Editor-in-Chief of the IEEE TRANSACTIONS ON NEURAL NETWORKS and as a Topical Editor for Optical Signal Processing and Image Science for the Journal of the Optical Society of America. Dr. Marks is a Fellow of the Optical Society of America. He served as the first President of the IEEE Neural Networks Council. In 1992, he was given the honorary title of Charter President.