Application of combined support vector machines in process fault diagnosis
Esmaeil Tafazzoli and Mehrdad Saif
Abstract— The performance of Combined Support Vector Machines (C-SVM) is examined by comparing its classification results with those of a k-nearest neighbor classifier and a single SVM classifier. For our experiments we use training and testing data obtained from two benchmark industrial processes. The first set is simulated data generated from the Tennessee Eastman process simulator, and the second set is data obtained by running experiments on a Three Tank system. Our results show that the C-SVM classifier gives the lowest classification error of the methods compared. However, complexity and computation time become issues, which depend on the number of faults in the data and the data dimension. We also examined Principal Component Analysis, using PC scores as input features for the classifiers, but the performance was not comparable to the other classifiers' results. By selecting an appropriate number of variables for classification using contribution charts, the performance of the classifiers on the Tennessee Eastman data improves significantly. Therefore, using contribution charts to select the most important variables is necessary when the number of variables is large.
I. INTRODUCTION
The support vector machine is a well-known classification technique in the field of machine learning. Implementing nonlinear kernels in the SVM structure enables classification of nonlinear data which cannot be classified by simple linear classifiers. In the SVM classification method, an optimal hyperplane is defined which maximizes the separation between the data-point classes [3].
In many works on fault detection and diagnosis, the SVM classifier is combined with another method such as Principal Component Analysis (PCA), Independent Component Analysis (ICA), Fisher Discriminant Analysis (FDA), etc., to reduce the data dimension and to accomplish the detection part of the Fault Detection and Identification (FDI) process; the diagnosis part is then carried out by the SVM classifier. Most often, the SVM classifier operates on processed data or features produced by the other methods (PCA scores, for example) [1]. In [4], ICA projection coefficients were used as feature data for training the SVM classifiers. In [5], the authors compared the performance of FDA, SVM, and PSVM (proximal support vector machines) and showed that support vector machines perform better than FDA in classifying TE data.

In general, SVM is a two-class classifier: data points are assigned to one of only two class labels, whereas in multiclass classifiers there are multiple class labels and the classifier assigns each point to one of them. A multiclass classification problem can be turned into multiple two-class classification problems, and the number of required classifiers depends on the number of faults to be classified. As a result, most SVM classifiers are multiple-SVM classifiers. In the machine learning literature, the term committee refers to a combination of classifiers. A committee is built by combining several models (classifiers), and the outcome of the committee is usually better than that of the individual models [8]. Averaging, boosting, and adaptive boosting are some of the methods for combining the models [3].

(The authors are both with the School of Engineering Science, Simon Fraser University, 8888 University Drive, Vancouver, BC, V5A 1S6, Canada. Corresponding email: saif@ensc.sfu.ca.)
K-Nearest Neighbor (KNN) is one of the simplest classification algorithms in machine learning. The K-nearest neighbor classification method was first introduced by Cover and Hart [2]: the class of each sample point is determined by its K neighboring points in the training set, and the point is assigned to the class with the majority of votes amongst the K neighbors. Several types of KNN algorithm have been suggested and applied to different data sets in the fields of data mining and machine learning, and many papers can be found on KNN, or on combinations of KNN with other methods, for improving data classification. For more information on the KNN algorithm and its applications, references [9]-[17] are helpful.

In this paper we use the averaging method for the combined classifiers. Building on the idea of a committee classifier, we develop a combined SVM (C-SVM) classifier and investigate its performance, compared to individual classifiers, on data generated from the Tennessee Eastman (TE) simulator and the Three Tank System, which are well-known benchmark processes used for control, monitoring, and fault diagnosis experiments. We also examine the performance of a K-nearest neighbor classifier in comparison with the C-SVM when applied to these data sets.
II. TWO CLASSIFICATION METHODS
A. Support Vector Machines
The SVM algorithm is usually used for two-class separation problems [3]. The algorithm finds the maximum-margin separating boundary between two classes of data. Suppose we have a set of data that can be separated into two classes. The data is separated by training a linear model

$y(x) = w^T \phi(x) + b$    (1)

Equation (1) is the mathematical representation of the linear model. In this model the training data matrix is an n × m matrix where each row represents an observed data point x_i, a vector of length m; thus n is the number of data points and m is the number of variables. Each data point's class is determined by its target value. The corresponding target values are stacked in a vector t
with t_i ∈ {−1, 1} as its elements. $\phi(x)$ is called the feature-space transformation function and b is the bias; the weights w determine the direction of the separating plane. The function y(x) has the property that y(x_i) > 0 when t_i = 1 and y(x_i) < 0 when t_i = −1; therefore t_i y(x_i) > 0 for all i. In the SVM algorithm, the distance between the closest data points and the decision boundary, which is called the margin, is maximized (see Fig. 1). Therefore, in SVM the hyperplane which maximizes the margin is chosen as the decision boundary. The maximization criterion is
$\arg\max_{w,b} \left\{ \frac{1}{\|w\|} \min_{i=1,\dots,n} \left[ t_i \left( w^T \phi(x_i) + b \right) \right] \right\}$

and the points with minimum distance are known as support vectors. Fig. 1 illustrates the location of the support vectors and the decision boundary.
The model parameters, w and b, are found by solving a
constrained optimization problem as
$\arg\min_{w} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad t_i \left( w^T \phi(x_i) + b \right) \ge 1, \ \forall i$
This problem is solved by using Lagrange multipliers. The Lagrangian is

$L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} a_i \left\{ t_i \left( w^T \phi(x_i) + b \right) - 1 \right\}$
where the a_i are Lagrange multipliers. Solving for the weights and bias, the decision function becomes

$y(x) = w^T \phi(x) + b = \sum_{i=1}^{n} a_i t_i k(x, x_i) + b$

The data classification task is carried out by computing sign(y(x)) for each test point. Using nonlinear kernels allows linear classification of nonlinearly separable data in the higher-dimensional kernel space. The two well-known kernels are the RBF kernel and the polynomial kernel, defined as

$\text{RBF:} \quad k(x_i, x_j) = \exp\left( \frac{-\|x_i - x_j\|}{\delta} \right)$

$\text{Polynomial:} \quad k(x_i, x_j) = (x_i^T x_j + 1)^d$
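For illustration, the two kernels can be written directly in code. The sketch below is a Python/NumPy rendering of the formulas as printed (it is not the MATLAB toolbox used later in the paper), with δ and d as the kernel parameters; note that many texts use the squared norm in the RBF exponent, whereas here we follow the form shown above.

```python
import numpy as np

def rbf_kernel(xi, xj, delta=1.0):
    # RBF kernel as printed above: exp(-||xi - xj|| / delta)
    # (many references use the squared norm in the exponent)
    return np.exp(-np.linalg.norm(xi - xj) / delta)

def polynomial_kernel(xi, xj, d=3):
    # Polynomial kernel: (xi^T xj + 1)^d
    return (np.dot(xi, xj) + 1.0) ** d

def gram_matrix(X, kernel):
    # Kernel (Gram) matrix K[i, j] = k(x_i, x_j) for an n x m data matrix X
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K
```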
In many problems, data points in different classes overlap, which causes problems for classification. This happens when the data is not linearly separable in the feature space; in this case the support vectors cannot determine the points' classes properly and give poor results. To overcome this problem, the SVM constraint is relaxed from

$t_i y(x_i) \ge 1$

to

$t_i y(x_i) \ge 1 - \zeta_i$    (2)

where the $\zeta_i$, i = 1, ..., n, are called slack variables. Fig. 2 shows the concept of slack variables.
Fig. 1. Illustration of support vectors

Fig. 2. Illustration of slack variables used for non-separable data

Using slack variables, some points can be misclassified, which gives the classifier flexibility. In this way some data points are misclassified, but they incur a penalty which increases the error function. Therefore, the algorithm maximizes the margin while minimizing the penalty for points on the wrong side of the boundary. The criterion thus becomes
$\min \left\{ C \sum_{i=1}^{n} \zeta_i + \frac{\|w\|^2}{2} \right\}$    (3)

where C is the controlling parameter, which governs the trade-off between model complexity and minimizing the classification error. A high value of C over-fits the data, and in the limit the model becomes the same as the SVM for separable data.
The optimization problem now turns into minimizing (3) subject to the constraints in (2). The Lagrangian is given by

$L(w, b, a) = \frac{\|w\|^2}{2} + C \sum_{i=1}^{n} \zeta_i - \sum_{i=1}^{n} a_i \left\{ t_i y(x_i) - 1 + \zeta_i \right\} - \sum_{i=1}^{n} \mu_i \zeta_i$    (4)
where ai > 0 and µi > 0 are Lagrangian multipliers [3].
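In practice the soft-margin problem above is handed to an off-the-shelf SVM solver rather than solved by hand. The following minimal sketch uses scikit-learn's SVC purely for illustration (the authors used the MATLAB toolbox of [18]); the toy data, and the mapping of gamma to the RBF width, are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data: rows are observations, columns are variables,
# with targets t_i in {-1, +1} as in the text.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
t = np.hstack([-np.ones(50), np.ones(50)])

# C trades off the slack penalty against margin width, as in (3).
# gamma plays a role analogous to 1/delta (scikit-learn's RBF uses the squared norm).
clf = SVC(kernel="rbf", C=100.0, gamma=1.0)
clf.fit(X, t)

# Classification is sign(y(x)); decision_function returns y(x) itself.
y_new = clf.decision_function([[1.5, 1.5]])
print("y(x) =", y_new[0], "-> class", int(np.sign(y_new[0])))
```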
B. K-Nearest Neighbor Classification

In K-nearest neighbor classification, the class of each sample point is determined by its K neighboring points in the training set: the point is assigned to the class with the majority of votes for a class label amongst the K neighbors. The classifier is defined by its parameters. The choice of the parameter K depends on the data and affects the performance of the classifier: K must be large enough to reduce misclassification of an example point, and small enough that the sample point remains close to its neighboring points, which results in a better estimate of the point's class [2].
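A minimal sketch of the K-nearest-neighbor rule described above, again in Python/scikit-learn for illustration; the toy data and the choice K = 5 are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Toy training data for two fault classes
X_train = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
y_train = np.hstack([np.zeros(100), np.ones(100)])

# Each test point gets the majority class among its K nearest training points;
# K controls the trade-off discussed above.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.predict([[0.5, 0.4], [2.8, 3.1]]))  # expected: [0., 1.]
```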
III. EXPERIMENT DATA
A. Tennessee Eastman process
The Tennessee Eastman (TE) process, a chemical plant involving four exothermic gas reactions, was proposed and modeled by Downs and Vogel as a plant-wide control challenge problem [6]. The process has been used in many research experiments on fault detection and control. It has fifty-two variables, including measured and manipulated variables, and twenty-one faults have been defined for the process. In this work, faults 4, 9, and 11, which overlap with one another [7], are chosen as the training and testing data. Fault 4 is defined as a step change in the reactor cooling water temperature, fault 9 is a random variation in the feed temperature of one of the reactants (reactant D), and fault 11 is a random variation in the reactor cooling water temperature. The data is taken from http://brahms.scs.uiuc.edu. The training and testing data sets contain 480 × 52 and 960 × 52 points respectively, sampled every three minutes of simulation, with faults occurring after 1 hour and 8 hours of simulation respectively. Figure 3 illustrates the TE plant simulation diagram. Figure 4 shows the plot of the faulty data in the space of the first and second variables, and Figure 5 shows the plot of the faulty data in the two dimensions where the data has the most separability.

Fig. 3. Tennessee Eastman process simulator diagram [5]

Fig. 4. Test data plot of variables 1 and 2 for faults 4, 9, and 11

Fig. 5. Test data plot of variables 9 and 51 for faults 4, 9, and 11
B. Three Tank System
As a benchmark control problem, the Three Tank System (3TS) is used in many different research projects. The basic structure of the system consists of three tanks connected to each other by pipes. Two of the tanks are filled by two pumps, while the third is filled only through the pipes connected to the other two. Our experimental setup is an AMIRA DTS200, in which the water levels are measured with three piezo-resistive differential pressure sensors [19]. The DTS200 contains six valves which are used to emulate clogging and leakage in the system. Figure 6 shows the system flow sheet. The system has the following specifications:

Tank cross-section area, A = 0.0154 m²
Connecting pipe cross-section area, a_z = 5 × 10⁻⁵ m²
Highest liquid level, H_max = 62 cm
Maximum pump flow rate, Q_max = 100 ml/s

The system is equipped with a disturbance module which allows 11 types of faults to be simulated for fault detection research, including three sensor faults, two actuator faults, a leak in each of the three tanks, clogging between the tanks, and clogging in the outflow. The training and testing data sets are each 500 × 5 for each fault case, with the water levels and flow rates as variables. Faults are instigated at sample 55 in each case. We assume that only one fault occurs at a time and that there are no simultaneous faults. Figures 7 and 8 show two example plots of the data when a leak and a sensor fault occur in the system.

Fig. 6. Three Tank System structure [19]
Fig. 7. Example plot of the level sensor in the three tank system: tank 1 water level (cm) vs. sample number; a leak occurs in tank 1 at sample 55.

Fig. 8. Example plot of the flow rate in the three tank system: pump flow rate (ml/s) vs. sample number; a fault in the tank 2 sensor occurs at sample 55.
IV. CLASSIFICATION PROCEDURE AND RESULTS
In every fault detection and diagnosis system, the FDI process includes detecting the fault in the process and then identifying the type of fault. Here, we focus on the diagnosis part of the FDI process and assume that fault detection has already been accomplished; after the fault detection stage, we use SVM for fault classification. It should be noted that using this method for fault diagnosis requires prior knowledge about the different faults, because the classifiers are trained and structured based on this knowledge. We examine the performance of the C-SVM compared to a simple SVM with different kernels and to a K-nearest neighbor classifier. To this end, a training and a testing data set are collected from each process.
The choices of different SVMs depend on their parameters: the type of kernel, the value of C, the width of the RBF kernel, the polynomial kernel degree, and the number of SVMs used in the committee are examples of such parameters. Since there are many possible combinations, we restrict our experiment to a simple case with three different kernels used in the SVM classifier. We selected the parameter C by testing the SVM performance over different values of C in the range [0.1, 10⁵]. The parameter values used in the experiment are C = 100, δ = 1 (the RBF kernel parameter, as suggested in [5]), and a polynomial degree of 3.
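The selection of C just described amounts to a simple sweep: train on the training set for each candidate C and keep the value with the lowest error on held-out data. A sketch, assuming generic NumPy arrays (X_train, t_train, X_test, t_test) and scikit-learn's SVC as a stand-in for the MATLAB toolbox:

```python
import numpy as np
from sklearn.svm import SVC

def select_C(X_train, t_train, X_test, t_test,
             C_grid=(0.1, 1.0, 10.0, 100.0, 1e3, 1e4, 1e5)):
    """Pick C in [0.1, 1e5] by the lowest classification error on held-out data."""
    best_C, best_err = None, np.inf
    for C in C_grid:
        clf = SVC(kernel="rbf", C=C, gamma=1.0).fit(X_train, t_train)
        err = np.mean(clf.predict(X_test) != t_test)  # fraction misclassified
        if err < best_err:
            best_C, best_err = C, err
    return best_C, best_err
```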
Fig. 9. SVM training procedure

TABLE I
CLASSIFICATION ERROR FOR DIFFERENT CLASSIFIERS APPLIED TO TE PROCESS DATA

Classifier                   Classification error (%)
SVM (linear kernel)          26.7
SVM (RBF kernel)             8.3
SVM (polynomial kernel)      7.3
C-SVM                        6.7
KNN classifier               8.4

In [5], it is pointed out that for the TE data in this case (faults 4, 9, and 11), only two variables are important; the other fifty variables do not show significant changes caused by the faults. The authors of [5] used contribution charts to find the most important variables for this case: variable 51 (reactor cooling water valve position) and variable 9 (reactor temperature). We use these two variables to train and test our classifier for fault classification on the TE data set.
The algorithms were implemented in MATLAB using the SVM toolbox from [18]. The procedure for building the classifier is as follows. For every pair of faults we train a C-SVM classifier. Each classifier is a combination of three SVMs with different kernels (linear, RBF, polynomial), trained with data that are a mixture of the two fault classes' data sets; the output is simply the average of the three. Fig. 9 depicts the training procedure for the C-SVM. In this figure, data pre-processing includes scaling and selecting appropriate variables for classification, which has to be done before training the SVMs. Once the SVMs are trained, the final classifier is tested with the test data to determine the classification error and to evaluate the performance of the classification system. The error is simply defined as the percentage of misclassified points in the whole data set, where a misclassified point is one whose class is determined incorrectly. Fig. 10 shows the block diagram of the test data classification process. The data class is determined by selecting the class with the maximum number of votes from the classifiers; if there is a tie between the classifiers' votes, the fault class is chosen randomly. The TE test data for classes 4, 9, and 11 were applied to the classifier. Classification is one-against-one, meaning that a classifier is trained for every pair of faults, so we have three classifiers, for fault pairs 4-9, 9-11, and 4-11, shown as C-SVM 1, 2, and 3 in Fig. 10.
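The procedure just described can be summarized in code. The sketch below is an illustrative re-implementation in Python/scikit-learn, not the MATLAB code actually used: one C-SVM is trained per fault pair, each C-SVM averages the decision values of a linear, an RBF, and a polynomial SVM, and the final class is the one with the most pairwise votes, with ties broken randomly. The kernel parameters mirror those of Section IV; coef0 = 1 reproduces the (x_i^T x_j + 1)^d form of the polynomial kernel.

```python
import numpy as np
from itertools import combinations
from sklearn.svm import SVC

KERNELS = [dict(kernel="linear", C=100.0),
           dict(kernel="rbf", C=100.0, gamma=1.0),
           dict(kernel="poly", C=100.0, degree=3, gamma=1.0, coef0=1.0)]

def train_csvm_committee(X, y):
    """One combined SVM (three kernels, output averaged) per pair of fault classes."""
    committee = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        t = np.where(y[mask] == a, 1.0, -1.0)          # +1 -> fault a, -1 -> fault b
        committee[(a, b)] = [SVC(**p).fit(X[mask], t) for p in KERNELS]
    return committee

def predict_csvm(committee, X, rng=None):
    rng = rng or np.random.default_rng(0)
    classes = sorted({c for pair in committee for c in pair})
    votes = np.zeros((X.shape[0], len(classes)))
    for (a, b), svms in committee.items():
        # C-SVM output: the average of the three SVM decision values
        avg = np.mean([s.decision_function(X) for s in svms], axis=0)
        votes[:, classes.index(a)] += (avg > 0)
        votes[:, classes.index(b)] += (avg <= 0)
    # Majority vote over the pairwise C-SVMs; ties broken randomly
    return np.array([classes[rng.choice(np.flatnonzero(v == v.max()))] for v in votes])
```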
When all variables were included in the training and testing data, the classification error was 43.1%. Using only the selected variables (variables 9 and 51) in the training and testing data sets reduced the error to 6.7%, a decrease of about 36.3 percentage points. Applying SVM to the first two PCA scores gave very poor performance, with 64% error, which is not an acceptable result. Table I presents the results for the different classifiers applied to the TE data.

Fig. 10. Classification procedure for TE data

TABLE II
CLASSIFICATION ERROR FOR DIFFERENT CLASSIFIERS APPLIED TO THREE TANK SYSTEM DATA

Classifier                   Classification error (%)
SVM (linear kernel)          14.03
SVM (RBF kernel)             13.74
SVM (polynomial kernel)      30.53
C-SVM                        12.17
KNN classifier               14.57

In the second experiment, with real data from the Three Tank System (3TS), the procedure is modified to improve the computation time and complexity of classification. We first train a classifier to separate the faults by type into four classes: leakage, clogging, sensor fault, and pump fault. Once the type of fault is determined, its location is determined by another classifier trained for that specific category, e.g., a leak in tank 1.
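A sketch of this two-stage scheme, again in Python/scikit-learn for illustration. Here fault_labels is assumed to be a NumPy array of the 11 fault labels and fault_type_of a hypothetical dictionary mapping each label to one of the four categories; a single multi-class SVC stands in for the pairwise C-SVM committee used in the paper.

```python
import numpy as np
from sklearn.svm import SVC

def train_hierarchical(X, fault_labels, fault_type_of):
    # Stage 1: classify the fault type (leakage, clogging, sensor, pump)
    types = np.array([fault_type_of[f] for f in fault_labels])
    stage1 = SVC(kernel="rbf", C=100.0, gamma=1.0).fit(X, types)
    # Stage 2: one location classifier per fault type
    # (assumes every type covers at least two distinct fault labels)
    stage2 = {}
    for t in np.unique(types):
        idx = types == t
        stage2[t] = SVC(kernel="rbf", C=100.0, gamma=1.0).fit(X[idx], fault_labels[idx])
    return stage1, stage2

def predict_hierarchical(stage1, stage2, x):
    x = np.asarray(x).reshape(1, -1)
    fault_type = stage1.predict(x)[0]          # first the category ...
    return stage2[fault_type].predict(x)[0]    # ... then the exact fault
```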
The classification results are shown in Table II. The C-SVM gives the best classification result, with 12.17% classification error. The SVM classifiers with linear and RBF kernels also give slightly better results than the KNN classifier and the SVM with a polynomial kernel.
V. DISCUSSION AND CONCLUSION
As presented in Table I, a comparison of classification errors shows that the C-SVM outperforms all the other classifiers. However, in terms of computation time the KNN classifier is much faster than the SVM-based classifiers, because the latter use several SVMs, each of which involves kernel calculations that consume computation time. This can be problematic when the data dimension is high; therefore, data reduction techniques are highly recommended prior to using SVM. The number of SVMs used in the combined classifier is another important parameter in forming the classifier that has to be considered, since the training time increases with the number of SVMs. The performance reported above is based on the results of experiments performed on two benchmark systems; for further confirmation, the method should be tested on other processes in order to achieve a comprehensive understanding of the proposed method.
REFERENCES
[1] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, Singapore; 2006.
[2] X. Zhao, S. Huihe, "A Novel Combination Method for On-line Process Monitoring and Fault Diagnosis", IEEE Tran. Industrial Electronics ISIE, 4, 2005, pp. 1715-1720.
[3] Y. Song et al., "IKNN: Informative K-Nearest Neighbor Pattern Classification", PKDD 2007, Springer-Verlag, Berlin Heidelberg; 2007.
[4] M. Guo, L. Xie, S. Wang, J. Zhang, "Research on an Integrated ICA-SVM Based Framework for Fault Diagnosis", IEEE Proc. Syst., Man, and Cybern., 3, 2003, pp. 2710-2715.
[5] L.H. Chiang, M.E. Kotanchek, A.K. Kordon, "Fault diagnosis based on Fisher discriminant analysis and support vector machines", Computers and Chemical Eng., 28, 2004, pp. 1389-1401.
[6] J.J. Downs, E.F. Vogel, "A Plant-Wide Industrial Process Control Problem", Computers and Chemical Engineering, 17(3), 1993, pp. 245-255.
[7] L.H. Chiang, E.L. Russell, R.D. Braatz, "Fault diagnosis in chemical processes using Fisher discriminant analysis, discriminant partial least squares, and principal component analysis", Chemometrics and Intelligent Laboratory Systems, 50, 2000, pp. 243-252.
[8] G. Mori, "Introduction to machine learning", lecture notes, [Online], available: http://www.cs.sfu.ca/~mori/courses/cmpt726, accessed Aug. 2008.
[9] C. Domeniconi, J. Peng, D. Gunopulos, "Locally adaptive metric nearest-neighbor classification", IEEE Trans. Pattern Anal. Mach. Intell., 24(9), 2002, pp. 1281-1285.
[10] T. Cover, P. Hart, "Nearest neighbor pattern classification", IEEE Trans. on Information Theory, 13(1), 1967, pp. 21-27.
[11] V. Athitsos, S. Sclaroff, "Boosting nearest neighbor classifiers for multiclass recognition", IEEE Compt. Society Conf. on Computer Vision and Pattern Recognition, 3, 2005, pp. 45-45.
[12] T. Hastie, R. Tibshirani, "Discriminant adaptive nearest neighbor classification", IEEE Trans. Pattern Anal. Mach. Intell., 18(6), 1996, pp. 607-616.
[13] H. Zhang, A.C. Berg, M. Maire, M. Malik, "Discriminative nearest neighbor classification for visual category recognition", IEEE Compt. Society Conf. on Computer Vision and Pattern Recognition, 2, 2006, pp. 2126-2136.
[14] Y. Pingpeng, Y. Chen, H. Jin, L. Huang, "MSVM-kNN: combining SVM and k-NN for multi-class text classification", IEEE Int. Workshop on Semantic Computing and Systems, 2008, pp. 133-140.
[15] W. Shu-Bin et al., "Classification algorithm based on weighted SVMs and locally tuning kNN", International Conference on Biomedical Engineering and Informatics, 2008, pp. 240-244.
[16] L. Ping, L. Nan, W. Jian-yu, Z. Chun-Guang, "Combining weighted SVMs and spectrum-based kNN for multi-classification", Proc. 4th Int. Symp. Neural Networks, 2007, pp. 448-453.
[17] Q. He, J. Wang, "Principal component based k-nearest-neighbor rule for semiconductor process fault detection", IEEE Trans. Semiconductor Manufacturing, 20(4), 2008, pp. 345-354.
[18] S.R. Gunn, "Support Vector Machines for Classification and Regression", Technical Report, 1998, available: http://www.isis.ecs.soton.ac.uk/resources/svminfo/, accessed July 2008.
[19] DTS200 laboratory setup Three Tank System, AMIRA, 2002.