SlideShare a Scribd company logo
Research Article
Fault detection in the distillation column process using Kullback
Leibler divergence
Lakhdar Aggoune n
, Yahya Chetouani 1
, Tarek Raïssi 2
a
Laboratoire d’Automatique de Sétif, Département d’Electrotechnique, Université de Sétif 1, Cité Maabouda, Route de Béjaia, 19000 Sétif, Algeria
b
Université de Rouen, Département Génie Chimique, Rue Lavoisier, 76821 Mont Saint Aignan Cedex, France
c
Conservatoire National des Arts et Métiers, Département EASY, Cedric-laetitia, 292, Rue St-Martin, case 2D2P10, 75141 Paris Cedex 03, France
a r t i c l e i n f o
Article history:
Received 28 December 2014
Received in revised form
9 September 2015
Accepted 13 March 2016
Available online 25 March 2016
This paper was recommended for publica-
tion by Didier Theilliol.
Keywords:
Fault detection
Safety
NARMAX model
Kullback Leibler divergence
Dynamic processes
a b s t r a c t
Chemical plants are complex large-scale systems which need designing robust fault detection schemes to
ensure high product quality, reliability and safety under different operating conditions. The present
paper is concerned with a feasibility study of the application of the black-box modeling method and
Kullback Leibler divergence (KLD) to the fault detection in a distillation column process. A Nonlinear
Auto-Regressive Moving Average with eXogenous input (NARMAX) polynomial model is firstly developed
to estimate the nonlinear behavior of the plant. Furthermore, the KLD is applied to detect abnormal
modes. The proposed FD method is implemented and validated experimentally using realistic faults of a
distillation plant of laboratory scale. The experimental results clearly demonstrate the fact that proposed
method is effective and gives early alarm to operators.
& 2016 ISA. Published by Elsevier Ltd. All rights reserved.
1. Introduction
With advances in modern engineering processes, fault and
change detection are becoming more and more important for
increasing the reliability, safety, availability, and environmental
protection. The central goal is to improve the continuity and
quality levels of the industrial production for a wide range of
plants. To this end, much effort has been devoted to the devel-
opment of fault detection (FD) schemes with various assumptions
[1–5].
The diversity of the FD strategies developed in the past dec-
ades, was required by the growing complexity of industrial
installations. Among the most well-known is the model based
approaches [6–8] which have found great interest since many
modern industries require a faster detection time and the on-line
implementation of the FD methods. In practice, the latter methods
are based on the use of mathematical models. The FD procedure
includes residual generation and residual evaluation. Usually,
residual is generated by using the difference between the pre-
dicted output and real measurements data. For this kind of FD
techniques, the performance of detection depends very much on
the accuracy of the model used.
Process modeling forms a significant role in many fields of
engineering systems. There are two different methods of modeling
approaches: first principles models or black-box models. First
principles models are based on physical laws in order to obtain the
relationship between the input and output. Unfortunately in this
approach it is necessary to know accurately the complete knowl-
edge of the plant. Insufficient knowledge about the physical
properties of the process prevents the design and implementation
of FD techniques, and decreases the security (human and envir-
onmental). As an alternative strategy, the black-box modeling,
based on experimental data, does not need any prior knowledge
about the physical insight of the system. It can be used to solve the
problem of modeling in most cases encountered in engineering
practice. On the other hand, the combination of these two
modeling approaches gives a grey-box modeling. This method
defines the model structure from a first principles models and
determines the parameters from a experimental data using
estimation techniques. Because of its satisfactory results in
change detection, many researchers have implemented black-box
models for monitoring process industrial systems such as, artificial
neural networks [1], set membership identification [9], subspace
identification [10], polynomial models [11,12]. However, in this
Contents lists available at ScienceDirect
journal homepage: www.elsevier.com/locate/isatrans
ISA Transactions
http://dx.doi.org/10.1016/j.isatra.2016.03.006
0019-0578/& 2016 ISA. Published by Elsevier Ltd. All rights reserved.
n
Corresponding author. Tel./fax: þ213 36 61 12 11.
E-mail address: lakhdar.aggoune@yahoo.fr (L. Aggoune).
1
Fax: þ33 2 35 14 61 30.
2
Tel.: þ33 1 40 27 21 69.
ISA Transactions 63 (2016) 394–400
study we are interested only in an NARMAX polynomial model
according to our experimental investigation.
Once the residual is generated by using the model of the
supervised process, its evaluation is necessary in order to reveal
any deviation from the normal mode of the system. Due to pre-
sence of modeling errors and measurement noises, residual eva-
luation often includes a reliable decision module which can be
obtained by using statistical hypothesis testing [6,13]. As an
alternative approach, Kullback–Leibler divergence (KLD) [14] is
proposed in this work. The KLD is a good tool to evaluate the
similarity between two distributions.
This paper investigates the modeling and parameter estimation of
a chemical plant as a distillation column by using the black-box
modeling approach based on NARMAX structures whose parameters
are estimated by using the output error with extended prediction
model (OEEPM) algorithm for the specific purpose of change detec-
tion. An optimal NARMAX model was selected using statistical cri-
teria such as Akaike’s Information Criterion (AIC), Root Mean Square
Error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The difference
between the current measurement and the estimated output by the
NARMAX model is defined as the residual of the monitored plant.
The residual is compared to a fault detection threshold with the KLD
which measures the probability distribution of the current residual
compared to a reference one. The main contribution consists in the
combination of the black-box identification method and KLD for the
design of a fault detection scheme.
The organization of the paper is the following: Section 2 gives a
description of the Kullback–Leibler divergence as a fault detection
criterion. Section 3 presents the experimental set-up. Section 4
describes the system identification procedure. Section 5 presents
the experimental results of the proposed FD technique. Finally,
some concluding remarks are given in Section 6.
2. Kullback-Leibler divergence
Kullback-Leibler divergence (KLD), or relative entropy, is a
measure of the similarity between two probability distributions.
This divergence has been applied successfully in many fields such
as pattern recognition [15,16]. Harmouche et al. [17] developed a
fault detection approach under the PCA context which firstly used
a PCA statistical model to extract a reference probability dis-
tribution for each latent principal score from data and then
monitored the probability distributions of these scores for each
new set of data by using the KLD. They demonstrated its superior
fault detection performance over the Hotelling T2
statistic. Houerbi
et al. [18] proposed a change detection method for scan surveil-
lance in Internet Networks. They used KLD for monitoring the
variation of the distribution of scan features in a space spanned by
IP source address, IP destination address, source port, and desti-
nation port numbers.
In the current work, The KLD can efficiently detect the abnor-
mal events in distillation column by monitoring the dissimilarity
between the reference probability density function (pdf) of the
normal mode and the actual one based on the residual of the
identified model.
The KLD is an example of f divergence measure which is used to
quantify discrepancy between pairs of probability distributions. In
order to measure the difference between two discrete pdfs p0 (ε)
(normal mode) and p1 (ε) (abnormal mode) of random variable ε,
the KLD is defined as follow:
KLðp1ðϵÞ‖p0ðϵÞÞ ¼
XN
i ¼ 1
p1 ϵið Þlog
p1 ϵið Þ
p0 ϵið Þ
 
ð1Þ
where N is the data number and ε represents the sequence of the
residual given by the difference between the process output yðtÞ
and its estimate ^yðtÞasεðtÞ ¼ yðtÞÀ ^yðtÞ.
Two fundamental properties of KLD are:
1. Positivity i.e. KLðp1ðϵÞ p0ðϵÞÞZ0

 with equality if and only if
p1ðεÞ ¼ p0ðεÞ,
2. Asymmetry i.e. KLðp1ðϵÞ p0ðϵÞÞaKLðp0ðϵÞ p1ðϵÞÞ



 .
If p0 (ε) ¼ p1 (ε), the actual pdf corresponds to the one
obtained in normal mode and the KLD is close to zero. Otherwise,
large values of KLD correspond to the case of distributions p1 (ε)
and p0 (ε) totally different and it is easy to detect an
abnormal mode.
The goal of the proposed FD technique is to detect the presence
of an abnormal mode using KLD. The basic idea of this method is
to compute the difference between the probability density func-
tions of normal behavior and abnormal event, which can be rea-
lised by calculating the KLD between the two distributions. The
detection of faults affecting the process can be formulated as a
hypothesis testing problem, considering two possible operating
conditions or hypotheses: the null hypothesis H0, where the
parameters of KLD are the same as those of the normal operating
conditions, and the hypothesis H1 is when the parameters are
different to those of the normal process behavior (anomaly). KLD
values can run from zero to infinity.
The KLD can be easily computed in the case of normal dis-
tributions. Given two normal densities p0 (ε) and p1 (ε) such that
p0 εð Þ $ Nðμ0; σ2
0Þ andp1 εð Þ $ Nðμ1; σ2
1Þ, where μ0; μ1 are the means
and σ0; σ1are the standard deviations for p0 (ε) and p1 (ε)
respectively. In this case the KLD may be written as [14]:
KLðp1ðϵÞ‖p0ðϵÞÞ ¼
1
2
σ2
1
σ2
0
þ
ðμ1 Àμ0Þ2
σ2
0
þ log
σ2
0
σ2
1
À1
!
ð2Þ
The standard deviation of the distribution p1 (ε) is assumed
unchanged after the occurrence of a fault σ0 ¼ σ1ð Þ. Then, we can
write:
KLðp1ðϵÞ‖p0ðϵÞÞ ¼
1
2
ðμ1 Àμ0Þ2
σ2
0
!
ð3Þ
μ0 and σ0 are obtained from the measurement of the residual
of NARMAX model recorded in normal and safe operating modes,
while the parameter μ1 is calculated at the end of each fixed
sampling interval.
Considering Eq. (3), the above hypothesis can be formulated in
terms of KLD as:
H0 : KLðp1ðϵÞ‖p0ðϵÞÞrTD
H1 : KLðp1ðϵÞ‖p0ðϵÞÞ4TD ð4Þ
where TD is the predetermined threshold. The value of TD is cal-
culated based on the three sigma rule [19–21].
3. Experimental device
The process used in this work is a distillation column, a
separation process habitually found in the refinery and petro-
chemical industries. Fig. 1 represents the setup considered here. A
toluene methylcyclohexane mixture is introduced in the tank in
order to be separated with a mass composition at 23% methylcy-
clohexane. Feed preheating system is constituted by three ele-
ments of 250 W each one. The reciprocating feed pump is
constituted by a membrane allowing firstly the suction of the
mixture and the discharge towards the tank with a flow capacity
F¼4.32 L/h. The column has also a reboiler of 2 L hold-up capacity,
L. Aggoune et al. / ISA Transactions 63 (2016) 394–400 395
an immersion heater of a power Qb ¼3.3 kW, and of a level liquid
switch sensor which allows the automatic stop of heating if the
level is insufficient. The internal packing is made of Multiknit
stainless 316 L which enhances the mass transfer between the
vapor and liquid phases. A condenser is placed at the column
overhead in order to condense the entire vapor coming out from
the column. The cooling medium used in exchangers is water. The
heat-transfer area of the total overhead condenser is 0.08 m2
. The
thermocouples were coupled to a calibrated amplification cir-
cuit (4–20 mA, 0–150 °C) whose signals are accessible on-line
from computer, which permits the bottom and top tempera-
tures to be obtained. The unit has twelve PT100 sensors which
measure continuously the temperature throughout the col-
umn. The measurements of top and bottom pressures of the
separation unit are realized by two WIKA ECO-Tronic pressure
transmitters. All sensors are interfaced with a control compu-
ter using RS485/232 converters. A ETP-200 program is used to
register the variables.
In this set-up, the most significant variables that can be mea-
sured are reflux timer (Rt), heating power (Qb), pressure drop (ΔP),
preheating power (Qf), preheating temperature (Tf), feed flow rate
(F), cooling rate (Qc) and overhead temperature (Td).
The column is kept at a constant operating point for four hours
to ensure that the column is in steady-state. The values of the
measurable variables obtained on average from the nominal
steady-state regime are: Qb of 45%, Rt of 14%, F of 50%, Td of 102 °C,
Tf of 80 °C, Qc of 250 L/h.
4. NARMAX model system identification
The problem of modeling is important in the context of simu-
lation, control and fault detection and diagnosis. Several approa-
ches have been developed in the literature [22–24]. The steps in
the modeling procedure may be stated as follows:
1. Model structure detection, determining the model form suitable
for the system of interest.
2. Model parameter estimation, estimating the unknown para-
meters contained within the specified model, using
experimental data.
3. Model validation, this last step includes testing to verify if the
established model provides good accuracy for a fresh data set
that was not used during the training phase.
For nonlinear systems, one popular black-box identification
method is to use the polynomial models, such as NARMAX one
[22,25]. In case of multiple input, single output (MISO) systems
this model is given by:
yðtÞ ¼ f yðtÀ1Þ; :::; yðtÀnyÞ; u1ðtÀd1Þ; :::; u1ðtÀd1 Ànu1Þ;
À
…; upðtÀdpÞ; :::; upðtÀdP ÀnupÞþeðtÀ1Þ; :::; eðtÀneÞ
Á
þeðtÞ ð5Þ
where y(t), ui(t), and e(t) represent the system output, input and
noise at the discrete time t, respectively; p is the dimension of the
input vector. ny; nui
; ne are the maximum time lags of the output,
input, and noise, respectively;di is the input delay. f is a nonlinear
function which taken here to be a polynomial expansion of its
arguments with nonlinearity degree l. The expression of a poly-
nomial NARMAX model is obtained as follows:
yðtÞ ¼
Xn
i1 ¼ 1
βi1
xi1
ðtÞþ
Xn
i1 ¼ 1
Xn
i2 ¼ i1
βi1;i2
xi1
ðtÞxi2
ðtÞþ…
þ
Xn
i1 ¼ 1
…
Xn
il ¼ il À 1
βi1;…il
xi1
ðtÞ…xil
ðtÞþ…þeðtÞ ð6Þ
where n ¼ ny þnu1
þ:::þnup
þne, β’s the model parameters, and x’s
represent lagged output, input, and noise terms. The ARMAX
model is a special case of the NARMAX one, which is obtained by
setting l¼1.
4.1. NARMAX parameters estimation algorithm
The parameter estimation of the NARMAX model has received
much attention and many algorithms have been developed
[26,27]. In this work, the recursive algorithm such as output error
with extended prediction model (OEEPM) method is used mainly
for its simplicity and superior performance [28].
Eq. (6) clearly belongs to the class of linear-in-the-parameter
regression models. In order to estimate the parameters of the
NARMAX model, Eq. (6) should be expressed as:
yðtÞ ¼ φT
ðtÞθþeðtÞ ð7Þ
where φT
ðtÞ includes all the output, inputs, and noise terms as well
as all possible combinations up to degree l and up to time (t-1),
and θ is the vector which includes the model parameters to be
estimated. If the noise e(t) is known, an ordinary recursive least
square (RLS) method can be used for parameters estimation.
However, in general, the noise is not measurable, and the sequence
e(t) is estimated iteratively as:
εðtÞ ¼ yðtÞÀ ^yðtÞCeðtÞ ð8Þ
where ε(t) is the residual at time t and the predicted output ^y(t)
can be written as:
^yðtÞ ¼ φT
ðtÀ1Þ^θ ð9Þ
The parameter vector θ can be estimated by OEEPM algorithm,
discussed in detail in [23]. Considering Eq. (7), one can add and
subtract the term 7ða1 ^yðtÀ1Þ; :::; any
^yðtÀnyÞÞ. Note now that the
regression vector φT
ðtÞ includes all the predicted output, inputs,
and noise terms as well as all possible combinations up to degree l
and up to time (t-1). The initial values of ε(t) i.e. ε(0), ε(À1),…, ε
(Àne) are set to zero. The complete algorithm used in the following
Fig. 1. Experimental device: distillation column.
L. Aggoune et al. / ISA Transactions 63 (2016) 394–400396
is described by:
^θðtÞ ¼ ^θðtÀ1ÞþPðtÞφðtÞεðtÞ
PðtÞ ¼ 1
λ PðtÀ1ÞÀPðt À 1ÞφðtÞφðtÞT
Pðt À 1Þ
λþφðtÞT
Pðt À1ÞφðtÞ
 
εðtÞ ¼ yðtÞÀφðtÞT ^θðtÞ
8

:
ð10Þ
where λ is a forgetting factor, ^θ denotes the estimated value, ε(t) is
the residual, and P denotes the adaptation gain.
4.2. Proper NARMAX model selection
The first stage towards modeling a particular system is to select
appropriate inputs and output. For this purpose, experimental
tests are performed which was carried out to obtain a rich mixture
in methylcyclohexane. Rt, Qb, ΔP, Qf, Tf represent the main mea-
surable input variables of the process and Td its output. The
aforementioned work has been reported in [29].
To identify the parameters of the model describing Td, time
series of relevant data were collected continuously for 13 hours.
All data are collected with a sampling period of 11 s. Experiment is
performed with the aim of generating estimation and validation
data rich in amplitudes and in frequencies. When the input sig-
nals are modified, Td ranges from 101.5 °C to 103.5 °C. According to
[23,24], once the data are collected, the first two thirds of the data
recorded are used to estimate the system parameters and the
remaining data for model validation. These data are presented in
Figs. 2 and 3. The latter show the evolution of ΔP, Rt, Qf and Qb
between 1540 s and 2365 s for better readability.
The next step following the determination of a reliable model
for distillation column is to choose appropriate NARMAX poly-
nomial model. If the model structure is known a priori then the
identification can be formulated as a standard least-squares
problem. However, the identification is a hard task in reality as the
structure process is often unknown. An approach in resolving the
structural problem is to estimate model parameters using a simple
structure and gradually increasing ny;nu;ne; d and l until a desired
accuracy is achieved [23]. In order to select the optimal model,
statistical criteria such as AIC, RMSE and NSE were evaluated by
varying the number of parameters. The latter are obtained
according to Eq. (6). A set of candidate models was developed by
setting the maximum number of ny ¼ 5;nu ¼ 7; ne ¼ 5; d ¼
5105510½ Š and l ¼ 4. The number of terms to include in the final
model is determined by the comparison of statistical criteria
values for each model structure. These statistical criteria are
defined as follows [30,31]:
AIC ¼ ln
N
2
V
 
þ
2n
N
ð11Þ
RMSE ¼ sqrt
PN
i ¼ 1
ð^yðtÞÀyðtÞÞ2
N
0
B
B
B
@
1
C
C
C
A
ð12Þ
NSE ¼ 1À
varðϵðtÞÞ
varðyðtÞÞ
 
ð13Þ
where V is the loss function, n is the number of estimated para-
meters, N is the data length in the estimation data set. The loss
function V is equal to the residual ε(t) sum of squares:
V ¼
1
N
XN
i ¼ 1
ε2
i ð14Þ
It is clear that the three criteria require the computation of the
residual ε. The AIC contains two terms: The loss function V, which
Fig. 2. Estimation and validation data.
Fig. 3. (a) Evolution of heating power and reflux timer and (b) evolution of pressure drop and preheating power.
L. Aggoune et al. / ISA Transactions 63 (2016) 394–400 397
decreases with the model complexity, and the number of esti-
mated parameters n, that is increasing with the model complexity.
The objective of this criterion is to make a trade-off between
prediction accuracy and model complexity. The RMSE evaluates
the differences between predicted output and real measurements.
Lower values of RMSE indicate good prediction quality. The NSE
examines the relative magnitude of the residual variance com-
pared to the measured output variance. A high NSE value (near to
one) ensures the good accuracy of the identified model. The best
model structure is chosen as the one that minimizes AIC and
RMSE, and maximizes NSE.
The NARMAX model is built on the basis of a linear ARMAX
model. ny; nui
; ne and d were each varied and the OEEPM algo-
rithm is employed to estimate the parameters of the models. For
each combination, the statistical criteria were calculated using the
estimation data set. Table 1 lists the computation results of these
criteria. Only significant results are reported. Based on the results
given on Table 1, the best ARMAX model structure is obtained
with ny ¼ 2;nu ¼ 12222½ Š;ne ¼ 2; and d ¼ 14115½ Š, giving AIC¼
À0.3640, NSE¼0.9844 and RMSE¼0.0234.
To select the appropriate NARMAX model for the studied dis-
tillation column system, the chosen orders and delays for the best
ARMAX model are used to define the best NARMAX model.
Starting from the orders and delays indicated for the best ARMAX
model, and increasing or decreasing their values, the optimal
values for NARMAX model can be found. Based on the evaluation
of statistical criteria, the best NARMAX model structure is obtained
with AIC¼ À0.4126, NSE¼0.9851 and RMSE ¼ 0.0229. In this case,
the model which predicts overhead temperature is given by:
TdðtÞ ¼ À0:2601TdðtÀ1ÞÀ0:6901TdðtÀ2Þþ0:0228QbðtÀ1Þ
À0:0427Qf ðtÀ4ÞÀ0:0558Qf ðtÀ5Þþ0:0685RtðtÀ1Þ
þ0:1053RtðtÀ2Þþ0:0165ΔPðtÀ1ÞÀ0:0199ΔPðtÀ2Þ
À0:0971Tf ðtÀ5ÞÀ0:0027Tf ðtÀ6Þþ0:3524eðtÀ1Þ
þ0:4175eðtÀ2Þþ0:2139TdðtÀ2ÞRtðtÀ1ÞΔPðtÀ5Þ
þ0:1093RtðtÀ6ÞΔPðtÀ1ÞΔPðtÀ6Þ
À0:1050QbðtÀ1ÞRtðtÀ4ÞΔPðtÀ1Þ ð15Þ
Eq. (15) shows that 3 nonlinear terms have been added to the
model. Using the NARMAX model the AIC is reduced in compar-
ison with the ARMAX model.
During the development of the different models, the NARMAX
model provides a good robustness against the orders and delays
variations than the ARMAX model. This is because the small var-
iations of the optimal values of ny, nu, ne and d found for the best
ARMAX model degrade the prediction accuracy. Consequently, the
NARMAX model is more suitable than the ARMAX model for the
prediction of the overhead temperature.
The validation phase is the last step in system identification
problem, where the performances of the identified NARMAX
model are evaluated for validation data set. It is necessary to
analyze the properties of the residual (Fig. 4) as seen in [28,29].
There are several techniques of model validation [22]. In general, a
model generating the minimum value of the residual would best
represent the real plant of interest. A standard performance index
is based on the whiteness test of residual and the independence
between the residual and inputs [23,24,32]. In fact, if this condi-
tion is satisfied, the identified model has effectively captured the
system dynamics.
5. Fault detection results
In this section, the performance of the monitoring strategy
based on KLD is assessed through its utilization to detect faults in a
distillation column. The identified model is used to generate a
residual on which the anomaly is detected by the FD procedure. In
order to verify the ability and effectiveness of the proposed
method, experimental faults of the distillation plant of laboratory
scale were performed. The adopted technique for anomaly detec-
tion is based on the assumption that the residual variance remains
unchanged and the residual shift only exists in residual mean.
Determination of parameters μ0 and σ0 was performed by using a
measurement data on the residual in fault free case.
The distillation column is used widely in the fields of chemical
industry. The Early detection of faults can help to avoid pro-
ductivity loss and damage to human health. The distillation unit
can be affected by several faults. The major categories of possible
faults, including heating power, preheating power, feed pump,
Table 1
Comparison between different ARMAX model structures estimated with OEEPM
algorithm.
Order [na, nb, nc] AIC NSE RMSE Delay d
[2, 12212, 2] À0.3591 0.9843 0.0234 [1 4 1 1 5 ]
[2, 12211, 2] À0.2792 0.9830 0.0244 [1 4 2 1 5 ]
[2, 12111, 2] À0.2670 0.9828 0.0246 [1 4 2 1 5 ]
[2, 11111, 2] À0.2635 0.9827 0.0246 [1 4 2 1 5 ]
[2, 12222, 2] À0.3640 0.9844 0.0234 [1 4 1 1 5 ]
[2, 12221, 2] À0.3323 0.9839 0.0238 [1 3 1 2 5 ]
[2, 12312, 2] À0.3301 0.9839 0.0238 [1 3 1 2 4 ]
[2, 13212, 2] À0.2370 0.9823 0.0249 [1 5 2 2 5 ]
[2, 22212, 2] À0.3165 0.9837 0.0240 [1 5 1 2 6 ]
[2, 12212, 2] À0.2783 0.9830 0.0244 [1 5 2 1 5 ]
[2, 12312, 2] À0.2750 0.9830 0.0244 [1 3 2 1 4 ]
[2, 13212, 2] À0.3524 0.9843 0.0235 [1 4 1 2 5 ]
[2, 22222, 2] À0.3627 0.9844 0.0234 [1 5 1 1 5 ]
[2, 12211, 2] À0.3149 0.9836 0.0240 [1 3 1 1 3 ]
[2, 12213, 2] À0.3534 0.9843 0.0235 [1 5 1 2 5 ]
[3, 12312, 3] À0.3561 0.9844 0.0234 [2 4 1 1 5 ]
[3, 13212, 3] À0.3572 0.9844 0.0234 [2 4 1 1 6 ]
[3, 12222, 3] À0.3626 0.9844 0.0234 [2 4 1 2 6 ]
Fig. 4. Difference between the best NARMAX model and actual measurements of overhead temperature with validation data set.
L. Aggoune et al. / ISA Transactions 63 (2016) 394–400398
reflux, and pressure drop. To evaluate the validity and effective-
ness of the proposed method as a monitoring tool, the detection
of a sudden increasing of the heating power (Qb) and the reflux
timer (Rt) to 100% was selected under the assumption that only
a single fault can occur at any time. These types of faults are
introduced at instant 10835 S and cause a deviation in comparison
with normal mode (Fig. 5). It is assumed that the first 100 values of
the temperature (Td) are not faulty.
Consequently, the reference pdf will be estimated. It is impor-
tant to note that the effect of a sudden increasing of the heating
power (Qb) was compensated by the distillation supervision con-
trol system 17 samples (187 S) after its occurrence as shown in
Fig. 5a.
These anomalies should be detected by KLD. The dissimilarity
obtained from the evaluation of KLD between actual pdf and the
reference before and after the faults has been verified according to
Eq. (3). Fig. 6 presents the evolution of KLD for the selected faults.
This evolution shows the ability of the proposed technique to
detect real faults because it exceeds a predefined threshold TD. TD
is the level up to which, the KLD is considered still in the fault-free
situations (its abnormality is due only to modeling errors and
noises). This threshold value is defined by the KLD of the residual
obtained with fault-free data.
The threshold value is calculated as:
TD ¼ Mþ3σ ð16Þ
where M and σ are the mean and the standard deviation of the
distance.
As it is shown in Fig. 6, the faults detection is made with
respect to a threshold of the KLD. The first fault (increasing of the
heating power) was detected at 10879 S, i.e., with the delay of 44
S. This delay corresponds to a difference (ΔTd¼ À0.4 °C) between
the desired (Td¼101.4 °C) and the fault overhead temperature
(Td¼101.8 °C). The second fault (increasing of the reflux timer (Rt))
was confirmed 33 S after its occurrence. This corresponds to a
difference ΔTd¼ À2.3 °C between the desired overhead tem-
perature and the faulty one. The detection delay was defined as
the difference between the time of the fault occurrence and the
time of its detection. This was especially due to the fault ampli-
tude, the time dependency of fault (abrupt fault, incipient fault, or
intermittent fault), and the evolution of the dynamical behavior of
the separation unit.
As indicated by the experimental case studies, the values of
KLD determined from the residual of identified NARMAX model
can be used to distinguish between the normal mode and the
abnormal one.
6. Conclusion
Kullback Leibler divergence (KLD) has been widely used in
information theory to compute the resemblance of two distribu-
tions. In this work, it is used as a fault detection framework for
anomaly detection that exploits the residual of NARMAX model. In
this proposed fault detection approach, the black-box modeling
method is used to establish a NARMAX model through input-
output experimental measurements with the aim of providing a
Fig. 5. Evolution of the overhead temperature caused by the chosen faults.
Fig. 6. Results of KLD for the chosen faults.
L. Aggoune et al. / ISA Transactions 63 (2016) 394–400 399
reliable model of the system considered. Then, the KLD is applied
to measure the dissimilarity between the probability density of the
residual of the NARMAX model and the reference distribution
obtained in normal condition operation. The effectiveness of the
proposed approach has been illustrated on experimental faults in
distillation column. The results of these experimental faults clearly
demonstrate the potential of the adopted fault detection method.
The KLD can be used in order to avoid a dangerous behavior of
the distillation unit by setting-on the suitable alarm and con-
ducting appropriate actions on the process.
References
[1] Chetouani Y. Model selection and fault detection approach based on Bayes
decision theory: Application to changes detection problem in a distillation
column. Process Saf Environ Prot 2014;92:215–23.
[2] Wong PK, Yang Z, Vong CM, Zhong J. Real-time fault diagnosis for gas turbine
generator systems using extreme learning machine. Neurocomputing
2014;128:249–57.
[3] Harrou F, Nounou MN, Nounou HN, Madakyaru M. Statistical fault detection using
PCA-based GLR hypothesis testing. J Loss Prev Process Ind 2013;26:129–39.
[4] Kouadri A, Aitouche MA, Zelmat M. Variogram-based fault diagnosis in an
interconnected tank system. ISA Trans 2012;51:471–6.
[5] Goffaux G, Wouwer AV, Bernard O. Continuous–discrete interval observers
applied to the monitoring of cultures of microalgae. J Proc Control
2009;19:1182–90.
[6] Ding SX. Model-based Fault Diagnosis Techniques: Design Schemes, Algo-
rithms, and Tools. Berlin: Springer; 2008.
[7] Blanke M, Kinnaert M, Lunze J, Staroswiecki M. Diagnosis and Fault-Tolerant
Control. Berlin: Springer; 2006.
[8] Isermann R. Fault-Diagnosis Systems, An Introduction from Fault Detection to
Fault Tolerance. Berlin: Springer; 2006.
[9] Chai W, Qiao J. Passive robust fault detection using RBF neural modeling based
on set membership identification. Eng Appl Artif Intell 2014;28:1–12.
[10] Akhenak A, Duviella E, Bako B, Lecoeuche S. Online fault diagnosis using
recursive subspace identification: Application to a dam-gallery open channel
system. Control Eng Pract 2013;21:797–806.
[11] Peng ZK, Lang ZQ, Wolters C, Billings SA, Worden K. Feasibility study of
structural damage detection using NARMAX modelling and Nonlinear Output
Frequency Response Function based analysis. Mech Syst Signal Process
2011;25:1045–61.
[12] Shashoa NAA, Kvaščev G, Marjanović A, Worden K. Sensor fault detection and
isolation in a thermal power plant steam separator. Control Eng Pract
2013;21:908–16.
[13] Basseville M, Nikiforov IV. Detection of Abrupt Changes: Theory and Appli-
cation. New Jersey: Prentice Hall Englewood Cliffs; 1993.
[14] Belov DI, Armstrong RD. Distributions of the Kullback–Leibler divergence with
applications. Br J Math Stat Psychol 2011;64:291–309.
[15] J.R. Hershey, P.A. Olsen, Approximating the kullback leibler divergence
between gaussian mixture models. In: Proc. of IEEE ICASSP. Hawaii, USA; 15–
20 April 2007. 4: p. 317–20.
[16] Silva J, Narayanan S. Average divergence distance as a statistical discrimination
measure for hidden Markov models. IEEE Trans Audio Speech Lang Process
2006;14:890–906.
[17] J. Harmouche, C. Delpha, D. Diallo, Incipient fault detection and diagnosis
based on Kullback–Leibler divergence using Principal Component Analysis:
Part I, Signal Process. 94, 278–287.
[18] Houerbi KR, Salamatian K, Kamoun F. Scan surveillance in internet networks.
Networking 2009:614–25.
[19] Georgoulas G, Mustafa MO, Tsoumas IP, Antonino-Daviu JA, Climente-Alarcon
V, Stylios CD, Nikolakopoulos G. Principal Component Analysis of the start-up
transient and Hidden Markov Modeling for broken rotor bar fault diagnosis in
asynchronous machines. Expert Syst Appl 2013;40:7024–33.
[20] Shi Z, Gu F, Lennox B, Ball AD. The development of an adaptive threshold for
model-based fault detection of a nonlinear electro-hydraulic system. Control
Eng Pract 2005;13:1357–67.
[21] Pukelsheim F. The three sigma rule. Am Stat 1994;48:88–91.
[22] Billing SA. Nonlinear System Identification: NARMAX Methods in the Time,
Frequency, and Spatio-Temporal Domains. New York: Wiley; 2013.
[23] Landau ID, Lozano R, M'Saad M, Karimi A. Adaptive Control: Algorithms,
Analysis and Applications. London: Springer; 2011.
[24] Ljung L. System Identification, Theory for the User. Englewood Cliffs, New
Jersey: Prentice Hall; 1999.
[25] Leontaritis IJ, Billings SA. Input–output parametric models for nonlinear sys-
tems. Int J Control 1985;4:303–44.
[26] Rahrooh A, Shepard S. Identification of nonlinear systems using NARMAX
model. Nonlinear Anal Theory Methods Appl 2009;71:e1198–202.
[27] Billings SA, Korenberg MJ, Chen S. Identification of nonlinear output-affine
systems using an orthogonal least squares algorithm. Int J Syst Sci
1988;19:1559–68.
[28] Aggoune L, Chetouani Y, Radjeai H. Recursive identification of the dynamic
behavior in a distillation column by means of autoregressive models. J Dyn
Syst Meas Contr 2014;136 044506-044506-5.
[29] Chetouani Y. Model-order reduction based on artificial neural networks for
accurate prediction of the product quality in a distillation column. Int J Autom
Control 2009;3:332–51.
[30] Akaike H. A new look at the statistical model identification. IEEE Trans Autom
Control 1974;19:716–23.
[31] Nash JE, Sutcliffe JV. River flow forecasting through conceptual models. J
Hydrol 1970;10:82–290.
[32] Billings SA, Voon WSF. Correlation based model validity tests for nonlinear
models. Int J Control 1986;44:235–44.
L. Aggoune et al. / ISA Transactions 63 (2016) 394–400400

More Related Content

PDF
Automated well test analysis ii using ‘well test auto’
PDF
ISEN 613_Team3_Final Project Report
PDF
An Automated Tool for MC/DC Test Data Generation
PDF
Fault detection of imbalanced data using incremental clustering
PDF
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
PDF
A Hierarchical Feature Set optimization for effective code change based Defec...
PDF
Code coverage based test case selection and prioritization
PDF
Anomaly detection via eliminating data redundancy and rectifying data error i...
Automated well test analysis ii using ‘well test auto’
ISEN 613_Team3_Final Project Report
An Automated Tool for MC/DC Test Data Generation
Fault detection of imbalanced data using incremental clustering
ON THE PREDICTION ACCURACIES OF THREE MOST KNOWN REGULARIZERS : RIDGE REGRESS...
A Hierarchical Feature Set optimization for effective code change based Defec...
Code coverage based test case selection and prioritization
Anomaly detection via eliminating data redundancy and rectifying data error i...

Similar to Fault Detection in the Distillation Column Process (20)

PDF
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
PDF
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
PDF
Adaptive Prognostics A reliable RUL approach.pdf
PDF
Fault diagnosis in transformers
PDF
sensors-20-04577-v3akslññidnlasjjc,,jas.pdf
PDF
Process diagnostics
PDF
accessible-streaming-algorithms
PDF
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
PDF
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
PDF
Data mining projects topics for java and dot net
PDF
Neural Network-Based Actuator Fault Diagnosis for a Non-Linear Multi-Tank System
PDF
PDF
Detection of Outliers in Large Dataset using Distributed Approach
PDF
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
PDF
Anomaly Detection using multidimensional reduction Principal Component Analysis
DOC
2005年EI收录浙江财经学院论文7篇
PDF
Accelerating GWAS epistatic interaction analysis methods
PDF
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
PDF
Multimode system condition monitoring using sparsity reconstruction for quali...
PDF
INDUCTIVE LOGIC PROGRAMMING FOR INDUSTRIAL CONTROL APPLICATIONS
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
Adaptive Prognostics A reliable RUL approach.pdf
Fault diagnosis in transformers
sensors-20-04577-v3akslññidnlasjjc,,jas.pdf
Process diagnostics
accessible-streaming-algorithms
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
Enhancing Time Series Anomaly Detection: A Hybrid Model Fusion Approach
Data mining projects topics for java and dot net
Neural Network-Based Actuator Fault Diagnosis for a Non-Linear Multi-Tank System
Detection of Outliers in Large Dataset using Distributed Approach
A SURVEY ON BLOOD DISEASE DETECTION USING MACHINE LEARNING
Anomaly Detection using multidimensional reduction Principal Component Analysis
2005年EI收录浙江财经学院论文7篇
Accelerating GWAS epistatic interaction analysis methods
A Simulated Annealing Approach For Buffer Allocation In Reliable Production L...
Multimode system condition monitoring using sparsity reconstruction for quali...
INDUCTIVE LOGIC PROGRAMMING FOR INDUSTRIAL CONTROL APPLICATIONS
Ad

More from ISA Interchange (20)

PDF
An optimal general type-2 fuzzy controller for Urban Traffic Network
PDF
Embedded intelligent adaptive PI controller for an electromechanical system
PDF
State of charge estimation of lithium-ion batteries using fractional order sl...
PDF
Fractional order PID for tracking control of a parallel robotic manipulator t...
DOCX
Fuzzy logic for plant-wide control of biological wastewater treatment process...
PDF
Design and implementation of a control structure for quality products in a cr...
PDF
Model based PI power system stabilizer design for damping low frequency oscil...
PDF
A comparison of a novel robust decentralized control strategy and MPC for ind...
PDF
Fault detection of feed water treatment process using PCA-WD with parameter o...
PDF
Model-based adaptive sliding mode control of the subcritical boiler-turbine s...
PDF
A Proportional Integral Estimator-Based Clock Synchronization Protocol for Wi...
PDF
An artificial intelligence based improved classification of two-phase flow patte...
PDF
New Method for Tuning PID Controllers Using a Symmetric Send-On-Delta Samplin...
PDF
Load estimator-based hybrid controller design for two-interleaved boost conve...
PDF
Effects of Wireless Packet Loss in Industrial Process Control Systems
PDF
A KPI-based process monitoring and fault detection framework for large-scale ...
PDF
An adaptive PID like controller using mix locally recurrent neural network fo...
PDF
A method to remove chattering alarms using median filters
PDF
Design of a new PID controller using predictive functional control optimizati...
PDF
Distillation Column Process Fault Detection in the Chemical Industries
An optimal general type-2 fuzzy controller for Urban Traffic Network
Embedded intelligent adaptive PI controller for an electromechanical system
State of charge estimation of lithium-ion batteries using fractional order sl...
Fractional order PID for tracking control of a parallel robotic manipulator t...
Fuzzy logic for plant-wide control of biological wastewater treatment process...
Design and implementation of a control structure for quality products in a cr...
Model based PI power system stabilizer design for damping low frequency oscil...
A comparison of a novel robust decentralized control strategy and MPC for ind...
Fault detection of feed water treatment process using PCA-WD with parameter o...
Model-based adaptive sliding mode control of the subcritical boiler-turbine s...
A Proportional Integral Estimator-Based Clock Synchronization Protocol for Wi...
An artificial intelligence based improved classification of two-phase flow patte...
New Method for Tuning PID Controllers Using a Symmetric Send-On-Delta Samplin...
Load estimator-based hybrid controller design for two-interleaved boost conve...
Effects of Wireless Packet Loss in Industrial Process Control Systems
A KPI-based process monitoring and fault detection framework for large-scale ...
An adaptive PID like controller using mix locally recurrent neural network fo...
A method to remove chattering alarms using median filters
Design of a new PID controller using predictive functional control optimizati...
Distillation Column Process Fault Detection in the Chemical Industries
Ad

Recently uploaded (20)

PDF
Machine learning based COVID-19 study performance prediction
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
Cloud computing and distributed systems.
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Approach and Philosophy of On baking technology
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Electronic commerce courselecture one. Pdf
PPTX
MYSQL Presentation for SQL database connectivity
PPTX
sap open course for s4hana steps from ECC to s4
PDF
cuic standard and advanced reporting.pdf
PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Unlocking AI with Model Context Protocol (MCP)
Machine learning based COVID-19 study performance prediction
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Cloud computing and distributed systems.
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Network Security Unit 5.pdf for BCA BBA.
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Chapter 3 Spatial Domain Image Processing.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Approach and Philosophy of On baking technology
The Rise and Fall of 3GPP – Time for a Sabbatical?
Electronic commerce courselecture one. Pdf
MYSQL Presentation for SQL database connectivity
sap open course for s4hana steps from ECC to s4
cuic standard and advanced reporting.pdf
Programs and apps: productivity, graphics, security and other tools
Reach Out and Touch Someone: Haptics and Empathic Computing
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Unlocking AI with Model Context Protocol (MCP)

Fault Detection in the Distillation Column Process

  • 1. Research Article Fault detection in the distillation column process using Kullback Leibler divergence Lakhdar Aggoune n , Yahya Chetouani 1 , Tarek Raïssi 2 a Laboratoire d’Automatique de Sétif, Département d’Electrotechnique, Université de Sétif 1, Cité Maabouda, Route de Béjaia, 19000 Sétif, Algeria b Université de Rouen, Département Génie Chimique, Rue Lavoisier, 76821 Mont Saint Aignan Cedex, France c Conservatoire National des Arts et Métiers, Département EASY, Cedric-laetitia, 292, Rue St-Martin, case 2D2P10, 75141 Paris Cedex 03, France a r t i c l e i n f o Article history: Received 28 December 2014 Received in revised form 9 September 2015 Accepted 13 March 2016 Available online 25 March 2016 This paper was recommended for publica- tion by Didier Theilliol. Keywords: Fault detection Safety NARMAX model Kullback Leibler divergence Dynamic processes a b s t r a c t Chemical plants are complex large-scale systems which need designing robust fault detection schemes to ensure high product quality, reliability and safety under different operating conditions. The present paper is concerned with a feasibility study of the application of the black-box modeling method and Kullback Leibler divergence (KLD) to the fault detection in a distillation column process. A Nonlinear Auto-Regressive Moving Average with eXogenous input (NARMAX) polynomial model is firstly developed to estimate the nonlinear behavior of the plant. Furthermore, the KLD is applied to detect abnormal modes. The proposed FD method is implemented and validated experimentally using realistic faults of a distillation plant of laboratory scale. The experimental results clearly demonstrate the fact that proposed method is effective and gives early alarm to operators. & 2016 ISA. Published by Elsevier Ltd. All rights reserved. 1. Introduction With advances in modern engineering processes, fault and change detection are becoming more and more important for increasing the reliability, safety, availability, and environmental protection. The central goal is to improve the continuity and quality levels of the industrial production for a wide range of plants. To this end, much effort has been devoted to the devel- opment of fault detection (FD) schemes with various assumptions [1–5]. The diversity of the FD strategies developed in the past dec- ades, was required by the growing complexity of industrial installations. Among the most well-known is the model based approaches [6–8] which have found great interest since many modern industries require a faster detection time and the on-line implementation of the FD methods. In practice, the latter methods are based on the use of mathematical models. The FD procedure includes residual generation and residual evaluation. Usually, residual is generated by using the difference between the pre- dicted output and real measurements data. For this kind of FD techniques, the performance of detection depends very much on the accuracy of the model used. Process modeling forms a significant role in many fields of engineering systems. There are two different methods of modeling approaches: first principles models or black-box models. First principles models are based on physical laws in order to obtain the relationship between the input and output. Unfortunately in this approach it is necessary to know accurately the complete knowl- edge of the plant. Insufficient knowledge about the physical properties of the process prevents the design and implementation of FD techniques, and decreases the security (human and envir- onmental). As an alternative strategy, the black-box modeling, based on experimental data, does not need any prior knowledge about the physical insight of the system. It can be used to solve the problem of modeling in most cases encountered in engineering practice. On the other hand, the combination of these two modeling approaches gives a grey-box modeling. This method defines the model structure from a first principles models and determines the parameters from a experimental data using estimation techniques. Because of its satisfactory results in change detection, many researchers have implemented black-box models for monitoring process industrial systems such as, artificial neural networks [1], set membership identification [9], subspace identification [10], polynomial models [11,12]. However, in this Contents lists available at ScienceDirect journal homepage: www.elsevier.com/locate/isatrans ISA Transactions http://dx.doi.org/10.1016/j.isatra.2016.03.006 0019-0578/& 2016 ISA. Published by Elsevier Ltd. All rights reserved. n Corresponding author. Tel./fax: þ213 36 61 12 11. E-mail address: lakhdar.aggoune@yahoo.fr (L. Aggoune). 1 Fax: þ33 2 35 14 61 30. 2 Tel.: þ33 1 40 27 21 69. ISA Transactions 63 (2016) 394–400
  • 2. study we are interested only in an NARMAX polynomial model according to our experimental investigation. Once the residual is generated by using the model of the supervised process, its evaluation is necessary in order to reveal any deviation from the normal mode of the system. Due to pre- sence of modeling errors and measurement noises, residual eva- luation often includes a reliable decision module which can be obtained by using statistical hypothesis testing [6,13]. As an alternative approach, Kullback–Leibler divergence (KLD) [14] is proposed in this work. The KLD is a good tool to evaluate the similarity between two distributions. This paper investigates the modeling and parameter estimation of a chemical plant as a distillation column by using the black-box modeling approach based on NARMAX structures whose parameters are estimated by using the output error with extended prediction model (OEEPM) algorithm for the specific purpose of change detec- tion. An optimal NARMAX model was selected using statistical cri- teria such as Akaike’s Information Criterion (AIC), Root Mean Square Error (RMSE) and Nash-Sutcliffe Efficiency (NSE). The difference between the current measurement and the estimated output by the NARMAX model is defined as the residual of the monitored plant. The residual is compared to a fault detection threshold with the KLD which measures the probability distribution of the current residual compared to a reference one. The main contribution consists in the combination of the black-box identification method and KLD for the design of a fault detection scheme. The organization of the paper is the following: Section 2 gives a description of the Kullback–Leibler divergence as a fault detection criterion. Section 3 presents the experimental set-up. Section 4 describes the system identification procedure. Section 5 presents the experimental results of the proposed FD technique. Finally, some concluding remarks are given in Section 6. 2. Kullback-Leibler divergence Kullback-Leibler divergence (KLD), or relative entropy, is a measure of the similarity between two probability distributions. This divergence has been applied successfully in many fields such as pattern recognition [15,16]. Harmouche et al. [17] developed a fault detection approach under the PCA context which firstly used a PCA statistical model to extract a reference probability dis- tribution for each latent principal score from data and then monitored the probability distributions of these scores for each new set of data by using the KLD. They demonstrated its superior fault detection performance over the Hotelling T2 statistic. Houerbi et al. [18] proposed a change detection method for scan surveil- lance in Internet Networks. They used KLD for monitoring the variation of the distribution of scan features in a space spanned by IP source address, IP destination address, source port, and desti- nation port numbers. In the current work, The KLD can efficiently detect the abnor- mal events in distillation column by monitoring the dissimilarity between the reference probability density function (pdf) of the normal mode and the actual one based on the residual of the identified model. The KLD is an example of f divergence measure which is used to quantify discrepancy between pairs of probability distributions. In order to measure the difference between two discrete pdfs p0 (ε) (normal mode) and p1 (ε) (abnormal mode) of random variable ε, the KLD is defined as follow: KLðp1ðϵÞ‖p0ðϵÞÞ ¼ XN i ¼ 1 p1 ϵið Þlog p1 ϵið Þ p0 ϵið Þ ð1Þ where N is the data number and ε represents the sequence of the residual given by the difference between the process output yðtÞ and its estimate ^yðtÞasεðtÞ ¼ yðtÞÀ ^yðtÞ. Two fundamental properties of KLD are: 1. Positivity i.e. KLðp1ðϵÞ p0ðϵÞÞZ0 with equality if and only if p1ðεÞ ¼ p0ðεÞ, 2. Asymmetry i.e. KLðp1ðϵÞ p0ðϵÞÞaKLðp0ðϵÞ p1ðϵÞÞ . If p0 (ε) ¼ p1 (ε), the actual pdf corresponds to the one obtained in normal mode and the KLD is close to zero. Otherwise, large values of KLD correspond to the case of distributions p1 (ε) and p0 (ε) totally different and it is easy to detect an abnormal mode. The goal of the proposed FD technique is to detect the presence of an abnormal mode using KLD. The basic idea of this method is to compute the difference between the probability density func- tions of normal behavior and abnormal event, which can be rea- lised by calculating the KLD between the two distributions. The detection of faults affecting the process can be formulated as a hypothesis testing problem, considering two possible operating conditions or hypotheses: the null hypothesis H0, where the parameters of KLD are the same as those of the normal operating conditions, and the hypothesis H1 is when the parameters are different to those of the normal process behavior (anomaly). KLD values can run from zero to infinity. The KLD can be easily computed in the case of normal dis- tributions. Given two normal densities p0 (ε) and p1 (ε) such that p0 εð Þ $ Nðμ0; σ2 0Þ andp1 εð Þ $ Nðμ1; σ2 1Þ, where μ0; μ1 are the means and σ0; σ1are the standard deviations for p0 (ε) and p1 (ε) respectively. In this case the KLD may be written as [14]: KLðp1ðϵÞ‖p0ðϵÞÞ ¼ 1 2 σ2 1 σ2 0 þ ðμ1 Àμ0Þ2 σ2 0 þ log σ2 0 σ2 1 À1 ! ð2Þ The standard deviation of the distribution p1 (ε) is assumed unchanged after the occurrence of a fault σ0 ¼ σ1ð Þ. Then, we can write: KLðp1ðϵÞ‖p0ðϵÞÞ ¼ 1 2 ðμ1 Àμ0Þ2 σ2 0 ! ð3Þ μ0 and σ0 are obtained from the measurement of the residual of NARMAX model recorded in normal and safe operating modes, while the parameter μ1 is calculated at the end of each fixed sampling interval. Considering Eq. (3), the above hypothesis can be formulated in terms of KLD as: H0 : KLðp1ðϵÞ‖p0ðϵÞÞrTD H1 : KLðp1ðϵÞ‖p0ðϵÞÞ4TD ð4Þ where TD is the predetermined threshold. The value of TD is cal- culated based on the three sigma rule [19–21]. 3. Experimental device The process used in this work is a distillation column, a separation process habitually found in the refinery and petro- chemical industries. Fig. 1 represents the setup considered here. A toluene methylcyclohexane mixture is introduced in the tank in order to be separated with a mass composition at 23% methylcy- clohexane. Feed preheating system is constituted by three ele- ments of 250 W each one. The reciprocating feed pump is constituted by a membrane allowing firstly the suction of the mixture and the discharge towards the tank with a flow capacity F¼4.32 L/h. The column has also a reboiler of 2 L hold-up capacity, L. Aggoune et al. / ISA Transactions 63 (2016) 394–400 395
  • 3. an immersion heater of a power Qb ¼3.3 kW, and of a level liquid switch sensor which allows the automatic stop of heating if the level is insufficient. The internal packing is made of Multiknit stainless 316 L which enhances the mass transfer between the vapor and liquid phases. A condenser is placed at the column overhead in order to condense the entire vapor coming out from the column. The cooling medium used in exchangers is water. The heat-transfer area of the total overhead condenser is 0.08 m2 . The thermocouples were coupled to a calibrated amplification cir- cuit (4–20 mA, 0–150 °C) whose signals are accessible on-line from computer, which permits the bottom and top tempera- tures to be obtained. The unit has twelve PT100 sensors which measure continuously the temperature throughout the col- umn. The measurements of top and bottom pressures of the separation unit are realized by two WIKA ECO-Tronic pressure transmitters. All sensors are interfaced with a control compu- ter using RS485/232 converters. A ETP-200 program is used to register the variables. In this set-up, the most significant variables that can be mea- sured are reflux timer (Rt), heating power (Qb), pressure drop (ΔP), preheating power (Qf), preheating temperature (Tf), feed flow rate (F), cooling rate (Qc) and overhead temperature (Td). The column is kept at a constant operating point for four hours to ensure that the column is in steady-state. The values of the measurable variables obtained on average from the nominal steady-state regime are: Qb of 45%, Rt of 14%, F of 50%, Td of 102 °C, Tf of 80 °C, Qc of 250 L/h. 4. NARMAX model system identification The problem of modeling is important in the context of simu- lation, control and fault detection and diagnosis. Several approa- ches have been developed in the literature [22–24]. The steps in the modeling procedure may be stated as follows: 1. Model structure detection, determining the model form suitable for the system of interest. 2. Model parameter estimation, estimating the unknown para- meters contained within the specified model, using experimental data. 3. Model validation, this last step includes testing to verify if the established model provides good accuracy for a fresh data set that was not used during the training phase. For nonlinear systems, one popular black-box identification method is to use the polynomial models, such as NARMAX one [22,25]. In case of multiple input, single output (MISO) systems this model is given by: yðtÞ ¼ f yðtÀ1Þ; :::; yðtÀnyÞ; u1ðtÀd1Þ; :::; u1ðtÀd1 Ànu1Þ; À …; upðtÀdpÞ; :::; upðtÀdP ÀnupÞþeðtÀ1Þ; :::; eðtÀneÞ Á þeðtÞ ð5Þ where y(t), ui(t), and e(t) represent the system output, input and noise at the discrete time t, respectively; p is the dimension of the input vector. ny; nui ; ne are the maximum time lags of the output, input, and noise, respectively;di is the input delay. f is a nonlinear function which taken here to be a polynomial expansion of its arguments with nonlinearity degree l. The expression of a poly- nomial NARMAX model is obtained as follows: yðtÞ ¼ Xn i1 ¼ 1 βi1 xi1 ðtÞþ Xn i1 ¼ 1 Xn i2 ¼ i1 βi1;i2 xi1 ðtÞxi2 ðtÞþ… þ Xn i1 ¼ 1 … Xn il ¼ il À 1 βi1;…il xi1 ðtÞ…xil ðtÞþ…þeðtÞ ð6Þ where n ¼ ny þnu1 þ:::þnup þne, β’s the model parameters, and x’s represent lagged output, input, and noise terms. The ARMAX model is a special case of the NARMAX one, which is obtained by setting l¼1. 4.1. NARMAX parameters estimation algorithm The parameter estimation of the NARMAX model has received much attention and many algorithms have been developed [26,27]. In this work, the recursive algorithm such as output error with extended prediction model (OEEPM) method is used mainly for its simplicity and superior performance [28]. Eq. (6) clearly belongs to the class of linear-in-the-parameter regression models. In order to estimate the parameters of the NARMAX model, Eq. (6) should be expressed as: yðtÞ ¼ φT ðtÞθþeðtÞ ð7Þ where φT ðtÞ includes all the output, inputs, and noise terms as well as all possible combinations up to degree l and up to time (t-1), and θ is the vector which includes the model parameters to be estimated. If the noise e(t) is known, an ordinary recursive least square (RLS) method can be used for parameters estimation. However, in general, the noise is not measurable, and the sequence e(t) is estimated iteratively as: εðtÞ ¼ yðtÞÀ ^yðtÞCeðtÞ ð8Þ where ε(t) is the residual at time t and the predicted output ^y(t) can be written as: ^yðtÞ ¼ φT ðtÀ1Þ^θ ð9Þ The parameter vector θ can be estimated by OEEPM algorithm, discussed in detail in [23]. Considering Eq. (7), one can add and subtract the term 7ða1 ^yðtÀ1Þ; :::; any ^yðtÀnyÞÞ. Note now that the regression vector φT ðtÞ includes all the predicted output, inputs, and noise terms as well as all possible combinations up to degree l and up to time (t-1). The initial values of ε(t) i.e. ε(0), ε(À1),…, ε (Àne) are set to zero. The complete algorithm used in the following Fig. 1. Experimental device: distillation column. L. Aggoune et al. / ISA Transactions 63 (2016) 394–400396
  • 4. is described by: ^θðtÞ ¼ ^θðtÀ1ÞþPðtÞφðtÞεðtÞ PðtÞ ¼ 1 λ PðtÀ1ÞÀPðt À 1ÞφðtÞφðtÞT Pðt À 1Þ λþφðtÞT Pðt À1ÞφðtÞ εðtÞ ¼ yðtÞÀφðtÞT ^θðtÞ 8 : ð10Þ where λ is a forgetting factor, ^θ denotes the estimated value, ε(t) is the residual, and P denotes the adaptation gain. 4.2. Proper NARMAX model selection The first stage towards modeling a particular system is to select appropriate inputs and output. For this purpose, experimental tests are performed which was carried out to obtain a rich mixture in methylcyclohexane. Rt, Qb, ΔP, Qf, Tf represent the main mea- surable input variables of the process and Td its output. The aforementioned work has been reported in [29]. To identify the parameters of the model describing Td, time series of relevant data were collected continuously for 13 hours. All data are collected with a sampling period of 11 s. Experiment is performed with the aim of generating estimation and validation data rich in amplitudes and in frequencies. When the input sig- nals are modified, Td ranges from 101.5 °C to 103.5 °C. According to [23,24], once the data are collected, the first two thirds of the data recorded are used to estimate the system parameters and the remaining data for model validation. These data are presented in Figs. 2 and 3. The latter show the evolution of ΔP, Rt, Qf and Qb between 1540 s and 2365 s for better readability. The next step following the determination of a reliable model for distillation column is to choose appropriate NARMAX poly- nomial model. If the model structure is known a priori then the identification can be formulated as a standard least-squares problem. However, the identification is a hard task in reality as the structure process is often unknown. An approach in resolving the structural problem is to estimate model parameters using a simple structure and gradually increasing ny;nu;ne; d and l until a desired accuracy is achieved [23]. In order to select the optimal model, statistical criteria such as AIC, RMSE and NSE were evaluated by varying the number of parameters. The latter are obtained according to Eq. (6). A set of candidate models was developed by setting the maximum number of ny ¼ 5;nu ¼ 7; ne ¼ 5; d ¼ 5105510½ Š and l ¼ 4. The number of terms to include in the final model is determined by the comparison of statistical criteria values for each model structure. These statistical criteria are defined as follows [30,31]: AIC ¼ ln N 2 V þ 2n N ð11Þ RMSE ¼ sqrt PN i ¼ 1 ð^yðtÞÀyðtÞÞ2 N 0 B B B @ 1 C C C A ð12Þ NSE ¼ 1À varðϵðtÞÞ varðyðtÞÞ ð13Þ where V is the loss function, n is the number of estimated para- meters, N is the data length in the estimation data set. The loss function V is equal to the residual ε(t) sum of squares: V ¼ 1 N XN i ¼ 1 ε2 i ð14Þ It is clear that the three criteria require the computation of the residual ε. The AIC contains two terms: The loss function V, which Fig. 2. Estimation and validation data. Fig. 3. (a) Evolution of heating power and reflux timer and (b) evolution of pressure drop and preheating power. L. Aggoune et al. / ISA Transactions 63 (2016) 394–400 397
  • 5. decreases with the model complexity, and the number of esti- mated parameters n, that is increasing with the model complexity. The objective of this criterion is to make a trade-off between prediction accuracy and model complexity. The RMSE evaluates the differences between predicted output and real measurements. Lower values of RMSE indicate good prediction quality. The NSE examines the relative magnitude of the residual variance com- pared to the measured output variance. A high NSE value (near to one) ensures the good accuracy of the identified model. The best model structure is chosen as the one that minimizes AIC and RMSE, and maximizes NSE. The NARMAX model is built on the basis of a linear ARMAX model. ny; nui ; ne and d were each varied and the OEEPM algo- rithm is employed to estimate the parameters of the models. For each combination, the statistical criteria were calculated using the estimation data set. Table 1 lists the computation results of these criteria. Only significant results are reported. Based on the results given on Table 1, the best ARMAX model structure is obtained with ny ¼ 2;nu ¼ 12222½ Š;ne ¼ 2; and d ¼ 14115½ Š, giving AIC¼ À0.3640, NSE¼0.9844 and RMSE¼0.0234. To select the appropriate NARMAX model for the studied dis- tillation column system, the chosen orders and delays for the best ARMAX model are used to define the best NARMAX model. Starting from the orders and delays indicated for the best ARMAX model, and increasing or decreasing their values, the optimal values for NARMAX model can be found. Based on the evaluation of statistical criteria, the best NARMAX model structure is obtained with AIC¼ À0.4126, NSE¼0.9851 and RMSE ¼ 0.0229. In this case, the model which predicts overhead temperature is given by: TdðtÞ ¼ À0:2601TdðtÀ1ÞÀ0:6901TdðtÀ2Þþ0:0228QbðtÀ1Þ À0:0427Qf ðtÀ4ÞÀ0:0558Qf ðtÀ5Þþ0:0685RtðtÀ1Þ þ0:1053RtðtÀ2Þþ0:0165ΔPðtÀ1ÞÀ0:0199ΔPðtÀ2Þ À0:0971Tf ðtÀ5ÞÀ0:0027Tf ðtÀ6Þþ0:3524eðtÀ1Þ þ0:4175eðtÀ2Þþ0:2139TdðtÀ2ÞRtðtÀ1ÞΔPðtÀ5Þ þ0:1093RtðtÀ6ÞΔPðtÀ1ÞΔPðtÀ6Þ À0:1050QbðtÀ1ÞRtðtÀ4ÞΔPðtÀ1Þ ð15Þ Eq. (15) shows that 3 nonlinear terms have been added to the model. Using the NARMAX model the AIC is reduced in compar- ison with the ARMAX model. During the development of the different models, the NARMAX model provides a good robustness against the orders and delays variations than the ARMAX model. This is because the small var- iations of the optimal values of ny, nu, ne and d found for the best ARMAX model degrade the prediction accuracy. Consequently, the NARMAX model is more suitable than the ARMAX model for the prediction of the overhead temperature. The validation phase is the last step in system identification problem, where the performances of the identified NARMAX model are evaluated for validation data set. It is necessary to analyze the properties of the residual (Fig. 4) as seen in [28,29]. There are several techniques of model validation [22]. In general, a model generating the minimum value of the residual would best represent the real plant of interest. A standard performance index is based on the whiteness test of residual and the independence between the residual and inputs [23,24,32]. In fact, if this condi- tion is satisfied, the identified model has effectively captured the system dynamics. 5. Fault detection results In this section, the performance of the monitoring strategy based on KLD is assessed through its utilization to detect faults in a distillation column. The identified model is used to generate a residual on which the anomaly is detected by the FD procedure. In order to verify the ability and effectiveness of the proposed method, experimental faults of the distillation plant of laboratory scale were performed. The adopted technique for anomaly detec- tion is based on the assumption that the residual variance remains unchanged and the residual shift only exists in residual mean. Determination of parameters μ0 and σ0 was performed by using a measurement data on the residual in fault free case. The distillation column is used widely in the fields of chemical industry. The Early detection of faults can help to avoid pro- ductivity loss and damage to human health. The distillation unit can be affected by several faults. The major categories of possible faults, including heating power, preheating power, feed pump, Table 1 Comparison between different ARMAX model structures estimated with OEEPM algorithm. Order [na, nb, nc] AIC NSE RMSE Delay d [2, 12212, 2] À0.3591 0.9843 0.0234 [1 4 1 1 5 ] [2, 12211, 2] À0.2792 0.9830 0.0244 [1 4 2 1 5 ] [2, 12111, 2] À0.2670 0.9828 0.0246 [1 4 2 1 5 ] [2, 11111, 2] À0.2635 0.9827 0.0246 [1 4 2 1 5 ] [2, 12222, 2] À0.3640 0.9844 0.0234 [1 4 1 1 5 ] [2, 12221, 2] À0.3323 0.9839 0.0238 [1 3 1 2 5 ] [2, 12312, 2] À0.3301 0.9839 0.0238 [1 3 1 2 4 ] [2, 13212, 2] À0.2370 0.9823 0.0249 [1 5 2 2 5 ] [2, 22212, 2] À0.3165 0.9837 0.0240 [1 5 1 2 6 ] [2, 12212, 2] À0.2783 0.9830 0.0244 [1 5 2 1 5 ] [2, 12312, 2] À0.2750 0.9830 0.0244 [1 3 2 1 4 ] [2, 13212, 2] À0.3524 0.9843 0.0235 [1 4 1 2 5 ] [2, 22222, 2] À0.3627 0.9844 0.0234 [1 5 1 1 5 ] [2, 12211, 2] À0.3149 0.9836 0.0240 [1 3 1 1 3 ] [2, 12213, 2] À0.3534 0.9843 0.0235 [1 5 1 2 5 ] [3, 12312, 3] À0.3561 0.9844 0.0234 [2 4 1 1 5 ] [3, 13212, 3] À0.3572 0.9844 0.0234 [2 4 1 1 6 ] [3, 12222, 3] À0.3626 0.9844 0.0234 [2 4 1 2 6 ] Fig. 4. Difference between the best NARMAX model and actual measurements of overhead temperature with validation data set. L. Aggoune et al. / ISA Transactions 63 (2016) 394–400398
  • 6. reflux, and pressure drop. To evaluate the validity and effective- ness of the proposed method as a monitoring tool, the detection of a sudden increasing of the heating power (Qb) and the reflux timer (Rt) to 100% was selected under the assumption that only a single fault can occur at any time. These types of faults are introduced at instant 10835 S and cause a deviation in comparison with normal mode (Fig. 5). It is assumed that the first 100 values of the temperature (Td) are not faulty. Consequently, the reference pdf will be estimated. It is impor- tant to note that the effect of a sudden increasing of the heating power (Qb) was compensated by the distillation supervision con- trol system 17 samples (187 S) after its occurrence as shown in Fig. 5a. These anomalies should be detected by KLD. The dissimilarity obtained from the evaluation of KLD between actual pdf and the reference before and after the faults has been verified according to Eq. (3). Fig. 6 presents the evolution of KLD for the selected faults. This evolution shows the ability of the proposed technique to detect real faults because it exceeds a predefined threshold TD. TD is the level up to which, the KLD is considered still in the fault-free situations (its abnormality is due only to modeling errors and noises). This threshold value is defined by the KLD of the residual obtained with fault-free data. The threshold value is calculated as: TD ¼ Mþ3σ ð16Þ where M and σ are the mean and the standard deviation of the distance. As it is shown in Fig. 6, the faults detection is made with respect to a threshold of the KLD. The first fault (increasing of the heating power) was detected at 10879 S, i.e., with the delay of 44 S. This delay corresponds to a difference (ΔTd¼ À0.4 °C) between the desired (Td¼101.4 °C) and the fault overhead temperature (Td¼101.8 °C). The second fault (increasing of the reflux timer (Rt)) was confirmed 33 S after its occurrence. This corresponds to a difference ΔTd¼ À2.3 °C between the desired overhead tem- perature and the faulty one. The detection delay was defined as the difference between the time of the fault occurrence and the time of its detection. This was especially due to the fault ampli- tude, the time dependency of fault (abrupt fault, incipient fault, or intermittent fault), and the evolution of the dynamical behavior of the separation unit. As indicated by the experimental case studies, the values of KLD determined from the residual of identified NARMAX model can be used to distinguish between the normal mode and the abnormal one. 6. Conclusion Kullback Leibler divergence (KLD) has been widely used in information theory to compute the resemblance of two distribu- tions. In this work, it is used as a fault detection framework for anomaly detection that exploits the residual of NARMAX model. In this proposed fault detection approach, the black-box modeling method is used to establish a NARMAX model through input- output experimental measurements with the aim of providing a Fig. 5. Evolution of the overhead temperature caused by the chosen faults. Fig. 6. Results of KLD for the chosen faults. L. Aggoune et al. / ISA Transactions 63 (2016) 394–400 399
  • 7. reliable model of the system considered. Then, the KLD is applied to measure the dissimilarity between the probability density of the residual of the NARMAX model and the reference distribution obtained in normal condition operation. The effectiveness of the proposed approach has been illustrated on experimental faults in distillation column. The results of these experimental faults clearly demonstrate the potential of the adopted fault detection method. The KLD can be used in order to avoid a dangerous behavior of the distillation unit by setting-on the suitable alarm and con- ducting appropriate actions on the process. References [1] Chetouani Y. Model selection and fault detection approach based on Bayes decision theory: Application to changes detection problem in a distillation column. Process Saf Environ Prot 2014;92:215–23. [2] Wong PK, Yang Z, Vong CM, Zhong J. Real-time fault diagnosis for gas turbine generator systems using extreme learning machine. Neurocomputing 2014;128:249–57. [3] Harrou F, Nounou MN, Nounou HN, Madakyaru M. Statistical fault detection using PCA-based GLR hypothesis testing. J Loss Prev Process Ind 2013;26:129–39. [4] Kouadri A, Aitouche MA, Zelmat M. Variogram-based fault diagnosis in an interconnected tank system. ISA Trans 2012;51:471–6. [5] Goffaux G, Wouwer AV, Bernard O. Continuous–discrete interval observers applied to the monitoring of cultures of microalgae. J Proc Control 2009;19:1182–90. [6] Ding SX. Model-based Fault Diagnosis Techniques: Design Schemes, Algo- rithms, and Tools. Berlin: Springer; 2008. [7] Blanke M, Kinnaert M, Lunze J, Staroswiecki M. Diagnosis and Fault-Tolerant Control. Berlin: Springer; 2006. [8] Isermann R. Fault-Diagnosis Systems, An Introduction from Fault Detection to Fault Tolerance. Berlin: Springer; 2006. [9] Chai W, Qiao J. Passive robust fault detection using RBF neural modeling based on set membership identification. Eng Appl Artif Intell 2014;28:1–12. [10] Akhenak A, Duviella E, Bako B, Lecoeuche S. Online fault diagnosis using recursive subspace identification: Application to a dam-gallery open channel system. Control Eng Pract 2013;21:797–806. [11] Peng ZK, Lang ZQ, Wolters C, Billings SA, Worden K. Feasibility study of structural damage detection using NARMAX modelling and Nonlinear Output Frequency Response Function based analysis. Mech Syst Signal Process 2011;25:1045–61. [12] Shashoa NAA, Kvaščev G, Marjanović A, Worden K. Sensor fault detection and isolation in a thermal power plant steam separator. Control Eng Pract 2013;21:908–16. [13] Basseville M, Nikiforov IV. Detection of Abrupt Changes: Theory and Appli- cation. New Jersey: Prentice Hall Englewood Cliffs; 1993. [14] Belov DI, Armstrong RD. Distributions of the Kullback–Leibler divergence with applications. Br J Math Stat Psychol 2011;64:291–309. [15] J.R. Hershey, P.A. Olsen, Approximating the kullback leibler divergence between gaussian mixture models. In: Proc. of IEEE ICASSP. Hawaii, USA; 15– 20 April 2007. 4: p. 317–20. [16] Silva J, Narayanan S. Average divergence distance as a statistical discrimination measure for hidden Markov models. IEEE Trans Audio Speech Lang Process 2006;14:890–906. [17] J. Harmouche, C. Delpha, D. Diallo, Incipient fault detection and diagnosis based on Kullback–Leibler divergence using Principal Component Analysis: Part I, Signal Process. 94, 278–287. [18] Houerbi KR, Salamatian K, Kamoun F. Scan surveillance in internet networks. Networking 2009:614–25. [19] Georgoulas G, Mustafa MO, Tsoumas IP, Antonino-Daviu JA, Climente-Alarcon V, Stylios CD, Nikolakopoulos G. Principal Component Analysis of the start-up transient and Hidden Markov Modeling for broken rotor bar fault diagnosis in asynchronous machines. Expert Syst Appl 2013;40:7024–33. [20] Shi Z, Gu F, Lennox B, Ball AD. The development of an adaptive threshold for model-based fault detection of a nonlinear electro-hydraulic system. Control Eng Pract 2005;13:1357–67. [21] Pukelsheim F. The three sigma rule. Am Stat 1994;48:88–91. [22] Billing SA. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. New York: Wiley; 2013. [23] Landau ID, Lozano R, M'Saad M, Karimi A. Adaptive Control: Algorithms, Analysis and Applications. London: Springer; 2011. [24] Ljung L. System Identification, Theory for the User. Englewood Cliffs, New Jersey: Prentice Hall; 1999. [25] Leontaritis IJ, Billings SA. Input–output parametric models for nonlinear sys- tems. Int J Control 1985;4:303–44. [26] Rahrooh A, Shepard S. Identification of nonlinear systems using NARMAX model. Nonlinear Anal Theory Methods Appl 2009;71:e1198–202. [27] Billings SA, Korenberg MJ, Chen S. Identification of nonlinear output-affine systems using an orthogonal least squares algorithm. Int J Syst Sci 1988;19:1559–68. [28] Aggoune L, Chetouani Y, Radjeai H. Recursive identification of the dynamic behavior in a distillation column by means of autoregressive models. J Dyn Syst Meas Contr 2014;136 044506-044506-5. [29] Chetouani Y. Model-order reduction based on artificial neural networks for accurate prediction of the product quality in a distillation column. Int J Autom Control 2009;3:332–51. [30] Akaike H. A new look at the statistical model identification. IEEE Trans Autom Control 1974;19:716–23. [31] Nash JE, Sutcliffe JV. River flow forecasting through conceptual models. J Hydrol 1970;10:82–290. [32] Billings SA, Voon WSF. Correlation based model validity tests for nonlinear models. Int J Control 1986;44:235–44. L. Aggoune et al. / ISA Transactions 63 (2016) 394–400400