Fault Detection in the Distillation Column Process

Research Article
Fault detection in the distillation column process using Kullback–Leibler divergence
Lakhdar Aggoune a,*, Yahya Chetouani b,1, Tarek Raïssi c,2
a Laboratoire d’Automatique de Sétif, Département d’Electrotechnique, Université de Sétif 1, Cité Maabouda, Route de Béjaia, 19000 Sétif, Algeria
b Université de Rouen, Département Génie Chimique, Rue Lavoisier, 76821 Mont Saint Aignan Cedex, France
c Conservatoire National des Arts et Métiers, Département EASY, Cedric-laetitia, 292, Rue St-Martin, case 2D2P10, 75141 Paris Cedex 03, France
Article info
Article history:
Received 28 December 2014
Received in revised form 9 September 2015
Accepted 13 March 2016
Available online 25 March 2016
This paper was recommended for publication by Didier Theilliol.
Keywords:
Fault detection
Safety
NARMAX model
Kullback–Leibler divergence
Dynamic processes
Abstract
Chemical plants are complex large-scale systems that require robust fault detection schemes to ensure high product quality, reliability and safety under different operating conditions. This paper presents a feasibility study of the application of black-box modeling and the Kullback–Leibler divergence (KLD) to fault detection in a distillation column process. A Nonlinear Auto-Regressive Moving Average with eXogenous inputs (NARMAX) polynomial model is first developed to estimate the nonlinear behavior of the plant. The KLD is then applied to detect abnormal modes. The proposed fault detection (FD) method is implemented and validated experimentally using realistic faults on a laboratory-scale distillation plant. The experimental results demonstrate that the proposed method is effective and gives early alarms to operators.
© 2016 ISA. Published by Elsevier Ltd. All rights reserved.
1. Introduction
With advances in modern engineering processes, fault and change detection are becoming increasingly important for improving reliability, safety, availability and environmental protection. The central goal is to improve the continuity and quality of industrial production across a wide range of plants. To this end, much effort has been devoted to the development of fault detection (FD) schemes under various assumptions [1–5].
The diversity of FD strategies developed over the past decades has been driven by the growing complexity of industrial installations. Among the best known are model-based approaches [6–8], which have attracted great interest because many modern industries require fast detection and on-line implementation of FD methods. These methods rely on mathematical models of the process, and the FD procedure comprises residual generation and residual evaluation. The residual is usually generated as the difference between the predicted output and the measured data. For this kind of technique, detection performance depends strongly on the accuracy of the model used.
Process modeling plays a significant role in many fields of engineering. There are two main modeling approaches: first-principles models and black-box models. First-principles models rely on physical laws to obtain the relationship between the inputs and outputs. Unfortunately, this approach requires accurate and complete knowledge of the plant. Insufficient knowledge of the physical properties of the process hinders the design and implementation of FD techniques and reduces (human and environmental) safety. As an alternative, black-box modeling, based on experimental data, requires no prior physical insight into the system and can solve the modeling problem in most cases encountered in engineering practice. Combining the two approaches gives grey-box modeling, which defines the model structure from first principles and determines the parameters from experimental data using estimation techniques. Because of their satisfactory results in change detection, black-box models have been widely used to monitor industrial processes, for example artificial neural networks [1], set membership identification [9], subspace identification [10] and polynomial models [11,12]. In this study, based on our experimental investigation, we focus only on a NARMAX polynomial model.

Contents lists available at ScienceDirect
ISA Transactions
journal homepage: www.elsevier.com/locate/isatrans
http://dx.doi.org/10.1016/j.isatra.2016.03.006
0019-0578/© 2016 ISA. Published by Elsevier Ltd. All rights reserved.
* Corresponding author. Tel./fax: +213 36 61 12 11.
E-mail address: lakhdar.aggoune@yahoo.fr (L. Aggoune).
1 Fax: +33 2 35 14 61 30.
2 Tel.: +33 1 40 27 21 69.
ISA Transactions 63 (2016) 394–400
Once the residual has been generated using a model of the supervised process, it must be evaluated to reveal any deviation from the normal mode of the system. Because of modeling errors and measurement noise, residual evaluation usually includes a reliable decision module, which can be obtained through statistical hypothesis testing [6,13]. As an alternative, this work proposes the Kullback–Leibler divergence (KLD) [14], a well-suited tool for quantifying the dissimilarity between two probability distributions.
This paper investigates the modeling and parameter estimation of a chemical plant, namely a distillation column, using a black-box approach based on NARMAX structures, for the specific purpose of change detection. The model parameters are estimated with the output error with extended prediction model (OEEPM) algorithm. An optimal NARMAX model was selected using statistical criteria: Akaike’s Information Criterion (AIC), Root Mean Square Error (RMSE) and Nash–Sutcliffe Efficiency (NSE). The residual of the monitored plant is defined as the difference between the current measurement and the output estimated by the NARMAX model. The KLD, which measures the dissimilarity between the probability distribution of the current residual and that of a reference residual, is compared with a fault detection threshold. The main contribution is the combination of black-box identification and the KLD in the design of a fault detection scheme.
The organization of the paper is the following: Section 2 gives a
description of the Kullback–Leibler divergence as a fault detection
criterion. Section 3 presents the experimental set-up. Section 4
describes the system identification procedure. Section 5 presents
the experimental results of the proposed FD technique. Finally,
some concluding remarks are given in Section 6.
2. Kullback–Leibler divergence
The Kullback–Leibler divergence (KLD), or relative entropy, measures the dissimilarity between two probability distributions. It has been applied successfully in many fields, such as pattern recognition [15,16]. Harmouche et al. [17] developed a fault detection approach in a principal component analysis (PCA) framework: a PCA statistical model is first used to extract a reference probability distribution for each latent principal score from data, and the KLD is then used to monitor the probability distributions of these scores for each new data set. They demonstrated its superior fault detection performance over the Hotelling T² statistic. Houerbi et al. [18] proposed a change detection method for scan surveillance in Internet networks, using the KLD to monitor the variation of the distribution of scan features in a space spanned by the IP source address, IP destination address, source port and destination port numbers.
In the current work, the KLD is used to detect abnormal events in the distillation column by monitoring the dissimilarity between the reference probability density function (pdf) of the normal mode and the current pdf, both estimated from the residual of the identified model.
The KLD is an example of an f-divergence, a class of measures used to quantify the discrepancy between pairs of probability distributions. To measure the difference between two discrete pdfs p0(ε) (normal mode) and p1(ε) (abnormal mode) of a random variable ε, the KLD is defined as follows:

$$\mathrm{KL}\big(p_1(\varepsilon)\,\|\,p_0(\varepsilon)\big) = \sum_{i=1}^{N} p_1(\varepsilon_i)\,\log\frac{p_1(\varepsilon_i)}{p_0(\varepsilon_i)} \tag{1}$$

where N is the number of values εi at which the pdfs are evaluated, and ε is the residual sequence, given by the difference between the process output y(t) and its estimate ŷ(t): ε(t) = y(t) − ŷ(t).
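As an illustration of Eq. (1), the following minimal Python sketch estimates the two discrete pdfs with histograms over common bins and sums the terms of Eq. (1). The histogram estimator, the bin count and the small constant `eps` that guards against empty reference bins are assumptions of this sketch, and the residuals are synthetic; the detection scheme of this paper relies instead on the Gaussian closed form of Eqs. (2) and (3).

```python
import numpy as np

def discrete_kld(p1, p0, eps=1e-12):
    """KL(p1 || p0) of Eq. (1) for two discrete pdfs defined on the same points."""
    p1 = np.asarray(p1, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    p1 = p1 / p1.sum()            # normalize so each pdf sums to one
    p0 = p0 / p0.sum()
    mask = p1 > 0                 # terms with p1 = 0 contribute nothing (0 log 0 = 0)
    # eps avoids division by zero when the reference pdf has an empty bin (assumption)
    return float(np.sum(p1[mask] * np.log(p1[mask] / (p0[mask] + eps))))

def residual_kld(res_current, res_reference, bins=30):
    """Histogram estimate of both residual pdfs on common bins, then Eq. (1)."""
    res_current = np.asarray(res_current, dtype=float)
    res_reference = np.asarray(res_reference, dtype=float)
    lo = min(res_current.min(), res_reference.min())
    hi = max(res_current.max(), res_reference.max())
    edges = np.linspace(lo, hi, bins + 1)
    h1, _ = np.histogram(res_current, bins=edges)
    h0, _ = np.histogram(res_reference, bins=edges)
    return discrete_kld(h1, h0)

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 0.02, 2000)   # synthetic fault-free residual
shifted = rng.normal(0.03, 0.02, 2000)    # synthetic residual with a mean shift
print(residual_kld(reference, reference), residual_kld(shifted, reference))
```

The first printed value is numerically zero (identical pdfs), while the mean-shifted residual gives a clearly positive divergence.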
Two fundamental properties of the KLD are:
1. Non-negativity: KL(p1(ε)‖p0(ε)) ≥ 0, with equality if and only if p1(ε) = p0(ε);
2. Asymmetry: in general, KL(p1(ε)‖p0(ε)) ≠ KL(p0(ε)‖p1(ε)).
If p0(ε) = p1(ε), the current pdf corresponds to the one obtained in normal mode and the KLD is zero (close to zero in practice, because the pdfs are estimated from finite data). Conversely, large KLD values indicate that p1(ε) and p0(ε) differ substantially, which makes an abnormal mode easy to detect.
The goal of the proposed FD technique is to detect the presence of an abnormal mode using the KLD. The basic idea is to quantify the difference between the pdfs of the normal behavior and of an abnormal event by computing the KLD between the two distributions. Detecting faults that affect the process can then be formulated as a hypothesis testing problem with two possible operating conditions: under the null hypothesis H0, the parameters of the residual distribution are the same as under normal operating conditions; under the alternative hypothesis H1, they differ from those of the normal process behavior (anomaly). KLD values range from zero to infinity.
The KLD is easily computed for normal distributions. Consider two normal densities p0(ε) ∼ N(μ0, σ0²) and p1(ε) ∼ N(μ1, σ1²), where μ0, μ1 are the means and σ0, σ1 the standard deviations of p0(ε) and p1(ε), respectively. In this case the KLD can be written as [14]:

$$\mathrm{KL}\big(p_1(\varepsilon)\,\|\,p_0(\varepsilon)\big) = \frac{1}{2}\left(\frac{\sigma_1^2}{\sigma_0^2} + \frac{(\mu_1-\mu_0)^2}{\sigma_0^2} + \log\frac{\sigma_0^2}{\sigma_1^2} - 1\right) \tag{2}$$
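Eq. (2) translates directly into code. The minimal Python sketch below (the function name is illustrative) evaluates it, which makes it easy to check that it vanishes for identical densities, is not symmetric in its arguments, and reduces to Eq. (3) when σ1 = σ0.

```python
import math

def gaussian_kld(mu1, sigma1, mu0, sigma0):
    """Eq. (2): KL(N(mu1, sigma1^2) || N(mu0, sigma0^2))."""
    var_ratio = sigma1 ** 2 / sigma0 ** 2
    return 0.5 * (var_ratio
                  + (mu1 - mu0) ** 2 / sigma0 ** 2
                  + math.log(sigma0 ** 2 / sigma1 ** 2)
                  - 1.0)

print(gaussian_kld(0.0, 1.0, 0.0, 1.0))    # identical densities: 0.0
# Asymmetry: roughly 0.807 versus 0.318
print(gaussian_kld(0.0, 2.0, 0.0, 1.0), gaussian_kld(0.0, 1.0, 0.0, 2.0))
```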
The standard deviation of p1(ε) is assumed to remain unchanged after the occurrence of a fault (σ1 = σ0). Eq. (2) then reduces to:

$$\mathrm{KL}\big(p_1(\varepsilon)\,\|\,p_0(\varepsilon)\big) = \frac{1}{2}\,\frac{(\mu_1-\mu_0)^2}{\sigma_0^2} \tag{3}$$
μ0 and σ0 are estimated from the NARMAX model residual recorded in normal, safe operating modes, while μ1 is computed at the end of each fixed-length sampling interval.
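A minimal sketch of this computation, assuming non-overlapping windows of a fixed, hypothetical length `window` (the interval length is not stated here): μ1 is the mean residual over each window, and Eq. (3) is then applied with the fault-free μ0 and σ0. The residual below is synthetic, and estimating the reference from its first 100 samples mirrors the fault-free assumption used in Section 5.

```python
import numpy as np

def kld_sequence(residual, mu0, sigma0, window):
    """Eq. (3) evaluated once per non-overlapping window of `window` samples.

    mu1 is the mean residual over each window; sigma1 = sigma0 is assumed.
    Trailing samples that do not fill a whole window are ignored.
    """
    r = np.asarray(residual, dtype=float)
    n_win = len(r) // window
    mu1 = r[: n_win * window].reshape(n_win, window).mean(axis=1)
    return 0.5 * (mu1 - mu0) ** 2 / sigma0 ** 2

rng = np.random.default_rng(1)
# Synthetic residual: 300 fault-free samples, then a mean shift of 0.05
residual = np.concatenate([rng.normal(0.0, 0.02, 300), rng.normal(0.05, 0.02, 100)])
mu0, sigma0 = residual[:100].mean(), residual[:100].std(ddof=1)
kld = kld_sequence(residual, mu0, sigma0, window=20)
print(kld.round(3))   # small values in the first 15 windows, large in the last 5
```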
Using Eq. (3), the above hypotheses can be formulated in terms of the KLD as:

$$H_0:\ \mathrm{KL}\big(p_1(\varepsilon)\,\|\,p_0(\varepsilon)\big) \le T_D, \qquad H_1:\ \mathrm{KL}\big(p_1(\varepsilon)\,\|\,p_0(\varepsilon)\big) > T_D \tag{4}$$

where TD is a predetermined threshold whose value is calculated using the three-sigma rule [19–21].
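The decision rule of Eq. (4), together with a three-sigma threshold computed from fault-free KLD values (the form TD = M + 3σ given as Eq. (16) in Section 5), can be sketched in Python as follows. Using the sample standard deviation (`ddof=1`) is an assumption, and the fault-free KLD values in the usage line are hypothetical.

```python
import numpy as np

def three_sigma_threshold(kld_fault_free):
    """TD = M + 3*sigma: mean plus three standard deviations of fault-free KLD values."""
    k = np.asarray(kld_fault_free, dtype=float)
    return float(k.mean() + 3.0 * k.std(ddof=1))

def decide(kld_values, td):
    """Eq. (4): False where H0 (KL <= TD) is kept, True where H1 (KL > TD) is accepted."""
    return np.asarray(kld_values, dtype=float) > td

def first_alarm(kld_values, td):
    """Index of the first value that exceeds TD, or None if no alarm is raised."""
    hits = np.flatnonzero(decide(kld_values, td))
    return int(hits[0]) if hits.size else None

td = three_sigma_threshold([0.01, 0.02, 0.015, 0.012, 0.018])  # hypothetical fault-free KLDs
print(td, first_alarm([0.01, 0.02, 0.4, 0.9], td))            # alarm at index 2
```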
3. Experimental device
The process used in this work is a distillation column, a separation process commonly found in the refining and petrochemical industries. Fig. 1 shows the setup considered here. A toluene–methylcyclohexane mixture with a mass composition of 23% methylcyclohexane is introduced into the tank in order to be separated. The feed preheating system consists of three heating elements of 250 W each. The reciprocating feed pump uses a membrane that first draws in the mixture and then discharges it towards the tank, with a flow capacity F = 4.32 L/h. The column also has a reboiler with a 2 L hold-up capacity, an immersion heater of power Qb = 3.3 kW, and a liquid-level switch that automatically stops heating if the level is insufficient. The internal packing is made of Multiknit 316 L stainless steel, which enhances mass transfer between the vapor and liquid phases. A total condenser at the column overhead condenses all the vapor leaving the column; its heat-transfer area is 0.08 m². Water is used as the cooling medium in the exchangers. The thermocouples are connected to a calibrated amplification circuit (4–20 mA, 0–150 °C) whose signals are accessible on-line from a computer, which provides the bottom and top temperatures. The unit also has twelve PT100 sensors that continuously measure the temperature along the column. The top and bottom pressures of the separation unit are measured by two WIKA ECO-Tronic pressure transmitters. All sensors are interfaced with a control computer through RS485/232 converters, and an ETP-200 program is used to record the variables.
In this set-up, the most significant variables that can be mea-
sured are reflux timer (Rt), heating power (Qb), pressure drop (ΔP),
preheating power (Qf), preheating temperature (Tf), feed flow rate
(F), cooling rate (Qc) and overhead temperature (Td).
The column is kept at a constant operating point for four hours
to ensure that the column is in steady-state. The values of the
measurable variables obtained on average from the nominal
steady-state regime are: Qb of 45%, Rt of 14%, F of 50%, Td of 102 °C,
Tf of 80 °C, Qc of 250 L/h.
4. NARMAX model system identification
Modeling is important for simulation, control, and fault detection and diagnosis, and several approaches have been developed in the literature [22–24]. The modeling procedure can be summarized as follows:
1. Model structure detection: determining a model form suitable for the system of interest.
2. Model parameter estimation: estimating the unknown parameters of the specified model from experimental data.
3. Model validation: testing whether the established model provides good accuracy on a fresh data set that was not used during the estimation (training) phase.
For nonlinear systems, a popular black-box identification method is to use polynomial models such as the NARMAX model [22,25]. For multiple-input single-output (MISO) systems, this model is given by:

$$y(t) = f\big(y(t-1),\dots,y(t-n_y),\ u_1(t-d_1),\dots,u_1(t-d_1-n_{u_1}),\ \dots,\ u_p(t-d_p),\dots,u_p(t-d_p-n_{u_p}),\ e(t-1),\dots,e(t-n_e)\big) + e(t) \tag{5}$$
where y(t), ui(t) and e(t) denote the system output, the i-th input and the noise at discrete time t, respectively; p is the dimension of the input vector; ny, nui and ne are the maximum time lags of the output, inputs and noise, respectively; and di is the delay of input i. f is a nonlinear function, taken here to be a polynomial expansion of its arguments with nonlinearity degree l. The polynomial NARMAX model is then expressed as:
$$y(t) = \sum_{i_1=1}^{n}\beta_{i_1}x_{i_1}(t) + \sum_{i_1=1}^{n}\sum_{i_2=i_1}^{n}\beta_{i_1,i_2}\,x_{i_1}(t)\,x_{i_2}(t) + \dots + \sum_{i_1=1}^{n}\cdots\sum_{i_l=i_{l-1}}^{n}\beta_{i_1,\dots,i_l}\,x_{i_1}(t)\cdots x_{i_l}(t) + e(t) \tag{6}$$

where n = ny + nu1 + … + nup + ne, the β's are the model parameters, and the x's are the lagged output, input and noise terms. The ARMAX model is the special case of the NARMAX model obtained by setting l = 1.
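The nested sums in Eq. (6), with indices i1 ≤ i2 ≤ … ≤ il, enumerate every monomial of degree 1 to l in the n lagged variables, so the number of candidate terms (without a constant) is C(n + l, l) − 1 and grows quickly with n and l. The Python sketch below, with illustrative variable names and helper functions that are not part of the paper, generates these terms and evaluates one regressor row.

```python
from itertools import combinations_with_replacement
from math import comb, prod

def narmax_terms(lagged_names, degree):
    """All monomials of degree 1..degree, with i1 <= i2 <= ... <= il as in Eq. (6)."""
    terms = []
    for k in range(1, degree + 1):
        terms.extend(combinations_with_replacement(lagged_names, k))
    return terms

def regressor_row(lagged_values, terms):
    """Evaluate every monomial for one time step; lagged_values maps a name to its value."""
    return [prod(lagged_values[name] for name in term) for term in terms]

names = ["y(t-1)", "y(t-2)", "u(t-1)", "e(t-1)"]    # illustrative lagged variables
terms = narmax_terms(names, degree=3)
print(len(terms), comb(len(names) + 3, 3) - 1)        # 34 34
print(regressor_row({"a": 2.0, "b": 3.0}, narmax_terms(["a", "b"], 2)))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

This growth is why, as described in Section 4.2, the orders, delays and degree are increased gradually rather than fitting the largest structure at once.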
4.1. NARMAX parameters estimation algorithm
The parameter estimation of NARMAX models has received much attention, and many algorithms have been developed [26,27]. In this work, the recursive output error with extended prediction model (OEEPM) algorithm is used, mainly for its simplicity and superior performance [28].
Eq. (6) clearly belongs to the class of linear-in-the-parameters regression models. To estimate the parameters of the NARMAX model, Eq. (6) is written as:

$$y(t) = \varphi^T(t)\,\theta + e(t) \tag{7}$$

where φ(t) contains all the output, input and noise terms, as well as all their possible combinations up to degree l and up to time t−1, and θ is the vector of model parameters to be estimated. If the noise e(t) were known, an ordinary recursive least squares (RLS) method could be used for parameter estimation. In general, however, the noise is not measurable, and the sequence e(t) is estimated iteratively as:

$$\varepsilon(t) = y(t) - \hat{y}(t) \cong e(t) \tag{8}$$

where ε(t) is the residual at time t and the predicted output ŷ(t) can be written as:

$$\hat{y}(t) = \varphi^T(t)\,\hat{\theta} \tag{9}$$
The parameter vector θ can be estimated with the OEEPM algorithm, discussed in detail in [23]. Starting from Eq. (7), one adds and subtracts the term ±(a1 ŷ(t−1) + … + any ŷ(t−ny)), where a1, …, any denote the coefficients of the lagged output terms. The regression vector φ(t) then contains the predicted outputs, the inputs and the noise terms, as well as all their possible combinations up to degree l and up to time t−1. The initial values ε(0), ε(−1), …, ε(−ne) are set to zero. The complete algorithm is described by:

$$\begin{cases} \hat{\theta}(t) = \hat{\theta}(t-1) + P(t)\,\varphi(t)\,\varepsilon(t) \\[4pt] P(t) = \dfrac{1}{\lambda}\left[P(t-1) - \dfrac{P(t-1)\,\varphi(t)\,\varphi^T(t)\,P(t-1)}{\lambda + \varphi^T(t)\,P(t-1)\,\varphi(t)}\right] \\[4pt] \varepsilon(t) = y(t) - \varphi^T(t)\,\hat{\theta}(t) \end{cases} \tag{10}$$

where λ is a forgetting factor, θ̂ denotes the estimated parameter vector, ε(t) is the residual and P is the adaptation gain matrix.

Fig. 1. Experimental device: distillation column.
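The gain recursion in Eq. (10) is the standard exponentially weighted RLS update. The Python sketch below implements one step of it under an assumption about how Eq. (10) is evaluated: θ is updated with the a priori error y(t) − φᵀ(t)θ̂(t−1) multiplied by the gain P(t)φ(t) (the usual RLS form), and the a posteriori residual ε(t) = y(t) − φᵀ(t)θ̂(t) is returned afterwards. It is not a full OEEPM implementation: OEEPM builds φ(t) from predicted outputs and past residuals, whereas the toy demo uses measured outputs (an ARX-type regressor) on a hypothetical noise-free model y(t) = 0.6y(t−1) + 0.5u(t−1) + 0.1y(t−1)u(t−1), which is nonlinear in the signals but linear in the parameters.

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.99):
    """One step of RLS with forgetting factor lam, using the P(t) recursion of Eq. (10)."""
    P_phi = P @ phi
    denom = lam + phi @ P_phi
    gain = P_phi / denom                                  # equals P(t) @ phi(t)
    P_new = (P - np.outer(P_phi, P_phi) / denom) / lam    # P(t); P is symmetric
    eps_prior = y - phi @ theta                           # a priori error with theta(t-1)
    theta_new = theta + gain * eps_prior
    eps_post = y - phi @ theta_new                        # a posteriori residual
    return theta_new, P_new, eps_post

def identify_demo(n_samples=500, seed=0):
    """Recover the parameters of a noise-free toy model that is linear in its parameters."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, n_samples)
    y = np.zeros(n_samples)
    for t in range(1, n_samples):
        y[t] = 0.6 * y[t - 1] + 0.5 * u[t - 1] + 0.1 * y[t - 1] * u[t - 1]
    theta, P = np.zeros(3), 1e4 * np.eye(3)
    for t in range(1, n_samples):
        phi = np.array([y[t - 1], u[t - 1], y[t - 1] * u[t - 1]])  # measured y (ARX-type)
        theta, P, _ = rls_step(theta, P, phi, y[t])
    return theta

print(identify_demo())   # close to [0.6, 0.5, 0.1]
```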
4.2. Proper NARMAX model selection
The first stage in modeling a particular system is to select appropriate inputs and outputs. For this purpose, experimental tests were carried out to obtain a mixture rich in methylcyclohexane. Rt, Qb, ΔP, Qf and Tf are the main measurable input variables of the process, and Td is its output. This work has been reported in [29].
To identify the parameters of the model describing Td, time series of the relevant variables were recorded continuously for 13 h with a sampling period of 11 s. The experiment was designed to generate estimation and validation data that are rich in amplitude and frequency content. When the input signals are modified, Td ranges from 101.5 °C to 103.5 °C. Following [23,24], the first two thirds of the recorded data are used to estimate the model parameters and the remaining third for model validation. These data are shown in Figs. 2 and 3; for readability, Fig. 3 shows the evolution of ΔP, Rt, Qf and Qb between 1540 s and 2365 s only.
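The estimation/validation split described above keeps the time order of the record (no shuffling), since the model depends on lagged values. A minimal Python sketch follows; the sample count is only an estimate derived from 13 h at 11 s per sample, because the exact number of samples is not reported.

```python
def split_estimation_validation(data):
    """Chronological split: first two thirds for estimation, last third for validation."""
    n_est = (2 * len(data)) // 3     # integer arithmetic avoids floating-point rounding
    return data[:n_est], data[n_est:]

# 13 h sampled every 11 s gives roughly 13 * 3600 / 11 = 4254 samples (estimate only)
n_samples = (13 * 3600) // 11
est, val = split_estimation_validation(list(range(n_samples)))
print(n_samples, len(est), len(val))   # 4254 2836 1418
```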
The next step towards a reliable model of the distillation column is to choose an appropriate NARMAX polynomial structure. If the model structure were known a priori, identification could be formulated as a standard least-squares problem. In practice, however, identification is difficult because the process structure is often unknown. One way to resolve the structural problem is to estimate the model parameters using a simple structure and then gradually increase ny, nu, ne, d and l until the desired accuracy is achieved [23]. To select the optimal model, the statistical criteria AIC, RMSE and NSE were evaluated while varying the number of parameters of the candidate models, which were built according to Eq. (6). A set of candidate models was developed with maximum values ny = 5, nu = 7, ne = 5, d = [5 10 5 5 10] and l = 4. The number of terms included in the final model is determined by comparing the statistical criteria for each model structure. These criteria are defined as follows [30,31]:
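$$\mathrm{AIC} = \ln\left(\frac{N}{2}\,V\right) + \frac{2n}{N} \tag{11}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\big(\hat{y}(i) - y(i)\big)^2} \tag{12}$$

$$\mathrm{NSE} = 1 - \frac{\operatorname{var}\big(\varepsilon(t)\big)}{\operatorname{var}\big(y(t)\big)} \tag{13}$$

where V is the loss function, n is the number of estimated parameters and N is the length of the estimation data set. The loss function V is the mean of the squared residuals ε(t):

$$V = \frac{1}{N}\sum_{i=1}^{N}\varepsilon_i^2 \tag{14}$$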
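The criteria of Eqs. (11)–(14) can be computed from a measured output and a model prediction, as in the Python sketch below. The AIC line follows Eq. (11) as printed; if the intended form is the more common ln V + 2n/N, only that line changes and model rankings are unaffected (the two differ by the constant ln(N/2) for a fixed N). Note also that Eq. (13) uses the residual variance, so it coincides with the classical NSE, 1 − Σε²/Σ(y − ȳ)², only when the residual has zero mean.

```python
import numpy as np

def selection_criteria(y, y_hat, n_params):
    """AIC, RMSE and NSE of Eqs. (11)-(13), with the loss V of Eq. (14)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    eps = y - y_hat                              # residual
    N = len(y)
    V = np.mean(eps ** 2)                        # Eq. (14): mean squared residual
    aic = np.log(N / 2 * V) + 2 * n_params / N   # Eq. (11) as printed
    rmse = np.sqrt(np.mean((y_hat - y) ** 2))    # Eq. (12)
    nse = 1.0 - np.var(eps) / np.var(y)          # Eq. (13): variance ratio
    return {"V": V, "AIC": aic, "RMSE": rmse, "NSE": nse}

scores = selection_criteria([1, 2, 3, 4], [1, 2, 3, 5], n_params=2)
print({k: round(float(v), 4) for k, v in scores.items()})
```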
All three criteria require the computation of the residual ε. The AIC combines two terms: the loss function V, which decreases with model complexity, and the number of estimated parameters n, which increases with model complexity. This criterion therefore seeks a trade-off between prediction accuracy and model complexity. The RMSE measures the differences between the predicted output and the real measurements; lower RMSE values indicate better prediction quality. The NSE compares the residual variance with the variance of the measured output; a high NSE value (close to one) indicates good accuracy of the identified model. The best model structure is the one that minimizes AIC and RMSE and maximizes NSE.

Fig. 2. Estimation and validation data.
Fig. 3. (a) Evolution of heating power and reflux timer and (b) evolution of pressure drop and preheating power.
The NARMAX model is built on the basis of a linear ARMAX model. The orders ny, nui, ne and the delays d were varied, and the OEEPM algorithm was used to estimate the parameters of each candidate model. For each combination, the statistical criteria were calculated on the estimation data set. Table 1 lists the results; only the most relevant ones are reported. Based on Table 1, the best ARMAX structure is obtained with ny = 2, nu = [1 2 2 2 2], ne = 2 and d = [1 4 1 1 5], giving AIC = −0.3640, NSE = 0.9844 and RMSE = 0.0234.
To select the appropriate NARMAX model for the distillation column, the orders and delays of the best ARMAX model are used as a starting point; increasing or decreasing these values leads to the optimal values for the NARMAX model. Based on the statistical criteria, the best NARMAX structure gives AIC = −0.4126, NSE = 0.9851 and RMSE = 0.0229. The corresponding model for the overhead temperature is:
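$$\begin{aligned} T_d(t) ={}& -0.2601\,T_d(t-1) - 0.6901\,T_d(t-2) + 0.0228\,Q_b(t-1) \\ & - 0.0427\,Q_f(t-4) - 0.0558\,Q_f(t-5) + 0.0685\,R_t(t-1) \\ & + 0.1053\,R_t(t-2) + 0.0165\,\Delta P(t-1) - 0.0199\,\Delta P(t-2) \\ & - 0.0971\,T_f(t-5) - 0.0027\,T_f(t-6) + 0.3524\,e(t-1) \\ & + 0.4175\,e(t-2) + 0.2139\,T_d(t-2)\,R_t(t-1)\,\Delta P(t-5) \\ & + 0.1093\,R_t(t-6)\,\Delta P(t-1)\,\Delta P(t-6) \\ & - 0.1050\,Q_b(t-1)\,R_t(t-4)\,\Delta P(t-1) \end{aligned} \tag{15}$$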
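To show how Eq. (15) is used for residual generation, the Python sketch below codes its coefficients as a one-step predictor and computes e(t) = Td(t) − T̂d(t), with the initial residuals set to zero as in Section 4.1. Two assumptions should be kept in mind: the signals must be in whatever units or preprocessing were used during identification (not stated in the text), and measured Td values are fed into the regressor, whereas an OEEPM-style predictor would use predicted outputs. The function names are illustrative.

```python
def td_predict(Td, Qb, Qf, Rt, dP, Tf, e, t):
    """One-step prediction of Td(t) from Eq. (15); requires t >= 6 (largest lag)."""
    return (-0.2601 * Td[t-1] - 0.6901 * Td[t-2] + 0.0228 * Qb[t-1]
            - 0.0427 * Qf[t-4] - 0.0558 * Qf[t-5] + 0.0685 * Rt[t-1]
            + 0.1053 * Rt[t-2] + 0.0165 * dP[t-1] - 0.0199 * dP[t-2]
            - 0.0971 * Tf[t-5] - 0.0027 * Tf[t-6] + 0.3524 * e[t-1]
            + 0.4175 * e[t-2] + 0.2139 * Td[t-2] * Rt[t-1] * dP[t-5]
            + 0.1093 * Rt[t-6] * dP[t-1] * dP[t-6]
            - 0.1050 * Qb[t-1] * Rt[t-4] * dP[t-1])

def residual_sequence(Td, Qb, Qf, Rt, dP, Tf):
    """Residual e(t) = Td(t) - predicted Td(t); the first six residuals stay at zero."""
    e = [0.0] * len(Td)
    for t in range(6, len(Td)):
        e[t] = Td[t] - td_predict(Td, Qb, Qf, Rt, dP, Tf, e, t)
    return e

ones = [1.0] * 10
print(td_predict(ones, ones, ones, ones, ones, ones, [0.0] * 10, 6))  # about -0.7371
```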
Eq. (15) shows that three nonlinear (third-degree) terms have been added to the model. The NARMAX model reduces the AIC compared with the ARMAX model (−0.4126 versus −0.3640).
During the development of the different models, the NARMAX model proved more robust than the ARMAX model to variations in the orders and delays: small variations of the optimal values of ny, nu, ne and d found for the best ARMAX model degrade its prediction accuracy. Consequently, the NARMAX model is better suited than the ARMAX model to predicting the overhead temperature.
Validation is the last step of the system identification procedure: the performance of the identified NARMAX model is evaluated on the validation data set. As in [28,29], the properties of the residual (Fig. 4) must be analyzed. Several model validation techniques exist [22]. In general, the model producing the smallest residual best represents the real plant. A standard validation test checks the whiteness of the residual and its independence from the inputs [23,24,32]; if these conditions are satisfied, the identified model has effectively captured the system dynamics.
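The two residual checks mentioned above, whiteness and independence from the inputs, can be sketched with normalized correlations compared against the usual large-sample 95% band ±1.96/√N, as below. This is a sketch under those assumptions, not the paper's exact procedure: Billings and Voon [32] also propose higher-order correlation tests for nonlinear models, which are omitted here, and with many lags a few correlations may fall outside the band even for a valid model, so the strict "all lags inside" rule used here is conservative.

```python
import numpy as np

def normalized_xcorr(a, b, max_lag):
    """r(k) = sum_t a(t) b(t+k) / sqrt(sum a^2 * sum b^2), k = 0..max_lag, means removed."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    n = len(a)
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return np.array([np.sum(a[: n - k] * b[k:]) / denom for k in range(max_lag + 1)])

def residual_tests(eps, inputs, max_lag=20):
    """Whiteness of eps and independence of eps from each input, at about the 95% level."""
    bound = 1.96 / np.sqrt(len(eps))
    r_ee = normalized_xcorr(eps, eps, max_lag)[1:]          # lag 0 is always 1
    white = bool(np.all(np.abs(r_ee) < bound))
    independent = all(
        bool(np.all(np.abs(normalized_xcorr(u, eps, max_lag)) < bound)) for u in inputs
    )
    return white, independent

rng = np.random.default_rng(2)
u = rng.normal(size=2000)                 # synthetic input
colored = np.zeros(2000)                  # synthetic AR(1) residual: clearly not white
for t in range(1, 2000):
    colored[t] = 0.9 * colored[t - 1] + rng.normal()
print(residual_tests(colored, [u])[0])    # False: r(1) is close to 0.9
```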
5. Fault detection results
In this section, the performance of the KLD-based monitoring strategy is assessed by using it to detect faults in the distillation column. The identified model generates a residual, on which anomalies are detected by the FD procedure. To verify the ability and effectiveness of the proposed method, experimental faults were introduced on the laboratory-scale distillation plant. The detection technique assumes that the residual variance remains unchanged and that a fault shifts only the residual mean. The parameters μ0 and σ0 were determined from residual measurements in the fault-free case.
The distillation column is widely used in the chemical industry, and early detection of faults can help to avoid productivity losses and harm to human health. The distillation unit can be affected by several faults; the major categories concern the heating power, preheating power, feed pump, reflux and pressure drop. To evaluate the validity and effectiveness of the proposed method as a monitoring tool, a sudden increase of the heating power (Qb) and a sudden increase of the reflux timer (Rt) to 100% were selected, under the assumption that only a single fault occurs at any time. These faults are introduced at instant 10835 s and cause a deviation from the normal mode (Fig. 5). The first 100 values of the temperature (Td) are assumed to be fault-free.

Table 1
Comparison between different ARMAX model structures estimated with the OEEPM algorithm.
Orders [ny, nu, ne]     AIC       NSE      RMSE     Delays d
[2, [1 2 2 1 2], 2]     −0.3591   0.9843   0.0234   [1 4 1 1 5]
[2, [1 2 2 1 1], 2]     −0.2792   0.9830   0.0244   [1 4 2 1 5]
[2, [1 2 1 1 1], 2]     −0.2670   0.9828   0.0246   [1 4 2 1 5]
[2, [1 1 1 1 1], 2]     −0.2635   0.9827   0.0246   [1 4 2 1 5]
[2, [1 2 2 2 2], 2]     −0.3640   0.9844   0.0234   [1 4 1 1 5]
[2, [1 2 2 2 1], 2]     −0.3323   0.9839   0.0238   [1 3 1 2 5]
[2, [1 2 3 1 2], 2]     −0.3301   0.9839   0.0238   [1 3 1 2 4]
[2, [1 3 2 1 2], 2]     −0.2370   0.9823   0.0249   [1 5 2 2 5]
[2, [2 2 2 1 2], 2]     −0.3165   0.9837   0.0240   [1 5 1 2 6]
[2, [1 2 2 1 2], 2]     −0.2783   0.9830   0.0244   [1 5 2 1 5]
[2, [1 2 3 1 2], 2]     −0.2750   0.9830   0.0244   [1 3 2 1 4]
[2, [1 3 2 1 2], 2]     −0.3524   0.9843   0.0235   [1 4 1 2 5]
[2, [2 2 2 2 2], 2]     −0.3627   0.9844   0.0234   [1 5 1 1 5]
[2, [1 2 2 1 1], 2]     −0.3149   0.9836   0.0240   [1 3 1 1 3]
[2, [1 2 2 1 3], 2]     −0.3534   0.9843   0.0235   [1 5 1 2 5]
[3, [1 2 3 1 2], 3]     −0.3561   0.9844   0.0234   [2 4 1 1 5]
[3, [1 3 2 1 2], 3]     −0.3572   0.9844   0.0234   [2 4 1 1 6]
[3, [1 2 2 2 2], 3]     −0.3626   0.9844   0.0234   [2 4 1 2 6]

Fig. 4. Difference between the best NARMAX model and actual measurements of overhead temperature with validation data set.
These fault-free values are used to estimate the reference pdf. Note that the effect of the sudden increase of the heating power (Qb) was compensated by the supervisory control system of the distillation column 17 samples (187 s) after its occurrence, as shown in Fig. 5a.
These anomalies should be detected by the KLD. The dissimilarity between the current pdf and the reference pdf, before and after the faults, was computed according to Eq. (3). Fig. 6 presents the evolution of the KLD for the selected faults. The KLD exceeds the predefined threshold TD after each fault, which shows the ability of the proposed technique to detect real faults. TD is the level below which the KLD is still considered fault-free (its deviations are then due only to modeling errors and noise). The threshold is defined from the KLD of the residual obtained with fault-free data and is calculated as:

$$T_D = M + 3\sigma \tag{16}$$

where M and σ are the mean and the standard deviation of the KLD values computed on fault-free data.
As shown in Fig. 6, faults are detected by comparing the KLD with its threshold. The first fault (increase of the heating power) was detected at 10879 s, i.e., with a delay of 44 s. This delay corresponds to a difference ΔTd = −0.4 °C between the desired overhead temperature (Td = 101.4 °C) and the faulty one (Td = 101.8 °C). The second fault (increase of the reflux timer Rt) was confirmed 33 s after its occurrence, which corresponds to a difference ΔTd = −2.3 °C between the desired and the faulty overhead temperatures. The detection delay is defined as the difference between the time of fault detection and the time of fault occurrence. It depends mainly on the fault amplitude, the time profile of the fault (abrupt, incipient or intermittent) and the evolution of the dynamic behavior of the separation unit.
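The reported delays can be checked with a few lines of arithmetic. The Python sketch below assumes that the fault experiments use the same 11 s sampling period as the identification data, which is consistent with the "17 samples (187 s)" figure given above; the helper name is illustrative.

```python
def detection_delay(t_fault, t_detect, ts):
    """Detection delay in seconds and in samples for a sampling period ts (seconds)."""
    delay = t_detect - t_fault
    return delay, delay / ts

TS = 11.0                                   # sampling period, seconds
print(detection_delay(10835, 10879, TS))    # heating-power fault: (44, 4.0)
print(33 / TS, 187 / TS)                    # reflux-timer delay, Qb compensation: 3.0 17.0
print(round(101.4 - 101.8, 1))              # Delta Td for the first fault: -0.4
```

Both faults are therefore detected within four samples of their occurrence.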
As indicated by these experimental case studies, the KLD values computed from the residual of the identified NARMAX model can be used to distinguish between the normal and abnormal modes.

Fig. 5. Evolution of the overhead temperature caused by the chosen faults.
Fig. 6. Results of KLD for the chosen faults.

6. Conclusion
The Kullback–Leibler divergence (KLD) has been widely used in information theory to quantify the dissimilarity between two distributions. In this work, it is used as a fault detection criterion that exploits the residual of a NARMAX model. In the proposed approach, black-box modeling is used to establish a NARMAX model from input–output experimental measurements in order to obtain a reliable model of the system. The KLD is then applied to measure the dissimilarity between the probability density of the NARMAX model residual and the reference distribution obtained under normal operating conditions. The effectiveness of the proposed approach has been illustrated on experimental faults in a distillation column, and the results demonstrate the potential of the adopted fault detection method.
The KLD can help avoid dangerous behavior of the distillation unit by triggering suitable alarms so that appropriate actions can be taken on the process.
References
[1] Chetouani Y. Model selection and fault detection approach based on Bayes
decision theory: Application to changes detection problem in a distillation
column. Process Saf Environ Prot 2014;92:215–23.
[2] Wong PK, Yang Z, Vong CM, Zhong J. Real-time fault diagnosis for gas turbine
generator systems using extreme learning machine. Neurocomputing
2014;128:249–57.
[3] Harrou F, Nounou MN, Nounou HN, Madakyaru M. Statistical fault detection using
PCA-based GLR hypothesis testing. J Loss Prev Process Ind 2013;26:129–39.
[4] Kouadri A, Aitouche MA, Zelmat M. Variogram-based fault diagnosis in an
interconnected tank system. ISA Trans 2012;51:471–6.
[5] Goffaux G, Wouwer AV, Bernard O. Continuous–discrete interval observers
applied to the monitoring of cultures of microalgae. J Proc Control
2009;19:1182–90.
[6] Ding SX. Model-based Fault Diagnosis Techniques: Design Schemes, Algo-
rithms, and Tools. Berlin: Springer; 2008.
[7] Blanke M, Kinnaert M, Lunze J, Staroswiecki M. Diagnosis and Fault-Tolerant
Control. Berlin: Springer; 2006.
[8] Isermann R. Fault-Diagnosis Systems, An Introduction from Fault Detection to
Fault Tolerance. Berlin: Springer; 2006.
[9] Chai W, Qiao J. Passive robust fault detection using RBF neural modeling based
on set membership identification. Eng Appl Artif Intell 2014;28:1–12.
[10] Akhenak A, Duviella E, Bako B, Lecoeuche S. Online fault diagnosis using
recursive subspace identification: Application to a dam-gallery open channel
system. Control Eng Pract 2013;21:797–806.
[11] Peng ZK, Lang ZQ, Wolters C, Billings SA, Worden K. Feasibility study of
structural damage detection using NARMAX modelling and Nonlinear Output
Frequency Response Function based analysis. Mech Syst Signal Process
2011;25:1045–61.
[12] Shashoa NAA, Kvaščev G, Marjanović A, Worden K. Sensor fault detection and
isolation in a thermal power plant steam separator. Control Eng Pract
2013;21:908–16.
[13] Basseville M, Nikiforov IV. Detection of Abrupt Changes: Theory and Application. Englewood Cliffs, New Jersey: Prentice Hall; 1993.
[14] Belov DI, Armstrong RD. Distributions of the Kullback–Leibler divergence with
applications. Br J Math Stat Psychol 2011;64:291–309.
[15] Hershey JR, Olsen PA. Approximating the Kullback–Leibler divergence between Gaussian mixture models. In: Proc. IEEE ICASSP, vol. 4. Hawaii, USA; 15–20 April 2007. p. 317–20.
[16] Silva J, Narayanan S. Average divergence distance as a statistical discrimination
measure for hidden Markov models. IEEE Trans Audio Speech Lang Process
2006;14:890–906.
[17] Harmouche J, Delpha C, Diallo D. Incipient fault detection and diagnosis based on Kullback–Leibler divergence using principal component analysis: Part I. Signal Process;94:278–87.
[18] Houerbi KR, Salamatian K, Kamoun F. Scan surveillance in internet networks.
Networking 2009:614–25.
[19] Georgoulas G, Mustafa MO, Tsoumas IP, Antonino-Daviu JA, Climente-Alarcon
V, Stylios CD, Nikolakopoulos G. Principal Component Analysis of the start-up
transient and Hidden Markov Modeling for broken rotor bar fault diagnosis in
asynchronous machines. Expert Syst Appl 2013;40:7024–33.
[20] Shi Z, Gu F, Lennox B, Ball AD. The development of an adaptive threshold for
model-based fault detection of a nonlinear electro-hydraulic system. Control
Eng Pract 2005;13:1357–67.
[21] Pukelsheim F. The three sigma rule. Am Stat 1994;48:88–91.
[22] Billings SA. Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. New York: Wiley; 2013.
[23] Landau ID, Lozano R, M'Saad M, Karimi A. Adaptive Control: Algorithms,
Analysis and Applications. London: Springer; 2011.
[24] Ljung L. System Identification, Theory for the User. Englewood Cliffs, New
Jersey: Prentice Hall; 1999.
[25] Leontaritis IJ, Billings SA. Input–output parametric models for nonlinear sys-
tems. Int J Control 1985;4:303–44.
[26] Rahrooh A, Shepard S. Identification of nonlinear systems using NARMAX
model. Nonlinear Anal Theory Methods Appl 2009;71:e1198–202.
[27] Billings SA, Korenberg MJ, Chen S. Identification of nonlinear output-affine
systems using an orthogonal least squares algorithm. Int J Syst Sci
1988;19:1559–68.
[28] Aggoune L, Chetouani Y, Radjeai H. Recursive identification of the dynamic behavior in a distillation column by means of autoregressive models. J Dyn Syst Meas Control 2014;136:044506.
[29] Chetouani Y. Model-order reduction based on artificial neural networks for
accurate prediction of the product quality in a distillation column. Int J Autom
Control 2009;3:332–51.
[30] Akaike H. A new look at the statistical model identification. IEEE Trans Autom
Control 1974;19:716–23.
[31] Nash JE, Sutcliffe JV. River flow forecasting through conceptual models. J Hydrol 1970;10:282–90.
[32] Billings SA, Voon WSF. Correlation based model validity tests for nonlinear
models. Int J Control 1986;44:235–44.
