BAYESIAN METHODS FOR ASSESSING WATER QUALITY
Khalil Shihab (1) and Nida Al-Chalabi (2)
(1) College of Engineering & Science, Victoria University, Australia
Khalil.shihab@gmail.com
(2) Department of Computer Science, SQU, Oman
nida@squ.edu.om

ABSTRACT
This work presents the development of Bayesian techniques for the assessment of groundwater
quality. Its primary aim is to develop a predictive model and a computer system to assess and
predict the impact of pollutants on the water column. The analysis begins by postulating a model
in light of all available knowledge about the relevant phenomena. The previous knowledge,
represented by the prior distribution of the model parameters, is then combined with the new
data through Bayes' theorem to yield the current knowledge, represented by the posterior
distribution of the model parameters. This updating of information about the unknown model
parameters is repeated sequentially as new information becomes available.

KEYWORDS
Bayesian Belief Networks, Water Quality Assessment, Data Mining

1. INTRODUCTION
Water is an essential requirement for irrigated agriculture and for domestic uses, including
drinking, cooking and sanitation. Declining surface and groundwater quality is regarded as one of
the most serious and persistent issues and has become a global problem affecting people and
ecosystems. Anthropogenic sources of pollution, such as agriculture, industry and municipal
waste, contribute to the degradation of groundwater quality, which may limit the use of these
resources and lead to health risks. For these reasons, the need for intensive management of
groundwater resources has become more urgent.
In this work, we studied the Salalah area of Oman because the groundwater has been an important
natural resource and the only available water source other than the seasonal rainfall.
Groundwater quality and pollution are determined and measured by comparing physical,
chemical, biological, microbiological, and radiological quantities and parameters to a set of
standards and criteria. A criterion is basically a scientific quantity upon which a judgment can be
based [1]. In this work, however, we considered only the chemical parameters: total dissolved
solids (TDS), electrical conductivity (EC) and water pH.

David C. Wyld et al. (Eds) : CCSIT, SIPP, AISC, PDCTA, NLP - 2014
pp. 397–407, 2014. © CS & IT-CSCP 2014

DOI : 10.5121/csit.2014.4234
Computer Science & Information Technology (CS & IT)

2. UNCERTAINTY ANALYSIS
The Ministry of Water Resources (MWR) maintains data on the concentration of harmful
substances in the groundwater at the Taqah monitoring sites, which are located in the south of
the Sultanate of Oman, in the Salalah plain [2, 3]. We observed that good quality data were
obtained from several monitoring wells in this region. Because certain areas in the region lack
monitoring wells, we filled in the missing measurements with data obtained from the Oman Mining
Company (OMCO) and the Ministry of Regional Municipalities and Environment (MRME) [4].
Data for water quality assessment are normally collected from various monitoring wells and then
analyzed in environmental laboratories to measure the concentration of a number of water quality
constituents. We found that the methods used by these laboratories do not emphasize accuracy,
and that both laboratory and validation personnel are largely unaware of the possibility of false
positives in environmental data. To overcome this problem and obtain representative data, we
used a modified version of the Bayesian model developed by Banerjee, Plantinga and Ramirez [6]
to preprocess the datasets used for the development of the Bayesian Networks.

2.1. Bayesian Models
The formulation of the model is as follows:
Let S denote a particular hazardous constituent of interest. Since the concentration of the
substance may vary from one well to another, each well is considered separately. Let
$x_t = (x_{t1}, x_{t2}, \ldots, x_{tm})$ be the vector of m measurements of the concentration of S
in m distinct water samples from a given well at sampling occasion t, where $m \ge 1$ and
$t = 1, 2, \ldots$. Each measurement consists of the true concentration of S plus an error.
Let $X_t$ be the true concentration of S in the groundwater at sampling occasion t. Treating the
unknown $X_t$ as a random variable, the model evaluates the posterior distribution of $X_t$ given
the sample measurements $x_t$ at sampling occasion t.
Under the normality assumption, and given $X_t$ and $\sigma^2$, the concentration measurements in
$x_t$ form a random sample of size m from a normal distribution with mean $X_t$ and variance
$\sigma^2$.
We assume that the parameters $X_t$ and $\sigma^2$ of this normal distribution are themselves
random variables with a prior probability distribution. The prior model for $X_t$ and $\sigma^2$
is as follows:
For $t = 1, 2, \ldots$ and given $\sigma^2$, the conditional distribution of $X_t$ at sampling
occasion t is a normal distribution with mean $\mu_{t-1}$ and variance $\sigma_{t-1}^2 \sigma^2$.
The marginal distribution of $\sigma^2$ is an inverted gamma distribution with parameters
$\beta_{t-1}$ and $\nu_{t-1}$.
The model starts from the following prior distribution, which represents the knowledge about the
concentration before the first sampling.
Up to a normalizing constant, the pdf of the prior distribution of $X_0$ is:

$$ f_0(X_0) \propto \left[ 1 + \frac{1}{2\nu_0} \left( \frac{X_0 - \mu_0}{\sigma_0 \sqrt{\beta_0 / \nu_0}} \right)^2 \right]^{-(2\nu_0 + 1)/2} \qquad (2.1) $$

which is the pdf of Student's t-distribution with $2\nu_0$ degrees of freedom, location parameter
$\mu_0$ and squared scale parameter $\sigma_0^2 \beta_0 / \nu_0$.
Now suppose that observations on the concentration of S are available. Given the sample $x_t$,
the posterior marginal distribution of $X_t$ is a Student's t-distribution with $2\nu_t$ degrees of
freedom, location parameter $\mu_t$ and squared scale parameter $\sigma_t^2 \beta_t / \nu_t$. Up
to a normalizing constant, its pdf has the form:

$$ f_t(X_t \mid x_t) \propto \left[ 1 + \frac{1}{2\nu_t} \left( \frac{X_t - \mu_t}{\sigma_t \sqrt{\beta_t / \nu_t}} \right)^2 \right]^{-(2\nu_t + 1)/2} \qquad (2.2) $$

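where:

$$ \beta_t = \beta_{t-1} + \frac{1}{2} \sum_{j=1}^{m} (x_{tj} - \bar{x}_t)^2 + \frac{m (\mu_{t-1} - \bar{x}_t)^2}{2 (1 + m \sigma_{t-1}^2)} $$

$$ \nu_t = \nu_{t-1} + m/2 $$

$$ \mu_t = \frac{\mu_{t-1} + m \bar{x}_t \sigma_{t-1}^2}{1 + m \sigma_{t-1}^2} $$

$$ \sigma_t^2 = \frac{\sigma_{t-1}^2}{1 + m \sigma_{t-1}^2} \qquad (2.3) $$

$$ \bar{x}_t = \frac{1}{m} \sum_{j=1}^{m} x_{tj} $$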
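As an illustrative rearrangement of (2.3), not a step taken from the original derivation, the update for the location parameter can be read as a weighted average of the previous location and the new sample mean. The weight $w_t$ is a symbol introduced only for this note.

```latex
\[
\mu_t = w_t\,\mu_{t-1} + (1 - w_t)\,\bar{x}_t,
\qquad
w_t = \frac{1}{1 + m\,\sigma_{t-1}^2},
\qquad
\sigma_t^2 = w_t\,\sigma_{t-1}^2
\;\Longleftrightarrow\;
\frac{1}{\sigma_t^2} = \frac{1}{\sigma_{t-1}^2} + m .
\]
```

Read this way, each sampling occasion adds m to the precision term $1/\sigma_t^2$, so a diffuse prior (large $\sigma_0^2$) gives a small $w_1$ and the first posterior location is dominated by the data. With the prior value $\sigma_0^2 = 3056.34$ quoted in Section 2.2 and $m = 1$, $w_1 = 1/3057.34 \approx 0.0003$.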

The equation for $\mu_t$ makes the sequential nature of this posterior distribution clear: the
posterior parameters at occasion t-1 serve as the prior parameters at occasion t. To report the
unknown true concentration of the substance S in the well under consideration, it is often more
convenient to give an interval that contains most of the posterior probability. Such intervals
are called highest posterior density (HPD) intervals. For a given probability content
$(1-\alpha)$, $0 < \alpha < 1$, a $100(1-\alpha)$ percent HPD interval for $X_t$ is given by:

$$ \mu_t \pm t_{2\nu_t}(\alpha/2) \, \sigma_t \sqrt{\beta_t / \nu_t} \qquad (2.4) $$

where $t_{2\nu_t}(\alpha/2)$ is the $100(1-\alpha/2)$th percentile of Student's t-distribution with
$2\nu_t$ degrees of freedom.
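The interval in (2.4) can be sketched in Python as follows. This is a minimal illustration, not the authors' Visual Basic implementation: the function names are hypothetical, the t percentile is obtained numerically (Simpson's rule plus bisection) so that non-integer degrees of freedom are handled with the standard library only, and the parameter values in the usage line are made up.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution; df may be non-integer."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus a Simpson's rule integral of the density on [0, x]."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Percentile for 0.5 < p < 1, found by bracketing and bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hpd_interval(mu, sigma2, beta, nu, alpha=0.01):
    """Return (LHPD, UHPD) as in (2.4); sigma2 is the squared sigma_t."""
    half_width = t_quantile(1 - alpha / 2, 2 * nu) * math.sqrt(sigma2 * beta / nu)
    return mu - half_width, mu + half_width


# Hypothetical posterior parameters, for illustration only.
low, high = hpd_interval(mu=3.0, sigma2=0.05, beta=2.0, nu=10.0, alpha=0.01)
print(round(low, 3), round(high, 3))
```

Because the t-distribution is symmetric and unimodal, the equal-tailed interval computed here coincides with the HPD interval.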

2.2. Bayesian Algorithm
In brief, the monitoring algorithm, which is based on the Bayesian model, is as follows:
(1) Fix a value of $\alpha$ ($0 < \alpha < 1$) based on the desired confidence level. In this
case, we chose $\alpha = 0.01$.
(2) Since we did not have enough data to estimate them, we used the prior distribution
parameters from the model of Banerjee, Plantinga and Ramirez [6]:
$\beta_0 = 0.0073$, $\nu_0 = 2.336$, $\mu_0 = 9.53$, $\sigma_0^2 = 3056.34$.

(3) At each sampling occasion t (t = 1, 2, ...), compute the parameters $\beta_t$, $\nu_t$,
$\mu_t$ and $\sigma_t^2$ of the posterior distribution of $X_t$ from the observations $x_t$ on
the concentration of S at a given well in a given site, using (2.3). Compute the lower and upper
HPD limits (LHPD and UHPD) from these parameter estimates and (2.4).
(4) Plot $\mu_t$, LHPD and UHPD obtained in step 3 against sampling occasion t.
(5) For the next sampling occasion, update the values of $\beta_t$, $\nu_t$, $\mu_t$ and
$\sigma_t^2$ using (2.3) and the newly obtained data. Recompute LHPD and UHPD using the updated
parameter values in (2.4) and repeat step 4.
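Steps (2), (3) and (5) above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' Visual Basic system: the names `Params`, `update` and `monitor` are hypothetical, the squared terms in the `beta` update follow the standard normal and inverted-gamma conjugate form, and each sampling occasion is assumed to contribute a single measurement (m = 1). The three input values are the first three observed concentrations listed in Table 1. The HPD limits and the plotting of step (4) are left out.

```python
from typing import List, NamedTuple, Sequence


class Params(NamedTuple):
    mu: float      # location parameter mu_t
    sigma2: float  # variance multiplier sigma_t^2
    beta: float    # inverted-gamma parameter beta_t
    nu: float      # inverted-gamma parameter nu_t


def update(prior: Params, samples: Sequence[float]) -> Params:
    """Apply the update equations (2.3) once, for one sampling occasion."""
    m = len(samples)
    xbar = sum(samples) / m
    spread = sum((x - xbar) ** 2 for x in samples)
    denom = 1 + m * prior.sigma2
    return Params(
        mu=(prior.mu + m * xbar * prior.sigma2) / denom,
        sigma2=prior.sigma2 / denom,
        beta=prior.beta + spread / 2 + m * (prior.mu - xbar) ** 2 / (2 * denom),
        nu=prior.nu + m / 2,
    )


def monitor(prior: Params, occasions: Sequence[Sequence[float]]) -> List[Params]:
    """Fold each occasion's samples into the running posterior (steps 3 and 5)."""
    history = []
    state = prior
    for samples in occasions:
        state = update(state, samples)
        history.append(state)
    return history


# Prior parameters from step (2); one measurement per occasion is an assumption.
prior = Params(mu=9.53, sigma2=3056.34, beta=0.0073, nu=2.336)
occasions = [[1.147], [1.106], [1.938]]
history = monitor(prior, occasions)
print([round(p.mu, 2) for p in history])  # [1.15, 1.13, 1.4]
```

Under the m = 1 assumption, the printed posterior locations agree with the first three ETC entries of Table 1 after rounding to two decimals.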
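Some of these datasets needed to be scaled down using the following normalization technique:

$$ x' = \frac{x - \bar{x}}{\sigma}, \quad \text{where} \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \quad \text{and} \quad \sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n \bar{x}^2}{n - 1}} $$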
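A minimal Python sketch of this normalization is given below. The function name `standardize` is hypothetical and the input values are made up for illustration; the standard deviation uses the n - 1 denominator of the formula above.

```python
import math


def standardize(values):
    """Scale values to zero mean and unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance written as in the text: (sum of squares - n * mean^2) / (n - 1).
    variance = (sum(v * v for v in values) - n * mean * mean) / (n - 1)
    sd = math.sqrt(variance)
    return [(v - mean) / sd for v in values]


# Made-up values, for illustration only.
print(standardize([2.0, 4.0, 6.0]))  # [-1.0, 0.0, 1.0]
```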

2.3. Implementation
The pre-processing system is implemented on a PC platform using the Visual Basic programming
language.
Table 1 presents the concentration data for TDS (Total Dissolved Solids) for Well 001/577 in the
Taqah area. In particular, the table shows the expected true concentration of TDS produced by
our pre-processing system, together with the HPD limits.
Table 1. Concentration data of TDS for Well 001/577 in the Salalah plain, where OC stands for Observed
Concentration, ETC stands for Expected True Concentration, and LHPD and UHPD are the lower and upper
HPD limits.

Year    OC       LHPD    ETC     UHPD
1984    1.147    0.85    1.15    1.45
1985    1.106    1       1.13    1.26
1986    1.938    1.12    1.4     1.68
1987    2.237    1.33    1.61    1.88
1988    3.857    1.6     2.06    2.52
1989    3.834    1.91    2.35    2.79
1990    3.957    2.18    2.58    2.98
1991    3.761    2.38    2.73    3.08
1992    4.3      2.58    2.9     3.23
1993    3.958    2.72    3.01    3.3
1994    1        2.54    2.83    3.11
1995    3.714    2.64    2.9     3.16
1996    3.65     2.73    2.96    3.19
1997    3.381    2.78    2.99    3.2
1998    3.396    2.83    3.02    3.2
1999    3.477    2.87    3.04    3.22
2000    3.498    2.91    3.07    3.23
2001    3.23     2.93    3.08    3.23
2002    3.243    2.95    3.09    3.22
2003    3.267    2.97    3.1     3.22
2004    3.297    2.99    3.11    3.22

3. BAYESIAN NETWORKS
After the pre-processing stage, we constructed a Bayesian Network (BN) by using the Hugin
system. We then used this BN as an initial building network for the construction of two Dynamic
Bayesian Networks in order to predict the impact of pollution on groundwater quality.

3.1. Dynamic Bayesian Networks (DBNs)
DBNs extend Bayesian Networks from static domains to dynamic domains [7, 8]. This is
achieved by introducing relevant temporal dependencies between the representations of the static
network at different times.
DBNs rest on the following assumptions. Let $X_t$ be the state of the system at time t, and
assume that:
(1) The process is Markovian, i.e.,
$P(X_t \mid X_0, X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$
(2) The process is stationary or time-invariant, i.e.,
$P(X_t \mid X_{t-1})$ is the same for every t.
Therefore, to specify a Dynamic Bayesian Network (DBN) we need only $P(X_0)$, which is a static
Bayesian network (BN), and $P(X_t \mid X_{t-1})$, which is a network fragment in which the
variables in $X_{t-1}$ have no parents.
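The two assumptions can be illustrated with a small Python sketch that unrolls a single discrete state variable over several time slices. Everything in it is hypothetical: the state names, the prior P(X0) and the transition probabilities P(Xt | Xt-1) are made-up numbers chosen for illustration, and a real DBN built in Hugin would have several variables per slice.

```python
def step(belief, transition):
    """One time slice: P(X_t = s2) = sum over s1 of P(X_{t-1} = s1) * P(s2 | s1)."""
    states = list(belief)
    return {
        s2: sum(belief[s1] * transition[s1][s2] for s1 in states)
        for s2 in states
    }


def unroll(prior, transition, n_slices):
    """Reuse the same transition table in every slice (stationarity)."""
    beliefs = [prior]
    for _ in range(n_slices):
        beliefs.append(step(beliefs[-1], transition))
    return beliefs


# Hypothetical two-state example: water quality is either "good" or "poor".
prior = {"good": 0.8, "poor": 0.2}                  # P(X_0)
transition = {                                       # P(X_t | X_{t-1})
    "good": {"good": 0.9, "poor": 0.1},
    "poor": {"good": 0.3, "poor": 0.7},
}
beliefs = unroll(prior, transition, n_slices=3)
for t, b in enumerate(beliefs):
    print(t, {s: round(p, 4) for s, p in b.items()})
```

Only the prior and one transition table are stored, which is exactly what the Markov and stationarity assumptions buy: the same fragment is reused for every slice.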

3.2. Bayesian Networks Development
Among more than twenty wells in the Taqah area, we selected only four for this study. These four
wells have the most complete data measurements and provide sufficient information for the
assessment of groundwater quality in this area.
The electrical conductivity (EC) of the water has been used as a measure of the salinity hazard of
the groundwater used for irrigation in the Salalah plain. The total dissolved solids (TDS) limit
is 600 mg/L, which is the objective of the current plan of the MWR. TDS comprises several
dissolved solids, but 90% of its concentration is made up of six constituents: sodium (Na),
magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3) and sulfate (SO4). We therefore
considered only these constituents in the calculation of TDS.
We also used the following relationship between TDS and EC.
TDS = A * EC; where A is a constant with value between 0.65 and 0.77.
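As an illustrative consequence of this relationship (a worked rearrangement, not a calculation reported in the original study), the bounds on A translate the 600 mg/L TDS limit into bounds on EC, assuming EC is expressed in the units for which A was calibrated:

```latex
\[
\mathrm{EC} < \frac{600}{0.77} \approx 779 \;\Rightarrow\; \mathrm{TDS} = A \cdot \mathrm{EC} < 600\ \text{mg/L},
\qquad
\mathrm{EC} > \frac{600}{0.65} \approx 923 \;\Rightarrow\; \mathrm{TDS} = A \cdot \mathrm{EC} > 600\ \text{mg/L}.
\]
```

For EC values between roughly 779 and 923, whether the limit is exceeded depends on the actual value of A.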
Both TDS and EC can affect water acidity (pH). Dissolved chemical constituents tend to be present
in higher concentrations at lower pH (higher acidity). Acidity, in turn, allows the migration of
hydrogen ions (H+), which is an indication of conductivity. Our work therefore concentrated on
the following pairwise relations:
TDS and EC, EC and pH, TDS and pH
Based on these relations, we used two learning approaches to construct and parameterize a simple
static BN with three nodes, each representing a groundwater quality constituent (TDS, EC or pH).
Learning consists of two components: 1) learning the network structure, and 2) learning the
conditional probability distributions.
For the first component, we used the Hugin system, which supports structure and parameter
learning in Bayesian networks. For the second, we developed a C++ program that generates the
conditional probabilities for TDS, EC and pH using Table 2 as input.
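The idea of generating conditional probabilities from the tabulated data can be sketched as follows. This is a hedged illustration in Python rather than the authors' C++ program, whose details are not given. The function names, the two-state discretization, the EC cut-off of 700 µS/cm, the add-one smoothing and the choice of TDS as the parent of EC are all assumptions made for the example; the 600 mg/L TDS cut-off is the limit quoted earlier, and the data are the TDS and EC columns of Table 2.

```python
from collections import Counter

# TDS (mg/L) and EC (uS/cm) columns of Table 2, 1984 to 2003.
TDS = [542.7, 525.5, 565.4, 604.2, 541.8, 565.9, 558.6, 640.4, 754.5, 798.7,
       746.4, 615.8, 737.5, 753.6, 935.6, 1174, 1021, 1067, 1223, 1055]
EC = [548, 548, 579, 588, 601, 625, 638, 798, 739, 758,
      799, 514, 619, 869, 558, 855, 796, 855, 844, 881]


def discretize(values, threshold):
    """Map each measurement to a 'low' or 'high' state."""
    return ["high" if v > threshold else "low" for v in values]


def conditional_table(parent, child, states=("low", "high"), pseudo=1.0):
    """Estimate P(child | parent) from paired labels with add-one smoothing."""
    counts = Counter(zip(parent, child))
    table = {}
    for p in states:
        total = sum(counts[(p, c)] for c in states) + pseudo * len(states)
        table[p] = {c: (counts[(p, c)] + pseudo) / total for c in states}
    return table


tds_state = discretize(TDS, 600.0)  # 600 mg/L limit quoted in the text
ec_state = discretize(EC, 700.0)    # hypothetical cut-off, for illustration
cpt = conditional_table(tds_state, ec_state)
print(cpt["high"]["high"], cpt["low"]["high"])  # 0.6875 0.125
```

Smoothing keeps every entry strictly positive, which matters here because no year in Table 2 combines low TDS with high EC under these cut-offs.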
Once the static BN model for each monitoring well was built, parameterized and tested, we used
these models as initial building networks in the construction of object-oriented Bayesian
networks (OOBNs). Figure 1 models the time slices for each well, characterizing the temporal
nature of identical model structures, where the initial building network (see Figure 2)
describes a generic time-sliced network.
Table 2. TDS, EC, and pH data for the well Well 001/577.

Year    TDS (mg/L)    EC (µS/cm)    pH
1984    542.7         548           7.85
1985    525.5         548           7.8
1986    565.4         579           7.75
1987    604.2         588           7.57
1988    541.8         601           7.43
1989    565.9         625           7.34
1990    558.6         638           7.32
1991    640.4         798           7.27
1992    754.5         739           7.24
1993    798.7         758           7.28
1994    746.4         799           7.29
1995    615.8         514           7.3
1996    737.5         619           7.28
1997    753.6         869           7.19
1998    935.6         558           7.15
1999    1174          855           7.15
2000    1021          796           7.06
2001    1067          855           6.98
2002    1223          844           6.94
2003    1055          881           6.9

Figure 1. The OOBN representing three time-sliced networks

Figure 2. The initial building block representing one time-sliced network

4. USING CLASSICAL TIME SERIES FOR THE ASSESSMENT OF
GROUNDWATER QUALITY
The purpose of this section is to apply classical time series analysis to the groundwater quality
data and to compare the results with those obtained by applying Dynamic Bayesian Networks
(DBNs). The continuous, regular monitoring data for electrical conductivity (EC), total
dissolved solids (TDS) and pH measured by the Ministry of Water Resources (MWR) were also used
here for the time series analysis.
Time series analyses of water supply wells with respect to the concentration of chemical
constituents are presented in Figures 3-8.

Total dissolved solids (TDS) are a measure of the dissolved minerals in water and also a measure
of drinking water quality. There is a secondary drinking water standard of 500 milligrams per
liter (mg/L) TDS; water exceeding this level tastes salty. Groundwater with TDS levels greater
than 1500 mg/L is considered too saline to be a good source of drinking water. Figure 3 shows
the concentration of TDS for the well Well001/577 over a period of twenty-one years.
The fluctuation of the concentrations of chloride (Cl), sodium (Na) and calcium (Ca) over time
is shown in Figure 5. The values were averaged during the initial analysis, as there were no
significant differences among the monthly data. Chloride values above 250 mg/L give water a
slightly salty taste, which many people find objectionable.
Relationships between TDS, EC and pH were examined using multiple regression analysis (see
Figure 5). Multiple regression analysis is used to explain as much of the variation observed in
the response variable as possible, while minimizing unexplained variation from "noise". The
results of this analysis were used to produce the moving average chart (Figure 7) and the linear
regression chart (Figure 8). We used Excel Business Tools, Microsoft Excel and Matlab to produce
these and other charts.
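For readers unfamiliar with the moving average used for Figure 7, a minimal Python sketch is given below. The function name is hypothetical, the input series is made up, and a simple trailing average over a 2-period window is assumed; the original charts were produced with the spreadsheet tools named above.

```python
def moving_average(values, window=2):
    """Simple moving average: the mean of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up yearly trend values, for illustration only.
print(moving_average([1.0, 2.0, 4.0, 7.0], window=2))  # [1.5, 3.0, 5.5]
```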
Figure 3. Fluctuation of TDS concentration (mg/l) for the well Well001/577, 1984-2004, with linear trend line y = 33.928x + 412.53 (R² = 0.8223)

Figure 4. EC concentration is poorly represented for the well Well001/577 (linear trend line y = 0.012x - 21.105, R² = 0.5183)
Figure 5. Fluctuation of the concentration (mg/l) of the major chemical constituents (Mg, SO4, Na, Ca, K, Cl) for Well001/577 for a period
of 21 years (1984-2004)

Figure 6. Excel templates for financial analysis and business productivity from Excel Business Tools

As shown in Figure 5, the trend is as follows:
TrendWQ=19.01*TDS - 5.42*EC -270.16*pH + 205.14
Figure 7. Moving average chart of 2-year period for groundwater quality trend (Trend plotted against Year, 1980-2005)

Figure 8. A curve fitting chart showing groundwater quality trend over time

Figure 8 shows the groundwater quality trend over time (linear regression). The fitted trend has
the following properties:
Linear model Poly1:
f(x) = p1*x + p2
Coefficients (with 95% confidence bounds):
p1 = 0.8954 (0.7962, 0.9947)
p2 = 1.332 (0.08589, 2.579)
Goodness of fit:
SSE: 32.91
R-square: 0.9494
Adjusted R-square: 0.9467
RMSE: 1.316
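The quantities in this fit summary can be computed as in the following Python sketch. It is an illustration under assumptions, not the Matlab code used by the authors: the function name and the sample data are hypothetical, the confidence bounds are omitted, and the RMSE and adjusted R-square use n - 2 residual degrees of freedom, as is usual for a two-coefficient line.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of f(x) = p1*x + p2 with goodness-of-fit measures."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    p1 = sxy / sxx
    p2 = y_mean - p1 * x_mean
    sse = sum((y - (p1 * x + p2)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_mean) ** 2 for y in ys)
    r2 = 1 - sse / sst
    return {
        "p1": p1,
        "p2": p2,
        "sse": sse,
        "r_square": r2,
        "adj_r_square": 1 - (1 - r2) * (n - 1) / (n - 2),
        "rmse": math.sqrt(sse / (n - 2)),
    }


# Made-up data, for illustration only.
fit = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 8.0])
print({k: round(v, 4) for k, v in fit.items()})
```

With these definitions, the reported SSE of 32.91 and RMSE of 1.316 correspond to 19 residual degrees of freedom, which matches a series of 21 yearly values.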
The classical time series models are used here to assess the presence and strength of temporal
patterns in groundwater quality. These models are based on the assumption of stationarity (i.e.
time invariance). They have been widely used in many domains, such as financial data and weather
forecasting, yet they do not readily adapt to domains with dynamically changing model
characteristics, as is the case with groundwater quality assessment. In addition, the classical
models are restricted in their ability to represent the general probabilistic dependency among
the domain variables, and they fail to incorporate prior knowledge.
The observed groundwater quality data are irregularly spaced and not sampled at predetermined
times, as is the case with ordinary time series. This may make traditional time series techniques
ineffective for prediction (for example, predicting the value one period ahead). The time series
analysis leaves doubt about the positive or negative effect of any chemical constituent on
groundwater quality in the long run, and is thus not as clear and reliable as the Dynamic
Bayesian techniques. While some groundwater quality constituents, such as chloride and TDS, show
an increasing trend, other constituents, such as pH, Mg and SO4, do not demonstrate obvious
trends. Therefore, we cannot draw a reliable conclusion about the cause of the increasing trend
in groundwater quality, nor can we investigate the effect of increases or decreases in other
constituents, such as pH and EC. In addition to ignoring these cause-effect relationships,
classical time series models assume linearity in the relationships among variables and normality
of their probability distributions.

5. CONCLUSION AND FURTHER WORK
This work presents an assessment of groundwater quality. Bayesian methods, which are based on
reasoning under uncertainty, have been investigated and shown to offer considerable potential
for use in groundwater quality prediction. This work is the first step towards a comprehensive
network that contains the other variables that researchers consider significant for the
assessment of groundwater quality, in the Salalah plain in particular.
We also showed that classical time series models do not readily adapt to domains with
dynamically changing model characteristics, as is the case with groundwater quality assessment.
This is mainly because these models are restricted in their ability to represent the general
probabilistic dependency among the domain variables and they fail to incorporate prior
knowledge.

REFERENCES
[1] Wu-Seng, L. 1993. Water Quality Modeling, CRC Press, Inc.
[2] Dames and Moore. 1992. Investigation of the Quality of Groundwater Abstracted from the Salalah
Plain: Dhofar Municipality, Final Report.
[3] Ministry of Water Resources (MWR), Sultanate of Oman. 2004. Law on the Protection of Water
Resources, promulgated by Decree of the Sultan No. 29 of 2004, and its implementing regulations
(Regulations for the organization of wells and aflaj, and Regulations for the use of water
desalination units on wells), (in Arabic).
[4] Shihab, K. and Al-Chalabi, N. 2004. Treatments of Water Quality Using Bayesian Reasoning,
Lecture Notes in Computer Science, 3029, 728-738.
[5] Shihab, K. and Al-Chalabi, N. 2007. Dynamic Modeling of Groundwater Quality Using Bayesian
Techniques, Journal of the American Water Resources Association (JAWRA), Blackwell Publishing
(Online Blackwell Synergy), Vol. 43, No. 3, pp. 664-674.
[6] Banerjee, A. K. et al. 1985. TR no. 773, Monitoring groundwater quality, Department of
Statistics, University of Wisconsin.
[7] HUGIN Expert Brochure. 2005. HUGIN Expert A/S, P.O. Box 8201, DK-9220 Aalborg, Denmark
(http://www.hugin.com).
[8] Kjaerulff, U. 1995. dHugin: A computational system for dynamic time-sliced Bayesian Networks,
International Journal of Forecasting, 11, 89-111.
[9] Shihab, K. 2008. Analysis of Water Chemical Contaminants: A Comparative Study, Applied
Artificial Intelligence (AAI), Vol. 22, No. 4, pp. 352-376.

More Related Content

PDF
Desinging dsp (0, 1) acceptance sampling plans based on truncated life tests ...
PDF
Desinging dsp (0, 1) acceptance sampling plans based on
PDF
Gem sfeatures
DOCX
What is water quality management
PDF
11.application of principal component analysis & multiple regression models i...
PDF
Statistical Assessment of Water Quality Parameters for Pollution Source Ident...
PDF
Regression models for prediction of water quality in krishna river
PPTX
Reliable on-line sensing of water quality
Desinging dsp (0, 1) acceptance sampling plans based on truncated life tests ...
Desinging dsp (0, 1) acceptance sampling plans based on
Gem sfeatures
What is water quality management
11.application of principal component analysis & multiple regression models i...
Statistical Assessment of Water Quality Parameters for Pollution Source Ident...
Regression models for prediction of water quality in krishna river
Reliable on-line sensing of water quality

Viewers also liked (12)

PDF
Statistical analysis to identify the main parameters to
PPT
Water monitoring presentation
PPTX
Real time water quality monitoring system in ganga basin
PDF
Time Series Data Analysis for Forecasting – A Literature Review
PDF
43 water-quality-for-pond-aquaculture
PPTX
CONSTRUCTION OF FISH POND
PDF
Physico-chemical Characteristics of Water Quality for Culturing the Freshwate...
PDF
Water quality testing july 2012
PDF
Monitoring pond water quality to improve shrimp and fish production
PDF
Water Quality Index for Assessment of Rudrasagar Lake Ecosystem, India
PPTX
Water quality
Statistical analysis to identify the main parameters to
Water monitoring presentation
Real time water quality monitoring system in ganga basin
Time Series Data Analysis for Forecasting – A Literature Review
43 water-quality-for-pond-aquaculture
CONSTRUCTION OF FISH POND
Physico-chemical Characteristics of Water Quality for Culturing the Freshwate...
Water quality testing july 2012
Monitoring pond water quality to improve shrimp and fish production
Water Quality Index for Assessment of Rudrasagar Lake Ecosystem, India
Water quality
Ad

Similar to Bayesian Methods for Assessing Water Quality (20)

PDF
An Efficient Method for Assessing Water Quality Based on Bayesian Belief Netw...
PDF
An efficient method for assessing water
PPTX
review main GURU SAI5446531251616502351645
PDF
Phd Presentation
PDF
WATER QUALITY PREDICTION
PDF
2018 MUMS Fall Course - Issue Arising in Several Working Groups: Probabilisti...
DOCX
PREDICTION OF WATER PORTABILITY USING CLASSIFICATION TECHNIQUES.docx
PDF
IRJET- Hydrodynamic Integrated Modelling of Basic Water Quality and Nutrient ...
PDF
Data-Mining-Project
PDF
Gp3511691177
PDF
COMPSEC-a-new-tool-to-derive-natural-background-levels-by-the-component-separ...
PDF
A novel fuzzy rule based system for assessment of ground water potability: A ...
PDF
Bayesian Inference and Uncertainty Quantification for Inverse Problems
PDF
Statistical Methods For Groundwater Monitoring Robert D Gibbons
PDF
Master
PPTX
PREDICTING RIVER WATER QUALITY ppt presentation
PDF
Variance of total dissolved solids and electrical conductivity for water qual...
PDF
Bayesian Divergence Time Estimation
PDF
Statistical Analysis of Ground Water Quality in Rural Areas of Uttar Pradesh ...
PDF
Engineering course water Quality monitroing.pdf
An Efficient Method for Assessing Water Quality Based on Bayesian Belief Netw...
An efficient method for assessing water
review main GURU SAI5446531251616502351645
Phd Presentation
WATER QUALITY PREDICTION
2018 MUMS Fall Course - Issue Arising in Several Working Groups: Probabilisti...
PREDICTION OF WATER PORTABILITY USING CLASSIFICATION TECHNIQUES.docx
IRJET- Hydrodynamic Integrated Modelling of Basic Water Quality and Nutrient ...
Data-Mining-Project
Gp3511691177
COMPSEC-a-new-tool-to-derive-natural-background-levels-by-the-component-separ...
A novel fuzzy rule based system for assessment of ground water potability: A ...
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Statistical Methods For Groundwater Monitoring Robert D Gibbons
Master
PREDICTING RIVER WATER QUALITY ppt presentation
Variance of total dissolved solids and electrical conductivity for water qual...
Bayesian Divergence Time Estimation
Statistical Analysis of Ground Water Quality in Rural Areas of Uttar Pradesh ...
Engineering course water Quality monitroing.pdf
Ad

Recently uploaded (20)

PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
MYSQL Presentation for SQL database connectivity
PDF
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
Programs and apps: productivity, graphics, security and other tools
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Machine learning based COVID-19 study performance prediction
PPTX
A Presentation on Artificial Intelligence
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Encapsulation theory and applications.pdf
PDF
cuic standard and advanced reporting.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PPTX
Machine Learning_overview_presentation.pptx
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Agricultural_Statistics_at_a_Glance_2022_0.pdf
The Rise and Fall of 3GPP – Time for a Sabbatical?
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
MYSQL Presentation for SQL database connectivity
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Programs and apps: productivity, graphics, security and other tools
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Machine learning based COVID-19 study performance prediction
A Presentation on Artificial Intelligence
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Encapsulation theory and applications.pdf
cuic standard and advanced reporting.pdf
The AUB Centre for AI in Media Proposal.docx
Chapter 3 Spatial Domain Image Processing.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
“AI and Expert System Decision Support & Business Intelligence Systems”
Per capita expenditure prediction using model stacking based on satellite ima...
Machine Learning_overview_presentation.pptx

Bayesian Methods for Assessing Water Quality

  • 1. BAYESIAN METHODS FOR ASSESSING WATER QUALITY Khalil Shihab1 and Nida Al-Chalabi2 1 College of Engineering & Science, Victoria University, Australia Khalil.shihab@gmail.com 2 Department of Computer Science, SQU, Oman nida@squ.edu.om ABSTRACT This work presents the development of Bayesian techniques for the assessment of groundwater quality. Its primary aim is to develop a predictive model and a computer system to assess and predict the impact of pollutants on the water column. The process of the analysis begins by postulating a model in light of all available knowledge taken from relevant phenomenon. The previous knowledge as represented by the prior distribution of the model parameters is then combined with the new data through Bayes’ theorem to yield the current knowledge represented by the posterior distribution of model parameters. This process of updating information about the unknown model parameters is then repeated in a sequential manner as more and more new information becomes available. KEYWORDS Bayesian Belief Networks, Water Quality Assessment, Data Mining 1. INTRODUCTION Water is an essential requirement for irrigated agriculture, domestic uses, including drinking, cooking and sanitation. Declining surface and groundwater quality is regarded as the most serious and persistent issue and has become as a global issue effecting the people and the ecosystem. Anthropogenic sources of pollution such as agriculture, industry, and municipal waste, contribute to the degradation of groundwater quality, which may limit the use of these resources and lead to health-risk consequences. For these reasons, the need for intensive groundwater resources management has become more urgent. In this work, we studied the Salalah area of Oman because the groundwater has been an important natural resource and the only available water source other than the seasonal rainfall. 
Groundwater quality and pollution are determined and measured by comparing physical, chemical, biological, microbiological, and radiological quantities and parameters to a set of standards and criteria. A criterion is basically a scientific quantity upon which a judgment can be based [1]. In this work, however, we considered only the chemical parameters: total dissolved solids (TDS), electrical conductivity (EC) and water pH. David C. Wyld et al. (Eds) : CCSIT, SIPP, AISC, PDCTA, NLP - 2014 pp. 397–407, 2014. © CS & IT-CSCP 2014 DOI : 10.5121/csit.2014.4234
  • 2. 398 Computer Science & Information Technology (CS & IT) 2. UNCERTAINTY ANALYSIS The Ministry of Water Resources (MWR) maintains data on the concentration of the harmful substances in the groundwater at Taqah monitoring sites, which are located to the south of the Sultanate of Oman, in the Salalah plain [2, 3]. We observed that good quality data were obtained from several monitoring wells in this region. Because of the lack of monitoring wells in certain areas in that region, we filled in the missing measurements with data obtained from Oman Mining Company (OMCO) and Ministry of Environmental and Regional Municipalities (MRME) [4]. Data for water quality assessment are normally collected from various monitoring wells and then analyzed in environmental laboratories in order to measure the concentration of a number of water quality constituents. We realized that the methods used by these laboratories do not emphasize accuracy. There is a lack of awareness among both laboratory and validation personnel regarding the possibility of false positives in environmental data. In order to overcome this problem and to have representative data, we, therefore, used the following modified Bayesian model to that developed by Banerjee, Planting and Ramirez [6], to preprocessing the datasets used for the development of the Bayesian Networks. 2.1. Bayesian Models The formulation of the model is as follows: Let S denote a particular hazardous constituent of interest. Since the concentration of the substance may vary from well to another, it is necessary to consider each well separately. Let xt= (xt1, xt2, xt3, xtm) be the vector of m measurements of the concentration of S in m distinct water samples from a given well at a given sampling occasion where (m>=1) and (t=1, 2, . . .). Each measurement consists of the true concentration of S plus an error. Let Xt be the true concentration of S in the groundwater at sampling occasion t. 
If we assume that the true concentration $X_t$ is unknown and is a random variable, the model evaluates the posterior distribution of $X_t$ given the sample measurements $\mathbf{x}_t$ at sampling occasion t. Under the normality assumption, and given $X_t = x_t$ and $\delta^2$, the concentration measurements in $\mathbf{x}_t$ form a random sample of size m from a normal distribution with mean $x_t$ and variance $\delta^2$. We assume that the parameters $X_t$ and $\delta^2$ of this normal distribution are random variables with a certain prior probability distribution. The model for the prior distribution of $X_t$ and $\delta^2$ is therefore as follows. For $t = 1, 2, \ldots$ and given $\delta^2$, the conditional distribution of $X_t$ at sampling occasion t is a normal distribution with mean $\mu_{t-1}$ and variance $\sigma_{t-1}^2 \delta^2$. The marginal distribution of $\delta^2$ is an inverted gamma distribution with parameters $\beta_{t-1}$ and $\nu_{t-1}$. The model uses the following prior distribution, which represents the knowledge about the concentration before the first sampling. The pdf of the prior distribution of $X_0$ is:
$$
f_0(x_0) \propto \left[ 1 + \frac{1}{2\nu_0} \left( \frac{x_0 - \mu_0}{\sigma_0 \sqrt{\beta_0 / \nu_0}} \right)^2 \right]^{-(2\nu_0 + 1)/2} \tag{2.1}
$$

which is the pdf of the Student's t-distribution with $2\nu_0$ degrees of freedom, location parameter $\mu_0$ and squared scale $\sigma_0^2 \beta_0 / \nu_0$.

Now suppose that observations on the concentration of S are available. Given the sample $\mathbf{x}_t$, the posterior marginal distribution of $X_t$ is a Student's t-distribution with $2\nu_t$ degrees of freedom, location parameter $\mu_t$ and squared scale $\sigma_t^2 \beta_t / \nu_t$, where the pdf has the form:

$$
f_t(x_t \mid \mathbf{x}_t) \propto \left[ 1 + \frac{1}{2\nu_t} \left( \frac{x_t - \mu_t}{\sigma_t \sqrt{\beta_t / \nu_t}} \right)^2 \right]^{-(2\nu_t + 1)/2} \tag{2.2}
$$

where:

$$
\begin{aligned}
\beta_t &= \beta_{t-1} + \frac{1}{2} \sum_{j=1}^{m} (x_{tj} - \bar{x}_t)^2 + \frac{m (\mu_{t-1} - \bar{x}_t)^2}{2 (1 + m \sigma_{t-1}^2)} \\
\nu_t &= \nu_{t-1} + \frac{m}{2} \\
\mu_t &= \frac{\mu_{t-1} + m \bar{x}_t \sigma_{t-1}^2}{1 + m \sigma_{t-1}^2} \\
\sigma_t^2 &= \frac{\sigma_{t-1}^2}{1 + m \sigma_{t-1}^2} \\
\bar{x}_t &= \frac{1}{m} \sum_{j=1}^{m} x_{tj}
\end{aligned}
\tag{2.3}
$$

The equation for $\mu_t$ makes the sequential nature of this posterior distribution clear. To present the true unknown concentration of the substance S in the well under consideration, it is often more convenient to give a range (or interval) that contains most of the posterior probability. Such intervals are called highest posterior density (HPD) intervals. Thus, for a given probability content of $(1 - \alpha)$, $0 < \alpha < 1$, a $100(1 - \alpha)$ percent HPD interval for $X_t$ is given by:

$$
\mu_t \pm t_{2\nu_t}(\alpha/2) \, \sigma_t \sqrt{\beta_t / \nu_t} \tag{2.4}
$$

where $t_{2\nu_t}(\alpha/2)$ is the $100(1 - \alpha/2)$ percentile of the Student's t-distribution with $2\nu_t$ degrees of freedom.

2.2. Bayesian Algorithm

In brief, the monitoring algorithm, which is based on the Bayesian model, is as follows:

(1) Fix a value of $\alpha$ ($0 < \alpha < 1$) based on the desired confidence level. In this case, we chose $\alpha = 0.01$.

(2) Since we do not have enough data to work with, we used the same parameters of the prior distribution used in the model of Banerjee, Plantinga and Ramirez. These parameters are: $\beta_0 = 0.0073$, $\nu_0 = 2.336$, $\mu_0 = 9.53$, $\sigma_0^2 = 3056.34$.
(3) At each sampling occasion t ($t = 1, 2, \ldots$), compute the parameters $\beta_t$, $\nu_t$, $\mu_t$ and $\sigma_t$ of the posterior distribution of $X_t$, given the set of observations in $\mathbf{x}_t$ on the concentration of S available from a given well at a given site, using (2.3). Compute the lower and upper HPD limits (LHPD and UHPD) using these parameter estimates and (2.4).

(4) Plot $\mu_t$, LHPD and UHPD obtained in step 3 against sampling occasion t.

(5) For the next sampling occasion, update the values of the parameters $\beta_t$, $\nu_t$, $\mu_t$ and $\sigma_t$ using (2.3) and the datasets just obtained. Recompute LHPD and UHPD using the updated parameter values in (2.4) and repeat step 4.

Some of these datasets needed to be scaled down using the following normalization technique:

$$
x' = \frac{x - \mu}{\sigma}, \quad \text{where} \quad \mu = \bar{x} = \frac{1}{n} \sum_{i} x_i \quad \text{and} \quad \sigma^2 = \frac{\sum_{i} x_i^2 - n \bar{x}^2}{n - 1}
$$

2.3. Implementation

The pre-processing system is implemented on a PC platform in the Visual Basic programming language. Table 1 presents the concentration data for TDS (Total Dissolved Solids) for Well 001/577 in the Taqah area. In particular, the table shows the expected true concentration of TDS produced by our pre-processing system.

Table 1. Concentration Data of TDS for Well001/577 in the Salalah plain, where OC stands for Observed Concentration and ETC stands for Expected True Concentration.

| Year | OC | LHPD | ETC | UHPD |
|------|-------|------|------|------|
| 1984 | 1.147 | 0.85 | 1.15 | 1.45 |
| 1985 | 1.106 | 1 | 1.13 | 1.26 |
| 1986 | 1.938 | 1.12 | 1.4 | 1.68 |
| 1987 | 2.237 | 1.33 | 1.61 | 1.88 |
| 1988 | 3.857 | 1.6 | 2.06 | 2.52 |
| 1989 | 3.834 | 1.91 | 2.35 | 2.79 |
| 1990 | 3.957 | 2.18 | 2.58 | 2.98 |
| 1991 | 3.761 | 2.38 | 2.73 | 3.08 |
| 1992 | 4.3 | 2.58 | 2.9 | 3.23 |
| 1993 | 3.958 | 2.72 | 3.01 | 3.3 |
| 1994 | 1 | 2.54 | 2.83 | 3.11 |
| 1995 | 3.714 | 2.64 | 2.9 | 3.16 |
| 1996 | 3.65 | 2.73 | 2.96 | 3.19 |
| 1997 | 3.381 | 2.78 | 2.99 | 3.2 |
| 1998 | 3.396 | 2.83 | 3.02 | 3.2 |
| 1999 | 3.477 | 2.87 | 3.04 | 3.22 |
| 2000 | 3.498 | 2.91 | 3.07 | 3.23 |
| 2001 | 3.23 | 2.93 | 3.08 | 3.23 |
| 2002 | 3.243 | 2.95 | 3.09 | 3.22 |
| 2003 | 3.267 | 2.97 | 3.1 | 3.22 |
| 2004 | 3.297 | 2.99 | 3.11 | 3.22 |
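The recursions in (2.3) and the interval in (2.4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Visual Basic system: the names `Posterior`, `update` and `hpd_interval` are hypothetical, and the Student's t quantile is supplied by the caller (for example from a statistical table) rather than computed here.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence, Tuple


@dataclass
class Posterior:
    beta: float    # inverted-gamma parameter beta_t
    nu: float      # inverted-gamma parameter nu_t
    mu: float      # location mu_t of the Student's t posterior
    sigma2: float  # scale parameter sigma_t^2


def update(prior: Posterior, samples: Sequence[float]) -> Posterior:
    """Apply the recursions in (2.3) to the m samples of one occasion."""
    m = len(samples)
    xbar = sum(samples) / m
    sum_sq = sum((x - xbar) ** 2 for x in samples)
    denom = 1.0 + m * prior.sigma2
    return Posterior(
        beta=prior.beta + sum_sq / 2.0
        + m * (prior.mu - xbar) ** 2 / (2.0 * denom),
        nu=prior.nu + m / 2.0,
        mu=(prior.mu + m * xbar * prior.sigma2) / denom,
        sigma2=prior.sigma2 / denom,
    )


def hpd_interval(post: Posterior, t_crit: float) -> Tuple[float, float]:
    """Return (LHPD, UHPD) from (2.4).

    t_crit stands for t_{2 nu_t}(alpha / 2); it is an input here,
    not something this sketch computes.
    """
    half_width = t_crit * sqrt(post.sigma2 * post.beta / post.nu)
    return post.mu - half_width, post.mu + half_width


# Prior parameters quoted in step (2) of the algorithm.
prior = Posterior(beta=0.0073, nu=2.336, mu=9.53, sigma2=3056.34)

# First sampling occasion: a single measurement (m = 1), taken from the
# first OC entry of Table 1.
post = update(prior, [1.147])

# 3.8 is an illustrative, approximate critical value, not a computed one.
lhpd, uhpd = hpd_interval(post, t_crit=3.8)
```

With the first observed concentration in Table 1, the sketch gives a posterior location of about 1.15, in line with the first ETC entry. Feeding each new occasion's posterior back in as the next prior reproduces the sequential updating described in step (5).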
3. BAYESIAN NETWORKS

After the pre-processing stage, we constructed a Bayesian Network (BN) using the Hugin system. We then used this BN as the initial building network for the construction of two Dynamic Bayesian Networks to predict the impact of pollution on groundwater quality.

3.1. Dynamic Bayesian Networks (DBNs)

DBNs extend Bayesian Networks from static domains to dynamic domains [7, 8]. This is achieved by introducing relevant temporal dependencies between the representations of the static network at different times. The main characteristics of DBNs are as follows. Let $X_t$ be the state of the system at time t, and assume that:

(1) The process is Markovian, i.e., $P(X_t \mid X_0, X_1, \ldots, X_{t-1}) = P(X_t \mid X_{t-1})$.

(2) The process is stationary or time-invariant, i.e., $P(X_t \mid X_{t-1})$ is the same for every t.

Therefore, to have a Dynamic Bayesian Network (DBN) we need only $P(X_0)$, which is a static Bayesian network (BN), and $P(X_t \mid X_{t-1})$, which is a network fragment in which the variables in $X_{t-1}$ have no parents.

3.2. Bayesian Networks Development

Among more than twenty wells in the Taqah area, we selected only four wells for this study. These four wells have the most complete data measurements and provide sufficient information for the assessment of the groundwater quality in this area.

The electrical conductivity (EC) of the water has been used as a measure of the salinity hazard of the groundwater used for irrigation in the Salalah plain. The total dissolved solids (TDS) limit is 600 mg/L, which is the objective of the current plan of the MWR. TDS contains several dissolved solids, but 90% of its concentration is made up of six constituents: sodium (Na), magnesium (Mg), calcium (Ca), chloride (Cl), bicarbonate (HCO3) and sulfate (SO4). We therefore considered only these constituents in the calculation of TDS. We also used the following relationship between TDS and EC.
TDS = A * EC, where A is a constant with a value between 0.65 and 0.77.

Both TDS and EC can affect water acidity, or water pH. Dissolved chemical constituents occur in higher, more variable concentrations at lower pH (higher acidity). On the other hand, acidity allows migration of hydrogen ions (H+), which is an indication of conductivity. Therefore, our work concentrated on the following pairwise relations: TDS and EC, EC and pH, and TDS and pH.

To capture these relations, we used two learning approaches to construct and parameterize a simple static BN that has three nodes, each representing a groundwater quality constituent (TDS, EC or pH). Learning consists of two components: 1) learning the network structure, and 2) learning the conditional probability distributions. For the first component, we used the Hugin system, which supports structure and parameter learning in Bayesian networks. We also developed a program written in C++ to generate the conditional probabilities for TDS, EC and pH using Table 2 as input.

Once the static BN model for each monitoring well was built, parameterized and tested, we used these models as the initial building networks in the construction of object-oriented Bayesian networks (OOBNs). Figure 1 models the time slices for each well, characterizing the temporal nature of identical model structures, where the initial building network (see Figure 2) describes a generic time-sliced network.

Table 2. TDS, EC, and pH data for the well Well 001/577.

| Year | TDS (mg/L) | EC (µS/cm) | pH |
|------|------------|------------|------|
| 1984 | 542.7 | 548 | 7.85 |
| 1985 | 525.5 | 548 | 7.8 |
| 1986 | 565.4 | 579 | 7.75 |
| 1987 | 604.2 | 588 | 7.57 |
| 1988 | 541.8 | 601 | 7.43 |
| 1989 | 565.9 | 625 | 7.34 |
| 1990 | 558.6 | 638 | 7.32 |
| 1991 | 640.4 | 798 | 7.27 |
| 1992 | 754.5 | 739 | 7.24 |
| 1993 | 798.7 | 758 | 7.28 |
| 1994 | 746.4 | 799 | 7.29 |
| 1995 | 615.8 | 514 | 7.3 |
| 1996 | 737.5 | 619 | 7.28 |
| 1997 | 753.6 | 869 | 7.19 |
| 1998 | 935.6 | 558 | 7.15 |
| 1999 | 1174 | 855 | 7.15 |
| 2000 | 1021 | 796 | 7.06 |
| 2001 | 1067 | 855 | 6.98 |
| 2002 | 1223 | 844 | 6.94 |
| 2003 | 1055 | 881 | 6.9 |
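One simple way to obtain conditional probabilities from a table such as Table 2 is to discretize each variable and count relative frequencies. The sketch below illustrates that idea only; it is not a description of Hugin's learning algorithms or of the authors' C++ program. The two-state discretization, the EC cut-off of 700 µS/cm, the direction TDS to EC, and the helper names `discretize` and `conditional_table` are all assumptions made for illustration; 600 mg/L is the TDS limit quoted in Section 3.2.

```python
from collections import Counter, defaultdict

# TDS (mg/L) and EC (uS/cm) for Well 001/577, 1984-2003, from Table 2.
tds = [542.7, 525.5, 565.4, 604.2, 541.8, 565.9, 558.6, 640.4, 754.5, 798.7,
       746.4, 615.8, 737.5, 753.6, 935.6, 1174, 1021, 1067, 1223, 1055]
ec = [548, 548, 579, 588, 601, 625, 638, 798, 739, 758,
      799, 514, 619, 869, 558, 855, 796, 855, 844, 881]

TDS_LIMIT = 600.0  # MWR objective quoted in the text
EC_CUTOFF = 700.0  # hypothetical cut-off, chosen only for illustration


def discretize(values, cutoff):
    """Map each value to the state 'high' (above cutoff) or 'low'."""
    return ["high" if v > cutoff else "low" for v in values]


def conditional_table(parent_states, child_states):
    """Estimate P(child | parent) by relative frequency."""
    counts = defaultdict(Counter)
    for parent, child in zip(parent_states, child_states):
        counts[parent][child] += 1
    return {
        parent: {child: n / sum(row.values()) for child, n in row.items()}
        for parent, row in counts.items()
    }


tds_states = discretize(tds, TDS_LIMIT)
ec_states = discretize(ec, EC_CUTOFF)

# Static slice: P(EC state | TDS state), assuming an arc from TDS to EC.
cpt_ec_given_tds = conditional_table(tds_states, ec_states)

# Temporal fragment: P(TDS_t | TDS_{t-1}) from consecutive years, which
# relies on the Markov and stationarity assumptions of Section 3.1.
tds_transition = conditional_table(tds_states[:-1], tds_states[1:])
```

The same counting function serves both the static network and the time-slice transition because, under the stationarity assumption, every pair of consecutive years is treated as a sample from one transition distribution. With only 20 yearly values the estimates are coarse, so some smoothing of the counts would normally be worth considering.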
Figure 1. The OOBN representing three time-sliced networks

Figure 2. The initial building block representing one time-sliced network

4. USING CLASSICAL TIME SERIES FOR THE ASSESSMENT OF GROUNDWATER QUALITY

The purpose of this section is to apply classical time series analysis to groundwater quality data and to compare the results with those obtained by the application of Dynamic Bayesian Networks (DBNs). The continuous and regular monitoring data for electrical conductivity (EC), total dissolved solids (TDS) and pH measured by the Ministry of Water Resources (MWR) were also used here for the time series analysis. Time series analyses of water supply wells with respect to the concentration of chemical constituents are presented in Figures 3-8.
Total dissolved solids (TDS) are a measure of the dissolved minerals in water and also a measure of drinking water quality. There is a secondary drinking water standard of 500 milligrams per liter (mg/L) TDS; water exceeding this level tastes salty. Groundwater with TDS levels greater than 1500 mg/L is considered too saline to be a good source of drinking water. Figure 3 shows the concentration of TDS for the well Well001/577 over a period of twenty-one years.

The fluctuation of the concentration of chloride (Cl), sodium (Na) and calcium (Ca) with respect to time is shown in Figure 5. The values were averaged during the initial analysis, as there were no significant differences among the monthly data. Chloride values above 250 mg/l give water a slightly salty taste that many people find objectionable.

Relationships between TDS, EC and pH are examined using multiple regression analysis, see Figure 5. Multiple regression analysis is used to explain as much of the variation observed in the response variable as possible, while minimizing unexplained variation from "noise". The results of this analysis are used to produce the moving average chart, Figure 7, and the linear regression chart, Figure 8. We used Excel Business Tools, Microsoft Excel, and Matlab to produce these and other charts.

[Figure 3: chart of TDS (mg/l) against year, 1984 to 2004, with the fitted linear trend y = 33.928x + 412.53, R² = 0.8223]

Figure 3. Fluctuation of TDS concentration for the well Well001/577

[Figure 4: chart of EC against year, 1984 to 2009, with the fitted line y = 0.012x - 21.105, R² = 0.5183]

Figure 4. EC concentration is poorly represented for the well Well001/577
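A linear trend line of the kind drawn in Figure 3 can be obtained by ordinary least squares. The sketch below is a minimal pure-Python version, assuming the 20 TDS values of Table 2 as input and years coded 1, 2, ... on the x-axis; both choices are assumptions, so the coefficients need not match the 21-year fit reported in Figure 3. The helper name `linear_fit` is hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# TDS (mg/L) for Well 001/577, 1984-2003, from Table 2.
tds = [542.7, 525.5, 565.4, 604.2, 541.8, 565.9, 558.6, 640.4, 754.5, 798.7,
       746.4, 615.8, 737.5, 753.6, 935.6, 1174, 1021, 1067, 1223, 1055]
years = list(range(1, len(tds) + 1))  # 1 = 1984; an assumed coding

slope, intercept, r_squared = linear_fit(years, tds)
```

The slope is the estimated yearly change in TDS in mg/L, and the R² value plays the same role as the one printed on the chart: the share of the variation in TDS that the straight line accounts for.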
[Figure 5: chart of the concentrations (mg/l) of Mg, SO4, Na, Ca, K and Cl against year, 1984 to 2004]

Figure 5. Fluctuation of the concentration of the major chemical constituents for Well001/577 for a period of 21 years

Figure 6. Excel templates for financial analysis and business productivity from Excel Business Tools

As shown in Figure 5, the trend is as follows:

$$
\mathrm{TrendWQ} = 19.01 \cdot \mathrm{TDS} - 5.42 \cdot \mathrm{EC} - 270.16 \cdot \mathrm{pH} + 205.14
$$

[Figure 7: chart of Trend against Year, 1980 to 2005]

Figure 7. Moving average chart of 2-year period for groundwater quality trend
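The trend equation and the 2-year moving average behind Figure 7 can be sketched as below. The text does not state how the inputs to TrendWQ were scaled, so the sketch only evaluates the quoted equation as written. The function names `trend_wq` and `moving_average`, the trailing-window convention, and the sample series are assumptions for illustration.

```python
def trend_wq(tds, ec, ph):
    """Evaluate the fitted trend equation quoted in the text."""
    return 19.01 * tds - 5.42 * ec - 270.16 * ph + 205.14


def moving_average(series, window=2):
    """Trailing moving average; the first window - 1 points get no value."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


# Hypothetical yearly trend values, used only to show the call.
yearly_trend = [4.0, 6.0, 5.0, 9.0]
smoothed = moving_average(yearly_trend, window=2)
```

A window of 2 averages each year with the one before it, which damps single-year spikes while keeping the overall direction of the series visible.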
Figure 8. A curve fitting chart showing groundwater quality trend over time

Figure 8 shows the groundwater quality trend over time (linear regression). The trend has the following properties. Linear model Poly1: f(x) = p1*x + p2. Coefficients (with 95% confidence bounds): p1 = 0.8954 (0.7962, 0.9947), p2 = 1.332 (0.08589, 2.579). Goodness of fit: SSE = 32.91, R-square = 0.9494, adjusted R-square = 0.9467, RMSE = 1.316.

Although classical time series models are used here to assess the presence and strength of temporal patterns in groundwater quality, these models are based on the assumption of stationarity (i.e., time invariance). They have been widely used in many domains, such as financial data and weather forecasting. Yet these models do not readily adapt to domains with dynamically changing model characteristics, as is the case with groundwater quality assessment. In addition to this assumption, the classical models are restricted in their ability to represent the general probabilistic dependency among the domain variables, and they fail to incorporate prior knowledge.

The observed groundwater quality data are irregularly spaced and not predetermined, unlike ordinary time series. This may make traditional time series techniques ineffective for prediction (that is, for estimating the value one period ahead). The time series analysis leaves the long-run positive or negative effect of any chemical constituent on groundwater quality in doubt, and is therefore not as clear and reliable as the Dynamic Bayesian techniques. While some groundwater quality constituents, such as chloride and TDS, show an increasing trend, other constituents, such as pH, Mg and SO4, do not demonstrate obvious trends.
Therefore, we cannot draw a reliable conclusion on the cause of the increasing trend in groundwater quality, nor can we investigate the effect of increases or decreases in other constituents, such as pH and EC. In addition to ignoring these cause-effect relationships, classical time series models assume linearity in the relationships among variables and normality of their probability distributions.
5. CONCLUSION AND FURTHER WORK

This work presents an assessment of groundwater quality. Bayesian methods have been investigated and shown to offer considerable potential for use in groundwater quality prediction. These methods are based on reasoning under conditions of uncertainty. This work is the first step towards a comprehensive network that contains the other variables that researchers consider significant for the assessment of groundwater quality, in the Salalah plain in particular.

We also showed that classical time series models do not readily adapt to domains with dynamically changing model characteristics, as is the case with groundwater quality assessment. This is mainly because these models are restricted in their ability to represent the general probabilistic dependency among the domain variables, and they fail to incorporate prior knowledge.

REFERENCES

[1] Wu-Seng, L. 1993. Water Quality Modeling, CRC Press, Inc.

[2] Dames and Moore. 1992. Investigation of The Quality of Groundwater Abstracted from the Salalah Plain: Dhofar Municipality, Final Report.

[3] Ministry of Water Resources (MWR), Sultanate of Oman. 2004. Law on the Protection of Water Resources, promulgated by Decree of the Sultan No. 29 of 2004, and its implementing regulations (Regulations for the organization of wells and aflaj, and Regulations for the use of water desalination units on wells), (in Arabic).

[4] Shihab, K. and Al-Chalabi, N. 2004. Treatments of Water Quality Using Bayesian Reasoning, Lecture Notes in Computer Science, 3029, 728–738.

[5] Shihab, K. and Al-Chalabi, N. 2007. Dynamic Modeling of Groundwater Quality Using Bayesian Techniques, Journal of the American Water Resources Association (JAWRA), Blackwell Publishing (Online Blackwell Synergy), Vol. 43, No. 3, pp. 664-674.

[6] Banerjee, A. K. et al. 1985. TR no. 773, Monitoring groundwater quality, Department of Statistics, University of Wisconsin.

[7] HUGIN Expert Brochure. 2005. HUGIN Expert A/S, P.O. Box 8201, DK-9220 Aalborg, Denmark (http://www.hugin.com).

[8] Kjaerulff, U. 1995. dHugin: A computational system for dynamic time-sliced Bayesian Networks, International Journal of Forecasting, 11, 89-111.

[9] Shihab, K. 2008. Analysis of Water Chemical Contaminants: A Comparative Study, Applied Artificial Intelligence (AAI), Vol. 22, No. 4, pp. 352-376.