Modeling full scale-data(2)

Process Optimization: Enhancing Understanding through Mining Full-Scale Data
John B. Cook, PE, M.ASCE
Edwin A. Roehl
Uwe Mundry
Advanced Data Mining Int’l
Greenville, SC

Acknowledgement

Ed Roehl – CTO
• World class industrial researcher;
• Software design, development, and project management;
• Advanced process engineering, computer-based modeling and optimization methods, industrial R&D, product/process design automation, CAE, PDM;
• Data mining, multivariate analysis, predictive modeling, simulation, advanced control, signal processing, non-linear/chaotic systems, computational geometry;
• AI, expert systems, OOP/computer languages, machine learning/artificial neural networks.

Uwe Mundry, Partner
• World class software design, development;
• Multi-spectral and hyper-spectral imaging and pattern recognition, 4D medical imaging, 4D geographical imaging, homeland security applications, real-time decision support systems with industrial applications; data mining, multivariate analysis, predictive modeling, simulation, advanced control, signal processing, non-linear/chaotic systems, computational geometry, machine learning/artificial neural networks; OOP/multiple computer languages; medical and environmental imaging.

Why optimize your plant?

• Reduced operating budgets (10% very
common)
• Increasingly stringent regulations
--Water treatment?
--Wastewater treatment?
• Increasing cost of capital improvements
--USD worth less
--QE2 will lower value of debt instruments such
as bonds

Process optimization by modeling

1. Modeling processes through various
means
a. Bench-scale models
b. Pilot-scale models
c. Mathematical models
1) Deterministic/mechanistic—based on first principles
2) Empirical—either statistical or based upon some
optimal function to describe behavior
3) Hybrid of 1) and 2)

Process optimization by modeling

What is a mathematical model?
"... consistent set of mathematical equations which
is thought to correspond to some other entity, its
prototype." --Rutherford Aris

Definitions for pilot-scale modeling
• Geometric Similarity—All lengths of the model and the
prototype must be in the same ratio. All corresponding
angles must be equal. [This is the easy one to achieve.]
• Kinematic Similarity—Ratios of fluid velocity and other
relevant velocities must be the same for the model and
prototype. Ratios of flow time scale and boundary time
scale must be the same. [Problems with laminar/turbulent.]
• Dynamic Similarity—The force polygons for the model
and prototype must be proportional. For example, forces
such as inertia, pressure, viscous forces, surface tension
forces, etc.

Equations of importance

• R = ρVℓ/µ (Reynolds number; very important!)

• W = ρV²ℓ/σ (Weber number; surface tension effects)

• F = V/(gℓ)^½ (Froude number; free surface effects)
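As a minimal sketch, the three dimensionless groups above can be evaluated directly. The fluid properties (approximate values for water near 20 C) and the flow conditions below are illustrative assumptions, not values from the slides.

```python
import math

def reynolds(rho, v, length, mu):
    """R = rho * V * l / mu (inertial vs. viscous forces)."""
    return rho * v * length / mu

def weber(rho, v, length, sigma):
    """W = rho * V^2 * l / sigma (inertial vs. surface tension forces)."""
    return rho * v ** 2 * length / sigma

def froude(v, length, g=9.81):
    """F = V / sqrt(g * l) (inertial vs. gravity forces)."""
    return v / math.sqrt(g * length)

# Assumed, approximate properties of water near 20 C (not from the slides).
RHO = 998.0     # density, kg/m^3
MU = 1.0e-3     # dynamic viscosity, Pa*s
SIGMA = 0.0728  # surface tension, N/m

# Hypothetical flow: 0.5 m/s past a 2.0 m characteristic length.
v, length = 0.5, 2.0
print(f"R = {reynolds(RHO, v, length, MU):.3g}")
print(f"W = {weber(RHO, v, length, SIGMA):.3g}")
print(f"F = {froude(v, length):.3g}")
```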

Scale-up problems with models

1. For bench-scale and pilot-scale:
a. Example of problems with scale-up for
simple drag coefficient, CD:
CD = f (R, W, F, α)
[Where is this important for water treatment?]
c. Pilot-scale testing is good for comparing
one pilot train with another pilot train but not
for finding absolute numbers for full-scale

So what of models?

"Models are undeniably beautiful, and a man may
justly be proud to be seen in their company. But
they may have their hidden vices. The question is,
after all, not only whether they are good to look at,
but whether we can live happily with them."
--Abraham Kaplan, The Conduct of Inquiry

Another problem: chaotic behavior

• "Deterministic evolution of a nonlinear system
which is between regular behavior and
stochastic behavior." – Abarbanel

• "The property that characterizes a dynamical
system in which most orbits exhibit sensitive
dependence." – Lorenz

• "Neither periodic or stochastic behaviors that
have structure in state/feature space, making
them somewhat predictable." – ADMi

Lorenz attractor shows problem
• Poster child of chaos
• Purely synthetic, derived from 3 equations
– dx/dt = -σx + σy
– dy/dt = -xz + rx - y
– dz/dt = xy - bz
• "extreme sensitivity to changes in boundary conditions"

[Figure: 3D delay plot of the signal showing "orbitals", labeled mode 1 and mode 2]
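The sensitivity noted above can be illustrated with a short numerical experiment. This sketch integrates the three equations with a fixed-step fourth-order Runge-Kutta scheme and compares two runs whose starting points differ by 1e-8. The parameter values (σ = 10, r = 28, b = 8/3) are the commonly used classic choices, assumed here because the slides do not give them, and the perturbation is applied to the initial state.

```python
import numpy as np

def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the three Lorenz equations."""
    x, y, z = state
    return np.array([-sigma * x + sigma * y,
                     -x * z + r * x - y,
                     x * y - b * z])

def integrate(state0, dt=0.01, steps=4000):
    """Fixed-step RK4 integration; returns an array of shape (steps + 1, 3)."""
    traj = np.empty((steps + 1, 3))
    s = np.array(state0, dtype=float)
    traj[0] = s
    for i in range(steps):
        k1 = lorenz(s)
        k2 = lorenz(s + 0.5 * dt * k1)
        k3 = lorenz(s + 0.5 * dt * k2)
        k4 = lorenz(s + dt * k3)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[i + 1] = s
    return traj

# Two runs that start almost at the same point (assumed initial states).
run_a = integrate([1.0, 1.0, 1.0])
run_b = integrate([1.0, 1.0, 1.0 + 1e-8])

# Distance between the two trajectories at each time step.
separation = np.linalg.norm(run_a - run_b, axis=1)
print(f"initial separation: {separation[0]:.1e}")
print(f"largest separation: {separation.max():.2f}")
```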

Chaos in Savannah River estuary

Savannah River salinity intrusion
[Figure: low f SC, 24-hr MWA; measured vs. predicted, R2=0.88]

Modeling chaotic behavior, 1
State Space Reconstruction (SSR)
• SSR is the means by which complex, constantly changing
processes can be represented in straightforward geometric
terms for visualization and modeling. SSR is like super
trending. It suggests that a process' state space can be
optimally but not perfectly characterized by state vectors
Y(t). The vectors are constructed using an optimal number
of measurements, equal to the "local dimension" d_L
(Abarbanel, 1996), that are spaced apart in time
by integer multiples of an optimal time delay d.
Mathematically:
• Y(t) = [x(t), x(t - d), x(t - 2d), ..., x(t - (d_L - 1)d)]    eq. 1
• Note that here Y(t) is univariate. Values of d_L and d are
estimated analytically or experimentally from the data.
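The univariate state vector in eq. 1 can be sketched as a delay embedding. The function name and the example series are hypothetical, and the dimension and delay are passed in by hand because the estimation step is not shown in the slides.

```python
import numpy as np

def delay_embed(x, dim, delay):
    """Build state vectors Y(t) = [x(t), x(t - d), ..., x(t - (dim - 1)d)].

    Returns one row per time t for which all delayed samples exist.
    """
    x = np.asarray(x, dtype=float)
    start = (dim - 1) * delay          # first t with a full history
    if start >= len(x):
        raise ValueError("series too short for this dimension and delay")
    # Column j holds x(t - j * delay) for t = start .. len(x) - 1.
    columns = [x[start - j * delay: len(x) - j * delay] for j in range(dim)]
    return np.column_stack(columns)

# Hypothetical series: the integers 0..9, embedded with dim = 3, delay = 2.
Y = delay_embed(np.arange(10), dim=3, delay=2)
print(Y[0])    # state vector at t = 4
print(Y[-1])   # state vector at t = 9
```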

Modeling chaotic behavior, 2
• For a multivariate process of k independent variables:
• Y(t) = {[x_1(t), x_1(t - d_1), ..., x_1(t - (d_{L1} - 1)d_1)], ...,
[x_k(t), x_k(t - d_k), ..., x_k(t - (d_{Lk} - 1)d_k)]}    eq. 2
• This provides each variable with its own d_L and d. A further
generalization provides non-fixed time delay spacing
for each variable:
• Y(t) = {[x_1(t), x_1(t - d_{1,1}), ..., x_1(t - (d_{L1} - 1)d_{1,dL1-1})], ...,
[x_k(t), x_k(t - d_{k,1}), ..., x_k(t - (d_{Lk} - 1)d_{k,dLk-1})]}    eq. 3
• Determining the best variables x_k to use, and properly
estimating dimensions d_{Lk} and time delays d_k by analytical
or experimental means, helps to ensure that a given
process can be successfully reconstructed.
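A multivariate state vector of this kind can be sketched by giving each variable its own list of lags. Evenly spaced lags correspond to eq. 2. For the non-fixed spacing of eq. 3, this sketch passes each total lag directly, which is a simplifying assumption about the notation. All names and the example data are hypothetical.

```python
import numpy as np

def fixed_lags(dim, delay):
    """Lags 0, d, 2d, ..., (dim - 1)d for one variable (eq. 2 style)."""
    return [j * delay for j in range(dim)]

def embed_multivariate(series, lags):
    """Stack delayed copies of several equal-length series column-wise.

    series: list of 1-D sequences, one per variable.
    lags:   list of lag lists, one per variable (non-negative integers).
    """
    n = len(series[0])
    if any(len(s) != n for s in series):
        raise ValueError("all series must have the same length")
    max_lag = max(max(var_lags) for var_lags in lags)
    columns = []
    for x, var_lags in zip(series, lags):
        x = np.asarray(x, dtype=float)
        for lag in var_lags:
            # x(t - lag) for t = max_lag .. n - 1
            columns.append(x[max_lag - lag: n - lag])
    return np.column_stack(columns)

# Hypothetical data: x1 embedded with dim 3, delay 2; x2 with lags 0 and 1.
x1 = np.arange(10)
x2 = 100 + np.arange(10)
Y = embed_multivariate([x1, x2], [fixed_lags(3, 2), [0, 1]])
print(Y[0])   # state vector at t = 4
```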

The fundamental problem:

"The simple things you see are
all complicated." --Substitute,
Pete Townshend

Consider modeling full-scale
system with full-scale system
1. Approach
a. Use data mining to extract information
contained in the full-scale data
b. Eliminates problems inherent in scale-up
issues
c. Chaotic behavior can be modeled
d. Systematic and objective approach to
optimizing information

Building Process
Models

A view of a general process

[Diagram: inputs x1 to x8 feed a PHYSICAL PROCESS, which produces outputs y1 (multiply periodic), y2 (chaotic), and y3 (stochastic)]

Causes of Variability
• people
• configuration of controls
• raw water
• weather
• chemicals

• Outputs that are predictable can then be controlled
• Outputs that are unpredictable cannot be controlled

Relate variables with neural
networks
• Inspired by the brain
– get complicated behaviors from lots of "simple"
interconnected devices - neurons and synapses

[Diagram: network with inputs x1 to x5 and outputs y1, y2]

– non-linear, multivariate curve fitting
– models are synthesized from example data
• machine learning
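To make "models are synthesized from example data" concrete, here is a minimal sketch of a one-hidden-layer network fitted by gradient descent. It is not the authors' software: the data are synthetic, and the layer size, learning rate, and target function are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic example data (hypothetical): two inputs, one non-linear output.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = (np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2).reshape(-1, 1)

# One hidden layer of tanh "neurons"; the weights play the role of synapses.
n_hidden = 8
W1 = rng.normal(0.0, 0.5, size=(2, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(inputs):
    """Return hidden activations and network output."""
    hidden = np.tanh(inputs @ W1 + b1)
    return hidden, hidden @ W2 + b2

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

initial_loss = mse(forward(X)[1], y)

learning_rate = 0.05
for _ in range(2000):
    hidden, pred = forward(X)
    d_pred = 2.0 * (pred - y) / len(X)                 # dMSE/dpred
    d_hidden = (d_pred @ W2.T) * (1.0 - hidden ** 2)   # back through tanh
    W2 -= learning_rate * (hidden.T @ d_pred)
    b2 -= learning_rate * d_pred.sum(axis=0)
    W1 -= learning_rate * (X.T @ d_hidden)
    b1 -= learning_rate * d_hidden.sum(axis=0)

final_loss = mse(forward(X)[1], y)
print(f"MSE before training: {initial_loss:.3f}, after: {final_loss:.3f}")
```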

ANNs produce response surfaces
Example: Trihalomethanes Formation

[Figure: response surface fitted by a non-linear ANN model, which represents normal behavior; annotations: "deviation from normal", "better conditions?", "no data"]

CASE STUDY NO. 1—THM
AND HAA5 REDUCTION

Modeling chloroform

• Input = TURBFIN (MWA=4, t=-1), R2ANN=0.47, RMSE=7.3
• +Input = COLORFIN (MWA=4),
• +Input = TPFIN, R2ANN=0.74, RMSE=5.0
• CF higher at high TP

[Figure: response surfaces at TPFIN=11C and TPFIN=32C, same R2ANN=0.74; markers show days when DBPs measured]
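The input labels above appear to combine a moving window average (MWA) with a time delay (t). Under that reading, which is an assumption because the slides do not define the notation, an input such as TURBFIN (MWA=4, t=-1) could be prepared as sketched here. The function names and sample values are hypothetical.

```python
import numpy as np

def moving_window_average(x, window):
    """Trailing average over `window` samples; NaN until the window fills."""
    x = np.asarray(x, dtype=float)
    out = np.full(len(x), np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def delay(x, steps):
    """Shift a series back in time by `steps` samples (value at t - steps)."""
    x = np.asarray(x, dtype=float)
    if steps == 0:
        return x.copy()
    out = np.full(len(x), np.nan)
    out[steps:] = x[:-steps]
    return out

# Hypothetical finished-turbidity samples, one per time step.
turbfin = [1.0, 2.0, 3.0, 4.0, 5.0]
smoothed = moving_window_average(turbfin, window=4)
model_input = delay(smoothed, steps=1)   # assumed meaning of t = -1
print(smoothed)
print(model_input)
```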

Observations about chloroform

• Finished turbidity accounts for 47% of
variability in chloroform
• Finished turbidity + color accounts for 60%
• Finished turbidity + color + temperature
accounts for 74%
• Or, R2ANN = 0.74
• Recommend:
1) optimize turbidity removal (most important). Is this counterintuitive?
2) optimize TOC removal
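Statements such as "accounts for 74% of variability" read as a coefficient of determination. A minimal sketch of the usual definitions of R2 and RMSE follows. Whether the deck's R2ANN is computed exactly this way is an assumption, and the sample numbers are made up for illustration.

```python
import numpy as np

def r_squared(measured, predicted):
    """1 - SS_res / SS_tot: fraction of variance explained by the model."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root mean squared error, in the units of the measured variable."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))

# Hypothetical measured and predicted values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(f"R2 = {r_squared(measured, predicted):.2f}")
print(f"RMSE = {rmse(measured, predicted):.3f}")
```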

Modeling BDM, Part 1

• Inputs = TURBFIN (t=-2), COLORFIN (MWA=3), R2ANN=0.24, RMSE=1.8
• With TPFIN added: R2ANN=0.66, RMSE=1.2
• BDM far more sensitive to TPFIN than TURBFIN & COLORFIN

[Figure: response surfaces at TPFIN=11C and TPFIN=32C]


Observations regarding BDM

• Finished turbidity + finished color accounts
for 24% [very low correlation!]
• Finished turbidity + color + temperature
accounts for 66%
• Or, R2 = 0.66
• So, BDM is dominated by temperature

Modeling BDM, Part 2

• Remove TURBFIN, add inputs = PRE-Cl2, R2ANN=0.72, RMSE=1.1
• BDM sensitivity to PRE-Cl2 & NH3 higher at low TPFIN.
• BDM higher at higher COLORFIN.
• TP is dominant effect.

[Figure: four response surfaces at (TPFIN=11C, COLORFIN=3.0), (TPFIN=11C, COLORFIN=1.0), (TPFIN=32C, COLORFIN=3.0), (TPFIN=32C, COLORFIN=1.0)]

Modeling TCA

• Input = TURBFIN (MWA=4, t=-3),
• +Input = COLORFIN (MWA=4),
• R2ANN=0.61, RMSE=4.7
• TCA less seasonal than DCA

[Figure: response surfaces at TPFIN=11C and TPFIN=32C]


Observations modeling TCA

• Finished turbidity accounts for 47% of
variability
• Finished turbidity + finished color accounts
for 47% [surprising, as color not capturing
precursors!]
• Finished turbidity + color + finished
temperature accounts for 61%
• Or, R2 = 0.61

Summary - modeling THM and
HAA species
• Consider finished turbidity, color, and temperature
– indicators of organics speciation by time of year
– treatment process kinetics and performance
• Chloroform positively correlated to finished turbidity, color,
and temperature; R2ANN = 0.74
• BDM highly seasonal; positively correlated to finished
turbidity, color, and temperature, and pre-Cl2 and NH3;
R2ANN = 0.66 to 0.72
• DCA highly seasonal; positively correlated to finished
turbidity, color, and temperature; R2ANN = 0.73
• TCA somewhat seasonal; positively correlated to
finished turbidity and temperature; R2ANN = 0.61

CASE STUDY NO. 2—THM
OPTIMIZATION

Conventional WTP case study
Predict and Reduce THM Formation

• Near real-time
predictions
• $ Savings by
optimizing use of
chemicals

GUI for THM control and "what ifs"

CASE STUDY NO. 3—
OPTIMIZE GENERAL
PROCESS

3D response surfaces for % TOC removal
• Unshown input settings
– R-TOC-BLNDCALC = 0
– R-PHXY = 0 (hist. avg. = 7.34)
– CLO2-H-BLNDCALC = 0.030 mg/l (hist. min.)
– COAGAID-X = 0.053 mg/l (hist. min.)
– COAG-X = 12.0 mg/l (hist. min.)

% TOC removal contour maps
– R-TOC-BLNDCALC = 0
– R-PHXY-C = 0 (hist. avg. = 7.34)
– CLO2-H-BLNDCALC = 0.030 mg/l (hist. min.)
– COAGAID-X = 0.053 mg/l (hist. min.)
– COAG-X = 12.0 mg/l (hist. min.)
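The response surfaces and contour maps hold the unshown inputs at fixed settings and vary two inputs at a time. A minimal sketch of that procedure follows. The stand-in model is a made-up function, not the ANN from the case study, and all names are hypothetical.

```python
import numpy as np

def response_surface(model, x_range, y_range, fixed, n=21):
    """Evaluate model(x, y, **fixed) on an n-by-n grid of the two swept inputs."""
    xs = np.linspace(x_range[0], x_range[1], n)
    ys = np.linspace(y_range[0], y_range[1], n)
    gx, gy = np.meshgrid(xs, ys)
    z = np.empty_like(gx)
    for i in range(n):
        for j in range(n):
            z[i, j] = model(gx[i, j], gy[i, j], **fixed)
    return gx, gy, z

def toy_model(x, y, c):
    """Made-up stand-in for a trained model; peaks at x = 1, y = -0.5."""
    return -(x - 1.0) ** 2 - (y + 0.5) ** 2 + c

# Sweep x and y while the "unshown" input c is held at a fixed setting.
gx, gy, z = response_surface(toy_model, (0.0, 2.0), (-1.0, 1.0), {"c": 0.5})
best = np.unravel_index(np.argmax(z), z.shape)
print(f"highest response {z[best]:.2f} at x={gx[best]:.2f}, y={gy[best]:.2f}")
```

The arrays `gx`, `gy`, and `z` are in the form that common plotting libraries accept for surface and contour plots.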

Observations for % TOC removal

• Optimal coagulation pH = 6.5
• Coagulation aid = 0.05 mg/L (or less)
– However, coagulant aid does affect turbidity
• ClO2 = 0.8 mg/L
• Coagulant dose as function of [TOC]

Determine Optimal Turbidity
Removal

Total % turbidity removal

• System is robust in removal of turbidity regardless of source turbidity
levels; when source turbidity increases, % removal asymptotically
approaches 100%
• Goal is to minimize operating costs to meet water quality targets
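The asymptotic behavior follows from the definition of percent removal when treated turbidity stays roughly constant. The sketch below assumes a constant treated value of 0.1 NTU purely for illustration; neither that value nor the source values come from the slides.

```python
def percent_removal(source_ntu, treated_ntu):
    """Percent of source turbidity removed by treatment."""
    return 100.0 * (source_ntu - treated_ntu) / source_ntu

# Assumed constant treated turbidity (hypothetical value).
TREATED_NTU = 0.1

removals = [percent_removal(source, TREATED_NTU) for source in (1.0, 10.0, 100.0)]
for source, removal in zip((1.0, 10.0, 100.0), removals):
    print(f"source {source:6.1f} NTU -> {removal:.1f}% removal")
```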

Predict % filtration turbidity removal
– R-TURB-BLNDCALC = 0
– Historical minimums
• CLO2-H-BLNDCALC = 0.030 mg/l
• COAGAID-T3456CALC = 0.057 mg/l
• COAG-T3456CALC = 12.0 mg/l
• FLTAID-T3456CALC = 0.0041 mg/l

Contour maps for turbidity filtration
– R-TURB-BLNDCALC = 0
– Historical mins
• CLO2-H-BLNDCALC = 0.030 mg/l
• COAGAID-T3456CALC = 0.057 mg/l
• COAG-T3456CALC = 12.0 mg/l
• FLTAID-T3456CALC = 0.0041 mg/l

Observations % filtration turbidity
removal
1. Turbidity removal through filtration is highly
sensitive to:
a. coagulant dose
b. chlorine dioxide dose
2. Turbidity removal through filtration is NOT
sensitive to filter polymer aid
3. Turbidity removal = f (sed. turbidity + ClO2 +
coagulant + coagulant aid); R2 = 0.75
4. Filter run times very low; recommend eliminating
filter polymer aid
5. Recommend side-by-side filter testing

CASE STUDY NO. 4—
MODELING TANK
NITRIFICATION

[Figure: tank nitrification in summer; nearby chlorine (mg/l) and tank level (ft) vs. days; residual nears zero]

• Cl2, pH, temp data relationship
at storage tank site

[Figure: Cl2 plotted against pH and temp]

Observations about tank water
quality
• Nitrification demonstrated by loss of total
chlorine residual, lower pH, higher NO2-
• Total chlorine loss is pH sensitive
• Total chlorine loss is very temperature
dependent
– Nitrification rate increases exponentially above
approximately 80 °F
• At pH > 9, loss of residual stabilizes

Questions

John B. Cook, PE
Advanced Data Mining Intl,
Greenville, SC
John.Cook@advdmi.com
843.513.2130
www.advdmi.com
