Process Optimization: Enhancing Understanding through Mining Full-Scale Data
John B. Cook, PE, M.ASCE
Edwin A. Roehl
Uwe Mundry
Advanced Data Mining Int’l
Greenville, SC
Acknowledgement

Ed Roehl – CTO
•   World class industrial researcher;
•   Software design, development, and project management;
•   Advanced process engineering, computer-based modeling and optimization
    methods, industrial R&D, product/process design automation, CAE, PDM;
•   Data mining, multivariate analysis, predictive modeling, simulation,
    advanced control, signal processing, non-linear/chaotic systems,
    computational geometry;
•   AI, expert systems, OOP/computer languages, machine learning/artificial
    neural networks.

Uwe Mundry, Partner
•   World class software design, development;
•   Multispectral and hyperspectral imaging and pattern recognition, 4D
    medical imaging, 4D geographical imaging, homeland security
    applications, real-time decision support systems with industrial
    applications; Data mining, multivariate analysis, predictive modeling,
    simulation, advanced control, signal processing, non-linear/chaotic
    systems, computational geometry, machine learning/artificial neural
    networks; OOP/multiple computer languages; Medical and environmental
    imaging.
Why optimize your plant?

• Reduced operating budgets (10% very
  common)
• Increasingly stringent regulations
    --Water treatment?
    --Wastewater treatment?
• Increasing cost of capital improvements
    --USD worth less
    --QE2 will lower value of debt instruments such
    as bonds
Process optimization by modeling

1. Modeling processes through various
   means
  a. Bench-scale models
  b. Pilot-scale models
  c. Mathematical models
    1) Deterministic/mechanistic—based on first principles
    2) Empirical—either statistical or based upon some
          optimal function to describe behavior
    3) Hybrid of 1) and 2)
Process optimization by modeling

What is a mathematical model?
“...consistent set of mathematical equations which
  is thought to correspond to some other entity, its
  prototype.” --Rutherford Aris
Definitions for pilot-scale modeling
• Geometric Similarity—All lengths of the model and the
  prototype must be in the same ratio. All corresponding
  angles must be equal. [This is the easy one to achieve.]
• Kinematic Similarity—Ratios of fluid velocity and other
  relevant velocities must be the same for the model and
  prototype. Ratios of flow time scale and boundary time
  scale must be the same. [Problems with laminar/turbulent.]
• Dynamic Similarity—The force polygons for the model
  and prototype must be proportional. Relevant forces
  include inertia, pressure, viscous, and surface tension
  forces.
Equations of importance


• Reynolds number: $R = \rho V \ell / \mu$    (very important!)

• Weber number: $W = \rho V^2 \ell / \sigma$ (surface tension effects)

• Froude number: $F = V / \sqrt{g \ell}$ (free surface effects)
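
A minimal sketch of how R, W, and F can be evaluated, and of why they matter for scale-up. The fluid properties and the 1:10 pilot geometry below are illustrative assumptions, not values from the slides. The pilot velocity is chosen to hold the Froude number constant, which leaves the Reynolds and Weber numbers mismatched when the same fluid is used.

```python
import math

def reynolds(rho, v, length, mu):
    """R = rho * V * l / mu (inertial vs. viscous forces)."""
    return rho * v * length / mu

def weber(rho, v, length, sigma):
    """W = rho * V^2 * l / sigma (inertial vs. surface tension forces)."""
    return rho * v ** 2 * length / sigma

def froude(v, length, g=9.81):
    """F = V / sqrt(g * l) (inertial vs. gravity forces)."""
    return v / math.sqrt(g * length)

# Assumed properties of water near 20 C (illustrative, not from the slides).
RHO = 998.0      # density, kg/m^3
MU = 1.0e-3      # dynamic viscosity, Pa*s
SIGMA = 0.0728   # surface tension, N/m

# Hypothetical full-scale case: 1 m/s with a 1 m characteristic length.
full = dict(v=1.0, length=1.0)

# Hypothetical 1:10 pilot, velocity scaled by sqrt(scale) to keep F equal.
scale = 0.1
pilot = dict(v=full["v"] * math.sqrt(scale), length=full["length"] * scale)

for name, case in (("full-scale", full), ("pilot", pilot)):
    print(name,
          "R=%.3g" % reynolds(RHO, case["v"], case["length"], MU),
          "W=%.3g" % weber(RHO, case["v"], case["length"], SIGMA),
          "F=%.3g" % froude(case["v"], case["length"]))
```

With the same fluid in both systems, matching F forces R down by a factor of scale^1.5, so the pilot cannot match all three groups at once.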
Scale-up problems with models

1. For bench-scale and pilot-scale:
  a. Example of problems with scale-up for a
  simple drag coefficient, $C_D$:
     $C_D = f(R, W, F, \alpha)$
     [Where is this important for water treatment?]
  b. Pilot-scale testing is good for comparing
  one pilot train with another pilot train, but not
  for finding absolute numbers for full-scale
So what of models?

“Models are undeniably beautiful, and a man may
  justly be proud to be seen in their company. But
  they may have their hidden vices. The question is,
  after all, not only whether they are good to look at,
  but whether we can live happily with them.”
      --Abraham Kaplan, The Conduct of Inquiry
Another problem: chaotic behavior

•   “Deterministic evolution of a nonlinear system
    which is between regular behavior and
    stochastic behavior.” – Abarbanel

•   “The property that characterizes a dynamical
    system in which most orbits exhibit sensitive
    dependence.” – Lorenz

•   “Neither periodic nor stochastic behaviors that
    have structure in state/feature space, making
    them somewhat predictable.” – ADMi
Lorenz attractor shows problem
•   Poster child of chaos
•   Purely synthetic, derived from 3 equations
    –     $dx/dt = -\sigma x + \sigma y$
    –     $dy/dt = -xz + rx - y$
    –     $dz/dt = xy - bz$
•   “extreme sensitivity to changes in boundary conditions”
[Figure: signal time series and 3D delay plot showing “orbitals,” with mode 1 and mode 2 labeled]
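
The sensitivity can be illustrated with a short sketch that integrates the three Lorenz equations from two nearly identical starting states and tracks how far apart they drift. The parameter values (σ = 10, r = 28, b = 8/3), the starting point, the perturbation size, and the fixed-step Runge-Kutta integrator are all assumptions chosen for illustration. The slides do not state them.

```python
import math

def lorenz(state, sigma=10.0, r=28.0, b=8.0 / 3.0):
    """Right-hand side of the three equations: dx/dt, dy/dt, dz/dt."""
    x, y, z = state
    return (-sigma * x + sigma * y, -x * z + r * x - y, x * y - b * z)

def rk4_step(f, state, dt):
    """Advance `state` by one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = f(state)
    k2 = f(shifted(state, k1, dt / 2))
    k3 = f(shifted(state, k2, dt / 2))
    k4 = f(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def separation_history(eps=1e-8, dt=0.01, steps=3000):
    """Distance between two trajectories that start `eps` apart in x."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + eps, 1.0, 1.0)
    history = []
    for _ in range(steps):
        a = rk4_step(lorenz, a, dt)
        b = rk4_step(lorenz, b, dt)
        history.append(math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b))))
    return history

history = separation_history()
print("separation after first step: %.3g" % history[0])
print("largest separation reached:  %.3g" % max(history))
```

The two runs are indistinguishable at first and end up on unrelated parts of the attractor, which is why long-range point forecasts of such a system fail even though the equations are deterministic.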
Chaos in Savannah River estuary
Savannah River salinity intrusion
[Figure: measured vs. predicted low-frequency SC, 24-hr MWA; $R^2$ = 0.88]
Modeling chaotic behavior, 1
State Space Reconstruction (SSR)
• SSR is the means by which complex, constantly changing
  processes can be represented in straightforward geometric
  terms for visualization and modeling. SSR is like super
  trending. It suggests that a process’ state space can be
  optimally but not perfectly characterized by state vectors
  Y(t). The vectors are constructed using an optimal number
  of measurements, equal to the “local dimension” $d_L$
  (Abarbanel, 1996), that are spaced apart in time
  by integer multiples of an optimal time delay $d$.
  Mathematically:
• $Y(t) = [x(t),\ x(t - d),\ x(t - 2d),\ \ldots,\ x(t - (d_L - 1)d)]$   (eq. 1)
• Note that here Y(t) is univariate. Values of $d_L$ and $d$ are
  estimated analytically or experimentally from the data.
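
Eq. 1 can be sketched in a few lines. The function name `delay_vectors` and the toy series are hypothetical. The values of the dimension and the delay would come from the data, as noted above, and are simply supplied as arguments here.

```python
def delay_vectors(x, d_l, d):
    """Build Y(t) = [x(t), x(t-d), ..., x(t-(d_l-1)d)] for every valid t.

    x   : sequence of measurements, oldest first
    d_l : number of measurements per state vector (local dimension)
    d   : time delay between measurements, in samples
    """
    start = (d_l - 1) * d  # earliest t with a full history available
    return [[x[t - j * d] for j in range(d_l)] for t in range(start, len(x))]

# Toy series so the indexing is easy to follow.
signal = list(range(10))
vectors = delay_vectors(signal, d_l=3, d=2)
print(vectors[0])   # state vector at t = 4
print(vectors[-1])  # state vector at t = 9
```

Each row is one point in the reconstructed state space, so plotting the rows against each other gives a delay plot of the kind used to visualize attractors.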
Modeling chaotic behavior, 2
• For a multivariate process of k independent variables:
• $Y(t) = \{[x_1(t),\ x_1(t - d_1),\ \ldots,\ x_1(t - (d_{L1} - 1)d_1)],\ \ldots,\ [x_k(t),\ x_k(t - d_k),\ \ldots,\ x_k(t - (d_{Lk} - 1)d_k)]\}$   (eq. 2)
• This provides each variable with its own $d_L$ and $d$. A further
  generalization that provides non-fixed time delay spacing
  for each variable:
• $Y(t) = \{[x_1(t),\ x_1(t - d_{1,1}),\ \ldots,\ x_1(t - (d_{L1} - 1)d_{1,d_{L1}-1})],\ \ldots,\ [x_k(t),\ x_k(t - d_{k,1}),\ \ldots,\ x_k(t - (d_{Lk} - 1)d_{k,d_{Lk}-1})]\}$   (eq. 3)
• Determining the best variables $x_k$ to use, and properly
  estimating dimensions $d_{Lk}$ and time delays $d_k$ by analytical
  or experimental means, helps to ensure that a given
  process can be successfully reconstructed.
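
A sketch of the multivariate case follows. As a simplifying assumption, each variable's delays are passed as an explicit list of lags in samples, which covers both the fixed spacing of eq. 2 (for example `[0, d, 2d]`) and non-uniform spacing in the spirit of eq. 3. The function name, the variable names, and the data are hypothetical.

```python
def multivariate_delay_vectors(series, lags):
    """Concatenate delayed samples of several variables into state vectors.

    series : dict mapping variable name -> list of measurements
    lags   : dict mapping variable name -> list of lags in samples
             (0 means the current value)
    """
    start = max(max(var_lags) for var_lags in lags.values())
    n = min(len(series[name]) for name in lags)
    vectors = []
    for t in range(start, n):
        row = []
        for name, var_lags in lags.items():
            row.extend(series[name][t - lag] for lag in var_lags)
        vectors.append(row)
    return vectors

# Toy data: two variables, each with its own dimension and delays.
series = {"x1": list(range(8)), "x2": list(range(10, 18))}
lags = {"x1": [0, 1, 3], "x2": [0, 2]}
vectors = multivariate_delay_vectors(series, lags)
print(vectors[0])  # first state vector, at t = 3
```

Rows built this way can serve directly as the input vectors of an empirical model.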
The fundamental problem:

“The simple things you see are
 all complicated.” --Substitute,
        Pete Townshend
Consider modeling full-scale system with full-scale system
1. Approach
   a. Use data mining to extract information
   contained in the full-scale data
   b. Eliminates problems inherent in scale-up
   issues
   c. Chaotic behavior can be modeled
   d. Systematic and objective approach to
   optimizing information
Building Process Models
A view of a general process

[Figure: inputs x1 to x8 feed a PHYSICAL PROCESS that produces outputs y1 (multiply periodic), y2 (chaotic), and y3 (stochastic)]

    Causes of Variability
•   people
•   configuration of controls
•   raw water
•   weather
•   chemicals

• Outputs that are predictable can then be controlled
• Outputs that are unpredictable cannot be controlled
Relate variables with neural networks
• Inspired by the Brain
  – get complicated behaviors from lots of “simple”
    interconnected devices - neurons and synapses
  – non-linear, multivariate curve fitting
  – models are synthesized from example data
     • machine learning
[Figure: network diagram mapping inputs x1 to x5 to outputs y1 and y2]
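
To make "non-linear, multivariate curve fitting from example data" concrete, here is a minimal sketch of a one-hidden-layer network trained by gradient descent. It is not the authors' software. The class name `TinyMLP`, the layer size, the learning rate, and the synthetic target y = x1 * x2 are all assumptions for illustration.

```python
import math
import random

class TinyMLP:
    """One hidden tanh layer and a linear output, trained by gradient descent."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self.hidden(x))) + self.b2

    def train_step(self, x, y, lr):
        """One stochastic gradient step on the squared error of one example."""
        h = self.hidden(x)
        err = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - y
        for j, hj in enumerate(h):
            # Back-propagate through tanh using the pre-update output weight.
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err

def mse(model, samples):
    return sum((model.predict(x) - y) ** 2 for x, y in samples) / len(samples)

# Synthetic example data (assumed): a non-linear interaction of two inputs.
grid = [i / 4 for i in range(5)]
samples = [((x1, x2), x1 * x2) for x1 in grid for x2 in grid]

model = TinyMLP(n_in=2, n_hidden=4)
mse_before = mse(model, samples)
for _ in range(500):
    for x, y in samples:
        model.train_step(x, y, lr=0.1)
mse_after = mse(model, samples)
print("MSE before: %.4f, after: %.4f" % (mse_before, mse_after))
```

Evaluating `model.predict` over a grid of two inputs while holding the others fixed is one way to draw a response surface from such a model.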
ANNs produce response surfaces
Example: Trihalomethanes Formation
[Figure: response surface fitted by a non-linear ANN model, which represents normal behavior; annotations mark a deviation from normal, a region of possibly better conditions, and a region with no data]
CASE STUDY NO. 1—THM
AND HAA5 REDUCTION
Modeling chloroform
• Input = TURBFIN (MWA=4, t=-1),
  $R^2_{ANN}$=0.47, RMSE=7.3
• +Input = COLORFIN (MWA=4),
  $R^2_{ANN}$=0.60, RMSE=6.2
• +Input = TPFIN, $R^2_{ANN}$=0.74,
  RMSE=5.0
[Figure: model fit on days when DBPs were measured ($R^2_{ANN}$=0.74), and response surfaces at TPFIN=11C and TPFIN=32C; annotations: “CF higher at high TP”, “same”]
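
The input notation can be read as signal preprocessing. The sketch below assumes that MWA=4 denotes a trailing moving window average over four samples and that t=-1 denotes a delay of one sample. The slides do not state the sampling interval or the exact windowing, so both readings are assumptions, and the function names and data are hypothetical.

```python
def moving_window_average(x, window):
    """Trailing mean over `window` samples; None until the window is full."""
    out = []
    for t in range(len(x)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(sum(x[t + 1 - window:t + 1]) / window)
    return out

def lag(x, steps):
    """Delay a series so that position t holds x[t - steps]."""
    return [None] * steps + list(x[:len(x) - steps])

# Toy series standing in for a measured input such as finished turbidity.
raw = [1, 2, 3, 4, 5, 6]
smoothed = moving_window_average(raw, window=4)
model_input = lag(smoothed, steps=1)
print(smoothed)
print(model_input)
```

Rows where the prepared input is None lack enough history and would be dropped before fitting.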
Observations about chloroform

• Finished turbidity accounts for 47% of
  variability in chloroform
• Finished turbidity + color accounts for 60%
• Finished turbidity + color + temperature
  accounts for 74%
• Or, $R^2_{ANN}$ = 0.74
• Recommend:
  1) optimize turbidity removal (most
  important) [Is this counterintuitive?]
  2) optimize TOC removal
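
The two fit statistics quoted throughout can be sketched as follows, assuming R2 is the coefficient of determination (the fraction of variability in the measurements accounted for by the model) and RMSE is the root-mean-square error in the units of the measured quantity. The numbers are made up for illustration.

```python
import math

def r_squared(measured, predicted):
    """1 - SS_res / SS_tot: fraction of variability accounted for."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def rmse(measured, predicted):
    """Root-mean-square error, in the units of the measurements."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

# Made-up example values.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
print("R2 = %.2f, RMSE = %.2f" % (r_squared(measured, predicted),
                                  rmse(measured, predicted)))
```

Adding an input that carries new information raises R2 and lowers RMSE, which is the pattern shown as turbidity, color, and temperature are added in turn.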
3D scatter plot
[Figure: 3D scatter plot with an outlier marked]
Modeling BDM, Part 1
• Inputs = TURBFIN (t=-2),
  COLORFIN (MWA=3), $R^2_{ANN}$=0.24,
  RMSE=1.8
• +Input = TPFIN, $R^2_{ANN}$=0.66,
  RMSE=1.2
• BDM far more sensitive to TPFIN than TURBFIN &
  COLORFIN
[Figure: model fit on days when DBPs were measured ($R^2_{ANN}$=0.66), and response surfaces at TPFIN=11C and TPFIN=32C]
Observations regarding BDM

• Finished turbidity + finished color accounts
  for 24% [very low correlation!]
• Finished turbidity + color + temperature
  accounts for 66%
• Or, $R^2$ = 0.66
• So, BDM is dominated by temperature
Modeling BDM, Part 2
• Remove TURBFIN, add inputs = PRE-Cl2, $R^2_{ANN}$=0.72, RMSE=1.1
• BDM sensitivity to PRE-Cl2 & NH3 higher at low TPFIN.
• BDM higher at higher COLORFIN.
• TP is dominant effect.
[Figure: four response surfaces, at TPFIN=11C and TPFIN=32C, each with COLORFIN=3.0 and COLORFIN=1.0]
Modeling TCA
• Input = TURBFIN (MWA=4, t=-3),
  $R^2_{ANN}$=0.47, RMSE=5.5
• +Input = COLORFIN (MWA=4),
  $R^2_{ANN}$=0.47, RMSE=5.5
• +Input = TPFIN, $R^2_{ANN}$=0.61,
  RMSE=4.7
• TCA less seasonal than DCA
[Figure: model fit on days when DBPs were measured ($R^2_{ANN}$=0.61), and response surfaces at TPFIN=11C and TPFIN=32C]
Observations modeling TCA

• Finished turbidity accounts for 47% of
  variability
• Finished turbidity + finished color accounts
  for 47% [surprising, as color not capturing
  precursors!]
• Finished turbidity + color + finished
  temperature accounts for 61%
• Or, $R^2$ = 0.61
Summary - modeling THM and HAA species
• Consider finished turbidity, color, and temperature
   – indicators of organics speciation by time of year
   – treatment process kinetics and performance
• Chloroform positively correlated to finished turbidity, color,
  and temperature; $R^2_{ANN}$ = 0.74
• BDM highly seasonal; positively correlated to finished
  turbidity, color, and temperature, and to pre-Cl2 and NH3;
  $R^2_{ANN}$ = 0.66 to 0.72
• DCA highly seasonal; positively correlated to finished
  turbidity, color, and temperature; $R^2_{ANN}$ = 0.73
• TCA somewhat seasonal; positively correlated to
  finished turbidity and temperature; $R^2_{ANN}$ = 0.61
CASE STUDY NO. 2—THM
OPTIMIZATION
Conventional WTP case study
Predict and Reduce THM Formation

• Near real-time
  predictions
• $ Savings by
  optimizing use of
  chemicals
GUI for THM control and “what ifs”
CASE STUDY NO. 3—
OPTIMIZE GENERAL
PROCESS
Determine Optimal TOC Removal
3D response surfaces for % TOC removal
•   Unshown input settings
    –   R-TOC-BLNDCALC = 0
    –   R-PHXY = 0 (hist. avg. = 7.34)
    –   CLO2-H-BLNDCALC = 0.030 mg/l (hist. min.)
    –   COAGAID-X = 0.053 mg/l (hist. min.)
    –   COAG-X = 12.0 mg/l (hist. min.)
% TOC removal contour maps
•   Unshown input settings
     –   R-TOC-BLNDCALC = 0
     –   R-PHXY-C = 0 (hist. avg. = 7.34)
     –   CLO2-H-BLNDCALC = 0.030 mg/l (hist. min.)
     –   COAGAID-X = 0.053 mg/l (hist. min.)
     –   COAG-X = 12.0 mg/l (hist. min.)
Observations for % TOC removal

• Optimal coagulation pH = 6.5
• Coagulant aid = 0.05 mg/L (or less)
  – However, coagulant aid does affect turbidity
• ClO2 = 0.8 mg/L
• Coagulant dose as function of [TOC]
Determine Optimal Turbidity Removal
Total % turbidity removal




• System is robust in removal of turbidity regardless of source turbidity
  levels; when source turbidity increases, % removal asymptotically
  approaches 100%
• Goal is to minimize operating costs to meet water quality targets
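
One way to read the asymptotic behavior, as a sketch rather than the authors' derivation: if the treated-water turbidity $T_{out}$ stays roughly constant while the source turbidity $T_{in}$ rises (an assumption consistent with a robust system, not a result stated in the slides), percent removal tends toward 100%. The symbols $T_{in}$ and $T_{out}$ are introduced here for illustration.

```latex
\[
\%\,\text{removal} = 100\left(1 - \frac{T_{\text{out}}}{T_{\text{in}}}\right),
\qquad
\lim_{T_{\text{in}} \to \infty} 100\left(1 - \frac{T_{\text{out}}}{T_{\text{in}}}\right) = 100
\quad \text{(for fixed } T_{\text{out}}\text{)}.
\]
```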
Predict % filtration turbidity removal
•   Unshown input settings
     – R-TURB-BLNDCALC = 0
     – Historical minimums
         •   CLO2-H-BLNDCALC = 0.030 mg/l
         •   COAGAID-T3456CALC = 0.057 mg/l
         •   COAG-T3456CALC = 12.0 mg/l
         •   FLTAID-T3456CALC = 0.0041 mg/l
Contour maps for turbidity filtration
  – R-TURB-BLNDCALC = 0
  – Historical mins
     •   CLO2-H-BLNDCALC = 0.030 mg/l
     •   COAGAID-T3456CALC = 0.057 mg/l
     •   COAG-T3456CALC = 12.0 mg/l
     •   FLTAID-T3456CALC = 0.0041 mg/l
Observations % filtration turbidity removal
1. Turbidity removal through filtration is highly
    sensitive to:
    a. coagulant dose
    b. chlorine dioxide dose
2. Turbidity removal through filtration is NOT
    sensitive to filter polymer aid
3. Turbidity removal = f (sed. turbidity + ClO2 +
    coagulant + coagulant aid); $R^2$ = 0.75
4. Filter run times very low; recommend eliminating
    filter polymer aid
5. Recommend side-by-side filter testing
CASE STUDY NO. 4—
MODELING TANK
NITRIFICATION
Tank nitrification
     • Cl2, pH, temp data relationship
       at storage tank site
[Figure: time series by day of nearby chlorine (mg/l) and tank level (ft), with summer marked and the residual nearing zero; plot relating Cl2, pH, and temp]
Observations about tank water quality
• Nitrification demonstrated by loss of total
  chlorine residual, lower pH, higher $NO_2^-$
• Total chlorine loss is pH sensitive
• Total chlorine loss is very temperature
  dependent
  – Nitrification rate increases exponentially above
    approximately 80 °F
• At pH > 9, loss of residual stabilizes
Questions

John B. Cook, PE
Advanced Data Mining Intl,
  Greenville, SC
John.Cook@advdmi.com
843.513.2130
www.advdmi.com
