Itinerary
“A Traveler's Guide”
• About ADM
– Data Mining
– Visualization
– Intelligent Software
– Real-Time Web Applications
• Technology
• Examples
Data Mining?
• “The search for valuable knowledge in massive volumes
of data” (Weiss and Indurkhya)
• Data Mining Tool Box
– signal processing, advanced statistics, machine learning, chaos
theory, advanced visualization
• Why?
– Continuously maximize yields, throughput, profit
– Continuously minimize problems
• How?
– Learn/quantify important cause-effect relationships
– Computer models developed directly from data
• are “virtual processes” that behave like the real processes
• predict future outcomes, evaluate alternatives, show the best
pathway forward
More on Data Mining
• Data have properties that must be measured for optimal
use
– uni / multivariate relationships
– periodicity / chaos / noise
– orthogonality / redundancy
– continuity / segmentation
– dynamics
• temporal: time delays, prediction horizon
• dimensions: inertia, historical uniqueness
• Data Mining
– Maximizes/Extracts “information content”
– Automates discovery
– Integrates your data with your business
20 Years of Medical Imaging
for GE and Siemens
Applied to Data Mining
About ADM
• New Company
• Data Mining & Visualization Services and Software
• Founders have 40+ years
– engineering, artificial intelligence/expert systems,
complex programming, signal processing,
clustering/classification, machine learning, advanced
visualization, data mining
– Automotive, Environmental, Medical, Metals, Oil & Gas,
Polymers, Electronics
– special expertise in dynamical systems that constantly
change/evolve
• Fastest, most skilled anywhere
Technology
A View of Processes
[Diagram: a PHYSICAL PROCESS, a “deterministic dynamical system”, with inputs x1 to x8 and outputs y1 to y3. Signal components: multiply periodic, chaotic, stochastic.]
• non-stochastic effects should be predictable, therefore controllable
Multiply Periodic
(Fourier approximations)
• people
• lab tests
• controls tuning
• raw materials
• weather
Chaos
Lorenz attractor
Power Spectrum
3D Delay Plot
“Orbitals”
Prediction from  = -10
“extreme sensitivity
to changes in
boundary conditions”
Chaos Example
Role of a Process Model
[Diagram: a process model maps inputs x1 to x8 to outputs y1 to y3.]
• Things you CAN control
– control setpoints, e.g., pressure, temperature, speed
• Things you CAN’T control
– raw material properties, e.g., density, surface area, molecular weight
– other state variables, e.g., humidity, amb. temperature
• What you want to know
– quality measures, e.g., strength, clarity, thickness
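Deterministic vs. Empirical
Empirical Models
• Statistics, e.g., the least-squares line with slope A and intercept B:
$A = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$
$B = \frac{\sum y \sum x^2 - \sum x \sum xy}{n \sum x^2 - (\sum x)^2}$
• Neural Networks
First Principles Models
• $E = m c^2$
• $\frac{\partial u}{\partial t} - \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right) = 0$
Production
Economic
Environment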
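The closed-form least-squares expressions for A and B on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name `fit_line` and the sample data are illustrative assumptions, not part of the original deck.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = A*x + B (A = slope, B = intercept)."""
    n = len(xs)
    sx = sum(xs)                                  # sum of x
    sy = sum(ys)                                  # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x*y
    sx2 = sum(x * x for x in xs)                  # sum of x^2
    denom = n * sx2 - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (n * sxy - sx * sy) / denom
    b = (sy * sx2 - sx * sxy) / denom
    return a, b


# Hypothetical data lying exactly on y = 2x + 1.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Both coefficients share the same denominator, which vanishes only when every x value is identical.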
Interpolation / Extrapolation
[Diagram: Model Space with points P1 to P4 and Pw to Pz, marking regions where the model interpolates and regions where the model extrapolates; designs labeled “a good design”, “a mediocre design”, and “a bad design”.]
Historical Data
• noisy, small data
• designed experiments
About Neural Networks
• Inspired by the Brain
– get complicated behaviors from lots of “simple”
interconnected devices - neurons and synapses
– models are synthesized from example data
• machine learning
[Diagram: network with inputs x1 to x5 and outputs y1, y2.]
About Neural Networks
• Non-linear Multivariate Curve Fitting
– the modeler prescribes inputs, outputs, hidden layer
neurons, and connections
– “Weights” are the
unknown coefficients that
are determined by the
computer from examples
using an error minimizing
“learning algorithm”
[Diagram: input layer (x1 to x5), hidden layer, and output layer (y1, y2); “weights” wi … wi+n control the connections and are fitted to input/output examples.]
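To make the idea of weights fitted by an error-minimizing learning algorithm concrete, here is a minimal sketch of a one-hidden-layer network trained by stochastic gradient descent. The deck does not name its learning algorithm, so plain backpropagation, the function `train_mlp`, the network size, the learning rate, and the target function are all assumptions chosen for illustration.

```python
import math
import random


def train_mlp(examples, n_hidden=3, rate=0.1, epochs=2000, seed=0):
    """Fit a one-hidden-layer tanh network with a linear output.

    examples: list of (inputs, target) pairs with a scalar target.
    Returns (predict, history), where history holds the mean squared
    error observed during each epoch.
    """
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # w1[j]: weights into hidden neuron j; the last entry is its bias.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
          for _ in range(n_hidden)]
    # w2: hidden-to-output weights; the last entry is the output bias.
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
             for row in w1]
        y = sum(w * v for w, v in zip(w2, h)) + w2[-1]
        return h, y

    history = []
    for _ in range(epochs):
        total = 0.0
        for x, target in examples:
            h, y = forward(x)
            err = y - target
            total += err * err
            for j in range(n_hidden):
                # Chain rule through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= rate * err * h[j]
                for i in range(n_in):
                    w1[j][i] -= rate * grad_h * x[i]
                w1[j][-1] -= rate * grad_h
            w2[-1] -= rate * err
        history.append(total / len(examples))
    return (lambda x: forward(x)[1]), history


# Hypothetical input/output examples on a 3 x 3 grid.
grid = [-1.0, 0.0, 1.0]
examples = [([a, b], 0.5 * a - 0.3 * b + 0.2 * a * b)
            for a in grid for b in grid]
predict, history = train_mlp(examples)
```

Comparing `history[0]` with `history[-1]` shows how far the error fell as the weights were adjusted.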
About Neural Networks
• Shifts Modeling Focus
– from smaller data/big deterministic modeling effort
– to bigger data/smaller modeling effort
– combine with optimization (search) methods
• real-time prediction
• resource allocation
– deterministic + error correcting ANN hybrids
Response Surfaces
Water Disinfection Trihalomethanes Formation
no data
surface fitted by non-linear
ANN model represents normal
behavior
deviation from normal
better conditions?
Response Surfaces
Savannah River Saltwater Migration
Optimizing With Models
[Diagram: the process model maps inputs x1 to x8 to outputs y1 to y3. An optimization program (search routine) varies the inputs, and the outputs are evaluated for goodness through a Performance Index.]
PI = a y1 + b y2 + c y3
GOAL: determine values of inputs (within controllable
range) to optimize Performance Index while meeting
constraints.
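The loop described here (a search routine varies the inputs, the model predicts the outputs, and a Performance Index scores them) can be sketched as a random search. Everything specific below is a hypothetical stand-in: `process_model` replaces a trained model, and the PI weights, bounds, and constraint are invented for illustration. The deck does not say which search method was used.

```python
import random


def process_model(x):
    """Hypothetical stand-in for a trained process model: 2 inputs, 3 outputs."""
    pressure, temperature = x
    y1 = -(pressure - 3.0) ** 2                 # best near pressure = 3
    y2 = -(temperature - 150.0) ** 2 / 100.0    # best near temperature = 150
    y3 = 0.01 * pressure * temperature
    return y1, y2, y3


def performance_index(y, a=1.0, b=1.0, c=0.5):
    """PI = a*y1 + b*y2 + c*y3 with illustrative weights."""
    y1, y2, y3 = y
    return a * y1 + b * y2 + c * y3


def random_search(bounds, constraint, trials=5000, seed=0):
    """Sample inputs within their controllable range and keep the best feasible PI."""
    rng = random.Random(seed)
    best_x, best_pi = None, float("-inf")
    for _ in range(trials):
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        if not constraint(x):
            continue                            # skip infeasible candidates
        pi = performance_index(process_model(x))
        if pi > best_pi:
            best_x, best_pi = x, pi
    return best_x, best_pi


bounds = [(1.0, 5.0), (100.0, 200.0)]           # controllable ranges (assumed)


def constraint(x):
    return x[0] * x[1] <= 600.0                 # illustrative constraint


best_x, best_pi = random_search(bounds, constraint)
```

Random search is used only because it is short; any optimizer that respects the bounds and constraints fits the same loop.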
Control Possibilities
Polymer Packaging Film Intrinsic Viscosity
[Plots: ACTUAL vs. Prediction, BEFORE and AFTER; labels PROCESS and WEATHER.]
Industrial
“Sometimes it runs good, sometimes bad, and we don’t know why.”
[Diagram: heater, dryer, air jets.]
Discrete Event Prediction
Unexplained Polymer Film Production Shutdowns
[Plots: 20 minute interval, 4 minute ramp; signals shown 19 minutes, 12 minutes, and 1 minute before the event; labels: temperature related web breaks, viscosity, initial web break.]
Representation
Dynamics
• can require multiple delays for same variable
• delays may vary
Different events due to
different causes are
detected at different
times prior to occurrence
Off-Spec Production
Synthetic Textile Fiber Quality
[Plots vs. Days since July 1, 1998: Process Temp (C) with Waste (lbs), Q3 (lbs), and Ambient Temp (C).]
• During period of
high off-spec,
process tracks
ambient
temperature.
Semi-Quantitative Data
Polymer Resin Solid State Polymerization
[Plots: Frequency distributions at Ambient Pressure = 29.41 and Ambient Pressure = 30.26. Diagram: heater, dryer, air jets.]
Environmental Compliance
Water Disinfection Trihalomethanes Formation
[Plot, 7/1/00 to 3/28/01: FINISHED THM (ppb), Trihalomethanes axis 0 to 160 ppb; labels: Control Model, Virtual Sensor.]
• EPA regulated
carcinogen
• Different models for
– prediction
– control (gains)
• $$$ Savings by
optimizing use of
ClO2 vs. Cl2
straight predictions
Customer
Process
Engineering
Tech
Service
Output of your
process
Input to your
customer’s process
Customer Feedback
Your
Continuous
Improvement
Customer’s
Continuous
Improvement
Customer Performance
Synthetic Textile Fiber Quality
[Plots, 1999 to 2000; state labels AL, SC, NC:]
• Opelika: Denier Composite (DENVMACV) and Opelika MJS Red Buttons
• Calhoun: H_WHIT Composite (HWHITCV) and Calhoun MJS Red Buttons
• Frontier: FUSE Composite 2 wk avg, Delay = 15 days (FUSEC) and 32.5 Single Ends Break 2 wk avg (F32_5SEB)
 Setpoints  Quality?
Synthetic Textile Fiber Quality
[Plots vs. Date, 6/99 to 7/00:]
• CRIMP: CRIMP_P (CRIMPPA, CRIMPPB) and U3VI3523.SP (I3523SP)
• DENIER: Denier Composite (DENVMAC) and U3PP3727.SP, U3PP3727.PV (P3727SP, P3727PV)
• ELONG: Daily Average ELONGA Composite (ELONGAC) and U3PF3291.PV, L16PT409.PV (F3291PV, PT409PV)
• FUSE: FUSE Composite (FUSEC) and S3P27845.PV (27845PV)
Contract Optimization
(electricity purchasing relative to usage)
[Plot: Kosa Spartanburg Baselines, 1998 to 1999, kw or kwh vs. Date; series: D1+D2 LOAD TOTAL, D1+D2 USE kwh/hour, D1+D2 USE Billing Demand.]
Evaluate Contract Costs and Options
[Plot: Kosa Spartanburg Load Shifting Scenarios, 9/1 to 9/8, 1998, kw and $/kwh; series: Current D1+D2 kw, Best Case D1+D2 kw, Worst Case D1+D2 kw, RTP ($/kw).]
Load Control Scenarios
Real-Time Pricing and
Electricity Deregulation
Natural Systems
surface water
Estuarine Water Quality
[Plot, 8/21/93 to 8/24/93: Dissolved oxygen (mg/L) and Water temperature (degrees Celsius) vs. Date and time; series: Measured, Neural Network, BRANCH/BLTM.]
• Mixing - Tides, Flows from 3 Rivers
• Weather (T, P, Dew Point)
• Point Discharge Wastewater
Treatment Plants
• Non-Point Discharges - rainfall, 50%
overbank storage
Pollution in Estuaries
[Diagram: High Tide and Mean Tide levels; wastewater discharges; non-point from tidal flooding; non-point from rain; gauging stations.]
[Plots: 2 months of raw signals vs. Time (hours): Dissolved Oxygen Concentration (mg/L), Water Temperature (deg. C), Specific Conductivity (µ-siemens/cm), Water Level (feet).]
[Plots: spectral analysis of water level, dissolved oxygen concentration, specific conductivity, and water temperature; features marked: low frequency, broadband, and peaks at 6.2 hr, 8.2 hr, 12.4 hr, and 25.6 & 24 hr.]
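A spectral analysis like this one, which picks out tidal and daily periods, can be illustrated with a discrete Fourier transform on a synthetic record. This is a sketch under stated assumptions: the signal is invented (a 12.4 hr and a 24 hr sinusoid sampled hourly for 744 hours), `power_spectrum` is a hypothetical helper, and the naive O(N^2) transform is used only to avoid external libraries.

```python
import cmath
import math


def power_spectrum(signal):
    """Naive discrete Fourier transform; returns {bin k: power} for k = 1..N/2."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]       # remove the constant offset
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        powers[k] = abs(coeff) ** 2
    return powers


# Synthetic hourly record (assumed): 31 days = 744 samples.
hours = 744
signal = [2.0 * math.sin(2 * math.pi * t / 12.4)
          + 0.8 * math.sin(2 * math.pi * t / 24.0)
          for t in range(hours)]

powers = power_spectrum(signal)
top_bins = sorted(powers, key=powers.get, reverse=True)[:2]
periods = [hours / k for k in top_bins]         # period in hours of each peak
```

The record length is chosen so that both periods fall exactly on frequency bins; real records would show some leakage into neighboring bins.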
[Plots: signal decomposition, chaotic component via digital filtering, vs. Time (hours): Dissolved Oxygen Concentration (mg/L), Water Temperature (deg. C), Specific Conductivity (µ-siemens/cm), Water Level (feet).]
[Plot, 6/19/93 to 9/11/93: SC (micro-siemens/cm) and DO (mg/L) vs. Date.]
[Plots: signal decomposition, high frequency, periodic components via digital filtering, vs. Time (hours): Dissolved Oxygen Concentration (mg/L), Water Temperature (deg. C), Specific Conductivity (µ-siemens/cm), Water Level (feet).]
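The slides do not say which digital filter was used for the decomposition. As a minimal sketch, a centered moving average can split a record into a low-frequency part and a high-frequency remainder. The window length, the helper names, and the synthetic signal are assumptions.

```python
import math


def moving_average(signal, window):
    """Centered moving average; the window shrinks near the ends of the record."""
    half = window // 2
    out = []
    for t in range(len(signal)):
        lo, hi = max(0, t - half), min(len(signal), t + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def decompose(signal, window):
    """Split a signal into low-frequency and high-frequency components."""
    low = moving_average(signal, window)
    high = [s - l for s, l in zip(signal, low)]  # remainder after smoothing
    return low, high


# Hypothetical hourly record: slow trend plus a 12-hour periodic component.
signal = [0.01 * t + math.sin(2 * math.pi * t / 12.0) for t in range(240)]
low, high = decompose(signal, window=25)
```

By construction the two components add back to the original signal, so each can be modeled separately and the predictions recombined.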
[Diagram: ANNs for chaotic and low frequency data and for high frequency data, with inputs SC’s (t=x) and WL’s (t=y), produce DO at t=0 and a final DO. Plot legend: Measured, Training, Test.]
Beaufort River Estuary
[Plot: 24 hr Derivative of DOA (mg/L) (EDOa6611) and Southside BOD (LB/Day) (SSBOD_D) vs. Sequential but Non-Consecutive Data Point Number; labels SS, SP, PI.]
R = -0.357, R2 = 0.128
R2
siso = 0.130 = 1
Gains by delay 
 = 2 day
 = 3 days
 = 1
BOD Effect on DO
Inputs = WL, SC, TP, Rainfall, BOD
R2 ANN = 0.57
N = 61 points
[Response surface: delta DOs-DOm vs. increasing BOD and flow towards gauge, at TP = 20 C, 61 Points; one region marked No Data.]
Natural Systems
groundwater
Groundwater Modeling
Upper Floridan Aquifer
Well Histories
(18 years)
Surface
Contour
Well Locations
(100x100 miles)
Sub-Regions Behave Differently
[Map: well locations by coordinate, 25x30 miles.]
Groundwater Example
(Cluster monitoring wells according to behavior)
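The deck does not name the clustering method used to group wells by behavior. The sketch below uses k-means on z-scored well histories as one plausible stand-in, so wells are grouped by the shape of their history and not by absolute level. The helper names and the five short well histories are hypothetical.

```python
def zscore(series):
    """Normalize a history to zero mean and unit variance."""
    mean = sum(series) / len(series)
    sd = (sum((v - mean) ** 2 for v in series) / len(series)) ** 0.5
    return [(v - mean) / sd if sd > 0 else 0.0 for v in series]


def sqdist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(series_list, k, iterations=20):
    """Cluster equal-length histories; returns one cluster label per history."""
    data = [zscore(s) for s in series_list]
    # Farthest-point initialization keeps this sketch deterministic.
    centers = [data[0]]
    while len(centers) < k:
        centers.append(max(data,
                           key=lambda s: min(sqdist(s, c) for c in centers)))
    labels = [0] * len(data)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sqdist(s, centers[j]))
                  for s in data]
        for j in range(k):
            members = [s for s, lab in zip(data, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical water-level histories: three rising wells, two falling wells.
wells = [
    [10.0, 11.0, 12.0, 13.0, 14.0],
    [50.0, 52.0, 54.0, 56.0, 58.0],
    [5.0, 5.5, 6.1, 6.4, 7.0],
    [30.0, 28.0, 26.0, 24.0, 22.0],
    [8.0, 7.0, 6.0, 5.0, 4.0],
]
labels = kmeans(wells, k=2)
```

A separate model can then be fitted per cluster.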
Accuracy by Cluster
[Plots: Actual vs. Prediction of Normalized Water Level above Sea Level for clusters C1, C3, C6, and C10; history from April 1982 to October 1998.]
[3D view: Aquifer “Ceiling”, Gulf of Mexico, North arrow; max elevation above sea level ~ 180 feet.]
Oil & Gas
Problem
• Area A is a coal bed methane field
• Data for 59 wells was compiled by petroleum engineering group in CO
to determine if an artificial neural network* (ANN) could be used to
predict total gas production for the life of each well.
[Figures: Well Locations plotted by Normalized UTMX (m) and Normalized UTMY (m); 3D Range Model of Area A (16x32 km, vertical scale ~ 200 m), with North arrows.]
* A form of “machine learning” from AI.
Variable Types
• Static Variables
– e.g., depth, permeability
– for each well, treated like they do not vary in time
• in reality some probably do
– some measurements are just estimates with large error
– limited “information content”, e.g. one value per well
• Time Series, a.k.a. Signals
– e.g., water and gas production rates
– values vary in time
– large “information content”, dozens of values per well
– variability in time
• caused by the underlying process physics
• a detailed surrogate for an explicit description of the physics
• Synthetic Variables - are computed by equations or models
Static Variables
• Potential Model Inputs
– Geometric Variables - SURFace ELEVation, COAL
ELEVation, COAL DEPTH, COAL THICKness
– COAL PERMeability, COAL POROsity - from cores,
logs, and engineering estimates
– KV CONFining - permeability in vertical direction
Water & Gas Time Series
59 Wells
Wells are similar, but are different in detail
[Plots, with a Detail view: Monthly Gas Produced (MMCF) and Monthly Water Produced (bbl) vs. Production Month for Well 1, Well 2, and Well 3.]
Synthetic Variables
• Potential Model Inputs
– GIP - estimated gas in place from geometric variables
• Model Outputs
– CGP 25 - cumulative gas production to 25 psi, estimated by reservoir
simulator using static variables
– CGP 50 - as for CGP 25, but to 50 psi.
[Scatter plots: CGP 50 vs. CGP 25, R2 = 0.99; CGP 25 vs. GIP, R2 = 0.39, R2 ANN = 0.52.]
“Model the Model”
• A computer model can be treated as a black box. The box “maps”
multiple inputs to an output.
• If, for all combinations of input values, the box computes a
continuously differentiable output, its map can be learned very
accurately by an ANN.
• A cause of non-differentiable output is switching by programmed logic
in the model’s computer code.
• Discontinuous maps can be segmented by clustering, then modeled
using multiple ANN’s.
[Diagram: the Reservoir Simulator maps inputs x1 to x6 to output y; surface plot of a continuously differentiable output y over x1 and x5.]
Chaotic Systems
• Modeled using Phase Space Reconstruction
• Takens Theorem, univariate systems (1980)
– x(t) = F[x(t - p - d), x(t - p - 2d),…, x(t - p - nd)]
– current state-of-the-art
– implies
• x can be predicted at time t from an optimal number n previous
measurements (n is called the “dimension”)
• n measurements spaced at optimal time delays nd produce an
optimal prediction
• Requirements/Options
– multivariate, variable delays - much better than Takens
– completely extracts “information content”
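The univariate form x(t) = F[x(t - p - d), …, x(t - p - nd)] amounts to building a table of delayed values as model inputs and the current value as the target. A minimal sketch follows; `delay_vectors` is a hypothetical helper and the placeholder series x(t) = t is chosen only so the indexing is easy to follow.

```python
def delay_vectors(x, n, d, p=0):
    """Build (inputs, targets) for x(t) = F[x(t-p-d), ..., x(t-p-n*d)].

    n: number of previous measurements, d: delay spacing, p: time delay of
    the most recent measurement available (all in samples).
    """
    if n < 1 or d < 1 or p < 0:
        raise ValueError("need n >= 1, d >= 1, p >= 0")
    start = p + n * d                  # earliest t with a complete history
    inputs, targets = [], []
    for t in range(start, len(x)):
        inputs.append([x[t - p - k * d] for k in range(1, n + 1)])
        targets.append(x[t])
    return inputs, targets


series = list(range(10))               # placeholder data: x(t) = t
inputs, targets = delay_vectors(series, n=3, d=2, p=1)
# First row: t = 7, inputs x(4), x(2), x(0), target x(7).
```

Any mapping function F (lookup, regression, or an ANN) can then be fitted to `inputs` and `targets`.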
Phase Space Reconstruction
• Select d = 6 months by inspection (optimal delays can be computed
for larger data sets).
• MWA = 6-month moving window average of water and gas
production to remove high frequency variability.
[Plot: one well’s history, WATER MWA and GAS MWA, Water (BBL/m) & Gas (MCF/m) vs. Month of Production.]
Model CGP by ANN
• Develop a succession of models with longer histories as inputs, checking R2 ANN as you go.
• Point count N decreases with longer histories because some well histories are less complete.
• R2 ANN at 6 months = 0.68, 0.93 at 36 months…good enough!
• RMSE = 344 MCF at 36 months relative to 6500 MCF actual full scale (5%).
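The accuracy figures quoted here can be reproduced in form with two small functions. This sketch assumes R2 means the coefficient of determination (1 - SSE/SST); the deck may instead use the squared correlation. The sample values are invented, and only the last line uses numbers from the slide.

```python
def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted values."""
    n = len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n) ** 0.5


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SSE/SST (an assumed definition)."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


# Hypothetical actual and predicted values, not data from the study.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
error = rmse(actual, predicted)
r2 = r_squared(actual, predicted)

# Slide figures: RMSE of 344 against a full scale of 6500.
percent_of_full_scale = 100.0 * 344 / 6500
```

The last line evaluates to about 5.3%, which the slide rounds to 5%.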
[Plot vs. Month of Production (6 to 36): N (point count), R2 ANN x 100, and CUMGAS25 Prediction RMSE (MMCF).]
Results
[Scatter plots of Predicted CGP25 (BCF) vs. CGP25 (BCF):]
• Prediction using 6 months of history: Actual & Predicted, R2 ANN = 0.68
• Prediction using 36 months of history: Actual & Predicted, R2 ANN = 0.93
Coal Gas Summary
• Time series have higher information content,
less noise than static variables.
• Phase Space Reconstruction
– leverages hidden physics manifest in time series
variability
– high accuracy
– extensible to other gas fields, collections of fields,
other problems, other domains
Conclusions
• Your process - room for improvement?
• Data Mining
– fast, powerful, decisive
– automates knowledge acquisition,
produces predictive models
– solves problems that are unsolvable by any
other means.
• Advanced visualization makes results
understandable to all.
Archive
Modeling Chaos
• Takens Theorem (1980), univariate systems
– x(t) = F[x(t - p - d), x(t - p - 2d),…, x(t - p - nd)]
• each t represents a vector of measurements
• p = the “time delay” of the most recent measurement available
– implies a “prediction horizon”
• n and d = “dynamical invariants”
– analogous to amplitudes, periods, and phases in periodic systems
• n + 1 = “embedding dimension”
– the number of previous measurements
– implies an optimal number of previous measurements
– d = characteristic “time delay” of the attractor
– implies an optimal spacing in time for the previous measurements
• F is an arbitrary mapping function, e.g., “look up”, regression, or
ANN, whatever gives the best results
Modeling Chaos
• ADM, multivariate “Takens”
– y(t) = F[xi(t - pi), xi(t - pi - di),…, xi(t - pi - nidi)]
• i, pi, ni, and di are dynamical invariants
• i = number of input variables
• xi = model input variables
– implies an optimal set
• pi = time delay of peak (optimal) correlation between y and
each xi
• ni + 1 = the embedding dimension of the attractor of each xi
• di = characteristic “time delay” of the attractor xi
• F is an arbitrary mapping function, generally an ANN
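The delay pi is described as the time delay of peak correlation between y and each xi. One way to estimate it is to scan candidate delays and keep the one with the largest absolute Pearson correlation. This is a sketch, not the deck's actual search procedure: `best_delay` is a hypothetical helper and the data are synthetic, with y built to lag x by 4 samples.

```python
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def best_delay(x, y, max_delay):
    """Return (p, scores): the delay p maximizing |corr(y(t), x(t - p))|."""
    scores = {}
    for p in range(max_delay + 1):
        # Pair x(t - p) with y(t) by dropping the last p values of x
        # and the first p values of y.
        scores[p] = pearson(x[:len(x) - p], y[p:])
    return max(scores, key=lambda p: abs(scores[p])), scores


# Synthetic data (assumed): y responds to x four samples later.
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [0.0] * 4 + [2.0 * v for v in x[:-4]]
p, scores = best_delay(x, y, max_delay=10)
```

Repeating the scan for each input xi gives one pi per variable.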
Modeling Chaos
• ADM generalization
– y(t) = F[xi(t - pi), xi(t - pi - dij),…, xi(t - pi - din)]
• din replaces di, a variable delay
• For medium data sets (300 to 1000 vectors)
– y(t) = F[xi(t - pi), x’i(t - pi - dij),…, x’i(t - pi - din)]
• replace xi with derivatives x’i to mitigate tendency of
aggressive regression techniques to overfit data
• also amplifies effects of small changes
• Dynamical Invariants computed by systematic
search

Adm graphics-2003

  • 1. Itinerary “A Traveler's Guide” • About ADM – Data Mining – Visualization – Intelligent Software – Real-Time Web Applications • Technology • Examples
  • 2. Data Mining? • “The search for valuable knowledge in massive volumes of data” (Weiss and Indurkya) • Data Mining Tool Box – signal processing, advanced statistics, machine learning, chaos theory, advanced visualization • Why? – Continuously maximize yields, throughput, profit – Continuously minimize problems • How? – Learn/quantify important cause-effect relationships – Computer models developed directly from data • are “virtual processes” that behave like the real processes • predict future outcomes, evaluate alternatives, show the best pathway forward
  • 3. More on Data Mining • Data have properties that must be measured for optimal use – uni / multivariate relationships – periodicity / chaos / noise – orthogonality / redundancy – continuity / segmentation – dynamics • temporal: time delays, prediction horizon • dimensions: inertia, historical uniqueness • Data Mining – Maximizes/Extracts “information content” – Automates discovery – Integrates your data with your business
  • 4. 20 Years of Medical Imaging for GE and Siemens
  • 6. About ADM • New Company • Data Mining & Visualization Services and Software • Founders have 40+ years – engineering, artificial intelligence/expert systems, complex programming, signal processing, clustering/classification, machine learning, advanced visualization, data mining – Automotive, Environmental, Medical, Metals, Oil & Gas, Polymers, Electronics – special expertise in dynamical systems that constantly change/evolve • Fastest, most skilled anywhere
  • 8. A View of Processes PHYSICAL PROCESS “deterministic dynamical system” inputs outputsx1 x2 x3 x4 x5 x6 x7 x8 y1 y2 y3 multiply periodic chaotic stochastic non-stochastic effects should be predictable, therefore controllable Multiply Periodic (Fourier approximations) • people • lab tests • controls tuning • raw materials • weather
  • 9. Chaos Lorenz attractor Power Spectrum 3D Delay Plot “Orbitals” Prediction from  = -10 “extreme sensitivity to changes in boundary conditions”
  • 11. Role of a Process Model process model inputs outputsx1 x2 x3 x4 x5 x6 x7 x8 control setpoints e.g., pressure temperature speed raw material properties e.g., density surface area molecular weight y1 y2 y3 quality measures e.g., strength clarity thickness Things you CAN’T control What you want to know other state variables e.g., humidity amb. temperature Things you CAN control
  • 12. Deterministic vs. Empirical n Sxy - Sx Sy n Sx2 - (Sx)2A = Sy Sx2 - Sx Sxy n Sx2 - (Sx)2B = Neural Networks Statistics Empirical Models E = m c2 du d2u d2u d2u dt dx2 dy2 dz2 = 0+ +- First Principles Models Production Economic Environment
  • 13. Interpolation / Extrapolation P1 Pz Px Py P3 P2 P4 Pw “a good design” “a bad design” “a mediocre design” regions where model extrapolates regions where model interpolates Historical Data • noisy, small data • designed experiments Model Space
  • 14. About Neural Networks • Inspired by the Brain – get complicated behaviors from lots of “simple” interconnected devices - neurons and synapses – models are synthesized from example data • machine learning x1 x2 x3 x4 x5 y1 y2 inputs outputs
  • 15. About Neural Networks • Non-linear Multivariate Curve Fitting – the modeler prescribes inputs, outputs, hidden layer neurons, and connections – “Weights” are the unknown coefficients that are determined by the computer from examples using an error minimizing “learning algorithm” output layer hidden layerinput layer “weights” control connections wi wi+n y1 y2 y1 y2 y1 y2 input/output examples x1 x2 x3 x4 x5 x1 x2 x3 x4 x5 x1 x2 x3 x4 x5
  • 16. About Neural Networks • Shifts Modeling Focus – from smaller data/big deterministic modeling effort – to bigger data/smaller modeling effort – combine with optimization (search) methods • real-time prediction • resource allocation – deterministic + error correcting ANN hybrids
  • 17. Response Surfaces Water Disinfection Trihalomethanes Formation no data surface fitted by non-linear ANN model represents normal behavior deviation from normal better conditions?
  • 18. Response Surfaces Savannah River Saltwater Migration
  • 19. Optimizing With Models process model inputs outputs x1 x2 x3 x4 x5 x6 x7 x8 y1 y2 y3 PI = ay1 + b y2 + cy3 are varied by search routine are evaluated for goodness optimization program (search routine) GOAL: determine values of inputs (within controllable range) to optimize Performance Index while meeting constraints.
  • 20. Control Possibilities Polymer Packaging Film Intrinsic Viscosity PROCESS WEATHER ACTUAL Prediction BEFORE AFTER
  • 21. Industrial ”Sometimes it runs good, sometimes bad, and we don’t know why.” heater dryer air jets
  • 22. Discrete Event Prediction Unexplained Polymer Film Production Shutdowns 20 minute interval , 4 minute ramp 1 minute before 19 minutes before 12 minutes before temperature related web breaks viscosity initial web break Representation Dynamics • can require multiple delays for same variable • delays may be vary Different events due to different causes are detected at different times prior to occurrence
  • 23. Off-Spec Production Synthetic Textile Fiber Quality Days since July 1, 1998 Days since July 1, 1998 Days since July 1, 1998 ProcessTemp(C) ProcessTemp(C) ProcessTemp(C) Waste(lbs) Q3(lbs)AmbientTemp(C) • During period of high off-spec, process tracks ambient temperature.
  • 24. Semi-Quantitative Data Polymer Resin Solid State Polymerization Ambient Pressure = 29.41 Ambient Pressure = 30.26 Frequency heater dryer air jets
  • 25. Environmental Compliance Water Disinfection Trihalomethanes Formation 0 20 40 60 80 100 120 140 160 7/1/00 7/31/00 8/30/00 9/29/00 10/29/00 11/28/00 12/28/00 1/27/01 2/26/01 3/28/01 Trihalomethanes(ppb) FINISHED THM (ppb) Control Model Virtual Sensor • EPA regulated carcinogen • Different models for – prediction – control (gains) • $$$ Savings by optimizing use of ClO2 vs. Cl2 straight predictions
  • 26. Customer Process Engineering Tech Service Output of your process Input to your customer’s process Customer Feedback Your Continuous Improvement Customer’s Continuous Improvement
  • 27. Customer Performance Synthetic Textile Fiber Quality 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 5/20/99 7/9/99 8/28/99 10/17/99 12/6/99 1/25/00 3/15/00 5/4/00 6/23/00 8/12/00 10/1/00 Opelika Date DenierComposite 3.5 4 4.5 5 5.5 6 6.5 RedButtons DENVMACV Opelika MJS Red Buttons 108 110 112 114 116 118 120 8/15/99 10/4/99 11/23/99 1/12/00 3/2/00 4/21/00 6/10/00 7/30/00 Date H_WHITComposite 5 6 7 8 9 10 11 CalhounMJSRedButtons HWHITCV Calhoun MJS Red Buttons AL SC -20 -15 -10 -5 0 5 10 15 20 7/9/99 8/28/99 10/17/99 12/6/99 1/25/00 3/15/00 5/4/00 6/23/00 Frontier DATE FUSEComposite2wkavg,Delay=15days 325 330 335 340 345 350 355 360 365 370 32.5SingleEndsBreak2wkavg FUSEC F32_5SEB NC
  • 28.  Setpoints  Quality? Synthetic Textile Fiber Quality 19 20 21 22 23 24 25 26 6/3/99 7/23/99 9/11/99 10/31/99 12/20/99 2/8/00 3/29/00 5/18/00 7/7/00 Date CRIMP_P 410 420 430 440 450 460 470 480 490 500 U3VI3523.SP CRIMPPA CRIMPPB I3523SP 0.85 0.86 0.87 0.88 0.89 0.9 0.91 0.92 0.93 0.94 0.95 0.96 6/6/99 7/26/99 9/14/99 11/3/99 12/23/99 2/11/00 4/1/00 5/21/00 7/10/00 Date DenierComposite 90 100 110 120 130 140 U3PP3727.SPandU3PP3727.PV DENVMAC P3727SP P3727PV 25 27 29 31 33 35 37 6/4/99 7/24/99 9/12/99 11/1/99 12/21/99 2/9/00 3/30/00 5/19/00 7/8/00 Date DailyAverageELONGAComposite 50 60 70 80 90 100 U3PF3291.PVandL16PT409.PV ELONGAC F3291PV PT409PV 0 10 20 30 40 50 60 6/4/99 7/24/99 9/12/99 11/1/99 12/21/99 2/9/00 3/30/00 5/19/00 7/8/00 Date FUSEComposite -0.5 0.5 1.5 2.5 3.5 4.5 S3P27845.PV FUSEC 27845PV CRIMP DENIER ELONG FUSE
  • 29. Contract Optimization (electricity purchasing relative to usage) Kosa Spartanburg Baselines 24000 26000 28000 30000 32000 34000 36000 38000 40000 42000 44000 M-98 M-98 A-98 M-98 J-98 J-98 A-98 S-98 O-98 N-98 D-98 J-99 F-99 Date kworkwh D1+D2 LOAD TOTAL D1+D2 USE kwh/hour D1+D2 USE Billing Demand Evaluate Contract Costs and Options Koas Spartanburg Load Shifting Scenarios 37,000 38,000 39,000 40,000 41,000 42,000 43,000 9/1 9/2 9/3 9/4 9/5 9/6 9/7 9/8 Date 1998 kw 0.01 0.03 0.05 0.07 0.09 0.11 0.13 0.15 0.17 0.19 $/kwh Current D1+D2 kw Best Case D1+D2 kw Worst Case D1+D2 kw RTP($/kw) Load Control Scenarios Real-Time Pricing and Electricity Deregulation
  • 31. Estuarine Water Quality 3 4 5 6 7 8 9 10 8/21/93 0:30 8/22/93 0:30 8/23/93 0:30 8/24/93 0:30 Date and time Dissolvedoxygen(mg/L) 16 18 20 22 24 26 28 30 32 Temperature (degreeCelsius) Measured Neural Network BRANCH/BLTM Water temperature Dissolved oxygen • Mixing - Tides, Flows from 3 Rivers • Weather (T, P Dew Point) • Point Discharge Wastewater Treatment Plants • Non-Point Discharges - rainfall, 50% overbank storage
  • 32. Pollution in Estuaries [Diagram labels: high tide, mean tide, wastewater discharges, non-point from tidal flooding, non-point from rain, gauging stations.]
  • 33. Raw signals, 2 months [Figure: four panels vs. time (hours): dissolved oxygen concentration (mg/L), water temperature (deg. C), specific conductivity (µ-siemens/cm), water level (feet).]
  • 34. Spectral analysis [Figure: spectra of water level, dissolved oxygen concentration, specific conductivity, and water temperature; labeled features: low frequency broadband, 25.6 & 24 hr, 12.4 hr, 8.2 hr, 6.2 hr.]
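A minimal sketch of how periodic components like those labeled on slide 34 could be located in an hourly record. The naive discrete Fourier transform, the function name `periodogram`, and the synthetic test signal are assumptions made for illustration; the slides do not say which spectral method was used.

```python
import cmath
import math


def periodogram(signal, dt_hours):
    """Return (period_hours, power) pairs from a naive DFT of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [value - mean for value in signal]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            value * cmath.exp(-2j * math.pi * k * i / n)
            for i, value in enumerate(centered)
        )
        # Frequency bin k corresponds to a period of n * dt / k hours.
        spectrum.append((n * dt_hours / k, abs(coeff) ** 2 / n))
    return spectrum


# Synthetic hourly record (hypothetical data): a strong 12.4-hour component
# plus a weaker 6.2-hour component.
signal = [
    math.sin(2 * math.pi * t / 12.4) + 0.3 * math.sin(2 * math.pi * t / 6.2)
    for t in range(620)
]
strongest = sorted(periodogram(signal, 1.0), key=lambda pair: pair[1], reverse=True)[:2]
dominant_periods = [period for period, _ in strongest]
```

The record length of 620 hours was chosen so that both periods fall exactly on DFT bins; on real data, peaks would be broadened by leakage and noise.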
  • 35. Signal decomposition: chaotic component via digital filtering [Figure: four panels vs. time (hours): dissolved oxygen concentration (mg/L), water temperature (deg. C), specific conductivity (µ-siemens/cm), water level (feet).]
  • 36. [Figure: specific conductivity, SC (micro-siemens/cm), and dissolved oxygen, DO (mg/L), vs. date, 6/19/93 to 9/11/93.]
  • 37. Signal decomposition: high frequency, periodic components via digital filtering [Figure: four panels vs. time (hours): dissolved oxygen concentration (mg/L), water temperature (deg. C), specific conductivity (µ-siemens/cm), water level (feet).]
  • 40. Beaufort River Estuary [Figure: 24-hr derivative of DOA (mg/L) (EDOa6611) and Southside BOD (lb/day) (SSBOD_D) vs. sequential but non-consecutive data point number; site labels SS, SP, PI. R = -0.357, R² = 0.128; R² siso = 0.130; annotation "= 1".]
  • 41. Gains by delay [Figure: three panels, one per delay: 1, 2 days, 3 days.]
  • 42. BOD Effect on DO • Inputs = WL, SC, TP, Rainfall, BOD • R² (ANN) = 0.57, N = 61 points [Figure: delta DOs-DOm, flow towards gauge, TP = 20 C, 61 points; annotations: increasing BOD, No Data.]
  • 44. Groundwater Modeling: Upper Floridan Aquifer [Figures: Well Histories (18 years); Surface Contour; Well Locations (100x100 miles).]
  • 45. Sub-Regions Behave Differently [Figure: map of a 25x30-mile area; axes are map coordinates.]
  • 46. Groundwater Example (Cluster monitoring wells according to behavior)
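A minimal sketch of clustering monitoring wells by behavior, as slide 46 describes. The slides do not name the clustering method, so k-means on z-scored water-level histories is an assumption here, as are the function names and the synthetic well histories.

```python
import math


def zscore(series):
    """Normalize a history so wells are compared by shape, not by absolute level."""
    mean = sum(series) / len(series)
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / len(series))
    return [(v - mean) / sd for v in series]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sqdist(r, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: sqdist(r, centroids[j])) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical monthly water levels: three rising wells, three falling wells.
months = range(24)
rising = [[10 + 0.5 * t + 0.1 * math.sin(t + w) for t in months] for w in range(3)]
falling = [[30 - 0.5 * t + 0.1 * math.sin(t + w) for t in months] for w in range(3)]
wells = [zscore(history) for history in rising + falling]
labels = kmeans(wells, k=2)
```

Each cluster could then get its own predictive model, which is the idea behind reporting accuracy per cluster.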
  • 47. Accuracy by Cluster [Figure: actual vs. predicted normalized water level above sea level for clusters C1, C3, C6, and C10; history from April 1982 to October 1998.]
  • 48. Aquifer “Ceiling” Gulf of Mexico Max elevation above sea level ~ 180 feet
  • 50. Problem • Area A is a coal bed methane field. • Data for 59 wells were compiled by a petroleum engineering group in CO to determine whether an artificial neural network* (ANN) could be used to predict total gas production for the life of each well. * A form of “machine learning” from AI. [Figures: 3D Range Model of Area A (16x32 km, vertical scale ~ 200 m); Well Locations, Normalized UTMX (m) vs. Normalized UTMY (m).]
  • 51. Variable Types • Static Variables – e.g., depth, permeability – for each well, treated like they do not vary in time • in reality some probably do – some measurements are just estimates with large error – limited “information content”, e.g. one value per well • Time Series, a.k.a. Signals – e.g., water and gas production rates – values vary in time – large “information content”, dozens of values per well – variability in time • caused by the underlying process physics • a detailed surrogate for an explicit description of the physics • Synthetic Variables - are computed by equations or models
  • 52. Static Variables • Potential Model Inputs – Geometric Variables - SURFace ELEVation, COAL ELEVation, COAL DEPTH, COAL THICKness – COAL PERMeability, COAL POROsity - from cores, logs, and engineering estimates – KV CONFining - permeability in vertical direction
  • 53. Water & Gas Time Series Detail • 59 Wells • Wells are similar, but are different in detail [Figure: Monthly Gas Produced (MMCF) and Monthly Water Produced (bbl) vs. Production Month for Well 1, Well 2, and Well 3.]
  • 54. Synthetic Variables • Potential Model Inputs – GIP - estimated gas in place from geometric variables • Model Outputs – CGP 25 - cumulative gas production to 25 psi, estimated by reservoir simulator using static variables – CGP 50 - as for CGP 25, but to 50 psi. [Figures: CGP 50 vs. CGP 25, R² = 0.99; CGP 25 vs. GIP, R² = 0.39, R² (ANN) = 0.52.]
  • 55. “Model the Model” • A computer model can be treated as a black box. The box “maps” multiple inputs to an output. • If, for all combinations of input values, the box computes a continuously differentiable output, its map can be learned very accurately by an ANN. • A cause of non-differentiable output is switching by programmed logic in the model’s computer code. • Discontinuous maps can be segmented by clustering, then modeled using multiple ANNs. [Diagram: Reservoir Simulator box mapping inputs x1 to x6 to output y; example of continuously differentiable output, y vs. x1 and x5.]
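A minimal sketch of why a discontinuous map is easier to learn once it is segmented. Everything here is a stand-in chosen for brevity: `black_box` is a hypothetical simulator with a programmed switch, straight-line fits replace the ANNs, and a split at the largest output jump replaces clustering.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def r_squared(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def black_box(x):
    """Hypothetical simulator: programmed logic switches the output at x = 0.5."""
    return 2.0 * x if x < 0.5 else 2.0 * x + 3.0


xs = [i / 100 for i in range(101)]
ys = [black_box(x) for x in xs]

# One model for the whole discontinuous map.
slope, intercept = fit_line(xs, ys)
r2_single = r_squared(ys, [slope * x + intercept for x in xs])

# Segment at the largest jump in the output, then fit one model per segment.
jump = max(range(1, len(xs)), key=lambda i: abs(ys[i] - ys[i - 1]))
low = fit_line(xs[:jump], ys[:jump])
high = fit_line(xs[jump:], ys[jump:])
segmented = [
    (low if i < jump else high)[0] * x + (low if i < jump else high)[1]
    for i, x in enumerate(xs)
]
r2_segmented = r_squared(ys, segmented)
```

The single model has to smear the switch across the whole input range, while each segment on its own is smooth and fits almost exactly.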
  • 56. Chaotic Systems • Modeled using Phase Space Reconstruction • Takens Theorem, univariate systems (1980) – x(t) = F[x(t - p - d), x(t - p - 2d),…, x(t - p - nd)] – current state-of-the-art – implies • x can be predicted at time t from an optimal number n of previous measurements (n is called the “dimension”) • n measurements spaced at optimal time delays d, 2d,…, nd produce an optimal prediction • Requirements/Options – multivariate, variable delays - much better than Takens – completely extracts “information content”
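A minimal sketch of the delay vectors implied by x(t) = F[x(t - p - d), …, x(t - p - nd)]. The function name `delay_embed` and the toy series are assumptions for illustration; choosing the mapping F and the optimal n and d is a separate step.

```python
def delay_embed(x, n, d, p=0):
    """Build rows [x(t-p-d), x(t-p-2d), ..., x(t-p-nd)] paired with targets x(t)."""
    first_t = p + n * d  # earliest t for which every delayed sample exists
    rows, targets = [], []
    for t in range(first_t, len(x)):
        rows.append([x[t - p - j * d] for j in range(1, n + 1)])
        targets.append(x[t])
    return rows, targets


# Toy series whose values equal their time index, so the delays are easy to read.
series = list(range(20))
rows, targets = delay_embed(series, n=3, d=2, p=1)
```

With n = 3, d = 2, and p = 1, the first usable time is t = 7, and its input row holds the samples from times 4, 2, and 0.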
  • 57. Phase Space Reconstruction • Select d = 6 months by inspection (optimal delays can be computed for larger data sets). • MWA = 6-month moving window average of water and gas production to remove high frequency variability. [Figure: one well’s history; WATER MWA (BBL/m) and GAS MWA (MCF/m) vs. month of production, 0 to 78.]
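A minimal sketch of a moving window average like the 6-month MWA on slide 57. The slide does not say whether the window is trailing or centered; a trailing window is assumed here, and the production values are hypothetical.

```python
def moving_window_average(values, window):
    """Trailing moving average; the output starts at the first full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly gas production with month-to-month variability.
monthly_gas = [900, 1400, 1100, 1600, 1300, 1800, 1500, 2000, 1700, 2200]
gas_mwa = moving_window_average(monthly_gas, window=6)
```

A series of N months yields N - 5 smoothed values with a 6-month window, so short well histories lose a noticeable share of their points.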
  • 58. Model CGP by ANN • Develop a succession of models with longer histories as inputs, checking R² (ANN) as you go. • Point count N decreases with longer histories because some well histories are less complete. • R² (ANN) = 0.68 at 6 months, 0.93 at 36 months…good enough! • RMSE = 344 MCF at 36 months relative to 6500 MCF actual full scale (5%). [Figure: N (point count), R² (ANN) x 100, and CUMGAS25 prediction RMSE (MMCF) vs. month of production, 6 to 36.]
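A minimal sketch of the two accuracy measures quoted on slide 58. The function names and the four-point example are assumptions; only the 344 and 6500 figures come from the slide.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
example_r2 = r_squared(actual, predicted)
example_rmse = rmse(actual, predicted)

# Slide figures: RMSE of 344 against a full scale of 6500.
relative_error_percent = 344 / 6500 * 100
```

The ratio 344 / 6500 works out to about 5.3%, which the slide rounds to 5%.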
  • 59. Results [Figures: predicted vs. actual CGP25 (BCF). Prediction using 6 months of history, R² (ANN) = 0.68; prediction using 36 months of history, R² (ANN) = 0.93.]
  • 60. Coal Gas Summary • Time series have higher information content, less noise than static variables. • Phase Space Reconstruction – leverages hidden physics manifest in time series variability – high accuracy – extensible to other gas fields, collections of fields, other problems, other domains
  • 61. Conclusions • Your process - room for improvement? • Data Mining – fast, powerful, decisive – automates knowledge acquisition, produces predictive models – solves problems that are unsolvable by any other means. • Advanced visualization makes results understandable to all.
  • 63. Modeling Chaos • Takens Theorem (1980), univariate systems – x(t) = F[x(t - p - d), x(t - p - 2d),…, x(t - p - nd)] • each t represents a vector of measurements • p = the “time delay” of the most recent measurement available – implies a “prediction horizon” • n and d = “dynamical invariants” – analogous to amplitudes, periods, and phases in periodic systems • n + 1 = “embedding dimension” – the number of previous measurements – implies an optimal number of previous measurements – d = characteristic “time delay” of the attractor – implies an optimal spacing in time for the previous measurements • F is an arbitrary mapping function, e.g., “look up”, regression, or ANN, whatever gives the best results
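A minimal sketch of the simplest mapping function named on slide 63, a “look up”: predict x(t) as the value that followed the most similar delay vector seen before. The logistic map used as the chaotic test series, the choice n = 2, d = 1, p = 0, and all function names are assumptions for illustration.

```python
def logistic_series(length, r=3.9, x0=0.3):
    """Chaotic test series: x(t+1) = r * x(t) * (1 - x(t))."""
    values = [x0]
    for _ in range(length - 1):
        values.append(r * values[-1] * (1.0 - values[-1]))
    return values


def delay_vector(x, t, n, d, p):
    """[x(t-p-d), x(t-p-2d), ..., x(t-p-nd)]"""
    return [x[t - p - j * d] for j in range(1, n + 1)]


def lookup_predict(history, query, n, d, p):
    """F as a nearest-neighbor look-up over all delay vectors in the history."""
    best_dist, best_value = None, None
    for t in range(p + n * d, len(history)):
        vector = delay_vector(history, t, n, d, p)
        dist = sum((a - b) ** 2 for a, b in zip(vector, query))
        if best_dist is None or dist < best_dist:
            best_dist, best_value = dist, history[t]
    return best_value


series = logistic_series(700)
train, test = series[:600], series[600:]
n, d, p = 2, 1, 0

errors = []
for t in range(p + n * d, len(test)):
    query = delay_vector(test, t, n, d, p)
    errors.append(abs(lookup_predict(train, query, n, d, p) - test[t]))
mean_abs_error = sum(errors) / len(errors)
```

Regression or an ANN would replace `lookup_predict` without changing how the delay vectors are built.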
  • 64. Modeling Chaos • ADM, multivariate “Takens” – y(t) = F[xi(t - pi), xi(t - pi - di),…, xi(t - pi - nidi)] • i, pi, ni, and di are dynamical invariants • i = number of input variables • xi = model input variables – implies an optimal set • pi = time delay of peak (optimal) correlation between y and each xi • ni + 1 = the embedding dimension of the attractor of each xi • di = characteristic “time delay” of the attractor xi • F is an arbitrary mapping function, generally an ANN
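A minimal sketch of estimating pi as the delay of peak correlation between y and one input xi. Pearson correlation scanned over integer lags is assumed here; the function names and the synthetic series, in which y lags x by 5 steps, are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def peak_correlation_delay(x, y, max_lag):
    """Return (lag, r) maximizing |corr(x(t - lag), y(t))| for lag in 0..max_lag."""
    best_lag, best_r = 0, 0.0
    for lag in range(max_lag + 1):
        # Pair y(t) with x(t - lag) for t = max_lag .. len(x) - 1.
        r = pearson(x[max_lag - lag : len(x) - lag], y[max_lag:])
        if abs(r) > abs(best_r):
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic data: y responds to x five steps later, with a little noise.
rng = random.Random(0)
x = [rng.random() for _ in range(300)]
y = [0.0] * 5 + [2.0 * x[t - 5] + 0.05 * rng.random() for t in range(5, 300)]
best_lag, best_r = peak_correlation_delay(x, y, max_lag=10)
```

Repeating the scan for every candidate input gives one pi per xi, and inputs with no clear correlation peak are candidates to drop from the set.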
  • 65. Modeling Chaos • ADM generalization – y(t) = F[xi(t - pi), xi(t - pi - dij),…, xi(t - pi - din)] • din replaces di, a variable delay • For medium data sets (300 to 1000 vectors) – y(t) = F[xi(t - pi), x’i(t - pi - dij),…, x’i(t - pi - din)] • replace xi with derivatives x’i to mitigate the tendency of aggressive regression techniques to overfit data • also amplifies effects of small changes • Dynamical Invariants computed by systematic search
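A minimal sketch of the input row [xi(t - pi), x’i(t - pi - di1),…, x’i(t - pi - din)] for a single input variable. Approximating the derivative by a first difference is an assumption, as are the function names and the toy series.

```python
def first_difference(x, t):
    """Approximate x'(t) by x(t) - x(t - 1)."""
    return x[t] - x[t - 1]


def variable_delay_inputs(x, p, delays):
    """Rows [x(t-p), x'(t-p-d_1), ..., x'(t-p-d_n)] with unevenly spaced delays."""
    # One extra sample is needed because each difference looks one step back.
    first_t = p + max(delays) + 1
    times, rows = [], []
    for t in range(first_t, len(x)):
        row = [x[t - p]] + [first_difference(x, t - p - dj) for dj in delays]
        times.append(t)
        rows.append(row)
    return times, rows


# Toy series x(t) = t^2, whose first difference is 2t - 1.
x = [t * t for t in range(12)]
times, rows = variable_delay_inputs(x, p=1, delays=[1, 3, 6])
```

At t = 8 the row holds x(7) = 49 followed by the differences at times 6, 4, and 1. A systematic search over p and the delay set would score each candidate by held-out model accuracy.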