Data-Driven Discovery of
Dynamical Systems and
Governing Equations
SINDy Autoencoders: A Hybrid Approach
Champion, K., Lusch, B., Kutz, J. N., & Brunton, S. L. (2019). Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45), 22445-22451.
References
https://www.eigensteve.com/
https://www.youtube.com/watch?v=NLFboNNKCME
Personal Motivation – 3 levels of ML usage
Decide Interpret Reveal
Personal Motivation – 3 levels of Murder Mystery Resolution
Who? How? Why?
Agenda
• Ambition
• Key Assumptions
• Example: Lotka-Volterra (Predator-Prey) Equations
• Generalized SINDy
• Transforming Co-ordinates
• SINDy Autoencoder
• Applications & Challenges
• Future Extension
Ambition
Given the current state of the system, we want to learn the
governing equations that can estimate the system's rate of change.
Ambition of Sparse Identification of Nonlinear
Dynamical Systems (SINDy)
Given the current state of the system, we want to learn the sparse
governing equations that accurately estimate the system's rate of
change.
\frac{dx(t)}{dt} = f(x(t))
Ambition of SINDy
Given the current state of the system, we want to learn the sparse
governing equations that accurately estimate the system's rate of
change.
\frac{dx(t)}{dt} = f(x(t))
Turn high-dimensional scientific data into parsimonious dynamical
models
Key Assumptions
• Causality: The Current State Encapsulates Future Behavior
  o Knowing the current state x(t) allows us to predict how the system will evolve in the next instant.
• Markov Property: Future Depends Only on Present
Key Assumptions
SINDy models the dynamics of a system using only the current state x(t):

\frac{dx(t)}{dt} = f(x(t))

Simplicity | Limitation
Mathematical Tractability | Non-Markovian Systems
Applicability to a Wide Range of Systems | Incomplete State Representation
Computational Efficiency | Extensions Required for Complex Systems
Example: Lotka-Volterra (Predator-Prey)
Equations
\frac{dx}{dt} = \alpha x - \beta x y
\frac{dy}{dt} = \delta x y - \gamma y
Where,
𝑥: Prey population
𝑦: Predator population
𝛼, 𝛽, 𝛿, 𝛾: Positive constants representing interaction rates
Generating data
• Initial Conditions
Generating data
• Simulated data
X(t)
Generating data
• Simulated data
X(t) and its time derivative \dot{X}
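To make the data-generation step concrete, here is a minimal sketch, assuming SciPy, of simulating the Lotka-Volterra system and collecting both X(t) and its derivative. The parameter values, initial condition, and time grid are illustrative assumptions, not the ones used in the slides.

```python
# Simulate the Lotka-Volterra system and collect state snapshots X and
# their derivatives X_dot (illustrative parameters and time grid).
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta, delta, gamma = 1.0, 0.1, 0.075, 1.5   # assumed interaction rates

def lotka_volterra(t, state):
    x, y = state
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]

t_eval = np.linspace(0.0, 30.0, 3000)
sol = solve_ivp(lotka_volterra, (0.0, 30.0), [10.0, 5.0], t_eval=t_eval)

X = sol.y.T                                            # snapshots, shape (m, 2)
X_dot = np.array([lotka_volterra(0.0, s) for s in X])  # derivative at each snapshot
```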
Learning the governing equations
X(t), \dot{X}
Learn a governing set of equations so that
f(X(t)) = \dot{X}
Learning the governing equations
X(t), \dot{X}
Library Functions Θ(𝑋)
Learning the governing equations
X(t), \dot{X}
Library Functions \Theta(X)

\frac{dx}{dt} = \alpha x - \beta x y
\frac{dy}{dt} = \delta x y - \gamma y

\Xi =
\begin{bmatrix}
0 & 0 \\
0 & -\gamma \\
\alpha & 0 \\
0 & 0 \\
-\beta & \delta
\end{bmatrix}
(rows correspond to the candidate library terms; columns give the coefficients for \dot{x} and \dot{y})
Generalized SINDy
\dot{X} = \Theta(X)\,\Xi
Brunton, S. (2017, March). Discovering governing equations from data by sparse identification of nonlinear dynamics. In APS March Meeting Abstracts (Vol. 2017, pp. X49-004).
Generalized SINDy

\dot{X} = \Theta(X)\,\Xi

where

X =
\begin{bmatrix} x^T(t_1) \\ x^T(t_2) \\ \vdots \\ x^T(t_m) \end{bmatrix}
=
\begin{bmatrix}
x_1(t_1) & x_2(t_1) & \dots & x_n(t_1) \\
x_1(t_2) & x_2(t_2) & \dots & x_n(t_2) \\
\vdots & \vdots & \ddots & \vdots \\
x_1(t_m) & x_2(t_m) & \dots & x_n(t_m)
\end{bmatrix}

\Theta(X) =
\begin{bmatrix} 1 & X & X^{P_2} & X^{P_3} & \dots & \sin(X) & \cos(X) & \dots \end{bmatrix}
\qquad
\Xi = \begin{bmatrix} \xi_1 & \xi_2 & \dots & \xi_n \end{bmatrix}

Once \Xi has been determined,

\dot{x}_k = f_k(x) = \Theta(x^T)\,\xi_k
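As an illustration of the regression \dot{X} = \Theta(X)\,\Xi, the following sketch builds a small polynomial library and applies sequentially thresholded least squares, one common SINDy solver. The library terms and threshold are assumptions, not the exact setup used in the slides.

```python
# Build a candidate library Theta(X) and solve X_dot = Theta(X) Xi with
# sequentially thresholded least squares (STLSQ).
import numpy as np

def build_library(X):
    x, y = X[:, 0], X[:, 1]
    # candidate functions: 1, x, y, x^2, x*y, y^2
    return np.column_stack([np.ones(len(X)), x, y, x**2, x * y, y**2])

def stlsq(Theta, X_dot, threshold=0.05, n_iter=10):
    Xi = np.linalg.lstsq(Theta, X_dot, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold            # prune small coefficients
        Xi[small] = 0.0
        for k in range(X_dot.shape[1]):           # refit each column on the survivors
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], X_dot[:, k], rcond=None)[0]
    return Xi

# Using X, X_dot from the simulation sketch above:
# Xi = stlsq(build_library(X), X_dot)   # nonzero rows reveal alpha, -beta, delta, -gamma
```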
Coordinates
… Choosing the right coordinates to simplify dynamics has always been important …
Brunton, S. (2017, March). Discovering governing equations from data by sparse identification of nonlinear dynamics. In APS March Meeting Abstracts (Vol. 2017, pp. X49-004).
Simplifying co-ordinate systems
Example | Complex Coordinate System | Simplified Coordinate System
Celestial Mechanics | Geocentric (Earth-centered) | Heliocentric (Sun-centered)
Fourier Transform (Heat Equation) | Time Domain | Frequency Domain (Fourier space)
Principal Component Analysis (PCA) | Original high-dimensional space | Principal Component Space
Polar Coordinates | Cartesian Coordinates (x, y) | Polar Coordinates (r, θ)
Transforming Co-ordinates
SVD (Shallow, Linear)
Deep Autoencoder (Deep, Nonlinear)
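For contrast, here is a minimal NumPy sketch of the shallow linear option (truncated SVD, i.e. PCA); the snapshot shapes and latent dimension are assumed for illustration, and a deep autoencoder would replace the linear maps with learned nonlinear ones.

```python
# Shallow linear coordinate transform via truncated SVD of a snapshot matrix.
import numpy as np

X = np.random.rand(1000, 128)            # assumed snapshot matrix: m samples, n measurements
X_centered = X - X.mean(axis=0)

U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
d = 3                                    # assumed latent dimension
Z_linear = X_centered @ Vt[:d].T         # shallow linear coordinates (PCA/SVD)
X_recon = Z_linear @ Vt[:d]              # linear reconstruction back to measurement space

# A deep autoencoder would instead learn nonlinear maps z = phi(x) and x_hat = psi(z).
```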
Losses
Loss Term I
Training the SINDy model
Ensure that we get a good representation of \dot{z}
• Capturing Core Dynamics
• Efficient Prediction
• Simplified Understanding
Loss Term I
• Minimize (\dot{z} - \text{SINDy-predicted } \dot{z})

\text{SINDy-predicted } \dot{z} = \Theta(z^T)\,\Xi = \Theta(\varphi(x)^T)\,\Xi

With z = \varphi(x), what is \dot{z}?

Utilize the Jacobian, where \Delta f = J\,\Delta x:

\dot{z} = J_\varphi\,\dot{x} = \nabla_x(z)\,\dot{x} = \nabla_x(\varphi(x))\,\dot{x}

This yields the first loss term:

\mathcal{L}_{dz/dt} = \left\| \nabla_x \varphi(x)\,\dot{x} - \Theta(\varphi(x)^T)\,\Xi \right\|_2^2
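A minimal PyTorch sketch (an assumption on the framework; the original implementation is in TensorFlow) of obtaining \dot{z} = \nabla_x \varphi(x)\,\dot{x} with a Jacobian-vector product, so it can be compared against the SINDy prediction. The encoder architecture and sizes are illustrative.

```python
# Compute z = phi(x) and z_dot = grad_x phi(x) @ x_dot via a Jacobian-vector product.
import torch
from torch.autograd.functional import jvp

encoder = torch.nn.Sequential(            # hypothetical encoder phi: R^128 -> R^3
    torch.nn.Linear(128, 64), torch.nn.Sigmoid(), torch.nn.Linear(64, 3)
)

x = torch.randn(128)                      # one measurement snapshot
x_dot = torch.randn(128)                  # its time derivative

z, z_dot = jvp(encoder, (x,), (x_dot,))   # z = phi(x), z_dot = J_phi(x) x_dot
```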
Loss Term II
• Predicting Time Evolution
• Capturing Data and Dynamics
• Improving Model Accuracy with Time Derivatives
Minimize (\dot{x} - \dot{x} \text{ from SINDy predictors})
Loss Term II
Calculating \dot{x} from the SINDy predictors:

\hat{x} = \psi(z)

\dot{x} = J_\psi\,\dot{z} = \nabla_z(\hat{x})\,\dot{z} = \nabla_z(\psi(z))\,\dot{z} = \nabla_z(\psi(\varphi(x)))\,\dot{z} = \nabla_z(\psi(\varphi(x)))\,(\Theta(z^T)\,\Xi)

This yields the second loss term:

\mathcal{L}_{dx/dt} = \left\| \dot{x} - \nabla_z \psi(\varphi(x))\,(\Theta(z^T)\,\Xi) \right\|_2^2
Loss Term III
• Recreating the input
\mathcal{L}_{recon} = \left\| x - \psi(\varphi(x)) \right\|_2^2
Total Loss
Overall, Loss = \mathcal{L}_{recon} + \lambda_1 \mathcal{L}_{dx/dt} + \lambda_2 \mathcal{L}_{dz/dt} + \lambda_3 \mathcal{L}_{reg}

\mathcal{L}_{recon} = \left\| x - \psi(\varphi(x)) \right\|_2^2
\mathcal{L}_{dx/dt} = \left\| \dot{x} - \nabla_z \psi(\varphi(x))\,(\Theta(z^T)\,\Xi) \right\|_2^2
\mathcal{L}_{dz/dt} = \left\| \nabla_x \varphi(x)\,\dot{x} - \Theta(\varphi(x)^T)\,\Xi \right\|_2^2
\mathcal{L}_{reg} = \left\| \Xi \right\|_1
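Putting the four terms together, here is a minimal PyTorch sketch of the total loss (again an assumption; the paper's code is TensorFlow). The network sizes, the latent library, and the weights \lambda_1, \lambda_2, \lambda_3 are illustrative assumptions rather than the paper's settings.

```python
# Assemble the SINDy autoencoder loss: reconstruction + dx/dt + dz/dt + L1 on Xi.
import torch
from torch.autograd.functional import jvp

n, d = 128, 3                                        # assumed input and latent dimensions
encoder = torch.nn.Sequential(torch.nn.Linear(n, 64), torch.nn.Sigmoid(), torch.nn.Linear(64, d))
decoder = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.Sigmoid(), torch.nn.Linear(64, n))

def theta(z):                                        # small illustrative library: 1, z, z^2
    return torch.cat([torch.ones(1), z, z ** 2])

Xi = torch.randn(2 * d + 1, d, requires_grad=True)   # SINDy coefficient matrix

def total_loss(x, x_dot, lam1=1e-4, lam2=1e-5, lam3=1e-5):
    # z = phi(x) and z_dot = grad_x phi(x) x_dot via a Jacobian-vector product
    z, z_dot = jvp(encoder, (x,), (x_dot,), create_graph=True)
    z_dot_sindy = theta(z) @ Xi                      # SINDy-predicted latent derivative
    # x_hat = psi(z) and the decoded derivative grad_z psi(z) (Theta(z) Xi)
    x_hat, x_dot_sindy = jvp(decoder, (z,), (z_dot_sindy,), create_graph=True)

    L_recon = torch.sum((x - x_hat) ** 2)            # ||x - psi(phi(x))||^2
    L_dxdt = torch.sum((x_dot - x_dot_sindy) ** 2)   # ||x_dot - grad_z psi (Theta Xi)||^2
    L_dzdt = torch.sum((z_dot - z_dot_sindy) ** 2)   # ||grad_x phi x_dot - Theta Xi||^2
    L_reg = torch.sum(torch.abs(Xi))                 # ||Xi||_1
    return L_recon + lam1 * L_dxdt + lam2 * L_dzdt + lam3 * L_reg

# loss = total_loss(torch.randn(n), torch.randn(n))
```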
Applications: Nonlinear Pendulum
Champion, K., Lusch, B., Kutz, J. N., & Brunton, S. L. (2019). Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45),
22445-22451.
Applications: Lorenz System
Champion, K., Lusch, B., Kutz, J. N., & Brunton, S. L. (2019). Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45),
22445-22451.
Applications: Reaction Diffusion System
Champion, K., Lusch, B., Kutz, J. N., & Brunton, S. L. (2019). Data-driven discovery of coordinates and governing equations. Proceedings of the National Academy of Sciences, 116(45),
22445-22451.
Challenges
• Requires clean, noise-free measurement data
• Limited interpretability of deep learning models
• Coordinate transformation limitations
• Need for integration with domain knowledge
Future Extension
Chebyshev Polynomials for System Identification
T_0(x) = 1, \quad T_1(x) = x
T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x), \quad x \in [-1, 1]
y(x) = \alpha_0 T_0(x) + \alpha_1 T_1(x) + \alpha_2 T_2(x) + \dots + \alpha_n T_n(x)
Why Chebyshev Polynomials?
• Orthogonality
• Numerical Stability
• Efficient Approximation
Application to System Identification:
• Use high-order Chebyshev polynomials to create candidate basis functions.
• Each candidate function becomes a weighted sum of these polynomials, as in y(x) above (see the sketch below).
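A minimal sketch of the recurrence and a one-dimensional least-squares fit of y(x); the target function and polynomial degree are assumed for illustration.

```python
# Build Chebyshev basis columns T_0(x)..T_degree(x) by the recurrence and fit
# y(x) = sum_n alpha_n T_n(x) by least squares.
import numpy as np

def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] as columns, using the recurrence."""
    T = [np.ones_like(x), x]
    for n in range(1, degree):
        T.append(2.0 * x * T[n] - T[n - 1])
    return np.column_stack(T[: degree + 1])

x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x) + 0.5 * x ** 2                 # assumed target to approximate

A = chebyshev_basis(x, degree=8)
alpha = np.linalg.lstsq(A, y, rcond=None)[0]        # coefficients alpha_0..alpha_8
```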
Extending Chebyshev Polynomials to Multiple
Features
• Generalizing to Multiple Features
• For two features x_1 and x_2:

y(x_1, x_2) = \sum_{i=0}^{n} \sum_{j=0}^{n} \alpha_{i,j}\, T_i(x_1)\, T_j(x_2)

• Including Interactions Between Features via Tensor Products
• For m features x_1, x_2, \dots, x_m:

y(x_1, x_2, \dots, x_m) = \sum_{i,j,\dots,k} \alpha_{i,j,\dots,k}\, T_i(x_1)\, T_j(x_2) \cdots T_k(x_m)
Optimizing and Sparsifying with Cross-
Feature Terms
• Learn the coefficients \alpha_{i,j,\dots,k} that best fit the system while promoting sparsity
• Optimization problem:

\min_\alpha \left\| y - \sum_{i,j,\dots,k} \alpha_{i,j,\dots,k}\, T_i(x_1)\, T_j(x_2) \cdots T_k(x_m) \right\|^2 + \lambda \left\| \alpha \right\|_1
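A minimal sketch of this optimization, assuming scikit-learn's Lasso for the \ell_1 penalty; the degrees, synthetic data, and regularization strength are illustrative assumptions.

```python
# Build tensor-product Chebyshev features T_i(x1) T_j(x2) ... and fit a sparse
# coefficient vector alpha with an l1-regularised least-squares (Lasso) solve.
import itertools
import numpy as np
from sklearn.linear_model import Lasso

def cheb(x, n):                          # T_n(x) by the recurrence
    t0, t1 = np.ones_like(x), x
    for _ in range(n):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t0

def tensor_features(X, degree):
    """Columns T_i(x1) * T_j(x2) * ... for every multi-index (i, j, ..., k)."""
    m = X.shape[1]
    cols, index_sets = [], list(itertools.product(range(degree + 1), repeat=m))
    for idx in index_sets:
        col = np.ones(X.shape[0])
        for feat, deg in enumerate(idx):
            col = col * cheb(X[:, feat], deg)
        cols.append(col)
    return np.column_stack(cols), index_sets

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))            # two assumed features in [-1, 1]
y = 1.5 * X[:, 0] - 0.8 * X[:, 0] * X[:, 1]           # assumed ground-truth relationship

Phi, index_sets = tensor_features(X, degree=3)
model = Lasso(alpha=1e-3, fit_intercept=False, max_iter=50_000).fit(Phi, y)
significant = [(idx, c) for idx, c in zip(index_sets, model.coef_) if abs(c) > 1e-3]
```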
Sparsity and Interpretability
• Pruning Insignificant Terms: Coefficients corresponding to unimportant
polynomials shrink to zero.
• Simplified Model: Results in a model that is both accurate and interpretable, using
only the most significant terms.
Synergy with Neural Networks
• Neural Network Enhancement
• Adaptive Coefficients
• Dynamic Modeling
• Model Architecture
• Input Layer: Receives features 𝑥1, 𝑥2, … , 𝑥𝑚
• Hidden Layers
• Output Layer: 𝛼𝑖,𝑗,…,𝑘
• Combined Model:

y(x_1, x_2, \dots, x_m) = \sum_{i,j,\dots,k} \alpha_{i,j,\dots,k}(x_1, x_2, \dots, x_m)\, T_i(x_1)\, T_j(x_2) \cdots T_k(x_m)
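A minimal PyTorch sketch of such a combined model, where a small network outputs state-dependent coefficients \alpha_{i,j,\dots,k}(x) that weight fixed tensor-product Chebyshev terms; the architecture and degree are assumptions.

```python
# A network predicts one adaptive coefficient per Chebyshev multi-index; the
# output is the coefficient-weighted sum of tensor-product terms.
import itertools
import torch

def cheb_t(x, n):                        # T_n(x) elementwise, by the recurrence
    t0, t1 = torch.ones_like(x), x
    for _ in range(n):
        t0, t1 = t1, 2.0 * x * t1 - t0
    return t0

class AdaptiveChebyshev(torch.nn.Module):
    def __init__(self, n_features, degree, hidden=32):
        super().__init__()
        self.index_sets = list(itertools.product(range(degree + 1), repeat=n_features))
        self.coef_net = torch.nn.Sequential(      # predicts one alpha per multi-index
            torch.nn.Linear(n_features, hidden), torch.nn.Tanh(),
            torch.nn.Linear(hidden, len(self.index_sets)),
        )

    def forward(self, x):                          # x: (batch, n_features)
        terms = torch.stack(
            [torch.stack([cheb_t(x[:, f], d) for f, d in enumerate(idx)]).prod(dim=0)
             for idx in self.index_sets], dim=1)   # (batch, n_terms)
        alpha = self.coef_net(x)                   # (batch, n_terms), state-dependent
        return (alpha * terms).sum(dim=1)          # y(x)

# model = AdaptiveChebyshev(n_features=2, degree=3)
# y_hat = model(torch.rand(16, 2) * 2 - 1)
```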
Remarks
Advantages
• Enhanced Expressiveness
• Improved Accuracy
• Interpretability
Practical considerations
• Computational Complexity
• Regularization Techniques
• Training the Neural Network
Example application: Neural Networks for
classification
Normally, the output of a neuron is \sum_{i=0}^{m} w_i x_i, where m is the number of features and w_i is the learnable parameter.

For Chebyshev, each w_i is instead

w_i = \sum_{d=0}^{k} c_{i,d}\, T_d(x_i)

where c_{i,d} is the learnable parameter and k is the order of the Chebyshev polynomial.

Thus, for Chebyshev, the output of a neuron is

\sum_{i=0}^{m} w_i x_i = \sum_{i=0}^{m} \sum_{d=0}^{k} c_{i,d}\, T_d(x_i)\, x_i
Example application: Neural Networks for classification
For k = 2 (i.e., d = 0, 1, 2), the Chebyshev output is

\sum_{i=0}^{m} \sum_{d=0}^{k} c_{i,d}\, T_d(x_i)\, x_i = \sum_{i=0}^{m} \left( c_{i,0} T_0(x_i) + c_{i,1} T_1(x_i) + c_{i,2} T_2(x_i) \right) x_i

Now, if c_{i,1} and c_{i,2} are forced to be 0, and T_0 = 1, the output becomes

\sum_{i=0}^{m} c_{i,0}\, x_i, \quad \text{which is the same as} \quad \sum_{i=0}^{m} w_i x_i

where each c_{i,0} term represents w_i.
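A minimal PyTorch sketch of the Chebyshev neuron described above; layer sizes, initialization, and the degree k are assumptions.

```python
# Each input weight w_i is replaced by sum_d c_{i,d} T_d(x_i), so the neuron
# output is sum_i (sum_d c_{i,d} T_d(x_i)) * x_i.
import torch

class ChebyshevNeuron(torch.nn.Module):
    def __init__(self, n_features, k=2):
        super().__init__()
        self.k = k
        self.c = torch.nn.Parameter(torch.randn(n_features, k + 1) * 0.1)  # c_{i,d}

    def chebyshev(self, x):                       # T_0..T_k of x, shape (batch, m, k+1)
        T = [torch.ones_like(x), x]
        for d in range(1, self.k):
            T.append(2.0 * x * T[d] - T[d - 1])
        return torch.stack(T[: self.k + 1], dim=-1)

    def forward(self, x):                          # x: (batch, n_features)
        w = (self.chebyshev(x) * self.c).sum(dim=-1)   # data-dependent weights w_i(x_i)
        return (w * x).sum(dim=1)                      # scalar output per sample

# With c[:, 1:] forced to zero and T_0 = 1, this reduces to a plain linear neuron.
```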
Results
File | Size | Columns | Number of Classes | MLP F1 Score | F1 Score by ChebyshevNetwork | Delta percentage
krkopt.csv 28056 6 18 0.509 0.632 24.165
contraceptive.csv 1473 9 3 0.529 0.582 10.019
led7.csv 3200 7 10 0.703 0.764 8.677
cmc.csv 1473 9 3 0.536 0.58 8.209
car.csv 1728 6 4 0.898 0.967 7.684
splice.csv 3188 60 3 0.851 0.899 5.64
wine_quality_red.csv 1599 11 6 0.559 0.59 5.546
wine_quality_white.csv 4898 11 7 0.517 0.542 4.836
letter.csv 20000 16 26 0.896 0.924 3.125
satimage.csv 6435 36 6 0.86 0.886 3.023
solar_flare_2.csv 1066 12 6 0.72 0.74 2.778
page_blocks.csv 5473 10 5 0.959 0.978 1.981
nursery.csv 12958 8 4 0.984 0.997 1.321
sleep.csv 105908 13 5 0.744 0.753 1.21
allhypo.csv 3770 29 3 0.938 0.948 1.066
allhyper.csv 3771 29 4 0.981 0.988 0.714
allrep.csv 3772 29 4 0.971 0.975 0.412
ann_thyroid.csv 7200 21 3 0.982 0.986 0.407
texture.csv 5500 40 11 0.986 0.99 0.406
segmentation.csv 2310 19 7 0.972 0.974 0.206
dna.csv 3186 180 3 0.938 0.939 0.107
pendigits.csv 10992 16 10 0.994 0.994 0
yeast.csv 1479 8 9 0.588 0.587 -0.17
allbp.csv 3772 29 3 0.977 0.973 -0.409
optdigits.csv 5620 64 10 0.976 0.972 -0.41
car_evaluation.csv 1728 21 4 0.991 0.986 -0.505
led24.csv 3200 24 10 0.742 0.735 -0.943
Data Extrapolation
• Initial Hypothesis Generation
• Physical Model Simulation
• Prediction and Uncertainty Estimation
• Data Integration and Model Update
• Active Learning: Data Acquisition
Thank You!