An Introduction to
Hamiltonian Neural Networks
Presented by Miles Cranmer, Princeton University
@MilesCranmer
(advised by Shirley Ho/David Spergel)
This is based on none of my own research.
The work is by:
Sam Greydanus, Misko Dzamba, and Jason Yosinski
(+ Tom Bertalan, Felix Dietrich, Igor Mesić, and Ioannis G. Kevrekidis, whose work was posted at a similar time)
Ordering:
1. Classical Mechanics Review
2. Neural Networks
3. Hamiltonian Neural Networks
4. Bonus: Neural ODEs
5. Code Demo
Forces
• Objects and fields by themselves induce forces on other objects
• The vector sum of all forces gives the net force
• Divide by the mass of the body to get its acceleration (see the short sketch after this slide)
• Common forces:
• Normal force (desk holding something)
• Friction
• Tension (string)
• Gravity
[1]
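As a quick illustration of the bullets above (a minimal sketch of my own, not from the slides), summing force vectors and dividing by mass:

```python
import numpy as np

# Hypothetical example: three forces acting on a 2 kg body (units: newtons).
gravity = np.array([0.0, -19.6])     # m * g, pointing down
normal  = np.array([0.0, 19.6])      # desk pushing back up
push    = np.array([5.0, 0.0])       # a sideways push

mass = 2.0                            # kg
net_force = gravity + normal + push   # vector sum of all forces
acceleration = net_force / mass       # Newton's second law: a = F_net / m
print(acceleration)                   # -> [2.5 0. ] m/s^2
```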
Lagrangian Mechanics
• For a coordinate system with generalized coordinates q,
• (Focus on object coordinates for today)
• Write down kinetic energy = T(q, q̇)
• Potential energy = V(q)
• Lagrangian is a function of coordinates and (usually) their first-order derivatives: L(q, q̇) = T - V
• Action is: S = ∫ L(q, q̇) dt
• Apply principle of stationary action: δS = 0
Lagrangian Mechanics 2
• By extremizing the action, we get the Euler-Lagrange equations: d/dt (∂L/∂q̇) = ∂L/∂q
• Example: falling ball: L = ½ m ẏ² - m g y, which gives m ÿ = -m g
• Numerically integrate these to get the dynamics of the system
Hamiltonian Mechanics
• Canonical momenta for a system: p = ∂L/∂q̇
• Legendre transformation of L is the Hamiltonian: H(q, p) = p q̇ - L
• This is usually the energy, conserved in a dynamical system.
• What path preserves H?
• Move perpendicular to its gradient!
• Called the symplectic gradient
• Falling ball: H = p²/(2m) + m g y
[2]
Hamiltonian Mechanics 2
• H-preserving path = symplectic gradient: dq/dt = ∂H/∂p, dp/dt = -∂H/∂q
• Also known as Hamilton’s equations!
• Can use these first-order, explicit ODEs to integrate physical dynamics (see the code sketch after this slide)
• Problems with L:
• Second-order, implicit ODEs
• L isn’t meaningful by itself
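To make Hamilton’s equations concrete, here is a minimal sketch (my own, not from the slides) of the falling-ball Hamiltonian H = p²/(2m) + m g y and its symplectic gradient:

```python
m, g = 1.0, 9.81

def hamiltonian(y, p):
    """Total energy of a ball of mass m at height y with momentum p."""
    return p**2 / (2 * m) + m * g * y

def symplectic_gradient(y, p):
    """Hamilton's equations: dy/dt = dH/dp, dp/dt = -dH/dy."""
    dH_dy = m * g           # partial of H with respect to y
    dH_dp = p / m           # partial of H with respect to p
    return dH_dp, -dH_dy    # (dy/dt, dp/dt): a first-order, explicit ODE

print(symplectic_gradient(y=10.0, p=0.0))  # -> (0.0, -9.81)
```

This is the first-order, explicit ODE that the integrators on the next slides will consume.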
Things to worry about with L, H
• Dissipation/friction
• Need to add a force term to the Euler-Lagrange equation
• Can also use a multiplicative factor
• Energy pools/boundaries
• Constraints
• E.g., normal forces
• Sol’n: Use better coordinates (sometimes tricky)
• Or, use a constraint function that equals 0
• (Lagrange multiplier method)
• *After reading the presentation – if you manage to think of a way
to add these techniques to a Hamiltonian NN, come talk to me!
Integrators
• Presented with an explicit differential equation, we can use several methods to numerically integrate it.
• Recall that: dq/dt = ∂H/∂p, dp/dt = -∂H/∂q
• This is an Euler integrator: q ← q + Δt (∂H/∂p), p ← p - Δt (∂H/∂q) (sketched in code below)
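A sketch of that Euler update, applied to the `symplectic_gradient` function defined in the earlier falling-ball sketch (an assumption carried over from that example):

```python
def euler_step(y, p, dt=0.01):
    """One explicit Euler step: z <- z + dt * f(z)."""
    dy_dt, dp_dt = symplectic_gradient(y, p)
    return y + dt * dy_dt, p + dt * dp_dt

# Integrate the falling ball for one second.
y, p = 10.0, 0.0
for _ in range(100):
    y, p = euler_step(y, p)
print(y, p)
```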
Accurate Integrators
• Advanced integrators take several intermediate steps to improve accuracy
• Runge-Kutta integrators target accuracy
• Can be very accurate, but do not preserve known invariants!
• Symplectic integrators target energy conservation
• Can preserve energy very well, but at lower positional accuracy!
• (All integrators are bad for long-term accuracy)
[3]
Integrator Examples
• Runge-Kutta 4th order
(most common)
• High accuracy, low cost
• Does not necessarily
preserve energy
[3]
[3]
• Symplectic 4th order (Yoshida)
• These conserve energy extremely well: errors stay bounded, with no long-term drift!
• Do drift (update x) and kick (update p) steps separately (see the leapfrog sketch after this slide)
• (c, d) are ugly constants, some negative, which sum to 1
[4]
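For comparison, a sketch of the simplest symplectic scheme, second-order kick-drift-kick leapfrog (reference [4]); the 4th-order Yoshida method composes several such drift/kick stages using the (c, d) coefficients mentioned above. This reuses the hypothetical `symplectic_gradient` function from the falling-ball sketch.

```python
def leapfrog_step(y, p, dt=0.01):
    """Kick-drift-kick leapfrog: alternates momentum and position updates."""
    _, dp_dt = symplectic_gradient(y, p)
    p = p + 0.5 * dt * dp_dt             # half kick (update p)
    dy_dt, _ = symplectic_gradient(y, p)
    y = y + dt * dy_dt                   # full drift (update y)
    _, dp_dt = symplectic_gradient(y, p)
    p = p + 0.5 * dt * dp_dt             # half kick (update p)
    return y, p
```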
Pivot to Machine Learning
• Recall (or not?): Machine Learning is parameter estimation where
the parameters lack explicit physical meaning!
• Many types of ML:
• Supervised (common):
• Regression
• Classification
• Unsupervised
• E.g., clustering, density estimation
• Semi-supervised – a mix
• Linear Regression – this counts as ML!
[5]
Neural Networks
• Repeat after me:
Neural Networks are piecewise Linear Regression!
• Mathematically (we’ll only talk Multi-Layer Perceptrons): h_{k+1} = ReLU(W_k h_k + b_k), with a final linear output layer
• (You do a linear regression -> zero the negatives -> repeat; see the sketch after this slide)
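A sketch of that recipe as a two-hidden-layer MLP in PyTorch (a minimal example of my own, not code from the talk):

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, n_in=2, n_hidden=50, n_out=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden),      # linear regression
            nn.ReLU(),                      # zero the negatives
            nn.Linear(n_hidden, n_hidden),  # repeat
            nn.ReLU(),
            nn.Linear(n_hidden, n_out),     # final linear layer
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
print(model(torch.randn(4, 2)).shape)       # -> torch.Size([4, 1])
```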
Neural Networks 2
• Repeat after me:
Neural Networks are piecewise Linear Regression!
• 0-hidden layer Neural Network: linear regression!
• 1-hidden layer NN with ReLU: Piecewise
• Whatever combination of “neurons” are on = different “region” for linear
regression
• Up to 2^(layers*hidden size) different linear regression solutions
• Continuously connected
• Don’t expect good extrapolation! Only nearby interpolation
• The Neural Net parameters determine both the slopes and the region boundaries.
I don’t believe you!
• Randomly-initialized 2-hidden layer 50-node NN:
Why?
• ReLU on = linear regression
• ReLU off = 0
• Remaining nodes simplify to
linear regression!
[6]
Neural Network Aside
• Other activation functions, like tanh and softplus, smooth out this piecewise linearity
• Neural Networks are universal function approximators. In the
limit of infinitely wide layers, even with two hidden ones, they can
express any mapping.
• They happen to be efficient at doing this too!
• All Neural Network techniques are about getting them to cheat
less. They are very good at cheating.
• Data Augmentation (hugely important)
• Regularization
• Structure (Convolutional NN, Graph Net, etc)
Differentiability
• The derivative is well-defined: just a product of layer Jacobians (sparse matrices)!
• Interested in:
• Derivative w.r.t. the weights, used for optimization (SGD or Adam)
• Auto-diff frameworks like TensorFlow and PyTorch make this easy.
• Demo: https://playground.tensorflow.org
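A minimal autodiff sketch showing both kinds of derivative: with respect to the weights (what SGD/Adam consume) and with respect to the inputs (which Hamiltonian NNs will need later). It assumes the `MLP` class and `model` object from the previous sketch.

```python
x = torch.randn(8, 2, requires_grad=True)
y_target = torch.randn(8, 1)

loss = ((model(x) - y_target) ** 2).mean()
loss.backward()                                  # gradients w.r.t. the weights
print(model.net[0].weight.grad.shape)            # -> torch.Size([50, 2])

# Gradient of the network output w.r.t. its inputs:
dy_dx = torch.autograd.grad(model(x).sum(), x)[0]
print(dy_dx.shape)                               # -> torch.Size([8, 2])
```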
Neural Nets for Physical Dynamics
• Here we will focus on physical systems over time.
• Many other things like sequences can be reframed as dynamics
problems.
• We are interested in problems where we have:
• positions and momenta (or velocities), (qᵢ(t), pᵢ(t)), for i particles over time
• In addition to other fixed properties...
• How do we use Neural Nets to simulate systems?
Example - Pendulum
• How to learn to estimate the future position and velocity of a
pendulum?
• Neural Net: maps (state x ∈ ℝⁿ, fixed parameters c ∈ ℝˡ) to a state change Δx ∈ ℝⁿ
• n is the number of particles × dynamical parameters
• l is the number of fixed parameters
• Pendulum:
• n = 2 (theta, theta velocity)
• l = 2 (gravity, length of pendulum)
• Want to predict only the change in the state - an easier regression problem
• So, here we are learning a function that approximates a velocity update and a force law (see the sketch after this slide)
[7]
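A sketch of this baseline approach for the pendulum: a network mapping (state, fixed parameters) to the change in state over one timestep. The layer sizes, names, and training step are my own illustration, not the talk's code.

```python
import torch
import torch.nn as nn

# state = (theta, theta_dot); fixed = (g, length)
net = nn.Sequential(nn.Linear(4, 64), nn.Softplus(), nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def train_step(state, fixed, delta_true):
    """state, fixed, delta_true: (N, 2) tensors; delta_true is the observed state change."""
    delta_pred = net(torch.cat([state, fixed], dim=1))
    loss = ((delta_pred - delta_true) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```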
Real World Applications (of NNs for
simulation)
• Neural Networks learn "effective" forces in simulations
• They only look at the most relevant degrees of freedom!
• Can be more accurate at reduced computational cost
• Some examples:
• Shirley Ho's U-Net can do cosmological simulations much faster and more
accurately than standard simulators
• Peter Battaglia's Interaction Network used in many applications
• Drug discovery/molecular+protein modelling – getting very popular
• E.g., Cecilia Clementi, Frank Noe, Mark Waller, many others
• DeepMind's AlphaFold Protein Folding algorithm - destroys baseline algorithms at
finding structure from genetic code
• See IPAM's recent workshop for good list!
• Some say intelligent reasoning is based on learning to simulate potential
outcomes => path to general intelligence?
Hamiltonian Neural Networks
• Learn a mapping from coordinates and momenta to a single number, H_θ(q, p)
• The derivatives of this can describe your dynamics by Hamilton's equations: dq/dt = ∂H_θ/∂p, dp/dt = -∂H_θ/∂q
• Comparing the true and predicted dynamical updates gives a minimization objective: L = ‖∂H_θ/∂p - dq/dt‖² + ‖∂H_θ/∂q + dp/dt‖² (see the sketch below)
(Sam’s blog)
(Sam’s blog)
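The idea above in code: the network outputs a single scalar H(q, p), we differentiate it with respect to its inputs, and we match the resulting symplectic gradient to the observed time derivatives. This is a minimal PyTorch sketch of my own for one degree of freedom; the authors' reference implementation is linked in the resources.

```python
import torch
import torch.nn as nn

# H_theta: maps (q, p) -> a single scalar.
H = nn.Sequential(nn.Linear(2, 200), nn.Softplus(), nn.Linear(200, 1))
opt = torch.optim.Adam(H.parameters(), lr=1e-3)

def hnn_loss(qp, dqp_dt_true):
    """qp: (N, 2) coordinates and momenta; dqp_dt_true: (N, 2) observed time derivatives."""
    qp = qp.detach().requires_grad_(True)
    dH = torch.autograd.grad(H(qp).sum(), qp, create_graph=True)[0]  # (dH/dq, dH/dp)
    dq_dt_pred = dH[:, 1]        # Hamilton:  dq/dt =  dH/dp
    dp_dt_pred = -dH[:, 0]       #            dp/dt = -dH/dq
    pred = torch.stack([dq_dt_pred, dp_dt_pred], dim=1)
    return ((pred - dqp_dt_true) ** 2).mean()

# One training step:
# loss = hnn_loss(qp_batch, dqp_dt_batch); opt.zero_grad(); loss.backward(); opt.step()
```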
Why?
• It works better; it’s more interpretable. Not only
do we have a simulator, we know the energy!
(Sam’s blog)
Why does it work?
• It uses symplectic gradients: by prescribing that we can only move
along the level set of H, it learns the proper H.
Start of training vs. final:
(Sam’s blog)
Graph Network extension: (Sanchez-Gonzalez et al)
Integrators
• So far we have only talked about Euler integrators. But since Hamilton's equations are just an ODE, we can use any integrator: RK4 and symplectic included.
• If H has learned the true energy, we can exactly preserve it with symplectic integrators.
• In practice, RK4 is still more accurate. Maybe some combination is best?
This model is less than 6 months old! We don't know what is best yet.
• Can train + eval with RK4 or Symplectic Methods!
• Do multiple queries and multiple derivatives of your network’s H
• This works very well in practice.
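A sketch of one RK4 step driven by the learned Hamiltonian (it assumes the `H` network from the previous sketch and that `torch` is already imported):

```python
def learned_dynamics(qp):
    """Symplectic gradient of the learned H at a single state qp of shape (2,)."""
    qp = qp.detach().requires_grad_(True)
    dH = torch.autograd.grad(H(qp.unsqueeze(0)).sum(), qp)[0]
    return torch.stack([dH[1], -dH[0]])      # (dq/dt, dp/dt)

def rk4_step(qp, dt=0.01):
    k1 = learned_dynamics(qp)
    k2 = learned_dynamics(qp + 0.5 * dt * k1)
    k3 = learned_dynamics(qp + 0.5 * dt * k2)
    k4 = learned_dynamics(qp + dt * k3)
    return qp + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```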
I don’t know the canonical coordinates!
• Pair two Neural Networks:
• g, an autoencoder to latent variables
• H, a Hamiltonian that pretends those
latent variables are (q, p).
• Training this setup in combination
will learn the
canonical coords
+ the Hamiltonian!
(Sam’s blog)
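A rough sketch of that pairing (my own illustration of the idea, not the authors' code), reusing the `H` network from the earlier HNN sketch: an encoder maps raw observations to two latent variables treated as (q, p), a decoder reconstructs the observations, and the Hamiltonian loss is applied in the latent space.

```python
import torch
import torch.nn as nn

obs_dim, latent_dim = 28 * 28, 2   # e.g. pixel observations -> latent (q, p)
encoder = nn.Sequential(nn.Linear(obs_dim, 200), nn.Softplus(), nn.Linear(200, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 200), nn.Softplus(), nn.Linear(200, obs_dim))

def total_loss(x_t, x_next, dt):
    """x_t, x_next: (N, obs_dim) observations one timestep apart."""
    qp_t, qp_next = encoder(x_t), encoder(x_next)
    recon = ((decoder(qp_t) - x_t) ** 2).mean()                        # autoencoder term
    dH = torch.autograd.grad(H(qp_t).sum(), qp_t, create_graph=True)[0]
    pred = torch.stack([dH[:, 1], -dH[:, 0]], dim=1)                   # Hamilton's equations in latent space
    dqp_dt = (qp_next - qp_t) / dt                                     # finite-difference time derivatives
    return recon + ((pred - dqp_dt) ** 2).mean()
```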
Tips
• Activations:
• Recall: Neural Networks are piecewise linear regression.
• The derivative of a ReLU network is piecewise constant, so differentiating H means we are literally learning a lookup table – not good!
• Use Softplus or Tanh to make H have a smoother derivative
• Use more hidden nodes than for regular NNs, as H needs to be very
smooth
• Stability:
• According to some (Stephan Hoyer), better to learn multiple timesteps at
once.
• Use RK4 integrators
Bonus: Neural ODEs
• Famous 2018 paper:
Neural Ordinary Differential
Equations.
• Hamiltonian Neural
Networks -ARE- a Neural
ODE.
• Paper connects ResNets
with Euler integrators
• Paper: “Why not just learn a
derivative and integrate it?”
• Smoother output!
(Chen et al)
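A sketch of the "learn a derivative and integrate it" idea using the torchdiffeq package released with the Neural ODE paper (the `odeint(func, y0, t)` call is the package's real interface; the tiny network is my own illustration):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint    # pip install torchdiffeq

class Dynamics(nn.Module):
    """Learns dz/dt = f_theta(z); odeint integrates it (and backpropagates through it)."""
    def __init__(self, dim=2):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, z):
        return self.f(z)

func = Dynamics()
z0 = torch.randn(1, 2)                  # initial state
t = torch.linspace(0.0, 1.0, 20)        # times at which to report the solution
trajectory = odeint(func, z0, t)        # shape (20, 1, 2)
```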
PyTorch Tutorial – Falling Ball
• Short: https://bit.ly/2JiTEJE
• (Copy to new notebook in your drive)
Figure + other references
1. http://ffden-2.phys.uaf.edu/211_fall2004.web.dir/Jeff_Levison/Freebody%20diagram.htm
2. https://physics.stackexchange.com/questions/384990/why-will-a-dropped-object-land-at-the-same-time-as-a-sideways-thrown-one
3. https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods
4. https://en.wikipedia.org/wiki/Leapfrog_integration
5. https://en.wikipedia.org/wiki/Linear_regression#/media/File:Linear_regression.svg
6. https://medium.com/@amarbudhiraja/https-medium-com-amarbudhiraja-learning-less-to-learn-better-dropout-in-deep-machine-learning-74334da4bfc5
7. https://medium.com/@kriswilliams/how-life-is-like-a-pendulum-8811c4177685
Other resources used:
1. https://arxiv.org/abs/1906.01563
2. https://arxiv.org/abs/1907.12715
3. https://arxiv.org/pdf/1909.12790.pdf
4. https://greydanus.github.io/2019/05/15/hamiltonian-nns/
5. https://arxiv.org/pdf/1806.07366.pdf