Are you sure about that!?
Uncertainty Quantification in AI
Florian Wilhelm Berlin, October 10th 2019
Dr. Florian Wilhelm
Principal Data Scientist @ inovex
@FlorianWilhelm
FlorianWilhelm
florianwilhelm.info
Mathematical Modelling
Data Science to Production
Recommender Systems
Uncertainty Quantification & Causality
Python Data Stack
Maintainer PyScaffold
Simon Bachstein
Data Scientist @ inovex
2018/07 – 2019/01 Master Thesis at inovex:
Uncertainty Quantification in Deep Learning
• Blogpost:
http://inovex.de/blog/uncertainty-quantification-deep-learning
• Master Thesis:
https://sbachstein.de/master_thesis.pdf
@simonbachstein
sbachstein
sbachstein.de
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
Deep Networks cannot look beyond their horizon
Motivation
90% cat
10% dog
Deep Networks cannot look beyond their horizon
Motivation
40% cat
60% dog
Deep Networks cannot look beyond their horizon
Motivation
?
Boult, T. E., Cruz, S., Dhamija, A., Gunther, M., Henrydoss, J., & Scheirer, W. (2019). Learning and the Unknown: Surveying Steps Toward Open World Recognition. AAAI, 1–8. Retrieved from www.aaai.org
Learning and the Unknown
Simple Regression Problem
Interpolation
Simple Regression Problem
Deep Networks don’t extrapolate
Neural Arithmetic Logic Units, NIPS'18, Andrew Trask et al.
Simple Regression Problem
Deep Networks don’t extrapolate
Simple Regression Problem
Uncertainty about interpolation and extrapolation
Types of Uncertainty [figure: aleatoric vs. epistemic uncertainty]
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
Methods for Uncertainty Quantification
Spectrum: relaxation of mathematical assumptions about the data
› Gaussian Processes
› Monte Carlo Dropout
› Deep Ensembles / Dropout Ensembles
› Quantile Regression
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
A Gaussian Process can be thought of as a random function
which is defined by its mean and covariance functions
Gaussian Processes
Definition
Gaussian Processes (figure slides)
› Example
› Inference
› Inference with perfect interpolation
› Inference with noisy observations
Gaussian Processes
Inference
Inference using given data points can be done analytically. For example, when assuming the (prior) mean function to be zero everywhere, we get the posterior mean and covariance at test inputs X*:
μ* = K(X*, X) [K(X, X) + σₙ²I]⁻¹ y
Σ* = K(X*, X*) − K(X*, X) [K(X, X) + σₙ²I]⁻¹ K(X, X*)
where K denotes the covariance function evaluated between the given point sets and σₙ² is the observation noise.
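A minimal numpy sketch of these posterior equations (illustrative: a squared exponential kernel with fixed hyperparameters is assumed, and names like gp_posterior are our own):

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared exponential covariance k(a, b) for 1-d inputs."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and covariance, assuming a zero prior mean."""
    K = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_test, x_train)                 # K(X*, X)
    K_ss = rbf_kernel(x_test, x_test)                 # K(X*, X*)
    mean = K_s @ np.linalg.solve(K, y_train)          # K(X*,X) [K + s^2 I]^-1 y
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

x = np.linspace(-3, 3, 20)
y = np.sin(x) + 0.1 * np.random.randn(20)
mu, cov = gp_posterior(x, y, np.linspace(-5, 5, 100))
std = np.sqrt(np.diag(cov))   # uncertainty band; grows away from the data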
Good introduction:
Bayesian Non-parametric Models for Data Science using PyMC by Christopher Fonnesbeck
• https://www.youtube.com/watch?v=-sIOMs4MSuA
• https://de.slideshare.net/mlreview/bayesian-nonparametric-models-for-data-science-using-pymc
Computationally intense: exact inference requires solving linear systems with the n × n covariance matrix, which scales as O(n³) in the number of training points.
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
MC Dropout
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, ICML 2016, Yarin Gal et al.
Idea: keep dropout active at prediction time; averaging many stochastic forward passes approximates Bayesian inference over the network's weights.
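In code, the idea reduces to sampling several stochastic forward passes with dropout still enabled; a minimal PyTorch-style sketch (the toy network and sample count are our own choices, not from the paper):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                       # toy regression network
    nn.Linear(1, 20), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(20, 20), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(20, 1),
)

def mc_dropout_predict(model, x, n_samples=100):
    """Keep dropout active at inference and sample the predictive distribution."""
    model.train()                            # train mode keeps dropout stochastic
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)   # predictive mean and spread

x = torch.linspace(-5, 5, 200).unsqueeze(1)
mean, std = mc_dropout_predict(model, x)
```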
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
Deep Ensembles
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, NIPS 2017, Balaji Lakshminarayanan et al.
Custom loss function: capture uncertainty directly at training time by letting each network predict a mean μ(x) and a variance σ²(x), trained with the Gaussian negative log likelihood
−log p(y|x) = log σ²(x) / 2 + (y − μ(x))² / (2 σ²(x)) + const.
Deep Ensembles
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, NIPS 2017, Balaji Lakshminarayanan et al.
Combine an ensemble of M networks as a uniform Gaussian mixture:
μ*(x) = (1/M) Σₘ μₘ(x),  σ*²(x) = (1/M) Σₘ (σₘ²(x) + μₘ²(x)) − μ*²(x)
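A sketch of both ingredients under these definitions (assuming each member network has a mean head and a positive variance head; the function names are our own):

```python
import torch

def gaussian_nll(mu, var, y):
    """Gaussian negative log likelihood (constant dropped); var must be
    positive, e.g. produced by a softplus on the variance head."""
    return (0.5 * torch.log(var) + 0.5 * (y - mu) ** 2 / var).mean()

def combine_ensemble(mus, vars_):
    """Combine M member predictions as a uniform Gaussian mixture.
    mus, vars_: tensors of shape (M, batch, 1)."""
    mu_star = mus.mean(dim=0)                              # (1/M) sum mu_m
    var_star = (vars_ + mus ** 2).mean(dim=0) - mu_star ** 2
    return mu_star, var_star
```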
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
Dropout Ensembles
The best of both worlds? Train an ensemble of networks that each use dropout, and pool the Monte Carlo dropout samples of all members (see the sketch below).
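A minimal sketch of this pooling, reusing the MC Dropout idea from above (helper name and sample counts are illustrative):

```python
import torch

def dropout_ensemble_predict(models, x, n_passes=100):
    """Pool MC Dropout samples across all ensemble members."""
    samples = []
    for model in models:
        model.train()                        # keep dropout stochastic
        with torch.no_grad():
            samples += [model(x) for _ in range(n_passes)]
    samples = torch.stack(samples)           # (M * n_passes, batch, 1)
    return samples.mean(dim=0), samples.std(dim=0)
```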
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
Using the cumulative distribution function (cdf) F_Y of a random variable Y, we define the τ-quantile:
q_τ(Y) = inf{ y : F_Y(y) ≥ τ }
Loss function to estimate the quantile (pinball loss):
L_τ(y, q̂) = τ (y − q̂) if y ≥ q̂, and (1 − τ) (q̂ − y) otherwise
Quantile Regression
Intuition behind Quantile Regression
[number line from 0.0 to 1.0]
Assume the median estimate q̂_τ(x) (τ = 0.5) lies below all four observations: nothing on the side y ≤ q̂_τ(x); the observations with y > q̂_τ(x) lie at distances 0.1, 0.2, 0.5 and 0.8.
Error: 0.0 + 1.6 = 1.6
Intuition behind Quantile Regression
[number line from 0.0 to 1.0]
Move q̂_τ(x) up to the lowest observation (distance 0.0); the remaining observations with y > q̂_τ(x) lie at distances 0.1, 0.4 and 0.7.
Error: 0.0 + 1.2 = 1.2
Intuition behind Quantile Regression
[number line from 0.0 to 1.0]
Move q̂_τ(x) further up: observations with y ≤ q̂_τ(x) at distances 0.0 and 0.1, observations with y > q̂_τ(x) at distances 0.3 and 0.6.
Error: 0.1 + 0.9 = 1.0
Intuition behind Quantile Regression
[number line from 0.0 to 1.0]
Move q̂_τ(x) up by another 0.1: the distances below (0.1 and 0.2) each grow by +0.1, while the distances above (0.2 and 0.5) each shrink by -0.1.
Error: 0.3 + 0.7 = 1.0
No change due to the linearity of the error! With equally many observations on both sides, shifting the estimate leaves the total error constant; this is exactly the defining property of the median.
Now the 0.75th Quantile
[number line from 0.0 to 1.0]
Assume the estimate q̂_τ(x) for τ = 0.75 is at the same position: observations with y ≤ q̂_τ(x) at distances 0.1 and 0.2, observations with y > q̂_τ(x) at distances 0.2 and 0.5.
Error: (1 − 0.75) ⋅ 0.3 + 0.75 ⋅ 0.7 = 0.6
The right-side error weighs 3 times as much as the left-side error.
Now the 0.75th Quantile
[number line from 0.0 to 1.0]
Move q̂_τ(x) (τ = 0.75) upwards: observations with y ≤ q̂_τ(x) at distances 0.1, 0.4 and 0.5, one observation with y > q̂_τ(x) at distance 0.2.
Error: (1 − 0.75) ⋅ 1.0 + 0.75 ⋅ 0.2 = 0.4
A change in the right-side error also weighs 3 times as much as one in the left-side error, so the optimum settles where 75% of the observations lie below the estimate.
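The pinball loss makes this bookkeeping mechanical; a small numpy sketch reproducing the τ = 0.75 example above (the absolute position of the estimate is an arbitrary choice, only the distances matter):

```python
import numpy as np

def pinball_loss(y, q_hat, tau):
    """Quantile (pinball) loss, summed over observations."""
    diff = y - q_hat
    return np.sum(np.where(diff >= 0, tau * diff, (tau - 1) * diff))

# Distances from the tau = 0.75 example: two observations below the
# estimate (0.1 and 0.2 away), two above (0.2 and 0.5 away).
q_hat = 0.3                                   # illustrative estimate position
y = np.array([0.2, 0.1, 0.5, 0.8])
print(pinball_loss(y, q_hat, tau=0.75))       # (1-0.75)*0.3 + 0.75*0.7 = 0.6
```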
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
Experiments
Dataset: samples are generated according to a given function with added noise.
Uncertainty in Deep Learning (PhD thesis), Yarin Gal, http://mlg.eng.cam.ac.uk/yarin/blog_2248.html
Experiments: Dataset [plot of the generated samples]
Neural networks
› 2 hidden layers with 20 ReLU neurons each
› 5 networks for Deep Ensembles
› 100 iterations for Dropout predictions
› Adam optimizer with batch size of 128
› LR, weight decay, dropout probability are optimized
Gaussian Processes
› squared exponential covariance and zero mean function prior
› covariance function parameters and aleatory noise are optimized
Experiments
Network setup and hyperparameters
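A sketch of the training loop matching this setup (the concrete learning rate and weight decay values are placeholders, since these were tuned per method):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def train(model, x, y, lr=1e-3, weight_decay=1e-4, epochs=100):
    """Adam with batch size 128, as described above; lr, weight decay
    (and the dropout probability) were optimized in the experiments."""
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    loader = DataLoader(TensorDataset(x, y), batch_size=128, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = torch.mean((model(xb) - yb) ** 2)  # or gaussian_nll for Deep Ensembles
            loss.backward()
            opt.step()
```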
Mean squared error (MSE)
Mean negative log likelihood (MNLL)
Mean Kullback-Leibler (KL) divergence
Experiments
Measures for generalization quality
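A sketch of these three measures, assuming Gaussian predictive distributions so that the KL divergence against the true sampling distribution has a closed form (function names are our own):

```python
import numpy as np

def mse(y, mu):
    """Mean squared error of the predictive mean."""
    return np.mean((y - mu) ** 2)

def mnll(y, mu, var):
    """Mean negative log likelihood of a Gaussian predictive distribution."""
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (y - mu) ** 2 / var)

def mean_gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Mean KL(p || q) between two Gaussians, pointwise over the test set."""
    return np.mean(np.log(np.sqrt(var_q / var_p))
                   + (var_p + (mu_p - mu_q) ** 2) / (2 * var_q) - 0.5)
```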
Experiments: results (plots)
› Interpolation
› Extrapolation: they still don't extrapolate, and they don't quite realize it
› Gaussian Process
› Convergence
› Heteroscedastic noise
› Non-Gaussian noise
› Uncertainty split into aleatoric and epistemic parts
Summary
                       GP    MCD   DeepE    DropoutE  QR
Homoscedastic noise    ++    o     +        o         o
Heteroscedastic noise  --    -     ++       +         +
Non-Gaussian noise     +     o     +        +         -
Convergence            ++    -     +        -         +
Speed                  (--)  +     - / (+)  +         ++
Uncertainty split      yes   no    yes      yes       no
1. Motivation
2. Methods
a. Gaussian Processes
b. Monte-Carlo Dropout
c. Deep Ensembles
d. Dropout Ensembles
e. Quantile Regression
3. Experiments
4. Conclusion & Outlook
Agenda
› The neural network approaches discussed here are well aware of aleatory uncertainty, but they are not capable of correctly estimating epistemic uncertainty
› Gaussian Processes give clear signals about their own ignorance, but they do not scale
A combined solution needs to be developed, because uncertainty estimation is essential in critical applications
Conclusion
There is work to be done
› Bayesian Neural Networks (e.g. with PyMC)
› Sparse Gaussian Process approximations
› Gaussian Processes on top of neural networks
Outlook
Other approaches
Thank You!
Florian Wilhelm
Principal Data Scientist
inovex GmbH
Schanzenstraße 6-20
Kupferhütte 1.13
51063 Köln
florian.wilhelm@inovex.de