Gaussian Process Regression
An intuitive introduction
Juan Pablo Carbajal
Siedlungswasserwirtschaft
Eawag - aquatic research
Dübendorf, Switzerland
juanpablo.carbajal@eawag.ch
November 24, 2017
The learning problem in a nutshell
[Figure: scatter plot of the data points (tᵢ, yᵢ), t on the horizontal axis, y on the vertical axis.]
Data given: (tᵢ, yᵢ) = (t, y). What model to use?
The learning problem
To use a set of observations to uncover an underlying process, for prediction (and maybe for understanding).
Yaser Abu-Mostafa. Learning from data. https://work.caltech.edu/telecourse.html
The learning problem
Input: x, a position on a map
Output: y, the height of the terrain
Target function: f : X → Y, the height map
Data: (x₁, y₁), …, (xₙ, yₙ), field measurements
Hypothesis: g : X → Y, the formula to be used
The learning problem
[Diagram: an unknown target function (how legal-like is the text?) generates the training examples (available text snippets); a learning algorithm searches a hypothesis set (possible text classification functions) and returns the final hypothesis (the final classification function).]
Data set
[Figure: the same scatter plot of the data points (tᵢ, yᵢ).]
Data given: (tᵢ, yᵢ) = (t, y). What model to use?
Naive regression
[Figure: the data points with a cubic polynomial fit drawn through them.]
Model:

  y(t) = w₀ + w₁ t + w₂ t² + w₃ t³,

or in vector form, with the feature row φ(t) = [1  t  t²  t³],

  φ(t) w = y(t),   w = (w₀, w₁, w₂, w₃)ᵀ.

Evaluated at the data points this stacks into a linear system,

  Φ(t) w = [φ(t₁); φ(t₂); φ(t₃)] w = y,

a 3×4 matrix Φ with rows [1  tᵢ  tᵢ²  tᵢ³] acting on w, with right-hand side y = (y₁, y₂, y₃)ᵀ.
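As a concrete sketch (not part of the original slides), the design matrix of this naive regression can be built with numpy; the helper phi and the data values below are my own, invented for illustration.

```python
import numpy as np

# Hypothetical data: n = 3 observations (t_i, y_i), values invented for the example.
t = np.array([0.2, 0.5, 0.8])
y = np.array([-1.0, 1.5, -0.5])

def phi(t, N=4):
    """Polynomial feature rows [1, t, t^2, ..., t^(N-1)]."""
    return np.vander(np.atleast_1d(t), N, increasing=True)

Phi = phi(t)                            # n x N design matrix, rows phi(t_i)
print(Phi.shape)                        # (3, 4): more unknowns than equations
print(np.linalg.matrix_rank(Phi))       # 3: the rows are linearly independent
```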
Pseudo-inverse
Φ is an n×N (3×4) matrix with N ≥ n, so rank Φ ≤ n.
With a feature vector φ complex enough we have rank Φ = n, i.e. the n row vectors of the matrix are linearly independent and (ΦΦᵀ)⁻¹ exists.
ΦΦᵀ is called the Gramian matrix: the matrix of all scalar products.

  Φw = y   →   ΦΦᵀ(ΦΦᵀ)⁻¹ Φw = Φ [Φᵀ(ΦΦᵀ)⁻¹ Φw] = y,   since ΦΦᵀ(ΦΦᵀ)⁻¹ = I,

  Φᵀ(ΦΦᵀ)⁻¹ Φw = Φᵀ(ΦΦᵀ)⁻¹ y = w*,

where Φᵀ(ΦΦᵀ)⁻¹ is the Moore–Penrose pseudoinverse.
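A small numpy check of this construction (continuing the hypothetical example above): the explicit formula Φᵀ(ΦΦᵀ)⁻¹y agrees with numpy's Moore–Penrose pseudoinverse and reproduces the data exactly.

```python
# Minimum-norm weights via the explicit right pseudoinverse.
G = Phi @ Phi.T                        # n x n Gramian of the feature vectors
w_star = Phi.T @ np.linalg.solve(G, y)

# Same result through numpy's Moore-Penrose pseudoinverse.
w_pinv = np.linalg.pinv(Phi) @ y
assert np.allclose(w_star, w_pinv)

# The weights interpolate the data: Phi w* = y.
assert np.allclose(Phi @ w_star, y)
```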
A change of perspective
Instead of looking at the rows of Φ, look at the columns. These are linearly independent functions ψᵢ(t) = tⁱ evaluated at the data points. The model looks like

  y(t) = Σ_{i=0}^{N−1} ψᵢ(t) wᵢ

and the regression problem now looks like

  Ψ(t) w = [ψ₀(t)  ψ₁(t)  ψ₂(t)  ψ₃(t)] w = y.

Note that, evaluated at the data points, Ψ = Φ.
A change of perspective
Ψ is an n×N (3×4) matrix with N ≥ n, so rank Ψ ≤ n.
If the column vectors of the matrix span Rⁿ (i.e. rank Ψ = n), then (ΨΨᵀ)⁻¹ exists.
K = ΨΨᵀ is called the covariance matrix: Kᵢⱼ = Σ_{k=0}^{N−1} ψₖ(tᵢ) ψₖ(tⱼ).

  Ψw = y   →   ΨΨᵀ(ΨΨᵀ)⁻¹ Ψw = Ψ [Ψᵀ(ΨΨᵀ)⁻¹ Ψw] = y,   since ΨΨᵀ(ΨΨᵀ)⁻¹ = I,

  Ψᵀ(ΨΨᵀ)⁻¹ Ψw = Ψᵀ(ΨΨᵀ)⁻¹ y = w*,

where Ψᵀ(ΨΨᵀ)⁻¹ is the Moore–Penrose pseudoinverse.
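In code the column view is the same matrix read the other way (a sketch continuing the example above): the covariance matrix collects scalar products between data points instead of between features.

```python
# For this polynomial example the matrix of basis functions evaluated at the
# data equals the design matrix: Psi_{ik} = psi_k(t_i).
Psi = Phi

# Covariance matrix K_ij = sum_k psi_k(t_i) psi_k(t_j)  (n x n).
K = Psi @ Psi.T

# The same minimum-norm weights, now written through K.
w_star_K = Psi.T @ np.linalg.solve(K, y)
assert np.allclose(w_star_K, w_star)
```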
Recapitulation: the problem
Given n examples {(tᵢ, yᵢ)}, propose a model using N ≥ n linearly independent functions (a.k.a. features),

  f(t) = Σᵢ ψᵢ(t) wᵢ,

and find some good {wᵢ}.

Hansen, Per Christian. Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion. Vol. 4. SIAM, 1998.
Wendland, Holger. Scattered data approximation. Vol. 17. Cambridge University Press, 2004.
Recapitulation: the solution

Data view
Think in terms of n feature vectors φⱼ in a (high-dimensional) space R^N,

  φⱼ = (ψ₁(tⱼ), …, ψ_N(tⱼ)),   j = 1, …, n.

The solution reads

  f(t) = Φ(t) w* = Φ(t)Φᵀ (ΦΦᵀ)⁻¹ y,

where both Φ(t)Φᵀ and ΦΦᵀ are matrices of scalar products.

Function view
Think in terms of an N-dimensional function space H spanned by the ψᵢ(t). The solution reads

  f(t) = Ψ(t) w* = Ψ(t)Ψᵀ (ΨΨᵀ)⁻¹ y = k(t, t) k(t, t)⁻¹ y,

where both Ψ(t)Ψᵀ and ΨΨᵀ are covariances; the first k(t, t) is the vector of covariances between the test point t and the training points, and the second, k(t, t) = K, is the n×n covariance matrix of the training points.
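The function view turns directly into a prediction rule that never forms the weights explicitly; a sketch on a hypothetical evaluation grid, continuing the example above:

```python
# Predict on new inputs without computing w: f(t) = k(t, T) K^{-1} y,
# where T are the training inputs and k is built from the same features.
t_new = np.linspace(0.0, 1.0, 50)

k_new = phi(t_new) @ Psi.T             # 50 x n cross-covariances k(t_new, t_i)
f_new = k_new @ np.linalg.solve(K, y)

# At the training inputs the prediction reproduces the observations.
assert np.allclose(phi(t) @ Psi.T @ np.linalg.solve(K, y), y)
```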
The kernel trick
To calculate the solutions we only need scalar products or covariances: we never use the actual {φᵢ} or {ψᵢ},

  cov_Ψ(t, t′) = k(t, t′) = Φ(t) · Φ(t′).

Infinite features
Now we can use N = ∞, i.e. infinitely many features or basis functions!
By selecting valid covariance functions we implicitly select the features of our model.
How to choose the covariance function? Prior knowledge about the solution.

Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning. http://www.gaussianprocess.org/gpml/
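As an illustration of working with a covariance function directly (not from the slides), the squared-exponential kernel, which corresponds to infinitely many basis functions, drops into the same prediction formula; the length-scale below is an arbitrary choice for the sketch.

```python
def k_se(a, b, ell=0.2):
    """Squared-exponential covariance k(a, b) = exp(-(a - b)^2 / (2 ell^2))."""
    d = np.subtract.outer(a, b)
    return np.exp(-0.5 * (d / ell) ** 2)

# Same recipe, different covariance: f(t) = k(t, T) k(T, T)^{-1} y.
K_se = k_se(t, t)
f_se = k_se(t_new, t) @ np.linalg.solve(K_se, y)
```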
Digression: back to the solution
Let's call the pseudoinverse Ψ⁺. The proposed solution is

  Ψ⁺ y = w*,   Ψ w* = ΨΨ⁺ y = y   →   y(t) = Ψ(t) w*.

The left-hand side of the arrow is the interpolation; the right-hand side is the intra- or extrapolation.
But with any random vector ξ we have

  ŵ* = w* + (I − Ψ⁺Ψ) ξ,

where (I − Ψ⁺Ψ) maps into null Ψ, and

  Ψ ŵ* = y + (Ψ − ΨΨ⁺Ψ) ξ = y,   since ΨΨ⁺ = I.

So ŵ* also solves the interpolation problem. There are many solutions! (Unless Ψ⁺Ψ = I, i.e. null Ψ = {0}, i.e. an invertible matrix: not our case.)
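A quick numerical check of this non-uniqueness, continuing the sketch from above: perturbing w* inside the null space of Ψ leaves the fit to the data unchanged.

```python
rng = np.random.default_rng(0)

Psi_pinv = np.linalg.pinv(Psi)
P_null = np.eye(Psi.shape[1]) - Psi_pinv @ Psi    # projector onto null(Psi)

xi = rng.normal(size=Psi.shape[1])                # any random vector
w_hat = w_star + P_null @ xi

# w_hat is a different weight vector, yet it still interpolates the data.
assert not np.allclose(w_hat, w_star)
assert np.allclose(Psi @ w_hat, y)
```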
Digression: back to the solution
[Figure: the data points together with several interpolants obtained from different weight vectors w* + (I − Ψ⁺Ψ) ξ; all of them pass through the observations.]
Gaussian Process
Thank you!