Gaussian Process Regression
An intuitive introduction
Juan Pablo Carbajal
Siedlungswasserwirtschaft
Eawag - aquatic research
Dübendorf, Switzerland
juanpablo.carbajal@eawag.ch
November 24, 2017
The learning problem in a nutshell
[Figure: scatter plot of the data points (tᵢ, yᵢ), t on the horizontal axis, y on the vertical axis.]
Data given: (tᵢ, yᵢ) = (t, y). What model to use?
The learning problem
To use a set of observations to uncover an underlying process, for prediction (and maybe for understanding).
Yaser Abu-Mostafa. Learning from data. https://work.caltech.edu/telecourse.html
The learning problem
Input: x, a position on a map
Output: y, the height of the terrain
Target function: f : X → Y, the height map
Data: (x₁, y₁), …, (xₙ, yₙ), field measurements
Hypothesis: g : X → Y, the formula to be used
The learning problem
[Diagram: an unknown target function (how legal-like is the text?) generates the training examples (available text snippets); a learning algorithm searches a hypothesis set (possible text classification functions) and returns the final hypothesis (the final classification function).]
Data set
[Figure: the same scatter plot of the data points (tᵢ, yᵢ).]
Data given: (tᵢ, yᵢ) = (t, y). What model to use?
Naive regression
[Figure: the data points with a cubic polynomial fit drawn through them.]
Model:

  y(t) = w₀ + w₁ t + w₂ t² + w₃ t³,

or in vector form, with the feature row φ(t) = [1  t  t²  t³],

  φ(t) w = y(t),   w = (w₀, w₁, w₂, w₃)ᵀ.

Evaluated at the data points this stacks into a linear system,

  Φ(t) w = [φ(t₁); φ(t₂); φ(t₃)] w = y,

a 3×4 matrix Φ with rows [1  tᵢ  tᵢ²  tᵢ³] acting on w, with right-hand side y = (y₁, y₂, y₃)ᵀ.
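As a concrete sketch (not part of the original slides), the design matrix of this naive regression can be built with numpy; the helper phi and the data values below are my own, invented for illustration.

```python
import numpy as np

# Hypothetical data: n = 3 observations (t_i, y_i), values invented for the example.
t = np.array([0.2, 0.5, 0.8])
y = np.array([-1.0, 1.5, -0.5])

def phi(t, N=4):
    """Polynomial feature rows [1, t, t^2, ..., t^(N-1)]."""
    return np.vander(np.atleast_1d(t), N, increasing=True)

Phi = phi(t)                            # n x N design matrix, rows phi(t_i)
print(Phi.shape)                        # (3, 4): more unknowns than equations
print(np.linalg.matrix_rank(Phi))       # 3: the rows are linearly independent
```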
Pseudo-inverse
Φ is an n×N (3×4) matrix with N ≥ n, so rank Φ ≤ n.
With a feature vector φ complex enough we have rank Φ = n, i.e. the n row vectors of the matrix are linearly independent and (ΦΦᵀ)⁻¹ exists.
ΦΦᵀ is called the Gramian matrix: the matrix of all scalar products.

  Φw = y   →   ΦΦᵀ(ΦΦᵀ)⁻¹ Φw = Φ [Φᵀ(ΦΦᵀ)⁻¹ Φw] = y,   since ΦΦᵀ(ΦΦᵀ)⁻¹ = I,

  Φᵀ(ΦΦᵀ)⁻¹ Φw = Φᵀ(ΦΦᵀ)⁻¹ y = w*,

where Φᵀ(ΦΦᵀ)⁻¹ is the Moore–Penrose pseudoinverse.
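A small numpy check of this construction (continuing the hypothetical example above): the explicit formula Φᵀ(ΦΦᵀ)⁻¹y agrees with numpy's Moore–Penrose pseudoinverse and reproduces the data exactly.

```python
# Minimum-norm weights via the explicit right pseudoinverse.
G = Phi @ Phi.T                        # n x n Gramian of the feature vectors
w_star = Phi.T @ np.linalg.solve(G, y)

# Same result through numpy's Moore-Penrose pseudoinverse.
w_pinv = np.linalg.pinv(Phi) @ y
assert np.allclose(w_star, w_pinv)

# The weights interpolate the data: Phi w* = y.
assert np.allclose(Phi @ w_star, y)
```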
A change of perspective
Instead of looking at the rows of Φ, look at the columns. These are linearly independent functions ψᵢ(t) = tⁱ evaluated at the data points. The model looks like

  y(t) = Σ_{i=0}^{N−1} ψᵢ(t) wᵢ

and the regression problem now looks like

  Ψ(t) w = [ψ₀(t)  ψ₁(t)  ψ₂(t)  ψ₃(t)] w = y.

Note that, evaluated at the data points, Ψ = Φ.
A change of perspective
Ψ is an n×N (3×4) matrix with N ≥ n, so rank Ψ ≤ n.
If the column vectors of the matrix span Rⁿ (i.e. rank Ψ = n), then (ΨΨᵀ)⁻¹ exists.
K = ΨΨᵀ is called the covariance matrix: Kᵢⱼ = Σ_{k=0}^{N−1} ψₖ(tᵢ) ψₖ(tⱼ).

  Ψw = y   →   ΨΨᵀ(ΨΨᵀ)⁻¹ Ψw = Ψ [Ψᵀ(ΨΨᵀ)⁻¹ Ψw] = y,   since ΨΨᵀ(ΨΨᵀ)⁻¹ = I,

  Ψᵀ(ΨΨᵀ)⁻¹ Ψw = Ψᵀ(ΨΨᵀ)⁻¹ y = w*,

where Ψᵀ(ΨΨᵀ)⁻¹ is the Moore–Penrose pseudoinverse.
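In code the column view is the same matrix read the other way (a sketch continuing the example above): the covariance matrix collects scalar products between data points instead of between features.

```python
# For this polynomial example the matrix of basis functions evaluated at the
# data equals the design matrix: Psi_{ik} = psi_k(t_i).
Psi = Phi

# Covariance matrix K_ij = sum_k psi_k(t_i) psi_k(t_j)  (n x n).
K = Psi @ Psi.T

# The same minimum-norm weights, now written through K.
w_star_K = Psi.T @ np.linalg.solve(K, y)
assert np.allclose(w_star_K, w_star)
```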
Recapitulation: the problem
Given n examples {(tᵢ, yᵢ)}, propose a model using N ≥ n linearly independent functions (a.k.a. features),

  f(t) = Σᵢ ψᵢ(t) wᵢ,

and find some good {wᵢ}.

Hansen, Per Christian. Rank-deficient and discrete ill-posed problems: numerical aspects of linear inversion. Vol. 4. SIAM, 1998.
Wendland, Holger. Scattered data approximation. Vol. 17. Cambridge University Press, 2004.
Recapitulation: the solution

Data view
Think in terms of n feature vectors φⱼ in a (high-dimensional) space R^N,

  φⱼ = (ψ₁(tⱼ), …, ψ_N(tⱼ)),   j = 1, …, n.

The solution reads

  f(t) = Φ(t) w* = Φ(t)Φᵀ (ΦΦᵀ)⁻¹ y,

where both Φ(t)Φᵀ and ΦΦᵀ are matrices of scalar products.

Function view
Think in terms of an N-dimensional function space H spanned by the ψᵢ(t). The solution reads

  f(t) = Ψ(t) w* = Ψ(t)Ψᵀ (ΨΨᵀ)⁻¹ y = k(t, t) k(t, t)⁻¹ y,

where both Ψ(t)Ψᵀ and ΨΨᵀ are covariances; the first k(t, t) is the vector of covariances between the test point t and the training points, and the second, k(t, t) = K, is the n×n covariance matrix of the training points.
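The function view turns directly into a prediction rule that never forms the weights explicitly; a sketch on a hypothetical evaluation grid, continuing the example above:

```python
# Predict on new inputs without computing w: f(t) = k(t, T) K^{-1} y,
# where T are the training inputs and k is built from the same features.
t_new = np.linspace(0.0, 1.0, 50)

k_new = phi(t_new) @ Psi.T             # 50 x n cross-covariances k(t_new, t_i)
f_new = k_new @ np.linalg.solve(K, y)

# At the training inputs the prediction reproduces the observations.
assert np.allclose(phi(t) @ Psi.T @ np.linalg.solve(K, y), y)
```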
The kernel trick
To calculate the solutions we only need scalar products or covariances: we never use the actual {φᵢ} or {ψᵢ},

  cov_Ψ(t, t′) = k(t, t′) = Φ(t) · Φ(t′).

Infinite features
Now we can use N = ∞, i.e. infinitely many features or basis functions!
By selecting valid covariance functions we implicitly select the features of our model.
How to choose the covariance function? Prior knowledge about the solution.

Rasmussen, C., & Williams, C. (2006). Gaussian Processes for Machine Learning. http://www.gaussianprocess.org/gpml/
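As an illustration of working with a covariance function directly (not from the slides), the squared-exponential kernel, which corresponds to infinitely many basis functions, drops into the same prediction formula; the length-scale below is an arbitrary choice for the sketch.

```python
def k_se(a, b, ell=0.2):
    """Squared-exponential covariance k(a, b) = exp(-(a - b)^2 / (2 ell^2))."""
    d = np.subtract.outer(a, b)
    return np.exp(-0.5 * (d / ell) ** 2)

# Same recipe, different covariance: f(t) = k(t, T) k(T, T)^{-1} y.
K_se = k_se(t, t)
f_se = k_se(t_new, t) @ np.linalg.solve(K_se, y)
```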
Digression: back to the solution
Let's call the pseudoinverse Ψ⁺. The proposed solution is

  Ψ⁺ y = w*,   Ψ w* = ΨΨ⁺ y = y   →   y(t) = Ψ(t) w*.

The left-hand side of the arrow is the interpolation; the right-hand side is the intra- or extrapolation.
But with any random vector ξ we have

  ŵ* = w* + (I − Ψ⁺Ψ) ξ,

where (I − Ψ⁺Ψ) maps into null Ψ, and

  Ψ ŵ* = y + (Ψ − ΨΨ⁺Ψ) ξ = y,   since ΨΨ⁺ = I.

So ŵ* also solves the interpolation problem. There are many solutions! (Unless Ψ⁺Ψ = I, i.e. null Ψ = {0}, i.e. an invertible matrix: not our case.)
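A quick numerical check of this non-uniqueness, continuing the sketch from above: perturbing w* inside the null space of Ψ leaves the fit to the data unchanged.

```python
rng = np.random.default_rng(0)

Psi_pinv = np.linalg.pinv(Psi)
P_null = np.eye(Psi.shape[1]) - Psi_pinv @ Psi    # projector onto null(Psi)

xi = rng.normal(size=Psi.shape[1])                # any random vector
w_hat = w_star + P_null @ xi

# w_hat is a different weight vector, yet it still interpolates the data.
assert not np.allclose(w_hat, w_star)
assert np.allclose(Psi @ w_hat, y)
```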
Digression: back to the solution
[Figure: the data points together with several interpolants obtained from different weight vectors w* + (I − Ψ⁺Ψ) ξ; all of them pass through the observations.]
Gaussian Process
Thank you!