Gaussian Process Regression
Random Process
A random process $X_t$ is completely characterized if the following is known:
$P((X_{t_1}, \cdots, X_{t_k}) \in B)$ for any $B$, $k$, and $t_1, \cdots, t_k$.
A random process (RP) (or stochastic process) is an infinite indexed collection
of random variables {𝑋(𝑡) ∶ 𝑡 ∈ 𝑇 }, defined over a common probability space.
(Functions are infinite-dimensional vectors.)
Note that, given a random process, only 'finite-dimensional' probabilities or probability functions can be specified.
For time $t \in T$ and the outcome of the underlying random experiment $\omega \in \Omega$, a random process is a map $X: T \times \Omega \to \mathbb{R}$.
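To make the $T \times \Omega \to \mathbb{R}$ view concrete, here is a minimal sketch (my own illustration, not from the slides): a random-phase sinusoid, where fixing $\omega$ gives a sample path over $T$ and fixing $t$ gives a random variable.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_process(t, omega):
    """X(t, omega): a random-phase sinusoid, viewed as a map T x Omega -> R."""
    phase = 2 * np.pi * omega          # omega encodes the underlying random experiment
    return np.sin(t + phase)

t_grid = np.linspace(0.0, 10.0, 200)   # finitely many time indices
omegas = rng.uniform(size=3)           # three outcomes of the experiment

# Fixing omega gives one sample path X(., omega); fixing t gives a random variable X(t, .).
paths = np.array([random_process(t_grid, w) for w in omegas])
print(paths.shape)                     # (3, 200): 3 sample paths evaluated at 200 times
```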
Gaussian Process (Background)
Gaussian Process
A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.
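A small illustration of this definition (the RBF kernel, its length scale, and all names here are my own assumptions): pick any finite set of inputs, build their covariance matrix, and sample the corresponding function values from the joint Gaussian.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)) for 1-D inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)              # any finite set of input points
K = rbf_kernel(x, x) + 1e-8 * np.eye(50)    # small jitter for numerical stability

# By the definition above, (f(x_1), ..., f(x_50)) is jointly Gaussian:
samples = rng.multivariate_normal(mean=np.zeros(50), cov=K, size=3)
print(samples.shape)                        # (3, 50): three draws from the GP prior
```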
Gaussian Process
* 'Multivariate distribution' and 'joint distribution' are essentially synonyms.
Gaussian Process
A Gaussian process and Gaussian process regression are different things.
Gaussian process regression is a nonparametric Bayesian regression method that uses the properties of Gaussian processes.
Two views to interpret Gaussian process regression
• Weight-space view
• Function-space view
MLE vs MAP
Linear regression: $f(x) = w^T x$
Goal of linear regression:
minimize $\| y - f(x) \|^2$
solution: $w = (X X^T)^{-1} X y$
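As a sanity check of the closed form above (with the slide's convention that the columns of $X$ are data points), here is a minimal NumPy sketch; the data and true weights are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 100
X = rng.normal(size=(d, n))               # columns are data points, matching w = (X X^T)^{-1} X y
w_true = np.array([1.0, -2.0, 0.5])
y = X.T @ w_true + 0.1 * rng.normal(size=n)

w_mle = np.linalg.solve(X @ X.T, X @ y)   # least-squares / MLE solution
print(w_mle)                              # approximately recovers w_true
```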
MLE vs MAP
Another perspective on Bayesian linear regression: ridge regularization.
MAP estimation with a Gaussian prior on $w$ is equivalent to least squares with an L2 (ridge) penalty.
MLE vs MAP
Return to the Bayesian solution: the mean value of the prediction is $x^T w_{\mathrm{MAP}}$.
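Under a Gaussian prior $w \sim N(0, A^{-1} I)$ and Gaussian noise, the MAP estimate coincides with the ridge solution, and the prediction at a new $x$ is $x^T w_{\mathrm{MAP}}$. A hedged sketch (the ridge strength and the data are my own assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 100
X = rng.normal(size=(d, n))                       # columns are data points
y = X.T @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

lam = 1.0                                         # ridge strength (noise variance / prior variance)
w_map = np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)

x_new = rng.normal(size=d)
print(w_map, x_new @ w_map)                       # MAP weights and the predictive mean x^T w_MAP
```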
Gaussian Process regression
• Weight Space View
• Function Space View
Weight Space View
$Y = \phi(x)^T w$, where $w \sim N(0, A^{-1} I)$
($I$ means there is no collinearity among the covariates of $w$.)

$E[Y] = E[\phi(x)^T w] = \phi(x)^T E[w] = 0$
$\mathrm{Cov}(Y) = E[(Y - 0)(Y - 0)^T] = E[Y Y^T] = \phi(x)^T E[w w^T] \phi(x) = \phi(x)^T A^{-1} \phi(x)$

$k(x_i, x_j) = \exp(-\| x_i - x_j \|^2)$

$k(X^T, X) = \phi(x)^T \phi(x) = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}$

We define $K = \phi(x)^T A^{-1} \phi(x)$, so $P(Y) = N(Y \mid 0, K)$.

What do we get if a new data point $x_*$ appears?
$P(y_* \mid x_*, X, Y) = N(y_* \mid\ ?,\ ?)$
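A short sketch of the weight-space construction above: an explicit feature map (the radial-basis features and their centers are my own choices) combined with $w \sim N(0, A^{-1} I)$ induces $K = \phi(x)^T A^{-1} \phi(x)$ as the covariance of $Y$.

```python
import numpy as np

def phi(x, centers, width=1.0):
    """An explicit feature map phi(x): radial basis features at fixed centers (my own choice)."""
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 20)
centers = np.linspace(-3.0, 3.0, 7)
A_inv = 1.0                                  # prior covariance of w is A^{-1} I

Phi = phi(x, centers)                        # shape (n, m): row i is phi(x_i)^T
K = A_inv * Phi @ Phi.T                      # K = phi(x)^T A^{-1} phi(x), the covariance of Y

# Y = Phi @ w with w ~ N(0, A^{-1} I) is distributed as N(0, K):
w = rng.normal(scale=np.sqrt(A_inv), size=centers.shape)
Y = Phi @ w
print(K.shape, Y.shape)                      # (20, 20) (20,)
```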
Weight Space View
What do we get if a new data point $x_*$ appears?
$P(y_* \mid x_*, X, Y) = N(y_* \mid\ ?,\ ?)$
What are the random variables? We need $P(y_* \mid Y)$.
Weight Space View
$\Sigma_{y_*, Y} = E[y_* Y^T] = \phi(x_*)^T A^{-1} \phi(X) = C$
$\Sigma_{y_*, y_*} = E[y_* y_*^T] = \phi(x_*)^T A^{-1} \phi(x_*) = k$

$P(y_* \mid Y) = P(\phi(x_*)^T w \mid \phi(X)^T w)$
$= N\big(y_* \mid \mu_{y_*} + \Sigma_{y_*, Y} \Sigma_{Y,Y}^{-1} (Y - \mu_Y),\ \Sigma_{y_*, y_*} - \Sigma_{y_*, Y} \Sigma_{Y,Y}^{-1} \Sigma_{Y, y_*}\big)$
$= N(y_* \mid C K^{-1} Y,\ k - C K^{-1} C^T)$

$\mathrm{Var}(Y_*)$? With $Y_* = [y_1, \cdots, y_n, y_{n+1}]^T$:
$\mathrm{Var}(Y_*) = \begin{pmatrix} \mathrm{cov}_n & C^T \\ C & k \end{pmatrix}$
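The predictive formula $N(y_* \mid C K^{-1} Y,\ k - C K^{-1} C^T)$ can be implemented directly once $K$, $C$, and $k$ are in hand; here is a sketch under the noise-free model, using the slide's kernel (the training data are made up for the demo).

```python
import numpy as np

def kernel(xa, xb):
    """k(x_i, x_j) = exp(-||x_i - x_j||^2), as on the slide (1-D inputs)."""
    return np.exp(-(xa[:, None] - xb[None, :]) ** 2)

X = np.linspace(-3.0, 3.0, 10)               # training inputs
Y = np.sin(X)                                # noise-free training targets (made up)
x_star = np.array([0.5])                     # a new input

K = kernel(X, X) + 1e-8 * np.eye(len(X))     # jitter so K is invertible
C = kernel(x_star, X)                        # Sigma_{y*,Y}, shape (1, n)
k = kernel(x_star, x_star)                   # Sigma_{y*,y*}, shape (1, 1)

mean = C @ np.linalg.solve(K, Y)             # C K^{-1} Y
var = k - C @ np.linalg.solve(K, C.T)        # k - C K^{-1} C^T
print(mean, var)
```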
Weight Space View
What if $Y = \phi(x)^T w + \varepsilon$, with $w \sim N(0, A^{-1} I)$ and $\varepsilon \sim N(0, B^{-1} I)$?

$\mathrm{Cov}(Y) = E[(Y - 0)(Y - 0)^T] = E[Y Y^T] = \phi(x)^T E[w w^T] \phi(x) + E[2 \varepsilon \phi(x)^T w + \varepsilon \varepsilon^T]$
$= \phi(x)^T A^{-1} \phi(x) + B^{-1} I = K + B^{-1} I$
(The cross term vanishes because $\varepsilon$ and $w$ are independent and zero-mean.)
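With observation noise, the only change to the prediction is that the training covariance $K$ becomes $K + B^{-1} I$; a brief sketch of the modified prediction (the noise level and data are my own assumptions).

```python
import numpy as np

def kernel(xa, xb):
    return np.exp(-(xa[:, None] - xb[None, :]) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 10)
Y = np.sin(X) + 0.1 * rng.normal(size=X.shape)   # noisy observations
x_star = np.array([0.5])

B_inv = 0.1 ** 2                                 # noise variance B^{-1}
K_noisy = kernel(X, X) + B_inv * np.eye(len(X))  # K + B^{-1} I replaces K
C = kernel(x_star, X)
k = kernel(x_star, x_star)

mean = C @ np.linalg.solve(K_noisy, Y)           # predictive mean
var = k - C @ np.linalg.solve(K_noisy, C.T)      # predictive variance
print(mean, var)
```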
Function Space View
Amazing properties of the non-parametric method
References
C. E. Rasmussen and C. K. Williams. Gaussian Processes for Machine Learning, Vol. 1. MIT Press, Cambridge, 2006.
"Gaussian Process", lectured by Professor Il-Chul Moon. Video: https://youtu.be/RmN54ykspK4
Ian Goodfellow et al. Deep Learning, 2016.
Trevor Hastie et al. The Elements of Statistical Learning, 2001.
Machine Learning Lecture 26, "Gaussian Processes", Cornell CS4780 SP17, by Kilian Weinberger. Video: https://www.youtube.com/watch?v=R-NUdqxKjos&t=1000s
9.520/6.860S Statistical Learning Theory, by Lorenzo Rosasco. Slides: http://www.mit.edu/~9.520/fall14/slides/class03/class03_rkhsPart1.pdf Video: https://www.youtube.com/watch?v=9-oxo_k69qs
Bayesian Deep Learning, by Sungjoon Choi. Video: https://www.edwith.org/bayesiandeeplearning/joinLectures/14426