STATISTICAL PHYSICS MODELLING
OF MACHINE LEARNING
Lenka Zdeborová
(IPhT; CEA Saclay & CNRS)
WiMLDS meetup, November 29, 2018
Long history of physics influencing machine learning.
Examples:
Gibbs-Bogoliubov-Feynman’60s - the physics behind variational
inference.
Hopfield model’82. Spin glass models of neural networks Amit,
Gutfreund, Sompolinsky’85.
Boltzmann machine Hinton, Sejnowski’86 - named after the
Boltzmann distribution.
Gardner’87 - Maximum storage capacity in neural networks
(related to VC dimension).
SVMs by Boser, Guyon, Vapnik’92 were inspired by Krauth, Mézard’87.
Many papers on neural networks in physics in 80s and 90s.
PHYSICS IN MACHINE
LEARNING
Les Houches, 1985
THE PUZZLE OF GENERALIZATION
According to PAC bounds (via VC dimension or Rademacher
complexity), neural networks that generalize well should not be
able to fit random labels. Yet in practice they fit them perfectly
(Zhang et al., ICLR’16).
THEORETICAL QUESTIONS
IN DEEP LEARNING
Why the lack of overfitting?
The intuition “more parameters = more overfitting”
does not seem to hold in deep learning.
SAMPLE COMPLEXITY
CIFAR-10: 50,000 training samples.
How many samples are really needed?
How low is the optimal sample complexity? Are we achieving it?
If not, is it because of the architectures or the algorithms?
AND THE PHYSICS HERE?
THEORETICAL-PHYSICS
ROADMAP
1. Experimental observation or fundamental hypothesis.
2. Unreasonably simple model for which toughest questions
can be understood mathematically.
3. Generalize to more realistic models, relying on universality
(= important laws of nature rarely depend on many details).
MODELS
H = −J ∑_{(ij)∈E} S_i S_j,  P({S_i}_{i=1,…,N}) = e^{−H} / Z
(the Ising model: magnetism of materials)
In data science, models are used to fit the data. (e.g. linear
regression: What is the best straight line that captures the
dependence of y on x?)
In physics, models are the main tool for understanding.
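The Ising-model distribution above can be sampled in a few lines of Metropolis dynamics; a minimal numpy sketch, where the 2-D lattice size, the coupling J = 1, and the explicit inverse temperature β (absorbed into H on the slide) are illustrative choices, not from the slides:

```python
import numpy as np

def metropolis_ising(L=16, beta=0.6, J=1.0, steps=20000, seed=0):
    """Sample spins S_i in {-1,+1} on an L x L periodic lattice from
    P({S_i}) ∝ exp(-beta * H), with H = -J * sum over edges of S_i S_j."""
    rng = np.random.default_rng(seed)
    S = rng.choice([-1, 1], size=(L, L))
    for _ in range(steps):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbours (periodic boundaries).
        nb = (S[(i + 1) % L, j] + S[(i - 1) % L, j]
              + S[i, (j + 1) % L] + S[i, (j - 1) % L])
        dE = 2.0 * J * S[i, j] * nb  # energy change if spin (i, j) flips
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            S[i, j] = -S[i, j]
    return S

S = metropolis_ising()
print(abs(S.mean()))  # magnetisation, large at low temperature (high beta)
```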
MODELS
P({S_i}_{i=1,…,N}) = e^{−H} / Z,  H = −∑_{(ijk)∈E} J_{ijk} S_i S_j S_k,  J_{ijk} ∼ 𝒩(0, 1)
(the p-spin model: a model of the glass transition)
IS THIS USEFUL IN MACHINE LEARNING?
Example: Single layer neural network = generalized linear regression.
Given (X, y), find w such that

y_μ = φ( ∑_{i=1}^{p} X_{μi} w_i ),  μ = 1,…,n,  i = 1,…,p

where X are the data (n rows, p columns), y the labels, w the weights,
and φ a (noisy) activation function.
TEACHER-STUDENT MODEL
Take X_{μi} random i.i.d. Gaussian and w*_i random i.i.d. from P_w.
Create y_μ = φ( ∑_{i=1}^{p} X_{μi} w*_i ).
Goal: Compute the best possible generalisation error achievable
with n samples of dimension p.
High-dimensional regime: p → ∞, n → ∞, n/p = Ω(1).
Gardner, Derrida’89, Gyorgyi’90
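The teacher-student setup above is easy to simulate; a hedged numpy sketch with φ = sign, Gaussian prior P_w, and a naive least-squares student as a baseline (the dimensions and the least-squares estimator are illustrative choices, not the optimal estimator discussed in the talk):

```python
import numpy as np

def teacher_student_data(n, p, seed=0):
    """Teacher-student GLM: X iid Gaussian, w* iid from P_w = N(0,1),
    labels y = sign(X w*)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    w_star = rng.standard_normal(p)  # teacher weights, drawn from P_w
    y = np.sign(X @ w_star)
    return X, y, w_star

def generalisation_error(w_hat, w_star, n_test=10000, seed=1):
    """Fraction of sign mismatches on fresh data from the same teacher."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_test, len(w_star)))
    return np.mean(np.sign(X @ w_hat) != np.sign(X @ w_star))

X, y, w_star = teacher_student_data(n=400, p=100)      # n/p = 4
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]            # naive student
print(generalisation_error(w_ls, w_star))
```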
What did we win? The posterior P(w|X, y) is tractable with the replica
and cavity methods, developed in the theory of spin glasses.
NEW W.R.T. 1990
Optimal generalisation error for any non-linearity
and prior on the weights.
Proof of the replica formula for the optimal
generalisation error.
Approximate message passing provably reaching the
optimal generalisation error (outside the hard region).
Barbier, Krzakala, Macris, Miolane, LZ, COLT’18, arXiv:1708.03395
LEARNING CURVES
[Figure: generalisation error vs. # of samples per dimension n/p,
for φ(z) = sign(z), P_w = 𝒩(0, 1), in the limit p → ∞, n → ∞,
n/p = Ω(1); curves: optimal, AMP algorithm, logistic regression.]
PHASE TRANSITIONS
[Figure: generalisation error vs. # of samples per dimension n/p,
for binary weights w_i ∈ {−1, +1}, φ(z) = sign(z), p → ∞, n → ∞,
n/p = Ω(1); curves: optimal, optimal & achievable, AMP algorithm,
logistic regression, with a marked “hard” region where AMP does not
reach the optimal error.]
HARD REGIME
INCLUDING HIDDEN VARIABLES
[Figure: two-layer network with p input units, K hidden units, and one
output unit (L = 3 layers); data X, labels y, n training samples;
first-layer weights w are learned, second-layer weights v1 & v2 are fixed.]
Committee machine: model from Schwarze’92.
Limit: K = O(1), p → ∞, n → ∞, α = n/p = Ω(1).
Proof of the replica formula, and approximate message passing:
Aubin, Maillard, Barbier, Macris, Krzakala, LZ, spotlight at NeurIPS’18.
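The committee machine above can be written down directly; a minimal numpy sketch in which the fixed second-layer weights are taken as all ones, i.e. a majority vote of the hidden units (an assumption consistent with the K-unit formula on the following slides, not a detail stated on this one):

```python
import numpy as np

def committee_machine(X, W):
    """Two-layer committee machine: p inputs, K sign hidden units,
    output = sign of the sum of hidden activations (second layer fixed).
    X: (n, p) data; W: (p, K) first-layer weights (the learned ones)."""
    hidden = np.sign(X @ W)            # (n, K) hidden-unit outputs
    return np.sign(hidden.sum(axis=1))

rng = np.random.default_rng(0)
p, K, n = 50, 3, 10
W_star = rng.standard_normal((p, K))   # teacher's first-layer weights
X = rng.standard_normal((n, p))
y = committee_machine(X, W_star)
print(y)  # labels in {-1, 0, +1}; 0 only on ties (cf. sign(0) = 0)
```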
PHASE TRANSITIONS
Specialization phase transition = hidden units specialise to
correlate with specific features.
K = 2, with the convention sign(0) = 0:

y_μ = sign[ sign(∑_i X_{μi} w_{i,1}) + sign(∑_i X_{μi} w_{i,2}) ]

[Figure: generalisation error ε_g(α) and overlaps q_{00}, q_{01} as
functions of α ∈ [0, 4], from state evolution (SE) and AMP; the
specialization transition is marked.]
PHASE TRANSITIONS
For K ≫ 1 hidden units,

y_μ = sign[ ∑_{a=1}^{K} sign(∑_i X_{μi} w_{i,a}) ],

there is a large algorithmic gap:
IT threshold: n > 7.65 K p
Algorithmic threshold: n > const · K² p
[Figure: generalisation error ε_g(α) vs. α = (# of samples)/(# hidden
units × input size); Bayes-optimal and AMP curves; non-specialized vs.
specialized hidden units; a discontinuous specialization transition
and a computational gap.]
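The gap between the two thresholds grows linearly with K; a quick numeric illustration (the unspecified constant in the algorithmic threshold is set to 1 purely to show the scaling, not its true value):

```python
# IT vs. (conjectured) algorithmic sample thresholds for the K-unit
# committee machine, per the slide: n > 7.65*K*p and n > const*K^2*p.

def it_threshold(K, p):
    return 7.65 * K * p

def algo_threshold(K, p, const=1.0):
    # const = 1 is an assumption for illustration only.
    return const * K**2 * p

p = 1000
for K in [2, 10, 100]:
    n_it, n_algo = it_threshold(K, p), algo_threshold(K, p)
    print(K, n_it, n_algo, n_algo / n_it)  # the ratio grows like K
```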
[Diagram: as the # of samples grows, a good generalisation error goes
from impossible, to hard, to doable; only part of the doable region is
doable today.]
Our goal: Quantify this in more realistic models.
Design algorithms working in the doable region.
REFERENCES
LZ, F. Krzakala, Statistical physics of inference: Thresholds and algorithms,
Advances in Physics (2016), arXiv:1511.02476.
J. Barbier, N. Macris, L. Miolane, F. Krzakala, LZ, Phase Transitions, Optimal
Errors and Optimality of Message-Passing in Generalized Linear Models,
COLT’18, arXiv:1708.03395.
B. Aubin, A. Maillard, J. Barbier, F. Krzakala, N. Macris, LZ, The committee
machine: Computational to statistical gaps in learning a two-layers neural
network, NeurIPS’18, arXiv:1806.05451.
Thank you for your attention!
