A Theory of the Learnable
Leslie Valiant
Dhruv Gairola
Computational Complexity, Michael Soltys
gairold@mcmaster.ca ; dhruvgairola.blogspot.ca

November 13, 2013

Overview

1. Learning
2. Contribution
3. PAC learning
   - Sample complexity
   - Boolean functions
   - k-decision lists
4. Conclusion

Learning

Humans can learn.
Machine learning (ML): learning from data; knowledge acquisition without explicit programming.
Explore computational models for learning.
Use the models to gain insights about learning.
Use the models to develop new learning algorithms.

Modelling supervised learning

Given a training set of labelled examples, a learning algorithm generates a
hypothesis (a candidate function). Run the hypothesis on a test set to check
how good it is.
But how good, really? Maybe the training and test data consist of
unrepresentative examples, so the hypothesis doesn't generalize well.
Insight: introduce probabilities to measure the degree of certainty and
correctness.

Contribution

With high probability, an (efficient) learning algorithm will find a
hypothesis that is approximately identical to the hidden target
function.
Intuition: a hypothesis consistent with a large amount of training data is
unlikely to be very wrong, i.e., it is probably approximately correct (PAC).

PAC learning

Goal: show that, after training, with high probability any hypothesis
consistent with the training data will be approximately correct.
Notation:
X : set of all possible examples
D : distribution from which examples are drawn
H : set of all possible hypotheses
N : |X_training|, the size of the training set
f : target function

PAC learning (2)

Hypothesis h_g ∈ H is approximately correct if:
error(h_g) ≤ ε, where
error(h) = P(h(x) ≠ f(x) | x drawn from D)
and ε is a given (small) error tolerance.

Bad hypothesis:
error(h_b) > ε, so
P(h_b disagrees with a single example) > ε
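
As a minimal illustration (not from the slides), error(h) can be estimated
empirically by sampling; h, f, and sample_from_D below are hypothetical
placeholders for the hypothesis, the target function, and a sampler for D:

```python
def estimate_error(h, f, sample_from_D, m=10_000):
    """Monte Carlo estimate of error(h) = P(h(x) != f(x)) for x drawn from D."""
    disagreements = 0
    for _ in range(m):
        x = sample_from_D()   # draw one example from D
        if h(x) != f(x):      # hypothesis disagrees with the target
            disagreements += 1
    return disagreements / m
```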

PAC learning (3)

P(h_b agrees with 1 example) ≤ (1 − ε).
P(h_b agrees with N examples) ≤ (1 − ε)^N.
P(H_b contains a hypothesis consistent with all N examples) ≤ |H_b|(1 − ε)^N ≤ |H|(1 − ε)^N.
Let's say |H|(1 − ε)^N ≤ δ.
Since (1 − ε)^N ≤ e^(−εN), solving for N gives:
N ≥ (1/ε)(ln(1/δ) + ln |H|)
This expresses the sample complexity.
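
As a sketch of how this bound could be evaluated numerically (the helper name
and its ln_H parameter, the natural log of |H|, are our own):

```python
import math

def sample_complexity(epsilon, delta, ln_H):
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|)."""
    return math.ceil((1.0 / epsilon) * (math.log(1.0 / delta) + ln_H))

# e.g., |H| = 2^20, epsilon = 0.1, delta = 0.05:
# sample_complexity(0.1, 0.05, 20 * math.log(2))  ->  169 examples
```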

Sample complexity

N ≥ (1/ε)(ln(1/δ) + ln |H|)

If you train the learning algorithm on a training set X_training of size at
least N, then the returned (consistent) hypothesis is PAC: with probability
at least (1 − δ), its error is at most ε.
E.g., if you want a smaller δ (higher confidence), you need a larger N (more
examples).
Let's look at an example of H: boolean functions.

Why boolean functions?

Because boolean functions can represent concepts, which is what we
commonly want machines to learn.
Concepts are predicates, e.g., isMaleOrFemale(height).

Boolean functions

Boolean functions are of the form f : {0, 1}^n → {0, 1}, where n is the
number of literals (input variables).

Let H = {all boolean functions on n literals} ∴ |H| = 2^(2^n).

Substituting this H into the sample complexity expression gives N = O(2^n)
(see the numeric sketch below), i.e., the class of all boolean functions is
not PAC-learnable.
Can we restrict the size of H?
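
To make the blow-up concrete, here is a small numeric sketch that plugs
ln |H| = 2^n · ln 2 into the hypothetical sample_complexity helper sketched
earlier:

```python
import math

for n in (5, 10, 20):
    ln_H = (2 ** n) * math.log(2)   # ln|H| for all boolean functions on n literals
    print(n, sample_complexity(0.1, 0.05, ln_H))
# n = 5 -> 252 examples; n = 10 -> 7,128; n = 20 -> ~7.3 million: exponential in n.
```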

k-decision lists

A decision list (DL) is a representation of a single boolean function;
unrestricted DLs are not PAC-learnable either.
A single DL consists of a series of tests, e.g.:
if f_1 then return b_1; elseif f_2 then return b_2; ... elseif f_n then return b_n;
A single DL corresponds to a single hypothesis.
Apply a restriction: a k-decision list (k-DL) is a decision list where each
test is a conjunction of at most k literals. (A sketch of evaluating such a
list follows below.)
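
A minimal sketch of one possible representation and its evaluation (the
encoding is an assumption, not from the slides): each test is a list of
literals (i, v) meaning "x[i] == v", and the list falls through to a default
bit:

```python
def evaluate_dl(decision_list, default, x):
    """Evaluate a decision list on example x (a tuple of 0/1 values).

    decision_list: list of (test, b) pairs; test is a list of literals (i, v),
    and b is the bit returned when every literal in the test is satisfied.
    """
    for test, b in decision_list:
        if all(x[i] == v for i, v in test):   # conjunction of literals holds
            return b
    return default

# e.g., the 2-decision list "if x0 and not x2 then 1; elif x1 then 0; else 1":
dl = [([(0, 1), (2, 0)], 1), ([(1, 1)], 0)]
print(evaluate_dl(dl, 1, (1, 0, 0)))   # -> 1 (the first test fires)
```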

k-decision lists (2)

What is |H| for k-DL, i.e., what is |k-DL(n)|, where n is the number of
literals?
After some calculation, |k-DL(n)| = 2^(O(n^k log(n^k))).
Substituting |k-DL(n)| into the sample complexity expression:
N ≥ (1/ε)(ln(1/δ) + O(n^k log(n^k)))
Sample complexity is polynomial in n! What about learning complexity?
There are efficient algorithms for learning k-decision lists (e.g., a
greedy algorithm; a sketch follows below).
We have polynomial sample complexity and efficient k-DL learning algorithms
∴ k-DL is PAC-learnable!
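
A minimal sketch of such a greedy learner, in the spirit of Rivest's
algorithm (an assumption based on the standard textbook description, not code
from the slides): repeatedly find a conjunction of at most k literals that
matches at least one remaining example and only examples of one label, emit
it as the next test, and discard the covered examples. It reuses the
(test, bit) encoding from the evaluate_dl sketch above.

```python
from itertools import combinations, product

def find_pure_test(remaining, n, k):
    """Find a conjunction of <= k literals matching >= 1 example, all one label."""
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                test = list(zip(idxs, vals))
                labels = {y for x, y in remaining
                          if all(x[i] == v for i, v in test)}
                if len(labels) == 1:          # matches something, and it's "pure"
                    return test, labels.pop()
    return None

def learn_k_dl(examples, n, k):
    """Greedily build a k-decision list consistent with labelled examples."""
    remaining, decision_list = list(examples), []
    while remaining:
        found = find_pure_test(remaining, n, k)
        if found is None:
            raise ValueError("no consistent k-DL exists for these examples")
        test, label = found
        decision_list.append((test, label))
        remaining = [(x, y) for x, y in remaining     # drop covered examples
                     if not all(x[i] == v for i, v in test)]
    return decision_list, 0   # default is never reached on the training data
```

Each round removes at least one example, so the loop terminates after at most
|examples| iterations.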

Conclusion

PAC learning: with high probability, an (efficient) learning algorithm will
find a hypothesis that is approximately identical to the hidden target
function.
k-DL is PAC-learnable.
Computational learning theory is concerned with the analysis of ML
algorithms and touches many fields.

References

Carla Gomes, Foundations of AI lecture notes, Cornell University.

