A Theory of the Learnable
Leslie Valiant
Dhruv Gairola
Computational Complexity, Michael Soltys
gairold@mcmaster.ca ; dhruvgairola.blogspot.ca

November 13, 2013

Overview

1. Learning
2. Contribution
3. PAC learning
   - Sample complexity
   - Boolean functions
   - k-decision lists
4. Conclusion

Learning

Humans can learn.
Machine learning (ML): learning from data; knowledge acquisition without explicit programming.
Explore computational models for learning.
Use the models to gain insights about learning.
Use the models to develop new learning algorithms.

Modelling supervised learning

Given a training set of labelled examples, a learning algorithm generates a
hypothesis (a candidate function). Run the hypothesis on a test set to check
how good it is.
But how good, really? Maybe the training and test data consist of
unrepresentative examples, so the hypothesis doesn't generalize well.
Insight: introduce probabilities to measure the degree of certainty and
correctness.

Contribution

With high probability, an (efficient) learning algorithm will find a
hypothesis that is approximately identical to the hidden target
function.
Intuition: a hypothesis consistent with a large amount of training data is
unlikely to be very wrong, i.e., it is probably approximately correct (PAC).

PAC learning

Goal: show that, after training, with high probability any hypothesis
consistent with the training data will be approximately correct.
Notation:
X : set of all possible examples
D : distribution from which examples are drawn
H : set of all possible hypotheses
N : |X_training|, the size of the training set
f : target function

PAC learning (2)

Hypothesis h_g ∈ H is approximately correct if:
error(h_g) ≤ ε, where
error(h) = P(h(x) ≠ f(x) | x drawn from D)
and ε is a given (small) error tolerance.

Bad hypothesis:
error(h_b) > ε, so
P(h_b disagrees with a single example) > ε
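
As a minimal illustration (not from the slides), error(h) can be estimated
empirically by sampling; h, f, and sample_from_D below are hypothetical
placeholders for the hypothesis, the target function, and a sampler for D:

```python
def estimate_error(h, f, sample_from_D, m=10_000):
    """Monte Carlo estimate of error(h) = P(h(x) != f(x)) for x drawn from D."""
    disagreements = 0
    for _ in range(m):
        x = sample_from_D()   # draw one example from D
        if h(x) != f(x):      # hypothesis disagrees with the target
            disagreements += 1
    return disagreements / m
```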

PAC learning (3)

P(h_b agrees with 1 example) ≤ (1 − ε).
P(h_b agrees with N examples) ≤ (1 − ε)^N.
P(H_b contains a hypothesis consistent with all N examples) ≤ |H_b|(1 − ε)^N ≤ |H|(1 − ε)^N.
Let's say |H|(1 − ε)^N ≤ δ.
Since (1 − ε)^N ≤ e^(−εN), solving for N gives:
N ≥ (1/ε)(ln(1/δ) + ln |H|)
This expresses the sample complexity.
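
As a sketch of how this bound could be evaluated numerically (the helper name
and its ln_H parameter, the natural log of |H|, are our own):

```python
import math

def sample_complexity(epsilon, delta, ln_H):
    """Smallest integer N with N >= (1/epsilon) * (ln(1/delta) + ln|H|)."""
    return math.ceil((1.0 / epsilon) * (math.log(1.0 / delta) + ln_H))

# e.g., |H| = 2^20, epsilon = 0.1, delta = 0.05:
# sample_complexity(0.1, 0.05, 20 * math.log(2))  ->  169 examples
```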

Sample complexity

N ≥ (1/ε)(ln(1/δ) + ln |H|)

If you train the learning algorithm on a training set X_training of size at
least N, then the returned (consistent) hypothesis is PAC: with probability
at least (1 − δ), its error is at most ε.
E.g., if you want a smaller δ (higher confidence), you need a larger N (more
examples).
Let's look at an example of H: boolean functions.

Why boolean functions?

Because boolean functions can represent concepts, which is what we
commonly want machines to learn.
Concepts are predicates, e.g., isMaleOrFemale(height).

Boolean functions

Boolean functions are of the form f : {0, 1}^n → {0, 1}, where n is the
number of literals (input variables).

Let H = {all boolean functions on n literals} ∴ |H| = 2^(2^n).

Substituting this H into the sample complexity expression gives N = O(2^n)
(see the numeric sketch below), i.e., the class of all boolean functions is
not PAC-learnable.
Can we restrict the size of H?
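
To make the blow-up concrete, here is a small numeric sketch that plugs
ln |H| = 2^n · ln 2 into the hypothetical sample_complexity helper sketched
earlier:

```python
import math

for n in (5, 10, 20):
    ln_H = (2 ** n) * math.log(2)   # ln|H| for all boolean functions on n literals
    print(n, sample_complexity(0.1, 0.05, ln_H))
# n = 5 -> 252 examples; n = 10 -> 7,128; n = 20 -> ~7.3 million: exponential in n.
```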

k-decision lists

A decision list (DL) is a representation of a single boolean function;
unrestricted DLs are not PAC-learnable either.
A single DL consists of a series of tests, e.g.:
if f_1 then return b_1; elseif f_2 then return b_2; ... elseif f_n then return b_n;
A single DL corresponds to a single hypothesis.
Apply a restriction: a k-decision list (k-DL) is a decision list where each
test is a conjunction of at most k literals. (A sketch of evaluating such a
list follows below.)
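
A minimal sketch of one possible representation and its evaluation (the
encoding is an assumption, not from the slides): each test is a list of
literals (i, v) meaning "x[i] == v", and the list falls through to a default
bit:

```python
def evaluate_dl(decision_list, default, x):
    """Evaluate a decision list on example x (a tuple of 0/1 values).

    decision_list: list of (test, b) pairs; test is a list of literals (i, v),
    and b is the bit returned when every literal in the test is satisfied.
    """
    for test, b in decision_list:
        if all(x[i] == v for i, v in test):   # conjunction of literals holds
            return b
    return default

# e.g., the 2-decision list "if x0 and not x2 then 1; elif x1 then 0; else 1":
dl = [([(0, 1), (2, 0)], 1), ([(1, 1)], 0)]
print(evaluate_dl(dl, 1, (1, 0, 0)))   # -> 1 (the first test fires)
```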

k-decision lists (2)

What is |H| for k-DL, i.e., what is |k-DL(n)|, where n is the number of
literals?
After some calculation, |k-DL(n)| = 2^(O(n^k log(n^k))).
Substituting |k-DL(n)| into the sample complexity expression:
N ≥ (1/ε)(ln(1/δ) + O(n^k log(n^k)))
Sample complexity is polynomial in n! What about learning complexity?
There are efficient algorithms for learning k-decision lists (e.g., a
greedy algorithm; a sketch follows below).
We have polynomial sample complexity and efficient k-DL learning algorithms
∴ k-DL is PAC-learnable!
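
A minimal sketch of such a greedy learner, in the spirit of Rivest's
algorithm (an assumption based on the standard textbook description, not code
from the slides): repeatedly find a conjunction of at most k literals that
matches at least one remaining example and only examples of one label, emit
it as the next test, and discard the covered examples. It reuses the
(test, bit) encoding from the evaluate_dl sketch above.

```python
from itertools import combinations, product

def find_pure_test(remaining, n, k):
    """Find a conjunction of <= k literals matching >= 1 example, all one label."""
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for vals in product((0, 1), repeat=size):
                test = list(zip(idxs, vals))
                labels = {y for x, y in remaining
                          if all(x[i] == v for i, v in test)}
                if len(labels) == 1:          # matches something, and it's "pure"
                    return test, labels.pop()
    return None

def learn_k_dl(examples, n, k):
    """Greedily build a k-decision list consistent with labelled examples."""
    remaining, decision_list = list(examples), []
    while remaining:
        found = find_pure_test(remaining, n, k)
        if found is None:
            raise ValueError("no consistent k-DL exists for these examples")
        test, label = found
        decision_list.append((test, label))
        remaining = [(x, y) for x, y in remaining     # drop covered examples
                     if not all(x[i] == v for i, v in test)]
    return decision_list, 0   # default is never reached on the training data
```

Each round removes at least one example, so the loop terminates after at most
|examples| iterations.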

Conclusion

PAC learning: with high probability, an (efficient) learning algorithm will
find a hypothesis that is approximately identical to the hidden target
function.
k-DL is PAC-learnable.
Computational learning theory is concerned with the analysis of ML
algorithms and touches many fields.

References

Carla Gomes, Foundations of AI lecture notes, Cornell University.

