University of Wisconsin – Madison
                                 Computer Sciences Department



                     CS 760 - Machine Learning
                                               Fall 2001

                                                Exam
                                   7:15-9:15pm, December 13, 2001
                                        Room 1240 CS & Stats

                                            CLOSED BOOK
                              (one sheet of notes and a calculator allowed)


Write your answers on these pages and show your work. If you feel that a question is not fully
specified, state any assumptions you need to make in order to solve the problem. You may use
the backs of these sheets for scratch work.

Write your name on this and all other pages of this exam. Make sure your exam contains
6 problems on 10 pages.


       Name           ________________________________________________________________


       Student ID     ________________________________________________________________


                      Problem                   Score           Max Score


                          1                     ______               24

                          2                     ______               15

                          3                     ______               16

                          4                     ______               14

                          5                     ______                7

                          6                     ______               24


                      TOTAL                     ______              100
Name: _______________________________________


Problem 1 – Learning from Labelled Examples (24 points)

Imagine that you are given the following set of training examples.
Each feature can take on one of three nominal values: a, b, or c.

              F1       F2       F3    Category

               a       c        a        +
               c       a        c        +
               a       a        c        –
               b       c        a        –
               c       c        b        –


a) How would a Naive Bayes system classify the following test example?
   Be sure to show your work.

              F1 = a        F2 = c   F3 = b




b) Describe how a 3-nearest-neighbor algorithm would classify Part a’s test example.
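(For reference, the counting that a Naive Bayes classifier performs can be sketched in a few lines of Python. This is a generic, unsmoothed sketch; the data and names below are illustrative, not this exam's training set or a worked answer.)

```python
def naive_bayes_score(train, query, category):
    """Unnormalized Naive Bayes score: P(cat) * prod_i P(F_i = v_i | cat),
    with probabilities estimated by raw counts (no smoothing)."""
    rows = [x for x, c in train if c == category]
    prior = len(rows) / len(train)          # P(category)
    score = prior
    for i, v in enumerate(query):
        # P(feature i has value v | category), by counting
        score *= sum(1 for x in rows if x[i] == v) / len(rows)
    return score
```

Computing this score for each category and predicting the one with the larger score gives the Naive Bayes classification.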




                                              Page 2 of 10
Name: _______________________________________




c) Show the calculations that ID3 would perform to determine the root node of a decision tree
     using the above training examples.




d)   Now consider augmenting the standard ID3 algorithm so that it also considers tests like

                          the value of feature X = the value of feature Y

     for all pairs of features X and Y where X ≠ Y. Show what this variant of ID3 would choose as
     a root node for the training set above.
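(As a generic reference for parts c and d, and not a worked answer: ID3 picks the root by maximizing information gain, which can be sketched as follows. The function names are illustrative.)

```python
import math

def entropy(labels):
    """Shannon entropy of a multiset of class labels, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def info_gain(examples, labels, feature_index):
    """Information gain of splitting on one nominal feature:
    H(labels) minus the example-weighted entropy of each value's subset."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(ex[feature_index] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[feature_index] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

ID3 evaluates `info_gain` for every candidate test and places the highest-scoring one at the root; the Part d variant simply adds the equality tests to the candidate pool.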




                                            Page 3 of 10
Name: _______________________________________


Problem 2 – Weight Space and Neural Networks (15 points)

Assume that you wish to train a perceptron on the simple training set below.

                                 F1          Category
                                 1              +
                                 8              +
                                 2              –
                                 4              –

a) Draw the weight space for this task, assuming that the perceptron’s threshold is always set at
   4. Also assume that the perceptron’s output is 1 (i.e., category = +) when the perceptron’s
   weighted sum meets or exceeds its threshold; otherwise its output is 0. (Since the threshold
   is constant, you need not draw its dimension in weight space. Also, do not normalize the
   values of F1.)




b) Assuming that we initially set the weight on the link between F1 and the output node to the
   value 5, state the range of final weight settings that could result from applying
   backpropagation training. Be sure to explain your answer. (Do not train the threshold in this
   part; hold it constant at the value of 4. Assume that the step function of Part a is replaced with a very
   steep sigmoidal activation function, so that the activation function is technically differentiable.)




c) Starting from the initial state of Part b and using a learning rate of 0.1, draw the perceptron
   before and after training with (just) the last example in the training set above. For this part,
   you do need to train the threshold.
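(As a generic reference, not a worked answer: one delta-rule update for a single-input perceptron, with the threshold trained as a weight on a constant input of -1, can be sketched as follows. The numbers in the usage below are illustrative, not this problem's values.)

```python
def perceptron_step(w, theta, x, target, lr=0.1):
    """One delta-rule update for a single-input perceptron whose
    output is 1 iff w * x >= theta. The threshold is trained by
    treating it as a weight on a constant input of -1."""
    out = 1 if w * x >= theta else 0
    err = target - out
    w = w + lr * err * x          # weight update
    theta = theta + lr * err * (-1)   # threshold update (input = -1)
    return w, theta
```

If the example is already classified correctly, the error is zero and neither the weight nor the threshold changes.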




                                               Page 4 of 10
Name: _______________________________________


Problem 3 – Overfitting Avoidance (16 points)

For each of the following learning methods, briefly describe and motivate one (1) commonly
used technique for overfitting avoidance.

a) Nearest-neighbor learning

       Brief Description (of an overfitting-avoidance technique):




       Motivation (of why it might reduce overfitting):




b) Naïve Bayesian learning

       Brief Description:




       Motivation:




c) Decision-tree induction

       Brief Description:




       Motivation:




d) Neural network training

       Brief Description:




       Motivation:




                                              Page 5 of 10
Name: _______________________________________


Problem 4 – Reinforcement Learning (14 points)

Consider the deterministic reinforcement environment drawn below. The numbers on the arcs
indicate the immediate rewards. Let the discount rate equal 0.9.
[Figure: deterministic state-action graph over the states start, a, b, c, and end. The immediate rewards on the arcs include 10, 5, -10 (on three arcs), 5, and 0.]
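(As a generic reference for parts b and c, not a worked answer: each step of deterministic Q-learning performs the backup Q(s, a) <- r + gamma * max over a' of Q(s', a'), which can be sketched as follows. The states and values in the usage below are illustrative, not those of the graph above.)

```python
def q_update(q, s, a, reward, s_next, gamma=0.9):
    """One deterministic Q-learning backup:
    Q(s, a) <- reward + gamma * max over a' of Q(s_next, a').
    q is a dict of dicts: q[state][action] -> Q value."""
    # A state with no outgoing actions (e.g., a terminal) contributes 0.
    best_next = max(q[s_next].values()) if q.get(s_next) else 0.0
    q[s][a] = reward + gamma * best_next
    return q[s][a]
```

Under exploitation the agent takes the action with the largest current Q value from its state; under exploration it may take any other action. Either way, the same backup above is applied to the arc actually traversed.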




a)   What is the best route for going from start to end? Why?




b)   Represent the Q table by placing Q values on the arcs on the environment's state-action
     graph; initialize all of the Q values to 2 except initialize all of the arcs directly involving
     node a to have a Q value of -1. For Step 1, do exploitation. Show on the graph below the
     full Q table after Step 1. Specify the action chosen and display the calculations involved in
     altering the Q table.




[Blank copy of the state-action graph (start, a, b, c, end) for your answer.]




                                              Page 6 of 10
Name: _______________________________________


c)   Assume that after Step 1, the RL agent is magically transported back to the state start. Show
     the resulting Q table after the learner takes its second step from the starting state. Step 2
     should be exploration. Be sure to again state the action chosen and display your calculations.




[Blank copy of the state-action graph (start, a, b, c, end) for your answer.]




d)   Explain one (1) major advantage and one (1) major disadvantage of using a Q network
     instead of a Q table in reinforcement learning.


        advantage:




        disadvantage:




                                             Page 7 of 10
Name: _______________________________________


Problem 5 – Inductive Logic Programming (7 points)

Assume that we tell FOIL that P(a) and P(b) are positive instances of P(?X) and that P(c) and
P(d) are negative instances (where ?X is a variable, while a, b, c, and d are constants).

We also give the following background knowledge to FOIL:

       Q(a)     ¬Q(b)      Q(c)     ¬Q(d)

       R(a)     ¬R(b)      R(c)      R(d)

(where “¬” means “not”).

Show the calculations that FOIL would go through in order to choose its first rule for P(?X).
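(As a generic reference, not a worked answer: FOIL scores each candidate literal with its information-gain measure, which can be sketched as follows. The numbers in the usage below are illustrative.)

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL gain for adding a literal to a rule. The rule's bindings go
    from (p0 positive, n0 negative) to (p1 positive, n1 negative), and
    t positive bindings remain covered after the literal is added."""
    info_before = -math.log2(p0 / (p0 + n0))   # bits to signal a positive, before
    info_after = -math.log2(p1 / (p1 + n1))    # bits to signal a positive, after
    return t * (info_before - info_after)
```

FOIL evaluates this gain for every candidate literal (here, each background predicate and its negation) and greedily adds the highest-scoring one to the rule body.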




                                            Page 8 of 10
Name: _______________________________________




Problem 6 – Short Discussion Questions (24 points)

a) Why might it make sense to learn a "world model" when learning from reinforcements?




b) What is the major advantage that FOIL has over ID3? Explain your answer.




c) Would you expect ensemble methods to work better for decision-tree induction or for Naïve
   Bayes classifiers? Why?




d) Assume that we want to empirically compare the accuracies of two learning algorithms on a
   given dataset. What experimental methodology should we use?




                                         Page 9 of 10
Name: _______________________________________


e) Assuming one has linearly separable data, what is the key difference between standard
   perceptron training and Support Vector Machines?




f) Briefly explain one (1) connection between the Minimal Description Length principle and
   Support Vector Machines.




g)   What role does the VC Dimension play in machine learning?




h)   Why does one need both tuning and testing sets in machine learning?




                                         Have a good vacation!




                                           Page 10 of 10
