University of Wisconsin – Madison
                                 Computer Sciences Department



                     CS 760 - Machine Learning
                                               Fall 2001

                                                Exam
                                   7:15-9:15pm, December 13, 2001
                                        Room 1240 CS & Stats

                                            CLOSED BOOK
                              (one sheet of notes and a calculator allowed)


Write your answers on these pages and show your work. If you feel that a question is not fully
specified, state any assumptions you need to make in order to solve the problem. You may use
the backs of these sheets for scratch work.

Write your name on this and all other pages of this exam. Make sure your exam contains
6 problems on 10 pages.


       Name           ________________________________________________________________


       Student ID     ________________________________________________________________


                      Problem                   Score           Max Score


                          1                     ______               24

                          2                     ______               15

                          3                     ______               16

                          4                     ______               14

                          5                     ______                7

                          6                     ______               24


                      TOTAL                     ______              100
Name: _______________________________________


Problem 1 – Learning from Labelled Examples (24 points)

Imagine that you are given the following set of training examples.
Each feature can take on one of three nominal values: a, b, or c.

              F1       F2       F3    Category

               a       c        a        +
               c       a        c        +
               a       a        c        –
               b       c        a        –
               c       c        b        –


a) How would a Naive Bayes system classify the following test example?
   Be sure to show your work.

              F1 = a        F2 = c   F3 = b




b) Describe how a 3-nearest-neighbor algorithm would classify Part a’s test example.
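(For reference, the counting that a Naive Bayes classifier performs can be sketched in a few lines of Python. This is a generic, unsmoothed sketch; the data and names below are illustrative, not this exam's training set or a worked answer.)

```python
def naive_bayes_score(train, query, category):
    """Unnormalized Naive Bayes score: P(cat) * prod_i P(F_i = v_i | cat),
    with probabilities estimated by raw counts (no smoothing)."""
    rows = [x for x, c in train if c == category]
    prior = len(rows) / len(train)          # P(category)
    score = prior
    for i, v in enumerate(query):
        # P(feature i has value v | category), by counting
        score *= sum(1 for x in rows if x[i] == v) / len(rows)
    return score
```

Computing this score for each category and predicting the one with the larger score gives the Naive Bayes classification.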




                                              Page 2 of 10
Name: _______________________________________




c) Show the calculations that ID3 would perform to determine the root node of a decision tree
     using the above training examples.




d)   Now consider augmenting the standard ID3 algorithm so that it also considers tests like

                          the value of feature X = the value of feature Y

     for all pairs of features X and Y where X ≠ Y. Show what this variant of ID3 would choose as
     a root node for the training set above.
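(As a generic reference for parts c and d, and not a worked answer: ID3 picks the root by maximizing information gain, which can be sketched as follows. The function names are illustrative.)

```python
import math

def entropy(labels):
    """Shannon entropy of a multiset of class labels, in bits."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def info_gain(examples, labels, feature_index):
    """Information gain of splitting on one nominal feature:
    H(labels) minus the example-weighted entropy of each value's subset."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(ex[feature_index] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[feature_index] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain
```

ID3 evaluates `info_gain` for every candidate test and places the highest-scoring one at the root; the Part d variant simply adds the equality tests to the candidate pool.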




                                            Page 3 of 10
Name: _______________________________________


Problem 2 – Weight Space and Neural Networks (15 points)

Assume that you wish to train a perceptron on the simple training set below.

                                 F1          Category
                                 1              +
                                 8              +
                                 2              –
                                 4              –

a) Draw the weight space for this task, assuming that the perceptron’s threshold is always set at
   4. Also assume that the perceptron’s output is 1 (i.e., category = +) when the perceptron’s
   weighted sum meets or exceeds its threshold; otherwise its output is 0. (Since the threshold
   is constant, you need not draw its dimension in weight space. Also, do not normalize the
   values of F1.)




b) Assuming that we initially set the weight on the link between F1 and the output node to the
   value 5, state the range of final weight settings that could result from applying
   backpropagation training. Be sure to explain your answer. (Do not train the threshold in this
   part; hold it constant at the value of 4. Assume that the step function of Part a is replaced with a very
   steep sigmoidal activation function, so that the activation function is technically differentiable.)




c) Starting from the initial state of Part b and using a learning rate of 0.1, draw the perceptron
   before and after training with (just) the last example in the training set above. For this part,
   you do need to train the threshold.
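(As a generic reference, not a worked answer: one delta-rule update for a single-input perceptron, with the threshold trained as a weight on a constant input of -1, can be sketched as follows. The numbers in the usage below are illustrative, not this problem's values.)

```python
def perceptron_step(w, theta, x, target, lr=0.1):
    """One delta-rule update for a single-input perceptron whose
    output is 1 iff w * x >= theta. The threshold is trained by
    treating it as a weight on a constant input of -1."""
    out = 1 if w * x >= theta else 0
    err = target - out
    w = w + lr * err * x          # weight update
    theta = theta + lr * err * (-1)   # threshold update (input = -1)
    return w, theta
```

If the example is already classified correctly, the error is zero and neither the weight nor the threshold changes.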




                                               Page 4 of 10
Name: _______________________________________


Problem 3 – Overfitting Avoidance (16 points)

For each of the following learning methods, briefly describe and motivate one (1) commonly
used technique for overfitting avoidance.

a) Nearest-neighbor learning

       Brief Description (of an overfitting-avoidance technique):




       Motivation (of why it might reduce overfitting):




b) Naïve Bayesian learning

       Brief Description:




       Motivation:




c) Decision-tree induction

       Brief Description:




       Motivation:




d) Neural network training

       Brief Description:




       Motivation:




                                              Page 5 of 10
Name: _______________________________________


Problem 4 – Reinforcement Learning (14 points)

Consider the deterministic reinforcement environment drawn below. The numbers on the arcs
indicate the immediate rewards. Let the discount rate equal 0.9.
[Figure: deterministic state-action graph over the states start, a, b, c, and end. The immediate rewards on the arcs include 10, 5, -10 (on three arcs), 5, and 0.]
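(As a generic reference for parts b and c, not a worked answer: each step of deterministic Q-learning performs the backup Q(s, a) <- r + gamma * max over a' of Q(s', a'), which can be sketched as follows. The states and values in the usage below are illustrative, not those of the graph above.)

```python
def q_update(q, s, a, reward, s_next, gamma=0.9):
    """One deterministic Q-learning backup:
    Q(s, a) <- reward + gamma * max over a' of Q(s_next, a').
    q is a dict of dicts: q[state][action] -> Q value."""
    # A state with no outgoing actions (e.g., a terminal) contributes 0.
    best_next = max(q[s_next].values()) if q.get(s_next) else 0.0
    q[s][a] = reward + gamma * best_next
    return q[s][a]
```

Under exploitation the agent takes the action with the largest current Q value from its state; under exploration it may take any other action. Either way, the same backup above is applied to the arc actually traversed.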




a)   What is the best route for going from start to end? Why?




b)   Represent the Q table by placing Q values on the arcs on the environment's state-action
     graph; initialize all of the Q values to 2 except initialize all of the arcs directly involving
     node a to have a Q value of -1. For Step 1, do exploitation. Show on the graph below the
     full Q table after Step 1. Specify the action chosen and display the calculations involved in
     altering the Q table.




[Blank copy of the state-action graph (start, a, b, c, end) for your answer.]




                                              Page 6 of 10
Name: _______________________________________


c)   Assume that after Step 1, the RL agent is magically transported back to the state start. Show
     the resulting Q table after the learner takes its second step from the starting state. Step 2
     should be exploration. Be sure to again state the action chosen and display your calculations.




[Blank copy of the state-action graph (start, a, b, c, end) for your answer.]




d)   Explain one (1) major advantage and one (1) major disadvantage of using a Q network
     instead of a Q table in reinforcement learning.


        advantage:




        disadvantage:




                                             Page 7 of 10
Name: _______________________________________


Problem 5 – Inductive Logic Programming (7 points)

Assume that we tell FOIL that P(a) and P(b) are positive instances of P(?X) and that P(c) and
P(d) are negative instances (where ?X is a variable, while a, b, c, and d are constants).

We also give the following background knowledge to FOIL:

       Q(a)     ¬Q(b)      Q(c)     ¬Q(d)

       R(a)     ¬R(b)      R(c)      R(d)

(where “¬” means “not”).

Show the calculations that FOIL would go through in order to choose its first rule for P(?X).
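(As a generic reference, not a worked answer: FOIL scores each candidate literal with its information-gain measure, which can be sketched as follows. The numbers in the usage below are illustrative.)

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL gain for adding a literal to a rule. The rule's bindings go
    from (p0 positive, n0 negative) to (p1 positive, n1 negative), and
    t positive bindings remain covered after the literal is added."""
    info_before = -math.log2(p0 / (p0 + n0))   # bits to signal a positive, before
    info_after = -math.log2(p1 / (p1 + n1))    # bits to signal a positive, after
    return t * (info_before - info_after)
```

FOIL evaluates this gain for every candidate literal (here, each background predicate and its negation) and greedily adds the highest-scoring one to the rule body.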




                                            Page 8 of 10
Name: _______________________________________




Problem 6 – Short Discussion Questions (24 points)

a) Why might it make sense to learn a "world model" when learning from reinforcements?




b) What is the major advantage that FOIL has over ID3? Explain your answer.




c) Would you expect ensemble methods to work better for decision-tree induction or for Naïve
   Bayes classifiers? Why?




d) Assume that we want to empirically compare the accuracies of two learning algorithms on a
   given dataset. What experimental methodology should we use?




                                         Page 9 of 10
Name: _______________________________________


e) Assuming one has linearly separable data, what is the key difference between standard
   perceptron training and Support Vector Machines?




f) Briefly explain one (1) connection between the Minimal Description Length principle and
   Support Vector Machines.




g)   What role does the VC Dimension play in machine learning?




h)   Why does one need both tuning and testing sets in machine learning?




                                         Have a good vacation!




                                           Page 10 of 10
