Machine Learning, LIX004M5
Overview and Introduction

Jörg Tiedemann
tiedeman@let.rug.nl
Informatiekunde
Rijksuniversiteit Groningen

Machine Learning, LIX004M5 – p.1/50

General Info

instructor: Jörg Tiedemann (j.tiedemann@rug.nl)
    Harmoniegebouw, room 1311-429
prerequisites: open to students in Computer Science, Artificial Intelligence and Information Science
    2nd year student or higher
    background: programming ability, elementary statistics
schedule: September 5 - October 21
  • lectures: Mondays 13-15
  • labs: Fridays 9-11 (4 times only!)

General Info (cont’d)

You need an account on ’hagen’ for the labs! (But you may also work from home or somewhere else.)
Go to A. Da Costa (Harmoniegebouw, room 336, building section 13.13, phone 363 5801; open daily 10:30-12:00 and 14:00-15:30, closed on Friday afternoons)!

General Info (cont’d)

Website: http://www.let.rug.nl/~tiedeman/ml06
Examination: lab assignments and exercises (50%), written exam (50%)
Exam: Friday, October 27, 9-12
Literature: Tom Mitchell, Machine Learning, New York: McGraw-Hill, 1997; additional on-line literature (links available from the course website)

General Info (cont’d)

Purpose of this course
  • Introduction (!) to machine learning techniques (How much do you know already?)
  • Discussion of several machine learning approaches
  • Examples and applications in various fields
  • Practical assignments
      • using Weka - a machine learning package implemented in Java
      • a little bit of programming/scripting
      • some theoretical questions

General Info (cont’d)

  • Examination:
      • obligatory lab assignments (50%)
      • written exam (50%)
  • A minimum of 6 for both parts is required!
  • Exam is open book

Preliminary Program

 1. Organization, Introduction (Ch. 1, Ch. 5)
 2. Inductive Learning (Ch. 2), Decision Trees (Ch. 3)
      • Lab 1 - Decision Trees
 3. Instance-Based Learning (Ch. 8)
      • Lab 2 - Instance-based learning
 4. Bayesian Learning (Ch. 6)
      • Lab 3 - Learner comparison/combination
 5. Sequential Data & Markov Models (M&S Ch. 9, Bilmes)
      • Lab 4 - Markov models
 6. Maximum Entropy Models, Combining Learners
 7. Genetic Algorithms (Ch. 9), Reinforcement Learning (Ch. 13)

General comments

  • Read the book! (and other literature if necessary)
  • Ask questions! (and I’ll try to answer)
  • Tell me if you think that something’s wrong
  • Keep the deadlines! (1 week late → half the points, later → no points)

What is Machine learning?

Machine Learning is
  • the study of algorithms that
      • improve their performance
      • at some task
      • with experience
... just like a human being ... (?)

What is all the hype about ML?

"Every time I fire a linguist the performance of the recognizer goes up"
(probably) said by Fred Jelinek (IBM speech group) in the 80s, quoted by, e.g., Jurafsky and Martin, Speech and Language Processing.

Why machine learning?

data mining: pattern recognition, knowledge discovery, use historical data to improve future decisions, prediction (classification, regression), data description (clustering, summarization, visualization)
complex applications: we cannot program by hand, (efficient) processing of complex signals
self-customizing programs: automatic adjustments according to usage, dynamic systems

Typical Data Mining Task

Given:
  • 9714 patient records, each describing a pregnancy and birth
  • Each patient record contains 215 features
Learn to predict:
  • Classes of future patients at high risk for Emergency Cesarean Section

Pattern Recognition

Object detection

Complex applications

Operating robots: ALVINN [Pomerleau] drives 70 mph on highways

Classification

Personal home page? Company website? Educational site?

Automatic customization

Machine learning is growing

many more applications:
  • speech recognition
  • robot control
  • spam filtering, data sorting
  • machine translation
  • financial data analysis and market predictions
  • handwriting recognition
  • data clustering and visualization
  • pattern recognition in genetics (e.g. DNA sequences)

Questions to ask

Learning = improve with experience at some task
  • What experience?
  • What exactly should be learned?
  • How shall it be represented?
  • What specific algorithm to learn it?
Goal: handle unseen data correctly according to the task (use your knowledge inferred from experience!)

What experience?

  • What do we know already about the task and possible solutions? (prior knowledge)
  • What kind of data do we have available? (training examples)
    What are the discriminative features? How are they connected with each other (dependencies)?
  • Is a “teacher” available (→ supervised learning) or not (→ unsupervised learning)?
    How expensive is labeling?
  • How much data do we need and how clean does it have to be?

What exactly should be learned?

Outcome of the target function
  • boolean (→ concept learning)
  • discrete values (→ classification)
  • real values (→ regression)
many machine learning tasks are classification tasks ...

How shall it be represented?

Model selection
  • symbolic representation (e.g. rules)
  • subsymbolic representation (neural networks, SVMs)
Do we want to restrict the space of possible solutions? (→ restriction bias ... we come back to this)

What algorithm to learn it?

Learning means approximating the real (unknown) target function according to our experience (e.g. observed training examples)
→ Learning = search for a “good” hypothesis/model
Do we want to prefer certain models? (→ preference bias ... later more)

Learning Models

Learning means approximating the real (unknown) target function according to our experience (e.g. observed training examples)
→ Learning = search for a “good” hypothesis/model
Which one is better?

What algorithm to learn it?

  • supervised learning (classified data available)
  • unsupervised learning (e.g. clustering)
  • inductive learning (from training data)
  • deductive learning (data + domain theory)
  • gradient descent, Bayesian learning, reinforcement learning ...

The roots of ML

Artificial intelligence: use prior knowledge and training data to guide learning as a search problem
Bayesian methods: probabilistic classifiers, probabilistic reasoning
Computational complexity theory: trade-off between model (learning) complexity and performance
Control theory: control optimisation processes
Information theory: entropy, information content, code optimisation and the minimum description length principle
Philosophy: Occam’s razor (simple is best)
Psychology and neurobiology: response improvement with practice, ideas that lead to artificial neural networks
Statistics: data description, estimation of probability distributions, evaluation, confidence

A walk-through example

from Duda et al: Pattern Classification
  • Task: automatically sort incoming fish on a conveyor belt into “sea bass” or “salmon”
  • Experience: sample images
We want a machine to learn this task. The machine needs some “experience”.

Procedure:
preprocessing: isolate the fish from one another and from the background of the images
feature selection: determine discriminative features to be extracted from the images (e.g. length, lightness, width, position of mouth, etc); feature selection = a kind of data reduction (focus on relevant information)
feature extraction: extract the selected features from the images and pass them to a classifier

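The three stages above can be sketched as a small pipeline. This is only an illustration: the “image” is a plain dictionary, the helper functions are invented stand-ins for real segmentation and measurement, and Python is used here rather than the Weka/Java of the labs — only the flow of data between the stages follows the slides.

```python
# Invented sketch of: preprocessing -> feature extraction -> classification.
# Real preprocessing would segment actual images; here an "image" is a dict.

def preprocess(image):
    # stand-in for isolating one fish from the image background
    return image["fish_region"]

def extract_features(region):
    # pass on only the selected, discriminative features
    return (region["lightness"], region["width"])

def classify(features, threshold=5.0):
    # toy decision rule on a single feature: dark fish -> salmon
    lightness, _ = features
    return "salmon" if lightness < threshold else "sea bass"

image = {"fish_region": {"lightness": 3.2, "width": 1.1}}
print(classify(extract_features(preprocess(image))))  # salmon
```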




A walk-through example

Select length for discrimination:
Lightness is a better feature:

A walk-through example

  • devise a decision rule or move the decision boundary to minimize some classification cost (→ decision theory)
  • a single feature might not be enough to minimize costs
→ feature vector, e.g.:

      X = (x1, x2) = (width, lightness)

Feature selection:
  • distinguishing (similar for objects in the same category and very different for different categories)
  • invariant (feature value doesn’t change when changing the context)
  • insensitive to noise
  • simple to extract

We still need to:
  • select an appropriate type of model for classification (e.g. function class to define separation boundaries)
  • select the model that generalizes the best (to be able to classify even unseen objects correctly)
  • consider computational complexity (trade-off between complexity and performance; scalability)

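Once the features are collected in a vector X = (width, lightness), even a very simple rule defines a boundary between the classes. A minimal sketch with invented measurement values — the nearest-class-mean rule used here is just one easy way to place a linear decision boundary, not the method of the slides:

```python
import math

# Toy training data: (width, lightness) pairs per class, values invented.
salmon   = [(2.0, 2.5), (2.2, 3.0), (1.8, 2.8)]
sea_bass = [(3.5, 6.0), (3.8, 5.5), (3.2, 6.5)]

def mean(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

means = {"salmon": mean(salmon), "sea bass": mean(sea_bass)}

def classify(x):
    # nearest-mean rule: the implied decision boundary is the line
    # halfway between the two class means
    return min(means, key=lambda c: math.dist(x, means[c]))

print(classify((2.1, 2.6)))  # salmon
print(classify((3.6, 6.2)))  # sea bass
```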
A walk-through example

Linear decision boundary
Overly complex decision boundary (What is the problem?)
a good trade-off between performance on the training set and model simplicity

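A small illustration of the problem behind the overly complex boundary, with invented data: a model complex enough to memorize the training set reaches zero training error, yet fails on unseen examples, while a simple threshold generalizes.

```python
# Invented (width, lightness) examples for the two fish classes.
train = [((1.0, 2.0), "salmon"), ((1.2, 2.1), "salmon"),
         ((3.0, 6.0), "sea bass"), ((3.1, 5.8), "sea bass")]
test  = [((1.1, 2.2), "salmon"), ((2.9, 6.1), "sea bass")]

# "overly complex" model: a pure lookup table over the training points
table = dict(train)
def memorize(x):
    return table.get(x, "salmon")   # arbitrary guess for unseen points

# simple model: one threshold on lightness (the second feature)
def simple(x):
    return "salmon" if x[1] < 4.0 else "sea bass"

def error_rate(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

print(error_rate(memorize, train))  # 0.0  (perfect on training data)
print(error_rate(memorize, test))   # 0.5  (fails on unseen points)
print(error_rate(simple, test))     # 0.0
```

This is exactly why the next slides insist on never evaluating on training data: the memorizing model looks flawless there.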




The Design Cycle

Evaluation

We have ...
  • different feature sets
  • different models
  • different learning strategies
→ We need to evaluate!

Evaluation of classifiers based on
  • accuracy or error rate (percentage of classification errors)
  • risk (cost estimation for classification decisions)
Never ever evaluate on training data!
... Why not?

Evaluation

Distinguish:
sample error: error rate observed when classifying sample data (test data)
true error: probability of misclassifying a randomly selected object

How good is an estimate of the true error by means of sample errors?
  • confidence intervals
  • larger sample → greater confidence
How good is one model compared to another?
  • calculate sample errors
  • compute statistical significance (e.g. paired t-test)

Typical strategy in supervised learning:
split data into disjoint training data and test data
Problems:
  • we could be (too) lucky (sample error on test data is better than with other data splits)
  • test data set is too small to be confident
  • training data is rare and expensive (we don’t want to waste too much when separating test data)

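The first problem — a lucky (or unlucky) split — can be seen directly by repeating the split. A sketch with invented data and a fixed classifier: the same classifier, hence the same true error, but the sample error on a small test set moves around from split to split.

```python
import random

random.seed(0)

# 40 invented examples: feature in [0, 1), label noisy around a 0.5
# threshold (~15% of labels flipped, so no classifier can be perfect)
data = []
for _ in range(40):
    x = random.random()
    true = "pos" if x > 0.5 else "neg"
    noisy = true if random.random() > 0.15 else ("neg" if true == "pos" else "pos")
    data.append((x, noisy))

def classify(x):
    # a fixed classifier; we only measure its sample error
    return "pos" if x > 0.5 else "neg"

def sample_error(test_set):
    return sum(classify(x) != y for x, y in test_set) / len(test_set)

errors = []
for _ in range(5):
    random.shuffle(data)
    errors.append(sample_error(data[:10]))  # small test set -> noisy estimate

print(sorted(errors))  # five estimates of the same true error, spread out
```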
Cross validation
  •   split D into k similar sized sets (e.g. k=10)
  •   use k − 1 sets for training and 1 for evaluation
  •   use each set once for evaluation and calculate the average of the errors
→ improve error estimates (higher confidence)
→ all data is tested
→ better use of (limited) training data
Note: we still don’t evaluate on training data!
special case: leave one out cross validation - use each training example once for testing and the others for training
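The k-fold procedure above can be sketched in a few lines of Python (a minimal sketch, not from the course materials; `train_fn`, `error_fn`, and the toy majority-class learner are hypothetical stand-ins for a real learner such as those in Weka):

```python
def k_fold_cv(data, k, train_fn, error_fn):
    """Split data into k similar sized folds, use each fold once for
    evaluation and the remaining k - 1 folds for training, and return
    the average of the k sample errors."""
    folds = [data[i::k] for i in range(k)]  # round-robin split
    errors = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = train_fn(train)  # never trained on its own test fold
        errors.append(error_fn(model, test))
    return sum(errors) / k

# toy data: (feature, label) pairs; the "learner" just memorises the
# majority label of its training set
data = [(x, int(x > 0.5)) for x in [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.8, 0.9]]

def train_fn(train):
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)  # majority label

def error_fn(model, test):
    return sum(y != model for _, y in test) / len(test)

avg_error = k_fold_cv(data, 4, train_fn, error_fn)
```

Leave one out cross validation is simply the special case `k = len(data)`.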




Conclusion
  •   we seem to be overwhelmed by the number, complexity and magnitude of sub-problems
  •   many of them can be solved (to a certain degree at least)
  •   many fascinating problems still remain
Enjoy working with learning systems!

What’s next?
This week: Read ch. 1 & ch. 5 of Mitchell and look at the exercises
    No lab on Friday!
Next week: Inductive learning, Mitchell ch. 2 & Decision trees, ch. 3
    First lab about Decision Trees





