Some Take-Home Messages (THM) about ML....
Data Science Meetup
Gianluca Bontempi
Interuniversity Institute of Bioinformatics in Brussels, (IB)²
Machine Learning Group,
Computer Science Department, ULB
mlg.ulb.ac.be, ibsquare.be
May 20, 2016
Introducing myself
1992: Computer science engineer (Politecnico di Milano, Italy),
1994: Researcher in robotics in IRST, Trento, Italy,
1995: Researcher in IRIDIA, ULB, Brussels,
1996-97: Researcher in IDSIA, Lugano, Switzerland,
1998-2000: Marie Curie fellowship in IRIDIA, ULB,
2000-2001: Scientist in Philips Research, Eindhoven, The
Netherlands,
2001-2002: Scientist in IMEC, Microelectronics Institute,
Leuven, Belgium,
since 2002: professor in Machine Learning, Modeling and
Simulation, Bioinformatics in ULB Computer Science Dept.,
since 2004: head of the ULB Machine Learning Group (MLG).
since 2013: director of the Interuniversity Institute of
Bioinformatics in Brussels (IB)², ibsquare.be.
What is machine learning?
Machine learning is that domain of computational intelligence
which is concerned with the question of how to construct computer
programs that automatically improve with experience. (Mitchell,
97)
Reductionist attitude: ML is just a buzzword that equates to
statistics plus marketing.
Positive attitude: ML paved the way for treating real
data-analysis problems sometimes overlooked by
statisticians (nonlinearity, classification,
pattern recognition, missing variables, adaptivity,
optimization, massive datasets, data management,
causality, representation of knowledge, parallelisation).
Interdisciplinary attitude: ML has its roots in statistics
and complements it by focusing on algorithmic
issues, computational efficiency, and data engineering.
Prediction is pervasive ...
Predict
whether you will like a book/movie (collaborative filtering)
credit applicants as low, medium, or high risk.
which home telephone lines are used for Internet access.
which customers are likely to stop being customers (churn).
the value of a piece of real estate
which telephone subscribers will order a 4G service
which CARREFOUR clients will be most interested in a
discount on Italian products.
the probability that a company is employing undeclared
workers (fraud detection)
the survival risk of a patient on the basis of a genetic signature
the probability of a crime in an urban area.
the key of a cryptographic algorithm on the basis of power
consumption
Supervised learning
First assumption: learning is essentially about prediction !
Second assumption: reality is stochastic, dependency and
uncertainty are well described by conditional probability.
[Diagram: a training dataset of input/output pairs is fed to a prediction model; the model's prediction is compared with the target to measure the prediction error.]
measurable features (inputs)
measurable target variables (outputs) and accuracy criteria
data (in God we trust; all others must bring data)
THM1: formalizing a problem as a prediction problem is often the
most important contribution of a data scientist!
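To make THM1 concrete, here is a minimal sketch (not from the slides) of such a formalization in Python with NumPy; the feature, target, and noise level are invented for illustration:

```python
import numpy as np

# Hypothetical example: predict the price of a piece of real estate (output)
# from its surface (input); both the data and the model are invented.
rng = np.random.default_rng(0)
X = rng.uniform(50, 200, size=100)            # measurable feature: surface in m^2
y = 1000.0 * X + rng.normal(0, 20000, 100)    # measurable target: price, with noise

# Prediction model: a least-squares line fitted on the training data.
A = np.column_stack([X, np.ones_like(X)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Accuracy criterion: mean squared prediction error on the training data.
mse = np.mean((y - A @ coef) ** 2)
print(f"slope={coef[0]:.1f}, intercept={coef[1]:.1f}, training MSE={mse:.0f}")
```

Once inputs, outputs, and an accuracy criterion are fixed, any learning algorithm can be plugged into the same frame.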
It is all about ...
1 Probabilistic modeling
it formalizes uncertainty and dependency (regression function)
notions of entropy and information
relevant and irrelevant features (e.g. Markov blanket notion)
Bayesian networks, causal reasoning
2 Estimation
bias/variance notions
generalization issues: underfitting vs overfitting
Bayesian, frequentist, decision theory
validation
combination/averaging of estimators (bagging, boosting)
3 Optimization
Maximum likelihood, least squares, backpropagation
Dual problems (SVM)
L1, L2 norm (lasso)
4 Computer science
implementation, algorithms
parallelism, scalability
data management
So ... how to teach machine learning?
Focus on ...
Formalism ?
Algorithms ?
Coding ?
Applications ?
Of course, all of these matter; but what is the essence, what is
common to the exploding number of algorithms, techniques, and fancy
applications?
Estimation
[Diagram: a stochastic phenomenon generates several independent datasets; the same learner, applied to each, returns a different model and prediction.]
THM2: a predictor is an estimator, i.e. an algorithm (black-box)
which takes data and returns a prediction.
THM3: reality is stochastic, so data is stochastic and prediction is
stochastic.
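A toy illustration of THM2 and THM3 (a sketch, not from the slides; the phenomenon and learner are invented): the same black-box learner, fed three independent datasets drawn from the same stochastic phenomenon, returns three different predictions.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(x)          # mean of the (in reality unknown) stochastic phenomenon

def learner(x_train, y_train):
    """A black-box estimator: data in, model out (here a degree-3 polynomial)."""
    return np.polyfit(x_train, y_train, 3)

x_query = 1.5
for run in range(3):             # three independent datasets from the same phenomenon
    x = rng.uniform(0, np.pi, 20)
    y = f(x) + rng.normal(0, 0.2, 20)                  # stochastic data ...
    print(f"run {run}: prediction at x={x_query} is "
          f"{np.polyval(learner(x, y), x_query):.3f}")  # ... hence stochastic prediction
```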
Assessing in an uncertain world (Baggio, 1998)
"Don't be afraid of missing a penalty kick: it is not from details
like these that a player is judged" (De Gregori, 1982).
Assessing a learner
The goal of learning is to find a model which is able to
generalize, i.e. able to return good predictions in contexts
with the same distribution but independent of the training set
How to estimate the quality of a model?
It is always possible to find models with such a complicated
structure that they achieve zero training error. Are these models
good?
Typically NOT, since doing very well on the training set can
mean doing badly on new data.
This is the phenomenon of overfitting.
THM4: learning is challenging since data have to be used 1) for
creating prediction models and 2) for assessing them.
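A hedged sketch of THM4 (data and polynomial degrees are invented): an interpolating polynomial reaches near-zero training error yet generalizes worse than a simpler model on independent test data drawn from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(2)
x_tr = rng.uniform(-1, 1, 15)
y_tr = x_tr ** 2 + rng.normal(0, 0.1, 15)
x_te = rng.uniform(-1, 1, 200)                 # same distribution, independent of training
y_te = x_te ** 2 + rng.normal(0, 0.1, 200)

for deg in (2, 14):                            # a simple model vs. an interpolating one
    p = np.polyfit(x_tr, y_tr, deg)
    mse_tr = np.mean((y_tr - np.polyval(p, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(p, x_te)) ** 2)
    print(f"degree {deg:2d}: training MSE={mse_tr:.4f}, test MSE={mse_te:.4f}")
```

The degree-14 fit passes through all 15 training points, so its training error says nothing about its quality; only the held-out error does.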
Bias and variance of a model
Estimation theory: the mean squared error (a measure of the
generalization quality) can be written as
MSE = σ²_w + bias² + variance
where
noise concerns the reality alone,
bias reflects the relation between reality and the learning
algorithm
variance concerns the learning algorithm alone.
This is purely theoretical since these quantities cannot be
measured ....
.. but useful to understand why and in which circumstances
learners work.
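In a simulation, where the true function is known, the decomposition can be checked numerically. A sketch (all settings invented): a deliberately biased learner, a straight line fitted to a sine, is retrained on many independent datasets, and noise + bias² + variance is compared with the Monte Carlo MSE.

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(2 * x)          # the true function: known only because we simulate
sigma_w, x0, runs = 0.3, 0.8, 2000   # noise level, query point, Monte Carlo repetitions

preds = np.empty(runs)
for r in range(runs):                # retrain on many independent datasets
    x = rng.uniform(0, np.pi, 30)
    y = f(x) + rng.normal(0, sigma_w, 30)
    preds[r] = np.polyval(np.polyfit(x, y, 1), x0)   # a deliberately biased (linear) learner

bias2 = (preds.mean() - f(x0)) ** 2
variance = preds.var()
y0 = f(x0) + rng.normal(0, sigma_w, runs)            # fresh noisy targets at x0
mse = np.mean((y0 - preds) ** 2)
print(f"noise={sigma_w**2:.3f}  bias^2={bias2:.3f}  variance={variance:.3f}")
print(f"MSE={mse:.3f}  vs  noise+bias^2+variance={sigma_w**2 + bias2 + variance:.3f}")
```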
The bias/variance dilemma
Noise is all that cannot be learned from data
Bias measures the lack of representational power of the class
of hypotheses.
Too simple model ⇒ large bias ⇒ underfitting
Variance warns us against an excessive complexity of the
approximator.
Too complex model ⇒ large variance ⇒ overfitting
A neural network is less biased than a linear model but
inevitably has higher variance.
Averaging (e.g. bagging, boosting, random forests) is a good
cure for variance.
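A toy check of the last point (a sketch under invented settings): bagging a high-variance learner, here a 1-nearest-neighbour predictor, by averaging models fitted on bootstrap resamples noticeably reduces the variance of its prediction.

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * x)
x0, n, B = 0.8, 40, 50                       # query point, dataset size, bootstrap replicates

def fit_predict(x, y):
    return y[np.argmin(np.abs(x - x0))]      # 1-nearest-neighbour: a high-variance learner

single, bagged = [], []
for _ in range(500):                         # repeat over independent datasets
    x = rng.uniform(0, np.pi, n)
    y = f(x) + rng.normal(0, 0.3, n)
    single.append(fit_predict(x, y))
    idx = rng.integers(0, n, size=(B, n))    # B bootstrap resamples of the same dataset
    bagged.append(np.mean([fit_predict(x[i], y[i]) for i in idx]))

print(f"prediction variance: single={np.var(single):.4f}  bagged={np.var(bagged):.4f}")
```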
Bias/variance trade-off
[Plot: generalization error as a function of model complexity; the bias term decreases and the variance term increases with complexity, with underfitting at low complexity and overfitting at high complexity.]
THM5: think in terms of the bias/variance tradeoff. Take your
preferred learning algorithm and work out how its bias and variance
are managed.
Ockham's Razor
THM6: "Pluralitas non est ponenda sine necessitate", i.e. one
should not increase, beyond what is necessary, the number of
entities required to explain anything.
This is the medieval rule of parsimony, or principle of
economy, known as Ockham’s razor.
In other words, the principle states that one should not make
more assumptions than the minimum needed.
It underlies all scientific modeling and theory building. It
admonishes us to choose from a set of otherwise equivalent
models the simplest one.
Be simple: "shave off" those concepts, variables or constructs
that are not really needed to explain the phenomenon.
Does the best exist?
Given a finite number of samples, are there any reasons to
prefer one learning algorithm over another?
If we make no assumption about the nature of the learning
task, can we expect any learning method to be superior or
inferior overall?
Can we even find an algorithm that is overall superior to (or
inferior to) random guessing?
The No Free Lunch Theorem answers NO to these questions.
No Free Lunch theorem
If the goal is to obtain good generalization performance, there
are no context-independent or usage-independent reasons
to favor one learning method over another.
If one algorithm seems to outperform another in a particular
situation, it is a consequence of its fit to the particular pattern
recognition problem, not the general superiority of the
algorithm.
The theorem also justifies skepticism about studies that
demonstrate the overall superiority of a particular learning or
recognition algorithm.
If a learning method performs well over some set of problems,
then it must perform worse than average elsewhere. No
method can perform well throughout the full set of functions.
THM7: every learning algorithm makes assumptions (most of the
time implicitly), and these assumptions make the difference.
Conclusion
Popper claimed that a theory is scientific if it is falsifiable,
i.e. if it can be contradicted by an observation or the outcome of
a physical experiment. Since prediction is the most falsifiable
aspect of science, it is also the most scientific one.
Effective machine learning is an extension of statistics, in no
way an alternative.
Simplest (i.e. linear) model first.
Modelling is more an art than an automatic process... hence
experienced data analysts are more valuable than expensive
tools.
Expert knowledge matters... and so does data.
Understanding what is predictable is as important as trying to
predict it.
All models are wrong, some of them are useful.
All that we did not discuss...
Dimensionality reduction and feature selection
Causal inference
Unsupervised learning
Active learning
Spatio-temporal prediction
Nonstationary problems
Scalable machine learning
Control and robotics
Libraries and platforms (R, Python, Weka)
Resources
A biased list ...:-)
Scoop-it
www.scoop.it/t/machine-learning-by-gianluca-bontempi
on machine learning
Scoop-it
www.scoop.it/t/probabilistic-reasoning-and-statistics
on Probabilistic reasoning, causal inference and statistics
MLG mlg.ulb.ac.be
MA course INFO-F-422 Statistical foundations of machine
learning
Handbook available at https://www.otexts.org
