Machine Learning
Lunch & Learn - Session 4
Luis Borbon
11/07/2017
Table of contents
1. Recap
2. Generalization in Machine Learning
3. Overfitting and Underfitting
4. Algorithms by Similarity
5. Real Application
6. People to follow
Recap
Recap
● Training, validation and test data sets.
● Learning Style
○ Supervised
○ Unsupervised
○ Semi-Supervised Learning.
● Similarity
○ Regression Algorithms
○ Instance-based Algorithms
○ Regularization Algorithms
○ Decision Tree Algorithms
Recap
Decision trees
Possible applications in PlantMiner:
For a searcher: Based on previous quotes,
identify an item that is usually hired along with
another.
● Suggest the item.
● Offer a discount to add the suggested
item.
For a supplier: Identify suppliers that are likely to
churn at the next subscription renewal.
Generalization in Machine Learning
Induction and deduction
Induction refers to learning general concepts
from specific examples, which is exactly the
problem that supervised machine learning
aims to solve.
Deduction works the other way around: it
derives specific conclusions from general
rules.
Induction and deduction
The goal of a good machine learning model is to
generalize well from the training data to any data
from the problem domain.
This allows us to make predictions in the future
on data the model has never seen.
Overfitting and Underfitting
Overfitting
In machine learning, one of the most common
tasks is to fit a "model" to a set of training data,
so as to be able to make reliable predictions on
unseen data that was not used for training.
In overfitting, a statistical model describes
random error or noise instead of the underlying
relationship.
The green line represents an overfitted model and the black line
represents a regularised model. While the green line best follows
the training data, it is too dependent on it and it is likely to have a
higher error rate on new unseen data, compared to the black
line.
Overfitting
A model that has been overfit has poor
predictive performance, as it overreacts to minor
fluctuations in the training data.
Noisy (roughly linear) data is fitted to both linear and polynomial
functions. Although the polynomial function is a perfect fit, the
linear version can be expected to generalize better. In other
words, if the two functions were used to extrapolate beyond the
fitted data, the linear function would make better
predictions.
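A minimal sketch of the comparison described above, in plain Python. The data points, the hand-picked noise, and the helper names `fit_line` and `lagrange_predict` are assumptions made for illustration. The polynomial here is the exact interpolant through all five points, which is one way to obtain a "perfect fit".

```python
# Toy data (an assumption): true relationship y = 2x + 1 with +/-0.5 noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.5, 2.5, 5.5, 6.5, 9.5]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes exactly through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

slope, intercept = fit_line(xs, ys)

# Extrapolate beyond the fitted range and compare against the true value.
x_new = 6.0
true_y = 2 * x_new + 1
linear_pred = slope * x_new + intercept
poly_pred = lagrange_predict(xs, ys, x_new)

print("true:", true_y)
print("linear:", linear_pred, "error:", abs(linear_pred - true_y))
print("polynomial:", poly_pred, "error:", abs(poly_pred - true_y))
```

On this toy data the line predicts about 13.1 at x = 6 (true value 13), while the interpolating polynomial predicts about 77.5, even though it reproduces every training point exactly.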
Overfitting
Overfitting occurs when a model is excessively
complex, such as having too many parameters
relative to the number of observations.
Overfitting/overtraining in supervised learning (e.g., neural
network). Training error is shown in blue, validation error in red,
both as a function of the number of training cycles. If the
validation error increases (positive slope) while the training error
steadily decreases (negative slope), then overfitting
may have occurred. The best predictive and fitted model would
be where the validation error has its global minimum.
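The rule in the caption above can be sketched in a few lines of Python. The per-cycle error values are made up for illustration, and the function names `best_epoch` and `overfitting_epochs` are hypothetical. In a real project the two lists would be recorded during training.

```python
# Hypothetical errors per training cycle (not measured from any real model).
train_errors = [0.90, 0.60, 0.40, 0.30, 0.20, 0.15, 0.10]
val_errors = [0.95, 0.70, 0.50, 0.45, 0.50, 0.60, 0.70]

def best_epoch(val_errors):
    """Index of the training cycle with the lowest validation error."""
    return min(range(len(val_errors)), key=lambda i: val_errors[i])

def overfitting_epochs(train_errors, val_errors):
    """Cycles where validation error rose while training error fell."""
    return [
        i
        for i in range(1, len(val_errors))
        if val_errors[i] > val_errors[i - 1]
        and train_errors[i] < train_errors[i - 1]
    ]

stop_at = best_epoch(val_errors)
suspect = overfitting_epochs(train_errors, val_errors)
print("keep the model from cycle", stop_at)
print("cycles showing the overfitting pattern:", suspect)
```

Keeping the model from the cycle with the lowest validation error is the idea behind early stopping.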
Underfitting
Underfitting occurs when a statistical model or machine
learning algorithm cannot capture the underlying trend of
the data.
It occurs when the model or algorithm does not fit the data
well enough. An underfit model shows low variance but high
bias (in contrast to overfitting, which shows high variance and
low bias). It is often the result of an excessively simple model.
Underfitting would occur, for example, when fitting a linear
model to non-linear data.
Such a model would have poor predictive performance.
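As a small illustration of the linear-model example above, the sketch below fits a straight line to data generated from y = x². The data and the helper name `fit_line` are assumptions chosen so that the arithmetic is easy to follow.

```python
# Toy non-linear data (an assumption): y = x^2 on a symmetric range.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)

# Error on the training data itself: an underfit model is poor even here.
train_mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print("slope:", slope, "intercept:", intercept, "training MSE:", train_mse)
```

The best straight line here is flat (slope 0, intercept 2) and misses the curve entirely. Unlike an overfit model, the error is already high on the training data.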
Underfitting
There are two important techniques that you can use
when evaluating machine learning algorithms to limit
overfitting:
● Use a resampling technique to estimate model
accuracy.
● Hold back a validation dataset.
Underfitting
Resampling
The most popular resampling technique is k-fold cross validation. It allows you to train and test your
model k times on different subsets of the training data and build up an estimate of the performance of a
machine learning model on unseen data.
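A minimal sketch of k-fold cross validation in plain Python. The function names, the toy data, and the stand-in "model" (which always predicts the mean of the training targets) are assumptions made for illustration. Libraries such as scikit-learn provide production-ready versions.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_mse(xs, ys, k, fit, predict):
    """Average mean squared error on the held-out fold, over k folds."""
    fold_scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / k

# Stand-in model (an assumption): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)

def predict_mean(model, x):
    return model

xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = cross_val_mse(xs, ys, k=3, fit=fit_mean, predict=predict_mean)
print("3-fold estimate of MSE on unseen data:", score)
```

The folds here are contiguous to keep the example deterministic. In practice the rows are usually shuffled before splitting.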
Validation dataset
A validation dataset is simply a subset of your training data that you hold back from your machine
learning algorithms until the very end of your project. After you have selected and tuned your
machine learning algorithms on your training dataset you can evaluate the learned models on the
validation dataset to get a final objective idea of how the models might perform on unseen data.
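Holding back a validation dataset can be sketched as a single shuffled split. The function name `hold_out_split`, the 20% fraction, and the fixed seed are assumptions made for illustration.

```python
import random

def hold_out_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle a copy of rows and set aside a fraction as the validation set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = int(round(len(shuffled) * holdout_fraction))
    validation = shuffled[:n_holdout]
    training = shuffled[n_holdout:]
    return training, validation

rows = list(range(10))  # stand-in for real records
training, validation = hold_out_split(rows)
print(len(training), "rows for selecting and tuning models")
print(len(validation), "rows held back until the end")
```

The held-back rows should not be touched during model selection or tuning; otherwise the final estimate is no longer objective.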
Algorithms by Similarity (cont…)
Bayesian Algorithms
Bayesian methods are those that explicitly apply
Bayes’ Theorem for problems such as
classification and regression.
Naive Bayes is widely used for document
classification based on word frequencies, e.g.
spam filtering.
With appropriate pre-processing, it is competitive
in this domain with more advanced methods
including support vector machines.
It also finds application in automatic medical
diagnosis.
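A minimal sketch of word-frequency document classification with multinomial Naive Bayes and add-one (Laplace) smoothing. The four training messages, the labels, and the function names `train_nb` and `classify` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the class with the highest log posterior under Bayes' theorem."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set.
docs = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch and learn session monday", "ham"),
]
model = train_nb(docs)
print(classify(model, "free money"))
print(classify(model, "monday meeting"))
```

The "naive" part is the assumption that words are conditionally independent given the class, which is why the per-word log likelihoods can simply be added.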
Bayesian Algorithms
The most popular Bayesian algorithms are:
● Naive Bayes
● Gaussian Naive Bayes
● Multinomial Naive Bayes
● Averaged One-Dependence Estimators
(AODE)
● Bayesian Belief Network (BBN)
● Bayesian Network (BN)
Real Application
DoseMe.com.au
Bayesian dosing uses patient data and
laboratory results to estimate a patient's ability to
absorb, process, and clear a drug from their
system. Using a published population model,
DoseMe's algorithms adjust the
pharmacokinetic and/or pharmacodynamic
parameters so that a patient-specific,
individualised drug model is built. This individual
model is then used to provide a patient-specific
dosing recommendation to reach a therapeutic
target.
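The general idea of adjusting a population parameter toward an individual patient can be illustrated with a textbook normal-normal Bayesian update. This is a toy sketch only: it is not DoseMe's algorithm, the function name `update_normal` is hypothetical, and all numbers (including the implied units) are invented for illustration.

```python
def update_normal(prior_mean, prior_sd, observed, observation_sd):
    """Posterior of a normal prior after one normally distributed observation."""
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / observation_sd ** 2
    post_precision = prior_precision + obs_precision
    # Posterior mean is a precision-weighted average of prior and observation.
    post_mean = (
        prior_precision * prior_mean + obs_precision * observed
    ) / post_precision
    post_sd = (1.0 / post_precision) ** 0.5
    return post_mean, post_sd

# Invented numbers: a population value for a drug-clearance parameter and
# one patient-specific estimate derived from a lab result.
population_mean, population_sd = 5.0, 1.0
patient_estimate, measurement_sd = 7.0, 1.0

post_mean, post_sd = update_normal(
    population_mean, population_sd, patient_estimate, measurement_sd
)
print("individualised estimate:", post_mean, "+/-", post_sd)
```

With equal uncertainty on both sides, the individualised estimate lands halfway between the population value and the patient's measurement, and its uncertainty is smaller than either input.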
People to Follow
Fei-Fei Li
Fei-Fei Li, who publishes under the name Li Fei-Fei, is an
Associate Professor of Computer Science at Stanford
University. She is the director of the Stanford Artificial
Intelligence Lab and the Stanford Vision Lab.
● Born: 1976, Beijing, China
● Spouse: Silvio Savarese
● Education: California Institute of Technology (2005)
● Residence: United States of America
● Books: Computer Vision: From 3D Reconstruction to
Visual Recognition, more
● Doctoral advisors: Pietro Perona, Christof Koch
● http://vision.stanford.edu/feifeili/
● @drfeifei
Andrej Karpathy
Director of AI at Tesla, currently focused on perception for the
Autopilot.
Previously a Research Scientist at OpenAI working on
Deep Learning in Computer Vision, Generative Modeling and
Reinforcement Learning.
PhD from Stanford, where he worked with Fei-Fei Li on
Convolutional/Recurrent Neural Network architectures and
their applications in Computer Vision, Natural Language
Processing and their intersection.
● http://cs.stanford.edu/people/karpathy/
● @karpathy
OpenAI Gym
Founded: December 11, 2015
Founders: Elon Musk, Sam Altman, and others
Type: 501(c)(3) Nonprofit organization
Location: San Francisco, California, USA
Products: OpenAI Gym
Mission: Friendly artificial intelligence
● https://www.openai.com/
● @OpenAI
