Advice for applying
Machine Learning
Andrew Ng
Stanford University
Today’s Lecture
• Advice on how to get learning algorithms to work in different applications.
• Most of today’s material is not very mathematical. But it’s also some of the
hardest material in this class to understand.
• Some of what I’ll say today is debatable.
• Some of what I’ll say is not good advice for doing novel machine learning
research.
• Key ideas:
1. Diagnostics for debugging learning algorithms.
2. Error analyses and ablative analysis.
3. How to get started on a machine learning problem.
– Premature (statistical) optimization.
Debugging Learning
Algorithms
Debugging learning algorithms
Motivating example:
• Anti-spam. You carefully choose a small set of 100 words to use as
features. (Instead of using all 50000+ words in English.)
• Bayesian logistic regression, implemented with gradient descent, gets 20%
test error, which is unacceptably high.
• What to do next?
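For concreteness, here is a minimal sketch (not from the original slides) of what such a hand-picked feature representation might look like; the vocabulary shown is a hypothetical stand-in for the 100 chosen words.

```python
import re

# Hypothetical hand-picked vocabulary (100 words in practice; a few shown here).
VOCAB = ["viagra", "lottery", "unsubscribe", "meeting", "invoice"]

def extract_features(email_text):
    """Binary bag-of-words: feature j is 1 if VOCAB[j] appears in the email."""
    tokens = set(re.findall(r"[a-z']+", email_text.lower()))
    return [1 if word in tokens else 0 for word in VOCAB]

x = extract_features("You have won the lottery! Reply to claim your prize.")
# x == [0, 1, 0, 0, 0]
```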
Fixing the learning algorithm
• Bayesian logistic regression:
• Common approach: Try improving the algorithm in different ways.
– Try getting more training examples.
– Try a smaller set of features.
– Try a larger set of features.
– Try changing the features: Email header vs. email body features.
– Run gradient descent for more iterations.
– Try Newton’s method.
– Use a different value for λ.
– Try using an SVM.
• This approach might work, but it’s very time-consuming, and it’s largely a matter
of luck whether you end up fixing what is actually wrong.
Diagnostic for bias vs. variance
Better approach:
– Run diagnostics to figure out what the problem is.
– Fix whatever the problem is.
Bayesian logistic regression’s test error is 20% (unacceptably high).
Suppose you suspect the problem is either:
– Overfitting (high variance).
– Too few features to classify spam (high bias).
Diagnostic:
– Variance: Training error will be much lower than test error.
– Bias: Training error will also be high.
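One way to run this diagnostic is to plot learning curves: train on increasingly large subsets of the data and compare training vs. test error. A minimal sketch using scikit-learn (plain L2-regularized logistic regression stands in for the Bayesian version; the feature matrix X and labels y are assumed to exist):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# X, y: features and labels for the spam task (assumed given).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

sizes = np.linspace(50, len(X_train), 10, dtype=int)
train_err, test_err = [], []
for m in sizes:
    clf = LogisticRegression(max_iter=1000).fit(X_train[:m], y_train[:m])
    train_err.append(1 - clf.score(X_train[:m], y_train[:m]))
    test_err.append(1 - clf.score(X_test, y_test))

plt.plot(sizes, train_err, label="Training error")
plt.plot(sizes, test_err, label="Test error")
plt.xlabel("m (training set size)"); plt.ylabel("error"); plt.legend(); plt.show()
# Large, persistent gap between the curves -> high variance;
# both curves high and close together -> high bias.
```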
More on bias vs. variance
Typical learning curve for high variance:
[Plot: learning curve for high variance — error vs. m (training set size), showing training error, test error, and the desired performance level.]
• Test error still decreasing as m increases. Suggests a larger training set will help.
• Large gap between training and test error.
More on bias vs. variance
Typical learning curve for high bias:
[Plot: learning curve for high bias — error vs. m (training set size), showing training error, test error, and the desired performance level.]
• Even training error is unacceptably high.
• Small gap between training and test error.
Diagnostics tell you what to try next
Bayesian logistic regression, implemented with gradient descent.
Fixes to try:
– Try getting more training examples. → Fixes high variance.
– Try a smaller set of features. → Fixes high variance.
– Try a larger set of features. → Fixes high bias.
– Try email header features. → Fixes high bias.
– Run gradient descent for more iterations.
– Try Newton’s method.
– Use a different value for λ.
– Try using an SVM.
Optimization algorithm diagnostics
• Bias vs. variance is one common diagnostic.
• For other problems, it’s usually up to your own ingenuity to construct your
own diagnostics to figure out what’s wrong.
• Another example:
– Bayesian logistic regression gets 2% error on spam, and 2% error on non-spam.
(Unacceptably high error on non-spam.)
– SVM using a linear kernel gets 10% error on spam, and 0.01% error on non-
spam. (Acceptable performance.)
– But you want to use logistic regression, because of computational efficiency, etc.
• What to do next?
More diagnostics
• Other common questions:
– Is the algorithm (gradient descent for logistic regression) converging?
[Plot: objective J(θ) vs. number of iterations.]
It’s often very hard to tell if an algorithm has converged yet by looking at the objective.
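One complementary diagnostic, sketched below under the assumption of a plain gradient-ascent implementation of L2-regularized logistic regression, is to track the objective together with the gradient norm: a gradient norm that is still far from zero says the run has not converged, even when the objective curve looks flat.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lam=1.0, alpha=0.01, iters=500):
    """Gradient ascent on the L2-regularized log-likelihood J(theta); y in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for t in range(iters):
        p = sigmoid(X @ theta)
        J = np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) \
            - 0.5 * lam * theta @ theta
        grad = X.T @ (y - p) - lam * theta
        if t % 50 == 0:
            # J should rise and flatten; ||grad|| should shrink toward 0 at convergence.
            print(f"iter {t:4d}   J(theta) = {J:10.4f}   ||grad|| = {np.linalg.norm(grad):.6f}")
        theta = theta + alpha * grad
    return theta
```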
More diagnostics
• Other common questions:
– Is the algorithm (gradient descent for logistic regression) converging?
– Are you optimizing the right function?
– I.e., what you care about: weighted accuracy a(θ) = Σi w(i) 1{hθ(x(i)) = y(i)}
(weights w(i) higher for non-spam than for spam).
– Bayesian logistic regression? Correct value for λ?
– SVM? Correct value for C?
Diagnostic
An SVM outperforms Bayesian logistic regression, but you really want to deploy Bayesian
logistic regression for your application.
Let θSVM be the parameters learned by an SVM.
Let θBLR be the parameters learned by Bayesian logistic regression.
You care about weighted accuracy: a(θ) = Σi w(i) 1{hθ(x(i)) = y(i)}.
θSVM outperforms θBLR. So: a(θSVM) > a(θBLR).
BLR tries to maximize: J(θ) = Σi log p(y(i) | x(i), θ) − λ‖θ‖².
Diagnostic: check whether J(θSVM) > J(θBLR).
Two cases
Case 1: a(θSVM) > a(θBLR) and J(θSVM) > J(θBLR).
But BLR was trying to maximize J(θ). This means that θBLR fails to maximize J, and the
problem is with the convergence of the algorithm. Problem is with optimization
algorithm.
Case 2: a(θSVM) > a(θBLR) but J(θSVM) ≤ J(θBLR).
This means that BLR succeeded at maximizing J(θ). But the SVM, which does worse on
J(θ), actually does better on weighted accuracy a(θ).
This means that J(θ) is the wrong function to be maximizing, if you care about a(θ).
Problem is with objective function of the maximization problem.
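A sketch of the check itself; X, y, the weights w, the two fitted parameter vectors, and a function J implementing BLR's objective are all assumed to be available.

```python
import numpy as np

def weighted_accuracy(theta, X, y, w):
    """a(theta) = sum_i w(i) * 1{h_theta(x(i)) = y(i)}, with w(i) larger for non-spam."""
    preds = (X @ theta > 0).astype(int)
    return float(np.sum(w * (preds == y)))

# theta_svm, theta_blr, and a function J(theta) implementing BLR's objective are assumed given.
a_svm = weighted_accuracy(theta_svm, X, y, w)
a_blr = weighted_accuracy(theta_blr, X, y, w)
assert a_svm > a_blr  # premise of the diagnostic: the SVM wins on what you care about

if J(theta_svm) > J(theta_blr):
    print("Case 1: theta_BLR fails to maximize J -> problem is the optimization algorithm.")
else:
    print("Case 2: J is maximized, yet a(theta) is worse -> J is the wrong objective.")
```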
Diagnostics tell you what to try next
Bayesian logistic regression, implemented with gradient descent.
Fixes to try:
– Try getting more training examples. → Fixes high variance.
– Try a smaller set of features. → Fixes high variance.
– Try a larger set of features. → Fixes high bias.
– Try email header features. → Fixes high bias.
– Run gradient descent for more iterations. → Fixes optimization algorithm.
– Try Newton’s method. → Fixes optimization algorithm.
– Use a different value for λ. → Fixes optimization objective.
– Try using an SVM. → Fixes optimization objective.
The Stanford Autonomous Helicopter
Payload: 14 pounds
Weight: 32 pounds
[Diagram: machine learning (RL) algorithm coupled to a helicopter simulator.]
1. Build a simulator of helicopter.
2. Choose a cost function. Say J(θ) = ‖x − x_desired‖² (x = helicopter position)
3. Run reinforcement learning (RL) algorithm to fly helicopter in simulation, so
as to try to minimize cost function:
θRL = arg minθ J(θ)
Suppose you do this, and the resulting controller parameters θRL give much worse
performance than your human pilot. What to do next?
Improve simulator?
Modify cost function J?
Modify RL algorithm?
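As a toy illustration of step 2 (not part of the original lecture), assuming a hypothetical simulate(theta) function that returns the helicopter's simulated position trajectory for controller parameters theta:

```python
import numpy as np

def cost(theta, simulate, x_desired):
    """J(theta) = sum_t ||x_t - x_desired_t||^2 over the simulated trajectory."""
    x = simulate(theta)          # assumed simulator output: (T, 3) array of positions
    return float(np.sum((x - x_desired) ** 2))

# theta_RL is then whatever parameter vector the RL / policy-search algorithm
# finds when (approximately) minimizing this cost in simulation.
```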
Debugging an RL algorithm
The controller given by θRL performs poorly.
Suppose that:
1. The helicopter simulator is accurate.
2. The RL algorithm correctly controls the helicopter (in simulation) so as to
minimize J(θ).
3. Minimizing J(θ) corresponds to correct autonomous flight.
Then: The learned parameters θRL should fly well on the actual helicopter.
Diagnostics:
1. If θRL flies well in simulation, but not in real life, then the problem is in the
simulator. Otherwise:
2. Let θhuman be the human control policy. If J(θhuman) < J(θRL), then the problem is
in the reinforcement learning algorithm. (Failing to minimize the cost function J.)
3. If J(θhuman) ≥ J(θRL), then the problem is in the cost function. (Minimizing it
doesn’t correspond to good autonomous flight.)
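These three diagnostics translate directly into a decision procedure; a minimal sketch, with flies_well_in_sim, J, theta_rl, and theta_human all assumed to be supplied by you:

```python
def diagnose_rl(theta_rl, theta_human, J, flies_well_in_sim):
    """Mirror of diagnostics 1-3 above; all arguments are assumed helpers/values."""
    if flies_well_in_sim(theta_rl):
        return "Problem is in the simulator (good in simulation, bad in real life)."
    if J(theta_human) < J(theta_rl):
        return "Problem is in the RL algorithm (it fails to minimize J)."
    return "Problem is in the cost function (minimizing J != good autonomous flight)."
```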
More on diagnostics
• Quite often, you’ll need to come up with your own diagnostics to figure out
what’s happening in an algorithm.
• Even if a learning algorithm is working well, you might also run diagnostics to
make sure you understand what’s going on. This is useful for:
– Understanding your application problem: If you’re working on one important ML
application for months/years, it’s very valuable for you personally to get an intuitive
understanding of what works and what doesn’t work in your problem.
– Writing research papers: Diagnostics and error analysis help convey insight about
the problem, and justify your research claims.
– I.e., Rather than saying “Here’s an algorithm that works,” it’s more interesting to
say “Here’s an algorithm that works because of component X, and here’s my
justification.”
• Good machine learning practice: Error analysis. Try to understand what
your sources of error are.
Error Analysis
Error analysis
Many applications combine many different learning components into a
“pipeline.” E.g., Face recognition from images: [contrived example]
[Pipeline: Camera image → Preprocess (remove background) → Face detection → Eyes segmentation / Nose segmentation / Mouth segmentation → Logistic regression → Label.]
Error analysis
How much error is attributable to each of the
components?
Plug in ground-truth for each component, and
see how accuracy changes.
Conclusion: Most room for improvement in face
detection and eyes segmentation.
Component                          Accuracy
Overall system (no ground truth)   85%
Preprocess (remove background)     85.1%
Face detection                     91%
Eyes segmentation                  95%
Nose segmentation                  96%
Mouth segmentation                 97%
Logistic regression                100%
(Each row: overall accuracy after substituting ground truth for that component and everything above it.)
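The attribution is just the difference between successive rows; a small sketch using the numbers from the table above:

```python
# Accuracy after substituting ground truth for each component (and those above it).
accuracies = [
    ("Overall system (no ground truth)", 0.850),
    ("Preprocess (remove background)",   0.851),
    ("Face detection",                   0.910),
    ("Eyes segmentation",                0.950),
    ("Nose segmentation",                0.960),
    ("Mouth segmentation",               0.970),
    ("Logistic regression",              1.000),
]

for (_, prev), (name, acc) in zip(accuracies, accuracies[1:]):
    print(f"{name:32s} accounts for {100 * (acc - prev):4.1f} points of error")
# Face detection (5.9) and eyes segmentation (4.0) leave the most room for improvement.
```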
Ablative analysis
Error analysis tries to explain the difference between current performance and
perfect performance.
Ablative analysis tries to explain the difference between some baseline (much
poorer) performance and current performance.
E.g., Suppose that you’ve built a good anti-spam classifier by adding lots of
clever features to logistic regression:
– Spelling correction.
– Sender host features.
– Email header features.
– Email text parser features.
– Javascript parser.
– Features from embedded images.
Question: How much did each of these components really help?
Ablative analysis
Simple logistic regression without any clever features gets 94% accuracy.
What accounts for the improvement from 94% to 99.9%?
Ablative analysis: Remove components from your system one at a time, to see
how it breaks.
Conclusion: The email text parser features account for most of the
improvement.
Component                          Accuracy
Overall system                     99.9%
Spelling correction                99.0%
Sender host features               98.9%
Email header features              98.9%
Email text parser features         95%
Javascript parser                  94.5%
Features from embedded images      94.0% [baseline]
(Each row: accuracy after removing that component and every component above it.)
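A sketch of the procedure, assuming a hypothetical train_and_evaluate(feature_groups) helper that retrains the classifier using only the listed feature groups and returns test accuracy:

```python
feature_groups = [
    "spelling correction",
    "sender host features",
    "email header features",
    "email text parser features",
    "javascript parser",
    "features from embedded images",
]

remaining = list(feature_groups)
print("overall system:", train_and_evaluate(remaining))
for group in feature_groups:          # remove components one at a time, cumulatively
    remaining.remove(group)
    print(f"without {group} (and those above): {train_and_evaluate(remaining)}")
```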
Getting started on a
learning problem
Getting started on a problem
Approach #1: Careful design.
• Spend a long time designing exactly the right features, collecting the right dataset,
and designing the right algorithmic architecture.
• Implement it and hope it works.
• Benefit: Nicer, perhaps more scalable algorithms. May come up with new, elegant
learning algorithms; contribute to basic research in machine learning.
Approach #2: Build-and-fix.
• Implement something quick-and-dirty.
• Run error analyses and diagnostics to see what’s wrong with it, and fix its errors.
• Benefit: Will often get your application problem working more quickly. Faster time to
market.
Premature statistical optimization
Very often, it’s not clear what parts of a system are easy or difficult to build, and
which parts you need to spend lots of time focusing on. E.g., the face-recognition
pipeline above (Camera image → Preprocess → Face detection → Eyes / Nose / Mouth
segmentation → Logistic regression → Label), annotated: “This system’s much too
complicated for a first attempt. Step 1 of designing a learning system: Plot the data.”
The only way to find out what needs work is to implement something quickly,
and find out what parts break.
[But this may be bad advice if your goal is to come up with new machine
learning algorithms.]
The danger of over-theorizing
[Based on Papadimitriou, 1995]
[Diagram: a chain of ever more abstract sub-problems, starting from the application — mail delivery robot, obstacle avoidance, robot manipulation, navigation, object detection — and drifting into theory: color invariance, 3d similarity learning, differential geometry of 3d manifolds, complexity of non-Riemannian geometries, …, convergence bounds for sampled non-monotonic logic, VC dimension.]
Summary
• Time spent coming up with diagnostics for learning algorithms is time well-
spent.
• It’s often up to your own ingenuity to come up with the right diagnostics.
• Error analyses and ablative analyses also give insight into the problem.
• Two approaches to applying learning algorithms:
– Design very carefully, then implement.
• Risk of premature (statistical) optimization.
– Build a quick-and-dirty prototype, diagnose, and fix.