Limits of Machine Learning
Picking problems to solve with ML
MEIR MAOR
Chief Architect @ SparkBeyond
About Me
Meir Maor
Chief Architect @ SparkBeyond
At SparkBeyond we leverage the collective human knowledge to solve the world's
toughest problems
AI the current frontier
What can and cannot be done with Machine Learning
Practical advice for setting up Machine Learning problems
Feature Engineering
Hyperparameter tuning
Model Selection
Training Huge Neural Networks
AI is taking over the world
Well, not quite, but ...
With no knowledge but the rules and typical game length, AlphaZero learned to play
both Go and Chess at superhuman level
And more.
Single-sentence Chinese<->English translation at human level
ImageNet image classification at superhuman accuracy
Finding problems in Non-Disclosure Agreements at human level
Cancer early warning, churn prediction, ad optimization, predictive maintenance,
and many, many more.
Predicting is Hard, Especially the Future - N. Bohr
We use machine learning to:
Predict outcomes, unseen behavior, future events
To automate human tasks: classify, label, prioritize
To Assume makes an Ass out of U and Me
Common assumptions in Machine Learning:
* Data comes from a stationary distribution (train and test share the same distribution)
* Training samples are independent
* Past actions were random; there are no hidden confounders
The unreasonable effectiveness of Data
With plenty of data, machine learning becomes easy
The recent rise of Deep Learning stems not from algorithmic advances but from the
existence of large data sets and computers that can process them
With enough data, you are essentially learning from examples almost identical to
those you later need to predict/classify
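A toy illustration of this point (hypothetical data, not from the talk): with enough samples, even plain 1-nearest-neighbor memorization works, because some training example sits almost on top of each query.

```python
import random

def nearest_neighbor_predict(train, query):
    """Predict the label of the training point closest to the query."""
    x, y = min(train, key=lambda p: abs(p[0] - query))
    return y

random.seed(0)
# Ground truth: label is 1 when x > 0.5, else 0.
train = [(x, int(x > 0.5)) for x in (random.random() for _ in range(1000))]

# With 1000 samples, a near-identical neighbor exists for any query far
# from the decision boundary, so memorization alone gives the right answer.
print(nearest_neighbor_predict(train, 0.05))  # 0
print(nearest_neighbor_predict(train, 0.95))  # 1
```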
The perfect fit
I have tremendous computing power, let’s find a function which perfectly describes
my data
PAC learning model (Leslie Valiant, 1984)
No Free Lunch theorem
Better generalization
Compromise
Bias / Variance tradeoff → We must limit our search space
Shrink the hypothesis space:
limit boosting iterations
tree size
min samples per leaf
number of hidden nodes
impose sparsity constraint
...
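A minimal sketch of why shrinking the hypothesis space helps (pure Python, made-up data): a "memorizer" with an unlimited hypothesis space fits label noise perfectly but generalizes at chance level, while a single-threshold stump — a drastically smaller hypothesis space — recovers the true rule.

```python
import random

random.seed(1)

def noisy_sample(n, flip=0.2):
    """1-D points labeled by x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < flip:
            y = 1 - y
        data.append((x, y))
    return data

train, test = noisy_sample(300), noisy_sample(300)

# Memorizer: unlimited hypothesis space, falls back to the majority class
# on unseen points.
lookup = {x: y for x, y in train}
majority = int(sum(y for _, y in train) >= len(train) / 2)
def memo(x):
    return lookup.get(x, majority)

# Stump: hypothesis space limited to single thresholds over x.
def fit_stump(data):
    return max((x for x, _ in data),
               key=lambda t: sum((x > t) == y for x, y in data))

threshold = fit_stump(train)
def stump(x):
    return int(x > threshold)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memo, train))   # 1.0 -- a perfect fit to the noise
print(accuracy(memo, test))    # near chance on unseen points
print(accuracy(stump, test))   # close to the 0.8 noise ceiling
```

The stump cannot memorize the flipped labels, and that inability is exactly what makes it generalize.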
Compromise cont.
Penalize “less favourable” models:
Lasso / Ridge regularization
Bagging / bootstrap sampling
Dropout
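The penalty idea in its smallest form (a sketch, not from the slides): for a one-dimensional least-squares fit through the origin, ridge regression has the closed form w = Σxy / (Σx² + λ), so any λ > 0 shrinks the coefficient toward zero.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge solution for y ≈ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

w_ols   = ridge_1d(xs, ys, lam=0.0)   # ordinary least squares
w_ridge = ridge_1d(xs, ys, lam=5.0)   # penalized fit

print(round(w_ols, 3))    # 1.99
print(round(w_ridge, 3))  # smaller: the penalty shrinks w toward zero
```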
Needle in a Haystack
All the Hay in the world won’t teach you what a needle looks like
It’s important to have enough samples of the rare class
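A tiny numeric illustration (hypothetical counts): with 1% needles, a model that never predicts "needle" looks excellent on accuracy, which is why you need enough samples of the rare class and a metric like recall.

```python
# 1,000 samples: 10 needles (1), 990 pieces of hay (0).
labels = [1] * 10 + [0] * 990
predictions = [0] * 1000          # a lazy model: "everything is hay"

tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = tp / sum(labels)

print(accuracy)  # 0.99 -- looks great
print(recall)    # 0.0  -- finds no needles at all
```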
What is normal?
Unsupervised learning tries to find a needle using only hay.
Abnormal Hay?
- Extra long?
- A piece of grain?
- Too long in the sun?
- Still green?
- A Needle!
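A sketch of the "abnormal hay" problem (made-up measurements): a simple z-score detector flags whatever deviates from the bulk, with no way to know whether the outlier is a needle or merely an unusual straw.

```python
from statistics import mean, stdev

# Lengths in cm: mostly hay around 10, one extra-long straw, one short needle.
lengths = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 18.0, 2.0]

m, s = mean(lengths), stdev(lengths)
# Flag anything more than two standard deviations from the mean.
anomalies = [x for x in lengths if abs(x - m) > 2 * s]

print(anomalies)  # [18.0, 2.0] -- the long straw and the needle, undistinguished
```

The detector finds both "abnormal" items; deciding which one is the needle still requires labels or domain knowledge.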
Transfer Learning
Learning to solve one problem and applying the result to a similar but different problem
Any time samples don't come from exactly the same distribution:
Learning from one area and applying to the next
Learning from the past and applying to the future
Learning from a non-random sample
Transfer Learning to the rescue
We can and must learn from previous problems
How can a child learn to identify a Ring-Tailed Lemur from a single photo?
The computer isn’t there yet
We can use pre-trained embeddings and pre-trained networks, but only for similar
problems. Good results for text, some for images (similar domains), not so much
beyond.
Even with excellent help, a computer can't learn a new animal from a single photo
What If?
When we wonder about future actions and past actions were not random, we are
doing transfer learning.
Randomized Study
It is always preferable to train machine learning on data where actions were taken at random
Explore vs Exploit
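Explore vs. exploit in its simplest form (a hypothetical two-action example): an ε-greedy policy mostly exploits the best-looking action but keeps exploring at random, which is also what generates the randomized data a future model needs.

```python
import random

random.seed(42)
true_rates = [0.3, 0.8]   # hidden success rate of each action
counts = [0, 0]           # times each action was taken
successes = [0, 0]
eps = 0.1                 # exploration rate

for _ in range(2000):
    if random.random() < eps or 0 in counts:
        arm = random.randrange(2)   # explore: pick an action at random
    else:
        # exploit: pick the action with the best observed success rate
        arm = max(range(2), key=lambda a: successes[a] / counts[a])
    counts[arm] += 1
    successes[arm] += random.random() < true_rates[arm]

print(counts)  # the better action receives the vast majority of the pulls
```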
When was that?
We can only train on information we will actually have at prediction time.
We must know: what did we know back then?
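One way to honor "what did we know back then" (a sketch with hypothetical records): compute every feature from a snapshot filtered to events strictly before the prediction date, so future information cannot leak into training.

```python
from datetime import date

purchases = [
    {"customer": "a", "on": date(2020, 1, 5)},
    {"customer": "a", "on": date(2020, 3, 9)},
    {"customer": "a", "on": date(2020, 7, 2)},   # happens after the cutoff
]

def purchases_before(events, customer, as_of):
    """Point-in-time feature: count only events known strictly before `as_of`."""
    return sum(e["customer"] == customer and e["on"] < as_of for e in events)

# Building a training row as of 2020-06-01: the July purchase must not count.
print(purchases_before(purchases, "a", date(2020, 6, 1)))  # 2
```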
Mining for Unobtainium*
A client wants to find new Unobtainium deposits in the never-never lands.
A large part of the land has been explored, and we have a map of the mines
Many areas were not explored; for those we have no map
* Identifying client details have been changed
Modelling Take 1
Place a grid over the never-never land map
All grid squares with a known deposit are positive
Since Unobtainium is rare, all other squares can be assumed negative
Use advanced imaging, radiometric, magnetic, topographic, and geological
maps, and more, as explanatory variables.
99% AUC!! We are going to be rich!
Using topographic data, a big hole in the ground predicts a large deposit perfectly.
We are detecting existing, active mines.
Back to the archives to find 50-year-old maps from before most mines were opened.
96% AUC! We are going to be rich!
Distance from roads is an excellent predictor.
Not only do all existing mines have roads leading to them,
Past exploration was primarily in accessible areas
Removing the road features is not enough; their signal is hidden throughout the data.
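A toy version of the road problem (fabricated grid data): labels exist only where exploration happened, so any feature correlated with accessibility — not just "distance from road" — predicts the label, and dropping the road column does not remove the leak.

```python
# Each grid cell: was it accessible to explorers, and an innocent-looking
# proxy feature (say, distance to the nearest town) that tracks accessibility.
cells = [
    {"accessible": 1, "dist_to_town": 2.0,  "labeled_deposit": 1},
    {"accessible": 1, "dist_to_town": 3.0,  "labeled_deposit": 1},
    {"accessible": 1, "dist_to_town": 1.0,  "labeled_deposit": 0},
    {"accessible": 0, "dist_to_town": 9.0,  "labeled_deposit": 0},
    {"accessible": 0, "dist_to_town": 8.0,  "labeled_deposit": 0},
    {"accessible": 0, "dist_to_town": 10.0, "labeled_deposit": 0},
]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Even with the road/accessibility column removed, the proxy still
# "predicts" deposits -- it predicts where we looked, not where the ore is.
r = corr([c["dist_to_town"] for c in cells],
         [c["labeled_deposit"] for c in cells])
print(round(r, 2))  # -0.59: remote cells "never" have deposits
```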
Limits of Machine Learning
Observational study, Simpson’s Paradox
Obviously we should look at the breakdown:
Low birth-weight babies born to smoking mothers have lower mortality rates
compared to similar-weight babies of non-smokers.
Normal birth-weight babies born to smoking mothers have lower mortality rates
compared to similar-weight babies of non-smokers.
Ergo, smoking is good for your baby!
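The paradox with made-up but structurally faithful counts: in each weight stratum the smokers' babies fare better, yet in aggregate they fare worse, because smoking shifts babies into the high-risk low-weight group.

```python
# (deaths, births) per group -- hypothetical counts for illustration only.
data = {
    ("smoker", "low"):       (40, 400),
    ("smoker", "normal"):    (12, 600),
    ("nonsmoker", "low"):    (15, 100),
    ("nonsmoker", "normal"): (27, 900),
}

def rate(deaths, births):
    return deaths / births

# Within each birth-weight stratum, smokers look better...
low_s  = rate(*data[("smoker", "low")])       # 0.10
low_n  = rate(*data[("nonsmoker", "low")])    # 0.15
norm_s = rate(*data[("smoker", "normal")])    # 0.02
norm_n = rate(*data[("nonsmoker", "normal")]) # 0.03
print(low_s < low_n, norm_s < norm_n)         # True True

# ...but in aggregate, smokers look worse: Simpson's paradox.
total_s = rate(40 + 12, 400 + 600)            # 0.052
total_n = rate(15 + 27, 100 + 900)            # 0.042
print(total_s > total_n)                      # True
```

The reversal happens because smoking changes the mix of strata (far more low-weight births), not because it protects any individual baby.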
Summary
Need a stationary distribution; need to predict on samples like those trained on
Need data, especially on rare events
Need to be aware of a changing world
Measure & evaluate everything!

Editor's Notes

  • #4: Simple models that rely on a few well-established concepts, yet don't rely too heavily on any one source, are more likely to represent the underlying distribution of the data. More than that, they seem to hold up better to some changes in that distribution. Models that are a priori more likely remain more likely on a new distribution.
  • #17: Background/foreground, 3D object rotations, identifying limbs and their motion, hidden parts, ...