Limits of Machine Learning
Picking problems to solve with ML
MEIR MAOR
Chief Architect @ SparkBeyond
About Me
Meir Maor
Chief Architect @ SparkBeyond
At SparkBeyond we leverage the collective human knowledge to solve the world's
toughest problems
AI the current frontier
What can and cannot be done with Machine Learning
Practical advice for setting up Machine Learning problems
Feature Engineering
Hyperparameter tuning
Model Selection
Training Huge Neural Networks
AI is taking over the world
Well, not quite, but ...
With no knowledge but the rules and typical game length, AlphaZero learned to play
both Go and Chess at superhuman level
And more.
Single-sentence Chinese<->English translation at human level
ImageNet image classification at superhuman accuracy
Finding problems in Non-Disclosure Agreements at human level
Cancer early warning, churn prediction, ad optimization, predictive maintenance,
and many, many more.
Predicting is Hard, Especially the Future - N. Bohr
We use machine learning to:
Predict outcomes, unseen behavior, future events
To automate human tasks: classify, label, prioritize
To Assume makes an Ass out of U and Me
Common assumptions in Machine Learning:
* Data comes from a stationary distribution (train and test share the same distribution)
* Training samples are independent
* Past actions were random; there are no hidden confounders
The unreasonable effectiveness of Data
With plenty of data, machine learning becomes easy
The recent rise of Deep Learning stems not from algorithmic advances but from the
existence of large data sets and computers that can process them
With enough data, you are essentially learning from examples almost identical to
those you later need to predict/classify
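A toy illustration of this point (hypothetical data, not from the talk): with enough samples, even plain 1-nearest-neighbor memorization works, because some training example sits almost on top of each query.

```python
import random

def nearest_neighbor_predict(train, query):
    """Predict the label of the training point closest to the query."""
    x, y = min(train, key=lambda p: abs(p[0] - query))
    return y

random.seed(0)
# Ground truth: label is 1 when x > 0.5, else 0.
train = [(x, int(x > 0.5)) for x in (random.random() for _ in range(1000))]

# With 1000 samples, a near-identical neighbor exists for any query far
# from the decision boundary, so memorization alone gives the right answer.
print(nearest_neighbor_predict(train, 0.05))  # 0
print(nearest_neighbor_predict(train, 0.95))  # 1
```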
The perfect fit
I have tremendous computing power, let’s find a function which perfectly describes
my data
PAC learning model (Leslie Valiant, 1984)
No Free Lunch theorem
Better generalization
Compromise
Bias / Variance tradeoff → We must limit our search space
Shrink the hypothesis space:
limit boosting iterations
tree size
min samples per leaf
number of hidden nodes
impose sparsity constraint
...
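A minimal sketch of why shrinking the hypothesis space helps (pure Python, made-up data): a "memorizer" with an unlimited hypothesis space fits label noise perfectly but generalizes at chance level, while a single-threshold stump — a drastically smaller hypothesis space — recovers the true rule.

```python
import random

random.seed(1)

def noisy_sample(n, flip=0.2):
    """1-D points labeled by x > 0.5, with a fraction of labels flipped."""
    data = []
    for _ in range(n):
        x = random.random()
        y = int(x > 0.5)
        if random.random() < flip:
            y = 1 - y
        data.append((x, y))
    return data

train, test = noisy_sample(300), noisy_sample(300)

# Memorizer: unlimited hypothesis space, falls back to the majority class
# on unseen points.
lookup = {x: y for x, y in train}
majority = int(sum(y for _, y in train) >= len(train) / 2)
def memo(x):
    return lookup.get(x, majority)

# Stump: hypothesis space limited to single thresholds over x.
def fit_stump(data):
    return max((x for x, _ in data),
               key=lambda t: sum((x > t) == y for x, y in data))

threshold = fit_stump(train)
def stump(x):
    return int(x > threshold)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memo, train))   # 1.0 -- a perfect fit to the noise
print(accuracy(memo, test))    # near chance on unseen points
print(accuracy(stump, test))   # close to the 0.8 noise ceiling
```

The stump cannot memorize the flipped labels, and that inability is exactly what makes it generalize.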
Compromise cont.
Penalize “less favourable” models:
Lasso / Ridge regularization
Bagging / bootstrap sampling
Dropout
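The penalty idea in its smallest form (a sketch, not from the slides): for a one-dimensional least-squares fit through the origin, ridge regression has the closed form w = Σxy / (Σx² + λ), so any λ > 0 shrinks the coefficient toward zero.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge solution for y ≈ w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x

w_ols   = ridge_1d(xs, ys, lam=0.0)   # ordinary least squares
w_ridge = ridge_1d(xs, ys, lam=5.0)   # penalized fit

print(round(w_ols, 3))    # 1.99
print(round(w_ridge, 3))  # smaller: the penalty shrinks w toward zero
```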
Needle in a Haystack
All the Hay in the world won’t teach you what a needle looks like
It’s important to have enough samples of the rare class
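A tiny numeric illustration (hypothetical counts): with 1% needles, a model that never predicts "needle" looks excellent on accuracy, which is why you need enough samples of the rare class and a metric like recall.

```python
# 1,000 samples: 10 needles (1), 990 pieces of hay (0).
labels = [1] * 10 + [0] * 990
predictions = [0] * 1000          # a lazy model: "everything is hay"

tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
recall = tp / sum(labels)

print(accuracy)  # 0.99 -- looks great
print(recall)    # 0.0  -- finds no needles at all
```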
What is normal?
Unsupervised learning tries to find a needle using only hay.
Abnormal Hay?
- Extra long?
- A piece of grain?
- Too long in the sun?
- Still green?
- A Needle!
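A sketch of the "abnormal hay" problem (made-up measurements): a simple z-score detector flags whatever deviates from the bulk, with no way to know whether the outlier is a needle or merely an unusual straw.

```python
from statistics import mean, stdev

# Lengths in cm: mostly hay around 10, one extra-long straw, one short needle.
lengths = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 18.0, 2.0]

m, s = mean(lengths), stdev(lengths)
# Flag anything more than two standard deviations from the mean.
anomalies = [x for x in lengths if abs(x - m) > 2 * s]

print(anomalies)  # [18.0, 2.0] -- the long straw and the needle, undistinguished
```

The detector finds both "abnormal" items; deciding which one is the needle still requires labels or domain knowledge.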
Transfer Learning
Learning to solve one problem and applying the result to a similar but different problem
Any time samples don't come from exactly the same distribution:
Learning from one area and applying to the next
Learning from the past and applying to the future
Learning from a non-random sample
Transfer Learning to the rescue
We can and must learn from previous problems
How can a child learn to identify a Ring-Tailed Lemur from a single photo?
The computer isn’t there yet
We can use pre-trained embeddings and pre-trained networks, but only for similar
problems. Good results for text, some for images (similar domains), not so much
beyond.
Even with excellent help, a computer can't learn a new animal from a single photo
What If?
When we wonder about future actions and past actions were not random, we are
doing transfer learning.
Randomized Study
It is always preferable to train machine learning on data where actions were taken at random
Explore vs Exploit
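Explore vs. exploit in its simplest form (a hypothetical two-action example): an ε-greedy policy mostly exploits the best-looking action but keeps exploring at random, which is also what generates the randomized data a future model needs.

```python
import random

random.seed(42)
true_rates = [0.3, 0.8]   # hidden success rate of each action
counts = [0, 0]           # times each action was taken
successes = [0, 0]
eps = 0.1                 # exploration rate

for _ in range(2000):
    if random.random() < eps or 0 in counts:
        arm = random.randrange(2)   # explore: pick an action at random
    else:
        # exploit: pick the action with the best observed success rate
        arm = max(range(2), key=lambda a: successes[a] / counts[a])
    counts[arm] += 1
    successes[arm] += random.random() < true_rates[arm]

print(counts)  # the better action receives the vast majority of the pulls
```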
When was that?
We can only train on information we will actually have at prediction time.
We must know: what did we know back then?
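One way to honor "what did we know back then" (a sketch with hypothetical records): compute every feature from a snapshot filtered to events strictly before the prediction date, so future information cannot leak into training.

```python
from datetime import date

purchases = [
    {"customer": "a", "on": date(2020, 1, 5)},
    {"customer": "a", "on": date(2020, 3, 9)},
    {"customer": "a", "on": date(2020, 7, 2)},   # happens after the cutoff
]

def purchases_before(events, customer, as_of):
    """Point-in-time feature: count only events known strictly before `as_of`."""
    return sum(e["customer"] == customer and e["on"] < as_of for e in events)

# Building a training row as of 2020-06-01: the July purchase must not count.
print(purchases_before(purchases, "a", date(2020, 6, 1)))  # 2
```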
Mining for Unobtainium*
A client wants to find new Unobtainium deposits in the never-never lands.
A large part of the land has been explored, and we have a map of the mines
Many areas were not explored; for those we have no map
* Identifying client details have been changed
Modelling Take 1
Place a grid over the never-never land map
All grid squares with a known deposit are positive
Since Unobtainium is rare, all other squares can be assumed negative
Use advanced imaging, radiometric, magnetic, topographic, and geological
maps, and more, as explanatory variables.
99% AUC!! We are going to be rich!
Using topographic data, a big hole in the ground predicts a large deposit perfectly.
We are detecting existing, active mines.
Back to the archives to find 50-year-old maps from before most mines were opened.
96% AUC! We are going to be rich!
Distance from roads is an excellent predictor.
Not only do all existing mines have roads leading to them,
Past exploration was primarily in accessible areas
Removing the road features is not enough; their signal is hidden throughout the data.
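A toy version of the road problem (fabricated grid data): labels exist only where exploration happened, so any feature correlated with accessibility — not just "distance from road" — predicts the label, and dropping the road column does not remove the leak.

```python
# Each grid cell: was it accessible to explorers, and an innocent-looking
# proxy feature (say, distance to the nearest town) that tracks accessibility.
cells = [
    {"accessible": 1, "dist_to_town": 2.0,  "labeled_deposit": 1},
    {"accessible": 1, "dist_to_town": 3.0,  "labeled_deposit": 1},
    {"accessible": 1, "dist_to_town": 1.0,  "labeled_deposit": 0},
    {"accessible": 0, "dist_to_town": 9.0,  "labeled_deposit": 0},
    {"accessible": 0, "dist_to_town": 8.0,  "labeled_deposit": 0},
    {"accessible": 0, "dist_to_town": 10.0, "labeled_deposit": 0},
]

def corr(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Even with the road/accessibility column removed, the proxy still
# "predicts" deposits -- it predicts where we looked, not where the ore is.
r = corr([c["dist_to_town"] for c in cells],
         [c["labeled_deposit"] for c in cells])
print(round(r, 2))  # -0.59: remote cells "never" have deposits
```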
Limits of Machine Learning
Observational study, Simpson’s Paradox
Obviously we should look at the breakdown:
Low birth-weight babies born to smoking mothers have lower mortality rates
compared to similar-weight babies of non-smokers.
Normal birth-weight babies born to smoking mothers have lower mortality rates
compared to similar-weight babies of non-smokers.
Ergo, smoking is good for your baby!
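The paradox with made-up but structurally faithful counts: in each weight stratum the smokers' babies fare better, yet in aggregate they fare worse, because smoking shifts babies into the high-risk low-weight group.

```python
# (deaths, births) per group -- hypothetical counts for illustration only.
data = {
    ("smoker", "low"):       (40, 400),
    ("smoker", "normal"):    (12, 600),
    ("nonsmoker", "low"):    (15, 100),
    ("nonsmoker", "normal"): (27, 900),
}

def rate(deaths, births):
    return deaths / births

# Within each birth-weight stratum, smokers look better...
low_s  = rate(*data[("smoker", "low")])       # 0.10
low_n  = rate(*data[("nonsmoker", "low")])    # 0.15
norm_s = rate(*data[("smoker", "normal")])    # 0.02
norm_n = rate(*data[("nonsmoker", "normal")]) # 0.03
print(low_s < low_n, norm_s < norm_n)         # True True

# ...but in aggregate, smokers look worse: Simpson's paradox.
total_s = rate(40 + 12, 400 + 600)            # 0.052
total_n = rate(15 + 27, 100 + 900)            # 0.042
print(total_s > total_n)                      # True
```

The reversal happens because smoking changes the mix of strata (far more low-weight births), not because it protects any individual baby.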
Summary
Need a stationary distribution; need to predict on samples like those trained on
Need data, especially on rare events
Need to be aware of a changing world
Measure & evaluate everything!

Editor's Notes

  • #4: Simple models that rely on a few well-established concepts, yet don't rely too heavily on any one source, are more likely to represent the underlying distribution of the data. More than that, they seem to hold up better to some changes in that distribution. Models that are a priori more likely remain more likely on a new distribution.
  • #17: Background/foreground, 3D object rotations, identifying limbs and their motion, hidden parts, ...