Machine Learning - Black Art
Charles Parker
Allston Trading
Machine Learning is Hard!
• By now, you know kind of a lot

• Different types of models

• Feature engineering

• Ways to evaluate

• But you’ll still fail!

• Out in the real world, there’s a
whole bunch of things that will kill
your project

• FYI - A lot of these talks are stolen
2
Join Me!
• On a journey into the Machine Learning House of
Horrors!

• Mwa ha ha!
3
The Machine Learning House of Horrors!
• The Horror of The Huge Hypothesis Space

• The Perils of The Poorly Picked Loss Function

• The Creeping Creature Called Cross Validation

• The Dread of the Drifting Domain

• The Repugnance of Reliance on Research Results
5
Choosing A Hypothesis Space
• By “hypothesis space” we
mean the possible classifiers
you could build with an
algorithm given the data

• This is the choice you make
when you pick a learning
algorithm

• You have one job!

• Is there any way to make it
easier?
6
Theory to The Rescue!
• Probably Approximately Correct

• We’d like our model to have error less than epsilon

• We’d like that to happen at least some percentage of the time

• If the error is epsilon, the percentage is sigma, the number of
training examples is m, and the hypothesis space size is d:
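
The formula from the slide isn't reproduced here; as a hedged reconstruction, the standard bound for a finite hypothesis space (writing the slide's "sigma" as the usual failure probability delta) is

  m \ge \frac{1}{\epsilon}\left(\ln d + \ln\frac{1}{\delta}\right)

Read it as a trade-off: a bigger hypothesis space d or a smaller allowed error epsilon both demand more training examples m.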
7
The Triple Trade-Off
• There is a triple trade-off between the error, the size
of the hypothesis space, and the amount of training
data you have
8
[Diagram: a triangle linking Error, Hypothesis Space, and Training Data]
What About Huge Data?
• I’m clever, so I’ll use non-
parametric methods (Decision
tree, k-NN, kernelized SVMs)

• As data scales, curious things
tend to happen

• Simpler models become more
desirable as they’re faster to fit.

• You can increase model
complexity by adding features
(maybe word counts)

• Big data often trumps modeling!
9
The Machine Learning House of Horrors!
• The Horror of The Huge Hypothesis Space

• The Perils of The Poorly Picked Loss Function

• The Creeping Creature Called Cross Validation

• The Dread of the Drifting Domain

• The Repugnance of Reliance on Research Results
10
A Dirty Little Secret About ML Algorithms
• They don’t care what you want; each optimizes its own built-in objective (a hedged summary follows below)

• Decision Trees:

• SVM:

• LR:

• LDA:
11
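
The per-algorithm formulas on the slide above are images; as a hedged reminder of what these methods typically optimize (not necessarily what the slide showed):

• Decision Trees: a per-split purity criterion such as information gain or Gini impurity

• SVM: hinge loss plus a margin penalty, \sum_i \max(0,\, 1 - y_i w^\top x_i) + \lambda \lVert w \rVert^2

• LR (logistic regression): log loss, -\sum_i [\, y_i \log p_i + (1 - y_i) \log(1 - p_i) \,]

• LDA (reading this as linear discriminant analysis): the likelihood of Gaussian class-conditional densities with a shared covariance

None of these is your application's loss, which is exactly the point.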
Real-world Losses
• Real losses are nothing like this

• False positive in disease
diagnosis

• False positive in face
detection

• False positive in thumbprint
identification

• Some aren’t even instance-
based

• Path dependencies

• Game playing
12
Specializing Your Loss
• One solution is to let developers apply their own loss

• This is the approach of SVMlight, which has been around for a while:

http://svmlight.joachims.org/

• In decision trees, losses other than mutual information can be plugged into the
appropriate place in the splitting code

• Models trained via gradient descent can obviously be customized (Python’s
Theano is interesting for this); a minimal sketch follows below

• For multi-example loss functions, there’s SEARN in Vowpal Wabbit

https://github.com/JohnLangford/vowpal_wabbit
13
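
As mentioned above, a minimal sketch of the gradient-descent route, using a made-up asymmetric logistic loss in plain NumPy (not the SVMlight or Theano machinery the slide refers to); the costs, names, and data are all illustrative:

  import numpy as np

  def asymmetric_logistic_grad(w, X, y, fp_cost=10.0, fn_cost=1.0):
      """Gradient of a per-example weighted logistic loss: mistakes on
      negatives (false positives) cost fp_cost, mistakes on positives
      (false negatives) cost fn_cost."""
      p = 1.0 / (1.0 + np.exp(-X @ w))            # predicted P(y = 1)
      cost = np.where(y == 1, fn_cost, fp_cost)   # cost of erring on each example
      return X.T @ (cost * (p - y)) / len(y)

  def fit(X, y, lr=0.1, steps=2000):
      w = np.zeros(X.shape[1])
      for _ in range(steps):
          w -= lr * asymmetric_logistic_grad(w, X, y)
      return w

  # Stand-in data: the learned model trades recall for precision
  # relative to vanilla logistic regression.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(500, 3))
  y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(float)
  w = fit(X, y)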
Other Hackery
• Sometimes, the solution is just to hack
around the actual prediction

• Have several levels (cascade) of
classifiers in e.g., medical diagnosis, text
recognition

• Apply logic to explicitly avoid high loss
cases (e.g., when buying/selling equities)

• Changing the problem setting

• Will you be doing queries? Use ranking
or metric learning

• If you’re thinking “I want to do crazy thing x with
classifiers”, chances are it’s already been
done and you can read about it.
14
The Machine Learning House of Horrors!
• The Horror of The Huge Hypothesis Space

• The Perils of The Poorly Picked Loss Function

• The Creeping Creature Called Cross Validation

• The Dread of the Drifting Domain

• The Repugnance of Reliance on Research Results
15
When Validation Attacks!
• Cross validation

• n-Fold - Hold out one fold for
testing, train on n - 1 folds

• Great way to measure
performance, right?

• It’s all about information leakage

• via instances

• via features
16
Case Study #1: Law of Averages
• Estimate sporting event
outcomes

• Use previous games to
estimate points scored for
each team (via windowing
transform)

• Choose winner based on
predicted score

• What if you’re off by one on
the window?
17
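
A minimal pandas sketch of the off-by-one issue just described, using a tiny made-up game log; the shift(1) is what keeps the game being predicted out of its own feature window:

  import pandas as pd

  # Hypothetical game log: one row per team per game, already sorted by date.
  games = pd.DataFrame({
      "team":   ["A"] * 8 + ["B"] * 8,
      "date":   list(range(8)) * 2,
      "points": [21, 35, 17, 28, 31, 24, 14, 30,
                 10, 13, 27, 20, 38, 22, 16, 29],
  })

  # Average points over the previous 5 games, EXCLUDING the current one.
  # Dropping the shift(1) is the "off by one": the outcome of the game you
  # are predicting leaks into its own feature.
  games["avg_pts_last5"] = (
      games.groupby("team")["points"]
           .transform(lambda s: s.shift(1).rolling(5, min_periods=1).mean())
  )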
Case Study #2: Photo Dating
• Take scanned photos from
30 different users (on
average 200 per user) and
create a model to assign a
date taken (plus or minus
five years)

• Perform 10-fold cross-validation

• Accuracy is 85%. Can
you trust it?
18
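
A hedged sketch of how to check that 85%, with stand-in random data (so only the mechanics matter): plain 10-fold cross-validation lets photos from the same user (same camera, film stock, and scanner) land in both training and test folds, while group-aware folds keep each user on one side of the split.

  import numpy as np
  from sklearn.ensemble import RandomForestClassifier
  from sklearn.model_selection import GroupKFold, cross_val_score

  # Stand-in data: 30 users, 50 photos each, a few features, six date buckets.
  rng = np.random.default_rng(0)
  users = np.repeat(np.arange(30), 50)
  X = rng.normal(size=(len(users), 8))
  y = rng.integers(0, 6, size=len(users))

  model = RandomForestClassifier(n_estimators=50, random_state=0)
  naive  = cross_val_score(model, X, y, cv=10)   # photos from one user leak across folds
  honest = cross_val_score(model, X, y, cv=GroupKFold(n_splits=10), groups=users)
  print(naive.mean(), honest.mean())

With real photo data, the gap between the two numbers is roughly how much of the 85% is the model recognizing users rather than dating photos.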
Case Study #3: Moments In Time
• You have a buy/sell
opportunity every five
seconds

• The signals you use to
evaluate the opportunity
are aggregates of market
activity over the last five
minutes

• How careful must you be
with cross-validation?
19
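
Very careful. A hedged sketch using scikit-learn's TimeSeriesSplit (the gap argument needs a reasonably recent version): train strictly on the past and leave a gap at least as long as the feature window (60 five-second steps, i.e. five minutes) so no test example's aggregates overlap the training data.

  import numpy as np
  from sklearn.model_selection import TimeSeriesSplit

  n = 10_000                                  # one opportunity every five seconds
  X = np.random.normal(size=(n, 4))           # stand-in five-minute aggregates
  y = np.random.randint(0, 2, size=n)         # stand-in buy/sell label

  # gap=60 drops five minutes of samples between train and test, so the
  # aggregates feeding a test example never include training-period activity.
  tscv = TimeSeriesSplit(n_splits=5, gap=60)
  for train_idx, test_idx in tscv.split(X):
      assert train_idx.max() + 60 < test_idx.min()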
The Machine Learning House of Horrors!
• The Horror of The Huge Hypothesis Space

• The Perils of The Poorly Picked Loss Function

• The Creeping Creature Called Cross Validation

• The Dread of the Drifting Domain

• The Repugnance of Reliance on Research Results
20
Breaking Machine Learning
• You’ve got this great model!
Congratulations!

• Suddenly it stops working.
Why?

• You might be in a domain
that tends to change over
time (document classification,
sales prediction)

• You might be experiencing
adverse selection (market
data predictions, spam)
21
Concept Drift
• This is called non-stationarity in either the prior or the conditional
distributions

• Could be a couple of different things

• If the prior p(input) is changing, it’s covariate shift

• If the conditional p(output | input) is changing, it’s concept drift

• No rule that it can’t be both

• http://blog.bigml.com/2013/03/12/machine-learning-from-streaming-data-two-problems-two-solutions-two-concerns-and-two-lessons/
22
Take Action!
• First: Look for symptoms

• Getting a lot of errors

• The distribution of predicted values changes

• Drift detection algorithms (that I know about) have the same basic flavor:

• Buffer some data in memory

• If recent data is “different” from past data, retrain, update or give up (a minimal sketch follows below)

• Some resources - A nice survey paper and an open source package:
23
http://www.win.tue.nl/~mpechen/publications/pubs/Gama_ACMCS_AdaptationCD_accepted.pdf

http://moa.cms.waikato.ac.nz/
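
A minimal sketch of the two-window flavor described above, watching the distribution of predicted values with a KS test; real packages such as MOA do far more than this:

  from collections import deque
  from scipy.stats import ks_2samp

  class DriftMonitor:
      """Compare a buffer of recent predictions against a reference window
      and flag when the two distributions diverge."""
      def __init__(self, window=1000, alpha=0.01):
          self.reference = deque(maxlen=window)
          self.recent = deque(maxlen=window)
          self.alpha = alpha

      def update(self, predicted_value):
          if len(self.reference) < self.reference.maxlen:
              self.reference.append(predicted_value)   # still filling the baseline
              return False
          self.recent.append(predicted_value)
          if len(self.recent) < self.recent.maxlen:
              return False
          _, p = ks_2samp(list(self.reference), list(self.recent))
          return p < self.alpha                        # True: retrain, update, or give up

Feed it one prediction at a time as you score live traffic; when update() returns True, kick off retraining on the buffered data (and reset the windows).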
The Benefits of Archeology
• Why might you train on old
data, even if it’s not relevant?

• Verification of your research
process

• You’d have done the same thing last year. Did it work?

• Gives you a good idea of
how much drift you should
expect
24
The Machine Learning House of Horrors!
• The Horror of The Huge Hypothesis Space

• The Perils of The Poorly Picked Loss Function

• The Creeping Creature Called Cross Validation

• The Dread of the Drifting Domain

• The Repugnance of Reliance on Research Results
25
Publish or Perish
• Academic papers are a certain type of
result

• Show incremental improvement in
accuracy or generality

• Prove something about your
algorithm

• The latter is hard to come by as results
get more realistic

• Machine learning proofs assume data
is “i.i.d”, but this is obviously false.

• Real world data sucks, and dealing
with that significantly changes the
dataset
26
Usefulness of Results
• Theoretical Results

• Most of the time bounds do not apply (error, sample
complexity, convergence)

• Sometimes they don’t even make any sense

• Beware of putting too much faith in a single person or single
person’s work

• Usefulness generally occurs only in the aggregate

• And sometimes not even then (researchers are people, too)
27
Machine Learning Isn’t About Machine Learning
• Why doesn’t it work like in the
paper?

• Remember, the paper is carefully
controlled in a way your application
is not.

• Performance is rarely driven by
machine learning

• It’s driven by cameras and
microphones

• It’s driven by Mario Draghi
28
So, Don’t Bother With It?
• Of course not!

• What’s the alternative?

• “All our science, measured
against reality, is primitive
and childlike — and yet it is
the most precious thing we
have” - Albert Einstein

• Use academia as your
starting point, but don’t
think it will get you out of
the work
29
Some Themes
• The major points of this talk:

• Machine learning is hard to get right

• The algorithms won’t do what you want

• Good results are probably spurious

• Even if they aren’t, they won’t last

• Reading the research won’t help

• Wait, no!

• Have an attitude of skeptical optimism (or optimal skepticism?)
30
