Morning class summary
Mercè Martín
BigML
Day 2
The Future of ML
José David Martín-Guerrero (IDAL, UV)
Machine learning project
All steps are connected and feedback is essential to succeed
Society has drifted to the Machine Learning way
social networks, data acquisition, technologies...
Feature engineering challenges
High space dimensionality (#features >>> #samples)
Input preparation: selection, transformation, or attacking the model directly with the raw inputs
Modelling strategies: paradox of choice
Too many algorithms and structures, no general-purpose one?
Too many configuration options, no automatic choice?
Select your model by its structure, its parameters (tuning) or the
search algorithm (e.g. deep learning: no feature engineering
but heavy tuning; Azure: many choices to make)
Wish list: more automation
Workflows, model selection, tuning, representation,
prediction strategies
The Future of ML
Existing techniques: Reinforcement learning
Is the environment definable as a state space?
Is the evolution of this space driven by a set of actors?
Is there a goal to be maximized in the long term?
If so, the problem is suitable for RL
Key ingredients: prior experience, interaction, adaptation to the environment, policy
So far applied to synthetic problems and robotics but also suitable for
marketing or medicine, and more to come!
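To make those ingredients concrete (state space, actors, a long-term goal, a learned policy), here is a minimal tabular Q-learning sketch. It is not from the talk, and the Gym-style environment object (`reset`/`step`) is a hypothetical stand-in:

```python
import numpy as np

# Minimal tabular Q-learning sketch. `env` is a hypothetical Gym-style
# environment: reset() -> state, step(action) -> (state, reward, done).
def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy policy: explore sometimes, exploit otherwise
            a = (np.random.randint(n_actions) if np.random.rand() < epsilon
                 else int(np.argmax(Q[s])))
            s_next, reward, done = env.step(a)
            # long-term goal: bootstrap on the best value of the next state
            Q[s, a] += alpha * (reward + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q  # the learned policy is argmax over actions in each state
```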
Evaluating ML Algorithms II
GOLDEN RULE: Never use the same
example for training the model and
evaluating it!!
What if you don't have so much data? Sample and repeat!
José Hernández-Orallo (UPV)
Under-fitting: too general
Over-fitting: too specific
How can we detect them? By evaluating
Evaluating ML Algorithms II
[Diagram: the data is split into training and test sets; learning on the training split yields hypotheses h1…hn, each evaluated on the test split, repeated n times over n folds]
Cross-validation
o We take all possible combinations with n−1 folds for training and the remaining fold for test.
o The error (or any other metric) is calculated n times and then averaged.
o A final model is trained with all the data.
Bootstrapping
o We draw n samples with replacement for training and test on the remaining (out-of-bag) instances.
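A minimal sketch of both resampling schemes with scikit-learn and numpy (assuming numpy arrays `X` and `y` are already loaded; the decision tree is just a placeholder estimator):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()

# n-fold cross-validation: train on n-1 folds, test on the held-out fold,
# and average the metric over the n repetitions.
scores = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True))
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Bootstrapping: draw n instances with replacement for training and
# evaluate on the left-out ("out-of-bag") instances.
n = len(X)
boot = np.random.choice(n, size=n, replace=True)
oob = np.setdiff1d(np.arange(n), boot)
model.fit(X[boot], y[boot])
print("OOB accuracy:", model.score(X[oob], y[oob]))

# A final model is then trained with all the data.
model.fit(X, y)
```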
Evaluating ML Algorithms II
Cost-sensitive evaluations: not all errors are equally costly
Resulting matrix = Cost matrix ∘ Confusion matrix (element-wise Hadamard product)

Cost matrix (rows: actual, columns: predicted):
          open     close
OPEN        0€      100€
CLOSE   2,000€        0€

Confusion matrices:
c1        open     close
OPEN       300       500
CLOSE      200    99,000

c2        open     close
OPEN         0         0
CLOSE      500    99,500

c3        open     close
OPEN       400     5,400
CLOSE      100    94,100

Resulting matrices:
c1: 0€, 50,000€ / 400,000€, 0€ → TOTAL COST: 450,000€
c2: 0€, 0€ / 1,000,000€, 0€ → TOTAL COST: 1,000,000€
c3: 0€, 540,000€ / 200,000€, 0€ → TOTAL COST: 740,000€

External context: the set of classes and the cost estimation.
Confusion matrix & cost matrix can be characterized by just one number: the slope
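The totals above can be reproduced with an element-wise product and a sum; a minimal sketch using the c1 numbers from the slide:

```python
import numpy as np

# Cost matrix (rows: actual OPEN/CLOSE, columns: predicted open/close), in EUR.
cost = np.array([[0,    100],
                 [2000,   0]])

# Confusion matrix c1 from the slide, same layout.
confusion_c1 = np.array([[300,   500],
                         [200, 99000]])

# Hadamard (element-wise) product, then sum: the classifier's total cost.
total = (cost * confusion_c1).sum()
print(total)  # 450000 EUR, matching the slide
```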
Evaluating ML Algorithms II
ROC (Receiver Operating Characteristic) analysis
Dynamic context (class distribution & cost matrix)
[ROC diagram: TPR on the y-axis vs FPR on the x-axis, both ranging from 0 to 1]
o Given several classifiers:
 We add the trivial classifiers (0,0) and (1,1) and construct the convex hull of their (FPR, TPR) points. Points on the hull's edges are linear combinations of classifiers: p · Ca + (1−p) · Cb
 The classifiers below the ROC convex hull are discarded
 The best classifier (from those remaining) will be selected at application time, according to the slope
Probabilistic context: soft ROC analysis
A single classifier with probability-weighted predictions can generate a whole ROC curve by varying the score threshold (each threshold yields a new classifier on the curve)
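A minimal sketch of that soft-ROC idea: sweeping the score threshold of one probabilistic classifier traces a whole curve of (FPR, TPR) points (assumes `y_true` in {0, 1} and a `scores` array from any scorer):

```python
import numpy as np

def roc_points(y_true, scores):
    """Each score threshold yields one (FPR, TPR) point; sweeping all
    thresholds traces the ROC curve of a single probabilistic classifier."""
    order = np.argsort(-scores)            # sort instances by descending score
    y = np.asarray(y_true)[order]
    P, N = y.sum(), len(y) - y.sum()
    tpr = np.cumsum(y) / P                 # true positive rate at each cut
    fpr = np.cumsum(1 - y) / N             # false positive rate at each cut
    # prepend the trivial (0,0) classifier; the last cut is the trivial (1,1)
    return np.r_[0.0, fpr], np.r_[0.0, tpr]
```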
Evaluating ML Algorithms II
AUC (Area Under the ROC Curve)
For crisp classifiers AUC is equivalent to the macro-averaged accuracy.
AUC is a good metric for classifiers and rankers:
A classifier with high AUC is a good ranker.
It is also good over a (uniform) range of operating conditions:
A model with very good AUC will have good accuracy for all operating conditions.
A model with very good accuracy for one operating condition can have very bad accuracy for another.
A classifier with high AUC can still have poor calibration (probability estimation).
Multiclass classification? ROC analysis becomes problematic; AUC has been extended.
Regression? ROC has been extended; AUC corresponds to the error variance.
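A useful reading of AUC, behind the "good ranker" claim: it is the probability that a random positive is scored above a random negative. A small sketch computing it directly from that definition (toy data, not from the course):

```python
import numpy as np

def auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative),
    i.e. the Mann-Whitney U statistic normalized by the number of pairs."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).mean()   # positive outranks negative
    ties = (pos[:, None] == neg[None, :]).mean()  # ties count half
    return wins + 0.5 * ties

y = np.array([1, 1, 0, 0, 1])
s = np.array([0.9, 0.7, 0.6, 0.2, 0.8])
print(auc(y, s))  # 1.0: every positive is ranked above every negative
```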
Cluster Analysis
[Figure: K-means clustering with K=3]
Poul Petersen (BigML)
Unsupervised problem (unlabelled data)
Uses: customer segmentation, item discovery (types), association (profiles), recommenders, active learning (group, then label)
Cluster Analysis
Distance and centers define the groups: K-means, but...
K-means: starting from a subset of K points, repeatedly compute the distances of all data points to them and assign each point to the closest. Define the center of each group as the new set of K points and repeat until there is no improvement (sketched below, after the list).
Problems: convergence (initial conditions), scaling of dimensions
Things you need to tackle:
• What is the distance to a “missing value”? Replace with defaults
• What is the distance between categorical values? Map it into [0,1]
• What is the distance between text features? Vectorize and use cosine distance
• Does it have to be Euclidean distance?
• Unknown “K”?
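A minimal numpy sketch of that iteration, assuming a numeric feature matrix `X` (real implementations also care about initialization, empty clusters and feature scaling, as the slide warns):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # start from a random subset of K points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # distance of every point to every center (Euclidean), pick closest
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # the center of each group becomes the new set of K points
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):   # no improvement: stop
            break
        centers = new_centers
    return labels, centers
```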
Cluster Analysis
[Figure: clustering with K=5]
g-means clustering: increment K, testing whether each cluster looks Gaussian
Unsupervised Data: Rank by dissimilarity
Why? Unusual instances, intrusion detection, fraud, incorrect data
• Given a group, try to single out the odd ones: remove outliers from the data
Dataset → Anomaly Detector → score → remove outliers (see the sketch below)
Can be used at different layers and combined with clustering
• Improve model competence: score prediction instances to find new instances dissimilar to the training instances (where the model is not competent)
• Compare against usual distributions: Gaussian, Benford's Law
Anomaly Detection
Poul Petersen (BigML)
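The "score and remove outliers" pipeline can be sketched with scikit-learn's IsolationForest, a close relative of the detector described below (assumes a feature matrix `X`; the contamination rate is a made-up choice):

```python
from sklearn.ensemble import IsolationForest

# Fit the detector, score every instance, and drop the most anomalous ones.
detector = IsolationForest(contamination=0.05, random_state=0)
inlier_mask = detector.fit_predict(X) == 1   # fit_predict returns -1 for outliers
X_clean = X[inlier_mask]
```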
Anomaly Detection
[Figure: instances grouped by shared features such as “round”, “skinny”, “corners”, “smooth”; the instance that fits no group is the most unusual]
What counts as anomalous differs according to the grouping features (prior knowledge)
Anomaly Detection
Grow a random decision tree (random features and random splits) until each instance is in its own leaf: anomalous instances are “easy” to isolate at shallow depth, normal ones are “hard” and end up deep.
Now repeat the process several times and assign an anomaly score (0 = similar, 1 = dissimilar) to any input instance by comparing its average isolation depth with the average depth over the training set.
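A toy sketch of the isolation idea: random axis-aligned splits until an instance is alone, with anomalies isolating at shallow depth. This is illustrative only; real isolation forests average many trees and normalize depths into the 0–1 score described above:

```python
import numpy as np

def isolation_depth(X, x, rng, depth=0, max_depth=30):
    """Depth at which instance x is isolated by random features and splits."""
    if len(X) <= 1 or depth >= max_depth:
        return depth
    f = rng.integers(X.shape[1])              # pick a random feature
    lo, hi = X[:, f].min(), X[:, f].max()
    if lo == hi:                              # nothing left to split on
        return depth
    split = rng.uniform(lo, hi)               # pick a random split point
    side = X[:, f] < split
    keep = side if x[f] < split else ~side    # follow x's branch only
    return isolation_depth(X[keep], x, rng, depth + 1, max_depth)

# Averaging the depth over many random trees gives the isolation score:
# rng = np.random.default_rng(0)
# mean_depth = np.mean([isolation_depth(X, x, rng) for _ in range(100)])
```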
Machine Learning Black Art
Charles Parker (BigML)
Even when you follow the
yellow brick road...
Different models
Feature engineering
Evaluation metrics
The house of horrors awaits you
around the corner:
Huge Hypothesis Space
Poorly Picked Loss Function
Cross Validation
Drifting Domain
Reliance on Research Results
Machine Learning Black Art
● Huge hypothesis space: the possible classifiers you could build with an
algorithm given the data. Choice!
Triple trade-off: hypothesis complexity vs. amount of data vs. generalization error
Use non-parametric methods
As data scales, simpler models become desirable
Big data often trumps modelling!
● Poorly picked loss function: standard loss functions (entropy, distance in a formal space) are mathematically convenient but not always enough for real problems.
They carry no info about the classes or the costs:
False positive in disease diagnosis
False positive in face detection
False positive in thumbprint identification
Path dependence
Game playing
Let developers apply their own loss function: SVMlight, plugins in the splitting code, customized gradient descent...
OR hack the prediction (cascade classifiers)
OR change the problem setting (time-based limits on the classifier, a maximum loss; keep the error down with a certain probability)
The more complex the setting, the more data you need.
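A lightweight way to "apply your own loss" without touching the training code is to reweight the classes so the errors you care about cost more; a sketch with scikit-learn (the 20x weight is a made-up example):

```python
from sklearn.linear_model import LogisticRegression

# Make errors on class 1 (e.g., a missed disease) 20x costlier than errors
# on class 0; the solver then minimizes this reweighted loss.
model = LogisticRegression(class_weight={0: 1, 1: 20})
model.fit(X_train, y_train)   # X_train, y_train assumed loaded
```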
Machine Learning Black Art
● Cross-validation
Hold-outs can lead to leakage: features or instances can be correlated across test and train sets, giving optimistic performance estimates.
Law of averages and being off by one
Features spuriously correlated with the target can bias predictions
(photo dating: colors, borders...)
Beware of the group the instances belong to
Aggregates and timestamps: instances close in time are highly correlated
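One guard against this kind of leakage is to split by group rather than by row, so correlated instances (same user, same day, same photo set) never straddle train and test. A sketch with scikit-learn's GroupKFold (assumes `model`, `X`, `y` and a `groups` array identifying each instance's group):

```python
from sklearn.model_selection import GroupKFold, cross_val_score

# GroupKFold keeps all instances of a group in the same fold, so the model
# is never tested on near-duplicates of its own training rows.
scores = cross_val_score(model, X, y, groups=groups,
                         cv=GroupKFold(n_splits=5))
```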
Machine Learning Black Art
● Drifting Domain
Domain changes (document classification, sales prediction)
Adverse selection of training data (market data predictions, spam)
➢ The prior p(input) is changing → covariate shift
➢ The mapping p(output | input) is changing → concept drift
Symptoms: lots of errors, distribution changes. Compare to old data!
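The "compare to old data" advice can be automated with a two-sample test per feature; a sketch using a Kolmogorov–Smirnov test (the arrays of one feature's values, old and recent, are assumed):

```python
from scipy.stats import ks_2samp

# A small p-value says this feature's distribution has shifted since
# training: a symptom of covariate shift worth investigating.
stat, p_value = ks_2samp(feature_train, feature_recent)
if p_value < 0.01:
    print("distribution change detected: compare to old data, retrain")
```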
● Reliance on Research results
Reality does not comply with theorems' initial assumptions (error bounds, sample complexity, convergence): those assumptions are often unrealistic
Rule of thumb:
Use academia as your starting point, but don’t
think it will solve all your problems. Keep learning
Useful Things about ML
Charles Parker (BigML)
Advice from Dijkstra
● Killing Ambitious Projects: identify sub-problems you can tackle (hard vs. easy; hacking is all right). Good candidates:
No human expert can predict in complex environments (protein folding)
Humans can't explain how they know f(x) (character recognition)
f(x) is changing all the time (market data)
f(x) must be specialized many times (anything user-specific)
● Ignoring the Lure of Complexity
Look for simplicity (remove spaghetti code, processes, drudgery)
Push around complexity (clever compression)
Raw data might carry the information; sometimes it is the right way to go
● Finding Your Own Humility
Know and embrace your own limits
Continuously learn
Do A/B tests: improve on an existing system
● Avoiding Useless Projects
Look for the best combination of easy and big win
Define metrics with experts but don't rely on them: monitor
Useful Things about ML
Advice From Dijkstra (continued)
● Creating a good story
Explain why and summarize your model and your data
Stories are more valuable than models
● Continuing to Learn
Don't get comfortable; work at the edge of your abilities
Understand your limitations
Learn from your errors
Summary:
ML can be of value for every organization: find where.
Locating the right problem, executing, showing the proof
When you win we all win, so good luck!!!