Automated Machine
Learning in 2020
Panagiotis Papaemmanouil
Data Scientist, Medoid AI
About me
● BSc Mathematics, AUTh
● MSc Data and Web Science, AUTh
● Data Scientist at Medoid AI
● Focusing on Financial Machine Learning
Presentation Outline:
Automated Machine Learning (Auto ML) in 2020
Introduction
1. Motivation
2. Definition
Auto ML in Action
1. Feature Engineering
2. Model Selection
3. Hyperparameter Optimization
4. Neural Architecture Search
Tools & Frameworks
1. Open Source & Commercial tools
2. Code and GUI examples
Auto ML Case Study
Google Cloud Platform (GCP), Artificial Intelligence, Tables
Introduction
Machine Learning: “A field of study
that gives computers the ability
to learn without being explicitly
programmed.”
1959, Arthur Samuel
Full stack Data scientists are rare creatures!
● Knowledge of the business domain and
business problems
● Knowledge of the data
● Ability to write code to gather data
● Ability to write code to explore/inspect data
● Ability to write code to manipulate data
● Ability to write code to extract insights
● Ability to write code to build models
● Ability to write code to implement models
● Statistics
● Internals of algorithms
● Practical Knowledge and experience
● Knowing how to interpret and explain models
Motivation
I. The Data Science case of Auto ML
1. Building an ML pipeline involves repetitive, time-consuming tasks
2. Some tasks require a lot of manual work
3. ML is still somewhat difficult to build, deploy and maintain
II. The Business case of Auto ML
[Chart: Data science barriers or challenges in the workplace. Source: Kaggle surveys]
The solution is Automated Machine Learning
“Automated Machine Learning (or Auto ML)
is the process of automating the process
of applying machine learning to real-world
problems.”
Thanks, Wikipedia, for the not-so-obvious definition
Gartner said (in 2017) that “More Than 40% of Data Science Tasks Will Be Automated by 2020”
Source: https://www.gartner.com/en/newsroom/press-releases/2017-01-16-gartner-says-more-than-40-percent-of-data-science-tasks-will-be-automated-by-2020
Auto ML in Action
Machine Learning project flowchart
Auto ML aims to automate this part!
I. Feature Engineering
1. Data Cleaning
Data type identification, NA imputation, Scaling, Outliers, Categorical Encoding, ...
2. Feature Extraction
Deep Feature Synthesis
3. Feature Selection
Recursive Feature Elimination, Feature Importances, ...
[Diagram: New Dataset; Extract Dataset characteristics; ML algorithm; Rank features; Measure model performance (Score); Reduced Dataset; repeat for multiple ranking algorithms]
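The "rank features, then keep a reduced dataset" idea from the Feature Selection item can be sketched in a few lines. This is only an illustration under stated assumptions: the ranking criterion (absolute Pearson correlation with the target), the helper names and the toy data are all hypothetical and are not taken from the talk or from any particular Auto ML tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def rank_features(columns, target):
    """Return feature names sorted by |correlation with target|, best first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def reduce_dataset(columns, target, keep):
    """Keep only the `keep` top-ranked features (the 'Reduced Dataset')."""
    top = rank_features(columns, target)[:keep]
    return {name: columns[name] for name in top}


# Hypothetical toy dataset: 'signal' tracks the target, 'constant' never varies.
columns = {
    "signal": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],
    "constant": [7.0] * 6,
}
target = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

ranking = rank_features(columns, target)
reduced = reduce_dataset(columns, target, keep=2)
print(ranking)
print(sorted(reduced))
```

A real tool would typically repeat this with several ranking algorithms and compare the model score obtained on each reduced dataset, as the slide's diagram suggests.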
II. Model Selection: No free lunch Theorem (1/2)
● Always check your
assumptions before relying
on a model or search
algorithm.
● There is no “super algorithm”
that will work perfectly for all
datasets.
Source: sklearn documentation
II. Model Selection (2/2)
1. There is no best algorithm for all problems (No Free Lunch theorem)
2. Which algorithm is best is not intuitive
3. Complex models are not always optimal (bias-variance trade-off)
[Diagram: New Dataset; Extract Dataset characteristics; Train & Evaluate multiple ML models; Rank models based on their performance; Results (top-performing algorithms & metrics)]
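The "train and evaluate multiple models, then rank them" step can be sketched as follows. The three tiny classifiers, the accuracy metric and the toy data are hypothetical stand-ins chosen so the example runs without any library; an Auto ML tool would use real learners and cross-validation instead.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


class MajorityClass:
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroid:
    """Predict the class whose mean point is closest."""
    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids_[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        return [min(self.centroids_, key=lambda c: sq_dist(x, self.centroids_[c]))
                for x in X]


class OneNearestNeighbour:
    """Predict the label of the single closest training point."""
    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        return [self.y_[min(range(len(self.X_)), key=lambda i: sq_dist(x, self.X_[i]))]
                for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rank_models(candidates, X_train, y_train, X_val, y_val):
    """Fit every candidate and return (name, accuracy) pairs, best first."""
    results = [(name, accuracy(y_val, model.fit(X_train, y_train).predict(X_val)))
               for name, model in candidates.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


# Hypothetical toy data: two well-separated clusters.
X_train = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6)]
y_train = [0, 0, 0, 0, 1, 1, 1]
X_val = [(0.5, 0.5), (5.5, 5.5), (6, 6), (1, 0.5)]
y_val = [0, 1, 1, 0]

candidates = {
    "majority_class": MajorityClass(),
    "nearest_centroid": NearestCentroid(),
    "one_nearest_neighbour": OneNearestNeighbour(),
}
leaderboard = rank_models(candidates, X_train, y_train, X_val, y_val)
for name, score in leaderboard:
    print(f"{name}: {score:.2f}")
```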
III. Hyperparameter Optimization
1. Default hyperparameters are rarely the best choice
2. Hyperparameter tuning requires many repeated experiments
3. Selecting the best set of hyperparameters is based on experiments and heuristics
(grid search, random search, Bayesian optimization, evolutionary optimization, early stopping)
[Diagram: Dataset; ML algorithm; Hyperparameter choice; Train ML model; Measure model performance (Score); optimize until convergence; Tuned Model]
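Two of the strategies named on this slide, grid search and random search, can be compared with a small sketch. The `validation_loss` function below is a made-up stand-in for "train a model with these hyperparameters and measure its validation loss"; the parameter names and ranges are likewise assumptions for illustration only.

```python
import itertools
import random


def validation_loss(learning_rate, depth):
    """Hypothetical objective; its minimum is at learning_rate=0.1, depth=6."""
    return (learning_rate - 0.1) ** 2 * 100 + (depth - 6) ** 2 * 0.05


def grid_search(grid):
    """Evaluate every combination in the grid and return (params, loss) of the best."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


def random_search(space, n_trials, seed=0):
    """Sample n_trials random configurations and return (params, loss) of the best."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(*space["learning_rate"]),  # continuous range
            "depth": rng.randint(*space["depth"]),                  # inclusive integer range
        }
        loss = validation_loss(**params)
        if best is None or loss < best[1]:
            best = (params, loss)
    return best


grid = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 6, 10]}
space = {"learning_rate": (0.001, 0.5), "depth": (2, 10)}

grid_best = grid_search(grid)
random_best = random_search(space, n_trials=50, seed=0)
print("grid search:  ", grid_best)
print("random search:", random_best)
```

Grid search only ever tries the listed values, so its result depends on how well the grid was chosen; random search explores the continuous range with the same evaluation budget.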
What about Deep Learning?
Automated Deep Learning:
Neural Architecture Search (NAS)
Approaches differ in their:
1. Search space
2. Search strategy
3. Performance estimation strategy
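The three components above can be made concrete with a toy sketch. Everything here is an assumption for illustration: the search space options are invented, the search strategy is plain random sampling, and the performance estimate is a made-up proxy score rather than actual network training.

```python
import random

# 1. Search space: which architectures may be proposed (hypothetical choices).
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "units": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
}


def sample_architecture(rng):
    """2. Search strategy: plain random sampling from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def estimate_performance(arch):
    """3. Performance estimation: a made-up proxy score instead of real training.

    It rewards capacity up to a cap and penalises total size.
    """
    capacity = arch["num_layers"] * arch["units"]
    bonus = 0.02 if arch["activation"] == "relu" else 0.0
    return min(capacity, 128) / 128 - capacity / 2048 + bonus


def search(n_trials, seed=0):
    """Run the search loop and return the best (architecture, score) found."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_trials):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = search(n_trials=50)
print(best_arch, round(best_score, 4))
```

Swapping `sample_architecture` for an evolutionary or reinforcement-learning proposer, or `estimate_performance` for a short training run, changes one component without touching the other two.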
Tools & Frameworks
Tools & Frameworks Taxonomy
[Figure: tool logos grouped into Tech Giants, Start-ups, and Open Source]
Notes
1. Some tools aim to automate the entire ML pipeline, while others focus on specific tasks
(such as feature engineering, model selection or hyperparameter optimization)
2. Some tools require code, others offer a GUI, and others support both.
Auto ML: Code example (I)
Source: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-auto-train
from azureml.core.dataset import Dataset
from azureml.train.automl import AutoMLConfig

# Minimal configuration: task can be one of classification, regression, forecasting
automl_config = AutoMLConfig(task="classification")

# Load the training data as a tabular dataset
data = "data/path"
train_data = Dataset.Tabular.from_delimited_files(data)

# Name of the target column (placeholder; the slide does not define it)
label = "label"

# Fuller configuration: metric, time budget, excluded models, data and validation
automl_classifier = AutoMLConfig(
    task='classification',
    primary_metric='AUC_weighted',
    experiment_timeout_minutes=30,
    blocked_models=['XGBoostClassifier'],
    training_data=train_data,
    label_column_name=label,
    n_cross_validations=2)
Auto ML: Code example (II)
Auto ML: GUI example (1/2)
Auto ML: GUI example (2/2)
Case Study: Google Cloud Platform (GCP), AI, Tables
GDG DEvFest Hellas 2020 -  Automated ML - Panagiotis Papaemmanouil
Case Study: The Numerai dataset
Numerai is an AI-run, crowd-sourced hedge fund based in San Francisco.
● Numerai’s trades are determined by an AI, which is fueled by a network of thousands of anonymous data scientists.
● Numerai hosts a weekly tournament, in which data scientists submit their predictions in exchange for the potential to earn some USD and a cryptocurrency called Numeraire.
● Together, the predictions submitted by users steer Numerai's hedge fund.
You can learn more here: https://numer.ai/learn
Case Study: GCP Tables (zero-code)
5-Step Pipeline:
1. Import, 2. Train, 3. Models, 4. Evaluate, 5. Test & Use
Step 1: Import dataset
Step 2: Train 1/2 (Analyze dataset, Define target)
Step 2: Train 2/2 (Define train parameters)
Step 3: Models
Step 4: Evaluate (1/2)
Step 4: Evaluate (2/2)
Step 5: Test & Use (Batch prediction)
Step 5: Test & Use (Online prediction)
Step 5: Test & Use (Export your model)
Closing Remarks
Advantages
1. Fast experimentation.
Automatically searches an entire space of candidate machine learning pipelines and returns the best one.
Shows which approaches work best for the specific dataset and the specific task.
2. Handles the boring stuff
(hyperparameter tuning etc.) and helps the data scientist focus on the important things (business problem modeling).
3. Democratization of Machine Learning.
No need for programming skills and/or an understanding of Machine Learning
(I’m not sure if this is an advantage!)
Disadvantages
1. Not fully automated
2. No customized solutions
3. Not human-competitive (yet)
Kagglers are still state-of-the-art
4. There aren’t any Reinforcement Learning applications.
5. Lack of domain knowledge.
Poor feature extraction skills.
Conclusion.
“Now you can fake your Machine Learning expertise, without even understanding linear regression!”
“Use Auto ML tools to free up time from boring and repetitive tasks, in order to focus on what actually matters.”
www.linkedin.com/in/panagiotis-papaemmanouil/
panagiotis.papaemmanouil[at]medoid.ai
www.medoid.ai/
www.linkedin.com/company/medoid-ai/
