Use AI to Build AI
The Evolution of AutoML
Ning Jiang
CTO, OneClick.ai
2018
Ning Jiang
Co-founder of OneClick.ai, the first automated
Deep Learning platform in the market.
Previously Dev Manager at Microsoft Bing, Ning
has over 15 years of R&D experience in AI for ads,
search, and cyber security.
So, Why AutoML?
{ Challenges in AI Applications }
1. Never enough experienced data scientists
2. Long development cycle (typically 3 to 6 months)
3. High risk of failure
4. Endless engineering traps in implementation and
maintenance
{ Coming Along With Deep Learning }
1. Few experienced data scientists and engineers
2. Increasing complexity in data (mixing images, text, and numbers)
3. Algorithms need to be customized
4. Increased design choices and hyper-parameters
5. Much harder to debug
What is AutoML?
{ AutoML }
[Diagram] Controller → Model Designs → Model Training → Model Validation → feedback to Controller; Training Data feeds Model Training, Validation Data feeds Model Validation.
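The loop in this diagram can be sketched in a few lines. Everything here is an illustrative stand-in, not a real API: `RandomController` does blind random search, where a real controller would be an evolutionary algorithm or an RL agent.

```python
import random

# Toy stand-in for the Controller box: proposes "designs" (here, plain
# integers) and receives validation scores as feedback. Random search
# simply ignores the feedback; smarter controllers use it.
class RandomController:
    def __init__(self, candidates):
        self.candidates = candidates

    def propose(self):
        return random.choice(self.candidates)

    def update(self, design, score):
        pass  # random search does not learn from feedback

def automl_search(controller, evaluate, budget):
    """Propose designs, evaluate them, feed scores back, keep the best."""
    best_design, best_score = None, float("-inf")
    for _ in range(budget):
        design = controller.propose()     # Model Designs
        score = evaluate(design)          # Model Training + Model Validation
        controller.update(design, score)  # feedback to the Controller
        if score > best_score:
            best_design, best_score = design, score
    return best_design, best_score
```

The rest of the deck is about making the controller smarter than this random baseline, so that far fewer designs need to be trained.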
{ Key Challenges }
1. Satisfy semantic constraints (e.g. data types)
2. Use the feedback to improve model designs
3. Minimize the number of models to train
4. Avoid local minima
5. Speed up model training
{ Neural Architecture Search }
1. Evolutionary algorithms (ref: https://arxiv.org/abs/1703.01041)
2. Greedy search (ref: https://arxiv.org/abs/1712.00559)
3. Reinforcement learning (ref: https://arxiv.org/abs/1611.01578)
4. Speed up model training (ref: https://arxiv.org/abs/1802.03268)
Greedy Search
{ Target Scenarios }
1. Image classification (on CIFAR-10 & ImageNet)
2. Using only Convolution & Pooling layers
3. This is what powers Google AutoML
{ Constraints }
1. Predefined architectures
2. N=2
3. Number of filters decided by heuristics
4. NAS to find the optimal cell structure
{ Basic constructs }
Each construct has
1. Two inputs
2. Two operators, one applied to each input
3. One combined output
[Diagram: Input 1 → Operator 1, Input 2 → Operator 2 → combined output]
{ Predefined Operators }
Why these and these only?
1. 3x3 convolution
2. 5x5 convolution
3. 7x7 convolution
4. Identity (pass-through)
5. 3x3 average pooling
6. 3x3 max pooling
7. 3x3 dilated convolution
8. 1x7 followed by 7x1 convolution
{ Cells }
1. Stacking up to 5 basic constructs
2. About 5.6×10^14 cell candidates
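The ~5.6×10^14 figure can be reproduced under the setup in the greedy-search paper: construct b picks each of its two inputs from 2 + (b − 1) sources (the cell's two inputs plus the outputs of earlier constructs) and one of the 8 operators per input. A short computation, assuming that counting scheme:

```python
# Count cell candidates: at construct b there are (2 + b - 1) possible
# sources for each of its two inputs, and 8 choices for each of its two
# operators, multiplied over all constructs in the cell.
def cell_candidates(num_constructs=5, num_ops=8):
    total = 1
    for b in range(1, num_constructs + 1):
        inputs = 2 + (b - 1)  # input sources available at construct b
        total *= (inputs ** 2) * (num_ops ** 2)
    return total

print(cell_candidates(1))  # 256, matching the single-construct count below
print(cell_candidates(5))  # ≈ 5.6e14, matching the slide
```

Note this counts ordered choices and ignores symmetric duplicates, which is why it is an "about" figure.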
{ Greedy Search }
1. Start with a single construct (m=1)
2. There are 256 possibilities
3. Add one more construct
4. Pick the best K (256) cells to train
5. Repeat steps 3–4 until there are 5 constructs in the cell
6. 1028 models to be trained
{ Pick the best cells }
1. Cells as a sequence of choices
2. LSTM to estimate model
accuracy
3. Training data come from the trained models (up to 1024 examples)
4. 99.03% accuracy at m=2
5. 99.52% at m=5
[Diagram: the (Input 1, Input 2, Operator 1, Operator 2) choices are fed as a sequence into an LSTM, followed by a Dense layer that outputs the predicted accuracy]
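The greedy expansion and the surrogate predictor fit together as sketched below. The cell encoding and `predict_accuracy` are simplified placeholders, not the paper's actual implementation: here a cell is a list of (input1, input2, op1, op2) tuples and the predictor is any scoring function.

```python
import itertools

OPS = range(8)  # the 8 predefined operators

def expand(cell, num_inputs):
    """All ways to append one more construct (two inputs + two operators)."""
    for i1, i2, o1, o2 in itertools.product(
            range(num_inputs), range(num_inputs), OPS, OPS):
        yield cell + [(i1, i2, o1, o2)]

def progressive_search(predict_accuracy, max_constructs=5, beam=256):
    cells = [[]]  # start from the empty cell
    for m in range(1, max_constructs + 1):
        candidates = [c for cell in cells
                      for c in expand(cell, num_inputs=2 + (m - 1))]
        # In the real algorithm, the top-K candidates are actually trained
        # and the results retrain the LSTM surrogate; here we only rank.
        candidates.sort(key=predict_accuracy, reverse=True)
        cells = candidates[:beam]
    return cells[0]
```

The surrogate is what keeps the candidate pool tractable: only the top K cells per round are ever trained, while the predictor ranks the rest for free.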
{ Summary }
1. Fewer models to train
○ Remarkable improvement over evolutionary algorithms
2. Search from simple to complex models
3. Heavy use of domain knowledge and heuristics
4. Suboptimal results due to greedy search
5. Can’t generalize to other problems
Reinforcement Learning
{ Why RL? }
1. RL is a generative model
2. RL assumes little domain knowledge about the problem
3. Trained model accuracy is used as the reward
{ RNN Controller }
{ RNN Controller }
1. Autoregressive RNN
2. Outputs can describe any architecture
3. Supports non-linear architectures using skip connections
{ Skip Connections }
{ Stochastic Sampling }
For example:
1. Filter size has 4 choices: 24, 36, 48, 64
2. For each convolution layer, the RNN outputs a distribution:
○ (60%, 20%, 10%, 10%)
○ With 60% chance, the filter size will be 24
3. This helps collect data to correct the controller's mistakes
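The sampling step above is ordinary categorical sampling from the controller's output distribution, rather than always taking the argmax. A minimal sketch using the slide's numbers:

```python
import random

FILTER_SIZES = [24, 36, 48, 64]

def sample_filter_size(probs, rng=random):
    """Draw one filter size from the controller's output distribution."""
    assert abs(sum(probs) - 1.0) < 1e-9
    return rng.choices(FILTER_SIZES, weights=probs, k=1)[0]

# With the distribution from the slide, 24 is drawn ~60% of the time,
# but the other sizes are still explored, producing the data needed to
# correct a biased controller.
probs = [0.60, 0.20, 0.10, 0.10]
rng = random.Random(0)
counts = {s: 0 for s in FILTER_SIZES}
for _ in range(10_000):
    counts[sample_filter_size(probs, rng)] += 1
```

Sampling (instead of greedy argmax) is what gives the controller off-policy evidence: occasionally trying a 36- or 64-filter layer reveals when the 60% estimate is wrong.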
{ Training RNN Controller }
1. Use REINFORCE to update controller parameters
○ Binary rewards (0/1)
○ Trained model accuracy is the prob. of reward being 1
○ Apply cross entropy to RNN outputs
2. Designs with higher accuracy are assigned higher prob.
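The update rule can be sketched for a single categorical decision (a stand-in for one output step of the RNN controller). Following the slide's trick, the reward is binary: 1 with probability equal to the trained model's accuracy, 0 otherwise. This is a toy sketch, not the paper's implementation.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, accuracy_of, rng, lr=0.1):
    probs = softmax(logits)
    choice = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    reward = 1 if rng.random() < accuracy_of(choice) else 0  # binary reward
    # Gradient of log p(choice) w.r.t. logit_i is (1{i == choice} - p_i),
    # i.e. the cross-entropy gradient applied to the RNN outputs.
    for i, p in enumerate(probs):
        logits[i] += lr * reward * ((1.0 if i == choice else 0.0) - p)
    return choice, reward

# Designs whose trained-model accuracies differ; probability mass should
# drift toward the higher-accuracy design over many updates.
accuracies = [0.2, 0.9, 0.5]
logits = [0.0, 0.0, 0.0]
rng = random.Random(0)
for _ in range(5000):
    reinforce_step(logits, lambda c: accuracies[c], rng)
```

Because the update only fires when reward = 1, choices that yield accurate models get reinforced in proportion to how often they earn the reward, which is exactly "designs with higher accuracy are assigned higher probability".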
{ Speed Up Model Training }
1. The same layers often appear across sampled architectures
2. Such layers share the same parameters
3. Alternate training between models
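The sharing mechanism can be sketched as a keyed parameter store: ops at the same position look up one shared entry, so alternately trained child models update the same weights instead of starting from scratch. The names and the list-of-floats "parameters" here are illustrative only.

```python
# Shared parameter store, keyed by (operator name, layer position).
shared_params = {}

def get_params(op_name, position, init):
    """Return the shared parameters for (op, position), creating on first use."""
    key = (op_name, position)
    if key not in shared_params:
        shared_params[key] = init()
    return shared_params[key]

def materialize(arch):
    """Bind an architecture's layers to the shared parameter store."""
    return [get_params(op, pos, init=lambda: [0.0] * 4)
            for pos, op in enumerate(arch)]

# Two sampled child architectures that agree on the first layer: training
# either child updates the one shared "conv3x3 at position 0" entry.
params_a = materialize(["conv3x3", "maxpool3x3"])
params_b = materialize(["conv3x3", "conv5x5"])
```

This reuse is what makes the 40K-to-16 GPU-hour reduction on the next slide possible: each child model starts from weights already trained by its siblings.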
{ Summary }
1. Better model accuracy
2. Can be made to work with complex architectures
3. Able to correct controller mistakes (e.g. bias)
4. Speed up training when layers can be shared
○ From 40K to 16 GPU hours
5. Designed for a specific type of problem
6. Still very expensive, typically 10K GPU hours
So, What is Next?
{ Challenges }
1. NAS algorithms are domain specific
2. Only neural networks are supported
3. Heavy use of human heuristics
4. Expensive (thousands of GPU hours)
5. Cold start problem: NAS has no prior knowledge about data
{ Our Answer }
[Diagram] The same Controller → Model Designs → Model Training → Model Validation loop as before, with one addition: the Training Data is also fed into the Controller.
{ Generalized Architecture Search }
1. Accumulate domain knowledge over time
2. Works with any algorithm (neural networks or not)
3. Automated feature engineering
4. Far fewer models to train
5. GAS powers OneClick.ai
Use AI to Build AI
1. Custom-built Deep Learning models for best performance
2. Model designs improved iteratively in a few hours
3. Better models in fewer shots due to self-learned domain
knowledge
Meta-learning evaluates millions of
deep learning models in the blink of
an eye. US patent pending
Versatile Applications
1. Data types: numeric, categorical, date/time, textual, images
2. Applications: regression, classification, time-series forecasting,
clustering, recommendations, vision
Powered by deep learning, we support
an unprecedented range of applications
and data types
Unparalleled Simplicity
1. Users need zero AI background
2. Simpler to use than Excel
3. Advanced functions available to experts via a chatbot
Thanks to a chatbot-based UX, we can accommodate both novice and expert users
Use AI to Build AI
Sign up at http://oneclick.ai
ask@oneclick.ai
