An Introduction to Neural Architecture Search
Colin White, RealityEngines.AI
Deep learning
● Explosion of interest since 2012
● Very powerful machine learning technique
● Huge variety of neural networks for different tasks
● Key ingredients took years to develop
● Algorithms are becoming increasingly specialized and complicated
Deep learning
Algorithms are becoming increasingly specialized and complicated
E.g., accuracy on ImageNet has steadily improved over 10 years
Neural architecture search
● What if an algorithm could do this for us?
● Neural architecture search (NAS) is a hot area of research
● Given a dataset, define a search space of architectures, then use a search strategy to find the best architecture for your dataset
Outline
● Introduction to NAS
● Background on deep learning
● Automated machine learning
● Optimization techniques
● NAS Framework
○ Search space
○ Search strategy
○ Evaluation method
● Conclusions
Background - deep learning
● Studied since the 1940s, originally as a way to simulate the human brain
● “Neural networks are the second-best way to do almost anything” - JS Denker, 2000s
● Breakthrough: 2012 ImageNet competition [Krizhevsky, Sutskever, and Hinton]
Source: https://www.nextplatform.com/2017/03/21/can-fpgas-beat-gpus-accelerating-next-generation-deep-learning/
Automated Machine Learning
● Automated machine learning
○ Data cleaning, model selection, HPO, NAS, ...
● Hyperparameter optimization (HPO)
○ Learning rate, dropout rate, batch size, ...
● Neural architecture search
○ Finding the best neural architecture
Source: https://determined.ai/blog/neural-architecture-search/
Optimization
● Zeroth-order optimization (used for HPO, NAS)
○ Bayesian optimization
● First-order optimization (used to train neural nets)
○ Gradient descent / stochastic gradient descent
● Second-order optimization
○ Newton’s method
Source: https://ieeexplore.ieee.org/document/8422997
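The three orders differ only in what information about f each method may use. The minimal sketch below runs all three on a toy one-dimensional function, (x - 3)^2 + 1, which is an illustrative assumption and not taken from the slides. The zeroth-order method sees only values of f, gradient descent also uses the first derivative, and Newton's method also uses the second derivative.

```python
import random


def f(x: float) -> float:
    # Toy objective (illustrative assumption) with its minimum at x = 3
    return (x - 3.0) ** 2 + 1.0


def grad_f(x: float) -> float:
    return 2.0 * (x - 3.0)


def hess_f(x: float) -> float:
    return 2.0


def zeroth_order(num_evals: int = 200, seed: int = 0) -> float:
    # Uses only function values: sample points and keep the best one
    rng = random.Random(seed)
    samples = [rng.uniform(-10.0, 10.0) for _ in range(num_evals)]
    return min(samples, key=f)


def first_order(x: float = 0.0, lr: float = 0.1, steps: int = 100) -> float:
    # Gradient descent: step against the first derivative
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


def second_order(x: float = 0.0, steps: int = 1) -> float:
    # Newton's method: rescale the gradient step by the second derivative
    for _ in range(steps):
        x -= grad_f(x) / hess_f(x)
    return x
```

On a quadratic, a single Newton step lands exactly on the minimum, gradient descent converges geometrically, and the zeroth-order search is only as precise as its sampling allows.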
Zeroth-order optimization
● Grid search
● Random search
● Bayesian optimization
○ Use the results of previous guesses to make the next guess
Source: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
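As a rough illustration of the first two strategies, the sketch below compares grid search and random search at the same budget of nine evaluations. The objective toy_validation_error and its two hyperparameters (log learning rate and dropout rate) are hypothetical stand-ins for "train a model and report validation error"; they are not from the slides.

```python
import itertools
import random
from typing import Tuple

Config = Tuple[float, float]  # (log10 learning rate, dropout rate)


def toy_validation_error(log_lr: float, dropout: float) -> float:
    # Hypothetical stand-in for "train a model, report validation error".
    # It depends strongly on the learning rate and only weakly on dropout.
    return (log_lr + 2.7) ** 2 + 0.01 * (dropout - 0.3) ** 2


def grid_search(points_per_dim: int = 3) -> Config:
    # Evenly spaced values per dimension, then every combination
    log_lrs = [-5.0 + 4.0 * i / (points_per_dim - 1) for i in range(points_per_dim)]
    dropouts = [0.5 * i / (points_per_dim - 1) for i in range(points_per_dim)]
    return min(itertools.product(log_lrs, dropouts),
               key=lambda c: toy_validation_error(*c))


def random_search(budget: int = 9, seed: int = 0) -> Config:
    # Independent uniform samples over the same ranges
    rng = random.Random(seed)
    configs = [(rng.uniform(-5.0, -1.0), rng.uniform(0.0, 0.5)) for _ in range(budget)]
    return min(configs, key=lambda c: toy_validation_error(*c))
```

Both searches spend nine evaluations, but the 3x3 grid only ever tries three distinct learning rates while random search tries nine, which helps when one hyperparameter matters much more than the others.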
Neural Architecture Search
Search space
● Cell-based search space
● Macro vs micro search
Search strategy
● Reinforcement learning
● Continuous methods
● Bayesian optimization
● Evolutionary algorithm
Evaluation method
● Full training
● Partial training
● Training with shared weights
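The three components fit together as a simple loop: the search strategy proposes architectures from the search space, and the evaluation method scores them. The sketch below is a minimal version of that loop under stated assumptions: the search space (a chain of four layers drawn from OPS) and the scoring function toy_evaluate are hypothetical, and the search strategy is plain random search because it is the simplest choice.

```python
import random
from typing import Callable, Optional, Tuple

Architecture = Tuple[str, ...]

# Search space (hypothetical): a chain of DEPTH layers, each drawn from OPS
OPS = ("conv3x3", "conv5x5", "maxpool", "identity")
DEPTH = 4


def sample_uniform(rng: random.Random) -> Architecture:
    # Search strategy: plain random search over the space
    return tuple(rng.choice(OPS) for _ in range(DEPTH))


def toy_evaluate(arch: Architecture) -> float:
    # Evaluation method stand-in: a made-up score in [0, 1]. A real system
    # would use full training, partial training, or shared weights here.
    return sum(op.startswith("conv") for op in arch) / DEPTH


def run_nas(sample: Callable[[random.Random], Architecture],
            evaluate: Callable[[Architecture], float],
            budget: int, seed: int = 0) -> Tuple[Architecture, float]:
    rng = random.Random(seed)
    best: Optional[Tuple[Architecture, float]] = None
    for _ in range(budget):
        arch = sample(rng)
        score = evaluate(arch)
        if best is None or score > best[1]:
            best = (arch, score)
    if best is None:
        raise ValueError("budget must be positive")
    return best


best_arch, best_score = run_nas(sample_uniform, toy_evaluate, budget=50)
```

Swapping in a different search strategy (RL, Bayesian optimization, an evolutionary algorithm) or a cheaper evaluation method only changes the functions passed to run_nas.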
Search space
● Macro vs micro search
● Progressive search
● Cell-based search space
[Zoph, Le ’16]
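In micro (cell-based) search, the algorithm searches over one small cell and then stacks copies of it, instead of choosing every layer of the network (macro search). The sketch below encodes a cell in that style under assumptions of our own: the operation names in CELL_OPS and the rule that each intermediate node combines two earlier tensors are illustrative, not a specification of any particular paper's space.

```python
import math
import random
from typing import List, Tuple

# Hypothetical operation set for a cell
CELL_OPS = ("sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "avg_pool_3x3", "identity")

# Each intermediate node combines two earlier tensors: (input_a, op_a, input_b, op_b)
Node = Tuple[int, str, int, str]


def sample_cell(num_nodes: int, rng: random.Random) -> List[Node]:
    # Indices 0 and 1 are the cell's two inputs (outputs of earlier cells);
    # intermediate node k gets index k + 2 and may read any smaller index.
    cell = []
    for k in range(num_nodes):
        available = k + 2
        cell.append((rng.randrange(available), rng.choice(CELL_OPS),
                     rng.randrange(available), rng.choice(CELL_OPS)))
    return cell


def count_cells(num_nodes: int) -> int:
    # Number of encodings (an upper bound: symmetric duplicates are counted)
    return math.prod(((k + 2) * len(CELL_OPS)) ** 2 for k in range(num_nodes))


def build_network(cell: List[Node], num_cells: int) -> List[List[Node]]:
    # Micro search: one searched cell is repeated to form the whole network
    return [cell] * num_cells


cell = sample_cell(5, random.Random(0))
network = build_network(cell, 3)
```

Even this small cell (5 nodes, 5 operations) has count_cells(5) = 5,062,500,000,000 encodings, which gives a sense of how quickly these spaces grow.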
Bayesian optimization
● Popular method [Golovin et al. ’17], [Jin et al. ’18], [Kandasamy et al. ’18]
● Well suited to optimizing a function that is expensive to evaluate
● Fix a dataset (e.g., CIFAR-10, MNIST)
● Define a search space $A$ (e.g., 20 layers, each one of {conv, pool, ReLU, dense})
● Define an objective function $f: A \to [0, 1]$
○ $f(a)$ = validation accuracy of $a$ after training
● Define a distance function $d(a_1, a_2)$ between architectures
○ Quick to evaluate; if $d(a_1, a_2)$ is small, then $|f(a_1) - f(a_2)|$ is small
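One concrete way to realize this setup is sketched below, assuming each architecture is a tuple of 20 layer types and that d is the Hamming distance (the number of positions whose layer type differs). The choice of Hamming distance is an assumption for illustration; the slide only requires that d be cheap to compute and that nearby architectures have similar accuracy.

```python
import random
from typing import Tuple

LAYER_TYPES = ("conv", "pool", "relu", "dense")
NUM_LAYERS = 20
Arch = Tuple[str, ...]

# |A| = 4 ** 20 architectures (about 1.1e12)
SEARCH_SPACE_SIZE = len(LAYER_TYPES) ** NUM_LAYERS


def random_arch(rng: random.Random) -> Arch:
    return tuple(rng.choice(LAYER_TYPES) for _ in range(NUM_LAYERS))


def distance(a1: Arch, a2: Arch) -> int:
    # Hamming distance: number of positions whose layer type differs.
    # Cheap to evaluate; that small d implies small |f(a1) - f(a2)| is an
    # assumption about the task, not something this function guarantees.
    return sum(x != y for x, y in zip(a1, a2))


rng = random.Random(0)
a1 = random_arch(rng)
# Change only the last layer to get an architecture at distance 1
a2 = a1[:-1] + (("conv",) if a1[-1] != "conv" else ("pool",))
```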
Bayesian optimization
Goal: find $a \in A$ that maximizes $f(a)$
● Choose several random architectures $a$ and evaluate $f(a)$
● In each iteration $i$:
○ Use $f(a_1), \dots, f(a_{i-1})$ to choose a new $a_i$
○ Evaluate $f(a_i)$
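The loop above can be sketched as follows. To keep the example short, the "choose a new a_i" step uses a deliberately crude stand-in: each random candidate is scored by the observed value of its nearest evaluated architecture. Real Bayesian optimization uses a probabilistic surrogate (such as a Gaussian process) plus an acquisition function in that slot. The 8-layer space, the Hamming distance, and the objective f built from a hidden TARGET are hypothetical assumptions for illustration.

```python
import random
from typing import Callable, List, Tuple

LAYER_TYPES = ("conv", "pool", "relu", "dense")
NUM_LAYERS = 8
Arch = Tuple[str, ...]
History = List[Tuple[Arch, float]]


def random_arch(rng: random.Random) -> Arch:
    return tuple(rng.choice(LAYER_TYPES) for _ in range(NUM_LAYERS))


def distance(a1: Arch, a2: Arch) -> int:
    return sum(x != y for x, y in zip(a1, a2))


# Hypothetical objective in [0, 1] standing in for validation accuracy.
# It satisfies |f(a1) - f(a2)| <= d(a1, a2) / NUM_LAYERS by construction.
TARGET: Arch = ("conv", "relu") * 4


def f(arch: Arch) -> float:
    return 1.0 - distance(arch, TARGET) / NUM_LAYERS


def nearest_neighbor_choice(history: History, rng: random.Random,
                            num_candidates: int = 100) -> Arch:
    # Crude stand-in for "surrogate + acquisition": predict each random
    # candidate's score as the score of its closest evaluated architecture.
    def predicted(candidate: Arch) -> float:
        return min(history, key=lambda h: distance(candidate, h[0]))[1]
    candidates = [random_arch(rng) for _ in range(num_candidates)]
    return max(candidates, key=predicted)


def bayes_opt_loop(objective: Callable[[Arch], float],
                   choose_next: Callable[[History, random.Random], Arch],
                   num_init: int = 5, num_iters: int = 20,
                   seed: int = 0) -> Tuple[Arch, float]:
    rng = random.Random(seed)
    # Choose several random architectures a and evaluate f(a)
    history: History = [(a, objective(a)) for a in
                        (random_arch(rng) for _ in range(num_init))]
    for _ in range(num_iters):
        # Use f(a_1), ..., f(a_{i-1}) to choose a new a_i, then evaluate f(a_i)
        a_i = choose_next(history, rng)
        history.append((a_i, objective(a_i)))
    return max(history, key=lambda h: h[1])


best_arch, best_val = bayes_opt_loop(f, nearest_neighbor_choice)
```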
Gaussian process
● Assume $f$ is smooth over the search space $A$
● Deviations from the model look like Gaussian noise
● Update the model as we get more information
Source: http://keyonvafa.com/gp-tutorial/, https://katbailey.github.io/post/gaussian-processes-for-dummies/
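A Gaussian process turns the observed accuracies into a prediction (mean) and an uncertainty (standard deviation) for any unevaluated architecture. The minimal pure-Python sketch below assumes a zero prior mean and a kernel exp(-d / l) built from the Hamming distance d between architectures; both the kernel and the length scale are assumptions, since the slides do not fix them. The accuracies in the usage lines are made up.

```python
import math
from typing import List, Sequence, Tuple

Arch = Tuple[str, ...]


def hamming(a1: Arch, a2: Arch) -> int:
    return sum(x != y for x, y in zip(a1, a2))


def kernel(a1: Arch, a2: Arch, length_scale: float = 2.0) -> float:
    # exp(-d / l) with Hamming distance d is positive semi-definite
    return math.exp(-hamming(a1, a2) / length_scale)


def cholesky(m: List[List[float]]) -> List[List[float]]:
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_sub(low, b):
    # Solve low @ x = b for lower-triangular low
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def back_sub_transposed(low, b):
    # Solve low.T @ x = b
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(archs: Sequence[Arch], values: Sequence[float], query: Arch,
                 noise: float = 1e-4, length_scale: float = 2.0) -> Tuple[float, float]:
    # Zero prior mean; "noise" is the variance of the Gaussian observation noise
    n = len(archs)
    k_mat = [[kernel(archs[i], archs[j], length_scale) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    alpha = back_sub_transposed(low, forward_sub(low, list(values)))
    k_star = [kernel(query, a, length_scale) for a in archs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(low, k_star)
    var = kernel(query, query, length_scale) - sum(x * x for x in v)
    return mean, math.sqrt(max(var, 0.0))


observed = [("conv", "conv", "pool", "relu"), ("conv", "pool", "pool", "relu"),
            ("dense", "dense", "relu", "pool")]
accuracies = [0.9, 0.8, 0.3]  # made-up numbers for illustration
mean_seen, std_seen = gp_posterior(observed, accuracies, observed[0])
mean_new, std_new = gp_posterior(observed, accuracies, ("dense",) * 4)
```

The already-evaluated architecture gets a mean close to its observed accuracy and a small standard deviation, while the unseen all-dense architecture gets a much larger standard deviation. That uncertainty is what an acquisition function can exploit.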
Acquisition function
● In each iteration, find the architecture with the largest expected improvement
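If the model predicts f(a) ~ Normal(mu, sigma^2) and the best accuracy seen so far is f*, expected improvement has the standard closed form EI(a) = (mu - f*) Phi(z) + sigma phi(z) with z = (mu - f*) / sigma, where Phi and phi are the standard normal CDF and density. The sketch below implements that formula; the candidate names and posterior numbers in the usage lines are made up for illustration.

```python
import math
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    # E[max(f(a) - best_so_far, 0)] when f(a) ~ Normal(mean, std**2)
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    return (mean - best_so_far) * normal_cdf(z) + std * normal_pdf(z)


def pick_next(candidates: Iterable[T],
              posterior: Callable[[T], Tuple[float, float]],
              best_so_far: float) -> T:
    # Choose the candidate architecture with the largest expected improvement
    return max(candidates, key=lambda a: expected_improvement(*posterior(a), best_so_far))


# Made-up posterior (mean, std) for two candidates; best accuracy so far is 0.70
toy_posterior = {"safe_arch": (0.72, 0.01), "uncertain_arch": (0.65, 0.20)}
choice = pick_next(toy_posterior, lambda a: toy_posterior[a], best_so_far=0.70)
```

Here the candidate with the lower predicted mean wins because its larger uncertainty leaves more room for improvement; this is how expected improvement balances exploration against exploitation.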
DARTS: Differentiable Architecture Search
● Relax NAS to a continuous problem
● Optimize the architecture with gradient descent, just like ordinary network parameters
[Liu et al. ’18]
DARTS: Differentiable Architecture Search
● Upside: “one-shot”: one over-parameterized network containing every candidate operation is trained once
● Downside: may only work in “micro” (cell-based) search settings
[Liu et al. ’18]
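The continuous relaxation replaces the discrete choice of one operation on an edge with a softmax-weighted mixture of all candidate operations, so the architecture weights alpha become ordinary parameters that gradient descent can update. The toy sketch below shows only that idea: the three scalar "operations" and the data are hypothetical, and it omits the network weights. The actual method alternates updates of alpha (on validation data) with updates of the network weights (on training data).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Candidate operations on a single edge (hypothetical toy ops on scalars)
OPS: Dict[str, Callable[[float], float]] = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}
NAMES = list(OPS)


def softmax(alpha: Sequence[float]) -> List[float]:
    m = max(alpha)
    exps = [math.exp(a - m) for a in alpha]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x: float, alpha: Sequence[float]) -> float:
    # Continuous relaxation: softmax(alpha)-weighted sum of every candidate op
    return sum(w * OPS[name](x) for w, name in zip(softmax(alpha), NAMES))


def alpha_gradient(alpha, xs, ys) -> List[float]:
    # Gradient of L(alpha) = mean((mixed_op(x) - y) ** 2) with respect to alpha
    w = softmax(alpha)
    g = [0.0] * len(NAMES)  # dL/dw_o
    for x, y in zip(xs, ys):
        residual = mixed_op(x, alpha) - y
        for o, name in enumerate(NAMES):
            g[o] += 2.0 * residual * OPS[name](x) / len(xs)
    # Chain rule through the softmax: dL/dalpha_j = w_j * (g_j - sum_o w_o * g_o)
    avg = sum(wo * go for wo, go in zip(w, g))
    return [wj * (gj - avg) for wj, gj in zip(w, g)]


def search(xs, ys, lr: float = 0.5, steps: int = 200) -> Tuple[str, List[float]]:
    alpha = [0.0] * len(NAMES)
    for _ in range(steps):
        grad = alpha_gradient(alpha, xs, ys)
        alpha = [a - lr * gr for a, gr in zip(alpha, grad)]
    # Discretize: keep the operation with the largest architecture weight
    best = max(range(len(NAMES)), key=lambda o: alpha[o])
    return NAMES[best], softmax(alpha)


xs = [-1.0, 0.5, 1.0, 2.0]
ys = [2.0 * x for x in xs]  # made-up data generated by the "double" op
chosen_op, weights = search(xs, ys)
```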
Reinforcement Learning
● A controller recurrent neural network (RNN):
○ Chooses a new architecture in each round
○ The architecture is trained and evaluated
○ The controller receives the architecture’s validation accuracy as feedback (reward)
[Zoph, Le ’16]
Reinforcement Learning
● Upside: much more powerful than BayesOpt or gradient descent
● Downsides: requires training an additional neural network (the controller) just to design other neural networks; RL could be overkill
[Zoph, Le ’16]
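The three-step round above can be sketched with a policy-gradient (REINFORCE) update and a moving-average baseline. This is a simplified sketch under loud assumptions: the RNN controller is replaced by one independent softmax per layer, and "train and evaluate the child network" is replaced by a hypothetical toy_reward that counts matches against a made-up TARGET.

```python
import math
import random
from typing import List

LAYER_TYPES = ["conv", "pool", "relu", "dense"]
NUM_LAYERS = 4
TARGET = ["conv", "relu", "conv", "dense"]  # hypothetical "best" architecture


def toy_reward(arch: List[str]) -> float:
    # Stand-in for "train the child network, return validation accuracy"
    return sum(a == t for a, t in zip(arch, TARGET)) / NUM_LAYERS


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(num_rounds: int = 2000, lr: float = 0.2, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    # Controller: one independent softmax per layer (simplification of an RNN)
    logits = [[0.0] * len(LAYER_TYPES) for _ in range(NUM_LAYERS)]
    baseline = 0.0
    for _ in range(num_rounds):
        # 1. The controller chooses a new architecture
        probs = [softmax(row) for row in logits]
        choices = [rng.choices(range(len(LAYER_TYPES)), weights=p)[0] for p in probs]
        # 2. The architecture is "trained and evaluated"
        reward = toy_reward([LAYER_TYPES[c] for c in choices])
        # 3. The controller receives feedback: REINFORCE with a moving-average baseline
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for layer, c in enumerate(choices):
            for k in range(len(LAYER_TYPES)):
                grad_log_prob = (1.0 if k == c else 0.0) - probs[layer][k]
                logits[layer][k] += lr * advantage * grad_log_prob
    # Report the most likely architecture under the trained controller
    return [LAYER_TYPES[max(range(len(LAYER_TYPES)), key=lambda k: row[k])] for row in logits]


learned = reinforce_search()
```

Even in this toy form, every update needs a full "train and evaluate" call per sampled architecture, which is where the cost of RL-based search comes from.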
Evaluation Strategy
● Full training
○ Simple
○ Accurate
● Partial training
○ Less computation
○ Less accurate
● Shared weights
○ Least computation
○ Least accurate
Source: https://www.automl.org/blog-2nd-automl-challenge/
Is NAS ready for widespread adoption?
● Hard to reproduce results [Li, Talwalkar ’19]
● Hard to compare different papers
● Search spaces have been getting smaller
● Random search is a strong baseline [Li, Talwalkar ’19], [Sciuto et al. ’19]
● More recent papers are making fairer comparisons
● NAS cannot yet consistently beat human engineers
● Auto-Keras tutorial: https://www.pyimagesearch.com/2019/01/07/auto-keras-and-automl-a-getting-started-guide/
● DARTS repo: https://github.com/quark0/darts
Conclusion
● NAS: find the best neural architecture for a given dataset
● Search space, search strategy, evaluation method
● Search strategies: RL, BayesOpt, continuous optimization
● Not yet at the point of widespread adoption in industry
Thanks! Questions?