Task-Adaptive Neural Network Search
with Meta-Contrastive Learning
Wonyong Jeong*, Hayeon Lee*, Geon Park*, Eunyoung Hyung, Jinheon Baek, and Sung Ju Hwang
Graduate School of AI, KAIST, Seoul, South Korea
School of Computing, KAIST, Daejeon, South Korea
AITRICS, Seoul, South Korea
*: Equal Contribution
Motivation
Designing and tuning neural networks to obtain good models on a given dataset has typically required exhaustive trial-and-error and brute-force effort.
Neural Architecture Search (NAS) alleviates this cost by automatically building neural architectures that can perform even better than hand-crafted networks.
[Figure: the manual design process (human designs a model through trials and feedback) vs. the NAS pipeline (architecture search space, search strategy, and performance estimation strategy yielding an optimal architecture)]
Motivation: The Limitations
Most conventional NAS approaches search only for optimal architectures without providing trained parameters, which requires additional training on the given dataset.
Some recent NAS methods* rely on a supernet pretrained on ImageNet, which may be suboptimal when the target task is highly dissimilar from ImageNet.
[Figure: supernet pretraining on a large-scale dataset, followed by additional training phases on the target dataset]
*[Once-for-All] Cai, H et al. Once-for-all: Train one network and specialize it for efficient deployment. ICLR 2020.
Neural Network Search (NNS)
What if we could search not only for the optimal architecture but also for relevant parameters for a given dataset and conditions, reducing the additional training cost?
We introduce a novel problem, Neural Network Search (NNS), whose goal is to search for the optimal pretrained network for a given dataset and set of conditions.
[Figure: Neural Network Search takes a target dataset and desired conditions (accuracy, latency, # params, ...) and returns the optimal network with relevant pretrained knowledge]
Challenges
Several critical challenges must be addressed, such as where to search and how to find relevant pretrained models.
To tackle them, we construct our own model zoo and learn a cross-modal retrieval space for successful neural network search.
[Figure: key questions for Neural Network Search — how to construct the model zoo, how to encode datasets and parameters, and how to learn the cross-modal space]
TANS: Task-Adaptive Neural Network Search
To address these challenges, we propose Task-Adaptive Neural Network Search with Meta-Contrastive Learning (TANS).
TANS consists of several components: efficient model-zoo construction, model and query encoders, a performance predictor, and a meta-contrastive learning framework.
Methodology: Model Encoder & Functional Embeddings
To learn the cross-modal retrieval space, we must properly encode both models and datasets. For embedding pretrained models, how can we encode model parameters?
Our idea is to use each model's individual output for a single, unbiased criteria input generated from a Gaussian distribution; we call these outputs functional embeddings.
[Figure: an unbiased criteria input generated from a Gaussian distribution is fed forward across all models; each model's output serves as its individual interpretation of the criteria input]
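As a rough illustration (not the exact procedure from the paper), a functional embedding can be obtained by feeding one shared probe input, drawn once from a Gaussian, through each pretrained model and flattening its output; the probe shape, the toy models, and the output dimensionality below are assumptions.

```python
import torch
import torch.nn as nn

# A single "criteria" input, drawn once from a Gaussian and shared by all models,
# so that differences between embeddings reflect the models, not the input.
torch.manual_seed(0)
criteria_input = torch.randn(1, 3, 224, 224)  # hypothetical image-shaped probe

def functional_embedding(model: nn.Module, probe: torch.Tensor) -> torch.Tensor:
    """Run the shared probe through a pretrained model and flatten its output.

    The flattened outputs serve as a fixed-length summary of the model's learned
    function, i.e. its 'interpretation' of the probe.
    """
    model.eval()
    with torch.no_grad():
        out = model(probe)
    return out.flatten()

# Usage with two toy 'pretrained' models (stand-ins for model-zoo entries).
model_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
model_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
v_f_a = functional_embedding(model_a, criteria_input)  # shape: (10,)
v_f_b = functional_embedding(model_b, criteria_input)
```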
Methodology: Model Encoder & Functional Embeddings
For architectural topology information, we adopt OFA*'s topological encoding, which contains the number of layers, kernel sizes, and channel expansion ratios.
We then merge the functional embedding v_f and the topology encoding v_a to learn the model embedding m via the model encoder E_m(v_a, v_f; φ): ℳ → ℝ^d.
[Figure: the model encoder combines the network architecture encoding with the functional embedding (⊕) to produce the model embedding]
*[Once-for-All] Cai, H et al. Once-for-all: Train one network and specialize it for efficient deployment. ICLR 2020.
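A minimal sketch of such a model encoder, assuming PyTorch and a simple concatenate-then-MLP fusion of the topology encoding v_a and the functional embedding v_f; the layer sizes and the contents of v_a are illustrative, not the exact design from the paper.

```python
import torch
import torch.nn as nn

class ModelEncoder(nn.Module):
    """Maps (topology encoding v_a, functional embedding v_f) to a model embedding m."""

    def __init__(self, topo_dim: int, func_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(topo_dim + func_dim, 256),  # merge the two views of the model
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, v_a: torch.Tensor, v_f: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([v_a, v_f], dim=-1))

# v_a could hold, e.g., per-stage depths, kernel sizes, and expansion ratios (values made up).
v_a = torch.tensor([[2., 3., 4., 3., 5., 7., 3., 4., 6.]])
v_f = torch.randn(1, 10)   # functional embedding, as in the previous sketch
encoder = ModelEncoder(topo_dim=v_a.size(-1), func_dim=v_f.size(-1))
m = encoder(v_a, v_f)      # model embedding in the cross-modal space
```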
Methodology: Query Encoder & Performance Predictor
We design a simple pooling-based set encoder as our query encoder E_q(D; θ): 𝒬 → ℝ^d, so that it produces a permutation-invariant query representation q.
Our performance predictor s(m, q; ψ) takes both the model embedding m and the query representation q and estimates the performance of the given pair.
[Figure: the query dataset is encoded into a query embedding and the network (architecture ⊕ functional embedding) into a model embedding; the performance predictor takes both and outputs the estimated performance]
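A sketch of a pooling-based set encoder and a pairwise performance predictor, under the same assumptions as the previous sketches; mean pooling over instance features, the feature dimension, and the MLP sizes are illustrative choices.

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Permutation-invariant set encoder: embed each sampled instance, then mean-pool."""

    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())  # per-instance encoder
        self.rho = nn.Linear(256, emb_dim)                           # post-pooling projection

    def forward(self, instances: torch.Tensor) -> torch.Tensor:
        # instances: (set_size, in_dim); mean pooling makes the output order-invariant
        return self.rho(self.phi(instances).mean(dim=0, keepdim=True))

class PerformancePredictor(nn.Module):
    """Scores a (model embedding, query embedding) pair, e.g. predicted accuracy."""

    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, m: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([m, q], dim=-1)).squeeze(-1)

# Usage: a query dataset represented by 32 sampled instance features (dimensions made up).
query_instances = torch.randn(32, 512)
q = QueryEncoder(in_dim=512)(query_instances)    # (1, 128) query embedding
m = torch.randn(1, 128)                          # model embedding from the model encoder
predicted_score = PerformancePredictor()(m, q)   # scalar estimate for the pair
```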
Methodology: Meta-Contrastive Learning
Putting the model encoder, query encoder, and performance predictor together, we perform amortized meta-contrastive learning to learn the cross-modal retrieval space.
Our objective maximizes the distance between embeddings of irrelevant (mismatched) model-query pairs while minimizing the distance of matched pairs, guided by our performance predictor.
[Figure: cross-modal latent space for model-query pairs — minimize the distance of the positive pair (q, m⁺), maximize the distances of negative pairs (q, m⁻), and guide learning based on the performance of the given pairs]
Methodology: Learning Objective
We design a contrastive loss ℒ_m for model embeddings and ℒ_q for query embeddings on our cross-modal retrieval space, optimizing the parameters θ and φ.
We further optimize the performance predictor ψ while learning the cross-modal space, training it to accurately estimate the performance of a given dataset-model pair via an MSE loss.
[Equations: contrastive terms over the positive pair (q⁺, m) and negative pairs (q⁻, m), plus the mean squared error term for training the performance predictor]
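A hedged sketch of how such an objective could look: an InfoNCE-style symmetric contrastive loss over a batch of matched dataset-model pairs plus an MSE term for the predictor. The exact loss form, temperature, and weighting in TANS may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q: torch.Tensor, m: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE-style loss over a batch of matched (q_i, m_i) pairs.

    Diagonal entries of the similarity matrix are positives (matched dataset-model
    pairs); off-diagonal entries act as negatives and are pushed apart.
    """
    q = F.normalize(q, dim=-1)
    m = F.normalize(m, dim=-1)
    logits = q @ m.t() / tau                       # (B, B) pairwise similarities
    targets = torch.arange(q.size(0))
    loss_q = F.cross_entropy(logits, targets)      # query -> model direction
    loss_m = F.cross_entropy(logits.t(), targets)  # model -> query direction
    return loss_q + loss_m

# Joint objective: contrastive retrieval loss + MSE for the performance predictor.
B, d = 8, 128
q_emb, m_emb = torch.randn(B, d), torch.randn(B, d)
pred_acc = torch.rand(B)   # predictor outputs s(m, q; psi) for the matched pairs
true_acc = torch.rand(B)   # recorded accuracies from the model zoo
total_loss = contrastive_loss(q_emb, m_emb) + F.mse_loss(pred_acc, true_acc)
```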
Methodology: Model-Zoo Construction
We use an uncertainty-guided approach to iteratively select the dataset-model pairs that are expected to expand the Pareto frontier the most from the current state.
This lets us significantly reduce the size of the model zoo while achieving higher performance than a randomly constructed model zoo.
[Figure: for dataset D, the current Pareto front over (# params, top-1 accuracy) and the expected improvements of the front from training Architecture B or Architecture C on D]
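A simplified sketch of the selection idea: among candidate (architecture, dataset) pairs, pick the one whose predicted (# params, accuracy) point is expected to improve the current Pareto front the most. The point-estimate scoring below (no explicit uncertainty modeling) and all numbers are illustrative.

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Points are (# params, accuracy); keep those not dominated by a smaller, more accurate model."""
    front = []
    for p, a in points:
        if not any(p2 <= p and a2 >= a and (p2, a2) != (p, a) for p2, a2 in points):
            front.append((p, a))
    return sorted(front)

def front_gain(front: List[Tuple[float, float]], cand: Tuple[float, float]) -> float:
    """Crude expected improvement: accuracy headroom of the candidate over the best
    frontier model that is no larger than it (0 if it would be dominated)."""
    p, a = cand
    best_a = max((a2 for p2, a2 in front if p2 <= p), default=0.0)
    return max(0.0, a - best_a)

# Current frontier for dataset D and predicted outcomes for untrained candidates
# (all values are made up for illustration).
trained = [(2.0e6, 0.91), (4.5e6, 0.93), (8.0e6, 0.94)]
candidates = {"arch_B": (3.0e6, 0.945), "arch_C": (9.0e6, 0.942)}

front = pareto_front(trained)
best = max(candidates, key=lambda k: front_gain(front, candidates[k]))
print(best)  # the pair expected to expand the frontier the most gets trained next
```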
Experimental Setup: Datasets
We collect 96 real-world image datasets from Kaggle and split them into 86 meta-training and 10 meta-test datasets with no class-wise or instance-wise overlap.
We further partition the meta-training datasets into 140 sub-datasets, so that each has at most 20 classes when the original number of classes is extremely large.
Experimental Setup: Model-Zoo Construction
We train 100 neural architectures sampled from the OFA* search space on the 140 meta-training datasets, constructing a model zoo of 100 × 140 trained models.
To make this process more efficient, we can employ the efficient model-zoo construction algorithm to reduce the number of training rounds.
[Figure: model-zoo construction from real-world datasets — N architectures trained on M datasets]
*[Once-for-All] Cai, H et al. Once-for-all: Train one network and specialize it for efficient deployment. ICLR 2020.
Experimental Setup: Baseline Models
We use six baselines from four categories: base architecture, conventional NAS, weight-sharing NAS, and data-driven meta-NAS.
Base Architecture: MobileNet-V3 [1]
Conventional NAS: PC-DARTS [2], DrNAS [3]
Weight-sharing NAS: FBNet-A [4], Once-for-All [5]
Data-driven Meta-NAS: MetaD2A [6]
[1] Howard, A. et al. Searching for MobileNetV3. ICCV 2019.
[2] Xu, Y. et al. PC-DARTS: Partial channel connections for memory-efficient architecture search. ICLR 2020.
[3] Chen, X. et al. DrNAS: Dirichlet neural architecture search. ICLR 2021.
[4] Wu, B. et al. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. CVPR 2019.
[5] Cai, H. et al. Once-for-All: Train one network and specialize it for efficient deployment. ICLR 2020.
[6] Lee, H. et al. Rapid neural architecture search by learning to generate graphs from datasets. ICLR 2021.
Experimental Results: Meta-test Performance
TANS outperforms all baselines with almost zero search time, and it also greatly reduces training time because it can exploit relevant pretrained knowledge.
Method | Pre-trained Resource | Training Epochs | Search Time (GPU sec) | Training Time (GPU sec) | Speed Up | Accuracy (%)
MobileNetV3 | ImageNet 1K | 50 | - | 257 | 1.00× | 94.20
PC-DARTS | Scratch | 500 | 1100.37 | 5721 | 0.04× | 79.22
DrNAS | Scratch | 500 | 1501.75 | 5659 | 0.04× | 84.06
FBNet-A | ImageNet 1K | 50 | - | 293 | 0.88× | 93.00
OFA | ImageNet 1K | 50 | 121.90 | 226 | 0.74× | 93.89
MetaD2A | ImageNet 1K | 50 | 2.59 | 345 | 0.74× | 95.24
TANS (Ours) | Retrieved task | 50 | 0.002 | 200 | 1.28× | 96.28
Averaged performance of searched (retrieved) networks on 10 unseen real-world datasets
Experimental Results: Semantic Similarity
We show example images from the unseen meta-test query datasets (Query) and the meta-training model-zoo datasets (Retrieval) on which the retrieved models were pretrained.
In most cases, our method matches semantically similar datasets to the query datasets.
Even in semantically dissimilar cases, our models still outperform the other baselines.
[Figure: example query and retrieval images for semantically similar and dissimilar cases]
Experimental Results: Analysis & Ablation Study
We examine how accurately our model retrieves the paired network when a meta-training dataset is given (using unseen validation examples).
Meta-contrastive learning allows the model to accurately retrieve the paired models when the corresponding meta-training datasets are given.
Model | Recall @ Top 1 | Recall @ Top 5 | Mean
Random | 2.14 | 2.86 | 69.04
Largest Parameter | 3.57 | 7.14 | 51.85
TANS + Cosine Sim. Loss | 9.29 | 12.86 | 46.02
TANS + Hard Neg. Loss | 72.14 | 84.29 | 4.86
TANS + Meta-Contrastive Loss | 80.71 | 96.43 | 1.90
TANS w/o Predictor | 80.00 | 96.43 | 2.23
Cross-modal retrieval performance (left) and visualization of the cross-modal space (right)
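For reference, a small sketch of how Recall@k and mean rank can be computed for such a cross-modal retrieval evaluation, assuming each query embedding q[i] is paired with model embedding m[i]; cosine similarity and the embedding sizes are assumptions.

```python
import torch

def retrieval_metrics(q: torch.Tensor, m: torch.Tensor, ks=(1, 5)):
    """Cross-modal retrieval metrics, assuming q[i] is paired with m[i].

    Ranks every model embedding for each query by cosine similarity, then reports
    Recall@k (fraction of queries whose paired model is in the top k) and the
    mean rank of the paired model.
    """
    q = torch.nn.functional.normalize(q, dim=-1)
    m = torch.nn.functional.normalize(m, dim=-1)
    sims = q @ m.t()                                # (N, N) similarity matrix
    ranks = sims.argsort(dim=-1, descending=True)   # model indices, best first
    target = torch.arange(q.size(0)).unsqueeze(1)
    # position of the ground-truth model in each query's ranking (1-indexed)
    pos = (ranks == target).float().argmax(dim=-1) + 1
    recalls = {f"recall@{k}": (pos <= k).float().mean().item() for k in ks}
    return recalls, pos.float().mean().item()

# Usage with random embeddings (numbers will not match the table above).
q_emb, m_emb = torch.randn(140, 128), torch.randn(140, 128)
recalls, mean_rank = retrieval_metrics(q_emb, m_emb)
```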
Experimental Results: Analysis & Ablation Study
With our performance predictor, we obtain performance gains of 1.5-8 percentage points on the 10 meta-test datasets compared to the top-3 retrieved candidates.
Our efficient model-zoo construction algorithm selects Pareto-optimal network-dataset pairs, yielding a higher-performing model zoo than naive construction.
[Figure: performance gain (%) from the performance predictor (left) and effectiveness of our model-zoo construction algorithm (right)]
Conclusion
• We newly introduced a novel problem of Neural Network Search (NNS), whose goal is to
search for the optimal pretrained networks for a given dataset and conditions.
• We propose a novel cross-modal retrieval framework to retrieve a pretrained network from the model zoo for a given task via amortized meta-learning with a contrastive objective.
• We propose an efficient model-zoo construction method to construct an effective database
of dataset-architecture pairs considering the model performance.
• We train and validate TANS on a newly collected large-scale database, on which our method
outperforms all NAS & AutoML baselines with almost no architecture search cost and
significantly fewer fine-tuning steps.
Thank You !