Task-Adaptive Neural Network Search
with Meta-Contrastive Learning
Wonyong Jeong*, Hayeon Lee*, Geon Park*, Eunyoung Hyung, Jinheon Baek, and Sung Ju Hwang
Graduate School of AI, KAIST, Seoul, South Korea
School of Computing, KAIST, Daejeon, South Korea
AITRICS, Seoul, South Korea
*: Equal Contribution
Motivation
Designing and tuning neural networks to obtain good models on a given dataset has typically required exhaustive trial-and-error and brute-force effort.
Neural Architecture Search (NAS) alleviates this cost by automatically building neural architectures that can perform even better than hand-crafted networks.
[Figure: the manual design process (human designs a model through trials and feedback) vs. the NAS pipeline (architecture search space, search strategy, and performance estimation strategy yielding an optimal architecture)]
Motivation: The Limitations
Most conventional NAS approaches search only for optimal architectures without providing trained parameters, which requires additional training on the given dataset.
Some recent NAS methods* rely on a supernet pretrained on ImageNet, which may be suboptimal when the target task is highly dissimilar from ImageNet.
[Figure: supernet pretraining on a large-scale dataset, followed by additional training phases on the target dataset]
*[Once-for-All] Cai, H et al. Once-for-all: Train one network and specialize it for efficient deployment. ICLR 2020.
Neural Network Search (NNS)
What if we could search not only for the optimal architecture but also for relevant parameters for a given dataset and conditions, reducing the additional training cost?
We introduce a novel problem, Neural Network Search (NNS), whose goal is to search for the optimal pretrained network for a given dataset and set of conditions.
[Figure: Neural Network Search takes a target dataset and desired conditions (accuracy, latency, # params, ...) and returns the optimal network with relevant pretrained knowledge]
Challenges
Several critical challenges must be addressed, such as where to search and how to find relevant pretrained models.
To tackle them, we construct our own model zoo and learn a cross-modal retrieval space for successful neural network search.
[Figure: key questions for Neural Network Search — how to construct the model zoo, how to encode datasets and parameters, and how to learn the cross-modal space]
TANS: Task-Adaptive Neural Network Search
To address these challenges, we propose Task-Adaptive Neural Network Search with Meta-Contrastive Learning (TANS).
TANS consists of several components: efficient model-zoo construction, model and query encoders, a performance predictor, and a meta-contrastive learning framework.
Methodology: Model Encoder & Functional Embeddings
To learn the cross-modal retrieval space, we must properly encode both models and datasets. For embedding pretrained models, how can we encode model parameters?
Our idea is to use each model's individual output for a single, unbiased criteria input generated from a Gaussian distribution; we call these outputs functional embeddings.
[Figure: an unbiased criteria input generated from a Gaussian distribution is fed forward across all models; each model's output serves as its individual interpretation of the criteria input]
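As a rough illustration (not the exact procedure from the paper), a functional embedding can be obtained by feeding one shared probe input, drawn once from a Gaussian, through each pretrained model and flattening its output; the probe shape, the toy models, and the output dimensionality below are assumptions.

```python
import torch
import torch.nn as nn

# A single "criteria" input, drawn once from a Gaussian and shared by all models,
# so that differences between embeddings reflect the models, not the input.
torch.manual_seed(0)
criteria_input = torch.randn(1, 3, 224, 224)  # hypothetical image-shaped probe

def functional_embedding(model: nn.Module, probe: torch.Tensor) -> torch.Tensor:
    """Run the shared probe through a pretrained model and flatten its output.

    The flattened outputs serve as a fixed-length summary of the model's learned
    function, i.e. its 'interpretation' of the probe.
    """
    model.eval()
    with torch.no_grad():
        out = model(probe)
    return out.flatten()

# Usage with two toy 'pretrained' models (stand-ins for model-zoo entries).
model_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
model_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))
v_f_a = functional_embedding(model_a, criteria_input)  # shape: (10,)
v_f_b = functional_embedding(model_b, criteria_input)
```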
Methodology: Model Encoder & Functional Embeddings
For architectural topology information, we adopt OFA*'s topological encoding, which contains the number of layers, kernel sizes, and channel expansion ratios.
We then merge the functional embedding v_f and the topology encoding v_a to learn the model embedding m via the model encoder E_m(v_a, v_f; φ): ℳ → ℝ^d.
[Figure: the model encoder combines the network architecture encoding with the functional embedding (⊕) to produce the model embedding]
*[Once-for-All] Cai, H et al. Once-for-all: Train one network and specialize it for efficient deployment. ICLR 2020.
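A minimal sketch of such a model encoder, assuming PyTorch and a simple concatenate-then-MLP fusion of the topology encoding v_a and the functional embedding v_f; the layer sizes and the contents of v_a are illustrative, not the exact design from the paper.

```python
import torch
import torch.nn as nn

class ModelEncoder(nn.Module):
    """Maps (topology encoding v_a, functional embedding v_f) to a model embedding m."""

    def __init__(self, topo_dim: int, func_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(topo_dim + func_dim, 256),  # merge the two views of the model
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, v_a: torch.Tensor, v_f: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([v_a, v_f], dim=-1))

# v_a could hold, e.g., per-stage depths, kernel sizes, and expansion ratios (values made up).
v_a = torch.tensor([[2., 3., 4., 3., 5., 7., 3., 4., 6.]])
v_f = torch.randn(1, 10)   # functional embedding, as in the previous sketch
encoder = ModelEncoder(topo_dim=v_a.size(-1), func_dim=v_f.size(-1))
m = encoder(v_a, v_f)      # model embedding in the cross-modal space
```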
Methodology: Query Encoder & Performance Predictor
We design a simple pooling-based set encoder as our query encoder E_q(D; θ): 𝒬 → ℝ^d, so that it produces a permutation-invariant query representation q.
Our performance predictor s(m, q; ψ) takes both the model embedding m and the query representation q and estimates the performance of the given pair.
[Figure: the query dataset is encoded into a query embedding and the network (architecture ⊕ functional embedding) into a model embedding; the performance predictor takes both and outputs the estimated performance]
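A sketch of a pooling-based set encoder and a pairwise performance predictor, under the same assumptions as the previous sketches; mean pooling over instance features, the feature dimension, and the MLP sizes are illustrative choices.

```python
import torch
import torch.nn as nn

class QueryEncoder(nn.Module):
    """Permutation-invariant set encoder: embed each sampled instance, then mean-pool."""

    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())  # per-instance encoder
        self.rho = nn.Linear(256, emb_dim)                           # post-pooling projection

    def forward(self, instances: torch.Tensor) -> torch.Tensor:
        # instances: (set_size, in_dim); mean pooling makes the output order-invariant
        return self.rho(self.phi(instances).mean(dim=0, keepdim=True))

class PerformancePredictor(nn.Module):
    """Scores a (model embedding, query embedding) pair, e.g. predicted accuracy."""

    def __init__(self, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * emb_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, m: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([m, q], dim=-1)).squeeze(-1)

# Usage: a query dataset represented by 32 sampled instance features (dimensions made up).
query_instances = torch.randn(32, 512)
q = QueryEncoder(in_dim=512)(query_instances)    # (1, 128) query embedding
m = torch.randn(1, 128)                          # model embedding from the model encoder
predicted_score = PerformancePredictor()(m, q)   # scalar estimate for the pair
```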
Methodology: Meta-Contrastive Learning
Putting the model encoder, query encoder, and performance predictor together, we perform amortized meta-contrastive learning to learn the cross-modal retrieval space.
Our objective maximizes the distance between embeddings of irrelevant (mismatched) model-query pairs while minimizing the distance of matched pairs, guided by our performance predictor.
[Figure: cross-modal latent space for model-query pairs — minimize the distance of the positive pair (q, m⁺), maximize the distances of negative pairs (q, m⁻), and guide learning based on the performance of the given pairs]
Methodology: Learning Objective
We design a contrastive loss ℒ_m for model embeddings and ℒ_q for query embeddings on our cross-modal retrieval space, optimizing the parameters θ and φ.
We further optimize the performance predictor ψ while learning the cross-modal space, training it to accurately estimate the performance of a given dataset-model pair via an MSE loss.
[Equations: contrastive terms over the positive pair (q⁺, m) and negative pairs (q⁻, m), plus the mean squared error term for training the performance predictor]
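A hedged sketch of how such an objective could look: an InfoNCE-style symmetric contrastive loss over a batch of matched dataset-model pairs plus an MSE term for the predictor. The exact loss form, temperature, and weighting in TANS may differ.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(q: torch.Tensor, m: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE-style loss over a batch of matched (q_i, m_i) pairs.

    Diagonal entries of the similarity matrix are positives (matched dataset-model
    pairs); off-diagonal entries act as negatives and are pushed apart.
    """
    q = F.normalize(q, dim=-1)
    m = F.normalize(m, dim=-1)
    logits = q @ m.t() / tau                       # (B, B) pairwise similarities
    targets = torch.arange(q.size(0))
    loss_q = F.cross_entropy(logits, targets)      # query -> model direction
    loss_m = F.cross_entropy(logits.t(), targets)  # model -> query direction
    return loss_q + loss_m

# Joint objective: contrastive retrieval loss + MSE for the performance predictor.
B, d = 8, 128
q_emb, m_emb = torch.randn(B, d), torch.randn(B, d)
pred_acc = torch.rand(B)   # predictor outputs s(m, q; psi) for the matched pairs
true_acc = torch.rand(B)   # recorded accuracies from the model zoo
total_loss = contrastive_loss(q_emb, m_emb) + F.mse_loss(pred_acc, true_acc)
```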
Methodology: Model-Zoo Construction
We use an uncertainty-guided approach to iteratively select the dataset-model pairs that are expected to expand the Pareto frontier the most from the current state.
This lets us significantly reduce the size of the model zoo while achieving higher performance than a randomly constructed model zoo.
[Figure: for dataset D, the current Pareto front over (# params, top-1 accuracy) and the expected improvements of the front from training Architecture B or Architecture C on D]
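A simplified sketch of the selection idea: among candidate (architecture, dataset) pairs, pick the one whose predicted (# params, accuracy) point is expected to improve the current Pareto front the most. The point-estimate scoring below (no explicit uncertainty modeling) and all numbers are illustrative.

```python
from typing import List, Tuple

def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Points are (# params, accuracy); keep those not dominated by a smaller, more accurate model."""
    front = []
    for p, a in points:
        if not any(p2 <= p and a2 >= a and (p2, a2) != (p, a) for p2, a2 in points):
            front.append((p, a))
    return sorted(front)

def front_gain(front: List[Tuple[float, float]], cand: Tuple[float, float]) -> float:
    """Crude expected improvement: accuracy headroom of the candidate over the best
    frontier model that is no larger than it (0 if it would be dominated)."""
    p, a = cand
    best_a = max((a2 for p2, a2 in front if p2 <= p), default=0.0)
    return max(0.0, a - best_a)

# Current frontier for dataset D and predicted outcomes for untrained candidates
# (all values are made up for illustration).
trained = [(2.0e6, 0.91), (4.5e6, 0.93), (8.0e6, 0.94)]
candidates = {"arch_B": (3.0e6, 0.945), "arch_C": (9.0e6, 0.942)}

front = pareto_front(trained)
best = max(candidates, key=lambda k: front_gain(front, candidates[k]))
print(best)  # the pair expected to expand the frontier the most gets trained next
```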
Experimental Setup: Datasets
We collect 96 real-world image datasets from Kaggle and split them into 86 meta-training and 10 meta-test datasets with no class-wise or instance-wise overlap.
We further partition the meta-training datasets into 140 sub-datasets, so that each has at most 20 classes when the original number of classes is extremely large.
Experimental Setup: Model-Zoo Construction
We train 100 neural architectures sampled from the OFA* search space on the 140 meta-training datasets, constructing a model zoo of 100 × 140 trained models.
To make this process more efficient, we can employ the efficient model-zoo construction algorithm to reduce the number of training rounds.
[Figure: model-zoo construction from real-world datasets — N architectures trained on M datasets]
*[Once-for-All] Cai, H et al. Once-for-all: Train one network and specialize it for efficient deployment. ICLR 2020.
Experimental Setup: Baseline Models
We use six baselines from four categories: base architecture, conventional NAS, weight-sharing NAS, and data-driven meta-NAS.
Base Architecture: MobileNet-V3 [1]
Conventional NAS: PC-DARTS [2], DrNAS [3]
Weight-sharing NAS: FBNet-A [4], Once-for-All [5]
Data-driven Meta-NAS: MetaD2A [6]
[1] Howard, A. et al. Searching for MobileNetV3. ICCV 2019.
[2] Xu, Y. et al. PC-DARTS: Partial channel connections for memory-efficient architecture search. ICLR 2020.
[3] Chen, X. et al. DrNAS: Dirichlet neural architecture search. ICLR 2021.
[4] Wu, B. et al. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. CVPR 2019.
[5] Cai, H. et al. Once-for-All: Train one network and specialize it for efficient deployment. ICLR 2020.
[6] Lee, H. et al. Rapid neural architecture search by learning to generate graphs from datasets. ICLR 2021.
Experimental Results: Meta-test Performance
TANS outperforms all baselines with almost zero search time, and it also greatly reduces training time because it can exploit relevant pretrained knowledge.
Method | Pre-trained Resource | Training Epochs | Search Time (GPU sec) | Training Time (GPU sec) | Speed Up | Accuracy (%)
MobileNetV3 | ImageNet 1K | 50 | - | 257 | 1.00× | 94.20
PC-DARTS | Scratch | 500 | 1100.37 | 5721 | 0.04× | 79.22
DrNAS | Scratch | 500 | 1501.75 | 5659 | 0.04× | 84.06
FBNet-A | ImageNet 1K | 50 | - | 293 | 0.88× | 93.00
OFA | ImageNet 1K | 50 | 121.90 | 226 | 0.74× | 93.89
MetaD2A | ImageNet 1K | 50 | 2.59 | 345 | 0.74× | 95.24
TANS (Ours) | Retrieved task | 50 | 0.002 | 200 | 1.28× | 96.28
Averaged performance of searched (retrieved) networks on 10 unseen real-world datasets
Experimental Results: Semantic Similarity
We show example images from the unseen meta-test query datasets (Query) and the meta-training model-zoo datasets (Retrieval) on which the retrieved models were pretrained.
In most cases, our method matches semantically similar datasets to the query datasets.
Even in semantically dissimilar cases, our models still outperform the other baselines.
[Figure: example query and retrieval images for semantically similar and dissimilar cases]
Experimental Results: Analysis & Ablation Study
We examine how accurately our model retrieves the paired network when a meta-training dataset is given (using unseen validation examples).
Meta-contrastive learning allows the model to accurately retrieve the paired models when the corresponding meta-training datasets are given.
Model | Recall @ Top 1 | Recall @ Top 5 | Mean
Random | 2.14 | 2.86 | 69.04
Largest Parameter | 3.57 | 7.14 | 51.85
TANS + Cosine Sim. Loss | 9.29 | 12.86 | 46.02
TANS + Hard Neg. Loss | 72.14 | 84.29 | 4.86
TANS + Meta-Contrastive Loss | 80.71 | 96.43 | 1.90
TANS w/o Predictor | 80.00 | 96.43 | 2.23
Cross-modal retrieval performance (left) and visualization of the cross-modal space (right)
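For reference, a small sketch of how Recall@k and mean rank can be computed for such a cross-modal retrieval evaluation, assuming each query embedding q[i] is paired with model embedding m[i]; cosine similarity and the embedding sizes are assumptions.

```python
import torch

def retrieval_metrics(q: torch.Tensor, m: torch.Tensor, ks=(1, 5)):
    """Cross-modal retrieval metrics, assuming q[i] is paired with m[i].

    Ranks every model embedding for each query by cosine similarity, then reports
    Recall@k (fraction of queries whose paired model is in the top k) and the
    mean rank of the paired model.
    """
    q = torch.nn.functional.normalize(q, dim=-1)
    m = torch.nn.functional.normalize(m, dim=-1)
    sims = q @ m.t()                                # (N, N) similarity matrix
    ranks = sims.argsort(dim=-1, descending=True)   # model indices, best first
    target = torch.arange(q.size(0)).unsqueeze(1)
    # position of the ground-truth model in each query's ranking (1-indexed)
    pos = (ranks == target).float().argmax(dim=-1) + 1
    recalls = {f"recall@{k}": (pos <= k).float().mean().item() for k in ks}
    return recalls, pos.float().mean().item()

# Usage with random embeddings (numbers will not match the table above).
q_emb, m_emb = torch.randn(140, 128), torch.randn(140, 128)
recalls, mean_rank = retrieval_metrics(q_emb, m_emb)
```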
Experimental Results: Analysis & Ablation Study
With our performance predictor, we obtain performance gains of 1.5-8 percentage points on the 10 meta-test datasets compared to the top-3 retrieved candidates.
Our efficient model-zoo construction algorithm selects Pareto-optimal network-dataset pairs, yielding a higher-performing model zoo than naive construction.
[Figure: performance gain (%) from the performance predictor (left) and effectiveness of our model-zoo construction algorithm (right)]
Conclusion
• We newly introduced a novel problem of Neural Network Search (NNS), whose goal is to
search for the optimal pretrained networks for a given dataset and conditions.
• We propose a novel cross-modal retrieval framework to retrieve a pretrained network from the model zoo for a given task via amortized meta-learning with a contrastive objective.
• We propose an efficient model-zoo construction method to construct an effective database
of dataset-architecture pairs considering the model performance.
• We train and validate TANS on a newly collected large-scale database, on which our method
outperforms all NAS & AutoML baselines with almost no architecture search cost and
significantly fewer fine-tuning steps.
Thank You !