The Frontier of Deep Learning in 2020 and Beyond

Recent advances, Trends and Opportunities
Bhav Ashok
#ISSLearningFest

Overview
1. Brief history of Deep Learning
2. Trends
   1. Move to free data
   2. Learn everything
   3. Do more with less
3. Future of Deep Learning
   1. GPT-3 (Generative Pre-trained Transformer 3)
   2. NeRF (Neural Radiance Fields)

Brief history of Deep Learning
How it all began

History of deep learning
Fig: Timeline (< 2012, 2012, 2016, 2020). Before 2012: Classical ML. From 2012: Large-scale Deep Learning, with AlexNet (2012), VGG-16 (2014) and ResNet-101 (2015).

2012-2016
• GPU training of deep CNNs (Convolutional Neural Networks)
− AlexNet wins ImageNet 2012
− First superhuman performance on ImageNet (reported in 2015)
• Deeper CNNs improve performance
• More innovation in CNNs
− VGGNet in 2014
− ResNet wins ImageNet 2015
• Birth of large-scale deep learning

Fig: Timeline, continued. 2016 to 2020: AlphaGo (2016), NAS/AutoML (2017), Transformers (2018), Realistic GANs (2019).

2016-2020
• Trend 1: Move to free data
• Trend 2: Learn everything (AutoML)
• Trend 3: Do more with less data
• Deep Learning scales to other tasks
− Language models: Transformers
− Speech synthesis: Tacotron
− Deep RL: AlphaGo
− Generative models: Hyper-realistic GANs (Generative Adversarial Networks)

Fig: Timeline, continued. 2020 onwards: GPT-3 (2020), self-driving, AR.

2020+
• Previous successes in deep learning were mostly in controlled environments
− Web
− Games (video and board games)
− Academic benchmarks
• Real-world applications start to mature
− Self-driving, GPT-3, AR
• Promising research in 3D vision
• Few-shot learning and domain adaptation

Trend 1: Free data
The push for more annotated data at lower cost

Motivation
• Accuracy scales with data
− “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era” (2017)
• Problem: Labeling data is expensive
− Requires human annotators
− Around $0.50 - $10 per image
• Problem: User privacy
− Majority of training data is generated by consumers
− GDPR (adopted 2016, enforceable from 2018)
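To make the labeling-cost problem concrete, the per-image price range above can be scaled to a whole dataset. This is a back-of-the-envelope sketch: the function name `labeling_cost` and the dataset size of one million images are assumptions chosen for illustration, not figures from the talk.

```python
# Per-image price range quoted on the slide (USD).
COST_LOW, COST_HIGH = 0.50, 10.00

def labeling_cost(num_images):
    """Return the (low, high) human-labeling cost in USD for num_images images."""
    return num_images * COST_LOW, num_images * COST_HIGH

# Hypothetical dataset size, chosen only for illustration.
low, high = labeling_cost(1_000_000)
print(f"${low:,.0f} to ${high:,.0f}")
```

Even at the low end of the range, annotation for a dataset of this size is a six-figure expense, which is the motivation for the techniques that follow.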

Technological advances
• Techniques for improving data labelling efficiency
• Knowledge distillation
− Machines label data for you
• 3D to 2D supervision
− Label in 3D, profit in 2D
• Synthetic data
− Generate data and labels synthetically

Knowledge distillation
• Use a pre-trained “Teacher” model to label unlabeled data
• Train a “Student” model on the newly annotated dataset
• Machines annotate data instead of humans
− Promise: free labels on real images
− Problem: labels might be noisy and reinforce errors present in the Teacher model
− Recent advances: self-distillation, multi-Teacher distillation, human in the loop (active learning)
Fig: “Teacher” distills knowledge into “Student” (diagram labels: Image, Teacher, Label, Student)
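The Teacher/Student procedure above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the method of any particular paper: the `teacher` function is a hand-written stand-in for a pre-trained model, the data is one-dimensional, and the Student is a small logistic regression. All names are hypothetical.

```python
import math
import random

def teacher(x):
    """Stand-in for a pre-trained Teacher: labels x as 1 when x > 0.5."""
    return 1 if x > 0.5 else 0

def train_student(xs, labels, epochs=200, lr=0.5):
    """Fit a 1-D logistic regression Student on Teacher-provided labels."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # predicted P(label = 1)
            w -= lr * (p - y) * x                     # gradient of the log loss
            b -= lr * (p - y)
    return w, b

random.seed(0)
unlabeled = [random.random() for _ in range(200)]   # no human labels needed
pseudo_labels = [teacher(x) for x in unlabeled]     # the machine annotates
w, b = train_student(unlabeled, pseudo_labels)

def student(x):
    return 1 if w * x + b > 0 else 0

agreement = sum(student(x) == teacher(x) for x in unlabeled) / len(unlabeled)
print(f"Student agrees with Teacher on {agreement:.0%} of examples")
```

Because the Student only ever sees the Teacher's labels, any systematic Teacher error is copied into the Student, which is the noise problem listed above.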

3D to 2D supervision
• Reconstruct scene, annotate in 3D, profit in 2D
− Cheaper supervision on real 2D images
• Problems
− Depends on quality of 3D reconstruction of scene
− 3D annotation is expensive, though total cost may still be cheaper than 2D annotation

Fig: Rendered annotations with real image. Source: http://www.scan-net.org/
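The "annotate in 3D, profit in 2D" idea comes down to projecting one 3D annotation into every camera frame that sees it. The sketch below uses a simple pinhole camera model. The `project` helper, the camera positions and the intrinsics (`focal`, `cx`, `cy`) are illustrative assumptions, and camera rotation is ignored to keep the example short.

```python
def project(point, camera_position, focal, cx, cy):
    """Pinhole projection of a 3D world point into a camera located at
    camera_position and looking down +z (no rotation, for simplicity)."""
    x, y, z = (p - c for p, c in zip(point, camera_position))
    if z <= 0:
        return None  # point is behind the camera
    return focal * x / z + cx, focal * y / z + cy

# One 3D annotation: a labelled point in the reconstructed scene (toy values).
annotation = {"label": "chair", "point": (0.5, 0.0, 4.0)}

# Hypothetical camera positions for three frames of the same scene.
cameras = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 2.0)]

# A single 3D label yields one 2D label per frame at no extra annotation cost.
labels_2d = [
    (annotation["label"],
     project(annotation["point"], cam, focal=500.0, cx=320.0, cy=240.0))
    for cam in cameras
]
for label, pixel in labels_2d:
    print(label, pixel)
```

The quality of these 2D labels is bounded by the quality of the 3D reconstruction and camera poses, which is the first problem listed above.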

Synthetic data (Sim2Real)
• Create simulated environments and get annotations for free
• Unlimited variation at low cost
• Problems
− Difficulty in generalizing to the real-world domain
− Some human input required to generate useful simulations

Fig: Rendered image, rendered depth map, rendered segmentation annotation, and map of scene. Source: https://github.com/facebookresearch/House3D
Source: https://venturebeat.com/2020/07/17/why-unity-claims-synthetic-data-sets-can-improve-computer-vision-models/
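A minimal sketch of why synthetic annotations are free: when the scene is generated by a program, the label is known by construction. The `render_scene` function below is a hypothetical toy renderer (one bright square on a noisy background), not the House3D or Unity pipeline.

```python
import random

def render_scene(size=16, seed=None):
    """Draw one random bright square on a noisy background and return
    (image, mask). The mask is the segmentation label, known by construction."""
    rng = random.Random(seed)
    side = rng.randint(3, 6)
    top = rng.randint(0, size - side)
    left = rng.randint(0, size - side)
    image, mask = [], []
    for y in range(size):
        image_row, mask_row = [], []
        for x in range(size):
            inside = top <= y < top + side and left <= x < left + side
            base = 0.8 if inside else 0.2
            image_row.append(base + rng.uniform(-0.1, 0.1))  # sensor-like noise
            mask_row.append(1 if inside else 0)              # free annotation
        image.append(image_row)
        mask.append(mask_row)
    return image, mask

# Unlimited variation: every seed gives a new image with a perfect label.
dataset = [render_scene(seed=i) for i in range(100)]
```

A model trained only on such data may not transfer to real images, which is the generalization problem noted above.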

Trend 2: Learn everything
From data to architectures and optimizers (AutoML).

Learn everything
• Classical paradigm of Deep Learning (see figure)
• Learn everything: optimize all parts of the stack
− Architectures
− Optimizers
− Data augmentation
− Learning schedules and more
• Also known as AutoML or “Learning to learn”
Fig: Classical paradigm of Deep Learning (diagram labels: Data, Data augmentation, Network, Loss function, Optimizer)

Architectures
• People previously believed that human intuition was essential in architecture design
• Early architectures followed intuitions from pattern recognition
• Reinforcement Learning, Supervised Learning and Evolutionary Algorithms are commonly used in AutoML
• Neural Architecture Search (ICLR 2017)
− Used over 800 GPUs

Neural Architecture Search
• Idea: Use reinforcement learning to train a neural network to create a high-performing neural network
• Reward function: Accuracy of generated architecture
• Result: RL agent learns to generate architectures that produce high accuracy on the task
Fig: Example of generated architecture
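The reward loop above can be illustrated with a very small policy-gradient (REINFORCE) controller. This is a sketch under heavy simplifications: the search space is a single choice of layer width, the controller is a softmax over four logits rather than a recurrent network, and `evaluate` is a made-up stand-in for training the generated network and measuring its accuracy. All names and numbers are hypothetical.

```python
import math
import random

CHOICES = [16, 32, 64, 128]   # hypothetical search space: layer width

def evaluate(width):
    """Stand-in for training the generated network and measuring validation
    accuracy. The values are invented and peak at width 64."""
    return {16: 0.70, 32: 0.80, 64: 0.90, 128: 0.85}[width]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

random.seed(0)
logits = [0.0] * len(CHOICES)   # controller parameters
baseline, lr = 0.0, 0.1

for step in range(5000):
    probs = softmax(logits)
    i = random.choices(range(len(CHOICES)), weights=probs)[0]  # sample an architecture
    reward = evaluate(CHOICES[i])                              # reward = accuracy
    baseline = 0.9 * baseline + 0.1 * reward                   # moving-average baseline
    advantage = reward - baseline
    # REINFORCE: d log p(i) / d logit_j = 1[j == i] - p_j
    for j in range(len(logits)):
        logits[j] += lr * advantage * ((1.0 if j == i else 0.0) - probs[j])

best = CHOICES[max(range(len(CHOICES)), key=lambda j: logits[j])]
print("Controller prefers width", best)
```

In real architecture search each call to `evaluate` is a full training run, which is why the approach needed hundreds of GPUs.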

Trend 3: Do more with less
Semi-supervised learning, self-attention, self-play and more

Do more with less
• Self-training
− Learn more from unlabeled data
• Self-play
− AIs compete amongst themselves to improve
• Self-attention
− Learn better associations within input data

Self-training
• Self-training with noisy student
− Idea: perform self-distillation on unlabeled data but add noise during training to increase robustness of the model
− State of the art on ImageNet (as of 2020)
Fig: Step 1: Label example using current model (e.g. “Cat”). Step 2: Train model with the noisy example (+ noise) and the label from the previous step.
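The two steps in the figure can be written as a short loop. This is a toy sketch, assuming one-dimensional data and a logistic-regression model. Gaussian input noise stands in for the noise applied to the student during training, and the helper names `fit` and `predict` are hypothetical.

```python
import math
import random

def fit(xs, ys, noise=0.0, epochs=100, lr=0.5):
    """Train a 1-D logistic regression. Gaussian input noise stands in for
    the noise applied to the student during training."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            x_noisy = x + random.gauss(0.0, noise)
            p = 1.0 / (1.0 + math.exp(-(w * x_noisy + b)))
            w -= lr * (p - y) * x_noisy
            b -= lr * (p - y)
    return w, b

def predict(model, x):
    w, b = model
    return 1 if w * x + b > 0 else 0

random.seed(0)
labeled_x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]     # small hand-labeled set (toy)
labeled_y = [0, 0, 0, 1, 1, 1]
unlabeled_x = [random.random() for _ in range(200)]

model = fit(labeled_x, labeled_y)              # initial model, trained without noise
for _ in range(3):
    # Step 1: label the unlabeled examples using the current model.
    pseudo_y = [predict(model, x) for x in unlabeled_x]
    # Step 2: train a new model on noisy inputs with those labels.
    model = fit(labeled_x + unlabeled_x, labeled_y + pseudo_y, noise=0.05)
```

Each round the newly trained model becomes the labeler for the next round, so the unlabeled pool is reused rather than consumed.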

Self-play
• AIs compete to maximize Reinforcement Learning reward function
• Vital to breakthroughs in
− AlphaGo
− OpenAI Five
Fig: AlphaGo vs. AlphaGo, Match 41; OpenAI Five self-play

Self-attention
• Transformers introduced in the paper “Attention Is All You Need”
− Breakthrough in language models
− State of the art on multiple language tasks
• Example
− “The animal didn't cross the street because it was too tired”
− What does “it” refer to? Animal or street?
• Self-attention allows the model to look at the entire sentence and form associations, which it learns by training on language understanding tasks
Source: jalammar.github.io/illustrated-transformer
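The "look at the entire sentence" step is scaled dot-product self-attention: every token scores every other token, and the scores become mixing weights. Below is a minimal single-head sketch in plain Python. The three-token input and the identity projection matrices are illustrative assumptions, since in a trained Transformer the projections are learned.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence x (tokens x dims)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]                 # transpose of k
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]           # each token attends to all tokens
    return matmul(weights, v), weights

# Toy 3-token sequence with 2-d embeddings; identity projections are an assumption.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Row i of `weights` says how strongly token i draws on each token in the sequence. In the sentence above, a trained model would give “it” a large weight on “animal”.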

Future of Deep Learning
Expanding to the real world

Real world deep learning
• The real world is
− Complex
− Three-dimensional (physical space is in 3D)
• Applications need to
− Be robust and adapt to changing environments
− Understand 3D if deployed in the physical world
• Research
− Domain adaptation, few-shot learning
− 3D scene understanding

Recent breakthroughs
• GPT-3 (Generative Pre-trained Transformer 3, 2020)
− Adapts to new tasks using few-shot learning
− Generalizes surprisingly well to many useful applications
• NeRF (Neural Radiance Fields, 2020)
− Able to reconstruct and model 3D environments completely within a single neural network
− Qualitative results much better than classical reconstruction methods on real-world data

Generative Pre-trained Transformer 3 (GPT-3)
• Based on the Transformer model
• Uses pretraining and adapts to new tasks using few-shot learning
• Very large scale
− 175 billion parameters (10x more than any previous non-sparse language model)
− Trained on about 300 billion tokens of text
− Estimated to cost about US$12 million to train
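Few-shot learning in this setting means placing a handful of worked examples in the model's input and letting it continue the pattern, with no weight updates. The sketch below only builds such a prompt as a string. The `build_few_shot_prompt` helper and the "Input:/Output:" layout are illustrative assumptions, not a format required by any API.

```python
def build_few_shot_prompt(task, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then
    the new input with the output left for the model to complete."""
    lines = [task, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    task="Translate English to French.",
    examples=[("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")],
    query="cheese",
)
print(prompt)
```

Changing the task only requires changing the examples in the prompt, which is what lets one pretrained model serve many applications.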

GPT-3 - Writing code

GPT-3 – Designing UI
Source: https://twitter.com/jsngr/status/1284511080715362304

GPT-3 – Learning
Source: https://learnfromanyone.com/

GPT-3 – Other applications
• Writing stories
• Psychotherapy
• Food recipes
• Medical diagnosis
• Composing music
• And more
Source: https://learnfromanyone.com/

Neural Radiance Fields (NeRF)
• Training data: Sparse set of images + viewpoints
− Note: viewpoints can be recovered using traditional pose estimation techniques, so in practice this needs only a set of images
• Result: 3D reconstruction and view-dependent rendering
• Learns a function that maps points sampled along rays through image pixels (position + viewing direction) to RGB color + density
Fig: Viewing position + angle → Image pixels + density. Source: https://www.matthewtancik.com/nerf
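Once the function from position and viewing direction to color and density is learned, a pixel is rendered by sampling points along its ray and compositing them front to back. The sketch below shows that compositing step only. The `field` function is a hand-written stand-in for the trained network (a solid red sphere that ignores viewing direction), and the sample count and near/far bounds are arbitrary assumptions.

```python
import math

def field(position, direction):
    """Stand-in for the trained network: returns (rgb, density).
    Here a hand-written solid red sphere of radius 1 at the origin."""
    inside = sum(p * p for p in position) < 1.0
    return (1.0, 0.0, 0.0), (10.0 if inside else 0.0)

def render_ray(origin, direction, near=0.0, far=4.0, num_samples=64):
    """Composite color along one ray using the volume-rendering quadrature."""
    delta = (far - near) / num_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0                      # fraction of light not yet absorbed
    for i in range(num_samples):
        t = near + (i + 0.5) * delta         # midpoint of the i-th segment
        point = [o + t * d for o, d in zip(origin, direction)]
        rgb, sigma = field(point, direction)
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        for c in range(3):
            color[c] += transmittance * alpha * rgb[c]
        transmittance *= 1.0 - alpha
    return color

hit = render_ray(origin=(0.0, 0.0, -2.0), direction=(0.0, 0.0, 1.0))
miss = render_ray(origin=(3.0, 0.0, -2.0), direction=(0.0, 0.0, 1.0))
print("through sphere:", hit, "past sphere:", miss)
```

Because every step is differentiable, the same procedure can be run during training, comparing rendered pixels with the input photographs to fit the network.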

NeRF
• Photorealistic rendering that accounts for lighting/materials/viewpoint.

NeRF – photorealistic rendering
Fig: Example of classical reconstruction + rendering, compared with NeRF rendering
Source: youtube.com/watch?v=OsZvBEkJ6Vg

NeRF - results
Fig: Rendering with specularities
Fig: Result with fixed viewing position but varying viewing angle

NeRF - results
Fig: Photorealistic rendering
Fig: Models fine physical structures

NeRF
• Photorealistic rendering that accounts for lighting/materials/viewpoint
• Works in the real world

NeRF – Real world results
Fig: Brandenburg Gate, Trevi Fountain
Source: https://nerf-w.github.io/

NeRF
• Photorealistic rendering that accounts for lighting/materials/viewpoint
• Works in the real world
• Learns a neural representation of the 3D scene
− Useful for varying arbitrary quantities (e.g. lighting)
− Useful for multi-task learning

NeRF - Relighting
Source: https://nerf-w.github.io/

Thank You!
bhav@nuronlabs.com
