1
Self-Supervised Radio-Visual Learning
Mo Alloulah
alloulah@outlook.com
A new sensing & perception learning paradigm
2
Outline
1. Motivation
a. Cost of labelling
b. Scale
c. Prior work
2. Foundations
3. Scaling data
4. Problem formulation for 6G
5. A new learning algorithm
6. Results
7. Summary
3
Motivation
Radio signals cannot be interpreted by inspection
Training radio sensing systems requires vision ground truth
Typically, a vision pipeline generates labels for training radio, but it needs human-in-the-loop annotation
⇒ very expensive ($$$), slow, manual model production
Madani et al. "Radatron: Accurate detection using multi-resolution cascaded MIMO radar." ECCV ’22.
5
Motivation
Guan et al. "Through fog high-resolution imaging using millimeter wave radar." CVPR ’20.
Zhao et al. "Through-wall human pose estimation using radio signals." CVPR ’18.
Li et al. "Making the invisible visible: Action recognition through walls and occlusions." ICCV ’19.
Predominantly MIT’s Prof. Dina Katabi and students
7
Cost of labelling
https://aws.amazon.com/sagemaker/data-labeling/pricing/
10
Cost of labelling
Say a 100k-sample dataset for a modest production model
⇒ 100k x (1.5 + 0.04) ≈ $154k
https://aws.amazon.com/sagemaker/data-labeling/pricing/
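The estimate on this slide can be reproduced in a few lines of Python. This is only a minimal sketch: it assumes that 1.5 and 0.04 are two per-sample charges in US dollars that simply add up, which is a reading of the slide rather than something the pricing page is quoted as saying here. The function name `labelling_cost` is hypothetical.

```python
def labelling_cost(num_samples: int, per_sample_fees: tuple[float, ...]) -> float:
    """Total cost when every sample incurs each listed fee (assumed additive)."""
    return num_samples * sum(per_sample_fees)


# Figures from the slide: 100k samples, charges of 1.5 and 0.04
# (assumed to be USD per sample).
total = labelling_cost(100_000, (1.5, 0.04))
print(f"${total:,.0f}")
```

Under the same assumption, the cost grows linearly with dataset size, so a dataset ten times larger would cost about ten times as much to label.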
11
Scale
Handcrafting target detection and
tracking algorithms is likely to become
prohibitively complex.
2k virtual array!
Cat
15
Prior work
Radio classification without labels
Use paired radio-visual data to automatically learn an object classifier in radio
No human-in-the-loop manual labelling
Simply learn from information common to both vision & radio
M. Alloulah et al. "Self-supervised radio-visual representation learning for 6G sensing." IEEE International Conference on Communications (ICC), 2022.
Compelling, but can we do a better (fine-grained) job?
17
Look, Radiate, and Learn
Idea in a nutshell
Simply ingest synchronised radio and vision data to do machine learning
• no explicit labels from vision
• tap into lower-level mutual information
• use a spatial backbone neural network for fine-grained encoding of the environment
18
Foundations
19
Provable guarantees for multi-modal learning
[1] Huang et al. What makes multi-modal learning better than single (provably). NeurIPS 2021.
Mapping information onto a latent space is provably better
done with multi-modal learning than uni-modal [1].
The true latent and the latents learnt from each set of modalities are images in the latent space.
The latent representation learnt from a larger set of modalities is closer to the true latent than one learnt from a subset of those modalities,
i.e., the representation learnt from more modalities is of better quality.
24
Mutual information
Variational bound [2]
Figure: radio and vision inputs each pass through their own encoder (Encoder 1, Encoder 2); training maximises agreement between the two encodings.
The bound is derived from a discrete Boltzmann/Gibbs distribution defined by an energy function.
Minimising the loss maximises MI between radio & vision; the bound depends on a sampling parameter (the number of contrastive samples) that is limited by hardware memory,
i.e., we can empirically train a neural net to maximise MI by Monte Carlo sampling on a dataset.
[2] van den Oord et al. Representation learning with contrastive predictive coding. arXiv:1807.03748, 2018.
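The contrastive loss behind this bound can be sketched in plain Python. This is an illustration of the InfoNCE loss from [2] on a toy batch, not the deck's implementation. The dot-product similarity (playing the role of a negative energy), the `temperature` value, and the function names are assumptions. With N paired samples, [2] gives the bound I(radio; vision) ≥ log N − loss, so a lower loss certifies more mutual information and a larger N raises the ceiling.

```python
import math


def info_nce(radio, vision, temperature=0.1):
    """InfoNCE loss for N paired embeddings (lists of equal-length float lists).

    Pair i is the positive; the other N-1 vision embeddings act as negatives.
    Similarity is a dot product scaled by a temperature (an assumption).
    """
    n = len(radio)
    loss = 0.0
    for i in range(n):
        logits = [sum(a * b for a, b in zip(radio[i], v)) / temperature
                  for v in vision]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # -log softmax probability of the true pair
    return loss / n


def mi_lower_bound(loss, n):
    """Bound from [2]: I(radio; vision) >= log(N) - loss."""
    return math.log(n) - loss


radio = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
aligned = info_nce(radio, radio)                   # encodings agree
shuffled = info_nce(radio, radio[1:] + radio[:1])  # pairs deliberately mismatched
print(aligned, shuffled, mi_lower_bound(aligned, len(radio)))
```

The aligned batch yields a much lower loss than the mismatched one, which is the "maximise agreement" behaviour the slide describes.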
28
Calibrating mutual information
The optimal amount of MI needed for training is a function of the downstream application [3].
• “minimal sufficient” encoding
Figure: captured info (# of bits), marking missing info, the optimal point, and excess info.
[3] Tian et al. What makes for good views for contrastive learning? NeurIPS 2020.
30
Scaling data
32
Dataset: MaxRay*
30,000 datapoints generated over 12 days of nonstop number crunching
* M Arnold et al. MaxRay: A raytracing-based integrated sensing and communication framework. IEEE JC&S 2022.
33
Problem formulation for 6G
36
Look, Radiate, and Learn
Cross-Modal Training: Paired Data
Radio-only Inference: Test Radio Heatmaps → Target Localisation
37
A new learning algorithm
40
Cross-Modal Training
Paired Data
Off-the-shelf segmentation
Vision Masks
41
(1) Contrastive Pre-training
Figure: the two encoders' output tensors, each with spatial “binning” dimensions and a feature “coding” dimension.
46
(2) Generate Self-Labels
48
(3) Downstream Training
49
Radio-only Deployment
50
Optimisation tricks (bonus)
51
Contrastive learning dimensionality
Images = 3x640x480
Encodings = 128x60x80 = 614,400
614,400 is quite large: gradients don’t fit in 8-GPU memory during backpropagation.
Figure: Radio Encoder and Visual Encoder (layers 0 to 3 each) joined by a contrastive loss, with forward and backward passes through both encoders.
Trick to fix: break the backpropagation chain [4] by alternating radio and vision encoder updates (forward pass 1 and backward pass 1, then forward pass 2 and backward pass 2).
[4] Xiong et al. LoCo: Local contrastive representation learning. NeurIPS 2020.
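The alternating-update idea can be illustrated with a toy example in plain Python. This is a sketch under heavy simplifications and not the deck's or LoCo's implementation. Each "encoder" is a single scalar weight, and a squared-error agreement loss stands in for the contrastive loss. All names and constants are hypothetical. What it does show is the structure of the trick: on each step only one encoder receives a gradient, while the other encoder's output is treated as a constant, so each backward pass traverses a single encoder.

```python
def agreement_loss(w_r, w_v, radio, vision):
    """Mean squared disagreement between the two scalar 'encodings'."""
    return sum((w_r * r - w_v * v) ** 2 for r, v in zip(radio, vision)) / len(radio)


def grad_wrt(w_own, w_other, own, other):
    """Gradient w.r.t. one encoder's weight, with the other encoder held constant."""
    return sum(2 * (w_own * a - w_other * b) * a for a, b in zip(own, other)) / len(own)


radio = [0.5, 1.0, 1.5, 2.0]
vision = [2 * r for r in radio]  # toy paired data: vision is a scaled copy of radio
w_r, w_v, lr = 1.0, 1.0, 0.05

initial = agreement_loss(w_r, w_v, radio, vision)
for step in range(200):
    if step % 2 == 0:
        # Pass 1: update the radio encoder only; vision output acts as a fixed target.
        w_r -= lr * grad_wrt(w_r, w_v, radio, vision)
    else:
        # Pass 2: update the vision encoder only; radio output acts as a fixed target.
        w_v -= lr * grad_wrt(w_v, w_r, vision, radio)
final = agreement_loss(w_r, w_v, radio, vision)
print(initial, final, w_r, w_v)
```

In an autodiff framework the same effect is usually obtained by stopping gradients through the frozen encoder's output, so activations for only one encoder need to be kept for the backward pass.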
55
Main results
56
Results
Overall localisation performance
MCL := masked contrastive learning
Our method outperforms all baselines in median localisation accuracy, including genie-aided CFAR by ~2x, and across two datasets.
60
Further Analysis
Effect of number of training labels on localisation
MCL := masked contrastive learning
SCL := spatial contrastive learning
Localisation benefits from training on more self-labels [5], which reaffirms the vast label scalability advantage of our self-supervised method.
[5] Guan et al. Who said what: Modeling individual labelers improves classification. AAAI 2018.
63
Further Analysis
Effect of radio-visual mutual information on localisation
Figure labels: mask padding; Wasserstein distance; missing info, optimal, excess info; # of bits.
It is important to calibrate radio-visual mutual information during self-supervised learning for optimal downstream performance [3].
[3] Tian et al. What makes for good views for contrastive learning? NeurIPS 2020.
69
Summary
70
Takeaways
Machine learning-based RF sensing works provided that
• goals are designed appropriately and measured with proper metrics
• radio kit is good
• data is abundant
• data is well curated
Fine-grained self-supervised radio-visual learning is a powerful learning paradigm
• automatic target localisation
• vast data scalability (you only have to: look, radiate, & learn)
• key technology for building perception models for next-gen high-res radars
Cheers


Look, Radiate, and Learn: Self-Supervised Localisation via Radio-Visual Correspondence