Multispectral Transfer Network:
Unsupervised Depth Estimation for All-day Vision
AAAI 2018, New Orleans
Namil Kim*, Yukyung Choi*, Soonmin Hwang, In So Kweon
KAIST RCV Lab / All-day Vision Team
*Equal contributions
Problem definition
Why are we interested in depth?
“Crucial information” to understand the world around us
*From NVidia
3D understanding is necessary for autonomous decision making
Problem definition
How do we usually get “dense depth”
in any time of the day?
RGB-Stereo | 3D LiDAR
Day | Night
[Figure: stereo depth at day vs. night, and LiDAR point returns; a LiDAR with 0.16° angular resolution returns roughly 4 points at ≤ 11.45 m but only 2 points at ≥ 23.89 m]
RGB-Stereo: Sensitive / LiDAR: Sparse
Problem solution
3D LiDAR
Day / Night
Thermal (LWIR)
Depth Estimation
from a single thermal image
How do we usually get “dense depth”
in any time of the day?
RGB-Stereo
Related works
Single image based depth estimation
 Supervised depth estimation
 Unsupervised depth estimation
 Semi-supervised depth estimation
Supervised depth estimation
Supervised [NIPS’14, CVPR’15, ICCV’15, NIPS’16, PAMI’16]
Semi-supervised [CVPR’17]
Unsupervised [ECCV’16, 3DV’16, CVPR’17]
Unsupervised depth estimation
Semi-supervised depth estimation
Idea to all-day depth estimation
Day Night
Illumination change
RGB
O X
Unsupervised
Learning
Unsupervised
Learning
Idea to all-day depth estimation
Day Night
Illumination change
RGB / Thermal
O X
Robust to illumination change
Unsupervised
Learning
Unsupervised
Learning
Idea to all-day depth estimation
Day Night
Illumination change
RGB / Thermal
Alignment
O X
Thermal-to-depth
#1
#2
Unsupervised
Learning
Unsupervised
Learning
Idea to all-day depth estimation
Day Night
Illumination change
RGB / Thermal
Alignment
O X
Thermal-to-depth
Adaptation
Robust to illumination change
Unsupervised
Learning
Unsupervised
Learning
Requirements #1
Multispectral (RGB-Thermal) dataset
 RGB stereo pair
 Alignment between thermal and RGB(left)
 3D measurement
Yukyung Choi et al., KAIST Multispectral Recognition Dataset in Day and Night, TITS’18
Requirements #2
Multispectral (RGB-Thermal) Transfer Network
 Aim: Thermal to depth prediction
 Data: Thermal and aligned left RGB
(+ right RGB, stereo pair)
 Model: unsupervised method
RGB / Thermal
Alignment
O
U.S.L. (unsupervised learning)
Thermal-to-depth
Proposed framework
What is Multispectral Transfer Network?
Supervised method / Unsupervised method / MTN method
Contributions
Key Ideas of Proposed MTN (Overview)
1) Efficient Multi-task Learning
Predicting Depth, Surface Normals and Semantic Labels
with a Common Multi-Scale Convolutional Architecture,
ICCV2015.
Without annotated data:
we propose an efficient multi-task methodology: Depth and Chromaticity
Previous auxiliary tasks:
- surface normals
- semantic labels
- object pose annotations
* Most prior works are indoor
(collecting supervision for such auxiliary tasks outdoors is difficult)
Why chromaticity for multi-task depth estimation:
- No human-intensive annotation
- Relevant to depth
- Contains contextual information
Key Ideas of Proposed MTN (1/4)
Predicting Depth, Surface Normals and Semantic Labels
with a Common Multi-Scale Convolutional Architecture,
ICCV2015.
Previous works:
- surface normals
- semantic labels
- object pose annotations
* Most prior works are indoor
(collecting supervision for such auxiliary tasks outdoors is difficult)
Our work: Chromaticity
- No human-intensive annotation
- Relevant to depth
- Contains contextual information
1) Efficient Multi-task Learning
Without annotated data:
Propose an efficient multi-task methodology
Key Ideas of Proposed MTN (2/4)
Interleaver Module:
to directly interleave the chromaticity into the depth estimation
“Skip-connection meets Interleaver for feature learning”
Encoder Decoder
Multispectral Transfer Network (MTN)
2) Novel Module for Multi-task learning
Thermal Input
Disparity Output
Chromaticity Output
Conv.
DeConv.
Interleaver
Skip Connect.
Forward flow
Key Ideas of Proposed MTN (2/4)
2) Novel Module for Multi-task learning
1. Global/Un-Pooling + L2 Norm.
 Enlarges the receptive field [ParseNet] and transforms the features
2. Gating mechanism
 Controls how strongly the auxiliary task influences the main task
(especially during back-propagation)
3. Up-sampling and adding to the previous output
Equipped in every skip-connected flow
(full connections between layers)
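The three steps above can be sketched on plain feature arrays. This is a minimal illustration only: the gate parameterization (a sigmoid applied to the pooled vector, standing in for a learned gated convolution) and the shapes are assumptions, not the paper's exact layers.

```python
import numpy as np

def interleaver(skip_feat, aux_feat):
    """Illustrative Interleaver: pool+L2-norm, gate, up-sample, add.
    skip_feat: (C, 2H, 2W) skip-connected features of the main (depth) task.
    aux_feat:  (C, H, W) features of the auxiliary (chromaticity) task."""
    c, h, w = aux_feat.shape
    # 1. Global pooling + L2 normalization (enlarge receptive field, ParseNet-style),
    #    then "un-pool" by broadcasting the global vector back to (C, H, W).
    g = aux_feat.mean(axis=(1, 2))                  # global average pool -> (C,)
    g = g / (np.linalg.norm(g) + 1e-8)              # L2 normalization
    g = np.broadcast_to(g[:, None, None], (c, h, w))
    # 2. Gating: a sigmoid gate controls how much the auxiliary task
    #    influences the main task (placeholder for a learned convolution).
    gate = 1.0 / (1.0 + np.exp(-g))                 # values in (0, 1)
    gated = gate * aux_feat
    # 3. Up-sample (nearest-neighbor x2 here) and add to the skip-connected flow.
    up = gated.repeat(2, axis=1).repeat(2, axis=2)
    return skip_feat + up
```

Because the gated auxiliary branch is only added on top of the skip connection, dropping it at inference recovers a plain skip-connected encoder-decoder, matching the slide's point that the Interleaver can be removed at test time.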
Key Ideas of Proposed MTN (2/4)
2) Novel Module for Multi-task learning
 No need to find an optimal split point or
parameters.  <c.f., (b), (c), (d)>
 Reduces adverse effects from the inbuilt sharing
mechanism.  <c.f., (a), (b)>
 Optimized end-to-end with the same strategy as general
multi-task learning.  <c.f., (d)>
 At inference time, the Interleaver unit can be
removed.  <c.f., (d)>
(a) Fully Shared Architecture
(c) No shared Architecture (d) Connected Architecture
(b) Partial Split Architectures
Previous Multi-task Learning Our Multi-task Learning
Key Ideas of Proposed MTN (3/4)
3) Photometric Correction
“Thermal Crossover”
A thermal-infrared image is not directly affected by changing lighting conditions.
However, it does suffer indirectly from the cyclic (diurnal) illumination.
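The paper's correction function is data-driven and its exact form is not given here; as a purely illustrative stand-in, a percentile-based contrast stretch shows the kind of per-frame re-normalization involved (the percentile thresholds are assumptions):

```python
import numpy as np

def photometric_correction(thermal, low_pct=1.0, high_pct=99.0):
    """Hypothetical stand-in for a data-driven thermal correction:
    re-normalize a raw thermal frame whose global contrast drifts
    with the diurnal temperature cycle."""
    t = np.asarray(thermal, dtype=float)
    lo, hi = np.percentile(t, [low_pct, high_pct])  # robust intensity range
    out = (t - lo) / max(hi - lo, 1e-8)             # stretch to roughly [0, 1]
    return np.clip(out, 0.0, 1.0)
```

Applied as augmentation during training, such a correction exposes the network to temperature-variant versions of each frame, which is the role the slides assign to photometric correction.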
Key Ideas of Proposed MTN (4/4)
4) Adaptive scaled sigmoid function
We propose an adaptive scaled sigmoid as the activation feeding the bilinear sampler, to stably train the model.
Starting from a small initial maximum disparity β₀,
we iteratively increase the scale by α at each epoch
to cover the full disparity range by the end of training.
As the derivative shows, a fixed large scale
is unstable for large disparities in the initial training stages.
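A minimal sketch of this activation and its epoch-wise schedule (the β₀ and α values below are illustrative, not the paper's):

```python
import numpy as np

def adaptive_scaled_sigmoid(x, beta):
    """Disparity activation: a sigmoid scaled to the current maximum
    disparity beta, so outputs are bounded in (0, beta)."""
    return beta / (1.0 + np.exp(-x))

# Illustrative schedule: start from a small beta_0 and grow by alpha each
# epoch, confining early training to small, stable disparities.
beta0, alpha, epochs = 8.0, 4.0, 5
schedule = [beta0 + alpha * e for e in range(epochs)]
```

Bounding the disparity this way keeps the bilinear sampler from being driven by implausibly large warps before the network has converged.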
Results
Experimental results: Day
MTN
GT
Color / Thermal
Single Task LsMTN DsMTN MTN-P DIW [NIPS’16]
Without
Binary error map (error > 3 pixels)
[Eigen, NIPS2014]
[DIW, NIPS2016]
Daytime, 1~50m
Methods:      STN     LsMTN   DsMTN   MTN-P   MTN     STN-RGB  Eigen-RGB  Eigen-T  DIW-RGB  DIW-T
Distance (*Lower is better)
RMS           7.7735  6.6967  6.3671  7.0058  6.0786  7.5876   10.1792    10.2660  6.4993   6.4427
Log RMS       0.2000  0.1801  0.1761  0.1951  0.1714  0.2094   0.2386     0.2384   0.1934   0.1967
Abs. Relative 0.1531  0.1325  0.1259  0.1413  0.1207  0.1570   0.1992     0.1976   0.1644   0.1697
Sq. Relative  2.2767  1.6322  1.4394  1.7251  1.3119  2.0618   4.0629     4.0835   1.8030   1.7543
Accuracy (*Higher is better)
δ<1.25        0.8060  0.8358  0.8407  0.8040  0.8451  0.7772   0.7551     0.7561   0.7956   0.7825
δ<1.25²       0.9337  0.9492  0.9544  0.9440  0.9557  0.9378   0.8965     0.8947   0.9482   0.9454
δ<1.25³       0.9776  0.9842  0.9855  0.9827  0.9868  0.9806   0.9612     0.9618   0.9842   0.9851
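The error rows above are the standard single-image depth metrics introduced by Eigen et al. A minimal sketch, assuming numpy arrays of predicted and ground-truth depth in meters:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Standard monocular-depth error/accuracy metrics (Eigen et al. style)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # thresh < 1.25^k means the prediction is within a factor 1.25^k of gt
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "RMS": np.sqrt(np.mean((pred - gt) ** 2)),
        "Log RMS": np.sqrt(np.mean((np.log(pred) - np.log(gt)) ** 2)),
        "Abs. Relative": np.mean(np.abs(pred - gt) / gt),
        "Sq. Relative": np.mean((pred - gt) ** 2 / gt),
        "d<1.25": np.mean(thresh < 1.25),
        "d<1.25^2": np.mean(thresh < 1.25 ** 2),
        "d<1.25^3": np.mean(thresh < 1.25 ** 3),
    }
```

In the table these are computed over valid pixels in the 1~50 m range; distance errors are lower-is-better, δ accuracies higher-is-better.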
Experimental results: Night
Single Task  MTN  MTN-P  DIW [NIPS’16]
Without
Nighttime, 1~50m
Methods:  STN     LsMTN   DsMTN   MTN-P   MTN     STN-RGB  Eigen-RGB  Eigen-T  DIW-RGB  DIW-T
Ordinal Accuracy (*Higher is better)
ξ<10      0.3233  0.3405  0.3745  0.3096  0.4666  0.2508   0.1728     0.2033   0.1404   0.3744
ξ<20      0.6237  0.6855  0.6820  0.6225  0.7026  0.3284   0.2442     0.6178   0.3176   0.7459
ξ<30      0.7317  0.7753  0.7797  0.7397  0.7757  0.3592   0.3064     0.7516   0.3805   0.8401
[Eigen, NIPS2014]
[DIW, NIPS2016]
GT
Color / Thermal
Experimental Videos
Experimental Videos
Colors are mapped for visualization.
This 3D information comes from a single monocular thermal image.
Only the red part is used for inference.
Conclusion
Interleaver
in every skip-connected layer.
1. Pooling mechanism + L2 Norm.
(enlarge receptive field)
2. Gated Unit via Convolution
3. Up-sampling
 Employ multi-task learning for depth estimation
 A novel architecture for multi-task learning: the Interleaver
 Photometric correction helps when dealing with thermal images
 The adaptive sigmoid function helps stable convergence
http://multispectral.kaist.ac.kr
You can download Dataset & Code
Thank you
Q & A
Editor's Notes
  • #3: Visual perception tasks such as object detection, semantic segmentation, and tracking are essential techniques for recognizing objects around us. For the advanced self-driving stage, with fully automatic decision making, depth is crucial for understanding the world through the objects we have found. If you would like to know where you are, where you want to go, and how far away things are, depth estimation in all-day conditions is the most important technical issue for autonomous systems.
  • #4: The natural question that follows is: how do we get high-quality depth information from the world? In general, we have a stereo camera and 2D/3D LiDAR. The stereo camera works well at day-time but, as you can see, it totally fails at night-time because an RGB camera depends on illumination. LiDAR sensors have some advantages at night-time, but the depth measurement is very sparse and sensitive to weather conditions such as rain and snow.
  • #5: From these perspectives, we argue that an alternative sensor is needed to overcome these limitations. Therefore, we propose a thermal camera as an alternative sensor to RGB.
  • #6: We can divide single-image depth estimation into three categories as follows: (Click) Given the input image and the corresponding depth image, we can train a model to predict the depth directly. However, for outdoor scenarios it is very difficult to obtain high-quality, dense depth images for training. (Click) To overcome this, unsupervised methods have been proposed recently. Given rectified stereo images, the model is trained to learn the geometric relation between the two images, i.e. the inverse depth, as a disparity. Without high-quality depth ground truth, this approach has the big advantage of requiring only a calibrated stereo pair. (Detailed methodology skipped in the talk.) When we optimize the model, we first warp the input image using the predicted disparity to make a synthetic right image, and then we minimize the pixel-wise difference between the synthesized right image and the target right image. (Click) The last approach is semi-supervised. This model is trained to minimize both an unsupervised and a supervised loss, the latter from additional supervision such as LiDAR. From these perspectives, our model is based on the unsupervised framework to predict the depth image.
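The warp-and-compare objective described in this note can be sketched as follows. Nearest-neighbor sampling stands in for the differentiable bilinear sampler used in practice, and the disparity sign convention (right view sampled at x + d in the left image) is an assumption of this sketch:

```python
import numpy as np

def warp_and_loss(left, right, disparity):
    """Unsupervised stereo objective sketch: synthesize the right view by
    horizontally warping the left image with the predicted per-pixel
    disparity, then take the photometric L1 loss against the real right image.
    left, right: (H, W) grayscale images; disparity: (H, W) in pixels."""
    h, w = left.shape
    xs = np.arange(w)
    synth = np.empty((h, w), dtype=float)
    for y in range(h):
        # sample the left image at x + d(x), clamped to the image border
        src = np.clip(np.round(xs + disparity[y]).astype(int), 0, w - 1)
        synth[y] = left[y, src]
    return np.mean(np.abs(synth - right))   # photometric L1 loss
```

If the predicted disparity matches the true horizontal shift between the views, the synthesized right image matches the observed one and the loss vanishes; training minimizes this difference, which is why no depth ground truth is needed.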
  • #7: Now we explain the key concept of all-day depth estimation. As mentioned above, RGB-based unsupervised approaches operate well in well-lit conditions, but perform poorly in ill-lit conditions such as night, dawn, sunrise, and sunset.
  • #8: Compared to RGB images, a thermal image has the great advantage of capturing the scene contents regardless of the amount of lighting, because a thermal sensor measures the long-wavelength radiation emitted by subjects. Therefore, various thermal sensors have been increasingly used in modern robotics and computer-vision research on all-day recognition.
  • #9: Let’s suppose we have a depth-prediction model trained on RGB images in day-time. If we also have a thermal image aligned to the input RGB image, we can form a day-time pair of the thermal image and the corresponding depth image.
  • #10: So, if we can train the model using thermal-and-depth pairs in day-time conditions, it is possible to “adapt” this model to night conditions, since a thermal image is less affected by illumination conditions. To sum up, we can train the model using thermal input images and an RGB stereo pair to estimate depth in all-day conditions. This is the key concept of the proposed multispectral transfer network.
  • #11: For our purpose, there are two main requirements when estimating all-day depth from thermal images. The first is a large-scale multispectral dataset. So, we designed a multispectral dataset for depth estimation. Our dataset is captured with a calibrated RGB stereo pair, a thermal image co-aligned with the left-view RGB image, and 3D measurements. As shown in the example, we focus on real-world driving conditions, and we captured all scenarios in both well-lit and ill-lit conditions.
  • #12: The other requirement is modeling a framework that predicts depth information from multispectral pairs. Now, I will explain the proposed multispectral transfer network.
  • #13: Based on the unsupervised framework and our multispectral dataset, we propose the MTN method for single-thermal-image depth estimation. Instead of two separate models (an RGB-based unsupervised model and a thermal-based supervised model), we combine the two functions into a single model.
  • #15: Now, I’ll introduce our four contributions in this work. First, we propose unsupervised, efficient multi-task learning that, for the first time, adapts color information to depth estimation. Second, we propose a new multi-tasking architecture that prevents the adverse effects of general multi-task topologies. Third, we propose a photometric correction that augments thermal images with respect to cyclic illumination for all-day adaptation. Fourth, we propose an adaptive activation function for stable training. In today’s talk, I will focus on the second contribution due to the presentation time.
  • #17: In general, surface normals, segmentation masks, and poses have been used as additional tasks for multi-tasking. In outdoor environments, however, it is not easy to gather this kind of heavy supervision as auxiliary data. Therefore, we choose “chromaticity” as the auxiliary task of multi-task learning, for the following three reasons. First, obtaining color information doesn’t need human-intensive annotation. Second, color information has been used in many depth-related works for preserving geometric structure. Third, it has been shown that color information carries contextual information, which helps the network learn local structure.
  • #18: The second contribution is a new module, called the “Interleaver”, for better multi-task learning. Skip-connected encoder-decoder models have been widely used in previous pixel-level prediction tasks. Based on this model, we place the Interleaver in the middle of the skip-connection feature maps.
  • #19: Our goal is to increase the representation power for depth estimation by explicitly focusing on the important features of the additional task (color regression) and suppressing its unnecessary ones. Since the skip-connected input mostly carries spatially informative features, we use our module to blend in meaningful features from the color-regression task. We assume that the depth- and color-related tasks have different receptive-field needs, so we first apply a global/un-pooling mechanism similar to [ParseNet] and then feed the result into a gated convolution that controls the degree of influence of the additional task. We place the Interleaver in every skip-connected path.
  • #20: Compared to previous multi-tasking topologies, the gating mechanism of our Interleaver encourages the model to find the optimal split position and reduces the adverse effects that commonly occur with conventional methods. Therefore, we argue that our Interleaver can serve as a generalized topology for skip-connected encoder-decoder architectures. In each skip-connected layer, the Interleaver automatically learns the control parameters. Moreover, unlike the connected architectures, the Interleaver can be removed at the inference step for depth estimation.
  • #21: The third contribution concerns data augmentation for all-day adaptation. Thermal images are invariant to illumination, but variant to temperature. This makes the contrast of a thermal image vary over time. To mitigate this effect, we propose a data-driven correction function that tunes up the thermal image for temperature-variant augmentation in the training phase.
  • #22: The final key contribution is an adaptive activation function. For disparity estimation, the network is easily prone to predicting larger disparities than the ground truth, which causes training to diverge when the output is used in the bilinear sampler. So our trick is to increase the maximum disparity iteratively with a scaled sigmoid activation.
  • #24: Here are our results at day-time. (Pointing) This is the thermal input, (Pointing) the corresponding color image for visualization, and (Pointing) the ground truth. To verify our contributions, we show several results. (Pointing) You can compare them easily with this black-and-white error map. As you can see in the binary error map, our final MTN model shows the most accurate result. In the table, our final MTN model also achieves the best performance on every metric. All the multi-task models, LsMTN, DsMTN, and MTN, outperform the single-task model, so our multi-task learning is effective. LsMTN and DsMTN are in turn outperformed by our final model, which uses the Interleaver modules; so, empirically, our novel module works well for multi-task learning. Finally, we show that our photometric correction is another important factor, too.
  • #25: These are the results at night-time. As you can see, the thermal image retains good visibility even when the color image is totally dark. In this situation, it is very hard to get dense ground truth, so we propose a metric that measures performance at the locations given by the LiDAR ground truth. For a reasonable comparison, our metric measures not only the ordinal relations between points but also the depth accuracy simultaneously. With this new metric, our final model still shows good performance.
  • #26: This is a video result for our method. Colors are mapped only for visualization. Even though our method takes a single thermal image for inference, it achieves reasonable accuracy.
  • #27: In conclusion, we employ multi-task learning for depth estimation, and we propose a novel architecture for effective multi-task learning. Our photometric correction and adaptive sigmoid function are useful for training. We also show that our Interleaver effectively widens the network’s receptive field.
  • #28: That’s it. Thank you. Please refer to the website above for the dataset and related code used in this paper.