Look, Listen and Act
Written by 이의령, 김예찬, 양홍선
An agent that looks, listens, and acts
3D Environment HomeNavi
Project Introduction
Home navigation based on a 3D environment
• House(Indoor) 3D Dataset
• Reinforcement Learning Environment
• Performs instruction-based tasks such as ‘Go to Kitchen’
Project Introduction – House3D
Project Introduction
Language / Actions / Vision
• Image / video understanding
• 3D environment perception
• Camera motion
• Robotics / Manipulation
• APIs
• Instruction following
• Question answering
• Dialog
‘Complete’ Agent
Motivation
Target-driven Visual Navigation Model using Deep Reinforcement Learning
Y. Zhu et al., ICRA 2017
Motivation
Could this be applied to mobile robots?
Mobile Robot & Navigation
Mobile Robot
A mobile robot is a robot that is capable of locomotion.
- Wikipedia
Category | Subcategory | Technologies
Navigation | Driving | Path Planning, Obstacle Avoidance
Recognizing the surroundings | Localization & Mapping | Dead Reckoning, Landmark, SLAM
Credit : Machine Learning & Robotics / Geonhee Lee
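Dead reckoning, listed in the table above, estimates the robot's pose by integrating its own motion commands or odometry, with no external reference. The sketch below is only an illustration under a simple unicycle model; the function name `dead_reckon` and the command values are hypothetical and do not come from the slides.

```python
import math

def dead_reckon(pose, v, omega, dt):
    """Advance (x, y, heading) by one step of a unicycle model.

    v is forward speed (m/s), omega is turn rate (rad/s), dt is the step (s).
    """
    x, y, theta = pose
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    # Wrap the heading into [-pi, pi) so it stays bounded.
    theta = (theta + omega * dt + math.pi) % (2 * math.pi) - math.pi
    return (x, y, theta)

pose = (0.0, 0.0, 0.0)
# Drive 1 m forward, turn 90 degrees left in place, drive 1 m forward.
commands = [(1.0, 0.0, 1.0), (0.0, math.pi / 2, 1.0), (1.0, 0.0, 1.0)]
for v, omega, dt in commands:
    pose = dead_reckon(pose, v, omega, dt)
print(pose)
```

Because every step adds its own small error, the estimate drifts over time, which is one reason landmark-based methods and SLAM appear in the same row of the table.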
Path Planning
• Generates a trajectory from the current position to a goal point specified on the map
• The robot's path is generated in two parts: global path planning on the map and local path planning
• Algorithms: A*, D*, RRT (rapidly-exploring random tree), Probabilistic Roadmap, etc.
Credit : Machine Learning & Robotics / Geonhee Lee
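As a concrete instance of global path planning, A* (one of the algorithms listed above) can be sketched on a small occupancy grid. This is a minimal illustration, not code from the project: the grid, the 4-connected unit-cost moves, and the Manhattan heuristic are all assumptions chosen for brevity.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (obstacle) cells."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates with unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (f = g + h, g, cell)
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None  # goal unreachable

GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path = astar(GRID, (0, 0), (3, 3))
print(path)
```

A local planner would then track this global path while reacting to obstacles that are not on the map.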
SLAM
Simultaneous Localization and Mapping
• Computational problem of constructing a map of an environment
while simultaneously keeping track of a robot’s location
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
SLAM
Visual Localization
• When GPS is inaccurate
• In GPS-denied environments
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
SLAM
Mapping
• Scenarios in which a prior map is not available and needs to be built.
• Map can inform path planning or provide an intuitive visualization
for a human or robot.
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin
Navigation
via Reinforcement Learning
Vision – Language – Navigation
Deep RL based Navigation
Vision - Language
Vision + Language Application
• Image Captioning
Input: (image)
Desired output (caption):
• The man at bat readies to swing at the pitch while the umpire looks on.
• A large bus sitting next to a very tall building.
Vision - Language
Vision + Language Deep Learning Architecture
• Image Captioning
Credit : https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/
Vision - Language
Vision + Language Application
• Visual Question Answering(VQA)
Input:
Q: What is the mustache made of?
Q: Is this a vegetarian pizza?
Desired output:
A: Bananas
A: No
Vision - Language
Vision + Language Deep Learning Architecture
• Visual Question Answering(VQA)
Credit : https://arxiv.org/pdf/1505.00468v6.pdf
Vision - Language Navigation
Evolution of Language and Vision datasets towards Actions
Credit : https://lvatutorial.github.io/
Vision - Language Navigation
Language / Actions / Vision
• Image / video understanding
• 3D environment perception
• Camera motion
• Robotics / Manipulation
• APIs
• Instruction following
• Question answering
• Dialog
‘Complete’ Agent
3D Environment
Datasets
Environments
Tasks & Metrics
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
3D Environment
Datasets
• SUNCG (Song et al., 2017)
• Matterport3D (Chang et al., 2017)
• Stanford 2D-3D-S (Armeni et al., 2017)
Environments
• AI2-THOR (Kolve et al., 2017)
• MINOS (Savva et al., 2017)
• Gibson (Zamir et al., 2018)
• CHALET (Yan et al., 2018)
• House3D (Wu et al., 2017)
• HoME (Brodeur et al., 2018)
• VirtualHome (Puig et al., 2018)
• AdobeIndoorNav (Mo et al., 2018)
• Matterport3DSim (Anderson et al., 2018)
Tasks & Metrics
• EmbodiedQA
• Interactive QA (Gordon et al., 2018)
• Vision-Language Navigation (Anderson et al., 2018)
• Language grounding (Chaplot et al., 2017; Hermann & Hill et al., 2017)
• Visual Navigation (Zhu & Gordon et al., 2017; Savva et al., 2017; Wu et al., 2017)
All from 2017 or later (!)
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das
Paper (in project)
House3D: Yi Wu et al. (2017)
• Built the House3D environment
• RoomNav training model
Gated Attention: Chaplot et al. (2017)
• Gated Attention Module
• Reference model for House3D RoomNav
Embodied QA: Abhishek Das et al. (2017)
• First approach combining VQA and RL
• Built the Embodied QA dataset
• Hierarchical model
• PACMAN training model
• CVPR 2018
FollowNet: P. Shah et al. (2017)
• Conditioned attention model
• Uses long natural-language instructions
• ICRA 2018
Paper
Target Driven Visual Navi: Yuke Zhu et al. (2017)
• Target-driven Visual Navigation in Indoor Scenes
• Siamese-style, RL-based navigation model
• ICRA 2017
CMP: Gupta et al. (2017)
• Cognitive Mapping and Planning for Visual Navigation
• Value Iteration Network
• CVPR 2017
Interactive QA (Gordon et al., 2018)
• Visual Question Answering in Interactive Environments
• CVPR 2018
Vision-Language Navigation (Anderson et al., 2018)
• Vision-and-Language Navigation
• CVPR 2018 spotlight
Paper
Since 2017, papers and environments under the heading of Vision-Language Navigation have been appearing steadily.
Project Experiment
Dataset + Environment + Task
Dataset: SUNCG
[Figure: example SUNCG scenes labeled by room type (Bedroom, Toilet, Bathroom, Garage)]
• 45,622 human-designed 3D scenes
• On average, 8.9 rooms and 1.3 floors per scene
• 20 room types (bedroom, living room, …)
• 80 object types (cup, chair, …)
Environment: House3D
Tasks: RoomNav, Embodied QA
RoomNav
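To make the RoomNav setup concrete, the following toy sketch mimics its structure: the agent receives an observation plus an instruction naming a target room, and is rewarded when it reaches that room. Everything here is hypothetical: `ToyRoomNav`, its 1-D room layout, the reward values, and `oracle_policy` are illustration-only names and numbers, and none of this is the House3D API.

```python
import random

class ToyRoomNav:
    """Hypothetical 1-D stand-in for a RoomNav-style task (not the House3D API)."""
    ROOMS = ["bedroom", "hallway", "living room", "kitchen"]
    ACTIONS = (-1, +1)  # index 0: move left, index 1: move right

    def __init__(self, max_steps=20, seed=0):
        self.rng = random.Random(seed)
        self.max_steps = max_steps

    def reset(self):
        self.pos = self.rng.randrange(len(self.ROOMS))
        # Pick a target room that differs from the starting room.
        others = [i for i in range(len(self.ROOMS)) if i != self.pos]
        self.target = self.rng.choice(others)
        self.steps = 0
        return self._observe()

    def _observe(self):
        return {"room": self.ROOMS[self.pos],
                "instruction": "go to " + self.ROOMS[self.target]}

    def step(self, action_index):
        move = self.ACTIONS[action_index]
        self.pos = min(max(self.pos + move, 0), len(self.ROOMS) - 1)
        self.steps += 1
        success = self.pos == self.target
        reward = 1.0 if success else -0.01  # small time penalty until success
        done = success or self.steps >= self.max_steps
        return self._observe(), reward, done

def run_episode(env, policy):
    """Roll out one episode and return the summed reward."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs))
        total += reward
    return total

def oracle_policy(obs):
    """Reads the room names directly; a learned agent must infer this from pixels and words."""
    here = ToyRoomNav.ROOMS.index(obs["room"])
    goal = ToyRoomNav.ROOMS.index(obs["instruction"][len("go to "):])
    return 1 if goal > here else 0

env = ToyRoomNav()
returns = [run_episode(env, oracle_policy) for _ in range(5)]
print(returns)
```

In the real task the observation is a first-person image from a 3D house, so the agent cannot read off its room and must learn both perception and exploration.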
Models
Gated LSTM (Vizdoom)
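Assuming the model on this slide refers to the gated-attention architecture of Chaplot et al. (2017), its fusion step can be sketched as follows: the instruction embedding passes through a linear layer and a sigmoid to give one gate per feature map, and each convolutional feature map is scaled by its gate. The toy shapes, weights, and the function name `gated_attention` are assumptions for illustration, written in plain Python rather than a deep learning framework.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_attention(feature_maps, instruction_vec, weights, bias):
    """Scale each feature map by a sigmoid gate computed from the instruction.

    feature_maps: C maps, each H x W (nested lists)
    instruction_vec: length-D instruction embedding
    weights: C x D linear layer, bias: length C
    """
    # One gate in (0, 1) per channel.
    gates = [
        sigmoid(sum(w * x for w, x in zip(row, instruction_vec)) + b)
        for row, b in zip(weights, bias)
    ]
    # Broadcast each gate over the spatial positions of its channel.
    fused = [
        [[gate * value for value in line] for line in fmap]
        for gate, fmap in zip(gates, feature_maps)
    ]
    return fused, gates

# Toy values: 2 channels of 2x2 features and a 3-dimensional instruction embedding.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
instruction = [1.0, 0.0, -1.0]
W = [[10.0, 0.0, 0.0], [-10.0, 0.0, 0.0]]
b = [0.0, 0.0]
fused, gates = gated_attention(features, instruction, W, b)
print(gates)
```

With these toy weights the instruction keeps the first channel almost unchanged and suppresses the second, which is the sense in which language selects the visual features that matter for the task.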
Look / Listen / Act
House3D / Vizdoom: 10%?
Experimental Results of RoomNav Paper
Return to Vizdoom!
Difficulty Level
30 hours
More time!
Apply the model trained on Easy to Hard
It knows the relationship between visual input and instruction, but...
Training
Starts to explore
Navigation + House3D = Ultra-hard
Try training in stages
Improve the representation
Questions?
