Look, Listen and Act [Navigation via Reinforcement Learning]

Look, Listen and Act
Written by 이의령, 김예찬, 양홍선
An agent that looks, listens, and acts
3D Environment HomeNavi

Project Introduction

Home Navigation in a 3D Environment
• House (Indoor) 3D Dataset
• Reinforcement Learning Environment
• Performs instruction-based tasks such as ‘Go to Kitchen’
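
As a rough illustration of what an instruction-based task in a reinforcement learning environment looks like, the sketch below runs one episode of the usual reset/step loop. `ToyHomeEnv`, its corridor of rooms, and the reward values are all hypothetical stand-ins invented for this example; they are not the House3D API.

```python
import random
from dataclasses import dataclass

# Hypothetical toy environment: rooms are laid out along a 1-D corridor.
# This is NOT the House3D API, only an illustration of the interaction loop.
ROOMS = ["bedroom", "living room", "kitchen", "garage"]


@dataclass
class ToyHomeEnv:
    max_steps: int = 20

    def reset(self, instruction):
        # Assumes the instruction has the form "Go to <room>".
        room = instruction.lower().replace("go to", "", 1).strip()
        self.target = ROOMS.index(room)
        self.position = 0
        self.steps = 0
        return self._observe()

    def _observe(self):
        return {"room": ROOMS[self.position], "target": ROOMS[self.target]}

    def step(self, action):
        # action: -1 moves left, +1 moves right along the corridor.
        self.position = min(max(self.position + action, 0), len(ROOMS) - 1)
        self.steps += 1
        reached = self.position == self.target
        reward = 1.0 if reached else -0.01  # small step penalty (assumed values)
        done = reached or self.steps >= self.max_steps
        return self._observe(), reward, done


def run_episode(env, instruction, policy, seed=0):
    """Roll out one episode and return (total reward, final observation)."""
    rng = random.Random(seed)
    obs = env.reset(instruction)
    total, done = 0.0, False
    while not done:
        obs, reward, done = env.step(policy(obs, rng))
        total += reward
    return total, obs


if __name__ == "__main__":
    always_right = lambda obs, rng: 1
    print(run_episode(ToyHomeEnv(), "Go to Kitchen", always_right))
```

A learned agent would replace `always_right` with a policy that maps the observation and instruction to an action.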

Project Introduction – House3D

Vision
• Image / video understanding
• 3D environment perception
Actions
• Camera motion
• Robotics / Manipulation
• APIs
Language
• Instruction following
• Question answering
• Dialog
‘Complete’ Agent


Motivation
Target-driven Visual Navigation Model using Deep Reinforcement Learning
Y. Zhu et al., ICRA 2017

Motivation
Could this be applied to mobile robots?

Mobile Robot & Navigation

Mobile Robot
A mobile robot is a robot that is capable of locomotion.
- Wikipedia -
Category | Subcategory | Technical content
Navigation | Driving | Path Planning, Obstacle Avoidance, Recognizing the surroundings
Navigation | Localization & Mapping | Dead Reckoning, Landmark, SLAM
Credit : Machine Learning & Robotics / Geonhee Lee

Path Planning
• Generates a trajectory from the current position to a goal point specified on the map
• The robot's route is produced in two parts: global path planning on the map and local path planning
• Algorithms: A*, D*, RRT (Rapidly-exploring Random Tree), Probabilistic Roadmap, etc.
Credit : Machine Learning & Robotics / Geonhee Lee
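
To make the global planning step concrete, here is a minimal sketch of A* on a small occupancy grid. The grid encoding (0 = free, 1 = obstacle), the 4-connected moves with unit cost, and the function name `astar` are assumptions made for this example, not something taken from the slides.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) / 1 (obstacle) cells.

    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    came_from = {start: None}
    cost_so_far = {start: 0}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if cost > cost_so_far[current]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < cost_so_far.get(nxt, float("inf")):
                    cost_so_far[nxt] = new_cost
                    came_from[nxt] = current
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


if __name__ == "__main__":
    grid = [
        [0, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
    ]
    print(astar(grid, (0, 0), (2, 0)))
```

In a full navigation stack, a local planner would then follow this global path while reacting to obstacles that are not on the map.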

SLAM
Simultaneous Localization and Mapping
• Computational problem of constructing a map of an environment
while simultaneously keeping track of a robot’s location
Credit : Fast Campus SLAM Workshop 2018 / Dong-Won Shin

SLAM
Visual Localization
• When GPS is inaccurate
• GPS-denied environment

SLAM
Mapping
• Scenarios in which a prior map is not available and needs to be built.
• Map can inform path planning or provide an intuitive visualization
for a human or robot.

Navigation
via Reinforcement Learning

Vision – Language – Navigation
Deep RL based Navigation

Vision - Language
Vision + Language Application
• Image Captioning
Input: an image
Desired Output: a caption, e.g.
• The man at bat readies to swing at the pitch while the umpire looks on.
• A large bus sitting next to a very tall building.

Vision - Language
Vision + Language Deep Learning Architecture
• Image Captioning
Credit : https://www.analyticsvidhya.com/blog/2018/04/solving-an-image-captioning-task-using-deep-learning/

Vision - Language
Vision + Language Application
• Visual Question Answering(VQA)
Input:
Q: What is the mustache made of?
Q: Is this a vegetarian pizza?
Desired Output:
A: Bananas
A: No

Vision - Language
Vision + Language Deep Learning Architecture
• Visual Question Answering(VQA)
Credit : https://arxiv.org/pdf/1505.00468v6.pdf

Vision - Language Navigation
Evolution of Language and Vision datasets towards Actions
Credit : https://lvatutorial.github.io/


3D Environment
Datasets
Environments
Tasks & Metrics
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das

3D Environment
Datasets: SUNCG (Song et al., 2017), Matterport3D (Chang et al., 2017), Stanford 2D-3D-S (Armeni et al., 2017)
Environments
Tasks & Metrics

3D Environment
Datasets: Matterport3D (Chang et al., 2017), Stanford 2D-3D-S (Armeni et al., 2017)
Environments:
• AI2-THOR (Kolve et al., 2017)
• MINOS (Savva et al., 2017)
• Gibson (Zamir et al., 2018)
• CHALET (Yan et al., 2018)
• House3D (Wu et al., 2017)
• HoME (Brodeur et al., 2018)
• VirtualHome (Puig et al., 2018)
• AdobeIndoorNav (Mo et al., 2018)
• Matterport3DSim (Anderson et al., 2018)
Tasks & Metrics

3D Environment
Datasets
Environments: AI2-THOR, MINOS, Gibson, CHALET (Yan et al., 2018), House3D (Wu et al., 2017), VirtualHome (Puig et al., 2018), AdobeIndoorNav (Mo et al., 2018), Matterport3DSim
Tasks & Metrics:
• EmbodiedQA
• Interactive QA (Gordon et al., 2018)
• Vision-Language Navigation
• Language grounding (Chaplot et al., 2017; Hermann & Hill et al., 2017)
• Visual Navigation (Zhu & Gordon et al., 2017; Savva et al., 2017; Wu et al., 2017)
Credit : Connecting Language and Vision to Actions ACL2018 Tutorial / Abhishek Das

3D Environment
All of the datasets, environments, and tasks listed on the previous slide appeared in 2017 or later (!)

Paper (in project)
House3D, Yi Wu et al. (2017)
• Built the House3D environment
• RoomNav training model
Gated Attention, Chaplot et al. (2017)
• Gated Attention Module
• Reference model for House3D RoomNav
Embodied QA, Abhishek Das et al. (2017)
• First approach combining VQA and RL
• Built the Embodied QA dataset
• Hierarchical model
• PACMAN training model
• CVPR 2018
FollowNet, P. Shah et al. (2017)
• Conditioned attention model
• Uses long language instructions
• ICRA 2018
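
The Gated Attention Module mentioned above fuses language and vision by turning the instruction embedding into one sigmoid gate per visual feature map and scaling each map by its gate. The sketch below shows that idea in dependency-free Python. The function name, the argument layout, and the single linear layer given by `weight` and `bias` are assumptions for illustration; it is a simplified sketch, not the authors' implementation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(feature_maps, instruction_vec, weight, bias):
    """Scale each visual feature map by a gate computed from the instruction.

    feature_maps:    list of C maps, each an H x W nested list of floats
    instruction_vec: list of D floats (e.g. the output of a text encoder)
    weight, bias:    C x D matrix and length-C vector of a linear layer
                     (hypothetical parameters, learned in a real model)
    """
    # One gate in (0, 1) per channel: sigmoid(W @ instruction + b).
    gates = [
        sigmoid(sum(w * x for w, x in zip(row, instruction_vec)) + b)
        for row, b in zip(weight, bias)
    ]
    # Broadcast each gate over its H x W map (element-wise product).
    return [
        [[gate * value for value in map_row] for map_row in fmap]
        for gate, fmap in zip(gates, feature_maps)
    ]


if __name__ == "__main__":
    maps = [[[1.0, 2.0], [3.0, 4.0]], [[2.0, 2.0], [2.0, 2.0]]]
    zero_instruction = [0.0, 0.0]
    weight = [[1.0, 1.0], [1.0, 1.0]]
    bias = [0.0, 0.0]
    # With a zero instruction and zero bias every gate is sigmoid(0) = 0.5.
    print(gated_attention(maps, zero_instruction, weight, bias))
```

In a trained model the gates differ per channel, so the instruction effectively selects which visual features are passed on to the policy.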

Paper
Target Driven Visual Navi, Yuke Zhu et al. (2017)
• Target-driven Visual Navigation in Indoor Scenes
• Siamese-style RL-based navigation model
• ICRA 2017
CMP, Gupta et al. (2017)
• Cognitive Mapping and Planning for Visual Navigation
• Value Iteration Network
• CVPR 2017
Visual Question Answering in Interactive Environments
• CVPR 2018
Vision and Language Navigation
• CVPR 2018 spotlight

Paper
Since 2017, papers and environments under the name Vision-Language Navigation have been appearing steadily.

Project Experiment

Dataset: SUNCG
Room type labels in the example scene: Bedroom, Toilet, Bathroom, Garage

Dataset: SUNCG
45,622 human-designed 3D scenes
On average 8.9 rooms and 1.3 floors per scene
20 room types (bedroom, living room, …)
80 object categories (cup, chair, …)

Tasks: RoomNav, Embodied QA
RoomNav

Tasks: RoomNav, Embodied QA
Embodied QA


Gated LSTM
Gated LSTM (ViZDoom)

Look / Listen / Act
Look
Listen

Look / Listen / Act
Look
Listen
Act

Experimental Results of
RoomNav Paper

Model trained on Easy → applied to Hard

The model has learned the relationship between visual input and instruction, but...

Navigation + House3D = Ultra-hard

Try training in stages
