[DL Reading Group] Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization (CoRL 2021)
1. DEEP LEARNING JP
[DL Papers]
http://deeplearning.jp/
Adversarial Skill Chaining for Long-Horizon Robot Manipulation via
Terminal State Regularization (CoRL 2021)
Presenter: Mitsuhiko Nakamoto, The University of Tokyo
2. Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization
https://clvrai.github.io/skill-chaining/
Authors:
Youngwoon Lee, Joseph J. Lim, Anima Anandkumar, Yuke Zhu (USC, NVIDIA, Caltech, UT Austin)
Conference: CoRL 2021
Summary:
Proposes an RL method that solves long-horizon manipulation tasks by chaining multiple skills
Bibliographic information
9. Step 2. Terminal State Regularization
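- The discriminator $D^{i+1}_\omega(s_t)$ outputs a large value when its input $s_t$ resembles an initial state of $\pi^{i+1}$
- A regularization term (terminal state regularization) that pushes the terminal states of $\pi^{i}$ toward the initial state distribution of $\pi^{i+1}$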
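The discriminator described above can be sketched as a binary classifier that labels initial states of the next skill as 1 and terminal states of the current skill as 0. This is a minimal illustration only: a plain logistic-regression classifier is swapped in here, since the slide does not specify the discriminator's architecture or loss. The names `train_discriminator` and `discriminate`, the learning rate, and the epoch count are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_discriminator(init_states, terminal_states, lr=0.1, epochs=200):
    """Fit D(s) = sigmoid(w . s + b) with per-sample gradient descent.

    init_states:     states from the next skill's initial-state set (label 1)
    terminal_states: states where the current skill terminated (label 0)
    Assumption: logistic regression stands in for the real discriminator.
    """
    dim = len(init_states[0])
    w = [0.0] * dim
    b = 0.0
    data = [(s, 1.0) for s in init_states] + [(s, 0.0) for s in terminal_states]
    for _ in range(epochs):
        for s, y in data:
            p = sigmoid(sum(wi * si for wi, si in zip(w, s)) + b)
            g = p - y  # gradient of binary cross-entropy w.r.t. the logit
            w = [wi - lr * g * si for wi, si in zip(w, s)]
            b -= lr * g
    return w, b


def discriminate(w, b, s):
    """Return D(s): close to 1 when s looks like an initial state of the next skill."""
    return sigmoid(sum(wi * si for wi, si in zip(w, s)) + b)
```

With toy 1-D states, for example initial states near 1.0 and terminal states near -1.0, the trained `discriminate` returns a value above 0.5 for states that resemble the next skill's initial states and below 0.5 for the others.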
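- Fine-tune $\pi^{i}$ with the following reward:
$$R^{i}(s_t, a_t, s_{t+1}; \phi, \omega) = \lambda_1 R^{i}_{\mathrm{ENV}}(s_t, a_t, s_{t+1}) + \lambda_2 R^{i}_{\mathrm{GAIL}}(s_t, a_t; \phi) + \lambda_3 R^{i}_{\mathrm{TSR}}(s_{t+1}; \omega)$$
The $R^{i}_{\mathrm{ENV}}$ and $R^{i}_{\mathrm{GAIL}}$ terms are the reward for succeeding at the task; the $R^{i}_{\mathrm{TSR}}$ term is the terminal state regularization.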
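The weighted sum of the three reward terms can be sketched as follows. The slide does not give the form of the TSR term, so this sketch assumes it is the log of the next skill's discriminator output, applied only on the step where the current skill terminates. The function names, the `eps` clamp, and the default weights of 1.0 are hypothetical placeholders, not values from the paper.

```python
import math


def tsr_reward(d_next, is_terminal, eps=1e-8):
    """Terminal state regularization term (assumed form).

    d_next:      discriminator output D^{i+1}(s_{t+1}), expected in (0, 1]
    is_terminal: True only on the step where skill i terminates
    Returns log D^{i+1}(s_{t+1}) on terminal steps and 0 otherwise.
    """
    if not is_terminal:
        return 0.0
    return math.log(max(d_next, eps))  # clamp to avoid log(0)


def total_reward(r_env, r_gail, d_next, is_terminal,
                 lam1=1.0, lam2=1.0, lam3=1.0):
    """Combine the terms: lam1 * R_ENV + lam2 * R_GAIL + lam3 * R_TSR."""
    return lam1 * r_env + lam2 * r_gail + lam3 * tsr_reward(d_next, is_terminal)
```

Under this assumed form, a terminal state that the discriminator scores close to 1 incurs almost no penalty, while a terminal state far from the next skill's initial state distribution receives a large negative term.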
11. Experimental environment: Furniture Assembly
TABLE LACK:
table leg x 4
CHAIR INGOLF:
seat supports x 2, chair seat x 1, front legs x 1
- Each task is decomposed into 4 subtasks
- 200 demonstrations are prepared per subtask for GAIL