Skill-Based Meta-Reinforcement Learning
Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J. Lim
Humans Leverage Prior Knowledge
Task: “Cook a pancake”
• A SAC policy learns the task from scratch, without prior knowledge.
• A human reuses skills, such as how to hold a frying pan and how to turn on the stove (skill-based RL).
• A human also reuses experience from related tasks, such as “Make a sandwich” and “Fry an egg” (meta-RL).
Skill-Based Reinforcement Learning [1, 2]
• Extract reusable skills, such as how to turn on a stove and how to hold a frying pan, from a task-agnostic dataset.
• Learn a target task (TT) from its reward by acting with these skills.
+ Efficient exploration
- Does not learn how to learn new tasks quickly
[1] Accelerating Reinforcement Learning with Learned Skill Priors. Pertsch et al. CoRL 2020
[2] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning. Ajay et al. ICLR 2021
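In SPiRL [1], the high-level policy chooses a latent skill z instead of a raw action. It is kept close to a skill prior learned from the dataset by a KL penalty. The sketch below shows that regularized per-step reward under some assumptions: both policy and prior are diagonal Gaussians, and the temperature `alpha` is fixed. The helper names are hypothetical, not SPiRL's code.

```python
import math
from typing import Sequence


def kl_diag_gaussian(mu_q: Sequence[float], std_q: Sequence[float],
                     mu_p: Sequence[float], std_p: Sequence[float]) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    kl = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        # Closed-form KL for one dimension of N(mq, sq^2) vs N(mp, sp^2)
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return kl


def regularized_reward(reward: float,
                       policy_mu: Sequence[float], policy_std: Sequence[float],
                       prior_mu: Sequence[float], prior_std: Sequence[float],
                       alpha: float = 0.1) -> float:
    """Hypothetical helper: task reward for one skill step minus alpha * KL(policy || skill prior)."""
    return reward - alpha * kl_diag_gaussian(policy_mu, policy_std, prior_mu, prior_std)
```

When the policy matches the prior, the penalty vanishes. A policy that drifts from the skill prior pays for it, which steers exploration toward skills that were common in the dataset.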
Meta Reinforcement Learning [1, 2]
• Meta-train on a set of training tasks (T1–T5), such as “Fry an egg” and “Make a sandwich”.
• Adapt quickly to a new target task (TT), such as “Cook a pancake”.
+ Learns how to learn quickly
- Limited to short-horizon tasks
[1] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Finn et al. ICML 2017
[2] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019
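To make "learning how to learn quickly" concrete, here is a toy instance of MAML [1]. It uses 1-D quadratic task losses L_i(θ) = (θ − c_i)², where each task is just a target value c_i. This is an illustrative assumption, not an RL task. The inner loop takes one gradient step per task, and the outer loop differentiates through that step to improve the shared initialization θ.

```python
from typing import List


def inner_adapt(theta: float, c: float, alpha: float) -> float:
    """One inner-loop gradient step on the toy task loss (theta - c)**2."""
    return theta - alpha * 2.0 * (theta - c)


def maml_train(task_targets: List[float], theta: float = 0.0, alpha: float = 0.1,
               beta: float = 0.05, steps: int = 500) -> float:
    """Meta-train the initialization theta with exact (second-order) MAML gradients."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_targets:
            adapted = inner_adapt(theta, c, alpha)
            # d/dtheta of (adapted - c)**2, chaining through d(adapted)/dtheta = 1 - 2*alpha
            meta_grad += 2.0 * (adapted - c) * (1.0 - 2.0 * alpha)
        theta -= beta * meta_grad / len(task_targets)
    return theta


theta_star = maml_train([1.0, 2.0, 6.0])  # converges to the mean target, 3.0
```

For this loss, the post-adaptation error is (1 − 2α)(θ − c_i), so the best initialization is the mean of the targets. One inner step from there moves toward any single task's optimum. For example, `inner_adapt(3.0, 6.0, 0.1)` gives 3.6.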
This Work: Meta-RL + Skill-based RL
• Inputs: a set of meta-training tasks (T1–T5) and a task-agnostic dataset; a new target task (TT).
• From the task-agnostic dataset: useful skills.
• From the meta-training tasks: how to learn quickly.
Skill-Based Meta-Reinforcement Learning
Goal: fast learning of new long-horizon tasks
SiMPL
Phase 1: Skill Extraction
Extract skills from task-agnostic offline data, following SPiRL [1].
[Figure: a skill covers a sequence of actions a0, a1, a2, a3 taken from states s0, s1, s2, s3, s4, … in the task-agnostic data]
[1] Accelerating Reinforcement Learning with Learned Skill Priors. Pertsch et al. CoRL 2020
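SPiRL [1] represents a skill as a fixed-length sequence of H consecutive actions, which a learned encoder embeds into a latent code. The a0 … a3 in the figure would correspond to H = 4. A skill prior is then conditioned on the state where the skill starts. The sketch below covers only the data-preparation step. It assumes trajectories are stored as parallel state and action lists. The function name and the stride-1 sliding window are illustrative choices, and the encoder, decoder, and prior networks are omitted.

```python
from typing import List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # state type
A = TypeVar("A")  # action type


def skill_windows(states: Sequence[S], actions: Sequence[A],
                  horizon: int) -> List[Tuple[S, Tuple[A, ...]]]:
    """Hypothetical helper: split one trajectory into (start_state, H-step action sequence) pairs.

    `states` holds s0..sT and `actions` holds a0..a_{T-1}. The start state is what
    a skill prior p(z | s) would condition on; the action window is what a skill
    encoder would embed into a latent skill z.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected len(states) == len(actions) + 1")
    if horizon < 1:
        raise ValueError("horizon must be positive")
    pairs = []
    # Slide a window of `horizon` actions over the trajectory with stride 1
    for t in range(len(actions) - horizon + 1):
        pairs.append((states[t], tuple(actions[t:t + horizon])))
    return pairs
```

Here is a usage example with the figure's labels. `skill_windows(["s0", "s1", "s2", "s3", "s4"], ["a0", "a1", "a2", "a3"], 4)` yields a single pair: start state `"s0"` with actions `a0` to `a3`.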
Phase 2: Skill-based Meta-training
Meta-train on top of the extracted skills, following PEARL [1].
• In each meta-training task (T1–T5), a task encoder maps the collected transitions to a task embedding.
• The meta-policy, conditioned on this embedding, selects skills.
[1] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019
Phase 3: Target Task Learning
Warm-start target task learning with the task encoding.
• Initial exploration: collect transitions in the target task (TT).
• The task encoder maps these transitions to a task embedding.
• The policy, conditioned on this embedding, selects skills and is then fine-tuned on the target task.
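The warm-start procedure on this slide can be sketched as a short pipeline. Every interface here is hypothetical and chosen only for illustration:

- `explore()` returns the transitions from one exploration episode.
- `task_encoder(context)` maps the collected transitions to a task embedding.
- `meta_policy(state, embedding)` is the meta-trained policy.

The fine-tuning step itself is left to the caller.

```python
from typing import Any, Callable, List, Tuple

Transition = Tuple[Any, Any, float, Any]  # (state, skill, reward, next_state)


def warm_start(meta_policy: Callable[[Any, Any], Any],
               task_encoder: Callable[[List[Transition]], Any],
               explore: Callable[[], List[Transition]],
               num_episodes: int) -> Tuple[Callable[[Any], Any], Any]:
    """Hypothetical helper: build the initial target-task policy from the meta-policy.

    1. Initial exploration: gather transitions in the target task.
    2. Task encoding: summarize them into a task embedding.
    3. Condition the meta-policy on that embedding; the returned policy is the
       starting point for fine-tuning on the target task.
    """
    context: List[Transition] = []
    for _ in range(num_episodes):
        context.extend(explore())
    embedding = task_encoder(context)

    def policy(state: Any) -> Any:
        # Same meta-policy, with the task embedding held fixed
        return meta_policy(state, embedding)

    return policy, embedding
```

The design point is that target task learning does not start from an uninformed policy. It starts from the meta-policy already specialized by the inferred embedding.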
Environment
• Maze Navigation: 2000-step episodes; sparse reward for task completion
• Kitchen Manipulation: 280-step episodes; sparse reward for each subtask completion
[Figure: meta-training tasks and target tasks for (a) Maze Navigation, with the agent, and (b) Kitchen Manipulation, where each task is a sequence of four subtasks (1–4) drawn from microwave, kettle, bottom burner, top burner, light switch, slide cabinet, and hinge cabinet]
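Acting in skill space shortens the effective decision horizon of these long tasks. Assume a skill length of H = 10 low-level steps, an illustrative value not given on the slides. Then the 2000-step maze episode needs only 200 skill selections, and the 280-step kitchen episode needs 28.

The kitchen's sparse per-subtask reward can also be sketched under a hypothetical rule. A task is an ordered list of four subtasks, and a reward of 1.0 is given only when the next unsolved subtask in that list is completed. The helper names are illustrative.

```python
from typing import Iterable, List


def num_skill_steps(episode_steps: int, skill_horizon: int) -> int:
    """Number of high-level skill selections needed to cover an episode (rounded up)."""
    return -(-episode_steps // skill_horizon)


def sparse_subtask_rewards(task: List[str], completed_events: Iterable[str]) -> List[float]:
    """Per-event rewards for an ordered list of subtasks (hypothetical reward rule).

    An event earns 1.0 only if it completes the next not-yet-solved subtask in
    `task`; every other event earns 0.0.
    """
    rewards = []
    next_idx = 0
    for event in completed_events:
        if next_idx < len(task) and event == task[next_idx]:
            rewards.append(1.0)
            next_idx += 1
        else:
            rewards.append(0.0)
    return rewards


task = ["microwave", "kettle", "slide cabinet", "hinge cabinet"]  # example ordering
events = ["kettle", "microwave", "kettle", "light switch", "slide cabinet", "hinge cabinet"]
rewards = sparse_subtask_rewards(task, events)  # [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```

Under this rule, most low-level steps earn nothing. With at most four rewarding events per episode, this is the long-horizon, sparse-reward regime that the method targets.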
SiMPL Learns Quickly
SiMPL solves this target task within 100 episodes, while the other baselines do not.
[Figure: maze agent trajectories from the start location to the target location at episodes 0, 20, and 100 for SiMPL (Ours), SPiRL, PEARL-ft, and MTRL, shown alongside the meta-training tasks; a second comparison of SPiRL and Ours at episodes 0, 20, and 80]
SiMPL Learns Quickly
SiMPL converges faster than the MTRL, skill-based RL (SPiRL), and meta-RL (PEARL, PEARL-ft) baselines.
[Figure: two panels of learning curves comparing SiMPL (Ours), SPiRL, MTRL, PEARL-ft, SAC, and PEARL]
Summary
• SiMPL leverages both an offline dataset and meta-training tasks by combining skill-based RL and meta-RL.
• SiMPL learns new long-horizon, sparse-reward tasks faster than the skill-based RL, meta-RL, and MTRL baselines.
Paper & Code : namsan96.github.io/SiMPL

