Multi-Task Learning for NLP
2017/04/17 Parsing Group
Motoki Sato
What is Multi-task?
• Single task: one model per task.
  Model 1: Input (sentence) → POS (task 1)
  Model 2: Input (sentence) → Chunking (task 2)
• Multi-task: one model handles several tasks.
  Model: Input (sentence) → POS (task 1) and Chunking (task 2)
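To make the contrast concrete, here is a minimal PyTorch sketch of the multi-task setup, not code from either paper: a single shared encoder feeds two task-specific output heads, so one forward pass serves both POS tagging and chunking. All names and sizes are illustrative.

```python
import torch.nn as nn

class MultiTaskTagger(nn.Module):
    # One shared encoder, one output head per task (illustrative sketch).
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_pos_tags, n_chunk_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared Bi-LSTM encoder: both tasks read its hidden states.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Task-specific classifiers.
        self.pos_head = nn.Linear(2 * hidden_dim, n_pos_tags)
        self.chunk_head = nn.Linear(2 * hidden_dim, n_chunk_tags)

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))  # (batch, seq, 2*hidden)
        return self.pos_head(h), self.chunk_head(h)
```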
Multi-task learning Paper (1)
• (Søgaard, 2016), ACL 2016 short paper.
• Tasks:
  – POS tagging (low-level task)
  – Chunking (high-level task)
Multi-task learning Paper (2)
• (Hashimoto, 2016), arXiv preprint.
• Tasks (many tasks):
  – POS tagging, chunking, dependency parsing,
  – semantic relatedness, textual entailment
Dataset
                      (Søgaard, 2016)  (Hashimoto, 2016)
POS                   Penn Treebank    Penn Treebank
Chunking              Penn Treebank    Penn Treebank
CCG                   Penn Treebank    -
Dependency parsing    -                Penn Treebank
Semantic relatedness  -                SICK
Textual entailment    -                SICK
(Søgaard, 2016)
POS       low-level task
Chunking  high-level task
CCG       high-level task
[Figure: examples of input words and the tags predicted for each task.]
Multi-task for Vision?
• Cha Zhang et al., "Improving Multiview Face Detection with Multi-Task Deep Convolutional Neural Networks"
• The tasks share hidden layers (a shared representation).
Multi-task for NLP?
• Collobert et al., "Natural Language Processing (Almost) from Scratch"
• The hidden layers are shared across tasks, with an individual output layer for each task.
(Søgaard, 2016) Outermost ver.
[Figure: three stacked Bi-LSTM layers over the input words w0–w3; both the POS
tag and the chunk tag are predicted from the outermost (3rd) layer.]
As in previous multi-task learning, the hidden layers are shared between the
tasks.
(Søgaard, 2016) lower-layer ver.
[Figure: the same three-layer Bi-LSTM stack over w0–w3, but the POS tag
(low-level task) is now predicted from the 1st layer, while the chunk tag
(high-level task) is still predicted from the 3rd layer.]
Previous multi-task learning shared only the hidden layers; here the low-level
task is supervised at a lower layer, as in the sketch below.
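A hedged sketch of this lower-layer variant (my reconstruction of the idea, not the authors' code): the LSTM stack is unrolled layer by layer so the POS head can read the 1st layer's states while the chunk head reads the 3rd layer's. For the outermost version, both heads would instead read `h3`.

```python
import torch.nn as nn

class CascadedTagger(nn.Module):
    # (Søgaard, 2016)-style sketch: POS supervised at layer 1, chunking at layer 3.
    def __init__(self, vocab_size, emb_dim, hidden_dim, n_pos_tags, n_chunk_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Three separate Bi-LSTM layers so intermediate states can be tapped.
        self.lstm1 = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.lstm2 = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.lstm3 = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.pos_head = nn.Linear(2 * hidden_dim, n_pos_tags)      # innermost supervision
        self.chunk_head = nn.Linear(2 * hidden_dim, n_chunk_tags)  # outermost supervision

    def forward(self, token_ids):
        h1, _ = self.lstm1(self.embed(token_ids))
        h2, _ = self.lstm2(h1)
        h3, _ = self.lstm3(h2)
        return self.pos_head(h1), self.chunk_head(h3)
```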
Experiments
[Table: accuracy on the low-level task and the high-level tasks, single-task
vs. multi-task setups.]
"It is consistently better to have POS supervision at the innermost rather than
the outermost layer."
(Søgaard, 2016) Domain Adaptation
• What is domain adaptation? A model trained on a source domain (e.g., the news
  domain) is transferred to a target domain (e.g., the Twitter domain).
(Søgaard, 2016) Source Training
[Figure: on the source domain (WSJ newswire), the full three-layer Bi-LSTM
stack is trained with POS supervision at the 1st layer and chunk supervision at
the 3rd layer.]
(Søgaard, 2016) Target Training
[Figure: on the target domain (broadcast and weblog text), POS is re-trained at
the 1st layer; there is no chunk training in the target domain.]
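To make the two phases concrete, here is a hedged sketch reusing the CascadedTagger from above. `source_loader` and `target_loader` are hypothetical data loaders yielding padded tensors, the tag-set sizes are assumed values, and padding masks are omitted for brevity.

```python
import torch.nn.functional as F
import torch.optim as optim

model = CascadedTagger(vocab_size=20000, emb_dim=100, hidden_dim=100,
                       n_pos_tags=45, n_chunk_tags=23)
opt = optim.Adam(model.parameters())

# Phase 1: source domain (WSJ newswire); POS and chunk losses are summed.
for tokens, pos_gold, chunk_gold in source_loader:
    pos_logits, chunk_logits = model(tokens)
    loss = (F.cross_entropy(pos_logits.flatten(0, 1), pos_gold.flatten())
            + F.cross_entropy(chunk_logits.flatten(0, 1), chunk_gold.flatten()))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Phase 2: target domain (broadcast/weblogs); POS supervision only,
# so the chunk head receives no target-domain gradients.
for tokens, pos_gold in target_loader:
    pos_logits, _ = model(tokens)
    loss = F.cross_entropy(pos_logits.flatten(0, 1), pos_gold.flatten())
    opt.zero_grad()
    loss.backward()
    opt.step()
```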
Domain Adaptation Experiments
High-level task supervision in the source domain, lower-level task supervision
in the target domain.
(Hashimoto, 2016)
[Figures: the joint many-task model of (Hashimoto, 2016); three slides of
architecture figures in the original deck.]
Training Loss for Multi-Task Learning
• In (Hashimoto, 2016), the loss for each task adds an L2-norm regularization
  term that keeps the parameters close to the embedding parameters obtained
  after training the final task in the top-most layer at the previous training
  epoch ("successive regularization", which guards against catastrophic
  forgetting across tasks).
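Spelled out (a reconstruction from the slide's description, with δ as the regularization strength and θ′ the frozen parameter snapshot from the previous epoch's final task):

```latex
\mathcal{L}_{\text{task}}(\theta)
  = \mathcal{L}_{\text{supervised}}(\theta)
  + \delta \left\lVert \theta - \theta' \right\rVert_2^2
```

Here θ′ is treated as a constant, so the penalty pulls the current parameters toward the previous epoch's solution without back-propagating into the snapshot.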
Dataset
                      (Søgaard, 2016)  (Hashimoto, 2016)
POS                   Penn Treebank    Penn Treebank
Chunking              Penn Treebank    Penn Treebank
CCG                   Penn Treebank    -
Dependency parsing    -                Penn Treebank
Semantic relatedness  -                SICK
Textual entailment    -                SICK
Since (Søgaard, 2016) uses the same dataset (the same input) for all tasks, the
per-task losses can simply be summed.
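For contrast, a hedged sketch of the (Hashimoto, 2016)-style alternative when tasks come from different datasets: train task by task, adding the successive-regularization penalty against a frozen embedding snapshot. It reuses `model` and `opt` from the sketch above; `pos_loader` is a hypothetical single-task loader and `delta` an assumed value.

```python
import torch.nn.functional as F

delta = 0.01  # successive-regularization strength (assumed value)

def successive_reg(model, prev_emb):
    # delta * ||theta_e - theta_e'||^2: penalize drift of the embeddings away
    # from their snapshot after the previous epoch's final task.
    return delta * (model.embed.weight - prev_emb).pow(2).sum()

prev_emb = model.embed.weight.detach().clone()  # theta', kept fixed
for tokens, pos_gold in pos_loader:  # one task's data at a time
    pos_logits, _ = model(tokens)
    loss = F.cross_entropy(pos_logits.flatten(0, 1), pos_gold.flatten())
    loss = loss + successive_reg(model, prev_emb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```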
Catastrophic Forgetting
• "Overcoming Catastrophic Forgetting in Neural Networks", James Kirkpatrick,
  Raia Hadsell, et al. https://arxiv.org/abs/1612.00796
• Commentary: https://theneuralperspective.com/2017/04/01/overcoming-catastrophic-forgetting-in-neural-networks/
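For reference, the elastic weight consolidation (EWC) objective from that paper, which slows learning on weights important for an earlier task A while training task B (F_i is the Fisher information for parameter i, θ*_{A,i} the old task's optimum, λ the penalty weight):

```latex
\mathcal{L}(\theta) = \mathcal{L}_{B}(\theta)
  + \sum_{i} \frac{\lambda}{2}\, F_{i}\,\left(\theta_{i} - \theta^{*}_{A,i}\right)^{2}
```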