Understanding and Improving Transformer
From a Multi-Particle Dynamic System
Point of View
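• Paper: https://arxiv.org/abs/1906.02762
• Microsoft Research Asia
• Interprets the Transformer from the viewpoint of a multi-particle dynamic system
• Shows that the Transformer can be regarded as a numerical ODE solver
• Shows that the Transformer corresponds to the Lie-Trotter splitting scheme
• Replaces Lie-Trotter with the more accurate Strang-Marchuk splitting scheme, giving a new architecture, Macaron
• Evaluates Macaron on machine translation and GLUE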
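• Transformer as a multi-particle dynamic system: each word is a particle, and each layer is a step in time
• Each Transformer layer computes SA + residual → FFN + residual, which corresponds to the Lie-Trotter splitting scheme
• Self-Attention (SA): interaction among the particles (diffusion term)
• Position-Wise FFN (PW-FFN): each particle's own movement (convection term)
• Lie-Trotter has a local truncation error of order 2; the Strang-Marchuk splitting scheme improves it to order 3
→ Applying Strang-Marchuk to the Transformer gives the Macaron architecture
• Strang-Marchuk is more accurate than Lie-Trotter
• On NLP tasks, Macaron outperforms the Transformer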
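Residual networks and ODEs
• A first-order ODE: $\frac{dx(t)}{dt} = f(x(t), t)$
• e.g. x: position, t: time
• x(t) is the state at time t, and f(x, t) gives its rate of change
• Discretizing x(t) in time (Euler's method) → a residual connection
$$x(t = t_l) := x_l \quad (l \in [0, L-1]), \qquad t_l = t_0 + \gamma l, \qquad \gamma := \Delta t$$
$$\frac{\Delta x}{\Delta t} = \frac{x_{l+1} - x_l}{t_{l+1} - t_l} \approx f(x_l, t_l) \quad\Longrightarrow\quad x_{l+1} = x_l + \gamma f(x_l, t_l)$$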
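As a minimal sketch (assuming NumPy; the names `euler_step`, `residual_block`, and `integrate` are hypothetical), the discretization on this slide is a forward-Euler step. A residual block is the same update with the step size absorbed into the sublayer:

```python
import numpy as np

def euler_step(x, t, f, gamma):
    """One forward-Euler step: x_{l+1} = x_l + gamma * f(x_l, t_l)."""
    return x + gamma * f(x, t)

def residual_block(x, sublayer):
    """Residual block x_{l+1} = x_l + sublayer(x_l): an Euler step with gamma = 1."""
    return x + sublayer(x)

def integrate(x0, f, t0, gamma, num_layers):
    """Stack num_layers Euler steps, like a depth-L residual net with t_l = t0 + gamma * l."""
    x = np.asarray(x0, dtype=float)
    for l in range(num_layers):
        x = euler_step(x, t0 + gamma * l, f, gamma)
    return x

# Toy ODE dx/dt = -x, whose exact solution is x(t) = x0 * exp(-t).
f = lambda x, t: -x
approx = integrate([1.0], f, t0=0.0, gamma=0.01, num_layers=100)
print(approx[0], np.exp(-1.0))  # Euler estimate of x(1) vs. the exact value
```

Shrinking γ while increasing the number of steps (layers) brings the Euler estimate closer to the exact solution. This is the sense in which a deep residual network approximates a continuous trajectory.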
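ODE and the Lie-Trotter splitting scheme
• $x_i(t)$: position of the i-th particle at time t
• The ODE for each particle has 2 terms:
$$\frac{dx_i(t)}{dt} = F\big(x_i(t), [x_1(t), \dots, x_n(t)], t\big) + G\big(x_i(t), t\big), \quad i = 1, \dots, n$$
• F: interaction of particle i with the other particles at time t (diffusion term)
• G: movement of particle i by itself at time t (convection term)
• Lie-Trotter splitting: advance from t to t + γ by solving the F part and then the G part
• With Euler's method for each part, one Lie-Trotter step is ($\tilde{x}_i(t)$ is the intermediate position):
$$\tilde{x}_i(t) = x_i(t) + \gamma F\big(x_i(t), [x_1(t), \dots, x_n(t)], t\big)$$
$$x_i(t + \gamma) = \tilde{x}_i(t) + \gamma G\big(\tilde{x}_i(t), t\big)$$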
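The Lie-Trotter update with Euler sub-steps can be sketched in NumPy for n particles. The particular `F` (a pull toward the particles' mean) and `G` (a decay toward the origin) are hypothetical toy terms chosen only to make the two sub-steps visible. They are not the paper's definitions.

```python
import numpy as np

def F(x_i, xs, t):
    # Hypothetical interaction term: pull particle i toward the mean of all particles.
    return xs.mean(axis=0) - x_i

def G(x_i, t):
    # Hypothetical per-particle term: each particle decays toward the origin on its own.
    return -0.5 * x_i

def lie_trotter_step(xs, t, gamma):
    """Advance all n particles (rows of xs) from t to t + gamma with Lie-Trotter + Euler."""
    # Sub-step 1 (F): every particle sees the SAME positions xs at time t.
    x_tilde = np.stack([x_i + gamma * F(x_i, xs, t) for x_i in xs])
    # Sub-step 2 (G): each intermediate position moves independently of the others.
    return np.stack([x_i + gamma * G(x_i, t) for x_i in x_tilde])

rng = np.random.default_rng(0)
xs = rng.normal(size=(4, 3))  # 4 particles in 3-D
x_next = lie_trotter_step(xs, t=0.0, gamma=0.1)
print(x_next.shape)
```

Only the F sub-step mixes information across particles; the G sub-step acts on each particle separately. This is the same division of labor as self-attention and the position-wise FFN.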
Transformer
• An encoder and a decoder (Figure 1)
• Encoder
• Residual connections
• Self-Attention
• Position-Wise Feed-Forward Network
• Decoder
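$$\mathrm{Multihead}(Q, K, V) = \mathrm{concat}(\mathrm{head}_1, \dots, \mathrm{head}_H)\, W^O$$
$$\mathrm{head}_k = \mathrm{Attention}\big(Q W_k^Q, K W_k^K, V W_k^V\big)$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_{model}}}\right) \cdot V$$
$$W_k^Q \in \mathbb{R}^{d_{model} \times d_K}, \quad W_k^K \in \mathbb{R}^{d_{model} \times d_K}, \quad W_k^V \in \mathbb{R}^{d_{model} \times d_V}, \quad W^O \in \mathbb{R}^{H d_V \times d_{model}}$$
where $d_{model}$ is the model dimension, $d_K$ and $d_V$ are the per-head key and value dimensions, and H is the number of heads.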
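The formulas on this slide can be sketched in NumPy. The sketch assumes row-vector tokens (`Q`, `K`, `V` are n × d_model) and the √d_model scaling written on the slide. The function names and the sizes in the usage lines are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, d_model):
    # Scaled by sqrt(d_model), following the formula on the slide.
    return softmax(Q @ K.T / np.sqrt(d_model), axis=-1) @ V

def multihead(Q, K, V, WQ, WK, WV, WO):
    """WQ[k], WK[k]: (d_model, d_K); WV[k]: (d_model, d_V); WO: (H * d_V, d_model)."""
    d_model = Q.shape[-1]
    heads = [attention(Q @ wq, K @ wk, V @ wv, d_model)
             for wq, wk, wv in zip(WQ, WK, WV)]
    return np.concatenate(heads, axis=-1) @ WO

# Hypothetical sizes: n = 5 tokens, d_model = 8, H = 2 heads, d_K = d_V = 4.
rng = np.random.default_rng(0)
n, d_model, H, d_K, d_V = 5, 8, 2, 4, 4
X = rng.normal(size=(n, d_model))
WQ = [rng.normal(size=(d_model, d_K)) for _ in range(H)]
WK = [rng.normal(size=(d_model, d_K)) for _ in range(H)]
WV = [rng.normal(size=(d_model, d_V)) for _ in range(H)]
WO = rng.normal(size=(H * d_V, d_model))
out = multihead(X, X, X, WQ, WK, WV, WO)  # self-attention: Q = K = V = X
print(out.shape)
```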
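Transformer and Lie-Trotter
• Self-Attention sublayer
• $x_{l,i}$: representation of the i-th word at the l-th layer
• Layer l ↔ time t, i-th word ↔ i-th particle
• Each head computes attention weights over all words
• The heads are concatenated, projected, and added back through the residual connection, giving $\tilde{x}_{l,i}$
• PW-FFN sublayer
• PW-FFN plus the residual connection gives the (l+1)-th layer output $x_{l+1,i}$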
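$$e_{i,j}^{(k)} = \frac{\left(x_{l,i} W_k^{Q,l}\right)\left(x_{l,j} W_k^{K,l}\right)^T}{\sqrt{d_{model}}}, \qquad \alpha_{i,j}^{(k)} = \frac{\exp\big(e_{i,j}^{(k)}\big)}{\sum_{q=1}^{n} \exp\big(e_{i,q}^{(k)}\big)}$$
$$\tilde{x}_{l,i} = x_{l,i} + \mathrm{concat}\big(\mathrm{head}_1, \dots, \mathrm{head}_H\big)\, W^{O,l}, \qquad \mathrm{head}_k = \sum_{j=1}^{n} \alpha_{i,j}^{(k)} \left(x_{l,j} W_k^{V,l}\right)$$
$$x_{l+1,i} = \tilde{x}_{l,i} + \mathrm{FFN}_l\big(\tilde{x}_{l,i}\big)$$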
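Transformer and Lie-Trotter (continued)
• One Transformer layer = a Self-Attention sublayer followed by a PW-FFN sublayer
• One Lie-Trotter step = an F sub-step followed by a G sub-step
• The 2 sublayers of the Transformer correspond to the 2 sub-steps of Lie-Trotter: F ↔ Self-Attention, G ↔ PW-FFN, with the step size γ set to 1 (absorbed into F and G)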
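Putting the two sublayers together, one Transformer layer can be written as a Lie-Trotter step with Euler sub-steps. F is multi-head self-attention, and G is the position-wise FFN. This sketch rests on simplifying assumptions. LayerNorm, dropout, and positional encodings are omitted, and all names (`F_self_attention`, `G_ffn`, `transformer_layer`) are hypothetical. Setting `gamma=1.0` reproduces the residual sublayers.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def F_self_attention(xs, WQ, WK, WV, WO):
    """F(x_{l,i}, [x_{l,1}, ..., x_{l,n}]) for every i at once (rows of xs)."""
    d_model = xs.shape[-1]
    heads = []
    for wq, wk, wv in zip(WQ, WK, WV):
        e = (xs @ wq) @ (xs @ wk).T / np.sqrt(d_model)  # e_{i,j}^{(k)}
        heads.append(softmax(e, axis=-1) @ (xs @ wv))   # head_k for every i
    return np.concatenate(heads, axis=-1) @ WO

def G_ffn(xs, W1, b1, W2, b2):
    """G: a ReLU position-wise FFN applied to each row independently."""
    return np.maximum(0.0, xs @ W1 + b1) @ W2 + b2

def transformer_layer(xs, attn_params, ffn_params, gamma=1.0):
    """One Lie-Trotter step with Euler sub-steps; gamma = 1 gives the two residual sublayers."""
    x_tilde = xs + gamma * F_self_attention(xs, *attn_params)  # SA + residual
    return x_tilde + gamma * G_ffn(x_tilde, *ffn_params)       # PW-FFN + residual

rng = np.random.default_rng(0)
n, d, H, d_k, d_ff = 6, 8, 2, 4, 16
attn_params = ([rng.normal(size=(d, d_k)) for _ in range(H)],
               [rng.normal(size=(d, d_k)) for _ in range(H)],
               [rng.normal(size=(d, d_k)) for _ in range(H)],
               rng.normal(size=(H * d_k, d)))
ffn_params = (rng.normal(size=(d, d_ff)), np.zeros(d_ff),
              rng.normal(size=(d_ff, d)), np.zeros(d))
xs = rng.normal(size=(n, d))
out = transformer_layer(xs, attn_params, ffn_params)
print(out.shape)
```

Without positional encodings, permuting the input rows permutes the output rows the same way. This matches the particle picture, where the labeling of the particles does not matter.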
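Strang-Marchuk splitting scheme
• The Transformer corresponds to the Lie-Trotter splitting scheme
• Lie-Trotter is not the most accurate splitting scheme
• It applies F and then G, each for a full step γ
→ local truncation error of order 2
• A splitting scheme more accurate than Lie-Trotter
→ Strang-Marchuk
• Lie-Trotter: F → G; Strang-Marchuk: G (half step γ/2) → F (full step γ) → G (half step γ/2)
• Strang-Marchuk has a local truncation error of order 3, one order better than Lie-Trotter
$$\text{Lie-Trotter: } \mathcal{O}(\gamma^2), \qquad \text{Strang-Marchuk: } \mathcal{O}(\gamma^3)$$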
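The error orders can be checked numerically on a toy linear problem. This problem is an assumption made for illustration, not an experiment from the paper: dx/dt = (A + B)x with two non-commuting nilpotent matrices, whose sub-flows and full flow all have closed forms. Halving γ should shrink the one-step error by about 4× for Lie-Trotter (𝒪(γ²)) and by about 8× for Strang-Marchuk (𝒪(γ³)).

```python
import numpy as np

# Toy problem dx/dt = (A + B) x with non-commuting A (role of F) and B (role of G).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
I = np.eye(2)

def flow_A(x, h):
    # Exact flow of dx/dt = A x; A @ A = 0, so exp(hA) = I + hA.
    return (I + h * A) @ x

def flow_B(x, h):
    # Exact flow of dx/dt = B x; B @ B = 0, so exp(hB) = I + hB.
    return (I + h * B) @ x

def exact(x, h):
    # (A + B) @ (A + B) = I, so exp(h(A + B)) = cosh(h) I + sinh(h) (A + B).
    return (np.cosh(h) * I + np.sinh(h) * (A + B)) @ x

def lie_trotter(x, h):
    # F for a full step, then G for a full step.
    return flow_B(flow_A(x, h), h)

def strang_marchuk(x, h):
    # G for a half step, F for a full step, G for a half step.
    return flow_B(flow_A(flow_B(x, h / 2), h), h / 2)

def local_error(step, x, h):
    return np.linalg.norm(step(x, h) - exact(x, h))

x0 = np.array([1.0, 0.0])
for h in (0.02, 0.01):
    print(f"gamma={h}: Lie-Trotter {local_error(lie_trotter, x0, h):.3e}, "
          f"Strang-Marchuk {local_error(strang_marchuk, x0, h):.3e}")
```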
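Strang-Marchuk and the Transformer
• Apply the Strang-Marchuk splitting scheme to the Transformer
→ Macaron: FFN (half step) → Self-Attention → FFN (half step). The name comes from its shape: two FFN sublayers sandwich the attention sublayer like a macaron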
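A minimal NumPy sketch of a Macaron layer as one Strang-Marchuk step with γ = 1: a half-step FFN, a full-step self-attention, and another half-step FFN. The sketch makes several assumptions. It uses single-head attention for brevity and omits LayerNorm and dropout. The two FFNs are passed in as separate callables, so they could be either separately parameterized or tied. All names (`make_attention`, `make_ffn`, `macaron_layer`) are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def make_attention(wq, wk, wv):
    """Single-head self-attention playing the role of F."""
    def attn(xs):
        e = (xs @ wq) @ (xs @ wk).T / np.sqrt(xs.shape[-1])
        return softmax(e, axis=-1) @ (xs @ wv)
    return attn

def make_ffn(W1, W2):
    """Position-wise ReLU FFN playing the role of G (biases omitted)."""
    return lambda xs: np.maximum(0.0, xs @ W1) @ W2

def macaron_layer(xs, attn, ffn_bottom, ffn_top):
    """Strang-Marchuk step with gamma = 1."""
    x_tilde = xs + 0.5 * ffn_bottom(xs)   # G, half step
    x_hat = x_tilde + attn(x_tilde)       # F, full step
    return x_hat + 0.5 * ffn_top(x_hat)   # G, half step

rng = np.random.default_rng(0)
n, d, d_ff = 5, 8, 16
attn = make_attention(*(rng.normal(size=(d, d)) for _ in range(3)))
ffn_bottom = make_ffn(rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
ffn_top = make_ffn(rng.normal(size=(d, d_ff)), rng.normal(size=(d_ff, d)))
xs = rng.normal(size=(n, d))
print(macaron_layer(xs, attn, ffn_bottom, ffn_top).shape)
```

Compared with the Lie-Trotter layer, the only structural change is the extra half-step FFN before the attention sublayer and the 0.5 factor on both FFN outputs.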
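Experiments
• Machine translation
• IWSLT14 De→En
• WMT14 En→De
• Baseline: Transformer
• 6 encoder layers + 6 decoder layers
• 3 model sizes (base, big, small)
• base: d_model = 512, 8 heads; big: d_model = 1024, 16 heads; small: d_model = 512, 4 heads
• The Transformer layers are replaced with Macaron layers
• Language understanding: GLUE
• BERT base: 12 layers, hidden size 768, 12 heads
• FFN Macaron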
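Results
• Macaron outperforms the Transformer baseline on machine translation
• Macaron also improves over BERT base on GLUE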
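Summary
• Interpreted the Transformer as a multi-particle dynamic system
• Showed that the Transformer can be regarded as a numerical ODE solver
• Showed that the Transformer corresponds to the Lie-Trotter splitting scheme
• Replaced Lie-Trotter with the more accurate Strang-Marchuk splitting scheme to obtain Macaron
• Experiments on machine translation and GLUE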
Slides by Makoto Takenaka