Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation
Jinchao Zhang, Mingxuan Wang, Qun Liu, Jie Zhou
ACL 2017
Presenter: Sekizawa Yuuki (Komachi Lab, M2)
2017/11/13 1
Incorporating Word Reordering Knowledge into Attention-based Neural Machine Translation
• word reordering models are a crucial sub-component of SMT
• the attention mechanism of NMT is sometimes inappropriate, which leads to incorrect translations
• proposed method
• incorporates word reordering knowledge into attention-based NMT using distortion models
• attends to source words according to both the semantic requirement and the word reordering penalty
• achieves state-of-the-art translation quality
• improves word alignment quality
Chinese-English translation example
src: youguan baodao shi zhichi tamen lundian de zuixin yiju .
gloss: related report is support their arguments 's latest evidence .
ref: the report is the latest evidence that supports their arguments .
NMT output: the report supports their perception of the latest .
count(zuixin yiju) = 0 (the collocation does not occur in the training data)
zuixin (latest): a common adjective in Chinese
the word that follows it should be translated soon afterwards in the Chinese-to-English translation direction
yiju (evidence): does not receive appropriate attention (see the attention figure)
this leads to the incorrect translation
[Figure: attention for the example above, with the incorrect attention marked]
proposed method
• distortion model using word reordering knowledge
• modeled as the probability distribution of the relative jump distances between the newly translated source word and the to-be-translated source word
• extends the attention mechanism to attend to source words
• according to both the semantic requirement and the word reordering penalty
• merits
• extended word reordering knowledge
• convenient to incorporate into attention-based NMT
• flexible: various contexts can be used to compute the word reordering penalty
Distortion Models in SMT
[Equation annotations: distortion feature; other features; N: the number of features]
SMT: separately trained
NMT (proposed): trained end-to-end
proposed method: general architecture
• α̂_t: alignment vector computed by the basic attention mechanism
• d_t: alignment vector calculated by the distortion model
• λ: hyperparameter for interpolation
• c_t: related source context
• Ψ: context (source context, target context, or translation status, i.e. the decoder hidden state)
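The symbols on this slide fit together as an interpolation. The following is a minimal sketch, assuming the final alignment is λ·d_t + (1 − λ)·(basic attention vector) and that c_t is the alignment-weighted sum of encoder annotations. The function names, the toy numbers, and the name alpha_hat for the basic attention vector are illustrative and do not come from the slides.

```python
def interpolate_alignment(alpha_hat, d, lam=0.5):
    """Mix the basic attention vector with the distortion-model vector."""
    if len(alpha_hat) != len(d):
        raise ValueError("alignment vectors must cover the same source words")
    return [lam * dj + (1.0 - lam) * aj for aj, dj in zip(alpha_hat, d)]


def source_context(alignment, annotations):
    """c_t: alignment-weighted sum of the encoder annotations h_j (assumed form)."""
    dim = len(annotations[0])
    return [sum(a * h[i] for a, h in zip(alignment, annotations)) for i in range(dim)]


# Toy example with three source words and 2-dimensional annotations.
alpha_hat = [0.7, 0.2, 0.1]          # basic attention
d = [0.1, 0.8, 0.1]                  # distortion-model alignment
alpha = interpolate_alignment(alpha_hat, d, lam=0.5)
annotations = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c_t = source_context(alpha, annotations)
```

With λ = 0.5 both vectors contribute equally; λ = 0 would fall back to the basic attention mechanism.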
attention in the proposed method
k: a possible relative jump distance on source words
l: window size parameter (k ranges from −l to l)
P(·): probability of jump distance k given the context Ψ
Γ(·): shifts the previous alignment vector by k positions
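The equation these labels annotate did not survive extraction. A reconstruction from the symbol definitions, offered as a reading rather than a verbatim copy of the slide, is given below. Here m denotes the source sentence length (an assumed symbol), and the shifted vector is zero-padded.

```latex
d_t = \sum_{k=-l}^{l} P(k \mid \Psi)\, \Gamma(\alpha_{t-1}, k)

\Gamma(\alpha_{t-1}, k) =
\begin{cases}
(\alpha_{t-1,\,1-k}, \dots, \alpha_{t-1,\,m}, 0, \dots, 0) & k < 0 \\
\alpha_{t-1} & k = 0 \\
(0, \dots, 0, \alpha_{t-1,\,1}, \dots, \alpha_{t-1,\,m-k}) & k > 0
\end{cases}
```

A positive k moves attention mass forward in the source sentence, a negative k moves it backward, and |k| zeros fill the vacated positions.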
The distortion model estimates the probability distribution of the possible relative jump distances between the newly translated source word and the to-be-translated source word, conditioned on the context.
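To make the idea concrete, here is a small sketch of how a jump-distance distribution could be turned into an alignment vector: the previous alignment is shifted by each jump distance k in −l … l and the shifted copies are averaged with their probabilities. The function names and the toy numbers are illustrative assumptions, not code from the paper.

```python
def shift(alignment, k):
    """Move alignment mass k positions (k > 0 forward, k < 0 backward), zero-padded."""
    m = len(alignment)
    if k == 0:
        return list(alignment)
    if abs(k) >= m:
        return [0.0] * m
    if k > 0:
        return [0.0] * k + list(alignment[:m - k])
    return list(alignment[-k:]) + [0.0] * (-k)


def distortion_alignment(prev_alignment, jump_probs, l=3):
    """Sum over k in [-l, l] of P(k) * shift(prev_alignment, k).

    jump_probs[i] is the probability of jump distance k = i - l.
    """
    if len(jump_probs) != 2 * l + 1:
        raise ValueError("expected 2*l + 1 jump probabilities")
    d = [0.0] * len(prev_alignment)
    for i, p in enumerate(jump_probs):
        shifted = shift(prev_alignment, i - l)
        d = [dj + p * sj for dj, sj in zip(d, shifted)]
    return d


# Toy case: the previous step attended to source word 2 (0-based) of 5.
# The model puts 0.8 on a forward jump of 1 and 0.2 on a backward jump of 1.
prev = [0.0, 0.0, 1.0, 0.0, 0.0]
probs = [0.0, 0.0, 0.2, 0.0, 0.8, 0.0, 0.0]  # k = -3 .. 3
d_t = distortion_alignment(prev, probs, l=3)
```

Because shifting pads with zeros, mass that jumps past either end of the sentence is dropped, so the result need not sum to 1 near sentence boundaries.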
three distortion models (1/2)
1. S-Distortion model
• adopts the previous source context c_{t-1} as the context Ψ, with the intuition that certain source words indicate certain jump distances
• underlying linguistic intuition: synchronous grammars
• e.g. NP → JJ NN | JJ NN, JJ → zuixin | latest
• once zuixin (latest) is translated, the translation orientation is forward with shift distance 1
three distortion models (2/2)
2. T-Distortion model
• exploits the embedding of the previously generated target word y_{t-1} as the context Ψ
• focuses on word reordering knowledge conditioned on the target word context
3. H-Distortion model
• uses the decoder hidden state s_{t-1}, which reflects the translation status and contains both source and target context information
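The three models differ only in which context vector they condition on. The sketch below assumes the jump-distance distribution comes from a single linear layer followed by a softmax over the 2l + 1 distances; that parameterization, the weights, and the toy context vector are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def jump_distribution(context, weights, bias):
    """P(k | context) for k = -l .. l, from one linear layer plus softmax (assumed form)."""
    scores = [sum(w * x for w, x in zip(row, context)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


l = 1                                  # toy window: k in {-1, 0, +1}
# Stand-in for c_{t-1} (S), the embedding of y_{t-1} (T), or s_{t-1} (H).
context = [1.0, 0.0]
weights = [[0.0, 1.0], [0.0, 0.0], [2.0, 0.0]]   # hypothetical, one row per k
bias = [0.0, 0.0, 0.0]
probs = jump_distribution(context, weights, bias)
```

In this toy setup the context activates the row for k = +1, so a forward jump of one word gets the highest probability, mirroring the zuixin example.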
Experiment
• language pair: Chinese-to-English
• data
• train: 1.25M sentence pairs from LDC corpora
• validation: NIST 2002 dataset
• test: NIST 2003-2006 datasets
• alignment data: Tsinghua dataset (Liu and Sun, 2015), which contains 900 manually aligned sentence pairs
• evaluation: BLEU, alignment error rate (AER)
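For reference, AER is commonly defined as 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the predicted link set, S the sure gold links, and P the possible gold links (with S ⊆ P). The sketch below uses that standard definition; whether the evaluation here distinguishes sure from possible links is not stated on the slides, and the link sets are made-up toy data.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|); S is treated as a subset of P."""
    a, s = set(predicted), set(sure)
    p = set(possible) | s
    if not a and not s:
        return 0.0
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Links are (source index, target index) pairs; toy data only.
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}
predicted = {(0, 0), (2, 1), (3, 3)}
aer = alignment_error_rate(predicted, sure, possible)
```

If the gold data has only one link type, passing an empty `possible` set reduces the formula to 1 − 2|A ∩ S| / (|A| + |S|). Lower is better.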
• MT systems
• Moses, Groundhog, RNNsearch* (in-house implementation)
• NMT hyperparameters
• max sentence length: 50
• vocabulary size: 16K, 30K
• encoder: bi-directional GRU
• word embedding dimension: 620
• hidden layer size: 1,000
• interpolation parameter λ: 0.5
• window size l: 3
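The settings above can be collected in one place, for example as a small configuration object. This is only an illustrative sketch: the class and field names are made up, and reading "16K" and "30K" as 16,000 and 30,000 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NMTConfig:
    max_sentence_length: int = 50
    vocab_size: int = 30_000          # the slides report both 16K and 30K
    embedding_dim: int = 620
    hidden_size: int = 1000
    interpolation_lambda: float = 0.5
    window_size: int = 3

    @property
    def num_jump_distances(self) -> int:
        """Jump distances k range over -l .. l, so there are 2*l + 1 of them."""
        return 2 * self.window_size + 1


# One configuration per vocabulary setting.
configs = [NMTConfig(vocab_size=16_000), NMTConfig(vocab_size=30_000)]
```

With l = 3 the distortion model distributes probability over seven jump distances, from three words back to three words ahead.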
results (BLEU)
the 16K vocabulary shows a larger improvement than the 30K vocabulary
the proposed models alleviate the rare word collocation problem, which leads to incorrect word alignments
comparison with previous work
• Coverage: basic RNNsearch model with a coverage model to alleviate the over-translation and under-translation problems
• MEMDEC: improves translation quality with external memory
• NMTIA: exploits a readable and writable attention mechanism to keep track of interactive history in decoding
• Our work: uses the H-Distortion model
• vocab size: 30K; Length: maximum sentence length
comparison of the proposed models (BLEU↑, AER↓)
attention improvement
[Figure: attention of the base model vs. the distortion model]
hyperparameters
[Figure: hyperparameter settings, marking l = 3 and λ = 0.5]
