A Hierarchical Neural Autoencoder for
Paragraphs and Documents
Jiwei Li, Minh-Thang Luong and Dan Jurafsky
ACL 2015
Introduction
● Generating coherent text (documents/paragraphs)
● Uses two stacked LSTMs in both the encoder and the decoder
○ Lower: the word sequence of each sentence is compressed into one vector
○ Upper: the sentence sequence is compressed into one vector
● Evaluates the coherence of the autoencoder's output
Paragraph autoencoder, Model 1: Standard LSTM
● Input and output are the same document D = {s_1, s_2, …, s_{N_D}, end_D}
● Each sentence s = {w_1, w_2, …, w_{N_s}}
○ N_D and N_s denote the respective lengths
● e represents the embedding
● Standard model: all sentences are concatenated into one word sequence (a sketch follows below)
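A minimal sketch of what this standard (non-hierarchical) setup could look like, assuming a PyTorch implementation; the layer split, vocabulary size, and teacher-forced reconstruction are my illustrative choices, not the authors' code.

```python
import torch
import torch.nn as nn

class StandardLSTMAutoencoder(nn.Module):
    """Non-hierarchical baseline: the whole document is one word sequence."""
    def __init__(self, vocab_size, dim=1000, layers=2):
        super().__init__()
        # 2 encoder + 2 decoder layers = 4 layers in total, as on the experiment
        # slide (how the 4 layers are split is my assumption).
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, layers, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, doc_ids):
        # doc_ids: (batch, doc_len) -- all sentences of D concatenated into one sequence
        emb = self.embed(doc_ids)
        _, (h, c) = self.encoder(emb)        # final state summarizes the whole document
        # Teacher forcing: the decoder reconstructs the same word sequence,
        # conditioned only on the encoder's final state.
        dec_out, _ = self.decoder(emb, (h, c))
        return self.out(dec_out)             # (batch, doc_len, vocab_size)
```

Training would simply minimize cross-entropy between these logits and the input word sequence itself.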
Model 2: Hierarchical LSTM
● Encoder
● Decoder
(I think the lower one in the figure is LSTM_word.)
● After </s> is generated for each sentence, the word decoder's last hidden state is used as an input to the sentence decoder (see the sketch below).
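A hedged sketch of the hierarchical variant, assuming PyTorch: a word-level LSTM compresses each sentence into a vector, a sentence-level LSTM compresses those vectors into a document vector, and decoding mirrors this. Dimensions, special-token ids, and the greedy word loop are my assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Word-level LSTM per sentence, then a sentence-level LSTM over sentence vectors."""
    def __init__(self, vocab_size, dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lstm_word = nn.LSTM(dim, dim, batch_first=True)
        self.lstm_sent = nn.LSTM(dim, dim, batch_first=True)

    def forward(self, sentences):
        # sentences: list of (1, sent_len) word-id tensors for one document
        sent_vecs = []
        for sent in sentences:
            _, (h, _) = self.lstm_word(self.embed(sent))    # words -> one vector
            sent_vecs.append(h[-1])                         # final state of last layer
        sent_seq = torch.stack(sent_vecs, dim=1)            # (1, num_sents, dim)
        sent_states, (h_doc, _) = self.lstm_sent(sent_seq)  # sentences -> doc vector
        return sent_states, h_doc[-1]

class HierarchicalDecoder(nn.Module):
    """Sentence-level LSTM proposes a state per sentence; a word-level LSTM
    then emits that sentence's words until </s>."""
    def __init__(self, vocab_size, dim=1000, bos_id=1, eos_id=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.cell_sent = nn.LSTMCell(dim, dim)
        self.cell_word = nn.LSTMCell(dim, dim)
        self.out = nn.Linear(dim, vocab_size)
        self.bos_id, self.eos_id = bos_id, eos_id

    def forward(self, doc_vec, num_sents, max_words=50):
        h_s = c_s = torch.zeros_like(doc_vec)
        sent_input = doc_vec                        # document vector starts decoding
        document = []
        for _ in range(num_sents):
            h_s, c_s = self.cell_sent(sent_input, (h_s, c_s))
            h_w, c_w = h_s, c_s                     # sentence state seeds the word LSTM
            word = torch.full((doc_vec.size(0),), self.bos_id, dtype=torch.long)
            sentence = []
            for _ in range(max_words):
                h_w, c_w = self.cell_word(self.embed(word), (h_w, c_w))
                word = self.out(h_w).argmax(dim=-1)
                sentence.append(word)
                if (word == self.eos_id).all():     # this sentence ends at </s>
                    break
            # Key detail from the slide: after </s>, the word LSTM's last hidden
            # state becomes the next input to the sentence-level decoder.
            sent_input = h_w
            document.append(sentence)
        return document
```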
Model 3: Hierarchical LSTM with Attention
● Bahdanau-style attention model
● Attention is used only at the sentence decoding step (see the sketch below).
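A hedged sketch of how Bahdanau-style additive attention could slot into the sentence-level decoding step; the scoring MLP and the way the context vector is consumed are my assumptions about the wiring, not the paper's exact equations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceAttention(nn.Module):
    """Additive (Bahdanau-style) attention over the encoder's sentence-level states."""
    def __init__(self, dim=1000):
        super().__init__()
        self.w_dec = nn.Linear(dim, dim, bias=False)
        self.w_enc = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, dec_state, enc_sent_states):
        # dec_state: (batch, dim) -- current sentence-decoder hidden state
        # enc_sent_states: (batch, num_sents, dim) -- encoder's sentence-level outputs
        scores = self.v(torch.tanh(
            self.w_dec(dec_state).unsqueeze(1) + self.w_enc(enc_sent_states)
        )).squeeze(-1)                              # (batch, num_sents)
        alpha = F.softmax(scores, dim=-1)
        context = torch.bmm(alpha.unsqueeze(1), enc_sent_states).squeeze(1)
        return context, alpha                       # context feeds the sentence decoder
```

In this sketch the context vector would be combined with the sentence decoder's input at each sentence step; word-level decoding itself stays attention-free, matching the slide's note.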
Experiment
● LSTM: 4 layers in total, 1000 dimensions
● SGD: learning rate = 0.1, 7 epochs, batch size = 32 documents (a rough training-loop sketch follows after this list)
They used crawled documents:
● Hotel Reviews (crawled from TripAdvisor)
○ Train: 340,000 reviews (each review contains at most 250 words)
○ Test: 40,000 reviews
● Wikipedia (not domain-specific data)
○ Train: 500,000 paragraphs
○ Test: 50,000 paragraphs
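For reference, the reported hyperparameters translate roughly into the following training loop, reusing the StandardLSTMAutoencoder sketch from above; the vocabulary size and the data loader are hypothetical placeholders, not details from the slide or the paper.

```python
import torch
import torch.nn.functional as F

model = StandardLSTMAutoencoder(vocab_size=50000)   # hypothetical vocab size; or the hierarchical variant
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(7):                               # 7 epochs
    for doc_ids in train_loader:                     # hypothetical loader yielding 32-document batches
        optimizer.zero_grad()
        logits = model(doc_ids)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)), doc_ids.view(-1))
        loss.backward()
        optimizer.step()
```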
Evaluation
● ROUGE and BLEU (the BLEU used here does not apply the log function)
● Coherence: the L-value, following Lapata and Barzilay (2005)
"Based on the assumption that human-generated texts are coherent."
a. Build a feature vector for each sentence, consisting of its verbs and nouns
b. Align the input and output sentences by the F1 score computed over these vectors
■ ROUGE and BLEU are used as the precision and recall values
c. Calculate the L-value (a rough sketch of steps a-b follows below)
i denotes the sentence position on the output side; i' denotes the aligned position on the input side
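A heavily hedged sketch of steps (a) and (b), assuming Python: the feature vectors keep only verbs and nouns, and each output sentence is aligned to the input sentence with the best F1. Plain token-overlap precision/recall stand in for the ROUGE/BLEU scores the slide mentions; the exact L-value formula of step (c) is in Lapata and Barzilay (2005) and is not reproduced here.

```python
from collections import Counter

def feature_vector(tokens, pos_tags):
    """Step (a): keep only verbs and nouns as the sentence's features."""
    return Counter(tok for tok, tag in zip(tokens, pos_tags)
                   if tag.startswith(("NN", "VB")))

def f1_align(input_feats, output_feats):
    """Step (b): align each output sentence i to the input sentence i' with the
    best F1 over feature vectors (overlap-based precision/recall as a stand-in
    for the ROUGE/BLEU values used in the actual evaluation)."""
    alignment = {}
    for i, out_f in enumerate(output_feats):
        best_j, best_f1 = None, 0.0
        for j, in_f in enumerate(input_feats):
            overlap = sum((out_f & in_f).values())
            if overlap == 0:
                continue
            p = overlap / sum(out_f.values())
            r = overlap / sum(in_f.values())
            f1 = 2 * p * r / (p + r)
            if f1 > best_f1:
                best_j, best_f1 = j, f1
        alignment[i] = best_j    # i: output position, best_j: aligned input position i'
    return alignment
```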
Result and Conclusion
● The Hotel Review task is easier than the Wikipedia task
○ Are open-domain documents harder to generate?
○ Are the sentences on Wikipedia written in a fixed format?
● The proposed method is useful for auto-encoding documents.