Contact: Frederic.godin@ugent.be – www.fredericgodin.com - @frederic_godin
IMPROVING LANGUAGE MODELING USING
DENSELY CONNECTED RECURRENT NEURAL NETWORKS
IDLAB, GHENT UNIVERSITY - IMEC
Fréderic Godin, Joni Dambre and Wesley De Neve
MOTIVATION
Densely connecting all layers with skip connections is very successful in convolutional neural networks.
Skip or residual connections are only sporadically used when stacking LSTMs.

RESEARCH QUESTION
What if we add a skip connection between every output and every input of every layer in a recurrent neural network?

EXPERIMENTS
Model                                Hidden states  # Layers  # Params  Perplexity
Stacked LSTM (Zaremba et al., 2014)  650            2         20M       82.7
Stacked LSTM (Zaremba et al., 2014)  1500           2         66M       78.4
Stacked LSTM                         200            3         5M        108.8
Stacked LSTM                         350            2         9M        87.9
Densely Connected LSTM               200            2         9M        80.4
Densely Connected LSTM               200            3         11M       78.5
Densely Connected LSTM               200            4         14M       76.9

CONCLUSION
Densely connecting all layers substantially improves language modeling performance.
We use six times fewer parameters to obtain the same result as a stacked LSTM: the 3-layer densely connected LSTM (11M parameters, perplexity 78.5) roughly matches the 1500-unit stacked LSTM of Zaremba et al., 2014 (66M parameters, perplexity 78.4).
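The parameter counts reported in the results table can be approximately reproduced with a short calculation. The sketch below is not taken from the poster. It assumes a vocabulary of 10,000 words (the Penn Treebank setup used by Zaremba et al., 2014; the poster text itself does not state the vocabulary size), an embedding size equal to the hidden size, a standard LSTM with four gates and one bias vector per gate, and an untied softmax output layer. All function names are hypothetical.

```python
def lstm_params(input_size, hidden_size):
    # Four gates, each with weights over [input; previous hidden state] plus a bias.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def stacked_lstm_params(vocab, hidden, layers):
    # Assumption: embedding size == hidden size, so every layer sees `hidden` inputs.
    embedding = vocab * hidden
    recurrent = layers * lstm_params(hidden, hidden)
    softmax = hidden * vocab + vocab
    return embedding + recurrent + softmax


def dense_lstm_params(vocab, hidden, layers):
    # Layer k (0-based) reads the embedding plus the outputs of all k earlier layers.
    embedding = vocab * hidden
    recurrent = sum(lstm_params(hidden + k * hidden, hidden) for k in range(layers))
    # The output layer reads the embedding and the outputs of all layers.
    softmax = (hidden + layers * hidden) * vocab + vocab
    return embedding + recurrent + softmax


VOCAB = 10_000  # assumed vocabulary size, not stated in the poster text

for hidden, layers in [(650, 2), (1500, 2), (200, 3), (350, 2)]:
    millions = stacked_lstm_params(VOCAB, hidden, layers) / 1e6
    print("stacked", hidden, layers, round(millions))  # 20, 66, 5, 9

for layers in [2, 3, 4]:
    millions = dense_lstm_params(VOCAB, 200, layers) / 1e6
    print("dense", 200, layers, round(millions))  # 9, 11, 14
```

Under these assumptions the rounded counts agree with the table. Most of the parameters of the densely connected models sit in the output layer, whose input grows with every added layer.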
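ARCHITECTURE
[Figure: two stacked LSTM layers with dense skip connections. The input x_t is mapped to the embedding e_t. The first LSTM receives e_t and its previous hidden state h_{1,t-1}. The second LSTM receives the concatenation (e_t, h_{1,t}) and its previous hidden state h_{2,t-1}. The fully connected output layer receives (e_t, h_{1,t}, h_{2,t}) and produces y_t.]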
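The wiring shown in the architecture figure can be sketched in plain Python: each layer receives the embedding concatenated with the outputs of all earlier layers, and the output layer would receive the concatenation of everything. This is a minimal illustration under stated assumptions, not the authors' implementation. The class names `LSTMCell` and `DenseLSTMStack`, the toy sizes, the gate ordering, and the random initialization are all hypothetical, and the embedding lookup, output layer, and training are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell on plain Python lists (illustrative, not optimized)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # weights act on [input; h_prev]
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.W, self.b)]
        n = self.hidden_size
        i = [sigmoid(p) for p in pre[0:n]]            # input gate
        f = [sigmoid(p) for p in pre[n:2 * n]]        # forget gate
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]    # output gate
        g = [math.tanh(p) for p in pre[3 * n:4 * n]]  # candidate cell values
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


class DenseLSTMStack:
    """Stack of LSTM cells in which layer k reads e_t and h_1..h_k."""

    def __init__(self, embed_size, hidden_size, num_layers, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.cells = [LSTMCell(embed_size + k * hidden_size, hidden_size, rng)
                      for k in range(num_layers)]

    def init_state(self):
        zeros = [0.0] * self.hidden_size
        return [(list(zeros), list(zeros)) for _ in self.cells]

    def step(self, e_t, state):
        features = list(e_t)  # starts as (e_t)
        new_state = []
        for cell, (h_prev, c_prev) in zip(self.cells, state):
            h, c = cell.step(features, h_prev, c_prev)
            new_state.append((h, c))
            features = features + h  # dense skip: (e_t, h_1, ..., h_k)
        # `features` is what a fully connected output layer would consume.
        return features, new_state


stack = DenseLSTMStack(embed_size=4, hidden_size=3, num_layers=2)
state = stack.init_state()
features, state = stack.step([0.1, -0.2, 0.3, 0.0], state)
print(len(features))  # 4 + 2 * 3 = 10
```

Concatenation, rather than the addition used by residual connections, is what makes the input size of each layer grow with depth while the hidden size stays fixed.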

