Improving Language Modeling using Densely Connected Recurrent Neural Networks

Contact: Frederic.godin@ugent.be - www.fredericgodin.com - @frederic_godin
IDLAB, GHENT UNIVERSITY - IMEC
Fréderic Godin, Joni Dambre and Wesley De Neve
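MOTIVATION
Densely connecting all layers with skip connections has been very successful in convolutional neural networks.
Skip or residual connections are only sporadically used when stacking LSTMs.

RESEARCH QUESTION
What if we add a skip connection from the output of every layer to the input of every subsequent layer in a recurrent neural network?

ARCHITECTURE
[Figure: a densely connected two-layer LSTM language model at time step t. The input word x_t is mapped to an embedding e_t. The first LSTM layer receives e_t and its previous hidden state h_{1,t-1}; the second LSTM layer receives [e_t, h_{1,t}] and its previous hidden state h_{2,t-1}; a fully connected layer receives [e_t, h_{1,t}, h_{2,t}] and produces the output y_t.]

EXPERIMENTS
Model                               Hidden units  # Layers  # Params  Perplexity
Stacked LSTM (Zaremba et al., 2014) 650           2         20M       82.7
Stacked LSTM (Zaremba et al., 2014) 1500          2         66M       78.4
Stacked LSTM                        200           3         5M        108.8
Stacked LSTM                        350           2         9M        87.9
Densely Connected LSTM              200           2         9M        80.4
Densely Connected LSTM              200           3         11M       78.5
Densely Connected LSTM              200           4         14M       76.9
Lower perplexity is better.

CONCLUSION
Densely connecting all layers substantially improves language modeling performance.
A densely connected LSTM reaches the same perplexity as a stacked LSTM with six times fewer parameters (11M vs. 66M parameters, perplexity 78.5 vs. 78.4).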
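
The dense connectivity pattern from the architecture figure can be sketched as a single forward step in plain Python. This is a toy illustration, not the authors' implementation: the class names LSTMCell and DenselyConnectedLSTM, the small random weights and the zero initial state are hypothetical choices. The point is the concatenation pattern: layer l receives [e_t, h_{1,t}, ..., h_{l-1,t}], and the fully connected layer receives [e_t, h_{1,t}, ..., h_{L,t}].

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain-Python matrix-vector product: each row of `matrix` dotted with `vector`.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class LSTMCell:
    """Toy LSTM cell with small random weights (hypothetical, illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.input_size = input_size
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix per gate (input, forget, candidate, output), applied
        # to the concatenation [x, h_prev]; biases start at zero.
        self.weights = [
            [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for _ in range(4)
        ]
        self.biases = [[0.0] * hidden_size for _ in range(4)]

    def step(self, x, h_prev, c_prev):
        assert len(x) == self.input_size
        z = list(x) + list(h_prev)
        pre = [
            [s + b for s, b in zip(matvec(w, z), bias)]
            for w, bias in zip(self.weights, self.biases)
        ]
        i = [sigmoid(v) for v in pre[0]]
        f = [sigmoid(v) for v in pre[1]]
        g = [math.tanh(v) for v in pre[2]]
        o = [sigmoid(v) for v in pre[3]]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c


class DenselyConnectedLSTM:
    """Each layer sees the embedding plus the outputs of all layers below it."""

    def __init__(self, emb_size: int, hidden_size: int, num_layers: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        # Layer `layer` (0-based) receives e_t plus `layer` hidden states.
        self.cells = [
            LSTMCell(emb_size + layer * hidden_size, hidden_size, rng)
            for layer in range(num_layers)
        ]

    def initial_state(self):
        return [([0.0] * self.hidden_size, [0.0] * self.hidden_size) for _ in self.cells]

    def step(self, e_t, states):
        features = list(e_t)  # input of the first layer: e_t only
        new_states = []
        for cell, (h_prev, c_prev) in zip(self.cells, states):
            h, c = cell.step(features, h_prev, c_prev)
            new_states.append((h, c))
            features = features + h  # skip connection to every layer above
        # features == [e_t, h_{1,t}, ..., h_{L,t}]: input of the fully connected layer
        return features, new_states


if __name__ == "__main__":
    model = DenselyConnectedLSTM(emb_size=4, hidden_size=3, num_layers=2)
    features, states = model.step([0.5, -0.2, 0.1, 0.0], model.initial_state())
    print(len(features))  # 4 + 2 * 3 = 10
```

The plain-Python loops only make the data flow explicit; a real model would use a deep learning framework and batch the matrix products.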
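
The # Params column and the "six times fewer parameters" claim can be sanity-checked with a short count. This is a minimal sketch under assumptions that the poster does not state: a 10,000-word vocabulary (the Penn Treebank setup used by Zaremba et al., 2014), an embedding size equal to the hidden size, one bias vector per LSTM gate, and no weight tying between the embedding and the output layer. The helper names lstm_layer_params and language_model_params are hypothetical.

```python
# Hypothetical parameter counter for a word-level LSTM language model.
# Assumptions (not from the poster): vocabulary of 10,000 words, embedding
# size == hidden size, one bias vector per gate, untied embedding/softmax.


def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + hidden_size)


def language_model_params(hidden_size: int, num_layers: int, dense: bool,
                          vocab_size: int = 10_000) -> int:
    emb_size = hidden_size
    total = vocab_size * emb_size  # embedding table
    for layer in range(num_layers):
        if dense:
            # Dense connectivity: embedding plus outputs of all lower layers.
            input_size = emb_size + layer * hidden_size
        else:
            # Plain stacking: only the layer directly below (or the embedding).
            input_size = emb_size if layer == 0 else hidden_size
        total += lstm_layer_params(input_size, hidden_size)
    # Fully connected output layer over the vocabulary (weights + bias).
    out_in = emb_size + num_layers * hidden_size if dense else hidden_size
    total += out_in * vocab_size + vocab_size
    return total


if __name__ == "__main__":
    configs = [
        ("Stacked LSTM", 650, 2, False),
        ("Stacked LSTM", 1500, 2, False),
        ("Stacked LSTM", 200, 3, False),
        ("Stacked LSTM", 350, 2, False),
        ("Densely Connected LSTM", 200, 2, True),
        ("Densely Connected LSTM", 200, 3, True),
        ("Densely Connected LSTM", 200, 4, True),
    ]
    for name, h, n, dense in configs:
        p = language_model_params(h, n, dense)
        print(f"{name:24s} h={h:5d} layers={n}  {p / 1e6:5.1f}M")
```

Under these assumptions it prints 19.8M, 66.0M, 5.0M, 9.0M, 8.8M, 11.5M and 14.3M, which round to the table's 20M, 66M, 5M, 9M, 9M, 11M and 14M. In the dense models most of the growth comes from the output layer, whose input widens by one hidden-state vector per added LSTM layer.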
