This document explores RNN, LSTM, and GRU cells and their hyperparameters through a series of experiments. It compares the three recurrent cell types across hyperparameters such as hidden size and learning rate, using both sliding-window and variable-length sequence inputs on three datasets. In these experiments, GRU generally converges faster than LSTM, and both outperform the vanilla RNN; larger hidden sizes and batch sizes improve performance, while additional layers do not.
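As a rough illustration of the kind of comparison described above, here is a minimal sketch, assuming PyTorch (the document does not name its framework), that swaps the three cell types behind a common interface and exposes hidden size and layer count as hyperparameters. The function name build_model and all default values are illustrative placeholders, not the document's actual settings.

```python
import torch
import torch.nn as nn

# The three recurrent cell types compared in the experiments.
CELL_TYPES = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}

def build_model(cell: str, input_size: int = 8, hidden_size: int = 64,
                num_layers: int = 1) -> nn.Module:
    """Return a recurrent layer of the requested type.

    hidden_size and num_layers are among the hyperparameters the
    experiments vary; the defaults here are illustrative only.
    """
    return CELL_TYPES[cell](input_size, hidden_size,
                            num_layers=num_layers, batch_first=True)

# Example: a batch of 32 sliding windows, each 20 steps of 8 features.
x = torch.randn(32, 20, 8)
for name in CELL_TYPES:
    out, _ = build_model(name)(x)
    print(name, out.shape)  # torch.Size([32, 20, 64]) for every cell type
```

Keeping the cell type as a single configuration switch like this is one way to run the same training loop over all three architectures, so that any difference in convergence speed comes from the cell itself rather than the surrounding code.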