A Hybrid Deep Neural Network Model for
Time Series Forecasting
Jyoti Verma
Department of Computer Science and Engineering
Suresh Gyan Vihar University
Jaipur, Rajasthan, India
jyoti.18223@mygyanvihar.com
Sohit Agarwal
Department of Computer Science and Engineering
Suresh Gyan Vihar University
Jaipur, Rajasthan, India
Sohit.agarwal@mygyanvihar.com
ABSTRACT
Deep neural networks have proven to produce strong forecasts even in the presence of noisy and non-linear time series data. In this paper, a hybrid deep neural network combining a Convolutional Neural Network (CNN) and a Long Short Term Memory (LSTM) architecture is proposed. The model combines the convolutional layer's capability of feature extraction with the LSTM's ability to learn long-term sequential dependencies. The experiments were performed on two datasets and compared with four other approaches: Recurrent Neural Network (RNN), LSTM, Gated Recurrent Unit (GRU) and Bidirectional LSTM. All five models are evaluated and compared on one-step-ahead forecasting. The proposed hybrid CNN-LSTM outperformed the other models on both datasets, showing robustness against error.
Key words: Recurrent Neural Networks, Long Short Term Memory, Gated Recurrent Units, Bidirectional LSTM, Convolutional Neural Networks, Time Series Forecasting
1. INTRODUCTION
A time series is sequential data in which order must be maintained: observations recorded at successive intervals of time. Time series data are frequently observed in econometrics, e.g., stock prices and currency exchange rates, as well as in signal processing and in meteorological records of wind speed, temperature and rainfall. These data are prevalently used for forecasting, i.e., predicting future values by utilizing past values. Forecasting can be performed using traditional statistical methods or neural network models. Statistical models such as ARIMA, ARIMAX and GAS are the most prevalently used time series forecasting techniques in the majority of domains [1]. Artificial neural networks have also been used alongside these to achieve better forecast results. Recently, Recurrent Neural Networks have been applied to sequential data problems [2] and have been widely used for Natural Language Processing as well as time series forecasting. Alongside the recurrent neural networks, hybrid models containing a convolutional component have also evolved recently. Here, we perform forecasting using four prevalent recurrent neural network architectures and analyze their performance. We also develop a hybrid CNN-LSTM architecture and compare its efficiency with the other networks. The main task is to analyze and forecast time series datasets and to develop qualitative forecasting models.
RNNs are a special class of neural networks characterized
by internal self-connections in any nonlinear dynamical
system. Prominent architectures of RNN include Deep RNNs
with Multi-Layer Perceptron, Bidirectional RNN, Recurrent
Convolutional Neural Networks, Multi-Dimensional
Recurrent Neural Networks, Long-Short Term Memory, Gated
Recurrent Unit, Memory Networks, Structurally Constrained
Recurrent Neural Network, Unitary Recurrent Neural
Networks, Gated Orthogonal Recurrent Unit and Hierarchical
Subsampling Recurrent Neural Networks [3]. However, the vanilla RNN is known to suffer from the underlying issue of vanishing and exploding gradients; various clipping strategies as well as other RNN variants have been proposed to tackle this [4]. Eight variants of the LSTM have been analyzed, concluding that the forget gate and the output activation function are its most critical components, and that the learning rate is the most crucial hyperparameter [5]. RNN and its variants have since been widely used for time series forecasting tasks in a wide range of domains. Long short term memory has been used as a novel technique for solar energy forecasting, proving robust and performing better than GBR and FFNN [6]. Petroleum time series data, characterized by high dimensionality, non-stationarity and strong non-linearity, have also been used to test the performance of LSTM [7]. Furthermore, a deep RNN architecture has been used to extract deep invariant daily features of financial time series, outperforming other models in predictive accuracy and profitability [8]. A combination of a convolutional auto-encoder and the long short-term memory unit has also been proposed for the task of wind speed forecasting [16]. Recently, a black-box CNN-LSTM architecture was proposed for indoor temperature modeling [17].
This paper analyses the RNN, LSTM, GRU and Bidirectional LSTM architectures and proposes a deep neural network architecture for the task of univariate time series forecasting. Datasets from the electricity and air quality domains are used. The lengths of the two datasets differ, but their fluctuations are similar; problems from different domains are chosen in order to analyze their influence on the results of the proposed architecture. The paper is organized as follows. Section 2 discusses the architecture and mathematical formulation of the RNN models analyzed and of the proposed model. Section 3 states the experimental details of the methodology adopted and the results, while Section 4 concludes the paper with future directions.
2. RECURRENT NEURAL NETWORK ARCHITECTURES
This section summarizes the basics of the RNN architectures analyzed in this paper. The vanilla RNN is the most basic; LSTM, GRU and Bidirectional LSTM are variants that were introduced later.
A. Recurrent Neural Networks
Recurrent Neural Networks [9] are in the family of
feedforward neural networks. They are different from other
feedforward networks in their ability to send information over
time-steps. Recurrent Neural Networks are considered Turing
complete and can simulate arbitrary programs (with weights).
If we view neural networks as optimization over functions, we
can consider Recurrent Neural Networks as optimization over
programs. Recurrent neural networks are well suited for
modeling functions for which the input and/or output is
composed of vectors that involve a time dependency between
the values. Recurrent neural networks model the time aspect
of data by creating cycles in the network (hence, the recurrent
part of the name). An RNN is a special type of neural network that accounts for the dependencies between data points: it preserves sequential information in an inner state, allowing it to persist knowledge accrued from previous time steps.
$$z_t = W_{xh}\, x_t + W_{hh}\, h_{t-1} + b$$
$$h_t = \tanh(z_t)$$
$$v_t = W_{hy}\, h_t$$
$$y_t = \varphi(v_t) \qquad (1)$$
Figure 1 represents an RNN cell with $x_t$ as the present input, $h_{t-1}$ the previous state, $W_{xh}$ the input-to-hidden weight matrix, $W_{hh}$ the hidden-to-hidden (recurrent) weight matrix, and $b$ the bias. $z_t$ is the output of the hidden unit before the activation is applied, and $h_t$ is the hidden unit output that is sent to the next recurrent unit and also used to compute the final output of the cell. The final output $y_t$ is computed by applying another activation function $\varphi$ to $v_t = W_{hy}\, h_t$, where $W_{hy}$ is the hidden-to-output weight matrix. The selection of the activation function depends on the task being performed.
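For concreteness, eq. (1) can be sketched as a NumPy forward step; the dimensions and the identity choice for the output activation $\varphi$ are assumptions made for illustration, not values from the paper.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy, b):
    """One forward step of a vanilla RNN cell, following eq. (1)."""
    z_t = W_xh @ x_t + W_hh @ h_prev + b   # pre-activation of the hidden unit
    h_t = np.tanh(z_t)                     # hidden state passed to the next step
    y_t = W_hy @ h_t                       # output; phi is taken as identity here
    return h_t, y_t

# Toy dimensions (assumed): 1 input feature, 8 hidden units, 1 output
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 1))
W_hh = rng.normal(size=(8, 8))
W_hy = rng.normal(size=(1, 8))
h = np.zeros(8)
for x in [0.1, 0.3, 0.2]:                 # unroll the cell over a short sequence
    h, y = rnn_step(np.array([x]), h, W_xh, W_hh, W_hy, b=np.zeros(8))
```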
B. Long Short Term Memory Networks
LSTM [10], as in Figure 2, introduces additional computation components to the RNN: the input gate, the forget gate and the output gate. The recurrence equation for
the hidden vector is changed for LSTM with the use of long-
term memory. The operations of the LSTM are designed to
have fine-grained control over the data written into this long-
term memory. The equations for the forward pass are stated
below:
$$a_t = \tanh(W_a x_t + U_a h_{t-1})$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1})$$
$$f_t = \sigma(W_f x_t + U_f h_{t-1})$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1})$$
$$c_t = f_t \odot c_{t-1} + i_t \odot a_t$$
$$h_t = o_t \odot \tanh(c_t) \qquad (2)$$
The current input and the previous state are transformed into the candidate $a_t$, after which the input gate $i_t$ decides which parts of $a_t$ are to be added to the long-term state $c_t$. The forget gate $f_t$ decides which parts of $c_{t-1}$ are to be erased, and the output gate $o_t$ decides which parts of $c_t$ are to be read and shown as output. A short-term state $h_t$ is passed between the cells along with the long-term state $c_t$, in which memories are dropped and added by the respective gates.
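A NumPy sketch of one forward step per eq. (2); the dictionary layout of the weight matrices is an assumption made for brevity.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, U):
    """One LSTM forward step per eq. (2); W/U hold matrices for a, i, f, o."""
    a_t = np.tanh(W["a"] @ x_t + U["a"] @ h_prev)   # candidate values
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev)   # input gate
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev)   # forget gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev)   # output gate
    c_t = f_t * c_prev + i_t * a_t                  # long-term state update
    h_t = o_t * np.tanh(c_t)                        # short-term state / output
    return h_t, c_t
```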
Table 1: DATASET DESCRIPTION

Dataset | Source | No. of Observations | Description
Dataset 1 | [12] | 2826 | Half-hourly electricity demand values, from 01-01-1991 to 01-03-1999
Dataset 2 | UCI Repository [14] | 9352 | Hourly air quality data, from 10-03-2004 to 04-04-2005
C. Gated Recurrent Units
The Gated Recurrent Unit [11] can be viewed as a
simplification of the LSTM, which does not use explicit cell
states. The main simplification is that both state vectors are merged into a single vector. Figure 3 represents a GRU cell
and (3) the equations followed during forward pass.
$$z_t = \sigma(W_z x_t + U_z h_{t-1})$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$g_t = \tanh(W_g x_t + U_g (r_t \otimes h_{t-1}))$$
$$h_t = (1 - z_t) \otimes h_{t-1} + z_t \otimes g_t \qquad (3)$$
A single gate controller controls both the forget gate and the
input gate. If the gate controller outputs a 1, the input gate is
open and the forget gate is closed. If it outputs a 0, the
opposite happens. In other words, whenever a memory must
be stored, the location where it will be stored is erased first.
This coupling is itself a frequent variant of the LSTM cell. There is no output gate; the full state vector is output at
every time step. However, there is a new gate controller that
controls which part of the previous state will be shown to the
main layer.
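The same style of sketch for one GRU step per eq. (3); again, the weight-matrix layout is an assumption for illustration.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, W, U):
    """One GRU forward step per eq. (3); W/U hold matrices for z, r, g."""
    z_t = sigmoid(W["z"] @ x_t + U["z"] @ h_prev)          # single gate controller
    r_t = sigmoid(W["r"] @ x_t + U["r"] @ h_prev)          # reset controller
    g_t = np.tanh(W["g"] @ x_t + U["g"] @ (r_t * h_prev))  # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * g_t                 # merged state vector
    return h_t
```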
D. Bidirectional LSTM
Bidirectional long short-term memory allows the neural network to carry sequential information in both directions, backwards and forwards, i.e., past to future as well as future to past. Since the input flows in two directions, a bidirectional LSTM differs from the regular LSTM: with a vanilla LSTM the input flows in one direction only, whereas in a bidirectional LSTM it flows in both. This helps to preserve not only past information but also future information.
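A minimal Keras sketch of a bidirectional one-step forecaster; the unit count is an assumption, not a value reported in the paper.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

# lag = number of past time steps fed to the model (6 for Dataset 1, 4 for Dataset 2)
lag = 6
model = Sequential([
    Bidirectional(LSTM(32), input_shape=(lag, 1)),  # reads the window forwards and backwards
    Dense(1),                                       # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
```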
3. EXPERIMENT AND RESULTS
The methodology followed for the proposed work and the results obtained are discussed in this section. Figure 1 gives a more descriptive interpretation of the scheme followed. The time series datasets are first preprocessed to make them trainable with the neural network models. The models are then optimized, regularized and tuned to attain generalized results, avoiding underfitting as well as overfitting. The performance of the four RNN variants and the proposed model is evaluated using metrics that finally decide the best forecast model for the problem at hand. The experiments were carried out using the Keras library with a TensorFlow backend in the Python programming language.
Figure 1: Methodology
A. Dataset Description
The analysis is carried out on two real-world datasets of varying domains and lengths. The description of the datasets is given in Table 1. From the dataset plots (Figures 2 and 3), it can be observed that Dataset 1 follows a repeating pattern but has a low number of observations, whereas Dataset 2 is less complex but contains the larger number of observations. The aim is to analyze the performance of the models in differing dataset scenarios and to demonstrate the efficiency of the proposed model.
Figure 2: Electricity Demand Data
Figure 3: Air Quality Data
The datasets described are raw datasets, i.e., preprocessing is required to make them usable. Since a neural network architecture is being used, converting the series to a stationary one is not a mandatory step. The values span a wide range, so data normalization is performed in order to standardize the inputs and approach the global minimum faster. It also ensures that larger inputs do not overwhelm or dominate the training. Min-max normalization, as described in (4), maps the data into the range 0 to 1. It does not change the data pattern or characteristics but only readjusts the scale of the data.
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \qquad (4)$$
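A sketch of this step with scikit-learn's MinMaxScaler, which implements eq. (4); the toy values are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

series = np.array([210.0, 180.0, 260.0, 240.0])        # toy raw observations
scaler = MinMaxScaler(feature_range=(0, 1))            # implements eq. (4)
scaled = scaler.fit_transform(series.reshape(-1, 1))   # values now in [0, 1]
# forecasts can later be mapped back with scaler.inverse_transform(...)
```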
The data pre-processing step also includes conversion to a supervised data format, since time series datasets are being dealt with here. A one-step-ahead forecast is to be made, i.e., the next time step (t+1) is predicted. Originally the univariate time series consists of only one feature column; we divide it into input (x) and output (y) using the lag time method. The lag sizes differ between the two datasets: Dataset 1 uses a lag size of 6 and Dataset 2 a lag size of 4.
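A minimal sketch of the lag time method: each input row holds `lag` consecutive past values and the target is the value at the next time step. The helper name is ours, not from the paper.

```python
import numpy as np

def series_to_supervised(series, lag):
    """Turn a univariate series into (X, y) pairs using the lag time method."""
    X, y = [], []
    for i in range(len(series) - lag):
        X.append(series[i:i + lag])   # window of `lag` past values as input
        y.append(series[i + lag])     # the next value (t+1) as the target
    return np.array(X), np.array(y)

# lag = 6 for Dataset 1, 4 for Dataset 2
X, y = series_to_supervised(np.arange(10, dtype=float), lag=4)
# X[0] = [0, 1, 2, 3], y[0] = 4
```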
B. Network Modelling
Modeling the neural network architecture such that it
performs optimally requires setting and tuning different
configurations of the network. The supervised data is split into
a train-validation-test split for proper estimation of error.
Optimization is the minimization of loss function with respect
to the parameters of our model. Here, the Adam optimizer is used as stated in [15]; it is robust and frequently used for training RNN architectures. Optimizers mainly aim to decrease the training error, but this sometimes results in overfitting, i.e., the model fits the training data well but is unable to fit the test data. This is countered by allowing an appropriately high value for the parameters governing the capacity of the model and then controlling it by adding a regularization term to the error function. In our work, we have used Dropout regularization as and when required [2]. A
dropout layer blocks a random set of cell units in one iteration.
Blocked units do not receive and do not transmit information.
Removing connections in the network reduces the number of
free parameters to be estimated during training and the
complexity of the network. Consequently, dropout helps to
prevent over-fitting. A dropout ratio of 0.2 is used in the hidden layers in our work. Hyperparameters are settings that are not adapted by the learning algorithm itself, since adapting them on the training data would result in overfitting. Architectures with 2 and 3 hidden layers were experimented upon, the number of hidden nodes was set to form a narrow architecture, and the number of epochs was tuned for the problem at hand.
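As an illustration of this setup, the following sketch (continuing the supervised-window example above) applies a chronological train-validation-test split and builds one of the baseline recurrent models with the stated dropout ratio of 0.2 and the Adam optimizer. The split proportions, layer widths and epoch count are assumptions for illustration, not values reported in the paper.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# Chronological split (assumed proportions: 70/15/15)
n = len(X)
i_val, i_test = int(0.7 * n), int(0.85 * n)
X_train, y_train = X[:i_val], y[:i_val]
X_val, y_val = X[i_val:i_test], y[i_val:i_test]
X_test, y_test = X[i_test:], y[i_test:]

model = Sequential([
    LSTM(16, return_sequences=True, input_shape=(X.shape[1], 1)),
    Dropout(0.2),                                 # dropout ratio used in this work
    LSTM(8),                                      # narrow stacked architecture
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")       # Adam optimizer [15]
model.fit(X_train[..., None], y_train, epochs=50,
          validation_data=(X_val[..., None], y_val))
```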
Figure 4 represents the proposed network model, denoting the input, output and hidden layers. The convolution operations are performed in the initial layers for automatic feature extraction rather than manual feature engineering. Pooling is a down-sampling process that effectively reduces the dimension of the matrix window while retaining the deep information; in this work, max pooling was used. The LSTM layers then preserve the sequential information. Finally, the output layer gives the one-step-ahead forecast results.
Figure 4: Network Model
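A minimal Keras sketch of the kind of hybrid architecture Figure 4 describes: convolution and max pooling for feature extraction followed by an LSTM layer and a dense output. The filter count, kernel size and unit counts are assumptions; the paper does not report its exact layer configuration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dropout, Dense

lag = 6  # input window length (Dataset 1)
model = Sequential([
    Conv1D(filters=32, kernel_size=2, activation="relu",
           input_shape=(lag, 1)),        # convolutional feature extraction
    MaxPooling1D(pool_size=2),           # down-sampling, retaining deep information
    LSTM(16),                            # learns sequential dependencies
    Dropout(0.2),                        # regularization as in the baselines
    Dense(1),                            # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
```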
C. Performance Metrics
Four performance evaluation metrics are used to assess forecast accuracy. These error metrics are frequently used for assessing model accuracy. After evaluating all the forecast models according to these metrics, the best configured forecast model is decided upon. The metrics are shown in Table 2 below:
Table 2: PERFORMANCE EVALUATION METRICS

Metric | Description | Formula
MAE | Mean Absolute Error | $\frac{1}{N}\sum_{n=1}^{N}\left|y_n^{actual} - y_n^{predicted}\right|$
MSE | Mean Squared Error | $\frac{1}{N}\sum_{n=1}^{N}\left(y_n^{actual} - y_n^{predicted}\right)^2$
RMSE | Root Mean Squared Error | $\sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(y_n^{actual} - y_n^{predicted}\right)^2}$
$R^2$ score | Coefficient of determination | $1 - \frac{\sum_n\left(y_n^{actual} - y_n^{predicted}\right)^2}{\sum_n\left(y_n^{actual} - \bar{y}\right)^2}$
Here, $N$ is the total number of observations for which the error is evaluated, $y_n^{actual}$ is the actual observation at the $n$th position, $y_n^{predicted}$ the predicted value and $\bar{y}$ the mean of the actual observations.
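These metrics can be computed directly with NumPy (equivalent functions also exist in sklearn.metrics); a small sketch following the Table 2 definitions:

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Compute the four metrics of Table 2."""
    err = y_true - y_pred
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return mae, mse, rmse, 100.0 * r2   # R^2 reported as a percentage, as in Tables 3-4
```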
D. Results
As stated earlier, the experiments are carried out on two datasets from different real-world domains. The four Recurrent Neural Network architectures (RNN, LSTM, GRU and Bidirectional LSTM) and the proposed hybrid deep neural network model are used to forecast the one-step-ahead result. Table 3 and Table 4 below report the performance evaluation metrics for both datasets with optimal parameters.
Table 3: PERFORMANCE EVALUATION FOR DATASET 1

Models | MAE | MSE | RMSE | $R^2$ score (%)
RNN | 0.09277 | 0.01489 | 0.1221 | 54.33
LSTM | 0.09273 | 0.01462 | 0.1209 | 55.18
GRU | 0.09185 | 0.01461 | 0.1209 | 55.19
Bidirectional | 0.09182 | 0.01452 | 0.1205 | 55.48
Proposed | 0.09082 | 0.01436 | 0.1198 | 55.97
Table 4: PERFORMANCE EVALUATION FOR DATASET 2

Models | MAE | MSE | RMSE | $R^2$ score (%)
RNN | 0.02097 | 0.0006274 | 0.02505 | 94.27
LSTM | 0.02003 | 0.0005952 | 0.02439 | 94.56
GRU | 0.01972 | 0.0005882 | 0.02425 | 94.63
Bidirectional | 0.01580 | 0.0005341 | 0.02311 | 95.13
Proposed | 0.01566 | 0.0004176 | 0.02044 | 96.19
The results are stated for well-tuned models obtained after experiments aimed at generalizing each model for better test performance on unknown samples. From the tables, several observations can be made about the performance of the forecast models. The proposed hybrid model performs the best of all five models in both scenarios: the convolutional layer does enhance forecast performance when combined with the recurrent layers of the LSTM network. The bidirectional LSTM also performs better than the other variants due to its property of preserving both past and future information. Concerning the datasets, a greater amount of data leads to better performance; Dataset 1 has fewer samples than Dataset 2, which is why all models perform better on Dataset 2. The fluctuating data requires the complex structure of these RNN variants in order to learn the patterns and dependencies needed for forecasting. The proposed CNN-LSTM architecture additionally enjoys the advantage of efficiency.
4. CONCLUSION
In this paper, four variants of RNN, namely vanilla RNN, LSTM, GRU and Bidirectional LSTM, have been analyzed on two different real-world datasets. Among these, the Bidirectional LSTM was observed to perform the most efficiently. Further, a hybrid deep neural network comprising convolutional layers along with recurrent layers and dropout regularization has been proposed. The proposed model is able to provide optimal forecast results on both datasets. Hence, the convolutional neural network's properties can be utilized along with recurrent neural network architectures to develop efficient time series forecasting mechanisms.
REFERENCES
[1] P. J. Brockwell and R. A. Davis, Introduction to Time Series and Forecasting, Second Edition, 2002.
[2] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, 2017.
[3] H. Salehinejad, J. Baarbe, S. Sankar, J. Barfett, E. Colak, and S. Valaee, "Recent advances in recurrent neural networks," 2017. [Online]. Available: https://arxiv.org/abs/1801.01078
[4] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in 30th International Conference on Machine Learning, ICML 2013, 2013, no. PART 3, pp. 2347–2355.
[5] K. Greff, R. K. Srivastava, J. Koutnik, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A Search Space Odyssey," IEEE Trans. Neural Networks Learn. Syst., vol. 28, no. 10, pp. 2222–2232, 2017.
[6] S. Srivastava and S. Lessmann, "A comparative study of LSTM neural networks in forecasting day-ahead global horizontal irradiance with satellite data," Sol. Energy, vol. 162, pp. 232–247, 2018.
[7] A. Sagheer and M. Kotb, "Time series forecasting of petroleum production using deep LSTM recurrent networks," Neurocomputing, vol. 323, pp. 203–213, 2019.
[8] W. Bao, J. Yue, and Y. Rao, "A deep learning framework for financial time series using stacked autoencoders and long-short term memory," PLoS One, vol. 12, no. 7, 2017.
[9] K. Unnikrishnan and K. P. Venugopal, "Alopex: A correlation-based learning algorithm for feedforward and recurrent neural networks," Neural Computation, vol. 6, no. 3, pp. 469–490, 1994.
[10] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Comput., vol. 12, no. 6, pp. 2451–2471, 2000.
[11] K. Cho, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation," arXiv, 2014.
[12] I. Koprinska, M. Rana, and V. G. Agelidis, "Yearly and seasonal models for electricity load forecasting," in Neural Networks (IJCNN), The 2011 International Joint Conference on, IEEE, 2011, pp. 1474–1481.
[13] [Online]. Available: https://finance.yahoo.com/quote/CSV/history/
[14] M. Lichman, UCI Machine Learning Repository, 2013. [Online]. Available: http://archive.ics.uci.edu/ml
[15] D. P. Kingma and J. L. Ba, "Adam: A method for stochastic optimization," 3rd Int. Conf. Learn. Represent., ICLR 2015 – Conf. Track Proc., 2015.
[16] Y. Chen, Y. Wang, Z. Dong, J. Su, Z. Han, D. Zhou, Y. Zhao, and Y. Bao, "2-D regional short-term wind speed forecast based on CNN-LSTM deep learning model," Energy Conversion and Management, vol. 244, 2021.
[17] F. Elmaz, R. Eyckerman, W. Casteels, S. Latré, and P. Hellinckx, "CNN-LSTM architecture for predictive indoor temperature modeling," Building and Environment, vol. 206, 2021.