Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67
How to cite this article: Gowri L, Manjula KR, Sasireka K, Deepa D. Assessment of statistical models for rainfall forecasting
using machine learning technique. J Soft Comput Civ Eng 2022;6(2):51–67. https://doi.org/10.22115/scce.2022.304260.1363
2588-2872/ © 2022 The Authors. Published by Pouyan Press.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Contents lists available at SCCE
Journal of Soft Computing in Civil Engineering
Journal homepage: www.jsoftcivil.com
Assessment of Statistical Models for Rainfall Forecasting Using
Machine Learning Technique
L. Gowri1, K.R. Manjula1, K. Sasireka2, Durairaj Deepa2*
1. School of Computing, SASTRA Deemed to be University, Thanjavur, India
2. School of Civil Engineering, SASTRA Deemed to be University, Thanjavur, India
Corresponding author: deepa@src.sastra.edu
https://doi.org/10.22115/SCCE.2022.304260.1363
ARTICLE INFO
Article history:
Received: 11 September 2021
Revised: 07 April 2022
Accepted: 08 April 2022
Keywords: ARIMA; ETS; Holt-Winters model; Time series.
ABSTRACT
Because heavy rainfall can lead to several catastrophes, the prediction of rainfall is vital. A
forecast encourages individuals to take appropriate steps, so it should be reasonably accurate.
Agriculture is the most important factor in ensuring a person's survival, and rainfall is the most
crucial aspect of agriculture. Predicting rain has been a major issue in recent years. Rainfall
forecasting raises people's awareness and allows them to plan ahead to protect their crops from
the elements. Many methods have been developed to predict rainfall. Machine learning can rapidly
compare past weather forecasts with observations. With its help, weather models can better account
for prediction flaws, such as overestimated rainfall, and produce more accurate predictions.
Rainfall data from Thanjavur station for the 17 years from 2000 to 2016 are used to study the
accuracy of rainfall forecasting. To find the most accurate prediction model, three models, ARIMA
(Autoregressive Integrated Moving Average), ETS (Error, Trend, Seasonality) and Holt-Winters (HW),
were compared using R packages. The findings show that the HW and ETS models perform well compared
with the ARIMA models. Performance criteria such as the Akaike Information Criterion (AIC) and
Root Mean Square Error (RMSE) were used to identify the best forecasting model for Thanjavur
station.
1. Introduction
Agriculture is considered the backbone of countries such as India. Tamil Nadu is one of the leading
agricultural states, and Thanjavur has been described as the rice bowl of Tamil Nadu since
historical times. Surface water and groundwater are the main sources for the development of
agriculture. Surface water from the Cauvery River is used to cultivate major crops such as paddy,
pulses, gingelly, groundnut, and sugarcane. The availability of surface water depends mainly on
the distribution of rainfall across the region. Because the Cauvery River supplies inadequate
water, most of the farming area in Thanjavur district depends on seasonal rainfall. Taking these
factors into account, rainfall forecasting over a prolonged duration will help in planning the
management of irrigation water and associated preparation.
The Machine Learning (ML) approach is widely used to solve hydrological problems, including
rainfall forecasting. The value of this kind of modelling lies in its ability to map input-output
patterns without prior knowledge of the factors affecting the forecast parameters [1–3].
This forecast primarily benefits farmers, and it also makes it possible to use water supplies
effectively. Rainfall forecasting is a difficult task, and the results need to be accurate.
Several hardware devices predict rainfall from weather conditions such as temperature, humidity
and pressure. These conventional approaches do not work efficiently, so machine learning
techniques are used to obtain more precise results. By analysing historical rainfall data, machine
learning can forecast rainfall for future seasons. Techniques such as classification and
regression can be applied according to requirements, and the error between actual and forecast
values, as well as the precision, can be quantified. Different methods produce different
accuracies, so choosing the right algorithm and modelling it according to the requirements is
crucial.
Researchers [4–7] developed Autoregressive Integrated Moving Average (ARIMA) models to forecast
monthly rainfall in the Indonesian regions of Wagis and Pujion. Hoa [4] developed a weather
forecasting technique based on image fuzzy clustering and spatiotemporal processing of satellite
images. The satellite image pixels were divided into clusters by the fuzzy clustering method. The
Fourier transformation was used to filter out random images, and a regression method was used to
forecast the expected image sequence. A combined prediction model for monthly mean precipitation,
with error correction, was used to increase the accuracy of precipitation prediction [8].
Cross-validation has been used with models to find the optimal prediction for rainfall data with
different time horizons [9,10].
Thanjavur, often known as Tamil Nadu's rice bowl, has been noted for paddy production since
the Chola dynasty. It is situated in the Cauvery Delta region, which has both the necessary
criteria for paddy cultivation, namely abundant water and alluvial soil. The North-East monsoon
brings roughly 37cm of rain to this region, and the rivers are also a source of water. Due to
insufficient water from the Cauvery River, most of the farming area in Thanjavur district depends
on the seasonal rainfall. Taking these into account, rainfall forecasting over an extended period
can help to plan the management of irrigation water and associated planning. Instant comparisons
between past weather forecasts and observations can be processed using machine learning. Weather
models can better account for prediction flaws, such as overestimated rainfall, with the help of machine
learning, and create more accurate predictions. The present time series analysis and rainfall
forecasting for Thanjavur station is carried out in R, an open-source data mining environment. To
find the best model for the study area, a comparative study of three models was carried out:
ARIMA, ETS and Holt-Winters [11,12]. The performance assessment revealed that the HW model
outperformed the ARIMA and ETS models.
2. Study area
Thanjavur is a city with a population of close to 225,000, located in the state of Tamil Nadu,
South India, at latitude 10.7816° N and longitude 79.1390° E. The average annual rainfall of the
Cauvery Delta Zone is 956 mm, and the Cauvery River is the main source of irrigation for
cultivation in this district. With its fertile soil, Thanjavur District is one of the largest
paddy cultivation areas not only in Tamil Nadu but also in South India. For the present analysis,
17 years of historical rainfall data from 2000 to 2016 were collected; the seasonal pattern of
rainfall in the study area is shown in the time series plot in Fig. 1.
Fig. 1. Thanjavur station Rainfall Time Series plot.
The time series plot reveals that rainfall has a seasonal pattern without any trend. Fig. 1 shows
two peaks per year. Rainfall always reaches its highest value in the North-East monsoon
(October-December), and this pattern repeats every year throughout 2000-2016. The study area taken
for rainfall prediction is depicted in Fig. 2, and the flow diagram in Fig. 3 details the methods
adopted in the current research work.
Fig. 2. Study area map.
Fig. 3. Flow diagram for rainfall forecasting.
3. Methodology and data analysis
The Holt-Winters, ETS and ARIMA models are used in this analysis [7,8]. Monthly rainfall data for
Thanjavur station for 17 years (2000 to 2016) were used to identify the best method for rainfall
forecasting in the study area. The data are imported into the R environment and converted to a
time series object using the ts() function.
The rainfall data collected in the study area should be tested for seasonal and trend strength. If
the seasonal or trend strength is greater than 0.5, the series is treated as seasonal or trended,
respectively. This check is used to determine whether the given dataset is stationary or
non-stationary. If the dataset is not stationary, it should be made stationary by differencing.
The dataset is then split into training and testing sets. The training set is tested with the
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test and the Augmented Dickey–Fuller (DF) test
(R. J. Hyndman, 2019).
A. ARIMA model
The ARIMA model is used for rainfall model estimation and univariate forecasting. It has three
elements (p,d,q): p is the number of autoregressive (AR) lags; d is the degree of differencing
(I), that is, the number of times differences between consecutive values are taken to make the
sequence stationary; and q is the number of moving average (MA) lags. The MA terms are lagged
error terms, which help to predict current and future observations. This helps to eliminate the
random movements of the time series values [13,14].
The ARIMA model components:
$$RF_t = a + \sum_{i=1}^{p} b_i\, RF_{t-i} + d_0 e_t + \sum_{j=1}^{q} d_j\, e_{t-j} \qquad (1)$$
where $RF_t$ is the monthly rainfall at time t, $e_t$ and $e_{t-1}$ are the error term and the
immediately preceding error term known at time t, and p and q are the numbers of lags of the
dependent variable and the error term, respectively.
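As a minimal sketch of how equation (1) turns past rainfall values and past errors into a value for one time step, the Python function below evaluates the AR and MA sums directly. The function name, the argument names and the numbers in the usage lines are illustrative assumptions, not coefficients fitted to the Thanjavur data; in practice the coefficients would come from a fitting routine.

```python
def arma_one_step(history, past_errors, a, b, d, d0=1.0, e_t=0.0):
    """Evaluate equation (1) for one time step.

    history     -- observed rainfall values, oldest first (RF_{t-1} is last)
    past_errors -- past error terms, oldest first (e_{t-1} is last)
    a           -- constant term
    b           -- AR coefficients [b_1, ..., b_p]
    d           -- MA coefficients [d_1, ..., d_q]
    d0, e_t     -- weight and value of the current error term; e_t is
                   unknown when forecasting, so it defaults to 0 (assumption)
    """
    p, q = len(b), len(d)
    if len(history) < p or len(past_errors) < q:
        raise ValueError("need at least p past values and q past errors")
    # b_1 multiplies the most recent value, b_2 the one before it, and so on
    ar_part = sum(b[i] * history[-1 - i] for i in range(p))
    ma_part = sum(d[j] * past_errors[-1 - j] for j in range(q))
    return a + ar_part + d0 * e_t + ma_part


# Made-up numbers with p = 2 and q = 1, for illustration only
example = arma_one_step(history=[100.0, 80.0], past_errors=[5.0],
                        a=10.0, b=[0.5, 0.2], d=[0.4])
```

With these made-up inputs the AR part contributes 0.5·80 + 0.2·100 and the MA part 0.4·5, which shows how each lag enters the sum.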
B. ETS Model
The ETS model focuses on the trend and seasonal components. The trend component is expressed as
N (None), A (Additive), Ad (Additive Damped), M (Multiplicative) or Md (Multiplicative Damped)
[15,16]. Seasonality appears in the series as a short-term pattern that repeats every cycle. The
seasonal component is expressed as N (None), A (Additive) or M (Multiplicative). For models with
only additive components the forecast distributions are normal, so the medians and means are
equal. In ETS, the default information criterion is AICc, and the model that minimizes the
criterion is selected.
Akaike's Information Criterion (AIC) is:
$$AIC = -2\ln(\hat{L}) + 2k \qquad (2)$$
$$AIC_c = AIC + \frac{2(k+1)(k+2)}{n-k} \qquad (3)$$
Forecasting is carried out with the ETS function available in R. The following steps are taken to
obtain a generally applicable and robust ETS model for automatic forecasting:
1. For each series, apply all appropriate models, optimising the parameters (both the smoothing
parameters and the initial state variables) in each case.
2. Choose the best model based on the AICc value.
3. Produce point forecasts using the selected model with the optimised parameters.
4. Obtain prediction intervals for the selected model.
C. Holt-Winters Model
The Holt-Winters model applies exponential smoothing to a time series in order to forecast it.
Three aspects of the time series are used in this model: level, trend and seasonal values. The
future value is predicted using the smoothing parameters alpha (α), beta (β) and gamma (γ). The
seasonal frequency, that is, the number of periods per seasonal cycle, is denoted M. The method
has two variants that differ in the nature of the seasonal component. When seasonal variations are
roughly constant, the additive method is chosen. When seasonal variations change in proportion to
the level of the time series, the multiplicative method is chosen.
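Holt-Winters method components (multiplicative seasonal form):
Level formula:
$$L_t = \alpha\left(\frac{y_t}{S_{t-M}}\right) + (1-\alpha)(L_{t-1} + T_{t-1}) \qquad (5)$$
Trend formula:
$$T_t = \beta(L_t - L_{t-1}) + (1-\beta)\,T_{t-1} \qquad (6)$$
Seasonal formula:
$$S_t = \gamma\left(\frac{y_t}{L_t}\right) + (1-\gamma)\,S_{t-M} \qquad (7)$$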
The level formula is a weighted average of the seasonally adjusted observation and the
non-seasonal forecast for time t. The trend formula is identical to that of Holt's linear method.
The seasonal formula is a weighted average of the current seasonal index and the seasonal index
for the same season in the previous cycle (M periods earlier).
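The level, trend and seasonal recursions (5) to (7) can be sketched in a few lines of Python. This is a hedged illustration rather than the implementation used in the study: it takes the seasonal index as a ratio, updates the trend from the difference of successive levels as in Holt's linear method, and uses a simple initialisation from the first two seasonal cycles. The function name, the initialisation scheme and the toy series are all assumptions. Because the seasonal index enters as a ratio, the sketch requires strictly positive observations; months with zero rainfall would need an offset or the additive variant.

```python
def holt_winters_forecast(y, m, alpha, beta, gamma, horizon):
    """Run the level/trend/seasonal recursions and forecast `horizon` steps.

    y       -- observations (strictly positive), oldest first
    m       -- number of periods per seasonal cycle (M in the text)
    alpha, beta, gamma -- smoothing parameters between 0 and 1
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasonal cycles")
    if any(v <= 0 for v in y):
        raise ValueError("ratio-based seasonal index needs positive data")

    # Assumed initialisation: first-cycle mean, cycle-to-cycle change,
    # and first-cycle ratios to the mean.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    seasonal = [v / level for v in y[:m]]  # seasonal[t] belongs to time t

    for t in range(m, len(y)):
        s_prev = seasonal[t - m]
        prev_level = level
        level = alpha * (y[t] / s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal.append(gamma * (y[t] / level) + (1 - gamma) * s_prev)

    n = len(y)
    # h-step forecast: extrapolated level times the matching seasonal index
    return [(level + h * trend) * seasonal[n - m + (h - 1) % m]
            for h in range(1, horizon + 1)]


# Toy series: a repeating four-period pattern with no trend
toy = [10.0, 20.0, 30.0, 40.0] * 3
forecast = holt_winters_forecast(toy, m=4, alpha=0.3, beta=0.1, gamma=0.2,
                                 horizon=5)
```

On a perfectly periodic series without trend the recursions leave the level and seasonal indices unchanged, so the forecast simply repeats the pattern; this makes the toy case a convenient sanity check.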
Analysis of data
• The time series is decomposed to obtain more detail about its Trend, Seasonality and Remainder
components; the flow diagram of the proposed model is shown in Fig. 4.
Fig. 4. Proposed model flow diagram.
The Akaike information criterion (AIC) is a measure for estimating how well a model fits the
rainfall data. It is used to compare candidate models and determine which one is the best fit for
the rainfall data. The underlying idea is known as the entropy maximization principle: minimizing
AIC is equivalent to maximizing entropy, and AIC helps to measure the relative loss of
information. AIC is calculated from the number of independent variables (parameters) used to build
the model and the maximum likelihood estimate of the model.
$$AIC = 2k - 2\ln(\hat{L})$$
where k is the number of estimated parameters and $\hat{L}$ is the maximized value of the
likelihood function of the model.
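A short Python sketch may make the trade-off in the AIC formula concrete: a larger model is preferred only if its gain in log-likelihood outweighs the penalty of 2 per extra parameter. The two candidate fits below are hypothetical numbers chosen for illustration, and the function name is an assumption.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L-hat), where log_likelihood is ln(L-hat)."""
    return 2 * k - 2 * log_likelihood


# Hypothetical fits: (maximized log-likelihood, number of parameters)
candidates = {"small model": (-100.0, 3), "large model": (-99.5, 6)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

# The model with the lowest AIC is preferred
preferred = min(scores, key=scores.get)
```

Here the larger model improves the log-likelihood by only 0.5 while adding three parameters, so its AIC is higher and the smaller model is preferred.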
Mean Absolute Error (MAE) is the average of the absolute values of the errors. It indicates how
accurate the model's rainfall forecasts are by measuring their average deviation from the actual
rainfall values over the rainfall samples considered.
$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - x_i \right|$$
where n is the total number of rainfall samples, $y_i$ are the rainfall values forecast by the
model and $x_i$ are the observed rainfall values.
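Following the verbal definition above (the average of the absolute errors), MAE can be sketched in Python as below. The function name and the three sample values are illustrative assumptions, not data from the study.

```python
def mean_absolute_error(forecasts, observations):
    """Average absolute deviation between forecasts and observations."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(y - x) for y, x in zip(forecasts, observations))
    return total / len(forecasts)


# Made-up monthly values in mm, for illustration only
mae_example = mean_absolute_error([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

The absolute value keeps over- and under-forecasts from cancelling each other, which a plain sum of differences would allow.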
[Fig. 4 flow diagram steps: initialize time period M; choose smoothing parameter values alpha, beta and gamma (0 to 1); calculate initial seasonal, trend and level values; derive continuous seasonal, trend and level values; examine forecast model prediction.]
Root Mean Squared Error (RMSE) is the square root of the mean squared error and is used as a
standard statistical measure of model performance on the rainfall forecast data. It indicates the
standard deviation of the residuals of the rainfall forecasts.
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (f_i - o_i)^2}{n}}$$
where n is the number of rainfall samples, $f_i$ are the rainfall values forecast by the model and
$o_i$ are the observed rainfall values. The RMSE is a good indicator for evaluating forecast
performance. Decomposition is performed using the stl() function, which automatically divides the
time series into three components (Trend, Seasonality, Remainder), shown in Fig. 5.
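Returning to the RMSE formula, a minimal Python sketch is given below. The function name and sample values are illustrative assumptions. Squaring before averaging means that large misses, such as an underestimated monsoon peak, weigh more heavily than they do in MAE.

```python
import math


def root_mean_squared_error(forecasts, observations):
    """Square root of the mean squared forecast error."""
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((f - o) ** 2 for f, o in zip(forecasts, observations))
    return math.sqrt(squared / len(forecasts))


# Made-up monthly values in mm, for illustration only
rmse_example = root_mean_squared_error([10.0, 20.0, 30.0],
                                       [12.0, 18.0, 33.0])
```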
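Fig. 5. Time series decomposition.
• Calculation to assess the strength of trend and seasonality
F_T: trend strength
$$F_T = \max\left(0,\ 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(T_t + R_t)}\right) \qquad (8)$$
F_S: seasonal strength
$$F_S = \max\left(0,\ 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(S_t + R_t)}\right) \qquad (9)$$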
The trend and seasonal strengths range between 0 and 1, where a value close to 1 indicates a very
strong trend or seasonal component. In the present study the trend strength is 0.1 and the
seasonal strength is 0.5, which shows that the dataset follows a seasonal pattern alone and not a
trend pattern. The data are therefore treated as a stationary dataset. The seasonal subseries plot
in Fig. 6 provides a more informative interpretation of the data. Seasonal subseries plots are a
tool for detecting seasonality in a time series.
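The two strength measures share one form, so a single helper covers both. The Python sketch below assumes the decomposition has already produced the trend (or seasonal) and remainder components as equal-length lists; the function name and the toy components are assumptions, and the components would in practice come from a decomposition such as the stl() step described earlier.

```python
from statistics import pvariance


def component_strength(component, remainder):
    """max(0, 1 - Var(R) / Var(component + R)) for a trend or seasonal part."""
    if len(component) != len(remainder) or len(component) < 2:
        raise ValueError("need two equal-length series of at least 2 points")
    combined = [c + r for c, r in zip(component, remainder)]
    denominator = pvariance(combined)
    if denominator == 0:
        return 0.0  # no variation at all: treat the component as absent
    return max(0.0, 1.0 - pvariance(remainder) / denominator)


# Toy components: a strong alternating seasonal part and a small remainder
seasonal_part = [10.0, -10.0, 10.0, -10.0]
remainder_part = [1.0, 1.0, -1.0, -1.0]
strength = component_strength(seasonal_part, remainder_part)
```

A value near 1 means the remainder is small relative to the component, while a component that explains nothing gives 0; the text's 0.5 threshold would then be applied to the returned value.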
Pseudocode: Best model selection
Input: rainfall data for Thanjavur region
Output: best-fit forecast model
1. If seasonal_strength >= 0.5 and/or trend_strength >= 0.5
       then the dataset is a stationary series.
       Else transform it into a stationary series.
2. Split the dataset into training and testing sets.
3. Calculate the test statistics using the KPSS and DF methods.
4. Visualize the ACF and PACF lag values to choose the model parameters.
5. Train the dataset using different models:
   5.1 ARIMA(p,d,q)(P,D,Q)
       5.1.1 (p,q) = (i,i) where i = 0 to 4
             If p=1 and d=0 and q=0 then AR model
             else if p=0 and d=0 and q=1 then MA model
             else if p=1 and d=0 and q=1 then ARMA model
   5.2 ETS(A,Ad,A)
       5.2.1 Compare the seasonality component with the remainder values.
       5.2.2 If output_components = independent then use additive series parameters
             else use multiplicative series parameters
   5.3 Holt_Winters(L,T,S)
       5.3.1 Fix the initial seed values of α, β and γ
       5.3.2 Calculate the initial seasonal (S), level (L) and trend (T) factors
       5.3.3 Check whether the components are additive or multiplicative
6. Find the residuals and apply the diagnostic test. If the residuals are acceptable, keep the
   fitted model; otherwise go back to step 5 and change the parameter values.
7. Use the fitted model for forecasting.
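Step 5.1.1 of the pseudocode can be illustrated with a small Python sketch that enumerates candidate orders and names the special cases. Two points are assumptions rather than statements from the pseudocode: p and q are each taken to range independently over 0 to 4, and the AR, MA and ARMA labels are generalised from the single-lag cases listed to any positive lag count.

```python
def describe_order(p, d, q):
    """Name the model family implied by a non-seasonal order (p, d, q)."""
    if d == 0 and p > 0 and q == 0:
        return "AR"
    if d == 0 and p == 0 and q > 0:
        return "MA"
    if d == 0 and p > 0 and q > 0:
        return "ARMA"
    if p == 0 and d == 0 and q == 0:
        return "white noise"
    return "ARIMA"


# Candidate grid for a stationary series (d = 0), with p and q from 0 to 4
candidate_orders = [(p, 0, q) for p in range(5) for q in range(5)]
labels = {order: describe_order(*order) for order in candidate_orders}
```

Each candidate in the grid would then be fitted and checked against the residual diagnostics of step 6 before the lowest-AIC model is kept.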
4. Result and discussion
Rainfall at Thanjavur is predicted by constructing ETS, ARIMA and Holt-Winters models of the time
series. Of the available 17 years of monthly data, the 10 years from 2000 to 2009 are used for
training and 2010 to 2014 for testing, and predictions are obtained for the following two years,
2015 to 2016. The resulting predictions are compared with the actual rainfall data and plotted
against them.
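The chronological split described above can be sketched in Python as follows. The function name is an assumption, the series is a placeholder for the 204 monthly values, and treating the months after the test period as the remaining forecast window is an interpretation of the text rather than a detail it states.

```python
def split_monthly_series(values, start_year, train_end_year, test_end_year):
    """Split a monthly series (starting in January of start_year) by year."""
    months_per_year = 12
    n_train = (train_end_year - start_year + 1) * months_per_year
    n_test = (test_end_year - train_end_year) * months_per_year
    if n_train <= 0 or n_test <= 0 or n_train + n_test > len(values):
        raise ValueError("year boundaries do not fit the series")
    train = values[:n_train]
    test = values[n_train:n_train + n_test]
    remaining = values[n_train + n_test:]  # months after the test period
    return train, test, remaining


# Placeholder for 17 years of monthly rainfall (2000 to 2016)
series = list(range(17 * 12))
train, test, remaining = split_monthly_series(series, 2000, 2009, 2014)
```

Keeping the split in time order, rather than sampling at random, ensures that the models are always evaluated on months that come after the data they were fitted to.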
Fig. 6 shows that rainfall gradually increases from October and reaches its maximum in November
due to the NE (North-East) monsoon, then decreases gradually and reaches its minimum in March.
Rainfall begins to increase again after March and reaches a second maximum in August and
September due to the SW (South-West) monsoon. The plot depicts monthly average rainfall data for
four time periods (based on industrial development and urbanisation phases). Significant changes
in monthly rainfall over the years, and for the years to come, can be seen in the plot. Monthly
rainfall increased from March to September, indicating more rain in the pre-monsoon (March-May)
and monsoon (June-September) seasons. Papalaskaris et al. [17] reported a similar pattern when
estimating rainfall over Bangladesh. Excessive rain will result in major floods, putting crops at
risk and causing waterlogging in the city. On the other hand, a similar falling (December-January)
rainfall trend was observed in October-November, followed by an increase in February, indicating
lower rainfall and a drier crop season.
Fig. 6. Seasonal Subseries Plot.
4.1. Comparison of three models
A statistical model is the use of statistics to build a representation of the data and then conduct
analysis to infer any relationships between variables or discover insights. Machine learning, on
the other hand, is the use of mathematical or statistical models to obtain a general understanding
of the data to make predictions. Still, many in the industry use these terms interchangeably.
While some may not see any harm in this, a true data scientist must understand the distinction
between the two.
1. ARIMA Model
Based on the trend and seasonal strength test, our data are classified as a seasonal dataset and
are therefore regarded as stationary. Six ARIMA models are used in this study, and the best of the
six is chosen based on the AIC value. The capacity of the selected ARIMA model for precipitation
and temperature (maximum and minimum) is examined using the AIC criterion, which evaluates the
relative quality of a statistical model for a given dataset. The Akaike Information Criterion
(AIC) is, up to a constant, an estimate of the distance between the unknown true likelihood
function of the data and the fitted likelihood function of the model, with a lower AIC indicating
that the model is closer to the truth. In other words, AIC estimates the amount of information
lost by a particular model: the less information lost, the higher the model's quality.
Table 1
Accuracy level of ARIMA model.
| ARIMA (p,d,q) model | AIC value |
|---|---|
| M1 (1,0,1) | 1475.799 |
| M2 (1,0,2) | 1454.979 |
| M3 (0,0,2) | 1472.167 |
| M4 (2,0,1) | 1455.255 |
| M5 (2,0,2) | 1455.879 |
| M6 (auto ARIMA) | 1463.207 |
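Selecting the best candidate from Table 1 amounts to taking the row with the lowest AIC. The short Python sketch below uses the AIC values exactly as reported in the table; the dictionary layout and variable names are illustrative assumptions.

```python
# AIC values as reported in Table 1
table1_aic = {
    "M1 (1,0,1)": 1475.799,
    "M2 (1,0,2)": 1454.979,
    "M3 (0,0,2)": 1472.167,
    "M4 (2,0,1)": 1455.255,
    "M5 (2,0,2)": 1455.879,
    "M6 (auto ARIMA)": 1463.207,
}

# Lower AIC means less estimated information loss
best_arima = min(table1_aic, key=table1_aic.get)  # -> "M2 (1,0,2)"

# Differences to the best model show how close the runners-up are
delta_aic = {name: round(value - table1_aic[best_arima], 3)
             for name, value in table1_aic.items()}
```

The differences show that M4 and M5 sit within about one AIC unit of M2, so the three are close competitors on this criterion.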
Fig. 7 displays the Ljung-Box test and the ACF plot of the model residuals. From Fig. 7 it can be
concluded that this model is acceptable for forecasting, as its residuals behave like white noise
and are uncorrelated with each other.
2. ETS model
ETS stands for Error, Trend, Seasonality. ETS models are exponential smoothing state space models;
the form that fits the data effectively is (A, Ad, M). The parameters used to create these models
were chosen so as to produce data that appear reasonably realistic. The method has a high success
rate in determining whether the errors are additive or multiplicative. The optimum result in the
ETS model is obtained when the trend is treated as additive and the error and seasonality are
treated as multiplicative. After a residual check, the ACF diagram in Fig. 8 shows that the
majority of the sample autocorrelation coefficients of the residuals from the fitted ETS state
space models are within the significance bounds, implying that the residuals are white noise and
the models are appropriate. The test results reveal no autocorrelation in the in-sample forecast
errors, and the distribution of the forecast errors supports this conclusion. This shows that the
simple exponential smoothing method can be used to estimate rainfall with reasonable accuracy.
Fig. 7. Residual check on ARIMA model.
3. Holt-Winters Model
The Holt-Winters model is also known as triple exponential smoothing. The observed data are
decomposed into seasonal, level and trend components. The exponentially weighted moving averages
of the three components are then combined to obtain the result. The prediction by this model
(Fig. 9) is similar to that of the previous model. There is a slight improvement for low-magnitude
rainfall, but the peak rainfall reported in the monsoon months is not estimated properly.
Fig. 8. Residual check on ETS model.
Fig. 9. Residual check on HW model.
The selected models are compared with the actual dataset in Fig. 10. The green line represents the
actual data from 2000 to 2016. The ARIMA, ETS and HW models are plotted using training data from
2000 to 2009. Comparing the actual data with the model output shows that all the models fit the
actual data almost equally well. In terms of accuracy, the HW model performs better on both the
training and test sets than the ARIMA and ETS models.
Fig. 10. Actual Data vs ARIMA, ETS and Holt-winters Forecasting.
Forecasts from the three models, ARIMA, ETS and HW, are shown in Fig. 11 to Fig. 13, respectively.
The plots show similar movement across the models: the lowest rainfall occurs at the beginning of
each year, and the forecasts follow the seasonal rainfall pattern of the study area. The ETS and
HW models give similar forecasts, while the ARIMA model differs slightly from the other two. The
performance of the models is evaluated with reference to Root Mean Squared Error (RMSE), AIC value
and model fit.
The RMSE and AIC values for the models are given in Table 2. Both the RMSE and AIC values show
that the HW model outperforms the other models. Table 2 shows that the highest accuracy is
obtained for the HW model, followed by the ETS and ARIMA models. The HW model also correlates
better with the actual values. Hence, the results show that the HW and ETS models are suitable for
predicting future rainfall and the seasonal rainfall pattern in the study area. This ML-based
rainfall prediction can be useful for farmers who want to know the best month to start planting,
as well as for the government, which needs to prepare strategies to avoid floods in the rainy
season and drought in the dry season. It is important to note that this forecast is based only on
the historical average; meteorological data and knowledge from climate experts could be
incorporated to produce a more detailed forecast. Future work will apply recurrent neural
network-based prediction to the same dataset to try to improve accuracy [3,17,18]. As a result,
the additive Holt-Winters approach is recommended for future forecasting over the multiplicative
Holt-Winters method. The forecast values will aid disaster management in determining future
rainfall patterns, whether drought or flooding is expected. Furthermore, they will assist farmers
in making timely decisions on the sowing of crops, fruits, and dried fruits.
Table 2
Comparison of three Models.
| Model | RMSE | MAE | AIC value |
|---|---|---|---|
| ARIMA | 54.287 | 39.474 | 1454.979 |
| ETS | 49.158 | 37.460 | 1452.286 |
| HOLT-WINTERS | 48.670 | 36.751 | 1450.817 |
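The comparison in Table 2 can be checked with a few lines of Python that rank the models on each criterion. The values are taken from the table as reported, with the Holt-Winters RMSE read as 48.670 on the assumption that the doubled decimal point in the printed value is a typo; the data layout and names are illustrative.

```python
# (RMSE, MAE, AIC) per model, as reported in Table 2
table2 = {
    "ARIMA": (54.287, 39.474, 1454.979),
    "ETS": (49.158, 37.460, 1452.286),
    "HOLT-WINTERS": (48.670, 36.751, 1450.817),  # RMSE read as 48.670 (assumed)
}
metric_names = ("RMSE", "MAE", "AIC")

# For every criterion, lower is better
best_by_metric = {
    metric: min(table2, key=lambda model: table2[model][i])
    for i, metric in enumerate(metric_names)
}

# Relative RMSE reduction of Holt-Winters compared with ARIMA, in percent
rmse_gain_pct = round(
    (table2["ARIMA"][0] - table2["HOLT-WINTERS"][0]) / table2["ARIMA"][0] * 100,
    1,
)
```

Holt-Winters comes out lowest on all three criteria, although its margin over ETS is small compared with its margin over ARIMA.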
Fig. 11. Prediction of monthly rainfall using ARIMA model.
Fig. 12. Prediction of monthly rainfall using ETS model.
Fig. 13. Prediction of monthly rainfall using Holt-Winters model.
Given the fact that it does not rain much during the dry season, there is a nonsignificant positive
relationship between rainfall and average temperature from November to January, indicating that
a small increase in average temperature results in more rainfall. In any other month, there is no
notable relationship. During the Pre-Monsoon and Post-Monsoon seasons, rainfall and
temperature have a slight inverse relationship. Despite the fact that there is no significant yearly
relationship, temperature fluctuates unfavourably during Rabi season and favourably during
Kharif season.
5. Conclusion
In the present study, we have reported the time-series analysis and a comparative study of machine
learning models for forecasting rainfall at Thanjavur station in Tamil Nadu. The dataset consists
of monthly rainfall records from January 2000 to December 2016. The time-series data are
visualized using time-series and correlation plots. Time-series forecasting of rainfall at
Thanjavur station is carried out by building ARIMA, ETS and Holt-Winters models.
The performance of the models is evaluated with reference to Root Mean Squared Error (RMSE), MAE
and AIC value. The comparative analysis revealed that the HW model forecasts the rainfall most
accurately, with the least error. Thus, the derived model could be used to forecast monthly
rainfall for the upcoming years. The research concludes that the important issue of accurate
rainfall forecasting can be handled by machine learning models. It is worth mentioning that, while
model forecasts cannot predict exact precipitation amounts, they can reveal the likely trend of
future rains and provide information that can assist decision-makers in developing strategies in
areas such as agriculture, where knowing the start and end of rainy seasons is critical, civil
works planning, and the timing of mitigation plans for natural hazards such as flooding.
Finally, it's worth noting that rational planning and complete management of water resources
necessitate forecasting future events while keeping in mind that most forecasts are based on
previous events.
References
[1] Hipni A, El-shafie A, Najah A, Karim OA, Hussain A, Mukhlisin M. Daily Forecasting of Dam
Water Levels: Comparing a Support Vector Machine (SVM) Model With Adaptive Neuro Fuzzy
Inference System (ANFIS). Water Resour Manag 2013;27:3803–23.
https://doi.org/10.1007/s11269-013-0382-4.
[2] Najah A, El-Shafie A, Karim OA, Jaafar O. Integrated versus isolated scenario for prediction
dissolved oxygen at progression of water quality monitoring stations. Hydrol Earth Syst Sci
2011;15:2693–708. https://doi.org/10.5194/hess-15-2693-2011.
[3] Mahsin M, Akhter Y, Begum M. Modeling Rainfall in Dhaka Division of Bangladesh Using Time
Series. J Math Model Appl 2012;1:67–73.
[4] Tektaş M. Weather Forecasting Using ANFIS and ARIMA MODELS. A Case Study for Istanbul.
Environ Res Eng Manag 2010;1:5–10. https://doi.org/10.5755/j01.erem.51.1.58.
[5] Momani PENM. Time Series Analysis Model for Rainfall Data in Jordan: Case Study for Using Time
Series Analysis. Am J Environ Sci 2009;5:599–604.
[6] Shamsnia SA, Shahidi N, Liaghat A, Sarraf A, Vahdat SF. Modeling of weather parameters using
stochastic methods (ARIMA model)(case study: Abadeh Region, Iran). Int Conf Environ Ind
Innov IPCBEE 2011;12:282–5.
[7] Suhartono, Faulina R, Lusia DA, Otok BW, Sutikno, Kuswanto H. Ensemble method based on
ANFIS-ARIMA for rainfall prediction. ICSSBE 2012 - Proceedings, 2012 Int Conf Stat Sci Bus
Eng "Empowering Decis Mak with Stat Sci 2012:240–3.
https://doi.org/10.1109/ICSSBE.2012.6396564.
[8] Li G, Chang W, Yang H. A Novel Combined Prediction Model for Monthly Mean Precipitation
with Error Correction Strategy. IEEE Access 2020;8:141432–45.
https://doi.org/10.1109/ACCESS.2020.3013354.
[9] R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R
Foundation for Statistical Computing; 2017.
[10] Hyndman RJ. forecast: Forecasting functions for time series and linear models. R package
version 8.2. 2017.
[11] Mila FA, Parvin MT. Forecasting Area, Production and Yield of Onion in Bangladesh by Using
ARIMA Model. Asian J Agric Extension, Econ Sociol 2019:1–12.
https://doi.org/10.9734/ajaees/2019/v37i430274.
[12] Punia M, Joshi PK, Porwal MC. Decision tree classification of land use land cover for Delhi, India
using IRS-P6 AWiFS data. Expert Syst Appl 2011;38:5577–83.
https://doi.org/10.1016/j.eswa.2010.10.078.
[13] Burlando P, Rosso R, Cadavid LG, Salas J. Forecasting of short-term rainfall using ARMA
models. J Hydrol 1993;144:193–211.
[14] Salas JD, Obeysekera JT. ARMA model identification of hydrologic time series. Water Resour
Manag 1982;18:1011–21.
[15] Ridwan WM, Sapitang M, Aziz A, Kushiar KF, Ahmed AN, El-Shafie A. Rainfall forecasting
model using machine learning methods: Case study Terengganu, Malaysia. Ain Shams Eng J
2021;12:1651–63. https://doi.org/10.1016/j.asej.2020.09.011.
[16] Valipour M. Ability of Box-Jenkins Models to Estimate of Reference Potential Evapotranspiration
(A Case Study: Mehrabad Synoptic Station, Tehran, Iran). IOSR J Agric Vet Sci 2012;1:01–11.
https://doi.org/10.9790/2380-0150111.
[17] Papalaskaris T, Panagiotidis T, Pantrakis A. Stochastic Monthly Rainfall Time Series Analysis,
Modeling and Forecasting in Kavala City, Greece, North-Eastern Mediterranean Basin. Procedia
Eng 2016;162:254–63. https://doi.org/10.1016/j.proeng.2016.11.054.
[18] Thakkar AK, Desai VR, Patel A, Potdar MB. Post-classification corrections in improving the
classification of Land Use/Land Cover of arid region using RS and GIS: The case of Arjuni
watershed, Gujarat, India. Egypt J Remote Sens Sp Sci 2017;20:79–89.
https://doi.org/10.1016/j.ejrs.2016.11.006.

Assessment of Statistical Models for Rainfall Forecasting Using Machine Learning Technique

  • 1. Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 How to cite this article: Gowri L, Manjula KR, Sasireka K, Deepa D. Assessment of statistical models for rainfall forecasting using machine learning technique. J Soft Comput Civ Eng 2022;6(2):51–67. https://doi.org/10.22115/scce.2022.304260.1363 2588-2872/ © 2022 The Authors. Published by Pouyan Press. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). Contents lists available at SCCE Journal of Soft Computing in Civil Engineering Journal homepage: www.jsoftcivil.com Assessment of Statistical Models for Rainfall Forecasting Using Machine Learning Technique L. Gowri1 , K.R. Manjula1 , K. Sasireka2 , Durairaj Deepa2* 1. School of Computing, SASTRA Deemed to be University, Thanjavur, India 2. School of Civil Engineering, SASTRA Deemed to be University, Thanjavur, India Corresponding author: deepa@src.sastra.edu https://doi.org/10.22115/SCCE.2022.304260.1363 ARTICLE INFO ABSTRACT Article history: Received: 11 September 2021 Revised: 07 April 2022 Accepted: 08 April 2022 As heavy rainfall can lead to several catastrophes; the prediction of rainfall is vital. The forecast encourages individuals to take appropriate steps and should be reasonable in the forecast. Agriculture is the most important factor in ensuring a person's survival. The most crucial aspect of agriculture is rainfall. Predicting rain has been a big issue in recent years. Rainfall forecasting raises people's awareness and allows them to plan ahead of time to preserve their crops from the elements. To predict rainfall, many methods have been developed. Instant comparisons between past weather forecasts and observations can be processed using machine learning. Weather models can better account for prediction flaws, such as overestimated rainfall, with the help of machine learning, and create more accurate predictions. 
Thanjavur Station rainfall data for the period of 17 years from 2000 to 2016 is used to study the accuracy of rainfall forecasting. To get the most accurate prediction model, three prediction models ARIMA (Auto-Regression Integrated with Moving Average Model), ETS (Error Trend Seasonality Model) and Holt-Winters (HW) were compared using R package. The findings show that the model of HW and ETS performs well compared to models of ARIMA. Performance criteria such as Akaike Information Criteria (AIC) and Root Mean Square Error (RMSE) have been used to identify the best forecasting model for Thanjavur station. Keywords: ARIMA; ETS; Holt-winters model; Time series.
  • 2. 52 L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 1. Introduction Agriculture is considered to be a back bone of countries such as India. One of the leading states for agriculture is Tamil Nadu. Thanjavur is depicted as a rice bowl of TamilNadu from its historical era. Surface water and ground water are the main sources for the development of agriculture. The Cauvery River surface water supply is used for the cultivation of major crops such as paddy, pulses, gingelly, groundnut, and sugarcane. The increase in surface water is mainly based on the distribution of rainfall across the region. Due to inadequate water from the Cauvery River, most of the farming area in Thanjavur district depends on the seasonal rainfall. Taking these into account, rainfall forecasting over a prolonged duration will help to plan the management of irrigation water and associated preparation. To unravel hydrological problems, including forecasting rainfall, the Machine Learning (ML) approach is widely used. The value of this modelling is that the ability of the software to plot input-output patterns without the aforementioned knowledge of the factors affecting the forecast parameters is important [1–3]. This forecast primarily benefits farmers and it is possible to use water supplies effectively as well. Rainfall forecasting is a difficult job and the findings should be correct. By using weather conditions including temperature, humidity, pressure, there are several hardware devices for predicting rainfall. These conventional approaches do not work efficiently, so we can achieve precise results by using machine learning techniques. By using historical data analysis of rainfall in machine learning, it can forecast rainfall for future seasons. Many techniques can be applied, such as classification, regression according to requirements, and we can also quantify the error between the actual and forecast, as well as the precision. 
Different methods produce different accuracies, so choosing the right algorithm and modelling it according to the requirements is crucial. Researchers [4–7], Developed Autoregressive Integrated Moving Average (ARIMA) for prediction of monthly rainfall data forecast in the Indonesian region of Wagis and Pujion. Hoa [4] developed a technique to predict weather forecasting with the help of image fuzzy clustering and spatiotemporal using satellite appearance. By using the fuzzy clustering method, the satellite image pixels were divided into clusters. The Fourier transformation method was used to filter out random images, using the regression method to forecast the expected sequence of appearance. The combine prediction model for monthly mean prediction used to increase the accuracy of precipitation prediction along with error correction [8]. Using cross validation with models to try to predict the optimal prediction for rainfall data with difference time horizons [9,10]. Thanjavur, often known as Tamil Nadu's rice bowl, has been noted for paddy production since the Chola dynasty. It is situated in the Cauvery Delta region, which has both the necessary criteria for paddy cultivation, namely abundant water and alluvial soil. The North-East monsoon brings roughly 37cm of rain to this region, and the rivers are also a source of water. Due to insufficient water from the Cauvery River, most of the farming area in Thanjavur district depends on the seasonal rainfall. Taking these into account, rainfall forecasting over an extended period can help to plan the management of irrigation water and associated planning. Instant comparisons
  • 3. L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 53 between past weather forecasts and observations can be processed using machine learning. Weather models can better account for prediction flaws, such as overestimated rainfall, with the help of machine learning, and create more accurate predictions. The proposed research of time series analysis and rainfall forecasting at Thanjavur station is being performed in an open-source data mining environment called R. In order to find the best model for the research field, a comparative study of the three models was carried out: ARIMA, ETS and Holt-winters [11,12]. The performance assessment revealed that the HW model outperformed the ARIMA and ETS model. 2. Study area Thanjavur is a city with the population close to 225,000 people, located in the state of Tamil Nadu, South India. The latitude of Thanjavur, Tamil Nadu, India is 10.7816° N, and the longitude is 79.1390° E. The Cauvery Delta Zone's daily rainfall is 956 mm, and the Cauvery River is the main source of irrigation for cultivation in this district. With its fertile soil, the Thanjavur District is not only one of the largest paddy cultivation areas in Tamilnadu but also in South India. For the present analysis, 17 years of historical rainfall data from 2000 to 2016 were collected and seasonal trend of the rainfall in this study area is represented in the time series plot is shown in Fig.1. Fig. 1. Thanjavur station Rainfall Time Series plot. The plot of the time series reveals that rainfall has a seasonality pattern without any trends. Fig.1 illustrates that two peaks are observed per year in the time series map. In the North-East monsoon (October-December), rainfall always hits its higher value and this pattern is always repeated from year to year during the periods 2000-2016. 
The study area taken for rainfall prediction is depicted in Fig 2 and the flow diagram, the details of the methods adopted for current research work are explained in Fig.3.
  • 4. 54 L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 Fig. 2. Study area map. Fig. 3. Flow diagram for rainfall forecasting.
  • 5. L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 55 3. Methodology and data analysis The Holt-Winters model, the ETS model and the ARIMA model are the models used in this analysis ([7,8]. Monthly rainfall data for Thanjavur station for 17 years (2000 to 2016) was used to verify the best method for rainfall forecasting in the study area. The data to be processed is imported into the R environment using the Time series ts() function and then translated to a time series object. The rainfall data collected in the study area should be tested for seasonal and trend strength. The seasonal or trend strength is greater than 0.5 and is then taken into account as a seasonal or trend analysis. This verification is used to find that either a stationary or a non-stationary dataset belongs to the given dataset. If the dataset is not stationary, the differentiating approach should be modified to stationary. Then the data set is split into training and testing dataset. The training dataset is used test the Kwiatkowski-Phillios-Schmidt_Shin (KPSS) and Augmented Dickey– Fuller test (DF) test (R. J. Hyndman, 2019). A. ARIMA model For rainfall model estimation and univariate forecasting, the ARIMA model is used. It has three elements (p,d,q). p stands for the number of lags of autoregressive (AR); d stands for the degree of differencing (I) that helps as a stationary sequence and can be determined between previous values and data values; q stands for the number of lags of moving average (MA). The 'MA' terms are called error terms, which help to predict observations of current and future data. This eliminates the random movements of time series values [13,14]. The ARIMA model components: 𝑅𝐹𝑡 = 𝑎 + ∑ 𝑏𝑖 𝑝 𝑖=1 𝑅𝐹𝑡−𝑖 + 𝑑0𝑒𝑡 + ∑ 𝑑𝑗 𝑞 𝑗=1 𝑒𝑡−𝑗 (1) Where 𝑅𝐹𝑡 is monthly rainfall in time t.𝑇ℎ𝑒 𝑒𝑡 and 𝑒𝑡−1 is the value of error term and immediate past error known at time t. The p and q are number of lags of dependent variable and error term respectively. B. 
ETS Model Trends and seasonal components are the focus of the ETS model. The components of the trend are expressed as N(none), A (Additive), Ad (Additive Damped), M(Multiplicative), Md (Multiplicative Damped) [15,16]. The season is seen in the series as repeating the short-term pattern of the cycle. The seasonal components are expressed as N(none), A (Additive),M(Multiplicative). The forecast distributions are usual for models with only additive components, so the medians and means are equal. In ETS, the default is AICc. The model that minimizes the standard is chosen as acceptable for the information criteria. AIC (Alkies’ Information Criteria) is: 𝐴 𝐼 𝐶 = −2(𝐿 ) + 2𝑘 (2) 𝐴 𝐼 𝐶 𝑐 = 𝐴 𝐼 𝐶 + 2(𝑘 + 1)(𝑘 + 2)𝑛 – 𝑘 (3)
  • 6. 56 L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 𝐴 𝐼 𝐶 𝑐 = 𝐴 𝐼 𝐶 + 2(𝑘 + 1)(𝑘 + 2)𝑛 – 𝑘 (4) Forecasting Technique is used to do forecasting with the help of the ETS function, which can be used with R. The following steps are taken to obtain a generally applicable and robust ETS Model for autonomous forecasting: 1. For each series, apply all methods that are appropriate, optimising the model (both the Smoothing Parameter and, as a result, the starting state variable) in each case. 2. Choose the best model based on the AICc value. 3. Create a point forecast after selecting the model with improved parameters. 4. To acquire the prediction intervals for the most effective model. C. Holt-Winters Model The Holt Winters model uses an exponential smoothing of the performance and forecasting distribution of time series. Three aspects of the time series were used in this model: level, trend and seasonal values. The future value is predicted using several parameters, such as alpha (a), gamma (γ) and beta (β) . It also utilizes frequency seasonality to be denoted as M. Two variations that help to differ in the nature of the seasonal components were used by this method. When seasonal variations are constant, the additive method is chosen. When seasonal variations change in proportion to the average of the time series, the multiplicative method is chosen. Holt-Winters additive method components: Level formula: 𝐿𝑡 = 𝛼 ( 𝑦𝑡 𝑆𝑡−𝑀) + (1 − 𝛼)( 𝛼𝑡−1 + 𝑇𝑡−1) (5) Trend formula: 𝑇𝑡 = 𝛽( 𝐿𝑡 𝐿𝑡−1) + (1 − 𝛽)(𝑇𝑡−1 ) ⁄ (6) Seasonal formula: 𝑆𝑡 = 𝛾 ( 𝑦𝑡 𝛼𝑡 ) + (1 − 𝛾)𝑠𝑡 − 𝑀 (7) The level formula shows a weighted average between the seasonal observation and the non- seasonal forecast for𝑇𝑡. The trend formula is matching to Holt’s linear method. The seasonal formula shows an average between current seasonal index and the seasonal index of the same seasonal year (M). 
Analysis of data  The time series has been decomposed to get more detail about Trend, Seasonality, and Remainder component and flow diagram is explained in the fig.4.
  • 7. L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 57 Fig. 4. Proposed model flow diagram. The Akaike information criterion (AIC) is a precise method for estimating how well a model fits using the rainfall forecast data. It is used to compare different conceivable model samples and govern which one is the best fit for the rainfall forecast data. This is named entropy maximization principle and minimizing AIC values is equivalent to maximizing entropy and helps to measure the relative loss of information. Generally, AIC is calculated from the number of independent variables used to form the model and the maximum likelihood approximation of the model. 𝐴𝐼𝐶 = 2𝑘 − 2ln(𝐿 ̂) K is the number of estimated parameter variables used and L is the log-likelihood estimate parameter which is used for the model measure. Mean Absolute Error (MAE) are metrics used to evaluate the average of absolute value of the errors. The metrics helps to know how the model prediction rainfall forecast values are accurate and calculate the amount of deviation from the actual rainfall forecast values. This helps to predict the rainfall forecast based on the numbers of rainfall samples consider for the measurement. 𝑀𝐴𝐸 = ∑ 𝑦𝑖 𝑛 𝑖=1 − 𝑥𝑖 Where, n is the total number of rainfall samples, 𝑦𝑖 is the model rainfall forecasts values and 𝑥𝑖 is the true rainfall samples. Initialize time period M Choose smoothing parameter values alpha,beta and gamma(0 to 1) Calculate Initial Seasonal value Calculate Initial Trend value Calculate Initial level value Derive continuous seasonal value Derive continuous trend value Derive continuous level value Examine Forecast Model Prediction
  • 8. 58 L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 Root Mean Squared Error (RMSE) is the square root of mean squared error, used as a standard statistical parameter to measure the model performance of rainfall forecast data. The model parameter indicated the standard deviation of residuals of rainfall forecast data. 𝑅𝑀𝑆𝐸 = √ ∑ (𝑓𝑖 − 𝑜𝑖)2 𝑛 𝑖=1 𝑛 Where, n is the number of rainfall samples, f is the model rainfall forecasts values and o is the observed rainfall samples. The RMSE is a good indicator to evaluate the performance of the interpolation values. Decomposition is performed using the stl() function and divides the time series automatically into three components (Trend, Seasonality, Remainder) shown in Fig. 5 Fig. 5. Time series decomposition.  Calculation to assess trend and strength of seasonality Ft: Trend Strength 𝐹𝑇 = 𝑚𝑎𝑥(0,1 − 𝑉𝑎𝑟(𝑅𝑡) 𝑉𝑎𝑟(𝑇𝑡+𝑅𝑡) ) (7) Fs: Seasonal Strength 𝐹 𝑠 = 𝑚𝑎𝑥(0,1 − 𝑉𝑎𝑟(𝑅𝑡) 𝑉𝑎𝑟(𝑆𝑡+𝑅𝑡) ) (8) The strength of the seasonal and trend ranged between 0 and 1, while ,1, indicates that the trend and seasonal occurred very strongly. In the present study the Trend strength is 0.1 and Seasonal strength is 0.5, it shows that the dataset follows seasonal pattern alone and it doesn’t follow the trend pattern. It shows that our data is comes under stationary dataset. In Fig.5 the seasonal subseries plot will provide a much more informative interpretation of our data. Seasonal subseries plots are a tool for detecting seasonality in a time series.
  • 9. L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 59 Pseudocode: Best model selection: Input: rainfall data for Thanjavur region Output: Best fit for forecast model 1.If seasonal_strength>=0.5 and/or trend_strength>=0.5 then Dataset is stationary series. Else Tansform as stationary series. 2. Split the dataset into training and testing sets. 3. Calculate statistical values using KPSS and DF method. 4. visualize ACF and PACF lag values for model parameters. 5. Train the dataset using different models: 5.1 ARIMA(p,d,q)(P,D,Q) 5.1.1 (p,q)= (i, i) where i= 0 to 4 If p=1 and d=0 and q=0 then AR model else if p=0 and d=0 and q=1 then MA model else if p=1 and d=0 and q=1 then ARMA model 5.2 ETS(A,Ad,A) 5.2.1 compare the seasonality component with remainder values. 5.2.2 if output_components = independent then additive series parameters Else multiplicative series parameters 5.3 Holt_Winters (L, T, S) 5.3.1 fix initial seed value of α, γ and, β 5.3.2 calculate initial seasonal (S), Level (L), Trend (T) factors 5.3.3 check the parameters as additive or multiplicative components 6. Find the residuals and apply diagnostic test. If the residuals are good then fit the model. Otherwise repeat the same process go to 5 and change the parameter values. 7. Custom the fitted model for forecasting.
  • 10. 60 L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 4. Result and discussion The prediction of rainfall at Thanjavur for the time series is carried out by the construction of ETS, ARIMA and Holt-winters models. Out of the available 17-year monthly data, 10-year data from 2000 to 2009 is taken as training, 2010 to 2014 is taken as testing, and the prediction for the next two years from 2014 to 2016 is attained. The resulting prediction is correlated with the real rainfall data and plotted against it. Fig. 6 shows that the rainfall gradually increases from October and reach its maximum value in the month of November due to NE (North-East) monsoon season and decreases gradually and reach its minimum value in the month of March. Rainfall will begin to increase again after March and reach its maximum value in the month of August and September due to SW (South- West) monsoon. It depicts monthly average rainfall data for four time periods (based on industrial development and urbanisation phases). Significant changes in monthly rainfall have been discovered in the plot over the years and in the years to come. Monthly rainfall increased from March to September, indicating more rain in the pre-monsoon (March-May) and monsoon (June-September) seasons. Papalaskaris et al. [17] reported a similar pattern when estimating rainfall over Bangladesh. Excessive rain will result in major floods, putting crops at risk and causing waterlogging in the city. On the other side, a similar falling (December-January) rainfall trend was observed in October-November, followed by an increase in February, indicating a lower rainfall and dryer crop season. Fig. 6. Seasonal Subseries Plot.
  • 11. L. Gowri et al./ Journal of Soft Computing in Civil Engineering 6-2 (2022) 51-67 61 4.1. Comparison of three models A statistical model is the use of statistics to build a representation of the data and then conduct analysis to infer any relationships between variables or discover insights. Machine learning, on the other hand, is the use of mathematical or statistical models to obtain a general understanding of the data to make predictions. Still, many in the industry use these terms interchangeably. While some may not see any harm in this, a true data scientist must understand the distinction between the two. 1. ARIMA Model Our data is given under a seasonal data set based on the strength and seasonal test, so it is regarded as stationary data. Six types of ARIMA models are used in this study and the best method out of six ARIMA models is chosen based on the AIC value. The capacity of the selected ARIMA model for precipitation and temperature (maximum and lowest) to evaluate the relative quality of statistical model for a given dataset is examined using AIC criterion. The Akaike Information Criterion (AIC) is a constant estimate plus the distance between the unknown true likelihood function of the data and the fitted likelihood function of the model, with a lower AIC indicating that the model is closer to the truth. In other words, AIC calculates the amount of information lost by a particular model, with the lower the amount of information lost, the higher the model's quality. Table 1 Accuracy level of ARIMA model. ARIMA (p,d,q) Model AIC value M1 (1,0,1) 1475.799 M2 (1,0,2) 1454.979 M3 (0,0,2) 1472.167 M4 (2,0,1) 1455.255 M5 (2,0,2) 1455.879 M6 –auto ARIMA 1463.207 Fig. 6 displays the Ljung-Box test and the ACF plot of model residuals. From Fig.6 it can be concluded that this model is acceptable for forecasting as its residuals represent the behaviour of white noise and are uncorrelated to each other. 2. ETS model ETS stands for Error Trend Seasonality. 
Here, ETS refers to the exponential smoothing state space models that effectively fit the data (A, Ad, M). The parameters used to create these models were chosen so as to produce data that appeared reasonably realistic. The method has a high success rate in determining whether the errors are additive or multiplicative. The optimum result for the ETS model is obtained when the trend is treated as an additive series and the error and seasonality are treated as multiplicative series.
  • 12. After a residual check, the ACF diagram in Fig. 8 shows that the majority of the sample autocorrelation coefficients of the residuals from the fitted ETS state space models lie within the bounds, implying that the residuals are white noise and the models are appropriate. The test results reveal no autocorrelation in the in-sample forecast errors, and the distribution of the forecast errors supports this finding. This shows that the simple exponential smoothing method can be used to estimate rainfall with reasonable accuracy.
Fig. 7. Residual check on ARIMA model.
3. Holt-Winters Model
The Holt-Winters model is also known as triple exponential smoothing. Here the observed data are decomposed into level, trend and seasonal components. The exponentially weighted moving averages of the three components are then combined to obtain the result. The prediction by this model (Fig. 9) is similar to that of the previous model, with a slight improvement for low-magnitude rainfall. However, the peak rainfall reported in the monsoon months is still not properly estimated.
Fig. 8. Residual check on ETS model.
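The blending of level, trend and seasonal components can be illustrated with a minimal sketch of additive Holt-Winters smoothing. The smoothing weights `alpha`, `beta` and `gamma`, the simple initialisation from the first two seasons, the function name and the synthetic input are all assumptions made for illustration; the fitted parameters of the study are not reported in the text.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive triple exponential smoothing with season length m.

    Returns `horizon` forecasts that follow the last observation in y.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons to initialise")

    # Simple initial values (assumption): the first season gives the level
    # and the seasonal offsets, the change between seasons gives the trend.
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    # Update the three components for every observation after the first season.
    for t in range(m, len(y)):
        s_old = season[t % m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_old) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = (gamma * (y[t] - prev_level - prev_trend)
                         + (1 - gamma) * s_old)

    # h-step-ahead forecast: level + h * trend + matching seasonal offset.
    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Synthetic monthly pattern (placeholder values), repeated for three years.
pattern = [30, 15, 10, 25, 50, 40, 60, 110, 115, 180, 220, 120]
history = pattern * 3
forecast = holt_winters_additive(history, m=12, alpha=0.3, beta=0.1,
                                 gamma=0.2, horizon=12)
print([round(v, 1) for v in forecast])
```

For a perfectly periodic series without trend, this sketch reproduces the seasonal pattern, which makes it easy to check by hand. Real rainfall data would also require choosing the three weights, for example by minimising the in-sample squared error.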
  • 13. Fig. 9. Residual check on HW model.
The selected models are compared with the actual data set in Fig. 10. The green line represents the actual data from 2000 to 2016. The ARIMA, ETS and HW models are plotted using the training data from 2000 to 2009. Comparing the actual data with the model output shows that all three models fit the actual data almost equally well. In terms of accuracy, the HW model performs better than the ARIMA and ETS models on both the training and test sets.
Fig. 10. Actual Data vs ARIMA, ETS and Holt-winters Forecasting.
Forecasts from the three models, ARIMA, ETS and HW, are shown in Fig. 11 to Fig. 13, respectively. The plots show similar movement across the models: the lowest rainfall occurs in the opening months of each year, and the forecasts follow the seasonal rainfall pattern of the study area. The ETS and HW models produce similar forecasts, while the ARIMA forecast differs slightly from the other two.
  • 14. The performance of the models is evaluated with reference to the Root Mean Squared Error (RMSE), AIC value and model fit. The RMSE and AIC values of the models are given in Table 2. Both the RMSE and AIC values show that the HW model outperforms the other models. Table 2 shows that the highest accuracy is obtained with the HW model, followed by the ETS and ARIMA models. The HW model also correlates better with the actual values. Hence, the results show that both the HW and ETS models are suitable for predicting future rainfall and its seasonal pattern in the study area. This ML-based rainfall prediction can be useful for a farmer who wants to know the best month to start planting, as well as for a government that needs to prepare strategies against rainy-season floods and dry-season drought. It is important to note that this forecast is based only on the historical average; a more detailed forecast would require incorporating meteorological data and knowledge from climate experts. Future work will apply recurrent neural network-based prediction to the same data set in an attempt to improve accuracy [3,17,18]. As a result, the additive Holt-Winters approach is recommended for future forecasting over the multiplicative Holt-Winters method. The anticipated values will aid disaster management in determining future rainfall patterns, whether drought or flooding is expected. Furthermore, they will assist farmers in making timely decisions on the seeding of crops, fruits, and dried fruits.
Table 2
Comparison of three Models.
Model          RMSE     MAE      AIC value
ARIMA          54.287   39.474   1454.979
ETS            49.158   37.460   1452.286
HOLT-WINTERS   48.670   36.751   1450.817
Fig. 11. Prediction of monthly rainfall using ARIMA model.
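The error measures named above are straightforward to compute. The sketch below is a minimal illustration, assuming the usual definitions of RMSE and MAE and the standard form AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximised likelihood. The helper names and the toy numbers are hypothetical; the `table2` values are those reported in Table 2.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Toy observations and forecasts (hypothetical numbers, in mm).
observed = [10, 20, 30]
forecast = [12, 18, 33]
print(rmse(observed, forecast), mae(observed, forecast))

# Values as reported in Table 2.
table2 = {
    "ARIMA": {"RMSE": 54.287, "MAE": 39.474, "AIC": 1454.979},
    "ETS": {"RMSE": 49.158, "MAE": 37.460, "AIC": 1452.286},
    "HOLT-WINTERS": {"RMSE": 48.670, "MAE": 36.751, "AIC": 1450.817},
}

# Rank the models from best (lowest) to worst (highest) on each measure.
ranking = {
    metric: sorted(table2, key=lambda name: table2[name][metric])
    for metric in ("RMSE", "MAE", "AIC")
}
for metric, order in ranking.items():
    print(metric, order)
```

All three measures are "lower is better", so sorting in ascending order puts the preferred model first in each list.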
  • 15. Fig. 12. Prediction of monthly rainfall using ETS model.
Fig. 13. Prediction of monthly rainfall using Holt-Winters model.
Given that it does not rain much during the dry season, there is a nonsignificant positive relationship between rainfall and average temperature from November to January, indicating that a small increase in average temperature is associated with more rainfall. In the other months, there is no notable relationship. During the Pre-Monsoon and Post-Monsoon seasons, rainfall and temperature have a slight inverse relationship. Although there is no significant yearly relationship, temperature fluctuates unfavourably during the Rabi season and favourably during the Kharif season.
5. Conclusion
In the present study, we have reported a time-series analysis and comparative study of machine learning models for forecasting rainfall at the Thanjavur station in Tamil Nadu. The dataset consists of monthly rainfall records from January 2000 to December 2016. The time-series data are visualized using time-series and correlation plots. The time-series forecasting of rainfall at the Thanjavur station is carried out by building ARIMA, ETS and Holt-Winters models.
  • 16. The performance of the models is evaluated with reference to the Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and AIC value. The comparative analysis revealed that the HW model forecasts the rainfall with the least error. The derived model could therefore be used to forecast monthly rainfall for the upcoming years. The study concludes that the pressing problem of accurate rainfall forecasting can be handled by machine learning models. It is worth mentioning that, while model forecasts cannot predict exact precipitation amounts, they can reveal the likely trend of future rains and provide information that can assist decision-makers in developing strategies in areas such as agriculture, where knowing the start and end of rainy seasons is critical, civil works planning, and the timing of mitigation plans for natural hazards such as flooding. Finally, rational planning and comprehensive management of water resources require forecasting future events while keeping in mind that most forecasts are based on previous events.
References
[1] Hipni A, El-shafie A, Najah A, Karim OA, Hussain A, Mukhlisin M. Daily Forecasting of Dam Water Levels: Comparing a Support Vector Machine (SVM) Model With Adaptive Neuro Fuzzy Inference System (ANFIS). Water Resour Manag 2013;27:3803–23. https://doi.org/10.1007/s11269-013-0382-4.
[2] Najah A, El-Shafie A, Karim OA, Jaafar O. Integrated versus isolated scenario for prediction dissolved oxygen at progression of water quality monitoring stations. Hydrol Earth Syst Sci 2011;15:2693–708. https://doi.org/10.5194/hess-15-2693-2011.
[3] Mahsin M, Akhter Y, Begum M. Modeling Rainfall in Dhaka Division of Bangladesh Using Time Series. J Math Model Appl 2012;1:67–73.
[4] Tektaş M. Weather Forecasting Using ANFIS and ARIMA MODELS. A Case Study for Istanbul. Environ Res Eng Manag 2010;1:5–10. https://doi.org/10.5755/j01.erem.51.1.58.
[5] Naill PE, Momani M. Time Series Analysis Model for Rainfall Data in Jordan: Case Study for Using Time Series Analysis. Am J Environ Sci 2009;5:599–604.
[6] Shamsnia SA, Shahidi N, Liaghat A, Sarraf A, Vahdat SF. Modeling of weather parameters using stochastic methods (ARIMA model) (case study: Abadeh Region, Iran). Int Conf Environ Ind Innov IPCBEE 2011;12:282–5.
[7] Suhartono, Faulina R, Lusia DA, Otok BW, Sutikno, Kuswanto H. Ensemble method based on ANFIS-ARIMA for rainfall prediction. ICSSBE 2012 - Proceedings, 2012 Int Conf Stat Sci Bus Eng "Empowering Decis Mak with Stat Sci", 2012:240–3. https://doi.org/10.1109/ICSSBE.2012.6396564.
[8] Li G, Chang W, Yang H. A Novel Combined Prediction Model for Monthly Mean Precipitation with Error Correction Strategy. IEEE Access 2020;8:141432–45. https://doi.org/10.1109/ACCESS.2020.3013354.
[9] R Core Team. R: A language and environment for statistical computing. Vienna, Austria; 2017.
[10] Hyndman RJ. Forecasting functions for time series and linear models. R package version 8.2. 2017.
[11] Mila FA, Parvin MT. Forecasting Area, Production and Yield of Onion in Bangladesh by Using ARIMA Model. Asian J Agric Extension, Econ Sociol 2019:1–12. https://doi.org/10.9734/ajaees/2019/v37i430274.
  • 17. [12] Punia M, Joshi PK, Porwal MC. Decision tree classification of land use land cover for Delhi, India using IRS-P6 AWiFS data. Expert Syst Appl 2011;38:5577–83. https://doi.org/10.1016/j.eswa.2010.10.078.
[13] Burlando P, Rosso R, Cadavid LG, Salas JD. Forecasting of short-term rainfall using ARMA models. J Hydrol 1993;144:193–211.
[14] Salas JD, Obeysekera JT. ARMA model identification of hydrologic time series. Water Resour Res 1982;18:1011–21.
[15] Ridwan WM, Sapitang M, Aziz A, Kushiar KF, Ahmed AN, El-Shafie A. Rainfall forecasting model using machine learning methods: Case study Terengganu, Malaysia. Ain Shams Eng J 2021;12:1651–63. https://doi.org/10.1016/j.asej.2020.09.011.
[16] Valipour M. Ability of Box-Jenkins Models to Estimate of Reference Potential Evapotranspiration (A Case Study: Mehrabad Synoptic Station, Tehran, Iran). IOSR J Agric Vet Sci 2012;1:01–11. https://doi.org/10.9790/2380-0150111.
[17] Papalaskaris T, Panagiotidis T, Pantrakis A. Stochastic Monthly Rainfall Time Series Analysis, Modeling and Forecasting in Kavala City, Greece, North-Eastern Mediterranean Basin. Procedia Eng 2016;162:254–63. https://doi.org/10.1016/j.proeng.2016.11.054.
[18] Thakkar AK, Desai VR, Patel A, Potdar MB. Post-classification corrections in improving the classification of Land Use/Land Cover of arid region using RS and GIS: The case of Arjuni watershed, Gujarat, India. Egypt J Remote Sens Sp Sci 2017;20:79–89. https://doi.org/10.1016/j.ejrs.2016.11.006.