SlideShare a Scribd company logo
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 55
APPLICATION OF EXTREME VALUE THEORY TO BURSTS
PREDICTION
Abdelmahamoud Youssouf Dahab dahab_tc@hotmail.com
Computer & Information Sciences Department,
Universiti Teknologi PETRONAS,
Bandar Seri Iskandar, Tronoh, 31750 Perak, Malaysia.
Abas bin Md Said abass@petronas.com.my
Computer & Information Sciences Department,
Universiti Teknologi PETRONAS,
Bandar Seri Iskandar, Tronoh, 31750 Perak, Malaysia
Halabi bin Hasbullah halabi@petronas.com.my
Computer & Information Sciences Department,
Universiti Teknologi PETRONAS,
Bandar Seri Iskandar, Tronoh, 31750 Perak, Malaysia.
Abstract
Bursts and extreme events in quantities such as connection durations, file sizes,
throughput, etc. may produce undesirable consequences in computer networks.
Deterioration in the quality of service is a major consequence. Predicting these
extreme events and burst is important. It helps in reserving the right resources for
a better quality of service. We applied Extreme value theory (EVT) to predict
bursts in network traffic. We took a deeper look into the application of EVT by
using EVT based Exploratory Data Analysis. We found that traffic is naturally
divided into two categories, Internal and external traffic. The internal traffic
follows generalized extreme value (GEV) model with a negative shape
parameter, which is also the same as Weibull distribution. The external traffic
follows a GEV with positive shape parameter, which is Frechet distribution.
These findings are of great value to the quality of service in data networks,
especially when included in service level agreement as traffic descriptor
parameters.
Keywords: Traffic Bursts, Extreme Value Theory, Prediction, Quality of Service, Self-Similar.
1. INTRODUCTION
Extreme bursts have detrimental effects to network traffic and quality of service. These effects
may go far beyond congestion and delay. The Quality of Service (QoS) suffers as well. To
remedy, bursts need to be predicted in advance so that proper measures can be taken to mitigate
their effect. We need to understand the structure of traffic correctly so that we will be able to
predict bursts and enhance the QoS offered.
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 56
The discovery of the self-similarity of network traffic is a mile stone in computer networks. All the
old method of modeling the traffic may not withstand the self-similarity of the network traffic. New
models based on the self-similar properties are being developed [8, 15]. Self-similarity means that
traffic is bursty over wide range of scales. Bursts can be defined as high aggregation of data in a
relatively very small time interval. This definition is a general one and can be defined more
precisely depending on the given contexts.
Traffic prediction problem has been approached using different techniques. These techniques
range between stochastic processes based methods [10, 11], Autoregressive moving average
(ARMA) with its variants FARIMA, ARIMA to Artificial Neural Network (ANN) and Wavelet Based
predictors, recent survey of these methods with references is given in [6]. However, these
techniques are using the whole set of data and can be computationally expensive. We suggest
the Extreme Value Theory (EVT) as a framework to deal with the bursts prediction problem.
EVT is a branch of knowledge that stems out of statistics. It is analogous to the central limit
theorem (CLT). While the CLT deals with the distribution of the sample mean and tells us that it
converges asymptotically to a normal distribution, the EVT deals with sample maximum and tells
that it converges asymptotically to one of three distributions (Gumbel, Weibull, and Fréchet),
these distribution limits are combined in a single representation called Generalized Extreme
Value Distribution (GEV). EVT is a rational framework to the problem of burst prediction. It needs
only a subset of the data to work on. This will greatly reduce the time space-complexity of the
method. Another advantage is that the EVT is being developed for these kinds of problems, i.e.
predicting extreme events based on subset of the data. It has been in use in diverse fileds such
as Insurance, Finance, and Hydrology etc, see [2, 5].
Recent studies suggested EVT as a framework for modeling different types of traffic [4, 12, 18,
19]. In [19], Masato Uchida used the throughput as a network parameter to be modeled. He
argued that throughput, link usage rate, packet loss rate and delay time can be used to predict
telecommunication quality. He used the Peak Over Threshold (POT) method for the modeling and
fitted a generalized Pareto distribution. He showed that the POT using GPD is better in
approximating the unknown part of the data than the previously commonly used lognormal
distribution.
In [12], authors also used POT method for the analysis of wireless traffic. They fitted a GPD
model and compared their model to the lognormal, Gamma, Exponential models. The
computational overhead is clearly reduced when using the EVT model because we need only a
subset of the data to work on.
However, these studies are far from complete. They did not include a rigorous check to see the
applicability of the EVT analysis. They applied the theory of EVT and in particular the POT
method with the assumptions that it holds true. They did not include any preliminary analysis of
the data to check whether the theory assumptions hold or not. Such steps are important and
crucial for the success of the model. One of the reasons for this is that dependent correlated data
do not necessarily converge to the classical form of the three distributions limits. As shown in [9],
if the data are highly dependent, then a further parameter called extremal index needs to be
carefully introduced.
In a previous work, we applied the EVT to the internal traffic [4]. In this work, we extend our
model to the external traffic as well. We applied EVT more faithfully by using EVT based
exploratory tools. Our findings are unique. We found that traffic distinguishes itself into external
and internal by assuming different signs in GEV model. Using the Block Maxima (BM) method,
the external traffic follows a GEV with positive shape parameter, which is analogous to the traffic
bursts following extreme Frechet distribution. Internal traffic follows a GEV with a negative shape
parameter. This also means that traffic bursts follow extreme Weibull distribution.
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 57
The structure of this paper is as follows. First, we present EVT based model selection tools. We
apply them to Belcore traffic traces. We then estimate the parameters and show our prediction
results, and then we conclude.
TABLE 1 : Datasets
2. MODEL SELECTION TOOLS
Visualizing data is a very important step before a serious decision is to be taken about the
underlying process. A great deal of work has been done to illustrate this point. Such analysis
bears the name of Exploratory Data Analysis (EDA). In our work, we refer to EDA as EVT based
model selection tools. Selecting the right model is important. In this section, we discuss some of
EVT based model selection tools. Namely, Records, Maximum to Sum Ratio, Gumbel Plot, Mean
Excess plot. Other tools such as QQ-plot and Hill plot will be discussed here. Using these tools,
we analyze the Ethernet traffic traces from Belcore data and simulated ones.
Our five data sets consist of three data sets from Belcore Labs [21] and the other two are
simulated traces for both internal and external traffic [1, 14, 17 ]. We transformed Belcore traces
into bitrate per 0.1 second using simple MATLAB routines. The simulation of the internal trace is
based on Fractional Gaussian Noise (FGN) model suggested by Norros in [14] and the simulated
trace of the external traffic is based on Linear Fractional Alpha Stable Noise (LFSN) model, see
[20]. A summary of the data under study is given in Table1.
2.1 Records
Records can be used as exploratory tool in distinguishing between i.i.d and non i.i.d data. The
number of records of i.i.d data grows very slowly [5]. This fact allows us to use records in our
traffic data and check it against expected records in a typically known i.i.d data. If there is a match
in the number of records, then we may say that our data can be modeled as i.i.d otherwise we
say that our traffic data cannot be modeled as coming from i.i.d random process.
A record occurs if ),...,max( 111 −− => nnn XXMX . By definition 1X is a record. Let I be
indicator function, the record counting process N is given by
2,1,1
2
1 1
≥+== ∑=
> −
nINN
n
k
MXn kk
How would Records help? To answer this question, we tabulated the number of expected records
in a typical i.i.d dataset [5]. For the purpose of comparison, we computed the number of records
in the examined datasets.
A dramatic departure from the expected values in this table suggests clearly rejecting i.i.d
assumption. From Table 2, it is clear that disparity exists between the number of records in our
data and those expected from a typical i.i.d dataset. For the Aug98 data, it assumes higher
values than the average, while the Oct98 data assumes below average values. Nevertheless,
they are still within one standard deviation from the mean. So the i.i.d. assumption is a valid one.
k
n 10= EN VN BC-pAug98 BC-pOct98 FGN078 LFSN Oct89Ext
1 2.9 1.2 2 3 1 4 1
Dataset Type Size Source Parameters
BC-pAug98
BC-pOct98
BC-Oct89Ext
Internal Traffic
Internal Traffic
External Traffic
31429
15795
759431
Belcore
FGN078
LFSN
Internal Traffic
External Traffic
10000
10000
Simulated FGN
Simulated LFSN
H=0.78
H=0.75, Alpha=1.5
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 58
2 5.2 1.9 7 4 2 6 2
3 7.5 2.4 10 7 7 9 6
4 9.8 2.8 12 11 7 12 12
TABLE 2: Records
2.2 Maximum to Sum Ratio
Maximum to sum ratio nn SM / can be used as an explanatory tool to tell about the finiteness
(existence) of the moment of given order, say p . It is a standard knowledge that the mean and
variance of a given data are their first order and second order moment, respectively.
Using maximum to sum ratio, we will be able to tell whether the variance, which is the second
order moment, of data is finite (exists) or infinite (does not exist). The
th
p order partial sum and
th
p order maximum are given by ∑=
=
n
i
p
in XpS
1
||)( and )||,...,|max(|)( 1
pp
n XXpM = , respectively.
From [5], we have the following equivalence relation
∞<⇔→= p
nnn XEpSpMpR ||0)(/)()(
This equivalence relation means that
th
p order maximum to
th
p order partial sum ratio goes to
zero as approaches infinity if and only if the
th
p order moment exists.
A direct way to exploit the above fact is to plot the max to sum ratio for 1st or 2nd order moment,
if the plot for a given moment order goes to zero, then we might say that the moment of that order
is finite (exists). It is also to note that a heavy tail distribution with tail parameter 2<α has an
infinite variance and if 1<α , the mean also is infinite. In the first row first column in Figure 1, we
see that internal Ethernet traffic data have a finite variance because the Maximum to Sum Ratio
tends to zero very quickly. Notice that we have plotted only the first 400 values, which account for
only a fraction of the data. Also from the Figure1, we see that the external traffic (2nd and 3rd
row), represented by Oct89Ext and LFSN, have infinite variance but a finite mean.
Fig. 1. Maximum to Sum Ratio with p=2, and p=1 for
bytes/100 ms datasets.
Fig.2. Gumbel plots. Some plots show concave departure from
linearity and some show a convex departure from linearity.
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 59
Now that we have an idea about the different moments and structure of the traffic, it is time to
look for the parent distribution that might have produced the data.
2.3 Probability Paper Plot
The idea behind probability paper plot (PPP) is to graphically check whether our sample could
have come from the referenced distribution or not. The plot will look linear in case the sample
matches the referenced distribution. A departure from linearity is a clear indication that the
sample is not well approximated by the suggested distribution. Here we explain how does it work,
then we apply it to the samples of traffic data. More details about this method can be found in [5].
2.4 Gumbel Plot
Gumbel plot is probability plot where the reference distribution is the Gumbel distribution. It is one
of the most classical methods in extremes. It is a plot of the empirical distribution of the observed
data against the theoretical quantiles of the Gumbel distribution. If the data come from Gumbel
distribution then the plot will look linear, otherwise the plot shows a convex or concave curvature
depending on whether data come from a distribution with a tail heavier than the Gumbel's or
lighter, respectively. Gumbel plot method is close in spirit to the QQ and Probability plots. Gumbel
plot method is also known as double logarithmic plot.
In Gumbel method, we plot the empirical quantiles versus the quantiles of the theoretical Gumbel
distribution. The plot is given as
},...,1)),ln(ln(,{ ,, nkpX nknk =−− , where nknp nk /)5.0(, +−= are plotting positions.
Four Gumbel plots are shown in Figure 2. These plots are based on the block maximum series of
the data. Block maximum of internal LAN (second column) shows a concave curve deviation from
straight line (theoretical Gumbel quantiles). This concave deviation suggests block maxima
distribution with a lighter tail than Gumbel's. Meanwhile, the two other plots in the first column
have a convex shape. This convex departure from linearity suggests block maxima traffic that
follows an extreme Frechet distribution. These statements are to be analyzed and confirmed by
estimating model parameters. Remember that we are talking about the distribution of block
maxima and not the whole traffic data.
2.5 Mean Excess Function
The mean excess function of a probability distribution is given as
fxuuXuXEue <≤>−= 0),|()( , where u is a given threshold and is the support or right
end point of the distribution.
Mean excess function is used under different names in different disciplines. In insurance, it is the
expected claim size in the unlimited layer; in finance, it is the shortfall; in reliability, it is the mean
residual life.
Mean excess plot (meplot) is based on the mean excess function. It is a useful visualization tool.
It helps in discriminating in the tail of traffic data. If traffic data comes from a distribution with a
heavier tail than Gumbel's, then the plot will look linearly increasing. If data come from a
distribution with a lighter tail than the Gumbel, then the mean excess plot will be linearly
decreasing.
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 60
As a graphical tool we use the mean excess plot, it is based on the estimate of mean excess
function. Suppose that we have nXX ,...,1 , the sample mean excess is given as
∫
∞
=
u
nn dyyFuFue )()(/1)(
The graph )}(,{ ,, nknnk XeX is called the mean excess plot.
Putting it together, we listed meplots for both internal and external simulated samples in Figure3.
From the Figure, we see that the internal traffic has a decreasing meplot (upper one), while the
external traffic has a linear increasing curve. These observations are in conformity with the
theoretical Mean Excess function. The decreasing plot suggests a distribution with light tail. In the
case of modeling with GEV, this behavior is analogous to a GEV with a negative shape
parameter. The increasing plot suggests a distribution with heavier tail than Gumbel’s. This
increasing meplot is typical for a GEV with a positive shape parameter
3. MODEL PARAMETERS ESTIMATION
In general, to fit a model to the given data we need to estimate the parameters of that particular
model, GEV model is no exception [3]. We chose the maximum likelihood (ML) method. ML
method is implemented in the function gev from EVIM package [7]. We used blocks of size 100;
this choice gives us maxima in blocks of 10 seconds each. We estimated model parameters
using ML method.
From estimation results, we realized that internal traffic traces have negative shape parameter,
while the two external traces have a positive shape parameter. This is an important observation
since the shape parameter determines the type of fit distribution. In the internal traffic the
negative shape parameter is an indication of the Weibull model. In external traffic case, the
positive shape parameter is a clear indication of the traffic block maxima following an extreme
Frechet distribution. Norros arrived at a similar conclusion for the internal traffic [14]. He showed
analytically that the behavior of a buffer fed with internal traffic, approximated by fractional
Gaussian noise, follows Weibull distribution. We arrived at a similar result but using extreme
value theory perspective, which is much appropriate to the bursts prediction problem
Fig. 4. Return level plot for LFSN.Fig. 3. Mean excess plots for FGN078(upper one)
and LFSN
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 61
4. PREDICTION
Apart from the model and its estimated parameters, we may use the powerful return level plot for
the purpose of prediction. Return level plot is a plot of return period versus return level. We have
produced a return level plot for the LFSN dataset with maxima taken in block of 100 observations.
See Figure 4.
5. RESULTS AND ANALYSIS
Using EVT based model selection tools and parameters’ estimates, we classify traffic bursts
(block maxima) as coming either from a GEV distribution with positive shape parameter or GEV
with a negative shape parameter, depending on the type of traffic.
From the EVT-based exploratory tools we notice that records of both kinds of traffic are within the
confidence bounds of records from i.i.d data. This allows us to model the data not differently from
i.i.d case. The max/sum ration also is a good tool for discriminating the two types of traffic
(internal/external). For the external traffic, since the plot for the second moment does not
approach zero quickly we say the variance is infinite. From the right column plot, we say that the
mean is finite. This is also an obvious observation since the plot of the ratio for the first moment
tends to zero quickly. Same thing can be said on the mean of the internal traffic. However, unlike
the external traffic, the internal traffic has finite variance.
The internal traffic, which is approximated by fractional Gaussian model, has traffic maxima that
follows GEV model with negative shape parameter. Put differently, those block maxima fall in the
Weibull domain of attraction.
The external traffic is best approximated by linear fractional stable noise (LFSN). This LFSN
traffic model has block maxima that follow a GEV with a positive shape parameter. The same way
as for internal traffic, this GEV with positive shape parameter means that bursts or traffic maxima
fall in the domain of attraction of Extreme Frechet model.
Using extreme value distribution we are able to model the bursts or traffic maxima. This modeling
of bursts is a valuable one. It permits us, in the context of Quality of Service, to define new
extreme traffic metrics to be included in the Service Level Agreement (SLA) contracts. This will
help service providers to deliver a better service to their clients, and the latter to ask for the
service they actually need.
6. CONCLUSION
We used extreme value theory to the bursts prediction problem. In particular, we used
Generalized Extreme Value model. Using EVT based model selection tools, traffic is seen to be
naturally classified into external and internal traffic, following GEV model with positive and
negative shape parameter, respectively. Model selection tools have further improved our
confidence in the model. EVT as a framework is better suited than any other analytical model to
predict bursts and serious deterioration in network traffic. It is computationally less expensive and
requires minimum disk space. In future, we should consider assessing the fit of the model with
some quality of fit measures.
7. REFERENCES
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 62
1. Patrice Abry and Fabrice Sellan. “The wavelet-based synthesis for fractional Brownian motion
proposed by F. Sellan and Y. Meyer: Remarks and fast implementation”. Applied and
Computational Harmonic Analysis, 3(4):377-383, 1996.
2. N. Balakrishnan, Jose Maria Sarabia, Enrique Castillo and Ali S. Hadi. “Extreme Value and
Related Models with Applications in Engineering and Science”. Wiley, Wiley, 2004.
3. George Casella and Roger L. Berger. “Statistical Inference”. Duxbury, Pacific Grove, CA
93950, USA, 2001.
4. Abdelmahmoud Youssouf Dahab, Halabi Hasbullah and Abas Md Said, "Predicting Traffic
Bursts Using Extreme Value Theory". In International Conference on Signal Acquisition and
Processing, pp.229-233, 2009.
5. Paul Embrechts, Thomas Mikosch and Claudia Kl¨uppelberg. “Modelling Extremal Events: for
Insurance and Finance”. Springer-Verlag, London, UK,1997.
6. Hifang Feng and Yantai Shu. “Study on network traffic prediction techniques”. International
Conference on Wireless Communications, Networking and Mobile Computing, 2005, 2(23-26
Sept. 2005):1041–1044, September 2005.
7. Ramazan Genay, Faruk Seluk, and Abdurrahman Uluglyagci. “Evim: A software package for
extreme value analysis in matlab”. Studies in Nonlinear Dynamics & Econometrics, 5(3):1080–
1080, 2001.
8. A. Karasaridis and D. Hatzinakos. “Network heavy traffic modeling using alpha-stable self-
similar processes”. Communications, IEEE Transactions on, 49(7):1203–1214, Jul 2001.
9. M. R. Leadbetter and Holger Rootzen. “Extremal Theory for Stochastic Processes”. The Annals
of Probability. 16(2):431-478, 1988.
10. M. Li and S. C. Lim. “Modeling network traffic using generalized Cauchy process”. Physica A,
387(11): 2584-2594,15 April 2008.
11. M. Li and W. Zhao. “Representation of a Stochastic Traffic Bound, accepted for publication”.
IEEE Transactions on Parallel and Distributed Systems, 2009.
12. Chunfeng Liu, Yantai Shu, and Jiakun Liu. “Application of Extreme Value Theory to the
Analysis of Wireless Network Traffic”. In Proceedings of IEEE International Conference on
Communications, ICC 2007, Glasgow, Scotland, 24-28 June 2007.
13. Abuagla Babiker Mohd and Sulaiman bin Mohd Nor. “Towards a Flow-based Internet Traffic
Classification for Bandwidth Optimization”. International Journal of Computer Science and
Security (IJCSS), 3:2, 146-153. 2009.
14. Ilkka Norros. “On the use of fractional brownian motion in the theory of connectionless
networks”. IEEE Journal of Selected Areas in Communications, 13(6):953–962, 1995.
15. Kihong Park and Walter Willinger. “Self-Similar Network Traffic and Performance Evaluation”.
John Wiley & Sons, New Yotk, USA, 2000.
16. Gennady Samorodnitsky and Murad S. Taqqu. “Stable Non-Gaussian Random Processes:
Stochastic Models with Infinite Variance”. Chapman & Hall, London SE1 8HN, 1994.
17. Stilian Stoev and Murad S. Taqqu. “Simulation methods for linear fractional stable motion and
farima using the fast fourier transform”. Fractals, 12:2004, 2004.
Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah
Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 63
18. Zoi Tsourti and John Panaretos. “Extreme-value analysis of teletraffic data”. Computational
Statistics Data Analysis, 45(1):85 – 103, 2004. Computer Security and Statistics.
19. M. Uchida. “Traffic data analysis based on extreme value theory and its applications”. Global
Telecommunications Conference, 2004. GLOBECOM’04. IEEE, 3:1418–1424 Vol.3, Nov.-3 Dec.
2004.
20. Wei Biao Wu, George Michailidis, and Danlu Zhang. “Simulating sample paths of linear
fractional stable motion”. IEEE Transactions on Information Theory, 50:1086–1096, 2004.
21. Internet traffic Archive, “http://ita.ee.lbl.gov”.
BIOGRAPHY
1. Abdelmahamoud Youssouf Dahab is a PhD candidate in Computer & Information
Sciences Department, Universiti Teknologi PETRONAS.
2. AP Dr Abas Md Said is a lecturer in the Computer and Information Sciences
Department at Universiti Teknologi PETRONAS. He obtained his Bachelor and Masters
of Science from Western Michigan University, USA. Doctor of Philosophy from
Loughborough University, UK. He specializes in Computer Graphics, Visualization, and
Networks. His research interests are in Optimized Stereoscopic Rendering for
Visualization. He has a number of publications.
3. Dr Halabi Hasbullah is a lecturer in the Computer and Information Sciences
Department at Universiti Teknologi PETRONAS. He obtained his Bachelor of Science
(Mathematics) from Universiti Malaya, Malaysia. His Masters of Science (Information
Technology) from De Montfort University, UK. And PhD (Communications System),
National University of Malaysia (UKM), Malaysia. He specializes in Network
Communications. His research interests are Bluetooth network, Traffic engineering,
Wireless sensor network (WSN), Vehicular ad hoc network (VANET), and IPv6. He has
a number of publications.

More Related Content

PDF
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
PDF
84cc04ff77007e457df6aa2b814d2346bf1b
PDF
Results of Fitted Neural Network Models on Malaysian Aggregate Dataset
PDF
Impact of Graphs and Network in Minimizing Project and Product Cost
PDF
A SURVEY OF CLUSTERING ALGORITHMS IN ASSOCIATION RULES MINING
PDF
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
PDF
Optimal configuration of network coding in ad hoc networks
PDF
Enhancement of Single Moving Average Time Series Model Using Rough k-Means fo...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
84cc04ff77007e457df6aa2b814d2346bf1b
Results of Fitted Neural Network Models on Malaysian Aggregate Dataset
Impact of Graphs and Network in Minimizing Project and Product Cost
A SURVEY OF CLUSTERING ALGORITHMS IN ASSOCIATION RULES MINING
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension...
Optimal configuration of network coding in ad hoc networks
Enhancement of Single Moving Average Time Series Model Using Rough k-Means fo...

What's hot (18)

PDF
An experimental evaluation of similarity-based and embedding-based link predi...
PDF
Cost Estimation Predictive Modeling: Regression versus Neural Network
PDF
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
PDF
algorithms
PDF
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
DOC
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
PDF
Extended pso algorithm for improvement problems k means clustering algorithm
PDF
Predicting Budget from Transportation Research Grant Description: An Explorat...
PDF
Artificial Neural Networks for Construction Management: A Review
PDF
Data Imputation by Soft Computing
PDF
A time efficient and accurate retrieval of range aggregate queries using fuzz...
PDF
Cross Domain Data Fusion
PDF
A h k clustering algorithm for high dimensional data using ensemble learning
PDF
Image similarity using fourier transform
PDF
Optimal Configuration of Network Coding in Ad Hoc Networks
PDF
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
PDF
IRJET- Evidence Chain for Missing Data Imputation: Survey
PDF
Effective data mining for proper
An experimental evaluation of similarity-based and embedding-based link predi...
Cost Estimation Predictive Modeling: Regression versus Neural Network
Biclustering using Parallel Fuzzy Approach for Analysis of Microarray Gene Ex...
algorithms
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Extended pso algorithm for improvement problems k means clustering algorithm
Predicting Budget from Transportation Research Grant Description: An Explorat...
Artificial Neural Networks for Construction Management: A Review
Data Imputation by Soft Computing
A time efficient and accurate retrieval of range aggregate queries using fuzz...
Cross Domain Data Fusion
A h k clustering algorithm for high dimensional data using ensemble learning
Image similarity using fourier transform
Optimal Configuration of Network Coding in Ad Hoc Networks
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS
IRJET- Evidence Chain for Missing Data Imputation: Survey
Effective data mining for proper
Ad

Similar to Application Of Extreme Value Theory To Bursts Prediction (20)

PDF
Using petri net with inherent fuzzy in the recognition of ecg signals
PDF
A Survey on the Common Network Traffic Sources Models
PDF
Towards a new intelligent traffic system based on deep learning and data int...
PPTX
Traffic Prediction from Street Network images.pptx
PDF
Classification Approach for Big Data Driven Traffic Flow Prediction using Ap...
PDF
Enhancing Traffic Prediction with Historical Data and Estimated Time of Arrival
PDF
TIME SERIES ANALYSIS ON TRANSPORT IN ODISHA
PDF
PPT
stochasticmodellinganditsapplications.ppt
PDF
Time Series Forecasting Using Novel Feature Extraction Algorithm and Multilay...
PDF
Smart traffic forecasting: leveraging adaptive machine learning and big data ...
PDF
Adaptive traffic lights based on traffic flow prediction using machine learni...
PPTX
Application of a Markov chain traffic model to the Greater Philadelphia Region
PPTX
Application of time series analysis in financial economics
PDF
Approximation of regression-based fault minimization for network traffic
PPTX
Stochastic modelling and its applications
PDF
Internet Traffic Forecasting using Time Series Methods
PDF
00b7d51ed81834e4d7000000
PDF
The math behind big systems analysis.
PDF
Lecture19 xing
Using petri net with inherent fuzzy in the recognition of ecg signals
A Survey on the Common Network Traffic Sources Models
Towards a new intelligent traffic system based on deep learning and data int...
Traffic Prediction from Street Network images.pptx
Classification Approach for Big Data Driven Traffic Flow Prediction using Ap...
Enhancing Traffic Prediction with Historical Data and Estimated Time of Arrival
TIME SERIES ANALYSIS ON TRANSPORT IN ODISHA
stochasticmodellinganditsapplications.ppt
Time Series Forecasting Using Novel Feature Extraction Algorithm and Multilay...
Smart traffic forecasting: leveraging adaptive machine learning and big data ...
Adaptive traffic lights based on traffic flow prediction using machine learni...
Application of a Markov chain traffic model to the Greater Philadelphia Region
Application of time series analysis in financial economics
Approximation of regression-based fault minimization for network traffic
Stochastic modelling and its applications
Internet Traffic Forecasting using Time Series Methods
00b7d51ed81834e4d7000000
The math behind big systems analysis.
Lecture19 xing
Ad

Recently uploaded (20)

PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PPTX
master seminar digital applications in india
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PDF
O7-L3 Supply Chain Operations - ICLT Program
PDF
FourierSeries-QuestionsWithAnswers(Part-A).pdf
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PDF
Module 4: Burden of Disease Tutorial Slides S2 2025
PDF
Insiders guide to clinical Medicine.pdf
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PPTX
Cell Types and Its function , kingdom of life
PPTX
Pharma ospi slides which help in ospi learning
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Business Ethics Teaching Materials for college
PDF
Pre independence Education in Inndia.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
master seminar digital applications in india
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
O7-L3 Supply Chain Operations - ICLT Program
FourierSeries-QuestionsWithAnswers(Part-A).pdf
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
O5-L3 Freight Transport Ops (International) V1.pdf
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
Module 4: Burden of Disease Tutorial Slides S2 2025
Insiders guide to clinical Medicine.pdf
Abdominal Access Techniques with Prof. Dr. R K Mishra
Cell Types and Its function , kingdom of life
Pharma ospi slides which help in ospi learning
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Week 4 Term 3 Study Techniques revisited.pptx
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Business Ethics Teaching Materials for college
Pre independence Education in Inndia.pdf
Final Presentation General Medicine 03-08-2024.pptx

Application Of Extreme Value Theory To Bursts Prediction

  • 1. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 55 APPLICATION OF EXTREME VALUE THEORY TO BURSTS PREDICTION Abdelmahamoud Youssouf Dahab dahab_tc@hotmail.com Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Tronoh, 31750 Perak, Malaysia. Abas bin Md Said abass@petronas.com.my Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Tronoh, 31750 Perak, Malaysia Halabi bin Hasbullah halabi@petronas.com.my Computer & Information Sciences Department, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, Tronoh, 31750 Perak, Malaysia. Abstract Bursts and extreme events in quantities such as connection durations, file sizes, throughput, etc. may produce undesirable consequences in computer networks. Deterioration in the quality of service is a major consequence. Predicting these extreme events and burst is important. It helps in reserving the right resources for a better quality of service. We applied Extreme value theory (EVT) to predict bursts in network traffic. We took a deeper look into the application of EVT by using EVT based Exploratory Data Analysis. We found that traffic is naturally divided into two categories, Internal and external traffic. The internal traffic follows generalized extreme value (GEV) model with a negative shape parameter, which is also the same as Weibull distribution. The external traffic follows a GEV with positive shape parameter, which is Frechet distribution. These findings are of great value to the quality of service in data networks, especially when included in service level agreement as traffic descriptor parameters. Keywords: Traffic Bursts, Extreme Value Theory, Prediction, Quality of Service, Self-Similar. 1. INTRODUCTION Extreme bursts have detrimental effects to network traffic and quality of service. These effects may go far beyond congestion and delay. The Quality of Service (QoS) suffers as well. To remedy, bursts need to be predicted in advance so that proper measures can be taken to mitigate their effect. We need to understand the structure of traffic correctly so that we will be able to predict bursts and enhance the QoS offered.
  • 2. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 56 The discovery of the self-similarity of network traffic is a mile stone in computer networks. All the old method of modeling the traffic may not withstand the self-similarity of the network traffic. New models based on the self-similar properties are being developed [8, 15]. Self-similarity means that traffic is bursty over wide range of scales. Bursts can be defined as high aggregation of data in a relatively very small time interval. This definition is a general one and can be defined more precisely depending on the given contexts. Traffic prediction problem has been approached using different techniques. These techniques range between stochastic processes based methods [10, 11], Autoregressive moving average (ARMA) with its variants FARIMA, ARIMA to Artificial Neural Network (ANN) and Wavelet Based predictors, recent survey of these methods with references is given in [6]. However, these techniques are using the whole set of data and can be computationally expensive. We suggest the Extreme Value Theory (EVT) as a framework to deal with the bursts prediction problem. EVT is a branch of knowledge that stems out of statistics. It is analogous to the central limit theorem (CLT). While the CLT deals with the distribution of the sample mean and tells us that it converges asymptotically to a normal distribution, the EVT deals with sample maximum and tells that it converges asymptotically to one of three distributions (Gumbel, Weibull, and Fréchet), these distribution limits are combined in a single representation called Generalized Extreme Value Distribution (GEV). EVT is a rational framework to the problem of burst prediction. It needs only a subset of the data to work on. This will greatly reduce the time space-complexity of the method. Another advantage is that the EVT is being developed for these kinds of problems, i.e. predicting extreme events based on subset of the data. It has been in use in diverse fileds such as Insurance, Finance, and Hydrology etc, see [2, 5]. Recent studies suggested EVT as a framework for modeling different types of traffic [4, 12, 18, 19]. In [19], Masato Uchida used the throughput as a network parameter to be modeled. He argued that throughput, link usage rate, packet loss rate and delay time can be used to predict telecommunication quality. He used the Peak Over Threshold (POT) method for the modeling and fitted a generalized Pareto distribution. He showed that the POT using GPD is better in approximating the unknown part of the data than the previously commonly used lognormal distribution. In [12], authors also used POT method for the analysis of wireless traffic. They fitted a GPD model and compared their model to the lognormal, Gamma, Exponential models. The computational overhead is clearly reduced when using the EVT model because we need only a subset of the data to work on. However, these studies are far from complete. They did not include a rigorous check to see the applicability of the EVT analysis. They applied the theory of EVT and in particular the POT method with the assumptions that it holds true. They did not include any preliminary analysis of the data to check whether the theory assumptions hold or not. Such steps are important and crucial for the success of the model. One of the reasons for this is that dependent correlated data do not necessarily converge to the classical form of the three distributions limits. As shown in [9], if the data are highly dependent, then a further parameter called extremal index needs to be carefully introduced. In a previous work, we applied the EVT to the internal traffic [4]. In this work, we extend our model to the external traffic as well. We applied EVT more faithfully by using EVT based exploratory tools. Our findings are unique. We found that traffic distinguishes itself into external and internal by assuming different signs in GEV model. Using the Block Maxima (BM) method, the external traffic follows a GEV with positive shape parameter, which is analogous to the traffic bursts following extreme Frechet distribution. Internal traffic follows a GEV with a negative shape parameter. This also means that traffic bursts follow extreme Weibull distribution.
  • 3. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 57 The structure of this paper is as follows. First, we present EVT based model selection tools. We apply them to Belcore traffic traces. We then estimate the parameters and show our prediction results, and then we conclude. TABLE 1 : Datasets 2. MODEL SELECTION TOOLS Visualizing data is a very important step before a serious decision is to be taken about the underlying process. A great deal of work has been done to illustrate this point. Such analysis bears the name of Exploratory Data Analysis (EDA). In our work, we refer to EDA as EVT based model selection tools. Selecting the right model is important. In this section, we discuss some of EVT based model selection tools. Namely, Records, Maximum to Sum Ratio, Gumbel Plot, Mean Excess plot. Other tools such as QQ-plot and Hill plot will be discussed here. Using these tools, we analyze the Ethernet traffic traces from Belcore data and simulated ones. Our five data sets consist of three data sets from Belcore Labs [21] and the other two are simulated traces for both internal and external traffic [1, 14, 17 ]. We transformed Belcore traces into bitrate per 0.1 second using simple MATLAB routines. The simulation of the internal trace is based on Fractional Gaussian Noise (FGN) model suggested by Norros in [14] and the simulated trace of the external traffic is based on Linear Fractional Alpha Stable Noise (LFSN) model, see [20]. A summary of the data under study is given in Table1. 2.1 Records Records can be used as exploratory tool in distinguishing between i.i.d and non i.i.d data. The number of records of i.i.d data grows very slowly [5]. This fact allows us to use records in our traffic data and check it against expected records in a typically known i.i.d data. If there is a match in the number of records, then we may say that our data can be modeled as i.i.d otherwise we say that our traffic data cannot be modeled as coming from i.i.d random process. A record occurs if ),...,max( 111 −− => nnn XXMX . By definition 1X is a record. Let I be indicator function, the record counting process N is given by 2,1,1 2 1 1 ≥+== ∑= > − nINN n k MXn kk How would Records help? To answer this question, we tabulated the number of expected records in a typical i.i.d dataset [5]. For the purpose of comparison, we computed the number of records in the examined datasets. A dramatic departure from the expected values in this table suggests clearly rejecting i.i.d assumption. From Table 2, it is clear that disparity exists between the number of records in our data and those expected from a typical i.i.d dataset. For the Aug98 data, it assumes higher values than the average, while the Oct98 data assumes below average values. Nevertheless, they are still within one standard deviation from the mean. So the i.i.d. assumption is a valid one. k n 10= EN VN BC-pAug98 BC-pOct98 FGN078 LFSN Oct89Ext 1 2.9 1.2 2 3 1 4 1 Dataset Type Size Source Parameters BC-pAug98 BC-pOct98 BC-Oct89Ext Internal Traffic Internal Traffic External Traffic 31429 15795 759431 Belcore FGN078 LFSN Internal Traffic External Traffic 10000 10000 Simulated FGN Simulated LFSN H=0.78 H=0.75, Alpha=1.5
  • 4. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 58 2 5.2 1.9 7 4 2 6 2 3 7.5 2.4 10 7 7 9 6 4 9.8 2.8 12 11 7 12 12 TABLE 2: Records 2.2 Maximum to Sum Ratio Maximum to sum ratio nn SM / can be used as an explanatory tool to tell about the finiteness (existence) of the moment of given order, say p . It is a standard knowledge that the mean and variance of a given data are their first order and second order moment, respectively. Using maximum to sum ratio, we will be able to tell whether the variance, which is the second order moment, of data is finite (exists) or infinite (does not exist). The th p order partial sum and th p order maximum are given by ∑= = n i p in XpS 1 ||)( and )||,...,|max(|)( 1 pp n XXpM = , respectively. From [5], we have the following equivalence relation ∞<⇔→= p nnn XEpSpMpR ||0)(/)()( This equivalence relation means that th p order maximum to th p order partial sum ratio goes to zero as approaches infinity if and only if the th p order moment exists. A direct way to exploit the above fact is to plot the max to sum ratio for 1st or 2nd order moment, if the plot for a given moment order goes to zero, then we might say that the moment of that order is finite (exists). It is also to note that a heavy tail distribution with tail parameter 2<α has an infinite variance and if 1<α , the mean also is infinite. In the first row first column in Figure 1, we see that internal Ethernet traffic data have a finite variance because the Maximum to Sum Ratio tends to zero very quickly. Notice that we have plotted only the first 400 values, which account for only a fraction of the data. Also from the Figure1, we see that the external traffic (2nd and 3rd row), represented by Oct89Ext and LFSN, have infinite variance but a finite mean. Fig. 1. Maximum to Sum Ratio with p=2, and p=1 for bytes/100 ms datasets. Fig.2. Gumbel plots. Some plots show concave departure from linearity and some show a convex departure from linearity.
  • 5. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 59 Now that we have an idea about the different moments and structure of the traffic, it is time to look for the parent distribution that might have produced the data. 2.3 Probability Paper Plot The idea behind probability paper plot (PPP) is to graphically check whether our sample could have come from the referenced distribution or not. The plot will look linear in case the sample matches the referenced distribution. A departure from linearity is a clear indication that the sample is not well approximated by the suggested distribution. Here we explain how does it work, then we apply it to the samples of traffic data. More details about this method can be found in [5]. 2.4 Gumbel Plot Gumbel plot is probability plot where the reference distribution is the Gumbel distribution. It is one of the most classical methods in extremes. It is a plot of the empirical distribution of the observed data against the theoretical quantiles of the Gumbel distribution. If the data come from Gumbel distribution then the plot will look linear, otherwise the plot shows a convex or concave curvature depending on whether data come from a distribution with a tail heavier than the Gumbel's or lighter, respectively. Gumbel plot method is close in spirit to the QQ and Probability plots. Gumbel plot method is also known as double logarithmic plot. In Gumbel method, we plot the empirical quantiles versus the quantiles of the theoretical Gumbel distribution. The plot is given as },...,1)),ln(ln(,{ ,, nkpX nknk =−− , where nknp nk /)5.0(, +−= are plotting positions. Four Gumbel plots are shown in Figure 2. These plots are based on the block maximum series of the data. Block maximum of internal LAN (second column) shows a concave curve deviation from straight line (theoretical Gumbel quantiles). This concave deviation suggests block maxima distribution with a lighter tail than Gumbel's. Meanwhile, the two other plots in the first column have a convex shape. This convex departure from linearity suggests block maxima traffic that follows an extreme Frechet distribution. These statements are to be analyzed and confirmed by estimating model parameters. Remember that we are talking about the distribution of block maxima and not the whole traffic data. 2.5 Mean Excess Function The mean excess function of a probability distribution is given as fxuuXuXEue <≤>−= 0),|()( , where u is a given threshold and is the support or right end point of the distribution. Mean excess function is used under different names in different disciplines. In insurance, it is the expected claim size in the unlimited layer; in finance, it is the shortfall; in reliability, it is the mean residual life. Mean excess plot (meplot) is based on the mean excess function. It is a useful visualization tool. It helps in discriminating in the tail of traffic data. If traffic data comes from a distribution with a heavier tail than Gumbel's, then the plot will look linearly increasing. If data come from a distribution with a lighter tail than the Gumbel, then the mean excess plot will be linearly decreasing.
  • 6. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 60 As a graphical tool we use the mean excess plot, it is based on the estimate of mean excess function. Suppose that we have nXX ,...,1 , the sample mean excess is given as ∫ ∞ = u nn dyyFuFue )()(/1)( The graph )}(,{ ,, nknnk XeX is called the mean excess plot. Putting it together, we listed meplots for both internal and external simulated samples in Figure3. From the Figure, we see that the internal traffic has a decreasing meplot (upper one), while the external traffic has a linear increasing curve. These observations are in conformity with the theoretical Mean Excess function. The decreasing plot suggests a distribution with light tail. In the case of modeling with GEV, this behavior is analogous to a GEV with a negative shape parameter. The increasing plot suggests a distribution with heavier tail than Gumbel’s. This increasing meplot is typical for a GEV with a positive shape parameter 3. MODEL PARAMETERS ESTIMATION In general, to fit a model to the given data we need to estimate the parameters of that particular model, GEV model is no exception [3]. We chose the maximum likelihood (ML) method. ML method is implemented in the function gev from EVIM package [7]. We used blocks of size 100; this choice gives us maxima in blocks of 10 seconds each. We estimated model parameters using ML method. From estimation results, we realized that internal traffic traces have negative shape parameter, while the two external traces have a positive shape parameter. This is an important observation since the shape parameter determines the type of fit distribution. In the internal traffic the negative shape parameter is an indication of the Weibull model. In external traffic case, the positive shape parameter is a clear indication of the traffic block maxima following an extreme Frechet distribution. Norros arrived at a similar conclusion for the internal traffic [14]. He showed analytically that the behavior of a buffer fed with internal traffic, approximated by fractional Gaussian noise, follows Weibull distribution. We arrived at a similar result but using extreme value theory perspective, which is much appropriate to the bursts prediction problem Fig. 4. Return level plot for LFSN.Fig. 3. Mean excess plots for FGN078(upper one) and LFSN
  • 7. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 61 4. PREDICTION Apart from the model and its estimated parameters, we may use the powerful return level plot for the purpose of prediction. Return level plot is a plot of return period versus return level. We have produced a return level plot for the LFSN dataset with maxima taken in block of 100 observations. See Figure 4. 5. RESULTS AND ANALYSIS Using EVT based model selection tools and parameters’ estimates, we classify traffic bursts (block maxima) as coming either from a GEV distribution with positive shape parameter or GEV with a negative shape parameter, depending on the type of traffic. From the EVT-based exploratory tools we notice that records of both kinds of traffic are within the confidence bounds of records from i.i.d data. This allows us to model the data not differently from i.i.d case. The max/sum ration also is a good tool for discriminating the two types of traffic (internal/external). For the external traffic, since the plot for the second moment does not approach zero quickly we say the variance is infinite. From the right column plot, we say that the mean is finite. This is also an obvious observation since the plot of the ratio for the first moment tends to zero quickly. Same thing can be said on the mean of the internal traffic. However, unlike the external traffic, the internal traffic has finite variance. The internal traffic, which is approximated by fractional Gaussian model, has traffic maxima that follows GEV model with negative shape parameter. Put differently, those block maxima fall in the Weibull domain of attraction. The external traffic is best approximated by linear fractional stable noise (LFSN). This LFSN traffic model has block maxima that follow a GEV with a positive shape parameter. The same way as for internal traffic, this GEV with positive shape parameter means that bursts or traffic maxima fall in the domain of attraction of Extreme Frechet model. Using extreme value distribution we are able to model the bursts or traffic maxima. This modeling of bursts is a valuable one. It permits us, in the context of Quality of Service, to define new extreme traffic metrics to be included in the Service Level Agreement (SLA) contracts. This will help service providers to deliver a better service to their clients, and the latter to ask for the service they actually need. 6. CONCLUSION We used extreme value theory to the bursts prediction problem. In particular, we used Generalized Extreme Value model. Using EVT based model selection tools, traffic is seen to be naturally classified into external and internal traffic, following GEV model with positive and negative shape parameter, respectively. Model selection tools have further improved our confidence in the model. EVT as a framework is better suited than any other analytical model to predict bursts and serious deterioration in network traffic. It is computationally less expensive and requires minimum disk space. In future, we should consider assessing the fit of the model with some quality of fit measures. 7. REFERENCES
  • 8. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 62 1. Patrice Abry and Fabrice Sellan. “The wavelet-based synthesis for fractional Brownian motion proposed by F. Sellan and Y. Meyer: Remarks and fast implementation”. Applied and Computational Harmonic Analysis, 3(4):377-383, 1996. 2. N. Balakrishnan, Jose Maria Sarabia, Enrique Castillo and Ali S. Hadi. “Extreme Value and Related Models with Applications in Engineering and Science”. Wiley, Wiley, 2004. 3. George Casella and Roger L. Berger. “Statistical Inference”. Duxbury, Pacific Grove, CA 93950, USA, 2001. 4. Abdelmahmoud Youssouf Dahab, Halabi Hasbullah and Abas Md Said, "Predicting Traffic Bursts Using Extreme Value Theory". In International Conference on Signal Acquisition and Processing, pp.229-233, 2009. 5. Paul Embrechts, Thomas Mikosch and Claudia Kl¨uppelberg. “Modelling Extremal Events: for Insurance and Finance”. Springer-Verlag, London, UK,1997. 6. Hifang Feng and Yantai Shu. “Study on network traffic prediction techniques”. International Conference on Wireless Communications, Networking and Mobile Computing, 2005, 2(23-26 Sept. 2005):1041–1044, September 2005. 7. Ramazan Genay, Faruk Seluk, and Abdurrahman Uluglyagci. “Evim: A software package for extreme value analysis in matlab”. Studies in Nonlinear Dynamics & Econometrics, 5(3):1080– 1080, 2001. 8. A. Karasaridis and D. Hatzinakos. “Network heavy traffic modeling using alpha-stable self- similar processes”. Communications, IEEE Transactions on, 49(7):1203–1214, Jul 2001. 9. M. R. Leadbetter and Holger Rootzen. “Extremal Theory for Stochastic Processes”. The Annals of Probability. 16(2):431-478, 1988. 10. M. Li and S. C. Lim. “Modeling network traffic using generalized Cauchy process”. Physica A, 387(11): 2584-2594,15 April 2008. 11. M. Li and W. Zhao. “Representation of a Stochastic Traffic Bound, accepted for publication”. IEEE Transactions on Parallel and Distributed Systems, 2009. 12. Chunfeng Liu, Yantai Shu, and Jiakun Liu. “Application of Extreme Value Theory to the Analysis of Wireless Network Traffic”. In Proceedings of IEEE International Conference on Communications, ICC 2007, Glasgow, Scotland, 24-28 June 2007. 13. Abuagla Babiker Mohd and Sulaiman bin Mohd Nor. “Towards a Flow-based Internet Traffic Classification for Bandwidth Optimization”. International Journal of Computer Science and Security (IJCSS), 3:2, 146-153. 2009. 14. Ilkka Norros. “On the use of fractional brownian motion in the theory of connectionless networks”. IEEE Journal of Selected Areas in Communications, 13(6):953–962, 1995. 15. Kihong Park and Walter Willinger. “Self-Similar Network Traffic and Performance Evaluation”. John Wiley & Sons, New Yotk, USA, 2000. 16. Gennady Samorodnitsky and Murad S. Taqqu. “Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance”. Chapman & Hall, London SE1 8HN, 1994. 17. Stilian Stoev and Murad S. Taqqu. “Simulation methods for linear fractional stable motion and farima using the fast fourier transform”. Fractals, 12:2004, 2004.
  • 9. Abdelmahamoud Youssouf Dahab, Abas bin Md Said & Halabi bin Hasbullah Signal Processing: An International Journal, (SPIJ) Volume(3), Issue(4) 63 18. Zoi Tsourti and John Panaretos. “Extreme-value analysis of teletraffic data”. Computational Statistics Data Analysis, 45(1):85 – 103, 2004. Computer Security and Statistics. 19. M. Uchida. “Traffic data analysis based on extreme value theory and its applications”. Global Telecommunications Conference, 2004. GLOBECOM’04. IEEE, 3:1418–1424 Vol.3, Nov.-3 Dec. 2004. 20. Wei Biao Wu, George Michailidis, and Danlu Zhang. “Simulating sample paths of linear fractional stable motion”. IEEE Transactions on Information Theory, 50:1086–1096, 2004. 21. Internet traffic Archive, “http://ita.ee.lbl.gov”. BIOGRAPHY 1. Abdelmahamoud Youssouf Dahab is a PhD candidate in Computer & Information Sciences Department, Universiti Teknologi PETRONAS. 2. AP Dr Abas Md Said is a lecturer in the Computer and Information Sciences Department at Universiti Teknologi PETRONAS. He obtained his Bachelor and Masters of Science from Western Michigan University, USA. Doctor of Philosophy from Loughborough University, UK. He specializes in Computer Graphics, Visualization, and Networks. His research interests are in Optimized Stereoscopic Rendering for Visualization. He has a number of publications. 3. Dr Halabi Hasbullah is a lecturer in the Computer and Information Sciences Department at Universiti Teknologi PETRONAS. He obtained his Bachelor of Science (Mathematics) from Universiti Malaya, Malaysia. His Masters of Science (Information Technology) from De Montfort University, UK. And PhD (Communications System), National University of Malaysia (UKM), Malaysia. He specializes in Network Communications. His research interests are Bluetooth network, Traffic engineering, Wireless sensor network (WSN), Vehicular ad hoc network (VANET), and IPv6. He has a number of publications.