The Tale of Heavy Tails in Computer Networking
Stenio Fernandes
CIn/UFPE, Recife, Brazil
Carleton University - ARS Lab – May 2016
Outline
 Essential Concepts and Terminology
| The heavy-tail phenomenon
| Outlier detection
| Heavy-tailed distributions and their variations (subclasses)
 Evidence of Heavy-Tailedness in Computer Networks
| Examples
Essential Concepts and Terminology
The heavy-tail phenomenon
• Heavy-tailedness in computer networking is like ninjas:
it is everywhere (Internet meme)
• Extreme observations must be handled carefully and taken
very seriously
• A dataset that exhibits very large observation values
makes descriptive and inferential statistical analysis
much more difficult
• It might not make sense to use traditional statistical
techniques and tools in these cases
• Some important initial questions:
• Are we confident enough to discard single, scattered, or
bursts of observations that present extreme values
due to uncontrolled factors?
• Do the extreme values come from valid
measurements?
Statistical black sheep
 This is what I call Statistical Black Sheep
| one that causes shame or embarrassment because of deviation from the accepted
standards of his or her group (Black Sheep definition from M-W)
 You can either keep or discard such measurement values
based on subjective analysis
| e.g., when they are out of the scope of your interest
• Ex.: mean length of cat videos on YouTube
 Take-home lesson:
| do not disgrace the black sheep without proper reasons
 The decision can also be made based on rigorous statistical
analysis
| a quantitative analysis
| Recall that an outlier might be influential in regression modeling (more on that later)
It starts with outliers
Here is an Outlier
 Here is another one!
Outliers
 An observation can be considered an outlier if it falls below or
above certain limits
| detection is only an indication that you might want to think carefully about them
 There are a number of formal tests and rules of thumb to detect
outliers in an observed variable
1. Grubbs' test
2. Tietjen-Moore test
3. Mahalanobis distance
4. Extreme Value Theory (EVT)
5. Generalized Extreme Studentized Deviate (ESD) test
 Try not to be too picky when choosing the method
| simply because outlier detection and handling is an art
| a subjective approach plays an important role in accommodating outliers in your analysis
Outliers in Regression Models
Outliers
 Kurtosis: gives a concrete idea about the expected number of outliers
| High (heavy tails, more outliers) or low (light tails, fewer outliers)
 A general approach for outlier detection
| identify values far from the central values (in terms of 𝜎)
| A common and simple approach
• define the fences as 𝜇 ± 3 × 𝜎
• 𝜇 is the sample mean
• 𝜎 is the sample standard deviation
(more conservative: use 4 𝜎)
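The 𝜇 ± 3 × 𝜎 rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `sigma_fences` and `find_outliers` and the toy sample are assumptions made for this example. Keep in mind that extreme values inflate both 𝜇 and 𝜎, so the fences widen as the outliers get larger.

```python
import statistics

def sigma_fences(values, k=3.0):
    """Return the (lower, upper) fences mu - k*sigma and mu + k*sigma."""
    mu = statistics.fmean(values)      # sample mean
    sigma = statistics.stdev(values)   # sample standard deviation (n - 1 denominator)
    return mu - k * sigma, mu + k * sigma

def find_outliers(values, k=3.0):
    """Return the observations that fall outside the k-sigma fences."""
    lower, upper = sigma_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Toy data (hypothetical): 19 ordinary measurements and one extreme value.
sample = [10] * 19 + [500]
print(sigma_fences(sample))          # default 3-sigma fences
print(find_outliers(sample))         # observations outside the 3-sigma fences
print(find_outliers(sample, k=4.0))  # the more conservative 4-sigma fences
```

Flagged values are only candidates for closer inspection; whether to keep or discard them remains the analyst's decision.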
Outliers and Heavy-Tails
 Check whether there are lots of observations outside the fences
| This might indicate that the underlying phenomenon generates heavy-tailed data
• Your black sheep metamorphoses into a black swan
 If extreme values come from distributions with heavy tails
• Weibull, Gamma, Pareto
| Such events are not so rare
| They are likely to be part of the underlying phenomenon
 If you decide to keep the outliers
| You recognize them as part of the underlying data generation process
| You need to address them properly
 Why do we need to use other statistical measures when dealing
with heavy-tailed distributions?
Moments of Heavy-Tailed Distributions
More on Heavy-Tails
 Classification
| Light or Thin tail
| Fat or Heavy tail
| Long tail
 Light-tailed (thin-tailed) distributions are commonly used as references
| Normal and Exponential distributions
| Definition: a probability distribution whose complementary CDF decays exponentially (or faster)
 Heavy-tailed distributions are the general ones
| most formal analyses of heavy-tailed distributions deal with right heavy-tailed
distributions with [0, ∞) support
| the term fat tail is not well accepted by the traditional (and more formal) communities of
statisticians and mathematicians, although it is widely used in finance
Some Formal Stuff
 Some intuition behind the concepts
| A power law is a relation between two variables of the form $p(x) \propto c\,f(x)$, where $f(x)$ takes the
general form $x^{-\alpha}$
| There are dozens of power-law distributions
• Zipf and Pareto are the best-known ones in the computer networking field
| They have interesting mathematical properties, e.g., their tails decay asymptotically according
to the power parameter $\alpha$
 The Pareto distribution became famous due to its ability to
fit and model real-world problems well
| The Pareto rule (or principle), aka the 80/20 rule, has been used to illustrate clearly that
phenomena of all sorts are far from following the Normal distribution.
• It is clear that the normal is not being Normal!
Some Formalities
 A non-negative random variable X, either continuous or discrete,
is said to follow a power-law distribution if
| $P[X \ge x] \sim c\,x^{-\alpha}$
| where c and $\alpha$ are the constant parameters that characterize the distribution.
• $\alpha$ is known as the shape parameter (tail index). Both constants are positive.
 Heavy-tailedness
| The tail of a distribution is denoted by $\bar{F}(x) = P(X > x)$, where F is the distribution function
of a random variable X.
• F is (right) heavy-tailed if $E[e^{\lambda X}] = \infty$ for all $\lambda > 0$.
• The distribution is light-tailed when $E[e^{\lambda X}] < \infty$ for some $\lambda > 0$.
Some Formalities
 Long-tailedness
 $\bar{F}$ (the survival function) is long-tailed when
| $\lim_{x \to \infty} \frac{\bar{F}(x+\lambda)}{\bar{F}(x)} = 1$ for all $\lambda > 0$
• $\bar{F}$ is a non-increasing function, so the ratio never exceeds 1; long-tailedness means it converges to 1.
| If the tail $\bar{F}$ decays polynomially with exponent $-\alpha$ (where $\alpha$ is the tail index), its $k$-th
moments are infinite for all $k > \alpha$.
 The Pareto case
| One interesting property of a power-law distribution is that if you plot its CCDF on
logarithmic scales (i.e., log-log), as in a rank plot, it should appear as a straight line
| Its density function is given by:
• $p(x) = \alpha k^{\alpha} x^{-\alpha-1}$, for $x \ge k$, where $k > 0$ is the minimum value of $X$
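The claim that higher moments are infinite can be checked directly from the Pareto density. The derivation below is a sketch that assumes the usual Pareto Type I convention, i.e., support $x \ge k$ with $k > 0$, and writes $m$ for the moment order to avoid a clash with the minimum value $k$.

```latex
\begin{aligned}
E[X^m] &= \int_k^\infty x^m \,\alpha k^\alpha x^{-\alpha-1}\,dx
        = \alpha k^\alpha \int_k^\infty x^{m-\alpha-1}\,dx \\
       &= \begin{cases}
            \dfrac{\alpha k^m}{\alpha - m}, & m < \alpha \\[1ex]
            \infty, & m \ge \alpha
          \end{cases}
\end{aligned}
```

With $m = 1$ the mean is finite, and equal to $\alpha k / (\alpha - 1)$, only when $\alpha > 1$. With $m = 2$ the second moment, and hence the variance, is finite only when $\alpha > 2$.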
Some Formalities
 Pareto shows interesting features
| If $\alpha \le 1$, there is no first moment, i.e., its mean is infinite.
| If $0 < \alpha \le 2$, its variance is infinite (heavy tail).
| A Pareto PDF is scale free
• In computer networking problems, it can capture self-similar (aka fractal) behavior in several layers
of the protocol stack
 A log-log view of the Pareto PDF reveals, as expected, a straight
line, as follows:
| $\ln p(x) = -(\alpha + 1) \ln x + \alpha \ln k + \ln \alpha$
 The second and third terms of the equation are constants.
| The relation between $\ln p(x)$ and $\ln x$ is linear, where $-(\alpha + 1)$ is its slope.
• A simple approach for estimating the shape parameter $\alpha$ is by means of linear regression.
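The regression idea can be tried on synthetic data. The sketch below is an illustration under stated assumptions, not the author's procedure: it fits the empirical CCDF (the rank plot) rather than the PDF, so the fitted slope is −𝛼 instead of −(𝛼 + 1), which avoids having to bin the data into a histogram. The helper names `sample_pareto` and `loglog_ccdf_slope`, the seed, and the parameter values are all hypothetical choices.

```python
import math
import random

def sample_pareto(n, alpha, k=1.0, rng=random):
    """Draw n Pareto(alpha, k) values by inverse-transform sampling."""
    # If U is uniform on (0, 1], then k * U**(-1/alpha) has CCDF (k/x)**alpha for x >= k.
    return [k * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def loglog_ccdf_slope(values):
    """Least-squares slope of ln(empirical CCDF) against ln(x)."""
    xs = sorted(values)
    n = len(xs)
    log_x = [math.log(x) for x in xs]
    # The fraction of observations >= xs[i] is (n - i) / n, which is never zero.
    log_ccdf = [math.log((n - i) / n) for i in range(n)]
    mean_x = sum(log_x) / n
    mean_y = sum(log_ccdf) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_ccdf))
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    return sxy / sxx

rng = random.Random(42)                      # fixed seed so the run is repeatable
data = sample_pareto(20000, alpha=1.5, k=1.0, rng=rng)
alpha_hat = -loglog_ccdf_slope(data)         # the CCDF slope is -alpha
print(f"estimated alpha: {alpha_hat:.2f}")
```

Regression on a log-log plot is simple but can be sensitive to the noisy far end of the tail, so the estimate is best treated as a first approximation.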
Take-home lesson
 The fact is that some universal statistical practices and theories
may not hold if the data follow a heavy-tailed distribution
| The Law of Large Numbers (LLN) and the Central Limit Theorem (CLT) may not hold when
dealing with heavy-tailed distributions.
• This is because their first or second moments may not be finite: a finite mean is the fundamental
assumption behind the LLN, and a finite variance is the one behind the classical CLT.
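A small simulation makes the failure of the LLN visible. The sketch below compares the running sample mean of a light-tailed Exponential sample with that of a Pareto sample whose shape parameter is below 1, so its mean is infinite. The helper `running_means`, the seed, the sample size, and the parameter values are assumptions chosen for illustration.

```python
import random

def running_means(values):
    """Return the sample mean after each new observation."""
    means, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means

rng = random.Random(7)   # fixed seed so the run is repeatable
n = 100_000

# Light-tailed reference: Exponential with rate 1 (true mean = 1).
exp_means = running_means(rng.expovariate(1.0) for _ in range(n))

# Heavy-tailed case: Pareto with shape alpha = 0.8 and minimum value 1 (infinite mean).
par_means = running_means(rng.paretovariate(0.8) for _ in range(n))

for checkpoint in (100, 1_000, 10_000, 100_000):
    print(checkpoint,
          round(exp_means[checkpoint - 1], 3),
          round(par_means[checkpoint - 1], 3))
```

The Exponential column settles near 1 as the sample grows, while the Pareto column keeps jumping whenever a new extreme value arrives and never stabilizes around a fixed number.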
Evidence of Heavy-Tailedness in Computer Networks
Evidence of Heavy-Tailedness
 Extreme events in nature occur at both micro and macro scales
 There are a number of case studies and much evidence of the occurrence of
extreme events
| nature (e.g., earthquakes, landslides, floods, droughts, storms)
| human-induced catastrophes (e.g., spills, nuclear accidents, dam ruptures, power outages)
| finance (e.g., wealth distribution, when the richest 0.1% hold 50% of the world’s wealth)
| geo- and socio-political area (e.g., human fatalities in wars)
| online social network phenomena (e.g., tweets like “the naked celebrity pics leak cracks
down the Internet”), which are known to cause spikes in traffic from time to time
 Extreme events in computer networking have been studied (by
measurements, modelling, and analyses) for decades
| Unfortunately, a number of network engineers and researchers still do not take such
phenomena seriously
Some Examples
 Power-law distributions in Internet measurements
| web objects are closely related to long tails
• Images, Texts, Video, Embedded code
| modelling issues and implications for network planning and design (e.g., web caching architectures)
• A question like “What is the average size of web objects in the Internet?” should not be answered by calculating
the mean value!
 Recent studies in mobile environments
| typical performance metrics follow heavy-tailed distributions
• main object size
• embedded object size
• number of embedded objects in one request
• embedded object inter-arrival time
• session duration
• interval between two consecutive requests (aka the reading time)
Some Examples
 Video Systems
| YouTube: the number of views can be modelled well by Zipf, Weibull, or Gamma distributions
• Zipf-like distributions fit this popularity metric well in mobile environments
 Intriguing cases of heavy-tailedness in the Internet are found at the network layer
| strong evidence of heavy-tailedness for sampled IP addresses
| the distributions of IP packets per aggregation level all follow a power-law distribution
• the number of packets per flow, unique address, or IP prefix
 Internet connectivity at several levels of aggregation can be modeled with
heavy-tailed distributions
 P2P Systems
| video popularity
| session duration
| churn of peers
• user arrival at and departure from the overlay network
| Different studies have reported different distributions (just be careful with the choice)