Intrinsic Temporal Locality Properties of Web Request Streams


        Rodrigo Fonseca (UC Berkeley)
           Virgílio Almeida (UFMG)
       Mark Crovella (Boston University)
            Bruno Abrahao (UFMG)

             IEEE INFOCOM 2003
Research Questions

• Network measurements in highly dynamic environments
  – Understand the behavior and find intrinsic properties
  – Online algorithms for driving the self-organization of system agents
  – Algorithms for adapting dynamically to environments with a high degree of uncertainty

Question

• Study temporal locality at different points of the Web

  – Results: a framework and metrics that provide
    • New insights into the causes of temporal locality at different points of the Web topology



Web Measurements

• Why?
  – Understand and improve current systems
  – Design new systems
• Challenges
  – Complexity
  – Rate of change
• Approach
  – Look for regularities, underlying principles
  – Abstractions


Previous Studies

• Many advancements
  – Caching hierarchies
  – Cache replacement policies
  – Load distribution and balancing
• However...
  – Focus on individual components
  – No structured view of the entire
    system

Stream-centric View of the Web
• Focus on request streams
  – Look for intrinsic properties
  – How they are altered
  – How their properties change at different points
  – How their intrinsic properties can influence the design of components and of collections of components

Transformations on Request Streams
• Three types of transformations
  – Aggregation (A)
    • Multiple sources
  – Disaggregation (D)
    • Multiple destinations
  – Filtering (F)
    • Resulting stream = subset of input stream
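The three transformations can be sketched as plain operations on request streams. This is a minimal sketch, assuming each request is a hypothetical `Request(timestamp, source, destination, obj)` record and that every input stream is already sorted by timestamp; `aggregate`, `disaggregate`, and `filter_stream` are illustrative names, not functions from the talk.

```python
import heapq
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple


class Request(NamedTuple):
    """Hypothetical request record; the field names are illustrative."""
    timestamp: float
    source: str
    destination: str
    obj: str


def aggregate(*streams: Iterable[Request]) -> List[Request]:
    """Aggregation: merge streams from multiple sources into one, ordered by time.

    Assumes every input stream is already sorted by timestamp.
    """
    return list(heapq.merge(*streams, key=lambda r: r.timestamp))


def disaggregate(stream: Iterable[Request],
                 route: Callable[[Request], str]) -> Dict[str, List[Request]]:
    """Disaggregation: split one stream into one sub-stream per destination."""
    out: Dict[str, List[Request]] = defaultdict(list)
    for r in stream:
        out[route(r)].append(r)
    return dict(out)


def filter_stream(stream: Iterable[Request],
                  keep: Callable[[Request], bool]) -> List[Request]:
    """Filtering: the output is a subset of the input, in the original order."""
    return [r for r in stream if keep(r)]
```

For example, aggregating two clients' streams and then disaggregating with `route=lambda r: r.destination` yields one time-ordered sub-stream per server; filtering never reorders requests, it only drops some of them.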


Transformations Abstraction

• Components may be abstracted by
  combinations of transformations
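  [Figure: clients, a proxy cache, and a server abstracted as combinations of disaggregation (D), filtering (F), and aggregation (A) transformations]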
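As one illustration of this abstraction, a proxy cache can be viewed as aggregating its clients' streams, filtering out the requests it serves as hits, and disaggregating the remaining misses toward origin servers. The sketch assumes an LRU cache that holds a fixed number of objects and a hypothetical `(timestamp, server, obj)` request tuple; this composition is one plausible reading of the figure, not necessarily the authors' exact model.

```python
import heapq
from collections import OrderedDict, defaultdict
from typing import Dict, List, Tuple

# Hypothetical request format: (timestamp, server, obj).
Req = Tuple[float, str, str]


def lru_miss_filter(stream: List[Req], capacity: int) -> List[Req]:
    """Filtering: keep only requests that miss in an LRU cache of `capacity` objects."""
    cache: OrderedDict = OrderedDict()
    misses: List[Req] = []
    for req in stream:
        obj = req[2]
        if obj in cache:
            cache.move_to_end(obj)          # hit: refresh recency; request is absorbed
        else:
            misses.append(req)              # miss: request is forwarded upstream
            cache[obj] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used object
    return misses


def proxy(client_streams: List[List[Req]], capacity: int) -> Dict[str, List[Req]]:
    """Proxy cache modeled as Aggregation -> Filtering -> Disaggregation."""
    merged = list(heapq.merge(*client_streams, key=lambda r: r[0]))  # A
    misses = lru_miss_filter(merged, capacity)                       # F
    per_server: Dict[str, List[Req]] = defaultdict(list)             # D
    for req in misses:
        per_server[req[1]].append(req)
    return dict(per_server)
```

Each server then sees only the miss stream addressed to it, which is why the properties of server-side streams can differ from those observed at the clients.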

Approach

• For a given intrinsic property of
  streams
  – Study the effects of the
    transformations on the property
  – Combine the effects to understand
    • Effects of components
    • Effects of collections of components
    • Properties at different points of the
      topology


Temporal Locality

• Intrinsic metrics are needed for the streams
  – We use virtual time (time measured in number of requests)
• Two sources:
  – Popularity
     ...XAXBXCXDXE...
     • Preserved under reordering
  – Correlation
     ...AABBCCDDEE...
     • Lost under reordering
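The distinction between the two sources can be checked by scrambling a stream, i.e., applying a random permutation to it: per-object reference counts (popularity) are unchanged, while runs of back-to-back references (correlation) are generally broken up. A minimal sketch, assuming a stream is a list of object identifiers indexed by virtual time; `scramble` and `immediate_rereferences` are hypothetical helpers.

```python
import random
from collections import Counter
from typing import List, Optional


def scramble(stream: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a random permutation of the stream (popularity kept, ordering lost)."""
    shuffled = list(stream)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def immediate_rereferences(stream: List[str]) -> int:
    """Count positions where an object is referenced again immediately."""
    return sum(1 for a, b in zip(stream, stream[1:]) if a == b)


correlated = list("AABBCCDDEE")           # the correlated example from the slide
scrambled = scramble(correlated, seed=1)
print(Counter(correlated) == Counter(scrambled))  # popularity preserved: True
print(immediate_rereferences(correlated))         # 5 back-to-back pairs
print(immediate_rereferences(scrambled))          # typically fewer after shuffling
```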




Measuring Popularity: Entropy
– Traditionally modeled by Zipf's Law: $p_i \propto i^{-\alpha}$
– We measure the deviation from a uniform popularity distribution
– Entropy: $H = -\sum_{i=1}^{N} p_i \log_2 p_i$

  •   $p_i$ is determined empirically from reference frequencies
  •   $N$ is the number of distinct objects
  •   If popularity is uniform, $H = \log_2 N$ (its maximum)
  •   If only one object is accessed, $H = 0$
– This is not the entropy rate of the source
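A direct translation of the entropy definition, assuming a stream is given as a sequence of object identifiers and that $p_i$ is the fraction of references to object $i$ (`popularity_entropy` is a hypothetical name). Writing each term as $p_i \log_2(1/p_i)$ avoids a negative zero in the single-object case.

```python
import math
from collections import Counter
from typing import Iterable


def popularity_entropy(stream: Iterable[str]) -> float:
    """H = -sum_i p_i log2 p_i, with p_i estimated from reference frequencies."""
    counts = Counter(stream)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


print(popularity_entropy("ABCD"))        # uniform over N = 4 objects: 2.0 = log2(4)
print(popularity_entropy("AAAA"))        # a single object: 0.0
print(popularity_entropy("XAXBXCXDXE"))  # about 2.16, below log2(6) ~ 2.58
```

Note that this depends only on reference counts, so it is identical for a stream and any reordering of it.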
Entropy and Zipf’s Law
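To relate the two views, one can compute the entropy of an idealized Zipf-like distribution $p_i \propto i^{-\alpha}$ over $N$ objects: $\alpha = 0$ is the uniform case with $H = \log_2 N$, and larger $\alpha$ concentrates references on fewer objects, lowering $H$. A small sketch; the values of $N$ and $\alpha$ below are illustrative, not taken from the talk.

```python
import math


def zipf_entropy(n: int, alpha: float) -> float:
    """Entropy (bits) of p_i = i^-alpha / sum_j j^-alpha, for i = 1..n."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return sum((w / total) * math.log2(total / w) for w in weights)


for alpha in (0.0, 0.5, 1.0, 1.5):
    h = zipf_entropy(1000, alpha)
    print(f"alpha={alpha}: H={h:.3f} bits (log2 N = {math.log2(1000):.3f})")
```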




Entropy and Zipf's Law




Measuring Correlation: CV

• Look at the IAT (inter-arrival time) distribution per object
  – IAT: number of references between two consecutive references to the same object
• Behavior
  – Correlation → tendency toward shorter IATs
  – No correlation → IATs tend to a geometric distribution
• We measure the deviation from the geometric distribution for IATs
  – Coefficient of Variation: CV = σ/μ
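The per-object IATs and their CV can be computed in one pass over the stream. This sketch assumes a stream is a sequence of object identifiers, measures the IAT as the difference in virtual time (request position) between consecutive references to the same object, and uses the population standard deviation; how the per-object CVs are combined into a single figure is left open, since the slide does not specify it. Function names are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def interarrival_times(stream: Sequence[str]) -> Dict[str, List[int]]:
    """Per-object IATs in virtual time: position gap between consecutive references."""
    last_seen: Dict[str, int] = {}
    iats: Dict[str, List[int]] = defaultdict(list)
    for t, obj in enumerate(stream):
        if obj in last_seen:
            iats[obj].append(t - last_seen[obj])
        last_seen[obj] = t
    return dict(iats)


def iat_cv(stream: Sequence[str], min_samples: int = 2) -> Dict[str, float]:
    """CV = sigma / mu of each object's IATs; objects with too few IATs are skipped."""
    result: Dict[str, float] = {}
    for obj, gaps in interarrival_times(stream).items():
        if len(gaps) >= min_samples:
            result[obj] = statistics.pstdev(gaps) / statistics.mean(gaps)
    return result
```

A strictly periodic stream such as `"ABABAB"` gives CV = 0 for both objects, since every IAT equals 2.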
IAT Distribution



[Figure: IAT distributions of the original and scrambled streams]
Metrics Summary

• Entropy
  – High concentration: Low Entropy
  – Uniform distribution: High Entropy
• CV
  – High Correlation: High CV
  – No correlation: CV ~ 1



Entropy vs. Hit Ratio

• LRU cache simulation, size 5%
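The hit ratio in such an experiment can be obtained with a straightforward LRU simulation. This sketch assumes the cache holds a fixed number of objects and interprets "size 5%" as 5% of the distinct objects in the stream; the slide does not say what the percentage refers to, so that sizing rule and the helper names are assumptions.

```python
import math
from collections import OrderedDict
from typing import Sequence


def lru_hit_ratio(stream: Sequence[str], capacity: int) -> float:
    """Fraction of requests served from an LRU cache holding `capacity` objects."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for obj in stream:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)          # mark as most recently used
        else:
            cache[obj] = None
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used object
    return hits / len(stream) if stream else 0.0


def capacity_for(stream: Sequence[str], fraction: float = 0.05) -> int:
    """Assumed sizing rule: a fraction of the number of distinct objects (at least 1)."""
    return max(1, math.ceil(fraction * len(set(stream))))
```

Usage: `lru_hit_ratio(stream, capacity_for(stream))` computes the hit ratio for a cache sized at 5% of the distinct objects, under the sizing assumption above.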




CV versus Hit Ratio (HR) Difference

• HR difference between the original and scrambled streams
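This quantity can be sketched by simulating the same cache on the original stream and on a scrambled (randomly permuted) copy. Because scrambling preserves popularity but removes correlation, the difference can be read as the part of the hit ratio attributable to correlation. The sketch assumes an object-count LRU cache and a fixed seed for reproducibility; the sign convention (original minus scrambled) and the function names are assumptions.

```python
import random
from collections import OrderedDict
from typing import List


def lru_hits(stream: List[str], capacity: int) -> int:
    """Number of hits for an LRU cache holding `capacity` objects."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for obj in stream:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)
        else:
            cache[obj] = None
            if len(cache) > capacity:
                cache.popitem(last=False)
    return hits


def hr_difference(stream: List[str], capacity: int, seed: int = 0) -> float:
    """Hit ratio of the original stream minus that of a scrambled copy."""
    scrambled = list(stream)
    random.Random(seed).shuffle(scrambled)
    return (lru_hits(stream, capacity) - lru_hits(scrambled, capacity)) / len(stream)


# Strongly correlated stream: each object is requested twice in a row.
stream = [obj for i in range(50) for obj in (f"o{i}", f"o{i}")]
print(hr_difference(stream, capacity=1))  # positive here: back-to-back repeats are hits
```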




Putting it all together



• Effects of the transformations on
  the components of temporal
  locality




Effects of Filtering: Popularity




Effects of Filtering: Correlation




Effects of Aggregation and Disaggregation

[Figure: CV and entropy under aggregation and disaggregation]


Temporal Locality at Different Points of the Topology

[Figure: temporal locality at clients, proxies, and servers]



Effects of




Conclusions

• Understanding of behavior
• The transformations framework allows a structured study and understanding of Web workload characteristics




