Information Networks and their Dynamics
Srinath Srinivasa
IIIT Bangalore and Oktave Research Foundation
Partially based on the book (Sage Publishers, New Delhi, London, Thousand Oaks, 2006, ISBN 0761935126)
Recent new additions to our vocabulary
  - Telemedicine, SMS/MMS, e-learning, net banking, e-ticketing, open source, privacy policy, EULA, …
  - Phishing, hacking, cyber crimes, virus / spyware / adware / malware, cyber squatting, identity theft, piracy, …
The “information age”
  - Comprehensive change brought by information and communication technologies (ICT)
  - Qualitative changes affecting the underlying mental model or the “paradigm”
  - Changes affecting the way we live (not just businesses)
  - Separation of information transactions from material transactions
The information age
  - Figure with “Then” and “Now” panels: material exchange network; information exchange network (Internet, mobile, databases, etc.)
Material exchange
  - Constrained by the laws of physics
  - Conserved transactions
  - High cost of replication
  - High cost of transportation
Information exchange with today’s ICTs
  - Intangible (little or no physical constraints)
  - Non-conserved transactions
  - Extremely low replication costs
  - Extremely low transportation costs
  - Hard to “snatch away” internalized information
Information Networks
  - Historically, information was “piggybacked” on a material carrier, giving information networks the same characteristics as material networks
  - With today’s technologies, communication and coordination are separated from transport and logistics
  - Several kinds of transactions are pure information transactions with no material component, e.g. software, data, news, knowledge
  - How are such information networks different from material exchange networks?
Outline
  - Part I: Information networks and the Power Law distribution
  - Part II: Underlying dynamics
  - Part III: Social information networks
Part I: Information Networks and the Power Law Distribution
Distribution of marks in an exam
  - Outcome of i.i.d. (independent and identically distributed) processes
  - Approximates a Gaussian or “Normal” distribution (binomial in the discrete case)
  - Mode near the mean
  - Ubiquitous
  - Finite variance and the central limit theorem
Distribution of email recipients
  - Most recipients have received a very small number of emails
  - However, a small number of recipients have received a very large number of emails
  - Approximates the “Power Law” distribution
  - Infinite variance or scale-free system
The Power Law distribution
  - $\Pr[X = x] \propto x^{-\alpha}$ for a given exponent $\alpha$
  - Straight line on a log-log scale
  - Infinite variance
  - Scale-free (self-similar)
Underlying random processes
  - Exam system: a set of n independent random processes
  - Email system: a set of n interdependent random processes
  - Emails are part of conversations
Power Laws in nature
  - Population distribution across human settlements
  - Global airline networks
  - WWW in-degree and out-degree
  - Sizes of blood vessels in the human body
  - Wealth distribution
  - Frequency of word occurrence in documents
  - Frequency of keyword searches on the web
  - Distribution of earthquake sizes against their frequency
Characteristics of the Power Law: intuitive
  - A very small number of very large entities and a very large number of very small entities
  - Infinite variance or “long-tailed” distribution (for certain value ranges of the exponent $\alpha$)
Characteristics of the Power Law: mathematical
  - Distribution function
  - Scale-invariance property
  - Log-linear relationship with the exponent
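The three mathematical properties named above can be written out as follows. This is a sketch in standard notation rather than a reproduction of the slide's own formulas: the density $p(x)$, the normalising constant $C$, the scale factor $c$, the lower cut-off $x_{\min}$ and the exponent symbol $\alpha$ are assumed names.

```latex
\begin{align*}
p(x) &= C\,x^{-\alpha}, \qquad x \ge x_{\min} > 0 \\
p(c\,x) &= C\,(c\,x)^{-\alpha} = c^{-\alpha}\,p(x) \\
\log p(x) &= \log C - \alpha \log x
\end{align*}
```

Rescaling x by a factor c only multiplies the density by the constant $c^{-\alpha}$, which is what scale invariance means here. The last line is the straight line of slope $-\alpha$ seen on a log-log plot.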
Other pertinent distributions: Zipf distribution
  - Empirical result for word frequencies in document corpora
  - f(x): frequency of word x
  - r(x): rank of word x (the r-th most frequent word)
  - Shown to be equivalent to the power-law distribution
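In its classic form, Zipf's law says that the frequency of a word is roughly inversely proportional to its rank, f(x) ∝ 1 / r(x). A minimal sketch of how a rank-frequency table might be built is shown below; the function name `rank_frequency` and the toy corpus are hypothetical, and the corpus is constructed by hand so that the relation holds exactly.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(words)
    # Sort by descending count; break ties alphabetically for determinism.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]

# Hypothetical toy corpus built so that count is proportional to 1 / rank.
corpus = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3

for rank, word, count in rank_frequency(corpus):
    # Under Zipf's law, rank * count stays roughly constant.
    print(rank, word, count, rank * count)
```

On real text the product of rank and count is only approximately constant, and the fit is usually judged on a log-log plot.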
Other pertinent distributions: Pareto’s law
  - $\Pr[X \ge x] = (x / x_{\min})^{-k}$, where $x_{\min}$ is the minimum value taken by x and $k > 0$
  - When $0 < k \le 1$ the mean is infinite, and when $1 < k \le 2$ the variance is infinite
  - Informally called the 80-20 principle
  - Shown to be equivalent to the power-law distribution
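The 80-20 principle can be tied to one particular exponent. Assuming Pareto's law in the form Pr[X ≥ x] = (x / x_min)^(-k) with k > 1 (the symbol k is an assumption here), the share of the total held by the top fraction p works out to p^(1 - 1/k). The sketch below, with the hypothetical helper `top_share`, finds the k for which the top 20% hold 80%.

```python
import math

def top_share(p, k):
    """Share of the total held by the top fraction p under a Pareto law with shape k > 1."""
    if k <= 1:
        raise ValueError("the mean is infinite for k <= 1, so shares are undefined")
    return p ** (1.0 - 1.0 / k)

# Shape value at which the top 20% hold 80% of the total: k = log 5 / log 4.
k_80_20 = math.log(5) / math.log(4)

print(round(k_80_20, 3))
print(round(top_share(0.2, k_80_20), 3))
```

The resulting k lies between 1 and 2, which is the range where the mean is finite but the variance is infinite.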
Other pertinent distributions: log-normal distribution
  - y = f(x) is log-normally distributed if ln y is normally distributed
  - Approximates a power law if the variance of ln y is very large
  - An alternative (sometimes better) characterization of interdependent random processes
  - Generated by the product of i.i.d. random processes
Part II: Underlying Dynamics
Non-linearity
  - Interdependent systems with circular causalities
  - Also called “complex systems”
  - Feedback: a central characteristic
  - Positive feedback (reinforcing loops) and negative feedback (balancing loops)
Non-linearity: growth
  - Feedback makes the present state of the system a function of its previous states: $x_{t+1} = r\,x_t$
  - When $x_0 > 0$ and $r > 1$, we have positive feedback and x grows over time
Non-linearity: saturation
  - However, a system usually also has a “saturation” point beyond which it cannot grow
  - The system reaches the saturation point asymptotically
  - If, w.l.o.g., the saturation point is 1, the dynamical equation becomes $x_{t+1} = r\,x_t\,(1 - x_t)$
  - This is called the “logistic” equation (population equation) and is representative of a large class of real-world systems
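Growth followed by saturation can be seen by iterating the logistic equation in its usual discrete form, x_{t+1} = r x_t (1 - x_t). The sketch below is illustrative only: the starting value 0.01 and the growth rate 2.5 are assumed, and the function names are hypothetical. For 1 < r < 3 the trajectory in this normalised form settles at the level 1 - 1/r.

```python
def logistic_step(x, r):
    """One step of the logistic equation x_{t+1} = r * x_t * (1 - x_t)."""
    return r * x * (1.0 - x)

def iterate(x0, r, steps):
    """Return the trajectory [x_0, x_1, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_step(xs[-1], r))
    return xs

trajectory = iterate(x0=0.01, r=2.5, steps=200)
print(trajectory[:5])    # early phase: near-exponential growth
print(trajectory[-1])    # late phase: settles at a constant level
```

While x is small the factor (1 - x) is close to 1, so the early steps look like plain multiplicative growth; the same factor is what later stops the growth.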
Logistic equation in everyday terms
  - The rich get richer, up to a certain point
  - Large cities attract more migrants, until their infrastructure saturates
  - Celebrities (people who have media attention) get more media attention, until people get bored of them
  - Pages with high PageRank get higher PageRank, until either user attention or search engine popularity saturates
  - A large population leads to a larger population, until resources saturate
Sensitivity to initial conditions
  - Case: what happens when two or more non-linear processes share resources among themselves?
  - The growth ‘r’ of both A and B feeds on the same population base
  - The growth of A is at the cost of B and vice versa
  - The growth of either A or B depends on its present population
  - Small differences in initial populations can tilt the balance irrevocably
Preferential attachment
  - The population distribution among the cells follows a power law
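One way to see how preferential attachment produces a heavy-tailed population distribution is a simple arrival process: each newcomer either founds a new cell or joins an existing one with probability proportional to its current population. This is a sketch under assumed parameters (20000 arrivals, a 10% chance of founding a new cell), not the model behind the slide's figure; `simulate_cells` is a hypothetical name.

```python
import random
from collections import Counter

def simulate_cells(arrivals, new_cell_prob, seed=0):
    """Return cell populations (largest first) after a rich-get-richer arrival process."""
    rng = random.Random(seed)
    members = [0]          # members[i] is the cell that individual i lives in
    cells = 1
    for _ in range(arrivals):
        if rng.random() < new_cell_prob:
            members.append(cells)      # found a new cell
            cells += 1
        else:
            # Copying a uniformly chosen individual picks a cell with
            # probability proportional to its current population.
            members.append(rng.choice(members))
    return sorted(Counter(members).values(), reverse=True)

sizes = simulate_cells(arrivals=20000, new_cell_prob=0.1)
print(sizes[:5], sizes[len(sizes) // 2])   # largest cells vs. the median cell
```

The few largest cells end up holding most of the population while the typical cell stays tiny, which is the signature described in Part I.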
Impact of growth rate on dynamics
  - Figure panels: r = 3.0, r = 3.1, r = 3.2, r = 3.5, r = 3.7, r = 3.9
Period doubling and chaos
  - Increasing the growth rate in a saturating system leads to oscillations of increasing frequency
  - For growth rates $r \in [3, 4)$, a phenomenon called “period doubling” or “bifurcation” is witnessed, with oscillations developing sub-oscillations
  - The rate at which sub-oscillations develop in the logistic equation is known to be a constant (~4.66920) called Feigenbaum’s constant
  - When $r > 4$, the state can leave the interval [0, 1] and the system breaks down
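Period doubling can be observed numerically by letting the logistic equation x_{t+1} = r x_t (1 - x_t) run past its transient and then checking after how many steps the state repeats. The sketch below is one simple way to do this; the function name, the burn-in length, the tolerance and the starting value 0.5 are all assumptions chosen for illustration.

```python
def attractor_period(r, burn_in=10000, max_period=64, tol=1e-9):
    """Return the period of the cycle the logistic equation settles into, or None."""
    x = 0.5
    for _ in range(burn_in):           # discard the transient
        x = r * x * (1.0 - x)
    orbit = [x]
    for _ in range(max_period):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    for period in range(1, max_period + 1):
        if abs(orbit[period] - orbit[0]) < tol:
            return period              # smallest number of steps after which the state repeats
    return None                        # no short cycle found (for example in the chaotic regime)

for r in (2.8, 3.2, 3.5, 3.9):
    print(r, attractor_period(r))
```

Growth rates just above 3 give a cycle of length 2, a somewhat higher rate gives length 4, and so on; close to a bifurcation point convergence is slow, so a longer burn-in would be needed there.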
Period doubling in the logistic equation
Attractors
  - A stable non-linear system eventually displays an “attractor” pattern
  - Attractor patterns can be “emergent” or “scale invariant”
  - Emergence: an aggregate property that cannot be seen in the individual parts
  - Scale invariance: sub-systems displaying the same properties as the aggregate
Emergent Attractors
Scale-invariant attractors
Part III: Social information networks
Outline for Part III
  - Random graphs
  - Largest connected component
  - Small-world networks
  - Information cascades
  - Emergence of network topology
Machines and societies
  - Machines
      - Designed for a specific purpose
      - Structure is a result of design
      - Complementary components
      - Component dynamics need coordination
  - Societies
      - Made up of autonomous actors pursuing self-interest
      - Structure is an emergent property, a result of evolution
      - Actor dynamics need management
  - Machines of nature (living beings) are more like societies than machines
Social information networks
  - Information networks formed in a society of autonomous actors
  - Network connections are typically a function of self-interest dynamics
  - The resulting network structure is interesting for its attractor properties
Random graphs
  - Simplest form of social network models
  - Given a population of nodes, edges are added at random
  - Properties to observe:
      - Size of the largest connected component (system connectivity)
      - Diameter of the graph (maximum degree of separation)
Random graphs
  - Largest connected component
      - Measures system connectivity
      - Calibrates the spread of ideas and influence
  - Diameter of the graph
      - Measures the degree of separation
      - Calibrates distortion (or lack of it) in the spread of ideas and influence
  - Large connected component: useful for disseminating information
  - Small degree of separation: useful for business connections to develop
Largest connected component
  - Connectivity in a system with n nodes shows an inflection roughly when n/2 random edges have been added
  - With n random edges, roughly 80% of the system is connected
  - Connectivity starts saturating around 4n random edges
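The figures quoted above can be checked with a small simulation. The sketch below adds random edges to n isolated nodes and tracks components with a union-find structure; it assumes edges are drawn uniformly with replacement (so occasional self-loops and repeated edges are allowed), and the function name and the choice n = 5000 are hypothetical.

```python
import random

def largest_component_fraction(n, m, seed=0):
    """Fraction of n nodes in the largest component after adding m random edges."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path halving
            u = parent[u]
        return u

    for _ in range(m):
        a, b = find(rng.randrange(n)), find(rng.randrange(n))
        if a != b:
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a                   # union by size
            size[a] += size[b]
    return max(size[find(u)] for u in range(n)) / n

n = 5000
for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(ratio, round(largest_component_fraction(n, int(ratio * n)), 3))
```

Below n/2 edges the largest component stays a tiny fraction of the system; with n edges it covers most of the nodes, and by 4n edges almost all of them.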
Random graph diameter
  - Adding random edges increases connectivity, but initially also increases the overall degree of separation
  - The degree of separation starts reducing after reaching a peak value (more communication links make the world bigger before it becomes smaller)
  - Small-world networks: networks having a diameter much smaller than the number of nodes
Clustered graphs
  - Social networks are better modeled as clustered graphs than as pure random graphs
  - Clustered graph property: if A knows B and C, then with very high probability B and C know each other
  - Random or “long distance” edges link disparate clusters or communities
Clustered graphs in metric spaces
  - Nodes are arranged in a metric space (with a distance function between node pairs)
  - Connection probability falls with distance, so nearby nodes cluster
  - Random connections become rarer as distance increases
Clustered graphs in metric spaces
  - Node u connects to node v with a probability proportional to $\delta(u,v)^{-\alpha}$, where $\delta(u,v)$ is the distance between u and v and $\alpha$ is the “clustering coefficient”
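The connection rule above can be turned into actual probabilities by normalising the inverse-power weights over all candidate nodes. The sketch below does this on a ring of n nodes, which is an assumed one-dimensional metric space chosen only to keep the example short; the function name and the parameter name `alpha` are hypothetical.

```python
def long_range_probabilities(u, n, alpha):
    """Probability that node u on an n-node ring picks each other node v as a contact."""
    def ring_distance(a, b):
        d = abs(a - b)
        return min(d, n - d)

    # Weight of v is distance(u, v) ** -alpha; normalise so the weights sum to 1.
    weights = {v: ring_distance(u, v) ** -alpha for v in range(n) if v != u}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

probs = long_range_probabilities(u=0, n=10, alpha=2.0)
print(round(probs[1], 3), round(probs[5], 3))   # a neighbour vs. the farthest node
```

With a large exponent nearly all the probability goes to nearby nodes, while an exponent of 0 makes every node equally likely regardless of distance.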
Clustered graphs in metric spaces
  - When $\alpha$ is high, the network becomes a clustered graph
  - The network has a large number of local connections, making it easy to navigate
  - It has a very small number of long-distance connections, making the diameter high
Clustered graphs in metric spaces
  - When $\alpha$ is small, long-distance connections are as frequent as local connections
  - With enough edges, the diameter of the graph becomes small
  - But navigability suffers: even though short paths exist, it is not possible to discover them from local information
Kleinberg connectivity
  - At a critical value of $\alpha = 2$, the clustering property of large $\alpha$ and the small-world property of small $\alpha$ balance each other
  - Such a graph not only has a short diameter, but its short paths are also discoverable from local information
  - Such connectivity is also called Kleinberg connectivity
Kleinberg connectivity
  - An optimal graph structure, balancing the spread of information and minimizing distortion
  - Alternate way of verifying Kleinberg connectivity: a node has the same connectivity with nodes at different levels of granularity
  - Example: if you have n friends who live on the same street, n friends in the city, n friends in the country and n friends across the world, you have started a Kleinberg connectivity
Information cascades
  - Spread of information, ideas or fads across large populations
  - Two critical factors determine information cascades:
      - Network configuration
      - “Conformity”
Asch conformity experiment
  - A majority of the subjects decided to conform to the group opinion, even though the correct answer was starkly visible
  - The probability of conformance was found to be a function of the ratio of majority to minority, rather than of absolute numbers
Conformity and cascades
  - A is more likely than B to adopt a new idea spreading through the network
Information cascades
  - An idea originating from ‘a’ cascades to b, c and h when the conformity threshold is 0.5
  - It never cascades to ‘d’ because d is under pressure from e, f and g to conform to the status quo
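The cascade described above can be reproduced with a simple threshold rule: a node adopts the idea once the fraction of its neighbours that have adopted reaches the conformity threshold. The slide's figure is not available, so the graph below is a hypothetical one built to match the description, and treating "reaches" as greater-than-or-equal is likewise an assumption.

```python
def cascade(neighbours, seeds, threshold):
    """Spread an idea until no further node's adopting-neighbour fraction reaches the threshold."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbours.items():
            if node in adopted:
                continue
            fraction = sum(n in adopted for n in nbrs) / len(nbrs)
            if fraction >= threshold:
                adopted.add(node)
                changed = True
    return adopted

# Hypothetical graph: a, b, c, h form one group; d is tied to the clique e, f, g.
neighbours = {
    "a": ["b", "c", "h"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e", "f", "g"],
    "e": ["d", "f", "g"],
    "f": ["d", "e", "g"],
    "g": ["d", "e", "f"],
    "h": ["a"],
}
print(sorted(cascade(neighbours, seeds={"a"}, threshold=0.5)))
```

Node d hears the idea from only one of its four neighbours, so at a threshold of 0.5 it holds out and shields e, f and g; lowering the threshold to 0.25 lets the idea through to everyone.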
Information cascades
  - Too little connectivity: insufficient exposure, not conducive to information cascades
  - Too much connectivity: inertia and conformance, not conducive to information cascades
  - In stark contrast to the epidemic spread of diseases, where high connectivity means greater chances of epidemics
Emergence of network topology [Venkatasubramanian et al. 2004]
  - Given a society of n actors (nodes)
  - Each actor has survival demands, the supply for which may exist anywhere in the network
  - The communication network has three optimization criteria: efficiency, robustness and cost
Emergence of network topology
  - Cost: each communication channel (edge) adds to the cost; cost is kept constant by giving each node only one edge
  - Efficiency: the system is efficient if the all-pairs separation between nodes is minimized
  - Robustness: the system is robust if the network remains connected in the face of node failures
Emergence of network topology: topology breeding
  - Cost is kept constant by giving each node exactly one edge
  - Robustness is bounded by allowing the failure of any one node
  - Random topologies are generated and combined
  - Topologies with lower fitness are discarded
  - Fitness is calculated using a parameter $\alpha$ that trades off efficiency against robustness
Emergence of network topology
  - Emergent topology when $\alpha = 1$ (100% importance to efficiency and 0% to robustness): a star
  - The star has the smallest degree of separation for a network of n nodes and n edges
  - Failure of the central node disconnects the society
Emergence of network topology
  - Emergent topology when $\alpha = 0$ (100% importance to robustness and 0% to efficiency): a circle
  - The circle keeps the society connected in the face of a single node failure
  - High degree of separation (not efficient)
Emergence of network topology
  - Emergent topology when $\alpha = 0.78$
  - Intermediate values of $\alpha$ give a variety of “hub and spoke” topologies, which are combinations of circle and star
  - When $n \to \infty$, the degree distribution in the hub and spoke resembles a power law
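The trade-off between the two extreme topologies can be made concrete by measuring both criteria directly. The sketch below compares a star and a circle on 10 nodes, using mean hop count as the efficiency measure and survival of any single node failure as the robustness measure. It is a simplified illustration, not the fitness function of the cited paper: the function names are hypothetical, and the star here uses n - 1 edges rather than n.

```python
from collections import deque

def star(n):
    return {0: set(range(1, n)), **{i: {0} for i in range(1, n)}}

def circle(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def distances_from(graph, source, removed=None):
    """Breadth-first hop counts from source, ignoring the removed node if any."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_separation(graph):
    """Average hop count over all ordered pairs of distinct nodes (connected graph assumed)."""
    n = len(graph)
    total = sum(sum(distances_from(graph, s).values()) for s in graph)
    return total / (n * (n - 1))

def survives_any_single_failure(graph):
    """True if the remaining nodes stay connected whichever single node fails."""
    for failed in graph:
        rest = [v for v in graph if v != failed]
        if len(distances_from(graph, rest[0], removed=failed)) != len(rest):
            return False
    return True

for name, g in (("star", star(10)), ("circle", circle(10))):
    print(name, round(mean_separation(g), 2), survives_any_single_failure(g))
```

The star wins on separation but fails the robustness check because of its hub, while the circle passes the robustness check at the price of longer paths.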
Perceived value and saturation
  - In a society, actors connect to one another to receive “value”
  - In deciding to connect to somebody, there is a “perceived value” function to be optimized
  - Two cases of networks:
      - Small number of partners (costly connections, material exchange networks)
      - Large number of partners (frictionless connections, information networks)
Perceived value and saturation
  - When an actor connects to another actor i, there is a perceived value $v_i$ attached to that actor
  - In addition, there is a satisfaction value or saturation limit S for each actor
  - Connections are established until the accumulated perceived value reaches the required saturation limit
  - Law of diminishing returns: the perceived value assigned to the k-th node decreases as k increases, even if the intrinsic value provided by the node is the same
  - Cumulative value at node j after z connections: $S_j^z = \sum_{k=1}^{z} v / k$
Perceived value and saturation
  - As $z \to \infty$, the cumulative value at any node j can be approximated as $S_j^z = v\,[\ln z + c]$
  - Setting the intrinsic value v = 1, the average global satisfaction metric is given by $S = \langle S_j^z \rangle = c + \langle \ln z(j) \rangle$
  - In other words, the global satisfaction measure grows with the average of the logarithm of node degree
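The logarithmic approximation above can be checked numerically. The sketch assumes that diminishing returns take the harmonic form, with the k-th connection worth v / k, and that the constant c is the Euler-Mascheroni constant; both are assumptions consistent with the ln z + c form rather than details confirmed by the slides, and `cumulative_value` is a hypothetical name.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler-Mascheroni constant

def cumulative_value(z, v=1.0):
    """Accumulated perceived value after z connections, assuming the k-th is worth v / k."""
    return sum(v / k for k in range(1, z + 1))

for z in (10, 100, 1000):
    exact = cumulative_value(z)
    approx = math.log(z) + EULER_GAMMA
    print(z, round(exact, 4), round(approx, 4))
```

The gap between the exact sum and the approximation shrinks roughly like 1 / (2z), so for highly connected nodes the logarithmic form is already very accurate.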
Perceived value and saturation
  - Maximum entropy: in addition to saturation, connections are assumed to be made in a least biased fashion, so as to minimize the latent uncertainty about the connection in the face of failures
  - The resulting distribution of node degrees can be formulated using the maximum entropy principle under the constraint for the global satisfaction function: $S \propto \langle \ln z \rangle$
  - As $z \to \infty$, we get a power-law distribution
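The step from the maximum entropy principle to a power law can be sketched with the standard Lagrange-multiplier argument. This is a textbook-style outline under assumed notation, not the derivation from the cited papers: $p(z)$ is the degree distribution, and $\mu$ and $\lambda$ are multipliers introduced here for the normalisation and the $\langle \ln z \rangle$ constraints.

```latex
\begin{align*}
\text{maximise}\quad & H = -\sum_{z} p(z)\,\ln p(z) \\
\text{subject to}\quad & \sum_{z} p(z) = 1, \qquad \sum_{z} p(z)\,\ln z = \langle \ln z \rangle \\
\mathcal{L} &= -\sum_{z} p(z)\,\ln p(z)
  - \mu\Big(\sum_{z} p(z) - 1\Big)
  - \lambda\Big(\sum_{z} p(z)\,\ln z - \langle \ln z \rangle\Big) \\
\frac{\partial \mathcal{L}}{\partial p(z)} &= -\ln p(z) - 1 - \mu - \lambda \ln z = 0 \\
\Rightarrow\quad p(z) &= e^{-(1+\mu)}\, z^{-\lambda} \;\propto\; z^{-\lambda}
\end{align*}
```

Fixing the average of $\ln z$ is what produces the power-law form; fixing the average of $z$ itself would give an exponential distribution instead.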
The power-law network is hence an optimal network topology for frictionless transactions, arising out of a number of individual decisions that aim to maximize value and minimize uncertainty!
Thank You! Q & A
Further reading
  - L. A. Adamic. Zipf, Power-laws and Pareto: A ranking tutorial. HP Labs technical report. http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
  - Karthik B. R., Aditya Ramana Rachakonda, Srinath Srinivasa. Strange Central-Limit Properties of Keyword Queries on the Web. IIITB Technical Report, 2007.
  - Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. 2000. http://www.cs.cornell.edu/home/kleinber/swn.ps
  - Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, Vol. 286, pp. 509–512, 1999.
  - M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226–251, 2003.
  - M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, Vol. 46, pp. 323–351, 2005.
  - Venkat Venkatasubramanian, Santhoji Katare, Priyan R. Patkar, Fang-ping Mu. Spontaneous emergence of complex optimal networks through evolutionary adaptation. Computers and Chemical Engineering, Vol. 28, pp. 1789–1798, 2004.
  - Venkat Venkatasubramanian, Dimitris Politis, Priyan Patkar. Entropy maximization as a holistic design principle for complex, optimal networks. AIChE (American Institute of Chemical Engineers) Journal, Vol. 52, No. 3, pp. 1004–1009, March 2006.

Information Networks And Their Dynamics

  • 1. Information Networks and their Dynamics Srinath Srinivasa IIIT Bangalore and Oktave Research Foundation [email_address]
  • 2. Partially based on the book Sage Publishers, New Delhi, London, Thousand Oaks, 2006, ISBN 0761935126
  • 3. Recent new additions to our vocabulary Telemedicine SMS/MMS e-learning Net Banking E-ticketing Open-source Privacy policy EULA … Phishing Hacking Cyber crimes Virus / Spyware / Adware / Malware Cyber squatting Identity theft Piracy …
  • 4. The “information age” Comprehensive change brought by information and communication technologies (ICT) Qualitative changes affecting the underlying mental model or the “paradigm” Changes affecting the way we live (not just businesses) Separation of information transactions from material transactions
  • 5. The information age Material exchange network Information exchange network Internet, mobile, databases, etc Then Now
  • 6. Material exchange Constrained by the laws of physics Conserved transactions High cost of replication High cost of transportation
  • 7. Information exchange with today’s ICTs Intangible (little or no physical constraints) Non-conserved transactions Extremely low replication costs Extremely low transportation costs Hard to “snatch away” internalized information
  • 8. Information Networks Historically, information was “piggy backed” over a material carrier giving information networks the same characteristics as material networks With today’s technologies, communication and coordination is separated from transport and logistics Several kinds of transactions are pure information transactions having no material component. Ex: software, data, news, knowledge, etc. How are such information networks different from material exchange networks?
  • 9. Outline Part I: Information networks and the Power Law distribution Part II: Underlying dynamics Part III: Social information networks
  • 10. Part I Information Networks and the Power Law Distribution
  • 11. Distribution of marks in an exam i.i.d (independent and identically distributed) processes Approximates a Gaussian or “Normal” distribution (binomial in the discrete case) Mode near the mean Very ubiquitous Finite variance and the central limit theorem
  • 12. Distribution of email recipients Most recipients have received very small number of emails However, a small number of recipients have received a very large number of emails Approximates the “Power Law” distribution Infinite variance or scale-free system
  • 13. The Power Law distribution Pr[X = x]  x -  for a given exponent  Straight line on a log-log scale Infinite variance Scale-free (self similar)
  • 14. Underlying random processes Exam system: A set of n independent random processes Email system: A set of n interdependent random processes Emails part of conversations 1 2 3 4 1 2 3 4
  • 15. Power Laws in nature Population distribution across human settlements Global airline networks WWW in-degree and out-degree Sizes of blood vessels in the human body Wealth distribution Frequency of word occurrence in documents Frequency of keyword searches on the web Distribution of earthquake sizes against their frequency etc..
  • 16. Characteristics of the Power Law Intuitive Very small number of very large entities and very large number of very small entities Infinite variance or “long-tailed” distribution (for certain value ranges of the exponent 
  • 17. Characteristics of the Power Law Mathematical Distribution function Scale-invariance property log-linear relationship with exponent
  • 18. Other pertinent distributions Zipf distribution Empirical result for word frequencies in document corpora f(x): frequency of word x r(x): Rank of word x (the r th most frequent word) Shown to be equivalent to the power-law distribution
  • 19. Other pertinent distributions Pareto’s law x min is the min value taken by x and  > 0 When 0 <  · 1, then the mean is infinite, and when 1 <  · 2, the variance is infinite Informally called the 80-20 principle Shown to be equivalent to the power-law distribution
  • 20. Other pertinent distributions Log-normal distribution y = f(x) is log-normally distributed, if ln y is normally distributed Approximates a power-law if the variance of ln y is very large An alternative (sometimes better) characterization of interdependent random processes Generated by product of i.i.d random processes
  • 21. Part II Underlying Dynamics
  • 22. Non-linearity Interdependent system with circular causalities Also called “complex systems” Feedback: a central characteristic Positive feedback (reinforcing loops) and negative feedback (balancing loops)
  • 23. Non-linearity: growth Feedback makes the present state of the system, a function of the previous states When x 0 > 0 and r > 1, we have positive feedback and x grows over time
  • 24. Non-linearity: saturation However, every system usually also has a “saturation” point beyond which it cannot grow. The system reaches the saturation point asymptotically If w.l.o.g. the saturation point is ‘1’ then the dynamical equation becomes: This is called the “logistic” equation (population equation) and is representative of a large class of real-world systems
  • 25. Logistic equation in everyday terms The rich get richer – up to a certain point Large cities attract more migrants – until its infrastructure saturates Celebrities (people who have media attention) get more media attention – until people get bored of them Pages with high PageRank get higher PageRank – until either user attention or search engine popularity saturates Large population leads to larger population – until resources saturate
  • 26. Sensitivity to initial conditions Case: What happens when two or more non-linear processes share resources among themselves?
  • 28. Sensitivity to initial conditions The growth ‘r’ of both A and B feed on the same population base The growth of A is at the cost of B and vice versa The growth of either A or B is dependent on their present population Small differentials in initial populations can tilt the balance irrevocably
  • 29. Preferential attachment The population distribution among the cells follows a power law
  • 30. Impact of growth rate on dynamics
  • 31. Impact of growth rate on dynamics r = 3.0 r = 3.1 r = 3.2 r = 3.5
  • 32. Impact of growth rate on dynamics r = 3.7 r = 3.9
  • 33. Period doubling and chaos Increasing growth rate in a saturation system leads to oscillations with increasing frequency For growth rates r = [3,4), a phenomenon called “period doubling” or “bifurcations” is witnessed with oscillations developing sub-oscillations The rate at which sub-oscillations develop in the logistic equation is known to be a constant (~ 4.66920) called the Feigenbaum’s constant When r ¸ 4, the system breaks down
  • 34. Period doubling in the logistic equation
  • 35. Attractors A stable non-linear system eventually displays an “attractor” pattern Attractor patterns can be “emergent” or “scale invariant” Emergence: Aggregate property that cannot be seen in the individual parts Scale invariance: Sub-systems displaying the same properties as the aggregate
  • 41. Part III Social information networks
  • 42. Outline for Part III Random graphs Largest connected component Small-world networks Information cascades Emergence of network topology
  • 43. Machines and societies. Machines: designed for a specific purpose; structure is a result of design; complementary components; component dynamics need coordination. Societies: made up of autonomous actors pursuing self-interest; structure is an emergent property, a result of evolution; actor dynamics need management. Machines of nature – living beings – are more like societies than machines.
  • 44. Social information networks Information networks formed in a society of autonomous actors Network connections typically a function of self-interest dynamics Resulting network structure interesting for its attractor properties
  • 45. Random graphs Simplest form of social network models Given a population of nodes, edges are randomly added Properties to observe: Size of the largest connected component (system connectivity) Diameter of the graph (maximum degree of separation)
  • 46. Random graphs Largest connected component: measures system connectivity; calibrates the spread of ideas and influence. Diameter of the graph: measures the degree of separation; calibrates distortion (or lack of it) in the spread of ideas and influence. A large connected component is useful for disseminating information. A small degree of separation is useful for business connections to develop.
  • 48. Largest connected component Connectivity in a system with n nodes witnesses an inflection roughly when n/2 random edges are added With n random edges, roughly 80% of the system is connected Connectivity starts saturating around 4n random edges
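The connectivity figures above can be checked with a small simulation. This is a sketch under simplifying assumptions: edges are drawn uniformly at random with replacement (so occasional self-loops and duplicate edges are possible), and components are tracked with a union-find structure. The function name and the choice n = 5000 are illustrative.

```python
import random


def largest_component_fraction(n, m, seed=0):
    """Add m random edges to n isolated nodes; return the largest component's share."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(a):
        # Find the component root, halving the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _ in range(m):
        ru, rv = find(rng.randrange(n)), find(rng.randrange(n))
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru          # merge the smaller component into the larger
            size[ru] += size[rv]

    return max(size[find(i)] for i in range(n)) / n


n = 5000
for label, m in (("n/4", n // 4), ("n/2", n // 2), ("n", n), ("4n", 4 * n)):
    print(label, round(largest_component_fraction(n, m), 3))
```

Well below n/2 edges the largest component stays tiny. Around n edges it covers most of the nodes, and by 4n edges nearly all of them.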
  • 50. Random graph diameter Adding random edges increases connectivity, but also increases the overall degree of separation! The degree of separation starts reducing after reaching a peak value. (More communication links make the world bigger before it becomes smaller.) Small world networks: networks having a diameter much less than the number of nodes.
  • 51. Clustered graphs Social networks are better modeled as clustered graphs, rather than pure random graphs. Clustered graph property: if A knows B and C, then with a very high probability, B and C know each other. Random or “long distance” edges link disparate clusters or communities.
  • 52. Clustered graphs in metric spaces Nodes are arranged in a metric space (having a distance function between node pairs). The probability of a connection decreases with distance: nearby nodes cluster together, and random connections become rarer as distance increases.
  • 53. Clustered graphs in metric spaces Node u connects to node v with a probability proportional to $\delta(u,v)^{-\alpha}$, where $\delta(u,v)$ is the distance between u and v and $\alpha$ is the “clustering coefficient.”
  • 54. Clustered graphs in metric spaces When $\alpha$ is high, the network becomes a clustered graph. The network has a large number of local connections, making it easy to navigate. It has a very small number of long-distance connections, making the diameter high.
  • 55. Clustered graphs in metric spaces When $\alpha$ is small, long-distance connections are as frequent as local connections. With enough edges, the diameter of the graph becomes small. But navigability suffers! Even though short paths exist, it is not possible to discover them from local information.
  • 56. Kleinberg connectivity At a critical value of $\alpha = 2$, the clustering property of large $\alpha$ and the small-world property of small $\alpha$ balance each other. Such a graph not only has a short diameter, but short paths are also discoverable from local information. Such connectivity is also called Kleinberg connectivity.
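The distance-based connection rule can be illustrated with a short sketch. It assumes nodes on a two-dimensional grid with Manhattan distance, and it normalises the weights distance^(-alpha) so that they sum to one over all candidate targets. The grid size, the function name and the normalisation step are assumptions made for illustration.

```python
def long_range_distribution(u, nodes, alpha):
    """Probability of u linking to each other node, proportional to distance**(-alpha)."""
    def dist(a, b):
        # Manhattan (grid) distance between two lattice points.
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    weights = {v: dist(u, v) ** (-alpha) for v in nodes if v != u}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


grid = [(x, y) for x in range(5) for y in range(5)]   # hypothetical 5 x 5 lattice
centre = (2, 2)

clustered = long_range_distribution(centre, grid, alpha=2.0)
uniform = long_range_distribution(centre, grid, alpha=0.0)

print(clustered[(2, 3)], clustered[(4, 4)])   # near neighbour vs far corner
print(uniform[(2, 3)], uniform[(4, 4)])       # alpha = 0: distance is ignored
```

With alpha = 0 every target is equally likely, which corresponds to the purely random regime. Larger alpha concentrates the probability mass on nearby nodes.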
  • 57. Kleinberg connectivity An optimal graph structure balancing spread of information and minimizing distortion. Alternate way of verifying Kleinberg connectivity: a node has the same connectivity with nodes at different levels of granularity. Example: if you have n friends who live in the same street, n friends in the city, n friends in the country, and n friends across the world, you’ve started a Kleinberg connectivity.
  • 58. Information cascades Spread of information/ideas/fads across large populations Two critical factors determining information cascades: Network configuration “Conformity”
  • 60. Asch conformity experiment A majority of the subjects decided to conform to the group opinion, even though the correct answer was starkly visible! The probability of conformance was found to be a function of the ratio of the majority versus minority, rather than absolute numbers
  • 61. Conformity and cascades A is more likely to adopt a new idea spreading through the network as compared to B
  • 62. Information cascades An idea originating from ‘a’ cascades to b, c and h when the conformity threshold is 0.5. It never cascades to ‘d’ because d is under pressure to conform to status quo from e, f and g.
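A threshold cascade of this kind can be sketched in a few lines. The graph below is hypothetical: it is wired so that it reproduces the outcome described above and is not a copy of the network drawn on the slide. A node adopts the idea once the share of its neighbours that have adopted reaches the conformity threshold.

```python
def cascade(neighbours, seeds, threshold):
    """Spread an idea until no further node meets the conformity threshold."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbours.items():
            if node in adopted or not nbrs:
                continue
            share = sum(1 for n in nbrs if n in adopted) / len(nbrs)
            if share >= threshold:
                adopted.add(node)
                changed = True
    return adopted


# Hypothetical undirected network, given as adjacency lists.
neighbours = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "h"],
    "h": ["c", "d"],
    "d": ["h", "e", "f", "g"],
    "e": ["d", "f"],
    "f": ["d", "e", "g"],
    "g": ["d", "f"],
}

result = cascade(neighbours, seeds={"a"}, threshold=0.5)
print(sorted(result))
```

In this wiring, only one of d's four neighbours ever adopts, so d stays below the 0.5 threshold and shields e, f and g from the idea.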
  • 63. Information cascades Too little connectivity: insufficient exposure, not conducive to information cascades. Too much connectivity: inertia and conformance, not conducive to information cascades. In stark contrast to the epidemic spread of diseases – high connectivity means greater chances of epidemics.
  • 64. Emergence of network topology [Venkatasubramanian et al. 2004] Given a society of n actors (nodes). Each actor has survival demands, the supply for which may exist anywhere in the network. The communication network has three optimization criteria: efficiency, robustness and cost.
  • 65. Emergence of network topology Cost: Each communication channel (edge) adds to the cost. Cost is kept constant by giving each node only one edge Efficiency: The system is efficient if the all-pairs separation between nodes is minimized Robustness: The system is robust if the network remains connected in the face of node failures
  • 66. Emergence of network topology Topology breeding: Cost is kept constant by giving each node exactly one edge. Robustness is bounded by allowing the failure of any one node. Random topologies are generated and combined. Topologies with lower fit functions are discarded. Fit is calculated by a parameter $\alpha$ that trades between efficiency and robustness.
  • 67. Emergence of network topology Emergent topology when $\alpha = 1$ (100% importance to efficiency and 0% importance to robustness): Star has the smallest degree of separation for a network of n nodes and n edges. Failure of the central node disconnects the society.
  • 68. Emergence of network topology Emergent topology when $\alpha = 0$ (100% importance to robustness and 0% importance to efficiency): Circle keeps the society connected in the face of single node failure. High degree of separation (not efficient).
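The trade-off between the star and the circle can be made concrete with a small comparison. This is a sketch under simplifying assumptions: the star is built as one hub with n - 1 spokes, efficiency is measured as the average shortest-path length over all node pairs, and robustness as staying connected after any single node fails. All function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source, removed=None):
    """Hop counts from source, ignoring the node given as removed."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_separation(adj):
    """Mean shortest-path length over all ordered node pairs (connected graph assumed)."""
    nodes = list(adj)
    total = sum(d for s in nodes for d in bfs_distances(adj, s).values())
    return total / (len(nodes) * (len(nodes) - 1))


def survives_any_single_failure(adj):
    """True if the remaining nodes stay connected whichever single node fails."""
    for failed in adj:
        rest = [v for v in adj if v != failed]
        if len(bfs_distances(adj, rest[0], removed=failed)) != len(rest):
            return False
    return True


n = 10
star = {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

for name, graph in (("star", star), ("ring", ring)):
    print(name, average_separation(graph), survives_any_single_failure(graph))
```

The star wins on separation and loses on robustness, while the ring does the opposite. This is the tension that the fitness parameter is meant to trade off.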
  • 69. Emergence of network topology Emergent topology when $\alpha = 0.78$: Intermediate values of $\alpha$ give a variety of “hub and spoke” topologies, which are combinations of circle and star. As $n \to \infty$, the degree distribution in the hub and spoke resembles a power law.
  • 70. Perceived value and saturation In a society, actors connect to one another to receive “value”. In making a decision to connect to somebody, there is a “perceived value” function to be optimized. Following cases of networks: a small number of partners (costly connections, material exchange networks); a large number of partners (frictionless connections, information networks).
  • 71. Perceived value and saturation When an actor connects to another actor i, there is a perceived value $v_i$ attached to that actor. In addition, there is a satisfaction value or saturation limit S for each actor. Connections are established until the accumulated perceived value reaches the required saturation limit. Law of diminishing returns: the perceived value assigned to the k-th node decreases as k increases, even if the intrinsic value provided by the node is the same. Cumulative value at node j:
  • 72. Perceived value and saturation As $z \to \infty$, the cumulative value at any node j can be approximated as $S_j^z = v[\ln z + c]$. Setting the intrinsic value v = 1, the average global satisfaction metric is given by $S = \langle S_j^z \rangle = c + \langle \ln z^{(j)} \rangle$. In other words, the global satisfaction measure grows with the average logarithm of the node degree.
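The logarithmic form can be checked numerically under one explicit assumption: that diminishing returns means the k-th connection is perceived at value v / k. This assumption is not stated on the slides; it is one reading that is consistent with the v[ln z + c] approximation, in which case c is the Euler–Mascheroni constant. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler–Mascheroni constant


def cumulative_value(z, v=1.0):
    """Accumulated perceived value over z connections, assuming the k-th is worth v / k."""
    return sum(v / k for k in range(1, z + 1))


def log_approximation(z, v=1.0, c=EULER_GAMMA):
    """Large-z approximation v * (ln z + c)."""
    return v * (math.log(z) + c)


for z in (10, 100, 1000):
    exact = cumulative_value(z)
    approx = log_approximation(z)
    print(z, round(exact, 5), round(approx, 5), round(exact - approx, 5))
```

The gap between the exact sum and the approximation shrinks roughly like 1 / (2z), so the logarithmic form is already close for nodes with a few hundred connections.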
  • 73. Perceived value and saturation Maximum entropy: In addition to saturation, connections are assumed to be made in a least biased fashion so as to minimize the latent uncertainty about the connection in the face of failures. The resultant distribution of node degrees can be formulated using the maximum entropy principle under the constraint for the global satisfaction function: $S \propto \langle \ln z \rangle$. As $z \to \infty$, we get a power-law distribution:
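The step from a constraint on the average of ln z to a power law can be sketched with Lagrange multipliers. The notation p(z) for the degree distribution and the multipliers λ0 and λ are assumptions introduced here for illustration, not symbols from the slides.

```latex
\begin{aligned}
\text{maximize}\quad & H[p] = -\sum_{z} p(z)\,\ln p(z) \\
\text{subject to}\quad & \sum_{z} p(z) = 1, \qquad \sum_{z} p(z)\,\ln z = \langle \ln z \rangle \\[4pt]
\mathcal{L} &= -\sum_{z} p(z)\,\ln p(z)
  - \lambda_0 \Big( \sum_{z} p(z) - 1 \Big)
  - \lambda \Big( \sum_{z} p(z)\,\ln z - \langle \ln z \rangle \Big) \\
\frac{\partial \mathcal{L}}{\partial p(z)} &= -\ln p(z) - 1 - \lambda_0 - \lambda \ln z = 0 \\
\Rightarrow\quad p(z) &= e^{-(1+\lambda_0)}\, z^{-\lambda} \;\propto\; z^{-\lambda}
\end{aligned}
```

The exponent λ is fixed by the value of the constraint, and λ0 by normalisation.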
  • 74. The power-law network is hence an optimal network topology in frictionless transactions arising out of a number of individual decisions aiming to maximize value and minimize uncertainty!
  • 76. Further reading
      L. A. Adamic. Zipf, Power-laws and Pareto: A ranking tutorial. HP Labs technical report. http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
      Karthik B.R., Aditya Ramana Rachakonda, Srinath Srinivasa. Strange Central-Limit Properties of Keyword Queries on the Web. IIITB Technical Report, 2007.
      Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. 2000. http://www.cs.cornell.edu/home/kleinber/swn.ps
      Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, Vol. 286, pp. 509–512, 1999.
      M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226–251, 2003.
      M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, Vol. 46, pp. 323–351.
      Venkat Venkatasubramanian, Santhoji Katare, Priyan R. Patkar, Fang-ping Mu. Spontaneous emergence of complex optimal networks through evolutionary adaptation. Computers and Chemical Engineering, Vol. 28, pp. 1789–1798, 2004.
      Venkat Venkatasubramanian, Dimitris Politis, Priyan Patkar. Entropy maximization as a holistic design principle for complex, optimal networks. AIChE (American Institute for Chemical Engineers) Journal, Vol. 52, No. 3, pp. 1004–1009, March 2006.