Information Networks And Their Dynamics

Srinath Srinivasa, IIIT Bangalore and Oktave Research Foundation

Partially based on a book published by Sage Publishers, New Delhi, London, Thousand Oaks, 2006, ISBN 0761935126

Recent new additions to our vocabulary: Telemedicine, SMS/MMS, e-learning, Net Banking, E-ticketing, Open-source, Privacy policy, EULA, … Phishing, Hacking, Cyber crimes, Virus / Spyware / Adware / Malware, Cyber squatting, Identity theft, Piracy, …

The “information age”: Comprehensive change brought by information and communication technologies (ICT). Qualitative changes affecting the underlying mental model or “paradigm”. Changes affecting the way we live (not just businesses). Separation of information transactions from material transactions.

[Figure: The information age, then and now. Labels: material exchange network; information exchange network (Internet, mobile, databases, etc.)]

Material exchange: Constrained by the laws of physics. Conserved transactions. High cost of replication. High cost of transportation.

Information exchange with today’s ICTs: Intangible (little or no physical constraints). Non-conserved transactions. Extremely low replication costs. Extremely low transportation costs. Hard to “snatch away” internalized information.

Information Networks: Historically, information was “piggybacked” on a material carrier, giving information networks the same characteristics as material networks. With today’s technologies, communication and coordination are separated from transport and logistics. Several kinds of transactions are pure information transactions with no material component, e.g. software, data, news, knowledge. How are such information networks different from material exchange networks?

Outline: Part I: Information networks and the Power Law distribution. Part II: Underlying dynamics. Part III: Social information networks.

Part I Information Networks and the Power Law Distribution

Distribution of marks in an exam: Results from i.i.d. (independent and identically distributed) processes. Approximates a Gaussian or “Normal” distribution (binomial in the discrete case). Mode near the mean. Ubiquitous. Finite variance, so the central limit theorem applies.

Distribution of email recipients: Most recipients have received a very small number of emails. However, a small number of recipients have received a very large number of emails. Approximates the “Power Law” distribution. Infinite variance, or a scale-free system.

The Power Law distribution: $\Pr[X = x] \propto x^{-\alpha}$ for a given exponent $\alpha$. Straight line on a log-log scale. Infinite variance. Scale-free (self-similar).

Underlying random processes: Exam system: a set of n independent random processes. Email system: a set of n interdependent random processes (emails are part of conversations).

Power Laws in nature: Population distribution across human settlements; global airline networks; WWW in-degree and out-degree; sizes of blood vessels in the human body; wealth distribution; frequency of word occurrence in documents; frequency of keyword searches on the web; distribution of earthquake sizes against their frequency; etc.

Characteristics of the Power Law (intuitive): A very small number of very large entities and a very large number of very small entities. Infinite variance or “long-tailed” distribution (for certain value ranges of the exponent $\alpha$).

Characteristics of the Power Law (mathematical): Distribution function: $p(x) = C x^{-\alpha}$. Scale-invariance property: $p(cx) = c^{-\alpha} p(x)$. Log-linear relationship with exponent: $\ln p(x) = \ln C - \alpha \ln x$.
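
The two mathematical properties named on this slide can be checked numerically. The following is a minimal sketch, assuming an unnormalized density of the form p(x) = c * x**(-alpha); the function names and the exponent value 2.5 are illustrative choices, not taken from the slides.

```python
import math

def power_law_pdf(x, alpha, c=1.0):
    """Unnormalized power-law density p(x) = c * x**(-alpha)."""
    return c * x ** (-alpha)

def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) against ln(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

alpha = 2.5  # illustrative exponent (assumption)
xs = [1, 2, 4, 8, 16, 32, 64]
ys = [power_law_pdf(x, alpha) for x in xs]

# Straight line on a log-log scale: the fitted slope equals -alpha.
slope = loglog_slope(xs, ys)

# Scale invariance: rescaling x by a factor k multiplies p by k**(-alpha),
# whatever the value of x.
k = 3.0
ratios = [power_law_pdf(k * x, alpha) / power_law_pdf(x, alpha) for x in xs]
```

The ratio list is constant across all values of x, which is what "scale-free" means here: the distribution looks the same at every magnification, up to a constant factor.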

Other pertinent distributions: Zipf distribution. Empirical result for word frequencies in document corpora: $f(x) \propto r(x)^{-b}$ with $b$ close to 1, where $f(x)$ is the frequency of word $x$ and $r(x)$ is the rank of word $x$ (the $r$-th most frequent word). Shown to be equivalent to the power-law distribution.

Other pertinent distributions: Pareto’s law. $\Pr[X \ge x] = (x / x_{\min})^{-k}$, where $x_{\min}$ is the minimum value taken by $x$ and $k > 0$. When $0 < k \le 1$ the mean is infinite, and when $1 < k \le 2$ the variance is infinite. Informally called the 80-20 principle. Shown to be equivalent to the power-law distribution.
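
The link between Pareto's law and the 80-20 principle can be made concrete. The sketch below assumes the standard closed form for the Pareto Lorenz curve (the richest fraction p of the population holds a share p**(1 - 1/k) of the total), which is not derived in the slides; the function name `top_share` is hypothetical.

```python
import math

def top_share(fraction, k):
    """Share of the total held by the richest `fraction` of a Pareto
    population with tail exponent k > 1 (a finite mean is required)."""
    if k <= 1:
        raise ValueError("the mean is infinite for k <= 1, so shares are undefined")
    return fraction ** (1.0 - 1.0 / k)

# Tail exponent for which the top 20% hold exactly 80%: k = log 5 / log 4.
k_80_20 = math.log(5) / math.log(4)
share = top_share(0.20, k_80_20)
```

Under this assumption the 80-20 split corresponds to a tail exponent of about 1.16, which lies in the range where the mean is finite but the variance is infinite.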

Other pertinent distributions: Log-normal distribution. $y = f(x)$ is log-normally distributed if $\ln y$ is normally distributed. Approximates a power law if the variance of $\ln y$ is very large. An alternative (sometimes better) characterization of interdependent random processes. Generated by the product of i.i.d. random processes.

Part II Underlying Dynamics

Non-linearity: Interdependent systems with circular causalities, also called “complex systems”. Feedback is a central characteristic: positive feedback (reinforcing loops) and negative feedback (balancing loops).

Non-linearity: growth. Feedback makes the present state of the system a function of its previous states: $x_{t+1} = r x_t$. When $x_0 > 0$ and $r > 1$, we have positive feedback and $x$ grows over time.

Non-linearity: saturation. However, every system usually also has a “saturation” point beyond which it cannot grow. The system reaches the saturation point asymptotically. If, w.l.o.g., the saturation point is 1, then the dynamical equation becomes $x_{t+1} = r x_t (1 - x_t)$. This is called the “logistic” equation (population equation) and is representative of a large class of real-world systems.
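
The growth-then-saturation behaviour can be seen by iterating the logistic equation directly. This is a minimal sketch assuming the discrete form x_{t+1} = r * x_t * (1 - x_t); the starting value 0.01 and growth rate 2.5 are illustrative assumptions.

```python
def logistic_step(x, r):
    """One step of the logistic equation x_{t+1} = r * x_t * (1 - x_t)."""
    return r * x * (1.0 - x)

def iterate(x0, r, steps):
    """Return the trajectory [x_0, x_1, ..., x_steps]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic_step(xs[-1], r))
    return xs

# Illustrative values (assumptions, not from the slides).
trajectory = iterate(x0=0.01, r=2.5, steps=200)
final_value = trajectory[-1]
```

Early on the trajectory grows almost geometrically, as in the pure-growth case. In this discrete form, for growth rates between 1 and 3, it then levels off at 1 - 1/r rather than growing without bound.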

Logistic equation in everyday terms: The rich get richer, up to a certain point. Large cities attract more migrants, until their infrastructure saturates. Celebrities (people who have media attention) get more media attention, until people get bored of them. Pages with high PageRank get higher PageRank, until either user attention or search engine popularity saturates. A large population leads to a larger population, until resources saturate.

Sensitivity to initial conditions: Case: what happens when two or more non-linear processes share resources among themselves?

[Figure: Sensitivity to initial conditions]

Sensitivity to initial conditions: The growth rates r of both A and B feed on the same population base. The growth of A is at the cost of B and vice versa. The growth of either A or B depends on its present population. Small differences in initial populations can tilt the balance irrevocably.

Preferential attachment: The population distribution among the cells follows a power law.
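
One way to see how preferential attachment produces a heavily skewed population distribution is a small simulation. The sketch below uses a Simon-style growth process, which is an assumption on my part since the slides do not specify the model: each arrival either founds a new cell or joins the cell of a randomly chosen existing resident. The function name and parameter values are hypothetical.

```python
import random
from collections import Counter

def grow_cells(steps, p_new, seed=0):
    """Simon-style growth: each arrival founds a new cell with probability
    p_new, otherwise joins the cell of a uniformly chosen existing resident,
    so larger cells attract proportionally more arrivals."""
    rng = random.Random(seed)
    residents = [0]          # residents[i] = cell id of the i-th individual
    n_cells = 1
    for _ in range(steps):
        if rng.random() < p_new:
            residents.append(n_cells)   # found a new cell
            n_cells += 1
        else:
            residents.append(residents[rng.randrange(len(residents))])
    return Counter(residents)           # cell id -> population

sizes = sorted(grow_cells(steps=20000, p_new=0.05).values(), reverse=True)
largest = sizes[0]
median = sizes[len(sizes) // 2]
```

Comparing `largest` with `median` shows the pattern described in Part I: a handful of very large cells alongside a large number of very small ones.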

[Figure: Impact of growth rate on dynamics. Panels: r = 3.0, r = 3.1, r = 3.2, r = 3.5, r = 3.7, r = 3.9]

Period doubling and chaos: Increasing the growth rate in a saturating system leads to oscillations with increasing frequency. For growth rates $r \in [3, 4)$, a phenomenon called “period doubling” or “bifurcation” is witnessed, with oscillations developing sub-oscillations. The rate at which sub-oscillations develop in the logistic equation is known to be a constant (≈ 4.66920) called Feigenbaum’s constant. When $r > 4$, iterates can leave the interval $[0, 1]$ and the system breaks down.
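
Period doubling can be observed by letting the logistic equation settle and then measuring how many steps it takes for the orbit to repeat. This is a rough sketch under stated assumptions: the burn-in length, tolerance and the helper name `attractor_period` are illustrative choices.

```python
def attractor_period(r, x0=0.2, burn_in=5000, max_period=64, tol=1e-9):
    """Return the smallest p <= max_period such that the settled orbit of
    x_{t+1} = r * x_t * (1 - x_t) repeats every p steps, or None if no such
    period is found (for example in the chaotic regime)."""
    x = x0
    for _ in range(burn_in):            # discard the transient
        x = r * x * (1.0 - x)
    orbit = [x]
    for _ in range(2 * max_period):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    for p in range(1, max_period + 1):
        if all(abs(orbit[i + p] - orbit[i]) < tol for i in range(max_period)):
            return p
    return None

# Growth rates chosen to sit inside the period-1, period-2 and period-4 ranges.
periods = {r: attractor_period(r) for r in (2.8, 3.2, 3.5)}
```

The three growth rates give a steady state, a two-step oscillation and a four-step oscillation respectively, which is the doubling sequence the slide refers to.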

[Figure: Period doubling in the logistic equation]

Attractors: A stable non-linear system eventually displays an “attractor” pattern. Attractor patterns can be “emergent” or “scale invariant”. Emergence: an aggregate property that cannot be seen in the individual parts. Scale invariance: sub-systems display the same properties as the aggregate.

Part III Social information networks

Outline for Part III: Random graphs. Largest connected component. Small-world networks. Information cascades. Emergence of network topology.

Machines: Designed for a specific purpose. Structure is a result of design. Complementary components. Component dynamics need coordination.
Societies: Made up of autonomous actors pursuing self-interest. Structure is an emergent property, a result of evolution. Actor dynamics need management.
Machines of nature (living beings) are more like societies than machines.

Social information networks: Information networks formed in a society of autonomous actors. Network connections are typically a function of self-interest dynamics. The resulting network structure is interesting for its attractor properties.

Random graphs: The simplest form of social network model. Given a population of nodes, edges are added at random. Properties to observe: size of the largest connected component (system connectivity) and diameter of the graph (maximum degree of separation).

Random graphs: Largest connected component: measures system connectivity; calibrates the spread of ideas and influence. Diameter of the graph: measures the degree of separation; calibrates distortion (or lack of it) in the spread of ideas and influence. A large connected component is useful for disseminating information. A small degree of separation is useful for business connections to develop.

Largest connected component: Connectivity in a system with n nodes witnesses an inflection roughly when n/2 random edges are added. With n random edges, roughly 80% of the system is connected. Connectivity starts saturating around 4n random edges.
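
These thresholds can be explored with a small simulation. The sketch below adds random edges to n isolated nodes and tracks the largest connected component with a union-find structure; the function name, the node count 5000 and the seed are illustrative assumptions.

```python
import random

def largest_component_fraction(n, m, seed=0):
    """Add m random edges to n isolated nodes and return the fraction of
    nodes in the largest connected component (union-find)."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    for _ in range(m):
        u, v = rng.randrange(n), rng.randrange(n)
        if u == v:
            continue                        # ignore self-loops (rare)
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru                 # union by size
            size[ru] += size[rv]
    return max(size[find(a)] for a in range(n)) / n

n = 5000
fractions = {label: largest_component_fraction(n, m)
             for label, m in (("n/4", n // 4), ("n/2", n // 2),
                              ("n", n), ("4n", 4 * n))}
```

Below n/2 edges the largest component is a tiny fraction of the system; around n edges it covers most of it, and by 4n edges almost every node is connected.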

Random graph diameter: Adding random edges increases connectivity, but initially also increases the overall degree of separation. The degree of separation starts reducing after reaching a peak value (more communication links make the world bigger before they make it smaller). Small-world networks: networks with a diameter much smaller than the number of nodes.

Clustered graphs: Social networks are better modeled as clustered graphs than as pure random graphs. Clustered graph property: if A knows B and C, then with very high probability B and C know each other. Random or “long-distance” edges link disparate clusters or communities.

Clustered graphs in metric spaces: Nodes are arranged in a metric space (with a distance function between node pairs). Connection probability depends on distance: random connections become rarer as distance increases.

Clustered graphs in metric spaces: Node u connects to node v with a probability proportional to $d(u,v)^{-\beta}$, where $d(u,v)$ is the distance between u and v and $\beta$ is the “clustering coefficient.”

Clustered graphs in metric spaces: When $\beta$ is high, the network becomes a clustered graph. The network has a large number of local connections, making it easy to navigate. It has a very small number of long-distance connections, making the diameter high.

Clustered graphs in metric spaces: When $\beta$ is small, long-distance connections are as frequent as local connections. With enough edges, the diameter of the graph becomes small. But navigability suffers: even though short paths exist, it is not possible to discover them from local information.

Kleinberg connectivity: At a critical value of $\beta = 2$, the clustering property of large $\beta$ and the small-world property of small $\beta$ balance each other. Such a graph not only has a short diameter, but its short paths are also discoverable from local information. Such connectivity is also called Kleinberg connectivity.

Kleinberg connectivity: An optimal graph structure that balances the spread of information against distortion. An alternate way of verifying Kleinberg connectivity: a node has the same connectivity with nodes at different levels of granularity. Example: if you have n friends who live on the same street, n friends in the city, n friends in the country and n friends across the world, you have the beginnings of Kleinberg connectivity.
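
The "same connectivity at every level of granularity" picture can be checked with a short calculation. This sketch assumes the setting in which the critical value 2 is usually stated: an infinite 2-D grid with Manhattan distance, where exactly 4*d nodes lie at distance d. The exponent name `beta` and the function name are my own labels for the clustering coefficient, not the slides' notation.

```python
def mass_per_scale(beta, n_scales):
    """For a node on an infinite 2-D grid with Manhattan distance, return the
    unnormalized probability mass that the rule d**(-beta) assigns to each
    distance band [2**j, 2**(j+1)), for j = 0 .. n_scales - 1.
    There are exactly 4*d grid points at Manhattan distance d >= 1."""
    masses = []
    for j in range(n_scales):
        lo, hi = 2 ** j, 2 ** (j + 1)
        masses.append(sum(4 * d * d ** (-beta) for d in range(lo, hi)))
    return masses

for beta in (1.0, 2.0, 3.0):
    masses = mass_per_scale(beta, n_scales=10)
    # Ratio of mass in successive bands: about 2 for beta = 1 (long links
    # dominate), about 1 for beta = 2 (every scale weighted alike),
    # about 0.5 for beta = 3 (local links dominate).
    print(beta, round(masses[9] / masses[8], 3))
```

Only at the critical exponent does each doubling of distance (street, city, country, world) receive roughly the same share of a node's connections.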

Information cascades: The spread of information, ideas or fads across large populations. Two critical factors determine information cascades: network configuration and “conformity”.

Asch conformity experiment: A majority of the subjects decided to conform to the group opinion, even though the correct answer was starkly visible. The probability of conformance was found to be a function of the ratio of the majority to the minority, rather than of absolute numbers.

Conformity and cascades: A is more likely than B to adopt a new idea spreading through the network.

Information cascades: An idea originating from ‘a’ cascades to b, c and h when the conformity threshold is 0.5. It never cascades to ‘d’ because d is under pressure from e, f and g to conform to the status quo.
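
A threshold cascade of this kind is easy to simulate. In the sketch below, a node adopts the idea once the fraction of its neighbours that have adopted reaches the conformity threshold. The edge list is hypothetical: the graph on the original slide is not available, so the edges were chosen only to be consistent with the outcome described above.

```python
def cascade(neighbors, seeds, threshold):
    """Threshold cascade: a node adopts once the fraction of its neighbours
    that have adopted reaches `threshold`. Returns the final adopter set."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, nbrs in neighbors.items():
            if node in adopted or not nbrs:
                continue
            share = sum(n in adopted for n in nbrs) / len(nbrs)
            if share >= threshold:
                adopted.add(node)
                changed = True
    return adopted

# Hypothetical edge list (assumption): d has one neighbour inside the
# cascade (c) and three outside it (e, f, g).
edges = [("a", "b"), ("a", "c"), ("b", "h"), ("c", "h"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("d", "g"),
         ("e", "f"), ("f", "g"), ("e", "g")]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

adopters = cascade(neighbors, seeds={"a"}, threshold=0.5)
```

With a threshold of 0.5 the idea stops at the boundary of the tightly knit group around d. Lowering the threshold in this toy graph lets it spread to every node.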

Information cascades: Too little connectivity: insufficient exposure, not conducive to information cascades. Too much connectivity: inertia and conformance, not conducive to information cascades. This is in stark contrast to the epidemic spread of diseases, where high connectivity means greater chances of epidemics.

Emergence of network topology [Venkatasubramanian et al. 2004]: Given a society of n actors (nodes), each actor has survival demands, the supply for which may exist anywhere in the network. The communication network has three optimization criteria: efficiency, robustness and cost.

Emergence of network topology: Cost: each communication channel (edge) adds to the cost. Cost is kept constant by giving each node only one edge. Efficiency: the system is efficient if the all-pairs separation between nodes is minimized. Robustness: the system is robust if the network remains connected in the face of node failures.

Emergence of network topology: Topology breeding: Cost is kept constant by giving each node exactly one edge. Robustness is bounded by allowing the failure of any one node. Random topologies are generated and combined. Topologies with lower fit functions are discarded. Fit is calculated using a parameter $\lambda$ that trades off efficiency against robustness.

Emergence of network topology: Emergent topology when $\lambda = 1$ (100% importance to efficiency and 0% importance to robustness): the star. The star has the smallest degree of separation for a network of n nodes and n edges. Failure of the central node disconnects the society.

Emergence of network topology: Emergent topology when $\lambda = 0$ (100% importance to robustness and 0% importance to efficiency): the circle. The circle keeps the society connected in the face of a single node failure. High degree of separation (not efficient).

Emergence of network topology: Emergent topology when $\lambda = 0.78$: intermediate values of $\lambda$ give a variety of “hub and spoke” topologies, combinations of circle and star. When $n \to \infty$, the degree distribution in the hub and spoke resembles a power law.
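
The efficiency and robustness trade-off between the two extreme topologies can be measured directly. The sketch below compares a plain star and a ring on 10 nodes using breadth-first search; it does not implement the evolutionary breeding procedure, and the helper names and the node count are assumptions for illustration. The star here has n - 1 edges, a simplification of the one-edge-per-node setup described above.

```python
from collections import deque

def bfs_distances(adj, source, removed=None):
    """Hop distances from `source`, skipping the optional `removed` node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_separation(adj):
    """Average hop distance over all ordered pairs of distinct nodes
    (assumes the graph is connected)."""
    n = len(adj)
    total = sum(sum(bfs_distances(adj, s).values()) for s in adj)
    return total / (n * (n - 1))

def survives_any_single_failure(adj):
    """True if removing any one node leaves the remaining nodes connected."""
    for failed in adj:
        rest = [u for u in adj if u != failed]
        reached = bfs_distances(adj, rest[0], removed=failed)
        if len(reached) != len(rest):
            return False
    return True

n = 10
star = {0: set(range(1, n)), **{i: {0} for i in range(1, n)}}
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
```

Calling `mean_separation` and `survives_any_single_failure` on both graphs shows the star winning on separation and the ring winning on robustness, which is why intermediate weightings favour mixtures of the two.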

Perceived value and saturation: In a society, actors connect to one another to receive “value”. In deciding whether to connect to somebody, there is a “perceived value” function to be optimized. Two cases of networks follow: a small number of partners (costly connections, material exchange networks) and a large number of partners (frictionless connections, information networks).

Perceived value and saturation: When an actor connects to another actor $i$, there is a perceived value $v_i$ attached to that actor. In addition, there is a satisfaction value or saturation limit $S$ for each actor. Connections are established until the accumulated perceived value reaches the required saturation limit. Law of diminishing returns: the perceived value assigned to the $k$-th node decreases as $k$ increases, even if the intrinsic value provided by the node is the same. Cumulative value at node $j$ with $z$ connections of intrinsic value $v$: $S_j^z = \sum_{k=1}^{z} v / k$.

Perceived value and saturation: As $z \to \infty$, the cumulative value at any node $j$ can be approximated as $S_j^z = v [\ln z + c]$. Setting the intrinsic value $v = 1$, the average global satisfaction metric is given by $S = \langle S_j^z \rangle = c + \langle \ln z(j) \rangle$. In other words, the global satisfaction measure grows with the average of the logarithm of the node degree.
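
The logarithmic approximation can be checked numerically. The sketch below assumes a specific form of diminishing returns that is consistent with the ln z + c result: the k-th connection is worth v / k, so the cumulative value is a harmonic sum and c is the Euler-Mascheroni constant. That form and the function names are assumptions, not statements from the slides.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def cumulative_value(z, v=1.0):
    """Accumulated perceived value of z connections when the k-th
    connection is worth v / k (an assumed form of diminishing returns)."""
    return sum(v / k for k in range(1, z + 1))

def log_approximation(z, v=1.0):
    """Large-z approximation v * (ln z + c) with c = Euler's constant."""
    return v * (math.log(z) + EULER_GAMMA)

for z in (10, 100, 10000):
    print(z, cumulative_value(z), log_approximation(z))
```

The gap between the exact sum and the approximation shrinks as z grows, which is why the approximation is stated for large z.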

Perceived value and saturation: Maximum entropy: in addition to saturation, connections are assumed to be made in a least-biased fashion so as to minimize the latent uncertainty about the connection in the face of failures. The resulting distribution of node degrees can be formulated using the maximum entropy principle under the constraint for the global satisfaction function, $S \propto \langle \ln z \rangle$. As $z \to \infty$, we get a power-law distribution: $p(z) \propto z^{-\gamma}$ for some exponent $\gamma > 0$.
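
The step from the entropy principle to a power law can be sketched with Lagrange multipliers. The derivation below is the standard textbook argument under the assumption that the only constraints are normalization and a fixed average of $\ln z$; the multiplier names $\mu$ and $\gamma$ and the constant $s$ are my own notation, and the cited papers may set the problem up differently.

```latex
% Maximize the entropy of the degree distribution p(z)
\max_{p} \; H = -\sum_{z} p(z) \ln p(z)
\quad \text{subject to} \quad
\sum_{z} p(z) = 1, \qquad \sum_{z} p(z) \ln z = s .

% Lagrangian with multipliers \mu (normalization) and \gamma (satisfaction)
\mathcal{L} = -\sum_{z} p(z) \ln p(z)
  - \mu \Big( \sum_{z} p(z) - 1 \Big)
  - \gamma \Big( \sum_{z} p(z) \ln z - s \Big)

% Stationarity with respect to each p(z)
\frac{\partial \mathcal{L}}{\partial p(z)}
  = -\ln p(z) - 1 - \mu - \gamma \ln z = 0
\;\;\Longrightarrow\;\;
p(z) = e^{-1-\mu} \, z^{-\gamma} \;\propto\; z^{-\gamma}
```

The exponent $\gamma$ is fixed by the value of the constraint $s$, and $e^{-1-\mu}$ is the normalization constant.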

The power-law network is hence an optimal network topology in frictionless transactions arising out of a number of individual decisions aiming to maximize value and minimize uncertainty!

Further reading:
L. A. Adamic. Zipf, Power-laws and Pareto: A ranking tutorial. HP Labs technical report. http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html
Karthik B.R., Aditya Ramana Rachakonda, Srinath Srinivasa. Strange Central-Limit Properties of Keyword Queries on the Web. IIITB Technical Report, 2007.
Jon Kleinberg. The small-world phenomenon: An algorithmic perspective. 2000. http://www.cs.cornell.edu/home/kleinber/swn.ps
Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, Vol. 286, pp. 509–512, 1999.
M. Mitzenmacher. A brief history of generative models for power law and lognormal distributions. Internet Mathematics, Vol. 1, No. 2, pp. 226–251, 2003.
M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, Vol. 46, pp. 323–351.
Venkat Venkatasubramanian, Santhoji Katare, Priyan R. Patkar, Fang-ping Mu. Spontaneous emergence of complex optimal networks through evolutionary adaptation. Computers and Chemical Engineering, Vol. 28, pp. 1789–1798, 2004.
Venkat Venkatasubramanian, Dimitris Politis, Priyan Patkar. Entropy maximization as a holistic design principle for complex, optimal networks. AIChE (American Institute of Chemical Engineers) Journal, Vol. 52, No. 3, pp. 1004–1009, March 2006.
