AOTO: Adaptive Overlay Topology Optimization
in Unstructured P2P Systems∗
Yunhao Liu, Zhenyun Zhuang, Li Xiao
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI 48824
{liuyunha, zhuangz1, lxiao}@cse.msu.edu
Lionel M. Ni
Department of Computer Science
Hong Kong University of Science and Technology
Clearwater Bay, Kowloon, Hong Kong
ni@cs.ust.hk
∗ This work was partially supported by Michigan State University IRGP Grant 41114 and by Hong Kong RGC Grant HKUST6161/03E.
Abstract- Peer-to-Peer (P2P) systems are self-organized and
decentralized. However, the mechanism by which peers randomly
join and leave a P2P network causes a topology mismatch
between the P2P logical overlay network and the underlying
physical network. The topology mismatching problem places
great stress on the Internet infrastructure and seriously limits
the performance gains of various search and routing
techniques. We propose the Adaptive Overlay Topology
Optimization (AOTO) technique, an algorithm that builds an
overlay multicast tree among each source node and its direct
logical neighbors, alleviating the mismatching problem by
choosing closer nodes as logical neighbors while providing a
larger query coverage range. AOTO is scalable and completely
distributed in the sense that it does not require global
knowledge of the whole overlay network when each node
optimizes the organization of its logical neighbors. Our
simulations show that AOTO can effectively solve the
mismatching problem and reduce the traffic generated by the
P2P system itself by more than 55%.
I. INTRODUCTION
Peer-to-peer (P2P) systems have received much attention
since the development of Gnutella. The P2P model aims to
further utilize Internet information and resources,
complementing traditional client-server services. P2P
systems can be classified into structured and unstructured
systems [1]. A major factor in the quality and performance
of a P2P system is how effectively information can be
searched for and located among the peers. Many search
techniques have been proposed for structured P2P systems
that use hash functions to tightly couple file placement
(and file location) to the network topology (e.g., [2]).
Although these designs are expected to dramatically improve
search performance, none of them is used in practice because
of the high maintenance traffic required to deliver messages
and update the mapping. Furthermore, it is hard for structured
P2P systems to efficiently support partially matched queries.
In an unstructured P2P system, file placement is random and
has no correlation with the network topology. Unstructured
P2P systems are the most commonly used in today's Internet.
An unstructured P2P system floods queries among peers (as in
Gnutella) or among supernodes (as in KaZaA). This paper
focuses on unstructured P2P systems.
In a P2P system, all participating peers form a P2P network
over a physical network. A P2P network is an abstract,
logical network called an overlay network. When a new peer
wants to join a P2P network, a bootstrapping node provides
the IP addresses of a list of existing peers in the P2P
network. The new peer then tries to connect to these peers.
If some attempts succeed, the connected peers become the new
peer's neighbors. Once connected to the P2P network, the new
peer periodically pings its network connections and obtains
the IP addresses of some other peers in the network, which it
caches. When a peer leaves the P2P network and later wants to
rejoin it, the peer tries to connect to the peers whose IP
addresses it has already cached. This joining mechanism,
together with the fact that peers join and leave randomly,
causes a mismatch between the P2P overlay network topology
and the underlying physical network topology.
Figure 1 shows two examples of mappings between a P2P
overlay topology (A, B, and D are three participating peers)
and a physical topology (nodes A, B, C, and D), where solid
lines denote physical connections and dashed lines denote
overlay (logical) connections. Consider the delivery of a
message from peer A to peer B. In the left figure, A and B
are both P2P neighbors and physical neighbors, so only one
communication is involved. In the right figure, since A and
B are not P2P neighbors, A has to send the message to D,
which then forwards it to B. This involves 5 communications,
as indicated in Fig. 1. Clearly, such a mapping creates much
unnecessary traffic and lengthens the query response time.
We refer to this phenomenon as the topology mismatching
problem.
Figure 1: Two examples of P2P overlay networks.
Studies in [3] show that only 2 to 5 percent of Gnutella
connections link peers within a single autonomous system
(AS), yet more than 40 percent of all Gnutella peers are
located within the top 10 ASes. This means that most
Gnutella-generated traffic crosses AS borders, increasing
topology mismatching costs. The same message can traverse
the same physical link multiple times, causing a large
amount of unnecessary traffic.
GLOBECOM 2003 - 4186 - 0-7803-7974-8/03/$17.00 © 2003 IEEE
To reduce unnecessary flooding traffic and improve search
performance, two approaches have typically been used to
improve on the flooding-based search mechanism. Rather than
flooding a query to all neighbors, the first approach routes
queries to peers that are likely to have the requested items,
using heuristics based on maintained statistical information
[4]. In the second approach, a peer keeps indices of other
peers’ shared content or caches query responses, in the hope
that subsequent queries can be satisfied quickly by the cached
indices or responses [4,5]. The performance gains of these
approaches are also seriously limited by the topology
mismatching problem.
The objective of this paper is to minimize the effects of
topology mismatching. We propose the Adaptive Overlay
Topology Optimization (AOTO) technique to alleviate the
topology mismatching problem. AOTO is scalable and completely
distributed in the sense that it does not require global
knowledge of the whole overlay network when each node
optimizes the organization of its logical neighbors. Our
simulations show that, when AOTO is used in a Gnutella-like
P2P network, the average cost for each query to reach the
same scope of nodes is reduced by about 55% without any loss
of peer autonomy, and the average query response time can be
reduced by 40%.
II. ADAPTIVE OVERLAY TOPOLOGY OPTIMIZATION
A. Inefficient Scenarios
In most flooding-based decentralized P2P networks, such as
LimeWire (Gnutella), each peer forwards a query message to
all of its logical neighbors. Most supernode-based P2P
systems, such as KaZaA, also flood queries among supernodes.
Figure 2(a) depicts an example of the underlying physical
network topology, where the cost of each link is labeled next
to the link. Let node 1 be the source peer that sends flooding
messages to other peers. For simplicity, we only consider the
total traffic (or cost) generated in reaching nodes 2, 3, and 4
on the three different P2P overlay topologies shown in
Fig. 2(b), 2(c), and 2(d), respectively. We assume that a node
reaches another node through the shortest physical path based
on the link cost (metric). Note that the two shaded nodes in
Fig. 2(a) are non-participating nodes in the P2P network.
In Fig. 2(b), nodes 2, 3, and 4 are immediate logical
neighbors of node 1. The shortest physical path from node 1
to node 4 is 1 → 5 → 2 → s → 4, with a total cost of 9.
Similarly, the costs from node 1 to nodes 2 and 3 are 3 and
15, respectively. Thus, the total cost of flooding a message
from node 1 to nodes 2, 3, and 4 is 3+15+9=27. In Fig. 2(c),
node 3 is the only immediate logical neighbor of node 1, and
nodes 2 and 4 are immediate logical neighbors of node 3, so a
message is flooded from node 3 to nodes 2 and 4. The total
cost from node 1 to nodes 2, 3, and 4 is 15+12+6=33, which is
worse than in Fig. 2(b). In Fig. 2(d), node 1 floods the
message to all its neighbors, namely nodes 2, 3, and 4.
However, node 2 does not know that node 3 will receive the
message and floods the message to node 3 as well. Similarly,
node 4 also floods the message to node 3. Thus, the total
cost is 3+15+9+12+6=45.
Figure 2: Examples of different overlay topologies: (a) Physical Topology; (b) Overlay Topology 1; (c) Overlay Topology 2; (d) Overlay Topology 3; (e) An Optimized Topology.
Clearly, all three inefficient overlay topologies generate a
large amount of unnecessary traffic. Optimizing inefficient
overlay topologies can fundamentally improve P2P search
efficiency. One approach is to build an overlay multicast tree
among a node and its logical neighbors. For the case of
Fig. 2(d), an improved mechanism is shown as thick lines in
Fig. 2(e), in which the total cost from node 1 to nodes 2, 3,
and 4 is 3+12+6=21. Although this cost is not as low as that
of optimal IP-level multicast, which is 15, the total cost has
already been significantly reduced. This observation motivates
the Adaptive Overlay Topology Optimization (AOTO) technique
that we propose.
While retaining the prevailing unstructured architecture of
P2P systems, AOTO aims to dynamically optimize the logical
topology to improve the overall performance of P2P systems,
as measured by query response time. AOTO consists of two
steps: Selective Flooding (SF) and Active Topology (AT).
Selective Flooding builds an overlay multicast tree among each
peer and its immediate logical neighbors and routes messages
on the tree, reducing flooding traffic without shrinking the
search coverage range. As a result, some neighbors become
non-flooding neighbors. Active Topology, the second step of
AOTO, lets each peer independently optimize the overlay
topology to alleviate the topology mismatching problem by
replacing non-flooding neighbors with closer nodes as direct
logical neighbors.
B. Selective Flooding
Instead of flooding to all neighbors, SF uses a more
efficient strategy that selectively floods a query on an
overlay multicast tree. This tree can be formed by running a
minimum spanning tree algorithm over each peer and its
immediate logical neighbors. To build the minimum spanning
tree, a peer has to know the costs to all its logical
neighbors and the costs between every pair of those neighbors.
We use the network delay between two nodes as the cost
metric. We modified the LimeWire implementation of the
Gnutella 0.6 P2P protocol by adding one routing message type.
Each peer probes the costs to its immediate logical neighbors
and forms a neighbor cost table. Two neighboring peers
exchange their neighbor cost tables so that a peer can obtain
the cost between any pair of its logical neighbors. Thus, the
source peer knows a small overlay topology consisting of
itself and all its logical neighbors.
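The table exchange described above can be sketched as follows. This is a minimal illustration, not the authors' LimeWire code: it assumes each neighbor cost table is a plain mapping from a peer ID to a measured delay, and the function name `build_local_topology`, the peer IDs, and the costs are all hypothetical.

```python
from typing import Dict

# Hypothetical representation: a neighbor cost table maps peer ID -> measured delay.
CostTable = Dict[str, float]


def build_local_topology(source: str,
                         own_table: CostTable,
                         neighbor_tables: Dict[str, CostTable]) -> Dict[str, Dict[str, float]]:
    """Combine the source's own cost table with the tables its neighbors sent.

    Returns symmetric pairwise costs restricted to the source and its direct
    logical neighbors; pairs that no table reports are simply absent.
    """
    members = {source, *own_table}
    costs: Dict[str, Dict[str, float]] = {peer: {} for peer in members}
    # Costs the source probed itself take precedence.
    for nbr, cost in own_table.items():
        costs[source][nbr] = cost
        costs[nbr][source] = cost
    # Each neighbor's table reveals its costs to the source's other neighbors.
    for nbr, table in neighbor_tables.items():
        if nbr not in members:
            continue  # ignore tables from peers that are not direct neighbors
        for other, cost in table.items():
            if other in members and other != nbr:
                costs[nbr].setdefault(other, cost)
                costs[other].setdefault(nbr, cost)
    return costs


# Hypothetical example: source S with neighbors a and b; a also reports a peer x
# that is not a neighbor of S, so x is dropped from the local topology.
local = build_local_topology(
    "S",
    {"a": 3.0, "b": 15.0},
    {"a": {"S": 3.0, "b": 12.0, "x": 7.0}, "b": {"S": 15.0, "a": 12.0}},
)
print(local["a"]["b"], "x" in local)  # 12.0 False
```

Only costs between members of the source's neighborhood are kept, which is why the exchange stays local to immediate neighbors.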
Compared with the flooding traffic, the traffic generated in
SF by exchanging neighbor cost tables is insignificant,
because such exchanges only occur between immediate
neighbors. For a branching factor (i.e., average number of
direct logical neighbors) of m and a TTL (the number of times
a message will be forwarded) of k, the flooding traffic is
$O(mN)$ for each query, where N is the total number of peers,
which is typically in the range of millions, and m is in the
range of tens. The additional traffic due to exchanging
neighbor cost tables is $O(m)$, which is trivial. Based on the
obtained neighbor cost tables, a minimum spanning tree can
then be built simply by using an algorithm such as Prim's,
which has a computation complexity of $O(m^2)$. The message
routing strategy of a peer is then to send its queries only to
the peers that are its direct neighbors in the multicast tree.
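A minimal sketch of the SF tree construction, assuming the pairwise costs have already been assembled into a dictionary. It uses the dense $O(m^2)$ variant of Prim's algorithm and treats the source's children in the resulting tree as flooding neighbors; the function names and example costs are hypothetical and are not the values of Fig. 2.

```python
import math
from typing import Dict, List, Tuple


def prim_tree(costs: Dict[str, Dict[str, float]], root: str) -> Dict[str, str]:
    """Dense O(m^2) Prim's algorithm; returns parent[v] for every reached node.

    costs[u][v] is the known cost between u and v; a missing pair is treated
    as an unknown (infinite) cost.
    """
    best = {v: math.inf for v in costs}  # cheapest known edge into the tree
    best[root] = 0.0
    parent: Dict[str, str] = {}
    in_tree = set()
    for _ in range(len(costs)):
        # Linear scan for the closest node outside the tree: O(m) per step.
        u = min((v for v in costs if v not in in_tree), key=lambda v: best[v])
        if best[u] == math.inf:
            break  # remaining nodes are unreachable with the known costs
        in_tree.add(u)
        for v, c in costs[u].items():
            if v in best and v not in in_tree and c < best[v]:
                best[v] = c
                parent[v] = u
    return parent


def split_neighbors(costs: Dict[str, Dict[str, float]],
                    source: str) -> Tuple[List[str], List[str]]:
    """Flooding neighbors are the source's children in the tree; the rest are non-flooding."""
    parent = prim_tree(costs, source)
    flooding = sorted(v for v, p in parent.items() if p == source)
    non_flooding = sorted(v for v in costs[source] if v not in flooding)
    return flooding, non_flooding


costs = {
    "S": {"a": 2, "b": 10, "c": 11},
    "a": {"S": 2, "b": 4, "c": 5},
    "b": {"S": 10, "a": 4, "c": 9},
    "c": {"S": 11, "a": 5, "b": 9},
}
print(split_neighbors(costs, "S"))  # (['a'], ['b', 'c'])
```

Here S sends queries only to a, which forwards them to b and c; b and c remain connected to S as non-flooding neighbors.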
In the example of Fig. 2(e), node 1 sends a message only to
node 2 and expects node 2 to forward the message to nodes 3
and 4. Note that in this step, even though node 1 no longer
floods its query messages to nodes 3 and 4, it still retains
its connections with nodes 3 and 4 and keeps exchanging
neighbor cost tables with them. We call nodes 3 and 4
non-flooding neighbors; they are the peers to be optimized in
the next step.
C. Active Topology
The second step of AOTO, AT, reorganizes the overlay
topology. Note that each peer has a neighbor list, which SF
further divides into flooding neighbors and non-flooding
neighbors. Each peer also has the neighbor cost tables of all
its neighbors. In this step, a peer tries to replace physically
distant neighbors with physically closer ones, thus minimizing
the topology mismatching traffic. An efficient method for
identifying a candidate peer to replace a distant neighbor is
critical to system performance, and many such methods are
possible. In our approach, a non-flooding neighbor may be
replaced by one of that non-flooding neighbor's neighbors.
Let $C_{ij}$ represent the cost from peer i to peer j. The
proposed Randomized AT algorithm picks a candidate peer at
random from among the non-flooding neighbor's neighbors. The
following pseudo code describes the randomized AT algorithm
for a given source peer i.
Pseudo Code of the Randomized AT Algorithm (peer i)
For each j in i's non-flooding neighbors
    Replaced = false; List = all j's neighbors excluding i;
    While List is not empty and Replaced = false
        randomly remove a peer h from List;
        measure Cih;
        if Cih < Cij {replace j by h in i's neighbor list;
            Replaced = true;}
        else if Cih < Cjh {add h to i's neighbor list;
            remove j from i's neighbor list once
            i finds that link j-h is disconnected;
            Replaced = true;}
    End While;
End For;
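The pseudo code can be rendered as the following Python sketch. It is an illustration under stated assumptions, not the authors' implementation: cost tables are plain dictionaries, `measure` stands in for a real delay probe, and the deferred removal of j is modeled by a hypothetical `drop_pending` helper that would run whenever a refreshed neighbor cost table arrives. The usage lines replay the i=1, j=3, h=6 case explained below, restricted to the single candidate h=6.

```python
import random
from typing import Callable, Dict, List, Set, Tuple


def randomized_at(i: str,
                  neighbors: Set[str],
                  non_flooding: List[str],
                  tables: Dict[str, Dict[str, float]],
                  measure: Callable[[str, str], float],
                  rng: random.Random) -> List[Tuple[str, str]]:
    """One pass of the randomized AT step for peer i.

    neighbors:     i's neighbor list, updated in place.
    tables[i][j]:  C_ij from i's own neighbor cost table.
    tables[j][h]:  C_jh from the table that neighbor j sent to i.
    measure(i, h): hypothetical probe that returns C_ih.
    Returns pending (j, h) pairs: j is dropped later, once link j-h is gone.
    """
    pending: List[Tuple[str, str]] = []
    for j in non_flooding:
        candidates = [h for h in tables[j] if h != i]  # List = j's neighbors minus i
        rng.shuffle(candidates)                         # random removal order
        for h in candidates:
            c_ih = measure(i, h)
            if c_ih < tables[i][j]:
                neighbors.discard(j)  # h is closer than j: replace j by h
                neighbors.add(h)
                break
            if c_ih < tables[j][h]:
                neighbors.add(h)      # keep j until j drops its link to h
                pending.append((j, h))
                break
    return pending


def drop_pending(neighbors: Set[str],
                 pending: List[Tuple[str, str]],
                 tables: Dict[str, Dict[str, float]]) -> List[Tuple[str, str]]:
    """Remove j once j's refreshed cost table no longer lists h."""
    still_waiting = []
    for j, h in pending:
        if h in tables.get(j, {}):
            still_waiting.append((j, h))
        else:
            neighbors.discard(j)
    return still_waiting


tables = {"1": {"2": 3, "3": 15, "4": 9}, "3": {"1": 15, "6": 30}}
nbrs = {"2", "3", "4"}
pending = randomized_at("1", nbrs, ["3"], tables,
                        lambda a, b: {("1", "6"): 20}[(a, b)], random.Random(0))
print(sorted(nbrs), pending)  # ['2', '3', '4', '6'] [('3', '6')]
tables["3"] = {"1": 15}       # node 3 later drops node 6 from its neighbor list
pending = drop_pending(nbrs, pending, tables)
print(sorted(nbrs), pending)  # ['2', '4', '6'] []
```

Shuffling the candidate list once and scanning it in order is equivalent to repeatedly removing a random peer from List until a replacement occurs.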
Note that $C_{ij}$ is known to peer i. $C_{jh}$ is also known
to peer i because i and j exchange neighbor cost tables. The
cases $C_{ih} < C_{ij}$ and $C_{ih} \ge C_{jh}$ are
straightforward. We explain the case $C_{ih} < C_{jh}$ using
Fig. 2(d), where i=1 and j=3. Suppose h=6. From Fig. 2(d), we
have $C_{1,3}=15$ and $C_{3,6}=30$. From Fig. 2(a), we can
measure $C_{1,6}=20$. When node 1 finds that the cost between
nodes 3 and 6 is even larger than the cost between nodes 1 and
6, node 1 keeps node 6 as a new neighbor. Since the algorithm
is executed in each peer independently, node 1 cannot instruct
node 3 to remove node 6 from node 3's neighbor list. However,
as long as node 1 keeps both nodes 3 and 6 as its logical
neighbors, we can expect node 6 to become a non-flooding
neighbor of node 3 after node 3's SF step, since node 3 expects
node 1 to forward messages to node 6 to reduce unnecessary
traffic. Node 3 will then try to find another peer to replace
node 6 as its neighbor. After learning from the periodically
exchanged neighbor cost tables of node 3 (or node 6) that node
6 is no longer a neighbor of node 3, node 1 removes node 3 from
its neighbor list. In any case, node 1 has already stopped
sending query messages to node 3 since its spanning tree was
built.
III. PERFORMANCE EVALUATION
This section evaluates the performance of the proposed AOTO
method. The simulation study needs both physical topologies
and logical overlay topologies that accurately reflect the
topological properties of real networks at each layer.
Previous studies have shown that both large-scale Internet
physical topologies [6] and P2P overlay topologies [7] follow
small-world and power-law properties. The power law describes
the node degree distribution, while the small-world property
describes path length and clustering coefficient [9]. The
study in [6] found that topologies generated using the AS Model
have small-world and power-law properties. BRITE is a topology
generation tool that provides the option of generating
topologies based on the AS Model. We generate 10 physical
topologies, each with 5000 nodes. The logical topologies are
generated with the number of peers (nodes) ranging from 100 to
2000. For each number of nodes, we generate logical topologies
with an average number of edge connections between 1 and 20.
Figure 3: Effect due to different physical topologies for SF (x-axis: Different Physical Topologies; y-axis: Average Optimization Rate).
Figure 4: Effect due to different logical topologies for SF (x-axis: Number of Nodes in logical topologies; y-axis: Average Optimization Rate).
Figure 5: Effect due to different number of logical neighbors (x-axis: Logical Neighbors; y-axis: Average Optimization Rate).
A. Performance Metrics
Let $T_k$ denote the cost for a query from the source to reach
all its neighbors, where $T_{-1}$ is the cost under traditional
blind flooding, $T_0$ is the cost when SF alone is applied for
the first time, and $T_k$ ($k \ge 1$) is the cost after the
k-th application of the randomized AT algorithm. Let
$\Delta_k = (T_{k-1} - T_k)/T_{k-1} \times 100\%$. Whenever a
new neighbor cost table is received or the set of neighbors
changes, the source peer has to recalculate the multicast tree
and apply the randomized AT algorithm. In theory, the source
peer could repeat this until no further cost improvement is
obtained, approaching a perfect topology match. Clearly, this
is unnecessary and creates too much overhead. We refer to k as
the number of optimization steps.
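As a small worked illustration of the definition of $\Delta_k$, the sketch below computes the optimization rates from a sequence of costs. The function name and the input costs are hypothetical, not simulation data.

```python
from typing import List


def optimization_rates(costs: List[float]) -> List[float]:
    """Given [T_-1, T_0, T_1, ...], return [Delta_0, Delta_1, ...] in percent.

    Delta_k = (T_{k-1} - T_k) / T_{k-1} * 100, so Delta_0 compares SF against
    blind flooding and Delta_k (k >= 1) measures the k-th AT step.
    """
    return [(prev - cur) / prev * 100.0 for prev, cur in zip(costs, costs[1:])]


# Hypothetical costs: blind flooding, SF only, then two AT steps.
rates = optimization_rates([200.0, 90.0, 81.0, 72.9])
```

These inputs give rates of about 55%, 10%, and 10%: SF alone removes more than half of the flooding cost, and each AT step then removes a further tenth of the remaining cost.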
During the k-th application of the AT algorithm, each
non-flooding neighbor may be replaced by one of its neighbors.
If the source has n non-flooding neighbors, the randomized AT
algorithm may perform up to n replacements, and the overhead of
exhausting all n possible replacements may be too high. In
practice, after each replacement, the source peer computes the
cost improvement ratio and decides, based on a termination
threshold ∆, whether to find another candidate peer to replace
another non-flooding neighbor. The optimization process
terminates if the improvement ratio is less than ∆. Thus, the
value of ∆ affects the effectiveness of AT: a smaller threshold
causes larger overhead but produces better optimization
results, as shown later in this section.
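The termination rule can be sketched as a simple loop. This is a hedged illustration under our reading of the text: `next_replacement` is a hypothetical hook that performs one candidate replacement and returns the new query cost (or None when no candidate is left), the threshold is a fraction rather than a percentage, and the replacement whose improvement falls below the threshold is kept before the process stops.

```python
from typing import Callable, List, Optional


def optimize_with_threshold(cost: float,
                            next_replacement: Callable[[], Optional[float]],
                            threshold: float) -> List[float]:
    """Keep replacing non-flooding neighbors until the improvement ratio < threshold.

    Returns the sequence of query costs observed, starting with `cost`.
    """
    history = [cost]
    while True:
        new_cost = next_replacement()
        if new_cost is None:
            break  # no candidate peer left to try
        ratio = (history[-1] - new_cost) / history[-1]
        history.append(new_cost)
        if ratio < threshold:
            break  # improvement too small: stop optimizing
    return history


# Hypothetical costs after successive replacements, with a 10% threshold.
steps = iter([80.0, 60.0, 55.0, 50.0])
history = optimize_with_threshold(100.0, lambda: next(steps, None), 0.1)
```

Here the third replacement improves the cost by only about 8%, so the loop stops after it and the fourth candidate is never tried; a lower threshold would keep going, trading more overhead for a lower cost.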
To evaluate the optimization result of a logical topology, we
use the average distance D as the metric. Let $D_i$ be the
average distance between the source peer i and all its logical
neighbors; D is defined as the average of $D_i$ over all peers
i in the P2P network. In the ideal case, each node has its
physically closest P2P nodes as its logical neighbors while
keeping at least the same query coverage range, and the
topology mismatching problem is effectively solved. Since we
assume that a peer always reaches its logical neighbors
through the shortest physical path, our simulator calculates
the shortest physical path between pairs of peers.
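The metric D can be computed as in the following sketch, which assumes the physical network is a connected, undirected weighted graph stored as nested dictionaries and uses Dijkstra's algorithm for the shortest physical paths. The graph, overlay, and function names are hypothetical, and skipping peers without logical neighbors is our assumption.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {adjacent node: link cost}


def shortest_paths(graph: Graph, src: str) -> Dict[str, float]:
    """Dijkstra's algorithm: shortest physical distance from src to every node."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def average_distance(physical: Graph, overlay: Dict[str, List[str]]) -> float:
    """D = mean over peers i of D_i, the mean physical distance to i's logical neighbors."""
    per_peer = []
    for i, nbrs in overlay.items():
        if not nbrs:
            continue  # assumption: peers without logical neighbors are skipped
        dist = shortest_paths(physical, i)
        per_peer.append(sum(dist[j] for j in nbrs) / len(nbrs))
    return sum(per_peer) / len(per_peer) if per_peer else 0.0


physical = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 2}, "C": {"A": 5, "B": 2}}
overlay = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
D = average_distance(physical, overlay)
```

In this toy example the logical link A-C is routed physically through B at cost 3 instead of the direct link of cost 5, giving $D_A = 2$, $D_B = 1$, $D_C = 3$, and D = 2.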
We have simulated AOTO for all the generated logical
topologies on top of each of the 10 generated physical
topologies with 5000 nodes. We have also simulated AOTO on a
real-world P2P topology (based on a DSS Clip2 trace) and
obtained results consistent with those on the generated
topologies. To allow a more thorough performance discussion
across varied settings, we present only the results on the
generated topologies.
B. Effectiveness of Selective Flooding
To evaluate the effectiveness of SF, we consider three
factors. First, how do different physical topologies affect
the effectiveness of SF? We compare the average $\Delta_0$
after selective flooding for a 1000-node logical topology with
an average of 8 edge connections on top of each of the 10
physical topologies. Figure 3 shows consistent $\Delta_0$
values ranging from 55% to 60% across the physical topologies.
Thus, the physical topology has little impact on the
effectiveness of SF.
Second, how do different logical topologies affect the
effectiveness of SF? For a given physical topology and a given
average number of edge connections, we compare the average
$\Delta_0$ of SF on logical topologies ranging from 50 to 1000
nodes. The results in Fig. 4 show that the density of P2P nodes
does not influence the effectiveness of SF: the average
optimization rate $\Delta_0$ for each topology is around 55%.
Third, how does the average number of logical neighbors
affect the effectiveness of SF? We compare the average
$\Delta_0$ of 20 logical topologies of 500 nodes each, with the
average number of logical neighbors ranging from 2 to 40.
Figure 5 shows that SF is more effective with a larger number
of logical neighbors. For example, SF achieves $\Delta_0$ as
high as 87.4% on a logical topology with an average of 30
logical neighbors. In many P2P systems, it is common for the
out-degree of a supernode to reach 30 [4].
C. Effectiveness of Active Topology
The average number of logical neighbors is a major factor in
the effectiveness of SF, but not in that of AT. We compare the
average reductions of D on three 500-node logical topologies
with an average of 6, 12, and 18 logical neighbors as the
number of optimization steps increases.
Figure 6: Effect on the number of logical neighbors (x-axis: Optimization Steps; y-axis: Normalized Average Distance; curves: 6, 12, and 18 Neighbors).
Figure 7: Comparison of different termination thresholds (x-axis: Optimization Steps; y-axis: Normalized Average Distance; curves: thresholds of 80%, 60%, 40%, 20%, and 0%).
Figure 8: Normalized improvement of average query response time (x-axis: Optimization Steps; y-axis: Average Response Time).
Results in Fig. 6 show that the number of logical neighbors
has little impact on the effectiveness of AT.
The termination threshold ∆ can be set by each node
independently. In the simulations above, we set ∆ = 20% for
every node. To show the trade-off between overhead and
effectiveness, Fig. 7 plots the normalized average distance D
for different thresholds as the number of optimization steps
increases. As expected, D is reduced more slowly with a larger
threshold, and a lower threshold leads to a better result.
D. The Improvement of Query Response Time
The average response time of each query is what a user of a
P2P system really cares about. To show the overall improvement
of AOTO, Fig. 8 plots the normalized average query response
time as the number of optimization steps increases from 1 to
40. Given a source peer, the destination peer is chosen at
random. As shown in Fig. 8, the average query response time is
significantly reduced.
IV. CONCLUSIONS AND FUTURE WORK
This paper aims to alleviate the topology mismatching problem
by optimizing the overlay network with the proposed AOTO
technique, reducing unnecessary traffic and improving query
search efficiency. To the best of our knowledge, the topology
mismatching problem has not been adequately addressed. The only
related work we know of is Narada [8]. Based on end system
multicast, Narada first constructs a richly connected graph on
which it then constructs shortest path spanning trees. This
approach introduces a large overhead for forming the graph and
trees over a large scope, and it does not consider peers
dynamically joining and leaving. The overhead of Narada is
proportional to the multicast group size.
The proposed AOTO technique is easy to implement and adaptive
to the dynamic nature of P2P systems. Furthermore, the overhead
of the AOTO algorithm is proportional only to the average
number of logical neighbors. AOTO is more effective on logical
topologies with large branching factors. It can make
decentralized flooding-based P2P file sharing systems more
scalable and efficient.
Due to space limitations, many simulation results and design
alternatives are not included in this paper. Instead of the
randomized AT algorithm, we have also studied a Closest AT
algorithm, which replaces a non-flooding neighbor with its
least-cost neighbor; it optimizes the overlay topology further
with fewer optimization steps, at the expense of more overhead.
More research is needed to study other AOTO policies, the
frequency of invoking the AOTO algorithm, the impact of
different multicast tree algorithms, and the impact of
implementation overhead.
ACKNOWLEDGMENTS
We thank Xiaomei Liu for her participation in the early stage
of the work. We are grateful to the anonymous reviewers for
their insightful and constructive comments.
REFERENCES
[1] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, “Search and replication in unstructured peer-to-peer networks,” Proc. of the 16th ACM Int’l Conf. on Supercomputing, June 2002.
[2] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for Internet applications,” Proc. of SIGCOMM 2001.
[3] M. Ripeanu, A. Iamnitchi, and I. Foster, “Mapping the Gnutella Network,” IEEE Internet Computing, January/February 2002, pp. 50-57.
[4] B. Yang and H. Garcia-Molina, “Designing super-peer network,” Proc. of the 22nd Int’l Conf. on Distributed Computing Systems (ICDCS 2002), Vienna, Austria, July 2002.
[5] K. Sripanidkulchai, B. Maggs, and H. Zhang, “Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems,” Proc. of INFOCOM 2003.
[6] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, “Network Topology Generators: Degree-Based vs. Structural,” Proc. of SIGCOMM 2002, August 2002.
[7] S. Saroiu, P. K. Gummadi, and S. D. Gribble, “A measurement study of peer-to-peer file sharing systems,” Proc. of Multimedia Computing and Networking 2002, January 2002.
[8] Y. Chu, S. G. Rao, and H. Zhang, “A Case For End System Multicast,” Proc. of ACM SIGMETRICS, Santa Clara, CA, June 2000, pp. 1-12.
[9] T. Bu and D. Towsley, “On Distinguishing between Internet power law topology generators,” Proc. of INFOCOM 2002.
JAVA 2013 IEEE NETWORKING PROJECT Transfer reliability and congestion control...
ENFORCING END-TO-END PROPORTIONAL FAIRNESS WITH BOUNDED BUFFER OVERFLOW PROBA...
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Path loss exponent estimation
Analytical Modelling of Localized P2P Streaming Systems under NAT Consideration
G1063841
International Journal of Engineering Research and Development
Computer Network Topology By Team_ Paramount (Dept. English)
EXPLORING PEER-TO-PEER DATA MINING
Exploring Peer-To-Peer Data Mining

More from Zhenyun Zhuang (14)

PDF
Designing SSD-friendly Applications for Better Application Performance and Hi...
PDF
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
PDF
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
PDF
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
PDF
WebAccel: Accelerating Web access for low-bandwidth hosts
PDF
Dynamic Layer Management in Super-Peer Architectures
PDF
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
PDF
Enhancing Intrusion Detection System with Proximity Information
PDF
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
PDF
Optimizing JMS Performance for Cloud-based Application Servers
PDF
Capacity Planning and Headroom Analysis for Taming Database Replication Latency
PDF
OS caused Large JVM pauses: Deep dive and solutions
PDF
Wireless memory: Eliminating communication redundancy in Wi-Fi networks
PDF
Improving energy efficiency of location sensing on smartphones
Designing SSD-friendly Applications for Better Application Performance and Hi...
Optimized Selection of Streaming Servers with GeoDNS for CDN Delivered Live S...
Optimizing CDN Infrastructure for Live Streaming with Constrained Server Chai...
Application-Aware Acceleration for Wireless Data Networks: Design Elements an...
WebAccel: Accelerating Web access for low-bandwidth hosts
Dynamic Layer Management in Super-Peer Architectures
Hybrid Periodical Flooding in Unstructured Peer-to-Peer Networks
Enhancing Intrusion Detection System with Proximity Information
SLA-aware Dynamic CPU Scaling in Business Cloud Computing Environments
Optimizing JMS Performance for Cloud-based Application Servers
Capacity Planning and Headroom Analysis for Taming Database Replication Latency
OS caused Large JVM pauses: Deep dive and solutions
Wireless memory: Eliminating communication redundancy in Wi-Fi networks
Improving energy efficiency of location sensing on smartphones

Recently uploaded (20)

PDF
August -2025_Top10 Read_Articles_ijait.pdf
PPTX
Management Information system : MIS-e-Business Systems.pptx
PDF
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
PDF
UNIT no 1 INTRODUCTION TO DBMS NOTES.pdf
PPT
Total quality management ppt for engineering students
PPTX
Software Engineering and software moduleing
PPTX
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
PDF
ChapteR012372321DFGDSFGDFGDFSGDFGDFGDFGSDFGDFGFD
PDF
Visual Aids for Exploratory Data Analysis.pdf
PDF
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
PPTX
tack Data Structure with Array and Linked List Implementation, Push and Pop O...
PPTX
Feature types and data preprocessing steps
PDF
distributed database system" (DDBS) is often used to refer to both the distri...
PPTX
Module 8- Technological and Communication Skills.pptx
PDF
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
PDF
22EC502-MICROCONTROLLER AND INTERFACING-8051 MICROCONTROLLER.pdf
PDF
Improvement effect of pyrolyzed agro-food biochar on the properties of.pdf
PPTX
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
PPTX
CyberSecurity Mobile and Wireless Devices
PPTX
Current and future trends in Computer Vision.pptx
August -2025_Top10 Read_Articles_ijait.pdf
Management Information system : MIS-e-Business Systems.pptx
Human-AI Collaboration: Balancing Agentic AI and Autonomy in Hybrid Systems
UNIT no 1 INTRODUCTION TO DBMS NOTES.pdf
Total quality management ppt for engineering students
Software Engineering and software moduleing
AUTOMOTIVE ENGINE MANAGEMENT (MECHATRONICS).pptx
ChapteR012372321DFGDSFGDFGDFSGDFGDFGDFGSDFGDFGFD
Visual Aids for Exploratory Data Analysis.pdf
Accra-Kumasi Expressway - Prefeasibility Report Volume 1 of 7.11.2018.pdf
tack Data Structure with Array and Linked List Implementation, Push and Pop O...
Feature types and data preprocessing steps
distributed database system" (DDBS) is often used to refer to both the distri...
Module 8- Technological and Communication Skills.pptx
SMART SIGNAL TIMING FOR URBAN INTERSECTIONS USING REAL-TIME VEHICLE DETECTI...
22EC502-MICROCONTROLLER AND INTERFACING-8051 MICROCONTROLLER.pdf
Improvement effect of pyrolyzed agro-food biochar on the properties of.pdf
Graph Data Structures with Types, Traversals, Connectivity, and Real-Life App...
CyberSecurity Mobile and Wireless Devices
Current and future trends in Computer Vision.pptx

AOTO: Adaptive overlay topology optimization in unstructured P2P systems

A major factor determining the quality and performance of a P2P system is how effectively information is searched for and located among the peers. Many search techniques have been proposed for structured P2P systems based on hash functions that tightly tie file placement (and file locating) to the network topology (e.g., [2]). Although these designs are expected to dramatically improve search performance, none of them is used in practice because of their high maintenance traffic for delivering messages and updating the mapping. Furthermore, it is hard for structured P2P systems to efficiently support partially matched queries.

In an unstructured P2P system, file placement is random and has no correlation with the network topology. Unstructured P2P systems are the most commonly used in today's Internet. An unstructured P2P system floods queries among peers (as in Gnutella) or among supernodes (as in KaZaA). This paper focuses on unstructured P2P systems.

In a P2P system, all participating peers form a P2P network over a physical network. A P2P network is an abstract, logical network called an overlay network. When a new peer wants to join a P2P network, a bootstrapping node provides the IP addresses of a list of existing peers in the P2P network. The new peer then tries to connect with these peers. If some attempts succeed, the connected peers become the new peer's neighbors. Once connected, the new peer periodically pings its network connections and obtains the IP addresses of some other peers in the network; these IP addresses are cached by the new peer. When a peer leaves the P2P network and later wants to rejoin it (no longer for the first time), the peer tries to connect to the peers whose IP addresses it has already cached.
This joining mechanism, together with the fact that peers join and leave randomly, causes an interesting matching problem between the P2P overlay network topology and the underlying physical network topology. Figure 1 shows two examples of mappings between a P2P overlay topology (A, B, and D are three participating peers) and a physical topology (nodes A, B, C, and D), where solid lines denote physical connections and dashed lines denote overlay (logical) connections. Consider the delivery of a message from peer A to peer B. In the left figure, A and B are both P2P neighbors and physical neighbors, so only one communication is involved. In the right figure, since A and B are not P2P neighbors, A has to send the message to D, which then forwards it to B. This involves 5 communications, as indicated in Fig. 1. Clearly, such a mapping creates much unnecessary traffic and lengthens the query response time. We refer to this phenomenon as the topology mismatching problem.

Figure 1: Two examples of P2P overlay networks.

Studies in [3] show that only 2 to 5 percent of Gnutella connections link peers within a single autonomous system (AS), yet more than 40 percent of all Gnutella peers are located within the top 10 ASes. This means that most Gnutella-generated traffic crosses AS borders, increasing the cost of topology mismatching. The same message can traverse the same physical link multiple times, causing a large amount of unnecessary traffic.

GLOBECOM 2003 - 4186 - 0-7803-7974-8/03/$17.00 © 2003 IEEE

To reduce unnecessary flooding traffic and improve search performance, two approaches have typically been used to improve on the flooding-based search mechanism. In the first approach, rather than flooding a query to all neighbors, a peer routes queries to peers that are likely to have the requested items, using heuristics based on maintained statistical information [4]. In the second approach, a peer keeps indices of other peers' shared content or caches query responses, in the hope that subsequent queries can be satisfied quickly by the cached indices or responses [4,5]. The performance gains of these approaches are also seriously limited by the topology mismatching problem.

The objective of this paper is to minimize the effect of topology mismatching. We propose Adaptive Overlay Topology Optimization (AOTO) to alleviate the topology mismatching problem. AOTO is scalable and completely distributed in the sense that it does not require global knowledge of the whole overlay network when each node is optimizing the organization of its logical neighbors. Our simulation shows that, when AOTO is used in a Gnutella-like P2P network, the average cost of each query to reach the same scope of nodes is reduced by about 55% without losing any autonomy feature, and the average response time of each query can be reduced by 40%.

II. ADAPTIVE OVERLAY TOPOLOGY OPTIMIZATION

A. Inefficient Scenarios

In most flooding-based decentralized P2P networks, such as LimeWire (Gnutella), each peer forwards a query message to all of its logical neighbors. Most supernode-based P2P systems, such as KaZaA, also flood queries among supernodes.
Figure 2(a) depicts an example of the underlying physical network topology, where each link is labeled with its cost. Let node 1 be the source peer that sends flooding messages to other peers. For simplicity, we only consider the total traffic (or cost) generated in reaching nodes 2, 3, and 4 on three different P2P overlay topologies, shown in Fig. 2(b), 2(c), and 2(d), respectively. We assume that a node reaches another node through the shortest physical path based on the link cost (metric). Note that the two shaded nodes in Fig. 2(a) are non-participating nodes in the P2P network.

In Fig. 2(b), nodes 2, 3, and 4 are immediate logical neighbors of node 1. The shortest physical path from node 1 to node 4 is 1 5 2 s 4, with a total cost of 9. Similarly, the costs from node 1 to nodes 2 and 3 are 3 and 15, respectively. Thus, the total cost of flooding a message from node 1 to nodes 2, 3, and 4 is 3+15+9=27. In Fig. 2(c), node 3 is the only immediate logical neighbor of node 1, and nodes 2 and 4 are immediate logical neighbors of node 3. A message is flooded from node 3 to nodes 2 and 4. The total cost from node 1 to nodes 2, 3, and 4 is 15+12+6=33, which is worse than the case of Fig. 2(b). In Fig. 2(d), node 1 floods the message to all its neighbors, that is, nodes 2, 3, and 4. However, node 2 does not know that node 3 will receive the message, so it floods the message to node 3 as well. Similarly, node 4 also floods the message to node 3. Thus, the total cost is 3+15+9+12+6=45.

Figure 2: Examples of different overlay topologies. (a) Physical topology; (b) Overlay topology 1; (c) Overlay topology 2; (d) Overlay topology 3; (e) An optimized topology.

Clearly, all three inefficient overlay topologies generate a large amount of unnecessary traffic.
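The cost arithmetic above can be replayed in a few lines. This is a minimal Python sketch that, following the paper's assumption that every overlay hop travels along a shortest physical path, charges each overlay transmission at the pairwise costs quoted in the text (C(1,2)=3, C(1,3)=15, C(1,4)=9, C(2,3)=12, C(3,4)=6). The helper name `flooding_cost` and the hop lists are illustrative only.

```python
# Pairwise shortest-physical-path costs quoted in the text for Fig. 2.
# Keys are unordered peer pairs, so costs are symmetric.
COST = {
    frozenset({1, 2}): 3,
    frozenset({1, 3}): 15,
    frozenset({1, 4}): 9,
    frozenset({2, 3}): 12,
    frozenset({3, 4}): 6,
}

def flooding_cost(transmissions):
    """Total cost of a list of overlay transmissions (sender, receiver)."""
    return sum(COST[frozenset(pair)] for pair in transmissions)

# Fig. 2(b): node 1 sends directly to its three logical neighbors.
overlay_b = [(1, 2), (1, 3), (1, 4)]
# Fig. 2(c): node 1 reaches node 3, which floods to nodes 2 and 4.
overlay_c = [(1, 3), (3, 2), (3, 4)]
# Fig. 2(d): node 1 floods to 2, 3, 4; nodes 2 and 4 then re-flood to node 3.
overlay_d = [(1, 2), (1, 3), (1, 4), (2, 3), (4, 3)]

for name, hops in [("b", overlay_b), ("c", overlay_c), ("d", overlay_d)]:
    print(name, flooding_cost(hops))
```

The redundant transmissions in overlay (d), (2, 3) and (4, 3), are exactly what the multicast tree in the next step removes.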
Optimizing inefficient overlay topologies can fundamentally improve P2P search efficiency. One approach is to build an overlay multicast tree among a node and its logical neighbors. For the case of Fig. 2(d), an improved mechanism is shown as thick lines in Fig. 2(e), in which the total cost from node 1 to nodes 2, 3, and 4 is 3+12+6=21. Although this cost is not as low as that of optimal IP-level multicast, which is 15, the total cost has already been significantly reduced. This is the motivation for the Adaptive Overlay Topology Optimization (AOTO) technique we propose.

While retaining the prevailing unstructured architecture of P2P systems, the goal of AOTO is to dynamically optimize the logical topology to improve the overall performance of P2P systems, which can be measured as query response time. AOTO consists of two steps: Selective Flooding (SF) and Active Topology (AT). Selective Flooding builds an overlay multicast tree among each peer and its immediate logical neighbors, and routes messages on the tree to reduce flooding traffic without shrinking the search coverage range. As a result, some neighbors become non-flooding neighbors. Active Topology, the second step of AOTO, lets each peer independently optimize the overlay topology to alleviate the topology mismatching problem by replacing non-flooding neighbors with closer nodes as direct logical neighbors.

B. Selective Flooding

Instead of flooding to all neighbors, SF uses a more efficient strategy that selectively floods a query on an overlay multicast tree. This tree can be formed by running a minimum spanning tree algorithm over each peer and its immediate logical neighbors. To build the minimum spanning tree, a peer has to know the costs to all its logical neighbors and the costs between every pair of those neighbors. We use the network delay between two nodes as the cost metric.

We modify the LimeWire implementation of the Gnutella 0.6 P2P protocol by adding one routing message type. Each peer probes the costs to its immediate logical neighbors and forms a neighbor cost table. Two neighboring peers exchange their neighbor cost tables so that a peer can obtain the cost between any pair of its logical neighbors. Thus, a small overlay topology consisting of a source peer and all its logical neighbors is known to the source peer. Compared with the flooding traffic, the traffic generated in SF by exchanging neighbor cost tables is insignificant, because such exchanges occur only between immediate neighbors. For a branching factor (i.e., the average number of direct logical neighbors) of $m$ and a TTL (the number of times a message is forwarded) of $k$, the flooding traffic is $O(mN)$ for each query, where $N$, the total number of peers, is typically in the millions and $m$ is typically in the tens. The additional traffic due to exchanging neighbor cost tables is $O(m)$, which is trivial. Based on the obtained neighbor cost tables, a minimum spanning tree can then be built with an algorithm such as Prim's, which has a computational complexity of $O(m^2)$.

The message routing strategy of a peer is then to send its queries only to the peers that are its direct neighbors in the multicast tree. In the example of Fig. 2(e), node 1 sends a message only to node 2 and expects node 2 to forward the message to nodes 3 and 4. Note that in this step, even though node 1 no longer floods its query messages to nodes 3 and 4, node 1 still retains its connections with nodes 3 and 4 and keeps exchanging neighbor cost tables with them.
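The SF step can be sketched as follows. This is a minimal Python sketch, not the authors' LimeWire code: it assumes each neighbor cost table is a dict `{peer: cost}`, treats any pair of neighbors with no known cost as non-adjacent, and uses hypothetical function names and cost values. Prim's algorithm is rooted at the source; neighbors attached directly to the source in the tree become flooding neighbors, and the rest become non-flooding neighbors. It uses the heap-based variant of Prim's algorithm rather than the $O(m^2)$ array-based one; on graphs of a few dozen nodes either is cheap.

```python
import heapq

def known_costs(source, own_table, neighbor_tables):
    """Build the small cost graph visible to `source`.

    own_table:       {neighbor: cost from source}
    neighbor_tables: {neighbor: {that neighbor's neighbor: cost}}
    Only pairs among the source and its direct neighbors are kept.
    """
    members = set(own_table) | {source}
    cost = {(source, n): c for n, c in own_table.items()}
    for n, table in neighbor_tables.items():
        for peer, c in table.items():
            if peer in members and peer != source:
                cost[(n, peer)] = c
    return cost

def prim_tree(source, cost):
    """Prim's algorithm rooted at `source`; returns tree edges (parent, child)."""
    adj = {}
    for (u, v), c in cost.items():
        adj.setdefault(u, {})[v] = c
        adj.setdefault(v, {})[u] = c
    in_tree = {source}
    heap = [(c, source, v) for v, c in adj.get(source, {}).items()]
    heapq.heapify(heap)
    edges = []
    while heap:
        c, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already attached more cheaply
        in_tree.add(v)
        edges.append((u, v))
        for w, cw in adj[v].items():
            if w not in in_tree:
                heapq.heappush(heap, (cw, v, w))
    return edges

def selective_flooding(source, own_table, neighbor_tables):
    """Split neighbors into flooding (tree children of source) and non-flooding."""
    tree = prim_tree(source, known_costs(source, own_table, neighbor_tables))
    flooding = {v for u, v in tree if u == source}
    return flooding, set(own_table) - flooding

# Hypothetical tables for source 0 with neighbors 1, 2, 3.
own = {1: 3, 2: 15, 3: 9}
tables = {1: {2: 12, 3: 6}, 2: {1: 12, 3: 20}, 3: {1: 6, 2: 20}}
flooding, non_flooding = selective_flooding(0, own, tables)
print(sorted(flooding), sorted(non_flooding))  # [1] [2, 3]
```

With these made-up costs the tree is 0-1, 1-3, 1-2, so peer 0 forwards queries only to peer 1 and keeps peers 2 and 3 as non-flooding neighbors.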
We call nodes 3 and 4 non-flooding neighbors; they are the peers to be optimized in the next step.

C. Active Topology

The second step of AOTO, AT, reorganizes the overlay topology. Note that each peer has a neighbor list, which SF further divides into flooding neighbors and non-flooding neighbors. Each peer also has the neighbor cost tables of all its neighbors. In this step, a peer tries to replace physically distant neighbors with physically closer ones, thus minimizing the traffic caused by topology mismatching. An efficient method of identifying a candidate peer to replace a distant neighbor is critical to system performance, and many methods are possible. In our approach, a non-flooding neighbor may be replaced by one of that non-flooding neighbor's own neighbors.

Let $C_{ij}$ denote the cost from peer $i$ to peer $j$. The proposed Randomized AT algorithm picks a candidate peer at random from among the non-flooding neighbor's neighbors. The following pseudo code describes the randomized AT algorithm for a given source peer $i$.

Pseudo Code of the Randomized AT Algorithm (peer i)

    For each j in i's non-flooding neighbors
        Replaced = false;
        List = all j's neighbors excluding i;
        While List is not empty and Replaced = false
            randomly remove a peer h from List;
            measure C_ih;
            if C_ih < C_ij
                {replace j by h in i's neighbor list; Replaced = true;}
            else if C_ih < C_jh
                {add h to i's neighbor list;
                 remove j from i's neighbor list right after i finds out that the link j-h is disconnected;
                 Replaced = true;}
        End While;
    End For;

Note that $C_{ij}$ is known to peer $i$. $C_{jh}$ is also known to peer $i$ because of the exchange of neighbor cost tables between $i$ and $j$. The cases $C_{ih} < C_{ij}$ and $C_{ih} \ge C_{jh}$ are straightforward. We explain the remaining case, $C_{ij} \le C_{ih} < C_{jh}$, using Fig. 2(d), where $i=1$ and $j=3$. Suppose $h=6$. From Fig. 2(d), we have $C_{1,3}=15$ and $C_{3,6}=30$. From Fig. 2(a), we can measure $C_{1,6}=20$.
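The pseudo code translates almost line for line into Python. In this sketch the cost tables are plain dicts keyed by peer pairs, `measure` is a hypothetical callable standing in for the probe that obtains $C_{ih}$, and all function and parameter names are illustrative. Shuffling the candidate list and scanning it in order is equivalent to repeatedly removing a random peer from List. The deferred removal of $j$ is returned to the caller rather than performed, since it depends on later cost-table exchanges.

```python
import random

def randomized_at(i, neighbors, non_flooding, neighbors_of, cost, measure,
                  rng=random):
    """One pass of the randomized AT algorithm for peer i (a sketch).

    neighbors:    set of i's current logical neighbors (modified in place)
    non_flooding: i's non-flooding neighbors produced by the SF step
    neighbors_of: {j: set of j's neighbors}, learned from exchanged tables
    cost:         {(i, j): C_ij} for i's neighbors and {(j, h): C_jh} from
                  j's cost table
    measure:      callable(h) -> C_ih, i.e. probe the cost from i to h
    Returns the neighbors j that i should drop once it sees link j-h is gone.
    """
    pending_removal = set()
    for j in sorted(non_flooding):
        candidates = list(neighbors_of.get(j, set()) - {i})
        rng.shuffle(candidates)  # visit j's neighbors in random order
        for h in candidates:
            c_ih = measure(h)
            if c_ih < cost[(i, j)]:
                # h is closer than j: replace j by h immediately.
                neighbors.discard(j)
                neighbors.add(h)
                break
            if (j, h) in cost and c_ih < cost[(j, h)]:
                # h is closer to i than to j: keep both for now and drop j
                # after i learns that the j-h connection has been removed.
                neighbors.add(h)
                pending_removal.add(j)
                break
    return pending_removal

# Usage: the Fig. 2(d) case from the text (i=1, j=3, h=6). Node 1's neighbor
# list and node 3's neighbor list are reduced to what the example needs.
nbrs = {2, 3, 4}
pending = randomized_at(
    i=1,
    neighbors=nbrs,
    non_flooding={3},
    neighbors_of={3: {1, 6}},
    cost={(1, 3): 15, (3, 6): 30},
    measure=lambda h: {6: 20}[h],
    rng=random.Random(0),
)
print(sorted(nbrs), pending)  # [2, 3, 4, 6] {3}
```

Because $C_{1,6}=20$ is not below $C_{1,3}=15$ but is below $C_{3,6}=30$, node 6 is added and node 3 is only marked for later removal, matching the walk-through in the text.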
When node 1 finds that the cost between nodes 3 and 6 is even larger than the cost between node 1 and node 6, node 1 keeps node 6 as a new neighbor. Since the algorithm is executed in each peer independently, node 1 cannot tell node 3 to remove node 6 from node 3's neighbor list. However, as long as node 1 keeps both node 3 and node 6 as its logical neighbors, we can expect node 6 to become a non-flooding neighbor of node 3 after node 3's SF step, since node 3 will expect node 1 to forward messages to node 6 to reduce unnecessary traffic. Node 3 will then try to find another peer to replace node 6 as its neighbor. After learning, from the neighbor cost tables periodically exchanged with node 3 (or node 6), that node 6 is no longer a neighbor of node 3, node 1 removes node 3 from its neighbor list, although node 1 has already stopped sending query messages to node 3 for some time, since its spanning tree has been built.

III. PERFORMANCE EVALUATION

This section describes the performance evaluation of the proposed AOTO method. The simulation study needs both physical topologies and logical overlay topologies that accurately reflect the topological properties of real networks at each layer. Previous studies have shown that both large-scale Internet physical topologies [6] and P2P overlay topologies [7] follow small-world and power-law properties. The power law describes the node degree distribution, while the small-world property describes the characteristic path length and clustering coefficient [9]. The study in [6] found that topologies generated using the AS Model have the small-world and power-law properties. BRITE is a topology generation tool that provides the option to generate topologies based on the AS Model. We generate 10 physical topologies, each with 5000 nodes. The logical topologies are generated with the number of peers (nodes) ranging from 100 to 2000.
For each number of nodes, we generate logical topologies with an average number of edge connections between 1 and 20.
Figure 3: Effect due to different physical topologies for SF (x-axis: physical topologies 1 to 10; y-axis: average optimization rate, 0% to 100%).
Figure 4: Effect due to different logical topologies for SF (x-axis: number of nodes in logical topologies, 0 to 1000; y-axis: average optimization rate).
Figure 5: Effect due to different numbers of logical neighbors (x-axis: logical neighbors, 0 to 40; y-axis: average optimization rate).

A. Performance Metrics

Let $T_k$ denote the cost for a query from the source to reach all its neighbors, where $T_{-1}$ is the cost under traditional blind flooding, $T_0$ is the cost when SF alone is applied for the first time, and $T_k$ for $k \ge 1$ is the cost after the $k$-th application of the randomized AT algorithm. Let

$$\Delta_k = \frac{T_{k-1} - T_k}{T_{k-1}} \times 100\%.$$

Whenever a new neighbor cost table is received or the set of neighbors changes, the source peer has to recalculate the multicast tree and apply the randomized AT algorithm. In theory, the source peer could keep doing this until no further cost improvement is obtained, approaching a perfect topology match. Obviously, this is unnecessary and creates too much overhead. We refer to $k$ as the number of optimization steps.

During the $k$-th application of the AT algorithm, each non-flooding neighbor may be replaced by one of its neighbors. If the source has $n$ non-flooding neighbors, the randomized AT algorithm may make up to $n$ replacements, and the overhead of exhausting all $n$ possible replacements may also be too high. In practice, after each replacement, the source peer computes the cost improvement ratio and decides, based on a termination threshold $\Delta$, whether it needs to find another candidate peer to replace another non-flooding neighbor. The optimization process terminates if the improvement ratio is less than $\Delta$. Thus, the value of $\Delta$ affects the effectiveness of AT.
A smaller threshold causes more overhead but produces better optimization results, as shown in the next section.

To evaluate the optimization result of a logical topology, we use the average distance, $D$, as the metric. Let $D_i$ be the average distance between source peer $i$ and all its logical neighbors. $D$ is defined as the average of the $D_i$ over all peers in the P2P network. In the ideal case, each node has the physically closest P2P nodes as its logical neighbors while keeping at least the same query coverage range; in this case the topology mismatching problem is effectively solved. Since we assume that a peer always reaches its logical neighbors through the shortest physical path, our simulator computes the shortest physical path for each pair of peers.

We have simulated AOTO for all the generated logical topologies on top of each of the 10 generated physical topologies with 5000 nodes. We have also simulated AOTO on a real-world P2P topology (based on the DSS Clip2 trace), and the results on the real-world topology were consistent with those on the generated topologies. To allow a thorough performance discussion, we present only the results on the various generated topologies.

B. Effectiveness of Selective Flooding

To evaluate the effectiveness of SF, we consider three factors. First, how do different physical topologies affect the effectiveness of SF? We compare the average $\Delta_0$ of a 1000-node logical topology with an average of 8 edge connections on top of the 10 different physical topologies after selective flooding. Figure 3 shows consistent $\Delta_0$ results, ranging from 55% to 60%, across the physical topologies. Thus, the physical topology has little impact on the effectiveness of SF.

Second, how do different logical topologies affect the effectiveness of SF? For a given physical topology and a given average number of edge connections, we compare the average $\Delta_0$ of SF on logical topologies ranging from 50 to 1000 nodes. The results in Fig. 4 show that the density of P2P nodes does not influence the effectiveness of SF: the average optimization rate, $\Delta_0$, is around 55% for each topology.

Third, how does the average number of logical neighbors affect the effectiveness of SF? We compare the average $\Delta_0$ of twenty 500-node logical topologies with the number of logical neighbors ranging from 2 to 40. Figure 5 shows that SF is more effective with a larger number of logical neighbors. For example, SF achieves a $\Delta_0$ as high as 87.4% on a logical topology with an average of 30 logical neighbors. It is common for the out-degree of a supernode to reach 30 in many P2P systems [4].

C. Effectiveness of Active Topology

The average number of logical neighbors is a major factor in the effectiveness of SF, but not of AT. We compare the average reduction of $D$ on three 500-node logical topologies with an average of 6, 12, and 18 logical neighbors as the number of optimization steps increases.
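The average distance $D$ used as this section's metric can be computed as below. This is a minimal Python sketch, assuming the physical topology is an adjacency dict with link costs and that every logical neighbor is physically reachable; Dijkstra's algorithm stands in for whatever shortest-path routine the authors' simulator used, which the paper does not specify. Skipping peers with no logical neighbors is also an assumption of this sketch. The toy topology is hypothetical.

```python
import heapq

def shortest_costs(graph, src):
    """Dijkstra: shortest physical path cost from src to every reachable node.

    graph: {node: {neighbor: link_cost}} for the physical topology.
    """
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # outdated heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def average_distance(physical, overlay):
    """D: mean over peers i of D_i, the mean physical distance to i's neighbors.

    overlay: {peer: set of logical neighbors}; peers without neighbors are skipped.
    """
    per_peer = []
    for i, nbrs in overlay.items():
        if not nbrs:
            continue
        dist = shortest_costs(physical, i)
        per_peer.append(sum(dist[j] for j in nbrs) / len(nbrs))
    return sum(per_peer) / len(per_peer)

# Hypothetical topology: peers A, B, C connected through a router X.
physical = {"A": {"X": 1}, "X": {"A": 1, "B": 2, "C": 4},
            "B": {"X": 2}, "C": {"X": 4}}
overlay = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
print(average_distance(physical, overlay))  # 4.0
```

Here $D_A = (3+5)/2 = 4$, $D_B = 3$, and $D_C = 5$, so $D = 4$. Running Dijkstra once per peer matches the per-pair shortest-path computation described in the text.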
Figure 6: Effect of the number of logical neighbors (x-axis: optimization steps, 0 to 20; y-axis: normalized average distance, 70 to 100; curves: 6, 12, and 18 neighbors).
Figure 7: Comparison of different termination thresholds (x-axis: optimization steps, 0 to 30; y-axis: normalized average distance, 60 to 100; curves: thresholds of 80%, 60%, 40%, 20%, and 0%).
Figure 8: Normalized improvement of average query response time (x-axis: optimization steps, 0 to 40; y-axis: average response time, 55 to 100).

The results in Fig. 6 show that the number of logical neighbors has little impact on the effectiveness of AT.

The termination threshold $\Delta$ can be defined by each node independently. In the above simulations, we set $\Delta = 20\%$ for every node. To show the trade-off between overhead and effectiveness, Fig. 7 plots the normalized average distance $D$ for different thresholds as the number of optimization steps increases. As expected, $D$ is reduced more slowly with a larger threshold, and a lower threshold leads to a better result.

D. The Improvement of Query Response Time

The average response time of each query in a P2P system is what a user really cares about. To show the overall improvement achieved by AOTO, Fig. 8 shows the normalized average query response time as the number of optimization steps increases from 1 to 40. Given a source peer, the destination peer is chosen at random. As shown in Fig. 8, the average query response time is significantly reduced.

IV. CONCLUSIONS AND FUTURE WORK

This paper aims to alleviate the topology mismatching problem by optimizing the overlay network with our proposed AOTO technique, reducing unnecessary traffic and improving query search efficiency. To the best of our knowledge, the topology mismatching problem has not been adequately addressed. The only related work is Narada [8]. Based on end system multicast, Narada first constructs a richly connected graph on which it further constructs shortest path spanning trees.
This approach introduces a large overhead in forming the graph and trees over a large scope, and it does not consider peers dynamically joining and leaving. The overhead of Narada is proportional to the multicast group size. The proposed AOTO technique is easy to implement and adapts to the dynamic nature of P2P systems. Furthermore, the overhead of the proposed AOTO algorithm is proportional only to the average number of logical neighbors. AOTO is more effective on logical topologies with large branching factors. It will make decentralized, flooding-based P2P file sharing systems more scalable and efficient.

Due to space limitations, many simulation results and design alternatives are not included in this paper. As an alternative to the randomized AT algorithm, we have studied the Closest AT algorithm, which replaces a non-flooding neighbor with that neighbor's least-cost neighbor; it further optimizes the overlay topology with fewer optimization steps, at the expense of more overhead. More research is needed to study other AOTO policies, the frequency of invoking the AOTO algorithm, the impact of different multicast tree algorithms, and the impact of implementation overhead.

ACKNOWLEDGMENTS

We thank Xiaomei Liu for her participation in the early stage of the work. We are grateful to the anonymous reviewers for their insightful and constructive comments.

REFERENCES

[1] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, “Search and replication in unstructured peer-to-peer networks,” Proc. of the 16th ACM Int’l Conf. on Supercomputing, June 2002.
[2] I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan, “Chord: A scalable peer-to-peer lookup service for Internet applications,” Proc. of SIGCOMM 2001.
[3] M. Ripeanu, A. Iamnitchi, and I. Foster, “Mapping the Gnutella Network,” IEEE Internet Computing, January/February 2002, pp. 50-57.
[4] B. Yang and H. Garcia-Molina, “Designing super-peer network,” Proc. of the 22nd Int’l Conf. on Distributed Computing Systems (ICDCS 2002), Vienna, Austria, July 2002.
[5] K. Sripanidkulchai, B. Maggs, and H. Zhang, “Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems,” Proc. of INFOCOM 2003.
[6] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Willinger, “Network Topology Generators: Degree-Based vs. Structural,” Proc. of SIGCOMM’02, August 2002.
[7] S. Saroiu, P. K. Gummadi, and S. D. Gribble, “A measurement study of peer-to-peer file sharing systems,” Proc. of Multimedia Computing and Networking 2002, January 2002.
[8] Y. Chu, S. G. Rao, and H. Zhang, “A Case For End System Multicast,” Proc. of ACM SIGMETRICS, Santa Clara, CA, June 2000, pp. 1-12.
[9] T. Bu and D. Towsley, “On Distinguishing between Internet power law topology generators,” Proc. of INFOCOM 2002.