Performance, Fault-tolerance and Scalability Analysis of
Virtual Infrastructure Management System

Xiangzhen Kong, Jiwei Huang, Chuang Lin, Peter D. Ungsunan
Department of Computer Science and Technology
Tsinghua University
Beijing, 100084, China
xiangzhen1985@gmail.com, hjw217@gmail.com, chlin@tsinghua.edu.cn, hongsunan@csnet1.cs.tsinghua.edu.cn
Abstract—Virtual infrastructure has become increasingly popular in Grid and Cloud computing. As its scale grows, the management of resources in a virtual infrastructure faces great technical challenges, and supporting upper-layer services effectively raises higher requirements for the performance, fault-tolerance and scalability of virtual infrastructure management systems. In this paper, we study the performance, fault-tolerance and scalability of virtual infrastructure management systems with three typical structures: centralized, hierarchical and peer-to-peer. We give mathematical definitions of the evaluation metrics, carry out detailed quantitative analysis, and, based on that analysis, draw several useful conclusions for enhancing performance, fault-tolerance and scalability. We believe that the results of this work will help system architects make informed choices when building virtual infrastructure.
Keywords-virtual infrastructure management system;
performance; fault-tolerance; scalability
I. INTRODUCTION
A Grid is a very large-scale distributed network computing system that can scale to Internet-size environments [1, 2]. In recent years, another distributed computing model called Cloud computing has emerged, which is closely related to the existing Grid computing [3]. In both Grid and Cloud computing, flexible and efficient sharing of distributed resources is at the core of design and implementation, which raises higher requirements for low-cost, scalable and dependable infrastructure. Against this background, infrastructure virtualization has become a growing concern.
Virtualization adds a hardware abstraction layer called the Virtual Machine Monitor (VMM) or hypervisor. This layer provides an interface that is functionally equivalent to the actual hardware to a number of virtual machines (VMs) [4, 6]. More recently, virtualization has become important as a way to improve system security and reliability, reduce costs, and provide greater flexibility [5]. Many servers, each running several VMs, are connected with one another by network modules; placed under unified management, they play the role of infrastructure for upper-layer applications and are collectively called a virtual infrastructure. Virtual infrastructure has numerous advantages, such as low cost, ease of deployment and greater dependability, and is becoming popular in Grid and Cloud computing [7-13].
The virtual infrastructure management system is in charge of controlling the resources and VMs. The management of virtual infrastructure has some particular characteristics, owing to special virtualization mechanisms such as live migration [14]. The virtual infrastructure management system plays an important role and has a direct impact on the capability of the overall infrastructure. However, with the sharp increase in the number of servers and the growing scale of systems, the management of virtual infrastructure faces many tough challenges. The sharp increase of virtual servers requires higher scalability; the demand for efficiency and quality of service (QoS) from upper-layer applications puts forward requirements on real-time behavior and response time; in addition, fault-tolerance is also an important requirement, especially as the number of management nodes increases. In this paper, we study virtual infrastructure management systems with three typical structures, and evaluate their performance, fault-tolerance and scalability. The contributions are as follows:
• According to the characteristics of virtual infrastructure management systems, we propose performance, fault-tolerance and scalability evaluation metrics, and give detailed expressions and calculations.
• We carry out quantitative analysis and evaluation of the performance, fault-tolerance and scalability of virtual infrastructure management systems with three typical structures: centralized, hierarchical and peer-to-peer.
• Based on these analyses, we summarize some basic rules that offer practical guidance for the design and implementation of virtual infrastructure management systems in Grid and Cloud computing.
The rest of this paper is organized as follows. Section 2 introduces the research background and some concepts. In Section 3, three typical structures of virtual infrastructure management system are introduced, and then detailed calculation and analysis of the performance, fault-tolerance and scalability are given. Section 4 makes further discussions and summarizes some useful rules. Section 5 shows numerical results by analyzing an example. Finally, we conclude the paper in Section 6.
II. BACKGROUND
Infrastructure virtualization has become a topic of great interest in industry and academia. Virtualization is becoming popular in Grid computing [7-9], and is inherently a
2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, 978-0-7695-3747-4/09 $25.00 © 2009 IEEE, DOI 10.1109/ISPA.2009.24
Figure 1. VMware infrastructure 3 components [18].
Figure 2. The typical structures of the virtual infrastructure management system: (a) Centralized, (b) Hierarchical, (c) Peer-to-peer.
key feature of Cloud computing [10-13]. Benefiting from its special mechanisms, the virtual infrastructure has numerous advantages. Server consolidation reduces cost sharply; live migration [14] or VMotion [15] greatly improves flexibility, maintainability and availability; checkpoint/restart and isolation help enhance reliability, security and survivability [16, 17]. The implementation of these mechanisms is under the unified management of the virtual infrastructure management system.
Fig. 1 shows an example of virtual infrastructure, called VMware Infrastructure 3 [18], published by VMware Inc. This is a small-scale system, whose virtual infrastructure management system consists of only one management server called the VirtualCenter (VC) server. The VC server manages multiple virtual servers at the same time, and unifies resources from individual virtual servers so that those resources can be shared among virtual machines. The running state of every virtual server is monitored and recorded by the VC server. When a virtual server is overloaded or suffers failures, some of the VMs running on it can be moved to another suitable virtual server by VMotion with little downtime. The migration target is selected by the VC server according to the running state of every virtual server: the server with the least workload or the highest dependability is chosen, depending on the strategy.
With the rapid development of distributed applications, system scale grows sharply; there are even tens of thousands of servers in some large virtual infrastructures of Grid or Cloud computing systems. In this case, constructing an effective virtual infrastructure management system with high performance, fault-tolerance and scalability becomes a challenge.
III. PERFORMANCE, FAULT-TOLERANCE AND
SCALABILITY ANALYSIS
A. Typical Structures of Virtual Infrastructure
Management System
Similar to traditional Grid resource management systems, the structures of virtual infrastructure management systems can be classified into centralized, hierarchical and peer-to-peer structures [2, 22]. Other, more complex structures can be viewed as hybrids of these three typical structures.
1) Centralized structure
In a centralized virtual infrastructure management system, all Virtual Servers (VSs) are managed by one Virtual Infrastructure Management Server (VIMS). The central VIMS monitors and records the running-state information of every VS. When one of the virtual machines in a VS needs migration, the VIMS decides which VS is the best migration target. A virtual infrastructure with a centralized management system is shown in Fig. 2 (a).
2) Hierarchical structure
The hierarchical structure is shown in Fig. 2 (b). The VIMSs in the lowest layer manage the virtual servers directly, and lower-layer VIMSs are under the administration of their parent VIMSs. A lower-layer VIMS needs merely to pass a digest message to its parent VIMS, which reduces traffic and raises efficiency. The root VIMS has the whole information of all VSs.
3) Peer-to-peer structure
A virtual infrastructure management system with peer-to-peer structure is a decentralized system (see Fig. 2(c)). It can be viewed as another expansion of the centralized structure. Every peer VIMS has a local group of virtual servers; different VIMSs communicate directly with each other to get the running state of the VSs managed by others. Every VIMS can thus obtain the whole information of the virtual infrastructure.
B. Performance Analysis
We take the average response time of every migration request as the performance metric, i.e. the expected period from the time a migration request is submitted to a VIMS to the time the VIMS obtains the best choice of target VS. Hereafter, we assume the total number of VS nodes is $n$, and that the operation period of every VS between migrations follows an exponential distribution with expectation $T_0$. Thus the migration request events from every VS node form a Poisson process with rate $\lambda_0 = 1/T_0$.
1) Performance analysis of the centralized structure
There is only one VIMS, which manages $n$ VSs in the centralized system. By the additive property of the Poisson process, the total stream of migration requests received by the central VIMS is also Poisson, with arrival rate $\lambda = n\lambda_0$. The central VIMS compares the workload or dependability of every VS node recorded in it, and then selects the best choice as the migration target VS. This process generally takes $O(n)$ time, so we can assume the average processing time of the VIMS for every migration request is $S = Kn$, where $K$ is a constant independent of $n$.
Assuming the processing time of the VIMS for every migration request follows an arbitrary distribution, we can
view the VIMS in the centralized structure as an M/G/1 queue. According to the Pollaczek-Khinchin mean-value formula [19], the average response time of an M/G/1 queue is

$$T = S + \frac{S\rho(1+C^2)}{2(1-\rho)} \qquad (1)$$

where $C$ is the coefficient of variation of the processing time $S$, i.e. $C = \sigma_S / S$, so $C$ depends on the distribution of $S$. But as the processing of every migration request is similar, we can assign a small constant to $C$. $\rho$ is the server utilization, which must be less than one for the stability of the system.
For the centralized structure, we have $S = Kn$ and $\rho = \lambda S = Kn^2/T_0$. Therefore, from (1), the average response time of every migration request is

$$T_C = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} \qquad (2)$$

To keep the system stable, the utilization should satisfy $\rho < 1$. So the number of VS nodes that can be managed by the central VIMS is limited by $n < \sqrt{T_0/K} = N_C$.
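As an illustrative sketch (not part of the paper; parameter values follow the example in Section V), the centralized model of (2) can be computed directly from the P-K formula:

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameter values, as in Section V

def t_centralized(n):
    """Average response time T_C of the central VIMS, eq. (2):
    M/G/1 P-K formula T = S + S*rho*(1+C^2)/(2*(1-rho)) with
    S = K*n and rho = K*n^2/T0. Infinite if rho >= 1 (unstable)."""
    S = K * n
    rho = K * n * n / T0
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

N_C = math.sqrt(T0 / K)  # stability limit: n < sqrt(T0/K)
```

Note how the queueing delay term vanishes for small $n$ and dominates as $n$ approaches $N_C$.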
2) Performance analysis of the hierarchical structure
It is assumed that a hierarchical system has $h$ layers. Every VIMS except those in the lowest layer has $d$ child VIMSs. So there are $d^{l-1}$ VIMSs in the $l$-th layer ($l = 1, 2, \ldots, h$) and $m = (d^h - 1)/(d - 1)$ VIMSs in total. Each VIMS in the lowest layer directly manages $n/d^{h-1}$ VS nodes. Therefore, the average arrival rate of migration requests is $\lambda_h = n/(d^{h-1}T_0)$ for a VIMS in the lowest layer, and $\lambda_i = n/(d^{i-1}T_0)$ for a VIMS in the $i$-th layer ($1 \le i \le h-1$). The average processing time of a migration request for a VIMS in the lowest layer is $S_h = Kn/d^{h-1}$; and because the VIMSs in a lower layer only need to report their locally optimal choice to their parent VIMS, the average processing time for a VIMS in the $i$-th layer ($1 \le i \le h-1$) is $S_i = Kd^i$. So the utilization of a VIMS in the lowest layer is $\rho_h = \lambda_h S_h = K(n/d^{h-1})^2/T_0$, and for a VIMS in the $i$-th layer ($1 \le i \le h-1$), $\rho_i = \lambda_i S_i = Kdn/T_0$. From (1), we get the average response time of every migration request in the hierarchical management system as

$$T_H = \sum_{i=1}^{h-1} T_i + T_h = \sum_{i=1}^{h-1}\left(Kd^i + \frac{K^2 d^{i+1} n (1+C^2)}{2(T_0 - Kdn)}\right) + \frac{Kn}{d^{h-1}} + \frac{K^2 n^3 (1+C^2)}{2 d^{h-1}\left(d^{2(h-1)}T_0 - Kn^2\right)} \qquad (3)$$

For the stability of the system, we must have $\rho_h < 1$ and $\rho_i < 1$. So the number of VSs that can be managed by the hierarchical VIMSs is limited by $n < \min\{d^{h-1}\sqrt{T_0/K},\; T_0/(Kd)\} = N_H$.
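As a sketch of (3) under our reading of the model (one M/G/1 term per layer; parameters are the example values from Section V), the hierarchical response time can be computed as follows. Setting $h = 1$ must reproduce the centralized formula (2), which is exactly the reduction used later in Proposition 1:

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def t_hierarchical(n, d, h):
    """Average response time T_H, eq. (3): one M/G/1 term per layer.
    Layers 1..h-1 have S_i = K*d^i and rho_i = K*d*n/T0; the leaf
    layer h has S_h = K*n/d^(h-1) and rho_h = K*(n/d^(h-1))^2/T0."""
    def pk(S, rho):  # P-K mean-value formula, eq. (1)
        return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))
    upper = sum(pk(K * d**i, K * d * n / T0) for i in range(1, h))
    leaf = pk(K * n / d**(h - 1), K * (n / d**(h - 1))**2 / T0)
    return upper + leaf

def t_centralized(n):
    """Eq. (2) as the special case h = 1 (single VIMS; d is then irrelevant)."""
    return t_hierarchical(n, d=2, h=1)
```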
3) Performance analysis of the peer-to-peer structure
It is assumed that there are $m$ VIMSs in the virtual infrastructure management system, so the number of VSs managed by each VIMS is $n/m$. When the virtual machines in a VS need migration, the VS sends a migration request to its local VIMS, which computes the local optimal target VS, gets the local optimal targets from the other $m-1$ VIMS nodes, and then selects the best choice. So the average arrival rate of migration requests at every VIMS is $\lambda = \frac{n}{m}\lambda_0 + (m-1)\frac{n}{m}\lambda_0 = \frac{n}{T_0}$; the average processing time is $S = K(n/m + m - 1)$, and the server utilization is $\rho = \lambda S = Kn(n/m + m - 1)/T_0$. From (1), we can obtain the average response time of every migration request as

$$T_P = K\left(\frac{n}{m} + m - 1\right) + \frac{K^2 n \left(\frac{n}{m} + m - 1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m} + m - 1\right)\right)} \qquad (4)$$

The stability condition of the system is $\rho < 1$, so the number of VSs that can be managed by the peer-to-peer VIMSs is limited by $n < \frac{m}{2}\left(\sqrt{(m-1)^2 + 4T_0/(mK)} - (m-1)\right) = N_P$.
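A sketch of (4) and its stability limit (our reconstruction; parameters are the Section V example values). Setting $m = 1$ recovers the centralized case, as Proposition 1 later proves:

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def t_peer(n, m):
    """Average response time T_P, eq. (4). Each VIMS sees the full
    request stream lambda = n/T0 and spends S = K*(n/m + m - 1)
    comparing its n/m local VSs plus the m-1 remote local optima."""
    S = K * (n / m + m - 1)
    rho = (n / T0) * S
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

def n_limit_peer(m):
    """Stability limit N_P: the largest n keeping rho < 1 in eq. (4)."""
    return (m / 2) * (math.sqrt((m - 1)**2 + 4 * T0 / (m * K)) - (m - 1))
```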
C. Fault-tolerance Analysis
The workload has been found to be a very influential factor in system failure rate [20]. Using a linear model to describe the relationship between them, we give the failure rate as $f = A\rho + B$, where $\rho$ is the system utilization, and $A$ and $B$ are constants representing the effect of workload on failure and the inherent failure rate, respectively. According to reliability theory, the probability that the system is failure-free in a time period $(0, t)$ is $R(t) = \mathrm{Prob}\{Y > t\} = e^{-ft}$. So the probability that the system breaks down in $(0, t)$ is $F(t) = 1 - R(t) = 1 - e^{-ft}$.
The failure of a VIMS node may mean that the information of part or all of the VS nodes cannot be obtained by the system, which hampers the VIMSs in computing the best choice of migration target server. Let $n_F$ be the average number of VS nodes still managed by the system in the presence of VIMS failures, and $n$ the total number of VS nodes; then the fault-tolerance of the system can be expressed by

$$FT = n_F / n \qquad (5)$$
1) Fault-tolerance analysis of the centralized structure
According to subsection 3.2, the utilization of the only VIMS in the centralized system is $\rho_C = Kn^2/T_0$. So the failure rate is $f_C = A\rho_C + B$, and the probability that the system breaks down in $(0, t)$ is $F_C(t) = 1 - R(t) = 1 - e^{-f_C t}$. Hereafter, we consider the probability in unit time, so

$$F_C = 1 - e^{-f_C} = 1 - \exp\{-(AKn^2/T_0 + B)\} \qquad (6)$$

Since there is only one VIMS node in the centralized structure, no VS node can be managed when the VIMS fails. Hence $n_F = F_C \cdot 0 + (1 - F_C)\,n$, and the fault-tolerance is

$$FT_C = n_F/n = (1 - F_C) = \exp\{-(AKn^2/T_0 + B)\} \qquad (7)$$
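A minimal sketch of (7); note that the coefficients A and B are not specified in the paper, so the values below are purely illustrative assumptions:

```python
import math

K, T0 = 0.1, 3600.0
A, B = 0.01, 0.001  # assumed workload-failure coefficients (not from the paper)

def ft_centralized(n):
    """Fault-tolerance FT_C, eq. (7): the single VIMS survives a unit
    interval with probability exp{-(A*K*n^2/T0 + B)}; if it fails,
    no VS node can be managed at all."""
    rho = K * n * n / T0
    return math.exp(-(A * rho + B))
```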
2) Fault-tolerance analysis of the hierarchical structure
There are $h$ layers of VIMS nodes in the hierarchical system and $n$ VS nodes managed by them. The number of VIMS nodes in the $k$-th layer is $m_k = d^{k-1}$. The utilization of a VIMS node in the $k$-th layer is

$$\rho_k = \begin{cases} Kdn/T_0, & 1 \le k \le h-1 \\ K\left(n/d^{h-1}\right)^2/T_0, & k = h \end{cases} \qquad (8)$$

Since the failure probability of one VIMS node in the $k$-th layer is $F_H(k) = 1 - e^{-f_H} = 1 - \exp\{-(A\rho_k + B)\}$, we obtain

$$F_H(k) = \begin{cases} 1 - \exp\{-(AKdn/T_0 + B)\}, & 1 \le k \le h-1 \\ 1 - \exp\{-(AK(n/d^{h-1})^2/T_0 + B)\}, & k = h \end{cases} \qquad (9)$$
When $i$ nodes in the $k$-th layer fail, the VIMSs can still manage $n_k(i) = n - i \cdot n/m_k = n - i \cdot n/d^{k-1}$ VS nodes. Therefore, the average number of VS nodes that can be managed in the presence of failures in the $k$-th layer is

$$n_F(k) = \sum_{i=0}^{m_k} \binom{m_k}{i} F_H(k)^i \left(1 - F_H(k)\right)^{m_k - i} n_k(i) = n \sum_{i=0}^{d^{k-1}} \binom{d^{k-1}}{i} F_H(k)^i \left(1 - F_H(k)\right)^{d^{k-1} - i} \left(1 - \frac{i}{d^{k-1}}\right) \qquad (10)$$
Hence, from (9) and (10), we obtain the fault-tolerance of the hierarchical structure as

$$FT_H = \frac{n_F}{n} = \frac{1}{h}\sum_{k=1}^{h} \frac{n_F(k)}{n} = \frac{1}{h}\sum_{k=1}^{h} \sum_{i=0}^{d^{k-1}} \binom{d^{k-1}}{i} F_H(k)^i \left(1 - F_H(k)\right)^{d^{k-1} - i} \left(1 - \frac{i}{d^{k-1}}\right) \qquad (11)$$

where $F_H(k)$ is given by (9).
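A sketch of (11) under our reconstruction, again with assumed illustrative values for the failure coefficients A and B. The $h = 1$ case must collapse to the centralized fault-tolerance (7):

```python
import math

K, T0 = 0.1, 3600.0
A, B = 0.01, 0.001  # assumed failure-model coefficients (illustrative)

def ft_hierarchical(n, d, h):
    """Fault-tolerance FT_H, eq. (11): average over layers k = 1..h of
    the binomially weighted fraction of VS nodes still managed when
    i of the d^(k-1) layer-k VIMSs fail (failure prob. F_H(k), eq. (9))."""
    total = 0.0
    for k in range(1, h + 1):
        rho = K * d * n / T0 if k < h else K * (n / d**(h - 1))**2 / T0
        F = 1 - math.exp(-(A * rho + B))
        mk = d**(k - 1)
        total += sum(math.comb(mk, i) * F**i * (1 - F)**(mk - i) * (1 - i / mk)
                     for i in range(mk + 1))
    return total / h
```

By the mean of the binomial distribution, the inner sum equals $1 - F_H(k)$ exactly, so (11) simplifies to the average survival probability over the layers.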
3) Fault-tolerance analysis of the peer-to-peer structure
Each VIMS node manages $n/m$ VS nodes directly, so when $i$ VIMS nodes fail, the average number of VS nodes that can still be managed is $n_F(i) = n - i \cdot n/m$. Assuming the failure events of different VIMS nodes are independent, then

$$n_F = \sum_{i=0}^{m} \binom{m}{i} F_P^i (1 - F_P)^{m-i}\, n_F(i) = n \sum_{i=0}^{m} \binom{m}{i} F_P^i (1 - F_P)^{m-i} \left(1 - \frac{i}{m}\right) \qquad (12)$$
Because the utilization is $\rho_P = Kn(n/m + m - 1)/T_0$ and the failure rate is $f_P = A\rho_P + B$, the failure probability is

$$F_P = 1 - e^{-f_P} = 1 - \exp\{-(AKn(n/m + m - 1)/T_0 + B)\} \qquad (13)$$

Hence, the fault-tolerance of the peer-to-peer structure is

$$FT_P = \frac{n_F}{n} = \sum_{i=0}^{m} \binom{m}{i} F_P^i \left(1 - F_P\right)^{m-i} \left(1 - \frac{i}{m}\right) \qquad (14)$$
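A sketch of (13)-(14) with the same assumed illustrative A and B. Taking $m = 1$ must reduce to the centralized $FT_C$ of (7), matching the transformation used in Proposition 1:

```python
import math

K, T0 = 0.1, 3600.0
A, B = 0.01, 0.001  # assumed failure-model coefficients (illustrative)

def ft_peer(n, m):
    """Fault-tolerance FT_P, eq. (14): each of the m peer VIMSs fails
    independently with probability F_P (eq. (13)); when i fail, the
    surviving fraction of managed VS nodes is 1 - i/m."""
    rho = K * n * (n / m + m - 1) / T0
    F = 1 - math.exp(-(A * rho + B))
    return sum(math.comb(m, i) * F**i * (1 - F)**(m - i) * (1 - i / m)
               for i in range(m + 1))
```

As with (11), the binomial mean collapses the sum to $1 - F_P$, so spreading the same VS population across more peers improves fault-tolerance by lowering each peer's utilization-driven failure probability.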
D. Scalability Analysis
Jogalekar et al. proposed a strategy-based scalability metric for distributed systems [23, 24]. The scalability can be expressed as

$$\psi(k_1, k_2) = \frac{F(k_2)}{F(k_1)} = \frac{\lambda_2 f_2 / C_2}{\lambda_1 f_1 / C_1} = \frac{\lambda_2 C_1 (1 + T_1/\hat{T})}{\lambda_1 C_2 (1 + T_2/\hat{T})} \qquad (15)$$

where $\lambda$ is the throughput in responses/sec, $C(k)$ is the rental cost, and $f(k) = 1/(1 + T(k)/\hat{T})$ is the average value of each response, where $T$ is the mean response time and $\hat{T}$ is the target value. The productivity $F(k) = \lambda(k) f(k) / C(k)$ is the value delivered per second.
To give analytic solutions, $k$ for the base case is taken as 1, and the metric is written as $\psi(k) = F(k)/F(1)$ [23].
Fig. 3 shows how $\psi$ might behave in different situations [24, 25]. The value of $\psi$ and its trend relative to the scale factor are used to evaluate the scalability of a system.
For a virtual infrastructure management system, we are concerned with the change of management efficiency, rather than productivity, as the system scales up. The management efficiency when there are $m$ VIMSs is defined as

$$F(m) = n \cdot f(m) / C(m) \qquad (16)$$

where $n$ is the maximum number of VSs that the VIMSs are able to manage; $f(m)$ is the average value of each VIMS, determined by the management performance; and $C(m)$ is the cost at scale $m$, assumed to be $C(m) = \alpha m$.
In terms of average response time, the centralized structure can be viewed as a special case of the hierarchical and peer-to-peer structures (to be proved in Section 4). So we take the centralized structure as the point of reference. The average value of each response in the other structures is evaluated by comparison with the centralized one, as

$$f(m) = 1/\left(1 + T(m)/T_C\right) \qquad (17)$$

where $T(m)$ is the average response time in a certain structure, and $T_C$ is the average response time of the centralized structure. So the management efficiency is

$$F(m) = \frac{n}{\alpha m \left(1 + T(m)/T_C\right)} \qquad (18)$$

And the scalability at scale $m_2$ relative to $m_1$ is

$$\psi(m_1, m_2) = \frac{F(m_2)}{F(m_1)} = \frac{n_2\, m_1 \left(1 + T(m_1)/T_C\right)}{n_1\, m_2 \left(1 + T(m_2)/T_C\right)} \qquad (19)$$
1) Scalability analysis of the centralized structure
In the centralized system there is only one VIMS, i.e. $m = 1$, which can manage at most $n = r$ VSs, where $r$ is the maximum number of VSs that a single VIMS is able to manage. The average management efficiency is

$$F_C = \frac{r}{\alpha (1 + T_C/T_C)} = \frac{r}{2\alpha} \qquad (20)$$

For both the hierarchical and peer-to-peer structures, when $m = 1$ and $n = r$, we have $T_H = T_C$ and $T_P = T_C$ (to be proved in Section 4). Hence $F_H(m=1) = F_C$ and $F_P(m=1) = F_C$, and as a result $\psi_H(m=1) = 1$ and $\psi_P(m=1) = 1$.
In conclusion, the centralized structure is a base case of the hierarchical and peer-to-peer structures, and it can be used for comparison with the cases where $m \ne 1$. For both the hierarchical and the peer-to-peer structure,
Figure 3. Different scalability behaviors [24, 25].
$$\psi(m) = F(m)/F_C = \frac{2n}{r m \left(1 + T(m)/T_C\right)} \qquad (21)$$
2) Scalability analysis of the hierarchical structure
In a hierarchical system, there are $m = (d^h - 1)/(d - 1)$ VIMSs in all, able to manage at most $n = d^{h-1} r$ VSs. The average response time is

$$T_H = \sum_{i=1}^{h-1}\left(Kd^i + \frac{K^2 d^{h+i} r (1+C^2)}{2\left(T_0 - Kd^h r\right)}\right) + Kr + \frac{K^2 r^3 (1+C^2)}{2\left(T_0 - Kr^2\right)} \qquad (22)$$
For a centralized system with the same number of VSs, the average response time is

$$T_C = Kd^{h-1} r + \frac{K^2 d^{3(h-1)} r^3 (1+C^2)}{2\left(T_0 - K d^{2(h-1)} r^2\right)} \qquad (23)$$
Hence, substituting $n = d^{h-1}r$ and $m = (d^h - 1)/(d - 1)$ into (21), the scalability of the hierarchical structure is

$$\psi_H(m) = \frac{2\, d^{h-1}(d-1)}{\left(d^h - 1\right)\left(1 + T_H/T_C\right)} \qquad (24)$$

with $T_H$ and $T_C$ given by (22) and (23).
3) Scalability analysis of the peer-to-peer structure
In a virtual infrastructure management system with the peer-to-peer structure, there are $m$ VIMSs in all, able to manage at most $n = mr$ VSs. The average response time is

$$T_P = K(r + m - 1) + \frac{K^2 m r (r + m - 1)^2 (1+C^2)}{2\left(T_0 - K m r (r + m - 1)\right)} \qquad (25)$$
For a centralized system with the same number of VSs, the average response time is

$$T_C = Kmr + \frac{K^2 m^3 r^3 (1+C^2)}{2\left(T_0 - K m^2 r^2\right)} \qquad (26)$$
Hence, substituting $n = mr$ into (21), the scalability of the peer-to-peer structure is

$$\psi_P(m) = \frac{2}{1 + T_P/T_C} \qquad (27)$$

with $T_P$ and $T_C$ given by (25) and (26).
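A sketch of (24) and (27) under our reconstruction (Section V example parameters). In both cases the base scale $m = 1$ must give $\psi = 1$, per the discussion around (21):

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def pk(S, rho):
    """M/G/1 P-K mean response time, eq. (1)."""
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

def t_centralized(n):
    return pk(K * n, K * n * n / T0)  # eq. (2)

def psi_peer(m, r):
    """Scalability psi_P(m), eq. (27): 2/(1 + T_P/T_C) with n = m*r."""
    n = m * r
    S = K * (r + m - 1)
    t_p = pk(S, (n / T0) * S)
    return 2 / (1 + t_p / t_centralized(n))

def psi_hier(h, d, r):
    """Scalability psi_H, eq. (24): m = (d^h-1)/(d-1), n = d^(h-1)*r."""
    n = d**(h - 1) * r
    t_h = (sum(pk(K * d**i, K * d * n / T0) for i in range(1, h))
           + pk(K * r, K * r * r / T0))
    m = (d**h - 1) // (d - 1)
    return 2 * n / (r * m * (1 + t_h / t_centralized(n)))
```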
IV. FURTHER DISCUSSIONS ON PERFORMANCE,
FAULT-TOLERANCE AND SCALABILITY
In this section, we elaborate further on the performance, fault-tolerance and scalability of the three types of structures. Some rules and theorems are summarized and proved, which offer practical guidance for the construction of virtual infrastructure management systems.
A. Transforming relationship between different structures
From Section 3, we can derive the transformation relationships among the three types of structures, which corroborates the correctness of the calculation results to a certain extent.
Proposition 1: In terms of performance,
fault-tolerance and scalability, the centralized structure
is a special case of the hierarchical or peer-to-peer
structure.
Proof: We prove it by giving the transformation conditions.
1) From hierarchical structure to centralized structure
From (3), the average response time of the management system with hierarchical structure is

$$T_H = \sum_{i=1}^{h-1}\left(Kd^i + \frac{K^2 d^{i+1} n (1+C^2)}{2(T_0 - Kdn)}\right) + \frac{Kn}{d^{h-1}} + \frac{K^2 n^3 (1+C^2)}{2 d^{h-1}\left(d^{2(h-1)}T_0 - Kn^2\right)}$$

When there is only one layer in the hierarchical structure, i.e. $h = 1$, the sum vanishes and we get

$$T_H(h=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C$$

Meanwhile, the upper limit of the number of managed VS nodes is $N_H = \min\{d^{h-1}\sqrt{T_0/K},\; T_0/(Kd)\}$. When $h = 1$ there are no upper layers, so only the leaf constraint applies and $N_H(h=1) = \sqrt{T_0/K} = N_C$.
Similarly, from (11), when $h = 1$, the fault-tolerance of the hierarchical system reduces to

$$FT_H(h=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C$$

Since $h = 1$, we get $m = (d^h - 1)/(d - 1) = 1$. From Section 3.4.1, the scalability relation is $\psi_H(h=1) = F_H(h=1)/F_C = 1$.
Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the hierarchical structure when the number of layers $h = 1$.
2) From peer-to-peer structure to centralized structure
From (4), the average response time of the management system with peer-to-peer structure is

$$T_P = K\left(\frac{n}{m} + m - 1\right) + \frac{K^2 n \left(\frac{n}{m} + m - 1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m} + m - 1\right)\right)}$$

When $m = 1$, we get

$$T_P(m=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C$$

Meanwhile, the upper limit of the number of managed VS nodes is $N_P = \frac{m}{2}\left(\sqrt{(m-1)^2 + 4T_0/(mK)} - (m-1)\right)$. Substituting $m = 1$, $N_P(m=1) = \sqrt{T_0/K} = N_C$.
Similarly, from (14), when $m = 1$, the fault-tolerance of the system with peer-to-peer structure is

$$FT_P(m=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C$$

For the scalability, from Section 3.4.1, we get $\psi_P(m=1) = F_P(m=1)/F_C = 1$.
Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the peer-to-peer structure when $m = 1$. □
B. Structure selection for the best performance
Intuitively, when the number of managed VS nodes is small, the centralized structure has the lowest average response time thanks to its flat and direct structure. But as the number of VS nodes increases, the processing time of the single VIMS increases sharply, so we need to consider other structures for better performance. We first give the following proposition.
Proposition 2: Given $h > 1$ and $1 < d < N_C$, there is a threshold $M_{CH} < N_C$ on the number of managed VS nodes such that when $n > M_{CH}$, the average response times satisfy $T_C > T_H$. Given $m > 1$, there is another threshold $M_{CP} < N_C$ such that when $n > M_{CP}$, $T_C > T_P$.
Proof: Under the conditions $h > 1$ and $d < N_C$, the limit on the number of managed VS nodes for the hierarchical structure satisfies $N_H > N_C$, and for $m > 1$ we have $N_P > N_C$.
From (3) and (4), it is obvious that $T_C(n=1) < T_H(n=1)$ and $T_C(n=1) < T_P(n=1)$ when managing one VS node. Then, taking the derivatives of (2), (3) and (4), we obtain $\frac{dT_C(n)}{dn} > 0$, $\frac{dT_H(n)}{dn} > 0$ and $\frac{dT_P(n)}{dn} > 0$, so the average response time of each of the three structures is a monotonically increasing function of $n$.
From (2), we have $T_C(n \to N_C) \to \infty$. Since $N_H > N_C$ and $N_P > N_C$, we have $T_H(n = N_C) < \infty$ and $T_P(n = N_C) < \infty$.
Hence, by monotonicity and the end values, there must be a threshold $M_{CH} < N_C$ for the hierarchical structure and a threshold $M_{CP} < N_C$ for the peer-to-peer structure making the proposition hold. □
The values of $M_{CH}$ and $M_{CP}$ can be obtained by solving the equations $T_C(n) = T_H(n)$ and $T_C(n) = T_P(n)$. So when the number of VS nodes $n < \min\{M_{CH}, M_{CP}\}$, we select the centralized structure; otherwise one of the other structures is selected according to the computed values of $T_H(n)$ and $T_P(n)$.
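The thresholds have no closed form, but the monotonicity established in the proof makes them easy to find numerically. A bisection sketch under the Section V parameters and our reconstruction of (2)-(3) (the peer-to-peer threshold is found the same way against (4)):

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def pk(S, rho):
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

def t_c(n):  # eq. (2)
    return pk(K * n, K * n * n / T0)

def t_h(n, d=3, h=3):  # eq. (3)
    return (sum(pk(K * d**i, K * d * n / T0) for i in range(1, h))
            + pk(K * n / d**(h - 1), K * (n / d**(h - 1))**2 / T0))

def threshold(f, lo=1.0, hi=None):
    """Bisect for the crossing T_C(n) = f(n) on (lo, N_C): T_C starts
    below f and blows up at N_C, so exactly one crossing is bracketed."""
    hi = hi or math.sqrt(T0 / K) - 1e-6  # just below N_C
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_c(mid) < f(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

M_CH = threshold(t_h)  # hierarchical threshold for d = 3, h = 3
```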
C. Scalability comparison between the hierarchical and peer-to-peer structures
We now compare the scalability of virtual infrastructure management systems with the hierarchical structure and the peer-to-peer structure.
Proposition 3: Given $r > 2d - 1$, the scalability of the system with the peer-to-peer structure, $\psi_P(m)$, is higher than that with the hierarchical structure, $\psi_H(m)$.
Proof: To compare the scalability of the two structures, we set the numbers of VIMSs of the two systems equal, i.e. $m = (d^h - 1)/(d - 1)$. From (24) and (27), we have $\lim_{m\to\infty} \psi_H(m) = 2(dr - r)/(dr - d)$ and $\lim_{m\to\infty} \psi_P(m) = 2r/(r + 1)$. Given $r > 2d - 1$, we have $\lim_{m\to\infty}\psi_P(m) > \lim_{m\to\infty}\psi_H(m)$, so when $m$ is large enough, the scalability of the peer-to-peer system is higher than that of the hierarchical one.
Then, we construct a continuous function $f(x) \in C[1, \infty)$ to prove the proposition:

$$f(x) = \begin{cases} \psi_P(x) - \psi_H(x), & x \in \mathbb{N} \\ f(\lfloor x \rfloor) + \left(f(\lceil x \rceil) - f(\lfloor x \rfloor)\right)\left(x - \lfloor x \rfloor\right), & x \notin \mathbb{N} \end{cases}$$

For the equation $f(x) = 0$, there is only one solution, $x = 1$. Hence, for all $m > 1$, $\psi_P(m) \ne \psi_H(m)$.
Assume there is an $m_1 > 1$ such that $\psi_P(m_1) < \psi_H(m_1)$. As we already know that $\psi_P(m_2) > \psi_H(m_2)$ when $m_2$ is large enough, we have $f(m_1) < 0$ and $f(m_2) > 0$, so $f(m_1) f(m_2) < 0$. By the Zero Point Theorem, there must be a $\xi > 1$ with $f(\xi) = 0$, which contradicts the fact that $x = 1$ is the only solution of $f(x) = 0$. So the assumption is false, and for all $m > 1$, $\psi_P(m) \ge \psi_H(m)$.
From the above discussion, we conclude that for all $m > 1$, $\psi_P(m) > \psi_H(m)$. □
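The two limits stated in the proof can be checked directly by arithmetic on the limit expressions; e.g. with $d = 3$ and $r = 7$ the condition $r > 2d - 1 = 5$ holds:

```python
def psi_hier_limit(d, r):
    """lim psi_H(m) = 2(dr - r)/(dr - d), as stated in the proof."""
    return 2 * (d * r - r) / (d * r - d)

def psi_peer_limit(r):
    """lim psi_P(m) = 2r/(r + 1)."""
    return 2 * r / (r + 1)
```

Algebraically, $2r/(r+1) > 2r(d-1)/(d(r-1))$ rearranges to $r + 1 > 2d$, i.e. exactly the condition $r > 2d - 1$ of Proposition 3.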
V. NUMERICAL RESULTS AND REMARKS
In this section, we give numerical results by analyzing and solving an example, which helps to uncover some regularities and to verify the conclusions proposed in Section 4. We assume $K = 0.1$, which means a VIMS can process migration requests at a rate of 10 VS-node comparisons per second. Since the processing of every migration request is similar, we assign the small constant $C = 0.5$. The expected normal operation period is $T_0 = 3600$ s, which means each VS node sends a migration request once per hour on average.
A. Performance analysis result
We study the effect of the structure parameters of the hierarchical and peer-to-peer management systems on the average response time. Fig. 4(a) shows that when the number of managed VS nodes is small, a system with fewer layers has a lower average response time. But as the number of VS nodes increases, the fewer layers the system has, the faster its average response time grows; a system with more layers performs better when the workload is heavy. The effect of the out-degree $d$ is similar to that of $h$, but not as drastic (see Fig. 4(b)). For the peer-to-peer structure, the effect of the number of VIMS nodes $m$ is similar (see Fig. 4(c)).
In Fig. 7(a), we compare the performance of the three types of systems. We set $d = 3$ and $h = 3$ for the hierarchical system, and $m = 13$ for the peer-to-peer one, so the two systems have the same number of VIMS nodes. Fig. 7(a) shows that when the number of VS nodes is small enough, the centralized system has the lowest average response time. As the number increases, the other two systems achieve lower average response times, just as Proposition 2 states. By computation, the limits on the number of managed VS nodes for the three structures are $N_C = 190$, $N_H = 1708$ and $N_P = 611$. The hierarchical system performs better when the number of VS nodes is large enough.
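The three limits quoted above can be reproduced from the stability conditions of Section 3 (a sketch using our reconstruction; values rounded to the nearest integer):

```python
import math

K, T0 = 0.1, 3600.0  # Section V example parameters

def n_c():
    """N_C = sqrt(T0/K), from rho = K*n^2/T0 < 1 (centralized)."""
    return math.sqrt(T0 / K)

def n_h(d, h):
    """N_H = min{d^(h-1)*sqrt(T0/K), T0/(K*d)} (hierarchical)."""
    return min(d**(h - 1) * math.sqrt(T0 / K), T0 / (K * d))

def n_p(m):
    """N_P = (m/2)*(sqrt((m-1)^2 + 4*T0/(m*K)) - (m-1)) (peer-to-peer)."""
    return (m / 2) * (math.sqrt((m - 1)**2 + 4 * T0 / (m * K)) - (m - 1))
```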
B. Fault-tolerance analysis result
We study the effect of the structure parameters on fault-tolerance. Fig. 5(a) shows that the fault-tolerance of the hierarchical system decreases as the number of VS nodes increases, and the fewer layers the system has, the faster it decreases. The influence of the out-degree $d$ is similar but smaller than that of $h$ (see Fig. 5(b)). The fault-tolerance of peer-to-peer systems also decreases as the number of VSs increases, and a system with more VIMSs has better fault-tolerance (see Fig. 5(c)).
Fig. 7(b) compares the fault-tolerance of the three types of management systems; the hierarchical and peer-to-peer systems both have 13 VIMS nodes. We can see that the centralized system has very poor fault-tolerance, while that of the hierarchical system is the best. When the number of VS nodes is large enough, the hierarchical system holds an advantageous position in fault-tolerance.
C. Scalability analysis result
We study the effect of the hierarchical system's structure
parameters on scalability. We set d = 3 and compare the
scalability of systems with different values of the parameter
r. Fig. 6(a) shows that the more VSs each VIMS is able to
manage, the more scalable the system is. We then set r = 7
and compare the scalability of systems with different values
of the parameter d. As Fig. 6(b) shows, the more child
VIMS nodes each VIMS has, the more scalable the system is.
We then study the effect of the peer-to-peer system's
structure parameters on scalability. Fig. 6(c) shows that the
more VSs each VIMS node is able to manage, the more
scalable the system is.
In Fig. 7(c), we compare the scalability of the systems
with hierarchical and peer-to-peer structures. When r = 7,
the scalability of the peer-to-peer system is higher than that
of the hierarchical systems for all values of d, which means
the peer-to-peer system is more scalable. This result agrees
with the conclusion proved in Proposition 3.
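The scalability metric ψ used in this section follows the productivity-ratio definition of Jogalekar and Woodside [23, 24]: productivity at scale k is F(k) = λ(k)·f(k)/C(k) (throughput times value-per-response, divided by cost), and scalability from scale k1 to k2 is ψ = F(k2)/F(k1). A generic sketch, with hypothetical throughput, value and cost functions standing in for the paper's:

```python
# Jogalekar-Woodside productivity-ratio scalability metric [23].
# psi >= 1 over a scale range means the system scales well there.

def productivity(k, throughput, value_per_response, cost):
    # F(k) = lambda(k) * f(k) / C(k)
    return throughput(k) * value_per_response(k) / cost(k)

def psi(k1, k2, throughput, value_per_response, cost):
    # Scalability of growing the system from scale k1 to scale k2.
    return (productivity(k2, throughput, value_per_response, cost) /
            productivity(k1, throughput, value_per_response, cost))

# Hypothetical example functions: throughput grows sublinearly,
# value-per-response falls as response time rises, cost grows linearly.
throughput = lambda k: k ** 0.9
value = lambda k: 1.0 / (1.0 + 0.01 * k)
cost = lambda k: 0.2 * k

print(psi(1, 100, throughput, value, cost))
```

In the paper's setting, k corresponds to the number of VIMS nodes m, which is why the curves in Fig. 6 are plotted as ψ(m) against log(m).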
VI. CONCLUSIONS
This work is the first attempt to comprehensively analyze
and evaluate the performance, fault-tolerance and scalability
of virtual infrastructure management systems. According to
the characteristics of virtual infrastructure, we give
mathematical definitions of the evaluation metrics for
performance, fault-tolerance and scalability. Three typical
structures of the virtual infrastructure management system
are studied: centralized, hierarchical and peer-to-peer. We
give detailed calculation processes to quantitatively analyze
their performance, fault-tolerance and scalability. Based on
the quantitative analysis, some useful rules and conclusions
are drawn and proved, which offer practical guidance for
building virtual infrastructure management systems with
higher performance, fault-tolerance and scalability.
This paper provides a general analysis and evaluation
method, which allows system designers to evaluate
alternative virtual infrastructure management systems by
assigning different values to parameters as they deem
appropriate. It enhances their ability to make more
informed choices when building a virtual infrastructure
management system, before undergoing the expensive
process of constructing and evaluating multiple prototypes.
ACKNOWLEDGMENT
This work was supported by the National Natural Science
Foundation of China (No. 60673187) and the National High
Technology Research and Development Program of China
(No. 2007AA01Z419).
REFERENCES
[1] I. Foster and C. Kesselman (eds.), "The Grid: Blueprint for a New
Computing Infrastructure", Morgan Kaufmann, San Francisco, CA,
2004.
[2] K. Krauter, R. Buyya, and M. Maheswaran, "A Taxonomy and Survey
of Grid Resource Management Systems for Distributed Computing",
International Journal of Software: Practice and Experience (SPE), Vol.
32, No. 2, pp. 135-164, 2002.
[3] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud Computing and Grid
Computing 360-Degree Compared", Grid Computing Environments
(GCE) Workshop, pp. 1-10, Nov. 2008.
[4] D. A. Menasce, "Virtualization: Concepts, Applications, and
Performance Modeling", Computer Measurement Group (CMG),
http://www.cmg.org/proceedings/2005/5189.pdf, 2005.
[5] R. Figueiredo, P. A. Dinda, and J. Fortes, "Resource Virtualization
Renaissance", IEEE Computer, Vol. 38, No. 5, pp. 28-31,
2005.
[6] P. T. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho,
R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of
Virtualization", Proc. 19th ACM Symposium on Operating Systems
Principles (SOSP), pp. 164-177, October 2003.
[7] R. Figueiredo, P. Dinda, and J. Fortes, “A Case for Grid Computing
on Virtual Machines,” Proc. 23rd Int’l Conf. Distributed Computing
Systems (ICDCS), IEEE CS Press, pp. 550-559, 2003.
[8] I. Krsul, A. Ganguly, J. Zhang, J.A.B. Fortes, and R.J. Figueiredo,
“VMPlants: Providing and Managing Virtual Machine Execution
Environments for Grid Computing”, Proc. IEEE/ACM
Supercomputing Conference, IEEE CS Press, pp. 7, 2004.
[9] J. Alonso, L. Silva, A. Andrzejak, P. Silva and J. Torres.
“High-available grid services through the use of virtualized
clustering”. In Proc. 8th IEEE/ACM International Conference on Grid
Computing, pp. 34-41, Sept. 2007.
[10] Amazon, "Amazon Elastic Compute Cloud (Amazon EC2)".
http://aws.amazon.com/ec2/, 2009.
[11] Microsoft, "Introducing the Azure Services Platform".
http://download.microsoft.com/download/e/4/3/e43bb484-3b52-4fa8-
a9f9-ec60a32954bc/Azure_Services_Platform.docx, 2009.
[12] M. Fenn, M. Murphy, J. Martin and S. Goasguen. “An Evaluation of
KVM for Use in Cloud Computing”, Proc. 2nd International
Conference on the Virtual Computing Initiative, RTP, NC, USA, May
2008.
[13] VMware, "VMware vCloud".
http://www.vmware.com/technology/cloud-computing.html, 2009.
[14] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I.
Pratt, and A. Warfield. “Live Migration of Virtual Machines”. Proc.
2nd ACM/USENIX Symposium on Networked Systems Design and
Implementation (NSDI), Boston, MA. pp. 273-286, May 2005.
[15] M. Nelson, B. Lim, and G. Hutchins, "Fast Transparent Migration for
Virtual Machines", Proceedings of the USENIX Annual Technical
Conference, pp. 391-394, 2005.
Figure 4. Average response time of systems with different structures,
plotting the log of the average response time of every migration request,
log(T), against the log of the number of virtual servers managed by VIMS
nodes, log(n): (a) hierarchical structure with d = 3 and h = 1, 2, 3, 4, 5;
(b) hierarchical structure with h = 3 and d = 2, 3, 4, 5; (c) peer-to-peer
structure with m = 1, 3, 5, 7, 9, 11, 13.
Figure 5. Fault-tolerance of systems with different structures, plotting the
fault-tolerance indicator nF/n against the number of virtual servers
managed by VIMS nodes, n: (a) hierarchical structure with d = 3 and
h = 1, 2, 3, 4; (b) hierarchical structure with h = 3 and d = 2, 3, 4, 5;
(c) peer-to-peer structure with m = 1, 3, 5, 7, 9, 11, 13.
Figure 6. Scalability of the systems with different structures, plotting the
scalability metric ψ(m) against the log of the number of VIMS nodes,
log(m): (a) hierarchical structure with d = 3 and r = 3, 5, 7, 9, 11, 13, 15;
(b) hierarchical structure with r = 7 and d = 3, 5, 7, 9, 11, 13, 15;
(c) peer-to-peer structure with r = 3, 5, 7, 9, 11, 13, 15.
Figure 7. Performance, fault-tolerance and scalability comparison of
systems with the three typical structures: (a) average response time, log(T)
vs. log(n), for the centralized structure and the hierarchical and
peer-to-peer structures with 13 VIMS nodes each; (b) fault-tolerance
indicator nF/n vs. n for the same three systems; (c) scalability ψ(m) vs.
log(m) for the peer-to-peer structure and hierarchical structures with
d = 3, 5, 7, 9.
[16] G. Vallée, T. Naughton, H. Ong, and S. L. Scott, “Checkpoint/Restart
of Virtual Machines Based on Xen”, In High Availability and
Performance Computing Workshop (HAPCW'06), Santa Fe, New
Mexico, USA, pp. 30, 2006.
[17] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, "Terra:
A Virtual Machine-Based Platform for Trusted Computing". Proc.
19th ACM Symp. Operating Systems Principles (SOSP), ACM Press,
pp. 193-206, 2003.
[18] VMware, "VMware Infrastructure 3 Primer".
http://www.vmware.com/products/vi/, 2009.
[19] L. Kleinrock, “Queueing Systems: Volume I: Theory”, John Wiley &
Sons, New York, pp. 187, 1975.
[20] R. K. Iyer, S. Butner, and E. J. McCluskey, “A Statistical
Failure/Load Relationship: Results of a Multicomputer Study”, IEEE
Transaction on Computers, Vol. C-31, No. 7, pp. 697-706, July 1982.
[21] A. Birolini, “Reliability Engineering Theory and Practice”, Fifth
edition. Springer, pp. 2-12, 2007.
[22] Y. Qu, C. Lin, Y. Li, and Z. Shan. “Survivability Analysis of Grid
Resource Management System Topology”. Proc. 4th International
Conference of Grid and Cooperative Computing, Lecture Notes in
Computer Science, Vol. 3795, pp. 738-743, 2005.
[23] P. Jogalekar, and M. Woodside, “Evaluating the scalability of
distributed systems”, IEEE Transactions on Parallel and Distributed
Systems, Vol. 11, Issue: 6, pp. 589-603, Jun 2000.
[24] P. Jogalekar, and M. Woodside, "Evaluating the Scalability of
Distributed Systems", Proc. Thirty-First Annual Hawaii
International Conference on System Sciences - Volume 7, pp.
524-531, 1998.
[25] P. Vilà, J. L. Marzo, A. Bueno, E. Calle, and L. Fàbrega, “Distributed
Network Resource Management using a Multi-Agent System:
Scalability Evaluation”, Proc. International Symposium on
Performance Evaluation of Computer and Telecommunication
Systems, pp. 355-362, July 2004.

More Related Content

PDF
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
PDF
Simplified Cost Efficient Distributed System
PDF
From Virtualization to Dynamic IT
PDF
High Availability of Services in Wide-Area Shared Computing Networks
PDF
Toward Cloud Computing: Security and Performance
PDF
Pitfalls & Challenges Faced During a Microservices Architecture Implementation
PDF
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...
PDF
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
Simplified Cost Efficient Distributed System
From Virtualization to Dynamic IT
High Availability of Services in Wide-Area Shared Computing Networks
Toward Cloud Computing: Security and Performance
Pitfalls & Challenges Faced During a Microservices Architecture Implementation
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...

What's hot (19)

PDF
TermPaper
PDF
[IJCT-V3I3P2] Authors: Prithvipal Singh, Sunny Sharma, Amritpal Singh, Karand...
PDF
A trust management system for ad hoc mobile
PDF
AVAILABILITY METRICS: UNDER CONTROLLED ENVIRONMENTS FOR WEB SERVICES
PDF
Zálohování a DR do cloudu - přehled technologií
PDF
E VALUATION OF T WO - L EVEL G LOBAL L OAD B ALANCING F RAMEWORK IN C L...
PDF
Role of Virtual Machine Live Migration in Cloud Load Balancing
PDF
Sameer Mitter - Management Responsibilities by Cloud service model types
PDF
Embedded systems Implementation in Cloud Challenges
PPT
Client server computing in mobile environments
PDF
Algorithm for Scheduling of Dependent Task in Cloud
PDF
Dimension data cloud_security_overview
PDF
Oruta phase1 report
PDF
ENERGY EFFICIENCY IN CLOUD COMPUTING
DOCX
My seminar on distributed dbms
PDF
Ensuring distributed accountability for data sharing in the cloud
PDF
LIVE VIRTUAL MACHINE MIGRATION USING SHADOW PAGING IN CLOUD COMPUTING
PDF
Conference Paper: Elastic Network Functions: opportunities and challenges
PDF
Advanced resource allocation and service level monitoring for container orche...
TermPaper
[IJCT-V3I3P2] Authors: Prithvipal Singh, Sunny Sharma, Amritpal Singh, Karand...
A trust management system for ad hoc mobile
AVAILABILITY METRICS: UNDER CONTROLLED ENVIRONMENTS FOR WEB SERVICES
Zálohování a DR do cloudu - přehled technologií
E VALUATION OF T WO - L EVEL G LOBAL L OAD B ALANCING F RAMEWORK IN C L...
Role of Virtual Machine Live Migration in Cloud Load Balancing
Sameer Mitter - Management Responsibilities by Cloud service model types
Embedded systems Implementation in Cloud Challenges
Client server computing in mobile environments
Algorithm for Scheduling of Dependent Task in Cloud
Dimension data cloud_security_overview
Oruta phase1 report
ENERGY EFFICIENCY IN CLOUD COMPUTING
My seminar on distributed dbms
Ensuring distributed accountability for data sharing in the cloud
LIVE VIRTUAL MACHINE MIGRATION USING SHADOW PAGING IN CLOUD COMPUTING
Conference Paper: Elastic Network Functions: opportunities and challenges
Advanced resource allocation and service level monitoring for container orche...
Ad

Similar to Performance, fault tolerance and scalability analysis of virtual infrastructure management system (20)

PDF
A Dynamically-adaptive Resource Aware Load Balancing Scheme for VM migrations...
PDF
F1034047
PDF
Risk Analysis and Mitigation in Virtualized Environments
PPT
Cloud models and platforms
PDF
Resource Allocation using Virtual Machine Migration: A Survey
PPT
Iaa s cloud architectures
PDF
Virtualization in Distributed System: A Brief Overview
PDF
Virtual Machine Migration and Allocation in Cloud Computing: A Review
PDF
International Refereed Journal of Engineering and Science (IRJES)
PDF
International Refereed Journal of Engineering and Science (IRJES)
PPTX
Virtualization and its Types
PPT
Cloud models and platforms
DOCX
Short Economic EssayPlease answer MINIMUM 400 word I need this.docx
PPT
IaaS Cloud Architectures from Virtualized Data Centers to Federated Cloud Inf...
PDF
Server Consolidation Algorithms for Virtualized Cloud Environment: A Performa...
PDF
Analyzing the Difference of Cluster, Grid, Utility & Cloud Computing
PDF
Performance Analysis of Server Consolidation Algorithms in Virtualized Cloud...
PDF
Performance Evaluation of Server Consolidation Algorithms in Virtualized Clo...
PPTX
Four Main Types of Virtualization
PDF
Quick start guide_virtualization_uk_a4_online_2021-uk
A Dynamically-adaptive Resource Aware Load Balancing Scheme for VM migrations...
F1034047
Risk Analysis and Mitigation in Virtualized Environments
Cloud models and platforms
Resource Allocation using Virtual Machine Migration: A Survey
Iaa s cloud architectures
Virtualization in Distributed System: A Brief Overview
Virtual Machine Migration and Allocation in Cloud Computing: A Review
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
Virtualization and its Types
Cloud models and platforms
Short Economic EssayPlease answer MINIMUM 400 word I need this.docx
IaaS Cloud Architectures from Virtualized Data Centers to Federated Cloud Inf...
Server Consolidation Algorithms for Virtualized Cloud Environment: A Performa...
Analyzing the Difference of Cluster, Grid, Utility & Cloud Computing
Performance Analysis of Server Consolidation Algorithms in Virtualized Cloud...
Performance Evaluation of Server Consolidation Algorithms in Virtualized Clo...
Four Main Types of Virtualization
Quick start guide_virtualization_uk_a4_online_2021-uk
Ad

More from www.pixelsolutionbd.com (20)

PDF
Adaptive fault tolerance_in_real_time_cloud_computing
PPT
Software rejuvenation based fault tolerance
PPT
Privacy preserving secure data exchange in mobile p2 p
PPT
Adaptive fault tolerance in cloud survey
PPT
Adaptive fault tolerance in real time cloud_computing
PPT
Protecting from transient failures in cloud deployments
PPT
Protecting from transient failures in cloud microsoft azure deployments
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Cyber Physical System
PDF
Fault tolerance on cloud computing
PDF
Real time service oriented cloud computing
PDF
Comprehensive analysis of performance, fault tolerance and scalability in gri...
PDF
A task based fault-tolerance mechanism to hierarchical master worker with div...
PPT
Privacy preserving secure data exchange in mobile P2P
Adaptive fault tolerance_in_real_time_cloud_computing
Software rejuvenation based fault tolerance
Privacy preserving secure data exchange in mobile p2 p
Adaptive fault tolerance in cloud survey
Adaptive fault tolerance in real time cloud_computing
Protecting from transient failures in cloud deployments
Protecting from transient failures in cloud microsoft azure deployments
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Cyber Physical System
Fault tolerance on cloud computing
Real time service oriented cloud computing
Comprehensive analysis of performance, fault tolerance and scalability in gri...
A task based fault-tolerance mechanism to hierarchical master worker with div...
Privacy preserving secure data exchange in mobile P2P

Recently uploaded (20)

PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Digital Logic Computer Design lecture notes
PPTX
Construction Project Organization Group 2.pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
web development for engineering and engineering
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPT
Project quality management in manufacturing
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
additive manufacturing of ss316l using mig welding
PPTX
Lecture Notes Electrical Wiring System Components
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
CH1 Production IntroductoryConcepts.pptx
Digital Logic Computer Design lecture notes
Construction Project Organization Group 2.pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
web development for engineering and engineering
UNIT 4 Total Quality Management .pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Project quality management in manufacturing
bas. eng. economics group 4 presentation 1.pptx
Foundation to blockchain - A guide to Blockchain Tech
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
additive manufacturing of ss316l using mig welding
Lecture Notes Electrical Wiring System Components
R24 SURVEYING LAB MANUAL for civil enggi
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx

Performance, fault tolerance and scalability analysis of virtual infrastructure management system

  • 1. Performance, Fault-tolerance and Scalability Analysis of Virtual Infrastructure Management System Xiangzhen Kong1 , Jiwei Huang2 , Chuang Lin3 , Peter D. Ungsunan4 Department of Computer Science and Technology Tsinghua University Beijing, 100084, China 1 xiangzhen1985@gmail.com, 2 hjw217@gmail.com, 3 chlin@tsinghua.edu.cn, 4 hongsunan@csnet1.cs.tsinghua.edu.cn Abstract—The virtual infrastructure has become more and more popular in the Grid and Cloud computing. With the aggrandizement scale, the management of the resources in virtual infrastructure faces a great technical challenge. To support the upper services effectively, it raises higher requirements for the performance, fault-tolerance and scalability of virtual infrastructure management systems. In this paper, we study the performance, fault-tolerance and scalability of virtual infrastructure management systems with the three typical structures, including centralized, hierarchical and peer-to-peer structures. We give the mathematical definition of the evaluation metrics and give detailed quantitative analysis, and then get several useful conclusions for enhancing the performance, fault-tolerance and scalability, based on the quantitative analysis. We believe that the results of this work will help system architects make informed choices for building virtual infrastructure. Keywords-virtual infrastructure management system; performance; fault-tolerance; scalability I. INTRODUCTION A Grid is a very large-scale distributed network computing system that can scale to Internet size environments [1, 2]. In recent years, another distributed computing model called Cloud computing has come into being, which is closely related to the existing Grid computing [3]. Both in the Grid and Cloud computing, flexible and efficient sharing of distributed resources is at the core of the design and implementation, which brings forward higher requirements for low-cost, scalable and dependable infrastructure. 
Under this background, infrastructure virtualization becomes a growing concern. Virtualization adds a hardware abstraction layer called the Virtual Machine Monitor (VMM) or Hypervisor. The layer provides an interface that is functionally equivalent to the actual hardware to a number of virtual machines (VMs) [4, 6]. More recently, virtualization became important as a way to improve system security, reliability, reduce costs, and provide greater flexibility [5]. Many servers, on each of which run several VMs, are connected with one another by network modules, and they are under the unified management and play the role of infrastructure for upper application, which are called virtual infrastructure. Virtual infrastructure has numerous advantages, such as low-cost, ease of deployment and more dependable. The virtual infrastructure is becoming popular in Grid and Cloud [7-13]. The virtual infrastructure management system is charge of controlling the resources and VMs. The management of virtual infrastructure has some particular characteristics, for the special mechanism of virtualization such as live migration [14]. The virtual infrastructure management system plays an important role and has a direct impact on the capability of the overall infrastructure. However, with the sharp increase of servers and the large scale of system, the management of virtual infrastructure faces many tough challenges. The sharp increase of virtual servers requires higher scalability; the need for efficiency and quality of service (QoS) of upper application puts forward a demand for real-time and response time; besides, the fault-tolerance is also an important requirement, especially when the management nodes increase. In this paper, we study the virtual infrastructure management systems with three typical structures, and evaluate their performance, fault-tolerance and scalability. 
The contributions are as follows: • According to characteristics of virtual infrastructure management systems, we propose the performance, fault-tolerance and scalability evaluation metrics, and give the detailed expressions and calculations. • We make the quantitative analysis and evaluation of the performance, fault-tolerance and scalability of the virtual infrastructure management systems with three typical structures, including centralized, hierarchical and peer-to-peer structures. • Based on the performance, fault-tolerance and scalability analyses, we summarize some basic rules, which are directive and with reference value for the design and implementation of virtual infrastructure management system in Grid and Cloud computing. The rest of this paper is organized as follows. Section 2 introduces the research background and some concepts. In Section 3, three typical structures of virtual infrastructure management system are introduced, and then the detail calculation and analysis of the performance, fault-tolerance and scalability are given. Section 4 makes further discussions and summarizes some useful rules. Section 5 shows the numerical results by analyzing an example. At last, we conclude the paper in Section 6. II. BACKGROUND Infrastructure virtualization becomes a hot issue in interested among the industry and academe. Virtualization is becoming popular in Grid computing[7-9], and is inherently 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications Unrecognized Copyright Information DOI 10.1109/ISPA.2009.24 282 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications 978-0-7695-3747-4/09 $25.00 © 2009 IEEE DOI 10.1109/ISPA.2009.24 282
  • 2. Figure 1. VMware infrastructure 3 components [18]. Figure 2. The typical structures of the virtual infrastructure management system:(a) Centralized (b) Hierarchical (c) Peer-to-peer structures. key feature of Cloud computing [10-13]. Benefited from its special mechanisms, the virtual infrastructure has numerous advantages. Server consolidation reduces the cost sharply; live migration[14] or VMotion[15] greatly improve the flexibility, maintainability and availability; Checkpoint / Restart and Isolation help enhance the reliability, security and survivability[16, 17]. The implementation of these mechanisms is under unified management of the virtual infrastructure management system. Fig. 1 shows an example of virtual infrastructure, called VMware Infrastructure 3 [18], which is published by VMware Inc. Here is a small-scale system, and the virtual infrastructure management system consists of only one management server called VirtualCenter (VC) server. The VC server manages multiple virtual servers at the same time, and unifies resources from individual virtual server so that those resources can be shared among virtual machines. The running state of every virtual server is monitored and recorded by VC server. When one virtual server overloads or suffers failures, some of the VMs running on it could move to another suitable virtual server by VMotion with little downtime. The migration target virtual server is selected by VC server according to the running state of every virtual server. The server with least workload or most dependability will be selected according to different strategies. With the rapid development of distributed application, the scale of system aggrandizes sharply. There are even tens of thousands of servers in some large virtual infrastructure of Grid or Cloud computing system. In this case, how to construct an effective virtual infrastructure management system for higher performance, fault-tolerance and scalability becomes a challenge. III. 
PERFORMANCE, FAULT-TOLERANCE AND SCALABILITY ANALYSIS A. Typical Structures of Virtual Infrastructure Management System Similar to the traditional Grid resource manage system, the structures of virtual infrastructure management system can be classified into centralized, hierarchical and peer-to-peer structures [2, 22]. The other complex structures can be view as the hybrid of the three typical structures. 1) Centralized structure In a centralized virtual infrastructure management system, all Virtual Servers (VSs) are managed by one Virtual Infrastructure Management Server (VIMS). The central VIMS monitors and records the running state information of every VS. When one of the virtual machines in one VS needs migration, the VIMS decides which is the best migration target VS. The virtual infrastructure with centralized structure management system is shown in Fig. 2 (a). 2) Hierarchical structure The hierarchical structure is shown in Fig. 2 (b). The VIMSs in the lowest layer manage the virtual servers directly, and the lower layer VIMSs are under the administration of the parent VIMSs. Lower layer VIMSs need merely to pass the digest message to their parent VIMS, which reduce the traffic and raise efficiency. The root VIMS has the whole information of all VSs. 3) Peer-to-peer structure Virtual infrastructure management system of peer-to-peer structure is a decentralized system (see Fig. 2(c)). It can be viewed as another expansion of the centralized structure. Every peer VIMS has local group of virtual servers, different VIMSs communicate directly with each other to get running state of the VS managed by others. Every VIMS can get the whole information of the virtual infrastructure. B. Performance Analysis We take average response time of every migration request as the performance metric, which is the expectation period from the time that a migration request is submitted to VIMS, to the time that VIMS gets the best choice of the target VS. 
Hereafter, we assume the total number of VS nodes is n , and the operation period of every VS needless migration conforms to exponential distribution, with expectation value 0T . So the migration request event from every VS node conforms to Poisson distribution with parameter 0 01/Tλ = . 1) Performance analysis of the centralized structure There is only one VIMS which manages n VSs in the centralized system. According to the additive property of the Poisson distribution, we know that the total migration requests received by the central VIMS conform to Poisson distribution too. The arrival rate of migration requests is 0nλ λ= . The central VIMS compares the workload or dependability of every VS node recorded in it, and then selects the best choice as the migration target VS. The process takes ( )O n time generally, so we can assume that the average processing time of VIMS for every migration request is S Kn= , where K is a constant independent from n . Assuming the processing time of VIMS for every migration request conforms to arbitrary distribution, we can 283283
  • 3. view the VIMS in centralized structure as an M/G/1 queue model. According to the Pollaczek-Khinchin mean-value formula [19], the average service time of M/G/1 queue is 2 (1 ) 2(1 ) S C T S ρ ρ + = + − (1) WhereC is the coefficient of variation of the processing time S , i.e. /SC Sσ= . SoC is relevant to the distribution of S. But as the process for every migration request is similar, we can assign a small constant toC . ρ is the server utilization, and should be less than one for the stability of system. For the centralized structure, we have S Kn= and 2 0/S Kn Tρ λ= = . Therefore, from (1), the average response time of every migration request can be expressed by 2 3 2 2 0 (1 ) 2( ) C K n C T Kn T Kn + = + − (2) To keep the system’s stability, the utilization should be 1ρ < . So the number limit of the VS nodes those can be managed by the central VIMS is 0 / Cn T K N< = . 2) Performance analysis of the hierarchical structure It is assumed that a hierarchical system has h layers. Every VIMS except those in the lowest layer has d son VIMSs. So there are 1l d − VIMSs in the l th layer ( 1,2, ,l h= ) and ( ) ( )1 / 1h m d d= − − VIMSs totally. Each VIMS in the lowest layer directly manage 1 / h n d − VS nodes. Therefore, the average arrival rate of migration request is ( )1 0/ h h n d Tλ − = for the VIMS in the lowest layer, and ( )1 0/ i i n d Tλ − = for the VIMS in the i th (1 1i h≤ ≤ − ) layer. The average processing time of every migration request for the VIMS in the lowest layer is 1 / h hS Kn d − = ; and because the VIMSs in lower layer only need to report its locally optimal choice to its parent VIMS, the average processing time of every migration request for the VIMS in the i th (1 1i h≤ ≤ − ) layer is i iS Kd= . So the utilization of VIMS in the lowest layer is ( ) 21 0/ /h h h hS K n d Tρ λ − = = , and for the VIMS in the i th (1 1i h≤ ≤ − ) layer 0/i i iS Kdn Tρ λ= = . 
From (1), the average response time of a migration request in the hierarchical management system is

T_H = \sum_{i=1}^{h-1} \left( Kd + \frac{K^2 d^2 n (1+C^2)}{2(d^{i-1} T_0 - Kdn)} \right) + \frac{Kn}{d^{h-1}} + \frac{K^2 (n/d^{h-1})^3 (1+C^2)}{2\left(T_0 - K (n/d^{h-1})^2\right)}    (3)

For the stability of the system we require \rho_h < 1 and \rho_i < 1, so the number of VSs that the hierarchical VIMSs can manage is bounded by n < \min\{ d^{h-1}\sqrt{T_0/K},\; T_0/(Kd) \} = N_H.

3) Performance analysis of the peer-to-peer structure

Assume there are m VIMSs in the virtual infrastructure management system, so each VIMS manages n/m VSs. When the virtual machines in a VS need migration, the VS sends a migration request to its local VIMS, which computes the locally optimal target VS, collects the locally optimal targets from the other m-1 VIMS nodes, and then selects the best among them. So the average arrival rate of migration requests at every VIMS is

\lambda = \frac{n}{m}\lambda_0 + (m-1)\frac{n}{m}\lambda_0 = \frac{n}{T_0};

the average processing time is S = K(n/m + m - 1), and the server utilization is \rho = \lambda S = Kn(n/m + m - 1)/T_0. From (1), the average response time of a migration request is

T_P = K\left(\frac{n}{m}+m-1\right) + \frac{K^2 n \left(\frac{n}{m}+m-1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m}+m-1\right)\right)}    (4)

The stability condition \rho < 1 bounds the number of VSs that the peer-to-peer VIMSs can manage:

n < \frac{\sqrt{m^2(m-1)^2 + 4mT_0/K} - m(m-1)}{2} = N_P.

C. Fault-tolerance Analysis

Workload has been found to be a very influential factor in system failure rate [20]. Using a linear model for this relationship, we write the failure rate as f = A\rho + B, where \rho is the system utilization and A and B are constants representing, respectively, the failure effect induced by workload and the inherent failure rate.
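The response-time formulas (3) and (4) can be sketched in the same style (helper names are illustrative):

```python
def t_hierarchical(n, h, d, K, T0, C):
    """Average response time of the hierarchical structure, Eq. (3)."""
    t = 0.0
    for i in range(1, h):                     # upper layers 1 .. h-1
        rho = K * d * n / (d**(i - 1) * T0)   # rho_i = lambda_i * S_i
        t += K * d + rho * K * d * (1 + C**2) / (2 * (1 - rho))
    s = n / d**(h - 1)                        # VSs per lowest-layer VIMS
    rho = K * s**2 / T0                       # rho_h
    t += K * s + rho * K * s * (1 + C**2) / (2 * (1 - rho))
    return t

def t_p2p(n, m, K, T0, C):
    """Average response time of the peer-to-peer structure, Eq. (4)."""
    S = K * (n / m + m - 1)   # local scan plus m-1 remote reports
    rho = n * S / T0          # total arrival rate at each VIMS is n / T0
    return S + rho * S * (1 + C**2) / (2 * (1 - rho))
```

With h = 1 or m = 1 both functions reduce to the centralized formula (2), in line with the transformation relationships proved later in Section 4.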
According to reliability theory, the probability that the system is failure-free over a period (0, t) is R(t) = \mathrm{Prob}\{Y > t\} = e^{-ft}, so the probability that the system breaks down in (0, t) is F(t) = 1 - R(t) = 1 - e^{-ft}. The failure of a VIMS node may make the information of part or all of the VS nodes unavailable to the system, which hampers the computation of the best migration target server. Let n_F be the average number of VS nodes still managed by the system in the presence of VIMS failures, and n the total number of VS nodes; the fault-tolerance of the system can then be expressed as

FT = n_F / n    (5)

1) Fault-tolerance analysis of the centralized structure

According to subsection 3.2, the utilization of the single VIMS in the centralized system is \rho_C = Kn^2/T_0. So its failure rate is f_C = A\rho_C + B, and the probability that the system breaks down in (0, t) is F(t) = 1 - R(t) = 1 - e^{-f_C t}. Hereafter we consider the probability per unit time, so

F_C = 1 - e^{-f_C} = 1 - \exp\{-(AKn^2/T_0 + B)\}    (6)

Since there is only one VIMS node in the centralized structure, no VS node can be managed when that VIMS fails. Hence n_F = 0 \cdot F_C + n(1 - F_C), and the fault-tolerance is

FT_C = n_F/n = 1 - F_C = \exp\{-(AKn^2/T_0 + B)\}    (7)

2) Fault-tolerance analysis of the hierarchical structure

There are h layers of VIMS nodes in the hierarchical system, managing n VS nodes. The
number of VIMS nodes in the k-th layer is m_k = d^{k-1}. The utilization of a VIMS node in the k-th layer is

\rho_k = \begin{cases} Kdn/(d^{k-1} T_0), & 1 \le k \le h-1 \\ K(n/d^{h-1})^2 / T_0, & k = h \end{cases}    (8)

Since the failure probability of one VIMS node in the k-th layer is F_H(k) = 1 - e^{-f_H} = 1 - \exp\{-(A\rho_k + B)\}, we obtain

F_H(k) = \begin{cases} 1 - \exp\{-(AKdn/(d^{k-1} T_0) + B)\}, & 1 \le k \le h-1 \\ 1 - \exp\{-(AK(n/d^{h-1})^2/T_0 + B)\}, & k = h \end{cases}    (9)

When i nodes in the k-th layer fail, the VIMSs can still manage n_k(i) = n - i(n/m_k) = n(1 - i/d^{k-1}) VS nodes. Therefore, the average number of VS nodes that can be managed, as seen from the k-th layer, is

n_F(k) = \sum_{i=0}^{m_k} \binom{m_k}{i} F_H(k)^i (1-F_H(k))^{m_k - i} \, n_k(i) = n \sum_{i=0}^{d^{k-1}} \binom{d^{k-1}}{i} \left(1-\frac{i}{d^{k-1}}\right) F_H(k)^i (1-F_H(k))^{d^{k-1}-i}    (10)

Hence, from (9) and (10), taking n_F as the average of n_F(k) over the h layers and noting that the binomial sum in (10) reduces to n(1 - F_H(k)), the fault-tolerance of the hierarchical structure is

FT_H = \frac{1}{h}\sum_{k=1}^{h} \frac{n_F(k)}{n} = \frac{1}{h}\left[\sum_{k=1}^{h-1} \exp\{-(AKdn/(d^{k-1}T_0)+B)\} + \exp\{-(AKn^2/(d^{2(h-1)}T_0)+B)\}\right]    (11)

3) Fault-tolerance analysis of the peer-to-peer structure

Each VIMS node directly manages n/m VS nodes, so when i VIMS nodes fail, the average number of VS nodes still managed is n_F(i) = (m - i) \cdot n/m.
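A minimal sketch of the layered computation (8)-(11); A and B are the constants of the linear failure model, and all names and parameter values are illustrative:

```python
import math

def ft_hierarchical(n, h, d, K, T0, A, B):
    """Fault-tolerance FT_H of the hierarchical structure, Eq. (11).
    A and B are the constants of the linear failure model f = A*rho + B."""
    total = 0.0
    for k in range(1, h + 1):
        if k < h:
            rho = K * d * n / (d**(k - 1) * T0)  # upper-layer utilization, Eq. (8)
        else:
            rho = K * (n / d**(h - 1))**2 / T0   # lowest-layer utilization
        # Per-layer failure probability F_H(k), Eq. (9); the binomial sum in
        # Eq. (10) reduces to n*(1 - F_H(k)), so each layer contributes 1 - F_H(k).
        total += math.exp(-(A * rho + B))
    return total / h
```

With h = 1 this reduces to FT_C = exp{-(AKn^2/T_0 + B)} of Eq. (7), matching the transformation relationship proved in Section 4.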
Assuming that failures of different VIMS nodes are independent, we have

n_F = \sum_{i=0}^{m} \binom{m}{i} F_P^i (1-F_P)^{m-i} (m-i)\frac{n}{m} = n \sum_{i=0}^{m} \binom{m}{i} \left(1-\frac{i}{m}\right) F_P^i (1-F_P)^{m-i}    (12)

Because the utilization is \rho_P = Kn(n/m + m - 1)/T_0 and the failure rate is f_P = A\rho_P + B, the failure probability is

F_P = 1 - e^{-f_P} = 1 - \exp\{-(AKn(n/m + m - 1)/T_0 + B)\}    (13)

Hence the fault-tolerance of the peer-to-peer structure is

FT_P = \frac{n_F}{n} = \sum_{i=0}^{m} \binom{m}{i} \left(1-\frac{i}{m}\right) F_P^i (1-F_P)^{m-i} = 1 - F_P = \exp\{-(AKn(n/m + m - 1)/T_0 + B)\}    (14)

where the second equality uses the fact that the expected number of failed VIMS nodes is mF_P.

D. Scalability Analysis

Jogalekar et al. proposed a strategy-based scalability metric for distributed systems [23, 24]. The scalability from scale k_1 to scale k_2 can be expressed as

\psi(k_1, k_2) = \frac{F(k_2)}{F(k_1)} = \frac{\lambda_2 f_2 / C_2}{\lambda_1 f_1 / C_1}    (15)

where \lambda is the throughput in responses/sec, C(k) is the rental cost, and f(k) = 1/(1 + T(k)/\hat{T}) is the average value of each response, T being the mean response time and \hat{T} its target value. The productivity F(k) = \lambda(k) f(k)/C(k) is the value delivered per second. To obtain analytic solutions, k for the base case is taken as 1, and the metric is written as \psi(k) = F(k)/F(1) [23]. Fig. 3 shows how \psi might behave in different situations [24, 25]. The value of \psi and its trend as the scale factor grows are used to evaluate the scalability of a system.

For a virtual infrastructure management system, we are concerned with the change of management efficiency, rather than of productivity, as the system scales up. The management efficiency with m VIMSs is defined as

F(m) = n \cdot f(m) / C(m)    (16)

where n is the maximum number of VSs that the VIMSs are able to manage, f(m) is the average value of each VIMS determined by the management performance, and C(m) is the cost at scale m, assumed to be C(m) = \alpha m.
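The peer-to-peer fault-tolerance (12)-(14) can be sketched likewise; the binomial sum is evaluated explicitly here so that the closed form 1 - F_P can be confirmed numerically (names and parameter values are illustrative):

```python
import math
from math import comb

def ft_p2p(n, m, K, T0, A, B):
    """Fault-tolerance FT_P of the peer-to-peer structure, Eq. (14)."""
    rho = K * n * (n / m + m - 1) / T0   # utilization of each VIMS
    F_P = 1 - math.exp(-(A * rho + B))   # failure probability, Eq. (13)
    # Binomial-weighted surviving fraction of Eq. (12)/(14).
    return sum(comb(m, i) * (1 - i / m) * F_P**i * (1 - F_P)**(m - i)
               for i in range(m + 1))
```

Since the expected number of failed VIMSs is m*F_P, the sum collapses to 1 - F_P, which is the closed form given in (14); with m = 1 it reduces to FT_C of Eq. (7).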
For the average response time, the centralized structure can be viewed as a special case of the hierarchical and peer-to-peer structures (to be proved in Section 4), so we take the centralized structure as the point of reference. The average value of each response in the other structures is evaluated by comparison with the centralized one, as

f(m) = 1/(1 + T(m)/T_C)    (17)

where T(m) is the average response time of the structure in question and T_C is the average response time of the centralized structure. The management efficiency is then

F(m) = \frac{n}{\alpha m (1 + T(m)/T_C)}    (18)

and the scalability of scale m_2 relative to m_1 is

\psi(m_1, m_2) = \frac{F(m_2)}{F(m_1)} = \frac{n_2 m_1 (1 + T(m_1)/T_C)}{n_1 m_2 (1 + T(m_2)/T_C)}    (19)

1) Scalability analysis of the centralized structure

In the centralized system there is only one VIMS, i.e. m = 1, which can manage at most n = r VSs, where r is the maximum number of VSs that a single VIMS is able to manage. The average management efficiency is

F_C = \frac{r}{\alpha (1 + T_C/T_C)} = \frac{r}{2\alpha}    (20)

For both the hierarchical and the peer-to-peer structure, when m = 1 and n = r, we have T_H = T_C and T_P = T_C (to be proved in Section 4). Hence F_H(m=1) = F_C and F_P(m=1) = F_C, and consequently \psi_H(m=1) = 1 and \psi_P(m=1) = 1. In conclusion, the centralized structure is the base case of the hierarchical and the peer-to-peer structure and can serve as the reference when m \neq 1. For both the hierarchical and the peer-to-peer structure,
Figure 3. Different scalability behaviors [24, 25].

\psi(m) = F(m)/F_C = \frac{2n}{rm(1 + T(m)/T_C)}    (21)

2) Scalability analysis of the hierarchical structure

In a hierarchical system there are m = (d^h - 1)/(d - 1) VIMSs in all, able to manage at most n = d^{h-1} r VSs. Substituting n = d^{h-1} r into (3), the average response time is

T_H = \sum_{i=1}^{h-1}\left( Kd + \frac{K^2 d^{\,h-i+2} r (1+C^2)}{2(T_0 - K d^{\,h-i+1} r)} \right) + Kr + \frac{K^2 r^3 (1+C^2)}{2(T_0 - K r^2)}    (22)

For a centralized system managing the same number of VSs, the average response time from (2) is

T_C = K d^{h-1} r + \frac{K^2 d^{3(h-1)} r^3 (1+C^2)}{2(T_0 - K d^{2(h-1)} r^2)}    (23)

Hence, from (21), the scalability of the hierarchical structure is

\psi_H(m) = \frac{2 d^{h-1}(d-1)}{(d^h - 1)(1 + T_H/T_C)}    (24)

with T_H and T_C given by (22) and (23).

3) Scalability analysis of the peer-to-peer structure

In a virtual infrastructure management system with the peer-to-peer structure, there are m VIMSs in all, able to manage at most n = mr VSs. The average response time from (4) is

T_P = K(r+m-1) + \frac{K^2 m r (r+m-1)^2 (1+C^2)}{2(T_0 - K m r (r+m-1))}    (25)

For a centralized system managing the same number of VSs, the average response time is

T_C = Kmr + \frac{K^2 m^3 r^3 (1+C^2)}{2(T_0 - K m^2 r^2)}    (26)

Hence the scalability of the peer-to-peer structure is

\psi_P(m) = \frac{2}{1 + T_P/T_C}    (27)

IV. FURTHER DISCUSSIONS ON PERFORMANCE, FAULT-TOLERANCE AND SCALABILITY

In this section we elaborate further on the performance, fault-tolerance and scalability of the three types of structures.
Some rules and theorems will be summarized and proved which provide direction and reference value for the construction of virtual infrastructure management systems.

A. Transformation relationships between the structures

From Section 3 we can derive the transformation relationships among the three types of structures, which corroborates the correctness of the calculations to a certain extent.

Proposition 1: In terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the hierarchical and of the peer-to-peer structure.

Proof: We prove it by giving the transformation conditions.

1) From the hierarchical structure to the centralized structure

From (3), the average response time of the management system with the hierarchical structure is

T_H = \sum_{i=1}^{h-1}\left( Kd + \frac{K^2 d^2 n (1+C^2)}{2(d^{i-1} T_0 - Kdn)} \right) + \frac{Kn}{d^{h-1}} + \frac{K^2 (n/d^{h-1})^3 (1+C^2)}{2\left(T_0 - K (n/d^{h-1})^2\right)}

When there is only one layer in the hierarchical structure, i.e. h = 1, the sum over the upper layers vanishes and

T_H(h=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C.

Meanwhile, the upper limit on the number of managed VS nodes is N_H = \min\{ d^{h-1}\sqrt{T_0/K},\; T_0/(Kd) \}; with h = 1 the upper-layer constraint disappears, so N_H(h=1) = \sqrt{T_0/K} = N_C. Similarly, from (11), when h = 1 the fault-tolerance of the system with the hierarchical structure is

FT_H(h=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C.

Since h = 1, we get m = (d^h - 1)/(d - 1) = 1, and from Section 3.4.1 we get the scalability relation \psi_H(h=1) = F_H(h=1)/F_C = 1. Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the hierarchical structure with number of layers h = 1.
2) From the peer-to-peer structure to the centralized structure

From (4), the average response time of the management system with the peer-to-peer structure is

T_P = K\left(\frac{n}{m}+m-1\right) + \frac{K^2 n \left(\frac{n}{m}+m-1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m}+m-1\right)\right)}

When m = 1, we get

T_P(m=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C.

Meanwhile, the upper limit on the number of managed VS nodes is N_P = \left(\sqrt{m^2(m-1)^2 + 4mT_0/K} - m(m-1)\right)/2; substituting m = 1 gives N_P(m=1) = \sqrt{T_0/K} = N_C. Similarly, from (14), when m = 1 the fault-tolerance of the system with the peer-to-peer structure is

FT_P(m=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C.

For the scalability, from Section 3.4.1 we get \psi_P(m=1) = F_P(m=1)/F_C = 1. Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the peer-to-peer structure with m = 1. □
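The scalability expressions (21)-(27), together with the base cases ψ_H(m=1) = ψ_P(m=1) = 1 established above, can be checked with a short sketch (helper names and the default parameter values are ours):

```python
def t_mg1(S, rho, C=0.5):
    # Eq. (1); evaluated formally, so no stability guard here.
    return S + rho * S * (1 + C**2) / (2 * (1 - rho))

def psi_hierarchical(h, d, r, K=0.1, T0=3600.0, C=0.5):
    """Scalability of the hierarchical structure, Eq. (24), at full load
    n = d^(h-1)*r with m = (d^h - 1)/(d - 1) VIMSs."""
    n, m = d**(h - 1) * r, (d**h - 1) // (d - 1)
    TH = sum(t_mg1(K * d, K * d * n / (d**(i - 1) * T0), C) for i in range(1, h))
    TH += t_mg1(K * r, K * r**2 / T0, C)   # lowest-layer VIMS manages r VSs
    TC = t_mg1(K * n, K * n**2 / T0, C)    # Eq. (23)
    return 2 * n / (r * m * (1 + TH / TC))

def psi_p2p(m, r, K=0.1, T0=3600.0, C=0.5):
    """Scalability of the peer-to-peer structure, Eq. (27), at n = m*r."""
    n = m * r
    S = K * (r + m - 1)
    TP = t_mg1(S, n * S / T0, C)           # Eq. (25)
    TC = t_mg1(K * n, K * n**2 / T0, C)    # Eq. (26)
    return 2 / (1 + TP / TC)
```

At the base scale both functions return exactly 1, as Proposition 1 requires; at larger scales they reproduce the curves of Fig. 6.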
B. Structure selection for the best performance

Intuitively, when the number of managed VS nodes is small, the centralized structure has the lowest average response time because it is flat and direct. But as the number of VS nodes increases, the processing time of the single VIMS grows sharply, so we need to consider the other structures for better performance. We first give the following proposition.

Proposition 2: Given h > 1 and 1 < d < N_C, there is a threshold M_CH < N_C on the number of managed VS nodes such that T_C > T_H whenever n > M_CH. Given m > 1, there is likewise a threshold M_CP < N_C such that T_C > T_P whenever n > M_CP.

Proof: Under the conditions h > 1 and d < N_C, the limit on managed VS nodes for the hierarchical structure satisfies N_H > N_C, and for m > 1 we have N_P > N_C. From (3) and (4) it is obvious that T_C(n=1) < T_H(n=1) and T_C(n=1) < T_P(n=1) when managing a single VS node. Differentiating (2), (3) and (4) with respect to n gives dT_C/dn > 0, dT_H/dn > 0 and dT_P/dn > 0, so the average response time of each of the three structures is a monotonically increasing function of n. From (2), T_C(n) → ∞ as n → N_C, while, since N_H > N_C and N_P > N_C, both T_H(N_C) < ∞ and T_P(N_C) < ∞. Hence, by monotonicity and these end values, there must be a threshold M_CH < N_C for the hierarchical structure and a threshold M_CP < N_C for the peer-to-peer structure that make the proposition hold. □

The values of M_CH and M_CP can be obtained by solving the equations T_C(n) = T_H(n) and T_C(n) = T_P(n). So when the number of VS nodes satisfies n < min{M_CH, M_CP}, we select the centralized structure; otherwise one of the other structures is selected according to the computed values of T_H(n) and T_P(n).

C. Scalability comparison between the hierarchical and peer-to-peer structures

Next, we compare the scalability of the virtual infrastructure management systems with the hierarchical structure and the peer-to-peer structure.
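The thresholds M_CH and M_CP of Proposition 2 can be located numerically, e.g. by bisection; the sketch below (names and parameter values are illustrative) solves T_C(n) = T(n) for a competing structure using the monotonicity established in the proof.

```python
import math

def crossover_threshold(t_other, K=0.1, T0=3600.0, C=0.5):
    """Find the threshold of Proposition 2 where T_C(n) first exceeds the
    response time t_other(n) of a competing structure, by bisection on
    [1, N_C). t_other is any response-time function of n."""
    def t_c(n):   # Eq. (2)
        return K * n + K**2 * n**3 * (1 + C**2) / (2 * (T0 - K * n**2))
    lo, hi = 1.0, math.sqrt(T0 / K) * (1 - 1e-9)   # T_C blows up at N_C
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_c(mid) < t_other(mid):
            lo = mid          # still below the crossover
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative peer-to-peer competitor with m = 13 VIMSs:
def t_p(n, m=13, K=0.1, T0=3600.0, C=0.5):
    S = K * (n / m + m - 1)
    rho = n * S / T0
    return S + rho * S * (1 + C**2) / (2 * (1 - rho))

M_CP = crossover_threshold(t_p)
```

With these parameters the crossing happens to fall exactly at n = m = 13, because at n = m the service demand K(n/m + m - 1) equals Kn and the arrival rates of the two structures coincide.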
Proposition 3: Given r > 2d - 1, the scalability \psi_P(m) of the system with the peer-to-peer structure is higher than the scalability \psi_H(m) of the system with the hierarchical structure.

Proof: To compare the scalability of the two structures, we let the numbers of VIMSs of the two systems be equal, i.e. m = (d^h - 1)/(d - 1). From (24) and (27), we have

\lim_{m \to \infty} \psi_H(m) = \frac{2(dr - r)}{dr - d} \quad \text{and} \quad \lim_{m \to \infty} \psi_P(m) = \frac{2r}{r + 1}.

Given r > 2d - 1, we have \lim \psi_P(m) > \lim \psi_H(m); that is, when m is large enough, the scalability of the peer-to-peer system is higher than that of the hierarchical one. Next, we construct a continuous function f(x) on [1, \infty) by linear interpolation:

f(x) = \psi_P(x) - \psi_H(x) for x \in \mathbb{N}, and f(x) = f(\lfloor x \rfloor) + (x - \lfloor x \rfloor)\left(f(\lceil x \rceil) - f(\lfloor x \rfloor)\right) for x \notin \mathbb{N}.

The equation f(x) = 0 has only one solution, x = 1; hence \psi_P(m) \neq \psi_H(m) for all m > 1. Assume there were some m_1 > 1 with \psi_P(m_1) < \psi_H(m_1). Since \psi_P(m_2) > \psi_H(m_2) for m_2 large enough, we would have f(m_1) < 0 and f(m_2) > 0, i.e. f(m_1) f(m_2) < 0. By the Zero Point Theorem there must then be some \xi > 1 with f(\xi) = 0, contradicting the fact that x = 1 is the only solution of f(x) = 0. The assumption is therefore false, so \psi_P(m) \geq \psi_H(m) for all m > 1; combined with \psi_P(m) \neq \psi_H(m), we conclude that \psi_P(m) > \psi_H(m) for all m > 1. □

V. NUMERICAL RESULTS AND REMARKS

In this section we give numerical results by analyzing and solving an example, which helps to reveal some regularities and to verify the conclusions proposed in Section 4. We set K = 0.1, which means a VIMS can process the migration requests from 10 VSs per second. Since the work done for every migration request is similar, we assign the small constant C = 0.5.
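The limiting values used in the proof of Proposition 3 can be checked numerically; the sketch below (names are ours) also confirms that r = 2d - 1 is exactly the point where the two limits coincide.

```python
def limit_psi_hierarchical(d, r):
    # lim_{m -> inf} psi_H(m) = 2(dr - r)/(dr - d), from Proposition 3
    return 2 * (d * r - r) / (d * r - d)

def limit_psi_p2p(r):
    # lim_{m -> inf} psi_P(m) = 2r/(r + 1)
    return 2 * r / (r + 1)

# At r = 2d - 1 the two limits coincide; for larger r the
# peer-to-peer limit is strictly higher.
d = 3
print(limit_psi_p2p(2 * d - 1), limit_psi_hierarchical(d, 2 * d - 1))
print(limit_psi_p2p(7), limit_psi_hierarchical(d, 7))
```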
The mean normal operation period is T_0 = 3600, which means each VS node sends a migration request on average once per hour.

A. Performance analysis results

We study the effect of the structure parameters of the hierarchical and peer-to-peer management systems on the average response time. Fig. 4(a) shows that when the number of managed VS nodes is small, a system with fewer layers has a lower average response time. But as the number of VS nodes increases, the fewer layers the system has, the faster its average response time grows: a system with more layers performs better when the workload is heavy. The effect of the out-degree d is similar to that of h, but less drastic (see Fig. 4(b)). For the peer-to-peer structure, the effect of the number of VIMS nodes m is similar (see Fig. 4(c)).

In Fig. 7(a) we compare the performance of the three types of systems. We set d = 3 and h = 3 for the hierarchical system and m = 13 for the peer-to-peer one, so the two systems have the same number of VIMS nodes. Fig. 7(a) shows that when the number of VS nodes is small enough, the centralized system has the lowest average response time; as the number increases, the other two systems achieve lower average response times, just as Proposition 2 states. Computing the limits on the number of managed VS nodes for the three structures yields N_C = 190, N_H = 1708 and N_P = 611, so the hierarchical system performs better when the number of VS nodes is large enough.
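The quoted capacity limits follow directly from the stability bounds of Section 3 (a sketch; values rounded to the nearest integer as in the text):

```python
import math

# Reproducing the capacity limits quoted above (K = 0.1, T0 = 3600;
# hierarchical: d = 3, h = 3; peer-to-peer: m = 13):
K, T0 = 0.1, 3600.0
d, h, m = 3, 3, 13

N_C = math.sqrt(T0 / K)
N_H = min(d**(h - 1) * math.sqrt(T0 / K), T0 / (K * d))
N_P = (math.sqrt(m**2 * (m - 1)**2 + 4 * m * T0 / K) - m * (m - 1)) / 2

print(round(N_C), round(N_H), round(N_P))   # 190 1708 611
```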
B. Fault-tolerance analysis results

We study the effect of the structure parameters on fault-tolerance. Fig. 5(a) shows that the fault-tolerance of the hierarchical system decreases as the number of VS nodes increases, and the fewer layers the system has, the faster it decreases. The influence of the out-degree d is similar but smaller than that of h (see Fig. 5(b)). The fault-tolerance of peer-to-peer systems also decreases as the number of VSs increases, and a system with more VIMSs has better fault-tolerance (see Fig. 5(c)).

Fig. 7(b) compares the fault-tolerance of the three types of management systems, where the hierarchical and peer-to-peer systems both have 13 VIMS nodes. The centralized system has very poor fault-tolerance, while the hierarchical system has the best: when the number of VS nodes is large enough, the hierarchical system holds a clear advantage in fault-tolerance.

C. Scalability analysis results

We study the effect of the structure parameters of the hierarchical system on scalability. Setting d = 3 and comparing systems with different values of the parameter r, Fig. 6(a) shows that the more VSs each VIMS is able to manage, the more scalable the system is. Setting r = 7 and comparing systems with different values of the parameter d, Fig. 6(b) shows that the more child VIMS nodes each VIMS has, the more scalable the system is. For the peer-to-peer system, Fig. 6(c) shows likewise that the more VSs each VIMS node is able to manage, the more scalable the system is.

In Fig. 7(c) we compare the scalability of the hierarchical and peer-to-peer systems. With r = 7, the scalability of the peer-to-peer system is higher than that of the hierarchical systems for every value of d, which means the peer-to-peer system is the more scalable of the two.
This result agrees with the conclusion proved in Proposition 3.

VI. CONCLUSIONS

This work is the first attempt to comprehensively analyze and evaluate the performance, fault-tolerance and scalability of virtual infrastructure management systems. According to the characteristics of the virtual infrastructure, we give mathematical definitions of the evaluation metrics for performance, fault-tolerance and scalability. Three typical structures of the virtual infrastructure management system are studied: centralized, hierarchical and peer-to-peer. We give detailed calculations to quantitatively analyze their performance, fault-tolerance and scalability, and based on this quantitative analysis, some useful rules and conclusions are drawn and proved which provide direction and reference value for constructing virtual infrastructure management systems with higher performance, fault-tolerance and scalability.

This paper provides a general analysis and evaluation method that allows system designers to evaluate alternative virtual infrastructure management systems by assigning parameter values as they deem appropriate. It enhances their ability to make informed choices when building a virtual infrastructure management system, before undergoing the expensive process of constructing and evaluating multiple prototypes.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 60673187) and the National High Technology Research and Development Program of China (No. 2007AA01Z419).

REFERENCES

[1] I. Foster and C. Kesselman (eds.), "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann, San Francisco, CA, 2004.
[2] K. Krauter, R. Buyya, and M. Maheswaran, "A Taxonomy and Survey of Grid Resource Management Systems for Distributed Computing", International Journal of Software: Practice and Experience (SPE), Vol. 32, No. 2, pp. 135-164, 2002.
[3] I. Foster, Y. Zhao, I.
Raicu, and S. Lu, "Cloud Computing and Grid Computing 360-Degree Compared", Grid Computing Environments (GCE) Workshop, pp. 1-10, Nov. 2008.
[4] D. A. Menasce, "Virtualization: Concepts, Applications, and Performance Modeling", Computer Measurement Group (CMG), http://www.cmg.org/proceedings/2005/5189.pdf, 2005.
[5] R. Figueiredo, P. A. Dinda, and J. Fortes, "Resource Virtualization Renaissance", IEEE Computer, Vol. 38, No. 5, pp. 28-31, 2005.
[6] P. T. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of Virtualization", Proc. 19th ACM Symposium on Operating Systems Principles (SOSP), pp. 164-177, October 2003.
[7] R. Figueiredo, P. Dinda, and J. Fortes, "A Case for Grid Computing on Virtual Machines", Proc. 23rd Int'l Conf. Distributed Computing Systems (ICDCS), IEEE CS Press, pp. 550-559, 2003.
[8] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, "VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing", Proc. IEEE/ACM Supercomputing Conference, IEEE CS Press, p. 7, 2004.
[9] J. Alonso, L. Silva, A. Andrzejak, P. Silva, and J. Torres, "High-available grid services through the use of virtualized clustering", Proc. 8th IEEE/ACM International Conference on Grid Computing, pp. 34-41, Sept. 2007.
[10] Amazon, "Amazon Elastic Compute Cloud (Amazon EC2)". http://aws.amazon.com/ec2/, 2009.
[11] Microsoft, "Introducing the Azure Services Platform". http://download.microsoft.com/download/e/4/3/e43bb484-3b52-4fa8-a9f9-ec60a32954bc/Azure_Services_Platform.docx, 2009.
[12] M. Fenn, M. Murphy, J. Martin, and S. Goasguen, "An Evaluation of KVM for Use in Cloud Computing", Proc. 2nd International Conference on the Virtual Computing Initiative, RTP, NC, USA, May 2008.
[13] VMware, "VMware vCloud". http://www.vmware.com/technology/cloud-computing.html, 2009.
[14] C. Clark, K.
Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live Migration of Virtual Machines", Proc. 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, pp. 273-286, May 2005.
[15] M. Nelson, B. Lim, and G. Hutchins, "Fast Transparent Migration for Virtual Machines", Proc. USENIX Annual Technical Conference, pp. 391-394, 2005.
[Figure 4 shows log-log plots of the average response time T of every migration request versus the number n of managed VS nodes: (a) hierarchical structures with d = 3 and h = 1 to 5; (b) hierarchical structures with h = 3 and d = 2 to 5; (c) peer-to-peer structures with m = 1, 3, 5, 7, 9, 11, 13.]
Figure 4. Average response time of systems with different structures.
[Figure 5 plots the fault-tolerance indicator n_F/n versus the number n of managed VS nodes: (a) hierarchical structures with d = 3 and h = 1 to 4; (b) hierarchical structures with h = 3 and d = 2 to 5; (c) peer-to-peer structures with m = 1, 3, 5, 7, 9, 11, 13.]
Figure 5. Fault-tolerance of systems with different structures.
[Figure 6 plots the scalability ψ(m) versus the number of VIMS nodes m on a log scale: (a) hierarchical structures with d = 3 and r = 3 to 15; (b) hierarchical structures with r = 7 and d = 3 to 15; (c) peer-to-peer structures with r = 3 to 15.]
Figure 6.
Scalability of the systems with different structures.
[Figure 7 compares the three structures: (a) average response time versus n for the centralized structure and for hierarchical and peer-to-peer structures with 13 VIMS nodes each; (b) fault-tolerance indicator n_F/n versus n for the same three systems; (c) scalability ψ(m) of the peer-to-peer structure and of hierarchical structures with d = 3, 5, 7, 9.]
Figure 7. Performance, fault-tolerance and scalability comparison of systems with three typical structures.
[16] G. Vallée, T. Naughton, H. Ong, and S. L. Scott, "Checkpoint/Restart of Virtual Machines Based on Xen", High Availability and Performance Computing Workshop (HAPCW'06), Santa Fe, New Mexico, USA, p. 30, 2006.
[17] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, "Terra: A Virtual Machine-Based Platform for Trusted Computing", Proc. 19th ACM Symp. Operating Systems Principles (SOSP), ACM Press, pp. 193-206, 2003.
[18] VMware, "VMware Infrastructure 3 Primer". http://www.vmware.com/products/vi/, 2009.
[19] L. Kleinrock, "Queueing Systems, Volume I: Theory", John Wiley & Sons, New York, p. 187, 1975.
[20] R. K. Iyer, S. Butner, and E. J. McCluskey, "A Statistical Failure/Load Relationship: Results of a Multicomputer Study", IEEE Transactions on Computers, Vol. C-31, No. 7, pp. 697-706, July 1982.
[21] A. Birolini, "Reliability Engineering: Theory and Practice", Fifth edition.
Springer, pp. 2-12, 2007.
[22] Y. Qu, C. Lin, Y. Li, and Z. Shan, "Survivability Analysis of Grid Resource Management System Topology", Proc. 4th International Conference on Grid and Cooperative Computing, Lecture Notes in Computer Science, Vol. 3795, pp. 738-743, 2005.
[23] P. Jogalekar and M. Woodside, "Evaluating the Scalability of Distributed Systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 6, pp. 589-603, June 2000.
[24] P. Jogalekar and M. Woodside, "Evaluating the Scalability of Distributed Systems", Proc. 31st Annual Hawaii International Conference on System Sciences, Volume 7, pp. 524-531, 1998.
[25] P. Vilà, J. L. Marzo, A. Bueno, E. Calle, and L. Fàbrega, "Distributed Network Resource Management using a Multi-Agent System: Scalability Evaluation", Proc. International Symposium on Performance Evaluation of Computer and Telecommunication Systems, pp. 355-362, July 2004.