Performance, Fault-tolerance and Scalability Analysis of
Virtual Infrastructure Management System

Xiangzhen Kong, Jiwei Huang, Chuang Lin, Peter D. Ungsunan
Department of Computer Science and Technology
Tsinghua University
Beijing, 100084, China
xiangzhen1985@gmail.com, hjw217@gmail.com, chlin@tsinghua.edu.cn, hongsunan@csnet1.cs.tsinghua.edu.cn
Abstract—Virtual infrastructure has become increasingly popular in Grid and Cloud computing. As its scale grows, the management of resources in a virtual infrastructure faces great technical challenges, and supporting upper-layer services effectively raises higher requirements for the performance, fault-tolerance and scalability of virtual infrastructure management systems. In this paper, we study the performance, fault-tolerance and scalability of virtual infrastructure management systems with three typical structures: centralized, hierarchical and peer-to-peer. We give mathematical definitions of the evaluation metrics, carry out detailed quantitative analysis, and, based on that analysis, draw several useful conclusions for enhancing performance, fault-tolerance and scalability. We believe that the results of this work will help system architects make informed choices when building virtual infrastructure.
Keywords-virtual infrastructure management system;
performance; fault-tolerance; scalability
I. INTRODUCTION
A Grid is a very large-scale distributed network computing system that can scale to Internet-size environments [1, 2]. In recent years, another distributed computing model called Cloud computing has emerged, which is closely related to the existing Grid computing [3]. In both Grid and Cloud computing, flexible and efficient sharing of distributed resources is at the core of design and implementation, which raises higher requirements for low-cost, scalable and dependable infrastructure. Against this background, infrastructure virtualization has become a growing concern.
Virtualization adds a hardware abstraction layer called the Virtual Machine Monitor (VMM) or hypervisor. This layer provides an interface that is functionally equivalent to the actual hardware to a number of virtual machines (VMs) [4, 6]. More recently, virtualization has become important as a way to improve system security and reliability, reduce costs, and provide greater flexibility [5]. Many servers, each running several VMs, are connected with one another by network modules; placed under unified management, they play the role of infrastructure for upper-layer applications and are collectively called a virtual infrastructure. Virtual infrastructure has numerous advantages, such as low cost, ease of deployment and greater dependability, and is becoming popular in Grid and Cloud computing [7-13].
The virtual infrastructure management system is in charge of controlling the resources and VMs. The management of virtual infrastructure has some particular characteristics, owing to special virtualization mechanisms such as live migration [14]. The virtual infrastructure management system plays an important role and has a direct impact on the capability of the overall infrastructure. However, with the sharp increase in the number of servers and the growing scale of systems, the management of virtual infrastructure faces many tough challenges. The sharp increase of virtual servers requires higher scalability; the demand for efficiency and quality of service (QoS) from upper-layer applications puts forward requirements on real-time behavior and response time; in addition, fault-tolerance is also an important requirement, especially as the number of management nodes increases. In this paper, we study virtual infrastructure management systems with three typical structures, and evaluate their performance, fault-tolerance and scalability. The contributions are as follows:
• According to the characteristics of virtual infrastructure management systems, we propose performance, fault-tolerance and scalability evaluation metrics, and give detailed expressions and calculations.
• We carry out quantitative analysis and evaluation of the performance, fault-tolerance and scalability of virtual infrastructure management systems with three typical structures: centralized, hierarchical and peer-to-peer.
• Based on these analyses, we summarize some basic rules that offer practical guidance for the design and implementation of virtual infrastructure management systems in Grid and Cloud computing.
The rest of this paper is organized as follows. Section 2 introduces the research background and some concepts. In Section 3, three typical structures of virtual infrastructure management system are introduced, and then detailed calculation and analysis of the performance, fault-tolerance and scalability are given. Section 4 makes further discussions and summarizes some useful rules. Section 5 shows numerical results by analyzing an example. Finally, we conclude the paper in Section 6.
II. BACKGROUND
Infrastructure virtualization has become a topic of great interest in industry and academia. Virtualization is becoming popular in Grid computing [7-9], and is inherently a
2009 IEEE International Symposium on Parallel and Distributed Processing with Applications, 978-0-7695-3747-4/09 $25.00 © 2009 IEEE, DOI 10.1109/ISPA.2009.24
Figure 1. VMware infrastructure 3 components [18].
Figure 2. The typical structures of the virtual infrastructure management system: (a) Centralized, (b) Hierarchical, (c) Peer-to-peer.
key feature of Cloud computing [10-13]. Benefiting from its special mechanisms, the virtual infrastructure has numerous advantages. Server consolidation reduces cost sharply; live migration [14] or VMotion [15] greatly improves flexibility, maintainability and availability; checkpoint/restart and isolation help enhance reliability, security and survivability [16, 17]. The implementation of these mechanisms is under the unified management of the virtual infrastructure management system.
Fig. 1 shows an example of virtual infrastructure, called VMware Infrastructure 3 [18], published by VMware Inc. This is a small-scale system, whose virtual infrastructure management system consists of only one management server called the VirtualCenter (VC) server. The VC server manages multiple virtual servers at the same time, and unifies resources from individual virtual servers so that those resources can be shared among virtual machines. The running state of every virtual server is monitored and recorded by the VC server. When a virtual server is overloaded or suffers failures, some of the VMs running on it can be moved to another suitable virtual server by VMotion with little downtime. The migration target is selected by the VC server according to the running state of every virtual server: the server with the least workload or the highest dependability is chosen, depending on the strategy.
With the rapid development of distributed applications, system scale grows sharply; there are even tens of thousands of servers in some large virtual infrastructures of Grid or Cloud computing systems. In this case, constructing an effective virtual infrastructure management system with high performance, fault-tolerance and scalability becomes a challenge.
III. PERFORMANCE, FAULT-TOLERANCE AND
SCALABILITY ANALYSIS
A. Typical Structures of Virtual Infrastructure
Management System
Similar to traditional Grid resource management systems, the structures of virtual infrastructure management systems can be classified into centralized, hierarchical and peer-to-peer structures [2, 22]. Other, more complex structures can be viewed as hybrids of these three typical structures.
1) Centralized structure
In a centralized virtual infrastructure management system, all Virtual Servers (VSs) are managed by one Virtual Infrastructure Management Server (VIMS). The central VIMS monitors and records the running-state information of every VS. When one of the virtual machines in a VS needs migration, the VIMS decides which VS is the best migration target. A virtual infrastructure with a centralized management system is shown in Fig. 2 (a).
2) Hierarchical structure
The hierarchical structure is shown in Fig. 2 (b). The VIMSs in the lowest layer manage the virtual servers directly, and lower-layer VIMSs are under the administration of their parent VIMSs. A lower-layer VIMS needs merely to pass a digest message to its parent VIMS, which reduces traffic and raises efficiency. The root VIMS has the whole information of all VSs.
3) Peer-to-peer structure
A virtual infrastructure management system with peer-to-peer structure is a decentralized system (see Fig. 2(c)). It can be viewed as another expansion of the centralized structure. Every peer VIMS has a local group of virtual servers; different VIMSs communicate directly with each other to get the running state of the VSs managed by others. Every VIMS can thus obtain the whole information of the virtual infrastructure.
B. Performance Analysis
We take the average response time of every migration request as the performance metric, i.e. the expected period from the time a migration request is submitted to a VIMS to the time the VIMS obtains the best choice of target VS. Hereafter, we assume the total number of VS nodes is $n$, and that the operation period of every VS between migrations follows an exponential distribution with expectation $T_0$. Thus the migration request events from every VS node form a Poisson process with rate $\lambda_0 = 1/T_0$.
1) Performance analysis of the centralized structure
There is only one VIMS, which manages $n$ VSs in the centralized system. By the additive property of the Poisson process, the total stream of migration requests received by the central VIMS is also Poisson, with arrival rate $\lambda = n\lambda_0$. The central VIMS compares the workload or dependability of every VS node recorded in it, and then selects the best choice as the migration target VS. This process generally takes $O(n)$ time, so we can assume the average processing time of the VIMS for every migration request is $S = Kn$, where $K$ is a constant independent of $n$.
Assuming the processing time of the VIMS for every migration request follows an arbitrary distribution, we can
view the VIMS in the centralized structure as an M/G/1 queue. According to the Pollaczek-Khinchin mean-value formula [19], the average response time of an M/G/1 queue is

$$T = S + \frac{S\rho(1+C^2)}{2(1-\rho)} \qquad (1)$$

where $C$ is the coefficient of variation of the processing time $S$, i.e. $C = \sigma_S / S$, so $C$ depends on the distribution of $S$. But as the processing of every migration request is similar, we can assign a small constant to $C$. $\rho$ is the server utilization, which must be less than one for the stability of the system.
For the centralized structure, we have $S = Kn$ and $\rho = \lambda S = Kn^2/T_0$. Therefore, from (1), the average response time of every migration request is

$$T_C = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} \qquad (2)$$

To keep the system stable, the utilization should satisfy $\rho < 1$. So the number of VS nodes that can be managed by the central VIMS is limited by $n < \sqrt{T_0/K} = N_C$.
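As an illustrative sketch (not part of the paper; parameter values follow the example in Section V), the centralized model of (2) can be computed directly from the P-K formula:

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameter values, as in Section V

def t_centralized(n):
    """Average response time T_C of the central VIMS, eq. (2):
    M/G/1 P-K formula T = S + S*rho*(1+C^2)/(2*(1-rho)) with
    S = K*n and rho = K*n^2/T0. Infinite if rho >= 1 (unstable)."""
    S = K * n
    rho = K * n * n / T0
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

N_C = math.sqrt(T0 / K)  # stability limit: n < sqrt(T0/K)
```

Note how the queueing delay term vanishes for small $n$ and dominates as $n$ approaches $N_C$.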
2) Performance analysis of the hierarchical structure
It is assumed that a hierarchical system has $h$ layers. Every VIMS except those in the lowest layer has $d$ child VIMSs. So there are $d^{l-1}$ VIMSs in the $l$-th layer ($l = 1, 2, \ldots, h$) and $m = (d^h - 1)/(d - 1)$ VIMSs in total. Each VIMS in the lowest layer directly manages $n/d^{h-1}$ VS nodes. Therefore, the average arrival rate of migration requests is $\lambda_h = n/(d^{h-1}T_0)$ for a VIMS in the lowest layer, and $\lambda_i = n/(d^{i-1}T_0)$ for a VIMS in the $i$-th layer ($1 \le i \le h-1$). The average processing time of a migration request for a VIMS in the lowest layer is $S_h = Kn/d^{h-1}$; and because the VIMSs in a lower layer only need to report their locally optimal choice to their parent VIMS, the average processing time for a VIMS in the $i$-th layer ($1 \le i \le h-1$) is $S_i = Kd^i$. So the utilization of a VIMS in the lowest layer is $\rho_h = \lambda_h S_h = K(n/d^{h-1})^2/T_0$, and for a VIMS in the $i$-th layer ($1 \le i \le h-1$), $\rho_i = \lambda_i S_i = Kdn/T_0$. From (1), we get the average response time of every migration request in the hierarchical management system as

$$T_H = \sum_{i=1}^{h-1} T_i + T_h = \sum_{i=1}^{h-1}\left(Kd^i + \frac{K^2 d^{i+1} n (1+C^2)}{2(T_0 - Kdn)}\right) + \frac{Kn}{d^{h-1}} + \frac{K^2 n^3 (1+C^2)}{2 d^{h-1}\left(d^{2(h-1)}T_0 - Kn^2\right)} \qquad (3)$$

For the stability of the system, we must have $\rho_h < 1$ and $\rho_i < 1$. So the number of VSs that can be managed by the hierarchical VIMSs is limited by $n < \min\{d^{h-1}\sqrt{T_0/K},\; T_0/(Kd)\} = N_H$.
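As a sketch of (3) under our reading of the model (one M/G/1 term per layer; parameters are the example values from Section V), the hierarchical response time can be computed as follows. Setting $h = 1$ must reproduce the centralized formula (2), which is exactly the reduction used later in Proposition 1:

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def t_hierarchical(n, d, h):
    """Average response time T_H, eq. (3): one M/G/1 term per layer.
    Layers 1..h-1 have S_i = K*d^i and rho_i = K*d*n/T0; the leaf
    layer h has S_h = K*n/d^(h-1) and rho_h = K*(n/d^(h-1))^2/T0."""
    def pk(S, rho):  # P-K mean-value formula, eq. (1)
        return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))
    upper = sum(pk(K * d**i, K * d * n / T0) for i in range(1, h))
    leaf = pk(K * n / d**(h - 1), K * (n / d**(h - 1))**2 / T0)
    return upper + leaf

def t_centralized(n):
    """Eq. (2) as the special case h = 1 (single VIMS; d is then irrelevant)."""
    return t_hierarchical(n, d=2, h=1)
```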
3) Performance analysis of the peer-to-peer structure
It is assumed that there are $m$ VIMSs in the virtual infrastructure management system, so the number of VSs managed by each VIMS is $n/m$. When the virtual machines in a VS need migration, the VS sends a migration request to its local VIMS, which computes the local optimal target VS, gets the local optimal targets from the other $m-1$ VIMS nodes, and then selects the best choice. So the average arrival rate of migration requests at every VIMS is $\lambda = \frac{n}{m}\lambda_0 + (m-1)\frac{n}{m}\lambda_0 = \frac{n}{T_0}$; the average processing time is $S = K(n/m + m - 1)$, and the server utilization is $\rho = \lambda S = Kn(n/m + m - 1)/T_0$. From (1), we can obtain the average response time of every migration request as

$$T_P = K\left(\frac{n}{m} + m - 1\right) + \frac{K^2 n \left(\frac{n}{m} + m - 1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m} + m - 1\right)\right)} \qquad (4)$$

The stability condition of the system is $\rho < 1$, so the number of VSs that can be managed by the peer-to-peer VIMSs is limited by $n < \frac{m}{2}\left(\sqrt{(m-1)^2 + 4T_0/(mK)} - (m-1)\right) = N_P$.
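A sketch of (4) and its stability limit (our reconstruction; parameters are the Section V example values). Setting $m = 1$ recovers the centralized case, as Proposition 1 later proves:

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def t_peer(n, m):
    """Average response time T_P, eq. (4). Each VIMS sees the full
    request stream lambda = n/T0 and spends S = K*(n/m + m - 1)
    comparing its n/m local VSs plus the m-1 remote local optima."""
    S = K * (n / m + m - 1)
    rho = (n / T0) * S
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

def n_limit_peer(m):
    """Stability limit N_P: the largest n keeping rho < 1 in eq. (4)."""
    return (m / 2) * (math.sqrt((m - 1)**2 + 4 * T0 / (m * K)) - (m - 1))
```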
C. Fault-tolerance Analysis
The workload has been found to be a very influential factor in system failure rate [20]. Using a linear model to describe the relationship between them, we give the failure rate as $f = A\rho + B$, where $\rho$ is the system utilization, and $A$ and $B$ are constants representing the effect of workload on failure and the inherent failure rate, respectively. According to reliability theory, the probability that the system is failure-free in a time period $(0, t)$ is $R(t) = \mathrm{Prob}\{Y > t\} = e^{-ft}$. So the probability that the system breaks down in $(0, t)$ is $F(t) = 1 - R(t) = 1 - e^{-ft}$.
The failure of a VIMS node may mean that the information of part or all of the VS nodes cannot be obtained by the system, which hampers the VIMSs in computing the best choice of migration target server. Let $n_F$ be the average number of VS nodes still managed by the system in the presence of VIMS failures, and $n$ the total number of VS nodes; then the fault-tolerance of the system can be expressed by

$$FT = n_F / n \qquad (5)$$
1) Fault-tolerance analysis of the centralized structure
According to subsection 3.2, the utilization of the only VIMS in the centralized system is $\rho_C = Kn^2/T_0$. So the failure rate is $f_C = A\rho_C + B$, and the probability that the system breaks down in $(0, t)$ is $F_C(t) = 1 - R(t) = 1 - e^{-f_C t}$. Hereafter, we consider the probability in unit time, so

$$F_C = 1 - e^{-f_C} = 1 - \exp\{-(AKn^2/T_0 + B)\} \qquad (6)$$

Since there is only one VIMS node in the centralized structure, no VS node can be managed when the VIMS fails. Hence $n_F = F_C \cdot 0 + (1 - F_C)\,n$, and the fault-tolerance is

$$FT_C = n_F/n = (1 - F_C) = \exp\{-(AKn^2/T_0 + B)\} \qquad (7)$$
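A minimal sketch of (7); note that the coefficients A and B are not specified in the paper, so the values below are purely illustrative assumptions:

```python
import math

K, T0 = 0.1, 3600.0
A, B = 0.01, 0.001  # assumed workload-failure coefficients (not from the paper)

def ft_centralized(n):
    """Fault-tolerance FT_C, eq. (7): the single VIMS survives a unit
    interval with probability exp{-(A*K*n^2/T0 + B)}; if it fails,
    no VS node can be managed at all."""
    rho = K * n * n / T0
    return math.exp(-(A * rho + B))
```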
2) Fault-tolerance analysis of the hierarchical structure
There are $h$ layers of VIMS nodes in the hierarchical system and $n$ VS nodes managed by them. The number of VIMS nodes in the $k$-th layer is $m_k = d^{k-1}$. The utilization of a VIMS node in the $k$-th layer is

$$\rho_k = \begin{cases} Kdn/T_0, & 1 \le k \le h-1 \\ K\left(n/d^{h-1}\right)^2/T_0, & k = h \end{cases} \qquad (8)$$

Since the failure probability of one VIMS node in the $k$-th layer is $F_H(k) = 1 - e^{-f_H} = 1 - \exp\{-(A\rho_k + B)\}$, we obtain

$$F_H(k) = \begin{cases} 1 - \exp\{-(AKdn/T_0 + B)\}, & 1 \le k \le h-1 \\ 1 - \exp\{-(AK(n/d^{h-1})^2/T_0 + B)\}, & k = h \end{cases} \qquad (9)$$
When $i$ nodes in the $k$-th layer fail, the VIMSs can still manage $n_k(i) = n - i \cdot n/m_k = n - i \cdot n/d^{k-1}$ VS nodes. Therefore, the average number of VS nodes that can be managed in the presence of failures in the $k$-th layer is

$$n_F(k) = \sum_{i=0}^{m_k} \binom{m_k}{i} F_H(k)^i \left(1 - F_H(k)\right)^{m_k - i} n_k(i) = n \sum_{i=0}^{d^{k-1}} \binom{d^{k-1}}{i} F_H(k)^i \left(1 - F_H(k)\right)^{d^{k-1} - i} \left(1 - \frac{i}{d^{k-1}}\right) \qquad (10)$$
Hence, from (9) and (10), we obtain the fault-tolerance of the hierarchical structure as

$$FT_H = \frac{n_F}{n} = \frac{1}{h}\sum_{k=1}^{h} \frac{n_F(k)}{n} = \frac{1}{h}\sum_{k=1}^{h} \sum_{i=0}^{d^{k-1}} \binom{d^{k-1}}{i} F_H(k)^i \left(1 - F_H(k)\right)^{d^{k-1} - i} \left(1 - \frac{i}{d^{k-1}}\right) \qquad (11)$$

where $F_H(k)$ is given by (9).
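A sketch of (11) under our reconstruction, again with assumed illustrative values for the failure coefficients A and B. The $h = 1$ case must collapse to the centralized fault-tolerance (7):

```python
import math

K, T0 = 0.1, 3600.0
A, B = 0.01, 0.001  # assumed failure-model coefficients (illustrative)

def ft_hierarchical(n, d, h):
    """Fault-tolerance FT_H, eq. (11): average over layers k = 1..h of
    the binomially weighted fraction of VS nodes still managed when
    i of the d^(k-1) layer-k VIMSs fail (failure prob. F_H(k), eq. (9))."""
    total = 0.0
    for k in range(1, h + 1):
        rho = K * d * n / T0 if k < h else K * (n / d**(h - 1))**2 / T0
        F = 1 - math.exp(-(A * rho + B))
        mk = d**(k - 1)
        total += sum(math.comb(mk, i) * F**i * (1 - F)**(mk - i) * (1 - i / mk)
                     for i in range(mk + 1))
    return total / h
```

By the mean of the binomial distribution, the inner sum equals $1 - F_H(k)$ exactly, so (11) simplifies to the average survival probability over the layers.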
3) Fault-tolerance analysis of the peer-to-peer structure
Each VIMS node manages $n/m$ VS nodes directly, so when $i$ VIMS nodes fail, the average number of VS nodes that can still be managed is $n_F(i) = n - i \cdot n/m$. Assuming the failure events of different VIMS nodes are independent, then

$$n_F = \sum_{i=0}^{m} \binom{m}{i} F_P^i (1 - F_P)^{m-i}\, n_F(i) = n \sum_{i=0}^{m} \binom{m}{i} F_P^i (1 - F_P)^{m-i} \left(1 - \frac{i}{m}\right) \qquad (12)$$
Because the utilization is $\rho_P = Kn(n/m + m - 1)/T_0$ and the failure rate is $f_P = A\rho_P + B$, the failure probability is

$$F_P = 1 - e^{-f_P} = 1 - \exp\{-(AKn(n/m + m - 1)/T_0 + B)\} \qquad (13)$$

Hence, the fault-tolerance of the peer-to-peer structure is

$$FT_P = \frac{n_F}{n} = \sum_{i=0}^{m} \binom{m}{i} F_P^i \left(1 - F_P\right)^{m-i} \left(1 - \frac{i}{m}\right) \qquad (14)$$
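A sketch of (13)-(14) with the same assumed illustrative A and B. Taking $m = 1$ must reduce to the centralized $FT_C$ of (7), matching the transformation used in Proposition 1:

```python
import math

K, T0 = 0.1, 3600.0
A, B = 0.01, 0.001  # assumed failure-model coefficients (illustrative)

def ft_peer(n, m):
    """Fault-tolerance FT_P, eq. (14): each of the m peer VIMSs fails
    independently with probability F_P (eq. (13)); when i fail, the
    surviving fraction of managed VS nodes is 1 - i/m."""
    rho = K * n * (n / m + m - 1) / T0
    F = 1 - math.exp(-(A * rho + B))
    return sum(math.comb(m, i) * F**i * (1 - F)**(m - i) * (1 - i / m)
               for i in range(m + 1))
```

As with (11), the binomial mean collapses the sum to $1 - F_P$, so spreading the same VS population across more peers improves fault-tolerance by lowering each peer's utilization-driven failure probability.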
D. Scalability Analysis
Jogalekar et al. proposed a strategy-based scalability metric for distributed systems [23, 24]. The scalability can be expressed as

$$\psi(k_1, k_2) = \frac{F(k_2)}{F(k_1)} = \frac{\lambda_2 f_2 / C_2}{\lambda_1 f_1 / C_1} = \frac{\lambda_2 C_1 (1 + T_1/\hat{T})}{\lambda_1 C_2 (1 + T_2/\hat{T})} \qquad (15)$$

where $\lambda$ is the throughput in responses/sec, $C(k)$ is the rental cost, and $f(k) = 1/(1 + T(k)/\hat{T})$ is the average value of each response, where $T$ is the mean response time and $\hat{T}$ is the target value. The productivity $F(k) = \lambda(k) f(k) / C(k)$ is the value delivered per second.
To give analytic solutions, $k$ for the base case is taken as 1, and the metric is written as $\psi(k) = F(k)/F(1)$ [23].
Fig. 3 shows how $\psi$ might behave in different situations [24, 25]. The value of $\psi$ and its trend relative to the scale factor are used to evaluate the scalability of a system.
For a virtual infrastructure management system, we are concerned with the change of management efficiency, rather than productivity, as the system scales up. The management efficiency when there are $m$ VIMSs is defined as

$$F(m) = n \cdot f(m) / C(m) \qquad (16)$$

where $n$ is the maximum number of VSs that the VIMSs are able to manage; $f(m)$ is the average value of each VIMS, determined by the management performance; and $C(m)$ is the cost at scale $m$, assumed to be $C(m) = \alpha m$.
In terms of average response time, the centralized structure can be viewed as a special case of the hierarchical and peer-to-peer structures (to be proved in Section 4). So we take the centralized structure as the point of reference. The average value of each response in the other structures is evaluated by comparison with the centralized one, as

$$f(m) = 1/\left(1 + T(m)/T_C\right) \qquad (17)$$

where $T(m)$ is the average response time in a certain structure, and $T_C$ is the average response time of the centralized structure. So the management efficiency is

$$F(m) = \frac{n}{\alpha m \left(1 + T(m)/T_C\right)} \qquad (18)$$

And the scalability at scale $m_2$ relative to $m_1$ is

$$\psi(m_1, m_2) = \frac{F(m_2)}{F(m_1)} = \frac{n_2\, m_1 \left(1 + T(m_1)/T_C\right)}{n_1\, m_2 \left(1 + T(m_2)/T_C\right)} \qquad (19)$$
1) Scalability analysis of the centralized structure
In the centralized system there is only one VIMS, i.e. $m = 1$, which can manage at most $n = r$ VSs, where $r$ is the maximum number of VSs that a single VIMS is able to manage. The average management efficiency is

$$F_C = \frac{r}{\alpha (1 + T_C/T_C)} = \frac{r}{2\alpha} \qquad (20)$$

For both the hierarchical and peer-to-peer structures, when $m = 1$ and $n = r$, we have $T_H = T_C$ and $T_P = T_C$ (to be proved in Section 4). Hence $F_H(m=1) = F_C$ and $F_P(m=1) = F_C$, and as a result $\psi_H(m=1) = 1$ and $\psi_P(m=1) = 1$.
In conclusion, the centralized structure is a base case of the hierarchical and peer-to-peer structures, and it can be used for comparison with the cases where $m \ne 1$. For both the hierarchical and the peer-to-peer structure,
Figure 3. Different scalability behaviors [24, 25].
$$\psi(m) = F(m)/F_C = \frac{2n}{r m \left(1 + T(m)/T_C\right)} \qquad (21)$$
2) Scalability analysis of the hierarchical structure
In a hierarchical system, there are $m = (d^h - 1)/(d - 1)$ VIMSs in all, able to manage at most $n = d^{h-1} r$ VSs. The average response time is

$$T_H = \sum_{i=1}^{h-1}\left(Kd^i + \frac{K^2 d^{h+i} r (1+C^2)}{2\left(T_0 - Kd^h r\right)}\right) + Kr + \frac{K^2 r^3 (1+C^2)}{2\left(T_0 - Kr^2\right)} \qquad (22)$$
For a centralized system with the same number of VSs, the average response time is

$$T_C = Kd^{h-1} r + \frac{K^2 d^{3(h-1)} r^3 (1+C^2)}{2\left(T_0 - K d^{2(h-1)} r^2\right)} \qquad (23)$$
Hence, substituting $n = d^{h-1}r$ and $m = (d^h - 1)/(d - 1)$ into (21), the scalability of the hierarchical structure is

$$\psi_H(m) = \frac{2\, d^{h-1}(d-1)}{\left(d^h - 1\right)\left(1 + T_H/T_C\right)} \qquad (24)$$

with $T_H$ and $T_C$ given by (22) and (23).
3) Scalability analysis of the peer-to-peer structure
In a virtual infrastructure management system with the peer-to-peer structure, there are $m$ VIMSs in all, able to manage at most $n = mr$ VSs. The average response time is

$$T_P = K(r + m - 1) + \frac{K^2 m r (r + m - 1)^2 (1+C^2)}{2\left(T_0 - K m r (r + m - 1)\right)} \qquad (25)$$
For a centralized system with the same number of VSs, the average response time is

$$T_C = Kmr + \frac{K^2 m^3 r^3 (1+C^2)}{2\left(T_0 - K m^2 r^2\right)} \qquad (26)$$
Hence, substituting $n = mr$ into (21), the scalability of the peer-to-peer structure is

$$\psi_P(m) = \frac{2}{1 + T_P/T_C} \qquad (27)$$

with $T_P$ and $T_C$ given by (25) and (26).
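A sketch of (24) and (27) under our reconstruction (Section V example parameters). In both cases the base scale $m = 1$ must give $\psi = 1$, per the discussion around (21):

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def pk(S, rho):
    """M/G/1 P-K mean response time, eq. (1)."""
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

def t_centralized(n):
    return pk(K * n, K * n * n / T0)  # eq. (2)

def psi_peer(m, r):
    """Scalability psi_P(m), eq. (27): 2/(1 + T_P/T_C) with n = m*r."""
    n = m * r
    S = K * (r + m - 1)
    t_p = pk(S, (n / T0) * S)
    return 2 / (1 + t_p / t_centralized(n))

def psi_hier(h, d, r):
    """Scalability psi_H, eq. (24): m = (d^h-1)/(d-1), n = d^(h-1)*r."""
    n = d**(h - 1) * r
    t_h = (sum(pk(K * d**i, K * d * n / T0) for i in range(1, h))
           + pk(K * r, K * r * r / T0))
    m = (d**h - 1) // (d - 1)
    return 2 * n / (r * m * (1 + t_h / t_centralized(n)))
```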
IV. FURTHER DISCUSSIONS ON PERFORMANCE,
FAULT-TOLERANCE AND SCALABILITY
In this section, we elaborate further on the performance, fault-tolerance and scalability of the three types of structures. Some rules and theorems are summarized and proved, which offer practical guidance for the construction of virtual infrastructure management systems.
A. Transforming relationship between different structures
From Section 3, we can derive the transformation relationships among the three types of structures, which corroborates the correctness of the calculation results to a certain extent.
Proposition 1: In terms of performance,
fault-tolerance and scalability, the centralized structure
is a special case of the hierarchical or peer-to-peer
structure.
Proof: We prove it by giving the transformation conditions.
1) From hierarchical structure to centralized structure
From (3), the average response time of the management system with hierarchical structure is

$$T_H = \sum_{i=1}^{h-1}\left(Kd^i + \frac{K^2 d^{i+1} n (1+C^2)}{2(T_0 - Kdn)}\right) + \frac{Kn}{d^{h-1}} + \frac{K^2 n^3 (1+C^2)}{2 d^{h-1}\left(d^{2(h-1)}T_0 - Kn^2\right)}$$

When there is only one layer in the hierarchical structure, i.e. $h = 1$, the sum vanishes and we get

$$T_H(h=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C$$

Meanwhile, the upper limit of the number of managed VS nodes is $N_H = \min\{d^{h-1}\sqrt{T_0/K},\; T_0/(Kd)\}$. When $h = 1$ there are no upper layers, so only the leaf constraint applies and $N_H(h=1) = \sqrt{T_0/K} = N_C$.
Similarly, from (11), when $h = 1$, the fault-tolerance of the hierarchical system reduces to

$$FT_H(h=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C$$

Since $h = 1$, we get $m = (d^h - 1)/(d - 1) = 1$. From Section 3.4.1, the scalability relation is $\psi_H(h=1) = F_H(h=1)/F_C = 1$.
Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the hierarchical structure when the number of layers $h = 1$.
2) From peer-to-peer structure to centralized structure
From (4), the average response time of the management system with peer-to-peer structure is

$$T_P = K\left(\frac{n}{m} + m - 1\right) + \frac{K^2 n \left(\frac{n}{m} + m - 1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m} + m - 1\right)\right)}$$

When $m = 1$, we get

$$T_P(m=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C$$

Meanwhile, the upper limit of the number of managed VS nodes is $N_P = \frac{m}{2}\left(\sqrt{(m-1)^2 + 4T_0/(mK)} - (m-1)\right)$. Substituting $m = 1$, $N_P(m=1) = \sqrt{T_0/K} = N_C$.
Similarly, from (14), when $m = 1$, the fault-tolerance of the system with peer-to-peer structure is

$$FT_P(m=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C$$

For the scalability, from Section 3.4.1, we get $\psi_P(m=1) = F_P(m=1)/F_C = 1$.
Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the peer-to-peer structure when $m = 1$. □
B. Structure selection for the best performance
Intuitively, when the number of managed VS nodes is small, the centralized structure has the lowest average response time thanks to its flat and direct structure. But as the number of VS nodes increases, the processing time of the single VIMS increases sharply, so we need to consider other structures for better performance. We first give the following proposition.
Proposition 2: Given $h > 1$ and $1 < d < N_C$, there is a threshold $M_{CH} < N_C$ on the number of managed VS nodes such that when $n > M_{CH}$, the average response times satisfy $T_C > T_H$. Given $m > 1$, there is another threshold $M_{CP} < N_C$ such that when $n > M_{CP}$, $T_C > T_P$.
Proof: Under the conditions $h > 1$ and $d < N_C$, the limit on the number of managed VS nodes for the hierarchical structure satisfies $N_H > N_C$, and for $m > 1$ we have $N_P > N_C$.
From (3) and (4), it is obvious that $T_C(n=1) < T_H(n=1)$ and $T_C(n=1) < T_P(n=1)$ when managing one VS node. Then, taking the derivatives of (2), (3) and (4), we obtain $\frac{dT_C(n)}{dn} > 0$, $\frac{dT_H(n)}{dn} > 0$ and $\frac{dT_P(n)}{dn} > 0$, so the average response time of each of the three structures is a monotonically increasing function of $n$.
From (2), we have $T_C(n \to N_C) \to \infty$. Since $N_H > N_C$ and $N_P > N_C$, we have $T_H(n = N_C) < \infty$ and $T_P(n = N_C) < \infty$.
Hence, by monotonicity and the end values, there must be a threshold $M_{CH} < N_C$ for the hierarchical structure and a threshold $M_{CP} < N_C$ for the peer-to-peer structure making the proposition hold. □
The values of $M_{CH}$ and $M_{CP}$ can be obtained by solving the equations $T_C(n) = T_H(n)$ and $T_C(n) = T_P(n)$. So when the number of VS nodes $n < \min\{M_{CH}, M_{CP}\}$, we select the centralized structure; otherwise one of the other structures is selected according to the computed values of $T_H(n)$ and $T_P(n)$.
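The thresholds have no closed form, but the monotonicity established in the proof makes them easy to find numerically. A bisection sketch under the Section V parameters and our reconstruction of (2)-(3) (the peer-to-peer threshold is found the same way against (4)):

```python
import math

K, T0, C = 0.1, 3600.0, 0.5  # example parameters from Section V

def pk(S, rho):
    return math.inf if rho >= 1.0 else S + S * rho * (1 + C**2) / (2 * (1 - rho))

def t_c(n):  # eq. (2)
    return pk(K * n, K * n * n / T0)

def t_h(n, d=3, h=3):  # eq. (3)
    return (sum(pk(K * d**i, K * d * n / T0) for i in range(1, h))
            + pk(K * n / d**(h - 1), K * (n / d**(h - 1))**2 / T0))

def threshold(f, lo=1.0, hi=None):
    """Bisect for the crossing T_C(n) = f(n) on (lo, N_C): T_C starts
    below f and blows up at N_C, so exactly one crossing is bracketed."""
    hi = hi or math.sqrt(T0 / K) - 1e-6  # just below N_C
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_c(mid) < f(mid):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

M_CH = threshold(t_h)  # hierarchical threshold for d = 3, h = 3
```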
C. Scalability comparison between the hierarchical and peer-to-peer structures
We now compare the scalability of virtual infrastructure management systems with the hierarchical structure and the peer-to-peer structure.
Proposition 3: Given $r > 2d - 1$, the scalability of the system with the peer-to-peer structure, $\psi_P(m)$, is higher than that with the hierarchical structure, $\psi_H(m)$.
Proof: To compare the scalability of the two structures, we set the numbers of VIMSs of the two systems equal, i.e. $m = (d^h - 1)/(d - 1)$. From (24) and (27), we have $\lim_{m\to\infty} \psi_H(m) = 2(dr - r)/(dr - d)$ and $\lim_{m\to\infty} \psi_P(m) = 2r/(r + 1)$. Given $r > 2d - 1$, we have $\lim_{m\to\infty}\psi_P(m) > \lim_{m\to\infty}\psi_H(m)$, so when $m$ is large enough, the scalability of the peer-to-peer system is higher than that of the hierarchical one.
Then, we construct a continuous function $f(x) \in C[1, \infty)$ to prove the proposition:

$$f(x) = \begin{cases} \psi_P(x) - \psi_H(x), & x \in \mathbb{N} \\ f(\lfloor x \rfloor) + \left(f(\lceil x \rceil) - f(\lfloor x \rfloor)\right)\left(x - \lfloor x \rfloor\right), & x \notin \mathbb{N} \end{cases}$$

For the equation $f(x) = 0$, there is only one solution, $x = 1$. Hence, for all $m > 1$, $\psi_P(m) \ne \psi_H(m)$.
Assume there is an $m_1 > 1$ such that $\psi_P(m_1) < \psi_H(m_1)$. As we already know that $\psi_P(m_2) > \psi_H(m_2)$ when $m_2$ is large enough, we have $f(m_1) < 0$ and $f(m_2) > 0$, so $f(m_1) f(m_2) < 0$. By the Zero Point Theorem, there must be a $\xi > 1$ with $f(\xi) = 0$, which contradicts the fact that $x = 1$ is the only solution of $f(x) = 0$. So the assumption is false, and for all $m > 1$, $\psi_P(m) \ge \psi_H(m)$.
From the above discussion, we conclude that for all $m > 1$, $\psi_P(m) > \psi_H(m)$. □
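The two limits stated in the proof can be checked directly by arithmetic on the limit expressions; e.g. with $d = 3$ and $r = 7$ the condition $r > 2d - 1 = 5$ holds:

```python
def psi_hier_limit(d, r):
    """lim psi_H(m) = 2(dr - r)/(dr - d), as stated in the proof."""
    return 2 * (d * r - r) / (d * r - d)

def psi_peer_limit(r):
    """lim psi_P(m) = 2r/(r + 1)."""
    return 2 * r / (r + 1)
```

Algebraically, $2r/(r+1) > 2r(d-1)/(d(r-1))$ rearranges to $r + 1 > 2d$, i.e. exactly the condition $r > 2d - 1$ of Proposition 3.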
V. NUMERICAL RESULTS AND REMARKS
In this section, we give numerical results by analyzing and solving an example, which helps to uncover some regularities and to verify the conclusions proposed in Section 4. We assume $K = 0.1$, which means a VIMS can process migration requests at a rate of 10 VS-node comparisons per second. Since the processing of every migration request is similar, we assign the small constant $C = 0.5$. The expected normal operation period is $T_0 = 3600$ s, which means each VS node sends a migration request once per hour on average.
A. Performance analysis result
We study the effect of the structure parameters of the hierarchical and peer-to-peer management systems on the average response time. Fig. 4(a) shows that when the number of managed VS nodes is small, a system with fewer layers has a lower average response time. But as the number of VS nodes increases, the fewer layers the system has, the faster its average response time grows; a system with more layers performs better when the workload is heavy. The effect of the out-degree $d$ is similar to that of $h$, but not as drastic (see Fig. 4(b)). For the peer-to-peer structure, the effect of the number of VIMS nodes $m$ is similar (see Fig. 4(c)).
In Fig. 7(a), we compare the performance of the three types of systems. We set $d = 3$ and $h = 3$ for the hierarchical system, and $m = 13$ for the peer-to-peer one, so the two systems have the same number of VIMS nodes. Fig. 7(a) shows that when the number of VS nodes is small enough, the centralized system has the lowest average response time. As the number increases, the other two systems achieve lower average response times, just as Proposition 2 states. By computation, the limits on the number of managed VS nodes for the three structures are $N_C = 190$, $N_H = 1708$ and $N_P = 611$. The hierarchical system performs better when the number of VS nodes is large enough.
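The three limits quoted above can be reproduced from the stability conditions of Section 3 (a sketch using our reconstruction; values rounded to the nearest integer):

```python
import math

K, T0 = 0.1, 3600.0  # Section V example parameters

def n_c():
    """N_C = sqrt(T0/K), from rho = K*n^2/T0 < 1 (centralized)."""
    return math.sqrt(T0 / K)

def n_h(d, h):
    """N_H = min{d^(h-1)*sqrt(T0/K), T0/(K*d)} (hierarchical)."""
    return min(d**(h - 1) * math.sqrt(T0 / K), T0 / (K * d))

def n_p(m):
    """N_P = (m/2)*(sqrt((m-1)^2 + 4*T0/(m*K)) - (m-1)) (peer-to-peer)."""
    return (m / 2) * (math.sqrt((m - 1)**2 + 4 * T0 / (m * K)) - (m - 1))
```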
B. Fault-tolerance analysis result
We study the effect of the structure parameters on fault-tolerance. Fig. 5(a) shows that the fault-tolerance of the hierarchical system decreases as the number of VS nodes increases, and the fewer layers the system has, the faster it decreases. The influence of the out-degree $d$ is similar but smaller than that of $h$ (see Fig. 5(b)). The fault-tolerance of peer-to-peer systems also decreases as the number of VSs increases, and a system with more VIMSs has better fault-tolerance (see Fig. 5(c)).
Fig. 7(b) compares the fault-tolerance of the three types of management systems; the hierarchical and peer-to-peer systems both have 13 VIMS nodes. We can see that the centralized system has very poor fault-tolerance, while that of the hierarchical system is the best. When the number of VS nodes is large enough, the hierarchical system holds an advantageous position in fault-tolerance.
C. Scalability analysis result
We study the effect of the hierarchical system's structure
parameters on scalability. We set d = 3 and compare the
scalability of systems with different values of the parameter
r. Fig. 6(a) shows that the more VSs each VIMS is able to
manage, the more scalable the system is. We then set r = 7
and compare the scalability of systems with different values
of the parameter d. As Fig. 6(b) shows, the more child
VIMS nodes each VIMS has, the more scalable the system is.
We then study the effect of the peer-to-peer system's
structure parameters on scalability. Fig. 6(c) shows that the
more VSs each VIMS node is able to manage, the more
scalable the system is.
In Fig. 7(c), we compare the scalability of the systems
with hierarchical and peer-to-peer structures. When r = 7,
the scalability of the peer-to-peer system is higher than that
of the hierarchical systems for all values of d, which means
the peer-to-peer system is more scalable. This result agrees
with the conclusion proved in Proposition 3.
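The scalability metric ψ used in this section follows the productivity-ratio definition of Jogalekar and Woodside [23, 24]: productivity at scale k is F(k) = λ(k)·f(k)/C(k) (throughput times value-per-response, divided by cost), and scalability from scale k1 to k2 is ψ = F(k2)/F(k1). A generic sketch, with hypothetical throughput, value and cost functions standing in for the paper's:

```python
# Jogalekar-Woodside productivity-ratio scalability metric [23].
# psi >= 1 over a scale range means the system scales well there.

def productivity(k, throughput, value_per_response, cost):
    # F(k) = lambda(k) * f(k) / C(k)
    return throughput(k) * value_per_response(k) / cost(k)

def psi(k1, k2, throughput, value_per_response, cost):
    # Scalability of growing the system from scale k1 to scale k2.
    return (productivity(k2, throughput, value_per_response, cost) /
            productivity(k1, throughput, value_per_response, cost))

# Hypothetical example functions: throughput grows sublinearly,
# value-per-response falls as response time rises, cost grows linearly.
throughput = lambda k: k ** 0.9
value = lambda k: 1.0 / (1.0 + 0.01 * k)
cost = lambda k: 0.2 * k

print(psi(1, 100, throughput, value, cost))
```

In the paper's setting, k corresponds to the number of VIMS nodes m, which is why the curves in Fig. 6 are plotted as ψ(m) against log(m).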
VI. CONCLUSIONS
This work is the first attempt to comprehensively analyze
and evaluate the performance, fault-tolerance and scalability
of virtual infrastructure management systems. According to
the characteristics of virtual infrastructure, we give
mathematical definitions of the evaluation metrics for
performance, fault-tolerance and scalability. Three typical
structures of the virtual infrastructure management system
are studied: centralized, hierarchical and peer-to-peer. We
give detailed calculation processes to quantitatively analyze
their performance, fault-tolerance and scalability. Based on
the quantitative analysis, some useful rules and conclusions
are drawn and proved, which offer practical guidance for
building virtual infrastructure management systems with
higher performance, fault-tolerance and scalability.
This paper provides a general analysis and evaluation
method, which allows system designers to evaluate
alternative virtual infrastructure management systems by
assigning different values to parameters as they deem
appropriate. It enhances their ability to make more
informed choices when building a virtual infrastructure
management system, before undergoing the expensive
process of constructing and evaluating multiple prototypes.
ACKNOWLEDGMENT
This work was supported by the National Natural Science
Foundation of China (No. 60673187) and the National High
Technology Research and Development Program of China
(No. 2007AA01Z419).
REFERENCES
[1] I. Foster and C. Kesselman (eds.), "The Grid: Blueprint for a New
Computing Infrastructure", Morgan Kaufmann, San Francisco, CA,
2004.
[2] K. Krauter, R. Buyya, and M. Maheswaran, "A Taxonomy and Survey
of Grid Resource Management Systems for Distributed Computing",
International Journal of Software: Practice and Experience (SPE), Vol.
32, No. 2, pp. 135-164, 2002.
[3] I. Foster, Y. Zhao, I. Raicu, and S. Lu, "Cloud Computing and Grid
Computing 360-Degree Compared", Grid Computing Environments
(GCE) Workshop, pp. 1-10, Nov. 2008.
[4] D. A. Menasce, "Virtualization: Concepts, Applications, and
Performance Modeling", Computer Measurement Group (CMG),
http://www.cmg.org/proceedings/2005/5189.pdf, 2005.
[5] R. Figueiredo, P. A. Dinda, and J. Fortes, "Resource Virtualization
Renaissance", IEEE Computer, Vol. 38, No. 5, pp. 28-31,
2005.
[6] P. T. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho,
R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of
Virtualization", Proc. 19th ACM Symposium on Operating Systems
Principles (SOSP), pp. 164-177, October 2003.
[7] R. Figueiredo, P. Dinda, and J. Fortes, “A Case for Grid Computing
on Virtual Machines,” Proc. 23rd Int’l Conf. Distributed Computing
Systems (ICDCS), IEEE CS Press, pp. 550-559, 2003.
[8] I. Krsul, A. Ganguly, J. Zhang, J.A.B. Fortes, and R.J. Figueiredo,
“VMPlants: Providing and Managing Virtual Machine Execution
Environments for Grid Computing”, Proc. IEEE/ACM
Supercomputing Conference, IEEE CS Press, pp. 7, 2004.
[9] J. Alonso, L. Silva, A. Andrzejak, P. Silva and J. Torres.
“High-available grid services through the use of virtualized
clustering”. In Proc. 8th IEEE/ACM International Conference on Grid
Computing, pp. 34-41, Sept. 2007.
[10] Amazon, "Amazon Elastic Compute Cloud (Amazon EC2)".
http://aws.amazon.com/ec2/, 2009.
[11] Microsoft, "Introducing the Azure Services Platform".
http://download.microsoft.com/download/e/4/3/e43bb484-3b52-4fa8-
a9f9-ec60a32954bc/Azure_Services_Platform.docx, 2009.
[12] M. Fenn, M. Murphy, J. Martin and S. Goasguen. “An Evaluation of
KVM for Use in Cloud Computing”, Proc. 2nd International
Conference on the Virtual Computing Initiative, RTP, NC, USA, May
2008.
[13] VMware, "VMware vCloud".
http://www.vmware.com/technology/cloud-computing.html, 2009.
[14] C. Clark, K. Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I.
Pratt, and A. Warfield. “Live Migration of Virtual Machines”. Proc.
2nd ACM/USENIX Symposium on Networked Systems Design and
Implementation (NSDI), Boston, MA. pp. 273-286, May 2005.
[15] M. Nelson, B. Lim, and G. Hutchins, "Fast Transparent Migration for
Virtual Machines", Proceedings of the USENIX Annual Technical
Conference, pp. 391-394, 2005.
Figure 4. Average response time of systems with different structures,
plotting the log of the average response time of every migration request,
log(T), against the log of the number of virtual servers managed by VIMS
nodes, log(n): (a) hierarchical structure with d = 3 and h = 1, 2, 3, 4, 5;
(b) hierarchical structure with h = 3 and d = 2, 3, 4, 5; (c) peer-to-peer
structure with m = 1, 3, 5, 7, 9, 11, 13.
Figure 5. Fault-tolerance of systems with different structures, plotting the
fault-tolerance indicator nF/n against the number of virtual servers
managed by VIMS nodes, n: (a) hierarchical structure with d = 3 and
h = 1, 2, 3, 4; (b) hierarchical structure with h = 3 and d = 2, 3, 4, 5;
(c) peer-to-peer structure with m = 1, 3, 5, 7, 9, 11, 13.
Figure 6. Scalability of the systems with different structures, plotting the
scalability metric ψ(m) against the log of the number of VIMS nodes,
log(m): (a) hierarchical structure with d = 3 and r = 3, 5, 7, 9, 11, 13, 15;
(b) hierarchical structure with r = 7 and d = 3, 5, 7, 9, 11, 13, 15;
(c) peer-to-peer structure with r = 3, 5, 7, 9, 11, 13, 15.
Figure 7. Performance, fault-tolerance and scalability comparison of
systems with the three typical structures: (a) average response time, log(T)
vs. log(n), for the centralized structure and the hierarchical and
peer-to-peer structures with 13 VIMS nodes each; (b) fault-tolerance
indicator nF/n vs. n for the same three systems; (c) scalability ψ(m) vs.
log(m) for the peer-to-peer structure and hierarchical structures with
d = 3, 5, 7, 9.
[16] G. Vallée, T. Naughton, H. Ong, and S. L. Scott, “Checkpoint/Restart
of Virtual Machines Based on Xen”, In High Availability and
Performance Computing Workshop (HAPCW'06), Santa Fe, New
Mexico, USA, pp. 30, 2006.
[17] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, "Terra:
A Virtual Machine-Based Platform for Trusted Computing". Proc.
19th ACM Symp. Operating Systems Principles (SOSP), ACM Press,
pp. 193-206, 2003.
[18] VMware, "VMware Infrastructure 3 Primer".
http://www.vmware.com/products/vi/, 2009.
[19] L. Kleinrock, “Queueing Systems: Volume I: Theory”, John Wiley &
Sons, New York, pp. 187, 1975.
[20] R. K. Iyer, S. Butner, and E. J. McCluskey, “A Statistical
Failure/Load Relationship: Results of a Multicomputer Study”, IEEE
Transaction on Computers, Vol. C-31, No. 7, pp. 697-706, July 1982.
[21] A. Birolini, “Reliability Engineering Theory and Practice”, Fifth
edition. Springer, pp. 2-12, 2007.
[22] Y. Qu, C. Lin, Y. Li, and Z. Shan. “Survivability Analysis of Grid
Resource Management System Topology”. Proc. 4th International
Conference of Grid and Cooperative Computing, Lecture Notes in
Computer Science, Vol. 3795, pp. 738-743, 2005.
[23] P. Jogalekar, and M. Woodside, “Evaluating the scalability of
distributed systems”, IEEE Transactions on Parallel and Distributed
Systems, Vol. 11, Issue: 6, pp. 589-603, Jun 2000.
[24] P. Jogalekar, and M. Woodside, "Evaluating the Scalability of
Distributed Systems", Proc. Thirty-First Annual Hawaii
International Conference on System Sciences - Volume 7, pp.
524-531, 1998.
[25] P. Vilà, J. L. Marzo, A. Bueno, E. Calle, and L. Fàbrega, “Distributed
Network Resource Management using a Multi-Agent System:
Scalability Evaluation”, Proc. International Symposium on
Performance Evaluation of Computer and Telecommunication
Systems, pp. 355-362, July 2004.

More Related Content

PDF
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
PDF
Simplified Cost Efficient Distributed System
PDF
From Virtualization to Dynamic IT
PDF
High Availability of Services in Wide-Area Shared Computing Networks
PDF
Toward Cloud Computing: Security and Performance
PDF
Pitfalls & Challenges Faced During a Microservices Architecture Implementation
PDF
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...
PDF
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...
IRJET- An Adaptive Scheduling based VM with Random Key Authentication on Clou...
Simplified Cost Efficient Distributed System
From Virtualization to Dynamic IT
High Availability of Services in Wide-Area Shared Computing Networks
Toward Cloud Computing: Security and Performance
Pitfalls & Challenges Faced During a Microservices Architecture Implementation
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...
IMPACT OF RESOURCE MANAGEMENT AND SCALABILITY ON PERFORMANCE OF CLOUD APPLICA...

What's hot (19)

PDF
TermPaper
PDF
[IJCT-V3I3P2] Authors: Prithvipal Singh, Sunny Sharma, Amritpal Singh, Karand...
PDF
A trust management system for ad hoc mobile
PDF
AVAILABILITY METRICS: UNDER CONTROLLED ENVIRONMENTS FOR WEB SERVICES
PDF
Zálohování a DR do cloudu - přehled technologií
PDF
E VALUATION OF T WO - L EVEL G LOBAL L OAD B ALANCING F RAMEWORK IN C L...
PDF
Role of Virtual Machine Live Migration in Cloud Load Balancing
PDF
Sameer Mitter - Management Responsibilities by Cloud service model types
PDF
Embedded systems Implementation in Cloud Challenges
PPT
Client server computing in mobile environments
PDF
Algorithm for Scheduling of Dependent Task in Cloud
PDF
Dimension data cloud_security_overview
PDF
Oruta phase1 report
PDF
ENERGY EFFICIENCY IN CLOUD COMPUTING
DOCX
My seminar on distributed dbms
PDF
Ensuring distributed accountability for data sharing in the cloud
PDF
LIVE VIRTUAL MACHINE MIGRATION USING SHADOW PAGING IN CLOUD COMPUTING
PDF
Conference Paper: Elastic Network Functions: opportunities and challenges
PDF
Advanced resource allocation and service level monitoring for container orche...
TermPaper
[IJCT-V3I3P2] Authors: Prithvipal Singh, Sunny Sharma, Amritpal Singh, Karand...
A trust management system for ad hoc mobile
AVAILABILITY METRICS: UNDER CONTROLLED ENVIRONMENTS FOR WEB SERVICES
Zálohování a DR do cloudu - přehled technologií
E VALUATION OF T WO - L EVEL G LOBAL L OAD B ALANCING F RAMEWORK IN C L...
Role of Virtual Machine Live Migration in Cloud Load Balancing
Sameer Mitter - Management Responsibilities by Cloud service model types
Embedded systems Implementation in Cloud Challenges
Client server computing in mobile environments
Algorithm for Scheduling of Dependent Task in Cloud
Dimension data cloud_security_overview
Oruta phase1 report
ENERGY EFFICIENCY IN CLOUD COMPUTING
My seminar on distributed dbms
Ensuring distributed accountability for data sharing in the cloud
LIVE VIRTUAL MACHINE MIGRATION USING SHADOW PAGING IN CLOUD COMPUTING
Conference Paper: Elastic Network Functions: opportunities and challenges
Advanced resource allocation and service level monitoring for container orche...
Ad

Similar to Performance, fault tolerance and scalability analysis of virtual infrastructure management system (20)

PDF
A Dynamically-adaptive Resource Aware Load Balancing Scheme for VM migrations...
PDF
F1034047
PDF
Risk Analysis and Mitigation in Virtualized Environments
PPT
Cloud models and platforms
PDF
Resource Allocation using Virtual Machine Migration: A Survey
PPT
Iaa s cloud architectures
PDF
Virtualization in Distributed System: A Brief Overview
PDF
Virtual Machine Migration and Allocation in Cloud Computing: A Review
PDF
International Refereed Journal of Engineering and Science (IRJES)
PDF
International Refereed Journal of Engineering and Science (IRJES)
PPTX
Virtualization and its Types
PPT
Cloud models and platforms
DOCX
Short Economic EssayPlease answer MINIMUM 400 word I need this.docx
PPT
IaaS Cloud Architectures from Virtualized Data Centers to Federated Cloud Inf...
PDF
Server Consolidation Algorithms for Virtualized Cloud Environment: A Performa...
PDF
Analyzing the Difference of Cluster, Grid, Utility & Cloud Computing
PDF
Performance Analysis of Server Consolidation Algorithms in Virtualized Cloud...
PDF
Performance Evaluation of Server Consolidation Algorithms in Virtualized Clo...
PPTX
Four Main Types of Virtualization
PDF
Quick start guide_virtualization_uk_a4_online_2021-uk
A Dynamically-adaptive Resource Aware Load Balancing Scheme for VM migrations...
F1034047
Risk Analysis and Mitigation in Virtualized Environments
Cloud models and platforms
Resource Allocation using Virtual Machine Migration: A Survey
Iaa s cloud architectures
Virtualization in Distributed System: A Brief Overview
Virtual Machine Migration and Allocation in Cloud Computing: A Review
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
Virtualization and its Types
Cloud models and platforms
Short Economic EssayPlease answer MINIMUM 400 word I need this.docx
IaaS Cloud Architectures from Virtualized Data Centers to Federated Cloud Inf...
Server Consolidation Algorithms for Virtualized Cloud Environment: A Performa...
Analyzing the Difference of Cluster, Grid, Utility & Cloud Computing
Performance Analysis of Server Consolidation Algorithms in Virtualized Cloud...
Performance Evaluation of Server Consolidation Algorithms in Virtualized Clo...
Four Main Types of Virtualization
Quick start guide_virtualization_uk_a4_online_2021-uk
Ad

More from www.pixelsolutionbd.com (20)

PDF
Adaptive fault tolerance_in_real_time_cloud_computing
PPT
Software rejuvenation based fault tolerance
PPT
Privacy preserving secure data exchange in mobile p2 p
PPT
Adaptive fault tolerance in cloud survey
PPT
Adaptive fault tolerance in real time cloud_computing
PPT
Protecting from transient failures in cloud deployments
PPT
Protecting from transient failures in cloud microsoft azure deployments
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Fault tolerance on cloud computing
PDF
Cyber Physical System
PDF
Fault tolerance on cloud computing
PDF
Real time service oriented cloud computing
PDF
Comprehensive analysis of performance, fault tolerance and scalability in gri...
PDF
A task based fault-tolerance mechanism to hierarchical master worker with div...
PPT
Privacy preserving secure data exchange in mobile P2P
Adaptive fault tolerance_in_real_time_cloud_computing
Software rejuvenation based fault tolerance
Privacy preserving secure data exchange in mobile p2 p
Adaptive fault tolerance in cloud survey
Adaptive fault tolerance in real time cloud_computing
Protecting from transient failures in cloud deployments
Protecting from transient failures in cloud microsoft azure deployments
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Fault tolerance on cloud computing
Cyber Physical System
Fault tolerance on cloud computing
Real time service oriented cloud computing
Comprehensive analysis of performance, fault tolerance and scalability in gri...
A task based fault-tolerance mechanism to hierarchical master worker with div...
Privacy preserving secure data exchange in mobile P2P

Recently uploaded (20)

PPTX
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
Digital Logic Computer Design lecture notes
PPTX
Construction Project Organization Group 2.pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
PPTX
web development for engineering and engineering
PPTX
UNIT 4 Total Quality Management .pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPT
Project quality management in manufacturing
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
additive manufacturing of ss316l using mig welding
PPTX
Lecture Notes Electrical Wiring System Components
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PDF
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
PDF
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
Engineering Ethics, Safety and Environment [Autosaved] (1).pptx
CH1 Production IntroductoryConcepts.pptx
Digital Logic Computer Design lecture notes
Construction Project Organization Group 2.pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
web development for engineering and engineering
UNIT 4 Total Quality Management .pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
Project quality management in manufacturing
bas. eng. economics group 4 presentation 1.pptx
Foundation to blockchain - A guide to Blockchain Tech
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
additive manufacturing of ss316l using mig welding
Lecture Notes Electrical Wiring System Components
R24 SURVEYING LAB MANUAL for civil enggi
Mitigating Risks through Effective Management for Enhancing Organizational Pe...
Enhancing Cyber Defense Against Zero-Day Attacks using Ensemble Neural Networks
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx

Performance, fault tolerance and scalability analysis of virtual infrastructure management system

  • 1. Performance, Fault-tolerance and Scalability Analysis of Virtual Infrastructure Management System Xiangzhen Kong1 , Jiwei Huang2 , Chuang Lin3 , Peter D. Ungsunan4 Department of Computer Science and Technology Tsinghua University Beijing, 100084, China 1 xiangzhen1985@gmail.com, 2 hjw217@gmail.com, 3 chlin@tsinghua.edu.cn, 4 hongsunan@csnet1.cs.tsinghua.edu.cn Abstract—The virtual infrastructure has become more and more popular in the Grid and Cloud computing. With the aggrandizement scale, the management of the resources in virtual infrastructure faces a great technical challenge. To support the upper services effectively, it raises higher requirements for the performance, fault-tolerance and scalability of virtual infrastructure management systems. In this paper, we study the performance, fault-tolerance and scalability of virtual infrastructure management systems with the three typical structures, including centralized, hierarchical and peer-to-peer structures. We give the mathematical definition of the evaluation metrics and give detailed quantitative analysis, and then get several useful conclusions for enhancing the performance, fault-tolerance and scalability, based on the quantitative analysis. We believe that the results of this work will help system architects make informed choices for building virtual infrastructure. Keywords-virtual infrastructure management system; performance; fault-tolerance; scalability I. INTRODUCTION A Grid is a very large-scale distributed network computing system that can scale to Internet size environments [1, 2]. In recent years, another distributed computing model called Cloud computing has come into being, which is closely related to the existing Grid computing [3]. Both in the Grid and Cloud computing, flexible and efficient sharing of distributed resources is at the core of the design and implementation, which brings forward higher requirements for low-cost, scalable and dependable infrastructure. 
Under this background, infrastructure virtualization becomes a growing concern. Virtualization adds a hardware abstraction layer called the Virtual Machine Monitor (VMM) or Hypervisor. The layer provides an interface that is functionally equivalent to the actual hardware to a number of virtual machines (VMs) [4, 6]. More recently, virtualization became important as a way to improve system security, reliability, reduce costs, and provide greater flexibility [5]. Many servers, on each of which run several VMs, are connected with one another by network modules, and they are under the unified management and play the role of infrastructure for upper application, which are called virtual infrastructure. Virtual infrastructure has numerous advantages, such as low-cost, ease of deployment and more dependable. The virtual infrastructure is becoming popular in Grid and Cloud [7-13]. The virtual infrastructure management system is charge of controlling the resources and VMs. The management of virtual infrastructure has some particular characteristics, for the special mechanism of virtualization such as live migration [14]. The virtual infrastructure management system plays an important role and has a direct impact on the capability of the overall infrastructure. However, with the sharp increase of servers and the large scale of system, the management of virtual infrastructure faces many tough challenges. The sharp increase of virtual servers requires higher scalability; the need for efficiency and quality of service (QoS) of upper application puts forward a demand for real-time and response time; besides, the fault-tolerance is also an important requirement, especially when the management nodes increase. In this paper, we study the virtual infrastructure management systems with three typical structures, and evaluate their performance, fault-tolerance and scalability. 
The contributions are as follows: • According to characteristics of virtual infrastructure management systems, we propose the performance, fault-tolerance and scalability evaluation metrics, and give the detailed expressions and calculations. • We make the quantitative analysis and evaluation of the performance, fault-tolerance and scalability of the virtual infrastructure management systems with three typical structures, including centralized, hierarchical and peer-to-peer structures. • Based on the performance, fault-tolerance and scalability analyses, we summarize some basic rules, which are directive and with reference value for the design and implementation of virtual infrastructure management system in Grid and Cloud computing. The rest of this paper is organized as follows. Section 2 introduces the research background and some concepts. In Section 3, three typical structures of virtual infrastructure management system are introduced, and then the detail calculation and analysis of the performance, fault-tolerance and scalability are given. Section 4 makes further discussions and summarizes some useful rules. Section 5 shows the numerical results by analyzing an example. At last, we conclude the paper in Section 6. II. BACKGROUND Infrastructure virtualization becomes a hot issue in interested among the industry and academe. Virtualization is becoming popular in Grid computing[7-9], and is inherently 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications Unrecognized Copyright Information DOI 10.1109/ISPA.2009.24 282 2009 IEEE International Symposium on Parallel and Distributed Processing with Applications 978-0-7695-3747-4/09 $25.00 © 2009 IEEE DOI 10.1109/ISPA.2009.24 282
  • 2. Figure 1. VMware infrastructure 3 components [18]. Figure 2. The typical structures of the virtual infrastructure management system:(a) Centralized (b) Hierarchical (c) Peer-to-peer structures. key feature of Cloud computing [10-13]. Benefited from its special mechanisms, the virtual infrastructure has numerous advantages. Server consolidation reduces the cost sharply; live migration[14] or VMotion[15] greatly improve the flexibility, maintainability and availability; Checkpoint / Restart and Isolation help enhance the reliability, security and survivability[16, 17]. The implementation of these mechanisms is under unified management of the virtual infrastructure management system. Fig. 1 shows an example of virtual infrastructure, called VMware Infrastructure 3 [18], which is published by VMware Inc. Here is a small-scale system, and the virtual infrastructure management system consists of only one management server called VirtualCenter (VC) server. The VC server manages multiple virtual servers at the same time, and unifies resources from individual virtual server so that those resources can be shared among virtual machines. The running state of every virtual server is monitored and recorded by VC server. When one virtual server overloads or suffers failures, some of the VMs running on it could move to another suitable virtual server by VMotion with little downtime. The migration target virtual server is selected by VC server according to the running state of every virtual server. The server with least workload or most dependability will be selected according to different strategies. With the rapid development of distributed application, the scale of system aggrandizes sharply. There are even tens of thousands of servers in some large virtual infrastructure of Grid or Cloud computing system. In this case, how to construct an effective virtual infrastructure management system for higher performance, fault-tolerance and scalability becomes a challenge. III. 
PERFORMANCE, FAULT-TOLERANCE AND SCALABILITY ANALYSIS A. Typical Structures of Virtual Infrastructure Management System Similar to the traditional Grid resource manage system, the structures of virtual infrastructure management system can be classified into centralized, hierarchical and peer-to-peer structures [2, 22]. The other complex structures can be view as the hybrid of the three typical structures. 1) Centralized structure In a centralized virtual infrastructure management system, all Virtual Servers (VSs) are managed by one Virtual Infrastructure Management Server (VIMS). The central VIMS monitors and records the running state information of every VS. When one of the virtual machines in one VS needs migration, the VIMS decides which is the best migration target VS. The virtual infrastructure with centralized structure management system is shown in Fig. 2 (a). 2) Hierarchical structure The hierarchical structure is shown in Fig. 2 (b). The VIMSs in the lowest layer manage the virtual servers directly, and the lower layer VIMSs are under the administration of the parent VIMSs. Lower layer VIMSs need merely to pass the digest message to their parent VIMS, which reduce the traffic and raise efficiency. The root VIMS has the whole information of all VSs. 3) Peer-to-peer structure Virtual infrastructure management system of peer-to-peer structure is a decentralized system (see Fig. 2(c)). It can be viewed as another expansion of the centralized structure. Every peer VIMS has local group of virtual servers, different VIMSs communicate directly with each other to get running state of the VS managed by others. Every VIMS can get the whole information of the virtual infrastructure. B. Performance Analysis We take average response time of every migration request as the performance metric, which is the expectation period from the time that a migration request is submitted to VIMS, to the time that VIMS gets the best choice of the target VS. 
Hereafter, we assume the total number of VS nodes is n , and the operation period of every VS needless migration conforms to exponential distribution, with expectation value 0T . So the migration request event from every VS node conforms to Poisson distribution with parameter 0 01/Tλ = . 1) Performance analysis of the centralized structure There is only one VIMS which manages n VSs in the centralized system. According to the additive property of the Poisson distribution, we know that the total migration requests received by the central VIMS conform to Poisson distribution too. The arrival rate of migration requests is 0nλ λ= . The central VIMS compares the workload or dependability of every VS node recorded in it, and then selects the best choice as the migration target VS. The process takes ( )O n time generally, so we can assume that the average processing time of VIMS for every migration request is S Kn= , where K is a constant independent from n . Assuming the processing time of VIMS for every migration request conforms to arbitrary distribution, we can 283283
  • 3. view the VIMS in centralized structure as an M/G/1 queue model. According to the Pollaczek-Khinchin mean-value formula [19], the average service time of M/G/1 queue is 2 (1 ) 2(1 ) S C T S ρ ρ + = + − (1) WhereC is the coefficient of variation of the processing time S , i.e. /SC Sσ= . SoC is relevant to the distribution of S. But as the process for every migration request is similar, we can assign a small constant toC . ρ is the server utilization, and should be less than one for the stability of system. For the centralized structure, we have S Kn= and 2 0/S Kn Tρ λ= = . Therefore, from (1), the average response time of every migration request can be expressed by 2 3 2 2 0 (1 ) 2( ) C K n C T Kn T Kn + = + − (2) To keep the system’s stability, the utilization should be 1ρ < . So the number limit of the VS nodes those can be managed by the central VIMS is 0 / Cn T K N< = . 2) Performance analysis of the hierarchical structure It is assumed that a hierarchical system has h layers. Every VIMS except those in the lowest layer has d son VIMSs. So there are 1l d − VIMSs in the l th layer ( 1,2, ,l h= ) and ( ) ( )1 / 1h m d d= − − VIMSs totally. Each VIMS in the lowest layer directly manage 1 / h n d − VS nodes. Therefore, the average arrival rate of migration request is ( )1 0/ h h n d Tλ − = for the VIMS in the lowest layer, and ( )1 0/ i i n d Tλ − = for the VIMS in the i th (1 1i h≤ ≤ − ) layer. The average processing time of every migration request for the VIMS in the lowest layer is 1 / h hS Kn d − = ; and because the VIMSs in lower layer only need to report its locally optimal choice to its parent VIMS, the average processing time of every migration request for the VIMS in the i th (1 1i h≤ ≤ − ) layer is i iS Kd= . So the utilization of VIMS in the lowest layer is ( ) 21 0/ /h h h hS K n d Tρ λ − = = , and for the VIMS in the i th (1 1i h≤ ≤ − ) layer 0/i i iS Kdn Tρ λ= = . 
From (1), the average response time of a migration request in the hierarchical management system is

T_H = \sum_{i=1}^{h-1} \left( Kd + \frac{K^2 d^2 n (1+C^2)}{2(d^{i-1} T_0 - Kdn)} \right) + \frac{Kn}{d^{h-1}} + \frac{K^2 (n/d^{h-1})^3 (1+C^2)}{2\left(T_0 - K (n/d^{h-1})^2\right)}    (3)

For the stability of the system we require \rho_h < 1 and \rho_i < 1, so the number of VSs that the hierarchical VIMSs can manage is bounded by n < \min\{ d^{h-1}\sqrt{T_0/K},\; T_0/(Kd) \} = N_H.

3) Performance analysis of the peer-to-peer structure

Assume there are m VIMSs in the virtual infrastructure management system, so each VIMS manages n/m VSs. When the virtual machines in a VS need migration, the VS sends a migration request to its local VIMS, which computes the locally optimal target VS, collects the locally optimal targets from the other m-1 VIMS nodes, and then selects the best among them. So the average arrival rate of migration requests at every VIMS is

\lambda = \frac{n}{m}\lambda_0 + (m-1)\frac{n}{m}\lambda_0 = \frac{n}{T_0};

the average processing time is S = K(n/m + m - 1), and the server utilization is \rho = \lambda S = Kn(n/m + m - 1)/T_0. From (1), the average response time of a migration request is

T_P = K\left(\frac{n}{m}+m-1\right) + \frac{K^2 n \left(\frac{n}{m}+m-1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m}+m-1\right)\right)}    (4)

The stability condition \rho < 1 bounds the number of VSs that the peer-to-peer VIMSs can manage:

n < \frac{\sqrt{m^2(m-1)^2 + 4mT_0/K} - m(m-1)}{2} = N_P.

C. Fault-tolerance Analysis

Workload has been found to be a very influential factor in system failure rate [20]. Using a linear model for this relationship, we write the failure rate as f = A\rho + B, where \rho is the system utilization and A and B are constants representing, respectively, the failure effect induced by workload and the inherent failure rate.
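The response-time formulas (3) and (4) can be sketched in the same style (helper names are illustrative):

```python
def t_hierarchical(n, h, d, K, T0, C):
    """Average response time of the hierarchical structure, Eq. (3)."""
    t = 0.0
    for i in range(1, h):                     # upper layers 1 .. h-1
        rho = K * d * n / (d**(i - 1) * T0)   # rho_i = lambda_i * S_i
        t += K * d + rho * K * d * (1 + C**2) / (2 * (1 - rho))
    s = n / d**(h - 1)                        # VSs per lowest-layer VIMS
    rho = K * s**2 / T0                       # rho_h
    t += K * s + rho * K * s * (1 + C**2) / (2 * (1 - rho))
    return t

def t_p2p(n, m, K, T0, C):
    """Average response time of the peer-to-peer structure, Eq. (4)."""
    S = K * (n / m + m - 1)   # local scan plus m-1 remote reports
    rho = n * S / T0          # total arrival rate at each VIMS is n / T0
    return S + rho * S * (1 + C**2) / (2 * (1 - rho))
```

With h = 1 or m = 1 both functions reduce to the centralized formula (2), in line with the transformation relationships proved later in Section 4.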
According to reliability theory, the probability that the system is failure-free over a period (0, t) is R(t) = \mathrm{Prob}\{Y > t\} = e^{-ft}, so the probability that the system breaks down in (0, t) is F(t) = 1 - R(t) = 1 - e^{-ft}. The failure of a VIMS node may make the information of part or all of the VS nodes unavailable to the system, which hampers the computation of the best migration target server. Let n_F be the average number of VS nodes still managed by the system in the presence of VIMS failures, and n the total number of VS nodes; the fault-tolerance of the system can then be expressed as

FT = n_F / n    (5)

1) Fault-tolerance analysis of the centralized structure

According to subsection 3.2, the utilization of the single VIMS in the centralized system is \rho_C = Kn^2/T_0. So its failure rate is f_C = A\rho_C + B, and the probability that the system breaks down in (0, t) is F(t) = 1 - R(t) = 1 - e^{-f_C t}. Hereafter we consider the probability per unit time, so

F_C = 1 - e^{-f_C} = 1 - \exp\{-(AKn^2/T_0 + B)\}    (6)

Since there is only one VIMS node in the centralized structure, no VS node can be managed when that VIMS fails. Hence n_F = 0 \cdot F_C + n(1 - F_C), and the fault-tolerance is

FT_C = n_F/n = 1 - F_C = \exp\{-(AKn^2/T_0 + B)\}    (7)

2) Fault-tolerance analysis of the hierarchical structure

There are h layers of VIMS nodes in the hierarchical system, managing n VS nodes. The
number of VIMS nodes in the k-th layer is m_k = d^{k-1}. The utilization of a VIMS node in the k-th layer is

\rho_k = \begin{cases} Kdn/(d^{k-1} T_0), & 1 \le k \le h-1 \\ K(n/d^{h-1})^2 / T_0, & k = h \end{cases}    (8)

Since the failure probability of one VIMS node in the k-th layer is F_H(k) = 1 - e^{-f_H} = 1 - \exp\{-(A\rho_k + B)\}, we obtain

F_H(k) = \begin{cases} 1 - \exp\{-(AKdn/(d^{k-1} T_0) + B)\}, & 1 \le k \le h-1 \\ 1 - \exp\{-(AK(n/d^{h-1})^2/T_0 + B)\}, & k = h \end{cases}    (9)

When i nodes in the k-th layer fail, the VIMSs can still manage n_k(i) = n - i(n/m_k) = n(1 - i/d^{k-1}) VS nodes. Therefore, the average number of VS nodes that can be managed, as seen from the k-th layer, is

n_F(k) = \sum_{i=0}^{m_k} \binom{m_k}{i} F_H(k)^i (1-F_H(k))^{m_k - i} \, n_k(i) = n \sum_{i=0}^{d^{k-1}} \binom{d^{k-1}}{i} \left(1-\frac{i}{d^{k-1}}\right) F_H(k)^i (1-F_H(k))^{d^{k-1}-i}    (10)

Hence, from (9) and (10), taking n_F as the average of n_F(k) over the h layers and noting that the binomial sum in (10) reduces to n(1 - F_H(k)), the fault-tolerance of the hierarchical structure is

FT_H = \frac{1}{h}\sum_{k=1}^{h} \frac{n_F(k)}{n} = \frac{1}{h}\left[\sum_{k=1}^{h-1} \exp\{-(AKdn/(d^{k-1}T_0)+B)\} + \exp\{-(AKn^2/(d^{2(h-1)}T_0)+B)\}\right]    (11)

3) Fault-tolerance analysis of the peer-to-peer structure

Each VIMS node directly manages n/m VS nodes, so when i VIMS nodes fail, the average number of VS nodes still managed is n_F(i) = (m - i) \cdot n/m.
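A minimal sketch of the layered computation (8)-(11); A and B are the constants of the linear failure model, and all names and parameter values are illustrative:

```python
import math

def ft_hierarchical(n, h, d, K, T0, A, B):
    """Fault-tolerance FT_H of the hierarchical structure, Eq. (11).
    A and B are the constants of the linear failure model f = A*rho + B."""
    total = 0.0
    for k in range(1, h + 1):
        if k < h:
            rho = K * d * n / (d**(k - 1) * T0)  # upper-layer utilization, Eq. (8)
        else:
            rho = K * (n / d**(h - 1))**2 / T0   # lowest-layer utilization
        # Per-layer failure probability F_H(k), Eq. (9); the binomial sum in
        # Eq. (10) reduces to n*(1 - F_H(k)), so each layer contributes 1 - F_H(k).
        total += math.exp(-(A * rho + B))
    return total / h
```

With h = 1 this reduces to FT_C = exp{-(AKn^2/T_0 + B)} of Eq. (7), matching the transformation relationship proved in Section 4.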
Assuming that failures of different VIMS nodes are independent, we have

n_F = \sum_{i=0}^{m} \binom{m}{i} F_P^i (1-F_P)^{m-i} (m-i)\frac{n}{m} = n \sum_{i=0}^{m} \binom{m}{i} \left(1-\frac{i}{m}\right) F_P^i (1-F_P)^{m-i}    (12)

Because the utilization is \rho_P = Kn(n/m + m - 1)/T_0 and the failure rate is f_P = A\rho_P + B, the failure probability is

F_P = 1 - e^{-f_P} = 1 - \exp\{-(AKn(n/m + m - 1)/T_0 + B)\}    (13)

Hence the fault-tolerance of the peer-to-peer structure is

FT_P = \frac{n_F}{n} = \sum_{i=0}^{m} \binom{m}{i} \left(1-\frac{i}{m}\right) F_P^i (1-F_P)^{m-i} = 1 - F_P = \exp\{-(AKn(n/m + m - 1)/T_0 + B)\}    (14)

where the second equality uses the fact that the expected number of failed VIMS nodes is mF_P.

D. Scalability Analysis

Jogalekar et al. proposed a strategy-based scalability metric for distributed systems [23, 24]. The scalability from scale k_1 to scale k_2 can be expressed as

\psi(k_1, k_2) = \frac{F(k_2)}{F(k_1)} = \frac{\lambda_2 f_2 / C_2}{\lambda_1 f_1 / C_1}    (15)

where \lambda is the throughput in responses/sec, C(k) is the rental cost, and f(k) = 1/(1 + T(k)/\hat{T}) is the average value of each response, T being the mean response time and \hat{T} its target value. The productivity F(k) = \lambda(k) f(k)/C(k) is the value delivered per second. To obtain analytic solutions, k for the base case is taken as 1, and the metric is written as \psi(k) = F(k)/F(1) [23]. Fig. 3 shows how \psi might behave in different situations [24, 25]. The value of \psi and its trend as the scale factor grows are used to evaluate the scalability of a system.

For a virtual infrastructure management system, we are concerned with the change of management efficiency, rather than of productivity, as the system scales up. The management efficiency with m VIMSs is defined as

F(m) = n \cdot f(m) / C(m)    (16)

where n is the maximum number of VSs that the VIMSs are able to manage, f(m) is the average value of each VIMS determined by the management performance, and C(m) is the cost at scale m, assumed to be C(m) = \alpha m.
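The peer-to-peer fault-tolerance (12)-(14) can be sketched likewise; the binomial sum is evaluated explicitly here so that the closed form 1 - F_P can be confirmed numerically (names and parameter values are illustrative):

```python
import math
from math import comb

def ft_p2p(n, m, K, T0, A, B):
    """Fault-tolerance FT_P of the peer-to-peer structure, Eq. (14)."""
    rho = K * n * (n / m + m - 1) / T0   # utilization of each VIMS
    F_P = 1 - math.exp(-(A * rho + B))   # failure probability, Eq. (13)
    # Binomial-weighted surviving fraction of Eq. (12)/(14).
    return sum(comb(m, i) * (1 - i / m) * F_P**i * (1 - F_P)**(m - i)
               for i in range(m + 1))
```

Since the expected number of failed VIMSs is m*F_P, the sum collapses to 1 - F_P, which is the closed form given in (14); with m = 1 it reduces to FT_C of Eq. (7).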
For the average response time, the centralized structure can be viewed as a special case of the hierarchical and peer-to-peer structures (to be proved in Section 4), so we take the centralized structure as the point of reference. The average value of each response in the other structures is evaluated by comparison with the centralized one, as

f(m) = 1/(1 + T(m)/T_C)    (17)

where T(m) is the average response time of the structure in question and T_C is the average response time of the centralized structure. The management efficiency is then

F(m) = \frac{n}{\alpha m (1 + T(m)/T_C)}    (18)

and the scalability of scale m_2 relative to m_1 is

\psi(m_1, m_2) = \frac{F(m_2)}{F(m_1)} = \frac{n_2 m_1 (1 + T(m_1)/T_C)}{n_1 m_2 (1 + T(m_2)/T_C)}    (19)

1) Scalability analysis of the centralized structure

In the centralized system there is only one VIMS, i.e. m = 1, which can manage at most n = r VSs, where r is the maximum number of VSs that a single VIMS is able to manage. The average management efficiency is

F_C = \frac{r}{\alpha (1 + T_C/T_C)} = \frac{r}{2\alpha}    (20)

For both the hierarchical and the peer-to-peer structure, when m = 1 and n = r, we have T_H = T_C and T_P = T_C (to be proved in Section 4). Hence F_H(m=1) = F_C and F_P(m=1) = F_C, and consequently \psi_H(m=1) = 1 and \psi_P(m=1) = 1. In conclusion, the centralized structure is the base case of the hierarchical and the peer-to-peer structure and can serve as the reference when m \neq 1. For both the hierarchical and the peer-to-peer structure,
Figure 3. Different scalability behaviors [24, 25].

\psi(m) = F(m)/F_C = \frac{2n}{rm(1 + T(m)/T_C)}    (21)

2) Scalability analysis of the hierarchical structure

In a hierarchical system there are m = (d^h - 1)/(d - 1) VIMSs in all, able to manage at most n = d^{h-1} r VSs. Substituting n = d^{h-1} r into (3), the average response time is

T_H = \sum_{i=1}^{h-1}\left( Kd + \frac{K^2 d^{\,h-i+2} r (1+C^2)}{2(T_0 - K d^{\,h-i+1} r)} \right) + Kr + \frac{K^2 r^3 (1+C^2)}{2(T_0 - K r^2)}    (22)

For a centralized system managing the same number of VSs, the average response time from (2) is

T_C = K d^{h-1} r + \frac{K^2 d^{3(h-1)} r^3 (1+C^2)}{2(T_0 - K d^{2(h-1)} r^2)}    (23)

Hence, from (21), the scalability of the hierarchical structure is

\psi_H(m) = \frac{2 d^{h-1}(d-1)}{(d^h - 1)(1 + T_H/T_C)}    (24)

with T_H and T_C given by (22) and (23).

3) Scalability analysis of the peer-to-peer structure

In a virtual infrastructure management system with the peer-to-peer structure, there are m VIMSs in all, able to manage at most n = mr VSs. The average response time from (4) is

T_P = K(r+m-1) + \frac{K^2 m r (r+m-1)^2 (1+C^2)}{2(T_0 - K m r (r+m-1))}    (25)

For a centralized system managing the same number of VSs, the average response time is

T_C = Kmr + \frac{K^2 m^3 r^3 (1+C^2)}{2(T_0 - K m^2 r^2)}    (26)

Hence the scalability of the peer-to-peer structure is

\psi_P(m) = \frac{2}{1 + T_P/T_C}    (27)

IV. FURTHER DISCUSSIONS ON PERFORMANCE, FAULT-TOLERANCE AND SCALABILITY

In this section we elaborate further on the performance, fault-tolerance and scalability of the three types of structures.
Some rules and theorems will be summarized and proved which provide direction and reference value for the construction of virtual infrastructure management systems.

A. Transformation relationships between the structures

From Section 3 we can derive the transformation relationships among the three types of structures, which corroborates the correctness of the calculations to a certain extent.

Proposition 1: In terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the hierarchical and of the peer-to-peer structure.

Proof: We prove it by giving the transformation conditions.

1) From the hierarchical structure to the centralized structure

From (3), the average response time of the management system with the hierarchical structure is

T_H = \sum_{i=1}^{h-1}\left( Kd + \frac{K^2 d^2 n (1+C^2)}{2(d^{i-1} T_0 - Kdn)} \right) + \frac{Kn}{d^{h-1}} + \frac{K^2 (n/d^{h-1})^3 (1+C^2)}{2\left(T_0 - K (n/d^{h-1})^2\right)}

When there is only one layer in the hierarchical structure, i.e. h = 1, the sum over the upper layers vanishes and

T_H(h=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C.

Meanwhile, the upper limit on the number of managed VS nodes is N_H = \min\{ d^{h-1}\sqrt{T_0/K},\; T_0/(Kd) \}; with h = 1 the upper-layer constraint disappears, so N_H(h=1) = \sqrt{T_0/K} = N_C. Similarly, from (11), when h = 1 the fault-tolerance of the system with the hierarchical structure is

FT_H(h=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C.

Since h = 1, we get m = (d^h - 1)/(d - 1) = 1, and from Section 3.4.1 we get the scalability relation \psi_H(h=1) = F_H(h=1)/F_C = 1. Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the hierarchical structure with number of layers h = 1.
2) From the peer-to-peer structure to the centralized structure

From (4), the average response time of the management system with the peer-to-peer structure is

T_P = K\left(\frac{n}{m}+m-1\right) + \frac{K^2 n \left(\frac{n}{m}+m-1\right)^2 (1+C^2)}{2\left(T_0 - Kn\left(\frac{n}{m}+m-1\right)\right)}

When m = 1, we get

T_P(m=1) = Kn + \frac{K^2 n^3 (1+C^2)}{2(T_0 - Kn^2)} = T_C.

Meanwhile, the upper limit on the number of managed VS nodes is N_P = \left(\sqrt{m^2(m-1)^2 + 4mT_0/K} - m(m-1)\right)/2; substituting m = 1 gives N_P(m=1) = \sqrt{T_0/K} = N_C. Similarly, from (14), when m = 1 the fault-tolerance of the system with the peer-to-peer structure is

FT_P(m=1) = \exp\{-(AKn^2/T_0 + B)\} = FT_C.

For the scalability, from Section 3.4.1 we get \psi_P(m=1) = F_P(m=1)/F_C = 1. Hence, in terms of performance, fault-tolerance and scalability, the centralized structure is a special case of the peer-to-peer structure with m = 1. □
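The scalability expressions (21)-(27), together with the base cases ψ_H(m=1) = ψ_P(m=1) = 1 established above, can be checked with a short sketch (helper names and the default parameter values are ours):

```python
def t_mg1(S, rho, C=0.5):
    # Eq. (1); evaluated formally, so no stability guard here.
    return S + rho * S * (1 + C**2) / (2 * (1 - rho))

def psi_hierarchical(h, d, r, K=0.1, T0=3600.0, C=0.5):
    """Scalability of the hierarchical structure, Eq. (24), at full load
    n = d^(h-1)*r with m = (d^h - 1)/(d - 1) VIMSs."""
    n, m = d**(h - 1) * r, (d**h - 1) // (d - 1)
    TH = sum(t_mg1(K * d, K * d * n / (d**(i - 1) * T0), C) for i in range(1, h))
    TH += t_mg1(K * r, K * r**2 / T0, C)   # lowest-layer VIMS manages r VSs
    TC = t_mg1(K * n, K * n**2 / T0, C)    # Eq. (23)
    return 2 * n / (r * m * (1 + TH / TC))

def psi_p2p(m, r, K=0.1, T0=3600.0, C=0.5):
    """Scalability of the peer-to-peer structure, Eq. (27), at n = m*r."""
    n = m * r
    S = K * (r + m - 1)
    TP = t_mg1(S, n * S / T0, C)           # Eq. (25)
    TC = t_mg1(K * n, K * n**2 / T0, C)    # Eq. (26)
    return 2 / (1 + TP / TC)
```

At the base scale both functions return exactly 1, as Proposition 1 requires; at larger scales they reproduce the curves of Fig. 6.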
B. Structure selection for the best performance

Intuitively, when the number of managed VS nodes is small, the centralized structure has the lowest average response time because it is flat and direct. But as the number of VS nodes increases, the processing time of the single VIMS grows sharply, so we need to consider the other structures for better performance. We first give the following proposition.

Proposition 2: Given h > 1 and 1 < d < N_C, there is a threshold M_CH < N_C on the number of managed VS nodes such that T_C > T_H whenever n > M_CH. Given m > 1, there is likewise a threshold M_CP < N_C such that T_C > T_P whenever n > M_CP.

Proof: Under the conditions h > 1 and d < N_C, the limit on managed VS nodes for the hierarchical structure satisfies N_H > N_C, and for m > 1 we have N_P > N_C. From (3) and (4) it is obvious that T_C(n=1) < T_H(n=1) and T_C(n=1) < T_P(n=1) when managing a single VS node. Differentiating (2), (3) and (4) with respect to n gives dT_C/dn > 0, dT_H/dn > 0 and dT_P/dn > 0, so the average response time of each of the three structures is a monotonically increasing function of n. From (2), T_C(n) → ∞ as n → N_C, while, since N_H > N_C and N_P > N_C, both T_H(N_C) < ∞ and T_P(N_C) < ∞. Hence, by monotonicity and these end values, there must be a threshold M_CH < N_C for the hierarchical structure and a threshold M_CP < N_C for the peer-to-peer structure that make the proposition hold. □

The values of M_CH and M_CP can be obtained by solving the equations T_C(n) = T_H(n) and T_C(n) = T_P(n). So when the number of VS nodes satisfies n < min{M_CH, M_CP}, we select the centralized structure; otherwise one of the other structures is selected according to the computed values of T_H(n) and T_P(n).

C. Scalability comparison between the hierarchical and peer-to-peer structures

Next, we compare the scalability of the virtual infrastructure management systems with the hierarchical structure and the peer-to-peer structure.
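The thresholds M_CH and M_CP of Proposition 2 can be located numerically, e.g. by bisection; the sketch below (names and parameter values are illustrative) solves T_C(n) = T(n) for a competing structure using the monotonicity established in the proof.

```python
import math

def crossover_threshold(t_other, K=0.1, T0=3600.0, C=0.5):
    """Find the threshold of Proposition 2 where T_C(n) first exceeds the
    response time t_other(n) of a competing structure, by bisection on
    [1, N_C). t_other is any response-time function of n."""
    def t_c(n):   # Eq. (2)
        return K * n + K**2 * n**3 * (1 + C**2) / (2 * (T0 - K * n**2))
    lo, hi = 1.0, math.sqrt(T0 / K) * (1 - 1e-9)   # T_C blows up at N_C
    for _ in range(100):
        mid = (lo + hi) / 2
        if t_c(mid) < t_other(mid):
            lo = mid          # still below the crossover
        else:
            hi = mid
    return (lo + hi) / 2

# Illustrative peer-to-peer competitor with m = 13 VIMSs:
def t_p(n, m=13, K=0.1, T0=3600.0, C=0.5):
    S = K * (n / m + m - 1)
    rho = n * S / T0
    return S + rho * S * (1 + C**2) / (2 * (1 - rho))

M_CP = crossover_threshold(t_p)
```

With these parameters the crossing happens to fall exactly at n = m = 13, because at n = m the service demand K(n/m + m - 1) equals Kn and the arrival rates of the two structures coincide.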
Proposition 3: Given r > 2d - 1, the scalability \psi_P(m) of the system with the peer-to-peer structure is higher than the scalability \psi_H(m) of the system with the hierarchical structure.

Proof: To compare the scalability of the two structures, we let the numbers of VIMSs of the two systems be equal, i.e. m = (d^h - 1)/(d - 1). From (24) and (27), we have

\lim_{m \to \infty} \psi_H(m) = \frac{2(dr - r)}{dr - d} \quad \text{and} \quad \lim_{m \to \infty} \psi_P(m) = \frac{2r}{r + 1}.

Given r > 2d - 1, we have \lim \psi_P(m) > \lim \psi_H(m); that is, when m is large enough, the scalability of the peer-to-peer system is higher than that of the hierarchical one. Next, we construct a continuous function f(x) on [1, \infty) by linear interpolation:

f(x) = \psi_P(x) - \psi_H(x) for x \in \mathbb{N}, and f(x) = f(\lfloor x \rfloor) + (x - \lfloor x \rfloor)\left(f(\lceil x \rceil) - f(\lfloor x \rfloor)\right) for x \notin \mathbb{N}.

The equation f(x) = 0 has only one solution, x = 1; hence \psi_P(m) \neq \psi_H(m) for all m > 1. Assume there were some m_1 > 1 with \psi_P(m_1) < \psi_H(m_1). Since \psi_P(m_2) > \psi_H(m_2) for m_2 large enough, we would have f(m_1) < 0 and f(m_2) > 0, i.e. f(m_1) f(m_2) < 0. By the Zero Point Theorem there must then be some \xi > 1 with f(\xi) = 0, contradicting the fact that x = 1 is the only solution of f(x) = 0. The assumption is therefore false, so \psi_P(m) \geq \psi_H(m) for all m > 1; combined with \psi_P(m) \neq \psi_H(m), we conclude that \psi_P(m) > \psi_H(m) for all m > 1. □

V. NUMERICAL RESULTS AND REMARKS

In this section we give numerical results by analyzing and solving an example, which helps to reveal some regularities and to verify the conclusions proposed in Section 4. We set K = 0.1, which means a VIMS can process the migration requests from 10 VSs per second. Since the work done for every migration request is similar, we assign the small constant C = 0.5.
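The limiting values used in the proof of Proposition 3 can be checked numerically; the sketch below (names are ours) also confirms that r = 2d - 1 is exactly the point where the two limits coincide.

```python
def limit_psi_hierarchical(d, r):
    # lim_{m -> inf} psi_H(m) = 2(dr - r)/(dr - d), from Proposition 3
    return 2 * (d * r - r) / (d * r - d)

def limit_psi_p2p(r):
    # lim_{m -> inf} psi_P(m) = 2r/(r + 1)
    return 2 * r / (r + 1)

# At r = 2d - 1 the two limits coincide; for larger r the
# peer-to-peer limit is strictly higher.
d = 3
print(limit_psi_p2p(2 * d - 1), limit_psi_hierarchical(d, 2 * d - 1))
print(limit_psi_p2p(7), limit_psi_hierarchical(d, 7))
```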
The mean normal operation period is T_0 = 3600, which means each VS node sends a migration request on average once per hour.

A. Performance analysis results

We study the effect of the structure parameters of the hierarchical and peer-to-peer management systems on the average response time. Fig. 4(a) shows that when the number of managed VS nodes is small, a system with fewer layers has a lower average response time. But as the number of VS nodes increases, the fewer layers the system has, the faster its average response time grows: a system with more layers performs better when the workload is heavy. The effect of the out-degree d is similar to that of h, but less drastic (see Fig. 4(b)). For the peer-to-peer structure, the effect of the number of VIMS nodes m is similar (see Fig. 4(c)).

In Fig. 7(a) we compare the performance of the three types of systems. We set d = 3 and h = 3 for the hierarchical system and m = 13 for the peer-to-peer one, so the two systems have the same number of VIMS nodes. Fig. 7(a) shows that when the number of VS nodes is small enough, the centralized system has the lowest average response time; as the number increases, the other two systems achieve lower average response times, just as Proposition 2 states. Computing the limits on the number of managed VS nodes for the three structures yields N_C = 190, N_H = 1708 and N_P = 611, so the hierarchical system performs better when the number of VS nodes is large enough.
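The quoted capacity limits follow directly from the stability bounds of Section 3 (a sketch; values rounded to the nearest integer as in the text):

```python
import math

# Reproducing the capacity limits quoted above (K = 0.1, T0 = 3600;
# hierarchical: d = 3, h = 3; peer-to-peer: m = 13):
K, T0 = 0.1, 3600.0
d, h, m = 3, 3, 13

N_C = math.sqrt(T0 / K)
N_H = min(d**(h - 1) * math.sqrt(T0 / K), T0 / (K * d))
N_P = (math.sqrt(m**2 * (m - 1)**2 + 4 * m * T0 / K) - m * (m - 1)) / 2

print(round(N_C), round(N_H), round(N_P))   # 190 1708 611
```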
B. Fault-tolerance analysis results

We study the effect of the structure parameters on fault-tolerance. Fig. 5(a) shows that the fault-tolerance of the hierarchical system decreases as the number of VS nodes increases, and the fewer layers the system has, the faster it decreases. The influence of the out-degree d is similar but smaller than that of h (see Fig. 5(b)). The fault-tolerance of peer-to-peer systems also decreases as the number of VSs increases, and a system with more VIMSs has better fault-tolerance (see Fig. 5(c)).

Fig. 7(b) compares the fault-tolerance of the three types of management systems, where the hierarchical and peer-to-peer systems both have 13 VIMS nodes. The centralized system has very poor fault-tolerance, while the hierarchical system has the best: when the number of VS nodes is large enough, the hierarchical system holds a clear advantage in fault-tolerance.

C. Scalability analysis results

We study the effect of the structure parameters of the hierarchical system on scalability. Setting d = 3 and comparing systems with different values of the parameter r, Fig. 6(a) shows that the more VSs each VIMS is able to manage, the more scalable the system is. Setting r = 7 and comparing systems with different values of the parameter d, Fig. 6(b) shows that the more child VIMS nodes each VIMS has, the more scalable the system is. For the peer-to-peer system, Fig. 6(c) shows likewise that the more VSs each VIMS node is able to manage, the more scalable the system is.

In Fig. 7(c) we compare the scalability of the hierarchical and peer-to-peer systems. With r = 7, the scalability of the peer-to-peer system is higher than that of the hierarchical systems for every value of d, which means the peer-to-peer system is the more scalable of the two.
This result agrees with the conclusion proved in Proposition 3.

VI. CONCLUSIONS

This work is the first attempt to comprehensively analyze and evaluate the performance, fault-tolerance and scalability of virtual infrastructure management systems. According to the characteristics of the virtual infrastructure, we give mathematical definitions of the evaluation metrics for performance, fault-tolerance and scalability. Three typical structures of the virtual infrastructure management system are studied: centralized, hierarchical and peer-to-peer. We give detailed calculations to quantitatively analyze their performance, fault-tolerance and scalability, and based on this quantitative analysis, some useful rules and conclusions are drawn and proved which provide direction and reference value for constructing virtual infrastructure management systems with higher performance, fault-tolerance and scalability.

This paper provides a general analysis and evaluation method that allows system designers to evaluate alternative virtual infrastructure management systems by assigning parameter values as they deem appropriate. It enhances their ability to make informed choices when building a virtual infrastructure management system, before undergoing the expensive process of constructing and evaluating multiple prototypes.

ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 60673187) and the National High Technology Research and Development Program of China (No. 2007AA01Z419).

REFERENCES

[1] I. Foster and C. Kesselman (eds.), "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann, San Francisco, CA, 2004.
[2] K. Krauter, R. Buyya, and M. Maheswaran, "A Taxonomy and Survey of Grid Resource Management Systems for Distributed Computing", International Journal of Software: Practice and Experience (SPE), Vol. 32, No. 2, pp. 135-164, 2002.
[3] I. Foster, Y. Zhao, I.
Raicu, and S. Lu, "Cloud Computing and Grid Computing 360-Degree Compared", Grid Computing Environments (GCE) Workshop, pp. 1-10, Nov. 2008.
[4] D. A. Menasce, "Virtualization: Concepts, Applications, and Performance Modeling", Computer Measurement Group (CMG), http://www.cmg.org/proceedings/2005/5189.pdf, 2005.
[5] R. Figueiredo, P. A. Dinda, and J. Fortes, "Resource Virtualization Renaissance", IEEE Computer, Vol. 38, No. 5, pp. 28-31, 2005.
[6] P. T. Barham, B. Dragovic, K. Fraser, S. Hand, T. L. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the Art of Virtualization", Proc. 19th ACM Symposium on Operating Systems Principles (SOSP), pp. 164-177, October 2003.
[7] R. Figueiredo, P. Dinda, and J. Fortes, "A Case for Grid Computing on Virtual Machines", Proc. 23rd Int'l Conf. Distributed Computing Systems (ICDCS), IEEE CS Press, pp. 550-559, 2003.
[8] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, "VMPlants: Providing and Managing Virtual Machine Execution Environments for Grid Computing", Proc. IEEE/ACM Supercomputing Conference, IEEE CS Press, p. 7, 2004.
[9] J. Alonso, L. Silva, A. Andrzejak, P. Silva, and J. Torres, "High-available grid services through the use of virtualized clustering", Proc. 8th IEEE/ACM International Conference on Grid Computing, pp. 34-41, Sept. 2007.
[10] Amazon, "Amazon Elastic Compute Cloud (Amazon EC2)". http://aws.amazon.com/ec2/, 2009.
[11] Microsoft, "Introducing the Azure Services Platform". http://download.microsoft.com/download/e/4/3/e43bb484-3b52-4fa8-a9f9-ec60a32954bc/Azure_Services_Platform.docx, 2009.
[12] M. Fenn, M. Murphy, J. Martin, and S. Goasguen, "An Evaluation of KVM for Use in Cloud Computing", Proc. 2nd International Conference on the Virtual Computing Initiative, RTP, NC, USA, May 2008.
[13] VMware, "VMware vCloud". http://www.vmware.com/technology/cloud-computing.html, 2009.
[14] C. Clark, K.
Fraser, S. Hand, J. G. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live Migration of Virtual Machines", Proc. 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), Boston, MA, pp. 273-286, May 2005.
[15] M. Nelson, B. Lim, and G. Hutchins, "Fast Transparent Migration for Virtual Machines", Proc. USENIX Annual Technical Conference, pp. 391-394, 2005.
[Figure 4 shows log-log plots of the average response time T of every migration request versus the number n of managed VS nodes: (a) hierarchical structures with d = 3 and h = 1 to 5; (b) hierarchical structures with h = 3 and d = 2 to 5; (c) peer-to-peer structures with m = 1, 3, 5, 7, 9, 11, 13.]
Figure 4. Average response time of systems with different structures.
[Figure 5 plots the fault-tolerance indicator n_F/n versus the number n of managed VS nodes: (a) hierarchical structures with d = 3 and h = 1 to 4; (b) hierarchical structures with h = 3 and d = 2 to 5; (c) peer-to-peer structures with m = 1, 3, 5, 7, 9, 11, 13.]
Figure 5. Fault-tolerance of systems with different structures.
[Figure 6 plots the scalability ψ(m) versus the number of VIMS nodes m on a log scale: (a) hierarchical structures with d = 3 and r = 3 to 15; (b) hierarchical structures with r = 7 and d = 3 to 15; (c) peer-to-peer structures with r = 3 to 15.]
Figure 6.
Scalability of the systems with different structures.
[Figure 7 compares the three structures: (a) average response time versus n for the centralized structure and for hierarchical and peer-to-peer structures with 13 VIMS nodes each; (b) fault-tolerance indicator n_F/n versus n for the same three systems; (c) scalability ψ(m) of the peer-to-peer structure and of hierarchical structures with d = 3, 5, 7, 9.]
Figure 7. Performance, fault-tolerance and scalability comparison of systems with three typical structures.
[16] G. Vallée, T. Naughton, H. Ong, and S. L. Scott, "Checkpoint/Restart of Virtual Machines Based on Xen", High Availability and Performance Computing Workshop (HAPCW'06), Santa Fe, New Mexico, USA, p. 30, 2006.
[17] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum, and D. Boneh, "Terra: A Virtual Machine-Based Platform for Trusted Computing", Proc. 19th ACM Symp. Operating Systems Principles (SOSP), ACM Press, pp. 193-206, 2003.
[18] VMware, "VMware Infrastructure 3 Primer". http://www.vmware.com/products/vi/, 2009.
[19] L. Kleinrock, "Queueing Systems, Volume I: Theory", John Wiley & Sons, New York, p. 187, 1975.
[20] R. K. Iyer, S. Butner, and E. J. McCluskey, "A Statistical Failure/Load Relationship: Results of a Multicomputer Study", IEEE Transactions on Computers, Vol. C-31, No. 7, pp. 697-706, July 1982.
[21] A. Birolini, "Reliability Engineering: Theory and Practice", Fifth edition.
Springer, pp. 2-12, 2007.
[22] Y. Qu, C. Lin, Y. Li, and Z. Shan, "Survivability Analysis of Grid Resource Management System Topology", Proc. 4th International Conference on Grid and Cooperative Computing, Lecture Notes in Computer Science, Vol. 3795, pp. 738-743, 2005.
[23] P. Jogalekar and M. Woodside, "Evaluating the Scalability of Distributed Systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 6, pp. 589-603, June 2000.
[24] P. Jogalekar and M. Woodside, "Evaluating the Scalability of Distributed Systems", Proc. 31st Annual Hawaii International Conference on System Sciences, Volume 7, pp. 524-531, 1998.
[25] P. Vilà, J. L. Marzo, A. Bueno, E. Calle, and L. Fàbrega, "Distributed Network Resource Management using a Multi-Agent System: Scalability Evaluation", Proc. International Symposium on Performance Evaluation of Computer and Telecommunication Systems, pp. 355-362, July 2004.