Load	balancing	and	handover	joint
optimization	in	LTE	networks	using	Fuzzy
Logic	and	Reinforcement	Learning
ARTICLE		in		COMPUTER	NETWORKS	·	NOVEMBER	2014
Impact	Factor:	1.26	·	DOI:	10.1016/j.comnet.2014.10.027
3	AUTHORS:
P.	Muñoz
University	of	Malaga
Raquel	Barco
University	of	Malaga
Isabel	de	la	Bandera	Cascales
University	of	Malaga
Load balancing and handover joint optimization in LTE
networks using Fuzzy Logic and Reinforcement Learning
P. Muñoz ⇑, R. Barco, I. de la Bandera
University of Málaga, Communications Engineering Dept., Campus de Teatinos, 29071 Málaga, Spain
Article info
Article history:
Received 10 June 2014
Received in revised form 29 October 2014
Accepted 31 October 2014
Available online 13 November 2014
Keywords:
Load balancing
Handover
Self-organizing networks
Long-term evolution
Fuzzy logic
Reinforcement learning
Abstract
With the growing deployment of cellular networks, operators have to devote significant
manual effort to network management. As a result, Self-Organizing Networks (SONs) have
become increasingly important in order to raise the level of automated operation in cellular
technologies. In this context, Load Balancing (LB) and Handover Optimization (HOO) have
been identified by industry as key self-organizing mechanisms for the Radio Access
Networks (RANs). However, most efforts have been focused on developing a stand-alone entity
for each self-organizing mechanism, which will run in parallel with other entities, as well
as designing coordination mechanisms in charge of stabilizing the network as a whole. Due
to the importance of LB and HOO, in this paper, a unified self-management mechanism
based on Fuzzy Logic and Reinforcement Learning is proposed. In particular, the proposed
algorithm modifies handover parameters to optimize the main Key Performance Indicators
related to LB and HOO. Results show that the proposed scheme effectively provides better
performance than independent entities running simultaneously in the network.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
In recent years, cellular networks have experienced a
large increase in size and complexity. As a result, mobile
operators have focused attention on reducing capital
expenditures (CAPEX) and operational expenditures
(OPEX) of their networks [1]. This fact has stimulated
strong research activity in the field of Self-Organizing
Networks (SON), which is a set of principles and concepts
defined by the 3rd Generation Partnership Project (3GPP)
for automating network management while improving
network quality [2]. In the context of SON, certain
functions have been identified as key enablers by the
3GPP, among which are Load Balancing (LB) and Handover
Optimization (HOO). The former is an automated function
where cells suffering occasional congestion can transfer
load to neighbor cells, which have spare resources, by e.g.
adjusting mobility parameters. The latter is a solution for
automatic detection and correction of errors and subopti-
mal settings in the mobility configuration, which may lead
to a degradation of user performance. Much effort in the
research community has been devoted to the so-called
Mobility LB (MLB) and Mobility Robustness Optimization
(MRO), for which the 3GPP has specified particular
features [3]. Typically, these functionalities are
implemented at a low level in the network architecture,
meaning that they operate quickly (i.e. at time scales of
the order of seconds or less) and are located in each base
station of the access network. In contrast, little or no
attention has been paid to LB and HOO at higher levels,
e.g. at the level of the Operations, Administration, and
Maintenance (OAM) system, which typically operates more
slowly (i.e. at time scales of the order of minutes or even
hours) and is not necessarily located in the base stations
(e.g. it can be located in a
http://dx.doi.org/10.1016/j.comnet.2014.10.027
1389-1286/© 2014 Elsevier B.V. All rights reserved.
⇑ Corresponding author. Tel.: +34 952 134 164; fax: +34 952 132 027.
E-mail addresses: pabloml@ic.uma.es (P. Muñoz), rbm@ic.uma.es
(R. Barco), ibanderac@ic.uma.es (I. de la Bandera).
Computer Networks 76 (2015) 112–125
server on the core network). Thus, network management at
this level copes with slower changes in the network, whose
impact on performance can be even more important, since
the underlying variations to be tracked are typically rather
slow as well [4]. In addition, the data available in the OAM
system is much more abundant than in the base stations,
thereby allowing more efficient and powerful network
management. As a result, implementing this kind of
algorithm can provide significant benefits and cost savings
to operators.
As the deployment of stand-alone SON functions is
growing, the number of conflicts and dependencies
between them increases. A conflict can happen if, for
example, two individual SON functions optimize the same
parameter with different goals at a network element [5]. As
expected, conflicts may have a negative impact on network
performance. The common solution in SON research has
been to create an additional entity, usually called a
coordinator, which manages the conflict. Typically, an entity
causing conflict is switched off or limited in its control
strategy, e.g. by decreasing the allowed range, the
maximum allowed step sizes or the periodicity at which
parameter control takes place. SON coordination is a topic
that has recently been addressed in the literature. On
the one hand, there are several studies with the aim of
developing a functional framework for SON coordination
[6–10]. On the other hand, further efforts have been
devoted to specific solutions for coordination of two or
more SON functionalities [11]. Special attention has been
devoted to the coordination of MLB and MRO, addressed
by the SOCRATES project [12]. In particular, the study
assumes the control parameters of the MLB and MRO
algorithms to be independent of each other, i.e. the two
algorithms do not tune the same parameters. While the
MLB function adjusts the HO margin (HOM), the MRO
function adjusts the Time-To-Trigger (TTT) and hysteresis
parameters. The interactions exist because these two func-
tions influence the same Key Performance Indicators (KPIs)
that are used as input for the optimization algorithms. In
[13], a constraint for the connection quality more restric-
tive than the one assumed in the SOCRATES project is con-
sidered. In this sense, the MLB function is restrained in
favor of the HO performance optimization. In [14], to avoid
the conflict between MLB and MRO, the HOM range of MLB
is dynamically adjusted according to the TTT and the hys-
teresis parameters, which are first adjusted considering the
effect of the user speed.
Although the coordinator-based schemes have been
well accepted by the research community, there are some
related issues. Specifically, the definition of operator
policies becomes a complex task, since there exists a
trade-off between proper controllability and ease of use
[10]. In addition, when some limitations are applied to
the control strategy (e.g. by restricting the step size), the
optimal configuration may lie outside the space of possible
solutions. Another problem is related to the prioritization
of SON functions in a centralized coordination scheme,
which is the typical implementation due to the required
integration with a (centralized) legacy OAM system [15].
Under this situation, the coordination entity has to process
many parameter configuration requests, so that the risk of
monopolization by high priority functions is high. Due to
this, the joint optimization of SON functions has also been
addressed. In [16], the problem of coordinating capacity
and coverage optimization and MLB is addressed. Instead
of implementing an additional entity that coordinates the
outcomes of each independent function, these functions
are combined into one algorithm and then the cellular net-
work is optimized towards a joint target. Similarly, in [17],
instead of controlling the conflict between independent
MRO and MLB functions, a joint optimization algorithm is
proposed. Such an algorithm adjusts the same HO param-
eters for individual users (i.e. each user has individual val-
ues of the same HO parameters). This solution reduces
unnecessary HOs for some users that should not be handed
over to the neighbor cell. However, at the level of the OAM
system, this feature is hard to implement since statistics
are rarely available at the per-user level, in addition to
the high signaling cost that this kind of optimization
would involve. In [18], the proposed MRO and MLB
algorithm prioritizes the MRO part, since KPIs related to
the connection quality (e.g. the radio link failure) are
considered first. However, other important KPIs from the MRO
viewpoint, such as those associated with unnecessary HOs,
are not taken into account in the study, which makes it more
difficult to achieve optimal performance.
For all those reasons, in this paper, a novel unified algo-
rithm for both LB and HOO in Long-Term Evolution (LTE)
networks is proposed. This algorithm is based on a Fuzzy
System (FS) that tunes the handover (HO) parameters at
the cell adjacency level to improve network performance.
The FS is optimized by the Q-Learning algorithm, which
drives it to select the most appropriate action either due
to LB and/or HOO reasons. The decision of which action
the FS should take depends on past actions which were
taken by the FS and whose impact on network perfor-
mance was measured through the KPIs. With the proposed
solution, the complexity of the SON coordination entity
would be reduced, as it is freed from the coordination of
two important SON functions. In addition, the proposed
algorithm is expected to achieve better performance, as
its space of candidate solutions is not as restricted as it
would be if a coordinator-based scheme or some type of
prioritization algorithm were used.
The rest of the paper is organized as follows. Section 2
formulates the problem and introduces the mobility
algorithm in LTE networks and the system performance
metrics. In Section 3, the design of the proposed FS as well
as its optimization process is described. Section 4 presents
the simulation setup and discusses the simulation results.
Finally, Section 5 presents the main conclusions of the
study.
2. System model
The HO is the procedure that preserves the connection
when the user moves around the network. As LTE is being
deployed with a frequency reuse of one (i.e. the same fre-
quency is shared by all cells), the intra-frequency HO is
very common in these networks. More specifically, the
most widely extended algorithm for the HO-triggering
decision is the 3GPP A3 event [19]. Roughly, this algorithm
triggers the execution of an HO if the neighbor cell
becomes offset better than the serving cell during a specific
time period determined by the TTT parameter. Formally, it
is expressed as:
$$RSRP_j > RSRP_i + HOM_{i \to j}, \qquad (1)$$
where $RSRP_i$ and $RSRP_j$ are the averaged values of the
Reference Signal Received Power (RSRP) measured for serving
cell $i$ and target cell $j$, respectively, and $HOM_{i \to j}$
is the HOM from cell $i$ to cell $j$. Note that the symmetric
$HOM_{j \to i}$ is also defined in the opposite direction of the
adjacency (i.e. a pair of cells that are neighbors).
In contrast with HO-triggering decisions based on
absolute comparisons (e.g. the serving/neighbor cell
below/above a threshold), the A3 event consists of a rela-
tive comparison that simplifies the configuration of its
parameters since they are independent of the absolute
received power levels, which may depend on diverse con-
text factors. However, the HOM in Eq. (1) is broken down
into several terms by the 3GPP, so that:
$$HOM_{i \to j} = Hys + Of_i - Of_j + Oc_i - Oc_j + Off, \qquad (2)$$
where $Hys$ and $Off$ are the hysteresis and offset parameters,
respectively, for this event, $Of_i$ and $Oc_i$ are the frequency
and cell specific offsets, respectively, for serving cell $i$,
and $Of_j$ and $Oc_j$ are the frequency and cell specific offsets,
respectively, for neighbor cell $j$. While only one value of
$Hys$ and $Off$ corresponding to the A3 event can be used
for all the cells and deployed frequencies in the network,
$Oc_i$ and $Oc_j$ can be defined per cell and $Of_i$ and $Of_j$ can be
defined per frequency layer. In addition to this, the
definition of $Hys$ implies the existence of another inequality in
which this term has opposite sign:
$$RSRP_j < RSRP_i - Hys + Of_i - Of_j + Oc_i - Oc_j + Off. \qquad (3)$$
This inequality is called the leaving condition for this
event. Assuming that the entering condition given by Eq.
(1) was previously satisfied, the leaving condition must
be satisfied to reset the TTT parameter. By optimizing
Hys, the impact of signal fluctuations on the handover pro-
cess can be effectively reduced.
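The interplay between the entering condition, the leaving condition and the TTT can be illustrated with a minimal Python sketch. It is not taken from the paper: the names `A3Params`, `entering`, `leaving` and `a3_trigger_index` are hypothetical, and expressing the TTT as a number of consecutive measurement samples is an assumption made only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class A3Params:
    """A3 event parameters in dB (names are illustrative)."""
    hys: float = 0.0   # Hys
    off: float = 0.0   # Off
    of_i: float = 0.0  # frequency-specific offset, serving cell i
    of_j: float = 0.0  # frequency-specific offset, neighbor cell j
    oc_i: float = 0.0  # cell-specific offset, serving cell i
    oc_j: float = 0.0  # cell-specific offset, neighbor cell j


def entering(rsrp_i, rsrp_j, p):
    """Eq. (1), with the HOM expanded as in Eq. (2)."""
    hom = p.hys + p.of_i - p.of_j + p.oc_i - p.oc_j + p.off
    return rsrp_j > rsrp_i + hom


def leaving(rsrp_i, rsrp_j, p):
    """Eq. (3): same offsets, Hys with opposite sign."""
    return rsrp_j < rsrp_i - p.hys + p.of_i - p.of_j + p.oc_i - p.oc_j + p.off


def a3_trigger_index(samples, p, ttt_samples):
    """Return the sample index at which the HO is triggered, or None.

    samples: iterable of (rsrp_i, rsrp_j) averaged measurements.
    ttt_samples: TTT expressed in samples (an assumption of this sketch).
    """
    timer_running = False
    elapsed = 0
    for n, (rsrp_i, rsrp_j) in enumerate(samples):
        if not timer_running:
            if entering(rsrp_i, rsrp_j, p):
                timer_running = True  # entering condition starts the timer
                elapsed = 0
        elif leaving(rsrp_i, rsrp_j, p):
            timer_running = False     # leaving condition resets the timer
        else:
            elapsed += 1
        if timer_running and elapsed >= ttt_samples:
            return n
    return None
```

With `hys=2.0` and a TTT of two samples, a neighbor cell that stays 4 dB above the serving cell triggers the HO two samples after the entering condition is first met; a brief dip that satisfies the leaving condition restarts the count.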
In general, the HOO function is directly related to the
parameter $Hys$, so that its optimization could only be
performed at the event level. Conversely, the LB function is
more related to HO parameters that are defined at the cell
level (e.g. $Oc_i$ and $Oc_j$). In practice, different parameters are
optimum for different cell pairs (e.g. due to the shadowing
variations). Hence, the model adopted in this paper is
based on a joint optimization of LB and HOO at the cell
level. Specifically, this model requires only one formula
(e.g. the entering condition given by Eq. (1)) and one
parameter, denoted as HOM and defined per adjacency.
The only condition is that both HOM values in the
adjacency (i.e. $HOM_{i \to j}$ and $HOM_{j \to i}$) are simultaneously tuned
to perform the joint optimization of LB and HOO. Note that
this model complies with the 3GPP specifications if the
parameter $Hys$ is set to zero and the rest of the parameters
are grouped into a single parameter defined per adjacency.
To facilitate the joint optimization, the parameters need
to be expressed in a more understandable way, according
to the following relationship in an adjacency x:
$$\begin{cases} H^{(x)} + O^{(x)} = HOM^{(x)}_{i \to j}, \\ H^{(x)} - O^{(x)} = HOM^{(x)}_{j \to i}, \end{cases} \qquad (4)$$
where $H^{(x)}$ is the parameter related to HOO that represents
a hysteresis and $O^{(x)}$
is the parameter related to LB that represents a certain
offset. Intuitively, the HOM determines the area where the
users connected to a cell would perform an HO toward a
neighbor cell. On the one hand, in the context of HOO and
assuming that $O^{(x)} = 0$, the parameter $HOM^{(x)}_{i \to j}$
and the symmetric $HOM^{(x)}_{j \to i}$ can be set to the same
positive value (given by $H^{(x)}$) so that a certain symmetric
region between the two cells is ensured in order to avoid
unnecessary HOs. In Fig. 1(a), the RSRP of the neighbor
and the serving cells are represented. For instance, if a user
connected to cell $i$ moves to cell $j$, it connects to cell $j$ when
the RSRP from cell $j$ is equal to the RSRP from cell $i$ plus
$HOM^{(x)}_{i \to j}$. As in this case the HOMs are assumed to be
symmetric, the same value is applied to the opposite situation
(i.e. when the user moves from cell $j$ to $i$, $HOM^{(x)}_{j \to i}$ is used).
Decreasing such a symmetric region (i.e. both HOMs)
makes it easier for UEs to perform an HO to a neighbor cell,
because the UE needs a lower received power from
the target cell to trigger an HO. Conversely, increasing
the HOMs makes it more difficult to perform an HO, meaning
that the user spends more time attached to the serving cell
while the connection quality is getting worse, which could
lead to call dropping. However, under this configuration,
possible unnecessary HOs due to signal fluctuations can
be avoided. In this sense, the HOO function aims to reduce
the inefficient usage of network resources due to
unnecessary HOs provided that the level of call dropping is low
enough. An important KPI closely related to the problem
of unnecessary HOs is the HO Ratio (HOR), defined in this
work as the number of HOs divided by the total number
of carried calls.
Fig. 1. Adjustment of HOM for (a) MRO and (b) MLB purposes. (Each panel shows the RSRP of cell i and cell j, with HOMji and HOMij marked.)
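The mapping of Eq. (4) between the pair (H, O) and the two margins of an adjacency is simple enough to sketch directly. The function names `to_hom` and `from_hom` below are hypothetical and all values are assumed to be in dB; the inverse mapping is not stated in the text but follows from adding and subtracting the two lines of Eq. (4).

```python
def to_hom(h, o):
    """Eq. (4): (H, O) of an adjacency -> (HOM_ij, HOM_ji)."""
    return h + o, h - o


def from_hom(hom_ij, hom_ji):
    """Inverse of Eq. (4): recover the hysteresis H and the offset O."""
    return (hom_ij + hom_ji) / 2, (hom_ij - hom_ji) / 2


# Pure HOO setting: O = 0 gives two equal margins (symmetric region of H dB).
symmetric = to_hom(3.0, 0.0)
# Adding an offset keeps the sum of both margins but shifts the cell border.
shifted = to_hom(3.0, 2.0)
```

Here `symmetric` is `(3.0, 3.0)` and `shifted` is `(5.0, 1.0)`; in both cases `from_hom` returns the original hysteresis of 3 dB, which is why H and O can be tuned independently.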
On the other hand, maintaining the symmetric region,
both HOMs can also be jointly tuned by means of $O^{(x)}$ (e.g.
$HOM^{(x)}_{i \to j}$ is increased while $HOM^{(x)}_{j \to i}$ is decreased) so that
the service area of these cells is modified for LB purposes.
In this case, both HOMs are modified with the same
magnitude to preserve the hysteresis region. However,
those variations in HOM should have opposite sign to modify
the service area of the two cells. In Fig. 1(b), it is observed
that $HOM^{(x)}_{i \to j}$ has been increased, while $HOM^{(x)}_{j \to i}$ has been
decreased. As a result, the service area of cell $i$ is larger.
Considering, for example, that cell j is overloaded, its service
area is reduced while the service area of the adjacent cells
with spare resources (e.g. cell i) is increased to take users
from the congested cell edge. Thus, cells suffering occa-
sional congestion can transfer load to neighbor cells, which
have free resources, by adjusting mobility parameters. It is
noted that, since HOMs are defined on an adjacency basis,
cell service areas cannot be only re-sized but also re-shaped.
The most important benefit of LB is that call blocking in the
network is reduced, especially in highly loaded cells.
The function responsible for accepting or blocking a call is
the Call Admission Control (CAC). Such a function checks
the availability of free resources in the candidate cell before
taking a decision. In this paper, a ‘worst-case’ criterion has
been adopted to accept calls, i.e. the user is accepted
if the highest number of radio resources needed to maintain
a connection (worst-case requirement) is less than or equal to
the number of radio resources available in the candidate
cell. If the condition is not satisfied by any candidate cell,
then the call is blocked. To quantify call blocking,
network operators typically use the Call Blocking Ratio
(CBR), defined as the number of blocked calls divided by
the number of call attempts.
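The worst-case admission criterion described above can be sketched as follows. This is only an illustration: the function name `admit_call`, the dictionary of free resources and the idea of scanning candidate cells in a given order are assumptions, since the text does not specify how candidate cells are ranked.

```python
def admit_call(worst_case_resources, free_resources_by_cell):
    """Return the first candidate cell that can host the call, or None.

    worst_case_resources: highest number of radio resources the
        connection may need (the worst-case requirement).
    free_resources_by_cell: {cell_id: free radio resources}, listed in
        candidate order (the ordering is an assumption of this sketch).
    """
    for cell_id, free in free_resources_by_cell.items():
        if worst_case_resources <= free:
            return cell_id  # call accepted in this cell
    return None             # no candidate satisfies the criterion: blocked
```

For example, `admit_call(4, {"j": 2, "i": 6})` skips the congested cell `"j"` and admits the call in `"i"`, whereas `admit_call(4, {"j": 2, "i": 3})` returns `None`, which would count as one blocked call in the CBR.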
Finally, the actions performed by LB and HOO may also
involve a decrease in the connection quality. To explain
this, on the one hand, consider the HOM being
decreased for LB reasons. The target cell then becomes
more likely to be preferred to the serving cell, even if the
connection quality is worse due to negative HOM values.
If this is the case, some users, usually located in the cell
edge, will be handed over to the target cell experiencing
worse radio conditions as a result of the performed HO.
Thus, negative values of the HOM will increase the risk of
dropping. On the other hand, when the HOM is increased
for either LB or HOO reasons, the user will spend more time
attached to the serving cell, delaying an HO towards cells
with better radio conditions. In this case, the probability
of call dropping will also increase. For these reasons, a KPI
widely used by network operators is the Call Dropping
Ratio (CDR), defined as the number of dropped calls divided
by the number of finished calls. Typically, call dropping
may occur because the connection quality is bad, but also
because there are no available resources due to an overload
situation. In this paper, the calculation of the CDR includes
only those calls dropped due to bad radio conditions. In
particular, a call is dropped when the
Signal-to-Interference-plus-Noise Ratio (SINR) is below a certain threshold during a
specific time interval. Call dropping due to an overload
situation is assumed to be negligible due to the correct
operation of the CAC in the network, since enough
resources are guaranteed by the CAC for the accepted calls.
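The three KPIs defined in this section (HOR, CBR and CDR) are plain ratios of counters, as the following sketch shows. The function and argument names are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here for convenience; the text does not say how empty measurement periods are handled.

```python
def ratio(numerator, denominator):
    """Safe ratio; 0.0 for an empty period (assumption of this sketch)."""
    return numerator / denominator if denominator else 0.0


def kpis(handovers, carried_calls, blocked_calls, call_attempts,
         dropped_calls, finished_calls):
    """HOR, CBR and CDR as defined in the text."""
    return {
        "HOR": ratio(handovers, carried_calls),      # HOs per carried call
        "CBR": ratio(blocked_calls, call_attempts),  # blocked / attempts
        "CDR": ratio(dropped_calls, finished_calls), # dropped / finished
    }


example = kpis(handovers=150, carried_calls=100,
               blocked_calls=3, call_attempts=100,
               dropped_calls=2, finished_calls=97)
```

In this made-up example the HOR is 1.5 handovers per carried call and the CBR is 3%; only calls dropped because of bad radio conditions should be counted in `dropped_calls`, in line with the assumption stated above.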
3. Joint optimization algorithm
This section explains the proposed algorithm for LB and
HOO. The first part comprises the design of the FS, describ-
ing the inputs, the outputs and the behavior of the system.
After this, the second part of the section is devoted to the
optimization technique that is used to lead the FS in the
action selection.
3.1. The fuzzy system
As an alternative to Classical Logic, Fuzzy Logic is a
mathematical discipline that introduces a degree of
vagueness when an assertion is made [20]. The design of
an FS for control problems is one of the most important
application areas of Fuzzy Logic [21]. Its main benefit is
that controlling a system can be performed by using lin-
guistic terms such as high or low instead of providing a
numerical value when defining the reference values of
the controller. Experience has shown that Fuzzy Logic Con-
trollers (FLCs) provide results superior to those obtained by
conventional control algorithms. In particular, the method-
ology of the FLC becomes very useful when the processes
are too complex for analysis by conventional quantitative
techniques or when the available sources of information
are interpreted qualitatively, inexactly, or uncertainly [22].
The proposed FS is designed on the basis of FLCs, but
there are some differences. From the operational perspec-
tive, it combines the functionalities of LB and HOO, i.e. to
decrease the call blocking and the HO signaling load,
respectively, while at the same time the connection quality
is preserved. To achieve this, the LB part is inspired by the
LB algorithm proposed in [23] and the HOO part is inspired
by the HOO algorithm proposed in [24]. In both cases, the
developed algorithms implement a single SON
function which iteratively adjusts the HOM to optimize
the respective KPIs. However, in the case of LB, the HOM
variations in both directions of any adjacency have the
same magnitude but opposite sign. In this paper, this is
equivalent to modifying $O^{(x)}$, as shown in Eq. (4). In the case
of HOO, only the magnitude of the HOM variations is
changed (i.e. the sign remains unchanged), which is equivalent
to adjusting $H^{(x)}$ in Eq. (4).
The design of an FS that integrates both functionalities
poses new challenges. Typically, in the context of FLCs, if
the system is composed of two outputs, the design is
broken down into two new FLCs with one output each. In
this paper, the proposed joint optimization algorithm
affects two different components of the HOM parameter,
$O^{(x)}$ and $H^{(x)}$, whose combination results in the values of
$HOM^{(x)}_{i \to j}$ and $HOM^{(x)}_{j \to i}$, both set in the adjacency $x$. In
principle, this design may require two separate FLCs as two
outputs are involved. However, such a solution is directly
related to the coordinator-based schemes mentioned in
Section 1, in which the FLCs would be coordinated by an
upper-level entity. To avoid the problems linked to this
kind of solution, in this paper, the proposed FS integrates
both functionalities into one entity, whose structure is
depicted in Fig. 2. It is assumed that each adjacency in
the network has this entity implemented. As observed,
the inputs of the FS in the adjacency $x$ are $CBR^{(x)}_{ji}$,
$HOR^{(x)}$ and $HOM^{(x)}_{ij}$. The first input is a derived KPI,
calculated as:
$$CBR^{(x)}_{ji} = CBR^{(x)}_j - CBR^{(x)}_i, \qquad (5)$$
where $CBR^{(x)}_i$ and $CBR^{(x)}_j$ are the CBR measured in cell $i$ and
cell $j$, respectively. This input makes it possible to balance the
traffic between adjacent cells. In principle, $CBR^{(x)}_{ji}$ could take
negative values if cell $i$ has a higher CBR than cell $j$. However,
to simplify the behavior of the FS, the FS is always applied
in the direction of the adjacency in which cell $j$ has equal or
higher CBR than cell $i$. The second input, $HOR^{(x)}$, is used to
reduce the HO signaling load when possible. This KPI is
calculated considering those HOs and calls carried in both
cells of the adjacency. The third input, $HOM^{(x)}_{ij}$, is the
current value of the HOM, whose aim is to determine when
the HOM is reaching high values, in which case the
connection quality would be significantly impacted. More
precisely, the current value of HOM is the one taken in the
opposite direction of the CBR difference, i.e. from cell $i$ to
cell $j$. Finally, the output of the FS, called $Y^{(x)}$, is a variable
whose possible values refer to simultaneous variations in
$O^{(x)}$ and $H^{(x)}$. More specifically, one output value is given
by the concatenation of two fields, the former
corresponding to the variation in $O^{(x)}$ and the latter to the variation in
$H^{(x)}$, i.e.:
$$Y^{(x)} = (\Delta O^{(x)}, \Delta H^{(x)}). \qquad (6)$$
This solution overcomes the limitation of FLCs in
which the output is a single variable that can only take
discrete or continuous values in a certain range. Let us also
assume that the variation for the component $O^{(x)}$ in a
certain step of the algorithm can only be $-d$, $0$ or $+d$, while
the variation for $H^{(x)}$ can only be $-s$, $0$ or $+s$. For example,
if the output is $Y^{(x)} = (+d, 0)$, then the assignment can be
formally expressed as:
$$O^{(x)}(t+1) \leftarrow O^{(x)}(t) + d, \qquad (7)$$
$$H^{(x)}(t+1) \leftarrow H^{(x)}(t) + 0. \qquad (8)$$
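The update of Eqs. (6)-(8), followed by the conversion of Eq. (4) back to the two margins, can be sketched as below. The function name `apply_action`, the encoding of an action as a pair of signs, and the default step sizes of 1 dB are assumptions for illustration only; the text does not give the values of d and s at this point.

```python
def apply_action(o, h, action, d=1.0, s=1.0):
    """Apply one FS output Y = (dO, dH) to an adjacency.

    action: (sign_o, sign_h), each in {-1, 0, +1}, so that the actual
        variations are sign_o * d and sign_h * s (Eqs. (7) and (8)).
    d, s: step sizes in dB (placeholder values, not from the paper).
    Returns (O, H, HOM_ij, HOM_ji) after the update.
    """
    sign_o, sign_h = action
    o_next = o + sign_o * d
    h_next = h + sign_h * s
    # Eq. (4): convert the updated components into the two margins.
    return o_next, h_next, h_next + o_next, h_next - o_next


# Y = (+d, 0): an LB-only action starting from O = 0 dB, H = 3 dB.
state = apply_action(0.0, 3.0, (+1, 0))
```

Starting from a symmetric setting of 3 dB, the action `(+1, 0)` leaves H untouched and yields margins of 4 dB and 2 dB, i.e. the two HOMs move by the same amount in opposite directions.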
Once the inputs and the output have been determined,
the next step is their characterization from the fuzzy logic
perspective. Starting with the inputs, it is necessary to
define the fuzzy sets and membership functions associated
with them, as shown in Fig. 3. Each fuzzy set should be
identified with a linguistic term (e.g. ‘low’ or ‘very low’).
The need to work with fuzzy sets comes from the existence
of concepts with no clear boundaries in their definition. In
this context, when working with KPIs, it is often difficult to
determine from which value a KPI is considered to be
jeopardized. For this reason, two fuzzy sets, ‘low’ and ‘high’,
have been defined for each input. In the case of HOM, the
objective of defining two fuzzy sets is to identify when
the HOM is close to saturation, since the CDR may be
negatively affected. In addition, for each fuzzy set of the
inputs, a membership function, denoted by $\mu_V(u)$,
quantifies the degree of membership of a given input value $u$ to
a certain fuzzy set $V$, with a value between 0 and 1. Thus,
unlike in classical sets, the transition between both values
is gradual. For simplicity, as shown in Fig. 3, the selected
membership functions are triangular or trapezoidal.
Fig. 2. Scheme of the proposed FS. (The inputs $CBR^{(x)}_{ji}$, $HOR^{(x)}$ and $HOM^{(x)}_{ij}$ pass through the fuzzifier and the inference block, which outputs $Y^{(x)} = (\Delta O^{(x)}, \Delta H^{(x)})$; a conversion block maps $O^{(x)}$ and $H^{(x)}$ to $HOM^{(x)}_{ij}$ and $HOM^{(x)}_{ji}$, which are applied to the network.)
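A two-set fuzzifier of the kind described above can be sketched with a single ramp function. The names `high` and `fuzzify` are hypothetical, and the breakpoints in `BREAKPOINTS` are only a reading of the axis ticks of Fig. 3 (0.03 for the CBR difference, 1 for the HOR, 6 and 8 dB for the HOM); treating ‘low’ as the complement of ‘high’ is a further assumption, since the exact shapes are given only graphically.

```python
def high(u, start, end):
    """Membership in 'high': 0 up to start, 1 from end, linear in between."""
    if u <= start:
        return 0.0
    if u >= end:
        return 1.0
    return (u - start) / (end - start)


def fuzzify(u, start, end):
    """Membership degrees of input value u in the 'low' and 'high' sets."""
    mu_high = high(u, start, end)
    return {"low": 1.0 - mu_high, "high": mu_high}


# Assumed breakpoints (start, end) per input, read off Fig. 3.
BREAKPOINTS = {"CBR": (0.0, 0.03), "HOR": (0.0, 1.0), "HOM": (6.0, 8.0)}

hom_degrees = fuzzify(7.0, *BREAKPOINTS["HOM"])
```

Under these assumptions a HOM of 7 dB is equally ‘low’ and ‘high’ (0.5 each), while any HOM below 6 dB is fully ‘low’, which illustrates the gradual transition between fuzzy sets.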
The core of the FS is given by a rule base that represents
the dynamic behavior of the FS through a set of linguistic
rules derived from the expert knowledge. Such a rule base
comprises a collection of fuzzy rules following a syntax of
the type IF-THEN to set the control strategy, e.g.:
$$\text{IF } (CBR^{(x)}_{ji} \text{ is high}) \;\&\; (HOR^{(x)} \text{ is low}) \;\&\; (HOM^{(x)}_{ij} \text{ is low}) \text{ THEN } Y^{(x)} = (+d, 0). \qquad (9)$$
To define these rules, the knowledge and experience of
human experts is normally required. Each antecedent of
the rules represents an input state and the number of rules
is derived from the combination of all fuzzy sets among the
different inputs. In this sense, the definition of the rule
base must be complete (i.e. all fuzzy rules defined), so that
the FS can generate an appropriate action for every input
state in the system. The rule base for the proposed FS is
shown in Table 1. The definition of each rule is as follows.
First, rule 1 is activated when the CBR is balanced, and the
HOR and HOM remain at low values, meaning that no
change in HOM is needed (i.e. $Y^{(x)} = [0, 0]$). Rules 2–4 have
in common that they are triggered when the HOM is low.
As stated before, a low HOM means that the connection
quality is not jeopardized due to changes in HOM, so that
both LB and HOO actions can be applied without any
restriction. For this reason, rule 2, which is triggered when
only the CBR presents undesired behavior, implements an
LB action (i.e. $Y^{(x)} = [+d, 0]$). Specifically, this rule shrinks
the service area of the congested cell, in order to send
traffic to the adjacent cell. Rule 3 is activated
for HOO reasons, i.e. when the HOR reaches high values.
In this case, the proposed solution is to increase the
symmetric region between adjacent cells (i.e. $Y^{(x)} = [0, +s]$),
so that the number of unnecessary HOs can be reduced.
To activate rule 4, the CBR and HOR must have high values
simultaneously. In principle, it is unclear whether the
performed action should be related to LB or to HOO. This
decision may depend on the particular scenario, so that
trial-and-error strategies (e.g. reinforcement learning) are
appropriate in this case. The next section explains how
to select the optimal consequent for this rule.
The remaining rules (i.e. rule 5, 6, 7 and 8) are all linked
to high values of the HOM, which may indicate that the con-
nection quality is very poor, especially for cell-edge users.
In rule 5, the CBR and HOR must exhibit low values to acti-
vate this rule, meaning that the problems related to LB/HOO
were mitigated by increasing HOM. The objective of rule 5
will be to decrease the HOM, since the high HOM may
already be unnecessary. If not, the concerned KPI will be
affected and, therefore, the FS will perform the appropriate
action. The problem is that such a decrease in HOM can be
due to LB or HOO reasons. As in rule 4, the application of
trial-and-error strategies will help to make this decision
in those situations. Rules 6 and 7 are related to more
extreme situations in which the existing problem has led
the FS to reach high values of HOM but has not been
mitigated. In addition, the fact that the HOM reaches large
values due to successive LB and HOO actions may
negatively affect the network performance. In this sense, there
are essentially two different situations related to this
issue. One is severe congestion,
in which LB actions would greatly modify
service areas. As a result, any action of HOO to reduce
unnecessary HOs (e.g. due to high mobility) would involve
larger HOM values that may degrade cell-edge users’
performance. Given that the HOM is saturated due to LB, the
action of rule 6 should be to lead the HOM towards lower
values or leave it unchanged, so that subsequent HOO
actions could be effectively applied. The other situation is
given by the presence of very high mobility in the network,
in which the HOM will reach high values as a result of HOO
actions. Under this assumption, any congestion arising in
the network would lead the LB part to work with large
HOM values, which is not desirable. Thus, rule 7 is intended
to leave unchanged, or even reduce, the symmetric region,
provided that the HOM is saturated due to HOO reasons.
Fig. 3. Membership functions of the fuzzy sets for each input: (a) $CBR^{(x)}_{ji}$, (b) $HOR^{(x)}$ and (c) $HOM^{(x)}_{ij}$ [dB]. (Each panel plots the membership degree $\mu$, from 0 to 1, of the ‘low’ and ‘high’ fuzzy sets; the axis ticks are 0.03 in (a), 1 in (b), and 6 and 8 dB in (c).)
Table 1
Proposed sets of fuzzy rules (L = low, H = high). Input 1 is $CBR^{(x)}_{ji}$, Input 2 is $HOR^{(x)}$ and Input 3 is $HOM^{(x)}_{ij}$; candidate actions are given as $[\Delta O^{(x)}, \Delta H^{(x)}]$.

Rule no.   Input 1   Input 2   Input 3   Candidate action(s)
1          L         L         L         [0, 0]
2          H         L         L         [+d, 0]
3          L         H         L         [0, +s]
4          H         H         L         [+d, 0], [0, +s]
5          L         L         H         [−d, 0], [0, −s]
6          H         L         H         [0, 0], [−d, 0]
7          L         H         H         [0, 0], [0, −s]
8          H         H         H         [−d, 0], [0, −s]
Finally, rule 8 refers to a situation in which the CBR and
HOR are simultaneously high, which may occur for example
when there are both severe congestion and high-mobility
users in the network. In this case, the solution could be to
enlarge the service area of the congested cell or to reduce
the symmetric region in the adjacency. Since the objective
would be to favor HOO or LB, respectively, the option that
simultaneously enlarges the service area of the congested
cell and reduces the symmetric region in the adjacency is
discarded. Other alternatives would cause the connection
quality to be significantly worsened. The specific action
for this rule will also be determined by the optimization
algorithm explained in the next section.
Once the FS has been defined, its operation is as follows.
As shown in Fig. 2, the first step is given by the fuzzifier,
the process by which the assignment of membership val-
ues (one for each fuzzy value of the linguistic variable) to
a numerical input value is made by using the membership
functions. The next step of the FS is given by the inference,
which calculates the degree of truth of each activated rule
as follows:
\[
\alpha_k^{(x)} = \mu_{K_1}\bigl(CBR_{ji}^{(x)}\bigr) \ast \mu_{K_2}\bigl(HOR^{(x)}\bigr) \ast \mu_{K_3}\bigl(HOM_{ij}^{(x)}\bigr), \tag{10}
\]
where $\alpha_k^{(x)}$ is the degree of truth for the rule $k$ in the
adjacency $x$, and $\mu_{K_1}$, $\mu_{K_2}$ and $\mu_{K_3}$ are the membership
functions corresponding to the fuzzy sets $K_1$, $K_2$ and $K_3$,
respectively, involved in the rule $k$. The intersection of
the fuzzy sets, denoted here by '$\ast$', is implemented by using
the min-operator, which takes the minimum value of the
arguments. Finally, unlike the structure of a typical FLC,
the proposed FS does not implement the module known
as defuzzifier, where the activated fuzzy rules are all aggre-
gated to produce a non-fuzzy value. The reason for this is
that the output of the proposed scheme is given by a
two-dimensional variable whose elements are not correlated
with each other (e.g. $O^{(x)}$ can be either increased or
decreased while $H^{(x)}$ does not change). Thus, fuzziness
between different consequents is not applicable in
this work. Instead, the output of the proposed FS is
given by the consequent of the rule whose degree of truth
is the highest. It can be formally expressed as:
\[
\mathrm{output}^{(x)} = Y^{(x)}_{\arg\max_{k}\left(\alpha_k^{(x)}\right)}. \tag{11}
\]
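The inference of Eqs. (10) and (11) can be illustrated with a minimal Python sketch. The rule base and the candidate actions follow Table 1, and d = 1 dB, s = 0.5 dB follow Table 2. The membership shapes are an assumption: linear ramps whose breakpoints are loosely taken from the tick values of Fig. 3, since the exact shapes are not recoverable from the text. The names `ramp`, `fuzzify`, `degrees_of_truth` and `winning_rule` are hypothetical helpers, not part of the original design.

```python
D, S = 1.0, 0.5  # step sizes d and s in dB (Table 2)

# Assumed breakpoints (start, end) of the "high" ramp for each input.
BREAKPOINTS = {"CBR": (0.0, 0.03), "HOR": (0.0, 1.0), "HOM": (6.0, 8.0)}

# Table 1: (CBR, HOR, HOM) levels -> rule number.
RULES = {
    ("L", "L", "L"): 1, ("H", "L", "L"): 2, ("L", "H", "L"): 3, ("H", "H", "L"): 4,
    ("L", "L", "H"): 5, ("H", "L", "H"): 6, ("L", "H", "H"): 7, ("H", "H", "H"): 8,
}

# Table 1: rule number -> candidate actions [delta_O, delta_H].
CANDIDATES = {
    1: [(0.0, 0.0)], 2: [(D, 0.0)], 3: [(0.0, S)],
    4: [(D, 0.0), (0.0, S)], 5: [(-D, 0.0), (0.0, -S)],
    6: [(0.0, 0.0), (-D, 0.0)], 7: [(0.0, 0.0), (0.0, -S)],
    8: [(-D, 0.0), (0.0, -S)],
}


def ramp(value, start, end):
    """Membership in 'high': 0 below start, 1 above end, linear in between."""
    if value <= start:
        return 0.0
    if value >= end:
        return 1.0
    return (value - start) / (end - start)


def fuzzify(name, value):
    """Return the membership degrees of one input in the sets L and H."""
    high = ramp(value, *BREAKPOINTS[name])
    return {"L": 1.0 - high, "H": high}


def degrees_of_truth(cbr, hor, hom):
    """Eq. (10): min-operator over the three membership degrees of each rule."""
    mu_cbr, mu_hor, mu_hom = fuzzify("CBR", cbr), fuzzify("HOR", hor), fuzzify("HOM", hom)
    return {rule: min(mu_cbr[a], mu_hor[b], mu_hom[c])
            for (a, b, c), rule in RULES.items()}


def winning_rule(cbr, hor, hom):
    """Eq. (11): the rule with the highest degree of truth (ties: first found)."""
    alphas = degrees_of_truth(cbr, hor, hom)
    return max(alphas, key=alphas.get)
```

For instance, `winning_rule(0.05, 0.2, 2.0)` activates rule 2 under these assumed ramps, so the only candidate action is an increase of the O component by d. No defuzzification step appears, in line with the design choice described above.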
Finally, to download these changes to the network
configuration, two more steps are necessary. As represented
in Fig. 2, the first is the update of $O^{(x)}$ and $H^{(x)}$
considering the parameter variations given by the FS, and
the second is the conversion from $O^{(x)}$ and $H^{(x)}$ to
$HOM_{i \to j}^{(x)}$ and $HOM_{j \to i}^{(x)}$ given their relationship expressed in Eq. (4).
3.2. Optimization of the fuzzy system
In the proposed FS, there are certain rules (4–8 in Table 1)
with more than one consequent defined. This means that, a
priori, it is unclear which is the most appropriate action for
these rules, since it may depend on many context factors
(e.g. the environment, the traffic distribution patterns, the
user mobility, etc.) at the moment of the execution of the
rules or simply because of the interactions between the
objectives of LB and HOO. Different strategies have been
investigated to create, adapt or refine rules [25–29]. In this
sense, mobile operators usually do not have the complete
knowledge to take proper actions in every network state.
Thus, due to the complex nature of network management,
Reinforcement Learning (RL) is of particular interest in this
context, as the system is able to learn from its own experi-
ence. In addition, unlike other mathematical approaches
(e.g. supervised learning in Neural Networks), in RL, a train-
ing data set is not required. For this reason, in this work, the
popular RL algorithm known as Q-Learning has been
adopted, so that the best consequent for each fuzzy rule
can be found through learning from interaction.
The combination of fuzzy logic and RL has been
addressed in some previous works [29–33]. However, the
proposed optimization algorithm differs from the common
implementation of the fuzzy Q-Learning algorithm [34].
This is because, in the case of a typical FLC, the q-function
(i.e. a characteristic function of fuzzy Q-Learning optimiza-
tion) is updated according to the degree of activation of each
triggered rule of the FLC. As a consequence, the q-function
can be updated for more than one input state, i.e. there
exists a certain degree of fuzziness. Conversely, in the case
of the proposed fuzzy system, the update of the q-function
is only made for one input state, as the number of rules that
can be activated at each optimization step is only one.
In RL, an agent is driven to take actions in an environ-
ment in order to maximize a cumulative reward. The
optimization scheme showing the combination of the FS
and the learning entity is depicted in Fig. 4. The basic ele-
ments in RL are the agent, the environment, the states,
the actions, the policy, the reward and the value function.
In this case, the agent that takes the actions is the proposed
FS, while the environment corresponds to the cellular net-
work. The states are given by the combination of the fuzzy
sets of the FS. Note that, for each state, there is one fuzzy
rule defined. The actions are given by the candidate conse-
quents of the rules and they represent a specific variation in
the HOM. The policy defines how the agent has to act at a
given time. The reward is a numerical value that expresses
the intrinsic desirability of being in a certain state. While
Fig. 4. Optimization scheme. [Diagram labels: agent (Fuzzy System with Q-Learning, value function and policy); environment (Network); state (CBR, HOR, HOM); action (HOM); reward (CDR).]
the reward indicates what is good in an immediate sense,
the value function specifies what is good in the long run.
In particular, the value function is a mapping between each
state and the total amount of reward that an agent can
expect to accumulate over the future, starting from that
state. In this sense, the objective of the agent is not to obtain
the maximum immediate reward, but to maximize the total
reward that the agent receives in the long run.
RL methods are characterized by two important fea-
tures: the trial-and-error search and the fact that actions
may affect not only the immediate reward but also the
subsequent rewards. As previously stated, the agent has
to maximize the received reward in the long-term (or
expected cumulative reward), which is the sum of the
rewards that will be obtained from the input states visited
in the future:
\[
R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \tag{12}
\]
where $r$ is the numerical reward obtained at each optimization
step after performing an action and $\gamma$ is the discount
rate determining the relative importance of future
rewards. In this paper, the action performed by an acti-
vated fuzzy rule will be rewarded positively if the connec-
tion quality is not significantly degraded. The immediate
reward $r$ can be formally expressed by defining a specific
threshold for the CDR, which is the KPI that estimates the
connection quality, as stated in Section 2. Then, those
actions leading to a CDR equal to or less than the threshold
should be rewarded with a positive value, while those
actions producing a CDR higher than the threshold should
be punished with a negative value. Considering this, the
formula for the reward is expressed as:
\[
r^{(x)} =
\begin{cases}
c & \text{if } CDR^{(x)}_{\mathrm{measured}} \le CDR_{th}, \\
-c & \text{otherwise},
\end{cases} \tag{13}
\]
where $r^{(x)}$ is the reward for the adjacency $x$ and $c$ is a
constant that can be expressed as a common factor in the
definition of the reward in Eq. (12), so that the effect is a
scaling transformation that can be used to avoid underflow/overflow
issues in storing the q-function. In this
paper, $c = 10$ is assumed. In addition, $CDR_{th}$ is the threshold
defined at the network level to determine bad quality and
$CDR^{(x)}_{\mathrm{measured}}$ is the maximum CDR between both cells in the
adjacency, i.e.:
\[
CDR^{(x)}_{\mathrm{measured}} = \max\left\{ CDR_i^{(x)},\; CDR_j^{(x)} \right\}. \tag{14}
\]
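As a minimal sketch of Eqs. (12) to (14), the reward and the discounted return could be computed as follows. The function names are hypothetical, and both the threshold and the discount rate are left as parameters because their values are not stated in this section. Only c = 10 is taken from the text.

```python
def measured_cdr(cdr_i, cdr_j):
    """Eq. (14): worst (maximum) CDR of the two cells in the adjacency."""
    return max(cdr_i, cdr_j)


def reward(cdr_i, cdr_j, cdr_th, c=10.0):
    """Eq. (13): +c if the measured CDR does not exceed the threshold, else -c."""
    return c if measured_cdr(cdr_i, cdr_j) <= cdr_th else -c


def discounted_return(rewards, gamma):
    """Eq. (12), truncated to a finite list of future rewards r_{t+1}, r_{t+2}, ..."""
    total = 0.0
    for r in reversed(rewards):  # Horner-style accumulation from the last reward
        total = r + gamma * total
    return total
```

With an illustrative threshold of 0.03 (an assumed value), `reward(0.01, 0.02, 0.03)` returns 10.0 and `reward(0.01, 0.05, 0.03)` returns -10.0. Since every reward is either +c or -c, the return of Eq. (12) scales linearly with c, which is the scaling effect mentioned above.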
In the proposed FS, to quantify the benefits of executing
a certain rule consequent (i.e. the action) provided that a
rule has been activated (i.e. the state), the value q of a
state-action pair $[s, a]$ is defined. It is a discrete function,
denoted by $q[s, a]$, that expresses the expected cumulative
reward that can be received when taking action a from
state s. In this work, a discrete version of the Q-Learning
algorithm is considered, where the learned q-function
directly approximates the optimal one independently of
the policy followed by the agent [35].
The pseudo-code of the optimization algorithm is
shown in Fig. 5. After initializing the q-function, the
selection of the consequent for each rule (step 1) is made
by using a certain exploration/exploitation policy. Explora-
tion is needed since trying actions that have not yet been
selected is the only way to discover new actions that could
provide much more reward than other actions already
tested. Exploitation is also needed since the current knowl-
edge must be exploited to obtain reward. A widely-used
policy is the so-called $\epsilon$-greedy policy, defined as:
\[
a_i = \arg\max_{k} q[i, k] \quad \text{with probability } 1 - \epsilon, \tag{15}
\]
\[
a_i = \mathrm{random}\{a_k,\; k = 1, 2, \ldots, J\} \quad \text{with probability } \epsilon, \tag{16}
\]
where $a_i$ is the selected consequent for rule $i$ and $\epsilon$ determines
the trade-off between exploration and exploitation
during the optimization process (e.g. $\epsilon = 0$ means no
exploration, so that the best action is always selected).
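The selection rule of Eqs. (15) and (16) can be sketched in a few lines of Python. Here `q_row` is a hypothetical list holding the q-values of the candidate consequents of the activated rule, and the function returns the index of the selected consequent. The optional `rng` argument is an addition of this sketch to make the random draw reproducible.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """Pick a consequent index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        # Eq. (16): uniform random choice among the candidate consequents.
        return rng.randrange(len(q_row))
    # Eq. (15): consequent with the highest stored q-value.
    return max(range(len(q_row)), key=q_row.__getitem__)
```

With `epsilon = 0` the call always returns the greedy consequent, e.g. `epsilon_greedy([0.1, 0.7], 0.0)` selects index 1.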
Each time an action (i.e. a variation in HOM) is performed,
the network should evolve to a new state, $s'$, in
which the KPIs are collected again. At this time, the reward
of the action is computed by using Eq. (13), as stated in
step 2 (Fig. 5). Then, the so-called value of the new state,
denoted by $v[s']$, is calculated as:
\[
v[s'] = \max_{k} q[s', a_k]. \tag{17}
\]
While the q-function quantifies the value of taking an
action when starting from a given state, the v-function
estimates the value of being in that state regardless of
the action to be taken. Note also that the new state $s'$ is
specified by the new activated fuzzy rule in the FS. From
$v[s']$, an error signal is calculated as follows:
\[
\Delta q = r + \gamma \cdot v[s'] - q[s, a_i], \tag{18}
\]
where $\gamma$ is the discount factor. As observed, the first part of
the formula is the q-function calculated as the sum of the
immediate reward $r$ for state $s$ and the expected value of
the next state, $v[s']$. This is equivalent to Eq. (12), where
the immediate reward and future rewards (i.e. the
expected value of the next state) are accumulated. The last
part in Eq. (18) is taken from the stored $q[s, a]$. As a result,
$q[s, a]$ will be updated in the direction of the optimal q-function
independently of the policy followed by the agent
Fig. 5. Pseudo-code of the optimization algorithm.
(step 3 in Fig. 5). Such an update is made by utilizing an
ordinary gradient descent, i.e.:
\[
q[s, a_i] \leftarrow q[s, a_i] + \eta \cdot \Delta q, \tag{19}
\]
where $\eta$ is a learning rate. The above-described process is
repeated for the new current state (steps 4 and 5 in
Fig. 5) starting with the action selection (step 1).
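One optimization step of Eqs. (17) to (19) can be sketched as below. This is an illustration under stated assumptions, not the pseudo-code of Fig. 5: the q-function is stored as a hypothetical dictionary mapping each rule number (the state) to a list with one q-value per candidate consequent of Table 1, and the discount factor and learning rate are parameters because their values are not given in this section.

```python
# Number of candidate consequents per rule, as listed in Table 1.
N_CANDIDATES = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2}


def init_q():
    """Zero-initialized q-function: rule number -> list of q-values."""
    return {rule: [0.0] * n for rule, n in N_CANDIDATES.items()}


def q_update(q, state, action, reward, next_state, gamma, eta):
    """Update q[state][action] in place and return the error signal."""
    v_next = max(q[next_state])                            # Eq. (17)
    delta_q = reward + gamma * v_next - q[state][action]   # Eq. (18)
    q[state][action] += eta * delta_q                      # Eq. (19)
    return delta_q
```

Only the entry of the single activated rule is updated at each step, which reflects the difference with respect to the common fuzzy Q-Learning update weighted by the activation degree of several rules. For example, starting from `q = init_q()`, the call `q_update(q, 4, 0, 10.0, 2, 0.5, 0.5)` sets `q[4][0]` to 5.0 and leaves every other entry at zero (gamma and eta here are arbitrary illustrative values).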
4. Performance analysis
4.1. Analysis setup
To assess the performance of the proposed joint optimi-
zation algorithm, a dynamic system-level simulator for LTE
macrocells has been used [36]. This simulator executes a
selectable number of optimization loops to emulate the
tuning process. Each loop comprises 7000 simulation steps,
equivalent to 12 min of actual network time. Each simula-
tion step includes updating user positions, propagation
computation, generation of new calls, and radio resource
management algorithms. At the end of each loop, measure-
ments and reliable statistics are obtained to be used in the
following optimization loop. Thus, in a certain loop, the
steps 1–5 of the algorithm described in Fig. 5 are executed
once.
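As a quick consistency check of the loop timing, and assuming that each simulation step spans the 100 ms time resolution listed in Table 2, the figures above relate as follows:

\[
7000 \ \text{steps} \times 0.1 \ \tfrac{\text{s}}{\text{step}} = 700 \ \text{s} \approx 11.7 \ \text{min} \approx 12 \ \text{min},
\qquad
\frac{3200 \ \text{min}}{12 \ \text{min/loop}} \approx 267 \ \text{loops}.
\]

Under this assumption, the simulation duration of Table 2 corresponds to roughly 267 optimization loops, i.e. roughly 267 executions of steps 1 to 5.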
The simulated scenario includes a macro-cellular envi-
ronment with a layout consisting of 19 tri-sectorized sites
evenly distributed in the scenario, as shown in Fig. 6. The
main simulation parameters are summarized in Table 2.
For simplicity, only the downlink is considered in the sim-
ulation. The service provided to users is the voice call as it is
the main service affected by the tuning process. The traffic
is unevenly distributed in space, where some
cells in the center of the scenario have higher traffic density
than the surrounding cells. In addition, to thoroughly assess
the proposed algorithm, three different configurations have
been considered. Firstly, the simulated high load scenario is
Fig. 6. Block diagram of the simulation process. [Diagram labels: Scenario (congested area), Parameters, Indicators (CBR, CDR), Load Balancing, Handover Optimization, Uncoordinated, Baseline Algorithm, Fuzzy System.]
Table 2
Simulation parameters.

Parameter                 Configuration
Cellular layout           Hexagonal grid, 57 cells (3 × 19 sites), cell radius 0.5 km
Transmission direction    Downlink
Carrier frequency         2.0 GHz
System bandwidth          1.4 MHz
Frequency reuse           1
Propagation model         Okumura–Hata with wrap-around
                          Log-normal slow fading, σ_sf = 8 dB and correlation distance = 50 m
Channel model             Multipath fading, EPA model
Mobility model            Random direction
                          Low speed = 3 km/h
                          High speed = 50 km/h
Service model             Constant bit rate (voice call), Poisson traffic arrival,
                          mean call duration 120 s, 16 kbps
Base station model        Tri-sectorized antenna, SISO, EIRPmax = 43 dBm
Scheduler                 Time domain: Round-Robin
                          Frequency domain: Best Channel
Power control             Equal transmit power per PRB
Link adaptation           Fast, CQI based, perfect estimation
Handover                  Time-To-Trigger = 100 ms
                          HOM: [-24, 24] dB
Call dropping             SINR < -6.9 dB
Traffic distribution      Unevenly distributed in space
Time resolution           100 TTI (100 ms)
Loop time                 12 min
Simulation duration       3200 min
Optimization algorithm    d = 1 dB, s = 0.5 dB
given by the presence of a greater number of users moving
at low speed (3 km/h) around the scenario, where the CBR
is expected to be high. Secondly, the simulated high mobil-
ity scenario is given by the presence of high-speed users
(50 km/h), which in principle would lead to a high HOR.
In this case, the number of users is not high, but the
unevenly distributed traffic in the scenario can lead to con-
gestion situations, especially in the central area. Finally, the
third scenario is a combination of the two previous scenar-
ios, so that high-load and high-speed users are simulated.
To compare the proposed method with reference cases,
as shown in Fig. 6, the independent SON functions of LB
and HOO, taken from [23,24] respectively, have been
implemented and simulated in two different ways. In one
of them, only one function is active in the network, while
in the other configuration both LB and HOO functions are
simultaneously executed in an uncoordinated way. In
addition, a baseline optimization scheme following the
main principles addressed in [18] has been implemented.
This scheme prioritizes the HOO part depending on
whether the connection quality is jeopardized or not. More
specifically, if the CDR is above a certain threshold, only the
HOO function is executed. The performance of these
approaches will be assessed by looking at the main related
KPIs, in particular, the overall HOR, CBR and CDR. A Figure-
of-Merit (FoM), U, that combines the previous KPIs into a
scalar value has also been considered. This FoM character-
izes, qualitatively, the overall performance of the evaluated
approaches. Formally, U is defined as [37]:
\[
U = k \cdot \bigl( CBR[\%] + (1 - CBR[\%]/100) \cdot CDR[\%] \bigr) + HOR, \tag{20}
\]
where k is a constant weight determining the relative
importance of the CBR and CDR (both related to user dis-
satisfaction) compared with the HO signaling cost given
by HOR. In this study, k equal to 1 is assumed.
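Eq. (20) translates directly into code. The sketch below uses a hypothetical function name and takes the CBR and CDR in percent, as in the formula. The sample inputs are round values of the order of those quoted for the high-load phase in Section 4.2 and serve only as an illustration.

```python
def figure_of_merit(cbr_pct, cdr_pct, hor, k=1.0):
    """Eq. (20): weighted user dissatisfaction plus the HO signaling cost."""
    # The CDR term is scaled by the share of calls that were not blocked.
    dissatisfaction = cbr_pct + (1.0 - cbr_pct / 100.0) * cdr_pct
    return k * dissatisfaction + hor


# Illustrative inputs: CBR = 5 %, CDR = 2 %, HOR = 1, with k = 1.
u = figure_of_merit(5.0, 2.0, 1.0)  # 5 + 0.95 * 2 + 1
```

Lower values of U are better, since every term of the sum represents a cost.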
4.2. Simulation results
First, a sensitivity analysis for determining the optimal
values of d and s (i.e. the variation of O and H components,
respectively) has been carried out. Fig. 7 shows the mean of
the related KPIs and the proposed FoM, U, for the three dif-
ferent situations: high-load, high-mobility and both
together. As observed, U is a combination of the KPIs
related to user dissatisfaction (i.e. the CBR and CDR) and
the KPI related to the HO signaling cost (i.e. the HOR).
In the high-load scenario (Fig. 7(a)), the variations of d
and s have low impact on HOR since users have low mobil-
ity, meaning that the impact of HOR on U will also be
minor. Due to this, the variations in U are mainly given
by the user dissatisfaction. In this sense, there is a clear
trade-off between CBR and CDR, i.e. while CBR is reduced
(by increasing d), CDR is greater. However, for high values
of d, the variations in CBR are greater than in CDR. As a con-
sequence, the best values of U (i.e. the lowest) correspond
to high values of d. This is in contrast to the situations with
high-mobility, as explained below.
Fig. 7. Sensitivity analysis for d and s in different scenarios: (a) high-load, (b) high-mobility and (c) high-load and high-mobility.
The second scenario given by high-mobility (Fig. 7(b))
shows that HOR increases for larger values of d, especially
when s is 1 dB. The reason for this is that resizing the cell
service areas for load balancing purposes leads cell-edge
users to be under worse radio conditions after performing
an HO, so that the probability of performing a new HO to
other neighbor cells is increased. Conversely, provided that
d is low (avoiding the effect of load balancing on HOR), for
larger values of s, HOR decreases. This is in line with the
optimization of the H component, i.e. increasing H makes it
more difficult to perform an HO and it reduces the HO fre-
quency. The main drawback for this case is that the CDR is
negatively affected. The high-mobility scenario also pro-
duces lower values of CBR and CDR because the traffic is
geographically dispersed due to the high speed of the
users. The configuration (d = 1, s = 0.5) dB provides the
lowest value of U, as a result of a better trade-off between
HO signaling and user dissatisfaction.
The above analysis can also be extended to the scenario
that combines both high-load and high-mobility (Fig. 7(c)).
Since an important objective of the proposed algorithm is to
optimize mobility and load balancing without jeopardizing
the connection quality, the high values of CDR measured in
this scenario establish the possible range of variation of d
and s. In particular, it is observed that values of s above
0.5 dB involve a CDR greater than 5%, which would cause
serious inconvenience to operators. Leaving s fixed to
0.5 dB, the increase of d can also lead to high values of
CDR. In particular, values above 3 dB would significantly
jeopardize the CDR. For this reason, the range of d and s
analyzed in this work does not exceed the limits shown in
Fig. 7. As in the previous high-mobility scenario, the opti-
mal configuration is (d = 1, s = 0.5) dB, meaning that this
setting can be reasonably used to evaluate the performance
of the proposed algorithm against other approaches.
The comparison of the proposed fuzzy system with
other approaches is represented in Fig. 8, where the evolution
over time of the KPIs for each strategy is
depicted. For the sake of clarity, the represented values
have been averaged with the six subsequent samples.
The initial situation is given by low traffic and low mobility.
After about 200 min, the central cells of the scenario
become crowded, so that many users are blocked, increas-
ing the CBR. Looking more closely at this indicator, the
evaluated approaches reach values of CBR of about 5% when the
traffic change occurs. The HOO configuration is not able
to solve this problem, keeping the CBR at such high values,
while the LB configuration achieves a reduction of 2% in a
few optimization steps. Conversely, the gain in CBR
obtained by the uncoordinated alternative, the baseline
scheme and the proposed fuzzy system is more moderate.
A higher number of users also means more interference in
the network, so that the connection quality of the users is
worse, increasing the CDR. This increase is more pro-
nounced in the case of the uncoordinated approach. To
explain this, note that the LB and HOO functions are simul-
taneously changing the HOM from the first optimization
steps. As the CDR is not significantly affected by these
changes (due to the low interference conditions), the
HOM reaches large values. As a result, when the congestion
situation occurs, the HOM values are so large that the CDR
becomes high. After this, the SON functions attempt to
reduce this KPI. In the case of the baseline approach, this
effect on CDR is attenuated because the LB function is
switched off when the CDR becomes high. Since HOMs
are not adjusted by the LB function, the level of CDR is
not as high as with the uncoordinated scheme. The rest
of the configurations, i.e. LB, HOO and the fuzzy system, keep
the CDR constant at around 2%. Due to the presence of only
low-speed users, the HOR is about 1.
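The smoothing applied to the curves of Fig. 8 (each value averaged with the six subsequent samples) could be reproduced along the following lines. This sketch assumes a forward window of seven samples (the current one plus six look-ahead samples) and a shorter window near the end of the series. The treatment of the tail is not described in the text, so it is a choice of this example, as is the function name.

```python
def forward_average(samples, lookahead=6):
    """Average each sample with up to `lookahead` subsequent samples."""
    smoothed = []
    for i in range(len(samples)):
        window = samples[i:i + lookahead + 1]  # shorter near the end of the series
        smoothed.append(sum(window) / len(window))
    return smoothed
```

With one sample per 12-min loop, a window of seven samples spans 84 min of network time, which explains why abrupt traffic changes appear as gradual transitions in the plotted KPIs.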
The offered traffic experiences a small reduction at
around min. 1000, but it is not until around min. 1200
that the users start moving at high speed. The scenario of
high-mobility starts at this moment and the HOR is
abruptly increased to values above 10, except in the case
of the proposed fuzzy system, whose values over time
are below 7. Thus, the performance of the proposed tech-
nique in terms of HOR is clearly better than the rest of
the strategies. It is also noted that the trajectory of HOR
followed by the uncoordinated and baseline approaches
is very similar since the HOO function is active during
the entire simulation. Regarding the CBR and CDR, the LB
and the proposed approaches lead to values below 1%,
while the rest of the strategies produce undesirably higher values.
Note also that, for all the cases, the CBR decreases significantly
from the previous situation (i.e. before min.
1200) because the traffic load is geographically dispersed
due to the presence of fast users.
The situation after about 2100 min is given by a new increase
of the offered traffic, so that the last part of the simulation
includes both high-load and high-mobility. Looking at the
HOR, the proposed method remains at low values, being
the best approach from this perspective at any time. Simi-
larly, the LB approach keeps a relatively constant but higher
level of HOR values, since no actions to reduce HO signaling
take place in this case. The HOO, the uncoordinated and the
baseline approaches lead to a gradual increase in this KPI.
The reason for this is that these strategies implement the
same HOO function, which attempts to decrease the high
peak in the CDR at the expense of increasing the number
of HOs. However, the impact of these three methods on
the CDR is not the same. In particular, the baseline approach
provides lower CDR values than those obtained by the
uncoordinated approach because the LB function is
switched off when the CDR is jeopardized after the
variation in traffic load. The HOO approach gives even
lower values of CDR since the LB function is not executed
during the entire simulation. From the CBR perspective,
the LB approach provides the lowest values while the CDR
is also quite low, similar to the fuzzy system. The proposed
method gives better CBR than other approaches and, as pre-
viously stated, the HOR is the lowest as well.
The evolution of U over time (Fig. 8(d))
shows the suitability of the evaluated methods in each sce-
nario. It is noted that the strategy with the lowest value of
U will establish a good trade-off between HO signaling and
user dissatisfaction. In the first scenario, given by high-
load conditions, the best method is the execution of LB
alone, which significantly reduces the CBR but at the
expense of an increase in the CDR that is higher than in
the case of the proposed scheme. This is because the
scenario has low mobility and does not require any
optimization from the HOO perspective. In the second sce-
nario, determined by the presence of high-speed users, the
proposed fuzzy system provides the lowest value of U,
since it considerably reduces the HOR. At the beginning
of the third scenario (a combination of the two previous ones),
the proposed scheme also achieves lower U values than
those obtained by the LB approach since this latter method
needs more iterations to reduce the CBR. Thus, it can be
highlighted that the proposed joint optimization method
is the only solution that, in the presence of mobility and
congestion problems (i.e. scenarios two and three), reduces
both the HOR and the CBR, which are the objectives of the
HOO and LB, respectively. In this sense, note that the LB
approach does not reduce the HOR in the second scenario,
which is mainly determined by high-mobility.
5. Conclusion
In this paper, a novel joint optimization algorithm for
LB and HOO functions has been proposed. First, the
optimized parameter HOM is broken down into two components,
$O^{(x)}$ and $H^{(x)}$, which are directly related to LB and
HOO, respectively. Then, an FS that adjusts the HOM com-
ponents at the cell adjacency level for the joint optimiza-
tion of both functions is proposed. Finally, the FS is
teamed with the Q-Learning algorithm, which leads the
Fig. 8. Temporal evolution of (a) HOR, (b) CBR, (c) CDR and (d) U for different approaches. [Four time-series panels with a horizontal axis from 0 to 3000; legend: LB, HOO, Uncoordinated, Baseline, Fuzzy System.]
FS to select suitable actions from the LB/HOO perspective,
without jeopardizing the connection quality of the active
users in the network. The proposed technique has been
compared with a baseline scheme based on the existing
bibliography and the reference cases in which LB and
HOO operate separately or even simultaneously in an
uncoordinated way. In addition, these techniques have
been assessed in extreme scenarios in which the HOM
achieves large values, such as those with high traffic load
and/or high mobility.
Results show that the proposed scheme effectively
improves network performance over the reference cases.
In particular, the HOR in the presence of high-mobility
users can be halved, while the user
dissatisfaction in terms of the CBR and CDR remains at values
similar to the baseline schemes. In addition, it is the only
solution that is able to partially alleviate a congestion
situation and to reduce the number of HOs, which are
the main objectives of the LB and HOO, respectively. Unlike
other reference methods, the proposed technique does not
produce high peaks in the KPIs when the situation changes
abruptly, e.g. some cells become congested. In the context
of SON, it is highlighted that the complexity of the SON
entity that coordinates SON specific functions would be
reduced, as it is freed from the coordination of the two
important SON functions, LB and HOO. Finally, an advantage
of using fuzzy logic is that the proposed design is
easy to implement.
Acknowledgment
This work has partially been supported by the Junta de
Andalucía (Excellence Research Program, Projects P08-TIC-
4052 and P12-TIC-2905).
References
[1] L.C. Schmelz et al., Self-configuration, -optimisation and -healing in
wireless networks, in: Wireless World Research Forum Meeting, vol.
20, 2008.
[2] 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA) and
Evolved Universal Terrestrial Radio Access Network (E-UTRAN);
Overall description; Stage 2, version 11.4.0 (2012-12), TS 36.300.
[3] 3GPP, Self-Organizing Networks (SON) Policy Network Resource
Model (NRM) Integration Reference Point (IRP); Requirements,
version 11.1.0 (2012-12), TS 32.521.
[4] I. Viering, M. Döttling, A. Lobinger, A mathematical perspective of
self-optimizing wireless networks, in: Proc. of International
Conference on Communications (ICC ’09), 2009.
[5] 3GPP, Self-Organizing Networks (SON) Policy Network Resource
Model (NRM) Integration Reference Point (IRP); Information Service
(IS), version 11.4.0 (2012-12), TS 32.522.
[6] K. Tsagkaris, N. Koutsouris, P. Demestichas, R. Combes, SON
coordination in a unified management framework, in: Proc. of IEEE
77th Vehicular Technology Conference (VTC), Spring, 2013.
[7] X. Gelabert, B. Sayrac, S. Ben Jemaa, A heuristic coordination
framework for self-optimizing mechanisms in LTE HetNets, IEEE
Trans. Veh. Technol. 63 (3) (2013) 1320–1334.
[8] R. Combes, Z. Altman, E. Altman, Coordination of autonomic
functionalities in communications networks, in: CoRR abs/
1209.1236, 2012.
[9] H. Lateef, A. Imran, A. Abu-Dayya, A framework for classification of
self-organising network conflicts and coordination algorithms, in:
Proc. of IEEE 24th International Symposium on Personal Indoor and
Mobile Radio Communications (PIMRC), 2013.
[10] L. Schmelz, M. Amirijoo, A. Eisenblaetter, R. Litjens, M. Neuland, J.
Turk, A coordination framework for self-organisation in LTE
networks, in: Proc. of IEEE International Symposium on Integrated
Network Management (IM), 2011 IFIP, 2011, pp. 193–200.
[11] P. Vlacheas, E. Thomatos, K. Tsagkaris, P. Demestichas, Operator-
governed SON coordination in downlink LTE networks, in: Proc. of
Future Network & Mobile Summit (FutureNetw), 2012.
[12] INFSO-ICT-216284 SOCRATES, Framework for the Development of
Self-organisation Methods, Tech. Rep. Deliverable D2.4, Version
1.0.3, September, 2008.
[13] W. Li, X. Duan, S. Jia, L. Zhang, Y. Liu, J. Lin, A dynamic hysteresis-
adjusting algorithm in LTE self-organization networks, in: Proc. of
IEEE 75th Vehicular Technology Conference (VTC), Spring, 2012.
[14] Y. Li, M. Li, B. Cao, Y. Wang, W. Liu, Dynamic optimization of
handover parameters adjustment for conflict avoidance in long term
evolution, China Commun. 10 (1) (2013) 56–71.
[15] R. Romeikat, H. Sanneck, T. Bandh, Efficient, dynamic coordination of
request batches in C-SON systems, in: Proc. of IEEE 77th Vehicular
Technology Conference (VTC), Spring, 2013.
[16] H. Klessig, A. Fehske, G. Fettweis, J. Voigt, Improving coverage and
load conditions through joint adaptation of antenna tilts and cell
selection rules in mobile networks, in: Proc. of International
Symposium on Wireless Communication Systems (ISWCS), 2012.
[17] J. Chen, H. Zhuang, B. Andrian, Y. Li, Difference-based joint
parameter configuration for MRO and MLB, in: Proc. of IEEE 75th
Vehicular Technology Conference (VTC), Spring, 2012.
[18] W.-Y. Li, X. Zhang, S.-C. Jia, X.-Y. Gu, L. Zhang, X.-Y. Duan, J.-R. Lin, A
novel dynamic adjusting algorithm for load balancing and handover
co-optimization in LTE SON, J. Comput. Sci. Technol. 28 (3) (2013)
437–444.
[19] 3GPP, Evolved Universal Terrestrial Radio Access (E-UTRA); Radio
Resource Control (RRC); Protocol specification, version 11.2.0 (2012-
12), TS 36.331.
[20] T. Ross, Fuzzy Logic with Engineering Applications, Wiley, 2010.
[21] A. Engelbrecht, Computational Intelligence: An Introduction, John
Wiley & Sons, 2007.
[22] C. Lee, Fuzzy logic in control systems: fuzzy logic controller. I, IEEE
Trans. Syst., Man Cybernet. 20 (2) (1990) 404–418.
[23] P. Muñoz, R. Barco, I. de la Bandera, Optimization of load balancing
using fuzzy Q-Learning for next generation wireless networks,
Expert Syst. Appl. 40 (4) (2013) 984–994.
[24] P. Muñoz, R. Barco, I. de la Bandera, On the potential of handover
parameter optimization for self-organizing networks, IEEE Trans.
Veh. Technol. 62 (5) (2013) 1895–1905.
[25] K.C. Foong, C.T. Chee, L.S. Wei, Adaptive network fuzzy inference
system (ANFIS) handoff algorithm, in: Proc. of the International
Conference on Future Computer and Communication (ICFCC), 2009.
[26] A. Çalhan, C. Çeken, An optimum vertical handoff decision algorithm
based on adaptive fuzzy logic and genetic algorithm, Wireless Pers.
Commun. (2010) 1–18.
[27] L. Giupponi, R. Agustí, J. Pérez-Romero, O. Sallent, A framework for
JRRM with resource reservation and multiservice provisioning in
heterogeneous networks, Mobile Networks Appl. 11 (2006) 825–
846.
[28] M. Dirani, Z. Altman, Self-organizing networks in next generation
radio access networks: application to fractional power control,
Comput. Networks 55 (2) (2011) 431–438.
[29] R. Nasri, A. Samhat, Z. Altman, A new approach of UMTS-WLAN load
balancing; algorithm and its dynamic optimization, in: Proc. of IEEE
International Symposium on a World of Wireless, Mobile and
Multimedia Networks, 2007.
[30] A. Galindo-Serrano, L. Giupponi, Downlink femto-to-macro
interference management based on fuzzy Q-learning, in: Proc. of
International Symposium on Modeling and Optimization in Mobile,
Ad Hoc and Wireless Networks (WiOpt), 2011.
[31] M. Haddad, Z. Altman, S. Elayoubi, E. Altman, A Nash–Stackelberg
fuzzy Q-learning decision approach in heterogeneous cognitive
networks, in: Proc. of IEEE Global Telecommunications Conference
(GLOBECOM), 2010.
[32] R. Razavi, S. Klein, H. Claussen, A fuzzy reinforcement learning
approach for self-optimization of coverage in LTE networks, Bell
Labs Tech. J. 15 (3) (2010) 153–175.
[33] Y.H. Chen, C.J. Chang, C.Y. Huang, Fuzzy Q-learning admission
control for WCDMA/WLAN heterogeneous networks with
multimedia traffic, IEEE Trans. Mobile Comput. 8 (11) (2009)
1469–1479.
[34] P.Y. Glorennec, Fuzzy Q-learning and dynamical fuzzy Q-learning,
in: Proc. of the Third IEEE Conference on Fuzzy Systems, vol. 1, 1994,
pp. 474–479.
[35] C. Watkins, P. Dayan, Technical note: Q-learning, Mach. Learn. 8 (3)
(1992) 279–292.
[36] P. Muñoz, I. de la Bandera, F. Ruiz, S. Luna-Ramírez, R. Barco, M. Toril,
P. Lázaro, J. Rodríguez, Computationally-efficient design of a
dynamic system-level LTE simulator, Int. J. Electron. Telecommun.
57 (3) (2011) 347–358.
[37] J. Ruiz-Avilés, S. Luna-Ramírez, M. Toril, F. Ruiz, Traffic steering by
self-tuning controllers in enterprise LTE femtocells, EURASIP J.
Wireless Commun. Network. 2012 (337) (2012).
Pablo Muñoz received his M.Sc. and Ph.D. degrees in Telecommunication
Engineering from the University of Málaga (Spain) in 2008 and 2013,
respectively. He is currently working with the Communications Engineering
Department at the same university. Since September 2009, he has been a
Ph.D. Fellow there, working on self-optimization of mobile radio access
networks and radio resource management.
Raquel Barco received the M.Sc. degree in Telecommunication Engineering in
1997 and the Ph.D. degree in 2007 from the University of Málaga, Spain.
From 1998 to 2000, she worked at the European Space Agency, Darmstadt,
Germany. From 2000 to 2003, she worked part-time for Nokia Networks.
Currently, she is Associate Professor at the Communication Engineering
Department, University of Málaga. She has published more than 50 papers in
international journals and conferences and she has been involved in
several projects with companies. Her research interests are in the field of
mobile communication systems, especially Self-Organizing Networks.
Isabel de la Bandera received her M.Sc. degree in Telecommunication
Engineering from the University of Málaga (Spain) in 2009. In 2008, she
was with the Communications Engineering Department at the same university,
working on RFID projects. Since February 2010, she has been with the same
department, working on projects on radio resource management in next
generation mobile networks, and she is working toward the Ph.D. degree in
Telecommunications Engineering.

Load balancing and handoff in lte

  • 2. Load balancing and handover joint optimization in LTE networks using Fuzzy Logic and Reinforcement Learning. P. Muñoz ⇑, R. Barco, I. de la Bandera. University of Málaga, Communications Engineering Dept., Campus de Teatinos, 29071 Málaga, Spain. Article info. Article history: received 10 June 2014; received in revised form 29 October 2014; accepted 31 October 2014; available online 13 November 2014. Keywords: Load balancing; Handover; Self-organizing networks; Long-term evolution; Fuzzy logic; Reinforcement learning. Abstract. With the growing deployment of cellular networks, operators have to devote significant manual effort to network management. As a result, Self-Organizing Networks (SONs) have become increasingly important in order to raise the level of automated operation in cellular technologies. In this context, Load Balancing (LB) and Handover Optimization (HOO) have been identified by industry as key self-organizing mechanisms for the Radio Access Networks (RANs). However, most efforts have been focused on developing a stand-alone entity for each self-organizing mechanism, which will run in parallel with other entities, as well as designing coordination mechanisms in charge of stabilizing the network as a whole. Due to the importance of LB and HOO, in this paper, a unified self-management mechanism based on Fuzzy Logic and Reinforcement Learning is proposed. In particular, the proposed algorithm modifies handover parameters to optimize the main Key Performance Indicators related to LB and HOO. Results show that the proposed scheme effectively provides better performance than independent entities running simultaneously in the network. © 2014 Elsevier B.V. All rights reserved. 1. Introduction. In recent years, cellular networks have experienced a large increase in size and complexity. As a result, mobile operators have focused attention on reducing capital expenditures (CAPEX) and operational expenditures (OPEX) of their networks [1]. 
This fact has stimulated strong research activity in the field of Self-Organizing Networks (SON), a set of principles and concepts defined by the 3rd Generation Partnership Project (3GPP) for automating network management while improving network quality [2]. In the context of SON, certain functions have been identified as key enablers by the 3GPP, among which are Load Balancing (LB) and Handover Optimization (HOO). The former is an automated function where cells suffering occasional congestion can transfer load to neighbor cells that have spare resources, e.g. by adjusting mobility parameters. The latter is a solution for automatic detection and correction of errors and suboptimal settings in the mobility configuration, which may lead to a degradation of user performance. Many efforts in the research community have been devoted to the so-called Mobility LB (MLB) and Mobility Robustness Optimization (MRO), for which the 3GPP has specified particular features [3]. Typically, these functionalities are implemented at a low level in the network architecture, meaning that they operate quickly (i.e. at time scales of the order of seconds or less) and are located in each base station on the access network. [Footnote: http://dx.doi.org/10.1016/j.comnet.2014.10.027. 1389-1286/© 2014 Elsevier B.V. All rights reserved. ⇑ Corresponding author. Tel.: +34 952 134 164; fax: +34 952 132 027. E-mail addresses: pabloml@ic.uma.es (P. Muñoz), rbm@ic.uma.es (R. Barco), ibanderac@ic.uma.es (I. de la Bandera).] In this sense, little or no attention has been paid to LB and HOO at higher levels, e.g. at the level of the Operations, Administration, and Maintenance (OAM) system, which typically operates more slowly (i.e. at time scales of the order of minutes or even hours) and is not necessarily located in the base stations (e.g. it can be located in a
Computer Networks 76 (2015) 112–125 Contents lists available at ScienceDirect Computer Networks journal homepage: www.elsevier.com/locate/comnet
  • 3. server on the core network). Thus, network management at this level copes with slower changes in the network, whose impact on performance can be even more important, since the underlying variations to be tracked are typically rather slow as well [4]. In addition, the data available in the OAM system is much more abundant than in the base stations, thereby allowing more efficient and powerful network management. As a result, the implementation of this kind of algorithm will provide great benefits and cost savings to operators. As the deployment of stand-alone SON functions grows, the number of conflicts and dependencies between them increases. A conflict can happen if, for example, two individual SON functions optimize the same parameter with different goals at a network element [5]. As expected, conflicts may have a negative impact on network performance. The common solution in SON research has been to create an additional entity, usually called a coordinator, which manages the conflict. Typically, an entity causing conflict is switched off or limited in its control strategy, e.g. by decreasing the allowed range, the maximum allowed step sizes or the periodicity at which parameter control takes place. SON coordination has only recently been addressed in the literature. On the one hand, there are several studies with the aim of developing a functional framework for SON coordination [6–10]. On the other hand, further efforts have been devoted to specific solutions for coordination of two or more SON functionalities [11]. Special attention has been devoted to the coordination of MLB and MRO, addressed by the SOCRATES project [12]. In particular, the study assumes the control parameters of the MLB and MRO algorithms to be independent of each other, i.e. the two algorithms do not tune the same parameters. While the MLB function adjusts the HO margin (HOM), the MRO function adjusts the Time-To-Trigger (TTT) and hysteresis parameters. 
The interactions exist because these two functions influence the same Key Performance Indicators (KPIs) that are used as input for the optimization algorithms. In [13], a constraint on the connection quality more restrictive than the one assumed in the SOCRATES project is considered. In this sense, the MLB function is restrained in favor of the HO performance optimization. In [14], to avoid the conflict between MLB and MRO, the HOM range of MLB is dynamically adjusted according to the TTT and the hysteresis parameters, which are first adjusted considering the effect of the user speed. Although the coordinator-based schemes have been well accepted by the research community, there are some related issues. Specifically, the definition of operator policies becomes a complex task, since there exists a trade-off between proper controllability and ease of use [10]. In addition, when some limitations are applied to the control strategy (e.g. by restricting the step size), the optimal configuration may lie outside the space of possible solutions. Another problem is related to the prioritization of SON functions in a centralized coordination scheme, which is the typical implementation due to the required integration with a (centralized) legacy OAM system [15]. Under this situation, the coordination entity has to process many parameter configuration requests, so that the risk of monopolization by high-priority functions is high. Due to this, the joint optimization of SON functions has also been addressed. In [16], the problem of coordinating capacity and coverage optimization and MLB is addressed. Instead of implementing an additional entity that coordinates the outcomes of each independent function, these functions are combined into one algorithm and then the cellular network is optimized towards a joint target. Similarly, in [17], instead of controlling the conflict between independent MRO and MLB functions, a joint optimization algorithm is proposed. 
Such an algorithm adjusts the same HO parameters for individual users (i.e. each user has individual values of the same HO parameters). This solution reduces unnecessary HOs for some users that should not be handed over to the neighbor cell. However, it is noted that, at the level of the OAM system, this feature is hard to implement since statistics are rarely given at the per-user level, in addition to the high signaling cost that this kind of optimization would involve. In [18], the proposed MRO and MLB algorithm prioritizes the MRO part, since KPIs related to the connection quality (e.g. the radio link failure) are considered first. However, other important KPIs from the MRO viewpoint, such as those associated with unnecessary HOs, are not taken into account in the study, which makes it more difficult to achieve optimal performance. For all those reasons, in this paper, a novel unified algorithm for both LB and HOO in Long-Term Evolution (LTE) networks is proposed. This algorithm is based on a Fuzzy System (FS) that tunes the handover (HO) parameters at the cell adjacency level to improve network performance. The FS is optimized by the Q-Learning algorithm, which drives it to select the most appropriate action for LB reasons, HOO reasons, or both. The decision of which action the FS should take depends on past actions taken by the FS, whose impact on network performance was measured through the KPIs. With the proposed solution, the complexity of the SON coordination entity would be reduced, as it is freed from the coordination of two important SON functions. In addition, the proposed algorithm is expected to achieve better performance, as its space of candidate solutions is not as restricted as it would be if a coordinator-based scheme or some type of prioritization algorithm were used. The rest of the paper is organized as follows. 
Section 2 formulates the problem and introduces the mobility algorithm in LTE networks and the system performance metrics. In Section 3, the design of the proposed FS as well as its optimization process is described. Section 4 presents the simulation setup and discusses the simulation results. Finally, Section 5 presents the main conclusions of the study. 2. System model. The HO is the procedure that preserves the connection when the user moves around the network. As LTE is being deployed with a frequency reuse of one (i.e. the same frequency is shared by all cells), the intra-frequency HO is very common in these networks. More specifically, the most widely extended algorithm for the HO-triggering decision is the 3GPP A3 event [19]. Roughly, this algorithm
  • 4. triggers the execution of an HO if the neighbor cell becomes offset better than the serving cell during a specific time period determined by the TTT parameter. Formally, it is expressed as: $\mathrm{RSRP}_j > \mathrm{RSRP}_i + \mathrm{HOM}_{i \to j}$ (1), where $\mathrm{RSRP}_i$ and $\mathrm{RSRP}_j$ are the averaged values of the Reference Signal Received Power (RSRP) measured for serving cell $i$ and target cell $j$ respectively, and $\mathrm{HOM}_{i \to j}$ is the HOM from cell $i$ to cell $j$. Note that the symmetric $\mathrm{HOM}_{j \to i}$ is also defined in the opposite direction of the adjacency (i.e. a pair of cells that are neighbors). In contrast with HO-triggering decisions based on absolute comparisons (e.g. the serving/neighbor cell below/above a threshold), the A3 event consists of a relative comparison that simplifies the configuration of its parameters since they are independent of the absolute received power levels, which may depend on diverse context factors. However, the HOM in Eq. (1) is broken down into several terms by the 3GPP, so that: $\mathrm{HOM}_{i \to j} = Hys + Of_i - Of_j + Oc_i - Oc_j + Off$ (2), where $Hys$ and $Off$ are the hysteresis and offset parameters, respectively, for this event, $Of_i$ and $Oc_i$ are the frequency and cell specific offsets, respectively, for serving cell $i$, and $Of_j$ and $Oc_j$ are the frequency and cell specific offsets, respectively, for neighbor cell $j$. While only one value of $Hys$ and $Off$ corresponding to the A3 event can be used for all the cells and deployed frequencies in the network, $Oc_i$ and $Oc_j$ can be defined per cell and $Of_i$ and $Of_j$ can be defined per frequency layer. In addition to this, the definition of $Hys$ implies the existence of another inequality in which this term has opposite sign: $\mathrm{RSRP}_j < \mathrm{RSRP}_i - Hys + Of_i - Of_j + Oc_i - Oc_j + Off$ (3). This inequality is called the leaving condition for this event. Assuming that the entering condition given by Eq. (1) was previously satisfied, the leaving condition must be satisfied to reset the TTT parameter. 
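The entering and leaving conditions above can be illustrated with a short Python sketch. It is not the paper's simulator code: the names `A3Config` and `a3_trigger_index` are hypothetical, and expressing the TTT as a number of consecutive measurement samples is an assumption made only to keep the example small.

```python
from dataclasses import dataclass


@dataclass
class A3Config:
    """Illustrative A3 parameters in dB (names are assumptions, not 3GPP field names)."""
    hys: float = 0.0    # Hys, hysteresis of the event
    off: float = 0.0    # Off, event-specific offset
    of_i: float = 0.0   # Of_i, frequency offset of serving cell i
    of_j: float = 0.0   # Of_j, frequency offset of neighbor cell j
    oc_i: float = 0.0   # Oc_i, cell offset of serving cell i
    oc_j: float = 0.0   # Oc_j, cell offset of neighbor cell j
    ttt_samples: int = 3  # TTT expressed in samples (assumption)


def entering_margin(cfg: A3Config) -> float:
    """HOM_{i->j} as decomposed in Eq. (2)."""
    return cfg.hys + cfg.of_i - cfg.of_j + cfg.oc_i - cfg.oc_j + cfg.off


def leaving_margin(cfg: A3Config) -> float:
    """Right-hand-side margin of Eq. (3): same terms, Hys with opposite sign."""
    return -cfg.hys + cfg.of_i - cfg.of_j + cfg.oc_i - cfg.oc_j + cfg.off


def a3_trigger_index(rsrp_i, rsrp_j, cfg: A3Config):
    """Return the sample index at which the HO is triggered, or None."""
    running = False  # True once the entering condition has been met
    count = 0        # samples elapsed since the TTT timer started
    for k, (p_i, p_j) in enumerate(zip(rsrp_i, rsrp_j)):
        if not running:
            if p_j > p_i + entering_margin(cfg):   # Eq. (1)
                running, count = True, 1
        elif p_j < p_i + leaving_margin(cfg):      # Eq. (3) resets the timer
            running, count = False, 0
        else:
            count += 1
        if running and count >= cfg.ttt_samples:
            return k
    return None
```

With `hys=2.0` and `oc_i=1.0`, the entering margin is 3 dB and the leaving margin is -1 dB, so the timer starts only once the neighbor is more than 3 dB stronger and is reset only if the neighbor falls more than 1 dB below the serving cell.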
By optimizing $Hys$, the impact of signal fluctuations on the handover process can be effectively reduced. In general, the HOO function is directly related to the parameter $Hys$, so that its optimization could only be performed at the event level. Conversely, the LB function is more related to HO parameters that are defined at the cell level (e.g. $Oc_i$ and $Oc_j$). In practice, different parameters are optimum for different cell pairs (e.g. due to the shadowing variations). Hence, the model adopted in this paper is based on a joint optimization of LB and HOO at the cell level. Specifically, this model requires only one formula (e.g. the entering condition given by Eq. (1)) and one parameter, denoted as HOM and defined per adjacency. The only condition is that both HOM values in the adjacency (i.e. $\mathrm{HOM}_{i \to j}$ and $\mathrm{HOM}_{j \to i}$) are simultaneously tuned to perform the joint optimization of LB and HOO. Note that this model complies with the 3GPP specifications if the parameter $Hys$ is set to zero and the rest of the parameters are grouped into a single parameter defined per adjacency. To facilitate the joint optimization, the parameters need to be expressed in a more understandable way, according to the following relationship in an adjacency $x$: $H^{(x)} + O^{(x)} = \mathrm{HOM}^{(x)}_{i \to j}$ and $H^{(x)} - O^{(x)} = \mathrm{HOM}^{(x)}_{j \to i}$ (4), where $H^{(x)}$ is the parameter related to HOO that represents a hysteresis and $O^{(x)}$ is the parameter related to LB that represents a certain offset. Intuitively, the HOM determines the area where the users connected to a cell would perform an HO toward a neighbor cell. On the one hand, in the context of HOO and assuming that $O^{(x)} = 0$, the parameter $\mathrm{HOM}^{(x)}_{i \to j}$ and the symmetric $\mathrm{HOM}^{(x)}_{j \to i}$ can be set to the same positive value (given by $H^{(x)}$) so that a certain symmetric region between the two cells is ensured in order to avoid unnecessary HOs. In Fig. 1(a), the RSRP of the neighbor and the serving cells are represented. 
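As a small worked illustration of Eq. (4), the following Python sketch converts between the pair (hysteresis H, offset O) and the two per-adjacency margins. The function names are hypothetical and values are assumed to be in dB; the inverse mapping simply solves the two equations of Eq. (4) for H and O.

```python
def margins_from_components(h: float, o: float):
    """Eq. (4): return (HOM_{i->j}, HOM_{j->i}) from hysteresis H and offset O."""
    return h + o, h - o


def components_from_margins(hom_ij: float, hom_ji: float):
    """Solve Eq. (4) for (H, O): H is the mean of the margins, O half their difference."""
    return (hom_ij + hom_ji) / 2.0, (hom_ij - hom_ji) / 2.0
```

For example, H = 3 dB and O = 2 dB give margins of 5 dB from cell i to cell j and 1 dB from cell j to cell i, whereas O = 0 gives the symmetric case in which both margins equal H.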
For instance, if a user connected to cell $i$ moves to cell $j$, it connects to cell $j$ when the RSRP from cell $j$ is equal to the RSRP from cell $i$ plus $\mathrm{HOM}^{(x)}_{i \to j}$. As in this case the HOMs are assumed to be symmetric, the same value is applied to the opposite situation (i.e. when the user moves from cell $j$ to $i$, $\mathrm{HOM}^{(x)}_{j \to i}$ is used). Decreasing such a symmetric region (i.e. both HOMs) makes it easier for UEs to perform an HO to a neighbor cell. This is because the UE would need a lower received power from the target cell to trigger an HO. [Fig. 1. Adjustment of HOM for (a) MRO and (b) MLB purposes.] Conversely, increasing the HOMs makes it more difficult to perform an HO, meaning that the user spends more time attached to the serving cell, while the connection quality is getting worse, which could
  • 5. lead to call dropping. However, under this configuration, possible unnecessary HOs due to signal fluctuations can be avoided. In this sense, the HOO function aims to reduce the inefficient usage of network resources due to unnecessary HOs provided that the level of call dropping is low enough. An important KPI closely related to the problem of unnecessary HOs is the HO Ratio (HOR), defined in this work as the number of HOs divided by the total number of carried calls. On the other hand, maintaining the symmetric region, both HOMs can also be jointly tuned by means of $O^{(x)}$ (e.g. $\mathrm{HOM}^{(x)}_{i \to j}$ is increased while $\mathrm{HOM}^{(x)}_{j \to i}$ is decreased) so that the service area of these cells is modified for LB purposes. In this case, both HOMs are modified with the same magnitude to preserve the hysteresis region. However, those variations in HOM should have opposite sign to modify the service area of the two cells. In Fig. 1(b), it is observed that $\mathrm{HOM}^{(x)}_{i \to j}$ has been increased, while $\mathrm{HOM}^{(x)}_{j \to i}$ has been decreased. As a result, the service area of cell $i$ is larger. Considering, for example, that cell $j$ is overloaded, its service area is reduced while the service area of the adjacent cells with spare resources (e.g. cell $i$) is increased to take users from the congested cell edge. Thus, cells suffering occasional congestion can transfer load to neighbor cells, which have free resources, by adjusting mobility parameters. It is noted that, since HOMs are defined on an adjacency basis, cell service areas can be not only re-sized but also re-shaped. The most important benefit of LB is that call blocking in the network is reduced, especially in highly loaded cells. The function responsible for accepting or blocking a call is the Call Admission Control (CAC). Such a function checks the availability of free resources in the candidate cell before taking a decision. In this paper, a 'worst-case' criterion has been adopted to accept calls, i.e. 
the user is finally accepted if the highest number of radio resources needed to maintain a connection (worst-case requirement) is less than or equal to the number of radio resources available in the candidate cell. If the condition is not satisfied by any candidate cell, then the call is blocked. To quantify the call blocking, network operators typically use the Call Blocking Ratio (CBR), defined as the number of blocked calls divided by the number of call attempts. Finally, the actions performed by LB and HOO may also involve a decrease in the connection quality. To explain this, on the one hand, consider the HOM to be decreased for LB reasons. The target cell becomes more likely to be preferred to the serving cell, even if the connection quality is worse due to negative HOM values. If this is the case, some users, usually located at the cell edge, will be handed over to the target cell and experience worse radio conditions as a result of the performed HO. Thus, negative values of the HOM will increase the risk of dropping. On the other hand, when the HOM is increased for either LB or HOO reasons, the user will spend more time attached to the serving cell, delaying an HO towards cells with better radio conditions. In this case, the probability of call dropping will also increase. For these reasons, a KPI widely used by network operators is the Call Dropping Ratio (CDR), defined as the number of dropped calls divided by the number of finished calls. Typically, call dropping may occur because the connection quality is bad, but also because there are no available resources due to an overload situation. In this paper, the calculation of the CDR includes those dropped calls due to bad radio conditions. In particular, a call is dropped when the Signal-to-Interference-plus-Noise Ratio (SINR) is below a certain threshold during a specific time interval. 
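The KPI definitions and the worst-case admission rule described above can be summarized in a minimal Python sketch. The function names are hypothetical, returning 0.0 for an empty denominator is an assumption made only to avoid a division by zero, and scanning candidate cells in list order is also an assumption, since the text only states that the call is blocked when no candidate cell satisfies the condition.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Safe ratio; 0.0 for an empty denominator is an assumption of this sketch."""
    return numerator / denominator if denominator > 0 else 0.0


def handover_ratio(num_handovers: int, num_carried_calls: int) -> float:
    """HOR: number of HOs divided by the total number of carried calls."""
    return _ratio(num_handovers, num_carried_calls)


def call_blocking_ratio(num_blocked: int, num_attempts: int) -> float:
    """CBR: blocked calls divided by call attempts."""
    return _ratio(num_blocked, num_attempts)


def call_dropping_ratio(num_dropped: int, num_finished: int) -> float:
    """CDR: dropped calls divided by finished calls."""
    return _ratio(num_dropped, num_finished)


def admit_call(worst_case_resources: int, free_resources_per_cell):
    """Worst-case CAC: index of the first candidate cell that can host the call, else None."""
    for idx, free in enumerate(free_resources_per_cell):
        if worst_case_resources <= free:
            return idx
    return None  # no candidate cell satisfies the condition: the call is blocked
```

A call needing at most 4 radio resources is accepted by the third candidate in `[2, 3, 6]`, and is blocked if every candidate has fewer than 4 free resources.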
The call dropping due to an overload situation is assumed to be negligible due to the correct operation of the CAC in the network, since enough resources are guaranteed by the CAC for the accepted calls. 3. Joint optimization algorithm. This section explains the proposed algorithm for LB and HOO. The first part comprises the design of the FS, describing the inputs, the outputs and the behavior of the system. After this, the second part of the section is devoted to the optimization technique that is used to lead the FS in the action selection. 3.1. The fuzzy system. As an alternative to Classical Logic, Fuzzy Logic is a mathematical discipline that introduces a degree of vagueness when an assertion is made [20]. The design of an FS for control problems is one of the most important application areas of Fuzzy Logic [21]. Its main benefit is that controlling a system can be performed by using linguistic terms such as high or low instead of providing a numerical value when defining the reference values of the controller. Experience has shown that Fuzzy Logic Controllers (FLCs) can provide results superior to those obtained by conventional control algorithms. In particular, the methodology of the FLC becomes very useful when the processes are too complex for analysis by conventional quantitative techniques or when the available sources of information are interpreted qualitatively, inexactly, or uncertainly [22]. The proposed FS is designed on the basis of FLCs, but there are some differences. From the operational perspective, it combines the functionalities of LB and HOO, i.e. to decrease the call blocking and the HO signaling load, respectively, while at the same time the connection quality is preserved. To achieve this, the LB part is inspired by the LB algorithm proposed in [23] and the HOO part is inspired by the HOO algorithm proposed in [24]. 
In both cases, the developed algorithms only implement a single SON function which iteratively adjusts the HOM to optimize the respective KPIs. However, in the case of LB, the HOM variations in both directions of any adjacency have the same magnitude but opposite sign. In this paper, this is equivalent to modifying $O^{(x)}$, as shown in Eq. (4). In the case of HOO, only the magnitude of the HOM variations is changed (i.e. the sign remains unchanged), which is equivalent to adjusting $H^{(x)}$ in Eq. (4). The design of an FS that integrates both functionalities poses new challenges. Typically, in the context of FLCs, if the system is composed of two outputs, the design is broken down into two new FLCs with one output each. In this paper, the proposed joint optimization algorithm affects two different components of the HOM parameter, $O^{(x)}$ and $H^{(x)}$, whose combination results in the values of
$\mathrm{HOM}^{(x)}_{i \to j}$ and $\mathrm{HOM}^{(x)}_{j \to i}$, both set in the adjacency $x$. In principle, this design may require two separate FLCs as two outputs are involved. However, such a solution is directly related to the coordinator-based schemes mentioned in Section 1, in which the FLCs would be coordinated by an upper-level entity. To avoid the problems linked to this kind of solution, in this paper, the proposed FS integrates both functionalities into one entity, whose structure is depicted in Fig. 2. It is assumed that each adjacency in the network has this entity implemented.

As observed, the inputs of the FS in the adjacency $x$ are $\mathrm{CBR}^{(x)}_{ji}$, $\mathrm{HOR}^{(x)}$ and $\mathrm{HOM}^{(x)}_{ij}$. The first input is a derived KPI, calculated as:

$$\mathrm{CBR}^{(x)}_{ji} = \mathrm{CBR}^{(x)}_{j} - \mathrm{CBR}^{(x)}_{i}, \tag{5}$$

where $\mathrm{CBR}^{(x)}_{i}$ and $\mathrm{CBR}^{(x)}_{j}$ are the CBR measured in cell $i$ and cell $j$, respectively. This input allows the traffic to be balanced between adjacent cells. In principle, $\mathrm{CBR}^{(x)}_{ji}$ could take negative values if cell $i$ has a higher CBR than cell $j$. However, to simplify the behavior of the FS, the FS is always applied in the direction of the adjacency in which cell $j$ has a CBR equal to or higher than that of cell $i$.

The second input, $\mathrm{HOR}^{(x)}$, is used to reduce the HO signaling load when possible. This KPI is calculated considering the HOs and calls carried in both cells of the adjacency. The third input, $\mathrm{HOM}^{(x)}_{ij}$, is the current value of the HOM, whose aim is to determine when the HOM is reaching high values, in which case the connection quality would be significantly impacted. More precisely, the current value of the HOM is the one taken in the opposite direction of the CBR difference, i.e. from cell $i$ to cell $j$.

Finally, the output of the FS, called $Y^{(x)}$, is a variable whose possible values refer to simultaneous variations in $O^{(x)}$ and $H^{(x)}$.
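The direction convention for the first input can be illustrated with a short Python sketch. It only shows Eq. (5) together with the rule that the FS is applied with cell j as the more blocked cell; the names `OrientedAdjacency` and `orient_adjacency` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class OrientedAdjacency:
    """Adjacency oriented so that cell j has the higher (or equal) CBR."""
    cell_i: str    # less blocked cell
    cell_j: str    # more blocked cell
    cbr_ji: float  # CBR_j - CBR_i as in Eq. (5), never negative


def orient_adjacency(cell_a, cbr_a, cell_b, cbr_b):
    """Label the two cells as i and j and compute the derived KPI of Eq. (5)."""
    if cbr_a <= cbr_b:
        return OrientedAdjacency(cell_a, cell_b, cbr_b - cbr_a)
    return OrientedAdjacency(cell_b, cell_a, cbr_a - cbr_b)


# Cell "A" blocks 5% of its calls and cell "B" 1%, so "A" plays the role of cell j.
adjacency = orient_adjacency("A", 0.05, "B", 0.01)
```

With this orientation, the third input of the FS would be the margin from `cell_i` to `cell_j`, i.e. the direction opposite to the CBR difference.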
More specifically, one output value is given by the concatenation of two fields, the former corresponding to the variation in $O^{(x)}$ and the latter to the variation in $H^{(x)}$, i.e.:

$$Y^{(x)} = (\Delta O^{(x)}, \Delta H^{(x)}). \tag{6}$$

This solution overcomes the problem of FLCs in which the output is a single variable that can only take discrete or continuous values in a certain range. Let us also assume that the variation for the component $O^{(x)}$ in a certain step of the algorithm can only be $-\delta$, $0$ or $+\delta$, while the variation for $H^{(x)}$ can only be $-\tau$, $0$ or $+\tau$. For example, if the output is $Y^{(x)} = (+\delta, 0)$, then the assignment can be formally expressed as:

$$O^{(x)}(t+1) \leftarrow O^{(x)}(t) + \delta, \tag{7}$$

$$H^{(x)}(t+1) \leftarrow H^{(x)}(t) + 0. \tag{8}$$

Fig. 2. Scheme of the proposed FS.

Once the inputs and the output have been determined, the next step is their characterization from the fuzzy logic perspective. Starting with the inputs, it is necessary to define the fuzzy sets and membership functions associated with them, as shown in Fig. 3. Each fuzzy set should be identified with a linguistic term (e.g. 'low' or 'very low'). The need to work with fuzzy sets comes from the existence of concepts with no clear boundaries in their definition. In this context, when working with KPIs, it is often difficult to determine from which value a KPI is considered to be jeopardized. For this reason, two fuzzy sets, 'low' and 'high', have been defined for each input. In the case of the HOM, the objective of defining two fuzzy sets is to identify when the HOM is close to saturation, since the CDR may be negatively affected. In addition, for each fuzzy set of the inputs, a membership function, denoted by $\mu_V(u)$, quantifies the degree of membership of a given input value $u$ to a certain fuzzy set $V$, with a value between 0 and 1. Thus,
unlike in classical sets, the transition between both values is gradual. For simplicity, as shown in Fig. 3, the selected membership functions are triangle-shaped or trapezoid-shaped.

The core of the FS is given by a rule base that represents the dynamic behavior of the FS through a set of linguistic rules derived from expert knowledge. Such a rule base comprises a collection of fuzzy rules following an IF-THEN syntax to set the control strategy, e.g.:

$$\text{IF } (\mathrm{CBR}^{(x)}_{ji} \text{ is high}) \;\&\; (\mathrm{HOR}^{(x)} \text{ is low}) \;\&\; (\mathrm{HOM}^{(x)}_{ij} \text{ is low}) \text{ THEN } Y^{(x)} = (+\delta, 0). \tag{9}$$

To define these rules, the knowledge and experience of human experts is normally required. Each antecedent of the rules represents an input state and the number of rules is derived from the combination of all fuzzy sets among the different inputs. In this sense, the definition of the rule base must be complete (i.e. all fuzzy rules defined), so that the FS can generate an appropriate action for every input state in the system.

The rule base for the proposed FS is shown in Table 1. The definition of each rule is as follows. First, rule 1 is activated when the CBR is balanced, and the HOR and HOM remain at low values, meaning that no change in HOM is needed (i.e. $Y^{(x)} = [0, 0]$). Rules 2–4 have in common that they are triggered when the HOM is low. As stated before, a low HOM means that the connection quality is not jeopardized due to changes in HOM, so that both LB and HOO actions can be applied without any restriction. For this reason, rule 2, which is triggered when only the CBR presents undesired behavior, implements an LB action (i.e. $Y^{(x)} = [+\delta, 0]$). Specifically, this rule shrinks the service area of the congested cell, in order to send traffic to the adjacent cell. Rule 3 is activated due to HOO reasons, i.e. when the HOR reaches high values. In this case, the proposed solution is to increase the symmetric region between adjacent cells (i.e.
$Y^{(x)} = [0, +\tau]$), so that the number of unnecessary HOs can be reduced. To activate rule 4, the CBR and HOR must have high values simultaneously. In principle, it is unclear whether the performed action should be related to LB or to HOO. This decision may depend on the particular scenario, so that trial-and-error strategies (e.g. reinforcement learning) are appropriate in this case. The next section explains how to select the optimal consequent for this rule.

The remaining rules (i.e. rules 5, 6, 7 and 8) are all linked to high values of the HOM, which may indicate that the connection quality is very poor, especially for cell-edge users. In rule 5, the CBR and HOR must exhibit low values to activate this rule, meaning that the problems related to LB/HOO were mitigated by increasing the HOM. The objective of rule 5 is to decrease the HOM, since the high HOM may no longer be necessary. If it still is, the concerned KPI will be affected and, therefore, the FS will perform the appropriate action. The problem is that such a decrease in HOM can be due to LB or HOO reasons. As in rule 4, the application of trial-and-error strategies will help to make this decision in those situations.

Rules 6 and 7 are related to more extreme situations in which the existing problem has led the FS to reach high values of HOM, but it has not been mitigated. In addition, the fact that the HOM reaches large values due to successive LB and HOO actions may negatively affect the network performance. In this sense, there are essentially two different situations related to this issue. One is given by severe congestion, in which LB actions would greatly modify service areas. As a result, any HOO action to reduce unnecessary HOs (e.g. due to high mobility) would involve larger HOM values that may degrade the performance of cell-edge users.
Given that the HOM is saturated due to LB, the action of rule 6 should be to lead the HOM towards lower values or leave it unchanged, so that subsequent HOO actions could be effectively applied. The other situation is given by the presence of very high mobility in the network, in which the HOM will reach high values as a result of HOO actions. Under this assumption, any congestion arising in the network would lead the LB part to work with large HOM values, which is not desirable. Thus, rule 7 is intended to leave unchanged, or even reduce, the symmetric region, provided that the HOM is saturated due to HOO reasons.

Fig. 3. Membership functions of the fuzzy sets ('low' and 'high') for each input: (a) $\mathrm{CBR}^{(x)}_{ji}$, with an axis tick at 0.03; (b) $\mathrm{HOR}^{(x)}$, with an axis tick at 1; and (c) $\mathrm{HOM}^{(x)}_{ij}$, with axis ticks at 6 and 8 dB.

Table 1
Proposed sets of fuzzy rules (L = low, H = high).

Rule no.   Input 1: CBR_ji   Input 2: HOR   Input 3: HOM_ij   Candidate action(s) [ΔO, ΔH]
1          L                 L              L                 [0, 0]
2          H                 L              L                 [+δ, 0]
3          L                 H              L                 [0, +τ]
4          H                 H              L                 [+δ, 0], [0, +τ]
5          L                 L              H                 [−δ, 0], [0, −τ]
6          H                 L              H                 [0, 0], [−δ, 0]
7          L                 H              H                 [0, 0], [0, −τ]
8          H                 H              H                 [−δ, 0], [0, −τ]
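Table 1 and the membership functions of Fig. 3 can be encoded compactly. The Python sketch below is only illustrative: the breakpoints 0.03, 1, 6 and 8 dB are read from the axis ticks of Fig. 3, while the linear ramps between breakpoints and the ramp start at 0 for the CBR and HOR inputs are assumptions, since the exact shapes are not recoverable here. Rule strength is taken as the minimum of the three membership degrees and the strongest rule is selected, as the paper does in Eqs. (10) and (11). Actions are expressed in units of one O-step and one H-step, and all identifiers are hypothetical.

```python
# (ramp start, ramp end) of the 'high' set for each input. The end points come
# from the axis ticks of Fig. 3; the 0.0 starts and the linear shape are assumed.
RAMPS = {"cbr": (0.0, 0.03), "hor": (0.0, 1.0), "hom": (6.0, 8.0)}


def fuzzify(value, ramp):
    """Return the membership degrees of value to the 'L' and 'H' fuzzy sets."""
    start, end = ramp
    high = min(1.0, max(0.0, (value - start) / (end - start)))
    return {"L": 1.0 - high, "H": high}


# Table 1: (CBR, HOR, HOM) labels -> (rule number, candidate actions).
# Each action is (dO, dH) in units of one O-step and one H-step.
RULES = {
    ("L", "L", "L"): (1, [(0, 0)]),
    ("H", "L", "L"): (2, [(+1, 0)]),
    ("L", "H", "L"): (3, [(0, +1)]),
    ("H", "H", "L"): (4, [(+1, 0), (0, +1)]),
    ("L", "L", "H"): (5, [(-1, 0), (0, -1)]),
    ("H", "L", "H"): (6, [(0, 0), (-1, 0)]),
    ("L", "H", "H"): (7, [(0, 0), (0, -1)]),
    ("H", "H", "H"): (8, [(-1, 0), (0, -1)]),
}


def strongest_rule(cbr_ji, hor, hom_ij):
    """Return (rule number, degree of truth, candidate actions) of the winning rule."""
    memberships = (
        fuzzify(cbr_ji, RAMPS["cbr"]),
        fuzzify(hor, RAMPS["hor"]),
        fuzzify(hom_ij, RAMPS["hom"]),
    )

    def truth(labels):
        # min-operator over the three antecedents
        return min(m[label] for m, label in zip(memberships, labels))

    best = max(RULES, key=truth)
    number, actions = RULES[best]
    return number, truth(best), actions


# Congested adjacency, few handovers, small margin: rule 2 (pure LB) wins.
rule_number, degree, candidates = strongest_rule(cbr_ji=0.05, hor=0.2, hom_ij=2.0)
```

Rules with more than one candidate action (4 to 8) are returned with the full candidate list; which of them is finally applied is the decision left to the learning stage.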
Finally, rule 8 refers to a situation in which the CBR and HOR are simultaneously high, which may occur, for example, when there are both severe congestion and high-mobility users in the network. In this case, the solution could be to enlarge the service area of the congested cell or to reduce the symmetric region in the adjacency. Since the objective would be to favor HOO or LB, respectively, the option that simultaneously enlarges the service area of the congested cell and reduces the symmetric region in the adjacency is discarded. Other alternatives would cause the connection quality to be significantly worsened. The specific action for this rule will also be determined by the optimization algorithm explained in the next section.

Once the FS has been defined, its operation is as follows. As shown in Fig. 2, the first step is given by the fuzzifier, the process by which membership values (one for each fuzzy value of the linguistic variable) are assigned to a numerical input value by using the membership functions. The next step of the FS is given by the inference, which calculates the degree of truth of each activated rule as follows:

$$\alpha^{(x)}_{k} = \mu_{K_1}(\mathrm{CBR}^{(x)}_{ji}) \ast \mu_{K_2}(\mathrm{HOR}^{(x)}) \ast \mu_{K_3}(\mathrm{HOM}^{(x)}_{ij}), \tag{10}$$

where $\alpha^{(x)}_{k}$ is the degree of truth for the rule $k$ in the adjacency $x$, and $\mu_{K_1}$, $\mu_{K_2}$ and $\mu_{K_3}$ are the membership functions corresponding to the fuzzy sets $K_1$, $K_2$ and $K_3$, respectively, involved in the rule $k$. The intersection of the fuzzy sets, denoted here by '$\ast$', is implemented by using the min-operator, which takes the minimum value of the arguments.

Finally, unlike the structure of a typical FLC, the proposed FS does not implement the module known as defuzzifier, where the activated fuzzy rules are all aggregated to produce a non-fuzzy value. The reason for this is that the output of the proposed scheme is given by a two-dimensional variable whose elements are not correlated with each other (e.g.
$O^{(x)}$ can be either increased or decreased while $H^{(x)}$ does not change). Thus, the fuzziness between different consequents would not be applicable in this work. Instead, the output of the proposed FS is given by the consequent of the rule whose degree of truth is the highest. It can be formally expressed as:

$$\mathrm{output}^{(x)} = Y^{(x)}_{\arg\max_k \alpha^{(x)}_{k}}. \tag{11}$$

Finally, to download these changes to the network configuration, two more steps are necessary. As represented in Fig. 2, the former is the update of $O^{(x)}$ and $H^{(x)}$ considering the parameter variations given by the FS, and the latter is the conversion from $O^{(x)}$ and $H^{(x)}$ to $\mathrm{HOM}^{(x)}_{i \to j}$ and $\mathrm{HOM}^{(x)}_{j \to i}$ given their relationship expressed in Eq. (4).

3.2. Optimization of the fuzzy system

In the proposed FS, there are certain rules (4–8 in Table 1) with more than one consequent defined. This means that, a priori, it is unclear which is the most appropriate action for these rules, since it may depend on many context factors (e.g. the environment, the traffic distribution patterns, the user mobility, etc.) at the moment of the execution of the rules, or simply on the interactions between the objectives of LB and HOO. Different strategies have been investigated to create, adapt or refine rules [25–29]. In this sense, mobile operators usually do not have the complete knowledge to take proper actions in every network state. Thus, due to the complex nature of network management, Reinforcement Learning (RL) is of particular interest in this context, as the system is able to learn from its own experience. In addition, unlike other mathematical approaches (e.g. supervised learning in Neural Networks), RL does not require a training data set. For this reason, in this work, the popular RL algorithm known as Q-Learning has been adopted, so that the best consequent for each fuzzy rule can be found through learning from interaction.
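As a preview of the learning loop detailed in the rest of this section, the following Python sketch shows a tabular Q-Learning step in which each state is a fuzzy rule and each action is one of its candidate consequents. It follows the structure of Eqs. (13)–(19): a reward of +c or −c depending on whether the worst CDR of the adjacency is within a threshold, a greedy choice with occasional random exploration, and an update towards the reward plus the discounted value of the next state. The value c = 10 is the one stated in the paper; the threshold, discount factor, learning rate and exploration probability used in the usage lines are placeholders, and all function names are hypothetical.

```python
import random

C = 10.0  # reward magnitude (c = 10 is assumed in the paper)


def reward(cdr_i, cdr_j, cdr_threshold):
    """Eqs. (13)-(14): +C if the worst CDR of the adjacency is acceptable, else -C."""
    return C if max(cdr_i, cdr_j) <= cdr_threshold else -C


def select_action(q_row, epsilon, rng):
    """Eqs. (15)-(16): random candidate with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def q_update(q, state, action, r, next_state, gamma, eta):
    """Eqs. (17)-(19): move q[state][action] towards r + gamma * v[next_state]."""
    v_next = max(q[next_state])                        # value of the new state
    delta_q = r + gamma * v_next - q[state][action]    # error signal
    q[state][action] += eta * delta_q
    return delta_q


# q-table indexed by rule number; rule 4 has two candidate consequents, rule 1 one.
q = {4: [0.0, 0.0], 1: [0.0]}
rng = random.Random(0)
action = select_action(q[4], epsilon=0.0, rng=rng)        # greedy choice
r = reward(cdr_i=0.01, cdr_j=0.02, cdr_threshold=0.05)    # placeholder threshold
q_update(q, state=4, action=action, r=r, next_state=1, gamma=0.7, eta=0.5)
```

Because only the single strongest rule is active at each step, one table entry is updated per optimization loop, which is what distinguishes this scheme from the usual fuzzy Q-Learning update spread over several triggered rules.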
The combination of fuzzy logic and RL has been addressed in some previous works [29–33]. However, the proposed optimization algorithm differs from the common implementation of the fuzzy Q-Learning algorithm [34]. This is because, in the case of a typical FLC, the q-function (i.e. a characteristic function of fuzzy Q-Learning optimization) is updated according to the degree of activation of each triggered rule of the FLC. As a consequence, the q-function can be updated for more than one input state, i.e. there exists a certain degree of fuzziness. Conversely, in the case of the proposed fuzzy system, the update of the q-function is only made for one input state, as only one rule can be activated at each optimization step.

In RL, an agent is driven to take actions in an environment in order to maximize a cumulative reward. The optimization scheme showing the combination of the FS and the learning entity is depicted in Fig. 4. The basic elements in RL are the agent, the environment, the states, the actions, the policy, the reward and the value function. In this case, the agent that takes the actions is the proposed FS, while the environment corresponds to the cellular network. The states are given by the combination of the fuzzy sets of the FS. Note that, for each state, there is one fuzzy rule defined. The actions are given by the candidate consequents of the rules and they represent a specific variation in the HOM. The policy defines how the agent has to act at a given time.

Fig. 4. Optimization scheme.

The reward is a numerical value that expresses the intrinsic desirability of being in a certain state. While
the reward indicates what is good in an immediate sense, the value function specifies what is good in the long run. In particular, the value function is a mapping between each state and the total amount of reward that an agent can expect to accumulate over the future, starting from that state. In this sense, the objective of the agent is not to obtain the maximum immediate reward, but to maximize the total reward that the agent receives in the long run.

RL methods are characterized by two important features: the trial-and-error search and the fact that actions may affect not only the immediate reward but also the subsequent rewards. As previously stated, the agent has to maximize the reward received in the long term (or expected cumulative reward), which is the sum of the rewards that will be obtained from the input states visited in the future:

$$R_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \tag{12}$$

where $r$ is the numerical reward obtained at each optimization step after performing an action and $\gamma$ is the discount rate determining the relative importance of future rewards. In this paper, the action performed by an activated fuzzy rule is rewarded positively if the connection quality is not significantly degraded. The immediate reward, $r$, can be formally expressed by defining a specific threshold for the CDR, which is the KPI that estimates the connection quality, as stated in Section 2. Then, those actions leading to a CDR equal to or less than the threshold should be rewarded with a positive value, while those actions producing a CDR higher than the threshold should be punished with a negative value. Considering this, the formula for the reward is expressed as:

$$r^{(x)} = \begin{cases} c & \text{if } \mathrm{CDR}^{(x)}_{\mathrm{measured}} \le \mathrm{CDR}_{th}, \\ -c & \text{otherwise,} \end{cases} \tag{13}$$

where $r^{(x)}$ is the reward for the adjacency $x$ and $c$ is a constant that can be expressed as a common factor in the definition of the reward in Eq.
(12), so that the effect is a scaling transformation that can be used to avoid underflow/overflow issues in storing the q-function. In this paper, $c = 10$ is assumed. In addition, $\mathrm{CDR}_{th}$ is the threshold defined at the network level to determine bad quality and $\mathrm{CDR}^{(x)}_{\mathrm{measured}}$ is the maximum CDR between both cells in the adjacency, i.e.:

$$\mathrm{CDR}^{(x)}_{\mathrm{measured}} = \max\left\{ \mathrm{CDR}^{(x)}_{i}, \mathrm{CDR}^{(x)}_{j} \right\}. \tag{14}$$

In the proposed FS, to quantify the benefits of executing a certain rule consequent (i.e. the action) provided that a rule has been activated (i.e. the state), the value $q$ of a state-action pair $[s, a]$ is defined. It is a discrete function, denoted by $q[s, a]$, that expresses the expected cumulative reward that can be received when taking action $a$ from state $s$. In this work, a discrete version of the Q-Learning algorithm is considered, where the learned q-function directly approximates the optimal one independently of the policy followed by the agent [35].

The pseudo-code of the optimization algorithm is shown in Fig. 5. After initializing the q-function, the selection of the consequent for each rule (step 1) is made by using a certain exploration/exploitation policy. Exploration is needed since trying actions that have not yet been selected is the only way to discover new actions that could provide much more reward than other actions already tested. Exploitation is also needed since the current knowledge must be exploited to obtain reward. A widely-used policy is the so-called $\epsilon$-greedy policy, defined as:

$$a_i = \arg\max_k q[i, k] \quad \text{with probability } 1 - \epsilon, \tag{15}$$

$$a_i = \mathrm{random}\{a_k;\ k = 1, 2, \ldots, J\} \quad \text{with probability } \epsilon, \tag{16}$$

where $a_i$ is the selected consequent for rule $i$ and $\epsilon$ determines the trade-off between exploration and exploitation during the optimization process (e.g. $\epsilon = 0$ means no exploration, so that the best action is always selected).

Each time an action (i.e.
a variation in HOM) is performed, the network should evolve to a new state, $s'$, in which the KPIs are collected again. At this time, the reward of the action is computed by using Eq. (13), as stated in step 2 (Fig. 5). Then, the so-called value of the new state, denoted by $v[s']$, is calculated as:

$$v[s'] = \max_k q[s', a_k]. \tag{17}$$

Fig. 5. Pseudo-code of the optimization algorithm.

While the q-function quantifies the value of taking an action when starting from a given state, the v-function estimates the value of being in that state regardless of the action to be taken. Note also that the new state $s'$ is specified by the new activated fuzzy rule in the FS. From $v[s']$, an error signal is calculated as follows:

$$\Delta q = r + \gamma \cdot v[s'] - q[s, a_i], \tag{18}$$

where $\gamma$ is a discount factor. As observed, the first part of the formula is the q-function calculated as the sum of the immediate reward $r$ for state $s$ and the expected value of the next state, $v[s']$. This is equivalent to Eq. (12), where the immediate reward and future rewards (i.e. the expected value of the next state) are accumulated. The last part in Eq. (18) is taken from the stored $q[s, a]$. As a result, $q[s, a]$ will be updated in the direction of the optimal q-function independently of the policy followed by the agent
(step 3 in Fig. 5). Such an update is made by utilizing an ordinary gradient descent, i.e.:

$$q[s, a_i] \leftarrow q[s, a_i] + \eta \cdot \Delta q, \tag{19}$$

where $\eta$ is a learning rate. The above-described process is repeated for the new current state (steps 4 and 5 in Fig. 5), starting with the action selection (step 1).

4. Performance analysis

4.1. Analysis setup

To assess the performance of the proposed joint optimization algorithm, a dynamic system-level simulator for LTE macrocells has been used [36]. This simulator executes a selectable number of optimization loops to emulate the tuning process. Each loop comprises 7000 simulation steps, equivalent to 12 min of actual network time. Each simulation step includes updating user positions, propagation computation, generation of new calls, and radio resource management algorithms. At the end of each loop, measurements and reliable statistics are obtained to be used in the following optimization loop. Thus, in a certain loop, steps 1–5 of the algorithm described in Fig. 5 are executed once.

The simulated scenario includes a macro-cellular environment with a layout consisting of 19 tri-sectorized sites evenly distributed in the scenario, as shown in Fig. 6. The main simulation parameters are summarized in Table 2. For simplicity, only the downlink is considered in the simulation. The service provided to users is the voice call, as it is the main service affected by the tuning process. The traffic is unevenly distributed in space, with some cells in the center of the scenario having higher traffic density than the surrounding cells.

Fig. 6. Block diagram of the simulation process.

In addition, to thoroughly assess the proposed algorithm, three different configurations have been considered. Firstly, the simulated high load scenario is
Table 2
Simulation parameters.

Parameter                Configuration
Cellular layout          Hexagonal grid, 57 cells (3 × 19 sites), cell radius 0.5 km
Transmission direction   Downlink
Carrier frequency        2.0 GHz
System bandwidth         1.4 MHz
Frequency reuse          1
Propagation model        Okumura–Hata with wrap-around; log-normal slow fading, σ_sf = 8 dB and correlation distance = 50 m
Channel model            Multipath fading, EPA model
Mobility model           Random direction; low speed = 3 km/h, high speed = 50 km/h
Service model            Constant bit rate (voice call), Poisson traffic arrival, mean call duration 120 s, 16 kbps
Base station model       Tri-sectorized antenna, SISO, EIRPmax = 43 dBm
Scheduler                Time domain: Round-Robin; frequency domain: Best Channel
Power control            Equal transmit power per PRB
Link adaptation          Fast, CQI based, perfect estimation
Handover                 Time-To-Trigger = 100 ms; HOM: [−24, 24] dB
Call dropping            SINR −6.9 dB
Traffic distribution     Unevenly distributed in space
Time resolution          100 TTI (100 ms)
Loop time                12 min
Simulation duration      3200 min
Optimization algorithm   δ = 1 dB, τ = 0.5 dB
given by the presence of a greater number of users moving at low speed (3 km/h) around the scenario, where the CBR is expected to be high. Secondly, the simulated high mobility scenario is given by the presence of high-speed users (50 km/h), which in principle would lead to a high HOR. In this case, the number of users is not high, but the unevenly distributed traffic in the scenario can lead to congestion situations, especially in the central area. Finally, the third scenario is a combination of the two previous scenarios, so that high-load and high-speed users are simulated.

To compare the proposed method with reference cases, as shown in Fig. 6, the independent SON functions of LB and HOO, taken from [23,24] respectively, have been implemented and simulated in two different ways. In one of them, only one functionality is active in the network, while in the other configuration both LB and HOO functions are simultaneously executed in an uncoordinated way. In addition, a baseline optimization scheme following the main principles addressed in [18] has been implemented. This scheme prioritizes the HOO part depending on whether the connection quality is jeopardized or not. More specifically, if the CDR is above a certain threshold, only the HOO function is executed.

The performance of these approaches will be assessed by looking at the main related KPIs, in particular, the overall HOR, CBR and CDR. A Figure-of-Merit (FoM), U, that combines the previous KPIs into a scalar value has also been considered. This FoM characterizes, qualitatively, the overall performance of the evaluated approaches. Formally, U is defined as [37]:

$$U = k \cdot \left( \mathrm{CBR}[\%] + (1 - \mathrm{CBR}[\%]/100) \cdot \mathrm{CDR}[\%] \right) + \mathrm{HOR}, \tag{20}$$

where $k$ is a constant weight determining the relative importance of the CBR and CDR (both related to user dissatisfaction) compared with the HO signaling cost given by HOR. In this study, $k = 1$ is assumed.

4.2. Simulation results

First, a sensitivity analysis for determining the optimal values of $\delta$ and $\tau$ (i.e. the variation of the O and H components, respectively) has been carried out. Fig. 7 shows the mean of the related KPIs and the proposed FoM, U, for the three different situations: high-load, high-mobility and both together. As observed, U is a combination of the KPIs related to user dissatisfaction (i.e. the CBR and CDR) and the KPI related to the HO signaling cost (i.e. the HOR).

In the high-load scenario (Fig. 7(a)), the variations of $\delta$ and $\tau$ have low impact on HOR since users have low mobility, meaning that the impact of HOR on U will also be minor. Due to this, the variations in U are mainly given by the user dissatisfaction. In this sense, there is a clear trade-off between CBR and CDR, i.e. as the CBR is reduced (by increasing $\delta$), the CDR grows. However, for high values of $\delta$, the variations in CBR are greater than in CDR. As a consequence, the best values of U (i.e. the lowest) correspond to high values of $\delta$. This is in contrast to the situations with high-mobility, as explained below.

Fig. 7. Sensitivity analysis for $\delta$ and $\tau$ in different scenarios: (a) high-load, (b) high-mobility and (c) high-load and high-mobility.
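Since U drives the comparisons in this section, a small worked example of Eq. (20) may help. The Python helper below simply evaluates the formula with the weight equal to 1 used in this study; the function name and the input values are illustrative and are not simulation results.

```python
def figure_of_merit(cbr_pct, cdr_pct, hor, weight=1.0):
    """Eq. (20): user dissatisfaction (blocked calls plus dropped accepted calls),
    weighted and added to the HO signaling cost given by HOR."""
    dissatisfaction = cbr_pct + (1.0 - cbr_pct / 100.0) * cdr_pct
    return weight * dissatisfaction + hor


# Illustrative inputs: 5% blocking, 2% dropping, HOR of 1.
# U = 5 + 0.95 * 2 + 1 = 7.9
u_example = figure_of_merit(cbr_pct=5.0, cdr_pct=2.0, hor=1.0)
```

The factor (1 − CBR/100) reflects that only accepted calls can be dropped, so a call is counted once as blocked or once as dropped. Lower values of U are better.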
The second scenario, given by high-mobility (Fig. 7(b)), shows that HOR increases for larger values of $\delta$, especially when $\tau$ is 1 dB. The reason for this is that resizing the cell service areas for load balancing purposes leaves cell-edge users under worse radio conditions after performing an HO, so that the probability of performing a new HO to other neighbor cells is increased. Conversely, provided that $\delta$ is low (avoiding the effect of load balancing on HOR), HOR decreases for larger values of $\tau$. This is in line with the optimization of the H component, i.e. increasing H makes it more difficult to perform an HO and reduces the HO frequency. The main drawback in this case is that the CDR is negatively affected. The high-mobility scenario also produces lower values of CBR and CDR because the traffic is geographically dispersed due to the high speed of the users. The configuration ($\delta$ = 1, $\tau$ = 0.5) dB provides the lowest value of U, as a result of a better trade-off between HO signaling and user dissatisfaction.

The above analysis can also be extended to the scenario that combines both high-load and high-mobility (Fig. 7(c)). Since an important objective of the proposed algorithm is to optimize mobility and load balancing without jeopardizing the connection quality, the high values of CDR measured in this scenario establish the possible range of variation of $\delta$ and $\tau$. In particular, it is observed that values of $\tau$ above 0.5 dB involve a CDR greater than 5%, which would cause serious inconvenience to operators. Leaving $\tau$ fixed to 0.5 dB, the increase of $\delta$ can also lead to high values of CDR. In particular, values above 3 dB would significantly jeopardize the CDR. For this reason, the range of $\delta$ and $\tau$ analyzed in this work does not exceed the limits shown in Fig. 7.
As in the previous high-mobility scenario, the optimal configuration is ($\delta$ = 1, $\tau$ = 0.5) dB, meaning that this setting can be reasonably used to evaluate the performance of the proposed algorithm against other approaches.

The comparison of the proposed fuzzy system with other approaches is represented in Fig. 8, where the evolution over time of the KPIs for each strategy is depicted. For the sake of clarity, the represented values have been averaged with the six subsequent samples. The initial situation is given by low traffic and low mobility. After about 200 min, the central cells of the scenario become crowded, so that many users are blocked, increasing the CBR. Looking more closely at this indicator, the evaluated approaches reach CBR values of approximately 5% when the traffic change occurs. The HOO configuration is not able to solve this problem, keeping the CBR at such high values, while the LB configuration achieves a reduction of 2% in a few optimization steps. Conversely, the gain in CBR obtained by the uncoordinated alternative, the baseline scheme and the proposed fuzzy system is more moderate.

A higher number of users also means more interference in the network, so that the connection quality of the users is worse, increasing the CDR. This increase is more pronounced in the case of the uncoordinated approach. To explain this, note that the LB and HOO functions are simultaneously changing the HOM from the first optimization steps. As the CDR is not significantly affected by these changes (due to the low interference conditions), the HOM reaches large values. As a result, when the congestion situation occurs, the HOM values are so large that the CDR becomes high. After this, the SON functions attempt to reduce this KPI. In the case of the baseline approach, this effect on CDR is attenuated because the LB function is switched off when the CDR becomes high.
Since HOMs are not adjusted by the LB function, the level of CDR is not as high as with the uncoordinated scheme. The rest of the configurations, i.e. LB, HOO and the fuzzy system, keep the CDR constant at around 2%. Due to the presence of only low-speed users, the HOR is about 1.

The offered traffic experiences a small reduction at around min. 1000, but it is not until around min. 1200 that the users move at high speed. The high-mobility scenario starts at this moment and the HOR abruptly increases to values above 10, except in the case of the proposed fuzzy system, whose values over time stay below 7. Thus, the performance of the proposed technique in terms of HOR is clearly better than that of the rest of the strategies. It is also noted that the trajectory of HOR followed by the uncoordinated and baseline approaches is very similar, since the HOO function is active during the entire simulation. Regarding the CBR and CDR, the LB and the proposed approaches lead to values below 1%, while the rest of the strategies produce undesirably higher values. Note also that, in all cases, the CBR decreases significantly from the previous situation (i.e. before min. 1200) because the traffic load is geographically dispersed due to the presence of fast users.

The situation after approximately 2100 min is given by a new increase of the offered traffic, so that the last part of the simulation includes both high-load and high-mobility. Looking at the HOR, the proposed method remains at low values, being the best approach from this perspective at any time. Similarly, the LB approach keeps a relatively constant but higher level of HOR values, since no actions to reduce HO signaling take place in this case. The HOO, the uncoordinated and the baseline approaches lead to a gradual increase in this KPI. The reason for this is that these strategies implement the same HOO function, which attempts to decrease the high peak in the CDR at the expense of increasing the number of HOs.
However, the impact of these three methods on the CDR is not the same. In particular, the baseline approach provides lower CDR values than those obtained by the uncoordinated approach because the LB function is switched off when the CDR is jeopardized after the variation in traffic load. The HOO approach gives even lower values of CDR since the LB function is not executed during the entire simulation. From the CBR perspective, the LB approach provides the lowest values while the CDR is also quite low, similar to the fuzzy system. The proposed method gives better CBR than the other approaches and, as previously stated, the HOR is the lowest as well.

The evolution of U over time (Fig. 8(d)) shows the suitability of the evaluated methods in each scenario. The strategy with the lowest value of U is the one that establishes the best trade-off between HO signaling and user dissatisfaction. In the first scenario, given by high-load conditions, the best method is the execution of LB alone, which significantly reduces the CBR but at the expense of an increase in the CDR that is higher than in the case of the proposed scheme. This is because the scenario has low mobility and does not require any
  • 13. optimization from the HOO perspective. In the second sce- nario, determined by the presence of high-speed users, the proposed fuzzy system provides the lowest value of U, since it considerably reduces the HOR. At the beginning of the third scenario (a combination of the two previous), the proposed scheme also achieves lower U values than those obtained by the LB approach since this latter method needs more iterations to reduce the CBR. Thus, it can be highlighted that the proposed joint optimization method is the only solution that, in the presence of mobility and congestion problems (i.e. scenarios two and three), reduces both the HOR and the CBR, which are the objectives of the HOO and LB, respectively. In this sense, note that the LB approach does not reduce the HOR in the second scenario, which is mainly determined by high-mobility. 5. Conclusion In this paper, a novel joint optimization algorithm for LB and HOO functions has been proposed. First, the optimized parameter HOM is broken down into two com- ponents, O(x) and H(x) , which are directly related to LB and HOO, respectively. Then, an FS that adjusts the HOM com- ponents at the cell adjacency level for the joint optimiza- tion of both functions is proposed. Finally, the FS is teamed with the Q-Learning algorithm, which leads the 0 500 1000 1500 2000 2500 3000 5 10 15 20 HOR (a) LB HOO Uncoordinated Baseline Fuzzy System 0 500 1000 1500 2000 2500 3000 0 2 4 6 8 (b) 0 500 1000 1500 2000 2500 3000 0 2 4 6 (c) 0 500 1000 1500 2000 2500 3000 0 20 40 60 80 U (d) Fig. 8. Temporary evolution of (a) HOR, (b) CBR, (c) CDR and (d) U for different approaches. P. Muñoz et al. / Computer Networks 76 (2015) 112–125 123
FS to select suitable actions from the LB/HOO perspective without jeopardizing the connection quality of the active users in the network.

The proposed technique has been compared with a baseline scheme based on the existing literature and with the reference cases in which LB and HOO operate separately or simultaneously in an uncoordinated way. In addition, these techniques have been assessed in extreme scenarios in which the HOM reaches large values, such as those with high traffic load and/or high mobility. Results show that the proposed scheme effectively improves network performance over the reference cases. In particular, the HOR in the presence of high-mobility users can be reduced to half, while user dissatisfaction in terms of the CBR and CDR stays at values similar to those of the baseline schemes. In addition, it is the only solution that is able to partially alleviate a congestion situation and to reduce the number of HOs, which are the main objectives of LB and HOO, respectively. Unlike other reference methods, the proposed technique does not produce high peaks in the KPIs when the situation changes abruptly, e.g. when some cells become congested. In the context of SON, the complexity of the SON entity that coordinates SON-specific functions would be reduced, as it is freed from the coordination of two important SON functions, LB and HOO. Finally, an advantage of using fuzzy logic is that the proposed design is easy to implement.

Acknowledgment

This work has been partially supported by the Junta de Andalucía (Excellence Research Program, Projects P08-TIC-4052 and P12-TIC-2905).

Pablo Muñoz received his M.Sc. and Ph.D. degrees in Telecommunication Engineering from the University of Málaga (Spain) in 2008 and 2013, respectively. He is currently working with the Communications Engineering Department at the same university. Since September 2009, he has been a Ph.D. Fellow, working on self-optimization of mobile radio access networks and radio resource management.

Raquel Barco received the M.Sc. degree in Telecommunication Engineering in 1997 and the Ph.D. degree in 2007 from the University of Málaga, Spain. From 1998 to 2000, she worked at the European Space Agency, Darmstadt, Germany. From 2000 to 2003, she worked part-time for Nokia Networks. Currently, she is Associate Professor at the Communication Engineering Department, University of Málaga. She has published more than 50 papers in international journals and conferences and she has been involved in several projects with companies. Her research interests are in the field of mobile communication systems, especially Self-Organizing Networks.

Isabel de la Bandera received her M.Sc. degree in Telecommunication Engineering from the University of Málaga (Spain) in 2009. In 2008, she was with the Communications Engineering Department at the same university, working on RFID projects. Since February 2010, she has been with the same department, working on projects about radio resource management in next generation mobile networks, and she is working toward the Ph.D. degree in Telecommunications Engineering.