Performance evaluation of route selection schemes over a clustered cognitive radio network

Performance Evaluation of Route Selection
Schemes over a Clustered Cognitive Radio Network
Mariam Musavi (1,2), Kok Lim Alvin Yau (1), Hafizal Mohamad (2), Nordin Ramli (2)
(1) School of Science and Technology, Sunway University, Selangor, Malaysia
(2) Wireless Network and Protocol Research Lab, MIMOS Berhad, Kuala Lumpur, Malaysia
15019318@imail.sunway.edu.my, koklimy@sunway.edu.my, hafizal.mohamad@mimos.my, nordin.ramli@mimos.my
Abstract—Cognitive radio (CR) is a promising next-generation
wireless communication system that provides efficient utilization
of radio spectrum by enabling unlicensed users (or secondary
users, SUs) to sense for and use underutilized radio spectrum
(or white spaces) owned by licensed users (or primary users,
PUs). In this paper, we investigate the effects of a larger network
size (or higher number of routes) on non-clustered, clustered
and clustered-reinforcement learning (RL)-based route selection
schemes in a USRP/ GNU radio platform focusing on the
network layer. Experimental results show that the enhanced
variant of reinforcement learning (RL)-based route selection
scheme (C-ERL) selects stable route(s) over a clustered CRN in a
USRP/ GNU radio platform. C-ERL improves cluster stability by
reducing the number of route breakages caused by route switches,
and network scalability by reducing the number of clusters in
the network without significant deterioration of QoS, including
throughput, packet delivery rate, and end-to-end delay.
Index Terms—Cognitive radio network, Performance evalua-
tion, Route selection, Clustering, USRP, GNU radio
I. INTRODUCTION
Cognitive radio (CR) [1] enables unlicensed users (or sec-
ondary users, SUs) to use the underutilized radio spectrum
(or white spaces) owned by licensed users (or primary users,
PUs) while minimizing interference to the PUs. The main
difference between CR and the conventional wireless network
is the dynamic presence of PUs’ activities in CR networks
(CRNs); so the main challenge of CR is to establish a “friendly
co-existence” environment between PUs and SUs without
deteriorating the quality of service (QoS).
Route selection establishes a route from a source node to
a destination node in a network. In CRNs, route selection
schemes must address the intrinsic dynamicity of channel
availability, so that selected routes are stable and SUs can
transmit data packets for a longer time duration without
significant deterioration of QoS [1].
Clustering is a topology management mechanism that or-
ganizes SUs into logical groups (or clusters). Route selection
over a clustered CRN is preferred due to three main reasons
[2]. First, it provides network scalability by reducing the
routing overhead, such as route request (RREQ) and route
reply (RREP), constraining the flooding of routing overhead
at the cluster level only. Second, it provides cluster stability by
reducing the effects of the dynamicity of channel availability
(i.e., the number of available common channels in a cluster)
because the updates affect the underlying network at the cluster level only.
Third, it supports cooperative tasks (e.g., route selection and
channel sensing). This research investigates a route selection
scheme that enables SUs to form clusters, selects a common
operating channel in each cluster, and subsequently enables
SU source nodes to search for route(s). Among the discovered
routes, the selected route has the highest channel capacity
at the bottleneck link which improves cluster stability and
network scalability without significant deterioration of QoS
including throughput, packet delivery rate, and end-to-end
delay.
Reinforcement learning (RL) [3] is an unsupervised artificial
intelligence approach that enables an SU node to observe
decision-making factors (or state) in the operating environment,
learn, select an appropriate action at a particular time instant,
and receive state and reward, which are the consequences of
taking an action under a state, in the next time instant. Using
an RL model with state, action, and reward representations,
an SU node maximizes its accumulated reward as time goes
by. Q-learning is a popular RL-based approach that has been
applied to routing in wireless networks [4] [5]. The learning
rate 0 ≤ α ≤ 1 is a step size that determines to what
extent the new or estimated Q-value offsets the old Q-value in
a Q-function. Generally, the learning rate α is a constant
value. However, the learning rate can be adjusted, and it
has a significant effect on performance [5] [6] [7]. In our
previous work [8], we investigated a route selection scheme
for clustered CRNs in which the route selection scheme adjusts
its learning rate based on the route capacity, and two performance
metrics, namely the frequency of route breakages and the number
of clusters, were shown to decrease. In this paper, our contribution
is to evaluate the network performance of non-clustered, clustered
and clustered-RL-based route selection schemes in a testbed
focusing on the network layer.
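The role of the learning rate α described above can be illustrated with a minimal sketch of a single Q-value update. This is the textbook Q-learning form, not necessarily the exact update used in the schemes evaluated here; the function name `q_update`, the discount factor `gamma`, and all numeric values are assumptions made for illustration.

```python
def q_update(q_old, reward, max_q_next, alpha, gamma=0.9):
    """One Q-learning step: blend the old Q-value with the new estimate.

    gamma (discount factor) is an illustrative assumption; the paper only
    discusses the learning rate alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("learning rate alpha must lie in [0, 1]")
    target = reward + gamma * max_q_next  # new (estimated) Q-value
    return (1.0 - alpha) * q_old + alpha * target


# alpha = 0 keeps the old Q-value; alpha = 1 replaces it with the new estimate.
q_keep = q_update(q_old=0.4, reward=1.0, max_q_next=0.5, alpha=0.0)
q_half = q_update(q_old=0.4, reward=1.0, max_q_next=0.5, alpha=0.5)
q_swap = q_update(q_old=0.4, reward=1.0, max_q_next=0.5, alpha=1.0)
print(q_keep, q_half, q_swap)
```

A larger α makes the Q-value track recent observations more closely, while a smaller α smooths out fluctuations, which is the trade-off that motivates adjusting α.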
This paper is organized as follows. Section II presents
related work on route selection schemes and the application
of RL to route selection in CRNs over a USRP/GNU radio
platform. Section III presents the experimental prototype and
evaluates the network performance of non-clustered, clustered
and clustered-reinforcement learning (RL)-based route selection
schemes through experiments. Section IV concludes the paper.
II. RELATED WORK
This section presents related work on RL-based route se-
lection approaches over USRP/ GNU radio platform.

A. Route selection schemes
Four main route selection schemes have been investigated
on experimental platforms, namely spectrum leasing [5],
SAMER [9], Coolest Path [10] and CRP [11].
In the spectrum leasing approach [5], the link cost is
based on the channel availability time, and the route cost is
calculated in a bottleneck manner. Hence, the established
route has the highest channel capacity at the bottleneck link.
Additionally, SUs are informed of the channel utilization levels
by PUs, and so it reduces the frequency of route breakages at
the expense of slight performance degradation to throughput
and packet delivery rate. The cluster-based and non-RL-based
route selection scheme (or clustered SA-SP) used as a baseline
in this work is based on this scheme.
In SAMER [9], the link cost is the sum of the estimated
throughput (the product of channel availability, link bandwidth
and loss rate) over all its available channels, and the route cost
is given by the link cost of the bottleneck link. Hence, the
established route takes account of PUs’ and SUs’ activities
and link quality, and it provides the highest throughput. The
spectrum-aware shortest path (or unclustered SA-SP) used as
the second baseline in this work is based on this scheme.
In Coolest Path [10], the link cost is given by the highest
amount of white spaces in one of the available channels; and
the route cost is calculated in either accumulative or bottleneck
manners. Hence, the established route has the highest channel
availability.
In CRP [11], the link cost is based on a cost function
reflecting the delay and minimization of SUs’ interference
to PUs’ activities; and the route cost is calculated in an
accumulative manner. Hence, the established route has the lowest
end-to-end delay and reduced SU-PU interference.
Compared to traditional routing schemes, such as the ran-
dom routing approach, Coolest Path improves channel avail-
ability, SAMER improves throughput, and CRP reduces SUs’
interference to PUs at the expense of slight performance
degradation to the end-to-end delay of SUs.
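The difference between the bottleneck and accumulative ways of computing a route cost can be sketched as follows. The helper names and the per-link values are hypothetical and are not taken from any of the cited schemes; the sketch only assumes that each link carries a single numeric cost or capacity.

```python
def bottleneck_cost(link_costs):
    """Route cost set by the weakest link (e.g., lowest channel capacity)."""
    return min(link_costs)


def accumulative_cost(link_costs):
    """Route cost as the sum of per-link costs (e.g., per-hop delay)."""
    return sum(link_costs)


def select_route(routes):
    """Pick the route whose bottleneck link has the highest capacity."""
    return max(routes, key=lambda name: bottleneck_cost(routes[name]))


# Hypothetical per-link channel capacities for three candidate routes.
routes = {
    "A": [0.9, 0.4, 0.8],
    "B": [0.7, 0.6],
    "C": [0.5, 0.5, 0.5, 0.5],
}
best = select_route(routes)
print(best, bottleneck_cost(routes[best]))
```

Route A has the best individual link but also the weakest one, so a bottleneck-based selection prefers route B here.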
B. RL-based route selection schemes over USRP/ GNU radio
platform
The investigation of RL in CRNs under a testbed environment
is in its rudimentary phase.
Raza et al. [5] evaluate two RL-based route selection
schemes for multi-hop CRNs in a USRP/GNU radio platform
with ten nodes. The constant learning rate in each experiment
varies from α = 0.1 to α = 0.9. A higher learning rate has been
shown to increase throughput and reduce the number of route
breakages, while a lower learning rate reduces the fluctuations
of selected actions (giving more steady outcomes) [5]. This
investigation therefore fills the gap by investigating and
improving the performance of RL-based route selection schemes.
Bhorkar et al. [6] applied learning rate adjustment to Q-routing
in wireless ad hoc networks, and it has been shown to minimize
the average routing cost, including the average number of
transmissions per packet.

Fig. 1. An overview of our 13-node testbed. [Diagram omitted; labels: Host PC, Gigabit Ethernet switch, nodes s1 to s13.]
III. TESTBED PROTOTYPING, RESULTS AND DISCUSSION
This section evaluates the performance of route selection
schemes through experiments and presents experimental setup,
assumptions and parameters, performance metrics, empirical
results and discussions.
A. Architecture
This section presents our experimental platform based on
our previous work [8] for investigating RL-based route selec-
tion, focusing on the network layer, over a testbed comprised
of a host PC, a Gigabit Ethernet switch, GNU radio, and
thirteen USRP units. The software architecture of our experi-
mental platform is comprised of four main planes: a) cluster
re-adjustment, b) route selection, c) channel selection, and d)
data communication.
a) Cluster re-adjustment: This plane forms clusters
whose cluster size is based on the number of available common
channels in a cluster, and selects a common operating channel
for each cluster.
b) Route selection: This plane supports exchange of con-
trol information, namely RREQ/ RREP, route capacity, route
selection, and route maintenance using UDP protocol over a
Gigabit Ethernet backbone that emulates a common control
channel. The channel capacity of a link is the probability of a
channel in the OFF state for the next time instant. The amount
of white spaces in a channel is known as route capacity. In
route selection, a SU source node selects a stable route with
the highest route capacity. When a broken route is reported,
the SU source node initiates a route maintenance and switches
to another route.

TABLE I
EXPERIMENTAL PARAMETERS AND VALUES.
Category | Notation                    | Parameter                                        | Value
SU       |                             | Number of SUs                                    | 13
PU       |                             | Number of PUs                                    | 6
Channels | $k$                         | Number of channels                               | 19
         | $k^{c}_{C_i}$               | Number of available common channels in a cluster |
         | $\Gamma^{c_{\min}}_{C_i}$   | Threshold for minimum number of available channels | 1
Network  |                             | Traffic data type                                | UDP
         |                             | Packet size                                      | 500 B
         |                             | Time duration of each run                        | 300 s
RL       | $\alpha$                    | Learning rate for clustered TRL                  | 0.5
Physical |                             | Sense-transmit window                            | 4 s
         | $t_{quiet}$                 | Quiet period                                     | 1 s
         | $t_{data}$                  | Data transmission period                         | 3 s
c) Channel selection: This plane senses channels,
switches channels, and provides channel access. During channel
sensing, each SU node scans the available channels for
a short quiet duration and sends clustering information (e.g.,
available channels) to neighboring SUs. Channel switching stops
transmission in a channel upon detecting PUs’ activities and
switches to another available channel, which has the highest
channel capacity among the available channels selected by the
cluster re-adjustment plane. Channel access enables distributed
channel sharing among SUs.
d) Data communication: This plane transmits data packets
over wireless channels and exchanges control packets over
Gigabit Ethernet.
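The cluster-level bookkeeping described in the planes above can be sketched as set operations: the available common channels of a cluster are the channels that every member senses as available, and the operating channel is the common channel with the highest channel capacity. This is a minimal sketch, not the testbed implementation; the function names, the re-adjustment rule (trigger when the count is not greater than a threshold of 1), and the example data are assumptions.

```python
def common_channels(available_by_node):
    """Channels that every cluster member currently senses as available."""
    sets = [set(chs) for chs in available_by_node.values()]
    return set.intersection(*sets) if sets else set()


def needs_readjustment(available_by_node, threshold=1):
    """Assumed rule: re-adjust when common channels do not exceed the threshold."""
    return len(common_channels(available_by_node)) <= threshold


def pick_operating_channel(available_by_node, capacity):
    """Choose the common channel with the highest estimated capacity."""
    common = common_channels(available_by_node)
    if not common:
        return None
    return max(common, key=lambda ch: capacity[ch])


# Hypothetical sensing results for a three-node cluster.
cluster = {"s1": [1, 2, 5, 7], "s2": [2, 5, 7, 9], "s3": [2, 3, 5, 7]}
capacity = {2: 0.6, 5: 0.8, 7: 0.7}  # assumed OFF-state probabilities
print(sorted(common_channels(cluster)), pick_operating_channel(cluster, capacity))

# A PU re-appears on channels 5 and 7 as seen by s2.
cluster["s2"] = [2, 9]
print(sorted(common_channels(cluster)), needs_readjustment(cluster))
```

Because the common set is an intersection, a single member losing a channel removes it for the whole cluster, which is why the number of common channels is smaller than the number of channels available to any individual node.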
All the route selection approaches, including the cluster-
based approaches, namely the enhanced RL-based approach
called clustered ERL, the traditional RL-based approach called
clustered TRL, and three non-RL-based approaches called
clustered SA-SP, clustered SA-SP (single channel), and clus-
tered SA-SP (double channel), as well as a non-cluster-based
and non-RL-based approach called unclustered SA-SP, are
implemented in a testbed consisting of USRP/GNU radio units.
B. Experimental setup, assumptions and parameters
The numbers of PUs and SUs are 6 and 13, respectively. This
work adopts five assumptions to simplify the MAC protocol as
the focus of this work is on the network layer. First, a Gigabit
Ethernet switch emulates a common control channel, free
from PUs’ activities, reserved for control message exchange
among SUs. Second, each SU uses two different channels for
transmission and reception, respectively. Third, 19 channels
are available for data transmission, each with a bandwidth of 1
MHz. A large number of channels reduces channel contention
among SU neighbor nodes. Fourth, although each SU does
not have complete knowledge of PUs’ activities, it performs
perfect channel sensing in which there is no missed detection or
false alarm [11]. Lastly, there is perfect synchronization among
the SUs. The testbed, deployed in an indoor environment, is
shown in Fig. 1, and the experimental parameters are listed in
Table I. In each experiment, there is a single SU source node
and a single SU destination node.
The PUs’ activities module generates exponentially distributed
ON and OFF periods for each channel [12]. Each
PU has a fixed average ON time of 15 s and an average OFF
time that varies from 15 to 105 s, so the channel utilization
level by PUs (the fraction of time a PU is ON) decreases from
50% to 12.5%, which corresponds to a channel availability of
50% to about 88%.
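The relation between the average ON/OFF times and the long-run ON fraction of a PU can be checked with a short sketch, which also shows one way to draw exponentially distributed ON and OFF periods. The function names, the choice to start each channel in the ON state, and the seed are illustrative assumptions, not details of the PUs’ activities module.

```python
import random


def pu_utilization(mean_on, mean_off):
    """Long-run fraction of time a PU occupies its channel."""
    return mean_on / (mean_on + mean_off)


def simulate_pu(mean_on, mean_off, duration, rng):
    """Alternate exponentially distributed ON/OFF periods.

    Returns a list of (state, start, end) tuples covering [0, duration].
    Starting in the ON state is an assumption made for this sketch.
    """
    t, on, periods = 0.0, True, []
    while t < duration:
        mean = mean_on if on else mean_off
        length = rng.expovariate(1.0 / mean)  # rate = 1 / mean
        end = min(t + length, duration)
        periods.append(("ON" if on else "OFF", t, end))
        t, on = end, not on
    return periods


# End points of the OFF-time sweep with a fixed average ON time of 15 s.
for mean_off in (15, 105):
    print(mean_off, pu_utilization(15, mean_off))

# One 300 s run for a single channel, matching the run duration in Table I.
periods = simulate_pu(mean_on=15, mean_off=45, duration=300, rng=random.Random(1))
```

With an average ON time of 15 s, the ON fraction is 15/(15+15) = 0.5 at an average OFF time of 15 s and 15/(15+105) = 0.125 at 105 s.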
C. Performance metrics
We compare the network performance achieved by non-clustered,
clustered and clustered-reinforcement learning (RL)-based
route selection schemes. Clustered TRL has a constant learning
rate (i.e., α = 0.5), while clustered ERL has a dynamic learning
rate adjusted according to the channel capacity of the bottleneck
link of a route. Clustered SA-SP (single channel) selects a
route with a random common channel in a cluster as the
operating channel at each link along a route. Clustered SA-SP
(double channel) selects the route with the highest channel
capacity at the bottleneck link. In clustered SA-SP (double
channel), the number of common channels in a cluster is kept
greater than its threshold, specifically
$k^{c}_{C_i} > \Gamma^{c_{\min}}_{C_i} = 1$. In clustered
SA-SP, SUs are informed of the channel utilization levels by
PUs and select a route with the highest channel capacity at
the bottleneck link [5], while in clustered TRL [4], clustered
ERL, clustered SA-SP (single channel), and clustered SA-SP
(double channel), SUs are not informed of such information.
Unclustered SA-SP, which is a spectrum-aware variant of the
shortest path (SP) routing protocol for non-clustered (or flat)
CRNs, chooses a random operating channel out of the available
channels at each SU, following SAMER [9].
The goal of clustered ERL is to:
• minimize the number of route breakages at the bottleneck
link [5] [13].
• minimize the number of clusters in the network, which
is the number of clusterheads in a network.
• improve throughput [5] [13] and packet delivery rate.
• minimize end-to-end delay.
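The three QoS metrics listed above can be computed from per-packet send and receive timestamps, as in the following sketch. The log format (dictionaries keyed by packet identifier), the function name `summarize`, and the example values are assumptions; the paper does not describe how its measurements are logged or the unit used for throughput.

```python
def summarize(sent_times, recv_times, packet_size_bytes, duration_s):
    """Return (throughput in bit/s, packet delivery rate, mean end-to-end delay in s).

    sent_times: {packet_id: send timestamp in seconds}
    recv_times: {packet_id: receive timestamp in seconds}
    """
    delivered = [pid for pid in sent_times if pid in recv_times]
    pdr = len(delivered) / len(sent_times) if sent_times else 0.0
    throughput = len(delivered) * packet_size_bytes * 8 / duration_s
    delays = [recv_times[pid] - sent_times[pid] for pid in delivered]
    mean_delay = sum(delays) / len(delays) if delays else float("nan")
    return throughput, pdr, mean_delay


# Hypothetical log: four 500 B packets sent over 4 s, packet 3 is lost.
sent = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0}
recv = {1: 0.25, 2: 1.5, 4: 3.75}
throughput, pdr, delay = summarize(sent, recv, packet_size_bytes=500, duration_s=4)
print(throughput, pdr, delay)
```

Only delivered packets contribute to throughput and delay, so a scheme with many route breakages can show a low delivery rate while still reporting a moderate average delay.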
D. Results and discussions
This section presents the performance achieved by non-
clustered, clustered and clustered-RL-based route selection
schemes under varying channel utilization levels by PUs.
The number of route breakages is investigated with respect to average PU OFF time. As shown in Fig. 2, all the cluster-based approaches achieve a lower number of route breakages as compared to unclustered SA-SP, which is a non-cluster-based and non-RL-based approach. Unclustered SA-SP incurs the highest number of route breakages for two main reasons. First, unclustered SA-SP chooses a random channel out of the available channels as the operating channel, which may not have sufficient channel capacity (or white spaces). Second, unclustered SA-SP provides a route with a lower number of hops and at least a single available channel at each link. The PUs may appear and re-appear for a short duration; hence, the established routes can be affected frequently. Clustered SA-SP (single channel) achieves the highest number of route breakages among all clustered approaches because it selects a route that has at least a single available common channel in a cluster at each link along the route, and hence any re-appearance of PUs on the available common channel in a cluster may cause a route breakage. Clustered SA-SP achieves a slightly higher number of route breakages even though SUs are informed of the channel utilization levels by PUs, because the lack of a sufficient number of available common channels in a cluster can cause cluster re-adjustment (see Section III-A), resulting in a higher number of route breakages. Clustered SA-SP (double channel) establishes a route in which the clusters forming a backbone route from an SU source node to its destination node have more than a single available common channel at each link along the route, and it selects the route that has the highest common channel capacity at the bottleneck link to enhance cluster stability. Clustered TRL uses a constant learning rate α = 0.5 that is not adaptive to route capacity. Due to the dynamicity of channel availability, the re-appearance of PUs’ activities increases the occurrences of route maintenance and breakages, and so a higher learning rate is required so that it becomes more adaptive to the operating environment.

Fig. 2. Average number of route breakages vs. average PU OFF time. [Plot omitted; x-axis: Avg PU OFF Time (sec), 15 to 105. Legend: Unclustered SA-SP; Clustered SA-SP; Clustered SA-SP (single channel); Clustered SA-SP (double channel); Clustered TRL; Proposed: Clustered ERL.]

Fig. 3. Average number of clusters vs. average PU OFF time. [Plot omitted; x-axis: average PU OFF time, 15 to 105 s.]

Clustered ERL, which uses a dynamic learning rate, achieves performance close to that of clustered SA-SP. This is because, in clustered ERL, the learning rate is adaptive to route capacity (i.e., the channel capacity of the bottleneck link), and a stable next-hop SU neighbor node with the best possible route capacity is selected, reducing the likelihood of the re-appearance of PUs’ activities along a route. This enhances cluster stability by reducing the number of route breakages.

Fig. 4. Average throughput vs. average PU OFF time. [Plot omitted; x-axis: average PU OFF time, 15 to 105 s.]

The number of clusters forming a backbone route from an SU source node to its destination node in the network is investigated with respect to average PU OFF time. As shown in Fig. 3, the unclustered SA-SP approach achieves the highest number of clusters as compared to all clustered approaches because the backbone route is formed by single-node clusters in a non-clustered (or flat) network. Clustered ERL and TRL achieve performance close to that of clustered SA-SP. This is because, in clustered ERL and TRL, routes are established based on the highest Q-value, which helps to reduce the effects of the dynamicity of channel availability. Clustered SA-SP (single channel) achieves the highest number of clusters among all clustered approaches because, under the dynamicity of channel availability, the number of common channels in a cluster is less than or equal to one, which forms a higher number of clusters in the network. There is therefore a greater possibility that PUs re-appear in any of the common channels in a cluster, which causes the formation and re-formation of unstable clusters. Clustered SA-SP (double channel) achieves the lowest number of clusters because it forms a stable cluster in which a sufficient number of available common channels limits the likelihood of cluster re-adjustment. In contrast, clustered SA-SP achieves a higher number of clusters than clustered SA-SP (double channel) even though SU nodes are informed of the channel utilization levels by PUs, because it selects a random channel from the list of available channels providing a higher number of available common channels in a cluster.
The throughput and packet delivery rate are investigated with respect to average PU OFF time. As shown in Fig. 4 and Fig. 5, the cluster-based and RL-based clustered ERL and TRL, as well as the cluster-based and non-RL-based clustered SA-SP (double channel) and (single channel), achieve lower throughput and packet delivery rate as compared to the cluster-based and non-RL-based clustered SA-SP and the non-cluster-based and non-RL-based unclustered SA-SP. In clustered ERL and TRL, and clustered SA-SP (single channel) and (double channel), under the dynamicity of channel availability, the lack of a sufficient number of available common channels in a cluster incurs route breakage and prompts cluster re-adjustment (see Section III-A), resulting in a higher packet loss ratio. In unclustered SA-SP, since there is a higher number of available channels (at the individual node level in a non-clustered network) as compared to the number of available common channels (at the cluster level in a clustered network), there is a higher number of possible routes providing higher throughput and packet delivery rate, although the number of route breakages is higher (see Fig. 2). Clustered SA-SP achieves the highest throughput and packet delivery rate among all clustered approaches for two main reasons. First, in clustered SA-SP, SUs are informed of the channel utilization levels by PUs. Second, in clustered SA-SP, the lack of a sufficient number of available common channels in a cluster can cause cluster re-adjustment (see Section III-A) and a higher number of route breakages, resulting in a higher number of potential routes, providing higher throughput and packet delivery rate, available for data transmission.

Fig. 5. Average packet delivery rate vs. average PU OFF time. [Plot omitted; x-axis: average PU OFF time, 15 to 105 s.]

The end-to-end delay is investigated with respect to average PU OFF time. As shown in Fig. 6, the cluster-based and RL-based clustered ERL and TRL, as well as the cluster-based and non-RL-based clustered SA-SP, clustered SA-SP (single channel) and clustered SA-SP (double channel), incur approximately similar delay, while unclustered SA-SP, which is a non-cluster-based and non-RL-based approach, incurs the lowest delay. This means that the latency incurred for cluster re-adjustment in cluster-based schemes offsets the time incurred for a higher number of route maintenance operations due to a higher number of route breakages in the non-cluster-based scheme.

Fig. 6. Average end-to-end delay vs. average PU OFF time. [Plot omitted; x-axis: average PU OFF time, 15 to 105 s.]
IV. CONCLUSION
This paper investigates the effects of a larger network size
(or higher number of routes) on non-clustered, clustered and
clustered-reinforcement learning (RL)-based route selection
schemes in a testbed focusing on the network layer. Experi-
mental results show that the enhanced variant of reinforcement
learning (RL)-based route selection scheme (C-ERL) selects
stable route(s) over a clustered CRN in a USRP/ GNU radio
platform. C-ERL improves cluster stability by reducing the
number of route breakages caused by route switches, and
network scalability by reducing the number of clusters in the
network without significant deterioration of QoS, including
throughput, packet delivery rate, and end-to-end delay.
REFERENCES
[1] I. F. Akyildiz, W.-Y. Lee, and K. R. Chowdhury, “CRAHNs: Cognitive
radio ad hoc networks,” Ad Hoc Networks, vol. 7, no. 5, pp. 810–836,
2009.
[2] A. C. Talay and D. T. Altilar, “United nodes: cluster-based routing protocol for mobile
cognitive radio networks,” IET Communications, vol. 5, no. 15, pp.
2097–2105, 2011.
[3] R. S. Sutton and A. G. Barto, Introduction to reinforcement learning.
MIT Press Cambridge, 1998, vol. 135.
[4] Y. Saleem, K.-L. A. Yau, H. Mohamad, N. Ramli, and M. H. Rehmani,
“SMART: A SpectruM-Aware ClusteR-based rouTing scheme for dis-
tributed cognitive radio networks,” Computer Networks, vol. 91, pp.
196–224, 2015.
[5] A. Raza, K. L. Yau, J. Qadir, H. Mohamad, N. Ramli, and S. Keoh,
“Route selection for multi-hop cognitive radio networks using reinforce-
ment learning: An experimental study,” IEEE Access, pp. 1–1, 2016.
[6] A. A. Bhorkar, M. Naghshvar, T. Javidi, and B. D. Rao, “Adaptive oppor-
tunistic routing for wireless ad hoc networks,” IEEE/ACM Transactions
on Networking, vol. 20, no. 1, pp. 243–256, 2012.
[7] M. A. Thathachar and P. S. Sastry, “Varieties of learning automata: an
overview,” IEEE Transactions on Systems, Man, and Cybernetics, Part
B: Cybernetics, vol. 32, no. 6, pp. 711–722, 2002.
[8] M. Musavi, K.-L. A. Yau, S. A. Raza, H. Mohamad, and N. Ramli,
“Route selection over clustered cognitive radio networks: An experi-
mental evaluation,” under review, 2018.
[9] I. Pefkianakis, S. H. Wong, and S. Lu, “SAMER: Spectrum aware mesh
routing in cognitive radio networks,” in 3rd IEEE Symposium on New
Frontiers in Dynamic Spectrum Access Networks (DySPAN), Chicago,
IL, USA. IEEE, 2008, pp. 1–5.
[10] X. Huang, D. Lu, P. Li, and Y. Fang, “Coolest path: spectrum mobility
aware routing metrics in cognitive ad hoc networks,” in 2011 IEEE 31st
International Conference on Distributed Computing Systems (ICDCS),
Minneapolis, MN, USA. IEEE, 2011, pp. 182–191.
[11] K. R. Chowdhury and I. F. Akyildiz, “CRP: A routing protocol for
cognitive radio ad hoc networks,” IEEE Journal on Selected Areas in
Communications, vol. 29, no. 4, pp. 794–804, 2011.
[12] M. H. Rehmani, A. C. Viana, H. Khalife, and S. Fdida, “SURF: A
distributed channel selection strategy for data dissemination in multi-hop
cognitive radio networks,” Computer Communications, vol. 36, no. 10,
pp. 1172–1185, 2013.
[13] L. Sun, W. Zheng, N. Rawat, V. Sawant, and D. Koutsonikolas, “Perfor-
mance comparison of routing protocols for cognitive radio networks,”
IEEE Transactions on Mobile Computing, vol. 14, no. 6, pp. 1272–1286,
2015.
