Implementation of Bidirectional Forwarding
Detection (BFD) over the Hot Standby Router
Protocol (HSRP) with the Enhanced Interior Gateway
Routing Protocol (EIGRP)
Ahmed Ben Hassan
Department of Network
University of Tripoli
Tripoli, Libya
A.BENHASSAN@uot.edu.ly
Mahmud Mansour
Department of Network
University of Tripoli
Tripoli, Libya
Mah.mansour@uot.edu.ly
Abstract— Organizations are increasingly prioritizing network
availability and minimizing downtime due to the growing demand for
online applications and services. Maintaining high availability can
be costly, but a lack of it can damage an organization's reputation
and cause significant financial losses. The Hot Standby Router
Protocol (HSRP) is a key tool for enhancing IP network
availability. HSRP is a Cisco proprietary
redundancy protocol that is used to manage network default gateway
routers by using one or more redundant routers that will take over in
case of default router failure. However, late failure detections and
slow responses can lead to packet loss during failure. Bidirectional
Forwarding Detection (BFD) is an effective solution to increase
availability by rapidly detecting link failure and monitoring IP
connectivity. In this work, we implemented BFD with HSRP to determine
whether BFD helps reduce downtime and enhance the availability of
IP networks. The two setups were compared on convergence time,
packet loss, CPU usage, and bandwidth consumption after
implementation, testing, and optimization in the PNETLAB
emulation tool. We have verified that HSRP with BFD shows very
fast failure detection and recovery with reduced downtime and
packet loss, thus improving network reliability and stability.
Keywords—FHRP, HSRP, BFD
I. INTRODUCTION
Availability has emerged as a significant concern for
enterprises and businesses in today's networks. Every minute of
service interruption has the potential to result in significant
financial losses for a firm, amounting to hundreds or even
thousands of dollars. To avoid outages, we aim to enhance the
network's uptime by implementing redundant lines and nodes.
While redundancy might be beneficial, it also comes with a high
cost. Achieving optimal network availability is dependent on the
client's specific business objectives and their tolerance for
network downtime.
II. AVAILABILITY
Availability refers to the length of time a network is
available to users and is generally a crucial aim for network
design clients. Availability can be defined as a percent uptime
per year, month, week, day, or hour, relative to the entire time in
that period. For example, in a network that delivers 24-hour, 7-
day-a-week service, if the network is up 165 hours in the 168-
hour week, availability is 98.21 percent [1].
In general, availability means how long the network is
operational. Availability is linked to reliability, but it has a more
specific meaning (percent uptime) than reliability. Reliability
refers to a variety of issues, including accuracy, error rates,
stability, and the amount of time between failures [1].
Availability is closely linked to resilience, a concept that is
becoming more common in the networking field. Resiliency is
how much stress a network can bear and how rapidly it can
bounce back from problems, including security breaches, natural
and unnatural disasters, human error, and catastrophic software
or hardware failures [1].
Normally, availability is represented as the percentage of
time the network is functional. This is the origin of the phrase
“five nines”. Five nines refers to an availability of 99.999%, a
figure that has long been used in marketing and is seen as the
desirable availability target in many networks, at least at the core
level. Five nines translates to about five minutes of downtime a
year [2].
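The relationship between an availability percentage and the downtime it permits can be checked with a short calculation. The following is a minimal sketch; the function names and the choice of a 365-day year are illustrative assumptions and do not come from the cited sources.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def availability_percent(uptime_hours: float, total_hours: float) -> float:
    """Availability expressed as percent uptime over the period."""
    return 100.0 * uptime_hours / total_hours


def allowed_downtime_seconds(availability_pct: float, period_seconds: float) -> float:
    """Downtime (seconds) permitted in a period at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_seconds * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    # Example from the text: 165 hours of uptime in a 168-hour week.
    print(f"{availability_percent(165, 168):.2f} percent")
    # Five nines over one year, in minutes.
    minutes = allowed_downtime_seconds(99.999, SECONDS_PER_YEAR) / 60
    print(f"{minutes:.2f} minutes per year")
```

For five nines, the same function gives roughly 5.26 minutes per year and roughly 6 seconds per week.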
Graph. 1. Availability percentage in minutes
Downtime per week: 90% (one nine), 16.8 hours; 99% (two nines), 1.68 hours; 99.9% (three nines), 10.1 minutes; 99.99% (four nines), 1.01 minutes; 99.999% (five nines), 6.05 seconds.
To determine theoretic availability, the network is separated
into each dependent item, such as hardware, software, physical
connections, power supply, etc. For most equipment, the
manufacturer will offer information on availability expectations,
generally characterized as the mean time between failures
(MTBF). For those elements of the network that do not have this
data, such as a power source, statistical data and guesses must
be employed. The projected time to repair each portion of the
network has to be calculated. This is generally referred to as the
Mean Time to Repair (MTTR). Each unit's availability is
determined by:
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}} \qquad (1)$$
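To compute the overall availability of the network, the
availabilities of all units have to be combined. For units that must
all be working for the network to work, the individual availabilities
are multiplied together.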
Logically, a chain can never be stronger than its weakest
link. Adding redundancy will result in greater availability.
However, adding redundancy does not necessarily boost
availability in a linear sense. A switchover from one route to
another takes time, and during this moment, the connection will
be offline [2].
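Equation (1) and the combination of unit availabilities can be sketched as follows. This is an illustration under stated assumptions: failures are independent, a serial chain needs every unit to work, and a redundant group needs at least one unit to work. The function names and the MTBF and MTTR figures are hypothetical, and the redundant case ignores switchover time, which is why real redundancy yields less than this ideal value.

```python
from math import prod
from typing import Iterable


def unit_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Equation (1): MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_availability(availabilities: Iterable[float]) -> float:
    """Every unit must work, so the availabilities are multiplied."""
    return prod(availabilities)


def redundant_availability(availabilities: Iterable[float]) -> float:
    """At least one unit must work; ignores the time a switchover takes."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical figures: MTBF of 10,000 hours, MTTR of 4 hours.
    router = unit_availability(10_000, 4)
    print(f"single router:        {router:.6f}")
    print(f"two routers in chain: {series_availability([router, router]):.6f}")
    print(f"two redundant routers:{redundant_availability([router, router]):.8f}")
```

The serial result is always lower than the weakest unit, which matches the weakest-link observation in the text.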
III. COST OF NETWORK DOWNTIME
Many firms may not fully understand the impact of
downtime on their business. Calculating the impact's cost may
be difficult, as it demands a thorough understanding of both
tangible and intangible losses. Tangible losses are direct
expenditures; they include lost income, the cost to retrieve lost
information, catastrophe recovery, and business continuity
costs. Intangible costs include damage to your company's
reputation, lost customers, and staff productivity losses. In many
circumstances, the damage associated with intangible costs may
have a greater long-term effect on an organization than that of
actual expenses. Downtime cost is defined as any profit that a
corporation loses when its equipment or network stops working.
According to research, the losses associated with network
downtime include:
• Reputation
• Productivity losses
• Opportunities
• Data
In 2020, an ITIC study found that the average cost of one hour
of downtime had grown by 30% since 2016. Of the 1,000 firms that
participated in the study, more than 30% reported that one hour of
downtime costs them between $1 and $5 million. For approximately
80% of enterprises, one hour of downtime costs over $300,000.
Finally, 98% reported that one hour of downtime costs them at least
$100,000 [3].
According to a July 2009 white paper titled “Navigating
Network Infrastructure Expenditures during Business
Transformations,” authored by Lippis Consulting, the cost of
network downtime for a financial firm’s brokerage business was
determined to be $7.8 million per hour. A one-hour interruption
for a financial firm’s credit card processing can cost upwards of
$3.1 million. A media organization might lose money on pay-
per-view revenues, an airline company in ticket sales, and a
retail company in catalog sales [4].
IV. RELATED WORK
In 2024, Yu Niu and Xiandong Li [12], in “Design and Implementation
of VRRP and BFD Linkage Technology in Campus Information Service
Platform Network”, studied how to integrate BFD and VRRP technologies
into existing network architectures to improve the reliability and
stability of the network.
In 2023, “Performance Evaluation of First Hop Redundancy
Protocols in IPv4 and IPv6 Networks” by Najia Ben Saud [13]
compared FHRP performance in terms of packet loss,
convergence time, and CPU utilization in IPv4 and IPv6 networks.
In 2022, “Performance Evaluation of First Hop Redundancy
Protocols IPv6” by M. Mansour [14] focused on FHRP
performance in terms of packet loss and convergence time.
In 2021, “Performance Analysis and Functionality
Comparison of First Hop Redundancy Protocols” by
M. Mansour [15] studied the effect of different parameters,
mainly bandwidth consumption, traffic flow, convergence time,
and CPU utilization.
In 2020, Imelda et al. [16], in a study entitled
"Performance Analysis of VRRP, HSRP, and GLBP with
EIGRP Routing Protocol", compared the performance of
VRRP, HSRP, and GLBP with the EIGRP
routing protocol applied.
In 2019, “FDVRRP: Router implementation for fast
detection and high availability in network failure cases” by
Suncheul Kim and Hoyong Ryu [17] implemented fast
BFD-based detection with VRRP to improve failure detection and
failover.
V. FIRST HOP REDUNDANCY PROTOCOLS
First Hop Redundancy Protocol (FHRP) is a suite of
protocols that enable a router on a network to instantly take over
if the main default gateway router fails. The devices in a shared
network segment are set with a single default gateway address,
which relates to the router that connects to the rest of the
network. The trouble emerges when this main router fails, and
there is a second router on the segment that is also capable of
becoming the default gateway, but end devices don’t know
about it. Hence, if the initial default gateway router fails, the
network will stop [5]. First hop redundancy protocols are one of
the solutions to this problem. The three primary First Hop
Redundancy Protocols are:
• Hot Standby Router Protocol (HSRP; Cisco proprietary)
• Virtual Router Redundancy Protocol (VRRP; open
standard)
• Gateway Load Balancing Protocol (GLBP; Cisco
proprietary).
First hop redundancy protocols such as HSRP and VRRP
offer default gateway redundancy with one router functioning as
the active gateway router and one or more additional routers
kept in standby mode. Others, such as GLBP, allow all
available gateway routers to share the load and be operational at the
same time [5].
In this paper we use only Hot Standby Router Protocol.
A. Hot Standby Router Protocol
HSRP is a Cisco proprietary redundancy protocol that allows
failover of the default gateway. The active-standby model
supports end-user traffic with one device at a time and one on
standby to take over if the active device fails.
HSRP routes IP traffic without relying on the availability of
any single router. It enables a set of router interfaces to work
together to present the appearance of a single virtual router or
default gateway to the hosts on a LAN.
HSRP allows a set of routers to work together, giving
the hosts on the LAN the illusion of a single virtual router. This
set is known as an HSRP group or a standby group. A single
router chosen from the group is responsible for forwarding the
packets that hosts send to the virtual router. This router is
known as the active router; another router is designated as the
standby router. If the active router fails, the standby router
takes over the active router's packet forwarding tasks. This
procedure is transparent to users. Although an arbitrary number
of routers may run HSRP, only the active router forwards the
packets sent to the virtual router. Devices in an HSRP group choose
the active router based on device priority [6]. To reduce network
traffic, only the active and standby routers send periodic HSRP
messages once the protocol has finished the election process [6].
Each standby group emulates a single virtual router. Each
standby group is assigned a single well-known virtual MAC address
and an IP address. The IP address should belong to the
primary subnet in use on the LAN but must differ from the
addresses allocated as interface addresses on all routers and
hosts on the LAN, including virtual IP addresses assigned to other
HSRP groups. Multiple standby groups can be configured;
each group operates independently of the others [6].
The Hot Standby Router Protocol (HSRP) has two
timers:
• Hello time is the interval between the hello messages that a
router sends to tell its peers that it is alive, with a
default value of 3 seconds.
• Hold time is the time the standby router waits without
receiving a hello message before it declares the active
router down and becomes active, with a default value of 10 seconds.
These timers can be tuned to achieve the lowest
convergence time, making a network highly available.
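The effect of the two timers on failure detection can be estimated with a small helper. The sketch below assumes that the standby router restarts its hold timer each time a hello arrives and that the active router stops sending hellos at an arbitrary moment; jitter and propagation delay are ignored. The function name is illustrative. Under these assumptions, the standby router notices the failure between (hold - hello) and hold seconds after it happens.

```python
from typing import Tuple


def detection_window(hello_s: float, hold_s: float) -> Tuple[float, float]:
    """Return (shortest, longest) delay in seconds before the standby
    router declares a silent active router down."""
    if hello_s <= 0 or hold_s <= hello_s:
        raise ValueError("expected 0 < hello < hold")
    # Shortest: the router fails just before its next hello was due.
    # Longest: the router fails right after sending a hello.
    return (hold_s - hello_s, hold_s)


if __name__ == "__main__":
    print("default timers (3 s / 10 s):", detection_window(3, 10))
    print("tuned timers   (1 s / 3 s): ", detection_window(1, 3))
```

With the default timers the window is 7 to 10 seconds; with a 1-second hello and a 3-second hold it shrinks to 2 to 3 seconds.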
VI. BIDIRECTIONAL FORWARDING DETECTION PROTOCOL
Bidirectional Forwarding Detection (BFD) is a fast
millisecond failure detection mechanism that rapidly detects link
failure and monitors IP connectivity on the entire network
independent of media and routing protocols, while maintaining
low overhead. It also provides a single, standardized method of
link/device/protocol failure detection at any protocol layer and
over any media.
The BFD protocol is designed to provide low overhead and
fast detection of failures on any type of path, including
direct physical links, virtual circuits, tunnels, MPLS label
switched paths (LSPs), and multihop routed paths. Furthermore,
it operates independently of the transmission media, data
protocol, and routing protocol, without any need to modify the
existing protocols [7].
BFD does not have a discovery mechanism; sessions must
be explicitly configured between endpoints. BFD may be used
on many different underlying transport mechanisms and layers
and operates independently of all of these. Therefore, it needs to
be encapsulated by whatever transport it uses. For example,
protocols that support some form of adjacency setup, such as
OSPF, IS-IS, BGP, or RIP, may be used to bootstrap a BFD
session. These protocols may then use BFD to receive faster
notification of failing links than would normally be possible
using the protocol's own keepalive mechanism [8].
In BFD, a set of parameters is used to determine the
failure detection time:
• Detect Mult: the detection time multiplier, which is the number
of packets that have to be missed in a row to declare the
session to be down.
• Required Min Rx Interval (RMRI): minimum interval for
receiving BFD control packets.
• Desired Min Tx Interval (DMTI): minimum interval for
sending BFD control packets.
• Required Min Echo RX Interval (RMERI): minimum
interval for receiving Echo packets.
These parameters are carried in the BFD control packets that
are sent.
A. BFD Detection Modes
There are two operating modes to BFD, asynchronous mode
and demand mode.
The asynchronous mode is similar to the hello and hold-
down timers. The system periodically sends BFD control
packets. The system considers that the session is down if it does
not receive any BFD control packets within a specific interval
[9].
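The timeout logic of asynchronous mode can be illustrated with a short simulation. This is a simplified sketch, not an implementation of the protocol: it takes a list of receive times for BFD control packets and a detection time, and reports when the session would be declared down. The function name and the sample times are hypothetical.

```python
from typing import Iterable, Optional


def first_timeout(arrivals_ms: Iterable[float],
                  detection_time_ms: float,
                  end_ms: float) -> Optional[float]:
    """Return the time (ms) at which the session is declared down,
    or None if no timeout occurs before end_ms.

    arrivals_ms must be sorted receive times of control packets.
    """
    deadline = None
    for t in arrivals_ms:
        if deadline is not None and t > deadline:
            return deadline              # packet arrived too late
        deadline = t + detection_time_ms  # each packet restarts the timer
    if deadline is not None and end_ms > deadline:
        return deadline                  # packets stopped arriving
    return None


if __name__ == "__main__":
    # Packets at 0, 50 and 100 ms, then silence; detection time 150 ms.
    print(first_timeout([0, 50, 100], 150, end_ms=1000))
```

In the sample run the last packet arrives at 100 ms, so the session is declared down at 250 ms.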
In demand mode, a system sends several BFD control
packets that have the Poll (P) bit set at the negotiated transmit
interval. If no response is received within the detection interval,
the session is considered down. If the connectivity is found to be
up, no more BFD control packets are sent until the next
command is issued. At the time of writing, Cisco does not support BFD
demand mode [9].
B. BFD Detection Time
The detection time (the period of time without receiving
BFD packets after which the session is determined to have
failed) is not carried explicitly in the protocol. Rather, it is
calculated independently in each direction by the receiving
system based on the negotiated transmit interval and the
detection multiplier. There may be different detection times in
each direction [10].
In asynchronous mode, the detection time calculated in the
local system is equal to the value of Detect Mult received from
the remote system, multiplied by the agreed transmit interval of
the remote system (the greater of the local Required Min Rx Interval
and the last received Desired Min Tx Interval).
Detection time in asynchronous mode = Detect Mult received from
the remote system × max(local RMRI, received DMTI)
In demand mode, the detection time calculated in the local
system is equal to the Detect Mult of the local system,
multiplied by the agreed transmit interval of the local system
(the greater of the local Desired Min Tx Interval and the last
received Required Min Rx Interval) [10].
Detection time in demand mode = Detect Mult of the local
system × max(local DMTI, received RMRI)
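The asynchronous-mode calculation, which is the mode used in this work, can be expressed directly in code. The sketch below follows the rule stated above: the remote system's Detect Mult multiplied by the greater of the local RMRI and the received DMTI. The class and function names are illustrative, and the timer values are hypothetical examples rather than the values configured in the test network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BfdTimers:
    """Timer fields a system advertises in its BFD control packets."""
    detect_mult: int
    desired_min_tx_us: int   # DMTI, microseconds
    required_min_rx_us: int  # RMRI, microseconds


def async_detection_time_us(local: BfdTimers, remote: BfdTimers) -> int:
    """Detection time computed by the local system in asynchronous mode."""
    agreed_remote_tx = max(local.required_min_rx_us, remote.desired_min_tx_us)
    return remote.detect_mult * agreed_remote_tx


if __name__ == "__main__":
    # Hypothetical values: the two ends advertise different timers.
    a = BfdTimers(detect_mult=3, desired_min_tx_us=50_000, required_min_rx_us=100_000)
    b = BfdTimers(detect_mult=5, desired_min_tx_us=50_000, required_min_rx_us=50_000)
    print("detection time at A:", async_detection_time_us(a, b), "us")
    print("detection time at B:", async_detection_time_us(b, a), "us")
```

Because each side uses the other side's multiplier and its own receive interval, the two ends of one session can arrive at different detection times, as noted in the text.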
C. BFD Echo mode
BFD Echo is a rapid failure detection mechanism in which
the local system sends BFD Echo packets and the remote system
loops back the packets. BFD echo mode is enabled by default,
but you can disable it so that it can run independently in each
direction. BFD echo mode works with asynchronous BFD. Echo
packets are sent by the forwarding engine and forwarded back
along the same path in order to perform detection. The BFD
session at the other end does not participate in the actual
forwarding of the echo packets. The echo function and the
forwarding engine are responsible for the detection process,
while BFD control packets maintain the BFD session; therefore,
the number of BFD control packets that are sent out between two
BFD neighbors is reduced. In addition, because the forwarding
engine is testing the forwarding path on the remote (neighbor)
system without involving the remote system, there is an
opportunity to improve the interpacket delay variance, thereby
achieving quicker failure detection times [11].
Fig. 1. BFD Echo mode
VII. DESIGN AND SIMULATION
This paper focuses on implementing the Hot Standby Router
Protocol with Bidirectional Forwarding Detection across three
sites and evaluating its performance in comparison to HSRP
without BFD. The enterprise site is connected to two different ISPs
to ensure high availability. If the connection between an ISP and a
gateway fails, or if the ISP becomes unavailable, the gateway
promptly detects the failure. This enables the backup gateway,
which is connected to the other ISP, to take over. This
approach minimizes network downtime, a crucial
objective for enterprises operating in contemporary network
environments.
A. Simulation Tools
In this work, the PNETLAB network emulator was used
to implement the network scenarios.
PNETLAB, also known as Packet Network Emulator Tool
Lab, is a platform created for emulating
networks and for instructional use. Users can build and
reproduce complex network environments with
virtual devices, removing the need for
physical networking hardware such as routers and switches.
B. Network Design
The network is designed hierarchically with two default
gateway routers, each connected to a different ISP. On the LAN
side, two access switches connect to the end devices.
The access switches are connected to the gateway routers in a partial
mesh design in order to eliminate single points of
failure in the enterprise network. The EIGRP routing protocol is
employed to provide routing between nodes in the enterprise. The
topology is shown in Fig. 2.
Each gateway router has a track object that is used to verify the
connection to its ISP. If the connection goes down, the track object
decrements the priority of the active router so that it falls below
the priority of the standby router, which then becomes the active
router. In the network topology, the router on the left, R1, has been
configured with a higher priority than the router on the right, R2.
Fig. 2. Network Topology Used
C. Configuration
Initially, HSRP is implemented without BFD. In
this configuration, an IP Service Level Agreement (IP SLA) is
used instead of BFD to monitor the reachability of the ISPs.
Cisco IOS IP SLA is a network performance measurement and
diagnostic tool that uses active monitoring. One of its purposes
is to verify whether a given IP address is reachable and report
the status. The IP SLA is configured on the enterprise
routers to check the reachability of the ISPs. If an ISP becomes
unreachable, the IP SLA detects this loss of connectivity and
reports it to HSRP on the router through a track object. The track
object tied to the IP SLA registers that the ISP is down and reduces
the router's priority value, allowing a higher-priority router to
become the active router.
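The interaction between the track object and the HSRP priority can be modeled in a few lines. This is a sketch under assumptions: the priority and decrement values are hypothetical because the configured values are not listed here, preemption is assumed to be enabled so that the higher-priority router takes over, and priority ties are not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Gateway:
    name: str
    priority: int               # configured HSRP priority (hypothetical)
    decrement: int              # subtracted while the tracked object is down
    isp_reachable: bool = True  # state reported by the track object

    @property
    def effective_priority(self) -> int:
        if self.isp_reachable:
            return self.priority
        return self.priority - self.decrement


def active_gateway(gateways: Iterable[Gateway]) -> Gateway:
    """Highest effective priority wins (assumes preemption is enabled)."""
    return max(gateways, key=lambda g: g.effective_priority)


if __name__ == "__main__":
    r1 = Gateway("R1", priority=110, decrement=20)
    r2 = Gateway("R2", priority=100, decrement=20)
    print("before failure:", active_gateway([r1, r2]).name)
    r1.isp_reachable = False  # the track object reports ISP1 as down
    print("after failure: ", active_gateway([r1, r2]).name)
```

The decrement has to be larger than the priority gap between the two routers; otherwise R1 would remain active even though its ISP is unreachable.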
When implementing HSRP with Bidirectional Forwarding
Detection (BFD), BFD is employed to monitor the connectivity
to the Internet Service Providers (ISPs), detecting reachability
failures within milliseconds. BFD sessions are configured on the
enterprise routers to check ISP reachability. The detection time is
set to 50 milliseconds. BFD must be integrated with a routing
protocol; here it was activated via EIGRP.
To optimize the results and enhance network
performance, the default timers can be tuned. The Hello and Hold
timers are optimized for both HSRP without BFD and HSRP with
BFD, and the results before and after the optimization are
compared. The default HSRP timers are 3 sec for hello and 10 sec
for hold, while the optimized HSRP timers are set to 1 sec for hello
and 3 sec for hold.
VIII. RESULTS
This section presents and discusses the measurements
conducted to test the performance of HSRP both with
and without BFD, and evaluates the two sets of results to
determine which setup provides better performance. The testing process
comprised transmitting 300 ICMP packets over a period of 5
minutes. An intentional ISP failure was then generated
to observe and study the network's response. The obtained data
is used to examine the influence of BFD on HSRP
performance and its effectiveness in minimizing downtime.
The measurements are taken in terms of convergence time,
packet loss, CPU utilization, and bandwidth consumption.
In this paper, there are two convergence times. The first is the
time interval between receiving the last packet from the failed
ISP and receiving the first packet from the alternate ISP; this is
the one that matters most for user traffic. The second is the time
interval between receiving the last packet from the failed ISP and
the moment the standby router becomes active.
A. HSRP without BFD Results
1) Convergence Time
• With the default hello and hold timers, the
convergence time between R1 and R2, where R2
transitions to the active state, is 7.3 seconds,
measured from the last packet received by ISP1 before the failure
at 14:25:51.62 to the moment R2 sends an
advertisement in the active state at 14:25:58.92. The
convergence time between ISP1 and ISP2 is
9 seconds, measured from the last packet received by ISP1
before the failure to the first packet received by ISP2
after the failure at 14:26:00.63. During the convergence
process, 4 ICMP packets were lost.
• With the optimized hello and hold timers, the
convergence time between R1 and R2, where R2
transitions to the active state, is 3.69 seconds,
measured from the last packet received by ISP1 before the failure
at 15:01:12.08 to the moment R2 sends an
advertisement in the active state at 15:01:15.77. The
convergence time between ISP1 and ISP2 is
4.91 seconds, measured from the last packet received by
ISP1 before the failure to the first packet received by
ISP2 after the failure at 15:01:16.99. During the
convergence process, 2 ICMP packets were lost.
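The convergence times above are differences between capture timestamps, which can be reproduced with the standard library. The sketch below assumes both timestamps fall on the same day; the function name is illustrative. The timestamps are the ones reported in this subsection.

```python
from datetime import datetime

TIME_FORMAT = "%H:%M:%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two same-day timestamps such as '14:25:51.62'."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Default timers: last packet at ISP1, R2 active, first packet at ISP2.
    print(elapsed_seconds("14:25:51.62", "14:25:58.92"))
    print(elapsed_seconds("14:25:51.62", "14:26:00.63"))
    # Optimized timers.
    print(elapsed_seconds("15:01:12.08", "15:01:15.77"))
    print(elapsed_seconds("15:01:12.08", "15:01:16.99"))
```

The second difference comes to 9.01 seconds, which is reported as 9 seconds in the text.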
2) CPU Utilization
• Without timer optimization, the HSRP process consumed an
average of 0.05% of the CPU on routers R1 and
R2, while overall CPU usage was 2% on R1 and 1% on R2.
• With timer optimization, the HSRP process consumed an average
of 0.09% of the CPU on routers R1 and R2, while
overall CPU usage was 2% on R1 and 1% on R2.
3) Bandwidth Consumption
• During the testing period without timer optimization,
the traffic generated by hello packets accounted for
approximately 19.6% of the total network traffic. This
figure is based on the default configuration, where
hello packets are sent every 3 seconds. The total size of
the packets was about 15 KB, with a total of 234 hello
packets exchanged between routers R1 and R2.
• During the testing period with timer optimization, the
traffic generated by hello packets accounted for
approximately 42% of the total network traffic. This
figure is based on the optimized hello interval,
where packets are sent every 1 second. The total size of
the packets was about 41 KB, with a total of 670 hello
packets exchanged between routers R1 and R2.
• The traffic generated by IP SLA accounted for
approximately 42.9% of the total network traffic, with ICMP
probes sent every 1 second between router R1 and ISP1. The
total size of these packets was approximately 45 KB.
B. HSRP with BFD Results
1) Convergence Time
• With the default hello and hold timers, the
convergence time between ISP1 and ISP2 is
1 second, measured from the last packet received by ISP1
before the failure at 16:27:58.34 to the first packet
received by ISP2 after the failure at 16:27:59.34. The
convergence time between R1 and R2, where R2
transitions to the active state, is 6.38 seconds,
measured from the last packet received by ISP1 before the failure
to the moment R2 sends an advertisement in the
active state at 16:28:04.72. During the fast convergence
process, no ICMP packets were lost.
• With the optimized hello and hold timers, the
convergence time between ISP1 and ISP2 is
1 second, measured from the last packet received by ISP1
before the failure at 16:28:11.34 to the first packet
received by ISP2 after the failure at 16:28:12.34. The
convergence time between R1 and R2, where R2
transitions to the active state, is 3.33 seconds,
measured from the last packet received by ISP1 before the failure
to the moment R2 sends an advertisement in the
active state at 16:28:14.67. During the fast convergence
process, no ICMP packets were lost.
2) CPU Utilization
• During the testing period, the BFD process consumed an average of
2.57% of the CPU on router R1, while overall CPU
usage was 5% on R1 and 1% on R2.
3) Bandwidth Consumption
• During the testing period, traffic generated by BFD
packets accounted for approximately 97.1% of the total
network traffic. This figure is based on a failure
detection time of 50 msec, with BFD echo
packets sent every 50 msec and BFD control packets
sent every second. The BFD Echo and BFD Control
packets together totaled approximately 1419 KB. A total of
25452 BFD echo packets and 665 BFD control packets
were exchanged between R1 and ISP1. As mentioned
previously, BFD Echo packets are responsible for
detecting failures, while BFD Control packets maintain
the BFD session between R1 and ISP1.
IX. COMPARISON AND EVALUATION
This section compares and evaluates the performance of
HSRP before and after BFD implementation. Comparison
parameters are convergence time, packet loss, CPU utilization,
and bandwidth consumption.
A. Convergence Time Comparison.
Graph. 2 shows that HSRP with BFD has the best
convergence time between the ISPs, 1 second with both default and
optimized timers, thanks to a BFD failure detection time of 50
msec, compared to HSRP without BFD, which relies on an IP SLA
failure detection time of 1 second. HSRP with BFD and optimized
timers also has the best convergence time for the switch between
active and standby routers, 3.33 seconds, thanks to the
optimized hello packets sent every 1 second combined with the
BFD failure detection time.
Graph. 2. Convergence Time Comparison.
B. Packets Loss Comparison
During convergence, HSRP without BFD lost 4 packets
with the default timers, due to an IP SLA failure
detection time of 1 second combined with default hello packets sent
every 3 seconds. With optimization, only 2 packets were lost thanks to
the optimized hello packets sent every 1 second. For HSRP
with BFD, no packets were lost either before or after
optimization, thanks to a BFD failure detection time of 50
milliseconds.
Graph. 3. Packet Loss Comparison.
C. CPU Utilization Comparison
Graph. 4 shows the increase in CPU usage observed when
using HSRP with BFD, caused by the load of
sending BFD echo packets every 50 milliseconds and BFD
control packets every second. It can be concluded that HSRP
with BFD has higher CPU usage than HSRP without
BFD.
Graph. 4. CPU Utilization Comparison.
D. Bandwidth Consumption
Graph. 5 shows that BFD consumes very high bandwidth,
about 97.1% of the traffic compared to 42.9% for IP SLA, as a
result of sending BFD echo packets every 50 milliseconds and
BFD control packets every second, as mentioned earlier.
Graph. 5. Bandwidth Consumption Comparison.
Data for Graph. 2 (convergence time in seconds):
Scenario | Between ISP1 and ISP2 | Between active R1 and standby R2
HSRP-Default | 9 | 7.3
HSRP-Optimized | 4.91 | 3.69
HSRP with BFD-Default | 1 | 6.38
HSRP with BFD-Optimized | 1 | 3.33
Data for Graph. 3 (ICMP packets lost):
Scenario | Packets lost
HSRP-Default | 4
HSRP-Optimized | 2
HSRP with BFD-Default | 0
HSRP with BFD-Optimized | 0
Data for Graph. 4 (CPU utilization):
Scenario | CPU utilization
HSRP without BFD | 2%
HSRP with BFD | 5%
Data for Graph. 5 (percentage of bandwidth consumption):
Packet type | HSRP-Default | HSRP-Optimized | HSRP with BFD-Default | HSRP with BFD-Optimized
Hello Packet | 19.60% | 42.40% | 19.60% | 42.40%
IP SLA Packet | 42.9% | 42.9% | 0% | 0%
BFD Packet | 0% | 0% | 97.10% | 97.10%
X. CONCLUSION
After implementing and testing HSRP with and without
BFD and analyzing the results across four
factors (convergence time, packet loss, CPU
utilization, and bandwidth consumption), it is clear that
using BFD with HSRP significantly improves convergence time
and reduces packet loss, at the cost of increased CPU
utilization and high bandwidth consumption. As a result, there is a
trade-off between reducing packet loss and reducing the CPU
and bandwidth used. Whether BFD should be used depends on how
critical downtime is for the organization, provided that sufficient
resources are available to meet the CPU and bandwidth
requirements.
REFERENCES
[1] Priscilla Oppenheimer, “Top-Down Network Design”, Cisco Press, 3rd
edition, 2010.
[2] Mattias Thulin, “Measuring Availability in Telecommunications
Networks”, Master’s thesis report at Song Networks AB, 2004, pp 13.
[3] Opsworks.co: The cost of downtime: the truth and facts of IT Downtime,
https://opsworks.co/cost-of-downtime-truth-and-facts-of-it-downtime.
[4] Andy Sholomon, Tom Kunath, “Enterprise Network Testing”, Cisco
Press, 1st edition, 2011.
[5] Priyanka Dubey, Shilpi Sharma, Aabha Sachdev, “Review of First Hop
Redundancy Protocol and Their Functionalities”, International Journal of
Engineering Trends and Technology, 2013, pp 1085-1088.
[6] T. Li, B. Cole, P. Morton, D. Li, “Cisco Hot Standby Router Protocol”,
pp 2, 1998.
[7] Suncheul Kim, Hoyong Ryu, “FDVRRP: Router implementation for fast
detection and high availability in network failure cases”, 2019.
[8] En.wikipedia.org: bidirectional forwarding detection,
https://en.wikipedia.org/wiki/Bidirectional_Forwarding_Detection.
[9] Huawei.com: cloudengine s5700 and s6700 v600r022c00 configuration
guide - high availability,
https://support.huawei.com/enterprise/en/doc/EDOC1100278274/ae8adc7e/understanding-bfd.
[10] Datatracker.ietf.org: bidirectional forwarding detection (BFD),
https://datatracker.ietf.org/doc/html/rfc5880#page-32.
[11] cisco.com: Routing Configuration Guide, Cisco IOS XE Everest 16.6.X
(Catalyst 9500 Switches),
https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst9500/software/release/16-6/configuration_guide/b_166_rtng_9500_cg/b_166_rtng_9500_cg_chapter_00.html,
2017.
[12] Yu Niu and Xiandong Li et al, “Design and Implementation of VRRP
and BFD Linkage Technology in Campus Information Service Platform
Network”, ICMLCA '23: Proceedings of the 2023 4th International
Conference on Machine Learning and Computer Application.
[13] Najia Ben Saud and Mahmud Mansour , “Performance Evaluation of First
Hop Redundancy Protocols in IPv4 and IPv6 Networks”,2023 IEEE 3rd
International Maghreb Meeting of the Conference on Sciences and
Techniques of Automatic Control and Computer Engineering (MI-STA)
May 21-23 2023,Benghazi, Libya.
[14] M. Mansour et al, “Performance Evaluation of First Hop Redundancy
Protocols”, The 13th International Conference on Emerging Ubiquitous
Systems and Pervasive Networks (EUSPN 2022), October 26-28, 2022,
Leuven, Belgium.
[15] M. Mansour et al, “Performance Analysis and Functionality Comparison
of First Hop Redundancy Protocols”, Journal of Ubiquitous Systems &
Pervasive Networks, Volume 15, No. 1 (2021), pp. 49-58.
[16] Imelda Ristanti Julia et al, “Protocol (FHRP) on VRRP, HSRP, GLBP
with Routing Protocol BGP and EIGRP”, The 8th International
Conference on Cyber and IT Service Management (CITSM 2020) On
Virtual, October 23-24, 2020.
[17] Suncheul Kim and Hoyong Ryu, “FDVRRP: Router implementation for
fast detection and high availability in network failure cases”, ICT R&D
program of MSIP/IITP, Republic of Korea, 15 May 2019.

Performance Evaluation of Bidirectional Forwarding Detection (BFD) over the First Hop Redundancy Protocols (FHRPs)

  • 1. 1 | P a g e Implementation of Bidirectional Forwarding Detection (BFD) over the Hot Standby Router Protocol (HSRP) with the Enhanced Interior Gateway Routing Protocol (EIGRP) Ahmed Ben Hassan Department of Network University of Tripoli Tripoli, Libya A.BENHASSAN@uot.edu.ly Mahmud Mansour Department of Network University of Tripoli Tripoli, Libya Mah.mansour@uot.edu.ly Abstract— Organizations are increasingly prioritizing network availability and minimizing downtime due to the growing demand for online applications and services. Maintaining high availability can be costly, but a lack can damage an organization's reputation and cause significant financial losses. To enhance IP network availability, the Hot Standby Router Protocol (HSRP) is a necessary tool that is used to achieve this goal. HSRP is a Cisco proprietary redundancy protocol that is used to manage network default gateway routers by using one or more redundant routers that will take over in case of default router failure. However, late failure detections and slow responses can lead to packet loss during failure. Bidirectional Forwarding Detection (BFD) is an effective solution to increase availability by rapidly detecting link failure and monitoring IP connectivity. In our work, we implemented BFD with HSRP to see if BFD helps in reducing downtime and enhancing the availability of the IP networks. The comparison was made based on convergence time, packet loss, CPU usage, and bandwidth consumption and after implementation, testing, and optimization and using the PNETLAB emulation tool. We have verified that HSRP with BFD shows very fast failure detection and recovery with reduced downtime and packet loss, thus improving network reliability and stability. Keywords—FHRP, HSRP, BFD I. INTRODUCTION Availability has emerged as a significant concern for enterprises and businesses in today's network. 
Every minute of service interruption has the potential to result in significant financial losses for a firm, amounting to hundreds or even thousands of dollars. To avoid outages, we aim to enhance the network's uptime by implementing redundant lines and nodes. While redundancy might be beneficial, it also comes with a high cost. Achieving optimal network availability is dependent on the client's specific business objectives and their tolerance for network downtime. II. AVAILABILITY Availability refers to the length of time a network is available to users and is generally a crucial aim for network design clients. Availability can be defined as a percent uptime per year, month, week, day, or hour, relative to the entire time in that period. For example, in a network that delivers 24-hour, 7- day-a-week service, if the network is up 165 hours in the 168- hour week, availability is 98.21 percent [1]. In general, availability means how long the network is operational. Availability is linked to reliability, but it has a more specific meaning (percent uptime) than reliability. Reliability refers to a variety of issues, including accuracy, error rates, stability, and the amount of time between failures [1]. Availability is closely linked to resilience, a concept that is becoming more common in the networking field. Resiliency is how much stress a network can bear and how rapidly it can bounce back from problems, including security breaches, natural and unnatural disasters, human error, and catastrophic software or hardware failures [1]. Normally, availability is represented as the percentage of time the network is functional. It was here that the phrase “five- nine” came into usage. Five-nines refer to the percentage of 99.999%, which is a generality that has for long been used for marketing and has been seen as the desirable target for availability in many networks, at least at the core level. Five- nine translates to five minutes of downtime a year [2]. Graph. 1. 
Availability percentage in minutes To determine theoretic availability, the network is separated into each dependent item, such as hardware, software, physical 16.8 hours 1.68 hours 10.1 minutes 1.01 minutes 6.05 seconds 90% One nine 99% Two nines 99.9% Three nines 99.99% Four nines 99.999% Five nines DOWNTIM E PER WEEK
  • 2. 2 | P a g e connections, power supply, etc. For most equipment, the manufacturer will offer information on availability expectations, generally characterized as the mean time between failures (MTBF). For those elements of the network that do not have this data, such as a power source, statistical data and guesses must be employed. The projected time to repair each portion of the network has to be calculated. This is generally referred to as the Mean Time to Repair (MTTR). Each unit's availability is determined by: 𝐴𝑣𝑎𝑖𝑙𝑎𝑏𝑖𝑙𝑖𝑡𝑦 = 𝑀𝑇𝐵𝐹 𝑀𝑇𝐵𝐹+𝑀𝑇𝑇𝑅 (1) To compute the overall availability of the network, the availability of all units has to be totaled together. Logically, a chain can never be stronger than its weakest link. Adding redundancy will result in greater availability. However, adding redundancy does not necessarily boost availability in a linear sense. A switchover from one route to another takes time, and during this moment, the connection will be offline [2]. III. COST OF NETWORK DOWNTIME Many firms may not fully understand the impact of downtime on their business. Calculating the impact's cost may be tough, as it demands a thorough understanding of both physical and intangible losses. Actual losses are physical expenditures; they include lost income, the cost to retrieve lost information, catastrophe recovery, and business continuity costs. Intangible costs include damage to your company's reputation, lost customers, and staff productivity losses. In many circumstances, the damage associated with intangible costs may have a greater long-term effect on an organization than that of actual expenses. Downtime cost is defined as any profit that a corporation loses when its equipment or network stops working. According to Research, the losses associated with network downtime include: • Reputation • Productivity losses • Opportunities • Data In 2020, an ITIC research found that from 2016, the average cost of a one-hour layover had grown by 30%. 
The bottom line is that of the 1,000 firms that participated in the study, more than 30% reported spending between $1 and $5 million on one hour of downtime. Meanwhile, over $300,000 is worth 1 hour of downtime for approximately 80% of enterprises. Finally, 98% claimed that one hour of downtime costs them roughly $100,000 [3]. According to a July 2009 white paper titled “Navigating Network Infrastructure Expenditures during Business Transformations,” authored by Lippis Consulting, the cost of network downtime for a financial firm’s brokerage business was determined to be $7.8 million per hour. A one-hour interruption for a financial firm’s credit card processing can cost upwards of $3.1 million. A media organization might lose money on pay- per-view revenues, an airline company in ticket sales, and a retail company in catalog sales [4]. IV. RELATED WORK In 2024, “Design and Implementation of VRRP and BFD Linkage Technology in Campus Information Service Platform Network” by Yu Niu and Xiandong Li [12], They studied how to integrate BFD and VRRP technologies into existing network architectures to improving the reliability and stability of the network. In 2023, “Performance Evaluation of First Hop Redundancy Protocols in IPv4 and IPv6 Networks” by Najia Ben Saud [13], compared the FHRP performance in terms of packet loss, convergence time and CPU Utilization IPv4 and IPv6 Networks In 2022, “Performance Evaluation of First Hop Redundancy Protocols IPv6” by M. Mansour [14], focused on the FHRP performance in terms of packet loss and convergence time In 2021, “Performance Analysis and Functionality Comparison of First Hop Redundancy Protocols” by M.Mansour [15], studied the effect of different parameters mainly bandwidth consumption, traffic flow, convergence time and CPU utilization. 
In a previous study [16], by Imelda et al in 2020, entitled "Performance Analysis of VRRP, HSRP, and GLBP with EIGRP Routing Protocol", a comparison in performance between VRRP, HSRP, and GLBP were introduced, EIGRP routing protocol was applied. In 2019, “FDVRRP: Router implementation for fast detection and high availability in network failure cases” by Suncheul Kim and Hoyong Ryu [17], studied the Implement fast detection BFD with VRRP to improve failure detection and a failover. V. FIRT HOP REDUNDANCY PROTOCOLS First Hop Redundancy Protocol (FHRP) is a suite of protocols that enable a router on a network to instantly take over if the main default gateway router fails. The devices in a shared network segment are set with a single default gateway address, which relates to the router that connects to the rest of the network. The trouble emerges when this main router fails, and there is a second router on the segment that is also capable of becoming the default gateway, but end devices don’t know about it. Hence, if the initial default gateway router fails, the network will stop [5]. First hop redundancy protocols are one of the solutions to this problem. The three primary First Hop Redundancy Protocols are: • Hot Standby Router Protocol (HSRP; Owned by cisco) • Virtual Router Redundancy Protocol (VRRP; Open Standard) • Gateway Load Balancing Protocol (GLBP; Owned by cisco). First hop redundancy techniques like as HSRP and VRRP offer default gateway redundancy with one router functioning as the active gateway router with one or more additional routers
  • 3. 3 | P a g e retained in standby mode. While others like GLBP allows all available gateway routers to load share and be operational at the same time [5]. In this paper we use only Hot Standby Router Protocol. A. Hot Standby Routing Protocol HSRP is a Cisco proprietary redundancy protocol that allows failover of the default gateway. The active-standby model supports end-user traffic with one device at a time and one on standby to take over if the active device fails. HSRP routes IP traffic without relying on the availability of any single router. It enables a set of router interfaces to work together to present the appearance of a single virtual router or default gateway to the hosts on a LAN. HSRP allows a set of routers to function in harmony, giving the hosts on the LAN the illusion of a single virtual router. This set is known as a HSRP group or a standby group. A single router chosen from the group is responsible for forwarding the packets that hosts provide to the virtual router. This router is known as the active router; another router is designated as the backup router. In the case that the active router fails, the standby will take over the active router's packet forwarding tasks. This procedure is transparent to users. Although an arbitrary number of routers may run HSRP, only the active router transmits the packets sent to the virtual router. Devices in a HSRP group chose the active router based on device priority [6]. To lower network traffic, only the active and backup routers send periodic HSRP messages once the protocol has finished the election process [6]. Each standby group emulates a single virtual router. For each standby group, a single well-known virtual MAC and IP address are given to the group. The IP address should belong to the principal subnet in use on the LAN but must vary from the addresses allocated as interface addresses on all routers and hosts on the LAN, including virtual IP addresses issued to other HSRP groups. 
Multiple hot standby groups could be configured; each group operates independently of other groups [6]. The Hot Standby Redundancy Protocol (HSRP) has two timers: • Hello time is the estimated time that routers transmit in a hello message to communicate that the peer router is active, with a default value of 3 seconds. • Hold time is the projected duration during which the standby router will report that the peer is down and becomes active, with a default value of 10 seconds. These timers can be tuned and tweaked to achieve the lowest convergence, making a network highly accessible. VI. BIDIRECTIONAL FORWARDING DETECTION PORTOCOL Bidirectional Forwarding Detection (BFD) is a fast millisecond failure detection mechanism that rapidly detects link failure and monitors IP connectivity on the entire network independent of media and routing protocols, while maintaining low overhead. It also provides a single, standardized method of link/device/protocol failure detection at any protocol layer and over any media. The BFD protocol is designed to provide low overhead and fast detection of link failures on any type of path, including direct physical links, virtual circuits, tunnels, MPLS, label switched paths (LSPs), and multihop routed paths. Furthermore, it operates independently on the transmission media, data protocol, and routing protocol, without any need to modify the existing protocols [7]. BFD does not have a discovery mechanism; sessions must be explicitly configured between endpoints. BFD may be used on many different underlying transport mechanisms and layers and operates independently of all of these. Therefore, it needs to be encapsulated by whatever transport it uses. For example, protocols that support some form of adjacency setup, such as OSPF, IS-IS, BGP, or RIP, may be used to bootstrap a BFD session. 
These protocols may then use BFD to receive faster notification of failing links than would normally be possible using the protocol's own keepalive mechanism [8]. In BFD, there is a set of parameters used to determine the failure detection time: • Detect Multi: Detection timeout multiplier is the number of packets that have to be missed in a row to declare the session to be down. • Required Min Rx Interval (RMRI): minimum interval for receiving BFD control packets. • Desired Min Tx Interval (DMTI): minimum interval for sending BFD control packets. • Required Min Echo RX Interval (RMERI): minimum interval for receiving Echo packets. These parameters are within the BFD control packet that will be sent. A. BFD Detection Modes There are two operating modes to BFD, asynchronous mode and demand mode. The asynchronous mode is similar to the hello and hold- down timers. The system periodically sends BFD control packets. The system considers that the session is down if it does not receive any BFD control packets within a specific interval [9]. In demand mode, a system sends several BFD control packets that have the Poll (P) bit set at the negotiated transmit interval. If no response is received within the detection interval, the session is considered down. If the connectivity is found to be up, no more BFD control packets are sent until the next command is issued. Right now, Cisco doesn’t support BFD demand mode [9]. B. BFD Detection Time The detection time (the period of time without receiving BFD packets after which the session is determined to have failed) is not carried out explicitly in the protocol. Rather, it is calculated independently in each direction by the receiving system based on the negotiated transmit interval and the detection multiplier. There may be different detection times in each direction [10].
  • 4. 4 | P a g e In asynchronous mode, the detection time calculated in the local system is equal to the value of Detect Mult received from the remote system, multiplied by the agreed transmit interval of the remote system (the greater of required min Rx interval and the last received desired min TX interval). Detection time in asynchronous mode = received Detect Multi of the remote system x max (local RMRI/received DMTI) In demand mode, the detection time calculated in the local system is equal to the value of Detect Multi of the local system, multiplied by the agreed transmit interval of the remote system (the greater of required min Rx interval and the last received desired min TX interval). Detection time in demand mode = Detect Multi of the local system x max (local RMRI/received DMTI) C. BFD Echo mode BFD Echo is a rapid failure detection mechanism in which the local system sends BFD Echo packets and the remote system loops back the packets. BFD echo mode is enabled by default, but you can disable it so that it can run independently in each direction. BFD echo mode works with asynchronous BFD. Echo packets are sent by the forwarding engine and forwarded back along the same path in order to perform detection. The BFD session at the other end does not participate in the actual forwarding of the echo packets. The echo function and the forwarding engine are responsible for the detection process, while BFD control packets maintain the BFD session; therefore, the number of BFD control packets that are sent out between two BFD neighbors is reduced. In addition, because the forwarding engine is testing the forwarding path on the remote (neighbor) system without involving the remote system, there is an opportunity to improve the interpacket delay variance, thereby achieving quicker failure detection times [11]. Fig. 1.BFD Echo mode VII. 
DESIGN AND SIMULATION This paper focuses on implementing Hot Standby Router Protocol with Bidirectional Forwarding Detection across three sites and evaluating their performance in comparison to HSRP without BFD. Enterprise site is connected to two different ISPs to ensure high availability. In the event of a connection failure between an ISP and a gateway, or if the ISP experiences a period of unavailability, the gateway will promptly identify the breakdown. This will enable the backup gateway, which is connected to the other ISP, to take over and assume control. This approach effectively minimizes network downtime, a crucial objective for enterprises operating in contemporary network environments. A. Simulation Tools In this work PNELAB network emulator software was used to implement network scenarios. PNELAB, also known as Packet Network Emulator Tool Lab, is a resilient platform specifically created for simulating networks and for instructional use. Users can utilize this software to generate and replicate intricate network settings by employing virtual devices, hence obviating the necessity for tangible networking hardware such as routers and switches. B. Network Design The network is designed hierarchically to have two default gateway routers, each connected to a different ISP. On the LAN side, there are two access switches connecting to end devices. Access switches are connected to gateway routers in a partial mesh network design in order to eliminate single points of failure in the enterprise network. The EIGRP routing protocol is employed to provide routing between nodes in enterprise. The topology is shown in Fig. 2 Router has a track object that is used to verify the connection to the ISP. In case the connection goes down, the track object decrements a value for the priority of the active router, which will make it have less priority than the standby/backup, and it will result in making the standby become the active router. 
In the network design topology, routers on the left R1 have been configured with higher priority than the routers on the right R2. Fig. 2. Network Topology Used C. Configuration Initially, the HSRP will be implemented without BFD. In this configuration, an IP Service Level Agreement (IP SLA) will be utilized instead of a BFD to monitor the reachability of ISPs. Cisco IOS IP SLAs are network performance measurement and diagnostic tool that uses active monitoring. One of its purposes is to verify whether a given IP address is reachable and report the status. The IP SLA will be configured on the enterprise routers to check the reachability of the ISPs. If an ISP becomes unreachable, the IP SLA will detect this loss of connectivity and report it to the HSRP installed on the router. A track object tied to the IP SLA will detect an ISP is down and reduce the router's priority value, allowing a higher-priority router to become the active router.
  • 5. 5 | P a g e When implementing HSRP with Bidirectional Forwarding Detection (BFD), BFD is employed to monitor the connectivity to Internet Service Providers (ISPs), detecting access issues within milliseconds. BFD sessions are configured on the enterprise router to check ISP reachability. The detection time is set to 50 milliseconds, and BFD must be integrated with routing protocols. It was activated via EIGRP. In order to optimize the results and enhance network performance, default timers could be tuned, Hello and Hold timers will be optimized for HSRP without BFD and HSRP with BFD, and the results before and after the optimization will be compared. The default HSRP timer is 3 sec for hello and 10 sec for hold, while the optimized HSRP timer is set to 1 sec for hello and 3 sec for hold. VIII. RESULTS This section will present and discuss the measurements conducted in order to test the performance of HSRP both with and without BFD and presenting and evaluating the results of HSRP without BFD compared to HSRP with BFD determine which one provides better performance. The testing process comprised transmitting 300 ICMP packets over a range of 5 minutes. Following this, an intentional ISP failure was generated to observe and study the network's response. The obtained data will be utilized to examine the influence of BFD on HSRP performance and its effectiveness in minimizing downtime. The measurements are taken in terms of convergence time, CPU utilization, and bandwidth consumption. In this paper, there are two convergence times: the first is the time interval between receiving the last packet from the failed ISP and receiving the first packet from the alternate ISP, which is important to us, and the second is the time interval between receiving the last packet from the failed ISP and the moment the standby router becomes active. A. 
HSRP without BFD Results 1) Convergence Time • By using default timers for hello and hold, the convergence time between R1 and R2, where R2 transitions to the active state, is equal to 7.3 seconds, from the last packet received by ISP1 before the failure at 14:25:51.62 to the moment when R2 sends an advertisement as the active state at 14:25:58.92. The convergence process between ISP1 and ISP2 takes is equal to 9 seconds. From the last packet received by ISP1 before the failure and the first packet received by ISP2 after the failure at 14:26:00.63. During the convergence process, 4 ICMP packets were lost. • By using optimized timers for hello and hold, the convergence time between R1 and R2, where R2 transitions to the active state, is equal to 3.69 seconds. From the last packet received by ISP1 before the failure at 15:01:12.08 to the moment when R2 sends an advertisement as the active state at 15:01:15.77. The convergence process between ISP1 and ISP2 takes is equal to 4.91 seconds. From the last packet received by ISP1 before the failure and the first packet received by ISP2 after the failure at 15:01:16.99. During the convergence process, 2 ICMP packets were lost 2) CPU Utilization • Without timers’ optimization, HSRP consumed an average of 0.05% of the CPU usage on routers R1 and R2, while the CPU usage was 2% on R1 and 1% on R2. • With timers’ optimization, HSRP consumed an average of 0.09% of the CPU usage on routers R1 and R2, while the CPU usage was 2% on R1 and 1% on R2. 3) Bandwidth Consumption • During the testing period without timers’ optimization, the traffic generated by hello packets accounted for approximately 19.6% of the total network traffic. This estimate is based on the default configuration, where hello packets are sent every 3 seconds. The total size of the packets was about 15 KB, with a total of 234 hello packets exchanged between routers R1 and R2. 
• During the testing period with timers’ optimization, the traffic generated by hello packets accounted for approximately 42% of the total network traffic. This estimate is based on the optimized hello packet interval, where packets are sent every 1 second. The total size of the packets was about 41 KB, with a total of 670 hello packets exchanged between routers R1 and R2. • While the traffic generated by the SLA protocol accounted for approximately 42.9%, ICMP packets are sent every 1 second. Between routers R1 and ISP1, the total size of the packets was approximately 45 KB. B. HSRP with BFD Results 1) Convergence Time • By using default timers for hello and hold, the convergence process between ISP1 and ISP2 takes is equal to 1 second. From the last packet received by ISP1 before the failure at 16:27:58.34 and the first packet received by ISP2 after the failure at 16:27:59.34. The convergence time between R1 and R2, where R2 transitions to the active state, is equal to 6.38 seconds. From the last packet received by ISP1 before the failure to the moment when R2 sends an advertisement in the active state at 16:28:04.72. During the fast convergence process, no ICMP packets were lost. • By using optimized timers for hello and hold, the convergence process between ISP1 and ISP2 takes is equal to 1 second. From the last packet received by ISP1 before the failure at 16:28:11.34 and the first packet received by ISP2 after the failure at 16:28:12.34. The convergence time between R1 and R2, where R2 transitions to the active state, is equal to 3.33 seconds. From the last packet received by ISP1 before the failure to the moment when R2 sends an advertisement in the active state at 16:28:14.67. During the fast convergence process, no ICMP packets were lost.
  • 6. 6 | P a g e 2) CPU Utilization • During the testing period, BFD consumed an average of 2.57% of the CPU usage on routers R1, while the CPU usage was 5% on R1 and 1% on R2. 3) Bandwidth Consumption • During the testing period, traffic generated by BFD packets accounted for approximately 97.1% of the total network traffic. This estimate is based on a failure detection duration of 50 msec, during which BFD echo packets are sent every 50 msec and BFD control packets are sent every second. The BFD Echo and BFD Control packets are approximately 1419 KB in size. A total of 25452 BFD echo packets and 665 BFD control packets were exchanged between R1 and ISP1. As mentioned previously, BFD Echo packets are responsible for detecting failures, while BFD Control packets maintain the BFD session between R1 and ISP1. IX. COMPARISION AND EVALUATION This section compares and evaluates the performance of HSRP before and after BFD implementation. Comparison parameters are convergence time, packet loss, CPU utilization, and bandwidth consumption. A. Convergence Time Comparison. We can see from graph 2 that HSRP with BFD has the best convergence time result at 1 second in both default and optimized mode, thanks to a BFD failure detection time of 50 msec, compared to HSRP without BFD, which has an IP SLA failure detection time of 1 second. Meanwhile, HSRP with BFD- Optimize has the best convergence time to switch between active and standby mode at 3.33 seconds, thanks to the optimized hello packet sent every 1 second combined with the BFD failure detection time. Graph. 2. Convergence Time Comparison. B. Packets Loss Comparison During convergence, for HSRP without BFD, 4 packets were lost before optimization "default" due to an IP SLA failure detection time of 1 second with default hello packets sent every 3 seconds. With optimization, only 2 packets were lost thanks to the optimized hello packet sent every 1 second. 
For HSRP with BFD, no packets were lost with either default or optimized timers, thanks to a BFD failure detection time of 50 milliseconds.
Graph. 3. Packet Loss Comparison.
C. CPU Utilization Comparison
Graph. 4 shows that CPU usage increases when HSRP is used with BFD, because of the load from sending BFD echo packets every 50 milliseconds and BFD control packets every second. HSRP with BFD therefore has higher CPU usage than HSRP without BFD.
Graph. 4. CPU Utilization Comparison.
D. Bandwidth Consumption
Graph. 5 shows that BFD accounts for a very high share of the traffic, about 97.1%, compared to 42.9% for IP SLA, as a result of sending BFD echo packets every 50 milliseconds and BFD control packets every second.
Graph. 5. Bandwidth Consumption Comparison.
Values plotted in Graph. 2 (convergence time in seconds, between ISP1 and ISP2 / between active R1 and standby R2):
• HSRP-Default: 9 / 7.3
• HSRP-Optimized: 4.91 / 3.69
• HSRP with BFD-Default: 1 / 6.38
• HSRP with BFD-Optimized: 1 / 3.33
Values plotted in Graph. 3 (packets lost):
• HSRP-Default: 4
• HSRP-Optimized: 2
• HSRP with BFD-Default: 0
• HSRP with BFD-Optimized: 0
Values plotted in Graph. 4 (CPU utilization):
• HSRP without BFD: 2%
• HSRP with BFD: 5%
Values plotted in Graph. 5 (percentage of bandwidth consumption, Hello packet / IP SLA packet / BFD packet):
• HSRP-Default: 19.60% / 42.9% / 0%
• HSRP-Optimized: 42.40% / 42.9% / 0%
• HSRP with BFD-Default: 19.60% / 0% / 97.10%
• HSRP with BFD-Optimized: 42.40% / 0% / 97.10%
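The difference in traffic share follows from the sending intervals. As a rough sketch, the Python snippet below derives the nominal packet rate per sender from each interval and the average packet size from the totals reported in the results. The function names are illustrative, and treating 1 KB as 1000 bytes is an assumption, since the text does not say which convention the capture tool used.

```python
def packets_per_second(interval_ms: float) -> float:
    """Nominal packets per second for one sender at a fixed interval."""
    return 1000.0 / interval_ms


def average_packet_bytes(total_kb: float, packet_count: int, bytes_per_kb: int = 1000) -> float:
    """Average packet size in bytes; bytes_per_kb=1000 is an assumption."""
    return total_kb * bytes_per_kb / packet_count


# Intervals stated in the text.
bfd_echo_rate = packets_per_second(50)       # BFD echo every 50 msec
bfd_control_rate = packets_per_second(1000)  # BFD control every second
hello_rate = packets_per_second(1000)        # optimized HSRP hello every second

# Totals reported in the results sections.
bfd_avg_size = average_packet_bytes(1419, 25452 + 665)
hello_avg_size = average_packet_bytes(41, 670)

print(f"BFD packets per second per sender: {bfd_echo_rate + bfd_control_rate:.0f}")
print(f"Hello packets per second per sender: {hello_rate:.0f}")
print(f"Average BFD packet size: {bfd_avg_size:.1f} bytes")
print(f"Average hello packet size: {hello_avg_size:.1f} bytes")
```

With packets of similar size, a sender that emits about 21 BFD packets per second against 1 hello packet per second will dominate the captured traffic, which is consistent with the high BFD share reported above.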
X. CONCLUSION
After implementing and testing HSRP with and without BFD and analyzing four factors (convergence time, packet loss, CPU utilization, and bandwidth consumption), it is clear that using BFD with HSRP significantly improves convergence time and reduces packet loss, at the cost of higher CPU utilization and high bandwidth consumption. Network operators therefore have to balance reduced packet loss against the CPU and bandwidth consumed. Whether to use BFD depends on how critical downtime is for the organization, provided that sufficient resources are available to meet the CPU and bandwidth requirements.