Fault management presentation

Fault Management
and
Proactive Maintenance
Ardhita Banu Adji
Irnanta wahyu andari
Wibisono Juhdi

Overview
• Nowadays it is a matter of course that you can use your mobile phone
anytime and almost everywhere. This makes fault management one of the most
important aspects of network management. Most problems are found during
testing, but not all. The reason is the limited amount of time spent on
testing, and the fact that these systems will be used in several different
network environments, with equipment from different vendors connected to
the nodes.

Chapter 1 : Introduction
• Greater demands and people's increasing dependency on mobile
communications networks are the main drivers for creating better error
detection mechanisms for mobile networks. Modern mobile communications
networks may produce hundreds of alarms in one day. Fault situations can
arise, for example, from hardware and software failures or from
operational errors.
• One solution to this problem is to divide the alarms into classes, so
that, for example, A1 is the most critical alarm class, A2 is the second
most critical class, and so on.
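
The class scheme above can be sketched as a simple priority ordering. This is only a minimal illustration: the slides name just A1 and A2, so the A3 class, the Alarm fields and the sample alarms are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical class labels: the slides name only A1 and A2 ("and so on"),
# so A3 is an assumed continuation of the scheme.
SEVERITY_ORDER = {"A1": 1, "A2": 2, "A3": 3}


@dataclass(frozen=True)
class Alarm:
    node: str         # assumed field: where the alarm was raised
    alarm_class: str  # "A1" is the most critical class
    text: str


def by_severity(alarms):
    """Return the alarms sorted so the most critical class comes first."""
    return sorted(alarms, key=lambda a: SEVERITY_ORDER[a.alarm_class])


# Made-up sample alarms, listed in arrival order.
alarms = [
    Alarm("node-2", "A3", "fan speed low"),
    Alarm("node-1", "A1", "link down"),
    Alarm("node-3", "A2", "high CPU load"),
]
ordered = by_severity(alarms)
print([a.alarm_class for a in ordered])
```

Sorting by class lets an operator look at the most critical alarms first when hundreds arrive in one day.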

Chapter 1.1 : Concept Definitions
Fault => a cause of malfunctioning. Faults make it difficult or impossible
for a system to function normally, and they manifest themselves through
errors, that is, deviations from the normal operation of the system.
Alarm => a notification concerning a detected fault or abnormal condition,
which may or may not represent an error. An alarm report is a kind of event
report used to transport alarm information. Alarms that are defined by the
vendors are observable by the network operator.

Chapter 2 : Fault management
• One of the most important areas in telecommunications network management
is the management of faults that occur during the normal operation of these
networks. By definition, fault management is about the detection, isolation
and correction of faults. The main requirements for performing fault
management are :
1. Information on the network's real-time functioning
2. Information on the abnormalities that occur during operation

Chapter 2.1 : Fault diagnosis
• Fault diagnosis is the stage in the fault management process that
consists of finding the original cause of the received alarms, by
formulating a set of fault hypotheses and then identifying the root cause
of the problem.

Chapter 2.2 : Alarm correlation
Alarm correlation is a conceptual interpretation of multiple alarms, leading
to the attribution of a new meaning to the original alarms. Its operations :
1. Compression -> reduces multiple occurrences of an alarm to a single alarm
2. Counting -> generates a new alarm once an alarm has occurred a given
number of times
3. Suppression -> temporary inhibition of the alarms related to a given event
4. Scaling -> an alarm is canceled and another is created in its place
5. Filtering -> suppressing a given alarm
6. Generalization -> replacing an alarm by the alarm corresponding to its
super-class
7. Specialization -> the reverse of generalization
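
Three of the operations listed above (compression, counting and filtering) can be sketched in a few lines. This is a rough illustration rather than a real correlation engine: the (node, alarm type) tuple format, the threshold value and the sample alarms are assumptions made for the example.

```python
from collections import Counter

# Alarms are modeled as (node, alarm_type) tuples; this shape is an assumption.
raw = [
    ("node-1", "link down"),
    ("node-1", "link down"),
    ("node-1", "link down"),
    ("node-2", "high CPU load"),
    ("node-3", "door open"),
]


def compress(alarms):
    """Compression: keep one copy of each repeated alarm, preserving order."""
    return list(dict.fromkeys(alarms))


def count(alarms, threshold):
    """Counting: emit a new alarm when an alarm occurs at least `threshold` times."""
    occurrences = Counter(alarms)
    return [
        (node, f"repeated: {kind}")
        for (node, kind), n in occurrences.items()
        if n >= threshold
    ]


def filter_out(alarms, ignored_types):
    """Filtering: drop alarms whose type is in `ignored_types`."""
    return [alarm for alarm in alarms if alarm[1] not in ignored_types]


compressed = compress(raw)
counted = count(raw, threshold=3)  # threshold of 3 is an assumed value
filtered = filter_out(compressed, ignored_types={"door open"})
print(compressed, counted, filtered, sep="\n")
```

In this sketch, five raw alarms shrink to three after compression, and the repeated "link down" alarm is replaced by a single new alarm by the counting step.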

Chapter 3 : Proactive maintenance
Purpose : to reduce errors and faults in the network, and to prepare for
situations where there is no fast or easy way to fix a fault.

Chapter 3.1 : Health check
• Health check : a predefined set of commands executed in the system to
find out whether it is functioning correctly.
• Health check script : consists of the commands to be executed during the
health check.
• Alarms : if there are alarms, the health check script writes an entry to
the result file.
• Configuration errors : the most common problems that systems have.
• Status of the software level : checks whether a newer software level is
available.
• Software comparison : the software on the nodes is compared to verify
that it is exactly the same.
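
The checks above can be sketched as a small script. This is an illustration under stated assumptions, not a vendor tool: the node inventory, the software level names (R5.1, R5.2) and the wording of the result lines are all made up, and an in-memory buffer stands in for the result file.

```python
import io

# Hypothetical node inventory; real data would come from the network elements.
nodes = {
    "node-1": {"software": "R5.2", "alarms": ["A2 high CPU load"]},
    "node-2": {"software": "R5.2", "alarms": []},
    "node-3": {"software": "R5.1", "alarms": []},
}


def health_check(nodes, latest_software, result_file):
    """Run the predefined checks and write one line per finding."""
    for name, info in sorted(nodes.items()):
        # Alarms: every alarm on the node becomes an entry in the result file.
        for alarm in info["alarms"]:
            result_file.write(f"{name}: alarm: {alarm}\n")
        # Software level status: flag nodes that are not on the latest level.
        if info["software"] != latest_software:
            result_file.write(
                f"{name}: newer software level available: {latest_software}\n"
            )
    # Software comparison: all nodes should run exactly the same level.
    levels = sorted({info["software"] for info in nodes.values()})
    if len(levels) > 1:
        result_file.write(
            "software comparison: nodes differ: " + ", ".join(levels) + "\n"
        )


result = io.StringIO()  # stands in for the result file
health_check(nodes, latest_software="R5.2", result_file=result)
report = result.getvalue().splitlines()
print("\n".join(report))
```

An empty result file would mean that none of these checks found a problem.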

Chapter 3.2 : Backup plans
• If there is a major fault that needs immediate recovery action, backups
are essential. The more important the device, the more often backups should
be taken. After a backup is taken, the integrity of the data should be
verified to be sure that the backup is actually valid.
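
One common way to verify integrity is to compare checksums of the original data and the backup copy. The sketch below assumes that approach; the slides do not say which method is used, and the file names and contents are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_valid(original, backup):
    """A backup counts as valid here if its checksum matches the original's."""
    return sha256_of(original) == sha256_of(backup)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "config.db"    # hypothetical file name
    backup = Path(tmp) / "config.db.bak"  # hypothetical file name
    original.write_bytes(b"node configuration data")

    backup.write_bytes(original.read_bytes())  # take the backup
    valid_copy = backup_is_valid(original, backup)

    backup.write_bytes(b"corrupted")  # simulate a damaged backup
    corrupted_copy = backup_is_valid(original, backup)

print(valid_copy, corrupted_copy)
```

A matching checksum only shows that the copy is identical to the source; a test restore is still the stronger check that the backup can actually be used.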

Conclusion
As the demand for mobile communication grows, problems occur more often and
can sometimes disrupt the use of mobile communication. Fault management
explains how to handle faults, errors and alarm detection, as well as
software and hardware that misbehave on a device.

Questions and Answers
1. Explain why fault management is one of the most important aspects of
network management.
• Answer : Because it is a matter of course that you can use your mobile
phone anytime and almost everywhere, and modern mobile communications
networks may produce hundreds of alarms in one day. With that many alarms,
it is very important to manage all of the faults, whether in real time or
not. Most problems are found during testing, but not all. The reason is the
limited amount of time spent on testing, and the fact that these systems
will be used in several different network environments, with equipment from
different vendors connected to the nodes.

2. What causes an alarm?
• Answer : An alarm is caused when a fault or abnormal condition is
detected in the device. In the ideal situation, every fault (hardware,
software or configuration) and every abnormal situation happening in the
network would cause an alarm, and the alarm text would indicate
unambiguously where the problem is.

3. What is fault management?
Answer : Fault management is about the detection, isolation and correction
of faults. It works by analyzing information on the network's real-time
functioning and noticing abnormalities that occur during the operation of
a device.

4. Explain fault diagnosis and alarm correlation.
Answer :
• Fault diagnosis -> the stage in the fault management process that consists
of finding the original cause of the received alarms. Before getting to the
original cause it may be necessary to formulate a set of fault hypotheses,
which are then used to reproduce and validate the problem where possible.
Finding the root cause of a problem is essential for effective fault
management.
• Alarm correlation -> a conceptual interpretation of multiple alarms,
leading to the attribution of a new meaning to the original alarms by :
1. Compression -> reduces multiple occurrences of an alarm to a single alarm
2. Counting -> generates a new alarm once an alarm has occurred a given
number of times
3. Suppression -> temporary inhibition of the alarms related to a given event
4. Scaling -> an alarm is canceled and another is created in its place
5. Filtering -> suppressing a given alarm
6. Generalization -> replacing an alarm by the alarm corresponding to its
super-class
7. Specialization -> the reverse of generalization

5. What is the relation between fault management and proactive maintenance?
Answer :
In our opinion, fault management does its task before proactive maintenance
does. Fault management detects, isolates and corrects faults. Proactive
maintenance then helps to reduce errors and to prepare for situations where
there is no fast or easy way to fix a fault. The two tasks depend on each
other: without fault management we would not know what the problems are, so
we could neither reduce them nor prepare for the worst case.

6. What do we do in proactive maintenance? Explain each activity.
Answer :
Health Check : elements that help reduce faults
• Health check : a predefined set of commands executed in the system to find
out whether it is functioning correctly.
• Health check script : consists of the commands to be executed during the
health check.
• Alarms : if there are alarms, the health check script writes an entry to
the result file.
• Configuration errors : the most common problems that systems have.
• Status of the software level : checks whether a newer software level is
available.
• Software comparison : the software on the nodes is compared to verify that
it is exactly the same.
Backup Plan
If there is a major fault that needs immediate recovery action, backups are
essential. The more important the device, the more often backups should be
taken. After a backup is taken, the integrity of the data should be verified
to be sure that the backup is actually valid.
Example backup plan :
• Sometimes there is a major software fault, or a fault that is very hard to
localize or fix. In these situations the only way to recover quickly is to
restore the system from existing backups. Backups should be taken often and
stored in a place that is remotely accessible.
