FAULT TOLERANT SYSTEM
 A fault-tolerant system is a system that is able to
  continue operating despite the failure of a limited
  subset of its hardware or software.

 Such systems are gracefully degradable, i.e. as the size
  of the faulty set increases, the system does not collapse
  suddenly but continues executing part of its
  workload.

 The goal of this design is to ensure that the
  probability of system failure is acceptably small.
FAULT TYPES

Hardware Fault: A hardware fault is a physical
defect that can cause a component to malfunction.
      E.g. a broken wire, or the output of a logic gate
that is perpetually stuck at some logic value (0 or 1).

Software Fault: A software fault is a bug that can
cause the program to fail for a given set of inputs.
ERROR
 An error is a manifestation of a fault.
   e.g. A broken wire will cause an error if
the system tries to propagate a signal
through it.
A program that has a fault that induces
incorrect output for some set of inputs will
generate errors if that set of inputs is
applied.
FAULT LATENCY
The fault latency is the duration between
the onset of a fault and its manifestation as
an error.

Faults themselves are invisible to the outside
world and show themselves only when they cause
errors, so this latency affects the reliability
of the overall system.
ERROR RECOVERY
   Error recovery is the process by which the system attempts to
recover from the effects of an error.

TYPES OF ERROR RECOVERY
Forward Error Recovery: In this type, the error is
masked without any computations having to be
redone.
Backward Error Recovery: In this type, the system is
rolled back to a moment in time before the error is
believed to have occurred, and the computation is carried out
again. It consumes additional time to mask the effects
of the failure.
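Backward error recovery can be sketched with a checkpoint that is saved before a computation and restored whenever an acceptance check rejects the result. The following is a minimal Python illustration, assuming an in-memory dictionary as the system state; the names `run_with_rollback`, `is_valid`, and `make_flaky_step` are hypothetical, and a real system would checkpoint to stable storage.

```python
import copy
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_with_rollback(
    state: State,
    step: Callable[[State], None],
    is_valid: Callable[[State], bool],
    max_retries: int = 3,
) -> State:
    """Backward error recovery: run `step` starting from a saved checkpoint.

    If the acceptance check fails, the faulty copy is discarded (a rollback to
    the checkpoint) and the computation is carried out again.
    """
    checkpoint = copy.deepcopy(state)  # state saved before the computation
    for _ in range(max_retries + 1):
        working = copy.deepcopy(checkpoint)  # every attempt restarts from the checkpoint
        step(working)
        if is_valid(working):
            return working
    raise RuntimeError("error persisted after all retries")


def make_flaky_step() -> Callable[[State], None]:
    """Hypothetical step that suffers a transient error on its first run only."""
    calls = {"n": 0}

    def step(s: State) -> None:
        calls["n"] += 1
        s["total"] += 10
        if calls["n"] == 1:
            s["total"] = -999  # simulated corruption by a transient fault

    return step


if __name__ == "__main__":
    start = {"total": 5}
    result = run_with_rollback(start, make_flaky_step(), lambda s: s["total"] >= 0)
    print(result)  # {'total': 15}: the second attempt succeeds
```

The retry costs a second execution of the step, which is the extra time the slide attributes to backward recovery.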
CAUSES OF FAULTS

Errors in the specification or design.

Defects in the components.

Environmental effects.
Errors In The Specification Or Design

These errors arise from the communication
gap between the person who writes the
specification and the system designer.

The specification is the link between the design
process and the real-world application.

If the specification is wrong, everything that
follows from it is likely to be wrong.
Defects In Components
  These faults arise from defects caused by the
wear and tear of use.

  E.g. A MOSFET device may fail due to electromigration,
the gradual drift over time of metal atoms in its
interconnect toward the anode, driven by current flow.
Environmental Effects

These faults arise from the operating environment.

 Devices can be subjected to a whole array of
stresses, depending on the application.

Poor ventilation or excessively high ambient
temperatures can melt or otherwise damage
components.

  e.g. If a computer is in a missile, it can undergo
high g-forces and vibrational stress.
FAULT TYPES
Faults are classified according to their temporal
behavior and output behavior.

A fault is said to be active when it is physically
capable of generating errors and to be benign when
it is not.
TEMPORAL BEHAVIOR CLASSIFICATION

 Fault types: permanent, intermittent, and transient.
A permanent fault does not die away with time,
but remains until it is repaired or the affected unit is
replaced.

An intermittent fault cycles between the fault-active
and fault-benign states.

A transient fault dies away after some time.
Intermittent faults can be caused by loosely
 connected components.

Transient faults can be caused by environmental
 effects.
     e.g. If there is a burst of electromagnetic
 radiation and the memory is not properly shielded,
 the contents of the memory can be altered without
 the memory chips themselves suffering any
 structural damage. When the memory is rewritten,
 the fault will go away.
OUTPUT BEHAVIOR CLASSIFICATION
  Malicious faults

   • Produce inconsistent outputs; these errors are
     harder to neutralize

   • The faulty unit behaves arbitrarily
  Non-malicious faults
   • Produce consistent output errors

   • These errors are easier to neutralize
Fail stop
   A fail-stop system responds to up to a certain maximum
   number of failures by simply stopping,
   rather than putting out incorrect outputs.

Fail safe
   A fail-safe system's failure mode is biased so that the
   application process does not suffer
   catastrophe upon failure.
INDEPENDENCE AND CORRELATION
  Component failures may be independent or
correlated.

         Independent: A failure is said to be
independent if it does not directly or indirectly
cause another failure.

 Correlated: Failures are said to be correlated if
they are related in some way, e.g. they may be
triggered by the same cause, or one of them might
cause the others to occur.
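A short worked example shows why the distinction matters for redundancy. Assume, for illustration only, two redundant components whose failure events over the same mission time are F_1 and F_2, each with probability 10^{-3}:

```latex
\begin{aligned}
P(F_1 \cap F_2) &= P(F_1)\,P(F_2 \mid F_1) && \text{(general case)}\\
P(F_1 \cap F_2) &= P(F_1)\,P(F_2) = 10^{-3} \cdot 10^{-3} = 10^{-6} && \text{(independent)}\\
P(F_1 \cap F_2) &= 10^{-3} \cdot 0.5 = 5 \times 10^{-4} && \text{(correlated, assuming } P(F_2 \mid F_1) = 0.5\text{)}
\end{aligned}
```

In this assumed case a common cause makes simultaneous failure 500 times more likely, which erodes most of the benefit of duplicating the component.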
FAULT DETECTION
    There are two ways to determine that a processor is
malfunctioning:
• Online
• Offline

Online Detection:

• This detection runs in parallel with normal system operation.
• It is done by checking for behavior that is inconsistent with
correct operation.
• Indications of a faulty processor:
  - Branching to an invalid destination.
  - Fetching an opcode from a location that contains data.
  - Writing into a portion of memory to which the
    process has no write access.
  - Fetching an illegal opcode.
  - Being inactive for more than a prescribed period.

• A monitor is associated with each processor,
  looking for signs that the processor is faulty. The
  monitor watches the data and address lines.

• Another approach is to have multiple processors,
  which are supposed to put out the same result, and
  compare their results. A discrepancy indicates a
  fault.
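The last indication in the list, inactivity beyond a prescribed period, is commonly caught with a watchdog timer. Below is a minimal sketch of such a monitor, assuming a software heartbeat; the names `Watchdog`, `kick`, `expired`, and `FakeClock` are illustrative, and hardware watchdogs typically reset the processor rather than return a flag.

```python
import time
from typing import Callable


class Watchdog:
    """Inactivity monitor: reports a fault if no heartbeat arrives within `timeout`.

    The monitored processor (or task) calls kick() as it makes progress; the
    monitor calls expired() to check whether the prescribed period has passed.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so the sketch can be demonstrated deterministically
        self.last_kick = clock()

    def kick(self) -> None:
        """Heartbeat from the monitored processor."""
        self.last_kick = self.clock()

    def expired(self) -> bool:
        """True if the processor has been inactive longer than the timeout."""
        return self.clock() - self.last_kick > self.timeout


class FakeClock:
    """Manually advanced clock for demonstration (hypothetical helper)."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


if __name__ == "__main__":
    clk = FakeClock()
    wd = Watchdog(timeout=1.0, clock=clk)
    clk.t = 0.5
    wd.kick()
    clk.t = 1.2
    print(wd.expired())  # False: only 0.7 s since the last heartbeat
    clk.t = 2.0
    print(wd.expired())  # True: 1.5 s of inactivity
```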
OFFLINE DETECTION

It is done by running diagnostic tests.


These tests are scheduled just like ordinary tasks.
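As an illustration of a diagnostic that could be scheduled like an ordinary task, the sketch below runs a MATS+-style march test over a simulated memory to find cells stuck at 0 or 1. The march sequence is: write 0 to every cell; ascending, read 0 then write 1; descending, read 1 then write 0. The `Memory` class and its stuck-at model are assumptions made for the example; a real test would exercise physical memory.

```python
from typing import Dict, List, Optional


class Memory:
    """Simulated bit-per-cell memory; `stuck` maps address -> stuck-at value (0 or 1)."""

    def __init__(self, size: int, stuck: Optional[Dict[int, int]] = None) -> None:
        self.cells = [0] * size
        self.stuck = stuck or {}

    def write(self, addr: int, bit: int) -> None:
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck.get(addr, bit)

    def read(self, addr: int) -> int:
        return self.stuck.get(addr, self.cells[addr])


def mats_plus(mem: Memory) -> List[int]:
    """Return the addresses that fail a MATS+ march test."""
    n = len(mem.cells)
    faulty = set()
    for a in range(n):  # element 1: write 0 everywhere
        mem.write(a, 0)
    for a in range(n):  # element 2, ascending: read 0, write 1
        if mem.read(a) != 0:
            faulty.add(a)
        mem.write(a, 1)
    for a in reversed(range(n)):  # element 3, descending: read 1, write 0
        if mem.read(a) != 1:
            faulty.add(a)
        mem.write(a, 0)
    return sorted(faulty)


if __name__ == "__main__":
    print(mats_plus(Memory(16)))  # []
    print(mats_plus(Memory(16, stuck={3: 1, 9: 0})))  # [3, 9]
```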
FAULT AND ERROR CONTAINMENT

The process of preventing an error from spreading from one
part of the system to another is called containment.

When a fault or error occurs in one part of a system, it
can spread through the system like an infectious disease.
   e.g. A fault in one part of the system might cause
large voltage swings in another.

 A fault-free processor can give erroneous results
when it gets input from a faulty unit.
FAULT CONTAINMENT IS ACCOMPLISHED BY

The system is divided into fault and error
containment zones (FCZs and ECZs).

An FCZ is a subset of the system that operates
correctly despite arbitrary logical or electrical faults
outside the subset, i.e. the failure of some part of
the computer outside an FCZ cannot cause any
element inside the FCZ to fail.
 Hardware inside an FCZ must be isolated from
  hardware outside it. It should withstand either a short
  circuit or the application of the maximum voltage
  on the lines connecting the FCZ to the
  outside world.

 Each FCZ should have an independent power supply
  and its own clocks. These clocks are synchronized
  with the clocks in other FCZs, but a malfunction in
  the outside clocks won't affect the clocks inside the
  FCZ.

 The function of an ECZ is to prevent errors from
  propagating across zone boundaries. This is achieved
  by voting redundant outputs.
REDUNDANCY
     A fault-tolerant system (FTS) relies on properly managed
redundancy, i.e. the system is kept
running despite the failure of some of its parts.

  It must have spare capacity to begin with.

TYPES OF REDUNDANCY
• Hardware redundancy
• Software redundancy
• Time redundancy
• Information redundancy
Hardware redundancy
         Hardware redundancy is the use of additional
hardware to compensate for failures. This can be
accomplished in two ways.

• The first is fault detection, correction, and masking.

Fault detection: Multiple hardware units may be
assigned to do the same task in parallel, and their results
are compared.
          If one or more units are faulty, we can expect
this to show up as a disagreement in the results.
Fault masking: If a minority of the units are faulty and a
majority of the units produce the same output, the majority
result can be used and the effect of the failure is masked.

Fault correction: If a minority of the units disagree, the fault
is detected, so the computation is repeated on other
processors to correct that fault.

• The second is replacing the
malfunctioning unit. The system can be
designed so that faulty units can be easily replaced with
spare ones.
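Detection and masking by comparison can be sketched as a majority voter. This is a minimal Python illustration under the assumption that unit outputs are directly comparable values; `vote` and `NoMajorityError` are hypothetical names. The list of dissenting units is what a fault-correction step could use to decide where to repeat the computation.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


class NoMajorityError(Exception):
    """Raised when a fault is detected but no majority exists to mask it."""


def vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Majority voter over the outputs of redundant units.

    Returns (majority_value, dissenting_unit_indices):
    - no dissenters: no fault detected;
    - some dissenters: fault detected and masked by the majority.
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise NoMajorityError(f"outputs disagree with no majority: {list(outputs)}")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


if __name__ == "__main__":
    print(vote([42, 42, 42]))  # (42, []): all units agree
    print(vote([42, 17, 42]))  # (42, [1]): unit 1 is faulty, its error is masked
```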
Two methods used in hardware redundancy are:

  •Static Pairing

  •N-Modular Redundancy (NMR)
STATIC PAIRING
• Processors are hardwired in pairs, and the entire
pair is discarded if one of its processors fails. This is a very
simple scheme.

• The two processors of a pair run identical software with
identical inputs and should generate identical outputs. If the
outputs are not identical, the pair is nonfunctional, so the
entire pair is discarded.

• This approach is depicted in the following figure. It works
only when the interface is working correctly and the two
processors do not fail identically at around the same time.
• The interface is monitored by a
  monitor. If the interface fails, the monitor takes
  care of it, and if the monitor fails, the interface
  takes care of it. If both the interface and the monitor fail,
  the system is down.
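Static pairing can be sketched by modelling each processor as a function and comparing the two outputs of a pair. This is a minimal illustration, not the deck's design: the `Pair` class, `run_on_pairs`, and the assumption that a spare pair takes over after a pair is discarded are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Pair:
    """Two hardwired processors, modelled as functions that should agree."""
    a: Callable[[int], int]
    b: Callable[[int], int]
    alive: bool = True


def run_on_pairs(pairs: List[Pair], x: int) -> Optional[int]:
    """Run the same input on both halves of the first live pair.

    If the two outputs differ, the whole pair is discarded and the next spare
    pair is tried. Returns None when no functional pair remains.
    """
    for pair in pairs:
        if not pair.alive:
            continue
        out_a, out_b = pair.a(x), pair.b(x)
        if out_a == out_b:
            return out_a
        pair.alive = False  # disagreement: discard the entire pair
    return None


if __name__ == "__main__":
    def good(x: int) -> int:
        return 2 * x

    def faulty(x: int) -> int:  # hypothetical faulty processor
        return 2 * x + 1

    pairs = [Pair(good, faulty), Pair(good, good)]
    print(run_on_pairs(pairs, 21), [p.alive for p in pairs])  # 42 [False, True]
```

If both halves of a pair fail identically, their outputs still match and the fault goes unnoticed, which is the limitation the slide points out.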
N MODULAR REDUNDANCY
• It is a scheme for forward error recovery.

• It uses N processors instead of one and votes on
their outputs; N is usually odd.

• NMR can be organized in the following
two ways:
   There are N voters, and the entire cluster
   produces N outputs.

   There is just one voter.
•   NMR clusters are designed to allow the purging
    of malfunctioning units. That is, when a failure is
    detected, the failed unit is checked to see
    whether or not the failure is transient. If it is not, the unit
    must be electrically isolated from the rest of the
    cluster and a replacement unit switched on.
    The faster the unit is replaced, the more reliable
    the cluster.
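The benefit of voting can be quantified with the standard reliability expression for NMR without repair. The sketch below assumes the single-voter organization, independent module failures, identical module reliability r, and a voter reliability voter_r; the cluster survives while at most m = (N - 1) / 2 modules have failed, so R_NMR = voter_r * sum over i = 0..m of C(N, i) (1 - r)^i r^(N - i). The function name is hypothetical.

```python
from math import comb


def nmr_reliability(n: int, r: float, voter_r: float = 1.0) -> float:
    """Reliability of an N-modular redundant cluster without repair.

    Assumes: n is odd, module failures are independent, each module has
    reliability r, and a single voter with reliability voter_r. The cluster
    works while at most m = (n - 1) // 2 modules have failed.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    m = (n - 1) // 2
    modules_ok = sum(comb(n, i) * (1 - r) ** i * r ** (n - i) for i in range(m + 1))
    return voter_r * modules_ok


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, f"{nmr_reliability(n, 0.9):.5f}")  # 0.90000, 0.97200, 0.99144
```

For N = 3 the expression reduces to 3r^2 - 2r^3, which is worse than a single module when r < 0.5 (e.g. 0.352 at r = 0.4), so voting only helps when the modules are individually reliable.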
• Purging can be done either by hardware or by the operating
  system.

• Self-purging consists of a monitor at each unit comparing its
  output against the voted output. If there is a difference, the
  monitor disconnects the unit from the system.

• The monitor can be described as a finite state machine with
  two states, connect and isolate. There are two signals: diff,
  which is set to 1 whenever the module output disagrees
  with the voter output, and reconnect, which is a command
  from the system to reconnect the module.
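The self-purging monitor described above can be written as a two-state machine. This sketch assumes one reading of the transitions: in connect, diff = 1 moves to isolate; in isolate, reconnect = 1 moves back to connect; otherwise the state is held. `MonitorState`, `next_state`, and `run_monitor` are hypothetical names.

```python
from enum import Enum
from typing import Iterable, List, Tuple


class MonitorState(Enum):
    CONNECT = "connect"
    ISOLATE = "isolate"


def next_state(state: MonitorState, diff: int, reconnect: int) -> MonitorState:
    """One transition of the self-purging monitor.

    CONNECT --diff=1--> ISOLATE (module output disagreed with the voted output)
    ISOLATE --reconnect=1--> CONNECT (system commands the module back in)
    Otherwise the monitor stays in its current state.
    """
    if state is MonitorState.CONNECT:
        return MonitorState.ISOLATE if diff else MonitorState.CONNECT
    return MonitorState.CONNECT if reconnect else MonitorState.ISOLATE


def run_monitor(inputs: Iterable[Tuple[int, int]]) -> List[MonitorState]:
    """Feed (diff, reconnect) pairs, starting connected; return the state after each."""
    state = MonitorState.CONNECT
    trace = []
    for diff, reconnect in inputs:
        state = next_state(state, diff, reconnect)
        trace.append(state)
    return trace


if __name__ == "__main__":
    steps = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
    print([s.value for s in run_monitor(steps)])
    # ['connect', 'isolate', 'isolate', 'connect', 'connect']
```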
SOFTWARE REDUNDANCY
• Software faults are not like hardware faults: software
never wears out, and its faults are not generated
spontaneously during system operation.

• Software faults can be regarded as faults in
design.

• For software redundancy, simply replicating the
same software N times will not work: all N copies
will fail for the same inputs.

• Instead, N versions of the software can be
implemented. The N versions are developed by
independent teams of developers who never
communicate with each other.

• To minimize common-mode failures:

  - The specifications should be written in formal
    terms and subjected to a rigorous checking
    process.

  - Multiple software versions should be developed in
    different programming languages.

  - The tools used should be selected carefully.

  - The training and quality of the programmers should
    be maintained.
There are two approaches to software redundancy:

   •N Version Programming

   •Recovery Block Approach
N Version Programming
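In N-version programming, the independently developed versions run on the same input and their outputs are voted on, much as in hardware NMR. The sketch below simulates design diversity with three hypothetical integer square root routines written with different algorithms; `isqrt_linear` carries a deliberately injected design fault so the vote has something to mask. All function names are illustrative, and real versions would come from separate teams and, ideally, different languages.

```python
from collections import Counter
from typing import Callable, List, Sequence


def isqrt_newton(n: int) -> int:
    """Version A: integer Newton iteration for floor(sqrt(n))."""
    if n < 2:
        return n
    x, y = n, (n + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def isqrt_bisect(n: int) -> int:
    """Version B: binary search for the largest r with r * r <= n."""
    lo, hi = 0, n + 1  # invariant: lo * lo <= n < hi * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid
    return lo


def isqrt_linear(n: int) -> int:
    """Version C, with an injected design fault: '<' instead of '<=' (wrong on perfect squares)."""
    r = 0
    while (r + 1) * (r + 1) < n:
        r += 1
    return r


VERSIONS: List[Callable[[int], int]] = [isqrt_newton, isqrt_bisect, isqrt_linear]


def n_version(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on the same input and return the majority result."""
    results = [version(x) for version in versions]
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError(f"no majority among {results}")
    return value


if __name__ == "__main__":
    print(isqrt_linear(16), n_version(VERSIONS, 16))  # 3 4: version C is outvoted
```

The vote only helps if the versions rarely fail on the same inputs, which is why the slide stresses independent teams and diverse specifications, languages, and tools.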
Recovery Block Approach
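In the usual formulation, a recovery block saves a checkpoint, runs a primary routine, and checks its result with an acceptance test; if the test fails, the state is restored from the checkpoint and an alternate routine is tried. This is backward error recovery applied to software. The sketch below is a minimal Python illustration; `recovery_block`, the buggy `primary_sort`, the insertion-sort alternate, and the `sorted_permutation` acceptance test are all hypothetical.

```python
import copy
from collections import Counter
from typing import Any, Callable, List, Sequence


def recovery_block(
    data: Any,
    alternates: Sequence[Callable[[Any], Any]],
    acceptance_test: Callable[[Any, Any], bool],
) -> Any:
    """Try the primary and then each alternate until one passes the acceptance test."""
    checkpoint = copy.deepcopy(data)  # recovery point
    for alternate in alternates:
        try:
            result = alternate(copy.deepcopy(checkpoint))  # each try starts from the checkpoint
        except Exception:
            continue  # a crash counts as failing the acceptance test
        if acceptance_test(checkpoint, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sort(xs: List[int]) -> List[int]:
    """Hypothetical primary with a design fault: drops an element for long lists."""
    out = sorted(xs)
    return out[:-1] if len(out) > 3 else out


def alternate_sort(xs: List[int]) -> List[int]:
    """Hypothetical, simpler alternate: insertion sort."""
    out: List[int] = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def sorted_permutation(original: List[int], result: List[int]) -> bool:
    """Acceptance test: result is ordered and contains exactly the original items."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and Counter(result) == Counter(original)


if __name__ == "__main__":
    blocks = [primary_sort, alternate_sort]
    print(recovery_block([3, 1, 2, 5, 4], blocks, sorted_permutation))  # [1, 2, 3, 4, 5]
```

Unlike N-version programming, the alternates run one after another and only when needed, so the approach trades extra time on failure for less redundant computation, and it depends on the acceptance test being able to catch the error.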
THANK YOU
