When designing equipment and processes, engineers leave a safety margin so that equipment remains functional when a fault or defect affects it partially or wholly. Minor defects affecting production assets should not cause immediate breakdowns. A fault-tolerant system can remain operational for a predetermined interval before corrective measures are taken. Faults that affect system operation can originate from more than one source.
III. A. 8. Fault Tolerance
The Downside of a Fault Tolerant System
Maintaining high reliability or availability is a marked advantage for any system. The ability to avoid system downtime due to a single failure event is essential in many applications. Yet the fault tolerant capability comes at a price.
Here is a short list and brief description of fault tolerant design disadvantages:
Masking or obscuring low-level failures
The nature of a fault tolerant design is to continue to operate normally even with a component failure.
Thus, if detecting a component failure relies on a loss of function or capability, the failure may be difficult to detect. This sets the stage for a second component failure to bring the system down.
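The masking effect can be illustrated with a minimal Python sketch. The `RedundantSupply` class and the supply names `psu_a` and `psu_b` are hypothetical, invented here for illustration. The sketch assumes a load fed by two redundant power supplies and contrasts a function-level check, which hides the first failure, with a component-level check, which reveals it.

```python
from dataclasses import dataclass, field


@dataclass
class RedundantSupply:
    """Hypothetical load fed by redundant power supplies (illustration only)."""

    # Health flag per supply; True means the supply is working.
    healthy: dict = field(
        default_factory=lambda: {"psu_a": True, "psu_b": True}
    )

    def fail(self, name: str) -> None:
        # Simulate the failure of one supply.
        self.healthy[name] = False

    def output_ok(self) -> bool:
        # Function-level view: the load is powered if any supply works.
        return any(self.healthy.values())

    def failed_components(self) -> list:
        # Component-level view: lists failed supplies even when the
        # system as a whole still appears to work.
        return sorted(name for name, ok in self.healthy.items() if not ok)


system = RedundantSupply()
system.fail("psu_a")
print(system.output_ok())          # True: the failure is masked
print(system.failed_components())  # ['psu_a']: only this check reveals it
system.fail("psu_b")
print(system.output_ok())          # False: the second failure downs the system
```

If monitoring relies only on something like `output_ok()`, the first failure goes unnoticed and the system runs without redundancy until the second failure occurs.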
Deciding What Should Have Fault Tolerance
In some circumstances, it is desirable to ensure the system continues to operate even if there is an internal failure. An aircraft navigation system should be able to operate even if an internal dc-dc regulator fails, for example.
Not everything within a system benefits from being fault tolerant.
For example, the failure of a cabin reading light over a passenger seat is not critical to the safe operation of the aircraft, so the light is likely not designed to be fault tolerant. One criterion for deciding what should be fault tolerant is the criticality of the function the system provides.
This also applies to specific subsystems within a system, allowing some elements to be designed as fault tolerant and others not.
Fault Tolerance Basics
Fault tolerance is the property of a system that is resilient to the failure of elements within the system. It may also be called a fail-safe design.
A fault tolerant system may continue to operate normally after one of its power supplies fails, for example, or it may operate in a reduced or degraded state.
Other systems may have a ‘limp home’ condition, allowing the system to save critical data or allowing you to drive to a safe place to change a flat tire.
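The operating states described above can be sketched as a small Python function. The `Mode` names, the `operating_mode` function, and the thresholds `full_needed` and `minimum_needed` are assumptions made for illustration. A real system would define its own states and criteria.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    DEGRADED = "degraded"
    LIMP_HOME = "limp home"
    DOWN = "down"


def operating_mode(working: int, full_needed: int, minimum_needed: int) -> Mode:
    """Map the number of working units to a hypothetical operating mode.

    full_needed:    units required for full function (assumed threshold)
    minimum_needed: units required for reduced function (assumed threshold)
    """
    if not 0 < minimum_needed <= full_needed:
        raise ValueError("expected 0 < minimum_needed <= full_needed")
    if working >= full_needed:
        return Mode.NORMAL      # spare units absorb any failures so far
    if working >= minimum_needed:
        return Mode.DEGRADED    # still operating, with reduced capability
    if working > 0:
        return Mode.LIMP_HOME   # just enough to reach a safe state
    return Mode.DOWN


# Example: four units installed, three needed for full function,
# two needed for reduced function.
for working in range(4, -1, -1):
    print(working, operating_mode(working, full_needed=3, minimum_needed=2).value)
```

With four units installed and three needed, the first failure leaves the system in the normal mode, which is the fault tolerant behavior. Further failures step it down through the degraded and limp home modes before it goes down.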