Effectively communicating system redundancy is important because redundancy touches system performance, risk management, disaster recovery, regulatory compliance, and customer & owner confidence. Getting the redundancy communication wrong produces blind spots and surprises. Getting it right produces a well-oiled, predictable machine. This article provides proven tips for effectively communicating system redundancy.
[Read more…]
Articles tagged Fault tolerance
How Systems & Reliability Engineers Apply Redundancy to Facilities and Critical Infrastructure
Redundancy in facilities and critical infrastructure is often misunderstood as simply having two of something. However, redundancy is a sophisticated strategy used by systems and reliability engineers to minimize failures and ensure continuous operation. It is one of several approaches to preventing system failures and comes with several key tradeoffs. This article examines four key aspects, or the four horsemen, of redundancy and why it is so important for facilities and critical infrastructure.
[Read more…]
Reliability Techniques For Analyzing And Improving Fault Tolerance
When designing equipment and processes, engineers leave a safety margin that ensures equipment remains functional when a fault or defect affects it partially or wholly. Minor defects affecting production assets should not cause immediate breakdowns. A fault-tolerant system remains operational for a predetermined interval before corrective measures are undertaken. Faults that affect the operation of different systems can originate from more than one source. [Read more…]
The Downside of a Fault Tolerant System
Maintaining high reliability or availability is a marked advantage for any system. The ability to avoid system downtime due to a single failure event is essential in many applications. Yet the fault tolerant capability comes at a price.
Here is a short list and brief description of fault tolerant design disadvantages:
Masking or obscuring low-level failures
By its nature, a fault tolerant design continues to operate normally even with a component failure.
Thus if the ability to detect a component failure relies on a loss of function or capability, it may be difficult to detect the failure. This sets the stage for a second component failure to cause a system downing event. [Read more…]
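To make the masking problem concrete, here is a minimal sketch, assuming a hypothetical pair of redundant power supplies (the `Supply` class and the names `PSU-A` and `PSU-B` are illustrative, not from any real system). It shows that a system-level check keeps reporting success after one supply fails, while a component-level check reveals the hidden failure.

```python
from dataclasses import dataclass


@dataclass
class Supply:
    """Hypothetical redundant power supply with a simple health flag."""
    name: str
    healthy: bool = True


def system_powered(supplies):
    """System-level view: power is available if at least one supply works."""
    return any(s.healthy for s in supplies)


def latent_failures(supplies):
    """Component-level view: names of failed supplies the system output hides."""
    return [s.name for s in supplies if not s.healthy]


supplies = [Supply("PSU-A"), Supply("PSU-B")]
supplies[0].healthy = False  # first component failure

# The function-based check still passes, so the failure goes unnoticed
# unless each component is monitored individually.
print("System powered:", system_powered(supplies))
print("Latent failures:", latent_failures(supplies))
```

If monitoring relies only on `system_powered`, the loss of redundancy stays invisible until the second supply fails. A per-component check along the lines of `latent_failures` is one way to catch it early.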
Deciding What Should Have Fault Tolerance
In some circumstances, it is desirable to ensure the system continues to operate even if there is an internal failure. An aircraft navigation system should be able to operate even if an internal dc-dc regulator fails, for example.
Not everything within a system benefits from being fault tolerant.
For example, the failure of a cabin reading light over a passenger seat is not critical to the safe operation of the aircraft, and thus the light is likely not designed to be fault tolerant. One criterion for determining what should be fault tolerant is the criticality of the function the system provides.
This also applies to specific subsystems within a system, allowing some elements to be designed as fault tolerant and others not. [Read more…]
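The criticality criterion can be sketched as a simple screening rule. This is only an illustration: the three criticality levels, the threshold, and the function list below are assumptions made for the example, not an established classification scheme.

```python
from enum import IntEnum


class Criticality(IntEnum):
    """Hypothetical criticality levels, ordered from least to most critical."""
    CONVENIENCE = 1
    MISSION = 2
    SAFETY = 3


# Illustrative assignments based on the aircraft examples in the text.
FUNCTIONS = {
    "navigation": Criticality.SAFETY,
    "cabin reading light": Criticality.CONVENIENCE,
}


def needs_fault_tolerance(level, threshold=Criticality.SAFETY):
    """Flag a function for fault tolerant design at or above the threshold."""
    return level >= threshold


for name, level in FUNCTIONS.items():
    print(f"{name}: design as fault tolerant = {needs_fault_tolerance(level)}")
```

In practice the decision also weighs cost, weight, and complexity, so a rule like this is only a first pass at deciding where fault tolerance is worth the price.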
Fault Tolerance Basics
Fault tolerance is the ability of a system to withstand the failure of elements within it. It may also be called a fail-safe design.
A fault tolerant system may continue to operate just fine after one of the power supplies fails, for example. Or it may operate in a reduced or degraded state.
Other systems may have a ‘limp home’ condition, allowing the system to save critical data or allowing you to drive to a safe place to change a flat tire. [Read more…]
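As a rough illustration of why a redundant power supply helps, here is a minimal sketch of the standard parallel reliability calculation. It assumes the supplies fail independently, that any one working supply is enough to run the system, and it ignores switching and detection failures. The function name and the reliability value of 0.9 are illustrative assumptions.

```python
def parallel_reliability(r, n):
    """Reliability of n independent redundant units where one working unit suffices.

    Uses R_system = 1 - (1 - r)**n, where r is the reliability of a single unit.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - r) ** n


single = parallel_reliability(0.9, 1)      # one supply, no redundancy
redundant = parallel_reliability(0.9, 2)   # two supplies in parallel
print(f"single: {single:.4f}, redundant pair: {redundant:.4f}")
```

Under these assumptions, two supplies at 0.9 each give roughly 0.99 for the pair. Real systems rarely meet the independence assumption fully, since common cause failures reduce the benefit.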