How is failure testing done on the Space Station? Could FTA (Fault Tree Analysis) be used in reverse to detect multiple failures given symptoms? That’s what NASA was programming in the 1990s. I proposed that the ratios P[part failure]/(part test time) be used to optimally sequence tests. Those ratios work if there are multiple failures, as long as failure rates are constant and failure times are statistically independent.
Fault Tree Analysis inputs are [Vesely et al., Lambert and Yadigaroglu]:
- lists of parts, their failure modes (“Basic Events”), and their failure rates
- the “Top Event”, the definition of system failure
- the system structure: “AND” and “OR” gates that tell whether combinations of events cause higher-level failures or the Top Event
FTA computer programs produce a fault tree, a list of “minimal” cut sets that tell which Basic Events occurring together cause the Top Event, and upper bounds on system failure rates (assuming constant Basic Event rates). [See www.ftaassociates.com for more than you ever wanted to know about FTA.]
Don Olmsted was project manager for a flywheel-powered bus in Los Angeles, driven by a big flywheel (see https://en.wikipedia.org/wiki/Gyrobus). The flywheel was being tested in a jet engine test cell when it exploded and essentially cleaned out the test cell. Fortunately, no one was in the test cell. Don called and asked me to help diagnose the failure. I made a fault tree of the flywheel test system, particularly its cooling system, including the parts’ failure rates. I ran the FTDS computer program on the fault tree and failure symptoms. The program diagnosed the most probable failure cause as a coolant check valve. It probably was installed backwards.
What are FTDS? FTDOTS?
FTDS stands for Fault Tree Diagnostic System. FTDS tells which parts cause symptoms and their timing given the fault tree and some timing information [David Iverson and Ann Patterson-Hine]. FTDS is like reverse fault tree analysis: given a fault tree, basic event failure rates, and failed system symptoms, FTDS outputs the probable cause of the system failure, including multiple causes.
David and Ann wrote: “Service people have decreasing technical capability:
- They’re taught to remove and replace. It’s a legitimate diagnostic if there are no alternatives.
- Individual experience limited to observations and hearsay, without parts’ field reliability.
- Expert opinions are opinions, formed by one person’s observations, not population.
- Several parts might cause an observed fault and one part might cause several faults.
When field service engineers go to customer sites to service equipment, they want to diagnose and repair failures quickly and cost effectively. Symptoms exhibited by failed equipment frequently suggest several possible causes that require different approaches to diagnosis. An engineer might follow several fruitless paths in the diagnostic process before they find the actual failure.”
I suggested making FTDS into FTDOTS, with the Diagnostic Optimal Test Sequence in decreasing order of P[part failure]/(part test time or cost) [Mitten’s Rule]. FTDS and FTDOTS were developed by NASA for use on the Space Station [Iverson, George, and Patterson-Hine].
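The ratio rule can be illustrated with a minimal sketch. It assumes exactly one part has failed, that the probabilities are conditional on the symptoms and sum to 1, and that testing stops as soon as the failed part is found. The part names, probabilities, and test times are hypothetical, and the function names are illustrative rather than taken from FTDOTS.

```python
from itertools import permutations

# Hypothetical candidates: (part, P[part is the cause | symptoms], test time in minutes).
# The numbers are illustrative only and do not come from any real system.
CANDIDATES = [
    ("check valve", 0.50, 10.0),
    ("pump", 0.30, 30.0),
    ("sensor", 0.15, 2.0),
    ("controller", 0.05, 20.0),
]

def mitten_order(candidates):
    """Sort candidates by P[part failure] / (part test time), largest ratio first."""
    return sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)

def expected_test_time(sequence):
    """Expected time to find the single failed part when tests run in the
    given order and stop at the first test that finds the failure."""
    total = 0.0
    elapsed = 0.0
    for _name, prob, test_time in sequence:
        elapsed += test_time      # time spent by the end of this test
        total += prob * elapsed   # weighted by the chance this part is the cause
    return total

best = mitten_order(CANDIDATES)
# Brute-force check over all 24 orderings (assumption: small enough to enumerate).
brute_force = min(permutations(CANDIDATES), key=expected_test_time)

print([name for name, _p, _t in best])
print(expected_test_time(best), expected_test_time(brute_force))
```

In this made-up example the quick sensor test goes first even though the check valve is the most likely cause, because the sensor has the larger probability-to-time ratio. The brute-force search over all orderings gives the same expected test time as the ratio ordering.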
One of Murphy’s rules says, “If there is a possibility of several things going wrong, the one that will cause the most damage will be the one to go wrong.” Testing first for the failure that could cause the most damage is not optimal.
FTDOTS automates diagnosis and recommends an optimal test sequence, even for multiple failures. It uses a fault tree as a diagnostic knowledge base to find sets of possible failures that explain exhibited symptoms. FTDOTS then sorts the hypothesized failure sets into the order in which they should be tested. This ordering suggests an optimal sequence of tests that minimizes the time or cost required to find and repair the failures or eliminate the symptoms. Testing failure sets in order of bang per buck, P[failure set]/(test time or cost), is optimal even if there are multiple failures, as long as failure rates are constant and Basic Events are statistically independent.
To develop FTDOTS input, a fault tree of the system must be built. A fault tree shows how a system’s component failures can propagate to cause higher-level observable symptoms. Failure rates and test costs can usually be determined for each Basic Event in the fault tree. Once the fault tree, failure rates, and test costs are obtained for a system, a diagnostic knowledge base for use with FTDOTS can be constructed.
All non-Basic Event symbols of a fault tree are logical AND or OR symbols. An AND symbol signifies that all the child events under the symbol must occur before the event represented by the (parent) symbol will occur. An OR symbol means if at least one of the child events occurs, the parent event will occur.
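The AND and OR semantics can be sketched in a few lines. The tree below, the event names E1 to E3, and the function names are hypothetical. The dictionary representation is an assumption made for illustration, not the data structure used by FTDS or FTDOTS.

```python
from itertools import product

# Hypothetical fault tree: symbol -> (gate type, child symbols).
# Any symbol that is not a key of TREE is treated as a Basic Event.
TREE = {
    "TOP": ("OR", ["G1", "E3"]),
    "G1": ("AND", ["E1", "G2"]),
    "G2": ("OR", ["E2", "E3"]),
}

def occurs(symbol, failed, tree=TREE):
    """Return True if the event at `symbol` occurs, given the set of failed Basic Events."""
    if symbol not in tree:
        return symbol in failed
    gate, children = tree[symbol]
    results = [occurs(child, failed, tree) for child in children]
    # AND: every child must occur. OR: at least one child must occur.
    return all(results) if gate == "AND" else any(results)

def minimal_cut_sets(symbol, tree=TREE):
    """Return the minimal sets of Basic Events that cause the event at `symbol`."""
    if symbol not in tree:
        return {frozenset([symbol])}
    gate, children = tree[symbol]
    child_sets = [minimal_cut_sets(child, tree) for child in children]
    if gate == "OR":
        # Any cut set of any child causes the parent.
        candidates = set().union(*child_sets)
    else:
        # AND: take one cut set from each child and merge them.
        candidates = {frozenset().union(*combo) for combo in product(*child_sets)}
    # Drop any set that strictly contains another candidate (not minimal).
    return {s for s in candidates if not any(other < s for other in candidates)}

print(minimal_cut_sets("TOP"))
print(occurs("TOP", {"E1"}), occurs("TOP", {"E1", "E2"}))
```

For this small tree the minimal cut sets of TOP are {E3} and {E1, E2}. The candidate {E1, E3} is discarded because E3 alone already causes TOP.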
Other information can be associated with each fault tree symbol. For instance, a symbol might also contain the probability of occurrence of its associated failure event or the time interval between the occurrence of a child event and the occurrence of its parent event.
The user provides FTDOTS with information about the system in the form of normal and abnormal indicators. A normal indicator indicates that a given failure event has not occurred. An abnormal indicator indicates that a failure event has occurred or a failure symptom has been observed. Each possible indicator corresponds to a symbol in the fault tree. If it is known that the failure event represented by a fault tree symbol has not occurred, that event is placed in the normal indicators set. If it is known that a failure event has occurred, that event is placed in the abnormal indicators set. The most effective diagnosis process is obtained when abnormal indicator symbols are as low in the fault tree as possible (near the Basic Events) and normal indicator symbols are as close to the top of the tree as possible.
The diagnoses produced by the FTDOTS program are failure-sets of Basic Events that causally explain the abnormal indicators while maintaining consistency with the normal indicators. When all hypothesis sets have been found for each abnormal indicator in the starting points set, the FTDOTS program combines them into hypothesis sets that could each causally explain all of the symptoms or abnormal indicators. This is accomplished by forming the cross product of the hypothesis sets for each of the starting point symbols.
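The combination step can be sketched as follows. This is a simplified illustration under stated assumptions, not the FTDOTS algorithm itself: the tree, the symptom names, and the function names are hypothetical. A hypothesis is treated as consistent when none of the normal indicator events would occur under it.

```python
from itertools import product

# Hypothetical fault tree: symbol -> (gate type, child symbols).
# Symbols that are not keys are Basic Events.
TREE = {
    "LOW_FLOW_ALARM": ("OR", ["pump", "valve", "flow_sensor"]),
    "OVERTEMP": ("OR", ["pump", "valve", "fan"]),
    "PUMP_NOISE": ("OR", ["pump", "bearing"]),
}

def occurs(symbol, failed):
    """True if the event at `symbol` occurs given the failed Basic Events."""
    if symbol not in TREE:
        return symbol in failed
    gate, children = TREE[symbol]
    results = [occurs(child, failed) for child in children]
    return all(results) if gate == "AND" else any(results)

def cut_sets(symbol):
    """Sets of Basic Events that cause the event at `symbol` (not pruned to minimal)."""
    if symbol not in TREE:
        return {frozenset([symbol])}
    gate, children = TREE[symbol]
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        return set().union(*child_sets)
    return {frozenset().union(*combo) for combo in product(*child_sets)}

def consistent(hypothesis, normal):
    """A hypothesis is consistent if no normal indicator event would occur under it."""
    return not any(occurs(indicator, hypothesis) for indicator in normal)

def diagnose(abnormal, normal):
    """Hypothesis sets that explain every abnormal indicator and respect the normal ones."""
    per_symptom = [
        {h for h in cut_sets(symptom) if consistent(h, normal)}
        for symptom in abnormal
    ]
    # Cross product: pick one explanation per symptom and merge them.
    combined = {frozenset().union(*combo) for combo in product(*per_symptom)}
    # Merged sets can trigger a normal indicator through an AND gate, so re-check.
    return {h for h in combined if consistent(h, normal)}

hypotheses = diagnose(abnormal=["LOW_FLOW_ALARM", "OVERTEMP"], normal=["PUMP_NOISE"])
for h in sorted(hypotheses, key=lambda s: (len(s), sorted(s))):
    print(sorted(h))
```

Here the pump is ruled out because the hypothetical PUMP_NOISE indicator is normal, so the valve alone remains the only single failure that explains both symptoms. The sketch keeps non-minimal combinations such as {valve, fan}. Whether to prune them is a design choice that the source does not specify.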
Mitten’s Rule for Multiple-Failure Sets
Assume that the probability of occurrence of a Basic Event is R(i), and the cost of testing the component modeled by that Basic Event is C(i). If a failure set contains multiple Basic Events, FTDOTS assumes that the failures are independent and derives values of R and C for the multiple-failure set. R is the product of the failure probabilities of every event in the failure set. This gives the probability of all of these Basic Events occurring concurrently. C is calculated by summing the testing costs of the failure events. This gives an upper bound on the testing cost for that hypothesis set. If some hypothesis sets contain multiple failure events, the FTDOTS system may not provide a least-cost test sequence because of the assumptions made when dealing with multiple-failure sets, but the recommendation will be close to the least-cost sequence.
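The R and C calculation for multiple-failure sets can be sketched as follows. The Basic Event names, probabilities, and test costs are hypothetical, and the function names are illustrative rather than taken from the FTDOTS source code.

```python
from math import prod

# Hypothetical Basic Event data: name -> (failure probability R(i), test cost C(i)).
BASIC_EVENTS = {
    "valve": (0.02, 5.0),
    "fan": (0.01, 4.0),
    "flow_sensor": (0.05, 1.0),
}

def set_probability(failure_set):
    """R: product of the Basic Event probabilities, assuming independent failures."""
    return prod(BASIC_EVENTS[event][0] for event in failure_set)

def set_cost(failure_set):
    """C: sum of the Basic Event test costs, an upper bound on the cost of testing the set."""
    return sum(BASIC_EVENTS[event][1] for event in failure_set)

def rank(failure_sets):
    """Order failure sets by decreasing R / C."""
    return sorted(
        failure_sets,
        key=lambda s: set_probability(s) / set_cost(s),
        reverse=True,
    )

# Hypothetical hypothesis sets to be ranked.
HYPOTHESES = [
    frozenset({"valve", "fan"}),
    frozenset({"flow_sensor", "fan"}),
    frozenset({"valve"}),
    frozenset({"flow_sensor", "valve"}),
]

for failure_set in rank(HYPOTHESES):
    print(sorted(failure_set), set_probability(failure_set), set_cost(failure_set))
```

With these made-up numbers the single-failure set {valve} ranks first, because multiplying probabilities makes each double-failure set far less likely while its summed test cost is no smaller.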
The Fault Tree Diagnosis with Optimal Test Sequence (FTDOTS) program determines the possible causes of symptoms and recommends an optimal sequence of tests that will isolate the actual failure set in the least amount of time or at the least cost.
The occurrence rate of fault j and the probability that part i causes fault j may be age specific [Shakeri et al.]. The probability that replacing the part identified with a fault fixes the problem may be less than 100%. Spares aren’t perfect either. The probability that a replacement causes some other problem is greater than zero. Humans aren’t perfect either.
I can’t find the FTDS and FTDOTS computer programs on the Internet, but I have their C source codes and the FTDS executable. If you need help diagnosing problem symptoms, build the fault tree, estimate the Basic Event rates, list the symptoms, and list what is working. Send the information to firstname.lastname@example.org, and I will try to run FTDS and recommend the optimal test sequence.
Iverson, D.L. and F.A. Patterson-Hine, “Object-Oriented Fault-tree Models Applied to System Diagnosis,” Proc. of SPIE Applications of Artificial Intelligence VIII, Orlando, FL, Vol. 1293, pp. 1013-1023, April 1990
D. L. Iverson, L. L. George, and F. A. Patterson-Hine, “Fault Tree Based Diagnosis With Optimal Test Sequencing for Field Service Engineers,” Technology 2004, NASA, Washington, DC, Nov. 8-10, 1994
H. E. Lambert and G. Yadigaroglu “Fault Trees for Diagnosis of System Fault Conditions,” Nuc. Sci. and Eng., Vol. 62, pp. 20-34, 1977
L. G. Mitten, “An Analytic Solution to the Least Cost Testing Sequence Problem,” J. of Ind. Eng., pp. 16-17, Jan.-Feb. 1960
Mojdeh Shakeri, Krishna R. Pattipati, Vijaya Raghavan, and A. Patterson-Hine, “Optimal and Near-Optimal Algorithms for Multiple Fault Diagnosis with Unreliable Tests,” IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, Vol. 28, No. 3, August 1998
Vesely, W.E., F.F. Goldberg, N.H. Roberts, and D.F. Haasl “Fault Tree Handbook,” NUREG-0492, U.S. Nuclear Regulatory Commission, Washington, D.C., 1981