Estimating the value of specific reliability activities is necessary to justify the investment they require. Prototypes, diagnostic equipment, and environmental chambers are expensive. The difficulty is that we cannot know what a test will find before conducting it.
Not doing the test guarantees that nothing will be found, yet that alone is often not enough motivation to invest in learning about reliability performance. The following scenario is one such situation, along with a few ideas to help you estimate the value of investments in reliability work.
HALT and time to market
Consider the development of a new game controller. The product is high volume, with the majority of sales expected immediately after product launch, during the holiday sales period. It is a new design with an emphasis on time to market. The majority of units will be manufactured before the start of sales, there are no repairs, and the controller is an enabling part of a larger system. The controller’s reliability goal is 98% over the first year of ownership when used as part of the game system.
HALT vs. ALT Discussion
One of the basic questions facing the team is, “Will the product meet the 98% reliability goal?” An ALT may help answer this question if we know which failure mechanism(s) will lead to failure during the first year [1].
This is a new product without any field history. Other controllers designed for this environment have experienced a range of failure causes, but their failures are often dominated by shock and vibration damage from dropping.
The design team’s risk analysis suggests that drop damage would be the most significant contributor to product failures. The new controller is different enough that existing field data likely does not apply. Also, it is unknown which specific element of the design would fail first, or whether any would fail at all, over one year of use. Therefore, it is important to discover the most likely failure mechanisms.
The initial project plan did not include HALT on the first set of prototypes. Instead, it would sample from the second set of prototypes, available eight weeks later and just before the transfer of the design to manufacturing, to conduct design verification testing (DVT), including life testing. The drop testing portion of the DVT is expected to take a week.
The reliability engineer on this program recommends performing HALT on the first available prototypes. The suggestion is to use high levels of random vibration and high shock loads in the HALT plan to quickly assess design weaknesses related to drop damage. The project manager requests more information on timing, cost, and benefits (value).
There isn’t time to procure a HALT chamber within the development schedule, so let’s collect quotes from HALT labs to conduct the testing. Let’s assume a quote of $10k for one round of testing [2]. Of course, if HALT facilities were available internally, this cost would be less. Also, consider that first-round prototypes cost about five times as much as second-round prototype units.
The first round of prototypes is a small run with specialized tooling and quick-turn production, costing approximately $1k per unit. Let’s request five units. At five times the cost of later prototypes, that is an $800 increase per unit, or $4k in total. Rounding out the expected costs of engineering support, testing equipment support, and failure analysis support, we can estimate an additional cost of approximately $20k. Therefore, the total cost to the program to add HALT is approximately $24k.
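The cost arithmetic can be laid out as a short Python sketch. All variable names here are hypothetical and the figures are the illustrative estimates from the text. The text does not break the $20k down further, so the sketch treats it as a single lump.

```python
# Illustrative estimates from the scenario, in US dollars.
# Names are hypothetical; they are not from any tool or standard.
UNITS_REQUESTED = 5
FIRST_ROUND_UNIT_COST = 1_000   # quick-turn, specialized tooling
COST_MULTIPLE = 5               # first-round units cost ~5x later units

later_unit_cost = FIRST_ROUND_UNIT_COST // COST_MULTIPLE     # cost of a second-round unit
premium_per_unit = FIRST_ROUND_UNIT_COST - later_unit_cost   # extra paid per early unit
prototype_premium = UNITS_REQUESTED * premium_per_unit       # extra paid for all five

# Lump estimate for the remaining costs, as given in the text.
OTHER_COSTS = 20_000

halt_total_cost = prototype_premium + OTHER_COSTS

print(f"Premium per unit:      ${premium_per_unit:,}")
print(f"Prototype premium:     ${prototype_premium:,}")
print(f"Total added HALT cost: ${halt_total_cost:,}")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when a real lab quote or prototype price arrives.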
One of the primary benefits of HALT is the potential to uncover new failure mechanisms in the design [3]. By conducting the HALT on the first available prototypes, the design team increases the time available to resolve design errors or make design improvements. Designers tend to design away from failures; HALT is a tool to discover previously unknown (or unsuspected) failure mechanisms.
Let’s assume (for the purpose of this example) the design before any testing has a 25% chance of a failure mechanism that will lead to an unacceptably high first-year failure rate. In discussions with the program manager, the team learns that they would delay the start of production if there were a 10% or higher expected field failure rate.
Moreover, the cost of the delay was estimated at $500k per day in lost sales. With an assumed 30 days to design and implement an improvement to resolve a major reliability issue, the losses would amount to $500k/day for 30 days, or $15 million.
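Since the 30-day fix time is itself an assumption, it can help to see how the delay cost scales with it. The following is a minimal sketch using the figures above; the function name `delay_cost` and the alternative durations are hypothetical choices for illustration.

```python
# Illustrative figures from the scenario, in US dollars.
LOST_SALES_PER_DAY = 500_000
ASSUMED_DELAY_DAYS = 30   # assumed time to design and implement a fix


def delay_cost(days: int, loss_per_day: int = LOST_SALES_PER_DAY) -> int:
    """Lost sales for a launch delay lasting `days` days."""
    return days * loss_per_day


baseline_delay_cost = delay_cost(ASSUMED_DELAY_DAYS)
print(f"{ASSUMED_DELAY_DAYS}-day delay: ${baseline_delay_cost:,}")

# Alternative durations, chosen arbitrarily to show the scaling.
for days in (10, 20, 45):
    print(f"{days}-day delay: ${delay_cost(days):,}")
```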
There is a good chance that the design is fine and will meet the reliability objectives. Let’s assume 75% of the time the design has an overall failure rate of less than 10% over the first year.
The other 25% of the time, the underlying design has at least one major failure mechanism that may be detected and resolved before the start of sales. No testing program will uncover all faults, so let’s assume that 10% of the time neither HALT nor DVT finds a major (>10% failure rate) issue.
Let’s also say that 50% of the time HALT does not find the issue but DVT does, and that HALT finds the fault only 40% of the time.
Note: this low rate is a pessimistic estimate of the ability of a well-executed HALT; in my experience, HALT is much more effective.
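One way to check that these assumptions hang together is to tabulate them. The sketch below treats the 40%, 50%, and 10% figures as conditional on a major issue existing (they sum to 100%) and multiplies each by the 25% prior to get the overall chance of each scenario. The dictionary keys are hypothetical labels, and "halt_finds_it" covers every case where HALT finds the fault, whether or not DVT would also have found it.

```python
import math

P_MAJOR_ISSUE = 0.25  # assumed chance the design hides a major failure mechanism

# Assumed outcomes, conditional on a major issue existing.
outcomes_given_issue = {
    "halt_finds_it": 0.40,
    "only_dvt_finds_it": 0.50,
    "neither_finds_it": 0.10,
}
# The three conditional outcomes should cover every case.
assert math.isclose(sum(outcomes_given_issue.values()), 1.0)

# Overall (joint) probability of each scenario before any testing.
joint = {name: P_MAJOR_ISSUE * p for name, p in outcomes_given_issue.items()}
joint["no_major_issue"] = 1.0 - P_MAJOR_ISSUE
assert math.isclose(sum(joint.values()), 1.0)

for name, p in joint.items():
    print(f"{name:>18}: {p:.3f}")
```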
For the value calculation, multiply the 25% chance that an unacceptable failure rate exists in the design by the 40% chance of HALT finding the issue and by the cost avoided by having time to solve the issue without a 30-day program delay. The expected savings are 0.25 × 0.40 × $15 million = $1.5 million.
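The expected-value calculation is a simple product, sketched below. The function name `expected_savings` and its parameter names are hypothetical. The second call uses an 80% detection rate purely as an illustrative alternative, given the note that 40% is pessimistic.

```python
def expected_savings(p_issue: float, p_halt_detects: float, cost_avoided: float) -> float:
    """Expected cost avoided: chance an issue exists, times the chance
    HALT finds it early, times the delay cost that early detection avoids."""
    return p_issue * p_halt_detects * cost_avoided


DELAY_COST = 15_000_000  # 30 days at $500k/day, from the scenario

# The scenario's (pessimistic) assumption: HALT finds the issue 40% of the time.
halt_expected_savings = expected_savings(0.25, 0.40, DELAY_COST)
print(f"Expected savings at 40% detection: ${halt_expected_savings:,.0f}")

# Hypothetical alternative detection rate, for comparison only.
optimistic_savings = expected_savings(0.25, 0.80, DELAY_COST)
print(f"Expected savings at 80% detection: ${optimistic_savings:,.0f}")
```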
The ROI is the ratio of the expected return to the cost: $1.5 million divided by $24k gives an ROI of over 60. This is only part of the value, because it considers only the detection of major issues that would otherwise cause a schedule slip.
The HALT will also find less significant issues that wouldn’t have resulted in a schedule slip, yet the earlier detection would reduce the cost of implementing design changes. Plus, HALT may find failure mechanisms beyond what the DVT would find, leading to an incremental reduction in the achieved field failure rate.
References
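As a rough check on how robust this conclusion is, the sketch below computes the ROI using the definition above (expected return divided by cost, not net return) and then asks how low the HALT detection probability could fall before the expected savings only just cover the cost. The break-even calculation is an addition for illustration and the variable names are hypothetical.

```python
# Figures from the scenario, in US dollars.
HALT_COST = 24_000
EXPECTED_SAVINGS = 1_500_000
P_MAJOR_ISSUE = 0.25
DELAY_COST = 15_000_000

# ROI as defined in the text: expected return over cost.
roi = EXPECTED_SAVINGS / HALT_COST

# Detection probability at which expected savings equal the HALT cost:
# P_MAJOR_ISSUE * p * DELAY_COST = HALT_COST, solved for p.
break_even_detection = HALT_COST / (P_MAJOR_ISSUE * DELAY_COST)

print(f"ROI: {roi:.1f}")
print(f"Break-even HALT detection probability: {break_even_detection:.2%}")
```

Under these assumptions, HALT would need to catch a major issue well under 1% of the time to pay for itself, which suggests the conclusion does not hinge on the exact 40% figure.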
1. Silverman, Mike. How Reliable Is Your Product? Cupertino, CA: Super Star Press, December 2010, p. 193.
2. Personal communication with Mike Silverman, June 18, 2011.
3. Hobbs, Gregg K. Accelerated Reliability Engineering: HALT and HASS. Chichester; New York: Wiley, 2000, p. 43.