HALT (highly accelerated life testing) is a method to reveal product weaknesses. Design prototypes experience the step-stress application of relevant stresses until failures appear.
The intent is to find design- or process-related weaknesses early in the design process, leaving time to address each issue economically. Using a build-test-fix approach improves a product’s robustness and reliability.
Since HALT is a useful tool, should you conduct it on every project? Revealing weaknesses certainly seems useful.
HALT requires prototype units, which are generally expensive. It also takes time and, in some circumstances, expensive test equipment to apply the stresses. There is also no guarantee that the exercise will find previously unknown faults.
There is no hard and fast rule or decision point for determining whether HALT is worth the effort. The return (discovered weaknesses) has to be worth the effort.
Estimating the value of a specific HALT helps you determine if HALT is the right tool for your specific situation.
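One rough way to frame that estimate is as an expected-value comparison. The sketch below is an illustration only, not a formula from this article: the function name, its parameters, and the example figures are all assumptions, and in practice each input is an engineering guess.

```python
def halt_net_value(halt_cost, p_find_new_fault, p_can_fix, cost_avoided_if_fixed):
    """Rough expected net value of running one HALT (illustrative assumption only).

    halt_cost: prototypes, equipment, and engineering time for the HALT
    p_find_new_fault: guessed chance the HALT reveals a previously unknown fault
    p_can_fix: guessed chance the team can act on what it finds
    cost_avoided_if_fixed: guessed field-failure cost avoided by the fix
    """
    for p in (p_find_new_fault, p_can_fix):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    expected_benefit = p_find_new_fault * p_can_fix * cost_avoided_if_fixed
    return expected_benefit - halt_cost


# Hypothetical figures, chosen only to show the arithmetic.
net = halt_net_value(
    halt_cost=5_000,
    p_find_new_fault=0.5,
    p_can_fix=0.8,
    cost_avoided_if_fixed=50_000,
)
print(f"Expected net value: {net:,.0f}")
```

A clearly positive result suggests HALT may be worth the effort. Because every input is a guess, treat the number as a prompt for discussion, not a decision rule.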
When HALT makes sense
The easy answer is when it’s free and the risk of unknown faults is high.
Free? HALT is not a specific or set test that requires specialized chambers. It is an approach that uses step-stress with one or more stresses to excite faults for detection. A HALT could occur on the design engineer’s lab bench using simple tools.
HALT is not totally free, as you still require a prototype. If the device is either inexpensive to build or easily repaired, the overall cost is minimized. Most engineers conduct some form of HALT (though they most likely do not call it that) when they first receive a prototype of their design. They apply various stimuli to determine the response and look for anomalous behavior (faults).
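The step-stress idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the temperature levels, and the `shows_fault` callback (a stand-in for whatever observation you make at the bench) are hypothetical and not part of any standard HALT procedure.

```python
def step_stress_schedule(start, step, limit):
    """Return the list of stress levels from start up to limit (inclusive)."""
    if step <= 0:
        raise ValueError("step must be positive")
    levels = []
    level = start
    while level <= limit:
        levels.append(level)
        level += step
    return levels


def first_fault_level(levels, shows_fault):
    """Apply each level in order and return the first one that reveals a fault.

    shows_fault is a hypothetical callback: it stands in for applying the
    stress to the prototype and checking for anomalous behavior.
    Returns None if no fault appears within the schedule.
    """
    for level in levels:
        if shows_fault(level):
            return level
    return None


# Hypothetical example: step temperature from 40 to 100 in steps of 10.
levels = step_stress_schedule(start=40, step=10, limit=100)

# Pretend the prototype misbehaves at 85 or above (made-up threshold).
found_at = first_fault_level(levels, shows_fault=lambda t: t >= 85)
print(levels, found_at)
```

In a real HALT the interesting part is what happens after a fault appears: understand the failure mechanism, fix it, and step again.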
Unknown faults imply there is uncertainty about how the device will respond to the stress of use. Another unknown concerns how well the device has been assembled, including the assembly process of supplied parts.
When using new materials, novel solutions, or new assembly methods, we have some uncertainty concerning what will fail and how it will fail. Keep in mind that everything will fail, and proving the hypothesis about what will fail first is often left to experimentation or to the customer.
HALT provides a way to keep customers from finding the faults first.
Changes in use and environment
Another situation when HALT makes sense is when the customer use and environment change, for example, when an existing product is to begin sales in a new part of the world or in a new market. In general, we may assume the previous product design focused on the originally intended geography and expected use stresses.
When the expectations and environment around a product change, the failures that appear also change. HALT provides a way to explore the impact of these new stresses and possibly reveal new product weaknesses.
Low risk of field failure
Another situation when HALT makes sense is when the risk of field failure has to be very low. This could be with a new or mature product line, or with a new invention. It really is a business decision concerning the market acceptance of a product with faults. In some situations, customers neither expect nor tolerate failures.
One way to think about product design and failures is the notion that each design and each product has a finite number of faults that will lead to premature failure. The product development process, prototype testing, validation and verification, and similar processes work to identify and resolve as many of those faults, known and unknown, as possible.
Benefits of HALT
HALT provides a way to quickly find faults. With enough time, development methods that do not use the HALT approach will certainly find the faults; it just takes longer. HALT generally finds faults faster, allowing the design team more time to resolve the list of issues. Using multiple stresses with HALT also reveals faults excited by the interaction of stresses, which testing with one stress at a time will not reveal. HALT provides a way to reveal difficult-to-find weaknesses.
When HALT does not make sense
The simple answer is when failures do not matter and the cost of testing is very high.
The best time to determine if HALT is worth doing is after you have completed the process. If you have discovered previously unknown weaknesses and are able to do something about them, then you have learned something useful. The problem with this approach is that you have already made the investment in HALT before determining whether it is useful.
Does the organization have experience?
If the organization has no experience with HALT, learning about the tool and its potential value is a good enough reason to conduct HALT, whether or not you find something useful. If the team already has experience with HALT, then you should conduct HALT when there is a reasonable chance of finding previously unknown faults.
Product is similar to previous products
If the new product is very similar (‘similar’ is very subjective and based on engineering judgment) to previous products and the team already has a long list of defects that have to be addressed, then conducting HALT may add little value. An extensive Pareto of issues built from field returns, customer complaints, internal testing, and FMEA leaves little chance of new issues appearing.
Consequence of failures is inconsequential
When failures are inconsequential, meaning the customer does not mind if a failure occurs, finding and fixing issues is of little value.
While this is rare, in my experience, it is possible.
Development process permits no changes
Another situation in which to avoid HALT is when the development process will not permit any changes. If a fault is found and the team is unable or unwilling to make changes to address it, then there is little value in finding additional faults.
Part of the HALT process is fixing issues found.
No information on faults
Also, consider if the HALT process has the ability to yield information on faults. If the testing is set to a fixed routine or a very limited profile or duration, the chances of finding previously unknown weaknesses are diminished.
This may occur when there is a mandate or requirement to HALT every prototype on every project. The constraint on testing resources may preclude sufficient time or test design to excite and discover failure mechanisms.
How to decide when to conduct HALT
There isn’t a formula for this decision. It is a balance between the investment and the potential value.
In general, you should conduct HALT when:
- The design, supply chain, or technology is new
- The uncertainty about what will fail is high
- The cost of conducting HALT is low
- The ability to address weaknesses is high
And, in general, you should not conduct HALT when:
- The design, supply chain, and technology are stable
- The existing list of weaknesses is extensive
- The cost of conducting HALT is high
- The ability to address weaknesses is low
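The two lists above can be kept as a simple checklist. The sketch below only tallies the four considerations; it is not a formula (there isn’t one), and the class name, field names, and equal weighting are assumptions made for illustration.

```python
from dataclasses import dataclass, astuple


@dataclass
class HaltChecklist:
    """The four considerations listed above, answered yes or no (hypothetical names)."""
    design_supply_or_technology_is_new: bool
    uncertainty_about_failures_is_high: bool
    cost_of_halt_is_low: bool
    can_address_weaknesses: bool


def tally(checklist):
    """Return (points favoring HALT, points against), weighting each item equally.

    Equal weighting is an assumption; in practice one item, such as being
    unable to make any changes, may outweigh the rest.
    """
    answers = astuple(checklist)
    in_favor = sum(answers)
    return in_favor, len(answers) - in_favor


# Hypothetical project: new technology, high uncertainty, costly test, fixes possible.
project = HaltChecklist(
    design_supply_or_technology_is_new=True,
    uncertainty_about_failures_is_high=True,
    cost_of_halt_is_low=False,
    can_address_weaknesses=True,
)
in_favor, against = tally(project)
print(f"{in_favor} in favor, {against} against")
```

The tally is a conversation starter for the team; the judgment about investment versus potential value still has to be made by people who know the product.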
When first learning about HALT, just do it. Learn and gain experience with the approach and how the discovery process works. As you and your team gain experience you will be able to judge the potential value from conducting HALT.
When the potential value significantly outweighs the cost, do HALT.
Fred, good review of some of the reasons to do HALT.
Another reason is if HASS is needed to help precipitate and detect manufacturing defects. HASS is especially beneficial during the early production ramp or a change of manufacturing locations. For the most efficient production screen, the safe and highest levels of stress should then be determined from the operating and destruct stress limits found in HALT.
Thanks for the comment and addition. While we spend most of our time getting folks to use HALT – once they start blindly using it all the time, it loses value.
Oleg Ivanov says
I want the HALT to be a quantitative test
Fred Schenkelberg says
HALT, as you know, is a method to discover weaknesses within a design – it is a method to discover failure mechanisms. It is not a test to pass or fail. It is not a life test in the same way that an ALT provides numbers describing the time to failure patterns or expectations.
Oleg, wishing an apple be like an orange, even when painting the outside an orange color, won’t result in changing its nature.
Oleg Ivanov says
Fred, why not? We just don’t know how to count.
Imagine we could use the HALT not only for product improvement, but also for Type certification, for confirming the lifetime (replacement) of Critical Parts, and for predicting warranty and maintenance cost.
Fred Schenkelberg says
The difficulty is the translation from HALT results (failures induced quickly by one or more stresses) to time-to-failure distributions or patterns. A typical HALT may uncover ten or so different failure mechanisms using 3 to 5 different stresses, alone and in combination, and do so in a day or maybe three.
Even a well-designed ALT with one applied stress and a well-known failure mechanism along with verified life models – often only provides tenuous results. And the ALT takes many more samples, careful application of stress, only a single failure mechanism, and time.
I’ll not say it is impossible to use HALT results to estimate a life distribution, yet I do not know of any verified or useful methods to do so, and I suspect any such translation would carry significant uncertainty.
At the moment I do not think quantifying HALT to life distributions is feasible.
Oleg Ivanov says
Hi Fred, thanks for this interesting discussion.
For the quantitative HALT, I do not require extrapolation of the time to failure to the field (as is done in ALT) or estimation of the life distribution. There is another idea. We can consider testing time as the applied stress (and increase it step by step). If the sample has passed a single lifetime, increase it to double, then triple (!!!), and consider the HALT passed. Otherwise (on failure) we do “fix – build – test” again. For example, this method is used for reliability development and Type Certification of aircraft engines.
Fred Schenkelberg says
The crux for this to work is to equate some amount of applied stress or stresses to a lifetime – 20 hours in step stress with multiple stresses leading to multiple failures (which is how I define HALT) makes it very difficult to translate to a lifetime. The stresses are often well beyond use conditions and the aim of HALT is to find failures.
If the aim is to run at conditions similar to use and expect no failures over one or more lifetimes, that is what I call a success test, and it is quite different from HALT. With success testing, we have to be able to apply a lifetime of stress in a known way in order for the test to be meaningful. For example, if a car door is expected to experience 10k cycles in a lifetime, then we can do 10k cycles, eliminating the long periods of simply being closed, in order to replicate a lifetime quickly. When we add stress, like additional weight, in order to accelerate even further the wear on latches and hinges, we then need a way to translate the additional stress back to normal use conditions. With HALT we rarely have meaningful models for all the failure mechanisms of interest. Plus, HALT is a tool to discover failure mechanisms and is not very good at all at supporting the modeling of those mechanisms.
Oleg Ivanov says
I understand you.
The purpose of the HALT is to find weaknesses and fix the project. For this, failures are needed. The absence of failures does not give us value. We can use any hard test modes.
And the purpose of the quantitative HALT is to find weaknesses and fix the project (if there are failures) and to certify the product (if there are no failures). We can only use test modes for which we can calculate the acceleration factor.
Therefore, I see the “quantitative HALT” as a HALT. “When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.” 🙂
For accelerated testing, the following classification is convenient: if we MEASURE the lifetime or reliability (to translate to a lifetime), this is an ALT; if we don’t MEASURE the lifetime, this is a HALT. In a HALT we can do an EVALUATION of reliability and lifetime, but we don’t have to.
The question is: why has the HALT not become a quantitative test so far, and can it become one?
Fred Schenkelberg says
Let’s say there are two development teams creating similar products – they use similar technology, components, assembly processes, similar use cases, etc. The two product designs each have the cumulation of thousands of design decisions that result in some unknown number of errors or weaknesses or failure mechanisms lurking within the product.
In both products, there may be 100 unique failure mechanisms, many of which overlap between the two products and some of which are unique to each product.
Let’s do HALT. Each team can brainstorm likely failure mechanisms, areas of concern, etc., and sort out a shortlist of stresses to apply during HALT. This shortlist, and the method and rate of application of stress, may well differ between the teams. Each team finds maybe 10 failures and goes to fix them; they may or may not be the same set of ten failure mechanisms. They run HALT again and maybe find a few more, or not.
The HALT experience finds some issues related to the chosen set of stresses being applied. Change the design team’s set of decisions, the specifics of the supply chain elements, the details of the assembly methods, handling, etc., and each product will have a similar yet different way it is likely to fail. HALT finds some of the potential failure mechanisms. It rarely finds all of them, and even if HALT finds many issues, it is a rare team that then solves all of them. Not all stresses accelerate every failure mechanism in the same manner. In HALT, the assumption is we’re finding the weaknesses, yet we don’t know about mechanisms that are there but are not affected by the stresses applied, or that are masked by other mechanisms (some of which the team won’t solve, and so it never finds the hidden and potentially important mechanism).
Now consider how well any design team understands how customers will use a product. Where, how often, and with what expectations are all factors we try to understand, and we are generally surprised when the reality differs from what the development team guessed.
In short, HALT has a role to play in helping teams discover failure mechanisms. ALT purposefully focuses on one mechanism to carefully accelerate and model, because the number of unknowns and variables is otherwise just too complex to create a meaningful life model. For circuit boards we have Physics of Failure tools which use an array of life models with the assumption that each mechanism acts independently to cause failure. We know that is not true, yet in order to model even a simple circuit board, it is something we have to do. In HALT we do not make the independence assumption, because we are not trying to model the time to failure in any set of conditions.
Just because a product survives HALT conditions does not mean anything other than that it might be a bit more robust than when it failed, before redesign, at a lower stress. The assumption is that being more robust for the selected stresses we use during HALT means it will last longer in the field. There is some evidence that this is true when there is good design work, careful selection and application of stresses, redesign to remove mechanisms, etc. Yet I’ve seen no evidence that quantifies in any meaningful way the relationship between applied stress in HALT and expected field reliability performance. I think the issue is too complex for our way of analyzing and understanding to solve.
Using the range of tools we have available can shift the chance of a product working reliably more often than not.
Oleg, I appreciate the discussion and it’s helping me to organize my thoughts for a webinar tomorrow on how to select reliability tools…. as always appreciate your insights and questions.