Reliability testing to determine what will fail, or when failures will occur, is expensive.
Organizations invest in the development of a product and attempt through the design process to create a product that is reliable.
The design process has many unknowns, though: uncertainties about materials, design margins, use environments, loads, and aging effects. Using the best practices of design for reliability will minimize this list of risks to product reliability, yet it will not resolve all the uncertainty.
Reliability testing is expensive. It is an investment in the knowledge about a design. Reliability testing done well reduces uncertainty and risk.
My first reliability test
Years ago, working as a manufacturing engineer in a factory, I was getting pretty good with process statistics. I did hypothesis testing, process control charts, and similar statistical applications. The shop floor regularly worked with the design teams to create prototypes and measure key attributes.
One day the engineering director asked me to design a test to determine how long a new product would survive. Specifically, he wanted to know if it would work for 20 years.
So, I started to learn about reliability testing and began asking a lot of questions. One reference I found was the Reliability Toolkit (at the time available from the Reliability Analysis Center), which listed a number of key considerations.
Test Planning Factors to Consider
How critical is the product?
Is the product important to customers? If an unexpected failure would shut down the customer’s business or cause significant damage, we may want to understand reliability very well. On the other hand, if a failure would have little to no consequence, the motivation to conduct reliability testing is less.
Is the product important to our business? Here ‘critical’ may include brand promise, market share, or profitability. A product with less than expected reliability may have a significant impact on profitability and operations. If the product would have little impact on the business even when failing at a high rate, there is again less motivation for testing.
Are safety and reliability a concern?
Related to criticality is safety. If product failure leads to dangerous conditions, then understanding the failure mechanisms and time to failure is important. Some products inherently fail only in safe ways, reducing the need for testing.
Does the customer need a certain level of reliability?
All customers have some level of reliability expectation. Some provide specific requirements. If specified, are the requirements clear and appropriate? If unspecified, do we have a good estimate of customer expectations?
In either case, the bounds for reliability testing are often set by customer expectations. Creating test plans that evaluate a product’s ability to meet customer expectations is not new. For reliability, it may take a little work to interpret expectations and convert them into meaningful reliability objectives, yet the results of testing then provide relevant information.
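To make that concrete, here is a minimal sketch of converting an expectation into an objective, assuming a hypothetical customer who will tolerate no more than 2% failures over a 5-year service life (and, purely for illustration, a constant-failure-rate model):

```python
import math

# Hypothetical customer expectation: no more than 2% of units fail
# within a 5-year service life (assumed numbers, for illustration).
service_life_years = 5.0
max_failure_fraction = 0.02

# The reliability objective at the end of the service life.
r_target = 1.0 - max_failure_fraction  # R(5 yr) >= 0.98

# For illustration only, assume a constant failure rate (exponential
# model): R(t) = exp(-lambda * t), so lambda = -ln(R) / t.
t_hours = service_life_years * 8760.0
failure_rate = -math.log(r_target) / t_hours  # failures per hour

print(f"Reliability objective: R({service_life_years:.0f} yr) >= {r_target:.2f}")
print(f"Implied constant failure rate: {failure_rate:.3e} per hour")
print(f"Equivalent MTBF: {1.0 / failure_rate:,.0f} hours")
```

The point is not the model choice; it is that a vague expectation becomes a numeric target a test plan can be designed against.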
How mature is the design?
“Don’t test that, it’s going to change before we’re done designing.” Reliability testing often takes time. A thermal cycling test for solder joint fatigue failures may require 4 to 6 months to complete. If the team hasn’t settled on the electronic packaging, we have a choice: test all the possible options (expensive) or wait until a choice is made (reducing the time available to evaluate reliability before the design decision is due).
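As a rough illustration of why such a test takes months, here is a back-of-the-envelope sketch using the Coffin-Manson acceleration model; every number in it is an assumption chosen for illustration, not a recommendation:

```python
# A back-of-the-envelope sketch of thermal cycling test planning using
# the Coffin-Manson model. Every value below is an illustrative
# assumption; a real program derives them from the actual use profile.

delta_t_use = 25.0     # assumed field temperature swing, deg C
delta_t_test = 80.0    # assumed chamber swing, deg C
m = 2.5                # fatigue exponent; roughly 2 to 2.5 is often
                       # cited for solder joint low-cycle fatigue

# Coffin-Manson acceleration factor (ratio of cycles to failure).
af = (delta_t_test / delta_t_use) ** m

field_cycles = 2 * 365 * 10   # assume 2 thermal cycles/day for 10 years
equivalent_test_cycles = field_cycles / af

cycle_time_hours = 1.5        # assumed ramp plus dwell time per cycle
days_one_life = equivalent_test_cycles * cycle_time_hours / 24

print(f"Acceleration factor: {af:.1f}x")
print(f"Chamber cycles for one equivalent life: {equivalent_test_cycles:.0f}")
print(f"Calendar time for one equivalent life: {days_one_life:.0f} days")
# Testing to failure (several equivalent lives, across multiple samples)
# stretches the calendar time well beyond this, into months.
```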
Another consideration is maturity in the market and application. If using common technologies that have a suitable track record in a range of applications, there is less need for reliability testing. It’s about risk.
In immature designs or technologies, it is the unknown that creates risk and the need for reliability testing.
Are new technologies or processes involved?
The word ‘new’ comes with a red flag for attention. What do we know about the new technology or process? ‘New’ often means we have a lot to learn and reliability testing is one appropriate method. A good practice, time permitting, is to fully evaluate the reliability aspects of a new technology or process offline from a specific product or application.
Once vetted and understood, then add the ‘new’ to a product development program.
This is not often done, and skipping it is a source of significant risk in the product development process.
How complex is the product?
Simple products are easy to understand and evaluate; complex products are not. Simple products have fewer failure mechanisms and interactions, making reliability testing straightforward. Complex products are a group of simple products plus the interactions among all the simpler elements.
A reliability professional working with business desktop computers found that, for reasons never fully understood, some models of hard drives would not operate well with some models of power supplies. Each individual element worked well with other models of hard drive or power supply, and all were within required operating specifications.
Yet, something caused certain pairings to fail. He called it the white space problem, referring to the unstated and real interactions that occur within complex products.
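One way to see how quickly that white space grows: even a modest set of qualified parts creates a large number of pairings to cover. The subsystem counts below are hypothetical:

```python
from itertools import combinations

# Hypothetical complex product: number of qualified models per subsystem.
subsystems = {
    "hard_drive": 4,
    "power_supply": 3,
    "motherboard": 2,
    "memory": 3,
}

# Every model passes its own spec, yet problems can hide in the pairings.
# Count the two-way pairings a test plan would have to cover to catch an
# issue like the drive/power-supply incompatibility above.
pairings = sum(
    count_a * count_b
    for (_, count_a), (_, count_b) in combinations(subsystems.items(), 2)
)

full_builds = 1
for count in subsystems.values():
    full_builds *= count  # every complete configuration

print(f"Two-way pairings to cover: {pairings}")        # 53
print(f"Distinct full configurations: {full_builds}")  # 72
```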
What are the environmental extremes involved?
The concept of stress-strength interference applies here.
How much margin does the design have for the expected range of environmental stress that it will experience? This is not just temperature and humidity. Also, consider shock load during transportation, the number of times a handheld device will be dropped and from how high, the range of chemicals and concentrations, the use rate and loads, etc.
Think of the full range of environmental exposures and combinations that may significantly accelerate failures or accumulate damage, thus shortening product life.
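A minimal sketch of the stress-strength calculation, assuming independent normal distributions for applied stress and product strength (the numbers and units are illustrative only):

```python
from math import sqrt
from statistics import NormalDist

# Assumed normal distributions for applied stress and product strength
# (illustrative numbers and units only).
mu_stress, sd_stress = 300.0, 40.0      # e.g. MPa
mu_strength, sd_strength = 450.0, 50.0  # e.g. MPa

# Reliability = P(strength > stress); for independent normals this is
# Phi((mu_strength - mu_stress) / sqrt(sd_strength^2 + sd_stress^2)).
z = (mu_strength - mu_stress) / sqrt(sd_strength**2 + sd_stress**2)
reliability = NormalDist().cdf(z)

print(f"Design margin z = {z:.2f}")
print(f"P(strength > stress) = {reliability:.4f}")
# Harsher or more varied environments widen the stress distribution,
# shrinking z, which is exactly the margin these questions probe.
```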
What is the budget for testing?
As stated earlier, reliability testing is expensive. If there isn’t sufficient budget for reliability testing, the risks associated with the lack of knowledge remain. The balance between knowledge and investment occurs with every test plan.
Being clear about the budget allows the discussion on the right amount of testing.
Can the equipment or facilities create the required test conditions?
NASA has large vacuum chambers that permit the creation of all the conditions of outer space (except gravity – and they say they are working on that). The chambers can replicate solar radiation, temperature, vacuum, etc., permitting evaluation of how materials and systems operate in nearly the same conditions as experienced once in space.
With enough money and time, we are able to create test facilities to evaluate nearly any risk. If readily available, great. If not, is the investment worth the knowledge?
How many items are available for testing?
Early planning and budgeting help here, yet it is rare that we have enough samples to make our statisticians happy. This is another balancing act, this time between sample size and risk. One of the contributors to failure is the naturally occurring variation between products: some are stronger, and some are weaker. Testing just a few samples does not adequately reflect the range of (often unknown) important variation.
If possible, create devices at the extremes of the expected range of variation.
For example, it is expensive to create prototype integrated circuits (ICs), thus getting a hundred samples that adequately represent the range of variation induced by the fabrication process is cost prohibitive. If circuit timing is important, one possibility is to create a few samples with slow and fast circuit properties. This permits testing at the edges of the expected variation at less overall cost.
Given a fixed number of samples, be clear about what can and cannot be learned through reliability testing.
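One way to see the sample size versus risk balance is the classic success-run (zero-failure) calculation, n = ln(1 − C) / ln(R); the reliability and confidence targets below are illustrative:

```python
import math

# Success-run (zero-failure) demonstration: how many units must survive
# the test with no failures to claim reliability R at confidence C?
# n = ln(1 - C) / ln(R). The targets below are illustrative.

def success_run_n(reliability: float, confidence: float) -> int:
    """Samples needed for a zero-failure demonstration test."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

for r in (0.90, 0.95, 0.99):
    for c in (0.80, 0.90):
        print(f"R = {r:.2f} at {c:.0%} confidence: {success_run_n(r, c)} units")
```

Demonstrating 99% reliability at 90% confidence takes 230 survivors with zero failures; with only a handful of samples available, the claims we can support shrink accordingly.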
What is the existing design reliability?
A common goal is to make a new design as good as or better than the previous version. The ability to conduct comparison testing may simplify reliability testing and reduce the sample size required. If the current design is already reliable enough, that may reduce the need to test the new design as extensively.
We can focus reliability testing on the new elements rather than on the product overall.
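Here is a sketch of what such a comparison might look like, using fabricated placeholder failure times with two common tools, a one-sided Mann-Whitney test and quick Weibull fits; it assumes complete (uncensored) failure data, which real tests rarely deliver:

```python
import numpy as np
from scipy import stats

# Fabricated placeholder failure times (hours); substitute real test data.
current = np.array([410, 520, 610, 700, 880, 950, 1100, 1300])
new = np.array([480, 640, 720, 810, 990, 1150, 1400, 1500])

# One-sided Mann-Whitney U: is the new design stochastically longer-lived?
stat, p_value = stats.mannwhitneyu(new, current, alternative="greater")
print(f"U = {stat:.0f}, one-sided p = {p_value:.3f}")

# Weibull fits (location fixed at zero) give a quick look at the shape
# parameter and characteristic life of each design.
for label, data in (("current", current), ("new", new)):
    shape, loc, scale = stats.weibull_min.fit(data, floc=0)
    print(f"{label}: beta = {shape:.2f}, eta = {scale:.0f} hours")
```

With censored or suspended units, a proper life-data analysis is needed in place of these quick looks, but the comparison framing stays the same.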
Summary
Testing is not the only way to determine when or how a product will fail, yet it is often the most effective approach. The best test is the one done by customers as they use the product.
Since we would like to know the results before the customer discovers them, we do reliability testing.
Sometimes it is possible to replicate customer conditions. Sometimes we use very controlled conditions that focus on one failure mechanism. In all cases, the testing attempts to provide insight on how the design will perform over time.
There are many other considerations, yet the above list provides a good start. Early in a program, before all the significant risks are understood, we should consider the possibility of reliability testing, which includes setting aside a budget, resources, and time for testing.
Reliability is important and measuring the ability of a design to meet the reliability goals becomes an essential element of feedback to the design team.
Related:
Reliability in Product and Process Development (article)
First 5 Questions (article)
Role of Reliability in an Organization (article)