Reliability: Reactive or Proactive?

Do you let events happen to you, or do events follow your designs and expectations? Are you a spectator or actor? Do you wonder about your product’s future or do you control your product’s future? Are you reactive or proactive?

Every reliability and maintenance program is a system. Every program has inputs, such as product testing results and field returns, and outputs, such as product design and production. In the most basic terms, a reliability program includes product specifications for functionality, including expected durability. The program also includes some form of design, verification, production, and field performance. Given this basic lifecycle description, there are two broad approaches to executing the product lifecycle: reactive and proactive.

Every design will fail

Let’s consider the notion that every product will eventually fail. Even the most robust product on Earth will fail when the Sun expires. Well before the collapse of the solar system, most products made today will have completely failed. The causes will range from deterioration of materials to stress events (e.g., a lightning strike) to misuse. Some products will wear out; others will become obsolete and lose compatibility with other systems; others will no longer provide sufficient value.

Another important notion is that any product design contains a finite number of faults. A button has a limited number of actuation cycles before accumulated stress cracks the switch dome. A material has a degradation mechanism (corrosion, polymer chain scission) that slowly deteriorates the material’s strength. A ‘bug’ in the software can disable the equipment temporarily. Further, a design may contain defects because it does not account for production variation, user demand, or environmental variation, or because it does not anticipate user expectations. In every case, sooner or later, the design flaw will lead to failure. Nonetheless, because the number of faults is finite, it is possible to find and remove most design errors.

Reactive Approach

One way to approach product reliability, and the most common, is to wait for product failures and then respond with analysis, adjustments, and refinements in an attempt to improve product reliability. The naive team waits for failure reports from customers before taking action. The team’s logic, if considered at all, is the following:

We are good designers
The customer will use the product in unforeseen environments and applications
If there are customer failures, we will consider improvements

For some products, with limited release and ample time to redesign the product, this may be perfectly feasible.

A simple improvement the design team could consider is to estimate the customer’s use profile and environmental conditions. Armed with this information, the team then evaluates the impact of those conditions on the product’s reliability through standardized testing. Setting test conditions at or slightly above the expected operating environment permits direct evaluation of the design’s ability to meet expected conditions. The faults found would be similar to the failures expected to occur in the customer’s hands, and there may be time for a redesign before the product is shipped to customers. Carrying out this logic may lead to a broad spectrum of testing that is both expensive and time-consuming.

Part of the logic of product testing includes the thought, “If we test in enough ways over the full range of use and environmental conditions, we should find and correct every design fault.” There is often a heavy reliance on industry standards and common test methods for every product.

Further improvements to product reliability can refine this reactive method, and include using simulations, risk analysis, and early evaluation and testing of subsystems and components. The overall approach is often limited by knowledge of actual use conditions, lack of test samples, and lack of time.

Proactive Approach

Moving to a proactive approach can reduce product testing and increase product reliability. While this may seem similar to the above approach, it involves a focus on failure mechanisms instead of test methods. Products fail because they do not have sufficient strength to withstand a single application of high stress (drop, static discharge, etc.) or because they accumulate damage (wear, corrosion, drift, etc.) with use or over time. By thinking through how a product could fail, considering the materials, design, and assembly process, and the same for vendor-supplied elements, the product team determines a list of possible failure mechanisms.

In this approach not all the failure mechanisms will be fully understood or characterized. The risk, in this case, is the decision to launch the product while not understanding the possibility or potential magnitude of product failure. The amount of risk itself is unknown. Therefore, the proactive team proceeds to characterize the design or material under the expected use conditions. The intent is to reduce the uncertainty of the risk.

A second result of the proactive approach’s risk assessment is the rank ordering of failure mechanisms by expected rate of occurrence. One way to accomplish this ranking is to evaluate the stress versus strength relationships. The failure mechanisms with the largest overlap between the two distributions (stress and strength) have the highest potential for failure. The solutions may include increasing the strength or reducing the variance of the strength.
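
The ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: stress and strength are treated as independent and normally distributed, so the chance that strength falls below stress follows from the distribution of their difference. The function name, the mechanism names, and every number below are hypothetical and chosen only to show the calculation.

```python
from math import sqrt
from statistics import NormalDist


def interference_probability(stress_mean, stress_sd, strength_mean, strength_sd):
    """P(strength < stress), assuming independent normal stress and strength."""
    # The margin (strength minus stress) is itself normally distributed.
    margin_mean = strength_mean - stress_mean
    margin_sd = sqrt(stress_sd ** 2 + strength_sd ** 2)
    # Failure occurs when the margin drops below zero.
    return NormalDist(margin_mean, margin_sd).cdf(0.0)


# Hypothetical failure mechanisms, in arbitrary but consistent units:
# (stress mean, stress sd, strength mean, strength sd)
mechanisms = {
    "connector wear": (40.0, 8.0, 70.0, 10.0),
    "housing crack on drop": (55.0, 12.0, 80.0, 6.0),
    "seal corrosion": (20.0, 3.0, 60.0, 5.0),
}

# Rank from highest to lowest probability of stress exceeding strength.
ranked = sorted(
    mechanisms,
    key=lambda name: interference_probability(*mechanisms[name]),
    reverse=True,
)

for name in ranked:
    print(f"{name}: {interference_probability(*mechanisms[name]):.2e}")

# Reducing the strength variance alone shrinks the overlap.
before = interference_probability(40.0, 8.0, 70.0, 10.0)
after = interference_probability(40.0, 8.0, 70.0, 5.0)
```

In this made-up data set the drop-induced housing crack ranks first even though its strength distribution is the tightest of the three, because its stress distribution is wide. Real stress and strength data are often not normal, in which case a fitted distribution or a simulation would replace the closed-form step.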

A third result of the risk assessment is similar to the stress and strength evaluation and includes the impacts of time or usage on the change in the stress and strength distributions. Either curve may experience changes to the mean or variance over time. This may be due to degradation, wear, or increased expectation of durability by customers.
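
To illustrate how a shifting distribution changes the picture, the sketch below lets the strength distribution degrade with age while the stress distribution stays fixed. It assumes independent normal distributions, a linear drop in mean strength, and a linear growth in strength spread. These are simplifying assumptions for illustration, not a model of any particular degradation mechanism, and all names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def failure_probability_by_year(
    years,
    stress_mean,
    stress_sd,
    initial_strength_mean,
    initial_strength_sd,
    mean_loss_per_year,
    sd_growth_per_year,
):
    """Return P(strength < stress) at each year from 0 through `years`."""
    probabilities = []
    for year in range(years + 1):
        # Assumed linear degradation: mean strength falls, spread widens.
        strength_mean = initial_strength_mean - mean_loss_per_year * year
        strength_sd = initial_strength_sd + sd_growth_per_year * year
        margin_mean = strength_mean - stress_mean
        margin_sd = sqrt(stress_sd ** 2 + strength_sd ** 2)
        probabilities.append(NormalDist(margin_mean, margin_sd).cdf(0.0))
    return probabilities


# Hypothetical values in arbitrary but consistent units.
probs = failure_probability_by_year(
    years=10,
    stress_mean=40.0,
    stress_sd=8.0,
    initial_strength_mean=80.0,
    initial_strength_sd=6.0,
    mean_loss_per_year=4.0,
    sd_growth_per_year=0.5,
)

for year, p in enumerate(probs):
    print(f"year {year:2d}: {p:.2e}")
```

With these numbers the mean strength reaches the mean stress at year 10, so the probability that stress exceeds strength climbs from a very small value at launch to one half. The same loop could instead shift the stress distribution to represent customers demanding more from the product over time.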

The proactive approach takes more thinking and understanding of how testing stresses create failures, plus characterization of product designs, materials, and processes, and their related failure mechanisms.

Summary

In summary, a reactive approach creates a design and then waits for field returns or standard product testing failures to prompt product improvements. The proactive approach anticipates failure mechanisms, characterizes the response of the design and materials to expected stresses (experimentally or via simulation), and then designs with that knowledge.

There are other aspects that distinguish a reactive from a proactive reliability program. For example, if the only time management discusses product reliability is when a major customer complains about product failures, that is a reactive approach. If the management team regularly inquires about and discusses the risk a particular design presents to reliability performance, that is a proactive approach.

How does your team approach product reliability? Are the results as expected or are there regular surprises?