Big Picture of Reliability Plans

Reliability is an attribute of an item. The item, with its design, assembly, and use, has a finite probability of working over some duration.

We deliberately attempt to create items that meet our own and our customers’ expectations concerning that probability and duration. We maintain and operate our equipment with an expectation of successful performance.

We and our customers are let down when a failure occurs. We do not enjoy the benefits of the item’s value.

Reliability Planning

When you set about to achieve a specific reliability for your new widget, how do you get started and what should you do?

There are plenty of guides, books, standards, and more that provide advice ranging from hiring the best engineers possible to testing everything to failure. What works in one industry or situation most likely will not work in another.

What is great about reliability engineering is that there are just a couple of basic questions and a handful of concepts. It is in the detailed application that we match the specific tools to the task at hand. The basic outline for a plan is nearly always the same, though.

1. What are you trying to achieve?

Set a reliability goal which is based on customer expectations and business objectives. The goal is in part based on understanding the consequences of failure and in part on our understanding of the viability of other solutions.

Break down the goal to the major elements of your product. Make the goal clear to your team for their specific areas of concern. Make the goal specific for your vendors, too.
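One simple way to break a goal down is equal apportionment. The sketch below assumes the major elements are in series (any one failing fails the product) and fail independently, so each element gets the same share of the system goal. The element names and the 95% goal are hypothetical, and real programs often weight the split by complexity or past performance instead.

```python
def apportion_equal(system_goal: float, n_elements: int) -> float:
    """Equal apportionment for n independent elements in series.

    If every element has reliability r, the system has r ** n,
    so each element needs r = system_goal ** (1 / n).
    """
    if not 0.0 < system_goal <= 1.0:
        raise ValueError("system_goal must be in (0, 1]")
    if n_elements < 1:
        raise ValueError("n_elements must be at least 1")
    return system_goal ** (1.0 / n_elements)


# Hypothetical widget: 95% reliability goal over the warranty period,
# split across four major elements (names are illustrative only).
elements = ["enclosure", "power supply", "controller", "display"]
goal = 0.95
per_element = apportion_equal(goal, len(elements))

for name in elements:
    print(f"{name}: {per_element:.4f}")
```

Note that each element's goal comes out higher than the system goal, which is why the breakdown matters for your team and your vendors.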

Setting a goal is the start. It answers the question of what you are trying to achieve in a meaningful and measurable manner.

2. What are the risks?

Designers tend to design away from failure. With the natural uncertainty involved with any design, we add design margin, safety factors, or derating to make the item robust to variations in materials, assembly, shipping, storage, and use.

What could possibly go wrong? Our team and customers may have experience with what can or might go wrong. Simply listing and considering the range of possibilities can help us face and manage the risks.

We generally need to know what could go wrong in order to consider, evaluate and improve the design or process to accommodate and minimize the risks of failure. We use tools like previous product field data analysis, prototype testing and failure analysis, discovery testing (HALT, margin, stress testing, etc.), and FMEA (hazards analysis, engineering judgment, expert opinion, etc.).
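As one illustration of ranking what could go wrong, many FMEA worksheets score each failure mode for severity, occurrence, and detection on a 1 to 10 scale and multiply the three into a risk priority number (RPN). The sketch below assumes that convention. The failure modes and scores are made up for illustration, and your team's scales may differ.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    description: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost certain to detect) to 10 (cannot detect)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and scores, for illustration only.
modes = [
    FailureMode("solder joint fatigue", severity=8, occurrence=3, detection=5),
    FailureMode("connector corrosion", severity=6, occurrence=4, detection=6),
    FailureMode("label fading", severity=2, occurrence=5, detection=2),
]

# Highest RPN first, so the team looks at the largest risks first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.description}")
```

The ranking is only a starting point for discussion. A high-severity item may deserve attention even when its RPN is modest.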

Expose risks early.

Exposing the risks early and often in a program permits the team to address the potential failures. Sure, we still learn from our customers about failures we didn’t anticipate. Yet with a plan to uncover risks, such surprises from customers should be rare.

3. Estimating Reliability (feedback)

Setting goals is fairly meaningless unless we can measure progress toward achieving those goals. A common question during the development process is, “Are we there yet?”

In reliability terms, is the product going to function in the use environment with high enough probability over the specified duration(s)? In other words, is the item reliable and meeting our goals?
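To make the “are we there yet?” question concrete, here is a minimal sketch that compares an estimate against a goal. It assumes, purely for illustration, that time to failure follows a Weibull distribution. The shape and scale values, the two-year duration, and the 95% goal are hypothetical and are not from any real product.

```python
import math


def weibull_reliability(t: float, shape: float, scale: float) -> float:
    """Probability of surviving to time t: R(t) = exp(-(t / scale) ** shape)."""
    if t < 0 or shape <= 0 or scale <= 0:
        raise ValueError("t must be >= 0; shape and scale must be > 0")
    return math.exp(-((t / scale) ** shape))


# Hypothetical numbers: goal of 95% reliability over 2 years,
# with an estimated shape of 1.5 and scale of 10 years.
goal = 0.95
duration_years = 2.0
estimate = weibull_reliability(duration_years, shape=1.5, scale=10.0)
meets_goal = estimate >= goal

print(f"Estimated R({duration_years:g} years) = {estimate:.3f}, goal = {goal}")
print("Meeting the goal" if meets_goal else "Not there yet")
```

With these made-up inputs the estimate falls short of the goal, which is exactly the kind of feedback the team needs while there is still time to act on it.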

The best way to measure reliability is to ship to your customers and monitor your product’s reliability performance carefully. This gives the best possible data, and obviously it arrives too late to influence the reliability experience seen by your customers.

We need to know, before shipping to customers, whether the product is good enough.

Early in a program, we may have little more than engineering judgment or educated guesses. Later we can refine these estimates with predictions, literature-supported models, vendor data, and calculations. If we have access to field data from a previous and similar product, that information is immensely useful for the current development team (it’s like the customer does all the testing for you so you don’t have to).

Accelerated testing is expensive and may take many forms. Generally, we need to know the failure mechanism and how the testing stress should lead to failures, so we can create estimates of reliability.
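As an example of tying a test stress to a failure mechanism, the sketch below uses the Arrhenius model, which is commonly applied when temperature drives the mechanism. That is an assumption: it applies only if the mechanism really is thermally activated. The 0.7 eV activation energy and the temperatures are hypothetical placeholders; the right values depend on your specific failure mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev: float, use_temp_c: float, stress_temp_c: float) -> float:
    """Acceleration factor between use and stress temperatures.

    AF = exp((Ea / k) * (1 / T_use - 1 / T_stress)), temperatures in kelvin.
    """
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


# Hypothetical values: 0.7 eV activation energy, 40 C use, 85 C test chamber.
af = arrhenius_acceleration(ea_ev=0.7, use_temp_c=40.0, stress_temp_c=85.0)
test_hours = 1000.0
print(f"Acceleration factor: {af:.1f}")
print(f"{test_hours:.0f} test hours represent about {af * test_hours:.0f} use hours")
```

The result is very sensitive to the activation energy, so an estimate built this way is only as good as our knowledge of the failure mechanism.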

Using reliability modeling (reliability block diagrams, fault tree analysis, etc.), we accumulate the various estimates and create a product-specific estimate. Early on, the result may have large uncertainty. Later, as we refine the estimates, the model approaches what we will see in actual use.
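A reliability block diagram rolls element estimates up into a product estimate. The minimal sketch below assumes independent failures and covers only series and parallel (redundant) blocks. The block names and reliability values are hypothetical.

```python
from math import prod


def series(reliabilities):
    """All blocks must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one block must work: 1 minus the product of the unreliabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical product: two redundant power supplies (0.95 each)
# feeding a controller (0.99) and a sensor (0.98) in series.
power = parallel([0.95, 0.95])
system = series([power, 0.99, 0.98])

print(f"Redundant power block: {power:.4f}")
print(f"System estimate:       {system:.4f}")
```

As the element estimates improve from guesses to test and field data, rerunning the same model tightens the product estimate.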

4. Minimize variability

I learned as a manufacturing engineer that the resulting product is not as reliable as the design team imagined. On paper, everything fits and works perfectly. In practice, we have dimension, material, load, use, and expectation variability. We also experience changes over time for the same list.

Part of the design process is to identify which changes or variability matter: which ones have so little margin or tolerance for change that the changes lead to failure. Minimizing variability has a few basic steps:

Identify critical to quality/reliability design elements
Identify acceptable range of variability
Implement measures to monitor and control the variability

This is in part process control. It is also supplier management. It is also design change monitoring (dealing with proposed improvements, obsolete part replacements, etc.).
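One common way to monitor a critical dimension against its acceptable range is a process capability index such as Cpk. The sketch below assumes roughly normal measurements and a stable process. The measurements and specification limits are hypothetical.

```python
from statistics import mean, stdev


def cpk(samples, lower_spec: float, upper_spec: float) -> float:
    """Process capability: distance from the mean to the nearer spec limit,
    in units of three sample standard deviations."""
    mu = mean(samples)
    sigma = stdev(samples)
    if sigma == 0:
        raise ValueError("no variation in samples; Cpk is undefined")
    return min(upper_spec - mu, mu - lower_spec) / (3.0 * sigma)


# Hypothetical critical dimension in mm, with an acceptable range of 9.90 to 10.10.
measurements = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]
capability = cpk(measurements, lower_spec=9.90, upper_spec=10.10)

print(f"Cpk = {capability:.2f}")
```

A falling Cpk from a supplier or a production line is an early signal that variability is eating into the design margin, well before failures show up in the field.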

Variability happens. Plan for it.

Summary

Set goals
Identify risks
Measure progress
Minimize variability

Does your plan address each area?
