Getting Comfortable with using Reliability Results

We want to engage a reliability engineer in an analysis for our product design. They can help us produce some great information from which we can make decisions. You might be feeling uncomfortable about our team making a design decision based on those results. You don’t quite understand how the reliability engineer came up with the answer. You want to know where that information comes from so you can gauge the level of project risk of our decision.

We peel back the curtain on reliability engineering methods. We explore reliability engineering’s roots and development, from the 1950s through today, to better understand the results of an analysis. Having a general understanding of reliability methods can help us get comfortable with using the results.


The key takeaways from today:

Reliability engineers use failure data. There are many methods for them to do this, and the methods they use are dependent upon what product is being developed. There is no one reliability plan that applies to everything.

To better understand the reliability prediction we’re studying, we can consider where we got the failure data and which failure mechanisms we’re considering. Early calculations will use failure data that is not specific to our product. When we start to evaluate an engineering design, failure data from using physics of failure and finite element analysis can help us consider different design choices for different failure mechanisms. When we have parts, the failure data from testing our product may focus on one failure mechanism; the models of different failure mechanisms may be combined, or the team can use a worst-case method.

Finally, use your reliability engineering friends’ skills throughout the design development process, from early concept evaluations through product launch and field monitoring. They can help you make decisions for a robust design and avoid costly mistakes. Reliability predictions can evolve as the product design evolves and are a useful tool for decision-making.

Citations

An interesting case study of Physics of Failure:

Chary, Geetha V., Ed Habtour, Gary S. Drake. “Improving the Reliability in the Next Generation of US Army Platforms through Physics of Failure Analysis.” Journal of Failure Analysis and Prevention, iss. 12, Dec. 2011, pp. 74-85.

Just two standards-based methods that are still being maintained:

Telcordia SR-332: Originally developed for the telecom industry, it has expanded to be used widely for other commercial and military applications. It uses a black box technique.

217Plus Handbook™ of 217Plus Reliability Prediction Models (The 217Plus Standard): Originally named PRISM, it was developed with the Reliability Analysis Center (RAC) and Reliability Information Analysis Center (RIAC). It is meant to replace the older MIL-HDBK-217 with more reliability data and models. It considers all phases of a product life cycle as a function of calendar hours.

Other QDD podcasts that might interest you:

If this episode spoke to you, there are 3 other Quality during Design podcasts you may want to revisit. They get into more detail about some of today’s concepts.

Episode 6: HALT! Watch out for that weakest link, describes highly accelerated life testing.

Episode 31: 5 Aspects of Good Reliability Goals and Requirements, where we build up a reliability requirement based on 5 aspects.

Episode 37: Results-Driven Decisions, Faster: Accelerated Stress Testing as a Reliability Life Test, where we describe more about reliability life testing and more specifically about accelerated stress testing.

Episode Transcript

Hello and welcome to Quality during Design, the place to use quality thinking to create products others love, for less. My name is Dianna. I’m a senior level quality professional and engineer with over 20 years of experience in manufacturing and design. Listen in and then join the conversation at QualityDuringDesign.com.

Reliability is the probability that something won’t fail in given conditions over a period of life, like time or cycles. Can our product function correctly for a certain time? Reliability engineering methods focus on addressing risk and reliability challenges in design and manufacturing production.

Reliability engineering can sometimes seem like a magical black box, especially if we’re not part of the process. Reliability engineers gather some data and output results. Or they plan some special tests. Those special tests can look like methods that don’t align with the product’s design function. They can be expensive or complicated, and their results are intertwined with mathematical equations, of both the statistical and applied physics kinds.

Basing a design decision on reliability engineering results is a very powerful method for designers, and it’s only getting more complicated as we get better tools. To demystify it, let’s pull back the curtain on reliability engineering. This will allow you to better understand the inner workings without having to jump in and become a reliability engineer yourself.

To understand what we can expect out of a reliability engineering activity, let’s put it into some historical context. How did the reliability engineering field grow up?

In the mid-1900s, military components were failing in the field and we didn’t want those failures. How can we study them to prevent them from happening? Early reliability engineering methods were based on observed failure rates of components, focusing on the fracture and fatigue of parts. First, we would collect information about how many products failed, how they failed, at what times of use, and under what type of conditions. Then, we’d calculate reliability, assuming that the failure rate was constant. (Remember that reliability is a probability. And, an extra fun fact is that a constant failure rate is modeled by the exponential distribution.) We collected catalogs of data about components. Using that empirical data (the things we observed from the field), we would use statistical methods to estimate the reliability of new, similar components or of designs that were combinations of components.
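To make the constant failure rate idea concrete, here is a minimal sketch of that early style of calculation. It assumes the exponential model, R(t) = exp(-λt), and the field numbers (5 failures over 1,000,000 unit-hours) are hypothetical, made up only for illustration.

```python
import math


def estimate_failure_rate(failures: int, total_unit_hours: float) -> float:
    """Point estimate of a constant failure rate: failures per unit-hour."""
    if failures < 0 or total_unit_hours <= 0:
        raise ValueError("failures must be >= 0 and total_unit_hours > 0")
    return failures / total_unit_hours


def reliability_constant_rate(failure_rate_per_hour: float, hours: float) -> float:
    """Exponential model: probability of surviving `hours` without failure."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and hours must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical field data (not from any real catalog).
rate = estimate_failure_rate(failures=5, total_unit_hours=1_000_000)
r_one_year = reliability_constant_rate(rate, hours=8760)  # one year of continuous use

print(f"Estimated failure rate: {rate:.2e} per hour")
print(f"Reliability over one year: {r_one_year:.3f}")
```

Notice that the only thing this model knows about the product is one number, the failure rate. That is why it is quick, and also why it can miss wear-out behavior.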

A lot of development happened in the late 1950s and early 1960s. One of those developments was that we started to apply a statistical model to the data, fitting it to a probability distribution like lognormal or Weibull. These methods originated with studying fracture of materials, fatigue, and creep. Having a life model gave us better reliability predictions because we weren’t assuming that the design had a constant failure rate. Instead, we created a mathematical model that more closely matched the reality of product performance. We’d use that distribution to predict reliability at different times or stresses, and it was more accurate than assuming we had a constant failure rate.
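As a rough illustration of fitting life data to a distribution, the sketch below fits a two-parameter Weibull using median rank regression (Bernard’s approximation for the ranks, then a straight-line fit). This is one common textbook approach, not necessarily the one your reliability engineer uses. It assumes every unit was run to failure (no censored data), and the failure times are hypothetical.

```python
import math


def fit_weibull_median_rank(failure_times):
    """Fit Weibull shape and scale by median rank regression.

    Uses the linearization ln(-ln(1 - F)) = shape * ln(t) - shape * ln(scale),
    with F estimated by Bernard's approximation (i - 0.3) / (n + 0.4).
    Assumes complete data: every unit failed, none were removed early.
    """
    times = sorted(failure_times)
    n = len(times)
    if n < 2 or times[0] <= 0:
        raise ValueError("need at least two positive failure times")

    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))

    shape = sxy / sxx                      # slope of the fitted line
    intercept = y_mean - shape * x_mean
    scale = math.exp(-intercept / shape)   # characteristic life
    return shape, scale


def weibull_reliability(t, shape, scale):
    """Probability of surviving to time t under the fitted model."""
    return math.exp(-((t / scale) ** shape))


# Hypothetical hours-to-failure from a bench test.
hours_to_failure = [410, 650, 820, 1010, 1300, 1700]
shape, scale = fit_weibull_median_rank(hours_to_failure)

print(f"shape = {shape:.2f}, scale = {scale:.0f} hours")
print(f"R(500 h) = {weibull_reliability(500, shape, scale):.3f}")
```

A shape parameter of 1 reduces the Weibull to the constant failure rate (exponential) case. A shape above 1 indicates a failure rate that rises with age, which is the wear-out behavior the constant-rate assumption cannot represent.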

Still in the early 1960s, with life models being developed for fractures, fatigue, and creep, a U.S. Air Force laboratory introduced a Physics of Failure program. Bell Labs was investigating ways to use the Arrhenius math formula to evaluate temperature-induced aging of semiconductor parts. The following year, in 1962, they decided to join together for a Physics of Failure symposium called RAMS® (Reliability and Maintainability Symposium). The RAMS® conferences are still going on today.

The Physics of Failure approach to reliability engineering is to use science (namely physics) to identify how and why our part is going to fail. What are the potential root cause failure mechanisms that are going to lead to our part failure? Is it one mechanism or is it a combination of two or more? We’ve studied and captured many models about the common mechanisms of failure. We can apply them to model the life data of our design.

The physics of failure method is considered a bottom-up approach. We use known failure mechanisms to help us correlate a mechanism to failure with a measure of degradation to be able to calculate reliability and time to failure. We need detailed information about the components, like their use cases, performance expectations, material, the manufacturing process, and other design data. Knowing about the component, we study the places on the product that would be the most susceptible to failure and then choose the stresses or other damage mechanisms that will affect it. Or, if we have failures from test, we can look at what those failures are and what type of stresses typically cause them. From that, we decide what failure modes to model.

There are 3 basic models of physics of failure. There’s the stress-strength model, where something fails if the stress applied to it is greater than its strength. The damage-endurance model considers a stress that degrades our product: the stress creates cumulative, irreversible damage that doesn’t affect performance, but will break the product after a time. And, the third is a performance-requirement model, where stress degrades our product and negatively affects performance until the performance falls below what is considered acceptable. There are standard models we use for different stresses, some of which are derived from quantum mechanics, like the Eyring model! We can combine models: we could have a Physics of Failure model for each failure, damage, or degradation mechanism that we can then combine to assess the overall degradation of the system.
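The stress-strength model is the easiest of the three to sketch in code. The example below assumes that stress and strength are independent and each normally distributed, which is a simplifying assumption and not a requirement of the model. The MPa values are hypothetical.

```python
import math
from statistics import NormalDist


def stress_strength_reliability(strength_mean, strength_sd, stress_mean, stress_sd):
    """Probability that strength exceeds stress.

    Assumes independent, normally distributed stress and strength, so the
    margin (strength - stress) is also normal.
    """
    margin_mean = strength_mean - stress_mean
    margin_sd = math.sqrt(strength_sd ** 2 + stress_sd ** 2)
    if margin_sd == 0:
        raise ValueError("at least one standard deviation must be non-zero")
    return NormalDist().cdf(margin_mean / margin_sd)


# Hypothetical part: strength 500 +/- 30 MPa, applied stress 380 +/- 40 MPa.
r = stress_strength_reliability(500, 30, 380, 40)
print(f"Reliability (strength > stress): {r:.4f}")
```

The useful design insight here is that reliability depends on the spread of both distributions and not only on the gap between their averages. Tightening manufacturing variation can help as much as adding strength.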

We’re not finished with innovating reliability engineering methods, though. Since the 1960s, we’ve gotten better tools. We have better inspection instruments that can identify more specific failures. We now readily have computers and software available to nearly everyone, capable of being used to mathematically model complex failures. Starting in the 2000s, Probabilistic Physics-of-Failure methods started being developed. Whenever we use a mathematical model to predict real-life events, there is uncertainty involved. It can come from the randomness in the real-world failure mechanism that we’re trying to predict, that inherent variability of a phenomenon that we just can’t control or reduce. Uncertainty can also be introduced because we chose a model that is incomplete: we lack knowledge about our design, we don’t have enough measurements, or we have measurement errors. It’s possible to develop a model and consider the confidence we have in its results. Uncertainty may also come from the variation in our materials and from variation introduced by manufacturing. For that, we’ve started using probabilistic finite element analysis.
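One simple way to picture a probabilistic approach is a Monte Carlo simulation: instead of feeding a model single numbers, we feed it distributions and look at the spread of the answers. The sketch below uses a deliberately simple, hypothetical damage-endurance model (damage accumulates at a fixed rate per cycle until it reaches an endurance limit). The distributions and their parameters are assumptions for illustration only, and this is far simpler than a probabilistic finite element analysis.

```python
import math
import random
import statistics


def simulate_cycles_to_failure(n_samples: int, seed: int = 0):
    """Sample cycles-to-failure for a linear damage accumulation model.

    Hypothetical inputs:
      - endurance limit: normal, mean 1.0, sd 0.05 (material variation)
      - damage per cycle: lognormal, median 1e-5 (manufacturing and use variation)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    lives = []
    for _ in range(n_samples):
        endurance = max(rng.gauss(1.0, 0.05), 1e-9)  # guard against non-physical values
        damage_per_cycle = rng.lognormvariate(math.log(1e-5), 0.2)
        lives.append(endurance / damage_per_cycle)
    return lives


lives = simulate_cycles_to_failure(20_000)
median_life = statistics.median(lives)
b10_life = statistics.quantiles(lives, n=10)[0]  # cycles by which 10% have failed

print(f"Median life: {median_life:,.0f} cycles")
print(f"B10 life:    {b10_life:,.0f} cycles")
```

The result is a range of lives and not a single number, which is what lets the team talk about confidence in a prediction.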

That brings our historical tour of reliability engineering methods to an end, at least for this episode!

What are we doing today with reliability engineering? There are lots of options for evaluating the reliability of a design.

Going full circle, back to the observed failure rates in the field, reliability engineers still use standards-based methods. There is a lot of failure information and history about product families and groups of components. Using all that collective experience and observations, we can develop mathematical models to help us predict what’s going to happen. The good thing about the models is that they’re available for most things. The not-so-good thing is that we’re not always sure that the use case for the data matches the use case for our product. Also, some of the data may be out-of-date with modern design and manufacturing methods. These standards-based methods are best used to get a quick and rough estimation of product reliability, especially early in the design phase. These models can be used to help make design decisions about component options, the possible need for redundancies, and component configuration options for our concept. On the podcast blog, I’ll include a couple of these standards-based reliability options.
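Here is a rough sketch of the kind of quick, early estimate described above: add up component failure rates for a series system, then see what a redundant component would buy us. The failure rates are hypothetical placeholders, not values from Telcordia SR-332, 217Plus, or any other handbook, and the calculation assumes constant failure rates and independent failures.

```python
import math

# Hypothetical failure rates in failures per million hours (FPMH).
PARTS_FPMH = {
    "microcontroller": 0.12,
    "dc_dc_converter": 0.45,
    "connector": 0.08,
    "electrolytic_capacitor": 0.30,
}


def series_failure_rate(rates_fpmh):
    """Series system: any part failing fails the product, so rates add."""
    return sum(rates_fpmh)


def reliability(rate_fpmh: float, hours: float) -> float:
    """Constant failure rate model, with the rate given in FPMH."""
    return math.exp(-rate_fpmh * 1e-6 * hours)


def parallel_reliability(r_single: float, n: int) -> float:
    """Active redundancy: the function survives if at least one of n units does."""
    return 1.0 - (1.0 - r_single) ** n


mission_hours = 50_000
total_rate = series_failure_rate(PARTS_FPMH.values())
mtbf_hours = 1e6 / total_rate
r_system = reliability(total_rate, mission_hours)

# What if we duplicated the weakest part, the converter?
r_converter_pair = parallel_reliability(reliability(PARTS_FPMH["dc_dc_converter"], mission_hours), 2)
other_rate = total_rate - PARTS_FPMH["dc_dc_converter"]
r_with_redundancy = reliability(other_rate, mission_hours) * r_converter_pair

print(f"System failure rate: {total_rate:.2f} FPMH (MTBF about {mtbf_hours:,.0f} hours)")
print(f"Reliability at {mission_hours:,} hours: {r_system:.4f}")
print(f"With a redundant converter: {r_with_redundancy:.4f}")
```

An estimate like this is only as good as the match between the handbook data and our actual use case, so it is best treated as a way to compare concept options and not as a promise about field performance.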

Physics of Failure and Probabilistic Physics of Failure are just getting better over time. An advantage of the Physics of Failure methods is that they are accurate for components with known failure mechanisms. These models can be used to help make design decisions about components, the need for redundancies, and the overall system assembly of our components. And, they can be performed with innovative design concepts, cutting-edge technologies, and existing products. We can use the physics of failure models to make reliability predictions if we know our product, how it’s used, and in what conditions it’s used, and if we define the point at which failure or degradation keeps it from performing the way we want. Modeling a complex system using Physics of Failure at the component level may be difficult, but it is getting easier with software solutions.

No matter what type of reliability engineering method we use, we need data about failures. We can use what we know about similar products and what we know about the physics of failures. Those databases and collections of information are just getting to be more complete. Sometimes, we need to just test our own designs to failure. And, reliability engineering methods are addressing that, too. Physics of Failure information is used to shorten test times. Examples are burn-in at manufacturing, HALT, and accelerated stress testing.
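To show how physics of failure knowledge can shorten a test, here is a minimal sketch using the Arrhenius relationship mentioned earlier for temperature-induced aging. The activation energy (0.7 eV) and the temperatures are assumed values for illustration. In practice, the activation energy depends on the specific failure mechanism and needs to come from data or literature for that mechanism.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(activation_energy_ev: float,
                                  use_temp_c: float,
                                  stress_temp_c: float) -> float:
    """How many times faster a temperature-driven mechanism ages at the
    stress temperature than at the use temperature."""
    t_use_k = use_temp_c + 273.15
    t_stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k)
    return math.exp(exponent)


# Assumed values: 0.7 eV activation energy, 55 C in use, 125 C in the test chamber.
af = arrhenius_acceleration_factor(0.7, use_temp_c=55, stress_temp_c=125)
test_hours = 1000
equivalent_use_hours = af * test_hours

print(f"Acceleration factor: {af:.1f}")
print(f"{test_hours} test hours represent about {equivalent_use_hours:,.0f} use hours")
```

This only holds if the higher temperature accelerates the same failure mechanism we expect in the field. If the stress level introduces a different mechanism, the math no longer represents our product.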

Let’s conclude. We talked about the history and roots of reliability engineering, to better understand what methods are used today. Reliability engineering calculations can be complex, and they’re getting even more complicated. But you don’t need to know how to do those calculations to understand what decisions to make and to be comfortable with them. Just like design concepts evolve during the design development process, reliability and performance assessments can evolve with it. You can still use the information to make decisions during the design development process.

When you talk with the reliability engineer on your team about your new product design, realize that they need failure data in one form or another. When it’s the early concept evaluation phases, the failure data they’re likely to use is from historical, published sources or experiences with similar products. They’ll want to know the design concept, the general construction of component types, the performance expectations, and the working conditions of the product (like its use environment or use cases). When you speak with them during the design process (when you’re choosing components and starting the engineering design), know that they want the specifics of the design: the actual construction of the components, the materials, and sometimes the geometry of the components. They may want to evaluate which components are the most susceptible to failure, so they may want to test a component, sub-systems, or the system itself. After the product is released to market, they may use the failure data from the field to verify that the reliability models they used for development match what is actually being experienced in use.

The key takeaways from today:

If this episode spoke to you, there are 3 other Quality during Design podcasts you may want to revisit. They get into more detail about some of today’s concepts.

Episode 6: “HALT! Watch out for that weakest link” describes highly accelerated life testing.

Episode 31: “5 Aspects of Good Reliability Goals and Requirements”, where we build up a reliability requirement based on 5 aspects.

Episode 37: “Results-Driven Decisions, Faster: Accelerated Stress Testing as a Reliability Life Test”, where we describe more about reliability life testing and more specifically about accelerated stress testing.

Please go to my website at QualityDuringDesign.com. You can visit me there, and it also has a catalog of resources, including all the podcasts and their transcripts. Use the subscribe forms to join the weekly newsletter, where I share more insights and links. In your podcast app, make sure you subscribe or follow Quality During Design to get all the episodes and get notified when new ones are posted. This has been a production of Deeney Enterprises. Thanks for listening!