Understanding the Failure Curves with Doug Plucknette & Ron Moore
We’re excited to have Ron Moore and Doug Plucknette dig into the topic of age-related and random failure patterns. Ron has been involved with maintenance and reliability for a long time. Doug is from RCM Blitz and is a strong advocate for understanding failure modes, reliability-centered maintenance (RCM), and its different aspects. He has experience ranging from wrenching to supervision and management, as well as training and consulting.
In this episode, we covered:
- The tools a Maintenance Engineer needs!
- How to make change!
- The steps you need to build a performance tool!
- And much more!
Where are these curves from, and what’s their importance?
Several people have studied the six curves, but the original study was done by Stan Nowlan and Howard Heap as part of their development of RCM, in conjunction with United Airlines and using equipment from the Boeing 747. The study was done in the early 70s. Since then, others, including Doug, have tried to recreate the study and found that failures tend to fit the same kinds of distributions across industries.
Before that, C. H. Waddington in England helped shape the maintenance practices for B-24 Liberator bombers. He observed some of the same things that later became part of RCM.
Main difference between age-related curves and random (non-age-related) curves
Age-related curves are based on time, whereas for random curves, time is irrelevant. People get most confused about infant mortality, or the early-life curve, because it seems time-based. However, that time can be a fraction of a second for an electronic component. For a bearing, the timeframe could be three months, which still counts as an early-life failure because the bearing should have lasted at least ten years.
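One common way to put numbers on these shapes, which the episode itself doesn’t use, is the Weibull distribution: a shape parameter β below 1 gives a falling hazard rate (early life), β equal to 1 a constant one (random), and β above 1 a rising one (wear-out). The sketch below assumes that model; the 10-year characteristic life and the β values are hypothetical, loosely echoing the bearing example.

```python
"""Sketch: early-life, random, and wear-out hazard shapes under a Weibull model.

Assumption: the Weibull model and all parameter values are illustrative only.
"""


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Instantaneous failure rate h(t) = (beta / eta) * (t / eta) ** (beta - 1)."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta, and eta must all be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def pattern_for_shape(beta: float) -> str:
    """Name the failure pattern implied by the Weibull shape parameter."""
    if beta < 1:
        return "early life: hazard falls with age"
    if beta == 1:
        return "random: hazard is constant, so age is irrelevant"
    return "wear-out: hazard rises with age"


if __name__ == "__main__":
    eta_years = 10.0  # hypothetical characteristic life of a bearing
    for beta in (0.5, 1.0, 3.0):
        # Hazard at three months, one year, and five years of service
        rates = [weibull_hazard(t, beta, eta_years) for t in (0.25, 1.0, 5.0)]
        print(f"beta={beta}: {[round(r, 4) for r in rates]} -> {pattern_for_shape(beta)}")
```

The point is the direction of the curve, not the exact numbers: with β = 0.5 the three-month hazard is the highest of the three, while with β = 3 it is the lowest.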
These curves can also be treated as concepts rather than rigid sets of data. In that sense, they can all be age-related, because there’s a timeframe associated with each of them. You might not know what that timeframe is, as it can vary between components. So, look at the parts you have and ask: if they’re properly installed, what would be the best maintenance plan for them?
If you have a wear-related failure mode like corrosion or erosion, you might want a time-based PM. But that only works if the design, fabrication, installation, startup, and maintenance are relatively stable and done to a high level of skill. That’s when an age-related, time-based PM to replace the component would probably apply. Very few people achieve that, though, because so many errors can creep in that cut a component’s life short of what it was expected to achieve.
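To see why a time-based replacement only pays off for a genuine wear-out mode, here is a sketch of the classic age-replacement model from reliability textbooks (not something discussed in the episode): replace at age T or at failure, whichever comes first, and pick the T that minimizes expected cost per unit of operating time. The Weibull lives, the cost figures, and the function names are all assumptions for illustration.

```python
"""Sketch: classic age-replacement policy under an assumed Weibull life.

Assumptions: Weibull lifetimes, planned replacement cost `cost_pm`, failure
cost `cost_failure`; all numbers below are hypothetical.
"""
import math


def reliability(t: float, beta: float, eta: float) -> float:
    """Weibull survival probability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(T: float, beta: float, eta: float,
              cost_pm: float, cost_failure: float, steps: int = 2000) -> float:
    """Expected cost per unit time when replacing at age T or at failure."""
    # Expected cycle length = integral of R(t) from 0 to T (trapezoidal rule)
    h = T / steps
    area = 0.5 * (reliability(0.0, beta, eta) + reliability(T, beta, eta))
    area += sum(reliability(i * h, beta, eta) for i in range(1, steps))
    area *= h
    r_T = reliability(T, beta, eta)
    # Expected cycle cost: planned replacement if it survives to T, else failure
    return (cost_pm * r_T + cost_failure * (1.0 - r_T)) / area


def best_replacement_age(beta, eta, cost_pm, cost_failure, candidates):
    """Return the candidate age with the lowest expected cost rate."""
    return min(candidates,
               key=lambda T: cost_rate(T, beta, eta, cost_pm, cost_failure))


if __name__ == "__main__":
    ages = range(1, 21)  # candidate replacement ages, in years
    # Wear-out mode (beta = 3): an intermediate replacement age minimizes cost
    print("wear-out:", best_replacement_age(3.0, 10.0, 1.0, 10.0, ages))
    # Random mode (beta = 1): cost keeps falling as T grows, so the "best"
    # replacement age is simply the longest one offered
    print("random:", best_replacement_age(1.0, 10.0, 1.0, 10.0, ages))
```

The model quietly assumes the stable design, installation, and startup described above; if poor installation adds early-life failures, every replacement restarts the component on that high-hazard stretch and the calculation no longer holds.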
What is a Random failure?
A random failure is caused by humans when they don’t do proper design, specification, storage, installation, operation, and maintenance. Within a group of similar components, any one component is about as likely to fail as any other. You have a constant probability of failure for a given set of components, and you manage that through appropriate condition monitoring based on the failure modes.
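A constant probability of failure means a component’s age tells you nothing about whether it will be the next one in the group to fail. The sketch below assumes Weibull lives with hypothetical parameters and computes the probability that a component that has survived to a given age fails within the next year: that probability is the same at every age when β = 1, but climbs with age for a wear-out mode.

```python
"""Sketch: conditional probability of failure in the next interval.

Assumption: Weibull lifetimes with illustrative parameters.
"""
import math


def reliability(t: float, beta: float, eta: float) -> float:
    """Weibull survival probability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def prob_fail_next(age: float, window: float, beta: float, eta: float) -> float:
    """P(failure before age + window | the component survived to age)."""
    return 1.0 - reliability(age + window, beta, eta) / reliability(age, beta, eta)


if __name__ == "__main__":
    eta_years = 10.0  # hypothetical characteristic life
    for beta, label in ((1.0, "random"), (3.0, "wear-out")):
        probs = [prob_fail_next(age, 1.0, beta, eta_years) for age in (1.0, 5.0, 9.0)]
        print(label, [round(p, 3) for p in probs])
```

Because the random case gives the same answer at every age, replacing a component early buys nothing; detecting the onset of failure through condition monitoring is what helps.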
On the other hand, infant mortality is very specific to the component and the failure mode.
How to prevent or mitigate Infant Mortality failures
It starts with the design, procurement, installation, storage, operations, startup, and a good routine maintenance program based on the failure modes and their consequences. Beyond those, people need training and the time to develop and practice their skills in addressing these issues. That way, they get involved and have a sense of ownership for the solution.
You can also break it down as the five rights to reliability:
- Design it right
- Install it right
- Maintain it right
- Operate it right
- Store it right
How to mitigate Curve E risks
This curve represents a constant conditional probability of failure, which translates to a random failure pattern. You can manage it through condition monitoring: look at the failure mode, assess the risk and consequence of failure, and then put the appropriate technique in place. Also, make sure operators are involved. Take the information further and determine whether you need to go back to the initial phases, such as design and installation, to address the issue thoroughly. That way, the risk of it happening again becomes minimal. If you have a useful PF curve, you can do condition monitoring. If not, you need to look at how you’ll mitigate the consequences.
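The decision at the end of that paragraph can be written down as a small rule. In this sketch, “useful” is our assumption: the PF interval must be longer than the inspection interval plus the time needed to plan and act on a detected problem. The field names, consequence categories, the rule sending serious consequences back to design, and the example numbers are all hypothetical.

```python
"""Sketch: choosing a strategy for a random (Curve E) failure mode.

Assumptions: the field names, consequence categories, and the definition
of a "useful" PF interval are illustrative, not a standard.
"""
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    detectable: bool          # is there a measurable sign of the onset of failure?
    pf_interval_days: float   # estimated time from detectable onset to failure
    reaction_days: float      # time needed to plan and act once it is detected
    consequence: str          # e.g. "safety", "environmental", "production", "minor"


def has_useful_pf(mode: FailureMode, inspection_interval_days: float) -> bool:
    """True if an inspection can catch the onset with time left to act."""
    return (mode.detectable
            and mode.pf_interval_days > inspection_interval_days + mode.reaction_days)


def select_strategy(mode: FailureMode, inspection_interval_days: float) -> str:
    """Condition monitoring if the PF interval is usable, else manage consequences."""
    if has_useful_pf(mode, inspection_interval_days):
        return "condition monitoring"
    if mode.consequence in {"safety", "environmental"}:
        # Assumption: serious consequences with no usable warning send you
        # back to the initial phases rather than just living with the risk
        return "revisit design, procurement, or installation"
    return "mitigate the consequences"


if __name__ == "__main__":
    bearing = FailureMode("bearing spall", True, 90.0, 14.0, "production")
    relay = FailureMode("relay contact weld", False, 0.0, 1.0, "safety")
    print(bearing.name, "->", select_strategy(bearing, inspection_interval_days=30.0))
    print(relay.name, "->", select_strategy(relay, inspection_interval_days=30.0))
```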
How to figure out the PF interval
Talk to the mechanics. Even with industry standards in mind, the interval still depends on the failure mode, application, and consequence. Check your database for any meaningful information to help you make a judgment. With that information and your experience, ask yourself how long a component will last once it starts to fail.
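Once the mechanics have given you their estimates, a rule of thumb often quoted in the RCM literature is to inspect at no more than half the PF interval, so the onset of failure is caught with warning time to spare. This sketch takes the shortest estimate as the conservative one; the halving fraction, the function names, and the day counts are assumptions for illustration.

```python
"""Sketch: turning PF-interval estimates into an inspection interval.

Assumptions: take the shortest estimate and inspect at a fixed fraction of it
(half, a commonly quoted rule of thumb); names and numbers are hypothetical.
"""


def inspection_interval(pf_estimates_days: list[float], fraction: float = 0.5) -> float:
    """Inspection interval as a fraction of the most conservative PF estimate."""
    if not pf_estimates_days:
        raise ValueError("need at least one PF interval estimate")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return min(pf_estimates_days) * fraction


def worst_case_warning(pf_days: float, interval_days: float) -> float:
    """Warning time left if the onset starts just after an inspection."""
    return pf_days - interval_days


if __name__ == "__main__":
    estimates = [60.0, 90.0, 45.0]  # hypothetical estimates from three mechanics
    interval = inspection_interval(estimates)
    print("inspect every", interval, "days")
    print("worst-case warning:", worst_case_warning(min(estimates), interval), "days")
```

If the worst-case warning is shorter than the time you need to plan and do the repair, either inspect more often or treat the PF interval as not useful.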
Where do Overhauls and Rebuilds fall into the different curves?
Schedule time for it, but validate that you need to do the work based on your assessment of the equipment’s current condition. If you don’t need to, you can postpone it.
If you have components susceptible to corrosion, abrasion, and erosion, always inspect them during your overhauls. Be careful with the rest, though: disturbing components that don’t suffer from corrosion, erosion, or abrasion risks bringing infant mortality failures back into them.
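The two overhaul points, validating the need from current condition and always checking the corrosion, erosion, and abrasion items while leaving healthy components alone, can be sketched as a simple scoping step. The component names, failure modes, and condition flags are hypothetical inputs standing in for a real pre-overhaul assessment.

```python
"""Sketch: scoping an overhaul from a condition assessment.

Assumptions: component names, failure modes, and condition flags are
hypothetical; a real assessment would come from inspection data.
"""
from dataclasses import dataclass

WEAR_MODES = {"corrosion", "erosion", "abrasion"}


@dataclass
class Component:
    name: str
    failure_modes: set[str]
    condition_ok: bool  # result of the pre-overhaul condition assessment


def overhaul_needed(components: list[Component]) -> bool:
    """Only schedule the work if the assessment found a problem."""
    return any(not c.condition_ok for c in components)


def overhaul_actions(components: list[Component]) -> dict[str, str]:
    """Decide what to do with each component once the overhaul goes ahead."""
    actions = {}
    for c in components:
        if not c.condition_ok:
            actions[c.name] = "repair or replace"
        elif c.failure_modes & WEAR_MODES:
            # Wear-prone parts are always inspected while the equipment is open
            actions[c.name] = "inspect"
        else:
            # Disturbing a healthy part risks reintroducing infant mortality
            actions[c.name] = "leave undisturbed"
    return actions


if __name__ == "__main__":
    pump = [
        Component("impeller", {"erosion"}, True),
        Component("casing", {"corrosion"}, False),
        Component("motor bearing", {"fatigue"}, True),
    ]
    if overhaul_needed(pump):
        for name, action in overhaul_actions(pump).items():
            print(f"{name}: {action}")
    else:
        print("postpone the overhaul")
```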
Tips to help implement the six curves
For starters, concentrate on stopping infant mortality failures. Eliminating those is more powerful than figuring out the best timeframe for a PM or an on-condition task. Use the curves with an understanding of how the original study was done. Good maintenance, design, installation, training, and all those things have to be done well before you can apply the curves. Without those, you’ll have constant failures. Then, make sure you have a good condition monitoring program to detect the onset of failure early enough.
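A practical first step toward stopping infant mortality is finding it in your failure history. The sketch below flags a failure as early-life when it happens within a fixed fraction of the component’s expected life, like the bearing that fails at three months instead of lasting ten years. The 10% threshold, the record layout, and the sample data are assumptions.

```python
"""Sketch: flagging early-life failures in a failure history.

Assumptions: each record is (asset, age at failure in years, expected life
in years); the 10% early-life threshold is illustrative, not a standard.
"""


def is_early_life(age_years: float, expected_life_years: float,
                  threshold: float = 0.1) -> bool:
    """True if the failure came within `threshold` of the expected life."""
    return age_years < threshold * expected_life_years


def early_life_failures(records, threshold: float = 0.1) -> list[str]:
    """Assets whose failures look like infant mortality."""
    return [asset for asset, age, life in records
            if is_early_life(age, life, threshold)]


def early_life_share(records, threshold: float = 0.1) -> float:
    """Fraction of all recorded failures that were early-life failures."""
    if not records:
        return 0.0
    return len(early_life_failures(records, threshold)) / len(records)


if __name__ == "__main__":
    history = [
        ("bearing A", 0.25, 10.0),  # failed at three months
        ("bearing B", 6.0, 10.0),
        ("seal C", 0.1, 3.0),
        ("motor D", 12.0, 15.0),
    ]
    print("early-life failures:", early_life_failures(history))
    print(f"share: {early_life_share(history):.0%}")
```

A high share points you back to design, installation, storage, and startup practices before you spend effort tuning PM or on-condition intervals.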
How to become successful with Precision Maintenance, Standards, Design it Right, and Monitoring
There are four base elements here:
- Leadership has to create a culture of excellence by being both demanding and supportive
- Have a good production and maintenance partnership that works together to eliminate defects in design, procurement, operations, and maintenance, creating better overall performance
- Have measures that encourage collaboration rather than conflict
- Have a process for employee engagement in the improvement process
The maintenance and operations teams also need the capability to work to precision levels and an understanding of how it all works.
Doug Plucknette & Ron Moore Links:
- Doug Plucknette LinkedIn
- What Tool? When?
- Reliability Toolkit
- Past podcasts featuring Ron Moore
- RCM Blitz
- Making Common Sense Common Practice
The Rooted In Reliability podcast is a proud member of the Reliability.fm network. We encourage you to please rate and review this podcast on iTunes and Stitcher. It ensures the podcast stays relevant and is easy to find by like-minded professionals. It is only with your ratings and reviews that the Rooted In Reliability podcast can continue to grow. Thank you for providing the small but critical support for the Rooted In Reliability podcast!