Myth Busting 23: We need lots of failure data...

Reliability Centered Maintenance (RCM) has been around since the 1970s and has proven to achieve amazing results wherever it has been used properly. As a reliability method, it guides decision making based on available evidence about past, and expected future, failures. It makes sense that failure data be part of that evidence. But do you need a lot of data?

A common misperception about RCM is that it requires a lot of data. Indeed, if you have good data you will likely make better, more informed, statistically based decisions. But if you don’t have good statistical data, is that a reason to hold back on starting your RCM initiative?

Absolutely not!

There is usually a wealth of non-statistical data (empirical data) based on observations of experienced operators and maintainers already at your fingertips. In fact, you may find that when you are trying to solve problems, your operators and maintainers wonder why you don’t ask them what they think. They are often overlooked, and to your detriment. I’ve found that operators and maintainers often have a great deal of knowledge about what goes on and terrific memories about incidents (failures) that have happened. After all, they do see it first-hand. Why not pay attention to their observations and thoughts?

I’ve done many RCM analyses where engineers (particularly those with less experience) have tended to shy away from this empirical evidence that operators and maintainers have in abundance. The preference is to look to “systems” (databases) for statistical evidence. In my experience, particularly if you are just starting on your reliability journey, that’s a big mistake.

Empirical evidence does not come nicely packaged in data fields with precise numbers – so it’s less attractive. But there is usually a lot more of it and, with the right questioning, it is far richer in detail than it might seem on the surface.

In one case, an electrical supervisor at a utility had direct experience with no fewer than three transformer explosions over a 20-year period. Another shift supervisor, in a similar role, had in all probability seen a similar number over that same period. While that may not seem like a lot of data, it spanned a longer time frame than the “life” of their two previous maintenance management systems. Together, that pointed to 5 or 6 failures in a population of roughly 600 transformers over 20 years. Considering that the transformers operate more or less continuously, and taking the higher count of 6 failures, we determined that the MTBF was:

(20 y × 8,760 h/y × 600 transformers) / 6 failures = 17,520,000 hours.
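As a quick check on that arithmetic, here is a minimal Python sketch. The function name `mtbf_hours` is only illustrative, and the calculation assumes, as the text does, that every transformer runs continuously (8,760 hours per year). It is run for both ends of the "5 or 6 failures" estimate.

```python
HOURS_PER_YEAR = 8760  # continuous operation, as assumed in the text


def mtbf_hours(years, units, failures, hours_per_year=HOURS_PER_YEAR):
    """Fleet MTBF = total operating hours across the fleet / number of failures."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate MTBF")
    return years * hours_per_year * units / failures


# The supervisors' recollections suggested 5 or 6 failures in 20 years.
for failures in (5, 6):
    hours = mtbf_hours(years=20, units=600, failures=failures)
    years_equiv = hours / HOURS_PER_YEAR
    print(f"{failures} failures: {hours:,.0f} h (about {years_equiv:,.0f} years)")
```

With 6 failures this reproduces the 17,520,000 hours above, roughly 2,000 years of operation per failure; with 5 failures the figure rises to about 21 million hours.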

That’s actually not so bad for the very few failure modes that could give rise to these failures. There were in fact just a couple of such failure modes, and we didn’t know for sure which had caused these failures. One was a gradual degradation of insulation with age. The other was leakage of water into the transformer (and oil out) due to corrosion of the casing. Indeed, the supervisor’s memory was that these failures had taken place mostly in a flood-prone part of the city, where the transformers were known to operate in flooded conditions occasionally and to suffer the greatest number of oil leaks.

Knowing that, we could devise failure management strategies, which is exactly what RCM is there for.

There was no reliable data for those failures in the current CMMS, which had been in use for over 10 years. This utility was also one of the better ones that I have encountered for capturing failure data! Clearly, they were not consistent at doing so.

Failure events are seldom captured accurately in our maintenance systems. After all, those systems are designed for managing work, not failures. Work orders are the data gathering instrument. A work order may deal with preventive, predictive, repair, inspection, rebuild, or replacement work. Sometimes it is clear what type of work was done; often it is not. What failure mode was the work addressing? Unless you’ve done RCM analysis, you have no way of knowing and your work order won’t tell you. Reliability engineers know that work order systems are of limited value in doing their work. They often do what I did early in my career: set up a separate data gathering experiment (like you do in Six Sigma) to capture relevant data that is fit for purpose.

If you wait for data in your CMMS / EAM before starting RCM analysis, you will miss the opportunity to improve today or, worse, you may never start. You will experience few occurrences of any given failure mode, in part because you probably don’t have a sufficiently large fleet of assets to give you meaningful data in a reasonable period of time.
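To put a rough number on the fleet-size point, the sketch below reuses the 17,520,000-hour MTBF from the transformer example. The 20-unit fleet and 10-year window are hypothetical, and the probability calculation assumes a constant failure rate with independent failures (a Poisson model), which is my simplification rather than anything claimed in the case itself.

```python
import math

HOURS_PER_YEAR = 8760
MTBF_HOURS = 17_520_000  # figure from the transformer example above


def expected_failures(units, years, mtbf_hours=MTBF_HOURS):
    """Expected failure count = total fleet operating hours / MTBF."""
    return units * years * HOURS_PER_YEAR / mtbf_hours


def prob_no_failures(units, years, mtbf_hours=MTBF_HOURS):
    """Chance of recording zero failures (assumed Poisson model: constant
    failure rate, independent failures)."""
    return math.exp(-expected_failures(units, years, mtbf_hours))


# Hypothetical small fleet: 20 transformers observed for 10 years.
print(f"expected failures: {expected_failures(20, 10):.2f}")
print(f"chance of seeing none: {prob_no_failures(20, 10):.0%}")

# The utility's own fleet: 600 transformers over 20 years.
print(f"expected failures (600 units, 20 y): {expected_failures(600, 20):.1f}")
```

Under these assumptions, a 20-unit fleet would expect about a tenth of a failure in a decade, so its database would most likely hold no record of this failure mode at all. Even the 600-unit fleet yields only about six events in 20 years.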

There’s a little-known conundrum pointed out by H L Resnikoff: waiting for statistical evidence of failures defeats the very purpose of doing reliability work. We are doing this work to avoid the very failures we need to collect data for. So, if we are successful in our reliability work, we will not gather good statistical evidence. If we have good statistical evidence on which to base our decisions, then we are failing to deliver reliable performance.

For failures with significant consequences, if we are doing effective maintenance, then we are unlikely to have good data. For failures of low consequence, where we are more likely doing much less, we may in fact have good data.

You can, and indeed should, start RCM when you are in a position to comply with the maintenance schedules it will produce, and not wait until you have a lot of data in your databases.

About James Reyes-Picknell
