Defining a Failure? – It Is Actually Up to You!

Last Verified September 15, 2023

The reliability definition in relation to asset failures

Maintenance and Reliability professionals deal with equipment failures all the time. However, the word “failure” could have different definitions or thresholds. In order to take adequate and effective action, it is important to have clear specifications for what a “failure” truly is.

Reliability, in its academic root, is defined as the probability that a system will perform its intended function for a specified mission time and under specified process conditions. Reliability (R) is the probability of success, as opposed to the probability of failure (F). The relation between R and F is:

R = 1 – F (for mission time t)

In the above context, a reliability example would be: What is the probability that a centrifugal pump in a sheltered enclosure will pump 3,000 m3/day of sweet crude oil without unplanned failures for a period of 8,760 running hours?
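To make the R = 1 – F relation concrete, here is a minimal sketch in Python. It assumes a constant failure rate (an exponential life model), which the article does not specify, and the failure rate used is a made-up value for illustration only. The function name `reliability` is likewise hypothetical.

```python
import math


def reliability(failure_rate_per_hour: float, mission_hours: float) -> float:
    """Return R(t) = exp(-lambda * t).

    Assumption: constant failure rate (exponential model). Real pump data
    may be better described by another distribution.
    """
    return math.exp(-failure_rate_per_hour * mission_hours)


# Hypothetical value: on average one failure every 50,000 running hours.
failure_rate = 1 / 50_000
mission_time = 8_760  # running hours, as in the pump question above

R = reliability(failure_rate, mission_time)  # probability of success
F = 1 - R                                    # probability of failure

print(f"R = {R:.3f}, F = {F:.3f}")
```

Whatever life model is chosen, R and F always sum to 1 for the same mission time. What changes the numbers is how "failure" is defined when the time-to-failure data is collected.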

The system has “failed” if it does not fulfill its intended function in the defined mission time. Typically, this would be a functional failure; the pump has stopped pumping.

Example of failure definition

Life analysis and the statistical distributions derived from it are often based on “time to failure” variables. Therefore, it is important for the reliability analyst to define clearly what a failure truly is. When the failure is not obvious or intuitive, the analyst needs the help of Subject Matter Experts (SMEs) to obtain this definition. A leaking pump mechanical seal that triggers a gas alarm in a pump house is an operational hazard and thus a failure. The same seal might not be considered a failure if it is leaking 4 drops a minute of process fluid and NOT triggering the alarm. Therefore, if the asset does not perform to the specified requirements of the operator, this is a failure. In the above example, if the operator instead defines a “failure” as a “4-drops-a-minute” leak and the seal leaks 3 drops a minute, then it has not failed.
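One way to keep such a definition unambiguous is to write it down as an explicit, testable rule. The sketch below is illustrative only: the class name `SealFailureCriterion`, its fields, and the choice to treat a leak at exactly the limit as a failure are assumptions, not an established standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SealFailureCriterion:
    """Operator-defined failure criterion for a mechanical seal (hypothetical)."""

    max_drops_per_minute: float  # leak rate at or above this counts as a failure

    def is_failure(self, drops_per_minute: float,
                   gas_alarm_triggered: bool = False) -> bool:
        # Assumption: a triggered gas alarm is always a failure,
        # regardless of the measured leak rate.
        if gas_alarm_triggered:
            return True
        return drops_per_minute >= self.max_drops_per_minute


# The operator defines failure as a 4-drops-a-minute leak.
criterion = SealFailureCriterion(max_drops_per_minute=4)

print(criterion.is_failure(3))  # False: 3 drops a minute has not failed
print(criterion.is_failure(4))  # True: the defined limit is reached
```

Changing the limit changes which events count as failures, and therefore changes the time-to-failure data that feeds the life analysis.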

Additionally, the asset could still be functioning in the eyes of the operator while its operating condition is not acceptable. In other words, using the pump example, the pump seal is leaking 4 drops a minute but the pump is still delivering the required 3,000 m3 per day flow. The seal leak is a “non-functional” failure because the pump is still running and pushing product. It is achieving its primary function. However, even if it is achieving its primary function, it is now in a state where the safety of employees or the environment might be at serious risk. Other examples of non-functional failures could be:

Pump is still pushing process fluids but is vibrating beyond the acceptable limit.
Control valve is still opening and closing but the ability to control flow is poor.
Automobile has an audible squealing noise but is still driving well.

Ultimately, a functional or non-functional failure is established by a team of SMEs.
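The distinction between the two failure types can be sketched as a simple classification rule. The names below (`FailureState`, `classify_pump`) and the default thresholds are hypothetical; in practice the SME team would set the actual criteria.

```python
from enum import Enum


class FailureState(Enum):
    HEALTHY = "healthy"
    NON_FUNCTIONAL_FAILURE = "non-functional failure"
    FUNCTIONAL_FAILURE = "functional failure"


def classify_pump(flow_m3_per_day: float,
                  leak_drops_per_minute: float,
                  required_flow_m3_per_day: float = 3000.0,
                  leak_limit_drops_per_minute: float = 4.0) -> FailureState:
    """Classify a pump against SME-defined criteria (assumed values)."""
    # Primary function not achieved: the pump is not delivering the required flow.
    if flow_m3_per_day < required_flow_m3_per_day:
        return FailureState.FUNCTIONAL_FAILURE
    # Primary function achieved, but an operating criterion is breached.
    if leak_drops_per_minute >= leak_limit_drops_per_minute:
        return FailureState.NON_FUNCTIONAL_FAILURE
    return FailureState.HEALTHY


print(classify_pump(3000, 4).value)  # non-functional failure
print(classify_pump(0, 0).value)     # functional failure
print(classify_pump(3000, 1).value)  # healthy
```

The functional check comes first because a pump that has stopped pumping has failed regardless of its seal condition.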

Progression from non-functional to functional failure: the PF curve

There could also be multiple successive stages of failure over time for the same asset, both functional and non-functional. The ultimate failure, as described earlier, is a functional failure, i.e., the pump does not run, the valve does not open, or the transformer does not transmit power. A catastrophic failure is an extreme case, with dire consequences for production, safety, or the environment. In this case, the asset cannot be restored to a functioning state and needs to be entirely replaced. A pump with a severely cracked casing or a transformer that caught fire are examples.

In the above paragraphs, the definition and evolution of failures (non-functional and functional) are set by SMEs, including vendors of the equipment. This concept constitutes the basis of the PF curve shown below. A PF curve is essentially a graphical representation of the evolution of those defined failures over time in relation to the operating condition or “health” of the asset.

PF curve with progression from non-functional to functional failures
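As a rough numerical illustration of the idea behind a PF curve, the sketch below finds when a declining health reading first crosses two defined thresholds. The health values, the thresholds, and the helper `first_crossing` are all invented for illustration. Here P is taken as the point where degradation first becomes detectable and F as the point of functional failure, which is the usual reading of the PF interval.

```python
from typing import Optional, Sequence


def first_crossing(times: Sequence[float],
                   condition: Sequence[float],
                   threshold: float) -> Optional[float]:
    """Return the first time the condition is at or below the threshold."""
    for t, c in zip(times, condition):
        if c <= threshold:
            return t
    return None  # threshold never reached in the data


# Hypothetical monthly health index readings (100 = as new).
months = [0, 1, 2, 3, 4, 5, 6, 7, 8]
health = [100, 99, 97, 92, 85, 74, 58, 35, 0]

P_THRESHOLD = 90  # assumed: degradation becomes detectable
F_THRESHOLD = 40  # assumed: asset no longer fulfills its function

p_time = first_crossing(months, health, P_THRESHOLD)
f_time = first_crossing(months, health, F_THRESHOLD)
pf_interval = f_time - p_time  # time available to plan and act

print(f"P at month {p_time}, F at month {f_time}, "
      f"P-F interval = {pf_interval} months")
```

Both thresholds are definitions, not measurements: moving either one shifts the P or F point and changes the interval available for corrective action.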

In summary, a failure, functional or non-functional, is what you, the operator or analyst, define it as. It just has to have a clear definition, measurable criteria, and the tools and processes to detect it.

Comments

Brian T. says
September 15, 2023 at 10:51 AM
This is equally important when conducting durability and reliability tests. Need to have agreement on what constitutes failure BEFORE starting the test.
- André-Michel Ferrari says
  September 15, 2023 at 12:42 PM
  Great addition to this topic Brian. Thanks for taking the time to comment. André