The established tools for process plant reliability engineering are:
- Asset Criticality Analysis (ACA)
- Reliability-Centered Maintenance (RCM)
- Failure Modes and Effects Analysis (FMEA)
- Risk-Based Inspection (RBI)
QUESTION: Are you truly impressed by the measurable and significant performance increase that these tools have delivered?
Did you embrace these tools with enthusiasm? I did. At first. Though to be honest, I was glad to be rid of them when my job changed. Why? For two closely related reasons:
- On an emotional level, because of numerous frustrating discussions regarding aspects of the technical implementation.
- On a technical level, because each tool implements a level of abstraction (a trivialization of reality) that limits its relevance to the real world.
This is not a criticism of the tools themselves; they just didn’t meet my one requirement, which is necessary to clearly demonstrate that an optimal decision has been made:
- To quantify, in a transparent fashion, the specific impact of each hazard scenario and each mitigating measure on the stochastic performance of the overall production system.
If your existing tool met this requirement, you would undoubtedly have answered “YES” to the above QUESTION.
The software “VirtualWorld” from RAMS Mentat GmbH is designed to meet this requirement. An important feature of the software is the “Phenomenological Asset Model” (PAM).
The Phenomenological Asset Model (PAM)
The PAM is a “digital twin” of the asset failure and maintenance processes and their interactions. The PAM documents our knowledge of each asset vulnerability and the associated maintenance strategy. For example, Figure 1 characterizes a bearing failure mechanism stochastically, whilst Figure 2 describes the related Failure-Inspection-Repair processes and their interactions.
Figure 1: Stochastic characterization of (fictitious) bearing failure data.
Figure 2: Schematic depiction of a Failure-Inspection-Repair process and the decision logic for Scenario B.
The stochastic behavior of the PAM and its impact on the production system are estimated via Discrete Event Simulation (DES).
Details of the PAM are not presented here. However, it is noted that a PAM that faithfully reproduces our understanding of reality may replace alternative approaches such as ACA, RCM, FMEA and RBI.
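As a rough illustration of what estimating stochastic behavior by simulation can look like, here is a minimal event-stepping Monte Carlo sketch for a single bearing. It is not the PAM and not how VirtualWorld works internally. The function names, the Weibull failure model, the calendar-based inspection grid and every numeric default are assumptions made purely for this example.

```python
import math
import random


def simulate_mission(rng, mission=60.0, inspect_every=4.0, pf_interval=6.0,
                     shape=2.0, scale=24.0,
                     planned_down=0.1, unplanned_down=1.0):
    """Simulate one mission for a single bearing; all times in months.

    Every default value is a made-up placeholder, not data from the article.
    Returns (downtime, proactive_repairs, functional_failures).
    """
    t = 0.0                      # time at which the current bearing was installed
    downtime = 0.0
    proactive = failures = 0
    while True:
        # P: the defect becomes detectable; F: functional failure.
        # random.weibullvariate takes (scale, shape) in that order.
        onset = t + rng.weibullvariate(scale, shape)
        fail = onset + pf_interval
        # Inspections run on a fixed calendar grid; find the first one at or after P.
        next_insp = math.ceil(onset / inspect_every) * inspect_every
        if next_insp < fail:
            event, down, detected = next_insp, planned_down, True
        else:
            event, down, detected = fail, unplanned_down, False
        if event >= mission:
            break                # nothing further happens within the mission
        if detected:
            proactive += 1
        else:
            failures += 1
        downtime += min(down, mission - event)   # do not count downtime past the mission end
        t = event + down         # a new bearing starts its life after the repair
    return downtime, proactive, failures


def average_downtime(runs=100, seed=1, **kwargs):
    """Average downtime over several simulated missions (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return sum(simulate_mission(rng, **kwargs)[0] for _ in range(runs)) / runs
```

Calling `average_downtime(inspect_every=4.0)` and `average_downtime(inspect_every=1e9)` (effectively no inspection) gives a feel for how an inspection task changes the simulated outcome. A real model would also need costs, repair resources and the interactions between assets.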
The importance of decision logic for the maintenance strategy
The PAM must faithfully reproduce our understanding of “reality” to be accepted by the reliability organization. An important (and underestimated!) aspect that the PAM must address is “decision logic”. For example, the process shown in Figure 2 does not adequately address the question:
WHEN shall the “proactive repair” task be scheduled?
Figure 2 implicitly indicates that the bearing is immediately replaced upon detection of the incipient failure. This logic will have two undesirable outcomes:
- unplanned bearing replacement will cause unplanned production loss, and
- early replacement prevents utilization of the bearing’s remaining useful life.
In the real world, we apply decision logic to achieve a better outcome. For example, we may decide to closely monitor the bearing condition to utilize the remaining useful life and “limp” to a planned maintenance opportunity.
Hence, “decision logic” must be considered to develop a “realistic” and “optimal” maintenance strategy. Interestingly, it is also an aspect of the maintenance strategy that is traditionally neglected.
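The “limp to a planned opportunity” idea can be written down as a small rule. The sketch below is only one hypothetical way to express it and does not reproduce the logic of Scenarios A, B or C. The names `Decision` and `schedule_proactive_repair`, the safety `margin` and the assumption that an estimated failure time is available are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str          # "repair_now" or "defer_to_outage"
    repair_time: float   # when the repair is carried out


def schedule_proactive_repair(detected_at, est_failure_at, next_outage_at,
                              margin=0.5):
    """Decide WHEN to repair after an incipient failure has been detected.

    All arguments are times in the same unit (e.g. months).
    `margin` is a hypothetical safety buffer kept before the estimated failure.
    """
    latest_safe = est_failure_at - margin
    # Defer only if a planned outage falls between detection and the safe limit.
    if detected_at <= next_outage_at <= latest_safe:
        return Decision("defer_to_outage", next_outage_at)
    # Otherwise fall back to immediate (unplanned) replacement.
    return Decision("repair_now", detected_at)
```

For example, a defect detected at month 10 with an estimated failure at month 16 would be deferred to a planned outage at month 12, but repaired at once if the next outage were not until month 15.8.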
Seeing is believing
Three scenarios implementing increasing levels of decision logic are described in Figures 2 and 3. Each scenario was modeled using a PAM and fictitious data. The results are summarized in Figure 4. Details of the models (e.g., P-F interval, inspection and repair costs, repair times) are not included here.
Figure 3: Schematic depiction of the decision logic for Scenarios A and C.
Figure 4: The results of PAM modeling of the Scenarios A, B and C (100 simulations, 5-year mission duration).
The average costs for each scenario are presented in Table 1. They demonstrate that the PAM has enabled the specific impact of each scenario and each mitigating measure to be determined in terms of the system performance criteria.
Table 1: Average costs for the three scenarios (100 simulations, 5-year mission duration).
Scenario | Lost Revenue Cost ($) | Operating Cost ($) | Total Cost ($) | Total saving |
A | 441 | 743 | 1183 | – |
B | 147 | 261 | 407 | 66 % |
C | 98 | 216 | 314 | 73 % |
Notes to Figure 4 and Table 1:
- Lost revenue cost is the cost of the production capacity lost through system unavailability.
- Operating costs include, in this case, inspection and repair costs.
- Total savings (Table 1) are calculated in reference to Scenario A.
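The savings column can be checked with a few lines of arithmetic. The sketch below uses only the figures from Table 1; the variable and function names are made up for this example. The two cost columns differ from the listed totals by at most $1, which is presumably a rounding effect of averaging, although the article does not say so.

```python
# Average costs in $ copied from Table 1: (lost revenue, operating, total).
table_1 = {
    "A": (441, 743, 1183),
    "B": (147, 261, 407),
    "C": (98, 216, 314),
}

baseline_total = table_1["A"][2]   # Scenario A is the reference case


def saving_percent(total, baseline):
    """Saving relative to the baseline, rounded to a whole percent."""
    return round(100 * (1 - total / baseline))


# Saving of each scenario relative to Scenario A.
savings = {name: saving_percent(row[2], baseline_total)
           for name, row in table_1.items() if name != "A"}

# Gap between (lost revenue + operating) and the listed total, per scenario.
component_gap = {name: abs(lost + operating - total)
                 for name, (lost, operating, total) in table_1.items()}

print(savings)
print(component_gap)
```

The computed savings for Scenarios B and C agree with the 66 % and 73 % listed in the table.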
Summary
So, what have we learned? Let me summarize:
- Established tools for process plant reliability engineering (i.e., ACA, RCM, FMEA, RBI) implement a level of abstraction that limits their relevance to the real world. This in turn prevents the development of a “realistic” and “optimal” maintenance strategy.
- The “Phenomenological Asset Model” (PAM) is designed to faithfully reproduce our understanding of “reality”. The PAM is therefore an appropriate tool for developing realistic and optimized asset strategies.
It is reasonable to assume that your production and maintenance systems are currently not optimized. It is estimated (to date, without supporting evidence) that optimization will enable the total costs to be reduced by approximately 20 %.
RAMS Mentat GmbH has developed an innovative technical and systems engineering approach – and supporting tools – that enables the reliability and safety performance of an entire production system to be optimized with consideration of capital investment, operational and maintenance cost constraints.
Larry George says
Thanks. Nice comparison! How did you come up with the 4-month inspection interval? I ask, because I am still looking for the source of the inspection time formula in IEC 60601-1, https://accendoreliability.com/error-in-inspection-time-interval/#more-462702/.
Andrew Kelleher says
Hello Larry, the data in the above example are purely fictitious. I specified a 4-month inspection interval and simulated the outcome using a model. One could also run a sensitivity analysis to determine the “optimal” inspection interval in terms of overall system performance. Please also see my comment on your linked article.