The term “reliability improvement journey” is well established in the chemical process industry. Figure 1 shows the decade-long, tortuous journey of one company in terms of operational availability (i.e., production) and relative maintenance cost.
The length of a company’s reliability journey reflects the maturity of the “reliability culture”. Here, the term “reliability culture” may be described technically as “the extent to which each decision vector aligns with the company’s target vector”.
It logically follows that the reliability journey may be significantly shortened simply by improving the quality of each decision made by the reliability organization. But how?
Four reasons for a long and arduous reliability journey are presented below. It is intended that these reasons prompt you to critically rethink how you approach the reliability engineering problem in your plant.
Reason 1 – An important role in your reliability organization is vacant.
The production system is, as the term implies, a “system”. It comprises assets, process units, operational logic, storage tanks, supply chains, failure mechanisms, maintenance processes and more.
This calls for a “systems engineering approach” to the reliability problem, which in turn requires the appointment of a “Systems Reliability Engineer” (SRE). This is an engineering discipline of its own, which – to my knowledge – is not explicitly taught for application in the unique context of a chemical production plant.
As depicted schematically in Figure 2, the role of the SRE is to align and direct the improvement efforts of the reliability organization. That is, to ensure that the organization is working on the right topics and to ensure that each decision vector aligns with the company’s target vector.
Precisely how the SRE accomplishes these tasks of direction and alignment is largely outlined in Reasons 2 to 4, described below.
Reason 2 – Your targets are poorly defined.
The performance of your production system is described in multiple dimensions (e.g., production volume and maintenance cost) and varies from year to year according to a probability distribution that you have probably not characterized. Further, the achieved performance in a given year may be largely determined by events that are outside of your control. It is likely that this reality is not adequately accounted for in your target-setting process or in your reliability improvement plan.
The adoption of a systems reliability engineering approach requires that the current stochastic performance of the production system be estimated as a basis for target-setting; see Figure 3. The “target vector” is defined as the gap between the current performance and the target performance, and it is the basis for aligning the efforts of the reliability organization.
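The idea of a target vector, and of a decision that aligns with it, can be made concrete with a minimal sketch. The article does not specify how alignment is measured; the sketch below assumes cosine similarity as one plausible choice, and all names and figures (`target_vector`, `alignment`, the performance values) are hypothetical. It also assumes that both dimensions are expressed on comparable scales, which in practice would call for some form of normalization.

```python
import math

def target_vector(current, target):
    """Gap between target and current performance, per dimension."""
    return [t - c for c, t in zip(current, target)]

def alignment(decision, target_gap):
    """Cosine similarity (an assumed metric, not taken from the article):
    1.0 = fully aligned, 0.0 = orthogonal, -1.0 = directly opposed."""
    dot = sum(d * g for d, g in zip(decision, target_gap))
    norm = math.hypot(*decision) * math.hypot(*target_gap)
    if norm == 0.0:
        raise ValueError("alignment is undefined for a zero-length vector")
    return dot / norm

# Hypothetical figures: (availability in %, relative maintenance cost in %).
current = (90.0, 100.0)
target = (95.0, 90.0)
gap = target_vector(current, target)  # +5 availability, -10 cost

# Expected effect of two hypothetical decisions on the same two dimensions.
decision_a = (2.0, -4.0)  # raises availability and lowers cost
decision_b = (2.0, 3.0)   # raises availability but also raises cost

score_a = alignment(decision_a, gap)
score_b = alignment(decision_b, gap)
print(f"decision A alignment: {score_a:.2f}")
print(f"decision B alignment: {score_b:.2f}")
```

Under these assumptions, decision A points in the same direction as the target vector, whereas decision B scores negatively because its cost increase works against the target.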
Figure 3 demonstrates that targets in stochastic systems are best specified in terms of two parameters, namely FAIL and TARGET criteria. This practice enables the required performance improvement to be visualized and quantified.
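As a rough illustration of how two criteria can quantify a stochastic target, the sketch below estimates how often annual performance falls below a FAIL threshold and how often it reaches a TARGET threshold. This interpretation of the two criteria is an assumption, since the article does not define them precisely, and the availability figures and threshold values are invented for illustration.

```python
def fraction_below(samples, threshold):
    """Share of years in which performance fell below the threshold."""
    return sum(1 for s in samples if s < threshold) / len(samples)

def fraction_at_or_above(samples, threshold):
    """Share of years in which performance reached the threshold."""
    return sum(1 for s in samples if s >= threshold) / len(samples)

# Hypothetical annual availabilities in %, e.g. from plant history or a simulation.
annual_availability = [91.2, 93.5, 88.7, 94.1, 92.0, 89.9, 95.3, 90.4, 93.0, 87.5]

FAIL = 90.0    # assumed: a year below this value counts as a failed year
TARGET = 94.0  # assumed: a year at or above this value meets the target

p_fail = fraction_below(annual_availability, FAIL)
p_target = fraction_at_or_above(annual_availability, TARGET)
print(f"P(year below FAIL)   = {p_fail:.2f}")
print(f"P(year meets TARGET) = {p_target:.2f}")
```

Tracking both probabilities, rather than a single average, shows whether an improvement measure reduces the risk of bad years, increases the chance of good years, or both.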
Reason 3 – Your strategy to reduce “waste” is incomplete.
A typical reliability improvement plan consists almost solely of methods that focus on reducing “waste”, that is, hazards that may lead to a production loss. These methods can be characterized as either proactive or reactive in nature, as shown in Table 1.
Table 1: Examples of proactive and reactive reliability improvement methods.
| Proactive methods | Reactive methods |
|---|---|
| Failure Modes and Effects Analysis (FMEA) | Root Cause Analysis |
| Reliability-Centered Maintenance (RCM) | Defect Elimination |
| Risk-Based Inspection | Bad Actor Program |
In the absence of an overarching systems reliability approach, a reliability improvement plan that focuses solely on reducing “waste” is likely to result in a long, arduous reliability journey, for the following reasons:
- The proactive methods tend to be largely theoretical exercises with no strong coupling to the system performance vector(s). It is therefore practically not possible to reach an “optimum” solution. That is, it is not possible to align the decision vector with the target vector.
- The reactive methods target a subset of the possible future system hazards which, once alleviated, will be quickly replaced by newly recognized hazards. This is a characteristic of the complex stochastic production system. Hence, the extent to which the anticipated gains will be achieved in practice may be highly uncertain. Further, experience has shown that significant knowledge and experience may be required to develop robust and economically viable solutions. The extent to which organizations have access to the required resources (technical, financial and time) is highly variable.
A systems reliability engineering approach will additionally consider the application of capacity “growth” strategies, such as debottlenecking and expansion projects. These types of improvement measures are usually able to be tightly coupled to system performance targets and are certainly able to be planned with a higher degree of confidence.
The task of the SRE is to ensure that company resources are wisely invested. This may be done by quantifying the impact of each improvement measure in terms of stochastic system performance.
Reason 4 – You are using the wrong tool for the job.
Whilst most reliability literature is concerned with “product” reliability engineering, the described methods (e.g., Weibull analysis and FMEA) find relatively little application in a process plant environment. At first glance, the reason for this would seem to be the ratio of (many) assets to (few) engineers. However, the real reason is much more interesting: the traditional methods were developed for application in “simple” and “complicated” systems, whereas a process plant is a “complex” system.
The response to this situation has been to trivialize the complex system behavior, for example in the form of a risk matrix. This approach, however, prohibits the realization of optimal outcomes. An alternative response would be to apply methods suited for application in complex systems. For example, simulation is absolutely necessary to make optimal decisions in complex systems.
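To indicate what simulation means in this context, the following is a deliberately small Monte Carlo sketch, not the model behind the article's figures. It assumes a plant of three units in series, a daily time step, and constant daily failure and repair probabilities derived from hypothetical MTBF and MTTR values. A real plant model would also include buffers, operational logic and maintenance processes.

```python
import random
import statistics

def simulate_year(units, rng, days=365):
    """Return plant availability for one simulated year.

    units: list of (mtbf_days, mttr_days) tuples. The plant is assumed to
    produce only on days when every unit is up (series configuration).
    Every year starts with all units up, which is a simplification.
    """
    up = [True] * len(units)
    productive_days = 0
    for _ in range(days):
        for i, (mtbf, mttr) in enumerate(units):
            if up[i]:
                # A running unit fails on a given day with probability 1/MTBF.
                if rng.random() < 1.0 / mtbf:
                    up[i] = False
            else:
                # A failed unit is restored on a given day with probability 1/MTTR.
                if rng.random() < 1.0 / mttr:
                    up[i] = True
        if all(up):
            productive_days += 1
    return productive_days / days

def simulate(units, years=500, seed=1):
    """Simulate many independent years with a seeded generator for repeatability."""
    rng = random.Random(seed)
    return [simulate_year(units, rng) for _ in range(years)]

# Hypothetical units: (mean days between failures, mean days to repair).
units = [(120.0, 3.0), (200.0, 5.0), (60.0, 1.0)]
results = simulate(units)

mean_availability = statistics.mean(results)
p10 = statistics.quantiles(results, n=10)[0]  # 10th percentile of annual availability
print(f"mean availability: {mean_availability:.3f}")
print(f"10th percentile:   {p10:.3f}")
```

The output is a distribution of annual availability rather than a single number. Rerunning the same model with modified unit parameters gives a like-for-like comparison of a proposed improvement measure against the baseline.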
The results of a high-level simulation of a process plant, representing the current system performance, are presented in Figure 4.
The developed model also provides a basis for evaluating the merits of proposed measures for improving production system performance. You decide where you are headed: promotion, demotion or mediocrity!
A technical, systems engineering approach to the process plant reliability engineering problem is neither well described in the literature nor well supported by appropriate tools in practice.
RAMS Mentat GmbH has developed an innovative technical and systems engineering approach – and supporting tool – that enables the reliability and safety performance of an entire production system to be optimized with consideration of capital investment, operational and maintenance cost constraints.
One more good reason to rethink how you approach your reliability improvement journey!
“Reliability – How Industry Leaders Take Advantage of this Often-Overlooked Improvement Opportunity,” Solomon Associates, 31 May 2021. [Online]. Source: https://www.solomoninsight.com/blog/reliability-how-industry-leaders-take-advantage-of-this-often-overlooked-improvement-opportunity.