Article first posted at Conscious Reliability by James Reyes-Picknell, Jesus Sifonte, and team.
Evaluation Criteria for Reliability Centered Maintenance (RCM) Processes
According to the Merriam-Webster dictionary, one accepted definition of standard is: “something established by authority, custom, or general consent as a model or example.” In our case, a standard comprises a document or set of documents providing requirements, specifications, guidelines, or characteristics that can be used consistently to ensure that materials, products, processes, and services are fit for their purpose. International standards exist on quality (ISO 9000), risk (ISO 31000), environment (ISO 14000), energy (ISO 50001), management, and many other fields, providing information and guidance on practices, methods, and processes designed by groups of highly qualified international experts. Most technical professionals rely on international standards to base their practice on trusted, mathematically or scientifically proven methods. Trial and error is no longer acceptable outside the laboratory. However, lessons learned from practice, together with regrettable real-life incidents and accidents, provide knowledge about risks and their mitigation and prevention. Most asset and maintenance management best practices and techniques are standards-driven, meaning they have been carefully defined and established. The SAE JA1011 standard, Evaluation Criteria for Reliability Centered Maintenance (RCM) Processes, has an eventful background, with both disappointments and successes preceding the point at which its principles were conceived and eventually incorporated into an international engineering standard.
By the 1950s, the aviation industry faced challenges in reliability, safety, and cost-effectiveness. Comprehensive time-based maintenance regimes could no longer sustain operations, and the commercial aviation industry was about to undergo a significant crisis. Maintenance and reliability professionals could not find a clear relationship between the preventive maintenance (PM) hours applied and component reliability. Furthermore, some maintainers found that using fewer PM hours at longer intervals improved reliability. Aviation companies must comply with specific maintenance plans to retain their airworthiness certifications. Almost all recommended maintenance tasks consisted of overhauling parts before they reached a useful life expressed in operating hours. The US Federal Aviation Administration’s (FAA) denial of permission for Boeing to manufacture the 747 raised a major alarm in the aviation industry. It was thought that a larger plane with three times the passenger capacity would incur much higher maintenance and operating costs than its predecessors. This design denial, together with a poor safety record of nearly 60 crashes per 1,000,000 take-offs and high operating costs, demanded new perspectives on aircraft design, operation, and maintenance, which led to the creation of RCM.
Efforts to understand non-structural aircraft component failure patterns led Stanley Nowlan and Howard Heap, both from United Airlines, to develop a new approach toward maintenance. They documented their methodology for developing failure consequence management policies in a report published by the U.S. Department of Defense in 1978.
Their process was called Reliability Centered Maintenance (RCM) and was based on a common-sense procedure with a decision diagram for creating maintenance strategies that protect asset functions. RCM is a process to determine what must be done to keep assets doing what their operators want them to do in their current operating context. Since its origins, RCM has been used in many industries and in almost every industrialized country. Many individual interpretations of Nowlan and Heap’s report have appeared, leading to various methods that differ widely from the original process.
The purpose of the standard SAE JA1011, published in 1999, is to set out the criteria that any process must comply with to be called “RCM.” The twelve-page document, revised in August 2009, describes the minimum criteria for a process to be considered an RCM-compliant method. The standard provides the requirements for establishing whether a given process follows the principles of RCM as initially proposed. It can also serve as a guide for organizations seeking RCM training, facilitation, or consulting.
SAE JA1011 (August 2009) establishes that for a process to be acknowledged as RCM, it must follow the seven steps in the order shown below:
- Delineate the operational context and the functions and associated desired standards of performance of the asset (Operational context and functions).
- Determine how an asset can fail to fulfill its functions (functional failures).
- Define the causes of each functional failure (failure modes).
- Describe what happens when each failure occurs (failure effects).
- Classify the consequences of failure (failure consequences).
- Determine what should be performed to predict or prevent each failure (tasks and task intervals).
- Decide if other failure management strategies may be more effective (one-time changes).
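The ordering requirement above can be sketched in a few lines of Python. This is only an illustration: the names `RcmStep`, `follows_required_order`, and `missing_steps` are hypothetical and do not come from SAE JA1011, which defines the steps in prose, not code.

```python
from enum import IntEnum


class RcmStep(IntEnum):
    """The seven steps, numbered in the order the standard requires."""
    OPERATING_CONTEXT_AND_FUNCTIONS = 1
    FUNCTIONAL_FAILURES = 2
    FAILURE_MODES = 3
    FAILURE_EFFECTS = 4
    FAILURE_CONSEQUENCES = 5
    TASKS_AND_TASK_INTERVALS = 6
    ONE_TIME_CHANGES = 7


def missing_steps(steps_performed):
    """Return the steps that a candidate process never performs."""
    performed = set(steps_performed)
    return [step for step in RcmStep if step not in performed]


def follows_required_order(steps_performed):
    """True only if all seven steps are present, once each, in order."""
    return list(steps_performed) == list(RcmStep)


# A hypothetical process that jumps straight from functions to tasks.
shortcut = [RcmStep.OPERATING_CONTEXT_AND_FUNCTIONS,
            RcmStep.TASKS_AND_TASK_INTERVALS]
print(follows_required_order(shortcut))
print([step.name for step in missing_steps(shortcut)])
```

A check like this only confirms that the steps are present and ordered; it says nothing about how well each step was carried out.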
The Operational Context and Functions
The first step in applying RCM to a physical asset is to define its operating context and the functions required of it in that context. The logical starting point for designing a maintenance or failure management strategy (or an asset management policy, as the standard calls it) is a clear understanding of what is being demanded from the asset. This represents a change in perspective for maintainers. Often the maintenance department is not involved in determining why any particular asset is there. If we are to sustain the performance of specific functions, we need to know exactly what the functions are, as well as the operating parameters that define the performance levels needed to fulfill operational demand.
In order to properly define the operating context, the RCM team must describe functions following this structure in accordance with the standard:
- The conditions in which a physical asset or system is anticipated to operate shall be defined, recorded, and available.
- All primary and secondary functions of the asset/system shall be identified.
- All function statements shall contain a verb, an object, and a quantitative performance standard (whenever possible).
- The performance standards used in function statements shall be the level of performance desired by the user of the asset in its current operational context. The design capability should not be used in the function statement.
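As a minimal sketch of the verb, object, and performance standard structure described above, a function statement could be recorded as follows. The class name `FunctionStatement`, its fields, and the pump example are illustrative assumptions, not wording from the standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FunctionStatement:
    verb: str
    obj: str
    # Level of performance the user wants in the current operating
    # context (not the design capability); quantitative where possible.
    performance_standard: Optional[str] = None
    primary: bool = True

    def text(self) -> str:
        """Render the statement as a single sentence fragment."""
        parts = ["To", self.verb, self.obj]
        if self.performance_standard:
            parts.append(self.performance_standard)
        return " ".join(parts)

    def is_quantified(self) -> bool:
        """Rough check: does the performance standard contain a number?"""
        standard = self.performance_standard or ""
        return any(ch.isdigit() for ch in standard)


# Hypothetical example for a transfer pump.
pump = FunctionStatement(
    verb="pump",
    obj="water from tank X to tank Y",
    performance_standard="at not less than 800 litres per minute",
)
print(pump.text())
```

The digit check is deliberately crude. Its only purpose is to flag statements that still lack a measurable performance level.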
A functional failure is defined as “a state in which a physical asset or system is unable to perform a specific function to a desired level of performance”. A thorough understanding of the asset’s functions and their desired performance levels is essential for determining functional failures. Functional failures can be total or partial: the asset may be unable to fulfill a function at all, or it may perform the function below the desired level of performance. The SAE standard asks that all the failed states associated with each function be identified so that we can identify all the relevant failure causes.
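The distinction between total and partial functional failure can be illustrated with a small sketch. It assumes, purely for illustration, that performance is a single number where higher is better and zero means no output at all. The function name and the figures are hypothetical.

```python
def classify_functional_state(actual: float, desired_minimum: float) -> str:
    """Compare delivered performance with the level the user desires."""
    if actual <= 0:
        # The asset cannot perform the function at all.
        return "total functional failure"
    if actual < desired_minimum:
        # The asset works, but below the desired performance level.
        return "partial functional failure"
    return "functioning"


# Hypothetical pump whose user wants at least 800 litres per minute.
for flow in (0.0, 650.0, 820.0):
    print(flow, classify_functional_state(flow, desired_minimum=800.0))
```

Real functions often have upper limits or several performance parameters, so each failed state would need its own comparison.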
A failure mode is a single event that causes a functional failure, and each failure mode usually has one or more causes. We therefore need to brainstorm all the possible events that could impair an asset’s ability to perform each specific function to the desired level of performance. The standard recommends not being too superficial about the level of causation at which failure modes are described. When listing failure modes, consider the following:
- All failure modes reasonably likely to cause each functional failure shall be identified.
- The method used to decide what constitutes a “reasonably likely to occur” failure mode shall be acceptable to the owner or user of the asset. Usually consensus is used to decide which failure modes to analyze and which ones to discard.
- The level of causation for failure modes must be exhaustive enough so appropriate failure management policies can be assigned to manage them.
- Failure modes listed in the analysis must include events that have happened before, failure modes being prevented by the existing PM program, and other events that are likely to occur in the actual operating context but have never happened.
- Human and design errors causing failure events must be included in the failure mode list unless they are being addressed by other analysis methods.
Failure effects describe the “damage” each failure event may cause to the plant or the organization. It is recommended to describe “what happens when the failure mode occurs”. The standard recommends several considerations to help understand how serious each failure cause might be. Failure effects help determine how relevant each failure mode is by taking the following into consideration:
- Is there any evidence that the failure has occurred?
- What potential threat does the failure pose to personnel safety?
- What potential threat does the failure pose to the environment?
- How is production or the operations affected?
- Is there any physical damage caused by the failure?
- Is there anything that must be done to restore the function of the system after the failure?
Failure effects are classified into categories based on evidence of failure and impact on safety, the environment, operational capability, and cost. We should be able to decide which of the four categories applies to the effects of each failure mode. Only one category must be chosen: whichever is most severe. Hidden and evident failure modes must be clearly separated. Failures with safety or environmental impact must be distinguished from those having only economic impact, whether through operational or non-operational consequences. Like every step within the RCM process, failure consequence determination is critical. Maintenance strategies are carefully selected for every critical failure cause through a decision procedure that uses the failure consequence as its starting point.
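The classification described above can be sketched as a short decision function. The order of the questions (hidden first, then safety or environmental, then operational, then non-operational) is an assumption based on common RCM decision diagrams. The names `Consequence`, `FailureEffect`, and `classify_consequence` are hypothetical and are not defined by SAE JA1011.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "safety or environmental"
    OPERATIONAL = "operational"
    NON_OPERATIONAL = "non-operational"


@dataclass
class FailureEffect:
    evident_to_operators: bool
    threatens_safety_or_environment: bool
    affects_operations: bool


def classify_consequence(effect: FailureEffect) -> Consequence:
    """Return exactly one category, asking the most severe questions first."""
    if not effect.evident_to_operators:
        return Consequence.HIDDEN
    if effect.threatens_safety_or_environment:
        return Consequence.SAFETY_ENVIRONMENTAL
    if effect.affects_operations:
        return Consequence.OPERATIONAL
    # Evident failure with only direct repair cost.
    return Consequence.NON_OPERATIONAL


# Hypothetical example: a seized bearing that stops production.
seized_bearing = FailureEffect(
    evident_to_operators=True,
    threatens_safety_or_environment=False,
    affects_operations=True,
)
print(classify_consequence(seized_bearing).value)
```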
Maintenance Strategies Selection
The most likely predominant failure pattern for each identified failure should be considered when recommending any failure management strategy. Failure modes may occur with age or usage, or randomly. They may also occur prematurely or follow a wear-out pattern after some significant operating time. Care must be taken to recommend maintenance tasks based on the actual predominant failure patterns. SAE JA1011 acknowledges five possible maintenance strategies that can be applied to mitigate the consequences of any given failure. They are the following:
- Condition Based Maintenance Tasks – These tasks are intended to detect potential failures. Detection must occur early enough for corrective action to be taken before the function is lost. A condition monitoring task is applied at fixed intervals so that the loss of function can be trended before a functional failure occurs.
- Scheduled Overhaul Tasks – Time-based repair tasks must be carried out based on the useful life of the component, that is, the time at which the component failure rate ceases to be constant. Theoretically, at the end of the useful life the failure rate increases beyond a rate that we can tolerate. Besides the useful life of the item, the cost of the preventive repair also needs to be evaluated: a comparison of the cost of the overhaul against that of the functional failure must confirm the economic viability of the task.
- Scheduled Replacement Tasks – Scheduled discard and replacement tasks are considered when it is demonstrated that replacing the item is more cost-effective than overhauling it. It is recommended to apply such replacement at the end of the so-called “economic” life of the item.
- Failure Finding Tasks – These tasks are intended to detect hidden failures, which are most often associated with protective devices or redundant components. We must ensure that it is physically possible to perform the recommended failure finding task and that the suggested task frequency is acceptable to the owner of the asset.
- Re-Design Tasks – Sometimes appropriate time-based, condition-based, or failure finding tasks cannot be found for a critical failure mode. It may then be imperative to implement modifications (also called “one-time changes”) in order to properly address the failure consequences. Changes to the asset’s physical configuration, to operating or maintenance procedures, to operator or maintainer training, and to the operating context are all possible forms of one-time change or re-design that may be required to mitigate failure consequences.
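The five strategies above can be arranged into a simple selection sketch. The order of the checks, the inputs, and the one-to-one cost comparison are simplifying assumptions made for illustration. They are not the decision logic of SAE JA1011, and the names `Strategy` and `select_strategy` are hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    CONDITION_BASED = "condition based maintenance task"
    SCHEDULED_OVERHAUL = "scheduled overhaul task"
    SCHEDULED_REPLACEMENT = "scheduled replacement task"
    FAILURE_FINDING = "failure finding task"
    REDESIGN = "re-design (one-time change)"


def select_strategy(
    *,
    potential_failure_detectable: bool,
    age_related: bool,
    overhaul_cost: float,
    replacement_cost: float,
    failure_cost: float,
    hidden: bool,
    failure_finding_feasible: bool,
) -> Strategy:
    # 1. Prefer detecting a potential failure early enough to act on it.
    if potential_failure_detectable:
        return Strategy.CONDITION_BASED
    # 2. Time-based tasks suit age-related failure patterns, and only
    #    when the task costs less than the failure it prevents.
    if age_related and min(overhaul_cost, replacement_cost) < failure_cost:
        if replacement_cost < overhaul_cost:
            return Strategy.SCHEDULED_REPLACEMENT
        return Strategy.SCHEDULED_OVERHAUL
    # 3. Hidden failures can be checked for periodically.
    if hidden and failure_finding_feasible:
        return Strategy.FAILURE_FINDING
    # 4. No suitable task was found, so consider a one-time change.
    return Strategy.REDESIGN


# Hypothetical wear-out item that is cheaper to overhaul than to replace.
choice = select_strategy(
    potential_failure_detectable=False,
    age_related=True,
    overhaul_cost=2_000.0,
    replacement_cost=5_000.0,
    failure_cost=40_000.0,
    hidden=False,
    failure_finding_feasible=False,
)
print(choice.value)
```

A real analysis would also weigh how often the task is performed against how often the failure would occur, as well as safety and environmental tolerability. This sketch leaves those factors out.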
When formulating maintenance tasks, appropriate frequencies must be assigned for them to address failure effects effectively. Mathematical and statistical formulas are sometimes used to support the task interval decision. In such cases, the SAE JA1011 standard recommends that the math used be acceptable to the item’s owner. Care must also be taken when recommending new maintenance tasks for assets, since the RCM process cannot, by any means, supersede existing laws, regulations, or contractual obligations. Thus, it is wise to have a knowledgeable internal auditor evaluate and accept the recommendations made as part of the RCM process.
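As an illustration of the kind of math that may support an interval decision, one approximation widely quoted in the general RCM literature for a single protective device sets the failure finding interval to twice the tolerable unavailability multiplied by the device’s mean time between failures. This formula is not taken from SAE JA1011, it holds only for small unavailability values, and the function name and figures below are hypothetical.

```python
def failure_finding_interval(tolerable_unavailability: float,
                             mtbf: float) -> float:
    """Approximate interval: 2 x tolerable unavailability x MTBF.

    The result is in the same time unit as `mtbf`. This is a linear
    approximation for a single protective device (an assumption of
    this sketch, not a requirement of the standard).
    """
    if not 0.0 < tolerable_unavailability < 1.0:
        raise ValueError("unavailability must be between 0 and 1")
    if mtbf <= 0.0:
        raise ValueError("mtbf must be positive")
    return 2.0 * tolerable_unavailability * mtbf


# Hypothetical device: owner tolerates 1% unavailability, MTBF of 8 years.
interval_years = failure_finding_interval(0.01, 8.0)
print(f"{interval_years * 12:.1f} months")
```

Whatever formula is used, the owner of the asset still has to agree that both the inputs and the method are acceptable.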
The RCM standard SAE JA1011 is direct and concise in setting out the criteria for identifying analysis processes that do not comply with the standard. It is particularly useful to people who wish to procure RCM services (training, analysis, facilitation, consulting, etc.). Successful RCM implementation requires trained multidisciplinary groups applying the process under the guidance of a certified facilitator who has mastered its execution. The main deliverables of RCM include optimized periodic maintenance tasks, redesigned maintenance and operating procedures, redesigned machine components, etc. Successful implementations report significant reductions in applied PM man-hours, improved safety performance, and enhanced asset reliability and availability, resulting in improved financial performance. When applied correctly, the aim of an RCM effort is to protect asset functions by reducing the risk or effects of failures to levels acceptable to the asset’s owner.