To help select which work orders to do first when resources are short, many CMMS provide calculations for maintenance work order priority. Deciding maintenance work priority is a risk decision. The presence of risk totally changes the way to allocate maintenance job priority if you want to compare situations equally [1]. When you work with risk you cannot use a linear priority scale. Using linear priority ranking gives the wrong order of importance for doing maintenance work.
Keywords: maintenance work priority, work order priority ranking, job priority matrix
When you must decide between maintenance jobs, how do you choose which to do ahead of others? Many Computerised Maintenance Management Systems (CMMS) let you use asset priority and/or work priority to help schedule maintenance work orders. Selecting when to do maintenance work, where the consequence of being wrong is an operational failure, perhaps even death in severe circumstances, involves a risk scenario. Deciding which maintenance work to do is a risk-based decision, as the choice may lead to the failure of equipment that would not have failed but for the decision to do the work at a particular time and not some other. Maintenance priority depends on the size of the risk of being wrong in waiting, since the wrong choice can cost the business a fortune.
In prioritising maintenance work you balance two factors: the total business-wide consequences of failure, and when to do the work. (Whether the work is effective in preventing failure is a separate issue.) It seems sensible that as the consequences of failure worsen it becomes more important to make sure that a failure does not happen. If you continue this thinking, you would prioritise and focus on doing high consequence work first and less important work later. This is the ‘gut feel’ approach we all instinctively use. But consequence alone is not enough; for setting maintenance priorities you need a measure of the risk.
The standard risk equation is:

Risk ($/yr) = Likelihood (/yr) x Consequence ($)    (Eq. 1)

As a log-log equation it is:

Log Risk = Log Likelihood + Log Consequence    (Eq. 2)
An event with a consequence of $10,000 every time it happens that occurs 10 times a year will cost the organisation $100,000 per year. Another event that costs $100,000 and happens once a year costs the organisation $100,000 per year. These two situations are of equal risk but our perceptions of them are vastly different. We would do everything possible to stop a $100,000 single event and do little to stop a $10,000 event. Yet the organisation loses just as much money over a year from both. If each were to be ‘gut feel’ prioritised the single $100,000 event would get a higher priority than the $10,000 event.
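The arithmetic in this example can be checked with a short sketch of Equations 1 and 2. The function names are illustrative only and are not taken from any CMMS; the dollar figures are the ones used above.

```python
import math

def annual_risk(likelihood_per_year, consequence_dollars):
    """Equation 1: risk ($/yr) = likelihood (/yr) x consequence ($)."""
    return likelihood_per_year * consequence_dollars

def log_risk(likelihood_per_year, consequence_dollars):
    """Equation 2: log10 risk = log10 likelihood + log10 consequence."""
    return math.log10(likelihood_per_year) + math.log10(consequence_dollars)

# Ten $10,000 events a year versus one $100,000 event a year.
frequent_small = annual_risk(10, 10_000)
rare_large = annual_risk(1, 100_000)

print(frequent_small, rare_large)
print(log_risk(10, 10_000), log_risk(1, 100_000))
```

Both calls to `annual_risk` give the same annual cost, and both log10 sums come out equal, which is the sense in which the two situations carry equal risk.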
If you have two maintenance work orders to do; one a preventive maintenance job (PM) that prevents a $10,000 event from occurring ten times a year, and one to fix a once-a-year $100,000 breakdown after it happened, which one is the more important to do? Anyone using a linear priority scale from 1 to 5, with 5 being the highest priority, would likely give 5 to repair the $100,000 breakdown and a 1 to the $10,000 preventive maintenance job. They may even cancel the PM and divert the manpower and effort to the $100,000 urgent work. The priority scale seems to justify doing the breakdown repair ahead of preventive work as a sensible thing to do.
But each scenario involves business risk and must be ranked by its risk priority and not a linear priority. Risk ranking is vastly different to work prioritised by linear scale. With risk based priority ranking you develop a risk table using a log10-log10 scale. Equation 2 is the risk equation as a log10 formula. To find the log10 value for the risk we add together the log10 numbers for Likelihood and Consequence. The log10 of 10 is 1 (10^1), of 100 it is 2 (10^2), of 1,000 it is 3 (10^3), and so on. Notice that the log10 value is the same number as the exponent, which allows us to simplify notation to 1, 2, 3, etc. If we always use log10 we know that each number is ten times different to its neighbours. The values in the cells of Table 1 are calculated with Equation 1 and are the annual cost of carrying a level of risk. In Table 2, consequence and likelihood each change tenfold in value with each number, so that 2 is ten times 1, 3 is ten times 2, 5 is ten thousand times 1, and so on [2].
The cell values in Table 2 sum log10 likelihood and log10 consequence and also represent the risk. They correspond to the scale of impact. Though not the actual log10 values from using Equation 2, they still signify multiples of ten. The table ranks business risk importance. The same numbers in cells represent the same amount of risk. The colours represent the levels of risk. This approach is standard risk management methodology and is commonly used in industry to determine occupational health and safety risk.
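A minimal sketch of how such a table could be generated is given here, assuming a 5 x 5 matrix in which both likelihood and consequence are rated 1 to 5 and each step is a factor of ten. The matrix size, the row ordering and the function name are assumptions for illustration and do not reproduce the article's tables.

```python
def build_risk_matrix(levels=5):
    """Build a risk matrix where each cell is likelihood rating + consequence rating.

    Rows run from the highest likelihood rating down to the lowest;
    columns run from the lowest consequence rating up to the highest.
    """
    return [
        [likelihood + consequence for consequence in range(1, levels + 1)]
        for likelihood in range(levels, 0, -1)
    ]

matrix = build_risk_matrix()
for row in matrix:
    print(" ".join(f"{cell:2d}" for cell in row))
```

Cells holding the same number lie along a diagonal of the printed matrix, which is why a frequent small loss and a rare large loss can land on the same risk rating.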
When the method is used for maintenance work priority the consequences remain the same but we need to find words that represent the likelihood of failure if the work is not done. Maintenance work can be broadly grouped into PM work done to a schedule based on usage and/or time, and work requested that reflects changing fortunes and problems in an operation.
Table 3 shows an attempt to use words to describe the likelihood impact of maintenance work. If the item is already failed it is a breakdown. But it does not mean you do a breakdown job if there is higher priority work. A risk table makes it clear which maintenance work is more important for the business and so you are less inclined to simply respond to the most insistent person, or to misunderstand what risk a job really is and where you ought to put your priorities. It is vital to remember that the risk scale is in multiples of ten. Each number is ten times the impact of its neighbours. A maintenance job with a risk priority of 8 is not twice the importance of one given a priority of 4, which is what a linear scale implies (on a linear scale 4 is half of 8); it is 10,000 times more risky (10,000 x 10,000 = 100,000,000, i.e. 10^4 x 10^4 = 10^8, and in log10 terms 4 + 4 = 8). Understanding work risk priority ranking is vital if you want to do maintenance that pays back the greatest value to your business.
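The gap between two ratings on a log10 scale can be checked in a few lines. This is a sketch only; the function name is illustrative.

```python
def risk_ratio(priority_a, priority_b):
    """Return how many times riskier priority_a is than priority_b on a log10 scale."""
    return 10 ** (priority_a - priority_b)

print(risk_ratio(8, 4))  # 10000
print(8 / 4)             # 2.0, the ratio a linear reading of the scale would suggest
```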
You can even go a step further and advise people how you want them to behave in response to the risk by including the response to take, as shown in Table 4. Going back to the two maintenance work order scenarios—a $100,000 breakdown and a PM to stop a $10,000 failure—we can now rate them using the risk ranking for work priority. The breakdown consequence is major and the likelihood is a certain failure, which gives a priority of 9. The PM job prevents a failure ten times a year, which would be a certain $100,000 a year lost—the PM job is a 9 as well. Both jobs are equal in priority to the business and both need to be done urgently.
A linear priority would have given the breakdown a maximum value and the PM job could very likely have been cancelled. But once the work is treated as a business risk the linear scale proves to be nonsense. This is the trap when using a linear priority scale for scheduling maintenance—the wrong jobs get done. Scheduling maintenance work is not the same situation as scheduling a list of tasks in a diary—it is not time management. Where in a diary we can list tasks by numeric order of importance and do them in that order, we cannot do so with maintenance because we are dealing with risk, and risk must be treated as orders of magnitude and not linear numbers.
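The difference between the two orderings can be sketched in code. The example below uses the two work orders from the scenario and ranks them by annual risk from Equation 1 instead of by table rating; the `WorkOrder` class, its field names and the 1 to 5 'gut feel' scores are assumptions made for illustration, not part of any CMMS.

```python
from dataclasses import dataclass

@dataclass
class WorkOrder:
    name: str
    events_per_year: float   # likelihood if the work is not done
    cost_per_event: float    # consequence in dollars
    linear_priority: int     # hypothetical 1-5 'gut feel' score

    @property
    def annual_risk(self):
        # Equation 1: risk ($/yr) = likelihood (/yr) x consequence ($)
        return self.events_per_year * self.cost_per_event

orders = [
    WorkOrder("PM preventing $10,000 failures", 10, 10_000, 1),
    WorkOrder("Repair $100,000 breakdown", 1, 100_000, 5),
]

by_linear = sorted(orders, key=lambda wo: wo.linear_priority, reverse=True)
by_risk = sorted(orders, key=lambda wo: wo.annual_risk, reverse=True)

for wo in by_linear:
    print("linear:", wo.linear_priority, wo.name)
for wo in by_risk:
    print("risk:  ", wo.annual_risk, wo.name)
```

Sorting by the linear score pushes the PM to the bottom of the list, while sorting by annual risk shows the two jobs tied.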
Often people use a simpler scale of A-B-C to represent respectively High-Moderate-Low risk. An ABC risk scale is overlaid on the risk matrix in Table 3. It is simple to use, but it is too simple to be of use for setting maintenance priorities of work orders. Only ‘A’ and ‘B’ are likely to get scheduled (and even those will be in the wrong order due to the linear nature of using an ABC scale). The ‘C’ work orders wait for resources until, often, the equipment fails and the job becomes an ‘A’ priority (and of course then it gets done). The other deception in ABC work priority ranking is that in reality ‘A’ level risks do not actually happen often, yet there will be a disproportionate number of work orders ranked ‘A’. It causes limited resources to be used ineffectively, with jobs being done earlier than they should have been. With the cell numbering of Table 4 there are 10 priority levels to quickly differentiate the importance of a maintenance job by orders of magnitude. You do the highest numbers first because that is where the really big money is for the business. All cells with the same value carry equal risk and, apart from convenience, it does not matter in which order you do work of the same business risk rating.
Some risk professionals tell you not to assume the worst possible thing that could happen, but rather to assume what could reasonably be expected to go bad in the circumstances, i.e. a pessimistic assumption but not the absolute worst credible one. I take the very worst possible, as that is why calamities like Flixborough in the UK, Bhopal in India, the BP Texas Refinery explosion, the sinking of the Titanic, the Longford Gas Plant explosion in Australia, Piper Alpha in the North Sea, the Challenger Space Shuttle disaster and far too many others happened; someone said they were not credible events and did nothing about it. Catastrophic risk doesn’t work like that: massive risks arise more often than by chance unless you prevent them [3]. What you want to do is to encourage people to be proactive and look for things to go wrong. Using a risk matrix to rate maintenance work helps people see wasted profits and possible disasters and justify action to stop them from happening.
Prioritising maintenance work with a matrix like Table 4 highlights the great importance of doing scheduled PM work. Many times, scheduled work will be delayed when resources are not available because of apparently higher priority work. (Often PMs are mistakenly cancelled to wait for the next time they come due.) But this is crazy because scheduled work is there to prevent a failure. If a scheduled predictive maintenance (PdM) job, a condition monitoring (CM) job, or a preventive maintenance (PM) job is not done when due, you increase the likelihood of breakdown. Delay doing those jobs long enough and you guarantee failure. Doing PM and CM work is the first principle of maintenance management because you proactively keep your equipment healthy.
The priority table warns us about one more important maintenance management principle—a maintenance group cannot do both urgent work and important work at the same time. In Table 4 the urgent work is shown separate to important work that is not urgent. The group responsible for urgent work focuses on getting good at reactive maintenance done to high reliability standards. The group focused on important work gets good at doing high quality work to create high reliability (so that there will be no urgent work in future). Each group needs a different mentality that cannot be shared within one group of people—reactive work will always win and kill reliability growth work unless you separate the two.
Things Not to Do in Maintenance Work Priority Rating
Asset Priority and Asset Criticality do not necessarily mean the same thing. Asset Criticality is the risk value calculated from the risk equation. Asset Priority is the order of importance of the asset to the business. It can be a risk value or some other scale, be it numeric like 1, 2, 3, 4, 5, or descriptive such as low, medium, high, extreme. Similarly, Job Priority and Job Criticality do not necessarily mean the same thing. Job Criticality also is the risk value from the risk equation (we used it above to set Work Priority), whereas Job Priority is a numeric or descriptive order.
In Figure 1 the values for Asset Criticality and Job Criticality rating come from their respective risk matrices and are mistakenly used as the axes in a second matrix for selecting the work order priority. This arrangement adds two risk values together but the final value does not represent the true risk in the situation it is meant to represent. Each cell is not ten times the value of the one above it or to the right of it. The approach produces a priority order but it does not correctly reflect the real risk.
The work priority equations used in some CMMS can lead to skewed priority. In Figure 2 the priority equation is 2 x Job Priority + Asset Priority. By using the word ‘priority’ it causes confusion as to whether to use linear or risk scales.
If risk scales are used then, like Figure 1, it adds two risks together to create a scale not related to the risk. Furthermore, doubling the job priority value causes the situational circumstance of a job to have more value than the importance of an asset. You see the skewed effect because values diagonally down to the right in a standard risk matrix are now far to the right. It makes getting the job done more important than what is best for the business. If the axes were linear then the resultant priority is also linear and does not reflect risk.
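The skew from doubling the job value can be seen in a small sketch. The formula is the one quoted for Figure 2; the 1 to 5 ratings and the three example pairs are hypothetical, chosen so that each pair has the same unweighted sum and would therefore sit on the same diagonal of a standard matrix.

```python
def weighted_priority(job_priority, asset_priority):
    """Formula quoted for Figure 2: 2 x Job Priority + Asset Priority."""
    return 2 * job_priority + asset_priority

def unweighted_sum(job_priority, asset_priority):
    """Plain sum of the two ratings, as along a diagonal of a standard matrix."""
    return job_priority + asset_priority

# Hypothetical (job, asset) ratings that all share the same unweighted sum.
pairs = [(1, 5), (3, 3), (5, 1)]

for job, asset in pairs:
    print(job, asset, unweighted_sum(job, asset), weighted_priority(job, asset))
```

The three pairs share one unweighted sum but receive three different weighted values, with the result rising as the job rating rises regardless of the asset rating. This illustrates only the skew; as the text notes, neither sum is a true measure of the risk.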
Another trap is shown in Figure 3, where the intent was to scale Maintenance Work Order Priority downward from a highest value of 100. This was done by weighting the values to make them fit the required scale. The problem is that you can’t scale risk values as you wish and think the result reflects what the risk actually is. You may achieve the aim of getting a particular scale, but the numbers do not reflect real risk. Work prioritised by this scale will not have the necessary importance to people. A job with a priority of 20 does not carry one fifth of the risk of one rated 100. In a correct risk matrix, a job priority of 10 (5 + 5) is 100 million times riskier than a priority of 2 (1 + 1) (100 x 100,000,000 = 10,000,000,000, or in log10 terms 2 + 8 = 10). This 100-point scale confuses people into thinking that a job with a 70 value is not much more important than one with a 50 value and so it can wait to be done because the numbers are not that different. In a normal log10 risk table there would be a 100-fold difference in risk.
Conclusion
When you decide to prioritise your maintenance work orders you are taking a risk decision. It is a choice that has great business consequence. Scheduling maintenance work requires understanding that a small failure can lead to a disaster and that the risk is not linear. To protect against gross scheduling errors, make maintenance decisions on a risk based priority matrix, and ensure that your CMMS uses a real risk calculation and not a convenient way to get a nice scale that misleads your selection.
Mike Sondalini