Safety is often directly correlated with system or component reliability. Is that really the case?
Find out what the experts and their data really say.
New Assumptions About Safety
- “Safety is increased by increasing system or component reliability. If components or systems do not fail, then accidents will not occur. This assumption is one of the most pervasive in engineering and other fields. The problem is that it is not true…”
- “High reliability is neither necessary nor sufficient for safety.”
These statements were excerpted from Nancy Leveson’s “Engineering a Safer World”.
These statements contradict the common belief in Reliability that there is a direct correlation between Safety and Reliability. I personally, being in the Reliability field for 35+ years, believe there is a correlation between Reliability and Safety. But I would assert that it is not a direct correlation.
This is because an operation can be highly reliable and still be unsafe, and an operation can be safe and still be unreliable.
But I firmly believe (and have experienced) that a reliable operation is inherently safer than an unreliable one. In a reliable operation, there are fewer stops, starts and unexpected situations that deviate from the control systems in place, so it stands to reason that there are fewer occasions when someone must quickly correct a deviation from a standard.
I believe the Safety world has an inaccurate current-day view of ‘RCA’ in general and therefore treats all RCA as a commodity, equivalent to the limited capabilities of the 5-Whys (which is linear and identifies a single root cause). I believe how well we truly solve failures (losses resulting from deviations from an acceptable standard) has a direct impact on the safety of our workforce.
Contrary to popular belief, true RCA does NOT stop at blaming someone (based on a decision resulting in a bad outcome); it goes on to understand the reasoning behind their decision (their intent) at the time. Delving into a person’s intent will often uncover flawed organizational systems, restraining paradigms, cultural norms and other socio-technical influences.
What Safety & Reliability Experts Think
We also strongly believe that when we experience unexpected conditions (upsets), we test the boundaries of our safety controls and are at higher risk of experiencing a safety incident. Steady state (reliable) operations are typically less prone to such elevated risks.
The only formal paper I have seen thus far that is based on studies conducted at actual, specific plant operations over a designated period of time is Ron Moore’s article, ‘A Reliable Plant is a Safe Plant, is a Cost Effective Plant’. The focus of the studies it describes is to draw the links between Reliability, Safety and Costs.
Here are a Few of Ron’s Conclusions:
This data tends to support that a correlation exists, but not necessarily a direct correlation. It does not fully support causation. I would like to thank Ron Moore for allowing me to post his position on this very important topic.
Is Safety a System or Component Property?
Assumption 1: “Safety is increased by increasing system or component reliability. If components or systems do not fail, then accidents will not occur.”
This assumption is one of the most pervasive in engineering and other fields. The problem is that it is not true. Safety is a system property, not a component property, and must be controlled at the system level, not the component level (N. Leveson – Engineering a Safer World).
Ron Moore says “this appears to be an incorrect interpretation or characterization of the data. My data says that safety is improved by improving system reliability (and by inference component reliability). If you reduce the failures, both component and system level, you reduce the exposure to the risk of injury and therefore the probability of injury. However, I agree that it does not mean that accidents will not occur, since accidents are caused by any number of variables, some of which are not controlled by reliability excellence. I also agree that safety is a system property, not a component property, and must be controlled at the system level”.
Ron goes on to say, “In my view, one of the best, if not the best, measure for reliability is OEE/AU, a system level measure. Reliability isn’t just about maintenance, but her statements/assumptions seem to imply that it is. Indeed, my data says that maintenance typically only controls some 10% of the loss of production capacity captured in the OEE measure. Moreover, reliability is driven by our practices in design, procurement, stores, installation and startup, operation and maintenance, all of which contribute positively or negatively to system level reliability (not just equipment or components). Reducing the number of defects in these practices, both within each function and cooperatively as a team, will improve reliability and reduce the risk of injury, while reducing costs and environmental incidents”.
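To make the OEE measure mentioned above concrete, here is a minimal sketch. It assumes the commonly used definition of OEE as availability × performance × quality; the quoted text does not spell out a formula, so treat this as an illustration rather than Ron Moore's exact method. The `ShiftRecord` class, its field names and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShiftRecord:
    """Hypothetical production figures for one period (illustrative only)."""
    planned_time_h: float    # scheduled production time, in hours
    run_time_h: float        # time the process actually ran, in hours
    ideal_rate_per_h: float  # units per hour at the design rate
    total_units: int         # everything produced, good or bad
    good_units: int          # units that met the quality standard


def oee(rec: ShiftRecord) -> dict:
    """Return the three OEE factors and their product."""
    availability = rec.run_time_h / rec.planned_time_h
    performance = rec.total_units / (rec.run_time_h * rec.ideal_rate_per_h)
    quality = rec.good_units / rec.total_units
    return {
        "availability": availability,
        "performance": performance,
        "quality": quality,
        "oee": availability * performance * quality,
    }


# Hypothetical shift: 8 h planned, 6 h running, 480 units made, 456 good.
record = ShiftRecord(
    planned_time_h=8.0,
    run_time_h=6.0,
    ideal_rate_per_h=100.0,
    total_units=480,
    good_units=456,
)
result = oee(record)
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

For this made-up shift the factors are 75%, 80% and 95%, which multiply to an OEE of 57%. Because every loss lands in one of the three factors regardless of which function caused it, the measure is a system-level one, which is the point of the quote above.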
New Assumption 1: “High reliability is neither necessary nor sufficient for safety.”
I think this is a really bad assumption, even a risky one. It’s perplexing why anyone would say this. Why wouldn’t you want high reliability, particularly if it reduces risk: risk of injury, risk of high costs, and risk of environmental incidents? This assumption may depend on Dr. Leveson’s definition or view of reliability as being driven by maintenance. Reliability should not be driven by maintenance. Maintenance is a support function to the overall plant and production process.
Ron Moore’s data demonstrates that manufacturing businesses can improve safety without a commensurate improvement in reliability. However, they reach a point where additional improvements in safety do not appear to be achievable, because the system has reached a statistically stable state. For example, you can improve safety through improved personal behavior: wear your PPE, do your lock-out/tag-out properly, etc. However, once you do this exceptionally well, you have to reduce the exposure to the risk of injury; that is, you have to improve process reliability (not just equipment) to achieve further gains.
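The exposure argument above can be illustrated with a deliberately simple model. This sketch assumes that each upset carries the same small, independent chance of injury; that assumption is mine, not something taken from Ron Moore's data, and every number below is hypothetical. Better personal behavior lowers the per-upset chance, while better process reliability lowers the number of upsets.

```python
def injury_probability(upsets_per_year: int, p_injury_per_upset: float) -> float:
    """Probability of at least one injury in a year.

    Assumes upsets are independent and equally risky (an illustrative
    simplification, not a validated safety model).
    """
    return 1.0 - (1.0 - p_injury_per_upset) ** upsets_per_year


# Hypothetical starting point: 50 upsets a year, 2% injury chance per upset.
baseline = injury_probability(upsets_per_year=50, p_injury_per_upset=0.02)

# Better behavior (PPE, lock-out/tag-out) halves the per-upset chance,
# but the number of upsets is unchanged.
behavior_only = injury_probability(upsets_per_year=50, p_injury_per_upset=0.01)

# Improving process reliability then cuts the number of upsets as well.
behavior_plus_reliability = injury_probability(
    upsets_per_year=10, p_injury_per_upset=0.01
)

print(f"baseline:                  {baseline:.3f}")
print(f"behavior only:             {behavior_only:.3f}")
print(f"behavior plus reliability: {behavior_plus_reliability:.3f}")
```

Under these assumptions, once the per-upset chance has been pushed about as low as behavior can take it, the only lever left is the number of upsets, which is the reliability side of the argument.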
Is it as Simple as Cause and Effect?
A caution I would insert here is that correlation is not necessarily cause-and-effect. Anyway, the examples Dr. Leveson uses (the ones I read) are what I think of as sub-systems, and from that context I can see her point, and agree. Moreover, she does make a good point about the use of FMEA and the like. It’s really hard to capture all the complexity in a large system (a plant or combination of plants and other functions in a business) using those techniques.
I’ve said for many years that leadership, culture, teamwork and employee engagement are more important than any particular analysis tool, but that the tools are important for engaging people in solving problems.
Again, thank you to Ron Moore for allowing me to post his position on this very important topic.
Additional Supporting Data
I recently presented at the SMRP Symposium in Memphis and connected with my old friend Ramesh Gulati. He kindly provided me with 12 years of additional field data from the Arnold Engineering Development Complex (AEDC) that further supports the conclusions of Ron Moore’s data described above. This graph shows that a decrease in injury rates correlates with a decrease in PM backlogs and Unscheduled Downtime.
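For readers who want to check a relationship like this in their own plant data, here is a minimal sketch of a Pearson correlation calculation. The yearly figures are invented for illustration and are NOT the AEDC data; the variable names are also hypothetical.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical yearly values, oldest first (not real plant data).
unscheduled_downtime_pct = [12.0, 11.0, 9.5, 9.0, 7.5, 6.0]
injury_rate = [4.1, 3.8, 3.9, 3.0, 2.6, 2.2]

r = pearson_r(unscheduled_downtime_pct, injury_rate)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 1 says only that the two series moved together over the period. It does not show that one caused the other, which is the same caution raised earlier about correlation and cause-and-effect.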
Thanks to my friends in the Reliability and Safety communities for sharing and contributing your experience in this post.
Additional support is found in the article ‘Reliability and Safety Inseparable’, published in Efficient Plant Magazine by Klaus Blanche (director of the Reliability & Maintainability Center at the Univ. of Tennessee and a research professor in the College of Engineering).
Blanche states that, in his studies, top-quartile companies (low in reactive maintenance) spent 23% of their time finding issues with predictive technologies and condition-based monitoring. This does not include preparing the work orders to fix what was found. Top-quartile-company employee engagement (suggestions per employee):
- Showed a 27% better safety performance (OSHA recordable-incident rate) than the average of the remaining facilities.
- Recorded a 14% better OSHA recordable-incident rate than the lower 75% of companies.
It’s this instilled process of root-cause analysis that drives ongoing improvement.
This latest data supports the correlation between Reliability and Safety.