Reliability and Risk Mitigation Actions
Once reliability risks have been identified and analyzed, we have to do something.
Well, not really.
If the risk is acceptable to the stakeholders, then we can simply continue with the current plan and monitor for any new risks or changes in our understanding of the existing risks.
In many cases, though, the risk is unacceptable and will require mitigation. Risk mitigation is a ‘system, process, or investment to control the likelihood or consequence of a risk,’ according to the glossary of risk terms in ISO 31000: Enterprise Risk Management.
Mitigating Unknowns Concerning Risk
At times it is our lack of understanding of the elements that comprise a risk that makes it worth mitigating. A new vendor presents a range of unknowns, from their process capability to the supplied component’s actual behavior within our design.
The solution here is obvious. Get more information and reassess the risk. Run an experiment, call the vendor with requests for data, or conduct the necessary research.
If we’re not sure how often an item will fail, we’re seeking failure rate information. Life studies that test to failure, literature searches, vendor data, and physics of failure models are all viable means to estimate a failure rate.
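As a rough illustration of turning life test results into a failure rate, here is a minimal sketch. It assumes a constant failure rate (an exponential life distribution), which is often not true and should be checked against the data. The test hours and failure count are made-up numbers, and the function names are just for this example.

```python
import math

def estimate_failure_rate(unit_hours, failures):
    """Point estimate of a constant failure rate, in failures per hour.

    unit_hours: hours accumulated by every unit on test, whether it
                failed or was removed while still working.
    failures:   number of units that actually failed.
    """
    total_time = sum(unit_hours)
    if total_time <= 0:
        raise ValueError("total test time must be positive")
    return failures / total_time

def reliability(failure_rate, mission_hours):
    """Probability of surviving mission_hours under a constant failure rate."""
    return math.exp(-failure_rate * mission_hours)

# Hypothetical life test: ten units, three failed, seven removed at 1000 h.
hours_on_test = [410.0, 655.0, 870.0] + [1000.0] * 7
rate = estimate_failure_rate(hours_on_test, failures=3)
mtbf = 1.0 / rate

print(f"failure rate: {rate:.6f} per hour, MTBF: {mtbf:.0f} h")
print(f"reliability at 500 h: {reliability(rate, 500.0):.3f}")
```

With only a few failures the estimate carries wide uncertainty, so treat it as a starting point for the reassessment rather than a final answer.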
If we’re not sure how something will fail, that suggests that we need to let the item fail. Of course, be safe. In some cases causing a failure is not possible while staying safe (or it’s cost prohibitive). Then analytical studies, literature searches, and simulations may have to suffice to discover the nature of a failure’s consequences.
Mitigating High Likelihood of Risk Occurring
Some forms of risk have a high probability that the triggering stresses will occur. The situation that initiates the path to failure and the resulting consequence may simply occur too often.
Changing the environment in order to reduce the occurrence of the stress conditions may be possible. Adding cooling fans, for example, helps to reduce temperatures and the chance of high-temperature-induced failure mechanisms.
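To get a feel for how much a cooler environment might help, here is a minimal sketch using the Arrhenius model. It assumes the dominant failure mechanism is thermally activated and follows Arrhenius behavior. The 0.7 eV activation energy and the two temperatures are placeholder assumptions, not values for any particular product.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_acceleration(temp_hot_c, temp_cool_c, activation_energy_ev):
    """Ratio of the failure rate at temp_hot_c to the rate at temp_cool_c."""
    t_hot = temp_hot_c + 273.15    # convert Celsius to kelvin
    t_cool = temp_cool_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_cool - 1.0 / t_hot)
    return math.exp(exponent)

# Hypothetical case: fans drop the operating temperature from 70 C to 55 C.
af = arrhenius_acceleration(70.0, 55.0, activation_energy_ev=0.7)
print(f"failure rate at 70 C is about {af:.1f} times the rate at 55 C")
```

Under these assumed values the 15 degree reduction cuts the thermally driven failure rate by roughly a factor of three. The actual benefit depends heavily on the activation energy of the real failure mechanism.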
Adding a warning in the operating manual is generally not effective, btw.
Another option is to interrupt the string of events that starts with the initiating event and ends with the realization of the consequence. For example, the oil light on combustion engine vehicles provides a warning that the oil pressure is low and needs attention. Repairing the leak or adding oil breaks the sequence that runs from the loss of oil pressure to the engine seizure failure (high consequence). The oil light signals the need for maintenance.
A third option is to create a system that is robust to the frequently occurring initiating stress. If a frequent stress is currently causing an elevated risk of a serious consequence, adjust the product to be robust to that stress. If your product should survive routine high temperature exposure, be sure to use materials and assembly processes that will survive such exposures. Be sure to check thermal cycling effects as well.
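One way to put a number on robustness is a stress-strength interference calculation. The sketch below assumes stress and strength are independent and normally distributed, which is a simplification. The temperature values are hypothetical and only show how a more capable material changes the result.

```python
import math
from statistics import NormalDist

def interference_reliability(strength_mean, strength_sd, stress_mean, stress_sd):
    """Probability that strength exceeds stress for independent normal variables."""
    margin_sd = math.sqrt(strength_sd ** 2 + stress_sd ** 2)
    if margin_sd == 0:
        raise ValueError("at least one standard deviation must be positive")
    margin_mean = strength_mean - stress_mean
    return NormalDist().cdf(margin_mean / margin_sd)

# Hypothetical values in degrees C: the applied temperature stress versus
# the temperature the material can tolerate.
baseline = interference_reliability(125.0, 8.0, 95.0, 10.0)
improved = interference_reliability(140.0, 8.0, 95.0, 10.0)

print(f"baseline material: {baseline:.4f}")
print(f"improved material: {improved:.5f}")
```

The same calculation also shows the value of reducing variation: a tighter spread in either the stress or the strength raises the result without moving the means.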
Mitigating Severe Consequences of Risks that Occur
A few of the reliability risks we face may enjoy a relatively low probability of occurring, yet the resulting consequence is unacceptable. Typically, safety-related consequences are always addressed, with less severe consequences resolved when possible.
The first option is to engineer the item’s response to the situation so that it fails safe. To go cold. This may involve fuses, sensors, monitoring, and interventions, yet the idea is to avoid the vessel explosion by venting the excessive pressure safely, for example.
The second option is the containment of the consequence. For example, if there is a chance of a vessel explosion and venting fails, building a containment structure may limit the damage to just the containment area.
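The venting and containment example can be viewed as layers that must all fail before the full consequence is realized. Here is a minimal sketch of that arithmetic. It assumes the layers fail independently of each other, which is often optimistic since one event can defeat several layers at once. All of the probabilities are invented for illustration.

```python
from math import prod

def consequence_probability(initiating_event_prob, barrier_failure_probs):
    """Chance the full consequence occurs: the initiating event happens
    and every barrier fails (barriers assumed independent)."""
    return initiating_event_prob * prod(barrier_failure_probs)

# Hypothetical values for the pressure vessel example.
overpressure_per_year = 0.01   # initiating event
vent_fails_on_demand = 0.02    # fail-safe venting does not work
containment_fails = 0.10       # containment structure does not hold

no_barriers = consequence_probability(overpressure_per_year, [])
vent_only = consequence_probability(overpressure_per_year, [vent_fails_on_demand])
vent_and_containment = consequence_probability(
    overpressure_per_year, [vent_fails_on_demand, containment_fails]
)

print(f"no barriers:          {no_barriers:.6f} per year")
print(f"vent only:            {vent_only:.6f} per year")
print(f"vent and containment: {vent_and_containment:.6f} per year")
```

Each added layer lowers the chance of the worst outcome, and the layer with the highest failure probability is usually the first place to look for improvement.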
A third option is to buy insurance and hope. I don’t like this option, yet at times it may be the only viable option available.
Being Flexible and Creative to Best Implement Mitigations
Each risk we face will have unique circumstances. With an understanding of the stresses, paths to failure, and resulting consequences, we can generally find at least one means to minimize the chance of occurrence or the impact of the consequences.
I’m sure I’ve missed a few approaches to mitigate risk. Which would you add?
Also published on Medium.