
All machines and equipment fail for two reasons: distortion or degradation. Distortion causes parts to suffer such high stress or fatigue that their atomic structures fail. When parts degrade, their atomic structure is attacked by environmental elements. Physics of Failure methods let us analyse equipment for situations that cause its parts’ atomic structures to suffer excessive stress or to degrade. We can identify the real causes of atomic failure and so institute the fewest maintenance and operational activities needed to keep equipment at its highest reliability, and the operating plant at its highest availability.
Keywords: failure prevention, defect elimination, proactive maintenance strategy
This extract from the book ‘Reliability, Maintainability and Risk’ by Dr David Smith1 is telling. It is a snapshot of what causes our equipment failures, and of what we need to do to prevent failure.
“In practice, failure rate is a system level effect. It is closely related but not entirely explained by component failure. A significant proportion of failures encountered with modern electronic systems are not the direct result of parts failure but more complex interactions within the system. The reason for the lack of precise mapping arises from such as human factors, software, environmental interference, interrelated component drift and circuit design tolerance.”
Though commenting on electronic system failure, the very same factors and issues apply to all machinery and equipment failure. Our machines and operating plants are systems of interacting equipment which are themselves each made of individual interacting components working in an orchestrated arrangement. Dr Smith advises that systems fail mostly because the individual parts fail. If we prevent individual parts failing then our operating machines (a system of parts) and production plants (a system of machines) would not fail either.
Figure 1 is a visual representation of what Smith recognises. It highlights that if we remove the causes of a machine’s parts failure then there are no failures to stop the machine.
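The link between part reliability and machine reliability can be put into numbers with a minimal sketch. It assumes the machine stops if any one part fails (a series arrangement) and that parts fail independently, which is a simplification. The part names and reliability values are hypothetical and are not taken from Figure 1.

```python
from math import prod


def series_reliability(part_reliabilities):
    """Reliability of a machine that stops when any single part fails.

    Assumes part failures are independent of each other (a simplification).
    """
    return prod(part_reliabilities.values())


# Hypothetical parts and their chance of surviving one year of service.
parts = {"bearing": 0.95, "seal": 0.90, "shaft": 0.99, "coupling": 0.98}
before = series_reliability(parts)

# Remove the causes of seal failure so the seal's reliability rises.
improved_parts = dict(parts, seal=0.99)
after = series_reliability(improved_parts)

print(f"machine reliability before: {before:.3f}")
print(f"machine reliability after:  {after:.3f}")
```

Under these assumptions the machine can never be more reliable than its least reliable part, which is why the effort goes into the individual parts.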
The Science of Why Machinery and Equipment Parts Fail
Machine parts fail because their atomic structures can no longer take the imposed load. Atomic structures fail for two reasons: stress causes the atomic bonds to separate, or the atomic bonds are attacked and removed. This basic physics is the foundation of the modern Physics of Failure design methodology used to engineer and build reliable machines. The same simple reasoning for parts’ failure can be applied to select a maintenance strategy that lets you recover the most value from existing facilities, equipment and infrastructure with the least maintenance.
Figures 2 and 3 are simple diagrams of why materials-of-construction fail from stress and fatigue. Figure 2 indicates how operating stresses can rise to overload a part, or the part’s structure can fatigue and be unable to take the load. The stress and fatigue are caused by destruction of the atomic structure. When stress is put on the atomic matrix the atomic bonds absorb the load. If the stress is too massive and sustained the bonds across the load-carrying section separate—this is overload. If the load is massive but rapidly removed only a few bonds separate. The bonds left whole remain to carry the load but now with less load-carrying structure available—this is fatigue.
Figure 3 shows the cumulative effect of fatigue from imposed stresses over the life of the equipment. Eventually the structure of the materials-of-construction used in a part fails. If we can prevent distortion of parts, so that their atomic stress levels are kept far below the values that separate the atomic bonds, then the parts will not fail and our machines will be highly reliable and will remain so.
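One widely used simplification for adding up fatigue over a part's life is the Palmgren-Miner linear damage rule. The article does not name this rule, so treat the sketch below as an illustration of cumulative fatigue in general and not as the method behind Figure 3. Each load block uses up a fraction of the part's life equal to the cycles applied divided by the cycles to failure at that stress level, and failure is predicted when the fractions sum to 1. All cycle counts are hypothetical.

```python
def miner_damage(load_blocks):
    """Sum the life fraction used by each load block (Palmgren-Miner rule).

    load_blocks: iterable of (cycles_applied, cycles_to_failure) pairs.
    A total of 1.0 or more predicts fatigue failure under this rule.
    """
    return sum(applied / to_failure for applied, to_failure in load_blocks)


# Hypothetical duty: higher stress levels have far fewer cycles to failure.
blocks = [
    (20_000, 100_000),  # normal running load
    (5_000, 20_000),    # start-up load
    (1_000, 2_000),     # occasional shock load
]

damage = miner_damage(blocks)
remaining_life_fraction = max(0.0, 1.0 - damage)

print(f"fatigue damage used: {damage:.2f}")
print(f"life fraction left:  {remaining_life_fraction:.2f}")
```

In this made-up duty the rare shock loads use up as much life as all the normal running cycles put together, which illustrates why keeping stress levels low matters so much.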
Parts also fail when their atomic matrix is attacked by external elements in their local environment and bonds are removed. Oxygen in the air degrades rubber, hydrogen ions in water cause carbon steels to corrode by mechanisms such as the pitting and crevice corrosion shown in Figures 4 and 5, and aggressive chemicals attack inter-granular phases in alloy metals.
In these situations atomic structures fail by degradation. If we can prevent degrading environments from enveloping our parts then more causes of atomic bond failure are removed and the parts will not fail. Consequently our machines will become more reliable and remain so.
Maintenance Strategy from Physics of Failure Analysis
You can derive the minimal reliability excellence strategy by considering an individual part’s Physics of Failure mechanisms. Figure 6 provides an overview of the resulting bottom-up methodology, which we at Lifetime Reliability Solutions call ‘Plant and Equipment Wellness’.

Your asset maintenance management system, work quality management system, engineering design system, and operational management system all naturally derive from the activities that prevent the deformation and degradation of each part in your machines and equipment.
Additional Asset Maintenance Management System Considerations
The quote from Dr Smith at the beginning of this article pointed out that “…failure rate is a system level effect. It is closely related but not entirely explained by component failure.” There remain other failure causes that you must consider and prevent if you want lasting world-class reliability in your operation. These include human factors, unexpected effects arising from interactions within a system itself, and ‘knife-edge’ designs intolerant to operational variations.
These remaining causes of failure are organisationally induced factors that play out over time to eventually fail a component and stop production. The component will fail according to Physics of Failure mechanisms, but the part failure was induced by poor organisational processes.
Physics of Failure and Organisational Questions for Maintenance Strategy Selection
If equipment failure is stopped by preventing equipment parts from failing, it is vital that we know what causes each critical part in a machine to fail. A critical part is any component that upon failing prevents the equipment from operating at its minimum service duty. Failure is defined as any unwanted or unsatisfactory behaviour. A breakdown is the end result of a prior failure. To identify what events cause a part to fail we must generate suitable questions related to Physics of Failure causes and organisationally induced causes.
There are three Physics of Failure driven questions used to identify the physical causes of a part’s deformation and degradation failures:
1. How can the part’s atomic structure be overstressed?
2. How can the part’s atomic structure be fatigued?
3. How can the part’s atomic structure be degraded?
There are three key Organisational Factors questions used to expose work process induced failure:
1. What human factors allow the part to fail?
2. What business processes allow the part to fail?
3. What design issues allow the part to fail?
It is the economics of a failure event that drives and justifies the business efforts and expenditure for its prevention. Without first analysing and understanding the total financial impact of an equipment failure throughout a business, you can never be sure what the right actions are, nor can you measure whether they are effective. The business economics of failure are identified with two more questions:
1. Are the business-wide consequences of an equipment failure acceptable?
2. Where failure is acceptable how frequently can it occur before it becomes unacceptable?
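The second economics question can be turned into simple arithmetic. This is only a sketch, assuming the business can put one figure on the total cost of a failure event and another on the yearly loss it is willing to carry. Both figures, the failure rate and the function names are hypothetical.

```python
def annual_failure_cost(failures_per_year, cost_per_failure):
    """Expected yearly loss: how often it fails times what each failure costs."""
    return failures_per_year * cost_per_failure


def max_acceptable_frequency(acceptable_annual_loss, cost_per_failure):
    """Highest failure frequency that keeps the yearly loss within the limit."""
    return acceptable_annual_loss / cost_per_failure


cost_per_failure = 40_000.0        # hypothetical business-wide cost of one failure
acceptable_annual_loss = 10_000.0  # hypothetical loss the business will tolerate
current_rate = 0.5                 # hypothetical: one failure every two years

annual_cost = annual_failure_cost(current_rate, cost_per_failure)
limit = max_acceptable_frequency(acceptable_annual_loss, cost_per_failure)
acceptable = current_rate <= limit

print(f"expected annual cost:     {annual_cost:,.0f}")
print(f"max failures per year:    {limit:.2f}")
print(f"current rate acceptable?  {acceptable}")
```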
The answers to these eight questions are tabulated into a spreadsheet, such as the one in Figure 7 for a pinion gear. From the answers you then develop the maintenance and operational actions that will create the right conditions to prevent the causes arising, and thereby create equipment reliability.
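As a rough illustration of such a table, the sketch below lays the eight questions out as rows and writes them as CSV text that a spreadsheet can open. The column names, the "Business Economics" category label and every answer shown are invented placeholders. They do not reproduce the contents of Figure 7.

```python
import csv
import io

# The eight questions from this article, grouped into three categories.
QUESTIONS = [
    ("Physics of Failure", "How can the part's atomic structure be overstressed?"),
    ("Physics of Failure", "How can the part's atomic structure be fatigued?"),
    ("Physics of Failure", "How can the part's atomic structure be degraded?"),
    ("Organisational Factors", "What human factors allow the part to fail?"),
    ("Organisational Factors", "What business processes allow the part to fail?"),
    ("Organisational Factors", "What design issues allow the part to fail?"),
    ("Business Economics",
     "Are the business-wide consequences of an equipment failure acceptable?"),
    ("Business Economics",
     "Where failure is acceptable how frequently can it occur "
     "before it becomes unacceptable?"),
]
FIELDS = ["part", "category", "question", "answer"]  # hypothetical column names


def build_worksheet(part, answers):
    """Pair each of the eight questions with the answer recorded for one part."""
    if len(answers) != len(QUESTIONS):
        raise ValueError(f"expected {len(QUESTIONS)} answers, got {len(answers)}")
    return [
        {"part": part, "category": category, "question": question, "answer": answer}
        for (category, question), answer in zip(QUESTIONS, answers)
    ]


def to_csv(rows):
    """Render the worksheet rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder answers for a pinion gear, invented for illustration only.
pinion_answers = [
    "Shock load when the driven machine jams",
    "Repeated start-stop cycling under load",
    "Contaminated or water-laden lubricant",
    "Gear mesh misaligned during installation",
    "No procedure for checking lubricant condition",
    "Gear selected with little margin above normal duty",
    "No",
    "Not applicable",
]

rows = build_worksheet("pinion gear", pinion_answers)
csv_text = to_csv(rows)
print(csv_text)
```

Each answer row then becomes the starting point for a maintenance or operational action that removes that cause.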
The answers from the eight questions flow throughout your business, as indicated in the chart of Figure 8. The solutions that prevent excessive atomic stress show you how to improve your Engineering, Maintenance, Operational and Quality processes. You design the minimal business processes and activities required to prevent equipment failure. When the answers are adopted you gain new reliability that lets you move rapidly towards world class asset performance.
To help you understand this Physics of Failure based methodology there are tutorials of the technique on our website.
My best regards to you,
Mike Sondalini