How to Talk About MTBF
Join Chris and Fred as they discuss the MTBF. Seemingly sophisticated engineering industries (like aircraft manufacturing and electronic design) assume that every component that is ever used never gets old, and never gets better. In other words, every failure phenomenon has a constant hazard rate. How do you work with this?
- Just to clarify … The constant hazard rate implies that a 100-year-old component is just as likely to fail TODAY as a brand-new component (provided both are still working). Really?
- How do industries convince themselves that the constant hazard rate is a good model? They do things like pointing to a ‘bathtub curve’ that represents how hazard rates can initially decrease (wear in, as manufacturing defects surface), stay constant (at the bottom of the bathtub curve), and then increase (where things wear out) … and saying that they assume all the manufacturing defects have been removed and wear out can be ‘mitigated’ by maintenance and inspections. Really?
- How do I convince my organization and industry this is not good? There is a huge problem here. For example, virtually every aircraft crash is very well investigated by organizations like the FAA and NTSB and similar organizations across the world. Virtually every investigation that doesn’t involve human (pilot) error identifies a manufacturing error (like inclusions or cracks in turbine blades) or unmanaged wear out (like insulation degradation on electrical cabling that results in arcs that start fires).
- What causes constant hazard rate failures? Randomly occurring, catastrophic external stresses. Think of things like the ‘bird strike’ on US Airways Flight 1549, which made an emergency landing on the Hudson River in New York in 2009. By the way … this was a successful emergency landing where no one died …
- There needs to be a business imperative for this. When the time comes for change, there needs to be a perceived HUGE business benefit that outweighs the perceived personal risk of someone going against the grain and suggesting that MTBF is bad. Perhaps you can make the case that if things go wrong, this assumption of the constant hazard rate could be the ROOT CAUSE of failure.
- And a system that is ‘too complex’ is not an excuse. Why? Because there are lots of ways your system can fail, which means you have a huge range of failure mechanisms to choose from if you want to improve reliability. And this choice means you can find the VITAL FEW things that drastically improve reliability.
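
To make the ‘100-year-old component’ point concrete, here is a minimal Python sketch. It assumes a Weibull life model, which is one common way to represent the three regions of the bathtub curve and is not something the episode prescribes. The shape values (0.5, 1.0, 3.0), the scale of 1,000 hours, and the function names are all hypothetical, chosen only for illustration. A shape of 1.0 is the constant hazard rate case, where the scale equals the MTBF.

```python
import math


def weibull_survival(t, shape, scale):
    """Probability that a unit is still working at age t (Weibull model)."""
    return math.exp(-((t / scale) ** shape))


def conditional_failure_prob(age, window, shape, scale):
    """Probability of failing in the next `window` hours, given survival to `age`."""
    surviving_now = weibull_survival(age, shape, scale)
    surviving_later = weibull_survival(age + window, shape, scale)
    return 1.0 - surviving_later / surviving_now


SCALE_HOURS = 1000.0   # hypothetical; equals the MTBF only when shape == 1.0
WINDOW_HOURS = 100.0   # hypothetical look-ahead window
OLD_AGE_HOURS = 2000.0  # hypothetical age of the 'old' unit

# Hypothetical shape values, one per region of the bathtub curve.
cases = [
    ("decreasing hazard (wear in)", 0.5),
    ("constant hazard (MTBF model)", 1.0),
    ("increasing hazard (wear out)", 3.0),
]

for label, shape in cases:
    new_unit = conditional_failure_prob(0.0, WINDOW_HOURS, shape, SCALE_HOURS)
    old_unit = conditional_failure_prob(OLD_AGE_HOURS, WINDOW_HOURS, shape, SCALE_HOURS)
    print(f"{label}: new unit {new_unit:.4f}, old unit {old_unit:.4f}")
```

Under these assumptions, only the constant hazard case gives the new unit and the old unit the same chance of failing in the next 100 hours. With a shape below 1 the old unit looks better than the new one, and with a shape above 1 it looks much worse, which is exactly the behavior an MTBF-only model cannot express.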
Enjoy an episode of Speaking of Reliability, where you can join friends as they discuss reliability topics. Join us as we discuss topics ranging from design for reliability techniques to field data analysis approaches.