Are We Teaching Reliability All Wrong?

Let’s Demand Better Reliability Engineering Content

Teaching reliability occurs through textbooks, technical papers, peers, mentors, and courses. The many sources available tend to use MTBF as a primary vehicle to describe system reliability.

What has gone wrong with our education process?

MTBF Abounds in Books and Lectures

From tutorials by college professors at RAMS to numerous ‘reliability engineering’ textbooks, the discussion equates reliability with MTBF. The way to measure or describe the reliability of something, we are told, is MTTF for non-repairable systems or MTBF for repairable systems (as if that bit of semantics matters).

Just use an average. It’s good enough.

A good textbook does not mention MTBF; a great lecture avoids the use of MTBF. IMHO.

When I confronted a professor on why a major portion of a tutorial on reliability statistics focused on MTBF, she said it was a great way to teach the concepts of distribution properties without worrying too much about the math. It was merely a mechanism to teach other concepts.

The lecture did not spend much time applying those concepts to real problems. Nor did it explain that MTBF (and the exponential distribution) served only as a simplified set of examples for teaching key concepts and was never meant to be used in the real world. It left me and others in the room with the idea that MTBF was the way to describe reliability.

I have had the same conversations with book authors.

MTBF is Easy

The ASQ CRE exam is rife with problems using MTBF (or MTTF). Why? Because it is easy and quick to test calculations based on MTBF and the exponential distribution.

Sure, it’s easy, but easy calculations are not what we should be good at. Aren’t we supposed to be good at solving real problems that are not easy? How about creating a certification exam that evaluates what we should know and do at work, not what is considered easy?

Students/Engineers Need to Understand MTBF

As I rant on about abolishing MTBF in more than one setting, I encounter the rebuttal that students and engineers need to know about it, if for no other reason than that it is out there.

Sure, MTBF is on data sheets, test reports, parts count software outputs, etc. It is everywhere.

I agree that students and engineers need to understand the folly of MTBF, the lack of information it contains, and the inability to use it for meaningful decision-making. What I really want is for students and engineers to learn to automatically insist on more information, data, and evidence, all leading to an understanding of reliability (the probability of success over time…).
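To make “the lack of information it contains” concrete, here is a small Python sketch (not from the original post) comparing three hypothetical Weibull life distributions that share the same mean life of 1,000 hours, the single number a data sheet would report as MTBF or MTTF. The shape values, the 1,000-hour mean, and the 500-hour mission are illustrative assumptions; the only point is that one average can hide very different probabilities of surviving the same mission.

```python
import math


def weibull_scale_for_mean(mean: float, beta: float) -> float:
    """Scale eta such that a Weibull(beta, eta) has the requested mean life."""
    return mean / math.gamma(1.0 + 1.0 / beta)


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)^beta); beta = 1 is the exponential (constant-rate) case."""
    return math.exp(-((t / eta) ** beta))


MTBF = 1000.0     # hours; identical for every case (hypothetical)
MISSION = 500.0   # hours; hypothetical mission time

# Hypothetical shapes: early-life failures, constant rate, wear-out.
cases = {"early-life (beta=0.5)": 0.5, "constant rate (beta=1)": 1.0, "wear-out (beta=3)": 3.0}
results = {}
for label, beta in cases.items():
    eta = weibull_scale_for_mean(MTBF, beta)
    results[label] = weibull_reliability(MISSION, beta, eta)
    print(f"{label:24s} MTBF={MTBF:.0f} h  R({MISSION:.0f} h) = {results[label]:.3f}")
```

All three cases report the same 1,000-hour average, yet the probability of completing a 500-hour mission comes out at roughly 0.37, 0.61, and 0.91.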

When I hear someone ask for MTBF, I ask them what they really want to know. What they seek, when asking for MTBF, is never served or supported by knowing MTBF. We need to learn, and teach, reliability engineering to help each other ask better questions and solve real problems.

Stop This Vicious Cycle Now

If I were to use existing literature and teachings on reliability engineering to prepare a new book on the topic, I would likely feel compelled to include MTBF. I won’t, other than to warn the reader not to use it at all.

You should do the same.

If you are in a class or tutorial where the instructor mentions MTBF, ask when they will begin talking about something useful concerning reliability. Go ahead, you can say I urged you to ask.

If you are reading a book that drones on about MTBF testing and confidence intervals for time- or failure-truncated testing, send the author a note asking when, and under what conditions, this technique would ever be useful. Ask them to provide case studies and evidence that the underlying failure mechanisms involved are actually best fit by the exponential distribution. Ask them to justify spending more than one sentence of this expensive book on such drivel.
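For readers who have not seen the textbook calculation being criticized, here is a minimal Python sketch of the standard chi-square lower confidence bound on MTBF for time-truncated and failure-truncated tests. It holds only if times between failures are exponential (constant failure rate), which is exactly the assumption worth challenging. To stay dependency-free, the chi-square quantile is found by bisection using the even-degrees-of-freedom link to the Poisson distribution; the function names and the test numbers are hypothetical.

```python
import math


def chi2_cdf_even_dof(x: float, dof: int) -> float:
    """CDF of a chi-square variable with even degrees of freedom 2k.

    Uses the Poisson identity P(chi2_2k <= x) = 1 - sum_{i<k} e^(-x/2) (x/2)^i / i!.
    """
    if dof <= 0 or dof % 2:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    k, m = dof // 2, x / 2.0
    term, tail = math.exp(-m), 0.0
    for i in range(k):
        tail += term            # add e^(-m) m^i / i!
        term *= m / (i + 1)     # advance to the next Poisson term
    return 1.0 - tail


def chi2_quantile_even_dof(p: float, dof: int) -> float:
    """Invert the CDF by bisection; slow but needs no statistics library."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_dof(hi, dof) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even_dof(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mtbf_lower_bound(total_hours: float, failures: int, confidence: float,
                     time_truncated: bool) -> float:
    """One-sided lower confidence bound on MTBF.

    Valid ONLY if times between failures are exponential (constant failure rate).
    Time-truncated tests use 2r + 2 degrees of freedom; failure-truncated use 2r.
    """
    dof = 2 * failures + 2 if time_truncated else 2 * failures
    return 2.0 * total_hours / chi2_quantile_even_dof(confidence, dof)


# Hypothetical test: 10,000 unit-hours, 2 failures, 90% confidence.
print(round(mtbf_lower_bound(10_000.0, 2, 0.90, time_truncated=True)))
print(round(mtbf_lower_bound(10_000.0, 2, 0.90, time_truncated=False)))
```

For this hypothetical test the bounds come out at roughly 1,880 h (time-truncated) and 2,570 h (failure-truncated). Nothing in the calculation checks whether the exponential assumption holds.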

Go ahead, ask for rationale and justification. Ask for a better education.

If enough people stand up and say, ‘Hold on, when in the real world is the use of MTBF ever useful?’, we just may get some professors and authors to provide meaningful content.

How have you challenged the use of MTBF? If you haven’t, why not? What is holding you back?


Comments

Larry George says
April 14, 2025 at 9:37 PM
Thanks for your diatribe. I concur. I even tried to get the IEEE 1413-1 committee to consider what really happens in the field, and made my usual appeal to use data derivable from the accounting data required by GAAP. I gave up. The following rant is on http://www.linkdein.com.
I learned actuarial forecasting and spares planning methods while working for the US Air Force Logistics Command in the 1970s [AFLCM 66-17, “Quantitative Analyses, Forecasting, and Integrative Management Techniques for Maintenance Planning and Control,” AFM 400-1, “Logistics, Selective Management of Propulsion Units, Policy and Guidance”]. I am grateful for the education. The US AFLC actuarial methods assume constant failure rates within age intervals and Poisson demands, and they ignore the variance induced by variable flying hours per aircraft in the flying hour program.
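The comment notes that the AFLC methods assume Poisson demands. One common way to turn a Poisson demand assumption into a spares level is to stock the smallest quantity whose cumulative Poisson probability meets a fill target, which this Python sketch does. It illustrates that assumption only and is not the AFLC procedure; the expected demand and the target are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(D <= k) for Poisson demand D with the given mean."""
    term = math.exp(-mean)  # P(D = 0); underflows for very large means, fine for a sketch
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(D = i) from P(D = i - 1)
        total += term
    return total


def spares_level(expected_demand: float, fill_target: float) -> int:
    """Smallest stock level s with P(D <= s) >= fill_target (hypothetical policy)."""
    if not 0.0 < fill_target < 1.0:
        raise ValueError("fill_target must be strictly between 0 and 1")
    s = 0
    while poisson_cdf(s, expected_demand) < fill_target:
        s += 1
    return s


# Hypothetical: 4 expected demands over the resupply period, 95% target.
print(spares_level(4.0, 0.95))
```

With an expected demand of 4 and a 95% target this returns 8. As the comment points out, a pure Poisson model ignores the extra variance from variable flying hours, so a level computed this way may understate the stock actually needed.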
These actuarial methods were developed for the Air Force in the 1960s by RAND Corp., primarily involving Murray Geisler [Geisler, Murray A., The Rand logistics research program 1966. Santa Monica, CA: RAND Corporation, 2004. https://www.rand.org/pubs/papers/P3447.html]. They estimated age-specific failure rates conditional on survival up to specified ages and made actuarial forecasts of engine demands depending on the flying-hour program (plan). (An actuarial forecast is ∑ a(s)*n(t-s), summed over s=1,2,…,t, where a(s) is the actuarial failure rate at age s and n(t-s) is the installed base of age t-s.) Periodic actuarial meetings somehow consolidated engine failure data into agreements on hourly actuarial failure rates, for forecasting engine demands and for war readiness spares requirements.
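The actuarial forecast described in the comment can be sketched in Python as two steps: estimate a(s) as the failures at age s divided by the units that survived to age s (a life-table style estimate of the age-specific rate), then sum a(s)·n(t−s) over s = 1, …, t. The array layout, function names, and sample numbers are assumptions for illustration; this is not the RAND or AFLC code.

```python
from typing import List, Sequence


def actuarial_rates(failures_at_age: Sequence[int], survivors_at_age: Sequence[int]) -> List[float]:
    """Age-specific rate a(s) = failures at age s / units that survived to age s.

    Index 0 corresponds to age s = 1. Ages with nobody at risk get a rate of 0.0
    (a simplifying assumption for this sketch).
    """
    return [d / n if n > 0 else 0.0 for d, n in zip(failures_at_age, survivors_at_age)]


def actuarial_forecast(a: Sequence[float], n: Sequence[float], t: int) -> float:
    """Expected demand in period t: sum of a(s) * n(t - s) for s = 1..t.

    a[s - 1] holds a(s); n[p] holds n(p) as used in the comment's formula.
    Terms whose index falls outside the supplied data are skipped.
    """
    total = 0.0
    for s in range(1, t + 1):
        if s - 1 < len(a) and 0 <= t - s < len(n):
            total += a[s - 1] * n[t - s]
    return total


# Hypothetical data: failures and survivors at ages 1, 2, 3.
a = actuarial_rates([2, 3, 5], [100, 98, 95])
forecast = actuarial_forecast(a, [10, 20, 30], t=3)  # hypothetical n(0), n(1), n(2)
print([round(x, 4) for x in a], round(forecast, 3))
```

Keeping the rate estimate separate from the forecast mirrors the comment's description: rates are estimated from field history, then applied to whatever n(·) series the planner supplies.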
I later figured out how to estimate actuarial failure rates, for all engines, engine modules, and their service parts, with or without life-limits and without lifetime data [https://sites.google.com/site/fieldreliability/]. I offered to show AFIT faculty, AFOSR, AFRL, and RAND how to extend actuarial methods to all service parts.
The Air Force has reverted to MTBF management. AFMAN20-116_AFGM2025-01, 15 January 2025, https://static.e-publishing.af.mil/production/1/saf_aq/publication/afman20-116/afman20-116.pdf/ revises “AFMAN20-116 PROPULSION LIFE CYCLE MANAGEMENT FOR AERIAL VEHICLES,” 13 April 2022, as follows…
“MAJCOMs, Depots, and field units will use ATOW [Average Time On Wing] or MTBR [Mean Time Between Removal] as the primary metric to measure RCM effectiveness and overall engine reliability health.”… “Total and inherent ATOW or MTBR will be reported by the Engine TMS manager. Both measures will exclude all serviceable built up removals and quick turn removals. The inherent ATOW or MTBR will also exclude removals for Foreign Object Damage (FOD), fuel/oil contamination (non-engine related), and other maintenance faults exclusive of the design.”
ATOW is calculated as: ATOW = ∑ EFH removed engine ÷ # removals, where ∑ EFH removed engine is the sum of flying hours since the last removal on only the engines removed in a given quarter. This is calculated manually as CEMS and Propulsion Actuarial Client/Server do not automatically report this number. 10.4.1.2.4. Quarterly data from Propulsion Actuarial Client/Server is used for EFH and number of removals. 10.4.1.2.5. MTBR is calculated quarterly by the Engine TMS actuary, using a four quarter rolling average to smooth any seasonal variation, and posted on the Actuarial SharePoint site (https://usaf.dps.mil/teams/21162/act/Shared%20Documents/Forms/AllItems.aspx?viewpath=%2Fteams%2F21162%2Fact%2FShared%20Documents%2FForms%2FAllItems%2Easpx)?
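The ATOW and MTBR definitions quoted above are simple enough to sketch directly in Python. ATOW follows the quoted formula. For the four-quarter MTBR, the sketch pools hours and removals across the window before dividing; that is an assumption, because the memo says “four quarter rolling average” without saying whether it pools the sums or averages the quarterly ratios. Function names and data are hypothetical.

```python
from typing import List, Sequence


def atow(hours_since_last_removal: Sequence[float]) -> float:
    """Quarterly ATOW: sum of EFH since last removal over ONLY the engines removed
    this quarter, divided by the number of removals."""
    if not hours_since_last_removal:
        raise ValueError("no removals this quarter")
    return sum(hours_since_last_removal) / len(hours_since_last_removal)


def rolling_mtbr(quarterly_efh: Sequence[float], quarterly_removals: Sequence[int],
                 window: int = 4) -> List[float]:
    """MTBR = flying hours / removals over each rolling window of quarters.

    Assumption: hours and removals are pooled across the window before dividing.
    """
    out = []
    for end in range(window, len(quarterly_efh) + 1):
        hours = sum(quarterly_efh[end - window:end])
        removals = sum(quarterly_removals[end - window:end])
        out.append(hours / removals if removals else float("inf"))
    return out


# Hypothetical fleet data.
print(atow([350.0, 1200.0, 80.0]))
print(rolling_mtbr([1000.0, 1000.0, 1000.0, 1000.0, 2000.0], [2, 2, 2, 2, 2]))
```

As defined, ATOW only ever sees engines that were removed. An engine that has run thousands of hours without a removal never enters the numerator, which is one way a removal-only average can misrepresent how long engines actually last.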
The USAF continues to use armchair exercises (such as RCM and MTBR=hours/removals) as of 15 Jan. 2025! “By Order of the Secretary of the Air Force, this Guidance Memorandum immediately changes AFMAN 20-116, Propulsion Life Cycle Management for Aerial Vehicles. Compliance with this memorandum is mandatory. To the extent its directions are inconsistent with other Department of the Air Force publications, the information herein prevails, in accordance with DAFI 90-160, Publications and Forms Management.”
Does the US Department of Government Efficiency deserve credit for the elimination of actuarial methods and statistics from engine and parts’ management? Technically, efficiency is the ratio of the useful work performed by a machine or in a process to the total energy expended. This reversion to MTBF management produces little useful work but requires less energy than actuarial methods.
“In the context of reliability, “efficiency” refers to the ability of a system to perform its intended function consistently over time with minimal wasted effort or resources, essentially maximizing output while minimizing downtime and failures, meaning it not only functions reliably but does so with optimal resource usage; it’s about achieving the desired result with the least possible input needed.” [Google AI]