Accendo Reliability


by Ayaz Bayramov

FMEA in Practice: Lessons Learned from Mistakes


I usually write articles about topics I personally struggled to understand from the sources available to us, such as books and online resources. I believe most technical concepts are fairly straightforward at their core, but the way we express ideas and translate our understanding into writing often makes them harder for others to grasp. That’s an area where we can all continue to improve.

As part of that journey, my goal with the Breaking Bad for Reliability newsletter is to communicate Reliability Engineering principles, mainly for two groups of people:

  • People who want to become reliability engineers but have minimal information about their responsibilities.
  • People or companies who want to hire reliability engineers but don’t have a clear understanding of what they actually need, or what skills to focus on in their hiring process.

In this context, one of the topics I have been planning to write about is Failure Mode and Effects Analysis (FMEA), a widely used engineering tool in product and process risk management. Early in my career as a reliability engineer, I read a lot about FMEAs and grasped the basic idea behind them, but in practice my FMEA sessions never went the way they were described in books and academic papers. As with many engineering concepts, what’s on paper and what happens in reality don’t always align — and that’s what motivated me to write about it.

Let me be very clear here: if you are here to learn how to do an FMEA, this is probably not the right source for you. There are many good training courses and books on the subject, and personally, the best book I have read so far is Effective FMEAs by Carl S. Carlson. Carl did an amazing job of systematically laying out the entire process.

However, my main goal in this article is different. I want to talk about the practical challenges — the things that are rarely written anywhere — that I have experienced throughout my career. So, let’s jump in.

 


OWNERSHIP

The first thing I want to focus on is ownership: who should own FMEAs?

I believe in the saying, “If everyone owns it, then no one really owns it.”


In my career, I have seen systems, reliability, or design teams take ownership of FMEAs. There are pros and cons to each approach, which I will talk about in a bit, but at least having someone responsible is a big step forward.

When reliability (under systems engineering) owns it, the process is usually systematic and step-by-step: functions are identified, associated failure modes and mechanisms are listed, risks are assessed, and so on. But it generally lacks buy-in from the design teams, who have the deepest knowledge of the current design. Reliability engineers often do not have deep technical knowledge of the design, so they end up chasing design engineers for information. Meanwhile, design engineers, who already have a full plate of tasks, don’t prioritize FMEAs unless the benefits are clear to them. When those benefits are not communicated, the process turns into a nightmare for reliability engineers, who eventually end up working in isolation.

On the other hand, when design teams take full ownership without any reliability engineer’s involvement, the opposite problem emerges. A huge amount of time is spent cataloging individual component failure modes, including extremely unlikely ones, but the system perspective is lost. What begins as a system risk analysis exercise quickly turns into a time-consuming documentation effort with little connection to meaningful system design decisions.

A slightly better version is when design teams still own the process but are supported by reliability engineers. In this case, reliability engineers act as facilitators, helping teams structure risks, prioritize them, and burn them down in a meaningful way. Ownership remains with design, but the process gains the structure and discipline needed to produce real value.

In reality, there is no single right answer. As engineers like to say, “it depends,” and it really does. Considering how complex and interconnected today’s technological products and processes are, having a systematic way to identify risks, prioritize them against system goals, and manage them proactively is critical to getting real value out of FMEAs. In my view, systems teams (including reliability engineers) should own FMEAs early in the design cycle, when design decisions are still at the high, system level. Here the focus should be on system-level risks using a top-down approach, which I explore in the next section. As the design matures and details solidify, ownership should shift to design teams, while systems and reliability engineers step back into the role of facilitators to support the process.


TOP-DOWN OR BOTTOM-UP?

This is a sensitive topic for many reliability engineers. In particular, professionals from a defense industry background often advocate for the bottom-up approach and refer to MIL-STD-1629, which was later cancelled by the DoD.

The biggest issue I see with bottom-up is the risk of wasting time on risks that do not matter at the system level within the specific operational and environmental conditions. When individual component teams start working on FMEAs without a clear system view — without high-level functions, constraints, and interfaces — they end up listing every possible failure mode, even those that have little or no impact in the system context. For example, a bearing may have dozens of different failure modes, but whether those modes matter depends entirely on the system. Is the bearing used in a car engine, a kid’s scooter, or a rocket engine pump? Without context, you waste energy analyzing irrelevant details.

To illustrate: imagine a system that consists of 5 components, and each component has 2 functions. Each function has 2 failure modes, and let’s say each failure mode has 3 causes. That alone multiplies to 5 × 2 × 2 × 3 = 60 risks that need to be identified, assessed, and managed. You can see how quickly the numbers grow. The problem is that many of those mechanisms or causes at the lower levels of the physical hierarchy may be irrelevant, or may represent extremely low risks in the context of the actual system being built for that specific environment and use profile. When you follow a bottom-up approach, this kind of noise is unavoidable and can exhaust your resources long before you create any real value.

Figure 1: Failure mode progression – systematic elimination of failure causes that do not need to be carried to lower levels
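The arithmetic in the five-component example is a simple product of branching factors, which is why the row count explodes so quickly. The short Python sketch below checks it; the function name `bottom_up_risk_count` is hypothetical, and it assumes every item branches uniformly and that every branch is carried all the way down, as a strict bottom-up pass would do.

```python
from math import prod


def bottom_up_risk_count(components: int, functions_per_component: int,
                         modes_per_function: int, causes_per_mode: int) -> int:
    """Cause-level rows a full bottom-up FMEA produces when every branch is
    carried down (assumes the same branching factor for every item)."""
    return prod((components, functions_per_component,
                 modes_per_function, causes_per_mode))


# The example from the text: 5 components x 2 functions x 2 modes x 3 causes.
print(bottom_up_risk_count(5, 2, 2, 3))    # 60

# Illustrative only: doubling every branching factor multiplies the
# row count by 2**4 = 16, because the levels multiply rather than add.
print(bottom_up_risk_count(10, 4, 4, 6))   # 960
```

Because each level multiplies the one above it, trimming an irrelevant branch high in the hierarchy removes every row beneath it.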

This is why I prefer the top-down approach. You begin with the big picture, define system-level risks, then work your way down, filtering out what is irrelevant and focusing on the few risks that truly matter — often called the “vital few.”
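A minimal Python sketch of the top-down idea follows. It assumes a hypothetical `Node` structure in which each item (system, function, failure mode, cause) carries a `relevant` flag standing in for the team’s judgement of whether that item matters in this system’s operating and environmental context. The walk starts at the system level and only carries relevant items down to lower levels. The pump breakdown and its flags are invented for illustration and are not taken from any real analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in a hypothetical FMEA hierarchy. `relevant` is an assumed
    stand-in for the team's judgement, not a standard FMEA field."""
    name: str
    relevant: bool = True
    children: list[Node] = field(default_factory=list)


def top_down(node: Node, depth: int = 0, rows: list[str] | None = None) -> list[str]:
    """Walk from the system level downward, expanding only relevant items.
    An irrelevant item is recorded once, but its children are not carried
    down to lower levels."""
    if rows is None:
        rows = []
    marker = "" if node.relevant else "  [not carried down]"
    rows.append("  " * depth + node.name + marker)
    if node.relevant:
        for child in node.children:
            top_down(child, depth + 1, rows)
    return rows


def count_all(node: Node) -> int:
    """Rows a full bottom-up pass would touch: every item, relevant or not."""
    return 1 + sum(count_all(child) for child in node.children)


# Hypothetical coolant pump; noise is assumed not to matter in this installation.
pump = Node("Coolant pump", children=[
    Node("Deliver required flow", children=[
        Node("No flow", children=[Node("Impeller fracture"), Node("Shaft seizure")]),
        Node("Audible noise", relevant=False,
             children=[Node("Bearing cage wear"), Node("Loose mounting")]),
    ]),
    Node("Contain fluid", children=[
        Node("External leak", children=[Node("Seal wear"), Node("Casing crack")]),
    ]),
])

print("\n".join(top_down(pump)))
print(f"top-down rows: {len(top_down(pump))}, bottom-up rows: {count_all(pump)}")
```

In this toy tree the saving is only two rows (10 versus 12), but because pruning happens at the highest level where an item is judged irrelevant, the saving grows with every level that would otherwise have been expanded beneath it.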

In many organizations I have worked at, bottom-up was the default simply because that was the way things had always been done. Changing that mindset was often an uphill battle. But once the benefits of a top-down approach were clearly communicated, it almost always turned into a success.


TIMING

Another important aspect of FMEAs is timing: when should you start?

Think of two extremes. On one end, you wait until the design is frozen and run FMEAs just to document findings. On the other, you start when there is only an idea — during brainstorming and concept trades. Personally, I prefer the second.

The purpose of FMEAs is to identify and manage risks in a structured way, and the earlier this is embedded into the decision-making process, the better. I like doing functional FMEAs in the early stages, when little is known about the physical design, and then refining them as the design matures. The later you start, the more likely FMEAs become a pencil-whipping exercise that adds no value.

In practice, however, reliability engineers are often brought into programs late in the cycle. By then, FMEAs may already have been performed poorly, people are frustrated, and trust in the process is gone. That makes the reliability engineer’s job much harder, as they must first prove the value of FMEAs and often try to salvage existing ones. In some cases, I found it easier to start from scratch rather than trying to fix a broken process. If you hear skepticism about the usefulness of FMEAs, it almost always means the purpose of the process was not well understood, and the common mistakes I described earlier were made.


WHERE TO STOP

Another aspect of FMEAs that I personally struggled with — and where I wasted an incredible amount of energy and resources early in my career — is the question we should all be asking: “How deep in the physical hierarchy should we go?”

To give you an idea, take a pump. Do we stop at the impeller and simply note “impeller breaks,” or do we go deeper and analyze specific crack mechanisms on the impeller surface? Or take a printed circuit board: should we stop at the board level, or continue breaking it down into every resistor, capacitor, and diode?

My rule of thumb is simple: stop at the point where you no longer have meaningful control. If you cannot make design changes at that level, if you lack the data or visibility to properly assess risk, or if the component is entirely sourced from an external supplier whose internal design you cannot influence, then drilling down further only produces paperwork. It adds complexity without making the system any more reliable.
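The stopping rule above can be written as a simple predicate: keep decomposing only while all three conditions for meaningful control hold. The Python sketch below is one way to encode it; `HierarchyItem`, its fields, and `decompose_further` are hypothetical names chosen for illustration, and in practice each flag would come from the team’s engineering judgement rather than from a data model.

```python
from dataclasses import dataclass


@dataclass
class HierarchyItem:
    """Hypothetical description of one item in the physical hierarchy."""
    name: str
    can_change_design: bool   # can the team make design changes at this level?
    has_risk_data: bool       # is there data/visibility to assess risk properly?
    external_black_box: bool  # fully sourced; supplier's internal design not ours to influence


def decompose_further(item: HierarchyItem) -> bool:
    """Rule of thumb from the text: drill down only while the team still has
    meaningful control. Any missing condition means stop at this level."""
    return (item.can_change_design
            and item.has_risk_data
            and not item.external_black_box)


items = [
    HierarchyItem("Purchased PCB", can_change_design=False,
                  has_risk_data=False, external_black_box=True),
    HierarchyItem("In-house impeller", can_change_design=True,
                  has_risk_data=True, external_black_box=False),
]
for item in items:
    if decompose_further(item):
        print(f"{item.name}: analyze the next level down")
    else:
        print(f"{item.name}: stop here; deeper breakdown only adds paperwork")
```

Any single failed condition stops the decomposition. For a purchased item like the PCB, the risk is then managed through performance requirements, testing, and supplier qualification rather than through more FMEA rows.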

The reason this matters is that going too deep drains resources and dilutes focus. I have seen teams spend weeks cataloging resistor-level failure modes in a purchased PCB. It looked impressive on paper, but in practice it contributed nothing to the reliability of the final product. What truly makes a difference in that situation is specifying clear performance requirements, testing effectively, and qualifying suppliers — not listing every possible failure of a resistor you don’t design or manufacture.

There is also the problem of complexity creep. Every function branches into modes, every mode into causes, and soon you are staring at a spreadsheet so large that no one can realistically use it. That is when FMEAs lose credibility and get dismissed as “just compliance paperwork.” By contrast, if you stop at the right level — the highest level where your team can still influence the outcome — you preserve clarity. The FMEA remains lean, credible, and actionable. Most importantly, it directs attention to risks you can actually manage, rather than drowning you in noise that you cannot.


IN SUMMARY

FMEA is just another tool in the Design for Reliability (DfR) toolkit. We should not perform it just for the sake of tradition or compliance. Like any tool, its purpose is to provide information that helps us make better decisions and improve the design.

At its best, FMEA is not about filling out sections or checking boxes. It is about the actions it drives, the insights it uncovers, and the design decisions it informs. That is where the real value lies.


About Ayaz Bayramov

Ayaz Bayramov is the author of the article series Breaking Bad for Reliability.

© 2025 FMS Reliability