Accendo Reliability

Your Reliability Engineering Professional Development Site


by Fred Schenkelberg

Failure Happens – It Is What Happens Next That Matters

One of the benefits of reliability engineering is that failure happens.

Everything made, manufactured, or assembled will fail at some point. We want items to last and keep working for as long as we need them. Since failures happen, our work includes dealing with failures.

Not My Fault

Years ago, while preparing samples for life testing at my bench, I heard an ‘eep’ or a startled sound from a fellow engineer. It was quickly followed by an electrical pop noise and a plume of smoke.

Something on the circuit board she was exploring had failed. With a pop and smoke. She didn’t move.

At this point, my initial amused response turned to concern for her safety. She was fine, just startled as the failure was unexpected. She quickly claimed it wasn’t her fault.

It was her design; she selected and assembled the parts, and she was testing the circuit. Yet, it wasn’t her fault. She did not expect a failure to occur (a blown capacitor – which we later discovered was exposed to far too much voltage), thus it was not her fault.

We hear similar responses from suppliers of components. It must have been something in your design or environment that caused the failure, as the failure described shouldn’t have happened. It’s not expected.

Well, guess what, it did happen. Now let’s sort out what happened and not immediately assign blame for whose fault it is.

The ‘not my fault’ response to a failure is not helpful. Some failures are the result of a simple error and are quickly remedied. Others are complex and difficult to unravel. The quicker we focus on solving the mystery of the cause of the failure, the quicker we can move on to making improvements.

Warranty

Possibly in reaction to too many ‘not my fault’ responses, laws now require the manufacturers of products to stand behind their products. If a failure occurs, sometimes only within specific conditions, the customer may ask for a remedy from the supplier.

If failures did not happen, there would be no such thing as a warranty.

A warranty is actually a legal obligation, yet it has turned into a marketing tool. A long warranty implies the product is reliable, and by offering a long warranty, the manufacturer is signaling that it accepts the risk of failure itself.

A repair or replacement is generally not adequate recompense for a failure, yet it provides some restitution. In most cases, it only provides peace of mind if the item doesn’t fail.

The warranty business has become an industry itself. Selling, servicing, and honoring warranties is something that others outside your organization can handle. The downside is that you lose the feedback about failure details that you need to effect improvements. A manufacturer shouldn’t hide behind its warranty policy, nor ignore the warranty claim details. Warranty claims are one way a customer can voice their expectations concerning product reliability. You should listen.

Repair services

My favorite outsourced repair service story involved a misguided payment structure.

If you pay a repairman based on the value of the components replaced, he will likely always replace the most expensive components. If the repair is accomplished by reseating a loose connector, nothing is replaced, and the repairman is not compensated for the diagnostic work and effective repair. If he instead immediately replaces the main circuit board and, in the process, reseats most of the connectors, the repair is fast, effective, and he is handsomely rewarded.

See the problem?
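The incentive can be made concrete with a few lines of arithmetic. This is only a sketch: the part value, the commission rate, and the flat fee are made-up numbers, and `pay_by_parts` and `pay_by_fix` are hypothetical names for the two payment schemes, not anything from an actual service contract.

```python
from dataclasses import dataclass


@dataclass
class Repair:
    description: str
    parts_value: float  # value of the components replaced (assumed figures)


def pay_by_parts(repair: Repair, commission_rate: float = 0.10) -> float:
    """Technician pay tied to the value of replaced components."""
    return repair.parts_value * commission_rate


def pay_by_fix(repair: Repair, flat_fee: float = 50.0) -> float:
    """Technician pay tied to restoring function, whatever was replaced."""
    return flat_fee


def manufacturer_cost(repair: Repair, pay: float) -> float:
    """What the manufacturer spends on the repair: parts plus technician pay."""
    return repair.parts_value + pay


# Two ways to fix the same loose connector (illustrative values only).
reseat = Repair("reseat loose connector", parts_value=0.0)
board_swap = Repair("replace main circuit board", parts_value=400.0)

for scheme in (pay_by_parts, pay_by_fix):
    for repair in (reseat, board_swap):
        pay = scheme(repair)
        cost = manufacturer_cost(repair, pay)
        print(f"{scheme.__name__:12s} {repair.description:28s} "
              f"pay={pay:7.2f} manufacturer cost={cost:7.2f}")
```

Under the parts-based scheme, the correct diagnosis earns nothing and the board swap earns the most, while the manufacturer pays for a board that had not failed. Under the flat fee, the technician has no reason to prefer the expensive repair.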

When a failure occurs, it may be natural to offer a repair service as the remedy. It should be quick (not a two-week wait as with my local cable company to restore a fallen line) and efficient for all parties involved. For the owner of the equipment, we want the functionality restored as quickly and cost-effectively as possible. For the manufacturer of the equipment, we want cost-effectiveness, plus knowledge concerning the failure.

Does your repair service provide for the needs of both parties as well as the repair technician?

Fail safe

Sometimes, when a failure occurs, nothing happens. We might not even notice that the failure occurred. Other times, the product simply goes ‘cold’ or a function is lost. Nothing adverse, no pop or smoke, occurs.

We call this failing safe. It’s more complicated than my simple explanation, yet it is the desired response to a failure. The product itself should not create more damage, cause harm, or place someone in peril. It should fail safely and preferably quietly.

If the ignition key falls from the ignition switch, which may be considered a failure to retain the key within the switch, the driver should not lose control of the vehicle. This is, in part, a safety feature, yet it is also a common expectation that the failure of a system should not create other problems.

Failure containment is related.

How does your product fail? Safely?

Maintenance

For some failures, such as the degradation of lubricants, we perform maintenance. When the brake pads or tire tread wear to a marginally safe level, we replace the brake pad or tire. If we can anticipate the failure pattern, we perform preventive maintenance.

Creating a maintainable piece of equipment is one response to failures. It allows creating complex equipment with failure-prone elements. Through maintenance, we are able to restore the system to operation or avoid unexpected downtime. If failures didn’t occur, we wouldn’t need maintenance.

We have some control over the nature of the maintenance activities. For some types of failures, we can only execute corrective maintenance. For others, we can use preventive methods. The idea is to anticipate and avoid the widest range of failures through effective maintenance practices that remain cost-effective.
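One way to see when preventive methods pay off is to compare the long-run cost per operating hour of replacing an item at a planned age against simply running it to failure. The sketch below uses the classic age-replacement cost model with a Weibull life distribution. The model choice, the Weibull parameters, and the two cost figures are all assumptions made for illustration; none of them come from a real system.

```python
import math


def reliability(t: float, beta: float, eta: float) -> float:
    """Weibull reliability: probability an item survives to age t."""
    return math.exp(-((t / eta) ** beta))


def expected_cycle_length(age: float, beta: float, eta: float, steps: int = 2000) -> float:
    """Mean time between replacements when replacing at failure or at `age`,
    whichever comes first (trapezoid integration of R(t) from 0 to age)."""
    h = age / steps
    total = 0.5 * (reliability(0.0, beta, eta) + reliability(age, beta, eta))
    for i in range(1, steps):
        total += reliability(i * h, beta, eta)
    return total * h


def cost_rate_planned(age, beta, eta, cost_preventive, cost_failure):
    """Long-run cost per hour with planned replacement at `age`."""
    r = reliability(age, beta, eta)
    expected_cost = cost_preventive * r + cost_failure * (1.0 - r)
    return expected_cost / expected_cycle_length(age, beta, eta)


def cost_rate_run_to_failure(beta, eta, cost_failure):
    """Long-run cost per hour with corrective maintenance only."""
    mean_life = eta * math.gamma(1.0 + 1.0 / beta)
    return cost_failure / mean_life


# Assumed values: a wear-out pattern (beta > 1) and a failure that costs
# ten times as much as a planned replacement.
BETA, ETA = 3.0, 1000.0
COST_PREVENTIVE, COST_FAILURE = 100.0, 1000.0

candidate_ages = range(100, 2001, 50)
best_age = min(
    candidate_ages,
    key=lambda a: cost_rate_planned(a, BETA, ETA, COST_PREVENTIVE, COST_FAILURE),
)
best_rate = cost_rate_planned(best_age, BETA, ETA, COST_PREVENTIVE, COST_FAILURE)
corrective_rate = cost_rate_run_to_failure(BETA, ETA, COST_FAILURE)

print(f"planned replacement at {best_age} h: {best_rate:.3f} per hour")
print(f"run to failure: {corrective_rate:.3f} per hour")
```

With a wear-out pattern, planned replacement comes out cheaper in this model. Set `BETA` to 1.0 (a constant failure rate) and planned replacement no longer helps, which is one reason the failure pattern has to be understood before choosing a maintenance approach.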

Adding maintenance practices in response to system failures is not solely the duty of the owner of the equipment. It is also a design function to anticipate the system failures that may occur and devise the appropriate maintenance plan to prevent unwanted failures. The two parties have to work together to make this work well.

Expectations

When I buy a product, I know that some proportion of products like the one I just purchased will fail prematurely. I just do not want mine to fail. My expectation is that the one I select at the store is a good one. It won’t let me down, leave me stranded, or injure me. That is my expectation.

When a failure does occur and I value the functionality the product provides, I will want to restore the unit via repair or replacement, sometimes through a service contract, a warranty, or a repair center. To a large degree, my expectation is that after a failure, all will go well.

As the manufacturer of products, when a failure occurs, your expectations may include learning from the failure to make improvements. Or they should.

We know we cannot anticipate or avoid every failure that may occur. The expectation on both sides is to make robust and dependable products that provide value for all involved. When that approach fails, we fail.

Failure Happens

In response to a failure, it’s how the product, customer, and manufacturer respond that matters. A simple failure can turn into a disaster for all involved. Or the failure can provide insights leading to breakthrough innovations and new opportunities.

It’s how we respond that matters.

How do you respond to failures?

Filed Under: Articles, NoMTBF

About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.

© 2025 FMS Reliability
