Accendo Reliability

by Doug Plucknette

Could Delta Airlines Have Missed Some Hidden Failures?

Today brings yet another example of why it’s important to understand the failure modes that make your system vulnerable to complete shutdown. Delta Airlines is learning this lesson the hard way after having to inform customers around the world that all of its flights would be on hold or even canceled due to a “system-wide outage.”

Delta listed the cause of the outage as a power failure near its worldwide headquarters in Atlanta, Georgia, while Georgia Power believes it was the failure of Delta’s own equipment that caused the power outage.

While each company points the finger at the other, the reality is Delta’s customers around the world are sitting at airports or at home wondering when the problems will be resolved and when Delta will be able to accommodate their travel needs.

The irony of it all is that it didn’t have to happen. The industry that, better than any other, has shown the world the importance of developing a maintenance strategy by assessing all reasonable and likely failure modes has apparently never applied the tools it used to make aircraft reliable to its computer systems. A thorough analysis by a team of system experts from Delta and Georgia Power would, with a high degree of certainty, have discovered and discussed the failure mode responsible for today’s outage. On top of that, the team would have recommended a strategy to address or mitigate the failure and ensure continued coverage.

So what happened?

How could one of the world’s largest air carriers find itself grounding every flight around the world and, as the hours passed, have no reasonable answer for customers as to why their flights had been grounded or canceled and when it expected to return to service?

My guess is that someone who had little understanding of the importance of hidden failures convinced Delta management that the redundant systems they have in place would ensure continued service regardless of what failure might occur. For those who don’t work in the field of maintenance and reliability, it is as if someone convinced them they never had to worry about the brakes on their car failing because they have an emergency brake. And while it is true that the emergency brake is a back-up, if you never test it to make sure it works properly, it might not work when you need it. As I like to tell my customers, “Redundancy builds complacency.” Don’t ever lull yourself into believing that because you have a back-up nothing bad can ever happen.
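To put rough numbers on the emergency-brake point, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of anything Delta or Georgia Power actually did. It assumes the back-up fails at a constant rate, that its failures are hidden until a test or a real demand reveals them, and that each test finds and fixes any failure. The 20-year back-up MTBF, the one primary power loss every two years, and the 10-year "never tested" horizon are all made-up values, and the function names are hypothetical.

```python
import math


def average_unavailability(mtbf_years: float, test_interval_years: float) -> float:
    """Average fraction of time a hidden back-up is failed without anyone knowing.

    Assumes a constant failure rate (1 / MTBF) and a test at the end of each
    interval that fully reveals and repairs any failure.
    """
    x = test_interval_years / mtbf_years
    # Mean of 1 - exp(-t / MTBF) over one test interval; close to x / 2 when x is small.
    return 1.0 - (1.0 - math.exp(-x)) / x


def expected_outages_per_year(demands_per_year: float,
                              mtbf_years: float,
                              test_interval_years: float) -> float:
    """Expected multiple failures per year: primary lost while back-up is already dead."""
    return demands_per_year * average_unavailability(mtbf_years, test_interval_years)


# Hypothetical inputs, chosen only to show the trend.
BACKUP_MTBF_YEARS = 20.0
PRIMARY_LOSSES_PER_YEAR = 0.5

scenarios = [
    ("never tested (10-year horizon)", 10.0),
    ("tested yearly", 1.0),
    ("tested monthly", 1.0 / 12.0),
]

for label, interval in scenarios:
    unavailability = average_unavailability(BACKUP_MTBF_YEARS, interval)
    outages = expected_outages_per_year(PRIMARY_LOSSES_PER_YEAR, BACKUP_MTBF_YEARS, interval)
    print(f"{label:32s} back-up dead {unavailability:6.2%} of the time, "
          f"about {outages:.4f} full outages per year")
```

The absolute numbers mean nothing outside these assumptions. What matters is the trend: the longer a back-up goes untested, the larger the share of time it sits failed and unnoticed, and the more likely it is to be unavailable when the primary system is lost.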

What to Do

While the airline that, until today, had one of the top records for customer satisfaction as well as on-time departures and arrivals looks for answers, it’s a good time to think about your own company. Are your systems vulnerable to the same type of failure? What are the potential consequences to your business should a complete system failure occur? If the answers are as bleak as those faced by Delta Airlines today, take a tip from someone who has been helping companies mitigate failures for two decades: find yourself a great facilitator, put your team of experts together, and find and mitigate the failure modes before they occur!

As usual, I’m interested in your feedback on this story. Has your company ever suffered a similar event? Have you performed FMEA or RCM on your computer systems? If so, what were some of the tasks implemented to mitigate the failure modes that would result in a system-wide shutdown? And, maybe most fun of all, if you were impacted by this event, what did you have to do to make it to your destination?

