The Need to Improve the Reliability Narrative

Last Verified February 29, 2024

Little Compromises and Future Costs

In a recent blog post, Counting beans, Seth Godin talks about the eventual costs of little compromises. The immediate benefit may be celebration worthy, yet

But overlooked are the unknown costs over time, the erosion in brand, the loss in quality, the subtraction from something that took years to add up.

This certainly applies to reliability as well. Deferring maintenance just one more month, deciding that one more software bug can be addressed after shipping, and similar small shifts erode the reliability of your system.

What Happens When Deferring Maintenance is Normal

I live in a small mountain community where we own and operate a water treatment system. A few years ago I had the chance to review the maintenance plan for the system and spotted the main transfer pump was old, leaking, out of balance, and basically in need of TLC (or replacement).

The options included doing nothing, a tear down and replacement of bearings and seals, or complete replacement. The first option received the most discussion, as the pump was still working and the leak wasn’t that bad. The rationalization went on for some time.

The second option, to tear down and replace parts, was eventually adopted and accomplished. We noticed the pump and motor had deteriorated significantly beyond the bearings and seals. Yet, the plan was to put it back into service quickly. We noted that we would have to replace the equipment soon. That was the last action on the main transfer pump. The action to research and buy a replacement pump got deferred.

The pump failed a few months after the minor repairs. It was a hard failure that required a replacement.

We didn’t have a new pump lined up for purchase or on hand. The system operator found a used pump that would work for us, at 3x the price of one purchased without an emergency need.

Again, we noted the need to research and buy a new pump. Again, since the current system was working, the priority remained low enough to thwart any action to prepare for a backup or replacement.

This was with one element of the system. The same behavior surrounded the valves, treatment equipment, and every other element of the system. The normal behavior was to wait till we absolutely had to fix something, then make it happen.

Downtime of the system threatened the water supply for the community. The fix-on-failure behavior cost significantly more than preventative maintenance, yet reducing the risk of downtime and emergency repairs was not seen as valuable enough to act on.

So, preventative maintenance didn’t happen. It cost more to address emergency failures later.

What Happens When Meeting Reliability Goals is Optional

Nearly every product requirement document (PRD) I’ve seen has at least one, often more, statement on the expected durability of the product. When I ask members of the development team about reliability expectations, the answers are often fragmented, such as ‘a 5 year life’, or ‘the warranty period is 2 years.’

Some organizations may even document a complete reliability goal, such as 95% of units will survive 5 years for our customers (see xx for details on customer use and environments, see section xx for success criteria).
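As a rough illustration of what a complete goal makes possible, the sketch below converts ‘95% survive 5 years’ into an implied failure rate and checks it against a 2 year warranty. It assumes a constant failure rate (exponential life model), which is an assumption made here for simplicity and not part of the goal itself. The function names are illustrative only.

```python
import math


def failure_rate_from_goal(reliability, years):
    """Failure rate per year implied by a reliability goal.

    ASSUMPTION: constant failure rate (exponential model), so
    R(t) = exp(-rate * t) and rate = -ln(R) / t.
    """
    return -math.log(reliability) / years


def reliability_at(rate, years):
    """Fraction of units expected to survive to `years` under the same model."""
    return math.exp(-rate * years)


# The goal from the text: 95% of units survive 5 years.
rate = failure_rate_from_goal(0.95, 5)
print(f"Implied failure rate: {rate:.4f} per year")
print(f"Implied mean time between failures: {1 / rate:.1f} years")
print(f"Expected survival at a 2 year warranty: {reliability_at(rate, 2):.3f}")
```

Under this assumption, the ‘5 year life’ and ‘2 year warranty’ fragments become two points on the same curve, which is one way to tie the scattered statements together. Products with wear-out or early-life failures would need a different model.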

Years ago I worked with a medical device company that had a vague set of reliability statements in their PRD. We changed it to a complete and clearly stated reliability statement. I asked them to tag the goal as a requirement, thus requiring the measurement and meeting of the target reliability before shipping.

They had a formal process to manage requirements, and an informal process to ignore (manage?) goals or objectives.

Of course, the focus is on the requirements. While a few on the team understood the importance of the reliability objectives, they often had to take second priority due to the demands of a documented requirement.

Part of the hesitation around setting reliability requirements was the ability to measure reliability during development. The team had not measured the expected reliability performance before, thus balked at signing up for an unknown obligation.
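One way to size that unknown obligation is a zero-failure (success run) demonstration test. The sketch below is a minimal example, assuming each unit is tested to the equivalent of the full target life and that no failures are allowed. The team in this story did not necessarily use this method, and the function name is hypothetical.

```python
import math


def zero_failure_sample_size(reliability, confidence):
    """Units that must all survive the test to demonstrate `reliability`
    at the given `confidence`.

    ASSUMPTION: success-run (binomial, zero failures) test where each
    unit sees one full target life: n = ln(1 - C) / ln(R), rounded up.
    """
    return math.ceil(math.log(1 - confidence) / math.log(reliability))


# Demonstrating 95% reliability at 90% and at 95% confidence.
for confidence in (0.90, 0.95):
    n = zero_failure_sample_size(0.95, confidence)
    print(f"{confidence:.0%} confidence: {n} units, zero failures allowed")
```

Putting a number on the test effort turns ‘an unknown obligation’ into a cost the team can weigh against the risk of shipping without data.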

We compromised, leaving the reliability goal as an objective. Then we worked to teach the team first how to design in reliability, second how to measure reliability performance, and third how to document the value of meeting or not meeting the reliability targets.

On projects started after that success (especially reliability-wise), the team elevated the reliability objective to a requirement.

On the other hand, organizations that kept the focus on requirements, with little or no attention on improving reliability, failed to meet or improve reliability performance. It was too easy to compromise, delay, or avoid improving their ability and reliability performance.

At the final design reviews, you either have measured the reliability performance or haven’t, and your team still needs to decide to start production or not. If it is common that you only have a vague guess or strong opinion (no data to support) then that is what you get used to having available to make a decision.

The team will make decisions with or without the right information, and without good reliability information, the risk of significant field failures is high.

Create the Narrative that Supports the Value of Reliability

Both the water system and medical device teams desired flawless operation of their respective systems over time. They easily expressed the desire for a reliable system.

Both, though, faced hurdles that prevented them from taking action to achieve their reliability aspirations.

They didn’t hold themselves accountable to achieve reliability performance.
They lacked the know-how to measure expected or actual reliability performance.
They became accustomed to working with vague opinions as ‘good enough’ to make decisions concerning reliability.
The narrative they had included the notion that ‘if it ain’t broke, don’t fix it’.

The reliability of your system occurs in the future. Actions, design decisions, and preventive steps take investment today, and the payback is gradual, boring, and in the future.

An emergency shut-down or product recall is exciting and right now.

Changing the focus from ‘right now’ to investing for the future is the key. The idea is to quantify the value or benefits of a reliable system. What happens, actually happens (maintenance costs, downtime, warranty expenses, brand loyalty, etc.) if you hit your reliability requirements? Write it down and make it visible.

If the organization is unwilling to change goals to requirements, then what is holding them back? Is it the ability to measure reliability? Then get busy with education, accelerated testing, reliability modeling, etc. Is it the ability to measure the value of a design change or the ability to quantify the value of critical spares? Build the case and make it public.
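Building the case can start with simple arithmetic. The sketch below compares replacing a pump on plan against deferring and risking an emergency replacement. Only the 3x emergency price multiplier comes from the water system story; the purchase price, failure probability, and downtime cost are hypothetical placeholders to replace with your own figures.

```python
def expected_costs(planned_cost, emergency_multiplier, p_fail, downtime_cost):
    """Return (replace_now, defer) expected costs for one planning period.

    ASSUMPTION: the equipment is replaced either way; deferring means it
    is replaced on failure (at the emergency price, plus downtime) with
    probability p_fail, and on plan otherwise.
    """
    replace_now = planned_cost
    emergency_cost = planned_cost * emergency_multiplier + downtime_cost
    defer = p_fail * emergency_cost + (1 - p_fail) * planned_cost
    return replace_now, defer


# Hypothetical inputs, except the 3x multiplier taken from the pump story.
replace_now, defer = expected_costs(
    planned_cost=10_000,      # hypothetical planned purchase price
    emergency_multiplier=3,   # emergency purchase cost 3x in the story
    p_fail=0.4,               # hypothetical chance of failure this period
    downtime_cost=5_000,      # hypothetical cost of the outage itself
)
print(f"Replace on plan: {replace_now:,.0f}")
print(f"Defer (expected): {defer:,.0f}")
print(f"Expected penalty for deferring: {defer - replace_now:,.0f}")
```

Even with rough inputs, a table of these numbers for each critical component makes the cost of ‘wait until it breaks’ visible in a way a vague opinion does not.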

Celebrate failures, document and share the value of improvements, and become relentless in keeping reliability part of the discussion. Enable your team to think about reliability, to balance priorities including reliability, and to have the necessary information to make informed decisions.

Have you transformed an organization from ‘worry about reliability later’ to ‘reliability is worth the investment’? If so, what steps did you take to make it happen? Add a comment and share your advice so others can support their organizations in a similar manner.

About Fred Schenkelberg