Testing in the Flat Part of the Bathtub Curve

Abstract

Chris and Fred discuss a question from one of our listeners about the ‘flat’ bottom of the ‘bathtub curve.’ A bit of background: the ‘bathtub curve’ represents the hazard rate of a system. That is, it describes how likely a product that works ‘now’ is to fail in the next instant. The ‘flat’ part of the curve is often cited as the ‘random’ part of the curve, where failure is caused by environmental overstress conditions. So how do we test for this part of the curve? … but hang on: is the ‘bathtub curve’ even a thing? Is it even relevant to the useful life of a product? If this intrigues you, listen to this podcast!

Key Points

Join Chris and Fred as they discuss the ‘problem’ of testing the flat part of the ‘bathtub curve.’ The bathtub curve is a well-known teaching aid that helps communicate the concept of decreasing, constant and then increasing hazard rates. The hazard rate is based on the probability that a product that is still functioning will fail ‘now,’ that is, in the next short interval of time. So the decreasing part of the bathtub curve, also known as ‘wear-in’ or ‘infant mortality,’ is typically associated with the first part of a product’s life. The increasing part of the bathtub curve, also known as ‘wear-out,’ is typically associated with the end of the product’s life, where it ceases to be reliable enough to be useful. So what about the bit in the middle: the bottom of the bathtub curve, which is often associated with the product’s useful life? How do we test for the ‘reliability’ of this region?
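
The three regions can be made concrete with a small calculation. The sketch below is our own illustration, not a method from the episode: it assumes the Weibull distribution, a common textbook model whose shape parameter `beta` gives a decreasing (`beta < 1`), constant (`beta = 1`) or increasing (`beta > 1`) hazard rate. It also shows that the hazard rate corresponds to the conditional probability of failing in the next short interval given survival so far. The function names and every parameter value are hypothetical, and `bathtub_hazard` simply adds three independent mechanisms to draw the idealized teaching-aid curve, which real data rarely follows.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t): probability a unit is still working at time t (Weibull model)."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """h(t) = f(t) / R(t) = (beta / eta) * (t / eta) ** (beta - 1), valid for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def conditional_failure_prob(t: float, dt: float, beta: float, eta: float) -> float:
    """P(fail in (t, t + dt] | working at t) = 1 - R(t + dt) / R(t).

    For small dt this is approximately h(t) * dt, which is what 'the
    probability that a functioning product will fail now' refers to.
    """
    return 1.0 - weibull_reliability(t + dt, beta, eta) / weibull_reliability(t, beta, eta)


def bathtub_hazard(t: float) -> float:
    """Idealized bathtub: three hypothetical, independent competing mechanisms.

    For independent competing mechanisms the hazards add. All parameter values
    here are made up purely for illustration.
    """
    infant = weibull_hazard(t, beta=0.5, eta=1_000.0)         # decreasing (wear-in)
    random_events = weibull_hazard(t, beta=1.0, eta=5_000.0)  # constant (external events)
    wear_out = weibull_hazard(t, beta=4.0, eta=8_000.0)       # increasing (wear-out)
    return infant + random_events + wear_out


if __name__ == "__main__":
    for beta, label in [(0.5, "decreasing"), (1.0, "constant"), (3.0, "increasing")]:
        rates = [weibull_hazard(t, beta, eta=1_000.0) for t in (100.0, 500.0, 2_000.0)]
        print(f"beta={beta} ({label}):", [f"{r:.2e}" for r in rates])
    for t in (10.0, 1_000.0, 4_000.0, 10_000.0):
        print(f"t={t:>8}: bathtub h(t) = {bathtub_hazard(t):.2e}")
```

Adding hazards this way is only valid if the mechanisms really are independent, which is itself an assumption of the sketch.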

In practice … you really can’t. And here are some of the topics we discuss to explain this:

We don’t really see the bathtub curve in practice. We sometimes see bits of it, but the ‘bathtub curve’ rarely looks like a typical bathtub. We often see ‘ticks’ or ‘checkmarks,’ where the hazard rate decreases and then turns around and immediately starts increasing. Does this mean that the product has no ‘useful life’? Of course not. But a flat ‘useful life’ region is often (incorrectly) assumed to exist for all products, and it rarely does.
The ‘flat’ part is all about randomly occurring external events. The flat part of the bathtub curve, or a constant hazard rate more broadly, typically means we are dealing with failures caused by randomly occurring external stresses. These range from lightning strikes to ‘dirty’ power supplies that destroy electronic components. Such events tend to be the only way things fail at a rate that does not change with age. That is, a 100-year-old product may be just as likely to fail as a 1-day-old product when both are struck by lightning.
So testing the ‘flat’ part of the curve is actually testing for these randomly occurring external events! By extension, to test the ‘flat part’ of the bathtub curve (assuming it exists), you need to characterize the randomly occurring external or environmental events that induce failure. You need to understand how often lightning strikes occur, how often voltage spikes propagate through dirty power supplies, how often ruts in the road are encountered, and so on. So you are not just testing your product … you are testing your environment. And when we write specifications, we tend not to focus on properly characterizing these ‘rare’ events. Regardless, to test for the ‘flat bottom’ of a bathtub curve, you need to focus on the environment as much as anything else.
Accelerating the testing of the flat bottom is even more problematic. You need to either speed up the occurrence or increase the stress of these randomly occurring external events, and you need to know how doing so relates to the probability of failure.
Let’s talk about the first part of the bathtub curve: reliability improves when the ‘configuration’ improves. When things ‘wear-in,’ the hazard rate goes down (the first part of the ‘bathtub curve’). This means that the product is becoming ‘more reliable.’ But this can only occur when the configuration of that product changes. For example, a defective component may cause a failure and then be replaced with a non-defective component. This is why we often equate wear-in, the first part of the bathtub curve, with manufacturing or quality. There is also a statistical trick where ‘substandard’ products fail early, so the surviving population contains an ever-increasing proportion of ‘non-substandard’ products. The population therefore appears to become more reliable as the bad ones fail early and are removed from consideration.
… and this can still be based on ‘wear-out.’ A defect can cause infant mortality, or wear-in, through a ‘wear-out’ failure mechanism. If a surface blemish starts a fatigue crack, the product could fail from ‘fatigue’ within the first week of use. Fatigue is a classic case of wear-out, but here it manifests itself as a ‘wear-in’ failure because the defect makes it occur early in use.
The back part of the bathtub curve is all about degradation and damage. If every use of the product accumulates damage that increases the probability of failure, then it is a wear-out failure mechanism.
… so, returning to testing the ‘flat bottom’ of the ‘bathtub curve’ … what question are you really trying to answer?
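
Two of the statistical ideas in the points above can be sketched in a few lines. This is a hedged illustration under stated assumptions, not a test method from the episode. First, it assumes damaging external events (lightning, voltage spikes and so on) arrive as a Poisson process and each one causes failure with a fixed probability regardless of the product’s age; the resulting hazard is constant and equal to the product of the two. Second, it assumes a population is a mix of ‘substandard’ and good units, each with its own constant hazard. The function names and all numbers are hypothetical.

```python
import math


def shock_hazard(shock_rate: float, p_fail_per_shock: float) -> float:
    """Constant hazard when failures come only from random external shocks.

    Assumes (hypothetically) that shocks arrive as a Poisson process with
    `shock_rate` per hour and that each shock independently causes failure with
    probability `p_fail_per_shock`, whatever the unit's age. Thinning a Poisson
    process this way gives failures at rate shock_rate * p_fail_per_shock.
    """
    return shock_rate * p_fail_per_shock


def mixture_hazard(t: float, frac_weak: float, lam_weak: float, lam_good: float) -> float:
    """Population hazard h(t) = f(t) / R(t) for a mix of weak and good units.

    Each subpopulation has a constant hazard, yet the population hazard falls
    over time because the weak units fail early and drop out of the survivors.
    """
    r_weak = frac_weak * math.exp(-lam_weak * t)            # weak units still working
    r_good = (1.0 - frac_weak) * math.exp(-lam_good * t)    # good units still working
    reliability = r_weak + r_good                            # R(t)
    density = lam_weak * r_weak + lam_good * r_good          # f(t) = -dR/dt
    return density / reliability


if __name__ == "__main__":
    # Hypothetical environment: 2 damaging shocks per 10,000 h, 30% of which cause failure.
    print(f"shock-driven hazard: {shock_hazard(2e-4, 0.3):.1e} per hour at any age")
    # Doubling the shock frequency doubles the hazard, but only if the
    # per-shock failure probability is unchanged under test conditions.
    print(f"accelerated (2x shocks): {shock_hazard(4e-4, 0.3):.1e} per hour")

    # Hypothetical population: 5% weak units (1e-2 per hour), 95% good units (1e-5 per hour).
    for t in (0.0, 100.0, 500.0, 2_000.0):
        print(f"t={t:>7}: population hazard = {mixture_hazard(t, 0.05, 1e-2, 1e-5):.2e}")
```

The first function makes the testing problem explicit: the hazard depends on `shock_rate`, a property of the environment, as much as on `p_fail_per_shock`, a property of the product. The second shows the population hazard falling from about 5.1e-4 per hour toward the good units’ 1e-5 per hour even though no individual unit becomes more reliable.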

Enjoy an episode of Speaking of Reliability, where you can join friends as they discuss reliability topics. Join us as we discuss topics ranging from design for reliability techniques to field data analysis approaches.