This is our second article about the 3 ways to do reliability allocation. In the first article, we set the scene. We talked about the reliability design cycle that needs to be implemented to make sure what we do will actually work. Will actually matter. In this article, we go through the six steps of reliability allocation. You need to do the preparation work first. But … we are all about keeping it simple. Making it exhaustive and complicated means you are wasting your time.
If you want to learn more about a straightforward approach to reliability allocation – read this!
Let’s Recap
In our first article, we set the scene for reliability allocation. Instead of focusing on reliability allocation straight out of the gate, we need to get our reliability design cycle in place. Reliability allocation is part of this cycle.
If we don’t get that cycle in place, reliability allocation will simply fail. Always. But once our reliability design cycle is implemented, there are six key steps of reliability allocation. We will go through them now.
Step 1 – Identify Customer Requirements
Go through the FULL and thoughtful process of specifying customer reliability requirements.
Getting this right is the subject of an entire article, lesson or even course! So we won’t go through it in too much detail here. Don’t forget how important this is. And don’t rely on the customer to tell you what they are expecting.
Let’s say that you are designing a new, high-speed photo printer for commercial print shops. The sorts of shops that high-end photographers use to print out top quality images for clients. These shops won’t release white papers or tender specifications about what they are expecting in terms of reliability. They will just follow the market leaders in terms of price and quality.
And to make matters worse in this example, your competitors may have conditioned your future customers to expect reliability performance to be measured in terms of MTBF – which means virtually nothing for actual customer expectations. It also means next to nothing for warranty costs and other things that actually matter.
So fill in the gaps. If your marketing team knows that your customers will go through technological refreshes every 5 years – then that might be your service life. So your reliability requirement might be in terms of the five-year warranty. Or it could be the three-year warranty reliability (… in fact, you pretty much always need to have a warranty reliability goal – your business plan depends on it). You might want to have a six-month reliability goal to make sure your design team focuses on eliminating as many infant mortality or dead on arrival (DOA) failures as possible. You might have all three. And remember, we will be going through how to deal with multiple reliability requirements later in this article.
But let’s say that instead of creating a printer, you are creating a medical device. And you speak with your marketing, business and support teams to work out that the customer expects:
a medical device with a reliability of no less than 95 percent after a three-year service life when used in accordance with specified use cases.
This doesn’t mean that our product can afford to be this (un)reliable.
What? (… I hear you say). If our product meets the customer requirements, how is this in any way bad? Well – it’s not bad. But we may need additional system reliability goals to ensure we meet our customer requirements.
Step 2 – Establish System Reliability Goals
Let’s say that you, like many other organizations, need your product to pass some sort of reliability demonstration test. A prototype of your system or components needs to be built and forced to fail on a testbed. Repeatedly. Now because failure is a random process, filled with uncertainties, there is a perpetual and unchallengeable statistical reality you can’t avoid if you need to demonstrate reliability:
your product must exceed the test requirement to have a reasonable chance of passing a reliability demonstration test.
This is because we can only ever estimate reliability. Which means we need to account for this uncertainty by aiming not just to meet the requirement, but to exceed it. If you do more (or perhaps smarter) testing, you can reduce this uncertainty. Which means you don’t have to exceed the requirement as much. But you still need to exceed it. And in reliability circles, the extent to which reliability exceeds a requirement for the purpose of reliability demonstration testing is based on something called the ‘discrimination ratio.’
So let’s say our requirement is a reliability of 95 percent – or a 5 percent probability of failure. If we have a discrimination ratio of 2, then we divide our failure probability by 2. Now our system level reliability goal is 97.5 percent. If the discrimination ratio is 3, then our system level reliability goal is 98.33 percent. And our system level reliability goal drops to 96.67 percent if our discrimination ratio is 1.5.
Sometimes reliability demonstration testing focuses on the MTBF. So a discrimination ratio of 2 means we double the MTBF requirement to determine our system level goal and so on.
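The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming (as the worked numbers do) that the discrimination ratio divides the allowed failure probability or multiplies the MTBF requirement. The function names are made up for this example.

```python
def system_reliability_goal(required_reliability, discrimination_ratio):
    """Divide the allowed failure probability by the discrimination ratio."""
    if not 0.0 < required_reliability < 1.0:
        raise ValueError("required_reliability must be between 0 and 1")
    if discrimination_ratio < 1.0:
        raise ValueError("discrimination_ratio must be at least 1")
    allowed_failure_probability = 1.0 - required_reliability
    return 1.0 - allowed_failure_probability / discrimination_ratio


def system_mtbf_goal(required_mtbf, discrimination_ratio):
    """Multiply the MTBF requirement by the discrimination ratio."""
    return required_mtbf * discrimination_ratio


if __name__ == "__main__":
    # The three discrimination ratios used in the text, for a 95 percent requirement.
    for ratio in (1.5, 2, 3):
        goal = system_reliability_goal(0.95, ratio)
        print(f"discrimination ratio {ratio}: system goal {goal:.2%}")
```

Running it reproduces the 96.67, 97.5 and 98.33 percent goals quoted above.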
Selecting the right discrimination ratio is the subject of another lesson. But it is influenced by how much testing you allow yourself to do. If you only want to do a little bit of testing, then you have more uncertainty and you need to have a higher discrimination ratio. Which means your product needs to exceed the reliability requirement by more. So it is always a balance.
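To see why the goal has to sit above the requirement, consider one simple test plan. It is an assumption for illustration, not something this article prescribes: a zero-failure ‘success run’ test, where a number of units are each tested for the full service life and the test passes only if none of them fail. The 90 percent confidence level and the function names below are also assumptions.

```python
import math


def zero_failure_sample_size(required_reliability, confidence):
    """Units needed to demonstrate the requirement when no failures are allowed."""
    return math.ceil(math.log(1.0 - confidence) / math.log(required_reliability))


def probability_of_passing(true_reliability, sample_size):
    """Chance that every tested unit survives, assuming independent units."""
    return true_reliability ** sample_size


if __name__ == "__main__":
    # Assumed plan: demonstrate 95 percent reliability with 90 percent confidence.
    n = zero_failure_sample_size(0.95, 0.90)
    print(f"units on test: {n}")
    for true_reliability in (0.95, 0.975, 0.9833):
        chance = probability_of_passing(true_reliability, n)
        print(f"true reliability {true_reliability:.2%}: pass chance {chance:.0%}")
```

Under these assumed numbers the plan needs 45 units. A product that only just meets 95 percent passes about one time in ten, while one built to 97.5 percent passes about one time in three. More testing, or a plan that tolerates some failures, changes these odds. That is the balance described above.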
But there is another reason why you need to establish a system reliability goal that exceeds the user requirement. And that is because users will almost certainly use your product in unintended ways. So the apparent fielded reliability may not meet expectations if you don’t allow for this.
It is up to you to invest effort into getting your use cases as good as they can possibly be in advance. But all this does is limit how much your customers and users can surprise you. You will never completely eliminate these surprises. And use cases are the subject of another lesson.
Which means that your system reliability goal – the one you ask your design team to work toward – will always need to exceed the customer requirement.
Step 3 – Reliability Design Margin
Don’t confuse the difference between your customer requirement and your system reliability goal with ‘margin.’ Margin is the extent to which a performance characteristic exceeds your goal for that characteristic. It means unexpected events or uncertainties in things like operating environments and manufacturing quality don’t cause failure. Most of the time.
Design margin accommodates uncertainty in the design process. We clearly don’t know what our final design will be at the start of product development. That is where our design uncertainty comes from. We need to have a design margin at the start to accommodate the actual solutions each design team comes up with.
And our system reliability goal is a performance level we must achieve. It is based on both our customer requirements and the more stringent system reliability goals we need to achieve to get there.
So what happens if we don’t have any design reliability margin? Well, let’s think about what would happen to our medical device. Instead of including a design reliability margin, we allocate reliability goals to each of the design teams for our nine components such that if they are all precisely achieved, we will precisely meet our system reliability goal. Let’s also assume in this example that eight of our components precisely meet their allocated reliability goals. But the ninth component’s design team comes up against a lot of problems. Perhaps the new technology they were working on was more problematic than expected. Perhaps the materials needed to be of such high quality that they became too expensive.
So what can you do as the design team lead in this case? Well … not a lot. You will need to find which of the other eight components can afford to be more reliable than they currently are. But you probably only found out about the problem with the ninth component toward the end of the product’s development process. Which means that the other eight design teams have almost finished their designs. And they were meeting their allocated reliability goals – so they weren’t aware of any problems.
Now, opportunities to further improve the individual reliabilities of the eight ‘good’ components will likely involve expensive rework. Expensive because we are at the end of our intended design phase. And of course – you need to essentially guess which component is the most effective candidate for reliability improvement.
Now let’s go back in time and start with a system reliability design margin. This means all the reliability goals allocated to all nine components become higher from day one. But there will be some design teams that can either reach these higher goals or at least get partway there. Which gives you reliability performance to play with!
When the teams that will inevitably struggle to meet their allocated reliability goals ask for help – you have scope to relax their goals without affecting any other design team.
Those components that have the capacity to be more reliable are naturally identified by including margin from the start. No expensive rework required. So there are no schedule delays. No crises.
And perhaps the best bit – you may actually create the most reliable product on the market! You gave your teams more challenging goals – so who’s to say they won’t all rise to the occasion? All because you have thought about reliability design margin. You forced your design teams to be aspirational. Which is a good thing.
There is no magic formula for working out what the reliability design margin should be. If your product is more novel or developmental, then there would be more uncertainty in the design process. New technologies always involve aspects that aren’t predictable. So you would want a bigger reliability design margin.
On the other hand, if your product is based on mature technologies AND your design teams are experienced, then you can afford a smaller reliability design margin.
If there is any doubt, allocate more reliability design margin than you intuitively think. You have plenty of scope to ‘give’ some or all of it back during product development. For our medical device example, let’s say we keep one-third of the residual ‘unreliability’ once we have determined our system reliability goal.
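One way to read ‘keep one-third of the residual unreliability’ is sketched below. The system reliability goal leaves a certain allowed failure probability, one-third of it is held back as margin, and the remaining two-thirds is what gets allocated to the components. This reading, the 97.5 percent starting goal (the discrimination ratio of 2 case from Step 2) and the function name are all assumptions for illustration.

```python
def system_design_goal(system_reliability_goal, margin_fraction):
    """Hold back a fraction of the allowed failure probability as design margin."""
    if not 0.0 <= margin_fraction < 1.0:
        raise ValueError("margin_fraction must be at least 0 and less than 1")
    allowed_failure_probability = 1.0 - system_reliability_goal
    # Only the part not held back as margin is available for allocation.
    allocated_failure_probability = allowed_failure_probability * (1.0 - margin_fraction)
    return 1.0 - allocated_failure_probability


if __name__ == "__main__":
    # Assumed inputs: a 97.5 percent system goal with one-third held as margin.
    design_goal = system_design_goal(0.975, 1.0 / 3.0)
    print(f"system reliability design goal: {design_goal:.2%}")
```

With these assumed inputs, the design goal handed to the teams comes out at 98.33 percent, leaving room to relax individual goals later without dropping below 97.5 percent.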
Step 4 – Create a Preliminary Functional Series Design
What does this mean? Don’t assume you need redundancy. Aircraft obviously must have two wings. And sometimes regulations may require aircraft to have redundant engines. But beyond that – don’t start with redundant components. That comes later (perhaps). And reliability allocation helps you work out if and when you need redundancy.
For our medical device, we can represent or model its reliability using a simple series fault tree where each of the nine component failures is a ‘basic event.’
A detailed description of fault trees and system reliability modeling is beyond the scope of this article. But hopefully, you get the idea.
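For a pure series system, the system only works if every component works. If we also assume the nine component failures are independent (an assumption this article does not state explicitly), system reliability is simply the product of the component reliabilities. A minimal sketch with made-up component values:

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of a series system with independent component failures."""
    return math.prod(component_reliabilities)


if __name__ == "__main__":
    # Hypothetical values: nine components, each 99.8 percent reliable.
    components = [0.998] * 9
    print(f"system reliability: {series_reliability(components):.4f}")
```

Even with every component at 99.8 percent, the system comes out at about 98.2 percent, lower than any single component. This is why allocated component goals always end up higher than the system goal.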
The reason you don’t want to start with redundant components is that you eliminate flexibility by making assumptions about what your ‘problem areas’ will be. Leave this up to the individual design teams. You may be able to put restrictions on which teams have the scope to incorporate redundancy. Motor vehicles can’t have a second redundant engine – that is not practicable. But there may be scope to have redundant sensors to detect obstacles while reversing.
Unleash the creativity of your teams. Let them work out if they need to incorporate redundancy to meet their system reliability goals.
Step 5 – And Now … We Allocate Reliability
There is a single equation you need to know.
$$R_{i(DG)}=\sqrt[\frac{\sum_{i'}{a_{i'}}}{a_i}]{R_{Sys(DG)}}$$
where $$R_{i(DG)}$$ is the allocated design reliability goal for the ith component, $$R_{Sys(DG)}$$ is the system reliability design goal, and $$a_i$$ is what we call the ‘allocation factor’ for the ith component. The sum is taken over the allocation factors of all components.
So what is an ‘allocation factor’? It is a number we assign to each component to help us work out what our allocated reliability goals are. And those components that have higher allocation factors get lower or ‘easier’ reliability goals allocated to them.
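The equation is equivalent to raising the system reliability design goal to the power of each component’s share of the total allocation factor, which is easier to code than a fractional root. The sketch below uses nine made-up allocation factors and an assumed 98.33 percent system design goal. Neither comes from this article, and the function name is hypothetical.

```python
import math


def allocate_reliability(system_design_goal, allocation_factors):
    """Allocate component goals so their product equals the system design goal."""
    if any(a <= 0 for a in allocation_factors):
        raise ValueError("allocation factors must be positive")
    total = sum(allocation_factors)
    # R_i = R_sys ** (a_i / sum(a)) is the same as taking the (sum(a) / a_i)-th root.
    return [system_design_goal ** (a / total) for a in allocation_factors]


if __name__ == "__main__":
    factors = [1, 1, 2, 2, 3, 3, 4, 4, 5]  # hypothetical allocation factors
    goals = allocate_reliability(0.9833, factors)
    for factor, goal in zip(factors, goals):
        print(f"allocation factor {factor}: allocated goal {goal:.4%}")
    # In a series system the allocated goals multiply back to the system goal.
    print(f"product of allocated goals: {math.prod(goals):.4f}")
```

Two things are worth checking in the output. Components with higher allocation factors receive lower goals, and the product of all nine allocated goals recovers the system design goal.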
And where do allocation factors come from? We will cover that in another article (… don’t worry – it’s pretty easy!). In one sense, working out what the allocation factors are for your components is the most important technical thing you do as part of reliability allocation. But in another more important and practical sense, these allocation factors only give you a starting point that allows you as a design leader to guide your individual teams’ efforts.
Step 6 – Doing Something!
Sorry to leave you hanging – but that is what we talk about in the next article. Once you have your allocated reliability design goals and your reliability design cycle is working well, you will know when you need to do something. The key is working out what that should be!
Till next time!