There are a lot of reliability tools.
From FMEA to FTA, from ALT to HALT, from derating to sneak circuit analysis. We also have a lot of acronyms. We cannot afford to do all the tasks, so which do we select and why?
Each activity has some reason for existing. Each has some question that it helps answer. HALT helps to find what will fail. ALT helps to determine when failures may occur.
Knowing what each tool is capable of doing is a start. Knowing what you need to know is essential.
Purpose of a reliability plan
Consider the purpose of a reliability plan.
You are either proposing a plan or have been tasked with creating one. The plan is a guide to the sequence of tasks to accomplish. Some reliability activities have a long lead time, such as ordering custom parts for testing. Some activities take time to accomplish, such as performing detailed optimization studies.
Our team needs to know how many samples to prepare or the expected duration of the study. These examples are the practical elements of performing tasks, not the reason the tasks should be accomplished.
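As one illustration of these practical elements, the sample-size question is often answered with the success-run relationship n = ln(1 - C) / ln(R). This gives the number of units that must all pass a test to demonstrate reliability R at confidence C. The sketch below assumes a zero-failure test and independent units. The function name is hypothetical.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units needed, with zero failures allowed, to demonstrate `reliability`
    at `confidence` using the success-run relationship n = ln(1 - C) / ln(R)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    n = math.log(1.0 - confidence) / math.log(reliability)
    # Round up: a fractional unit cannot be tested.
    return math.ceil(n)


if __name__ == "__main__":
    # Demonstrate 90% reliability at 90% confidence.
    print(success_run_sample_size(0.90, 0.90))  # 22
```

Tightening the target to 95% reliability at the same confidence raises the count to 45 units. A constrained plan has to weigh exactly this kind of trade-off against the knowledge the test provides.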
The purpose of the reliability plan is to answer questions or create information.
Early in a project to create a system, we need to know what the team should accomplish to produce a reliable system. We generally want to accomplish some business- or customer-related objective. Fewer field failures or increased uptime may be broadly stated objectives. The reliability plan is the list of tasks and events that enable the team to understand reliability risks and accomplishments well enough to make the right decisions during the development process.
The plan trades off specific tasks for the knowledge we need to accomplish our goals.
Constraints shape the plan
No one has an unlimited budget or time to fully understand all the risks or accumulate perfect knowledge concerning the system’s reliability performance.
The project may have budget, prototype, time, or other limitations. We should not propose a 6-month life test when the team needs a life estimate in 2 months. We do not need to conduct HALT to find potential failure modes when we are working to reduce a long list of field failures that are already occurring.
How are you going to spend your last dollar or prototype?
Constraints help us focus on what is important. Here I am suggesting that ‘important’ means the knowledge gained from the task, not the task itself.
Challenge each element of your plan
In general, a reliability plan consists of goals, risks, and evaluation.
Having a reliability goal for the cooling fan provides a guide for purchasing a suitable component and evaluating the risk of untimely fan failures. The plan consists of elements that help the team clearly understand the reliability goals, the risk that uncertainty or variability will prevent achieving those goals, and regular feedback on how well the design will accomplish the goals.
Specifically select tasks that will move the design and the decisions concerning the design towards the objectives. Select tasks that are connected to specific decisions.
For example, a development project may include a design freeze milestone. The team fixes most of the components and the layout and moves to building prototypes. Derating and stress/strength analysis are tools to assist in the selection of components that minimize the failure rates of those components under the expected stresses. These two practices provide guidelines and, when used well, the ability to trade off different component capabilities and costs to find an appropriate balance. The decision is component selection, and these specific tools provide the reliability knowledge that informs it.
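A minimal sketch of how derating and stress/strength analysis can both feed the component-selection decision, assuming independent, normally distributed stress and strength. Reliability is estimated with the classic interference formula R = Φ((μ_strength - μ_stress) / sqrt(σ_strength² + σ_stress²)), and the derating check compares nominal applied stress to the part rating. The part names, costs, the 0.6 derating limit, the 0.999 reliability target, and all other numbers are hypothetical placeholders, not recommended values.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def normal_cdf(z: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_reliability(mu_strength: float, sd_strength: float,
                             mu_stress: float, sd_stress: float) -> float:
    """P(strength > stress) for independent, normally distributed strength and stress."""
    z = (mu_strength - mu_stress) / math.sqrt(sd_strength**2 + sd_stress**2)
    return normal_cdf(z)


@dataclass
class Candidate:
    name: str           # hypothetical part identifier
    cost: float         # unit cost, arbitrary currency
    rated: float        # rating for the stress of interest (e.g. volts)
    mu_strength: float  # mean strength
    sd_strength: float  # standard deviation of strength


def select_component(candidates: Sequence[Candidate], mu_stress: float, sd_stress: float,
                     max_stress_ratio: float, target_reliability: float) -> Optional[Candidate]:
    """Cheapest candidate that meets both the derating limit and the reliability target."""
    feasible = [
        c for c in candidates
        # Derating check: nominal applied stress as a fraction of the rating.
        if mu_stress / c.rated <= max_stress_ratio
        # Stress/strength check: probability that strength exceeds stress.
        and interference_reliability(c.mu_strength, c.sd_strength,
                                     mu_stress, sd_stress) >= target_reliability
    ]
    return min(feasible, key=lambda c: c.cost, default=None)


if __name__ == "__main__":
    parts = [
        Candidate("A", cost=0.10, rated=25.0, mu_strength=30.0, sd_strength=3.0),
        Candidate("B", cost=0.15, rated=35.0, mu_strength=40.0, sd_strength=4.0),
        Candidate("C", cost=0.25, rated=50.0, mu_strength=60.0, sd_strength=5.0),
    ]
    # Hypothetical guideline: keep applied stress at or below 60% of rating,
    # and require at least 0.999 probability that strength exceeds stress.
    choice = select_component(parts, mu_stress=20.0, sd_stress=2.0,
                              max_stress_ratio=0.6, target_reliability=0.999)
    print(choice.name if choice else "no feasible part")  # B
```

Raising the derating limit to 0.8 would admit part A on the derating check, yet its interference reliability (about 0.997) would still fall short of the 0.999 target. The two analyses answer different questions about the same decision.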
When selecting elements for your reliability plan consider:
- Does the task create information necessary for a decision?
- Does the task reduce uncertainty related to a decision criterion?
- Does the task answer a question clearly and in a timely manner?
In the unfortunate event that a customer demands a set of tasks that may or may not be useful, you may have to accomplish the tasks to meet the customer-imposed requirements. In this case, what can we salvage from the list of tasks that is useful for decision-making? If we have to accomplish a fixed list of tasks, what can we learn and use to make decisions?
The plan is a guide for the meetings, discussions, and decisions that the team has to make during the development process.
If we focus on what we need to know and when, we significantly increase the ability of the team to create a reliable product.
Hilaire Perera says
Implementing a reliability program is not simply a software purchase; it is not just a checklist of items that, once completed, will ensure you have reliable products and processes. A reliability program is a complex learning and knowledge-based system unique to your products and processes. It is supported by leadership, built on the skills that you develop within your team, and integrated into your business processes.
A reliability program plan is used to document exactly what “best practices” (tasks, methods, tools, analysis, and tests) are required for a particular (sub)system, as well as clarify customer requirements for reliability assessment. For large-scale complex systems, the reliability program plan should be a separate document. Resource determination for manpower and budgets for testing and other tasks is critical for a successful program. In general, the amount of work required for an effective program for complex systems is large.
A reliability program plan is essential for achieving high levels of reliability, testability, maintainability, and the resulting system availability. It is developed early during system development and refined over the system’s lifecycle. It specifies not only what the reliability engineer does, but also the tasks performed by other stakeholders. A reliability program plan is approved by top program management, which is responsible for allocating sufficient resources for its implementation.
Ash says
Thank you for posting this article, it is really helpful. I had a question: ‘Do you think the forecast of reliability would be accurate if we measure reliability of preceding activities relative to the activity of installation?’
Fred Schenkelberg says
Hi Ash,
Activity by itself is not an indicator of eventual field reliability performance. Some organizations do a tremendous number of activities (testing, modeling, simulations, FMEAs, ALTs, etc.) and have lousy field performance, while others do very few ‘reliability activities’ by comparison and have excellent field performance. The difference is why the activity is done and how the results of the activity influence the decisions that impact field performance.
Our work is not simply to do stuff; it is to do the right stuff, the stuff that makes the most difference toward achieving the reliability goals.
I may be missing what you intend to measure, so please clarify if I got it wrong.
cheers,
Fred