Imagine you are asked to assist a design team in determining how best to improve the reliability of a product. You learn that the organization produces a range of point-of-sale (POS) devices, and they have invited you to a meeting with their staff to discuss the product and ways to improve its field reliability.
To help understand the situation, you may have already started to think of a set of questions whose answers would lead to suitable recommendations:
1. What is the current field failure rate?
2. What is the Pareto of field failure mechanisms or modes?
3. What level of field failures is acceptable (i.e., the goal)?
4. How is the product designed with respect to reliability (i.e., design for reliability activities)?
5. What is the current estimate for field reliability based on internal measurements and modeling?
6. What happens when the product fails?
7. What do the failure analysis reports say about the possible causes of field failures?
8. Do field failures match internal testing results?
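The first two questions have concrete answers once return data exists. As a minimal sketch, using entirely hypothetical failure modes and counts (the team in this story had no such data), a field failure rate and a Pareto table might be tabulated like this:

```python
# Hypothetical field return data -- illustrative only.
failure_counts = {
    "power supply": 42,
    "unknown (unit destroyed)": 31,
    "display": 17,
    "keypad": 9,
    "connector": 6,
}
units_shipped = 25_000

total_failures = sum(failure_counts.values())
field_failure_rate = total_failures / units_shipped  # fraction of shipped units failing

# Pareto: rank failure modes by count and track the cumulative share,
# so the few modes driving most failures stand out.
ranked = sorted(failure_counts.items(), key=lambda kv: kv[1], reverse=True)
cumulative = 0
for mode, count in ranked:
    cumulative += count
    print(f"{mode:26s} {count:4d}  {cumulative / total_failures:6.1%}")
```

With data like this, the top one or two modes typically account for most of the failures, which is exactly what directs improvement effort.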
The meeting included the directors of engineering, manufacturing, quality, and procurement, along with a handful of key engineers from those departments. They each provided a brief introduction to their products and reiterated the desire to improve field reliability. You began asking the questions above in an attempt to understand the situation.
At first, there was little response from anyone on the team. Had you hit upon some trade secret? Were you showing your own ignorance by asking such questions?
No; they simply did not know how many units failed in the field, or how. They had made some assumptions about use, environment, and what could possibly go wrong, but they had little evidence of field problems. They had not even talked to anyone about the nature of the field issues.
No feedback = problem
The most interesting part of the product's design was the security feature that destroyed the memory and custom IC when the case was opened or tampering was sensed. "Destroyed" was an accurate description, given the physical damage to the components on the circuit board they showed you. Once the product was assembled and the security system activated, it was nearly impossible to disassemble the unit and conduct a circuit analysis. This made field failures difficult to analyze.
Compound this "feature" with the relatively low cost of the device. These two factors led to a replacement rather than repair strategy for addressing field failures. Furthermore, the failed units were destroyed, as they were deemed to have no value for further study.
One other piece of information that pertained to this search for reliability improvements: the organization had only one customer. Every unit they created went to that customer, who bundled the POS device with inventory, payroll, building security, cash register, and the various other elements a small business may require to operate efficiently. The POS device was only one piece of a larger kit. The customer provided a single point of contact for training, installation, maintenance, and service and support.
The design team derated components and worked closely with procurement, Q&R, and manufacturing to design as robust a product as they could under the cost and other design constraints. They qualified their vendors and conducted a wide range of product testing under a wide range of stresses. They actually did a decent job of creating a reasonably reliable product.
It was just that they did not really know whether any of their assumptions and educated guesses were correct. They really did not know the use environment, the range of expected stresses, or even how often the devices were actually used. They did not know how to relate their internal product design and testing to what would occur with actual use.
Since any fielded unit was destroyed before any failure analysis could be conducted, they did not even have a count of how many units failed for any reason, nor did they have the basic information a Pareto of field failures would provide. They were blind to how the product actually performed. Yet this team had been producing POS devices for over five years, and in terms of sales the devices were relatively successful.
Statistics are important
Without even a count of failures, how did they know they needed to improve reliability? Was this a parts-per-million improvement, or a 20% field failure rate attributable to a first-year product introduction? No one really knew. They were told to make the product more reliable because it was impacting warranty costs.
Warranty costs are something tangible that you can analyze. How much were they paying in warranty? What was the warranty cost per unit shipped? Again, no one had answers to these questions.
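These warranty questions reduce to simple arithmetic once the inputs exist. A sketch with made-up numbers (the actual figures were never disclosed in this story):

```python
# All figures are hypothetical -- the point is the metric, not the values.
annual_warranty_spend = 1_200_000.00   # total warranty payout, USD per year
units_shipped_per_year = 150_000

# Warranty cost per unit shipped: a per-unit figure that can be compared
# directly against unit price and margin during design trade-offs.
warranty_cost_per_unit = annual_warranty_spend / units_shipped_per_year
print(f"Warranty cost per unit shipped: ${warranty_cost_per_unit:.2f}")
# With these inputs: $8.00 per unit
```

Tracked over time, this single number would have given the team the feedback signal the CFO was withholding.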
The Director of Engineering then spoke up and tried to explain the situation. Remember: they have one customer for all their products. Once a year, the Chief Financial Officer (CFO) and the customer sit down to discuss pricing, warranty, and sales projections. It was the CFO who asked for reliability improvements. It was also the CFO who, although he had the warranty and field failure information, was not sharing it, as he considered it company-sensitive. The CFO did not discuss the magnitude of the field issues with anyone, even in his own office. He provided no information except to insist that they "make it better."
Lesson: Set goals and measure performance
At this point, you would likely be rather frustrated and at a loss for what to recommend. Surely, no organization should be so blind to how its product is performing.
After some thought and further discussion, you and the directors decide on two courses of action. First, you would talk to the CFO and attempt to understand the field failure situation by explaining the importance of the information to the rest of the team. Second, the team would conduct a series of HALTs (highly accelerated life tests) to uncover the design's weaknesses; to conduct HALTs effectively, you need to know the types of stresses the product will experience. In parallel with this testing, the team would attempt to fully characterize the use environment and use profiles through surveys, field observations, and questionnaires. Any process operates better when there is a clear goal and a measure of performance. The comparison of the goal and the measure provides the feedback that enables design or process improvement.
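The closing point, comparing a measured value against a goal to produce feedback, can be sketched in a few lines. The function name, rates, and status strings here are all illustrative assumptions, not anything from the case itself:

```python
def reliability_feedback(measured_rate: float, goal_rate: float) -> str:
    """Compare a measured field failure rate against the goal rate.

    Returns a simple status string that could drive a design review:
    either the goal is met, or the gap is quantified for prioritization.
    """
    if measured_rate <= goal_rate:
        return "meeting goal -- maintain current design and monitoring"
    gap = measured_rate - goal_rate
    return f"above goal by {gap:.2%} -- prioritize Pareto-leading failure modes"

# Hypothetical values: 3% measured field failure rate vs. a 1% goal
print(reliability_feedback(0.03, 0.01))
```

The mechanism is trivial; what the team in this story lacked was not the comparison but both of its inputs.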