Elements of a Reliability Program, Part Two
This is part two of a two-part series in which I outline the basic elements of creating and supporting a reliability program.
Test to Failure to Discover Design Weaknesses
As physical hardware becomes available, step stress testing and Highly Accelerated Life Testing (HALT) should be used to discern the weak links in the design. There is no room here for success testing; one must test to failure. Focus on those items that are NUD: New to the organization, Unique to this product, and Difficult to design and/or manufacture.
Start at the lowest level subassembly that is conveniently testable and continue later as higher levels of integration become available. The lower levels require more electrical, mechanical, and software fixturing to test while under stress, but the stress levels can go farther. More fully integrated products are more easily tested and require less fixturing, but the stress level is limited by the weakest subassembly.
Each failure should be investigated to understand the root cause, no matter what kind of stress or stress level caused it. Until root cause understanding exists, one cannot make any estimates of the relevance of the failure mode. Indeed, some things do not have to be fixed, but it is often easier to fix the issue than to establish whether it can be safely ignored. Repeat HALT with each round of prototypes.
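As a rough illustration of the step stress idea, the following Python sketch raises a single stress in fixed increments until the unit fails, then reports the margin between the failure level and the specification limit. The function name, the `survives` callback, and all numeric values are hypothetical and chosen only for illustration; a real test would also include dwell times, functional checks at each step, and several stress types.

```python
from typing import Callable, Optional


def step_stress(start: float, step: float, chamber_limit: float,
                survives: Callable[[float], bool]) -> Optional[float]:
    """Raise the stress one step at a time until the unit fails.

    Returns the stress level at which the first failure occurred, or None
    if the unit survived every step up to the chamber limit.
    """
    level = start
    while level <= chamber_limit:
        if not survives(level):
            return level
        level += step
    return None


# Hypothetical unit that stops working once the temperature reaches 105 C.
failure_level = step_stress(start=50.0, step=10.0, chamber_limit=150.0,
                            survives=lambda temp_c: temp_c < 105.0)

spec_limit = 70.0  # hypothetical upper specification limit, in C
margin = failure_level - spec_limit
print(f"First failure at {failure_level} C, margin over spec {margin} C")
```

The failure level is only a starting point: the root cause investigation described above still decides whether the failure mode is relevant.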
Use Manufacturing Screening to Ensure Early Life Success
Some components are weakened by anomalies in their manufacturing process or by damage in shipping, storage, and handling. These defects are latent (hidden): the parts will test well in manufacturing but fail early in the product’s life (within the first 90 days), typically because they contain stress concentrators.
After corrective actions from HALT and step stress testing have established good design margin, manufacturing screening will be able to cause weak components to fail without removing significant fatigue life from the good components. In this way latent defects can be eliminated before shipping the device. Keep in mind that without first having a rugged design, manufacturing screening may decrease product life and increase warranty claims.
Run-in, Burn-in, Environmental Stress Screening (ESS), and Highly Accelerated Stress Screening (HASS) are increasingly sophisticated methods of precipitating and detecting these hidden defects. After precipitation by stress, different detection screens must be used with appropriate testing to locate the (now) patent, or visible, flaw. Proof of Screen is run to ensure the trial regimen is tough enough to precipitate defects, and Safety of Screen is done to ensure enough fatigue life is left.
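One common way to reason about Safety of Screen is to run the proposed screen repeatedly on sample units and bound the fatigue life that a single pass can consume. The sketch below assumes damage accumulates linearly from pass to pass, which is a simplification and not something the text above states. The function names and the 10% budget are hypothetical.

```python
def max_life_fraction_per_pass(passes_survived: int) -> float:
    """Upper bound on the fatigue life consumed by one screen pass.

    Assumes linear damage accumulation: a unit that survived N passes
    cannot have lost more than 1/N of its life in any single pass.
    """
    if passes_survived < 1:
        raise ValueError("need at least one survived pass")
    return 1.0 / passes_survived


def screen_is_safe(passes_survived: int,
                   max_allowed_fraction: float = 0.10) -> bool:
    """True if one pass stays within the (hypothetical) life budget."""
    return max_life_fraction_per_pass(passes_survived) <= max_allowed_fraction


# Sample units survived 20 consecutive passes of the proposed screen.
print(max_life_fraction_per_pass(20))
print(screen_is_safe(20))
print(screen_is_safe(5))
```

A bound like this says nothing about whether the screen actually precipitates defects; that is the separate job of Proof of Screen.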
Validate the Design after Design Verification and Transfer to Manufacturing
Using specimens from the actual manufacturing process, subject the product to the suite of required environmental and regulatory tests. These are “success tests,” as the objective is to pass these qualification tests. This assures the baseline product as transferred to manufacturing will meet customer needs.
Ongoing Reliability Test (ORT)
Many changes will enter the production process: at top-level assembly, at subassembly suppliers, in the components, and during transportation and storage. Minor changes accumulate, and reliability will often slip away unnoticed as daily operations focus on functionality and yield. If design margin is lost, manufacturing screening that was previously benign to the product may start to consume enough fatigue life that end-of-life failures start to show up as warranty claims.
Periodic testing to failure using step-stress testing or HALT is a way to measure the design margin and find weaknesses that may have slipped in. ORT may also include periodic cycle testing to monitor wear-out phenomena. As earlier in discovery testing, ORT can be done on whole products or focused on key components and subassemblies. Mechanical, electrical, and software fixturing from earlier discovery testing may be re-used with appropriate improvements for routine convenience. Ongoing Reliability Tests should provide early warning well beyond specifications and should not degenerate into acceptance tests.
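To make the early warning idea concrete, the sketch below compares the margin measured in each periodic test-to-failure against a baseline margin established during development, and flags any period where too much margin has been lost. The record layout, the period labels, the 20% loss threshold, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OrtResult:
    period: str           # label for the test period, e.g. "Q1"
    failure_level: float  # stress level at first failure


def margin_alerts(results: List[OrtResult], spec_limit: float,
                  baseline_margin: float,
                  max_loss_fraction: float = 0.2) -> List[Tuple[str, float]]:
    """Return (period, margin) for every period whose margin has fallen
    more than max_loss_fraction below the baseline margin."""
    threshold = baseline_margin * (1 - max_loss_fraction)
    alerts = []
    for result in results:
        margin = result.failure_level - spec_limit
        if margin < threshold:
            alerts.append((result.period, margin))
    return alerts


history = [
    OrtResult("Q1", 110.0),
    OrtResult("Q2", 105.0),
    OrtResult("Q3", 100.0),
]
alerts = margin_alerts(history, spec_limit=70.0, baseline_margin=40.0)
print(alerts)
```

Because the comparison is against the original margin instead of the specification, a product can trigger an alert while still passing every acceptance test, which is the point of ORT.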
That covers the elements – anything missing?
Also published on Medium.