The other day a student said they are asking for failure dates, not failure rates. That is a marked improvement when requesting information from the many sources of reliability information. It is not sufficient, though.
Also ask for the type of failures expected or encountered during the testing or observation.
Connect the test to failures
Years ago, during a visit to a cable manufacturing line, we noticed a quality technician take a sample of product from the line, causing significant scrap in the process. The procedure was to cut a sample each hour on the hour, even though any output length under 10k feet was then scrapped.
So, we followed the sample to see why it was so important. The test was for the polymer jacket and intended to determine if the blend of polymer was appropriate for that product. Unknown to the quality technician and team, the design of the polymer jacket had been changed to a higher melting point grade of material over 10 years prior. No one updated the test, which had been passing (a go/no-go test) for over 10 years.
The test, which created scrap every hour by the thousands of feet, was unable to detect poor quality in the new material, as it wasn’t designed to check that material. The test was disconnected from the expected failure.
Operation and stress
In the design of reliability testing, keep in mind that the operation and stress should be related to failures. It is failures that provide information on what to improve, confirm design margins, or validate assumptions.
It is not always possible to test to failure and perform failure analysis, yet making sure any testing is related to expected and actual failures is important.
The type of failure or, in part, the root cause of the failure indicates the appropriate corrective action to take. Keeping the types and sources of failure in mind may help you avoid or prevent failures.
If a failure occurs too quickly, collect the appropriate information to determine the type of failure and the corrective action.
Types of failure include:
- Design defects: May affect every unit produced
- Manufacturing defects: Often due to material or process variation
- Improper testing: May miss failures (false passing results) or cause damage
- Secondary failures: The item actually damaged (hero) may be the result of a failure of another component (instigator)
- Intermittent failures: The failure only occurs under specific conditions and the product works otherwise.
- Transient failures: Short-lived and often not tied to specific conditions.
- Wear-out failure: Often material degradation over time or with use to the point of failure
- Failures of unknown origin (my favorite): Here we work on revealing the unknown.
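One way to keep the failure type attached to each failure date is to record both together. The following is only a minimal sketch: the names `FailureType`, `FailureRecord`, and `count_by_type`, along with the unit IDs and dates, are hypothetical and made up for illustration, not drawn from any real data set or tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from enum import Enum


class FailureType(Enum):
    """The failure types listed above (labels are illustrative)."""
    DESIGN_DEFECT = "design defect"
    MANUFACTURING_DEFECT = "manufacturing defect"
    IMPROPER_TESTING = "improper testing"
    SECONDARY = "secondary failure"
    INTERMITTENT = "intermittent failure"
    TRANSIENT = "transient failure"
    WEAR_OUT = "wear-out failure"
    UNKNOWN = "unknown origin"


@dataclass(frozen=True)
class FailureRecord:
    """A failure date together with its type (hypothetical record layout)."""
    unit_id: str
    failed_on: date
    # Default to UNKNOWN until failure analysis reveals the type.
    failure_type: FailureType = FailureType.UNKNOWN


def count_by_type(records):
    """Count how many recorded failures fall under each failure type."""
    return Counter(r.failure_type for r in records)


# Made-up example records, for illustration only.
records = [
    FailureRecord("U-001", date(2023, 3, 14), FailureType.WEAR_OUT),
    FailureRecord("U-002", date(2023, 4, 2), FailureType.MANUFACTURING_DEFECT),
    FailureRecord("U-003", date(2023, 4, 9), FailureType.WEAR_OUT),
    FailureRecord("U-004", date(2023, 5, 21)),  # type not yet determined
]
counts = count_by_type(records)
```

A tally like `counts` shows at a glance which types dominate and how many failures still sit in the unknown bucket awaiting analysis.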
The way things fail
The way something fails is also important when attempting to avoid, prevent, understand or resolve a failure.
- Hard or Catastrophic failure: The unit fails and requires repair or replacement before the system resumes operation.
- Soft failures: The unit ceases to operate under some set of conditions and when those conditions are removed the system resumes operation.
- Degraded performance: The function of the system is impaired due to a failure that impacts a portion of the system and does not shut down the entire system.
- Drift failures: The system slowly approaches and crosses a failure threshold and is restored with calibration, restart, or reset.
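The four descriptions above can be turned into a rough first-pass triage. This is a sketch under stated assumptions: the function name, its three yes/no observations, and especially the order in which they are checked are illustrative choices, not an established procedure, and real failures may not sort this cleanly.

```python
from enum import Enum


class FailureMode(Enum):
    HARD = "hard or catastrophic"
    SOFT = "soft"
    DEGRADED = "degraded performance"
    DRIFT = "drift"


def classify_failure_mode(whole_system_down,
                          clears_when_conditions_removed,
                          restored_by_calibration_or_reset):
    """Rough triage of the way a unit failed.

    The check order below is an assumption made for this sketch.
    """
    if clears_when_conditions_removed:
        # Operation resumes once the triggering conditions go away.
        return FailureMode.SOFT
    if restored_by_calibration_or_reset:
        # Crossed a threshold, then restored by calibration, restart, or reset.
        return FailureMode.DRIFT
    if not whole_system_down:
        # Part of the system is impaired, yet the system keeps running.
        return FailureMode.DEGRADED
    # Nothing short of repair or replacement brings the system back.
    return FailureMode.HARD
```

For example, a unit that stops in a hot enclosure and resumes once it cools would come back as `FailureMode.SOFT`, while a unit that stays down until a part is replaced falls through to `FailureMode.HARD`.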
Some failures are spectacular and others are only noticed tangentially.
During the design process, keep in mind that every element of a system may lead to one or more of the above types of failure. On complex systems, it may be necessary to include diagnostic routines to assist in determining the cause of the failure and where the damage occurred.
When attempting to understand a failure occurrence, the symptoms, conditions, status, and history of the system, along with specifics concerning the failure, begin the process of determining the failure type or types. It is not uncommon to have a design defect revealed by manufacturing variation and under unusual customer-imposed load.
Finding one type does not exhaust the potential ways to improve the product by removing contributing factors to the failure.