If only we had a crystal ball or another device to predict the future. From the general wondering about the enemy's next move to the soldier hoping their equipment will work, and from the corporate boardroom estimating the competition's next move to the maintenance manager ordering spare parts, we have many uses for knowing the future.
We often look to past performance to provide an indication of the future. Has this mutual fund regularly provided adequate returns? If so, we predict it will continue to do so. Yet anyone who has reviewed mutual fund performance has also read or heard the admonishment not to use past performance to estimate future returns. Mutual funds, markets, businesses, and battlefields all change and respond in sometimes unforeseen ways.
Of course, when faced with a decision we often do need to form some prediction about future conditions and possible outcomes. Whether investing or ordering spare parts or preparing a design for production, we use predictions to help determine the right course of action.
Reliability Predictions
While I was a young, new reliability engineer working at corporate headquarters, a senior reliability engineer in the division called to ask if I could run a parts count prediction on one of their products. Specifically, a Bellcore (now Telcordia) prediction on the product's two circuit boards. I said yes, despite never having done one before, nor really even knowing what a parts count prediction was or how it was useful. I had just that week received a demo copy of the Relex (now part of PTC) prediction module, and this project would be a good way to learn about both parts count predictions and the software.
I quickly learned that the basic parts count prediction used the bill-of-materials and a database of failure rates to tally the expected failure rate for the circuit board. A multilayer ceramic capacitor had a failure rate of 5 FIT (failures per 10^9 hours), and the analog ASIC was listed with 450 FIT. The software helped match the components to their failure rates and did the math resulting in a final estimate for the expected failure rate of the product when used by customers.
It took about 2 hours to make the prediction, of which half or more was spent learning the software. Not having any information other than the BoM, I left all the settings in the prediction software at their defaults: nominal temperature, derating, quality level, etc.
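The tally described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Bellcore procedure or the Relex software: the bill of materials is hypothetical, and only the 5 FIT capacitor and 450 FIT ASIC values come from the story. The other parts, their rates, and every quantity are assumptions, and no temperature, derating, or quality adjustments are applied, which loosely mirrors leaving the settings at their defaults.

```python
# Hypothetical bill of materials: (part description, quantity, failure rate in FIT).
# Only the 5 FIT capacitor and the 450 FIT ASIC come from the article;
# the remaining parts, rates, and all quantities are invented for illustration.
BOM = [
    ("multilayer ceramic capacitor", 40, 5.0),
    ("analog ASIC", 1, 450.0),
    ("connector", 4, 20.0),
    ("power supply module", 1, 300.0),
]

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def board_failure_rate_fit(bom):
    """Sum quantity * per-part FIT over every line of the bill of materials."""
    return sum(qty * fit for _, qty, fit in bom)


def top_contributors(bom, n=3):
    """Return the n line items that contribute the most FIT, largest first."""
    ranked = sorted(bom, key=lambda item: item[1] * item[2], reverse=True)
    return [(name, qty * fit) for name, qty, fit in ranked[:n]]


total_fit = board_failure_rate_fit(BOM)
failures_per_hour = total_fit / HOURS_PER_FIT_UNIT

print(f"Board failure rate: {total_fit:.0f} FIT")
for name, fit in top_contributors(BOM):
    print(f"  {name}: {fit:.0f} FIT")
```

Ranking the line items by their contribution is the same kind of sanity check described in the next section: it shows which parts dominate the predicted failure rate.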
Prediction Questions
This was magic. Pour in a list of parts, and after a few milliseconds of computing time, we know the future. Or do we?
My first check was on the notion that many of our products failed due to power supplies, connectors, and fans. The prediction results listed the power supply and connectors in the top five of expected failure rates, and there wasn’t a fan in the system, so it seemed about right. The more complex components were expected to fail more often or sooner than simpler components.
Where did the failure rates listed in the table come from? How did the folks at Bellcore know enough to list the values? With a little reading and a phone call, I learned that periodically the team at Bellcore would gather failure rate information from a wide range of sources, including GIDEP and major telecommunications companies. They would sort and analyze the data and create historical models of the failure rates including the effects of temperature, derating, quality, etc. The equipment they studied was primarily used in the military and telecommunications infrastructure. Mostly boxes with circuit boards.
The electronics industry changes a lot in five years, yet it was clear that unless we carefully resolved every failure to the component level and knew the use conditions, we would be hard-pressed to do better than the team at Bellcore. The product I did the prediction for was similar to products in the telecommunication industry, not exactly, yet close enough it seemed.
Then I wondered about the calculations being done once the software had the BoM. Apparently, the approach was rooted in the time prior to computers and used a few simplifying assumptions to make the calculations easy to accomplish with mechanical adders and a slide-rule. One of the properties of the exponential function is that multiplying exponentials is the same as adding their exponents. So, if we assume every failure rate is constant over time, we can use the exponential distribution to model each component's time to failure. Then, for a list of components that must all work for the product to work, we simply add the failure rates. Then we can estimate the reliability at any time period of interest by calculating a single product and a single exponent.
$$ \Large\displaystyle R\left( t \right)={{e}^{-\lambda t}}$$
Here lambda is the failure rate and t is time.
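To see why adding failure rates works, here is a minimal sketch assuming every part has a constant failure rate and the board fails when any one part fails. Under those assumptions the board's reliability is the product of the part reliabilities, and that product collapses to a single exponential with the summed failure rate. The part list (40 capacitors at 5 FIT plus one 450 FIT ASIC) and the one-year duration are illustrative assumptions, not data from the original prediction.

```python
import math

HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def system_reliability(fit_rates, hours):
    """R(t) = exp(-lambda * t), with lambda the sum of the constant part rates."""
    lam = sum(fit_rates) / HOURS_PER_FIT_UNIT  # failures per hour
    return math.exp(-lam * hours)


# Hypothetical board: 40 capacitors at 5 FIT each and one ASIC at 450 FIT.
rates = [5.0] * 40 + [450.0]
one_year = 8760  # hours of continuous operation (an assumed duty cycle)

# Shortcut: add the rates, then take one exponent.
r_summed = system_reliability(rates, one_year)

# Long way: multiply the individual part reliabilities together.
r_product = math.prod(
    math.exp(-(fit / HOURS_PER_FIT_UNIT) * one_year) for fit in rates
)

print(f"Summed-rate reliability:   {r_summed:.6f}")
print(f"Product of part values:    {r_product:.6f}")
```

Both routes give the same one-year reliability, roughly 0.994 for this made-up board. The shortcut is only as good as the constant failure rate assumption behind it.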
This approach assumed that components, and therefore products, enjoyed a constant failure rate. Despite knowing this was not true for any of our products, based on careful qualification and field data analysis, we made this assumption for the parts count prediction. This cast a serious shadow over the accuracy of the prediction. See the site NoMTBF.com for much more information and references that detail additional concerns.
There were additional questions that found inadequate answers, further eroding my acceptance of the results the parts count prediction produced. I didn’t want to send back a report with a faulty prediction, and I didn’t know how to proceed. Furthermore, I recalled the admonishment that comes with historical financial data, and wondered why we even tried to estimate the future of failure rates.
Value of Predictions
First, I called the reliability engineer who requested the prediction. He thanked me and said what I did was fine. He agreed with my concerns and that the result was not even close to the actual failure rate. He assured me that he and the team would not take the value too seriously; in fact, they were not going to use it at all.
Well, gee thanks. Why did I just spend my morning doing this prediction for them?
The prediction report was requested by a major customer as a condition of the purchase. They didn’t know what to do with the reported prediction other than they wanted to make sure we did the parts count prediction. It was to simply check off the box for the sale to occur. Nothing more.
Second, as a troubled young engineer, I talked to my mentor. He said we understood any prediction was wrong. Just as all models are wrong but some are useful, some reliability predictions are also useful. In this case, the value of my two hours was to help secure a multi-million dollar sale by meeting the customer requirements.
The value of any prediction, whether a parts count or physics of failure model, was not in the actual resulting value. The value was in what we did with the result. For reliability engineering work, even a parts count, even in its simplest form, encourages using fewer parts and operating at lower temperatures. Both are good for product reliability in general, so the resulting push to reduce part counts and temperature rise increases product reliability.
We use reliability predictions to estimate a product’s performance. There are many ways to create an estimate, and all of them are most certainly wrong. There are times when the prediction provides insight or information that permits critical improvements, and other times it is just a checkbox. As reliability professionals, we should work to enable decisions with the appropriate tools and analysis. We do this by matching the approach to the task and the task’s importance. We disclose assumptions, limitations, accuracy, and options. We enable decision makers to understand the validity of our work and the lack of a crystal ball.
How do you see the future? Any stories about predictions you’d like to share?