Extrapolation and Sample Sizes
Chris and Fred discuss (essentially) how many ‘things’ you need to see to know enough about those ‘things.’ We see this conundrum across all sorts of fields of study. How many kangaroos do I need to capture and weigh to get a good understanding of the entire population’s typical weight? Delete kangaroo and insert whatever thing matters to you.
Join Chris and Fred as they discuss how many samples you need to take from a population to have a good understanding of the typical nature of the population you are studying.
- Let’s talk about information. Data contains information. You can extract information from data through data analysis. And if data comes from a random process (like failure), then each new data point adds more information to your understanding of what is happening. So let’s say we are measuring dimensions of molded parts being produced by a manufacturer. If the first molded part we measure is within tolerances … are we confident that the rest of the molded parts are also within specification? Are we more confident if the second part we measure is also within tolerances? Yes … but perhaps not by much. How many more parts do we need to measure … with all of them being within tolerances … for us to be confident enough that all the molded parts are within tolerances?
- Models are information. We know about things like bell curves. If we assume that (for instance) a bell curve represents our process, we need less data when it comes to data analysis. Why? Because a model contains information.
- Models can also be misinformation. Let’s just say that you are manufacturing a molded part. Your process is so good and so refined that the dimensions of each part are so far within tolerances that they don’t even think about being ‘out of spec.’ EXCEPT … when the manufacturing process recalibrates flow rates. The first five to ten molded parts that are made during recalibration have dimensions that are often skewed outside of tolerances. In this scenario, there are two processes – (1) steady-state and (2) recalibration. These two processes are described by different models. If you find a model that best fits ‘steady-state’ data, you really can’t say anything about the nature of part dimensions during recalibration. So how well do you know your process?
- … and extreme values? Depending on your source, the average height of a human male is 70 inches. Again, depending on your source, the standard deviation of the height of a human male is 4 inches. We also see that the bell curve we mentioned above seems to do a pretty good job of modeling human height. The problem is that the bell curve that fits the data also suggests there is a nonzero chance that someone can be zero inches tall (or shorter!). Many models can do a great job of modeling the majority of data. But they are often not very good at modeling extreme cases. And in reliability engineering, we are often interested in extreme cases, such as when 1%, 0.1%, or 0.001% of our things fail.
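Two of the questions above can be put into rough numbers. The sketch below is not from the episode; it is a minimal illustration under stated assumptions. The first function uses the zero-failure ("success run") relationship, confidence = 1 - reliability^n, which assumes every measured part is an independent draw from one stable process and that all n parts pass. The second evaluates the bell curve's lower tail using the 70 inch mean and 4 inch standard deviation quoted above. The function names are made up for this example.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n such that n passing parts with zero failures support the
    claim 'at least `reliability` of parts conform' at `confidence`.

    Assumption: independent parts from a single stable process, so that
    1 - reliability**n >= confidence.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) under a normal (bell curve) model, via the complementary
    error function so that very small tail probabilities do not round to zero."""
    return 0.5 * math.erfc((mean - x) / (sd * math.sqrt(2.0)))


if __name__ == "__main__":
    # How many in-tolerance parts in a row before we are 95% confident?
    for reliability in (0.90, 0.99, 0.999):
        n = success_run_sample_size(reliability, 0.95)
        print(f"{reliability:.1%} conforming at 95% confidence: {n} parts, all passing")

    # What the fitted bell curve claims about a height of zero inches or less.
    p_zero = normal_cdf(0.0, mean=70.0, sd=4.0)
    print(f"Model probability of height <= 0 inches: {p_zero:.3e}")
```

Under these assumptions, the required sample size grows quickly as the fraction you care about shrinks, which is why extreme cases are expensive to demonstrate with data alone. The calculation also says nothing about a second process such as recalibration unless parts from that process are in the sample. The tail probability comes out tiny but strictly positive, which is the sense in which a well-fitting model can still be wrong at the extremes.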
Enjoy an episode of Speaking of Reliability, where you can join friends as they discuss reliability topics. Join us as we discuss topics ranging from design for reliability techniques to field data analysis approaches.