Questions to Ask about Data Analysis

Abstract

Chris and Fred discuss data analysis … specifically the first question each of them asks before helping someone with a data analysis project. Chris always asks – what decision will this data analysis support? And Fred always asks – where did this data come from? These questions matter because you need to know what information you need before you construct an analysis to get it, and you need to be confident in the results. A single data set can yield several different sets of information, depending on how you construct the analysis, which in turn is based on the decision. Listen to this podcast if you would like to learn more.

Key Points

Join Chris and Fred as they discuss the questions to ask about data analysis.

Topics include:

What decision will this support? What change of course, direction, behavior or design effort will result from this data analysis? If you can’t find a decision – then this data analysis is a waste of time. More importantly, a single data set can provide a huge range of different sets of information that may have nothing in common apart from their data source. And the only way to understand what information you need is to … understand the decision.
Where did this data come from? What is the nature of the source that provided you this data? There is an old ‘statistics’ saying … garbage in – garbage out. If you don’t know how the data came into being, what its uncertainties are or how it is relevant to your system, then you may conduct a really good data analysis that gives completely misleading information.
And don’t forget assumptions!
How do I deal with vendors who give me data analysis that suggests their product is great? Ask. Clarify. If they say ‘we have shipped six million units without complaints,’ ask how many customers have made contact to report that the product is still good. Is yield ‘first pass yield’? But perhaps the most important question you can ask is – how will your product fail? If they don’t know … they haven’t studied the reliability of their system.
… and what about ‘confidence’? Because failure is a random process, there will always be uncertainty in data analysis outputs. So you can only provide metrics with a stated uncertainty or within confidence bounds. And what is an acceptable level of confidence? That is up to the risk profile of the decision-maker. And to find him or her, you need to know what decision is being made. Weibull plots or reliability estimates communicated to 18 decimal places don’t help by themselves.
But I only need to do data analysis to ‘monitor’ something – meaning there is no decision to be made … right? Wrong. If you are monitoring something like the reliability growth of a product your organization is designing, then what happens if you find that you are not on track? If this will trigger some remedial activity … great! That is the decision. If your organization will plow on regardless of how ‘on-track’ it is … then there is no decision.
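
To make the point about confidence concrete, here is a minimal Python sketch. It is not something presented in the episode. It assumes a hypothetical pass/fail test in which every unit survives, and it uses the standard zero-failure ("success run") relationship to turn that result into a one-sided lower bound on reliability. The function name, the test size of 30 units and the confidence levels are all illustrative assumptions.

```python
# Illustrative sketch only: the test size and confidence levels below are
# hypothetical and do not come from the episode.

def reliability_lower_bound(units_tested: int, confidence: float) -> float:
    """One-sided lower bound on reliability after a test with zero failures.

    Uses the zero-failure ("success run") relationship
    R_lower = (1 - confidence) ** (1 / units_tested),
    which assumes independent units and a simple pass/fail outcome.
    """
    if units_tested < 1:
        raise ValueError("units_tested must be at least 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return (1.0 - confidence) ** (1.0 / units_tested)


if __name__ == "__main__":
    units_tested = 30  # hypothetical: 30 units tested, none failed
    for confidence in (0.50, 0.90, 0.95, 0.99):
        bound = reliability_lower_bound(units_tested, confidence)
        print(f"{confidence:.0%} confidence: reliability is at least {bound:.3f}")
```

The same test result supports a higher reliability claim at a lower confidence level and a lower claim at a higher one. Which number to report therefore depends on how much risk the decision-maker is willing to accept, which is why the decision has to be known first.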

Enjoy an episode of Speaking of Reliability, where you can join friends as they discuss reliability topics ranging from design for reliability techniques to field data analysis approaches.