Given Some Data, Do Data Analysis
Let’s say we have a set of numbers, {2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 9.0, 9.1}, and that is all we have at the moment.
How many ways could you analyze this set of numbers? We could plot it several ways: a dot plot, stem-and-leaf plot, histogram, probability density plot, and probably a few others as well. We could also calculate summary statistics about the dataset, such as the mean, median, standard deviation, skewness, kurtosis, and so on.
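As a small illustration of those summary statistics, here is a sketch in Python (my choice of language; the article itself has no code) that computes them for this set with the standard library's `statistics` module. The function name `moment_stats` is hypothetical, and skewness and kurtosis use simple moment formulas; statistical software often applies small-sample corrections, so its numbers may differ slightly.

```python
import statistics

data = [2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 9.0, 9.1]

def moment_stats(values):
    """Return mean, median, sample standard deviation, skewness, and excess kurtosis.

    Skewness (g1) and excess kurtosis (g2) use plain population-moment formulas,
    without the small-sample corrections some packages apply.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }

for name, value in moment_stats(data).items():
    print(f"{name:>16}: {value:.3f}")
```

The numbers summarize the set, but as the rest of this piece argues, they say nothing yet about what the values represent.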
This brief exercise reminded me of a story. I forget where I first heard it and cannot provide a reference, yet it was about a graduate student in biology who was given an assignment to study a brown trout. Not brown trout in general, but one specific brown trout, handed to him wrapped in newspaper.
Three days later the fish smelled. The student had carefully counted and measured the length, breadth, and weight, along with the size and number of scales, marks, and nicks: a small book full of measurements. The professor scowled. He asked the student what he had learned of the life of the fish: its habits, its trials, its life. Did the fish live well or not?
Asking the Right Questions is Part of Data Analysis
Beyond summary statistics and plots we need to ask questions. Obvious questions, like what do these numbers represent? Is it time to failure in months for a piece of equipment? Ask about the context and source of the values. Find out the origin of the measurements. Determine why the values were recorded and for what purpose.
The analysis starts with questions and continues with more questions.
Let’s say the values are the times, in months, at which a piece of equipment broke down. What are your options now? You could plot the cumulative count of failures versus time. An increasing slope indicates the equipment is more prone to failure the older it gets, or that the repairs are not restoring the unit to good-as-new condition, or both. There’s a good question: aging equipment or ineffective repairs?
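A rough text version of that cumulative-count plot can be sketched in Python without a plotting library. It assumes, as one hypothetical reading of the example, that the ten values are the ages in months at successive breakdowns of a single repairable unit; the function names are illustrative only.

```python
# Assumption: each value is the unit's age (months) at a breakdown, one unit,
# with repairs in between.
failure_ages = [2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 9.0, 9.1]

def cumulative_failures(ages):
    """Pair each failure age with the cumulative failure count at that age."""
    return [(age, count) for count, age in enumerate(sorted(ages), start=1)]

def times_between_failures(ages):
    """Gaps between successive failures; the first gap is measured from age 0."""
    ordered = sorted(ages)
    return [round(b - a, 3) for a, b in zip([0.0] + ordered[:-1], ordered)]

# Text "plot": one row per failure, bar length equal to the cumulative count.
for age, count in cumulative_failures(failure_ages):
    print(f"{age:5.1f} months  {'#' * count}  ({count})")

print("Time between failures:", times_between_failures(failure_ages))
# Shrinking gaps mean the cumulative curve steepens: failures are arriving
# faster as the unit ages.
```

If the values instead came from ten different units, this kind of plot would answer a different question, which is one more reason to ask about the source of the data first.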
Another line of questioning is the ‘so what’ question. Do these failures matter? Do they cause process shutdowns or increased scrap rates, or otherwise harm the business or customer? If not, let’s move on to another set of data. If so, how? Connect the data to a decision. Connect the analysis to what is needed to make that decision.
Let the Data Guide the Story
Years ago I wrote an article, The music of data, suggesting that we let the analysis follow the data. Our role as data analysts is not to perform a rote set of plots, fits, and summaries; it is to answer questions and to frame the next set of questions.
If the first plot of the data shows an upward trend, is that expected or not? What is causing the change? Is the change over time real or due to some other artifact? If there is a bend, angle, or spike, is that an indication of a special cause or just a clerical error? How would you know whether something is significant, and how will you find out?
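One way to start answering whether an upward trend is real is a formal trend test. The sketch below uses the Laplace trend test for repairable systems, a standard technique I am adding here rather than one the article prescribes. It assumes the values are successive failure ages of one unit and that observation ended at the last failure (failure-truncated data); the function name is hypothetical.

```python
import math

failure_ages = [2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 9.0, 9.1]

def laplace_trend_test(ages):
    """Laplace trend test for failure-truncated data.

    Returns (U, two-sided p-value). Under a constant failure rate
    (homogeneous Poisson process) U is approximately standard normal;
    U > 0 suggests failures are arriving faster over time, U < 0 slower.
    """
    t = sorted(ages)
    if len(t) < 2:
        raise ValueError("need at least two failure times")
    t_end = t[-1]        # the last failure closes the observation window
    earlier = t[:-1]     # the n - 1 failures before it
    m = len(earlier)
    u = (sum(earlier) / m - t_end / 2) / (t_end * math.sqrt(1 / (12 * m)))
    p_two_sided = math.erfc(abs(u) / math.sqrt(2))  # equals 2 * (1 - Phi(|u|))
    return u, p_two_sided

u, p = laplace_trend_test(failure_ages)
print(f"U = {u:.2f}, two-sided p = {p:.4f}")
```

A small p-value only says a constant failure rate is a poor fit. It does not say whether aging or ineffective repairs is the cause; that is the next question to ask.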
The data is only a representation of one view of a process. Are we asking the right questions, gathering the right data, and is our treatment of the data converting it into information? Treating the data with wonder, awe, and respect lets us tease out the story. Listen to the song your data plays. The data, properly analyzed, will reveal the larger whole.
How has data surprised you recently? Are you looking to be surprised? Should you be looking? Leave a comment about your data stories.
Piyush says
Superb article.
One doubt: you have talked about plotting data in different ways. Sir, could you explain?
Thank you Sir.
Piyush Kant Singh
India
Fred says
Hi Piyush,
I’m not sure I understand the question. The data is what it is, and plotting it using different plot styles may help to reveal information within the dataset: for example, a histogram, or a simple xy plot with x being the order in the set or the date/time the data was collected. I recommend exploring what the data looks like with different plots.
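To make that reply concrete, here is a sketch of two of those views, a histogram and a plot against order in the set, drawn as plain text in Python so no plotting library is needed. The one-month bin width and the function names are assumptions for illustration.

```python
import math

data = [2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 9.0, 9.1]

def text_histogram(values, bin_width=1.0):
    """Count values in equal-width bins, keeping empty bins so gaps show.

    Returns (lower_edge, upper_edge, count) tuples; bins start at floor(min).
    """
    start = math.floor(min(values))
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + k * bin_width, start + (k + 1) * bin_width, c)
            for k, c in enumerate(counts)]

def run_order_plot(values, width=40):
    """One row per value in its original order; '*' is placed proportionally to the value."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    rows = []
    for i, v in enumerate(values, start=1):
        pos = round((v - lo) / span * (width - 1))
        rows.append(f"{i:3d} | {' ' * pos}* {v}")
    return rows

print("Histogram (1-month bins)")
for lower, upper, count in text_histogram(data):
    print(f"{lower:4.0f}-{upper:<4.0f} | {'#' * count}")

print("\nValue by order in the set")
for row in run_order_plot(data):
    print(row)
```

The histogram shows where values cluster; the order plot shows how they change through the set. Each view can prompt a different question.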
Cheers,
Fred
PIYUSH KANT SINGH . says
Hi Sir,
My doubt is this: when we have a set of data, what should our first approach to it be, and how should we interpret it? Secondly, are there different ways to treat test data and field data?
Thanking you
Piyush Kant Singh
Fred says
Hi Piyush,
For any data analysis, start with what questions you and your team would like to answer. In some cases a little exploration of the data is in order, though still with the aim of answering questions.
Test data should have a well-thought-out analysis already in place as part of the test planning process. Do the analysis as planned to answer the questions that created the test plan. Some exploration to look for problems, errors, or something unusual is a good practice, too.
For field data, again, ask why you are looking at the data and who is going to decide or act based on the analysis. If you want to detect a significant change in the field failure rate, that is a different analysis than using the data to forecast the next 3 months of expected returns.
With field data, a bit of exploration is again useful for detecting problems with the data collection, clerical errors, and so on.
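A first exploratory pass for clerical errors can be as simple as flagging entries that break basic expectations. The sketch below is hypothetical: the checks (non-positive values, duplicates, entries out of time order, values above an assumed plausible maximum) are examples, not a complete audit, and the mistyped entry in the second list is invented to show what a flag looks like.

```python
def find_suspect_entries(values, max_plausible=None):
    """Return (index, value, reason) tuples for entries worth a second look.

    Illustrative checks only: non-positive times, exact duplicates,
    entries smaller than the one before (out of time order), and,
    if max_plausible is given, values above that assumed limit.
    """
    issues = []
    seen = set()
    for i, v in enumerate(values):
        if v <= 0:
            issues.append((i, v, "non-positive time"))
        if v in seen:
            issues.append((i, v, "duplicate value"))
        seen.add(v)
        if i > 0 and v < values[i - 1]:
            issues.append((i, v, "out of time order"))
        if max_plausible is not None and v > max_plausible:
            issues.append((i, v, "above plausible maximum"))
    return issues

recorded = [2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 9.0, 9.1]
# Hypothetical clerical error: 9.0 keyed in as 90.0.
with_typo = [2.3, 4.2, 7.1, 7.6, 8.2, 8.4, 8.7, 8.9, 90.0, 9.1]

print(find_suspect_entries(recorded, max_plausible=24))   # assumed 24-month window
print(find_suspect_entries(with_typo, max_plausible=24))
```

A flag is a prompt to go back to whoever recorded the data, not a reason to silently delete the entry.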
Cheers,
Fred
PIYUSH KANT SINGH . says
Thank you, sir.
Piyush Kant Singh