The Next Step in Your Data Analysis

Last Verified July 24, 2024

Nothing keeps a statistician happy like a pile of data.

Part 6 of 7

As seen in the previous articles, you can easily use the data you already have to conduct a meaningful analysis, whether a Weibull, Crow-AMSAA, or Mean Cumulative Function analysis.

Digging into a well-managed dataset promises to reveal insights, trends, and patterns that will help improve the line, process, or plant.

Creating a plot or calculating summaries is pretty easy with today’s tools. Yet, are you doing the right analysis, and are its assumptions valid?

One critical step in the data analysis process is making sure you are doing a valid and appropriate analysis.

Checking assumptions

We make assumptions during data analysis all the time.

Assumptions are necessary to simplify the problem. Yet, when an assumption is not valid, the results are likewise not valid.

Sure, an analysis built on faulty assumptions will still produce a number, a result. In some cases, it just won’t be one that is true or even close to true.

Simply recognizing the assumptions you are making, and doing the due diligence to check that they hold, will help keep your analysis results true.

Measurements are true

The first assumption to check is the validity of the data collection method.

Every measurement system has measurement error. Some measurement systems are so noisy that your dataset may not reflect the process you are monitoring.

The dataset may be little more than a collection of values from a random number generator.

Conduct a measurement system analysis.

Check calibration, reproducibility, repeatability, bias, linearity, and stability. A great tool to start the check is Gage R&R.
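As a rough illustration of what a Gage R&R study computes, the sketch below applies the ANOVA method to a balanced, crossed study in which every operator measures every part the same number of times. The function name `gage_rr_anova` and the nested-list layout `y[part][operator][replicate]` are our own assumptions, not any particular software package's interface. It reports repeatability (equipment variation), reproducibility (operator and operator-by-part variation), and the measurement system's share of total study variation as %GRR. A widely used rule of thumb treats under 10% as acceptable and over 30% as unacceptable.

```python
from math import sqrt
from statistics import fmean

def gage_rr_anova(y):
    """ANOVA-method Gage R&R for a balanced, crossed study.

    y[i][j][k] is replicate k of part i measured by operator j.
    Needs at least 2 parts, 2 operators, and 2 replicates.
    """
    p, o, r = len(y), len(y[0]), len(y[0][0])
    grand = fmean(v for row in y for cell in row for v in cell)
    part_means = [fmean(v for cell in row for v in cell) for row in y]
    oper_means = [fmean(v for row in y for v in row[j]) for j in range(o)]
    cell_means = [[fmean(cell) for cell in row] for row in y]

    # Sums of squares for a two-way crossed ANOVA with part x operator interaction
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_means)
    ss_int = r * sum(
        (cell_means[i][j] - part_means[i] - oper_means[j] + grand) ** 2
        for i in range(p) for j in range(o)
    )
    ss_err = sum(
        (v - cell_means[i][j]) ** 2
        for i in range(p) for j in range(o) for v in y[i][j]
    )

    # Mean squares
    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_int = ss_int / ((p - 1) * (o - 1))
    ms_err = ss_err / (p * o * (r - 1))

    # Variance components; negative estimates are set to zero
    repeatability = ms_err
    interaction = max(0.0, (ms_int - ms_err) / r)
    operator = max(0.0, (ms_oper - ms_int) / (p * r))
    part_var = max(0.0, (ms_part - ms_int) / (o * r))
    reproducibility = operator + interaction
    grr = repeatability + reproducibility
    total = grr + part_var
    return {
        "repeatability": repeatability,
        "reproducibility": reproducibility,
        "part": part_var,
        # %GRR as a ratio of standard deviations (share of study variation)
        "pct_grr": 100.0 * sqrt(grr / total) if total > 0 else float("nan"),
    }

# Demo data: 3 parts, 2 operators, 2 replicates; operator 2 reads 1 unit high
study = [[[10 * i] * 2, [10 * i + 1] * 2] for i in range(3)]
result = gage_rr_anova(study)
print(f"%GRR = {result['pct_grr']:.1f}%")
```

In this deliberately tidy demo, repeatability is zero and all of the measurement error comes from the operator offset. Real studies will show both sources.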

Meaningful dataset grouping

Given a dataset, you may have column headings and numbers.

You may also have machine or equipment identification, date/time data, and possibly more. One thing to check within the dataset is the grouping of like items.

If your analysis is for a specific type of motor, does the dataset include data for gearboxes, too? If so, you may have some data cleaning to do.

Assuming the dataset contains only relevant data quickly leads to results that are difficult to understand and use.

Check the data both for consistency and for completeness.

If looking at a specific motor model, is the data only for this motor, and does it contain data for all the motors?
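A quick scope check catches both problems before any fitting. In this sketch, the record fields (`asset_id`, `asset_type`, `model`, `hours`) and the fleet list are hypothetical, so substitute whatever your maintenance system actually records. An asset in the fleet with no failure records is not necessarily an error. It may simply not have failed yet, in which case it is a suspension (censored unit) that still belongs in a life data analysis.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FailureRecord:
    # Hypothetical fields; use whatever your maintenance system records
    asset_id: str
    asset_type: str
    model: str
    hours: float          # operating hours at failure

def scope_check(records, asset_type, model, fleet_ids):
    """Split records into in-scope / out-of-scope and list fleet assets with no records."""
    in_scope, out_of_scope = [], []
    for r in records:
        target = in_scope if (r.asset_type, r.model) == (asset_type, model) else out_of_scope
        target.append(r)
    recorded = {r.asset_id for r in in_scope}
    missing = sorted(set(fleet_ids) - recorded)
    return in_scope, out_of_scope, missing

records = [
    FailureRecord("M-01", "motor", "X100", 1200.0),
    FailureRecord("M-02", "motor", "X100", 950.0),
    FailureRecord("G-07", "gearbox", "G20", 3100.0),   # wrong equipment type
    FailureRecord("M-09", "motor", "X200", 400.0),     # different motor model
]
kept, dropped, missing = scope_check(records, "motor", "X100", ["M-01", "M-02", "M-03"])
print(len(kept), len(dropped), missing)  # 2 2 ['M-03']
```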

Distribution assumptions

A common assumption is that a particular statistical distribution describes the data.

For time to failure data, we often assume either an exponential or Weibull distribution as a way to summarize the data. For some types of analysis, we may assume the data has a normal distribution.

Check this kind of assumption. Plot the data in a histogram, or on a cumulative distribution plot.
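For time-to-failure data, a Weibull probability plot is one common cumulative distribution plot. If the points fall close to a straight line, a Weibull model is plausible, and clear curvature is a warning sign. The sketch below assumes complete (uncensored) failure times, uses Bernard's median-rank approximation, and regresses the plot's y-axis on its x-axis. That is one of several conventions (some practitioners regress x on y), and the R² value is no substitute for looking at the plotted points.

```python
from math import exp, log

def weibull_plot_fit(times):
    """Median-rank regression on a Weibull plot; complete (uncensored) data only.

    Returns (beta, eta, r_squared): shape, scale, and how straight the plot is.
    """
    t = sorted(times)
    n = len(t)
    xs, ys = [], []
    for i, ti in enumerate(t, start=1):
        f = (i - 0.3) / (n + 0.4)              # Bernard's median-rank approximation
        xs.append(log(ti))                     # x-axis: ln(time)
        ys.append(log(-log(1.0 - f)))          # y-axis: ln(-ln(1 - F))
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope of the line = Weibull shape
    eta = exp(mx - my / beta)                  # line crosses y = 0 at x = ln(eta)
    r_squared = sxy ** 2 / (sxx * syy)
    return beta, eta, r_squared

# Demo only: times placed exactly on a Weibull(beta=2, eta=1000) plot line
n = 10
times = [1000 * (-log(1 - (i - 0.3) / (n + 0.4))) ** 0.5 for i in range(1, n + 1)]
beta, eta, r2 = weibull_plot_fit(times)
print(f"beta = {beta:.2f}, eta = {eta:.0f}, R^2 = {r2:.3f}")
```

Because the demo points sit exactly on the line, the fit recovers the shape and scale and R² is 1. Field data will scatter around the line.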

Run a goodness-of-fit statistical test. Check that assumption.
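As one example of a goodness-of-fit test, the sketch below computes the Kolmogorov-Smirnov (K-S) statistic D, which is the largest gap between the empirical cumulative distribution and a fitted exponential distribution. The exponential mean is estimated from the same data. It compares D with 1.36/√n, the large-sample 5% critical value for a fully specified distribution. Because the mean was estimated from the data, that comparison is conservative (it rejects less often than it should), so a rejection is strong evidence against the exponential. Lilliefors-type tables or an Anderson-Darling test give a better-calibrated answer.

```python
from math import exp, log, sqrt

def ks_exponential(times):
    """K-S check of an exponential fit; returns (D, critical value, reject?)."""
    t = sorted(times)
    n = len(t)
    mean = sum(t) / n                          # maximum likelihood estimate of the mean
    d = 0.0
    for i, ti in enumerate(t, start=1):
        f = 1.0 - exp(-ti / mean)              # fitted CDF at this failure time
        # Compare against the empirical CDF just after and just before the step
        d = max(d, i / n - f, f - (i - 1) / n)
    crit = 1.36 / sqrt(n)                      # approximate 5% critical value, large n
    return d, crit, d > crit

# Demo data: 50 points placed at median-rank quantiles of two distributions
n = 50
ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
exp_like = [-100 * log(1 - f) for f in ranks]           # exponential, mean 100
wear_out = [(-log(1 - f)) ** (1 / 3) for f in ranks]    # Weibull, shape 3
print(ks_exponential(exp_like)[2], ks_exponential(wear_out)[2])  # False True
```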

If you assume the data is accurately described by an exponential distribution and it isn’t… you will complete the analysis, present a result, and likely make poor decisions using the faulty assumption.

Check that the analysis approach fits the data

If the data is the time to failure of a non-repairable piece of equipment, say a bearing, then a Weibull analysis will work well.

On the other hand, if the analysis is on a complex and repairable piece of equipment, then a Weibull analysis is not appropriate except under very specific situations.

If the data is what statisticians call “recurrent” data, meaning the same piece of equipment may have multiple failures over time (repaired and put back into service after each failure), then a Weibull analysis is not appropriate.
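Before choosing between a Weibull and a mean cumulative function (MCF) analysis, a quick screen of the failure records shows whether any unit failed more than once. In this sketch, each record is a hypothetical `(asset_id, operating_hours)` pair.

```python
from collections import Counter

def recurrent_assets(failures):
    """Return IDs of assets with more than one recorded failure.

    failures: iterable of (asset_id, operating_hours) pairs (hypothetical layout).
    """
    counts = Counter(asset_id for asset_id, _ in failures)
    return sorted(asset for asset, count in counts.items() if count > 1)

failure_log = [("P-1", 410.0), ("P-2", 980.0), ("P-1", 1500.0), ("P-3", 720.0), ("P-1", 2210.0)]
print(recurrent_assets(failure_log))  # ['P-1']
```

A non-empty result means at least part of the data is recurrent. Fitting a Weibull distribution to every record would treat a unit's repeat failures as if they came from separate new units.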

Weibull is best for time-to-first-failure data.

Recurrent data should be plotted and fit using a mean cumulative function (MCF) or analyzed using a non-homogeneous Poisson process model.
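A minimal sketch of the nonparametric MCF estimate (Nelson's method) follows. At each failure time, it adds the number of failures at that time divided by the number of systems still under observation. It assumes each system's failure times and end of observation are measured on the same clock (for example, operating hours since installation), and the dictionary layout is a hypothetical choice for illustration. Confidence limits, which a full MCF analysis would include, are omitted.

```python
from collections import Counter

def mean_cumulative_function(systems):
    """Nonparametric MCF estimate (Nelson's method), without confidence limits.

    systems maps a system ID to (end_of_observation, [failure times]).
    Returns (time, MCF) at each distinct failure time.
    """
    failures = Counter(t for _, times in systems.values() for t in times)
    ends = [end for end, _ in systems.values()]
    mcf, points = 0.0, []
    for t in sorted(failures):
        at_risk = sum(1 for end in ends if end >= t)   # systems still observed at t
        mcf += failures[t] / at_risk                    # average new failures per system
        points.append((t, mcf))
    return points

# Hypothetical fleet of three repairable systems (hours since installation)
systems = {
    "A": (1000, [200, 600]),
    "B": (800, [400]),
    "C": (500, []),       # no failures yet, but still counts while observed
}
points = mean_cumulative_function(systems)
for t, m in points:
    print(t, round(m, 3))   # 200 0.333, then 400 0.667, then 600 1.167
```

When MCF is plotted against time, a roughly straight line suggests a steady failure rate. An upward-curving line suggests the systems are failing more often as they age.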

The reverse is true also.

Analyzing time to first failure data (non-repairable system time to failure) using MCF will yield meaningless results.

Additional models to consider

Even when the various assumptions in your analysis hold, the data may still not yield its information to reliability growth, Weibull, or MCF analysis (as appropriate).

There are other tools available.

As mentioned above, one versatile tool for repairable data analysis is the non-homogeneous Poisson process model. This is an appropriate approach when a repair is unable to restore the equipment to its original (as-new) condition.
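One widely used NHPP form is the power-law model, the same model that underlies Crow-AMSAA, with failure intensity u(t) = λβt^(β−1). The sketch below assumes a single system observed continuously from time 0 to `end_time`. It returns the standard maximum likelihood estimates β = n / Σ ln(T / t_i) and λ = n / T^β, without the small-sample bias correction some references apply. An estimate of β above 1 suggests failures are arriving more often as the equipment ages, and an estimate below 1 suggests they are arriving less often.

```python
from math import log

def power_law_nhpp_mle(failure_times, end_time):
    """MLE for a power-law NHPP, one system observed on (0, end_time].

    Intensity u(t) = lam * beta * t**(beta - 1); expected failures by t = lam * t**beta.
    """
    n = len(failure_times)
    if n == 0:
        raise ValueError("at least one failure is needed")
    beta = n / sum(log(end_time / t) for t in failure_times)
    lam = n / end_time ** beta
    return beta, lam

# Hypothetical failure log (operating hours) for one pump observed for 1000 h
failure_hours = [150, 420, 610, 790, 880, 960]
beta_hat, lam_hat = power_law_nhpp_mle(failure_hours, 1000)
print(f"beta = {beta_hat:.2f}, expected failures by 1000 h = {lam_hat * 1000 ** beta_hat:.1f}")
```

By construction, the fitted model's expected failure count at the end of observation equals the observed count. The useful information is in β, which here is above 1 because the gaps between failures shrink.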

Another tool for repairable system data analysis is the general renewal process (GRP), which quantifies the effectiveness of repairs on the equipment’s reliability performance as a restoration fraction. This fraction expresses the degree of restoration on a scale between bad as old and good as new.
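One common way to express the restoration fraction is Kijima's virtual age model. In the Type I form, each repair keeps a fraction q of the age accumulated since the previous repair: v_i = v_(i-1) + q·x_i, where x_i is the operating time between failures. With q = 0 every repair is good as new. With q = 1 every repair is bad as old, which reduces to the non-homogeneous Poisson process case described above. The sketch below only computes virtual ages for an assumed q. Estimating q from data requires maximizing a likelihood, which is not shown. Software packages also differ in whether they report q or its complement, so check which convention your tool uses.

```python
def kijima_type1_virtual_ages(intervals, q):
    """Virtual age after each repair under Kijima Type I: v_i = v_(i-1) + q * x_i.

    intervals: operating times between successive failures (x_1, x_2, ...).
    q: restoration factor; 0 = good as new, 1 = bad as old.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("this sketch only covers q between 0 and 1")
    ages, v = [], 0.0
    for x in intervals:
        v += q * x        # the repair removes the fraction (1 - q) of the age just added
        ages.append(v)
    return ages

# Same hypothetical failure history under three repair assumptions
for q in (0.0, 0.5, 1.0):
    print(q, kijima_type1_virtual_ages([100, 80, 60], q))
```

Under GRP, the time to the next failure follows the underlying life distribution (often a Weibull) conditioned on survival to the current virtual age. A smaller q therefore means the equipment behaves as if it were younger after each repair.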

For non-repairable systems, consider one of many other distributions (e.g., lognormal, Gumbel, or gamma) or a non-parametric method such as the Kaplan-Meier reliability estimator.
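The Kaplan-Meier estimator needs no distribution assumption. At each failure time, reliability is multiplied by the fraction of units still at risk that survived. The sketch below assumes a simple input layout: a time for each unit plus a flag for whether it failed or was suspended (removed, or still running without failure). It omits the confidence bounds a full analysis would report.

```python
def kaplan_meier(times, failed):
    """Kaplan-Meier reliability estimate, without confidence bounds.

    times: time on test for each unit.
    failed: True if the unit failed at that time, False if it was suspended.
    Returns (time, reliability) just after each distinct failure time.
    """
    data = sorted(zip(times, failed))
    at_risk = len(data)
    reliability, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_failed = n_leaving = 0
        # Group every unit with this exact time; suspensions at t count as at risk at t
        while i < len(data) and data[i][0] == t:
            n_failed += data[i][1]
            n_leaving += 1
            i += 1
        if n_failed:
            reliability *= 1.0 - n_failed / at_risk
            curve.append((t, reliability))
        at_risk -= n_leaving   # failed and suspended units both leave the risk set
    return curve

# Hypothetical bearing test: five units, two suspended (still running or removed)
curve = kaplan_meier([100, 150, 150, 200, 300], [True, False, True, True, False])
for t, rel in curve:
    print(t, round(rel, 2))   # 100 0.8, then 150 0.6, then 200 0.3
```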

In some situations, the data may represent the decay or decline of the equipment’s performance. In this case, a degradation modeling approach would be appropriate.
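One simple degradation approach, sketched below, fits a straight line to each unit's performance readings and extrapolates to the time the line crosses a failure threshold. The resulting "pseudo failure times" can then be analyzed with a life distribution such as the Weibull. The straight-line path, the threshold, and the function name are assumptions for illustration. Real degradation paths are often curved and need a model chosen to suit the physics of the failure.

```python
def pseudo_failure_time(times, readings, threshold):
    """Least-squares line through one unit's readings, extrapolated to the threshold.

    Assumes a roughly linear degradation path. Returns None when the fitted
    line does not reach the threshold at a positive time.
    """
    n = len(times)
    mt = sum(times) / n
    my = sum(readings) / n
    sxx = sum((t - mt) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("need readings at two or more distinct times")
    slope = sum((t - mt) * (y - my) for t, y in zip(times, readings)) / sxx
    intercept = my - slope * mt
    if slope == 0:
        return None                       # flat path never reaches the threshold
    t_cross = (threshold - intercept) / slope
    return t_cross if t_cross > 0 else None

# Hypothetical pump: flow (% of rated) checked every 500 h; failure defined as 85%
t_fail = pseudo_failure_time([0, 500, 1000, 1500], [100, 97, 94, 91], threshold=85)
print(round(t_fail))  # 2500
```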

Another special case for your analysis is when the data has a mix of variables and attribute data, in which case a Cox Proportional Hazards model may be useful.

When in doubt, consult your friendly statistician.

Asking questions and making decisions

With any analysis of data, the goal is to learn something about the equipment or process that provided the data.

Sometimes the right plot is all you need. In other cases, you may require a comprehensive data analysis including summaries, plots, exploration, and assumption checks.

The basic elements are to use good data, conduct an honest evaluation/analysis, and let the data reveal the patterns, trends, and results. The results of the analysis should be understandable to those involved in using the analysis to make decisions.

Be clear what the results mean and do not mean. Be clear about assumptions and uncertainty.

Arm your decision-making team with the information within the data.

Remember, to find success, you must first solve the problem, then achieve the implementation of the solution, and finally sustain winning results.

Fred Schenkelberg is an experienced reliability engineering and management consultant with his firm FMS Reliability. His passion is working with teams to create cost-effective reliability programs that solve problems, create durable and reliable products, increase customer satisfaction, and reduce warranty costs. If you enjoyed this article, consider subscribing to the ongoing series at Accendo Reliability.

James Kovacevic

HIGH-PERFORMANCE RELIABILITY

Solve, Achieve, Sustain



The other articles in the series include:

Post 1 – Using the Maintenance Data You Already Have

Post 2 – The What & More Importantly, The Why of the Weibull Analysis

Post 3 – Quantify the Improvements (or Gaps) In Your Reliability

Post 4 – First Step in Analyzing Repairable Systems Data

Post 5 – The Next Step in Your Failure Data

Post 6 – The Next Step in Your Data Analysis

Post 7 – Data Q&A with Fred & James

References:

Fred Schenkelberg: accendoreliability.com/about/fred-schenkelberg/

FMS Reliability: www.fmsreliability.com

Accendo Reliability: accendoreliability.com/musings/

New Weibull Handbook: http://geni.us/Weibull

Comments

Braden Bills says
October 14, 2016 at 8:56 AM
It’s interesting that data analysis can discover so much. Businesses could definitely benefit from it! Having the most up to date analysis is pretty important, though.
- Fred Schenkelberg says
  October 14, 2016 at 9:13 AM
  Yes, I agree. One of the best I’ve seen was a database that allowed near real-time plotting according to a wide range of filters. The challenge wasn’t the plotting; it was gathering and entering the data. cheers, Fred