I estimated actuarial failure rates, made actuarial forecasts, and recommended stock levels for automotive aftermarket stores. I wondered how to account for seasonality in their sales. Time series forecasts account for seasonality but not for age, the force of mortality captured by actuarial forecasts. I finally figured out how to seasonally adjust actuarial forecasts. It’s the same method, David Cox’s “Proportional Hazards” model, used to make “Semi-Parametric” estimates and “Credible Reliability Predictions”.
[Read more…]
Why Use Nonparametric Reliability Statistics?
Fred asked me to explain why we use nonparametric statistics. The answer is reality. Reality trumps opinion, mathematical convenience, and tradition. Reality is more interesting, but quantifying reality takes work, especially if you track lifetimes. Using field reliability reality provides credibility and could reduce the uncertainty that comes from tradition and unwarranted, unverified assumptions.
Data is inherently nonparametric. Cardinal numbers are used for period counts: cohorts, cases, failures, etc. Accounting data is numerical; it is derived from data or from dollars required by GAAP (Generally Accepted Accounting Principles); e.g., revenue = price × (products sold), service cost = (cost per service) × (number of services), or the number of spare parts sold. Why not do nonparametric reliability estimation, with or without lifetime data?
[Read more…]
Building a Frequency Table
In a meeting the other day, the presenter was talking about a range of different failures for the product in question. She talked about each issue and a bit about the failure analysis, yet didn’t reveal which failures occurred more or less often.
She did provide a handout listing the problems in order of the product’s field age, along with each failure name (the component or system involved). So, I grabbed a piece of paper to create a frequency table so I could quickly determine which problems occurred more often than others. [Read more…]
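The same tally is easy to build in software. A minimal sketch with Python’s standard library follows; the failure names below are hypothetical, standing in for whatever appeared on the handout.

```python
from collections import Counter

# Hypothetical failure names as they might appear on the handout,
# listed in order of product field age.
failures = [
    "seal leak", "bearing wear", "seal leak", "connector corrosion",
    "bearing wear", "seal leak", "board crack", "seal leak",
]

# Tally each failure mode, then list from most to least frequent.
table = Counter(failures)
for name, count in table.most_common():
    print(f"{name:22s} {count}")
```

`most_common()` does the sorting that makes the dominant failure modes jump out, which is the whole point of the frequency table.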
The Non-parametric Friedman Test
The Friedman test is a non-parametric test for differences between groups when the dependent variable is at least ordinal (it could be continuous). The Friedman test is the non-parametric alternative to the one-way ANOVA with repeated measures (or the complete block design, and a special case of the Durbin test). If the data departs significantly from the normal distribution, this becomes the preferred test over an ANOVA.
The test procedure ranks the values within each row (block), then considers the rank values by column. The data is organized into a matrix with B rows (blocks) and T columns (treatments), with a single observation in each cell of the matrix. [Read more…]
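A quick sketch using `scipy.stats.friedmanchisquare`, which takes one sequence per treatment (column), each holding that treatment’s measurements across the blocks. The values below are hypothetical.

```python
from scipy import stats

# Hypothetical data: 3 treatments (columns) measured on 5 blocks (rows).
t1 = [8.5, 7.1, 9.0, 6.2, 8.8]
t2 = [7.9, 6.8, 8.1, 5.9, 8.0]
t3 = [6.5, 6.0, 7.2, 5.5, 7.1]

stat, p = stats.friedmanchisquare(t1, t2, t3)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.4f}")
```

In this contrived example the within-block ordering is the same in every block, so the statistic reaches its maximum for 3 treatments and 5 blocks and the null hypothesis of no treatment difference is rejected.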
McNemar Test
The McNemar test is a nonparametric statistical test to compare dichotomous (binary) results of paired data.
If you are comparing survey results (favorable/unfavorable) for a group of potential customers given two ad campaigns, evaluating the performance of two vendors on a set of prototype units, or determining whether a maintenance procedure is effective for a set of equipment, this test permits the detection of changes.
The McNemar test is similar to the χ2 test. The McNemar test only works with a two-by-two table, whereas the χ2 test works with larger tables. The χ2 test checks for independence, while the McNemar test looks for consistency in results.
Let’s examine an example where a group of people are surveyed about a prototype design, before and after a presentation. [Read more…]
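As a sketch of the computation (the survey counts below are hypothetical): only the two discordant cells of the paired 2×2 table enter the statistic, here shown in the continuity-corrected form.

```python
from scipy.stats import chi2

# Hypothetical 2x2 paired table: opinions before (rows) vs after (columns)
# the presentation.
#                  after: favorable   after: unfavorable
# before: fav             30                  4
# before: unfav           12                 14
b, c = 4, 12  # the two discordant cells drive the test

# McNemar statistic with continuity correction, compared to chi-square, 1 df
stat = (abs(b - c) - 1) ** 2 / (b + c)
p = chi2.sf(stat, df=1)
print(f"McNemar statistic = {stat:.4f}, p = {p:.4f}")
```

The 30 and 14 concordant pairs carry no information about a shift in opinion, which is why they never appear in the formula.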
Siegel-Tukey Test for Differences in Scale
There are a few different reasons we explore differences in scale.
Keep in mind that the scale of a dataset is basically the spread of the data. For most datasets, we’re examining the variance.
Hypothesis tests comparing means vary depending on the assumption of equal variances, so checking that assumption requires methods to adequately test the homogeneity of variances. The F-test should come to mind, as it is a common approach.
Some datasets do not lend themselves to the F-test, which applies to measurements on a continuous scale. Some datasets gather ordinal or interval data, so we need another approach to test for differences in scale. [Read more…]
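The Siegel-Tukey test is not built into scipy, but its rank scheme is short enough to sketch: rank the pooled data alternately from the extremes inward (extreme values get low ranks), then apply a Wilcoxon rank-sum test to those ranks. The function below is a minimal sketch using the normal approximation, and the two samples are hypothetical.

```python
import numpy as np
from scipy import stats

def siegel_tukey(x, y):
    """Two-sided Siegel-Tukey test via a normal approximation (a sketch)."""
    data = np.concatenate([x, y])
    group = np.array([0] * len(x) + [1] * len(y))
    order = np.argsort(data, kind="mergesort")
    n = len(data)

    # Assign ranks alternately from the extremes inward: 1 to the smallest,
    # 2 and 3 to the two largest, 4 and 5 to the next two smallest, and so on.
    ranks = np.zeros(n)
    lo, hi, rank = 0, n - 1, 1
    take_low, count = True, 1
    while lo <= hi:
        for _ in range(count):
            if lo > hi:
                break
            if take_low:
                ranks[order[lo]] = rank
                lo += 1
            else:
                ranks[order[hi]] = rank
                hi -= 1
            rank += 1
        take_low, count = not take_low, 2

    # Wilcoxon rank-sum on the Siegel-Tukey ranks.
    n1, n2 = len(x), len(y)
    w = ranks[group == 0].sum()
    mean = n1 * (n + 1) / 2
    var = n1 * n2 * (n + 1) / 12
    z = (w - mean) / np.sqrt(var)
    return z, 2 * stats.norm.sf(abs(z))

# Hypothetical samples with similar centers but different spread.
narrow = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7]
wide = [8.0, 12.1, 7.5, 12.6, 6.9, 13.0]
z, p = siegel_tukey(narrow, wide)
print(f"z = {z:.3f}, p = {p:.4f}")
```

Because the wide sample holds the extreme values, it collects the low ranks and the test flags a scale difference. Note this sketch ignores ties; a production version would need a tie-handling rule.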
Plotting Repairable System Failure Data
A good plot reveals the data’s story.
Repairable system data is what statisticians call a renewal process.
The repair activity may restore the system to as good as new. Sometimes, the repair leaves the system in a state much like it was just before the repair.
What happens most often, though, is the chance of system failure changes after each repair activity.
A simple plot can help us see what is happening. [Read more…]
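One such plot is the cumulative number of repairs against system age: roughly linear suggests a stable repair rate, curving upward suggests the system is failing more often as it ages, and curving downward suggests improvement. A minimal sketch assuming matplotlib is available, with hypothetical repair ages:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs anywhere
import matplotlib.pyplot as plt

# Hypothetical repair ages (operating hours) for one repairable system.
# Shrinking gaps between repairs hint at a worsening system.
repair_ages = np.array([120, 310, 450, 520, 575, 610, 630])
counts = np.arange(1, len(repair_ages) + 1)

plt.step(repair_ages, counts, where="post")
plt.xlabel("System age (hours)")
plt.ylabel("Cumulative number of repairs")
plt.title("Cumulative repairs vs. age")
plt.savefig("cumulative_repairs.png")
```

With several systems, plotting each one’s step curve on the same axes (or averaging them into a mean cumulative function) makes fleet-level trends visible.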
The Wald Wolfowitz Run Test for Two Small Samples
This nonparametric test evaluates whether two continuous cumulative distributions are significantly different.
For example, if the assumption is that two production lines producing the same product create the same resulting dimensions, comparing a set of samples from each line may reveal whether that hypothesis is true.
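The test itself is simple enough to sketch: pool and sort the two samples, count runs (maximal stretches from the same sample), and compare the run count to its expectation under the null. Few runs mean the samples separate, so the distributions likely differ. A minimal sketch using the normal approximation, with hypothetical measurements:

```python
import numpy as np
from scipy import stats

def runs_test_2samp(x, y):
    """Two-sample Wald-Wolfowitz runs test, normal approximation (a sketch)."""
    data = np.concatenate([x, y])
    labels = np.array([0] * len(x) + [1] * len(y))
    labels = labels[np.argsort(data, kind="mergesort")]

    # Count runs: maximal stretches of values from the same sample.
    runs = 1 + int(np.sum(labels[1:] != labels[:-1]))

    n1, n2 = len(x), len(y)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / np.sqrt(var)
    # Too few runs indicates the distributions differ (lower tail).
    return runs, z, stats.norm.cdf(z)

# Hypothetical dimension measurements from two production lines.
line_a = [10.02, 10.05, 10.01, 10.04, 10.03]
line_b = [10.11, 10.09, 10.12, 10.10, 10.08]
runs, z, p = runs_test_2samp(line_a, line_b)
print(f"runs = {runs}, z = {z:.3f}, p = {p:.4f}")
```

Here every line-A value falls below every line-B value, giving the minimum of two runs and a small p-value. Note the normal approximation is rough for samples this small; exact tables are preferred there.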
Mood’s Median Test
This nonparametric hypothesis test checks the equality of population medians. While not as powerful as the Kruskal-Wallis test, it is useful for smaller sample sizes and when there are a few outliers or errors in the data, as it focuses only on the median value. [Read more…]
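A quick sketch using `scipy.stats.median_test`, which counts how many values in each sample fall above the pooled grand median and runs a chi-square test on the resulting table. The samples below are hypothetical.

```python
from scipy import stats

# Hypothetical cycle-time samples (minutes) from three crews.
crew1 = [42, 45, 41, 48, 44, 43]
crew2 = [47, 49, 50, 46, 48, 51]
crew3 = [43, 44, 42, 45, 46, 41]

stat, p, grand_median, table = stats.median_test(crew1, crew2, crew3)
print(f"grand median = {grand_median}, chi-square = {stat:.3f}, p = {p:.4f}")
```

The `table` output shows the above/below counts per crew; here crew 2 sits almost entirely above the grand median, driving the rejection. The `ties` argument controls how values equal to the grand median are counted (below, by default).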
Levene’s Test
Here’s an overview of the non-parametric test to evaluate whether a set of samples has the same variance. If the variances are equal, they have homogeneity of variances.
Some statistical tests assume equal variances across samples, such as the analysis of variance and many types of hypothesis tests. Equal variance is also assumed for statistical process control purposes when determining stability (often done with range (R) or standard deviation (s) charts). [Read more…]
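A quick sketch with `scipy.stats.levene`; the measurements below are hypothetical, with the second machine deliberately given a wider spread.

```python
from scipy import stats

# Hypothetical measurements from three machines; m2 has a wider spread.
m1 = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
m2 = [20.5, 19.2, 21.1, 18.9, 20.8, 19.5]
m3 = [20.0, 20.2, 19.9, 20.1, 20.0, 19.8]

# center="median" gives the Brown-Forsythe variant, more robust to
# skewed data than the original mean-centered form.
stat, p = stats.levene(m1, m2, m3, center="median")
print(f"W = {stat:.3f}, p = {p:.4f}")
```

A small p-value here, as in this example, says at least one group’s spread differs, so a downstream test that assumes homogeneity of variances would be on shaky ground.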
Contingency Coefficient
A contingency table, as in the chi-squared test of independence, reveals whether two sets of data or groups are independent. It does not reveal the strength of the dependence. The contingency coefficient is a non-parametric measure of association for cross-classification data. [Read more…]
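Pearson’s contingency coefficient follows directly from the chi-square statistic, C = √(χ²/(χ² + n)). A minimal sketch with hypothetical cross-classification counts:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical cross-classification: region (rows) vs preferred product (cols).
table = np.array([[30, 10, 20],
                  [15, 25, 20]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
C = np.sqrt(chi2 / (chi2 + n))  # Pearson's contingency coefficient
print(f"chi-square = {chi2:.3f}, p = {p:.4f}, C = {C:.3f}")
```

C runs from 0 (no association) toward, but never reaching, 1; its maximum depends on the table dimensions, which is why some analysts rescale it before comparing tables of different sizes.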
Chi-Square Test of Independence
The chi-square (χ2) test provides a means to determine independence between two or more variables. It works with count data.
Contingency table analysis and row-and-column (r x c) analysis are other common names for this analysis. It is useful when comparing results from different treatments or processes. [Read more…]
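A quick sketch with `scipy.stats.chi2_contingency` on a hypothetical 2 x 2 table of pass/fail counts under two processes:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: pass/fail results under two processes.
observed = np.array([[40, 10],   # process A: pass, fail
                     [28, 22]])  # process B: pass, fail

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p:.4f}")
print("expected counts under independence:\n", expected)
```

Note that for 2 x 2 tables scipy applies the Yates continuity correction by default (`correction=True`); the `expected` array is worth inspecting, since the chi-square approximation weakens when expected cell counts drop below about 5.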
Kaplan-Meier Reliability Estimator
Here’s an overview of a distribution-free approach commonly called the Kaplan-Meier (K-M) Product Limit Reliability Estimator.
There are no assumptions about underlying distributions, and K-M works with datasets with or without censored data. We do need to know when failures occur and when losses occur (items removed from the evaluation or test for a reason other than failure, i.e., censored items). [Read more…]
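The product-limit computation itself is short: at each failure time, multiply the running survival estimate by (items at risk − failures) / (items at risk), while censored items simply leave the risk set without a factor. A minimal sketch with hypothetical ages; it handles censoring but not the tie conventions a production implementation would need.

```python
import numpy as np

def kaplan_meier(times, event):
    """Kaplan-Meier product-limit estimate (a sketch).
    times: age at failure or censoring; event: 1 = failure, 0 = censored."""
    times = np.asarray(times, dtype=float)
    event = np.asarray(event)
    order = np.argsort(times, kind="mergesort")
    times, event = times[order], event[order]

    at_risk = len(times)
    surv = 1.0
    steps = []  # (failure time, survival estimate just after it)
    for t, e in zip(times, event):
        if e == 1:
            surv *= (at_risk - 1) / at_risk
            steps.append((t, surv))
        at_risk -= 1  # failures and censored items both leave the risk set
    return steps

# Hypothetical ages (hours): five failures and two censored removals.
times = [55, 70, 70, 85, 100, 110, 120]
event = [1, 1, 0, 1, 0, 1, 1]
for t, s in kaplan_meier(times, event):
    print(f"t = {t:5.0f}  R(t) = {s:.3f}")
```

Notice how the censored items at 70 and 100 hours shrink the risk set, making each later failure cost more survival probability than it would in an uncensored dataset.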
Kruskal-Wallis Test
This is a non-parametric test to compare ranked data from three or more groups or treatments. The basic idea is to compare the mean rank values and test whether the samples could be from the same distribution or at least one is not.
The null hypothesis is that the data from each group would receive about the same mean rank score. We are comparing rank values, not the actual values. [Read more…]
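A quick sketch using `scipy.stats.kruskal`; the time-to-repair samples below are hypothetical, with one procedure deliberately slower.

```python
from scipy import stats

# Hypothetical time-to-repair samples (hours) under three procedures.
proc_a = [4.1, 3.8, 4.5, 4.0, 3.9]
proc_b = [5.2, 5.8, 5.5, 6.1, 5.0]
proc_c = [4.4, 4.2, 4.6, 4.3, 4.8]

h, p = stats.kruskal(proc_a, proc_b, proc_c)
print(f"H = {h:.3f}, p = {p:.4f}")
```

Here procedure B’s values all rank above the others, so H is large and the hypothesis that all three samples share a distribution is rejected; the test itself does not say which group differs, so a post-hoc pairwise comparison would follow.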
Spearman Rank Correlation Coefficient
This non-parametric analysis tool provides a way to compare two sets of ordinal data (data that can be rank-ordered in a meaningful manner). The result, rs, is a measure of the association between the two datasets.
You may want to know if two reviewers have similar ratings for movies, or if two assessment techniques provide similar results. [Read more…]
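The movie-reviewer case sketches easily with `scipy.stats.spearmanr`; the ratings below are hypothetical.

```python
from scipy import stats

# Hypothetical ratings (1-10) from two reviewers for eight movies.
reviewer_1 = [9, 7, 5, 8, 3, 6, 2, 4]
reviewer_2 = [8, 7, 6, 9, 2, 5, 3, 1]

rs, p = stats.spearmanr(reviewer_1, reviewer_2)
print(f"r_s = {rs:.3f}, p = {p:.4f}")
```

Because only the rank order matters, rs is unchanged by any monotone rescaling of either reviewer’s scores; with no ties, it equals 1 − 6Σd²/(n(n² − 1)), where d is the per-movie rank difference.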