
Why “Good Enough” Assumptions Often Are
Co-authored with Mike Vella
If you’ve spent any time working with real manufacturing, reliability, or field data, you already know an uncomfortable truth:
Most statistical models assume ideal conditions that rarely exist in practice.
Textbooks often begin with assumptions like perfectly normal distributions, clean random samples, and well-behaved processes. Meanwhile, engineers are dealing with skewed cycle times, mixed populations, censored failure data, process shifts, and the occasional mystery outlier that refuses to explain itself.
This gap between theory and reality is where statistical robustness becomes essential.
What Does “Robust” Really Mean?
A statistical procedure is considered robust when it continues to perform well even when its underlying assumptions are violated, at least to a moderate degree.
In other words, a robust method doesn’t collapse the moment your data stops being textbook-perfect.
More practically, robust statistics are those that yield reliable results across a wide range of real-world conditions, including:
- Non-normal data
- Mild skewness
- Occasional outliers
- Small departures from ideal sampling assumptions
This is not an excuse to ignore assumptions altogether, but it is a recognition that engineering data is rarely immaculate.
Why the Central Limit Theorem Matters So Much
One of the main reasons certain statistical tests work so well in practice is the Central Limit Theorem (CLT).
The CLT tells us that:
Regardless of the shape of the underlying population (provided it has a finite variance), the distribution of sample means will approach a normal distribution as the sample size increases.
This single idea explains why tests based on means, such as the t-test and ANOVA, are often far more forgiving than many engineers expect.
Even when the underlying data are not normally distributed, these tests can still perform quite well, provided the sample size is large enough and the data are reasonably well-behaved.
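To see this at work, here is a small simulation sketch. The exponential population and the sample sizes are purely illustrative, but the pattern is general: the raw data are strongly skewed, yet the skewness of the sample means shrinks steadily as the sample size grows.

```python
# A minimal sketch of the CLT in action: the population is strongly skewed
# (exponential), but the sample means become more symmetric as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

population = rng.exponential(scale=100, size=200_000)
print(f"Skewness of the raw population: {stats.skew(population):.2f}")

for n in (5, 10, 40, 100):
    # Draw 5,000 samples of size n and keep only their means.
    sample_means = rng.exponential(scale=100, size=(5_000, n)).mean(axis=1)
    print(f"n = {n:>3}: skewness of the sample means = {stats.skew(sample_means):.2f}")
```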
Why t-Tests and ANOVA Are More Forgiving Than You Think
Formally, t-tests and ANOVA assume:
- A simple random sample
- A normally distributed population
In real-world engineering applications, one of these matters much more than the other.
Random sampling is usually far more important than perfect normality.
True normal populations are rare in manufacturing and reliability work. Cycle times, strength data, time-to-failure, and defect counts almost never look perfectly normal. Fortunately, the CLT tells us that if we collect data properly and have a sufficiently large sample, the sampling distribution of the mean will still behave nicely.
This is why t-tests and ANOVA are often described as robust to moderate violations of the normality assumption.
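As a rough illustration of what that looks like in code, the sketch below compares two simulated, right-skewed (lognormal) sets of “cycle times”; the numbers are placeholders, not real process data, and Welch’s version of the t-test is used so equal variances don’t have to be assumed either.

```python
# Minimal sketch: a two-sample t-test on right-skewed (lognormal) data.
# The data are simulated stand-ins for real cycle-time measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)

line_a = rng.lognormal(mean=3.0, sigma=0.5, size=50)  # ~50 readings from line A
line_b = rng.lognormal(mean=3.2, sigma=0.5, size=50)  # ~50 readings from a shifted line B

# Welch's t-test (equal_var=False) avoids also assuming equal variances.
t_stat, p_value = stats.ttest_ind(line_a, line_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

Neither group is normally distributed, but with roughly 50 observations per group the comparison of means is exactly the kind of situation the CLT covers.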
A More Useful Question to Ask
Instead of asking:
“Is my data perfectly normal?”
A more productive question is:
“How robust is this test to the way my data actually behaves?”
That mindset shift is important—especially for quality and reliability professionals who work with imperfect, operational data every day.
Practical Guidance Before Running a Test
Before applying a t-test or ANOVA, take a few minutes to sanity-check your data:
1. Examine the sampling process
How was the data collected? Did every unit, part, or time period have a reasonable chance of being selected? Weak sampling undermines any statistical method.
2. Plot the data
Simple plots reveal a lot (see the sketch after this checklist). Look for:
- Reasonable symmetry
- Obvious skewness
- Multiple peaks (which may indicate mixed populations or process changes)
3. Watch for multimodality
Multiple peaks often signal that more than one process is contributing to the data, a problem that statistical analysis alone cannot fix.
4. Investigate outliers
An outlier may represent:
- A real but rare process condition
- A special cause worth investigating
- A measurement or data-entry error
The statistics won’t answer that question—you have to.
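As a rough sketch of steps 2 through 4, the snippet below draws a histogram for the shape checks and uses the common 1.5 × IQR convention to flag, not delete, points worth a closer look. The simulated data are placeholders for your own measurements.

```python
# Minimal sketch of the visual checks: a histogram for symmetry and multiple peaks,
# plus an IQR-based screen that flags (never silently removes) candidate outliers.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=3)
data = rng.lognormal(mean=3.0, sigma=0.4, size=120)  # stand-in for real measurements

# Steps 2 and 3: plot the data and look for skewness or multiple peaks.
plt.hist(data, bins=20, edgecolor="black")
plt.xlabel("Measurement")
plt.ylabel("Count")
plt.title("Quick shape check before testing")
plt.show()

# Step 4: flag candidate outliers with the 1.5 * IQR convention, then go investigate them.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
flagged = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]
print(f"{flagged.size} point(s) flagged for investigation: {np.round(flagged, 1)}")
```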
Sample Size, Skewness, and Practical Rules of Thumb
A useful engineering rule of thumb:
- Strong skewness can be a problem when sample size is less than 40
- Strong skewness is usually not a serious issue when sample size is 40 or greater
Once again, this is the Central Limit Theorem doing its work. As sample size increases, the sampling distribution of the mean becomes increasingly normal—even when the underlying data are not.
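A quick simulation sketch makes that rule of thumb concrete. The exponential population and the 5% significance level are assumptions chosen for illustration: when the null hypothesis is true but the data are strongly skewed, the one-sample t-test’s false-alarm rate drifts away from the nominal 5% at small sample sizes and settles back toward it as the sample size approaches 40 and beyond.

```python
# Minimal sketch: how often a one-sample t-test "rejects" at the 5% level
# when the null hypothesis is actually true but the data are strongly skewed.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=11)
true_mean, trials = 1.0, 20_000  # exponential(scale=1) really does have mean 1.0

for n in (10, 20, 40, 80):
    samples = rng.exponential(scale=1.0, size=(trials, n))
    p_values = stats.ttest_1samp(samples, popmean=true_mean, axis=1).pvalue
    print(f"n = {n:>2}: observed false-alarm rate = {np.mean(p_values < 0.05):.3f}")
```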
Final Thought: Robust Doesn’t Mean Careless
Robust statistical methods are not a license to ignore assumptions or skip data exploration. They are, however, a reminder that many classical tools were designed to work in imperfect conditions, exactly the kind engineers face daily.
Understanding why certain methods are robust allows you to use them confidently, responsibly, and effectively without being paralyzed by minor deviations from textbook assumptions.
Authors’ Biographies
Ray Harkins is the General Manager of Lexington Technologies in Lexington, North Carolina. He earned his Master of Science from Rochester Institute of Technology and his Master of Business Administration from Youngstown State University. He also teaches 60+ quality, engineering, manufacturing, and business-related courses such as Quality Engineering Statistics, Reliability Engineering Statistics, Failure Modes and Effects Analysis (FMEA), and Root Cause Analysis and the 8D Corrective Action Process through the online learning platform, Udemy.
Mike Vella served for 12 years as Senior VP of Operations at the Suter Company, an employee-owned food producer located in Sycamore, Illinois. Prior to joining Suter, Mike was the Vice President and General Manager of TI Automotive’s Brake and Fuel Group in North America. He is a Fellow of the American Society for Quality and an instructor with the Manufacturing Academy, developing training resources focused on quality, problem solving, and statistical analysis.