An error that I see on occasion is the use of statistical confidence with a goal or target value. For example, we want 95% reliability at five years with 90% confidence. What does that mean?
Statistical confidence applies only to statements about a sample and the statistic calculated from it.
First, let’s consider a couple of terms.
Statistic: A numerical value calculated from a sample that may be used to make an inference about a population.
Parameter: The true population value, often unknown, estimated by a statistic. A parameter is a numerical value describing some characteristic of a population or process.
Population: The complete set.
Sample: A portion or subset of a population.
Random Sample: A subset of a population selected so that every item in the population has an equal chance of being included in the sample.
Let’s say we have a population, and we are interested in the mean (average) of that population. We select a sample (at random if at all possible) and measure a value, like height or weight or resistance, or whatever is of interest, for each selected item in the sample.
We calculate the mean of the sample by summing the sample values and dividing by the number of items in the sample.
Because we are only using a subset of the population it is possible the sample items are from one part of the population, say the tall part only. It may not be likely, yet it is possible to have selected samples that do not represent the range of values in the population.
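To make that possibility concrete, here is a minimal Python sketch. The population of heights (10,000 values centered near 170 with a spread of 10) is made up for illustration, as are the sample size of 10 and the random seed. It draws five small random samples and shows how far each sample mean lands from the population mean.

```python
import random
import statistics

# Hypothetical population: 10,000 heights in cm (made-up values for illustration).
rng = random.Random(42)
population = [rng.gauss(170, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Draw several small random samples and compare each sample mean
# to the population mean.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, 10)  # random sample without replacement
    sample_means.append(statistics.fmean(sample))

print(f"population mean: {population_mean:.1f}")
for m in sample_means:
    print(f"sample mean: {m:.1f} (off by {m - population_mean:+.1f})")
```

Each run of the loop gives a different sample mean, and none of them is expected to match the population mean exactly.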
This possibility, that the sample statistic we expect to represent the population parameter does not actually come close, is what statistical confidence addresses. Put positively, we say we are 95% confident that the true, unknown population parameter falls within a range of values, called the confidence interval or confidence bounds. That means there is a 5% chance that the actual and unknown population parameter is outside that range. In other words, we are 95% confident that the sample is ‘this’ close to the actual value.
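One way to get a feel for the 95% is to simulate it. The sketch below assumes a normally distributed population with a known mean and standard deviation (50 and 10, both made up), repeatedly draws samples of 30, and builds the normal-based interval for a known population standard deviation around each sample mean. The function name `coverage` and all of the numbers are illustrative assumptions, not part of any standard.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(true_mean, sigma, n, confidence, trials, seed=1):
    """Fraction of simulated samples whose interval contains true_mean."""
    rng = random.Random(seed)
    # Two-sided standard normal value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage(true_mean=50.0, sigma=10.0, n=30, confidence=0.95, trials=2000)
print(f"intervals containing the true mean: {rate:.1%}")
```

Over many samples, roughly 95 out of 100 intervals should contain the true mean, and roughly 5 out of 100 should miss it. In practice we only get one sample, and we do not know which kind it is.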
For continuous data and large samples, we can use the normal distribution to estimate the confidence interval.
$$ \large\displaystyle \mu =\bar{x}\pm Z_{\alpha /2}\frac{\sigma }{\sqrt{n}}$$
Where,
- μ is the population mean (parameter)
- x̄ is the sample mean (a statistic)
- σ is the standard deviation of the population (a parameter)
- n is the number of items in the sample
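- Zα/2 is the standard normal distribution value for the desired confidence level, where α is one minus the confidence level (for 95% confidence, α = 0.05 and Zα/2 = 1.96)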
Now an example. Suppose the mean (a statistic) of a sample of 100 items is 18, and the population standard deviation (a parameter) is 6. Calculate the 95% confidence interval for the population mean, μ.
$$ \large\displaystyle \mu =18\pm 1.96\frac{6}{\sqrt{100}}=18\pm 1.176$$
This means there is a 95% chance that the true and unknown population mean is within 1.176 of the sample mean of 18, that is, between 16.824 and 19.176. And there is a 5% chance that it is outside that range. Unless we determine the population mean (measure everything in the population), we won’t know.
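The same calculation can be sketched in a few lines of Python. The function name `z_interval` is just a label chosen here, and the sketch assumes a known population standard deviation and a large sample, as in the example. It uses `NormalDist` from the standard library to look up the Z value instead of a table.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided interval for the mean with a known population sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin, margin

# Values from the example: sample mean 18, sigma 6, n = 100, 95% confidence.
lower, upper, margin = z_interval(xbar=18, sigma=6, n=100, confidence=0.95)
print(f"margin = {margin:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

With the example's numbers this prints a margin of 1.176 and an interval of (16.824, 19.176).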
For fun, suppose we are willing to take more risk of the sample not representing the population. The sample stays the same; only the confidence level changes. Let’s go from 95% to 90%, and we find
$$ \large\displaystyle \mu =18\pm 1.645\frac{6}{\sqrt{100}}=18\pm 0.987$$
This interval has a smaller range. Interesting. Same data, more risk, smaller confidence range. In other words, by accepting more risk, we are saying there is now a greater chance that the true unknown parameter falls outside the range described by the confidence interval. The true value doesn’t change; the sample statistic doesn’t change.
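To see the trade-off across more than two confidence levels, here is a small Python sketch using the same sample values (mean 18, standard deviation 6, 100 items). The extra levels of 80% and 99% are added here for illustration only.

```python
import math
from statistics import NormalDist

xbar, sigma, n = 18, 6, 100  # values from the example in this article

margins = {}
for confidence in (0.80, 0.90, 0.95, 0.99):
    # Two-sided standard normal value for this confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margins[confidence] = z * sigma / math.sqrt(n)
    print(f"{confidence:.0%} confidence: {xbar} ± {margins[confidence]:.3f}")
```

The margin grows as the confidence level rises: the only way to be more confident with the same data is to claim a wider range.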
Pete edwards says
Interesting page on confidence intervals, liked it. Would be a better page if the damn pop up in the corner inviting me to follow fred would go away so I could read the bloody page!
Oh, and I nearly forgot…there is no way to report this to the page authors…so they don’t know they made their own website practically useless….how’s the confidence interval now?
Other than that…the web page is much better than the CRE exam prep book. This page actually has definitions of the symbols used in the math…excellent!
Fred Schenkelberg says
Hi Pete,
You are right, there isn’t a contact link or form on the site – I will have to fix that directly.
And the pop up – the site is hosted on a free wordpress.com account and the folks at wordpress insert ads on occasion and the popup is outside my control. Thanks for letting me know and I’ll query about a way to minimize the intrusion.
And, you may find it useful that I’m working on moving the CRE preparation material to a new site, http://www.accendoreliability.com in a month or two, if all goes well. I’ll have full control of the site – and promise, no popups.
The new site will have webinars, podcasts and other reliability engineering materials, including a couple of eBooks. My plan is to consolidate a few projects into one site, plus expand it to include courses and more books. Please watch for launch announcements soon.
Cheers,
Fred
fms@fmsreliability.com