Variance is a statistical concept describing the spread, or dispersion, of a set of data. Second to the mean, it is a common value we may calculate.
We often find standard deviation easier to understand and use because it has the same units as the data, whereas variance has those units squared.
We use variance in quite a few different ways. Let’s review just a few.
We initially may calculate variance to estimate a distribution parameter. The normal distribution comes to mind here as the mean and variance are the two essential parameters.
Every other distribution of interest has a variance as well. The Weibull distribution is described by its shape and scale parameters, yet it also has a mean and a variance. While they exist, they are not all that informative.
We calculate variance by first calculating the mean of the data, then summing the squared differences between each data point and the mean, and dividing by the number of data points. When dealing with a sample from a population, we divide by the number of data points minus one instead.
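The calculation just described can be sketched in a few lines of Python. The function name and the measurement values are made up for illustration; Python's standard library offers the same results through statistics.pvariance and statistics.variance.

```python
def variance(data, sample=True):
    """Variance of data: divide by n - 1 for a sample, by n for a population."""
    n = len(data)
    mean = sum(data) / n
    # Sum of squared differences between each data point and the mean
    squared_diffs = sum((x - mean) ** 2 for x in data)
    return squared_diffs / (n - 1 if sample else n)


# Made-up measurements, for illustration only
measurements = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(variance(measurements, sample=False))  # population form, divides by n
print(variance(measurements, sample=True))   # sample form, divides by n - 1
```

The sample form always gives the larger of the two values, and the gap shrinks as the number of data points grows.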
Once we calculate the sample variance, we are often interested in the population variance, and thus calculate a confidence interval. The interval is expected to contain the population variance with the stated confidence.
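As a sketch of one common form of that interval, assume the data come from a normal distribution. With n data points, sample variance s^2, and confidence level 1 - alpha, the two-sided interval for the population variance sigma^2 can be written as below. The notation is an assumption made here: chi^2_{p, n-1} stands for the value of the chi-square distribution with n - 1 degrees of freedom that has probability p below it.

```latex
\frac{(n-1)\,s^{2}}{\chi^{2}_{1-\alpha/2,\;n-1}}
\;\le\; \sigma^{2} \;\le\;
\frac{(n-1)\,s^{2}}{\chi^{2}_{\alpha/2,\;n-1}}
```

The interval is not symmetric about s^2, and it is sensitive to departures from normality.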
Sample size calculations rely on knowing or assuming the variance. A smaller variance results in requiring fewer samples for a given situation.
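To illustrate the effect of variance on sample size, here is a minimal sketch for one specific situation: estimating a mean to within a chosen margin of error, assuming the variance is known and the normal approximation applies. The function name, the margin, and the variance values are hypothetical; other situations use different formulas.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(variance, margin, confidence=0.95):
    """Samples needed to estimate a mean within +/- margin (known variance assumed)."""
    # Two-sided standard normal quantile for the chosen confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * variance / margin ** 2)


# Same margin and confidence, two assumed variances
print(sample_size_for_mean(variance=4.0, margin=0.5))
print(sample_size_for_mean(variance=1.0, margin=0.5))
```

Cutting the assumed variance to one quarter cuts the required sample size to roughly one quarter as well.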
When drawing samples and comparing means, we often make the assumption that the variances are the same. For example, we use a different hypothesis test for comparing means depending on whether or not the variances can be treated as equal.
Checking for homogeneity of variances can be done in a few different ways. The most common is the F-test.
The F-test provides a means to compare the variances of two sets of data. An extension of the F-test is Hartley’s test for variance homogeneity, which allows us to test the variances of three or more sets of data.
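Both tests rest on a ratio of sample variances. The sketch below computes only the test statistics; deciding whether a ratio is significant still requires a critical value from an F table (or an F-max table for Hartley’s test), which is not computed here. The function names and data are made up, and Hartley’s test is assumed to use equal sample sizes.

```python
import statistics


def f_ratio(sample_a, sample_b):
    """F statistic for two samples: larger sample variance over the smaller."""
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)


def hartley_f_max(*samples):
    """Hartley's F-max: largest sample variance over the smallest (equal n assumed)."""
    variances = [statistics.variance(s) for s in samples]
    return max(variances) / min(variances)


# Made-up data, five readings per group
group_a = [10.0, 12.0, 11.0, 13.0, 9.0]
group_b = [10.0, 14.0, 8.0, 16.0, 7.0]
group_c = [11.0, 11.0, 12.0, 10.0, 11.0]

print(f_ratio(group_a, group_b))
print(hartley_f_max(group_a, group_b, group_c))
```

A ratio near 1 is consistent with equal variances; the larger the ratio, the stronger the evidence against homogeneity.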
Bartlett’s Test and Levene’s Test are further checks for homogeneity of variances across several sets of data. Bartlett’s Test expects the underlying data to be normally distributed. Levene’s Test is a better choice when you’re not sure the data is normal. Both are time-consuming to calculate by hand.
In hypothesis testing, even when not directly comparing variances, we need to know a bit about the associated variances. Often we’re assuming the variance is known or the same. This is often worth checking using one of the methods above.
Use the Siegel-Tukey Test for Differences in Scale when the data is ordinal or interval in nature.
A change in variance indicates something has changed in the process creating the items being measured. We use statistical process control and process capability tools to monitor variance changes (as well as mean shifts). For ease of manual calculation, we use an R-chart, whereas with today’s computers we can quickly calculate standard deviations directly, and thus use S-charts.
One of the tenets of the 6 Sigma Design Approach is that the processes are under control or stable, including the variance over time. Furthermore, we know the variance, and thus the standard deviation.
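The difference between the two charts comes down to which measure of spread is computed for each subgroup. A minimal sketch with made-up subgroup data follows; it computes only the plotted points and centerlines, and leaves out the control limits, which depend on tabulated constants for the subgroup size.

```python
import statistics

# Made-up subgroups of five measurements each
subgroups = [
    [10.1, 9.8, 10.0, 10.3, 9.9],
    [10.0, 10.2, 9.7, 10.1, 10.0],
    [9.9, 10.4, 10.0, 9.8, 10.2],
]

# R-chart points: range of each subgroup (easy to do by hand)
ranges = [max(g) - min(g) for g in subgroups]

# S-chart points: sample standard deviation of each subgroup
std_devs = [statistics.stdev(g) for g in subgroups]

# Centerlines are the averages of the plotted points
r_bar = statistics.mean(ranges)
s_bar = statistics.mean(std_devs)

print(ranges, r_bar)
print(std_devs, s_bar)
```

The range uses only the two extreme values in a subgroup, while the standard deviation uses every value.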
Tolerance analysis is another area requiring knowledge of variance. We can use worst case, root sum squared, or Monte Carlo approaches for the calculation; all require knowing or assuming something about the variation of each part.
This is just a quick rundown of how well you need to know variance. It’s more than just the second central moment of a data set; it’s also vital to hypothesis testing, proper design (tolerances), process control, and sample size selection.
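A minimal sketch of the three approaches for a simple linear stack of parts follows. The part values are hypothetical, and the Monte Carlo step assumes each dimension is normally distributed with its tolerance equal to three standard deviations, which is an assumption made here rather than a rule.

```python
import math
import random
import statistics

# Hypothetical stack of three parts: (nominal dimension, +/- tolerance)
parts = [(10.0, 0.3), (5.0, 0.4), (20.0, 1.2)]

# Worst case: every part sits at its tolerance limit at the same time
worst_case = sum(tol for _, tol in parts)

# Root sum squared: tolerances combine like standard deviations
rss = math.sqrt(sum(tol ** 2 for _, tol in parts))

# Monte Carlo: simulate many assemblies, assuming normal parts with
# tolerance = 3 standard deviations (an assumption for this sketch)
rng = random.Random(42)
stack_totals = [
    sum(rng.gauss(nominal, tol / 3) for nominal, tol in parts)
    for _ in range(20000)
]
mc_three_sigma = 3 * statistics.stdev(stack_totals)

print(worst_case, rss, mc_three_sigma)
```

Under these assumptions the Monte Carlo result lands close to the root sum squared value, and both are tighter than the worst case. Monte Carlo earns its keep when the part distributions are not normal or the stack is not a simple sum.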
There is plenty to know and with the links above plenty to read and understand. If you know another way to use variance, let me know and I’ll add to this article.
How do you calculate steady state availability for two series components given their quantities (10) and (6), their repair times (0.8) and (2.2), and failure rates (0.000634) and (0.000222), respectively? From Quanterion
Fred Schenkelberg says
Hi Ashram, do you have the formula for steady-state availability? That and a series system reliability block diagram may be a great place to start.
Maybe another reader has a direct and simple approach they can share. (I’m currently traveling and away from my references…)