The term variance is a statistical concept related to the spread or dispersion of a set of data. Second to the mean, it is one of the most common values we calculate.
We find the standard deviation easier to understand and use, since it has the same units as the data, whereas variance is in squared units.
We use variance in quite a few different ways. Let’s review just a few.
We may initially calculate variance to estimate a distribution parameter. The normal distribution comes to mind here, as the mean and variance are its two essential parameters.
Every other distribution of interest has a variance as well. The Weibull distribution is defined by its shape and scale parameters, yet it too has a mean and a variance. While they exist, they are not all that informative on their own.
We calculate variance by first calculating the mean of the data, then summing the squared differences from the mean and dividing by the number of data points, or by the number of data points minus one when dealing with a sample of a population.
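That calculation is only a few lines of Python. A minimal sketch of both versions (the data set here is made up for illustration):

```python
from statistics import mean

def population_variance(data):
    """Mean of squared deviations from the mean: divide by n
    when the data is the entire population."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Divide by n - 1 (Bessel's correction) when the data is
    a sample drawn from a larger population."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32/7, about 4.571
```

In practice these match the standard library's `statistics.pvariance` and `statistics.variance`, which are the idiomatic choices.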
Once we calculate the sample variance, we are often interested in the population variance, so we calculate a confidence interval. The interval contains the population variance with the stated level of confidence.
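As a sketch of that interval: for normally distributed data, (n − 1)s²/σ² follows a chi-square distribution, which gives the bounds below. The data and confidence level are placeholders for illustration.

```python
from statistics import variance
from scipy import stats

def variance_ci(data, confidence=0.95):
    """Two-sided confidence interval for the population variance.
    Valid when the data are drawn from a normal distribution."""
    n = len(data)
    s2 = variance(data)  # sample variance, n - 1 denominator
    alpha = 1 - confidence
    # (n - 1) * s2 / sigma^2 ~ chi-square with n - 1 degrees of freedom
    lower = (n - 1) * s2 / stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    upper = (n - 1) * s2 / stats.chi2.ppf(alpha / 2, df=n - 1)
    return lower, upper

data = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0, 4.9, 5.1]
print(variance_ci(data))
```

Note the interval is not symmetric about the sample variance, because the chi-square distribution is skewed.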
Sample size calculations rely on knowing or assuming the variance. A smaller variance means fewer samples are required for a given situation.
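For instance, when estimating a mean to within a margin of error, the required sample size grows with the square of the standard deviation. A sketch using only the standard library (the sigma and margin values are hypothetical):

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Samples needed to estimate a mean to within +/- margin,
    given a known (or assumed) standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

# Halving the standard deviation roughly quarters the sample size.
print(sample_size_for_mean(sigma=10, margin=2))  # 97
print(sample_size_for_mean(sigma=5, margin=2))   # 25
```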
When drawing samples and comparing means, we often make the assumption that the variances are equal. For example, whether the sample variances are equal or not determines which hypothesis test we use to compare the means.
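scipy's `ttest_ind` makes this choice explicit through its `equal_var` flag: `True` gives Student's pooled t-test, `False` gives Welch's test. A sketch with simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
a = rng.normal(loc=5.0, scale=1.0, size=30)  # small variance
b = rng.normal(loc=5.5, scale=3.0, size=30)  # large variance

# Pooled (Student's) t-test: assumes equal variances.
t_pooled, p_pooled = stats.ttest_ind(a, b, equal_var=True)
# Welch's t-test: drops that assumption; the safer default here.
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)
print(p_pooled, p_welch)
```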
Checking for homogeneity of variances can be done a few different ways. The most common is the F-test.
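scipy does not ship a ready-made two-sample F-test, but it is easy to sketch from the F distribution: the statistic is simply the ratio of the two sample variances (normality of both samples is assumed).

```python
from statistics import variance
from scipy import stats

def f_test(x, y):
    """Two-sided F-test for equal variances of two normal samples.
    The statistic is the ratio of the sample variances."""
    f = variance(x) / variance(y)
    dfx, dfy = len(x) - 1, len(y) - 1
    # Two-sided p-value: double the smaller tail probability.
    p = 2 * min(stats.f.cdf(f, dfx, dfy), stats.f.sf(f, dfx, dfy))
    return f, min(p, 1.0)
```

Keep in mind the F-test is quite sensitive to departures from normality, which motivates the alternatives below.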
Bartlett’s Test and Levene’s Test are two more checks for homogeneity of variances. Bartlett’s Test assumes the underlying data are normally distributed, so it is best reserved for normal data. Levene’s Test is a better choice when you’re not sure the data are normal, as it is robust to departures from normality. Both are somewhat conservative and time-consuming to calculate by hand.
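By computer, though, both are a single call in scipy. The two hypothetical samples below share a center but differ in spread:

```python
from scipy import stats

a = [8.9, 9.1, 9.0, 8.8, 9.2]   # tight spread
b = [8.0, 9.9, 9.0, 7.9, 10.1]  # same center, much wider spread

# Bartlett's test: powerful, but assumes normality within each group.
stat_b, p_b = stats.bartlett(a, b)
# Levene's test (median-centered): robust when normality is in doubt.
stat_l, p_l = stats.levene(a, b, center='median')
print(p_b, p_l)
```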
In hypothesis testing, even when not directly comparing variances, we need to know a bit about the associated variances. Often we assume the variance is known or that the variances are equal. This assumption is often worth checking using one of the methods above.
Use the Siegel-Tukey Test for Differences in Scale when the data is ordinal or interval in nature.
A change in variance indicates something has changed in the process creating the items being measured. We use statistical process control and process capability tools to monitor variance changes (as well as mean shifts). For ease of manual calculation we use an R-chart, whereas with today’s computers we can quickly calculate standard deviations directly and thus use S-charts.
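As a sketch, an S-chart’s center line is the average subgroup standard deviation, and the 3-sigma limits follow from the c4 unbiasing constant for the subgroup size. The subgroup measurements below are invented for illustration.

```python
import math
import numpy as np

def s_chart_limits(subgroups):
    """Center line and 3-sigma control limits for an S-chart,
    using the c4 unbiasing constant for the subgroup size."""
    n = len(subgroups[0])
    s_bar = np.mean([np.std(g, ddof=1) for g in subgroups])
    # c4 corrects the bias of the sample standard deviation.
    c4 = math.sqrt(2 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)
    sigma_s = s_bar * math.sqrt(1 - c4 ** 2) / c4
    ucl = s_bar + 3 * sigma_s
    lcl = max(0.0, s_bar - 3 * sigma_s)  # standard deviations cannot go negative
    return lcl, s_bar, ucl

groups = [[5.0, 5.1, 4.9, 5.2, 4.8],
          [5.1, 5.0, 5.0, 4.9, 5.1],
          [4.9, 5.2, 5.0, 5.1, 4.8]]
print(s_chart_limits(groups))
```

For subgroup sizes of five or fewer the lower limit clips at zero, which matches the tabulated control-chart constant B3 = 0.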
One of the tenets of the 6 Sigma Design Approach is that processes are under control, or stable, including the variance over time. Furthermore, we know the variance, and thus the standard deviation.
That is just a quick rundown of why you need to know variance well. It is more than just the second central moment of a data set; it is also vital to hypothesis testing, proper design (tolerances), process control, and sample size selection.
There is plenty to know and with the links above plenty to read and understand. If you know another way to use variance, let me know and I’ll add to this article.
Also published on Medium.