The Hartley test is an extension of the *F* distribution-based hypothesis test checking if two samples have different variances.

The *F* test compares two population variances based on one sample from each. It does not extend directly to three or more populations. We could run every pairwise comparison, yet with each additional test the probability of at least one erroneous rejection grows.
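To get a feel for how quickly that risk grows, here is a minimal Python sketch. It assumes the pairwise tests are independent, which is not strictly true when they share samples, so treat the numbers as a rough approximation. The function name is illustrative only.

```python
from math import comb


def familywise_error(k, alpha=0.05):
    """Approximate chance of at least one false rejection when running
    every pairwise test among k groups (assumes independent tests)."""
    pairs = comb(k, 2)  # number of pairwise comparisons
    return 1 - (1 - alpha) ** pairs


for k in (3, 4, 5):
    print(k, round(familywise_error(k), 3))
```

With three groups and α = 0.05 the approximate risk is already about 14%, well above the nominal 5%.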

Bartlett’s Test and Levene’s Test are two other checks for homogeneity of variances. Bartlett’s Test expects the underlying data to be normally distributed.

Levene’s Test is a better choice when you’re not sure the data is normal. Both are conservative and time-consuming to calculate.

We need another way to check for equal variances.

## Hartley’s Test for Homogeneity of Variance

In 1940 and again in 1950, H. O. Hartley proposed using the *F* test approach comparing the largest and smallest sample variances. The test follows the form of a hypothesis test starting with the null hypothesis.

$$ \large\displaystyle {{H}_{0}}:\sigma _{1}^{2}=\sigma _{2}^{2}=\ldots =\sigma _{k}^{2}$$

There are *k* populations under consideration, or treatments, in the test.

The alternative hypothesis, *H<sub>1</sub>*, is that not all of the population variances are the same.

## The test statistic and critical value

$$ \large\displaystyle {{F}_{\max }}=\frac{s_{\max }^{2}}{s_{\min }^{2}}$$

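The test statistic comes from the sample data, and we compare it with a critical value to decide. Use the *F<sub>max</sub>* table to find the value corresponding to

- the significance level, α,
- the number of populations, *k*, and
- the degrees of freedom, *df = n – 1*, where *n* is the sample size drawn from each population.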
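The statistic itself is simple to compute from raw data. The following is a minimal Python sketch, not a full implementation of the test. The function name and the measurements are hypothetical, chosen only to show the calculation.

```python
from statistics import variance


def hartley_f_max(samples):
    """Return the ratio of the largest to the smallest sample variance."""
    # statistics.variance uses the n - 1 denominator (sample variance).
    variances = [variance(group) for group in samples]
    return max(variances) / min(variances)


# Hypothetical measurements: three groups of five readings each.
groups = [
    [10.1, 9.8, 10.3, 10.0, 9.9],
    [10.5, 9.2, 10.9, 9.6, 10.1],
    [10.0, 10.2, 9.9, 10.1, 10.0],
]
print(hartley_f_max(groups))
```

The result is then compared with the tabulated critical value for the chosen α, *k*, and *df*.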

## Assumptions and comments

- The test works best when the sample size drawn from each population is the same.
- The underlying populations are normally distributed. The test is very sensitive to this assumption, so check it.

## An example

Let’s expand the example in the tutorial titled Two Samples Variance Hypothesis Test by adding a third set of measurements, say at the 3rd year of storage for the devices.

For this example, we draw seven devices and measure strength as before. After three years of aging, we find a standard deviation of 513 psi. That is larger than the reading at two years, 300 psi, and smaller than the initial reading of 900 psi.

For Hartley’s Test, we need the maximum and minimum variances of the three samples. In this case, they come from the largest and smallest standard deviations, 900 psi and 300 psi, respectively.

The test statistic is

$$ \large\displaystyle {{F}_{\max }}=\frac{s_{\max }^{2}}{s_{\min }^{2}}=\frac{{{900}^{2}}}{{{300}^{2}}}=9$$

The critical value is based on α (1 – confidence), the number of populations under consideration, *k*, and the degrees of freedom, which is the number of items in each sample minus one, *df = n – 1*.

Note we are not using the typical F table; instead, we use the Fmax table. Using R and the package Supplementary Distributions (SuppDists) version 1.1-9.4 dated September 23, 2016, by Bob Wheeler, I calculated the Upper 5% table.

The R command is *qmaxFratio(a,df,k, lower.tail=FALSE)* where *a* is α or *(1 – C)*, df is the degrees of freedom, and *k* is the number of treatments, groups, or conditions.

| df \ k | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 39.0 | 87.49 | 142.5 | 202.4 | 266.2 | 333.2 | 403.1 | 475.4 | 549.8 | 626.2 | 704.4 |
| 3 | 15.4 | 27.76 | 39.51 | 50.88 | 61.98 | 72.83 | 83.48 | 93.94 | 104.2 | 114.4 | 124.4 |
| 4 | 9.60 | 15.46 | 20.56 | 25.21 | 29.54 | 33.63 | 37.52 | 41.24 | 44.81 | 48.27 | 51.61 |
| 5 | 7.15 | 10.75 | 13.72 | 16.34 | 18.70 | 20.88 | 22.91 | 24.83 | 26.65 | 28.38 | 30.03 |
| 6 | 5.82 | 8.36 | 10.38 | 12.11 | 13.64 | 15.04 | 16.32 | 17.51 | 18.64 | 19.70 | 20.70 |
| 7 | 4.99 | 6.94 | 8.44 | 9.70 | 10.80 | 11.80 | 12.70 | 13.54 | 14.31 | 15.05 | 15.74 |
| 8 | 4.43 | 6.00 | 7.19 | 8.17 | 9.02 | 9.77 | 10.46 | 11.08 | 11.67 | 12.21 | 12.72 |
| 9 | 4.03 | 5.34 | 6.31 | 7.11 | 7.79 | 8.40 | 8.94 | 9.44 | 9.90 | 10.33 | 10.73 |
| 10 | 3.72 | 4.85 | 5.67 | 6.34 | 6.91 | 7.41 | 7.86 | 8.27 | 8.64 | 8.99 | 9.32 |
| 12 | 3.28 | 4.16 | 4.79 | 5.30 | 5.72 | 6.09 | 6.42 | 6.72 | 6.99 | 7.24 | 7.48 |
| 15 | 2.86 | 3.53 | 4.00 | 4.37 | 4.67 | 4.94 | 5.17 | 5.38 | 5.57 | 5.75 | 5.91 |
| 20 | 2.46 | 2.95 | 3.28 | 3.53 | 3.74 | 3.92 | 4.08 | 4.22 | 4.35 | 4.46 | 4.57 |
| 30 | 2.07 | 2.40 | 2.61 | 2.77 | 2.90 | 3.01 | 3.11 | 3.19 | 3.27 | 3.34 | 3.40 |
| 60 | 1.67 | 1.84 | 1.96 | 2.04 | 2.11 | 2.16 | 2.21 | 2.25 | 2.29 | 2.32 | 2.35 |
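As a rough cross-check on the tabulated values, the upper α point can also be approximated by simulation. The Python sketch below is not how SuppDists computes the quantile. It is a Monte Carlo approximation that assumes normal populations and equal sample sizes, and the function name, simulation count, and seed are arbitrary choices.

```python
import random


def simulate_f_max_critical(alpha, df, k, n_sims=20000, seed=1):
    """Monte Carlo estimate of the upper-alpha point of the F_max statistic."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_sims):
        # Under H0 with normal data, each sample variance is proportional to
        # a chi-square variate with df degrees of freedom, i.e. Gamma(df/2, 2).
        # The common scale cancels in the ratio.
        variances = [rng.gammavariate(df / 2, 2.0) for _ in range(k)]
        ratios.append(max(variances) / min(variances))
    ratios.sort()
    index = min(int((1 - alpha) * n_sims), n_sims - 1)
    return ratios[index]


print(simulate_f_max_critical(0.05, df=6, k=3))
```

The estimate carries simulation error, so expect it to land near the table entry rather than match it to two decimals.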

With 95% confidence, α is 0.05. Two samples have seven items each, and one has nine. To be conservative, we’ll use *n* = 7. Thus, *df* = 7 – 1 = 6. There are three treatments or conditions, making *k* = 3.

Enter the table with these values to find the critical value for this situation. It is 8.36.

## Conclusion

Since the test statistic, *F<sub>max</sub>* = 9, is greater than the critical value, 8.36, we conclude there is sufficient evidence that the variances are not homogeneous: at least one of the sample variances suggests its population variance is different from the others.

We reject the null hypothesis that all the variances are equal.
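The decision can be reproduced with a short Python sketch. It treats the 900, 513, and 300 psi readings as standard deviations, as the squared terms in the test statistic imply, and copies only part of the *df* = 6 row from the table above. The names are illustrative.

```python
# Upper 5% F_max critical values for df = 6, copied from the table (k -> value).
F_MAX_CRIT_DF6 = {2: 5.82, 3: 8.36, 4: 10.38, 5: 12.11, 6: 13.64}


def hartley_decision(std_devs, critical_value):
    """Return (F_max, reject_H0) given sample standard deviations."""
    variances = [s ** 2 for s in std_devs]
    f_max = max(variances) / min(variances)
    return f_max, f_max > critical_value


# Assumption: the psi readings are standard deviations, one per sample.
f_max, reject = hartley_decision([900, 513, 300], F_MAX_CRIT_DF6[3])
print(f_max, reject)
```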

## References:

David, H. A. “Upper 5 and 1% Points of the Maximum F-ratio.” *Biometrika* 39, no. 3/4 (1952): 422-424.

Hartley, Herman O. “The Maximum F-ratio As a Short-cut Test for Heterogeneity of Variance.” *Biometrika* 37, no. 3/4 (1950): 308-312.

Nelson, L. S. “Upper 10-Percent, 5-Percent and 1-Percent Points of the Maximum F-ratio.” *Journal of Quality Technology* 19, no. 3 (1987): 165-167.

Wheeler, Bob (2016). SuppDists: Supplementary Distributions. R package version 1.1-9.4. http://CRAN.R-project.org/package=SuppDists
