The Friedman test is a non-parametric test for differences between groups when the dependent variable is at least ordinal (it may also be continuous). It is the non-parametric alternative to the one-way ANOVA with repeated measures (or the complete block design), and it is a special case of the Durbin test. If the data depart markedly from a normal distribution, the Friedman test becomes the preferred choice over ANOVA.

The test procedure ranks the values within each row (block), then considers the ranks by column. The data are organized into a matrix with B rows (blocks) and T columns (treatments), with a single observation in each cell of the matrix.
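To make the ranking step concrete, here is a minimal sketch in Python. The data values are made up for illustration, the `rank_block` helper is hypothetical, and averaging the ranks of tied values is an assumption (a common convention, not something stated above).

```python
def rank_block(values):
    """Rank one block from 1 (smallest) to T, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


# Hypothetical data: B = 4 blocks (rows), T = 3 treatments (columns).
data = [
    [9.0, 9.5, 5.0],
    [6.0, 8.0, 7.5],
    [9.0, 7.0, 9.5],
    [5.5, 8.5, 4.0],
]
ranks = [rank_block(row) for row in data]
for row in ranks:
    print(row)
```

Each row is ranked on its own, so a subject with generally high scores does not dominate the comparison.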

## Assumptions

As with nearly any statistical test, there are assumptions to consider. Here are four to keep in mind:

- There is one group of test subjects that are measured on three or more different occasions.
- The group is a random sample from the population.
- The dependent variable is at least ordinal, and may be continuous (Likert scales, time, intelligence, percentage correct, etc.)
- The samples need not be normally distributed.

## Setting up the Hypotheses

The null hypothesis is that the median treatment effects in the population are all the same. In short, the treatments have no effect.

The alternative hypothesis is that the effects are not all the same, indicating a discernible difference in treatment effects.

The data we’re dealing with reflect the situation where we want to compare T treatments across N subjects, with each subject forming one block (so N plays the role of B, the number of rows). Within each block, the treatments are assigned in random order. The comparison is within each block and not between blocks.

## The Test Statistic

The comparison uses the ranks of the ordinal or continuous data: within each of the B rows (blocks), the observations are assigned ranks 1, 2, ..., T.

Since the null hypothesis is that the treatments have no effect, the sums of the ranks for each column (treatment) should all be about equal.

The total sum of ranks is BT(T+1)/2, thus each treatment’s sum of ranks, if equal, should be relatively close to B(T+1)/2. Therefore the test statistic is a function of the sum of squares of deviations between treatment rank sums (R1, R2, …, RT) and the expected B(T+1)/2 value.
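A quick way to see these totals is to add up the columns of a small rank matrix. The ranks below are hypothetical (B = 4 blocks, T = 3 treatments) and serve only to illustrate the bookkeeping.

```python
# Hypothetical ranks: B = 4 blocks (rows), T = 3 treatments (columns).
ranks = [
    [2, 3, 1],
    [1, 3, 2],
    [2, 1, 3],
    [2, 3, 1],
]
B = len(ranks)
T = len(ranks[0])

# Column (treatment) rank sums R_1, ..., R_T.
rank_sums = [sum(row[t] for row in ranks) for t in range(T)]

total = B * T * (T + 1) / 2   # total of all ranks in the matrix
expected = B * (T + 1) / 2    # rank sum per treatment if all were equal

print(rank_sums)        # [7, 10, 7]
print(total, expected)  # 24.0 8.0
```

Here the rank sums add to 24, and each would be 8 if the treatments were indistinguishable.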

The test statistic, S, is

$$ S=\sum\limits_{t=1}^{T} R_{t}^{2} - \frac{B^{2}\,T\,(T+1)^{2}}{4} $$
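As a sanity check, the shortcut formula for S should match the sum of squared deviations of the rank sums from B(T+1)/2. The sketch below computes both; the function names and the example rank sums are hypothetical.

```python
def friedman_s(rank_sums, B):
    """S via the shortcut formula: sum of R_t^2 minus B^2 T (T+1)^2 / 4."""
    T = len(rank_sums)
    return sum(r * r for r in rank_sums) - B**2 * T * (T + 1) ** 2 / 4


def friedman_s_deviations(rank_sums, B):
    """S as the sum of squared deviations from the expected rank sum."""
    T = len(rank_sums)
    expected = B * (T + 1) / 2
    return sum((r - expected) ** 2 for r in rank_sums)


# Hypothetical rank sums for B = 4 blocks and T = 3 treatments.
rank_sums = [7, 10, 7]
print(friedman_s(rank_sums, B=4))             # 6.0
print(friedman_s_deviations(rank_sums, B=4))  # 6.0
```

Both forms give the same value because the cross term in the expanded square uses the fixed total of all ranks, BT(T+1)/2.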

## The Critical Value

Now we need to compare the test statistic to the critical value to determine whether the deviations are large enough to conclude that the treatments are not all equal. Here software comes in handy, such as Minitab, R, or some other package that has the tables built in.

Here is an excerpted table for three or four treatments. If your experiment has more treatments or a large sample size you could approximate the critical value using a chi-squared distribution with T − 1 degrees of freedom (more on that another time).

For T = 3 at various significance levels

| N | α = .10 | α = .05 | α = .01 |
|---|---|---|---|
| 3 | 6.00 | 6.00 | n/a |
| 4 | 6.00 | 6.50 | 8.00 |
| 5 | 5.20 | 6.40 | 8.40 |
| 6 | 5.33 | 7.00 | 9.00 |
| 7 | 5.43 | 7.14 | 8.86 |
| 8 | 5.25 | 6.25 | 9.00 |
| 9 | 5.56 | 6.22 | 8.67 |
| 10 | 5.00 | 6.20 | 9.60 |
| 11 | 4.91 | 6.54 | 8.91 |
| 12 | 5.17 | 6.17 | 8.67 |
| 13 | 4.77 | 6.00 | 9.39 |
| ∞ | 4.61 | 5.99 | 9.21 |

For T = 4 at various significance levels

| N | α = .10 | α = .05 | α = .01 |
|---|---|---|---|
| 2 | 6.00 | 6.00 | n/a |
| 3 | 6.60 | 7.40 | 8.60 |
| 4 | 6.30 | 7.80 | 9.60 |
| 5 | 6.36 | 7.80 | 9.96 |
| 6 | 6.40 | 7.60 | 10.00 |
| 7 | 6.26 | 7.80 | 10.37 |
| 8 | 6.30 | 7.50 | 10.35 |
| ∞ | 6.25 | 7.82 | 11.34 |
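The ∞ rows match chi-squared critical values with T − 1 degrees of freedom, which suggests the tables are on the scale of the rescaled statistic 12S/(BT(T+1)). The sketch below checks that reading using only the Python standard library; `chi2_sf` is a hypothetical helper built from the closed-form tail expressions for integer degrees of freedom, not a library function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!
        m = df // 2
        return math.exp(-half) * sum(
            half**i / math.factorial(i) for i in range(m)
        )
    # Odd df = 2m + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{m} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    m = (df - 1) // 2
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


# Infinity-row entries at the .05 level: T = 3 (df = 2) and T = 4 (df = 3).
print(round(chi2_sf(5.99, 2), 3))
print(round(chi2_sf(7.82, 3), 3))
```

Both tail probabilities come out close to .05, consistent with the large-sample approximation mentioned above.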

## Conclusion

If the test statistic, expressed on the table's scale as 12S/(BT(T+1)), is larger than the critical value found in the table, then we reject the null hypothesis and conclude there is convincing evidence that the treatments are different.
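Putting the pieces together, here is a rough sketch of the decision for T = 3 at the .05 level. It assumes the tabulated critical values are on the chi-squared scale, so S is rescaled by 12/(BT(T+1)) before the comparison; the function name and example rank sums are hypothetical.

```python
# Critical values for T = 3 at the .05 level, copied from the table above.
CRITICAL_T3_05 = {
    3: 6.00, 4: 6.50, 5: 6.40, 6: 7.00, 7: 7.14, 8: 6.25,
    9: 6.22, 10: 6.20, 11: 6.54, 12: 6.17, 13: 6.00,
}
LARGE_N_T3_05 = 5.99  # the infinity row, used when N > 13


def reject_null(rank_sums, B):
    """Return (scaled statistic, critical value, decision) for T = 3, alpha = .05."""
    T = len(rank_sums)
    if T != 3 or B < 3:
        raise ValueError("this sketch only covers T = 3 with at least 3 blocks")
    S = sum(r * r for r in rank_sums) - B**2 * T * (T + 1) ** 2 / 4
    scaled = 12 * S / (B * T * (T + 1))  # assumption: table is on this scale
    critical = CRITICAL_T3_05.get(B, LARGE_N_T3_05)
    return scaled, critical, scaled > critical


print(reject_null([7, 10, 7], B=4))  # (1.5, 6.5, False)
print(reject_null([4, 8, 12], B=4))  # (8.0, 6.5, True)
```

In the first case the rank sums are close to the expected value of 8 and the null hypothesis stands; in the second, every block ranks the treatments in the same order and the null hypothesis is rejected.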

Satvik Singh says

Hi

Can you tell me what is the critical value if N=200 and K=3 for a significance level of 5%?

Fred Schenkelberg says

Sure, just use the values along the row for infinity – greater than N = 13. cheers, Fred