## Comparisons for agreement

Let’s say we have rank-order data from two or more evaluators (people, algorithms, etc.) and we want to determine whether the evaluators agree.

Agreement here means the results from one evaluator and another are concordant. Kendall’s coefficient of concordance, W, is the typical non-parametric method for three or more evaluators. For a comparison of just two evaluators, consider Cohen’s kappa or Spearman’s rank correlation coefficient instead, as they are more appropriate.

As an example, let’s ask three people to rank order ten popular movies, with 1 being their least favorite and 10 their favorite of the list. Here’s the data from evaluators A, B, and C:

| A | B | C |
|---|---|---|
| 1 | 7 | 6 |
| 5 | 6 | 4 |
| 6 | 2 | 8 |
| 7 | 5 | 5 |
| 10 | 9 | 10 |
| 4 | 3 | 1 |
| 8 | 1 | 3 |
| 3 | 10 | 9 |
| 9 | 4 | 7 |
| 2 | 8 | 2 |

Had these three been in perfect agreement, there would be nothing to evaluate. So, the question is: do they agree well enough to conclude they tend to like the same movies or not?
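Before working through the test, it helps to have the data in hand. A minimal sketch in Python (the variable names are my own):

```python
# Ranks assigned by evaluators A, B, and C to the ten movies,
# one row of the table per list position.
A = [1, 5, 6, 7, 10, 4, 8, 3, 9, 2]
B = [7, 6, 2, 5, 9, 3, 1, 10, 4, 8]
C = [6, 4, 8, 5, 10, 1, 3, 9, 7, 2]

# Sanity check: each evaluator used each rank 1 through 10 exactly once.
for ranks in (A, B, C):
    assert sorted(ranks) == list(range(1, 11))
```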

## Compute R_{i}

$$ \large\displaystyle {{R}_{i}}=\sum\limits_{j=1}^{m}{{{r}_{ij}}}$$

where i indexes the individual items being ranked, in this case 1 through 10, and r_{ij} is the rank evaluator j assigned to item i. Basically, tally up the ranks from each evaluator for each item.

m is the number of evaluators, in this case, 3.

Therefore, we find

| i | R_{i} |
|---|---|
| 1 | 14 |
| 2 | 15 |
| 3 | 16 |
| 4 | 17 |
| 5 | 29 |
| 6 | 8 |
| 7 | 12 |
| 8 | 22 |
| 9 | 20 |
| 10 | 12 |
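In code, each R_{i} is simply a row sum across the evaluators (a sketch, using hypothetical list variables for the three columns):

```python
# Ranks from evaluators A, B, and C, one row of the data table per position.
A = [1, 5, 6, 7, 10, 4, 8, 3, 9, 2]
B = [7, 6, 2, 5, 9, 3, 1, 10, 4, 8]
C = [6, 4, 8, 5, 10, 1, 3, 9, 7, 2]

# R_i: the total rank item i received across the m = 3 evaluators.
R = [a + b + c for a, b, c in zip(A, B, C)]
# R is [14, 15, 16, 17, 29, 8, 12, 22, 20, 12], matching the table.
```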

## Compute R̄

$$ \large\displaystyle \bar{R}=m(n+1)/2$$

n is the number of items being ranked, in this case, 10.

Therefore,

R̄ = 16.5.
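A quick check in code (a sketch; note R̄ also equals the average of the row sums R_{i}, since each evaluator hands out ranks totaling n(n+1)/2):

```python
m, n = 3, 10                    # evaluators, items being ranked
R_bar = m * (n + 1) / 2
# R_bar == 16.5

# Cross-check: the mean of the row sums R_i gives the same value.
R = [14, 15, 16, 17, 29, 8, 12, 22, 20, 12]
assert R_bar == sum(R) / n
```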

## Compute S, sum of squared deviations

$$ \large\displaystyle S=\sum\limits_{i=1}^{n}{{{\left( {{R}_{i}}-\bar{R} \right)}^{2}}}$$

S = 320.5

This is the tally of the squared differences between each movie’s summed rank across the three evaluators and the overall average rank, R̄.
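Continuing the sketch, S is a one-line sum of squared deviations:

```python
R = [14, 15, 16, 17, 29, 8, 12, 22, 20, 12]   # row sums R_i from the table
R_bar = 16.5
S = sum((r - R_bar) ** 2 for r in R)
# S == 320.5
```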

## Compute Kendall’s coefficient of concordance, W

W is determined with

$$ \large\displaystyle W=\frac{12S}{{{m}^{2}}\left( {{n}^{3}}-n \right)}$$

W will be between zero and one. Values close to zero imply no agreement, while values close to one imply strong agreement; W = 1 means the evaluators ranked every item identically.

Working out the example we find

W = 0.432
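In code, using the values computed above (a sketch):

```python
m, n, S = 3, 10, 320.5
W = 12 * S / (m**2 * (n**3 - n))
# W ≈ 0.432
```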

## Compute the test statistic

The test statistic, T.S., comes from the data and is compared to the critical value to determine if there is concordance or not.

$$ \large\displaystyle T.S.=\frac{12S}{mn\left( n+1 \right)}$$

In the example, this works out to be 11.654.
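The test statistic is again a one-liner (a sketch); note that it equals m(n − 1)W, which is how it is often written:

```python
m, n, S = 3, 10, 320.5
TS = 12 * S / (m * n * (n + 1))
# TS ≈ 11.654; equivalently, TS == m * (n - 1) * W.
```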

## Compute critical value

With 10 items, n = 10, and since the degrees of freedom, df, is one less than n, we have df = n − 1, in this case df = 9. We are using 90% confidence, therefore α = 1 − C = 1 − 0.9 = 0.10.

We use a chi-squared, χ^{2}, critical value with confidence 1 − α and df = n − 1.

For n > 7, use the χ^{2} table; for n ≤ 7, use the exact critical values of W from a table in your statistics book (one example is Siegel, S. and Castellan Jr., N.J., Nonparametric Statistics for the Behavioral Sciences (1988), International Edition, McGraw-Hill Book Company, New York, ISBN 0-07-057357-3, Table T: Critical values for Kendall coefficient of concordance W, p. 365).

In our example, χ^{2}_{0.10, 9} = 14.684.
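A table lookup is easiest, and with SciPy, `scipy.stats.chi2.ppf(0.90, 9)` returns the same value directly. As a dependency-free sketch, the χ^{2} CDF can be built from the lower incomplete gamma series and inverted by bisection:

```python
import math

def chi2_cdf(x, df):
    """CDF of the chi-squared distribution via the lower incomplete gamma series."""
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a          # first series term: 1/a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= t / (a + n)  # term_n = t^n / (a(a+1)...(a+n))
        total += term
    return total * t**a * math.exp(-t) / math.gamma(a)

def chi2_critical(alpha, df, lo=0.0, hi=100.0):
    """Value x with CDF(x) = 1 - alpha, found by bisection."""
    target = 1.0 - alpha
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

crit = chi2_critical(0.10, 9)
# crit ≈ 14.684, matching the table value above.
```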

## Compare W to test statistic

H_{o}: The evaluators’ rankings are not associated (no agreement).

Ha: The evaluators’ rankings are associated (agreement).

If T.S. < the critical value, we fail to reject H_{o}; if T.S. > the critical value, we reject H_{o} in favor of Ha.

In our example, T.S. = 11.654 and the critical value is 14.684. Since the test statistic is less than the critical value, we conclude that while there appears to be some agreement, the evidence is not sufficient to conclude the evaluators agree.

- Kendall, M. G.; Babington Smith, B. (Sep 1939). “The Problem of *m* Rankings”. *The Annals of Mathematical Statistics* **10**(3): 275–287.

Related:

- Spearman Rank Correlation Coefficient (article)

- Kruskal-Wallis Test (article)

- Mann-Whitney U Test (article)

Mj says

Thanks for the very detailed computation and step by step procedure of calculating Kendall’s Tau, It save my written report assignment Thank you!!! 🙂

Fred Schenkelberg says

you are welcome Marry, glad to be of help. Cheers, Fred

Janine says

what is the value of alpha in the problem?

Janine says

where did you get the 0.10 in there?

Fred Schenkelberg says

I’ve added

We are using a 90% confidence, therefore α = 1-C = 1 – .9 = 0.10

thus hopefully answering both questions.

Cheers,

Fred

KIZITO says

please in which various field is kendalls dispersive coefficient of concordance applicable?

Fred Schenkelberg says

Hi Kizito, not sure what the ‘dispersive’ element is referring to and if that is different then the approach in the article.

You can use the coefficient of concordance to check on the agreement of two rank ordering of items. Thus it may apply to any field of interest. If two people provide a top ten list, you can determine if they agree or if there is evidence they do not agree.

Cheers,

Fred

YL says

I was hoping you might be able to help me with a project..

I asked m people to rank only the top 5 of 21 objects – not fully rank all 21. I want to check their agreement using Kendall’s W. Can I do that? do I need to enter for example “0” to all non-ranked objects (per ranker), and then rank “1” as the lowest priority until “5” the highest?

Thanks!

YL

Fred Schenkelberg says

Hi Yael,

I’m not sure, good question. the approach you outlined makes sense to me, so give it a try. Also, maybe ask on a statistics forum on Linkedin or maybe the ASA (American Statistical Association) has an ask the expert service. I really only know the basics of these non-parametric tools.

Cheers,

Fred

Trajce Velkovski says

I am making a Delphi research study, and I have finished the first round, a questionnaire with 31 response from the expert panelists.

Now for the second round, I need to interpret their answers.

The questionnaire consists 50 factors, that each expert needs to rate them on a 5-degree Likert scale, from 1 to 5. Meaning that one expert can give all fives as an answer for all 50 factors, which in all examples using Kendall’s W is not a case since each expert has 10 points to allocate, meaning that there is no repetition of the same grade.

My question is, can I measure the rate of agreement between the experts using Kendall’s W?

Thank you very much

Fred Schenkelberg says

Hi Trajce,

Not really sure, yet I believe Kendall coefficient of concordance works with rank ordered lists – top ten list, or top three choices, etc. Where everyone can order from the same group of items.

If you are using a Likert scale scoring then you may find more value in survey analysis methods.

On Accendo Reliability we tend to focus on tools and techniques useful for reliability engineering problems, and not too much if at all on the survey result analysis tools.

Cheers,

Fred

Leormhan Jacob Dela Cruz says

Hello! I tried using the values on the example above in the SPSS but it gives me a different value. What could the problem be? Thank you.

Fred Schenkelberg says

Hi Leormhan,

Thanks for the note and the differing values could be due to a few different things. I’ve noticed that different software packages use slightly different assumptions, rules, and algorithms. Also, for a few named statistical tests, not sure about this one, may have different uses and thus underlying processes. Also, I may have made a mistake in my calculations. Plus, I’m sure there are most likely other reasons for the difference.

If you have a moment, let me know what result you got as it may help me sort out if I did make a mistake or it’s a difference for some other reason.

cheers,

Fred