Comparisons for agreement
Let’s say we have rank-order data from two or more evaluators (people, algorithms, etc.) and we want to determine whether the evaluators agree.
Agreement here means the rankings from the different evaluators are concordant. This non-parametric method is typically used with 3 or more evaluators; for a comparison of only two evaluators, Cohen’s Kappa or Spearman’s rank correlation coefficient are more appropriate.
As an example, let’s ask three people to rank order ten popular movies, with 1 being the least favorite and 10 being the favorite of the list. Here’s the data from evaluators A, B, and C:
| A | B | C |
|---|---|---|
| 1 | 7 | 6 |
| 5 | 6 | 4 |
| 6 | 2 | 8 |
| 7 | 5 | 5 |
| 10 | 9 | 10 |
| 4 | 3 | 1 |
| 8 | 1 | 3 |
| 3 | 10 | 9 |
| 9 | 4 | 7 |
| 2 | 8 | 2 |
If these three were in perfect agreement, we wouldn’t need to evaluate whether they agreed, and clearly they are not in perfect agreement. So the question is: do they agree well enough to conclude they tend to like the same movies, or not?
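To follow along, here is a minimal sketch of the calculation in Python (assuming NumPy is available; the variable names are only for illustration), holding the rankings as a matrix with one row per movie and one column per evaluator:

```python
import numpy as np

# Ranks given to the ten movies (rows) by evaluators A, B, and C (columns),
# copied from the table above.
ranks = np.array([
    [ 1,  7,  6],
    [ 5,  6,  4],
    [ 6,  2,  8],
    [ 7,  5,  5],
    [10,  9, 10],
    [ 4,  3,  1],
    [ 8,  1,  3],
    [ 3, 10,  9],
    [ 9,  4,  7],
    [ 2,  8,  2],
])

n, m = ranks.shape  # n = 10 movies, m = 3 evaluators
```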
Compute Ri
$$ \large\displaystyle {{R}_{i}}=\sum\limits_{j=1}^{m}{{{r}_{ij}}}$$
where i indexes the individual items being ranked (in this case, 1 through 10), j indexes the evaluators, rij is the rank evaluator j gave to item i, and m is the number of evaluators (in this case, 3). Basically, tally up the ranks from each evaluator for each item.
Therefore, we find
| i | Ri |
|---|----|
| 1 | 14 |
| 2 | 15 |
| 3 | 16 |
| 4 | 17 |
| 5 | 29 |
| 6 | 8 |
| 7 | 12 |
| 8 | 22 |
| 9 | 20 |
| 10 | 12 |
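Continuing the Python sketch above, the row sums reproduce these Ri values:

```python
# Sum each movie's ranks across the three evaluators to get R_i.
R = ranks.sum(axis=1)
print(R)  # [14 15 16 17 29  8 12 22 20 12]
```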
Compute R̄
$$ \large\displaystyle \bar{R}=m(n+1)/2$$
n is the number of items being ranked, in this case, 10.
Therefore,
R̄ = 3(10 + 1)/2 = 16.5.
Compute S, sum of squared deviations
$$ \large\displaystyle S=\sum\limits_{i=1}^{n}{{{\left( {{R}_{i}}-\bar{R} \right)}^{2}}}$$
S = 320.5
This is the sum of the squared differences between each movie’s rank total (the sum of the three evaluators’ ranks) and the overall average rank total, R̄.
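In the running sketch, R̄ and S are one line each:

```python
R_bar = m * (n + 1) / 2        # 3 * (10 + 1) / 2 = 16.5
S = np.sum((R - R_bar) ** 2)   # 320.5
```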
Compute Kendall’s coefficient of concordance, W
W is determined with
$$ \large\displaystyle W=\frac{12S}{{{m}^{2}}\left( {{n}^{3}}-n \right)}$$
W will be between zero and one. Values close to zero imply no agreement and W values closer to one imply agreement.
Working out the example we find
W = 0.432
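Continuing the sketch, W follows directly from S, m, and n:

```python
W = 12 * S / (m**2 * (n**3 - n))  # 12 * 320.5 / (9 * 990) ≈ 0.432
```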
Compute the test statistic
The test statistic, T.S., comes from the data and is compared to the critical value to determine if there is concordance or not.
$$ \large\displaystyle T.S.=\frac{12S}{mn\left( n+1 \right)}=m\left( n-1 \right)W$$
In the example, T.S. = 12(320.5)/(3 × 10 × 11) = 11.654.
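In the sketch, the test statistic can be computed either from S or as m(n − 1)W:

```python
TS = 12 * S / (m * n * (n + 1))   # ≈ 11.654
# Equivalent form: TS = m * (n - 1) * W
```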
Compute critical value
With 10 items, n = 10, and since the degrees of freedom, df, is one less than n, we have df = n − 1 = 9. We are using a 90% confidence level, therefore α = 1 − C = 1 − 0.9 = 0.10.
We use the chi-squared distribution, χ², with a confidence of 1 − α and df = n − 1.
For n > 7, use the χ² table; for n ≤ 7, use the exact critical values from a table in a statistics text (one example is Siegel, S. and Castellan Jr., N.J., Nonparametric Statistics for the Behavioral Sciences (1988), International Edition, McGraw-Hill Book Company, New York, ISBN 0-07-057357-3, Table T, Critical values for Kendall coefficient of concordance W, p. 365).
In our example, the critical value is χ²(α = 0.10, df = 9) = 14.684.
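If SciPy is available, the same critical value can be looked up in code instead of a printed χ² table (a sketch, continuing the example):

```python
from scipy.stats import chi2

alpha = 0.10
critical_value = chi2.ppf(1 - alpha, df=n - 1)  # ≈ 14.684 for df = 9
```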
Compare the test statistic to the critical value
Ho: the evaluators’ rankings do not agree. If T.S. < the critical value, there is no convincing evidence of agreement.
Ha: the evaluators’ rankings agree. If T.S. > the critical value, there is convincing evidence of agreement.
In our example, T.S. = 11.654 and the critical value is 14.684. Since the T.S. is less than the critical value, we conclude that while there appears to be some agreement, it is not sufficient to conclude the evaluators agree.
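Putting the steps together, here is a self-contained sketch (assuming NumPy and SciPy; the function name kendalls_w_test is only illustrative) that reproduces the example end to end:

```python
import numpy as np
from scipy.stats import chi2


def kendalls_w_test(ranks, alpha=0.10):
    """Kendall's coefficient of concordance W with the chi-square test.

    ranks: (n items) x (m evaluators) array of ranks, assumed to have no ties.
    Returns W, the test statistic, the critical value, and the decision.
    """
    n, m = ranks.shape
    R = ranks.sum(axis=1)               # rank total per item
    R_bar = m * (n + 1) / 2             # mean rank total
    S = np.sum((R - R_bar) ** 2)        # sum of squared deviations
    W = 12 * S / (m**2 * (n**3 - n))    # coefficient of concordance
    TS = m * (n - 1) * W                # chi-square approximation (for n > 7)
    critical = chi2.ppf(1 - alpha, df=n - 1)
    agree = TS > critical
    return W, TS, critical, agree


ranks = np.array([
    [ 1,  7,  6], [ 5,  6,  4], [ 6,  2,  8], [ 7,  5,  5], [10,  9, 10],
    [ 4,  3,  1], [ 8,  1,  3], [ 3, 10,  9], [ 9,  4,  7], [ 2,  8,  2],
])

W, TS, critical, agree = kendalls_w_test(ranks)
print(W, TS, critical, agree)  # ≈ 0.432, 11.654, 14.684, False
```

Because the test statistic falls short of the critical value, the function returns False for agreement, matching the conclusion above.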
- Kendall, M. G.; Babington Smith, B. (Sep 1939). “The Problem of m Rankings”. The Annals of Mathematical Statistics 10 (3): 275–287.
Related:
Spearman Rank Correlation Coefficient (article)
Kruskal-Wallis Test (article)
Mann-Whitney U Test (article)
Mj says
Thanks for the very detailed computation and step-by-step procedure for calculating Kendall’s Tau. It saved my written report assignment. Thank you!!! 🙂
Fred Schenkelberg says
you are welcome Marry, glad to be of help. Cheers, Fred
Janine says
what is the value of alpha in the problem?
Janine says
where did you get the 0.10 in there?
Fred Schenkelberg says
I’ve added
We are using a 90% confidence, therefore α = 1-C = 1 – .9 = 0.10
thus hopefully answering both questions.
Cheers,
Fred
KIZITO says
Please, in which fields is Kendall’s dispersive coefficient of concordance applicable?
Fred Schenkelberg says
Hi Kizito, I am not sure what the ‘dispersive’ element is referring to and whether that is different than the approach in the article.
You can use the coefficient of concordance to check on the agreement of two rank ordering of items. Thus it may apply to any field of interest. If two people provide a top ten list, you can determine if they agree or if there is evidence they do not agree.
Cheers,
Fred
YL says
I was hoping you might be able to help me with a project..
I asked m people to rank only their top 5 out of 21 objects, not fully rank all 21. I want to check their agreement using Kendall’s W. Can I do that? Do I need to enter, for example, “0” for all non-ranked objects (per ranker), and then use “1” as the lowest priority up to “5” as the highest?
Thanks!
YL
Fred Schenkelberg says
Hi Yael,
I’m not sure, good question. The approach you outlined makes sense to me, so give it a try. Also, maybe ask on a statistics forum on LinkedIn, or maybe the ASA (American Statistical Association) has an ask-the-expert service. I really only know the basics of these non-parametric tools.
Cheers,
Fred
Trajce Velkovski says
I am conducting a Delphi research study and have finished the first round, a questionnaire with 31 responses from the expert panelists.
Now for the second round, I need to interpret their answers.
The questionnaire consists of 50 factors that each expert rates on a 5-point Likert scale, from 1 to 5. This means one expert can give all fives as the answer for all 50 factors, which is not the case in the examples using Kendall’s W, since each expert has 10 points to allocate, meaning there is no repetition of the same grade.
My question is, can I measure the rate of agreement between the experts using Kendall’s W?
Thank you very much
Fred Schenkelberg says
Hi Trajce,
Not really sure, yet I believe Kendall’s coefficient of concordance works with rank-ordered lists (a top ten list, top three choices, etc.) where everyone orders the same group of items.
If you are using a Likert scale scoring then you may find more value in survey analysis methods.
On Accendo Reliability we tend to focus on tools and techniques useful for reliability engineering problems, and not too much if at all on the survey result analysis tools.
Cheers,
Fred
Leormhan Jacob Dela Cruz says
Hello! I tried using the values from the example above in SPSS, but it gives me a different value. What could the problem be? Thank you.
Fred Schenkelberg says
Hi Leormhan,
Thanks for the note. The differing values could be due to a few different things. I’ve noticed that different software packages use slightly different assumptions, rules, and algorithms. Also, a few named statistical tests (I’m not sure about this one) have different variants and thus different underlying calculations. Also, I may have made a mistake in my calculations. And I’m sure there are other possible reasons for the difference.
If you have a moment, let me know what result you got as it may help me sort out if I did make a mistake or it’s a difference for some other reason.
Cheers,
Fred