Comparisons for agreement
Let’s say we have rank-order data from two or more evaluators (people, algorithms, etc.) and we want to determine whether the evaluators agree.
Agreement here means the rankings from the different evaluators are concordant. This non-parametric method is typically used with 3 or more evaluators; for a comparison of only two evaluators, Cohen’s Kappa or Spearman’s rank correlation coefficient are more appropriate.
As an example, let’s ask three people to rank order ten popular movies, with 1 being the least favorite and 10 being the favorite of the list. Here’s the data from evaluators A, B, and C:
| A | B | C |
|---|---|---|
| 1 | 7 | 6 |
| 5 | 6 | 4 |
| 6 | 2 | 8 |
| 7 | 5 | 5 |
| 10 | 9 | 10 |
| 4 | 3 | 1 |
| 8 | 1 | 3 |
| 3 | 10 | 9 |
| 9 | 4 | 7 |
| 2 | 8 | 2 |
If these three were in perfect agreement, we wouldn’t need to evaluate whether they agreed, and clearly they are not in perfect agreement. So the question is: do they agree well enough to conclude they tend to like the same movies, or not?
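To follow along, here is a minimal sketch of the calculation in Python (assuming NumPy is available; the variable names are only for illustration), holding the rankings as a matrix with one row per movie and one column per evaluator:

```python
import numpy as np

# Ranks given to the ten movies (rows) by evaluators A, B, and C (columns),
# copied from the table above.
ranks = np.array([
    [ 1,  7,  6],
    [ 5,  6,  4],
    [ 6,  2,  8],
    [ 7,  5,  5],
    [10,  9, 10],
    [ 4,  3,  1],
    [ 8,  1,  3],
    [ 3, 10,  9],
    [ 9,  4,  7],
    [ 2,  8,  2],
])

n, m = ranks.shape  # n = 10 movies, m = 3 evaluators
```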
Compute Ri
$$ \large\displaystyle {{R}_{i}}=\sum\limits_{j=1}^{m}{{{r}_{ij}}}$$
where i indexes the individual items being ranked (in this case, 1 through 10), j indexes the evaluators, rij is the rank evaluator j gave to item i, and m is the number of evaluators (in this case, 3). Basically, tally up the ranks from each evaluator for each item.
Therefore, we find
| i | Ri |
|---|----|
| 1 | 14 |
| 2 | 15 |
| 3 | 16 |
| 4 | 17 |
| 5 | 29 |
| 6 | 8 |
| 7 | 12 |
| 8 | 22 |
| 9 | 20 |
| 10 | 12 |
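Continuing the Python sketch above, the row sums reproduce these Ri values:

```python
# Sum each movie's ranks across the three evaluators to get R_i.
R = ranks.sum(axis=1)
print(R)  # [14 15 16 17 29  8 12 22 20 12]
```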
Compute R̄
$$ \large\displaystyle \bar{R}=m(n+1)/2$$
n is the number of items being ranked, in this case, 10.
Therefore,
R̄ = 3(10 + 1)/2 = 16.5.
Compute S, sum of squared deviations
$$ \large\displaystyle S=\sum\limits_{i=1}^{n}{{{\left( {{R}_{i}}-\bar{R} \right)}^{2}}}$$
S = 320.5
This is the sum of the squared differences between each movie’s rank total (the sum of the three evaluators’ ranks) and the overall average rank total, R̄.
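In the running sketch, R̄ and S are one line each:

```python
R_bar = m * (n + 1) / 2        # 3 * (10 + 1) / 2 = 16.5
S = np.sum((R - R_bar) ** 2)   # 320.5
```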
Compute Kendall’s coefficient of concordance, W
W is determined with
$$ \large\displaystyle W=\frac{12S}{{{m}^{2}}\left( {{n}^{3}}-n \right)}$$
W will be between zero and one. Values close to zero imply no agreement and W values closer to one imply agreement.
Working out the example we find
W = 0.432
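Continuing the sketch, W follows directly from S, m, and n:

```python
W = 12 * S / (m**2 * (n**3 - n))  # 12 * 320.5 / (9 * 990) ≈ 0.432
```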
Compute the test statistic
The test statistic, T.S., comes from the data and is compared to the critical value to determine if there is concordance or not.
$$ \large\displaystyle T.S.=\frac{12S}{mn\left( n+1 \right)}=m\left( n-1 \right)W$$
In the example, T.S. = 12(320.5)/(3 × 10 × 11) = 11.654.
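In the sketch, the test statistic can be computed either from S or as m(n − 1)W:

```python
TS = 12 * S / (m * n * (n + 1))   # ≈ 11.654
# Equivalent form: TS = m * (n - 1) * W
```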
Compute critical value
With 10 items, n = 10, and since the degrees of freedom, df, is one less than n, we have df = n − 1 = 9. We are using a 90% confidence level, therefore α = 1 − C = 1 − 0.9 = 0.10.
We use the chi-squared distribution, χ², with a confidence of 1 − α and df = n − 1.
For n > 7, use the χ² table; for n ≤ 7, use the exact critical values from a table in a statistics text (one example is Siegel, S. and Castellan Jr., N.J., Nonparametric Statistics for the Behavioral Sciences (1988), International Edition, McGraw-Hill Book Company, New York, ISBN 0-07-057357-3, Table T, Critical values for Kendall coefficient of concordance W, p. 365).
In our example, the critical value is χ²(α = 0.10, df = 9) = 14.684.
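If SciPy is available, the same critical value can be looked up in code instead of a printed χ² table (a sketch, continuing the example):

```python
from scipy.stats import chi2

alpha = 0.10
critical_value = chi2.ppf(1 - alpha, df=n - 1)  # ≈ 14.684 for df = 9
```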
Compare the test statistic to the critical value
Ho: the evaluators’ rankings do not agree. If T.S. < the critical value, there is no convincing evidence of agreement.
Ha: the evaluators’ rankings agree. If T.S. > the critical value, there is convincing evidence of agreement.
In our example, T.S. = 11.654 and the critical value is 14.684. Since the T.S. is less than the critical value, we conclude that while there appears to be some agreement, it is not sufficient to conclude the evaluators agree.
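Putting the steps together, here is a self-contained sketch (assuming NumPy and SciPy; the function name kendalls_w_test is only illustrative) that reproduces the example end to end:

```python
import numpy as np
from scipy.stats import chi2


def kendalls_w_test(ranks, alpha=0.10):
    """Kendall's coefficient of concordance W with the chi-square test.

    ranks: (n items) x (m evaluators) array of ranks, assumed to have no ties.
    Returns W, the test statistic, the critical value, and the decision.
    """
    n, m = ranks.shape
    R = ranks.sum(axis=1)               # rank total per item
    R_bar = m * (n + 1) / 2             # mean rank total
    S = np.sum((R - R_bar) ** 2)        # sum of squared deviations
    W = 12 * S / (m**2 * (n**3 - n))    # coefficient of concordance
    TS = m * (n - 1) * W                # chi-square approximation (for n > 7)
    critical = chi2.ppf(1 - alpha, df=n - 1)
    agree = TS > critical
    return W, TS, critical, agree


ranks = np.array([
    [ 1,  7,  6], [ 5,  6,  4], [ 6,  2,  8], [ 7,  5,  5], [10,  9, 10],
    [ 4,  3,  1], [ 8,  1,  3], [ 3, 10,  9], [ 9,  4,  7], [ 2,  8,  2],
])

W, TS, critical, agree = kendalls_w_test(ranks)
print(W, TS, critical, agree)  # ≈ 0.432, 11.654, 14.684, False
```

Because the test statistic falls short of the critical value, the function returns False for agreement, matching the conclusion above.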
- Kendall, M. G.; Babington Smith, B. (Sep 1939). “The Problem of m Rankings”. The Annals of Mathematical Statistics 10 (3): 275–287.
Related:
Spearman Rank Correlation Coefficient (article)
Kruskal-Wallis Test (article)
Mann-Whitney U Test (article)
Mj says
Thanks for the very detailed computation and step-by-step procedure for calculating Kendall’s Tau. It saved my written report assignment. Thank you!!! 🙂
Fred Schenkelberg says
you are welcome Marry, glad to be of help. Cheers, Fred
Janine says
what is the value of alpha in the problem?
Janine says
where did you get the 0.10 in there?
Fred Schenkelberg says
I’ve added
We are using a 90% confidence, therefore α = 1-C = 1 – .9 = 0.10
thus hopefully answering both questions.
Cheers,
Fred
KIZITO says
Please, in which fields is Kendall’s dispersive coefficient of concordance applicable?
Fred Schenkelberg says
Hi Kizito, I am not sure what the ‘dispersive’ element is referring to and whether that is different than the approach in the article.
You can use the coefficient of concordance to check on the agreement of two rank ordering of items. Thus it may apply to any field of interest. If two people provide a top ten list, you can determine if they agree or if there is evidence they do not agree.
Cheers,
Fred
YL says
I was hoping you might be able to help me with a project..
I asked m people to rank only their top 5 out of 21 objects, not fully rank all 21. I want to check their agreement using Kendall’s W. Can I do that? Do I need to enter, for example, “0” for all non-ranked objects (per ranker), and then use “1” as the lowest priority up to “5” as the highest?
Thanks!
YL
Fred Schenkelberg says
Hi Yael,
I’m not sure, good question. The approach you outlined makes sense to me, so give it a try. Also, maybe ask on a statistics forum on LinkedIn, or maybe the ASA (American Statistical Association) has an ask-the-expert service. I really only know the basics of these non-parametric tools.
Cheers,
Fred
Trajce Velkovski says
I am conducting a Delphi research study and have finished the first round, a questionnaire with 31 responses from the expert panelists.
Now for the second round, I need to interpret their answers.
The questionnaire consists of 50 factors that each expert rates on a 5-point Likert scale, from 1 to 5. This means one expert can give all fives as the answer for all 50 factors, which is not the case in the examples using Kendall’s W, since each expert has 10 points to allocate, meaning there is no repetition of the same grade.
My question is, can I measure the rate of agreement between the experts using Kendall’s W?
Thank you very much
Fred Schenkelberg says
Hi Trajce,
Not really sure, yet I believe Kendall’s coefficient of concordance works with rank-ordered lists (a top ten list, top three choices, etc.) where everyone orders the same group of items.
If you are using a Likert scale scoring then you may find more value in survey analysis methods.
On Accendo Reliability we tend to focus on tools and techniques useful for reliability engineering problems, and not too much if at all on the survey result analysis tools.
Cheers,
Fred
Leormhan Jacob Dela Cruz says
Hello! I tried using the values from the example above in SPSS, but it gives me a different value. What could the problem be? Thank you.
Fred Schenkelberg says
Hi Leormhan,
Thanks for the note. The differing values could be due to a few different things. I’ve noticed that different software packages use slightly different assumptions, rules, and algorithms. Also, a few named statistical tests (I’m not sure about this one) have different variants and thus different underlying calculations. Also, I may have made a mistake in my calculations. And I’m sure there are other possible reasons for the difference.
If you have a moment, let me know what result you got as it may help me sort out if I did make a mistake or it’s a difference for some other reason.
Cheers,
Fred