# The Spearman Rank Correlation Coefficient

This non-parametric analysis tool provides a way to compare two sets of ordinal data (data that can be rank ordered in a meaningful manner). The result, r_{s}, is a measure of the association between two datasets.

You may want to know if two reviewers have similar ratings for movies, or if two assessment techniques provide similar results.

If r_{s} is 1, the two series rank the items in the same order: when one series increases, so does the other. If r_{s} is -1, there is a perfect negative relationship, meaning as one series increases the other decreases. At zero there is no monotonic relationship between the two series. The further r_{s} is from zero, the stronger the association between the two series.

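This does not work for sets with a non-monotonic relationship, such as a parabola, where one series rises and then falls as the other increases. A relationship that is non-linear but consistently increasing or decreasing is handled well, because only the rank order matters.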

## Example Calculation

Let’s consider an example. Let’s ask two people to rate a set of five movies. (Using more items would provide much more meaningful results; the set is kept short for the example only.) We ask for a rating from one to ten.

An alternative experiment that could use this method would be to compare the average ratings from two groups of people when each group is presented with the information differently. Say group A first watches the movie trailer and group B only has the movie summary.

There are many ways we can collect data. As long as we have two sets of data and can rank order the results in each, we can determine if there is a correlation.

## Step 1: Collect the data

Back to the example: five movies rated by two people. The two columns, Data A and Data B, are the ratings given by person A and person B, respectively, for the five movies.

## Step 2: Rank the data

In columns Rank A and Rank B, enter the rank order value for Data A and Data B, respectively. For each data column, A & B, separately, rank order from lowest to highest with the lowest rank set to one and increment up to the number of values, in this case, 5.

In Data A the lowest value is a 2, thus the Rank is 1. Then looking for the next highest value in the Data A column, we find a 4, which is Ranked a 2, and so on.

If there is a tie, split the next two ranks. For example, if Data A’s column has two ratings of 4, we would Rank them both 2.5 (the average of the next two ranks, 2 and 3).
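The ranking rule, including the tie handling, can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `average_ranks` and the ratings below are hypothetical, chosen only so that the lowest value is 2 and two ratings of 4 tie, as in the description above.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1) to highest.

    Tied values all receive the average of the ranks they would occupy.
    """
    # Indices of the values, ordered from the lowest value to the highest.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Average of the 1-based ranks pos+1 .. end+1.
        shared = (pos + 1 + end + 1) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


ratings = [2, 4, 4, 7, 9]  # hypothetical ratings with a tie at 4
print(average_ranks(ratings))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The two ratings of 4 would have taken ranks 2 and 3, so each receives 2.5.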

## Step 3: Calculate the difference in ranks

In the D column, calculate the difference between the two ranks for each item. Record the absolute value, since the sign does not matter once the differences are squared.

D is the absolute value of Rank A - Rank B. For example, 4 - 1 = 3.

## Step 4: Square the differences

Square each value in column D and enter the result in column D^{2}.

## Step 5: Sum column D^{2}

This is the value ∑D^{2}.

In this case it is 9 + 9 + 1 + 4 + 1 = 24.
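As a quick check of Steps 4 and 5, the squaring and summing can be written out in Python. The list `differences` is an assumption in one respect: it is recovered by taking the square roots of the squared terms quoted above, not copied from the data table.

```python
# Absolute rank differences D for the five movies, recovered from the
# squared terms 9, 9, 1, 4, 1 given in the text.
differences = [3, 3, 1, 2, 1]

# Step 4: square each difference.
squared = [d ** 2 for d in differences]

# Step 5: sum the squared differences.
sum_d_squared = sum(squared)

print(squared, sum_d_squared)  # [9, 9, 1, 4, 1] 24
```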

## Step 6: Calculate r_{s}

If there were no ties in the data then use this formula.

$$ \large\displaystyle {{r}_{s}}=1-\left( \frac{6\sum{{{D}^{2}}}}{n\left( {{n}^{2}}-1 \right)} \right)$$

If there was one or more ties, then use this formula.

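$$ \large\displaystyle {{r}_{s}}=\frac{\sum\nolimits_{i}{\left( {{x}_{i}}-\bar{x} \right)\left( {{y}_{i}}-\bar{y} \right)}}{\sqrt{\sum\nolimits_{i}{{\left( {{x}_{i}}-\bar{x} \right)}^{2}}\sum\nolimits_{i}{{\left( {{y}_{i}}-\bar{y} \right)}^{2}}}}$$

where the x and y’s are the rank values (Rank A and Rank B, with tied ranks averaged) and x̄ and ȳ are the means of those ranks. Applying this formula to the original ratings instead would give the ordinary Pearson correlation, not r_{s}.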
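A short Python sketch of this formula follows, with the ranks (Rank A and Rank B, ties averaged) supplied as x and y, since Spearman's coefficient is the Pearson correlation computed on ranks. The function name `spearman_from_ranks` and the two rank lists are hypothetical and serve only to illustrate a case with a tie.

```python
import math


def spearman_from_ranks(rank_x, rank_y):
    """Pearson correlation of two equal-length lists of ranks."""
    if len(rank_x) != len(rank_y) or len(rank_x) < 2:
        raise ValueError("need two rank lists of equal length (at least 2)")
    n = len(rank_x)
    mean_x = sum(rank_x) / n
    mean_y = sum(rank_y) / n
    # Numerator: sum of the products of deviations from the mean rank.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(rank_x, rank_y))
    # Denominator terms: sums of squared deviations for each series.
    ss_x = sum((x - mean_x) ** 2 for x in rank_x)
    ss_y = sum((y - mean_y) ** 2 for y in rank_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r_s is undefined when all ranks in a series are equal")
    return cross / math.sqrt(ss_x * ss_y)


rank_a = [1, 2.5, 2.5, 4, 5]  # hypothetical ranks, tie shared as 2.5
rank_b = [2, 1, 4, 3, 5]      # hypothetical ranks, no ties
r_s = spearman_from_ranks(rank_a, rank_b)
print(round(r_s, 3))  # 0.667
```

When neither series has ties, this calculation and the shorter ∑D^{2} formula give the same value.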

Calculating the r_{s} value for the example we find

$$ \large\displaystyle {{r}_{s}}=1-\left( \frac{6\sum{{{D}^{2}}}}{n\left( {{n}^{2}}-1 \right)} \right)=1-\left( \frac{6\times 24}{5\left( 25-1 \right)} \right)=-0.2$$
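The same arithmetic can be checked with a small Python function. The name `spearman_no_ties` is hypothetical; the inputs are the ∑D^{2} of 24 and the n of 5 from the example, and the formula applies only when there are no tied ranks.

```python
def spearman_no_ties(sum_d_squared, n):
    """Shortcut formula r_s = 1 - 6*sum(D^2) / (n*(n^2 - 1)); valid without ties."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


r_s = spearman_no_ties(sum_d_squared=24, n=5)
print(round(r_s, 4))  # -0.2
```

The rounding only hides floating-point noise; the exact value is 1 - 144/120 = -0.2.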

## Step 7: Interpret the results

With an r_{s} value of -0.2, we may conclude that there is only a very slight, if any, negative correlation between the two reviewers.

The value of r_{s} can vary between -1 and 1.

Close to -1 indicates a negative correlation.

Close to 0 indicates no monotonic correlation.

Close to 1 indicates a positive correlation.

