Mann-Whitney U Test

Last Verified September 27, 2021

The Mann-Whitney U test compares two samples to determine whether they came from the same population. This non-parametric test can use ordinal data, meaning data that has a rank order but carries no information about the relative distances between ranks.

For example, if we ask people to score brand preference on a 5-point scale, we could compare two brands. Or we can compare time-to-failure data, which is rarely normally distributed.

Of course, we can also use continuous data, such as lifetime, cycles, time, or distance, when the underlying data is not normally distributed or not expected to be. The U test is useful for determining whether two medians are equivalent when the t-test is not appropriate. Recall that the t-test assumes the populations involved in the test have a normal distribution.

Two hypotheses

The null hypothesis, H_o, for a U test is that the two samples have been drawn from the same population, or from populations with the same shape and spread (not necessarily normal).

The alternative hypothesis, H_a, may be one-sided or two-sided. A one-sided alternative is that one sample comes from a population with larger (or smaller) values than the other sample. A two-sided alternative is that the two samples come from different populations, without specifying a direction.

We combine the data from the two samples and arrange them in increasing rank order. Then we count the number of times that a value from one group precedes a value from the other group. An example illustrates this simple process.
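One way to write this count down is sketched here. The notation is assumed for illustration rather than taken from the article: x_i and y_j are the individual values in the two samples, n1 and n2 are the sample sizes, and S scores a single pair.

$$ \large\displaystyle U_{X}=\sum_{i=1}^{n1}\sum_{j=1}^{n2}S\left( x_{i},y_{j} \right),\quad S\left( x,y \right)=\begin{cases}1 & y<x\\ \tfrac{1}{2} & y=x\\ 0 & y>x\end{cases} $$

Here U_X counts how often a value from the second sample precedes a value from the first, with a tie counted as one half, which is a common convention. Counting in the other direction gives U_Y, and because every pair contributes exactly one point in total, U_X + U_Y = n1 × n2.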

U-test Example

Let’s say we have two types of pumps and we want to determine if one is longer lasting than the other. We have been using both types of pumps in the plant for a few years and have complete time-to-failure data (time from installation to failure). In this simple example, all of the pumps from both brands have failed (loss of pressure, leaking, excessive current draw, or however we want to define failure).

Brand X had five units with the following times to failure in months: 3, 4, 2, 6, and 2.

Brand Y had six units with the following times to failure in months: 4, 7, 5, 10, 6, and 8.

It appears that Brand Y is lasting longer and we want to determine if the data provides statistically significant support for this hypothesis.

The life data is most likely not normally distributed so we are using the non-parametric Mann-Whitney U test for the evaluation.

Step 1 – Set up the hypothesis test

The null hypothesis is that the two pumps behave the same. H_o: X = Y

The alternative hypothesis is that pumps from Brand Y last longer. H_a: Y > X

Step 2 – Rank order all the data

Brand	Life (months)	Rank
X	2	1.5
X	2	1.5
X	3	3
X	4	4.5
Y	4	4.5
Y	5	6
X	6	7.5
Y	6	7.5
Y	7	9
Y	8	10
Y	10	11

Note that tied values receive the average of the ranks they would otherwise occupy. For example, the two 6 month lifetimes occupy positions 7 and 8, so each is ranked 7.5, which is (7 + 8) / 2.
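The tie-averaging rule can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `average_ranks` and the variable names are assumptions chosen here, not part of the original example.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they occupy."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are 0-based, so add 1 to each before averaging.
        mean_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Pump lifetimes in months, taken from the example.
brand_x = [3, 4, 2, 6, 2]
brand_y = [4, 7, 5, 10, 6, 8]

ranks = average_ranks(brand_x + brand_y)
ranks_x = ranks[:len(brand_x)]
ranks_y = ranks[len(brand_x):]

print(ranks_x)  # [3.0, 4.5, 1.5, 7.5, 1.5]
print(ranks_y)  # [4.5, 9.0, 6.0, 11.0, 7.5, 10.0]
```

The ranks come back in the original data order rather than sorted order, which makes it easy to split them by brand afterward.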

Step 3 – Add up ranks for two samples

For Brand X, T1 = 1.5 + 1.5 + 3 + 4.5 + 7.5 = 18

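For Brand Y, T2 = 4.5 + 6 + 7.5 + 9 + 10 + 11 = 48

As a check, T1 + T2 = 18 + 48 = 66, which matches the sum of the ranks 1 through 11, that is 11 × 12 / 2 = 66.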

Step 4 – Select the larger rank total

In this case, it is T2, which we use later as Tx to designate the larger rank total.

Step 5 – Calculate counts

Count the number of units in each group.

Brand X has 5 units thus n1 = 5.

Brand Y has 6 units thus n2 = 6.

The larger rank total group is Brand Y thus nx = 6.

Step 6 – Find U

$$ \large\displaystyle \begin{array}{l}U=n1\times n2+nx\times \frac{\left( nx+1 \right)}{2}-Tx\\U=5\times 6+6\times \frac{\left( 6+1 \right)}{2}-48\\U=3\end{array}$$
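The U statistic for the pump data can be reached by two routes, the rank-total formula and the direct count of how often a value from one group precedes a value from the other. The Python sketch below computes both as a cross-check. The helper names `midrank` and `precedence_count` are assumptions made for this illustration, and ties are counted as one half, a common convention.

```python
# Pump lifetimes in months, taken from the example.
brand_x = [3, 4, 2, 6, 2]
brand_y = [4, 7, 5, 10, 6, 8]
pooled = brand_x + brand_y


def midrank(value):
    """Rank of `value` in the pooled data, averaging the positions of ties."""
    less = sum(1 for v in pooled if v < value)
    equal = sum(1 for v in pooled if v == value)
    return less + (equal + 1) / 2


def precedence_count(first, second):
    """Count pairs where a value in `first` precedes one in `second`; a tie counts 0.5."""
    total = 0.0
    for a in first:
        for b in second:
            if a < b:
                total += 1.0
            elif a == b:
                total += 0.5
    return total


n_x, n_y = len(brand_x), len(brand_y)
rank_total_x = sum(midrank(v) for v in brand_x)
rank_total_y = sum(midrank(v) for v in brand_y)

# Route 1: rank-total formula, using the group with the larger rank total (Brand Y).
u_from_ranks = n_x * n_y + n_y * (n_y + 1) / 2 - rank_total_y

# Route 2: count how often a Brand Y lifetime precedes a Brand X lifetime.
u_from_counting = precedence_count(brand_y, brand_x)

print(rank_total_x, rank_total_y)     # 18.0 48.0
print(u_from_ranks, u_from_counting)  # 3.0 3.0
```

Counting in the opposite direction, `precedence_count(brand_x, brand_y)`, gives 27.0, and the two counts add up to n1 × n2 = 30.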

Step 7 – Find the critical value

Let’s use a 5% significance level and a lookup table of critical values for the U test.

table from: http://www.snabonline.com/Content/SkillsSupport/MathsAndStatsSupport/M0_14S.pdf

With n1 = 5 and n2 = 6, we find the critical value at the intersection of that row and column in the table, which is 3.

Step 8 – Compare test statistic to critical value

For a U test, if the calculated U-value is less than or equal to the critical value, we reject the null hypothesis and have evidence for the alternative hypothesis at the chosen significance level.

In this case, the calculated U (test statistic) is 3 and the critical value is 3. Because U is equal to the critical value, there is sufficient evidence that the Brand Y pumps last longer (median lifetime) than the Brand X pumps at the 5% significance level.
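With only 11 pumps, the table lookup can be cross-checked by brute force. The sketch below is an exact permutation calculation, which is a different route from the critical-value table the example uses and is offered here only as an illustration: it enumerates every way of assigning 5 of the 11 ranks to Brand X and finds the fraction of assignments whose rank total is as small as, or smaller than, the one observed. The names `midrank`, `splits`, and `p_one_sided` are assumptions chosen for this sketch.

```python
from itertools import combinations

# Pump lifetimes in months, taken from the example.
brand_x = [3, 4, 2, 6, 2]
brand_y = [4, 7, 5, 10, 6, 8]
pooled = brand_x + brand_y


def midrank(value):
    """Rank of `value` in the pooled data, averaging the positions of ties."""
    less = sum(1 for v in pooled if v < value)
    equal = sum(1 for v in pooled if v == value)
    return less + (equal + 1) / 2


ranks = [midrank(v) for v in pooled]
observed_total_x = sum(ranks[:len(brand_x)])

# Every way to choose which 5 of the 11 pooled positions belong to Brand X.
splits = list(combinations(range(len(pooled)), len(brand_x)))

# Assignments whose Brand X rank total is at least as small as the observed one.
as_extreme = sum(
    1 for idx in splits if sum(ranks[i] for i in idx) <= observed_total_x
)
p_one_sided = as_extreme / len(splits)

print(len(splits), as_extreme)  # 462 8
print(round(p_one_sided, 4))    # 0.0173
```

If the two brands really behaved the same, every assignment would be equally likely, so a one-sided value below 0.05 points toward rejecting the null hypothesis at the 5% level.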
