In the article Hypothesis Tests for Proportion, the comparison is between a given value and a sample. In this case, let’s compare two populations. We take a sample from each population, calculate the proportion for each sample, and determine whether the populations differ from each other based on the two samples.
The exact solution uses the Binomial distribution, yet when np and n(1 – p) are both greater than 5 for each sample, we can use a normal approximation for the test statistic and critical value.
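The rule of thumb for the normal approximation can be checked with a few lines of Python. This is a minimal sketch: the helper name `normal_approx_ok` and the counts passed to it are hypothetical, and p is estimated by the sample proportion y/n.

```python
def normal_approx_ok(y, n, threshold=5):
    """Rule-of-thumb check that n*p and n*(1 - p) both exceed the threshold.

    p is estimated from the sample as y / n (an assumption of this sketch).
    """
    p_hat = y / n
    return n * p_hat > threshold and n * (1 - p_hat) > threshold


# Hypothetical counts, for illustration only.
print(normal_approx_ok(30, 200))  # 30 successes, 170 failures
print(normal_approx_ok(3, 40))    # only 3 successes
```

The check should pass for both samples before relying on the normal approximation.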
Set up the Hypothesis Test
Given two proportions, p1 and p2, based on samples from two populations, we want to evaluate one of three cases.
The null hypothesis in each case is the two populations are the same, thus the two proportions are equal.
$$ \large\displaystyle {{H}_{o}}:{{p}_{1}}-{{p}_{2}}=0$$
The three possible alternatives are:
$$ \large\displaystyle \begin{array}{l}{{H}_{a}}:{{p}_{1}}-{{p}_{2}}>0\\{{H}_{a}}:{{p}_{1}}-{{p}_{2}}<0\\{{H}_{a}}:{{p}_{1}}-{{p}_{2}}\ne 0\end{array}$$
The test statistic using a normal approximation is
$$ \large\displaystyle z=\frac{{{p}_{1}}-{{p}_{2}}}{{{\sigma }_{{{p}_{1}}-{{p}_{2}}}}}$$
Where the standard deviation of the difference is
$$ \large\displaystyle {{\sigma }_{{{p}_{1}}-{{p}_{2}}}}=\sqrt{{{p}_{{{p}_{1}}-{{p}_{2}}}}(1-{{p}_{{{p}_{1}}-{{p}_{2}}}})\left( \frac{1}{{{n}_{1}}}+\frac{1}{{{n}_{2}}} \right)}$$
And the pooled proportion, combining both samples, is
$$ \large\displaystyle {{p}_{{{p}_{1}}-{{p}_{2}}}}=\frac{{{y}_{1}}+{{y}_{2}}}{{{n}_{1}}+{{n}_{2}}}$$
Where y1 and y2 are the numbers of successes in the samples of n1 and n2 items from the respective populations.
The critical value is from the standard normal table for the given value of confidence, C, using α = 1 – C for a one-sided test, or α/2 for the two-sided test.
If the test statistic falls in the critical region, that is, beyond the critical value, then we have evidence in favor of the alternative hypothesis.
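The setup above can be sketched in Python using only the standard library. The function name `two_proportion_z_test`, its parameters, and the labels for the three alternatives are illustrative choices rather than part of any particular library, and the sketch assumes the normal approximation is appropriate.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(y1, n1, y2, n2, alpha=0.10, alternative="two-sided"):
    """Pooled two-proportion z-test using the normal approximation.

    Returns (z, critical value, whether H0 is rejected).
    Assumes the pooled proportion is strictly between 0 and 1.
    """
    p1, p2 = y1 / n1, y2 / n2
    pooled = (y1 + y2) / (n1 + n2)
    sigma = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / sigma

    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # Ha: p1 - p2 != 0
        crit = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    elif alternative == "greater":   # Ha: p1 - p2 > 0
        crit = std_normal.inv_cdf(1 - alpha)
        reject = z > crit
    elif alternative == "less":      # Ha: p1 - p2 < 0
        crit = std_normal.inv_cdf(alpha)  # a negative critical value
        reject = z < crit
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z, crit, reject
```

Calling `two_proportion_z_test(y1, n1, y2, n2)` with the success counts and sample sizes returns the test statistic alongside the critical value, so the comparison described above is visible in the result.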
An example
Earlier today I was checking the statistics for an A/B test running on my website. The server alternates visitors between one of two versions of the page. Each page in the test is slightly different, and I’m interested to know if one page performs better at encouraging visitors to sign up for an email list.
Both pages have 345 visits as of this morning. That is, n1 = n2 = 345.
Page 1 has 18 email sign ups, y1 = 18. Page 2 has 14 sign ups, thus y2 = 14.
We’re interested in whether there is a difference in either direction, thus a two-sided alternative hypothesis.
$$ \large\displaystyle \begin{array}{l}{{H}_{o}}:{{p}_{1}}-{{p}_{2}}=0\\{{H}_{a}}:{{p}_{1}}-{{p}_{2}}\ne 0\end{array}$$
Let’s say we’re interested in knowing with a confidence of 90%, thus α = 0.10 and α/2 = 0.05. We then find the critical region outside ±1.645.
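Instead of a printed table, the critical value can be looked up with Python's standard library. This is a small sketch; the variable names are illustrative.

```python
from statistics import NormalDist

confidence = 0.90
alpha = 1 - confidence

# Two-sided test: put alpha/2 in each tail of the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_crit, 3))  # 1.645
```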
We need to estimate the combined proportion first.
$$ \large\displaystyle {{p}_{{{p}_{1}}-{{p}_{2}}}}=\frac{{{y}_{1}}+{{y}_{2}}}{{{n}_{1}}+{{n}_{2}}}=\frac{18+14}{345+345}=0.0464$$
And we then can estimate the combined standard deviation.
$$ \large\displaystyle \begin{array}{l}{{\sigma }_{{{p}_{1}}-{{p}_{2}}}}=\sqrt{{{p}_{{{p}_{1}}-{{p}_{2}}}}(1-{{p}_{{{p}_{1}}-{{p}_{2}}}})\left( \frac{1}{{{n}_{1}}}+\frac{1}{{{n}_{2}}} \right)}\\{{\sigma }_{{{p}_{1}}-{{p}_{2}}}}=\sqrt{0.0464(1-0.0464)\left( \frac{1}{345}+\frac{1}{345} \right)}\\{{\sigma }_{{{p}_{1}}-{{p}_{2}}}}=0.016\end{array}$$
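The test statistic, z, is then (the result uses the unrounded proportions 18/345 and 14/345, so it differs slightly from dividing the rounded values shown)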
$$ \large\displaystyle z=\frac{{{p}_{1}}-{{p}_{2}}}{{{\sigma }_{{{p}_{1}}-{{p}_{2}}}}}=\frac{0.052-0.041}{0.016}=0.72$$
The value 0.72 lies within ±1.645, so it is not in the critical region, and thus there is no convincing evidence the two pages are converting at different rates.
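The arithmetic in this example can be reproduced with a short Python sketch using only the standard library. The variable names are illustrative, and the two-sided p-value at the end is an addition not computed in the text above; it is included only as another way to read the same result.

```python
from math import sqrt
from statistics import NormalDist

n1 = n2 = 345      # visits per page
y1, y2 = 18, 14    # email sign ups per page

p1, p2 = y1 / n1, y2 / n2
pooled = (y1 + y2) / (n1 + n2)
sigma = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / sigma

# Two-sided critical value for 90% confidence.
z_crit = NormalDist().inv_cdf(1 - 0.10 / 2)

# Two-sided p-value (not part of the original walkthrough).
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"pooled = {pooled:.4f}, sigma = {sigma:.3f}, z = {z:.2f}")
print(f"reject H0: {abs(z) > z_crit}, p-value = {p_value:.2f}")
```

Working from unrounded values, z comes out at about 0.72, which matches the conclusion that the difference is not significant at 90% confidence.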
Related:
Hypothesis Tests for Proportion (article)
Hypothesis Tests for Variance Case I (article)
equal variance hypothesis (article)